Commit history for gcc/expr.cc (newest first)

2024-07-17  expr: Allow same precision modes conversion between {ibm_extended,ieee_quad}_format  (Kewen Lin; 1 file, -9/+30)

For some historical reasons, rs6000 defines KFmode, TFmode and IFmode to have different mode precisions, but this causes some issues and needs workarounds such as PR112993. So we are going to make all rs6000 128-bit scalar FP modes have 128-bit precision. To prepare for that, this patch makes function convert_mode_scalar allow same precision FP mode conversions if the underlying formats are ibm_extended_format and ieee_quad_format respectively, just like the existing special treatment of arm_bfloat_half_format <-> ieee_half_format. It also factors all the relevant checks out into a lambda function. Besides, similar to the ieee fp16 -> bfloat conversion, it adopts trunc_optab rather than sext_optab for the ibm128 to ieee128 conversion.

PR target/112993

gcc/ChangeLog:

* expr.cc (convert_mode_scalar): Allow same precision conversion between scalar floating point modes whose underlying format is ibm_extended_format or ieee_quad_format, and refactor the assertion with new lambda function acceptable_same_precision_modes. Use trunc_optab rather than sext_optab for the ibm128 to ieee128 conversion.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Use trunc_optab rather than sext_optab for the ibm128 to ieee128 conversion.

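A minimal C sketch of the conversions this affects (an illustration assuming a powerpc64le target where both 128-bit long double formats are available, not part of the patch):

    /* __ibm128 uses ibm_extended_format (double-double); _Float128
       uses ieee_quad_format.  Both are 128 bits wide, so these are
       the same-precision scalar FP conversions convert_mode_scalar
       now accepts, the first one via trunc_optab.  */
    _Float128 to_ieee (__ibm128 x)  { return (_Float128) x; }
    __ibm128  to_ibm  (_Float128 x) { return (__ibm128) x; }
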
2024-06-18  Make more use of force_lowpart_subreg  (Richard Sandiford; 1 file, -8/+9)

This patch makes target-independent code use force_lowpart_subreg instead of simplify_gen_subreg and lowpart_subreg in some places. The criteria were:

(1) The code is obviously specific to expand (where new pseudos can be created), or at least would be invalid to call when !can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

Doing this should reduce the likelihood of bugs like PR115464 occurring in other situations.

gcc/
* builtins.cc (expand_builtin_issignaling): Use force_lowpart_subreg instead of simplify_gen_subreg and lowpart_subreg.
* expr.cc (convert_mode_scalar, expand_expr_real_2): Likewise.
* optabs.cc (expand_doubleword_mod): Likewise.

2024-06-18  Make more use of force_subreg  (Richard Sandiford; 1 file, -15/+12)

This patch makes target-independent code use force_subreg instead of simplify_gen_subreg in some places. The criteria were:

(1) The code is obviously specific to expand (where new pseudos can be created), or at least would be invalid to call when !can_create_pseudo_p () and temporaries are needed.

(2) The value is obviously an rvalue rather than an lvalue.

(3) The offset wasn't a simple lowpart or highpart calculation; a later patch will deal with those.

Doing this should reduce the likelihood of bugs like PR115464 occurring in other situations.

gcc/
* expmed.cc (store_bit_field_using_insv): Use force_subreg instead of simplify_gen_subreg.
(store_bit_field_1): Likewise.
(extract_bit_field_as_subreg): Likewise.
(extract_integral_bit_field): Likewise.
(emit_store_flag_1): Likewise.
* expr.cc (convert_move): Likewise.
(convert_modes): Likewise.
(emit_group_load_1): Likewise.
(emit_group_store): Likewise.
(expand_assignment): Likewise.

2024-06-13  expand: constify sepops operand to expand_expr_real_2 and expand_widen_pattern_expr [PR113212]  (Andrew Pinski; 1 file, -4/+4)

While working on an expand patch back in January, I noticed that the first argument (of sepops type) of expand_expr_real_2 could be constified, as it was not to be touched by the function (nor should it be). There is code in internal-fn.cc that depends on expand_expr_real_2 not touching the ops argument, so constification makes this more obvious.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR middle-end/113212
* expr.h (const_sepops): New typedef.
(expand_expr_real_2): Constify the first argument.
* optabs.cc (expand_widen_pattern_expr): Likewise.
* optabs.h (expand_widen_pattern_expr): Likewise.
* expr.cc (expand_expr_real_2): Likewise.
(do_store_flag): Likewise. Remove incorrect store to ops->code.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-06-13  [APX CCMP] Add targetm.have_ccmp hook [PR115370]  (Hongyu Wang; 1 file, -1/+1)

In cfgexpand, there is an optimization for branches which tests targetm.gen_ccmp_first == NULL. However, for targets like x86-64 the hook is implemented but its presence does not indicate that ccmp is enabled. Add a new target hook TARGET_HAVE_CCMP and replace the middle-end check for the existence of gen_ccmp_first to avoid misoptimization.

gcc/ChangeLog:

PR target/115370
PR target/115463
* target.def (have_ccmp): New target hook.
* targhooks.cc (default_have_ccmp): New function.
* targhooks.h (default_have_ccmp): New prototype.
* doc/tm.texi.in: Add TARGET_HAVE_CCMP.
* doc/tm.texi: Regenerate.
* cfgexpand.cc (expand_gimple_cond): Call targetm.have_ccmp instead of checking if targetm.gen_ccmp_first exists.
* expr.cc (expand_expr_real_gassign): Likewise.
* config/i386/i386.cc (ix86_have_ccmp): New target hook to check if APX_CCMP is enabled.
(TARGET_HAVE_CCMP): Define.

2024-05-14  [1/3] expr: Export clear_by_pieces()  (Christoph Müllner; 1 file, -5/+1)

Make clear_by_pieces() available to other parts of the compiler, similar to store_by_pieces().

gcc/ChangeLog:

* expr.cc (clear_by_pieces): Remove static from clear_by_pieces.
* expr.h (clear_by_pieces): Add prototype for clear_by_pieces.

2024-05-07  expansion: Use __trunchfbf2 calls rather than __extendhfbf2 [PR114907]  (Jakub Jelinek; 1 file, -2/+10)

The HF and BF modes have the same size/precision and neither is a subset nor a superset of the other, so using either __extendhfbf2 or __trunchfbf2 is weird. The expansion currently emits __extendhfbf2, but on the libgcc side we have __trunchfbf2 implemented. I think it is easier to switch to using what is available rather than adding new entrypoints to libgcc, even as aliases, because this is backportable.

2024-05-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/114907
* expr.cc (convert_mode_scalar): Use trunc_optab rather than sext_optab for HF->BF conversions.
* optabs-libfuncs.cc (gen_trunc_conv_libfunc): Likewise.

* gcc.dg/pr114907.c: New test.

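A minimal sketch of the affected conversion (assuming a target that provides both _Float16 and __bf16, e.g. x86-64 with SSE2):

    /* HFmode and BFmode are both 16 bits, so neither direction is a
       real extension or truncation; after this patch the expansion
       emits a call to __trunchfbf2, which libgcc actually provides.  */
    __bf16 cvt (_Float16 x) { return (__bf16) x; }
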
2024-04-09  Fix up duplicated words mostly in comments, part 2  (Jakub Jelinek; 1 file, -1/+1)

Another patch from eyeballing git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 ' output, this time in the gcc/ subdirectory.

2024-04-09  Jakub Jelinek  <jakub@redhat.com>

gcc/
* expr.cc (convert_mode_scalar): Fix duplicated words in comment; into into -> it into.
* function.h (function::cond_uids): Fix duplicated words in comment; same same -> same.
* config/riscv/riscv-vector-costs.cc (costs::adjust_vect_cost_per_loop): Fix duplicated words in comment; model model -> model.
* config/riscv/riscv-vector-builtins-shapes.cc (build_base): Fix duplicated words in comment; for for -> for.
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Fix duplicated words in comment; more more -> more.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Fix duplicated words in comment; be be -> be.
* tree-profile.cc (masking_vectors): Fix duplicated words in comment; has has -> has, the the -> the.
* value-range.cc (irange::set_range_from_bitmask): Fix duplicated words in comment; the the -> the.
* gcov.cc (add_condition_counts): Fix duplicated words in comment; to to -> to.
* vr-values.cc (get_scev_info): Fix duplicated words in comment; the the -> to the.
* tree-vrp.cc (fully_replaceable): Fix duplicated words in comment; by by -> by.
* mode-switching.cc (single_succ_confluence_n): Fix duplicated words in comment; the the -> the.
* tree-ssa-phiopt.cc (value_replacement): Fix duplicated words in comment; can can -> we can.
* gimple-range-phi.cc (phi_analyzer::process_phi): Fix duplicated words in comment; it it -> it is.
* tree-ssa-sccvn.cc (visit_phi): Fix duplicated words in comment; to to -> to.
* rtl-ssa/accesses.h (use_info::next_debug_insn_use): Fix duplicated words in comment; if if -> if.
* doc/options.texi (InverseMask): Fix duplicated words; and and -> and. Change take to takes.
* doc/invoke.texi (fanalyzer-undo-inlining): Fix duplicated words; be be -> be.
(-minline-memops-threshold): Likewise.

gcc/analyzer/
* analyzer.opt (Wanalyzer-undefined-behavior-strtok): Fix duplicated words; in in -> in.
* program-state.cc (sm_state_map::replay_call_summary): Fix duplicated words in comment; to to -> to.
(program_state::replay_call_summary): Likewise.
* region-model.cc (region_model::replay_call_summary): Likewise.

gcc/c/
* c-decl.cc (previous_tag): Fix duplicated words in comment; the the -> the.
(diagnose_mismatched_decls): Fix duplicated words in comment; about about -> about.

gcc/cp/
* constexpr.cc (build_new_constexpr_heap_type): Fix duplicated words in comment; is is -> is.
* cp-tree.def (CO_RETURN_EXPR): Fix duplicated words in comment; for for -> for.
* parser.cc (fixup_blocks_walker): Fix duplicated words in comment; is is -> is.
* semantics.cc (fixup_template_type): Fix duplicated words in comment; for for -> for.
(finish_omp_for): Fix duplicated words in comment; the the -> the.
* pt.cc (more_specialized_fn): Fix duplicated words in comment; think think -> think.
(type_targs_deducible_from): Fix duplicated words in comment; the the -> the.

gcc/jit/
* docs/topics/expressions.rst (Constructor expressions): Fix duplicated words; have have -> have.

2024-04-03  expr: Fix up emit_push_insn [PR114552]  (Jakub Jelinek; 1 file, -3/+6)

r13-990 added optimizations in multiple spots to optimize storing of constant initializers into targets during expansion. In the load_register_parameters and expand_expr_real_1 cases, it checks that it has a tree as the source and so knows we are reading the whole decl's value, so the code is fine as is; but in the emit_push_insn case it checks for a MEM from which something is pushed and checks for SYMBOL_REF as the MEM's address, yet still assumes the whole object is copied, which, as the following testcase shows, might not always be the case. In the testcase, k is 6 bytes, then 2 bytes of padding, then another 4 bytes, while emit_push_insn wants to store just the 6 bytes.

The following patch simply verifies that it is the whole initializer that is being stored; I think that is the best thing to do so late in the GCC 14 cycle as well as for backporting.

For GCC 15, perhaps the code could stop requiring that the access be at offset zero or that the size be equal, and could instead use the get_symbol_constant_value/fold_ctor_reference gimple-fold APIs to extract just part of the initializer if we e.g. push just some subset (of course, still verifying that it is a subset). For sizes which are a power of two bytes and for which we have integer modes, we could use the corresponding integral types as the type for fold_ctor_reference; otherwise, dunno, punt or use some structure (e.g. try to find one in the initializer?), whatever. But even in the other spots it could perhaps handle loading of COMPONENT_REFs or MEM_REFs from the .rodata vars.

2024-04-03  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/114552
* expr.cc (emit_push_insn): Only use store_constructor for immediate_const_ctor_p if int_expr_size matches size.

* gcc.c-torture/execute/pr114552.c: New test.

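A hedged reconstruction of the layout described above (not the committed pr114552.c; all names are made up):

    /* k's initializer covers 6 data bytes, 2 padding bytes, then 4
       more bytes.  Pushing a 6-byte argument whose bits come from k
       must not be expanded as a copy of the whole 12-byte
       initializer.  */
    struct S { unsigned short a[3]; };            /* 6 bytes pushed */
    struct K { struct S s; /* 2 bytes pad */ int b; };
    static const struct K k = { { { 1, 2, 3 } }, 4 };
    void g (struct S);
    void f (void) { g (k.s); }
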
2024-03-15  expand: EXTEND_BITINT CALL_EXPR results [PR114332]  (Jakub Jelinek; 1 file, -1/+2)

The x86-64 and aarch64 psABIs (and the unwritten ia64 psABI part) say that the padding bits of _BitInt are undefined, while the expansion internally typically assumes that non-mode precision integers are sign/zero extended and extends after operations. We handle that mismatch with EXTEND_BITINT, done when reading from untrusted sources like function arguments, reading _BitInt from memory, etc., but otherwise keep relying on values being extended internally (say, in pseudos). The return value of a function is an ABI boundary too, though, and we need to extend that as well.

2024-03-15  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/114332
* expr.cc (expand_expr_real_1): EXTEND_BITINT also CALL_EXPR results.

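A small sketch of the boundary in question (illustrative C23, assuming a target with _BitInt support such as x86-64):

    /* Per the psABI the padding bits of the returned _BitInt(5) are
       undefined, so the caller must extend the call result before
       using it, just as it already does for arguments and loads.  */
    _BitInt(5) get (void);
    int use (void) { return get () == 3wb; }
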
2024-03-04  vect: Fix integer overflow calculating mask  (Andrew Stubbs; 1 file, -6/+6)

The masks and bitvectors were broken when nunits==32 on hosts where int is 32-bit.

gcc/ChangeLog:

* dojump.cc (do_compare_and_jump): Use full-width integers for shifts.
* expr.cc (store_constructor): Likewise.
(do_store_flag): Likewise.

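The bug pattern in host C code, sketched (assuming a 32-bit host int, with a 64-bit type standing in for HOST_WIDE_INT):

    /* Building a 32-element mask needs shifts of 31+ bits; a plain
       int shift overflows there, a full-width shift does not.  */
    unsigned long long good = 1ULL << 31;   /* well defined */
    /* int bad = 1 << 31;   <- signed overflow, undefined behaviour */
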
2024-02-23  expr: Fix REDUCE_BIT_FIELD in multiplication expansion [PR114054]  (Jakub Jelinek; 1 file, -6/+6)

The following testcase ICEs because the REDUCE_BIT_FIELD macro uses the target variable implicitly:

    #define REDUCE_BIT_FIELD(expr) (reduce_bit_field                     \
                                    ? reduce_to_bit_field_precision ((expr), \
                                                                     target, \
                                                                     type)    \
                                    : (expr))

and so when the code below reuses the target variable, documented to be

    The value may be stored in TARGET if TARGET is nonzero.  TARGET
    is just a suggestion; callers must assume that the rtx returned
    may not be the same as TARGET.

for something unrelated (the value that should be returned), this misbehaves: in the testcase, target is set to a CONST_INT, which has VOIDmode, and reduce_to_bit_field_precision assert checking doesn't like that. The documentation would also need to say that if TARGET is CONST0_RTX, the value will be ignored, but expand_expr_real_2 does at the start:

    ignore = (target == const0_rtx
              || ((CONVERT_EXPR_CODE_P (code) || code == COND_EXPR
                   || code == VIEW_CONVERT_EXPR)
                  && TREE_CODE (type) == VOID_TYPE));
    /* We should be called only if we need the result.  */
    gcc_assert (!ignore);

so such a target is mainly meant for calls and the like in other routines. The code certainly doesn't expect that target changes from not being ignored initially to ignored later on, nor CONST_INT results, nor anything else which is not an object into which anything can be stored.

So, the following patch fixes that by using a more appropriate temporary for the result, which other code already uses.

2024-02-23  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/114054
* expr.cc (expand_expr_real_2) <case MULT_EXPR>: Use the temp variable instead of the target parameter for the result.

* gcc.dg/bitint-92.c: New test.

2024-02-14  middle-end/113576 - zero padding of vector bools when expanding compares  (Richard Biener; 1 file, -0/+17)

The following zeros the padding of vector bools when expanding compares where the mode used for the compare is an integer mode. In that case targets cannot distinguish between a 4-element and an 8-element vector compare (both get to the QImode compare optab), so we have to do the job in the middle-end.

PR middle-end/113576
* expr.cc (do_store_flag): For vector bool compares of vectors with padding, zero that padding.
* dojump.cc (do_compare_and_jump): Likewise.

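A scalar illustration of the transform (an assumption-laden sketch, not the middle-end code: four 1-bit elements packed into a QImode byte):

    /* Before comparing two 4-element masks as whole bytes, the 4
       unused high bits must be zeroed, or an 8-element compare would
       see garbage there.  */
    int eq4 (unsigned char a, unsigned char b)
    {
      return (a & 0x0f) == (b & 0x0f);
    }
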
2024-02-09  lower-bitint: Fix handling of VIEW_CONVERT_EXPRs to minimally supported huge INTEGER_TYPEs [PR113783]  (Jakub Jelinek; 1 file, -1/+4)

On the following testcases memcpy lowering folds the calls to reading and writing of MEM_REFs with huge INTEGER_TYPEs - uint256_t with OImode or uint512_t with XImode. Further optimizations turn the load from a MEM_REF of the large/huge _BitInt var into a VIEW_CONVERT_EXPR from it to the uint256_t/uint512_t. The backend doesn't really support those except for "movoi"/"movxi" insns, so it isn't possible to handle it like casts to supportable INTEGER_TYPEs where we can construct those from individual limbs - there are no OImode/XImode shifts and the like we can use.

So, the following patch makes sure for such VCEs that the SSA_NAME operand of the VCE lives in memory and then turns it into a VIEW_CONVERT_EXPR so that we actually load the OImode/XImode integer from memory (i.e. a mov). We need to make sure those aren't merged with other operations in the gimple_lower_bitint hunks. For SSA_NAMEs which have underlying VAR_DECLs that is all we need, as those VAR_DECLs have ARRAY_TYPEs. SSA_NAMEs which have underlying PARM_DECLs or RESULT_DECLs have BITINT_TYPE, and I had to tweak expand_expr_real_1 for that so that it doesn't try convert_modes on those when one of the modes is BLKmode - we want to fall through into the adjust_address on the MEM.

2024-02-09  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/113783
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Look through VIEW_CONVERT_EXPR for final cast checks. Handle VIEW_CONVERT_EXPRs from large/huge _BitInt to > MAX_FIXED_MODE_SIZE INTEGER_TYPEs.
(gimple_lower_bitint): Don't merge mergeable operations or other casts with VIEW_CONVERT_EXPRs to > MAX_FIXED_MODE_SIZE INTEGER_TYPEs.
* expr.cc (expand_expr_real_1): Don't use convert_modes if either mode is BLKmode.

* gcc.dg/bitint-88.c: New test.

2024-01-29  middle-end/113622 - handle store with variable index to register  (Richard Biener; 1 file, -3/+20)

The following implements storing to a non-MEM_P with a variable offset. We usually avoid this by forcing expansion to memory, but that doesn't work for hard register variables. The solution is to spill and operate on the stack.

PR middle-end/113622
* expr.cc (expand_assignment): Spill hard registers if we index them with a variable offset.

* gcc.target/i386/pr113622-2.c: New testcase.
* gcc.target/i386/pr113622-3.c: Likewise.

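A hedged sketch of the problematic shape (not the committed testcase; assumes x86-64 with AVX so the vector lives in a hard register):

    /* Storing to acc[i] with a runtime i cannot address the hard
       register directly; the fix spills acc to the stack, performs
       the indexed store there, and reloads.  */
    typedef int v8si __attribute__ ((vector_size (32)));
    register v8si acc asm ("ymm1");
    void set (int i, int x) { acc[i] = x; }
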
2024-01-23  aarch64/expr: Use ccmp when the outer expression is used twice [PR100942]  (Andrew Pinski; 1 file, -45/+58)

Ccmp is not used if the result of the and/ior is used by both a GIMPLE_COND and a GIMPLE_ASSIGN. This improves the code generation here by using ccmp in that case. Two changes are required: first, we need to allow the outer statement's result to be used more than once; second, during the expansion of the gimple we need to try using ccmp. The latter is needed because we don't expand the ssa name of the lhs but rather expand directly from the gimple.

A small note on the ccmp_4.c testcase: we should be able to do slightly better than with this patch, but it is only one extra instruction compared to before.

PR target/100942

gcc/ChangeLog:

* ccmp.cc (ccmp_candidate_p): Add outer argument. Allow if the outer is true and the lhs is used more than once.
(expand_ccmp_expr): Update call to ccmp_candidate_p.
* expr.h (expand_expr_real_gassign): Declare.
* expr.cc (expand_expr_real_gassign): New function, split out from...
(expand_expr_real_1): ...here.
* cfgexpand.cc (expand_gimple_stmt_1): Use expand_expr_real_gassign.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ccmp_3.c: New test.
* gcc.target/aarch64/ccmp_4.c: New test.
* gcc.target/aarch64/ccmp_5.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>

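A sketch of the dual-use shape this enables (illustrative C, not one of the new ccmp_*.c tests):

    /* The and of two compares feeds both an assignment and the
       branch, so its result is "used twice"; aarch64 can now still
       expand it with a ccmp sequence.  */
    int g (int a, int b, int *p)
    {
      int t = (a == 0) & (b > 5);
      *p = t;
      return t ? 1 : 2;
    }
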
2024-01-21  Re: [PATCH] Avoid ICE with m68k-elf -malign-int and libcalls  (Mikael Pettersson; 1 file, -2/+3)

>> emit_library_call_value_1 calls emit_push_insn with NULL_TREE
>> for TYPE. Sometimes emit_push_insn needs to assign a temp with
>> that TYPE, which causes a segfault.
>>
>> Fixed by computing the TYPE from MODE when needed.
>>
>> Original patch by Thorsten Otto.

[ ... ]

> This really needs to happen in the two call paths which pass in
> NULL_TREE for the type. Note how the type is used to determine padding
> earlier in emit_push_insn. That would also make the code more
> consistent with the comment before emit_push_insn which implies that
> both MODE and TYPE are valid.
>
> Additionally you should bootstrap and regression test this patch on at
> least one target.

Updated as requested, and bootstrapped and tested on {x86_64,aarch64,m68k}-linux-gnu without regressions.

gcc/

PR target/82420
PR target/111279
* calls.cc (emit_library_call_value_1): Pass valid TYPE to emit_push_insn.
* expr.cc (emit_push_insn): Likewise.

gcc/testsuite/

PR target/82420
* gcc.target/m68k/pr82420.c: New test.

Co-authored-by: Thorsten Otto <admin@tho-otto.de>

2024-01-19  expansion: Fix ICEs with BLKmode VIEW_CONVERT_EXPR around non-BLKmode VAR_DECLs  (Jakub Jelinek; 1 file, -0/+4)

On aarch64 the backend decides to use non-BLKmode for some arrays like unsigned long[4] - OImode in that case - but the corresponding BITINT_TYPEs have BLKmode (like structures containing that many limb elements). This later causes ICEs during expansion when expanding a VIEW_CONVERT_EXPR from a non-BLKmode VAR_DECL to a BLKmode BITINT_TYPE.

The following fix contains two parts: the discover_nonconstant_array_refs_r change makes sure we force such variables into memory, and the expand_expr_real_1 change makes sure we don't try to extract a bitfield or something similar which doesn't really work for BLKmode - as op0 is a MEM, all we need is the op0 = adjust_address (op0, mode, 0); at the end to change the MEM's mode to BLKmode.

2024-01-19  Jakub Jelinek  <jakub@redhat.com>
            Richard Biener  <rguenther@suse.de>

* cfgexpand.cc (discover_nonconstant_array_refs_r): Force non-BLKmode VAR_DECLs referenced in BLKmode VIEW_CONVERT_EXPRs into memory.
* expr.cc (expand_expr_real_1) <case VIEW_CONVERT_EXPR>: Do nothing but adjust_address also for BLKmode mode and MEM op0.

2024-01-11  expr: Limit the store flag optimization for single bit to non-vectors [PR113322]  (Andrew Pinski; 1 file, -0/+2)

The problem here is that after the recent vectorizer improvements, we end up with a comparison against a vector bool 0 which then tries expand_single_bit_test, which is not expecting vector comparisons at all.

The IR was:

    vector(4) <signed-boolean:1> mask_patt_5.13;
    _Bool _12;
    mask_patt_5.13_44 = vect_perm_even_41 != { 0.0, 1.0e+0, 2.0e+0, 3.0e+0 };
    _12 = mask_patt_5.13_44 == { 0, 0, 0, 0 };

and we tried to call expand_single_bit_test for the last comparison. Rejecting the vector comparison is needed.

Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/113322

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't try single bit tests with comparison on vector types.

gcc/testsuite/ChangeLog:

* gcc.c-torture/compile/pr113322-1.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2024-01-11  middle-end/112740 - vector boolean CTOR expansion issue  (Richard Biener; 1 file, -3/+5)

The optimization to expand uniform boolean vectors by sign-extension works only for dense masks, but it failed to check that.

PR middle-end/112740
* expr.cc (store_constructor): Check that the integer vector mask has a single bit per element before using sign-extension to expand a uniform vector.

* gcc.dg/pr112740.c: New testcase.

2024-01-04  Improved RTL expansion of field assignments into promoted registers.  (Roger Sayle; 1 file, -5/+18)

This patch fixes PR rtl-optimization/104914 by tweaking/improving the way fields are written into a pseudo register that needs to be kept sign extended.

The motivating example from the bugzilla PR is:

    extern void ext(int);
    void foo(const unsigned char *buf)
    {
      int val;
      ((unsigned char*)&val)[0] = *buf++;
      ((unsigned char*)&val)[1] = *buf++;
      ((unsigned char*)&val)[2] = *buf++;
      ((unsigned char*)&val)[3] = *buf++;
      if (val > 0)
        ext(1);
      else
        ext(0);
    }

which at the end of the tree optimization passes looks like:

    void foo (const unsigned char * buf)
    {
      int val;
      unsigned char _1;
      unsigned char _2;
      unsigned char _3;
      unsigned char _4;
      int val.5_5;

      <bb 2> [local count: 1073741824]:
      _1 = *buf_7(D);
      MEM[(unsigned char *)&val] = _1;
      _2 = MEM[(const unsigned char *)buf_7(D) + 1B];
      MEM[(unsigned char *)&val + 1B] = _2;
      _3 = MEM[(const unsigned char *)buf_7(D) + 2B];
      MEM[(unsigned char *)&val + 2B] = _3;
      _4 = MEM[(const unsigned char *)buf_7(D) + 3B];
      MEM[(unsigned char *)&val + 3B] = _4;
      val.5_5 = val;
      if (val.5_5 > 0)
        goto <bb 3>; [59.00%]
      else
        goto <bb 4>; [41.00%]

      <bb 3> [local count: 633507681]:
      ext (1);
      goto <bb 5>; [100.00%]

      <bb 4> [local count: 440234144]:
      ext (0);

      <bb 5> [local count: 1073741824]:
      val ={v} {CLOBBER(eol)};
      return;
    }

Here four bytes are sequentially written into the SImode value val. On some platforms, such as MIPS64, this SImode value is kept in a 64-bit register, suitably sign-extended. The function expand_assignment contains logic to handle this via SUBREG_PROMOTED_VAR_P (around line 6264 in expr.cc) which outputs an explicit extension operation after each store_field (typically insv) to such promoted/extended pseudos.

The first observation is that there's no need to perform sign extension after each byte in the example above; the extension is only required after changes to the most significant byte (i.e. to a field that overlaps the most significant bit).

The bug fix is actually a bit more subtle: at this point during code expansion it's not safe to use a SUBREG when sign-extending this field. Currently, GCC generates (sign_extend:DI (subreg:SI (reg:DI) 0)), but combine (and other RTL optimizers) later realize that because SImode values are always sign-extended in their 64-bit hard registers this is a no-op and eliminate it. The trouble is that it's unsafe to refer to the SImode lowpart of a 64-bit register using SUBREG at those critical points when the value temporarily isn't correctly sign-extended, and the usual backend invariants don't hold. At these critical points, the middle-end needs to use an explicit TRUNCATE rtx (as this isn't a TRULY_NOOP_TRUNCATION), so that the explicit sign-extension looks like (sign_extend:DI (truncate:SI (reg:DI))), which avoids the problem.

2024-01-04  Roger Sayle  <roger@nextmovesoftware.com>
            Jeff Law  <jlaw@ventanamicro.com>

gcc/ChangeLog
PR rtl-optimization/104914
* expr.cc (expand_assignment): When target is SUBREG_PROMOTED_VAR_P, a sign or zero extension is only required if the modified field overlaps the SUBREG's most significant bit. On MODE_REP_EXTENDED targets, don't refer to the temporarily incorrectly extended value using a SUBREG, but instead generate an explicit TRUNCATE rtx.

2024-01-03  Update copyright years.  (Jakub Jelinek; 1 file, -1/+1)

2023-12-11  -finline-stringops: avoid too-wide smallest_int_mode_for_size [PR112784]  (Alexandre Oliva; 1 file, -11/+9)

smallest_int_mode_for_size may abort when the requested mode is not available. Call int_mode_for_size instead, which signals the unsatisfiable request in a more graceful way.

for gcc/ChangeLog

PR middle-end/112784
* expr.cc (emit_block_move_via_loop): Call int_mode_for_size for maybe-too-wide sizes.
(emit_block_cmp_via_loop): Likewise.

for gcc/testsuite/ChangeLog

PR middle-end/112784
* gcc.target/i386/avx512cd-inline-stringops-pr112784.c: New.

2023-12-11  expr: catch more `a*bool` while expanding [PR 112935]  (Andrew Pinski; 1 file, -2/+3)

After r14-1655-g52c92fb3f40050 (and the other commits which touch zero_one_valued_p), we end up with `bool * a` where the bool is an SSA name that might not have its nonzero-bits info set (to 0x1), even though its nonzero bits really would be 0x1. In the case of coremark, it is only phiopt4 which adds the new ssa name, and nothing afterwards updates the nonzero bits on it. This fixes the regression by using gimple_zero_one_valued_p rather than tree_nonzero_bits, to match the cases where the SSA_NAME didn't have the nonzero bits set. gimple_zero_one_valued_p handles one level of cast and also an `&`.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

PR middle-end/112935
* expr.cc (expand_expr_real_2): Use gimple_zero_one_valued_p instead of tree_nonzero_bits to find boolean defined expressions.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

2023-12-07  expr: Handle BITINT_TYPE in count_type_elements [PR112881]  (Jakub Jelinek; 1 file, -0/+1)

The following testcase ICEs during gimplification because count_type_elements doesn't handle BITINT_TYPE. It should handle it like other integral types.

2023-12-07  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112881
* expr.cc (count_type_elements): Handle BITINT_TYPE like INTEGER_TYPE.

* gcc.dg/bitint-50.c: New test.

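A plausible shape of the failure, sketched (not the committed bitint-50.c):

    /* A constructor containing a _BitInt member reaches
       count_type_elements during gimplification.  */
    struct S { _BitInt(135) b; int x; };
    void f (void) { struct S s = { 1wb, 2 }; (void) s; }
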
2023-11-29  Introduce -finline-stringops  (Alexandre Oliva; 1 file, -17/+379)

try_store_by_multiple_pieces was added not long ago, enabling variable-sized memset to be expanded inline when the worst-case in-range constant length would be, using conditional blocks with powers of two to cover all possibilities of length and alignment.

This patch introduces -finline-stringops[=fn] to request expansions to start with a loop, so as to still take advantage of known alignment even with long lengths, but without necessarily adding store blocks for every power of two.

This makes it possible for the supported stringops (memset, memcpy, memmove, memcmp) to be expanded, even if storing a single byte per iteration. Surely efficient implementations can run faster, with a pre-loop to increase alignment, but that would likely be excessive for inline expansions.

Still, in some cases, such as in freestanding environments, users prefer to inline such stringops, especially those that the compiler may introduce itself, even if the expansion is not as performant as a highly optimized C library implementation could be, to avoid depending on a C runtime library.

for gcc/ChangeLog

* expr.cc (emit_block_move_hints): Take ctz of len. Obey -finline-stringops. Use oriented or sized loop.
(emit_block_move): Take ctz of len, and pass it on.
(emit_block_move_via_sized_loop): New.
(emit_block_move_via_oriented_loop): New.
(emit_block_move_via_loop): Take incr. Move an incr-sized block per iteration.
(emit_block_cmp_via_cmpmem): Take ctz of len. Obey -finline-stringops.
(emit_block_cmp_via_loop): New.
* expr.h (emit_block_move): Add ctz of len defaulting to zero.
(emit_block_move_hints): Likewise.
(emit_block_cmp_hints): Likewise.
* builtins.cc (expand_builtin_memory_copy_args): Pass ctz of len to emit_block_move_hints.
(try_store_by_multiple_pieces): Support starting with a loop.
(expand_builtin_memcmp): Pass ctz of len to emit_block_cmp_hints.
(expand_builtin): Allow inline expansion of memset, memcpy, memmove and memcmp if requested.
* common.opt (finline-stringops): New.
(ilsop_fn): New enum.
* flag-types.h (enum ilsop_fn): New.
* doc/invoke.texi (-finline-stringops): Add.

for gcc/testsuite/ChangeLog

* gcc.dg/torture/inline-mem-cmp-1.c: New.
* gcc.dg/torture/inline-mem-cpy-1.c: New.
* gcc.dg/torture/inline-mem-cpy-cmp-1.c: New.
* gcc.dg/torture/inline-mem-move-1.c: New.
* gcc.dg/torture/inline-mem-set-1.c: New.

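A usage sketch under stated assumptions (flag spelling per the invoke.texi entry added above; the behavior is summarized from the description, not measured):

    /* Compiled with -finline-stringops in a freestanding build, this
       variable-length memset expands inline - at worst a byte-per-
       iteration loop - instead of calling into a C library.  */
    void clear (char *p, unsigned long n)
    {
      __builtin_memset (p, 0, n);
    }
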
2023-11-24  Clean up by_pieces_ninsns  (Haochen Gui; 1 file, -11/+8)

The by-pieces compare can be implemented by overlapped operations, so it should be taken into consideration when doing the adjustment for overlap operations. Also, the mode returned from widest_fixed_size_mode_for_size is already checked against mov_optab in by_pieces_mode_supported_p, called by widest_fixed_size_mode_for_size, so there is no need to check mov_optab again in by_pieces_ninsns. The patch fixes these issues.

gcc/
* expr.cc (by_pieces_ninsns): Include the by-pieces compare when doing the adjustment for overlap operations. Replace the mov_optab checks with a gcc assertion.

2023-11-23  expr: Fix &bitint_var handling in initializers [PR112336]  (Jakub Jelinek; 1 file, -0/+1)

As the following testcase shows, we ICE when trying to emit an ADDR_EXPR of a bitint variable which doesn't have mode width. The problem is in the EXTEND_BITINT stuff, which makes sure we treat the padding bits on memory reads from user bitint vars as undefined. When expanding ADDR_EXPR on such vars outside of initializers, expand_expr_addr* uses the EXPAND_CONST_ADDRESS modifier and EXTEND_BITINT does nothing, but in initializers it keeps using the EXPAND_INITIALIZER modifier. So, we need to treat EXPAND_INITIALIZER the same as EXPAND_CONST_ADDRESS in this regard.

2023-11-23  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112336
* expr.cc (EXTEND_BITINT): Don't call reduce_to_bit_field_precision if modifier is EXPAND_INITIALIZER.

* gcc.dg/bitint-41.c: New test.

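A sketch of the trigger (illustrative, not necessarily the committed bitint-41.c):

    /* Taking the address of a non-mode-width _BitInt variable in a
       static initializer goes through EXPAND_INITIALIZER.  */
    _BitInt(37) v;
    void *p = &v;
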
2023-11-14  Fix ICE generating uniform vector masks  (Andrew Stubbs; 1 file, -1/+1)

Most targets have an "and" instruction for their vector mask size, but RISC-V only has a DImode "and". Fixed by allowing wider instruction modes.

gcc/ChangeLog:

PR target/112481
* expr.cc (store_constructor): Use OPTAB_WIDEN for the mask adjustment.

2023-11-10  vect: Don't set excess bits in uniform masks  (Andrew Stubbs; 1 file, -2/+14)

AVX ignores any excess bits in the mask (at least for vector sizes >= 8), but AMD GCN magically uses a larger vector than was intended (the smaller sizes are "fake"), leading to wrong-code. This patch fixes amdgcn execution failures in gcc.dg/vect/pr81740-1.c, gfortran.dg/c-interop/contiguous-1.f90, gfortran.dg/c-interop/ff-descriptor-7.f90, and others.

gcc/ChangeLog:

* expr.cc (store_constructor): Add "and" operation to uniform mask generation.

2023-10-30  Expand: Checking available optabs for scalar modes in by pieces operations  (Haochen Gui; 1 file, -10/+13)

The former patch (f08ca5903c7) examines scalar modes via the target hook scalar_mode_supported_p. That causes some i386 regressions, as XImode and OImode are not enabled in the i386 target function. This patch instead examines each scalar mode by checking if the corresponding optabs are available for the mode.

gcc/

PR target/111449
* expr.cc (qi_vector_mode_supported_p): Rename to...
(by_pieces_mode_supported_p): ...this, and extend it to do the checking for both scalar and vector modes.
(widest_fixed_size_mode_for_size): Call by_pieces_mode_supported_p to examine the mode.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Likewise.

2023-10-23  Expand: Enable vector mode for by pieces compares  (Haochen Gui; 1 file, -34/+61)

Vector mode compare instructions are efficient for equality compares on rs6000. This patch refactors the code of the by-pieces operations to enable vector modes for compares.

gcc/

PR target/111449
* expr.cc (can_use_qi_vectors): New function to return true if we know how to implement OP using vectors of bytes.
(qi_vector_mode_supported_p): New function to check if optabs exist for the mode and certain by-pieces operations.
(widest_fixed_size_mode_for_size): Replace the second argument with the type of by-pieces operations. Call can_use_qi_vectors and qi_vector_mode_supported_p to do the check. Call scalar_mode_supported_p to check if the scalar mode is supported.
(by_pieces_ninsns): Pass the type of by-pieces operation to widest_fixed_size_mode_for_size.
(class op_by_pieces_d): Remove m_qi_vector_mode. Add m_op to record the type of by-pieces operations.
(op_by_pieces_d::op_by_pieces_d): Change the last argument to the type of by-pieces operations, initialize m_op with it. Pass m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::get_usable_mode): Pass m_op to function widest_fixed_size_mode_for_size.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Call can_use_qi_vectors and qi_vector_mode_supported_p to do the check.
(op_by_pieces_d::run): Pass m_op to function widest_fixed_size_mode_for_size.
(move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
(store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
(can_store_by_pieces): Pass the type of by-pieces operations to widest_fixed_size_mode_for_size.
(clear_by_pieces): Initialize class store_by_pieces_d with CLEAR_BY_PIECES.
(compare_by_pieces_d::compare_by_pieces_d): Set m_op to COMPARE_BY_PIECES.

2023-10-18  Fix expansion of `(a & 2) != 1`  (Andrew Pinski; 1 file, -4/+5)

I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670 where we would remove the `& CST` part if we ended up not calling expand_single_bit_test. This fixes the problem by introducing a new variable that will be used for calling expand_single_bit_test. As far as I know this can only show up when disabling optimization passes, as the above form would have been optimized away earlier.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/111863

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't overwrite arg0 when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111863-1.c: New test.

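A one-line illustration of the miscompiled shape (a sketch; compile at -O0 to keep the form intact):

    /* (t & 2) != 1 is not a single-bit test: dropping the `& 2`
       would wrongly turn it into t != 1.  This always returns 1.  */
    int f (int t) { return (t & 2) != 1; }
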
2023-10-16  expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]  (Vineet Gupta; 1 file, -7/+0)

RISC-V suffers from extraneous sign extensions, despite/given the ABI guarantee that 32-bit quantities are sign-extended into 64-bit registers, meaning incoming SI function args need not be explicitly sign extended (nor do SI return values, as most ALU insns implicitly sign-extend too).

Existing REE doesn't seem to handle this well, and there are various ideas floating around to smarten REE about it. RISC-V also seems to correctly implement the middle-end hook PROMOTE_MODE etc.

Another approach would be to prevent EXPAND from generating the sign_extend in the first place, which this patch tries to do. The hunk being removed was introduced way back in 1994 as 5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag").

This survived a full testsuite run for RISC-V rv64gc with surprisingly no fallouts: test results before/after are exactly the same.

    |                                | # of unexpected case / # of unique unexpected case
    |                                |      gcc |      g++ | gfortran |
    | rv64imafdc_zba_zbb_zbs_zicond/ | 264 / 87 |    5 / 2 |  72 / 12 |
    |   lp64d/medlow                 |          |          |          |

Granted, for something so old to have survived, there must be a valid reason. Unfortunately the original change didn't have additional commentary or a test case. That is not to say it can't/won't possibly break things on other arches/ABIs, hence the RFC for someone to scream that this is just bonkers, don't do this 🙂

I've explicitly CC'ed Jakub and Roger who have last touched subreg promoted notes in expr.cc for insight and/or screaming 😉

Thanks to Robin for narrowing this down in an amazing debugging session @ GNU Cauldron.

```
foo2:
	sext.w	a6,a1		<-- this goes away
	beq	a1,zero,.L4
	li	a5,0
	li	a0,0
.L3:
	addw	a4,a2,a5
	addw	a5,a3,a5
	addw	a0,a4,a0
	bltu	a5,a6,.L3
	ret
.L4:
	li	a0,0
	ret
```

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Robin Dapp <rdapp.gcc@gmail.com>

PR target/111466

gcc/
* expr.cc (expand_expr_real_2): Do not clear SUBREG_PROMOTED_VAR_P.

gcc/testsuite
* gcc.target/riscv/pr111466.c: New test.

2023-09-29  Remove poly_int_pod  (Richard Sandiford; 1 file, -4/+4)

poly_int was written before the switch to C++11 and so couldn't use explicit default constructors. This led to an awkward split between poly_int_pod and poly_int. poly_int simply inherited from poly_int_pod and added constructors, with the argumentless constructor having an empty body. But inheritance meant that poly_int had to repeat the assignment operators from poly_int_pod (again, no C++11, so no "using" to inherit base-class implementations).

All that goes away if we switch to using default constructors.

The main complication is ensuring that braced initialisation still gives a constexpr, so that static variables can be initialised without runtime code. The two problems here are:

(1) When initialising a poly_int<N, wide_int> with fewer than N coefficients, the other coefficients need to be a zero of the same precision as the explicit coefficients. This was previously done in a for loop using wi::ints_for<...>::zero, but C++11 constexpr constructors can't have function bodies. The patch instead uses a series of delegated initialisers to fill in the implicit coefficients.

(2) The initialisation in:

    void f(int x) {
      unsigned int foo {x};
    }

produces the warning:

    warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing]

whereas:

    void f(int x) {
      unsigned int foo = x;
    }

does not. So switching to direct initialisation of the coeffs array would mean that:

    poly_uint64 x = 0;

would trigger a warning for using 0 rather than 0u. That seemed overly pedantic, so the patch adds explicit casts to the constructor. The complication is to do that without adding extra code to wide-int versions. The patch uses a new init_cast type for that.

gcc/
* poly-int.h (poly_int_pod): Delete.
(poly_coeff_traits::init_cast): New type.
(poly_int_full, poly_int_hungry, poly_int_fullness): New structures.
(poly_int): Replace constructors that take 1 and 2 coefficients with a general one that takes an arbitrary number of coefficients. Delegate initialization to two new private constructors, one of which uses the coefficients as-is and one of which adds an extra zero of the appropriate type (and precision, where applicable).
(gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods.
* poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod)
(poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete.
* gengtype.cc (main): Don't register poly_int64_pod.
* calls.cc (initialize_argument_information): Use poly_int rather than poly_int_pod.
(combine_pending_stack_adjustment_and_call): Likewise.
* config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise.
* data-streamer.h (bp_unpack_poly_value): Likewise.
* dwarf2cfi.cc (struct dw_trace_info): Likewise.
(struct queued_reg_save): Likewise.
* dwarf2out.h (struct dw_cfa_location): Likewise.
* emit-rtl.h (struct incoming_args): Likewise.
(struct rtl_data): Likewise.
* expr.cc (get_bit_range): Likewise.
(get_inner_reference): Likewise.
* expr.h (get_bit_range): Likewise.
* fold-const.cc (split_address_to_core_and_offset): Likewise.
(ptr_difference_const): Likewise.
* fold-const.h (ptr_difference_const): Likewise.
* function.cc (try_fit_stack_local): Likewise.
(instantiate_new_reg): Likewise.
* function.h (struct expr_status): Likewise.
(struct args_size): Likewise.
* genmodes.cc (ZERO_COEFFS): Likewise.
(mode_size_inline): Likewise.
(mode_nunits_inline): Likewise.
(emit_mode_precision): Likewise.
(emit_mode_size): Likewise.
(emit_mode_nunits): Likewise.
* gimple-fold.cc (get_base_constructor): Likewise.
* gimple-ssa-store-merging.cc (struct symbolic_number): Likewise.
* inchash.h (class hash): Likewise.
* ipa-modref-tree.cc (modref_access_node::dump): Likewise.
* ipa-modref.cc (modref_access_analysis::merge_call_side_effects): Likewise.
* ira-int.h (ira_spilled_reg_stack_slot): Likewise.
* lra-eliminations.cc (self_elim_offsets): Likewise.
* machmode.h (mode_size, mode_precision, mode_nunits): Likewise.
* omp-low.cc (omplow_simd_context): Likewise.
* pretty-print.cc (pp_wide_integer): Likewise.
* pretty-print.h (pp_wide_integer): Likewise.
* reload.cc (struct decomposition): Likewise.
* reload.h (struct reload): Likewise.
* reload1.cc (spill_stack_slot_width): Likewise.
(struct elim_table): Likewise.
(offsets_at): Likewise.
(init_eliminable_invariants): Likewise.
* rtl.h (union rtunion): Likewise.
(poly_int_rtx_p): Likewise.
(strip_offset): Likewise.
(strip_offset_and_add): Likewise.
* rtlanal.cc (strip_offset): Likewise.
* tree-dfa.cc (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-dfa.h (get_ref_base_and_extent): Likewise.
(get_addr_base_and_unit_offset_1): Likewise.
(get_addr_base_and_unit_offset): Likewise.
* tree-ssa-loop-ivopts.cc (struct iv_use): Likewise.
(strip_offset): Likewise.
* tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise.
* tree.cc (ptrdiff_tree_p): Likewise.
* tree.h (poly_int_tree_p): Likewise.
(ptrdiff_tree_p): Likewise.
(get_inner_reference): Likewise.

gcc/testsuite/
* gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use poly_int rather than poly_int_pod.

2023-09-29  Simplify & expand c_readstr  (Richard Sandiford; 1 file, -5/+2)

c_readstr only operated on integer modes. It worked by reading the source string into an array of HOST_WIDE_INTs, converting that array into a wide_int, and from there to an rtx. It's simpler to do this by building a target memory image and using native_decode_rtx to convert that memory image into an rtx. It avoids all the endianness shenanigans because both the string and native_decode_rtx follow target memory order. It also means that the function can handle all fixed-size modes, which simplifies callers and allows vector modes to be used more widely.

gcc/
* builtins.h (c_readstr): Take a fixed_size_mode rather than a scalar_int_mode.
* builtins.cc (c_readstr): Likewise. Build a local array of bytes and use native_decode_rtx to get the rtx image.
(builtin_memcpy_read_str): Simplify accordingly.
(builtin_strncpy_read_str): Likewise.
(builtin_memset_read_str): Likewise.
(builtin_memset_gen_str): Likewise.
* expr.cc (string_cst_read_str): Likewise.

2023-09-20  middle-end: use MAX_FIXED_MODE_SIZE instead of precision of TImode/DImode  (Jakub Jelinek; 1 file, -10/+4)

On Tue, Sep 19, 2023 at 05:50:59PM +0100, Richard Sandiford wrote:
> How about using MAX_FIXED_MODE_SIZE for things like this?

Seems like a good idea. The following patch does that.

2023-09-20  Jakub Jelinek  <jakub@redhat.com>

* match.pd ((x << c) >> c): Use MAX_FIXED_MODE_SIZE instead of GET_MODE_PRECISION of TImode or DImode depending on whether TImode is a supported scalar mode.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.
* expr.cc (expand_expr_real_1): Likewise.
* tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): Likewise.
* ubsan.cc (ubsan_encode_value, ubsan_type_descriptor): Likewise.

2023-09-07  middle-end: Avoid calling targetm.c.bitint_type_info inside of gcc_assert [PR102989]  (Jakub Jelinek; 1 file, -1/+2)

On Thu, Sep 07, 2023 at 10:36:02AM +0200, Thomas Schwinge wrote:
> Minor comment/question: are we doing away with the property that
> 'assert'-like "calls" must not have side effects? Per 'gcc/system.h',
> this is "OK" for 'gcc_assert' for '#if ENABLE_ASSERT_CHECKING' or
> '#elif (GCC_VERSION >= 4005)' -- that is, GCC 4.5, which is always-true,
> thus the "offending" '#else' is never active. However, it's different
> for standard 'assert' and 'gcc_checking_assert', so I'm not sure if
> that's a good property for 'gcc_assert' only? For example, see also
> <https://gcc.gnu.org/PR6906> "warn about asserts with side effects", or
> recent <https://gcc.gnu.org/PR111144>
> "RFE: could -fanalyzer warn about assertions that have side effects?".

You're right, the

    #define gcc_assert(EXPR) ((void)(0 && (EXPR)))

fallback definition is incompatible with the way I've used it, so for --disable-checking builds by non-GCC compilers it would not work properly.

2023-09-07  Jakub Jelinek  <jakub@redhat.com>

PR c/102989
* expr.cc (expand_expr_real_1): Don't call targetm.c.bitint_type_info inside gcc_assert, as later code relies on it filling the info variable.
* gimple-fold.cc (clear_padding_bitint_needs_padding_p, clear_padding_type): Likewise.
* varasm.cc (output_constant): Likewise.
* fold-const.cc (native_encode_int, native_interpret_int): Likewise.
* stor-layout.cc (finish_bitfield_representative, layout_type): Likewise.
* gimple-lower-bitint.cc (bitint_precision_kind): Likewise.

2023-09-06  Middle-end _BitInt support [PR102989]  (Jakub Jelinek; 1 file, -5/+56)

The following patch introduces the middle-end part of the _BitInt support: a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support.

2023-09-06  Jakub Jelinek  <jakub@redhat.com>

PR c/102989
* tree.def (BITINT_TYPE): New type.
* tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define.
(NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE.
(BITINT_TYPE_P): Define.
(CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type.
(tree_check6, tree_not_check6): New inline functions.
(any_integral_type_check): Include BITINT_TYPE.
(build_bitint_type): Declare.
* tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE.
(bitint_type_cache): New variable.
(build_bitint_type): New function.
(signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE.
(tree_cc_finalize): Free bitint_type_cache.
* builtins.cc (type_to_class): Handle BITINT_TYPE.
(fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE.
* cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs.
* convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE.
(convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)).
* doc/generic.texi (BITINT_TYPE): Document.
* doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New.
* doc/tm.texi: Regenerated.
* dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE.
(rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi.
* expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory.
* fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE.
(extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE).
(native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE.
* gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless.
* gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE.
(clear_padding_unit): Mention in comment that _BitInt types don't need to fit either.
(clear_padding_bitint_needs_padding_p): New function.
(clear_padding_type_may_have_padding_p): Handle BITINT_TYPE.
(clear_padding_type): Likewise.
* internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1.
(expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions.
* internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions.
* internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare.
* match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode.
* pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer.
* stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision.
(layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE.
* target.def (bitint_type_info): New C target hook.
* target.h (struct bitint_info): New type.
* targhooks.cc (default_bitint_type_info): New function.
* targhooks.h (default_bitint_type_info): Declare.
* tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer.
* tree-ssa-sccvn.cc: Include target.h.
(eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE.
* tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels.
* typeclass.h (enum type_class): Add bitint_type_class enumerator.
* varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs.
* vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int.
(simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.

2023-08-10  expr: Small optimization [PR102989]  (Jakub Jelinek; 1 file, -6/+4)

Small optimization to avoid testing modifier multiple times.

2023-08-10  Jakub Jelinek  <jakub@redhat.com>

PR c/102989
* expr.cc (expand_expr_real_1) <case MEM_REF>: Add an early return for EXPAND_WRITE or EXPAND_MEMORY modifiers to avoid testing it multiple times.

2023-07-28  PR rtl-optimization/110587: Reduce useless moves in compile-time hog.  (Roger Sayle; 1 file, -9/+4)

This patch is one of a series of fixes for PR rtl-optimization/110587, a compile-time regression with -O0, that attempts to address the underlying cause. As noted previously, the pathological test case pr28071.c contains a large number of useless register-to-register moves that can produce quadratic behaviour (in LRA). These moves are generated during RTL expansion in emit_group_load_1, where the middle-end attempts to simplify the source before calling extract_bit_field. This is reasonable if the source is a complex expression (from before the tree-ssa optimizers), or a SUBREG, or a hard register, but it's not particularly useful to copy a pseudo register into a new pseudo register. This patch eliminates that redundancy.

The -fdump-tree-expand output for pr28071.c compiled with -O0 currently contains 777K lines; with this patch it contains 717K lines, i.e. a saving of about 60K lines (admittedly of debugging text output, but it makes the point).

2023-07-28  Roger Sayle  <roger@nextmovesoftware.com>
            Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
PR middle-end/28071
PR rtl-optimization/110587
* expr.cc (emit_group_load_1): Simplify logic for calling force_reg on ORIG_SRC, to avoid making a copy if the source is already in a pseudo register.

2023-06-29  cselib+expr+bitmap: Change return type of predicate functions from int to bool  (Uros Bizjak; 1 file, -52/+52)

gcc/ChangeLog:

* cselib.h (rtx_equal_for_cselib_1): Change return type from int to bool.
(references_value_p): Ditto.
(rtx_equal_for_cselib_p): Ditto.
* expr.h (can_store_by_pieces): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(safe_from_p): Ditto.
* sbitmap.h (bitmap_equal_p): Ditto.
* cselib.cc (references_value_p): Change return type from int to bool and adjust function body accordingly.
(rtx_equal_for_cselib_1): Ditto.
* expr.cc (is_aligning_offset): Ditto.
(can_store_by_pieces): Ditto.
(mostly_zeros_p): Ditto.
(all_zeros_p): Ditto.
(safe_from_p): Ditto.
(is_aligning_offset): Ditto.
(try_casesi): Ditto.
(try_tablejump): Ditto.
(store_constructor): Change "need_to_clear" and "const_bounds_p" variables to bool.
* sbitmap.cc (bitmap_equal_p): Change return type from int to bool.

2023-06-29  middle-end/110452 - bad code generation with AVX512 mask splat  (Richard Biener; 1 file, -0/+13)

The following adds an alternate way of expanding a uniform mask vector constructor like

    _55 = _2 ? -1 : 0;
    vect_cst__56 = {_55, _55, _55, _55, _55, _55, _55, _55};

when the mask mode is a scalar int mode like for AVX512 or GCN. Instead of piecewise building the result via shifts and ors we can take advantage of uniformity and signedness of the component and simply sign-extend to the result.

Instead of

    cmpl	$3, %edi
    sete	%cl
    movl	%ecx, %esi
    leal	(%rsi,%rsi), %eax
    leal	0(,%rsi,4), %r9d
    leal	0(,%rsi,8), %r8d
    orl	%esi, %eax
    orl	%r9d, %eax
    movl	%ecx, %r9d
    orl	%r8d, %eax
    movl	%ecx, %r8d
    sall	$4, %r9d
    sall	$5, %r8d
    sall	$6, %esi
    orl	%r9d, %eax
    orl	%r8d, %eax
    movl	%ecx, %r8d
    orl	%esi, %eax
    sall	$7, %r8d
    orl	%r8d, %eax
    kmovb	%eax, %k1

we then get

    cmpl	$3, %edi
    sete	%cl
    negl	%ecx
    kmovb	%ecx, %k1

Code generation for non-uniform masks remains bad, but at least I see no easy way out for the most general case here.

PR middle-end/110452
* expr.cc (store_constructor): Handle uniform boolean vectors with integer mode specially.

2023-06-13  Avoid duplicate vector initializations during RTL expansion.  (Roger Sayle; 1 file, -2/+5)

This middle-end patch avoids some redundant RTL for vector initialization during RTL expansion. For the simple test case:

    typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
    __int128 key;

    v1ti foo() {
        return (v1ti){key};
    }

the middle-end currently expands:

    (set (reg:V1TI 85) (const_vector:V1TI [ (const_int 0) ]))
    (set (reg:V1TI 85) (mem/c:V1TI (symbol_ref:DI ("key"))))

where we create a dead instruction that initializes the vector to zero, immediately followed by a set of the entire vector. This patch skips this zeroing instruction when the vector has only a single element. It also updates the code to indicate when we've cleared the vector, so that we don't need to initialize zero elements.

2023-06-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* expr.cc (store_constructor) <case VECTOR_TYPE>: Don't bother clearing vectors with only a single element. Set CLEARED if the vector was initialized to zero.

2023-06-06  Handle const_int in expand_single_bit_test  (Andrew Pinski; 1 file, -3/+7)

After expanding directly to rtl instead of creating a tree, we could end up with a const_int, which is not ready to be handled by extract_bit_field. So we need to do the constant folding here instead.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR middle-end/110117

gcc/ChangeLog:

* expr.cc (expand_single_bit_test): Handle const_int from expand_expr.

gcc/testsuite/ChangeLog:

* gcc.dg/pr110117-1.c: New test.
* gcc.dg/pr110117-2.c: New test.

2023-06-06  Improve do_store_flag for single bit when there is no non-zero bits  (Andrew Pinski; 1 file, -17/+11)

In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP, or turn off the bit tracking part of CCP, so we would lose out on what TER was able to do beforehand. This moves the TER code around so that it is used instead of just the nonzero bits. It also makes it easier to remove the TER part of the code later on.

OK? Bootstrapped and tested on x86_64-linux-gnu.

Note this reintroduces PR 110117 (which was accidentally fixed after r14-1534-g908e5ab5c11c). The next patch in the series will fix that.

gcc/ChangeLog:

* expr.cc (do_store_flag): Rearrange the TER code so that it overrides the nonzero bits info if we had `a & POW2`.

2023-06-05  Remove widen_plus/minus_expr tree codes  (Andre Vieira; 1 file, -6/+0)

This patch removes the old widen plus/minus tree codes, which have been replaced by internal functions.

2023-06-05  Andre Vieira  <andre.simoesdiasvieira@arm.com>
            Joel Hutton  <joel.hutton@arm.com>

gcc/ChangeLog:

* doc/generic.texi: Remove old tree codes.
* expr.cc (expand_expr_real_2): Remove old tree code cases.
* gimple-pretty-print.cc (dump_binary_rhs): Likewise.
* optabs-tree.cc (optab_for_tree_code): Likewise.
(supportable_half_widening_operation): Likewise.
* tree-cfg.cc (verify_gimple_assign_binary): Likewise.
* tree-inline.cc (estimate_operator_cost): Likewise.
(op_symbol_code): Likewise.
* tree-vect-data-refs.cc (vect_get_smallest_scalar_type): Likewise.
(vect_analyze_data_ref_accesses): Likewise.
* tree-vect-generic.cc (expand_vector_operations_1): Likewise.
* cfgexpand.cc (expand_debug_expr): Likewise.
* tree-vect-stmts.cc (vectorizable_conversion): Likewise.
(supportable_widening_operation): Likewise.
* gimple-range-op.cc (gimple_range_op_handler::maybe_non_standard): Likewise.
* optabs.def (vec_widen_ssubl_hi_optab, vec_widen_ssubl_lo_optab, vec_widen_saddl_hi_optab, vec_widen_saddl_lo_optab, vec_widen_usubl_hi_optab, vec_widen_usubl_lo_optab, vec_widen_uaddl_hi_optab, vec_widen_uaddl_lo_optab): Remove optabs.
* tree-pretty-print.cc (dump_generic_node): Remove tree code definition.
* tree.def (WIDEN_PLUS_EXPR, WIDEN_MINUS_EXPR, VEC_WIDEN_PLUS_HI_EXPR, VEC_WIDEN_PLUS_LO_EXPR, VEC_WIDEN_MINUS_HI_EXPR, VEC_WIDEN_MINUS_LO_EXPR): Likewise.

2023-06-04  Improve do_store_flag for comparing single bit against that bit  (Andrew Pinski; 1 file, -3/+8)

This is a case which I noticed while working on the previous patch. Sometimes we end up with `a == CST` instead of comparing against 0. This happens in the following code:

```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  t ^= (1<<30);
  return t != 0;
}
```

We should handle the case where the nonzero bits are the same as the comparison operand.

Changes from v1:
* v2: Updated for the bit extraction changes.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* expr.cc (do_store_flag): Improve for single bit testing not against zero but against that single bit.

2023-06-04  Improve do_store_flag for single bit comparison against 0  (Andrew Pinski; 1 file, -5/+20)

While working on something else, I noticed we could improve the code generation for the following function:

```
unsigned f(unsigned t)
{
  if (t & ~(1<<30)) __builtin_unreachable();
  return t != 0;
}
```

Right now we just emit a comparison against 0 instead of just a shift right by 30. There is code in do_store_flag which already optimizes `(t & 1<<30) != 0` to `(t >> 30) & 1` (using bit extraction if available). This patch extends it to handle the case where we know t has exactly one nonzero bit set.

Changes from v1:
* v2: Updated for the bit extraction improvements.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* expr.cc (do_store_flag): Extend the one bit checking case to handle the case where we don't have an and but where one bit is still known to be non-zero.

2023-05-20  Fix expand_single_bit_test for big-endian  (Andrew Pinski; 1 file, -1/+8)

I had thought extract_bit_field's bitpos argument was the shifted position rather than the bit position like BIT_FIELD_REF's, so I had removed the code which would compute the correct bit position for BYTES_BIG_ENDIAN. Committed as obvious; I checked big-endian MIPS to make sure we are now producing the correct code.

gcc/ChangeLog:

* expr.cc (expand_single_bit_test): Correct bitpos for big-endian.