path: root/gcc
Age | Commit message | Author | Files | Lines
2023-07-03 | tree-optimization/110506 - ICE in pattern recog with TYPE_PRECISION | Richard Biener | 2 | -2/+20
The following re-orders checks to make sure we check TYPE_PRECISION on an integral type. PR tree-optimization/110506 * tree-vect-patterns.cc (vect_recog_rotate_pattern): Re-order TYPE_PRECISION access with INTEGRAL_TYPE_P check. * gcc.dg/pr110506-2.c: New testcase.
2023-07-03 | tree-optimization/110506 - bogus non-zero mask in CCP for vector types | Richard Biener | 2 | -0/+25
get_value_for_expr was blindly using TYPE_PRECISION to produce a mask for vector typed entities, which the new tree checking now catches.

PR tree-optimization/110506
* tree-ssa-ccp.cc (get_value_for_expr): Check for integral type before relying on TYPE_PRECISION to produce a nonzero mask.
* gcc.dg/pr110506.c: New testcase.
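The ordering issue described in the two PR 110506 fixes can be illustrated with a toy model. This is only a sketch: `toy_type`, `toy_integral_p` and `toy_nonzero_mask` are hypothetical names invented for illustration and are not part of GCC's tree API. The point is that the kind of the type is checked before its precision is read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a type node; NOT the real GCC tree API. */
enum toy_kind { TOY_INTEGER, TOY_VECTOR };

struct toy_type {
    enum toy_kind kind;
    unsigned precision;   /* only meaningful when kind == TOY_INTEGER */
};

static bool toy_integral_p(const struct toy_type *t)
{
    return t->kind == TOY_INTEGER;
}

/* Return a mask with the low `precision` bits set, or all ones
   (meaning "nothing is known") when the type is not integral. */
static uint64_t toy_nonzero_mask(const struct toy_type *t)
{
    /* Check the kind first; the precision is read only afterwards. */
    if (!toy_integral_p(t) || t->precision >= 64)
        return UINT64_MAX;
    return (UINT64_C(1) << t->precision) - 1;
}
```

Because `||` short-circuits, the precision field is never consulted for a non-integral type, which mirrors the "check for integral type before relying on TYPE_PRECISION" wording of the ChangeLog entry.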
2023-07-03 | testsuite: Add vect_float_strict to testcase [PR 110381] | Christophe Lyon | 1 | -0/+1
As discussed in the PR, the testcase needs /* { dg-require-effective-target vect_float_strict } */ 2023-02-03 Andrew Pinski <apinski@marvell.com> PR tree-optimization/110381 gcc/testsuite/ * gcc.dg/vect/pr110381.c: Add vect_float_strict.
2023-07-03 | MIPS: Make mips16e2 generate ZEB/ZEH instead of ANDI under certain conditions | Jie Mei | 1 | -13/+17
This patch makes mips16e2 generate ZEB/ZEH instead of ANDI under the -O0 option, matching the behaviour at -O1 to -O3, which shrinks the code size.

gcc/ChangeLog:
* config/mips/mips.md (*and<mode>3_mips16): Generates ZEB/ZEH instructions.
2023-07-03 | MIPS: Add CACHE instruction for mips16e2 | Jie Mei | 4 | -5/+60
This patch adds CACHE instruction from mips16e2 with corresponding tests. gcc/ChangeLog: * config/mips/mips.cc(mips_9bit_offset_address_p): Restrict the address register to M16_REGS for MIPS16. (BUILTIN_AVAIL_MIPS16E2): Defined a new macro. (AVAIL_MIPS16E2_OR_NON_MIPS16): Same as above. (AVAIL_NON_MIPS16 (cache..)): Update to AVAIL_MIPS16E2_OR_NON_MIPS16. * config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2. * config/mips/mips.md (mips_cache): Mark as extended MIPS16. gcc/testsuite/ChangeLog: * gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.
2023-07-03 | MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2 | Jie Mei | 1 | -3/+6
The MIPS16e2 ASE has PREF, LL and SC instructions; they use a 9-bit immediate, like mips32r6. Pre-R6 MIPS32 uses a 16-bit immediate.

gcc/ChangeLog:
* config/mips/mips.h (ISA_HAS_9BIT_DISPLACEMENT): Add clause for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.
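To make the 9-bit versus 16-bit distinction concrete, here is a minimal range check. It assumes the immediates are signed two's-complement offsets; the commit message only says "9 bits immediate", so the signedness is an assumption, and `fits_signed_field` is a hypothetical helper, not a GCC function.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `offset` fits in a signed two's-complement field that is
   `bits` wide.  Valid for 1 <= bits <= 63.  Hypothetical helper. */
static bool fits_signed_field(int64_t offset, unsigned bits)
{
    int64_t min = -(INT64_C(1) << (bits - 1));
    int64_t max = (INT64_C(1) << (bits - 1)) - 1;
    return offset >= min && offset <= max;
}
```

Under that assumption a 9-bit field covers -256 to 255, while a 16-bit field covers -32768 to 32767, so an offset such as 1000 would be addressable directly only in the 16-bit form.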
2023-07-03 | MIPS: Add load/store word left/right instructions for mips16e2 | Jie Mei | 4 | -8/+169
This patch adds LWL/LWR, SWL/SWR instructions with their corresponding tests. gcc/ChangeLog: * config/mips/mips.cc(mips_expand_ins_as_unaligned_store): Add logics for generating instruction. * config/mips/mips.h(ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2. * config/mips/mips.md(mov_<load>l): Generates instructions. (mov_<load>r): Same as above. (mov_<store>l): Adjusted for the conditions above. (mov_<store>r): Same as above. (mov_<store>l_mips16e2): Add machine description for `define_insn mov_<store>l_mips16e2`. (mov_<store>r_mips16e2): Add machine description for `define_insn mov_<store>r_mips16e2`. gcc/testsuite/ChangeLog: * gcc.target/mips/mips16e2.c: New tests for mips16e2.
2023-07-03 | MIPS: Add LUI instruction for mips16e2 | Jie Mei | 3 | -12/+56
This patch adds LUI instruction from mips16e2 with corresponding test. gcc/ChangeLog: * config/mips/mips.cc(mips_symbol_insns_1): Generates LUI instruction. (mips_const_insns): Same as above. (mips_output_move): Same as above. (mips_output_function_prologue): Same as above. * config/mips/mips.md: Same as above gcc/testsuite/ChangeLog: * gcc.target/mips/mips16e2.c: Add new tests for mips16e2.
2023-07-03 | MIPS: Add bitwise instructions for mips16e2 | Jie Mei | 7 | -21/+263
There are shortened bitwise instructions in the mips16e2 ASE, for instance ANDI, ORI/XORI, EXT and INS. This patch adds these instructions with corresponding tests.

gcc/ChangeLog:
* config/mips/constraints.md (Yz): New constraints for mips16e2.
* config/mips/mips-protos.h (mips_bit_clear_p): Declared new function.
(mips_bit_clear_info): Same as above.
* config/mips/mips.cc (mips_bit_clear_info): New function for generating instructions.
(mips_bit_clear_p): Same as above.
* config/mips/mips.h (ISA_HAS_EXT_INS): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (extended_mips16): Generates EXT and INS instructions.
(*and<mode>3): Generates INS instruction.
(*and<mode>3_mips16): Generates EXT, INS and ANDI instructions.
(ior<mode>3): Add logic for ORI instruction.
(*ior<mode>3_mips16_asmacro): Generates ORI instruction.
(*ior<mode>3_mips16): Add logic for XORI instruction.
(*xor<mode>3_mips16): Generates XORI instruction.
(*extzv<mode>): Add logic for EXT instruction.
(*insv<mode>): Add logic for INS instruction.
* config/mips/predicates.md (bit_clear_operand): New predicate for generating bitwise instructions.
(and_reg_operand): Add logic for generating bitwise instructions.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2.c: New tests for mips16e2.
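For readers unfamiliar with EXT and INS, the following sketch shows the bit-field extract and insert operations in portable C. The function names are hypothetical, and whether GCC actually selects EXT or INS for code like this depends on the target, options and surrounding code; nothing here is taken from the patch itself.

```c
#include <stdint.h>

/* Extract `size` bits of `value` starting at bit `pos`.
   Requires 1 <= size and pos + size <= 32. */
static uint32_t bitfield_extract(uint32_t value, unsigned pos, unsigned size)
{
    uint32_t mask = (size == 32) ? UINT32_MAX : ((UINT32_C(1) << size) - 1);
    return (value >> pos) & mask;
}

/* Replace `size` bits of `dest` starting at bit `pos` with the low
   `size` bits of `field`.  Same preconditions as above. */
static uint32_t bitfield_insert(uint32_t dest, uint32_t field,
                                unsigned pos, unsigned size)
{
    uint32_t mask = (size == 32) ? UINT32_MAX : ((UINT32_C(1) << size) - 1);
    return (dest & ~(mask << pos)) | ((field & mask) << pos);
}
```

The insert case is an AND with a mask that clears a contiguous run of bits followed by an OR, which is roughly the kind of constant shape a "bit clear" predicate would need to detect.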
2023-07-03 | MIPS: Add instruction about global pointer register for mips16e2 | Jie Mei | 4 | -7/+121
The mips16e2 ASE uses eight general-purpose registers from mips32 together with some special-purpose registers. The GPRs are s0-1, v0-1 and a0-3; the special registers are t8, gp, sp and ra. The special register gp, the global pointer register, is used by some of the instructions in the ASE, for instance ADDIU and LB/LBU. This patch adds these instructions with corresponding tests.

gcc/ChangeLog:
* config/mips/mips.cc (mips_regno_mode_ok_for_base_p): Generate instructions that use the global pointer register.
(mips16_unextended_reference_p): Same as above.
(mips_pic_base_register): Same as above.
(mips_init_relocs): Same as above.
* config/mips/mips.h (MIPS16_GP_LOADS): Defined a new macro.
(GLOBAL_POINTER_REGNUM): Moved to machine description `mips.md`.
* config/mips/mips.md (GLOBAL_POINTER_REGNUM): Moved to here from above.
(*lowsi_mips16_gp): New `define_insn *low<mode>_mips16`.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.
2023-07-03 | MIPS: Add MOVx instructions support for mips16e2 | Jie Mei | 4 | -2/+111
This patch adds MOVx instructions from mips16e2 (movn,movz,movtn,movtz) with corresponding tests. gcc/ChangeLog: * config/mips/mips.h(ISA_HAS_CONDMOVE): Add condition for ISA_HAS_MIPS16E2. * config/mips/mips.md(*mov<GPR:mode>_on_<MOVECC:mode>): Add logics for MOVx insts. (*mov<GPR:mode>_on_<MOVECC:mode>_mips16e2): Generate MOVx instruction. (*mov<GPR:mode>_on_<GPR2:mode>_ne): Add logics for MOVx insts. (*mov<GPR:mode>_on_<GPR2:mode>_ne_mips16e2): Generate MOVx instruction. * config/mips/predicates.md(reg_or_0_operand_mips16e2): New predicate for MOVx insts. gcc/testsuite/ChangeLog: * gcc.target/mips/mips16e2-cmov.c: Added tests for MOVx instructions.
2023-07-03 | MIPS: Add basic support for mips16e2 | Jie Mei | 6 | -2/+32
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE, which includes all MIPS16e instructions, with some additions. It defines new special instructions for increasing code density (e.g. Extend, PC-relative instructions, etc.). This patch adds basic support for mips16e2 used by the following series of patches.

gcc/ChangeLog:
* config/mips/mips.cc (mips_file_start): Add mips16e2 info for output file.
* config/mips/mips.h (__mips_mips16e2): Defined a new predefine macro.
(ISA_HAS_MIPS16E2): Defined a new macro.
(ASM_SPEC): Pass mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* config/mips/predicates.md: Add clause for TARGET_MIPS16E2.
* doc/invoke.texi: Add -m(no-)mips16e2 option.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp (mips_option_groups): Add -mmips16e2 option.
(mips-dg-init): Handle the recognition of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.
2023-07-03 | Daily bump. | GCC Administrator | 4 | -1/+56
2023-07-03 | d: Fix testcase failure of gdc.dg/Wbuiltin_declaration_mismatch2.d. | Iain Buclaw | 1 | -22/+22
Seen at least on aarch64-*-darwin, the parameters used to instantiate the shufflevector intrinsic meant the return type was __vector(int[1]), which resulted in the error: vector type '__vector(int[1])' is not supported on this platform. All instantiations have now been fixed so the expected warning/error is now given by the compiler. gcc/testsuite/ChangeLog: * gdc.dg/Wbuiltin_declaration_mismatch2.d: Fix failed tests.
2023-07-02 | tree-ssa-math-opts: Fix up ICE in match_uaddc_usubc [PR110508] | Jakub Jelinek | 2 | -5/+17
The match_uaddc_usubc matching doesn't require that the second .{ADD,SUB}_OVERFLOW has REALPART_EXPR of its lhs used, only that there is at most one. So, in the weird case where the REALPART_EXPR of it isn't present, we shouldn't ICE trying to replace that REALPART_EXPR with REALPART_EXPR of .U{ADD,SUB}C result.

2023-07-02 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/110508
* tree-ssa-math-opts.cc (match_uaddc_usubc): Only replace re2 with REALPART_EXPR of nlhs if re2 is non-NULL.
* gcc.dg/pr110508.c: New test.
2023-07-02 | xtensa: The use of CLAMPS instruction also requires TARGET_MINMAX, as well as TARGET_CLAMPS | Takayuki 'January June' Suwa | 2 | -7/+4
Because both smin and smax requiring TARGET_MINMAX are essential to the RTL representation.

gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p): Simplify.
* config/xtensa/xtensa.md (*xtensa_clamps): Add TARGET_MINMAX to the condition.
2023-07-02 | xtensa: Fix missing mode warning in "*eqne_INT_MIN" | Takayuki 'January June' Suwa | 1 | -1/+1
gcc/ChangeLog: * config/xtensa/xtensa.md (*eqne_INT_MIN): Add missing ":SI" to the match_operator.
2023-07-02 | Darwin, Objective-C: Support -fconstant-cfstrings [PR108743]. | Iain Sandoe | 2 | -7/+24
This supports the -fconstant-cfstrings option as used by clang (and expected by some build scripts) as an alias to the target-specific -mconstant-cfstrings. The documentation is also updated to reflect that the 'f' option is only available on Darwin, and to add the 'm' option to the Darwin section of the invocation text.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

PR target/108743

gcc/ChangeLog:
* config/darwin.opt: Add fconstant-cfstrings alias to mconstant-cfstrings.
* doc/invoke.texi: Amend invocation descriptions to reflect that the fconstant-cfstrings is a target-option alias and to add the missing mconstant-cfstrings option description to the Darwin section.
2023-07-02 | d: Add testcase from PR108962 | Iain Buclaw | 1 | -0/+13
The issue was fixed in r14-2232. PR d/108962 gcc/testsuite/ChangeLog: * gdc.dg/pr108962.d: New test.
2023-07-02 | d: Fix core.volatile.volatileLoad discarded if result is unused | Iain Buclaw | 3 | -0/+26
The first pass of code generation in the D front-end splits up all compound expressions and discards expressions that have no side effects. This included calls to the `volatileLoad' intrinsic if its result was not used, causing such calls to be eliminated from the program.

We already set TREE_THIS_VOLATILE on the expression, however the tree documentation says if this bit is set in an expression, so is TREE_SIDE_EFFECTS. So set TREE_SIDE_EFFECTS on the expression too. This prevents any early discarding from occurring.

PR d/110516

gcc/d/ChangeLog:
* intrinsics.cc (expand_volatile_load): Set TREE_SIDE_EFFECTS on the expanded expression.
(expand_volatile_store): Likewise.

gcc/testsuite/ChangeLog:
* gdc.dg/torture/pr110516a.d: New test.
* gdc.dg/torture/pr110516b.d: New test.
2023-07-02 | Daily bump. | GCC Administrator | 4 | -1/+84
2023-07-02 | d: Fix accesses of immutable arrays using constant index still bounds checked | Iain Buclaw | 6 | -0/+51
Starts setting TREE_READONLY against specific kinds of VAR_DECLs, so that the middle-end/optimization passes can more aggressively constant fold D code that makes use of `immutable' or `const'. PR d/110514 gcc/d/ChangeLog: * decl.cc (get_symbol_decl): Set TREE_READONLY on certain kinds of const and immutable variables. * expr.cc (ExprVisitor::visit (ArrayLiteralExp *)): Set TREE_READONLY on immutable dynamic array literals. gcc/testsuite/ChangeLog: * gdc.dg/pr110514a.d: New test. * gdc.dg/pr110514b.d: New test. * gdc.dg/pr110514c.d: New test. * gdc.dg/pr110514d.d: New test.
2023-07-01 | d: Don't generate code that throws exceptions when compiling with `-fno-exceptions' | Iain Buclaw | 8 | -7/+28
The version flags for RTMI, RTTI, and exceptions were unconditionally predefined. These are now only predefined if the feature flag is enabled.

It was noticed that there was no `-fexceptions' definition inside d/lang.opt, so the detection of the exceptions option flag was only partially working. Once that was fixed, a few places in the front-end implementation were found to fall foul of `nothrow' rules; these have been fixed upstream and backported here as well.

Reviewed-on: https://github.com/dlang/dmd/pull/15357 https://github.com/dlang/dmd/pull/15360

PR d/110471

gcc/d/ChangeLog:
* d-builtins.cc (d_init_versions): Predefine D_ModuleInfo, D_Exceptions, and D_TypeInfo only if feature is enabled.
* lang.opt: Add -fexceptions.

gcc/testsuite/ChangeLog:
* gdc.dg/pr110471a.d: New test.
* gdc.dg/pr110471b.d: New test.
* gdc.dg/pr110471c.d: New test.
2023-07-01 | Add testcase from PR25623 | Jan Hubicka | 1 | -0/+19
gcc/testsuite/ChangeLog: PR tree-optimization/25623 * gfortran.dg/pr25623.f90: New test.
2023-07-01 | Fix profile update in copy-header | Jan Hubicka | 6 | -29/+154
The most common source of profile mismatches is now the copy-header pass. The reason is that in the common case the duplicated header condition becomes constant true, and that requires changes in the loop exit condition probability. While this could be done by jump threading, it is not, since jump threading gives up on loops. The copy-header pass now has logic to prove that the first exit will become true, so this patch adds the necessary plumbing to the profile updating.

This is done in gimple_duplicate_sese_region in a way that is specific to this particular case. I think the general case is kind of unsolvable, and loop-ch is the only user of the infrastructure. If we later invent some new users, maybe we can export the region and region_copy arrays and let the user do the update.

With the patch we now get:

Pass dump id and name  | static mismatch | dynamic mismatch
                       | in count        | in count
107t cunrolli          |   3   +3 |     19237     +19237
127t ch                |  13  +10 |     19237
131t dom               |  39  +26 |     19237
133t isolate-paths     |  47   +8 |     19237
134t reassoc           |  49   +2 |     19237
136t forwprop          |  53   +4 |    226943    +207706
159t cddce             |  61   +8 |    242222     +15279
161t ldist             |  62   +1 |    242222
172t ifcvt             |  66   +4 |    415472    +173250
173t vect              | 143  +77 |  10859784  +10444312
176t cunroll           | 294 +151 | 150357763 +139497979
183t loopdone          | 291   -3 | 150289533     -68230
194t tracer            | 322  +31 | 153230990   +2941457
195t fre               | 317   -5 | 153230990
197t dom               | 286  -31 | 154448079   +1217089
199t threadfull        | 293   +7 | 154724763    +276684
200t vrp               | 297   +4 | 155042448    +317685
204t dce               | 294   -3 | 155017073     -25375
206t sink              | 292   -2 | 155017073
211t cddce             | 298   +6 | 155018657      +1584
255t optimized         | 296   -2 | 155018657
256r expand            | 273  -23 | 154592622    -426035
258r into_cfglayout    | 268   -5 | 154592661        +39
275r loop2_unroll      | 272   +4 | 159701866   +5109205
291r ce2               | 270   -2 | 159723509
312r pro_and_epilogue  | 290  +20 | 159792505     +68996
315r jump2             | 296   +6 | 164234016   +4441511
323r bbro              | 294   -2 | 159385430   -4848586

So ch introduces 10 new mismatches while originally it did 308. At bbro the number of mismatches dropped from 432 to 294.

The worst offender is now the cunroll pass. I think this is the case where a loop has multiple exits and one of the exits becomes false in all but the last peeled iteration. This is another case where a non-trivial loop update is needed.

Honza

gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Add elliminated_edge parameter; update profile.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Rename to ...
(static_loop_exit): ... this; return the edge to be eliminated.
(ch_base::copy_headers): Handle profile updating for eliminated exits.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ifc-20040816-1.c: Reduce number of mismatches from 2 to 1.
* gcc.dg/tree-ssa/loop-ch-profile-1.c: New test.
* gcc.dg/tree-ssa/loop-ch-profile-2.c: New test.
2023-07-01 | i386: Add STV support for DImode and SImode rotations by constant. | Roger Sayle | 4 | -0/+336
This patch implements scalar-to-vector (STV) support for DImode and SImode rotations by constant bit counts. Scalar rotations are almost always optimal on x86, requiring only one or two instructions, but it is also possible to implement these efficiently with SSE2, requiring only one or two instructions for SImode rotations and at most 3 instructions for DImode rotations. This allows GCC to STV rotations with a small or no penalty if there are other (net) benefits to converting a chain. An example of the benefits is shown below, which is based upon the BLAKE2 cryptographic hash function: unsigned long long a,b,c,d; unsigned long rot(unsigned long long x, int y) { return (x<<y) | (x>>(64-y)); } void foo() { d = rot(d ^ a,32); c = c + d; b = rot(b ^ c,24); a = a + b; d = rot(d ^ a,16); c = c + d; b = rot(b ^ c,63); } where with -m32 -O2 -msse2 Before (59 insns, 247 bytes): foo: pushl %edi xorl %edx, %edx pushl %esi pushl %ebx subl $16, %esp movq a, %xmm1 movq d, %xmm0 movq b, %xmm2 pxor %xmm1, %xmm0 psrlq $32, %xmm0 movd %xmm0, %eax movd %edx, %xmm0 movd %eax, %xmm3 punpckldq %xmm0, %xmm3 movq c, %xmm0 paddq %xmm3, %xmm0 pxor %xmm0, %xmm2 movd %xmm2, %ecx psrlq $32, %xmm2 movd %xmm2, %ebx movl %ecx, %eax shldl $24, %ebx, %ecx shldl $24, %eax, %ebx movd %ebx, %xmm4 movd %ecx, %xmm2 punpckldq %xmm4, %xmm2 movdqa .LC0, %xmm4 pand %xmm4, %xmm2 paddq %xmm2, %xmm1 movq %xmm1, a pxor %xmm3, %xmm1 movd %xmm1, %esi psrlq $32, %xmm1 movd %xmm1, %edi movl %esi, %eax shldl $16, %edi, %esi shldl $16, %eax, %edi movd %esi, %xmm1 movd %edi, %xmm3 punpckldq %xmm3, %xmm1 pand %xmm4, %xmm1 movq %xmm1, d paddq %xmm1, %xmm0 movq %xmm0, c pxor %xmm2, %xmm0 movd %xmm0, 8(%esp) psrlq $32, %xmm0 movl 8(%esp), %eax movd %xmm0, 12(%esp) movl 12(%esp), %edx shrdl $1, %edx, %eax xorl %edx, %edx movl %eax, b movl %edx, b+4 addl $16, %esp popl %ebx popl %esi popl %edi ret After (32 insns, 165 bytes): movq a, %xmm1 xorl %edx, %edx movq d, %xmm0 movq b, %xmm2 movdqa .LC0, %xmm4 pxor %xmm1, %xmm0 
psrlq $32, %xmm0 movd %xmm0, %eax movd %edx, %xmm0 movd %eax, %xmm3 punpckldq %xmm0, %xmm3 movq c, %xmm0 paddq %xmm3, %xmm0 pxor %xmm0, %xmm2 pshufd $68, %xmm2, %xmm2 psrldq $5, %xmm2 pand %xmm4, %xmm2 paddq %xmm2, %xmm1 movq %xmm1, a pxor %xmm3, %xmm1 pshuflw $147, %xmm1, %xmm1 pand %xmm4, %xmm1 movq %xmm1, d paddq %xmm1, %xmm0 movq %xmm0, c pxor %xmm2, %xmm0 pshufd $20, %xmm0, %xmm0 psrlq $1, %xmm0 pshufd $136, %xmm0, %xmm0 pand %xmm4, %xmm0 movq %xmm0, b ret 2023-07-01 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain): Provide gains/costs for ROTATE and ROTATERT (by an integer constant). (general_scalar_chain::convert_rotate): New helper function to convert a DImode or SImode rotation by an integer constant into SSE vector form. (general_scalar_chain::convert_insn): Call the new convert_rotate for ROTATE and ROTATERT. (general_scalar_to_vector_candidate_p): Consider ROTATE and ROTATERT to be candidates if the second operand is an integer constant, valid for a rotation (or shift) in the given mode. * config/i386/i386-features.h (general_scalar_chain): Add new helper method convert_rotate. gcc/testsuite/ChangeLog * gcc.target/i386/rotate-6.c: New test case. * gcc.target/i386/sse2-stv-1.c: Likewise.
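As a side note on the `rot` helper quoted in the commit message: it shifts by `64-y`, which is undefined in C when `y` is 0, and it is only ever called there with constant counts between 1 and 63. A rotate that is defined for every count can be sketched as below. `rotl64` is a hypothetical name and is not part of the patch or its tests.

```c
#include <stdint.h>

/* Rotate `x` left by `n` bits.  The count is reduced modulo 64 and
   both shift amounts stay in the range 0..63, so no shift is ever
   performed by the full width of the type. */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x << n) | (x >> ((64 - n) & 63));
}
```

With a constant `n`, this is the kind of expression a compiler may recognize as a single rotate, which is the operation the STV change above learns to convert into SSE form.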
2023-07-01 | Fix update_bb_profile_for_threading | Jan Hubicka | 3 | -4/+22
Fix some of the profile mismatches caused by profile updating. It seems that I misupdated update_bb_profile_for_threading in 2017, which results in invalid updates from RTL threading and threadbackwards. update_bb_profile_for_threading knows that some paths to BB are being redirected elsewhere and that those paths will exit from BB via E. So it needs to determine the probability of the duplicated path and redistribute probabilities. For some reason, however, the conditional probability of the redirected path is computed after its count is subtracted, which is wrong and often results in a probability greater than 100%.

I also fixed the error message. Compiling tramp3d I now get the following passes producing mismatches:

Pass dump id and name  | static mismatch | dynamic mismatch
                       | in count        | in count
113t fre               |   2   +2 |         0
114t mergephi          |   2      |         0
115t threadfull        |   2      |         0
116t vrp               |   2      |         0
127t ch                | 307 +305 | 347194302 +347194302
130t thread            | 313   +6 | 347221478     +27176
131t dom               | 321   +8 | 346841121    -380357
134t reassoc           | 323   +2 | 346841121
136t forwprop          | 327   +4 | 347026371    +185250
144t pre               | 326   -1 | 347040926     +14555
172t ifcvt             | 338   +2 | 347218249    +156280
173t vect              | 409  +71 | 356357418   +9139169
176t cunroll           | 377  -32 | 126071925 -230285493
183t loopdone          | 376   -1 | 126015489     -56436
194t tracer            | 379   +3 | 127258199   +1242710
197t dom               | 375   -4 | 128352165   +1093966
199t threadfull        | 379   +4 | 128526112    +173947
200t vrp               | 381   +2 | 128724673    +198561
204t dce               | 374   -7 | 128632495     -92178
206t sink              | 370   -4 | 128618043     -14452
211t cddce             | 372   +2 | 128632495     +14452
248t ehcleanup         | 370   -2 | 128618755     -13740
255t optimized         | 362   -8 | 128576810     -41945
256r expand            | 356   -6 | 128899768    +322958
258r into_cfglayout    | 353   -3 | 129051765    +151997
259r jump              | 354   +1 | 129051765
262r cse1              | 353   -1 | 129051765
275r loop2_unroll      | 355   +2 | 132182110   +3130345
277r loop2_done        | 354   -1 | 132182109         -1
312r pro_and_epilogue  | 371  +17 | 132222324     +40215
323r bbro              | 375   +4 | 132095926    -126398

Without the patch at jump2 time we get over 432 mismatches, so 15% improvement. Some of the mismatches are unavoidable.

I think the ch mismatches are mostly due to loop header copying where the header condition constant propagates. The most common case should be threadable in early optimizations and we could also do better on profile updating here.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:
PR tree-optimization/103680
* cfg.cc (update_bb_profile_for_threading): Fix profile update; make message clearer.

gcc/testsuite/ChangeLog:
PR tree-optimization/103680
* gcc.dg/tree-ssa/pr103680.c: New test.
* gcc.dg/tree-prof/cmpsf-1.c: Un-xfail.
2023-07-01 | Daily bump. | GCC Administrator | 5 | -1/+205
2023-06-30 | c++: fix up caching of level lowered ttps | Patrick Palka | 3 | -18/+31
Due to level/depth mismatches between the template parameters of a level lowered ttp and the original ttp, the ttp comparison check added by r14-418-g0bc2a1dc327af9 never actually holds outside of erroneous cases. Moreover, it'd be good to also cache the overall TEMPLATE_TEMPLATE_PARM instead of only the TEMPLATE_PARM_INDEX. It's tricky to cache all level lowered ttps since the result of level lowering may depend on more than just the depth of the arguments, e.g. for TT in template<class T> struct A { template<template<T> class TT> void f(); } the substitution T=int yields a different level-lowered ttp than T=char. But these kinds of ttps seem to be rare in practice, and "simple" ttps that don't depend on outer template parameters are easy enough to cache like so. Unfortunately, this means we're back to expecting a duplicate error in nontype12.C again since the ttp in question isn't "simple" so caching of the (erroneous) lowered ttp doesn't happen. gcc/cp/ChangeLog: * cp-tree.h (TEMPLATE_PARM_DESCENDANTS): Harden. (TEMPLATE_TYPE_DESCENDANTS): Define. (TEMPLATE_TEMPLATE_PARM_SIMPLE_P): Define. * pt.cc (reduce_template_parm_level): Revert r14-418-g0bc2a1dc327af9 change. (process_template_parm): Set TEMPLATE_TEMPLATE_PARM_SIMPLE_P appropriately. (uses_outer_template_parms): Determine the outer depth of a template template parm without relying on DECL_CONTEXT. (tsubst) <case TEMPLATE_TEMPLATE_PARM>: Cache lowering a simple template template parm. Consistently use 'code'. gcc/testsuite/ChangeLog: * g++.dg/template/nontype12.C: Refine and XFAIL the dg-bogus duplicate diagnostic check.
2023-06-30 | Use TYPE_INCLUDES_FLEXARRAY in __builtin_object_size [PR tree-optimization/101832] | Qing Zhao | 2 | -1/+156
__builtin_object_size should treat struct with TYPE_INCLUDES_FLEXARRAY as flexible size.

gcc/ChangeLog:
PR tree-optimization/101832
* tree-object-size.cc (addr_object_size): Handle structure/union type when it has flexible size.

gcc/testsuite/ChangeLog:
PR tree-optimization/101832
* gcc.dg/builtin-object-size-pr101832.c: New test.
2023-06-30 | Fix couple of endianness issues in fold_ctor_reference | Eric Botcazou | 5 | -21/+148
fold_ctor_reference attempts to use a recursive local processing in order to call native_encode_expr on the leaf nodes of the constructor, before falling back to calling native_encode_initializer if this fails. There are a couple of issues related to endianness present in it: 1) it does not specifically handle integral bit-fields; now these are left justified on big-endian platforms so cannot be treated like ordinary fields. 2) it does not check that the constructor uses the native storage order. gcc/ * gimple-fold.cc (fold_array_ctor_reference): Fix head comment. (fold_nonarray_ctor_reference): Likewise. Specifically deal with integral bit-fields. (fold_ctor_reference): Make sure that the constructor uses the native storage order. gcc/testsuite/ * gcc.c-torture/execute/20230630-1.c: New test. * gcc.c-torture/execute/20230630-2.c: Likewise. * gcc.c-torture/execute/20230630-3.c: Likewise * gcc.c-torture/execute/20230630-4.c: Likewise
2023-06-30 | jit.exp: handle dwarf version mismatch in jit-check-debug-info [PR110466] | David Malcolm | 1 | -0/+4
gcc/testsuite/ChangeLog: PR jit/110466 * jit.dg/jit.exp (jit-check-debug-info): Gracefully handle too early versions of gdb that don't support our dwarf version, via "unsupported". Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-06-30 | jit: avoid using __vector in testcase [PR110466] | David Malcolm | 1 | -11/+11
r13-4531-gd2e782cb99c311 added test coverage to libgccjit's vector support, but used __vector, which doesn't work on Power. Additionally the size param to gcc_jit_type_get_vector was wrong. Fixed thusly. gcc/testsuite/ChangeLog: PR jit/110466 * jit.dg/test-expressions.c (run_test_of_comparison): Fix size param to gcc_jit_type_get_vector. (verify_comparisons): Use a typedef rather than __vector. Co-authored-by: Marek Polacek <polacek@redhat.com> Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-06-30 | Fix handling of __builtin_expect_with_probability and improve first-match heuristics | Jan Hubicka | 4 | -22/+63
While looking into the std::vector _M_realloc_insert codegen I noticed that the call of __throw_bad_alloc is predicted with 10% probability. This is because the conditional guarding it has __builtin_expect (cond, 0) on it. This incorrectly takes precedence over more reliable heuristics predicting that a call to a cold noreturn function is likely not going to happen.

So I reordered the predictors so that __builtin_expect_with_probability comes first after the predictors that never make a mistake (so the user can use it to always specify the outcome by hand). I also downgraded the malloc predictor, since I do think user-defined malloc functions and new operators may behave in funny ways, and moved the usual __builtin_expect after the noreturn cold predictor.

This triggered a latent bug in expr_expected_value_1 where

  if (*predictor < predictor2)
    *predictor = predictor2;

should be:

  if (predictor2 < *predictor)
    *predictor = predictor2;

which eventually triggered an ICE on combining heuristics. This made me notice that we can do slightly better while combining expected values in case only one of the parameters (such as in a*b when we expect a==0) can determine the overall result.

Note that the new code may pick weaker heuristics in case both values are predicted. Not sure if this scenario is worth the extra CPU time: there is no correct way to combine the probabilities anyway since we do not know if the predictions are independent, so I think users should not rely on it.

Fixing this issue uncovered another problem. In 2018 Martin Liska added code predicting that MALLOC returns non-NULL, but instead of that he predicts that it returns true (boolean 1). This sort of works for a testcase testing malloc (10) != NULL but, for example, we will predict malloc (10) == malloc (10) as true, which is not right, and such a comparison may happen in real code.

I think the proper way is to update expr_expected_value_1 to work with value ranges, but that needs greater surgery, so I decided to postpone this and only add a FIXME and file PR110499.

gcc/ChangeLog:
PR middle-end/109849
* predict.cc (estimate_bb_frequencies): Turn to static function.
(expr_expected_value_1): Fix handling of binary expressions with predicted values.
* predict.def (PRED_MALLOC_NONNULL): Move later in the priority queue.
(PRED_BUILTIN_EXPECT_WITH_PROBABILITY): Move to almost top of the priority queue.
* predict.h (estimate_bb_frequencies): No longer declare it.

gcc/testsuite/ChangeLog:
PR middle-end/109849
* gcc.dg/predict-18.c: Improve testcase.
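For context, this is roughly how user code supplies an explicit probability with the builtin discussed above. It is a sketch only: `EXPECT_TRUE_WITH` and `safe_length` are hypothetical names, the 1% figure is an arbitrary illustration, and the macro falls back to the plain condition on compilers that lack the builtin.

```c
#include <stddef.h>
#include <string.h>

/* Detect the builtin where __has_builtin is available. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_expect_with_probability)
#  define HAVE_EXPECT_WITH_PROBABILITY 1
# endif
#endif

/* Hypothetical wrapper: states that `cond` is true with probability
   `prob`, which must be a constant in the range [0.0, 1.0]. */
#ifdef HAVE_EXPECT_WITH_PROBABILITY
# define EXPECT_TRUE_WITH(cond, prob) \
    __builtin_expect_with_probability(!!(cond), 1, (prob))
#else
# define EXPECT_TRUE_WITH(cond, prob) (!!(cond))
#endif

/* Length of `s`, treating a null pointer as the empty string.  The
   hint says the null case is expected in about 1% of calls. */
static size_t safe_length(const char *s)
{
    if (EXPECT_TRUE_WITH(s == NULL, 0.01))
        return 0;
    return strlen(s);
}
```

The hint changes only the compiler's branch probability estimate, not the result of the function.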
2023-06-30 | modula-2: Amend the handling of failed select() calls in RTint [PR108835]. | Iain Sandoe | 1 | -16/+54
When we make a select() that fails, there is an attempt to (a) diagnose why and (b) make a fallback. These actions are causing some tests to hang on some Darwin versions, this is because the first action that is tried to assist in diagnosis/fallback handling is to replace the set timeout with NIL (which causes select to wait forever, modulo other reasons it might complete). To fix this, call select with a zero timeout when checking for error conditions. Also, as we check the possible failure conditions, if we find a change that succeeds, then stop looking for errors. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> PR testsuite/108835 gcc/m2/ChangeLog: * gm2-libs/RTint.mod: Do not use NIL timeout setting on select, test failures sequentially, finishing on the first success.
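The difference between a NIL (null) timeout and a zero timeout is the core of this fix, so here is a small C sketch of the non-blocking form. It uses only the POSIX `select` interface; `poll_readable` is a hypothetical helper and is not code from RTint.mod, which is written in Modula-2.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Return a positive value if `fd` is readable right now, 0 if it is
   not, and -1 on error.  A zero timeval makes select() return
   immediately, whereas a null timeout pointer would make it wait
   until a descriptor becomes ready.  `fd` must be below FD_SETSIZE. */
static int poll_readable(int fd)
{
    fd_set readfds;
    struct timeval zero = { 0, 0 };

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    return select(fd + 1, &readfds, NULL, NULL, &zero);
}
```

Probing each descriptor this way lets diagnostic code find out which one is at fault without any risk of hanging.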
2023-06-30 | fold-const+optabs: Change return type of predicate functions from int to bool | Uros Bizjak | 4 | -88/+88
Also change some internal variables and function arguments from int to bool.

gcc/ChangeLog:
* fold-const.h (multiple_of_p): Change return type from int to bool.
* fold-const.cc (split_tree): Change negl_p, neg_litp_p, neg_conp_p and neg_var_p variables to bool.
(const_binop): Change sat_p variable to bool.
(merge_ranges): Change no_overlap variable to bool.
(extract_muldiv_1): Change same_p variable to bool.
(tree_swap_operands_p): Update function body for bool return type.
(fold_truth_andor): Change commutative variable to bool.
(multiple_of_p): Change return type from int to bool and adjust function body accordingly.
* optabs.h (expand_twoval_unop): Change return type from int to bool.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
* optabs.cc (add_equal_note): Ditto.
(widen_operand): Change no_extend argument from int to bool.
(expand_binop): Ditto.
(expand_twoval_unop): Change return type from int to bool and adjust function body accordingly.
(expand_twoval_binop): Ditto.
(can_compare_p): Ditto.
(have_add2_insn): Ditto.
(have_addptr3_insn): Ditto.
(have_sub2_insn): Ditto.
(have_insn_for): Ditto.
2023-06-30 | AArch64: New RTL for ABDL | Oluwatamilore Adebayo | 16 | -57/+803
This patch adds new RTL for ABDL (sabdl, sabdl2, uabdl, uabdl2). gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_widen_<su>abdl_lo_<mode>, vec_widen_<su>abdl_hi_<mode>): Expansions for abd vec widen optabs. (aarch64_<su>abdl<mode>_insn): VQW based abdl RTL. * config/aarch64/iterators.md (USMAX_EXT): Code attributes that give the appropriate extend RTL for the max RTL. gcc/testsuite/ChangeLog: * gcc.target/aarch64/abd_2.c: Added ABDL testcases. * gcc.target/aarch64/abd_3.c: Added ABDL testcases. * gcc.target/aarch64/abd_4.c: Added ABDL testcases. * gcc.target/aarch64/abd_none_2.c: Added ABDL testcases. * gcc.target/aarch64/abd_none_3.c: Added ABDL testcases. * gcc.target/aarch64/abd_none_4.c: Added ABDL testcases. * gcc.target/aarch64/abd_run_1.c: Added ABDL testcases. * gcc.target/aarch64/sve/abd_1.c: Added ABDL testcases. * gcc.target/aarch64/sve/abd_2.c: Added ABDL testcases. * gcc.target/aarch64/sve/abd_none_1.c: Added ABDL testcases. * gcc.target/aarch64/sve/abd_none_2.c: Added ABDL testcases.
2023-06-30Mid engine setup [SU]ABDLOluwatamilore Adebayo4-48/+176
This updates vect_recog_abd_pattern to recognize the widening variant of absolute difference (ABDL, ABDL2). gcc/ChangeLog: * internal-fn.def (VEC_WIDEN_ABD): New internal hilo optab. * optabs.def (vec_widen_sabd_optab, vec_widen_sabd_hi_optab, vec_widen_sabd_lo_optab, vec_widen_sabd_odd_even, vec_widen_sabd_even_optab, vec_widen_uabd_optab, vec_widen_uabd_hi_optab, vec_widen_uabd_lo_optab, vec_widen_uabd_odd_even, vec_widen_uabd_even_optab): New optabs. * doc/md.texi: Document them. * tree-vect-patterns.cc (vect_recog_abd_pattern): Update to build a VEC_WIDEN_ABD call if the input precision is smaller than the precision of the output. (vect_recog_widen_abd_pattern): Should an ABD expression be found preceding an extension, replace the two with a VEC_WIDEN_ABD.
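For readers unfamiliar with the operation, the scalar loop below is a minimal sketch of a widening absolute difference: two 8-bit inputs are extended, subtracted, and the absolute value is stored in a 16-bit result. The function names are hypothetical and are not taken from the patch or its testcases; whether GCC actually emits ABDL-style instructions for such a loop depends on the target and the options in use.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference (hypothetical helper): widen both 8-bit inputs to int,
   subtract, and take the absolute value.  The result is at most 255, so it
   always fits in the 16-bit return type.  */
uint16_t
widen_abd_u8 (uint8_t a, uint8_t b)
{
  return (uint16_t) abs ((int) a - (int) b);
}

/* Element-wise loop whose output type is wider than its input type.  This
   is only an illustration of the shape of such code, not a guarantee that
   any particular compiler version vectorizes it.  */
void
widen_abd_loop (uint16_t *restrict out, const uint8_t *restrict a,
                const uint8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = widen_abd_u8 (a[i], b[i]);
}
```

Because the subtraction happens in the wider type, the result is the same whichever operand is larger, and no intermediate value can overflow.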
2023-06-30RISC-V: Refactor vxrm_mode attr for type attr equalPan Li1-16/+23
This patch refactors the vxrm_mode attr to remove duplicated eq_attr conditions. The common condition of the attr is extracted to one place instead of being repeated in many places. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/vector.md: Refactor the common condition.
2023-06-30tree-optimization/110496 - TYPE_PRECISION issue with store-mergingRichard Biener2-1/+30
When store-merging looks for bswap opportunities we also handle BIT_FIELD_REFs where we verify the referenced object is of scalar type but we don't check for the result type we eventually use. That's done later but after we eventually query TYPE_PRECISION. The following re-orders this. PR tree-optimization/110496 * gimple-ssa-store-merging.cc (find_bswap_or_nop_1): Re-order verifying and TYPE_PRECISION query for the BIT_FIELD_REF case. * gcc.dg/pr110496.c: New testcase.
2023-06-30middle-end/110489 - avoid useless work on statisticsRichard Biener1-7/+14
When we call statistics_fini_pass we unconditionally allocate the statistics hash and traverse it. When a TU has many small functions this can take considerable time. The following avoids this by never allocating the hash from this function. PR middle-end/110489 * statistics.cc (curr_statistics_hash): Add argument indicating whether we should allocate the hash. (statistics_fini_pass): If the hash isn't allocated only print the summary header.
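The idea of not allocating a table merely to discover that it is empty can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code in statistics.cc: the struct, the function names and the `n_allocations` counter are all hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-pass statistics table.  */
struct stats_table
{
  unsigned n_events;
};

static struct stats_table *table; /* NULL until something is recorded.  */
unsigned n_allocations;           /* Instrumentation for this sketch only.  */

/* Return the table, creating it only when ALLOC_P is true.  */
static struct stats_table *
get_table (bool alloc_p)
{
  if (!table && alloc_p)
    {
      table = calloc (1, sizeof *table);
      if (table)
        n_allocations++;
    }
  return table;
}

/* Recording an event is the only path that may allocate.  */
void
record_event (void)
{
  struct stats_table *t = get_table (true);
  if (t)
    t->n_events++;
}

/* End-of-pass hook: report and release the table if one exists, but never
   allocate one just to find it empty.  Returns the number of events.  */
unsigned
finish_pass (void)
{
  struct stats_table *t = get_table (false);
  if (!t)
    return 0;
  unsigned n = t->n_events;
  free (t);
  table = NULL;
  return n;
}
```

With this shape, a pass that records nothing pays only for a null-pointer check at the end, which matters when the hook runs once per function in a translation unit with many small functions.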
2023-06-30Flip the nvptx port to LRASegher Boessenkool1-3/+0
... understanding that "turn on LRA" is an exaggeration here, given that nvptx isn't actually doing register allocation ('TARGET_NO_REGISTER_ALLOCATION'). gcc/ * config/nvptx/nvptx.cc (TARGET_LRA_P): Remove. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2023-06-30tree-optimization/110381 - fix testcaseRichard Biener1-0/+4
This adds a missing check_vect () to the execute testcase. PR tree-optimization/110381 * gcc.dg/vect/pr110381.c: Add check_vect ().
2023-06-30mips: Fix overaligned function arguments [PR109435]Jovan Dmitrovic3-1/+58
This patch changes alignment for typedef types when passed as arguments, making the alignment equal to the alignment of original (aliased) types. This change makes it impossible for a typedef type to have alignment that is less than its size. 2023-06-27 Jovan Dmitrović <jovan.dmitrovic@syrmia.com> gcc/ChangeLog: PR target/109435 * config/mips/mips.cc (mips_function_arg_alignment): Returns the alignment of function argument. In case of typedef type, it returns the alignment of the aliased type. (mips_function_arg_boundary): Relocated calculation of the alignment of function arguments. gcc/testsuite/ChangeLog: * gcc.target/mips/align-1-n64.c: New test. * gcc.target/mips/align-1-o32.c: New test.
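The distinction between the alignment of a typedef and that of the type it aliases can be observed directly in C. The snippet below is a small sketch using the GNU `aligned` attribute; the typedef and function names are hypothetical, and it says nothing about how the MIPS back end consumes these values when laying out arguments.

```c
#include <stddef.h>

/* Hypothetical over-aligned typedef of int.  The attribute is a GNU
   extension accepted by GCC and Clang.  */
typedef int aligned8_int __attribute__ ((aligned (8)));

/* Alignment requested on the typedef itself.  */
size_t
typedef_alignment (void)
{
  return _Alignof (aligned8_int);
}

/* Alignment of the aliased type, which the typedef does not change.  */
size_t
aliased_alignment (void)
{
  return _Alignof (int);
}

/* The size is unaffected by the attribute: it is still sizeof (int).  */
size_t
typedef_size (void)
{
  return sizeof (aligned8_int);
}
```

The typedef reports the requested 8-byte alignment while its size remains that of `int`, so the two types can disagree on alignment even though they describe the same object representation.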
2023-06-30Daily bump.GCC Administrator9-1/+325
2023-06-30analyzer: Fix regression bug after r14-1632-g9589a46ddadc8b [PR110198]benjamin priour3-10/+13
g++.dg/analyzer/PR100244.C was failing after a patch of PR109439. The reason was a spurious preemptive return of get_store_value upon out-of-bounds read that was preventing further checks. Now instead, a boolean value check_poisoned goes to false when an OOB is detected, and is later on given to get_or_create_initial_value. gcc/analyzer/ChangeLog: PR analyzer/110198 * region-model-manager.cc (region_model_manager::get_or_create_initial_value): Take an optional boolean value to bypass poisoning checks. * region-model-manager.h: Update declaration of the above function. * region-model.cc (region_model::get_store_value): No longer returns on OOB, but rather gives a boolean to get_or_create_initial_value. (region_model::check_region_access): Update docstring. (region_model::check_region_for_write): Update docstring. Signed-off-by: benjamin priour <priour.be@gmail.com>
2023-06-29Compute ipa-predicates for conditionals involving __builtin_expect_pJan Hubicka2-0/+41
std::vector allocator looks as follows: __attribute__((nodiscard)) struct pair * std::__new_allocator<std::pair<unsigned int, unsigned int> >::allocate (struct __new_allocator * const this, size_type __n, const void * D.27753) { bool _1; long int _2; long int _3; long unsigned int _5; struct pair * _9; <bb 2> [local count: 1073741824]: _1 = __n_7(D) > 1152921504606846975; _2 = (long int) _1; _3 = __builtin_expect (_2, 0); if (_3 != 0) goto <bb 3>; [10.00%] else goto <bb 6>; [90.00%] <bb 3> [local count: 107374184]: if (__n_7(D) > 2305843009213693951) goto <bb 4>; [50.00%] else goto <bb 5>; [50.00%] <bb 4> [local count: 53687092]: std::__throw_bad_array_new_length (); <bb 5> [local count: 53687092]: std::__throw_bad_alloc (); <bb 6> [local count: 966367641]: _5 = __n_7(D) * 8; _9 = operator new (_5); return _9; } So there is a check for the allocated block size being greater than max_size, which is wrapped in __builtin_expect. This makes ipa-fnsummary give up analyzing predicates, and it will miss the fact that the two different calls to __throw will be optimized out if __n is already smaller than 1152921504606846975, which it is after _M_check_len. This patch extends ipa-fnsummary to understand functions that return their parameter. gcc/ChangeLog: PR tree-optimization/109849 * ipa-fnsummary.cc (decompose_param_expr): Skip functions returning their parameter. (set_cond_stmt_execution_predicate): Return early if predicate was constructed. gcc/testsuite/ChangeLog: PR tree-optimization/109849 * gcc.dg/ipa/pr109849.c: New test.
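The property this relies on is that `__builtin_expect (exp, c)` evaluates to `exp`; the second argument is only a branch-probability hint. The following C sketch shows that pass-through behaviour together with a size check of the same shape as the one in the dump. The macro and function names are hypothetical, and the limit is chosen for illustration rather than copied from libstdc++.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element-count limit for 8-byte elements.  */
#define MAX_ELEMS (SIZE_MAX / sizeof (uint64_t))

/* __builtin_expect returns its first argument unchanged; the hint in the
   second argument does not affect the value.  */
long
expect_passthrough (long value)
{
  return __builtin_expect (value, 0);
}

/* Size check wrapped in __builtin_expect, marking the failure path as
   unlikely.  Returns -1 when N exceeds the limit and 0 otherwise.  */
int
check_len (size_t n)
{
  if (__builtin_expect (n > MAX_ELEMS, 0))
    return -1;
  return 0;
}
```

Since the builtin is value-preserving, a condition on its result is still a condition on the original parameter, which is what allows an analysis to look through the call.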
2023-06-29testsuite: Use -fno-report-bug in gcc.dg/plugin/Marek Polacek4-2/+6
Certain downstream compilers (for example, in Fedora) default to -freport-bug. The extra output breaks the following tests. We can use -fno-report-bug to fix that. Patch verified with: $ make check RUNTESTFLAGS='--target_board=unix\{,-freport-bug\} plugin.exp' gcc/testsuite/ChangeLog: * gcc.dg/plugin/crash-test-ice-sarif.c: Use -fno-report-bug. Adjust scan-sarif-file. * gcc.dg/plugin/crash-test-ice-stderr.c: Use -fno-report-bug. * gcc.dg/plugin/crash-test-write-though-null-sarif.c: Use -fno-report-bug. Adjust scan-sarif-file. * gcc.dg/plugin/crash-test-write-though-null-stderr.c: Use -fno-report-bug.
2023-06-29i386: add -fno-stack-protector to two testsMarek Polacek2-2/+2
These tests fail when the testsuite is executed with -fstack-protector-strong. To avoid this, this patch adds -fno-stack-protector to dg-options. gcc/testsuite/ChangeLog: * gcc.target/i386/pr104610.c: Use -fno-stack-protector. * gcc.target/i386/pr69482-1.c: Likewise.
2023-06-29c++: NSDMI instantiation during overload resolution [PR110468]Patrick Palka2-0/+22
Here we find ourselves instantiating the NSDMI for A<1>::m when computing argument conversions during overload resolution, and thus tf_conv is set. The flag causes mark_used for the constructor used in the NSDMI to exit early and not instantiate its noexcept-spec, which eventually leads to an ICE from nothrow_spec_p. This patch fixes this by clearing any special tsubst flags during instantiation of an NSDMI, since the result should be independent of the context that requires the instantiation. PR c++/110468 gcc/cp/ChangeLog: * init.cc (maybe_instantiate_nsdmi_init): Mask out all tsubst flags except for tf_warning_or_error. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept79.C: New test.