path: root/gcc/testsuite
Age | Commit message | Author | Files | Lines
2023-07-12 | [range-op] Enable value/mask propagation in range-op. | Aldy Hernandez | 1 | -1/+1
Throw the switch in range-ops to make full use of the value/mask information instead of only the nonzero bits. This will cause most of the operators implemented in range-ops to use the value/mask information calculated by CCP's bit_value_binop() function which range-ops uses. This opens up more optimization opportunities. In follow-up patches I will change the global range setter (set_range_info) to be able to save the value/mask pair, and make both CCP and IPA be able to save the known ones bit info, instead of throwing it away. gcc/ChangeLog: * range-op.cc (irange_to_masked_value): Remove. (update_known_bitmask): Update irange value/mask pair instead of only updating nonzero bits. gcc/testsuite/ChangeLog: * gcc.dg/pr83073.c: Adjust testcase.
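As a rough illustration of why a value/mask pair carries more information than nonzero bits alone, here is a minimal sketch for a bitwise AND. The `known_bits` struct and the `known_and` and `nonzero_bits` helpers are hypothetical names invented for this example; they are not the range-op or CCP code, and the encoding (a set mask bit means "unknown") is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical known-bits pair: a set bit in `mask` means the bit is
   unknown; where `mask` is clear, `value` holds the known bit.  */
struct known_bits {
    uint64_t value;
    uint64_t mask;
};

/* Combine the known bits of the two operands of a bitwise AND.  */
struct known_bits known_and(struct known_bits a, struct known_bits b)
{
    /* Sketch assumption: unknown positions are stored as 0 in `value`.  */
    assert((a.value & a.mask) == 0 && (b.value & b.mask) == 0);

    uint64_t a_zero = ~a.value & ~a.mask;   /* bits known to be 0 in a */
    uint64_t b_zero = ~b.value & ~b.mask;   /* bits known to be 0 in b */
    uint64_t one = a.value & b.value;       /* 1 only if known 1 in both */
    uint64_t zero = a_zero | b_zero;        /* 0 if known 0 in either */

    struct known_bits r = { one, ~(zero | one) };
    return r;
}

/* The coarser summary: every bit that might be set.  */
uint64_t nonzero_bits(struct known_bits k)
{
    return k.value | k.mask;
}
```

With `a = {1, ~1}` (low bit known to be 1, everything else unknown) and the constant `b = {3, 0}`, the result has `value == 1` and `mask == 2`: bit 0 is known to be set. The nonzero-bits summary of the same result is just `3`, which no longer records that fact.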
2023-07-12 | Improve profile update in loop-ch | Jan Hubicka | 1 | -1/+1
Improve profile update in loop-ch to handle the situation where the duplicated header has a loop-invariant test. In this case we know that the entire count of the exit edge belongs to the duplicated loop header edge and can update probabilities accordingly. Since we also do all the work to track this information from analysis to duplication, I also added code to turn those conditionals into constants so we do not need a later jump threading pass to clean up.

This made me work out that the propagation was buggy in a few aspects:
1) it handled every PHI as a PHI in the header and incorrectly assigned some PHIs to be IV-like when they are not
2) it did not check for novops calls, which are not required to return the same value on every invocation
3) I also added a check for asm statements, since those are not necessarily reproducible either.

I would like to make more changes, but tried to prevent this patch from snowballing. The analysis of what statements will remain after duplication can be improved. I think we should use a ranger query for blocks other than the first basic block, too, and possibly drop the IV heuristics then. Also, a lot of this logic seems to be pretty much the same as the analysis in the peeling pass, so unifying this would be nice.

I also think I should move the profile update out of gimple_duplicate_sese_region (it is now very specific to ch) and rename it, since those regions are single entry multiple exit.

Bootstrapped/regtested x86_64-linux, OK?
Honza

gcc/ChangeLog:
* tree-cfg.cc (gimple_duplicate_sese_region): Add ORIG_ELIMINATED_EDGES parameter and rewrite profile updating code to handle edges elimination.
* tree-cfg.h (gimple_duplicate_sese_region): Update prototype.
* tree-ssa-loop-ch.cc (loop_invariant_op_p): New function. (loop_iv_derived_p): New function. (should_duplicate_loop_header_p): Track invariant exit edges; fix handling of PHIs and propagation of IV derived variables. (ch_base::copy_headers): Pass around the invariant edges hash set.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/loop-ch-profile-1.c: Remove xfail.
2023-07-12 | riscv: thead: Fix failing XTheadCondMov tests (indirect-rv[32|64]) | Christoph Müllner | 3 | -208/+118
Recently, two identical XTheadCondMov tests have been added, which both fail. Let's fix that by changing the following: * Merge both files into one (no need for separate tests for rv32 and rv64) * Drop unrelated attribute check test (we already test for `th.mveqz` and `th.mvnez` instructions, so there is little additional value) * Fix the pattern to allow matching Fixes: a1806f0918c0 ("RISC-V: Optimize TARGET_XTHEADCONDMOV") gcc/testsuite/ChangeLog: * gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Moved to... * gcc.target/riscv/xtheadcondmov-indirect.c: ...here. * gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Removed. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2023-07-12 | PR target/110598: Fix rega = 0; rega ^= rega regression in i386.md | Roger Sayle | 1 | -0/+46
This patch fixes the regression PR target/110598 caused by my recent addition of a peephole2. The intention of that optimization was to simplify zeroing a register, followed by an IOR, XOR or PLUS operation on it into a move, or as described in the comment: ;; Peephole2 rega = 0; rega op= regb into rega = regb. The issue is that I'd failed to consider the (rare and unusual) case, where regb is rega, where the transformation leads to the incorrect "rega = rega", when it should be "rega = 0". The minimal fix is to add a !reg_mentioned_p check to the recent peephole2. In addition to resolving the regression, I've added a second peephole2 to optimize the problematic case above, which contains a false dependency and is therefore tricky to optimize elsewhere. This is an improvement over GCC 13, for example, that generates the redundant: xorl %edx, %edx xorq %rdx, %rdx 2023-07-12 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110598 * config/i386/i386.md (peephole2): Check !reg_mentioned_p when optimizing rega = 0; rega op= regb for op in [XOR,IOR,PLUS]. (peephole2): Simplify rega = 0; rega op= rega cases. gcc/testsuite/ChangeLog PR target/110598 * gcc.target/i386/pr110598.c: New test case.
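The failure mode described above can be modelled on a toy register file. This is only a sketch: `run_original`, `run_unguarded` and `run_guarded` are hypothetical names, XOR stands in for the three operators, and the `rega != regb` comparison is a stand-in for the `!reg_mentioned_p` check named in the commit, not the actual i386.md condition.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 4 };

/* Reference semantics of the two-instruction sequence:
   rega = 0; rega ^= regb.  */
void run_original(uint64_t regs[NUM_REGS], int rega, int regb)
{
    assert(rega >= 0 && rega < NUM_REGS && regb >= 0 && regb < NUM_REGS);
    regs[rega] = 0;
    regs[rega] ^= regs[regb];
}

/* Unguarded rewrite: always emit rega = regb.  When rega and regb are
   the same register this leaves the old value in place instead of 0.  */
void run_unguarded(uint64_t regs[NUM_REGS], int rega, int regb)
{
    assert(rega >= 0 && rega < NUM_REGS && regb >= 0 && regb < NUM_REGS);
    regs[rega] = regs[regb];
}

/* Guarded rewrite: use the move only when the registers differ;
   otherwise the sequence simply clears rega.  */
void run_guarded(uint64_t regs[NUM_REGS], int rega, int regb)
{
    assert(rega >= 0 && rega < NUM_REGS && regb >= 0 && regb < NUM_REGS);
    if (rega != regb)
        regs[rega] = regs[regb];
    else
        regs[rega] = 0;
}
```

Starting from `regs[0] == 5`, calling `run_original(regs, 0, 0)` and `run_guarded(regs, 0, 0)` both leave `regs[0] == 0`, while `run_unguarded(regs, 0, 0)` leaves the stale `5` behind.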
2023-07-12 | gfortran: Allow ref'ing PDT's len() in parameter-initializer. | Andre Vehreschild | 1 | -0/+21
Fix declaring a parameter initialized using a pdt_len reference not simplifying the reference to a constant. 2023-07-12 Andre Vehreschild <vehre@gcc.gnu.org> gcc/fortran/ChangeLog: PR fortran/102003 * expr.cc (find_inquiry_ref): Replace len of pdt_string by constant. (simplify_ref_chain): Ensure input to find_inquiry_ref is NULL. (gfc_match_init_expr): Prevent PDT analysis for function calls. (gfc_pdt_find_component_copy_initializer): Get the initializer value for given component. * gfortran.h (gfc_pdt_find_component_copy_initializer): New function. * simplify.cc (gfc_simplify_len): Replace len() of PDT with pdt component ref or constant. gcc/testsuite/ChangeLog: * gfortran.dg/pdt_33.f03: New test.
2023-07-12 | tree-optimization/110630 - enhance SLP permute support | Richard Biener | 2 | -0/+16
The following enhances the existing lowpart extraction support for SLP VEC_PERM nodes to cover all vector aligned extractions. This allows the existing bb-slp-pr95839.c testcase to be vectorized with mips -mpaired-single and the new bb-slp-pr95839-3.c testcase with SSE2. PR tree-optimization/110630 * tree-vect-slp.cc (vect_add_slp_permutation): New offset parameter, honor that for the extract code generation. (vectorizable_slp_permutation_1): Handle offsetted identities. * gcc.dg/vect/bb-slp-pr95839.c: Make stricter. * gcc.dg/vect/bb-slp-pr95839-3.c: New variant testcase.
2023-07-12 | RISC-V: Support integer mult highpart auto-vectorization | Ju-Zhe Zhong | 4 | -0/+111
This patch adds an obviously missing mult_high auto-vectorization pattern.

Consider the following case:

    void __attribute__ ((noipa)) \
    mod_##TYPE (TYPE *__restrict dst, TYPE *__restrict src, int count) \
    { \
      for (int i = 0; i < count; ++i) \
        dst[i] = src[i] / 17; \
    }
    T (int32_t) \
    TEST_ALL (DEF_LOOP)

Before this patch:

    mod_int32_t:
            ble a2,zero,.L5
            li a5,17
            vsetvli a3,zero,e32,m1,ta,ma
            vmv.v.x v2,a5
    .L3:
            vsetvli a5,a2,e8,mf4,ta,ma
            vle32.v v1,0(a1)
            vsetvli a3,zero,e32,m1,ta,ma
            slli a4,a5,2
            vdiv.vv v1,v1,v2
            sub a2,a2,a5
            vsetvli zero,a5,e32,m1,ta,ma
            vse32.v v1,0(a0)
            add a1,a1,a4
            add a0,a0,a4
            bne a2,zero,.L3
    .L5:
            ret

After this patch:

    mod_int32_t:
            ble a2,zero,.L5
            li a5,2021163008
            addiw a5,a5,-1927
            vsetvli a3,zero,e32,m1,ta,ma
            vmv.v.x v3,a5
    .L3:
            vsetvli a5,a2,e8,mf4,ta,ma
            vle32.v v2,0(a1)
            vsetvli a3,zero,e32,m1,ta,ma
            slli a4,a5,2
            vmulh.vv v1,v2,v3
            sub a2,a2,a5
            vsra.vi v2,v2,31
            vsra.vi v1,v1,3
            vsub.vv v1,v1,v2
            vsetvli zero,a5,e32,m1,ta,ma
            vse32.v v1,0(a0)
            add a1,a1,a4
            add a0,a0,a4
            bne a2,zero,.L3
    .L5:
            ret

Even though a single "vdiv" is lowered into "1 vmulh + 2 vsra + 1 vsub" and 4 more instructions are generated, we believe this is much better than before, since division is very slow in hardware.

gcc/ChangeLog:
* config/riscv/autovec.md (smul<mode>3_highpart): New pattern. (umul<mode>3_highpart): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/mulh-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/mulh_run-2.c: New test.
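The "after" sequence can be read as a scalar recipe: take the high half of the product with a magic constant, shift it right by 3, and subtract the sign of the input. The sketch below mirrors that per element; `mulh32` and `div17` are hypothetical helper names, and the code assumes that `>>` on negative signed integers is an arithmetic shift (true for GCC, implementation-defined in ISO C).

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the signed 64-bit product, i.e. what vmulh.vv
   computes for each 32-bit element.  */
int32_t mulh32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 32);
}

/* Signed division by 17 without a divide instruction.  */
int32_t div17(int32_t x)
{
    /* li a5,2021163008 ; addiw a5,a5,-1927  */
    const int32_t magic = 2021163008 - 1927;
    assert(magic == 0x78787879);

    int32_t hi = mulh32(x, magic);   /* vmulh.vv v1,v2,v3 */
    int32_t sign = x >> 31;          /* vsra.vi  v2,v2,31: 0 or -1 */
    return (hi >> 3) - sign;         /* vsra.vi v1,v1,3 ; vsub.vv */
}
```

The constant is the ceiling of 2^35 / 17, so the high-part multiply followed by the shift by 3 amounts to a division by 2^35; subtracting the sign word adds 1 for negative inputs, which turns the rounding toward minus infinity into C's rounding toward zero.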
2023-07-12 | x86: make better use of VBROADCASTSS / VPBROADCASTD | Jan Beulich | 4 | -0/+62
... in vec_dupv4sf / *vec_dupv4si. The respective broadcast insns are never longer (yet sometimes shorter) than the corresponding VSHUFPS / VPSHUFD, due to the immediate operand of the shuffle insns balancing the (uniform) need for VEX3 in the broadcast ones. When EVEX encoding is respective the broadcast insns are always shorter. Add new alternatives to cover the AVX2 and AVX512 cases as appropriate. While touching this anyway, switch to consistently using "sseshuf1" in the "type" attributes for all shuffle forms. gcc/ * config/i386/sse.md (vec_dupv4sf): Make first alternative use vbroadcastss for AVX2. New AVX512F alternative. (*vec_dupv4si): New AVX2 and AVX512F alternatives using vpbroadcastd. Replace sselog1 by sseshuf1 in "type" attribute. gcc/testsuite/ * gcc.target/i386/avx2-dupv4sf.c: New test. * gcc.target/i386/avx2-dupv4si.c: Likewise. * gcc.target/i386/avx512f-dupv4sf.c: Likewise. * gcc.target/i386/avx512f-dupv4si.c: Likewise.
2023-07-12 | riscv: xtheadbb: Add sign/zero extension support for th.ext and th.extu | Christoph Müllner | 2 | -0/+134
The current support of the bitfield-extraction instructions th.ext and th.extu (XTheadBb extension) only covers sign_extract and zero_extract. This patch adds support for sign_extend and zero_extend to avoid any shifts for sign or zero extensions.

gcc/ChangeLog:
* config/riscv/riscv.md: No base-ISA extension splitter for XThead*.
* config/riscv/thead.md (*extend<SHORT:mode><SUPERQI:mode>2_th_ext): New XThead extension INSN. (*zero_extendsidi2_th_extu): New XThead extension INSN. (*zero_extendhi<GPR:mode>2_th_extu): New XThead extension INSN.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-ext-1.c: New test.
* gcc.target/riscv/xtheadbb-extu-1.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2023-07-12 | Break false dependence for vpternlog by inserting vpxor or setting constraint of input operand to '0' | liuhongt | 2 | -1/+31
A false dependency happens when the destination is only updated by pternlog. There is no false dependency when the destination is also used in the source. So either a pxor should be inserted, or the input operand should be set with constraint '0'.

gcc/ChangeLog:
PR target/110438
PR target/110202
* config/i386/predicates.md (int_float_vector_all_ones_operand): New predicate.
* config/i386/sse.md (*vmov<mode>_constm1_pternlog_false_dep): New define_insn. (*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep): Ditto. (*<avx512>_cvtmask2<ssemodesuffix><mode>_pternlog_false_dep): Ditto. (*<avx512>_cvtmask2<ssemodesuffix><mode>): Adjust to define_insn_and_split to avoid false dependence. (*<avx512>_cvtmask2<ssemodesuffix><mode>): Ditto. (<mask_codefor>one_cmpl<mode>2<mask_name>): Adjust constraint of operands 1 to '0' to avoid false dependence. (*andnot<mode>3): Ditto. (iornot<mode>3): Ditto. (*<nlogic><mode>3): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/i386/pr110438.c: New test.
* gcc.target/i386/pr100711-6.c: Adjust testcase.
2023-07-12 | Initial Granite Rapids D Support | Mo, Zewei | 2 | -0/+7
gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_intel_cpu): Handle Granite Rapids D. * common/config/i386/i386-common.cc: (processor_alias_table): Add graniterapids-d. * common/config/i386/i386-cpuinfo.h (enum processor_subtypes): Add INTEL_COREI7_GRANITERAPIDS_D. * config.gcc: Add -march=graniterapids-d. * config/i386/driver-i386.cc (host_detect_local_cpu): Handle graniterapids-d. * config/i386/i386.h: (PTA_GRANITERAPIDS_D): New. * doc/extend.texi: Add graniterapids-d. * doc/invoke.texi: Ditto. gcc/testsuite/ChangeLog: * g++.target/i386/mv16.C: Add graniterapids-d. * gcc.target/i386/funcspec-56.inc: Handle new march.
2023-07-12 | i386: Guard 128 bit VAES builtins with AVX512VL | Haochen Jiang | 1 | -0/+12
Since commit 24a8acc, the 128-bit intrinsics are enabled for VAES. However, AVX512VL is not checked until we reach the pattern, which results in an ICE. Add an AVX512VL guard at the builtin level so that an error is reported when checking ISA flags.

gcc/ChangeLog:
* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Add OPTION_MASK_ISA_AVX512VL.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512vl-vaes-1.c: New test.
2023-07-12 | Daily bump. | GCC Administrator | 1 | -0/+81
2023-07-12 | RISC-V: Optimize permutation codegen with vcompress | Ju-Zhe Zhong | 12 | -0/+1033
This patch recognizes specific permutation patterns to which the compress approach can be applied.

Consider the following case:

    typedef int8_t vnx64i __attribute__ ((vector_size (64)));
      1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23, 26, 28, 30, 31, \
      37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63, 76, 77, 78, 79, 80, 81, \
      82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, \
      100, 101, 102, 103, 104, 105, 106, 107
    void __attribute__ ((noinline, noclone))
    test_1 (int8_t *x, int8_t *y, int8_t *out)
    {
      vnx64i v1 = *(vnx64i*)x;
      vnx64i v2 = *(vnx64i*)y;
      vnx64i v3 = __builtin_shufflevector (v1, v2, MASK_64);
      *(vnx64i*)out = v3;
    }

https://godbolt.org/z/P33nev6cW

Before this patch:

    lui a4,%hi(.LANCHOR0)
    addi a4,a4,%lo(.LANCHOR0)
    vl4re8.v v4,0(a4)
    li a4,64
    vsetvli a5,zero,e8,m4,ta,mu
    vl4re8.v v20,0(a0)
    vl4re8.v v16,0(a1)
    vmv.v.x v12,a4
    vrgather.vv v8,v20,v4
    vmsgeu.vv v0,v4,v12
    vsub.vv v4,v4,v12
    vrgather.vv v8,v16,v4,v0.t
    vs4r.v v8,0(a2)
    ret

After this patch:

    lui a4,%hi(.LANCHOR0)
    addi a4,a4,%lo(.LANCHOR0)
    vsetvli a5,zero,e8,m4,ta,ma
    vl4re8.v v12,0(a1)
    vl4re8.v v8,0(a0)
    vlm.v v0,0(a4)
    vslideup.vi v4,v12,20
    vcompress.vm v4,v8,v0
    vs4r.v v4,0(a2)
    ret

gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): Add vcompress optimization.
* config/riscv/riscv-v.cc (emit_vlmax_compress_insn): Ditto. (shuffle_compress_patterns): Ditto. (expand_vec_perm_const_1): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/compress_run-6.c: New test.
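A scalar model may help show why the slide-up plus compress pair in the "after" sequence produces the same result as the general shuffle for this index list. The sketch below is not the riscv-v.cc implementation: `shuffle_reference`, `shuffle_compress` and `compress_matches_reference` are hypothetical names, the index table is copied from the example above, and the slide amount of 20 is taken directly from the `vslideup.vi` instruction shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { N = 64, SLIDE = 20 };

/* Shuffle indices from the example: values below N select from the
   first vector, values N and above select from the second.  */
static const uint8_t sel[N] = {
    1, 2, 3, 5, 7, 9, 10, 11, 12, 14, 15, 17, 19, 21, 22, 23,
    26, 28, 30, 31, 37, 38, 41, 46, 47, 53, 54, 55, 60, 61, 62, 63,
    76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
    92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107
};

/* Element-by-element meaning of the two-input shuffle.  */
void shuffle_reference(const int8_t *a, const int8_t *b, int8_t *out)
{
    for (int i = 0; i < N; i++)
        out[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
}

/* The same permutation expressed as slide-up followed by compress.  */
void shuffle_compress(const int8_t *a, const int8_t *b, int8_t *out)
{
    bool mask[N] = { false };
    int k = 0;

    /* Model of vslideup.vi: out[i + SLIDE] = b[i]; the first SLIDE
       elements of out are left as they were.  */
    for (int i = 0; i + SLIDE < N; i++)
        out[i + SLIDE] = b[i];

    /* Build the compress mask from the indices that pick from a.  */
    for (int i = 0; i < N; i++)
        if (sel[i] < N)
            mask[sel[i]] = true;

    /* Model of vcompress.vm: pack the selected elements of a, in
       order, into the low elements of out.  */
    for (int i = 0; i < N; i++)
        if (mask[i])
            out[k++] = a[i];
    assert(k <= N);
}

/* Returns 1 when both formulations agree on distinct test data.  */
int compress_matches_reference(void)
{
    int8_t a[N], b[N], ref[N], got[N] = { 0 };

    for (int i = 0; i < N; i++) {
        a[i] = (int8_t)i;
        b[i] = (int8_t)(-1 - i);
    }
    shuffle_reference(a, b, ref);
    shuffle_compress(a, b, got);
    for (int i = 0; i < N; i++)
        if (ref[i] != got[i])
            return 0;
    return 1;
}
```

The model only works because the indices that pick from the first vector are strictly increasing and the indices that pick from the second vector form one consecutive run: 32 elements are compressed from the first vector, the run starts at element 12 of the second vector, and 32 - 12 gives the slide amount of 20.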
2023-07-11 | testsuite: Skip failing analyzer tests on AIX. | David Edelsohn | 6 | -0/+6
Some of the analyzer out-of-bounds-diagram tests fail on AIX. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/out-of-bounds-diagram-4.c: Skip on AIX. * gcc.dg/analyzer/out-of-bounds-diagram-5-ascii.c: Same. * gcc.dg/analyzer/out-of-bounds-diagram-5-unicode.c: Same. * gcc.dg/analyzer/out-of-bounds-diagram-7.c: Same. * gcc.dg/analyzer/out-of-bounds-diagram-13.c: Same. * gcc.dg/analyzer/out-of-bounds-diagram-15.c: Same.
2023-07-11 | Fortran: formal symbol attributes for intrinsic procedures [PR110288] | Harald Anlauf | 1 | -0/+13
gcc/fortran/ChangeLog: PR fortran/110288 * symbol.cc (gfc_copy_formal_args_intr): When deriving the formal argument attributes from the actual ones for intrinsic procedure calls, take special care of CHARACTER arguments that we do not wrongly treat them formally as deferred-length. gcc/testsuite/ChangeLog: PR fortran/110288 * gfortran.dg/findloc_10.f90: New test.
2023-07-11 | rs6000: Update the vsx-vector-6.* tests. | Carl Love | 22 | -282/+1267
The vsx-vector-6.h file is included into the processor specific test files vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c. The .h file contains a large number of vsx vector built-in tests. The processor specific files contain the number of instructions that the tests are expected to generate for that processor. The tests are compile only. This patch reworks the tests into a series of files for related tests. The new tests consist of a runnable test to verify the built-in argument types and the functional correctness of each built-in. There is also a compile only test that verifies the built-ins generate the expected number of instructions for the various built-in tests. gcc/testsuite/ * gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file. * gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file. * gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file. * gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file. * gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file. * gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file. * gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file. * gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file. * gcc.target/powerpc/vsx-vector-6.h: Remove test file. * gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file. 
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file. * gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
2023-07-11 | testsuite: Require vectors of doubles for pr97428.c | Maciej W. Rozycki | 1 | -0/+1
The pr97428.c test assumes support for vectors of doubles, but some targets only support vectors of floats, causing this test to fail with such targets. Limit this test to targets that support vectors of doubles then. gcc/testsuite/ * gcc.dg/vect/pr97428.c: Limit to `vect_double' targets.
2023-07-11 | [modula2] Improve uninitialized variable analysis by combining basic blocks | Gaius Mulley | 2 | -0/+62
This patch combines basic blocks for static analysis of uninitialized variables providing that they are not the top of a loop, are not reached by a conditional and are not reached after a procedure call. It also avoids checking array accesses for static analysis. Finally the patch adds switch modifiers to allow static analysis to include conditional branches for subsequent basic block analysis. gcc/ChangeLog: * doc/gm2.texi (-Wuninit-variable-checking=) New item. gcc/m2/ChangeLog: * gm2-compiler/M2BasicBlock.def (InitBasicBlocksFromRange): New parameter ScopeSym. * gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): New parameter ScopeSym. (InitBasicBlocksFromRange): New parameter ScopeSym. Call ConvertQuads2BasicBlock with ScopeSym. (DisplayBasicBlocks): Uncomment. * gm2-compiler/M2Code.mod: Replace VariableAnalysis with ScopeBlockVariableAnalysis. (InitialDeclareAndOptiomize): Add parameter scope. (SecondDeclareAndOptimize): Add parameter scope. * gm2-compiler/M2GCCDeclare.mod (DeclareConstructor): Add scope parameter to DeclareTypesConstantsProceduresInRange. (DeclareTypesConstantsProceduresInRange): New parameter scope. Pass scope to DisplayQuadRange. Reformatted. * gm2-compiler/M2GenGCC.def (ConvertQuadsToTree): New parameter scope. * gm2-compiler/M2GenGCC.mod (ConvertQuadsToTree): New parameter scope. * gm2-compiler/M2Optimize.mod (KnownReachable): New parameter scope. * gm2-compiler/M2Options.def (SetUninitVariableChecking): Add arg parameter. * gm2-compiler/M2Options.mod (SetUninitVariableChecking): Add arg parameter and set boolean UninitVariableChecking and UninitVariableConditionalChecking. (UninitVariableConditionalChecking): New boolean set to FALSE. * gm2-compiler/M2Quads.def (IsGoto): New procedure function. (DisplayQuadRange): Add scope parameter. (LoopAnalysis): Add scope parameter. * gm2-compiler/M2Quads.mod: Import PutVarArrayRef. (IsGoto): New procedure function. 
(LoopAnalysis): Add scope parameter and use MetaErrorT1 instead of WarnStringAt. (BuildStaticArray): Call PutVarArrayRef. (BuildDynamicArray): Call PutVarArrayRef. (DisplayQuadRange): Add scope parameter. (GetM2OperatorDesc): Add relational condition cases. * gm2-compiler/M2Scope.def (ScopeProcedure): Add parameter. * gm2-compiler/M2Scope.mod (DisplayScope): Pass scopeSym to DisplayQuadRange. (ForeachScopeBlockDo): Pass scopeSym to p. * gm2-compiler/M2SymInit.def (VariableAnalysis): Rename to ... (ScopeBlockVariableAnalysis): ... this. * gm2-compiler/M2SymInit.mod (ScopeBlockVariableAnalysis): Add scope parameter. (bbEntry): New pointer to record. (bbArray): New array. (bbFreeList): New variable. (errorList): New list. (IssueConditional): New procedure. (GenerateNoteFlow): New procedure. (IssueWarning): New procedure. (IsUniqueWarning): New procedure. (CheckDeferredRecordAccess): Re-implement. (CheckBinary): Add warning and lst parameters. (CheckUnary): Add warning and lst parameters. (CheckXIndr): Add warning and lst parameters. (CheckIndrX): Add warning and lst parameters. (CheckBecomes): Add warning and lst parameters. (CheckComparison): Add warning and lst parameters. (CheckReadBeforeInitQuad): Add warning and lst parameters to all Check procedures. Add all case quadruple clauses. (FilterCheckReadBeforeInitQuad): Add warning and lst parameters. (CheckReadBeforeInitFirstBasicBlock): Add warning and lst parameters. (bbArrayKill): New procedure. (DumpBBEntry): New procedure. (DumpBBArray): New procedure. (DumpBBSequence): New procedure. (TestBBSequence): New procedure. (CreateBBPermultations): New procedure. (ScopeBlockVariableAnalysis): New procedure. (GetOp3): New procedure. (GenerateCFG): New procedure. (NewEntry): New procedure. (AppendEntry): New procedure. (init): Initialize bbFreeList and errorList. * gm2-compiler/SymbolTable.def (PutVarArrayRef): New procedure. (IsVarArrayRef): New procedure function. 
* gm2-compiler/SymbolTable.mod (SymVar): ArrayRef new field. (MakeVar): Set ArrayRef to FALSE. (PutVarArrayRef): New procedure. (IsVarArrayRef): New procedure function. * gm2-gcc/init.cc (_M2_M2SymInit_init): New prototype. (init_PerCompilationInit): Add call to _M2_M2SymInit_init. * gm2-gcc/m2options.h (M2Options_SetUninitVariableChecking): New definition. * gm2-lang.cc (gm2_langhook_handle_option): Add new case OPT_Wuninit_variable_checking_. * lang.opt: Wuninit-variable-checking= new entry. gcc/testsuite/ChangeLog: * gm2/switches/uninit-variable-checking/cascade/fail/cascadedif.mod: New test. * gm2/switches/uninit-variable-checking/cascade/fail/switches-uninit-variable-checking-cascade-fail.exp: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-07-11 | c++: coercing variable template from current inst [PR110580] | Patrick Palka | 1 | -0/+16
Here during ahead of time coercion of the variable template-id v1<int>, since we pass only the innermost arguments to coerce_template_parms (and outer arguments are still dependent at this point), substitution of the default template argument V=U just lowers U from level 2 to level 1 rather than replacing it with int as expected. Thus after coercion we incorrectly end up with (effectively) v1<int, T> instead of v1<int, int>.

Coercion of a class/alias template-id on the other hand always passes all levels of arguments, which avoids this issue. So this patch makes us do the same for variable template-ids.

PR c++/110580

gcc/cp/ChangeLog:
* pt.cc (lookup_template_variable): Pass all levels of arguments to coerce_template_parms, and use the parameters from the most general template.

gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/var-templ83.C: New test.
2023-07-11 | Fix typo in the testcase. | liuhongt | 1 | -1/+1
Antony Polukhin 2023-07-11 09:51:58 UTC There's a typo at https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/testsuite/g%2B%2B.target/i386/pr110170.C;h=e638b12a5ee2264ecef77acca86432a9f24b103b;hb=d41a57c46df6f8f7dae0c0a8b349e734806a837b#l87 It should be `|| !test3() || !test3r()` rather than `|| !test3() || !test4r()` gcc/testsuite/ChangeLog: PR target/110170 * g++.target/i386/pr110170.C: Fix typo.
2023-07-11 | testsuite: Unbreak pr110557.cc where long is 32-bit | Xi Ruoyao | 1 | -6/+8
On ports with 32-bit long, the test produced excess errors: gcc/testsuite/g++.dg/vect/pr110557.cc:12:8: warning: width of 'Item::y' exceeds its type Reported-by: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> gcc/testsuite/ChangeLog: * g++.dg/vect/pr110557.cc: Use long long instead of long for 64-bit type. (test): Remove an unnecessary cast.
2023-07-11 | Daily bump. | GCC Administrator | 1 | -0/+38
2023-07-10 | d: Merge upstream dmd, druntime a88e1335f7, phobos 1921d29df. | Iain Buclaw | 79 | -232/+686
D front-end changes: - Import dmd v2.104.1. - Deprecation phase ended for access to private method when overloaded with public method. D runtime changes: - Import druntime v2.104.1. - Linux input header translations were added to druntime. - Integration with the Valgrind `memcheck' tool has been added to the garbage collector. Phobos changes: - Import phobos v2.104.1. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd a88e1335f7. * dmd/VERSION: Bump version to v2.104.1. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime a88e1335f7. * src/MERGE: Merge upstream phobos 1921d29df. * config.h.in: Regenerate. * configure: Regenerate. * configure.ac (libphobos-checking): Add valgrind flag. (DRUNTIME_LIBRARIES_VALGRIND): Call. * libdruntime/Makefile.am (DRUNTIME_CSOURCES): Add etc/valgrind/valgrind_.c. (DRUNTIME_DSOURCES): Add etc/valgrind/valgrind.d. (DRUNTIME_DSOURCES_LINUX): Add core/sys/linux/input.d, core/sys/linux/input_event_codes.d, core/sys/linux/uinput.d. * libdruntime/Makefile.in: Regenerate. * m4/druntime/libraries.m4 (DRUNTIME_LIBRARIES_VALGRIND): Define.
2023-07-10 | c++: redeclare_class_template and ttps [PR110523] | Patrick Palka | 1 | -0/+15
Now that we cache level-lowered ttps we can end up processing the same ttp multiple times via (multiple calls to) redeclare_class_template, so we can't assume a ttp's DECL_CONTEXT is initially empty. PR c++/110523 gcc/cp/ChangeLog: * pt.cc (redeclare_class_template): Relax the ttp DECL_CONTEXT assert, and downgrade it to a checking assert. gcc/testsuite/ChangeLog: * g++.dg/template/ttp37.C: New test.
2023-07-10 | arm: Fix MVE intrinsics support with LTO (PR target/110268) | Christophe Lyon | 2 | -0/+35
After the recent MVE intrinsics re-implementation, LTO stopped working because the intrinsics would no longer be defined.

The main part of the patch is simple and similar to what we do for AArch64:
- call handle_arm_mve_h() from arm_init_mve_builtins to declare the intrinsics when the compiler is in LTO mode
- actually implement arm_builtin_decl for MVE.

It was just a bit tricky to handle __ARM_MVE_PRESERVE_USER_NAMESPACE: its value in the user code cannot be guessed at LTO time, so we always have to assume that it was not defined. This led to a few fixes in the way we register MVE builtins as placeholders or not. Without this patch, we would just omit some versions of the intrinsics when __ARM_MVE_PRESERVE_USER_NAMESPACE is true. In fact, like for the C/C++ placeholders, we need to always keep entries for all of them to ensure that we have a consistent numbering scheme.

2023-06-26 Christophe Lyon <christophe.lyon@linaro.org>

PR target/110268

gcc/
* config/arm/arm-builtins.cc (arm_init_mve_builtins): Handle LTO. (arm_builtin_decl): Handle MVE builtins.
* config/arm/arm-mve-builtins.cc (builtin_decl): New function. (add_unique_function): Fix handling of __ARM_MVE_PRESERVE_USER_NAMESPACE. (add_overloaded_function): Likewise.
* config/arm/arm-protos.h (builtin_decl): New declaration.

gcc/testsuite/
* gcc.target/arm/pr110268-1.c: New test.
* gcc.target/arm/pr110268-2.c: New test.
2023-07-10 | testsuite: Add _link flavor for several arm_arch* and arm* effective-targets | Christophe Lyon | 1 | -0/+27
For arm targets, we generate many effective-targets with check_effective_target_FUNC_multilib and check_effective_target_arm_arch_FUNC_multilib, which check whether we can link and execute a simple program with a given set of flags/multilibs.

In some cases, however, it's possible to link but not to execute a program, so this patch adds similar _link effective-targets which only check whether linking succeeds.

The patch does not update the documentation, as it already lacks the numerous existing related effective-targets.

2023-07-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* lib/target-supports.exp (arm_*FUNC_link): New effective-targets.
2023-07-10 | vect: Fix vectorized BIT_FIELD_REF for signed bit-fields [PR110557] | Xi Ruoyao | 1 | -0/+37
If a bit-field is signed and it's wider than the output type, we must ensure the extracted result is sign-extended. But this was not handled correctly.

For example:

    int x : 8;
    long y : 55;
    bool z : 1;

The vectorized extraction of y was:

    vect__ifc__49.29_110 = MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.27_108];
    vect_patt_38.30_112 = vect__ifc__49.29_110 & { 9223372036854775552, 9223372036854775552 };
    vect_patt_39.31_113 = vect_patt_38.30_112 >> 8;
    vect_patt_40.32_114 = VIEW_CONVERT_EXPR<vector(2) long int>(vect_patt_39.31_113);

This is obviously incorrect. This patch implements it as:

    vect__ifc__25.16_62 = MEM <vector(2) long unsigned int> [(struct Item *)vectp_a.14_60];
    vect_patt_31.17_63 = VIEW_CONVERT_EXPR<vector(2) long int>(vect__ifc__25.16_62);
    vect_patt_32.18_64 = vect_patt_31.17_63 << 1;
    vect_patt_33.19_65 = vect_patt_32.18_64 >> 9;

gcc/ChangeLog:
PR tree-optimization/110557
* tree-vect-patterns.cc (vect_recog_bitfield_ref_pattern): Ensure the output is sign-extended if necessary.

gcc/testsuite/ChangeLog:
PR tree-optimization/110557
* g++.dg/vect/pr110557.cc: New test.
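The difference between the two sequences can be reproduced on a single 64-bit word. This is a scalar sketch, not the vectorizer's code: `pack_y`, `extract_y_masked` and `extract_y_shifted` are hypothetical names, the layout (x in bits 0-7, y in bits 8-62, z in bit 63) is an assumption matching the mask and shift counts quoted above, and the conversion to `int64_t` plus the arithmetic right shift rely on GCC's two's-complement behaviour.

```c
#include <assert.h>
#include <stdint.h>

enum { Y_SHIFT = 8, Y_WIDTH = 55 };

/* Place a signed 55-bit value into bits 8..62 of a word.  */
uint64_t pack_y(int64_t y)
{
    uint64_t field_mask = (UINT64_C(1) << Y_WIDTH) - 1;
    return ((uint64_t)y & field_mask) << Y_SHIFT;
}

/* Old sequence: mask, logical shift, then reinterpret as signed.
   The upper 9 bits are always zero, so negative values are lost.  */
int64_t extract_y_masked(uint64_t word)
{
    uint64_t masked = word & UINT64_C(9223372036854775552);
    return (int64_t)(masked >> Y_SHIFT);
}

/* New sequence: move the field's sign bit to bit 63 (shift left by 1),
   then shift right arithmetically by 9 so the sign is replicated.  */
int64_t extract_y_shifted(uint64_t word)
{
    assert(Y_SHIFT + Y_WIDTH <= 64);
    int64_t top_aligned = (int64_t)(word << (64 - Y_SHIFT - Y_WIDTH));
    return top_aligned >> (64 - Y_WIDTH);
}
```

For a field holding -1, `extract_y_shifted` returns -1 while `extract_y_masked` returns 2^55 - 1; for non-negative field values both agree. The left shift is done on the unsigned word here because shifting a negative signed value left is undefined in C, whereas the GIMPLE above performs it on the signed vector type.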
2023-07-10 | i386: Add new insvti_lowpart_1 and insvdi_lowpart_1 patterns. | Roger Sayle | 2 | -0/+26
This patch implements another of Uros' suggestions, to investigate a insvti_lowpart_1 pattern to improve TImode parameter passing on x86_64. In PR 88873, the RTL the middle-end expands for passing V2DF in TImode is subtly different from what it does for V2DI in TImode, sufficiently so that my explanations for why insvti_lowpart_1 isn't required don't apply in this case. This patch adds an insvti_lowpart_1 pattern, complementing the existing insvti_highpart_1 pattern, and also a 32-bit variant, insvdi_lowpart_1. Because the middle-end represents 128-bit constants using CONST_WIDE_INT and 64-bit constants using CONST_INT, it's easiest to treat these as different patterns, rather than attempt <dwi> parameterization. This patch also includes a peephole2 (actually a pair) to transform xchg instructions into mov instructions, when one of the destinations is unused. This optimization is required to produce the optimal code sequences below. For the 64-bit case: __int128 foo(__int128 x, unsigned long long y) { __int128 m = ~((__int128)~0ull); __int128 t = x & m; __int128 r = t | y; return r; } Before: xchgq %rdi, %rsi movq %rdx, %rax xorl %esi, %esi xorl %edx, %edx orq %rsi, %rax orq %rdi, %rdx ret After: movq %rdx, %rax movq %rsi, %rdx ret For the 32-bit case: long long bar(long long x, int y) { long long mask = ~0ull << 32; long long t = x & mask; long long r = t | (unsigned int)y; return r; } Before: pushl %ebx movl 12(%esp), %edx xorl %ebx, %ebx xorl %eax, %eax movl 16(%esp), %ecx orl %ebx, %edx popl %ebx orl %ecx, %eax ret After: movl 12(%esp), %eax movl 8(%esp), %edx ret 2023-07-10 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (peephole2): Transform xchg insn with a REG_UNUSED note to a (simple) move. (*insvti_lowpart_1): New define_insn_and_split. (*insvdi_lowpart_1): Likewise. gcc/testsuite/ChangeLog * gcc.target/i386/insvdi_lowpart-1.c: New test case. * gcc.target/i386/insvti_lowpart-1.c: Likewise.
2023-07-10i386: Add AVX512 support for STV of SI/DImode rotation by constant.Roger Sayle1-0/+35
Following Uros' suggestion, this patch adds support for AVX512VL's vpro[lr][dq] instructions to the recently added scalar-to-vector (STV) enhancements to handle DImode and SImode rotations by a constant. For the test cases: unsigned long long rot1(unsigned long long x) { return (x>>1) | (x<<63); } void mem1(unsigned long long *p) { *p = rot1(*p); } with -m32 -O2 -mavx512vl, we currently generate: rot1: movl 4(%esp), %eax movl 8(%esp), %edx movl %eax, %ecx shrdl $1, %edx, %eax shrdl $1, %ecx, %edx ret mem1: movl 4(%esp), %eax vmovq (%eax), %xmm0 vpshufd $20, %xmm0, %xmm0 vpsrlq $1, %xmm0, %xmm0 vpshufd $136, %xmm0, %xmm0 vmovq %xmm0, (%eax) ret with this patch, we now generate: rot1: vmovq 4(%esp), %xmm0 vprorq $1, %xmm0, %xmm0 vmovd %xmm0, %eax vpextrd $1, %xmm0, %edx ret mem1: movl 4(%esp), %eax vmovq (%eax), %xmm0 vprorq $1, %xmm0, %xmm0 vmovq %xmm0, (%eax) ret 2023-07-10 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain): Tweak gains/costs for ROTATE/ROTATERT by integer constant on AVX512VL. (general_scalar_chain::convert_rotate): On TARGET_AVX512F generate avx512vl_rolv2di or avx512vl_rolv4si when appropriate. gcc/testsuite/ChangeLog * gcc.target/i386/avx512vl-stv-rotatedi-1.c: New test case.
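For readers less familiar with the idiom, rot1 above is a rotate right by one bit, which is the form the compiler recognizes as ROTATERT. A minimal sketch of the general idiom, with rotr64 as a hypothetical helper name, might look like this:

```c
#include <stdint.h>

/* Rotate X right by N bits.  Both shift counts are masked so that the
   expression stays well defined for every N, including 0 and 64.
   rotr64 is an illustrative name, not something from the patch.  */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    n &= 63;
    return (x >> n) | (x << ((64 - n) & 63));
}

/* The test case from the commit message, using a fixed-width type.  */
static uint64_t rot1(uint64_t x)
{
    return (x >> 1) | (x << 63);
}
```

With a constant count, rotr64(x, 1) and rot1(x) compute the same value, so either spelling describes the rotation that the vprorq sequences above implement.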
2023-07-10d: Merge upstream dmd, druntime 17ccd12af3, phobos 8d3800bee.Iain Buclaw67-109/+486
D front-end changes: - Import dmd v2.104.0. - Assignment-style syntax is now allowed for `alias this'. - Overloading `extern(C)' functions is now an error. D runtime changes: - Import druntime v2.104.0. Phobos changes: - Import phobos v2.104.0. - Better static assert messages when instantiating `std.algorithm.iteration.permutations' with wrong inputs. - Added `std.system.instructionSetArchitecture' and `std.system.ISA'. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 17ccd12af3. * dmd/VERSION: Bump version to v2.104.0. * Make-lang.in (D_FRONTEND_OBJS): Rename d/apply.o to d/postordervisitor.o. * d-codegen.cc (make_location_t): Update for new front-end interface. (build_filename_from_loc): Likewise. (build_assert_call): Likewise. (build_array_bounds_call): Likewise. (build_bounds_index_condition): Likewise. (build_bounds_slice_condition): Likewise. (build_frame_type): Likewise. (get_frameinfo): Likewise. * d-diagnostic.cc (d_diagnostic_report_diagnostic): Likewise. * decl.cc (build_decl_tree): Likewise. (start_function): Likewise. * expr.cc (ExprVisitor::visit (NewExp *)): Replace code generation of `new pointer' with front-end lowering. * runtime.def (NEWITEMT): Remove. (NEWITEMIT): Remove. * toir.cc (IRVisitor::visit (LabelStatement *)): Update for new front-end interface. * typeinfo.cc (check_typeinfo_type): Likewise. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 17ccd12af3. * src/MERGE: Merge upstream phobos 8d3800bee. gcc/testsuite/ChangeLog: * gdc.dg/asm4.d: Update test.
2023-07-10Add pre_reload splitter to detect fp min/max pattern.liuhongt2-0/+111
We have ix86_expand_sse_fp_minmax to detect min/max semantics, but it requires rtx_equal_p for cmp_op0/cmp_op1 and if_true/if_false. For the testcase in the PR, there's an extra move from cmp_op0 to if_true, so ix86_expand_sse_fp_minmax fails to match. This patch adds a pre_reload splitter to detect the min/max pattern. Operand order in MINSS matters for signed zeros and NaNs, since the instruction always returns the second operand when any operand is NaN or both operands are zero. gcc/ChangeLog: PR target/110170 * config/i386/i386.md (*ieee_max<mode>3_1): New pre_reload splitter to detect fp max pattern. (*ieee_min<mode>3_1): Ditto, but for fp min pattern. gcc/testsuite/ChangeLog: * g++.target/i386/pr110170.C: New test. * gcc.target/i386/pr110170.c: New test.
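The operand-order point can be illustrated with a small scalar model. The sketch below assumes the usual description of MINSS (the result is the first operand only when it compares strictly less than the second, otherwise the second operand); minss_model is a hypothetical name and this is not the code from the patch.

```c
#include <math.h>

/* Scalar model of a MINSS-style minimum: return A only if A < B,
   otherwise return B.  Any comparison involving a NaN is false, and
   0.0 < -0.0 is false too, so in those cases the second operand wins.
   minss_model is an illustrative name only.  */
static float minss_model(float a, float b)
{
    return a < b ? a : b;
}
```

Under this model minss_model(1.0f, NAN) is NaN while minss_model(NAN, 1.0f) is 1.0f, and minss_model(0.0f, -0.0f) is -0.0f while minss_model(-0.0f, 0.0f) is +0.0f, which is why the operands cannot be swapped freely when matching the pattern.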
2023-07-10Daily bump.GCC Administrator1-0/+9
2023-07-09d: Merge upstream dmd, druntime 28a3b24c2e, phobos 8ab95ded5.Iain Buclaw81-160/+680
D front-end changes: - Import dmd v2.104.0-beta.1. - Better error message when attribute inference fails down the call stack. - Using `;' as an empty statement has been turned into an error. - Using `in' parameters with non- `extern(D)' or `extern(C++)' functions is deprecated. - `in ref' on parameters has been deprecated in favor of `-preview=in'. - Throwing `immutable', `const', `inout', and `shared' qualified objects is now deprecated. - User Defined Attributes now parse Template Arguments. D runtime changes: - Import druntime v2.104.0-beta.1. Phobos changes: - Import phobos v2.104.0-beta.1. - Better static assert messages when instantiating `std.algorithm.comparison.clamp' with wrong inputs. - `std.typecons.Rebindable' now supports all types. gcc/d/ChangeLog: * dmd/MERGE: Merge upstream dmd 28a3b24c2e. * dmd/VERSION: Bump version to v2.104.0-beta.1. * d-codegen.cc (build_bounds_slice_condition): Update for new front-end interface. * d-lang.cc (d_init_options): Likewise. (d_handle_option): Likewise. (d_post_options): Initialize global.compileEnv. * expr.cc (ExprVisitor::visit (CatExp *)): Replace code generation with new front-end lowering. (ExprVisitor::visit (LoweredAssignExp *)): New method. (ExprVisitor::visit (StructLiteralExp *)): Don't generate static initializer symbols for structs defined in C sources. * runtime.def (ARRAYCATT): Remove. (ARRAYCATNTX): Remove. libphobos/ChangeLog: * libdruntime/MERGE: Merge upstream druntime 28a3b24c2e. * src/MERGE: Merge upstream phobos 8ab95ded5. gcc/testsuite/ChangeLog: * gdc.dg/rtti1.d: Move array concat testcase to ... * gdc.dg/nogc1.d: ... here. New test.
2023-07-09Improve dumping of profile_countJan Hubicka1-1/+1
Dumps of profile_counts are quite hard to interpret since they are 64bit fixed point values. In many cases one looks at a single function and it is better to think of basic block frequency, that is how many times it is executed each invocation. This patch makes CFG dumps also print this info. For example: main() { for (int i = 0; i < 10; i++) t(); } the -fdump-tree-optimized-blocks-details now prints: int main () { unsigned int ivtmp_1; unsigned int ivtmp_2; ;; basic block 2, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot ;; prev block 0, next block 3, flags: (NEW, VISITED) ;; pred: ENTRY [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) ;; succ: 3 [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) ;; basic block 3, loop depth 1, count 976138697 (estimated locally, freq 10.0011), maybe hot ;; prev block 2, next block 4, flags: (NEW, VISITED) ;; pred: 3 [90.0% (guessed)] count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE) ;; 2 [always] count:97603128 (estimated locally, freq 1.0000) (FALLTHRU,EXECUTABLE) # ivtmp_2 = PHI <ivtmp_1(3), 10(2)> t (); ivtmp_1 = ivtmp_2 + 4294967295; if (ivtmp_1 != 0) goto <bb 3>; [90.00%] else goto <bb 4>; [10.00%] ;; succ: 3 [90.0% (guessed)] count:878535568 (estimated locally, freq 9.0011) (TRUE_VALUE,EXECUTABLE) ;; 4 [10.0% (guessed)] count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE) ;; basic block 4, loop depth 0, count 97603128 (estimated locally, freq 1.0000), maybe hot ;; prev block 3, next block 1, flags: (NEW, VISITED) ;; pred: 3 [10.0% (guessed)] count:97603129 (estimated locally, freq 1.0000) (FALSE_VALUE,EXECUTABLE) return 0; ;; succ: EXIT [always] count:97603128 (estimated locally, freq 1.0000) (EXECUTABLE) } This makes it easier to see that the inner bb is executed 10 times per invocation. gcc/ChangeLog: * cfg.cc (check_bb_profile): Dump counts with relative frequency. (dump_edge_info): Likewise. 
(dump_bb_info): Likewise. * profile-count.cc (profile_count::dump): Add comma between quality and freq. gcc/testsuite/ChangeLog: * gcc.dg/predict-22.c: Update template.
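The freq values in the dump above are consistent with dividing each count by the count of the function entry block. The sketch below reproduces that arithmetic; it is an assumption about how the number can be read, not a copy of what profile_count::dump does internally, and block_freq is a hypothetical helper name.

```c
#include <stdint.h>

/* Executions of a block per invocation of the function, computed as the
   block count divided by the entry block count.  Returns 0.0 when the
   entry count is zero.  block_freq is an illustrative name only.  */
static double block_freq(uint64_t bb_count, uint64_t entry_count)
{
    if (entry_count == 0)
        return 0.0;
    return (double)bb_count / (double)entry_count;
}
```

With the numbers from the dump, block_freq(976138697, 97603128) is roughly 10.0011 for the loop body and block_freq(878535568, 97603128) is roughly 9.0011 for the back edge.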
2023-07-09Daily bump.GCC Administrator1-0/+31
2023-07-08Add missing profile_dump checkJan Hubicka1-0/+6
gcc/ChangeLog: PR tree-optimization/110600 * cfgloopmanip.cc (scale_loop_profile): Add missing profile_dump check. gcc/testsuite/ChangeLog: PR tree-optimization/110600 * gcc.c-torture/compile/pr110600.c: New test.
2023-07-08Fortran: Fix default type bugs in gfortran [PR99139, PR99368]Paul Thomas2-0/+41
2023-07-08 Steve Kargl <sgk@troutmask.apl.washington.edu> gcc/fortran PR fortran/99139 PR fortran/99368 * match.cc (gfc_match_namelist): Check for host associated or defined types before applying default type. (gfc_match_select_rank): Apply default type to selector of unknown type if possible. * resolve.cc (resolve_fl_variable): Do not apply local default initialization to assumed rank entities. gcc/testsuite/ PR fortran/99139 * gfortran.dg/pr99139.f90 : New test PR fortran/99368 * gfortran.dg/pr99368.f90 : New test
2023-07-08Fix tree-ssa/update-cunroll.cJan Hubicka2-2/+2
In this testcase the profile is misupdated because the loop has two exits. The first exit is the one eliminated by complete unrolling while the second exit remains. We remove the first exit but forget about the fact that the source BB of the other exit will then have a higher frequency, making the other exit more likely. This patch fixes that in duplicate_loop_body_to_header_edge. While looking into the resulting profiles I also noticed that in some cases scale_loop_profile may drop probabilities to 0 incorrectly, either when trying to update an exit from a nested loop (which has a similar problem) or when the profile was inconsistent as described in the comment below. gcc/ChangeLog: PR middle-end/110590 * cfgloopmanip.cc (scale_loop_profile): Avoid scaling exits within inner loops and be more careful about inconsistent profiles. (duplicate_loop_body_to_header_edge): Fix profile update when eliminated exit is followed by other exit. gcc/testsuite/ChangeLog: PR middle-end/110590 * gcc.dg/tree-prof/update-cunroll-2.c: Remove xfail. * gcc.dg/tree-ssa/update-cunroll.c: Likewise.
2023-07-08Fortran: fixes for procedures with ALLOCATABLE,INTENT(OUT) arguments [PR92178]Harald Anlauf3-0/+166
gcc/fortran/ChangeLog: PR fortran/92178 * trans-expr.cc (gfc_conv_procedure_call): Check procedures for allocatable dummy arguments with INTENT(OUT) and move deallocation of actual arguments after evaluation of argument expressions before the procedure is executed. gcc/testsuite/ChangeLog: PR fortran/92178 * gfortran.dg/intent_out_16.f90: New test. * gfortran.dg/intent_out_17.f90: New test. * gfortran.dg/intent_out_18.f90: New test. Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
2023-07-08Fortran: simplification of FINDLOC for constant complex arguments [PR110585]Harald Anlauf1-0/+19
gcc/fortran/ChangeLog: PR fortran/110585 * arith.cc (gfc_compare_expr): Handle equality comparison of constant complex gfc_expr arguments. gcc/testsuite/ChangeLog: PR fortran/110585 * gfortran.dg/findloc_9.f90: New test.
2023-07-08Daily bump.GCC Administrator1-0/+77
2023-07-07Fix fallout from re-enabling profile consistency checks.Jan Hubicka5-9/+9
gcc/testsuite/ChangeLog: * gcc.dg/pr43864-2.c: Avoid matching pre dump with details-blocks. * gcc.dg/pr43864-3.c: Likewise. * gcc.dg/pr43864-4.c: Likewise. * gcc.dg/pr43864.c: Likewise. * gcc.dg/unroll-7.c: xfail.
2023-07-07Collect both user and kernel events for autofdo tests and autoprofiledbootstrapEugene Rozenfeld1-1/+1
When we collect just user events for autofdo with lbr we get some events where branch sources are kernel addresses and branch targets are user addresses. Without kernel MMAP events create_gcov can't make sense of kernel addresses. Currently create_gcov fails if it can't map at least 95% of events. We sometimes get below this threshold with just user events. The change is to collect both user events and kernel events. Tested on x86_64-pc-linux-gnu. ChangeLog: * Makefile.in: Collect both kernel and user events for autofdo * Makefile.tpl: Collect both kernel and user events for autofdo gcc/testsuite/ChangeLog: * lib/target-supports.exp: Collect both kernel and user events for autofdo
2023-07-07i386: Improve __int128 argument passing (in ix86_expand_move).Roger Sayle2-0/+18
Passing 128-bit integer (TImode) parameters on x86_64 can sometimes result in surprising code. Consider the example below (from PR 43644): unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x+y; } which currently results in 6 consecutive movq instructions: foo: movq %rsi, %rax movq %rdi, %rsi movq %rdx, %rcx movq %rax, %rdi movq %rsi, %rax movq %rdi, %rdx addq %rcx, %rax adcq $0, %rdx ret The underlying issue is that during RTL expansion, we generate the following initial RTL for the x argument: (insn 4 3 5 2 (set (reg:TI 85) (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1 (nil)) (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8) (reg:DI 87)) "pr43644-2.c":5:1 -1 (nil)) (insn 6 5 7 2 (set (reg/v:TI 84 [ x ]) (reg:TI 85)) "pr43644-2.c":5:1 -1 (nil)) which by combine/reload becomes (insn 25 3 22 2 (set (reg/v:TI 84 [ x ]) (const_int 0 [0])) "pr43644-2.c":5:1 -1 (nil)) (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0) (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 93) (nil))) (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8) (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 94) (nil))) where the heavy use of SUBREG SET_DESTs creates challenges for both combine and register allocation. The improvement proposed here is to avoid these problematic SUBREGs by adding (two) special cases to ix86_expand_move. For insn 4, which sets a TImode destination from a paradoxical SUBREG, to assign the lowpart, we can use an explicit zero extension (zero_extendditi2 was added in July 2022), and for insn 5, which sets the highpart of a TImode register we can use the *insvti_highpart_1 instruction (that was added in May 2023, after being approved for stage1 in January). This allows combine to work its magic, merging these insns into a *concatditi3 and from there into other optimized forms. 
So for the test case above, we now generate only a single movq: foo: movq %rdx, %rax xorl %edx, %edx addq %rdi, %rax adcq %rsi, %rdx ret But there is a little bad news. This patch causes two (minor) missed optimization regressions on x86_64; gcc.target/i386/pr82580.c and gcc.target/i386/pr91681-1.c. As shown in the test case above, we're no longer generating adcq $0, but instead using xorl. For the other FAIL, register allocation now has more freedom and is (arbitrarily) choosing a register assignment that doesn't match what the test is expecting. These issues are easier to explain and fix once this patch is in the tree. The good news is that this approach fixes a number of long standing issues, that need to be checked in bugzilla, including PR target/110533 which was just opened/reported earlier this week. 2023-07-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/43644 PR target/110533 * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of TImode destinations from paradoxical SUBREGs (setting the lowpart) into explicit zero extensions. Use *insvti_highpart_1 instruction to set the highpart of a TImode destination. gcc/testsuite/ChangeLog PR target/43644 PR target/110533 * gcc.target/i386/pr110533.c: New test case. * gcc.target/i386/pr43644-2.c: Likewise.
2023-07-07d: Fix PR 108842: Cannot use enum array with -fno-druntimeIain Buclaw2-0/+15
Restrict the generating of CONST_DECLs for D manifest constants to just scalars without pointers. It shouldn't happen that a reference to a manifest constant has not been expanded within a function body during codegen, but it has been found to occur in older versions of the D front-end (PR98277), so if the decl of a non-scalar constant is requested, just return its initializer as an expression. PR d/108842 gcc/d/ChangeLog: * decl.cc (DeclVisitor::visit (VarDeclaration *)): Only emit scalar manifest constants. (get_symbol_decl): Don't generate CONST_DECL for non-scalar manifest constants. * imports.cc (ImportVisitor::visit (VarDeclaration *)): New method. gcc/testsuite/ChangeLog: * gdc.dg/pr98277.d: Add more tests. * gdc.dg/pr108842.d: New test.
2023-07-07Fix some profile consistency testcasesJan Hubicka25-29/+33
Information about profile mismatches has been printed only with -details-blocks for some time. I think it should be printed even by default to make it easier to spot when someone introduces a new transform that breaks the profile, but I will send a separate RFC for that. This patch enables details in all testcases that grep for Invalid sum. There are 4 testcases which fail: gcc.dg/tree-ssa/loop-ch-profile-1.c here the problem is that loop header duplication introduces a loop invariant conditional that is later updated by tree-ssa-dom but dom does not take care of updating the profile. Since loop-ch knows when it duplicates a loop invariant, we may be able to get this right. The test is still useful since it tests that right after ch the profile is consistent. gcc.dg/tree-prof/update-cunroll-2.c This is about the profile updating code in duplicate_loop_body_to_header_edge being wrong when the optimized out exit is not last in the loop. In that case the probability of later exits needs to be accounted for. I will think about making this better - in general this does not seem to have an easy solution, but for the special case of chained tests we can definitely account for the later exits. gcc.dg/tree-ssa/update-unroll-1.c This fails after aprefetch invoked unrolling. I did not look into details yet. gcc.dg/tree-prof/update-unroll-2.c This one seems similar to the previous one. I decided to xfail these tests and deal with them incrementally, and filed PR110590. gcc/testsuite/ChangeLog: * g++.dg/tree-prof/indir-call-prof.C: Add block-details to dump flags. * gcc.dg/pr43864-2.c: Likewise. * gcc.dg/pr43864-3.c: Likewise. * gcc.dg/pr43864-4.c: Likewise. * gcc.dg/pr43864.c: Likewise. * gcc.dg/tree-prof/cold_partition_label.c: Likewise. * gcc.dg/tree-prof/indir-call-prof.c: Likewise. * gcc.dg/tree-prof/update-cunroll-2.c: Likewise. * gcc.dg/tree-prof/update-tailcall.c: Likewise. * gcc.dg/tree-prof/val-prof-1.c: Likewise. * gcc.dg/tree-prof/val-prof-2.c: Likewise. * gcc.dg/tree-prof/val-prof-3.c: Likewise. 
* gcc.dg/tree-prof/val-prof-4.c: Likewise. * gcc.dg/tree-prof/val-prof-5.c: Likewise. * gcc.dg/tree-ssa/fnsplit-1.c: Likewise. * gcc.dg/tree-ssa/loop-ch-profile-2.c: Likewise. * gcc.dg/tree-ssa/update-threading.c: Likewise. * gcc.dg/tree-ssa/update-unswitch-1.c: Likewise. * gcc.dg/unroll-7.c: Likewise. * gcc.dg/unroll-8.c: Likewise. * gfortran.dg/pr25623-2.f90: Likewise. * gfortran.dg/pr25623.f90: Likewise. * gcc.dg/tree-ssa/loop-ch-profile-1.c: Likewise; xfail. * gcc.dg/tree-ssa/update-cunroll.c: Likewise; xfail. * gcc.dg/tree-ssa/update-unroll-1.c: Likewise; xfail.
2023-07-07Fix epilogue loop profileJan Hubicka1-0/+9
Fix two bugs in scale_loop_profile which crept in during my cleanups and curiously enough did not show on the testcases we have so far. The patch also adds the missing call to cap iteration count of the vectorized loop epilogues. Vectorizer profile needs more work, but I am trying to chase out obvious bugs first so the profile quality statistics become meaningful and we can try to improve on them. Now we get:
Pass dump id and name | static mismatch | dynamic mismatch
                      | in count        | in count
107t cunrolli         |       3      +3 |     17251    +17251
116t vrp              |       5      +2 |     30908    +16532
118t dce              |       3      -2 |     17251    -13657
127t ch               |      13     +10 |     17251
131t dom              |      39     +26 |     17251
133t isolate-paths    |      47      +8 |     17251
134t reassoc          |      49      +2 |     17251
136t forwprop         |      53      +4 |    202501   +185250
159t cddce            |      61      +8 |    216211    +13710
161t ldist            |      62      +1 |    216211
172t ifcvt            |      66      +4 |    373711   +157500
173t vect             |     143     +77 |   9801947  +9428236
176t cunroll          |     149      +6 |  12006408  +2204461
183t loopdone         |     146      -3 |  11944469    -61939
195t fre              |     142      -4 |  11944469
197t dom              |     141      -1 |  13038435  +1093966
199t threadfull       |     143      +2 |  13246410   +207975
200t vrp              |     145      +2 |  13444579   +198169
204t dce              |     143      -2 |  13371315    -73264
206t sink             |     141      -2 |  13371315
211t cddce            |     147      +6 |  13372755     +1440
255t optimized        |     145      -2 |  13372755
256r expand           |     141      -4 |  13371197     -1558
258r into_cfglayout   |     139      -2 |  13371197
275r loop2_unroll     |     143      +4 |  16792056  +3420859
291r ce2              |     141      -2 |  16811462
312r pro_and_epilogue |     161     +20 |  16873400    +61938
315r jump2            |     167      +6 |  20910158  +4036758
323r bbro             |     160      -7 |  16559844  -4350314
Vect still introduces 77 profile mismatches (same as without this patch) however subsequent cunroll works much better with 6 new mismatches compared to 78. Overall it reduces 229 mismatches to 160. Also overall runtime estimate is now reduced by 6.9%. Previously the overall runtime estimate grew by 11% which was a result of the fact that the epilogue profile was pretty much the same as the profile of the original loop. Bootstrapped/regtested x86_64-linux, committed. 
gcc/ChangeLog: * cfgloopmanip.cc (scale_loop_profile): Fix computation of count_in and scaling blocks after exit. * tree-vect-loop-manip.cc (vect_do_peeling): Scale loop profile of the epilogue if bound is known. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/vect-profile-upate.c: New test.
2023-07-07IBM Z: Fix vec_init default expanderJuergen Christ1-0/+17
Do not reinitialize vector lanes to zero since they are already initialized to zero. gcc/ChangeLog: * config/s390/s390.cc (vec_init): Fix default case gcc/testsuite/ChangeLog: * gcc.target/s390/vector/vec-init-3.c: New test.
2023-07-07LRA: Refine reload pseudo classVladimir N. Makarov1-0/+14
For the given testcase a reload pseudo happened to occur only in reload insns created on one constraint sub-pass. Therefore its initial class (ALL_REGS) was not refined and the reload insns were not processed on the next constraint sub-passes. This resulted in a wrong insn. PR rtl-optimization/110372 gcc/ChangeLog: * lra-assigns.cc (assign_by_spills): Add reload insns involving reload pseudos with non-refined class to be processed on the next sub-pass. * lra-constraints.cc (enough_allocatable_hard_regs_p): New func. (in_class_p): Use it. (print_curr_insn_alt): New func. (process_alt_operands): Use it. Improve debug info. (curr_insn_transform): Use print_curr_insn_alt. Refine reload pseudo class if it is not refined yet. gcc/testsuite/ChangeLog: * gcc.target/i386/pr110372.c: New.