Age, Commit message, Author, Files, Lines
2023-04-26recog.cc: Correct comments referring to parameter match_lenHans-Peter Nilsson1-2/+2
* recog.cc (peep2_attempt, peep2_update_life): Correct head-comment description of parameter match_len.
2023-04-25Regenerate gcc.potJoseph Myers1-4089/+4190
* gcc.pot: Regenerate.
2023-04-25c++: value dependence of by-ref lambda capture [PR108975]Patrick Palka2-3/+25
We are still ICEing on the generic lambda version of the testcase from this PR, even after r13-6743-g6f90de97634d6f, due to the by-ref capture of the constant local variable 'dim' being considered value-dependent when regenerating the lambda (at which point processing_template_decl is set since the lambda is generic), which prevents us from constant folding its uses. Later during prune_lambda_captures we end up not thoroughly walking the body of the lambda and overlook the (non-folded) uses of 'dim' within the array bound and using-decls. We could fix this by making prune_lambda_captures walk the body of the lambda more thoroughly so that it finds these uses of 'dim', but ideally we should be able to constant fold all uses of 'dim' ahead of time and prune the implicit capture after all. To that end this patch makes value_dependent_expression_p return false for such by-ref captures of constant local variables, allowing their uses to get constant folded ahead of time. It seems we just need to disable the predicate's conservative early exit for reference variables (added by r5-5022-g51d72abe5ea04e) when DECL_HAS_VALUE_EXPR_P. This effectively makes us treat by-value and by-ref captures more consistently when it comes to value dependence. PR c++/108975 gcc/cp/ChangeLog: * pt.cc (value_dependent_expression_p) <case VAR_DECL>: Suppress conservative early exit for reference variables when DECL_HAS_VALUE_EXPR_P. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-const11a.C: New test.
2023-04-25riscv: relax splitter restrictions for creating pseudosVineet Gupta3-34/+24
[partial addressing of PR/109279]

RISC-V splitters have restrictions to not create pseudos due to a combine limitation. And despite this being a split-during-combine limitation, all split passes take the hit due to the way define*_split are used in gcc.

With the original combine issue being fixed by 61bee6aed2 ("combine: Don't record for UNDO_MODE pointers into regno_reg_rtx array [PR104985]"), the RV splitters can now be relaxed. This improves the codegen in general, e.g.

    long long f(void) { return 0x0101010101010101ull; }

Before:

    li   a0,0x01010000
    addi a0,a0,0x0101
    slli a0,a0,16
    addi a0,a0,0x0101
    slli a0,a0,16
    addi a0,a0,0x0101
    ret

With patch:

    li   a5,0x01010000
    addi a5,a5,0x0101
    mv   a0,a5
    slli a5,a5,32
    add  a0,a5,a0
    ret

This reduces the qemu icounts, even if slightly, across SPEC2017.

    benchmark        workload  icount before   icount with patch  reduction
    500.perlbench_r  0         1235310737733   1231742384460      0.29%
                     1         744489708820    743515759958
                     2         714072106766    712875768625       0.17%
    502.gcc_r        0         197365353269    197178223030
                     1         235614445254    235465240341
                     2         226769189971    226604663947
                     3         188315686133    188123584015
                     4         289372107644    289187945424
    503.bwaves_r     0         326291538768    326291539697
                     1         515809487294    515809488863
                     2         401647004144    401647005463
                     3         488750661035    488750662484
    505.mcf_r        0         681926695281    681925418147
    507.cactuBSSN_r  0         3832240965352   3832226068734
    508.namd_r       0         1919838790866   1919832527292
    510.parest_r     0         3515999635520   3515878553435
    511.povray_r     0         3073889223775   3074758622749
    519.lbm_r        0         1194077464296   1194077464041
    520.omnetpp_r    0         1014144252460   1011530791131      0.26%
    521.wrf_r        0         3966715533120   3966265425092
    523.xalancbmk_r  0         1064914296949   1064506711802
    525.x264_r       0         509290028335    509258131632
                     1         2001424246635   2001677767181
                     2         1914660798226   1914869407575
    526.blender_r    0         1726083839515   1725974286174
    527.cam4_r       0         2336526136415   2333656336419
    531.deepsjeng_r  0         1689007489539   1686541299243      0.15%
    538.imagick_r    0         3247960667520   3247942048723
    541.leela_r      0         2072315300365   2070248271250
    544.nab_r        0         1527909091282   1527906483039
    548.exchange2_r  0         2086120304280   2086314757502
    549.fotonik3d_r  0         2261694058444   2261670330720
    554.roms_r       0         2640547903140   2640512733483
    557.xz_r         0         388736881767    386880875636       0.48%
                     1         959356981818    959993132842
                     2         547643353034    546374038310       0.23%
    997.specrand_fr  0         512881578       512599641
    999.specrand_ir  0         512881578       512599641

This is testsuite clean, no regression w/ patch.

    ========= Summary of gcc testsuite =========
                                | # of unexpected case / # of unique unexpected case
                                | gcc    | g++    | gfortran |
    rv64imafdc/ lp64d/ medlow   | 2 / 2  | 1 / 1  | 6 / 1    |
    rv64imac/ lp64/ medlow      | 3 / 3  | 1 / 1  | 43 / 8   |
    rv32imafdc/ ilp32d/ medlow  | 1 / 1  | 3 / 2  | 6 / 1    |
    rv32imac/ ilp32/ medlow     | 1 / 1  | 3 / 2  | 43 / 8   |

This came up as part of IRC chat on PR/109279 and was suggested by Andrew Pinski.

gcc/ChangeLog:
* config/riscv/riscv.md: riscv_move_integer() drop in_splitter arg. riscv_split_symbol() drop in_splitter arg.
* config/riscv/riscv.cc: riscv_move_integer() drop in_splitter arg. riscv_split_symbol() drop in_splitter arg. riscv_force_temporary() drop in_splitter arg.
* config/riscv/riscv-protos.h: riscv_move_integer() drop in_splitter arg. riscv_split_symbol() drop in_splitter arg.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
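The two instruction sequences quoted in this commit message can be checked against each other with a small model. This is only a sketch: the function names are made up for illustration, and each register is modelled as a plain 64-bit integer rather than by emulating RISC-V.

```c
#include <assert.h>
#include <stdint.h>

/* Old sequence: build the constant 16 bits at a time in a0. */
uint64_t materialize_before(void)
{
    uint64_t a0 = 0x01010000;  /* li   a0,0x01010000 */
    a0 += 0x0101;              /* addi a0,a0,0x0101  */
    a0 <<= 16;                 /* slli a0,a0,16      */
    a0 += 0x0101;              /* addi a0,a0,0x0101  */
    a0 <<= 16;                 /* slli a0,a0,16      */
    a0 += 0x0101;              /* addi a0,a0,0x0101  */
    return a0;
}

/* New sequence: build the low 32 bits once in a scratch register (a5),
   then combine a shifted copy with the unshifted one. */
uint64_t materialize_after(void)
{
    uint64_t a5 = 0x01010000;  /* li   a5,0x01010000 */
    a5 += 0x0101;              /* addi a5,a5,0x0101  */
    uint64_t a0 = a5;          /* mv   a0,a5         */
    a5 <<= 32;                 /* slli a5,a5,32      */
    a0 = a5 + a0;              /* add  a0,a5,a0      */
    return a0;
}

/* Both sequences must agree on the final register value. */
void check_sequences(void)
{
    assert(materialize_before() == materialize_after());
}
```

The second form needs a scratch register, which is what a splitter that may create pseudos can provide.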
2023-04-25Avoid creating useless debug temporariesEric Botcazou1-11/+3
insert_debug_temp_for_var_def has some strange code whereby it creates debug temporaries for SINGLE_RHS (RHS for gimple_assign_single_p) but not for other RHS in the same situation. gcc/ * tree-ssa.cc (insert_debug_temp_for_var_def): Do not create superfluous debug temporaries for single GIMPLE assignments.
2023-04-25tree-optimization/109609 - correctly interpret arg size in fnspecRichard Biener3-5/+43
By majority vote and a hint from the API name, which is arg_max_access_size_given_by_arg_p, this interprets a memory access size specified as given by another argument (such as for strncpy in the testcase, which has "1cO313") as specifying the _maximum_ size read/written rather than the exact size. There are two uses interpreting it that way already and one differing. The following adjusts the differing one and clarifies the documentation. PR tree-optimization/109609 * attr-fnspec.h (arg_max_access_size_given_by_arg_p): Clarify semantics. * tree-ssa-alias.cc (check_fnspec): Correctly interpret the size given by arg_max_access_size_given_by_arg_p as maximum, not exact, size. * gcc.dg/torture/pr109609.c: New testcase.
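To illustrate why the size argument is only an upper bound, here is a minimal sketch of how many source bytes a strncpy-style copy reads. The helper name source_bytes_read is hypothetical and is not part of GCC or the C library; it only models the read side of such a copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes of src that a strncpy-like copy
   with limit n would read.  Reading stops after the first NUL byte, so
   the result can be smaller than n. */
size_t source_bytes_read(const char *src, size_t n)
{
    size_t i = 0;
    assert(src != NULL);
    while (i < n) {
        if (src[i++] == '\0')
            break;  /* the terminator was read; nothing beyond it is touched */
    }
    return i;
}
```

For a short string such as "ab" with a limit of 8, only three bytes are read, so an analysis that assumed exactly 8 bytes were accessed would be describing memory the call never touches.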
2023-04-25'omp scan' struct block seq update for OpenMP 5.xTobias Burnus15-51/+545
While OpenMP 5.0 required a single structured block before and after the 'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence', denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but still requires the '{' ... '}' for the final-loop-body) and updates Fortran to accept zero or more than one statement. If there is no preceding or succeeding executable statement, a warning is shown. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_scan_loop_body): Handle zero exec statements before/after 'omp scan'. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_scan_loop_body): Handle zero exec statements before/after 'omp scan'. gcc/fortran/ChangeLog: * openmp.cc (gfc_resolve_omp_do_blocks): Handle zero or more than one exec statements before/after 'omp scan'. * trans-openmp.cc (gfc_trans_omp_do): Likewise. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/scan-1.c: New test. * testsuite/libgomp.c/scan-23.c: New test. * testsuite/libgomp.fortran/scan-2.f90: New test. gcc/testsuite/ChangeLog: * g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning. * gfortran.dg/gomp/loop-2.f90: Likewise. * gfortran.dg/gomp/reduction5.f90: Likewise. * gfortran.dg/gomp/reduction6.f90: Likewise. * gfortran.dg/gomp/scan-1.f90: Likewise. * gfortran.dg/gomp/taskloop-2.f90: Likewise. * c-c++-common/gomp/scan-6.c: New test. * gfortran.dg/gomp/scan-8.f90: New test.
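For orientation, a loop body split by 'omp scan' looks roughly like the sketch below, which uses the OpenMP 5.0 shape with exactly one statement before and one after the directive. The function name is made up for illustration and the code is not taken from the new tests. Built without -fopenmp, the pragmas are ignored and the loop computes the same inclusive prefix sum serially.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive prefix sum: out[i] = in[0] + ... + in[i]. */
void inclusive_prefix_sum(const int *in, int *out, int n)
{
    int sum = 0;
    assert(in != NULL && out != NULL);
#pragma omp simd reduction(inscan, + : sum)
    for (int i = 0; i < n; i++) {
        sum += in[i];            /* input phase: update the scan variable */
#pragma omp scan inclusive(sum)
        out[i] = sum;            /* scan phase: use the scanned value */
    }
}
```

The statements before and after the directive are the two blocks whose allowed size this commit relaxes.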
2023-04-25testsuite: Fix up ext-floating2.C on powerpc64-linuxJakub Jelinek1-0/+4
Another testcase that is failing on powerpc64-linux. The test expects a diagnostic when float64 && float128 or in another spot when float32 && float128. Now, float128 effective target is satisfied on powerpc64-linux, despite __CPP_FLOAT128_T__ not being defined, because one needs to add some extra options for it. I think 32-bit arm has a similar case for float16. 2023-04-25 Jakub Jelinek <jakub@redhat.com> * g++.dg/cpp23/ext-floating2.C: Add dg-add-options for float16, float32, float64 and float128.
2023-04-25aarch64: PR target/PR99195 Annotate more simple integer binary patterns with vcz subst rulesKyrylo Tkachov2-13/+12
This patch adds more straightforward annotations to some more integer binary ops to eliminate redundant fmovs around 64-bit SIMD results. Bootstrapped and tested on aarch64-none-linux. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (orn<mode>3): Rename to... (orn<mode>3<vczle><vczbe>): ... This. (bic<mode>3): Rename to... (bic<mode>3<vczle><vczbe>): ... This. (<su><maxmin><mode>3): Rename to... (<su><maxmin><mode>3<vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_1.c: Add tests for orn, bic, max and min.
2023-04-25aarch64: Implement V2DI,V4SI division optabs for TARGET_SVEKyrylo Tkachov3-0/+87
Similar to the mulv2di case, we can use SVE instruction to implement the V4SI and V2DI optabs for signed and unsigned integer division. This allows us to generate much cleaner code for the testcase than the current: food: fmov x1, d1 fmov x0, d0 umov x2, v0.d[1] sdiv x0, x0, x1 umov x1, v1.d[1] sdiv x1, x2, x1 fmov d0, x0 ins v0.d[1], x1 ret which now becomes: food: ptrue p0.b, all sdiv z0.d, p0/m, z0.d, z1.d ret Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (<su_optab>div<mode>3): New define_expand. * config/aarch64/iterators.md (VQDIV): New mode iterator. (vnx2di): New mode attribute. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve-neon-modes_3.c: New test.
2023-04-25testsuite: Fix up ext-floating15.C tests on powerpc64-linux [PR109278]Jakub Jelinek1-0/+1
I've noticed this test FAILs on powerpc64-linux, with FAIL: g++.dg/cpp23/ext-floating15.C -std=gnu++98 (test for excess errors) Excess errors: /home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target /home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target /home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:1: error: variable or field 'bar' declared void /home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:5: error: '_Float128' is not supported on this target /home/jakub/gcc/gcc/testsuite/g++.dg/cpp23/ext-floating15.C:8:6: error: expected primary-expression before '_Float128' and similarly other std versions. powerpc64-linux is float128 target, but needs to add some options for it. Fixed by adding them. 2023-04-25 Jakub Jelinek <jakub@redhat.com> PR c++/109278 * g++.dg/cpp23/ext-floating15.C: Add dg-add-options float128.
2023-04-25rtl-optimization/109585 - alias analysis typoRichard Biener2-1/+34
When r10-514-gc6b84edb6110dd2b4fb improved access path analysis it introduced a typo that triggers when there's an access to a trailing array in the first access path leading to false disambiguation. PR rtl-optimization/109585 * tree-ssa-alias.cc (aliasing_component_refs_p): Fix typo. * gcc.dg/torture/pr109585.c: New testcase.
2023-04-25powerpc: Fix up *branch_anddi3_dot for -m32 -mpowerpc64 [PR109566]Jakub Jelinek2-1/+28
The following testcase reduced from newlib ICEs on powerpc-linux, with -O2 -m32 -mpowerpc64 since r12-6433 PR102239 optimization was added and on the original testcase since some ranger improvements in GCC 13 made it no longer latent on newlib. The problem is that the *branch_anddi3_dot define_insn_and_split relies on the *rotldi3_mask_dot define_insn_and_split being recognized during splitting. The rs6000_is_valid_rotate_dot_mask function checks whether the mask is a CONST_INT which is a valid mask, but *rotl<mode>3_mask_dot in addition to checking that it is a valid mask also has (<MODE>mode == Pmode || UINTVAL (operands[3]) <= 0x7fffffff) test in the condition. For TARGET_64BIT that doesn't add any further requirements, but for !TARGET_64BIT && TARGET_POWERPC64 if the AND second operand is larger than INT_MAX it will not be recognized. The rs6000_is_valid_rotate_dot_mask function is used solely in one spot, condition of *branch_anddi3_dot, so the following patch adjusts it to check for that as well. 2023-04-25 Jakub Jelinek <jakub@redhat.com> PR target/109566 * config/rs6000/rs6000.cc (rs6000_is_valid_rotate_dot_mask): For !TARGET_64BIT, don't return true if UINTVAL (mask) << (63 - nb) is larger than signed int maximum. * gcc.target/powerpc/pr109566.c: New test.
2023-04-25gcov: add info about "calls" to JSON output formatMartin Liska5-12/+95
gcc/ChangeLog: * doc/gcov.texi: Document the new "calls" field and document the API bump. Mention also "block_ids" for lines. * gcov.cc (output_intermediate_json_line): Output info about calls and extend branches as well. (generate_results): Bump version to 2. (output_line_details): Use block ID instead of a nonsensical index. gcc/testsuite/ChangeLog: * g++.dg/gcov/gcov-17.C: Add call to a noreturn function. * g++.dg/gcov/test-gcov-17.py: Cover new format. * lib/gcov.exp: Add options for gcov that emit the extra info.
2023-04-25[Committed] Correct zeroextendqihi2 insn length regression on xstormy16.Roger Sayle1-1/+1
My recent tweak to the zeroextendqihi2 pattern on xstormy16 incorrectly handled the case where the operand was a MEM. MEM operands use a longer encoding than REG operands, and the incorrect instruction length resulted in assembler errors (as reported by Jeff Law). This patch restores the original length, resolving this regression. Sorry for the inconvenience. Committed as obvious, after testing that a cross-compiler to xstormy16-elf builds from x86_64-pc-linux-gnu, and that gcc.c-torture/execute/memset-2.c no longer causes "operand out of range" issues in gas. 2023-04-25 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/stormy16/stormy16.md (zero_extendqihi2): Restore/fix length attribute for the first (memory operand) alternative.
2023-04-25aarch64: Leveraging the use of STP instruction for vec_duplicateVictor Do Nascimento4-1/+71
The backend pattern for storing a pair of identical values in 32 and 64-bit modes with the machine instruction STP was missing, and multiple instructions were needed to reproduce this behavior as a result of failed RTL pattern match in combine pass. For the test case: typedef long long v2di __attribute__((vector_size (16))); typedef int v2si __attribute__((vector_size (8))); void foo (v2di *x, long long a) { v2di tmp = {a, a}; *x = tmp; } void foo2 (v2si *x, int a) { v2si tmp = {a, a}; *x = tmp; } at -O2 on aarch64 gives: foo: stp x1, x1, [x0] ret foo2: stp w1, w1, [x0] ret instead of: foo: dup v0.2d, x1 str q0, [x0] ret foo2: dup v0.2s, w1 str d0, [x0] ret Bootstrapped and regtested on aarch64-none-linux-gnu. gcc/ * config/aarch64/aarch64-simd.md(aarch64_simd_stp<mode>): New. * config/aarch64/constraints.md: Make "Umn" relaxed memory constraint. * config/aarch64/iterators.md(ldpstp_vel_sz): New. gcc/testsuite/ * gcc.target/aarch64/stp_vec_dup_32_64-1.c: New.
2023-04-25Remove default constructor to nan_state.Aldy Hernandez2-8/+8
I think it's best to specify the default behavior of nan_state, since it's not obvious that nan_state() defaults to TRUE. Also, this avoids the ugly nan_state(false, false) idiom. gcc/ChangeLog: * value-range.cc (frange::set): Adjust constructor. * value-range.h (nan_state::nan_state): Replace default constructor with one taking an argument.
2023-04-25MAINTAINERS: add myself to write after approvalVictor Do Nascimento1-0/+1
ChangeLog: * MAINTAINERS (Write After Approval): Add myself.
2023-04-25Remove obsolete configure code in gnattoolsEric Botcazou2-92/+20
It was recently pointed out that we generate symbolic links to ghost files when building the GNAT tools, as the mlib-tgt-specific-*.adb files are gone. gnattools/ * configure.ac (TOOLS_TARGET_PAIRS): Remove obsolete settings. (EXTRA_GNATTOOLS): Likewise. * configure: Regenerate.
2023-04-25Pass correct type to irange::contains_p() in ipa-cp.cc.Aldy Hernandez1-1/+19
There is a call to contains_p() in ipa-cp.cc which passes incompatible types. This currently works because deep in the call chain, the legacy code uses tree_int_cst_lt which performs the operation with widest_int. With the upcoming removal of legacy, contains_p() will be stricter. gcc/ChangeLog: * ipa-cp.cc (ipa_range_contains_p): New. (decide_whether_version_node): Use it.
2023-04-25[PATCH v2] testsuite: Add testcase for sparc ICE [PR105573]Sam James1-0/+15
r11-10018-g33914983cf3734c2f8079963ba49fcc117499ef3 fixed PR105312 and added a test case for target/arm but the duplicate PR105573 has a test case for target/sparc that was uncommitted until now. 2023-04-21 Sam James <sam@gentoo.org> PR tree-optimization/105312 PR target/105573 gcc/testsuite/ * gcc.target/sparc/pr105573.c: New test.
2023-04-24Add alternative testcase of phi-opt-25.c that tests phioptAndrew Pinski1-0/+89
Right now phi-opt-25.c has tests like `a ? func(a) : CST` but if we add the simplifications to match.pd, then phi-opt-25.c will no longer be testing phiopt to make sure these get optimized. So this adds an alternative version which is designed to test phiopt. Committed as obvious after testing the testcase to make sure it does not fail on x86_64-linux-gnu. Thanks, Andrew Pinski gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-25a.c: New test.
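As an illustration of the `a ? func(a) : CST` shape, the sketch below guards a builtin call whose result at zero already equals the constant, so the guarded and unguarded forms agree for every input. The choice of __builtin_popcount (a GCC/Clang builtin) and the function names are assumptions made for this example; the functions actually exercised by phi-opt-25.c are not listed here.

```c
#include <assert.h>

/* Guarded form: the conditional is redundant because popcount(0) is 0. */
int popcount_guarded(unsigned a)
{
    return a ? __builtin_popcount(a) : 0;
}

/* Unguarded form that the guarded one is equivalent to. */
int popcount_plain(unsigned a)
{
    return __builtin_popcount(a);
}

/* Spot-check the equivalence over a small range of inputs. */
void check_popcount_forms(void)
{
    for (unsigned a = 0; a < 1024; a++)
        assert(popcount_guarded(a) == popcount_plain(a));
}
```

Whether the conditional is removed by phiopt itself or by a match.pd simplification is exactly the distinction the alternative testcase is meant to preserve.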
2023-04-25Daily bump.GCC Administrator7-1/+294
2023-04-25[SVE] Fold svrev(svrev(v)) to v.Prathamesh Kulkarni2-0/+33
gcc/ChangeLog: * tree-ssa-forwprop.cc (is_combined_permutation_identity): Try to simplify two successive VEC_PERM_EXPRs with same VLA mask, where mask chooses elements in reverse order. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/acle/general/rev-1.c: New test.
2023-04-24Update gcc hr.po, sv.po, zh_CN.poJoseph Myers3-1146/+587
* hr.po, sv.po, zh_CN.po: Update.
2023-04-24libstdc++: Fix __max_diff_type::operator>>= for negative valuesPatrick Palka2-3/+12
This patch fixes sign bit propagation when right-shifting a negative __max_diff_type value by more than one, a bug that our existing test coverage didn't expose until r14-159-g03cebd304955a6 fixed the front end's 'signed typedef-name' handling that the test relies on (which is a non-standard extension to the language grammar). libstdc++-v3/ChangeLog: * include/bits/max_size_type.h (__max_diff_type::operator>>=): Fix propagation of sign bit. * testsuite/std/ranges/iota/max_size_type.cc: Avoid using the non-standard 'signed typedef-name'. Add some compile-time tests for right-shifting a negative __max_diff_type value by more than one.
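The kind of sign propagation involved can be sketched on a plain 64-bit value: an arithmetic right shift by n has to fill all n vacated top bits with the sign bit, not just one. This is a generic illustration with a made-up function name, not the libstdc++ implementation of __max_diff_type.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic right shift of a two's-complement value stored in an
   unsigned 64-bit integer, built from logical operations only. */
uint64_t asr_emulated(uint64_t bits, unsigned shift)
{
    assert(shift < 64);
    uint64_t result = bits >> shift;          /* logical shift: zero fill */
    if (bits >> 63)                           /* negative input */
        result |= ~(UINT64_MAX >> shift);     /* set all `shift` top bits */
    return result;
}
```

Setting only the topmost bit after the shift would give the right answer for a shift of one and a wrong one for anything larger, which matches the symptom described above.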
2023-04-24PHIOPT: Add support for diamond shaped bb to match_simplify_replacementAndrew Pinski3-10/+35
This adds diamond shaped form of basic blocks to match_simplify_replacement. This patch is the start of removing/moving all of what minmax_replacement does to match.pd to reduce the code duplication. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Note phi-opt-{23,24}.c testcase had an incorrect xfail as there should have been 2 if still because f4/f5 would not be transformed as -ABS is not allowable during early phi-opt. gcc/ChangeLog: * tree-ssa-phiopt.cc (match_simplify_replacement): Add new arguments and support diamond shaped basic block form. (tree_ssa_phiopt_worker): Update call to match_simplify_replacement. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-23.c: Update testcase. * gcc.dg/tree-ssa/phi-opt-24.c: Likewise.
2023-04-24PHIOPT: Ignore predicates for match-and-simplify phi-optAndrew Pinski2-3/+23
This fixes a missed optimization where early phi-opt would not work when there were predicates. The easiest fix is to change empty_bb_or_one_feeding_into_p to ignore those statements while checking for the only feeding statement. Note phi-opt-23.c and phi-opt-24.c still fail as we don't handle diamond form in match_and_simplify phiopt yet. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Instead of calling last_and_only_stmt, look for the last statement manually. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-13.c: Add -fno-ssa-phiopt.
2023-04-24PHIOPT: Factor out some code from match_simplify_replacementAndrew Pinski1-44/+62
This factors out the code checking if we have an empty bb or one statement that feeds into the phi so it can be used when adding diamond shaped bb form to match_simplify_replacement in the next patch. Also allows for some improvements in the next patches too. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): New function. (match_simplify_replacement): Call empty_bb_or_one_feeding_into_p instead of doing it inline.
2023-04-24PHIOPT: Allow other diamond uses when do_hoist_loads is trueAndrew Pinski1-4/+1
While working on adding diamond shaped form to match-and-simplify phiopt, I noticed that we would not reach there if do_hoist_loads was true. In the original code before the cleanups it was not obvious why, but after I finished the cleanups, it was just a matter of removing a continue and that is what this patch does. This also happens to fix a bug report that I noticed. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/68894 * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove the continue for the do_hoist_loads diamond case.
2023-04-24PHIOPT: Cleanup tree_ssa_phiopt_worker codeAndrew Pinski1-105/+107
This patch cleans up tree_ssa_phiopt_worker by merging common code and handling do_store_elim earlier. Note this does not change any overall logic of the code, just moves code around enough to be able to do this. This will make it easier to move code around even more and to apply a few other fixes I have. Plus I think all of the do_store_elim code really should move to its own function, as it is now obvious that not much code is shared. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Rearrange code for better code readability.
2023-04-24PHIOPT: Move check on diamond bb to tree_ssa_phiopt_worker from minmax_replacementAndrew Pinski3-6/+34
This moves the check to make sure on the diamond shaped form bbs that the two middle bbs are only for that diamond shaped form earlier in the shared code. Also remove the redundant check for single_succ_p since that was already done beforehand. The next patch will simplify the code even further and remove redundant checks. PR tree-optimization/109604 gcc/ChangeLog: * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Move the diamond form check from ... (minmax_replacement): Here. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr109604-1.c: New test. * gcc.c-torture/compile/pr109604-2.c: New test.
2023-04-24c++, tree: declare some basic functions inlinePatrick Palka4-57/+54
The functions strip_array_types, is_typedef_decl, typedef_variant_p and cp_expr_location are used throughout the C++ front end including in some fairly hot parts (e.g. in the tsubst routines and cp_walk_subtree) and they're small enough that the overhead of calling them out-of-line is relatively significant. So this patch moves their definitions into the appropriate headers to enable inlining them. gcc/cp/ChangeLog: * cp-tree.h (cp_expr_location): Define here. * tree.cc (cp_expr_location): Don't define here. gcc/ChangeLog: * tree.cc (strip_array_types): Don't define here. (is_typedef_decl): Don't define here. (typedef_variant_p): Don't define here. * tree.h (strip_array_types): Define here. (is_typedef_decl): Define here. (typedef_variant_p): Define here.
2023-04-24Docs, OpenMP: Small fixes to internal OMP_FOR doc.Frederik Harwath2-4/+4
gcc/ChangeLog: * doc/generic.texi (OpenMP): Add != to allowed conditions and state that vars can be unsigned. * tree.def (OMP_FOR): Likewise.
2023-04-24aarch64: Add mulv2di3 expander for TARGET_SVEKyrylo Tkachov3-0/+81
Motivated by a recent LLVM patch I saw, we can use SVE for 64-bit vector integer MUL (plain Advanced SIMD doesn't support it). Since the Advanced SIMD regs are just the low 128-bit part of the SVE regs it all works transparently. It's a reasonably straightforward implementation of the mulv2di3 optab that wires it up through the mulvnx2di3 expander and subregs the results back to the Advanced SIMD modes. There are more such tricks possible with other operations (and we could do 64-bit multiply-add merged operations too) but for now this self-contained patch improves the mul case, as without it for the testcases in the patch we'd have scalarised the arguments, moved them to GP regs, performed two GP MULs and moved them back to SIMD regs. Advertising a mulv2di3 optab from the backend should also allow for more flexible vectorisation opportunities. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (mulv2di3): New expander. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve-neon-modes_1.c: New test. * gcc.target/aarch64/sve-neon-modes_2.c: New test.
2023-04-24MAINTAINERS: fix sorting of namesMartin Liska1-1/+1
ChangeLog: * MAINTAINERS: Fix sorting.
2023-04-24doc: Update install.texi for GCC 13Rainer Orth1-109/+78
install.texi needs some updates for GCC 13 and trunk: * We used a mixture of Solaris 2 and Solaris references. Since Solaris 1/SunOS 4 is ancient history by now, consistently use Solaris everywhere. Likewise, explicit references to Solaris 11 can go in many places since Solaris 11.3 and 11.4 are all GCC supports. * Some caveats apply to both Solaris/SPARC and x86, like the difference between as and gas. * Some specifics are obsolete, like the /usr/ccs/bin path whose contents were merged into /usr/bin in Solaris 11.0 already. Likewise, /bin/sh is ksh93 since Solaris 11.0, so there's no need to explicitly use /bin/ksh. * I've removed the reference to OpenCSW: there's barely a need for external sites to get additional packages. OpenCSW is mostly unmaintained these days and has been found to be more harmful than helpful. * The section on assembler and linker to use was partially duplicated. Better keep the info in one place. * GNAT is bundled in recent Solaris 11.4 updates, so recommend that. Tested on i386-pc-solaris2.11 with make doc/gccinstall.{info,pdf} and inspection of the latter. 2023-04-21 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc: * doc/install.texi: Consistently use Solaris rather than Solaris 2. Remove explicit Solaris 11 references. Markup fixes. (Options specification, --with-gnu-as): as and gas always differ on Solaris. Remove /usr/ccs/bin reference. (Installing GCC: Binaries, Solaris (SPARC, Intel)): Remove. (i?86-*-solaris2*): Merge assembler, linker recommendations ... (*-*-solaris2*): ... here. Update bundled GCC versions. Don't refer to pre-built binaries. Remove /bin/sh warning. Update assembler, linker recommendations. Document GNAT bootstrap compiler. (sparc-sun-solaris2*): Remove non-UltraSPARC reference. (sparc64-*-solaris2*): Move content... (sparcv9-*-solaris2*): ...here. Add GDC for 64-bit bootstrap compilers.
2023-04-24aarch64: PR target/109406 Add support for SVE2 unpredicated MULKyrylo Tkachov4-4/+57
SVE2 supports an unpredicated vector integer MUL form that we can emit from our SVE expanders without using up a predicate register. This patch does so. As the SVE MUL expansion currently is templated away through a code iterator I did not split it off just for this case but instead special-cased it in the define_expand. It seemed somewhat less invasive than the alternatives but I could split it off more explicitly if others want to. The div-by-bitmask_1.c testcase is adjusted to expect this new MUL form. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/109406 * config/aarch64/aarch64-sve.md (<optab><mode>3): Handle TARGET_SVE2 MUL case. * config/aarch64/aarch64-sve2.md (*aarch64_mul_unpredicated_<mode>): New pattern. gcc/testsuite/ChangeLog: PR target/109406 * gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for unpredicated SVE2 MUL. * gcc.target/aarch64/sve2/unpred_mul_1.c: New test.
2023-04-24[4/4] aarch64: Convert UABAL2 and SABAL2 patterns to standard RTL codesKyrylo Tkachov5-15/+144
The final patch in the series tackles the most complex of this family of patterns, UABAL2 and SABAL2. These extract the high part of the sources, perform an absdiff on them, widen the result and accumulate. The motivating testcase for this patch (series) is included and the simplification required doesn't actually trigger with just the RTL pattern change because rtx_costs block it. So this patch also extends rtx costs to recognise the (minus (smax (x, y) (smin (x, y)))) expression we use to describe absdiff in the backend and avoid recursing into its arms. This allows us to generate the single-instruction sequence expected here. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>abal2<mode>): Rename to... (aarch64_<su>abal2<mode>_insn): ... This. Use RTL codes instead of unspec. (aarch64_<su>abal2<mode>): New define_expand. * config/aarch64/aarch64.cc (aarch64_abd_rtx_p): New function. (aarch64_rtx_costs): Handle ABD rtxes. * config/aarch64/aarch64.md (UNSPEC_SABAL2, UNSPEC_UABAL2): Delete. * config/aarch64/iterators.md (ABAL2): Delete. (sur): Remove handling of UNSPEC_UABAL2 and UNSPEC_SABAL2. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vabal_combine.c: New test.
2023-04-24[3/4] aarch64: Convert UABAL and SABAL patterns to standard RTL codesKyrylo Tkachov4-26/+26
With the SABDL and UABDL patterns converted, the accumulating forms of them UABAL and SABAL are not much more complicated. There's an accumulator argument that we, err, accumulate into with a PLUS once all the widening is done. Some necessary renaming of patterns relating to the removal of UNSPEC_SABAL and UNSPEC_UABAL is included. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>): Rename to... (aarch64_<su>abal<mode>): ... This. Use RTL codes instead of unspec. (<sur>sadv16qi): Rename to... (<su>sadv16qi): ... This. Adjust for the above. * config/aarch64/aarch64-sve.md (<sur>sad<vsi2qi>): Rename to... (<su>sad<vsi2qi>): ... This. Adjust for the above. * config/aarch64/aarch64.md (UNSPEC_SABAL, UNSPEC_UABAL): Delete. * config/aarch64/iterators.md (ABAL): Delete. (sur): Remove handling of UNSPEC_SABAL and UNSPEC_UABAL.
2023-04-24[2/4] aarch64: Convert UABDL2 and SABDL2 patterns to standard RTL codesKyrylo Tkachov3-11/+33
Similar to the previous patch for UABDL and SABDL, this patch covers the *2 versions that vec_select the high half of their input to do the absdiff and extend. A define_expand is added for the intrinsic to create the "select-high-half" RTX the pattern expects. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>): Rename to... (aarch64_<su>abdl2<mode>_insn): ... This. Use RTL codes instead of unspec. (aarch64_<su>abdl2<mode>): New define_expand. * config/aarch64/aarch64.md (UNSPEC_SABDL2, UNSPEC_UABDL2): Delete. * config/aarch64/iterators.md (ABDL2): Delete. (sur): Remove handling of UNSPEC_SABDL2 and UNSPEC_UABDL2.
2023-04-24[1/4] aarch64: Convert UABDL and SABDL patterns to standard RTL codesKyrylo Tkachov3-12/+10
This is the first patch in a series to improve the RTL representation of the sum-of-absolute-differences patterns in the backend. We can use standard RTL codes and remove some unspecs. For UABDL and SABDL we have a widening of the result so we can represent uabdl (x, y) as (zero_extend (minus (umax (x, y)) (umin (x, y)))) and sabdl (x, y) as (zero_extend (minus (smax (x, y)) (smin (x, y)))). It is important to use zero_extend rather than sign_extend for the sabdl case, as the result of the absolute difference is still a positive unsigned value (the signedness of the operation refers to the values being diffed, not the absolute value of the difference) that must be zero-extended. Bootstrapped and tested on aarch64-none-linux-gnu (these intrinsics are reasonably well-covered by the advsimd-intrinsics.exp tests) gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>abdl<mode>): Rename to... (aarch64_<su>abdl<mode>): ... This. Use standard RTL ops instead of unspec. * config/aarch64/aarch64.md (UNSPEC_SABDL, UNSPEC_UABDL): Delete. * config/aarch64/iterators.md (ABDL): Delete. (sur): Remove handling of UNSPEC_SABDL and UNSPEC_UABDL.
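The zero_extend point can be checked on a single 8-bit lane. The sketch below models the signed widening absolute difference as max minus min with signed comparisons, then widens the 8-bit result both ways. The function names are hypothetical and the code models one lane in scalar C rather than using the actual intrinsics.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit absolute difference of two signed values: smax - smin,
   truncated to the 8-bit lane width. */
uint8_t sabd_lane(int8_t x, int8_t y)
{
    int hi = x > y ? x : y;     /* smax */
    int lo = x > y ? y : x;     /* smin */
    int diff = hi - lo;         /* always in 0..255 */
    assert(diff >= 0 && diff <= 255);
    return (uint8_t)diff;
}

/* Correct widening: zero-extend the 8-bit difference to 16 bits. */
uint16_t sabdl_lane(int8_t x, int8_t y)
{
    return (uint16_t)sabd_lane(x, y);
}

/* Incorrect widening, shown for contrast: sign-extend instead. */
uint16_t sabdl_lane_sign_extended(int8_t x, int8_t y)
{
    uint8_t d = sabd_lane(x, y);
    return d >= 0x80 ? (uint16_t)(0xFF00u | d) : (uint16_t)d;
}
```

For x = 127 and y = -128 the difference is 255; zero-extension keeps it at 255, while sign-extension would turn it into 0xFFFF.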
2023-04-24aarch64: Add pattern to match zero-extending scalar result of ADDLVKyrylo Tkachov2-0/+100
The vaddlv_u8 and vaddlv_u16 intrinsics produce a widened scalar result (uint16_t and uint32_t). The ADDLV instructions themselves zero the rest of the V register, which gives us a free zero-extension to 32 and 64 bits, similar to how it works on the GP reg side. Because we don't model that zero-extension in the machine description this can cause GCC to move the results of these instructions to the GP regs just to do a (superfluous) zero-extension. This patch just adds a pattern to catch these cases. For the testcases we can now generate no zero-extends or GP<->FP reg moves, whereas before we generated stuff like:
foo_8_32:
        uaddlv  h0, v0.8b
        umov    w1, v0.h[0] // FP<->GP move with zero-extension!
        str     w1, [x0]
        ret
Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): New pattern. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/addlv_zext.c: New test.
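The source-level situation described above can be sketched with a portable scalar model in C. It does not use arm_neon.h, so it only illustrates the value semantics: `addlv_u8_model` and `store_widened` are hypothetical names, and the eight-lane input is an assumption matching the 8-bit, 64-bit-vector case.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a widening across-lanes add: eight unsigned 8-bit
   lanes are summed into a 16-bit result. The largest possible sum is
   8 * 255 = 2040, so the 16-bit result never wraps. */
uint16_t addlv_u8_model(const uint8_t v[8])
{
    uint16_t sum = 0;
    for (size_t i = 0; i < 8; i++)
        sum = (uint16_t)(sum + v[i]);
    return sum;
}

/* Storing the 16-bit result into a 32-bit object implies a
   zero-extension. This is the extension the commit message says the
   instruction already provides by zeroing the rest of the register. */
void store_widened(uint32_t *dst, const uint8_t v[8])
{
    *dst = addlv_u8_model(v);
}
```

Because the implicit conversion from uint16_t to uint32_t cannot change the value, a compiler that knows the upper bits are already zero has nothing left to do for the extension.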
2023-04-24This replaces uses of last_stmt where we do not require debug skippingRichard Biener19-100/+59
There are quite a few cases that want to access the control stmt ending a basic block. Since there cannot be debug stmts after such a stmt, there is no point in using last_stmt, which skips debug stmts and can be a compile-time hog for larger testcases. * gimple-ssa-split-paths.cc (is_feasible_trace): Avoid last_stmt. * graphite-scop-detection.cc (single_pred_cond_non_loop_exit): Likewise. * ipa-fnsummary.cc (set_cond_stmt_execution_predicate): Likewise. (set_switch_stmt_execution_predicate): Likewise. (phi_result_unknown_predicate): Likewise. * ipa-prop.cc (compute_complex_ancestor_jump_func): Likewise. (ipa_analyze_indirect_call_uses): Likewise. * predict.cc (predict_iv_comparison): Likewise. (predict_extra_loop_exits): Likewise. (predict_loops): Likewise. (tree_predict_by_opcode): Likewise. * gimple-predicate-analysis.cc (predicate::init_from_control_deps): Likewise. * gimple-pretty-print.cc (dump_implicit_edges): Likewise. * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Likewise. (replace_phi_edge_with_variable): Likewise. (two_value_replacement): Likewise. (value_replacement): Likewise. (minmax_replacement): Likewise. (spaceship_replacement): Likewise. (cond_removal_in_builtin_zero_pattern): Likewise. * tree-ssa-reassoc.cc (maybe_optimize_range_tests): Likewise. * tree-ssa-sccvn.cc (vn_phi_eq): Likewise. (vn_phi_lookup): Likewise. (vn_phi_insert): Likewise. * tree-ssa-structalias.cc (compute_points_to_sets): Likewise. * tree-ssa-threadbackward.cc (back_threader::maybe_thread_block): Likewise. (back_threader_profitability::possibly_profitable_path_p): Likewise. * tree-ssa-threadedge.cc (jump_threader::thread_outgoing_edges): Likewise. * tree-switch-conversion.cc (pass_convert_switch::execute): Likewise. (pass_lower_switch<O0>::execute): Likewise. * tree-tailcall.cc (tree_optimize_tail_calls_1): Likewise. * tree-vect-loop-manip.cc (vect_loop_versioning): Likewise. * tree-vect-slp.cc (vect_slp_function): Likewise. * tree-vect-stmts.cc (cfun_returns): Likewise. 
* tree-vectorizer.cc (vect_loop_vectorized_call): Likewise. (vect_loop_dist_alias_call): Likewise.
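The reasoning behind avoiding last_stmt can be illustrated with a toy model in C. None of the types or functions below are GCC's: `toy_stmt`, `toy_block`, `toy_last_nondebug`, `toy_last_raw`, and `toy_ending_control` are hypothetical stand-ins, and the block is just an array of statements with a debug flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy statement: either a debug statement, a control statement, or other. */
struct toy_stmt {
    bool is_debug;
    bool is_control;
};

/* Toy basic block: a contiguous sequence of statements. */
struct toy_block {
    const struct toy_stmt *stmts;
    size_t count;
};

/* Walks backwards past debug statements, in the style of a helper that
   skips them. The cost grows with the number of trailing debug stmts. */
const struct toy_stmt *toy_last_nondebug(const struct toy_block *bb)
{
    for (size_t i = bb->count; i > 0; i--)
        if (!bb->stmts[i - 1].is_debug)
            return &bb->stmts[i - 1];
    return NULL;
}

/* Looks only at the final statement, debug or not. Constant time. */
const struct toy_stmt *toy_last_raw(const struct toy_block *bb)
{
    return bb->count ? &bb->stmts[bb->count - 1] : NULL;
}

/* Returns the control statement ending the block, if there is one.
   Nothing can follow a control statement, so if the block has one it
   must be the very last statement and no skipping is needed. */
const struct toy_stmt *toy_ending_control(const struct toy_block *bb)
{
    const struct toy_stmt *last = toy_last_raw(bb);
    return (last && last->is_control) ? last : NULL;
}
```

When a block ends in a control statement, both lookups return the same statement. When it ends in debug statements, the raw lookup correctly reports that there is no ending control statement without scanning backwards.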
2023-04-24Avoid repeated forwarder_block_p calls in CFG cleanupRichard Biener1-2/+2
CFG cleanup maintains BB_FORWARDER_BLOCK and uses FORWARDER_BLOCK_P to check it, apart from two places in outgoing_edges_match that call forwarder_block_p alongside many BB_FORWARDER_BLOCK uses. The following adjusts those. * cfgcleanup.cc (outgoing_edges_match): Use FORWARDER_BLOCK_P.
2023-04-24RISC-V: Eliminate redundant vsetvli for duplicate AVL defJuzhe-Zhong3-3/+58
This patch is V2 of: https://patchwork.sourceware.org/project/gcc/patch/20230328010124.235703-1-juzhe.zhong@rivai.ai/ It addresses comments from Jeff: add comments for all_avail_in_compatible_p and refine the code comments. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_avail_in_compatible_p): New function. (pass_vsetvl::refine_vsetvls): Optimize vsetvls. * config/riscv/riscv-vsetvl.h: New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/avl_single-102.c: New test.
2023-04-24RISC-V: Add function comment for cleanup_insns.Juzhe-Zhong1-0/+15
Add more comments for cleanup_insns. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::pre_vsetvl): Add function comment for cleanup_insns.
2023-04-24RISC-V: Optimize fault only first loadJuzhe-Zhong8-1/+177
V2 patch for: https://patchwork.sourceware.org/project/gcc/patch/20230330012804.110539-1-juzhe.zhong@rivai.ai/ which has been reviewed. This patch addresses Jeff's comment and refines the ChangeLog to give clearer information. gcc/ChangeLog: * config/riscv/vector-iterators.md: New unspec to refine fault first load pattern. * config/riscv/vector.md: Refine fault first load pattern to erase avl from instructions with the fault first load property. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/ffload-1.c: New test. * gcc.target/riscv/rvv/vsetvl/ffload-2.c: New test. * gcc.target/riscv/rvv/vsetvl/ffload-3.c: New test. * gcc.target/riscv/rvv/vsetvl/ffload-5.c: New test. * gcc.target/riscv/rvv/vsetvl/ffload-6.c: New test. * gcc.target/riscv/rvv/vsetvl/ffload-7.c: New test.
2023-04-24Add testcases for ffs/ctz vectorization.liuhongt10-0/+786
gcc/testsuite/ChangeLog: PR tree-optimization/109011 * gcc.target/i386/pr109011-b1.c: New test. * gcc.target/i386/pr109011-b2.c: New test. * gcc.target/i386/pr109011-d1.c: New test. * gcc.target/i386/pr109011-d2.c: New test. * gcc.target/i386/pr109011-q1.c: New test. * gcc.target/i386/pr109011-q2.c: New test. * gcc.target/i386/pr109011-w1.c: New test. * gcc.target/i386/pr109011-w2.c: New test.
2023-04-24Daily bump.GCC Administrator3-1/+83