Age  Commit message  Author  Files  Lines
2023-10-27amdgcn: Fix bug in gfx1030 support patchAndrew Stubbs1-4/+2
The previous patch to add gfx1030 support introduced an issue with passing exit codes from kernels run under gcn-run (offload kernels were unaffected). gcc/ChangeLog: PR target/112088 * config/gcn/gcn.cc (gcn_expand_epilogue): Fix kernel epilogue register conflict.
2023-10-27amdgcn: silence warningsAndrew Stubbs2-3/+5
The operands really should be VOIDmode, so the warnings are false. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): Mention "operands" in condition to silence the warnings. (vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): Likewise. * config/gcn/gcn.md (*movti_insn): Likewise.
2023-10-27recog: Fix propagation into ASM_OPERANDSRichard Sandiford1-7/+20
An inline asm with multiple output operands is represented as a parallel set in which the SET_SRCs are the same (shared) ASM_OPERANDS. insn_propagation didn't account for this, and instead propagated into each ASM_OPERANDS individually. This meant that it could apply a substitution X->Y to Y itself, which (a) could create circularity and (b) would be semantically wrong in any case, since Y might use a different value of X. This patch checks explicitly for parallels involving ASM_OPERANDS, just like combine does. gcc/ * recog.cc (insn_propagation::apply_to_pattern_1): Handle shared ASM_OPERANDS.
2023-10-27c++: another build_new_1 folding fix [PR111929]Patrick Palka2-4/+12
In build_new_1, we also need to avoid folding 'outer_nelts_check' when in a template context to prevent an ICE on the below testcase. This patch replaces the problematic fold_build2 call with build2 (we'll later fold it if appropriate during cp_fully_fold). In passing, this patch removes an unnecessary conversion of 'nelts' since it should always already be a size_t (and 'convert' isn't the best conversion entry point to use anyway since it lacks a complain parameter). PR c++/111929 gcc/cp/ChangeLog: * init.cc (build_new_1): Remove unnecessary call to convert on 'nelts'. Use build2 instead of fold_build2 for 'outer_nelts_check'. gcc/testsuite/ChangeLog: * g++.dg/template/non-dependent28a.C: New test.
2023-10-27c++: add testcase verifying non-dep new-expr checkingPatrick Palka1-0/+20
gcc/testsuite/ChangeLog: * g++.dg/template/new14.C: New test.
2023-10-27c++: more ahead-of-time -Wparentheses warningsPatrick Palka6-38/+44
Now that we don't have to worry about looking through NON_DEPENDENT_EXPR, we can easily extend the -Wparentheses warning in convert_for_assignment to consider (non-dependent) templated assignment operator expressions as well, like r14-4111-g6e92a6a2a72d3b did in maybe_convert_cond. gcc/cp/ChangeLog: * cp-tree.h (maybe_warn_unparenthesized_assignment): Declare. * semantics.cc (is_assignment_op_expr_p): Generalize to return true for any assignment operator expression, not just one that has been resolved to an operator overload. (maybe_warn_unparenthesized_assignment): Factored out from ... (maybe_convert_cond): ... here. (finish_parenthesized_expr): Mention maybe_warn_unparenthesized_assignment. * typeck.cc (convert_for_assignment): Replace -Wparentheses warning logic with maybe_warn_unparenthesized_assignment. gcc/testsuite/ChangeLog: * g++.dg/warn/Wparentheses-13.C: Strengthen by expecting that we issue the -Wparentheses warnings ahead of time. * g++.dg/warn/Wparentheses-23.C: Likewise. * g++.dg/warn/Wparentheses-32.C: Remove xfails.
2023-10-27PR modula2/111530: Build failure on BSD due to getopt_long_only GNU extension dependencyGaius Mulley16-177/+1268
This patch uses the libiberty getopt long functions (wrapped up inside libgm2/libm2pim/cgetopt.cc) and only enables this implementation if libgm2/configure.ac detects no getopt_long and friends on the target. gcc/m2/ChangeLog: PR modula2/111530 * gm2-libs-ch/cgetopt.c (cgetopt_cgetopt_long): Re-format. (cgetopt_cgetopt_long_only): Re-format. (cgetopt_SetOption): Re-format and assign flag to NULL if name is also NULL. * gm2-libs/GetOpt.def (AddLongOption): Add index parameter and change flag to be a VAR parameter rather than a pointer. (GetOptLong): Re-format. (GetOpt): Correct comment. * gm2-libs/GetOpt.mod: Re-write to rely on cgetopt rather than implement long option creation in GetOpt. * gm2-libs/cgetopt.def (SetOption): has_arg type is INTEGER. libgm2/ChangeLog: PR modula2/111530 * Makefile.in: Regenerate. * aclocal.m4: Regenerate. * config.h.in: Regenerate. * configure: Regenerate. * configure.ac (AC_CHECK_HEADERS): Include getopt.h. (GM2_CHECK_LIB): getopt_long check. (GM2_CHECK_LIB): getopt_long_only check. * libm2cor/Makefile.in: Regenerate. * libm2iso/Makefile.in: Regenerate. * libm2log/Makefile.in: Regenerate. * libm2min/Makefile.in: Regenerate. * libm2pim/Makefile.in: Regenerate. * libm2pim/cgetopt.cc: Re-write using conditional on configure and long function code from libiberty/getopt.c. gcc/testsuite/ChangeLog: PR modula2/111530 * gm2/pimlib/run/pass/testgetopt.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-10-27[PATCH] RISC-V: Fix wrong tune parameters on int_divYangyu Chen1-3/+3
This patch fixes an issue with the cost of "int_div" in various RISC-V tune parameters, including those for Rocket, the SiFive U7 series, and T-Head C906. The incorrect cost value interferes with optimization. For example, it prevents division by a constant from being rewritten into a more efficient form known as Barrett reduction, which hurts performance on these systems. The integer div cost of the Rocket and SiFive U7 is taken from the Rocket-Chip Divider source code[1] with the BigCore configuration[2]. It leaves divUnroll unchanged at its default of 1. Thus, the maximum int_div cycle count should be dataWidth + 1, which is 33 for 32-bit and 65 for 64-bit. As for C906, the divider takes 2 cycles to start[3], and it produces a 2-bit result each cycle[4]. Thus, the maximum int_div cycle count should be dataWidth / 2 + 2, which is 18 for 32-bit and 34 for 64-bit. I also tested the performance on a VisionFive2, which has a quad-core SiFive U74. I wrote a simple C program that performs 1e8 divisions by the constant 6 in int32. It takes 1.998s using div and 0.420s using Barrett reduction to replace div with mul, which is 4.75x faster. [1] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/rocket/Multiplier.scala#L40 [2] https://github.com/chipsalliance/rocket-chip/blob/v1.6/src/main/scala/subsystem/Configs.scala#L97 [3] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div.v#L267 [4] https://github.com/T-head-Semi/openc906/blob/af5614d72de7e5a4b8609c427d2e20af1deb21c4/C906_RTL_FACTORY/gen_rtl/iu/rtl/aq_iu_div_shift2_kernel.v#L93 gcc/ChangeLog: * config/riscv/riscv.cc (rocket_tune_info): Fix int_div cost. (sifive_7_tune_info, thead_c906_tune_info): Likewise.
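As an illustration of the transformation the int_div cost gates, here is a minimal sketch of dividing by the constant 6 with a multiply and a shift. It is not the benchmark from the commit message: it uses unsigned 32-bit values for simplicity (the message mentions int32), and the names div6_by_mul and div6_matches are made up for this example.

```c
#include <stdint.h>

/* Divide an unsigned 32-bit value by 6 without a divide instruction.
   0xAAAAAAAB is ceil(2^33 / 3); multiplying by it and shifting right
   by 34 yields floor(x / 6) for every 32-bit x.  */
uint32_t div6_by_mul(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * UINT64_C(0xAAAAAAAB)) >> 34);
}

/* Returns 1 if the multiply-based result matches plain division on a
   handful of sample inputs, including the extremes.  */
int div6_matches(void)
{
    static const uint32_t samples[] = { 0, 1, 5, 6, 7, 11, 12, 1000003,
                                        0x7FFFFFFFu, 0xFFFFFFFAu, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (div6_by_mul(samples[i]) != samples[i] / 6)
            return 0;
    return 1;
}
```

When the tune table reports a division cost that is too low, the compiler may keep the div instruction instead of emitting a sequence like this.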
2023-10-27RISC-V: Add rawmemchr expander.Robin Dapp10-211/+429
This patch adds a vectorized rawmemchr expander. It also moves the vectorized expand_block_move to riscv-string.cc. gcc/ChangeLog: * config/riscv/autovec.md (rawmemchr<ANYI:mode>): New expander. * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): Define. (expand_rawmemchr): Define. * config/riscv/riscv-v.cc (force_vector_length_operand): Remove static. (expand_block_move): Move from here... * config/riscv/riscv-string.cc (expand_block_move): ...to here. (expand_rawmemchr): Add vectorized expander. * internal-fn.cc (expand_RAWMEMCHR): Fix typo. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/peel-2.c: Add -fno-tree-loop-distribute-patterns. * gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv. * gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto. * gcc.target/riscv/rvv/rvv.exp: Add builtin directory. * gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
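For context, rawmemchr searches for a byte that the caller guarantees is present, so the loop has no length bound. A minimal sketch of such a scalar loop follows; the name find_byte is hypothetical, and whether a given loop is actually recognized as rawmemchr depends on the optimization level and target.

```c
/* Scan forward until byte c is found.  There is no length limit:
   the caller must guarantee that c occurs in the buffer.  */
const unsigned char *find_byte(const unsigned char *p, unsigned char c)
{
    while (*p != c)
        p++;
    return p;
}
```

The ldist-rawmemchr tests listed in the ChangeLog suggest that loop distribution is the pass that turns loops of this shape into the RAWMEMCHR internal function, which the new expander then vectorizes.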
2023-10-27RISC-V: Fix cond_sqrt tests.Robin Dapp7-7/+154
As long as we do not have universal Zvfh support in binutils linking against libm does not work out of the box. This patch splits the cond_sqrt tests into non-zvfh and zvfh variants and makes the run-zvfh ones depend on a zvfh target. While at it, I also added Zvfh handling to the testsuite helpers. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Remove Float16. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto. * lib/target-supports.exp: Add zvfh handling. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-zvfh-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-zvfh-2.c: New test.
2023-10-27[RA]: Add cost calculation for reg equivalence invariantsVladimir N. Makarov1-0/+4
My recent patch improving cost calculation for pseudos with equivalence resulted in failure of gcc.target/arm/eliminate.c on aarch64. This patch fixes this failure. gcc/ChangeLog: * ira-costs.cc (get_equiv_regno, calculate_equiv_gains): Process reg equivalence invariants.
2023-10-27i386: Fix typo in "partial_memory_read_stall" tune option.Uros Bizjak1-1/+1
gcc/ChangeLog: * config/i386/x86-tune.def (X86_TUNE_PARTIAL_MEMORY_READ_STALL): i386: Fix typo in "partial_memory_read_stall" tune option.
2023-10-27Move OpenMP tests to gomp subdirPaul-Antoine Arras2-4/+0
gcc/testsuite/ChangeLog: * gfortran.dg/c_ptr_tests_20.f90: Moved to... * gfortran.dg/gomp/c_ptr_tests_20.f90: ...here. * gfortran.dg/c_ptr_tests_21.f90: Moved to... * gfortran.dg/gomp/c_ptr_tests_21.f90: ...here.
2023-10-27aarch64: Add basic target_print_operand support for CONST_STRINGVictor Do Nascimento1-0/+5
Motivated by the need to print system register names in output assembly, this patch adds the required logic to `aarch64_print_operand' to accept rtxs of type CONST_STRING and process these accordingly. Consequently, an rtx such as: (set (reg/i:DI 0 x0) (unspec:DI [(const_string ("s3_3_c13_c2_2"))]) can now be output correctly using the following output pattern when composing `define_insn's: "mrs\t%x0, %1" gcc/ChangeLog * config/aarch64/aarch64.cc (aarch64_print_operand): Add support for CONST_STRING.
2023-10-27PR target/110551: Fix reg allocation for widening multiplications on x86.Roger Sayle2-19/+68
This patch contains clean-ups of the widening multiplication patterns in i386.md, and provides variants of the existing highpart multiplication peephole2 transformations (that tidy up register allocation after reload), and thereby fixes PR target/110551, which is a superfluous move instruction. For the new test case, compiled on x86_64 with -O2, the generated code changes as follows. Before: mulx64: movabsq $-7046029254386353131, %rcx movq %rcx, %rax mulq %rdi xorq %rdx, %rax ret After: mulx64: movabsq $-7046029254386353131, %rax mulq %rdi xorq %rdx, %rax ret The clean-ups are (i) that operand 1 is consistently made register_operand and operand 2 becomes nonimmediate_operand, so that predicates match the constraints, (ii) the representation of the BMI2 mulx instruction is updated to use the new umul_highpart RTX, and (iii) because operands 0 and 1 have different modes in widening multiplications, "a" is a more appropriate constraint than "0" (which avoids spills/reloads containing SUBREGs). The new peephole2 transformations are based upon those at around line 9951 of i386.md, which begin with the comment ;; Highpart multiplication peephole2s to tweak register allocation. ;; mov imm,%rdx; mov %rdi,%rax; imulq %rdx -> mov imm,%rax; imulq %rdi 2023-10-27 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR target/110551 * config/i386/i386.md (<u>mul<mode><dwi>3): Make operands 1 and 2 take "register_operand" and "nonimmediate_operand" respectively. (<u>mulqihi3): Likewise. (*bmi2_umul<mode><dwi>3_1): Operand 2 needs to be register_operand matching the %d constraint. Use umul_highpart RTX to represent the highpart multiplication. (*umul<mode><dwi>3_1): Operand 2 should use register_operand predicate, and "a" rather than "0" as operands 0 and 2 have different modes. (define_split): For mul to mulx conversion, use the new umul_highpart RTX representation. (*mul<mode><dwi>3_1): Operand 1 should be register_operand and the constraint %a as operands 0 and 1 have different modes. 
(*<u>mulqihi3_1): Operand 1 should be register_operand matching the constraint %0. (define_peephole2): Providing widening multiplication variants of the peephole2s that tweak highpart multiplication register allocation. gcc/testsuite/ChangeLog PR target/110551 * gcc.target/i386/pr110551.c: New test case.
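The test case itself is not shown in the log. Judging only from the assembly above (a 64-bit constant, a widening mulq, then an xor of the high and low halves), a source function along the following lines would plausibly produce it. This is a guess at its shape, not the contents of pr110551.c, and it assumes a 64-bit target where GCC provides unsigned __int128.

```c
/* Assumed shape of the test: widen-multiply by a 64-bit constant and
   fold the high half of the 128-bit product into the low half.
   0x9E3779B97F4A7C15 is -7046029254386353131 as an unsigned 64-bit value.  */
unsigned long long mulx64(unsigned long long x)
{
    unsigned __int128 r = (unsigned __int128)x * 0x9E3779B97F4A7C15ull;
    return (unsigned long long)(r >> 64) ^ (unsigned long long)r;
}
```

With the constant loaded directly into %rax, the extra movq from %rcx shown in the "Before" listing is no longer needed.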
2023-10-27preprocessor: c++: Support `#pragma GCC target' macros [PR87299]Lewis Hyatt15-27/+219
`#pragma GCC target' is not currently handled in preprocess-only mode (e.g., when running gcc -E or gcc -save-temps). As noted in the PR, this means that if the target pragma defines any macros, those macros are not effective in preprocess-only mode. Similarly, such macros are not effective when compiling with C++ (even when compiling without -save-temps), because C++ does not process the pragma until after all tokens have been obtained from libcpp, at which point it is too late for macro expansion to take place. Since r13-1544 and r14-2893, there is a general mechanism to handle pragmas under these conditions as well, so resolve the PR by using the new "early pragma" support. toplev.cc required some changes because the target-specific handlers for `#pragma GCC target' may call target_reinit(), and toplev.cc was not expecting that function to be called in preprocess-only mode. I added some additional testcases from the PR for x86. The other targets that support `#pragma GCC target' (aarch64, arm, nios2, powerpc, s390) already had tests verifying that the pragma sets macros as expected; here I have added -save-temps versions of some of them, to test that they now work in preprocess-only mode as well. gcc/c-family/ChangeLog: PR preprocessor/87299 * c-pragma.cc (init_pragma): Register `#pragma GCC target' and related pragmas in preprocess-only mode, and enable early handling. (c_reset_target_pragmas): New function refactoring code from... (handle_pragma_reset_options): ...here. * c-pragma.h (c_reset_target_pragmas): Declare. gcc/cp/ChangeLog: PR preprocessor/87299 * parser.cc (cp_lexer_new_main): Call c_reset_target_pragmas () after preprocessing is complete, before starting compilation. gcc/ChangeLog: PR preprocessor/87299 * toplev.cc (no_backend): New static global. (finalize): Remove argument no_backend, which is now a static global. (process_options): Likewise. (do_compile): Likewise. (target_reinit): Don't do anything in preprocess-only mode. 
(toplev::main): Adapt to no_backend change. (toplev::finalize): Likewise. gcc/testsuite/ChangeLog: PR preprocessor/87299 * c-c++-common/pragma-target-1.c: New test. * c-c++-common/pragma-target-2.c: New test. * g++.target/i386/pr87299-1.C: New test. * g++.target/i386/pr87299-2.C: New test. * gcc.target/i386/pr87299-1.c: New test. * gcc.target/i386/pr87299-2.c: New test. * gcc.target/s390/target-attribute/tattr-2b.c: New test. * gcc.target/aarch64/pragma_cpp_predefs_1b.c: New test. * gcc.target/arm/pragma_arch_attribute_1b.c: New test. * gcc.target/nios2/custom-fp-2b.c: New test. * gcc.target/powerpc/float128-3b.c: New test.
2023-10-27Fortran: Fix some problems with SELECT TYPE selectors [PR104625].Paul Thomas5-12/+79
2023-10-27 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/104625 * expr.cc (gfc_check_vardef_context): Check that the target does have a vector index before emitting the specific error. * match.cc (copy_ts_from_selector_to_associate): Ensure that class valued operator expressions set the selector rank and use the rank to provide the associate variable with an appropriate array spec. * resolve.cc (resolve_operator): Reduce stacked parentheses to a single pair. (fixup_array_ref): Extract selector symbol from parentheses. gcc/testsuite/ PR fortran/104625 * gfortran.dg/pr104625.f90: New test. * gfortran.dg/associate_55.f90: Change error check.
2023-10-27MATCH: Simplify `(X &| B) CMP X` if possible [PR 101590]Andrew Pinski7-0/+205
I noticed we were missing these simplifications so let's add them. This adds the following simplifications when U is known to be non-negative: U & N <= U -> true U & N > U -> false When N is also known to be non-negative, the following also hold: U | N < U -> false U | N >= U -> true When N is a negative integer, the result flips and we get: U | N < U -> true U | N >= U -> false We could extend this later to the case where N is not constant but is known to be negative. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/101590 PR tree-optimization/94884 gcc/ChangeLog: * match.pd (`(X BIT_OP Y) CMP X`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/bitcmp-1.c: New test. * gcc.dg/tree-ssa/bitcmp-2.c: New test. * gcc.dg/tree-ssa/bitcmp-3.c: New test. * gcc.dg/tree-ssa/bitcmp-4.c: New test. * gcc.dg/tree-ssa/bitcmp-5.c: New test. * gcc.dg/tree-ssa/bitcmp-6.c: New test.
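The identities can be checked by brute force over a small range. The sketch below is only such a sanity check, not one of the bitcmp tests; the function name is made up, and it assumes the usual two's-complement representation of negative int values.

```c
/* Returns 1 if every identity from the commit message holds for all
   non-negative U in [0, 255] and all N in [-256, 255].  */
int bit_cmp_identities_hold(void)
{
    for (int u = 0; u <= 255; u++) {
        for (int n = -256; n <= 255; n++) {
            /* U non-negative: U & N can only clear bits of U.  */
            if (!((u & n) <= u)) return 0;
            if ((u & n) > u) return 0;
            if (n >= 0) {
                /* N non-negative: U | N can only set bits of U.  */
                if ((u | n) < u) return 0;
                if (!((u | n) >= u)) return 0;
            } else {
                /* N negative: U | N has the sign bit set, so it is below U.  */
                if (!((u | n) < u)) return 0;
                if ((u | n) >= u) return 0;
            }
        }
    }
    return 1;
}
```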
2023-10-27Support vec_cmpmn/vcondmn for v2hf/v4hf.liuhongt6-23/+352
gcc/ChangeLog: PR target/103861 * config/i386/i386-expand.cc (ix86_expand_sse_movcc): Handle V2HF/V2BF/V4HF/V4BFmode. * config/i386/i386.cc (ix86_get_mask_mode): Return QImode when data_mode is V4HF/V2HFmode. * config/i386/mmx.md (vec_cmpv4hfqi): New expander. (vcond_mask_<mode>v4hi): Ditto. (vcond_mask_<mode>qi): Ditto. (vec_cmpv2hfqi): Ditto. (vcond_mask_<mode>v2hi): Ditto. (mmx_pblendvb_<mode>): Add 2 combine splitters after the patterns. (mmx_pblendvb_v8qi): Ditto. (<code>v2hi3): Add a combine splitter after the pattern. (<code><mode>3): Ditto. (<code>v8qi3): Ditto. (<code><mode>3): Ditto. * config/i386/sse.md (vcond<mode><mode>): Merge this with .. (vcond<sseintvecmodelower><mode>): .. this into .. (vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): .. this, and extend to V8BF/V16BF/V32BFmode. gcc/testsuite/ChangeLog: * g++.target/i386/part-vect-vcondhf.C: New test. * gcc.target/i386/part-vect-vec_cmphf.c: New test.
2023-10-27Daily bump.GCC Administrator11-1/+358
2023-10-27RISC-V: Move lmul calculation into macroJuzhe-Zhong2-10/+11
Notice we calculate LMUL according to --param=riscv-autovec-lmul in multiple places: int lmul = riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul; Create a new macro for it to ease maintenance. gcc/ChangeLog: * config/riscv/riscv-opts.h (TARGET_MAX_LMUL): New macro. * config/riscv/riscv-v.cc (preferred_simd_mode): Adapt macro. (autovectorize_vector_modes): Ditto. (can_find_related_mode_p): Ditto.
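A minimal sketch of the refactoring, assuming the macro simply wraps the expression quoted above. The enum, its values, the global variable and the helper max_lmul_for are stand-ins invented for this example; the real definitions live in GCC's RISC-V backend and may differ.

```c
/* Hypothetical stand-ins for the real GCC definitions; the numeric
   values are assumptions made for this sketch.  */
enum autovec_lmul { RVV_M1 = 1, RVV_M2 = 2, RVV_M4 = 4, RVV_M8 = 8,
                    RVV_DYNAMIC = 9 };

/* Stand-in for the value set by --param=riscv-autovec-lmul.  */
enum autovec_lmul riscv_autovec_lmul = RVV_M1;

/* One macro instead of repeating the conditional at every use site.  */
#define TARGET_MAX_LMUL \
    ((int)(riscv_autovec_lmul == RVV_DYNAMIC ? RVV_M8 : riscv_autovec_lmul))

/* Helper for the example only: set the parameter and read the macro.  */
int max_lmul_for(enum autovec_lmul setting)
{
    riscv_autovec_lmul = setting;
    return TARGET_MAX_LMUL;
}
```

Call sites such as preferred_simd_mode can then use TARGET_MAX_LMUL directly, so a later change to the dynamic-LMUL rule needs to be made in only one place.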
2023-10-27RISC-V: Add AVL propagation PASS for RVV auto-vectorizationJuzhe-Zhong11-6/+482
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization, which has been a known issue for a long time and which I finally found the time to address. Consider a simple vector addition operation: https://godbolt.org/z/7hfGfEjW3 void foo (int *__restrict a, int *__restrict b, int n) { for (int i = 0; i < n; i++) a[i] = a[i] + b[i]; } Optimized IR: Loop body: _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma ... vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0) vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1) vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2 .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4) We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling. The AVL/VL toggling happens because we are missing LEN information in the simple PLUS_EXPR GIMPLE assignment: vect__7.12_19 = vect__6.11_20 + vect__4.8_27; GCC applies partially predicated load/store and un-predicated full vector operations in partial vectorization. This flow is used by all other targets like ARM SVE (RVV also uses it): ARM SVE: .L3: ld1w z30.s, p7/z, [x0, x3, lsl 2] -> predicated load ld1w z31.s, p7/z, [x1, x3, lsl 2] -> predicated load add z31.s, z31.s, z30.s -> un-predicated add st1w z31.s, p7, [x0, x3, lsl 2] -> predicated store Such a vectorization flow causes AVL/VL toggling on RVV, so we need an AVL propagation PASS for it. Also, it's very unlikely that we can apply predicated operations in all vectorization, for the following reasons: 1. It's a very heavy workload to support them in all vectorization, and we don't see any benefit if we can handle that in the target backend. 2. Changing the loop vectorizer for it would make the code base ugly and hard to maintain. 3. We would need a great many patterns for all operations. 
Not only COND_LEN_ADD, COND_LEN_SUB, ..., but also COND_LEN_EXTEND, ..., COND_LEN_CEIL, ...: over 100 patterns, an unreasonable number. To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS to elide the redundant vsetvls caused by AVL/VL toggling. The second question is why we add a separate PASS called AVL propagation. Why not optimize it in the VSETVL PASS? (We definitely can optimize AVL in the VSETVL PASS.) Frankly, I was planning to address this issue in the VSETVL PASS, which is why we recently refactored it. However, I changed my mind after several experiments and attempts. The reasons are as follows: 1. Code base management and maintenance. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations, and it turns out to generate optimal codegen in most cases. It's not a good idea to keep adding more features to the VSETVL PASS and make it heavier and heavier again, because then we would need to refactor it again in the future. Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change the VSETVL PASS any more except for minor fixes. 2. vsetvl insertion (which the VSETVL PASS does) and AVL propagation are 2 different things; I don't think we should fuse them into the same PASS. 3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register usage. 4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations. The code in this patch is only a few hundred lines, which is very manageable and easy to extend with features and enhancements. We can easily extend and enhance AVL propagation in a clean and separate PASS in the future. (If we did it in the VSETVL PASS, we would again complicate a PASS which is already so complicated.) 
Here is an example to demonstrate more: https://godbolt.org/z/bE86sv3q5 void foo2 (int *__restrict a, int *__restrict b, int *__restrict c, int *__restrict a2, int *__restrict b2, int *__restrict c2, int *__restrict a3, int *__restrict b3, int *__restrict c3, int *__restrict a4, int *__restrict b4, int *__restrict c4, int *__restrict a5, int *__restrict b5, int *__restrict c5, int n) { for (int i = 0; i < n; i++){ a[i] = b[i] + c[i]; b5[i] = b[i] + c[i]; a2[i] = b2[i] + c2[i]; a3[i] = b3[i] + c3[i]; a4[i] = b4[i] + c4[i]; a5[i] = a[i] + a4[i]; a[i] = a5[i] + b5[i]+ a[i]; a[i] = a[i] + c[i]; b5[i] = a[i] + c[i]; a2[i] = a[i] + c2[i]; a3[i] = a[i] + c3[i]; a4[i] = a[i] + c4[i]; a5[i] = a[i] + a4[i]; a[i] = a[i] + b5[i]+ a[i]; } } 1. Loop Body: Before this patch: After this patch: vsetvli a4,t1,e8,mf4,ta,ma vsetvli a4,t1,e32,m1,ta,ma vle32.v v2,0(a2) vle32.v v2,0(a2) vle32.v v4,0(a1) vle32.v v3,0(t2) vle32.v v1,0(t2) vle32.v v4,0(a1) vsetvli a7,zero,e32,m1,ta,ma vle32.v v1,0(t0) vadd.vv v4,v2,v4 vadd.vv v4,v2,v4 vsetvli zero,a4,e32,m1,ta,ma vadd.vv v1,v3,v1 vle32.v v3,0(s0) vadd.vv v1,v1,v4 vsetvli a7,zero,e32,m1,ta,ma vadd.vv v1,v1,v4 vadd.vv v1,v3,v1 vadd.vv v1,v1,v4 vadd.vv v1,v1,v4 vadd.vv v1,v1,v2 vadd.vv v1,v1,v4 vadd.vv v2,v1,v2 vadd.vv v1,v1,v4 vse32.v v2,0(t5) vsetvli zero,a4,e32,m1,ta,ma vadd.vv v2,v2,v1 vle32.v v4,0(a5) vadd.vv v2,v2,v1 vsetvli a7,zero,e32,m1,ta,ma slli a7,a4,2 vadd.vv v1,v1,v2 vadd.vv v3,v1,v3 vadd.vv v2,v1,v2 vle32.v v5,0(a5) vadd.vv v4,v1,v4 vle32.v v6,0(t6) vsetvli zero,a4,e32,m1,ta,ma vse32.v v3,0(t3) vse32.v v2,0(t5) vse32.v v2,0(a0) vse32.v v4,0(a3) vadd.vv v3,v3,v1 vsetvli a7,zero,e32,m1,ta,ma vadd.vv v2,v1,v5 vadd.vv v3,v1,v3 vse32.v v3,0(t4) vadd.vv v2,v2,v1 vadd.vv v1,v1,v6 vadd.vv v2,v2,v1 vse32.v v2,0(a3) vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a6) vse32.v v2,0(a0) vse32.v v3,0(t3) vle32.v v2,0(t0) vsetvli a7,zero,e32,m1,ta,ma vadd.vv v3,v3,v1 vsetvli zero,a4,e32,m1,ta,ma vse32.v v3,0(t4) vsetvli a7,zero,e32,m1,ta,ma slli 
a7,a4,2 vadd.vv v1,v1,v2 sub t1,t1,a4 vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a6) As is quite obvious, all the heavy and redundant vsetvls inside the loop body are eliminated. 2. Epilogue: Before this patch: After this patch: .L5: .L5: ld s0,8(sp) ret addi sp,sp,16 jr ra This is the benefit of doing the AVL propagation before RA, since we eliminate the use of the 'a7' register by the redundant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma' The final codegen after this patch: foo2: lw t1,56(sp) ld t6,0(sp) ld t3,8(sp) ld t0,16(sp) ld t2,24(sp) ld t4,32(sp) ld t5,40(sp) ble t1,zero,.L5 .L3: vsetvli a4,t1,e32,m1,ta,ma vle32.v v2,0(a2) vle32.v v3,0(t2) vle32.v v4,0(a1) vle32.v v1,0(t0) vadd.vv v4,v2,v4 vadd.vv v1,v3,v1 vadd.vv v1,v1,v4 vadd.vv v1,v1,v4 vadd.vv v1,v1,v4 vadd.vv v1,v1,v2 vadd.vv v2,v1,v2 vse32.v v2,0(t5) vadd.vv v2,v2,v1 vadd.vv v2,v2,v1 slli a7,a4,2 vadd.vv v3,v1,v3 vle32.v v5,0(a5) vle32.v v6,0(t6) vse32.v v3,0(t3) vse32.v v2,0(a0) vadd.vv v3,v3,v1 vadd.vv v2,v1,v5 vse32.v v3,0(t4) vadd.vv v1,v1,v6 vse32.v v2,0(a3) vse32.v v1,0(a6) sub t1,t1,a4 add a1,a1,a7 add a2,a2,a7 add a5,a5,a7 add t6,t6,a7 add t0,t0,a7 add t2,t2,a7 add t5,t5,a7 add a3,a3,a7 add a6,a6,a7 add t3,t3,a7 add t4,t4,a7 add a0,a0,a7 bne t1,zero,.L3 .L5: ret PR target/111318 PR target/111888 gcc/ChangeLog: * config.gcc: Add AVL propagation pass. * config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto. * config/riscv/riscv-protos.h (make_pass_avlprop): Ditto. * config/riscv/t-riscv: Ditto. * config/riscv/riscv-avlprop.cc: New file. gcc/testsuite/ChangeLog: * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test. * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto. * gcc.target/riscv/rvv/autovec/pr111318.c: New test. * gcc.target/riscv/rvv/autovec/pr111888.c: New test. Tested-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-26libstdc++: Fix exception thrown by std::shared_lock::unlock() [PR112089]Jonathan Wakely2-1/+24
The incorrect errc constant here looks like a copy&paste error. libstdc++-v3/ChangeLog: PR libstdc++/112089 * include/std/shared_mutex (shared_lock::unlock): Change errc constant to operation_not_permitted. * testsuite/30_threads/shared_lock/locking/112089.cc: New test.
2023-10-26libstdc++: Add dg-timeout-factor to <chrono> IO testsJonathan Wakely19-0/+19
This avoids failures due to compilation timeouts when testing with a low tool_timeout value. libstdc++-v3/ChangeLog: * testsuite/20_util/duration/io.cc: Double timeout using dg-timeout-factor. * testsuite/std/time/day/io.cc: Likewise. * testsuite/std/time/format.cc: Likewise. * testsuite/std/time/hh_mm_ss/io.cc: Likewise. * testsuite/std/time/month/io.cc: Likewise. * testsuite/std/time/month_day/io.cc: Likewise. * testsuite/std/time/month_day_last/io.cc: Likewise. * testsuite/std/time/month_weekday/io.cc: Likewise. * testsuite/std/time/month_weekday_last/io.cc: Likewise. * testsuite/std/time/weekday/io.cc: Likewise. * testsuite/std/time/weekday_indexed/io.cc: Likewise. * testsuite/std/time/weekday_last/io.cc: Likewise. * testsuite/std/time/year/io.cc: Likewise. * testsuite/std/time/year_month/io.cc: Likewise. * testsuite/std/time/year_month_day/io.cc: Likewise. * testsuite/std/time/year_month_day_last/io.cc: Likewise. * testsuite/std/time/year_month_weekday/io.cc: Likewise. * testsuite/std/time/year_month_weekday_last/io.cc: Likewise. * testsuite/std/time/zoned_time/io.cc: Likewise.
2023-10-26Add attribute((null_terminated_string_arg(PARAM_IDX)))David Malcolm13-28/+629
This patch adds a new function attribute to GCC for marking that an argument is expected to be a null-terminated string. For example, consider: void test_a (const char *p) __attribute__((null_terminated_string_arg (1))); which would indicate to humans and compilers that argument 1 of "test_a" is expected to be a null-terminated string, with the idea: - we should complain if it's not valid to read from *p up to the first '\0' character in the buffer - we should complain if *p is not terminated, or if it's uninitialized before the first '\0' character This is independent of the nonnull-ness of the pointer: if you also want to express that the argument must be non-null, we already have __attribute__((nonnull (N))), so the user can write e.g.: void test_b (const char *p) __attribute__((null_terminated_string_arg (1))) __attribute__((nonnull (1))); which can also be spelled as: void test_b (const char *p) __attribute__((null_terminated_string_arg (1), nonnull (1))); For a function similar to strncpy, we can use the "access" attribute to express a maximum size of the read: void test_c (const char *p, size_t sz) __attribute__((null_terminated_string_arg (1), nonnull (1), access (read_only, 1, 2))); The patch implements: (a) C/C++ frontends: recognition of this attribute (b) analyzer: usage of this attribute gcc/analyzer/ChangeLog: * region-model.cc (region_model::check_external_function_for_access_attr): Split out, replacing with... (region_model::check_function_attr_access): ...this new function and... (region_model::check_function_attrs): ...this new function. (region_model::check_one_function_attr_null_terminated_string_arg): New. (region_model::check_function_attr_null_terminated_string_arg): New. (region_model::handle_unrecognized_call): Update for renaming of check_external_function_for_access_attr to check_function_attrs. (region_model::check_for_null_terminated_string_arg): Add return value to one overload. Make both overloads const. 
* region-model.h: Include "stringpool.h" and "attribs.h". (region_model::check_for_null_terminated_string_arg): Add return value to one overload. Make both overloads const. (region_model::check_external_function_for_access_attr): Delete decl. (region_model::check_function_attr_access): New decl. (region_model::check_function_attr_null_terminated_string_arg): New decl. (region_model::check_one_function_attr_null_terminated_string_arg): New decl. (region_model::check_function_attrs): New decl. gcc/c-family/ChangeLog: * c-attribs.cc (c_common_attribute_table): Add "null_terminated_string_arg". (handle_null_terminated_string_arg_attribute): New. gcc/ChangeLog: * doc/extend.texi (Common Function Attributes): Add null_terminated_string_arg. gcc/testsuite/ChangeLog: * c-c++-common/analyzer/attr-null_terminated_string_arg-access-read_write.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-access-without-size.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-multiple.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-2.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull-sized.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-nonnull.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-nullable-sized.c: New test. * c-c++-common/analyzer/attr-null_terminated_string_arg-nullable.c: New test. * c-c++-common/attr-null_terminated_string_arg.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
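A small sketch of how a project might adopt the attribute while still building with compilers that predate it. The NUL_TERMINATED macro and the count_chars function are hypothetical names chosen for this example; only the attribute spelling comes from the commit message.

```c
#include <stddef.h>

/* Fall back gracefully when the compiler lacks __has_attribute.  */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical wrapper macro: expands to the new attribute where it is
   supported and to nothing elsewhere.  */
#if __has_attribute(null_terminated_string_arg)
#define NUL_TERMINATED(idx) __attribute__((null_terminated_string_arg(idx)))
#else
#define NUL_TERMINATED(idx)
#endif

/* Argument 1 must be a non-null pointer to a null-terminated string.  */
size_t count_chars(const char *p)
    NUL_TERMINATED(1) __attribute__((nonnull(1)));

size_t count_chars(const char *p)
{
    size_t n = 0;
    while (p[n] != '\0')
        n++;
    return n;
}
```

According to the commit message the analyzer makes use of the attribute, so passing an unterminated buffer to count_chars is the kind of call one would expect it to flag when built with -fanalyzer.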
2023-10-26testsuite, aarch64: Normalise options to aarch64.exp.Iain Sandoe1-4/+5
When the compiler is configured --with-cpu= and that is different from the baselines assumed, we see excess test fails (primarily in body code scans which are necessarily sensitive to costs). To stabilize the testsuite against such changes, use aarch64-with-arch-dg-options () to provide suitable consistent defaults. e.g. for --with-cpu=xgene1 we see over 100 excess fails which are removed by this change. gcc/testsuite/ChangeLog: * gcc.target/aarch64/aarch64.exp: Use aarch64-with-arch-dg-options to normalize the options to the tests in aarch64.exp. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-26testsuite, Darwin: Adjust target test for modern OS.Iain Sandoe1-1/+1
The same conditions on use of DYLD_LIBRARY_PATH apply to OS versions 11 to 14, so make the test general. gcc/testsuite/ChangeLog: * lib/target-libpath.exp: Skip DYLD_LIBRARY_PATH for all current OS versions > 10. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-26match: Simplify `a != C1 ? abs(a) : C2` when C2 == abs(C1) [PR111957]Andrew Pinski2-0/+35
This adds a match pattern for `a != C1 ? abs(a) : C2`, which gets simplified to `abs(a)` when C2 == abs(C1). If C1 was originally *_MIN, the pattern uses absu instead of abs. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111957 gcc/ChangeLog: * match.pd (`a != C1 ? abs(a) : C2`): New pattern. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-40.c: New test.
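The equivalence behind this pattern can be sketched in plain C. The function names and the choice of C1 = -5 (so C2 = 5) are illustrative assumptions, not taken from the new test; the *_MIN case is deliberately left out because abs() of the minimum value is undefined for signed types.

```c
#include <assert.h>
#include <stdlib.h>

/* Source form: a != C1 ? abs(a) : C2, with C1 = -5 and C2 = abs(C1) = 5.  */
int select_form(int a)
{
    return a != -5 ? abs(a) : 5;
}

/* Simplified form: when a == C1 the constant arm already equals abs(a).  */
int simplified_form(int a)
{
    return abs(a);
}

/* Compare both forms over a closed range that excludes INT_MIN.  */
void check_equivalent(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        assert(select_form(a) == simplified_form(a));
}
```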
2023-10-26Add effective target to OpenMP testsPaul-Antoine Arras2-0/+2
This adds an effective target DejaGnu directive to prevent these testcases from failing on GCC configurations that do not support OpenMP. This fixes 8d2130a4e5c. gcc/testsuite/ChangeLog: * gfortran.dg/c_ptr_tests_20.f90: Add "fopenmp" effective target. * gfortran.dg/c_ptr_tests_21.f90: Add "fopenmp" effective target.
2023-10-26[range-op] Remove unused variable in fold_range.Aldy Hernandez1-1/+0
gcc/ChangeLog: * range-op-float.cc (range_operator::fold_range): Delete unused variable.
2023-10-26[range-ops] Remove unneeded parameters from rv_fold.Aldy Hernandez3-66/+14
Now that the floating point version of rv_fold calculates its result in an frange, we can remove the superfluous LB, UB, and MAYBE_NAN arguments. gcc/ChangeLog: * range-op-float.cc (range_operator::fold_range): Remove superfluous code. (range_operator::rv_fold): Remove unneeded arguments. (operator_plus::rv_fold): Same. (operator_minus::rv_fold): Same. (operator_mult::rv_fold): Same. (operator_div::rv_fold): Same. * range-op-mixed.h: Remove lb, ub, and maybe_nan arguments from rv_fold methods. * range-op.h: Same.
2023-10-26[range-ops] Add frange& argument to rv_fold.Aldy Hernandez3-32/+107
The floating point version of rv_fold returns its result in 3 pieces: the lower bound, the upper bound, and a maybe_nan bit. It is cleaner to return everything in an frange, thus bringing the floating point version of rv_fold in line with the integer version. This first patch adds an frange argument, while keeping the current functionality, and asserting that we get the same results. In a follow-up patch I will nuke the now useless 3 arguments. Splitting this into two patches makes it easier to bisect any problems if any should arise. gcc/ChangeLog: * range-op-float.cc (range_operator::fold_range): Pass frange argument to rv_fold. (range_operator::rv_fold): Add frange argument. (operator_plus::rv_fold): Same. (operator_minus::rv_fold): Same. (operator_mult::rv_fold): Same. (operator_div::rv_fold): Same. * range-op-mixed.h: Add frange argument to rv_fold methods. * range-op.h: Same.
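The transition step described here (compute the result both ways and assert that they agree) can be sketched in C. Everything below is a hypothetical stand-in: struct fp_range is not GCC's frange, and the addition ignores rounding, infinities and the NaN that inf + -inf would produce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for frange: bounds plus a NaN flag in one object.  */
struct fp_range {
    double lb;
    double ub;
    bool maybe_nan;
};

/* Old style: the result comes back in three separate pieces.  */
void add_fold_pieces(double *lb, double *ub, bool *maybe_nan,
                     struct fp_range a, struct fp_range b)
{
    *lb = a.lb + b.lb;
    *ub = a.ub + b.ub;
    *maybe_nan = a.maybe_nan || b.maybe_nan;
}

/* New style: one output object.  While both paths exist, assert that
   they agree, which makes a later bisect easier.  */
void add_fold_range(struct fp_range *r, struct fp_range a, struct fp_range b)
{
    double lb, ub;
    bool maybe_nan;

    r->lb = a.lb + b.lb;
    r->ub = a.ub + b.ub;
    r->maybe_nan = a.maybe_nan || b.maybe_nan;

    add_fold_pieces(&lb, &ub, &maybe_nan, a, b);
    assert(r->lb == lb && r->ub == ub && r->maybe_nan == maybe_nan);
}
```

Once the assertion has held for a while, the three-piece interface can be dropped, which is what the follow-up patch does.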
2023-10-26RISC-V: Pass abi to g++ rvv testsuitePatrick O'Neill1-1/+3
On rv32gcv testcases like g++.target/riscv/rvv/base/bug-22.C fail with: FAIL: g++.target/riscv/rvv/base/bug-22.C (test for excess errors) Excess errors: cc1plus: error: ABI requires '-march=rv32' This patch adds the -mabi argument to g++ rvv tests. gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/rvv.exp: Add -mabi argument to CFLAGS. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-26libatomic: Consider '--with-build-sysroot=[...]' for target libraries' ↵Thomas Schwinge6-3/+15
build-tree testing (instead of build-time 'CC' etc.) [PR109951] Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7 "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]", this is commit 5ff06d762a88077aff0fb637c931c64e6f47f93d "libatomic/test: Fix compilation for build sysroot" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC'. PR testsuite/109951 libatomic/ * configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libatomic.exp (libatomic_init): If '--with-build-sysroot=[...]' was specified, use it for build-tree testing. * testsuite/libatomic-site-extra.exp.in (GCC_UNDER_TEST): Don't set. (SYSROOT_CFLAGS_FOR_TARGET): Set.
2023-10-26libffi: Consider '--with-build-sysroot=[...]' for target libraries' ↵Thomas Schwinge7-6/+20
build-tree testing (instead of build-time 'CC' etc.) [PR109951] Similar to commit fb5d27be272b71fb9026224535fc73f125ce3be7 "libgomp: Consider '--with-build-sysroot=[...]' for target libraries' build-tree testing (instead of build-time 'CC' etc.) [PR91884, PR109951]", this is commit a0b48358cb1e70e161a87ec5deb7a4b25defba6b "libffi/test: Fix compilation for build sysroot" done differently, avoiding build-tree testing use of any random gunk that may appear in build-time 'CC', 'CXX'. PR testsuite/109951 libffi/ * configure.ac: 'AC_SUBST(SYSROOT_CFLAGS_FOR_TARGET)'. <local.exp>: Don't set 'CC_FOR_TARGET', 'CXX_FOR_TARGET', instead set 'SYSROOT_CFLAGS_FOR_TARGET'. * Makefile.in: Regenerate. * configure: Likewise. * include/Makefile.in: Likewise. * man/Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/lib/libffi.exp (libffi_target_compile): If '--with-build-sysroot=[...]' was specified, use it for build-tree testing.
2023-10-26testsuite: Allow general skips/requires in PCH testsRichard Sandiford4-24/+42
dg-pch.exp handled dg-require-effective-target pch_supported_debug as a special case, by grepping the source code. This patch tries to generalise it to other dg-require-effective-targets, and to dg-skip-if. There also seemed to be some errors in check-flags. It used: lappend $args [list <elt>] which treats the contents of args as a variable name. I think it was supposed to be "lappend args" instead. From the later code, the element was supposed to be <elt> itself, rather than a singleton list containing <elt>. We can also save some time by doing the common early-exit first. Doing this removes the need to specify the dg-require-effective-target in both files. Tested by faking unsupported debug and checking that the tests were still correctly skipped. gcc/testsuite/ * lib/target-supports-dg.exp (check-flags): Move default argument handling further up. Fix a couple of issues in the lappends. Avoid frobbing the compiler flags if the return value is already known to be 1. * lib/dg-pch.exp (dg-flags-pch): Process the dg-skip-if and dg-require-effective-target directives to see whether the assembly test should be skipped. * gcc.dg/pch/valid-1.c: Remove dg-require-effective-target. * gcc.dg/pch/valid-1b.c: Likewise.
2023-10-26arm: Use deltas for Arm switch tablesRichard Ball6-12/+242
For normal optimization for the Arm state in gcc we get an uncompressed table of jump targets. This table sits in the middle of the text segment and is far larger than necessary, especially at -Os. This patch compresses the table to use deltas in a similar manner to Thumb code generation. Similar code is also used for -fpic where we currently generate a jump to a jump. In this format the jumps are too dense for the hardware branch predictor to handle accurately, so execution is likely to be very expensive. Changes to switch statements for arm include a new function to handle the assembly generation for different machine modes. This allows for more optimisation to be performed in aout.h where arm has switched from using ASM_OUTPUT_ADDR_VEC_ELT to using ASM_OUTPUT_ADDR_DIFF_ELT. In ASM_OUTPUT_ADDR_DIFF_ELT new assembly generation options have been added to utilise the different machine modes. Additional changes made to the casesi expand and insn, CASE_VECTOR_PC_RELATIVE, CASE_VECTOR_SHORTEN_MODE and LABEL_ALIGN_AFTER_BARRIER are all to accommodate this new approach to switch statement generation. New tests have been added and no regressions on arm-none-eabi. gcc/ChangeLog: * config/arm/aout.h (ASM_OUTPUT_ADDR_DIFF_ELT): Add table output for different machine modes for arm. * config/arm/arm-protos.h (arm_output_casesi): New prototype. * config/arm/arm.h (CASE_VECTOR_PC_RELATIVE): Make arm use ASM_OUTPUT_ADDR_DIFF_ELT. (CASE_VECTOR_SHORTEN_MODE): Change table size calculation for TARGET_ARM. (LABEL_ALIGN_AFTER_BARRIER): Change to accommodate .p2align 2 for TARGET_ARM. * config/arm/arm.cc (arm_output_casesi): New function. * config/arm/arm.md (arm_casesi_internal): Change casesi expand and insn for arm to use new function arm_output_casesi. gcc/testsuite/ChangeLog: * gcc.target/arm/arm-switchstatement.c: New test.
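The general idea of storing deltas rather than absolute jump targets can be sketched in C. This models only the data layout: the function names, the fixed one-byte element size and the example addresses are assumptions, whereas the real back end selects the element mode per table and emits assembly rather than C arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode absolute targets as one-byte offsets from BASE.
   Returns 0 on success, -1 if some target does not fit in a byte.  */
int encode_deltas(const uint32_t *targets, size_t n, uint32_t base,
                  uint8_t *deltas)
{
    assert(targets != NULL && deltas != NULL);
    for (size_t i = 0; i < n; i++) {
        if (targets[i] < base || targets[i] - base > UINT8_MAX)
            return -1;
        deltas[i] = (uint8_t)(targets[i] - base);
    }
    return 0;
}

/* Recover the absolute target for case IDX from the compressed table.  */
uint32_t decode_target(uint32_t base, const uint8_t *deltas, size_t idx)
{
    assert(deltas != NULL);
    return base + deltas[idx];
}
```

In this sketch a three-entry table shrinks from 12 bytes of absolute addresses to 3 bytes of deltas, at the cost of one addition when dispatching.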
2023-10-26Darwin: Make metadata symbol labels linker-visible for GNU objc.Iain Sandoe1-1/+1
Now that we have shifted to using the same relocation mechanism as clang for objective-c typeinfo, the static linker needs to have a linker-visible symbol for metadata names (this is only needed for GNU objective C; for NeXT the names are in separate sections). gcc/ChangeLog: * config/darwin.h (darwin_label_is_anonymous_local_objc_name): Make metadata names linker-visible for GNU objective C. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-26[RA]: Modify cost calculation for dealing with equivalencesVladimir N. Makarov3-8/+179
RISCV target developers reported that pseudos with equivalence used in a loop can be spilled. Simple changes to the cost-calculation heuristics for pseudos with equivalences, or even ignoring equivalences, resulted in numerous testsuite failures on different targets or worse spec2017 performance. This patch implements more sophisticated cost calculations of pseudos with equivalences. The patch does not change RA behaviour for targets still using the old reload pass instead of LRA. The patch solves the reported problem and improves x86-64 specint2017 a bit (specfp2017 performance stays the same). The patch takes into account how the equivalence will be used: will it be integrated into the user insns or require an input reload insn. It requires an additional pass over insns. To compensate for the RA slowdown, the patch removes a pass over insns in the reload pass that IRA used before. This also decouples IRA from reload more and will help to remove the reload pass in the future if it ever happens. gcc/ChangeLog: * dwarf2out.cc (reg_loc_descriptor): Use lra_eliminate_regs when LRA is used. * ira-costs.cc: Include regset.h. (equiv_can_be_consumed_p, get_equiv_regno, calculate_equiv_gains): New functions. (find_costs_and_classes): Call calculate_equiv_gains and redefine mem_cost of pseudos with equivs when LRA is used. * var-tracking.cc: Include ira.h and lra.h. (vt_initialize): Use lra_eliminate_regs when LRA is used.
2023-10-26Fortran: Fix incompatible types between INTEGER(8) and TYPE(c_ptr)Paul-Antoine Arras4-5/+132
In the context of an OpenMP declare variant directive, arguments of type C_PTR are sometimes recognised as C_PTR in the base function and as INTEGER(8) in the variant, or the other way around, depending on the parsing order. This patch prevents such a situation from turning into a compile error. 2023-10-20 Paul-Antoine Arras <pa@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> gcc/fortran/ChangeLog: * interface.cc (gfc_compare_types): Return true if one type is C_PTR and the other is a compatible INTEGER(8). * misc.cc (gfc_typename): Handle the case where an INTEGER(8) actually holds a TYPE(C_PTR). gcc/testsuite/ChangeLog: * gfortran.dg/c_ptr_tests_20.f90: New test, checking that INTEGER(8) and TYPE(C_PTR) are recognised as compatible. * gfortran.dg/c_ptr_tests_21.f90: New test, exercising the error detection for C_FUNPTR.
2023-10-26DOC: Update COND_LEN documentJuzhe-Zhong1-6/+12
gcc/ChangeLog: * doc/md.texi: Adapt COND_LEN pseudo code.
2023-10-26PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.Roger Sayle2-2/+10
This patch is my proposed solution to PR rtl-optimization/91865. Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is possible for combine's make_compound_operation to unintentionally generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be matched by the backend.

For the new test case:

    const int table[2] = {1, 2};
    int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Failed to match this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

    foo:    AND     #0xff, R12
            RLAM.A #4, R12 { RRAM.A #4, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

With this patch, we now see:

    Trying 2 -> 7:
        2: r25:HI=zero_extend(R12:QI)
          REG_DEAD R12:QI
        7: r28:PSI=sign_extend(r25:HI)#0
          REG_DEAD r25:HI
    Successfully matched this instruction:
    (set (reg:PSI 28 [ iD.1772 ])
        (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
    allowing combination of insns 2 and 7
    original costs 4 + 8 = 12
    replacement cost 8

    foo:    MOV.B   R12, R12
            RLAM.A  #1, R12
            MOVX.W  table(R12), R12
            RETA

2023-10-26 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog PR rtl-optimization/91865 * combine.cc (make_compound_operation): Avoid creating a ZERO_EXTEND of a ZERO_EXTEND. gcc/testsuite/ChangeLog PR rtl-optimization/91865 * gcc.target/msp430/pr91865.c: New test case.
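The canonicalization this commit relies on, that zero-extending twice gives the same value as zero-extending once to the final width, can be checked exhaustively for a byte input in C. The function names are hypothetical, and uint32_t merely stands in for the 20-bit PSI mode, which has no direct C equivalent.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extend (zero_extend (x)): QI to HI, then HI to a wider mode.  */
uint32_t extend_twice(uint8_t x)
{
    uint16_t mid = x;
    return (uint32_t)mid;
}

/* The canonical single zero_extend straight to the wider mode.  */
uint32_t extend_once(uint8_t x)
{
    return (uint32_t)x;
}

/* Every possible byte value must give the same result both ways.  */
void check_all_bytes(void)
{
    for (unsigned v = 0; v <= UINT8_MAX; v++)
        assert(extend_twice((uint8_t)v) == extend_once((uint8_t)v));
}
```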
2023-10-26Pass type of comparison operands instead of comparison result to ↵liuhongt2-2/+2
truth_type_for in build_vec_cmp. gcc/c/ChangeLog: * c-typeck.cc (build_vec_cmp): Pass type of arg0 to truth_type_for. gcc/cp/ChangeLog: * typeck.cc (build_vec_cmp): Pass type of arg0 to truth_type_for.
2023-10-26LoongArch: Enable vcond_mask_mn expanders for SF/DF modes.Jiahao Xu6-14/+316
If the vcond_mask patterns don't support fp modes, the vector FP comparison instructions will not be generated. gcc/ChangeLog: * config/loongarch/lasx.md (vcond_mask_<ILASX:mode><ILASX:mode>): Change to (vcond_mask_<mode><mode256_i>): this. * config/loongarch/lsx.md (vcond_mask_<ILSX:mode><ILSX:mode>): Change to (vcond_mask_<mode><mode_i>): this. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vcond-1.c: New test. * gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: New test. * gcc.target/loongarch/vector/lsx/lsx-vcond-1.c: New test. * gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: New test.
2023-10-26testsuite: Fix _BitInt in gcc.misc-tests/godump-1.cStefan Schulze Frielinghaus2-12/+18
Currently _BitInt is only supported on x86_64 which means that for other targets all tests fail with e.g. gcc.misc-tests/godump-1.c:237:1: sorry, unimplemented: '_BitInt(32)' is not supported on this target 237 | _BitInt(32) b32_v; | ^~~~~~~ Instead of requiring _BitInt support for godump-1.c, move _BitInt tests into godump-2.c such that all other tests in godump-1.c are still executed in case of missing _BitInt support. gcc/testsuite/ChangeLog: * gcc.misc-tests/godump-1.c: Move _BitInt tests into godump-2.c. * gcc.misc-tests/godump-2.c: New test.
2023-10-26More '#ifdef ASM_OUTPUT_DEF' -> 'if (TARGET_SUPPORTS_ALIASES)' etc.Thomas Schwinge3-29/+39
Per commit a8b522b483ebb8c972ecfde8779a7a6ec16aecd6 (Subversion r251048) "Introduce TARGET_SUPPORTS_ALIASES", there is the idea that a back end may or may not provide symbol aliasing support ('TARGET_SUPPORTS_ALIASES') independent of '#ifdef ASM_OUTPUT_DEF', and in particular, depending not just on static but instead on dynamic (run-time) configuration. There did remain a few instances where we currently still assume that from '#ifdef ASM_OUTPUT_DEF' follows 'TARGET_SUPPORTS_ALIASES'. Change these to 'if (TARGET_SUPPORTS_ALIASES)', similarly, or 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'. gcc/ * ipa-icf.cc (sem_item::target_supports_symbol_aliases_p): 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);' before 'return true;'. * ipa-visibility.cc (function_and_variable_visibility): Change '#ifdef ASM_OUTPUT_DEF' to 'if (TARGET_SUPPORTS_ALIASES)'. * varasm.cc (output_constant_pool_contents) [#ifdef ASM_OUTPUT_DEF]: 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'. (do_assemble_alias) [#ifdef ASM_OUTPUT_DEF]: 'if (!TARGET_SUPPORTS_ALIASES)', 'gcc_checking_assert (seen_error ());'. (assemble_alias): Change '#if !defined (ASM_OUTPUT_DEF)' to 'if (!TARGET_SUPPORTS_ALIASES)'. (default_asm_output_anchor): 'gcc_checking_assert (TARGET_SUPPORTS_ALIASES);'.
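The difference between the two kinds of check can be sketched in C. All names below are hypothetical stand-ins: ASM_OUTPUT_DEF_STANDIN plays the role of the build-time macro, and the target_supports_aliases variable plays the role of a condition that may depend on run-time configuration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a macro that is fixed when the compiler is built.  */
#define ASM_OUTPUT_DEF_STANDIN 1

/* Stand-in for a property that can change with run-time options.  */
static bool target_supports_aliases = true;

void set_target_supports_aliases(bool value)
{
    target_supports_aliases = value;
}

/* Old style: decided once by the preprocessor.  */
bool can_emit_alias_old(void)
{
#ifdef ASM_OUTPUT_DEF_STANDIN
    return true;
#else
    return false;
#endif
}

/* New style: consult the run-time condition on every call.  */
bool can_emit_alias_new(void)
{
    return target_supports_aliases;
}

/* Where aliases are assumed, state the assumption as a checked assert.  */
bool emit_alias_checked(void)
{
    assert(target_supports_aliases);
    return true;
}
```

After set_target_supports_aliases(false), the old check still answers true while the new one answers false, which is the mismatch the commit removes.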
2023-10-26set hardcmp eh probsAlexandre Oliva2-1/+28
Set execution count of EH blocks, and probability of EH edges. for gcc/ChangeLog PR tree-optimization/111520 * gimple-harden-conditionals.cc (pass_harden_compares::execute): Set EH edge probability and EH block execution count. for gcc/testsuite/ChangeLog PR tree-optimization/111520 * g++.dg/torture/harden-comp-pr111520.cc: New.
2023-10-26rename make_eh_edges to make_eh_edgeAlexandre Oliva6-9/+9
Since make_eh_edges creates at most one edge, rename it to make_eh_edge. for gcc/ChangeLog * tree-eh.h (make_eh_edges): Rename to... (make_eh_edge): ... this. * tree-eh.cc: Likewise. Adjust all callers... * gimple-harden-conditionals.cc: ... here, ... * gimple-harden-control-flow.cc: ... here, ... * tree-cfg.cc: ... here, ... * tree-inline.cc: ... and here.
2023-10-26Daily bump.GCC Administrator12-1/+415
2023-10-25Darwin: Handle the fPIE option specially.Iain Sandoe1-2/+13
For Darwin, PIE requires PIC codegen, but otherwise is only a link-time change. For almost all Darwin, we do not report __PIE__; the exception is 32bit X86 and from Darwin12 to 17 only (32 bit is no longer supported after Darwin17). gcc/ChangeLog: * config/darwin.cc (darwin_override_options): Handle fPIE. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>