path: root/gcc
Age | Commit message | Author | Files | Lines
2025-08-13 | Mark epiphany and rl78 as obsolete targets | Andrew Pinski | 1 | -1/+1
rl78 still uses reload rather than LRA. epiphany still uses reload and causes ICEs during reload. Neither has a maintainer. epiphany has been without one since 2024 (2023 email) while rl78 has been without one since 2018. gcc/ChangeLog: * config.gcc: Mark epiphany*-*-* and rl78*-*-* as obsolete targets. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-13 | x86-64: Remove redundant TLS calls | H.J. Lu | 24 | -154/+1153
For TLS calls: 1. UNSPEC_TLS_GD: (parallel [ (set (reg:DI 0 ax) (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) (const_int 0 [0]))) (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) (reg/f:DI 7 sp)] UNSPEC_TLS_GD) (clobber (reg:DI 5 di))]) 2. UNSPEC_TLS_LD_BASE: (parallel [ (set (reg:DI 0 ax) (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr"))) (const_int 0 [0]))) (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)]) 3. UNSPEC_TLSDESC: (parallel [ (set (reg/f:DI 104) (plus:DI (unspec:DI [ (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10]) (reg:DI 114) (reg/f:DI 7 sp)] UNSPEC_TLSDESC) (const:DI (unspec:DI [ (symbol_ref:DI ("e") [flags 0x1a]) ] UNSPEC_DTPOFF)))) (clobber (reg:CC 17 flags))]) (parallel [ (set (reg:DI 101) (unspec:DI [(symbol_ref:DI ("e") [flags 0x50]) (reg:DI 112) (reg/f:DI 7 sp)] UNSPEC_TLSDESC)) (clobber (reg:CC 17 flags))]) they return the same value for the same input value. But multiple calls with the same input value may be generated for simple programs like: void a(long *); int b(void); void c(void); static __thread long e; long d(void) { a(&e); if (b()) c(); return e; } When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following codes are generated: .type d, @function d: .LFB0: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 leaq e@TLSDESC(%rip), %rbx movq %rbx, %rax call *e@TLSCALL(%rax) addq %fs:0, %rax movq %rax, %rdi call a@PLT call b@PLT testl %eax, %eax jne .L8 movq %rbx, %rax call *e@TLSCALL(%rax) popq %rbx .cfi_remember_state .cfi_def_cfa_offset 8 movq %fs:(%rax), %rax ret .p2align 4,,10 .p2align 3 .L8: .cfi_restore_state call c@PLT movq %rbx, %rax call *e@TLSCALL(%rax) popq %rbx .cfi_def_cfa_offset 8 movq %fs:(%rax), %rax ret .cfi_endproc There are 3 "call *e@TLSCALL(%rax)". They all return the same value. Rename the remove_redundant_vector pass to the x86_cse pass, for 64bit, extend it to also remove redundant TLS calls to generate: d: .LFB0: .cfi_startproc pushq %rbx .cfi_def_cfa_offset 16 .cfi_offset 3, -16 leaq e@TLSDESC(%rip), %rax movq %fs:0, %rdi call *e@TLSCALL(%rax) addq %rax, %rdi movq %rax, %rbx call a@PLT call b@PLT testl %eax, %eax jne .L8 movq %fs:(%rbx), %rax popq %rbx .cfi_remember_state .cfi_def_cfa_offset 8 ret .p2align 4,,10 .p2align 3 .L8: .cfi_restore_state call c@PLT movq %fs:(%rbx), %rax popq %rbx .cfi_def_cfa_offset 8 ret .cfi_endproc with only one "call *e@TLSCALL(%rax)". This reduces the number of __tls_get_addr calls in libgcc.a by 72%: __tls_get_addr calls before after libgcc.a 868 243 gcc/ PR target/81501 * config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD, X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC. (redundant_load): Renamed to ... (redundant_pattern): This. (ix86_place_single_vector_set): Replace redundant_load with redundant_pattern. (replace_tls_call): New. (ix86_place_single_tls_call): Likewise. (pass_remove_redundant_vector_load): Renamed to ... (pass_x86_cse): This. Add val, def_insn, mode, scalar_mode, kind, x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and candidate_vector_p. (pass_x86_cse::candidate_gnu_tls_p): New. (pass_x86_cse::candidate_gnu2_tls_p): Likewise. (pass_x86_cse::candidate_vector_p): Likewise. (remove_redundant_vector_load): Renamed to ... (pass_x86_cse::x86_cse): This. Extend to remove redundant TLS calls. (make_pass_remove_redundant_vector_load): Renamed to ... (make_pass_x86_cse): This. * config/i386/i386-passes.def: Replace pass_remove_redundant_vector_load with pass_x86_cse. * config/i386/i386-protos.h (ix86_tls_get_addr): New. 
(make_pass_remove_redundant_vector_load): Renamed to ... (make_pass_x86_cse): This. * config/i386/i386.cc (ix86_tls_get_addr): Remove static. * config/i386/i386.h (machine_function): Add tls_descriptor_call_multiple_p. * config/i386/i386.md (tls64): New attribute. (@tls_global_dynamic_64_<mode>): Set tls_descriptor_call_multiple_p. (@tls_local_dynamic_base_64_<mode>): Likewise. (@tls_dynamic_gnu2_64_<mode>): Likewise. (*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd. (*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to ld_base. (*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea. (*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to call. (*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to combine. gcc/testsuite/ PR target/81501 * g++.target/i386/pr81501-1.C: New test. * gcc.target/i386/pr81501-1a.c: Likewise. * gcc.target/i386/pr81501-1b.c: Likewise. * gcc.target/i386/pr81501-2a.c: Likewise. * gcc.target/i386/pr81501-2b.c: Likewise. * gcc.target/i386/pr81501-3.c: Likewise. * gcc.target/i386/pr81501-4a.c: Likewise. * gcc.target/i386/pr81501-4b.c: Likewise. * gcc.target/i386/pr81501-5.c: Likewise. * gcc.target/i386/pr81501-6a.c: Likewise. * gcc.target/i386/pr81501-6b.c: Likewise. * gcc.target/i386/pr81501-7.c: Likewise. * gcc.target/i386/pr81501-8a.c: Likewise. * gcc.target/i386/pr81501-8b.c: Likewise. * gcc.target/i386/pr81501-9a.c: Likewise. * gcc.target/i386/pr81501-9b.c: Likewise. * gcc.target/i386/pr81501-10a.c: Likewise. * gcc.target/i386/pr81501-10b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-08-13 | Darwin: Handle linker '-no_deduplicate' option. | Iain Sandoe | 4 | -7/+60
Newer linkers support an option to disable deduplication of entities. This speeds up linking and can improve the debugging experience. We adopt the same criteria as clang for adding the option. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/ChangeLog: * config.in: Regenerate. * config/darwin.h (DARWIN_LD_NO_DEDUPLICATE): New. (LINK_SPEC): Handle -no_deduplicate. * configure: Regenerate. * configure.ac: Detect linker support for -no_deduplicate.
2025-08-13 | Darwin: Handle string constants specially when asan is enabled. | Iain Sandoe | 4 | -14/+48
The Darwin ABI uses a different section for string constants when address sanitizing is enabled. This adds definitions of the asan-specific sections and switches string constants to the correct section. It also makes the string constant symbols linker-visible when asan is enabled, but not otherwise. gcc/ChangeLog: * config/darwin-sections.def (asan_string_section, asan_globals_section, asan_liveness_section): New. * config/darwin.cc (objc_method_decl): Use asan sections when asan is enabled. (darwin_encode_section_info): Alter string constant linker visibility depending on asan. (machopic_select_section): Use the asan sections when asan is enabled. gcc/testsuite/ChangeLog: * gcc.dg/torture/darwin-cfstring-3.c: Adjust for amended string labels. * g++.dg/torture/darwin-cfstring-3.C: Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-08-13 | [RISC-V][PR target/121160] Avoid bogus force_reg call | Jeff Law | 2 | -2/+62
When we canonicalize the comparison for a czero sequence we need to handle both integer and fp comparisons. Furthermore, within the integer space we want to make sure we promote any sub-word objects to a full word. All that is working fine. After promotion we then force the value into a register if it is not a register or constant already. The idea is not to have to special case subregs in subsequent code. This works fine except when we're presented with a floating point object that would be a subword, e.g. (subreg:SF (reg:SI)) on rv64. So this tightens up that force_reg step. Bootstrapped and regression tested on riscv64-linux-gnu and tested on riscv32-elf and riscv64-elf. Pushing to the trunk after pre-commit verifies no regressions. Jeff PR target/121160 gcc/ * config/riscv/riscv.cc (canonicalize_comparands): Tighten check for forcing value into a GPR. gcc/testsuite/ * gcc.target/riscv/pr121160.c: New test.
2025-08-13 | forwprop: Move check of limit first [PR121474] | Andrew Pinski | 1 | -3/+3
This is the first step in handling the review part of: https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692091.html ''' Oh, as we now do alias walks in forwprop maybe we should make this conditional and do this not for all pass instances, since it makes forwprop possibly a lot slower? ''' The check of the limit was after the alias check, which could slow things down. This moves the check of the limit to the beginning of the if. Bootstrapped and tested on x86_64-linux-gnu. Pushed as obvious. PR tree-optimization/121474 gcc/ChangeLog: * tree-ssa-forwprop.cc (optimize_aggr_zeroprop): Move the check for limit before the alias check. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-08-13 | cobol: Implement and use faster __gg__packed_to_binary() routine. | Robert Dubner | 1 | -42/+18
The new routine uses table lookups more effectively, and avoids __int128 arithmetic until necessary. gcc/cobol/ChangeLog: * genutil.cc (get_binary_value): Use the new routine. libgcobol/ChangeLog: * libgcobol.cc (get_binary_value_local): Use the new routine. * stringbin.cc (int_from_string): Removed. (__gg__packed_to_binary): Implement new routine. * stringbin.h (__gg__packed_to_binary): Likewise.
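A minimal C sketch of the idea behind the new routine (illustrative only, not the libgcobol implementation; the table name and the 10^17 cutoff are assumptions): decode each packed-decimal byte, i.e. two BCD digits, through a lookup table and stay in 64-bit arithmetic until appending another pair of digits could overflow, only then widening to __int128.

#include <stdint.h>
#include <stddef.h>

/* Value of one packed-decimal byte (two BCD digits); illustrative only.  */
static uint8_t packed_pair[256];

void
init_packed_pair (void)
{
  for (int hi = 0; hi < 10; hi++)
    for (int lo = 0; lo < 10; lo++)
      packed_pair[(hi << 4) | lo] = (uint8_t) (hi * 10 + lo);
}

/* Sign-nibble handling is omitted for brevity.  */
__int128
packed_to_binary_sketch (const uint8_t *p, size_t nbytes)
{
  uint64_t acc64 = 0;
  size_t i = 0;

  /* Below 10^17 we can still append two more digits without
     overflowing 64 bits.  */
  while (i < nbytes && acc64 < UINT64_C (100000000000000000))
    acc64 = acc64 * 100 + packed_pair[p[i++]];

  __int128 acc = acc64;          /* Widen only when we must.  */
  while (i < nbytes)
    acc = acc * 100 + packed_pair[p[i++]];
  return acc;
}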
2025-08-13 | c++: fix typo in comment | Benjamin Wu | 1 | -1/+1
gcc/cp/ChangeLog: * lex.cc (init_operators): Fix typo.
2025-08-13 | Introduce SLP_TREE_PERMUTE_P | Richard Biener | 4 | -35/+32
The following wraps SLP_TREE_CODE checks against VEC_PERM_EXPR (the only relevant code) in a new SLP_TREE_PERMUTE_P predicate. Most places guard against SLP_TREE_REPRESENTATIVE being NULL. * tree-vectorizer.h (SLP_TREE_PERMUTE_P): New. * tree-vect-slp-patterns.cc (linear_loads_p): Adjust. (vect_detect_pair_op): Likewise. (addsub_pattern::recognize): Likewise. * tree-vect-slp.cc (vect_print_slp_tree): Likewise. (vect_gather_slp_loads): Likewise. (vect_is_slp_load_node): Likewise. (optimize_load_redistribution_1): Likewise. (vect_optimize_slp_pass::is_cfg_latch_edge): Likewise. (vect_optimize_slp_pass::internal_node_cost): Likewise. (vect_optimize_slp_pass::start_choosing_layouts): Likewise. (vect_optimize_slp_pass::backward_cost): Likewise. (vect_optimize_slp_pass::forward_pass): Likewise. (vect_optimize_slp_pass::get_result_with_layout): Likewise. (vect_optimize_slp_pass::materialize): Likewise. (vect_optimize_slp_pass::dump): Likewise. (vect_optimize_slp_pass::decide_masked_load_lanes): Likewise. (vect_update_slp_vf_for_node): Likewise. (vect_slp_analyze_node_operations_1): Likewise. (vect_schedule_slp_node): Likewise. (vect_schedule_scc): Likewise. * tree-vect-stmts.cc (vect_analyze_stmt): Likewise. (vect_transform_stmt): Likewise. (vect_is_simple_use): Likewise.
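The predicate is presumably just a thin wrapper around the existing code check; a self-contained sketch of the shape it likely takes, with GCC's slp_tree type and tree codes replaced by local stand-ins:

/* Stand-ins for GCC's types, for illustration only.  */
enum tree_code { VEC_PERM_EXPR, OTHER_EXPR };
struct slp_tree_node { enum tree_code code; };
typedef struct slp_tree_node *slp_tree;

#define SLP_TREE_CODE(node) ((node)->code)
/* The new predicate: true iff NODE is a permute node.  */
#define SLP_TREE_PERMUTE_P(node) (SLP_TREE_CODE (node) == VEC_PERM_EXPR)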
2025-08-13 | Remove use of STMT_VINFO_DEF_TYPE in vect_analyze_stmt | Richard Biener | 1 | -1/+1
This removes a use of STMT_VINFO_DEF_TYPE. * tree-vect-stmts.cc (vect_analyze_stmt): Use SLP_TREE_DEF_TYPE instead of STMT_VINFO_DEF_TYPE.
2025-08-13 | Fold GATHER_SCATTER_*_P into vect_memory_access_type | Richard Biener | 5 | -48/+49
The following splits up VMAT_GATHER_SCATTER into VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and VMAT_GATHER_SCATTER_EMULATED. The main motivation is to reduce the uses of (full) gs_info, but it also makes the kind representable by a single entry rather than the ifn and decl tristate. The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN, since that's what we end up checking. * tree-vectorizer.h (vect_memory_access_type): Replace VMAT_GATHER_SCATTER with three separate access types, VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and VMAT_GATHER_SCATTER_EMULATED. (mat_gather_scatter_p): New predicate. (GATHER_SCATTER_LEGACY_P): Remove. (GATHER_SCATTER_IFN_P): Likewise. (GATHER_SCATTER_EMULATED_P): Likewise. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Adjust. (get_load_store_type): Likewise. (vect_get_loop_variant_data_ptr_increment): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Likewise. * config/riscv/riscv-vector-costs.cc (costs::need_additional_vector_vars_p): Likewise. * config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype): Likewise. (aarch64_vector_costs::count_ops): Likewise. (aarch64_vector_costs::add_stmt_cost): Likewise.
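A sketch of how the new mat_gather_scatter_p predicate presumably collapses the three new access kinds back into the old single test; the enum below is a pared-down stand-in for vect_memory_access_type, and only the names taken from the ChangeLog are from the commit itself:

enum vect_memory_access_type_sketch {
  VMAT_CONTIGUOUS,
  VMAT_STRIDED_SLP,
  VMAT_GATHER_SCATTER_LEGACY,
  VMAT_GATHER_SCATTER_IFN,
  VMAT_GATHER_SCATTER_EMULATED
};

/* True for any gather/scatter flavour, replacing the former
   VMAT_GATHER_SCATTER value and the GATHER_SCATTER_*_P macros.  */
static inline int
mat_gather_scatter_p (enum vect_memory_access_type_sketch t)
{
  return t == VMAT_GATHER_SCATTER_LEGACY
         || t == VMAT_GATHER_SCATTER_IFN
         || t == VMAT_GATHER_SCATTER_EMULATED;
}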
2025-08-13 | Simplify vect_supportable_dr_alignment API | Richard Biener | 3 | -4/+3
The gather_scatter_info pointer is only used as flag, so pass down a flag. * tree-vectorizer.h (vect_supportable_dr_alignment): Pass a bool instead of a pointer to gather_scatter_info. * tree-vect-data-refs.cc (vect_supportable_dr_alignment): Likewise. * tree-vect-stmts.cc (get_load_store_type): Adjust.
2025-08-13 | Fortran: Use associated TBP subroutine not found [PR89092] | Paul Thomas | 2 | -1/+50
2025-08-13 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/89092 * resolve.cc (was_declared): Add subroutine attribute. gcc/testsuite/ PR fortran/89092 * gfortran.dg/pr89092.f90: New test.
2025-08-13 | LoongArch: Define hook TARGET_COMPUTE_PRESSURE_CLASSES [PR120476]. | Lulu Cheng | 1 | -0/+15
The rtx cost value defined by the target backend affects the calculation of register pressure classes in the IRA, thus affecting scheduling. This may cause program performance degradation. For example, OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r. This problem can be avoided by defining a set of register pressure classes in the target backend instead of using the default IRA to automatically calculate them. gcc/ChangeLog: PR target/120476 * config/loongarch/loongarch.cc (loongarch_compute_pressure_classes): New function. (TARGET_COMPUTE_PRESSURE_CLASSES): Define.
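For reference, the hook fills an array with the chosen pressure classes and returns how many it stored. A hedged sketch of what such an implementation typically looks like; the particular classes below (general and floating-point registers) are an assumption for illustration, not necessarily the actual LoongArch choice, and the fragment assumes GCC's target headers for reg_class, GENERAL_REGS and FP_REGS:

/* Sketch of a TARGET_COMPUTE_PRESSURE_CLASSES hook; fragment meant for
   a GCC target file such as loongarch.cc.  */
static int
compute_pressure_classes_sketch (enum reg_class *classes)
{
  int n = 0;
  classes[n++] = GENERAL_REGS;   /* assumed pressure class */
  classes[n++] = FP_REGS;        /* assumed pressure class */
  return n;                      /* number of classes recorded */
}

#undef TARGET_COMPUTE_PRESSURE_CLASSES
#define TARGET_COMPUTE_PRESSURE_CLASSES compute_pressure_classes_sketch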
2025-08-13 | LoongArch: Add support for _BitInt [PR117599] | Yang Yujie | 5 | -3/+235
This patch adds support for C23's _BitInt for LoongArch. From the LoongArch psABI[1]: > _BitInt(N) objects are stored in little-endian order in memory > and are signed by default. > > For N ≤ 64, a _BitInt(N) object have the same size and alignment > of the smallest fundamental integral type that can contain it. > The unused high-order bits within this containing type are filled > with sign or zero extension of the N-bit value, depending on whether > the _BitInt(N) object is signed or unsigned. The _BitInt(N) object > propagates its signedness to the containing type and is laid out > in a register or memory as an object of this type. > > For N > 64, _BitInt(N) objects are implemented as structs of 64-bit > integer chunks. The number of chunks is the smallest even integer M > so that M * 64 ≥ N. These objects are of the same size of the struct > containing the chunks, but always have 16-byte alignment. If there > are unused bits in the highest-ordered chunk that contains used > bits, they are defined as the sign- or zero- extension of the used > bits depending on whether the _BitInt(N) object is signed or > unsigned. If an entire chunk is unused, its bits are undefined. [1] https://github.com/loongson/la-abi-specs PR target/117599 gcc/ChangeLog: * config/loongarch/loongarch.h: Define a PROMOTE_MODE case for small _BitInts. * config/loongarch/loongarch.cc (loongarch_promote_function_mode): Same. (loongarch_bitint_type_info): New function. (TARGET_C_BITINT_TYPE_INFO): Declare. libgcc/ChangeLog: * config/loongarch/t-softfp-tf: Enable _BitInt helper functions. * config/loongarch/t-loongarch: Same. * config/loongarch/libgcc-loongarch.ver: New file. gcc/testsuite/ChangeLog: * gcc.target/loongarch/bitint-alignments.c: New test. * gcc.target/loongarch/bitint-args.c: New test. * gcc.target/loongarch/bitint-sizes.c: New test.
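The quoted layout rules can be written down as a small C23 check; the expected numbers below are derived from the psABI text above (for a 64-bit LoongArch target), not from running the compiler:

/* For N <= 64: same size/alignment as the smallest containing integer type.  */
_Static_assert (sizeof (_BitInt (13)) == 2, "contained in a 16-bit type");
_Static_assert (_Alignof (_BitInt (13)) == 2, "aligned like that type");

/* For N > 64: an even number of 64-bit chunks, always 16-byte aligned.  */
_Static_assert (sizeof (_BitInt (100)) == 16, "two 64-bit chunks");
_Static_assert (_Alignof (_BitInt (100)) == 16, "16-byte alignment");
_Static_assert (sizeof (_BitInt (130)) == 32, "smallest even M with 64*M >= 130 is 4");

int
main (void)
{
  _BitInt (13) x = 1000;
  return x == 1000 ? 0 : 1;
}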
2025-08-12 | [RISC-V][PR target/121113] Handle HFmode in various insn reservations | Jeff Law | 4 | -4/+15
So this is a minor bug in a few DFA descriptions such as the Xiangshan and a couple of the SiFive descriptions. While Xiangshan covers every insn type, some of the reservations check the mode of the operation. Concretely the fdiv/fsqrt unit reservations vary based on the mode. They handled DF/SF, but not HF (the relevant iterators don't include BF). This patch just adds HF support with the same characteristics as SF. Those who know these designs better could perhaps improve the reservation, but this at least keeps us from aborting. I did check the other published DFAs for mode dependent reservations. That's how I found the p400/p600 issue. Tested in my tester, waiting for CI to render its verdict before pushing. PR target/121113 gcc/ * config/riscv/sifive-p400.md: Handle HFmode for fdiv/fsqrt. * config/riscv/sifive-p600.md: Likewise. * config/riscv/xiangshan.md: Likewise. gcc/testsuite/ * gcc.target/riscv/pr121113.c: New test.
2025-08-12 | cobol: Implement faster zoned decimal to binary conversion. | Robert Dubner | 1 | -205/+51
Replace " value *= 10; value += digit" routines with a new one that does two digits at a time and avoids __int128 calculations until they are necessary. These changes also clean up the conversion behavior when a digit is not valid. gcc/cobol/ChangeLog: * genutil.cc (get_binary_value): Use the new routine. libgcobol/ChangeLog: * libgcobol.cc (int128_to_field): Use the new routine. (get_binary_value_local): Use the new routine. (format_for_display_internal): Formatting. (__gg__get_file_descriptor): Likewise. * stringbin.cc (string_from_combined): Formatting. (packed_from_combined): Likewise. (int_from_string): New routine. (__gg__numeric_display_to_binary): Likewise. * stringbin.h (__gg__numeric_display_to_binary): Likewise.
2025-08-12 | testsuite: fix jit.dg/test-error-impossible-must-tail-call.c [PR119783] | David Malcolm | 1 | -2/+4
I added this test back in r7-934-g15c671a79ca66d, but it looks like r15-2125-g81824596361cf4 changed the error message. gcc/testsuite/ChangeLog: PR testsuite/119783 jit.dg/test-error-impossible-must-tail-call.c * jit.dg/test-error-impossible-must-tail-call.c (verify_code): Check that we get a suitable-looking error message, but don't try to specify exactly what the message is. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-08-12 | jit: don't use &vect[0] in libgccjit++.h [PR121516] | David Malcolm | 1 | -9/+9
gcc/jit/ChangeLog: PR jit/121516 * libgccjit++.h (context::new_struct_type): Replace use of &fields[0] with fields.data (). (context::new_function): Likewise for params. (context::new_rvalue): Likewise for elements. (context::new_call): Likewise for args. (block::end_with_switch): Likewise for cases. (block::end_with_extended_asm_goto): Likewise for goto_blocks. (context::new_struct_ctor): Likewise for fields and values. (context::new_array_ctor): Likewise for values. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-08-12 | x86: Convert integer constant to mode of move | H.J. Lu | 2 | -0/+20
For (set (reg/v:DI 106 [ k ]) (const_int 3000000000 [0xb2d05e00])) ... (set (reg:V4SI 115 [ _13 ]) (vec_duplicate:V4SI (subreg:SI (reg/v:DI 106 [ k ]) 0))) ... (set (reg:V2SI 118 [ _9 ]) (vec_duplicate:V2SI (subreg:SI (reg/v:DI 106 [ k ]) 0))) we should generate (set (reg:SI 125) (const_int -1294967296 [0xffffffffb2d05e00])) (set (reg:V4SI 124) (vec_duplicate:V4SI (reg:VSI 125)) ... (set (reg:V4SI 115 [ _13 ]) (reg:V4SI 124) ... (set (reg:V2SI 118 [ _9 ]) (subreg:V2SI (reg:V4SI 124)) by converting integer constant to mode of move. gcc/ PR target/121497 * config/i386/i386-features.cc (ix86_broadcast_inner): Convert integer constant to mode of move gcc/testsuite/ PR target/121497 * gcc.target/i386/pr121497.c: New test. Co-authored-by: Liu, Hongtao <hongtao.liu@intel.com> Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
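At the value level the conversion is just a truncation of the DImode constant to the SImode of the broadcast; a small C check of the number shown in the RTL above (the final signed reinterpretation relies on GCC's documented modulo-2^N conversion behaviour):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  uint64_t k = 3000000000;              /* the DImode const_int 0xb2d05e00 */
  int32_t lo = (int32_t) (uint32_t) k;  /* SImode view of the low 32 bits  */
  printf ("%d\n", lo);                  /* prints -1294967296, the new const_int */
  return 0;
}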
2025-08-13 | Daily bump. | GCC Administrator | 5 | -1/+178
2025-08-13 | RISC-V: RISC-V: Add test for vec_duplicate + vmerge.vvm combine with GR2VR cost 0, 1 and 15 | Pan Li | 18 | -0/+398
Add asm dump check and run test for vec_duplicate + vmerge.vvm combine to vmerge.vxm, with GR2VR costs of 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test data for run test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-1-i8.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-2-i8.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx-merge-3-i8.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i64.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmerge-run-1-i8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-13 | RISC-V: Combine vec_duplicate + vmerge.vv to vmerge.vx on GR2VR cost | Pan Li | 1 | -0/+18
This patch would like to combine the vec_duplicate + vmerge.vvm to the vmerge.vxm. See the example code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, with GR2VR cost 0. #define DEF_VX_MERGE_0(T) \ void \ test_vx_merge_##T##_case_0 (T * restrict out, T * restrict in, \ T x, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ { \ if (i % 2 == 0) \ out[i] = x; \ else \ out[i] = in[i]; \ } \ } DEF_VX_MERGE_0(int32_t) Before this patch: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 ... 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma ... 22 │ vmerge.vvm v1,v1,v2,v0 ... 25 │ bne a3,zero,.L3 After this patch: 11 │ beq a3,zero,.L8 ... 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma ... 20 │ vmerge.vxm v1,v1,a2,v0 ... 23 │ bne a3,zero,.L3 gcc/ChangeLog: * config/riscv/autovec-opt.md (*merge_vx_<mode>): Add new pattern to combine the vmerge.vxm. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-08-12 | RISC-V: Expand const_vector with 2 elts per pattern. | Robin Dapp | 2 | -17/+116
Hi, In PR121334 we are asked to expand a const_vector of size 4 with poly_int elements. It has 2 elts per pattern so is neither a const_vector_duplicate nor a const_vector_stepped. We don't allow this kind of constant in legitimate_constant_p but expr apparently still wants us to expand it under certain conditions. This patch implements a basic expander for such kinds of patterns. As slide1up is used to build the individual vectors it also adds a helper function expand_slide1up. I regtested on rv64gcv_zvl512b but unfortunately the newly created pattern is not even executed. I tried some variations of the original code but didn't manage to trigger it. Regards Robin PR target/121334 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_slide1up): New function. (expand_vector_init_trailing_same_elem): Use new function. (expand_const_vector_onestep): New function. (expand_const_vector): Use expand_slide1up. (expand_vector_init_merge_repeating_sequence): Ditto. (shuffle_off_by_one_patterns): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr121334.c: New test.
2025-08-12 | LoongArch: macros instead of enum for base ABI type | mengqinggang | 1 | -6/+4
enum values can't be used in #if: in a #if expression, identifiers that are not macros are all treated as the number zero. This patch may fix https://sourceware.org/bugzilla/show_bug.cgi?id=32776. gcc/ChangeLog: * config/loongarch/loongarch-def.h (ABI_BASE_LP64D): New macro. (ABI_BASE_LP64F): New macro. (ABI_BASE_LP64S): New macro. (N_ABI_BASE_TYPES): New macro.
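A small self-contained C example of the pitfall being fixed: in a #if expression any identifier that is not a macro is replaced by 0, so enum values silently compare equal to zero, while macros keep their values. The macro names come from the ChangeLog; the 0/1 values are only illustrative.

/* Enum values are invisible to the preprocessor ...  */
enum { ABI_BASE_LP64D_ENUM = 0, ABI_BASE_LP64F_ENUM = 1 };

#if ABI_BASE_LP64F_ENUM == 0      /* true: unknown identifier -> 0 */
#define WRONGLY_SELECTED 1
#else
#define WRONGLY_SELECTED 0
#endif

/* ... but macros are not.  */
#define ABI_BASE_LP64D 0
#define ABI_BASE_LP64F 1

#if ABI_BASE_LP64F == 0           /* correctly false now */
#error "not reached"
#endif

int main (void) { return WRONGLY_SELECTED; }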
2025-08-12 | Cleanup SLP decision during loop analysis | Richard Biener | 1 | -18/+19
The following refactors the now misleading slp_done_for_suggested_uf and slp states kept during vectorizer loop analysis. * tree-vect-loop.cc (vect_analyze_loop_2): Change slp_done_for_suggested_uf to a boolean single_lane_slp_done_for_suggested_uf. Change slp to force_single_lane boolean. (vect_analyze_loop_1): Adjust similarly.
2025-08-12 | fwprop: Don't propagate asms [PR121253] | Richard Sandiford | 2 | -0/+30
For the reasons explained in the comment, fwprop shouldn't even try to propagate an asm definition. gcc/ PR rtl-optimization/121253 * fwprop.cc (forward_propagate_into): Don't propagate asm defs. gcc/testsuite/ PR rtl-optimization/121253 * gcc.target/aarch64/pr121253.c: New test.
2025-08-12 | tree-optimization/121509 - failure to detect unvectorizable loop | Richard Biener | 2 | -1/+48
With the hybrid stmt detection no longer working as a gate-keeper to detect unhandled stmts we have to, and can, detect those earlier. The appropriate place is vect_mark_stmts_to_be_vectorized where for trivially relevant PHIs we can stop analyzing when the PHI wasn't classified as a known def during vect_analyze_scalar_cycles. PR tree-optimization/121509 * tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Fail early when we detect a relevant but not handled PHI. * gcc.dg/vect/pr121509.c: New testcase.
2025-08-12 | tree-optimization/121514 - ICE with recent VN improvement | Richard Biener | 2 | -4/+28
When inserting a compensation stmt during VN we are making sure to register the result for the original stmt into the hashtable so VN iteration has the chance to converge and we avoid inserting another copy each time. But the implementation doesn't work for non-SSA name values, and is also not necessary for constants since we did not insert anything for them. The following appropriately guards the calls to vn_nary_op_insert_stmt as was already done in one place. PR tree-optimization/121514 * tree-ssa-sccvn.cc (visit_nary_op): Only call vn_nary_op_insert_stmt for SSA name result. * gcc.dg/torture/pr121514.c: New testcase.
2025-08-12 | forwprop: Fix non-call exceptions some more with copy prop for aggregates [PR121494] | Andrew Pinski | 1 | -0/+5
Note this conflicts with my not yet approved patch for copy prop for aggregates into function arguments (I will get back to that soon). So the problem here is that I assumed that if *a = decl1; would not cause an exception, then decl2 = *a; would not cause one either. I was wrong: in some cases the Ada front-end marks `*a` in the store as TREE_THIS_NOTRAP (due to it being known never to be null, or other cases). So that means when we prop decl1 into the statement storing decl2, we need to mark that statement as possibly needing eh cleanup. Bootstrapped and tested on x86_64-linux-gnu. Also tested on x86_64-linux-gnu with a hack to force generating LC constant decls in the gimplifier. PR tree-optimization/121494 gcc/ChangeLog: * tree-ssa-forwprop.cc (optimize_agr_copyprop): Mark the bb of the use stmt if needed for eh cleanup. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
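A C-level illustration of the situation described above (a hypothetical example with -fnon-call-exceptions in mind; the Ada case is analogous): the store and the load through the same pointer can carry different trap information, so after the copy is propagated the statement can no longer throw and its block needs EH cleanup.

/* Compile with -fnon-call-exceptions.  */
struct big { long x[8]; };

struct big decl1, decl2;

void
copy_through (struct big *a)
{
  *a = decl1;     /* store: may be marked TREE_THIS_NOTRAP          */
  decl2 = *a;     /* load: may still be considered as able to throw */
  /* After copy propagation this becomes  decl2 = decl1;  which cannot
     throw, so the EH edges of this block have to be cleaned up.  */
}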
2025-08-12 | Do not set STMT_VINFO_VECTYPE for non-dataref stmts | Richard Biener | 1 | -9/+9
Now that all STMT_VINFO_VECTYPE uses from vectorizable_* have been purged there's no longer a need to have STMT_VINFO_VECTYPE set. We still rely on it being present on data-ref stmts and there it can differ between different SLP instances when doing BB vectorization. The following removes the setting from vect_analyze_stmt and vect_transform_stmt. Note the following clears STMT_VINFO_VECTYPE from pattern stmts (the vector type should have moved to the SLP tree by this time). * tree-vect-stmts.cc (vect_analyze_stmt): Only set STMT_VINFO_VECTYPE for dataref SLP representatives. Clear it for others and do not restore the original value. (vect_transform_stmt): Likewise.
2025-08-12 | Pass down vector type to avoid STMT_VINFO_VECTYPE on reduc-info | Richard Biener | 1 | -5/+4
The following passes down the vector type to functions instead of querying it from the reduc-info stmt-info. * tree-vect-loop.cc (get_initial_defs_for_reduction): Get vector type as argument. (vect_find_reusable_accumulator): Likewise. (vect_transform_cycle_phi): Adjust.
2025-08-12 | Do not use STMT_VINFO_VECTYPE in vectorizable_reduction | Richard Biener | 1 | -1/+1
There's one use of STMT_VINFO_VECTYPE in vectorizable_reduction where I'm only 99% sure which SLP_TREE_VECTYPE to replace it with (vectorizable_reduction needs a lot of post-only-SLP TLC). The following replaces it with the hopefully appropriate one. * tree-vect-loop.cc (vectorizable_reduction): Replace STMT_VINFO_VECTYPE use with SLP_TREE_VECTYPE.
2025-08-12 | tree-optimization/121493 - another missed VN with aggregate copy | Richard Biener | 2 | -0/+34
This is another case where opportunistically handling a first aggregate copy for which we failed to match up the refs exactly (as we don't insert missing handling components) leads to a failure on the second aggregate copy that we visit. Add another fixup to deal with such situations, in line with the present opportunistic handling. PR tree-optimization/121493 * tree-ssa-sccvn.cc (vn_reference_lookup_3): Opportunistically strip components with known offset. * gcc.dg/tree-ssa/ssa-fre-109.c: New testcase.
2025-08-12 | Restrict aggregate copy VN generalization | Richard Biener | 1 | -0/+5
The following avoids ending up with a MEM_REF as component to apply. * tree-ssa-sccvn.cc (vn_reference_lookup_3): When we fail to match up the two base MEM_REFs, fail.
2025-08-12 | fortran: add optional lower arg to c_f_pointer | Yuao Ma | 8 | -43/+196
This patch adds support for the optional lower argument of the intrinsic c_f_pointer, as specified in Fortran 2023. Test cases and documentation have also been updated. gcc/fortran/ChangeLog: * check.cc (gfc_check_c_f_pointer): Check lower arg legitimacy. * intrinsic.cc (add_subroutines): Teach c_f_pointer about lower arg. * intrinsic.h (gfc_check_c_f_pointer): Add lower arg. * intrinsic.texi: Update lower arg for c_f_pointer. * trans-intrinsic.cc (conv_isocbinding_subroutine): Add logic to handle lower. gcc/testsuite/ChangeLog: * gfortran.dg/c_f_pointer_shape_tests_7.f90: New test. * gfortran.dg/c_f_pointer_shape_tests_8.f90: New test. * gfortran.dg/c_f_pointer_shape_tests_9.f90: New test. Signed-off-by: Yuao Ma <c8ef@outlook.com>
2025-08-11 | Improve initial code generation for addsi/adddi | Shreya Munnangi | 4 | -4/+159
This is a patch primarily from Shreya, though I think she cribbed some code from Philipp that we had internally within Ventana and I made some minor adjustments as well. So the basic idea here is similar to her work on logical ops -- specifically when we can generate more efficient code at expansion time, then do so. In some cases the net is better code; in other cases we lessen reliance on mvconst_internal and finally it provides infrastructure that I think will help address an issue Paul Antoine reported a little while back. The most obvious case is using paired addis from initial code generation for some constants. It will also use a shNadd insn when the cost to synthesize the original value is higher than the right-shifted value. Finally it will negate the constant and use "sub" if the negated constant is cheaper than the original constant. There's more work to do in here, particularly WRT 32 bit objects for rv64. Shreya is looking at that right now. There may also be cases where another shNadd or addi would be profitable. We haven't really explored those cases in any detail, while there may be cases to handle, it's unclear how often they occur in practice. I don't want to remove the define_insn_and_split for the paired addi cases yet. I think that likely happens as a side effect of fixing Paul Antoine's issue. Bootstrapped and regression tested on a BPI & Pioneer box. Will obviously wait for the pre-commit tester before moving forward. Jeff PR target/120603 gcc/ * config/riscv/riscv-protos.h (synthesize_add): Add prototype. * config/riscv/riscv.cc (synthesize_add): New function. * config/riscv/riscv.md (addsi3): Allow any constant as operands[2] in the expander. Force the constant into a register as needed for TARGET_64BIT. Use synthesize_add for !TARGET_64BIT. (*adddi3): Renamed from adddi3. (adddi3): New expander. Use synthesize_add. gcc/testsuite * gcc.target/riscv/add-synthesis-1.c: New test. Co-authored-by: Jeff Law <jlaw@ventanamicro.com> Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
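For illustration, two C functions of the kind the new expander is aimed at; which constants actually get each synthesis is decided by the cost model, so the specific values here are only examples: a constant just outside the signed 12-bit addi range can be split into two addis, and a constant whose negation is cheaper to materialize can become a register subtract.

/* RISC-V addi takes a 12-bit signed immediate (-2048..2047).  */

long
two_addis (long x)
{
  /* 4000 doesn't fit one addi but splits as 2000 + 2000:
       addi a0,a0,2000 ; addi a0,a0,2000   instead of li + add.  */
  return x + 4000;
}

long
negated_is_cheaper (long x)
{
  /* On rv64, -2147483648 is a single lui while +2147483648 needs a
     longer li sequence, so  sub  with the negated constant can win.  */
  return x + 2147483648L;
}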
2025-08-11 | cobol: Bring EBCDIC NumericDisplay variables into IBM compliance. | Robert Dubner | 4 | -349/+224
The internal representation of Numeric Display (ND) zoned decimal variables when operating in EBCDIC mode has been brought into compliance with IBM conventions. This requires changes to data input, data output, internal conversion of zoned decimal to binary, and variable assignment. gcc/cobol/ChangeLog: * genapi.cc (compare_binary_binary): Formatting. (cobol_compare): Formatting. (mh_numeric_display): Rewrite "move ND to ND" algorithm. (initial_from_initial): Proper initialization of EBCDIC ND variables. * genmath.cc (fast_add): Delete comment. * genutil.cc (get_binary_value): Modify for updated EBCDIC. libgcobol/ChangeLog: * common-defs.h (NUMERIC_DISPLAY_SIGN_BIT): New comment; new constant. (EBCDIC_MINUS): New constant. (EBCDIC_PLUS): Likewise. (EBCDIC_ZERO): Likewise. (EBCDIC_NINE): Likewise. (PACKED_NYBBLE_PLUS): Likewise. (PACKED_NYBBLE_MINUS): Likewise. (PACKED_NYBBLE_UNSIGNED): Likewise. (NUMERIC_DISPLAY_SIGN_BIT_ASCII): Likewise. (NUMERIC_DISPLAY_SIGN_BIT_EBCDIC): Likewise. (SEPARATE_PLUS): Likewise. (SEPARATE_MINUS): Likewise. (ZONED_ZERO): Likewise. (ZONE_SIGNED_EBCDIC): Likewise. * configure: Regenerate. * libgcobol.cc (turn_sign_bit_on): Handle new EBCDIC sign convention. (turn_sign_bit_off): Likewise. (is_sign_bit_on): Likewise. (int128_to_field): EBCDIC NumericDisplay conversion. (get_binary_value_local): Likewise. (format_for_display_internal): Likewise. (normalize_id): Likewise. (__gg__inspect_format_1): Convert EBCDIC negative numbers to positive. * stringbin.cc (packed_from_combined): Quell cppcheck warning. gcc/testsuite/ChangeLog: * cobol.dg/group2/ALLOCATE_Rule_8_OPTION_INITIALIZE_with_figconst.out: Change test for updated handling of Numeric Display variables.
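For background, a hypothetical C fragment (not libgcobol code) showing the IBM zoned-decimal convention in EBCDIC that the new constants revolve around: digits are encoded as 0xF0..0xF9 and the sign lives in the zone (high) nibble of the low-order byte, 0xC or 0xF for positive and 0xD for negative, so -123 is stored as F1 F2 D3.

#include <stdint.h>

/* Encode -123 as a three-byte EBCDIC zoned-decimal value.  */
void
encode_zoned_minus_123 (uint8_t out[3])
{
  out[0] = 0xF1;                  /* zone F, digit 1 */
  out[1] = 0xF2;                  /* zone F, digit 2 */
  out[2] = (0xD << 4) | 0x3;      /* sign zone D (negative), digit 3 */
}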
2025-08-12 | Daily bump. | GCC Administrator | 7 | -1/+342
2025-08-11 | aarch64: Fix condition accepted by mov<ALLI>cc | Richard Henderson | 3 | -8/+27
Reject QI/HImode conditions, which would require extension in order to compare. Fixes z.c:10:1: error: unrecognizable insn: 10 | } | ^ (insn 23 22 24 2 (set (reg:CC 66 cc) (compare:CC (reg:HI 128) (reg:HI 127))) "z.c":6:6 -1 (nil)) during RTL pass: vregs gcc: * config/aarch64/aarch64.md (mov<ALLI>cc): Accept MODE_CC conditions directly; reject QI/HImode conditions. gcc/testsuite: * gcc.target/aarch64/cmpbr-3.c: New. * gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: Simplify test for csel by ignoring the actual registers used.
2025-08-11 | aarch64: CMPBR branches must be invertible | Richard Henderson | 7 | -47/+162
Restrict the immediate range to the intersection of LT/GE and GT/LE so that cfglayout can invert the condition to redirect any branch. gcc: PR target/121388 * config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the range of LT/GE and GT/LE to their intersections. * config/aarch64/aarch64.md (*aarch64_cb<INT_CMP><GPI>): Unexport. Use cmpbr_imm_predicate instead of aarch64_cb_rhs. * config/aarch64/constraints.md (Uc1): Accept 0..62. (Uc2): Remove. * config/aarch64/iterators.md (cmpbr_imm_predicate): New. (cmpbr_imm_constraint): Update to match aarch64_cb_rhs. * config/aarch64/predicates.md (aarch64_cb_reg_i63_operand): New. (aarch64_cb_reg_i62_operand): New. gcc/testsuite: PR target/121388 * gcc.target/aarch64/cmpbr.c (u32_x0_ult_64): XFAIL. (i32_x0_slt_64, u64_x0_ult_64, i64_x0_slt_64): XFAIL. * gcc.target/aarch64/cmpbr-2.c: New.
2025-08-11 | aarch64: Consider TARGET_CMPBR in rtx costs | Richard Henderson | 1 | -0/+9
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Use aarch64_cb_rhs to match CB insns.
2025-08-11 | aarch64: Fix gcc.target/aarch64/cmpbr.c enable | Richard Henderson | 1 | -1/+0
gcc/testsuite: * gcc.target/aarch64/cmpbr.c: Only compile, not assemble, since we want to scan the assembly.
2025-08-11 | aarch64: Remove cc clobber from *aarch64_tbz<LTGE><ALLI>1 | Richard Henderson | 2 | -20/+31
There is a conflict between aarch64_tbzltdi1 and aarch64_cbltdi with respect to pnum_clobbers, resulting in a recog failure: 0xa1fffe fancy_abort(char const*, int, char const*) ../../gcc/diagnostics/context.cc:1640 0x81340e patch_jump_insn ../../gcc/cfgrtl.cc:1303 0xc0eafe redirect_branch_edge ../../gcc/cfgrtl.cc:1330 0xc0f372 cfg_layout_redirect_edge_and_branch ../../gcc/cfgrtl.cc:4736 0xbfb6b9 redirect_edge_and_branch(edge_def*, basic_block_def*) ../../gcc/cfghooks.cc:391 0x1fa9310 try_forward_edges ../../gcc/cfgcleanup.cc:561 0x1fa9310 try_optimize_cfg ../../gcc/cfgcleanup.cc:2931 0x1fa9310 cleanup_cfg(int) ../../gcc/cfgcleanup.cc:3143 0x1fe11e8 rest_of_handle_cse ../../gcc/cse.cc:7591 0x1fe11e8 execute ../../gcc/cse.cc:7622 The simplest solution is to remove the clobber from aarch64_tbz. This removes the possibility of expansion via TST+B.cond, which will merely fall back to TBNZ+B on shorter branches. gcc: PR target/121385 * config/aarch64/aarch64.md (*aarch64_tbz<LTGE><ALLI>1): Remove cc clobber and expansion via TST+Bcond. gcc/testsuite: PR target/121385 * gcc.target/aarch64/cmpbr-1.c: New.
2025-08-11 | aarch64: Disable TARGET_CMPBR with aarch64_track_speculation | Richard Henderson | 1 | -2/+3
With -mtrack-speculation, CC_REGNUM must be used at every conditional branch. gcc: * config/aarch64/aarch64.h (TARGET_CMPBR): False when aarch64_track_speculation is true.
2025-08-11 | aarch64: Fix aarch64_split_imm24 patterns | Richard Henderson | 3 | -49/+63
Both patterns used !reload_completed as a condition, which is questionable at best. The branch pattern failed to include a clobber of CC_REGNUM. Both problems were unlikely to trigger in practice, due to how the optimization pipeline is organized, but let's fix them anyway. gcc: * config/aarch64/aarch64.cc (aarch64_gen_compare_split_imm24): New. * config/aarch64/aarch64-protos.h: Update. * config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Use it. Add match_scratch and cc clobbers. Use match_operator instead of iterator expansion. (*compare_cstore<GPI>_insn): Likewise.
2025-08-11 | aarch64: Rename and improve aarch64_split_imm24 | Richard Henderson | 3 | -13/+14
Two of the three uses of aarch64_imm24 included the important follow-up tests vs aarch64_move_imm and aarch64_plus_operand. Lack of the exclusion within aarch64_if_then_else_costs produced incorrect costing. Since aarch64_split_imm24 has already matched a non-negative CONST_INT, drill down from aarch64_plus_operand to aarch64_uimm12_shift. gcc: * config/aarch64/predicates.md (aarch64_split_imm24): Rename from aarch64_imm24; exclude aarch64_move_imm and aarch64_uimm12_shift. * config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Update for aarch64_split_imm24. (*compare_cstore<GPI>_insn): Likewise. * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Likewise.
2025-08-11 | aarch64: Fix gcs save/restore_stack_nonlocal | Richard Henderson | 2 | -8/+8
The save/restore_stack_nonlocal patterns passed a DImode rtx to gen_tbranch_neqi3 for a QImode compare. But since we're seeding r16 with 1, GCSEnabled will clear the only set bit in r16, so we can use CBNZ instead of TBNZ. gcc: * config/aarch64/aarch64.md (tbranch_<EQL><SHORT>3): Remove. (save_stack_nonlocal): Use aarch64_gen_compare_zero_and_branch. (restore_stack_nonlocal): Likewise. gcc/testsuite: * gcc.target/aarch64/gcs-nonlocal-3.c: Match cbnz.
2025-08-11 | aarch64: Use aarch64_gen_compare_zero_and_branch in aarch64_restore_za | Richard Henderson | 4 | -3/+7
With -mtrack-speculation, the pattern that was directly expanded by aarch64_restore_za is disabled. Use the helper function instead. gcc: * config/aarch64/aarch64.cc (aarch64_gen_compare_zero_and_branch): Export. * config/aarch64/aarch64-protos.h (aarch64_gen_compare_zero_and_branch): Declare it. * config/aarch64/aarch64-sme.md (aarch64_restore_za): Use it. * config/aarch64/aarch64.md (*aarch64_cbz<EQL><GPI>): Unexport.
2025-08-11 | aarch64: Reorg aarch64_if_then_else_costs, conditional branch | Richard Henderson | 1 | -23/+32
gcc: * config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Reorg to include the cost of inner within TBZ sign-bit test, only match CBZ/CBNZ with valid modes, and both for the aarch64_imm24 test.