path: root/gcc/config
2025-06-05  RISC-V: Support Sstvala extension.  (Jiawei; 2 files, -0/+15)

Support the Sstvala extension, which provides all needed values in the
Supervisor Trap Value register (stval).

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension definition.
        * config/riscv/riscv-ext.opt: New extension mask.
        * doc/riscv-ext.texi: Document the new extension.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-sstvala.c: New test.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-05  RISC-V: Support Sscounterenw extension.  (Jiawei; 2 files, -0/+15)

Support the Sscounterenw extension, which allows writeable enables for any
supported counter.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension definition.
        * config/riscv/riscv-ext.opt: New extension mask.
        * doc/riscv-ext.texi: Document the new extension.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-sscounterenw.c: New test.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-05  RISC-V: Support Ssccptr extension.  (Jiawei; 2 files, -0/+15)

Support the Ssccptr extension, which allows the main memory to support
page table reads.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension definition.
        * config/riscv/riscv-ext.opt: New extension mask.
        * doc/riscv-ext.texi: Document the new extension.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-ssccptr.c: New test.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-05  RISC-V: Support Smrnmi extension.  (Jiawei; 2 files, -0/+15)

Support the Smrnmi extension, which provides new CSRs for Machine mode
Non-Maskable Interrupts.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension definition.
        * config/riscv/riscv-ext.opt: New extension mask.
        * doc/riscv-ext.texi: Document the new extension.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-smrnmi.c: New test.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-05  RISC-V: Support Sm/scsrind extensions.  (Jiawei; 2 files, -0/+30)

Support the Sm/scsrind extensions, which provide indirect access to
machine-level CSRs.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension definition.
        * config/riscv/riscv-ext.opt: New extension mask.
        * doc/riscv-ext.texi: Document the new extension.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-smcsrind.c: New test.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-05  i386: Fix vmovddup's mem attribute  (Hu, Lin1; 1 file, -3/+3)

Some vmovddup patterns' type attribute is sselog1, which makes their mem
attribute "both".  Change the type attribute to match the other vmovddup
patterns.

gcc/ChangeLog:

        * config/i386/sse.md (avx512f_movddup512<mask_name>): Change
        sselog1 to ssemov.
        (avx_movddup256<mask_name>): Ditto.
        (*vec_dupv2di): Change alternative 4's type attribute from
        sselog1 to ssemov.
2025-06-05  RISC-V: Update extension definition.  (Jiawei; 1 file, -141/+141)

Update the definition of the RISC-V extensions in riscv-ext.def.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: Update declaration.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-04  Avoid SIGSEGV in nvptx 'mkoffload' for voluminous PTX code  (Thomas Schwinge; 1 file, -3/+9)

In commit 50be486dff4ea2676ed022e9524ef190b92ae2b1 "nvptx:
libgomp+mkoffload.cc: Prepare for reverse offload fn lookup", some
additional tracking of the PTX code was added, and this assumes that
potentially every single character of PTX code needs to be tracked as a
new chunk of PTX code.  That's problematic if we're dealing with
voluminous PTX code (for example, non-trivial C++ code), and the
'file_idx' 'alloca'tion then causes stack overflow.  For example:

    FAIL: libgomp.c++/target-std__valarray-1.C (test for excess errors)
    UNRESOLVED: libgomp.c++/target-std__valarray-1.C compilation failed to produce executable
    lto-wrapper: fatal error: [...]/build-gcc/gcc//accel/nvptx-none/mkoffload terminated with signal 11 [Segmentation fault], core dumped

gcc/
        * config/nvptx/mkoffload.cc (process): Use an 'auto_vec' for
        'file_idx'.
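A minimal sketch of the shape of such a fix, assuming 'file_idx' records
chunk start offsets (the predicate and variable names below are invented;
the real code in mkoffload.cc differs in detail).  An auto_vec grows on
the heap, so even one entry per input character cannot overflow the stack
the way an alloca-sized buffer could:

        /* Hypothetical sketch: heap-backed growth instead of alloca.  */
        auto_vec<size_t> file_idx;
        file_idx.safe_push (0);               /* First chunk starts at 0.  */
        for (size_t i = 0; i < input_len; ++i)
          if (starts_new_chunk_p (input[i]))  /* Invented helper.  */
            file_idx.safe_push (i + 1);       /* Worst case: one per char.  */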
2025-06-04  [PATCH] RISC-V: Imply zicsr for svade and svadu extensions.  (Dongyan Chen; 1 file, -2/+2)

This patch makes the svade and svadu extensions imply zicsr.  According to
the riscv-privileged spec, svade and svadu are privileged-architecture
extensions, so they should imply zicsr.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: Imply zicsr.
2025-06-04  [PATCH v2] RISC-V: Add svbare extension.  (Dongyan Chen; 2 files, -0/+15)

This patch supports the svbare extension, which is part of the RVA23
profile, enabling GCC to recognize and process the svbare extension
correctly at compile time.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension defs.
        * config/riscv/riscv-ext.opt: Ditto.
        * doc/riscv-ext.texi: Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-60.c: New test.
2025-06-04  RISC-V: Leverage get_vector_binary_rtx_cost to avoid code dup [NFC]  (Pan Li; 1 file, -28/+16)

Some similar code can be wrapped into the function
get_vector_binary_rtx_cost; leverage it to avoid code duplication.

The test suites below passed for this patch series.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Rename the
        args to scalar2vr.
        (riscv_rtx_costs): Leverage the above func to avoid code dup.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-06-04  RISC-V: Add Shlcofideleg extension.  (Jiawei; 2 files, -0/+15)

This patch adds the RISC-V Shlcofideleg extension.  It supports delegating
LCOFI (count-overflow) interrupts to VS-mode. [1]

[1] https://riscv.github.io/riscv-isa-manual/snapshot/privileged

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension defs.
        * config/riscv/riscv-ext.opt: Ditto.
        * doc/riscv-ext.texi: Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-shlocofideleg.c: New test.

Signed-off-by: Jiawei <jiawei@iscas.ac.cn>
2025-06-04  i386: Add more peephole2 for APX NDD  (Hu, Lin1; 1 file, -0/+135)

The patch aims to optimize

        movb    (%rdi), %al
        movq    %rdi, %rbx
        xorl    %esi, %eax, %edx
        movb    %dl, (%rdi)
        cmpb    %sil, %al
        jne

to

        xorb    %sil, (%rdi)
        movq    %rdi, %rbx
        jne

removing two mov instructions and one cmp instruction.  Because APX NDD
allows the destination register and source register to be different, some
of the original peephole2 patterns no longer apply, so add new peephole2
patterns for APX NDD.

gcc/ChangeLog:

        * config/i386/i386.md (define_peephole2): Define some new
        peephole2 for APX NDD.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr49095-2.c: New test.
2025-06-04  i386: Add more forms of peephole2 for adc/sbb  (Hu, Lin1; 1 file, -0/+186)

Enabling -mapxf changes some patterns for adc/sbb, so GCC emits an extra
mov, like

        movq    8(%rdi), %rax
        adcq    %rax, 8(%rsi), %rax
        movq    %rax, 8(%rdi)

rather than

        movq    8(%rsi), %rax
        adcq    %rax, 8(%rdi)

The patch adds more kinds of peephole2 to eliminate the extra mov.

gcc/ChangeLog:

        * config/i386/i386.md: Add 4 new peephole2 by swapping the
        original peephole2's operands' order to support the new pattern.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/pr79173-13.c: New test.
        * gcc.target/i386/pr79173-14.c: Ditto.
        * gcc.target/i386/pr79173-15.c: Ditto.
        * gcc.target/i386/pr79173-16.c: Ditto.
        * gcc.target/i386/pr79173-17.c: Ditto.
        * gcc.target/i386/pr79173-18.c: Ditto.
2025-06-03  RISC-V: Combine vec_duplicate + vdiv.vv to vdiv.vx on GR2VR cost  (Pan Li; 3 files, -1/+36)

This patch would like to combine the vec_duplicate + vdiv.vv into
vdiv.vx, as in the example code below.  The related pattern will depend on
the cost of vec_duplicate from GR2VR.  Then late-combine will take action
if the cost of GR2VR is zero, and reject the combination if the GR2VR
cost is greater than zero.

Assume we have example code like below, with GR2VR cost 0.

        #define DEF_VX_BINARY(T, OP)                                   \
        void                                                           \
        test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
        {                                                              \
          for (unsigned i = 0; i < n; i++)                             \
            out[i] = in[i] OP x;                                       \
        }

        DEF_VX_BINARY(int32_t, /)

Before this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     vsetvli a5,zero,e32,m1,ta,ma
        13 │     vmv.v.x v2,a2
        14 │     slli    a3,a3,32
        15 │     srli    a3,a3,32
        16 │ .L3:
        17 │     vsetvli a5,a3,e32,m1,ta,ma
        18 │     vle32.v v1,0(a1)
        19 │     slli    a4,a5,2
        20 │     sub     a3,a3,a5
        21 │     add     a1,a1,a4
        22 │     vdiv.vv v1,v1,v2
        23 │     vse32.v v1,0(a0)
        24 │     add     a0,a0,a4
        25 │     bne     a3,zero,.L3

After this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     slli    a3,a3,32
        13 │     srli    a3,a3,32
        14 │ .L3:
        15 │     vsetvli a5,a3,e32,m1,ta,ma
        16 │     vle32.v v1,0(a1)
        17 │     slli    a4,a5,2
        18 │     sub     a3,a3,a5
        19 │     add     a1,a1,a4
        20 │     vdiv.vx v1,v1,a2
        21 │     vse32.v v1,0(a0)
        22 │     add     a0,a0,a4
        23 │     bne     a3,zero,.L3

The test suites below passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (expand_vx_binary_vec_vec_dup): Add new
        case for DIV op.
        * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add new
        func to get the cost of vector binary.
        (riscv_rtx_costs): Add div rtx match and leverage the above wrap
        to get the cost.
        * config/riscv/vector-iterators.md: Add new op div to
        no_shift_vx_op.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-06-03  RISC-V: Use helper function to get FPR to VR move cost  (Paul-Antoine Arras; 2 files, -5/+4)

Since the last patch introduced get_fr2vr_cost () to get the correct cost
of moving data from a floating-point register to a vector register, this
patch replaces the existing uses of the constant FR2VR.

gcc/ChangeLog:

        * config/riscv/riscv-vector-costs.cc (costs::adjust_stmt_cost):
        Replace FR2VR with get_fr2vr_cost ().
        * config/riscv/riscv.cc (riscv_register_move_cost): Likewise.
        (riscv_builtin_vectorization_cost): Likewise.
2025-06-03  RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]  (Paul-Antoine Arras; 5 files, -4/+93)

This pattern enables the combine pass (or late-combine, depending on the
case) to merge a vec_duplicate into a plus-mult or minus-mult RTL
instruction.

Before this patch, we have two instructions, e.g.:

        vfmv.v.f    v6,fa0
        vfmadd.vv   v9,v6,v7

After, we get only one:

        vfmadd.vf   v9,fa0,v7

On SPEC2017's 503.bwaves_r, depending on the workload, the reduction in
dynamic instruction count varies from -4.66% to -4.75%.

        PR target/119100

gcc/ChangeLog:

        * config/riscv/autovec-opt.md (*<optab>_vf_<mode>): Add new
        pattern to combine vec_duplicate + vfm{add,sub}.vv into
        vfm{add,sub}.vf.
        * config/riscv/riscv-opts.h (FPR2VR_COST_UNPROVIDED): Define.
        * config/riscv/riscv-protos.h (get_fr2vr_cost): Declare function.
        * config/riscv/riscv.cc (riscv_rtx_costs): Add cost model for
        MULT with VEC_DUPLICATE.
        (get_fr2vr_cost): New function.
        * config/riscv/riscv.opt: Add new option --param=fpr2vr-cost.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_data.h: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: New test.
        * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: New test.
2025-06-02  [PATCH] RISC-V: Add smcntrpmf extension.  (Dongyan Chen; 2 files, -0/+15)

This patch supports the smcntrpmf extension [1], enabling GCC to recognize
and process the smcntrpmf extension correctly at compile time.

[1] https://github.com/riscvarchive/riscv-smcntrpmf

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extension defs.
        * config/riscv/riscv-ext.opt: Ditto.
        * doc/riscv-ext.texi: Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-59.c: New test.
2025-06-02  or1k: Support long jump offsets with -mcmodel=large  (Stafford Horne; 2 files, -3/+4)

The -mcmodel=large option was originally added to handle generation of
large binaries with large PLTs.  However, when compiling the Linux kernel
with allyesconfig, the output binary is so large that the jump
instruction's 26-bit immediate is not large enough to store the jump
offset to some symbols when linking.  Example error:

    relocation truncated to fit: R_OR1K_INSN_REL_26 against symbol
    `do_fpe_trap' defined in .text section in arch/openrisc/kernel/traps.o

We fix this by forcing jump offsets to registers when -mcmodel=large.

Note, to get the Linux kernel allyesconfig config to work with OpenRISC,
this patch is needed along with some other patches to the Linux hand-coded
assembly bits.

gcc/ChangeLog:

        * config/or1k/predicates.md (call_insn_operand): Add condition to
        not allow symbol_ref operands with TARGET_CMODEL_LARGE.
        * config/or1k/or1k.opt: Document new -mcmodel=large implications.
        * doc/invoke.texi: Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/or1k/call-1.c: New test.
        * gcc.target/or1k/got-1.c: New test.
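A hedged sketch of the resulting code shape (instructions from the
OpenRISC ISA; register choice invented, delay slots omitted): instead of a
direct l.jal, whose 26-bit R_OR1K_INSN_REL_26 field can overflow, the full
address is built in a register and the call goes through l.jalr.

        /* Direct call: 26-bit PC-relative field, can overflow.  */
        l.jal   do_fpe_trap

        /* -mcmodel=large shape: materialize the address, then call.  */
        l.movhi r13, hi(do_fpe_trap)
        l.ori   r13, r13, lo(do_fpe_trap)
        l.jalr  r13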
2025-06-02  RISC-V: Adjust build rule for gen-riscv-ext-opt and gen-riscv-ext-texi  (Kito Cheng; 1 file, -4/+9)

Separate the build rules into compile and link stages to make sure
BUILD_LINKERFLAGS and BUILD_LDFLAGS are applied correctly.

We hit this issue when trying to build GCC with a non-system-default g++
that uses a newer libstdc++: the link then picked up the older libstdc++
from the system and failed, which should not happen if we link with
-static-libgcc and -static-libstdc++.

gcc/ChangeLog:

        * config/riscv/t-riscv: Adjust build rule for gen-riscv-ext-opt
        and gen-riscv-ext-texi.
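A hedged sketch of the split (BUILD_LINKERFLAGS and BUILD_LDFLAGS are the
variables named above; the other variable names and the exact rules are
assumptions, not copied from t-riscv):

        # Before: one rule compiled and linked in a single command, so the
        # link-only flags never took effect.  After: two separate steps.
        gen-riscv-ext-opt.o: $(srcdir)/config/riscv/gen-riscv-ext-opt.cc
        	$(CXX_FOR_BUILD) $(BUILD_CXXFLAGS) -c $< -o $@

        gen-riscv-ext-opt$(build_exeext): gen-riscv-ext-opt.o
        	$(CXX_FOR_BUILD) $(BUILD_LINKERFLAGS) -o $@ $< $(BUILD_LDFLAGS)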
2025-06-02  RISC-V: Implement full-featured iterator for riscv_subset_list [NFC]  (Kito Cheng; 2 files, -14/+66)

This commit implements a full-featured iterator for riscv_subset_list, so
that it can be used with a range-based for loop.  That could simplify the
code in the future and make it more readable, as well as more compatible
with standard C++ containers.

gcc/ChangeLog:

        * config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Use
        range-based for loop.
        * config/riscv/riscv-subset.h (riscv_subset_list::iterator): New.
        (riscv_subset_list::const_iterator): New.
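A hedged usage sketch (the field names follow riscv-subset.h; the loop
body is purely illustrative):

        /* Range-based for loop instead of manual ->next chasing.  */
        for (const riscv_subset_t &subset : *subset_list)
          printf ("%s %dp%d\n", subset.name.c_str (),
                  subset.major_version, subset.minor_version);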
2025-06-01  RISC-V: Fix line too long format issue for autovec.md [NFC]  (Pan Li; 1 file, -18/+36)

Inspired by the avg_ceil patches, I noticed there were even more overlong
lines in autovec.md, so fix those format issues.

gcc/ChangeLog:

        * config/riscv/autovec.md: Fix line too long for sorts of
        pattern.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-31  xtensa: Remove include of reload.h  (Takayuki 'January June' Suwa; 1 file, -1/+0)

As one of the last steps in removing old reload.

gcc/ChangeLog:

        * config/xtensa/xtensa.cc: Remove include of reload.h.
2025-05-31  xtensa: Remove an unnecessary constraint modifier from movsf_internal insn pattern  (Takayuki 'January June' Suwa; 1 file, -1/+1)

In this case, there is no need to consider reloading when memory is the
destination.  On the other hand, when memory is the source, reloading a
read from the constant pool becomes a double indirection and should
obviously be avoided.

gcc/ChangeLog:

        * config/xtensa/xtensa.md (movsf_internal): Remove destination
        side constraint modifier '^' in the third alternative.
2025-05-31  xtensa: Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS  (Takayuki 'January June' Suwa; 1 file, -4/+23)

Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS in order to avoid the
ALL_REGS rclass, as is done on other targets, instead of overestimating
the move costs between integer and FP registers.

gcc/ChangeLog:

        * config/xtensa/xtensa.cc (xtensa_ira_change_pseudo_allocno_class):
        New prototype and function.
        (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): Define macro.
        (xtensa_register_move_cost): Change the move cost between integer
        and FP registers to a value based on actual behavior, i.e. 2, the
        default and the same as the move cost between integer registers.
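A hedged sketch of the usual shape of this hook (the patch's actual class
logic may differ; AR_REGS and FP_REGS are xtensa's integer and FP
classes):

        /* Narrow an over-broad ALL_REGS allocno class to one bank.  */
        static reg_class_t
        xtensa_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED,
                                                reg_class_t allocno_class,
                                                reg_class_t best_class)
        {
          if (allocno_class != ALL_REGS)
            return allocno_class;
          return best_class == FP_REGS ? FP_REGS : AR_REGS;
        }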
2025-05-30  RISC-V: Leverage vaadd.vv for signed standard name avg_ceil  (Pan Li; 1 file, -19/+7)

The avg_ceil has the rounding mode towards +inf, while vaadd.vv has rnu,
which totally matches the semantics.  From the RVV spec, the fixed-point
vaadd.vv with rnu is

        roundoff_signed(v, d) = (signed(v) >> d) + r
        r = v[d - 1]

For vaadd, d = 1, so we have

        roundoff_signed(v, 1) = (signed(v) >> 1) + v[0]

If v[0] is 0, nothing needs to be done, as there is no rounding.  If v[0]
is 1, there will be rounding, with 2 cases.

Case 1: v is positive.
        roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf.
        roundoff_signed(2 + 3, 1) = (5 >> 1) + 1 = 3

Case 2: v is negative.
        roundoff_signed(v, 1) = (signed(v) >> 1) + 1, aka round towards +inf.
        roundoff_signed(-9 + 2, 1) = (-7 >> 1) + 1 = -4 + 1 = -3

Thus, we can leverage vaadd with rnu directly for avg_ceil.

The test suites below passed for this patch series.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/autovec.md (avg<v_double_trunc>3_ceil): Add insn
        expand to leverage vaadd with rnu directly.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-30  [AUTOFDO] Enable autofdo tests for aarch64  (Kugan Vivekanandarajah; 1 file, -0/+0)

autofdo tests currently run only for x86.  This patch makes them run for
aarch64 too.  Verified that perf and create_gcov are running as expected.

gcc/ChangeLog:

        * config/aarch64/gcc-auto-profile: Make script executable.

gcc/testsuite/ChangeLog:

        * lib/target-supports.exp: Enable autofdo tests for aarch64.

Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
2025-05-29  i386: Use shuffles instead of shifts for reduction on AMD znver4/5  (Pranav Gorantla; 3 files, -1/+35)

On AMD znver4 and znver5 targets, vpshufd and vpsrldq have latencies of 1
and 2 and throughputs of 4 (2 for znver4) and 2, respectively.  It is
better to generate shuffles instead of shifts wherever possible.  In this
patch we try to generate an appropriate shuffle instruction to copy the
higher half to the lower half, instead of a simple right shift, during
horizontal vector reduction.

gcc/ChangeLog:

        * config/i386/i386-expand.cc (emit_reduc_half): Use shuffles to
        generate reduc half for V4SI and similar modes.
        * config/i386/i386.h (TARGET_SSE_REDUCTION_PREFER_PSHUF): New
        macro.
        * config/i386/x86-tune.def (X86_TUNE_SSE_REDUCTION_PREFER_PSHUF):
        New tuning.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/reduc-pshuf.c: New test.
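A hedged illustration of the reduction shape (instruction selection
assumed from the description above, not copied from the patch): summing a
V4SI in %xmm0 by copying the high half down with vpshufd instead of
vpsrldq.

        vpshufd $0xee, %xmm0, %xmm1     # xmm1 = { v2, v3, v2, v3 }
        vpaddd  %xmm1, %xmm0, %xmm0     # lanes 0,1 = v0+v2, v1+v3
        vpshufd $0x55, %xmm0, %xmm1     # broadcast lane 1
        vpaddd  %xmm1, %xmm0, %xmm0     # lane 0 = full horizontal sum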
2025-05-29  RISC-V: Add minimal support of double trap extension 1.0  (Jerry Zhang Jian; 2 files, -0/+30)

Add support for the double trap extension [1], enabling GCC to recognize
the following extensions at compile time.

New extensions:
- ssdbltrp
- smdbltrp

[1] https://github.com/riscv/riscv-double-trap/releases/download/v1.0/riscv-double-trap.pdf

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: New extensions
        * config/riscv/riscv-ext.opt: Auto re-generated
        * doc/riscv-ext.texi: Auto re-generated

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/arch-57.c: New test
        * gcc.target/riscv/arch-58.c: New test

Signed-off-by: Jerry Zhang Jian <jerry.zhangjian@sifive.com>
2025-05-29  RISC-V: Combine vec_duplicate + vmul.vv to vmul.vx on GR2VR cost  (Pan Li; 3 files, -1/+4)

This patch would like to combine the vec_duplicate + vmul.vv into
vmul.vx, as in the example code below.  The related pattern will depend on
the cost of vec_duplicate from GR2VR.  Then late-combine will take action
if the cost of GR2VR is zero, and reject the combination if the GR2VR
cost is greater than zero.

Assume we have example code like below, with GR2VR cost 0.

        #define DEF_VX_BINARY(T, OP)                                   \
        void                                                           \
        test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
        {                                                              \
          for (unsigned i = 0; i < n; i++)                             \
            out[i] = in[i] OP x;                                       \
        }

        DEF_VX_BINARY(int32_t, *)

Before this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     vsetvli a5,zero,e32,m1,ta,ma
        13 │     vmv.v.x v2,a2
        14 │     slli    a3,a3,32
        15 │     srli    a3,a3,32
        16 │ .L3:
        17 │     vsetvli a5,a3,e32,m1,ta,ma
        18 │     vle32.v v1,0(a1)
        19 │     slli    a4,a5,2
        20 │     sub     a3,a3,a5
        21 │     add     a1,a1,a4
        22 │     vmul.vv v1,v1,v2
        23 │     vse32.v v1,0(a0)
        24 │     add     a0,a0,a4
        25 │     bne     a3,zero,.L3

After this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     slli    a3,a3,32
        13 │     srli    a3,a3,32
        14 │ .L3:
        15 │     vsetvli a5,a3,e32,m1,ta,ma
        16 │     vle32.v v1,0(a1)
        17 │     slli    a4,a5,2
        18 │     sub     a3,a3,a5
        19 │     add     a1,a1,a4
        20 │     vmul.vx v1,v1,a2
        21 │     vse32.v v1,0(a0)
        22 │     add     a0,a0,a4
        23 │     bne     a3,zero,.L3

The test suites below passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
        case for MULT op.
        (expand_vx_binary_vec_vec_dup): Ditto.
        * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
        * config/riscv/vector-iterators.md: Add new op mult to
        no_shift_vx_ops.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-28  Set znver5 addss cost to 2 again  (Jan Hubicka; 1 file, -1/+1)

Since uses of addss for purposes other than modelling FP
addition/subtraction should be gone now, this patch sets the addss cost
back to 2.

gcc/ChangeLog:

        PR target/119298
        * config/i386/x86-tune-costs.h (struct processor_costs): Set
        addss cost back to 2.
2025-05-28  RISC-V: Avoid division by zero in check_builtin_call [PR120436].  (Robin Dapp; 2 files, -39/+5)

In check_builtin_call we eventually perform a division by zero when no
vector modes are present.  This patch just avoids the division in that
case.

        PR target/120436

gcc/ChangeLog:

        * config/riscv/riscv-vector-builtins-shapes.cc (struct vset_def):
        Avoid division by zero.
        (struct vget_def): Ditto.
        * config/riscv/riscv-vector-builtins.h (struct
        function_group_info): Use required_extensions_specified instead
        of duplicating code.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/pr120436.c: New test.
2025-05-28  RISC-V: Leverage vaadd.vv for signed standard name avg_floor  (Pan Li; 1 file, -13/+6)

The signed avg_floor totally matches the semantics of the fixed-point RVV
insn vaadd with round-down.  Thus, leverage it directly to implement
avg_floor.

The RVV spec is somewhat unclear about the difference between floating
point and fixed point for the rounding that discards least-significant
information.

For floating point, which is not two's complement, "discard
least-significant information" indicates truncation rounding.  For
example:

        *  3.5 ->  3
        * -2.3 -> -2

For fixed point, which is two's complement, "discard least-significant
information" indicates rounding down.  For example:

        *  3.5 ->  3
        * -2.3 -> -3

And vaadd rounds down, which totally matches the semantics of avg_floor.

The test suites below passed for this patch series.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/autovec.md (avg<v_double_trunc>3_floor): Add insn
        expand to leverage vaadd directly.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-28  aarch64: Enable newly implemented features for FUJITSU-MONAKA  (Yuta Mukai; 1 file, -1/+1)

This patch enables newly implemented features in GCC (FAMINMAX, FP8FMA,
FP8DOT2, FP8DOT4, LUT) for the FUJITSU-MONAKA processor
(-mcpu=fujitsu-monaka).

2025-05-23  Yuta Mukai  <mukai.yuta@fujitsu.com>

gcc/ChangeLog:

        * config/aarch64/aarch64-cores.def (fujitsu-monaka): Update ISA
        features.
2025-05-27  Enable afdo testing on AMD Zen3+  (Jan Hubicka; 1 file, -7/+22)

contrib/ChangeLog:

        * gen_autofdo_event.py: Add support for AMD Zen 3 and later CPUs.

gcc/ChangeLog:

        * config/i386/gcc-auto-profile: Regenerate.
2025-05-27  s390x: Fix bootstrap.  (Juergen Christ; 1 file, -1/+1)

A typo in the mnemonic attribute caused a failed bootstrap.  Not sure how
that passed the bootstrap done before committing.

gcc/ChangeLog:

        * config/s390/vector.md (*vec_extract<mode>): Fix mnemonic.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2025-05-27  [RISC-V] Add andi+bclr synthesis  (Shreya Munnangi; 2 files, -28/+70)

This patch from Shreya adds the ability to use andi plus a series of bclr
insns to synthesize a logical AND, much like we're doing for IOR/XOR using
ori+bset or their xor equivalents.

This would regress from a code quality standpoint if we didn't make some
adjustments to a handful of define_insn_and_split patterns in the riscv
backend which support the same kind of idioms.  Essentially we turn those
define_insn_and_split patterns into the simple define_splits they always
should have been.  That's been the plan since we started down this path;
now is the time to make that change for a subset of patterns.  It may be
the case that when we're finished we may not even need those patterns.
That's still TBD.

I'm aware of one minor regression in xalan.  As seen elsewhere, combine
reconstructs the mask value, uses mvconst_internal to load it into a reg,
then an and instruction.  That looks better than the operation synthesis,
but only because of the mvconst_internal little white lie.

This patch does help in a variety of places.  It's fairly common in
gimple.c from 502.gcc to see cases where we'd use bclr to clear a bit,
then set the exact same bit a few instructions later.  That was an
artifact of using a define_insn_and_split: it wasn't obvious to combine
that we had two instructions manipulating the same bit.  Now that is
obvious to combine and the redundant operation gets removed.

This has spun in my tester with no regressions on riscv32-elf and
riscv64-elf.  Hopefully the baseline for the tester has stepped forward 🙂

gcc/
        * config/riscv/bitmanip.md (andi+bclr splits): Simplified from
        prior define_insn_and_splits.
        * config/riscv/riscv.cc (synthesize_and): Add support for
        andi+bclr sequences.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
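A hedged example of the synthesis (mask invented for illustration): for
x & ~((1 << 5) | (1ull << 40)), the low clear bit fits andi's simm12 and
the high one becomes a single Zbs bit-clear, avoiding a constant load.

        andi    a0, a0, -33     # -33 == ~(1 << 5), sign-extended
        bclri   a0, a0, 40      # clear bit 40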
2025-05-27  RISC-V: Combine vec_duplicate + vxor.vv to vxor.vx on GR2VR cost  (Pan Li; 3 files, -1/+4)

This patch would like to combine the vec_duplicate + vxor.vv into
vxor.vx, as in the example code below.  The related pattern will depend on
the cost of vec_duplicate from GR2VR.  Then late-combine will take action
if the cost of GR2VR is zero, and reject the combination if the GR2VR
cost is greater than zero.

Assume we have example code like below, with GR2VR cost 0.

        #define DEF_VX_BINARY(T, OP)                                   \
        void                                                           \
        test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
        {                                                              \
          for (unsigned i = 0; i < n; i++)                             \
            out[i] = in[i] OP x;                                       \
        }

        DEF_VX_BINARY(int32_t, ^)

Before this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     vsetvli a5,zero,e32,m1,ta,ma
        13 │     vmv.v.x v2,a2
        14 │     slli    a3,a3,32
        15 │     srli    a3,a3,32
        16 │ .L3:
        17 │     vsetvli a5,a3,e32,m1,ta,ma
        18 │     vle32.v v1,0(a1)
        19 │     slli    a4,a5,2
        20 │     sub     a3,a3,a5
        21 │     add     a1,a1,a4
        22 │     vxor.vv v1,v1,v2
        23 │     vse32.v v1,0(a0)
        24 │     add     a0,a0,a4
        25 │     bne     a3,zero,.L3

After this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     slli    a3,a3,32
        13 │     srli    a3,a3,32
        14 │ .L3:
        15 │     vsetvli a5,a3,e32,m1,ta,ma
        16 │     vle32.v v1,0(a1)
        17 │     slli    a4,a5,2
        18 │     sub     a3,a3,a5
        19 │     add     a1,a1,a4
        20 │     vxor.vx v1,v1,a2
        21 │     vse32.v v1,0(a0)
        22 │     add     a0,a0,a4
        23 │     bne     a3,zero,.L3

The test suites below passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
        case for XOR op.
        (expand_vx_binary_vec_vec_dup): Ditto.
        * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
        * config/riscv/vector-iterators.md: Add new op xor to
        no_shift_vx_ops.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-27  s390: Floating point vector lane handling  (Juergen Christ; 1 file, -12/+125)

Since floating point and vector registers overlap on s390, more efficient
code can be generated to extract FPRs from VRs.  Additionally, for double
vectors, more efficient code can be generated to load specific lanes.

gcc/ChangeLog:

        * config/s390/vector.md (VF): New mode iterator.
        (VEC_SET_NONFLOAT): New mode iterator.
        (VEC_SET_SINGLEFLOAT): New mode iterator.
        (*vec_set<mode>): Split pattern in two.
        (*vec_setv2df): Extract special handling for V2DF mode.
        (*vec_extract<mode>): Split pattern in two.

gcc/testsuite/ChangeLog:

        * gcc.target/s390/vector/vec-extract-1.c: New test.
        * gcc.target/s390/vector/vec-set-1.c: New test.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2025-05-26  [AUTOFDO][AARCH64] Add support for autoprofiledbootstrap  (Kugan Vivekanandarajah; 1 file, -0/+53)

Add support for autoprofiledbootstrap on aarch64.  This is similar to
what is done for i386.  Added gcc/config/aarch64/gcc-auto-profile for
aarch64 profile creation.

How to run:
        configure --with-build-config=bootstrap-lto
        make autoprofiledbootstrap

ChangeLog:

        * Makefile.def: AUTO_PROFILE based on cpu_type.
        * Makefile.in: Likewise.
        * configure: Regenerate.
        * configure.ac: Set autofdo_target.

gcc/ChangeLog:

        * config/aarch64/gcc-auto-profile: New file.

Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
2025-05-26  rs6000: Remove include of reload.h  (Segher Boessenkool; 1 file, -1/+0)

As one of the last steps in removing old reload, I'll delete the reload.h
header file.  It would be a bit embarrassing if that stopped the target I
am responsible for from working, so let's prevent that.

We do not actually use anything from this header file (checked by
building with this patch, and make check has identical results as well),
so it was easy for our port.  Many other ports will be like this, but
some will need some adjustments.  I'll do cross builds of many ports
before it is all over, but it would be good if other ports tried to
remove reload.h from their includes as well :-)

2025-06-26  Segher Boessenkool  <segher@kernel.crashing.org>

        * config/rs6000/rs6000.cc: Remove include of reload.h.
2025-05-25  MicroBlaze does not support speculative execution (CVE-2017-5753)  (Michael J. Eager; 1 file, -0/+4)

gcc/
        PR target/86772 Tracking CVE-2017-5753
        * config/microblaze/microblaze.cc
        (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define to
        speculation_safe_value_not_needed.
2025-05-25  Make i386 constructor vectorizer costs more realistic  (Jan Hubicka; 1 file, -4/+25)

This patch attempts to make the vectorizer costs of vector constructions
more realistic.  Currently we account one integer_to_sse cost for integer
vector construction, but we over-estimate 256- and 512-bit vinserts by
using addss instead of sse_op.

In reality, especially on AMD machines, vectorization of constructors may
get expensive due to quite large integer<->sse move costs.  Estimating
real integer<->sse register traffic is quite hard, since some integer
non-vector arithmetic can be done in SSE registers (for example, if there
is no real arithmetic, just a memory load or any code that can be
converted by the scalar-to-vector RTL pass).  I think to fix the
situation we need to proceed with Richi's recent patch on adding extra
info to the cost hooks and pattern-match what can eventually be STV
converted.  Towards that we however also need to fix current STV
limitations (such as the lack of int->sse conversion) and make the cost
model more meaningful.

This patch removes the hack using addss to "add extra cost" to 256- and
512-bit constructors.  Instead I use the integer_to_sse cost in
add_stmt_cost.  We already account 1 conversion for all constructs (no
matter the size).  I made it 2 conversions for 256 and 3 for 512, since
that is closest to what we do now.

Current cost tables do not match reality for zens:
1) SSE loads (which are pushed down from 10 cycles to 3 cycles)
2) SSE stores
3) SSE->integer conversion cost (which is 3 cycles instead of 5)

Similarly we do not have realistic values for Intel chips, especially the
artificially increased SSE->integer costs.  The reason is that changing
those values regressed benchmarks.  This was mostly because these costs
were accounted wrong in multiple spots and we kind of fine-tuned for
SPECs.  Another reason is that at the time the tables were merged with
the register allocator's, increasing those costs led to IRA using integer
registers to spill SSE values and vice versa, which does not work that
well in practice.  I think one of the problems there is the missing model
for memory renaming, which makes integer spilling significantly cheaper
than modelled.

In previous patches I fixed multiple issues in accounting loads and
stores, and with this change I hope to make the tables more realistic and
incrementally fix issues with individual benchmarks.

I benchmarked the patch with -Ofast -march=native -flto on znver5 and
skylake.  It seems in noise for skylake; for znver5 I got what seems
off-noise for xalancbmk 8.73->8.81 (rate).  The rest seems in noise too;
however, the change affects quite some SLP decisions when the sequence is
just loads followed by a vector store.

gcc/ChangeLog:

        * config/i386/i386.cc (ix86_builtin_vectorization_cost): Use
        sse_op instead of addss to cost vinserti128 and vinserti64x4;
        compute correct mode of vinserti128.
        (ix86_vector_costs::add_stmt_cost): For integer 256-bit and
        512-bit vector constructions account more integer_to_sse moves.
2025-05-25  i386: Quote user-defined symbols in assembly in Intel syntax  (LIU Hao; 4 files, -2/+24)

With `-masm=intel`, GCC generates registers without % prefixes.  If a
user-declared symbol happens to match a register, it will confuse the
assembler.  User-defined symbols should be quoted, so they are not
mistaken for registers or operators.

Support for quoted symbols was added in Binutils 2.26, originally for ARM
assembly, where registers are also unprefixed:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=d02603dc201f80cd9d2a1f4b1a16110b1e04222b

This change is required for `@SECREL32` to work in Intel syntax when
targeting Windows, where `@` is allowed as part of a symbol.  GNU AS
fails to parse a plain symbol with that suffix:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881#c79

gcc/ChangeLog:

        PR target/53929
        PR target/80881
        * config/i386/i386-protos.h (ix86_asm_output_labelref): Declare
        new function for quoting user-defined symbols in Intel syntax.
        * config/i386/i386.cc (ix86_asm_output_labelref): Implement it.
        * config/i386/i386.h (ASM_OUTPUT_LABELREF): Use it.
        * config/i386/cygming.h (ASM_OUTPUT_LABELREF): Use it.
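A hedged illustration (the C variable name is invented): a global whose
name collides with a register is ambiguous in Intel-syntax output unless
quoted.

        # For a C global "int rdi;":
        mov     eax, DWORD PTR rdi[rip]         # 'rdi' parses as a register
        mov     eax, DWORD PTR "rdi"[rip]       # quoted: clearly the symbol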
2025-05-24  [RISC-V] shift+and+shift for logical AND synthesis  (Shreya Munnangi; 1 file, -0/+30)

The next chunk of Shreya's work.  For this expansion we want to detect
cases when the mask fits in a simm12 after shifting right by the number
of trailing zeros in the mask.  In that case we can synthesize the AND
with a shift right, andi and shift left.

I saw this case come up when doing some experimentation with
mvconst_internal removed.  This doesn't make any difference in spec right
now; mvconst_internal will turn the sequence back into a constant load +
and with a register.  But Shreya and I have reviewed the .expand dump on
hand-written tests and verified we're getting the synthesis we want.

Tested on riscv32-elf and riscv64-elf.  Waiting on upstream CI's verdict
before moving forward.

gcc/
        * config/riscv/riscv.cc (synthesize_and): Use a srl+andi+sll
        sequence when the mask fits in a simm12 after shifting by the
        number of trailing zeros.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
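A hedged example with an invented constant: for mask 0x7f000 the number
of trailing zeros is 12, and the shifted mask 0x7f fits in a simm12, so
x & 0x7f000 can be synthesized without materializing the constant:

        srli    a0, a0, 12      # discard the mask's trailing zero bits
        andi    a0, a0, 0x7f    # apply the shifted mask (fits simm12)
        slli    a0, a0, 12      # move the result back into place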
2025-05-24  RISC-V: Combine vec_duplicate + vor.vv to vor.vx on GR2VR cost  (Pan Li; 3 files, -1/+4)

This patch would like to combine the vec_duplicate + vor.vv into vor.vx,
as in the example code below.  The related pattern will depend on the
cost of vec_duplicate from GR2VR.  Then late-combine will take action if
the cost of GR2VR is zero, and reject the combination if the GR2VR cost
is greater than zero.

Assume we have example code like below, with GR2VR cost 0.

        #define DEF_VX_BINARY(T, OP)                                   \
        void                                                           \
        test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
        {                                                              \
          for (unsigned i = 0; i < n; i++)                             \
            out[i] = in[i] OP x;                                       \
        }

        DEF_VX_BINARY(int32_t, |)

Before this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     vsetvli a5,zero,e32,m1,ta,ma
        13 │     vmv.v.x v2,a2
        14 │     slli    a3,a3,32
        15 │     srli    a3,a3,32
        16 │ .L3:
        17 │     vsetvli a5,a3,e32,m1,ta,ma
        18 │     vle32.v v1,0(a1)
        19 │     slli    a4,a5,2
        20 │     sub     a3,a3,a5
        21 │     add     a1,a1,a4
        22 │     vor.vv  v1,v1,v2
        23 │     vse32.v v1,0(a0)
        24 │     add     a0,a0,a4
        25 │     bne     a3,zero,.L3

After this patch:

        10 │ test_vx_binary_or_int32_t_case_0:
        11 │     beq     a3,zero,.L8
        12 │     slli    a3,a3,32
        13 │     srli    a3,a3,32
        14 │ .L3:
        15 │     vsetvli a5,a3,e32,m1,ta,ma
        16 │     vle32.v v1,0(a1)
        17 │     slli    a4,a5,2
        18 │     sub     a3,a3,a5
        19 │     add     a1,a1,a4
        20 │     vor.vx  v1,v1,a2
        21 │     vse32.v v1,0(a0)
        22 │     add     a0,a0,a4
        23 │     bne     a3,zero,.L3

The test suites below passed for this patch.
* The rv64gcv fully regression test.

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
        case for IOR op.
        (expand_vx_binary_vec_vec_dup): Ditto.
        * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
        * config/riscv/vector-iterators.md: Add new op or to
        no_shift_vx_ops.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-23  RISC-V: Add autovec mode param.  (Robin Dapp; 2 files, -0/+26)

This patch adds a --param=autovec-mode=<MODE_NAME>.  When the param is
specified we make autovectorize_vector_modes return exactly this mode if
it is available.  This helps when testing different vectorizer settings.

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (autovectorize_vector_modes): Return
        user-specified mode if available.
        * config/riscv/riscv.opt: New param.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/param-autovec-mode.c: New test.
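A hedged usage sketch (the mode name here is illustrative; it must name a
vector mode the current -march actually provides):

        $ riscv64-unknown-elf-gcc -O3 -march=rv64gcv \
              --param=autovec-mode=RVVM1SI test.c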
2025-05-23  RISC-V: Default-initialize variable.  (Robin Dapp; 1 file, -1/+1)

This patch initializes saved_vxrm_mode to VXRM_MODE_NONE.  The
uninitialized variable triggers a warning (but no error) when building
the compiler, so better to fix it.

gcc/ChangeLog:

        * config/riscv/riscv.cc (singleton_vxrm_need): Init
        saved_vxrm_mode.
2025-05-23  RISC-V: Fix some dynamic LMUL costing.  (Robin Dapp; 1 file, -23/+2)

With all-SLP we annotate statements slightly differently.  This patch
uses STMT_VINFO_RELEVANT_P in order to walk through potential program
points.  It also makes the LMUL estimate always use the same path.

This helps fix a number of test cases that regressed since GCC 14.  There
are still some failing ones, but it appears to me that the chosen LMUL is
still correct and we just expect different log messages.

gcc/ChangeLog:

        * config/riscv/riscv-vector-costs.cc (compute_estimated_lmul):
        Always use vect_vf_for_cost and TARGET_MIN_VLEN.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Adjust
        expectations.
        * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Ditto.
2025-05-23  aarch64: Fold lsl+lsr+orr to rev for half-width shifts  (Dhruv Chawla; 2 files, -2/+63)

This patch folds the following pattern:

        lsl     <y>, <x>, <shift>
        lsr     <z>, <x>, <shift>
        orr     <r>, <y>, <z>

to:

        revb/h/w        <r>, <x>

when the shift amount is equal to half the bitwidth of the <x> register.

Bootstrapped and regtested on aarch64-linux-gnu.

Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>

gcc/ChangeLog:

        * expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if
        the target already provided the result in the expected register.
        * config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
        Avoid forcing subregs into fresh registers unnecessarily.
        * config/aarch64/aarch64-sve.md: Add define_split for rotate.
        (*v_revvnx8hi): New pattern.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/shift_rev_1.c: New test.
        * gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
        * gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
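A hedged source-level example (function invented): rotating each 16-bit
lane by half its width is a byte swap, so a loop like the one below can
vectorize to a single byte-reverse per vector instead of LSL+LSR+ORR.

        #include <stdint.h>

        void
        rot8 (uint16_t *x, int n)
        {
          for (int i = 0; i < n; i++)
            x[i] = (uint16_t) ((x[i] << 8) | (x[i] >> 8));
        }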