path: root/gcc/config
Age | Commit message | Author | Files | Lines
2025-06-30 | RISC-V: Adding B ext, fp16 and missing scalar instruction type for sifive-7 pipeline model [PR120659] | Kito Cheng | 1 | -3/+29
gcc/ChangeLog:

	PR target/120659
	* config/riscv/sifive-7.md: Add B extension, fp16 and missing scalar instruction type for sifive-7 pipeline model.

gcc/testsuite/ChangeLog:

	PR target/120659
	* gcc.target/riscv/pr120659.c: New test.
2025-06-30 | RISC-V: Vector-scalar negate-multiply-(subtract-)accumulate [PR119100] | Paul-Antoine Arras | 2 | -16/+52
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a (possibly negated) minus-mult RTL instruction.

Before this patch, we have two instructions, e.g.:

    vfmv.v.f v6,fa0
    vfnmacc.vv v2,v6,v4

After, we get only one:

    vfnmacc.vf v2,fa0,v4

	PR target/119100

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*vfnmsub_<mode>,*vfnmadd_<mode>): Handle both add and acc variants.
	* config/riscv/vector.md (*pred_mul_neg_<optab><mode>_scalar_undef): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfnmacc and vfnmsac.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h (DEF_VF_MULOP_CASE_1): Fix return type.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f16.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f32.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmacc-run-1-f64.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f16.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f32.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsac-run-1-f64.c: New test.
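For readers less familiar with the negated RVV multiply-accumulate forms, the scalar loops below sketch what vfnmacc.vf and vfnmsac.vf compute per element, following the RVV specification: vd[i] = -(f * vs2[i]) - vd[i] and vd[i] = -(f * vs2[i]) + vd[i]. The function names are hypothetical, and whether GCC actually emits the .vf forms for such loops depends on -march (with V) and the optimization level; treat this as an illustration, not one of the patch's tests.

```c
#include <stddef.h>

/* Hypothetical scalar reference for vfnmacc.vf:
   vd[i] = -(f * vs2[i]) - vd[i]  */
void nmacc_vf(float *vd, const float *vs2, float f, size_t n)
{
    for (size_t i = 0; i < n; i++)
        vd[i] = -(f * vs2[i]) - vd[i];
}

/* Hypothetical scalar reference for vfnmsac.vf:
   vd[i] = -(f * vs2[i]) + vd[i]  */
void nmsac_vf(float *vd, const float *vs2, float f, size_t n)
{
    for (size_t i = 0; i < n; i++)
        vd[i] = -(f * vs2[i]) + vd[i];
}
```

The scalar f is the same for every element, which is exactly the vec_duplicate that the new pattern folds into the .vf instruction instead of materializing it with vfmv.v.f.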
2025-06-30 | aarch64: Add support for NVIDIA GB10 | Kyrylo Tkachov | 2 | -1/+4
This adds support for -mcpu=gb10. This is a big.LITTLE configuration involving Cortex-X925 and Cortex-A725 cores. The appropriate MIDR numbers are added to detect them in -mcpu=native. We did not add an -mcpu=cortex-x925.cortex-a725 option because GB10 does include the crypto instructions which we want on by default, and the current convention is to not enable such extensions for Arm Cortex cores in -mcpu where they are optional in the IP. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64-cores.def (gb10): New entry. * config/aarch64/aarch64-tune.md: Regenerate. * doc/invoke.texi (AArch64 Options): Document the above.
2025-06-30 | x86: Preserve frame pointer for no_callee_saved_registers attribute | H.J. Lu | 4 | -21/+15
Update functions with no_callee_saved_registers/preserve_none attribute to preserve frame pointer since caller may use it to save the current stack:

    pushq %rbp
    movq %rsp, %rbp
    ...
    call function
    ...
    leave
    ret

If callee changes frame pointer without restoring it, caller will fail to restore its stack after callee returns as LEAVE does

    mov %rbp, %rsp
    pop %rbp

The corrupted frame pointer will corrupt stack pointer in caller.

There are no regressions on Linux/x86-64. Also tested with

https://github.com/python/cpython

configured with "./configure --with-tail-call-interp".

gcc/

	PR target/120840
	* config/i386/i386-expand.cc (ix86_expand_call): Don't mark hard frame pointer as clobber.
	* config/i386/i386-options.cc (ix86_set_func_type): Use TYPE_NO_CALLEE_SAVED_REGISTERS instead of TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP.
	* config/i386/i386.cc (ix86_function_ok_for_sibcall): Remove the TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP check.
	(ix86_save_reg): Merge TYPE_NO_CALLEE_SAVED_REGISTERS and TYPE_PRESERVE_NONE with TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP.
	* config/i386/i386.h (call_saved_registers_type): Remove TYPE_NO_CALLEE_SAVED_REGISTERS_EXCEPT_BP.
	* doc/extend.texi: Update no_callee_saved_registers documentation.

gcc/testsuite/

	PR target/120840
	* gcc.target/i386/no-callee-saved-1.c: Updated.
	* gcc.target/i386/no-callee-saved-2.c: Likewise.
	* gcc.target/i386/no-callee-saved-7.c: Likewise.
	* gcc.target/i386/no-callee-saved-8.c: Likewise.
	* gcc.target/i386/no-callee-saved-9.c: Likewise.
	* gcc.target/i386/no-callee-saved-10.c: Likewise.
	* gcc.target/i386/no-callee-saved-18.c: Likewise.
	* gcc.target/i386/no-callee-saved-19a.c: Likewise.
	* gcc.target/i386/no-callee-saved-19c.c: Likewise.
	* gcc.target/i386/no-callee-saved-19d.c: Likewise.
	* gcc.target/i386/pr119784a.c: Likewise.
	* gcc.target/i386/preserve-none-6.c: Likewise.
	* gcc.target/i386/preserve-none-7.c: Likewise.
	* gcc.target/i386/preserve-none-12.c: Likewise.
	* gcc.target/i386/preserve-none-13.c: Likewise.
	* gcc.target/i386/preserve-none-14.c: Likewise.
	* gcc.target/i386/preserve-none-15.c: Likewise.
	* gcc.target/i386/preserve-none-23.c: Likewise.
	* gcc.target/i386/pr120840-1a.c: New test.
	* gcc.target/i386/pr120840-1b.c: Likewise.
	* gcc.target/i386/pr120840-1c.c: Likewise.
	* gcc.target/i386/pr120840-1d.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
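The failure mode described above can be modelled in a few lines. The sketch below is a toy model (not GCC code; all names are hypothetical) of a downward-growing x86-64 stack: the caller runs the pushq %rbp / movq %rsp, %rbp prologue, the callee either preserves or clobbers %rbp, and the caller then executes LEAVE (mov %rbp, %rsp; pop %rbp). The call/ret pair is omitted because it is balanced. Only when %rbp survives the call does %rsp return to its value on entry.

```c
#include <stdint.h>

/* Toy machine state: a downward-growing stack of 64-bit slots. */
struct cpu {
    uint64_t rsp, rbp;
    uint64_t mem[64];   /* slot index = address / 8 */
};

static void push(struct cpu *c, uint64_t v) { c->rsp -= 8; c->mem[c->rsp / 8] = v; }
static uint64_t pop(struct cpu *c) { uint64_t v = c->mem[c->rsp / 8]; c->rsp += 8; return v; }

/* Returns the caller's %rsp after LEAVE; its %rsp on entry is 256. */
uint64_t run_caller(int callee_clobbers_rbp)
{
    struct cpu c = { .rsp = 256, .rbp = 1000 };  /* mem[] is zeroed */
    push(&c, c.rbp);          /* pushq %rbp       */
    c.rbp = c.rsp;            /* movq %rsp, %rbp  */
    c.rsp -= 32;              /* local frame      */
    if (callee_clobbers_rbp)
        c.rbp = 64;           /* callee changes %rbp and does not restore it */
    c.rsp = c.rbp;            /* leave: mov %rbp, %rsp */
    c.rbp = pop(&c);          /* leave: pop %rbp       */
    return c.rsp;
}
```

With a preserved %rbp the model returns 256; with a clobbered one it returns a stack pointer derived from the garbage %rbp, which is the corruption the patch prevents by keeping the frame pointer callee-saved.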
2025-06-30 | RISC-V: Refactor the function bitmap_union_of_preds_with_entry | Jin Ma | 1 | -22/+19
The current implementation of this function is somewhat difficult to understand, as it uses a direct break statement within the for loop, rendering the loop meaningless. Additionally, Coverity reports the following warning on the for loop: "unreachable: Since the loop increment ix++; is unreachable, the loop body will never execute more than once." Therefore, I have done some simple refactoring to address these issues.

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry): Refactor.

Signed-off-by: Jin Ma <jinma@linux.alibaba.com>
2025-06-30 | RISC-V: Add pipeline-checker script | Kito Cheng | 1 | -0/+191
Pipeline checker utility for RISC-V architecture that validates processor pipeline models. This tool analyzes machine description files to ensure all instruction types are properly handled by pipeline scheduling models.

I wrote this tool while implementing the vector pipeline model for SiFive cores, because it was hard to find which instruction types were not handled by the pipeline scheduling models. The tool reports those unhandled types so they can be fixed. I think it may be useful for other RISC-V core developers as well, so I decided to upstream it :)

Usage:

```
./pipeline-checker <your-pipeline-model>
```

Example:

```
$ ./pipeline-checker sifive-7.md
Error: Some types are not consumed by the pipemodel
Missing types:
 {'vfclass', 'vimovxv', 'vmov', 'rdfrm', 'wrfrm', 'ghost', 'wrvxrm', 'crypto', 'vwsll', 'vfmovfv', 'vimovvx', 'sf_vc', 'vfmovvf', 'sf_vc_se', 'rdvlenb', 'vbrev', 'vrev8', 'sf_vqmacc', 'sf_vfnrclip', 'vsetvl_pre', 'rdvl', 'vsetvl'}
```

gcc/ChangeLog:

	* config/riscv/pipeline-checker: New file.
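At its core, such a check is a set difference: every value of the `type` attribute minus the types that some insn reservation in the pipeline model consumes. The checker's own implementation is not reproduced here; the C sketch below only illustrates that set-difference step, with hypothetical hard-coded inputs standing in for what would actually be parsed from the .md files.

```c
#include <stddef.h>
#include <string.h>

/* Returns nonzero if NAME appears in the array SET of N strings. */
static int contains(const char *const *set, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* Writes into MISSING every entry of ALL_TYPES that is not in CONSUMED
   and returns how many were written.  MISSING must have room for
   N_ALL entries.  */
size_t missing_types(const char *const *all_types, size_t n_all,
                     const char *const *consumed, size_t n_consumed,
                     const char **missing)
{
    size_t n_missing = 0;
    for (size_t i = 0; i < n_all; i++)
        if (!contains(consumed, n_consumed, all_types[i]))
            missing[n_missing++] = all_types[i];
    return n_missing;
}
```

A linear scan is adequate here because the number of instruction types is in the low hundreds; a hash set would only matter for much larger inputs.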
2025-06-28 | AVR: target/120856 - Deny R24:DI in avr_hard_regno_mode_ok with Reload. | Georg-Johann Lay | 1 | -1/+1
This fixes an ICE with -mno-lra when split2 tries to split the following zero_extendsidi2 insn:

    (set (reg:DI 24)
         (zero_extend:DI (reg:SI **)))

The ICE is because avr_hard_regno_mode_ok allows R24:DI but disallows R28:SI when Reload is used. R28:SI is a result of zero_extendsidi2.

This ICE only occurs with Reload (which will die before very long), but it occurs when building libgcc.

gcc/

	PR target/120856
	* config/avr/avr.cc (avr_hard_regno_mode_ok) [-mno-lra]: Deny hard regs >= 4 bytes that overlap Y.
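The rule in the ChangeLog entry (with Reload, deny hard registers of 4 or more bytes that overlap Y) amounts to a range check. The helper below is a hypothetical sketch, not the actual avr_hard_regno_mode_ok code; it assumes the usual AVR layout where the Y pointer is the register pair R28:R29 and a multi-byte value occupies consecutive registers starting at REGNO.

```c
/* Hypothetical sketch: returns nonzero if a value of SIZE bytes starting
   at hard register REGNO is rejected by the -mno-lra rule
   "deny hard regs >= 4 bytes that overlap Y" (Y = R28:R29).  */
int denied_with_reload(unsigned regno, unsigned size)
{
    const unsigned y_lo = 28, y_hi = 29;
    unsigned last = regno + size - 1;       /* last register occupied */
    int overlaps_y = regno <= y_hi && last >= y_lo;
    return size >= 4 && overlaps_y;
}
```

Under this rule R24:DI (R24..R31) and R28:SI are both rejected, so the register allocator can no longer pick R24:DI and later be forced into the disallowed R28:SI half when split2 splits the zero_extend.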
2025-06-27 | AVR: target/113934 - Use LRA per default. | Georg-Johann Lay | 1 | -2/+2
Now that the patches for PR120424 are upstream, the last known bug associated with avr+lra has been fixed: PR118591. So we can pull the switch that turns on LRA per default.

This patch only sets -mlra per default. It doesn't do any Reload-related cleanup or removal from the avr backend, hence -mno-lra still works.

The only new problem is that gcc.dg/torture/pr64088.c fails with LRA but not with Reload. That test case is awkward, though: it is UB but expects the compiler to behave in a specific way, which avr-gcc doesn't do: PR116780.

This patch also avoids a relatively recent ICE that breaks building libgcc: R24:DI is allowed per hard_regno_mode_ok, but R26:SI is disallowed for Reload for old reasons. The outcome is that a split2 pattern for R24:DI = zero_extend:DI (R22:SI) runs into an ICE.

AVR-LibC builds fine with this patch. The AVR-LibC testsuite passes without errors.

gcc/

	PR target/113934
	* config/avr/avr.opt (-mlra): Turn on per default.
2025-06-27 | x86: Handle vector broadcast source | H.J. Lu | 1 | -0/+9
Use the inner scalar mode of vector broadcast source in:

    (set (reg:V8DF 394)
         (vec_duplicate:V8DF (reg:V2DF 190 [ alpha ])))

to compute the vector mode for broadcast from vector source.

gcc/

	PR target/120830
	* config/i386/i386-features.cc (ix86_get_vector_cse_mode): Handle vector broadcast source.

gcc/testsuite/

	PR target/120830
	* g++.target/i386/pr120830.C: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-26 | pru: Split 64-bit moves into a sequence of 32-bit moves | Dimitar Dimitrov | 1 | -0/+77
The 64-bit register-to-register moves on PRU are implemented with two instructions moving 32-bit registers. Defining a split for the 64-bit moves allows this to be described in RTL, and thus one of the 32-bit moves to be eliminated if the destination register is dead.

Also, split the loading of non-trivial 64-bit integer constants. The resulting 32-bit integer constants have a better chance of being loaded with something more optimal than an "ldi32".

For now do the splits only after register allocation, because LRA does not yet efficiently handle subregs. See https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651366.html

This patch shows a slight improvement for the wikisort benchmark from embench-iot:

    Benchmark         size-before  size-after  difference
    ---------         -----------  ----------  ----------
    aha-mont64              1,648       1,648           0
    crc32                     104         104           0
    depthconv               1,172       1,172           0
    edn                     3,040       3,040           0
    huffbench               1,616       1,616           0
    matmult-int               748         748           0
    md5sum                    700         700           0
    nettle-aes              2,664       2,664           0
    nettle-sha256           5,732       5,732           0
    nsichneu               21,372      21,372           0
    picojpeg                9,716       9,716           0
    qrduino                 8,556       8,556           0
    sglib-combined          3,724       3,724           0
    slre                    3,488       3,488           0
    statemate               1,132       1,132           0
    tarfind                   652         652           0
    ud                      1,004       1,004           0
    wikisort               18,120      18,092         -28
    xgboost                   300         300           0

gcc/ChangeLog:

	* config/pru/pru.md (reg move splitter): New splitter for 64-bit register moves into two 32-bit moves.
	(const_int move splitter): New splitter for 64-bit constant integer moves into two 32-bit moves.

gcc/testsuite/ChangeLog:

	* gcc.target/pru/mov64-subreg-1.c: New test.
	* gcc.target/pru/mov64-subreg-2.c: New test.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
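Conceptually, the constant splitter turns one 64-bit constant load into two independent 32-bit loads of its low and high halves, each of which may then qualify for something cheaper than ldi32. The C sketch below shows only that decomposition. The 16-bit threshold in the hypothetical fits_short_imm() is an assumption about what a single short-immediate load can encode, not a statement of the exact PRU cost model used by the patch.

```c
#include <stdint.h>

/* Split a 64-bit constant into the two 32-bit halves that would be
   loaded into the low and high registers of the pair.  */
void split_const64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)v;
    *hi = (uint32_t)(v >> 32);
}

/* Hypothetical cost check: assume a half that fits in 16 bits can be
   loaded with a single short-immediate instruction.  */
int fits_short_imm(uint32_t half)
{
    return half <= 0xffffu;
}
```

Once the halves are separate RTL moves, each one is costed and optimized on its own, which is where the wikisort size reduction above can come from.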
2025-06-26 | RISC-V: update prepare_ternary_operands to handle vector-scalar case [PR120828] | Paul-Antoine Arras | 1 | -3/+5
This is a followup to 92e1893e0 "RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate" that caused an ICE in some cases where the mult operands were wrongly swapped. This patch ensures that operands are not swapped in the vector-scalar case. PR target/120828 gcc/ChangeLog: * config/riscv/riscv-v.cc (prepare_ternary_operands): Handle the vector-scalar case.
2025-06-26 | i386: Introduce crc_rev<mode>si4 expanders [PR120719] | Uros Bizjak | 1 | -0/+17
Introduce crc_rev<mode>si4 expanders to generate the CRC32 instruction when using __builtin_rev_crc32_data* builtins with the 0x1EDC6F41 polynomial and -mcrc32.

	PR target/120719

gcc/ChangeLog:

	* config/i386/i386.md (crc_rev<SWI124:mode>si4): New expander.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/crc-builtin-crc32.c: New test.
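0x1EDC6F41 is the CRC-32C (Castagnoli) polynomial, which is what the x86 CRC32 instruction implements; its bit-reflected form is 0x82F63B78. As a portable reference, the bitwise reflected CRC-32C below uses the conventional 0xFFFFFFFF initial value and final inversion. This assumes the __builtin_rev_crc32_data* builtins compute only the raw update from a caller-supplied CRC (no implicit initial value or final XOR), so crc32c() is not a drop-in replacement for them; crc32c_byte() corresponds to a single raw update step.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected CRC-32C update for one byte (reflected poly 0x82F63B78). */
uint32_t crc32c_byte(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Conventional CRC-32C: initial value 0xFFFFFFFF, final inversion. */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = crc32c_byte(crc, *p++);
    return ~crc;
}
```

On x86 with SSE4.2, the _mm_crc32_u8(crc, byte) intrinsic should produce the same value as crc32c_byte(crc, byte), which makes this a convenient cross-check for the new expander.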
2025-06-26 | RISC-V: Fix build issue | Kito Cheng | 1 | -1/+1
Apparently I forgot to squash this fix into the previous commit before I pushed it...

gcc/ChangeLog:

	* config/riscv/riscv.md: Fix build issue.
2025-06-26 | RISC-V: Add comment and reorder the include files in riscv.md [NFC] | Kito Cheng | 1 | -8/+11
This patch adds a comment to the riscv.md file to clarify the purpose of the file and reorders the include files for better organization. gcc/ChangeLog: * config/riscv/riscv.md: Add comment and reorder include files.
2025-06-26 | x86: Also handle all 1s float vector constant | H.J. Lu | 1 | -2/+4
Since float vector constant

    (const_vector:V4SF [(const_double:SF -QNaN [-QNaN]) repeated x4])

is an all 1s float vector constant, update the remove_redundant_vector pass to replace

    (insn 20 18 21 2 (set (reg:V4SF 124)
            (const_vector:V4SF [
                    (const_double:SF -QNaN [-QNaN]) repeated x4
                ])) "x.cc":26:5 2426 {movv4sf_internal}
         (nil))

with

    (insn 49 2 5 2 (set (reg:V16QI 135)
            (const_vector:V16QI [
                    (const_int -1 [0xffffffffffffffff]) repeated x16
                ])) -1
         (nil))
    ...
    (insn 20 18 21 2 (set (reg:V4SF 124)
            (subreg:V4SF (reg:V16QI 135) 0)) "x.cc":26:5 2426 {movv4sf_internal}
         (nil))

gcc/

	PR target/120819
	* config/i386/i386-features.cc (ix86_broadcast_inner): Also handle all 1s float vector constant.

gcc/testsuite/

	PR target/120819
	* g++.target/i386/pr120819.C: New test.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
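Why an all-ones bit pattern prints as -QNaN in the RTL dump: with every bit set, a binary32 value has the sign bit set, an all-ones exponent and a non-zero mantissa whose top bit (the quiet bit) is set, i.e. a negative quiet NaN. The small C check below reinterprets 0xFFFFFFFF as a float via memcpy, assuming IEEE 754 binary32 floats; the helper name is hypothetical.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without violating aliasing rules. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the bits are identical, the V4SF constant can share the V16QI all-(-1) register through a subreg, as in the replacement shown above.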
2025-06-26 | x86: Handle REG_EH_REGION note in DEF_INSN | H.J. Lu | 1 | -0/+32
For tcpsock_test.go in libgo tests,

commit aba3b9d3a48a0703fd565f7c5f0caf604f59970b
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri May 9 07:17:07 2025 +0800

    x86: Extend the remove_redundant_vector pass

added an instruction:

    (insn 501 101 102 21 (set (reg:V2DI 234)
            (vec_duplicate:V2DI (reg:DI 111 [ _46 ]))) "tcpsock_test.go":691:12 discrim 1 -1
         (nil))

after

    (insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
            (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
         (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
            (expr_list:REG_EH_REGION (const_int 1 [0x1])
                (nil))))

which resulted in

    (insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
            (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
         (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
            (expr_list:REG_EH_REGION (const_int 1 [0x1])
                (nil))))
    (insn 501 101 102 21 (set (reg:V2DI 234)
            (vec_duplicate:V2DI (reg:DI 111 [ _46 ]))) "tcpsock_test.go":691:12 discrim 1 -1
         (nil))

and caused:

    tcpsock_test.go: In function 'net.TestTCPBig..func2':
    tcpsock_test.go:684:28: error: in basic block 21:
    684 | go func() {
        | ^
    tcpsock_test.go:684:28: error: flow control insn inside a basic block
    (insn 101 100 501 21 (set (reg:DI 111 [ _46 ])
            (mem:DI (reg/f:DI 110 [ _45 ]) [5 *_45+0 S8 A64])) "tcpsock_test.go":691:12 discrim 1 99 {*movdi_internal}
         (expr_list:REG_DEAD (reg/f:DI 110 [ _45 ])
            (expr_list:REG_EH_REGION (const_int 1 [0x1])
                (nil))))
    during RTL pass: rrvl
    tcpsock_test.go:684:28: internal compiler error: in rtl_verify_bb_insns, at cfgrtl.cc:2834

Copy the REG_EH_REGION note to the newly added instruction and split the block after the previous instruction.

	PR target/120816
	* config/i386/i386-features.cc (remove_redundant_vector_load): Handle REG_EH_REGION note in DEF_INSN.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-26 | x86: Add preserve_none and update no_caller_saved_registers attributes | H.J. Lu | 5 | -46/+171
Add preserve_none attribute which is similar to no_callee_saved_registers attribute, except on x86-64, r12, r13, r14, r15, rdi and rsi registers are used for integer parameter passing. This can be used in an interpreter to avoid saving/restoring the registers in functions which process byte codes. It improved the pystones benchmark by 6-7%: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119628#c15 Remove -mgeneral-regs-only restriction on no_caller_saved_registers attribute. Only SSE is allowed since SSE XMM register load preserves the upper bits in YMM/ZMM register while YMM register load zeros the upper 256 bits of ZMM register, and preserving 32 ZMM registers can be quite expensive. gcc/ PR target/119628 * config/i386/i386-expand.cc (ix86_expand_call): Call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. * config/i386/i386-options.cc (ix86_set_func_type): Look up preserve_none attribute. Check preserve_none attribute for interrupt attribute. Don't check no_caller_saved_registers nor no_callee_saved_registers conflicts here. (ix86_set_func_type): Check no_callee_saved_registers before checking no_caller_saved_registers attribute. (ix86_set_current_function): Allow SSE with no_caller_saved_registers attribute. (ix86_handle_call_saved_registers_attribute): Check preserve_none, no_callee_saved_registers and no_caller_saved_registers conflicts. (ix86_gnu_attributes): Add preserve_none attribute. * config/i386/i386-protos.h (ix86_type_no_callee_saved_registers_p): New. * config/i386/i386.cc (x86_64_preserve_none_int_parameter_registers): New. (ix86_using_red_zone): Don't use red-zone when there are no caller-saved registers with SSE. (ix86_type_no_callee_saved_registers_p): New. (ix86_function_ok_for_sibcall): Also check TYPE_PRESERVE_NONE and call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. 
(ix86_comp_type_attributes): Call ix86_type_no_callee_saved_registers_p instead of looking up no_callee_saved_registers attribute. Return 0 if preserve_none attribute doesn't match in 64-bit mode. (ix86_function_arg_regno_p): For cfun with TYPE_PRESERVE_NONE, use x86_64_preserve_none_int_parameter_registers. (init_cumulative_args): Set preserve_none_abi. (function_arg_64): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. (setup_incoming_varargs_64): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. (ix86_save_reg): Treat TYPE_PRESERVE_NONE like TYPE_NO_CALLEE_SAVED_REGISTERS. (ix86_nsaved_sseregs): Allow saving XMM registers for no_caller_saved_registers attribute. (ix86_compute_frame_layout): Likewise. (x86_this_parameter): Use x86_64_preserve_none_int_parameter_registers with preserve_none attribute. * config/i386/i386.h (ix86_args): Add preserve_none_abi. (call_saved_registers_type): Add TYPE_PRESERVE_NONE. (machine_function): Change call_saved_registers to 3 bits. * doc/extend.texi: Add preserve_none attribute. Update no_caller_saved_registers attribute to remove -mgeneral-regs-only restriction. gcc/testsuite/ PR target/119628 * gcc.target/i386/no-callee-saved-3.c: Adjust error location. * gcc.target/i386/no-callee-saved-19a.c: New test. * gcc.target/i386/no-callee-saved-19b.c: Likewise. * gcc.target/i386/no-callee-saved-19c.c: Likewise. * gcc.target/i386/no-callee-saved-19d.c: Likewise. * gcc.target/i386/no-callee-saved-19e.c: Likewise. * gcc.target/i386/preserve-none-1.c: Likewise. * gcc.target/i386/preserve-none-2.c: Likewise. * gcc.target/i386/preserve-none-3.c: Likewise. * gcc.target/i386/preserve-none-4.c: Likewise. * gcc.target/i386/preserve-none-5.c: Likewise. * gcc.target/i386/preserve-none-6.c: Likewise. * gcc.target/i386/preserve-none-7.c: Likewise. * gcc.target/i386/preserve-none-8.c: Likewise. * gcc.target/i386/preserve-none-9.c: Likewise. * gcc.target/i386/preserve-none-10.c: Likewise. 
* gcc.target/i386/preserve-none-11.c: Likewise. * gcc.target/i386/preserve-none-12.c: Likewise. * gcc.target/i386/preserve-none-13.c: Likewise. * gcc.target/i386/preserve-none-14.c: Likewise. * gcc.target/i386/preserve-none-15.c: Likewise. * gcc.target/i386/preserve-none-16.c: Likewise. * gcc.target/i386/preserve-none-17.c: Likewise. * gcc.target/i386/preserve-none-18.c: Likewise. * gcc.target/i386/preserve-none-19.c: Likewise. * gcc.target/i386/preserve-none-20.c: Likewise. * gcc.target/i386/preserve-none-21.c: Likewise. * gcc.target/i386/preserve-none-22.c: Likewise. * gcc.target/i386/preserve-none-23.c: Likewise. * gcc.target/i386/preserve-none-24.c: Likewise. * gcc.target/i386/preserve-none-25.c: Likewise. * gcc.target/i386/preserve-none-26.c: Likewise. * gcc.target/i386/preserve-none-27.c: Likewise. * gcc.target/i386/preserve-none-28.c: Likewise. * gcc.target/i386/preserve-none-29.c: Likewise. * gcc.target/i386/preserve-none-30a.c: Likewise. * gcc.target/i386/preserve-none-30b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-26 | x86: Add debug dump for the remove_redundant_vector pass | H.J. Lu | 1 | -5/+60
Add debug dump for the remove_redundant_vector pass with the following output:

Replace:

    (insn 7 4 8 2 (set (reg:V2DI 103)
            (const_vector:V2DI [
                    (const_int 0 [0]) repeated x2
                ])) "x.c":8:13 2406 {movv2di_internal}
         (nil))

with:

    (insn 7 4 8 2 (set (reg:V2DI 103)
            (subreg:V2DI (reg:V32QI 109) 0)) "x.c":8:13 2406 {movv2di_internal}
         (nil))

...

Replace:

    (insn 16 15 17 3 (set (reg:V4DI 105)
            (const_vector:V4DI [
                    (const_int 0 [0]) repeated x4
                ])) "x.c":13:28 2405 {movv4di_internal}
         (nil))

with:

    (insn 16 15 17 3 (set (reg:V4DI 105)
            (subreg:V4DI (reg:V32QI 109) 0)) "x.c":13:28 2405 {movv4di_internal}
         (nil))

...

Place:

    (insn 25 5 23 2 (set (reg:V32QI 109)
            (const_vector:V32QI [
                    (const_int 0 [0]) repeated x32
                ])) -1
         (nil))

after:

    (insn 23 25 24 2 (set (reg/f:DI 107 [ mem1 ])
            (reg:DI 5 di [ mem1 ])) "x.c":5:1 95 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 5 di [ mem1 ])
            (nil)))

in the *.309r.rrvl debug dump.

	* config/i386/i386-features.cc (ix86_place_single_vector_set): Add debug dump.
	(replace_vector_const): Likewise.
	(remove_redundant_vector_load): Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-25 | arc: Use intrinsics for __builtin_mul_overflow () | Luis Silva | 1 | -0/+33
This patch handles both signed and unsigned builtin multiplication overflow. It uses the "mpy.f" instruction to set the condition codes based on the result. In the event of an overflow, the V flag is set, triggering a conditional move depending on the V flag status.

For example, to set r0 to 1 in case of overflow:

    mov_s   r0,1
    mpy.f   r0,r0,r1
    j_s.d   [blink]
    mov.nv  r0,0

gcc/ChangeLog:

	* config/arc/arc.md (<su_optab>mulvsi4): New define_expand.
	(<su_optab>mulsi3_Vcmp): New define_insn.

Signed-off-by: Luis Silva <luiss@synopsys.com>
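At the source level, this expander is reached through GCC's type-generic overflow builtins. A minimal usage sketch follows; the wrapper names are hypothetical, while __builtin_mul_overflow is the documented builtin. It stores the wrapped product and returns true when the exact result does not fit in the result type, which is the condition the new ARC expander tests through the V flag.

```c
#include <stdbool.h>

/* Returns true on signed overflow; *res receives the wrapped product. */
bool smul_overflows(int a, int b, int *res)
{
    return __builtin_mul_overflow(a, b, res);
}

/* Same for unsigned operands. */
bool umul_overflows(unsigned a, unsigned b, unsigned *res)
{
    return __builtin_mul_overflow(a, b, res);
}
```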
2025-06-25 | arc: Add commutative multiplication patterns | Luis Silva | 2 | -3/+36
This patch introduces two new instruction patterns:

`*mulsi3_cmp0`: This pattern performs a multiplication and sets the CC_Z register based on the result, while also storing the result of the multiplication in a general-purpose register.

`*mulsi3_cmp0_noout`: This pattern performs a multiplication and sets the CC_Z register based on the result without storing the result in a general-purpose register.

These patterns generate code using the `mpy.f` instruction and are used specifically where the result is compared to zero.

In addition, the previous commutative multiplication implementation was removed: it incorrectly took the negative flag into account, whereas the new implementation only considers the zero flag.

A test case has been added to verify the correctness of these changes.

gcc/ChangeLog:

	* config/arc/arc.cc (arc_select_cc_mode): Handle multiplication results compared against zero, selecting CC_Zmode.
	* config/arc/arc.md (*mulsi3_cmp0): New define_insn.
	(*mulsi3_cmp0_noout): New define_insn.

gcc/testsuite/ChangeLog:

	* gcc.target/arc/mult-cmp0.c: New test.

Signed-off-by: Luis Silva <luiss@synopsys.com>
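The kind of source these patterns target is a 32-bit product compared against zero. The C sketch below uses a hypothetical function name; note that the comparison is on the truncated 32-bit result, so a product of two non-zero operands can still compare equal to zero.

```c
#include <stdint.h>

/* A 32-bit product compared against zero: the shape the *mulsi3_cmp0
   patterns are meant to match.  uint32_t keeps wrap-around well defined. */
int product_is_zero(uint32_t a, uint32_t b)
{
    uint32_t p = a * b;
    return p == 0;
}
```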
2025-06-25 | ARC: Use intrinsics for __builtin_sub_overflow*() | Shahab Vahedi | 1 | -0/+48
This patch covers signed and unsigned subtractions. The generated code would be something along these lines:

signed:

    sub.f   r0, r1, r2
    b.v     @label

unsigned:

    sub.f   r0, r1, r2
    b.c     @label

gcc/

	* config/arc/arc.md (subsi3_v, subvsi4, subsi3_c): New patterns.

gcc/testsuite/

	* gcc.target/arc/overflow-2.c: New file.
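A minimal sketch of source that maps onto the two sequences above (wrapper names are hypothetical; __builtin_sub_overflow is the documented GCC builtin): the signed form corresponds to the b.v branch and the unsigned form, where overflow means a borrow, to the b.c branch.

```c
#include <stdbool.h>

/* Signed: true when a - b overflows (the b.v case above). */
bool ssub_overflows(int a, int b, int *res)
{
    return __builtin_sub_overflow(a, b, res);
}

/* Unsigned: true when a - b borrows (the b.c case above). */
bool usub_overflows(unsigned a, unsigned b, unsigned *res)
{
    return __builtin_sub_overflow(a, b, res);
}
```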
2025-06-25 | ARC: Use intrinsics for __builtin_add_overflow*() | Shahab Vahedi | 5 | -1/+82
This patch covers signed and unsigned additions. The generated code would be something along these lines:

signed:

    add.f   r0, r1, r2
    b.v     @label

unsigned:

    add.f   r0, r1, r2
    b.c     @label

gcc/

	* config/arc/arc-modes.def (CC_V): New mode.
	* config/arc/arc-protos.h (arc_gen_unlikely_cbranch): New function declaration.
	* config/arc/arc.cc (arc_gen_unlikely_cbranch): New function.
	(get_arc_condition_code): Handle new mode.
	* config/arc/arc.md (addvsi3_v, addvsi4, addsi3_c, uaddvsi4): New patterns.
	* config/arc/predicates.md (proper_comparison_operator): Handle the new V_mode.
	(equality_comparison_operator): Likewise.

gcc/testsuite/

	* gcc.target/arc/overflow-1.c: New file.
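The matching source-level sketch for additions (wrapper names are hypothetical; __builtin_add_overflow is the documented GCC builtin): signed overflow corresponds to the b.v branch and unsigned carry-out to the b.c branch shown above.

```c
#include <stdbool.h>

/* Signed: true when a + b overflows (the b.v case above). */
bool sadd_overflows(int a, int b, int *res)
{
    return __builtin_add_overflow(a, b, res);
}

/* Unsigned: true when a + b carries out (the b.c case above). */
bool uadd_overflows(unsigned a, unsigned b, unsigned *res)
{
    return __builtin_add_overflow(a, b, res);
}
```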
2025-06-25 | x86: Update -mtune=intel for Diamond Rapids/Clearwater Forest | H.J. Lu | 3 | -124/+3
-mtune=intel is used to generate a single binary to run well on both big core and small core, similar to hybrid CPUs. Update -mtune=intel to tune for Diamond Rapids and Clearwater Forest, instead of Silvermont. PR target/120815 * common/config/i386/i386-common.cc (processor_alias_table): Replace CPU_SLM/PTA_NEHALEM with CPU_HASWELL/PTA_HASWELL for PROCESSOR_INTEL. * config/i386/i386-options.cc (processor_cost_table): Replace intel_cost with alderlake_cost. * config/i386/x86-tune-costs.h (intel_cost): Removed. * config/i386/x86-tune-sched.cc (ix86_issue_rate): Treat PROCESSOR_INTEL like PROCESSOR_ALDERLAKE. (ix86_adjust_cost): Likewise. * doc/invoke.texi: Update -mtune=intel for Diamond Rapids and Clearwater Forest. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-25 | i386: Remove CLDEMOTE for clients | Haochen Jiang | 1 | -3/+5
CLDEMOTE is not enabled on clients according to the SDM, which only mentions it for Xeon and Atom servers, not clients. Remove it from client processors starting with Alder Lake (where it was introduced).

gcc/ChangeLog:

	* config/i386/i386.h (PTA_ALDERLAKE): Use PTA_GOLDMONT_PLUS as base to remove PTA_CLDEMOTE.
	(PTA_SIERRAFOREST): Add PTA_CLDEMOTE since PTA_ALDERLAKE does not include that anymore.
	* doc/invoke.texi: Update texi file.
2025-06-24 | gcn: Fix glc vs. sc0 handling for scalar memory access | Tobias Burnus | 3 | -17/+22
gfx942 still uses glc for scalar access ('s_...') and only uses sc0/nt/sc1 for vector access. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLC_NAME): Fix and extend the description in the comment. * config/gcn/gcn.cc (print_operand): Extend the comment about 'G' and 'g'. * config/gcn/gcn.md: Use 'glc' instead of %G where appropriate.
2025-06-24 | RISC-V: Add patterns for vector-scalar multiply-(subtract-)accumulate [PR119100] | Paul-Antoine Arras | 2 | -8/+43
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a plus-mult or minus-mult RTL instruction.

Before this patch, we have two instructions, e.g.:

    vfmv.v.f v6,fa0
    vfmacc.vv v2,v6,v4

After, we get only one:

    vfmacc.vf v2,fa0,v4

	PR target/119100

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*<optab>_vf_<mode>): Handle both add and acc FMA variants.
	* config/riscv/vector.md (*pred_mul_<optab><mode>_scalar_undef): New.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfmacc and vfmsac.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop.h: Add support for acc variants.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_mulop_run.h: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f16.c: Define TEST_OUT.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmadd-run-1-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsub-run-1-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmadd-run-1-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfnmsub-run-1-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f16.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f32.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmacc-run-1-f64.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f16.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f32.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfmsac-run-1-f64.c: New test.
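As a reference for the non-negated accumulate forms, the scalar loops below sketch what vfmacc.vf and vfmsac.vf compute per element according to the RVV specification: vd[i] = +(f * vs2[i]) + vd[i] and vd[i] = +(f * vs2[i]) - vd[i]. The function names are hypothetical, and whether the .vf forms are actually selected for such loops depends on -march (with V) and the optimization level.

```c
#include <stddef.h>

/* Hypothetical scalar reference for vfmacc.vf:
   vd[i] = +(f * vs2[i]) + vd[i]  */
void macc_vf(float *vd, const float *vs2, float f, size_t n)
{
    for (size_t i = 0; i < n; i++)
        vd[i] = f * vs2[i] + vd[i];
}

/* Hypothetical scalar reference for vfmsac.vf:
   vd[i] = +(f * vs2[i]) - vd[i]  */
void msac_vf(float *vd, const float *vs2, float f, size_t n)
{
    for (size_t i = 0; i < n; i++)
        vd[i] = f * vs2[i] - vd[i];
}
```

In both loops the accumulator vd is also the destination, which is what distinguishes these "acc" variants from the vfmadd/vfmsub forms that overwrite a multiplicand.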
2025-06-24 | i386: Convert LEA stack adjust insn to SUB when FLAGS_REG is dead | Uros Bizjak | 1 | -3/+21
ADD/SUB is faster than LEA on most processors. Also, there are several peephole2 patterns available (at the end of i386.md) that convert prologue esp subtractions to pushes. These only process patterns with a flags reg clobber, so they are ineffective with the clobber-less stack pointer adjustments introduced by r16-1551 ("x86: Enable separate shrink wrapping").

Introduce a peephole2 pattern that adds a clobber to clobber-less stack pointer adjustments when FLAGS_REG is dead.

gcc/ChangeLog:

	* config/i386/i386.md (@pro_epilogue_adjust_stack_add_nocc<mode>): Add type attribute.
	(pro_epilogue_adjust_stack_add_nocc peephole2 pattern): Convert pro_epilogue_adjust_stack_add_nocc variant to pro_epilogue_adjust_stack_add when FLAGS_REG is dead.
2025-06-24 | s390: Fix float vector extract for pre-z13 | Juergen Christ | 1 | -2/+2
Also provide the vec_extract patterns for floats on pre-z13 machines to prevent ICEing in those cases. gcc/ChangeLog: * config/s390/vector.md (VF): Don't restrict modes. (VEC_SET_SINGLEFLOAT): Ditto. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/vec-extract-1.c: Fix test on arch11. * gcc.target/s390/vector/vec-set-1.c: Run test on arch11. * gcc.target/s390/vector/vec-extract-2.c: New test. Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2025-06-24 | AArch64: promote aarch64-autovec-preference to -mautovec-preference | Tamar Christina | 2 | -2/+17
As requested in my patch for -mmax-vectorization, this promotes the parameter --param aarch64-autovec-preference to a first-class target flag.

If both the parameter and the flag are specified, the parameter takes precedence, with the reasoning that it may already be embedded in build systems.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_override_options_internal): Set value of parameter based on option.
	* config/aarch64/aarch64.opt (autovec-preference): New.
	* doc/invoke.texi (autovec-preference): Document it.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/autovec_param_asimd-only_2.c: New test.
	* gcc.target/aarch64/autovec_param_default_2.c: New test.
	* gcc.target/aarch64/autovec_param_prefer-asimd_2.c: New test.
	* gcc.target/aarch64/autovec_param_prefer-sve_2.c: New test.
	* gcc.target/aarch64/autovec_param_sve-only_2.c: New test.
2025-06-24 | AArch64: propose -mmax-vectorization as an option to override vector costing | Tamar Christina | 2 | -0/+12
With the middle-end providing a way to make vectorization more profitable by scaling vect-scalar-cost-multiplier, this adds a more user-friendly option to make it easier to use. I propose making it an actual -m option that we document and retain, rather than using the parameter name.

In the future I would like to extend this option to modify additional costing in the AArch64 backend itself.

This can be used together with --param aarch64-autovec-preference to get the vectorizer to, say, always vectorize with SVE.

I did consider making this an additional enum to --param aarch64-autovec-preference, but I also think this is a useful thing to be able to set with pragmas and attributes. I am open to suggestions, though.

Note that as a follow-up I plan on extending -fdump-tree-vect to support -stats, which is then intended to be usable with this flag.

gcc/ChangeLog:

	* config/aarch64/aarch64.opt (max-vectorization): New.
	* config/aarch64/aarch64.cc (aarch64_override_options_internal): Save and restore option. Implement it through vect-scalar-cost-multiplier.
	(aarch64_attributes): Default to off.
	* common/config/aarch64/aarch64-common.cc (aarch64_handle_option): Initialize option.
	* doc/extend.texi (max-vectorization): Document attribute.
	* doc/invoke.texi (max-vectorization): Document flag.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/cost_model_17.c: New test.
	* gcc.target/aarch64/sve/cost_model_18.c: New test.
2025-06-24x86: Extend the remove_redundant_vector passH.J. Lu3-92/+341
Extend the remove_redundant_vector pass to handle vector broadcasts from constant and variable scalars. When broadcasting from constants and function arguments, we can place a single widest vector broadcast at the entry of the nearest common dominator of all basic blocks containing uses, since constants and function arguments don't change. For broadcasts from variables with a single definition, the single definition is replaced with the widest broadcast. gcc/ PR target/92080 * config/i386/i386-expand.cc (ix86_expand_call): Set recursive_function to true for recursive call. * config/i386/i386-features.cc (ix86_place_single_vector_set): Add an argument for inner scalar, default to nullptr. Set the source from inner scalar if not nullptr. (ix86_get_vector_load_mode): Renamed to ... (ix86_get_vector_cse_mode): This. Add an argument for scalar mode and handle integer and float scalar modes. (replace_vector_const): Add an argument for scalar mode and pass it to ix86_get_vector_load_mode. (x86_cse_kind): New. (redundant_load): Likewise. (ix86_broadcast_inner): Likewise. (remove_redundant_vector_load): Also support const0_rtx and constm1_rtx broadcasts. Handle vector broadcasts from constant and variable scalars. * config/i386/i386.h (machine_function): Add recursive_function. gcc/testsuite/ * gcc.target/i386/keylocker-aesdecwide128kl.c: Updated to expect movdqa instead of pxor. * gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise. * gcc.target/i386/keylocker-aesencwide128kl.c: Likewise. * gcc.target/i386/keylocker-aesencwide256kl.c: Likewise. * gcc.target/i386/pr92080-4.c: New test. * gcc.target/i386/pr92080-5.c: Likewise. * gcc.target/i386/pr92080-6.c: Likewise. * gcc.target/i386/pr92080-7.c: Likewise. * gcc.target/i386/pr92080-8.c: Likewise. * gcc.target/i386/pr92080-9.c: Likewise. * gcc.target/i386/pr92080-10.c: Likewise. * gcc.target/i386/pr92080-11.c: Likewise. * gcc.target/i386/pr92080-12.c: Likewise. * gcc.target/i386/pr92080-13.c: Likewise. 
* gcc.target/i386/pr92080-14.c: Likewise. * gcc.target/i386/pr92080-15.c: Likewise. * gcc.target/i386/pr92080-16.c: Likewise. * gcc.target/i386/pr92080-17.c: Likewise. * gcc.target/i386/pr92080-18.c: Likewise. * gcc.target/i386/pr92080-19.c: Likewise. * gcc.target/i386/pr92080-20.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
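As an illustration only (not one of the pr92080 tests), here is a hypothetical C input of the kind the extended pass targets: one function argument broadcast in two different basic blocks. It uses GCC's generic vector extension and assumes a 32-bit int; whether the compiler actually shares a single broadcast depends on the compiler version and target options, and the function names are made up.

```c
#include <assert.h>

/* GCC/Clang generic vector type: 16 bytes, i.e. four ints assuming
   32-bit int.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical input: the scalar argument x is broadcast in two different
   basic blocks.  Because x is a function argument and never changes, a pass
   like the one described could emit one broadcast at the entry of the
   nearest common dominator and reuse it in both arms.  */
v4si
scale_or_offset (v4si a, int x, int use_mul)
{
  if (use_mul)
    return a * (v4si) { x, x, x, x };
  return a + (v4si) { x, x, x, x };
}

/* Usage: the results are the same whether or not the broadcast is shared.  */
void
check_scale_or_offset (void)
{
  v4si a = { 1, 2, 3, 4 };
  v4si m = scale_or_offset (a, 3, 1);
  v4si s = scale_or_offset (a, 3, 0);
  assert (m[0] == 3 && m[3] == 12);
  assert (s[0] == 4 && s[3] == 7);
}
```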
2025-06-24x86: Update memcpy/memset inline strategies for -mtune=genericH.J. Lu1-11/+28
Update memcpy and memset inline strategies for -mtune=generic: 1. Don't align memory. 2. For known sizes, prefer vector loop, unroll loop with 4 moves or stores per iteration without aligning the loop, up to 256 bytes. 3. For unknown sizes, use memcpy/memset. 4. Since each loop iteration has 4 stores, and 8 stores may be needed for zeroing with an unrolled loop, change CLEAR_RATIO to 10 so that zeroing of up to 72 bytes is fully unrolled with 9 stores without SSE. gcc/ PR target/70308 PR target/101366 PR target/102294 PR target/108585 PR target/118276 PR target/119596 PR target/119703 PR target/119704 * config/i386/x86-tune-costs.h (generic_memcpy): Updated. (generic_memset): Likewise. (generic_cost): Change CLEAR_RATIO to 10. gcc/testsuite/ PR target/70308 PR target/101366 PR target/102294 PR target/108585 PR target/118276 PR target/119596 PR target/119703 PR target/119704 * g++.target/i386/memset-pr101366-1.C: New test. * g++.target/i386/memset-pr101366-2.C: Likewise. * g++.target/i386/memset-pr108585-1a.C: Likewise. * g++.target/i386/memset-pr108585-1b.C: Likewise. * g++.target/i386/memset-pr118276-1a.C: Likewise. * g++.target/i386/memset-pr118276-1b.C: Likewise. * g++.target/i386/memset-pr118276-1c.C: Likewise. * gcc.target/i386/memcpy-strategy-12.c: Likewise. * gcc.target/i386/memcpy-strategy-13.c: Likewise. * gcc.target/i386/memset-pr70308-1a.c: Likewise. * gcc.target/i386/memset-pr70308-1b.c: Likewise. * gcc.target/i386/memset-strategy-25.c: Likewise. * gcc.target/i386/memset-strategy-26.c: Likewise. * gcc.target/i386/memset-strategy-27.c: Likewise. * gcc.target/i386/memset-strategy-28.c: Likewise. * gcc.target/i386/memset-strategy-29.c: Likewise. * gcc.target/i386/memset-strategy-30.c: Likewise. * gcc.target/i386/memset-strategy-31.c: Likewise. * gcc.target/i386/auto-init-padding-3.c: Expect XMM stores. * gcc.target/i386/auto-init-padding-9.c: Likewise. * gcc.target/i386/mvc17.c: Fail with "rep mov". * gcc.target/i386/pr111657-1.c: Scan for unrolled loop. 
Fail with "rep mov". * gcc.target/i386/shrink_wrap_1.c: Also pass -mmemset-strategy=rep_8byte:-1:align. * gcc.target/i386/sw-1.c: Also pass -mstringop-strategy=rep_byte. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
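The CLEAR_RATIO arithmetic in item 4 can be checked with a simplified model: 72 bytes / 8 bytes per 64-bit store = 9 stores, and clearing by pieces happens only when the store count is below CLEAR_RATIO, so 9 < 10 qualifies while 80 bytes (10 stores) does not. The model below is an assumption, not GCC's by-pieces code: it covers any tail with 4-, 2- and 1-byte stores and ignores overlapping stores; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: zero n bytes with 64-bit GPR stores (no SSE), then
   cover any tail with 4-, 2- and 1-byte stores, without overlap.  */
unsigned
gpr_stores_for_clear (size_t n)
{
  unsigned stores = (unsigned) (n / 8);
  size_t tail = n % 8;
  /* Each set bit of the tail (4, 2, 1) needs one more store.  */
  while (tail)
    {
      stores += (unsigned) (tail & 1);
      tail >>= 1;
    }
  return stores;
}

/* Clearing is fully unrolled when the store count is below CLEAR_RATIO.  */
bool
fully_unrolled (size_t n, unsigned clear_ratio)
{
  return gpr_stores_for_clear (n) < clear_ratio;
}

void
check_clear_model (void)
{
  assert (gpr_stores_for_clear (72) == 9); /* 72 / 8 = 9 stores */
  assert (fully_unrolled (72, 10));        /* 9 < 10 */
  assert (!fully_unrolled (80, 10));       /* 10 stores, not below 10 */
}
```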
2025-06-24Fix shrink wrap separate ICE for mingw [PR120741]Lili Cui1-2/+0
gcc/ChangeLog: PR target/120741 * config/i386/i386.cc (ix86_expand_prologue): Remove 1 assertion. gcc/testsuite/ChangeLog: PR target/120741 * gcc.target/i386/pr120741.c: New test. * gcc.target/i386/shrink-wrap-separate-mingw.c: Likewise.
2025-06-23[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-VJeff Law1-1/+1
Fix typo in comment spotted by Peter B. PR target/118241 gcc/ * config/riscv/predicates.md: Fix comment typo in recent change.
2025-06-23RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR costPan Li3-2/+5
This patch would like to combine the vec_duplicate + vsaddu.vv into vsaddu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine performs the combination if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero.

Assume we have example code like below, with a GR2VR cost of 0.

  #include <stdint.h>

  #define DEF_VX_BINARY(T, FUNC)                                      \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = FUNC (in[i], x);                                       \
  }

  uint32_t sat_add(uint32_t a, uint32_t b)
  {
    return (a + b) | (-(uint32_t)((uint32_t)(a + b) < a));
  }

  DEF_VX_BINARY(uint32_t, sat_add)

Before this patch:

  test_vx_binary_or_int32_t_case_0:
          beq a3,zero,.L8
          vsetvli a5,zero,e32,m1,ta,ma
          vmv.v.x v2,a2
          slli a3,a3,32
          srli a3,a3,32
  .L3:
          vsetvli a5,a3,e32,m1,ta,ma
          vle32.v v1,0(a1)
          slli a4,a5,2
          sub a3,a3,a5
          add a1,a1,a4
          vsaddu.vv v1,v1,v2
          vse32.v v1,0(a0)
          add a0,a0,a4
          bne a3,zero,.L3

After this patch:

  test_vx_binary_or_int32_t_case_0:
          beq a3,zero,.L8
          slli a3,a3,32
          srli a3,a3,32
  .L3:
          vsetvli a5,a3,e32,m1,ta,ma
          vle32.v v1,0(a1)
          slli a4,a5,2
          sub a3,a3,a5
          add a1,a1,a4
          vsaddu.vx v1,v1,a2
          vse32.v v1,0(a0)
          add a0,a0,a4
          bne a3,zero,.L3

gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new case US_PLUS. (expand_vx_binary_vec_vec_dup): Ditto. * config/riscv/riscv.cc (riscv_rtx_costs): Ditto. * config/riscv/vector-iterators.md: Add new op us_plus. Signed-off-by: Pan Li <pan2.li@intel.com>
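The branchless sat_add in the example relies on unsigned wrap-around: if a + b overflows, the comparison yields 1, negating it gives all ones, and the OR saturates the result to UINT32_MAX. A small self-check, not part of the patch, compares it against a straightforward saturating add on edge values; sat_add_ref and check_sat_add are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Branchless form from the commit message (with T = uint32_t).  */
uint32_t
sat_add (uint32_t a, uint32_t b)
{
  return (a + b) | (-(uint32_t) ((uint32_t) (a + b) < a));
}

/* Straightforward reference: saturate when a + b would exceed UINT32_MAX.  */
uint32_t
sat_add_ref (uint32_t a, uint32_t b)
{
  return a > UINT32_MAX - b ? UINT32_MAX : a + b;
}

/* Compare both forms on every pair of edge values.  */
void
check_sat_add (void)
{
  static const uint32_t vals[] = { 0, 1, 2, 0x7fffffffu, 0x80000000u,
                                   UINT32_MAX - 1, UINT32_MAX };
  const unsigned n = sizeof vals / sizeof vals[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (sat_add (vals[i], vals[j]) == sat_add_ref (vals[i], vals[j]));
}
```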
2025-06-23x86: Don't use vmovdqu16/vmovdqu8 with non-EVEX registersH.J. Lu1-3/+3
Don't use vmovdqu16/vmovdqu8 with non-EVEX register operands just because AVX512BW is available. gcc/ PR target/120728 * config/i386/i386.cc (ix86_get_ssemov): Use vmovdqu16/vmovdqu8 only with EVEX register operands. gcc/testsuite/ PR target/120728 * gcc.target/i386/avx512bw-vmovdqu16-1.c: Scan vmovdqu for non-EVEX register operands. * gcc.target/i386/avx512bw-vmovdqu8-1.c: Likewise. * gcc.target/i386/avx512fp16-13.c: Likewise. * gcc.target/i386/pr100865-10b.c: Likewise. * gcc.target/i386/pr100865-3.c: Likewise. * gcc.target/i386/pr100865-4b.c: Likewise. * gcc.target/i386/pr100865-5b.c: Likewise. * gcc.target/i386/pr90773-15.c: Likewise. * gcc.target/i386/pr90773-16.c: Likewise. * gcc.target/i386/pr90773-17.c: Likewise. * gcc.target/i386/pr95483-5.c: Likewise. * gcc.target/i386/pr120728.c: New test. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-23x86: Add PROCESSOR_XXX comments to processor_cost_tableH.J. Lu1-57/+57
Add a PROCESSOR_XXX comment to each entry in processor_cost_table to indicate which processor the cost entry applies to. * config/i386/i386-options.cc (processor_cost_table): Add a PROCESSOR_XXX comment to each entry. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-22[RISC-V][PR target/119830] Fix RISC-V codegen on 32bit hostsAndrew Pinski1-3/+3
So this is Andrew's patch from the PR. Some of the arithmetic for constant synthesis wasn't clean for 32-bit hosts. I confirmed the bug on a 32-bit Linux host, then confirmed that Andrew's patch from the PR fixes the problem, then ran Andrew's patch through my tester successfully. Naturally I'll wait for pre-commit testing, but I'm not expecting problems. PR target/119830 gcc/ * config/riscv/riscv.cc (riscv_build_integer_1): Make arithmetic in bclr case clean for 32-bit hosts. gcc/testsuite/ * gcc.target/riscv/pr119830.c: New test.
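The patch itself isn't reproduced here, so the following only illustrates the general class of host-dependence bug, under the assumption that a bit mask was built with a host type narrower than 64 bits. On a 32-bit Linux host `long` is 32 bits, so an expression like `1UL << 40` is undefined behavior, while a mask built from an explicitly 64-bit type gives the same answer on every host. `clear_bit64` is a hypothetical helper, not GCC code (GCC's own sources provide HOST_WIDE_INT_1U for such constants).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear bit `bit` (0..63) of a 64-bit constant, as a
   bclr-style synthesis step would.  Using (uint64_t) 1 rather than 1UL keeps
   the shift well defined even where long is only 32 bits.  */
uint64_t
clear_bit64 (uint64_t value, unsigned bit)
{
  return value & ~((uint64_t) 1 << bit);
}

void
check_clear_bit64 (void)
{
  /* Bit 40 lies above any 32-bit host type.  */
  assert (clear_bit64 (UINT64_MAX, 40) == 0xFFFFFEFFFFFFFFFFull);
  assert (clear_bit64 (0x8000000000000001ull, 63) == 1);
}
```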
2025-06-22xtensa: Make use of DEPBITS instructionTakayuki 'January June' Suwa2-1/+21
This patch implements the bitfield insertion MD pattern using the DEPBITS machine instruction, the counterpart of the EXTUI instruction, if available.

  /* example */
  struct foo {
    unsigned int b:10;
    unsigned int r:11;
    unsigned int g:11;
  };
  void test(struct foo *p) {
    p->g >>= 1;
  }

  ;; result (endianness: little)
  test:
          entry sp, 32
          l32i.n a8, a2, 0
          extui a9, a8, 1, 10
          depbits a8, a9, 0, 11
          s32i.n a8, a2, 0
          retw.n

gcc/ChangeLog: * config/xtensa/xtensa.h (TARGET_DEPBITS): New macro. * config/xtensa/xtensa.md (insvsi): New insn pattern.
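To make the extui/depbits pair in the listing concrete, here is a C model of the two operations: extract `width` bits starting at bit `shift` (what EXTUI does), and insert the low `width` bits of a source into a destination at bit `pos` (the bitfield insertion DEPBITS provides). The helper names are hypothetical, and the model deliberately does not encode the instructions' operand order; shift_low_field mirrors the listing's sequence, which halves an 11-bit field at bits 0..10 and leaves the other 21 bits untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the low `width` bits (1 <= width <= 32).  */
static uint32_t
field_mask (unsigned width)
{
  return width >= 32 ? UINT32_MAX : ((uint32_t) 1 << width) - 1;
}

/* EXTUI-like: extract `width` bits of `word` starting at bit `shift`.  */
uint32_t
extract_bits (uint32_t word, unsigned shift, unsigned width)
{
  return (word >> shift) & field_mask (width);
}

/* DEPBITS-like: replace bits pos..pos+width-1 of `dst` with the low
   `width` bits of `src`.  Requires pos + width <= 32.  */
uint32_t
deposit_bits (uint32_t dst, uint32_t src, unsigned pos, unsigned width)
{
  uint32_t mask = field_mask (width) << pos;
  return (dst & ~mask) | ((src << pos) & mask);
}

/* extui t, w, 1, 10 then depbits into 11 bits at bit 0: the field is
   shifted right by one and its top bit becomes zero.  */
uint32_t
shift_low_field (uint32_t word)
{
  uint32_t t = extract_bits (word, 1, 10);
  return deposit_bits (word, t, 0, 11);
}

void
check_bitfield_models (void)
{
  assert (extract_bits (0xABCD1234u, 8, 8) == 0x12u);
  assert (deposit_bits (0u, 0xFu, 4, 4) == 0xF0u);
  /* An all-ones 11-bit field (0x7FF) becomes 0x3FF; bits 11..31 stay set.  */
  assert (shift_low_field (0xFFFFFFFFu) == 0xFFFFFBFFu);
}
```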
2025-06-22xtensa: Implement TARGET_ZERO_CALL_USED_REGSTakayuki 'January June' Suwa1-0/+56
This patch implements the target-specific ZERO_CALL_USED_REGS hook, because with -fzero-call-used-regs=all the default hook tries to assign 0 to B0 (bit 0 of the BR register), which causes an ICE. gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_zero_call_used_regs): New prototype and function. (TARGET_ZERO_CALL_USED_REGS): Define macro.
2025-06-21[RISC-V][PR target/118241] Fix data prefetch predicate/constraint for RISC-VJeff Law3-2/+18
The RISC-V prefetch support is broken in a few ways. This addresses the data side prefetch problems. I'd mistakenly thought this BZ was prefetch.i related (which has deeper problems). The basic problem is we were accepting any valid address when in fact there are restrictions. This patch defines the predicate more precisely, such that we allow:

  REG
  REG+D

where D must have the low 5 bits clear. Note that absolute addresses fall into the REG+D form, using x0 as the register operand since it always has the value zero. The test verifies the valid REG, REG+D and ABS addressing modes, as well as REG+D and ABS forms that must be reloaded into a REG because the displacement has low bits set. An earlier version of this patch has gone through testing in my tester on rv32 and rv64. Obviously I'll wait for pre-commit CI to do its thing before moving forward. This is a good backport candidate after simmering on the trunk for a bit. PR target/118241 gcc/ * config/riscv/predicates.md (prefetch_operand): New predicate. * config/riscv/constraints.md (Q): New constraint. * config/riscv/riscv.md (prefetch): Use new predicate and constraint. (riscv_prefetchi_<mode>): Similarly. gcc/testsuite/ * gcc.target/riscv/pr118241.c: New test.
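A sketch of the displacement rule described above, as a standalone C check rather than the actual predicates.md code. The "low 5 bits clear" test comes from the message; the range check is an assumption based on the Zicbop encoding, where the offset is a signed 12-bit immediate of which only bits 11..5 are stored, giving multiples of 32 in [-2048, 2016]. A plain REG is D = 0, and an absolute address is x0 + D. `prefetch_offset_ok` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/* REG+D is directly usable by a data prefetch when D has its low 5 bits
   clear and (assumed, from the Zicbop encoding) fits in a signed 12-bit
   immediate; anything else must be reloaded into a register first.  */
bool
prefetch_offset_ok (long d)
{
  return (d & 31) == 0 && d >= -2048 && d <= 2016;
}

void
check_prefetch_offsets (void)
{
  assert (prefetch_offset_ok (0));      /* plain REG */
  assert (prefetch_offset_ok (64));
  assert (!prefetch_offset_ok (4));     /* low bits set: must be reloaded */
  assert (prefetch_offset_ok (-2048));
  assert (!prefetch_offset_ok (2048));  /* outside the 12-bit range */
}
```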
2025-06-21RISC-V: Fix ICE for expand_select_vldi [PR120652]Pan Li1-1/+1
There is an ICE during the expand pass, with a backtrace similar to the one below.

during RTL pass: expand
red.c: In function 'main':
red.c:20:5: internal compiler error: in require, at machmode.h:323
   20 | int main() {
      |     ^~~~
0x2e0b1d6 internal_error(char const*, ...)
        ../../../gcc/gcc/diagnostic-global-context.cc:517
0xd0d3ed fancy_abort(char const*, int, char const*)
        ../../../gcc/gcc/diagnostic.cc:1803
0xc3da74 opt_mode<machine_mode>::require() const
        ../../../gcc/gcc/machmode.h:323
0xc3de2f opt_mode<machine_mode>::require() const
        ../../../gcc/gcc/poly-int.h:1383
0xc3de2f riscv_vector::expand_select_vl(rtx_def**)
        ../../../gcc/gcc/config/riscv/riscv-v.cc:4218
0x21c7d22 gen_select_vldi(rtx_def*, rtx_def*, rtx_def*)
        ../../../gcc/gcc/config/riscv/autovec.md:1344
0x134db6c maybe_expand_insn(insn_code, unsigned int, expand_operand*)
        ../../../gcc/gcc/optabs.cc:8257
0x134db6c expand_insn(insn_code, unsigned int, expand_operand*)
        ../../../gcc/gcc/optabs.cc:8288
0x11b21d3 expand_fn_using_insn
        ../../../gcc/gcc/internal-fn.cc:318
0xef32cf expand_call_stmt
        ../../../gcc/gcc/cfgexpand.cc:3097
0xef32cf expand_gimple_stmt_1
        ../../../gcc/gcc/cfgexpand.cc:4264
0xef32cf expand_gimple_stmt
        ../../../gcc/gcc/cfgexpand.cc:4411
0xef95b6 expand_gimple_basic_block
        ../../../gcc/gcc/cfgexpand.cc:6472
0xefb66f execute
        ../../../gcc/gcc/cfgexpand.cc:7223

The select_vl op_1 and op_2 may be the same const_int, like (const_int 32). maybe_legitimize_operands will then:

1. First move the const op_1 to a reg.
2. Reuse the reg of op_1 for op_2, as op_1 and op_2 are equal.

That breaks the assumption that op_2 of select_vl is an immediate, or something like CONST_POLY_INT.

The following test suites passed for this patch series:
* The rv64gcv full regression test.

PR target/120652 gcc/ChangeLog: * config/riscv/autovec.md: Add immediate_operand for select_vl operand 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr120652-1.c: New test. * gcc.target/riscv/rvv/autovec/pr120652-2.c: New test. 
* gcc.target/riscv/rvv/autovec/pr120652-3.c: New test. * gcc.target/riscv/rvv/autovec/pr120652.h: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-06-20amdgcn: allow SImode in VCC_HI [PR120722]Andrew Stubbs1-2/+1
This patch isn't fully tested yet, but it fixes the build failure, so that will do for now. SImode was not allowed in VCC_HI because there were issues, way back before the port went upstream, so it's possible we'll find out what those issues were again soon. gcc/ChangeLog: PR target/120722 * config/gcn/gcn.cc (gcn_hard_regno_mode_ok): Allow SImode in VCC_HI.
2025-06-20x86: Get the widest vector mode from MOVE_MAXH.J. Lu1-22/+9
Since MOVE_MAX defines the maximum number of bytes that an instruction can move quickly between memory and registers, use it to get the widest vector mode in the vector loop when inlining memcpy and memset. gcc/ PR target/120708 * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use MOVE_MAX to get the widest vector mode in vector loop. gcc/testsuite/ PR target/120708 * gcc.target/i386/memcpy-pr120708-1.c: New test. * gcc.target/i386/memcpy-pr120708-2.c: Likewise. * gcc.target/i386/memcpy-pr120708-3.c: Likewise. * gcc.target/i386/memcpy-pr120708-4.c: Likewise. * gcc.target/i386/memcpy-pr120708-5.c: Likewise. * gcc.target/i386/memcpy-pr120708-6.c: Likewise. * gcc.target/i386/memset-pr120708-1.c: Likewise. * gcc.target/i386/memset-pr120708-2.c: Likewise. * gcc.target/i386/memcpy-strategy-1.c: Drop dg-skip-if. Replace -march=atom with -mno-avx -msse2 -mtune=generic -mtune-ctrl=^sse_typeless_stores. * gcc.target/i386/memcpy-strategy-2.c: Likewise. * gcc.target/i386/memcpy-vector_loop-1.c: Likewise. * gcc.target/i386/memcpy-vector_loop-2.c: Likewise. * gcc.target/i386/memset-vector_loop-1.c: Likewise. * gcc.target/i386/memset-vector_loop-2.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-20or1k: Improve If-Conversion by delaying cbranch splitsStafford Horne2-2/+87
When working on PR120587 I found that the ce1 pass was not able to properly optimize branches on OpenRISC. This is because of the early splitting of "compare" and "branch" instructions during the expand pass. Convert the cbranch* instructions from define_expand to define_insn_and_split. This delays the instruction split until after the ce1 pass is done, giving ce1 the best opportunity to perform the optimizations on the original form of cbranch<mode>4 instructions. gcc/ChangeLog: * config/or1k/or1k.cc (or1k_noce_conversion_profitable_p): New function. (or1k_is_cmov_insn): New function. (TARGET_NOCE_CONVERSION_PROFITABLE_P): Define macro. * config/or1k/or1k.md (cbranchsi4): Convert to insn_and_split. (cbranch<mode>4): Convert to insn_and_split. Signed-off-by: Stafford Horne <shorne@gmail.com>
2025-06-20or1k: Implement *extendbisi* to fix ICE in convert_mode_scalar [PR120587]Stafford Horne2-0/+29
After commit 2dcc6dbd8a0 ("emit-rtl: Use simplify_subreg_regno to validate hardware subregs [PR119966]") the OpenRISC port is broken again. Add extend* instruction patterns for the SR_F pseudo registers to avoid having to use the subreg conversions, which no longer work. gcc/ChangeLog: PR target/120587 * config/or1k/or1k.md (zero_extendbisi2_sr_f): New expand. (extendbisi2_sr_f): New expand. * config/or1k/predicates.md (sr_f_reg_operand): New predicate. Signed-off-by: Stafford Horne <shorne@gmail.com>
2025-06-20RISC-V: Combine vec_duplicate + vminu.vv to vminu.vx on GR2VR costPan Li3-2/+5
This patch would like to combine the vec_duplicate + vminu.vv into vminu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR: late-combine performs the combination if the GR2VR cost is zero, and rejects it if the GR2VR cost is greater than zero.

Assume we have example code like below, with a GR2VR cost of 0.

  #include <stdint.h>

  #define DEF_VX_BINARY(T, FUNC)                                      \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = FUNC (in[i], x);                                       \
  }

  uint32_t min(uint32_t a, uint32_t b)
  {
    return a > b ? b : a;
  }

  DEF_VX_BINARY(uint32_t, min)

Before this patch:

  test_vx_binary_or_int32_t_case_0:
          beq a3,zero,.L8
          vsetvli a5,zero,e32,m1,ta,ma
          vmv.v.x v2,a2
          slli a3,a3,32
          srli a3,a3,32
  .L3:
          vsetvli a5,a3,e32,m1,ta,ma
          vle32.v v1,0(a1)
          slli a4,a5,2
          sub a3,a3,a5
          add a1,a1,a4
          vminu.vv v1,v1,v2
          vse32.v v1,0(a0)
          add a0,a0,a4
          bne a3,zero,.L3

After this patch:

  test_vx_binary_or_int32_t_case_0:
          beq a3,zero,.L8
          slli a3,a3,32
          srli a3,a3,32
  .L3:
          vsetvli a5,a3,e32,m1,ta,ma
          vle32.v v1,0(a1)
          slli a4,a5,2
          sub a3,a3,a5
          add a1,a1,a4
          vminu.vx v1,v1,a2
          vse32.v v1,0(a0)
          add a0,a0,a4
          bne a3,zero,.L3

gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new case UMIN. (expand_vx_binary_vec_vec_dup): Ditto. * config/riscv/riscv.cc (riscv_rtx_costs): Ditto. * config/riscv/vector-iterators.md: Add new op umin. Signed-off-by: Pan Li <pan2.li@intel.com>
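Why the combine is safe, sketched with scalar models (hypothetical helpers, not GCC or RVV intrinsic code): the .vv form consumes a second vector that vmv.v.x filled with copies of x, while the .vx form reads x directly. Both produce the same elements, so whether to combine is a question of cost (the GR2VR cost above) rather than correctness.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t
umin32 (uint32_t a, uint32_t b)
{
  return a > b ? b : a;
}

/* Model of vminu.vv: element-wise min against a second vector.  */
void
vminu_vv_model (uint32_t *out, const uint32_t *in, const uint32_t *v2,
                unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = umin32 (in[i], v2[i]);
}

/* Model of vminu.vx: element-wise min against one scalar register.  */
void
vminu_vx_model (uint32_t *out, const uint32_t *in, uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = umin32 (in[i], x);
}

void
check_vminu_models (void)
{
  enum { N = 5 };
  const uint32_t in[N] = { 0, 7, 8, 9, UINT32_MAX };
  const uint32_t x = 8;
  uint32_t dup[N], a[N], b[N];
  for (unsigned i = 0; i < N; i++)
    dup[i] = x; /* what vmv.v.x v2,a2 materializes */
  vminu_vv_model (a, in, dup, N);
  vminu_vx_model (b, in, x, N);
  for (unsigned i = 0; i < N; i++)
    assert (a[i] == b[i]);
  assert (b[1] == 7 && b[4] == 8);
}
```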
2025-06-19x86: Enable *mov<mode>_(and|or) only for -OzH.J. Lu1-2/+13
commit ef26c151c14a87177d46fd3d725e7f82e040e89f
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Thu Dec 23 12:33:07 2021 +0000

    x86: PR target/103773: Fix wrong-code with -Oz from pop to memory.

added "*mov<mode>_and" and extended "*mov<mode>_or" to transform "mov $0,mem" to the shorter "and $0,mem" and "mov $-1,mem" to the shorter "or $-1,mem" for -Oz. But the new pattern:

(define_insn "*mov<mode>_and"
  [(set (match_operand:SWI248 0 "memory_operand" "=m")
        (match_operand:SWI248 1 "const0_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "and{<imodesuffix>}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "<MODE>")
   (set_attr "length_immediate" "1")])

and the extended pattern:

(define_insn "*mov<mode>_or"
  [(set (match_operand:SWI248 0 "nonimmediate_operand" "=rm")
        (match_operand:SWI248 1 "constm1_operand"))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed"
  "or{<imodesuffix>}\t{%1, %0|%0, %1}"
  [(set_attr "type" "alu1")
   (set_attr "mode" "<MODE>")
   (set_attr "length_immediate" "1")])

aren't guarded for -Oz. As a result, "and $0,mem" and "or $-1,mem" are generated without -Oz.

1. Change "*mov<mode>_and" to define_insn_and_split and split it to "mov $0,mem" if not -Oz.
2. Change "*mov<mode>_or" to define_insn_and_split and split it to "mov $-1,mem" if not -Oz.
3. Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it should be transformed to "or $-1,reg".

gcc/ PR target/120427 * config/i386/i386.md (*mov<mode>_and): Changed to define_insn_and_split. Split it to "mov $0,mem" if not -Oz. (*mov<mode>_or): Changed to define_insn_and_split. Split it to "mov $-1,mem" if not -Oz. (peephole2): Don't transform "mov $-1,reg" to "push $-1; pop reg" for -Oz since it will be transformed to "or $-1,reg". gcc/testsuite/ PR target/120427 * gcc.target/i386/cold-attribute-4.c: Compile with -Oz. * gcc.target/i386/pr120427-1.c: New test. * gcc.target/i386/pr120427-2.c: Likewise. * gcc.target/i386/pr120427-3.c: Likewise. 
* gcc.target/i386/pr120427-4.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-06-19RISC-V: Add generic tune as default.Dongyan Chen3-1/+25
According to the discussion in https://gcc.gnu.org/pipermail/gcc-patches/2025-June/686893.html, creating a -mtune=generic may be a good way to solve the question regarding the branch cost. Changes for v2: - Delete the code about -mcpu=generic. gcc/ChangeLog: * config/riscv/riscv-cores.def (RISCV_TUNE): Add "generic" tune. * config/riscv/riscv.cc: Add generic_tune_info. * config/riscv/riscv.h (RISCV_TUNE_STRING_DEFAULT): Change default tune. gcc/testsuite/ChangeLog: * gcc.target/riscv/zicond-primitiveSemantics_compare_reg_reg_return_reg_reg.c: New test.
2025-06-19RISC-V: Use riscv_2x_xlen_mode_p [NFC]Kito Cheng1-8/+4
Use riscv_2x_xlen_mode_p to check whether the mode size is 2x XLEN, instead of using "(GET_MODE_UNIT_SIZE (mode) == (UNITS_PER_WORD * 2))". gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimize_move): Use riscv_2x_xlen_mode_p. (riscv_binary_cost): Ditto. (riscv_hard_regno_mode_ok): Ditto.