path: root/gcc/config
Age | Commit message | Author | Files | Lines
2025-05-23aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructionsDhruv Chawla2-90/+75
This patch modifies the shift expander to immediately lower constant shifts without unspec. It also modifies the ADR, SRA and ADDHNB patterns to match the lowered forms of the shifts, as the predicate register is not required for these instructions. Bootstrapped and regtested on aarch64-linux-gnu. Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com> Co-authored-by: Richard Sandiford <richard.sandiford@arm.com> gcc/ChangeLog: * config/aarch64/aarch64-sve.md (@aarch64_adr<mode>_shift): Match lowered form of ashift. (*aarch64_adr<mode>_shift): Likewise. (*aarch64_adr_shift_sxtw): Likewise. (*aarch64_adr_shift_uxtw): Likewise. (<ASHIFT:optab><mode>3): Check amount instead of operands[2] in aarch64_sve_<lr>shift_operand. (v<optab><mode>3): Generate unpredicated shifts for constant operands. (@aarch64_pred_<optab><mode>): Convert to a define_expand. (*aarch64_pred_<optab><mode>): Create define_insn_and_split pattern from @aarch64_pred_<optab><mode>. (*post_ra_v_ashl<mode>3): Rename to ... (aarch64_vashl<mode>3_const): ... this and remove reload requirement. (*post_ra_v_<optab><mode>3): Rename to ... (aarch64_v<optab><mode>3_const): ... this and remove reload requirement. * config/aarch64/aarch64-sve2.md (@aarch64_sve_add_<sve_int_op><mode>): Match lowered form of SHIFTRT. (*aarch64_sve2_sra<mode>): Likewise. (*bitmask_shift_plus<mode>): Match lowered form of lshiftrt.
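A minimal illustration (not from the commit) of the kind of shift this affects; with a constant amount no governing predicate is needed, so the unpredicated immediate form can be matched directly:

    /* Hypothetical example: an SVE constant shift such as
       lsl z0.s, z0.s, #3 needs no predicate register.  */
    void
    shift_all (int *a, int n)
    {
      for (int i = 0; i < n; i++)
        a[i] <<= 3;
    }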
2025-05-22[aarch64] [vxworks] mark x18 as fixed, adjust testsAlexandre Oliva2-7/+21
VxWorks uses x18 as the TCB, so STATIC_CHAIN_REGNUM has long been set (in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead. This patch marks x18 as fixed if the newly-introduced TARGET_OS_USES_R18 is defined, so that it is not chosen by the register allocator, rejects -fsanitize=shadow-call-stack due to the register conflict, and adjusts tests that depend on x18 or on the static chain register. for gcc/ChangeLog * config/aarch64/aarch64-vxworks.h (TARGET_OS_USES_R18): Define. Update comments. * config/aarch64/aarch64.cc (aarch64_conditional_register_usage): Mark x18 as fixed on VxWorks. (aarch64_override_options_internal): Issue sorry message on -fsanitize=shadow-call-stack if TARGET_OS_USES_R18. for gcc/testsuite/ChangeLog * gcc.dg/cwsc1.c (CHAIN, aarch64): x9 instead of x18 for __vxworks. * gcc.target/aarch64/reg-alloc-4.c: Drop x18-assigned asm operand on vxworks. * gcc.target/aarch64/shadow_call_stack_1.c: Don't expect -ffixed-x18 error on vxworks, but rather the sorry message. * gcc.target/aarch64/shadow_call_stack_2.c: Skip on vxworks. * gcc.target/aarch64/shadow_call_stack_3.c: Likewise. * gcc.target/aarch64/shadow_call_stack_4.c: Likewise. * gcc.target/aarch64/shadow_call_stack_5.c: Likewise. * gcc.target/aarch64/shadow_call_stack_6.c: Likewise. * gcc.target/aarch64/shadow_call_stack_7.c: Likewise. * gcc.target/aarch64/shadow_call_stack_8.c: Likewise. * gcc.target/aarch64/stack-check-prologue-19.c: Likewise. * gcc.target/aarch64/stack-check-prologue-20.c: Likewise.
2025-05-22[RISC-V] Clear both upper and lower bits using 3 shiftsShreya Munnangi1-0/+28
So the next step in Shreya's work. In the prior patch we used two shifts to clear bits at the high or low end of an object. In this patch we use 3 shifts to clear bits on both ends. Nothing really special here. With mvconst_internal still in the tree it's of marginal value, though Shreya and I have confirmed the code coming out of expand looks good. It's just that combine reconstitutes the operation via mvconst_internal+and which looks cheaper. When I was playing in this space earlier I definitely saw testsuite cases that need this case handled to not regress with mvconst_internal removed. This has spun in my tester on rv32 and rv64 and it's bootstrapping + testing on my BPI with a mere 23 hours to go. Waiting on pre-commit testing to render a verdict before moving forward. gcc/ * config/riscv/riscv.cc (synthesize_and): When profitable, use a three shift sequence to clear bits at both the upper and lower ends rather than synthesizing the constant mask.
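As a sketch of the idea (mask chosen by me, not from the patch), an AND keeping a run of contiguous bits can be done in three shifts:

    /* x & 0x0000fffffffffff0: shift away the 16 high bits, then the
       4 low bits, then move the field back into position.  */
    unsigned long
    keep_middle (unsigned long x)
    {
      return ((x << 16) >> 20) << 4;
    }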
2025-05-22[PATCH][RISC-V][PR target/70557] Improve storing 0 to memory on rv32Siarhei Volkau1-2/+2
Patch is originally from Siarhei Volkau <lis8215@gmail.com>. RISC-V has a zero register (x0) which we can use to store zero into memory without loading the constant into a distinct register. Adjust the constraints of the 32-bit movdi_32bit pattern to recognize that we can store 0.0 into memory using x0 as the source register. This patch only affects RISC-V. It has been regression tested on riscv64-elf. Jeff has also tested this in his tester (riscv64-elf and riscv32-elf) with no regressions. PR target/70557 gcc/ * config/riscv/riscv.md (movdi_32bit): Add "J" constraint to allow storing 0 directly to memory.
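A minimal example of the affected case (the assembly in the comment is my expectation, not taken from the patch):

    /* On rv32 a double store is two 32-bit stores; with the "J"
       constraint both can use x0 directly, roughly:
         sw zero,0(a0)
         sw zero,4(a0)
       instead of first materializing 0 in a register pair.  */
    void
    clear_double (double *p)
    {
      *p = 0.0;
    }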
2025-05-22aarch64: Improve rtx_cost for constants in COMPARE [PR120372]Andrew Pinski1-0/+7
The middle-end uses rtx_cost on constants, with the outer code being COMPARE, to find out the cost of forming a constant for a comparison instruction. The aarch64 backend previously just returned the cost of constant formation in general. We can improve this by seeing if the outer code is COMPARE and, if the constant fits the constraints of the cmp instruction, setting the cost to one instruction. Built and tested for aarch64-linux-gnu. PR target/120372 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_rtx_costs <case CONST_INT>): Handle the case where the outer code is COMPARE and the constant can be handled by the cmp instruction. gcc/testsuite/ChangeLog: * gcc.target/aarch64/imm_choice_comparison-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
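For illustration (constant chosen by me), a comparison whose constant fits cmp's 12-bit, optionally shifted, immediate should now cost a single instruction in a COMPARE context:

    /* 4096 fits "cmp x0, #4096", so no constant synthesis is needed.  */
    int
    eq_imm (long x)
    {
      return x == 4096;
    }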
2025-05-22i386: Extend *cmp<mode>_minus_1 optimizations also to plus with CONST_INT [PR120360]Jakub Jelinek2-0/+31
As mentioned by Linus, we fail to optimize a comparison against zero of the otherwise unused result of a plus with a CONST_INT second operand. This can be done using just a cmp instruction with the negated constant and say js/jns/je/jne etc. conditional jumps (or setcc). We already have the *cmp<mode>_minus_1 instruction which handles it when (as shown in foo in the testcase) the IL has MINUS rather than PLUS, but for constants other than the minimum value the canonical form is with PLUS. The following patch adds a new pattern and predicate to handle this. 2025-05-22 Jakub Jelinek <jakub@redhat.com> PR target/120360 * config/i386/predicates.md (x86_64_neg_const_int_operand): New predicate. * config/i386/i386.md (*cmp<mode>_plus_1): New pattern. * gcc.target/i386/pr120360.c: New test.
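A sketch of the shape being handled (constant chosen by me):

    /* (x + 123) tested against zero: cmp with -123 computes
       x - (-123) = x + 123 and sets the same flags, so the add
       result itself is never needed.  */
    int
    sign_of_sum (int x)
    {
      return x + 123 < 0;
    }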
2025-05-21[RISC-V] Clear high or low bits using shift pairsShreya Munnangi1-0/+37
So the first special case of clearing bits from Shreya's work. We can clear an arbitrary number of high bits by shifting left by the number of bits to clear, then logically shifting right to put everything in place. Similarly we can clear an arbitrary number of low bits with a right logical shift followed by a left shift. Naturally this only applies when the constant synthesis budget is 2+ insns. Even with mvconst_internal still enabled this does consistently show various small code generation improvements. I have seen a couple of notable regressions. The two shift form to wipe out high bits isn't handled well by ext-dce. Essentially it looks like we don't recognize the sequence as wiping upper bits, instead it makes bits live and as a result we're unable to remove a prior zero extension. I've opened a bug for this issue. The other case I've seen is CSE related. If we had a number of masking operations with the same mask, we might have previously CSE'd the constant. In that scenario each instance of masking would be a single AND using the CSE'd register holding the constant, whereas with this patch it'll be a pair of shifts. But on a good uarch design the pair of shifts would be fused into a single op. Given this is relatively rare and on the margins from a performance standpoint I'm not going to worry about it. This has spun in my tester for riscv32-elf and riscv64-elf. Bootstrap and regression test is in flight and due in an hour or so. Waiting on the upstream pre-commit tester and the bootstrap test before moving forward. gcc/ * config/riscv/riscv.cc (synthesize_and): When profitable, use two shift combinations to clear high or low bits rather than synthesizing the constant.
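Hypothetical examples of the two shapes (masks are made up):

    /* Clear the upper 24 bits: x & 0x000000ffffffffff.  */
    unsigned long
    clear_high (unsigned long x)
    {
      return (x << 24) >> 24;
    }

    /* Clear the lower 12 bits: x & ~0xfffUL.  */
    unsigned long
    clear_low (unsigned long x)
    {
      return (x >> 12) << 12;
    }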
2025-05-21aarch64: Carry over zeroness in aarch64_evpc_reencodePengxuan Zheng1-1/+3
There was a bug in aarch64_evpc_reencode which could leave zero_op0_p and zero_op1_p of the struct "newd" uninitialized. r16-701-gd77c3bc1c35e303 fixed the issue by zero initializing "newd." This patch provides an alternative fix as suggested by Richard Sandiford based on the fact that the zeroness is preserved by aarch64_evpc_reencode. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy zero_op0_p and zero_op1_p from d to newd. Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
2025-05-21[RISC-V] Improve (x << C1) + C2 split codeJeff Law1-9/+24
I wrote this a couple months ago to fix an instruction count regression in 505.mcf on risc-v, but I don't have a trivial little testcase to add to the suite. There were two problems with the pattern. First, the code was generating a shift followed by an add after reload. Naturally combine doesn't run after reload and the code stayed in that form rather than using shadd when available. Second, the splitter was just over-active. We need to make sure that the shifted form of the constant operand has a cost > 1 to synthesize. It's useless to split if the shifted constant can be synthesized in a single instruction. This has been in my tester since March. So it's been through numerous riscv64-elf and riscv32-elf test cycles as well as multiple rv64 bootstrap tests. Waiting on the upstream CI system to render a verdict before moving forward. Looking further out I'm hoping this pattern will transform into a simpler and always active define_split. gcc/ * config/riscv/riscv.md ((x << C1) + C2): Tighten split condition and generate more efficient code when splitting.
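As an illustration of when the split pays off (the constant is mine, and the expected sequence is an assumption):

    /* With Zba, splitting early lets combine form something like
       "li tmp, 0x12345; sh3add a0, a0, tmp" instead of a separate
       slli + li + add sequence.  0x12345 costs two insns to
       synthesize, satisfying the tightened cost > 1 condition.  */
    long
    f (long x)
    {
      return (x << 3) + 0x12345;
    }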
2025-05-21[RISC-V][PR target/120368] Fix 32bit shift on rv64Jeff Law1-1/+8
So a followup to last week's bugfix. In last week's change we stopped using define_insn_and_split to rewrite instructions. That change was done to avoid dropping a masking instruction out of the RTL. As a result the pattern(s) were changed into simple define_insns, which is good. One of them uses the GPR iterator since it's supposed to work for both 32bit and 64bit shifts on rv64. But we failed to emit the right opcode for a 32bit shift on rv64. Thankfully the fix is trivial. If the mode is anything but word_mode, then we must be doing a 32-bit shift on rv64, i.e. the various "w" shift instructions. It's run through my tester. Just waiting on the upstream CI system to spin it. PR target/120368 gcc/ * config/riscv/riscv.md (shift with masked shift count): Fix opcode when generating an SImode shift on rv64. gcc/testsuite/ * gcc.target/riscv/pr120368.c: New test.
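A minimal trigger (illustrative):

    /* An SImode shift on rv64 must use the "w" form, e.g.
       sllw a0,a0,a1, so the 32-bit result is properly extended.  */
    int
    shift32 (int x, int s)
    {
      return x << (s & 31);
    }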
2025-05-21RISC-V: Combine vec_duplicate + vand.vv to vand.vx on GR2VR costPan Li3-1/+4
This patch would like to combine the vec_duplicate + vand.vv to the vand.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_VX_BINARY(T, OP) \ void \ test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ out[i] = in[i] OP x; \ } DEF_VX_BINARY(int32_t, &) Before this patch: 10 │ test_vx_binary_and_int32_t_case_0: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 14 │ slli a3,a3,32 15 │ srli a3,a3,32 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma 18 │ vle32.v v1,0(a1) 19 │ slli a4,a5,2 20 │ sub a3,a3,a5 21 │ add a1,a1,a4 22 │ vand.vv v1,v1,v2 23 │ vse32.v v1,0(a0) 24 │ add a0,a0,a4 25 │ bne a3,zero,.L3 After this patch: 10 │ test_vx_binary_and_int32_t_case_0: 11 │ beq a3,zero,.L8 12 │ slli a3,a3,32 13 │ srli a3,a3,32 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma 16 │ vle32.v v1,0(a1) 17 │ slli a4,a5,2 18 │ sub a3,a3,a5 19 │ add a1,a1,a4 20 │ vand.vx v1,v1,a2 21 │ vse32.v v1,0(a0) 22 │ add a0,a0,a4 23 │ bne a3,zero,.L3 The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new case for rtx code AND. (expand_vx_binary_vec_vec_dup): Ditto. * config/riscv/riscv.cc (riscv_rtx_costs): Ditto. * config/riscv/vector-iterators.md: Add new op and to no_shift_vx_ops. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-21sparc: Avoid operandN variables in .md filesRichard Sandiford1-40/+47
The automatically-generated gen_* routines take their operands as individual arguments, named "operand0" upwards. These arguments are stored into an "operands" array before invoking the expander's C++ code, which can then modify the operands by writing to the array. However, the SPARC sign-extend and zero-extend expanders used the operandN variables directly, rather than operands[N]. That's a correct usage in context, since the code goes on to expand the pattern manually and invoke DONE. But it's also easy for code to accidentally write to operandN instead of operands[N] when trying to set up something like a match_dup. It sounds like Jeff had seen an instance of this. A later patch is therefore going to mark the operandN arguments as const. This patch makes way for that by using operands[N] instead of operandN for the SPARC expanders. gcc/ * config/sparc/sparc.md (zero_extendhisi2, zero_extendhidi2) (extendhisi2, extendqihi2, extendqisi2, extendqidi2) (extendhidi2): Use operands[0] and operands[1] instead of operand0 and operand1.
2025-05-21xstormy16: Avoid accessing beyond the operands[] arrayRichard Sandiford1-2/+1
The negsi2 C++ code writes to operands[2] even though the pattern has no operand 2. gcc/ * config/stormy16/stormy16.md (negsi2): Remove unused assignment.
2025-05-21nds32: Avoid accessing beyond the operands[] arrayRichard Sandiford1-5/+6
This pattern used operands[2] to hold the shift amount, even though the pattern doesn't have an operand 2 (not even as a match_dup). This caused a build failure with -Werror: array subscript 2 is above array bounds of ‘rtx_def* [2]’ gcc/ PR target/100837 * config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use a local variable instead of operands[2].
2025-05-20[RISC-V] Infrastructure of synthesizing logical AND with constantShreya Munnangi3-23/+63
So this is the next step on the path to mvconst_internal removal and is work from Shreya and myself. This puts in the infrastructure to allow us to synthesize logical AND much like we're doing with logical IOR/XOR. Unlike IOR/XOR, AND has many more special cases that can be profitable. For example, you can use shifts to clear many bits. You can use zero extension to clear bits, you can use rotate+andi+rotate, shift pairs, etc. So to make potential bisecting easy the plan is to drop in the work on logical AND in several steps, essentially one new case at a time. This step just puts the basics of the operation synthesis in place. It still uses the same code generation strategies as we are currently using. I'd like to say this is NFC, but unfortunately that's not true. While the code generation strategy is the same, this does indirectly introduce new REG_EQUAL notes. Those additional notes in turn can impact how various optimizers behave in very minor ways. As usual, this has survived my tester on riscv32-elf and riscv64-elf. Waiting on pre-commit to do its thing. And I'll start queuing up the additional cases we want to handle while waiting 😉 gcc/ * config/riscv/riscv-protos.h (synthesize_and): Prototype. * config/riscv/riscv.cc (synthesize_and): New function. * config/riscv/riscv.md (and<mode>3): Use it. Co-Authored-By: Jeff Law <jlaw@ventanamicro.com>
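Hypothetical examples of the special cases listed above (masks chosen by me):

    unsigned long zext (unsigned long x)
    {
      return x & 0xffffffffUL;          /* zero-extension clears bits  */
    }
    unsigned long ends (unsigned long x)
    {
      return x & 0x00fffffffffffff0UL;  /* shift pairs clear both ends  */
    }
    unsigned long onebit (unsigned long x)
    {
      return x & ~(1UL << 40);          /* rotate + andi + rotate  */
    }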
2025-05-20[PATCH v2 2/2] MIPS p8700 doesn't have the vector extension; add a dummy reservation for unhandled typesUmesh Kalappa1-0/+28
The RISC-V backend requires all types to map to a reservation in the scheduler model. This adds a dummy reservation for all the types not currently handled by the p8700 model. gcc/ * config/riscv/mips-p8700.md (mips_p8700_dummies): New reservation. (mips_p8700_unknown): Reservation for all the dummies.
2025-05-20[PATCH v2 1/2] Enable the P8700 processor for RISC-V; P8700 is a high-performance processor from MIPS that extends RISC-V with custom instructionsUmesh Kalappa5-2/+170
Add support for the p8700 design from MIPS. gcc/ * config/riscv/mips-p8700.md: New scheduler model. * config/riscv/riscv-cores.def (mips-p8700): New tuning model and core architecture. * config/riscv/riscv-opts.h (riscv_microarchitecture_type): Add mips-p8700. * config/riscv/riscv.cc (mips_p8700_tune_info): New uarch tuning parameters. * config/riscv/riscv.md (tune): Add mips_p8700. Include mips-p8700.md. * doc/invoke.texi: Document tune/cpu options for the MIPS P8700. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
2025-05-19[RISC-V] Avoid multiple assignments to output objectJeff Law1-40/+49
This is the next batch of changes to reduce multiple assignments to an output object. This time I'm focused on splitters in bitmanip.md. This doesn't convert every case. For example there is one case that is very clearly dependent on eliminating mvconst_internal and adjustment of a splitter for andn and until those things happen it would clearly be a QOI implementation regression. There are cases where we set a scratch register more than once. It may be possible to use an additional scratch. I haven't tried that yet. I've seen one failure to if-convert a sequence after this patch, but it should be resolved once the logical AND changes are merged. Otherwise I'm primarily seeing slight differences in register allocation and scheduling. Nothing concerning to me. This has run through my tester, but I obviously want to see how it behaves in the upstream CI system as that tests slightly different multilibs than mine (on purpose). gcc/ * config/riscv/bitmanip.md (various splits): Avoid writing the output more than once when trivially possible.
2025-05-20RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR costPan Li3-8/+59
This patch would like to combine the vec_duplicate + vsub.vv to the vrsub.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_VX_BINARY_REVERSE_CASE_0(T, OP, NAME) \ void \ test_vx_binary_reverse_##NAME##_##T##_case_0 (T * restrict out, \ T * restrict in, T x, \ unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ out[i] = x OP in[i]; \ } DEF_VX_BINARY_REVERSE_CASE_0(int32_t, -, rsub) Before this patch: 54 │ test_vx_binary_reverse_rsub_int32_t_case_0: 55 │ beq a3,zero,.L27 56 │ vsetvli a5,zero,e32,m1,ta,ma 57 │ vmv.v.x v2,a2 58 │ slli a3,a3,32 59 │ srli a3,a3,32 60 │ .L22: 61 │ vsetvli a5,a3,e32,m1,ta,ma 62 │ vle32.v v1,0(a1) 63 │ slli a4,a5,2 64 │ sub a3,a3,a5 65 │ add a1,a1,a4 66 │ vsub.vv v1,v2,v1 67 │ vse32.v v1,0(a0) 68 │ add a0,a0,a4 69 │ bne a3,zero,.L22 After this patch: 50 │ test_vx_binary_reverse_rsub_int32_t_case_0: 51 │ beq a3,zero,.L27 52 │ slli a3,a3,32 53 │ srli a3,a3,32 54 │ .L22: 55 │ vsetvli a5,a3,e32,m1,ta,ma 56 │ vle32.v v1,0(a1) 57 │ slli a4,a5,2 58 │ sub a3,a3,a5 59 │ add a1,a1,a4 60 │ vrsub.vx v1,v1,a2 61 │ vse32.v v1,0(a0) 62 │ add a0,a0,a4 63 │ bne a3,zero,.L22 The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec-opt.md: Leverage the new add func to expand the vx insn. * config/riscv/riscv-protos.h (expand_vx_binary_vec_dup_vec): Add new func decl to expand format v = vop(vec_dup(x), v). (expand_vx_binary_vec_vec_dup): Ditto but for format v = vop(v, vec_dup(x)). * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new func impl to expand vx for v = vop(vec_dup(x), v). (expand_vx_binary_vec_vec_dup): Ditto but for another format v = vop(v, vec_dup(x)). Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-19[committed][RISC-V][PR target/120333] Remove bogus bext patternJeff Law1-26/+0
I goof'd when doing analysis of missed bext cases. For the shift into the sign bit, then shift into the low bit case (thankfully the least common), I got it in my brain that the field is at the left shift count. It's actually at word_size - 1 - left shift count. Once the subtraction is included, it's no longer profitable to turn those cases into bext. Best case scenario would be sub+bext, but we can just as easily use sll+srl which fuses in some designs into a single op. So this patch removes those two patterns, adjusts the existing testcase and adds the new execution test. Given it's a partial reversion and has passed in my tester, I'm going to go ahead and push it to the trunk rather than waiting for upstream CI. PR target/120333 gcc/ * config/riscv/bitmanip.md: Remove bext formed from left+right shift patterns. gcc/testsuite/ * gcc.target/riscv/pr114512.c: Update expected output. * gcc.target/riscv/pr120333.c: New test.
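For concreteness (the shift counts are mine):

    /* With a variable count, (x << n) >> 63 extracts bit 63 - n, so
       bext would first need that subtraction: sub+bext is no better
       than sll+srl, which some designs fuse into a single op.  */
    unsigned long
    msb_after_shift (unsigned long x, int n)
    {
      return (x << n) >> 63;
    }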
2025-05-19hpux: Fix detection of atomic support when profilingJohn David Anglin1-0/+14
The pa target lacks atomic sync compare and swap instructions. These are implemented as libcalls and in libatomic. As on linux, we lie about their availability. This fixes the gcov-30.c test on hppa64-hpux11. 2025-05-19 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa-hpux.h (TARGET_HAVE_LIBATOMIC): Define. (HAVE_sync_compare_and_swapqi): Likewise. (HAVE_sync_compare_and_swaphi): Likewise. (HAVE_sync_compare_and_swapsi): Likewise. (HAVE_sync_compare_and_swapdi): Likewise.
2025-05-19[RISC-V] Fix false positive from WuninitializedJeff Law1-1/+3
As Mark and I independently tripped, there's a Wuninitialized issue in the RISC-V backend. While *I* know the value would always be properly initialized, it'd be somewhat painful to either eliminate the infeasible paths or do deep enough analysis to suppress the false positive. So this initializes OUTPUT and verifies it's got a reasonable value before using it for the final copy into operands[0]. Bootstrapped on the BPI (regression testing still has ~12hrs to go). gcc/ * config/riscv/riscv.cc (synthesize_ior_xor): Initialize OUTPUT and verify it's non-null before emitting the final copy insn.
2025-05-19arm: fully validate mem_noofs_operand [PR120351]Richard Earnshaw1-1/+2
It's not enough to just check that a memory operand is of the form mem(reg); after RA we also need to validate the register being used. The safest way to do this is to call memory_operand. PR target/120351 gcc/ChangeLog: * config/arm/predicates.md (mem_noofs_operand): Also check the op is a valid memory_operand. gcc/testsuite/ChangeLog: * gcc.target/arm/pr120351.c: New test.
2025-05-19RISC-V: Rename conflicting variables in gen-riscv-ext-texi.cczhusonghe1-8/+8
The variables `major` and `minor` in `gen-riscv-ext-texi.cc` conflict with the macros of the same name defined in `<sys/sysmacros.h>`, which are exposed when building with newer versions of GCC on older Linux distributions (e.g., Ubuntu 18.04). To resolve this, we rename them to `major_version` and `minor_version` respectively. This aligns with the GCC community's recommended practice [1] and improves code clarity. [1] https://gcc.gnu.org/pipermail/gcc-patches/2025-May/683881.html gcc/ChangeLog: * config/riscv/gen-riscv-ext-texi.cc (struct version_t): Rename major/minor to major_version/minor_version. Signed-off-by: Songhe Zhu <zhusonghe@eswincomputing.com>
2025-05-19RISC-V: Support Zilsd code genKito Cheng1-0/+39
This commit adds the code gen support for Zilsd, which is a newly added extension for RISC-V. The Zilsd extension allows loading and storing 64-bit values using even-odd register pairs. We only try to do minimal code gen support for that, which means we only use the new instructions when the load/store is 64-bit data. We could also use them to optimize the code gen of memcpy/memset/memmove and the prologue and epilogue of functions, but I think that probably should be done in a follow-up patch. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_legitimize_move): Handle load/store with odd-even reg pair. (riscv_split_64bit_move_p): Don't split load/store if zilsd enabled. (riscv_hard_regno_mode_ok): Only allow even regs to be used for 64-bit modes with zilsd. gcc/testsuite/ChangeLog: * gcc.target/riscv/zilsd-code-gen.c: New test.
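A minimal illustration (the expected codegen is my assumption):

    /* With Zilsd on rv32, a 64-bit copy can use one ld and one sd on
       even/odd register pairs instead of two lw/sw pairs.  */
    void
    copy64 (unsigned long long *dst, const unsigned long long *src)
    {
      *dst = *src;
    }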
2025-05-19RISC-V: Add new operand constraint: cRKito Cheng1-0/+4
This commit introduces a new operand constraint `cR` for the RISC-V architecture, which allows the use of an even-odd RVC general purpose register (x8-x15) in inline asm. Ref: https://github.com/riscv-non-isa/riscv-c-api-doc/pull/102 gcc/ChangeLog: * config/riscv/constraints.md (cR): New constraint. * doc/md.texi (Machine Constraints::RISC-V): Document the new cR constraint. gcc/testsuite/ChangeLog: * gcc.target/riscv/constraint-cR-pair.c: New test case.
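A sketch of how the constraint might be used in inline asm (the asm body is a placeholder of mine):

    /* On rv32, an unsigned long long occupies a register pair; "cR"
       additionally restricts it to an even/odd pair in x8-x15.  */
    unsigned long long
    use_pair (unsigned long long v)
    {
      __asm__ ("/* operate on %0 as an even/odd RVC pair */" : "+cR" (v));
      return v;
    }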
2025-05-19i386: Combine AVX10.2 intrin filesHaochen Jiang11-3964/+3717
Since we use a single avx10.2 to enable everything, there is no need to split them into two files. gcc/ChangeLog: * config.gcc: Remove 512 intrin file. * config/i386/avx10_2-512bf16intrin.h: Removed and combined to ... * config/i386/avx10_2bf16intrin.h: ... this. * config/i386/avx10_2-512convertintrin.h: Removed and combined to ... * config/i386/avx10_2convertintrin.h: ... this. * config/i386/avx10_2-512mediaintrin.h: Removed and combined to ... * config/i386/avx10_2mediaintrin.h: ... this. * config/i386/avx10_2-512minmaxintrin.h: Removed and combined to ... * config/i386/avx10_2minmaxintrin.h: ... this. * config/i386/avx10_2-512satcvtintrin.h: Removed and combined to ... * config/i386/avx10_2satcvtintrin.h: ... this. * config/i386/immintrin.h: Remove 512 intrin file. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Combine tests and change intrin file name. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto.
2025-05-19i386: Remove duplicate iterators in mdHaochen Jiang1-221/+191
There are several iterators no longer needed in md files since after refactor in AVX10, they could directly use legacy AVX512 ones. Remove those duplicate iterators. gcc/ChangeLog: * config/i386/sse.md (VF1_VF2_AVX10_2): Removed. (VF2_AVX10_2): Ditto. (VI1248_AVX10_2): Ditto. (VFH_AVX10_2): Ditto. (VF1_AVX10_2): Ditto. (VHF_AVX10_2): Ditto. (VBF_AVX10_2): Ditto. (VI8_AVX10_2): Ditto. (VI2_AVX10_2): Ditto. (VBF): New. (div<mode>3): Use VBF instead of AVX10.2 ones. (vec_cmp<mode><avx512fmaskmodelower>): Ditto. (avx10_2_cvt2ps2phx_<mode><mask_name><round_name>): Use VHF_AVX512VL instead of AVX10.2 ones. (vcvt<convertfp8_pack><mode><mask_name>): Ditto. (vcvthf82ph<mode><mask_name>): Ditto. (VHF_AVX10_2_2): Remove not needed TARGET_AVX10_2. (usdot_prod<sseunpackmodelower><mode>): Use VI2_AVX512F instead of AVX10.2 ones. (vdpphps_<mode>): Use VF1_AVX512VL instead of AVX10.2 ones. (vdpphps_<mode>_mask): Ditto. (vdpphps_<mode>_maskz): Ditto. (vdpphps_<mode>_maskz_1): Ditto. (avx10_2_scalefbf16_<mode><mask_name>): Use VBF instead of AVX10.2 ones. (<code><mode>3): Ditto. (avx10_2_<code>bf16_<mode><mask_name>): Ditto. (avx10_2_fmaddbf16_<mode>_maskz); Ditto. (avx10_2_fmaddbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fmaddbf16_<mode>_mask): Ditto. (avx10_2_fmaddbf16_<mode>_mask3): Ditto. (avx10_2_fnmaddbf16_<mode>_maskz): Ditto. (avx10_2_fnmaddbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fnmaddbf16_<mode>_mask): Ditto. (avx10_2_fnmaddbf16_<mode>_mask3): Ditto. (avx10_2_fmsubbf16_<mode>_maskz); Ditto. (avx10_2_fmsubbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fmsubbf16_<mode>_mask): Ditto. (avx10_2_fmsubbf16_<mode>_mask3): Ditto. (avx10_2_fnmsubbf16_<mode>_maskz): Ditto. (avx10_2_fnmsubbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fnmsubbf16_<mode>_mask): Ditto. (avx10_2_fnmsubbf16_<mode>_mask3): Ditto. (avx10_2_rsqrtbf16_<mode><mask_name>): Ditto. (avx10_2_sqrtbf16_<mode><mask_name>): Ditto. (avx10_2_rcpbf16_<mode><mask_name>): Ditto. (avx10_2_getexpbf16_<mode><mask_name>): Ditto. (avx10_2_<bf16immop>bf16_<mode><mask_name>): Ditto. (avx10_2_fpclassbf16_<mode><mask_scalar_merge_name>): Ditto. (avx10_2_cmpbf16_<mode><mask_scalar_merge_name>): Ditto. (avx10_2_cvt<sat_cvt_trunc_prefix>bf162i<sat_cvt_sign_prefix>bs<mode><mask_name>): Ditto. (avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>): Use VHF_AVX512VL instead of AVX10.2 ones. (avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>): Ditto. (avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>): Use VF1_AVX512VL instead of AVX10.2 ones. (avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>): Ditto. (avx10_2_vcvtt<castmode>2<sat_cvt_sign_prefix>dqs<mode><mask_name><round_saeonly_name>): Use VF instead of AVX10.2 ones. (avx10_2_vcvttpd2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>): Use VF2 instead of AVX10.2 ones. (avx10_2_vcvttps2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>): Use VI8 instead of AVX10.2 ones. (avx10_2_minmaxbf16_<mode><mask_name>): Use VBF instead of AVX10.2 ones. (avx10_2_minmaxp<mode><mask_name><round_saeonly_name>): Use VFH_AVX512VL instead of AVX10.2 ones. (avx10_2_vmovrs<ssemodesuffix><mode><mask_name>): Use VI1248_AVX512VLBW instead of AVX10.2 ones.
2025-05-19i386: Remove avx10.1-256/512 and evex512 optionsHaochen Jiang41-925/+553
As we mentioned in GCC 15, we will remove avx10.1-256/512 and evex512 in GCC 16. Also, the combination of AVX10 and AVX512 option behavior will also be simplified in GCC 16 since AVX10.1 now implied AVX512, making the behavior matching everyone else. gcc/ChangeLog: * common/config/i386/cpuinfo.h (get_available_features): Remove feature set for AVX10_1_256. * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_EVEX512_SET): Removed. (OPTION_MASK_ISA2_AVX10_1_256_SET): Removed. (OPTION_MASK_ISA_AVX10_1_SET): Imply all AVX512 features. (OPTION_MASK_ISA2_AVX10_1_SET): Ditto. (OPTION_MASK_ISA2_AVX2_UNSET): Remove AVX10_1_UNSET. (OPTION_MASK_ISA2_EVEX512_UNSET): Removed. (OPTION_MASK_ISA2_AVX10_1_UNSET): Remove AVX10_1_256. (OPTION_MASK_ISA2_AVX512F_UNSET): Unset AVX10_1. (ix86_handle_option): Remove special handling for AVX512/AVX10.1 options, evex512 and avx10_1_256. Modify ISA set for AVX10 options. * common/config/i386/i386-cpuinfo.h (enum feature_priority): Remove P_AVX10_1_256. (enum processor_features): Remove FEATURE_AVX10_1_256. * common/config/i386/i386-isas.h: Remove avx10.1-256/512. * config/i386/avx512bf16intrin.h: Rollback target push before evex512 is introduced. * config/i386/avx512bf16vlintrin.h: Ditto. * config/i386/avx512bitalgintrin.h: Ditto. * config/i386/avx512bitalgvlintrin.h: Ditto. * config/i386/avx512bwintrin.h: Ditto. * config/i386/avx512cdintrin.h: Ditto. * config/i386/avx512dqintrin.h: Ditto. * config/i386/avx512fintrin.h: Ditto. * config/i386/avx512fp16intrin.h: Ditto. * config/i386/avx512fp16vlintrin.h: Ditto. * config/i386/avx512ifmaintrin.h: Ditto. * config/i386/avx512ifmavlintrin.h: Ditto. * config/i386/avx512vbmi2intrin.h: Ditto. * config/i386/avx512vbmi2vlintrin.h: Ditto. * config/i386/avx512vbmiintrin.h: Ditto. * config/i386/avx512vbmivlintrin.h: Ditto. * config/i386/avx512vlbwintrin.h: Ditto. * config/i386/avx512vldqintrin.h: Ditto. * config/i386/avx512vlintrin.h: Ditto. * config/i386/avx512vnniintrin.h: Ditto. * config/i386/avx512vnnivlintrin.h: Ditto. * config/i386/avx512vp2intersectintrin.h: Ditto. * config/i386/avx512vp2intersectvlintrin.h: Ditto. * config/i386/avx512vpopcntdqintrin.h: Ditto. * config/i386/avx512vpopcntdqvlintrin.h: Ditto. * config/i386/gfniintrin.h: Ditto. * config/i386/vaesintrin.h: Ditto. * config/i386/vpclmulqdqintrin.h: Ditto. * config/i386/driver-i386.cc (check_avx512_features): Removed. (host_detect_local_cpu): Remove -march=native special handling. * config/i386/i386-builtins.cc (ix86_vectorize_builtin_gather): Remove TARGET_EVEX512. * config/i386/i386-c.cc (ix86_target_macros_internal): Remove EVEX512 and AVX10_1_256. * config/i386/i386-expand.cc (ix86_valid_mask_cmp_mode): Remove TARGET_EVEX512. (ix86_expand_int_sse_cmp): Ditto. (ix86_vector_duplicate_simode_const): Ditto. (ix86_expand_vector_init_duplicate): Ditto. (ix86_expand_vector_init_one_nonzero): Ditto. (ix86_emit_swsqrtsf): Ditto. (ix86_vectorize_vec_perm_const): Ditto. (ix86_expand_vecop_qihi2): Ditto. (ix86_expand_sse2_mulvxdi3): Ditto. (ix86_gen_bcst_mem): Ditto. * config/i386/i386-isa.def (EVEX512): Removed. (AVX10_1_256): Ditto. * config/i386/i386-options.cc (isa2_opts): Remove evex512 and avx10.1-256. (ix86_function_specific_save): Remove no_avx512_explicit and no_avx10_1_explicit. (ix86_function_specific_restore): Ditto. (ix86_valid_target_attribute_inner_p): Remove evex512 and avx10.1-256/512. (ix86_valid_target_attribute_tree): Remove special handling to rerun ix86_option_override_internal for AVX10.1-256. 
(ix86_option_override_internal): Remove warning handling. (ix86_simd_clone_adjust): Remove evex512. * config/i386/i386.cc (type_natural_mode): Remove TARGET_EVEX512. (ix86_return_in_memory): Ditto. (standard_sse_constant_p): Ditto. (standard_sse_constant_opcode): Ditto. (ix86_get_ssemov): Ditto. (ix86_legitimate_constant_p): Ditto. (ix86_vectorize_builtin_scatter): Ditto. (ix86_hard_regno_mode_ok): Ditto. (ix86_set_reg_reg_cost): Ditto. (ix86_rtx_costs): Ditto. (ix86_vector_mode_supported_p): Ditto. (ix86_preferred_simd_mode): Ditto. (ix86_autovectorize_vector_modes): Ditto. (ix86_get_mask_mode): Ditto. (ix86_simd_clone_compute_vecsize_and_simdlen): Ditto. (ix86_simd_clone_usable): Ditto. * config/i386/i386.h (BIGGEST_ALIGNMENT): Ditto. (MOVE_MAX): Ditto. (STORE_MAX_PIECES): Ditto. (PTA_SKYLAKE_AVX512): Remove PTA_EVEX512. (PTA_CANNONLAKE): Ditto. (PTA_ZNVER4): Ditto. (PTA_GRANITERAPIDS): Use PTA_AVX10_1. (PTA_DIAMONDRAPIDS): Use PTA_GRANITERAPIDS. * config/i386/i386.md: Remove TARGET_EVEX512, avx512f_512 and avx512bw_512. * config/i386/i386.opt: Remove ix86_no_avx512_explicit, ix86_no_avx10_1_explicit, mevex512, mavx10.1-256/512 and warning for mavx10.1. Modify option comment. * config/i386/i386.opt.urls: Remove evex512 and avx10.1-256/512. * config/i386/predicates.md: Remove TARGET_EVEX512. * config/i386/sse.md: Ditto. * doc/extend.texi: Remove avx10.1-256/512. Modify avx10.1 doc. * doc/invoke.texi: Remove avx10.1-256/512 and evex512. * doc/sourcebuild.texi: Remove avx10.1-256/512. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_1-1.c: Remove warning. * gcc.target/i386/avx10_1-2.c: Ditto. * gcc.target/i386/avx10_1-3.c: Ditto. * gcc.target/i386/avx10_1-4.c: Ditto. * gcc.target/i386/pr111068.c: Ditto. * gcc.target/i386/pr117946.c: Ditto. * gcc.target/i386/pr117240_avx512f.c: Remove -mevex512 and warning. * gcc.target/i386/avx10_1-11.c: Rename to ... * gcc.target/i386/avx10_1-5.c: ... this. Remove warning. * gcc.target/i386/avx10_1-12.c: Rename to ... * gcc.target/i386/avx10_1-6.c: ... this. Remove warning. * gcc.target/i386/avx10_1-26.c: Rename to ... * gcc.target/i386/avx10_1-7.c: ... this. Remove warning. The origin avx10_1-7.c is removed. * gcc.target/i386/avx10_1-10.c: Removed. * gcc.target/i386/avx10_1-13.c: Removed. * gcc.target/i386/avx10_1-14.c: Removed. * gcc.target/i386/avx10_1-15.c: Removed. * gcc.target/i386/avx10_1-16.c: Removed. * gcc.target/i386/avx10_1-17.c: Removed. * gcc.target/i386/avx10_1-18.c: Removed. * gcc.target/i386/avx10_1-19.c: Removed. * gcc.target/i386/avx10_1-20.c: Removed. * gcc.target/i386/avx10_1-21.c: Removed. * gcc.target/i386/avx10_1-22.c: Removed. * gcc.target/i386/avx10_1-23.c: Removed. * gcc.target/i386/avx10_1-8.c: Removed. * gcc.target/i386/avx10_1-9.c: Removed. * gcc.target/i386/noevex512-1.c: Removed. * gcc.target/i386/noevex512-2.c: Removed. * gcc.target/i386/noevex512-3.c: Removed. * gcc.target/i386/pr111889.c: Removed. * gcc.target/i386/pr111907.c: Removed.
2025-05-19i386: Unpush OPTION_MASK_ISA2_EVEX512 for builtinsHaochen Jiang2-669/+669
As we mentioned in GCC 15, we will remove evex512 in GCC 16 since it is not useful anymore since we will have 512 bit directly. This patch will first unpush evex512 in the builtins. gcc/ChangeLog: * config/i386/i386-builtin.def (BDESC): Remove OPTION_MASK_ISA2_EVEX512. * config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/pr90096.c: Adjust error message. * gcc.target/i386/pr117304-1.c: Removed.
2025-05-17[RISC-V] Fix ICE due to bogus use of gen_rtvecJeff Law1-1/+1
Found this while setting up the risc-v coordination branch off of gcc-15. Not sure why I didn't use rtvec_alloc directly here since we're going to initialize the whole vector ourselves. Using gen_rtvec was just wrong as it's walking down a non-existent varargs list. Under the "right" circumstances it can walk off a page and fault. This was seen with a test already in the testsuite (I forget which test), so no new regression test. Tested in my tester and verified the failure on the coordination branch is resolved as well. Waiting on pre-commit CI to render a verdict. gcc/ * config/riscv/riscv-vect-permconst.cc (vector_permconst::process_bb): Use rtvec_alloc, not gen_rtvec since we don't want/need to initialize the vector.
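The idiom in question, as a fragment of GCC-internal code (a sketch, not the exact change):

    /* gen_rtvec is a varargs constructor; when every element will be
       written explicitly, allocate with rtvec_alloc instead.  */
    rtvec v = rtvec_alloc (n);
    for (int i = 0; i < n; i++)
      RTVEC_ELT (v, i) = elems[i];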
2025-05-17[RISC-V] Avoid setting output object more than once in IOR/XOR synthesisJeff Law1-16/+36
While evaluating Shreya's logical AND synthesis work on spec2017 I ran into a code quality regression where combine was failing to eliminate a redundant sign extension. I had a hunch the problem would be with the multiple sets of the same pseudo register in the AND synthesis path. I was right that the problem was multiple sets of the same pseudo, but it was actually some of the splitters in the RISC-V backend that were the culprit. Those multiple sets caused the sign bit tracking code to need to make conservative assumptions thus resulting in failure to eliminate the unnecessary sign extension. So before we start moving on the logical AND patch we're going to do some cleanups. There's multiple moving parts in play. For example, we have splitters which do multiple sets of the output register. Fixing some of those independently would result in a code quality regression. Instead they need some adjustments to or removal of mvconst_internal. Of course getting rid of mvconst_internal will trigger all kinds of code quality regressions right now which ultimately lead back to the need to revamp the logical AND expander. Point being we've got some circular dependencies and breaking them may result in short term code quality regressions. I'll obviously try to avoid those as much as possible. So to start the process this patch adjusts the recently added XOR/IOR synthesis to avoid re-using the destination register. While the reuse was clearly safe from a semantic standpoint, various parts of the compiler can do a better job for pseudos that are only set once. Given this synthesis path should only be active during initial RTL generation, we can create new pseudos at will, so we create a new one for each insn. At the end of the sequence we copy from the last set into the final destination. This has various trivial impacts on the code generation, but the resulting code looks no better or worse to me across spec2017. This has been tested in my tester and is currently bootstrapping on my BPI. Waiting on data from the pre-commit tester before moving forward... gcc/ * config/riscv/riscv.cc (synthesize_ior_xor): Avoid writing operands[0] more than once, use new pseudos instead.
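A sketch of the new shape (GCC-internal fragment; the names are mine):

    /* Each synthesis step targets a fresh pseudo; only the final copy
       writes the real destination, keeping every pseudo single-set.  */
    rtx t = source;
    for (int i = 0; i < n_steps; i++)
      {
        rtx tmp = gen_reg_rtx (word_mode);
        emit_insn (gen_rtx_SET (tmp,
                                gen_rtx_IOR (word_mode, t,
                                             GEN_INT (bits[i]))));
        t = tmp;
      }
    emit_move_insn (operands[0], t);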
2025-05-17RISC-V: Since the loop increment i++ is unreachable, the loop body will never execute more than onceJin Ma1-3/+2
Reported-by: huangcunjian <huangcunjian.huang@alibaba-inc.com> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gpr_save_operation_p): Remove break and fix bug for elt index.
2025-05-16aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]Pengxuan Zheng5-6/+66
We can optimize AND with certain vector of immediates as FMOV if the result of the AND is as if the upper lane of the input vector is set to zero and the lower lane remains unchanged. For example, at present: v4hi f_v4hi (v4hi x) { return x & (v4hi){ 0xffff, 0xffff, 0, 0 }; } generates: f_v4hi: movi d31, 0xffffffff and v0.8b, v0.8b, v31.8b ret With this patch, it generates: f_v4hi: fmov s0, s0 ret Changes since v1: * v2: Simplify the mask checking logic by using native_decode_int and address a few other review comments. PR target/100165 gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_output_fmov): New prototype. (aarch64_simd_valid_and_imm_fmov): Likewise. * config/aarch64/aarch64-simd.md (and<mode>3<vczle><vczbe>): Allow FMOV codegen. * config/aarch64/aarch64.cc (aarch64_simd_valid_and_imm_fmov): New. (aarch64_output_fmov): Likewise. * config/aarch64/constraints.md (Df): New constraint. * config/aarch64/predicates.md (aarch64_reg_or_and_imm): Update predicate to support FMOV codegen. gcc/testsuite/ChangeLog: * gcc.target/aarch64/fmov-1-be.c: New test. * gcc.target/aarch64/fmov-1-le.c: New test. * gcc.target/aarch64/fmov-2-be.c: New test. * gcc.target/aarch64/fmov-2-le.c: New test. Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
2025-05-16aarch64: Recognize vector permute patterns which can be interpreted as AND [PR100165]Pengxuan Zheng1-0/+36
Certain permutes that blend a vector with zero can be interpreted as an AND with a mask. This idea was suggested by Richard Sandiford when he was reviewing my patch which tries to optimize certain vector permutes with the FMOV instruction for the aarch64 target. For example, for the aarch64 target, at present: v4hi f_v4hi (v4hi x) { return __builtin_shuffle (x, (v4hi){ 0, 0, 0, 0 }, (v4hi){ 4, 1, 6, 3 }); } generates: f_v4hi: uzp1 v0.2d, v0.2d, v0.2d adrp x0, .LC0 ldr d31, [x0, #:lo12:.LC0] tbl v0.8b, {v0.16b}, v31.8b ret .LC0: .byte -1 .byte -1 .byte 2 .byte 3 .byte -1 .byte -1 .byte 6 .byte 7 With this patch, it generates: f_v4hi: mvni v31.2s, 0xff, msl 8 and v0.8b, v0.8b, v31.8b ret This patch also provides a target-independent routine for detecting vector permute patterns which can be interpreted as AND. Changes since v1: * v2: Rework the patch to only perform the optimization for aarch64 by calling the target independent routine vec_perm_and_mask. PR target/100165 gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_evpc_and): New. (aarch64_expand_vec_perm_const_1): Call aarch64_evpc_and. * optabs.cc (vec_perm_and_mask): New. * optabs.h (vec_perm_and_mask): New prototype. gcc/testsuite/ChangeLog: * gcc.target/aarch64/and-be.c: New test. * gcc.target/aarch64/and-le.c: New test. Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>
2025-05-16aarch64: Fix an oversight in aarch64_evpc_reencodePengxuan Zheng1-1/+1
Some fields (e.g., zero_op0_p and zero_op1_p) of the struct "newd" may be left uninitialized in aarch64_evpc_reencode. This can cause reading of uninitialized data. I found this oversight when testing my patches on and/fmov optimizations. This patch fixes the bug by zero initializing the struct. Pushed as obvious after bootstrap/test on aarch64-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_evpc_reencode): Zero initialize newd.
2025-05-16aarch64: Fix narrowing warning in driver-aarch64.cc [PR118603]Andrew Pinski1-1/+1
Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for the variant, it is just easier to add the cast to unsigned in the usage in driver-aarch64.cc. Build and tested on aarch64-linux-gnu. gcc/ChangeLog: PR target/118603 * config/aarch64/driver-aarch64.cc (aarch64_cpu_data): Add cast to unsigned to VARIANT of the define AARCH64_CORE. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-05-16aarch64: Fix narrowing warning in aarch64_detect_vector_stmt_subtypeAndrew Pinski1-2/+2
There is a narrowing warning in aarch64_detect_vector_stmt_subtype about gather_load_x32_cost and gather_load_x64_cost converting from int to unsigned. These fields are always unsigned and even the constructor for sve_vec_cost takes an unsigned. So let's just move the fields over to unsigned. Build and tested for aarch64-linux-gnu. gcc/ChangeLog: * config/aarch64/aarch64-protos.h (struct sve_vec_cost): Change gather_load_x32_cost and gather_load_x64_cost fields to unsigned. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-05-16Manual tweak of some end_sequence callersRichard Sandiford1-10/+2
This patch mops up obvious redundancies that weren't caught by the automatic regexp replacements in earlier patches. It doesn't do anything with genemit.cc, since that will be part of a later series. gcc/ * config/arm/arm.cc (arm_gen_load_multiple_1): Simplify use of end_sequence. (arm_gen_store_multiple_1): Likewise. * expr.cc (gen_move_insn): Likewise. * gentarget-def.cc (main): Likewise.
2025-05-16Automatic replacement of end_sequence/return pairsRichard Sandiford4-13/+5
This is the result of using a regexp to replace: rtx( |_insn *)<stuff> = end_sequence (); return <stuff>; with: return end_sequence (); gcc/ * asan.cc (asan_emit_allocas_unpoison): Directly return the result of end_sequence. (hwasan_emit_untag_frame): Likewise. * config/aarch64/aarch64-speculation.cc (aarch64_speculation_clobber_sp): Likewise. (aarch64_speculation_establish_tracker): Likewise. * config/arm/arm.cc (arm_call_tls_get_addr): Likewise. * config/avr/avr-passes.cc (avr_parallel_insn_from_insns): Likewise. * config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn): Likewise. * tree-outof-ssa.cc (emit_partition_copy): Likewise.
2025-05-16Automatic replacement of get_insns/end_sequence pairsRichard Sandiford44-305/+151
This is the result of using a regexp to replace instances of: <stuff> = get_insns (); end_sequence (); with: <stuff> = end_sequence (); where the indentation is the same for both lines, and where there might be blank lines inbetween. gcc/ * asan.cc (asan_clear_shadow): Use the return value of end_sequence, rather than calling get_insns separately. (asan_emit_stack_protection, asan_emit_allocas_unpoison): Likewise. (hwasan_frame_base, hwasan_emit_untag_frame): Likewise. * auto-inc-dec.cc (attempt_change): Likewise. * avoid-store-forwarding.cc (process_store_forwarding): Likewise. * bb-reorder.cc (fix_crossing_unconditional_branches): Likewise. * builtins.cc (expand_builtin_apply_args): Likewise. (expand_builtin_return, expand_builtin_mathfn_ternary): Likewise. (expand_builtin_mathfn_3, expand_builtin_int_roundingfn): Likewise. (expand_builtin_int_roundingfn_2, expand_builtin_saveregs): Likewise. (inline_string_cmp): Likewise. * calls.cc (expand_call): Likewise. * cfgexpand.cc (expand_asm_stmt, pass_expand::execute): Likewise. * cfgloopanal.cc (init_set_costs): Likewise. * cfgrtl.cc (insert_insn_on_edge, prepend_insn_to_edge): Likewise. (rtl_lv_add_condition_to_bb): Likewise. * config/aarch64/aarch64-speculation.cc (aarch64_speculation_clobber_sp): Likewise. (aarch64_speculation_establish_tracker): Likewise. (aarch64_do_track_speculation): Likewise. * config/aarch64/aarch64.cc (aarch64_load_symref_appropriately) (aarch64_expand_vector_init, aarch64_gen_ccmp_first): Likewise. (aarch64_gen_ccmp_next, aarch64_mode_emit): Likewise. (aarch64_md_asm_adjust): Likewise. (aarch64_switch_pstate_sm_for_landing_pad): Likewise. (aarch64_switch_pstate_sm_for_jump): Likewise. (aarch64_switch_pstate_sm_for_call): Likewise. * config/alpha/alpha.cc (alpha_legitimize_address_1): Likewise. (alpha_emit_xfloating_libcall, alpha_gp_save_rtx): Likewise. * config/arc/arc.cc (hwloop_optimize): Likewise. * config/arm/aarch-common.cc (arm_md_asm_adjust): Likewise. * config/arm/arm-builtins.cc: Likewise. * config/arm/arm.cc (require_pic_register): Likewise. (arm_call_tls_get_addr, arm_gen_load_multiple_1): Likewise. (arm_gen_store_multiple_1, cmse_clear_registers): Likewise. (cmse_nonsecure_call_inline_register_clear): Likewise. (arm_attempt_dlstp_transform): Likewise. * config/avr/avr-passes.cc (bbinfo_t::optimize_one_block): Likewise. (avr_parallel_insn_from_insns): Likewise. * config/avr/avr.cc (avr_prologue_setup_frame): Likewise. (avr_expand_epilogue): Likewise. * config/bfin/bfin.cc (hwloop_optimize): Likewise. * config/c6x/c6x.cc (c6x_expand_compare): Likewise. * config/cris/cris.cc (cris_split_movdx): Likewise. * config/cris/cris.md: Likewise. * config/csky/csky.cc (csky_call_tls_get_addr): Likewise. * config/epiphany/resolve-sw-modes.cc (pass_resolve_sw_modes::execute): Likewise. * config/fr30/fr30.cc (fr30_move_double): Likewise. * config/frv/frv.cc (frv_split_scc, frv_split_cond_move): Likewise. (frv_split_minmax, frv_split_abs): Likewise. * config/frv/frv.md: Likewise. * config/gcn/gcn.cc (move_callee_saved_registers): Likewise. (gcn_expand_prologue, gcn_restore_exec, gcn_md_reorg): Likewise. * config/i386/i386-expand.cc (ix86_expand_carry_flag_compare, ix86_expand_int_movcc): Likewise. (ix86_vector_duplicate_value, expand_vec_perm_interleave2): Likewise. (expand_vec_perm_vperm2f128_vblend): Likewise. (expand_vec_perm_2perm_interleave): Likewise. (expand_vec_perm_2perm_pblendv): Likewise. (expand_vec_perm2_vperm2f128_vblend, ix86_gen_ccmp_first): Likewise. (ix86_gen_ccmp_next): Likewise. 
* config/i386/i386-features.cc (scalar_chain::make_vector_copies): Likewise. (scalar_chain::convert_reg, scalar_chain::convert_op): Likewise. (timode_scalar_chain::convert_insn): Likewise. * config/i386/i386.cc (ix86_init_pic_reg, ix86_va_start): Likewise. (ix86_get_drap_rtx, legitimize_tls_address): Likewise. (ix86_md_asm_adjust): Likewise. * config/ia64/ia64.cc (ia64_expand_tls_address): Likewise. (ia64_expand_compare, spill_restore_mem): Likewise. (expand_vec_perm_interleave_2): Likewise. * config/loongarch/loongarch.cc (loongarch_call_tls_get_addr): Likewise. * config/m32r/m32r.cc (gen_split_move_double): Likewise. * config/m32r/m32r.md: Likewise. * config/m68k/m68k.cc (m68k_call_tls_get_addr): Likewise. (m68k_call_m68k_read_tp, m68k_sched_md_init_global): Likewise. * config/m68k/m68k.md: Likewise. * config/microblaze/microblaze.cc (microblaze_call_tls_get_addr): Likewise. * config/mips/mips.cc (mips_call_tls_get_addr): Likewise. (mips_ls2_init_dfa_post_cycle_insn): Likewise. (mips16_split_long_branches): Likewise. * config/nvptx/nvptx.cc (nvptx_gen_shuffle): Likewise. (nvptx_gen_shared_bcast, nvptx_propagate): Likewise. (workaround_uninit_method_1, workaround_uninit_method_2): Likewise. (workaround_uninit_method_3): Likewise. * config/or1k/or1k.cc (or1k_init_pic_reg): Likewise. * config/pa/pa.cc (legitimize_tls_address): Likewise. * config/pru/pru.cc (pru_expand_fp_compare, pru_reorg_loop): Likewise. * config/riscv/riscv-shorten-memrefs.cc (pass_shorten_memrefs::transform): Likewise. * config/riscv/riscv-vsetvl.cc (pre_vsetvl::emit_vsetvl): Likewise. * config/riscv/riscv.cc (riscv_call_tls_get_addr): Likewise. (riscv_frm_emit_after_bb_end): Likewise. * config/rl78/rl78.cc (rl78_emit_libcall): Likewise. * config/rs6000/rs6000.cc (rs6000_debug_legitimize_address): Likewise. * config/s390/s390.cc (legitimize_tls_address): Likewise. (s390_two_part_insv, s390_load_got, s390_va_start): Likewise. * config/sh/sh_treg_combine.cc (sh_treg_combine::make_not_reg_insn): Likewise. * config/sparc/sparc.cc (sparc_legitimize_tls_address): Likewise. (sparc_output_mi_thunk, sparc_init_pic_reg): Likewise. * config/stormy16/stormy16.cc (xstormy16_split_cbranch): Likewise. * config/xtensa/xtensa.cc (xtensa_copy_incoming_a7): Likewise. (xtensa_expand_block_set_libcall): Likewise. (xtensa_expand_block_set_unrolled_loop): Likewise. (xtensa_expand_block_set_small_loop, xtensa_call_tls_desc): Likewise. * dse.cc (emit_inc_dec_insn_before, find_shift_sequence): Likewise. (replace_read): Likewise. * emit-rtl.cc (reorder_insns, gen_clobber, gen_use): Likewise. * except.cc (dw2_build_landing_pads, sjlj_mark_call_sites): Likewise. (sjlj_emit_function_enter, sjlj_emit_function_exit): Likewise. (sjlj_emit_dispatch_table): Likewise. * expmed.cc (expmed_mult_highpart_optab, expand_sdiv_pow2): Likewise. * expr.cc (convert_mode_scalar, emit_move_multi_word): Likewise. (gen_move_insn, expand_cond_expr_using_cmove): Likewise. (expand_expr_divmod, expand_expr_real_2): Likewise. (maybe_optimize_pow2p_mod_cmp, maybe_optimize_mod_cmp): Likewise. * function.cc (emit_initial_value_sets): Likewise. (instantiate_virtual_regs_in_insn, expand_function_end): Likewise. (get_arg_pointer_save_area, make_split_prologue_seq): Likewise. (make_prologue_seq, gen_call_used_regs_seq): Likewise. (thread_prologue_and_epilogue_insns): Likewise. (match_asm_constraints_1): Likewise. * gcse.cc (prepare_copy_insn): Likewise. * ifcvt.cc (noce_emit_store_flag, noce_emit_move_insn): Likewise. (noce_emit_cmove): Likewise. 
* init-regs.cc (initialize_uninitialized_regs): Likewise. * internal-fn.cc (expand_POPCOUNT): Likewise. * ira-emit.cc (emit_move_list): Likewise. * ira.cc (ira): Likewise. * loop-doloop.cc (doloop_modify): Likewise. * loop-unroll.cc (compare_and_jump_seq): Likewise. (unroll_loop_runtime_iterations, insert_base_initialization): Likewise. (split_iv, insert_var_expansion_initialization): Likewise. (combine_var_copies_in_loop_exit): Likewise. * lower-subreg.cc (resolve_simple_move,resolve_shift_zext): Likewise. * lra-constraints.cc (match_reload, check_and_process_move): Likewise. (process_addr_reg, insert_move_for_subreg): Likewise. (process_address_1, curr_insn_transform): Likewise. (inherit_reload_reg, process_invariant_for_inheritance): Likewise. (inherit_in_ebb, remove_inheritance_pseudos): Likewise. * lra-remat.cc (do_remat): Likewise. * mode-switching.cc (commit_mode_sets): Likewise. (optimize_mode_switching): Likewise. * optabs.cc (expand_binop, expand_twoval_binop_libfunc): Likewise. (expand_clrsb_using_clz, expand_doubleword_clz_ctz_ffs): Likewise. (expand_doubleword_popcount, expand_ctz, expand_ffs): Likewise. (expand_absneg_bit, expand_unop, expand_copysign_bit): Likewise. (prepare_float_lib_cmp, expand_float, expand_fix): Likewise. (expand_fixed_convert, gen_cond_trap): Likewise. (expand_atomic_fetch_op): Likewise. * ree.cc (combine_reaching_defs): Likewise. * reg-stack.cc (compensate_edge): Likewise. * reload1.cc (emit_input_reload_insns): Likewise. * sel-sched-ir.cc (setup_nop_and_exit_insns): Likewise. * shrink-wrap.cc (emit_common_heads_for_components): Likewise. (emit_common_tails_for_components): Likewise. (insert_prologue_epilogue_for_components): Likewise. * tree-outof-ssa.cc (emit_partition_copy): Likewise. (insert_value_copy_on_edge): Likewise. * tree-ssa-loop-ivopts.cc (computation_cost): Likewise.
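The mechanical rewrite, shown as a before/after fragment:

    /* Before: */
    rtx_insn *seq = get_insns ();
    end_sequence ();

    /* After: end_sequence now returns the insns directly.  */
    rtx_insn *seq = end_sequence ();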
2025-05-16RISC-V: Combine vec_duplicate + vsub.vv to vsub.vx on GR2VR costPan Li3-1/+19
This patch would like to combine the vec_duplicate + vsub.vv to the vsub.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_VX_BINARY(T, OP) \ void \ test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \ { \ for (unsigned i = 0; i < n; i++) \ out[i] = in[i] OP x; \ } DEF_VX_BINARY(int32_t, -) Before this patch: 10 │ test_binary_vx_sub: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma // Deleted if GR2VR cost zero 13 │ vmv.v.x v2,a2 // Ditto. 14 │ slli a3,a3,32 15 │ srli a3,a3,32 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma 18 │ vle32.v v1,0(a1) 19 │ slli a4,a5,2 20 │ sub a3,a3,a5 21 │ add a1,a1,a4 22 │ vsub.vv v1,v2,v1 23 │ vse32.v v1,0(a0) 24 │ add a0,a0,a4 25 │ bne a3,zero,.L3 After this patch: 10 │ test_binary_vx_sub: 11 │ beq a3,zero,.L8 12 │ slli a3,a3,32 13 │ srli a3,a3,32 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma 16 │ vle32.v v1,0(a1) 17 │ slli a4,a5,2 18 │ sub a3,a3,a5 19 │ add a1,a1,a4 20 │ vsub.vx v1,v1,a2 21 │ vse32.v v1,0(a0) 22 │ add a0,a0,a4 23 │ bne a3,zero,.L3 The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec-opt.md (*<optab>_vx_<mode>): Add new pattern to convert vec_duplicate + vsub.vv to vsub.vx. * config/riscv/riscv.cc (riscv_rtx_costs): Add minus as plus op. * config/riscv/vector-iterators.md: Add minus to iterator any_int_binop_no_shift_vx. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-05-15[RISC-V][PR target/120223] Don't use bset/binv for XTHEADBSJeff Law1-2/+2
Thead has the XTHEADBB extension which has a lot of overlap with Zbb. I made the incorrect assumption that XTHEADBS would largely be like Zbs when generalizing Shreya's work. As a result we can't use the operation synthesis code for IOR/XOR because we don't have binv/bset like capabilities. I should have double checked on XTHEADBS, my bad. Anyway, the fix is trivial. Don't allow bset/binv based on XTHEADBS. Already spun in my tester. Spinning in the pre-commit CI system now. PR target/120223 gcc/ * config/riscv/riscv.cc (synthesize_ior_xor): XTHEADBS does not have single bit manipulations. gcc/testsuite/ * gcc.target/riscv/pr120223.c: New test.
2025-05-15Fix regression from x86 multi-epilogue tuningRichard Biener1-7/+3
With the avx512_two_epilogues tuning enabled for zen4 and zen5 the gcc.target/i386/vect-epilogues-5.c testcase below regresses and ends up using AVX2 sized vectors for the masked epilogue rather than AVX512 sized vectors. The following patch rectifies this and adds coverage for the intended behavior. * config/i386/i386.cc (ix86_vector_costs::finish_cost): Do not suggest a first epilogue mode for AVX512 sized main loops with X86_TUNE_AVX512_TWO_EPILOGUES as that interferes with using a masked epilogue. * gcc.target/i386/vect-epilogues-1.c: New testcase. * gcc.target/i386/vect-epilogues-2.c: Likewise. * gcc.target/i386/vect-epilogues-3.c: Likewise. * gcc.target/i386/vect-epilogues-4.c: Likewise. * gcc.target/i386/vect-epilogues-5.c: Likewise.
2025-05-14RISC-V: Add augmented hypervisor series extensions.Jiawei2-0/+108
The augmented hypervisor extension series 'sha' [1] is a new profile-defined extension series that captures the full set of features that are mandated to be supported along with the 'H' extension. [1] https://github.com/riscv/riscv-profiles/blob/main/src/rva23-profile.adoc#rva23s64-profile Version log: Update implementation, fix testcase format. gcc/ChangeLog: * config/riscv/riscv-ext.def: New extension defs. * config/riscv/riscv-ext.opt: Ditto. * doc/riscv-ext.texi: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/arch-55.c: New test.
2025-05-14RISC-V: Drop duplicate build rule for riscv-ext.opt [NFC]Kito Cheng1-2/+0
gcc/ChangeLog: * config/riscv/t-riscv: Drop duplicate build rule for riscv-ext.opt.
2025-05-14RISC-V: Regen riscv-ext.opt.urlsKito Cheng1-0/+2
gcc/ChangeLog: * config/riscv/riscv-ext.opt.urls: Regenerate.
2025-05-14Consider frequency in cost estimation when converting scalar to vector.liuhongt2-83/+105
In some benchmarks, I noticed STV failed due to the cost model deeming it unprofitable: the gain is inside the loop while the sse<->integer conversion is outside the loop, and the current cost model doesn't consider the frequency of those gains/costs. The patch weights those costs with frequency. gcc/ChangeLog: PR target/120215 * config/i386/i386-features.cc (scalar_chain::mark_dual_mode_def): Weight cost of integer<->sse move with bb frequency when it's optimized_for_speed_p. (general_scalar_chain::compute_convert_gain): Ditto, and adjust function prototype to return true/false when cost model is profitable or not. (timode_scalar_chain::compute_convert_gain): Ditto. (convert_scalars_to_vector): Adjust after the above two function prototypes are changed. * config/i386/i386-features.h (class scalar_chain): Change n_integer_to_sse/n_sse_to_integer to cost_sse_integer, and add weighted_cost_sse_integer. (class general_scalar_chain): Adjust prototype to return bool instead of int. (class timode_scalar_chain): Ditto.
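A sketch of the weighting idea (names and shape are mine, not from the patch):

    /* A per-iteration gain inside the loop should be compared against
       a once-per-entry conversion cost outside it, so both sides are
       scaled by their block's execution frequency.  */
    weighted_gain = igain * freq (loop_body_bb);
    weighted_cost = conversion_cost * freq (preheader_bb);
    profitable_p = weighted_gain > weighted_cost;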
2025-05-14s390: Fix tf_to_fprx2Stefan Schulze Frielinghaus1-1/+1
Insn tf_to_fprx2 moves a TF value into a floating-point register pair. For alternative 0, the input is a vector register, however, in the else case instruction ldr is emitted which expects floating-point register operands only. Thus, this works only for vector registers which overlap with floating-point registers. Replace ldr with vlr so that the remaining vector registers are dealt with, too. Emitting a vlr instead of a ldr is fine since the destination register %v0 is part of a floating-point register pair which means that the low half of %v0 is ignored in the end anyway and therefore may be clobbered. gcc/ChangeLog: * config/s390/vector.md: Fix tf_to_fprx2 by using vlr instead of ldr.
2025-05-13RISC-V: Introduce riscv_ext_info_t to hold extension metadataKito Cheng1-0/+8
Define a new riscv_ext_info_t struct to aggregate all ISA extension fields (name, version, flags, implied extensions, bitmask and extra flags) generated from riscv-ext.def. Also adjust riscv_ext_flag_table_t and riscv_implied_info_t so that they do not have to hold the extension name; this part will be refactored in later patches. gcc/ChangeLog: * common/config/riscv/riscv-common.cc (riscv_ext_info_t): New struct. (opt_var_ref_t): Adjust order. (cl_opt_var_ref_t): Ditto. (riscv_ext_flag_table_t): Adjust order, and add a new constructor that does not hold the extension name. (riscv_version_t): New struct. (riscv_implied_info_t): Adjust order, and add a new constructor that does not hold the extension name. (apply_extra_extension_flags): New function. (riscv_ext_infos): New. (riscv_implied_info): Adjust. * config/riscv/riscv-opts.h (EXT_FLAG_MACRO): New macro. (BITMASK_NOT_YET_ALLOCATED): New macro.