path: root/gcc/config
Age | Commit message | Author | Files | Lines
12 days | RISC-V: Vector pseudoinsns with x0 operand to use imm 0 | Vineet Gupta | 1 | -3/+13
A couple of Vector pseudoinstructions use x0 scalar which could be inefficient on wider uarches due to regfile crossing. Instead use the imm 0 form, which should be functionally equivalent.

pseudoinsn              orig insn with x0       this patch
--------------------    --------------------    -------------------
vneg.v vd,vs            vrsub.vx vd,vs,x0       vrsub.vi vd,vs,0
vncvt.x.x.w vd,vs,vm    vnsrl.wx vd,vs,x0,vm    vnsrl.wi vd,vs,0,vm
vwcvt.x.x.v vd,vs,vm    vwadd.vx vd,vs,x0,vm    (imm not supported)

gcc/ChangeLog:
* config/riscv/vector.md: vncvt substitute vnsrl. vnsrl with x0 replace with immediate 0. vneg substitute vrsub.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change expected pattern.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
* gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
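The equivalence claimed in the table above can be checked on a scalar model of a single vector element. This is only a sketch: the function names are hypothetical, and it models the value of the zero operand, not the instruction encoding or the register-file crossing that motivates the patch.

```c
#include <assert.h>
#include <stdint.h>

/* One element of a reverse subtract (vrsub): result = scalar - element.
   Unsigned arithmetic is used so the model wraps instead of overflowing. */
static uint32_t rsub_elem(uint32_t elem, uint32_t scalar)
{
    return scalar - elem;
}

/* One element of a narrowing logical shift right (vnsrl): 32 bits in,
   16 bits out. */
static uint16_t nsrl_elem(uint32_t wide, unsigned shift)
{
    assert(shift < 32);
    return (uint16_t)(wide >> shift);
}

/* vneg modelled as a reverse subtract from zero.  Whether the zero comes
   from x0 or from an immediate, the operand value is the same. */
uint32_t neg_via_rsub(uint32_t elem)
{
    return rsub_elem(elem, 0u);
}

/* vncvt modelled as a narrowing shift by zero, i.e. plain truncation. */
uint16_t ncvt_via_nsrl(uint32_t wide)
{
    return nsrl_elem(wide, 0u);
}

/* Usage example. */
void check_pseudoinsn_model(void)
{
    assert(neg_via_rsub(5u) == (uint32_t)-5);
    assert(neg_via_rsub(0u) == 0u);
    assert(ncvt_via_nsrl(0x12345678u) == 0x5678u);
}
```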
13 days | RISC-V: unrecognizable insn ICE in xtheadvector/pr114194.c on 32bit targets | Jin Ma | 2 | -8/+8
This is a follow-up to the patch below to avoid generating unrecognized vsetivl instructions for XTheadVector. https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html PR target/118601 gcc/ChangeLog: * config/riscv/riscv-string.cc (expand_block_move): Check with new constraint 'vl' instead of 'K'. (expand_vec_setmem): Likewise. (expand_vec_cmpmem): Likewise. * config/riscv/riscv-v.cc (force_vector_length_operand): Likewise. (expand_load_store): Likewise. (expand_strided_load): Likewise. (expand_strided_store): Likewise. (expand_lanes_load_store): Likewise. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to... * gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here. * gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test. * gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test. Reported-by: Edwin Lu <ewlu@rivosinc.com>
13 days | RISC-V: Drop __riscv_vendor_feature_bits | Yangyu Chen | 1 | -7/+0
As discussed in RISC-V C-API PR #101 [1] and in #96, the current interface is insufficient to support some cases, such as a vendor buying a CPU IP from an upstream vendor but using its own mvendorid together with custom features from the upstream vendor. In this case, we might need to add these extensions many times, once for each downstream vendor. Thus, guarding __riscv_vendor_feature_bits by mvendorid is not a good idea. So, drop __riscv_vendor_feature_bits for now, which leaves time to discuss a better solution.

[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/101

Signed-off-by: Yangyu Chen <cyy@cyyself.name>

gcc/ChangeLog:
* config/riscv/riscv-feature-bits.h (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop. (struct riscv_vendor_feature_bits): Drop.

libgcc/ChangeLog:
* config/riscv/feature_bits.c (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop. (__init_riscv_features_bits_linux): Drop.
13 days | [PR target/115478] Accept ADD, IOR or XOR when combining objects with no bits in common | Jeff Law | 2 | -23/+31
So the change to prefer ADD over IOR for combining two objects with no bits in common is (IMHO) generally good. It has some minor fallout. In particular the aarch64 port (and I suspect others) has patterns that recognize IOR, but not PLUS or XOR for these cases, and thus tests which expected to optimize with IOR are no longer optimizing.

Roger suggested using a code iterator for this purpose. Richard S. suggested a new match operator to cover those cases. I really like the match operator idea, but as Richard S. notes in the PR it would require either not validating the "no bits in common", which dramatically reduces the utility IMHO, or we'd need some work to allow consistent results without polluting the nonzero bits cache.

So this patch goes back to Roger's idea of just using a code iterator in the aarch64 backend (and presumably anywhere else we see this popping up). Bootstrapped and regression tested on aarch64-linux-gnu where it fixes bitint-args.c (as expected).

PR target/115478

gcc/
* config/aarch64/iterators.md (any_or_plus): New code iterator.
* config/aarch64/aarch64.md (extr<mode>5_insn): Use any_or_plus. (extr<mode>5_insn_alt, extrsi5_insn_uxtw): Likewise. (extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise.

gcc/testsuite/
* gcc.target/aarch64/bitint-args.c: Update expected output.
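The reason a pattern can accept any of the three codes is that ADD, IOR and XOR give the same result whenever the two operands have no set bits in common. A minimal sketch of that fact, using an extract-style combination of two shifted values; the function names are hypothetical and this is not the aarch64 pattern itself:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when a and b have no set bits in common. */
int disjoint(uint32_t a, uint32_t b)
{
    return (a & b) == 0;
}

/* Combine the low bits of hi (shifted up by n) with the high bits of lo
   (shifted down by 32 - n).  The two halves never overlap, so the three
   combining operations below are interchangeable.  n must be 1..31. */
uint32_t extract_ior(uint32_t hi, uint32_t lo, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (hi << n) | (lo >> (32 - n));
}

uint32_t extract_add(uint32_t hi, uint32_t lo, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (hi << n) + (lo >> (32 - n));
}

uint32_t extract_xor(uint32_t hi, uint32_t lo, unsigned n)
{
    assert(n >= 1 && n <= 31);
    return (hi << n) ^ (lo >> (32 - n));
}
```

When the operands do overlap, for example 3 and 1, the three operations disagree, which is why the "no bits in common" condition matters.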
13 days | aarch64: Update fp8 dependencies | Andrew Carlotti | 1 | -5/+5
We agreed with LLVM developers to not enforce the architectural dependencies between fp8 multiplication features, and they have already been removed from LLVM and Binutils. Remove them from GCC as well. gcc/ChangeLog: * config/aarch64/aarch64-option-extensions.def (SSVE_FP8FMA): Adjust formatting. (FP8DOT4): Replace FP8FMA dependency with FP8. (SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8. (FP8DOT2): Replace FP8DOT4 dependency with FP8. (SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected defines. * gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target pragmas. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c: Ditto. * gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c: Ditto. * gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto. * gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.
13 days | x86: Correct ASM_OUTPUT_SYMBOL_REF | H.J. Lu | 1 | -1/+1
x is not a macro argument. It just happens to work as final.cc passes x for 2nd argument: final.cc: ASM_OUTPUT_SYMBOL_REF (file, x); PR target/118825 * config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with SYM. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
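The class of bug described here, a macro body naming a variable that is not one of its parameters, can be reproduced in a few lines. The macros and functions below are hypothetical stand-ins, not the i386.h definition:

```c
#include <assert.h>

/* Buggy: the body refers to x, which is not a macro parameter.  It only
   works when the caller happens to have a suitable variable named x. */
#define TWICE_BUGGY(VAL) ((x) * 2)

/* Fixed: the body uses the parameter. */
#define TWICE_FIXED(VAL) ((VAL) * 2)

/* Works by accident: the argument is itself called x. */
int buggy_called_with_x(int x)
{
    return TWICE_BUGGY(x);
}

/* Silently wrong: the macro ignores y and picks up the local x. */
int buggy_called_with_y(int y)
{
    int x = 100;
    (void)y;
    return TWICE_BUGGY(y);
}

/* Correct regardless of what else is in scope. */
int fixed_called_with_y(int y)
{
    int x = 100;
    (void)x;
    return TWICE_FIXED(y);
}

/* Usage example. */
void check_macro_behaviour(void)
{
    assert(buggy_called_with_x(3) == 6);
    assert(buggy_called_with_y(3) == 200);
    assert(fixed_called_with_y(3) == 6);
}
```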
13 days | MIPS: Add some floating point instructions support for MIPSr6 | Jie Mei | 5 | -10/+86
This patch adds some of the floating-point instructions from MIPS32 Release 6 (mips32r6) with their respective built-in functions and tests:

min_a_s, min_a_d
max_a_s, max_a_d
rint_s, rint_d
class_s, class_d

gcc/ChangeLog:
* config/mips/i6400.md (i6400_fpu_minmax): Include fclass type. (i6400_fpu_fadd): Include frint type.
* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry for __builtin_mipsr6_xxx. (MIPSR6_BUILTIN_PURE): Same as above. (CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d) (CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d) (CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d): New code_aliasing macros. (mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s, max_a_d, class_s, class_d builtins.
* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro. (ISA_HAS_FCLASS): Same as above.
* config/mips/mips.md (UNSPEC_FRINT): New unspec. (UNSPEC_FCLASS): Same as above. (type): Add frint and fclass. (fmin_a_<mode>): Generates MINA.fmt instructions. (fmax_a_<mode>): Generates MAXA.fmt instructions. (rint<mode>2): Generates RINT.fmt instructions. (fclass_<mode>): Generates CLASS.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fadd): Include frint type. (p6600_fpu_fabs): Include fclass type.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips-class.c: New tests for MIPSr6.
* gcc.target/mips/mips-minamaxa.c: Same as above.
* gcc.target/mips/mips-rint.c: Same as above.

Signed-off-by: Jie Mei <jie.mei@oss.cipunited.com>
Co-authored-by: Xi Ruoyao <xry111@xry111.site>
14 days | i386: Fix AVX512BW intrin header with __OPTIMIZE__ [PR 118813] | Haochen Jiang | 1 | -1/+1
When moving intrins around for AVX10 implementation in GCC 14, the intrin _kshiftli_mask32 and _kshiftri_mask32 are wrongly wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__", leading to the intrin file not `-Wsystem-headers -Wundef` clean since r14-4490. gcc/ChangeLog: PR target/118813 * config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__ wrap.
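The difference between the two preprocessor forms can be seen with a macro that is never defined. In this sketch HYPOTHETICAL_FLAG is an illustrative name standing in for __OPTIMIZE__ in a build without optimization; both forms select the same branch, but only the `#if` form evaluates an undefined identifier, which is what -Wundef reports.

```c
#include <assert.h>

/* HYPOTHETICAL_FLAG is deliberately never defined in this file. */

/* #ifdef only asks whether the macro exists, so it stays quiet under
   -Wundef. */
#ifdef HYPOTHETICAL_FLAG
#define VIA_IFDEF 1
#else
#define VIA_IFDEF 0
#endif

/* #if evaluates the macro.  An undefined name is silently treated as 0,
   and -Wundef warns about exactly that. */
#if HYPOTHETICAL_FLAG
#define VIA_IF 1
#else
#define VIA_IF 0
#endif

int via_ifdef(void) { return VIA_IFDEF; }
int via_if(void) { return VIA_IF; }

/* Usage example: both forms agree on the selected branch. */
void check_flag_views(void)
{
    assert(via_ifdef() == 0);
    assert(via_if() == 0);
}
```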
14 days | [gcn] mkoffload.cc: Print fatal error if -march has no multilib but generic has | Tobias Burnus | 1 | -7/+94
Assume that a distro has configured, e.g., a gfx9-generic multilib but not for gfx902. In that case, mkoffload would fail to link with "error: incompatible mach". With this commit, an error is printed suggesting to try the associated generic architecture instead. The behavior is unchanged if there is a multilib available for the specific ISA or when there is also no multilib for the generic ISA.

Note: The build of generic multilibs is currently not enabled by default; they also require the linker/assembler of LLVM 19 or newer and, in particular, for the execution a future ROCm release. (The next one? In any case, 6.3.2 does not support generic ISAs, yet.)

gcc/ChangeLog:
* config/gcn/mkoffload.cc (enum elf_arch_code): Add EF_AMDGPU_MACH_AMDGCN_NONE. (elf_arch): Use enum elf_arch_code as type. (tool_cleanup): Silence warning by removing trailing '.' from error. (get_arch_name): Return enum elf_arch_code. (check_for_missing_lib): New; print fatal error if the multilib is not available but it is for the associated generic ISA. (main): Call it.
2025-02-10 | i386: Change RTL representation of bt[lq] [PR118623] | Jakub Jelinek | 1 | -33/+33
The following testcase is miscompiled because the RTL representation of the bt{l,q} insn followed by e.g. j{c,nc} is misleading about what it actually does. Let's look e.g. at

(define_insn_and_split "*jcc_bt<mode>"
  [(set (pc)
        (if_then_else (match_operator 0 "bt_comparison_operator"
                        [(zero_extract:SWI48
                           (match_operand:SWI48 1 "nonimmediate_operand")
                           (const_int 1)
                           (match_operand:QI 2 "nonmemory_operand"))
                         (const_int 0)])
                      (label_ref (match_operand 3))
                      (pc)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
   && (CONST_INT_P (operands[2])
       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
          && INTVAL (operands[2])
               >= (optimize_function_for_size_p (cfun) ? 8 : 32))
       : !memory_operand (operands[1], <MODE>mode))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_dup 1)
            (const_int 1)
            (match_dup 2))
          (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
                      (label_ref (match_dup 3))
                      (pc)))]
{
  operands[0] = shallow_copy_rtx (operands[0]);
  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})

The define_insn part in RTL describes exactly what it does: it jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ). The problem is with what it splits into. put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU, nc for NE and GEU and ICEs otherwise. CCCmode is used mainly for carry out of add/adc and borrow out of sub/sbb; in those cases, e.g. for add, we have
  (set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and (geu (reg:CCC flags) (const_int 0)) for carry not set. These cases model in RTL what is actually happening: compare in infinite precision x from the result of finite precision addition in M mode and if it is less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns; those are used with (eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset. Given the UNSPEC this is no big deal, as the middle-end doesn't know what set or unset means. But for the bt{l,q}; j{c,nc} case the above splits it into
  (set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
  (set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for the bit not set case (so that the jump expands to jnc). Similarly for the different splitters for cmov and set{c,nc} etc. The problem is that when the middle-end reads this RTL, it understands the exact opposite. If zero_extract is 1, flags is set to the comparison of 1 and 0, and that would mean using ne in the if_then_else, and vice versa.

So, in order to better describe in RTL what is actually happening, one possibility would be to swap the behavior of put_condition_code and use NE + LTU -> c and EQ + GEU -> nc rather than the current EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The following patch uses a more limited approach: instead of representing the bt{l,q}; j{c,nc} case as written above it uses
  (set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
  (set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the insns actually do in RTL clearly. If zero_extract is 1, then flags are LTU, 0U < 1U; if zero_extract is 0, then flags are GEU, 0U >= 0U. The patch adjusts the *bt<mode> define_insn and all the splitters to it and its comparisons/conditional moves/setXX.

2025-02-10 Jakub Jelinek <jakub@redhat.com>

PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as compare:CCC of const0_rtx and zero_extract rather than zero_extract and const0_rtx. (*bt<SWI48:mode>_mask): Likewise. (*jcc_bt<mode>): Likewise. Use LTU and GEU as flags test instead of EQ and NE.
(*jcc_bt<mode>_mask): Likewise. (*jcc_bt<SWI48:mode>_mask_1): Likewise. (Help combine recognize bt followed by cmov splitter): Likewise. (*bt<mode>_setcqi): Likewise. (*bt<mode>_setncqi): Likewise. (*bt<mode>_setnc<mode>): Likewise. (*bt<mode>_setncqi_2): Likewise. (*bt<mode>_setc<mode>_mask): Likewise. * gcc.c-torture/execute/pr118623.c: New test.
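The argument of this commit can be restated on plain integers. The sketch below is a model with hypothetical function names, not GCC code: it treats the carry flag after bt as the selected bit, then evaluates the old representation (bit compared with 0, tested with EQ) and the new one (0 compared with bit, tested with LTU/GEU) as ordinary unsigned comparisons.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extract of a single bit at position pos. */
static unsigned bit_of(uint64_t value, unsigned pos)
{
    return (unsigned)((value >> (pos & 63u)) & 1u);
}

/* What the hardware does: bt copies the selected bit into carry. */
unsigned bt_carry(uint64_t value, unsigned pos)
{
    return bit_of(value, pos);
}

/* Old RTL: (compare (zero_extract) 0) tested with EQ, which the backend
   printed as "c".  Read literally, EQ is true when the bit is clear. */
int old_eq(uint64_t value, unsigned pos)
{
    return bit_of(value, pos) == 0u;
}

/* New RTL: (compare 0 (zero_extract)) tested with LTU for "c" ... */
int new_ltu(uint64_t value, unsigned pos)
{
    return 0u < bit_of(value, pos);
}

/* ... and with GEU for "nc". */
int new_geu(uint64_t value, unsigned pos)
{
    return 0u >= bit_of(value, pos);
}

/* Usage example: LTU tracks the carry, the literal reading of EQ is its
   opposite. */
void check_bt_model(void)
{
    assert(bt_carry(0x10u, 4) == 1u);
    assert(new_ltu(0x10u, 4) == 1 && new_geu(0x10u, 4) == 0);
    assert(old_eq(0x10u, 4) == 0);

    assert(bt_carry(0x10u, 3) == 0u);
    assert(new_ltu(0x10u, 3) == 0 && new_geu(0x10u, 3) == 1);
    assert(old_eq(0x10u, 3) == 1);
}
```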
2025-02-08 | [RISC-V][PR target/118146] Fix ICE for unsupported modes | Jeff Law | 1 | -4/+5
There's some special case code in the RISC-V move expander to try to optimize cases where the source is a subreg of a vector and the destination is a scalar mode. The code works fine except when we have no support for the given mode, i.e. HF or BF when those extensions aren't enabled. We'll end up tripping an assert in that case when we should have just let standard expansion do its thing.

Tested on my system for rv32 and rv64, but I'll wait for the pre-commit tester to render a verdict before moving forward.

PR target/118146

gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Handle subreg of vector source better to avoid ICE.

gcc/testsuite
* gcc.target/riscv/pr118146-1.c: New test.
* gcc.target/riscv/pr118146-2.c: New test.
2025-02-08 | GCN, nvptx: 'sorry, unimplemented: exception handling not supported' | Thomas Schwinge | 2 | -0/+14
For GCN, this avoids ICEs further down the compilation pipeline. For nvptx, there's effectively no change: in presence of exception handling constructs, instead of 'sorry, unimplemented: target cannot support nonlocal goto', we now emit 'sorry, unimplemented: exception handling not supported'. Additionally, turn test cases into UNSUPPORTED if running into 'sorry, unimplemented: exception handling not supported'. gcc/ * config/gcn/gcn.md (exception_receiver): 'define_expand'. * config/nvptx/nvptx.md (exception_receiver): Likewise. gcc/testsuite/ * lib/gcc-dg.exp (gcc-dg-prune): Turn 'sorry, unimplemented: exception handling not supported' into UNSUPPORTED. * gcc.dg/pr104464.c: Remove GCN XFAIL. libstdc++-v3/ * testsuite/lib/prune.exp (libstdc++-dg-prune): Turn 'sorry, unimplemented: exception handling not supported' into UNSUPPORTED.
2025-02-08 | i386: Fix ICE with conditional QI/HI vector maxmin [PR118776] | Jakub Jelinek | 1 | -9/+9
The following testcase ICEs starting with GCC 12 since r12-4526, although the bug was already introduced in r12-2751. The problem was in the addition of the cond_<code><mode> define_expand, which uses nonimmediate_operand predicates for both maxmin operands for all VI1248_AVX512VLBW modes. It works fine with VI48_AVX512VL modes because the <code><mode>3_mask VI48_AVX512VL define_expand uses ix86_fixup_binary_operands_no_copy and the *avx512f_<code><mode>3<mask_name> VI48_AVX512VL define_insn uses % in constraint and !(MEM_P && MEM_P) check in condition (and <code><mode>3 define_expand with VI124_256_AVX512F_AVX512BW iterator does that too), but even though the 8-bit and 16-bit element maxmin is commutative too, the <mask_codefor><code><mode>3<mask_name> define_insn with VI12_AVX512VL iterator didn't use % in constraint to make it commutative. So, e.g. cond_umaxv32qi define_expand allowed nonimmediate_operand for both umax operands, but used gen_umaxv32qi_mask which wasn't commutative and only allowed nonimmediate_operand for the second operand.

The following patch fixes it by keeping the <code><mode>3 VI124_256_AVX512F_AVX512BW define_expand as is (it does ix86_fixup_binary_operands_no_copy) but extending the <code><mode>3_mask define_expand from VI48_AVX512VL to VI1248_AVX512VLBW, which keeps the current modes with their ISA conditions and adds the VI12_AVX512VL modes under additional TARGET_AVX512BW condition, and turning the actual define_insn into an * prefixed name (which it was before just for the non-masked case) and having the same commutative operand handling as in other define_insns.

2025-02-08 Jakub Jelinek <jakub@redhat.com>

PR target/118776
* config/i386/sse.md (<code><mode>3_mask): Use VI1248_AVX512VLBW iterator rather than VI48_AVX512VL. (<mask_codefor><code><mode>3<mask_name>): Rename to ... (*avx512bw_<code><mode>3<mask_name>): ... this.
Use nonimmediate_operand rather than register_operand predicate and %v rather than v constraint for operand 1 and adjust condition to reject MEMs in both operand 1 and 2. * gcc.target/i386/pr118776.c: New test.
2025-02-07 | aarch64: gimple fold aes[ed] [PR114522] | Andrew Pinski | 1 | -0/+29
Instead of waiting for the combine/RTL optimizations to be fixed here, this fixes the builtins at the gimple level. It should provide slightly faster compile times since the simplification happens earlier on. Built and tested for aarch64-linux-gnu.

gcc/ChangeLog:
PR target/114522
* config/aarch64/aarch64-builtins.cc (aarch64_fold_aes_op): New function. (aarch64_general_gimple_fold_builtin): Call aarch64_fold_aes_op for crypto_aese and crypto_aesd.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-02-07 | arm: Prefer POP {lo-reg} over LDR lo-reg, ... for thumb2 [PR118089] | Richard Earnshaw | 1 | -42/+57
For thumb2, popping a single low register off the stack should prefer POP over LDR to mirror the behaviour of the PUSH on entry. This saves a couple of bytes in the resulting image. This is a relatively niche case as it's rare to push a single low register onto the stack, but still worth getting right. Whilst fixing this I've also restructured the code here somewhat to fix a bug I observed by inspection and to improve the code slightly. Firstly, the single register case is hoisted above the main loop. This not only avoids creating some RTL that immediately becomes garbage but also avoids us needing to check for this case in every iteration of the main loop body. Secondly, we iterate over just the non-zero bits in the reg mask rather than every bit and then checking if there's work to do for that bit. Finally, when emitting a pop that also pops SP off the stack we shouldn't be emitting a stack-adjust CFA note. The new SP value comes from the popped value, not from an adjustment of the previous SP value. gcc: PR target/118089 * config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure. Don't emit LDR on thumb2 when POP can be used for smaller code. Don't add a CFA adjust note when SP is popped off the stack. gcc/testsuite: PR target/118089 * gcc.target/arm/thumb2-pop-loreg.c: New test.
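The second restructuring step, visiting only the set bits of the register mask instead of testing every bit, is a standard bit trick. A minimal sketch follows; the names are hypothetical and this is not the code in arm_emit_multi_reg_pop.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit.  The mask must be nonzero. */
static unsigned lowest_set_bit(uint32_t mask)
{
    unsigned index = 0;
    assert(mask != 0);
    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* Store the register numbers named by mask into out (room for 32
   entries) in ascending order and return how many were stored.  The loop
   runs once per set bit, not once per bit position. */
unsigned collect_regs(uint32_t mask, unsigned out[32])
{
    unsigned count = 0;
    while (mask != 0) {
        out[count++] = lowest_set_bit(mask);
        mask &= mask - 1u;  /* clear the lowest set bit */
    }
    return count;
}
```

For a mask such as 0x8090 (bits 4, 7 and 15) the loop body runs three times rather than sixteen or thirty-two.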
2025-02-07 | arm: fix ICE due to fix for POP {PC} change | Richard Earnshaw | 1 | -24/+27
My earlier change for making the compiler prefer POP {PC} over LDR PC, [SP], #4 had a slightly unexpected consequence in that we now also call arm_emit_multi_reg_pop to handle single register pops when the register is not PC. This exposed a latent bug in this function where the dwarf unwinding notes on the single-register POP were not being set correctly. gcc/ PR target/118089 * config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust note to single-register POP instructions.
2025-02-07 | RISC-V: Make VXRM as global register [PR118103] | Pan Li | 1 | -1/+3
Inspired by PR118103, the VXRM register should be treated almost the same as the FRM register, i.e. as a cooperatively-managed global register. Thus, add VXRM to global_regs to avoid its elimination by the late-combine pass. For example, take the code below:

  void compute ()
  {
    size_t vl = __riscv_vsetvl_e16m1 (N);
    vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
    vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
    vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

    __riscv_vse16_v_u16m1 (c, vc, vl);
  }

  int main ()
  {
    initialize ();
    compute();

    return 0;
  }

After compiling with -march=rv64gcv -O3, we will have:

  compute:
          csrwi   vxrm,2
          lui     a3,%hi(a)
          lui     a4,%hi(b)
          addi    a4,a4,%lo(b)
          vsetivli        zero,4,e16,m1,ta,ma
          addi    a3,a3,%lo(a)
          vle16.v v2,0(a4)
          vle16.v v1,0(a3)
          lui     a4,%hi(c)
          addi    a4,a4,%lo(c)
          vaaddu.vv       v1,v1,v2
          vse16.v v1,0(a4)
          ret
          .size   compute, .-compute
          .section        .text.startup,"ax",@progbits
          .align  1
          .globl  main
          .type   main, @function
  main:
          // csrwi vxrm,2 deleted after inline
          addi    sp,sp,-16
          sd      ra,8(sp)
          call    initialize
          lui     a3,%hi(a)
          lui     a4,%hi(b)
          vsetivli        zero,4,e16,m1,ta,ma
          addi    a4,a4,%lo(b)
          addi    a3,a3,%lo(a)
          vle16.v v2,0(a4)
          vle16.v v1,0(a3)
          lui     a4,%hi(c)
          addi    a4,a4,%lo(c)
          li      a0,0
          vaaddu.vv       v1,v1,v2

The following test suite passes with this patch.
* The rv64gcv full regression test.

PR target/118103

gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add the VXRM as the global_regs.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
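Why the deleted csrwi matters can be seen on a scalar model of one vaaddu element: the result depends on the rounding mode held in vxrm, so dropping the write can silently change results. This is a sketch under assumptions: the enum and function names are hypothetical, the value 2 for round-down matches the `csrwi vxrm,2` in the listing, and the value 0 for round-to-nearest-up is assumed from the RVV specification rather than taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Two of the fixed-point rounding modes selected by vxrm. */
enum vxrm_mode {
    VXRM_RNU = 0,  /* round to nearest, ties up (assumed encoding) */
    VXRM_RDN = 2   /* round down, i.e. truncate */
};

/* One 16-bit element of an averaging unsigned add: (a + b) / 2, with the
   dropped low bit handled according to the rounding mode. */
uint16_t aaddu16(uint16_t a, uint16_t b, enum vxrm_mode mode)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    uint32_t round = (mode == VXRM_RNU) ? (sum & 1u) : 0u;
    return (uint16_t)((sum >> 1) + round);
}

/* Usage example: same inputs, different result per rounding mode. */
void check_rounding_modes(void)
{
    assert(aaddu16(1, 2, VXRM_RDN) == 1);
    assert(aaddu16(1, 2, VXRM_RNU) == 2);
    assert(aaddu16(2, 2, VXRM_RDN) == aaddu16(2, 2, VXRM_RNU));
}
```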
2025-02-07 | aarch64: Fix bootstrap with --enable-checking=release [PR118771] | Andrew Pinski | 1 | -0/+3
With release checking we get an uninitialization warning inside aarch64_split_move because of jump threading for the case of `npieces==0`, but `npieces` is never 0 (there is no way the compiler can know that). So this fixes the issue by adding a `gcc_assert` to the function which asserts that `npieces > 0` and fixes the uninitialization warning.

Bootstrapped and tested on aarch64-linux-gnu (with and without --enable-checking=release).

The warning:

aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + offsetof(auto_vec<rtx_def*, 4>,auto_vec<rtx_def*, 4>::m_data[0]))' may be used uninitialized [-Werror=maybe-uninitialized]
 3418 |   if (reg_overlap_mentioned_p (dst_pieces[0], src))
      |       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
aarch64.cc:3408:20: note: 'dst_pieces' declared here
 3408 |   auto_vec<rtx, 4> dst_pieces, src_pieces;
      |                    ^~~~~~~~~~

PR target/118771

gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is greater than 0.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-02-07 | [gcn] Fix the output amdhsa.version | Tobias Burnus | 2 | -7/+14
The amdhsa.version depends on the code object version: V3 had 1.0, V4 has 1.1, and V5 (and V6) have 1.2. GCC emitted 1.0, but for a while now it has generated either V4 or, with -march=gfx...-generic, V6. Now it uses the proper version again.

gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Update 'amdhsa.version' output to match used code version.
* config/gcn/gen-gcn-device-macros.awk: Add a comment to crosslink.
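The mapping stated in this message is small enough to write down as a lookup. The function name is hypothetical and this is not the code in gcn.cc; the major version is 1 in every case listed, so only the minor number is returned.

```c
#include <assert.h>

/* Minor number of amdhsa.version (the major number is 1 in all listed
   cases) for a given code object version, or -1 for versions the commit
   message does not mention. */
int amdhsa_minor_version(int code_object_version)
{
    switch (code_object_version) {
    case 3:
        return 0;  /* amdhsa.version 1.0 */
    case 4:
        return 1;  /* amdhsa.version 1.1 */
    case 5:
    case 6:
        return 2;  /* amdhsa.version 1.2 */
    default:
        return -1;
    }
}

/* Usage example. */
void check_version_mapping(void)
{
    assert(amdhsa_minor_version(3) == 0);
    assert(amdhsa_minor_version(4) == 1);
    assert(amdhsa_minor_version(6) == 2);
}
```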
2025-02-07 | LoongArch: Correct the mode for mask{eq,ne}z | Xi Ruoyao | 1 | -7/+3
For mask{eq,ne}z, rk is always compared with 0 in the full width, thus the mode for rk should be X. I found the issue reviewing a patch fixing a similar issue for RISC-V XTheadCondMov [1], but interestingly I cannot find a test case really blowing up on LoongArch. But as the issue is obvious enough let's fix it anyway so it won't blow up in the future. [1]: https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674004.html gcc/ChangeLog: * config/loongarch/loongarch.md (*sel<code><GPR:mode>_using_<GPR2:mode>): Rename to ... (*sel<code><GPR:mode>_using_<X:mode>): ... here. (GPR2): Remove as nothing uses it now.
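A scalar model shows why the width of the comparison matters. The semantics below (maskeqz yields 0 when rk is zero and rj otherwise; masknez is the inverse) are an assumption taken from the LoongArch instruction descriptions, not from this commit, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* maskeqz: result is 0 when rk == 0 (full 64-bit compare), else rj. */
uint64_t maskeqz(uint64_t rj, uint64_t rk)
{
    return rk == 0 ? 0 : rj;
}

/* masknez: result is 0 when rk != 0 (full 64-bit compare), else rj. */
uint64_t masknez(uint64_t rj, uint64_t rk)
{
    return rk != 0 ? 0 : rj;
}

/* What a 32-bit view of rk would conclude: only the low half is seen. */
int low32_is_zero(uint64_t rk)
{
    return (uint32_t)rk == 0;
}

/* Usage example: a value whose low 32 bits are zero but whose upper bits
   are not.  A 32-bit mode says "zero"; the instruction disagrees. */
void check_mask_width(void)
{
    uint64_t rk = 0x100000000ULL;
    assert(low32_is_zero(rk) == 1);
    assert(maskeqz(7, rk) == 7);
    assert(masknez(7, rk) == 0);
}
```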
2025-02-07 | [gcn] Add gfx9-generic and generic-associated gfx* | Tobias Burnus | 2 | -11/+236
This patch adds gfx9-generic, completing the gfx*-generic support. It also adds all gfx* devices that are part of any of the gfx*-generic, i.e. gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.

gcc/ChangeLog:
* config/gcn/gcn-devices.def (GCN_DEVICE): Add gfx9-generic, gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034, gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153. Add a currently unused column linking a specific ISA to a generic one (if it exists).
* config/gcn/gcn-tables.opt: Regenerate.
* doc/invoke.texi (AMD GCN): Add the new gfx... and the older gfx{10-3,11}-generic to -march= as 'experimental'.
2025-02-07 | [gcn] Fix gfx906's sramecc setting | Tobias Burnus | 2 | -2/+3
When compiling with -g, mkoffload.cc creates a device object file itself; however, so that the linker does not complain, the ELF flags must match what the compiler / linker does. For gfx906, the assembler defaults to sramecc = any, but gcn-devices.def contained unsupported, which is not the same, causing link errors. That's a regression caused by commit r15-4540-ga6b26e5ea09779, which can be best seen by looking at the changes to mkoffload.cc.

Additionally, this commit adds '...' to the GCN_DEVICE #define in gcn.cc to make it agnostic to the addition of fields.

gcc/ChangeLog:
* config/gcn/gcn-devices.def (GCN_DEVICE): Change sramecc for gfx906 to 'any'.
* config/gcn/gcn.cc (GCN_DEVICE): Add trailing ... to #define.
2025-02-07 | ira: Add a target hook for callee-saved register cost scale | H.J. Lu | 1 | -0/+11
commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
Date:   Tue Jun 25 08:37:49 2024 -0500

    ira: Scale save/restore costs of callee save registers with block frequency

scales the cost of saving/restoring a callee-save hard register in epilogue and prologue with the entry block frequency, which, if not optimizing for size, is 10000, for all targets. As a result, callee-saved registers may not be used to preserve local variable values across calls on some targets, like x86. Add a target hook for the callee-saved register cost scale in epilogue and prologue used by IRA. The default version of this target hook returns 1 if optimizing for size, otherwise returns the entry block frequency. Add an x86 version of this target hook to restore the old behavior prior to the above commit.

PR rtl-optimization/111673
PR rtl-optimization/115932
PR rtl-optimization/116028
PR rtl-optimization/117081
PR rtl-optimization/117082
PR rtl-optimization/118497
* ira-color.cc (assign_hard_reg): Call the target hook for the callee-saved register cost scale in epilogue and prologue.
* target.def (ira_callee_saved_register_cost_scale): New target hook.
* targhooks.cc (default_ira_callee_saved_register_cost_scale): New.
* targhooks.h (default_ira_callee_saved_register_cost_scale): Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale): New. (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): New.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-02-06 | [PATCH] RISC-V: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST to correct enum | Craig Blackmore | 1 | -4/+4
stack_protect_{set,test}_<mode> were showing up in RTL dumps as UNSPEC_COPYSIGN and UNSPEC_FMV_X_W due to UNSPEC_SSP_SET and UNSPEC_SSP_TEST being put in the unspecv enum instead of unspec. gcc/ChangeLog: * config/riscv/riscv.md: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST to unspec enum.
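The symptom described here follows from two independent enums both numbering from zero: a constant declared in one enum but looked up in the other enum's name table prints as whatever shares its number. A minimal sketch with hypothetical names, not the riscv.md contents:

```c
#include <assert.h>

/* Two independent enumerations, each numbered from 0. */
enum unspec_kind  { UNSPEC_A,  UNSPEC_B  };
enum unspecv_kind { UNSPECV_X, UNSPECV_Y };

static const char *const unspec_names[] = { "UNSPEC_A", "UNSPEC_B" };

/* A dump routine for non-volatile unspecs always consults the unspec
   table, whatever enum the number originally came from. */
const char *dump_unspec_name(int number)
{
    assert(number >= 0 && number < 2);
    return unspec_names[number];
}

/* Usage example: UNSPECV_Y has the value 1, so using it where an unspec
   number is expected prints the unrelated name "UNSPEC_B". */
void check_enum_collision(void)
{
    assert(dump_unspec_name(UNSPECV_Y) == dump_unspec_name(UNSPEC_B));
}
```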
2025-02-06 | avr.opt.urls += -mcvt | Georg-Johann Lay | 1 | -0/+3
gcc/ * config/avr/avr.opt.urls: Add mcvt.
2025-02-06 | AVR: Add support for a Compact Vector Table (-mcvt). | Georg-Johann Lay | 5 | -117/+142
Some AVR devices support a CVT:
- Devices from the 0-series, 1-series, 2-series.
- AVR16, AVR32, AVR64, AVR128 devices.

The support is provided by means of a startup code file crt<mcu>-cvt.o from AVR-LibC v2.3 that can be linked instead of the traditional crt<mcu>.o. This patch adds a new command line option -mcvt that links that CVT startup code (or issues an error when the device doesn't support a CVT).

PR target/118764

gcc/
* config/avr/avr.opt (-mcvt): New target option.
* config/avr/avr-arch.h (AVR_CVT): New enum value.
* config/avr/avr-mcus.def: Add AVR_CVT flag for devices that support it.
* config/avr/avr.cc (avr_handle_isr_attribute) [TARGET_CVT]: Issue an error when a vector number larger than 3 is used.
* config/avr/gen-avr-mmcu-specs.cc (McuInfo.have_cvt): New property. (print_mcu) <*avrlibc_startfile>: Use crt<mcu>-cvt.o depending on -mcvt (or issue an error when the device doesn't support a CVT).
* doc/invoke.texi (AVR Options): Document -mcvt.
2025-02-06 | AVR: genmultilib.awk - Use more robust parsing of spaces. | Georg-Johann Lay | 1 | -5/+32
gcc/ PR target/118768 * config/avr/genmultilib.awk: Parse the AVR_MCU lines in a more robust way w.r.t. white spaces.
2025-02-06 | LoongArch: Fix ICE caused by illegal calls to builtin functions [PR118561]. | Lulu Cheng | 1 | -2/+5
PR target/118561 gcc/ChangeLog: * config/loongarch/loongarch-builtins.cc (loongarch_expand_builtin_lsx_test_branch): NULL_RTX will not be returned when an error is detected. (loongarch_expand_builtin): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/pr118561.c: New test.
2025-02-05 | [committed] Disable ABS instruction on bfin port | Jeff Law | 1 | -6/+9
I was looking at a regression on the bfin port with a recent change to the IRA and stumbled across this while doing a general port healthiness evaluation. The ABS instruction in the blackfin ISA is defined as saturating on INT_MIN, which is a bit unexpected. We certainly can't use it when -fwrapv is enabled. Given the failures on the C23 uabs tests, I'm inclined to just disable the pattern completely. Fixes pr23047, uabs-2 and uabs-3. While it's not a regression, it's the blackfin port, so I think we've got a higher degree of freedom here. Pushing to the trunk.

gcc/
* config/bfin/bfin.md (abssi): Disable pattern.
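The mismatch can be illustrated on plain integers. The sketch below models a saturating absolute value, as this message describes for the Blackfin ABS instruction, next to an absolute value with an unsigned result of the kind the uabs tests exercise; both function names are hypothetical. The two agree everywhere except at INT_MIN.

```c
#include <assert.h>
#include <limits.h>

/* Saturating absolute value: INT_MIN maps to INT_MAX. */
int abs_saturating(int x)
{
    if (x == INT_MIN)
        return INT_MAX;
    return x < 0 ? -x : x;
}

/* Absolute value with an unsigned result, which can represent the
   magnitude of INT_MIN.  The negation is done in unsigned arithmetic so
   it is well defined. */
unsigned abs_unsigned(int x)
{
    return x < 0 ? 0u - (unsigned)x : (unsigned)x;
}

/* Usage example: the results differ by one at INT_MIN. */
void check_abs_models(void)
{
    assert(abs_saturating(-5) == 5);
    assert(abs_unsigned(-5) == 5u);
    assert(abs_saturating(INT_MIN) == INT_MAX);
    assert(abs_unsigned(INT_MIN) == (unsigned)INT_MAX + 1u);
}
```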
2025-02-05aarch64: Fix sve/acle/general/ldff1_8.c failuresRichard Sandiford1-1/+18
gcc.target/aarch64/sve/acle/general/ldff1_8.c and gcc.target/aarch64/sve/ptest_1.c were failing because the aarch64 port was giving a zero (unknown) cost to instructions that compute two results in parallel. This was latent until r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs as unknown. A long-standing todo here is to make insn_cost derive costs from md information, rather than having to write a lot of matching code in aarch64_rtx_costs. But that's not something we can do for GCC 15. This patch instead treats the cost of a PARALLEL as being the maximum cost of its constituent sets. I don't like this very much, since it isn't really target-specific behaviour. If it were stage 1, I'd be trying to change pattern_cost instead. gcc/ * config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs the same cost as the costliest SET.
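As a rough sketch of the costing rule described above (not the actual aarch64_insn_cost code), the cost of a PARALLEL can be modelled as the maximum over its constituent SETs. The function name and the cost values are made up for illustration, and a cost of zero is assumed to mean "unknown".

```python
def parallel_cost(set_costs):
    """Return the cost of the costliest SET, or 0 (unknown) if there are none."""
    return max(set_costs, default=0)
```

With this rule, an instruction computing two results with costs 4 and 8 is costed as 8 rather than as 0.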
2025-02-05arm: Use POP {pc} to return when returning [PR118089]Richard Earnshaw1-27/+35
When generating thumb2 code, LDM SP!, {PC} is a two-byte instruction, whereas LDR PC, [SP], #4 needs 4 bytes. When optimizing for size, or when there's no obvious performance benefit, prefer the former. gcc/ChangeLog: PR target/118089 * config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC} when optimizing for size, or when there's no performance benefit over LDR PC, [SP], #4. (arm_expand_epilogue): Likewise.
2025-02-05arm: remove constraints from *pop_multiple_with_writeback_and_returnRichard Earnshaw1-5/+3
This pattern is intended to be used only by the epilogue generation code and will always use fixed hard registers. As such, it does not need any register constraints, which might be misleading if a post-reload pass wanted to try renumbering various registers. So remove the constraints. Furthermore, to permit this pattern to match when popping just the PC (which is not a valid register_operand), remove the match on the first transfer register: pop_multiple_return will validate everything it needs to. gcc/ChangeLog: * config/arm/arm.md (*pop_multiple_with_writeback_and_return): Remove constraints. Don't validate the first transfer register here.
2025-02-05arm: cleanup code in ldm_stm_operation_p; relax limits on ldm/stmRichard Earnshaw1-98/+126
I needed to make some adjustments to this function to permit a push or pop of a single register in thumb2 code, since ldm/stm can be a two-byte instruction instead of 4. Trying to read the code as it was made me scratch my head as the logic was not very clear. So this patch cleans up the code somewhat, fixes a couple of minor bugs and removes the limit of having to use multiple registers when using this form of the instruction (the shape of this pattern is such that I can't see it being generated automatically by the compiler, so there should be no adverse effects of this). Buglets fixed: - Validate that the first element contains RETURN if we're matching a return instruction. - Don't allow the base address register to be stored if saving regs and the address is being updated (this is unpredictable in the architecture). - Verify that the last register loaded in a RETURN insn is the PC. gcc/ * config/arm/arm.cc (decompose_addr_for_ldm_stm): New function. (ldm_stm_operation_p): Rework to clarify logic. Allow single registers to be pushed or popped using LDM/STM.
2025-02-05RTEMS: Add Cortex-M33 multilibSebastian Huber1-2/+3
Enable use of Armv8-M instruction set. Account for CVE-2021-35465 mitigation [PR102035]. The -mfix-cmse-cve-2021-35465 option is enabled by default, if -mcpu=cortex-m33 is used. gcc/ * config/arm/t-rtems: Add Cortex-M33 multilib.
2025-02-04IBM zSystems: Do not use @PLT with larlIlya Leoshkevich2-17/+7
Commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local functions in 64-bit mode") made GCC call both static and non-static functions and load both static and non-static function addresses with the @PLT suffix. This made it difficult for linkers to distinguish calling and address taking instructions [1]. It is currently assumed that the R_390_PLT32DBL relocation, corresponding to the @PLT suffix, is used only for calling, and the R_390_PC32DBL relocation, corresponding to the empty suffix, is used only for address taking. Linkers need to make this distinction in order to decide whether to ask ld.so to use canonical PLT entries. Normally GOT entries in shared objects contain addresses of the respective functions, with one notable exception: when a no-pie executable calls the respective function and also takes its address. Such executables assume that all addresses are known in advance, so they use addresses of the respective PLT entries. For consistency reasons, all respective GOT entries in the process must also use them. When a linker sees that a no-pie executable both calls a function and also takes its address, it creates a PLT entry and asks ld.so to consider it canonical by setting the respective undefined symbol's address, which is normally 0, to the address of this PLT entry. Improve the situation by not using @PLT with larl. Now that @PLT is not used with larl, also drop the 31-bit handling, which was required because 31-bit PLT entries require %r12 to point to the respective object's GOT, and this requirement is not satisfied when calling them by pointer from another object. Also drop the weak symbol handling, which was required because it is not possible to load an undefined weak symbol address (0) using larl. [1] https://sourceware.org/bugzilla/show_bug.cgi?id=29655 gcc/ChangeLog: * config/s390/s390.cc (print_operand): Remove the no longer necessary 31-bit and weak symbol handling. * config/s390/s390.md (*movdi_64): Do not use @PLT with larl. 
(*movsi_larl): Likewise. (main_base_64): Likewise. (reload_base_64): Likewise. gcc/testsuite/ChangeLog: * gcc.target/s390/call-z10-pic-nodatarel.c: Adjust expectations. * gcc.target/s390/call-z10-pic.c: Likewise. * gcc.target/s390/call-z10.c: Likewise. * gcc.target/s390/call-z9-pic-nodatarel.c: Likewise. * gcc.target/s390/call-z9-pic.c: Likewise. * gcc.target/s390/call-z9.c: Likewise.
2025-02-03i386: Fix and improve TARGET_INDIRECT_BRANCH_REGISTER handling some moreUros Bizjak2-25/+10
gcc/ChangeLog: * config/i386/i386.md (*sibcall_pop_memory): Disable for TARGET_INDIRECT_BRANCH_REGISTER. * config/i386/predicates.md (call_insn_operand): Enable when "satisfies_constraint_Bw (op)" is true, instead of open-coding constraint here. (sibcall_insn_operand): Ditto with "satisfies_constraint_Bs (op)".
2025-02-03aarch64: Fix dupq_* testsuite failuresRichard Sandiford1-28/+82
This patch fixes the dupq_* testsuite failures. The tests were introduced with r15-3669-ga92f54f580c3 (which was a nice improvement) and Pengxuan originally had a follow-on patch to recognise INDEX constants during vec_init. I'd originally wanted to solve this a different way, using wildcards when building a vector and letting vector_builder::finalize find the best way of filling them in. I no longer think that's the best approach though. Stepped constants are likely to be more expensive than unstepped constants, so we should first try finding an unstepped constant that is valid, even if it has a longer representation than the stepped version. This patch therefore uses a variant of Pengxuan's idea. While there, I noticed that the (old) code for finding an unstepped constant only tried varying one bit at a time. So for index 0 in a 16-element constant, the code would try picking a constant from index 8, 4, 2, and then 1. But since the goal is to create "fewer, larger, repeating parts", it would be better to iterate over a bit-reversed increment, so that after trying an XOR with 0 and 8, we try adding 4 to each previous attempt, then 2 to each previous attempt, and so on. In the previous example this would give 8, 4, 12, 2, 10, 6, 14, ... The test shows an example of this for 8 shorts. gcc/ * config/aarch64/aarch64.cc (aarch64_choose_vector_init_constant): New function, split out from... (aarch64_expand_vector_init_fallback): ...here. Use a bit- reversed increment to find a constant index. Add support for stepped constants. gcc/testsuite/ * gcc.target/aarch64/sve/acle/general/dupq_12.c: New test.
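The bit-reversed increment described above can be sketched as follows. This is an illustration of the ordering only, not the code in aarch64_expand_vector_init_fallback; the function names are hypothetical and the element count is assumed to be a power of two.

```python
def bit_reversed_offsets(nelts):
    """Yield 0..nelts-1 in bit-reversed counting order."""
    if nelts <= 0 or nelts & (nelts - 1):
        raise ValueError("nelts must be a positive power of two")
    bits = nelts.bit_length() - 1
    for i in range(nelts):
        rev = 0
        # Mirror bit b of the counter into bit (bits - 1 - b) of the offset.
        for b in range(bits):
            if i & (1 << b):
                rev |= 1 << (bits - 1 - b)
        yield rev


def candidate_indices(index, nelts):
    """Indices to try for `index`, XORing with each bit-reversed offset."""
    return [index ^ off for off in bit_reversed_offsets(nelts)]
```

For index 0 in a 16-element constant, `candidate_indices(0, 16)` starts 0, 8, 4, 12, 2, 10, 6, 14, which matches the order given in the message: the largest repeating parts are tried first.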
2025-02-03hppa: Revise various millicode insn patterns to use match_operandJohn David Anglin2-20/+52
LRA does not correctly support hard-register input operands that are clobbered. This is needed to support millicode calls on hppa. The operand setup is sometimes deleted. This problem can be avoided by hiding hard-register input operands using match_operand. This also potentially allows for constraints that specify the operand is both read and written. 2025-02-03 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: PR rtl-optimization/117248 * config/pa/predicates.md (r25_operand): New predicate. (r26_operand): Likewise. * config/pa/pa.md: Use match_operand for r25 and r26 hard register operands in mult, div, udiv, mod and umod millicode patterns.
2025-02-02x86: Change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)"H.J. Lu1-1/+1
Update commit dd6247cb8fc11a15e23e949092f89d24ff329209 Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Jan 31 12:29:04 2025 +0800 x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt to change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)". PR target/118713 * config/i386/i386-expand.cc (ix86_expand_call): Change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)". Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-02-01x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-pltH.J. Lu6-24/+27
If TARGET_INDIRECT_BRANCH_REGISTER is true, indirect call and jump should use register, not memory. Update Bs, Bw and Bz constraints to disable indirect call over memory if TARGET_INDIRECT_BRANCH_REGISTER is true, change x32 call over GOT slot to call over register and also disable sibcall over memory. gcc/ PR target/118713 * config/i386/constraints.md (Bs): Always disable if TARGET_INDIRECT_BRANCH_REGISTER is true. (Bw): Likewise. * config/i386/i386-expand.cc (ix86_expand_call): Force indirect call via register for x32 GOT slot call if TARGET_INDIRECT_BRANCH_REGISTER is true. * config/i386/i386-protos.h (ix86_nopic_noplt_attribute_p): New. * config/i386/i386.cc (ix86_nopic_noplt_attribute_p): Make it global. * config/i386/i386.md (*call_got_x32): Disable indirect call via memory for TARGET_INDIRECT_BRANCH_REGISTER. (*call_value_got_x32): Likewise. (*sibcall_value_pop_memory): Likewise. * config/i386/predicates.md (constant_call_address_operand): Return false if both TARGET_INDIRECT_BRANCH_REGISTER and ix86_nopic_noplt_attribute_p are true. gcc/testsuite/ PR target/118713 * gcc.target/i386/pr118713-1-x32.c: New test. * gcc.target/i386/pr118713-1.c: Likewise. * gcc.target/i386/pr118713-2-x32.c: Likewise. * gcc.target/i386/pr118713-2.c: Likewise. * gcc.target/i386/pr118713-3-x32.c: Likewise. * gcc.target/i386/pr118713-3.c: Likewise. * gcc.target/i386/pr118713-4-x32.c: Likewise. * gcc.target/i386/pr118713-4.c: Likewise. * gcc.target/i386/pr118713-5-x32.c: Likewise. * gcc.target/i386/pr118713-5.c: Likewise. * gcc.target/i386/pr118713-6-x32.c: Likewise. * gcc.target/i386/pr118713-6.c: Likewise. * gcc.target/i386/pr118713-7-x32.c: Likewise. * gcc.target/i386/pr118713-7.c: Likewise. * gcc.target/i386/pr118713-8-x32.c: Likewise. * gcc.target/i386/pr118713-8.c: Likewise. * gcc.target/i386/pr118713-9-x32.c: Likewise. * gcc.target/i386/pr118713-9.c: Likewise. * gcc.target/i386/pr118713-10-x32.c: Likewise. * gcc.target/i386/pr118713-10.c: Likewise. 
* gcc.target/i386/pr118713-11-x32.c: Likewise. * gcc.target/i386/pr118713-11.c: Likewise. * gcc.target/i386/pr118713-12-x32.c: Likewise. * gcc.target/i386/pr118713-12.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-01-30AVR: Provide built-ins for strlen where the string lives in some AS.Georg-Johann Lay2-1/+40
This patch adds built-in functions __builtin_avr_strlen_flash, __builtin_avr_strlen_flashx and __builtin_avr_strlen_memx. Purpose is that higher-level functions can use __builtin_constant_p on strlen without raising a diagnostic due to -Waddr-space-convert. gcc/ * config/avr/builtins.def (STRLEN_FLASH, STRLEN_FLASHX) (STRLEN_MEMX): New DEF_BUILTIN's. * config/avr/avr.cc (avr_ftype_strlen): New static function. (avr_builtin_supported_p): New built-ins are not for AVR_TINY. (avr_init_builtins) <strlen_flash_node, strlen_flashx_node, strlen_memx_node>: Provide new fntypes. (avr_fold_builtin) [AVR_BUILTIN_STRLEN_FLASH] [AVR_BUILTIN_STRLEN_FLASHX, AVR_BUILTIN_STRLEN_MEMX]: Fold if possible. * doc/extend.texi (AVR Built-in Functions): Document __builtin_avr_strlen_flash, __builtin_avr_strlen_flashx, __builtin_avr_strlen_memx. libgcc/ * config/avr/t-avr (LIB1ASMFUNCS): Add _strlen_memx. * config/avr/lib1funcs.S <L_strlen_memx, __strlen_memx>: Implement.
2025-01-30AVR: Only provide a built-in when it is available.Georg-Johann Lay4-4/+32
Some built-ins are not available for C++ since they are using named address-spaces or fixed-point types. gcc/ * config/avr/builtins.def (AVR_FIRST_C_ONLY_BUILTIN_ID): New macro. * config/avr/avr-protos.h (avr_builtin_supported_p): New. * config/avr/avr.cc (avr_builtin_supported_p): New function. (avr_init_builtins): Only provide a built-in when it is supported. * config/avr/avr-c.cc (avr_cpu_cpp_builtins): Only define the __BUILTIN_AVR_<NAME> built-in defines when the associated built-in function is supported. * doc/extend.texi (AVR Built-in Functions): Add a note that the following built-ins are supported only for GNU-C.
2025-01-30s390: Fix up *vec_cmpgt{,u}<mode><mode>_nocc_emu splitters [PR118696]Jakub Jelinek1-2/+2
The following testcase is miscompiled on s390x-linux with e.g. -march=z13 (both -O0 and -O2) starting with r15-7053. The problem is in the splitters which emulate TImode/V1TImode GT and GTU comparisons. For GT we want to do (ior (gt (hi op1) (hi op2)) (and (eq (hi op1) (hi op2)) (gtu (lo op1) (lo op2)))) and for GTU similarly except for gtu instead of gt in there. Now, the splitter emulation is using V2DImode comparisons where on s390x the hi part is in the first element of the vector, lo part in the second, and for the gtu case it swaps the elements of the vector. So, we get the right result in the first element of the result vector. But vrepg was then broadcasting the second element of the result vector rather than the first, and the value of the second element of the vector is instead (ior (gt (lo op1) (lo op2)) (and (eq (lo op1) (lo op2)) (gtu (hi op1) (hi op2)))) so something not really usable for the emulated comparison. The following patch fixes that. The testcase tries to test behavior of double-word smin/smax/umin/umax with various cases of the halves of both operands (one that is sometimes EQ, sometimes GT, sometimes LT, sometimes GTU, sometimes LTU). 2025-01-30 Jakub Jelinek <jakub@redhat.com> Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org> PR target/118696 * config/s390/vector.md (*vec_cmpgt<mode><mode>_nocc_emu, *vec_cmpgtu<mode><mode>_nocc_emu): Duplicate the first rather than second V2DImode element. * gcc.dg/pr118696.c: New test. * gcc.target/s390/vector/pr118696.c: New test. * gcc.target/s390/vector/vec-abs-emu.c: Expect vrepg with 0 as last operand rather than 1. * gcc.target/s390/vector/vec-max-emu.c: Likewise. * gcc.target/s390/vector/vec-min-emu.c: Likewise.
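The double-word comparison that the splitters emulate can be written out directly. The following is a minimal model of the formula quoted above, not the s390 vector code; the helper names are hypothetical, and the inputs are assumed to be 128-bit values split into 64-bit hi and lo halves.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x):
    """Interpret the low 64 bits of x as a signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def split128(x):
    """Return the (hi, lo) 64-bit halves of the low 128 bits of x."""
    x &= (1 << 128) - 1
    return (x >> 64) & MASK64, x & MASK64


def gt128(a, b, signed):
    """(hi1 > hi2) | ((hi1 == hi2) & (lo1 >u lo2)); the hi compare is signed for GT."""
    a_hi, a_lo = split128(a)
    b_hi, b_lo = split128(b)
    if signed:
        hi_gt = to_signed64(a_hi) > to_signed64(b_hi)
    else:
        hi_gt = a_hi > b_hi
    # The low halves are always compared unsigned.
    return hi_gt or (a_hi == b_hi and a_lo > b_lo)
```

Only the comparison of the high halves differs between GT and GTU. Swapping the roles of hi and lo, as the broadcast of the wrong element effectively did, gives a result that does not correspond to either comparison.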
2025-01-29AVR: Allow to share libgcc's __negsi2.Georg-Johann Lay1-0/+9
libgcc has a module for __negsi2: REG_22:SI := - REG_22:SI. This patch adds a pattern that allows that function to be shared, provided optimize_size is set. gcc/ * config/avr/avr.md (*negsi2.libgcc): New insn.
2025-01-29[PATCH] RX: Restrict displacement ranges in "Q" constraintYoshinori Sato1-1/+2
When using the "Q" constraint in the inline assembler, the displacement value could exceed the range specified by the instruction. To avoid this issue, a displacement range check is added to the "Q" constraint. gcc/ * config/rx/constraints.md (Q): Also check that the address passes rx_is_restricted_memory_address.
2025-01-29RISC-V: Fix incorrect code gen for scalar signed SAT_TRUNC [PR117688]Pan Li1-1/+1
This patch fixes the wrong code generation for the scalar signed SAT_TRUNC. The input can be QI/HI/SI/DI, while ALU instructions such as sub only work on Xmode. Since there is no scalar sub/add for non-Xmode modes such as QImode, we need to sign extend to Xmode to ensure we have the correct value before the ALU operation. The gen_lowpart will generate something like lbu, which leaves the highest bits all zero. For example, when truncating 0xff7f (-129 for HImode) to QImode, we actually want to compare -129 to -128, but without sign extension (as with lbu) we will compare 0xff7f to 0xffffffffffffff80 (assuming Xmode is DImode). Thus, we have to sign extend 0xff7f (HImode) to 0xffffffffffffff7f (assuming Xmode is DImode) before the compare in Xmode. The test suites below passed for this patch. * The rv64gcv full regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688.h: Add test helper macros. * gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test. * gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
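The effect of the missing sign extension on a saturating truncation can be reproduced with a small model. This is a sketch of the arithmetic only, not riscv_expand_sstrunc; the function names and the `extend` parameter are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


def sat_trunc_signed(value, src_bits, dst_bits, extend=sign_extend):
    """Clamp a src_bits-wide input to the signed dst_bits-wide range."""
    wide = extend(value, src_bits)
    lo = -(1 << (dst_bits - 1))
    hi = (1 << (dst_bits - 1)) - 1
    return max(lo, min(hi, wide))
```

For the input 0xff7f truncated from 16 to 8 bits, the sign-extending model compares -129 against -128 and saturates to -128, whereas passing `extend=zero_extend` sees 65407 and saturates to 127.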
2025-01-29RISC-V: Fix incorrect code gen for scalar signed SAT_SUB [PR117688]Pan Li1-2/+2
This patch fixes the wrong code generation for the scalar signed SAT_SUB. The input can be QI/HI/SI/DI, while ALU instructions such as sub only work on Xmode. Since there is no scalar sub/add for non-Xmode modes such as QImode, we need to sign extend to Xmode to ensure we have the correct value before the ALU operation. The gen_lowpart will generate something like lbu, which leaves the highest bits all zero. For example, for 0xff (-1 for QImode) minus 0x1 (1 for QImode), we actually want -1 - 1 = -2, but without sign extension (as with lbu) we will get 0xff - 1 = 0xfe, which is incorrect. Thus, we have to sign extend 0xff (QImode) to 0xffffffffffffffff (assuming Xmode is DImode) before the sub in Xmode. The test suites below passed for this patch. * The rv64gcv full regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688.h: Add test helper macro. * gcc.target/riscv/pr117688-sub-run-1-s16.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s32.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s64.c: New test. * gcc.target/riscv/pr117688-sub-run-1-s8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-01-29RISC-V: Fix incorrect code gen for scalar signed SAT_ADD [PR117688]Pan Li1-2/+2
This patch fixes the wrong code generation for the scalar signed SAT_ADD. The input can be QI/HI/SI/DI, while ALU instructions such as sub only work on Xmode. Since there is no scalar sub/add for non-Xmode modes such as QImode, we need to sign extend to Xmode to ensure we have the correct value before the ALU operation. The gen_lowpart will generate something like lbu, which leaves the highest bits all zero. For example, for 0xff (-1 for QImode) plus 0x2 (2 for QImode), we actually want -1 + 2 = 1, but without sign extension (as with lbu) we will get 0xff + 2 = 0x101, which is incorrect. Thus, we have to sign extend 0xff (QImode) to 0xffffffffffffffff (assuming Xmode is DImode) before the plus in Xmode. The test suites below passed for this patch. * The rv64gcv full regression test. PR target/117688 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper riscv_extend_to_xmode_reg with SIGN_EXTEND. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr117688-add-run-1-s16.c: New test. * gcc.target/riscv/pr117688-add-run-1-s32.c: New test. * gcc.target/riscv/pr117688-add-run-1-s64.c: New test. * gcc.target/riscv/pr117688-add-run-1-s8.c: New test. * gcc.target/riscv/pr117688.h: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
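The QImode addition from this message can be checked with a short model. It is a sketch under the assumption that the operands arrive as raw bit patterns and that the add happens in a wider mode; it is not riscv_expand_ssadd, and the function names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def zero_extend(value, bits):
    """Interpret the low `bits` bits of value as an unsigned integer."""
    return value & ((1 << bits) - 1)


def sat_add_signed(a, b, bits, extend=sign_extend):
    """Add two `bits`-wide operands in a wider mode, then clamp to the signed range."""
    total = extend(a, bits) + extend(b, bits)
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, total))
```

With sign extension, 0xff plus 0x02 in 8 bits is -1 + 2 = 1. With `extend=zero_extend` the wide sum is 0x101 (257), which then saturates to 127 instead.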
2025-01-29RISC-V: Refactor SAT_* operand rtx extend to reg help func [NFC]Pan Li1-29/+49
This patch refactors the helper function for the scalar SAT_* operations. The helper function converts the define_pattern operands to an Xmode reg for the underlying code generation. This patch adds a new parameter selecting ZERO_EXTEND or SIGN_EXTEND, used if the input is a const_int or the mode is non-Xmode. The test suites below passed for this patch. * The rv64gcv full regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ... (riscv_extend_to_xmode_reg): Rename to and add rtx_code for zero/sign extend if non-Xmode. (riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND. (riscv_expand_ussub): Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-01-27[PR target/114085] Fix H8 constraint issue which led to ICEJeff Law3-11/+7
Nowhere near the top of my list, but a quick looksie Sunday led to an easy to fix backend bug. It's not a regression, but given it's the H8 backend I think we've safely got a degree of freedom here. The H8 has a constraint "U" which allowed both a subset of MEMs and REGs, so it wasn't marked as a memory constraint. LRA doesn't really handle this well -- a pseudo which didn't get a hard reg was replaced by its MEM. The stack slot doesn't fit the limited addressing forms available and LRA didn't know it just needed to reload the address into a reg. Fixed by removing REG from the "U" constraint, turning "U" into a memory constraint and adjusting a few patterns to allow "rU" instead of "U". We don't really support C++ on the H8 and as a result libstdc++ won't build. Interestingly enough that also keeps the C++ tests from working, even for a compile-only test. So no testcase. Though I did check the reduced and original test manually and ran it through my tester without any regressions. PR target/114085 gcc/ * config/h8300/constraints.md (U): No longer accept REGs. * config/h8300/logical.md (andqi3_2): Use "rU" rather than "U". (andqi3_2_clobber_flags, andqi3_1, <code>qi3_1): Likewise. * config/h8300/testcompare.md (tst_extzv_1_n): Likewise.