path: root/gcc/config

2025-08-22  RTEMS: Add riscv multilibs  (Sebastian Huber; 1 file, -2/+7)

gcc/ChangeLog:
    * config/riscv/t-rtems: Add -mstrict-align multilibs for targets
      without support for misaligned access in hardware.

2025-08-21  pru: libgcc: Add software implementation for multiplication  (Dimitar Dimitrov; 1 file, -1/+10)

For cores without a hardware multiplier, set the respective optabs to
library functions which use a software implementation of multiplication.
The implementation was copied from the RL78 backend.

gcc/ChangeLog:
    * config/pru/pru.cc (pru_init_libfuncs): Set softmpy libgcc
      functions for optab multiplication entries if TARGET_OPT_MUL
      option is not set.

libgcc/ChangeLog:
    * config/pru/libgcc-eabi.ver: Add __pruabi_softmpyi and
      __pruabi_softmpyll symbols.
    * config/pru/t-pru: Add softmpy source files.
    * config/pru/pru-softmpy.h: New file.
    * config/pru/softmpyi.c: New file.
    * config/pru/softmpyll.c: New file.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

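As an illustrative sketch (not from the commit), a software multiply of
this kind is typically a shift-add loop; the routine name and body here
are hypothetical stand-ins for what __pruabi_softmpyi provides, whose
real implementation is copied from the RL78 backend:

    /* Shift-add multiplication: add the multiplicand once for every
       set bit of the multiplier.  The low-order result bits are the
       same for signed operands in two's complement.  */
    unsigned int
    softmpyi (unsigned int a, unsigned int b)
    {
      unsigned int result = 0;
      while (b != 0)
        {
          if (b & 1)
            result += a;
          a <<= 1;   /* the next bit of b weighs twice as much */
          b >>= 1;
        }
      return result;
    }
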
2025-08-21  pru: Define multilib for different core variants  (Dimitar Dimitrov; 2 files, -0/+32)

Enable multilib builds for contemporary PRU core versions (AM335x and
later), and older versions present in AM18xx.

gcc/ChangeLog:
    * config.gcc: Include pru/t-multilib.
    * config/pru/pru.h (MULTILIB_DEFAULTS): Define.
    * config/pru/t-multilib: New file.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

2025-08-21  pru: Add options to disable MUL/FILL/ZERO instructions  (Dimitar Dimitrov; 4 files, -13/+31)

Older PRU core versions (e.g. in AM1808 SoC) do not support XIN, XOUT,
FILL, ZERO instructions.  Add GCC command line options to optionally
disable generation of those instructions, so that code can be executed
on such older PRU cores.

gcc/ChangeLog:
    * common/config/pru/pru-common.cc (TARGET_DEFAULT_TARGET_FLAGS):
      Keep multiplication, FILL and ZERO instructions enabled by
      default.
    * config/pru/pru.md (prumov<mode>): Gate code generation on
      TARGET_OPT_FILLZERO.
      (mov<mode>): Ditto.
      (zero_extendqidi2): Ditto.
      (zero_extendhidi2): Ditto.
      (zero_extendsidi2): Ditto.
      (@pru_ior_fillbytes<mode>): Ditto.
      (@pru_and_zerobytes<mode>): Ditto.
      (@<code>di3): Ditto.
      (mulsi3): Gate code generation on TARGET_OPT_MUL.
    * config/pru/pru.opt: Add mmul and mfillzero options.
    * config/pru/pru.opt.urls: Regenerate.
    * config/rl78/rl78.opt.urls: Regenerate.
    * doc/invoke.texi: Document new options.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>

2025-08-21  x86-64: Emit the TLS call after NOTE_INSN_BASIC_BLOCK  (H.J. Lu; 1 file, -3/+21)

For a basic block with only a label:

    (code_label 78 11 77 3 14 (nil) [1 uses])
    (note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before
NOTE_INSN_BASIC_BLOCK, to avoid

    x.c: In function ‘aout_16_write_syms’:
    x.c:54:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 3
       54 | }
          | ^
    x.c:54:1: error: NOTE_INSN_BASIC_BLOCK 77 in middle of basic block 3
    during RTL pass: x86_cse
    x.c:54:1: internal compiler error: verify_flow_info failed

gcc/
    PR target/121607
    * config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
      TLS call after NOTE_INSN_BASIC_BLOCK in a basic block with
      only a label.

gcc/testsuite/
    PR target/121607
    * gcc.target/i386/pr121607-1a.c: New test.
    * gcc.target/i386/pr121607-1b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

2025-08-21  xtensa: Small improvement to "*btrue_INT_MIN"  (Takayuki 'January June' Suwa; 1 file, -10/+7)

This patch changes the implementation of the insn to test whether the
result itself is negative or not, rather than the MSB of the result of
the ABS machine instruction.  This eliminates the need to consider
bit-endianness and allows for longer branch distances.

    /* example */
    extern void foo(int);
    void test0(int a) {
      if (a == -2147483648)
        foo(a);
    }
    void test1(int a) {
      if (a != -2147483648)
        foo(a);
    }

    ;; before (endianness: little)
    test0:
        entry   sp, 32
        abs     a8, a2
        bbci    a8, 31, .L1
        mov.n   a10, a2
        call8   foo
    .L1:
        retw.n
    test1:
        entry   sp, 32
        abs     a8, a2
        bbsi    a8, 31, .L4
        mov.n   a10, a2
        call8   foo
    .L4:
        retw.n

    ;; after (endianness-independent)
    test0:
        entry   sp, 32
        abs     a8, a2
        bgez    a8, .L1
        mov.n   a10, a2
        call8   foo
    .L1:
        retw.n
    test1:
        entry   sp, 32
        abs     a8, a2
        bltz    a8, .L4
        mov.n   a10, a2
        call8   foo
    .L4:
        retw.n

gcc/ChangeLog:
    * config/xtensa/xtensa.md (*btrue_INT_MIN): Change the branch
      insn condition to test for a negative number rather than
      testing for the MSB.

2025-08-20  Merge aarch64-cc-fusion into late-combine  (Richard Sandiford; 4 files, -305/+0)

I'd added the aarch64-specific CC fusion pass to fold a PTEST
instruction into the instruction that feeds the PTEST, in cases where
the latter instruction can set the appropriate flags as a side-effect.

Combine does the same optimisation.  However, as explained in the
comments, the PTEST case often has:

    A: set predicate P based on inputs X
    B: clobber X
    C: test P

and so the fusion is only possible if we move C before B.  That's
something that combine currently can't do (for the cases that we
needed).

The optimisation was never really AArch64-specific.  It's just that, in
an all-too-familiar fashion, we needed it in stage 3, when it was too
late to add something target-independent.  late-combine adds a
convenient place to do the optimisation in a target-independent way,
just as combine is a convenient place to do its related optimisation.

gcc/
    * config.gcc (aarch64*-*-*): Remove aarch64-cc-fusion.o from
      extra_objs.
    * config/aarch64/aarch64-passes.def (pass_cc_fusion): Delete.
    * config/aarch64/aarch64-protos.h (make_pass_cc_fusion): Delete.
    * config/aarch64/t-aarch64 (aarch64-cc-fusion.o): Delete.
    * config/aarch64/aarch64-cc-fusion.cc: Delete.
    * late-combine.cc (late_combine::optimizable_set): Take a
      set_info * rather than an insn_info * and move destination
      tests from...
      (late_combine::combine_into_uses): ...here.  Take a set_info *
      rather than an insn_info *.  Take the rtx set.
      (late_combine::parallelize_insns, late_combine::combine_cc_setter)
      (late_combine::combine_insn): New member functions.
      (late_combine::m_parallel): New member variable.
    * rtlanal.cc (pattern_cost): Handle sets of CC registers in the
      same way as comparisons.

2025-08-20  AVR: target/121608 - Don't add --relax when linking with -r.  (Georg-Johann Lay; 1 file, -1/+1)

The linker rejects --relax in relocatable links (-r), hence only add
--relax when -r is not specified.

gcc/
    PR target/121608
    * config/avr/specs.h (LINK_RELAX_SPEC): Wrap in %{!r...}.

2025-08-19  x86: Place the TLS call before all register setting BBs  (H.J. Lu; 1 file, -106/+228)

We can't place a TLS call before a conditional jump in a basic block
like

    (code_label 13 11 14 4 2 (nil) [1 uses])
    (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
    (jump_insn 16 14 17 4 (set (pc)
        (if_then_else (le (reg:CCNO 17 flags)
                (const_int 0 [0]))
            (label_ref 27)
            (pc))) "x.c":10:21 discrim 1 1462 {*jcc}
         (expr_list:REG_DEAD (reg:CCNO 17 flags)
            (int_list:REG_BR_PROB 628353713 (nil)))
     -> 27)

since the TLS call will clobber the flags register, nor can we place a
TLS call in a basic block if any live caller-saved registers aren't
dead at the end of the basic block:

    ;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104
    ;; live gen 0 [ax] 102 106 108 116 117 118 120
    ;; live kill 5 [di]

Instead, we should place such a call before all register-setting basic
blocks which dominate the current basic block.  Keep track of the
replaced GNU and GNU2 TLS instructions.  Use this info to place the
__tls_get_addr call and mark the FLAGS register as dead.

gcc/
    PR target/121572
    * config/i386/i386-features.cc (replace_tls_call): Add a bitmap
      argument and put the updated TLS instruction in the bitmap.
      (ix86_get_dominator_for_reg): New.
      (ix86_check_flags_reg): Likewise.
      (ix86_emit_tls_call): Likewise.
      (ix86_place_single_tls_call): Add 2 bitmap arguments for
      updated GNU and GNU2 TLS instructions.  Call ix86_emit_tls_call
      to emit TLS instruction.  Correct debug dump for before
      instruction.

gcc/testsuite/
    PR target/121572
    * gcc.target/i386/pr121572-1a.c: New test.
    * gcc.target/i386/pr121572-1b.c: Likewise.
    * gcc.target/i386/pr121572-2a.c: Likewise.
    * gcc.target/i386/pr121572-2b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

2025-08-19  AArch64: Use vectype from SLP node instead of stmt_info [PR121536]  (Tamar Christina; 1 file, -5/+6)

commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in
STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized
and instead requires the use of SLP_TREE_VECTYPE for everything but
data-refs.

This means that STMT_VINFO_VECTYPE (stmt_info) will always be NULL, and
so aarch64_bool_compound_p will never properly cost predicate AND
operations anymore, resulting in less vectorization.  This patch
changes it to use SLP_TREE_VECTYPE and pass the slp_node to
aarch64_bool_compound_p.

gcc/ChangeLog:
    PR target/121536
    * config/aarch64/aarch64.cc (aarch64_bool_compound_p): Use
      SLP_TREE_VECTYPE instead of STMT_VINFO_VECTYPE.
      (aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops):
      Pass SLP node to aarch64_bool_compound_p.

gcc/testsuite/ChangeLog:
    PR target/121536
    * g++.target/aarch64/sve/pr121536.cc: New test.

2025-08-19  AArch64: Fix scalar costing after removal of vectype from mid-end [PR121536]  (Tamar Christina; 1 file, -0/+11)

commit g:fb59c5719c17a04ecfd58b5e566eccd6d2ac583a stops passing the
scalar type (confusingly named vectype) to the costing hook when doing
scalar costing.  As a result, we could no longer distinguish between
FPR and GPR scalar stmts.  A later commit also removed
STMT_VINFO_VECTYPE from stmt_info.

This leaves getting the type of the original stmt in the stmt_info as
the only remaining option.  This patch does this when we're performing
scalar costing.

Ideally I'd refactor this a bit, because a lot of the hooks just need
to know whether it's FP or not, but this seems pointless with the
ongoing costing churn.  So for now this restores our costing.

gcc/ChangeLog:
    PR target/121536
    * config/aarch64/aarch64.cc
      (aarch64_vector_costs::add_stmt_cost): Set vectype from type
      of lhs of gimple stmt.

2025-08-18  aarch64: add new constants for MTE insns  (Indu Bhagat; 1 file, -4/+14)

Define new constants to be used by the MTE pattern definitions.

gcc/
    * config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define
      constant.
      (MEMTAG_ADDR_MASK): Likewise.
      (irg, subp, ldg): Use new constants.

Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>

2025-08-18  LoongArch: Implement 16-byte atomic add, sub, and, or, xor, and nand with sc.q  (Xi Ruoyao; 1 file, -6/+111)

gcc/ChangeLog:
    * config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec.
      (UNSPEC_TI_FETCH_SUB): Likewise.
      (UNSPEC_TI_FETCH_AND): Likewise.
      (UNSPEC_TI_FETCH_XOR): Likewise.
      (UNSPEC_TI_FETCH_OR): Likewise.
      (UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Likewise.
      (ALL_SC): New define_mode_iterator.
      (_scq): New define_mode_attr.
      (atomic_fetch_nand<mode>): Accept ALL_SC instead of only GPR.
      (UNSPEC_TI_FETCH_DIRECT): New define_int_iterator.
      (UNSPEC_TI_FETCH): New define_int_iterator.
      (amop_ti_fetch): New define_int_attr.
      (size_ti_fetch): New define_int_attr.
      (atomic_fetch_<amop_ti_fetch>ti_scq): New define_insn.
      (atomic_fetch_<amop_ti_fetch>ti): New define_expand.

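As a usage sketch (illustrative, not from the commit): with these
expanders, a 16-byte atomic read-modify-write like the following can be
expanded inline as a single sc.q-based LL-SC loop instead of a call
into libatomic:

    __int128 counter;

    __int128
    bump (__int128 delta)
    {
      /* Maps to the new atomic_fetch_addti expander.  */
      return __atomic_fetch_add (&counter, delta, __ATOMIC_SEQ_CST);
    }
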
2025-08-18  LoongArch: Implement 16-byte atomic exchange with sc.q  (Xi Ruoyao; 1 file, -0/+35)

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_exchangeti_scq): New
      define_insn.
      (atomic_exchangeti): New define_expand.

2025-08-18  LoongArch: Implement 16-byte CAS with sc.q  (Xi Ruoyao; 1 file, -0/+89)

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_compare_and_swapti_scq): New
      define_insn.
      (atomic_compare_and_swapti): New define_expand.

2025-08-18  LoongArch: Implement 16-byte atomic store with sc.q  (Xi Ruoyao; 2 files, -1/+28)

When LSX is not available but sc.q is (for example on LA664 where the
SIMD unit is not enabled), we can use an LL-SC loop for 16-byte atomic
store.

gcc/ChangeLog:
    * config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
      Accept "%t" for printing the number of the 64-bit machine
      register holding the upper half of a TImode.
    * config/loongarch/sync.md (atomic_storeti_scq): New define_insn.
      (atomic_storeti): Expand to atomic_storeti_scq if !ISA_HAS_LSX.

2025-08-18  LoongArch: Add -m[no-]scq option  (Xi Ruoyao; 7 files, -4/+21)

We'll use the sc.q instruction for some 16-byte atomic operations, but
it's only added in LoongArch 1.1 evolution so we need to gate it with
an option.

gcc/ChangeLog:
    * config/loongarch/genopts/isa-evolution.in (scq): New evolution
      feature.
    * config/loongarch/loongarch-evolution.cc: Regenerate.
    * config/loongarch/loongarch-evolution.h: Regenerate.
    * config/loongarch/loongarch-str.h: Regenerate.
    * config/loongarch/loongarch.opt: Regenerate.
    * config/loongarch/loongarch.opt.urls: Regenerate.
    * config/loongarch/loongarch-def.cc: Make -mscq the default for
      -march=la664 and -march=la64v1.1.
    * doc/invoke.texi (LoongArch Options): Document -m[no-]scq.

2025-08-18  LoongArch: Implement 16-byte atomic store with LSX  (Xi Ruoyao; 1 file, -0/+44)

If the vector is naturally aligned, it cannot cross cache lines so the
LSX store is guaranteed to be atomic.  Thus we can use LSX to do the
lock-free atomic store, instead of using a lock.

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_storeti_lsx): New
      define_insn.
      (atomic_storeti): New define_expand.

2025-08-18  LoongArch: Implement 16-byte atomic load with LSX  (Xi Ruoyao; 1 file, -0/+41)

If the vector is naturally aligned, it cannot cross cache lines so the
LSX load is guaranteed to be atomic.  Thus we can use LSX to do the
lock-free atomic load, instead of using a lock.

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_loadti_lsx): New define_insn.
      (atomic_loadti): New define_expand.

2025-08-18  LoongArch: Implement atomic_fetch_nand<GPR:mode>  (Xi Ruoyao; 1 file, -0/+40)

Without atomic_fetch_nandsi and atomic_fetch_nanddi,
__atomic_fetch_nand is expanded to a loop containing a CAS in the body,
and CAS itself is a LL-SC loop so we have a nested loop.  This is
obviously not a good idea as we just need one LL-SC loop in fact.

As ~(atom & mask) is (~mask) | (~atom), we can just invert the mask
first and the body of the LL-SC loop would be just one orn instruction.

gcc/ChangeLog:
    * config/loongarch/sync.md
      (atomic_fetch_nand_mask_inverted<GPR:mode>): New define_insn.
      (atomic_fetch_nand<GPR:mode>): New define_expand.

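The identity the patch relies on is just De Morgan's law; a tiny
self-contained check (illustrative, not part of the commit):

    #include <assert.h>
    #include <stdint.h>

    int
    main (void)
    {
      /* ~(atom & mask) == (~mask) | (~atom), so with the mask
         inverted up front the LL-SC loop body is a single orn.  */
      uint64_t atom = 0x123456789abcdef0u;
      uint64_t mask = 0xff00ff00ff00ff00u;
      assert (~(atom & mask) == ((~mask) | (~atom)));
      return 0;
    }
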
2025-08-18  LoongArch: Don't expand atomic_fetch_sub_{hi, qi} to LL-SC loop if -mlam-bh  (Xi Ruoyao; 1 file, -1/+1)

With -mlam-bh, we should negate the addend first, and use an amadd
instruction.  Disabling the expander makes the compiler do it
correctly.

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_fetch_sub<SHORT:mode>):
      Disable if ISA_HAS_LAM_BH.

2025-08-18  LoongArch: Implement subword atomic_fetch_{and, or, xor} with am*.w instructions  (Xi Ruoyao; 1 file, -143/+34)

We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using a LL-SC loop.

gcc/ChangeLog:
    * config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND): Remove.
      (UNSPEC_COMPARE_AND_SWAP_XOR): Remove.
      (UNSPEC_COMPARE_AND_SWAP_OR): Remove.
      (atomic_test_and_set): Rename to ...
      (atomic_fetch_<any_bitwise:amop><SHORT:mode>): ... this, and
      adapt the expansion to use it for any bitwise operations and
      any val, instead of just ior 1.
      (atomic_test_and_set): New define_expand.

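The widening idea, sketched in C (function names are hypothetical, not
the expander code): the subword value is shifted into position and the
remaining bits are padded with the identity element of the operation,
so a full-word am*.w leaves the neighbouring bytes unchanged:

    #include <stdint.h>

    /* For or/xor the identity is 0, so plain shifting suffices.  */
    static inline uint32_t
    widen_or_xor (uint8_t val, unsigned byte_off)
    {
      return (uint32_t) val << (byte_off * 8);
    }

    /* For and the identity is all-ones, so fill the rest with 1s.  */
    static inline uint32_t
    widen_and (uint8_t val, unsigned byte_off)
    {
      return ((uint32_t) val << (byte_off * 8))
             | ~((uint32_t) 0xff << (byte_off * 8));
    }
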
2025-08-18LoongArch: Remove unneeded "andi offset, addr, 3" instruction in ↵Xi Ruoyao1-5/+4
atomic_test_and_set On LoongArch sll.w and srl.w instructions only take the [4:0] bits of rk (shift amount) into account, and we've already defined SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact, thus we don't need this instruction. gcc/ChangeLog: * config/loongarch/sync.md (atomic_test_and_set): Remove unneeded andi instruction from the expansion.
2025-08-18LoongArch: Remove unneeded "b 3f" instruction after LL-SC loopsXi Ruoyao2-13/+10
This instruction is used to skip an redundant barrier if -mno-ld-seq-sa or the memory model requires a barrier on failure. But with -mld-seq-sa and other memory models the barrier may be nonexisting at all, and we should remove the "b 3f" instruction as well. The implementation uses a new operand modifier "%T" to output a comment marker if the operand is a memory order for which the barrier won't be generated. "%T", and also "%t", are not really used before and the code for them in loongarch_print_operand_reloc is just some MIPS legacy. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand_reloc): Make "%T" output a comment marker if the operand is a memory order for which the barrier won't be generated; remove "%t". * config/loongarch/sync.md (atomic_cas_value_strong<mode>): Add %T before "b 3f". (atomic_cas_value_cmp_and_7_<mode>): Likewise.
2025-08-18  LoongArch: Don't emit overly-restrictive barrier for LL-SC loops  (Xi Ruoyao; 1 file, -12/+9)

For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
insert only needs to account for the failure memorder, not the success
memorder (the barrier is skipped with "b 3f" on success anyway).

Note that if we use the AMCAS instructions, we indeed need to consider
both the success memorder and the failure memorder when deciding if the
"_db" suffix is needed.  Thus the semantics of
atomic_cas_value_strong<mode> and atomic_cas_value_strong<mode>_amcas
start to be different.  To prevent the compiler from being too clever,
use a different unspec code for AMCAS instructions.

gcc/ChangeLog:
    * config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AMCAS): New
      UNSPEC code.
      (atomic_cas_value_strong<mode>): NFC, update the comment to
      note we only need to consider failure memory order.
      (atomic_cas_value_strong<mode>_amcas): Use
      UNSPEC_COMPARE_AND_SWAP_AMCAS instead of
      UNSPEC_COMPARE_AND_SWAP.
      (atomic_compare_and_swap<mode:GPR>): Pass failure memorder to
      gen_atomic_cas_value_strong<mode>.
      (atomic_compare_and_swap<mode:SHORT>): Pass failure memorder to
      gen_atomic_cas_value_cmp_and_7_si.

2025-08-18  LoongArch: Allow using bstrins for masking the address in atomic_test_and_set  (Xi Ruoyao; 1 file, -5/+6)

We can use bstrins for masking the address here.  As people are
already working on LA32R (which lacks bstrins instructions), for
future-proofing we check whether (const_int -4) is an and_operand and
force it into a register if not.

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_test_and_set): Use bstrins
      for masking the address if possible.

2025-08-18LoongArch: Don't use "+" for atomic_{load, store} "m" constraintXi Ruoyao1-2/+2
Atomic load does not modify the memory. Atomic store does not read the memory, thus we can use "=" instead. gcc/ChangeLog: * config/loongarch/sync.md (atomic_load<mode>): Remove "+" for the memory operand. (atomic_store<mode>): Use "=" instead of "+" for the memory operand.
2025-08-18  LoongArch: (NFC) Remove amo and use size instead  (Xi Ruoyao; 1 file, -28/+25)

They are the same.

gcc/ChangeLog:
    * config/loongarch/sync.md: Use <size> instead of <amo>.
      (amo): Remove.

2025-08-18  LoongArch: (NFC) Remove atomic_optab and use amop instead  (Xi Ruoyao; 1 file, -4/+2)

They are the same.

gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_optab): Remove.
      (atomic_<atomic_optab><mode>): Change atomic_optab to amop.
      (atomic_fetch_<atomic_optab><mode>): Likewise.

2025-08-17  [PR target/121213] Avoid unnecessary constant load in amoswap  (Austin Law; 1 file, -4/+4)

PR 121213 shows an unnecessary "li target,0" in an atomic exchange
loop on RISC-V.

The source operand for an amoswap instruction should allow
(const_int 0) in addition to GPRs.  So the operand's predicate is
changed to "reg_or_0_operand".  The corresponding constraint is also
changed to allow a reg or the constant 0.

With the source operand no longer tied to the destination operand we
do not need the earlyclobber for the destination, so the destination
operand's constraint is adjusted accordingly.

This patch does not address the unnecessary sign extension reported in
the PR.  Tested with no regressions on riscv32-elf and riscv64-elf.

PR target/121213

gcc/
    * config/riscv/sync.md (amo_atomic_exchange<mode>): Allow
      (const_int 0) as input operand.  Do not tie input to output.
      No longer earlyclobber the output.

gcc/testsuite/
    * gcc.target/riscv/amo/pr121213.c: New test.

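For context, an illustrative source pattern that hits this (example
code, not taken from the PR): releasing a lock with an atomic exchange
of zero.

    unsigned long
    release (unsigned long *lock)
    {
      /* The constant 0 can now feed amoswap directly instead of
         being materialized with "li target,0" first.  */
      return __atomic_exchange_n (lock, 0, __ATOMIC_RELEASE);
    }
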
2025-08-17  [PR target/109324] H8/300: Fix genrecog warnings about operands missing modes.  (Jan Dubiec; 3 files, -20/+20)

This patch fixes genrecog warnings about operands missing modes.  This
is done by explicitly specifying modes of operations.

PR target/109324

gcc/ChangeLog:
    * config/h8300/addsub.md: Explicitly specify mode for plus
      operation.
    * config/h8300/jumpcall.md: Explicitly specify modes for eq and
      match_operand operations.
    * config/h8300/testcompare.md: Explicitly specify modes for eq,
      ltu and compare operations.

2025-08-16  [PATCH] RISC-V: Fix block matching in arch-canonicalize [PR121538]  (Dimitar Dimitrov; 1 file, -2/+20)

Commit r16-3028-g0c517ddf9b136c introduced parsing of conditional
blocks in riscv-ext*.def.  For simplicity, it used a simple regular
expression to match the C++ lambda function for each condition.  But
the regular expression is too simple - it matches only the first
scoped code block, without any trailing closing braces.

The "c" dependency for the "zca" extension has two code blocks inside
its conditional.  One for RV32 and one for RV64.  The script matches
only the RV32 block, and leaves the RV64 one.  Any strings left, in
turn, are considered a list of non-conditional extensions.  Thus the
quoted strings "d" and "zcd" from that block are taken as "simple"
(non-conditional) dependencies:

    if (subset_list->xlen () == 64)
      {
        if (subset_list->lookup ("d"))
          return subset_list->lookup ("zcd");

As a result, arch-canonicalize erroneously adds "d" extension:

    $ ./config/riscv/arch-canonicalize rv32ec
    rv32efdc_zicsr_zca_zcd_zcf

Before r16-3028-g0c517ddf9b136c the command returned:

    $ ./config/riscv/arch-canonicalize rv32ec
    rv32ec

Fix by extending the conditional block match until the number of
opening and closing braces is equal.  This change might seem crude,
but it does save us from introducing a full C++ parser into the simple
arch-canonicalize python script.

With this patch the script now returns:

    $ ./config/riscv/arch-canonicalize rv32ec
    rv32ec_zca

Ok for trunk?

PR target/121538

gcc/ChangeLog:
    * config/riscv/arch-canonicalize (parse_dep_exts): Match
      condition block up to closing brace.
      (test_parse_long_condition_block): New test.

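The balancing idea itself, sketched in C for concreteness (the actual
fix lives in the Python script; this helper is hypothetical): instead
of stopping at the first '}', keep a depth counter and consume nested
blocks as one unit.

    #include <stddef.h>

    /* Return the index one past the '}' matching the '{' at s[0],
       or 0 if the braces never balance.  */
    static size_t
    match_balanced_block (const char *s)
    {
      int depth = 0;
      for (size_t i = 0; s[i] != '\0'; i++)
        {
          if (s[i] == '{')
            depth++;
          else if (s[i] == '}' && --depth == 0)
            return i + 1;
        }
      return 0;
    }
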
2025-08-16  x86: Add target("80387") function attribute  (H.J. Lu; 1 file, -0/+6)

Add target("80387") attribute to enable and disable x87 instructions
in a function.

gcc/
    PR target/121541
    * config/i386/i386-options.cc
      (ix86_valid_target_attribute_inner_p): Add target("80387")
      attribute.  Set the mask bit in opts_set->x_target_flags if the
      mask bit in opts->x_target_flags is updated.
    * doc/extend.texi: Document target("80387") function attribute.

gcc/testsuite/
    PR target/121541
    * gcc.target/i386/pr121541-1a.c: New test.
    * gcc.target/i386/pr121541-1b.c: Likewise.
    * gcc.target/i386/pr121541-2.c: Likewise.
    * gcc.target/i386/pr121541-3.c: Likewise.
    * gcc.target/i386/pr121541-4.c: Likewise.
    * gcc.target/i386/pr121541-5a.c: Likewise.
    * gcc.target/i386/pr121541-5b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

2025-08-17  RISC-V: Update the comments of vx combine [NFC]  (Pan Li; 1 file, -0/+20)

The list of insns supported by vx combine was out of date; update it to
cover all insns supported now.

gcc/ChangeLog:
    * config/riscv/autovec-opt.md: Add supported insn of vx combine.

Signed-off-by: Pan Li <pan2.li@intel.com>

2025-08-17  RISC-V: Add missed DONE for vx combine pattern [NFC]  (Pan Li; 1 file, -0/+4)

The previous patch missed the DONE indicator of the vx combine
pattern.  Thus add it back.

gcc/ChangeLog:
    * config/riscv/autovec-opt.md: Add missed DONE for vx combine
      pattern.

Signed-off-by: Pan Li <pan2.li@intel.com>

2025-08-15  RISC-V: fix __builtin_round clobbering FP exception flags [PR121534]  (Vineet Gupta; 1 file, -0/+12)

__builtin_round() fails to save/restore the FP exception flags around
the FP compare insn, which can potentially clobber them.

Worth noting that the fflags restore bracketing is slightly different
from the glibc implementation.  Both FLT and FCVT can potentially
clobber fflags.  GCC generates the code below, where even if the
branch is not taken and FCVT is not executed, FLT is still executed.
Thus FSFLAGS is placed AFTER the label 'L3'.  In the glibc
implementation FLT can't clobber due to an early NaN check, so FSFLAGS
can be moved under the branch, before the label.

    | convert_float_to_float_round
    | ...
    | frflags   a5
    | fabs.s    fa5,fa0
    | flt.s     a4,fa5,fa4    <--- can clobber fflags
    | beq       a4,zero,.L3
    | fcvt.w.s  a4,fa0,rmm    <--- also
    | fcvt.s.w  fa5,a4
    | fsgnj.s   fa0,fa5,fa0
    | .L3:
    | fsflags   a5            <-- both code paths

Fixes: f652a35877e3 ("This is almost exclusively Jivan's work....")

PR target/121534

gcc/ChangeLog:
    * config/riscv/riscv.md (round_pattern): Save/restore fflags.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: Adjust
      scan pattern for additional instances of frflags/fsrflags.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

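The observable property being restored can be checked with a small fenv
test (illustrative; assumes glibc-like behaviour where round leaves
FE_INEXACT clear, and linking with -lm):

    #include <assert.h>
    #include <fenv.h>

    int
    main (void)
    {
      /* With the fix, the transient inexact raised internally by
         fcvt.w.s is saved and restored away, so no flag leaks out.  */
      feclearexcept (FE_ALL_EXCEPT);
      volatile double x = 2.5;
      volatile double r = __builtin_round (x);
      assert (r == 3.0);
      assert (!fetestexcept (FE_INEXACT));
      return 0;
    }
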
2025-08-15  RISC-V: MIPS prefetch extensions for MIPS RV64 P8700, enabled with xmipscbop  (Umesh Kalappa; 8 files, -5/+96)

Addressed the comments and tested with
"runtest --tool gcc --target_board='riscv-sim/-march=rv64gc_zba_zbb_zbc_zbs/-mabi=lp64/-mcmodel=medlow' riscv.exp"
and 32-bit too.  Lint warnings can be ignored for riscv-ext.opt.

gcc/ChangeLog:
    * config/riscv/riscv-ext-mips.def (DEFINE_RISCV_EXT): Added mips
      prefetch extension.
    * config/riscv/riscv-ext.opt: Generated file.
    * config/riscv/riscv.md (prefetch): Added mips prefetch address
      operand constraint.
    * config/riscv/constraints.md: Added mips specific constraint.
    * config/riscv/predicates.md (prefetch_operand): Updated for
      mips nine bits offset.
    * config/riscv/riscv.cc (riscv_prefetch_offset_address_p):
      Legitimate address with offset for prefetch check.
    * config/riscv/riscv-protos.h: Likewise.
    * config/riscv/riscv.h: Macros to support for mips cached type.
    * doc/riscv-ext.texi: Updated for mips prefetch.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/mipsprefetch.c: Test file for mips.pref.

2025-08-15  RISC-V: Allow errors to be suppressed when parsing architectures  (Richard Sandiford; 4 files, -60/+75)

One of Alfie's FMV patches adds a hook that, in some cases, is used to
silently query a target_version (with no diagnostics expected).  In
the review, I'd suggested handling this using a location_t *, with
null meaning "suppress diagnostics":

https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692113.html

This patch tries to propagate that through the RISC-V parsing code.
I realise this isn't very elegant, sorry.

I think riscv_compare_version_priority should also logically suppress
diagnostics, since it's supposed to be a pure query function.  (From
that point of view, advocating for this change for Alfie's patch might
have been a bit unfair.)

gcc/
    * config/riscv/riscv-protos.h (riscv_process_target_version_attr):
      Change location_t argument to location_t *.
    * config/riscv/riscv-subset.h
      (riscv_subset_list::riscv_subset_list): Change location_t
      argument to location_t *.
      (riscv_subset_list::parse): Likewise.
      (riscv_subset_list::set_loc): Likewise.
      (riscv_minimal_hwprobe_feature_bits): Likewise.
      (riscv_subset_list::m_loc): Change type to location_t.
    * common/config/riscv/riscv-common.cc
      (riscv_subset_list::riscv_subset_list): Change location_t
      argument to location_t *.
      (riscv_subset_list::add): Suppress diagnostics when m_loc is
      null.
      (riscv_subset_list::parsing_subset_version): Likewise.
      (riscv_subset_list::parse_profiles): Likewise.
      (riscv_subset_list::parse_base_ext): Likewise.
      (riscv_subset_list::parse_single_std_ext): Likewise.
      (riscv_subset_list::check_conflict_ext): Likewise.
      (riscv_subset_list::parse_single_multiletter_ext): Likewise.
      (riscv_subset_list::parse): Change location_t argument to
      location_t *.
      (riscv_subset_list::set_loc): Likewise.
      (riscv_minimal_hwprobe_feature_bits): Likewise.
      (riscv_parse_arch_string): Update call accordingly.
    * config/riscv/riscv-target-attr.cc
      (riscv_target_attr_parser::m_loc): Change type to location_t *.
      (riscv_target_attr_parser::riscv_target_attr_parser): Change
      location_t argument to location_t *.
      (riscv_process_one_target_attr): Likewise.
      (riscv_process_target_attr): Likewise.
      (riscv_process_target_version_attr): Likewise.
      (riscv_target_attr_parser::parse_arch): Suppress diagnostics
      when m_loc is null.
      (riscv_target_attr_parser::handle_arch): Likewise.
      (riscv_target_attr_parser::handle_cpu): Likewise.
      (riscv_target_attr_parser::handle_tune): Likewise.
      (riscv_target_attr_parser::handle_priority): Likewise.
      (riscv_option_valid_attribute_p): Update call accordingly.
      (riscv_option_valid_version_attribute_p): Likewise.
    * config/riscv/riscv.cc (parse_features_for_version): Add a
      location_t * argument.
      (dispatch_function_versions): Update call accordingly.
      (riscv_compare_version_priority): Likewise, suppressing
      diagnostics.

2025-08-15  LoongArch: Fix ICE caused by function add_stmt_cost [PR121542].  (Lulu Cheng; 1 file, -0/+1)

PR target/121542

gcc/ChangeLog:
    * config/loongarch/loongarch.cc
      (loongarch_vector_costs::add_stmt_cost): When using vectype,
      first determine whether it is NULL.

gcc/testsuite/ChangeLog:
    * gcc.target/loongarch/pr121542.c: New test.

2025-08-14  [PR target/119275][RISC-V] Avoid calling gen_lowpart in cases where it would ICE  (Jeff Law; 1 file, -1/+2)

So this is a minor bug in the riscv move expanders.  It has a special
case for extraction from vector objects which makes assumptions that
it can use gen_lowpart unconditionally.  That's not always the case.

We can just bypass that special code for cases where we can't use
gen_lowpart and let the more generic code run.  If gen_lowpart_common
indicates we've got a case that can't be handled, we just bypass the
special extraction code.

Tested on riscv64-elf and riscv32-elf.  Waiting for pre-commit CI to do
its thing.

PR target/119275

gcc/
    * config/riscv/riscv.cc (riscv_legitimize_move): Avoid calling
      gen_lowpart for cases where it'll fail.  Just use standard
      expander paths for those cases.

gcc/testsuite/
    * gcc.target/riscv/pr119275.c: New test.

2025-08-14  fix cris-elf build with binutils-2.45  (Mikael Pettersson; 1 file, -1/+1)

Since the cris port was added to gcc it has passed --em=criself to
gas, as an abbreviation for --emulation=criself.  Starting with
binutils-2.45 that causes a hard error in gas due to ambiguity with
another option.  Fixed by replacing the abbreviation with the complete
option.

Tested by building a cross to cris-elf with binutils-2.45, which
failed before but now succeeds.

gcc/
    PR target/121336
    * config/cris/cris.h: Do not abbreviate --emulation.

Signed-off-by: Mikael Pettersson <mikpelinux@gmail.com>

2025-08-14  powerpc: Add missing modes to P9 if_then_elses [PR121501]  (Richard Sandiford; 1 file, -20/+20)

These patterns had one (if_then_else ...) nested within another.  The
outer if_then_else had SImode, which means that the "then" and "else"
should also be SImode (unless they're const_ints).  However, the inner
if_then_else was modeless, which led to an assertion failure when
trying to take a subreg of it.

gcc/
    PR target/121501
    * config/rs6000/rs6000.md (cmprb, setb_signed, setb_unsigned)
      (cmprb2, cmpeqb): Add missing modes to nested if_then_elses.

2025-08-14  s390: Fix zero extend patterns using vlgv  (Stefan Schulze Frielinghaus; 2 files, -69/+61)

In commit r16-2316-gc6676092318, patterns were mistakenly introduced
which should instead have been merged as alternatives into the
existing zero extend patterns.  While at it, generalize the
vec_extract patterns and also allow registers for the index.  A
subsequent patch will add register+immediate support.

gcc/ChangeLog:
    * config/s390/s390.md: Merge movdi<mode>_zero_extend_A and
      movsi<mode>_zero_extend_A into zero_extendsidi2 and
      zero_extendhi<mode>2_z10 and
      zero_extend<HQI:mode><GPR:mode>2_extimm.
    * config/s390/vector.md (*movdi<mode>_zero_extend_A): Remove.
      (*movsi<mode>_zero_extend_A): Remove.
      (*movdi<mode>_zero_extend_B): Move to vec_extract patterns and
      rename to *vec_extract<mode>_zero_extend.
      (*movsi<mode>_zero_extend_B): Ditto.

gcc/testsuite/ChangeLog:
    * gcc.target/s390/vector/vlgv-zero-extend-1.c: Require target
      s390_mvx.
    * gcc.target/s390/vector/vlgv-zero-extend-2.c: New test.

2025-08-13  x86: Disallow MMX and 80387 in no_caller_saved_registers function  (H.J. Lu; 1 file, -0/+4)

commit 9804b23198b39f85a7258be556c5e8aed44b9efc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sun Apr 13 11:38:24 2025 -0700

    x86: Add preserve_none and update no_caller_saved_registers attributes

allowed MMX/80387 instructions in functions with the
no_caller_saved_registers attribute by accident.  Update
ix86_set_current_function to properly check if MMX and 80387 are
enabled.

gcc/
    PR target/121540
    * config/i386/i386-options.cc (ix86_set_current_function):
      Properly check if MMX and 80387 are enabled.

gcc/testsuite/
    PR target/121540
    * gcc.target/i386/no-callee-saved-19a.c (dg-options): Add
      "-mno-avx -mno-mmx -mno-80387"
    * gcc.target/i386/no-callee-saved-19b.c: Likewise.
    * gcc.target/i386/no-callee-saved-19c.c: Likewise.
    * gcc.target/i386/no-callee-saved-19d.c: Likewise.
    * gcc.target/i386/no-callee-saved-19e.c: Likewise.
    * gcc.target/i386/pr121208-1a.c: Likewise.
    * gcc.target/i386/pr121208-1b.c: Likewise.
    * gcc.target/i386/pr121540-1.c: New test.
    * gcc.target/i386/pr121540-2.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

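For context, a sketch of the attribute in use (illustrative): such
functions must preserve every register they touch, which is why MMX
and x87 state has to stay off limits when those units are disabled.

    __attribute__ ((no_caller_saved_registers))
    void
    bump (int *p)
    {
      /* Integer-only body; MMX/x87 usage in a function like this is
         now properly rejected when those units are not enabled.  */
      (*p)++;
    }
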
2025-08-13  [RISC-V][PR target/121531] Cover missing insn types in p400 and p600 scheduler models  (Jeff Law; 2 files, -0/+27)

So the usual problems, DFAs without full coverage.  I took the output
of Kito's checker and used that to construct a dummy reservation for
the p400 and p600 sifive models.

Tested on riscv32-elf and riscv64-elf with no regressions.  Pushing to
the trunk once pre-commit CI gives the green light.

PR target/121531

gcc/
    * config/riscv/sifive-p400.md (sifive_p400_unknown): New
      reservation.
    * config/riscv/sifive-p600.md (sifive_p600_unknown): Likewise.

gcc/testsuite/
    * gcc.target/riscv/pr121531.c: New test.

2025-08-13  x86-64: Remove redundant TLS calls  (H.J. Lu; 6 files, -154/+708)

For TLS calls:

1. UNSPEC_TLS_GD:

    (parallel [
      (set (reg:DI 0 ax)
           (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
                    (const_int 0 [0])))
      (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
                  (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
      (clobber (reg:DI 5 di))])

2. UNSPEC_TLS_LD_BASE:

    (parallel [
      (set (reg:DI 0 ax)
           (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
                    (const_int 0 [0])))
      (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])

3. UNSPEC_TLSDESC:

    (parallel [
      (set (reg/f:DI 104)
           (plus:DI (unspec:DI [
                      (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
                      (reg:DI 114)
                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
               (const:DI (unspec:DI [
                           (symbol_ref:DI ("e") [flags 0x1a])
                         ] UNSPEC_DTPOFF))))
      (clobber (reg:CC 17 flags))])
    (parallel [
      (set (reg:DI 101)
           (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
                       (reg:DI 112)
                       (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
      (clobber (reg:CC 17 flags))])

they return the same value for the same input value.  But multiple
calls with the same input value may be generated for simple programs
like:

    void a(long *);
    int b(void);
    void c(void);
    static __thread long e;
    long
    d(void)
    {
      a(&e);
      if (b())
        c();
      return e;
    }

When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following codes
are generated:

        .type   d, @function
    d:
    .LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        leaq    e@TLSDESC(%rip), %rbx
        movq    %rbx, %rax
        call    *e@TLSCALL(%rax)
        addq    %fs:0, %rax
        movq    %rax, %rdi
        call    a@PLT
        call    b@PLT
        testl   %eax, %eax
        jne     .L8
        movq    %rbx, %rax
        call    *e@TLSCALL(%rax)
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        movq    %fs:(%rax), %rax
        ret
        .p2align 4,,10
        .p2align 3
    .L8:
        .cfi_restore_state
        call    c@PLT
        movq    %rbx, %rax
        call    *e@TLSCALL(%rax)
        popq    %rbx
        .cfi_def_cfa_offset 8
        movq    %fs:(%rax), %rax
        ret
        .cfi_endproc

There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
Rename the remove_redundant_vector pass to the x86_cse pass and, for
64-bit, extend it to also remove redundant TLS calls to generate:

    d:
    .LFB0:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        leaq    e@TLSDESC(%rip), %rax
        movq    %fs:0, %rdi
        call    *e@TLSCALL(%rax)
        addq    %rax, %rdi
        movq    %rax, %rbx
        call    a@PLT
        call    b@PLT
        testl   %eax, %eax
        jne     .L8
        movq    %fs:(%rbx), %rax
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
        .p2align 4,,10
        .p2align 3
    .L8:
        .cfi_restore_state
        call    c@PLT
        movq    %fs:(%rbx), %rax
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

with only one "call *e@TLSCALL(%rax)".  This reduces the number of
__tls_get_addr calls in libgcc.a by 72%:

    __tls_get_addr calls    before    after
    libgcc.a                   868      243

gcc/
    PR target/81501
    * config/i386/i386-features.cc (x86_cse_kind): Add
      X86_CSE_TLS_GD, X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
      (redundant_load): Renamed to ...
      (redundant_pattern): This.
      (ix86_place_single_vector_set): Replace redundant_load with
      redundant_pattern.
      (replace_tls_call): New.
      (ix86_place_single_tls_call): Likewise.
      (pass_remove_redundant_vector_load): Renamed to ...
      (pass_x86_cse): This.  Add val, def_insn, mode, scalar_mode,
      kind, x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and
      candidate_vector_p.
      (pass_x86_cse::candidate_gnu_tls_p): New.
      (pass_x86_cse::candidate_gnu2_tls_p): Likewise.
      (pass_x86_cse::candidate_vector_p): Likewise.
      (remove_redundant_vector_load): Renamed to ...
      (pass_x86_cse::x86_cse): This.  Extend to remove redundant TLS
      calls.
      (make_pass_remove_redundant_vector_load): Renamed to ...
      (make_pass_x86_cse): This.
    * config/i386/i386-passes.def: Replace
      pass_remove_redundant_vector_load with pass_x86_cse.
    * config/i386/i386-protos.h (ix86_tls_get_addr): New.
      (make_pass_remove_redundant_vector_load): Renamed to ...
      (make_pass_x86_cse): This.
    * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
    * config/i386/i386.h (machine_function): Add
      tls_descriptor_call_multiple_p.
    * config/i386/i386.md (tls64): New attribute.
      (@tls_global_dynamic_64_<mode>): Set
      tls_descriptor_call_multiple_p.
      (@tls_local_dynamic_base_64_<mode>): Likewise.
      (@tls_dynamic_gnu2_64_<mode>): Likewise.
      (*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd.
      (*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to
      ld_base.
      (*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea.
      (*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to
      call.
      (*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to
      combine.

gcc/testsuite/
    PR target/81501
    * g++.target/i386/pr81501-1.C: New test.
    * gcc.target/i386/pr81501-1a.c: Likewise.
    * gcc.target/i386/pr81501-1b.c: Likewise.
    * gcc.target/i386/pr81501-2a.c: Likewise.
    * gcc.target/i386/pr81501-2b.c: Likewise.
    * gcc.target/i386/pr81501-3.c: Likewise.
    * gcc.target/i386/pr81501-4a.c: Likewise.
    * gcc.target/i386/pr81501-4b.c: Likewise.
    * gcc.target/i386/pr81501-5.c: Likewise.
    * gcc.target/i386/pr81501-6a.c: Likewise.
    * gcc.target/i386/pr81501-6b.c: Likewise.
    * gcc.target/i386/pr81501-7.c: Likewise.
    * gcc.target/i386/pr81501-8a.c: Likewise.
    * gcc.target/i386/pr81501-8b.c: Likewise.
    * gcc.target/i386/pr81501-9a.c: Likewise.
    * gcc.target/i386/pr81501-9b.c: Likewise.
    * gcc.target/i386/pr81501-10a.c: Likewise.
    * gcc.target/i386/pr81501-10b.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

2025-08-13  Darwin: Handle linker '-no_deduplicate' option.  (Iain Sandoe; 1 file, -7/+21)

Newer linkers support an option to disable deduplication of entities.
This speeds up linking and can improve the debug experience.  We adopt
the same criteria as clang in adding the option.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

gcc/ChangeLog:
    * config.in: Regenerate.
    * config/darwin.h (DARWIN_LD_NO_DEDUPLICATE): New.
      (LINK_SPEC): Handle -no_deduplicate.
    * configure: Regenerate.
    * configure.ac: Detect linker support for -no_deduplicate.

2025-08-13  Darwin: Handle string constants specially when asan is enabled.  (Iain Sandoe; 2 files, -6/+40)

The Darwin ABI uses a different section for string constants when
address sanitizing is enabled.  This adds definitions of the
asan-specific sections and switches string constants to the correct
section.  It also makes the string constant symbols linker-visible
when asan is enabled, but not otherwise.

gcc/ChangeLog:
    * config/darwin-sections.def (asan_string_section,
      asan_globals_section, asan_liveness_section): New.
    * config/darwin.cc (objc_method_decl): Use asan sections when
      asan is enabled.
      (darwin_encode_section_info): Alter string constant linker
      visibility depending on asan.
      (machopic_select_section): Use the asan sections when asan is
      enabled.

gcc/testsuite/ChangeLog:
    * gcc.dg/torture/darwin-cfstring-3.c: Adjust for amended string
      labels.
    * g++.dg/torture/darwin-cfstring-3.C: Likewise.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

2025-08-13  [RISC-V][PR target/121160] Avoid bogus force_reg call  (Jeff Law; 1 file, -2/+2)

When we canonicalize the comparison for a czero sequence we need to
handle both integer and fp comparisons.  Furthermore, within the
integer space we want to make sure we promote any sub-word objects to
a full word.  All that is working fine.

After promotion we then force the value into a register if it is not a
register or constant already.  The idea is not to have to special case
subregs in subsequent code.  This works fine except when we're
presented with a floating point object that would be a subword,
(subreg:SF (reg:SI)) on rv64 for example.

So this tightens up that force_reg step.

Bootstrapped and regression tested on riscv64-linux-gnu and tested on
riscv32-elf and riscv64-elf.  Pushing to the trunk after pre-commit
verifies no regressions.

Jeff

PR target/121160

gcc/
    * config/riscv/riscv.cc (canonicalize_comparands): Tighten check
      for forcing value into a GPR.

gcc/testsuite/
    * gcc.target/riscv/pr121160.c: New test.

2025-08-13  Fold GATHER_SCATTER_*_P into vect_memory_access_type  (Richard Biener; 3 files, -8/+7)

The following splits up VMAT_GATHER_SCATTER into
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED.  The main motivation is to reduce the
uses of (full) gs_info, but it also makes the kind representable by a
single entry rather than the ifn and decl tristate.

The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN,
since that's what we end up checking.

    * tree-vectorizer.h (vect_memory_access_type): Replace
      VMAT_GATHER_SCATTER with three separate access types,
      VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
      VMAT_GATHER_SCATTER_EMULATED.
      (mat_gather_scatter_p): New predicate.
      (GATHER_SCATTER_LEGACY_P): Remove.
      (GATHER_SCATTER_IFN_P): Likewise.
      (GATHER_SCATTER_EMULATED_P): Likewise.
    * tree-vect-stmts.cc (check_load_store_for_partial_vectors):
      Adjust.
      (get_load_store_type): Likewise.
      (vect_get_loop_variant_data_ptr_increment): Likewise.
      (vectorizable_store): Likewise.
      (vectorizable_load): Likewise.
    * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
      Likewise.
    * config/riscv/riscv-vector-costs.cc
      (costs::need_additional_vector_vars_p): Likewise.
    * config/aarch64/aarch64.cc
      (aarch64_detect_vector_stmt_subtype): Likewise.
      (aarch64_vector_costs::count_ops): Likewise.
      (aarch64_vector_costs::add_stmt_cost): Likewise.
