gcc/ChangeLog:
* config/riscv/t-rtems: Add -mstrict-align multilibs for
targets without support for misaligned access in hardware.
|
|
For cores without a hardware multiplier, set the respective optabs
to library functions which use a software implementation of
multiplication.
The implementation was copied from the RL78 backend.
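As a rough sketch only (GCC-internal code, not standalone; it reuses the
TARGET_OPT_MUL flag and the __pruabi_softmpy* symbols named in the
ChangeLog below, and the exact mode mapping is an assumption):
/* Point the signed-multiply optabs at the libgcc soft-multiply
   routines when the core has no hardware multiplier.  */
static void
pru_init_libfuncs (void)
{
  if (!TARGET_OPT_MUL)
    {
      set_optab_libfunc (smul_optab, SImode, "__pruabi_softmpyi");
      set_optab_libfunc (smul_optab, DImode, "__pruabi_softmpyll");
    }
}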
gcc/ChangeLog:
* config/pru/pru.cc (pru_init_libfuncs): Set softmpy libgcc
functions for optab multiplication entries if TARGET_OPT_MUL
option is not set.
libgcc/ChangeLog:
* config/pru/libgcc-eabi.ver: Add __pruabi_softmpyi and
__pruabi_softmpyll symbols.
* config/pru/t-pru: Add softmpy source files.
* config/pru/pru-softmpy.h: New file.
* config/pru/softmpyi.c: New file.
* config/pru/softmpyll.c: New file.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Enable multilib builds for contemporary PRU core versions (AM335x and
later), and older versions present in AM18xx.
gcc/ChangeLog:
* config.gcc: Include pru/t-multilib.
* config/pru/pru.h (MULTILIB_DEFAULTS): Define.
* config/pru/t-multilib: New file.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
Older PRU core versions (e.g. in AM1808 SoC) do not support
XIN, XOUT, FILL, ZERO instructions. Add GCC command line options to
optionally disable generation of those instructions, so that code
can be executed on such older PRU cores.
gcc/ChangeLog:
* common/config/pru/pru-common.cc (TARGET_DEFAULT_TARGET_FLAGS):
Keep multiplication, FILL and ZERO instructions enabled by
default.
* config/pru/pru.md (prumov<mode>): Gate code generation on
TARGET_OPT_FILLZERO.
(mov<mode>): Ditto.
(zero_extendqidi2): Ditto.
(zero_extendhidi2): Ditto.
(zero_extendsidi2): Ditto.
(@pru_ior_fillbytes<mode>): Ditto.
(@pru_and_zerobytes<mode>): Ditto.
(@<code>di3): Ditto.
(mulsi3): Gate code generation on TARGET_OPT_MUL.
* config/pru/pru.opt: Add mmul and mfillzero options.
* config/pru/pru.opt.urls: Regenerate.
* config/rl78/rl78.opt.urls: Regenerate.
* doc/invoke.texi: Document new options.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
For a basic block with only a label:
(code_label 78 11 77 3 14 (nil) [1 uses])
(note 77 78 54 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
emit the TLS call after NOTE_INSN_BASIC_BLOCK, instead of before
NOTE_INSN_BASIC_BLOCK, to avoid
x.c: In function ‘aout_16_write_syms’:
x.c:54:1: error: NOTE_INSN_BASIC_BLOCK is missing for block 3
54 | }
| ^
x.c:54:1: error: NOTE_INSN_BASIC_BLOCK 77 in middle of basic block 3
during RTL pass: x86_cse
x.c:54:1: internal compiler error: verify_flow_info failed
gcc/
PR target/121607
* config/i386/i386-features.cc (ix86_emit_tls_call): Emit the
TLS call after NOTE_INSN_BASIC_BLOCK in a basic block with only
a label.
gcc/testsuite/
PR target/121607
* gcc.target/i386/pr121607-1a.c: New test.
* gcc.target/i386/pr121607-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch changes the implementation of the insn to test whether the
result itself is negative or not, rather than the MSB of the result of
the ABS machine instruction. This eliminates the need to consider bit-
endianness and allows for longer branch distances.
/* example */
extern void foo(int);
void test0(int a) {
  if (a == -2147483648)
    foo(a);
}
void test1(int a) {
  if (a != -2147483648)
    foo(a);
}
;; before (endianness: little)
test0:
entry sp, 32
abs a8, a2
bbci a8, 31, .L1
mov.n a10, a2
call8 foo
.L1:
retw.n
test1:
entry sp, 32
abs a8, a2
bbsi a8, 31, .L4
mov.n a10, a2
call8 foo
.L4:
retw.n
;; after (endianness-independent)
test0:
entry sp, 32
abs a8, a2
bgez a8, .L1
mov.n a10, a2
call8 foo
.L1:
retw.n
test1:
entry sp, 32
abs a8, a2
bltz a8, .L4
mov.n a10, a2
call8 foo
.L4:
retw.n
gcc/ChangeLog:
* config/xtensa/xtensa.md (*btrue_INT_MIN):
Change the branch insn condition to test for a negative number
rather than testing for the MSB.
|
|
I'd added the aarch64-specific CC fusion pass to fold a PTEST
instruction into the instruction that feeds the PTEST, in cases
where the latter instruction can set the appropriate flags as a
side-effect.
Combine does the same optimisation. However, as explained in the
comments, the PTEST case often has:
A: set predicate P based on inputs X
B: clobber X
C: test P
and so the fusion is only possible if we move C before B.
That's something that combine currently can't do (for the cases
that we needed).
The optimisation was never really AArch64-specific. It's just that,
in an all-too-familiar fashion, we needed it in stage 3, when it was
too late to add something target-independent.
late-combine adds a convenient place to do the optimisation in a
target-independent way, just as combine is a convenient place to
do its related optimisation.
gcc/
* config.gcc (aarch64*-*-*): Remove aarch64-cc-fusion.o from
extra_objs.
* config/aarch64/aarch64-passes.def (pass_cc_fusion): Delete.
* config/aarch64/aarch64-protos.h (make_pass_cc_fusion): Delete.
* config/aarch64/t-aarch64 (aarch64-cc-fusion.o): Delete.
* config/aarch64/aarch64-cc-fusion.cc: Delete.
* late-combine.cc (late_combine::optimizable_set): Take a set_info *
rather than an insn_info * and move destination tests from...
(late_combine::combine_into_uses): ...here. Take a set_info * rather
than an insn_info *. Take the rtx set.
(late_combine::parallelize_insns, late_combine::combine_cc_setter)
(late_combine::combine_insn): New member functions.
(late_combine::m_parallel): New member variable.
* rtlanal.cc (pattern_cost): Handle sets of CC registers in the
same way as comparisons.
|
|
The linker rejects --relax in relocatable links (-r), hence only
add --relax when -r is not specified.
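A hedged sketch of the shape of the fix (the surrounding -mrelax spec
text is an assumption; only the %{!r:...} wrapper comes from the
ChangeLog below):
/* Pass --relax to the linker only for final (non-relocatable) links.  */
#define LINK_RELAX_SPEC "%{mrelax:%{!r:--relax}}"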
gcc/
PR target/121608
* config/avr/specs.h (LINK_RELAX_SPEC): Wrap in %{!r...}.
|
|
We can't place a TLS call before a conditional jump in a basic block like
(code_label 13 11 14 4 2 (nil) [1 uses])
(note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(jump_insn 16 14 17 4 (set (pc)
(if_then_else (le (reg:CCNO 17 flags)
(const_int 0 [0]))
(label_ref 27)
(pc))) "x.c":10:21 discrim 1 1462 {*jcc}
(expr_list:REG_DEAD (reg:CCNO 17 flags)
(int_list:REG_BR_PROB 628353713 (nil)))
-> 27)
since the TLS call will clobber the flags register, nor place a TLS call
in a basic block if any live caller-saved registers aren't dead at the
end of the basic block:
;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104
;; live gen 0 [ax] 102 106 108 116 117 118 120
;; live kill 5 [di]
Instead, we should place such a call before all register-setting basic
blocks which dominate the current basic block.
Keep track of the replaced GNU and GNU2 TLS instructions and use this
info to place the __tls_get_addr call and mark the FLAGS register as dead.
gcc/
PR target/121572
* config/i386/i386-features.cc (replace_tls_call): Add a bitmap
argument and put the updated TLS instruction in the bitmap.
(ix86_get_dominator_for_reg): New.
(ix86_check_flags_reg): Likewise.
(ix86_emit_tls_call): Likewise.
(ix86_place_single_tls_call): Add 2 bitmap arguments for updated
GNU and GNU2 TLS instructions. Call ix86_emit_tls_call to emit
TLS instruction. Correct debug dump for before instruction.
gcc/testsuite/
PR target/121572
* gcc.target/i386/pr121572-1a.c: New test.
* gcc.target/i386/pr121572-1b.c: Likewise.
* gcc.target/i386/pr121572-2a.c: Likewise.
* gcc.target/i386/pr121572-2b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in
STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead
requires the use of SLP_TREE_VECTYPE for everything but data-refs.
This means that STMT_VINFO_VECTYPE (stmt_info) will always be NULL, and so
aarch64_bool_compound_p will never properly cost predicate AND operations
anymore, resulting in less vectorization.
This patch changes it to use SLP_TREE_VECTYPE and pass the slp_node to
aarch64_bool_compound_p.
gcc/ChangeLog:
PR target/121536
* config/aarch64/aarch64.cc (aarch64_bool_compound_p): Use
SLP_TREE_VECTYPE instead of STMT_VINFO_VECTYPE.
(aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Pass SLP
node to aarch64_bool_compound_p.
gcc/testsuite/ChangeLog:
PR target/121536
* g++.target/aarch64/sve/pr121536.cc: New test.
|
|
commit g:fb59c5719c17a04ecfd58b5e566eccd6d2ac583a stops passing the scalar type
(confusingly named vectype) to the costing hook when doing scalar costing.
As a result, we could no longer distinguish between FPR and GPR scalar stmts.
A later commit also removed STMT_VINFO_VECTYPE from stmt_info.
This leaves the only remaining option: getting the type of the original stmt
from the stmt_info. This patch does so when we're performing scalar costing.
Ideally I'd refactor this a bit because a lot of the hooks just need to know if
it's FP or not, but this seems pointless with the ongoing costing churn. So for
now this restores our costing.
gcc/ChangeLog:
PR target/121536
* config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost): Set
vectype from type of lhs of gimple stmt.
|
|
Define new constants to be used by the MTE pattern definitions.
gcc/
* config/aarch64/aarch64.md (MEMTAG_TAG_MASK): New define
constant.
(MEMTAG_ADDR_MASK): Likewise.
(irg, subp, ldg): Use new constants.
Signed-off-by: Claudiu Zissulescu <claudiu.zissulescu-ianculescu@oracle.com>
|
|
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_TI_FETCH_ADD): New unspec.
(UNSPEC_TI_FETCH_SUB): Likewise.
(UNSPEC_TI_FETCH_AND): Likewise.
(UNSPEC_TI_FETCH_XOR): Likewise.
(UNSPEC_TI_FETCH_OR): Likewise.
(UNSPEC_TI_FETCH_NAND_MASK_INVERTED): Likewise.
(ALL_SC): New define_mode_iterator.
(_scq): New define_mode_attr.
(atomic_fetch_nand<mode>): Accept ALL_SC instead of only GPR.
(UNSPEC_TI_FETCH_DIRECT): New define_int_iterator.
(UNSPEC_TI_FETCH): New define_int_iterator.
(amop_ti_fetch): New define_int_attr.
(size_ti_fetch): New define_int_attr.
(atomic_fetch_<amop_ti_fetch>ti_scq): New define_insn.
(atomic_fetch_<amop_ti_fetch>ti): New define_expand.
|
|
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_exchangeti_scq): New
define_insn.
(atomic_exchangeti): New define_expand.
|
|
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_compare_and_swapti_scq): New
define_insn.
(atomic_compare_and_swapti): New define_expand.
|
|
When LSX is not available but sc.q is (for example on LA664 where the
SIMD unit is not enabled), we can use an LL-SC loop for 16-byte atomic
stores.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Accept "%t" for printing the number of the 64-bit machine
register holding the upper half of a TImode.
* config/loongarch/sync.md (atomic_storeti_scq): New
define_insn.
(atomic_storeti): Expand to atomic_storeti_scq if !ISA_HAS_LSX.
|
|
We'll use the sc.q instruction for some 16-byte atomic operations, but
it was only added in the LoongArch 1.1 evolution, so we need to gate it
behind an option.
gcc/ChangeLog:
* config/loongarch/genopts/isa-evolution.in (scq): New evolution
feature.
* config/loongarch/loongarch-evolution.cc: Regenerate.
* config/loongarch/loongarch-evolution.h: Regenerate.
* config/loongarch/loongarch-str.h: Regenerate.
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.opt.urls: Regenerate.
* config/loongarch/loongarch-def.cc: Make -mscq the default for
-march=la664 and -march=la64v1.1.
* doc/invoke.texi (LoongArch Options): Document -m[no-]scq.
|
|
If the vector is naturally aligned, it cannot cross cache lines so the
LSX store is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic store, instead of using a lock.
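To make the alignment argument concrete, here is a tiny runnable check
(it assumes a 64-byte cache line, which the commit does not state
explicitly):
#include <assert.h>
#include <stdint.h>

int main (void)
{
  /* A 16-byte object aligned to 16 bytes lies within one 64-byte line:
     its first and last byte map to the same line index.  */
  for (uintptr_t addr = 0; addr < 4096; addr += 16)
    assert (addr / 64 == (addr + 15) / 64);
  return 0;
}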
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_storeti_lsx): New
define_insn.
(atomic_storeti): New define_expand.
|
|
If the vector is naturally aligned, it cannot cross cache lines so the
LSX load is guaranteed to be atomic. Thus we can use LSX to do the
lock-free atomic load, instead of using a lock.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_loadti_lsx): New define_insn.
(atomic_loadti): New define_expand.
|
|
Without atomic_fetch_nandsi and atomic_fetch_nanddi, __atomic_fetch_nand
is expanded to a loop containing a CAS in the body, and CAS itself is an
LL-SC loop, so we end up with a nested loop. This is obviously not a good
idea as in fact we just need one LL-SC loop.
As ~(atom & mask) is (~mask) | (~atom), we can just invert the mask
first, and the body of the LL-SC loop then needs just one orn instruction.
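A minimal runnable check of the identity the rewrite relies on:
#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint32_t atom = 0x12345678u, mask = 0x0ff00ff0u;
  uint32_t inv_mask = ~mask;  /* inverted once, outside the LL-SC loop */
  /* The loop body then only needs orn: inv_mask | ~atom.  */
  assert ((uint32_t) ~(atom & mask) == (inv_mask | ~atom));
  return 0;
}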
gcc/ChangeLog:
* config/loongarch/sync.md
(atomic_fetch_nand_mask_inverted<GPR:mode>): New define_insn.
(atomic_fetch_nand<GPR:mode>): New define_expand.
|
|
With -mlam-bh, we should negate the addend first, and use an amadd
instruction. Disabling the expander makes the compiler do it correctly.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_fetch_sub<SHORT:mode>):
Disable if ISA_HAS_LAM_BH.
|
|
We can just shift the mask and fill the other bits with 0 (for ior/xor)
or 1 (for and), and use an am*.w instruction to perform the atomic
operation, instead of using an LL-SC loop.
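A small runnable sketch of the masking scheme (the byte position and
values are made up for illustration):
#include <assert.h>
#include <stdint.h>

int main (void)
{
  uint32_t word = 0x11223344u;  /* the containing aligned word */
  unsigned shift = 8;           /* bit offset of the 8-bit subword */
  uint32_t val = 0x0f;          /* the 8-bit operand */

  /* For ior/xor, fill the other bits with 0 ...  */
  uint32_t or_mask = val << shift;
  assert ((word | or_mask) == 0x11223f44u);   /* what amor.w computes */

  /* ... for and, fill the other bits with 1.  */
  uint32_t and_mask = (val << shift) | ~(0xffu << shift);
  assert ((word & and_mask) == 0x11220344u);  /* what amand.w computes */
  return 0;
}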
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AND):
Remove.
(UNSPEC_COMPARE_AND_SWAP_XOR): Remove.
(UNSPEC_COMPARE_AND_SWAP_OR): Remove.
(atomic_test_and_set): Rename to ...
(atomic_fetch_<any_bitwise:amop><SHORT:mode>): ... this, and
adapt the expansion to use it for any bitwise operations and any
val, instead of just ior 1.
(atomic_test_and_set): New define_expand.
|
|
On LoongArch the sll.w and srl.w instructions only take the [4:0] bits of
rk (the shift amount) into account, and we've already defined
SHIFT_COUNT_TRUNCATED to 1 so the compiler knows this fact; thus the andi
instruction in the atomic_test_and_set expansion is not needed.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set): Remove
unneeded andi instruction from the expansion.
|
|
This instruction is used to skip a redundant barrier if -mno-ld-seq-sa
is in effect or the memory model requires a barrier on failure. But with
-mld-seq-sa and other memory models the barrier may not exist at all,
and then we should remove the "b 3f" instruction as well.
The implementation uses a new operand modifier "%T" to output a comment
marker if the operand is a memory order for which the barrier won't be
generated. "%T", and also "%t", were not really used before, and the
code for them in loongarch_print_operand_reloc is just some MIPS legacy.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand_reloc):
Make "%T" output a comment marker if the operand is a memory
order for which the barrier won't be generated; remove "%t".
* config/loongarch/sync.md (atomic_cas_value_strong<mode>): Add
%T before "b 3f".
(atomic_cas_value_cmp_and_7_<mode>): Likewise.
|
|
For LL-SC loops, if the atomic operation has succeeded, the SC
instruction always implies a full barrier, so the barrier we manually
insert only needs to account for the failure memorder, not the
success memorder (the barrier is skipped with "b 3f" on success
anyway).
Note that if we use the AMCAS instructions, we indeed need to consider
both the success memorder and the failure memorder when deciding if the
"_db" suffix is needed. Thus the semantics of atomic_cas_value_strong<mode>
and atomic_cas_value_strong<mode>_amcas start to differ. To prevent the
compiler from being too clever, use a different unspec code for AMCAS
instructions.
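For reference, the success/failure memorder pair in question is the one
exposed at the C11 level, as in this illustrative sketch:
#include <stdatomic.h>

_Atomic int v;

/* On an LL-SC implementation, a successful SC already implies a full
   barrier, so only the failure order (here: acquire) still needs an
   explicit barrier on the failure path.  */
int try_set (int expected, int desired)
{
  return atomic_compare_exchange_strong_explicit
    (&v, &expected, desired,
     memory_order_seq_cst,   /* success memorder */
     memory_order_acquire);  /* failure memorder */
}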
gcc/ChangeLog:
* config/loongarch/sync.md (UNSPEC_COMPARE_AND_SWAP_AMCAS): New
UNSPEC code.
(atomic_cas_value_strong<mode>): NFC, update the comment to note
we only need to consider failure memory order.
(atomic_cas_value_strong<mode>_amcas): Use
UNSPEC_COMPARE_AND_SWAP_AMCAS instead of
UNSPEC_COMPARE_AND_SWAP.
(atomic_compare_and_swap<mode:GPR>): Pass failure memorder to
gen_atomic_cas_value_strong<mode>.
(atomic_compare_and_swap<mode:SHORT>): Pass failure memorder to
gen_atomic_cas_value_cmp_and_7_si.
|
|
We can use bstrins for masking the address here. As people are already
working on LA32R (which lacks bstrins instructions), for future-proofing
we check whether (const_int -4) is an and_operand and force it into an
register if not.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_test_and_set): Use bstrins
for masking the address if possible.
|
|
Atomic load does not modify the memory, so the "+" constraint modifier is
unneeded. Atomic store does not read the memory, thus we can use "="
instead.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_load<mode>): Remove "+" for
the memory operand.
(atomic_store<mode>): Use "=" instead of "+" for the memory
operand.
|
|
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md: Use <size> instead of <amo>.
(amo): Remove.
|
|
They are the same.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_optab): Remove.
(atomic_<atomic_optab><mode>): Change atomic_optab to amop.
(atomic_fetch_<atomic_optab><mode>): Likewise.
|
|
PR 121213 shows an unnecessary "li target,0" in an atomic exchange loop
on RISC-V.
The source operand for an amoswap instruction should allow (const_int 0)
in addition to GPRs. So the operand's predicate is changed to
"reg_or_0_operand". The corresponding constraint is also changed to
allow a reg or the constant 0.
With the source operand no longer tied to the destination operand we do
not need the earlyclobber for the destination, so the destination
operand's constraint is adjusted accordingly.
This patch does not address the unnecessary sign extension reported in
the PR.
Tested with no regressions on riscv32-elf and riscv64-elf.
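At the source level, the case from the PR boils down to an exchange with
the constant 0, as in this sketch (the function name is illustrative):
#include <stdatomic.h>

_Atomic long lock;

/* With reg_or_0_operand, the 0 can come straight from the x0 register,
   so no separate "li target,0" is needed to feed the amoswap.  */
long release (void)
{
  return atomic_exchange_explicit (&lock, 0, memory_order_release);
}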
PR target/121213
gcc/
* config/riscv/sync.md (amo_atomic_exchange<mode>): Allow
(const_int 0) as input operand. Do not tie input to output.
No longer earlyclobber the output.
gcc/testsuite
* gcc.target/riscv/amo/pr121213.c: New test.
|
|
This patch fixes genrecog warnings about operands missing modes. This is
done by explicitly specifying modes of operations.
PR target/109324
gcc/ChangeLog:
* config/h8300/addsub.md: Explicitly specify mode for plus operation.
* config/h8300/jumpcall.md: Explicitly specify modes for eq and
match_operand operations.
* config/h8300/testcompare.md: Explicitly specify modes for eq, ltu
and compare operations.
|
|
Commit r16-3028-g0c517ddf9b136c introduced parsing of conditional blocks
in riscv-ext*.def. For simplicity, it used a simple regular expression
to match the C++ lambda function for each condition. But the regular
expression is too simple - it matches only the first scoped code block,
without any trailing closing braces.
The "c" dependency for the "zca" extension has two code blocks inside
its conditional. One for RV32 and one for RV64. The script matches
only the RV32 block, and leaves the RV64 one. Any strings left, in turn,
are considered a list of non-conditional extensions. Thus the quoted
strings "d" and "zcd" from that block are taken as "simple" (non-conditional)
dependencies:
  if (subset_list->xlen () == 64)
    {
      if (subset_list->lookup ("d"))
        return subset_list->lookup ("zcd");
As a result, arch-canonicalize erroneously adds "d" extension:
$ ./config/riscv/arch-canonicalize rv32ec
rv32efdc_zicsr_zca_zcd_zcf
Before r16-3028-g0c517ddf9b136c the command returned:
$ ./config/riscv/arch-canonicalize rv32ec
rv32ec
Fix by extending the conditional block match until the number of opening
and closing braces is equal. This change might seem crude, but it does
save us from introducing a full C++ parser into the simple
arch-canonicalize python script. With this patch the script now
returns:
$ ./config/riscv/arch-canonicalize rv32ec
rv32ec_zca
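The fix itself is in the Python script; the same brace-balancing idea is
sketched here in C (a hypothetical helper, not the patch):
#include <stdio.h>
#include <string.h>

/* Consume characters starting at an opening '{' until the brace depth
   returns to zero, so nested blocks stay inside the matched span.  */
static size_t
match_block (const char *s)
{
  size_t i = 0;
  int depth = 0;
  do
    {
      if (s[i] == '{')
        depth++;
      else if (s[i] == '}')
        depth--;
      i++;
    }
  while (s[i - 1] != '\0' && depth > 0);
  return i;
}

int main (void)
{
  /* Two nested blocks, as in the "zca" condition for RV32 and RV64.  */
  const char *cond = "{ if (rv32) { ... } if (rv64) { ... } }";
  printf ("matched %zu of %zu chars\n", match_block (cond), strlen (cond));
  return 0;
}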
PR target/121538
gcc/ChangeLog:
* config/riscv/arch-canonicalize (parse_dep_exts):
Match condition block up to closing brace.
(test_parse_long_condition_block): New test.
|
|
Add target("80387") attribute to enable and disable x87 instructions in a
function.
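A minimal usage sketch of the new attribute (the function itself is
illustrative, not from the testsuite):
/* Allow x87 instructions in this function even when they are disabled
   on the command line, e.g. with -mno-80387.  */
__attribute__((target("80387")))
double scale (double a, double b)
{
  return a * b;
}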
gcc/
PR target/121541
* config/i386/i386-options.cc
(ix86_valid_target_attribute_inner_p): Add target("80387")
attribute. Set the mask bit in opts_set->x_target_flags if the
mask bit in opts->x_target_flags is updated.
* doc/extend.texi: Document target("80387") function attribute.
gcc/testsuite/
PR target/121541
* gcc.target/i386/pr121541-1a.c: New test.
* gcc.target/i386/pr121541-1b.c: Likewise.
* gcc.target/i386/pr121541-2.c: Likewise.
* gcc.target/i386/pr121541-3.c: Likewise.
* gcc.target/i386/pr121541-4.c: Likewise.
* gcc.target/i386/pr121541-5a.c: Likewise.
* gcc.target/i386/pr121541-5b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The list of insns supported by vx combine is out of date; update it to
cover all insns supported for now.
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Add supported insn
of vx combine.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
The previous patch missed the DONE indicator of the vx
combine pattern, so add it back.
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Add missed DONE
for vx combine pattern.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
__builtin_round() fails to save/restore the FP exception flags around the FP
compare insn, which can potentially clobber them.
Worth noting that the fflags restore bracketing is slightly different
from the glibc implementation. Both FLT and FCVT can potentially clobber
fflags. gcc generates the code below where, even if the branch is not
taken and FCVT is not executed, FLT is still executed. Thus FSFLAGS is
placed AFTER the label 'L3'. In the glibc implementation FLT can't clobber
fflags due to an early NaN check, so FSFLAGS can be moved under the
branch, before the label.
| convert_float_to_float_round
| ...
| frflags a5
| fabs.s fa5,fa0
| flt.s a4,fa5,fa4 <--- can clobber fflags
| beq a4,zero,.L3
| fcvt.w.s a4,fa0,rmm <--- also
| fcvt.s.w fa5,a4
| fsgnj.s fa0,fa5,fa0
| .L3:
| fsflags a5 <-- both code paths
Fixes: f652a35877e3 ("This is almost exclusively Jivan's work....")
PR target/121534
gcc/ChangeLog:
* config/riscv/riscv.md (round_pattern): Save/restore fflags.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: Adjust
scan pattern for additional instances of frflags/fsrflags.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
Add support for the MIPS prefetch extension xmipscbop.
Addressed the review comments and tested with "runtest --tool gcc
--target_board='riscv-sim/-march=rv64gc_zba_zbb_zbc_zbs/-mabi=lp64/-mcmodel=medlow'
riscv.exp", and for 32-bit too.
Lint warnings can be ignored for riscv-ext.opt.
gcc/ChangeLog:
* config/riscv/riscv-ext-mips.def (DEFINE_RISCV_EXT):
Added mips prefetch extension.
* config/riscv/riscv-ext.opt: Generated file.
* config/riscv/riscv.md (prefetch):
Added mips prefetch address operand constraint.
* config/riscv/constraints.md: Added mips specific constraint.
* config/riscv/predicates.md (prefetch_operand):
Updated for mips nine bits offset.
* config/riscv/riscv.cc (riscv_prefetch_offset_address_p):
New function to check for a legitimate address with offset
for prefetch.
* config/riscv/riscv-protos.h: Likewise.
* config/riscv/riscv.h:
Add macros to support the mips cached type.
* doc/riscv-ext.texi: Updated for mips prefetch.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mipsprefetch.c: Test file for mips.pref.
|
|
One of Alfie's FMV patches adds a hook that, in some cases,
is used to silently query a target_version (with no diagnostics
expected). In the review, I'd suggested handling this using
a location_t *, with null meaning "suppress diagnostics":
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692113.html
This patch tries to propagate that through the RISC-V parsing code.
I realise this isn't very elegant, sorry.
I think riscv_compare_version_priority should also logically suppress
diagnostics, since it's supposed to be a pure query function. (From
that point of view, advocating for this change for Alfie's patch might
have been a bit unfair.)
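A rough sketch of the convention (hypothetical code, not from the patch):
a null location pointer means "silent query", so no diagnostics are
emitted:
#include <stdio.h>

typedef unsigned int location_t;

/* Validation helper taking location_t *: null suppresses diagnostics.  */
static int
parse_ext (const char *ext, const location_t *loc)
{
  int ok = ext[0] != '\0';  /* stand-in for real validation */
  if (!ok && loc)
    fprintf (stderr, "location %u: error: empty extension\n", *loc);
  return ok;
}

int main (void)
{
  location_t loc = 42;
  parse_ext ("", &loc);  /* diagnoses */
  parse_ext ("", 0);     /* pure query, stays silent */
  return 0;
}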
gcc/
* config/riscv/riscv-protos.h
(riscv_process_target_version_attr): Change location_t argument
to location_t *.
* config/riscv/riscv-subset.h
(riscv_subset_list::riscv_subset_list): Change location_t argument
to location_t *.
(riscv_subset_list::parse): Likewise.
(riscv_subset_list::set_loc): Likewise.
(riscv_minimal_hwprobe_feature_bits): Likewise.
(riscv_subset_list::m_loc): Change type to location_t.
* common/config/riscv/riscv-common.cc
(riscv_subset_list::riscv_subset_list): Change location_t argument
to location_t *.
(riscv_subset_list::add): Suppress diagnostics when m_loc is null.
(riscv_subset_list::parsing_subset_version): Likewise.
(riscv_subset_list::parse_profiles): Likewise.
(riscv_subset_list::parse_base_ext): Likewise.
(riscv_subset_list::parse_single_std_ext): Likewise.
(riscv_subset_list::check_conflict_ext): Likewise.
(riscv_subset_list::parse_single_multiletter_ext): Likewise.
(riscv_subset_list::parse): Change location_t argument to location_t *.
(riscv_subset_list::set_loc): Likewise.
(riscv_minimal_hwprobe_feature_bits): Likewise.
(riscv_parse_arch_string): Update call accordingly.
* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::m_loc): Change type to location_t *.
(riscv_target_attr_parser::riscv_target_attr_parser): Change
location_t argument to location_t *.
(riscv_process_one_target_attr): Likewise.
(riscv_process_target_attr): Likewise.
(riscv_process_target_version_attr): Likewise.
(riscv_target_attr_parser::parse_arch): Suppress diagnostics when
m_loc is null.
(riscv_target_attr_parser::handle_arch): Likewise.
(riscv_target_attr_parser::handle_cpu): Likewise.
(riscv_target_attr_parser::handle_tune): Likewise.
(riscv_target_attr_parser::handle_priority): Likewise.
(riscv_option_valid_attribute_p): Update call accordingly.
(riscv_option_valid_version_attribute_p): Likewise.
* config/riscv/riscv.cc (parse_features_for_version): Add a
location_t * argument.
(dispatch_function_versions): Update call accordingly.
(riscv_compare_version_priority): Likewise, suppressing diagnostics.
|
|
PR target/121542
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_vector_costs::add_stmt_cost): When using vectype,
first determine whether it is NULL.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/pr121542.c: New test.
|
|
So this is a minor bug in the riscv move expanders. They have a special
case for extraction from vector objects which assumes they can use
gen_lowpart unconditionally. That's not always the case.
We can just bypass that special code for cases where we can't use gen_lowpart
and let the more generic code run. If gen_lowpart_common indicates we've got a
case that can't be handled we just bypass the special extraction code.
Tested on riscv64-elf and riscv32-elf. Waiting for pre-commit CI to do its
thing.
PR target/119275
gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Avoid calling
gen_lowpart for cases where it'll fail. Just use standard expander
paths for those cases.
gcc/testsuite/
* gcc.target/riscv/pr119275.c: New test.
|
|
Since the cris port was added to gcc it has passed --em=criself
to gas, as an abbreviation for --emulation=criself. Starting with
binutils-2.45 that causes a hard error in gas due to ambiguity with
another option.
Fixed by replacing the abbreviation with the complete option.
Tested by building a cross to cris-elf with binutils-2.45, which
failed before but now succeeds.
gcc/
PR target/121336
* config/cris/cris.h: Do not abbreviate --emulation.
Signed-off-by: Mikael Pettersson <mikpelinux@gmail.com>
|
|
These patterns had one (if_then_else ...) nested within another.
The outer if_then_else had SImode, which means that the "then"
and "else" should also be SImode (unless they're const_ints).
However, the inner if_then_else was modeless, which led to an
assertion failure when trying to take a subreg of it.
gcc/
PR target/121501
* config/rs6000/rs6000.md (cmprb, setb_signed, setb_unsigned)
(cmprb2, cmpeqb): Add missing modes to nested if_then_elses.
|
|
In commit r16-2316-gc6676092318, patterns were mistakenly introduced
which actually should have been merged as alternatives into existing zero
extend patterns.
While on it, generalize the vec_extract patterns and also allow
registers for the index. A subsequent patch will add
register+immediate support.
gcc/ChangeLog:
* config/s390/s390.md: Merge movdi<mode>_zero_extend_A and
movsi<mode>_zero_extend_A into zero_extendsidi2 and
zero_extendhi<mode>2_z10 and
zero_extend<HQI:mode><GPR:mode>2_extimm.
* config/s390/vector.md (*movdi<mode>_zero_extend_A): Remove.
(*movsi<mode>_zero_extend_A): Remove.
(*movdi<mode>_zero_extend_B): Move to vec_extract patterns and
rename to *vec_extract<mode>_zero_extend.
(*movsi<mode>_zero_extend_B): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vlgv-zero-extend-1.c: Require target
s390_mvx.
* gcc.target/s390/vector/vlgv-zero-extend-2.c: New test.
|
|
commit 9804b23198b39f85a7258be556c5e8aed44b9efc
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Sun Apr 13 11:38:24 2025 -0700
x86: Add preserve_none and update no_caller_saved_registers attributes
allowed MMX/80387 instructions in functions with no_caller_saved_registers
attribute by accident. Update ix86_set_current_function to properly
check if MMX and 80387 are enabled.
gcc/
PR target/121540
* config/i386/i386-options.cc (ix86_set_current_function):
Properly check if MMX and 80387 are enabled.
gcc/testsuite/
PR target/121540
* gcc.target/i386/no-callee-saved-19a.c (dg-options): Add
"-mno-avx -mno-mmx -mno-80387"
* gcc.target/i386/no-callee-saved-19b.c: Likewise.
* gcc.target/i386/no-callee-saved-19c.c: Likewise.
* gcc.target/i386/no-callee-saved-19d.c: Likewise.
* gcc.target/i386/no-callee-saved-19e.c: Likewise.
* gcc.target/i386/pr121208-1a.c: Likewise.
* gcc.target/i386/pr121208-1b.c: Likewise.
* gcc.target/i386/pr121540-1.c: New test.
* gcc.target/i386/pr121540-2.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
So the usual problems: DFAs without full coverage in the sifive p400 and
p600 scheduler models. I took the output of Kito's checker and used that
to construct a dummy reservation for each of them.
Tested on riscv32-elf and riscv64-elf with no regressions.
Pushing to the trunk once pre-commit CI gives the green light.
PR target/121531
gcc/
* config/riscv/sifive-p400.md (sifive_p400_unknown): New reservation.
* config/riscv/sifive-p600.md (sifive_p600_unknown): Likewise.
gcc/testsuite/
* gcc.target/riscv/pr121531.c: New test.
|
|
For TLS calls:
1. UNSPEC_TLS_GD:
(parallel [
(set (reg:DI 0 ax)
(call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
(const_int 0 [0])))
(unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
(reg/f:DI 7 sp)] UNSPEC_TLS_GD)
(clobber (reg:DI 5 di))])
2. UNSPEC_TLS_LD_BASE:
(parallel [
(set (reg:DI 0 ax)
(call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
(const_int 0 [0])))
(unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
3. UNSPEC_TLSDESC:
(parallel [
(set (reg/f:DI 104)
(plus:DI (unspec:DI [
(symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
(reg:DI 114)
(reg/f:DI 7 sp)] UNSPEC_TLSDESC)
(const:DI (unspec:DI [
(symbol_ref:DI ("e") [flags 0x1a])
] UNSPEC_DTPOFF))))
(clobber (reg:CC 17 flags))])
(parallel [
(set (reg:DI 101)
(unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
(reg:DI 112)
(reg/f:DI 7 sp)] UNSPEC_TLSDESC))
(clobber (reg:CC 17 flags))])
they return the same value for the same input value. But multiple calls
with the same input value may be generated for simple programs like:
void a(long *);
int b(void);
void c(void);
static __thread long e;
long
d(void)
{
  a(&e);
  if (b())
    c();
  return e;
}
When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following codes are
generated:
.type d, @function
d:
.LFB0:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
leaq e@TLSDESC(%rip), %rbx
movq %rbx, %rax
call *e@TLSCALL(%rax)
addq %fs:0, %rax
movq %rax, %rdi
call a@PLT
call b@PLT
testl %eax, %eax
jne .L8
movq %rbx, %rax
call *e@TLSCALL(%rax)
popq %rbx
.cfi_remember_state
.cfi_def_cfa_offset 8
movq %fs:(%rax), %rax
ret
.p2align 4,,10
.p2align 3
.L8:
.cfi_restore_state
call c@PLT
movq %rbx, %rax
call *e@TLSCALL(%rax)
popq %rbx
.cfi_def_cfa_offset 8
movq %fs:(%rax), %rax
ret
.cfi_endproc
There are 3 "call *e@TLSCALL(%rax)" instructions and they all return the
same value. Rename the remove_redundant_vector pass to the x86_cse pass
and, for 64-bit, extend it to also remove redundant TLS calls so as to
generate:
d:
.LFB0:
.cfi_startproc
pushq %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
leaq e@TLSDESC(%rip), %rax
movq %fs:0, %rdi
call *e@TLSCALL(%rax)
addq %rax, %rdi
movq %rax, %rbx
call a@PLT
call b@PLT
testl %eax, %eax
jne .L8
movq %fs:(%rbx), %rax
popq %rbx
.cfi_remember_state
.cfi_def_cfa_offset 8
ret
.p2align 4,,10
.p2align 3
.L8:
.cfi_restore_state
call c@PLT
movq %fs:(%rbx), %rax
popq %rbx
.cfi_def_cfa_offset 8
ret
.cfi_endproc
with only one "call *e@TLSCALL(%rax)". This reduces the number of
__tls_get_addr calls in libgcc.a by 72%:
__tls_get_addr calls    before    after
libgcc.a                   868      243
gcc/
PR target/81501
* config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
(redundant_load): Renamed to ...
(redundant_pattern): This.
(ix86_place_single_vector_set): Replace redundant_load with
redundant_pattern.
(replace_tls_call): New.
(ix86_place_single_tls_call): Likewise.
(pass_remove_redundant_vector_load): Renamed to ...
(pass_x86_cse): This. Add val, def_insn, mode, scalar_mode, kind,
x86_cse, candidate_gnu_tls_p, candidate_gnu2_tls_p and
candidate_vector_p.
(pass_x86_cse::candidate_gnu_tls_p): New.
(pass_x86_cse::candidate_gnu2_tls_p): Likewise.
(pass_x86_cse::candidate_vector_p): Likewise.
(remove_redundant_vector_load): Renamed to ...
(pass_x86_cse::x86_cse): This. Extend to remove redundant TLS
calls.
(make_pass_remove_redundant_vector_load): Renamed to ...
(make_pass_x86_cse): This.
* config/i386/i386-passes.def: Replace
pass_remove_redundant_vector_load with pass_x86_cse.
* config/i386/i386-protos.h (ix86_tls_get_addr): New.
(make_pass_remove_redundant_vector_load): Renamed to ...
(make_pass_x86_cse): This.
* config/i386/i386.cc (ix86_tls_get_addr): Remove static.
* config/i386/i386.h (machine_function): Add
tls_descriptor_call_multiple_p.
* config/i386/i386.md (tls64): New attribute.
(@tls_global_dynamic_64_<mode>): Set tls_descriptor_call_multiple_p.
(@tls_local_dynamic_base_64_<mode>): Likewise.
(@tls_dynamic_gnu2_64_<mode>): Likewise.
(*tls_global_dynamic_64_<mode>): Set tls64 attribute to gd.
(*tls_local_dynamic_base_64_<mode>): Set tls64 attribute to ld_base.
(*tls_dynamic_gnu2_lea_64_<mode>): Set tls64 attribute to lea.
(*tls_dynamic_gnu2_call_64_<mode>): Set tls64 attribute to call.
(*tls_dynamic_gnu2_combine_64_<mode>): Set tls64 attribute to
combine.
gcc/testsuite/
PR target/81501
* g++.target/i386/pr81501-1.C: New test.
* gcc.target/i386/pr81501-1a.c: Likewise.
* gcc.target/i386/pr81501-1b.c: Likewise.
* gcc.target/i386/pr81501-2a.c: Likewise.
* gcc.target/i386/pr81501-2b.c: Likewise.
* gcc.target/i386/pr81501-3.c: Likewise.
* gcc.target/i386/pr81501-4a.c: Likewise.
* gcc.target/i386/pr81501-4b.c: Likewise.
* gcc.target/i386/pr81501-5.c: Likewise.
* gcc.target/i386/pr81501-6a.c: Likewise.
* gcc.target/i386/pr81501-6b.c: Likewise.
* gcc.target/i386/pr81501-7.c: Likewise.
* gcc.target/i386/pr81501-8a.c: Likewise.
* gcc.target/i386/pr81501-8b.c: Likewise.
* gcc.target/i386/pr81501-9a.c: Likewise.
* gcc.target/i386/pr81501-9b.c: Likewise.
* gcc.target/i386/pr81501-10a.c: Likewise.
* gcc.target/i386/pr81501-10b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
Newer linkers support an option to disable deduplication of entities.
This speeds up linking and can improve the debug experience. We adopt the
same criteria as clang in adding the option.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config.in: Regenerate.
* config/darwin.h (DARWIN_LD_NO_DEDUPLICATE): New.
(LINK_SPEC): Handle -no_deduplicate.
* configure: Regenerate.
* configure.ac: Detect linker support for -no_deduplicate.
|
|
The Darwin ABI uses a different section for string constants when
address sanitizing is enabled. This adds definitions of the asan-
specific sections and switches string constants to the correct
section.
It also makes the string constant symbols linker-visible when
asan is enabled, but not otherwise.
gcc/ChangeLog:
* config/darwin-sections.def (asan_string_section,
asan_globals_section, asan_liveness_section): New.
* config/darwin.cc (objc_method_decl): Use asan sections
when asan is enabled.
(darwin_encode_section_info): Alter string constant
linker visibility depending on asan.
(machopic_select_section): Use the asan sections when
asan is enabled.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/darwin-cfstring-3.c: Adjust for amended
string labels.
* g++.dg/torture/darwin-cfstring-3.C: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
When we canonicalize the comparison for a czero sequence we need to handle both
integer and fp comparisons. Furthermore, within the integer space we want to
make sure we promote any sub-word objects to a full word.
All that is working fine. After promotion we then force the value into a
register if it is not a register or constant already. The idea is not to have
to special case subregs in subsequent code. This works fine except when we're
presented with a floating point object that would be a subword. (subreg:SF
(reg:SI)) on rv64 for example.
So this tightens up that force_reg step. Bootstrapped and regression tested
on riscv64-linux-gnu and tested on riscv32-elf and riscv64-elf.
Pushing to the trunk after pre-commit verifies no regressions.
Jeff
PR target/121160
gcc/
* config/riscv/riscv.cc (canonicalize_comparands): Tighten check for
forcing value into a GPR.
gcc/testsuite/
* gcc.target/riscv/pr121160.c: New test.
|
|
The following splits up VMAT_GATHER_SCATTER into
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED. The main motivation is to reduce
the uses of (full) gs_info, but it also makes the kind representable
by a single entry rather than the ifn and decl tristate.
The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN,
since that's what we end up checking.
* tree-vectorizer.h (vect_memory_access_type): Replace
VMAT_GATHER_SCATTER with three separate access types,
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED.
(mat_gather_scatter_p): New predicate.
(GATHER_SCATTER_LEGACY_P): Remove.
(GATHER_SCATTER_IFN_P): Likewise.
(GATHER_SCATTER_EMULATED_P): Likewise.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Adjust.
(get_load_store_type): Likewise.
(vect_get_loop_variant_data_ptr_increment): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Likewise.
* config/riscv/riscv-vector-costs.cc
(costs::need_additional_vector_vars_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype):
Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Likewise.
|