|
When we canonicalize the comparison for a czero sequence we need to handle both
integer and fp comparisons. Furthermore, within the integer space we want to
make sure we promote any sub-word objects to a full word.
All that is working fine. After promotion we then force the value into a
register if it is not a register or constant already. The idea is not to have
to special case subregs in subsequent code. This works fine except when we're
presented with a floating point object that would be a subword, e.g. (subreg:SF
(reg:SI)) on rv64.
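As a hedged illustration (not the committed testcase), code along these lines
can leave a float living in the low half of an integer register, so the
canonicalization may see an FP subword rather than a plain FP register:
/* Hypothetical reduction: the bit-cast keeps the float bits in an integer
   register, so a conditional select feeding a czero sequence can see
   (subreg:SF (reg:SI)) instead of (reg:SF).  */
float
pick (int bits, int c)
{
  float f;
  __builtin_memcpy (&f, &bits, sizeof f);
  return c ? f : 0.0f;
}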
So this tightens up that force_reg step. Bootstrapped and regression tested on
riscv64-linux-gnu and tested on riscv32-elf and riscv64-elf.
Pushing to the trunk after pre-commit verifies no regressions.
Jeff
PR target/121160
gcc/
* config/riscv/riscv.cc (canonicalize_comparands): Tighten check for
forcing value into a GPR.
gcc/testsuite/
* gcc.target/riscv/pr121160.c: New test.
|
|
The following splits up VMAT_GATHER_SCATTER into
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED. The main motivation is to reduce
the uses of (full) gs_info, but it also makes the kind representable
by a single entry rather than the ifn and decl tristate.
The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN,
since that's what we end up checking.
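To make the shape concrete, here is a minimal sketch of the three-way split
and the new predicate (illustrative only; the real enum has more members and
the in-tree code may differ in detail):
enum vect_memory_access_type {
  VMAT_GATHER_SCATTER_LEGACY,
  VMAT_GATHER_SCATTER_IFN,
  VMAT_GATHER_SCATTER_EMULATED
  /* ... other access types elided ...  */
};

static inline bool
mat_gather_scatter_p (enum vect_memory_access_type t)
{
  return (t == VMAT_GATHER_SCATTER_LEGACY
          || t == VMAT_GATHER_SCATTER_IFN
          || t == VMAT_GATHER_SCATTER_EMULATED);
}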
* tree-vectorizer.h (vect_memory_access_type): Replace
VMAT_GATHER_SCATTER with three separate access types,
VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and
VMAT_GATHER_SCATTER_EMULATED.
(mat_gather_scatter_p): New predicate.
(GATHER_SCATTER_LEGACY_P): Remove.
(GATHER_SCATTER_IFN_P): Likewise.
(GATHER_SCATTER_EMULATED_P): Likewise.
* tree-vect-stmts.cc (check_load_store_for_partial_vectors):
Adjust.
(get_load_store_type): Likewise.
(vect_get_loop_variant_data_ptr_increment): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Likewise.
* config/riscv/riscv-vector-costs.cc
(costs::need_additional_vector_vars_p): Likewise.
* config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype):
Likewise.
(aarch64_vector_costs::count_ops): Likewise.
(aarch64_vector_costs::add_stmt_cost): Likewise.
|
|
The rtx cost value defined by the target backend affects the
calculation of register pressure classes in the IRA, thus affecting
scheduling. This may degrade program performance, as seen for example in
OpenSSL 3.5.1 SHA512 and SPEC CPU 2017 exchange_r.
This problem can be avoided by defining a set of register pressure
classes in the target backend instead of letting IRA calculate them
automatically by default.
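A hedged sketch of the general shape of such a hook (the exact classes chosen
for LoongArch may differ from this illustration):
/* Report a fixed set of pressure classes instead of letting IRA derive
   them from cost-dependent heuristics.  */
static int
loongarch_compute_pressure_classes (reg_class *classes)
{
  int n = 0;
  classes[n++] = GENERAL_REGS;
  classes[n++] = FP_REGS;
  return n;
}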
gcc/ChangeLog:
PR target/120476
* config/loongarch/loongarch.cc
(loongarch_compute_pressure_classes): New function.
(TARGET_COMPUTE_PRESSURE_CLASSES): Define.
|
|
This patch adds support for C23's _BitInt for LoongArch.
From the LoongArch psABI[1]:
> _BitInt(N) objects are stored in little-endian order in memory
> and are signed by default.
>
> For N ≤ 64, a _BitInt(N) object have the same size and alignment
> of the smallest fundamental integral type that can contain it.
> The unused high-order bits within this containing type are filled
> with sign or zero extension of the N-bit value, depending on whether
> the _BitInt(N) object is signed or unsigned. The _BitInt(N) object
> propagates its signedness to the containing type and is laid out
> in a register or memory as an object of this type.
>
> For N > 64, _BitInt(N) objects are implemented as structs of 64-bit
> integer chunks. The number of chunks is the smallest even integer M
> so that M * 64 ≥ N. These objects are of the same size of the struct
> containing the chunks, but always have 16-byte alignment. If there
> are unused bits in the highest-ordered chunk that contains used
> bits, they are defined as the sign- or zero- extension of the used
> bits depending on whether the _BitInt(N) object is signed or
> unsigned. If an entire chunk is unused, its bits are undefined.
[1] https://github.com/loongson/la-abi-specs
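A small usage sketch of what the quoted layout rules imply (sizes and
alignments derived from the psABI text above, not taken from the new tests):
/* N <= 64: contained in the smallest fundamental integral type.  */
_Static_assert (sizeof (_BitInt(13)) == 2, "fits a 16-bit container");
/* N > 64: an even number of 64-bit chunks with 16-byte alignment.  */
_Static_assert (sizeof (unsigned _BitInt(100)) == 16, "two 64-bit chunks");
_Static_assert (_Alignof (unsigned _BitInt(100)) == 16, "16-byte alignment");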
PR target/117599
gcc/ChangeLog:
* config/loongarch/loongarch.h: Define a PROMOTE_MODE case for
small _BitInts.
* config/loongarch/loongarch.cc (loongarch_promote_function_mode):
Same.
(loongarch_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Declare.
libgcc/ChangeLog:
* config/loongarch/t-softfp-tf: Enable _BitInt helper functions.
* config/loongarch/t-loongarch: Same.
* config/loongarch/libgcc-loongarch.ver: New file.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/bitint-alignments.c: New test.
* gcc.target/loongarch/bitint-args.c: New test.
* gcc.target/loongarch/bitint-sizes.c: New test.
|
|
So this is a minor bug in a few DFA descriptions such as the Xiangshan and a
couple of the SiFive descriptions.
While Xiangshan covers every insn type, some of the reservations check the mode
of the operation. Concretely the fdiv/fsqrt unit reservations vary based on
the mode. They handled DF/SF, but not HF (the relevant iterators don't include
BF).
This patch just adds HF support with the same characteristics as SF. Those who
know these designs better could perhaps improve the reservation, but this at
least keeps us from aborting.
I did check the other published DFAs for mode-dependent reservations. That's
how I found the p400/p600 issue.
Tested in my tester, waiting for CI to render its verdict before pushing.
PR target/121113
gcc/
* config/riscv/sifive-p400.md: Handle HFmode for fdiv/fsqrt.
* config/riscv/sifive-p600.md: Likewise.
* config/riscv/xiangshan.md: Likewise.
gcc/testsuite/
* gcc.target/riscv/pr121113.c: New test.
|
|
For
(set (reg/v:DI 106 [ k ])
(const_int 3000000000 [0xb2d05e00]))
...
(set (reg:V4SI 115 [ _13 ])
(vec_duplicate:V4SI (subreg:SI (reg/v:DI 106 [ k ]) 0)))
...
(set (reg:V2SI 118 [ _9 ])
(vec_duplicate:V2SI (subreg:SI (reg/v:DI 106 [ k ]) 0)))
we should generate
(set (reg:SI 125)
(const_int -1294967296 [0xffffffffb2d05e00]))
(set (reg:V4SI 124)
(vec_duplicate:V4SI (reg:SI 125)))
...
(set (reg:V4SI 115 [ _13 ])
(reg:V4SI 124))
...
(set (reg:V2SI 118 [ _9 ])
(subreg:V2SI (reg:V4SI 124) 0))
by converting the integer constant to the mode of the move.
gcc/
PR target/121497
* config/i386/i386-features.cc (ix86_broadcast_inner): Convert
integer constant to the mode of the move.
gcc/testsuite/
PR target/121497
* gcc.target/i386/pr121497.c: New test.
Co-authored-by: Liu, Hongtao <hongtao.liu@intel.com>
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch would like to combine the vec_duplicate + vmerge.vvm into the
vmerge.vxm. See the example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: late-combine will
take action if the GR2VR cost is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have example code like below, with a GR2VR cost of 0.
#define DEF_VX_MERGE_0(T) \
void \
test_vx_merge_##T##_case_0 (T * restrict out, T * restrict in, \
T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
{ \
if (i % 2 == 0) \
out[i] = x; \
else \
out[i] = in[i]; \
} \
}
DEF_VX_MERGE_0(int32_t)
Before this patch:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
...
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
...
22 │ vmerge.vvm v1,v1,v2,v0
...
25 │ bne a3,zero,.L3
After this patch:
11 │ beq a3,zero,.L8
...
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
...
20 │ vmerge.vxm v1,v1,a2,v0
...
23 │ bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*merge_vx_<mode>): Add new
pattern to combine the vmerge.vxm.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Hi,
In PR121334 we are asked to expand a const_vector of size 4 with
poly_int elements. It has 2 elts per pattern so is neither a
const_vector_duplicate nor a const_vector_stepped.
We don't allow this kind of constant in legitimate_constant_p but expr
apparently still wants us to expand it under certain conditions.
This patch implements a basic expander for such kinds of patterns.
As slide1up is used to build the individual vectors it also adds
a helper function expand_slide1up.
I regtested on rv64gcv_zvl512b but unfortunately the newly created pattern is
not even executed. I tried some variations of the original code but didn't
manage to trigger it.
Regards
Robin
PR target/121334
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_slide1up): New function.
(expand_vector_init_trailing_same_elem): Use new function.
(expand_const_vector_onestep): New function.
(expand_const_vector): Use expand_slide1up.
(expand_vector_init_merge_repeating_sequence): Ditto.
(shuffle_off_by_one_patterns): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121334.c: New test.
|
|
An enum can't be used in #if:
in a #if expression, identifiers that are not macros
are all treated as the number zero.
This patch may fix https://sourceware.org/bugzilla/show_bug.cgi?id=32776.
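A short illustration of the preprocessor rule being worked around (generic
names, not from the LoongArch sources):
enum { FOO = 1 };
#if FOO == 1   /* FOO is not a macro, so the preprocessor replaces it with 0;
                  the test is really 0 == 1 and never holds.  */
int never_compiled;
#endif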
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (ABI_BASE_LP64D): New macro.
(ABI_BASE_LP64F): New macro.
(ABI_BASE_LP64S): New macro.
(N_ABI_BASE_TYPES): New macro.
|
|
This is a patch primarily from Shreya, though I think she cribbed some
code from Philipp that we had internally within Ventana and I made some
minor adjustments as well.
So the basic idea here is similar to her work on logical ops --
specifically when we can generate more efficient code at expansion time,
then do so. In some cases the net is better code; in other cases we
lessen reliance on mvconst_internal and finally it provides
infrastructure that I think will help address an issue Paul Antoine
reported a little while back.
The most obvious case is using paired addis from initial code generation
for some constants. It will also use a shNadd insn when the cost to
synthesize the original value is higher than that of the right-shifted value.
Finally it will negate the constant and use "sub" if the negated
constant is cheaper than the original constant.
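As a hedged example of the paired-addi case (not necessarily what the new
test checks), a constant just outside the signed 12-bit addi range can be
added with two addis instead of being synthesized into a temporary first:
/* 4094 = 2047 + 2047, so this can expand directly to
     addi a0,a0,2047
     addi a0,a0,2047
   rather than materializing 4094 and using a register-register add.  */
long add4094 (long x) { return x + 4094; }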
There's more work to do in here, particularly WRT 32 bit objects for
rv64. Shreya is looking at that right now. There may also be cases
where another shNadd or addi would be profitable. We haven't really
explored those cases in any detail; while there may be cases to handle,
it's unclear how often they occur in practice.
I don't want to remove the define_insn_and_split for the paired addi
cases yet. I think that likely happens as a side effect of fixing Paul
Antoine's issue.
Bootstrapped and regression tested on a BPI & Pioneer box. Will
obviously wait for the pre-commit tester before moving forward.
Jeff
PR target/120603
gcc/
* config/riscv/riscv-protos.h (synthesize_add): Add prototype.
* config/riscv/riscv.cc (synthesize_add): New function.
* config/riscv/riscv.md (addsi3): Allow any constant as operands[2]
in the expander. Force the constant into a register as needed for
TARGET_64BIT. Use synthesize_add for !TARGET_64BIT.
(*adddi3): Renamed from adddi3.
(adddi3): New expander. Use synthesize_add.
gcc/testsuite
* gcc.target/riscv/add-synthesis-1.c: New test.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
Co-authored-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
|
|
Reject QI/HImode conditions, which would require extension in
order to compare. Fixes
z.c:10:1: error: unrecognizable insn:
10 | }
| ^
(insn 23 22 24 2 (set (reg:CC 66 cc)
(compare:CC (reg:HI 128)
(reg:HI 127))) "z.c":6:6 -1
(nil))
during RTL pass: vregs
gcc:
* config/aarch64/aarch64.md (mov<ALLI>cc): Accept MODE_CC
conditions directly; reject QI/HImode conditions.
gcc/testsuite:
* gcc.target/aarch64/cmpbr-3.c: New.
* gcc.target/aarch64/ifcvt_multiple_sets_rewire.c: Simplify
test for csel by ignoring the actual registers used.
|
|
Restrict the immediate range to the intersection of LT/GE and GT/LE
so that cfglayout can invert the condition to redirect any branch.
gcc:
PR target/121388
* config/aarch64/aarch64.cc (aarch64_cb_rhs): Restrict the
range of LT/GE and GT/LE to their intersections.
* config/aarch64/aarch64.md (*aarch64_cb<INT_CMP><GPI>): Unexport.
Use cmpbr_imm_predicate instead of aarch64_cb_rhs.
* config/aarch64/constraints.md (Uc1): Accept 0..62.
(Uc2): Remove.
* config/aarch64/iterators.md (cmpbr_imm_predicate): New.
(cmpbr_imm_constraint): Update to match aarch64_cb_rhs.
* config/aarch64/predicates.md (aarch64_cb_reg_i63_operand): New.
(aarch64_cb_reg_i62_operand): New.
gcc/testsuite:
PR target/121388
* gcc.target/aarch64/cmpbr.c (u32_x0_ult_64): XFAIL.
(i32_x0_slt_64, u64_x0_ult_64, i64_x0_slt_64): XFAIL.
* gcc.target/aarch64/cmpbr-2.c: New.
|
|
gcc:
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs):
Use aarch64_cb_rhs to match CB insns.
|
|
There is a conflict between aarch64_tbzltdi1 and aarch64_cbltdi
with respect to pnum_clobbers, resulting in a recog failure:
0xa1fffe fancy_abort(char const*, int, char const*)
../../gcc/diagnostics/context.cc:1640
0x81340e patch_jump_insn
../../gcc/cfgrtl.cc:1303
0xc0eafe redirect_branch_edge
../../gcc/cfgrtl.cc:1330
0xc0f372 cfg_layout_redirect_edge_and_branch
../../gcc/cfgrtl.cc:4736
0xbfb6b9 redirect_edge_and_branch(edge_def*, basic_block_def*)
../../gcc/cfghooks.cc:391
0x1fa9310 try_forward_edges
../../gcc/cfgcleanup.cc:561
0x1fa9310 try_optimize_cfg
../../gcc/cfgcleanup.cc:2931
0x1fa9310 cleanup_cfg(int)
../../gcc/cfgcleanup.cc:3143
0x1fe11e8 rest_of_handle_cse
../../gcc/cse.cc:7591
0x1fe11e8 execute
../../gcc/cse.cc:7622
The simplest solution is to remove the clobber from aarch64_tbz.
This removes the possibility of expansion via TST+B.cond, which
will merely fall back to TBNZ+B on shorter branches.
gcc:
PR target/121385
* config/aarch64/aarch64.md (*aarch64_tbz<LTGE><ALLI>1): Remove
cc clobber and expansion via TST+Bcond.
gcc/testsuite:
PR target/121385
* gcc.target/aarch64/cmpbr-1.c: New.
|
|
With -mtrack-speculation, CC_REGNUM must be used at every
conditional branch.
gcc:
* config/aarch64/aarch64.h (TARGET_CMPBR): False when
aarch64_track_speculation is true.
|
|
Both patterns used !reload_completed as a condition, which is
questionable at best. The branch pattern failed to include a
clobber of CC_REGNUM. Both problems were unlikely to trigger
in practice, due to how the optimization pipeline is organized,
but let's fix them anyway.
gcc:
* config/aarch64/aarch64.cc (aarch64_gen_compare_split_imm24): New.
* config/aarch64/aarch64-protos.h: Update.
* config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>): Use it.
Add match_scratch and cc clobbers. Use match_operator instead of
iterator expansion.
(*compare_cstore<GPI>_insn): Likewise.
|
|
Two of the three uses of aarch64_imm24 included the important follow-up
tests vs aarch64_move_imm and aarch64_plus_operand. Lack of the exclusion
within aarch64_if_then_else_costs produced incorrect costing.
Since aarch64_split_imm24 has already matched a non-negative CONST_INT,
drill down from aarch64_plus_operand to aarch64_uimm12_shift.
gcc:
* config/aarch64/predicates.md (aarch64_split_imm24): Rename from
aarch64_imm24; exclude aarch64_move_imm and aarch64_uimm12_shift.
* config/aarch64/aarch64.md (*aarch64_bcond_wide_imm<GPI>):
Update for aarch64_split_imm24.
(*compare_cstore<GPI>_insn): Likewise.
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Likewise.
|
|
The save/restore_stack_nonlocal patterns passed a DImode rtx to
gen_tbranch_neqi3 for a QImode compare. But since we're seeding
r16 with 1, GCSEnabled will clear the only set bit in r16, so we
can use CBNZ instead of TBNZ.
gcc:
* config/aarch64/aarch64.md (tbranch_<EQL><SHORT>3): Remove.
(save_stack_nonlocal): Use aarch64_gen_compare_zero_and_branch.
(restore_stack_nonlocal): Likewise.
gcc/testsuite:
* gcc.target/aarch64/gcs-nonlocal-3.c: Match cbnz.
|
|
With -mtrack-speculation, the pattern that was directly expanded by
aarch64_restore_za is disabled. Use the helper function instead.
gcc:
* config/aarch64/aarch64.cc
(aarch64_gen_compare_zero_and_branch): Export.
* config/aarch64/aarch64-protos.h
(aarch64_gen_compare_zero_and_branch): Declare it.
* config/aarch64/aarch64-sme.md (aarch64_restore_za): Use it.
* config/aarch64/aarch64.md (*aarch64_cbz<EQL><GPI>): Unexport.
|
|
gcc:
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Reorg to
include the cost of inner within TBZ sign-bit test, only match
CBZ/CBNZ with valid modes, and both for the aarch64_imm24 test.
|
|
gcc:
* config/aarch64/aarch64.cc (aarch64_if_then_else_costs): Remove
else after return and re-indent.
|
|
One kilobyte, not one kilobit.
gcc:
* config/aarch64/aarch64.md (BRANCH_LEN_N_1KiB): Rename
from BRANCH_LEN_N_1Kib.
|
|
The previous cost value for vec_duplicate was mostly based on operators
like add/minus. The rtx_cost function tried to match them case by case,
check whether the expression contains a vec_duplicate, and then update
the cost value.
That was OK when it was initially added, but it looks confusing/redundant
as more and more operators get involved. Per Robin's suggestion, we only
care whether a sub-rtx contains vec_duplicate, instead of handling it
operator by operator.
Thus, this patch refactors that and gets rid of the per-operator handling
when computing the vec_duplicate cost.
The test suites below passed for this patch series.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (get_vector_binary_rtx_cost): Remove.
(riscv_rtx_costs): Refactor to search for vec_duplicate in the
sub-rtx.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Update
asm check due to above change.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_zvfh-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_zvfh-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Fix the bounds checking for the opc1 operand of the following intrinsics:
__arm_mcrr
__arm_mcrr2
__arm_mrrc
__arm_mrrc2
gcc/ChangeLog:
PR target/121464
* config/arm/arm.md (arm_<mrrc>, arm_<mcrr>): Fix operand check.
gcc/testsuite/ChangeLog:
PR target/121464
* gcc.target/arm/acle/mcrr.c: Update testcase.
* gcc.target/arm/acle/mcrr2.c: Likewise.
* gcc.target/arm/acle/mrrc.c: Likewise.
* gcc.target/arm/acle/mrrc2.c: Likewise.
|
|
This patch fixes some comment typos, singe -> single and unsinged -> unsigned.
2025-08-11 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-cfg.cc (find_case_label_for_value): Fix comment typo,
singe-valued -> single-valued.
* config/arc/arc.md: Fix comment typos, unsinged -> unsigned.
gcc/fortran/
* gfortran.h (gfc_case): Fix comment typo, singe -> single.
gcc/testsuite/
* g++.dg/warn/template-1.C: Fix comment typo, unsinged -> unsigned.
* gcc.target/powerpc/builtins-2-p9-runnable.c (main): Likewise.
* gcc.dg/graphite/id-30.c: Likewise.
|
|
Grow the local frame down instead of up for mips16 code size.
By growing the frame downwards we get spill slots created at the lowest
address rather than the highest address in a local frame. The benefit is
that when the frame is large, the spill slots can still be accessed using
a 16-bit instruction, whereas it is less important for large local
variables to be accessed using short instructions as they are (probably)
accessed less frequently.
This is on by default for MIPS16.
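A hedged sketch of the kind of definition described below (the internal flag
name for the new option is assumed here, not copied from the patch):
/* Grow the frame downwards for MIPS16 when the new option is enabled,
   keeping spill slots at the lowest addresses of the local frame.
   TARGET_GROW_FRAME_DOWNWARDS stands in for whatever flag mips.opt
   actually generates.  */
#define FRAME_GROWS_DOWNWARD (TARGET_MIPS16 && TARGET_GROW_FRAME_DOWNWARDS)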
gcc/
* config/mips/mips.h (FRAME_GROWS_DOWNWARD): Allow the frame to
grow downwards for mips16 when -mgrow-frame-downwards is set.
* config/mips/mips.opt: Add -mgrow-frame-downwards option.
|
|
When we are using section anchors, there's a requirement that the
sequence of the content is an unbroken block. If we allow linker-
visible symbols in that block, ld(64) would be able to break it
into sub-sections on those symbol boundaries.
Do not allow symbols that should be visible to be anchored.
Do not make anchor block internal symbols linker-visible.
gcc/ChangeLog:
* config/darwin.cc (darwin_encode_section_info): Do not
make anchored symbols linker-visible.
(darwin_use_anchors_for_symbol_p): Disallow anchoring on
symbols that must be linker-visible (or external), even
if the definitions are in this TU.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
In principle, these begin (or at least delineate) a region that
could be split by the static linker. If the symbols are hidden
to newer linkers they produce diagnostics about the temporary
symbol generated.
gcc/ChangeLog:
* config/darwin.h (ASM_GENERATE_INTERNAL_LABEL): New
entry for LANCHOR.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
References to literal pool entries do not need to be reloaded or converted
to "(mem (reg X))" to load via base register.
gcc/ChangeLog:
* config/xtensa/constraints.md (T):
Change define_memory_constraint to define_special_memory_constraint.
|
|
As discussed in https://gcc.gnu.org/pipermail/gcc-patches/2025-June/685733.html
the operand of the call should be a mem rather than an unspec.
This patch moves the unspec to an additional argument of the parallel
and adjusts cmse_nonsecure_call_inline_register_clear accordingly.
The scan-rtl-dump in cmse-18.c needs a fix since we no longer emit the
'unspec' part.
In addition, I noticed that since arm_v8_1m_mve_ok is always true in
the context of the test (we know we support CMSE as per cmse.exp, and
arm_v8_1m_mve_ok finds the adequate options), we actually only use the
more permissive regex. To improve that, the patch duplicates the
test, such that cmse-18.c forces -march=armv8-m.main+fp (so FPCXP is
disabled), and cmse-19.c forces -march=armv8.1-m.main+mve (so FPCXP is
enabled). Each test uses the appropriate scan-rtl-dump, and also
checks we are using UNSPEC_NONSECURE_MEM (we need to remove -slim for
that). The tests enable an FPU via -march so that the test passes
whether the testing harness forces -mfloat-abi or not.
2025-07-08 Christophe Lyon <christophe.lyon@linaro.org>
PR target/120977
gcc/
* config/arm/arm.md (call): Move unspec parameter to parallel.
(nonsecure_call_internal): Likewise.
(call_value): Likewise.
(nonsecure_call_value_internal): Likewise.
* config/arm/thumb1.md (nonsecure_call_reg_thumb1_v5): Likewise.
(nonsecure_call_value_reg_thumb1_v5): Likewise.
* config/arm/thumb2.md (nonsecure_call_reg_thumb2_fpcxt):
Likewise.
(nonsecure_call_reg_thumb2): Likewise.
(nonsecure_call_value_reg_thumb2_fpcxt): Likewise.
(nonsecure_call_value_reg_thumb2): Likewise.
* config/arm/arm.cc (cmse_nonsecure_call_inline_register_clear):
Likewise.
gcc/testsuite
* gcc.target/arm/cmse/cmse-18.c: Check only the case when FPCXT is
not enabled.
* gcc.target/arm/cmse/cmse-19.c: New test.
|
|
This patch fixes incorrect constraints in RTL patterns for AArch64 SVE
gather/scatter with type widening/narrowing and vector-plus-immediate
addressing. The bug leads to below "immediate offset out of range"
errors during assembly, eventually causing compilation failures.
/tmp/ccsVqBp1.s: Assembler messages:
/tmp/ccsVqBp1.s:54: Error: immediate offset out of range 0 to 31 at operand 3 -- `ld1b z1.d,p0/z,[z1.d,#64]'
Current RTL patterns for such instructions incorrectly use vgw or vgd
constraints for the immediate operand, based on the vector element type
in Z registers (zN.s or zN.d). However, for gather/scatter with type
conversions, the immediate range for vector-plus-immediate addressing is
determined by the element type in memory, which differs from that in
vector registers. Using the wrong constraint can produce out-of-range
offset values that cannot be encoded in the instruction.
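As a hedged illustration (not the committed test), a byte gather that widens
to 64-bit elements takes its immediate range from the in-memory byte size
(0-31 for ld1b), not from the .d elements held in the Z registers:
#include <arm_sve.h>

/* Gather bytes from bases+31 and zero-extend to 64 bits; 31 is the
   largest in-range immediate for ld1b vector-plus-immediate.  */
svuint64_t
gather_bytes (svbool_t pg, svuint64_t bases)
{
  return svld1ub_gather_u64base_offset_u64 (pg, bases, 31);
}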
This patch corrects the constraints used in these patterns. A test case
that reproduces the issue is also included.
Bootstrapped and regression-tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/121449
* config/aarch64/aarch64-sve.md
(mask_gather_load<mode><v_int_container>): Use vg<Vesize>
constraints for alternatives with immediate offset.
(mask_scatter_store<mode><v_int_container>): Likewise.
gcc/testsuite/ChangeLog:
PR target/121449
* g++.target/aarch64/sve/pr121449.C: New test.
|
|
This relaxes an overzealous assert that required the fpm_t argument to
be in DImode when expanding FP8 intrinsics. Of course this fails to
account for modeless const_ints.
gcc/ChangeLog:
PR target/120986
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::expand): Relax fpm_t assert to allow
modeless const_ints.
gcc/testsuite/ChangeLog:
PR target/120986
* gcc.target/aarch64/torture/pr120986-2.c: New test.
|
|
The predication of the SVE2 FP8 dot product insns was relying on the
architectural dependency:
FEAT_FP8DOT2 => FEAT_FP8DOT4
which was relaxed in GCC as of
r15-7480-g299a8e2dc667e795991bc439d2cad5ea5bd379e2, thus leading to
unrecognisable insn ICEs when compiling a two-way FDOT with just
+fp8dot2. This patch introduces a new mode iterator which selectively
enables the appropriate mode(s) depending on which of the FP8DOT{2,4}
features are available, and uses it to fix the predication of the
patterns.
gcc/ChangeLog:
PR target/120986
* config/aarch64/aarch64-sve2.md (@aarch64_sve_dot<mode>):
Switch mode iterator from SVE_FULL_HSF to new iterator;
remove insn predicate as this is now taken care of by conditions
in the mode iterator.
(@aarch64_sve_dot_lane<mode>): Likewise.
* config/aarch64/iterators.md (SVE_FULL_HSF_FP8_FDOT): New.
gcc/testsuite/ChangeLog:
PR target/120986
* gcc.target/aarch64/pr120986-1.c: New test.
|
|
Unlike base PCS functions, __arm_streaming and __arm_streaming_compatible
functions allow/require PSTATE.SM to be 1 on entry, so they need to
be treated as STO_AARCH64_VARIANT_PCS.
Similarly, functions that share ZA or ZT0 with their callers require
ZA to be active on entry, whereas the base PCS requires ZA to be
dormant or off. These functions too need to be marked as having
a variant PCS.
gcc/
PR target/121414
* config/aarch64/aarch64.cc (aarch64_is_variant_pcs): New function,
split out from...
(aarch64_asm_output_variant_pcs): ...here. Handle various types
of SME function type.
gcc/testsuite/
PR target/121414
* gcc.target/aarch64/sme/pr121414_1.c: New test.
|
|
gcc/ChangeLog:
* config/s390/s390.cc (print_operand): Allow arbitrary wide_int
constants for _BitInt.
(s390_bitint_type_info): Implement target hook
TARGET_C_BITINT_TYPE_INFO.
libgcc/ChangeLog:
* config/s390/libgcc-glibc.ver: Export _BitInt support
functions.
* config/s390/t-softfp (softfp_extras): Add fixtfbitint
floatbitinttf.
gcc/testsuite/ChangeLog:
* gcc.target/s390/bitint-1.c: New test.
* gcc.target/s390/bitint-2.c: New test.
* gcc.target/s390/bitint-3.c: New test.
* gcc.target/s390/bitint-4.c: New test.
|
|
The following splitter from the commit r11-5747:
(define_split
[(set (match_operand:SWI 0 "register_operand")
(any_rotate:SWI
(match_operand:SWI 1 "const_int_operand")
(subreg:QI
(and
(match_operand 2 "int248_register_operand")
(match_operand 3 "const_int_operand")) 0)))]
"(INTVAL (operands[3]) & (GET_MODE_BITSIZE (<MODE>mode) - 1))
== GET_MODE_BITSIZE (<MODE>mode) - 1"
[(set (match_dup 4) (match_dup 1))
(set (match_dup 0)
(any_rotate:SWI (match_dup 4)
(subreg:QI
(and:SI (match_dup 2) (match_dup 3)) 0)))]
"operands[4] = gen_reg_rtx (<MODE>mode);")
matches any mode of (and ...) on input, but hard-codes (and:SI ...)
in the output. This causes an ICE if the incoming (and ...) is DImode
rather than SImode.
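A hedged example (not necessarily the PR testcase) of source where the rotate
count is a 64-bit value masked with 31, so the (and ...) feeding the rotate is
DImode rather than SImode:
/* Rotate of a constant by a masked 64-bit count.  */
unsigned int
rot_const (unsigned long long n)
{
  unsigned int x = 0x12345678u;
  return (x << (n & 31)) | (x >> (-n & 31));
}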
Co-developed-by: Richard Sandiford <richard.sandiford@arm.com>
PR target/96226
gcc/ChangeLog:
* config/i386/predicates.md (and_operator): New operator.
* config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask):
Use and_operator to match AND RTX and use its mode
in the split pattern.
|
|
This patch adds the missing PTA_POPCNT and PTA_LZCNT with the PTA_ABM
bitmask definition for the bdver1, btver1, and lujiazui architectures
in the i386 architecture configuration file.
Although these two features were not present in the original definition,
their absence does not affect the functionality of these architectures
because the POPCNT and LZCNT bits are set when ABM is enabled in the
ix86_option_override_internal function. However, including them in these
definitions improves consistency and clarity. This issue was discovered
while writing a script to extract these bitmasks from the i386.h file
referenced in [1].
Additionally, the PTA_YONGFENG bitmask appears incorrect as it includes
PTA_LZCNT while already inheriting PTA_ABM from PTA_LUJIAZUI. This seems
to be a typo and should be corrected.
[1] https://github.com/cyyself/x86-pta
gcc/ChangeLog:
* config/i386/i386.h (PTA_BDVER1):
Add missing PTA_POPCNT and PTA_LZCNT with PTA_ABM.
(PTA_ZNVER1): Ditto.
(PTA_BTVER1): Ditto.
(PTA_LUJIAZUI): Ditto.
(PTA_YONGFENG): Do not include extra PTA_LZCNT.
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
|
|
Previously, arch-canonicalize used hardcoded data to handle IMPLIED_EXT.
But this data often got out of sync with the actual C++ implementation.
Earlier, we introduced riscv-ext.def to keep track of all extension info
and generate docs. Now, arch-canonicalize also uses this same data to handle
extension implication rules directly.
One limitation is that conditional implication rules still need to be written
manually. Luckily, there aren't many of them for now, so it's still manageable.
I really wanted to avoid writing a C++ + Python binding or trying to parse C++
logic in Python...
This version also adds a `--selftest` option to run some unit tests.
gcc/ChangeLog:
* config/riscv/arch-canonicalize: Read extension data from
riscv-ext*.def and add unit tests.
|
|
This patch introduces a new `-march=unset` option for RISC-V GCC that
allows users to explicitly ignore previous `-march` options and derive
the architecture string from the `-mcpu` option instead.
This feature is particularly useful for build systems and toolchain
configurations where you want to ensure the architecture is always
derived from the CPU specification rather than relying on potentially
conflicting `-march` options.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_expand_arch):
Ignore `unset`.
* config/riscv/riscv.h (OPTION_DEFAULT_SPECS): Handle
`-march=unset`.
(ARCH_UNSET_CLEANUP_SPECS): New.
(DRIVER_SELF_SPECS): Handle -march=unset.
* doc/invoke.texi (RISC-V Options): Update documentation for
`-march=unset`.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-unset-1.c: New test.
* gcc.target/riscv/arch-unset-2.c: New test.
* gcc.target/riscv/arch-unset-3.c: New test.
* gcc.target/riscv/arch-unset-4.c: New test.
* gcc.target/riscv/arch-unset-5.c: New test.
|
|
commit 050b1708ea532ea4840e97d85fad4ca63d4cd631
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Jun 19 05:03:48 2025 +0800
x86: Get the widest vector mode from MOVE_MAX
gets the widest vector mode from MOVE_MAX. But for memset, it should
use STORE_MAX_PIECES.
gcc/
PR target/121410
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use
STORE_MAX_PIECES to get the widest vector mode in vector loop
for memset.
gcc/testsuite/
PR target/121410
* gcc.target/i386/pr121410.c: New test.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
gcc/
* config/avr/avr.cc (avr_rtx_costs_1) [SIGN_EXTEND]: Adjust cost.
* config/avr/avr.md (*sext.ashift<QIPSI:mode><HISI:mode>2): New
insn and a cc split.
|
|
The i386 high-register patterns used things like:
(match_operator:SWI248 2 "extract_operator"
[(match_operand 0 "int248_register_operand" "Q")
(const_int 8)
(const_int 8)])
to match an extraction of a high register such as AH from AX/EAX/RAX.
This construct is used in contexts where only the low 8 bits of the
value matter. This is either done explicitly using a (subreg:QI ... 0)
or implicitly by assigning to an 8-bit zero_extract destination.
extract_operator therefore matches both sign_extract and zero_extract,
since the signedness of the extension beyond 8 bits is irrelevant.
But the fact that only the low 8 bits of the value are significant
means that a shift right by 8 is as good as an extraction. Shifts
right would already be used for things like:
struct s {
long a:8;
long b:8;
long c:48;
};
struct s f(struct s x, long y, long z) {
x.b = (y & z) >> 8;
return x;
}
but are used more after g:965564eafb721f8000013a3112f1bba8d8fae32b.
This patch therefore replaces extract_operator with a new predicate
called extract_high_operator that matches both extractions and shifts.
The predicate checks the extraction field and shift amount itself,
so that patterns only need to match the first operand.
Splitters used match_op_dup to preserve the choice of extraction.
But the fact that the extractions (and now shifts) are equivalent
means that we can just as easily canonicalise on one of them.
(In theory, canonicalisation would also promote CSE, although
that's unlikely in practice.) The patch goes for zero_extract,
for consistency with destinations.
gcc/
PR target/121306
* config/i386/predicates.md (extract_operator): Replace with...
(extract_high_operator): ...this new predicate.
* config/i386/i386.md (*cmpqi_ext<mode>_1, *cmpqi_ext<mode>_2)
(*cmpqi_ext<mode>_3, *cmpqi_ext<mode>_4, *movstrictqi_ext<mode>_1)
(*extzv<mode>, *insvqi_2, *extendqi<SWI24:mode>_ext_1)
(*addqi_ext<mode>_1_slp, *addqi_ext<mode>_1_slp, *addqi_ext<mode>_0)
(*addqi_ext2<mode>_0, *addqi_ext<mode>_1, *<insn>qi_ext<mode>_2)
(*subqi_ext<mode>_1_slp, *subqi_ext<mode>_2_slp, *subqi_ext<mode>_0)
(*subqi_ext2<mode>_0, *subqi_ext<mode>_1, *testqi_ext<mode>_1)
(*testqi_ext<mode>_2, *<code>qi_ext<mode>_1_slp)
(*<code>qi_ext<mode>_2_slp, *<code>qi_ext<mode>_0)
(*<code>qi_ext2<mode>_0, *<code>qi_ext<mode>_1)
(*<code>qi_ext<mode>_1_cc, *<code>qi_ext<mode>_1_cc)
(*<code>qi_ext<mode>_2, *<code>qi_ext<mode>_3, *negqi_ext<mode>_1)
(*one_cmplqi_ext<mode>_1, *ashlqi_ext<mode>_1, *<insn>qi_ext<mode>_1)
(define_peephole2): Replace uses of extract_operator with
extract_high_operator, matching only the first operand.
Use zero_extract rather than match_op_dup when splitting.
|
|
PR target/121359
gcc/
* config/avr/avr.h: Remove -mlra and remains of reload.
* config/avr/avr.cc: Same.
* config/avr/avr.md: Same.
* config/avr/avr-log.cc: Same.
* config/avr/avr-protos.h: Same.
* config/avr/avr.opt: Same.
* config/avr/avr.opt.urls: Same.
gcc/testsuite/
* gcc.target/avr/torture/pr118591-1.c: Remove -mlra.
* gcc.target/avr/torture/pr118591-2.c: Same.
|
|
After
commit 965564eafb721f8000013a3112f1bba8d8fae32b
Author: Richard Sandiford <richard.sandiford@arm.com>
Date: Tue Jul 29 15:58:34 2025 +0100
simplify-rtx: Simplify subregs of logic ops
combine generates
(set (zero_extract:SI (reg/v:SI 101 [ a ])
(const_int 8 [0x8])
(const_int 8 [0x8]))
(not:SI (sign_extract:SI (reg:SI 107 [ b ])
(const_int 8 [0x8])
(const_int 8 [0x8]))))
instead of
(set (zero_extract:SI (reg/v:SI 101 [ a ])
(const_int 8 [0x8])
(const_int 8 [0x8]))
(subreg:SI (not:QI (subreg:QI (sign_extract:SI (reg:SI 107 [ b ])
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)) 0))
Update *one_cmplqi_ext<mode>_1 to support the new pattern.
PR target/121306
* config/i386/i386.md (*one_cmplqi_ext<mode>_1): Updated to
support the new pattern.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
The previous code-gen for scalar unsigned SAT_MUL (aka usmul)
leveraged mulhs by mistake; it should be mulhu for the
high-bits result of the multiply. Thus, this patch would like to make
it correct.
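For reference, a C sketch of the intended semantics (not the in-tree
expander): the saturation test needs the unsigned high half of the product,
which is what mulhu provides.
#include <stdint.h>

/* Saturating unsigned multiply: saturate to UINT64_MAX whenever the
   unsigned high half of the full product is non-zero.  */
static uint64_t
sat_umul_u64 (uint64_t a, uint64_t b)
{
  unsigned __int128 prod = (unsigned __int128) a * b;
  uint64_t hi = (uint64_t) (prod >> 64);   /* mulhu a, b */
  uint64_t lo = (uint64_t) prod;           /* mul a, b   */
  return hi ? UINT64_MAX : lo;
}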
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_xmode_usmul): Use mulhu for
the high bits of the multiply result.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: Add mulhu
asm check.
* gcc.target/riscv/sat/sat_u_mul-1-u64-from-u128.c: Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
After previous patches, we should always get a VNx16BI result
for ACLE intrinsics that return svbool_t. This patch adds
an assert that checks a more general condition than that.
gcc/
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::expand): Assert that the return value
has an appropriate mode.
|
|
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the predicate forms of svdupq.
The general predicate expansion builds an equivalent integer vector
and then compares it with zero. This patch therefore relies on
the earlier patches to the comparison patterns.
gcc/
* config/aarch64/aarch64-protos.h
(aarch64_convert_sve_data_to_pred): Remove the mode argument.
* config/aarch64/aarch64.cc
(aarch64_sve_emit_int_cmp): Allow PRED_MODE to be VNx16BI or
the natural predicate mode for the data mode.
(aarch64_convert_sve_data_to_pred): Remove the mode argument
and instead always create a VNx16BI result.
(aarch64_expand_sve_const_pred): Update call accordingly.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdupq_impl::expand): Likewise, ensuring that the result
has mode VNx16BI.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/dupq_13.c: New test.
|
|
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the predicate forms of svdup.
gcc/
* config/aarch64/aarch64-protos.h
(aarch64_emit_sve_pred_vec_duplicate): Declare.
* config/aarch64/aarch64.cc
(aarch64_emit_sve_pred_vec_duplicate): New function.
* config/aarch64/aarch64-sve.md (vec_duplicate<PRED_ALL:mode>): Use it.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdup_impl::expand): Handle boolean values specially. Check for
constants and fall back on aarch64_emit_sve_pred_vec_duplicate
for the variable case, ensuring that the result has mode VNx16BI.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/dup_1.c: New test.
|
|
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svpnext* intrinsics.
gcc/
* config/aarch64/iterators.md (PNEXT_ONLY): New int iterator.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_<sve_pred_op><mode>): Restrict SVE_PITER pattern
to VNx16BI_ONLY.
(@aarch64_sve_<sve_pred_op><mode>): New PNEXT_ONLY pattern for
PRED_HSD.
(*aarch64_sve_<sve_pred_op><mode>): Likewise.
(*aarch64_sve_<sve_pred_op><mode>_cc): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/pnext_3.c: New test.
|
|
This patch continues the work of making ACLE intrinsics use VNx16BI
for svbool_t results. It deals with the svmatch* and svnmatch*
intrinsics.
gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_pred_<sve_int_op><mode>):
Split SVE2_MATCH pattern into a VNx16QI_ONLY define_insn and a
VNx8HI_ONLY define_expand. Use a VNx16BI destination for the latter.
(*aarch64_pred_<sve_int_op><mode>): New SVE2_MATCH pattern for
VNx8HI_ONLY.
(*aarch64_pred_<sve_int_op><mode>_cc): Likewise.
gcc/testsuite/
* gcc.target/aarch64/sve2/acle/general/match_4.c: New test.
* gcc.target/aarch64/sve2/acle/general/nmatch_1.c: Likewise.
|