A couple of Vector pseudoinstructions use the x0 scalar register, which can be
inefficient on wider uarches due to regfile crossing.
Instead, use the immediate-0 form, which should be functionally equivalent.
pseudoinsn               orig insn with x0        this patch
--------------------     --------------------     --------------------
vneg.v vd,vs             vrsub.vx vd,vs,x0        vrsub.vi vd,vs,0
vncvt.x.x.w vd,vs,vm     vnsrl.wx vd,vs,x0,vm     vnsrl.wi vd,vs,0,vm
vwcvt.x.x.v vd,vs,vm     vwadd.vx vd,vs,x0,vm     (imm not supported)
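For instance, a plain negation loop now picks the immediate form (a hedged
illustration; the exact vsetvli configuration depends on -march and LMUL):

void
vneg_loop (int *restrict a, int *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = -b[i];
}

/* before: vrsub.vx v1,v1,x0      after: vrsub.vi v1,v1,0 */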
gcc/ChangeLog:
* config/riscv/vector.md: For vncvt, substitute vnsrl and replace
its x0 operand with immediate 0.  For vneg, substitute vrsub with
immediate 0.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: Change
expected pattern.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/cond_trunc-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/neg-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/trunc-3.c: Ditto.
* gcc.target/riscv/rvv/base/simplify-vdiv.c: Ditto.
* gcc.target/riscv/rvv/base/unop_v_constraint-1.c: Ditto.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
|
|
This is a follow-up to the patch below to avoid generating unrecognized
vsetivl instructions for XTheadVector.
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674185.html
PR target/118601
gcc/ChangeLog:
* config/riscv/riscv-string.cc (expand_block_move): Check with new
constraint 'vl' instead of 'K'.
(expand_vec_setmem): Likewise.
(expand_vec_cmpmem): Likewise.
* config/riscv/riscv-v.cc (force_vector_length_operand): Likewise.
(expand_load_store): Likewise.
(expand_strided_load): Likewise.
(expand_strided_store): Likewise.
(expand_lanes_load_store): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xtheadvector/pr114194.c: Move to...
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv64.c: ...here.
* gcc.target/riscv/rvv/xtheadvector/pr114194-rv32.c: New test.
* gcc.target/riscv/rvv/xtheadvector/pr118601.c: New test.
Reported-by: Edwin Lu <ewlu@rivosinc.com>
|
|
As discussed in RISC-V C-API PR #101 [1] and in #96, the current
interface is insufficient to support some cases, like a vendor buying a
CPU IP from an upstream vendor but using their own mvendorid together
with custom features from the upstream vendor. In this case, we might need to add
these extensions for each downstream vendor many times. Thus, making
__riscv_vendor_feature_bits guarded by mvendorid is not a good idea. So,
drop __riscv_vendor_feature_bits for now, and we should have time to
discuss a better solution.
[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/101
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
gcc/ChangeLog:
* config/riscv/riscv-feature-bits.h (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
(struct riscv_vendor_feature_bits): Drop.
libgcc/ChangeLog:
* config/riscv/feature_bits.c (RISCV_VENDOR_FEATURE_BITS_LENGTH): Drop.
(__init_riscv_features_bits_linux): Drop.
|
|
So the change to prefer ADD over IOR for combining two objects with no bits in
common is (IMHO) generally good. It has some minor fallout.
In particular the aarch64 port (and I suspect others) have patterns that
recognize IOR, but not PLUS or XOR for these cases and thus tests which
expected to optimize with IOR are no longer optimizing.
Roger suggested using a code iterator for this purpose. Richard S. suggested a
new match operator to cover those cases.
I really like the match operator idea, but as Richard S. notes in the PR it
would require either not validating the "no bits in common", which dramatically
reduces the utility IMHO, or some work to allow consistent results
without polluting the nonzero-bits cache.
So this patch goes back to Roger's idea of just using a code iterator in the
aarch64 backend (and presumably anywhere else we see this popping up).
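The iterator is a one-liner; a sketch assuming the standard
define_code_iterator syntax:

(define_code_iterator any_or_plus [plus ior xor])

The affected extr patterns then use any_or_plus in place of ior, so they
match whichever of PLUS, IOR or XOR combine happens to produce.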
Bootstrapped and regression tested on aarch64-linux-gnu where it fixes
bitint-args.c (as expected).
PR target/115478
gcc/
* config/aarch64/iterators.md (any_or_plus): New code iterator.
* config/aarch64/aarch64.md (extr<mode>5_insn): Use any_or_plus.
(extr<mode>5_insn_alt, extrsi5_insn_uxtw): Likewise.
(extrsi5_insn_uxtw_alt, extrsi5_insn_di): Likewise.
gcc/testsuite/
* gcc.target/aarch64/bitint-args.c: Update expected output.
|
|
We agreed with LLVM developers to not enforce the architectural
dependencies between fp8 multiplication features, and they have already
been removed from LLVM and Binutils. Remove them from GCC as well.
gcc/ChangeLog:
* config/aarch64/aarch64-option-extensions.def
(SSVE_FP8FMA): Adjust formatting.
(FP8DOT4): Replace FP8FMA dependency with FP8.
(SSVE_FP8DOT4): Replace SSVE_FP8FMA dependency with SME2+FP8.
(FP8DOT2): Replace FP8DOT4 dependency with FP8.
(SSVE_FP8DOT2): Replace SSVE_FP8DOT4 dependency with SME2+FP8.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Adjust expected
defines.
* gcc.target/aarch64/simd/vmla_lane_indices_1.c: Modify target
pragmas.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_1.c:
Ditto.
* gcc.target/aarch64/sve/acle/general-c/ternary_mfloat8_lane_group_selection_1.c:
Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_lane_mf8.c: Ditto.
* gcc.target/aarch64/sve2/acle/asm/dot_mf8.c: Ditto.
|
|
x is not a macro argument.  It just happens to work because final.cc
passes x as the 2nd argument:
final.cc: ASM_OUTPUT_SYMBOL_REF (file, x);
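The shape of the fix, as a hedged sketch (the actual macro body is elided):

/* Before: the body referenced `x', which only compiled because final.cc
   happens to pass `x' as the second argument.  */
#define ASM_OUTPUT_SYMBOL_REF(FILE, SYM)  ... x ...
/* After: use the macro parameter.  */
#define ASM_OUTPUT_SYMBOL_REF(FILE, SYM)  ... (SYM) ...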
PR target/118825
* config/i386/i386.h (ASM_OUTPUT_SYMBOL_REF): Replace x with
SYM.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch adds some of the floating-point instructions from
MIPS32 Release 6 (mips32r6) with their respective built-in
functions and tests:
min_a_s, min_a_d
max_a_s, max_a_d
rint_s, rint_d
class_s, class_d
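A hedged usage sketch, assuming the built-in names carry the
__builtin_mipsr6_ prefix mentioned in the ChangeLog below (types and
signatures are illustrative, not verified):

float  m = __builtin_mipsr6_min_a_s (x, y);  /* MINA.S: value with smaller magnitude */
double c = __builtin_mipsr6_class_d (d);     /* CLASS.D: classify the operand */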
gcc/ChangeLog:
* config/mips/i6400.md (i6400_fpu_minmax): Include
fclass type.
(i6400_fpu_fadd): Include frint type.
* config/mips/mips.cc (AVAIL_NON_MIPS16): Add an entry
for __builtin_mipsr6_xxx.
(MIPSR6_BUILTIN_PURE): Same as above.
(CODE_FOR_mipsr6_min_a_s, CODE_FOR_mipsr6_min_a_d)
(CODE_FOR_mipsr6_max_a_s, CODE_FOR_mipsr6_max_a_d)
(CODE_FOR_mipsr6_class_s, CODE_FOR_mipsr6_class_d):
New code_aliasing macros.
(mips_builtins): Add mips32r6 min_a_s, min_a_d, max_a_s,
max_a_d, class_s, class_d builtins.
* config/mips/mips.h (ISA_HAS_FRINT): Define a new macro.
(ISA_HAS_FCLASS): Same as above.
* config/mips/mips.md (UNSPEC_FRINT): New unspec.
(UNSPEC_FCLASS): Same as above.
(type): Add frint and fclass.
(fmin_a_<mode>): Generates MINA.fmt instructions.
(fmax_a_<mode>): Generates MAXA.fmt instructions.
(rint<mode>2): Generates RINT.fmt instructions.
(fclass_<mode>): Generates CLASS.fmt instructions.
* config/mips/p6600.md (p6600_fpu_fadd): Include
frint type.
(p6600_fpu_fabs): Include fclass type.
gcc/testsuite/ChangeLog:
* gcc.target/mips/mips-class.c: New tests for MIPSr6.
* gcc.target/mips/mips-minamaxa.c: Same as above.
* gcc.target/mips/mips-rint.c: Same as above.
Signed-off-by: Jie Mei <jie.mei@oss.cipunited.com>
Co-authored-by: Xi Ruoyao <xry111@xry111.site>
|
|
When moving intrins around for the AVX10 implementation in GCC 14,
the intrins _kshiftli_mask32 and _kshiftri_mask32 were wrongly
wrapped by "#if __OPTIMIZE__" instead of "#ifdef __OPTIMIZE__",
leaving the intrin file not `-Wsystem-headers -Wundef` clean
since r14-4490.
gcc/ChangeLog:
PR target/118813
* config/i386/avx512bwintrin.h: Fix wrong __OPTIMIZE__
wrap.
|
|
Assume that a distro has configured, e.g., a gfx9-generic multilib but not
one for gfx902.  In that case, mkoffload would fail to link with "error:
incompatible mach".  With this commit, an error is printed suggesting to try
the associated generic architecture instead.  The behavior is unchanged if
there is a multilib available for the specific ISA or when there is also no
multilib for the generic ISA.
Note: The build of generic multilibs is currently not enabled by default;
they also require the linker/assembler of LLVM 19 or newer and, in particular,
for execution, a future ROCm release.  (The next one?  In any case, 6.3.2
does not support generic ISAs, yet.)
gcc/ChangeLog:
* config/gcn/mkoffload.cc (enum elf_arch_code): Add
EF_AMDGPU_MACH_AMDGCN_NONE.
(elf_arch): Use enum elf_arch_code as type.
(tool_cleanup): Silence warning by removing trailing '.' from error.
(get_arch_name): Return enum elf_arch_code.
(check_for_missing_lib): New; print fatal error if the multilib
is not available but it is for the associated generic ISA.
(main): Call it.
|
|
The following testcase is miscompiled because the RTL representation
of bt{l,q} insns followed by e.g. j{c,nc} is misleading as to what it
actually does.
Let's look e.g. at
(define_insn_and_split "*jcc_bt<mode>"
  [(set (pc)
        (if_then_else (match_operator 0 "bt_comparison_operator"
                        [(zero_extract:SWI48
                           (match_operand:SWI48 1 "nonimmediate_operand")
                           (const_int 1)
                           (match_operand:QI 2 "nonmemory_operand"))
                         (const_int 0)])
                      (label_ref (match_operand 3))
                      (pc)))
   (clobber (reg:CC FLAGS_REG))]
  "(TARGET_USE_BT || optimize_function_for_size_p (cfun))
   && (CONST_INT_P (operands[2])
       ? (INTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)
          && INTVAL (operands[2])
             >= (optimize_function_for_size_p (cfun) ? 8 : 32))
       : !memory_operand (operands[1], <MODE>mode))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCC FLAGS_REG)
        (compare:CCC
          (zero_extract:SWI48
            (match_dup 1)
            (const_int 1)
            (match_dup 2))
          (const_int 0)))
   (set (pc)
        (if_then_else (match_op_dup 0 [(reg:CCC FLAGS_REG) (const_int 0)])
                      (label_ref (match_dup 3))
                      (pc)))]
{
  operands[0] = shallow_copy_rtx (operands[0]);
  PUT_CODE (operands[0], reverse_condition (GET_CODE (operands[0])));
})
The define_insn part in RTL describes exactly what it does,
jumps to op3 if bit op2 in op1 is set (for op0 NE) or not set (for op0 EQ).
The problem is with what it splits into.
put_condition_code %C1 for CCCmode comparisons emits c for EQ and LTU,
nc for NE and GEU and ICEs otherwise.
CCCmode is used mainly for carry out of add/adc, borrow out of sub/sbb,
in those cases e.g. for add we have
(set (reg:CCC flags) (compare:CCC (plus:M x y) x))
and use (ltu (reg:CCC flags) (const_int 0)) for carry set and
(geu (reg:CCC flags) (const_int 0)) for carry not set. These cases
model in RTL what is actually happening, compare in infinite precision
x from the result of finite precision addition in M mode and if it is
less than unsigned (i.e. overflow happened), carry is set.
Another use of CCCmode is in UNSPEC_* patterns, those are used with
(eq (reg:CCC flags) (const_int 0)) for carry set and ne for unset,
given the UNSPEC no big deal, the middle-end doesn't know what means
set or unset.
But for the bt{l,q}; j{c,nc} case the above splits it into
(set (reg:CCC flags) (compare:CCC (zero_extract) (const_int 0)))
for bt and
(set (pc) (if_then_else (eq (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
for the bit set case (so that the jump expands to jc) and ne for
the bit not set case (so that the jump expands to jnc).
Similarly for the different splitters for cmov and set{c,nc} etc.
The problem is that when the middle-end reads this RTL, it means
the exact opposite.  If zero_extract is 1, flags is set
to a comparison of 1 and 0, and that would mean using ne in the
if_then_else, and vice versa.
So, in order to better describe in RTL what is actually happening,
one possibility would be to swap the behavior of put_condition_code
and use NE + LTU -> c and EQ + GEU -> nc rather than the current
EQ + LTU -> c and NE + GEU -> nc; and adjust everything. The
following patch uses a more limited approach, instead of representing
bt{l,q}; j{c,nc} case as written above it uses
(set (reg:CCC flags) (compare:CCC (const_int 0) (zero_extract)))
and
(set (pc) (if_then_else (ltu (reg:CCC flags) (const_int 0)) (label_ref) (pc)))
which uses the existing put_condition_code but describes what the
insns actually do in RTL clearly. If zero_extract is 1,
then flags are LTU, 0U < 1U, if zero_extract is 0, then flags are GEU,
0U >= 0U. The patch adjusts the *bt<mode> define_insn and all the
splitters to it and its comparisons/conditional moves/setXX.
2025-02-10 Jakub Jelinek <jakub@redhat.com>
PR target/118623
* config/i386/i386.md (*bt<mode>): Represent bt as
compare:CCC of const0_rtx and zero_extract rather than
zero_extract and const0_rtx.
(*bt<SWI48:mode>_mask): Likewise.
(*jcc_bt<mode>): Likewise. Use LTU and GEU as flags test
instead of EQ and NE.
(*jcc_bt<mode>_mask): Likewise.
(*jcc_bt<SWI48:mode>_mask_1): Likewise.
(Help combine recognize bt followed by cmov splitter): Likewise.
(*bt<mode>_setcqi): Likewise.
(*bt<mode>_setncqi): Likewise.
(*bt<mode>_setnc<mode>): Likewise.
(*bt<mode>_setncqi_2): Likewise.
(*bt<mode>_setc<mode>_mask): Likewise.
* gcc.c-torture/execute/pr118623.c: New test.
|
|
There's some special-case code in the risc-v move expander to try and optimize
cases where the source is a subreg of a vector and the destination is a scalar
mode.
The code works fine except when we have no support for the given mode, i.e. HF
or BF when those extensions aren't enabled.  We'll end up tripping an assert in
that case when we should have just let standard expansion do its thing.
Tested on my system for rv32 and rv64, but I'll wait for the pre-commit tester
to render a verdict before moving forward.
PR target/118146
gcc/
* config/riscv/riscv.cc (riscv_legitimize_move): Handle subreg
of vector source better to avoid ICE.
gcc/testsuite/
* gcc.target/riscv/pr118146-1.c: New test.
* gcc.target/riscv/pr118146-2.c: New test.
|
|
For GCN, this avoids ICEs further down the compilation pipeline. For nvptx,
there's effectively no change: in presence of exception handling constructs,
instead of 'sorry, unimplemented: target cannot support nonlocal goto', we
now emit 'sorry, unimplemented: exception handling not supported'.
Additionally, turn test cases into UNSUPPORTED if running into
'sorry, unimplemented: exception handling not supported'.
gcc/
* config/gcn/gcn.md (exception_receiver): 'define_expand'.
* config/nvptx/nvptx.md (exception_receiver): Likewise.
gcc/testsuite/
* lib/gcc-dg.exp (gcc-dg-prune): Turn
'sorry, unimplemented: exception handling not supported' into
UNSUPPORTED.
* gcc.dg/pr104464.c: Remove GCN XFAIL.
libstdc++-v3/
* testsuite/lib/prune.exp (libstdc++-dg-prune): Turn
'sorry, unimplemented: exception handling not supported' into
UNSUPPORTED.
|
|
The following testcase ICEs starting with GCC 12 since r12-4526
although the bug has been introduced already in r12-2751.
The problem was in the addition of cond_<code><mode> define_expand
which uses nonimmediate_operand predicates for both maxmin operands
for all VI1248_AVX512VLBW modes. It works fine with
VI48_AVX512VL modes because the <code><mode>3_mask VI48_AVX512VL
define_expand uses ix86_fixup_binary_operands_no_copy and the
*avx512f_<code><mode>3<mask_name> VI48_AVX512VL define_insn uses
% in constraint and !(MEM_P && MEM_P) check in condition (and
<code><mode>3 define_expand with VI124_256_AVX512F_AVX512BW iterator
does that too), but even though the 8-bit and 16-bit element maxmin
is commutative too, the <mask_codefor><code><mode>3<mask_name>
define_insn with VI12_AVX512VL iterator didn't use % in constraint
to make it commutative. So, e.g. cond_umaxv32qi define_expand
allowed nonimmediate_operand for both umax operands, but used
gen_umaxv32qi_mask which wasn't commutative and only allowed
nonimmediate_operand for the second operand.
The following patch fixes it by keeping the <code><mode>3
VI124_256_AVX512F_AVX512BW define_expand as is (it does
ix86_fixup_binary_operands_no_copy) but extending the
<code><mode>3_mask define_expand from VI48_AVX512VL to
VI1248_AVX512VLBW which keeps the current modes with their
ISA conditions and adds the VI12_AVX512VL modes under additional
TARGET_AVX512BW condition, and turning the actual define_insn
into a *-prefixed name (which it was before just for the non-masked
case) and having the same commutative operand handling as in other
define_insns.
2025-02-08 Jakub Jelinek <jakub@redhat.com>
PR target/118776
* config/i386/sse.md (<code><mode>3_mask): Use VI1248_AVX512VLBW
iterator rather than VI48_AVX512VL.
(<mask_codefor><code><mode>3<mask_name>): Rename to ...
(*avx512bw_<code><mode>3<mask_name>): ... this. Use
nonimmediate_operand rather than register_operand predicate and %v
rather than v constraint for operand 1 and adjust condition to reject
MEMs in both operand 1 and 2.
* gcc.target/i386/pr118776.c: New test.
|
|
Instead of waiting to get combine/rtl optimizations fixed here, this fixes the
builtins at the gimple level.  It should provide slightly faster compile times
since we have a simplification earlier on.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
PR target/114522
* config/aarch64/aarch64-builtins.cc (aarch64_fold_aes_op): New function.
(aarch64_general_gimple_fold_builtin): Call aarch64_fold_aes_op for crypto_aese
and crypto_aesd.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
For thumb2, popping a single low register off the stack should prefer
POP over LDR to mirror the behaviour of the PUSH on entry. This saves
a couple of bytes in the resulting image. This is a relatively niche
case as it's rare to push a single low register onto the stack, but
still worth getting right.
Whilst fixing this I've also restructured the code here somewhat to
fix a bug I observed by inspection and to improve the code slightly.
Firstly, the single register case is hoisted above the main loop.
This not only avoids creating some RTL that immediately becomes
garbage but also avoids us needing to check for this case in every
iteration of the main loop body.
Secondly, we iterate over just the non-zero bits in the reg mask
rather than every bit and then checking if there's work to do for that
bit.
Finally, when emitting a pop that also pops SP off the stack we
shouldn't be emitting a stack-adjust CFA note. The new SP value comes
from the popped value, not from an adjustment of the previous SP
value.
gcc:
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Restructure.
Don't emit LDR on thumb2 when POP can be used for smaller code.
Don't add a CFA adjust note when SP is popped off the stack.
gcc/testsuite:
PR target/118089
* gcc.target/arm/thumb2-pop-loreg.c: New test.
|
|
My earlier change for making the compiler prefer
POP {PC}
over
LDR PC, [SP], #4
had a slightly unexpected consequence in that we now also call
arm_emit_multi_reg_pop to handle single register pops when the
register is not PC. This exposed a latent bug in this function where
the dwarf unwinding notes on the single-register POP were not being
set correctly.
gcc/
PR target/118089
* config/arm/arm.cc (arm_emit_multi_reg_pop): Add a CFA adjust
note to single-register POP instructions.
|
|
Inspired by PR118103, the VXRM register should be treated almost the
same as the FRM register, i.e. as a cooperatively-managed global register.
Thus, add VXRM to global_regs to avoid its elimination by the
late-combine pass.
For example, consider the code below:
void compute ()
{
  size_t vl = __riscv_vsetvl_e16m1 (N);
  vuint16m1_t va = __riscv_vle16_v_u16m1 (a, vl);
  vuint16m1_t vb = __riscv_vle16_v_u16m1 (b, vl);
  vuint16m1_t vc = __riscv_vaaddu_vv_u16m1 (va, vb, __RISCV_VXRM_RDN, vl);

  __riscv_vse16_v_u16m1 (c, vc, vl);
}

int main ()
{
  initialize ();
  compute ();

  return 0;
}
After compiling with -march=rv64gcv -O3, we will have:
compute:
        csrwi   vxrm,2
        lui     a3,%hi(a)
        lui     a4,%hi(b)
        addi    a4,a4,%lo(b)
        vsetivli zero,4,e16,m1,ta,ma
        addi    a3,a3,%lo(a)
        vle16.v v2,0(a4)
        vle16.v v1,0(a3)
        lui     a4,%hi(c)
        addi    a4,a4,%lo(c)
        vaaddu.vv v1,v1,v2
        vse16.v v1,0(a4)
        ret
        .size   compute, .-compute
        .section .text.startup,"ax",@progbits
        .align  1
        .globl  main
        .type   main, @function
main:
        // csrwi vxrm,2 deleted after inline
        addi    sp,sp,-16
        sd      ra,8(sp)
        call    initialize
        lui     a3,%hi(a)
        lui     a4,%hi(b)
        vsetivli zero,4,e16,m1,ta,ma
        addi    a4,a4,%lo(b)
        addi    a3,a3,%lo(a)
        vle16.v v2,0(a4)
        vle16.v v1,0(a3)
        lui     a4,%hi(c)
        addi    a4,a4,%lo(c)
        li      a0,0
        vaaddu.vv v1,v1,v2
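The fix itself is essentially a one-liner in
riscv_conditional_register_usage (a sketch of the idea; VXRM_REGNUM is the
port's existing register number for VXRM):

  global_regs[VXRM_REGNUM] = 1;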
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/118103
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_conditional_register_usage): Add
VXRM to global_regs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pr118103-2.c: New test.
* gcc.target/riscv/rvv/base/pr118103-run-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
With release checking we get a maybe-uninitialized warning
inside aarch64_split_move because of jump threading for the case of
`npieces == 0`; `npieces` is never 0, but there is no way the compiler can
know that.  So this fixes the issue by adding a `gcc_assert` to the function
which asserts that `npieces > 0`, which also fixes the warning.
Bootstrapped and tested on aarch64-linux-gnu (with and without --enable-checking=release).
The warning:
aarch64.cc: In function 'void aarch64_split_move(rtx, rtx, machine_mode)':
aarch64.cc:3418:31: error: '*(rtx_def**)((char*)&dst_pieces + offsetof(auto_vec<rtx_def*, 4>,auto_vec<rtx_def*, 4>::m_data[0]))' may be used uninitialized [-Werror=maybe-uninitialized]
3418 | if (reg_overlap_mentioned_p (dst_pieces[0], src))
| ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
aarch64.cc:3408:20: note: 'dst_pieces' declared here
3408 | auto_vec<rtx, 4> dst_pieces, src_pieces;
| ^~~~~~~~~~
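The fix amounts to a one-line sketch at the top of aarch64_split_move:

  gcc_assert (npieces > 0);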
PR target/118771
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_split_move): Assert that npieces is
greater than 0.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The amdhsa.version depends on the code object version; while V3 had 1.0,
V4 has 1.1 and V5 (and V6) have 1.2.  GCC used 1.0 but has for a while
generated either V4 or, with -march=gfx...-generic, V6.  Now it uses the
proper version again.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Update
'amdhsa.version' output to match used code version.
* config/gcn/gen-gcn-device-macros.awk: Add a comment to
crosslink.
|
|
For mask{eq,ne}z, rk is always compared with 0 in the full width, thus
the mode for rk should be X.
I found the issue reviewing a patch fixing a similar issue for RISC-V
XTheadCondMov [1], but interestingly I cannot find a test case really
blowing up on LoongArch. But as the issue is obvious enough let's fix
it anyway so it won't blow up in the future.
[1]: https://gcc.gnu.org/pipermail/gcc-patches/2025-January/674004.html
gcc/ChangeLog:
* config/loongarch/loongarch.md
(*sel<code><GPR:mode>_using_<GPR2:mode>): Rename to ...
(*sel<code><GPR:mode>_using_<X:mode>): ... here.
(GPR2): Remove as nothing uses it now.
|
|
This patch adds gfx9-generic, completing the gfx*-generic support.
It also adds all gfx* devices that are part of any of the gfx*-generic,
i.e. gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.
gcc/ChangeLog:
* config/gcn/gcn-devices.def (GCN_DEVICE): Add gfx9-generic,
gfx902, gfx904, gfx909, gfx1031, gfx1032, gfx1033, gfx1034,
gfx1035, gfx1101, gfx1102, gfx1150, gfx1151, gfx1152, and gfx1153.
Add a currently unused column linking a specific ISA to a generic
one (if it exists).
* config/gcn/gcn-tables.opt: Regenerate.
* doc/invoke.texi (AMD GCN): Add the new gfx... and the older
gfx{10-3,11}-generic to -march= as 'experimental'.
|
|
When compiling with -g, mkoffload.cc creates a device object file itself;
however, in order that the linker does not complain, the ELF flags must
match what the compiler / linker does.  For gfx906, the assembler defaults
to sramecc = any, but gcn-devices.def contained unsupported, which is not
the same - causing link errors.  That's a regression caused by commit
r15-4540-ga6b26e5ea09779 - which can be best seen by looking at the
changes to mkoffload.cc.
Additionally, this commit adds '...' to the GCN_DEVICE #define in gcn.cc
to make it agnostic to the addition of fields.
gcc/ChangeLog:
* config/gcn/gcn-devices.def (GCN_DEVICE): Change sramecc for
gfx906 to 'any'.
* config/gcn/gcn.cc (GCN_DEVICE): Add trailing ... to #define.
|
|
commit 3b9b8d6cfdf59337f4b7ce10ce92a98044b2657b
Author: Surya Kumari Jangala <jskumari@linux.ibm.com>
Date: Tue Jun 25 08:37:49 2024 -0500
ira: Scale save/restore costs of callee save registers with block frequency
scales the cost of saving/restoring a callee-save hard register in epilogue
and prologue with the entry block frequency, which, if not optimizing for
size, is 10000, for all targets.  As a result, callee-saved registers
may not be used to preserve local variable values across calls on some
targets, like x86. Add a target hook for the callee-saved register cost
scale in epilogue and prologue used by IRA. The default version of this
target hook returns 1 if optimizing for size, otherwise returns the entry
block frequency. Add an x86 version of this target hook to restore the
old behavior prior to the above commit.
PR rtl-optimization/111673
PR rtl-optimization/115932
PR rtl-optimization/116028
PR rtl-optimization/117081
PR rtl-optimization/117082
PR rtl-optimization/118497
* ira-color.cc (assign_hard_reg): Call the target hook for the
callee-saved register cost scale in epilogue and prologue.
* target.def (ira_callee_saved_register_cost_scale): New target
hook.
* targhooks.cc (default_ira_callee_saved_register_cost_scale):
New.
* targhooks.h (default_ira_callee_saved_register_cost_scale):
Likewise.
* config/i386/i386.cc (ix86_ira_callee_saved_register_cost_scale):
New.
(TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_IRA_CALLEE_SAVED_REGISTER_COST_SCALE):
New.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
stack_protect_{set,test}_<mode> were showing up in RTL dumps as
UNSPEC_COPYSIGN and UNSPEC_FMV_X_W due to UNSPEC_SSP_SET and
UNSPEC_SSP_TEST being put in the unspecv enum instead of unspec.
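A sketch of the move, assuming the usual define_c_enum layout in riscv.md:

(define_c_enum "unspec" [
  ...
  UNSPEC_SSP_SET
  UNSPEC_SSP_TEST
])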
gcc/ChangeLog:
* config/riscv/riscv.md: Move UNSPEC_SSP_SET and UNSPEC_SSP_TEST
to unspec enum.
|
|
gcc/
* config/avr/avr.opt.urls: Add mcvt.
|
|
Some AVR devices support a CVT:
- Devices from the 0-series, 1-series, 2-series.
- AVR16, AVR32, AVR64, AVR128 devices.
The support is provided by means of a startup code file
crt<mcu>-cvt.o from AVR-LibC v2.3 that can be linked instead
of the traditional crt<mcu>.o.
This patch adds a new command line option -mcvt that links
that CVT startup code (or issues an error when the device
doesn't support a CVT).
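Usage is a single extra switch on the driver line; the MCU name here is
only an example of a device with CVT support:

  avr-gcc -mmcu=avr128da48 -mcvt main.c -o main.elf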
PR target/118764
gcc/
* config/avr/avr.opt (-mcvt): New target option.
* config/avr/avr-arch.h (AVR_CVT): New enum value.
* config/avr/avr-mcus.def: Add AVR_CVT flag for devices that
support it.
* config/avr/avr.cc (avr_handle_isr_attribute) [TARGET_CVT]: Issue
an error when a vector number larger than 3 is used.
* config/avr/gen-avr-mmcu-specs.cc (McuInfo.have_cvt): New property.
(print_mcu) <*avrlibc_startfile>: Use crt<mcu>-cvt.o depending
on -mcvt (or issue an error when the device doesn't support a CVT).
* doc/invoke.texi (AVR Options): Document -mcvt.
|
|
gcc/
PR target/118768
* config/avr/genmultilib.awk: Parse the AVR_MCU lines in
a more robust way w.r.t. white spaces.
|
|
PR target/118561
gcc/ChangeLog:
* config/loongarch/loongarch-builtins.cc
(loongarch_expand_builtin_lsx_test_branch): Do not return
NULL_RTX when an error is detected.
(loongarch_expand_builtin): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/pr118561.c: New test.
|
|
I was looking at a regression on the bfin port with a recent change to the IRA
and stumbled across this while doing a general port healthiness evaluation.
The ABS instruction in the blackfin ISA is defined as saturating on INT_MIN,
which is a bit unexpected. We certainly can't use it when -fwrapv is enabled.
Given the failures on the C23 uabs tests, I'm inclined to just disable the
pattern completely.
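A short illustration of the mismatch (hedged; assumes 32-bit int):

#include <limits.h>

int
wrap_abs (int x)
{
  return x < 0 ? -x : x;
}

/* With -fwrapv, wrap_abs (INT_MIN) must wrap back to INT_MIN, but the
   Blackfin ABS instruction saturates and would yield INT_MAX.  */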
Fixes pr23047, uabs-2 and uabs-3.
While it's not a regression, it's the blackfin port, so I think we've got a
higher degree of freedom here.
Pushing to the trunk.
gcc/
* config/bfin/bfin.md (abssi): Disable pattern.
|
|
gcc.target/aarch64/sve/acle/general/ldff1_8.c and
gcc.target/aarch64/sve/ptest_1.c were failing because the
aarch64 port was giving a zero (unknown) cost to instructions
that compute two results in parallel. This was latent until
r15-1575-gea8061f46a30, which fixed rtl-ssa to treat zero costs
as unknown.
A long-standing todo here is to make insn_cost derive costs from md
information, rather than having to write a lot of matching code in
aarch64_rtx_costs. But that's not something we can do for GCC 15.
This patch instead treats the cost of a PARALLEL as being the maximum
cost of its constituent sets. I don't like this very much, since it
isn't really target-specific behaviour. If it were stage 1, I'd be
trying to change pattern_cost instead.
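A sketch of the idea (assuming set_rtx_cost and the standard PARALLEL
accessors; `speed' stands for the usual optimize-for-speed flag and this
is not the verbatim patch):

rtx pat = PATTERN (insn);
if (GET_CODE (pat) == PARALLEL)
  {
    int cost = 0;
    for (int i = 0; i < XVECLEN (pat, 0); i++)
      {
        rtx x = XVECEXP (pat, 0, i);
        if (GET_CODE (x) == SET)
          cost = MAX (cost, set_rtx_cost (x, speed));
      }
    return cost;
  }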
gcc/
* config/aarch64/aarch64.cc (aarch64_insn_cost): Give PARALLELs
the same cost as the costliest SET.
|
|
When generating thumb2 code,
LDM SP!, {PC}
is a two-byte instruction, whereas
LDR PC, [SP], #4
needs 4 bytes.  When optimizing for size, or when there's no obvious
performance benefit prefer the former.
gcc/ChangeLog:
PR target/118089
* config/arm/arm.cc (thumb2_expand_return): Use LDM SP!, {PC}
when optimizing for size, or when there's no performance benefit over
LDR PC, [SP], #4.
(arm_expand_epilogue): Likewise.
|
|
This pattern is intended to be used only by the epilogue generation
code and will always use fixed hard registers. As such, it does not
need any register constraints, which might be misleading if a
post-reload pass wanted to try renumbering various registers. So
remove the constraints.
Furthermore, to permit this pattern to match when popping just the PC
(which is not a valid register_operand), remove the match on the first
transfer register: pop_multiple_return will validate everything it
needs to.
gcc/ChangeLog:
* config/arm/arm.md (*pop_multiple_with_writeback_and_return): Remove
constraints. Don't validate the first transfer register here.
|
|
I needed to make some adjustments to this function to permit a push or
pop of a single register in thumb2 code, since ldm/stm can be a
two-byte instruction instead of 4. Trying to read the code as it was
made me scratch my head as the logic was not very clear. So this
patch cleans up the code somewhat, fixes a couple of minor bugs and
removes the limit of having to use multiple registers when using this
form of the instruction (the shape of this pattern is such that I
can't see it being generated automatically by the compiler, so there
should be no adverse effects of this).
Buglets fixed:
- Validate that the first element contains RETURN if we're matching
a return instruction.
- Don't allow the base address register to be stored if saving regs
and the address is being updated (this is unpredictable in the
architecture).
- Verify that the last register loaded in a RETURN insn is the PC.
gcc/
* config/arm/arm.cc (decompose_addr_for_ldm_stm): New function.
(ldm_stm_operation_p): Rework to clarify logic. Allow single
registers to be pushed or popped using LDM/STM.
|
|
Enable use of the Armv8-M instruction set.
Account for CVE-2021-35465 mitigation [PR102035].
The -mfix-cmse-cve-2021-35465 option is enabled by default,
if -mcpu=cortex-m33 is used.
gcc/
* config/arm/t-rtems: Add Cortex-M33 multilib.
|
|
Commit 0990d93dd8a4 ("IBM Z: Use @PLT symbols for local functions in
64-bit mode") made GCC call both static and non-static functions and
load both static and non-static function addresses with the @PLT
suffix. This made it difficult for linkers to distinguish calling and
address taking instructions [1]. It is currently assumed that the
R_390_PLT32DBL relocation, corresponding to the @PLT suffix, is used
only for calling, and the R_390_PC32DBL relocation, corresponding to
the empty suffix, is used only for address taking.
Linkers need to make this distinction in order to decide whether to
ask ld.so to use canonical PLT entries. Normally GOT entries in shared
objects contain addresses of the respective functions, with one notable
exception: when a no-pie executable calls the respective function and
also takes its address. Such executables assume that all addresses are
known in advance, so they use addresses of the respective PLT entries.
For consistency reasons, all respective GOT entries in the process must
also use them.
When a linker sees that a no-pie executable both calls a function and
also takes its address, it creates a PLT entry and asks ld.so to
consider it canonical by setting the respective undefined symbol's
address, which is normally 0, to the address of this PLT entry.
Improve the situation by not using @PLT with larl.
Now that @PLT is not used with larl, also drop the 31-bit handling,
which was required because 31-bit PLT entries require %r12 to point to
the respective object's GOT, and this requirement is not satisfied when
calling them by pointer from another object.
Also drop the weak symbol handling, which was required because it is
not possible to load an undefined weak symbol address (0) using larl.
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=29655
gcc/ChangeLog:
* config/s390/s390.cc (print_operand): Remove the no longer
necessary 31-bit and weak symbol handling.
* config/s390/s390.md (*movdi_64): Do not use @PLT with larl.
(*movsi_larl): Likewise.
(main_base_64): Likewise.
(reload_base_64): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/s390/call-z10-pic-nodatarel.c: Adjust
expectations.
* gcc.target/s390/call-z10-pic.c: Likewise.
* gcc.target/s390/call-z10.c: Likewise.
* gcc.target/s390/call-z9-pic-nodatarel.c: Likewise.
* gcc.target/s390/call-z9-pic.c: Likewise.
* gcc.target/s390/call-z9.c: Likewise.
|
|
gcc/ChangeLog:
* config/i386/i386.md (*sibcall_pop_memory):
Disable for TARGET_INDIRECT_BRANCH_REGISTER.
* config/i386/predicates.md (call_insn_operand): Enable when
"satisfies_constraint_Bw (op)" is true, instead of open-coding
the constraint here.
(sibcall_insn_operand): Ditto with "satisfies_constraint_Bs (op)".
|
|
This patch fixes the dupq_* testsuite failures. The tests were
introduced with r15-3669-ga92f54f580c3 (which was a nice improvement)
and Pengxuan originally had a follow-on patch to recognise INDEX
constants during vec_init.
I'd originally wanted to solve this a different way, using wildcards
when building a vector and letting vector_builder::finalize find the
best way of filling them in. I no longer think that's the best
approach though. Stepped constants are likely to be more expensive
than unstepped constants, so we should first try finding an unstepped
constant that is valid, even if it has a longer representation than
the stepped version.
This patch therefore uses a variant of Pengxuan's idea.
While there, I noticed that the (old) code for finding an unstepped
constant only tried varying one bit at a time. So for index 0 in a
16-element constant, the code would try picking a constant from index 8,
4, 2, and then 1. But since the goal is to create "fewer, larger,
repeating parts", it would be better to iterate over a bit-reversed
increment, so that after trying an XOR with 0 and 8, we try adding 4
to each previous attempt, then 2 to each previous attempt, and so on.
In the previous example this would give 8, 4, 12, 2, 10, 6, 14, ...
The test shows an example of this for 8 shorts.
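A small sketch of the bit-reversed iteration (try_index is a hypothetical
hook standing in for "test this candidate index"):

extern void try_index (unsigned);

static unsigned
bit_reverse (unsigned k, unsigned bits)
{
  unsigned r = 0;
  for (unsigned b = 0; b < bits; b++)
    r = (r << 1) | ((k >> b) & 1);
  return r;
}

static void
search (unsigned nelts)            /* nelts must be a power of two */
{
  /* For nelts == 16 this visits 8, 4, 12, 2, 10, 6, 14, ...  */
  for (unsigned k = 1; k < nelts; k++)
    try_index (bit_reverse (k, __builtin_ctz (nelts)));
}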
gcc/
* config/aarch64/aarch64.cc (aarch64_choose_vector_init_constant):
New function, split out from...
(aarch64_expand_vector_init_fallback): ...here. Use a bit-
reversed increment to find a constant index. Add support for
stepped constants.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/dupq_12.c: New test.
|
|
LRA does not correctly support hard-register input operands that
are clobbered. This is needed to support millicode calls on hppa.
The operand setup is sometimes deleted.
This problem can be avoided by hiding hard-register input operands
using match_operand. This also potentially allows for constraints
that specify the operand is both read and written.
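A sketch of one of the new predicates, assuming the standard predicate
syntax:

;; Match only hard register %r25.
(define_predicate "r25_operand"
  (and (match_code "reg")
       (match_test "REGNO (op) == 25")))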
2025-02-03 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR rtl-optimization/117248
* config/pa/predicates.md (r25_operand): New predicate.
(r26_operand): Likewise.
* config/pa/pa.md: Use match_operand for r25 and r26 hard
register operands in mult, div, udiv, mod and umod millicode
patterns.
|
|
Update
commit dd6247cb8fc11a15e23e949092f89d24ff329209
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Jan 31 12:29:04 2025 +0800
x86: Handle TARGET_INDIRECT_BRANCH_REGISTER for -fno-plt
to change "if (TARGET_X32 ...)" back to "else if (TARGET_X32 ...)".
PR target/118713
* config/i386/i386-expand.cc (ix86_expand_call): Change "if
(TARGET_X32 ...)" back to "else if (TARGET_X32 ...)".
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
If TARGET_INDIRECT_BRANCH_REGISTER is true, indirect calls and jumps should
use a register, not memory.  Update the Bs, Bw and Bz constraints to disable
indirect calls over memory if TARGET_INDIRECT_BRANCH_REGISTER is true, change
x32 calls over a GOT slot to calls over a register, and also disable sibcalls
over memory.
gcc/
PR target/118713
* config/i386/constraints.md (Bs): Always disable if
TARGET_INDIRECT_BRANCH_REGISTER is true.
(Bw): Likewise.
* config/i386/i386-expand.cc (ix86_expand_call): Force indirect
call via register for x32 GOT slot call if
TARGET_INDIRECT_BRANCH_REGISTER is true.
* config/i386/i386-protos.h (ix86_nopic_noplt_attribute_p): New.
* config/i386/i386.cc (ix86_nopic_noplt_attribute_p): Make it
global.
* config/i386/i386.md (*call_got_x32): Disable indirect call via
memory for TARGET_INDIRECT_BRANCH_REGISTER.
(*call_value_got_x32): Likewise.
(*sibcall_value_pop_memory): Likewise.
* config/i386/predicates.md (constant_call_address_operand):
Return false if both TARGET_INDIRECT_BRANCH_REGISTER and
ix86_nopic_noplt_attribute_p are true.
gcc/testsuite/
PR target/118713
* gcc.target/i386/pr118713-1-x32.c: New test.
* gcc.target/i386/pr118713-1.c: Likewise.
* gcc.target/i386/pr118713-2-x32.c: Likewise.
* gcc.target/i386/pr118713-2.c: Likewise.
* gcc.target/i386/pr118713-3-x32.c: Likewise.
* gcc.target/i386/pr118713-3.c: Likewise.
* gcc.target/i386/pr118713-4-x32.c: Likewise.
* gcc.target/i386/pr118713-4.c: Likewise.
* gcc.target/i386/pr118713-5-x32.c: Likewise.
* gcc.target/i386/pr118713-5.c: Likewise.
* gcc.target/i386/pr118713-6-x32.c: Likewise.
* gcc.target/i386/pr118713-6.c: Likewise.
* gcc.target/i386/pr118713-7-x32.c: Likewise.
* gcc.target/i386/pr118713-7.c: Likewise.
* gcc.target/i386/pr118713-8-x32.c: Likewise.
* gcc.target/i386/pr118713-8.c: Likewise.
* gcc.target/i386/pr118713-9-x32.c: Likewise.
* gcc.target/i386/pr118713-9.c: Likewise.
* gcc.target/i386/pr118713-10-x32.c: Likewise.
* gcc.target/i386/pr118713-10.c: Likewise.
* gcc.target/i386/pr118713-11-x32.c: Likewise.
* gcc.target/i386/pr118713-11.c: Likewise.
* gcc.target/i386/pr118713-12-x32.c: Likewise.
* gcc.target/i386/pr118713-12.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
This patch adds built-in functions __builtin_avr_strlen_flash,
__builtin_avr_strlen_flashx and __builtin_avr_strlen_memx.
The purpose is that higher-level functions can use __builtin_constant_p
on strlen without raising a diagnostic due to -Waddr-space-convert.
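A hedged usage sketch (__flash is avr-gcc's named address space; the
constant folding is what lets __builtin_constant_p succeed):

#include <stddef.h>

const __flash char msg[] = "hello";

size_t
msg_len (void)
{
  /* Folds to 5 at compile time, with no -Waddr-space-convert noise.  */
  return __builtin_avr_strlen_flash (msg);
}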
gcc/
* config/avr/builtins.def (STRLEN_FLASH, STRLEN_FLASHX)
(STRLEN_MEMX): New DEF_BUILTIN's.
* config/avr/avr.cc (avr_ftype_strlen): New static function.
(avr_builtin_supported_p): New built-ins are not for AVR_TINY.
(avr_init_builtins) <strlen_flash_node, strlen_flashx_node,
strlen_memx_node>: Provide new fntypes.
(avr_fold_builtin) [AVR_BUILTIN_STRLEN_FLASH]
[AVR_BUILTIN_STRLEN_FLASHX, AVR_BUILTIN_STRLEN_MEMX]: Fold if
possible.
* doc/extend.texi (AVR Built-in Functions): Document
__builtin_avr_strlen_flash, __builtin_avr_strlen_flashx,
__builtin_avr_strlen_memx.
libgcc/
* config/avr/t-avr (LIB1ASMFUNCS): Add _strlen_memx.
* config/avr/lib1funcs.S <L_strlen_memx, __strlen_memx>: Implement.
|
|
Some built-ins are not available for C++ since they are using
named address-spaces or fixed-point types.
gcc/
* config/avr/builtins.def (AVR_FIRST_C_ONLY_BUILTIN_ID): New macro.
* config/avr/avr-protos.h (avr_builtin_supported_p): New.
* config/avr/avr.cc (avr_builtin_supported_p): New function.
(avr_init_builtins): Only provide a built-in when it is supported.
* config/avr/avr-c.cc (avr_cpu_cpp_builtins): Only define the
__BUILTIN_AVR_<NAME> built-in defines when the associated built-in
function is supported.
* doc/extend.texi (AVR Built-in Functions): Add a note that the
following built-ins are supported only for GNU C.
|
|
The following testcase is miscompiled on s390x-linux with e.g. -march=z13
(both -O0 and -O2) starting with r15-7053.
The problem is in the splitters which emulate TImode/V1TImode GT and GTU
comparisons.
For GT we want to do
(ior (gt (hi op1) (hi op2))
(and (eq (hi op1) (hi op2)) (gtu (lo op1) (lo op2))))
and for GTU similarly except for gtu instead of gt in there.
Now, the splitter emulation is using V2DImode comparisons where on s390x
the hi part is in the first element of the vector, lo part in the second,
and for the gtu case it swaps the elements of the vector.
So, we get the right result in the first element of the result vector.
But vrepg was then broadcasting the second element of the result vector
rather than the first, and the value of the second element of the vector
is instead
(ior (gt (lo op1) (lo op2))
(and (eq (lo op1) (lo op2)) (gtu (hi op1) (hi op2))))
so something not really usable for the emulated comparison.
The following patch fixes that. The testcase tries to test behavior of
double-word smin/smax/umin/umax with various cases of the halves of both
operands (one that is sometimes EQ, sometimes GT, sometimes LT, sometimes
GTU, sometimes LTU).
2025-01-30 Jakub Jelinek <jakub@redhat.com>
Stefan Schulze Frielinghaus <stefansf@gcc.gnu.org>
PR target/118696
* config/s390/vector.md (*vec_cmpgt<mode><mode>_nocc_emu,
*vec_cmpgtu<mode><mode>_nocc_emu): Duplicate the first rather than
second V2DImode element.
* gcc.dg/pr118696.c: New test.
* gcc.target/s390/vector/pr118696.c: New test.
* gcc.target/s390/vector/vec-abs-emu.c: Expect vrepg with 0 as last
operand rather than 1.
* gcc.target/s390/vector/vec-max-emu.c: Likewise.
* gcc.target/s390/vector/vec-min-emu.c: Likewise.
|
|
libgcc has a module for __negsi2: REG_22:SI := - REG_22:SI.
This patch adds a pattern that allows sharing that function
provided optimize_size.
gcc/
* config/avr/avr.md (*negsi2.libgcc): New insn.
|
|
When using the "Q" constraint in the inline assembler, the displacement value
could exceed the range specified by the instruction.
To avoid this issue, a displacement range check is added to the "Q" constraint.
gcc/
* config/rx/constraints.md (Q): Also check that the address
passes rx_is_restricted_memory_address.
|
|
This patch would like to fix the wrong code generation for the scalar
signed SAT_TRUNC.  The input can be QI/HI/SI/DI while ALU insns like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU insns like add.  The gen_lowpart
will generate something like lbu which has all zeros in the highest bits.
For example, when 0xff7f (-129 for HImode) truncates to QImode, we actually
want to compare -129 to -128, but if there is no sign extend like lbu, we will
compare 0xff7f to 0xffffffffffffff80 (assume Xmode is DImode).  Thus, we have
to sign extend 0xff7f (HImode) to 0xffffffffffffff7f (assume Xmode is DImode)
before comparing in Xmode.
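For reference, the value the expander must compute, written in portable C
(a sketch of the semantics, not of the Xmode register sequence):

#include <stdint.h>

int8_t
sat_trunc_s16_to_s8 (int16_t x)
{
  if (x > INT8_MAX)
    return INT8_MAX;
  if (x < INT8_MIN)          /* e.g. 0xff7f (-129) -> -128 */
    return INT8_MIN;
  return (int8_t) x;
}

The same full-width (sign-extended) comparison requirement underlies the
SAT_SUB and SAT_ADD fixes below.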
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/117688
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_sstrunc): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117688.h: Add test helper macros.
* gcc.target/riscv/pr117688-trunc-run-1-s16-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s32-to-s8.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s16.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s32.c: New test.
* gcc.target/riscv/pr117688-trunc-run-1-s64-to-s8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to fix the wrong code generation for the scalar
signed SAT_SUB.  The input can be QI/HI/SI/DI while ALU insns like sub
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU insns like sub.  The gen_lowpart
will generate something like lbu which has all zeros in the highest bits.
For example, when 0xff (-1 for QImode) subs 0x1 (1 for QImode), we actually
want -1 - 1 = -2, but if there is no sign extend like lbu, we will get
0xff - 1 = 0xfe which is incorrect.  Thus, we have to sign extend 0xff (QImode)
to 0xffffffffffffffff (assume Xmode is DImode) before the sub in Xmode.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/117688
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_sssub): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117688.h: Add test helper macro.
* gcc.target/riscv/pr117688-sub-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-sub-run-1-s8.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to fix the wrong code generation for the scalar
signed SAT_ADD.  The input can be QI/HI/SI/DI while ALU insns like add
can only work on Xmode.  Unfortunately we don't have sub/add for
non-Xmode like QImode in scalar, thus we need to sign extend to Xmode
to ensure we have the correct value before ALU insns like add.  The gen_lowpart
will generate something like lbu which has all zeros in the highest bits.
For example, when 0xff (-1 for QImode) plus 0x2 (2 for QImode), we actually
want -1 + 2 = 1, but if there is no sign extend like lbu, we will get
0xff + 2 = 0x101 which is incorrect.  Thus, we have to sign extend 0xff (QImode)
to 0xffffffffffffffff (assume Xmode is DImode) before the plus in Xmode.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
PR target/117688
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_ssadd): Leverage the helper
riscv_extend_to_xmode_reg with SIGN_EXTEND.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr117688-add-run-1-s16.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s32.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s64.c: New test.
* gcc.target/riscv/pr117688-add-run-1-s8.c: New test.
* gcc.target/riscv/pr117688.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to refactor the helper function of the scalar
SAT_*.  The helper function will convert the define_pattern ops
to Xmode regs for the underlying code-gen.  This patch adds a
new parameter for ZERO_EXTEND or SIGN_EXTEND if the input is const_int
or the mode is non-Xmode.
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Rename from ...
(riscv_extend_to_xmode_reg): Rename to and add rtx_code for
zero/sign extend if non-Xmode.
(riscv_expand_usadd): Leverage the renamed function with ZERO_EXTEND.
(riscv_expand_ussub): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Nowhere near the top of my list, but a quick looksie Sunday led to an
easy-to-fix backend bug.  It's not a regression, but given it's the H8
backend I think we've safely got a degree of freedom here.
The H8 has a constraint "U" which allowed both a subset of MEMs and REGs, so it
wasn't marked as a memory constraint. LRA doesn't really handle this well -- a
pseudo which didn't get a hard reg was replaced by its MEM. The stack slot
doesn't fit the limited addressing forms available and LRA didn't know it just
needed to reload the address into a reg.
Fixed by removing REG from the "U" constraint, turning "U" into a memory
constraint and adjusting a few patterns to allow "rU" instead of "U".
We don't really support C++ on the H8 and as a result libstdc++ won't build.
Interestingly enough that also keeps the C++ tests from working, even for a
compile-only test. So no testcase. Though I did check the reduced and
original test manually and ran it through my tester without any regressions.
PR target/114085
gcc/
* config/h8300/constraints.md (U): No longer accept REGs.
* config/h8300/logical.md (andqi3_2): Use "rU" rather than "U".
(andqi3_2_clobber_flags, andqi3_1, <code>qi3_1): Likewise.
* config/h8300/testcompare.md (tst_extzv_1_n): Likewise.
|