Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch would like to combine the vec_duplicate + vwaddu.vv to the
vwaddu.vx. From example as below code. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have example code like below, GR2VR cost is 0.
Before this patch:
11 beq a3,zero,.L8
12 vsetvli a5,zero,e32,m1,ta,ma
13 vmv.v.x v2,a2
...
16 .L3:
17 vsetvli a5,a3,e32,m1,ta,ma
...
22 vwaddu.vv v1,v2,v3
...
25 bne a3,zero,.L3
After this patch:
11 beq a3,zero,.L8
...
14 .L3:
15 vsetvli a5,a3,e32,m1,ta,ma
...
20 vwaddu.vx v1,a2,v3
...
23 bne a3,zero,.L3
The pattern of this patch only works on DImode, aka below pattern.
v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
+ (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we loss
this extend op after expand.
For uint16_t => uint32_t we have:
(set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
For uint32_t => uint64_t we have:
(set (reg:DI 148 [ _6 ])
(zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
We can see there is no zero_extend for uint16_t to uint32_t, and we
cannot hit the pattern above. So the combine will try below pattern
for uint16_t to uint32_t.
v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
+ (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))
But it cannot match the vwaddu sematics, thus we need another handing
for the vwaddu.vv for uint16_t to uint32_t, as well as the uint8_t to
uint16_t.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*widen_first_<any_extend:su>_vx_<mode>):
Add helper bridge pattern for vwaddu.vx combine.
(*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>): Add
new pattern to match vwaddu.vx combine.
* config/riscv/iterators.md: Add code attr to get extend CODE.
* config/riscv/vector-iterators.md: Add Dmode iterator for
widen.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
it directly
In recent gcc versions, REGNO_OK_FOR_BASE_P() is not called directly, but
rather via regno_ok_for_base_p() which is a wrapper in gcc/addresses.h.
The wrapper obtains a hard register number from pseudo via reg_renumber
array, so REGNO_OK_FOR_BASE_P() does not need to take this into
consideration.
On the other hand, since there is only one use of REGNO_OK_FOR_BASE_P()
in the target-specific code, it would make more sense to simplify the
definition of REGNO_OK_FOR_BASE_P() and replace its call with that of
regno_ok_for_base_p().
gcc/ChangeLog:
* config/xtensa/xtensa.cc (#include):
Add "addresses.h".
* config/xtensa/xtensa.h (REGNO_OK_FOR_BASE_P):
Simplify to just a call to GP_REG_P().
(BASE_REG_P): Replace REGNO_OK_FOR_BASE_P() with the equivalent
call to regno_ok_for_base_p().
|
|
Add an expander for isnan using integer arithmetic. Since isnan is
just a compare, enable it only with -fsignaling-nans to avoid
generating spurious exceptions. This fixes part of PR66462.
int isnan1 (float x) { return __builtin_isnan (x); }
Before:
fcmp s0, s0
cset w0, vs
ret
After:
fmov w1, s0
mov w0, -16777216
cmp w0, w1, lsl 1
cset w0, cc
ret
gcc:
PR middle-end/66462
* config/aarch64/aarch64.md (isnan<mode>2): Add new expander.
gcc/testsuite:
PR middle-end/66462
* gcc.target/aarch64/pr66462.c: Update test.
|
|
An ICE was reported in the following test case:
svint8_t foo(svbool_t pg, int8_t op2) {
return svmul_n_s8_z(pg, svdup_s8(1), op2);
}
with a type mismatch in 'vec_cond_expr':
_4 = VEC_COND_EXPR <v16_2(D), v32_3(D), { 0, ... }>;
The reason is that svmul_impl::fold folds calls where one of the operands
is all ones to the other operand using
gimple_folder::fold_active_lanes_to. However, we implicitly assumed
that the argument that is passed to fold_active_lanes_to is a vector
type. In the given test case op2 is a scalar type, resulting in the type
mismatch in the vec_cond_expr.
This patch fixes the ICE by forcing a vector type of the argument
in fold_active_lanes_to before the statement with the vec_cond_expr.
In the initial version of this patch, the force_vector statement was placed in
svmul_impl::fold, but it was moved to fold_active_lanes_to to align it with
fold_const_binary which takes care of the fixup from scalar to vector
type using vector_const_binop.
The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for trunk?
OK to backport to GCC 15?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR target/121602
* config/aarch64/aarch64-sve-builtins.cc
(gimple_folder::fold_active_lanes_to): Add force_vector
statement.
gcc/testsuite/
PR target/121602
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
|
|
Allows profiles input in '--with-arch'. Check profiles with
'riscv-profiles.def'.
gcc/ChangeLog:
* config.gcc: Accept RISC-V profiles in `--with-arch`.
* config/riscv/arch-canonicalize: Add profile detection and
skip canonicalization for profiles.
|
|
Moving RISC-V Profiles definations into 'riscv-profiles.def'. Add comments for
'riscv_profiles'.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (struct riscv_profiles): Add comments.
(RISCV_PROFILE): Removed.
* config/riscv/riscv-profiles.def: New file.
|
|
This patch implies zicsr for sdtrig and ssstrict extensions.
According to the riscv-privileged spec, the sdtrig and ssstrict extensions
are privileged extensions, so they should imply zicsr.
gcc/ChangeLog:
* config/riscv/riscv-ext.def: Imply zicsr.
|
|
gcc/ChangeLog:
* config/i386/predicates.md (avx_vbroadcast128_operand): New
predicate.
* config/i386/sse.md (*avx_vbroadcastf128_<mode>_perm): New
pre_reload splitter.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx_vbroadcastf128.c: New test.
|
|
Bring code model selection logic to vxworks.h as well.
for gcc/ChangeLog
* config/rs6000/vxworks.h (TARGET_CMODEL, SET_CMODEL): Define.
|
|
Add support for some recent AVR devices.
gcc/
* config/avr/avr-mcus.def: Add avr32eb14, avr32eb20,
avr32eb28, avr32eb32.
* doc/avr-mmcu.texi: Rebuild.
|
|
If a single instruction can store or move the whole block of memory, use
vector instruction and don't align destination.
gcc/
PR target/121934
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): If a
single instruction can store or move the whole block of memory,
use vector instruction and don't align destination.
gcc/testsuite/
PR target/121934
* gcc.target/i386/pr121934-1a.c: New test.
* gcc.target/i386/pr121934-1b.c: Likewise.
* gcc.target/i386/pr121934-2a.c: Likewise.
* gcc.target/i386/pr121934-2b.c: Likewise.
* gcc.target/i386/pr121934-3a.c: Likewise.
* gcc.target/i386/pr121934-3b.c: Likewise.
* gcc.target/i386/pr121934-4a.c: Likewise.
* gcc.target/i386/pr121934-4b.c: Likewise.
* gcc.target/i386/pr121934-5a.c: Likewise.
* gcc.target/i386/pr121934-5b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
After late-combine is added, split1 can see an input like
(insn 56 55 169 5
(set (reg/v:DI 87 [ n ])
(ior:DI (and:DI (reg/v:DI 87 [ n ])
(const_int 281474976710655 [0xffffffffffff]))
(and:DI (reg:DI 131 [ _45 ])
(const_int -281474976710656 [0xffff000000000000]))))
"pr121906.c":22:8 108 {*bstrins_di_for_ior_mask}
(nil))
And the splitter ends up emitting
(insn 184 55 185 5
(set (reg/v:DI 87 [ n ])
(reg:DI 131 [ _45 ]))
"pr121906.c":22:8 -1
(nil))
(insn 185 184 169 5
(set (zero_extract:DI (reg/v:DI 87 [ n ])
(const_int 48 [0x30])
(const_int 0 [0]))
(reg/v:DI 87 [ n ]))
"pr121906.c":22:8 -1
(nil))
which obviously lost everything in r87, instead of retaining its lower
bits as we expect. It's because the splitter didn't anticipate the
output register may be one of the input registers.
PR target/121906
gcc/
* config/loongarch/loongarch.md (*bstrins_<mode>_for_ior_mask):
Always create a new pseudo for the input register of the bstrins
instruction.
gcc/testsuite/
* gcc.target/loongarch/pr121906.c: New test.
|
|
This implements the new vector optabs vec_<su>addh_narrow<mode>
adding support for in-vectorizer use for early break.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (vec_addh_narrow<mode>): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-addhn_1.c: New test.
|
|
Add an expander for isfinite using integer arithmetic. This is
typically faster and avoids generating spurious exceptions on
signaling NaNs. This fixes part of PR66462.
int isfinite1 (float x) { return __builtin_isfinite (x); }
Before:
fabs s0, s0
mov w0, 2139095039
fmov s31, w0
fcmp s0, s31
cset w0, hi
eor w0, w0, 1
ret
After:
fmov w1, s0
mov w0, -16777216
cmp w0, w1, lsl 1
cset w0, hi
ret
gcc:
PR middle-end/66462
* config/aarch64/aarch64.md (isfinite<mode>2): Add new expander.
gcc/testsuite:
PR middle-end/66462
* gcc.target/aarch64/pr66462.c: Add tests for isfinite.
|
|
In a CAS operation, even if expected != *memory we still need to do an
atomic load of *memory into output. But I made a mistake in the initial
implementation, causing the output to contain junk in this situation.
Like a normal atomic load, the atomic load embedded in the CAS semantic
is required to work on read-only page. Thus we cannot rely on sc.q to
ensure the atomicity of the load. Use LSX to perform the load instead,
and also use LSX to compare the 16B values to keep the ll-sc loop body
short.
gcc/ChangeLog:
* config/loongarch/sync.md (atomic_compare_and_swapti_scq):
Require LSX. Change the operands for the output, the memory,
and the expected value to LSX vector modes. Add a FCCmode
output to indicate if CAS has written the desired value into
memory. Use LSX to atomically load both words of the 16B value
in memory.
(atomic_compare_and_swapti): Pun the modes to satisify
the new atomic_compare_and_swapti_scq implementation. Read the
bool return value from the FCC instead of performing a
comparision.
|
|
This modifier is intended to output $r0 for (const_int 0), but the
logic:
GET_MODE (op) != TImode || (op != CONST0_RTX (TImode) && code != REG)
will reject (const_int 0) because (const_int 0) actually does not have
a mode and GET_MODE will return VOIDmode for it.
Use reg_or_0_operand instead to fix the issue.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_print_operand): Call
reg_or_0_operand for checking the sanity of %t.
|
|
In general, tail call optimization requires that the callee's saved
registers are a superset of the caller's.
The Standard Vector Calling Convention Variant (assembler: .variant_cc)
requires that a function with this calling convention preserves vector
registers v1-v7 and v24-v31 across calls (i.e. callee-saved). However,
the same set of registers are (function-local) temporary registers
(i.e. caller-saved) on the normal (non-vector) calling convention.
Even if a function with this calling convention variant calls another
function with a non-vector calling convention, those vector registers
are correctly clobbered -- except when the sibling (tail) call
optimization occurs as it violates the general rule mentioned above.
If this happens, following function body:
1. Save v1-v7 and v24-v31 for clobbering
2. Call another function with a non-vector calling convention
(which may destroy v1-v7 and/or v24-v31)
3. Restore v1-v7 and v24-v31
4. Return.
may be incorrectly optimized into the following sequence:
1. Save v1-v7 and v24-v31 for clobbering
2. Restore v1-v7 and v24-v31 (?!)
3. Jump to another function with a non-vector calling convention
(which may destroy v1-v7 and/or v24-v31).
This commit suppresses cross CC sibling call optimization from
the vector calling convention variant.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_function_ok_for_sibcall):
Suppress cross calling convention sibcall optimization from
the vector calling convention variant.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall.c: New test.
* gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall-indirect-1.c: Ditto.
* gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall-indirect-2.c: Ditto.
|
|
ifcvt likes to emit
(set
(if_then_else)
(ge (reg 1) (reg2))
(reg 1)
(reg 2))
which can be recognized as min/max patterns in the backend.
This patch adds such patterns and the respective iterators as well as a
test.
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<bitmanip_minmax_cmp_insn>_cmp_<mode>3):
New min/max ifcvt pattern.
* config/riscv/iterators.md (minu): New iterator.
* config/riscv/riscv.cc (riscv_noce_conversion_profitable_p):
Remove riscv-specific adjustment.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zbb-min-max-04.c: New test.
|
|
can_find_related_mode_p incorrectly handled VLS (Vector Length Specific)
types by using TARGET_MIN_VLEN directly, which is in bits, instead of
converting it to bytes as required.
This patch fixes the issue by dividing TARGET_MIN_VLEN by 8 to convert
from bits to bytes when calculating the number of units for VLS modes.
The fix enables proper vectorization for several test cases:
- zve32f-1.c: Now correctly finds vector mode for SF mode in foo3,
enabling vectorization of an additional loop.
- zve32f_zvl256b-1.c and zve32x_zvl256b-1.c: Added -mrvv-max-lmul=m2
option to handle V8SI[2] (vector array mode) requirements during
vectorizer analysis, which needs V16SI to pass, and V16SI was enabled
incorrectly before.
Changes since V4:
- Fix testsuite, also triaged why changed.
gcc/ChangeLog:
* config/riscv/riscv-selftests.cc (riscv_run_selftests): Call
run_vectorize_related_mode_selftests.
(test_vectorize_related_mode): New function to test
vectorize_related_mode behavior.
(run_vectorize_related_mode_selftests): New function to run all
vectorize_related_mode tests.
(run_vectorize_related_mode_vla_selftests): New function to test
VLA modes.
(run_vectorize_related_mode_vls_rv64gcv_selftests): New function to
test VLS modes on rv64gcv.
(run_vectorize_related_mode_vls_rv32gc_zve32x_zvl256b_selftests):
New function to test VLS modes on rv32gc_zve32x_zvl256b.
(run_vectorize_related_mode_vls_selftests): New function to run all
VLS mode tests.
* config/riscv/riscv-v.cc (can_find_related_mode_p): Fix VLS type
handling by converting TARGET_MIN_VLEN from bits to bytes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/zve32f-1.c: Update expected
vectorization count from 2 to 3.
* gcc.target/riscv/rvv/autovec/zve32f_zvl256b-1.c: Add
-mrvv-max-lmul=m2 option.
* gcc.target/riscv/rvv/autovec/zve32x_zvl256b-1.c: Add
-mrvv-max-lmul=m2 option.
|
|
That was introduced in commit 42db504c2fb2b460e24ca0e73b8559fc33b3cf33,
over 15 years ago!
gcc/ChangeLog:
* config/xtensa/xtensa.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P):
Change "Xtrnase" in the comment to "Xtensa".
|
|
PR121878 shows a typo in the tt-ascalon-d8's pipeline description that
leads to an ICE. The problem is that the vector define_insn_reservation
patterns test for scalar modes rather than vector modes, meaning the
insns don't get handled correctly. We could correct the modes, but given
we could have multiple VLEN values, the number of modes we'd have to check
can be large and mode iterators are not allowed in the mode attribute check.
Instead, I've removed the mode check and replaced it with a test of the
Selected Elenent Width (SEW).
2025-09-09 Peter Bergner <bergner@tenstorrent.com>
gcc/
PR target/121878
* config/riscv/tt-ascalon-d8.md (tt_ascalon_d8_vec_idiv_half): Test the
Selected Element Width (SEW) rather than the mode.
(tt_ascalon_d8_vec_idiv_single): Likewise.
(tt_ascalon_d8_vec_idiv_double): Likewise.
(tt_ascalon_d8_vec_float_divsqrt_half): Likewise.
(tt_ascalon_d8_vec_float_divsqrt_single): Likewise.
(tt_ascalon_d8_vec_float_divsqrt_double): Likewise.
gcc/testsuite/
PR target/121878
* gcc.target/riscv/pr121878.c: New test.
Signed-off-by: Peter Bergner <bergner@tenstorrent.com>
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a minus RTL instruction. The other
minus operand is already wide.
Before this patch, we have four instructions, e.g.:
fcvt.d.s fa0,fa0
vsetvli a5,zero,e64,m1,ta,ma
vfmv.v.f v2,fa0
vfsub.vv v1,v1,v2
After, we get only one:
vfwsub.wf v1,v1,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwsub_wf_<mode>): New pattern to
combine float_extend + vec_duplicate + vfsub.vv into vfwsub.wf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwsub.wf.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h
(DEF_VF_BINOP_WIDEN_CASE_2, DEF_VF_BINOP_WIDEN_CASE_3): Swap operands.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-2-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-2-f32.c: New test.
|
|
To properly implement __builtin_ffs for SI mode, implement clz and
(for >= z17) ctz for SI mode. Otherwise, gcc falls back to a libcall
which causes problems for Linux kernel code.
Also adjust the C?Z_DEFINED_VALUE_AT_ZERO macros to return 2. Since
the optabs now return exactly the value set by these macros, return
value 2 is more appropriate and leads to better code.
gcc/ChangeLog:
* config/s390/s390.h (CLZ_DEFINED_VALUE_AT_ZERO): Adjust and
return 2.
(CTZ_DEFINED_VALUE_AT_ZERO): Return 2.
* config/s390/s390.md (clzsi2): Implement.
(ctzsi2): Implement.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr109011-2.c: Fix expected outcome.
* gcc.dg/vect/pr109011-4.c: Fix expected outcome.
* gcc.target/s390/ffs-1.c: New test.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
|
|
Because of a wrong define_insn for vec_extract_plus a vector access wasn't
combined with a preceeding plus operation which set the offset at which
to perform the vector access even though the instruction offers that capability.
Bootstrapped and regtested on s390x.
gcc/ChangeLog:
* config/s390/vector.md (*vec_extract<mode>_plus_zero_extend):
Fix define insn.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-extract-3.c: New test.
Signed-off-by: Maximilian Immanuel Brandtner <maxbr@linux.ibm.com>
|
|
The previous definition had all the GFX11 register counts doubled to fix a bug
that was encountered in early testing. This seems to have been a
misunderstanding of the problem (which is no longer reproducible).
gcc/ChangeLog:
* config/gcn/gcn-devices.def: Correct the Max ISA VGPRs counts for
GFX10 and GFX11 devices.
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Remove the wave64
VGPR count fudge.
|
|
Fix an unrecognised insn ICE that only shows while using builtins at -O0.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_expand_builtin_1): Enable the "mode" parameter
and ensure that "target" is a register for most of the builtins.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a minus RTL instruction. Both
minus operands are widened.
Before this patch, we have six instructions, e.g.:
fcvt.d.s fa0,fa0
vsetvli a5,zero,e64,m1,ta,ma
vfmv.v.f v3,fa0
vfwcvt.f.f.v v1,v2
vsetvli zero,zero,e64,m1,ta,ma
vfsub.vv v1,v1,v3
After, we get only one:
vfwsub.vf v1,v2,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwsub_vf_<mode>): New pattern to
combine float_extend + vec_duplicate + vfwsub.vv into vfwsub.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwsub.vf.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h
(DEF_VF_BINOP_WIDEN_CASE_0, DEF_VF_BINOP_WIDEN_CASE_1): Swap operands.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-1-f32.c: New test.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus RTL instruction. The other
plus operand is already wide.
Before this patch, we have four instructions, e.g.:
fcvt.d.s fa0,fa0
vsetvli a5,zero,e64,m1,ta,ma
vfmv.v.f v2,fa0
vfadd.vv v1,v1,v2
After, we get only one:
vfwadd.wf v1,v1,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwadd_wf_<mode>): New pattern to
combine float_extend + vec_duplicate + vfadd.vv into vfwadd.wf.
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><mode>_scalar): Swap and reorder
operands to match the RTL emitted by expand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwadd.wf.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for single
widening variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: Add support
for single widening variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-2-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-2-f32.c: New test.
|
|
This reverts commit 1b7bcac0327ccd84f1966c748f4d1aedef64a9c5.
PR target/121785
gcc/
* config/aarch64/aarch64-simd.md (*bcaxqdi4): Delete.
gcc/testsuite/
* gcc.target/aarch64/simd/bcax_d.c: Remove tests for DImode arguments.
|
|
Enable SSE4.1 ceil/floor/trunc for -Os to replace a function call with
roundss or roundsd by dropping the !flag_trapping_math check.
gcc/
PR target/121861
* config/i386/i386.cc (ix86_optab_supported_p): Drop
!flag_trapping_math check for floor_optab, ceil_optab and
btrunc_optab.
gcc/testsuite/
PR target/121861
* gcc.target/i386/pr121861-1a.c: New file.
* gcc.target/i386/pr121861-1b.c: Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
gcc/ChangeLog:
* config/i386/i386-expand.cc (expand_vec_perm_vpermil): Extend
to handle V8SImode.
* config/i386/i386.cc (avx_vpermilp_parallel): Extend to
handle vector integer modes with same vector size and same
component size.
* config/i386/sse.md
(<sse2_avx_avx512f>_vpermilp<mode><mask_name>): Ditto.
(V48_AVX): New mode iterator.
(ssefltmodesuffix): Extend for V16SI/V8DI/V16SF/V8DF.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx256_avoid_vec_perm-3.c: New test.
* gcc.target/i386/avx256_avoid_vec_perm-4.c: New test.
* gcc.target/i386/avx512bw-vpalignr-4.c: Adjust testcase.
* gcc.target/i386/avx512vl-vpalignr-4.c: Ditto.
|
|
SLP may take a broadcast as kind of vec_perm, the patch checks the
permutation index to exclude those false positive.
gcc/ChangeLog:
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Check permutation index for vec_perm, don't count it if we
know it's not a cross-lane permutation.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx256_avoid_vec_perm.c: Adjust testcase.
* gcc.target/i386/avx256_avoid_vec_perm-2.c: New test.
* gcc.target/i386/avx256_avoid_vec_perm-5.c: New test.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a plus RTL instruction.
Before this patch, we have four instructions, e.g.:
fcvt.d.s fa0,fa0
vsetvli a5,zero,e64,m1,ta,ma
vfmv.v.f v3,fa0
vfwadd.wv v1,v3,v2
After, we get only one:
vfwadd.vf v1,v2,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwadd_vf_<mode>): New pattern to
combine float_extend + vec_duplicate + vfwadd.vv into vfwadd.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwadd.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h
(DEF_VF_BINOP_WIDEN_CASE_0): Fix OP.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-1-f32.c: New test.
|
|
If-conversion isn't being applied to this nbench code:
#include <stdint.h>
#define INTERNAL_FPF_PRECISION 4
typedef uint16_t u16;
void ShiftMantLeft1(u16 *carry, u16 *mantissa)
{
int i;
int new_carry;
u16 accum;
for(i=INTERNAL_FPF_PRECISION-1;i>=0;i--)
{ accum=mantissa[i];
new_carry=accum & 0x8000;
accum=accum<<1;
if(*carry)
accum|=1;
*carry=new_carry;
mantissa[i]=accum;
}
return;
}
Bumping branch_cost from 3 to 4 triggers if-conversion, improving the
nbench FP EMULATION result on Ascalon significantly. There's a risk
that more aggressive use of conditional zero instructions will negatively
impact workloads that predict well, but we haven't seen anything obvious.
gcc/ChangeLog:
* config/riscv/riscv.cc (tt_ascalon_d8_tune_info): Increase branch_cost
from 3 to 4.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a minus RTL instruction. The vec_duplicate is the
minuend operand.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfsub.vv v1,v2,v1
After, we get only one:
vfrsub.vf v1,v1,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfrsub_vf_<mode>): New pattern to
combine vec_duplicate + vfsub.vv into vfrsub.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrsub.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfrsub.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f64.c: New test.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a minus RTL instruction. The vec_duplicate is the
subtrahend operand.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfsub.vv v1,v1,v2
After, we get only one:
vfsub.vf v1,v1,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfsub_vf_<mode>): New pattern to
combine vec_duplicate + vfsub.vv into vfsub.vf.
* config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: Adjust scan
dumps.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfsub.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfsub.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f64.c: New test.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into a plus RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfadd.vv v1,v1,v2
After, we get only one:
vfadd.vf v1,v1,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfadd_vf_<mode>): New pattern to
combine vec_duplicate + vfadd.vv into vfadd.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: Adjust scan
dump.
* gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfadd.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for
vfadd.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f64.c: New test.
|
|
This pattern enables the combine pass (or late-combine, depending on the case)
to merge a float_extend'ed vec_duplicate into a mult RTL instruction.
Before this patch, we have six instructions, e.g.:
fcvt.d.s fa0,fa0
vsetvli a5,zero,e64,m1,ta,ma
vfmv.v.f v3,fa0
vfwcvt.f.f.v v1,v2
vsetvli zero,zero,e64,m1,ta,ma
vfmul.vv v1,v3,v1
After, we get only one:
vfwmul.vf v1,v2,fa0
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfwmul_vf_<mode>): New pattern to
combine float_extend + vec_duplicate + vfmul.vv into vfmul.vf.
* config/riscv/vector.md (*@pred_dual_widen_<optab><mode>_scalar):
Swap operands to match the RTL emitted by expand, i.e. first
float_extend then vec_duplicate.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwmul.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for
widening variants.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: New test
helper.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmul-run-1-f16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmul-run-1-f32.c: New test.
|
|
These patterns enable the combine pass (or late-combine, depending on the case)
to merge a vec_duplicate into an unspec_vfmax RTL instruction.
Before this patch, we have two instructions, e.g.:
vfmv.v.f v2,fa0
vfmax.vv v1,v2,v1
After, we get only one:
vfmax.vf v1,v1,fa0
In some cases, it also shaves off one vsetvli.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vfmin_vf_ieee_<mode>): Rename into...
(*v<ieee_fmaxmin_op>_vf_<mode>): New pattern to combine vec_duplicate +
vf{max,min}.vv (unspec) into vf{max,min}.vf.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f16.c: Add vfmax.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f16.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f64.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f16.c: Add vfmax. Also add
missing -fno-fast-math.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f32.c: Likewise.
* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f64.c: Likewise.
|
|
This extension defines vector instructions to calculae of the signed/unsigned
dot product of four SEW/4-bit data and accumulate the result into a SEWbit
element for all elements in a vector register.
gcc/ChangeLog:
* config/riscv/andes-vector-builtins-bases.cc (nds_vd4dot): New class.
(class nds_vd4dotsu): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vd4dots): Ditto.
(nds_vd4dotsu): Ditto.
(nds_vd4dotu): Ditto.
* config/riscv/andes-vector.md
(@pred_nds_vd4dot<su><mode>): New pattern.
(@pred_nds_vd4dotsu<mode>): New pattern.
* config/riscv/genrvv-type-indexer.cc (main): Modify sew of QUAD_FIX,
QUAD_FIX_SIGNED and QUAD_FIX_UNSIGNED.
* config/riscv/riscv-vector-builtins.cc
(qexti_vvvv_ops): New operand information.
(qexti_su_vvvv_ops): New operand information.
(qextu_vvvv_ops): New operand information.
* config/riscv/riscv-vector-builtins.h (XANDESVDOT_EXT): New def.
(required_ext_to_isa_name): Add case XANDESVDOT_EXT.
(required_extensions_specified): Ditto.
(struct function_group_info): Ditto.
* config/riscv/vector-iterators.md (NDS_QUAD_FIX): New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c: New test.
|
|
I missed that the new ascalon pipeline description was put into the wrong place
during review. The net is tests which wanted to use generic-ooo explicitly for
stability in the test output ended up getting a different pipeline model and
different codegen than the test expected.
This tripped a small number of vsetvl failures in the testsuite.
This has spun on riscv64-elf and riscv32-elf in my tester and fixes the
regression. I'm going to go ahead and push it as I'm likely offline this
afternoon/evening and don't want anyone else to waste their time chasing the
regression down.
gcc/
* config/riscv/riscv-opts.h (riscv_microarchitecture_type): Fix ordering.
|
|
This extension defines vector instructions to extract a pair of FP16 data from
a floating-point register. Multiply the top FP16 data with the FP16 elements
and add the result with the bottom FP16 data.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_FP_16 for XAndesvpackfph.
* config/riscv/andes-vector-builtins-bases.cc (nds_vfpmad): New class.
* config/riscv/andes-vector-builtins-bases.h: New def.
* config/riscv/andes-vector-builtins-functions.def (nds_vfpmadt): Ditto.
(nds_vfpmadb): Ditto.
(nds_vfpmadt_frm): Ditto.
(nds_vfpmadb_frm): Ditto.
* config/riscv/andes-vector.md (@pred_nds_vfpmad<nds_tb><mode>):
New pattern.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_F16_OPS): New def.
* config/riscv/riscv-vector-builtins.cc (f16_ops): Ditto
* config/riscv/riscv-vector-builtins.def (float32_type_node): Ditto.
* config/riscv/riscv-vector-builtins.h (XANDESVPACKFPH_EXT): Ditto.
(required_ext_to_isa_name): Add case XANDESVPACKFPH_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (VHF): New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c: New test.
|
|
gcc/
PR target/121794
* config/avr/avr.md (cmpqi3): Use cpi R,0 if possible.
|
|
This patch would like to combine the vec_duplicate + vnmsub.vv to the
vnmsub.vx. From example as below code. The related pattern will depend
on the cost of vec_duplicate from GR2VR. Then the late-combine will
take action if the cost of GR2VR is zero, and reject the combination
if the GR2VR cost is greater than zero.
Assume we have example code like below, GR2VR cost is 0.
Before this patch:
11 │ beq a3,zero,.L8
12 │ vsetvli a5,zero,e32,m1,ta,ma
13 │ vmv.v.x v2,a2
...
16 │ .L3:
17 │ vsetvli a5,a3,e32,m1,ta,ma
...
22 │ vnmsub.vv v1,v2,v3
...
25 │ bne a3,zero,.L3
After this patch:
11 │ beq a3,zero,.L8
...
14 │ .L3:
15 │ vsetvli a5,a3,e32,m1,ta,ma
...
20 │ vnmsub.vx v1,a2,v3
...
23 │ bne a3,zero,.L3
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*vnmsac_vx_<mode>): Rename from.
(*mul_minus_vx_<mode>): Rename to and add nmsub support.
* config/riscv/vector.md (@pred_vnmsac_vx_<mode>): Rename from.
(@pred_mul_minus_vx_<mode>): Rename to and add nmsub support.
(*pred_nmsac_<mode>_scalar_undef): Rename from.
(*pred_mul_minus_vx<mode>_undef): Rename to and add nmsub support.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This extension defines vector load instructions to move sign-extended or
zero-extended INT4 data into 8-bit vector register elements.
gcc/ChangeLog:
* config/riscv/andes-vector-builtins-bases.cc
(nds_nibbleload): New class.
* config/riscv/andes-vector-builtins-bases.h (nds_vln8): New def.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector-builtins-functions.def (nds_vln8): Ditto.
(nds_vlnu8): Ditto.
* config/riscv/andes-vector.md (@pred_intload_mov<su><mode>): New pattern.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_Q_OPS): New def.
(DEF_RVV_QU_OPS): Ditto.
* config/riscv/riscv-vector-builtins.cc
(q_v_void_const_ptr_ops): New operand information.
(qu_v_void_const_ptr_ops): Ditto.
* config/riscv/riscv-vector-builtins.def (void_const_ptr): New def.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVSINTLOAD_EXT.
(required_extensions_specified): Ditto.
* config/riscv/vector-iterators.md (NDS_QVI): New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c: New test.
|
|
This patch add support for XAndesvbfhcvt ISA extension.
This extension defines instructions to perform vector floating-point
conversion between the BFLOAT16 floating-point data and the IEEE-754 32-bit
single-precision floating-point (SP) data in a vector register.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
Turn on VECTOR_ELEN_BF_16 for XAndesvbfhcvt.
* config.gcc: Add extra_objs andes-vector-builtins-bases.o
and extra_headers andes_vector.h.
* config/riscv/riscv-vector-builtins-shapes.cc
(BASE_NAME_MAX_LEN): Increase size to 20.
* config/riscv/riscv-vector-builtins.cc
(f32_to_bf16_nf_w_ops): New operand information.
(f32_to_bf16_nf_w_ops): New operand information.
(DEF_RVV_FUNCTION): New def.
* config/riscv/riscv-vector-builtins.def (bf16): Ditto.
* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
(required_ext_to_isa_name): Add case XANDESVBFHCVT_EXT.
(required_extensions_specified): Ditto.
* config/riscv/t-riscv: Add andes-vector-builtins-functions.def,
andes-vector-builtins-bases.h and andes-vector-builtins-bases.o.
* config/riscv/vector-iterators.md (NDS_VWEXTBF): New iterator.
(NDS_V_DOUBLE_TRUNC_BF): New attr.
* config/riscv/andes-vector-builtins-bases.cc: New file.
* config/riscv/andes-vector-builtins-bases.h: New file.
* config/riscv/andes-vector-builtins-functions.def: New file.
* config/riscv/andes_vector.h: New file.
* config/riscv/andes-vector.md: New file.
* config/riscv/vector.md: Include andes_vector.md.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add regression for xandesvector.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfncvtbf16s.c: New test.
* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfwcvtsbf16.c: New test.
|
|
Add pipeline description for the Tenstorrent Ascalon 8 wide CPU.
gcc/ChangeLog
* config/riscv/riscv-cores.def (RISCV_TUNE): Update.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add tt_ascalon_d8.
* config/riscv/riscv.md: Update tune attribute and include
tt-ascalon-d8.md.
* config/riscv/tt-ascalon-d8.md: New file.
|
|
For Zvfhmin a vector mode exists but the corresponding vec_extract does
not. This patch checks that a vec_extract is available and otherwise
falls back to standard handling.
PR target/121510
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Check if we can
vec_extract.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr121510.c: New test.
|
|
There are some cases where involing zero_reg is not needed and
where there are other sequences with the same efficiency.
An example is to use SBCI R,0 instead of SBC R,__zero_reg__
when R >= R16. This may turn out to be better for small ISRs.
PR target/121794
gcc/
* config/avr/avr.cc (avr_out_compare): Only use zero_reg
when there is no other sequence of the same length.
(avr_out_plus_ext): Same.
(avr_out_plus_1): Same.
|
|
Unlike Advanced SIMD, SVE has instruction to perform smin, smax, umin, umax
on 64-bit elements. Thus, we can use them with the fixed-width V2DImode
expander. Most of the machinery is already there on the define_insn side,
supporting V2DImode operands of the SVE pattern. We just need to wire up
the RTL emission to the v2di standard names for the TARGET_SVE case.
So for the smin case we now generate:
min_di:
ldr q30, [x0]
ptrue p7.b, all
ldr q31, [x1]
smin z30.d, p7/m, z30.d, z31.d
str q30, [x2]
ret
min_imm_di:
ldr q31, [x0]
smin z31.d, z31.d, #5
str q31, [x2]
ret
instead of the previous:
min_di:
ldr q30, [x0]
ldr q31, [x1]
cmgt v29.2d, v30.2d, v31.2d
bsl v29.16b, v31.16b, v30.16b
str q29, [x2]
ret
min_imm_di:
ldr q31, [x0]
mov z30.d, #5
cmgt v29.2d, v30.2d, v31.2d
bsl v29.16b, v31.16b, v30.16b
str q29, [x2]
ret
The register operand case is the same length, though the new ptrue can now be
shared and moved away. But the immediate operand case is obviously better
as the SVE immediate form doesn't require a predicate operand.
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/iterators.md (sve_di_suf): New mode attribute.
* config/aarch64/aarch64-sve.md (<optab><mode>3 SVE_INT_BINARY_MULTI):
Rename to...
(<optab><mode>3<sve_di_suf>): ... This. Use SVE_I_SIMD_DI mode
iterator.
* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Use the above
for TARGET_SVE.
gcc/testsuite/
* gcc.target/aarch64/sve/usminmax_di.c: New test.
|