This patch supports the 2 intrinsic APIs of the FP16 ZVFHMIN extension, i.e.
SEW=16 for the instructions below:
vfwcvt.f.f.v
vfncvt.f.f.w
Users can then leverage the intrinsic APIs to perform conversions between
RVV single-precision and half-precision floating-point vectors.
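As a minimal illustration, usage might look as follows (a sketch; the
intrinsic spellings follow the usual __riscv_ naming scheme and are
assumptions here, not taken from this patch):
#include <riscv_vector.h>
vfloat32m2_t
f16_to_f32 (vfloat16m1_t x, size_t vl)
{
  return __riscv_vfwcvt_f_f_v_f32m2 (x, vl); /* vfwcvt.f.f.v */
}
vfloat16m1_t
f32_to_f16 (vfloat32m2_t x, size_t vl)
{
  return __riscv_vfncvt_f_f_w_f16m1 (x, vl); /* vfncvt.f.f.w */
}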
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): Add vfloat32mf2_t type to vfncvt.f.f.w operations.
(vfloat32m1_t): Likewise.
(vfloat32m2_t): Likewise.
(vfloat32m4_t): Likewise.
(vfloat32m8_t): Likewise.
* config/riscv/riscv-vector-builtins.def: Fix typo in comments.
* config/riscv/vector-iterators.md: Add single to half machine
mode conversion.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: New test.
|
|
Move all optimization patterns into autovec-opt.md to make the organization
easier to maintain.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>not<mode>): Move to autovec-opt.md.
(*n<optab><mode>): Ditto.
* config/riscv/autovec.md (*<optab>not<mode>): Ditto.
(*n<optab><mode>): Ditto.
* config/riscv/vector.md: Ditto.
|
|
This patch fixes PR target/110083, an ICE-on-valid regression exposed by
my recent PTEST improvements (to address PR target/109973). The latent
bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update
or delete REG_EQUAL notes attached to COMPARE instructions. As a result
the operands of COMPARE would be mismatched, with the register transformed
to V1TImode, but the immediate operand left as const_wide_int, which is
valid for TImode but not V1TImode. This remained latent while the STV
conversion changed the mode of the COMPARE to CCmode, as later passes
recognized the REG_EQUAL note was obviously invalid since the modes didn't
match; but now that we (correctly) preserve CCZmode on the COMPARE, the
mismatched operand modes trigger a sanity-checking ICE downstream.
Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare.
Before:
(expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
(const_wide_int 0x80000000000000000000000000000000))
After:
(expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
(const_vector:V1TI [
(const_wide_int 0x80000000000000000000000000000000)
]))
2023-06-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110083
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Update or delete REG_EQUAL notes, converting CONST_INT and
CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.
gcc/testsuite/ChangeLog
PR target/110083
* gcc.target/i386/pr110083.c: New test case.
|
|
This patch would like to allow the mov and spill operation for the RVV
vfloat16*_t types. The involved machine mode includes VNx1HF, VNx2HF,
VNx4HF, VNx8HF, VNx16HF, VNx32HF and VNx64HF.
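A trivial sketch of code that exercises such a spill (hypothetical; the real
coverage is in the new mov-14.c and spill-13.c tests):
#include <riscv_vector.h>
void g (void);
vfloat16m1_t
spill_f16 (vfloat16m1_t x)
{
  g (); /* The call clobbers vector registers, forcing x to be spilled
           to and reloaded from the stack in a VNx*HF mode.  */
  return x;
}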
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add the float16 type to DEF_RVV_F_OPS.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/riscv.md: Add vfloat16*_t to attr mode.
* config/riscv/vector-iterators.md: Add vfloat16*_t machine mode
to V, V_WHOLE, V_FRACT, VINDEX, VM, VEL and sew.
* config/riscv/vector.md: Add vfloat16*_t machine mode to sew,
vlmul and ratio.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/mov-14.c: New test.
* gcc.target/riscv/rvv/base/spill-13.c: New test.
|
|
This patch fixes a CFI issue introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
Test code:
char my_getchar();
float getf();
int test_f0()
{
int s0 = my_getchar();
float f0 = getf();
int b = my_getchar();
return f0+s0+b;
}
cflags: -g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow
before patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32
...
addi sp,sp,16
.cfi_def_cfa_offset 0 // issue here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset -16 // issue here
.cfi_endproc
after patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32
...
addi sp,sp,16
.cfi_def_cfa_offset 16 // corrected here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset 0 // corrected here
.cfi_endproc
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Fix the CFI issue by
using the correct offset.
|
|
There are 2 small changes in this patch, but they do not affect the result.
1. Remove an unnecessary md pattern for TARGET_XTHEADCONDMOV in thead.md. operands[4]
of the "if_then_else" is always a comparison operation, so the generated RTL never
matches the pattern being deleted.
2. Change operands[4] from const0_rtx to operands[1] to maintain RTL consistency,
although, when outputting assembly, only the CODE of operands[4] affects the result.
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/thead.md (*th_cond_gpr_mov<GPR:mode><GPR2:mode>): Delete.
|
|
Notice there is a warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
|| INTVAL (op) == GET_MODE_MASK (SImode)"))))
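The fix is simply to use the unsigned accessor, making both sides of the
comparison unsigned:
(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
             || UINTVAL (op) == GET_MODE_MASK (SImode)")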
gcc/ChangeLog:
* config/riscv/predicates.md: Change INTVAL into UINTVAL.
|
|
This patch is to enhance vwmul.vv combine optimizations.
Consider this following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int16_t *__restrict dst4,
int8_t *__restrict a, int8_t *__restrict b,
int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] * (int16_t) b[i];
dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
In such a complicated case, the operands are not single-use; they are used by multiple statements.
GCC's combine optimization will iterate over combinations of the operands.
Also, we add another pattern of vwmulsu.vv to enhance the vwmulsu.vv optimization.
Currently, we have format:
(mult: (sign_extend) (zero_extend)) in vector.md for intrinsics calling.
Now, we add a new vwmulsu.vv with this format:
(mult: (zero_extend) (sign_extend))
to handle the following case (code mixing signed and unsigned widening multiplication):
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int16_t *__restrict dst4,
int8_t *__restrict a, uint8_t *__restrict b,
uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] * (int16_t) b[i];
dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
Before this patch:
...
vsext.vf2 v6,v1
add t0,a0,t4
vzext.vf2 v4,v1
vmul.vv v2,v4,v6
add t0,a1,t4
vzext.vf2 v2,v1
vmul.vv v4,v2,v4
add t0,a2,t4
vmul.vv v2,v2,v6
add t0,a3,t4
sub t6,t6,t1
vsext.vf2 v2,v1
vmul.vv v2,v2,v6
...
After this patch:
...
add t0,a0,t3
vwmulsu.vv v2,v1,v3
add t0,a1,t3
vwmulu.vv v4,v3,v2
add t0,a2,t3
vwmulsu.vv v3,v1,v2
add t0,a3,t3
sub t4,t4,t1
vwmul.vv v2,v1,v3
...
gcc/ChangeLog:
* config/riscv/vector.md: Add autovec-opt.md.
* config/riscv/autovec-opt.md: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
|
|
Add missing insn patterns for v2si -> v2hi/v2qi and v2hi-> v2qi vector
truncate.
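One way such a truncation can arise from C (a hypothetical trigger, not
necessarily the PR testcase):
typedef short v2hi __attribute__ ((vector_size (4)));
typedef int v2si __attribute__ ((vector_size (8)));
v2hi
trunc_v2si_v2hi (v2si x)
{
  return __builtin_convertvector (x, v2hi); /* v2si -> v2hi truncate */
}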
gcc/ChangeLog:
PR target/92658
* config/i386/mmx.md (truncv2hiv2qi2): New define_insn.
(truncv2si<mode>2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr92658-avx512bw-trunc-2.c: New test.
|
|
This bug was essentially that darwin_rs6000_special_round_type_align()
was ignoring externally-imposed capping of field alignment.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110044
gcc/ChangeLog:
* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.
|
|
The third argument for __builtin_altivec_tr_stxvrhx should be short *
not int *. Similarly, the third argument for __builtin_altivec_tr_stxvrwx
should be int * not short *. This patch fixes the arguments in the two
builtins.
A runnable test case is added to test the __builtin_altivec_tr_stxvrbx,
__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and
__builtin_altivec_tr_stxvrdx builtins.
gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.
gcc/testsuite/
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test
for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.
|
|
After reload, there may be sequences like
lreg = dreg
lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
dreg = dreg <op> const
lreg = dreg
instead, which is more efficient.
gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.
|
|
Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
Add _mu C++ overloaded intrinsics for load, viota and vid.
Co-authored-by: KuanLin Chen <best124612@gmail.com>
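Assuming the overloaded _mu spellings from the linked rvv-intrinsic-doc
issue/PR, usage would look roughly like this (a sketch, not taken from this
patch):
#include <riscv_vector.h>
vuint32m1_t
iota_mu (vbool32_t mask, vuint32m1_t maskedoff, size_t vl)
{
  return __riscv_viota_mu (mask, maskedoff, mask, vl); /* viota.m with mu */
}
vuint32m1_t
id_mu (vbool32_t mask, vuint32m1_t maskedoff, size_t vl)
{
  return __riscv_vid_mu (mask, maskedoff, vl); /* vid.v with mu */
}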
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.
|
|
This patch optimizes the following series vector:
[nunits - 1, nunits - 2, ...., 0]
Before this patch:
vid
vmul
vadd
After this patch:
vid
vrsub
This patch is an obvious and simple optimization, ok for trunk?
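For example, a reverse index vector such as the following (hypothetical
4-element sketch) is now emitted as vid.v plus a single vrsub, instead of
vid.v + vmul + vadd:
typedef int v4si __attribute__ ((vector_size (16)));
v4si
rev_index (void)
{
  return (v4si) { 3, 2, 1, 0 }; /* [nunits - 1, ..., 0] = (nunits - 1) - vid */
}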
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.
|
|
According to the doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
Add __RISCV_ prefix to VXRM and FRM enum.
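After this change the enumerators are spelled, for example (a sketch based on
the enum names registered for VXRM and FRM; treat the exact spellings as
assumptions):
#include <riscv_vector.h>
unsigned int vxrm = __RISCV_VXRM_RNU; /* previously plain RNU */
unsigned int frm = __RISCV_FRM_RNE;   /* previously plain RNE */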
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add
__RISCV_ prefix.
(DEF_RVV_FRM_ENUM): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/frm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-10.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-11.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-12.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-6.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-7.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-8.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
|
|
1. This patch optimizes the codegen of the following auto-vectorized code:
void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
{
for (int i = 0; i < n; i++)
c[i] = (int64_t)a[i] + b[i];
}
Combine instruction from:
...
vsext.vf2
vadd.vv
...
into:
...
vwadd.wv
...
Since for the PLUS operation, GCC prefers the following RTL operand order when combining:
(plus: (sign_extend:..)
(reg:))
instead of
(plus: (reg:..)
(sign_extend:))
which is different from the MINUS pattern.
I split patterns of vwadd/vwsub, and add dedicated patterns for them.
2. This patch not only optimizes the case mentioned in (1) above, but also enhances the
vwadd.vv/vwsub.vv optimization for more complicated PLUS/MINUS code. Consider the following code:
__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int8_t *__restrict a,
int8_t *__restrict b, int8_t *__restrict a2,
int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] + (int16_t) b[i];
dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
}
}
Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v2,0(a3)
vle8.v v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v3,v2
vsext.vf2 v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v1,v4
vadd.vv v2,v1,v2
...
After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v3,0(a4)
vle8.v v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...
The reason current upstream GCC cannot thoroughly optimize such code using vwadd is that
the combine pass needs an intermediate RTL IR (the pattern that extends only one of the
operands, i.e. vwadd.wv); based on that intermediate IR, it can then extend the other
operand to generate vwadd.vv.
So vwadd.wv/vwsub.wv definitely helps the vwadd.vv/vwsub.vv code optimizations.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Change the vwadd.wv/vwsub.wv
intrinsic API expander.
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
(@pred_single_widen_sub<any_extend:su><mode>): New pattern.
(@pred_single_widen_add<any_extend:su><mode>): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.
|
|
This patch supports vector permutation for VLS only, via the vec_perm pattern.
We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.
Fixed following comments from Robin.
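A VLS permutation that should now go through the new vec_perm pattern (a
hypothetical example in the style of the perm-*.c tests):
typedef int v4si __attribute__ ((vector_size (16)));
v4si
perm (v4si a, v4si b)
{
  /* Two-source shuffle: a vrgather on one source, then a masked
     gather (mu) merging in the elements from the other.  */
  return __builtin_shuffle (a, b, (v4si) { 0, 5, 2, 7 });
}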
gcc/ChangeLog:
* config/riscv/autovec.md (vec_perm<mode>): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.
|
|
More optimized than the default RTL generation.
gcc/ChangeLog:
* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruction
idioms described in the Xtensa ISA reference manual (p. 600).
|
|
This is my proposed minimal fix for PR target/109973 (hopefully suitable
for backporting) that follows Jakub Jelinek's suggestion that we introduce
CCZmode and CCCmode variants of ptest and vptest, so that the i386
backend treats [v]ptest instructions similarly to testl instructions;
using different CCmodes to indicate which condition flags are desired,
and then relying on the RTL cmpelim pass to eliminate redundant tests.
This conveniently matches Intel's intrinsics, which provide different
functions for retrieving different flags: _mm_testz_si128 tests the
Z flag, _mm_testc_si128 the carry flag. Currently we use the
same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
optimization is only valid when the ptest/vptest instruction is used to
set/test the Z flag.
The downside, as predicted by Jakub, is that GCC's cmpelim pass is
currently COMPARE-centric and not able to merge the ptests from expressions
such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
known issue, PR target/80040.
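The PR target/80040 situation mentioned above is essentially (shown here with
the 128-bit variants for brevity):
#include <immintrin.h>
int
both_flags (__m128i a, __m128i b)
{
  /* One ptest could set both Z and C, but cmpelim cannot yet merge them.  */
  return _mm_testz_si128 (a, b) + _mm_testc_si128 (a, b);
}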
2023-06-01 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/109973
* config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
CODE_FOR_sse4_1_ptestzv2di.
(__builtin_ia32_ptestc128): Use new CODE_FOR_sse4_1_ptestcv2di.
(__builtin_ia32_ptestz256): Use new CODE_FOR_avx_ptestzv4di.
(__builtin_ia32_ptestc256): Use new CODE_FOR_avx_ptestcv4di.
* config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
when expanding UNSPEC_PTEST to compare against zero.
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
(general_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
(timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
* config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
* config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
check for suitable matching modes for the UNSPEC_PTEST pattern.
* config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
(*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn. Remove
":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
(<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
(<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
(<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
current behavior.
(*ptest<mode>_and): Specify CCZ to only perform this optimization
when only the Z flag is required.
gcc/testsuite/ChangeLog
PR target/109973
* gcc.target/i386/pr109973-1.c: New test case.
* gcc.target/i386/pr109973-2.c: Likewise.
|
|
We can use the X registers to load and store 64-bit vector modes; we just need to add the alternatives
to the mov patterns. This straightforward patch does that, and for the pair variants too.
For the testcase in the code we now generate the optimal assembly without any superfluous
GP<->SIMD moves.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Add =r,m and =m,r alternatives.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/xreg-vec-modes_1.c: New test.
|
|
This patch would like to introduce the built-in types vfloat16m{f}*_t, as
well as their machine modes VNx*HF. They depend on the zvfhmin or zvfh
architecture extensions.
When given zvfhmin or zvfh, the macro TARGET_VECTOR_ELEN_FP_16 will
be true.
A follow-up patch will implement the zvfhmin extension based on this.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
and zvfh.
* config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
(main): Disable FP16 tuple.
* config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
(TARGET_VECTOR_ELEN_FP_16): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add FP16.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16):
New macro.
* config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16
machine mode based on TARGET_VECTOR_ELEN_FP_16.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (register_frm): New function.
(DEF_RVV_FRM_ENUM): New macro.
(handle_pragma_vector): Add FRM enum.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_FRM_ENUM): New macro.
(RNE): Ditto.
(RTZ): Ditto.
(RDN): Ditto.
(RUP): Ditto.
(RMM): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/frm-1.c: New test.
|
|
This straightforward patch annotates the dotproduct instructions, including the i8mm ones.
Tests included.
Nothing unexpected here.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Rename to...
(<sur>dot_prod<vsi2qi><vczle><vczbe>): ... This.
(usdot_prod<vsi2qi>): Rename to...
(usdot_prod<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_lane<vsi2qi>): Rename to...
(aarch64_<sur>dot_lane<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_laneq<vsi2qi>): Rename to...
(aarch64_<sur>dot_laneq<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi>): Rename to...
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi><vczle><vczbe>):
... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_11.c: New test.
|
|
This patch goes through the various alphabet soup saturating multiplication patterns, including those in TARGET_RDMA
and annotates them with <vczle><vczbe>. Many other patterns are widening and always write the full 128-bit vectors
so this annotation doesn't apply to them. Nothing out of the ordinary in this patch.
Bootstrapped and tested on aarch64-none-linux and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_sq<r>dmulh<mode>): Rename to...
(aarch64_sq<r>dmulh<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_n<mode>): Rename to...
(aarch64_sq<r>dmulh_n<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_lane<mode>): Rename to...
(aarch64_sq<r>dmulh_lane<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_laneq<mode>): Rename to...
(aarch64_sq<r>dmulh_laneq<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode><vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for qdmulh, qrdmulh.
* gcc.target/aarch64/simd/pr99195_10.c: New test.
|
|
Based on the V1 patch, adding a comment:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv
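That is, after this change combine can turn code like the following into
vwadd.vv (a sketch in the style of the widen-*.c tests):
#include <stdint.h>
void
f (int16_t *__restrict d, int8_t *__restrict a, int8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = (int16_t) a[i] + (int16_t) b[i]; /* vsext.vf2 x2 + vadd.vv -> vwadd.vv */
}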
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><v_double_trunc><mode>2): Change
expand into define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add widen tests.
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.
|
|
Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884
vfwcvt doesn't depend on FRM. So remove FRM, preparing for mode-switching support.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884
vfwcvt.f.x<u>.v doesn't depend on FRM. So remove FRM, preparing for mode-switching support.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Apparently the vfncvt.rod rounding mode is statically encoded in the instruction, so we don't need FRM.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
pattern to match the new GIMPLE form.
With this patch, gcc.target/aarch64/rev16_2.c passes again.
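The rev16 idiom exercised by that test is along these lines (a sketch; the
exact test source may differ):
unsigned int
rev16 (unsigned int x)
{
  /* Swap the two bytes within each 16-bit half; matched by rev16.  */
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}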
2023-05-31 Christophe Lyon <christophe.lyon@linaro.org>
PR target/110039
gcc/
* config/aarch64/aarch64.md (aarch64_rev16si2_alt3): New
pattern.
|
|
If the output code for a define_insn just does a switch (which_alternative) with no other computation, we can almost always
replace it with more compact MD syntax for each alternative in a multi-alternative '@' block.
This patch cleans up some such patterns in the aarch64 backend, making them shorter and more concise.
No behavioural change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>): Rewrite
output template to avoid explicit switch on which_alternative.
(*aarch64_simd_mov<VQMOV:mode>): Likewise.
(and<mode>3): Likewise.
(ior<mode>3): Likewise.
* config/aarch64/aarch64.md (*mov<mode>_aarch64): Likewise.
|
|
The insn "*shlrd_reg" shifts two registers with a funnel shifter by the
third register to get a single word result:
reg0 = (reg1 SHIFT_OP0 reg3) BIT_JOIN_OP (reg2 SHIFT_OP1 (32 - reg3))
where the funnel left shift is SHIFT_OP0 := ASHIFT, SHIFT_OP1 := LSHIFTRT
and its right shift is SHIFT_OP0 := LSHIFTRT, SHIFT_OP1 := ASHIFT,
respectively. And also, BIT_JOIN_OP can be either PLUS or IOR in either
shift direction.
[(set (match_operand:SI 0 "register_operand" "=a")
(match_operator:SI 6 "xtensa_bit_join_operator"
[(match_operator:SI 4 "logical_shift_operator"
[(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 3 "register_operand" "r")])
(match_operator:SI 5 "logical_shift_operator"
[(match_operand:SI 2 "register_operand" "r")
(neg:SI (match_dup 3))])]))]
Although the RTL matching template can express it as above, there is no
way of indicating that the operator (operands[6]) that combines the two
individual shifts is commutative.
Thus, if multiple insn sequences matching the above pattern appear
adjacently, the combiner may accidentally mix them up and get partial
results.
This patch adds a new insn-and-split pattern, lacking until now, that
represents the bit-combining operation described above with both sides
swapped.
And also changes the other "*shlrd" variants from previously describing
the arbitrariness of bit-combining operations with code iterators to a
combination of the match_operator and the predicate above.
gcc/ChangeLog:
* config/xtensa/predicates.md (xtensa_bit_join_operator):
New predicate.
* config/xtensa/xtensa.md (ior_op): Remove.
(*shlrd_reg): Rename from "*shlrd_reg_<code>", and add the
insn_and_split pattern of the same name to express and capture
the bit-combining operation with both sides swapped.
In addition, replace use of code iterator with new operator
predicate.
(*shlrd_const, *shlrd_per_byte):
Likewise regarding the code iterator.
|
|
This patch would like to add a new sub-extension (aka ZVFH) to the -march= option.
To keep it simple, only the sub-extension itself is involved in this patch, and
the underlying FP16-related RVV intrinsic API depends on TARGET_ZVFH.
The Zvfh extension depends on the Zve32f and Zfhmin extensions. You can find
more information about ZVFH in the spec doc below.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfh item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZVFH): New macro.
(TARGET_ZVFH): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-21.c: New test.
* gcc.target/riscv/predef-27.c: New test.
|
|
gcc/ChangeLog:
PR target/110041
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Fix misleading indentation.
|
|
As we can always broadcast an integer constant to a vector register,
allow them in riscv_const_insns. We need as many instructions as
it takes to generate the constant, plus one vmv.v.x.
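A simple loop that benefits (hypothetical example; the constant below needs
lui + addi to materialize, plus a single vmv.v.x to broadcast):
#include <stdint.h>
void
f (int32_t *d, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = 0x12345;
}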
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Allow
const_vec_duplicates.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: Add vmv.v.x
tests.
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Ditto.
|
|
This patch converts the patterns for the integer widen and pairwise-add instructions
to standard RTL operations. The pairwise addition within a vector can be represented
as an addition of two vec_selects, one selecting the even elements, and one selecting odd.
Thus for the intrinsic vpaddlq_s8 we can generate:
(set (reg:V8HI 92)
(plus:V8HI (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 0 [0])
(const_int 2 [0x2])
(const_int 4 [0x4])
(const_int 6 [0x6])
(const_int 8 [0x8])
(const_int 10 [0xa])
(const_int 12 [0xc])
(const_int 14 [0xe])
]))
(vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 1 [0x1])
(const_int 3 [0x3])
(const_int 5 [0x5])
(const_int 7 [0x7])
(const_int 9 [0x9])
(const_int 11 [0xb])
(const_int 13 [0xd])
(const_int 15 [0xf])
]))))
Similarly for the accumulating forms where there's an extra outer PLUS for the accumulation.
We already have the handy helper functions aarch64_stepped_int_parallel_p and
aarch64_gen_stepped_int_parallel defined in aarch64.cc that we can make use of to define
the right predicate for the VEC_SELECT PARALLEL.
This patch allows us to remove some code iterators and the UNSPEC definitions for SADDLP and UADDLP.
UNSPEC_UADALP and UNSPEC_SADALP are retained because they are used by SVE2 patterns still.
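For reference, the intrinsic mentioned above has the standard ACLE signature:
#include <arm_neon.h>
int16x8_t
paddl (int8x16_t a)
{
  return vpaddlq_s8 (a); /* saddlp: pairwise widening add */
}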
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>): Delete.
(aarch64_<su>adalp<mode>): New define_expand.
(*aarch64_<su>adalp<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_<su>addlp<mode>): Convert to define_expand.
(*aarch64_<su>addlp<mode><vczle><vczbe>_insn): New define_insn.
* config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
(ADALP): Likewise.
(USADDLP): Likewise.
* config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half): Define.
|
|
This patch reimplements the MD patterns for the UHADD,SHADD,UHSUB,SHSUB,URHADD,SRHADD instructions using
standard RTL operations rather than unspecs. The correct RTL representation involves widening
the inputs before adding them and halving, followed by a truncation back to the original mode.
An unfortunate wart in the patch is that we end up having very similar expanders for the intrinsics
through the aarch64_<su>h<ADDSUB:optab><mode> and aarch64_<su>rhadd<mode> names and the standard names
for the vector averaging optabs <su>avg<mode>3_floor and <su>avg<mode>3_ceil.
I'd like to reuse <su>avg<mode>3_ceil for the intrinsics builtin as well but our scheme
in aarch64-simd-builtins.def and aarch64-builtins.cc makes it awkward by only allowing mappings
of entries in aarch64-simd-builtins.def to:
0 - CODE_FOR_aarch64_<name><mode>
1-9 - CODE_FOR_<name><mode><1-9>
10 - CODE_FOR_<name><mode>
whereas here we want a string after the <mode> i.e. CODE_FOR_uavg<mode>3_ceil.
This patch adds a bit of remapping logic in aarch64-builtins.cc before the construction of the
builtin info that remaps the CODE_FOR_* definitions in aarch64-simd-builtins.def to the
optab-derived ones. CODE_FOR_aarch64_srhaddv4si gets remapped to CODE_FOR_avgv4si3_ceil, for example.
It's a bit specific to this case, but this solution requires the least invasive changes while avoiding
having duplicate expanders just for the sake of a different pattern name.
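For reference, the halving-add semantics being modelled, via the standard
ACLE intrinsic (per-lane semantics in the comment):
#include <arm_neon.h>
int32x4_t
hadd (int32x4_t a, int32x4_t b)
{
  /* Per lane: (int32_t) (((int64_t) a[i] + (int64_t) b[i]) >> 1), i.e. shadd;
     the rhadd/_ceil forms add 1 before shifting.  */
  return vhaddq_s32 (a, b);
}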
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VAR1): Move to after inclusion of
aarch64-builtin-iterators.h. Add definition to remap shadd, uhadd,
srhadd, urhadd builtin codes for standard optab ones.
* config/aarch64/aarch64-simd.md (<u>avg<mode>3_floor): Rename to...
(<su_optab>avg<mode>3_floor): ... This. Expand to RTL codes rather than
unspec.
(<u>avg<mode>3_ceil): Rename to...
(<su_optab>avg<mode>3_ceil): ... This. Expand to RTL codes rather than
unspec.
(aarch64_<su>hsub<mode>): New define_expand.
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): Split into...
(*aarch64_<su>h<ADDSUB:optab><mode><vczle><vczbe>_insn): ... This...
(*aarch64_<su>rhadd<mode><vczle><vczbe>_insn): ... And this.
|
|
gcc/
PR target/110036
* config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
match libsanitizer.
|
|
This patch expresses the intrinsics for the SRA and RSRA instructions with
standard RTL codes rather than relying on UNSPECs.
These instructions perform a vector shift right plus accumulate with an
optional rounding constant addition for the RSRA variant.
There are a number of interesting points:
* The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
kinds of trouble, as it started matching the existing define_insns for
"add x0, x0, asr #N" instructions, and adding the SRA form as an extra
alternative required a significant amount of deduplication of iterators, and
things still didn't work out well. I decided not to tackle that case in
this patch. It can be attempted later.
* For the RSRA variants that add a rounding constant (1 << (shift-1)) the
addition is notionally performed in a wider mode than the input types so that
overflow is handled properly (see the sketch after this list). In RTL this can
be represented with an appropriate extend operation followed by a truncate back
to the original modes.
However for 128-bit input modes such as V4SI we don't have appropriate modes
defined for this widening i.e. we'd need a V4DI mode to represent the
intermediate widened result. This patch defines such modes for
V16HI,V8SI,V4DI,V2TI. These will come in handy in the future too, as we have
more Advanced SIMD instructions that have similar intermediate widening
semantics.
* The above new modes led to a problem with stor-layout.cc. The new modes only
exist for the sake of the RTL optimisers understanding the semantics of the
instruction but are not intended to be moved to and from registers or memory,
assigned to types, used as TYPE_MODE or to participate in auto-vectorisation.
This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
for these new modes. However, the code in stor-layout.cc:<mode_for_vector>
explicitly doesn't check this when picking a TYPE_MODE due to modes being made
potentially available later through target switching (PR38240).
This led to these modes being picked as TYPE_MODE for declarations such as:
typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit
fixed-length SVE modes are available and vector_type_mode later struggling
to rectify this.
This issue is addressed with the new target hook
TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
vector mode can be used in any legal target attribute configuration of the
port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
only the initial target configuration. This allows a simple adjustment in
stor-layout.cc that still disqualifies these limited modes early on while
allowing consideration of modes that can be turned on in the future with
target attributes.
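For reference, an RSRA intrinsic whose rounding addition needs the widened
intermediate (standard ACLE signature; per-lane semantics in the comment):
#include <arm_neon.h>
int32x4_t
rsra (int32x4_t acc, int32x4_t x)
{
  /* Per lane: acc[i] + (int32_t) (((int64_t) x[i] + (1 << 2)) >> 3),
     with the addition notionally performed in the wider (V4DI) mode.  */
  return vrsraq_n_s32 (acc, x, 3);
}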
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes.
* config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p):
Declare prototype.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_sra<mode>): Rename to...
(aarch64_<sra_op>sra_n<mode>_insn): ... This.
(aarch64_<sra_op>rsra_n<mode>_insn): New define_insn.
(aarch64_<sra_op>sra_n<mode>): New define_expand.
(aarch64_<sra_op>rsra_n<mode>): Likewise.
(aarch64_<sur>sra_n<mode>): Rename to...
(aarch64_<sur>sra_ndi): ... This.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add
any_target_p argument.
(aarch64_extract_vec_duplicate_wide_int): Define.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
(aarch64_const_vec_rnd_cst_p): Likewise.
(aarch64_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete.
(VSRA): Adjust for the above.
(sur): Likewise.
(V2XWIDE): New mode_attr.
(vec_or_offset): Likewise.
(SHIFTEXTEND): Likewise.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New
predicate.
* doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document.
* doc/tm.texi.in: Regenerate.
* stor-layout.cc (mode_for_vector): Check
vector_mode_supported_any_target_p when iterating through vector modes.
* target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.
|
|
Even though we can't support floating-point operations which depend
on FRM yet (for example, vfadd support is blocked, since the RVV
intrinsic doc is not updated and we can't support mode switching for this),
we can support floating-point to integer conversion now, since it does not
depend on FRM and we don't need mode-switching support for it ('rtz'
conversions are independent of FRM).
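A loop that can now be auto-vectorized (a sketch in the style of the
vfcvt_rtz tests):
void
f (int *__restrict d, float *__restrict s, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = (int) s[i]; /* truncation -> vfcvt.rtz.x.f.v, independent of FRM */
}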
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode><vconvert>2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.
|
|
Notice there are warnings in riscv.md:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md: In function ‘rtx_def* gen_anddi3(rtx, rtx, rtx)’:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
Add an unsigned conversion to fix these warnings.
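Presumably the fixed code reads as follows (the commit only says to add an
unsigned conversion, so the exact spelling is an assumption):
if (UINTVAL (operands[2]) == GET_MODE_MASK (HImode))
  ...
else if (UINTVAL (operands[2]) == GET_MODE_MASK (SImode))
  ...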
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv.md: Fix signed and unsigned comparison
warning.
|
|
Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
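An FNMA loop in C looks like this (a sketch in the style of the ternop tests):
#include <stdint.h>
void
f (int32_t *__restrict d, int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] -= a[i] * b[i]; /* d = -(a * b) + d: vnmsac.vv or vnmsub.vv */
}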
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (fnma<mode>4): New pattern.
(*fnma<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.
|
|
This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV is enabled.
Here is an example from the existing testcases.
Testcase:
int ConEmv_imm_imm_reg(int x, int y){
if (x == 1000) return 10;
return y;
}
Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d
before patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,zero,a5
th.mveqz a1,zero,a5
or a0,a0,a1
ret
after patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,a1,a5
ret
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_conditional_move_onesided):
Delete.
(riscv_expand_conditional_move): Reuse the TARGET_SFB_ALU expand
process for TARGET_XTHEADCONDMOV.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
|
|
gcc/ChangeLog:
PR target/110021
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require
TARGET_AVX512BW to generate truncv16hiv16qi2.
|
|
In cases where the target supports extension instructions,
it is preferable to use them instead of achieving the same result in other ways.
For the following case
void foo (unsigned long a, unsigned long* ptr) {
ptr[0] = a & 0xffffffffUL;
ptr[1] &= 0xffffffffUL;
}
GCC generates
foo:
li a5,-1
srli a5,a5,32
and a0,a0,a5
sd a0,0(a1)
ld a4,8(a1)
and a5,a4,a5
sd a5,8(a1)
ret
but it is more profitable to generate this one:
foo:
zext.w a0,a0
sd a0,0(a1)
lwu a5,8(a1)
sd a5,8(a1)
ret
This patch fixes the mentioned issue.
It supports HI -> DI, HI -> SI and SI -> DI extensions.
gcc/ChangeLog:
* config/riscv/riscv.md (and<mode>3): New expander.
(*and<mode>3): New pattern.
* config/riscv/predicates.md (arith_operand_or_mode_mask): New
predicate.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/and-extend-1.c: New test.
* gcc.target/riscv/and-extend-2.c: New test.
|
|
This patch would like to remove unnecessary comments on some self-explanatory
parameters and use better names to avoid being misleading.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn): Remove unnecessary
comments and rename local variables.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(emit_scalar_move_insn): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to remove the magic numbers in riscv-v.cc and
align them to one macro.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn): Eliminate the
magic number.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_series): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to optimize VLS vector initialization with a repeating
sequence, going from vslide1down to vmerge under a simple cost model, i.e.
every instruction has a cost of 1.
Given this code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax:
typedef int64_t vnx32di __attribute__ ((vector_size (256)));
__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
vnx32di v = {
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
};
*(vnx32di *) out = v;
}
Before this patch:
vslide1down.vx (x31 times)
After this patch:
li a5,-1431654400
addi a5,a5,-1365
li a3,-1431654400
addi a3,a3,-1366
slli a5,a5,32
add a5,a5,a3
vsetvli a4,zero,e64,m8,ta,ma
vmv.v.x v8,a0
vmv.s.x v0,a5
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a2)
Since we don't have SEW = 128 in vec_duplicate, we can't combine a and b into an
SEW = 128 element and then broadcast this big element.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced
class member.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): New function
to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the merge
mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get the dup
machine mode.
(expand_vector_init_merge_repeating_sequence): New function to perform
the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Fix bug reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974
PR target/109974
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (source_equal_p): Fix ICE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr109974.c: New test.
|
|
This patch supports the FMA auto-vectorization pattern, letting RA decide
between vmacc and vmadd.
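An FMA loop in C looks like this (a sketch in the style of the ternop tests):
#include <stdint.h>
void
f (int32_t *__restrict d, int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] += a[i] * b[i]; /* d = a * b + d: RA picks vmacc.vv or vmadd.vv */
}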
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (fma<mode>4): New pattern.
(*fma<mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_vlmax_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add ternary tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.
|