Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds a vectorized rawmemchr expander. It also moves the
vectorized expand_block_move to riscv-string.cc.
gcc/ChangeLog:
* config/riscv/autovec.md (rawmemchr<ANYI:mode>): New expander.
* config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx):
Define.
(expand_rawmemchr): Define.
* config/riscv/riscv-v.cc (force_vector_length_operand): Remove
static.
(expand_block_move): Move from here...
* config/riscv/riscv-string.cc (expand_block_move): ...to here.
(expand_rawmemchr): Add vectorized expander.
* internal-fn.cc (expand_RAWMEMCHR): Fix typo.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-prof/peel-2.c: Add
-fno-tree-loop-distribute-patterns.
* gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv.
* gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add builtin directory.
* gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
|
|
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
Consider a simple vector addition operation:
https://godbolt.org/z/7hfGfEjW3
void
foo (int *__restrict a,
int *__restrict b,
int *__restrict n)
{
for (int i = 0; i < n; i++)
a[i] = a[i] + b[i];
}
Optimized IR:
Loop body:
_38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma
...
vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0)
vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1)
vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
.MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
ARM SVE:
.L3:
ld1w z30.s, p7/z, [x0, x3, lsl 2] -> predicated load
ld1w z31.s, p7/z, [x1, x3, lsl 2] -> predicated load
add z31.s, z31.s, z30.s -> un-predicated add
st1w z31.s, p7, [x0, x3, lsl 2] -> predicated store
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
The reasons as follows:
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things, I don't think we should fuse them into same PASS.
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)
Here is an example to demonstrate more:
https://godbolt.org/z/bE86sv3q5
void foo2 (int *__restrict a,
int *__restrict b,
int *__restrict c,
int *__restrict a2,
int *__restrict b2,
int *__restrict c2,
int *__restrict a3,
int *__restrict b3,
int *__restrict c3,
int *__restrict a4,
int *__restrict b4,
int *__restrict c4,
int *__restrict a5,
int *__restrict b5,
int *__restrict c5,
int n)
{
for (int i = 0; i < n; i++){
a[i] = b[i] + c[i];
b5[i] = b[i] + c[i];
a2[i] = b2[i] + c2[i];
a3[i] = b3[i] + c3[i];
a4[i] = b4[i] + c4[i];
a5[i] = a[i] + a4[i];
a[i] = a5[i] + b5[i]+ a[i];
a[i] = a[i] + c[i];
b5[i] = a[i] + c[i];
a2[i] = a[i] + c2[i];
a3[i] = a[i] + c3[i];
a4[i] = a[i] + c4[i];
a5[i] = a[i] + a4[i];
a[i] = a[i] + b5[i]+ a[i];
}
}
1. Loop Body:
Before this patch: After this patch:
vsetvli a4,t1,e8,mf4,ta,ma vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2) vle32.v v2,0(a2)
vle32.v v4,0(a1) vle32.v v3,0(t2)
vle32.v v1,0(t2) vle32.v v4,0(a1)
vsetvli a7,zero,e32,m1,ta,ma vle32.v v1,0(t0)
vadd.vv v4,v2,v4 vadd.vv v4,v2,v4
vsetvli zero,a4,e32,m1,ta,ma vadd.vv v1,v3,v1
vle32.v v3,0(s0) vadd.vv v1,v1,v4
vsetvli a7,zero,e32,m1,ta,ma vadd.vv v1,v1,v4
vadd.vv v1,v3,v1 vadd.vv v1,v1,v4
vadd.vv v1,v1,v4 vadd.vv v1,v1,v2
vadd.vv v1,v1,v4 vadd.vv v2,v1,v2
vadd.vv v1,v1,v4 vse32.v v2,0(t5)
vsetvli zero,a4,e32,m1,ta,ma vadd.vv v2,v2,v1
vle32.v v4,0(a5) vadd.vv v2,v2,v1
vsetvli a7,zero,e32,m1,ta,ma slli a7,a4,2
vadd.vv v1,v1,v2 vadd.vv v3,v1,v3
vadd.vv v2,v1,v2 vle32.v v5,0(a5)
vadd.vv v4,v1,v4 vle32.v v6,0(t6)
vsetvli zero,a4,e32,m1,ta,ma vse32.v v3,0(t3)
vse32.v v2,0(t5) vse32.v v2,0(a0)
vse32.v v4,0(a3) vadd.vv v3,v3,v1
vsetvli a7,zero,e32,m1,ta,ma vadd.vv v2,v1,v5
vadd.vv v3,v1,v3 vse32.v v3,0(t4)
vadd.vv v2,v2,v1 vadd.vv v1,v1,v6
vadd.vv v2,v2,v1 vse32.v v2,0(a3)
vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a6)
vse32.v v2,0(a0)
vse32.v v3,0(t3)
vle32.v v2,0(t0)
vsetvli a7,zero,e32,m1,ta,ma
vadd.vv v3,v3,v1
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v3,0(t4)
vsetvli a7,zero,e32,m1,ta,ma
slli a7,a4,2
vadd.vv v1,v1,v2
sub t1,t1,a4
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v1,0(a6)
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
2. Epilogue:
Before this patch: After this patch:
.L5: .L5:
ld s0,8(sp) ret
addi sp,sp,16
jr ra
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
The final codegen after this patch:
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
PR target/111318
PR target/111888
gcc/ChangeLog:
* config.gcc: Add AVL propagation pass.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-avlprop.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
* gcc.target/riscv/rvv/autovec/pr111888.c: New test.
Tested-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Address kito's comments of AVL propagation patch.
Export the functions that are not only used by VSETVL PASS but also AVL propagation PASS.
No functionality change.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (has_vl_op): Export from riscv-vsetvl to riscv-v
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-v.cc (has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(count_regno_occurrences): Ditto.
(validate_change_or_fail): Ditto.
|
|
Address kito's comments of AVL propagation patch.
Change avl_type into avl_type_idx.
No functionality change.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (vlmax_avl_type_p): New function.
* config/riscv/riscv-v.cc (vlmax_avl_type_p): Ditto.
* config/riscv/riscv-vsetvl.cc (get_avl): Adapt function.
* config/riscv/vector.md: Change avl_type into avl_type_idx.
|
|
I didn't manage to get back to the generic vectorizer fallback for
popcount so I figured I'd rather create a popcount fallback in the
riscv backend. It uses the WWG algorithm from libgcc.
gcc/ChangeLog:
* config/riscv/autovec.md (popcount<mode>2): New expander.
* config/riscv/riscv-protos.h (expand_popcount): Define.
* config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount
with the WWG algorithm.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.
|
|
For math function autovec, there will be one step like
rtx tmp = gen_reg_rtx (vec_int_mode);
emit_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);
The MU will leave the tmp (aka dest register) register unmasked elements
unchanged and it is undefined here. This patch would like to adjust the
MU to MA.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): Add new type
values.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): Add undef merge
operand handling.
(expand_vec_ceil): Take MA instead of MU for tmp register.
(expand_vec_floor): Ditto.
(expand_vec_nearbyint): Ditto.
(expand_vec_rint): Ditto.
(expand_vec_round): Ditto.
(expand_vec_roundeven): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Given we have code like below:
typedef char vnx16i __attribute__ ((vector_size (16)));
vnx16i __attribute__ ((noinline, noclone))
test (vnx16i x, vnx16i y)
{
return __builtin_shufflevector (x, y, 11, 12, 13, 14, 11, 12, 13, 14,
11, 12, 13, 14, 11, 12, 13, 14);
}
It can perform the auto vectorization when
-march=rv64gcv_zvl1024b --param=riscv-autovec-preference=fixed-vlmax
but cannot when
-march=rv64gcv_zvl2048b --param=riscv-autovec-preference=fixed-vlmax
The reason comes from the miniaml machine mode of QI is RVVMF8QI, which
is 1024 / 8 = 128 bits, aka the size of VNx16QI. When we set zvl2048b,
the bit size of RVVMFQI is 2048 / 8 = 256, which is not matching the
bit size of VNx16QI (128 bits).
Thus, this patch would like to enable the VLS mode for such case, aka
VNx16QI vls mode for zvl2048b.
Before this patch:
test:
srli a4,a1,40
andi a4,a4,0xff
srli a3,a1,32
srli a5,a1,48
slli a0,a4,8
andi a3,a3,0xff
andi a5,a5,0xff
slli a2,a5,16
or a0,a3,a0
srli a1,a1,56
or a0,a0,a2
slli a2,a1,24
slli a3,a3,32
or a0,a0,a2
slli a4,a4,40
or a0,a0,a3
slli a5,a5,48
or a0,a0,a4
or a0,a0,a5
slli a1,a1,56
or a0,a0,a1
mv a1,a0
ret
After this patch:
test:
vsetivli zero,16,e8,mf8,ta,ma
vle8.v v2,0(a1)
vsetivli zero,4,e32,mf2,ta,ma
vrgather.vi v1,v2,3
vsetivli zero,16,e8,mf8,ta,ma
vse8.v v1,0(a0)
ret
PR target/111857
gcc/ChangeLog:
* config/riscv/riscv-opts.h (TARGET_VECTOR_VLS): Remove.
* config/riscv/riscv-protos.h (vls_mode_valid_p): New func decl.
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Replace
macro reference to func.
(vls_mode_valid_p): New func impl for vls mode valid or not.
* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Replace
macro reference to func.
* config/riscv/vector-iterators.md: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/vls/def.h: Add help define.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-6.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This just moves a few functions out of riscv.cc into riscv-string.cc in an
attempt to keep riscv.cc manageable. This was originally Christoph's code and
I'm just pushing it on his behalf.
Full disclosure: I built rv64gc after changing to verify everything still
builds. Given it was just lifting code from one place to another, I didn't run
the testsuite.
gcc/
* config/riscv/riscv-protos.h (emit_block_move): Remove redundant
prototype. Improve comment.
* config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc
into riscv-string.cc.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved
function.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
|
|
This patch would like to support the FP lfloor/lfloorf auto vectorization.
* long lfloor (double) for rv64
* long lfloorf (float) for rv32
Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lfloormn2 only act on DF => DI for
rv64, and SF => SI for rv32.
Given we have code like:
void
test_lfloor (long *out, double *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lfloor (in[i]);
}
Before this patch:
.L3:
...
fld fa5,0(a1)
fcvt.l.d a5,fa5,rdn
sd a5,-8(a0)
...
bne a1,a4,.L3
After this patch:
frrm a6
...
fsrmi 2 // RDN
.L3:
...
vsetvli a3,zero,e64,m1,ta,ma
vfcvt.x.f.v v1,v1
vsetvli zero,a2,e64,m1,ta,ma
vse32.v v1,0(a0)
...
bne a2,zero,.L3
...
fsrm a6
The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
gcc/ChangeLog:
* config/riscv/autovec.md (lfloor<mode><v_i_l_ll_convert>2): New
pattern for lfloor/lfloorf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lfloor): New func decl for expanding lfloor.
* config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl
for expanding lfloor.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the FP lceil/lceilf auto vectorization.
* long lceil (double) for rv64
* long lceilf (float) for rv32
Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lceilmn2 only act on DF => DI for
rv64, and SF => SI for rv32.
Given we have code like:
void
test_lceil (long *out, double *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lceil (in[i]);
}
Before this patch:
.L3:
...
fld fa5,0(a1)
fcvt.l.d a5,fa5,rup
sd a5,-8(a0)
...
bne a1,a4,.L3
After this patch:
frrm a6
...
fsrmi 3 // RUP
.L3:
...
vsetvli a3,zero,e64,m1,ta,ma
vfcvt.x.f.v v1,v1
vsetvli zero,a2,e64,m1,ta,ma
vse32.v v1,0(a0)
...
bne a2,zero,.L3
...
fsrm a6
The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
gcc/ChangeLog:
* config/riscv/autovec.md (lceil<mode><v_i_l_ll_convert>2): New
pattern] for lceil/lceilf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lceil): New func decl for expanding lceil.
* config/riscv/riscv-v.cc (expand_vec_lceil): New func impl
for expanding lceil.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support the FP lround/lroundf auto vectorization.
* long lround (double) for rv64
* long lroundf (float) for rv32
Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lroundmn2 only act on DF => DI for
rv64, and SF => SI for rv32.
Given we have code like:
void
test_lround (long *out, double *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lround (in[i]);
}
Before this patch:
.L3:
...
fld fa5,0(a1)
fcvt.l.d a5,fa5,rmm
sd a5,-8(a0)
...
bne a1,a4,.L3
After this patch:
frrm a6
...
fsrmi 4 // RMM
.L3:
...
vsetvli a3,zero,e64,m1,ta,ma
vfcvt.x.f.v v1,v1
vsetvli zero,a2,e64,m1,ta,ma
vse32.v v1,0(a0)
...
bne a2,zero,.L3
...
fsrm a6
The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
gcc/ChangeLog:
* config/riscv/autovec.md (lround<mode><v_i_l_ll_convert>2): New
pattern for lround/lroundf.
* config/riscv/riscv-protos.h (enum insn_type): New enum value.
(expand_vec_lround): New func decl for expanding lround.
* config/riscv/riscv-v.cc (expand_vec_lround): New func impl
for expanding lround.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
I suddenly discovered I made a mistake that was lucky un-exposed.
https://godbolt.org/z/c3jzrh7or
GCC is using 32 bit index offset:
vsll.vi v1,v1,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei32.v v1,(a1),v1
This is wrong since v1 may overflow 32bit after vsll.vi.
After this patch:
vsext.vf2 v8,v4
vsll.vi v8,v8,2
vluxei64.v v8,(a1),v8
Same as Clang.
Regression passed. Ok for trunk ?
gcc/ChangeLog:
* config/riscv/autovec.md: Fix index bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug.
(gather_scatter_valid_offset_mode_p): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.
|
|
This patch would like to support the FP lrint/lrintf auto vectorization.
* long lrint (double) for rv64
* long lrintf (float) for rv32
Due to the limitation that only the same size of data type are allowed
in the vectorier, the standard name lrintmn2 only act on DF => DI for
rv64, and SF => SI for rv32.
Given we have code like:
void
test_lrint (long *out, double *in, unsigned count)
{
for (unsigned i = 0; i < count; i++)
out[i] = __builtin_lrint (in[i]);
}
Before this patch:
.L3:
...
fld fa5,0(a1)
fcvt.l.d a5,fa5,dyn
sd a5,-8(a0)
...
bne a1,a4,.L3
After this patch:
.L3:
...
vsetvli a3,zero,e64,m1,ta,ma
vfcvt.x.f.v v1,v1
vsetvli zero,a2,e64,m1,ta,ma
vse32.v v1,0(a0)
...
bne a2,zero,.L3
The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered
by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
gcc/ChangeLog:
* config/riscv/autovec.md (lrint<mode><vlconvert>2): New pattern
for lrint/lintf.
* config/riscv/riscv-protos.h (expand_vec_lrint): New func decl
for expanding lint.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl
for vfcvt.x.f.v.
(expand_vec_lrint): New function impl for expanding lint.
* config/riscv/vector-iterators.md: New mode attr and iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for
CVT like test case.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
gcc/
* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
Declare.
* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
New function.
* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
Change to ..
(cpymem<P:mode>) .. this.
gcc/testsuite/
* gcc.target/riscv/rvv/base/cpymem-1.c: New test.
* gcc.target/riscv/rvv/base/cpymem-2.c: Likewise.
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
2023-09-29 Joern Rennecke <joern.rennecke@embecosm.com>
Juzhe-Zhong <juzhe.zhong@rivai.ai>
PR target/111566
gcc/
* config/riscv/riscv-protos.h (riscv_vector::legitimize_move):
Change second parameter to rtx *.
* config/riscv/riscv-v.cc (risv_vector::legitimize_move): Likewise.
* config/riscv/vector.md: Changed callers of
riscv_vector::legitimize_move.
(*mov<mode>_mem_to_mem): Remove.
gcc/testsuite/
* gcc.target/riscv/rvv/autovec/vls/mov-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/mov-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/mov-9.c: Ditto.1
* gcc.target/riscv/rvv/autovec/vls/mov-2.c: Removed.
* gcc.target/riscv/rvv/autovec/vls/mov-4.c: Removed.
* gcc.target/riscv/rvv/autovec/vls/mov-6.c: Removed.
* gcc.target/riscv/rvv/fortran/pr111566.f90: New test.
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
|
|
This patch would like to support auto-vectorization for the
roundeven API in math.h. It depends on the -ffast-math option.
When we would like to call roundeven like v2 = roundeven (v1), we will
convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RNE
* vfcvt.f.x v2, v3
However, the floating point value may not need the cvt as above if
its mantissa is zero. For example single precision floating point below.
+-----------+---------------+-----------------+
| raw float | binary layout | after roundeven |
+-----------+---------------+-----------------+
| 8388607.5 | 0x4affffff | 8388608.0 |
| 8388608.0 | 0x4b000000 | 8388608.0 |
| 8388609.0 | 0x4b000001 | 8388609.0 |
+-----------+---------------+-----------------+
All single floating point glte 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do the
cvt on mask.
Befor this patch:
math-roundeven-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call roundeven
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 0 // Rounding to nearest, ties to even
.L4:
vfabs.v v1,v2
vmflt.vf v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv v1,v1,v2
bne .L4
.L14:
fsrm a6
ret
Please note VLS mode is also involved in this patch and covered by the
test cases. We will add more run test with zfa support later.
gcc/ChangeLog:
* config/riscv/autovec.md (roundeven<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_roundeven): New func decl.
* config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support auto-vectorization for the
trunc API in math.h. It depends on the -ffast-math option.
When we would like to call trunc/truncf like v2 = trunc (v1),
we will convert it into below insns (reference the implementation of
llvm).
* vfcvt.rtz.x.f v3, v1
* vfcvt.f.x v2, v3
However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:
+------------+---------------+-----------------+
| raw float | binary layout | after trunc |
+------------+---------------+-----------------+
| -8388607.5 | 0xcaffffff | -8388607.0 |
| 8388607.5 | 0x4affffff | 8388607.0 |
| 8388608.0 | 0x4b000000 | 8388608.0 |
| 8388609.0 | 0x4b000001 | 8388609.0 |
+------------+---------------+-----------------+
All single floating point >= 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do
the cvt on mask.
Befor this patch:
math-trunc-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call trunc
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
vfabs.v v2,v1
vmflt.vf v0,v2,fa5
vfcvt.rtz.x.f.v v4,v1,v0.t
vfcvt.f.x.v v2,v4,v0.t
vfsgnj.vv v2,v2,v1
bne .L4
Please note VLS mode is also involved in this patch and covered by the
test cases.
gcc/ChangeLog:
* config/riscv/autovec.md (btrunc<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_trunc): New func decl.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): New func impl.
(expand_vec_trunc): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support auto-vectorization for the
round API in math.h. It depends on the -ffast-math option.
When we would like to call round/roundf like v2 = round (v1),
we will convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RMM
* vfcvt.f.x v2, v3
However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:
+------------+---------------+-----------------+
| raw float | binary layout | after round |
+------------+---------------+-----------------+
| -8388607.5 | 0xcaffffff | -8388608.0 |
| 8388607.5 | 0x4affffff | 8388608.0 |
| 8388608.0 | 0x4b000000 | 8388608.0 |
| 8388609.0 | 0x4b000001 | 8388609.0 |
+------------+---------------+-----------------+
All single floating point >= 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do
the cvt on mask.
Befor this patch:
math-round-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call round
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 4 // RMM, rounding to nearest, ties to max magnitude
.L4:
vfabs.v v2,v1
vmflt.vf v0,v2,fa5
vfcvt.x.f.v v4,v1,v0.t
vfcvt.f.x.v v2,v4,v0.t
vfsgnj.vv v2,v2,v1
bne .L4
.L14:
fsrm a6
ret
Please note VLS mode is also involved in this patch and covered by the
test cases.
gcc/ChangeLog:
* config/riscv/autovec.md (round<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_round): New function decl.
* config/riscv/riscv-v.cc (expand_vec_round): New function impl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-round-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-round-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support auto-vectorization for the
rint API in math.h. It depends on the -ffast-math option.
When we would like to call rint/rintf like v2 = rint (v1),
we will convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1
* vfcvt.f.x v2, v3
However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:
Assume we have RTZ rounding mode
+------------+---------------+-----------------+
| raw float | binary layout | after int |
+------------+---------------+-----------------+
| -8388607.5 | 0xcaffffff | -8388607.0 |
| 8388607.5 | 0x4affffff | 8388607.0 |
| 8388608.0 | 0x4b000000 | 8388608.0 |
| 8388609.0 | 0x4b000001 | 8388609.0 |
+------------+---------------+-----------------+
All single floating point >= 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do
the cvt on mask.
Befor this patch:
math-rint-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call rint
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
vfabs.v v2,v1
vmflt.vf v0,v2,fa5
vfcvt.x.f.v v4,v1,v0.t
vfcvt.f.x.v v2,v4,v0.t
vfsgnj.vv v2,v2,v1
Please note VLS mode is also involved in this patch and covered by the
test cases.
gcc/ChangeLog:
* config/riscv/autovec.md (rint<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_rint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_rint): New function impl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-rint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-rint-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support auto-vectorization for the
nearbyint API in math.h. It depends on the -ffast-math option.
When we would like to call nearbyint/nearbyintf like v2 = nearbyint (v1),
we will convert it into below insns (reference the implementation of llvm).
* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5
However, the floating point value may not need the cvt as above if
its mantissa is zero. Take single precision floating point as example:
Assume we have RTZ rounding mode
+------------+---------------+-----------------+
| raw float | binary layout | after nearbyint |
+------------+---------------+-----------------+
| 8388607.5 | 0x4affffff | 8388607.0 |
| 8388608.0 | 0x4b000000 | 8388608.0 |
| 8388609.0 | 0x4b000001 | 8388609.0 |
+------------+---------------+-----------------+
All single floating point >= 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do the
cvt on mask.
Befor this patch:
math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call nearbyint
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
vfabs.v v2,v1
vmflt.vf v0,v2,fa5
frflags a7
vfcvt.x.f.v v4,v1,v0.t
vfcvt.f.x.v v2,v4,v0.t
fsflags a7
vfsgnj.vv v2,v2,v1
Please note VLS mode is also involved in this patch and covered by the
test cases.
gcc/ChangeLog:
* config/riscv/autovec.md (nearbyint<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_nearbyint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_nearbyint): New func impl.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to support auto-vectorization for the
floor API in math.h. It depends on the -ffast-math option.
When we would like to call floor/floorf like v2 = floor (v1), we will
convert it into below insns (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
However, the floating point value may not need the cvt as above if
its mantissa is zero. For example single precision floating point below.
+-----------+---------------+-------------+
| raw float | binary layout | after floor |
+-----------+---------------+-------------+
| 8388607.5 | 0x4affffff | 8388607.0 |
| 8388608.0 | 0x4b000000 | 8388608.0 |
| 8388609.0 | 0x4b000001 | 8388609.0 |
+-----------+---------------+-------------+
All single floating point glte 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do the
cvt on mask.
Befor this patch:
math-floor-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call ceilf
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 2 // Rounding Down
.L4:
vfabs.v v1,v2
vmflt.vf v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv v1,v1,v2
bne .L4
.L14:
fsrm a6
ret
Please note VLS mode is also involved in this patch and covered by the
test cases.
gcc/ChangeLog:
* config/riscv/autovec.md (floor<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Regression passed.
Committed.
gcc/ChangeLog:
* config/riscv/autovec.md: Add VLS conditional patterns.
* config/riscv/riscv-protos.h (expand_cond_unop): Ditto.
(expand_cond_binop): Ditto.
(expand_cond_ternop): Ditto.
* config/riscv/riscv-v.cc (expand_cond_unop): Ditto.
(expand_cond_binop): Ditto.
(expand_cond_ternop): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS conditional tests.
* gcc.target/riscv/rvv/autovec/vls/cond_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_and-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ior-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mod-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_not-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_shift-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_shift-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_xor-1.c: New test.
|
|
This patch support combining cond extend and reduce_sum to cond widen reduce_sum
like combine the following three insns:
(set (reg:RVVM2HI 149)
(if_then_else:RVVM2HI
(unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
(set (reg:RVVM2HI 138)
(if_then_else:RVVM2HI
(reg:RVVMF8BI 135)
(reg:RVVM2HI 148)
(reg:RVVM2HI 149)))
(set (reg:HI 150)
(unspec:HI [
(reg:RVVM2HI 138)
] UNSPEC_REDUC_SUM))
into one insn:
(set (reg:SI 147)
(unspec:SI [
(if_then_else:RVVM2SI
(reg:RVVMF16BI 135)
(sign_extend:RVVM2SI (reg:RVVM1HI 136))
(if_then_else:RVVM2HI
(unspec:RVVMF8BI [
(const_vector:RVVMF8BI repeat [
(const_int 1 [0x1])
])
(reg:DI 146)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM2HI repeat [
(const_int 0 [0])
])
(unspec:RVVM2HI [
(reg:SI 0 zero)
] UNSPEC_VUNDEF)))
] UNSPEC_REDUC_SUM))
Consider the following C code:
int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
int16_t sum = 0;
for (int i = 0; i < 16; i += 1)
if (pred[i])
sum += a[i];
return sum;
}
assembly before this patch:
foo:
vsetivli zero,16,e16,m2,ta,ma
li a5,0
vmv.v.i v2,0
vsetvli zero,zero,e8,m1,ta,ma
vl1re8.v v0,0(a1)
vmsne.vi v0,v0,0
vsetvli zero,zero,e16,m2,ta,mu
vle8.v v4,0(a0),v0.t
vmv.s.x v1,a5
vsext.vf2 v2,v4,v0.t
vredsum.vs v2,v2,v1
vmv.x.s a0,v2
slliw a0,a0,16
sraiw a0,a0,16
ret
assembly after this patch:
foo:
li a5,0
vsetivli zero,16,e16,m1,ta,ma
vmv.s.x v3,a5
vsetivli zero,16,e8,m1,ta,ma
vl1re8.v v0,0(a1)
vmsne.vi v0,v0,0
vle8.v v2,0(a0),v0.t
vwredsum.vs v1,v2,v3,v0.t
vsetivli zero,0,e16,m1,ta,ma
vmv.x.s a0,v1
slliw a0,a0,16
sraiw a0,a0,16
ret
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_widen_reduc_plus_scal_<mode>):
New combine patterns.
* config/riscv/riscv-protos.h (enum insn_type): New insn_type.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-2.c: New test.
|
|
This patch split a VLS avl_type from the NONVLMAX avl_type, denoting
those RVV insn with length set to the number of units of VLS modes.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum avl_type): New VLS avl_type.
* config/riscv/riscv-v.cc (autovec_use_vlmax_p): Move comments.
|
|
Update in v4:
* Add test for _Float16.
* Remove unnecessary macro in def.h for test.
Original log:
This patch would like to support auto-vectorization for both the
ceil and ceilf of math.h. It depends on the -ffast-math option.
When we would like to call ceil/ceilf like v2 = ceil (v1), we will
convert it into below insn (reference the implementation of llvm).
* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3
However, the floating point value may not need the cvt as above if
its mantissa is zero. For example single precision floating point below.
+-----------+---------------+
| float | binary layout |
+-----------+---------------+
| 8388607.5 | 0x4affffff |
| 8388608.0 | 0x4b000000 |
| 8388609.0 | 0x4b000001 |
+-----------+---------------+
All single floating point great than 8388608.0 will have all zero mantisaa.
We leverage vmflt and mask to filter them out in vector and only do the
cvt on mask.
Befor this patch:
math-ceil-1.c:21:1: missed: couldn't vectorize loop
...
.L3:
flw fa0,0(s0)
addi s0,s0,4
addi s1,s1,4
call ceilf
fsw fa0,-4(s1)
bne s0,s2,.L3
After this patch:
...
fsrmi 3
.L4:
vfabs.v v0,v1
vmv1r.v v2,v1
vmflt.vv v0,v0,v4
sub a3,a3,a4
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv v2,v2,v1
bne .L4
.L14:
fsrm a6
ret
Please note VLS mode is also involved in this patch and covered by the
test cases.
gcc/ChangeLog:
* config/riscv/autovec.md (ceil<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector.md: Add VLS mode support.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
At present, FMA autovec's patterns do not fully use the corresponding pattern
in vector.md. The previous reason is that the merge operand of pattern in
vector.md cannot be VUNDEF. Now allowing it to be VUNDEF, reunify insn used for
reload pass into vector.md, and the corresponding vlmax pattern in autovec.md
is used for combine. This patch also refactors the corresponding combine
pattern inside autovec-opt.md and removes the unused ones.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_fma<mode>):
Removed old combine patterns.
(*single_<optab>mult_plus<mode>): Ditto.
(*double_<optab>mult_plus<mode>): Ditto.
(*sign_zero_extend_fma): Ditto.
(*zero_sign_extend_fma): Ditto.
(*double_widen_fma<mode>): Ditto.
(*single_widen_fma<mode>): Ditto.
(*double_widen_fnma<mode>): Ditto.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.
(*reduc_plus_scal_<mode>): Adjust name.
(*widen_reduc_plus_scal_<mode>): Adjust name.
(*dual_widen_fma<mode>): New combine pattern.
(*dual_widen_fmasu<mode>): Ditto.
(*dual_widen_fmaus<mode>): Ditto.
(*dual_fma<mode>): Ditto.
(*single_fma<mode>): Ditto.
(*dual_fnma<mode>): Ditto.
(*single_fnma<mode>): Ditto.
(*dual_fms<mode>): Ditto.
(*single_fms<mode>): Ditto.
(*dual_fnms<mode>): Ditto.
(*single_fnms<mode>): Ditto.
* config/riscv/autovec.md (fma<mode>4):
Reafctor fma pattern.
(*fma<VI:mode><P:mode>): Removed.
(fnma<mode>4): Reafctor.
(*fnma<VI:mode><P:mode>): Removed.
(*fma<VF:mode><P:mode>): Removed.
(*fnma<VF:mode><P:mode>): Removed.
(fms<mode>4): Reafctor.
(*fms<VF:mode><P:mode>): Removed.
(fnms<mode>4): Reafctor.
(*fnms<VF:mode><P:mode>): Removed.
* config/riscv/riscv-protos.h (prepare_ternary_operands):
Adjust prototype.
* config/riscv/riscv-v.cc (prepare_ternary_operands): Refactor.
* config/riscv/vector.md (*pred_mul_plus<mode>_undef): New pattern.
(*pred_mul_plus<mode>): Removed.
(*pred_mul_plus<mode>_scalar): Removed.
(*pred_mul_plus<mode>_extended_scalar): Removed.
(*pred_minus_mul<mode>_undef): New pattern.
(*pred_minus_mul<mode>): Removed.
(*pred_minus_mul<mode>_scalar): Removed.
(*pred_minus_mul<mode>_extended_scalar): Removed.
(*pred_mul_<optab><mode>_undef): New pattern.
(*pred_mul_<optab><mode>): Removed.
(*pred_mul_<optab><mode>_scalar): Removed.
(*pred_mul_neg_<optab><mode>_undef): New pattern.
(*pred_mul_neg_<optab><mode>): Removed.
(*pred_mul_neg_<optab><mode>_scalar): Removed.
|
|
Given below example for VLS mode
void
test (vl_t *u)
{
vl_t t;
long long *p = (long long *)&t;
p[0] = p[1] = 2;
*u = t;
}
The vec_set will simplify the insn to vmv.s.x when index is 0, without
merged operand. That will result in some problems in DCE, aka:
1: 137[DI] = a0
2: 138[V2DI] = 134[V2DI] // deleted by DCE
3: 139[DI] = #2 // deleted by DCE
4: 140[DI] = #2 // deleted by DCE
5: 141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE
6: 138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
7: 135[V2DI] = 138[V2DI] // deleted by DCE
8: 142[V2DI] = 135[V2DI] // deleted by DCE
9: 143[DI] = #2
10: 142[V2DI] = vec_dup:V2DI (143[DI])
11: (137[DI]) = 142[V2DI]
The higher 64 bits of 142[V2DI] is unknown here and it generated incorrect
code when store back to memory. This patch would like to fix this issue
by adding a new SCALAR_MOVE_MERGED_OP for vec_set.
Please note this patch doesn't enable VLS for vec_set, the underlying
patches will support this soon.
gcc/ChangeLog:
* config/riscv/autovec.md: Bugfix.
* config/riscv/riscv-protos.h (SCALAR_MOVE_MERGED_OP): New enum.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch fix using wrong mode when emit vlmax reduction insn. We should
use src operand instead dest operand (which always LMUL=m1) to get the vlmax
length. This patch alse remove dest_mode and mask_mode from insn_expander
constructor, which can be geted by insn_flags.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_flags): Change name.
(enum insn_type): Ditto.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): Removed.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Adjust.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: New test.
|
|
This patch refactors expand_reduction, remove the reduction_type argument
and add insn_flags argument to determine the passing of the operands.
ops has also been modified to restrict it to only two cases and to remove
operand that are not in use.
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum class): Delete enum reduction_type.
(expand_reduction): Adjust expand_reduction prototype.
* config/riscv/riscv-v.cc (need_mask_operand_p): New helper function.
(expand_reduction): Refactor expand_reduction.
|
|
This patch adjust reduction patterns struct, change it from:
(any_reduc:VI
(vec_duplicate:VI
(vec_select:<VEL>
(match_operand:<V_LMUL1> 4 "register_operand" " vr, vr")
(parallel [(const_int 0)])))
(match_operand:VI 3 "register_operand" " vr, vr"))
to:
(unspec:<V_LMUL1> [
(match_operand:VI 3 "register_operand" " vr, vr")
(match_operand:<V_LMUL1> 4 "register_operand" " vr, vr")
] ANY_REDUC)
The reason for the change is that the semantics of the previous pattern is incorrect.
GCC does not have a standard rtx code to express the reduction calculation process.
It makes more sense to use UNSPEC.
Further, all reduction icode are geted by the UNSPEC and MODE (code_for_pred (unspec, mode)),
so that all reduction patterns can have a uniform icode name. After this adjust, widen_reducop
and widen_freducop are redundant.
gcc/ChangeLog:
* config/riscv/autovec.md: Change rtx code to unspec.
* config/riscv/riscv-protos.h (expand_reduction): Change prototype.
* config/riscv/riscv-v.cc (expand_reduction): Change prototype.
* config/riscv/riscv-vector-builtins-bases.cc (class widen_reducop):
Removed.
(class widen_freducop): Removed.
* config/riscv/vector-iterators.md (minu): Add reduc unspec, iterators, attrs.
* config/riscv/vector.md (@pred_reduc_<reduc><mode>): Change name.
(@pred_<reduc_op><mode>): New name.
(@pred_widen_reduc_plus<v_su><mode>): Change name.
(@pred_reduc_plus<order><mode>): Change name.
(@pred_widen_reduc_plus<order><mode>): Change name.
|
|
This patch implements expansions for the cmpstrsi and cmpstrnsi
builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb
instructions are available. The expansion basically emits a comparison
sequence which compares XLEN bits per step if possible.
This allows to inline calls to strcmp() and strncmp() if both strings
are xlen-aligned. For strncmp() the length parameter needs to be known.
The benefits over calls to libc are:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment tests
The inlining mechanism is gated by a new switches ('-minline-strcmp' and
'-minline-strncmp') and by the variable 'optimize_size'.
The amount of emitted unrolled loop iterations can be controlled by the
parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.
The comparision sequence is inspired by the strcmp example
in the appendix of the Bitmanip specification (incl. the fast
result calculation in case the first word does not contain
a NULL byte). Additional inspiration comes from rs6000-string.c.
The emitted sequence is not triggering any readahead pagefault issues,
because only aligned strings are accessed by aligned xlen-loads.
This patch has been tested using the glibc string tests on QEMU:
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
* rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
(<optab>_not<mode>3): Likewise.
* config/riscv/riscv-protos.h (riscv_expand_strcmp): New
prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper
macros.
(GEN_EMIT_HELPER2): Likewise.
(emit_strcmp_scalar_compare_byte): New function.
(emit_strcmp_scalar_compare_subword): Likewise.
(emit_strcmp_scalar_compare_word): Likewise.
(emit_strcmp_scalar_load_and_compare): Likewise.
(emit_strcmp_scalar_call_to_libc): Likewise.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Likewise.
(riscv_expand_strcmp): Likewise.
* config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export
INSN name.
(@slt<u>_<X:mode><GPR:mode>3): Likewise.
(cmpstrnsi): Invoke expansion function for str(n)cmp.
(cmpstrsi): Likewise.
* config/riscv/riscv.opt: Add new parameter
'-mstring-compare-inline-limit'.
* doc/invoke.texi: Document new parameter
'-mstring-compare-inline-limit'.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strcmp.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
* gcc.target/riscv/zbb-strcmp.c: New test.
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
|
|
This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:
rv32gc_zbb (RV64 is similar):
add a3,a0,4
li a4,-1
.L1: lw a5,0(a0)
add a0,a0,4
orc.b a5,a5
beq a5,a4,.L1
not a5,a5
ctz a5,a5
srl a5,a5,0x3
add a0,a0,a5
sub a0,a0,a3
rv64gc_xtheadbb (RV32 is similar):
add a4,a0,8
.L2: ld a5,0(a0)
add a0,a0,8
th.tstnbz a5,a5
beqz a5,.L2
th.rev a5,a5
th.ff1 a5,a5
srl a5,a5,0x3
add a0,a0,a5
sub a0,a0,a4
This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test
The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.
Tested using the glibc string tests.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config.gcc: Add new object riscv-string.o.
riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen):
New function.
* config/riscv/riscv.md (strlen<mode>): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
(th_rev<mode>2): Likewise.
(th_tstnbz<mode>2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.
|
|
I just finished V2 version of LMUL cost model.
Turns out we don't these redundant functions.
Remove them.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (get_all_predecessors): Remove.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.
|
|
This patch add VLS modes VEC_PERM support which fix these following
FAILs in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311:
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 1
These FAILs are fixed after this patch.
PR target/111311
gcc/ChangeLog:
* config/riscv/autovec.md: Add VLS modes.
* config/riscv/riscv-protos.h (cmp_lmul_le_one): New function.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv-v.cc (cmp_lmul_le_one): Ditto.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv.cc (riscv_print_operand): Add VLS modes.
(riscv_vectorize_vec_perm_const): Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-7.c: New test.
|
|
Functions which follow vector calling convention variant need be annotated by
.variant_cc directive according the RISC-V Assembly Programmer's Manual[1] and
RISC-V ELF Specification[2].
[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc):
Output .variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME):
Implement ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.
|
|
returns
I post the vector register calling convention rules from in the proposal[1]
directly here:
v0 is used to pass the first vector mask argument to a function, and to return
vector mask result from a function. v8-v23 are used to pass vector data
arguments, vector tuple arguments and the rest vector mask arguments to a
function, and to return vector data and vector tuple results from a function.
Each vector data type and vector tuple type has an LMUL attribute that
indicates a vector register group. The value of LMUL indicates the number of
vector registers in the vector register group and requires the first vector
register number in the vector register group must be a multiple of it. For
example, the LMUL of `vint64m8_t` is 8, so v8-v15 vector register group can be
allocated to this type, but v9-v16 can not because the v9 register number is
not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a
vector mask type, its LMUL is 1.
Each vector tuple type also has an NFIELDS attribute that indicates how many
vector register groups the type contains. Thus a vector tuple type needs to
take up LMUL×NFIELDS registers.
The rules for passing vector arguments are as follows:
1. For the first vector mask argument, use v0 to pass it. The argument has now
been allocated.
2. For vector data arguments or rest vector mask arguments, starting from the
v8 register, if a vector register group between v8-v23 that has not been
allocated can be found and the first register number is a multiple of LMUL,
then allocate this vector register group to the argument and mark these
registers as allocated. Otherwise, pass it by reference. The argument has now
been allocated.
3. For vector tuple arguments, starting from the v8 register, if NFIELDS
consecutive vector register groups between v8-v23 that have not been allocated
can be found and the first register number is a multiple of LMUL, then allocate
these vector register groups to the argument and mark these registers as
allocated. Otherwise, pass it by reference. The argument has now been allocated.
NOTE: It should be stressed that the search for the appropriate vector register
groups starts at v8 each time and does not start at the next register after the
registers are allocated for the previous vector argument. Therefore, it is
possible that the vector register number allocated to a vector argument can be
less than the vector register number allocated to previous vector arguments.
For example, for the function
`void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules
of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b`
and v9 will be allocated to `c`. This approach allows more vector registers to
be allocated to arguments in some cases.
Vector values are returned in the same manner as the first named argument of
the same type would be passed.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389
gcc/ChangeLog:
* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for check vector arguments.
(riscv_arguments_is_vector_type_p): New function for check vector returns.
(riscv_fntype_abi): Implement TARGET_FNTYPE_ABI.
(TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI.
* config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi.
(MAX_ARGS_IN_VECTOR_REGISTERS): Ditto.
(MAX_ARGS_IN_MASK_REGISTERS): Ditto.
(V_ARG_FIRST): Ditto.
(V_ARG_LAST): Ditto.
(enum riscv_cc): Define all RISCV_CC variants.
* config/riscv/riscv.opt: Add --param=riscv-vector-abi.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-2.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-3.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-args-4.c: New test.
* gcc.target/riscv/rvv/base/abi-call-error-1.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return-run.c: New test.
* gcc.target/riscv/rvv/base/abi-call-return.c: New test.
|
|
Notice those functions need to be use by COST model for dynamic LMUL use.
Extract as a single patch and committed.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export global.
(get_all_predecessors): New function.
(get_all_successors): Ditto.
* config/riscv/riscv-v.cc (get_all_predecessors): Ditto.
(get_all_successors): Ditto.
* config/riscv/riscv-vector-builtins.cc (sizeless_type_p): Export global.
* config/riscv/riscv-vsetvl.cc (get_all_predecessors): Remove it.
|
|
This patch change expand_cond_len_{unary,binop}'s argument `rtx_code code`
to `unsigned icode` and use the icode directly to determine whether the
rounding_mode operand is required.
gcc/ChangeLog:
* config/riscv/autovec.md: Adjust.
* config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
* config/riscv/riscv-v.cc (needs_fp_rounding): Ditto.
(expand_cond_len_op): Ditto.
(expand_cond_len_unop): Ditto.
(expand_cond_len_binop): Ditto.
(expand_cond_len_ternop): Ditto.
|
|
This patch change the vsetvl policy to default policy
(returned by get_prefer_mask_policy and get_prefer_tail_policy) instead
fixed policy. Any policy is now returned, allowing change to agnostic
or undisturbed. In the future, users may be able to control the default
policy, such as keeping agnostic by compiler options.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here.
* config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx):
Change to default policy.
* config/riscv/riscv-vector-builtins-bases.cc: Change to default policy.
* config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete.
* config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust.
* gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.
|
|
This patch refactor the code of emit_{vlmax,nonvlmax}_xxx functions.
These functions are used to generate RVV insn. There are currently 31
such functions and a few duplicates. The reason so many functions are
needed is because there are more types of RVV instructions. There are
patterns that don't have mask operand, patterns that don't have merge
operand, and patterns that don't need a tail policy operand, etc.
Previously there was the insn_type enum, but it's value was just used
to indicate how many operands were passed in by caller. The rest of
the operands information is scattered throughout these functions.
For example, emit_vlmax_fp_insn indicates that a rounding mode operand
of FRM_DYN should also be passed, emit_vlmax_merge_insn means that
there is no mask operand or mask policy operand.
I introduced a new enum insn_flags to indicate some properties of these
RVV patterns. These insn_flags are then used to define insn_type enum.
For example for the defintion of WIDEN_TERNARY_OP:
WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
| TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P,
This flags mean the RVV pattern has no merge operand. This flags only apply
to vwmacc instructions. After defining the desired insn_type, all the
emit_{vlmax,nonvlmax}_xxx functions are unified into three functions:
emit_vlmax_insn (icode, insn_flags, ops);
emit_nonvlmax_insn (icode, insn_flags, ops, vl);
emit_vlmax_insn_lra (icode, insn_flags, ops, vl);
Then user can select the appropriate insn_type and the appropriate emit_xxx
function for RVV patterns generation as needed.
gcc/ChangeLog:
* config/riscv/autovec-opt.md: Adjust.
* config/riscv/autovec-vls.md: Ditto.
* config/riscv/autovec.md: Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add insn_type.
(enum insn_flags): Add insn flags.
(emit_vlmax_insn): Adjust.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_insn_lra): Add.
* config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): New.
(emit_vlmax_insn): Adjust.
(emit_nonvlmax_insn): Adjust.
(emit_vlmax_insn_lra): Add.
(emit_vlmax_fp_insn): Delete.
(emit_vlmax_ternary_insn): Delete.
(emit_vlmax_fp_ternary_insn): Delete.
(emit_vlmax_slide_insn): Delete.
(emit_nonvlmax_slide_tu_insn): Delete.
(emit_nonvlmax_slide_insn): Delete.
(emit_vlmax_merge_insn): Delete.
(emit_vlmax_cmp_insn): Delete.
(emit_vlmax_cmp_mu_insn): Delete.
(emit_vlmax_masked_insn): Delete.
(emit_nonvlmax_masked_insn): Delete.
(emit_vlmax_masked_store_insn): Delete.
(emit_nonvlmax_masked_store_insn): Delete.
(emit_vlmax_masked_mu_insn): Delete.
(emit_vlmax_masked_fp_mu_insn): Delete.
(emit_nonvlmax_tu_insn): Delete.
(emit_nonvlmax_fp_tu_insn): Delete.
(emit_nonvlmax_tumu_insn): Delete.
(emit_nonvlmax_fp_tumu_insn): Delete.
(emit_scalar_move_insn): Delete.
(emit_cpop_insn): Delete.
(emit_vlmax_integer_move_insn): Delete.
(emit_nonvlmax_integer_move_insn): Delete.
(emit_vlmax_gather_insn): Delete.
(emit_vlmax_masked_gather_mu_insn): Delete.
(emit_vlmax_compress_insn): Delete.
(emit_nonvlmax_compress_insn): Delete.
(emit_vlmax_reduction_insn): Delete.
(emit_vlmax_fp_reduction_insn): Delete.
(emit_nonvlmax_fp_reduction_insn): Delete.
(expand_vec_series): Adjust.
(expand_const_vector): Adjust.
(legitimize_move): Adjust.
(sew64_scalar_helper): Adjust.
(expand_tuple_move): Adjust.
(expand_vector_init_insert_elems): Adjust.
(expand_vector_init_merge_repeating_sequence): Adjust.
(expand_vec_cmp): Adjust.
(expand_vec_cmp_float): Adjust.
(expand_vec_perm): Adjust.
(shuffle_merge_patterns): Adjust.
(shuffle_compress_patterns): Adjust.
(shuffle_decompress_patterns): Adjust.
(expand_load_store): Adjust.
(expand_cond_len_op): Adjust.
(expand_cond_len_unop): Adjust.
(expand_cond_len_binop): Adjust.
(expand_gather_scatter): Adjust.
(expand_cond_len_ternop): Adjust.
(expand_reduction): Adjust.
(expand_lanes_load_store): Adjust.
(expand_fold_extract_last): Adjust.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Adjust.
* config/riscv/vector.md: Adjust.
|
|
Zcmp can share the same logic as save-restore in stack allocation: pre-allocation
by cm.push, step 1 and step 2.
Pre-allocation not only saves callee saved GPRs, but also saves callee saved FPRs and
local variables if any.
Please be noted cm.push pushes ra, s0-s11 in reverse order than what save-restore does.
So adaption has been done in .cfi directives in my patch.
gcc/ChangeLog:
* config/riscv/iterators.md
(slot0_offset): slot 0 offset in stack GPRs area in bytes
(slot1_offset): slot 1 offset in stack GPRs area in bytes
(slot2_offset): likewise
(slot3_offset): likewise
(slot4_offset): likewise
(slot5_offset): likewise
(slot6_offset): likewise
(slot7_offset): likewise
(slot8_offset): likewise
(slot9_offset): likewise
(slot10_offset): likewise
(slot11_offset): likewise
(slot12_offset): likewise
* config/riscv/predicates.md
(stack_push_up_to_ra_operand): predicates of stack adjust pushing ra
(stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0
(stack_push_up_to_s1_operand): likewise
(stack_push_up_to_s2_operand): likewise
(stack_push_up_to_s3_operand): likewise
(stack_push_up_to_s4_operand): likewise
(stack_push_up_to_s5_operand): likewise
(stack_push_up_to_s6_operand): likewise
(stack_push_up_to_s7_operand): likewise
(stack_push_up_to_s8_operand): likewise
(stack_push_up_to_s9_operand): likewise
(stack_push_up_to_s11_operand): likewise
(stack_pop_up_to_ra_operand): predicates of stack adjust poping ra
(stack_pop_up_to_s0_operand): predicates of stack adjust poping ra, s0
(stack_pop_up_to_s1_operand): likewise
(stack_pop_up_to_s2_operand): likewise
(stack_pop_up_to_s3_operand): likewise
(stack_pop_up_to_s4_operand): likewise
(stack_pop_up_to_s5_operand): likewise
(stack_pop_up_to_s6_operand): likewise
(stack_pop_up_to_s7_operand): likewise
(stack_pop_up_to_s8_operand): likewise
(stack_pop_up_to_s9_operand): likewise
(stack_pop_up_to_s11_operand): likewise
* config/riscv/riscv-protos.h
(riscv_zcmp_valid_stack_adj_bytes_p):declaration
* config/riscv/riscv.cc (struct riscv_frame_info): comment change
(riscv_avoid_multi_push): helper function of riscv_use_multi_push
(riscv_use_multi_push): true if multi push is used
(riscv_multi_push_sregs_count): num of sregs in multi-push
(riscv_multi_push_regs_count): num of regs in multi-push
(riscv_16bytes_align): align to 16 bytes
(riscv_stack_align): moved to a better place
(riscv_save_libcall_count): no functional change
(riscv_compute_frame_info): add zcmp frame info
(riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp
(riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push
(riscv_gen_multi_push_pop_insn): gen function for multi push and pop
(get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push
(riscv_expand_prologue): allocate stack by cm.push
(riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret]
(riscv_expand_epilogue): allocate stack by cm.pop[ret]
(zcmp_base_adj): calculate stack adjustment base size
(zcmp_additional_adj): calculate stack adjustment additional size
(riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid
* config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra
(S0_MASK): likewise
(S1_MASK): likewise
(S2_MASK): likewise
(S3_MASK): likewise
(S4_MASK): likewise
(S5_MASK): likewise
(S6_MASK): likewise
(S7_MASK): likewise
(S8_MASK): likewise
(S9_MASK): likewise
(S10_MASK): likewise
(S11_MASK): likewise
(MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most
(ZCMP_MAX_SPIMM): max spimm value
(ZCMP_SP_INC_STEP): zcmp sp increment step
(ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10
(ZCMP_S0S11_SREGS_COUNTS): num of s0-s11
(ZCMP_MAX_GRP_SLOTS): max slots of pushing and poping in zcmp
(CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11)
* config/riscv/riscv.md: include zc.md
* config/riscv/zc.md: New file. machine description for zcmp
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rv32e_zcmp.c: New test.
* gcc.target/riscv/rv32i_zcmp.c: New test.
* gcc.target/riscv/zcmp_push_fpr.c: New test.
* gcc.target/riscv/zcmp_stack_alignment.c: New test.
|
|
This patch refactors the codes of expand_cond_len_{unop,binop,ternop}.
Introduces a new unified function expand_cond_len_op to do the main thing.
The expand_cond_len_{unop,binop,ternop} functions only care about how
to pass the operands to the intrinsic patterns.
gcc/ChangeLog:
* config/riscv/autovec.md: Adjust
* config/riscv/riscv-protos.h (RVV_VUNDEF): Clean.
(get_vlmax_rtx): Exported.
* config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted.
(emit_vlmax_masked_gather_mu_insn): Adjust.
(get_vlmax_rtx): New func.
(expand_load_store): Adjust.
(expand_cond_len_unop): Call expand_cond_len_op.
(expand_cond_len_op): New subroutine.
(expand_cond_len_binop): Call expand_cond_len_op.
(expand_cond_len_ternop): Call expand_cond_len_op.
(expand_lanes_load_store): Adjust.
|
|
Consider this following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
int last = 66; /* High start value. */
for (int i = 0; i < 4; i++)
if (a[i] < min_v)
last = i;
return last;
}
--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8
condition_reduction:
vsetvli a4,zero,e32,m8,ta,ma
li a5,32
vmv.v.x v8,a1
vl8re32.v v0,0(a0)
vid.v v16
vmslt.vv v0,v0,v8
vsetvli zero,a5,e8,m2,ta,ma
vcpop.m a5,v0
beq a5,zero,.L2
addi a5,a5,-1
vsetvli a4,zero,e32,m8,ta,ma
vcompress.vm v8,v16,v0
vslidedown.vx v8,v8,a5
vmv.x.s a0,v8
ret
.L2:
li a0,66
ret
--param=riscv-autovec-preference=scalable
condition_reduction:
csrr a6,vlenb
mv a2,a0
li a3,32
li a0,66
srli a6,a6,2
vsetvli a4,zero,e32,m1,ta,ma
vmv.v.x v4,a1
vid.v v1
.L4:
vsetvli a5,a3,e8,mf4,tu,mu
vsetvli zero,a5,e32,m1,ta,ma ----> redundant vsetvl
vle32.v v0,0(a2)
vsetvli a4,zero,e32,m1,ta,ma
slli a1,a5,2
vmv.v.x v2,a6
vmslt.vv v0,v0,v4
sub a3,a3,a5
vmv1r.v v3,v1
vadd.vv v1,v1,v2
vsetvli zero,a5,e8,mf4,ta,ma
vcpop.m a5,v0
beq a5,zero,.L3
addi a5,a5,-1
vsetvli a4,zero,e32,m1,ta,ma
vcompress.vm v2,v3,v0
vslidedown.vx v2,v2,a5
vmv.x.s a0,v2
.L3:
sext.w a0,a0
add a2,a2,a1
bne a3,zero,.L4
ret
There is a redundant vsetvli instruction in VLA vectorized codes which is the VSETVL PASS issue.
vsetvl issue is not included in this patch but will be fixed soon.
gcc/ChangeLog:
* config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.
|
|
This patch adds the 'Zfa' extension for riscv, which is based on:
https://github.com/riscv/riscv-isa-manual/commits/zfb
The binutils-gdb for 'Zfa' extension:
https://sourceware.org/pipermail/binutils/2023-April/127060.html
What needs special explanation is:
1, According to riscv-spec, "The FCVTMO D.W.D instruction was added principally to
accelerate the processing of JavaScript Numbers.", so it seems that no implementation
is required.
2, The instructions FMINM and FMAXM correspond to C23 library function fminimum and fmaximum.
Therefore, this patch has simply implemented the pattern of fminm<hf\sf\df>3 and
fmaxm<hf\sf\df>3 to prepare for later.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on
the F extension.
* config/riscv/constraints.md (zfli): Constrain the floating point number that the
instructions FLI.H/S/D can load.
* config/riscv/iterators.md (ceil): New.
* config/riscv/riscv-opts.h (MASK_ZFA): New.
(TARGET_ZFA): New.
* config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New.
* config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New.
(riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is
not applicable.
(riscv_const_insns): Likewise.
(riscv_legitimize_const_move): Likewise.
(riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is
required.
(riscv_split_doubleword_move): Likewise.
(riscv_output_move): Output the mov instructions in zfa extension.
(riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate
in assembly.
(riscv_secondary_memory_needed): Likewise.
* config/riscv/riscv.md (fminm<mode>3): New.
(fmaxm<mode>3): New.
(movsidf2_low_rv32): New.
(movsidf2_high_rv32): New.
(movdfsisi3_rv32): New.
(f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New.
* config/riscv/riscv.opt: New.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zfa-fleq-fltq.c: New test.
* gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test.
* gcc.target/riscv/zfa-fmovh-fmovp.c: New test.
* gcc.target/riscv/zfa-fli-1.c: New test.
* gcc.target/riscv/zfa-fli-2.c: New test.
* gcc.target/riscv/zfa-fli-3.c: New test.
* gcc.target/riscv/zfa-fli-4.c: New test.
* gcc.target/riscv/zfa-fli-6.c: New test.
* gcc.target/riscv/zfa-fli-7.c: New test.
* gcc.target/riscv/zfa-fli-8.c: New test.
Co-authored-by: Tsukasa OI <research_trasio@irq.a4lg.com>
|
|
Hi,
This patch add conditional unary neg/abs/not autovec patterns to RISC-V backend.
For this C code:
void
test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n)
{
for (int i = 0; i < n; i += 1)
{
a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i];
}
}
Before this patch:
...
vsetvli a7,zero,e32,m1,ta,ma
vfabs.v v2,v2
vmerge.vvm v1,v1,v2,v0
...
After this patch:
...
vsetvli a7,zero,e32,m1,ta,mu
vfabs.v v1,v2,v0.t
...
For int neg/not and FP neg patterns, Defining the corresponding cond_xxx paterns
is enough.
For the FP abs pattern, We need to change the definition of `abs<mode>2` and
`@vcond_mask_<mode><vm>` pattern from define_expand to define_insn_and_split
in order to fuse them into a new pattern `*cond_abs<mode>` at the combine pass.
A fusion process similar to the one below:
(insn 30 29 31 4 (set (reg:RVVM1SF 152 [ vect_iftmp.15 ])
(abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))) "float.c":15:56 discrim 1 12799 {absrvvm1sf2}
(expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
(nil)))
(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
(if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
(reg:RVVM1SF 152 [ vect_iftmp.15 ])
(reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 12707 {vcond_mask_rvvm1sfrvvmf32bi}
(expr_list:REG_DEAD (reg:RVVM1SF 152 [ vect_iftmp.15 ])
(expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
(expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
(nil)))))
==>
(insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ])
(if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ])
(abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))
(reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 13444 {*cond_absrvvm1sf}
(expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ])
(expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ])
(expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ])
(nil)))))
Best,
Lehua
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_abs<mode>): New combine pattern.
(*copysign<mode>_neg): Ditto.
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): Adjust.
(<optab><mode>2): Ditto.
(cond_<optab><mode>): New.
(cond_len_<optab><mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New.
(expand_cond_len_unop): New helper func.
* config/riscv/riscv-v.cc (shuffle_merge_patterns): Adjust.
(expand_cond_len_unop): New helper func.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.
|
|
This patch allow us auto-vectorize this following case:
void __attribute__ ((noinline, noclone)) \
NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \
MASKTYPE *__restrict cond, intptr_t n) \
{ \
for (intptr_t i = 0; i < n; ++i) \
if (cond[i]) \
dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2] \
+ src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5] \
+ src[i * 8 + 6] + src[i * 8 + 7]); \
}
TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t) \
TEST2 (NAME##_i32, OUTTYPE, int32_t) \
TEST1 (NAME##_i32, int32_t) \
TEST (test)
ASM:
test_i32_i32_f32_8:
ble a3,zero,.L5
.L3:
vsetvli a4,a3,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a5,zero,e32,m1,ta,ma
vmsne.vi v0,v0,0
vsetvli zero,a4,e32,m1,ta,ma
vlseg8e32.v v8,(a1),v0.t
vsetvli a5,zero,e32,m1,ta,ma
slli a6,a4,2
vadd.vv v1,v9,v8
slli a7,a4,5
vadd.vv v1,v1,v10
sub a3,a3,a4
vadd.vv v1,v1,v11
vadd.vv v1,v1,v12
vadd.vv v1,v1,v13
vadd.vv v1,v1,v14
vadd.vv v1,v1,v15
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v1,0(a0),v0.t
add a2,a2,a6
add a1,a1,a7
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret
gcc/ChangeLog:
* config/riscv/autovec.md (vec_mask_len_load_lanes<mode><vsingle>):
New pattern.
(vec_mask_len_store_lanes<mode><vsingle>): Ditto.
* config/riscv/riscv-protos.h (expand_lanes_load_store): New function.
* config/riscv/riscv-v.cc (get_mask_mode): Add tuple mask mode.
(expand_lanes_load_store): New function.
* config/riscv/vector-iterators.md: New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c:
Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add lanes tests.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-1.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-2.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-4.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-5.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-6.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-7.c:
New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect-9.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-15.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-16.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-17.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-18.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c: New test.
|
|
The frm_mode attr has some assumptions for each define insn as below.
1. The define insn has at least 9 operands.
2. The operands[9] must be frm reg.
3. The operands[9] must be const int.
Actually, the frm operand can be operands[8], operands[9] or
operands[10], and not all the define insn has frm operands.
This patch would like to refactor frm and eliminate the above
assumptions, as well as unblock the underlying rounding mode intrinsic
API support.
After refactor, the default frm will be none, and the selected insn type
will be dyn. For the floating point which honors the frm, we will
set the frm_mode attr explicitly in define_insn.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-by: Kito Cheng <kito.cheng@sifive.com>
gcc/ChangeLog:
* config/riscv/riscv-protos.h
(enum floating_point_rounding_mode): Add NONE, DYN_EXIT and DYN_CALL.
(get_frm_mode): New declaration.
* config/riscv/riscv-v.cc (get_frm_mode): New function to get frm mode.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn): Take care of frm reg.
* config/riscv/riscv.cc (riscv_static_frm_mode_p): Migrate to FRM_XXX.
(riscv_emit_frm_mode_set): Ditto.
(riscv_emit_mode_set): Ditto.
(riscv_frm_adjust_mode_after_call): Ditto.
(riscv_frm_mode_needed): Ditto.
(riscv_frm_mode_after): Ditto.
(riscv_mode_entry): Ditto.
(riscv_mode_exit): Ditto.
* config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Ditto.
* config/riscv/vector.md
(rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none): Removed
(symbol_ref): * config/riscv/vector.md: Set frm_mode attr explicitly.
|
|
This patch is depending on middle-end patch on vectorizable_call.
Consider this following case:
void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n)
{
for (int i = 0; i < n; i++)
if (cond[i])
a[i] = b[i] + a[i];
}
Before this patch (**NO** -ffast-math):
<source>:5:21: missed: couldn't vectorize loop
<source>:5:21: missed: not vectorized: control flow in loop.
After this patch:
foo:
ble a3,zero,.L5
mv a6,a0
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vle32.v v0,0(a2)
vsetvli a7,zero,e32,m1,ta,ma
slli a4,a5,2
vmsne.vi v0,v0,0
sub a3,a3,a5
vsetvli zero,a5,e32,m1,tu,mu ------> must be TUMU
vle32.v v2,0(a0),v0.t
vle32.v v1,0(a1),v0.t
vfadd.vv v1,v1,v2,v0.t ------> generated by COND_LEN_ADD with real mask and len.
vse32.v v1,0(a6),v0.t
add a2,a2,a4
add a1,a1,a4
add a0,a0,a4
add a6,a6,a4
bne a3,zero,.L3
.L5:
ret
gcc/ChangeLog:
* config/riscv/autovec.md (cond_<optab><mode>): New pattern.
(cond_len_<optab><mode>): Ditto.
(cond_fma<mode>): Ditto.
(cond_len_fma<mode>): Ditto.
(cond_fnma<mode>): Ditto.
(cond_len_fnma<mode>): Ditto.
(cond_fms<mode>): Ditto.
(cond_len_fms<mode>): Ditto.
(cond_fnms<mode>): Ditto.
(cond_len_fnms<mode>): Ditto.
* config/riscv/riscv-protos.h (riscv_get_v_regno_alignment): Export
global.
(enum insn_type): Add new enum type.
(prepare_ternary_operands): New function.
* config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): Ditto.
(emit_nonvlmax_tumu_insn): Ditto.
(emit_nonvlmax_fp_tumu_insn): Ditto.
(expand_cond_len_binop): Add condtional operations.
(expand_cond_len_ternop): Ditto.
(prepare_ternary_operands): New function.
* config/riscv/riscv.cc (riscv_memmodel_needs_amo_release): Export
riscv_get_v_regno_alignment as global scope.
* config/riscv/vector.md: Fix ternary bugs.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add condition tests.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_arith_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_logical_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift-9.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_shift_run-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/reduc_call-5.c: New test.
|
|
As I've mentioned in the main zicond thread, Ventana has had patches
that support more cases by first emitting a suitable scc instruction
essentially as a canonicalization step of the condition for zicond.
For example if we have
(set (target) (if_then_else (op (reg1) (reg2))
(true_value)
(false_value)))
The two register comparison isn't handled by zicond directly. But we
can generate something like this instead
(set (temp) (op (reg1) (reg2)))
(set (target) (if_then_else (op (temp) (const_int 0))
(true_value)
(false_value)
Then let the remaining code from Xiao handle the true_value/false_value
to make sure it's zicond compatible.
This is primarily Raphael's work. My involvement has been mostly to
move it from its original location (in the .md file) into the expander
function and fix minor problems with the FP case.
gcc/
* config/riscv/riscv.cc (riscv_expand_int_scc): Add invert_ptr
as an argument and pass it to riscv_emit_int_order_test.
(riscv_expand_conditional_move): Handle cases where the condition
is not EQ/NE or the second argument to the conditional is not
(const_int 0).
* config/riscv/riscv-protos.h (riscv_expand_int_scc): Update prototype.
Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
|
|
We always want get_mask_mode return a valid mode, it's something wrong
if it failed, so I think we could just move the `.require ()` into
get_mask_mode, instead of calling that every call-site.
The only exception is riscv_get_mask_mode, it might put supported mode
into get_mask_mode, so added a check with riscv_v_ext_mode_p to make
sure only valid vector mode will ask get_mask_mode.
gcc/ChangeLog:
* config/riscv/autovec.md (abs<mode>2): Remove `.require ()`.
* config/riscv/riscv-protos.h (get_mask_mode): Update return
type.
* config/riscv/riscv-v.cc (rvv_builder::rvv_builder): Remove
`.require ()`.
(emit_vlmax_insn): Ditto.
(emit_vlmax_fp_insn): Ditto.
(emit_vlmax_ternary_insn): Ditto.
(emit_vlmax_fp_ternary_insn): Ditto.
(emit_nonvlmax_fp_ternary_tu_insn): Ditto.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_slide_insn): Ditto.
(emit_nonvlmax_slide_tu_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_masked_insn): Ditto.
(emit_nonvlmax_masked_insn): Ditto.
(emit_vlmax_masked_store_insn): Ditto.
(emit_nonvlmax_masked_store_insn): Ditto.
(emit_vlmax_masked_mu_insn): Ditto.
(emit_nonvlmax_tu_insn): Ditto.
(emit_nonvlmax_fp_tu_insn): Ditto.
(emit_scalar_move_insn): Ditto.
(emit_vlmax_compress_insn): Ditto.
(emit_vlmax_reduction_insn): Ditto.
(emit_vlmax_fp_reduction_insn): Ditto.
(emit_nonvlmax_fp_reduction_insn): Ditto.
(expand_vec_series): Ditto.
(expand_vector_init_merge_repeating_sequence): Ditto.
(expand_vec_perm): Ditto.
(shuffle_merge_patterns): Ditto.
(shuffle_compress_patterns): Ditto.
(shuffle_decompress_patterns): Ditto.
(expand_reduction): Ditto.
(get_mask_mode): Update return type.
* config/riscv/riscv.cc (riscv_get_mask_mode): Check vector type
is valid, and use new get_mask_mode interface.
|