author | Juzhe-Zhong <juzhe.zhong@rivai.ai> | 2023-06-12 23:11:07 +0800 |
---|---|---|
committer | Pan Li <pan2.li@intel.com> | 2023-06-13 09:10:33 +0800 |
commit | d150afb4791e8dff4fc1d4e4b10938b55e58cb16 (patch) | |
tree | 35b2bd21ec53e25a7cfa591f9d5bd9c943d7043a /gcc/ada | |
parent | 9d250bdb88e2a0496307e64675529f3b73c054b5 (diff) | |
RISC-V: Enhance RVV VLA SLP auto-vectorization with decompress operation
The RVV ISA specification:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
describes how to synthesize a decompress operation (section 16.5.1,
"Synthesizing vdecompress"). We can use it to enhance VLA SLP
auto-vectorization.
Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
We can optimize this VLA SLP permutation pattern into:
_48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... });
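The Case 1 rewrite above can be checked with a small scalar model. This is only a sketch under stated assumptions, not the code added to riscv-v.cc: the lane count is fixed at 8 for illustration (the real nunits is a runtime POLY_INT_CST), all helper names (model_viota, model_alternating_mask, model_decompress, model_vec_perm) are hypothetical, and the decompress is modeled as "active lanes read the second operand, inactive lanes read the first operand, both indexed by the viota result".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 64

/* Scalar model of viota.m: sel[i] = number of set mask bits below lane i.  */
static void model_viota(uint64_t *sel, const uint8_t *mask, size_t n)
{
  uint64_t count = 0;
  for (size_t i = 0; i < n; ++i)
    {
      sel[i] = count;
      count += mask[i] ? 1 : 0;
    }
}

/* Hypothetical helper: build the mask { 0, 1, 0, 1, ... }, i.e. the
   result of (lane index & 1) == 1.  */
static void model_alternating_mask(uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    mask[i] = (uint8_t) ((i & 1) == 1);
}

/* Modeled decompress (an assumption of this sketch): active lanes gather
   from op1, inactive lanes gather from op0, both through the iota index.  */
static void model_decompress(uint64_t *dst, const uint64_t *op0,
                             const uint64_t *op1, const uint8_t *mask,
                             size_t n)
{
  assert(n <= MAX_LANES);
  uint64_t sel[MAX_LANES];
  model_viota(sel, mask, n);
  for (size_t i = 0; i < n; ++i)
    dst[i] = mask[i] ? op1[sel[i]] : op0[sel[i]];
}

/* Reference VEC_PERM_EXPR <op0, op1, sel>: indices below n select from
   op0, indices from n to 2n - 1 select from op1.  */
static void model_vec_perm(uint64_t *dst, const uint64_t *op0,
                           const uint64_t *op1, const size_t *sel, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = sel[i] < n ? op0[sel[i]] : op1[sel[i] - n];
}
```

With 8 lanes, the selector { 0, 8, 1, 9, 2, 10, 3, 11 } plays the role of { 0, nunits, 1, nunits + 1, ... }, and both models produce the same interleaving of the low halves of the two inputs.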
Case 2 (nunits = POLY_INT_CST [16, 16]):
_23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3], POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], POLY_INT_CST [5, 3], ... }>;
We can optimize this VLA SLP permutation pattern into:
_23 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits), mask = { 0, 1, 0, 1, ... });
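Case 2 differs from Case 1 only in that both inputs are first slid down by half a vector, so the interleave reads their high halves. The following scalar sketch illustrates that reading. It assumes a fixed lane count of 8, uses hypothetical helper names (model_slidedown, model_interleave_decompress, model_case2), and zeroes lanes that slide past the end of the source, which is how this sketch models vslidedown at VLMAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 64

/* Scalar model of a slide-down by OFFSET lanes: dst[i] = src[i + offset].
   Lanes that would read past the end are set to zero in this sketch.  */
static void model_slidedown(uint64_t *dst, const uint64_t *src,
                            size_t offset, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = (i + offset < n) ? src[i + offset] : 0;
}

/* Modeled decompress with mask { 0, 1, 0, 1, ... }: lane i reads element
   i / 2, from op1 on odd lanes and from op0 on even lanes.  */
static void model_interleave_decompress(uint64_t *dst, const uint64_t *op0,
                                        const uint64_t *op1, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    dst[i] = (i & 1) ? op1[i / 2] : op0[i / 2];
}

/* Case 2: slide both inputs down by n / 2, then interleave them.  */
static void model_case2(uint64_t *dst, const uint64_t *op0,
                        const uint64_t *op1, size_t n)
{
  assert(n % 2 == 0 && n <= MAX_LANES);
  uint64_t hi0[MAX_LANES];
  uint64_t hi1[MAX_LANES];
  model_slidedown(hi0, op0, n / 2, n);
  model_slidedown(hi1, op1, n / 2, n);
  model_interleave_decompress(dst, hi0, hi1, n);
}
```

For 8 lanes this corresponds to the selector { 4, 12, 5, 13, 6, 14, 7, 15 }, that is { nunits/2, nunits + nunits/2, nunits/2 + 1, ... }.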
For example:
#include <stdint.h>

void __attribute__ ((noinline, noclone))
vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 2] += b;
      a[i * 2 + 1] += c;
    }
}
ASM:
        ...
        vid.v v0
        vand.vi v0,v0,1
        vmseq.vi v0,v0,1            ===> mask = { 0, 1, 0, 1, ... }
vdecompress:
        viota.m v3,v0
        vrgather.vv v2,v1,v3,v0.t
Loop:
        vsetvli zero,a5,e64,m1,ta,ma
        vle64.v v1,0(a0)
        vsetvli a6,zero,e64,m1,ta,ma
        vadd.vv v1,v2,v1
        vsetvli zero,a5,e64,m1,ta,ma
        mv a5,a3
        vse64.v v1,0(a0)
        add a3,a3,a1
        add a0,a0,a2
        bgtu a5,a4,.L4
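One way to read the assembly above: the decompress runs once, before the loop, to build a loop-invariant vector { b, c, b, c, ... }, and each loop iteration then loads a chunk of a, adds that vector, and stores the chunk back. The sketch below models this in scalar C and compares it with the original loop. It is an illustration under assumptions: LANES is a fixed stand-in for the runtime vector length, and vec_slp_ref and vec_slp_model are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative fixed lane count; on RVV hardware the vector length is
   only known at run time.  It must be even so that every chunk starts
   on a b lane.  */
#define LANES 4

/* Scalar reference with the same body as vec_slp.  */
static void vec_slp_ref(uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 2] += b;
      a[i * 2 + 1] += c;
    }
}

/* Chunked model of the vectorized loop.  */
static void vec_slp_model(uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  assert(n >= 0);

  /* Loop-invariant vector { b, c, b, c, ... }: the interleave of a splat
     of b and a splat of c, built once outside the loop.  */
  uint64_t inv[LANES];
  for (size_t i = 0; i < LANES; ++i)
    inv[i] = (i & 1) ? c : b;

  size_t remaining = (size_t) n * 2;
  while (remaining > 0)
    {
      /* Process at most LANES elements per iteration.  */
      size_t vl = remaining < LANES ? remaining : LANES;
      for (size_t i = 0; i < vl; ++i)
        a[i] += inv[i];
      a += vl;
      remaining -= vl;
    }
}
```

Because the total element count 2 * n and LANES are both even, every chunk begins at an even index, so the { b, c, ... } pattern stays aligned with the array.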
gcc/ChangeLog:

	* config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
	(shuffle_decompress_patterns): New function.
	(expand_vec_perm_const_1): Add decompress optimization.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
	* gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.