author | Juzhe-Zhong <juzhe.zhong@rivai.ai> | 2023-06-20 17:00:31 +0800 |
---|---|---|
committer | Pan Li <pan2.li@intel.com> | 2023-06-20 21:59:22 +0800 |
commit | 1c0b118babcd56dc886976b81727a9a77fc311c3 (patch) | |
tree | 0673a73e19d0a1a13106373a2851b0242b574250 /libcpp/macro.cc | |
parent | b26f1735cb8dcf690e9bd25f27d9f35002f3a291 (diff) | |
RISC-V: Optimize codegen of VLA SLP
Add comments for Robin:
We want to create a pattern where value[ix] = floor (ix / NPATTERNS) * NPATTERNS.
As NPATTERNS is always a power of two, we can rewrite this as
= ix & -NPATTERNS.
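The rewrite above can be checked with a small scalar sketch. This is only an illustration and not code from the patch: the helper names step_by_div, step_by_mask and forms_agree are hypothetical, and the sketch assumes unsigned lane indices.

```c
#include <assert.h>

/* Reference form: round ix down to a multiple of npatterns.  */
static unsigned
step_by_div (unsigned ix, unsigned npatterns)
{
  return (ix / npatterns) * npatterns;
}

/* Masked form: only valid when npatterns is a power of two, because
   -npatterns then has all bits set above the low log2 (npatterns) bits.  */
static unsigned
step_by_mask (unsigned ix, unsigned npatterns)
{
  assert (npatterns != 0 && (npatterns & (npatterns - 1)) == 0);
  return ix & -npatterns;
}

/* Return 1 if both forms agree for every ix below limit.  */
static int
forms_agree (unsigned npatterns, unsigned limit)
{
  for (unsigned ix = 0; ix < limit; ++ix)
    if (step_by_div (ix, npatterns) != step_by_mask (ix, npatterns))
      return 0;
  return 1;
}
```

For example, step_by_mask (13, 8) clears the low three bits of 13 and yields 8, the same value as step_by_div (13, 8).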
Recently, I figured out a better approach to codegen for VLA stepped vectors.
Here is a detailed description:
Case 1:
#include <stdint.h>
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 37] + 1;
      a[i * 8 + 1] = b[i * 8 + 37] + 2;
      a[i * 8 + 2] = b[i * 8 + 37] + 3;
      a[i * 8 + 3] = b[i * 8 + 37] + 4;
      a[i * 8 + 4] = b[i * 8 + 37] + 5;
      a[i * 8 + 5] = b[i * 8 + 37] + 6;
      a[i * 8 + 6] = b[i * 8 + 37] + 7;
      a[i * 8 + 7] = b[i * 8 + 37] + 8;
    }
}
We need to generate the stepped vector:
NPATTERNS = 8.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8 }
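To see why Case 1 needs this stepped vector, the loop can be modeled lane by lane. The sketch below is a scalar model under stated assumptions, not what the vectorizer emits: f_scalar, f_lanes and case1_models_match are hypothetical names, and each flat lane ix stands for one element of a[].

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { GROUPS = 100, NPATTERNS = 8, LANES = GROUPS * NPATTERNS };

/* The Case 1 loop, written compactly.  */
static void
f_scalar (uint8_t *restrict a, const uint8_t *restrict b)
{
  for (int i = 0; i < GROUPS; ++i)
    for (int k = 0; k < NPATTERNS; ++k)
      a[i * 8 + k] = (uint8_t) (b[i * 8 + 37] + (k + 1));
}

/* Lane-wise model: lane ix reads b at 37 plus the stepped index.  */
static void
f_lanes (uint8_t *restrict a, const uint8_t *restrict b)
{
  for (unsigned ix = 0; ix < LANES; ++ix)
    {
      unsigned stepped = ix & -(unsigned) NPATTERNS; /* 0 x8, 8 x8, ...  */
      unsigned addend = (ix & (NPATTERNS - 1)) + 1;  /* 1..8 repeating  */
      assert (stepped <= ix); /* masking only rounds down  */
      a[ix] = (uint8_t) (b[37 + stepped] + addend);
    }
}

/* Return 1 when both versions produce identical output.  */
static int
case1_models_match (void)
{
  uint8_t b[LANES + 37], a1[LANES], a2[LANES];
  for (unsigned j = 0; j < sizeof b; ++j)
    b[j] = (uint8_t) (j * 7 + 3); /* arbitrary test data  */
  f_scalar (a1, b);
  f_lanes (a2, b);
  return memcmp (a1, a2, sizeof a1) == 0;
}
```

All eight lanes of a group share one load offset, which is exactly the repeated value in the stepped vector.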
Before this patch:
vid.v v4 ;; {0,1,2,3,4,5,6,7,...}
vsrl.vi v4,v4,3 ;; {0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...}
li a3,8 ;; {8}
vmul.vx v4,v4,a3 ;; {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}
After this patch:
vid.v v4 ;; {0,1,2,3,4,5,6,7,...}
vand.vi v4,v4,-8 ;; -8 = -NPATTERNS: {0,0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}
Case 2:
#include <stdint.h>
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 3] + 1;
      a[i * 8 + 1] = b[i * 8 + 2] + 2;
      a[i * 8 + 2] = b[i * 8 + 1] + 3;
      a[i * 8 + 3] = b[i * 8 + 0] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 6] + 6;
      a[i * 8 + 6] = b[i * 8 + 5] + 7;
      a[i * 8 + 7] = b[i * 8 + 4] + 8;
    }
}
We need to generate the stepped vector:
NPATTERNS = 4.
{ 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }
Before this patch:
li a6,134221824
slli a6,a6,5
addi a6,a6,3 ;; 64-bit: 0x0000000100020003
vmv.v.x v6,a6 ;; {3, 2, 1, 0, ... }
vid.v v4 ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vsrl.vi v4,v4,2 ;; {0, 0, 0, 0, 1, 1, 1, 1, ... }
li a3,4 ;; {4}
vmul.vx v4,v4,a3 ;; {0, 0, 0, 0, 4, 4, 4, 4, ... }
vadd.vv v4,v4,v6 ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }
After this patch:
li a3,-536875008
slli a3,a3,4
addi a3,a3,1
slli a3,a3,16
vmv.v.x v2,a3 ;; {3, 1, -1, -3, ... }
vid.v v4 ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
vadd.vv v4,v4,v2 ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }
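The two Case 2 sequences compute the same indices in different ways, which a scalar sketch can make explicit. The names index_before, index_after and case2_forms_agree are hypothetical; the base and delta tables are taken from the vector comments above, with delta[k] = base[k] - k.

```c
#include <assert.h>

/* Per-pattern values from the comments above (NPATTERNS = 4).  */
static const int base[4] = { 3, 2, 1, 0 };    /* added to the stepped part  */
static const int delta[4] = { 3, 1, -1, -3 }; /* added directly to vid  */

/* Before: (vid >> 2) * 4 + base, i.e. vsrl + vmul + vadd.  */
static int
index_before (int ix)
{
  assert (ix >= 0);
  return (ix >> 2) * 4 + base[ix & 3];
}

/* After: vid + delta, i.e. a single vadd.  */
static int
index_after (int ix)
{
  assert (ix >= 0);
  return ix + delta[ix & 3];
}

/* Return 1 if both forms agree for every ix below limit.  */
static int
case2_forms_agree (int limit)
{
  for (int ix = 0; ix < limit; ++ix)
    if (index_before (ix) != index_after (ix))
      return 0;
  return 1;
}
```

Since ix = (ix >> 2) * 4 + (ix & 3), folding the (ix & 3) term into the repeating constant removes the shift and the multiply.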
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Optimize codegen.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp_run-16.c: New test.