author | Richard Sandiford <richard.sandiford@arm.com> | 2025-03-10 20:29:52 +0000 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2025-03-10 20:29:52 +0000 |
commit | 31dcf941ac78c4b1b01dc4b2ce9809f0209153b8 (patch) | |
tree | 2900cfa91025367e053907f51ed075db16e3edb5 /gcc/config.gcc | |
parent | e355fe414aa3aaf215c7dd9dd789ce217a1b458c (diff) | |
aarch64: Avoid unnecessary use of 2-input TBLs [PR115258]
When using TBL for (say) a V4SI permutation, the aarch64 port first
asks target-independent code to lower to a V16QI permutation.
Then, during code generation, an input like:

  (reg:V4SI R)

gets converted to:

  (subreg:V16QI (reg:V4SI R) 0)

aarch64_vectorize_vec_perm_const had:

  d.op0 = op0 ? force_reg (op_mode, op0) : NULL_RTX;
  if (op0 == op1)
    d.op1 = d.op0;
  else
    d.op1 = op1 ? force_reg (op_mode, op1) : NULL_RTX;

But subregs (unlike regs) are not shared, so the op0 == op1 check
always failed for this case. We'd then force each subreg into a
fresh register, meaning that during the later:

  aarch64_expand_vec_perm_1 (d->target, d->op0, d->op1, sel);

there is no way for aarch64_expand_vec_perm_1 to realise that
d->op0 and d->op1 are the same value. It would therefore generate
a two-input TBL in the testcase, even though a single-input TBL
is enough.
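
As a rough illustration of why the pointer comparison misses this case, here is a minimal standalone sketch. It is a toy model, not GCC code: ToyRtx, make_reg, make_subreg and toy_equal_p are hypothetical names invented for this example, and toy_equal_p only loosely mirrors the idea of a structural check such as rtx_equal_p. The sketch assumes one shared object per register and a fresh wrapper object for every subreg.

```cpp
// Toy model only: these types and helpers are hypothetical stand-ins,
// not GCC's real rtx representation or its equality helpers.
#include <cassert>
#include <memory>
#include <string>

struct ToyRtx {
  std::string code;               // "reg" or "subreg"
  std::string mode;               // e.g. "V4SI" or "V16QI"
  int regno = -1;                 // meaningful for "reg" only
  std::shared_ptr<ToyRtx> inner;  // meaningful for "subreg" only
};

using ToyRtxPtr = std::shared_ptr<ToyRtx>;

// Create a register node.  In this model the caller keeps and reuses the
// one object it gets back, so registers behave as shared.
ToyRtxPtr make_reg(const std::string &mode, int regno) {
  auto x = std::make_shared<ToyRtx>();
  x->code = "reg";
  x->mode = mode;
  x->regno = regno;
  return x;
}

// Every call allocates a fresh wrapper, modelling "subregs are not shared".
ToyRtxPtr make_subreg(const std::string &mode, const ToyRtxPtr &inner) {
  auto x = std::make_shared<ToyRtx>();
  x->code = "subreg";
  x->mode = mode;
  x->inner = inner;
  return x;
}

// Structural comparison: two nodes are equal if they describe the same value,
// regardless of whether they are the same object.
bool toy_equal_p(const ToyRtxPtr &a, const ToyRtxPtr &b) {
  if (a == b)
    return true;
  if (!a || !b)
    return false;
  if (a->code != b->code || a->mode != b->mode)
    return false;
  if (a->code == "reg")
    return a->regno == b->regno;
  return toy_equal_p(a->inner, b->inner);
}

// Two V16QI views of the same V4SI register: distinct objects, same value.
void check_example() {
  ToyRtxPtr r = make_reg("V4SI", 100);
  ToyRtxPtr op0 = make_subreg("V16QI", r);
  ToyRtxPtr op1 = make_subreg("V16QI", r);
  assert(op0 != op1);             // identity check fails
  assert(toy_equal_p(op0, op1));  // structural check succeeds
}
```

In this model an identity check on the two subreg wrappers always fails, so a caller relying on it would treat a one-input permutation as a two-input one.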
I'm not sure forcing subregs to a fresh register is a good idea --
it caused problems for copysign & co. -- but that's not something
to fiddle with during stage 4. Using op0 == op1 for rtx equality
is independently wrong, so we might as well just fix that for now.
The patch gets rid of extra MOVs that are a regression from GCC 14.
The testcase is based on one from Kugan, itself based on TSVC.
gcc/
PR target/115258
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const): Use
d.one_vector_p to decide whether op1 should be a copy of op0.
gcc/testsuite/
PR target/115258
* gcc.target/aarch64/pr115258_2.c: New test.
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>