author | Jennifer Schmitz <jschmitz@nvidia.com> | 2025-02-14 00:46:13 -0800 |
---|---|---|
committer | Jennifer Schmitz <jschmitz@nvidia.com> | 2025-05-07 15:29:11 +0200 |
commit | 210d06502f22964c7214586c54f8eb54a6965bfd (patch) | |
tree | 66ecaa0ac2a26351f045078561d1b43ac5610d43 /libjava/classpath | |
parent | 9565076f9b810541aeb63cb621d694326aa12216 (diff) | |
AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.
SVE loads and stores whose predicate selects only the bottom 8, 16, 32, 64,
or 128 bits of a register can be folded to ASIMD LDR/STR, which removes the
need to materialize the predicate.
For example,
  svuint8_t foo (uint8_t *x) {
    return svld1 (svwhilelt_b8 (0, 16), x);
  }
was previously compiled to:
  foo:
          ptrue   p3.b, vl16
          ld1b    z0.b, p3/z, [x0]
          ret
and is now compiled to:
  foo:
          ldr     q0, [x0]
          ret
The optimization is applied during the expand pass and was implemented
by making the following changes to maskload<mode><vpred> and
maskstore<mode><vpred>:
- the existing define_insns were renamed, and new define_expands for maskloads
and maskstores were added that use nonmemory_operand as the operand predicate
for the governing SVE predicate, so that it accepts both register operands and
constant-vector operands.
- if the SVE predicate is a constant vector that matches one of the patterns
described above, an ASIMD load/store is emitted instead of the SVE load/store.
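The fold condition can be illustrated with a small portable C model. This is only a sketch of the rule stated in this message (a VL1, VL2, VL4, VL8, or VL16 pattern whose active elements cover 8 to 128 bits), not the code added to aarch64.cc. The function names is_foldable_vl, folded_width_bits, and ldr_reg_prefix are hypothetical, and mapping each width to a b/h/s/d/q register view is an assumption based on the LDR/STR forms the architecture provides.

```c
#include <stdbool.h>

/* Hypothetical helper: true if VL is one of the element counts named by
   the VL1, VL2, VL4, VL8 and VL16 ptrue patterns.  */
static bool
is_foldable_vl (unsigned vl)
{
  return vl == 1 || vl == 2 || vl == 4 || vl == 8 || vl == 16;
}

/* Return the number of bits moved (8, 16, 32, 64 or 128) if a predicate
   with VL leading active elements of ELEM_BITS bits each would qualify
   under the rule above, or 0 if it would not.  */
static unsigned
folded_width_bits (unsigned vl, unsigned elem_bits)
{
  if (!is_foldable_vl (vl))
    return 0;
  if (elem_bits != 8 && elem_bits != 16 && elem_bits != 32 && elem_bits != 64)
    return 0;
  unsigned width = vl * elem_bits;
  return width <= 128 ? width : 0;
}

/* Map a width in bits to the SIMD&FP register view that LDR/STR accept
   (b, h, s, d or q); return 0 for any other width.  */
static char
ldr_reg_prefix (unsigned width_bits)
{
  switch (width_bits)
    {
    case 8:   return 'b';
    case 16:  return 'h';
    case 32:  return 's';
    case 64:  return 'd';
    case 128: return 'q';
    default:  return 0;
    }
}
```

Under this model, the svwhilelt_b8 (0, 16) example corresponds to folded_width_bits (16, 8), which yields 128 and hence the q register view seen in the new assembly.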
The patch implements the optimization for LD1 and ST1, for 8-bit, 16-bit,
32-bit, 64-bit, and 128-bit moves, for all full SVE data vector modes.
Follow-up patches for LD2/3/4 and ST2/3/4 and potentially partial SVE vector
modes are planned.
The patch was bootstrapped and tested on aarch64-linux-gnu with no regressions.
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR target/117978
* config/aarch64/aarch64-protos.h: Declare
aarch64_emit_load_store_through_mode and aarch64_expand_maskloadstore.
* config/aarch64/aarch64-sve.md
(maskload<mode><vpred>): New define_expand folding maskloads with
certain predicate patterns to ASIMD loads.
(*aarch64_maskload<mode><vpred>): Renamed from maskload<mode><vpred>.
(maskstore<mode><vpred>): New define_expand folding maskstores with
certain predicate patterns to ASIMD stores.
(*aarch64_maskstore<mode><vpred>): Renamed from maskstore<mode><vpred>.
* config/aarch64/aarch64.cc
(aarch64_emit_load_store_through_mode): New function emitting a
load/store through subregs of a given mode.
(aarch64_emit_sve_pred_move): Refactor to use
aarch64_emit_load_store_through_mode.
(aarch64_expand_maskloadstore): New function to emit ASIMD loads/stores
for maskloads/stores with SVE predicates with VL1, VL2, VL4, VL8, or
VL16 patterns.
(aarch64_partial_ptrue_length): New function returning number of leading
set bits in a predicate.
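As a rough illustration of what "number of leading set bits in a predicate" means, the following portable C sketch counts the leading active lanes of a constant predicate. It is a simplified model that assumes the predicate is given as one boolean flag per lane; it does not reflect GCC's RTL representation or the actual body of aarch64_partial_ptrue_length. The name leading_active_lanes is hypothetical, and returning 0 for a predicate that is not a contiguous prefix is a choice made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: PRED holds one flag per lane of a constant predicate.
   Return the number of leading active lanes if the active lanes form a
   contiguous prefix (all ones, then all zeros).  Return 0 for an all-false
   predicate or for any other layout.  */
static size_t
leading_active_lanes (const bool *pred, size_t nlanes)
{
  size_t len = 0;

  /* Count the run of active lanes at the start.  */
  while (len < nlanes && pred[len])
    len++;

  /* Any active lane after the first inactive one disqualifies the
     predicate as a partial ptrue.  */
  for (size_t i = len; i < nlanes; i++)
    if (pred[i])
      return 0;

  return len;
}
```

A predicate with four active lanes followed only by inactive lanes gives 4, whereas an alternating predicate gives 0 because its active lanes are not a prefix.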
gcc/testsuite/
PR target/117978
* gcc.target/aarch64/sve/acle/general/whilelt_5.c: Adjust expected
outcome.
* gcc.target/aarch64/sve/ldst_ptrue_pat_128_to_neon.c: New test.
* gcc.target/aarch64/sve/while_7.c: Adjust expected outcome.
* gcc.target/aarch64/sve/while_9.c: Adjust expected outcome.