riscv-gnu-toolchain/gcc.git

author	Jennifer Schmitz <jschmitz@nvidia.com>	2025-02-14 00:46:13 -0800
committer	Jennifer Schmitz <jschmitz@nvidia.com>	2025-05-07 15:29:11 +0200
commit	210d06502f22964c7214586c54f8eb54a6965bfd (patch)
tree	66ecaa0ac2a26351f045078561d1b43ac5610d43 /libjava
parent	9565076f9b810541aeb63cb621d694326aa12216 (diff)
download	gcc-210d06502f22964c7214586c54f8eb54a6965bfd.zip gcc-210d06502f22964c7214586c54f8eb54a6965bfd.tar.gz gcc-210d06502f22964c7214586c54f8eb54a6965bfd.tar.bz2

AArch64: Fold SVE load/store with certain ptrue patterns to LDR/STR.

SVE loads/stores using predicates that select the bottom 8, 16, 32, 64,
or 128 bits of a register can be folded to ASIMD LDR/STR, thus avoiding the
predicate.
For example,

svuint8_t foo (uint8_t *x) {
  return svld1 (svwhilelt_b8 (0, 16), x);
}

was previously compiled to:

foo:
	ptrue	p3.b, vl16
	ld1b	z0.b, p3/z, [x0]
	ret

and is now compiled to:

foo:
	ldr	q0, [x0]
	ret

The optimization is applied during the expand pass and was implemented
by making the following changes to maskload<mode><vpred> and
maskstore<mode><vpred>:
- the existing define_insns were renamed and new define_expands for maskloads
  and maskstores were added with nonmemory_operand as predicate such that the
  SVE predicate matches both register operands and constant-vector operands.
- if the SVE predicate is a constant vector and contains a pattern as
  described above, an ASIMD load/store is emitted instead of the SVE load/store.

The patch implements the optimization for LD1 and ST1, for 8-bit, 16-bit,
32-bit, 64-bit, and 128-bit moves, for all full SVE data vector modes.
Follow-up patches for LD2/3/4 and ST2/3/4 and potentially partial SVE vector
modes are planned.

The patch was bootstrapped and tested on aarch64-linux-gnu, no regression.

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	PR target/117978
	* config/aarch64/aarch64-protos.h: Declare
	aarch64_emit_load_store_through_mode and aarch64_sve_maskloadstore.
	* config/aarch64/aarch64-sve.md
	(maskload<mode><vpred>): New define_expand folding maskloads with
	certain predicate patterns to ASIMD loads.
	(*aarch64_maskload<mode><vpred>): Renamed from maskload<mode><vpred>.
	(maskstore<mode><vpred>): New define_expand folding maskstores with
	certain predicate patterns to ASIMD stores.
	(*aarch64_maskstore<mode><vpred>): Renamed from maskstore<mode><vpred>.
	* config/aarch64/aarch64.cc
	(aarch64_emit_load_store_through_mode): New function emitting a
	load/store through subregs of a given mode.
	(aarch64_emit_sve_pred_move): Refactor to use
	aarch64_emit_load_store_through_mode.
	(aarch64_expand_maskloadstore): New function to emit ASIMD loads/stores
	for maskloads/stores with SVE predicates with VL1, VL2, VL4, VL8, or
	VL16 patterns.
	(aarch64_partial_ptrue_length): New function returning number of leading
	set bits in a predicate.

gcc/testsuite/
	PR target/117978
	* gcc.target/aarch64/sve/acle/general/whilelt_5.c: Adjust expected
	outcome.
	* gcc.target/aarch64/sve/ldst_ptrue_pat_128_to_neon.c: New test.
	* gcc.target/aarch64/sve/while_7.c: Adjust expected outcome.
	* gcc.target/aarch64/sve/while_9.c: Adjust expected outcome.
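
Illustration only (not part of the patch): the folding condition described
above can be modeled in portable C. This is a simplified sketch, assuming a
predicate is represented as one flag per byte lane; the helper names
partial_ptrue_length and foldable_move_bits are hypothetical and do not
reproduce GCC's rtx-based aarch64_partial_ptrue_length or
aarch64_expand_maskloadstore. It shows only the idea that a constant predicate
whose set lanes form a leading run covering exactly 8, 16, 32, 64, or 128 bits
could be replaced by a fixed-width move.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model, not GCC code: PRED holds one flag per byte lane.
   Return the number of leading set lanes if every set lane belongs to
   that leading run, otherwise return 0.  */
static size_t
partial_ptrue_length (const uint8_t *pred, size_t n)
{
  size_t len = 0;
  while (len < n && pred[len])
    len++;
  /* Any set lane after the first clear lane breaks the pattern.  */
  for (size_t i = len; i < n; i++)
    if (pred[i])
      return 0;
  return len;
}

/* Return the width in bits of the fixed-size move that could stand in
   for the predicated access, or 0 if the pattern is not one of the
   foldable sizes (1, 2, 4, 8, or 16 bytes in this model).  */
static unsigned
foldable_move_bits (const uint8_t *pred, size_t n)
{
  size_t bytes = partial_ptrue_length (pred, n);
  switch (bytes)
    {
    case 1: case 2: case 4: case 8: case 16:
      return (unsigned) bytes * 8;
    default:
      return 0;
    }
}
```

In this model, the svwhilelt_b8 (0, 16) predicate from the example corresponds
to 16 leading set byte lanes, which maps to a 128-bit move, matching the
"ldr q0" in the new output. A run of 3 or 17 lanes, or a pattern with a gap,
would be left as a predicated access.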

Diffstat (limited to 'libjava')

0 files changed, 0 insertions, 0 deletions

