diff options
author | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2023-06-21 13:43:26 +0100 |
---|---|---|
committer | Kyrylo Tkachov <kyrylo.tkachov@arm.com> | 2023-06-21 13:43:26 +0100 |
commit | b375c5340b55d6e7ee703717831dd27528962570 (patch) | |
tree | 5d740c7e8f8ac1f4cb763e4c0298b7980a2a8c48 /Makefile.def | |
parent | 4d9d207c668e757adefcfe5368d86bcc008fa4db (diff) | |
download | gcc-b375c5340b55d6e7ee703717831dd27528962570.zip gcc-b375c5340b55d6e7ee703717831dd27528962570.tar.gz gcc-b375c5340b55d6e7ee703717831dd27528962570.tar.bz2 |
aarch64: Avoid same input and output Z register for gather loads
The architecture recommends that load-gather instructions avoid using the same
Z register for the load address and the destination, and the Software Optimization
Guides for Arm cores recommend that as well.
This means that for code like:
svuint64_t
food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
{
return svadd_u64_x (p, a, svld1_gather_offset(p, in, offsets));
}
we'll want to avoid generating the current:
food:
ld1d z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
add z0.d, z1.d, z0.d
ret
However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.
This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* Tying the input Z register operand in the original alternatives to 0
* Penalising the original alternatives with '?'
This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable an regular change.
The benchmark numbers on a Neoverse V1 on fprate look okay:
diff
503.bwaves_r 0.00%
507.cactuBSSN_r 0.00%
508.namd_r 0.00%
510.parest_r 0.55%
511.povray_r 0.22%
519.lbm_r 0.00%
521.wrf_r 0.00%
526.blender_r 0.00%
527.cam4_r 0.56%
538.imagick_r 0.00%
544.nab_r 0.00%
549.fotonik3d_r 0.00%
554.roms_r 0.00%
fprate 0.10%
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (mask_gather_load<mode><v_int_container>):
Add alternatives to prefer to avoid same input and output Z register.
(mask_gather_load<mode><v_int_container>): Likewise.
(*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
(*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
(*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
Likewise.
(@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_sxtw): Likewise.
(*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
<SVE_2BHSI:mode>_uxtw): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(@aarch64_ldff1_gather<mode>): Likewise.
(*aarch64_ldff1_gather<mode>_sxtw): Likewise.
(*aarch64_ldff1_gather<mode>_uxtw): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
<VNx4_NARROW:mode>): Likewise.
(@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_sxtw): Likewise.
(*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
<VNx2_NARROW:mode>_uxtw): Likewise.
* config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>): Likewise.
(@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
<SVE_PARTIAL_I:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/gather_earlyclobber.c: New test.
* gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.
Diffstat (limited to 'Makefile.def')
0 files changed, 0 insertions, 0 deletions