author     Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-09-29 11:00:14 +0100
committer  Kyrylo Tkachov <kyrylo.tkachov@arm.com>  2021-09-29 11:00:14 +0100
commit     8f95e3c04d659d541ca4937b3df2f1175a1c5f05
tree       7d594f44833d3cfb7697805d1d528aee5e2d34da
parent     d3e7bb15e28c554bf4484a912f3b9c18c60ec68f
aarch64: Improve size optimisation heuristic for setmem expansion
This patch adjusts the setmem expansion in the backend to track the number of operations it generates
for the DUP + STR/STP inline sequences. That lets us compare the size/complexity of the inline sequence
against the alternative of returning "false" and emitting a call to memset instead.
The simple heuristic change here is: if the expansion would emit more than 4 operations, don't bother
and just call memset. The number 4 is chosen because, in the worst case, a memset call itself needs
4 instructions: 3 to move the arguments into the right registers and 1 for the call.
The speed optimisation decisions are not affected, though I want to extend these expansions in a later
patch and would like to reuse this op-counting logic there. In any case this patch should make sense on its own.
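The shape of the decision is roughly the following (a minimal sketch in plain C, not the actual
aarch64_expand_setmem code: inline_ops, MEMSET_CALL_COST and expand_setmem_inline_p are hypothetical
names, and the op count is only an approximation of what the expansion emits):

/* Sketch only: NOT the real aarch64_expand_setmem logic.  */

#define MEMSET_CALL_COST 4  /* Worst case for a memset call:
                               3 argument moves + 1 call insn.  */

/* Approximate the DUP + STR/STP count for an inline set of LEN
   bytes: one DUP of the value, one STP q,q per 32 bytes, and one
   trailing STR q for any remainder (the real expansion may need an
   extra overlapping store).  */
static unsigned
inline_ops (unsigned len)
{
  unsigned ops = 1;      /* DUP v0.16b, wN.  */
  ops += len / 32;       /* STP q0, q0, [...].  */
  if (len % 32 != 0)
    ops++;               /* STR q0, [...].  */
  return ops;
}

/* Expand inline when optimising for speed; for size, only when the
   inline sequence is no longer than the worst-case memset call.  */
static int
expand_setmem_inline_p (unsigned len, int optimize_size)
{
  return !optimize_size || inline_ops (len) <= MEMSET_CALL_COST;
}

For the 127-byte case below this counts at least 5 ops (1 DUP + 3 STP + 1 STR), which exceeds the
4-instruction memset call, so at -Os we fall back to the libcall.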
For the code:
#include <stdint.h>

void __attribute__((__noinline__))
set127byte (int64_t *src, int c)
{
  __builtin_memset (src, c, 127);
}

void __attribute__((__noinline__))
set128byte (int64_t *src, int c)
{
  __builtin_memset (src, c, 128);
}
when optimising for size we now get just an immediate move + a call to memset (2 instructions), whereas before we'd have generated:
set127byte(long*, int):
        dup     v0.16b, w1
        str     q0, [x0, 96]
        stp     q0, q0, [x0]
        stp     q0, q0, [x0, 32]
        stp     q0, q0, [x0, 64]
        str     q0, [x0, 111]
        ret
set128byte(long*, int):
        dup     v0.16b, w1
        stp     q0, q0, [x0]
        stp     q0, q0, [x0, 32]
        stp     q0, q0, [x0, 64]
        stp     q0, q0, [x0, 96]
        ret
which is clearly undesirable for -Os.
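After this patch the -Os output is just the immediate move plus the call, along these lines (a sketch reconstructed from the description above; the exact registers and the tail-call form are assumptions, not output copied from the commit):

set127byte(long*, int):
        mov     x2, 127         // size argument; x0 (dst) and w1 (value) are already in place
        b       memset          // tail call to the library memset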
I've adjusted the recently-added gcc.target/aarch64/memset-strict-align-1.c testcase to use a bigger struct
and to compile for speed, since with this patch we now just call memset at -Os rather than expanding inline.
That is the right decision for size optimisation (the resulting code is indeed shorter).
With -O2 and the new struct size we still try the SIMD expansion and still trigger the path that the testcase is meant to exercise.
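The new test itself isn't quoted here; a hypothetical shape for such an -Os test (the dg- directives and scan patterns below are my guesses, not the actual memset-corner-cases-2.c) would be:

/* Hypothetical sketch, not the actual memset-corner-cases-2.c.  */
/* { dg-do compile } */
/* { dg-options "-Os" } */

#include <stdint.h>

void __attribute__((__noinline__))
set127byte (int64_t *src, int c)
{
  __builtin_memset (src, c, 127);
}

/* At -Os we expect a memset call rather than an inline DUP/STP
   sequence.  */
/* { dg-final { scan-assembler "memset" } } */
/* { dg-final { scan-assembler-not "dup\\t" } } */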
gcc/ChangeLog:

2021-09-27  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

        * config/aarch64/aarch64.c (aarch64_expand_setmem): Count number of
        emitted operations and adjust heuristic for code size.

gcc/testsuite/ChangeLog:

2021-09-27  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

        * gcc.target/aarch64/memset-corner-cases-2.c: New test.
        * gcc.target/aarch64/memset-strict-align-1.c: Adjust.