author     H.J. Lu <hjl.tools@gmail.com>  2021-06-02 07:15:45 -0700
committer  H.J. Lu <hjl.tools@gmail.com>  2021-07-01 08:11:20 -0700
commit     edafb35bdadf309ebb9d1eddc5549f9e1ad49c09
tree       14d2f553da601c4e2dbb3d8446d43dffa78c5189
parent     d63454815de3b93331025bd990efdad5296ae706

x86: Convert CONST_WIDE_INT/CONST_VECTOR to broadcast
1. Update move expanders to convert CONST_WIDE_INT and CONST_VECTOR
operands to a vector broadcast from an integer when AVX is enabled.
2. Add ix86_gen_scratch_sse_rtx to return a scratch SSE register that
does not increase the stack alignment requirement and blocks
transformation by the combine pass.
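The core question behind item 1, whether a wide constant is one integer element repeated across the whole vector, can be sketched in C. This is a minimal illustration under stated assumptions, not GCC's implementation: it inspects the constant's raw bytes rather than RTL, the names `broadcast_element` and `smallest_broadcast` are hypothetical, it assumes a little-endian host (as on x86) when reading the element back, and it applies the "16 bytes or bigger" threshold from the ChangeLog entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not GCC code): return true if the SIZE-byte
   constant in BYTES is a single ELEM_SIZE-byte element repeated, so it
   could be materialized as a broadcast of a scalar integer.  Constants
   smaller than 16 bytes are rejected, mirroring the ChangeLog threshold.
   On success the element is stored in *ELEM.  */
static bool
broadcast_element (const uint8_t *bytes, size_t size, size_t elem_size,
                   uint64_t *elem)
{
  if (size < 16 || elem_size == 0 || elem_size > 8 || size % elem_size != 0)
    return false;
  /* Every later element must match the first one byte for byte.  */
  for (size_t i = elem_size; i < size; i += elem_size)
    if (memcmp (bytes, bytes + i, elem_size) != 0)
      return false;
  uint64_t v = 0;
  memcpy (&v, bytes, elem_size);  /* assumes a little-endian host */
  *elem = v;
  return true;
}

/* Try element widths of 1, 2, 4 and 8 bytes, narrowest first.  Returns
   the width in bytes of the smallest repeating element, or 0 if the
   constant cannot be expressed as a broadcast.  */
static size_t
smallest_broadcast (const uint8_t *bytes, size_t size, uint64_t *elem)
{
  for (size_t w = 1; w <= 8; w *= 2)
    if (broadcast_element (bytes, size, w, elem))
      return w;
  return 0;
}
```

Trying the narrowest width first means a constant made entirely of 0x2a bytes is reported as a byte broadcast rather than a wider one, while a 32-byte constant built from the dword 0x12345678 repeated eight times is reported with width 4.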
A small memset benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/memset/broadcast
shows that broadcast is slightly faster on an Intel Core i7-8559U:
$ make
gcc -g -I. -O2 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -g -c -o vec_dup_sse2.o vec_dup_sse2.S
gcc -o test test.o memory.o broadcast.o vec_dup_sse2.o
./test
memory : 147215
broadcast : 121213
vec_dup_sse2: 171366
$
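The broadcast.S source is not reproduced here, but the idea behind a broadcast-based memset is to replicate the fill byte across a register once and then store that register repeatedly. The following is a rough scalar sketch of that idea, assuming a 64-bit general-purpose word in place of an SSE/AVX register; `replicate_byte` and `fill_bytes` are hypothetical names, not part of the benchmark.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate byte C into every byte of a 64-bit word, the scalar
   analogue of a byte broadcast (e.g. 0x2a -> 0x2a2a2a2a2a2a2a2a).  */
static uint64_t
replicate_byte (unsigned char c)
{
  return (uint64_t) c * UINT64_C (0x0101010101010101);
}

/* Hypothetical memset-like routine: store the replicated word 8 bytes
   at a time, then finish the tail byte by byte.  memcpy is used for the
   word stores so an unaligned DST is handled portably.  */
static void *
fill_bytes (void *dst, int c, size_t n)
{
  unsigned char *p = dst;
  uint64_t word = replicate_byte ((unsigned char) c);
  for (; n >= sizeof word; n -= sizeof word, p += sizeof word)
    memcpy (p, &word, sizeof word);
  while (n--)
    *p++ = (unsigned char) c;
  return dst;
}
```

A vector broadcast instruction such as vpbroadcastb plays the role of replicate_byte across all lanes of an XMM/YMM/ZMM register, so each store covers 16 to 64 bytes instead of 8.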
The broadcast version is also smaller:
$ size memory.o broadcast.o
   text    data     bss     dec     hex filename
    132       0       0     132      84 memory.o
    122       0       0     122      7a broadcast.o
$
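For scale, the text-section difference reported by size works out to:

```latex
\frac{132 - 122}{132} = \frac{10}{132} \approx 7.6\%
```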
3. Update PR 87767 tests to expect integer broadcast instead of broadcast
from memory.
4. Update avx512f_cond_move.c to expect integer broadcast.
A small benchmark:
https://gitlab.com/x86-benchmarks/microbenchmark/-/tree/vpaddd/broadcast
shows that integer broadcast is faster than embedded memory broadcast:
$ make
gcc -g -I. -O2 -march=skylake-avx512 -c -o test.o test.c
gcc -g -c -o memory.o memory.S
gcc -g -c -o broadcast.o broadcast.S
gcc -o test test.o memory.o broadcast.o
./test
memory : 425538
broadcast : 375260
$
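The vpaddd benchmark sources are not reproduced here. As a hedged illustration, the C loop below is the kind of code that adds one 32-bit constant to every element and that GCC may auto-vectorize into vpaddd with the constant held in a vector whose lanes are all equal. Whether that constant comes from an integer broadcast or an embedded memory broadcast operand depends on the GCC version and options (depending on the release, auto-vectorization may require -O3 rather than -O2); `add_constant` and the value 0x01020304 are illustrative assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: add one 32-bit constant to every element.  When
   auto-vectorized for AVX-512, the constant becomes a vector with every
   lane equal to 0x01020304, i.e. a broadcast candidate.  */
void
add_constant (int32_t *restrict a, size_t n)
{
  for (size_t i = 0; i < n; i++)
    a[i] += 0x01020304;
}
```

Inspecting the output of gcc -O3 -march=skylake-avx512 -S on such a loop is one way to check which form of broadcast a given compiler picks.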
gcc/
PR target/100865
* config/i386/i386-expand.c (ix86_expand_vector_init_duplicate):
New prototype.
(ix86_byte_broadcast): New function.
(ix86_convert_const_wide_int_to_broadcast): Likewise.
(ix86_expand_move): Convert CONST_WIDE_INT to broadcast if mode
size is 16 bytes or bigger.
(ix86_broadcast_from_integer_constant): New function.
(ix86_expand_vector_move): Convert CONST_WIDE_INT and CONST_VECTOR
to broadcast if mode size is 16 bytes or bigger.
* config/i386/i386-protos.h (ix86_gen_scratch_sse_rtx): New
prototype.
* config/i386/i386.c (ix86_gen_scratch_sse_rtx): New function.
gcc/testsuite/
PR target/100865
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Expect integer
broadcast.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512f_cond_move.c: Also pass
-mprefer-vector-width=512 and expect integer broadcast.
* gcc.target/i386/pr100865-1.c: New test.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-6a.c: Likewise.
* gcc.target/i386/pr100865-6b.c: Likewise.
* gcc.target/i386/pr100865-6c.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7b.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-8c.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr100865-9c.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-11a.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Likewise.
* gcc.target/i386/pr100865-11c.c: Likewise.
* gcc.target/i386/pr100865-12a.c: Likewise.
* gcc.target/i386/pr100865-12b.c: Likewise.
* gcc.target/i386/pr100865-12c.c: Likewise.