diff options
author | Roger Sayle <roger@nextmovesoftware.com> | 2024-01-09 08:28:42 +0000 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2024-01-09 08:28:42 +0000 |
commit | 6a67fdcb3f0cc8be47b49ddd246d0c50c3770800 (patch) | |
tree | ca489328ccad44a66d80406a9f695a1b7cf4bd83 /gcc/cp/constexpr.cc | |
parent | 43c4f982113076ad54c3405f865cc63b0a5ba5aa (diff) | |
download | gcc-6a67fdcb3f0cc8be47b49ddd246d0c50c3770800.zip gcc-6a67fdcb3f0cc8be47b49ddd246d0c50c3770800.tar.gz gcc-6a67fdcb3f0cc8be47b49ddd246d0c50c3770800.tar.bz2 |
i386: PR target/112992: Optimize mode for broadcast of constants.
The issue addressed by this patch is that when initializing vectors by
broadcasting integer constants, the compiler has the flexibility to
select the most appropriate vector mode to perform the broadcast, as
long as the resulting vector has an identical bit pattern.
For example, the following constants are all equivalent:
V4SImode {0x01010101, 0x01010101, 0x01010101, 0x01010101 }
V8HImode {0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101, 0x0101 }
V16QImode {0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, ... 0x01 }
So instruction sequences that construct any of these can be used to
construct the others (with a suitable cast/SUBREG).
On x86_64, it turns out that broadcasts of SImode constants are preferred,
as DImode constants often require a longer movabs instruction, and
HImode and QImode broadcasts require multiple uops on some architectures.
Hence, SImode is always the equal shortest/fastest implementation.
Examples of this improvement, can be seen in the testsuite.
gcc.target/i386/pr102021.c
Before:
0: 48 b8 0c 00 0c 00 0c movabs $0xc000c000c000c,%rax
7: 00 0c 00
a: 62 f2 fd 28 7c c0 vpbroadcastq %rax,%ymm0
10: c3 retq
After:
0: b8 0c 00 0c 00 mov $0xc000c,%eax
5: 62 f2 7d 28 7c c0 vpbroadcastd %eax,%ymm0
b: c3 retq
and
gcc.target/i386/pr90773-17.c:
Before:
0: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 7 <foo+0x7>
7: b8 0c 00 00 00 mov $0xc,%eax
c: 62 f2 7d 08 7a c0 vpbroadcastb %eax,%xmm0
12: 62 f1 7f 08 7f 02 vmovdqu8 %xmm0,(%rdx)
18: c7 42 0f 0c 0c 0c 0c movl $0xc0c0c0c,0xf(%rdx)
1f: c3 retq
After:
0: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 7 <foo+0x7>
7: b8 0c 0c 0c 0c mov $0xc0c0c0c,%eax
c: 62 f2 7d 08 7c c0 vpbroadcastd %eax,%xmm0
12: 62 f1 7f 08 7f 02 vmovdqu8 %xmm0,(%rdx)
18: c7 42 0f 0c 0c 0c 0c movl $0xc0c0c0c,0xf(%rdx)
1f: c3 retq
where according to Agner Fog's instruction tables broadcastd is slightly
faster on some microarchitectures, for example Knight's Landing.
2024-01-09 Roger Sayle <roger@nextmovesoftware.com>
Hongtao Liu <hongtao.liu@intel.com>
gcc/ChangeLog
PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Allow call to
ix86_expand_vector_init_duplicate to fail, and return NULL_RTX.
(ix86_broadcast_from_constant): Revert recent change; Return a
suitable MEMREF independently of mode/target combinations.
(ix86_expand_vector_move): Allow ix86_expand_vector_init_duplicate
to decide whether expansion is possible/preferrable. Only try
forcing DImode constants to memory (and trying again) if calling
ix86_expand_vector_init_duplicate fails with an DImode immediate
constant.
(ix86_expand_vector_init_duplicate) <case E_V2DImode>: Try using
V4SImode for suitable immediate constants.
<case E_V4DImode>: Try using V8SImode for suitable constants.
<case E_V4HImode>: Fail for CONST_INT_P, i.e. use constant pool.
<case E_V2HImode>: Likewise.
<case E_V8HImode>: For CONST_INT_P try using V4SImode via widen.
<case E_V16QImode>: For CONT_INT_P try using V8HImode via widen.
<label widen>: Handle CONT_INTs via simplify_binary_operation.
Allow recursive calls to ix86_expand_vector_init_duplicate to fail.
<case E_V16HImode>: For CONST_INT_P try V8SImode via widen.
<case E_V32QImode>: For CONST_INT_P try V16HImode via widen.
(ix86_expand_vector_init): Move try using a broadcast for all_same
with ix86_expand_vector_init_duplicate before using constant pool.
gcc/testsuite/ChangeLog
* gcc.target/i386/auto-init-8.c: Update test case.
* gcc.target/i386/avx512f-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512f-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Likewise.
* gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-10a.c: Likewise.
* gcc.target/i386/pr100865-10b.c: Likewise.
* gcc.target/i386/pr100865-2.c: Likewise.
* gcc.target/i386/pr100865-3.c: Likewise.
* gcc.target/i386/pr100865-4a.c: Likewise.
* gcc.target/i386/pr100865-4b.c: Likewise.
* gcc.target/i386/pr100865-5a.c: Likewise.
* gcc.target/i386/pr100865-5b.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr102021.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
Diffstat (limited to 'gcc/cp/constexpr.cc')
0 files changed, 0 insertions, 0 deletions