author | Roger Sayle <roger@nextmovesoftware.com> | 2023-10-24 16:45:08 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2023-10-24 16:45:08 +0100 |
commit | 99a6c1065de2db04d0f56f4b2cc89acecf21b72e | |
tree | 5a8677153920ebe157ae5f67104544ebb72dc5c6 /gcc/gcov-io.h | |
parent | 35f4e95265b9e89c923b349988654f1da6348a44 | |
i386: Fine tune STV register conversion costs for -Os.
The eagle-eyed may have spotted that my recent testcases for DImode shifts
on x86_64 included -mno-stv in the dg-options. This is because the
Scalar-To-Vector (STV) pass currently transforms these shifts to use
SSE vector operations, producing larger code even with -Os. The issue
is that compute_convert_gain currently underestimates the size of the
instructions required for inter-unit moves, which is corrected by the
patch below.
For the simple test case:
unsigned long long shl1(unsigned long long x) { return x << 1; }
without this patch, GCC -m32 -Os -mavx2 currently generates:
shl1: push %ebp // 1 byte
mov %esp,%ebp // 2 bytes
vmovq 0x8(%ebp),%xmm0 // 5 bytes
pop %ebp // 1 byte
vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes
vmovd %xmm0,%eax // 4 bytes
vpextrd $0x1,%xmm0,%edx // 6 bytes
ret // 1 byte = 24 bytes total
with this patch, we now generate the shorter sequence:
shl1: push %ebp // 1 byte
mov %esp,%ebp // 2 bytes
mov 0x8(%ebp),%eax // 3 bytes
mov 0xc(%ebp),%edx // 3 bytes
pop %ebp // 1 byte
add %eax,%eax // 2 bytes
adc %edx,%edx // 2 bytes
ret // 1 byte = 15 bytes total
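The scalar sequence above doubles the 64-bit value on its two 32-bit halves: add on the low half, then adc on the high half to fold in the carry. The following is a minimal sketch of that arithmetic in portable C, assuming unsigned long long is 64 bits wide; shl1_add_adc and check_shl1_samples are hypothetical helpers written for illustration and are not part of GCC or of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The testcase from the commit message. */
unsigned long long shl1(unsigned long long x) { return x << 1; }

/* Hypothetical helper (not from GCC): models the add/adc pair on the
   two 32-bit halves of a 64-bit value.  */
unsigned long long shl1_add_adc(unsigned long long x)
{
    uint32_t lo = (uint32_t)x;          /* value held in %eax */
    uint32_t hi = (uint32_t)(x >> 32);  /* value held in %edx */
    uint32_t carry = lo >> 31;          /* carry flag set by add %eax,%eax */

    lo = lo + lo;                       /* add %eax,%eax */
    hi = hi + hi + carry;               /* adc %edx,%edx */

    return ((unsigned long long)hi << 32) | lo;
}

/* Compares both versions on a few boundary values and returns the
   number of samples checked.  */
size_t check_shl1_samples(void)
{
    static const unsigned long long samples[] = {
        0ULL, 1ULL, 0x80000000ULL, 0xffffffffULL,
        0x8000000000000001ULL, 0xffffffffffffffffULL
    };
    size_t n = sizeof samples / sizeof samples[0];

    for (size_t i = 0; i < n; i++)
        assert(shl1(samples[i]) == shl1_add_adc(samples[i]));
    return n;
}
```

Calling check_shl1_samples() from a small main is enough to exercise the sketch; the sample 0x80000000 is the case where the carry out of the low half matters.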
Benchmarking using CSiBE shows that this patch saves 1361 bytes
when compiling with -m32 -Os, and saves 172 bytes when compiling
with -Os.
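For the shl1 example, the size difference can be tallied directly from the per-instruction byte counts in the two listings. The sketch below only adds up those published numbers; it does not reproduce the cost model inside compute_convert_gain, and the struct, enum and function names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Marks the instructions that move data from an XMM register to an
   integer register; everything else is OTHER.  */
enum kind { OTHER, XMM_TO_GPR };

struct insn { const char *text; int bytes; enum kind kind; };

/* Byte sizes copied from the comments in the commit message.  */
static const struct insn before_patch[] = {
    { "push %ebp",                1, OTHER },
    { "mov %esp,%ebp",            2, OTHER },
    { "vmovq 0x8(%ebp),%xmm0",    5, OTHER },
    { "pop %ebp",                 1, OTHER },
    { "vpaddq %xmm0,%xmm0,%xmm0", 4, OTHER },
    { "vmovd %xmm0,%eax",         4, XMM_TO_GPR },
    { "vpextrd $0x1,%xmm0,%edx",  6, XMM_TO_GPR },
    { "ret",                      1, OTHER },
};

static const struct insn after_patch[] = {
    { "mov 0x8(%ebp),%eax", 3, OTHER },
    { "mov 0xc(%ebp),%edx", 3, OTHER },
    { "push %ebp",          1, OTHER },
    { "mov %esp,%ebp",      2, OTHER },
    { "pop %ebp",           1, OTHER },
    { "add %eax,%eax",      2, OTHER },
    { "adc %edx,%edx",      2, OTHER },
    { "ret",                1, OTHER },
};

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Sums instruction sizes; if only_moves is nonzero, counts only the
   XMM-to-integer register moves.  */
static int sum_bytes(const struct insn *seq, size_t n, int only_moves)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        if (!only_moves || seq[i].kind == XMM_TO_GPR)
            total += seq[i].bytes;
    return total;
}

int bytes_before(void) { return sum_bytes(before_patch, COUNT(before_patch), 0); }
int bytes_after(void)  { return sum_bytes(after_patch, COUNT(after_patch), 0); }
int bytes_xmm_to_gpr_before(void) { return sum_bytes(before_patch, COUNT(before_patch), 1); }

/* Checks the tallies against the totals quoted in the listings.  */
void check_totals(void)
{
    assert(bytes_before() == 24);
    assert(bytes_after() == 15);
}
```

On these numbers the vector version is 9 bytes longer, and the two instructions that copy the result from %xmm0 back into %eax and %edx account for 10 of its 24 bytes.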
2023-10-24 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate values (sizes) for inter-unit moves with -Os.