author | Uros Bizjak <ubizjak@gmail.com> | 2023-05-25 19:40:26 +0200 |
---|---|---|
committer | Uros Bizjak <ubizjak@gmail.com> | 2023-05-25 19:41:09 +0200 |
commit | 52ff3f7b863da1011b73c0ab3b11f6c78b6451c7 (patch) | |
tree | 62a0ee510a7243a29279d6a69efdc12e487968cf /gcc/fortran | |
parent | 66cc0cb0f44f17049f61af6755043999c4fa5a24 (diff) | |
i386: Use 2x-wider modes when emulating QImode vector instructions
Rewrite ix86_expand_vecop_qihi2 to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Currently, the compiler generates the following
assembly for V16QImode multiplication (-mavx2):
  vpunpcklbw %xmm0, %xmm0, %xmm3
  vpunpcklbw %xmm1, %xmm1, %xmm2
  vpunpckhbw %xmm0, %xmm0, %xmm0
  movl $255, %eax
  vpunpckhbw %xmm1, %xmm1, %xmm1
  vpmullw %xmm3, %xmm2, %xmm2
  vmovd %eax, %xmm3
  vpmullw %xmm0, %xmm1, %xmm1
  vpbroadcastw %xmm3, %xmm3
  vpand %xmm2, %xmm3, %xmm0
  vpand %xmm1, %xmm3, %xmm3
  vpackuswb %xmm3, %xmm0, %xmm0
and only with -mavx512bw -mavx512vl does it generate:
  vpmovzxbw %xmm1, %ymm1
  vpmovzxbw %xmm0, %ymm0
  vpmullw %ymm1, %ymm0, %ymm0
  vpmovwb %ymm0, %xmm0
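
For orientation, a minimal sketch of source code that exercises a V16QImode multiplication, written with the GCC vector extension. The type name `v16qu` and the function name `mul_v16qi` are illustrative assumptions; the commit message does not show the test case it was measured on, and the exact assembly depends on the compiler version and flags.

```c
/* Sixteen unsigned 8-bit lanes in one 128-bit vector (GCC vector extension).
   The type and function names are hypothetical, chosen for this sketch. */
typedef unsigned char v16qu __attribute__ ((vector_size (16)));

/* Element-wise byte multiply; each lane keeps only the low 8 bits of the
   product.  x86 has no packed byte multiply instruction, so the compiler
   has to emulate this operation with wider lanes. */
v16qu
mul_v16qi (v16qu a, v16qu b)
{
  return a * b;
}
```

Compiling such a file with `gcc -O2 -mavx2 -S` and again with `-mavx512bw -mavx512vl` added is one way to compare the emitted sequences.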
The patched compiler generates better code, performing the multiplication
in the 2x-wider mode, in cases where the missing truncate instruction has
to be emulated with a permutation (-mavx2):
  vpmovzxbw %xmm0, %ymm0
  vpmovzxbw %xmm1, %ymm1
  movl $255, %eax
  vpmullw %ymm1, %ymm0, %ymm1
  vmovd %eax, %xmm0
  vpbroadcastw %xmm0, %ymm0
  vpand %ymm1, %ymm0, %ymm0
  vpackuswb %ymm0, %ymm0, %ymm0
  vpermq $216, %ymm0, %ymm0
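
The -mavx2 sequence above works because the low 8 bits of a 16-bit product depend only on the low 8 bits of the operands. A scalar sketch of one lane in C may make this clearer. The helper names are hypothetical and are not taken from the GCC sources; the comments map each step to the instruction it loosely models.

```c
#include <stdint.h>

/* Models one lane of vpackuswb: a signed 16-bit word is converted to an
   unsigned byte with saturation. */
static uint8_t
pack_usat (int16_t w)
{
  if (w < 0)
    return 0;
  if (w > 255)
    return 255;
  return (uint8_t) w;
}

/* One lane of the emulated byte multiply (hypothetical helper). */
static uint8_t
mul_qi_via_hi (uint8_t a, uint8_t b)
{
  uint16_t wa = a;                        /* vpmovzxbw: zero-extend to a word */
  uint16_t wb = b;
  uint16_t prod = (uint16_t) (wa * wb);   /* vpmullw: low 16 bits of product */
  uint16_t masked = prod & 0x00ff;        /* vpand with the broadcast 255 */
  return pack_usat ((int16_t) masked);    /* vpackuswb: value is 0..255, so
                                             saturation never triggers */
}

/* Returns 1 if the emulation agrees with a direct truncating byte multiply
   for all 65536 operand pairs, 0 otherwise. */
int
emulation_matches_all_pairs (void)
{
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      if (mul_qi_via_hi ((uint8_t) a, (uint8_t) b) != (uint8_t) (a * b))
        return 0;
  return 1;
}
```

The mask matters because vpackuswb saturates: without it, a product such as 255 * 255 = 65025 would be read as a negative word and packed to 0 instead of its low byte. The sketch does not model the final vpermq $216 (qword order 0, 2, 1, 3), which gathers the two in-lane pack results into the low 128 bits.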
The patch also adjusts the cost calculation of V*QImode emulations to account
for the generation of 2x-wider mode instructions.
gcc/ChangeLog:
	* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
	Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
	instructions when available.  Emulate truncation via
	ix86_expand_vec_perm_const_1 when native truncate insn
	is not available.
	(ix86_expand_vecop_qihi_partial) <case MULT>: Use pmovzx
	when available.  Trivially rename some variables.
	(ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.
	* config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
	calculation of V*QImode emulations to account for generation of
	2x-wider mode instructions.
	(ix86_shift_rotate_cost): Update cost calculation of V*QImode
	emulations to account for generation of 2x-wider mode instructions.
gcc/testsuite/ChangeLog:
	* gcc.target/i386/avx512vl-pr95488-1.c: Revert 2023-05-18 change.
Diffstat (limited to 'gcc/fortran')
0 files changed, 0 insertions, 0 deletions