aboutsummaryrefslogtreecommitdiff
path: root/ChangeLog.jit
diff options
context:
space:
mode:
authorRoger Sayle <roger@nextmovesoftware.com>2024-05-07 07:14:40 +0100
committerRoger Sayle <roger@nextmovesoftware.com>2024-05-07 07:16:58 +0100
commit79649a5dcd81bc05c0ba591068c9075de43bd417 (patch)
tree0c833b1c89f8afc6eaf58f0f318afcc9306f0ffe /ChangeLog.jit
parent0c43c673b0d431ca02d83bf6fae9cd60e9a3d0a8 (diff)
downloadgcc-79649a5dcd81bc05c0ba591068c9075de43bd417.zip
gcc-79649a5dcd81bc05c0ba591068c9075de43bd417.tar.gz
gcc-79649a5dcd81bc05c0ba591068c9075de43bd417.tar.bz2
PR target/106060: Improved SSE vector constant materialization on x86.
This patch resolves PR target/106060 by providing efficient methods for materializing/synthesizing special "vector" constants on x86. Currently there are three methods of materializing a vector constant; the most general is to load a vector from the constant pool, secondly "duplicated" constants can be synthesized by moving an integer between units and broadcasting (of shuffling it), and finally the special cases of the all-zeros vector and all-ones vectors can be loaded via a single SSE instruction. This patch handle additional cases that can be synthesized in two instructions, loading an all-ones vector followed by another SSE instruction. Following my recent patch for PR target/112992, there's conveniently a single place in i386-expand.cc where these special cases can be handled. Two examples are given in the original bugzilla PR for 106060. __m256i should_be_cmpeq_abs () { return _mm256_set1_epi8 (1); } is now generated (with -O3 -march=x86-64-v3) as: vpcmpeqd %ymm0, %ymm0, %ymm0 vpabsb %ymm0, %ymm0 ret and __m256i should_be_cmpeq_add () { return _mm256_set1_epi8 (-2); } is now generated as: vpcmpeqd %ymm0, %ymm0, %ymm0 vpaddb %ymm0, %ymm0, %ymm0 ret 2024-05-07 Roger Sayle <roger@nextmovesoftware.com> Hongtao Liu <hongtao.liu@intel.com> gcc/ChangeLog PR target/106060 * config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New. (struct ix86_vec_bcast_map_simode_t): New type for table below. (ix86_vec_bcast_map_simode): Table of SImode constants that may be efficiently synthesized by a ix86_vec_bcast_alg method. (ix86_vec_bcast_map_simode_cmp): New comparator for bsearch. (ix86_vector_duplicate_simode_const): Efficiently synthesize V4SImode and V8SImode constants that duplicate special constants. (ix86_vector_duplicate_value): Attempt to synthesize "special" vector constants using ix86_vector_duplicate_simode_const. * config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a vector integer mode costs with a single SSE instruction. gcc/testsuite/ChangeLog PR target/106060 * gcc.target/i386/auto-init-8.c: Update test case. * gcc.target/i386/avx512fp16-13.c: Likewise. * gcc.target/i386/pr100865-9a.c: Likewise. * gcc.target/i386/pr101796-1.c: Likewise. * gcc.target/i386/pr106060-1.c: New test case. * gcc.target/i386/pr106060-2.c: Likewise. * gcc.target/i386/pr106060-3.c: Likewise. * gcc.target/i386/pr70314.c: Update test case. * gcc.target/i386/vect-shiftv4qi.c: Likewise. * gcc.target/i386/vect-shiftv8qi.c: Likewise.
Diffstat (limited to 'ChangeLog.jit')
0 files changed, 0 insertions, 0 deletions