aboutsummaryrefslogtreecommitdiff
path: root/libcpp
diff options
context:
space:
mode:
authorRoger Sayle <roger@nextmovesoftware.com>2024-06-07 13:57:23 +0100
committerRoger Sayle <roger@nextmovesoftware.com>2024-06-07 13:57:23 +0100
commitec985bc97a01577bca8307f986caba7ba7633cde (patch)
tree888f577e2c2e512be6f75d5e35943194ba4b6725 /libcpp
parent5b6d5a886ee45bb969b4de23528311472b4ab66b (diff)
downloadgcc-ec985bc97a01577bca8307f986caba7ba7633cde.zip
gcc-ec985bc97a01577bca8307f986caba7ba7633cde.tar.gz
gcc-ec985bc97a01577bca8307f986caba7ba7633cde.tar.bz2
i386: Improve handling of ternlog instructions in i386/sse.md
This patch improves the way that the x86 backend recognizes and expands AVX512's bitwise ternary logic (vpternlog) instructions. As a motivating example consider the following code which calculates the carry out from a (binary) full adder: typedef unsigned long long v4di __attribute((vector_size(32))); v4di foo(v4di a, v4di b, v4di c) { return (a & b) | ((a ^ b) & c); } with -O2 -march=cascadelake current mainline produces: foo: vpternlogq $96, %ymm0, %ymm1, %ymm2 vmovdqa %ymm0, %ymm3 vmovdqa %ymm2, %ymm0 vpternlogq $248, %ymm3, %ymm1, %ymm0 ret with the patch below, we now generate a single instruction: foo: vpternlogq $232, %ymm2, %ymm1, %ymm0 ret The AVX512 vpternlog[qd] instructions are a very cool addition to the x86 instruction set, that can calculate any Boolean function of three inputs in a single fast instruction. As the truth table for any three-input function has 8 rows, any specific function can be represented by specifying those bits, i.e. by a 8-bit byte, an immediate integer between 0 and 256. Examples of ternary functions and their indices are given below: 0x01 1: ~((b|a)|c) 0x02 2: (~(b|a))&c 0x03 3: ~(b|a) 0x04 4: (~(c|a))&b 0x05 5: ~(c|a) 0x06 6: (c^b)&~a 0x07 7: ~((c&b)|a) 0x08 8: (~a&c)&b (~a&b)&c (c&b)&~a 0x09 9: ~((c^b)|a) 0x0a 10: ~a&c 0x0b 11: ~((~c&b)|a) (~b|c)&~a 0x0c 12: ~a&b 0x0d 13: ~((~b&c)|a) (~c|b)&~a 0x0e 14: (c|b)&~a 0x0f 15: ~a 0x10 16: (~(c|b))&a 0x11 17: ~(c|b) ... 0xf4 244: (~c&b)|a 0xf5 245: ~c|a 0xf6 246: (c^b)|a 0xf7 247: (~(c&b))|a 0xf8 248: (c&b)|a 0xf9 249: (~(c^b))|a 0xfa 250: c|a 0xfb 251: (c|a)|~b (~b|a)|c (~b|c)|a 0xfc 252: b|a 0xfd 253: (b|a)|~c (~c|a)|b (~c|b)|a 0xfe 254: (b|a)|c (c|a)|b (c|b)|a A naive implementation (in many compilers) might be add define_insn patterns for all 256 different functions. The situation is even worse as many of these Boolean functions don't have a "canonical form" (as produced by simplify_rtx) and would each need multiple patterns. See the space-separated equivalent expressions in the table above. This need to provide instruction "templates" might explain why GCC, LLVM and ICC all exhibit similar coverage problems in their ability to recognize x86 ternlog ternary functions. Perhaps a unique feature of GCC's design is that in addition to regular define_insn templates, machine descriptions can also perform pattern matching via a match_operator (and its corresponding predicate). This patch introduces a ternlog_operand predicate that matches a (possibly infinite) set of expression trees, identifying those that have at most three unique operands. This then allows a define_insn_and_split to recognize suitable expressions and then transform them into the appropriate UNSPEC_VTERNLOG as a pre-reload splitter. This design allows combine to smash together arbitrarily complex Boolean expressions, then transform them into an UNSPEC before register allocation. As an "optimization", where possible ix86_expand_ternlog generates a simpler binary operation, using AND, XOR, IOR or ANDN where possible, and in a few cases attempts to "canonicalize" the ternlog, by reordering or duplicating operands, so that later CSE passes have a hope of spotting equivalent values. This patch leaves the existing ternlog patterns in sse.md (for now), many of which are made obsolete by these changes. In theory we now only need one define_insn for UNSPEC_VTERNLOG. One complication from these previous variants was that they inconsistently used decimal vs. hexadecimal to specify the immediate constant operand in assembly language, making the list of tweaks to the testsuite with this patch larger than it might have been. I propose to remove the vestigial patterns in a follow-up patch, once this approach has baked (proven to be stable) on mainline. 2024-06-07 Roger Sayle <roger@nextmovesoftware.com> Hongtao Liu <hongtao.liu@intel.com> gcc/ChangeLog * config/i386/i386-expand.cc (ix86_expand_args_builtin): Call fixup_modeless_constant before testing predicates. Only call copy_to_mode_reg on memory operands (after the first one). (ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR into a VEC_DUPLICATE if possible. (ix86_ternlog_idx): Convert an RTX expression into a ternlog index between 0 and 255, recording the operands in ARGS, if possible or return -1 if this is not possible/valid. (ix86_ternlog_leaf_p): Helper function to identify "leaves" of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc. (ix86_ternlog_operand_p): Test whether a expression is suitable for and prefered as an UNSPEC_TERNLOG. (ix86_expand_ternlog_binop): Helper function to construct the binary operation corresponding to a sufficiently simple ternlog. (ix86_expand_ternlog_andnot): Helper function to construct a ANDN operation corresponding to a sufficiently simple ternlog. (ix86_expand_ternlog): Expand a 3-operand ternary logic expression, constructing either an UNSPEC_TERNLOG or simpler rtx expression. Called from builtin expanders and pre-reload splitters. * config/i386/i386-protos.h (ix86_ternlog_idx): Prototype here. (ix86_ternlog_operand_p): Likewise. (ix86_expand_ternlog): Likewise. * config/i386/predicates.md (ternlog_operand): New predicate that calls xi86_ternlog_operand_p. * config/i386/sse.md (<avx512>_vpternlog<mode>_0): New define_insn_and_split that recognizes a SET_SRC of ternlog_operand and expands it via ix86_expand_ternlog pre-reload. (<avx512>_vternlog<mode>_mask): Convert from define_insn to define_expand. Use ix86_expand_ternlog if the mask operand is ~0 (or 255 or -1). (*<avx512>_vternlog<mode>_mask): define_insn renamed from above. gcc/testsuite/ChangeLog * gcc.target/i386/avx512f-vpternlogd-1.c: Update test case. * gcc.target/i386/avx512f-vpternlogq-1.c: Likewise. * gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise. * gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise. * gcc.target/i386/pr100711-4.c: Likewise. * gcc.target/i386/pr100711-5.c: Likewise. * gcc.target/i386/avx512f-vpternlogd-3.c: New 128-bit test case. * gcc.target/i386/avx512f-vpternlogd-4.c: New 256-bit test case. * gcc.target/i386/avx512f-vpternlogd-5.c: New 512-bit test case. * gcc.target/i386/avx512f-vpternlogq-3.c: New test case.
Diffstat (limited to 'libcpp')
0 files changed, 0 insertions, 0 deletions