diff options
author | Roger Sayle <roger@nextmovesoftware.com> | 2021-10-18 12:15:40 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2021-10-18 12:18:00 +0100 |
commit | 247c407c83f0015f4b92d5f71e45b63192f6757e (patch) | |
tree | 8203d4eeab5b3e00e2981babbf0d4f37270ac4f8 /gcc/combine.c | |
parent | 815f15d3384edbdd73cc54536f30947e15183151 (diff) | |
download | gcc-247c407c83f0015f4b92d5f71e45b63192f6757e.zip gcc-247c407c83f0015f4b92d5f71e45b63192f6757e.tar.gz gcc-247c407c83f0015f4b92d5f71e45b63192f6757e.tar.bz2 |
Try placing RTL folded constants in the constant pool.
My recent attempts to come up with a testcase for my patch to evaluate
ss_plus in simplify-rtx.c, identified a missed optimization opportunity
(that's potentially a long-time regression): The RTL optimizers no longer
place constants in the constant pool.
The motivating x86_64 example is the simple program:
typedef char v8qi __attribute__ ((vector_size (8)));
v8qi foo()
{
v8qi tx = { 1, 0, 0, 0, 0, 0, 0, 0 };
v8qi ty = { 2, 0, 0, 0, 0, 0, 0, 0 };
v8qi t = __builtin_ia32_paddsb(tx, ty);
return t;
}
which (with my previous patch) currently results in:
foo: movq .LC0(%rip), %xmm0
movq .LC1(%rip), %xmm1
paddsb %xmm1, %xmm0
ret
even though the RTL contains the result in a REG_EQUAL note:
(insn 7 6 12 2 (set (reg:V8QI 83)
(ss_plus:V8QI (reg:V8QI 84)
(reg:V8QI 85))) "ssaddqi3.c":7:12 1419 {*mmx_ssaddv8qi3}
(expr_list:REG_DEAD (reg:V8QI 85)
(expr_list:REG_DEAD (reg:V8QI 84)
(expr_list:REG_EQUAL (const_vector:V8QI [
(const_int 3 [0x3])
(const_int 0 [0]) repeated x7
])
(nil)))))
Together with the patch below, GCC will now generate the much
more sensible:
foo: movq .LC2(%rip), %xmm0
ret
My first approach was to look in cse.c (where the REG_EQUAL note gets
added) and notice that the constant pool handling functionality has been
unreachable for a while. A quick search for constant_pool_entries_cost
shows that it's initialized to zero, but never set to a non-zero value,
meaning that force_const_mem is never called. This functionality used
to work way back in 2003, but has been lost over time:
https://gcc.gnu.org/pipermail/gcc-patches/2003-October/116435.html
The changes to cse.c below restore this functionality (placing suitable
constants in the constant pool) with two significant refinements;
(i) it only attempts to do this if the function already uses a constant
pool (thanks to the availability of crtl->uses_constant_pool since 2003).
(ii) it allows different constants (i.e. modes) to have different costs,
so that floating point "doubles" and 64-bit, 128-bit, 256-bit and 512-bit
vectors don't all have the share the same cost. Back in 2003, the
assumption was that everything in a constant pool had the same
cost, hence the global variable constant_pool_entries_cost.
Although this is a useful CSE fix, it turns out that it doesn't cure
my motivating problem above. CSE only considers a single instruction,
so determines that it's cheaper to perform the ss_plus (COSTS_N_INSNS(1))
than read the result from the constant pool (COSTS_N_INSNS(2)). It's
only when the other reads from the constant pool are also eliminated,
that this transformation is a win. Hence a better place to perform
this transformation is in combine, where after failing to "recog" the
load of a suitable constant, it can retry after calling force_const_mem.
This achieves the desired transformation and allows the backend insn_cost
call-back to control whether or not using the constant pool is preferrable.
Alas, it's rare to change code generation without affecting something in
GCC's testsuite. On x86_64-pc-linux-gnu there were two families of new
failures (and I'd predict similar benign fallout on other platforms).
One failure was gcc.target/i386/387-12.c (aka PR target/26915), where
the test is missing an explicit -m32 flag. On i686, it's very reasonable
to materialize -1.0 using "fld1; fchs", but on x86_64-pc-linux-gnu we
currently generate the awkward:
testm1: fld1
fchs
fstpl -8(%rsp)
movsd -8(%rsp), %xmm0
ret
which combine now very reasonably simplifies to just:
testm1: movsd .LC3(%rip), %xmm0
ret
The other class of x86_64-pc-linux-gnu failure was from materialization
of vector constants using vpbroadcast (e.g. gcc.target/i386/pr90773-17.c)
where the decision is finely balanced; the load of an integer register
with an immediate constant, followed by a vpbroadcast is deemed to be
COSTS_N_INSNS(2), whereas a load from the constant pool is also reported
as COSTS_N_INSNS(2). My solution is to tweak the i386.c's rtx_costs
so that all other things being equal, an instruction (sequence) that
accesses memory is fractionally more expensive than one that doesn't.
2021-10-18 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* combine.c (recog_for_combine): For an unrecognized move/set of
a constant, try force_const_mem to place it in the constant pool.
* cse.c (constant_pool_entries_cost, constant_pool_entries_regcost):
Delete global variables (that are no longer assigned a cost value).
(cse_insn): Simplify logic for deciding whether to place a folded
constant in the constant pool using force_const_mem.
(cse_main): Remove zero initialization of constant_pool_entries_cost
and constant_pool_entries_regcost.
* config/i386/i386.c (ix86_rtx_costs): Make memory accesses
fractionally more expensive, when optimizing for speed.
gcc/testsuite/ChangeLog
* gcc.target/i386/387-12.c: Add explicit -m32 option.
Diffstat (limited to 'gcc/combine.c')
-rw-r--r-- | gcc/combine.c | 22 |
1 files changed, 21 insertions, 1 deletions
diff --git a/gcc/combine.c b/gcc/combine.c index 892c834..03e9a78 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -11567,7 +11567,27 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes) bool changed = false; if (GET_CODE (pat) == SET) - changed = change_zero_ext (pat); + { + /* For an unrecognized single set of a constant, try placing it in + the constant pool, if this function already uses one. */ + rtx src = SET_SRC (pat); + if (CONSTANT_P (src) + && !CONST_INT_P (src) + && crtl->uses_const_pool) + { + machine_mode mode = GET_MODE (src); + if (mode == VOIDmode) + mode = GET_MODE (SET_DEST (pat)); + src = force_const_mem (mode, src); + if (src) + { + SUBST (SET_SRC (pat), src); + changed = true; + } + } + else + changed = change_zero_ext (pat); + } else if (GET_CODE (pat) == PARALLEL) { int i; |