author | Kyrylo Tkachov <ktkachov@nvidia.com> | 2025-02-27 09:00:25 -0800 |
---|---|---|
committer | Kyrylo Tkachov <ktkachov@nvidia.com> | 2025-03-05 16:21:36 +0100 |
commit | db76482175c4e76db273d7fb3a00ae0f932529a6 (patch) | |
tree | a2da58e5af8ca44309e36a3b2b980044d9a122f9 | |
parent | 54da358ff51ded726fe7c026fa59c8db0a1b72ed (diff) | |
PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping
In this testcase late-combine was failing to merge:
dup v31.4s, v31.s[3]
fmla v30.4s, v31.4s, v29.4s
into the lane-wise fmla form.
This is because late-combine checks may_trap_p under the hood on the dup insn.
This ended up returning true for the insn:
(set (reg:V4SF 152 [ _32 ])
(vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
(parallel:V4SF [
(const_int 3 [0x3])]))))
Although may_trap_p correctly reasoned that vec_duplicate and vec_select of
floating-point modes can't trap, it assumed that the V4SF parallel can trap.
The correct behaviour is to recurse into the vector inside the PARALLEL and check
its sub-expressions. This patch adjusts may_trap_p_1 to do just that.
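The recursion described above can be illustrated with a toy model. This is only a sketch: the `toy_*` names, the `toy_rtx` struct, and the rule that an unknown floating-point operation may trap are assumptions made for illustration and are not GCC's real rtx representation or the actual logic in rtlanal.cc. It shows a PARALLEL being treated as a transparent container whose operands are scanned, rather than being rejected because of its floating-point mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a handful of RTL codes. */
enum toy_code {
  TOY_REG, TOY_CONST_INT, TOY_VEC_SELECT, TOY_VEC_DUPLICATE,
  TOY_PARALLEL, TOY_DIV, TOY_PLUS
};

struct toy_rtx {
  enum toy_code code;
  bool fp_mode;                  /* true for modes such as SF or V4SF */
  size_t n_ops;                  /* number of valid entries in ops */
  const struct toy_rtx *ops[2];
};

/* Return true if X, or any operand of X, might trap. */
static bool toy_may_trap_p(const struct toy_rtx *x)
{
  assert(x != NULL);
  switch (x->code) {
  case TOY_REG:
  case TOY_CONST_INT:
    return false;
  case TOY_DIV:
    /* Simplification: every division is treated as possibly trapping. */
    return true;
  case TOY_PARALLEL:
  case TOY_VEC_SELECT:
  case TOY_VEC_DUPLICATE:
    /* These never trap by themselves, whatever their mode,
       so only their operands are examined below. */
    break;
  default:
    /* Simplification: any other floating-point operation may trap. */
    if (x->fp_mode)
      return true;
    break;
  }
  for (size_t i = 0; i < x->n_ops; i++)
    if (toy_may_trap_p(x->ops[i]))
      return true;
  return false;
}

/* Model of the dup insn from the commit message:
   vec_duplicate (vec_select (reg, parallel [const_int 3])).  */
static bool toy_dup_insn_may_trap(void)
{
  static const struct toy_rtx lane = { TOY_CONST_INT, false, 0, { NULL, NULL } };
  static const struct toy_rtx sel = { TOY_PARALLEL, true, 1, { &lane, NULL } };
  static const struct toy_rtx src = { TOY_REG, true, 0, { NULL, NULL } };
  static const struct toy_rtx select = { TOY_VEC_SELECT, true, 2, { &src, &sel } };
  static const struct toy_rtx dup = { TOY_VEC_DUPLICATE, true, 1, { &select, NULL } };
  return toy_may_trap_p(&dup);
}
```

In this model `toy_dup_insn_may_trap()` returns false, while a PARALLEL that wraps a `TOY_DIV` is still reported as trapping because the operand scan reaches the division.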
With this change the above insn is not deemed to be trapping and is propagated
into the FMLA, giving:
fmla vD.4s, vA.4s, vB.s[3]
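To see why the merged form is equivalent, the two instruction sequences can be modelled in portable scalar C. This is a rough sketch under stated assumptions: the `model_*` helpers are hypothetical, they are not the Arm intrinsics, and the multiply-add is written as a plain `acc + a * b`, so the single rounding of a real fused multiply-add is ignored.

```c
#include <assert.h>
#include <stddef.h>

enum { LANES = 4 };

/* Models "dup vd.4s, vn.s[lane]": broadcast one lane to all lanes. */
static void model_dup_lane(float out[LANES], const float in[LANES], size_t lane)
{
  assert(lane < LANES);
  float v = in[lane];            /* read first so out may alias in */
  for (size_t i = 0; i < LANES; i++)
    out[i] = v;
}

/* Models "fmla vd.4s, vn.4s, vm.4s": acc[i] += a[i] * b[i]. */
static void model_fmla(float acc[LANES], const float a[LANES],
                       const float b[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    acc[i] = acc[i] + a[i] * b[i];
}

/* Models "fmla vd.4s, vn.4s, vm.s[lane]": acc[i] += a[i] * b[lane]. */
static void model_fmla_lane(float acc[LANES], const float a[LANES],
                            const float b[LANES], size_t lane)
{
  assert(lane < LANES);
  float v = b[lane];
  for (size_t i = 0; i < LANES; i++)
    acc[i] = acc[i] + a[i] * v;
}

/* Returns 1 if dup followed by fmla matches the lane-wise fmla
   on a small sample input, 0 otherwise. */
static int model_forms_agree(size_t lane)
{
  const float a[LANES] = { 1, 2, 3, 4 };
  const float b[LANES] = { 10, 20, 30, 40 };
  float acc_two_insns[LANES] = { 100, 200, 300, 400 };
  float acc_one_insn[LANES] = { 100, 200, 300, 400 };
  float dup[LANES];

  model_dup_lane(dup, b, lane);
  model_fmla(acc_two_insns, dup, a);          /* dup + fmla */
  model_fmla_lane(acc_one_insn, a, b, lane);  /* merged lane-wise fmla */

  for (size_t i = 0; i < LANES; i++)
    if (acc_two_insns[i] != acc_one_insn[i])
      return 0;
  return 1;
}
```

The sample values are small integers so that every intermediate result is exactly representable and the comparison does not depend on rounding.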
Bootstrapped and tested on aarch64-none-linux-gnu.
Apparently this also fixes a regression in
gcc.target/aarch64/vmul_element_cost.c that I observed.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
PR rtl-optimization/119046
* rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.
gcc/testsuite/
PR rtl-optimization/119046
* gcc.target/aarch64/pr119046.c: New test.
-rw-r--r-- | gcc/rtlanal.cc | 1 |
-rw-r--r-- | gcc/testsuite/gcc.target/aarch64/pr119046.c | 16 |
2 files changed, 17 insertions, 0 deletions
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 8caffaf..7ad67af 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -3252,6 +3252,7 @@ may_trap_p_1 (const_rtx x, unsigned flags)
         return true;
       break;

+    case PARALLEL:
     case NEG:
     case ABS:
     case SUBREG:
diff --git a/gcc/testsuite/gcc.target/aarch64/pr119046.c b/gcc/testsuite/gcc.target/aarch64/pr119046.c
new file mode 100644
index 0000000..aa5fa7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr119046.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+
+#include <arm_neon.h>
+
+float32x4_t madd_helper_1(float32x4_t a, float32x4_t b, float32x4_t d)
+{
+  float32x4_t t = a;
+  t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
+  t = vfmaq_f32 (t, vdupq_n_f32(vgetq_lane_f32 (b, 1)), d);
+  return t;
+}
+
+/* { dg-final { scan-assembler-not {\tdup\tv[0-9]+\.4s, v[0-9]+.s\[1\]\n} } } */
+/* { dg-final { scan-assembler-times {\tfmla\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.s\[1\]\n} 2 } } */
+