target/121230 - x86 vector CTOR cost with 387 math

The following adjusts the costing of vector construction from scalars for FP modes which, with 387 math, can reside in x87 FP registers and then need to be spilled and reloaded into XMM registers. I've played it safe with mixed SSE/387 math.

	PR target/121230
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): With FP
	mode and 387 math cost spill/reload.

	* gcc.target/i386/pr121230.c: New testcase.
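As a rough illustration of the arithmetic this change adds, the sketch below computes the extra cost charged per scalar FP operand that has to travel from an x87 register through memory into an XMM register. It is a standalone approximation, not GCC code: `SpillReloadCosts`, `costs_n_insns` and `x87_to_xmm_penalty` are hypothetical names, and the table values used with it are made up. Only the shape of the formula (x87 store cost plus SSE load cost, scaled by `COSTS_N_INSNS` and halved) follows the patch.

```cpp
// Mirrors GCC's COSTS_N_INSNS macro, which scales an instruction count by 4.
constexpr int costs_n_insns(int n) { return n * 4; }

// Hypothetical stand-in for the two processor cost table entries the patch
// reads: hard_register.fp_store[...] and sse_load[...].
struct SpillReloadCosts {
  int fp_store;  // table cost of storing an x87 register to memory
  int sse_load;  // table cost of loading an XMM register from memory
};

// Extra statement cost for one scalar operand living in an x87 register,
// following the patch's formula: COSTS_N_INSNS (store + load) / 2.
constexpr int x87_to_xmm_penalty(const SpillReloadCosts &c) {
  return costs_n_insns(c.fp_store + c.sse_load) / 2;
}
```

With assumed table entries of 4 for the store and 6 for the load, `x87_to_xmm_penalty(SpillReloadCosts{4, 6})` evaluates to 20, which is added to the statement cost once per affected operand.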
author: Richard Biener <rguenther@suse.de> 2025-12-08 14:36:58 +0100
committer: Richard Biener <rguenther@suse.de> 2025-12-09 15:08:57 +0100
commit: 3222a8493ccfec1a2e9c71103f1a693abb97cf83 (patch)
tree: 80c44931928829e6f034d6aeeacde6f3d9d7ad9e
parent: bf8161604066dadf85dbf0c27bc2630f712c10c8 (diff)
2 files changed, 30 insertions, 1 deletions
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index db43045..75a9cb6 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -26397,7 +26397,20 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 				(TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
 	    {
 	      if (fp)
-		m_num_sse_needed[where]++;
+		{
+		  /* Scalar FP values residing in x87 registers need to be
+		     spilled and reloaded.  */
+		  auto mode2 = TYPE_MODE (TREE_TYPE (op));
+		  if (IS_STACK_MODE (mode2))
+		    {
+		      int cost
+			= (ix86_cost->hard_register.fp_store[mode2 == SFmode
+							     ? 0 : 1]
+			   + ix86_cost->sse_load[sse_store_index (mode2)]);
+		      stmt_cost += COSTS_N_INSNS (cost) / 2;
+		    }
+		  m_num_sse_needed[where]++;
+		}
 	      else
 		{
 		  m_num_gpr_needed[where]++;
diff --git a/gcc/testsuite/gcc.target/i386/pr121230.c b/gcc/testsuite/gcc.target/i386/pr121230.c
new file mode 100644
index 0000000..67c9c5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr121230.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O3 -march=athlon-xp -mfpmath=387 -fexcess-precision=standard" } */
+
+typedef struct {
+    float a;
+    float b;
+} f32_2;
+
+f32_2 add32_2(f32_2 x, f32_2 y) {
+    return (f32_2){ x.a + y.a, x.b + y.b};
+}
+
+/* We do not want the vectorizer to vectorize the store and/or the
+   conversion (with IA32 we do not support V2SF add) given that this
+   spills FP regs to reload them to XMM.  */
+/* { dg-final { scan-assembler-not "movss\[ \\t\]+\[0-9\]*\\\(%esp\\\), %xmm" } } */
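To make the intent of the `scan-assembler-not` directive easier to see, here is a small sketch that applies what should be the equivalent regular expression to individual assembly lines. It assumes the pattern reduces to `movss[ \t]+[0-9]*\(%esp\), %xmm` once Tcl has processed the backslash escapes, and it uses `std::regex` rather than the Tcl regex engine DejaGnu uses. The helper name `is_stack_to_xmm_reload` is hypothetical.

```cpp
#include <regex>
#include <string>

// Returns true if the assembly line looks like a scalar reload from the
// stack into an XMM register, i.e. the instruction the test forbids.
// The pattern is an assumed translation of the dg-final string.
inline bool is_stack_to_xmm_reload(const std::string &asm_line) {
  static const std::regex pattern(R"(movss[ \t]+[0-9]*\(%esp\), %xmm)");
  return std::regex_search(asm_line, pattern);
}
```

Under this reading, a line such as `movss 4(%esp), %xmm0` would make the test fail, while an x87 load like `flds 4(%esp)` or a store in the other direction (`movss %xmm0, 4(%esp)`) would not.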