author	Richard Sandiford <richard.sandiford@arm.com>	2023-05-09 07:40:41 +0100
committer	Richard Sandiford <richard.sandiford@arm.com>	2023-05-09 07:40:41 +0100
commit	ba72a8d85180d0f4dbcea6eb3458ce175ce190b4 (patch)
tree	0de119dfa4b6cb34387267374f4975740dc34200 /gcc
parent	73f7109ffb159302e9d8f70948a5b43b046b38bc (diff)
ira: Don't create copies for earlyclobbered pairs
This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch").  To start by quoting that:

  If an insn requires two operands to be tied, and the input operand
  dies in the insn, IRA acts as though there were a copy from the
  input to the output with the same execution frequency as the insn.
  Allocating the same register to the input and the output then saves
  the cost of a move.

  If there is no such tie, but an input operand nevertheless dies in
  the insn, IRA creates a similar move, but with an eighth of the
  frequency.  This helps to ensure that chains of instructions reuse
  registers in a natural way, rather than using arbitrarily different
  registers for no reason.

This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we could
create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have any
registers in common.  It is then impossible for the output operand to
reuse the dying input register.

This left unfixed a further case where copies don't make sense: there
is no point trying to reuse the dying input register if, for all
relevant alternatives, the output is earlyclobbered and the input
doesn't match the output.  (Matched earlyclobbers are fine.)

Handling that case fixes several existing XFAILs and helps with a
follow-on aarch64 patch.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run on
aarch64 showed no differences outside the noise.  Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:

  Target               Tests  Good  Bad  Delta  Best   Worst  Median
  ======               =====  ====  ===  =====  ====   =====  ======
  amdgcn-amdhsa           14     7    7      3   -18      10      -1
  arm-linux-gnueabihf     16    15    1    -22    -4       2      -1
  csky-elf                 6     6    0    -21    -6      -2      -4
  hppa64-hp-hpux11.23      5     5    0     -7    -2      -1      -1
  ia64-linux-gnu          16    16    0    -70   -15      -1      -3
  m32r-elf                53     1   52     64    -2       8       1
  mcore-elf                2     2    0     -8    -6      -2      -6
  microblaze-elf         285   283    2   -909   -68       4      -1
  mmix                     7     7    0  -2101 -2091      -1      -1
  msp430-elf               1     1    0     -4    -4      -4      -4
  pru-elf                  8     6    2    -12    -6       2      -2
  rx-elf                  22    18    4    -40    -5       6      -2
  sparc-linux-gnu         15    14    1    -40    -8       1      -2
  sparc-wrs-vxworks       15    14    1    -40    -8       1      -2
  visium-elf               2     1    1      0    -2       2      -2
  xstormy16-elf            1     1    0     -2    -2      -2      -2

with other targets showing no sensitivity to the patch.  The only
target that seems to be negatively affected is m32r-elf; otherwise the
patch seems like an extremely minor but still clear improvement.

gcc/
	* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
	earlyclobbers.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs.
	* gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
Diffstat (limited to 'gcc')
 gcc/ira-conflicts.cc                                          |  3
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c   |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c       |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c       |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c       |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c       |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c   |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c   |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c  |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c   |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c     |  2
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c     |  2
 19 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index 5aa080a..a4d93c8 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -398,6 +398,9 @@ can_use_same_reg_p (rtx_insn *insn, int output, int input)
if (op_alt[input].matches == output)
return true;
+ if (op_alt[output].earlyclobber)
+ continue;
+
if (ira_reg_class_intersect[op_alt[input].cl][op_alt[output].cl]
!= NO_REGS)
return true;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
index b74ae33..e40865f 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s16_z_tied1, svint16_t, uint64_t,
z0 = svasr_wide_z (p0, z0, x0))
/*
-** asr_wide_x0_s16_z_untied: { xfail *-*-* }
+** asr_wide_x0_s16_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.h, p0/z, z1\.h
** asr z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
index 8698aef..06e4ca2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s32_z_tied1, svint32_t, uint64_t,
z0 = svasr_wide_z (p0, z0, x0))
/*
-** asr_wide_x0_s32_z_untied: { xfail *-*-* }
+** asr_wide_x0_s32_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.s, p0/z, z1\.s
** asr z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
index 77b1669..1f840ca 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s8_z_tied1, svint8_t, uint64_t,
z0 = svasr_wide_z (p0, z0, x0))
/*
-** asr_wide_x0_s8_z_untied: { xfail *-*-* }
+** asr_wide_x0_s8_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.b, p0/z, z1\.b
** asr z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
index 9e388e4..e02c669 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_w0_s32_z_tied1, svint32_t, int32_t,
z0 = svbic_z (p0, z0, x0))
/*
-** bic_w0_s32_z_untied: { xfail *-*-* }
+** bic_w0_s32_z_untied:
** mov (z[0-9]+\.s), w0
** movprfx z0\.s, p0/z, z1\.s
** bic z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
index bf95368..57c1e53 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_x0_s64_z_tied1, svint64_t, int64_t,
z0 = svbic_z (p0, z0, x0))
/*
-** bic_x0_s64_z_untied: { xfail *-*-* }
+** bic_x0_s64_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.d, p0/z, z1\.d
** bic z0\.d, p0/m, z0\.d, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
index b308b59..9f08ab4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_w0_u32_z_tied1, svuint32_t, uint32_t,
z0 = svbic_z (p0, z0, x0))
/*
-** bic_w0_u32_z_untied: { xfail *-*-* }
+** bic_w0_u32_z_untied:
** mov (z[0-9]+\.s), w0
** movprfx z0\.s, p0/z, z1\.s
** bic z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
index e82db1e..de84f3a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_x0_u64_z_tied1, svuint64_t, uint64_t,
z0 = svbic_z (p0, z0, x0))
/*
-** bic_x0_u64_z_untied: { xfail *-*-* }
+** bic_x0_u64_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.d, p0/z, z1\.d
** bic z0\.d, p0/m, z0\.d, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
index 8d63d39..a020772 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s16_z_tied1, svint16_t, uint64_t,
z0 = svlsl_wide_z (p0, z0, x0))
/*
-** lsl_wide_x0_s16_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s16_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.h, p0/z, z1\.h
** lsl z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
index acd813d..bd67b70 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s32_z_tied1, svint32_t, uint64_t,
z0 = svlsl_wide_z (p0, z0, x0))
/*
-** lsl_wide_x0_s32_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s32_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.s, p0/z, z1\.s
** lsl z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
index 17e8e86..7eb8627 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s8_z_tied1, svint8_t, uint64_t,
z0 = svlsl_wide_z (p0, z0, x0))
/*
-** lsl_wide_x0_s8_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s8_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.b, p0/z, z1\.b
** lsl z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
index cff24a8..482f8d0 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u16_z_tied1, svuint16_t, uint64_t,
z0 = svlsl_wide_z (p0, z0, x0))
/*
-** lsl_wide_x0_u16_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u16_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.h, p0/z, z1\.h
** lsl z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
index 7b1afab..612897d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u32_z_tied1, svuint32_t, uint64_t,
z0 = svlsl_wide_z (p0, z0, x0))
/*
-** lsl_wide_x0_u32_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u32_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.s, p0/z, z1\.s
** lsl z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
index df8b1ec..6ca2f9e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u8_z_tied1, svuint8_t, uint64_t,
z0 = svlsl_wide_z (p0, z0, x0))
/*
-** lsl_wide_x0_u8_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u8_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.b, p0/z, z1\.b
** lsl z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
index 863b51a..9110c5a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u16_z_tied1, svuint16_t, uint64_t,
z0 = svlsr_wide_z (p0, z0, x0))
/*
-** lsr_wide_x0_u16_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u16_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.h, p0/z, z1\.h
** lsr z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
index 73c2cf8..93af4fa 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u32_z_tied1, svuint32_t, uint64_t,
z0 = svlsr_wide_z (p0, z0, x0))
/*
-** lsr_wide_x0_u32_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u32_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.s, p0/z, z1\.s
** lsr z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
index fe44eab..2f38139 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u8_z_tied1, svuint8_t, uint64_t,
z0 = svlsr_wide_z (p0, z0, x0))
/*
-** lsr_wide_x0_u8_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u8_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.b, p0/z, z1\.b
** lsr z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
index 747f8a6..12a1b1d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (scale_w0_f32_z_tied1, svfloat32_t, int32_t,
z0 = svscale_z (p0, z0, x0))
/*
-** scale_w0_f32_z_untied: { xfail *-*-* }
+** scale_w0_f32_z_untied:
** mov (z[0-9]+\.s), w0
** movprfx z0\.s, p0/z, z1\.s
** fscale z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
index 004cbfa..f6b1171 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (scale_x0_f64_z_tied1, svfloat64_t, int64_t,
z0 = svscale_z (p0, z0, x0))
/*
-** scale_x0_f64_z_untied: { xfail *-*-* }
+** scale_x0_f64_z_untied:
** mov (z[0-9]+\.d), x0
** movprfx z0\.d, p0/z, z1\.d
** fscale z0\.d, p0/m, z0\.d, \1