author     Richard Sandiford <richard.sandiford@linaro.org>  2018-02-01 11:04:00 +0000
committer  Richard Sandiford <rsandifo@gcc.gnu.org>          2018-02-01 11:04:00 +0000
commit     8179efe00e04285184112de7dbb977a75852197c
tree       5a7162b0e53a7fb9c7dbbc86cf05d9a3619cbda2 /gcc/config
parent     947b137212d16d432eec201fe7f800dfdb481203
[AArch64] Prefer LD1RQ for big-endian SVE
This patch deals with cases in which a CONST_VECTOR contains a
repeating bit pattern that is wider than one element but narrower
than 128 bits. The current code:
* treats the repeating pattern as a single element
* uses the associated LD1R to load and replicate it (such as LD1RD
for 64-bit patterns)
* uses a subreg to cast the result back to the original vector type
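For instance, with 16-bit elements, a loop that stores the alternating
constants 1 and 2 gives the vectorizer the CONST_VECTOR
{1, 2, 1, 2, ...}, whose repeating pattern is 32 bits wide.  The sketch
below is illustrative only, in the spirit of the slp_2.c test updated by
this patch; the function name and the exact options are assumptions, not
taken from the testsuite.

/* Hypothetical reduced example, compiled with something like
   -O2 -ftree-vectorize -march=armv8.2-a+sve.  The stored constant
   becomes the CONST_VECTOR {1, 2, 1, 2, ...} of 16-bit elements,
   i.e. a repeating 32-bit pattern, which the current code loads
   with LD1RW on little-endian targets.  */
void
store_pairs (short *x, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      x[i] = 1;
      x[i + 1] = 2;
    }
}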
The problem is that for big-endian targets, the final cast is
effectively a form of element reverse. E.g. say we're using LD1RD to load
16-bit elements, with h being the high parts and l being the low parts:
                   +-----+-----+-----+-----+-----+----
      lanes        |  0  |  1  |  2  |  3  |  4  | ...
                   +-----+-----+-----+-----+-----+----
      memory bytes |h0 l0 h1 l1 h2 l2 h3 l3 h0 l0 ....
                   +----------------------------------
                     V  V  V  V  V  V  V  V

                   ----------+-----------------------+
      register     ....      |           0           |
      after        ----------+-----------------------+   lsb
      LD1RD        .... h3 l3 h0 l0 h1 l1 h2 l2 h3 l3|
                   ----------------------------------+

                   ----+-----+-----+-----+-----+-----+
      expected     ... |  4  |  3  |  2  |  1  |  0  |
      register     ----+-----+-----+-----+-----+-----+   lsb
      contents     .... h0 l0 h3 l3 h2 l2 h1 l1 h0 l0|
                   ----------------------------------+
A later patch fixes the handling of general subregs to account
for this, but it means that we need to do a REV instruction
after the load. It seems better to use LD1RQ[BHW] on a 128-bit
pattern instead, since that gets the endianness right without
a separate fixup instruction.
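For the 16-bit example above, the difference on big-endian would look
something like the sketch below.  The register, predicate, and
addressing choices are arbitrary, and the REVH fixup is an assumption
about what the element reverse would look like; none of this is taken
from the commit.

	// LD1RD path: the big-endian 64-bit load leaves the 16-bit
	// lanes element-reversed within each doubleword, so a
	// halfword reverse would be needed after the load.
	ld1rd	z0.d, p0/z, [x1]	// replicate the 64-bit pattern
	revh	z0.d, p0/m, z0.d	// fix up the lane order

	// LD1RQ path (this patch): replicating a full 128-bit pattern
	// gets the lane order right in a single load.
	ld1rqh	z0.h, p0/z, [x1]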
2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer
	the TImode handling for big-endian targets.

gcc/testsuite/
	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead
	of LD1R[HWD] for multi-element constants on big-endian targets.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_4.c: Likewise.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
From-SVN: r257288
Diffstat (limited to 'gcc/config')
 gcc/config/aarch64/aarch64.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ae142b4..6296ffe 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2824,10 +2824,18 @@ aarch64_expand_sve_const_vector (rtx dest, rtx src)
   /* The constant is a repeating seqeuence of at least two elements,
      where the repeating elements occupy no more than 128 bits.
      Get an integer representation of the replicated value.  */
-  unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
-  gcc_assert (int_bits <= 128);
-
-  scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+  scalar_int_mode int_mode;
+  if (BYTES_BIG_ENDIAN)
+    /* For now, always use LD1RQ to load the value on big-endian
+       targets, since the handling of smaller integers includes a
+       subreg that is semantically an element reverse.  */
+    int_mode = TImode;
+  else
+    {
+      unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+      gcc_assert (int_bits <= 128);
+      int_mode = int_mode_for_size (int_bits, 0).require ();
+    }
   rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
   if (int_value
       && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))
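To see the new control flow outside the compiler, here is a minimal
stand-alone model of the mode selection above, with machine modes
reduced to bit widths; all names are invented for illustration and are
not GCC internals.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Model of the selection above: return the width in bits of the
   integer that is loaded and replicated, given ELT_BITS-bit elements
   and a repeating pattern of NPATTERNS elements.  128 corresponds to
   TImode (and hence LD1RQ); smaller widths map to LD1R[BHWD].  */
static unsigned int
replicated_int_bits (unsigned int elt_bits, unsigned int npatterns,
		     bool bytes_big_endian)
{
  if (bytes_big_endian)
    /* Always widen to the full 128-bit pattern so that LD1RQ is used
       and no element reverse is needed.  */
    return 128;

  unsigned int int_bits = elt_bits * npatterns;
  assert (int_bits <= 128);
  return int_bits;
}

int
main (void)
{
  /* The {1, 2, ...} example: 16-bit elements, two per pattern.  */
  printf ("little-endian: %u bits (LD1RW)\n",
	  replicated_int_bits (16, 2, false));
  printf ("big-endian:    %u bits (LD1RQH)\n",
	  replicated_int_bits (16, 2, true));
  return 0;
}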