author     Richard Sandiford <richard.sandiford@arm.com>  2022-02-15 18:09:33 +0000
committer  Richard Sandiford <richard.sandiford@arm.com>  2022-02-15 18:09:33 +0000
commit     4963079769c99c4073adfd799885410ad484cbbe
tree       ec53951399724d809c6296a30e7a06e81d3e72a8 /gcc/fortran
parent     63a9328cb8c601377fe73e214b708c4ae0441847
vect+aarch64: Fix ldp_stp_* regressions
ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2.  In all three cases SLP is
generating vector code when scalar code would be better.

The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off.  It would be good to fix
that "properly" at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.

This patch therefore adds an on-the-side check for whether the code
is doing nothing more than set-up+stores.  It then applies STP-based
costs to those cases only, in addition to the normal latency-based
costs.  (That is, the vector code has to win on both counts rather
than on one count individually.)

However, at the moment, SLP costs one vector set-up instruction for
every vector in an SLP node, even if the contents are the same as a
previous vector in the same node.  Fixing the STP costs without
fixing that would regress other cases, tested in the patch.

The patch therefore makes the SLP costing code check for duplicates
within a node.  Ideally we'd check for duplicates more globally, but
that would require a more global approach to costs: the cost of an
initialisation should be amortised across all trees that use the
initialisation, rather than fully counted against one
arbitrarily-chosen subtree.

Back on aarch64: an earlier version of the patch tried to apply the
new heuristic to constant stores.  However, that didn't work too well
in practice; see the comments for details.  The patch therefore just
tests the status quo for constant cases, leaving out a match if the
current choice is dubious.

ldp_stp_5.c was affected by the same thing.  The test would be worth
vectorising if we generated better vector code, but:

(1) We do a bad job of moving the { -1, 1 } constant, given that we
    have { -1, -1 } and { 1, 1 } to hand.

(2) The vector code has 6 pairable stores to misaligned offsets.
    We have peephole patterns to handle such misalignment for
    4 pairable stores, but not 6.

So the SLP decision isn't wrong as such.  It's just being let down
by later codegen.

The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).

gcc/
	* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
	(vect_scalar_ops_slice_hash): Likewise.
	(vect_scalar_ops_slice::op): New function.
	* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
	(vect_scalar_ops_slice_hash::hash): Likewise.
	(vect_scalar_ops_slice_hash::equal): Likewise.
	(vect_prologue_cost_for_slp): Check for duplicate vectors.
	* config/aarch64/aarch64.cc
	(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
	(aarch64_aligned_constant_offset_p): New function.
	(aarch64_stp_sequence_cost): Likewise.
	(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
	(aarch64_vector_costs::finish_cost): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
	* gcc.target/aarch64/ldp_stp_14.h,
	* gcc.target/aarch64/ldp_stp_14.c: New test.
	* gcc.target/aarch64/ldp_stp_15.c: Likewise.
	* gcc.target/aarch64/ldp_stp_16.c: Likewise.
	* gcc.target/aarch64/ldp_stp_17.c: Likewise.
	* gcc.target/aarch64/ldp_stp_18.c: Likewise.
	* gcc.target/aarch64/ldp_stp_19.c: Likewise.
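
The "win on both counts" rule described in the message above can be
summarised in isolation.  The following standalone C++ sketch is
illustrative only and is not taken from the patch; the struct and the
names latency_cost, stp_sequence_cost, store_only_p and
prefer_vector_p are hypothetical placeholders for the quantities the
commit message describes.

#include <cstdint>

/* Hypothetical summary of the costs tracked for one version (scalar
   or vector) of a candidate sequence.  */
struct version_costs
{
  uint64_t latency_cost;       /* the normal latency-based cost */
  uint64_t stp_sequence_cost;  /* on-the-side cost assuming STP can pair the stores */
  bool store_only_p;           /* true if the code is just set-up + stores */
};

/* Prefer the vector version only if it wins on the normal
   latency-based cost and, when both versions are pure set-up+store
   sequences, also on the STP-based cost.  */
static bool
prefer_vector_p (const version_costs &scalar, const version_costs &vec)
{
  if (vec.latency_cost >= scalar.latency_cost)
    return false;
  if (scalar.store_only_p
      && vec.store_only_p
      && vec.stp_sequence_cost >= scalar.stp_sequence_cost)
    return false;
  return true;
}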
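
The duplicate check in the SLP prologue costing can likewise be
sketched in miniature.  Again this is an illustration rather than the
patch's code: std::set stands in for the hash set keyed by the
patch's vect_scalar_ops_slice_hash, and element_vec and
count_distinct_setups are hypothetical names.

#include <set>
#include <string>
#include <vector>

/* The element values that make up one constant/invariant vector
   built for an SLP node (string placeholders for real operands).  */
typedef std::vector<std::string> element_vec;

/* Count how many vector set-up instructions to charge for a node's
   prologue: a vector whose contents repeat an earlier vector in the
   same node costs nothing extra, because the earlier set-up result
   can simply be reused.  */
static unsigned
count_distinct_setups (const std::vector<element_vec> &node_vectors)
{
  std::set<element_vec> seen;
  unsigned n_setups = 0;
  for (const element_vec &v : node_vectors)
    if (seen.insert (v).second)
      n_setups++;
  return n_setups;
}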
Diffstat (limited to 'gcc/fortran')
0 files changed, 0 insertions, 0 deletions