author | Richard Sandiford <richard.sandiford@arm.com> | 2024-10-07 13:03:04 +0100 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2024-10-07 13:03:04 +0100 |
commit | 2abd04d01bc4e18158c785e75c91576b836f3ba6 (patch) | |
tree | 1756466222954fcb12ea6d5b55f7b436eddee3ca /gcc/gcov.cc | |
parent | 1732298d51028ae50a802e538df5d7249556255d (diff) | |
vect: Restructure repeating_p case for SLP permutations [PR116583]

The repeating_p case previously handled the specific situation
in which the inputs have N lanes and the output has N lanes,
where N divides the number of vector elements. In that case,
every output uses the same permute vector.

The code was therefore structured so that the outer loop only
constructed one permute vector, with an inner loop generating
as many VEC_PERM_EXPRs from it as required.

However, the main patch for PR116583 adds support for cycling
through N permute vectors, rather than just having one.
The current structure doesn't really handle that case well.
(We'd need to interleave the results after generating them,
which sounds a bit fragile.)

This patch instead makes the transform phase calculate each output
vector's permutation explicitly, like for the !repeating_p path.
As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.

This arguably undermines one of the justifications for using repeating_p
for constant-length vectors: that the repeating_p path involved less
work than the !repeating_p path. That justification does still hold for
the analysis phase, though, and that should be the more time-sensitive
part. And the other justification -- to get more coverage of the code --
still applies. So I'd prefer that we continue to use repeating_p for
constant-length vectors unless that causes a known missed optimisation.

gcc/
	PR tree-optimization/116583
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
	the noutputs_per_mask inner loop and instead generate a
	separate permute vector for each output.
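
The difference between the two loop structures described above can be sketched with a small standalone model. This is not the code in tree-vect-slp.cc: the helper names (mask_for_output, masks_shared, masks_per_output) are hypothetical, lanes are modelled as flat indices into one long array, and a mask entry is simply a source offset relative to the start of the output vector. The case where N does not divide the number of vector elements is used only as one way to make the masks differ in this model; the actual conditions handled by the PR116583 patch may be different.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mask = std::vector<long>;

// Toy model (assumption, not GCC code): perm[j] names the lane of each
// N-lane input group that feeds output lane j, with N == perm.size().
// Returns, for output vector V, the source position of every element
// relative to the first element of that output vector.
Mask mask_for_output(const std::vector<int> &perm, std::size_t nunits,
                     std::size_t v)
{
  const std::size_t n = perm.size();
  assert(n > 0);
  Mask mask(nunits);
  for (std::size_t i = 0; i < nunits; ++i)
    {
      std::size_t e = v * nunits + i;  // flat index of the output element
      std::size_t src = (e / n) * n + static_cast<std::size_t>(perm[e % n]);
      mask[i] = static_cast<long>(src) - static_cast<long>(v * nunits);
    }
  return mask;
}

// Old shape: construct one mask, then reuse it for every output vector.
std::vector<Mask> masks_shared(const std::vector<int> &perm,
                               std::size_t nunits, std::size_t noutputs)
{
  Mask mask = mask_for_output(perm, nunits, 0);
  return std::vector<Mask>(noutputs, mask);
}

// New shape: calculate the mask of each output vector explicitly.
std::vector<Mask> masks_per_output(const std::vector<int> &perm,
                                   std::size_t nunits, std::size_t noutputs)
{
  std::vector<Mask> masks;
  for (std::size_t v = 0; v < noutputs; ++v)
    masks.push_back(mask_for_output(perm, nunits, v));
  return masks;
}
```

In this model, perm = {1, 0} with 4 elements per vector gives the same mask for every output, so both shapes agree. With perm = {2, 0, 1} and 4 elements per vector the masks repeat with period 3, so a single shared mask is no longer enough, while the per-output loop handles both cases without an interleaving step.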