riscv-gnu-toolchain/gcc.git

author	Richard Sandiford <richard.sandiford@arm.com>	2024-10-07 13:03:04 +0100
committer	Richard Sandiford <richard.sandiford@arm.com>	2024-10-07 13:03:04 +0100
commit	2abd04d01bc4e18158c785e75c91576b836f3ba6 (patch)
tree	1756466222954fcb12ea6d5b55f7b436eddee3ca /gcc/gcov.cc
parent	1732298d51028ae50a802e538df5d7249556255d (diff)

vect: Restructure repeating_p case for SLP permutations [PR116583]

The repeating_p case previously handled the specific situation in which the inputs have N lanes and the output has N lanes, where N divides the number of vector elements. In that case, every output uses the same permute vector.

The code was therefore structured so that the outer loop only constructed one permute vector, with an inner loop generating as many VEC_PERM_EXPRs from it as required.

However, the main patch for PR116583 adds support for cycling through N permute vectors, rather than just having one. The current structure doesn't really handle that case well. (We'd need to interleave the results after generating them, which sounds a bit fragile.)

This patch instead makes the transform phase calculate each output vector's permutation explicitly, like for the !repeating_p path. As a bonus, it gets rid of one use of SLP_TREE_NUMBER_OF_VEC_STMTS.

This arguably undermines one of the justifications for using repeating_p for constant-length vectors: that the repeating_p path involved less work than the !repeating_p path. That justification does still hold for the analysis phase, though, and that should be the more time-sensitive part. And the other justification -- to get more coverage of the code -- still applies. So I'd prefer that we continue to use repeating_p for constant-length vectors unless that causes a known missed optimisation.

gcc/
	PR tree-optimization/116583
	* tree-vect-slp.cc (vectorizable_slp_permutation_1): Remove
	the noutputs_per_mask inner loop and instead generate a
	separate permute vector for each output.
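The difference between the two loop structures described in the commit message can be illustrated with a toy model. The sketch below is not GCC code: `Vec`, `Mask`, `apply_mask`, `mask_for_output`, `permute_shared_mask` and `permute_per_output` are all hypothetical names, and the permute is a simplified single-input selection rather than a real VEC_PERM_EXPR. It assumes that N (the size of `lane_perm`) divides the number of vector elements, which is the case in which every output uses the same mask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;            // toy stand-in for one vector
using Mask = std::vector<std::size_t>;   // element selectors for one permute

// Toy single-input permute: out[e] = in[mask[e]].
Vec apply_mask(const Vec &in, const Mask &mask) {
  Vec out(mask.size());
  for (std::size_t e = 0; e < mask.size(); ++e)
    out[e] = in[mask[e]];
  return out;
}

// Mask for output vector j (hypothetical helper).  When N divides nunits
// the result does not depend on j; the parameter is kept because this is
// the place where a per-output mask could start to vary.
Mask mask_for_output(std::size_t j, const std::vector<std::size_t> &lane_perm,
                     std::size_t nunits) {
  (void) j;
  const std::size_t n = lane_perm.size();
  assert(n != 0 && nunits % n == 0);
  Mask mask(nunits);
  for (std::size_t e = 0; e < nunits; ++e)
    mask[e] = (e / n) * n + lane_perm[e % n];
  return mask;
}

// Old shape: build one mask, then an inner loop reuses it for every output.
std::vector<Vec> permute_shared_mask(const std::vector<Vec> &inputs,
                                     const std::vector<std::size_t> &lane_perm) {
  std::vector<Vec> outputs;
  if (inputs.empty())
    return outputs;
  const Mask mask = mask_for_output(0, lane_perm, inputs[0].size());
  for (const Vec &in : inputs)
    outputs.push_back(apply_mask(in, mask));
  return outputs;
}

// New shape: one loop over outputs, each computing its own mask explicitly.
std::vector<Vec> permute_per_output(const std::vector<Vec> &inputs,
                                    const std::vector<std::size_t> &lane_perm) {
  std::vector<Vec> outputs;
  for (std::size_t j = 0; j < inputs.size(); ++j) {
    const Mask mask = mask_for_output(j, lane_perm, inputs[j].size());
    outputs.push_back(apply_mask(inputs[j], mask));
  }
  return outputs;
}
```

In this model, swapping adjacent lanes (`lane_perm = {1, 0}`) across the two 4-element inputs `{0, 1, 2, 3}` and `{4, 5, 6, 7}` gives the same result from both functions. The per-output shape does redundant work in this simple case, but it has no shared-mask assumption to unpick if the mask later needs to differ between outputs.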


