author | Luke Lau <luke@igalia.com> | 2025-05-26 18:45:12 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-05-26 18:45:12 +0100 |
commit | 3033f202f6707937cd28c2473479db134993f96f (patch) | |
tree | 2b43e9cefe27089460ee3f51553c1854a4e9cfff /clang/lib/Lex/ModuleMapFile.cpp | |
parent | 841c8d48a62dc62bf8a23883225fd88d6848e45c (diff) | |
[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)
This adds [de]interleave intrinsics for factors of 4,6,8, so that every
interleaved memory operation supported by the in-tree targets can be
represented by a single intrinsic.
For context, [de]interleaves of fixed-length vectors are represented by
a series of shufflevectors. The intrinsics are needed for scalable
vectors, and we don't currently scalably vectorize all possible factors
of interleave groups supported by RISC-V/AArch64.
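
To illustrate what a strided shufflevector mask looks like for a fixed-length deinterleave, here is a minimal Python sketch that models vectors as lists. The helper names (`deinterleave_masks`, `apply_mask`) are hypothetical and are not LLVM APIs; the sketch only shows the index pattern, not how any pass builds or matches it.

```python
def deinterleave_masks(factor, lanes_per_field):
    """Return one strided shuffle mask per field (hypothetical helper).

    Mask k selects lanes k, k + factor, k + 2*factor, ... of the wide vector.
    """
    return [
        [field + factor * i for i in range(lanes_per_field)]
        for field in range(factor)
    ]


def apply_mask(wide, mask):
    """Model a single-source shufflevector: pick wide[m] for each index m."""
    return [wide[m] for m in mask]


# A factor-3 group with 4 lanes per field, laid out as r g b r g b ...
wide = [f"{c}{i}" for i in range(4) for c in "rgb"]
masks = deinterleave_masks(3, 4)
fields = [apply_mask(wide, m) for m in masks]
```

Each mask is an arithmetic sequence with stride equal to the factor, which is the property that makes the fixed-length form easy to recognise.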
The underlying reason for this is that higher factors are currently
represented by interleaving multiple interleaves themselves, which made
sense at the time in the discussion in
https://github.com/llvm/llvm-project/pull/89018.
But after trying to integrate these for higher factors on RISC-V, I think
we should revisit this design choice:
- Matching these in InterleavedAccessPass is non-trivial: We currently
only support factors that are a power of 2, and detecting this requires
a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length
vectors are much easier to pattern match as they are strided patterns,
but for the intrinsics it's much more complicated to match as the
structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on
[de]interleave2 intrinsics
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a
[de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8
we're not really saving much
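
The ambiguity for factor 6 mentioned in the list above can be sketched in Python, with vectors modelled as lists. The `interleave` function below is a hypothetical reference model of the lane ordering, not an LLVM API. It shows two different nestings of factor-2 and factor-3 interleaves that both produce the factor-6 result, so a matcher would have to recognise either tree.

```python
def interleave(*vectors):
    """Reference model: lane 0 of every operand, then lane 1, and so on."""
    assert len({len(v) for v in vectors}) == 1, "operands must have equal length"
    return [lane for lanes in zip(*vectors) for lane in lanes]


# Six fields with two lanes each: a0 a1, b0 b1, ...
a, b, c, d, e, f = ([f"{name}{i}" for i in range(2)] for name in "abcdef")

# The flat factor-6 interleave.
flat = interleave(a, b, c, d, e, f)

# Nesting 1: a factor-2 interleave of two factor-3 interleaves.
two_of_three = interleave(interleave(a, c, e), interleave(b, d, f))

# Nesting 2: a factor-3 interleave of three factor-2 interleaves.
three_of_two = interleave(interleave(a, d), interleave(b, e), interleave(c, f))
```

Note that the operands are grouped differently in each nesting (`a, c, e` versus `a, d`), which is part of why matching the nested form takes more work than matching a single intrinsic.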
By representing these higher factors as interleaved-interleaves, we can
in theory support arbitrarily high interleave factors. However, I'm not
sure this is actually needed in practice: SVE only has instructions
for factors 2,3,4, whilst RVV only supports up to factor 8.
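
As a rough illustration of the interleaved-interleaves representation for power-of-2 factors, the following Python sketch builds a factor-N interleave out of factor-2 interleaves only. Vectors are modelled as lists, and `interleave2` and `interleave_tree` are hypothetical names chosen for this example, not LLVM functions.

```python
def interleave2(x, y):
    """Model of a factor-2 interleave: x0 y0 x1 y1 ..."""
    assert len(x) == len(y), "operands must have equal length"
    return [lane for pair in zip(x, y) for lane in pair]


def interleave_tree(vectors):
    """Interleave a power-of-2 number of vectors using only interleave2.

    The even-indexed and odd-indexed operands are interleaved recursively,
    and the two halves are then joined with one more interleave2.
    """
    n = len(vectors)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles power-of-2 factors")
    if n == 1:
        return list(vectors[0])
    evens = interleave_tree(vectors[0::2])
    odds = interleave_tree(vectors[1::2])
    return interleave2(evens, odds)


# Eight fields with two lanes each: v0_0 v0_1, v1_0 v1_1, ...
fields = [[f"v{k}_{i}" for i in range(2)] for k in range(8)]
tree_result = interleave_tree(fields)

# The flat factor-8 ordering for comparison.
flat_result = [lane for lanes in zip(*fields) for lane in lanes]
```

A factor-8 interleave built this way is a three-level tree of seven `interleave2` nodes, whereas a single factor-8 intrinsic expresses the same lane ordering in one operation.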
This patch would make it much easier to support scalable interleaved
accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as
the loop vectorizer and InterleavedAccessPass wouldn't need to
construct and match trees of interleaves.
For interleave factors above 8, for which there are no hardware memory
operations to match in the InterleavedAccessPass, we can still keep the
wide load + recursive interleaving in the loop vectorizer.