aboutsummaryrefslogtreecommitdiff
path: root/clang/lib/Lex/ModuleMapFile.cpp
diff options
context:
space:
mode:
authorLuke Lau <luke@igalia.com>2025-05-26 18:45:12 +0100
committerGitHub <noreply@github.com>2025-05-26 18:45:12 +0100
commit3033f202f6707937cd28c2473479db134993f96f (patch)
tree2b43e9cefe27089460ee3f51553c1854a4e9cfff /clang/lib/Lex/ModuleMapFile.cpp
parent841c8d48a62dc62bf8a23883225fd88d6848e45c (diff)
downloadllvm-3033f202f6707937cd28c2473479db134993f96f.zip
llvm-3033f202f6707937cd28c2473479db134993f96f.tar.gz
llvm-3033f202f6707937cd28c2473479db134993f96f.tar.bz2
[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)
This adds [de]interleave intrinsics for factors of 4,6,8, so that every interleaved memory operation supported by the in-tree targets can be represented by a single intrinsic. For context, [de]interleaves of fixed-length vectors are represented by a series of shufflevectors. The intrinsics are needed for scalable vectors, and we don't currently scalably vectorize all possible factors of interleave groups supported by RISC-V/AArch64. The underlying reason for this is that higher factors are currently represented by interleaving multiple interleaves themselves, which made sense at the time in the discussion in https://github.com/llvm/llvm-project/pull/89018. But after trying to integrate these for higher factors on RISC-V I think we should revisit this design choice: - Matching these in InterleavedAccessPass is non-trivial: We currently only support factors that are a power of 2, and detecting this requires a good chunk of code - The shufflevector masks used for [de]interleaves of fixed-length vectors are much easier to pattern match as they are strided patterns, but for the intrinsics it's much more complicated to match as the structure is a tree. - Unlike shufflevectors, there's no optimisation that happens on [de]interleave2 intriniscs - For non-power-of-2 factors e.g. 6, there are multiple possible ways a [de]interleave could be represented, see the discussion in #139373 - We already have intrinsics for 2,3,5 and 7, so by avoiding 4,6 and 8 we're not really saving much By representing these higher factors are interleaved-interleaves, we can in theory support arbitrarily high interleave factors. However I'm not sure this is actually needed in practice: SVE only has instructions for factors 2,3,4, whilst RVV only supports up to factor 8. This patch would make it much easier to support scalable interleaved accesses in the loop vectorizer for RISC-V for factors 3,5,6 and 7, as the loop vectorizer and InterleavedAccessPass wouldn't need to construct and match trees of interleaves. For interleave factors above 8, for which there are no hardware memory operations to match in the InterleavedAccessPass, we can still keep the wide load + recursive interleaving in the loop vectorizer.
Diffstat (limited to 'clang/lib/Lex/ModuleMapFile.cpp')
0 files changed, 0 insertions, 0 deletions