rocket-tools/riscv-gnu-toolchain/llvm.git

author	Luke Lau <luke@igalia.com>	2025-05-26 18:45:12 +0100
committer	GitHub <noreply@github.com>	2025-05-26 18:45:12 +0100
commit	3033f202f6707937cd28c2473479db134993f96f (patch)
tree	2b43e9cefe27089460ee3f51553c1854a4e9cfff /clang/lib/Lex/ModuleMapFile.cpp
parent	841c8d48a62dc62bf8a23883225fd88d6848e45c (diff)

[IR] Add llvm.vector.[de]interleave{4,6,8} (#139893)

This adds [de]interleave intrinsics for factors of 4, 6 and 8, so that every interleaved memory operation supported by the in-tree targets can be represented by a single intrinsic.

For context, [de]interleaves of fixed-length vectors are represented by a series of shufflevectors. The intrinsics are needed for scalable vectors, and we don't currently scalably vectorize all possible factors of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently represented by interleaving multiple interleaves themselves, which made sense at the time in the discussion in https://github.com/llvm/llvm-project/pull/89018.

But after trying to integrate these for higher factors on RISC-V I think we should revisit this design choice:

- Matching these in InterleavedAccessPass is non-trivial: We currently only support factors that are a power of 2, and detecting this requires a good chunk of code
- The shufflevector masks used for [de]interleaves of fixed-length vectors are much easier to pattern match as they are strided patterns, but for the intrinsics it's much more complicated to match as the structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on [de]interleave2 intrinsics
- For non-power-of-2 factors e.g. 6, there are multiple possible ways a [de]interleave could be represented, see the discussion in #139373
- We already have intrinsics for 2, 3, 5 and 7, so by avoiding 4, 6 and 8 we're not really saving much

By representing these higher factors as interleaved-interleaves, we can in theory support arbitrarily high interleave factors. However I'm not sure this is actually needed in practice: SVE only has instructions for factors 2, 3 and 4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved accesses in the loop vectorizer for RISC-V for factors 3, 5, 6 and 7, as the loop vectorizer and InterleavedAccessPass wouldn't need to construct and match trees of interleaves. For interleave factors above 8, for which there are no hardware memory operations to match in the InterleavedAccessPass, we can still keep the wide load + recursive interleaving in the loop vectorizer.
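
To illustrate the "tree of interleaves" representation the message argues against, here is a minimal Python sketch that models the element ordering of [de]interleaves on plain lists. The function names (interleave, deinterleave, interleave4_as_tree, and so on) are hypothetical helpers made up for this illustration, not LLVM APIs, and the model ignores scalable vector types entirely. It shows that a factor-4 interleave can be built from factor-2 interleaves, and that factor 6 has at least two equivalent tree shapes, which is what a matcher would have to recognise.

```python
def interleave(*vecs):
    """Model of an N-way interleave: take element i of every input, in order."""
    assert len({len(v) for v in vecs}) == 1, "inputs must have equal length"
    return [x for group in zip(*vecs) for x in group]


def deinterleave(vec, factor):
    """Model of an N-way deinterleave: split into `factor` strided subvectors."""
    assert len(vec) % factor == 0, "length must be a multiple of the factor"
    return [vec[i::factor] for i in range(factor)]


def interleave4_as_tree(a, b, c, d):
    """Factor 4 expressed as a tree of factor-2 interleaves (hypothetical helper)."""
    return interleave(interleave(a, c), interleave(b, d))


def deinterleave4_as_tree(vec):
    """Factor 4 expressed as a tree of factor-2 deinterleaves (hypothetical helper)."""
    even, odd = deinterleave(vec, 2)
    a, c = deinterleave(even, 2)
    b, d = deinterleave(odd, 2)
    return [a, b, c, d]


def interleave6_as_2_of_3(a, b, c, d, e, f):
    """Factor 6 as one factor-2 interleave of two factor-3 interleaves."""
    return interleave(interleave(a, c, e), interleave(b, d, f))


def interleave6_as_3_of_2(a, b, c, d, e, f):
    """Factor 6 as one factor-3 interleave of three factor-2 interleaves."""
    return interleave(interleave(a, d), interleave(b, e), interleave(c, f))


if __name__ == "__main__":
    # Six two-element vectors: ["a0", "a1"], ["b0", "b1"], ...
    vecs = [[f"{name}{i}" for i in range(2)] for name in "abcdef"]
    print(interleave(*vecs[:4]))
    print(interleave4_as_tree(*vecs[:4]))
    print(interleave6_as_2_of_3(*vecs) == interleave6_as_3_of_2(*vecs))
```

In this model a single factor-N operation is one call, whereas the tree forms require the matcher to recover which operand feeds which level, and for factor 6 to accept more than one shape.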

Diffstat (limited to 'clang/lib/Lex/ModuleMapFile.cpp')

0 files changed, 0 insertions, 0 deletions

