author: Sam Parker <sam.parker@arm.com> 2025-08-27 12:43:52 +0100
committer: GitHub <noreply@github.com> 2025-08-27 12:43:52 +0100
commit: 7b3e77f8d94c9abda1675c62f70caf12e3d7d5ce
tree: cede7a546fd32603e46a06c44e7915e782a40940
parent: 810ac29cfe81cbd8f2e97d06f0acd540f841c754
[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)
First pass where we calculate the cost of the memory operation, as well as the shuffles required. Interleaving by a factor of two should be relatively cheap, as many ISAs have dedicated instructions to perform the (de)interleaving. Several of these permutations can be combined for an interleave stride of 4, and this is the highest stride we allow. I've costed larger vectors, and more lanes, as more expensive because not only is more work needed, but the risk of codegen going 'wrong' rises dramatically. I also filled in a bit of cost modelling for vector stores.

It appears the main vector plan to avoid is an interleave factor of 4 with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on AArch64, and observe a geomean improvement of ~3%, with some kernels improving by 40-60%.

I know there is still significant performance being left on the table, so this will need more development along with the rest of the cost model.
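
For illustration only, the shape of such a cost (memory operations plus shuffles, growing with interleave factor and lane count) could be sketched as below. This is a hypothetical toy model, not the code in this patch: the function name toyInterleavedCost, its parameters, the restriction to factors 2 and 4, and every weight in it are assumptions made up for the example. The only facts it relies on are that WebAssembly SIMD vectors are 128 bits wide and that the message caps the interleave stride at 4.

```cpp
#include <cassert>
#include <optional>

// Hypothetical toy model; NOT the LLVM implementation from this commit.
// factor:   interleave factor (assumed: only 2 and 4 are costed here)
// lanes:    lanes in each de-interleaved sub-vector
// elemBits: element width in bits
// Returns std::nullopt when the toy declines to cost the operation, so a
// caller could fall back to some generic cost.
std::optional<unsigned> toyInterleavedCost(unsigned factor, unsigned lanes,
                                           unsigned elemBits) {
  assert(lanes > 0 && elemBits > 0 && "empty vector type");
  constexpr unsigned kVectorBits = 128; // WebAssembly SIMD register width

  if (factor != 2 && factor != 4)
    return std::nullopt;

  // One memory operation per 128-bit register covered by the wide vector.
  unsigned totalBits = factor * lanes * elemBits;
  unsigned regs = (totalBits + kVectorBits - 1) / kVectorBits;
  unsigned memCost = regs;

  // Assumed weights: factor 2 needs one round of permutes per register,
  // factor 4 combines two rounds; many-lane vectors are penalised.
  unsigned rounds = (factor == 2) ? 1 : 2;
  unsigned lanePenalty = (lanes >= 16) ? 2 : 1;
  unsigned shuffleCost = regs * rounds * lanePenalty;

  return memCost + shuffleCost;
}
```

Under these made-up weights, a factor-2 access with four 32-bit lanes costs 4, while a factor-4 access with sixteen 8-bit lanes (the v16i8 case the message singles out) costs 20, which reflects the relative ordering described above rather than any real measurement.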