riscv-gnu-toolchain/llvm.git

author	Sam Parker <sam.parker@arm.com>	2025-08-27 12:43:52 +0100
committer	GitHub <noreply@github.com>	2025-08-27 12:43:52 +0100
commit	7b3e77f8d94c9abda1675c62f70caf12e3d7d5ce (patch)
tree	cede7a546fd32603e46a06c44e7915e782a40940 /clang/lib/AST/ByteCode/Compiler.cpp
parent	810ac29cfe81cbd8f2e97d06f0acd540f841c754 (diff)
download	llvm-7b3e77f8d94c9abda1675c62f70caf12e3d7d5ce.zip llvm-7b3e77f8d94c9abda1675c62f70caf12e3d7d5ce.tar.gz llvm-7b3e77f8d94c9abda1675c62f70caf12e3d7d5ce.tar.bz2

[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)

First pass where we calculate the cost of the memory operation, as well as the shuffles required. Interleaving by a factor of two should be relatively cheap, as many ISAs have dedicated instructions to perform the (de)interleaving. Several of these permutations can be combined for an interleave stride of 4, and this is the highest stride we allow.

I've costed larger vectors, and more lanes, as more expensive because not only is more work needed, but the risk of codegen going 'wrong' also rises dramatically. I also filled in a bit of cost modelling for vector stores. It appears the main vector plan to avoid is an interleave factor of 4 with v16i8.

I've used libyuv and ncnn for benchmarking, using V8 on AArch64, and observe a geomean improvement of ~3%, with some kernels improving by 40-60%. I know there is still significant performance being left on the table, so this will need more development along with the rest of the cost model.
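To make the shape of that heuristic concrete, here is a minimal standalone sketch of a cost function in the same spirit. It is not the code from this patch: the function name `sketchInterleavedCost`, its parameters, and every constant are hypothetical. The sketch also assumes only 128-bit vectors are costed, which the commit message does not state. It shows only the overall structure described above: the memory operation cost plus the shuffle cost, a cap at factor 4, and a penalty that grows with lane count.

```cpp
#include <optional>

// Hypothetical illustration only; names and constants are assumptions,
// not values taken from the WebAssembly TTI implementation.
// Returns std::nullopt when the sketch declines to cost the access.
std::optional<unsigned> sketchInterleavedCost(unsigned Factor,
                                              unsigned NumLanes,
                                              unsigned ElemBits) {
  // Assumption: only interleave factors 2 and 4 are modelled,
  // with 4 being the highest stride allowed.
  if (Factor != 2 && Factor != 4)
    return std::nullopt;

  // Assumption: each member vector is exactly 128 bits wide.
  if (NumLanes * ElemBits != 128)
    return std::nullopt;

  // One memory operation per 128-bit vector touched.
  unsigned MemCost = Factor;

  // Factor 2 needs one round of (de)interleaving per vector;
  // factor 4 is modelled as two combined rounds.
  unsigned RoundsPerVector = (Factor == 2) ? 1 : 2;

  // More lanes are costed as more expensive shuffles
  // (v4i32 -> 1, v8i16 -> 2, v16i8 -> 4).
  unsigned LanePenalty = NumLanes / 4;

  unsigned ShuffleCost = Factor * RoundsPerVector * LanePenalty;
  return MemCost + ShuffleCost;
}
```

With these made-up constants, a factor-4 access on v16i8 comes out far more expensive than a factor-2 access on v4i32, which mirrors the observation that factor 4 with v16i8 is the main plan to avoid.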

Diffstat (limited to 'clang/lib/AST/ByteCode/Compiler.cpp')

0 files changed, 0 insertions, 0 deletions

