rocket-tools/riscv-gnu-toolchain/llvm.git

author	Daniel Bertalan <dani@danielbertalan.dev>	2024-07-22 19:06:43 +0200
committer	GitHub <noreply@github.com>	2024-07-22 19:06:43 +0200
commit	90569e02e63ff5d0915446919f564e9b3638fe2a (patch)
tree	47742731a4b869b83845d24c6a9adeb948c42548 /clang/lib/CodeGen/CodeGenModule.cpp
parent	1c798e0b077f062dbe56603021a9b67c7621ffe0 (diff)

[Support] Add Arm NEON implementation for `llvm::xxh3_64bits` (#99634)

Compared to the generic scalar code, using Arm NEON instructions yields a ~11x speedup: 31 ms vs 339.5 ms to hash 1 GiB of random data on the Apple M1.

This follows the upstream xxHash implementation closely, with the following simplifications:
- Removed workarounds for suboptimal codegen on older GCC
- Removed instruction reordering barriers, which seem to have a negligible impact according to my measurements
- We do not support WebAssembly's mostly NEON-compatible API
- There is no configurable mixing of SIMD and scalar code; according to the upstream comments, this is only relevant for smaller Cortex cores, which can dispatch relatively few NEON micro-ops per cycle.

This commit intends to use only standard ACLE intrinsics and datatypes, so it should build with all supported versions of GCC, Clang and MSVC. This feature is enabled by default when targeting AArch64, but the `LLVM_XXH_USE_NEON=0` macro can be set to explicitly disable it.

XXH3 is used for ICF, string deduplication and computing the UUID in ld64.lld; this commit results in a -1.77% +/- 0.59% change in link time (i.e. a speed improvement) for a `--threads=8` link of Chromium.framework.
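To illustrate where the NEON speedup comes from, here is a minimal sketch of the inner "accumulate one 64-byte stripe" step of XXH3's long-input path, written once in scalar form and once with ACLE NEON intrinsics. This is not LLVM's actual code: the function names (`accumulate512Scalar`, `accumulate512Neon`, `accumulate512`) and the `read64` helper are hypothetical. The preprocessor block only mirrors the behaviour described above (NEON on by default for AArch64, opt-out via `-DLLVM_XXH_USE_NEON=0`), and a little-endian host is assumed for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical gating that mirrors the commit message: default to NEON on
// AArch64 unless the build passes -DLLVM_XXH_USE_NEON=0.
#ifndef LLVM_XXH_USE_NEON
#if defined(__aarch64__) || defined(_M_ARM64)
#define LLVM_XXH_USE_NEON 1
#else
#define LLVM_XXH_USE_NEON 0
#endif
#endif

#if LLVM_XXH_USE_NEON
#include <arm_neon.h>
#endif

// Load a 64-bit word (assumes a little-endian host, as on AArch64).
static uint64_t read64(const uint8_t *p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Scalar form: 8 accumulators, one 64-byte stripe of input and secret.
static void accumulate512Scalar(uint64_t acc[8], const uint8_t *input,
                                const uint8_t *secret) {
  for (size_t i = 0; i < 8; ++i) {
    uint64_t dataVal = read64(input + 8 * i);
    uint64_t dataKey = dataVal ^ read64(secret + 8 * i);
    acc[i ^ 1] += dataVal;                               // add to neighbour lane
    acc[i] += (dataKey & 0xFFFFFFFFu) * (dataKey >> 32); // 32x32->64 multiply
  }
}

#if LLVM_XXH_USE_NEON
// NEON form: processes two 64-bit accumulators per 128-bit register.
static void accumulate512Neon(uint64_t acc[8], const uint8_t *input,
                              const uint8_t *secret) {
  for (size_t i = 0; i < 4; ++i) {
    uint64x2_t accVec = vld1q_u64(acc + 2 * i);
    uint64x2_t dataVec = vreinterpretq_u64_u8(vld1q_u8(input + 16 * i));
    uint64x2_t keyVec = vreinterpretq_u64_u8(vld1q_u8(secret + 16 * i));
    uint64x2_t dataKey = veorq_u64(dataVec, keyVec);
    // Swap the two 64-bit lanes: the vector equivalent of acc[i ^ 1].
    uint64x2_t dataSwap = vextq_u64(dataVec, dataVec, 1);
    uint32x2_t keyLo = vmovn_u64(dataKey);       // low 32 bits of each lane
    uint32x2_t keyHi = vshrn_n_u64(dataKey, 32); // high 32 bits of each lane
    accVec = vaddq_u64(accVec, dataSwap);
    accVec = vmlal_u32(accVec, keyLo, keyHi);    // widening multiply-accumulate
    vst1q_u64(acc + 2 * i, accVec);
  }
}
#endif

// Dispatch chosen at compile time by the macro above.
static void accumulate512(uint64_t acc[8], const uint8_t *input,
                          const uint8_t *secret) {
#if LLVM_XXH_USE_NEON
  accumulate512Neon(acc, input, secret);
#else
  accumulate512Scalar(acc, input, secret);
#endif
}
```

Each NEON iteration replaces two scalar iterations: the lane swap (`vextq_u64`) implements the `acc[i ^ 1]` cross-add, and `vmlal_u32` fuses the 32x32->64 multiply with the accumulate. Because both paths compute the same sums, building with `-DLLVM_XXH_USE_NEON=0` on AArch64 should change only performance, not hash values.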

Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.cpp')

0 files changed, 0 insertions, 0 deletions