author: Daniel Bertalan <dani@danielbertalan.dev> 2024-07-22 19:06:43 +0200
committer: GitHub <noreply@github.com> 2024-07-22 19:06:43 +0200
commit: 90569e02e63ff5d0915446919f564e9b3638fe2a
tree: 47742731a4b869b83845d24c6a9adeb948c42548
parent: 1c798e0b077f062dbe56603021a9b67c7621ffe0
[Support] Add Arm NEON implementation for `llvm::xxh3_64bits` (#99634)
Compared to the generic scalar code, using Arm NEON instructions yields a ~11x speedup: hashing 1 GiB of random data on the Apple M1 takes 31 ms instead of 339.5 ms.

This follows the upstream implementation closely, with some simplifications:
- Removed workarounds for suboptimal codegen on older GCC.
- Removed instruction reordering barriers, which seem to have a negligible impact according to my measurements.
- We do not support WebAssembly's mostly NEON-compatible API.
- There is no configurable mixing of SIMD and scalar code; according to the upstream comments, this is only relevant for smaller Cortex cores, which can dispatch relatively few NEON micro-ops per cycle.

This commit intends to use only standard ACLE intrinsics and datatypes, so it should build with all supported versions of GCC, Clang and MSVC.

This feature is enabled by default when targeting AArch64, but the `LLVM_XXH_USE_NEON=0` macro can be set to explicitly disable it.

XXH3 is used for ICF, string deduplication and computing the UUID in ld64.lld; this commit results in a 1.77% ± 0.59% speed improvement for a `--threads=8` link of Chromium.framework.
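For illustration, here is a minimal sketch of one XXH3 accumulate round over a 64-byte stripe, written once as scalar code and once with standard ACLE NEON intrinsics (two 64-bit lanes per 128-bit register). It assumes a little-endian host, and the names `accumulateScalar`, `accumulate`, `kAccLanes` and `readLE64` are hypothetical. The default-on-AArch64 logic for `LLVM_XXH_USE_NEON` is likewise an assumption modeled on the behavior described above; the actual code in LLVM's Support library may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical default: enable NEON on little-endian AArch64 unless the
// build explicitly sets LLVM_XXH_USE_NEON=0.
#ifndef LLVM_XXH_USE_NEON
#if (defined(__aarch64__) || defined(_M_ARM64)) && !defined(__ARM_BIG_ENDIAN)
#define LLVM_XXH_USE_NEON 1
#else
#define LLVM_XXH_USE_NEON 0
#endif
#endif

#if LLVM_XXH_USE_NEON
#include <arm_neon.h>
#endif

// Number of 64-bit accumulator lanes in one 64-byte XXH3 stripe.
constexpr std::size_t kAccLanes = 8;

static uint64_t readLE64(const uint8_t *p) {
  uint64_t v;
  std::memcpy(&v, p, sizeof(v)); // assumes a little-endian host
  return v;
}

// Scalar reference: one accumulate round over a 64-byte stripe.
void accumulateScalar(uint64_t acc[kAccLanes], const uint8_t *input,
                      const uint8_t *secret) {
  assert(acc && input && secret); // stripe pointers must be valid
  for (std::size_t i = 0; i < kAccLanes; ++i) {
    uint64_t dataVal = readLE64(input + 8 * i);
    uint64_t dataKey = dataVal ^ readLE64(secret + 8 * i);
    acc[i ^ 1] += dataVal; // adjacent lanes exchange their input words
    acc[i] += (dataKey & 0xFFFFFFFFu) * (dataKey >> 32); // 32x32->64 multiply
  }
}

// Same round, two lanes per NEON register; falls back to scalar otherwise.
void accumulate(uint64_t acc[kAccLanes], const uint8_t *input,
                const uint8_t *secret) {
#if LLVM_XXH_USE_NEON
  for (std::size_t i = 0; i < kAccLanes / 2; ++i) {
    uint64x2_t accVec = vld1q_u64(acc + 2 * i);
    uint64x2_t dataVec = vreinterpretq_u64_u8(vld1q_u8(input + 16 * i));
    uint64x2_t keyVec = vreinterpretq_u64_u8(vld1q_u8(secret + 16 * i));
    uint64x2_t dataKey = veorq_u64(dataVec, keyVec);
    // Swap the two 64-bit lanes: {d0, d1} -> {d1, d0}.
    uint64x2_t dataSwap = vextq_u64(dataVec, dataVec, 1);
    uint32x2_t keyLo = vmovn_u64(dataKey);       // low 32 bits of each lane
    uint32x2_t keyHi = vshrn_n_u64(dataKey, 32); // high 32 bits of each lane
    // Widening multiply-accumulate: dataSwap + keyLo * keyHi.
    uint64x2_t sum = vmlal_u32(dataSwap, keyLo, keyHi);
    vst1q_u64(acc + 2 * i, vaddq_u64(accVec, sum));
  }
#else
  accumulateScalar(acc, input, secret);
#endif
}
```

On hosts without NEON, or when built with `-DLLVM_XXH_USE_NEON=0`, `accumulate` simply forwards to the scalar reference, which makes it easy to compare both paths lane by lane on the same input and secret.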
Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.cpp')
0 files changed, 0 insertions, 0 deletions