path: root/llvm/lib/CodeGen/CodeGen.cpp
author: Guillaume Chatelet <gchatelet@google.com> 2024-10-22 10:48:43 +0200
committer: GitHub <noreply@github.com> 2024-10-22 10:48:43 +0200
commit: 2f58ac4a22baa27c1e9aad1b3c6d5c687ef03721
tree: fb5136243a6c9ed8edc65a56d4344cef460ca6ba /llvm/lib/CodeGen/CodeGen.cpp
parent: 9ae41c24b37f5ce22c5b5a2f3bc0680aaf174f35
[libc][x86] copy one cache line at a time to prevent the use of `rep;movsb` (#113161)
When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying 3 cache lines in one go (192B) gets converted into `rep;movsb`, which translates into a 60% hit in performance. Consecutive calls to `__builtin_memcpy_inline` (the implementation behind `builtin::Memcpy::block_offset`) are not coalesced by the compiler, so calling it three times in a row generates the desired assembly. The resulting code differs only in the interleaving of the loads and stores, which does not affect performance. This is needed to reland https://github.com/llvm/llvm-project/pull/108939.
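The idea can be illustrated with a minimal sketch. This is not the code from the patch: the names `kCacheLineSize`, `copy_block`, and `copy_three_cache_lines` are hypothetical, and a 64-byte cache line is assumed (consistent with 3 cache lines being 192B). `__builtin_memcpy_inline` is a Clang builtin, so the sketch falls back to `__builtin_memcpy` on other compilers; that fallback only keeps the example portable and does not carry the same code generation behavior.

```cpp
#include <cstddef>

// Detect the Clang builtin; other compilers take the portable fallback below.
#if defined(__has_builtin)
#if __has_builtin(__builtin_memcpy_inline)
#define SKETCH_HAS_MEMCPY_INLINE 1
#endif
#endif

// Assumption: 64-byte cache lines, so three lines are 192 bytes.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical helper: copies Size bytes starting at `offset`.
// Size is a compile-time constant, as __builtin_memcpy_inline requires.
template <std::size_t Size>
inline void copy_block(char *dst, const char *src, std::size_t offset) {
#ifdef SKETCH_HAS_MEMCPY_INLINE
  __builtin_memcpy_inline(dst + offset, src + offset, Size);
#else
  // Fallback for illustration only; the compiler may lower this differently.
  __builtin_memcpy(dst + offset, src + offset, Size);
#endif
}

// Copy 192 bytes as three separate 64-byte blocks instead of one
// 192-byte block, which is the shape of the change described above.
inline void copy_three_cache_lines(char *dst, const char *src) {
  copy_block<kCacheLineSize>(dst, src, 0 * kCacheLineSize);
  copy_block<kCacheLineSize>(dst, src, 1 * kCacheLineSize);
  copy_block<kCacheLineSize>(dst, src, 2 * kCacheLineSize);
}
```

Whether a given compiler version actually emits `rep;movsb` for the single 192-byte form depends on the target flags, so inspecting the generated assembly is the only reliable check.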
Diffstat (limited to 'llvm/lib/CodeGen/CodeGen.cpp')
0 files changed, 0 insertions, 0 deletions