author | Guillaume Chatelet <gchatelet@google.com> | 2024-10-22 10:48:43 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-10-22 10:48:43 +0200 |
commit | 2f58ac4a22baa27c1e9aad1b3c6d5c687ef03721 (patch) | |
tree | fb5136243a6c9ed8edc65a56d4344cef460ca6ba /llvm/lib/CodeGen/CodeGen.cpp | |
parent | 9ae41c24b37f5ce22c5b5a2f3bc0680aaf174f35 (diff) | |
[libc][x86] copy one cache line at a time to prevent the use of `rep;movsb` (#113161)
When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying
3 cache lines in one go (192B) gets converted into `rep;movsb`, which
translates into a 60% hit in performance.
Consecutive calls to `__builtin_memcpy_inline` (the implementation behind
`builtin::Memcpy::block_offset`) are not coalesced by the compiler, so
calling it three times in a row, once per cache line, generates the
desired assembly. The result only differs in the interleaving of the
loads and stores, which does not affect performance.
This is needed to reland
https://github.com/llvm/llvm-project/pull/108939.
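The difference between the two copy strategies described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `copy_block`, `copy_three_lines_at_once`, `copy_three_lines_one_by_one` and `kCacheLine` are hypothetical names, and the 64-byte cache line size is an assumption consistent with "3 cache lines (192B)". The fallback to `std::memcpy` exists only so the sketch builds on compilers without `__builtin_memcpy_inline`. Whether the single 192-byte copy is actually lowered to `rep;movsb` depends on the compiler version and flags.

```cpp
#include <cstddef>
#include <cstring>

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Assumed cache line size (hypothetical constant, 3 * 64 = 192 bytes).
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: copies exactly Size bytes, where Size is a
// compile-time constant as required by __builtin_memcpy_inline.
template <std::size_t Size>
inline void copy_block(char *dst, const char *src) {
#if __has_builtin(__builtin_memcpy_inline)
  __builtin_memcpy_inline(dst, src, Size);
#else
  std::memcpy(dst, src, Size); // fallback for compilers without the builtin
#endif
}

// One 192-byte copy: the shape the commit message reports as being
// converted into `rep;movsb` under the flags it mentions.
inline void copy_three_lines_at_once(char *dst, const char *src) {
  copy_block<3 * kCacheLine>(dst, src);
}

// Three consecutive 64-byte copies: the shape the commit switches to,
// relying on the calls not being coalesced by the compiler.
inline void copy_three_lines_one_by_one(char *dst, const char *src) {
  copy_block<kCacheLine>(dst, src);
  copy_block<kCacheLine>(dst + kCacheLine, src + kCacheLine);
  copy_block<kCacheLine>(dst + 2 * kCacheLine, src + 2 * kCacheLine);
}
```

Both functions copy the same 192 bytes. Only the generated instruction sequence is expected to differ, which can be checked by compiling with Clang and the flags from the commit message and inspecting the assembly.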
Diffstat (limited to 'llvm/lib/CodeGen/CodeGen.cpp')
0 files changed, 0 insertions, 0 deletions