riscv-gnu-toolchain/llvm.git

author	Guillaume Chatelet <gchatelet@google.com>	2024-10-22 10:48:43 +0200
committer	GitHub <noreply@github.com>	2024-10-22 10:48:43 +0200
commit	2f58ac4a22baa27c1e9aad1b3c6d5c687ef03721 (patch)
tree	fb5136243a6c9ed8edc65a56d4344cef460ca6ba /llvm/lib/CodeGen/CodeGen.cpp
parent	9ae41c24b37f5ce22c5b5a2f3bc0680aaf174f35 (diff)

[libc][x86] copy one cache line at a time to prevent the use of `rep;movsb` (#113161)

When using `-mprefer-vector-width=128` with `-march=sandybridge`, copying 3 cache lines in one go (192B) gets converted into `rep;movsb`, which translates into a 60% performance hit. Consecutive calls to `__builtin_memcpy_inline` (the implementation behind `builtin::Memcpy::block_offset`) are not coalesced by the compiler, so calling it three times in a row generates the desired assembly. The generated code differs only in the interleaving of the loads and stores, which does not affect performance. This is needed to reland https://github.com/llvm/llvm-project/pull/108939.
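For illustration, the idea of issuing three separate 64-byte copies instead of a single 192-byte copy might be sketched as follows. This is not the actual libc code: the helper names `copy_cache_line` and `copy_three_cache_lines` and the constant `kCacheLine` are hypothetical, the 64-byte cache-line size is an assumption, and the fallback to `__builtin_memcpy` exists only so the sketch builds on compilers without `__builtin_memcpy_inline`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Detect Clang's __builtin_memcpy_inline; other compilers may not provide it.
#if defined(__has_builtin)
#if __has_builtin(__builtin_memcpy_inline)
#define SKETCH_HAS_MEMCPY_INLINE 1
#endif
#endif

// Assumption: 64-byte cache lines, so 3 lines = 192 bytes.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: copy exactly one cache line starting at `offset`.
static inline void copy_cache_line(char *dst, const char *src,
                                   std::size_t offset) {
#ifdef SKETCH_HAS_MEMCPY_INLINE
  // The size is a compile-time constant, as the builtin expects.
  __builtin_memcpy_inline(dst + offset, src + offset, kCacheLine);
#else
  // Fallback for illustration only; it does not guarantee inline expansion.
  __builtin_memcpy(dst + offset, src + offset, kCacheLine);
#endif
}

// Hypothetical helper: copy 192 bytes as three independent 64-byte copies
// rather than one 192-byte copy.
static inline void copy_three_cache_lines(char *dst, const char *src) {
  copy_cache_line(dst, src, 0);
  copy_cache_line(dst, src, kCacheLine);
  copy_cache_line(dst, src, 2 * kCacheLine);
}
```

Whether a given compiler version actually emits `rep;movsb` for the single 192-byte form depends on the target flags, so the generated assembly is worth inspecting with the options named in the commit message.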

Diffstat (limited to 'llvm/lib/CodeGen/CodeGen.cpp')

0 files changed, 0 insertions, 0 deletions