author     Fabian Ritter <fabian.ritter@amd.com>  2024-10-28 09:04:19 +0100
committer  GitHub <noreply@github.com>  2024-10-28 09:04:19 +0100
commit     a4fd3dba6e285734bc635b0651a30dfeffedeada
tree       8534b669c66318b8e867065b74f1a0182201d54a
parent     35f6cc6af09f48f9038fce632815a2ad6ffe8689
[AMDGPU] Use wider loop lowering type for LowerMemIntrinsics (#112332)
When llvm.memcpy or llvm.memmove intrinsics are lowered as a loop in LowerMemIntrinsics.cpp, the loop consists of a single load/store pair per iteration. We can improve performance in some cases by emitting multiple load/store pairs per iteration. This patch achieves that by increasing the width of the loop lowering type in the GCN target and letting legalization split the resulting too-wide access pairs into multiple legal access pairs.

This change only affects lowered memcpys and memmoves with large (>= 1024 bytes) constant lengths. Smaller constant lengths are handled by ISel directly; non-constant lengths would be slowed down by this change if the dynamic length was smaller or slightly larger than what an unrolled iteration copies.

The chosen default unroll factor is the result of microbenchmarks on gfx1030. This change leads to speedups of 15-38% for global memory and 1.9-5.8x for scratch in these microbenchmarks.

Part of SWDEV-455845.
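For illustration, a minimal C++ sketch of the idea behind the change: the hook that picks the loop lowering type returns a deliberately over-wide vector for large constant lengths, and legalization later splits the resulting accesses into multiple legal load/store pairs per iteration. The function name, the 16-byte base width, and the unroll factor of 8 are hypothetical placeholders, not the exact API or tuned values of the actual patch.

```cpp
// Sketch only: not the real GCN implementation.
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

using namespace llvm;

// Pick the per-iteration access type for a lowered memcpy/memmove loop.
// The 1024-byte threshold mirrors the "large constant length" case from
// the commit message; BaseBytes and the unroll factor are illustrative.
static Type *pickMemcpyLoopLoweringType(LLVMContext &Ctx,
                                        uint64_t ConstantLength) {
  const unsigned BaseBytes = 16; // one legal 4 x i32 access per iteration
  // For large constant lengths, request a wider (possibly illegal) vector;
  // type legalization splits it into several legal 16-byte accesses, so the
  // loop body ends up with multiple load/store pairs per iteration.
  const unsigned UnrollFactor = (ConstantLength >= 1024) ? 8 : 1;
  const unsigned NumI32 = (BaseBytes / 4) * UnrollFactor;
  return FixedVectorType::get(Type::getInt32Ty(Ctx), NumI32);
}
```

The key design point is that the lowering code itself stays a simple single-access-per-iteration loop; the widening happens purely through the chosen element type, and legalization does the splitting.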
Diffstat (limited to 'llvm/lib/CodeGen/MachineOperand.cpp')
0 files changed, 0 insertions, 0 deletions