| author | macurtis-amd <macurtis@amd.com> | 2025-09-15 05:03:02 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-09-15 05:03:02 -0500 |
| commit | 2c091e6aec2d48fbcafc9cc5909a62f0321db1fd (patch) | |
| tree | 61d078c53ededd4784d11908c3aea2f7f2d508a1 /clang/lib/Frontend/CompilerInvocation.cpp | |
| parent | d8c2607fb1f4094db18e7716764738f9bc8489df (diff) | |
AMDGPU: Report unaligned scratch access as fast if supported by tgt (#158036)
This enables more consecutive load folding during
aggressive-instcombine.
The original motivating example was provided by Jeff Byrnes:
https://godbolt.org/z/8ebcTEjTs
Nikita Popov provided a further example (https://godbolt.org/z/Gv1j4vjqE) as
part of my original attempt to fix the issue (PR
[#133301](https://github.com/llvm/llvm-project/pull/133301); see his
[comment](https://github.com/llvm/llvm-project/pull/133301#issuecomment-2984905809)).
This changes the value of `IsFast` returned by
`SITargetLowering::allowsMisalignedMemoryAccessesImpl` to be non-zero for
private and flat addresses when the subtarget supports unaligned scratch
accesses.
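A hedged sketch of the shape of the change inside
`SITargetLowering::allowsMisalignedMemoryAccessesImpl`, assuming the subtarget
predicate is spelled `hasUnalignedScratchAccess()` and that `IsFast` is an
`unsigned` out-parameter; the actual upstream diff may differ:

```cpp
// Sketch only, not the verbatim patch. A non-zero *IsFast means the
// target considers the misaligned access fast.
if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS ||
    AddrSpace == AMDGPUAS::FLAT_ADDRESS) {
  if (Subtarget->hasUnalignedScratchAccess()) {
    // Previously *IsFast stayed 0 here even when the access was legal;
    // reporting it as fast lets cost-sensitive callers fold loads.
    if (IsFast)
      *IsFast = 1;
    return true;
  }
}
// ... remaining address-space cases unchanged ...
```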
This enables aggressive-instcombine to do more folding of consecutive
loads (see
[here](https://github.com/llvm/llvm-project/blob/cbd496581fb6953a9a8d8387a010cc3a67d4654b/llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp#L811)).
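For illustration (an example of the folded pattern, not taken from the patch),
the canonical shape that load merging targets is an or of shifted byte loads;
once misaligned accesses are reported as fast, these can be combined into a
single wide load:

```cpp
#include <cstdint>

// Little-endian 32-bit load spelled as four adjacent byte loads.
// aggressive-instcombine can merge these into one (possibly
// misaligned) i32 load when the target reports such an access as fast.
uint32_t load_le32(const uint8_t *p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}
```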
Summary of the performance impact on
[composable_kernel](https://github.com/ROCm/composable_kernel):
|GPU|speedup (geomean*)|
|---|---|
|MI300A| 1.11|
|MI300X| 1.14|
|MI350X| 1.03|
[*] To be clear, this is the geomean across kernels that were impacted by this
change, not across all CK kernels.
Diffstat (limited to 'clang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions