author | Nikita Popov <npopov@redhat.com> | 2024-05-16 10:21:22 +0900 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-05-16 10:21:22 +0900 |
commit | f0b3654701bde1cf7821d60698b42383edaff9f3 (patch) | |
tree | 439bfe564a1e2cf87e2bc48e3f54288a3ef3d4f1 /flang/lib/Frontend/CompilerInvocation.cpp | |
parent | 72200fcc346bee1830d9e640e42d717a55acd74c (diff) | |
[LoopUnroll] Clamp PartialThreshold for large LoopMicroOpBufferSize (#67657)

The znver3/znver4 scheduler models are outliers: they specify a very
large LoopMicroOpBufferSize of 512, while typical values for other
subtargets are around 50. Even if this information is
micro-architecturally correct (*), this does not mean that we want to
runtime-unroll all loops to a size that completely fills the loop
buffer. Unless this is the single hot loop in the entire application,
the massive code size increase will bust the micro-op and instruction
caches.

Protect against this by clamping the partial-unroll threshold to the
default PartialThreshold of 150, which is the same as the default
full-unroll threshold and half the aggressive full-unroll threshold.
Allowing more partial unrolling than full unrolling certainly does not
make sense.
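
The clamp described above can be sketched in isolation. This is only an illustrative sketch, not the code from the patch: the constant `kDefaultPartialThreshold` and the helper `clampPartialThreshold` are hypothetical names introduced here, and the only values taken from the commit message are 150, 512 and ~50.

```cpp
#include <algorithm>

// Hypothetical constant: the default PartialThreshold of 150 named in the
// commit message. The name is an assumption made for this sketch.
constexpr unsigned kDefaultPartialThreshold = 150;

// Hypothetical helper: derive a partial-unroll threshold from a scheduler
// model's LoopMicroOpBufferSize, but never exceed the default threshold.
unsigned clampPartialThreshold(unsigned loopMicroOpBufferSize) {
  return std::min(loopMicroOpBufferSize, kDefaultPartialThreshold);
}
```

Under these assumptions, a model reporting 512 (as znver3/znver4 do) yields a threshold of 150, while a model reporting 50 is unaffected and keeps 50.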

(*) I strongly doubt that this is actually correct -- I believe this may
derive from an incorrect reading of Agner Fog's micro-architecture
guide. The number 4096 that was originally used here is the size of the
general micro-op cache, not that of a loop buffer. A separate loop
buffer is not listed for the Zen microarchitecture. For comparison, the
listing for Skylake shows a 1536 micro-op buffer, but only a 64
micro-op loopback buffer, with a note that it's rarely fully utilized.
Our scheduling model specifies a LoopMicroOpBufferSize of 50 in that
case.