author Leandro Lupori <leandro.lupori@linaro.org> 2025-07-04 08:49:51 -0300
committer GitHub <noreply@github.com> 2025-07-04 08:49:51 -0300
commit 0ba59587fa98849ed5107fee4134e810e84b69a3
tree 423f4c7922df67162d870ec86b54fcb689cf37b3 /flang/lib/Frontend/CompilerInvocation.cpp
parent d17a248fc6147e52c56e1bf21affc7840dea9743
[flang] Optimize assignments of multidimensional arrays (#146408)
Assignments of n-dimensional arrays with a trivial RHS were always being converted to n nested loops. For contiguous arrays, it is possible to flatten them and use a single loop, which LLVM can usually optimize better.

In a test program using a 3-dimensional array of varying size, the resulting speedup was as follows (measured on Graviton4):

Size   Speedup
16K    1.09
64K    1.40
128K   1.90
256K   1.91
512K   1.00

For sizes of 512K and above, no improvement was observed. It looks like LLVM stops trying to perform aggressive loop unrolling beyond a certain threshold and just uses nested loops instead. Larger sizes also no longer fit in the L1 and L2 caches.

This was noticed while profiling 527.cam4_r. This optimization makes aer_rad_props_sw slightly faster, but unfortunately it has practically no effect on the total execution time of 527.cam4_r.
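The flattening idea can be illustrated outside the compiler. The C sketch below is hypothetical: it is not the code Flang generates, and the names assign_nested, assign_flat and forms_agree are made up for illustration. It assumes a contiguous, column-major 3-D array of double and a scalar (trivial) RHS, and shows that writing the elements through one loop over n1*n2*n3 indices produces the same contents as three nested loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Nested form: one loop per dimension. The innermost loop walks the first
   index, matching Fortran's column-major element order. */
void assign_nested(double *a, size_t n1, size_t n2, size_t n3, double v) {
    for (size_t k = 0; k < n3; ++k)
        for (size_t j = 0; j < n2; ++j)
            for (size_t i = 0; i < n1; ++i)
                a[i + n1 * (j + n2 * k)] = v;
}

/* Flattened form: only valid when the array is contiguous, so that its
   n1*n2*n3 elements occupy one unbroken block of memory. A single loop
   with one trip count remains. */
void assign_flat(double *a, size_t n1, size_t n2, size_t n3, double v) {
    size_t total = n1 * n2 * n3;
    for (size_t idx = 0; idx < total; ++idx)
        a[idx] = v;
}

/* Runs both forms on fresh zeroed buffers and returns 1 if they agree. */
int forms_agree(size_t n1, size_t n2, size_t n3, double v) {
    size_t total = n1 * n2 * n3;
    double *x = calloc(total ? total : 1, sizeof *x);
    double *y = calloc(total ? total : 1, sizeof *y);
    assert(x != NULL && y != NULL);
    assign_nested(x, n1, n2, n3, v);
    assign_flat(y, n1, n2, n3, v);
    int same = memcmp(x, y, total * sizeof *x) == 0;
    free(x);
    free(y);
    return same;
}
```

The flattened loop is only a legal rewrite because contiguity guarantees there are no gaps between the last element of one column and the first element of the next; for a non-contiguous section (for example, a strided slice) the nested form, or explicit stride arithmetic, is still required.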
Diffstat (limited to 'flang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions