rocket-tools/riscv-gnu-toolchain/llvm.git

author	Leandro Lupori <leandro.lupori@linaro.org>	2025-07-04 08:49:51 -0300
committer	GitHub <noreply@github.com>	2025-07-04 08:49:51 -0300
commit	0ba59587fa98849ed5107fee4134e810e84b69a3 (patch)
tree	423f4c7922df67162d870ec86b54fcb689cf37b3 /flang/lib/Frontend/CompilerInvocation.cpp
parent	d17a248fc6147e52c56e1bf21affc7840dea9743 (diff)

[flang] Optimize assignments of multidimensional arrays (#146408)

Assignments of n-dimensional arrays with a trivial RHS were always being converted to n nested loops. For contiguous arrays, it's possible to flatten them and use a single loop, which LLVM can usually optimize better.

In a test program, using a 3-dimensional array and varying its size, the resulting speedup was as follows (measured on Graviton4):

Size    Speedup
16K     1.09
64K     1.40
128K    1.90
256K    1.91
512K    1.00

For sizes of 512K and above, no improvement was observed. It looks like LLVM stops trying to perform aggressive loop unrolling at a certain threshold and just uses nested loops instead. Larger sizes also won't fit in the L1 and L2 caches.

This was noticed while profiling 527.cam4_r. This optimization makes aer_rad_props_sw slightly faster, but unfortunately it leaves the total execution time of 527.cam4_r practically unchanged.
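The flattening idea can be sketched outside the compiler. The C++ below is only an illustration, not the code flang generates: the Array3D type and the assignNested and assignFlat functions are hypothetical names, and a scalar fill is assumed as a stand-in for a "trivial RHS". It shows that, for a contiguous column-major array, three nested loops and one flat loop write the same elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration type: a contiguous 3-D array stored in
// column-major (Fortran) order, so the first index varies fastest.
struct Array3D {
  std::size_t n1, n2, n3;
  std::vector<double> data;

  Array3D(std::size_t a, std::size_t b, std::size_t c)
      : n1(a), n2(b), n3(c), data(a * b * c, 0.0) {}

  // Map (i, j, k) to its position in the flat storage.
  double &at(std::size_t i, std::size_t j, std::size_t k) {
    return data[i + n1 * (j + n2 * k)];
  }
};

// One loop per dimension, mirroring the n-nested-loop form.
void assignNested(Array3D &a, double value) {
  for (std::size_t k = 0; k < a.n3; ++k)
    for (std::size_t j = 0; j < a.n2; ++j)
      for (std::size_t i = 0; i < a.n1; ++i)
        a.at(i, j, k) = value;
}

// Because the storage is contiguous, the same assignment can be
// written as a single loop over all n1 * n2 * n3 elements.
void assignFlat(Array3D &a, double value) {
  assert(a.data.size() == a.n1 * a.n2 * a.n3);
  for (std::size_t idx = 0; idx < a.data.size(); ++idx)
    a.data[idx] = value;
}
```

The flat form is valid only when the array has no gaps between elements; a strided or otherwise non-contiguous section would still need per-dimension indexing.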

Diffstat (limited to 'flang/lib/Frontend/CompilerInvocation.cpp')

0 files changed, 0 insertions, 0 deletions