author | Madhur Amilkanthwar <madhura@nvidia.com> | 2025-01-21 10:49:19 +0530 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-01-21 10:49:19 +0530 |
commit | 5d281a480e5caae09962b863960d7d057e908a3c (patch) | |
tree | 218e4e0b472d139c88c66dd181b691f0e5b68f74 /flang/lib/Frontend/CompilerInvocation.cpp | |
parent | afced70e697e66fb6920b53d489d3fa4498e22dc (diff) | |
[LoopInterchange] Constrain number of load/stores in a loop (#118973)
As the code stands, the transform keeps computing entries for the
dependency matrix until it reaches `MaxMemInstrCount`, which is 100. After
the 99th entry it terminates, so the compile time spent computing those
entries is wasted.
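The cost of checking the limit late can be illustrated with a toy model. This is a minimal sketch, not the pass's actual code: `DepRow`, `BuildResult`, and `buildMatrixLateCheck` are hypothetical names, and the `push_back` call stands in for an expensive dependence query.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one row (entry) of the dependency matrix.
using DepRow = std::vector<char>;

struct BuildResult {
  bool ok;               // false when the limit was hit and the build was abandoned
  std::size_t rowsBuilt; // rows that had already been computed at that point
};

// Toy model (assumption): rows are produced one at a time, and the limit is
// only checked after each row has been computed.
BuildResult buildMatrixLateCheck(std::size_t candidateRows, std::size_t limit) {
  std::vector<DepRow> matrix;
  for (std::size_t i = 0; i < candidateRows; ++i) {
    matrix.push_back(DepRow{'=', '<'}); // stands in for an expensive dependence query
    if (matrix.size() >= limit)
      return {false, matrix.size()};    // give up; every row built so far is discarded
  }
  return {true, matrix.size()};
}
```

In this model, a loop with far more candidate rows than the limit still pays for every row up to the limit before the transform gives up.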
Ideally, we would compute the total number of entries upfront and exit
early if it exceeds 100. However, that is not always possible, because the
number of entries depends on two factors:
1. The number of load-store pairs in a loop.
2. The number of common loop levels for each pair.
This patch instead constrains the whole computation by the number of load
and store instructions in the loop.
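A check of this kind could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this patch: `Kind` and `withinMemInstrBudget` are hypothetical names, and the loop body is modeled as a flat list of instruction kinds.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of the instructions in a loop body.
enum class Kind { Load, Store, Other };

// Returns false as soon as the number of loads and stores exceeds the budget,
// before any dependence query has been issued (assumed model of the idea).
bool withinMemInstrBudget(const std::vector<Kind> &loopBody,
                          std::size_t maxMemInstrs) {
  std::size_t memInstrs = 0;
  for (Kind k : loopBody) {
    if (k == Kind::Load || k == Kind::Store) {
      if (++memInstrs > maxMemInstrs)
        return false; // too many memory instructions; skip the analysis entirely
    }
  }
  return true;
}
```

The point of the design is that counting instructions is a single cheap pass over the loop, whereas each dependency matrix entry requires a dependence query on a pair of instructions.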
In another approach, I experimented with computing factor 1 (the number
of load-store pairs) and constraining on that, but it did not yield any
additional compile-time benefit. I can revisit that approach once other
issues are fixed.