author | Slava Zakharin <szakharin@nvidia.com> | 2024-08-06 08:23:21 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-08-06 08:23:21 -0700 |
commit | 9684c87d1402ea9327c1abd7f56bafed8e751f51 (patch) | |
tree | f20e90f18837f7ae185d63553dabd74ce623a274 /llvm/lib/IR/Module.cpp | |
parent | b809671a4184fb279abf7ae2f75ee9117c13dd60 (diff) | |
[flang][runtime] Fixed performance regression in CopyElement. (#102081)
Polyhedron/capacita,protein and CPU2000/facerec,wupwise showed up to
60% regression on x86 after #101421. The memcpy loops over the toAt and
fromAt arrays that run to create the initial work item end up
being encoded as 'rep mov', and they add noticeable overhead
compared to the total amount of work. 'rep mov' is not the best
choice for small memcpy sizes (e.g. it is quite slow when the array
rank is 1 or 2). Moreover, the rest of the stack-related
setup is also noticeable in the simple cases.
I added a shortcut for the simple copy case, and also got rid
of the initial toAt/fromAt copies by allowing the CopyDescriptor
to use external subscript storage.
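The idea of using external subscript storage can be illustrated with a minimal sketch. This is not the flang runtime code: SubscriptCursor, CountElements, kMaxRank and the Subscript alias are hypothetical names chosen for illustration, and the sketch assumes every extent is at least 1. The cursor keeps a pointer to the caller's subscript array and updates it in place, so no initial copy of the subscripts is made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is not the flang runtime API.
using Subscript = std::int64_t;
constexpr int kMaxRank = 15; // assumed upper bound on array rank

// Walks the elements of an array by updating subscripts that live in
// storage owned by the caller. Nothing is copied on construction.
class SubscriptCursor {
public:
  SubscriptCursor(int rank, Subscript *external)
      : rank_{rank}, at_{external} {
    assert(rank >= 0 && rank <= kMaxRank);
    assert(rank == 0 || external != nullptr);
  }

  // Steps to the next element in column-major order. Returns false
  // once every dimension has wrapped back to its lower bound.
  // Assumes extent[j] >= 1 for every dimension.
  bool Advance(const Subscript *lower, const Subscript *extent) {
    for (int j = 0; j < rank_; ++j) {
      if (++at_[j] < lower[j] + extent[j]) {
        return true;
      }
      at_[j] = lower[j]; // wrap this dimension, carry into the next
    }
    return false;
  }

private:
  int rank_;
  Subscript *at_; // borrowed: points at the caller's subscripts
};

// Visits every element once and returns how many were visited.
// `subscripts` must initially hold the lower bounds.
std::size_t CountElements(int rank, const Subscript *lower,
                          const Subscript *extent, Subscript *subscripts) {
  SubscriptCursor cursor{rank, subscripts};
  std::size_t count = 0;
  do {
    ++count; // a real copy routine would move one element here
  } while (cursor.Advance(lower, extent));
  return count;
}
```

Because the cursor borrows the caller's array, the caller must keep that array alive for as long as the cursor is in use, and it sees the subscripts change as the walk proceeds.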
Diffstat (limited to 'llvm/lib/IR/Module.cpp')
0 files changed, 0 insertions, 0 deletions