author: Slava Zakharin <szakharin@nvidia.com> 2024-08-06 08:23:21 -0700
committer: GitHub <noreply@github.com> 2024-08-06 08:23:21 -0700
commit: 9684c87d1402ea9327c1abd7f56bafed8e751f51
tree: f20e90f18837f7ae185d63553dabd74ce623a274 /clang/lib/CodeGen/CodeGenModule.cpp
parent: b809671a4184fb279abf7ae2f75ee9117c13dd60
[flang][runtime] Fixed performance regression in CopyElement. (#102081)
Polyhedron/capacita,protein and CPU2000/facerec,wupwise showed up to a 60% regression on x86 after #101421. The memcpy loops that copy the toAt and fromAt arrays to create the initial work item end up being encoded as 'rep mov', and they add noticeable overhead compared to the total amount of work. 'rep mov' is not the best choice for small memcpy sizes (e.g. when the array rank is 1 or 2, it is quite slow). Moreover, the rest of the stack-related setup also adds noticeable overhead in the simple cases. I added a shortcut for the simple copy case, and also got rid of the initial toAt/fromAt copies by allowing the CopyDescriptor to use external subscript storage.
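The two changes described in the commit message can be illustrated with a minimal C++ sketch. It is not the flang runtime code: `CopyElementSketch`, `PendingCopy`, the `hasComplexComponents` flag and the `slowPathSetups` counter are hypothetical names, and the general path below copies raw bytes instead of handling derived-type components. It only shows the shape of the fix: a fast path that skips the work-stack setup for simple elements, and a work item that borrows the caller's toAt/fromAt subscript arrays rather than copying them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified model; names do not match the flang runtime.
// A pending copy *borrows* the caller's subscript arrays (toAt/fromAt)
// instead of memcpy'ing them into buffers of its own.
struct PendingCopy {
  char *to;
  const char *from;
  std::size_t bytes;
  const std::int64_t *toAt;   // caller-owned, not copied
  const std::int64_t *fromAt; // caller-owned, not copied
};

// Hypothetical counter: how many times the general path built a work stack.
static int slowPathSetups = 0;

void CopyElementSketch(char *to, const char *from, std::size_t bytes,
                       bool hasComplexComponents, const std::int64_t *toAt,
                       const std::int64_t *fromAt) {
  if (!hasComplexComponents) {
    // Shortcut for the simple case: a plain byte copy, with no work
    // stack and no copies of the subscript arrays.
    std::memcpy(to, from, bytes);
    return;
  }
  // General case: seed a work stack with one item that points at the
  // caller's subscripts. Component-wise processing is elided; each item
  // is copied as raw bytes only to keep the sketch runnable.
  ++slowPathSetups;
  std::vector<PendingCopy> stack;
  stack.push_back(PendingCopy{to, from, bytes, toAt, fromAt});
  while (!stack.empty()) {
    PendingCopy item = stack.back();
    stack.pop_back();
    std::memcpy(item.to, item.from, item.bytes);
  }
}
```

Borrowing the subscript arrays is only safe while the caller's storage outlives the work items; in this sketch that holds because the stack is drained before the function returns.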
Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.cpp')
0 files changed, 0 insertions, 0 deletions