riscv-gnu-toolchain/llvm.git

author	Slava Zakharin <szakharin@nvidia.com>	2024-08-06 08:23:21 -0700
committer	GitHub <noreply@github.com>	2024-08-06 08:23:21 -0700
commit	9684c87d1402ea9327c1abd7f56bafed8e751f51 (patch)
tree	f20e90f18837f7ae185d63553dabd74ce623a274 /llvm/lib/IR/Module.cpp
parent	b809671a4184fb279abf7ae2f75ee9117c13dd60 (diff)

[flang][runtime] Fixed performance regression in CopyElement. (#102081)

Polyhedron/capacita,protein and CPU2000/facerec,wupwise showed up to a 60% regression on x86 after #101421. The memcpy loops over the toAt and fromAt arrays, which run to create the initial work item, end up being encoded as 'rep mov', and they add noticeable overhead compared to the total amount of work. 'rep mov' is not the best choice for small memcpy sizes (e.g. it is quite slow when the array rank is 1 or 2). Moreover, the rest of the stack-related setup is also noticeable in the simple cases. I added a shortcut for the simple copy case, and also got rid of the initial toAt/fromAt copies by allowing the CopyDescriptor to use external subscript storage.
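
The two changes described above can be illustrated with a minimal sketch. This is not the code from the patch: `CopyCursor`, `ElementInfo`, `TryShortcutCopy` and `kMaxRank` are hypothetical names chosen for illustration, and the real CopyDescriptor in the flang runtime may be structured differently. The sketch only shows the general idea of (a) letting a descriptor borrow the caller's subscript array instead of copying it, and (b) taking an early exit for elements that need no per-component work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Subscript = std::int64_t;
constexpr int kMaxRank = 15; // assumed upper bound on array rank

// Hypothetical stand-in for a copy descriptor's subscript state.
class CopyCursor {
public:
  // Owning mode: copies the initial subscripts into internal storage.
  // This per-item copy is the kind of setup cost the commit avoids.
  CopyCursor(const Subscript *initial, int rank) : at_{owned_} {
    assert(rank >= 0 && rank <= kMaxRank);
    for (int i = 0; i < rank; ++i) {
      owned_[i] = initial[i];
    }
  }

  // Borrowing mode: uses the caller's subscript array directly, so no
  // copy is made. The caller must keep the array alive.
  explicit CopyCursor(Subscript *external) : at_{external} {}

  // Copying would leave at_ pointing into another object's storage.
  CopyCursor(const CopyCursor &) = delete;
  CopyCursor &operator=(const CopyCursor &) = delete;

  Subscript *at() { return at_; }
  bool usesExternalStorage() const { return at_ != owned_; }

private:
  Subscript owned_[kMaxRank]{};
  Subscript *at_;
};

// Hypothetical description of one element to be copied.
struct ElementInfo {
  std::size_t bytes;        // size of the element in bytes
  bool hasNestedComponents; // true if components need individual handling
};

// Shortcut for the simple case: copies the element and returns true.
// Returns false without copying when the general path (cursors plus a
// work stack, omitted here) would be required.
bool TryShortcutCopy(void *to, const void *from, const ElementInfo &info) {
  if (!info.hasNestedComponents) {
    std::memcpy(to, from, info.bytes);
    return true;
  }
  return false;
}
```

In this sketch a caller would try `TryShortcutCopy` first and construct cursors in borrowing mode only when it returns false, so low-rank, simple copies pay for neither the subscript copies nor the stack setup.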

Diffstat (limited to 'llvm/lib/IR/Module.cpp')

0 files changed, 0 insertions, 0 deletions

