| author | Slava Zakharin <szakharin@nvidia.com> | 2025-01-15 08:42:57 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-01-15 08:42:57 -0800 |
| commit | 3bb969f3ebb25037e8eb69c30a5a0dfb5d9d0f51 (patch) | |
| tree | 8a291f2ab3cfd8be97657681673e4133b85e0efa /lldb/source/Plugins/ScriptInterpreter/Python/ScriptInterpreterPythonImpl.h | |
| parent | 2bfa7bc570d530d2f8aec02ada6f11d1a2459805 (diff) | |
[flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not eliminate
a temporary array in many cases, but it may still be
much better, because it allows us to:
* Get rid of any overhead related to calling runtime MATMUL
(such as descriptors creation).
* Use CPU-specific vectorization cost model for matmul loops,
which Fortran runtime cannot currently do.
* Optimize matmul of known-size arrays by complete unrolling.
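The known-size unrolling benefit above can be sketched as a plain loop nest. This is a hypothetical C illustration, not flang's actual generated code; the function name and the fixed 3x3 shape are made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the loop nest an inlined MATMUL reduces to:
 * no runtime call, no descriptor creation. With M, K and N known at
 * compile time, the optimizer can fully unroll and vectorize it. */
enum { M = 3, K = 3, N = 3 };

void matmul_3x3(const float a[M][K], const float b[K][N], float c[M][N]) {
    for (size_t i = 0; i < M; ++i)
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += a[i][k] * b[k][j]; /* plain dot-product step */
            c[i][j] = acc;
        }
}
```

With constant trip counts like these, the backend can apply its CPU-specific vectorization cost model directly, which the generic runtime MATMUL cannot.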
One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the operations inside it that have store memory effects block the
current MLIR CSE, so I decided to run this inlining late in the
pipeline. There is a source comment explaining the CSE issue in
more detail.
Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance: I got performance regressions
with it compared to the Fortran runtime implementation, so I put it
under an engineering option for experiments.
At the same time, inlining `hlfir.matmul_transpose` as `hlfir.elemental`
seems to be a good approach: for example, it allows eliminating a temporary
array in cases like `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.
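The fusion that the elemental form enables can be sketched in C. This is a hypothetical illustration (the function and parameter names are invented), assuming Fortran column-major storage: element j of `MATMUL(TRANSPOSE(C), D)` is the dot product of column j of `C` with `D`, so the addition with `B` happens per element and no temporary array for the matmul result is needed:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of A(:) = B(:) + MATMUL(TRANSPOSE(C(:,:)), D(:))
 * after elemental inlining. C is nk x nj, stored column-major as in
 * Fortran, so column j of C is contiguous at cmat + nk*j. */
void fused_transpose_matmul(size_t nk, size_t nj,
                            const float *b,    /* b[nj]          */
                            const float *cmat, /* cmat[nk*nj]    */
                            const float *d,    /* d[nk]          */
                            float *a)          /* a[nj], result  */
{
    for (size_t j = 0; j < nj; ++j) {
        float acc = 0.0f;
        for (size_t k = 0; k < nk; ++k)
            acc += cmat[k + nk * j] * d[k]; /* dot(C(:,j), D)      */
        a[j] = b[j] + acc;                  /* fused: no temporary */
    }
}
```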
This patch slightly improves the performance of galgel and tonto.
Diffstat (limited to 'lldb/source/Plugins/ScriptInterpreter/Python/ScriptInterpreterPythonImpl.h')
0 files changed, 0 insertions, 0 deletions
