diff options
author | Nicolas Vasilache <nicolas.vasilache@gmail.com> | 2021-11-22 10:22:37 +0000 |
---|---|---|
committer | Nicolas Vasilache <nicolas.vasilache@gmail.com> | 2021-11-22 10:32:34 +0000 |
commit | a9e236bed835c58be381dadb973a1db0681e4795 (patch) | |
tree | f49eaed687cba9eaedde7061518eba41bfe581ca /lldb/unittests/ScriptInterpreter/Python | |
parent | 4d21b64464ac548ec8442bc0d2a7e984ba78bd88 (diff) | |
download | llvm-a9e236bed835c58be381dadb973a1db0681e4795.zip llvm-a9e236bed835c58be381dadb973a1db0681e4795.tar.gz llvm-a9e236bed835c58be381dadb973a1db0681e4795.tar.bz2 |
[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:
```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```
The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between and intrinsics and an inline_asm implementation.
This results in roughly 20% fewer cycles as reported by llvm-mca:
After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations: 100
Instructions: 5900
Total Cycles: 2415
Total uOps: 7300
Dispatch Width: 6
uOps Per Cycle: 3.02
IPC: 2.44
Block RThroughput: 24.0
Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
Resource Pressure [ 89.65% ]
- SKXPort1 [ 0.04% ]
- SKXPort2 [ 12.42% ]
- SKXPort3 [ 12.42% ]
- SKXPort5 [ 89.52% ]
Data Dependencies: [ 37.06% ]
- Register Dependencies [ 37.06% ]
- Memory Dependencies [ 0.00% ]
```
After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations: 100
Instructions: 6300
Total Cycles: 2015
Total uOps: 7700
Dispatch Width: 6
uOps Per Cycle: 3.82
IPC: 3.13
Block RThroughput: 20.0
Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
Resource Pressure [ 83.18% ]
- SKXPort0 [ 14.49% ]
- SKXPort1 [ 14.54% ]
- SKXPort2 [ 19.70% ]
- SKXPort3 [ 19.70% ]
- SKXPort5 [ 83.03% ]
- SKXPort6 [ 14.49% ]
Data Dependencies: [ 39.75% ]
- Register Dependencies [ 39.75% ]
- Memory Dependencies [ 0.00% ]
```
An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).
Reviewed By: ftynse, dcaballe
Differential Revision: https://reviews.llvm.org/D114335
Diffstat (limited to 'lldb/unittests/ScriptInterpreter/Python')
0 files changed, 0 insertions, 0 deletions