path: root/lldb/unittests/ScriptInterpreter/Python
author: Nicolas Vasilache <nicolas.vasilache@gmail.com> 2021-11-22 10:22:37 +0000
committer: Nicolas Vasilache <nicolas.vasilache@gmail.com> 2021-11-22 10:32:34 +0000
commit: a9e236bed835c58be381dadb973a1db0681e4795
tree: f49eaed687cba9eaedde7061518eba41bfe581ca /lldb/unittests/ScriptInterpreter/Python
parent: 4d21b64464ac548ec8442bc0d2a7e984ba78bd88
[mlir][Vector] Add a vblendps-based impl for transpose8x8 (both intrin and inline_asm)
This revision follows up on the conversation titled:
```[llvm-dev] Understanding and controlling some of the AVX shuffle emission paths```

The revision adds a vblendps-based implementation for transpose8x8 and further distinguishes between an intrinsics and an inline_asm implementation.

This results in roughly 20% fewer cycles as reported by llvm-mca:

After this revision (intrinsic version, resolves to virtually identical assembly as per the llvm-dev discussion, no vblendps instruction is emitted):
```
Iterations:        100
Instructions:      5900
Total Cycles:      2415
Total uOps:        7300

Dispatch Width:    6
uOps Per Cycle:    3.02
IPC:               2.44
Block RThroughput: 24.0

Cycles with backend pressure increase [ 89.90% ]
Throughput Bottlenecks:
  Resource Pressure       [ 89.65% ]
  - SKXPort1  [ 0.04% ]
  - SKXPort2  [ 12.42% ]
  - SKXPort3  [ 12.42% ]
  - SKXPort5  [ 89.52% ]
  Data Dependencies:      [ 37.06% ]
  - Register Dependencies [ 37.06% ]
  - Memory Dependencies   [ 0.00% ]
```

After this revision (inline_asm version, vblendps instructions are indeed emitted):
```
Iterations:        100
Instructions:      6300
Total Cycles:      2015
Total uOps:        7700

Dispatch Width:    6
uOps Per Cycle:    3.82
IPC:               3.13
Block RThroughput: 20.0

Cycles with backend pressure increase [ 83.47% ]
Throughput Bottlenecks:
  Resource Pressure       [ 83.18% ]
  - SKXPort0  [ 14.49% ]
  - SKXPort1  [ 14.54% ]
  - SKXPort2  [ 19.70% ]
  - SKXPort3  [ 19.70% ]
  - SKXPort5  [ 83.03% ]
  - SKXPort6  [ 14.49% ]
  Data Dependencies:      [ 39.75% ]
  - Register Dependencies [ 39.75% ]
  - Memory Dependencies   [ 0.00% ]
```

An accessible copy of the conversation is available [here](https://gist.github.com/nicolasvasilache/68c7f34012584b0e00f335bcb374ede0).

Reviewed By: ftynse, dcaballe

Differential Revision: https://reviews.llvm.org/D114335
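The derived figures in the two llvm-mca reports can be cross-checked from the raw counts. The following is a minimal sketch, not part of the revision: the dictionary and function names are hypothetical, and only the instruction, cycle, and uOp totals are copied from the reports above. Under these numbers the cycle count drops by about 17%, while the cycle ratio (the speedup) comes out near 1.2x, which is one plausible reading of "roughly 20%".

```python
# Totals copied from the two llvm-mca reports in the commit message.
# The names `intrinsic` and `inline_asm` are illustrative only.
intrinsic = {"instructions": 5900, "cycles": 2415, "uops": 7300}
inline_asm = {"instructions": 6300, "cycles": 2015, "uops": 7700}


def ipc(report):
    """Instructions per cycle, as reported on the `IPC:` line."""
    return report["instructions"] / report["cycles"]


def uops_per_cycle(report):
    """Micro-ops per cycle, as reported on the `uOps Per Cycle:` line."""
    return report["uops"] / report["cycles"]


def cycle_reduction(before, after):
    """Fraction of total cycles saved going from `before` to `after`."""
    return (before["cycles"] - after["cycles"]) / before["cycles"]


def speedup(before, after):
    """Ratio of total cycles, i.e. how much faster `after` runs."""
    return before["cycles"] / after["cycles"]


for name, report in (("intrinsic", intrinsic), ("inline_asm", inline_asm)):
    print(f"{name}: IPC={ipc(report):.2f}, "
          f"uOps/cycle={uops_per_cycle(report):.2f}")

print(f"cycle reduction: {cycle_reduction(intrinsic, inline_asm):.1%}")
print(f"speedup: {speedup(intrinsic, inline_asm):.2f}x")
```

Note that the inline_asm version executes more instructions per iteration (63 versus 59) yet finishes sooner, which is consistent with the reported shift of pressure away from SKXPort5.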
Diffstat (limited to 'lldb/unittests/ScriptInterpreter/Python')
0 files changed, 0 insertions, 0 deletions