author | Sanjay Patel <spatel@rotateright.com> | 2019-04-23 15:20:17 +0000 |
---|---|---|
committer | Sanjay Patel <spatel@rotateright.com> | 2019-04-23 15:20:17 +0000 |
commit | 12a561fa1b79c449d518100e1dd0dfce0a37b65d | |
tree | 37307c19c1e8a91b664f27954e291e817c54d7cf /llvm/lib/Demangle/MicrosoftDemangleNodes.cpp | |
parent | 6e7cc49d5cb31ee09b07252b6641d7c94977fd12 | |
[x86] use psubus for more vsetcc lowering (PR39859)
Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1
...where there is a counter-intuitive opportunity (judging by the test diffs, which gain an instruction) to use 'psubus'.
Based on llvm-mca, this appears to be the better-performing option for both Haswell and Jaguar.
We already do this transform for the SETULT predicate, so this makes the code more
symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect
anything but pre-SSE4.1 subtargets.
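The transform rests on a per-lane identity: for a splat constant C, x >u C holds exactly when the saturating difference (C + 1) -us x is zero, which is what psubusw followed by pcmpeqw against zero tests. The C sketch below checks that identity exhaustively for 16-bit lanes against two alternatives: the sign-bit flip plus signed compare used before this patch, and a umin-based form in the spirit of pminuw. It is scalar emulation, not LLVM code; the helper names are hypothetical, the umin formulation is one illustrative choice rather than necessarily the exact sequence LLVM emits, and it assumes C < 0xFFFF so that C + 1 does not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit lane of psubusw: unsigned subtract that saturates at 0. */
static uint16_t usubsat16(uint16_t a, uint16_t b) {
    return a > b ? (uint16_t)(a - b) : 0;
}

/* Reinterpret a 16-bit pattern as signed without relying on
   implementation-defined narrowing conversions. */
static int as_signed16(uint16_t v) {
    return v >= 0x8000u ? (int)v - 0x10000 : (int)v;
}

/* Pre-patch style: flip the sign bit of both operands so that a signed
   greater-than (pcmpgtw) yields the unsigned answer. */
static int ugt_signflip(uint16_t x, uint16_t c) {
    return as_signed16((uint16_t)(x ^ 0x8000u)) >
           as_signed16((uint16_t)(c ^ 0x8000u));
}

/* psubus-based form: x >u c  <=>  usubsat(c + 1, x) == 0.
   Only valid while c + 1 does not wrap (assumption checked here). */
static int ugt_psubus(uint16_t x, uint16_t c) {
    assert(c < 0xFFFFu);
    return usubsat16((uint16_t)(c + 1u), x) == 0;
}

/* Illustrative min-based form (pminuw-like): x >u c  <=>  umin(x, c) != x. */
static int ugt_umin(uint16_t x, uint16_t c) {
    uint16_t m = x < c ? x : c;
    return m != x;
}

/* Exhaustively compare all three forms with a plain unsigned compare
   for one splat constant c; returns 1 if every lane value agrees. */
static int ugt_forms_agree(uint16_t c) {
    for (uint32_t i = 0; i <= 0xFFFFu; ++i) {
        uint16_t x = (uint16_t)i;
        int ref = x > c;
        if (ugt_signflip(x, c) != ref || ugt_psubus(x, c) != ref ||
            ugt_umin(x, c) != ref)
            return 0;
    }
    return 1;
}
```

Calling `ugt_forms_agree(255)` covers the constant used in the listings that follow; the assert in `ugt_psubus` marks the one input (C = 0xFFFF) where the +1 rewrite would not be valid as written.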
$ cat before.s
movdqa -16(%rip), %xmm2 ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
pxor %xmm0, %xmm2
pcmpgtw -32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
pand %xmm2, %xmm0
pandn %xmm1, %xmm2
por %xmm2, %xmm0
$ cat after.s
movdqa -16(%rip), %xmm2 ## xmm2 = [256,256,256,256,256,256,256,256]
psubusw %xmm0, %xmm2
pxor %xmm3, %xmm3
pcmpeqw %xmm2, %xmm3
pand %xmm3, %xmm0
pandn %xmm1, %xmm3
por %xmm3, %xmm0
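To check that the two listings compute the same select, here is a lane-by-lane C emulation of each instruction sequence, reading both as `x >u 255 ? x : y` with x in %xmm0 and y in %xmm1. The type and function names are hypothetical, and the signed-compare operand in `select_before` is assumed to be the sign-flipped bound (255 ^ 0x8000); the listing's annotation does not spell out that constant-pool value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8 /* an xmm register holds 8 x 16-bit lanes */

typedef struct { uint16_t w[LANES]; } v8u16;

/* Reinterpret a 16-bit lane as signed, as pcmpgtw does. */
static int s16(uint16_t v) { return v >= 0x8000u ? (int)v - 0x10000 : (int)v; }

/* before.s: movdqa + pxor (flip sign bit), pcmpgtw, then pand/pandn/por blend. */
static v8u16 select_before(v8u16 x, v8u16 y, uint16_t bound) {
    v8u16 r;
    /* Assumed compare operand: the bound with its sign bit flipped. */
    uint16_t flipped_bound = (uint16_t)(bound ^ 0x8000u);
    for (int i = 0; i < LANES; ++i) {
        uint16_t t = (uint16_t)(x.w[i] ^ 0x8000u);                   /* pxor with [32768,...] */
        uint16_t mask = s16(t) > s16(flipped_bound) ? 0xFFFFu : 0u;  /* pcmpgtw */
        r.w[i] = (uint16_t)((mask & x.w[i]) | (~mask & y.w[i]));     /* pand, pandn, por */
    }
    return r;
}

/* after.s: psubusw from [bound+1,...], compare with zero, same blend. */
static v8u16 select_after(v8u16 x, v8u16 y, uint16_t bound) {
    v8u16 r;
    assert(bound < 0xFFFFu); /* bound + 1 must not wrap */
    uint16_t limit = (uint16_t)(bound + 1u);  /* movdqa [256,...] when bound == 255 */
    for (int i = 0; i < LANES; ++i) {
        uint16_t diff = limit > x.w[i] ? (uint16_t)(limit - x.w[i]) : 0u; /* psubusw */
        uint16_t mask = diff == 0 ? 0xFFFFu : 0u;                    /* pxor zero + pcmpeqw */
        r.w[i] = (uint16_t)((mask & x.w[i]) | (~mask & y.w[i]));     /* pand, pandn, por */
    }
    return r;
}

/* Lane-wise equality of two emulated registers. */
static int same(v8u16 a, v8u16 b) { return memcmp(a.w, b.w, sizeof a.w) == 0; }
```

For x = {0, 1, 255, 256, 257, 1000, 0x8000, 0xFFFF} and y = all 7 with bound 255, both functions return {7, 7, 7, 256, 257, 1000, 0x8000, 0xFFFF}: only lanes strictly above 255 keep their x value.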
$ llvm-mca before.s -mcpu=haswell
Iterations: 100
Instructions: 600
Total Cycles: 909
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 0.77
IPC: 0.66
Block RThroughput: 1.8
$ llvm-mca after.s -mcpu=haswell
Iterations: 100
Instructions: 700
Total Cycles: 409
Total uOps: 700
Dispatch Width: 4
uOps Per Cycle: 1.71
IPC: 1.71
Block RThroughput: 1.8
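The derived llvm-mca figures can be rechecked from the raw totals. A minimal C sketch: the `McaSummary` struct and helper names are hypothetical, and the formulas (uOps Per Cycle = Total uOps / Total Cycles, IPC = Instructions / Total Cycles) are inferred from, and consistent with, the numbers reported above.

```c
#include <assert.h>

/* Raw totals as printed by an llvm-mca summary. */
typedef struct {
    double iterations;
    double instructions;
    double cycles;
    double uops;
} McaSummary;

/* Generic "events per cycle" ratio; cycles must be positive. */
static double per_cycle(double count, double cycles) {
    assert(cycles > 0.0);
    return count / cycles;
}

static double uops_per_cycle(McaSummary s) { return per_cycle(s.uops, s.cycles); }
static double ipc(McaSummary s) { return per_cycle(s.instructions, s.cycles); }

/* Average cycles spent per iteration of the simulated block. */
static double cycles_per_iteration(McaSummary s) { return s.cycles / s.iterations; }

/* How many times fewer total cycles the improved block needs. */
static double speedup(McaSummary base, McaSummary improved) {
    return base.cycles / improved.cycles;
}

/* Round a positive value to hundredths, as an integer (0.66 -> 66). */
static int hundredths(double v) { return (int)(v * 100.0 + 0.5); }
```

Plugging in the totals above reproduces the reported ratios (0.77 and 0.66 before, 1.71 and 1.71 after) and gives about 9.09 versus 4.09 cycles per iteration, roughly a 2.2x improvement for after.s even though it executes one more instruction per iteration.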
Differential Revision: https://reviews.llvm.org/D60838
llvm-svn: 358999
Diffstat (limited to 'llvm/lib/Demangle/MicrosoftDemangleNodes.cpp')
0 files changed, 0 insertions, 0 deletions