author | Andy Polyakov <appro@openssl.org> | 2017-07-20 09:48:35 +0200 |
---|---|---|
committer | Andy Polyakov <appro@openssl.org> | 2017-07-21 14:07:32 +0200 |
commit | 64d92d74985ebb3d0be58a9718f9e080a14a8e7f (patch) | |
tree | 036456c8d139587371300824d273d1c500411d1a /crypto/chacha | |
parent | bbb4ceb86eb6ea0300f744443c36fb6e980fff9d (diff) | |
x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
"Optimize" is in quotes because, for now, this is more of a "salvage
operation". The idea is to identify processor capability flags that
drive Knights Landing to suboptimal code paths and mask them.
Two flags were identified: XSAVE and ADCX/ADOX. The former affects
the choice of the AES-NI code path specific to Silvermont (Knights
Landing is of Silvermont "ancestry"). The latter matters because 64-bit
ADCX/ADOX instructions are effectively mishandled at decode time. In
both cases we are looking at a ~2x improvement.
AVX-512 results cover even Skylake-X :-)
Hardware used for benchmarking courtesy of Atos, experiments run by
Romain Dolbeau <romain.dolbeau@atos.net>. Kudos!
Reviewed-by: Rich Salz <rsalz@openssl.org>
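The masking idea described in the commit message can be illustrated with a small C sketch. This is not OpenSSL's code: the `cpu_caps` layout, the function names and the model-number check are assumptions made for illustration. The sketch assumes a capability vector stored as four 32-bit words copied from CPUID, where XSAVE is CPUID leaf 1 ECX bit 26 and ADX (the ADCX/ADOX extension) is CPUID leaf 7 EBX bit 19, and it identifies Knights Landing by family 6, model 0x57.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit capability vector, four 32-bit words loosely
 * mirroring CPUID output (not OpenSSL's actual layout):
 *   w[0] = CPUID(1).EDX,   w[1] = CPUID(1).ECX,
 *   w[2] = CPUID(7,0).EBX, w[3] = CPUID(7,0).ECX */
typedef struct {
    uint32_t w[4];
} cpu_caps;

/* Flat bit index into the vector: word * 32 + bit. */
enum {
    CAP_XSAVE = 1 * 32 + 26, /* CPUID(1).ECX bit 26 */
    CAP_ADX   = 2 * 32 + 19  /* CPUID(7,0).EBX bit 19, ADCX/ADOX */
};

/* Return whether capability bit `bit` is set. */
static bool cap_test(const cpu_caps *c, unsigned bit)
{
    assert(bit < 128);
    return (c->w[bit / 32] >> (bit % 32)) & 1u;
}

/* Clear capability bit `bit`, leaving all other bits untouched. */
static void cap_clear(cpu_caps *c, unsigned bit)
{
    assert(bit < 128);
    c->w[bit / 32] &= ~(UINT32_C(1) << (bit % 32));
}

/* Assumed detection rule: family 6, model 0x57 is Knights Landing. */
static bool is_knights_landing(unsigned family, unsigned model)
{
    return family == 6 && model == 0x57;
}

/* On Knights Landing only, hide the two flags that steer it onto
 * slower code paths; on any other CPU the vector is left as-is. */
static void mask_knl_flags(cpu_caps *c, unsigned family, unsigned model)
{
    if (!is_knights_landing(family, model))
        return;
    cap_clear(c, CAP_XSAVE);
    cap_clear(c, CAP_ADX);
}
```

For example, starting from a vector with every bit set, `mask_knl_flags(&caps, 6, 0x57)` leaves `caps.w[1] == 0xFBFFFFFF` and `caps.w[2] == 0xFFF7FFFF`, while any other family/model pair leaves the vector unchanged. In this sketch the bits are cleared once, at detection time, so every later dispatch decision sees the masked view without needing its own Knights Landing special case.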
Diffstat (limited to 'crypto/chacha')
-rwxr-xr-x | crypto/chacha/asm/chacha-x86_64.pl | 6 |
1 files changed, 4 insertions, 2 deletions
```diff
diff --git a/crypto/chacha/asm/chacha-x86_64.pl b/crypto/chacha/asm/chacha-x86_64.pl
index e2c6a32..0cfe899 100755
--- a/crypto/chacha/asm/chacha-x86_64.pl
+++ b/crypto/chacha/asm/chacha-x86_64.pl
@@ -24,7 +24,7 @@
 #
 # Performance in cycles per byte out of large buffer.
 #
-# IALU/gcc 4.8(i) 1xSSSE3/SSE2 4xSSSE3 8xAVX2
+# IALU/gcc 4.8(i) 1xSSSE3/SSE2 4xSSSE3 NxAVX(v)
 #
 # P4 9.48/+99% -/22.7(ii) -
 # Core2 7.83/+55% 7.90/8.08 4.35
@@ -32,8 +32,9 @@
 # Sandy Bridge 8.31/+42% 5.45/6.76 2.72
 # Ivy Bridge 6.71/+46% 5.40/6.49 2.41
 # Haswell 5.92/+43% 5.20/6.45 2.42 1.23
-# Skylake 5.87/+39% 4.70/- 2.31 1.19
+# Skylake[-X] 5.87/+39% 4.70/- 2.31 1.19[0.57]
 # Silvermont 12.0/+33% 7.75/7.40 7.03(iii)
+# Knights L 11.7/- - 9.60(iii) 0.80
 # Goldmont 10.6/+17% 5.10/- 3.28
 # Sledgehammer 7.28/+52% -/14.2(ii) -
 # Bulldozer 9.66/+28% 9.85/11.1 3.06(iv)
@@ -50,6 +51,7 @@
 # limitations, SSE2 can do better, but gain is considered too
 # low to justify the [maintenance] effort;
 # (iv) Bulldozer actually executes 4xXOP code path that delivers 2.20;
+# (v) 8xAVX2 or 16xAVX-512, whichever best applicable;
 
 $flavour = shift;
 $output = shift;
```
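Footnote (v) says the NxAVX column reports whichever of the 8xAVX2 and 16xAVX-512 paths applies. The C sketch below is hypothetical and is not the perlasm in this file: it picks a lane count (ChaCha20 blocks processed in parallel) from a simplified capability summary, converts that into bytes per iteration given that one ChaCha20 block is 64 bytes, and computes speedups from the table's cycles-per-byte figures. The names `simd_caps`, `chacha_lanes`, `bytes_per_iteration` and `speedup` are assumptions, and the sketch ignores any input-length thresholds the real code may apply when choosing a path.

```c
#include <stdbool.h>
#include <stddef.h>

/* One ChaCha20 block produces 64 bytes of keystream. */
#define CHACHA_BLOCK_BYTES 64

/* Hypothetical summary of the capabilities relevant to path choice. */
typedef struct {
    bool ssse3;
    bool avx2;
    bool avx512f;
} simd_caps;

/* Blocks processed in parallel by the widest path the caps allow:
 * 16 (AVX-512), 8 (AVX2), 4 (SSSE3), otherwise 1 (scalar). */
static unsigned chacha_lanes(simd_caps c)
{
    if (c.avx512f)
        return 16;
    if (c.avx2)
        return 8;
    if (c.ssse3)
        return 4;
    return 1;
}

/* Keystream bytes produced per iteration of the chosen path. */
static size_t bytes_per_iteration(simd_caps c)
{
    return (size_t)chacha_lanes(c) * CHACHA_BLOCK_BYTES;
}

/* Speedup from one cycles-per-byte figure to another (lower is faster). */
static double speedup(double old_cpb, double new_cpb)
{
    return old_cpb / new_cpb;
}
```

With the table's numbers, `speedup(1.19, 0.57)` is about 2.09 (Skylake's AVX2 figure against Skylake-X's AVX-512 figure), and `speedup(9.60, 0.80)` is 12 for Knights Landing moving from the 4xSSSE3 column to AVX-512. Doubling the lane count from 8 to 16 doubles the bytes per iteration from 512 to 1024.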