author | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-10-29 15:19:59 -0500 |
---|---|---|
committer | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-11-08 19:22:08 -0800 |
commit | 2d2493a644092dd3d32dd3b8f6aa3adad351db3c (patch) | |
tree | 563bf95352ce800331014c64c5270b5effa6812e /sysdeps/unix | |
parent | 419c832aba43276e285586998261d1db06033193 (diff) | |
x86: Use VMM API in memcmpeq-evex.S and minor changes
Changes to generated code are:
1. In a few places use `vpcmpeqb` instead of `vpcmpneq` to save a
byte of code size.
2. Add a branch for length <= (VEC_SIZE * 6) as opposed to doing
the entire block of [VEC_SIZE * 4 + 1, VEC_SIZE * 8] in a
single basic-block (the space to add the extra branch without
changing code size is bought with the above change).
Change (2) gives roughly a 20-25% speedup for sizes in [VEC_SIZE * 4 +
1, VEC_SIZE * 6] and has negligible to no cost for [VEC_SIZE * 6 + 1,
VEC_SIZE * 8].
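The effect of change (2) can be sketched with a scalar C model. This is only an illustration of the control flow described above, not the assembly from the patch: the function name `memcmpeq_4x_to_8x_model`, the helper `vec_differs`, and the fixed `VEC_SIZE` of 32 are assumptions made for the example, and the real code may order its loads differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 32-byte vectors; an evex512 build would use 64.  */
#define VEC_SIZE 32

/* Hypothetical helper: nonzero if the VEC_SIZE-byte blocks at A and B
   differ.  Stands in for one vector load plus compare.  */
static int
vec_differs (const unsigned char *a, const unsigned char *b)
{
  return memcmp (a, b, VEC_SIZE) != 0;
}

/* Scalar model of the (VEC_SIZE * 4, VEC_SIZE * 8] size class.
   Returns 0 if the buffers are equal and nonzero otherwise.  */
int
memcmpeq_4x_to_8x_model (const void *s1, const void *s2, size_t len)
{
  const unsigned char *a = s1;
  const unsigned char *b = s2;
  int diff = 0;

  assert (len > VEC_SIZE * 4 && len <= VEC_SIZE * 8);

  /* The first four vectors are in range for every length in the class.  */
  for (size_t i = 0; i < 4; i++)
    diff |= vec_differs (a + i * VEC_SIZE, b + i * VEC_SIZE);

  /* The last two vectors, addressed from the end, may overlap the first
     four.  Together they cover everything up to VEC_SIZE * 6.  */
  diff |= vec_differs (a + len - VEC_SIZE, b + len - VEC_SIZE);
  diff |= vec_differs (a + len - 2 * VEC_SIZE, b + len - 2 * VEC_SIZE);

  /* The extra branch: shorter lengths skip the remaining two compares.  */
  if (len <= VEC_SIZE * 6)
    return diff;

  /* Lengths above VEC_SIZE * 6 need two more vectors from the end.  */
  diff |= vec_differs (a + len - 3 * VEC_SIZE, b + len - 3 * VEC_SIZE);
  diff |= vec_differs (a + len - 4 * VEC_SIZE, b + len - 4 * VEC_SIZE);
  return diff;
}
```

In this model, lengths up to VEC_SIZE * 6 do six vector compares instead of eight, which is consistent with the direction of the speedup reported for that range.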
From N=10 runs on Tigerlake:

align1,align2 ,length ,result               ,New Time ,Cur Time ,New Time / Cur Time
0     ,0      ,129    ,0                    ,5.404    ,6.887    ,0.785
0     ,0      ,129    ,1                    ,5.308    ,6.826    ,0.778
0     ,0      ,129    ,18446744073709551615 ,5.359    ,6.823    ,0.785
0     ,0      ,161    ,0                    ,5.284    ,6.827    ,0.774
0     ,0      ,161    ,1                    ,5.317    ,6.745    ,0.788
0     ,0      ,161    ,18446744073709551615 ,5.406    ,6.778    ,0.798
0     ,0      ,193    ,0                    ,6.804    ,6.802    ,1.000
0     ,0      ,193    ,1                    ,6.950    ,6.754    ,1.029
0     ,0      ,193    ,18446744073709551615 ,6.792    ,6.719    ,1.011
0     ,0      ,225    ,0                    ,6.625    ,6.699    ,0.989
0     ,0      ,225    ,1                    ,6.776    ,6.735    ,1.003
0     ,0      ,225    ,18446744073709551615 ,6.758    ,6.738    ,0.992
0     ,0      ,256    ,0                    ,5.402    ,5.462    ,0.989
0     ,0      ,256    ,1                    ,5.364    ,5.483    ,0.978
0     ,0      ,256    ,18446744073709551615 ,5.341    ,5.539    ,0.964
Rewriting with the VMM API allows memcmpeq-evex to be used with
evex512 by including "x86-evex512-vecs.h" at the top.
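The idea behind selecting the vector width through a header can be sketched with the C preprocessor. This is a simplified model: the `MODEL_EVEX512` switch, the macro bodies, and the two functions are assumptions for illustration and are not the definitions in glibc's vecs headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the choice of "vecs" header.  Defining
   MODEL_EVEX512 before this point models including the 512-bit one.  */
#ifdef MODEL_EVEX512
# define VEC_SIZE 64
# define VMM(i) zmm##i
#else
# define VEC_SIZE 32
# define VMM(i) ymm##i
#endif

/* Two-level stringification so that VMM (i) is expanded first.  */
#define STR_(x) #x
#define STR(x) STR_ (x)

static_assert (VEC_SIZE == 32 || VEC_SIZE == 64,
	       "model only covers 256-bit and 512-bit vectors");

/* Text of one load as the shared source would spell it.  The register
   name comes from VMM, never from a hard-coded ymm or zmm.  */
const char *
model_load_insn (void)
{
  return "vmovdqu64 (%rsi), %" STR (VMM (1));
}

/* Size thresholds are likewise written in terms of VEC_SIZE.  */
size_t
model_block_bytes (void)
{
  return VEC_SIZE * 4;
}
```

Because the body only refers to `VMM (i)` and `VEC_SIZE`, switching the width in this model is a one-line change at the top, which mirrors the single-include switch described above.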
Complete check passes on x86-64.