author | Adhemerval Zanella <adhemerval.zanella@linaro.org> | 2025-06-16 10:17:37 -0300 |
---|---|---|
committer | Adhemerval Zanella <adhemerval.zanella@linaro.org> | 2025-07-11 13:01:31 -0300 |
commit | c055c54e960579619304c7fb998e6bc12e82c5bd (patch) | |
tree | c4db98a12d980896de92f1645478f883093721da /malloc/tst-malloc-alternate-path.c | |
parent | 3d3572f59059e2b19b8541ea648a6172136ec42e (diff) | |
SSE4.1 provides a direct instruction for trunc, which improves
modf/modff performance and reduces text size. On Ryzen 9 (zen3) with
gcc 14.2.1:

x86_64-v2

reciprocal-throughput         master     patch  difference
workload-0_1                  7.9610    7.7914       2.13%
workload-1_maxint             9.4323    7.8021      17.28%
workload-maxint_maxfloat      8.7379    7.8049      10.68%
workload-integral             7.9492    7.7991       1.89%

latency                       master     patch  difference
workload-0_1                  7.9511   10.8910     -36.97%
workload-1_maxint            15.8278   10.9048      31.10%
workload-maxint_maxfloat     11.3495   10.9139       3.84%
workload-integral            11.5938   10.9071       5.92%

x86_64-v3

reciprocal-throughput         master     patch  difference
workload-0_1                  8.7522    7.9781       8.84%
workload-1_maxint             9.6690    7.9872      17.39%
workload-maxint_maxfloat      8.7634    7.9857       8.87%
workload-integral             8.7397    7.9893       8.59%

latency                       master     patch  difference
workload-0_1                  8.7447    9.5589      -9.31%
workload-1_maxint            13.7480    9.5690      30.40%
workload-maxint_maxfloat     10.0092    9.5680       4.41%
workload-integral             9.7518    9.5743       1.82%

For x86_64-v1 the optimization is selected at run time through a new
ifunc selector. The AVX variant follows other SSE4.1 optimizations
(such as trunc) so that x86_64-v3 builds avoid the ifunc.
Checked on x86_64-linux-gnu.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>