author | Wilco Dijkstra <wdijkstr@arm.com> | 2017-08-10 17:00:38 +0100 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2017-08-10 17:00:38 +0100 |
commit | 922369032c604b4dcfd535e1bcddd4687e7126a5 (patch) | |
tree | 82779a2afc66f4ef2f2c9006f90a412bffaad23e /ChangeLog | |
parent | 2449ae7b2da24c9940962304a3e44bc80e389265 (diff) | |
[AArch64] Optimized memcmp.
This is an optimized memcmp for AArch64. It is a complete rewrite
using a different algorithm. The previous version had separate cases
for inputs that were both aligned, inputs that were mutually aligned,
and unaligned inputs, which used a byte loop. The new version combines
all these cases, while small inputs of less than 8 bytes are handled
separately.
Handling small inputs separately lets the main code use unaligned loads,
since there are then at least 8 bytes to compare. After the first 8
bytes, the first input is aligned. This ensures each iteration does at
most one unaligned access, and mutually aligned inputs behave as aligned.
After the main loop, the last 8 bytes are processed using unaligned accesses.
This improves performance of (mutually) aligned cases by 25% and
unaligned cases by more than 500% (over 6 times faster) on large inputs.
* sysdeps/aarch64/memcmp.S (memcmp):
Rewrite of optimized memcmp.
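The control flow described in the commit message can be sketched in portable C. This is only an illustration under stated assumptions, not the assembly in sysdeps/aarch64/memcmp.S: the names sketch_memcmp, load64 and first_diff are hypothetical, unaligned loads are modelled with memcpy, and a mismatching block is resolved with a byte scan rather than whatever the real routine does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes from a possibly unaligned address. */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: order two 8-byte blocks by their first
   differing byte; returns 0 if the blocks are equal. */
static int first_diff(const unsigned char *a, const unsigned char *b)
{
    for (size_t i = 0; i < 8; i++)
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    return 0;
}

/* Sketch of the strategy in the commit message (not the glibc code). */
int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;

    /* Small inputs of less than 8 bytes are handled separately. */
    if (n < 8) {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return a[i] < b[i] ? -1 : 1;
        return 0;
    }

    /* First 8 bytes: unaligned loads are safe because n >= 8. */
    if (load64(a) != load64(b))
        return first_diff(a, b);

    /* Align the first input: continue from its next 8-byte boundary.
       off is in 1..8, so any overlap re-checks bytes known to be equal. */
    size_t off = 8 - ((uintptr_t)a & 7);

    /* Main loop: loads from a are aligned, so at most one of the two
       loads per iteration is unaligned. */
    while (off + 8 <= n) {
        if (load64(a + off) != load64(b + off))
            return first_diff(a + off, b + off);
        off += 8;
    }

    /* Tail: compare the last 8 bytes with unaligned (overlapping) loads. */
    if (off < n && load64(a + n - 8) != load64(b + n - 8))
        return first_diff(a + n - 8, b + n - 8);

    return 0;
}
```

Because the overlapping bytes in the alignment step and in the tail have already compared equal, re-reading them cannot change the result; the first differing byte is always found in the newly covered part of the block.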
Diffstat (limited to 'ChangeLog')
-rw-r--r-- | ChangeLog | 5 |
1 files changed, 5 insertions, 0 deletions
@@ -1,3 +1,8 @@
+2017-08-10 Wilco Dijkstra <wdijkstr@arm.com>
+
+ * sysdeps/aarch64/memcmp.S (memcmp):
+ Rewrite of optimized memcmp.
+
 2017-08-10 Florian Weimer <fweimer@redhat.com>
 
 Introduce ld.so exceptions.