author H.J. Lu <hjl.tools@gmail.com> 2017-03-21 10:59:31 -0700
committer H.J. Lu <hjl.tools@gmail.com> 2017-03-21 11:00:12 -0700
commit c15f8eb50cea7ad1a4ccece6e0982bf426d52c00
tree da3251690c3d1f035acebce5350b0a5a0b33cc00
parent a640393a18329ef4044bf9213f6466cd2d1e69f3
x86-64: Improve branch prediction in _dl_runtime_resolve_avx512_opt [BZ #21258]

On Skylake server, _dl_runtime_resolve_avx512_opt is used to preserve
the first 8 vector registers.  The code layout is

  if only %xmm0 - %xmm7 registers are used
     preserve %xmm0 - %xmm7 registers
  if only %ymm0 - %ymm7 registers are used
     preserve %ymm0 - %ymm7 registers
  preserve %zmm0 - %zmm7 registers

Branch prediction always executes the fallthrough code path to preserve
%zmm0 - %zmm7 registers speculatively, even when only %xmm0 - %xmm7
registers are used.  This leads to lower CPU frequency on Skylake
server.  This patch changes the fallthrough code path to preserve
%xmm0 - %xmm7 registers instead:

  if whole %zmm0 - %zmm7 registers are used
     preserve %zmm0 - %zmm7 registers
  if only %ymm0 - %ymm7 registers are used
     preserve %ymm0 - %ymm7 registers
  preserve %xmm0 - %xmm7 registers

Tested on Skylake server.

	[BZ #21258]
	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt): Define
	only if _dl_runtime_resolve is defined to
	_dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
	Fallthrough to _dl_runtime_resolve_sse_vex.

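The reordering described in the commit message can be modelled in C as a rough sketch. This is not the glibc code, which is written in assembly: the flag names, the enum values, and both functions are hypothetical and exist only to show which case sits on the fallthrough path. A C compiler is free to lay out these branches differently, so the sketch illustrates the logical order only, not the generated machine code.

```c
/* Hypothetical flags describing which vector state is in use.
   The names and values are illustrative, not taken from glibc. */
enum {
    YMM_UPPER_IN_USE = 1u << 0, /* upper 128 bits of %ymm0 - %ymm7 are live */
    ZMM_UPPER_IN_USE = 1u << 1  /* upper 256 bits of %zmm0 - %zmm7 are live */
};

/* Number of bytes to preserve per vector register. */
enum { SAVE_XMM = 16, SAVE_YMM = 32, SAVE_ZMM = 64 };

/* Old layout: the narrow cases are tested first and the 512-bit
   save is the fallthrough path. */
int save_width_old(unsigned in_use)
{
    if (!(in_use & (YMM_UPPER_IN_USE | ZMM_UPPER_IN_USE)))
        return SAVE_XMM;
    if (!(in_use & ZMM_UPPER_IN_USE))
        return SAVE_YMM;
    return SAVE_ZMM; /* fallthrough: widest save */
}

/* New layout: the wide cases are tested first and the 128-bit
   save is the fallthrough path. */
int save_width_new(unsigned in_use)
{
    if (in_use & ZMM_UPPER_IN_USE)
        return SAVE_ZMM;
    if (in_use & YMM_UPPER_IN_USE)
        return SAVE_YMM;
    return SAVE_XMM; /* fallthrough: narrowest save */
}

/* Returns 1 if both layouts choose the same width for every
   combination of the two flags, 0 otherwise. */
int layouts_agree(void)
{
    for (unsigned in_use = 0; in_use < 4; in_use++)
        if (save_width_old(in_use) != save_width_new(in_use))
            return 0;
    return 1;
}
```

Both functions select the same width for every input; the only difference is which case is reached when no branch is taken, which is the path the commit message says gets executed speculatively.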
Diffstat (limited to 'ChangeLog')
-rw-r--r-- ChangeLog 9
1 file changed, 9 insertions, 0 deletions
diff --git a/ChangeLog b/ChangeLog
index 3ed2df8..1367e8e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,12 @@
+2017-03-21  H.J. Lu  <hongjiu.lu@intel.com>
+
+	[BZ #21258]
+	* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_resolve_opt):
+	Define only if _dl_runtime_resolve is defined to
+	_dl_runtime_resolve_sse_vex.
+	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_opt):
+	Fallthrough to _dl_runtime_resolve_sse_vex.
+
 2017-03-21  Joseph Myers  <joseph@codesourcery.com>
 
 	* INSTALL: Regenerated.