diff options
author | Hongyu Wang <hongyu.wang@intel.com> | 2021-11-12 10:50:46 +0800 |
---|---|---|
committer | Hongyu Wang <hongyu.wang@intel.com> | 2021-11-15 19:09:38 +0800 |
commit | 4d281ff7ddd8f6365943c0a622107f92315bb8a6 (patch) | |
tree | e53e1de41e5ef5299c57ed2e5c6e4d6287e37f2e /gcc/cp/parser.c | |
parent | d1ca8aeaf34a717dffd8f4a1f0333d25c7d1c904 (diff) | |
download | gcc-4d281ff7ddd8f6365943c0a622107f92315bb8a6.zip gcc-4d281ff7ddd8f6365943c0a622107f92315bb8a6.tar.gz gcc-4d281ff7ddd8f6365943c0a622107f92315bb8a6.tar.bz2 |
PR target/103069: Relax cmpxchg loop for x86 target
From the CPU's point of view, getting a cache line for writing is more
expensive than reading. See Appendix A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/
xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusive and causes
excessive cache line bouncing.
The atomic_fetch_{or,xor,and,nand} builtins generates cmpxchg loop under
-march=x86-64 like:
movl v(%rip), %eax
.L2:
movl %eax, %ecx
movl %eax, %edx
orl $1, %ecx
lock cmpxchgl %ecx, v(%rip)
jne .L2
movl %edx, %eax
andl $1, %eax
ret
To relax above loop, GCC should first emit a normal load, check and jump to
.L2 if cmpxchgl may fail. Before jump to .L2, PAUSE should be inserted to
yield the CPU to another hyperthread and to save power, so the code is
like
.L84:
movl (%rdi), %ecx
movl %eax, %edx
orl %esi, %edx
cmpl %eax, %ecx
jne .L82
lock cmpxchgl %edx, (%rdi)
jne .L84
.L82:
rep nop
jmp .L84
This patch adds corresponding atomic_fetch_op expanders to insert load/
compare and pause for all the atomic logic fetch builtins. Add flag
-mrelax-cmpxchg-loop to control whether to generate relaxed loop.
gcc/ChangeLog:
PR target/103069
* config/i386/i386-expand.c (ix86_expand_atomic_fetch_op_loop):
New expand function.
* config/i386/i386-options.c (ix86_target_string): Add
-mrelax-cmpxchg-loop flag.
(ix86_valid_target_attribute_inner_p): Likewise.
* config/i386/i386-protos.h (ix86_expand_atomic_fetch_op_loop):
New expand function prototype.
* config/i386/i386.opt: Add -mrelax-cmpxchg-loop.
* config/i386/sync.md (atomic_fetch_<logic><mode>): New expander
for SI,HI,QI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
(atomic_fetch_<logic><mode>): New expander for DI,TI modes.
(atomic_<logic>_fetch<mode>): Likewise.
(atomic_fetch_nand<mode>): Likewise.
(atomic_nand_fetch<mode>): Likewise.
* doc/invoke.texi: Document -mrelax-cmpxchg-loop.
gcc/testsuite/ChangeLog:
PR target/103069
* gcc.target/i386/pr103069-1.c: New test.
* gcc.target/i386/pr103069-2.c: Ditto.
Diffstat (limited to 'gcc/cp/parser.c')
0 files changed, 0 insertions, 0 deletions