author Pierre van Houtryve <pierre.vanhoutryve@amd.com> 2025-10-21 09:23:46 +0200
committer GitHub <noreply@github.com> 2025-10-21 09:23:46 +0200
commit 07d47c792b980746ab1ff5ea3f346c87b024bd51
tree da8b6b53382efc215365fe38a1906cc2e8ee3cc6 /lldb/packages/Python/lldbsuite/test/gdbclientutils.py
parent e4f3e9a3d1a3d78675fb3daa16cb6e97405f6627
[AMDGPU] Update code sequence for CU-mode Release Fences in GFX10+ (#161638)

They were previously optimized to not emit any waitcnt, which is technically correct because there is no reordering of operations at workgroup scope in CU mode for GFX10+.

This breaks transitivity, however. For example, if we have the following sequence of events in one thread:

- some stores
- store atomic release syncscope("workgroup")
- barrier

then another thread follows with

- barrier
- load atomic acquire
- store atomic release syncscope("agent")

It does not work because, while the other thread sees the stores, it cannot release them at the wider scope. Our release fences aren't strong enough to "wait" on stores from other waves.

We also cannot strengthen our release fences any further to allow for releasing other waves' stores, because only GFX12 can do that with `global_wb`. GFX10-11 do not have the writeback instruction. It would also add yet another level of complexity to code sequences, with both acquire and release having CU-mode-only alternatives.

Lastly, acq/rel are always used together. The price for synchronization has to be paid either at the acq or at the rel. Strengthening the releases would just make the memory model more complex but wouldn't help performance.

So the choice here is to streamline the code sequences by making CU and WGP mode emit almost identical code (vL0 inv is not needed in CU mode) for release (or stronger) atomic ordering.

This also removes the `vm_vsrc(0)` wait before barriers. Now that the release fence in CU mode is strong enough, it is no longer needed.

Supersedes #160501
Solves SC1-6454
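
For illustration only, the two-thread sequence described in the message could look roughly like the LLVM IR below. This is a sketch, not code from the patch: the function names, the pointer arguments, the stored values, the use of the global address space, and the `syncscope("workgroup")` on the acquire load are all assumptions, since the message does not specify them.

```llvm
; Hypothetical sketch of the scenario; all names and values are made up.
declare void @llvm.amdgcn.s.barrier()

; First thread: some stores, a workgroup-scope release store, then a barrier.
define void @thread_a(ptr addrspace(1) %data, ptr addrspace(1) %flag) {
  store i32 42, ptr addrspace(1) %data, align 4
  store atomic i32 1, ptr addrspace(1) %flag syncscope("workgroup") release, align 4
  call void @llvm.amdgcn.s.barrier()
  ret void
}

; Second thread: barrier, acquire load, then a release store at agent scope.
; The scope of the acquire load is an assumption; the message leaves it out.
define void @thread_b(ptr addrspace(1) %flag, ptr addrspace(1) %agent_flag) {
  call void @llvm.amdgcn.s.barrier()
  %seen = load atomic i32, ptr addrspace(1) %flag syncscope("workgroup") acquire, align 4
  store atomic i32 1, ptr addrspace(1) %agent_flag syncscope("agent") release, align 4
  ret void
}
```

The point of the sketch is the last store in `@thread_b`: for transitivity, the agent-scope release must also make the first thread's store to `%data` visible at agent scope, which is what the message says the old CU-mode sequence could not guarantee.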
Diffstat (limited to 'lldb/packages/Python/lldbsuite/test/gdbclientutils.py')
0 files changed, 0 insertions, 0 deletions