path: root/libgfortran
author: Thomas Schwinge <tschwinge@baylibre.com> 2024-05-10 12:50:23 +0200
committer: Thomas Schwinge <tschwinge@baylibre.com> 2024-06-06 13:41:46 +0200
commit: b4e68dd9084e48ee3e83c11d7f27548d8cca7066
tree: 0f55fe080dc99174fe265d9cc4a6f46a900dbbef /libgfortran
parent: 395ac0417a17ba6405873f891f895417d696b603
nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred'

For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'): the '0xffffffff'
bitmask isn't correct if not all 32 threads of a warp are active.  The same
issue/fix, I suppose but have not verified, would apply if we were to allow
for OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.

We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which evidently does the right thing.  (I've tested '-muniform-simt' code
executing single-threaded.)

The change that I proposed on 2022-12-15 was to emit PTX code to calculate
'(1 << %ntid.x) - 1' as the actual bitmask to use instead of '0xffffffff'.
This works, but the PTX JIT generates SASS code to do this computation.

In turn, this change now uses PTX 'vote.all.pred', which even simplifies the
original code a little bit; see the following exemplary SASS 'diff' before
vs. after this change:

    [...]
              /*[...]*/                   SYNC (*"BRANCH_TARGETS .L_x_332"*) }
      .L_x_332:
    -         /*[...]*/                   VOTE.ANY R9, PT, PT ;
    +         /*[...]*/                   VOTE.ALL P1, PT ;
    -         /*[...]*/                   ISETP.NE.U32.AND P1, PT, R9, -0x1, PT ;
    -         /*[...]*/              @!P1 BRA `(.L_x_333) ;
    +         /*[...]*/              @P1  BRA `(.L_x_333) ;
              /*[...]*/                   BPT.TRAP 0x1 ;
      .L_x_333:
    -         /*[...]*/              @P1  EXIT ;
    +         /*[...]*/              @!P1 EXIT ;
    [...]

	gcc/
	* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
	non-full-warp execution, via 'vote.all.pred'.
	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least_6_0): New.
	* gcc.target/nvptx/uniform-simt-2.c: Adjust.
	* gcc.target/nvptx/uniform-simt-5.c: New.

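
To illustrate why the fixed '0xffffffff' comparison traps for a partially
populated warp, here is a minimal host-side sketch in plain C.  It is not the
code GCC emits and does not run on a device: it only models a warp as a 32-bit
mask of active lanes, under the assumption that a ballot yields one bit per
active lane whose predicate is true and that an "all" vote is true when the
predicate holds for every active lane.  All function names are hypothetical
and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed model of a ballot: one result bit per ACTIVE lane whose
   predicate bit is set; inactive lanes contribute 0.  */
static uint32_t model_ballot(uint32_t active, uint32_t pred)
{
    return active & pred;
}

/* Assumed model of an "all" vote: true if no active lane has a false
   predicate; inactive lanes are ignored.  */
static bool model_vote_all(uint32_t active, uint32_t pred)
{
    return (active & ~pred) == 0;
}

/* Old-style check: ballot of an always-true predicate compared against
   the full-warp mask.  Passing means "no trap".  */
static bool old_check_passes(uint32_t active)
{
    return model_ballot(active, 0xffffffffu) == 0xffffffffu;
}

/* The '(1 << ntid) - 1' mask from the 2022-12-15 proposal, for
   1 <= ntid <= 32.  The shift is done in 64 bits so that ntid == 32
   is well defined in C.  */
static uint32_t mask_for_ntid(unsigned ntid)
{
    return (uint32_t)((UINT64_C(1) << ntid) - 1);
}

/* Proposed-style check: compare the ballot against the computed mask.  */
static bool proposed_check_passes(uint32_t active, unsigned ntid)
{
    return model_ballot(active, 0xffffffffu) == mask_for_ntid(ntid);
}

/* New-style check: an "all" vote needs no mask constant at all.  */
static bool new_check_passes(uint32_t active)
{
    return model_vote_all(active, 0xffffffffu);
}

/* Walk through the single-threaded case described in the commit message.  */
static void check_single_thread_case(void)
{
    const uint32_t one_lane = 0x1u;              /* only lane 0 active */
    assert(!old_check_passes(one_lane));         /* would trap */
    assert(proposed_check_passes(one_lane, 1));  /* mask is 0x1 */
    assert(new_check_passes(one_lane));          /* no mask needed */
    assert(old_check_passes(0xffffffffu));       /* full warp still fine */
}
```

Calling 'check_single_thread_case ()' from a test driver exercises the three
variants.  In this model the "all" vote gives the same answer as the
computed-mask variant without needing the thread count, which matches the
commit's observation that the mask computation can be dropped.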
Diffstat (limited to 'libgfortran')
0 files changed, 0 insertions, 0 deletions