author Tom de Vries <tdevries@suse.de> 2022-01-21 10:57:43 +0100
committer Tom de Vries <tdevries@suse.de> 2022-02-01 19:28:04 +0100
commit ca902055d056773bd0ca80f68bca4b20ad0e183f
tree db44ff1af55b5491b73bed78c8997dcebdc19b7e
parent 07a971b28c880938bb7e070465ab8ee6ccdad1fb
[nvptx] Fix reduction lock
When I run the libgomp test-case reduction-cplx-dbl.c on an nvptx accelerator
(T400, driver version 470.86), I run into:
...
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \
  execution test
FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/reduction-cplx-dbl.c \
  -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 \
  execution test
...

The problem is in this code generated for a gang reduction:
...
$L39:
        atom.global.cas.b32 %r59, [__reduction_lock], 0, 1;
        setp.ne.u32 %r116, %r59, 0;
@%r116  bra $L39;
        ld.f64 %r60, [%r44];
        ld.f64 %r61, [%r44+8];
        ld.f64 %r64, [%r44];
        ld.f64 %r65, [%r44+8];
        add.f64 %r117, %r64, %r22;
        add.f64 %r118, %r65, %r41;
        st.f64 [%r44], %r117;
        st.f64 [%r44+8], %r118;
        atom.global.cas.b32 %r119, [__reduction_lock], 1, 0;
...
which is taking and releasing a lock, but missing the appropriate barriers to
protect the loads and stores inside the lock.

Fix this by adding membar.gl barriers.

Likewise, add membar.cta barriers if we protect shared memory loads and
stores (even though the worker-partitioning part of the test-case is not
failing).

Tested on x86_64 with nvptx accelerator.

gcc/ChangeLog:

2022-01-27  Tom de Vries  <tdevries@suse.de>

	* config/nvptx/nvptx.cc (enum nvptx_builtins): Add
	NVPTX_BUILTIN_MEMBAR_GL and NVPTX_BUILTIN_MEMBAR_CTA.
	(VOID): New macro.
	(nvptx_init_builtins): Add MEMBAR_GL and MEMBAR_CTA.
	(nvptx_expand_builtin): Handle NVPTX_BUILTIN_MEMBAR_GL and
	NVPTX_BUILTIN_MEMBAR_CTA.
	(nvptx_lockfull_update): Add level parameter.  Emit barriers.
	(nvptx_reduction_update, nvptx_goacc_reduction_fini): Update call to
	nvptx_lockfull_update.
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
	UNSPECV_MEMBAR_GL.
	(define_expand "nvptx_membar_gl"): New expand.
	(define_insn "*nvptx_membar_gl"): New insn.
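
For illustration only, the lock / barrier / update / barrier / unlock shape
described above can be sketched on the host in C++. This is an analogy, not
the code GCC emits: reduction_lock, accumulator, locked_add and run_reduction
are hypothetical names, and std::atomic_thread_fence stands in for the
membar.gl barriers (the exact PTX memory-model semantics differ from the C++
ones). The lock operations themselves are relaxed, as the bare atom.cas
is treated here, so the fences alone order the protected loads and stores.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for __reduction_lock: 0 = free, 1 = held.
static std::atomic<unsigned> reduction_lock{0};

// Hypothetical stand-in for the complex double at [%r44] and [%r44+8].
struct Complex { double re; double im; };
static Complex accumulator{0.0, 0.0};

// Add (re, im) to the shared accumulator under the spin lock.
void locked_add(double re, double im) {
    unsigned expected = 0;
    // Take the lock: spin until the value changes from 0 to 1.
    // Relaxed ordering on purpose, so the CAS alone orders nothing.
    while (!reduction_lock.compare_exchange_weak(
               expected, 1u, std::memory_order_relaxed)) {
        expected = 0;  // a failed CAS overwrites 'expected'; reset it
    }
    // Barrier after taking the lock (plays the role of membar.gl).
    std::atomic_thread_fence(std::memory_order_acquire);

    // Protected loads and stores.
    accumulator.re += re;
    accumulator.im += im;

    // Barrier before releasing the lock (plays the role of membar.gl).
    std::atomic_thread_fence(std::memory_order_release);
    // Release the lock. The PTX above uses a CAS from 1 to 0; a plain
    // relaxed store is used here for brevity.
    reduction_lock.store(0u, std::memory_order_relaxed);
}

// Run 'threads' workers, each adding (1.0, 2.0) 'iters' times.
Complex run_reduction(int threads, int iters) {
    accumulator = Complex{0.0, 0.0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([iters] {
            for (int i = 0; i < iters; ++i) {
                locked_add(1.0, 2.0);
            }
        });
    }
    for (auto &worker : pool) {
        worker.join();
    }
    return accumulator;
}
```

Calling run_reduction(4, 10000) should return {40000.0, 80000.0}. Dropping
the two fences leaves the accesses to accumulator formally unordered between
threads, which is the host-side counterpart of the missing barriers in the
generated PTX.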