author | Julian Brown <julian@codesourcery.com> | 2021-03-02 04:20:11 -0800 |
---|---|---|
committer | Thomas Schwinge <thomas@codesourcery.com> | 2021-08-09 14:47:42 +0200 |
commit | e2a58ed6dc5293602d0d168475109caa81ad0f0d | |
tree | a9b97eaab19148ad4b70da3a31fa63caf042f235 /gcc/doc | |
parent | e2e0b85c1e7cb53fd720df0d09278e3d485c733e | |
openacc: Middle-end worker-partitioning support
This patch implements worker-partitioning support in the middle end,
by rewriting gimple. The OpenACC execution model requires that code
can run in either "worker single" mode where only a single worker per
gang is active, or "worker partitioned" mode, where multiple workers
per gang are active. This means we need to do something equivalent
to spawning additional workers when transitioning from worker-single
to worker-partitioned mode. However, GPUs typically fix the number of
threads of invoked kernels at launch time, so we need to do something
with the "extra" threads when they are not wanted.
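As an illustration of the two modes, here is a minimal OpenACC sketch in C. The function scale_rows and its parameters are hypothetical and do not come from the patch. The statement computing factor runs in worker-single mode, while the inner loop runs in worker-partitioned mode and needs the value of factor in every worker. Without -fopenacc the pragmas are ignored and the code runs sequentially with the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example; names are illustrative, not taken from the patch.
   Scales row i of a rows x cols matrix by base * (i + 1).  */
void
scale_rows (float *a, int rows, int cols, float base)
{
  assert (a != NULL);

#pragma acc parallel loop gang num_workers(4) copy(a[0:rows * cols])
  for (int i = 0; i < rows; i++)
    {
      /* Worker-single mode: only one worker per gang should do this.  */
      float factor = base * (float) (i + 1);

      /* Worker-partitioned mode: the workers of the gang share the
         iterations, and each of them reads 'factor'.  */
#pragma acc loop worker
      for (int j = 0; j < cols; j++)
        a[i * cols + j] *= factor;
    }
}
```

Here factor is the kind of value that has to reach the other workers before the inner loop starts.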
The scheme used is to conditionalise each basic block that executes
in "worker single" mode for worker 0 only. Conditional branches
are handled specially so "idle" (non-0) workers follow along with
worker 0. On transitioning to "worker partitioned" mode, any variables
modified by worker 0 are propagated to the other workers via GPU shared
memory. Special care is taken for routine calls, writes through pointers,
and so forth, as follows:
- There are two types of function calls to consider in worker-single
mode: "normal" calls (to maths library routines, etc.) are made from
worker 0 only. OpenACC routines may contain worker-partitioned loops
themselves, so they are called from all workers, including "idle" ones.
- SSA names set in worker-single mode, but used in worker-partitioned
mode, are copied to shared memory in worker 0. Other workers retrieve
the value from the appropriate shared-memory location after a barrier,
and new phi nodes are introduced at the convergence point to resolve
the worker 0/other worker copies of the value.
- Local scalar variables (on the stack) also need special handling. We
broadcast any variables that are written in the current worker-single
block, and that are read in any worker-partitioned block. (This is
believed to be safe, and is flow-insensitive to ease analysis.)
- Local aggregates (arrays and composites) on the stack are *not*
broadcast. Instead we force gimple stmts modifying elements/fields of
local aggregates into fully-partitioned mode. The RHS of the
assignment is a scalar, and is thus subject to broadcasting as above.
- Writes through pointers may affect any local variable that has
its address taken. We use points-to analysis to determine the set
of potentially-affected variables for a given pointer indirection.
We broadcast any such variable which is used in worker-partitioned
mode, on a per-block basis for any block containing a write through
a pointer.
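The neutering and broadcasting scheme described above can be pictured with a small sequential simulation in plain C. This is only a sketch under stated assumptions, not the code the pass generates: NUM_WORKERS, struct broadcast_record and simulate_gang are hypothetical names, the workers are modelled as loop iterations, and the barrier is modelled by the first loop finishing before the second one starts.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WORKERS 4	/* Assumed number of workers per gang.  */

/* Hypothetical stand-in for the record placed in GPU shared memory.  */
struct broadcast_record
{
  int factor;
};

/* Simulate one gang.  out[w] receives the value worker w computes in the
   worker-partitioned block.  */
void
simulate_gang (int input, int out[NUM_WORKERS])
{
  struct broadcast_record shared = { 0 };	/* One record per gang.  */
  int factor[NUM_WORKERS];			/* Per-worker private copies.  */

  assert (out != NULL);

  /* Worker-single block: neutered for every worker except worker 0.  */
  for (int w = 0; w < NUM_WORKERS; w++)
    {
      factor[w] = 0;
      if (w == 0)
	{
	  factor[w] = input * 2;
	  shared.factor = factor[w];	/* Sender side of the broadcast.  */
	}
    }

  /* Barrier: all workers wait here until worker 0 has stored its value.  */

  /* Receiver side: the other workers load the broadcast value.  */
  for (int w = 1; w < NUM_WORKERS; w++)
    factor[w] = shared.factor;

  /* Worker-partitioned block: every worker now sees the same 'factor'.  */
  for (int w = 0; w < NUM_WORKERS; w++)
    out[w] = factor[w] + w;
}
```

For example, simulate_gang (5, out) leaves every worker with factor equal to 10, so out[w] is 10 + w.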
Some slides about the implementation (from 2018) are available at:
https://jtb20.github.io/gcnworkers.pdf
gcc/
* Makefile.in (OBJS): Add omp-oacc-neuter-broadcast.o.
* doc/tm.texi.in (TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD):
Add documentation hook.
* doc/tm.texi: Regenerate.
* omp-oacc-neuter-broadcast.cc: New file.
* omp-builtins.def (BUILT_IN_GOACC_BARRIER)
(BUILT_IN_GOACC_SINGLE_START, BUILT_IN_GOACC_SINGLE_COPY_START)
(BUILT_IN_GOACC_SINGLE_COPY_END): New builtins.
* passes.def (pass_omp_oacc_neuter_broadcast): Add pass.
* target.def (goacc.create_worker_broadcast_record): Add target
hook.
* tree-pass.h (make_pass_omp_oacc_neuter_broadcast): Add
prototype.
* config/gcn/gcn-protos.h (gcn_goacc_adjust_propagation_record):
Rename prototype to...
(gcn_goacc_create_worker_broadcast_record): ... this.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_propagation_record): Rename
function to...
(gcn_goacc_create_worker_broadcast_record): ... this.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_PROPAGATION_RECORD):
Rename to...
(TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD): ... this.
Co-Authored-By: Nathan Sidwell <nathan@codesourcery.com> (via 'gcc/config/nvptx/nvptx.c' master)
Co-Authored-By: Kwok Cheung Yeung <kcy@codesourcery.com>
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Diffstat (limited to 'gcc/doc')
-rw-r--r-- | gcc/doc/tm.texi | 9 |
-rw-r--r-- | gcc/doc/tm.texi.in | 2 |
2 files changed, 11 insertions, 0 deletions
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index cb01528..a30fdcb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6409,6 +6409,15 @@ private variables at OpenACC device-lowering time using the
 @code{TARGET_GOACC_ADJUST_PRIVATE_DECL} target hook.
 @end deftypefn
 
+@deftypefn {Target Hook} tree TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD (tree @var{rec}, bool @var{sender}, const char *@var{name})
+Create a record used to propagate local-variable state from an active
+worker to other workers. A possible implementation might adjust the type
+of REC to place the new variable in shared GPU memory.
+
+Presence of this target hook indicates that middle end neutering/broadcasting
+be used.
+@end deftypefn
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4a522ae..611fc50 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4223,6 +4223,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_GOACC_EXPAND_VAR_DECL
 
+@hook TARGET_GOACC_CREATE_WORKER_BROADCAST_RECORD
+
 @node Anchored Addresses
 @section Anchored Addresses
 @cindex anchored addresses