author | Alexander Monakov <amonakov@ispras.ru> | 2016-11-16 20:17:00 +0300 |
---|---|---|
committer | Alexander Monakov <amonakov@gcc.gnu.org> | 2016-11-16 20:17:00 +0300 |
commit | 5012919d0bd344ac1888e8e531072f0ccbe24d2c (patch) | |
tree | 9db609d99ee4957a92a3ad468eb36d855e6c1bc6 /gcc/doc | |
parent | 2fe2aba3cd7c2daf16c545bc7fa34481157bfcaf (diff) | |
nvptx backend prerequisites for OpenMP offloading
gcc/
* config/nvptx/mkoffload.c (main): Check that either OpenACC or OpenMP
is selected. Pass -mgomp to offload compiler in OpenMP case.
* config/nvptx/nvptx-protos.h (nvptx_shuffle_kind): Move enum
declaration from nvptx.c.
(nvptx_gen_shuffle): Declare.
(nvptx_output_set_softstack): Declare.
* config/nvptx/nvptx.c (nvptx_shuffle_kind): Move to nvptx-protos.h.
(need_softstack_decl): New variable.
(need_unisimt_decl): New variable.
(diagnose_openacc_conflict): New. Use it...
(nvptx_option_override): ...here. Handle TARGET_GOMP.
(nvptx_encode_section_info): Handle "shared" attribute.
(write_as_kernel): Restrict to OpenACC target regions.
(init_softstack_frame): New.
(nvptx_init_unisimt_predicate): New.
(write_omp_entry): New. Use it...
(nvptx_declare_function_name): ...here to emit OpenMP target region
entrypoints. Handle TARGET_SOFT_STACK. Call
nvptx_init_unisimt_predicate.
(nvptx_output_set_softstack): New.
(nvptx_get_drap_rtx): Return %argp as the DRAP if needed.
(nvptx_gen_shuffle): Export.
(nvptx_output_call_insn): Handle COND_EXEC patterns. Emit instruction
predicate.
(nvptx_print_operand): Fix handling of instruction predicates.
(nvptx_get_unisimt_master): New helper function.
(nvptx_get_unisimt_predicate): Ditto.
(nvptx_call_insn_is_syscall_p): Ditto.
(nvptx_unisimt_handle_set): Ditto.
(nvptx_reorg_uniform_simt): New. Transform code for -muniform-simt.
(nvptx_reorg): Call nvptx_reorg_uniform_simt.
(nvptx_handle_shared_attribute): New. Use it...
(nvptx_attribute_table): ... here (new entry).
(nvptx_record_offload_symbol): Handle NULL attributes.
(nvptx_file_end): Handle need_softstack_decl and need_unisimt_decl.
(nvptx_simt_vf): New.
(TARGET_SIMT_VF): Define.
* config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Define
__nvptx_softstack__ or __nvptx_unisimt__ when -msoft-stack, or resp.
-muniform-simt option is active.
(STACK_SIZE_MODE): Define.
(FIXED_REGISTERS): Adjust.
(SOFTSTACK_SLOT_REGNUM): New.
(SOFTSTACK_PREV_REGNUM): New.
(REGISTER_NAMES): Adjust.
(struct machine_function): New fields.
* config/nvptx/nvptx.md (UNSPEC_SET_SOFTSTACK): New.
(UNSPEC_VOTE_BALLOT): Ditto.
(UNSPEC_LANEID): Ditto.
(UNSPECV_NOUNROLL): Ditto.
(atomic): New attribute.
(predicable): New attribute. Generate predicated forms via
define_cond_exec.
(br_true): Mark as not predicable.
(br_false): Ditto.
(br_true_uni): Ditto.
(br_false_uni): Ditto.
(return): Ditto.
(trap_if_true): Ditto.
(trap_if_false): Ditto.
(nvptx_fork): Ditto.
(nvptx_forked): Ditto.
(nvptx_joining): Ditto.
(nvptx_join): Ditto.
(nvptx_barsync): Ditto.
(epilogue): Emit stack restore if TARGET_SOFT_STACK.
(allocate_stack): Implement for TARGET_SOFT_STACK. Remove unused code.
(allocate_stack_<mode>): Remove unused pattern.
(set_softstack_insn): New pattern.
(restore_stack_block): Handle for TARGET_SOFT_STACK.
(nvptx_vote_ballot): New pattern.
(omp_simt_lane): Ditto.
(omp_simt_last_lane): Ditto.
(omp_simt_ordered): Ditto.
(omp_simt_vote_any): Ditto.
(omp_simt_xchg_bfly): Ditto.
(omp_simt_xchg_idx): Ditto.
(nvptx_nounroll): Ditto.
(atomic_compare_and_swap<mode>_1): Mark with atomic attribute.
(atomic_exchange<mode>): Ditto.
(atomic_fetch_add<mode>): Ditto.
(atomic_fetch_addsf): Ditto.
(atomic_fetch_<logic><mode>): Ditto.
* config/nvptx/nvptx.opt (msoft-stack): New option.
(muniform-simt): Ditto.
(mgomp): Ditto.
* config/nvptx/t-nvptx (MULTILIB_OPTIONS): New.
* doc/extend.texi (Nvidia PTX Variable Attributes): New section.
* doc/invoke.texi (msoft-stack): Document.
(muniform-simt): Document.
(mgomp): Document.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_SIMT_VF): New hook.
* target.def: Define it.
* target-insns.def (omp_simt_lane): New.
(omp_simt_last_lane): New.
(omp_simt_ordered): New.
(omp_simt_vote_any): New.
(omp_simt_xchg_bfly): New.
(omp_simt_xchg_idx): New.
libgcc/
* config/nvptx/crt0.c (__main): Setup __nvptx_stacks and __nvptx_uni.
* config/nvptx/mgomp.c: New file.
* config/nvptx/t-nvptx: Add mgomp.c.
gcc/testsuite/
* lib/target-supports.exp (check_effective_target_alloca): Use a
compile test.
* gcc.target/nvptx/softstack.c: New test.
* gcc.target/nvptx/decl-shared.c: New test.
* gcc.target/nvptx/decl-shared-init.c: New test.
From-SVN: r242503
Diffstat (limited to 'gcc/doc')
-rw-r--r-- | gcc/doc/extend.texi | 15 |
-rw-r--r-- | gcc/doc/invoke.texi | 31 |
-rw-r--r-- | gcc/doc/tm.texi | 4 |
-rw-r--r-- | gcc/doc/tm.texi.in | 2 |
4 files changed, 52 insertions, 0 deletions
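As an illustration of the `shared` variable attribute that this commit documents in gcc/doc/extend.texi, here is a minimal sketch in C; it is not code from the commit. The macro `NVPTX_SHARED`, the array `scratch`, and the function `fill_and_sum` are hypothetical names. The sketch also assumes that `__nvptx__` is predefined when compiling for the nvptx target, so that on any other target the attribute is simply left out. Because the documentation states that the runtime does not initialize variables in `.shared` memory, the function writes every element before reading it.

```c
#include <assert.h>

/* Assumption: __nvptx__ is predefined by the nvptx back end.  On other
   targets the attribute is dropped so the sketch still compiles.  */
#if defined(__nvptx__)
# define NVPTX_SHARED __attribute__((shared))
#else
# define NVPTX_SHARED
#endif

/* Hypothetical scratch buffer.  On nvptx it would live in .shared memory,
   with one instance per thread block.  It has no initializer because the
   runtime does not initialize .shared variables.  */
static int scratch[32] NVPTX_SHARED;

/* Hypothetical helper: store 0..n-1 into the buffer, then sum what was
   stored.  Every element is written before it is read.  */
static int
fill_and_sum (int n)
{
  assert (n >= 0 && n <= 32);

  for (int i = 0; i < n; i++)
    scratch[i] = i;

  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += scratch[i];
  return sum;
}
```

For example, `fill_and_sum (4)` stores 0, 1, 2, 3 and returns their sum. In a real offloaded region all threads of a thread block would see the same `scratch` instance, so concurrent use would need synchronization; the sketch ignores that.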
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0669f79..4dcc7f6 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5576,6 +5576,7 @@ attributes.
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
 * MSP430 Variable Attributes::
+* Nvidia PTX Variable Attributes::
 * PowerPC Variable Attributes::
 * RL78 Variable Attributes::
 * SPU Variable Attributes::
@@ -6257,6 +6258,20 @@ same name (@pxref{MSP430 Function Attributes}).
 These attributes can be applied to both functions and variables.
 @end table
 
+@node Nvidia PTX Variable Attributes
+@subsection Nvidia PTX Variable Attributes
+
+These variable attributes are supported by the Nvidia PTX back end:
+
+@table @code
+@item shared
+@cindex @code{shared} attribute, Nvidia PTX
+Use this attribute to place a variable in the @code{.shared} memory space.
+This memory space is private to each cooperative thread array; only threads
+within one thread block refer to the same instance of the variable.
+The runtime does not initialize variables in this memory space.
+@end table
+
 @node PowerPC Variable Attributes
 @subsection PowerPC Variable Attributes
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1d24b31..620225c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20570,6 +20570,37 @@ offloading execution.
 Apply partitioned execution optimizations. This is the default when any
 level of optimization is selected.
 
+@item -msoft-stack
+@opindex msoft-stack
+Generate code that does not use @code{.local} memory
+directly for stack storage. Instead, a per-warp stack pointer is
+maintained explicitly. This enables variable-length stack allocation (with
+variable-length arrays or @code{alloca}), and when global memory is used for
+underlying storage, makes it possible to access automatic variables from other
+threads, or with atomic instructions. This code generation variant is used
+for OpenMP offloading, but the option is exposed on its own for the purpose
+of testing the compiler; to generate code suitable for linking into programs
+using OpenMP offloading, use option @option{-mgomp}.
+
+@item -muniform-simt
+@opindex muniform-simt
+Switch to code generation variant that allows to execute all threads in each
+warp, while maintaining memory state and side effects as if only one thread
+in each warp was active outside of OpenMP SIMD regions. All atomic operations
+and calls to runtime (malloc, free, vprintf) are conditionally executed (iff
+current lane index equals the master lane index), and the register being
+assigned is copied via a shuffle instruction from the master lane. Outside of
+SIMD regions lane 0 is the master; inside, each thread sees itself as the
+master. Shared memory array @code{int __nvptx_uni[]} stores all-zeros or
+all-ones bitmasks for each warp, indicating current mode (0 outside of SIMD
+regions). Each thread can bitwise-and the bitmask at position @code{tid.y}
+with current lane index to compute the master lane index.
+
+@item -mgomp
+@opindex mgomp
+Generate code for use in OpenMP offloading: enables @option{-msoft-stack} and
+@option{-muniform-simt} options, and selects corresponding multilib variant.
+
 @end table
 
 @node PDP-11 Options
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 85341ae..84bba07 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5862,6 +5862,10 @@ usable. In that case, the smaller the number is, the more desirable it is
 to use it.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_SIMT_VF (void)
+Return number of threads in SIMT thread group on the target.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int *@var{dims}, int @var{fn_level})
 This hook should check the launch dimensions provided for an OpenACC
 compute region, or routine. Defaulted values are represented as -1
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 400d574..9afd5daa 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4295,6 +4295,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMT_VF
+
 @hook TARGET_GOACC_VALIDATE_DIMS
 
 @hook TARGET_GOACC_DIM_LIMIT
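The master-lane rule in the -muniform-simt documentation can be modelled on the host with a few lines of C. This is a sketch of the described arithmetic only, not the code the back end emits. The function names `unisimt_master_lane` and `unisimt_is_master` are hypothetical, the per-warp bitmask is passed in as a plain argument rather than read from `__nvptx_uni[]`, and a warp width of 32 lanes is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model.  mode_mask stands for the per-warp entry
   that the documentation places in __nvptx_uni[]: all-zeros outside SIMD
   regions, all-ones inside.  lane is the current lane index; a warp of
   32 lanes is assumed.  */
static unsigned
unisimt_master_lane (int32_t mode_mask, unsigned lane)
{
  assert (mode_mask == 0 || mode_mask == -1);
  assert (lane < 32);

  /* Bitwise-and of the bitmask with the lane index: yields 0 outside
     SIMD regions and the lane itself inside.  */
  return (uint32_t) mode_mask & lane;
}

/* A lane performs atomics and runtime calls only if it is the master.  */
static int
unisimt_is_master (int32_t mode_mask, unsigned lane)
{
  return unisimt_master_lane (mode_mask, lane) == lane;
}
```

With an all-zeros mask every lane computes master lane 0, so only lane 0 passes `unisimt_is_master`; with an all-ones mask each lane computes its own index and therefore sees itself as the master, matching the documented behavior inside SIMD regions.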