author | Chung-Lin Tang <cltang@codesourcery.com> | 2023-05-19 12:14:04 -0700 |
---|---|---|
committer | Chung-Lin Tang <cltang@codesourcery.com> | 2023-05-19 12:14:04 -0700 |
commit | 5f881613fa9128edae5bbfa4e19f9752809e4bd7 (patch) | |
tree | 9677f855effa09243c00f6530cb3b5b03b70ffdd | |
parent | 17c41b39078fc8ad67fd1b82f74ef5174f34452e (diff) | |
Use OpenACC code to process OpenMP target regions
Branch: devel/omp/gcc-12
This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html
This patch implements '-fopenmp-target=acc', which enables internally handling
a subset of OpenMP target regions as OpenACC parallel regions. This basically
includes target, teams, parallel, distribute, for/do constructs, and atomics.
Essentially, we adjust the internal kinds to OpenACC type, and let OpenACC code
paths handle them, with various needed adjustments throughout middle-end and
nvptx backend. When using this "OMPACC" mode, if there are cases the patch
doesn't handle, it issues a warning, and reverts to normal processing for that
target region.
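As an illustration of the kind of region this mode is aimed at, here is a minimal sketch of a combined `target teams distribute parallel for` loop. The function name `saxpy` and the `map` clauses are assumptions made for this example; they are not taken from the patch or its testsuite, and whether a given region is actually processed in OMPACC mode depends on the clause checks the patch adds.

```c
#include <stddef.h>

/* Hypothetical example kernel (not from the patch): y[i] += a * x[i].
   It uses only constructs the commit message lists as handled: target,
   teams, distribute, parallel and for.  Without -fopenmp the pragma is
   ignored and the loop runs sequentially on the host.  */
void
saxpy (size_t n, float a, const float *x, float *y)
{
#pragma omp target teams distribute parallel for \
    map(to: x[0:n]) map(tofrom: y[0:n])
  for (size_t i = 0; i < n; i++)
    y[i] += a * x[i];
}
```

On a compiler built with nvptx offloading, compiling such a file with `-fopenmp -fopenmp-target=acc` would request the new mode. Per the description above, a region the patch cannot handle produces a warning and is processed the normal way instead.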
gcc/ChangeLog:
* builtins.cc (expand_builtin_omp_builtins): New function.
(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
* cgraphunit.cc (analyze_functions): Add call to
omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
* common.opt (fopenmp-target=): Add new option and enums.
* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
prototype.
(nvptx_mem_shared_p): Likewise.
* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
symbol for number of threads in team.
(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
(need_omp_num_threads): New bool for if any function references
omp_num_threads_sym.
(nvptx_option_override): Initialize omp_num_threads_sym/align.
(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
(nvptx_declare_function_name): Disable shim function under OMPACC mode.
Disable soft-stack under OMPACC mode. Add generation of neutering init
code under OMPACC mode.
(nvptx_output_set_softstack): Return "" under OMPACC mode.
(nvptx_expand_call): Set parallelism to vector for function calls with
"ompacc for" attached.
(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
(nvptx_expand_oacc_join): Likewise.
(nvptx_expand_omp_get_num_threads): New function.
(nvptx_mem_shared_p): New function.
(nvptx_mach_max_workers): Return 1 under OMPACC mode.
(nvptx_mach_vector_length): Return 32 under OMPACC mode.
(nvptx_single): Add adjustments for OMPACC mode, which have
parallel-construct fork/joins, and regions of code where neutering is
dynamically determined.
(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
attribute is attached to function. Disable uniform-simt when under
OMPACC mode.
(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
(nvptx_goacc_fork_join): Return true under OMPACC mode.
* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
(nvptx_shared_mem_operand): New predicate.
(gomp_barrier): New expand pattern.
(omp_get_num_threads): New expand pattern.
(omp_get_num_teams): New insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_team_num): Likewise.
(get_ntid): Likewise.
(nvptx_omp_parallel_fork): Likewise.
(nvptx_omp_parallel_join): Likewise.
* flag-types.h (omp_target_mode_kind): New flag value enum.
* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_ctx_ompacc_p): New function.
(gimplify_omp_for): Handle combined loops under OMPACC.
* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
* omp-expand.cc (remove_exit_barrier): Disable addressable-var
processing for parallel construct child functions under OMPACC mode.
(expand_oacc_for): Add OMPACC mode handling.
(get_target_arguments): Force thread_limit clause value to 1 under
OMPACC mode.
(expand_omp): Under OMPACC mode, avoid child function expanding of
GIMPLE_OMP_PARALLEL.
* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
(ompacc_ctx_p): New function.
(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
(lower_oacc_head_mark): Handle OMPACC mode cases.
(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
vector/gang clauses as needed. Add other OMPACC handling.
(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
(lower_omp_teams): Forward OpenACC privatization variables to outer
target region under OMPACC mode.
(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
GIMPLE_BIND.
* omp-offload.cc (ompacc_supported_clauses_p): New function.
(struct target_region_data): New struct type for tree walk.
(scan_fndecl_for_ompacc): New function.
(scan_omp_target_region_r): New function.
(scan_omp_target_construct_r): New function.
(omp_ompacc_attribute_tagging): New function.
(oacc_dim_call): Add OMPACC case handling.
(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
and no -fopenacc.
* target-insns.def (gomp_barrier): New defined insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_num_threads): Likewise.
(omp_get_team_num): Likewise.
(omp_get_num_teams): Likewise.
* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
for internal clause.
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
OMP_CLAUSE__OMPACC_.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.
* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
mode cases.
libgomp/ChangeLog:
* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
shared memory.
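The option constraint recorded above for `opts.cc` (`-fopenmp-target=` is allowed only with `-fopenmp` and without `-fopenacc`) can be sketched as a standalone check. This is a simplified illustration, not the GCC code: the function `check_offload_flags` and the `flag_check` return codes are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result codes for this sketch.  */
enum flag_check
{
  FLAGS_OK = 0,
  ERR_TARGET_WITH_OPENACC, /* -fopenmp-target= combined with -fopenacc */
  ERR_TARGET_NEEDS_OPENMP  /* -fopenmp-target= given without -fopenmp */
};

/* Scan an argument vector and apply the compatibility rule.  */
enum flag_check
check_offload_flags (int argc, const char *const *argv)
{
  static const char prefix[] = "-fopenmp-target=";
  bool fopenmp = false, fopenacc = false, fopenmp_target = false;

  for (int i = 0; i < argc; i++)
    {
      if (strcmp (argv[i], "-fopenmp") == 0)
        fopenmp = true;
      else if (strcmp (argv[i], "-fopenacc") == 0)
        fopenacc = true;
      else if (strncmp (argv[i], prefix, strlen (prefix)) == 0)
        fopenmp_target = true;
    }

  if (fopenmp_target)
    {
      if (fopenacc)
        return ERR_TARGET_WITH_OPENACC;
      if (!fopenmp)
        return ERR_TARGET_NEEDS_OPENMP;
    }
  return FLAGS_OK;
}
```

The prefix match with `strncmp` mirrors how a joined option such as `-fopenmp-target=acc` carries its value in the same argument.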
-rw-r--r-- | gcc/builtins.cc | 71 |
-rw-r--r-- | gcc/cgraphunit.cc | 7 |
-rw-r--r-- | gcc/common.opt | 13 |
-rw-r--r-- | gcc/config/nvptx/mkoffload.cc | 13 |
-rw-r--r-- | gcc/config/nvptx/nvptx-protos.h | 2 |
-rw-r--r-- | gcc/config/nvptx/nvptx.cc | 269 |
-rw-r--r-- | gcc/config/nvptx/nvptx.h | 3 |
-rw-r--r-- | gcc/config/nvptx/nvptx.md | 68 |
-rw-r--r-- | gcc/expr.cc | 3 |
-rw-r--r-- | gcc/flag-types.h | 6 |
-rw-r--r-- | gcc/gimplify.cc | 33 |
-rw-r--r-- | gcc/lto-wrapper.cc | 1 |
-rw-r--r-- | gcc/omp-builtins.def | 4 |
-rw-r--r-- | gcc/omp-expand.cc | 67 |
-rw-r--r-- | gcc/omp-general.cc | 11 |
-rw-r--r-- | gcc/omp-low.cc | 150 |
-rw-r--r-- | gcc/omp-offload.cc | 303 |
-rw-r--r-- | gcc/omp-offload.h | 1 |
-rw-r--r-- | gcc/opts.cc | 8 |
-rw-r--r-- | gcc/target-insns.def | 5 |
-rw-r--r-- | gcc/tree-core.h | 4 |
-rw-r--r-- | gcc/tree-nested.cc | 2 |
-rw-r--r-- | gcc/tree-pretty-print.cc | 6 |
-rw-r--r-- | gcc/tree-ssa-loop.cc | 5 |
-rw-r--r-- | gcc/tree.cc | 2 |
-rw-r--r-- | gcc/tree.h | 3 |
-rw-r--r-- | libgomp/config/nvptx/team.c | 3 |
-rw-r--r-- | libgomp/testsuite/libgomp.c-c++-common/for-17.c | 69 |
-rw-r--r-- | libgomp/testsuite/libgomp.c-c++-common/for-18.c | 5 |
29 files changed, 1079 insertions, 58 deletions
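As a reading aid for the `nvptx.md` hunks in the diff below, the following sketch summarizes where each OpenMP query builtin gets its value in OMPACC mode, as far as the new patterns show: three of them read a PTX special register, while `omp_get_num_threads` loads the shared variable `__nvptx_omp_num_threads`. The struct, table and lookup function names are hypothetical and exist only in this summary, not in GCC.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical summary table: builtin name -> value source on nvptx.  */
struct ompacc_builtin_map
{
  const char *builtin;
  const char *ptx_source;
};

static const struct ompacc_builtin_map ompacc_builtin_table[] = {
  { "omp_get_thread_num",  "%tid.x" },
  { "omp_get_team_num",    "%ctaid.x" },
  { "omp_get_num_teams",   "%nctaid.x" },
  /* Loaded from shared memory rather than read from a register.  */
  { "omp_get_num_threads", "__nvptx_omp_num_threads" },
};

/* Return the value source for BUILTIN, or NULL if it is not in the table.  */
const char *
ompacc_ptx_source (const char *builtin)
{
  size_t n = sizeof ompacc_builtin_table / sizeof ompacc_builtin_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (ompacc_builtin_table[i].builtin, builtin) == 0)
      return ompacc_builtin_table[i].ptx_source;
  return NULL;
}
```

`GOMP_barrier` is not in the table because it produces no value; the new `gomp_barrier` expander emits an `nvptx_barsync` instruction instead.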
diff --git a/gcc/builtins.cc b/gcc/builtins.cc index b8cd75d..f36fe15 100644 --- a/gcc/builtins.cc +++ b/gcc/builtins.cc @@ -6785,6 +6785,62 @@ expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore) return target; } +static rtx +expand_builtin_omp_builtins (tree exp, rtx target, int ignore) +{ + rtx ret = NULL; + rtx_insn *(*gen_fn) (rtx) = NULL; + + switch (DECL_FUNCTION_CODE (get_callee_fndecl (exp))) + { + case BUILT_IN_GOMP_BARRIER: + if (targetm.have_gomp_barrier ()) + { + emit_insn (targetm.gen_gomp_barrier ()); + return target; + } + break; + + case BUILT_IN_OMP_GET_THREAD_NUM: + if (targetm.have_omp_get_thread_num ()) + gen_fn = targetm.gen_omp_get_thread_num; + break; + + case BUILT_IN_OMP_GET_NUM_THREADS: + if (targetm.have_omp_get_num_threads ()) + gen_fn = targetm.gen_omp_get_num_threads; + break; + + case BUILT_IN_OMP_GET_TEAM_NUM: + if (targetm.have_omp_get_team_num ()) + gen_fn = targetm.gen_omp_get_team_num; + break; + + case BUILT_IN_OMP_GET_NUM_TEAMS: + if (targetm.have_omp_get_num_teams ()) + gen_fn = targetm.gen_omp_get_num_teams; + break; + + default: + gcc_unreachable (); + } + + if (ignore) + return const0_rtx; + + if (gen_fn) + { + rtx reg = (MEM_P (target) + ? gen_reg_rtx (GET_MODE (target)) + : target); + emit_insn (gen_fn (reg)); + if (reg != target) + emit_move_insn (target, reg); + ret = target; + } + return ret; +} + /* Expand a string compare operation using a sequence of char comparison to get rid of the calling overhead, with result going to TARGET if that's convenient. 
@@ -8113,6 +8169,21 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, case BUILT_IN_GOACC_PARLEVEL_SIZE: return expand_builtin_goacc_parlevel_id_size (exp, target, ignore); + case BUILT_IN_GOMP_BARRIER: + case BUILT_IN_OMP_GET_THREAD_NUM: + case BUILT_IN_OMP_GET_NUM_THREADS: + case BUILT_IN_OMP_GET_TEAM_NUM: + case BUILT_IN_OMP_GET_NUM_TEAMS: + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + { + target = expand_builtin_omp_builtins (exp, target, ignore); + if (target) + return target; + } + break; + case BUILT_IN_SPECULATION_SAFE_VALUE_PTR: return expand_speculation_safe_value (VOIDmode, exp, target, ignore); diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc index b949f61..4d45ab3 100644 --- a/gcc/cgraphunit.cc +++ b/gcc/cgraphunit.cc @@ -1174,7 +1174,12 @@ analyze_functions (bool first_time) build_type_inheritance_graph (); if (flag_openmp && first_time) - omp_discover_implicit_declare_target (); + { + omp_discover_implicit_declare_target (); + + if(flag_openmp_target == OMP_TARGET_MODE_OMPACC) + omp_ompacc_attribute_tagging (); + } /* Analysis adds static variables that in turn adds references to new functions. So we need to iterate the process until it stabilize. */ diff --git a/gcc/common.opt b/gcc/common.opt index e682cea..0caa645 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2233,6 +2233,19 @@ Enum(target_simd_clone_device) String(nohost) Value(OMP_TARGET_SIMD_CLONE_NOHOST EnumValue Enum(target_simd_clone_device) String(any) Value(OMP_TARGET_SIMD_CLONE_ANY) +fopenmp-target= +Common Joined RejectNegative Enum(openmp_target) Var(flag_openmp_target) Init(OMP_TARGET_MODE_DEFAULT) +Execution model used for OpenMP target regions. 
+ +Enum +Name(openmp_target) Type(int) + +EnumValue +Enum(openmp_target) String(default) Value(OMP_TARGET_MODE_DEFAULT) + +EnumValue +Enum(openmp_target) String(acc) Value(OMP_TARGET_MODE_OMPACC) + fopt-info Common Var(flag_opt_info) Optimization Enable all optimization info dumps on stderr. diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc index 5d89ba8..82ea313 100644 --- a/gcc/config/nvptx/mkoffload.cc +++ b/gcc/config/nvptx/mkoffload.cc @@ -603,6 +603,7 @@ main (int argc, char **argv) /* Scan the argument vector. */ bool fopenmp = false; + bool fopenmp_target = false; bool fopenacc = false; bool fPIC = false; bool fpic = false; @@ -622,6 +623,9 @@ main (int argc, char **argv) #undef STR else if (strcmp (argv[i], "-fopenmp") == 0) fopenmp = true; + else if (strncmp (argv[i], "-fopenmp-target=", + strlen ("-fopenmp-target=")) == 0) + fopenmp_target = true; else if (strcmp (argv[i], "-fopenacc") == 0) fopenacc = true; else if (strcmp (argv[i], "-fPIC") == 0) @@ -639,6 +643,15 @@ main (int argc, char **argv) if (!(fopenacc ^ fopenmp)) fatal_error (input_location, "either %<-fopenacc%> or %<-fopenmp%> " "must be set"); + if (fopenmp_target) + { + if (fopenacc) + fatal_error (input_location, "%<-fopenacc%> not compatible with " + "%<-fopenmp-target=%>"); + if (!fopenmp) + fatal_error (input_location, "%<-fopenmp-target=%> requires " + "%<-fopenmp%>"); + } struct obstack argv_obstack; obstack_init (&argv_obstack); diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h index dfa08ec..a86514b 100644 --- a/gcc/config/nvptx/nvptx-protos.h +++ b/gcc/config/nvptx/nvptx-protos.h @@ -50,6 +50,7 @@ extern unsigned int ptx_version_to_number (enum ptx_version, bool); extern void nvptx_expand_oacc_fork (unsigned); extern void nvptx_expand_oacc_join (unsigned); extern void nvptx_expand_call (rtx, rtx); +extern void nvptx_expand_omp_get_num_threads (rtx); extern rtx nvptx_gen_shuffle (rtx, rtx, rtx, nvptx_shuffle_kind); extern rtx 
nvptx_expand_compare (rtx); extern const char *nvptx_ptx_type_from_mode (machine_mode, bool); @@ -63,5 +64,6 @@ extern const char *nvptx_output_red_partition (rtx, rtx); extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int); extern bool nvptx_mem_local_p (rtx); extern bool nvptx_mem_maybe_shared_p (const_rtx); +extern bool nvptx_mem_shared_p (const_rtx); #endif #endif diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc index 9c284ed..3b2bfd3 100644 --- a/gcc/config/nvptx/nvptx.cc +++ b/gcc/config/nvptx/nvptx.cc @@ -176,6 +176,9 @@ static unsigned gang_private_shared_align; static GTY(()) rtx gang_private_shared_sym; static hash_map<tree_decl_hash, unsigned int> gang_private_shared_hmap; +static GTY(()) rtx omp_num_threads_sym; +static unsigned omp_num_threads_align; + /* Global lock variable, needed for 128bit worker & gang reductions. */ static GTY(()) tree global_lock_var; @@ -187,6 +190,9 @@ static bool have_softstack_decl; static bool need_unisimt_decl; static bool have_unisimt_decl; +/* True if any function references __nvptx_omp_num_threads. */ +static bool need_omp_num_threads; + static int nvptx_mach_max_workers (); /* Allocate a new, cleared machine_function structure. 
*/ @@ -393,6 +399,10 @@ nvptx_option_override (void) SET_SYMBOL_DATA_AREA (gang_private_shared_sym, DATA_AREA_SHARED); gang_private_shared_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + omp_num_threads_sym = gen_rtx_SYMBOL_REF (Pmode, "__nvptx_omp_num_threads"); + SET_SYMBOL_DATA_AREA (omp_num_threads_sym, DATA_AREA_SHARED); + omp_num_threads_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT; + diagnose_openacc_conflict (TARGET_GOMP, "-mgomp"); diagnose_openacc_conflict (TARGET_SOFT_STACK, "-msoft-stack"); diagnose_openacc_conflict (TARGET_UNIFORM_SIMT, "-muniform-simt"); @@ -961,7 +971,8 @@ write_as_kernel (tree attrs) { return (lookup_attribute ("kernel", attrs) != NULL_TREE || (lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE - && lookup_attribute ("oacc function", attrs) != NULL_TREE)); + && (lookup_attribute ("oacc function", attrs) != NULL_TREE + || lookup_attribute ("ompacc", attrs) != NULL_TREE))); /* For OpenMP target regions, the corresponding kernel entry is emitted from write_omp_entry as a separate function. */ } @@ -1495,6 +1506,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) DECL_ATTRIBUTES (decl))) force_public = true; if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)) && !lookup_attribute ("oacc function", DECL_ATTRIBUTES (decl))) { char *buf = (char *) alloca (strlen (name) + sizeof ("$impl")); @@ -1548,7 +1560,7 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) HOST_WIDE_INT sz = get_frame_size (); bool need_frameptr = sz || cfun->machine->has_chain; int alignment = crtl->stack_alignment_needed / BITS_PER_UNIT; - if (!TARGET_SOFT_STACK) + if (!TARGET_SOFT_STACK || lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))) { /* Declare a local var for outgoing varargs. 
*/ if (cfun->machine->has_varadic) @@ -1619,6 +1631,45 @@ nvptx_declare_function_name (FILE *file, const char *name, const_tree decl) nvptx_init_unisimt_predicate (file); if (cfun->machine->bcast_partition || cfun->machine->sync_bar) nvptx_init_oacc_workers (file); + + if (offloading_function_p ((tree) decl) + && lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl))) + { + int nthr_regno = REGNO (cfun->machine->omp_fn_entry_num_threads_reg); + if (lookup_attribute ("omp target entrypoint", DECL_ATTRIBUTES (decl))) + { + fprintf (file, "\t{\n"); + if (cfun->machine->omp_parallel_predicate) + { + /* Borrow num-threads regno as temp register. */ + fprintf (file, "\t\tmov.u32 %%r%d, %%tid.x;\n", nthr_regno); + fprintf (file, "\t\tsetp.ne.u32 %%r%d, %%r%d, 0;\n", + REGNO (cfun->machine->omp_parallel_predicate), nthr_regno); + } + fprintf (file, "\t\tmov.u32 %%r%d, 1;\n", nthr_regno); + fprintf (file, "\t\tst.shared.u32 [__nvptx_omp_num_threads], %%r%d;\n", nthr_regno); + fprintf (file, "\t}\n"); + need_omp_num_threads = true; + } + else + { + fprintf (file, "\t\tld.shared.u32 %%r%d, [__nvptx_omp_num_threads];\n", nthr_regno); + if (cfun->machine->omp_parallel_predicate) + { + fprintf (file, "\t{\n"); + fprintf (file, "\t\t.reg.u32 %%tmp1;\n"); + fprintf (file, "\t\t.reg.pred %%not_parallel_mode, %%v1_lane;\n"); + fprintf (file, "\t\tsetp.eq.u32 %%not_parallel_mode, %%r%d, 1;\n", nthr_regno); + fprintf (file, "\t\tmov.u32 %%tmp1, %%tid.x;\n"); + fprintf (file, "\t\tsetp.ne.u32 %%v1_lane, %%tmp1, 0;\n"); + fprintf (file, "\t\tand.pred %%r%d, %%not_parallel_mode, %%v1_lane;\n", + REGNO (cfun->machine->omp_parallel_predicate)); + fprintf (file, "\t}\n"); + need_omp_num_threads = true; + } + } + } } /* Output code for switching uniform-simt state. 
ENTERING indicates whether @@ -1736,6 +1787,10 @@ nvptx_output_simt_exit (rtx src) const char * nvptx_output_set_softstack (unsigned src_regno) { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return ""; if (cfun->machine->has_softstack && !crtl->is_leaf) { fprintf (asm_out_file, "\tst.shared.u%d\t[%s], ", @@ -1854,20 +1909,29 @@ nvptx_expand_call (rtx retval, rtx address) if (DECL_STATIC_CHAIN (decl)) cfun->machine->has_chain = true; - tree attr = oacc_get_fn_attrib (decl); - if (attr) + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) { - tree dims = TREE_VALUE (attr); - - parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1; - for (int ix = 0; ix != GOMP_DIM_MAX; ix++) + if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl)) + && !lookup_attribute ("ompacc seq", DECL_ATTRIBUTES (decl))) + parallel = GOMP_DIM_MASK (GOMP_DIM_VECTOR); + } + else + { + tree attr = oacc_get_fn_attrib (decl); + if (attr) { - if (TREE_PURPOSE (dims) - && !integer_zerop (TREE_PURPOSE (dims))) - break; - /* Not on this axis. */ - parallel ^= GOMP_DIM_MASK (ix); - dims = TREE_CHAIN (dims); + tree dims = TREE_VALUE (attr); + + parallel = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1; + for (int ix = 0; ix != GOMP_DIM_MAX; ix++) + { + if (TREE_PURPOSE (dims) + && !integer_zerop (TREE_PURPOSE (dims))) + break; + /* Not on this axis. 
*/ + parallel ^= GOMP_DIM_MASK (ix); + dims = TREE_CHAIN (dims); + } } } } @@ -1930,15 +1994,27 @@ nvptx_expand_compare (rtx compare) void nvptx_expand_oacc_fork (unsigned mode) { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + mode = GOMP_DIM_VECTOR; nvptx_emit_forking (GOMP_DIM_MASK (mode), false); } void nvptx_expand_oacc_join (unsigned mode) { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + mode = GOMP_DIM_VECTOR; nvptx_emit_joining (GOMP_DIM_MASK (mode), false); } +void +nvptx_expand_omp_get_num_threads (rtx target) +{ + rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym); + emit_insn (gen_rtx_SET (target, mem)); + need_omp_num_threads = true; +} + /* Generate instruction(s) to unpack a 64 bit object into 2 32 bit objects. */ @@ -2879,6 +2955,13 @@ nvptx_mem_maybe_shared_p (const_rtx x) return area == DATA_AREA_SHARED || area == DATA_AREA_GENERIC; } +bool +nvptx_mem_shared_p (const_rtx x) +{ + nvptx_data_area area = nvptx_mem_data_area (x); + return area == DATA_AREA_SHARED; +} + /* Print an operand, X, to FILE, with an optional modifier in CODE. 
Meaning of CODE: @@ -3483,6 +3566,11 @@ init_axis_dim (void) static int ATTRIBUTE_UNUSED nvptx_mach_max_workers () { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return 1; + if (!cfun->machine->axis_dim_init_p) init_axis_dim (); return cfun->machine->axis_dim[MACH_MAX_WORKERS]; @@ -3491,6 +3579,11 @@ nvptx_mach_max_workers () static int ATTRIBUTE_UNUSED nvptx_mach_vector_length () { + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return 32; + if (!cfun->machine->axis_dim_init_p) init_axis_dim (); return cfun->machine->axis_dim[MACH_VECTOR_LENGTH]; @@ -4873,11 +4966,27 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) rtx_insn *tail = BB_END (to); unsigned skip_mask = mask; + rtx_insn *join = NULL; + rtx_insn *fork = NULL; + while (true) { /* Find first insn of from block. */ - while (head != BB_END (from) && !needs_neutering_p (head)) - head = NEXT_INSN (head); + while (true) + { + if (INSN_P (head) + && recog_memoized (head) == CODE_FOR_nvptx_join) + { + /* Record join if we see it. */ + gcc_assert (!join); + join = head; + } + + if (head != BB_END (from) && !needs_neutering_p (head)) + head = NEXT_INSN (head); + else + break; + } if (from == to) break; @@ -4895,8 +5004,46 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) /* Find last insn of to block */ rtx_insn *limit = from == to ? head : BB_HEAD (to); - while (tail != limit && !INSN_P (tail) && !LABEL_P (tail)) - tail = PREV_INSN (tail); + while (true) + { + if (INSN_P (tail) + && recog_memoized (tail) == CODE_FOR_nvptx_fork) + { + /* Record join if we see it. 
*/ + gcc_assert (!fork); + fork = tail; + } + + if (tail != limit && !INSN_P (tail) && !LABEL_P (tail)) + tail = PREV_INSN (tail); + else + break; + } + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + if (join + /* We do not set/restore parallel state across function calls. */ + && !(INTVAL (XVECEXP (PATTERN (join), 0, 0)) & (1 << GOMP_DIM_MAX))) + { + rtx reg = cfun->machine->omp_fn_entry_num_threads_reg; + rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym); + emit_insn_before (gen_nvptx_omp_parallel_join (mem, reg), head); + need_omp_num_threads = true; + head = PREV_INSN (head); + } + + if (fork + /* We do not set/restore parallel state across function calls. */ + && !(INTVAL (XVECEXP (PATTERN (fork), 0, 0)) & (1 << GOMP_DIM_MAX))) + { + rtx reg = gen_reg_rtx (SImode); + rtx mem = gen_rtx_MEM (SImode, omp_num_threads_sym); + emit_insn_before (gen_get_ntid (reg), tail); + emit_insn_before (gen_nvptx_omp_parallel_fork (mem, reg), tail); + need_omp_num_threads = true; + } + } /* Detect if tail is a branch. */ rtx tail_branch = NULL_RTX; @@ -4943,16 +5090,31 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) if (GOMP_DIM_MASK (mode) & skip_mask) { rtx_code_label *label = gen_label_rtx (); - rtx pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER]; rtx_insn **mode_jump = mode == GOMP_DIM_VECTOR ? &vector_jump : &worker_jump; rtx_insn **mode_label = mode == GOMP_DIM_VECTOR ? 
&vector_label : &worker_label; - if (!pred) + rtx pred; + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && mode == GOMP_DIM_VECTOR) + { + pred = cfun->machine->omp_parallel_predicate; + if (!pred) + { + pred = gen_reg_rtx (BImode); + cfun->machine->omp_parallel_predicate = pred; + } + } + else { - pred = gen_reg_rtx (BImode); - cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred; + pred = cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER]; + if (!pred) + { + pred = gen_reg_rtx (BImode); + cfun->machine->axis_predicate[mode - GOMP_DIM_WORKER] = pred; + } } rtx br; @@ -5067,7 +5229,38 @@ nvptx_single (unsigned mask, basic_block from, basic_block to) rtx tmp = gen_reg_rtx (BImode); emit_insn_before (gen_movbi (tmp, const0_rtx), bb_first_real_insn (from)); - emit_insn_before (gen_rtx_SET (tmp, pvar), label); + + if(flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + rtx nthr = cfun->machine->omp_fn_entry_num_threads_reg; + rtx single_p = gen_reg_rtx (BImode); + + rtx_code_label *lbl_copy_tmp_pvar = gen_label_rtx (); + LABEL_NUSES (lbl_copy_tmp_pvar) = 1; + + rtx_insn *lbl_fallthru = NEXT_INSN (tail); + gcc_assert (lbl_fallthru); + if (!LABEL_P (lbl_fallthru)) + { + rtx_code_label *nlbl = gen_label_rtx (); + LABEL_NUSES (nlbl) = 1; + emit_label_before (nlbl, lbl_fallthru); + lbl_fallthru = nlbl; + } + emit_insn_before + (gen_rtx_SET (single_p, + gen_rtx_EQ (BImode, nthr, GEN_INT (1))), + label); + emit_insn_before + (gen_br_true (single_p, lbl_copy_tmp_pvar), label); + emit_jump_insn_before (copy_rtx (tail_branch), label); + emit_insn_before (gen_jump (lbl_fallthru), label); + emit_label_before (lbl_copy_tmp_pvar, label); + emit_insn_before (gen_rtx_SET (tmp, pvar), label); + } + else + emit_insn_before (gen_rtx_SET (tmp, pvar), label); + emit_insn_before (gen_rtx_SET (pvar, tmp), tail); #endif emit_insn_before (nvptx_gen_warp_bcast (pvar), tail); @@ -5826,10 +6019,29 @@ nvptx_reorg (void) delete pars; } + if (flag_openmp_target == 
OMP_TARGET_MODE_OMPACC + && offloading_function_p (current_function_decl) + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl)) + && !lookup_attribute ("ompacc seq", + DECL_ATTRIBUTES (current_function_decl))) + { + cfun->machine->omp_fn_entry_num_threads_reg = gen_reg_rtx (SImode); + + /* Discover & process partitioned regions. */ + parallel *pars = nvptx_discover_pars (&bb_insn_map); + nvptx_process_pars (pars); + nvptx_neuter_pars (pars, GOMP_DIM_MASK (GOMP_DIM_VECTOR), 0); + delete pars; + } + /* Replace subregs. */ nvptx_reorg_subreg (); - if (TARGET_UNIFORM_SIMT) + if (TARGET_UNIFORM_SIMT + && (flag_openmp_target != OMP_TARGET_MODE_OMPACC + || !lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl)))) nvptx_reorg_uniform_simt (); #if WORKAROUND_PTXJIT_BUG_2 @@ -6076,6 +6288,12 @@ nvptx_file_end (void) write_var_marker (asm_out_file, false, true, "__nvptx_uni"); fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n"); } + if (need_omp_num_threads) + { + write_var_marker (asm_out_file, false, true, "__nvptx_omp_num_threads"); + fprintf (asm_out_file, + ".extern .shared .u32 __nvptx_omp_num_threads;\n"); + } } /* Expander for the shuffle builtins. */ @@ -6732,6 +6950,9 @@ nvptx_goacc_fork_join (gcall *call, const int dims[], tree arg = gimple_call_arg (call, 2); unsigned axis = TREE_INT_CST_LOW (arg); + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + return true; + /* We only care about worker and vector partitioning. */ if (axis < GOMP_DIM_WORKER) return false; diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h index d815081..59580d2 100644 --- a/gcc/config/nvptx/nvptx.h +++ b/gcc/config/nvptx/nvptx.h @@ -267,6 +267,9 @@ struct GTY(()) machine_function for per-lane storage in OpenMP SIMD regions. 
*/ unsigned HOST_WIDE_INT simt_stack_size; unsigned HOST_WIDE_INT simt_stack_align; + + rtx omp_parallel_predicate; + rtx omp_fn_entry_num_threads_reg; }; #endif diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md index d271265..1d1a857 100644 --- a/gcc/config/nvptx/nvptx.md +++ b/gcc/config/nvptx/nvptx.md @@ -80,6 +80,14 @@ UNSPECV_SIMT_EXIT UNSPECV_RED_PART + + UNSPECV_GET_TID + UNSPECV_GET_NTID + UNSPECV_GET_CTAID + UNSPECV_GET_NCTAID + + UNSPECV_OMP_PARALLEL_FORK + UNSPECV_OMP_PARALLEL_JOIN ]) (define_attr "subregs_ok" "false,true" @@ -123,6 +131,12 @@ : immediate_operand (op, mode)); }) +(define_predicate "nvptx_shared_mem_operand" + (match_code "mem") +{ + return nvptx_mem_shared_p (op); +}) + (define_predicate "const0_operand" (and (match_code "const_int") (match_test "op == const0_rtx"))) @@ -1774,6 +1788,60 @@ return asms[INTVAL (operands[1])]; }) +(define_expand "gomp_barrier" + [(const_int 1)] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" +{ + emit_insn (gen_nvptx_barsync (GEN_INT (0), GEN_INT (0))); + DONE; +}) + +(define_expand "omp_get_num_threads" + [(match_operand 0 "nvptx_register_operand" "=R")] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" +{ + nvptx_expand_omp_get_num_threads (operands[0]); + DONE; +}) + +(define_insn "omp_get_num_teams" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NCTAID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%nctaid.x;") + +(define_insn "omp_get_thread_num" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_TID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%tid.x;") + +(define_insn "omp_get_team_num" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_CTAID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%ctaid.x;") + +(define_insn 
"get_ntid" + [(set (match_operand:SI 0 "nvptx_register_operand" "=R") + (unspec_volatile:SI [(const_int 0)] UNSPECV_GET_NTID))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tmov.u32\\t%0, %%ntid.x;") + +(define_insn "nvptx_omp_parallel_fork" + [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m") + (unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")] + UNSPECV_OMP_PARALLEL_FORK))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tst.shared.u32\\t%0, %1; //omp parallel fork") + +(define_insn "nvptx_omp_parallel_join" + [(set (match_operand:SI 0 "nvptx_shared_mem_operand" "=m") + (unspec_volatile:SI [(match_operand:SI 1 "nvptx_register_operand" "R")] + UNSPECV_OMP_PARALLEL_JOIN))] + "flag_openmp_target == OMP_TARGET_MODE_OMPACC" + "%.\\tst.shared.u32\\t%0, %1; //omp parallel join") + (define_insn "nvptx_fork" [(unspec_volatile:SI [(match_operand:SI 0 "const_int_operand" "")] UNSPECV_FORK)] diff --git a/gcc/expr.cc b/gcc/expr.cc index e0a0b80..58a596e 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -10532,7 +10532,8 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode tmode, /* Allow accel compiler to handle variables that require special treatment, e.g. if they have been modified in some way earlier in compilation by the adjust_private_decl OpenACC hook. */ - if (flag_openacc && targetm.goacc.expand_var_decl) + if ((flag_openacc || flag_openmp_target == OMP_TARGET_MODE_OMPACC) + && targetm.goacc.expand_var_decl) { temp = targetm.goacc.expand_var_decl (exp); if (temp) diff --git a/gcc/flag-types.h b/gcc/flag-types.h index 2bfab98..518caad 100644 --- a/gcc/flag-types.h +++ b/gcc/flag-types.h @@ -519,6 +519,12 @@ enum omp_target_simd_clone_device_kind OMP_TARGET_SIMD_CLONE_ANY = 3 }; +enum omp_target_mode_kind +{ + OMP_TARGET_MODE_DEFAULT = 0, + OMP_TARGET_MODE_OMPACC = 1 +}; + #endif #endif /* ! 
GCC_FLAG_TYPES_H */ diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc index 6a3bd68..4274c33 100644 --- a/gcc/gimplify.cc +++ b/gcc/gimplify.cc @@ -253,6 +253,7 @@ struct gimplify_omp_ctx bool order_concurrent; bool has_depend; bool in_for_exprs; + bool ompacc; int defaultmap[5]; hash_map<tree, oacc_array_mapping_info> *decl_data_clause; }; @@ -11345,6 +11346,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_CLAUSE_USES_ALLOCATORS: break; + case OMP_CLAUSE__OMPACC_: + ctx->ompacc = true; + break; + case OMP_CLAUSE_ORDER: ctx->order_concurrent = true; break; @@ -12657,6 +12662,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p, case OMP_CLAUSE_FINALIZE: case OMP_CLAUSE_INCLUSIVE: case OMP_CLAUSE_EXCLUSIVE: + case OMP_CLAUSE__OMPACC_: case OMP_CLAUSE_TILE: case OMP_CLAUSE_UNROLL_FULL: case OMP_CLAUSE_UNROLL_NONE: @@ -13250,6 +13256,21 @@ static void omp_for_drop_tile_clauses (tree for_stmt) } } +/* Return true if in an omp_context in OMPACC mode. */ +static bool +gimplify_omp_ctx_ompacc_p (void) +{ + if (cgraph_node::get (current_function_decl)->offloadable + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return true; + struct gimplify_omp_ctx *ctx; + for (ctx = gimplify_omp_ctxp; ctx; ctx = ctx->outer_context) + if (ctx->ompacc) + return true; + return false; +} + /* Gimplify the gross structure of an OMP_FOR statement. 
*/ static enum gimplify_status @@ -13281,6 +13302,18 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p) *expr_p = NULL_TREE; return GS_ERROR; } + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimplify_omp_ctx_ompacc_p ()) + { + gcc_assert (inner_for_stmt && TREE_CODE (for_stmt) == OMP_DISTRIBUTE); + *expr_p = OMP_FOR_BODY (for_stmt); + tree c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_GANG); + OMP_CLAUSE_CHAIN (c) = OMP_FOR_CLAUSES (inner_for_stmt); + OMP_FOR_CLAUSES (inner_for_stmt) = c; + return GS_OK; + } + if (data[2] && OMP_FOR_PRE_BODY (*data[2])) { append_to_statement_list_force (OMP_FOR_PRE_BODY (*data[2]), diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc index 3d57643..3c833fc 100644 --- a/gcc/lto-wrapper.cc +++ b/gcc/lto-wrapper.cc @@ -733,6 +733,7 @@ append_compiler_options (obstack *argv_obstack, vec<cl_decoded_option> opts) case OPT_fcommon: case OPT_fgnu_tm: case OPT_fopenmp: + case OPT_fopenmp_target_: case OPT_fopenacc: case OPT_fopenacc_dim_: case OPT_foffload_abi_: diff --git a/gcc/omp-builtins.def b/gcc/omp-builtins.def index b3715b9..6d7e9d3 100644 --- a/gcc/omp-builtins.def +++ b/gcc/omp-builtins.def @@ -71,9 +71,9 @@ DEF_GOACC_BUILTIN_ONLY (BUILT_IN_GOACC_SINGLE_COPY_END, "GOACC_single_copy_end", DEF_GOMP_BUILTIN (BUILT_IN_OMP_IS_INITIAL_DEVICE, "omp_is_initial_device", BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_THREAD_NUM, "omp_get_thread_num", - BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) + BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_THREADS, "omp_get_num_threads", - BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) + BT_FN_INT, ATTR_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_TEAM_NUM, "omp_get_team_num", BT_FN_INT, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GOMP_BUILTIN (BUILT_IN_OMP_GET_NUM_TEAMS, "omp_get_num_teams", diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index afe0006..d7e61c1 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -1050,11 +1050,16 
@@ remove_exit_barrier (struct omp_region *region) from within current function (this would be easy to check) or from some function it calls and gets passed an address of such a variable. */ + gomp_parallel *parallel_stmt + = as_a <gomp_parallel *> (last_stmt (region->entry)); + tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt); + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && child_fun == NULL_TREE) + any_addressable_vars = 0; + if (any_addressable_vars < 0) { - gomp_parallel *parallel_stmt - = as_a <gomp_parallel *> (last_stmt (region->entry)); - tree child_fun = gimple_omp_parallel_child_fn (parallel_stmt); tree local_decls, block, decl; unsigned ix; @@ -7773,6 +7778,17 @@ expand_oacc_for (struct omp_region *region, struct omp_for_data *fd) /* The SSA parallelizer does gang parallelism. */ gwv = build_int_cst (integer_type_node, GOMP_DIM_MASK (GOMP_DIM_GANG)); } + else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + tree clauses = gimple_omp_for_clauses (for_stmt); + int omp_mask = 0; + if (omp_find_clause (clauses, OMP_CLAUSE_GANG)) + omp_mask |= GOMP_DIM_MASK (GOMP_DIM_GANG); + if (omp_find_clause (clauses, OMP_CLAUSE_VECTOR)) + omp_mask |= GOMP_DIM_MASK (GOMP_DIM_VECTOR); + gcc_assert (omp_mask); + gwv = build_int_cst (integer_type_node, omp_mask); + } if (fd->collapse > 1 || fd->tiling) { @@ -9816,6 +9832,13 @@ get_target_arguments (gimple_stmt_iterator *gsi, gomp_target *tgt_stmt) t = OMP_CLAUSE_THREAD_LIMIT_EXPR (c); else t = integer_minus_one_node; + + /* Currently, OMPACC mode has a limitation of only one warp thread. 
*/ + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute + ("ompacc", DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt_stmt)))) + t = integer_one_node; + push_target_argument_according_to_value (gsi, GOMP_TARGET_ARG_DEVICE_ALL, GOMP_TARGET_ARG_THREAD_LIMIT, t, &args); @@ -10698,6 +10721,44 @@ expand_omp (struct omp_region *region) switch (region->type) { case GIMPLE_OMP_PARALLEL: + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + struct omp_region *r; + for (r = region->outer; r; r = r->outer) + if (r->type == GIMPLE_OMP_TARGET) + { + gomp_target *tgt + = as_a <gomp_target *> (last_stmt (r->entry)); + tree tgtfn_attrs + = DECL_ATTRIBUTES (gimple_omp_target_child_fn (tgt)); + if (!lookup_attribute ("ompacc", tgtfn_attrs)) + r = NULL; + break; + } + if (r != NULL + || (lookup_attribute + ("ompacc", DECL_ATTRIBUTES (current_function_decl)))) + { + gimple_stmt_iterator gsi; + gsi = gsi_last_nondebug_bb (region->entry); + gcc_assert (!gsi_end_p (gsi) + && gimple_code + (gsi_stmt (gsi)) == GIMPLE_OMP_PARALLEL); + gsi_remove (&gsi, true); + + if (region->exit) + { + gsi = gsi_last_nondebug_bb (region->exit); + gcc_assert (!gsi_end_p (gsi) + && gimple_code + (gsi_stmt (gsi)) == GIMPLE_OMP_RETURN); + gsi_remove (&gsi, true); + } + break; + } + } + /* Fallthrough. 
*/ + case GIMPLE_OMP_TASK: expand_omp_taskreg (region); break; diff --git a/gcc/omp-general.cc b/gcc/omp-general.cc index abaae12..9823535 100644 --- a/gcc/omp-general.cc +++ b/gcc/omp-general.cc @@ -202,8 +202,12 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd, struct omp_for_data_loop dummy_loop; location_t loc = gimple_location (for_stmt); bool simd = gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_SIMD; - bool distribute = gimple_omp_for_kind (for_stmt) - == GF_OMP_FOR_KIND_DISTRIBUTE; + bool distribute = + (gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_DISTRIBUTE + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP + && omp_find_clause (gimple_omp_for_clauses (for_stmt), + OMP_CLAUSE_GANG))); bool taskloop = gimple_omp_for_kind (for_stmt) == GF_OMP_FOR_KIND_TASKLOOP; bool order_reproducible = false; @@ -441,7 +445,8 @@ omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd, loop->n2 = gimple_omp_for_final (for_stmt, i); gcc_assert (loop->cond_code != NE_EXPR || (gimple_omp_for_kind (for_stmt) - != GF_OMP_FOR_KIND_OACC_LOOP)); + != GF_OMP_FOR_KIND_OACC_LOOP) + || flag_openmp_target == OMP_TARGET_MODE_OMPACC); if (TREE_CODE (loop->n2) == TREE_VEC) { if (loop->outer) diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index bb4d148..9a569df 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -187,6 +187,10 @@ struct omp_context than teams is strictly nested in it. */ bool nonteams_nested_p; + /* Indicates that context is in OMPACC mode, set after _ompacc_ internal + clauses are removed. */ + bool ompacc_p; + /* Candidates for adjusting OpenACC privatization level. 
*/ vec<tree> oacc_privatization_candidates; @@ -2039,6 +2043,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx, case OMP_CLAUSE_TASK_REDUCTION: case OMP_CLAUSE_ALLOCATE: case OMP_CLAUSE_ALLOCATOR: + case OMP_CLAUSE__OMPACC_: break; case OMP_CLAUSE_ALIGNED: @@ -2263,6 +2268,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx, case OMP_CLAUSE_FILTER: case OMP_CLAUSE__CONDTEMP_: case OMP_CLAUSE_ALLOCATOR: + case OMP_CLAUSE__OMPACC_: break; case OMP_CLAUSE__CACHE_: @@ -2332,6 +2338,21 @@ omp_maybe_offloaded_ctx (omp_context *ctx) return false; } +static bool +ompacc_ctx_p (omp_context *ctx) +{ + if (cgraph_node::get (current_function_decl)->offloadable + && lookup_attribute ("ompacc", + DECL_ATTRIBUTES (current_function_decl))) + return true; + for (; ctx; ctx = ctx->outer) + if (is_gimple_omp_offloaded (ctx->stmt)) + return (ctx->ompacc_p + || omp_find_clause (gimple_omp_target_clauses (ctx->stmt), + OMP_CLAUSE__OMPACC_)); + return false; +} + /* Build a decl for the omp child function. It'll not contain a body yet, just the bare decl. 
*/ @@ -2641,8 +2662,28 @@ scan_omp_parallel (gimple_stmt_iterator *gsi, omp_context *outer_ctx) DECL_NAMELESS (name) = 1; TYPE_NAME (ctx->record_type) = name; TYPE_ARTIFICIAL (ctx->record_type) = 1; - create_omp_child_function (ctx, false); - gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn); + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx)) + { + tree data_name = get_identifier (".omp_data_i_par"); + tree t = build_decl (gimple_location (stmt), VAR_DECL, data_name, + ptr_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_NAMELESS (t) = 1; + DECL_CONTEXT (t) = current_function_decl; + DECL_SEEN_IN_BIND_EXPR_P (t) = 1; + DECL_CHAIN (t) = ctx->block_vars; + ctx->block_vars = t; + TREE_USED (t) = 1; + TREE_READONLY (t) = 1; + ctx->receiver_decl = t; + } + else + { + create_omp_child_function (ctx, false); + gimple_omp_parallel_set_child_fn (stmt, ctx->cb.dst_fn); + } scan_sharing_clauses (gimple_omp_parallel_clauses (stmt), ctx); scan_omp (gimple_omp_body_ptr (stmt), ctx); @@ -3565,6 +3606,24 @@ scan_omp_target (gomp_target *stmt, omp_context *outer_ctx) scan_sharing_clauses (clauses, ctx, base_pointers_restrict); scan_omp (gimple_omp_body_ptr (stmt), ctx); + if (offloaded && flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + for (tree *cp = gimple_omp_target_clauses_ptr (stmt); *cp; + cp = &OMP_CLAUSE_CHAIN (*cp)) + if (OMP_CLAUSE_CODE (*cp) == OMP_CLAUSE__OMPACC_) + { + DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt)) + = tree_cons (get_identifier ("ompacc"), NULL_TREE, + DECL_ATTRIBUTES (gimple_omp_target_child_fn (stmt))); + /* Unlink and remove. */ + *cp = OMP_CLAUSE_CHAIN (*cp); + + /* Set to true. 
*/ + ctx->ompacc_p = true; + break; + } + } + if (TYPE_FIELDS (ctx->record_type) == NULL) ctx->record_type = ctx->receiver_decl = NULL; else @@ -8947,6 +9006,9 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses, gcc_unreachable (); else if (is_oacc_kernels_decomposed_part (tgt)) ; + else if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && is_omp_target (tgt->stmt)) + ; else gcc_unreachable (); @@ -8975,7 +9037,13 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses, != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE); } - if (tag & OLF_TILE) + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimple_code (ctx->stmt) == GIMPLE_OMP_PARALLEL + && tgt + && ompacc_ctx_p (tgt)) + levels = 1; + else + if (tag & OLF_TILE) /* Tiling could use all 3 levels. */ levels = 3; else @@ -12460,6 +12528,23 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) push_gimplify_context (); + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx)) + { + enum omp_clause_code code = OMP_CLAUSE_ERROR; + if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_FOR) + code = OMP_CLAUSE_VECTOR; + else if (gimple_omp_for_kind (stmt) == GF_OMP_FOR_KIND_DISTRIBUTE) + code = OMP_CLAUSE_GANG; + if (code) + { + /* Adjust into OACC loop kind with vector/gang clause. 
*/ + gimple_omp_for_set_kind (stmt, GF_OMP_FOR_KIND_OACC_LOOP); + tree c = build_omp_clause (UNKNOWN_LOCATION, code); + OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (stmt); + gimple_omp_for_set_clauses (stmt, c); + } + } + if (is_gimple_omp_oacc (ctx->stmt)) oacc_privatization_scan_clause_chain (ctx, gimple_omp_for_clauses (stmt)); @@ -12481,7 +12566,9 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) gbind *inner_bind = as_a <gbind *> (gimple_seq_first_stmt (omp_for_body)); tree vars = gimple_bind_vars (inner_bind); - if (is_gimple_omp_oacc (ctx->stmt)) + if (is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx))) oacc_privatization_scan_decl_chain (ctx, vars); gimple_bind_append_vars (new_stmt, vars); /* bind_vars/BLOCK_VARS are being moved to new_stmt/block, don't @@ -12597,7 +12684,8 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (stmt), ctx); gcall *private_marker = NULL; - if (is_gimple_omp_oacc (ctx->stmt) + if ((is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx))) && !gimple_seq_empty_p (omp_for_body)) private_marker = lower_oacc_private_marker (ctx); @@ -12652,15 +12740,16 @@ lower_omp_for (gimple_stmt_iterator *gsi_p, omp_context *ctx) /* Once lowered, extract the bounds and clauses. 
*/ omp_extract_for_data (stmt, &fd, NULL); - bool oacc_kernels_parloops = false; - if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS - || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS) - oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx); - if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops) + if (flag_openacc) { - lower_oacc_head_tail (gimple_location (stmt), - gimple_omp_for_clauses (stmt), private_marker, - NULL, NULL, &oacc_head, &oacc_tail, ctx); + bool oacc_kernels_parloops = false; + if (param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE_PARLOOPS + || param_openacc_kernels == OPENACC_KERNELS_PARLOOPS) + oacc_kernels_parloops = ctx_in_oacc_kernels_region (ctx); + if (is_gimple_omp_oacc (ctx->stmt) && !oacc_kernels_parloops) + lower_oacc_head_tail (gimple_location (stmt), + gimple_omp_for_clauses (stmt), private_marker, + NULL, NULL, &oacc_head, &oacc_tail, ctx); } /* Add OpenACC partitioning and reduction markers just before the loop. */ @@ -13447,9 +13536,20 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx) bind = gimple_build_bind (NULL, NULL, make_node (BLOCK)); else bind = gimple_build_bind (NULL, NULL, gimple_bind_block (par_bind)); + + gimple_seq oacc_head = NULL, oacc_tail = NULL; + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && gimple_code (stmt) == GIMPLE_OMP_PARALLEL + && ompacc_ctx_p (ctx)) + lower_oacc_head_tail (gimple_location (stmt), clauses, + NULL, NULL, NULL, &oacc_head, &oacc_tail, + ctx); + gsi_replace (gsi_p, dep_bind ? 
dep_bind : bind, true); gimple_bind_add_seq (bind, ilist); + gimple_bind_add_seq (bind, oacc_head); gimple_bind_add_stmt (bind, stmt); + gimple_bind_add_seq (bind, oacc_tail); gimple_bind_add_seq (bind, olist); pop_gimplify_context (NULL); @@ -15320,7 +15420,9 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx) gimple_seq fork_seq = NULL; gimple_seq join_seq = NULL; - if (offloaded && is_gimple_omp_oacc (ctx->stmt)) + if (offloaded && (is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx)))) { /* If there are reductions on the offloaded region itself, treat them as a dummy GANG loop. */ @@ -15456,6 +15558,22 @@ lower_omp_teams (gimple_stmt_iterator *gsi_p, omp_context *ctx) lower_omp (gimple_omp_body_ptr (teams_stmt), ctx); lower_reduction_clauses (gimple_omp_teams_clauses (teams_stmt), &olist, NULL, ctx); + + if (flag_openmp_target == OMP_TARGET_MODE_OMPACC && ompacc_ctx_p (ctx)) + { + /* Forward the team/gang-wide variables to outer target region. 
*/ + struct omp_context *tgt = ctx; + while (tgt && !is_gimple_omp_offloaded (tgt->stmt)) + tgt = tgt->outer; + if (tgt) + { + int i; + tree decl; + FOR_EACH_VEC_ELT (ctx->oacc_privatization_candidates, i, decl) + tgt->oacc_privatization_candidates.safe_push (decl); + } + } + gimple_seq_add_stmt (&bind_body, teams_stmt); gimple_seq_add_seq (&bind_body, gimple_omp_body (teams_stmt)); @@ -15620,7 +15738,9 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx) ctx); break; case GIMPLE_BIND: - if (ctx && is_gimple_omp_oacc (ctx->stmt)) + if (ctx && (is_gimple_omp_oacc (ctx->stmt) + || (flag_openmp_target == OMP_TARGET_MODE_OMPACC + && ompacc_ctx_p (ctx)))) { tree vars = gimple_bind_vars (as_a <gbind *> (stmt)); oacc_privatization_scan_decl_chain (ctx, vars); diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc index b18f28f..9dae07c 100644 --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc @@ -388,6 +388,269 @@ omp_discover_implicit_declare_target (void) lang_hooks.decls.omp_finish_decl_inits (); } +static bool ompacc_supported_clauses_p (tree clauses) +{ + for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c)) + switch (OMP_CLAUSE_CODE (c)) + { + case OMP_CLAUSE_COLLAPSE: + case OMP_CLAUSE_NOWAIT: + continue; + default: + return false; + } + return true; +} + +struct target_region_data +{ + tree func_decl; + bool has_omp_for; + bool has_omp_parallel; + bool ompacc_invalid; + auto_vec<const char *> warning_msgs; + auto_vec<location_t> warning_locs; + target_region_data (void) + : func_decl (NULL_TREE), + has_omp_for (false), has_omp_parallel (false), ompacc_invalid (false), + warning_msgs (), warning_locs () {} +}; + +static tree scan_omp_target_region_r (tree *, int *, void *); + +static void +scan_fndecl_for_ompacc (tree decl, target_region_data *tgtdata) +{ + target_region_data td; + td.func_decl = decl; + walk_tree_without_duplicates (&DECL_SAVED_TREE (decl), + scan_omp_target_region_r, &td); + tree v; + if ((v = lookup_attribute ("omp declare variant base", + 
DECL_ATTRIBUTES (decl))) + || (v = lookup_attribute ("omp declare variant variant", + DECL_ATTRIBUTES (decl)))) + { + td.ompacc_invalid = true; + td.warning_msgs.safe_push ("declare variant not supported for OMPACC"); + td.warning_locs.safe_push (EXPR_LOCATION (v)); + } + if (tgtdata) + { + tgtdata->has_omp_for |= td.has_omp_for; + tgtdata->has_omp_parallel |= td.has_omp_parallel; + tgtdata->ompacc_invalid |= td.ompacc_invalid; + for (unsigned i = 0; i < td.warning_msgs.length (); i++) + tgtdata->warning_msgs.safe_push (td.warning_msgs[i]); + for (unsigned i = 0; i < td.warning_locs.length (); i++) + tgtdata->warning_locs.safe_push (td.warning_locs[i]); + } + + if (!td.ompacc_invalid + && !lookup_attribute ("ompacc", DECL_ATTRIBUTES (decl))) + { + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("ompacc"), NULL_TREE, + DECL_ATTRIBUTES (decl)); + if (!td.has_omp_parallel) + DECL_ATTRIBUTES (decl) + = tree_cons (get_identifier ("ompacc seq"), NULL_TREE, + DECL_ATTRIBUTES (decl)); + } +} + +static tree +scan_omp_target_region_r (tree *tp, int *walk_subtrees, void *data) +{ + target_region_data *tgtdata = (target_region_data *) data; + + if (TREE_CODE (*tp) == FUNCTION_DECL + && !(fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_THREAD_NUM) + || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_THREADS) + || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_TEAM_NUM) + || fndecl_built_in_p (*tp, BUILT_IN_OMP_GET_NUM_TEAMS) + || id_equal (DECL_NAME (*tp), "omp_get_thread_num") + || id_equal (DECL_NAME (*tp), "omp_get_num_threads") + || id_equal (DECL_NAME (*tp), "omp_get_team_num") + || id_equal (DECL_NAME (*tp), "omp_get_num_teams")) + && *tp != tgtdata->func_decl) + { + tree decl = *tp; + symtab_node *node = symtab_node::get (*tp); + if (node) + { + node = node->ultimate_alias_target (); + decl = node->decl; + } + + if (!DECL_EXTERNAL (decl) && DECL_SAVED_TREE (decl)) + { + scan_fndecl_for_ompacc (decl, tgtdata); + } + else + { + tgtdata->warning_msgs.safe_push ("referencing external 
function"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + tgtdata->ompacc_invalid = true; + } + *walk_subtrees = 0; + return NULL_TREE; + } + + switch (TREE_CODE (*tp)) + { + case OMP_FOR: + if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp))) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("clauses not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + else if (OMP_FOR_NON_RECTANGULAR (*tp)) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("non-rectangular loops not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + else + tgtdata->has_omp_for = true; + break; + + case OMP_PARALLEL: + if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp))) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("clauses not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + else + tgtdata->has_omp_parallel = true; + break; + + case OMP_DISTRIBUTE: + case OMP_TEAMS: + if (!ompacc_supported_clauses_p (OMP_CLAUSES (*tp))) + { + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("clauses not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + } + /* Fallthru. 
*/ + + case OMP_ATOMIC: + case OMP_ATOMIC_READ: + case OMP_ATOMIC_CAPTURE_OLD: + case OMP_ATOMIC_CAPTURE_NEW: + break; + + case OMP_SIMD: + case OMP_TASK: + case OMP_LOOP: + case OMP_TASKLOOP: + case OMP_TASKGROUP: + case OMP_SECTION: + case OMP_MASTER: + case OMP_MASKED: + case OMP_ORDERED: + case OMP_CRITICAL: + case OMP_SCAN: + case OMP_METADIRECTIVE: + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("construct not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + *walk_subtrees = 0; + break; + + case OMP_TARGET: + tgtdata->ompacc_invalid = true; + tgtdata->warning_msgs.safe_push ("nested target/reverse offload " + "not supported"); + tgtdata->warning_locs.safe_push (EXPR_LOCATION (*tp)); + *walk_subtrees = 0; + break; + + default: + break; + } + return NULL_TREE; +} + +static tree +scan_omp_target_construct_r (tree *tp, int *walk_subtrees, + void *data) +{ + if (TREE_CODE (*tp) == OMP_TARGET) + { + target_region_data td; + td.func_decl = (tree) data; + walk_tree_without_duplicates (&OMP_TARGET_BODY (*tp), + scan_omp_target_region_r, &td); + for (tree c = OMP_TARGET_CLAUSES (*tp); c; c = OMP_CLAUSE_CHAIN (c)) + { + switch (OMP_CLAUSE_CODE (c)) + { + case OMP_CLAUSE_MAP: + continue; + default: + td.ompacc_invalid = true; + td.warning_msgs.safe_push ("clause not supported"); + td.warning_locs.safe_push (EXPR_LOCATION (c)); + break; + } + break; + } + if (!td.ompacc_invalid) + { + tree c = build_omp_clause (EXPR_LOCATION (*tp), OMP_CLAUSE__OMPACC_); + if (!td.has_omp_parallel) + OMP_CLAUSE__OMPACC__SEQ (c) = 1; + OMP_CLAUSE_CHAIN (c) = OMP_TARGET_CLAUSES (*tp); + OMP_TARGET_CLAUSES (*tp) = c; + } + else + { + warning_at (EXPR_LOCATION (*tp), 0, "Target region not suitable for " + "OMPACC mode"); + for (unsigned i = 0; i < td.warning_locs.length (); i++) + warning_at (td.warning_locs[i], 0, td.warning_msgs[i]); + } + *walk_subtrees = 0; + } + return NULL_TREE; +} + +void +omp_ompacc_attribute_tagging (void) +{ + cgraph_node 
*node; + FOR_EACH_DEFINED_FUNCTION (node) + if (DECL_SAVED_TREE (node->decl)) + { + if (DECL_STRUCT_FUNCTION (node->decl) + && DECL_STRUCT_FUNCTION (node->decl)->has_omp_target) + walk_tree_without_duplicates (&DECL_SAVED_TREE (node->decl), + scan_omp_target_construct_r, + node->decl); + + for (cgraph_node *cgn = first_nested_function (node); + cgn; cgn = next_nested_function (cgn)) + if (omp_declare_target_fn_p (cgn->decl)) + { + scan_fndecl_for_ompacc (cgn->decl, NULL); + + if (lookup_attribute ("ompacc", DECL_ATTRIBUTES (cgn->decl)) + && !lookup_attribute ("noinline", DECL_ATTRIBUTES (cgn->decl))) + { + DECL_ATTRIBUTES (cgn->decl) + = tree_cons (get_identifier ("noinline"), + NULL, DECL_ATTRIBUTES (cgn->decl)); + DECL_ATTRIBUTES (cgn->decl) + = tree_cons (get_identifier ("noipa"), + NULL, DECL_ATTRIBUTES (cgn->decl)); + } + } + } +} /* Create new symbols containing (address, size) pairs for global variables, marked with "omp declare target" attribute, as well as addresses for the @@ -480,6 +743,22 @@ omp_finish_file (void) static tree oacc_dim_call (bool pos, int dim, gimple_seq *seq) { + if (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC) + { + enum built_in_function fn; + if (dim == GOMP_DIM_VECTOR) + fn = pos ? BUILT_IN_OMP_GET_THREAD_NUM : BUILT_IN_OMP_GET_NUM_THREADS; + else if (dim == GOMP_DIM_GANG) + fn = pos ? BUILT_IN_OMP_GET_TEAM_NUM : BUILT_IN_OMP_GET_NUM_TEAMS; + else + gcc_unreachable (); + tree size = create_tmp_var (integer_type_node); + gimple *call = gimple_build_call (builtin_decl_explicit (fn), 0); + gimple_call_set_lhs (call, size); + gimple_seq_add_stmt (seq, call); + return size; + } + tree arg = build_int_cst (unsigned_type_node, dim); tree size = create_tmp_var (integer_type_node); enum internal_fn fn = pos ? 
IFN_GOACC_DIM_POS : IFN_GOACC_DIM_SIZE; @@ -2776,15 +3055,19 @@ execute_oacc_loop_designation () static unsigned int execute_oacc_device_lower () { - tree attrs = oacc_get_fn_attrib (current_function_decl); + tree attrs; + int dims[GOMP_DIM_MAX]; - if (!attrs) - /* Not an offloaded function. */ - return 0; + if (flag_openacc) + { + attrs = oacc_get_fn_attrib (current_function_decl); + if (!attrs) + /* Not an offloaded function. */ + return 0; - int dims[GOMP_DIM_MAX]; - for (unsigned i = 0; i < GOMP_DIM_MAX; i++) - dims[i] = oacc_get_fn_dim_size (current_function_decl, i); + for (unsigned i = 0; i < GOMP_DIM_MAX; i++) + dims[i] = oacc_get_fn_dim_size (current_function_decl, i); + } hash_map<tree, tree> adjusted_vars; @@ -2853,7 +3136,8 @@ execute_oacc_device_lower () case IFN_UNIQUE_OACC_FORK: case IFN_UNIQUE_OACC_JOIN: - if (integer_minus_onep (gimple_call_arg (call, 2))) + if (flag_openacc + && integer_minus_onep (gimple_call_arg (call, 2))) remove = true; else if (!targetm.goacc.fork_join (call, dims, kind == IFN_UNIQUE_OACC_FORK)) @@ -3150,7 +3434,8 @@ public: /* TODO If this were gated on something like '!(fun->curr_properties & PROP_gimple_oaccdevlow)', then we could easily have several instances in the pass pipeline? 
*/ - virtual bool gate (function *) { return flag_openacc; }; + virtual bool gate (function *) + { return flag_openacc || (flag_openmp && flag_openmp_target == OMP_TARGET_MODE_OMPACC); }; virtual unsigned int execute (function *) { diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h index f6556af..581893f 100644 --- a/gcc/omp-offload.h +++ b/gcc/omp-offload.h @@ -31,6 +31,7 @@ extern GTY(()) vec<tree, va_gc> *offload_vars; extern void omp_finish_file (void); extern void omp_discover_implicit_declare_target (void); +extern void omp_ompacc_attribute_tagging (void); extern tree oacc_extract_loop_call (gcall *call); diff --git a/gcc/opts.cc b/gcc/opts.cc index 3fbfca9..019ec97 100644 --- a/gcc/opts.cc +++ b/gcc/opts.cc @@ -1393,6 +1393,14 @@ finish_options (struct gcc_options *opts, struct gcc_options *opts_set, } + if (opts_set->x_flag_openmp_target) + { + if (opts->x_flag_openacc) + error ("%<-fopenacc%> not compatible with %<-fopenmp-target=%>"); + if (!opts->x_flag_openmp) + error ("%<-fopenmp-target=%> requires %<-fopenmp%> setting"); + } + diagnose_options (opts, opts_set, loc); } diff --git a/gcc/target-insns.def b/gcc/target-insns.def index de8c009..e146140 100644 --- a/gcc/target-insns.def +++ b/gcc/target-insns.def @@ -68,6 +68,11 @@ DEF_TARGET_INSN (oacc_dim_pos, (rtx x0, rtx x1)) DEF_TARGET_INSN (oacc_dim_size, (rtx x0, rtx x1)) DEF_TARGET_INSN (oacc_fork, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (oacc_join, (rtx x0, rtx x1, rtx x2)) +DEF_TARGET_INSN (gomp_barrier, (void)) +DEF_TARGET_INSN (omp_get_thread_num, (rtx x0)) +DEF_TARGET_INSN (omp_get_num_threads, (rtx x0)) +DEF_TARGET_INSN (omp_get_team_num, (rtx x0)) +DEF_TARGET_INSN (omp_get_num_teams, (rtx x0)) DEF_TARGET_INSN (omp_simt_enter, (rtx x0, rtx x1, rtx x2)) DEF_TARGET_INSN (omp_simt_exit, (rtx x0)) DEF_TARGET_INSN (omp_simt_lane, (rtx x0)) diff --git a/gcc/tree-core.h b/gcc/tree-core.h index 7a5a87a..fcfa87a0 100644 --- a/gcc/tree-core.h +++ b/gcc/tree-core.h @@ -498,6 +498,10 @@ enum 
omp_clause_code { loop or not. */ OMP_CLAUSE__SIMT_, + /* Internally used only clause, flag whether this is an "ompacc" + target region or not. */ + OMP_CLAUSE__OMPACC_, + /* OpenACC clause: independent. */ OMP_CLAUSE_INDEPENDENT, diff --git a/gcc/tree-nested.cc b/gcc/tree-nested.cc index 777f85f..c6b23ef 100644 --- a/gcc/tree-nested.cc +++ b/gcc/tree-nested.cc @@ -1491,6 +1491,7 @@ convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) case OMP_CLAUSE_BIND: case OMP_CLAUSE__CONDTEMP_: case OMP_CLAUSE__SCANTEMP_: + case OMP_CLAUSE__OMPACC_: break; /* The following clause belongs to the OpenACC cache directive, which @@ -2287,6 +2288,7 @@ convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) case OMP_CLAUSE_BIND: case OMP_CLAUSE__CONDTEMP_: case OMP_CLAUSE__SCANTEMP_: + case OMP_CLAUSE__OMPACC_: break; /* The following clause belongs to the OpenACC cache directive, which diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc index 30c8f7b..df1e860 100644 --- a/gcc/tree-pretty-print.cc +++ b/gcc/tree-pretty-print.cc @@ -1421,6 +1421,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags) pp_string (pp, "_simt_"); break; + case OMP_CLAUSE__OMPACC_: + pp_string (pp, "_ompacc_"); + if (OMP_CLAUSE__OMPACC__SEQ (clause)) + pp_string (pp, "(seq)"); + break; + case OMP_CLAUSE_GANG: pp_string (pp, "gang"); if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE) diff --git a/gcc/tree-ssa-loop.cc b/gcc/tree-ssa-loop.cc index b7a5a0f..ee651ce 100644 --- a/gcc/tree-ssa-loop.cc +++ b/gcc/tree-ssa-loop.cc @@ -282,6 +282,11 @@ public: /* opt_pass methods: */ virtual bool gate (function *fn) { + if (flag_openmp + && flag_openmp_target == OMP_TARGET_MODE_OMPACC + && lookup_attribute ("ompacc", DECL_ATTRIBUTES (fn->decl))) + return true; + if (!flag_openacc) return false; diff --git a/gcc/tree.cc b/gcc/tree.cc index aed566f..0192fe3 100644 --- a/gcc/tree.cc +++ b/gcc/tree.cc @@ -340,6 +340,7 @@ unsigned const char 
omp_clause_num_ops[] = 1, /* OMP_CLAUSE_FILTER */ 1, /* OMP_CLAUSE__SIMDUID_ */ 0, /* OMP_CLAUSE__SIMT_ */ + 0, /* OMP_CLAUSE__OMPACC_ */ 0, /* OMP_CLAUSE_INDEPENDENT */ 1, /* OMP_CLAUSE_WORKER */ 1, /* OMP_CLAUSE_VECTOR */ @@ -437,6 +438,7 @@ const char * const omp_clause_code_name[] = "filter", "_simduid_", "_simt_", + "_ompacc_", "independent", "worker", "vector", @@ -1918,6 +1918,9 @@ class auto_suppress_location_wrappers #define OMP_CLAUSE__SIMDUID__DECL(NODE) \ OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__SIMDUID_), 0) +#define OMP_CLAUSE__OMPACC__SEQ(NODE) \ + (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE__OMPACC_)->base.public_flag) + #define OMP_CLAUSE_SCHEDULE_KIND(NODE) \ (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_SCHEDULE)->omp_clause.subcode.schedule_kind) diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c index b30b8df..146907a 100644 --- a/libgomp/config/nvptx/team.c +++ b/libgomp/config/nvptx/team.c @@ -34,6 +34,9 @@ struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon)); int __gomp_team_num __attribute__((shared,nocommon)); +/* Number of active target threads in team, used in ACC mode. */ +unsigned int __nvptx_omp_num_threads __attribute__((shared,nocommon)); + static void gomp_thread_start (struct gomp_thread_pool *); /* There should be some .shared space reserved for us. 
There's no way to
diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-17.c b/libgomp/testsuite/libgomp.c-c++-common/for-17.c
new file mode 100644
index 0000000..9771aaf
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/for-17.c
@@ -0,0 +1,69 @@
+/* { dg-options "-fopenmp-target=acc" } */
+/* { dg-additional-options "-std=gnu99" { target c } } */
+
+#define M(x, y, z) O(x, y, z)
+#define O(x, y, z) x ## _ ## y ## _ ## z
+
+#define DO_PRAGMA(x) _Pragma (#x)
+
+#undef OMPFROM
+#undef OMPTO
+#define OMPFROM(v) DO_PRAGMA (omp target update from(v))
+#define OMPTO(v) DO_PRAGMA (omp target update to(v))
+
+#pragma omp declare target
+
+#define OMPTGT DO_PRAGMA (omp target)
+#define F parallel for
+#define G pf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+#undef OMPTGT
+
+#pragma omp end declare target
+
+#define F target parallel for
+#define G tpf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+#define F target teams distribute
+#define G ttd
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+#define F target teams distribute parallel for
+#define G ttdpf
+#define S
+#define N(x) M(x, G, ompacc)
+#include "for-2.h"
+#undef S
+#undef N
+#undef F
+#undef G
+
+int
+main ()
+{
+  if (test_pf_ompacc ()
+      || test_tpf_ompacc ()
+      || test_ttd_ompacc ()
+      || test_ttdpf_ompacc ())
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/for-18.c b/libgomp/testsuite/libgomp.c-c++-common/for-18.c
new file mode 100644
index 0000000..2486d3a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/for-18.c
@@ -0,0 +1,5 @@
+/* { dg-options "-fopenmp-target=acc" } */
+/* { dg-additional-options "-std=gnu99" {target c } } */
+
+#define CONDNE
+#include "for-17.c"
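For readers who want a smaller reproducer than the macro-driven for-17.c test, the following is a minimal sketch of a target region of the kind the new scan in omp-offload.cc appears to accept: only map clauses on the target construct, and no clauses other than collapse/nowait on the teams, distribute, parallel and for parts. The function name vec_add is hypothetical and not part of the patch. Whether a given region is actually handled in OMPACC mode depends on the checks in the patch; when they fail, GCC warns "Target region not suitable for OMPACC mode" and, per the commit message, reverts to normal processing for that region.

```c
/* Illustrative sketch only: vec_add is a hypothetical helper, not part of
   the patch.  Without -fopenmp the pragma is ignored and the loop simply
   runs serially on the host, producing the same result.  */
#include <assert.h>
#include <stddef.h>

void
vec_add (const float *a, const float *b, float *c, int n)
{
  assert (a != NULL && b != NULL && c != NULL && n >= 0);

  /* Only map clauses appear on the combined construct, since the scan in
     the patch rejects any other clause kind on the target construct.  */
  #pragma omp target teams distribute parallel for \
    map(to: a[0:n], b[0:n]) map(from: c[0:n])
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Assuming a compiler built from this branch with nvptx offloading enabled, this would be compiled with both -fopenmp and -fopenmp-target=acc, because the opts.cc hunk rejects -fopenmp-target= without -fopenmp and in combination with -fopenacc.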