author | Frederik Harwath <frederik@codesourcery.com> | 2021-11-16 16:16:22 +0100 |
---|---|---|
committer | Kwok Cheung Yeung <kcy@codesourcery.com> | 2023-05-12 19:13:48 +0100 |
commit | baf1aeb505d10f01cd2aa1a81ff08b0022fe18bc (patch) | |
tree | ace7391bd1ca0c9a45d6572edc8e222f457d9723 /gcc/graphite.cc | |
parent | 1a9d5c46c01d4481767832973a55220dd422a0b9 (diff) | |
openacc: Use Graphite for dependence analysis in "kernels" regions
This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, it first introduces a new
internal representation for "kernels" regions which should be analyzed
by Graphite in pass_omp_oacc_kernels_decompose. This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc-kernels=decompose-parloops". The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions. This replaces the
specialized handling for "kernels" regions that was previously used
and which was limited in many ways.
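For orientation, here is a minimal sketch of the kind of source code whose handling this commit changes. The function `saxpy` and its parameters are hypothetical and do not come from the commit. It assumes a compiler built with OpenACC support and invoked with `-fopenacc`. Without that option the pragma is ignored and the loop simply runs sequentially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not part of the commit.  In a "kernels" region
   the programmer does not assert that the loop iterations are
   independent; the compiler has to establish that itself before it can
   parallelize the loop.  */
void
saxpy (int n, float a, const float *x, float *y)
{
  assert (x != NULL && y != NULL && n >= 0);

#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
  {
    /* Iteration i reads x[i] and y[i] and writes only y[i], so no
       iteration depends on another one.  */
    for (int i = 0; i < n; i++)
      y[i] = a * x[i] + y[i];
  }
}
```

Whether such a loop is actually parallelized depends on the outcome of the dependence analysis; the `decompose-parloops` parameter value mentioned above selects the previous handling instead.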
Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure. Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs. The data-dependence analysis step in
Graphite has so far been tied to the code generation step. This
commit introduces a separate data-dependence analysis step that avoids
the code generation. This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.
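The role of the "private" and "reduction" clauses in the dependence analysis can be illustrated with a small sketch. All function names below are hypothetical and the code is not taken from the commit or its test suite. It only shows, under the assumption of a compiler invoked with `-fopenacc`, which loops carry a real dependence and which carry one that the clauses allow the analysis to disregard. Without `-fopenacc` the pragmas are ignored and the loops run sequentially with the same results.

```c
#include <assert.h>

/* Hypothetical examples, not part of the commit.  */

/* "tmp" is written by every iteration.  Taken literally, that is a
   dependence between iterations, but since "tmp" is private to each
   iteration it is a false one.  */
void
square_plus_one (int n, const float *in, float *out)
{
  assert (n >= 0);
  float tmp;
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
  {
#pragma acc loop private(tmp)
    for (int i = 0; i < n; i++)
      {
        tmp = in[i] * in[i];
        out[i] = tmp + 1.0f;
      }
  }
}

/* Every iteration reads and writes "s".  The "reduction" clause states
   that the updates may be combined in any order.  */
float
sum_all (int n, const float *in)
{
  assert (n >= 0);
  float s = 0.0f;
#pragma acc kernels copyin(in[0:n]) copy(s)
  {
#pragma acc loop reduction(+:s)
    for (int i = 0; i < n; i++)
      s += in[i];
  }
  return s;
}

/* Iteration i reads a[i - 1], which iteration i - 1 wrote.  This is a
   true loop-carried dependence, so the loop must stay sequential.  */
void
prefix_sum (int n, float *a)
{
  assert (n >= 0);
#pragma acc kernels copy(a[0:n])
  {
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

The first two loops are candidates for parallelization once the clauses are taken into account, while the third is not.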
gcc/ChangeLog:
* Makefile.in: Add graphite-oacc.o.
* cfgloop.cc (alloc_loop): Set can_be_parallel_valid_p to false.
* cfgloop.h: Add can_be_parallel_valid_p field.
* cfgloopmanip.cc (copy_loop_info): Add assert.
* config/nvptx/nvptx.cc (nvptx_goacc_reduction_setup):
* doc/invoke.texi: Adjust param openacc-kernels description.
* doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
* flag-types.h (enum openacc_kernels): Add
OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
* gimple-pretty-print.cc (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
* gimple.h (enum gf_mask): Add
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
widen GF_OMP_TARGET_KIND_MASK.
(is_gimple_omp_oacc): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(is_gimple_omp_offloaded): Likewise.
* gimplify.cc (gimplify_omp_for): Enable reduction localization
for "kernels" regions.
(gimplify_omp_workshare): Likewise.
* graphite-dependences.cc (scop_get_reads_and_writes): Handle
"kills" and "reduction" PDRs.
(apply_schedule_on_deps): Add dump output for intermediate
steps of the dependence computation to enable understanding
of unexpected dependences.
(carries_deps): Likewise.
(scop_get_dependences): Handle "kill" operations and add dump
output.
* graphite-isl-ast-to-gimple.cc (visit_schedule_loop_node): New function.
(graphite_oacc_analyze_scop): New function.
* graphite-optimize-isl.cc (optimize_isl): Remove "static" and
add argument to identify OpenACC use; don't fail on unchanged
schedule in this case.
* graphite-poly.cc (new_poly_dr): Handle "kills".
(print_pdr): Likewise.
(new_gimple_poly_bb): Likewise.
(free_gimple_poly_bb): Likewise.
(new_scop): Handle "reduction", "private", and "firstprivate"
hash sets.
(free_scop): Likewise.
(print_isl_space): New function.
(debug_isl_space): New function.
* graphite-scop-detection.cc (scop_detection::can_represent_loop):
Don't fail if niter is 0 in OpenACC functions.
(scop_detection::add_scop): Don't reject regions with only one
loop in OpenACC functions.
(ignored_oacc_internal_call_p): New function.
(scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
(stmt_has_side_effects): Ignore internal OpenACC function calls.
(add_write): Likewise.
(add_read): Likewise.
(add_kill): New function.
(add_kills): New function.
(add_oacc_kills): New function.
(try_generate_gimple_bb): Kill false dependences for OpenACC
"private"/"firstprivate" vars.
(gather_bbs::gather_bbs): Determine OpenACC
"private"/"firstprivate" vars in region.
(gather_bbs::before_dom_children): Add assert.
(determine_openacc_reductions): New function.
(build_scops): Determine OpenACC "reduction" vars in SCoP.
* graphite-sese-to-poly.cc (oacc_ifn_call_extract): New declaration.
(oacc_internal_call_p): New function.
(build_poly_dr): Ignore internal OpenACC function calls,
handle "reduction" refs.
(build_poly_sr): Likewise; handle "kill" operations.
* graphite.cc (graphite_transform_loops): Accept functions with
only a single loop.
(oacc_enable_graphite_p): New function.
(gate_graphite_transforms): Enable pass on OpenACC functions.
* graphite.h (enum poly_dr_type): Add PDR_KILL.
(struct poly_dr): Add "is_reduction" field.
(new_poly_dr): Add argument to declaration.
(pdr_kill_p): New function.
(print_isl_space): New declaration.
(debug_isl_space): New declaration.
(struct scop): Add fields "reductions_vars",
"oacc_firstprivate_vars", and "oacc_private_scalars".
(optimize_isl): New declaration.
(graphite_oacc_analyze_scop): New declaration.
* internal-fn.cc (expand_UNIQUE): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
* internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE.
* omp-expand.cc (struct omp_region): Adjust comment.
(expand_omp_taskloop_for_inner):
(expand_omp_for): Add asserts about expected "kernels" region types.
(mark_loops_in_oacc_kernels_region): Likewise.
(expand_omp_target): Likewise; handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(build_omp_regions_1): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
Likewise.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (oacc_get_kernels_attrib): New function.
(oacc_get_fn_dim_size): Allow argument to be NULL.
* omp-general.h (oacc_get_kernels_attrib): New declaration.
* omp-low.cc (struct omp_context): Add fields
"oacc_firstprivate_vars" and "oacc_private_scalars".
(was_originally_oacc_kernels): New function.
(is_oacc_kernels):
(is_oacc_kernels_decomposed_graphite_part): New function.
(new_omp_context): Allocate "oacc_firstprivate_vars" and
"oacc_private_scalars" ...
(delete_omp_context): ... and free from here.
(oacc_record_firstprivate_var_clauses): New function.
(oacc_record_private_scalars): New function.
(scan_sharing_clauses): Call functions to record "private"
scalars and "firstprivate" variables.
(check_oacc_kernel_gwv): Add assert.
(ctx_in_oacc_kernels_region): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(scan_omp_for): Likewise.
(check_omp_nesting_restrictions): Likewise.
(lower_oacc_head_mark): Likewise.
(lower_omp_for): Likewise.
(lower_omp_target): Create "private" and "firstprivate" marker
call statements.
(lower_oacc_head_tail): Adjust "private" and "firstprivate"
marker calls.
(lower_oacc_reductions): Emit "private" and "firstprivate"
marker call statements.
(make_oacc_firstprivate_vars_marker): New function.
(make_oacc_private_scalars_marker): New function.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
region using the new "kernels" handling.
(make_region_seq): Adjust default region type for new
"kernels" handling; no more exceptions, let Graphite handle everything.
(make_region_loop_nest): Likewise; add dump output and assert.
(adjust_nested_loop_clauses): Stop creating "auto" clauses if
loop has "independent", "gang" etc.
(transform_kernels_loop_clauses): Likewise.
* omp-offload.cc (oacc_extract_loop_call): New function.
(oacc_loop_get_cfg_loop): New function.
(can_be_parallel_str): New function.
(oacc_loop_can_be_parallel_p): New function.
(oacc_parallel_kernels_graphite_fun_p): New function.
(oacc_parallel_fun_p): New function.
(oacc_loop_transform_auto_into_independent): New function, ...
(oacc_loop_fixed_partitions): ... called from here to transfer
the result of Graphite's analysis to the loop.
(execute_oacc_loop_designation): Handle OpenACC
functions with "parallel_kernels_graphite" attribute.
(execute_oacc_device_lower): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
* omp-offload.h (oacc_extract_loop_call): Add declaration.
* params.opt: Add "param=openacc-kernels" value "decompose-parloops".
* sese.cc (scalar_evolution_in_region): "Redirect" SCEV
analysis to outer loop for IFN_GOACC_LOOP calls.
* sese.h: Add field "kill_scalar_refs".
* tree-chrec.cc (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
like CASE_CONVERT.
* tree-data-ref.cc (dump_data_reference): Include
DR_BASE_ADDRESS and DR_OFFSET in dump output.
(get_references_in_stmt): Don't reject OpenACC internal function
calls.
(graphite_find_data_references_in_stmt): Remove unused variable.
* tree-parloops.cc (pass_parallelize_loops::execute): Disable
pass with the new kernels handling, enable if requested explicitly.
* tree-scalar-evolution.cc (set_scev_analyze_openacc_calls):
Set flag to enable the analysis of internal OpenACC function
calls (use for Graphite only).
(oacc_call_analyzable_p): New function.
(oacc_ifn_call_extract): New function.
(oacc_simplify): New function.
(add_to_evolution): Simplify OpenACC internal function calls
if applicable.
(follow_ssa_edge_binary): Likewise.
(follow_ssa_edge_expr): Likewise.
(follow_copies_to_constant): Likewise.
(analyze_initial_condition): Likewise.
(interpret_loop_phi): Likewise.
(interpret_gimple_call): New function.
(interpret_rhs_expr): Likewise.
(instantiate_scev_name): Likewise.
(analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
(expression_expensive_p): Consider internal OpenACC calls to
be cheap.
* tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
New declaration.
(oacc_call_analyzable_p): New declaration.
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Mark
lhs of internal OpenACC function calls necessary.
* tree-ssa-ifcombine.c (recognize_if_then_else):
* tree-ssa-loop-niter.cc (oacc_call_analyzable_p):
(oacc_ifn_call_extract): New declaration.
(interpret_gimple_call): New declaration.
(expand_simple_operations): Handle internal OpenACC function calls.
* tree-ssa-loop.cc (gate_oacc_kernels): Disable for new
"kernels" handling.
* graphite-oacc.cc: New file.
* graphite-oacc.h: New file.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/classify-kernels.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Removed.
* c-c++-common/goacc/kernels-reduction.c: Removed.
* gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
Diffstat (limited to 'gcc/graphite.cc')
-rw-r--r-- | gcc/graphite.cc | 127 |
1 files changed, 99 insertions, 28 deletions
diff --git a/gcc/graphite.cc b/gcc/graphite.cc
index 19f8975..0096ee9 100644
--- a/gcc/graphite.cc
+++ b/gcc/graphite.cc
@@ -43,6 +43,8 @@ along with GCC; see the file COPYING3. If not see
 #include "cfghooks.h"
 #include "tree.h"
 #include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify-me.h"
 #include "ssa.h"
 #include "fold-const.h"
 #include "gimple-iterator.h"
@@ -59,6 +61,14 @@ along with GCC; see the file COPYING3. If not see
 #include "tree-into-ssa.h"
 #include "tree-ssa-propagate.h"
 #include "graphite.h"
+#include "graphite-oacc.h"
+#include "cgraph.h"
+#include "gimple-pretty-print.h"
+#include "print-tree.h"
+#include "tree-pretty-print.h"
+#include "internal-fn.h"
+
+static bool have_isl = true;
 
 /* Print global statistics to FILE. */
 
@@ -419,9 +429,12 @@ graphite_transform_loops (void)
   vec<scop_p> scops = vNULL;
   isl_ctx *ctx;
 
-  /* If a function is parallel it was most probably already run through graphite
-     once. No need to run again. */
-  if (parallelized_function_p (cfun->decl))
+  /* If a function is parallel it was most probably already run through
+     graphite once. No need to run again. This is not true for OpenACC
+     functions. The function was created for offloading, bu we still might have
+     to figure out which loops may be parallelized. */
+
+  if (parallelized_function_p (cfun->decl) && !oacc_function_p (cfun))
     return;
 
   calculate_dominance_info (CDI_DOMINATORS);
@@ -447,6 +460,7 @@ graphite_transform_loops (void)
   seir_cache = new hash_map<sese_scev_hash, tree>;
 
   calculate_dominance_info (CDI_POST_DOMINATORS);
+  set_scev_analyze_openacc_calls (oacc_function_p (cfun));
   build_scops (&scops);
   free_dominance_info (CDI_POST_DOMINATORS);
 
@@ -460,26 +474,50 @@ graphite_transform_loops (void)
       print_global_statistics (dump_file);
     }
 
-  FOR_EACH_VEC_ELT (scops, i, scop)
-    if (dbg_cnt (graphite_scop))
-      {
-        scop->isl_context = ctx;
-        if (!build_poly_scop (scop))
-          continue;
-
-        if (!apply_poly_transforms (scop))
-          continue;
-
-        changed = true;
-        if (graphite_regenerate_ast_isl (scop)
-            && dump_enabled_p ())
-          {
-            dump_user_location_t loc = find_loop_location
-              (scops[i]->scop_info->region.entry->dest->loop_father);
-            dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
-                             "loop nest optimized\n");
-          }
-      }
+  if (oacc_function_p (cfun))
+    {
+      /* OpenACC uses Graphite for dependence analysis only.
+         Code generation would need not to understand the
+         OpenACC internal function calls before it could be
+         enabled. */
+
+      FOR_EACH_VEC_ELT (scops, i, scop)
+        if (dbg_cnt (graphite_scop))
+          {
+            scop->isl_context = ctx;
+            if (!build_poly_scop (scop))
+              continue;
+
+            if (!optimize_isl (scop, true))
+              continue;
+
+            graphite_oacc_analyze_scop (scop);
+            changed = true;
+          }
+      set_scev_analyze_openacc_calls (false);
+    }
+  else // Non-OpenACC-functions
+    {
+      FOR_EACH_VEC_ELT (scops, i, scop)
+        if (dbg_cnt (graphite_scop))
+          {
+            scop->isl_context = ctx;
+            if (!build_poly_scop (scop))
+              continue;
+
+            if (!apply_poly_transforms (scop))
+              continue;
+
+            changed = true;
+            if (graphite_regenerate_ast_isl (scop) && dump_enabled_p ())
+              {
+                dump_user_location_t loc = find_loop_location (
+                  scops[i]->scop_info->region.entry->dest->loop_father);
+                dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+                                 "loop nest optimized\n");
+              }
+          }
+    }
 
   delete seir_cache;
   seir_cache = NULL;
@@ -521,6 +559,8 @@ graphite_transform_loops (void)
 
 #else /* If isl is not available: #ifndef HAVE_isl. */
 
+static bool have_isl = false;
+
 static void
 graphite_transform_loops (void)
 {
@@ -533,7 +573,10 @@ graphite_transform_loops (void)
 static unsigned int
 graphite_transforms (struct function *fun)
 {
-  if (number_of_loops (fun) <= 1)
+
+  unsigned num_loops = number_of_loops (fun);
+  if (num_loops == 0
+      || (num_loops == 1 && !oacc_function_p (cfun)))
     return 0;
 
   graphite_transform_loops ();
@@ -541,14 +584,35 @@ graphite_transforms (struct function *fun)
   return 0;
 }
 
+/* Return TRUE if fun is an OpenACC outlined function that should be analyzed
+   by Graphite. */
+
+static inline bool oacc_enable_graphite_p (function *fun)
+{
+  if (!flag_openacc || !oacc_get_fn_attrib (fun->decl))
+    return false;
+
+  if (!graphite_analyze_oacc_target_region_type_p (fun))
+    return false;
+
+  bool optimizing = global_options.x_optimize <= 0;
+  /* Enabling Graphite if isl is not available aborts compilation. Prefer to
+     skip it and emit a warning, unless optimizations are enabled. */
+  if (!have_isl && !optimizing)
+    warning (OPT_Wall, "Unable to analyze OpenACC regions with Graphite; isl "
+             "is not available.");
+  return true;
+}
+
 static bool
-gate_graphite_transforms (void)
+gate_graphite_transforms (function *fun)
 {
   /* Enable -fgraphite pass if any one of the graphite optimization flags
      is turned on. */
   if (flag_graphite_identity
       || flag_loop_parallelize_all
-      || flag_loop_nest_optimize)
+      || flag_loop_nest_optimize
+      || oacc_enable_graphite_p (fun))
     flag_graphite = 1;
 
   return flag_graphite != 0;
@@ -577,7 +641,10 @@ public:
   {}
 
   /* opt_pass methods: */
-  bool gate (function *) final override { return gate_graphite_transforms (); }
+  bool gate (function *fun) final override
+  {
+    return gate_graphite_transforms (fun);
+  }
 
 }; // class pass_graphite
 
@@ -612,7 +679,11 @@ public:
   {}
 
   /* opt_pass methods: */
-  bool gate (function *) final override { return gate_graphite_transforms (); }
+  bool gate (function *fun) final override
+  {
+    return gate_graphite_transforms (fun);
+  }
+
   unsigned int execute (function *fun) final override
     {
       return graphite_transforms (fun);