Age | Commit message | Author | Files | Lines
2023-05-12 | openacc: Add data optimization pass | Andrew Stubbs | 17 | -7/+2490
Address PR90591 "Avoid unnecessary data transfer out of OMP construct", for simple (but common) cases. This commit adds a pass that optimizes data mapping clauses. Currently, it can optimize copy/map(tofrom) clauses involving scalars to copyin/map(to) and further to "private". The pass is restricted to "kernels" regions but could be extended to other types of regions. gcc/ChangeLog: * Makefile.in: Add pass. * doc/gimple.texi: TODO. * gimple-walk.cc (walk_gimple_seq_mod): Adjust for backward walking. * gimple-walk.h (struct walk_stmt_info): Add field. * passes.def: Add new pass. * tree-pass.h (make_pass_omp_data_optimize): New declaration. * omp-data-optimize.cc: New file. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Expect optimization messages. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise. gcc/testsuite/ChangeLog: * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise. * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise. * c-c++-common/goacc/uninit-copy-clause.c: Likewise. * gfortran.dg/goacc/uninit-copy-clause.f95: Likewise. * c-c++-common/goacc/omp_data_optimize-1.c: New test. * g++.dg/goacc/omp_data_optimize-1.C: New test. * gfortran.dg/goacc/omp_data_optimize-1.f90: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
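As an illustration of the kind of clause this pass targets, here is a minimal sketch in C. The function name scale_and_shift and the variable tmp are hypothetical, and whether the pass actually rewrites this exact case is an assumption; the sketch only shows a scalar under "copy" whose incoming value is never read inside the region and whose final value is never used afterwards, which makes it a plausible candidate for "private". Without -fopenacc the pragma is ignored and the loop runs sequentially on the host.

```c
#include <assert.h>

/* Hypothetical example: "tmp" is mapped with copy(), but every iteration
   writes it before reading it and nothing reads it after the region.
   Copying it to and from the device is therefore unnecessary.  */
void scale_and_shift(int n, const float *restrict in, float *restrict out)
{
  float tmp = 0.0f;

  assert(n >= 0);

#pragma acc kernels copyin(in[0:n]) copyout(out[0:n]) copy(tmp)
  {
    for (int i = 0; i < n; i++)
      {
        tmp = in[i] * 2.0f;   /* written first ...  */
        out[i] = tmp + 1.0f;  /* ... then read within the same iteration */
      }
  }
  /* "tmp" is dead here, so no copy-out is needed either.  */
}
```

The new tests named in the ChangeLog (omp_data_optimize-1.c and its C++ and Fortran counterparts) are the authoritative reference for which cases the pass handles.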
2023-05-12 | Add function for printing a single OMP_CLAUSE | Frederik Harwath | 3 | -0/+17
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain") changed the dumping behavior for OMP_CLAUSEs. The old behavior is required for a follow-up commit ("openacc: Add data optimization pass") that optimizes single OMP_CLAUSEs. gcc/ChangeLog: * tree-pretty-print.cc (print_omp_clause_to_str): Add new function. * tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
2023-05-12 | openacc: Remove unused partitioning in "kernels" regions | Frederik Harwath | 2 | -5/+53
With the old "kernels" handling, unparallelized regions would get executed with 1x1x1 partitioning even if the user provided explicit num_gangs, num_workers clauses etc. This commit restores this behavior by removing unused partitioning after assigning the parallelism dimensions to loops. gcc/ChangeLog: * omp-offload.cc (oacc_remove_unused_partitioning): New function for removing partitioning that is not used by any loop. (oacc_validate_dims): Call oacc_remove_unused_partitioning and enable warnings about unused partitioning. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust expectations.
2023-05-12 | openacc: Add further kernels tests | Frederik Harwath | 17 | -24/+1013
Add some copies of tests to continue covering the old "parloops"-based "kernels" implementation - until it gets removed from GCC - and add further tests for the new Graphite-based implementation. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: New test. * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c: New test. * c-c++-common/goacc/kernels-decompose-1-parloops.c: New test. * c-c++-common/goacc/kernels-reduction-parloops.c: New test. * c-c++-common/goacc/loop-auto-reductions.c: New test. * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c: New test. * c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test. * c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c: New test. * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: New test. * gfortran.dg/goacc/kernels-conversion.f95: New test. * gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test. * gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-parloops.f95: New test. * gfortran.dg/goacc/kernels-reductions.f90: New test.
2023-05-12 | openacc: Add "can_be_parallel" flag info to "graph" dumps | Frederik Harwath | 2 | -11/+32
gcc/ChangeLog: * graph.cc (oacc_get_fn_attrib): New declaration. (find_loop_location): New declaration. (draw_cfg_nodes_for_loop): Print value of the can_be_parallel flag at the top of loops in OpenACC functions.
2023-05-12 | openacc: Use Graphite for dependence analysis in "kernels" regions | Frederik Harwath | 52 | -400/+3253
This commit changes the handling of OpenACC "kernels" to use Graphite for dependence analysis. To this end, it first introduces a new internal representation for "kernels" regions which should be analyzed by Graphite in pass_omp_oacc_kernels_decompose. This is now the default for all "kernels" regions, but the old handling is still available through the command line parameter "--param=openacc-kernels=decompose-parloops".
The handling of this new region type in the omp lowering and omp offloading passes follows the existing handling for "parallel" regions. This replaces the specialized handling for "kernels" regions that was previously used and which was limited in many ways.
Graphite is adjusted to be able to analyze the OpenACC functions that get outlined from the "kernels" regions. It is enabled to handle the internal function calls that contain information about OpenACC constructs. In some places where function calls would be rejected by Graphite, those calls need to be ignored. In other places, information about the loop step, bounds etc. needs to be extracted from the calls. The goal is to enable an analysis of the original loop parameters although the omp lowering and expansion steps have already modified the loop structure. Some parallelization-enabling constructs such as OpenACC "reduction" and "private"/"firstprivate" clauses must be recognized and the data-dependences must be adjusted to reflect the semantics of those constructs.
The data-dependence analysis step in Graphite has so far been tied to the code generation step. This commit introduces a separate data-dependence analysis step that avoids the code generation. This is necessary because adjusting the code generation to create a correct OpenACC loop structure would require very considerable effort and the goal of this commit is to implement the dependence analysis only.
The ability to use Graphite for dependence analysis without its code generation might be of independent interest, but it is so far used for OpenACC purposes only. In general, all changes to Graphite try to avoid affecting other uses of Graphite as much as possible. gcc/ChangeLog: * Makefile.in: Add graphite-oacc.o * cfgloop.cc (alloc_loop): Set can_be_parallel_valid_p to false. * cfgloop.h: Add can_be_parallel_valid_p field. * cfgloopmanip.cc (copy_loop_info): Add assert. * config/nvptx/nvptx.cc (nvptx_goacc_reduction_setup): * doc/invoke.texi: Adjust param openacc-kernels description. * doc/passes.texi: Adjust pass_ipa_oacc_kernels description. * flag-types.h (enum openacc_kernels):Add OPENACC_KERNELS_DECOMPOSE_PARLOOPS. * gimple-pretty-print.cc (dump_gimple_omp_target): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and widen GF_OMP_TARGET_KIND_MASK. (is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. (is_gimple_omp_offloaded): Likewise. * gimplify.cc (gimplify_omp_for): Enable reduction localization for "kernels" regions. (gimplify_omp_workshare): Likewise. * graphite-dependences.cc (scop_get_reads_and_writes): Handle "kills" and "reduction" PDRs. (apply_schedule_on_deps): Add dump output for intermediate steps of the dependence computation to enable understanding of unexpected dependences. (carries_deps): Likewise. (scop_get_dependences): Handle "kill" operations and add dump output. * graphite-isl-ast-to-gimple.cc (visit_schedule_loop_node): New function. (graphite_oacc_analyze_scop): New function. * graphite-optimize-isl.cc (optimize_isl): Remove "static" and add argument to identify OpenACC use; don't fail on unchanged schedule in this case. * graphite-poly.cc (new_poly_dr): Handle "kills". (print_pdr): Likewise. (new_gimple_poly_bb): Likewise. (free_gimple_poly_bb): Likewise. 
(new_scop): Handle "reduction", "private", and "firstprivate" hash sets. (free_scop): Likewise. (print_isl_space): New function. (debug_isl_space): New function. * graphite-scop-detection.cc (scop_detection::can_represent_loop): Don't fail if niter is 0 in OpenACC functions. (scop_detection::add_scop): Don't reject regions with only one loop in OpenACC functions. (ignored_oacc_internal_call_p): New function. (scan_tree_for_params): Handle VIEW_CONVERT_EXPR. (stmt_has_side_effects): Ignore internal OpenACC function calls. (add_write): Likewise. (add_read): Likewise. (add_kill): New function. (add_kills): New function. (add_oacc_kills): New function. (try_generate_gimple_bb): Kill false dependences for OpenACC "private"/"firstprivate" vars. (gather_bbs::gather_bbs): Determine OpenACC "private"/"firstprivate" vars in region. (gather_bbs::before_dom_children): Add assert. (determine_openacc_reductions): New function. (build_scops): Determine OpenACC "reduction" vars in SCoP. * graphite-sese-to-poly.cc (oacc_ifn_call_extract): New declaration. (oacc_internal_call_p): New function. (build_poly_dr): Ignore internal OpenACC function calls, handle "reduction" refs. (build_poly_sr): Likewise; handle "kill" operations. * graphite.cc (graphite_transform_loops): Accept functions with only a single loop. (oacc_enable_graphite_p): New function. (gate_graphite_transforms): Enable pass on OpenACC functions. * graphite.h (enum poly_dr_type): Add PDR_KILL. (struct poly_dr): Add "is_reduction" field. (new_poly_dr): Add argument to declaration. (pdr_kill_p): New function. (print_isl_space): New declaration. (debug_isl_space): New declaration. (struct scop): Add fields "reductions_vars", "oacc_firstprivate_vars", and "oacc_private_scalars". (optimize_isl): New declaration. (graphite_oacc_analyze_scop): New declaration. 
* internal-fn.cc (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE * internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE * omp-expand.cc (struct omp_region): Adjust comment. (expand_omp_taskloop_for_inner): (expand_omp_for): Add asserts about expected "kernels" region types. (mark_loops_in_oacc_kernels_region): Likewise. (expand_omp_target): Likewise; handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. (build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. Likewise. (omp_make_gimple_edges): Likewise. * omp-general.cc (oacc_get_kernels_attrib): New function. (oacc_get_fn_dim_size): Allow argument to be NULL. * omp-general.h (oacc_get_kernels_attrib): New declaration. * omp-low.cc (struct omp_context): Add fields "oacc_firstprivate_vars" and "oacc_private_scalars". (was_originally_oacc_kernels): New function. (is_oacc_kernels): (is_oacc_kernels_decomposed_graphite_part): New function. (new_omp_context): Allocate "oacc_first_private_vars" and "oacc_private_scalars" ... (delete_omp_context): ... and free from here. (oacc_record_firstprivate_var_clauses): New function. (oacc_record_private_scalars): New function. (scan_sharing_clauses): Call functions to record "private" scalars and "firstprivate" variables. (check_oacc_kernel_gwv): Add assert. (ctx_in_oacc_kernels_region): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. (scan_omp_for): Likewise. (check_omp_nesting_restrictions): Likewise. (lower_oacc_head_mark): Likewise. (lower_omp_for): Likewise. (lower_omp_target): Create "private" and "firstprivate" marker call statements. (lower_oacc_head_tail): Adjust "private" and "firstprivate" marker calls. (lower_oacc_reductions): Emit "private" and "firstprivate" marker call statements. (make_oacc_firstprivate_vars_marker): New function. (make_oacc_private_scalars_marker): New function. 
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn): Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to region using the new "kernels" handling. (make_region_seq): Adjust default region type for new "kernels" handling; no more exceptions, let Graphite handle everything. (make_region_loop_nest): Likewise; add dump output and assert. (adjust_nested_loop_clauses): Stop creating "auto" clauses if loop has "independent", "gang" etc. (transform_kernels_loop_clauses): Likewise. * omp-offload.cc (oacc_extract_loop_call): New function. (oacc_loop_get_cfg_loop): New function. (can_be_parallel_str): New function. (oacc_loop_can_be_parallel_p): New function. (oacc_parallel_kernels_graphite_fun_p): New function. (oacc_parallel_fun_p): New function. (oacc_loop_transform_auto_into_independent): New function, ... (oacc_loop_fixed_partitions): ... called from here to transfer the result of Graphite's analysis to the loop. (execute_oacc_loop_designation): Handle "oacc functions with "parallel_kernels_graphite" attribute. (execute_oacc_device_lower): Handle IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE. * omp-offload.h (oacc_extract_loop_call): Add declaration. * params.opt: Add "param=openacc-kernels" value "decompose-parloops". * sese.cc (scalar_evolution_in_region): "Redirect" SCEV analysis to outer loop for IFN_GOACC_LOOP calls. * sese.h: Add field "kill_scalar_refs". * tree-chrec.cc (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR like CASE_CONVERT. * tree-data-ref.cc (dump_data_reference): Include DR_BASE_ADDRESS and DR_OFFSET in dump output. (get_references_in_stmt): Don't reject OpenACC internal function calls. (graphite_find_data_references_in_stmt): Remove unused variable. * tree-parloops.cc (pass_parallelize_loops::execute): Disable pass with the new kernels handling, enable if requested explicitly. 
* tree-scalar-evolution.cc (set_scev_analyze_openacc_calls): Set flag to enable the analysis of internal OpenACC function calls (use for Graphite only). (oacc_call_analyzable_p): New function. (oacc_ifn_call_extract): New function. (oacc_simplify): New function. (add_to_evolution): Simplify OpenACC internal function calls if applicable. (follow_ssa_edge_binary): Likewise. (follow_ssa_edge_expr): Likewise. (follow_copies_to_constant): Likewise. (analyze_initial_condition): Likewise. (interpret_loop_phi): Likewise. (interpret_gimple_call): New function. (interpret_rhs_expr): Likewise. (instantiate_scev_name): Likewise. (analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions. (expression_expensive_p): Consider internal OpenACC calls to be cheap. * tree-scalar-evolution.h (set_scev_analyze_openacc_calls): New declaration. (oacc_call_analyzable_p): New declaration. * tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Mark lhs of internal OpenACC function calls necessary. * tree-ssa-ifcombine.c (recognize_if_then_else): * tree-ssa-loop-niter.cc (oacc_call_analyzable_p): (oacc_ifn_call_extract): New declaration. (interpret_gimple_call): New declaration. (expand_simple_operations): Handle internal OpenACC function calls. * tree-ssa-loop.cc (gate_oacc_kernels): Disable for new "kernels" handling. * graphite-oacc.cc: New file. * graphite-oacc.h: New file. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust. * testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust. gcc/testsuite/ChangeLog: * c-c++-common/goacc/classify-kernels.c: Adjust. * c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust. * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust. * c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust. 
* c-c++-common/goacc/classify-kernels-unparallelized.c: Removed. * c-c++-common/goacc/kernels-reduction.c: Removed. * gfortran.dg/goacc/loop-auto-transfer-2.f90: New test. * gfortran.dg/goacc/loop-auto-transfer-3.f90: New test. * gfortran.dg/goacc/loop-auto-transfer-4.f90: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-05-12 | graphite: Add runtime alias checking | Frederik Harwath | 9 | -20/+344
Graphite rejects a SCoP if it contains a pair of data references for which it cannot determine statically if they may alias. This happens very often, for instance in C code which does not use explicit "restrict". This commit adds the possibility to analyze a SCoP nevertheless and perform an alias check at runtime. Then, if aliasing is detected, the execution will fall back to the unoptimized SCoP. TODO This needs more testing on non-OpenACC code. gcc/ChangeLog: * common.opt: Add fgraphite-runtime-alias-checks. * graphite-isl-ast-to-gimple.cc (generate_alias_cond): New function. (graphite_regenerate_ast_isl): Use from here. * graphite-poly.cc (new_scop): Create unhandled_alias_ddrs vec ... (free_scop): and release here. * graphite-scop-detection.cc (dr_defs_outside_region): New function. (dr_well_analyzed_for_runtime_alias_check_p): New function. (graphite_runtime_alias_check_p): New function. (build_alias_set): Record unhandled alias ddrs for later alias check creation if flag_graphite_runtime_alias_checks is true instead of failing. * graphite.h (struct scop): Add field unhandled_alias_ddrs. * sese.h (has_operands_from_region_p): New function. gcc/testsuite/ChangeLog: * gcc.dg/graphite/alias-1.c: New test.
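The general idea of a runtime alias check with a fallback can be sketched by hand in C. This is not the code Graphite generates; the function names ranges_overlap and add_one are hypothetical and only model the "check at runtime, otherwise fall back to the unoptimized version" structure described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the n-element int ranges starting at
   a and b share at least one byte.  */
static int ranges_overlap(const int *a, const int *b, size_t n)
{
  uintptr_t a0 = (uintptr_t) a;
  uintptr_t b0 = (uintptr_t) b;
  uintptr_t len = (uintptr_t) (n * sizeof (int));
  return a0 < b0 + len && b0 < a0 + len;
}

/* Computes dst[i] = src[i] + 1.  Returns 1 if the version that a compiler
   would be free to transform was taken, 0 if the fallback ran.  */
int add_one(int *dst, const int *src, size_t n)
{
  if (!ranges_overlap(dst, src, n))
    {
      /* No aliasing: iterations are independent, so this copy of the
         loop could be reordered or parallelized.  */
      for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1;
      return 1;
    }

  /* Possible aliasing: keep the original iteration order.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return 0;
}
```

Both branches contain the same source loop here; the point is only that the guarded copy carries a no-alias guarantee that the fallback copy lacks.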
2023-05-12 | Move compute_alias_check_pairs to tree-data-ref.c | Frederik Harwath | 4 | -87/+103
Move this function from tree-loop-distribution.c to tree-data-ref.c and make it non-static to enable its use from other parts of GCC. gcc/ChangeLog: * tree-loop-distribution.cc (data_ref_segment_size): Remove function. (latch_dominated_by_data_ref): Likewise. (compute_alias_check_pairs): Likewise. * tree-data-ref.cc (data_ref_segment_size): New function, copied from tree-loop-distribution.c (compute_alias_check_pairs): Likewise. (latch_dominated_by_data_ref): Likewise. * tree-data-ref.h (compute_alias_check_pairs): New declaration.
2023-05-12 | Fix branch prediction dump message | Frederik Harwath | 2 | -1/+5
Instead of, for instance, "Loop got predicted 1 to iterate 10 times" the message should be "Loop 1 got predicted to iterate 10 times". gcc/ChangeLog: * predict.cc (pass_profile::execute): Fix dump message.
2023-05-12 | openacc: Move pass_oacc_device_lower after pass_graphite | Frederik Harwath | 43 | -81/+416
The OpenACC device lowering pass must run after the Graphite pass to allow for the use of Graphite for automatic parallelization of kernels regions in the future. Experimentation has shown that it is best, performancewise, to run pass_oacc_device_lower together with the related passes pass_oacc_loop_designation and pass_oacc_gimple_workers early after pass_graphite in pass_tree_loop, at least if the other tree loop passes are not adjusted. In particular, to enable vectorization which is crucial for GCN offloading, device lowering should happen before pass_vectorize. To bring the loops contained in the offloading functions into the shape expected by the loop vectorizer, we have to make sure that some passes that previously were executed only once before pass_tree_loop are also executed on the offloading functions. To ensure the execution of pass_oacc_device_lower if pass_tree_loop does not execute (no loops, no optimizations), we introduce two further copies of the pass to the pipeline that run if there are no loops or if no optimization is performed. gcc/ChangeLog: * omp-general.cc (oacc_get_fn_dim_size): Return 0 on missing "dims". * omp-offload.cc (pass_oacc_loop_designation::clone): New member function. (pass_oacc_gimple_workers::clone): Likewise. (pass_oacc_gimple_device_lower::clone): Likewise. * passes.cc (pass_data_no_loop_optimizations): New pass_data. (class pass_no_loop_optimizations): New pass. (make_pass_no_loop_optimizations): New function. * passes.def: Move pass_oacc_{loop_designation, gimple_workers, device_lower} into tree_loop, and add copies to pass_tree_no_loop and to new pass_no_loop_optimizations. Add copies of passes pass_ccp, pass_ipa_warn, pass_complete_unrolli, pass_backprop, pass_phiprop, pass_fix_loops after the OpenACC passes in pass_tree_loop. * tree-ssa-loop-ivcanon.cc (pass_complete_unroll::clone): New member function. (pass_complete_unrolli::clone): Likewise. * tree-ssa-loop.cc (pass_fix_loops::clone): Likewise. 
(pass_tree_loop_init::clone): Likewise. (pass_tree_loop_done::clone): Likewise. * tree-ssa-phiprop.cc (pass_phiprop::clone): Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/goacc/loop-processing-1.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-routine.c: Likewise. * c-c++-common/goacc/routine-nohost-1.c: Likewise. * c-c++-common/unroll-1.c: Likewise. * c-c++-common/unroll-4.c: Likewise. * gcc.dg/goacc/loop-processing-1.c: Likewise. * gcc.dg/tree-ssa/backprop-1.c: Likewise. * gcc.dg/tree-ssa/backprop-2.c: Likewise. * gcc.dg/tree-ssa/backprop-3.c: Likewise. * gcc.dg/tree-ssa/backprop-4.c: Likewise. * gcc.dg/tree-ssa/backprop-5.c: Likewise. * gcc.dg/tree-ssa/backprop-6.c: Likewise. * gcc.dg/tree-ssa/cunroll-1.c: Likewise. * gcc.dg/tree-ssa/cunroll-3.c: Likewise. * gcc.dg/tree-ssa/cunroll-9.c: Likewise. * gcc.dg/tree-ssa/ldist-17.c: Likewise. * gcc.dg/tree-ssa/loop-38.c: Likewise. * gcc.dg/tree-ssa/pr21463.c: Likewise. * gcc.dg/tree-ssa/pr45427.c: Likewise. * gcc.dg/tree-ssa/pr61743-1.c: Likewise. * gcc.dg/unroll-2.c: Likewise. * gcc.dg/unroll-3.c: Likewise. * gcc.dg/unroll-4.c: Likewise. 
* gcc.dg/unroll-5.c: Likewise. * gcc.dg/vect/vect-profile-1.c: Likewise. * c-c++-common/goacc/device-lowering-debug-optimization.c: New test. * c-c++-common/goacc/device-lowering-no-loops.c: New test. * c-c++-common/goacc/device-lowering-no-optimization.c: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2023-05-12 | Fortran: delinearize multi-dimensional array accesses | Sandra Loosemore | 16 | -95/+292
The Fortran front end presently linearizes accesses to multi-dimensional arrays by combining the indices for the various dimensions into a series of explicit multiplies and adds with refactoring to allow CSE of invariant parts of the computation. Unfortunately this representation interferes with Graphite-based loop optimizations. It is difficult to recover the original multi-dimensional form of the access by the time loop optimizations run because parts of it have already been optimized away or into a form that is not easily recognizable, so it seems better to have the Fortran front end produce delinearized accesses to begin with, a set of nested ARRAY_REFs similar to the existing behavior of the C and C++ front ends. This is a long-standing problem that has previously been discussed e.g. in PR 14741 and PR61000. This patch is an initial implementation for explicit array accesses only; it doesn't handle the accesses generated during scalarization of whole-array or array-section operations, which follow a different code path. gcc/ * expr.cc (get_inner_reference): Handle NOP_EXPR like VIEW_CONVERT_EXPR. gcc/fortran/ * lang.opt (-param=delinearize=): New. * trans-array.cc (get_class_array_vptr): New, split from... (build_array_ref): ...here. (get_array_lbound, get_array_ubound): New, split from... (gfc_conv_array_ref): ...here. Additional code refactoring plus support for delinearization of the array access. gcc/testsuite/ * gfortran.dg/assumed_type_2.f90: Adjust patterns. * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise. * gfortran.dg/graphite/block-3.f90: Remove xfails. * gfortran.dg/graphite/block-4.f90: Likewise. * gfortran.dg/inline_matmul_24.f90: Adjust patterns. * gfortran.dg/no_arg_check_2.f90: Likewise. * gfortran.dg/pr32921.f: Likewise. * gfortran.dg/reassoc_4.f: Disable delinearization for this test. Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
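The difference between the two access forms can be shown with a small C model of a column-major array A(1:N1, 1:N2). The names get_linearized and get_delinearized and the fixed extents are hypothetical; the sketch only contrasts an explicit multiply-and-add offset with a nested array reference that keeps one subscript per dimension.

```c
#include <assert.h>
#include <stddef.h>

enum { N1 = 3, N2 = 4 };  /* hypothetical extents of A(1:N1, 1:N2) */

/* Linearized form: the subscripts are folded into a single offset, so
   the per-dimension structure is no longer visible in the access.  */
double get_linearized(const double *base, int i, int j)
{
  size_t offset = (size_t) (i - 1) + (size_t) (j - 1) * N1;
  return base[offset];
}

/* Delinearized form: one subscript per dimension.  Fortran is
   column-major, so the first index varies fastest and becomes the
   innermost C subscript.  */
double get_delinearized(double (*a)[N1], int i, int j)
{
  return a[j - 1][i - 1];
}
```

Both functions read the same element; only the second keeps the shape information that dependence analysis benefits from.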
2023-05-12 | Fix gimple_debug_cfg declaration | Frederik Harwath | 2 | -1/+6
Silence a warning. The argument type did not match the definition. gcc/ChangeLog: * tree-cfg.h (gimple_debug_cfg): Change argument type from int to dump_flags_t.
2023-05-12 | openacc: fix ICE for non-decl expression in non-contiguous array base-pointer | Chung-Lin Tang | 4 | -9/+43
Currently, we do not support cases like struct-members as the base-pointer for an OpenACC non-contiguous array. Mark such cases as unsupported in the C/C++ front-ends, instead of ICEing on them. gcc/c/ChangeLog: * c-typeck.cc (handle_omp_array_sections_1): Robustify non-contiguous array check and reject non-DECL base-pointer cases as unsupported. gcc/cp/ChangeLog: * semantics.cc (handle_omp_array_sections_1): Robustify non-contiguous array check and reject non-DECL base-pointer cases as unsupported.
2023-05-12 | Add -Wopenacc-parallelism to tests only in OG11 | Kwok Cheung Yeung | 2 | -0/+8
2021-04-30 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Add -Wopenacc-parallelism option.
2023-05-12 | Update expected messages in data-clause-1 tests | Kwok Cheung Yeung | 3 | -2/+7
The patch 'Merge non-contiguous array support patches' handles one of the non-contiguous cases such that it is no longer an error. 2021-04-29 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/data-clause-1.c (foo): Remove expected message. * g++.dg/goacc/data-clause-1.C (foo): Remove expected message.
2023-05-12 | Update expected messages in kernels-decompose-2 tests | Kwok Cheung Yeung | 3 | -8/+15
This changes expected messages that differ between mainline and OG11. On OG10, these messages were added in the patch: 081a01963ca8 Update expected messages, errors and warnings for "kernels" tests 2021-04-29 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-2.c (main): Update expected messages. * gfortran.dg/goacc/kernels-decompose-2.f95 (main): Update expected messages.
2023-05-12 | Fix is_oacc_parallel_or_serial for kernel regions | Kwok Cheung Yeung | 2 | -1/+11
2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/ * omp-low.cc (is_oacc_parallel_or_serial): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED and GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE.
2023-05-12 | Update expected messages in OpenACC tests | Kwok Cheung Yeung | 12 | -165/+186
This updates the types of messages expected in the test, and the '-fopt-info' option used to request them. The phrasing of the expected messages has also changed somewhat and has been adjusted to match. 2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Update additional options and expected messages. * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Likewise. * c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c: Likewise. * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise. * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Likewise. * c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c: Likewise. * c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
2023-05-12 | DWARF: late code range fixup | Andrew Stubbs | 2 | -23/+64
Ensure that the parent DWARF subprograms of offload kernel functions have a code range, and are therefore not discarded by GDB. This is only necessary when the parent function does not actually exist in the final binary, which is commonly the case within the offload device's binary. This patch replaces 808bdf1bb29 and fdcb23540a2. It should be squashed with those before being posted upstream. gcc/ * dwarf2out.cc (notional_parents_list): New file variable. (gen_subprogram_die): Record offload kernel functions in notional_parents_list. (fixup_notional_parents): New function. (dwarf2out_finish): Call fixup_notional_parents. (dwarf2out_c_finalize): Reset notional_parents_list.
2023-05-12 | openmp: Scale type precision of collapsed iterator variable | Kwok Cheung Yeung | 3 | -7/+35
This sets the type precision of the collapsed iterator variable to the sum of the precision of the collapsed loop variables, up to a maximum of sizeof(long long) (i.e. 64-bits). 2021-03-01 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/ * omp-expand.cc (expand_oacc_for): Convert .tile variable to diff_type before multiplying. * omp-general.cc (omp_extract_for_data): Use accumulated precision of all collapsed for-loops as precision of iteration variable, up to the precision of a long long. libgomp/ * testsuite/libgomp.c-c++-common/collapse-4.c: New. * testsuite/libgomp.fortran/collapse5.f90: New.
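The rule stated above (sum of the loop variables' precisions, capped at the precision of long long) can be written out as a small C function. The name collapsed_precision is hypothetical and this is only a model of the arithmetic, not the code in omp_extract_for_data; it assumes long long is 64 bits wide, as on common targets.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical model: precision (in bits) of the iteration variable for
   "count" collapsed loops whose variables have the given precisions.  */
unsigned collapsed_precision(const unsigned *loop_precisions, size_t count)
{
  const unsigned max_precision = CHAR_BIT * (unsigned) sizeof (long long);
  unsigned total = 0;

  for (size_t i = 0; i < count; i++)
    {
      total += loop_precisions[i];
      if (total >= max_precision)
        return max_precision;  /* cap at the precision of long long */
    }
  return total;
}
```

For example, collapsing two loops over 32-bit variables gives a 64-bit iteration variable, which is enough to hold an iteration count such as 100000 * 100000 that would overflow a 32-bit type.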
2023-05-12 | DWARF: fix ICE caused by offload debug fix | Andrew Stubbs | 2 | -3/+20
This should be squashed with 808bdf1bb29 and fdcb23540a2 to go to mainline. gcc/ * dwarf2out.cc (gen_subprogram_die): Replace existing low/high PC attributes, rather than ICE.
2023-05-12 | OpenMP 5.0: Allow multiple clauses mapping same variable | Chung-Lin Tang | 6 | -30/+86
This is a merge of: https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562081.html This patch now allows multiple clauses on the same construct to map the same variable, which was not valid in OpenMP 4.5, but is now allowed in 5.0. This may be reverted or updated when a final patch is approved for mainline. 2021-02-01 Chung-Lin Tang <cltang@codesourcery.com> gcc/cp/ChangeLog: * semantics.cc (finish_omp_clauses): Adjust to allow duplicate mapped variables for OpenMP. gcc/ChangeLog: * omp-low.cc (install_var_field): Add new 'tree key_expr = NULL_TREE' default parameter. Set splay-tree lookup key to key_expr instead of var if key_expr is non-NULL. Adjust call to install_parm_decl. Update comments. (scan_sharing_clauses): Use clause tree expression as splay-tree key for map/to/from and OpenACC firstprivate cases when installing the variable field into the send/receive record type. (maybe_lookup_field_in_outer_ctx): Add code to search through construct clauses instead of entirely based on splay-tree lookup. (lower_oacc_reductions): Adjust to find map-clause of reduction variable, then create receiver-ref. (lower_omp_target): Adjust to lookup var field using clause expression. gcc/testsuite/ChangeLog: * c-c++-common/gomp/clauses-2.c: Adjust testcase.
2023-05-12 | Correct fix offload dwarf info | Andrew Stubbs | 2 | -3/+6
The previous patch wasn't quite right, apparently. Somehow the behaviour changed after another clean build? This tweak fixes it. This patch should be squashed with fdcb23540a2 to go to mainline. gcc/ChangeLog: * dwarf2out.cc (gen_subprogram_die): Check offload attributes only.
2023-05-12 | DWARF address space for variables | Andrew Stubbs | 2 | -0/+14
Add DWARF address class attributes for variables that exist outside the generic address space. In particular, this is the case for gang-private variables in OpenACC offload kernels. gcc/ChangeLog: * dwarf2out.cc (add_location_or_const_value_attribute): Set DW_AT_address_class, if appropriate.
2023-05-12Fix offload dwarf infoAndrew Stubbs2-0/+19
Add a notional code range to the notional parent function of offload kernel functions. This is enough to prevent GDB discarding them. gcc/ChangeLog: * dwarf2out.cc (gen_subprogram_die): Add high/low_pc attributes for parents of offload kernels.
2023-05-12[og10] vect: Add target hook to prefer gather/scatter instructionsJulian Brown6-2/+35
For AMD GCN, the instructions available for loading/storing vectors are always scatter/gather operations (i.e. there are separate addresses for each vector lane), so the current heuristic to avoid gather/scatter operations with too many elements in get_group_load_store_type is counterproductive. Avoiding such operations in that function can subsequently lead to a missed vectorization opportunity: later analyses in the vectorizer try to use a very wide array type which is not available on this target, and the vectorizer bails out. The attached patch adds a target hook to override the "single_element_p" heuristic in that function, and activates it for GCN. This allows much better code to be generated for affected loops. 2021-01-13 Julian Brown <julian@codesourcery.com> gcc/ * doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add documentation hook. * doc/tm.texi: Regenerate. * target.def (prefer_gather_scatter): Add target hook under vectorizer. * tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer gather/scatter instructions to scalar/elementwise fallback. * config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define hook.
2023-05-12[og10] openacc: Adjust loop lowering for AMD GCNJulian Brown2-35/+131
This patch adjusts OpenACC loop lowering in the AMD GCN target compiler in such a way that the autovectorizer can vectorize the "vector" dimension of those loops in more cases. Rather than generating "SIMT" code that executes a scalar instruction stream for each lane of a vector in lockstep, for GCN we model the GPU like a typical CPU, with separate instructions to operate on scalar and vector data. That means that unlike other offload targets, we rely on the autovectorizer to handle the innermost OpenACC parallelism level, which is "vector". Because of this, the OpenACC builtin functions to return the current vector lane and the vector width return 0 and 1 respectively, despite the native vector width being 64 elements wide. This allows generated code to work with our chosen compilation model, but the way loops are lowered in omp-offload.c:oacc_xform_loop does not understand the discrepancy between logical (OpenACC) and physical vector sizes correctly. That means that if a loop is partitioned over e.g. the worker AND vector dimensions, we actually lower with unit vector size -- meaning that if we then autovectorize, we end up trying to vectorize over the "worker" dimension rather than the vector one! Then, because the number of workers is not fixed at compile time, that means the autovectorizer has a hard time analysing the loop and thus vectorization often fails entirely. We can fix this by deducing the true vector width in oacc_xform_loop, and using that when we are on a "non-SIMT" offload target. We can then rearrange how loops are lowered in that function so that the loop form fed to the autovectorizer is more amenable to vectorization -- namely, the innermost step is set to process each loop iteration sequentially. For some benchmarks, allowing vectorization to succeed leads to quite impressive performance improvements -- I've observed between 2.5x and 40x on one machine/GPU combination. 
The low-level builtins available to user code (__builtin_goacc_parlevel_id and __builtin_goacc_parlevel_size) continue to return 0/1 respectively for the vector dimension for AMD GCN, even if their containing loop is vectorized -- that's a quirk that we might possibly want to address at some later date. Only non-"chunking" loops are handled at present. "Chunking" loops are still lowered as before. 2021-01-13 Julian Brown <julian@codesourcery.com> gcc/ * omp-offload.cc (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter. Add overloaded wrapper for previous arguments & behaviour. (oacc_xform_loop): Lower vector loops to iterate a multiple of omp_max_vf times over contiguous steps on non-SIMT targets. libgomp/ * testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust for loop lowering changes. * testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
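The iteration-space shape described above (each worker taking contiguous, unit-stride runs of vector-width iterations) can be sketched in plain C. This is only an illustration of the idea, not the code that oacc_xform_loop generates. The names VF, worker_iterations and run_all_workers are hypothetical, VF is fixed at 64 to match the native GCN vector width quoted in the message, and the workers are simply run one after another on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the physical vector width (64 lanes on AMD GCN
   according to the commit message). */
enum { VF = 64 };

/* Visit the iterations of [0, n) that belong to one worker.  Each worker
   takes runs of VF contiguous iterations, then skips over the runs owned
   by the other workers. */
static void worker_iterations(int *hits, size_t n, size_t worker,
                              size_t num_workers)
{
    for (size_t base = worker * VF; base < n; base += num_workers * VF) {
        size_t end = base + VF < n ? base + VF : n;
        /* Unit-stride inner loop over a compile-time-bounded run: the form
           an autovectorizer can analyse most easily. */
        for (size_t i = base; i < end; i++)
            hits[i] += 1;
    }
}

/* Run every worker in turn (sequentially here, for illustration). */
void run_all_workers(int *hits, size_t n, size_t num_workers)
{
    assert(num_workers > 0);
    for (size_t w = 0; w < num_workers; w++)
        worker_iterations(hits, n, w, num_workers);
}
```

The contrast is with a lowering whose innermost stride is the (run-time) number of workers, which is the situation the message identifies as hard for the autovectorizer.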
2023-05-12Relax some restrictions on the loop bound in kernels loop annotation.Sandra Loosemore5-9/+205
OpenACC loop semantics require that the loop bound be computable before entering the loop, rather than the C/C++ semantics where the end test is evaluated on every iteration. Formerly the kernels loop annotator permitted only constants and variables not modified in the loop body in the loop bound expression. This patch relaxes those restrictions somewhat to allow many forms of expressions involving such constants and variables, including calls to constant functions. 2020-08-30 Sandra Loosemore <sandra@codesourcery.com> gcc/c-family/ * c-omp.cc (end_test_ok_for_annotation_r): New. (end_test_ok_for_annotation): New. (check_and_annotate_for_loop): Use the new helper function. gcc/testsuite/ * c-c++-common/goacc/kernels-loop-annotation-21.c: New. * c-c++-common/goacc/kernels-loop-annotation-22.c: New.
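A loop of the kind this relaxation appears to target might look like the sketch below. The functions half and scale_first_half are hypothetical and are not taken from the new tests, and whether the annotator accepts exactly this bound expression is an assumption. The bound combines a variable that the loop body does not modify with a call to a function declared with the GCC const attribute, so it could be computed once before the loop is entered. Without -fopenacc the pragma is ignored and the loop runs on the host.

```c
#include <assert.h>

/* The result depends only on the argument and there are no side effects,
   so the call can be evaluated once before the loop is entered. */
__attribute__((const)) static int half(int x)
{
    return x / 2;
}

/* Double the first half of A.  N is not modified in the loop body. */
void scale_first_half(float *a, int n)
{
    assert(n >= 0);
#pragma acc kernels copy(a[0:n])
    {
        /* Loop bound: a const call applied to an unmodified variable. */
        for (int i = 0; i < half(n); i++)
            a[i] *= 2.0f;
    }
}
```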
2023-05-12Clean up loop variable extraction in OpenACC kernels loop annotation.Sandra Loosemore2-98/+104
The code for identifying annotatable loops in OpenACC kernels regions previously looked for the loop variable as the left-hand side of the comparison in the loop end test. However, front end optimizations sometimes switch the sense of the comparison, making this method unreliable. In particular, it's ambiguous when both operands to the end test comparison are local variables. This patch reorders the loop processing to identify the loop variable from the initializer, rather than the end test. The processing of the end test then just checks that one of the operands to the comparison matches the variable appearing in the initializer. Much of the patch is code refactoring, moving the initializer analysis out of annotate_for_loop to check_and_annotate_for_loop so it can be performed earlier. 2020-08-30 Sandra Loosemore <sandra@codesourcery.com> gcc/c-family/ * c-omp.cc (annotate_for_loop): Move initializer processing... (check_and_annotate_for_loop): ... to here. Allow the loop variable as either operand to the condition.
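The ambiguity mentioned above can be illustrated with a loop whose end test has two local variables as operands and the loop variable on the right-hand side. The function add_one is a hypothetical example, not one of the testsuite files. Reading the comparison alone does not say which of limit and i is the loop variable, whereas the initializer "i = 0" does.

```c
#include <assert.h>

/* Copy IN to OUT, adding one to each element. */
void add_one(const int *in, int *out, int n)
{
    assert(n >= 0);
    int i;
    int limit = n;
#pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    {
        /* The initializer names i as the loop variable; the end test then
           only needs one of its two operands to match i. */
        for (i = 0; limit > i; i++)
            out[i] = in[i] + 1;
    }
}
```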
2023-05-12Fix patterns in Fortran tests for kernels loop annotation.Sandra Loosemore15-14/+32
Several of the Fortran tests for kernels loop annotation were failing due to changes in the formatting of "acc loop" constructs in the dump file. Now the "auto" clause appears first, instead of after "private". 2020-08-23 Sandra Loosemore <sandra@codesourcery.com> gcc/testsuite/ * gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update expected output. * gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
2023-05-12Permit calls to builtins and intrinsics in kernels loops.Sandra Loosemore7-7/+85
This tweak to the OpenACC kernels loop annotation relaxes the restrictions on function calls in the loop body. Normally calls to functions not explicitly marked with a parallelism attribute are not permitted, but C/C++ builtins and Fortran intrinsics have known semantics so we can generally permit those without restriction. If any turn out to be problematical, we can add on here to recognize them, or in the processing of the "auto" annotations. 2020-08-22 Sandra Loosemore <sandra@codesourcery.com> gcc/c-family/ * c-omp.cc (annotate_loops_in_kernels_regions): Test for calls to builtins. gcc/fortran/ * openmp.cc (check_expr_for_invalid_calls): Check for intrinsic functions. gcc/testsuite/ * c-c++-common/goacc/kernels-loop-annotation-20.c: New. * gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
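As a small illustration of the case being relaxed, the loop body below calls a C builtin rather than a function marked with a parallelism attribute. The function abs_all is hypothetical, and __builtin_abs is chosen here only because it is a standard GCC builtin with well-known semantics; the commit does not say which builtins its tests exercise.

```c
#include <assert.h>

/* Replace each element of V with its absolute value. */
void abs_all(int *v, int n)
{
    assert(n >= 0);
#pragma acc kernels copy(v[0:n])
    {
        for (int i = 0; i < n; i++)
            v[i] = __builtin_abs(v[i]);  /* builtin with known semantics */
    }
}
```

A call to an ordinary external function in the same position would, per the message above, normally still need an explicit parallelism attribute before the loop could be annotated.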
2023-05-12Update dg-* in gfortran.dg/gomp/pr67500.f90Tobias Burnus2-4/+9
Contrary to GCC 11, OG10 uses an error instead of a warning, cf. commit 271c7fef548a86676d304b1eb2be5c0d47280bd6. gcc/testsuite/ * gfortran.dg/gomp/pr67500.f90: Change dg-warning to dg-error.
2023-05-12Annotate inner loops in "acc kernels loop" directives (Fortran).Sandra Loosemore8-9/+151
Normally explicit loop directives in a kernels region inhibit automatic annotation of other loops in the same nest, on the theory that users have indicated they want manual control over that section of code. However there seems to be an expectation in user code that the combined "kernels loop" directive should still allow annotation of inner loops. This patch implements this behavior in Fortran. 2020-08-19 Sandra Loosemore <sandra@codesourcery.com> gcc/fortran/ * openmp.cc (annotate_do_loops_in_kernels): Handle EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner loops in a combined "acc kernels loop" directive. gcc/testsuite/ * gfortran.dg/goacc/kernels-loop-annotation-18.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-19.f95: New. * gfortran.dg/goacc/combined-directives.f90: Adjust expected patterns. * gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise. * gfortran.dg/goacc/private-predetermined-kernels-1.f95: Likewise.
2023-05-12Annotate inner loops in "acc kernels loop" directives (C/C++).Sandra Loosemore6-13/+78
Normally explicit loop directives in a kernels region inhibit automatic annotation of other loops in the same nest, on the theory that users have indicated they want manual control over that section of code. However there seems to be an expectation in user code that the combined "kernels loop" directive should still allow annotation of inner loops. This patch implements this behavior for C and C++. 2020-08-19 Sandra Loosemore <sandra@codesourcery.com> gcc/c-family/ * c-omp.cc (annotate_loops_in_kernels_regions): Process inner loops in combined "acc kernels loop" directives. gcc/testsuite/ * c-c++-common/goacc/kernels-loop-annotation-18.c: New. * c-c++-common/goacc/kernels-loop-annotation-19.c: New. * c-c++-common/goacc/combined-directives.c: Adjust expected patterns.
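The user-code pattern in question looks roughly like the following sketch, in which the combined directive is applied to the outer loop and the inner loop carries no directive of its own. The function add_matrices and its data clauses are assumptions made for illustration and are not copied from kernels-loop-annotation-18.c or -19.c.

```c
#include <assert.h>

/* C = A + B for n-by-m matrices stored in row-major order. */
void add_matrices(int n, int m, const float *a, const float *b, float *c)
{
    assert(n >= 0 && m >= 0);
    /* Explicit combined directive on the outer loop only. */
#pragma acc kernels loop copyin(a[0:n*m], b[0:n*m]) copyout(c[0:n*m])
    for (int i = 0; i < n; i++) {
        /* No directive here: with this patch the inner loop remains a
           candidate for automatic annotation. */
        for (int j = 0; j < m; j++)
            c[i * m + j] = a[i * m + j] + b[i * m + j];
    }
}
```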
2023-05-12Add a "combined" flag for "acc kernels loop" etc directives.Sandra Loosemore8-13/+57
2020-08-19 Sandra Loosemore <sandra@codesourcery.com> gcc/ * tree.h (OACC_LOOP_COMBINED): New. gcc/c/ * c-parser.cc (c_parser_oacc_loop): Set OACC_LOOP_COMBINED. gcc/cp/ * parser.cc (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_do): Add combined parameter, use it to set OACC_LOOP_COMBINED. Update all call sites.
2023-05-12Fix gfortran.dg/goacc/pr70828.f90 testcaseKwok Cheung Yeung2-2/+7
Array mapping was changed by the patch '[OpenMP, Fortran] Add structure/derived-type element mapping'. 2020-08-19 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * gfortran.dg/goacc/pr70828.f90: Update expected output in Gimple dump.
2023-05-12XFAIL tests in gfortran.dg/goacc/loop-2-kernels.f95Kwok Cheung Yeung2-8/+12
The C-equivalent version of the test (c-c++-common/goacc/loop-2-kernels.c) has these tests XFAILed in the commit 'Make new OpenACC kernels conversion the default; adjust and add tests' (commit 757f56ddc43fd80bb8740222ec352111b26d66e9), so the Fortran version should be XFAILed too. 2020-07-24 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * gfortran.dg/goacc/loop-2-kernels.f95: Add XFAILs.
2023-05-12Fix failures in c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c testcaseKwok Cheung Yeung2-8/+13
This should have been part of 'Update expected messages, errors and warnings for "kernels" tests' (commit 081a01963ca8db7ddaaf5871d281321454fd3246). 2020-07-24 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c (main): Remove obsolete expected messages.
2023-05-12Fix goacc/noncontig_array-1.c testcaseKwok Cheung Yeung2-0/+5
2020-06-02 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/noncontig_array-1.c: Dump Gimple pass.
2023-05-12OpenACC: fix privatization of by-reference arraysTobias Burnus2-1/+7
Replacing of a by-reference variable in a private clause by a local variable makes sense; however, for arrays, the size is not directly known by the type. This causes an ICE via create_tmp_var which indirectly invokes force_constant_size in this case - but the latter only handled Ada. gcc/ChangeLog: * gimplify.cc (localize_reductions): Do not create local variable for privatized arrays.
2023-05-12Fix bug in processing of array dimensions in data clauses.Sandra Loosemore2-0/+15
The g++ front end wraps the array length and low_bound values in NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail. The test case c-c++-common/goacc/kernels-loop-annotation-1.c was tickling this bug and giving bogus errors in g++ because it was falling through to dynamic array code instead of recognizing the constant bounds. This patch was posted upstream here https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html but not yet committed. It may be that some other fix for this problem is implemented on mainline instead; check before merging this patch. 2020-03-31 Sandra Loosemore <sandra@codesourcery.com> gcc/cp/ * semantics.cc (handle_omp_array_sections_1): Call STRIP_NOPS on length and low_bound; (handle_omp_array_sections): Likewise.
2023-05-12Additional Fortran testsuite fixes for kernels loops annotation pass.Sandra Loosemore4-9/+18
These testsuite fixes are specific to the og10 branch, so are being segregated from the ones that apply to mainline in a separate commit from the main Fortran kernels loop annotation patch. 2020-03-27 Sandra Loosemore <sandra@codesourcery.com> gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust line numbering. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/kernels-decompose-2.f95: Add -fno-openacc-kernels-annotate-loops.
2023-05-12Kernels loops annotation: Fortran.Sandra Loosemore34-0/+1025
This patch implements the Fortran support for adding "#pragma acc loop auto" annotations to loops in OpenACC kernels regions. It implements the same -fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options that were previously added (and documented) for the C/C++ front ends. 2020-03-27 Sandra Loosemore <sandra@codesourcery.com> Gergö Barany <gergo@codesourcery.com> gcc/fortran/ * gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare. * lang.opt (Wopenacc-kernels-annotate-loops): New. (fopenacc-kernels-annotate-loops): New. * openmp.cc: Include options.h. (enum annotation_state, enum annotation_result): New. (check_code_for_invalid_calls): New. (check_expr_for_invalid_calls): New. (check_for_invalid_calls): New. (annotate_do_loop): New. (annotate_do_loops_in_kernels): New. (compute_goto_targets): New. (gfc_oacc_annotate_loops_in_kernels_regions): New. * parse.cc (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops. gcc/testsuite/ * gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add -fno-openacc-kernels-annotate-loops option. * gfortran.dg/goacc/classify-kernels.f95: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/kernels-loop-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise. * gfortran.dg/goacc/kernels-loop-data.f95: Likewise. * gfortran.dg/goacc/kernels-loop-n.f95: Likewise. * gfortran.dg/goacc/kernels-loop.f95: Likewise. * gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise. * gfortran.dg/goacc/kernels-loop-annotation-1.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-2.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-3.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-4.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-5.f95: New. 
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-7.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-8.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-9.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-10.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-11.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-12.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-13.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-14.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-15.f95: New. * gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
2023-05-12Add -fno-openacc-kernels-annotate-loops option to more testcases.Sandra Loosemore4-0/+10
The testcases being tweaked here are present on the og10 branch but not on FSF mainline. 2020-03-27 Sandra Loosemore <sandra@codesourcery.com> gcc/testsuite/ * c-c++-common/goacc/kernels-decompose-2.c: Add -fno-openacc-kernels-annotate-loops. * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
2023-05-12Kernels loops annotation: C and C++.Sandra Loosemore49-61/+1402
This patch detects loops in kernels regions that are candidates for parallelization, and adds "#pragma acc loop auto" annotations to them. This annotation is controlled by the -fopenacc-kernels-annotate-loops option, which is enabled by default. -Wopenacc-kernels-annotate-loops can be used to produce diagnostics about loops that cannot be annotated. 2020-03-27 Sandra Loosemore <sandra@codesourcery.com> Kernels loops annotation: C and C++. gcc/c-family/ * c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare. * c-omp.cc: Include tree-iterator.h (enum annotation_state): New. (struct annotation_info): New. (do_not_annotate_loop): New. (do_not_annotate_loop_nest): New. (annotation_error): New. (c_finish_omp_for_internal): Split from c_finish_omp_for. Use annotation_error function. Code refactoring to avoid destructive changes that cannot be undone in case of error. (is_local_var): New. (lang_specific_unwrap_initializer): New. (annotate_for_loop): New. (check_and_annotate_for_loop): New. (annotate_loops_in_kernels_regions): New. (c_oacc_annotate_loops_in_kernels_regions): New. * c.opt (Wopenacc-kernels-annotate-loops): New. (fopenacc-kernels-annotate-loops): New. gcc/c/ * c-decl.cc (c_unwrap_for_init): New. (finish_function): Call c_oacc_annotate_loops_in_kernels_regions. gcc/cp/ * decl.cc (cp_unwrap_for_init): New. (finish_function): Call c_oacc_annotate_loops_in_kernels_regions. gcc/ * doc/invoke.texi (Option Summary): Add entries for -Wopenacc-kernels-annotate-loops and -fno-openacc-kernels-annotate-loops. (Warning Options): Document -Wopenacc-kernels-annotate-loops. (Optimization Options): Document -fno-openacc-kernels-annotate-loops. gcc/testsuite/ * c-c++-common/goacc/classify-kernels-unparallelized.c: Add -fno-openacc-kernels-annotate-loops option. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise. * c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise. 
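The effect of the annotation can be pictured by writing the same loop twice: once as a user might write it, and once with the "#pragma acc loop auto" spelled out by hand. The functions saxpy and saxpy_annotated are hypothetical; the second only approximates in source form what the pass is described as doing internally, and whether this particular loop is judged a candidate is an assumption.

```c
#include <assert.h>

/* As written by the user: a plain loop inside a kernels region. */
void saxpy(int n, float alpha, const float *x, float *y)
{
    assert(n >= 0);
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }
}

/* Hand-written equivalent of the loop after annotation. */
void saxpy_annotated(int n, float alpha, const float *x, float *y)
{
    assert(n >= 0);
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
    {
#pragma acc loop auto
        for (int i = 0; i < n; i++)
            y[i] = alpha * x[i] + y[i];
    }
}
```

Per the message above, building with -fno-openacc-kernels-annotate-loops leaves loops such as the one in saxpy unannotated, and -Wopenacc-kernels-annotate-loops reports loops that could not be annotated.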
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise. * c-c++-common/goacc/kernels-double-reduction.c: Likewise. * c-c++-common/goacc/kernels-loop-2.c: Likewise. * c-c++-common/goacc/kernels-loop-3.c: Likewise. * c-c++-common/goacc/kernels-loop-data-2.c: Likewise. * c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise. * c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise. * c-c++-common/goacc/kernels-loop-data-update.c: Likewise. * c-c++-common/goacc/kernels-loop-data.c: Likewise. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise. * c-c++-common/goacc/kernels-loop-n.c: Likewise. * c-c++-common/goacc/kernels-loop-nest.c: Likewise. * c-c++-common/goacc/kernels-loop.c: Likewise. * c-c++-common/goacc/kernels-one-counter-var.c: Likewise. * c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise. * c-c++-common/goacc/kernels-reduction.c: Likewise. * c-c++-common/goacc/kernels-loop-annotation-1.c: New. * c-c++-common/goacc/kernels-loop-annotation-2.c: New. * c-c++-common/goacc/kernels-loop-annotation-3.c: New. * c-c++-common/goacc/kernels-loop-annotation-4.c: New. * c-c++-common/goacc/kernels-loop-annotation-5.c: New. * c-c++-common/goacc/kernels-loop-annotation-6.c: New. * c-c++-common/goacc/kernels-loop-annotation-7.c: New. * c-c++-common/goacc/kernels-loop-annotation-8.c: New. * c-c++-common/goacc/kernels-loop-annotation-9.c: New. * c-c++-common/goacc/kernels-loop-annotation-10.c: New. * c-c++-common/goacc/kernels-loop-annotation-11.c: New. * c-c++-common/goacc/kernels-loop-annotation-12.c: New. * c-c++-common/goacc/kernels-loop-annotation-13.c: New. * c-c++-common/goacc/kernels-loop-annotation-14.c: New. * c-c++-common/goacc/kernels-loop-annotation-15.c: New. * c-c++-common/goacc/kernels-loop-annotation-16.c: New. * c-c++-common/goacc/kernels-loop-annotation-17.c: New.
2023-05-12Fix vector handling for firstprivate of <= pointer sizeTobias Burnus2-1/+7
Test case is the existing libgomp.oacc-c++/firstprivate-mappings-1.C. * omp-low.c (convert_from_firstprivate_int): Use VIEW_CONVERT also for vectors.
2023-05-12Fortran "declare create"/allocate support for OpenACCJulian Brown13-27/+222
2018-10-04 Cesar Philippidis <cesar@codesourcery.com> Julian Brown <julian@codesourcery.com> gcc/ * omp-low.cc (scan_sharing_clauses): Update handling of OpenACC declare create, declare copyin and declare deviceptr to have local lifetimes. (convert_to_firstprivate_int): Handle pointer types. (convert_from_firstprivate_int): Likewise. Create local storage for the values being pointed to. Add new orig_type argument. (lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}. Add orig_type argument to convert_from_firstprivate_int call. Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT. Don't privatize firstprivate VLAs. * tree-pretty-print.cc (dump_omp_clause): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}. gcc/fortran/ * gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE, OMP_MAP_DECLARE_DEALLOCATE. (gfc_omp_clauses): Add update_allocatable. * trans-array.cc (gfc_array_allocate): Call gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create attribute set. * trans-decl.cc (find_module_oacc_declare_clauses): Relax oacc_declare_create to OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to match OpenACC 2.5 semantics. * trans-openmp.cc (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER (for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for allocatable scalar decls. Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} clauses. (gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER for allocatable scalar data clauses inside acc update directives. (gfc_trans_oacc_declare_allocate): New function. * trans-stmt.cc (gfc_trans_allocate): Call gfc_trans_oacc_declare_allocate for decls with oacc_declare_create attribute set. (gfc_trans_deallocate): Likewise. * trans.h (gfc_trans_oacc_declare_allocate): Declare. gcc/testsuite/ * gfortran.dg/goacc/declare-allocatable-1.f90: New test. include/ * gomp-constants.h (enum gomp_map_kind): Define GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4. 
libgomp/ * oacc-mem.c (gomp_acc_declare_allocate): New function. * oacc-parallel.c (GOACC_enter_exit_data): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}. * testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test. * testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test. * testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test. * testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test. 2020-02-19 Julian Brown <julian@codesourcery.com> gcc/fortran/ * trans-openmp.cc (gfc_omp_check_optional_argument): Handle non-decl case. gcc/ * gimplify.cc (gimplify_scan_omp_clauses): Handle GOMP_MAP_DECLARE_ALLOCATE and GOMP_MAP_DECLARE_DEALLOCATE. libgomp/ * libgomp.h (gomp_acc_declare_allocate): Remove prototype. * oacc-mem.c (gomp_acc_declare_allocate): Make static. Add POINTER argument. Use acc_delete instead of acc_free. Handle scalar mappings. (find_group_last): Handle GOMP_MAP_DECLARE_ALLOCATE and GOMP_MAP_DECLARE_DEALLOCATE groupings. (goacc_enter_data_internal): Fix kind check for GOMP_MAP_DECLARE_ALLOCATE. Pass new pointer argument to gomp_acc_declare_allocate. (goacc_exit_data_internal): Unlock device mutex around gomp_acc_declare_allocate call. Pass new pointer argument. Handle group pointer mapping for deallocate. 2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com> libgomp/ * oacc-mem.c (goacc_enter_data_internal): Unlock mutex before calling gomp_acc_declare_allocate and relock it afterwards.
2023-05-12Handle references in OpenACC "private" clausesJulian Brown2-0/+20
gcc/ * gimplify.cc (localize_reductions): Rewrite references for OMP_CLAUSE_PRIVATE also.
2023-05-12Silence compiler warningsTobias Burnus1-0/+4
gcc/ 2019-09-17 Tobias Burnus <tobias@codesourcery.com> * gimplify.cc (gomp_oacc_needs_data_present): Remove unused variable.
2023-05-12Update expected messages, errors and warnings for "kernels" testsJulian Brown6-48/+59
gcc/testsuite/ * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Update expected message/warning/error output. * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Likewise. * c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise. * c-c++-common/goacc/routine-1.c: Likewise.