Age | Commit message | Author | Files | Lines
2021-11-17 | openacc: Add runtime alias checking for OpenACC kernels | Andrew Stubbs | 6 files | -209/+550
This commit adds the code generation for the runtime alias checks for OpenACC loops that have been analyzed by Graphite. The runtime alias check condition gets generated in Graphite. It is evaluated by the code generated for the IFN_GOACC_LOOP internal function calls. If aliasing is detected at runtime, the execution dimensions get adjusted to execute the affected loops sequentially. gcc/ChangeLog: * graphite-isl-ast-to-gimple.c: Include internal-fn.h. (graphite_oacc_analyze_scop): Implement runtime alias checks. * omp-expand.c (expand_oacc_for): Add an additional "noalias" parameter to GOACC_LOOP internal calls, and initialise it to integer_one_node. * omp-offload.c (oacc_xform_loop): Integrate the runtime alias check into the GOACC_LOOP expansion. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test. * testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
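The effect of the "noalias" argument described above can be pictured with a small C model. This is only a sketch under stated assumptions: `struct dims` and `effective_dims` are hypothetical names chosen for illustration, and the real logic lives in the expansion of the IFN_GOACC_LOOP internal function calls, not in user code.

```c
/* Hypothetical model of the launch geometry of an OpenACC loop. */
struct dims {
    unsigned gang;
    unsigned worker;
    unsigned vector;
};

/* "noalias" starts out as 1 (no aliasing assumed) and is replaced by
   the result of the runtime check when one is generated.  If the check
   reports aliasing (noalias == 0), the loop runs sequentially. */
struct dims effective_dims(struct dims requested, int noalias)
{
    struct dims sequential = { 1, 1, 1 };
    return noalias ? requested : sequential;
}
```

With a requested geometry of 32x4x128, the model keeps that geometry when `noalias` is 1 and collapses it to 1x1x1 when the runtime check detects aliasing.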
2021-11-17 | openacc: Add data optimization pass | Andrew Stubbs | 17 files | -7/+2444
Address PR90591 "Avoid unnecessary data transfer out of OMP construct", for simple (but common) cases. This commit adds a pass that optimizes data mapping clauses. Currently, it can optimize copy/map(tofrom) clauses involving scalars to copyin/map(to) and further to "private". The pass is restricted to "kernels" regions but could be extended to other types of regions. gcc/ChangeLog: * Makefile.in: Add pass. * doc/gimple.texi: TODO. * gimple-walk.c (walk_gimple_seq_mod): Adjust for backward walking. * gimple-walk.h (struct walk_stmt_info): Add field. * passes.def: Add new pass. * tree-pass.h (make_pass_omp_data_optimize): New declaration. * omp-data-optimize.cc: New file. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Expect optimization messages. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise. gcc/testsuite/ChangeLog: * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise. * c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Likewise. * c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise. * c-c++-common/goacc/uninit-copy-clause.c: Likewise. * gfortran.dg/goacc/uninit-copy-clause.f95: Likewise. * c-c++-common/goacc/omp_data_optimize-1.c: New test. * g++.dg/goacc/omp_data_optimize-1.C: New test. * gfortran.dg/goacc/omp_data_optimize-1.f90: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
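A hand-written illustration of the kind of clause weakening described above, as a sketch only. The function names are hypothetical, the pass itself works on the compiler's internal clauses rather than on source text, and the second function merely shows what an equivalent hand-optimized source might look like. Without OpenACC enabled the pragmas are ignored and both functions run as plain C.

```c
/* Before: every scalar is mapped with "copy", so each one is
   transferred to the device and back again. */
float scaled_sum_before(const float *x, int n, float factor)
{
    float tmp = 0.0f;    /* only a per-iteration temporary */
    float total = 0.0f;
#pragma acc kernels copyin(x[0:n]) copy(tmp, factor, total)
    {
#pragma acc loop
        for (int i = 0; i < n; i++) {
            tmp = factor * x[i];
            total += tmp;
        }
    }
    return total;
}

/* After: "factor" is never written in the region, so copyin suffices;
   "tmp" is always written before it is read and is not used after the
   region, so it can be private. */
float scaled_sum_after(const float *x, int n, float factor)
{
    float tmp = 0.0f;
    float total = 0.0f;
#pragma acc kernels copyin(x[0:n], factor) copy(total)
    {
#pragma acc loop private(tmp)
        for (int i = 0; i < n; i++) {
            tmp = factor * x[i];
            total += tmp;
        }
    }
    return total;
}
```

Both versions compute the same result; only the amount of data moved between host and device differs.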
2021-11-17 | Add function for printing a single OMP_CLAUSE | Frederik Harwath | 2 files | -0/+12
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump the whole OMP clause chain") changed the dumping behavior for OMP_CLAUSEs. The old behavior is required for a follow-up commit ("openacc: Add data optimization pass") that optimizes single OMP_CLAUSEs. gcc/ChangeLog: * tree-pretty-print.c (print_omp_clause_to_str): Add new function. * tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
2021-11-17 | openacc: Remove unused partitioning in "kernels" regions | Frederik Harwath | 2 files | -11/+59
With the old "kernels" handling, unparallelized regions would get executed with 1x1x1 partitioning even if the user provided explicit num_gangs, num_workers clauses etc. This commit restores this behavior by removing unused partitioning after assigning the parallelism dimensions to loops. gcc/ChangeLog: * omp-offload.c (oacc_remove_unused_partitioning): New function for removing partitioning that is not used by any loop. (oacc_validate_dims): Call oacc_remove_unused_partitioning and enable warnings about unused partitioning. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust expectations.
2021-11-17 | openacc: Add further kernels tests | Frederik Harwath | 17 files | -0/+1155
Add some copies of tests to continue covering the old "parloops"-based "kernels" implementation - until it gets removed from GCC - and add further tests for the new Graphite-based implementation. libgomp/ChangeLog: * testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: New test. * c-c++-common/goacc/classify-kernels-unparallelized-parloops.c: New test. * c-c++-common/goacc/kernels-decompose-1-parloops.c: New test. * c-c++-common/goacc/kernels-reduction-parloops.c: New test. * c-c++-common/goacc/loop-auto-reductions.c: New test. * c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c: New test. * c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test. * c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c: New test. * gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95: New test. * gfortran.dg/goacc/kernels-conversion.f95: New test. * gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test. * gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test. * gfortran.dg/goacc/kernels-loop-parloops.f95: New test. * gfortran.dg/goacc/kernels-reductions.f90: New test.
2021-11-17 | openacc: Add "can_be_parallel" flag info to "graph" dumps | Frederik Harwath | 1 file | -11/+24
gcc/ChangeLog: * graph.c (oacc_get_fn_attrib): New declaration. (find_loop_location): New declaration. (draw_cfg_nodes_for_loop): Print value of the can_be_parallel flag at the top of loops in OpenACC functions.
2021-11-17 | openacc: Use Graphite for dependence analysis in "kernels" regions | Frederik Harwath | 55 files | -420/+3089
This commit changes the handling of OpenACC "kernels" to use Graphite for dependence analysis. To this end, it first introduces a new internal representation for "kernels" regions which should be analyzed by Graphite in pass_omp_oacc_kernels_decompose. This is now the default for all "kernels" regions, but the old handling is still available through the command line parameter "--param=openacc_kernels=decompose-parloops". The handling of this new region type in the omp lowering and omp offloading passes follows the existing handling for "parallel" regions. This replaces the specialized handling for "kernels" regions that was previously used and which was limited in many ways. Graphite is adjusted to be able to analyze the OpenACC functions that get outlined from the "kernels" regions. It is enabled to handle the internal function calls that contain information about OpenACC constructs. In some places where function calls would be rejected by Graphite, those calls need to be ignored. In other places, information about the loop step, bounds etc. needs to be extracted from the calls. The goal is to enable an analysis of the original loop parameters although the omp lowering and expansion steps have already modified the loop structure. Some parallelization-enabling constructs such as OpenACC "reduction" and "private"/"firstprivate" clauses must be recognized and the data-dependences must be adjusted to reflect the semantics of those constructs. The data-dependence analysis step in Graphite has so far been tied to the code generation step. This commit introduces a separate data-dependence analysis step that avoids the code generation. This is necessary because adjusting the code generation to create a correct OpenACC loop structure would require very considerable effort and the goal of this commit is to implement the dependence analysis only. 
The ability to use Graphite for dependence analysis without its code generation might be of independent interest, but it is so far used for OpenACC purposes only. In general, all changes to Graphite try to avoid affecting other uses of Graphite as much as possible. gcc/ChangeLog: * Makefile.in: Add graphite-oacc.o * cfgloop.c (alloc_loop): Set can_be_parallel_valid_p to false. * cfgloop.h: Add can_be_parallel_valid_p field. * cfgloopmanip.c (copy_loop_info): Add assert. * config/nvptx/nvptx.c (nvptx_goacc_reduction_setup): * doc/invoke.texi: Adjust param openacc-kernels description. * doc/passes.texi: Adjust pass_ipa_oacc_kernels description. * flag-types.h (enum openacc_kernels):Add OPENACC_KERNELS_DECOMPOSE_PARLOOPS. * gimple-pretty-print.c (dump_gimple_omp_target): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. * gimple.h (enum gf_mask): Add GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and widen GF_OMP_TARGET_KIND_MASK. (is_gimple_omp_oacc): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. (is_gimple_omp_offloaded): Likewise. * gimplify.c (gimplify_omp_for): Enable reduction localization for "kernels" regions. (gimplify_omp_workshare): Likewise. * graphite-dependences.c (scop_get_reads_and_writes): Handle "kills" and "reduction" PDRs. (apply_schedule_on_deps): Add dump output for intermediate steps of the dependence computation to enable understanding of unexpected dependences. (carries_deps): Likewise. (scop_get_dependences): Handle "kill" operations and add dump output. * graphite-isl-ast-to-gimple.c (visit_schedule_loop_node): New function. (graphite_oacc_analyze_scop): New function. * graphite-optimize-isl.c (optimize_isl): Remove "static" and add argument to identify OpenACC use; don't fail on unchanged schedule in this case. * graphite-poly.c (new_poly_dr): Handle "kills". (print_pdr): Likewise. (new_gimple_poly_bb): Likewise. (free_gimple_poly_bb): Likewise. (new_scop): Handle "reduction", "private", and "firstprivate" hash sets. 
(free_scop): Likewise. (print_isl_space): New function. (debug_isl_space): New function. * graphite-scop-detection.c (scop_detection::can_represent_loop): Don't fail if niter is 0 in OpenACC functions. (scop_detection::add_scop): Don't reject regions with only one loop in OpenACC functions. (ignored_oacc_internal_call_p): New function. (scan_tree_for_params): Handle VIEW_CONVERT_EXPR. (stmt_has_side_effects): Ignore internal OpenACC function calls. (add_write): Likewise. (add_read): Likewise. (add_kill): New function. (add_kills): New function. (add_oacc_kills): New function. (try_generate_gimple_bb): Kill false dependences for OpenACC "private"/"firstprivate" vars. (gather_bbs::gather_bbs): Determine OpenACC "private"/"firstprivate" vars in region. (gather_bbs::before_dom_children): Add assert. (determine_openacc_reductions): New function. (build_scops): Determine OpenACC "reduction" vars in SCoP. * graphite-sese-to-poly.c (oacc_ifn_call_extract): New declaration. (oacc_internal_call_p): New function. (build_poly_dr): Ignore internal OpenACC function calls, handle "reduction" refs. (build_poly_sr): Likewise; handle "kill" operations. * graphite.c (graphite_transform_loops): Accept functions with only a single loop. (oacc_enable_graphite_p): New function. (gate_graphite_transforms): Enable pass on OpenACC functions. * graphite.h (enum poly_dr_type): Add PDR_KILL. (struct poly_dr): Add "is_reduction" field. (new_poly_dr): Add argument to declaration. (pdr_kill_p): New function. (print_isl_space): New declaration. (debug_isl_space): New declaration. (struct scop): Add fields "reductions_vars", "oacc_firstprivate_vars", and "oacc_private_scalars". (optimize_isl): New declaration. (graphite_oacc_analyze_scop): New declaration. * internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE. * internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE. * omp-expand.c (struct omp_region): Adjust comment. 
(expand_omp_taskloop_for_inner): (expand_omp_for): Add asserts about expected "kernels" region types. (mark_loops_in_oacc_kernels_region): Likewise. (expand_omp_target): Likewise; handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. (build_omp_regions_1): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. Likewise. (omp_make_gimple_edges): Likewise. * omp-general.c (oacc_get_kernels_attrib): New function. (oacc_get_fn_dim_size): Allow argument to be NULL. * omp-general.h (oacc_get_kernels_attrib): New declaration. * omp-low.c (struct omp_context): Add fields "oacc_firstprivate_vars" and "oacc_private_scalars". (was_originally_oacc_kernels): New function. (is_oacc_kernels): (is_oacc_kernels_decomposed_graphite_part): New function. (new_omp_context): Allocate "oacc_first_private_vars" and "oacc_private_scalars" ... (delete_omp_context): ... and free from here. (oacc_record_firstprivate_var_clauses): New function. (oacc_record_private_scalars): New function. (scan_sharing_clauses): Call functions to record "private" scalars and "firstprivate" variables. (check_oacc_kernel_gwv): Add assert. (ctx_in_oacc_kernels_region): Handle GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE. (scan_omp_for): Likewise. (check_omp_nesting_restrictions): Likewise. (lower_oacc_head_mark): Likewise. (lower_omp_for): Likewise. (lower_omp_target): Create "private" and "firstprivate" marker call statements. (lower_oacc_head_tail): Adjust "private" and "firstprivate" marker calls. (lower_oacc_reductions): Emit "private" and "firstprivate" marker call statements. (make_oacc_firstprivate_vars_marker): New function. (make_oacc_private_scalars_marker): New function. * omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn): Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to region using the new "kernels" handling. (make_region_seq): Adjust default region type for new "kernels" handling; no more exceptions, let Graphite handle everything. 
(make_region_loop_nest): Likewise; add dump output and assert. (adjust_nested_loop_clauses): Stop creating "auto" clauses if loop has "independent", "gang" etc. (transform_kernels_loop_clauses): Likewise. * omp-offload.c (oacc_extract_loop_call): New function. (oacc_loop_get_cfg_loop): New function. (can_be_parallel_str): New function. (oacc_loop_can_be_parallel_p): New function. (oacc_parallel_kernels_graphite_fun_p): New function. (oacc_parallel_fun_p): New function. (oacc_loop_transform_auto_into_independent): New function, ... (oacc_loop_fixed_partitions): ... called from here to transfer the result of Graphite's analysis to the loop. (execute_oacc_loop_designation): Handle oacc functions with "parallel_kernels_graphite" attribute. (execute_oacc_device_lower): Handle IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE. * omp-offload.h (oacc_extract_loop_call): Add declaration. * params.opt: Add "param=openacc-kernels" value "decompose-parloops". * sese.c (scalar_evolution_in_region): "Redirect" SCEV analysis to outer loop for IFN_GOACC_LOOP calls. * sese.h: Add field "kill_scalar_refs". * tree-chrec.c (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR like CASE_CONVERT. * tree-data-ref.c (dump_data_reference): Include DR_BASE_ADDRESS and DR_OFFSET in dump output. (get_references_in_stmt): Don't reject OpenACC internal function calls. (graphite_find_data_references_in_stmt): Remove unused variable. * tree-parloops.c (pass_parallelize_loops::execute): Disable pass with the new kernels handling, enable if requested explicitly. * tree-scalar-evolution.c (set_scev_analyze_openacc_calls): Set flag to enable the analysis of internal OpenACC function calls (use for Graphite only). (oacc_call_analyzable_p): New function. (oacc_ifn_call_extract): New function. (oacc_simplify): New function. (add_to_evolution): Simplify OpenACC internal function calls if applicable. (follow_ssa_edge_binary): Likewise. (follow_ssa_edge_expr): Likewise. 
(follow_copies_to_constant): Likewise. (analyze_initial_condition): Likewise. (interpret_loop_phi): Likewise. (interpret_gimple_call): New function. (interpret_rhs_expr): Likewise. (instantiate_scev_name): Likewise. (analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions. (expression_expensive_p): Consider internal OpenACC calls to be cheap. * tree-scalar-evolution.h (set_scev_analyze_openacc_calls): New declaration. (oacc_call_analyzable_p): New declaration. * tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Mark lhs of internal OpenACC function calls necessary. * tree-ssa-ifcombine.c (recognize_if_then_else): * tree-ssa-loop-niter.c (oacc_call_analyzable_p): (oacc_ifn_call_extract): New declaration. (interpret_gimple_call): New declaration. (expand_simple_operations): Handle internal OpenACC function calls. * tree-ssa-loop.c (gate_oacc_kernels): Disable for new "kernels" handling. * graphite-oacc.c: New file. * graphite-oacc.h: New file. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust. * testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust. * testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust. * testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust. gcc/testsuite/ChangeLog: * c-c++-common/goacc/classify-kernels.c: Adjust. * c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust. * c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust. * c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust. * c-c++-common/goacc/classify-kernels-unparallelized.c: Removed. * c-c++-common/goacc/kernels-reduction.c: Removed. * gfortran.dg/goacc/loop-auto-transfer-2.f90: New test. * gfortran.dg/goacc/loop-auto-transfer-3.f90: New test. * gfortran.dg/goacc/loop-auto-transfer-4.f90: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
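To make the purpose of the dependence analysis concrete, here is a minimal C sketch of two "kernels" loops that such an analysis has to tell apart. The function names are hypothetical and nothing here claims to show what the compiler actually reports for these loops. Without OpenACC enabled the pragmas are ignored and the code runs sequentially.

```c
/* No loop-carried dependence: iteration i touches only a[i], b[i]
   and c[i], so the iterations could run in any order (provided the
   arrays do not overlap). */
void vec_add(int n, const float *a, const float *b, float *c)
{
#pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1, so the iterations must stay in order. */
void prefix_sum(int n, float *a)
{
#pragma acc kernels copy(a[0:n])
    for (int i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

A dependence analysis is what lets a compiler treat the first loop as a candidate for parallel execution while keeping the second one sequential.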
2021-11-17 | graphite: Add runtime alias checking | Frederik Harwath | 7 files | -23/+326
Graphite rejects a SCoP if it contains a pair of data references for which it cannot determine statically if they may alias. This happens very often, for instance in C code which does not use explicit "restrict". This commit adds the possibility to analyze a SCoP nevertheless and perform an alias check at runtime. Then, if aliasing is detected, the execution will fall back to the unoptimized SCoP. TODO This needs more testing on non-OpenACC code. gcc/ChangeLog: * common.opt: Add fgraphite-runtime-alias-checks. * graphite-isl-ast-to-gimple.c (generate_alias_cond): New function. (graphite_regenerate_ast_isl): Use from here. * graphite-poly.c (new_scop): Create unhandled_alias_ddrs vec ... (free_scop): and release here. * graphite-scop-detection.c (dr_defs_outside_region): New function. (dr_well_analyzed_for_runtime_alias_check_p): New function. (graphite_runtime_alias_check_p): New function. (build_alias_set): Record unhandled alias ddrs for later alias check creation if flag_graphite_runtime_alias_checks is true instead of failing. * graphite.h (struct scop): Add field unhandled_alias_ddrs. * sese.h (has_operands_from_region_p): New function. gcc/testsuite/ChangeLog: * gcc.dg/graphite/alias-1.c: New test.
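The idea of a runtime alias check with a fallback can be sketched by hand in C. This is an analogue under stated assumptions, not the code Graphite generates: `may_overlap` and `add_arrays` are hypothetical names, and the "optimized" version simply walks the arrays in reverse to stand in for any reordering that is only valid when the arrays do not overlap. The ChangeLog above names the controlling option as fgraphite-runtime-alias-checks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if [p, p+n) and [q, q+n) overlap. */
static int may_overlap(const int *p, const int *q, size_t n)
{
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return a < b + bytes && b < a + bytes;
}

/* Returns 1 if the reordered version ran, 0 if the fallback ran. */
int add_arrays(int *dst, const int *src, size_t n)
{
    if (!may_overlap(dst, src, n)) {
        /* No aliasing: any iteration order gives the same result. */
        for (size_t i = n; i-- > 0; )
            dst[i] += src[i];
        return 1;
    }
    /* Possible aliasing: keep the original iteration order. */
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
    return 0;
}
```

Calling `add_arrays(c + 1, c, 4)` on a five-element array takes the fallback path, because the destination overlaps the source by one element.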
2021-11-17 | Move compute_alias_check_pairs to tree-data-ref.c | Frederik Harwath | 3 files | -87/+90
Move this function from tree-loop-distribution.c to tree-data-ref.c and make it non-static to enable its use from other parts of GCC. gcc/ChangeLog: * tree-loop-distribution.c (data_ref_segment_size): Remove function. (latch_dominated_by_data_ref): Likewise. (compute_alias_check_pairs): Likewise. * tree-data-ref.c (data_ref_segment_size): New function, copied from tree-loop-distribution.c (compute_alias_check_pairs): Likewise. (latch_dominated_by_data_ref): Likewise. * tree-data-ref.h (compute_alias_check_pairs): New declaration.
2021-11-17 | Fix branch prediction dump message | Frederik Harwath | 1 file | -1/+1
Instead of, for instance, "Loop got predicted 1 to iterate 10 times" the message should be "Loop 1 got predicted to iterate 10 times". gcc/ChangeLog: * predict.c (pass_profile::execute): Fix dump message.
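The corrected wording amounts to placing the loop number directly after "Loop". A tiny C sketch, assuming a hypothetical helper `format_prediction` (the actual format string in predict.c is not shown in this log):

```c
#include <stdio.h>

/* Hypothetical helper that formats the message in the corrected
   order: the loop number follows "Loop", not "predicted". */
int format_prediction(char *buf, size_t size, int loop_num, long niter)
{
    return snprintf(buf, size,
                    "Loop %d got predicted to iterate %ld times",
                    loop_num, niter);
}
```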
2021-11-17 | graphite: Fix minor mistakes in comments | Frederik Harwath | 2 files | -3/+3
gcc/ChangeLog: * graphite-sese-to-poly.c (build_poly_sr_1): Fix a typo and a reference to a variable which does not exist. * graphite-isl-ast-to-gimple.c (gsi_insert_earliest): Fix typo in comment.
2021-11-17 | graphite: Rename isl_id_for_ssa_name | Frederik Harwath | 1 file | -10/+11
The SSA names for which this function gets used are always SCoP parameters and hence "isl_id_for_parameter" is a better name. It also explains the prefix "P_" for those names in the ISL representation. gcc/ChangeLog: * graphite-sese-to-poly.c (isl_id_for_ssa_name): Rename to ... (isl_id_for_parameter): ... this new function name. (build_scop_context): Adjust function use.
2021-11-17 | graphite: Extend SCoP detection dump output | Frederik Harwath | 1 file | -23/+165
Extend dump output to make it easier (for GCC developers) to understand why Graphite refuses to include a loop in a SCoP. ChangeLog: * graphite-scop-detection.c (scop_detection::can_represent_loop): Output reason for failure to dump file. (scop_detection::harmful_loop_in_region): Likewise. (scop_detection::graphite_can_represent_expr): Likewise. (scop_detection::stmt_has_simple_data_refs_p): Likewise. (scop_detection::stmt_simple_for_scop_p): Likewise. (print_sese_loop_numbers): New function. (scop_detection::add_scop): Use from here to print loops in rejected SCoP.
2021-11-16 | openacc: Move pass_oacc_device_lower after pass_graphite | Frederik Harwath | 48 files | -86/+360
The OpenACC device lowering pass must run after the Graphite pass to allow for the use of Graphite for automatic parallelization of kernels regions in the future. Experimentation has shown that it is best, performancewise, to run pass_oacc_device_lower together with the related passes pass_oacc_loop_designation and pass_oacc_gimple_workers early after pass_graphite in pass_tree_loop, at least if the other tree loop passes are not adjusted. In particular, to enable vectorization which is crucial for GCN offloading, device lowering should happen before pass_vectorize. To bring the loops contained in the offloading functions into the shape expected by the loop vectorizer, we have to make sure that some passes that previously were executed only once before pass_tree_loop are also executed on the offloading functions. To ensure the execution of pass_oacc_device_lower if pass_tree_loop does not execute (no loops, no optimizations), we introduce two further copies of the pass to the pipeline that run if there are no loops or if no optimization is performed. gcc/ChangeLog: * omp-general.c (oacc_get_fn_dim_size): Return 0 on missing "dims". * omp-offload.c (pass_oacc_loop_designation::clone): New member function. (pass_oacc_gimple_workers::clone): Likewise. (pass_oacc_gimple_device_lower::clone): Likewise. * passes.c (pass_data_no_loop_optimizations): New pass_data. (class pass_no_loop_optimizations): New pass. (make_pass_no_loop_optimizations): New function. * passes.def: Move pass_oacc_{loop_designation, gimple_workers, device_lower} into tree_loop, and add copies to pass_tree_no_loop and to new pass_no_loop_optimizations. Add copies of passes pass_ccp, pass_ipa_warn, pass_complete_unrolli, pass_backprop, pass_phiprop, pass_fix_loops after the OpenACC passes in pass_tree_loop. * tree-ssa-loop-ivcanon.c (pass_complete_unroll::clone): New member function. (pass_complete_unrolli::clone): Likewise. * tree-ssa-loop.c (pass_fix_loops::clone): Likewise. 
(pass_tree_loop_init::clone): Likewise. (pass_tree_loop_done::clone): Likewise. * tree-ssa-phiprop.c (pass_phiprop::clone): Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise. * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/goacc/loop-processing-1.c: Adjust expected output to pass name changes due to the pass reordering and cloning. * c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise. * c-c++-common/goacc/classify-kernels.c: Likewise. * c-c++-common/goacc/classify-parallel.c: Likewise. * c-c++-common/goacc/classify-routine.c: Likewise. * c-c++-common/goacc/routine-nohost-1.c: Likewise. * c-c++-common/unroll-1.c: Likewise. * c-c++-common/unroll-4.c: Likewise. * gcc.dg/goacc/loop-processing-1.c: Likewise. * gcc.dg/tree-ssa/backprop-1.c: Likewise. * gcc.dg/tree-ssa/backprop-2.c: Likewise. * gcc.dg/tree-ssa/backprop-3.c: Likewise. * gcc.dg/tree-ssa/backprop-4.c: Likewise. * gcc.dg/tree-ssa/backprop-5.c: Likewise. * gcc.dg/tree-ssa/backprop-6.c: Likewise. * gcc.dg/tree-ssa/cunroll-1.c: Likewise. * gcc.dg/tree-ssa/cunroll-3.c: Likewise. * gcc.dg/tree-ssa/cunroll-9.c: Likewise. * gcc.dg/tree-ssa/ldist-17.c: Likewise. * gcc.dg/tree-ssa/loop-38.c: Likewise. * gcc.dg/tree-ssa/pr21463.c: Likewise. * gcc.dg/tree-ssa/pr45427.c: Likewise. * gcc.dg/tree-ssa/pr61743-1.c: Likewise. * gcc.dg/unroll-2.c: Likewise. * gcc.dg/unroll-3.c: Likewise. * gcc.dg/unroll-4.c: Likewise. 
* gcc.dg/unroll-5.c: Likewise. * gcc.dg/vect/vect-profile-1.c: Likewise. * c-c++-common/goacc/device-lowering-debug-optimization.c: New test. * c-c++-common/goacc/device-lowering-no-loops.c: New test. * c-c++-common/goacc/device-lowering-no-optimization.c: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2021-11-16 | Fortran: delinearize multi-dimensional array accesses | Sandra Loosemore | 13 files | -95/+264
The Fortran front end presently linearizes accesses to multi-dimensional arrays by combining the indices for the various dimensions into a series of explicit multiplies and adds with refactoring to allow CSE of invariant parts of the computation. Unfortunately this representation interferes with Graphite-based loop optimizations. It is difficult to recover the original multi-dimensional form of the access by the time loop optimizations run because parts of it have already been optimized away or into a form that is not easily recognizable, so it seems better to have the Fortran front end produce delinearized accesses to begin with, a set of nested ARRAY_REFs similar to the existing behavior of the C and C++ front ends. This is a long-standing problem that has previously been discussed e.g. in PR 14741 and PR61000. This patch is an initial implementation for explicit array accesses only; it doesn't handle the accesses generated during scalarization of whole-array or array-section operations, which follow a different code path. gcc/ * expr.c (get_inner_reference): Handle NOP_EXPR like VIEW_CONVERT_EXPR. gcc/fortran/ * lang.opt (-param=delinearize=): New. * trans-array.c (get_class_array_vptr): New, split from... (build_array_ref): ...here. (get_array_lbound, get_array_ubound): New, split from... (gfc_conv_array_ref): ...here. Additional code refactoring plus support for delinearization of the array access. gcc/testsuite/ * gfortran.dg/assumed_type_2.f90: Adjust patterns. * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise. * gfortran.dg/graphite/block-3.f90: Remove xfails. * gfortran.dg/graphite/block-4.f90: Likewise. * gfortran.dg/inline_matmul_24.f90: Adjust patterns. * gfortran.dg/no_arg_check_2.f90: Likewise. * gfortran.dg/pr32921.f: Likewise. * gfortran.dg/reassoc_4.f: Disable delinearization for this test. Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
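The difference between a linearized and a delinearized access can be shown with a C stand-in. This is a sketch only: the function names are hypothetical, the indices are 0-based, and the compiler expresses the second form as nested ARRAY_REFs internally rather than as C source.

```c
#include <stddef.h>

/* Linearized: element (i, j) of a column-major array with n rows,
   written as explicit multiply-and-add index arithmetic. */
double get_linearized(const double *a, size_t n, size_t i, size_t j)
{
    return a[i + j * n];
}

/* Delinearized: nested subscripts keep the index of each dimension
   visible to later analyses.  m is the number of columns. */
double get_nested(size_t n, size_t m, double a[m][n], size_t i, size_t j)
{
    return a[j][i];
}
```

Both functions read the same element; the nested form simply preserves the per-dimension structure that a loop optimizer would otherwise have to reconstruct from the arithmetic.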
2021-11-16 | Add dg-final option-based target selectors | Richard Sandiford | 5 files | -15/+107
This patch adds target selectors of the form: { any-opts "opt1" ... "optn" } { no-opts "opt1" ... "optn" } for skipping or xfailing tests based on compiler options. It only works for dg-final selectors. The patch then uses no-opts to exclude -O0 and (sometimes) -Og from some guality.exp xfails. AFAICT (based on gcc-testresults) these tests pass for those options for all targets. gcc/ * doc/sourcebuild.texi: Document no-opts and any-opts target selectors. gcc/testsuite/ * lib/target-supports-dg.exp (selector_expression): Handle any-opts and no-opts. * gcc.dg/guality/pr41353-1.c: Exclude -O0 from xfail. * gcc.dg/guality/pr59776.c: Likewise. * gcc.dg/guality/pr54970.c: Likewise -O0 and -Og.
2021-11-16 | Fix gimple_debug_cfg declaration | Frederik Harwath | 1 file | -1/+1
Silence a warning. The argument type did not match the definition. gcc/ChangeLog: * tree-cfg.h (gimple_debug_cfg): Change argument type from int to dump_flags_t.
2021-11-10 | Merge remote-tracking branch 'origin/releases/gcc-11' into devel/omp/gcc-11 | Tobias Burnus | 64 files | -227/+1813
Merge up to r11-9233-g3dea90505df136a4b361665772ef8e62306cfcdb (Nov 10, 2021)
2021-11-10 | testsuite/102690 - XFAIL g++.dg/warn/Warray-bounds-16.C | Richard Biener | 1 file | -3/+3
This XFAILs the bogus diagnostic test and rectifies the expectation on the optimization. 2021-11-10 Richard Biener <rguenther@suse.de> PR testsuite/102690 * g++.dg/warn/Warray-bounds-16.C: XFAIL diagnostic part and optimization. (cherry picked from commit b2cd32b743ba440e75505ce30c6b5c592ed144ea)
2021-11-10 | openmp: For default(none) ignore variables created by ubsan_create_data [PR64888] | Jakub Jelinek | 4 files | -6/+81
We weren't ignoring the ubsan variables created by c-ubsan.c before gimplification (others are added later). One way to fix this would be to introduce further UBSAN_ internal functions and lower them later (sanopt pass) like other ifns; this patch instead recognizes those magic vars by name/name of type and DECL_ARTIFICIAL and TYPE_ARTIFICIAL. 2021-10-21 Jakub Jelinek <jakub@redhat.com> PR middle-end/64888 gcc/c-family/ * c-omp.c (c_omp_predefined_variable): Return true also for ubsan_create_data created artificial variables. gcc/testsuite/ * c-c++-common/ubsan/pr64888.c: New test. (cherry picked from commit 40dd9d839e52f679d8eabc1c5ca0ca17a5ccfd14)
2021-11-10 | Restore 'GOMP_OPENACC_DIM' environment variable parsing | Thomas Schwinge | 2 files | -1/+8
... that got broken by recent commit c057ed9c52c6a63a1a692268f916b1a9131cd4b7 "openmp: Fix up strtoul and strtoull uses in libgomp", resulting in spurious FAILs for tests specifying 'dg-set-target-env-var "GOMP_OPENACC_DIM" "[...]"'. libgomp/ * env.c (parse_gomp_openacc_dim): Restore parsing. (cherry picked from commit 00c9ce13a64e324dabd8dfd236882919a3119479)
2021-11-10 | Daily bump. | GCC Administrator | 2 files | -1/+10
2021-11-08 | rs6000: Fix incorrect fusion constraint [PR102991] | Xionghu Luo | 2 files | -65/+65
gcc/ChangeLog: 2021-11-05 Xionghu Luo <luoxhu@linux.ibm.com> PR target/102991 * config/rs6000/fusion.md: Regenerate. * config/rs6000/genfusion.pl: Fix incorrect clobber constraint. (cherry picked from commit 614b39757b8b61f70ac1c666edb7a01a5fc19cd4)
2021-11-09 | Daily bump. | GCC Administrator | 4 files | -1/+170
2021-11-08 | tree-optimization/102798 - avoid copying PTA info to old SSA names | Richard Biener | 2 files | -1/+43
The vectorizer duplicates pointer-info to created pointer bases but it has to avoid changing points-to info on existing SSA names because there's now flow-sensitive info in there (pt->pt_null as set from VRP). 2021-10-18 Richard Biener <rguenther@suse.de> PR tree-optimization/102798 * tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Only copy points-to info to newly generated SSA names. * gcc.dg/pr102798.c: New testcase.
2021-11-08 | middle-end/102518 - avoid invalid GIMPLE during inlining | Richard Biener | 2 files | -1/+17
When inlining we have to avoid mapping a non-lvalue parameter value into a context that prevents the parameter from being a register. Formerly such parameters were TREE_ADDRESSABLE, but now they can be just DECL_NOT_GIMPLE_REG_P. 2021-09-30 Richard Biener <rguenther@suse.de> PR middle-end/102518 * tree-inline.c (setup_one_parameter): Avoid substituting an invariant into contexts where a GIMPLE register is not valid. * gcc.dg/torture/pr102518.c: New testcase.
2021-11-08 | tree-optimization/102788 - avoid spurious bool pattern fails | Richard Biener | 2 files | -5/+35
Bool pattern recog is required for correctness since vectorized compares otherwise produce -1 for true, so any context where bool is used as a value and not as a condition or mask needs to be replaced with CMP ? 1 : 0. When we fail to find a vector type for the result of such a use we may not simply elide the transform, since a new bool result can emerge when, for example, the cast_forwprop pattern is applied. So the following avoids failing the bool pattern recog process and instead does not assign a vector type to the stmt. 2021-10-18 Richard Biener <rguenther@suse.de> PR tree-optimization/102788 * tree-vect-patterns.c (vect_init_pattern_stmt): Allow a NULL vectype. (vect_pattern_recog_1): Likewise. (vect_recog_bool_pattern): Continue matching the pattern even if we do not have a vector type for a conversion result. * g++.dg/vect/pr102788.cc: New testcase. (cherry picked from commit eb032893675afea4b01cc6ad06a3e0dcfe9b51cd)
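Why a compare result cannot be used directly as a value can be shown with a scalar C model of vector lanes. This is an illustrative sketch, not vectorizer code: `LANES` and the function names are hypothetical.

```c
#include <stdint.h>

#define LANES 4

/* Model of a vector compare: each lane becomes all-ones (-1) when the
   comparison is true and 0 when it is false. */
void lanes_less_than(const int32_t *a, const int32_t *b, int32_t *mask)
{
    for (int i = 0; i < LANES; i++)
        mask[i] = -(int32_t)(a[i] < b[i]);
}

/* Summing the raw masks counts each true lane as -1. */
int32_t sum_raw_masks(const int32_t *mask)
{
    int32_t sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += mask[i];
    return sum;
}

/* Using the mask as a boolean value needs the CMP ? 1 : 0 form. */
int32_t count_true_lanes(const int32_t *mask)
{
    int32_t count = 0;
    for (int i = 0; i < LANES; i++)
        count += mask[i] ? 1 : 0;
    return count;
}
```

With three true lanes, the raw sum is -3 while the converted count is 3, which is the discrepancy the pattern replacement exists to prevent.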
2021-11-08ipa/102762 - fix ICE with invalid __builtin_va_arg_pack () useRichard Biener2-1/+18
We have to be careful to not break the argument space calculation. If there are not enough arguments, just do not append any. 2021-10-15 Richard Biener <rguenther@suse.de> PR ipa/102762 * tree-inline.c (copy_bb): Avoid underflowing nargs. * gcc.dg/torture/pr102762.c: New testcase. (cherry picked from commit 11a4714860d2df6ba496d55379e7dc702d5fc425)
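The kind of underflow being avoided can be sketched generically in C. The helpers below are hypothetical and are not the code in copy_bb; they only assume that the argument count is an unsigned quantity from which the number of declared parameters is subtracted.

```c
/* Hypothetical helper: number of extra arguments to append, computed
   without a guard.  When n_call_args < n_params the unsigned
   subtraction wraps around to a huge value.  */
unsigned extra_args_unchecked (unsigned n_call_args, unsigned n_params)
{
  return n_call_args - n_params;
}

/* Guarded variant: if there are not enough arguments, append none.  */
unsigned extra_args_checked (unsigned n_call_args, unsigned n_params)
{
  return n_call_args > n_params ? n_call_args - n_params : 0;
}
```

The guarded form returns 0 for an invalid call with too few arguments, so any later allocation or copy sized from the result stays bounded.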
2021-11-08tree-optimization/102572 - fix gathers with invariant maskRichard Biener2-1/+15
This fixes the vector def gathering for invariant masks, which failed to pass in the desired vector type, resulting in a non-mask type being generated. 2021-10-12 Richard Biener <rguenther@suse.de> PR tree-optimization/102572 * tree-vect-stmts.c (vect_build_gather_load_calls): When gathering the vectorized defs for the mask pass in the desired mask vector type so invariants will be handled correctly. * g++.dg/vect/pr102572.cc: New testcase. (cherry picked from commit 9f12a45ef147e563f099c24c293830727e8204cc)
2021-11-08tree-optimization/102139 - fix SLP DR base alignmentRichard Biener3-34/+85
When doing whole-function SLP we have to make sure the recorded base alignments, which we compute as the maximum alignment seen for a base anywhere in the function, are actually valid at the point we want to make use of them. To make this work we now record the stmt the alignment was derived from in addition to the DR's innermost behavior and we use a dominance check to verify the recorded info is valid when doing BB vectorization. For this to work for groups inside a BB that are separated by a call that might not return we now store the DR analysis group-id permanently and use that for an additional check when the DRs are in the same BB. 2021-08-31 Richard Biener <rguenther@suse.de> PR tree-optimization/102139 * tree-vectorizer.h (vec_base_alignments): Adjust hash-map type to record a std::pair of the stmt-info and the innermost loop behavior. (dr_vec_info::group): New member. * tree-vect-data-refs.c (vect_record_base_alignment): Adjust. (vect_compute_data_ref_alignment): Verify the recorded base alignment can be used. (data_ref_pair): Remove. (dr_group_sort_cmp): Adjust. (vect_analyze_data_ref_accesses): Store the group-ID in the dr_vec_info and operate on a vector of dr_vec_infos. * gcc.dg/torture/pr102139.c: New testcase. (cherry picked from commit 153766ec8351d55cfe8bd6d69bdfc0c2cef71e56)
2021-11-08Refactor BB splitting of DRs for SLP group analysisRichard Biener2-13/+10
This uses the group_id computed to ensure DRs in different BBs do not get merged into a DR group. To achieve this we seed the group from the BB index when group_ids are not computed and we make sure to bump the group_id when advancing to the next BB for BB SLP analysis. This paves the way for relaxing the grouping for BB vectorization by adjusting its group_id computation. 2021-08-20 Richard Biener <rguenther@suse.de> * tree-vect-data-refs.c (dr_group_sort_cmp): Do not compare BBs. (vect_analyze_data_ref_accesses): Likewise. Assign the BB index as group_id when dataref_groups were not computed. * tree-vect-slp.c (vect_slp_bbs): Bump current_group when we advance to the next BB. (cherry picked from commit 37744f8260857005c8409c9e2e633a05c768a7dd)
2021-11-08middle-end/101480 - overloaded global new/deleteRichard Biener2-2/+54
The following fixes the issue of ignoring side-effects on memory from overloaded global new/delete operators by not marking them as effectively 'const' apart from other explicitly specified side-effects. This will cause FAIL: g++.dg/warn/Warray-bounds-16.C -std=gnu++1? (test for excess errors) because we no longer statically see that the initialization loop never executes, since the call to operator new can now clobber 'a.m'. This seems to be an issue with the warning code and/or ranger so I'm leaving this FAIL to be addressed as followup. 2021-10-11 Richard Biener <rguenther@suse.de> PR middle-end/101480 * gimple.c (gimple_call_fnspec): Do not mark operator new/delete as const. * g++.dg/torture/pr10148.C: New testcase. (cherry picked from commit 09a0affdb0598a54835ac4bb0dd6b54122c12916)
2021-11-08gcov-profile: Fix -fcompare-debug with -fprofile-generate [PR100520]Martin Liska3-2/+23
PR gcov-profile/100520 gcc/ChangeLog: * coverage.c (coverage_compute_profile_id): Strip .gk when compare debug is used. * system.h (endswith): New function. gcc/testsuite/ChangeLog: * gcc.dg/pr100520.c: New test. (cherry picked from commit 7553bd35c876efaf8ab0b6661a6102822b99e6e3)
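A suffix test of the kind the ChangeLog entry names can be sketched as follows. This is an assumption-laden illustration, not the helper actually added to system.h: `has_suffix` and `strip_gk` are hypothetical names, and the real `endswith` signature and the way coverage.c strips `.gk` may differ.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical suffix test: true if STR ends with SUFFIX.  */
bool has_suffix (const char *str, const char *suffix)
{
  size_t str_len = strlen (str);
  size_t suffix_len = strlen (suffix);
  if (str_len < suffix_len)
    return false;
  return memcmp (str + str_len - suffix_len, suffix, suffix_len) == 0;
}

/* Hypothetical: copy NAME into OUT (OUT_SIZE bytes), dropping a trailing
   ".gk" if present.  Returns false if OUT is too small.  */
bool strip_gk (const char *name, char *out, size_t out_size)
{
  size_t len = strlen (name);
  if (has_suffix (name, ".gk"))
    len -= strlen (".gk");
  if (len + 1 > out_size)
    return false;
  memcpy (out, name, len);
  out[len] = '\0';
  return true;
}
```

Normalizing the name this way would make a value derived from it identical across the two compilations that -fcompare-debug performs, which is the property the fix is after.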
2021-11-08gcc-changelog: sync from masterMartin Liska6-19/+189
contrib/ChangeLog: * gcc-changelog/git_check_commit.py: Sync from master. * gcc-changelog/git_commit.py: Likewise. * gcc-changelog/git_email.py: Likewise. * gcc-changelog/git_update_version.py: Likewise. * gcc-changelog/test_email.py: Likewise. * gcc-changelog/test_patches.txt: Likewise.
2021-11-07vect: Don't update inits for simd_lane_access DRs [PR102789]Kewen Lin1-1/+2
As PR102789 shows, when the vectorizer peels for alignment in prologues, function vect_update_inits_of_drs updates the inits of some DRs. But as the failing case shows, we shouldn't update the DR for a simd_lane_access: it has fixed-length storage, mainly for the main loop, and the update can push the access out of bounds so that it touches an unexpected element. gcc/ChangeLog: PR tree-optimization/102789 * tree-vect-loop-manip.c (vect_update_inits_of_drs): Do not update inits of simd_lane_access. (cherry picked from commit f3dbd3f36d55178d0a9e4431043cbc950524969a)
2021-11-08Daily bump.GCC Administrator3-1/+96
2021-11-07Fortran: error recovery on initializing invalid derived type array componentHarald Anlauf2-2/+18
gcc/fortran/ChangeLog: PR fortran/102816 * resolve.c (resolve_structure_cons): Reject invalid array spec of a DT component referenced in a structure constructor. gcc/testsuite/ChangeLog: PR fortran/102816 * gfortran.dg/pr102816.f90: New test. (cherry picked from commit 99af0b2f0fe1c0dc8c6d558157e700326d52816a)
2021-11-07Fortran: validate shape of arrays in constructors against declarationsHarald Anlauf6-6/+68
gcc/fortran/ChangeLog: PR fortran/102685 * decl.c (match_clist_expr): Set rank/shape of clist initializer to match LHS. * resolve.c (resolve_structure_cons): In a structure constructor, compare shapes of array components against declared shape. gcc/testsuite/ChangeLog: PR fortran/102685 * gfortran.dg/derived_constructor_char_1.f90: Fix invalid code. * gfortran.dg/pr70931.f90: Likewise. * gfortran.dg/transfer_simplify_2.f90: Likewise. * gfortran.dg/pr102685.f90: New test. Co-authored-by: Tobias Burnus <tobias@codesourcery.com> (cherry picked from commit 1e819bd95ebeefc1dc469daa1855ce005cb77822)
2021-11-07Fortran: error recovery on rank mismatch of array and its initializerHarald Anlauf3-1/+22
gcc/fortran/ChangeLog: PR fortran/102715 * decl.c (add_init_expr_to_sym): Reject rank mismatch between array and its initializer. gcc/testsuite/ChangeLog: PR fortran/102715 * gfortran.dg/pr68019.f90: Adjust error message. * gfortran.dg/pr102715.f90: New test. (cherry picked from commit df2135e88a8f78c853b35246ad426b01b6d08378)
2021-11-07Fortran: fix simplification of array-valued parameter expressionsHarald Anlauf2-0/+19
gcc/fortran/ChangeLog: PR fortran/102817 * expr.c (simplify_parameter_variable): Copy shape of referenced subobject when simplifying. gcc/testsuite/ChangeLog: PR fortran/102817 * gfortran.dg/pr102817.f90: New test. (cherry picked from commit bcf3728abe8488882922005166d3065fc5fdfea1)
2021-11-07Fortran: handle initialization of derived type parameter arrays from scalarHarald Anlauf2-3/+32
gcc/fortran/ChangeLog: PR fortran/99348 PR fortran/102521 * decl.c (add_init_expr_to_sym): Extend initialization of parameter arrays from scalars to handle derived types. gcc/testsuite/ChangeLog: PR fortran/99348 PR fortran/102521 * gfortran.dg/parameter_array_init_8.f90: New test. (cherry picked from commit 74ccca380cde5e79e082d39214b306a90ded0344)
2021-11-07Daily bump.GCC Administrator1-1/+1
2021-11-06Daily bump.GCC Administrator3-1/+44
2021-11-05Support TI mode and soft float on PA64John David Anglin9-15/+483
This change implements TI mode on PA64. Various new patterns are added to pa.md. The libgcc build needed modification to build both DI and TI routines. We also need various softfp routines to convert to and from TImode. I added full softfp for the -msoft-float option. At the moment, this doesn't completely eliminate all use of the floating-point co-processor. For this, libgcc needs to be built with -msoft-mult. The floating-point exception support also needs a soft option. 2021-11-05 John David Anglin <danglin@gcc.gnu.org> PR libgomp/96661 gcc/ChangeLog: * config/pa/pa-modes.def: Add OImode integer type. * config/pa/pa.c (pa_scalar_mode_supported_p): Allow TImode for TARGET_64BIT. * config/pa/pa.h (MIN_UNITS_PER_WORD): Define to UNITS_PER_WORD if IN_LIBGCC2. * config/pa/pa.md (addti3, addvti3, subti3, subvti3, negti2, negvti2, ashlti3, shrpd_internal): New patterns. Change some multi instruction types to multi. libgcc/ChangeLog: * config.host (hppa*64*-*-linux*): Revise tmake_file. (hppa*64*-*-hpux11*): Likewise. * config/pa/sfp-exceptions.c: New. * config/pa/sfp-machine.h: New. * config/pa/t-dimode: New. * config/pa/t-softfp-sfdftf: New.
2021-11-05Speed up jump table switch detection.Martin Liska2-21/+35
PR tree-optimization/100393 gcc/ChangeLog: * tree-switch-conversion.c (group_cluster::dump): Use get_comparison_count. (jump_table_cluster::find_jump_tables): Pre-compute number of comparisons and then decrement it. Cache also max_ratio. (jump_table_cluster::can_be_handled): Change signature. * tree-switch-conversion.h (get_comparison_count): New. (cherry picked from commit c517cf2e685e2903b591d63c1034ff9726cb3822)
2021-11-05gcc: vx-common.h: fix test for VxWorks7Rasmus Villemoes1-1/+1
The macro TARGET_VXWORKS7 is always defined (see vxworks-dummy.h). Thus we need to test its value, not its definedness. Fixes aca124df (define NO_DOT_IN_LABEL only in vxworks6). gcc/ChangeLog: * config/vx-common.h: Test value of TARGET_VXWORKS7 rather than definedness. (cherry picked from commit 44d0243a247dd1280265c649dab26e9486ffa015)
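The difference between testing definedness and testing the value can be shown with a small C sketch. `FEATURE_FLAG` is a hypothetical stand-in for a macro that is always defined, here with the value 0; it is not the actual content of vx-common.h.

```c
/* Hypothetical stand-in for an always-defined macro whose value is 0
   on targets where the feature is off.  */
#define FEATURE_FLAG 0

/* Tests definedness: taken even though the value is 0.  */
#ifdef FEATURE_FLAG
#define SELECTED_BY_DEFINEDNESS 1
#else
#define SELECTED_BY_DEFINEDNESS 0
#endif

/* Tests the value: not taken because the macro expands to 0.  */
#if FEATURE_FLAG
#define SELECTED_BY_VALUE 1
#else
#define SELECTED_BY_VALUE 0
#endif

int selected_by_definedness (void) { return SELECTED_BY_DEFINEDNESS; }
int selected_by_value (void) { return SELECTED_BY_VALUE; }
```

With the macro defined to 0, the `#ifdef` branch is still selected while the `#if` branch is not, which matches the commit's reasoning for testing the value.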
2021-11-05Daily bump.GCC Administrator3-1/+19
2021-11-04x86: Check leal/addl gcc.target/i386/amxtile-3.c for x32H.J. Lu1-6/+12
Check leal and addl for x32 to fix: FAIL: gcc.target/i386/amxtile-3.c scan-assembler addq[ \\t]+\\$12 FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+4 FAIL: gcc.target/i386/amxtile-3.c scan-assembler leaq[ \\t]+8 * gcc.target/i386/amxtile-3.c: Check leal/addl for x32. (cherry picked from commit fbe58ba97aff3270877d7fd5600c17687b85964c)
2021-11-04i386: Fix wrong result for AMX-TILE intrinsic when parsing expression.Hongyu Wang2-3/+31
_tile_loadd, _tile_stored, _tile_streamloadd intrinsics are defined by macro, so the parameters should be wrapped by parentheses to accept expressions. gcc/ChangeLog: * config/i386/amxtileintrin.h (_tile_loadd_internal): Add parentheses to base and stride. (_tile_stream_loadd_internal): Likewise. (_tile_stored_internal): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/amxtile-3.c: New test.
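The macro pitfall behind this fix can be illustrated without AMX. The macros below are hypothetical and are not taken from amxtileintrin.h; they only show how an unparenthesized macro parameter changes the meaning of an expression argument.

```c
#include <stddef.h>

/* Hypothetical macros, not the AMX intrinsics themselves.  */

/* Unparenthesized parameter: an argument such as `a + b` is pasted in
   textually, giving a + b * sizeof (int).  */
#define STRIDE_BYTES_BAD(stride)  (stride * sizeof (int))

/* Parenthesized parameter: the whole argument expression is scaled,
   giving (a + b) * sizeof (int).  */
#define STRIDE_BYTES_GOOD(stride) ((stride) * sizeof (int))

size_t stride_bad (size_t a, size_t b)
{
  return STRIDE_BYTES_BAD (a + b);
}

size_t stride_good (size_t a, size_t b)
{
  return STRIDE_BYTES_GOOD (a + b);
}
```

Calling both with `a = 1` and `b = 2` gives different results whenever `sizeof (int)` is greater than 1, which is the class of wrong result that the added parentheses around base and stride prevent.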
2021-11-04Daily bump.GCC Administrator3-1/+35