aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree.h
AgeCommit message (Collapse)AuthorFilesLines
2024-06-05openmp: OpenMP loop transformation supportJakub Jelinek1-0/+8
This patch is largely rewritten version of the https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631764.html patch set which I've promissed to adjust the way I'd like it but didn't get to it until now. The previous series together in diffstat was 176 files changed, 12107 insertions(+), 298 deletions(-) This patch is 197 files changed, 10843 insertions(+), 212 deletions(-) and diff between the old series and new patch is 268 files changed, 8053 insertions(+), 9231 deletions(-) Only the 5.1/5.2 tile/unroll constructs are supported, in various places some preparations for the other 6.0 loop transformations constructs (interchange/reverse/fuse) are done, but certainly not complete and not everywhere. The important difference is that because tile/unroll partial map 1:1 the original loops to generated canonical loops and add another set of generated loops without canonical form inside of it, the tile/unroll partial constructs are terminal for the generated loop, one can't have some loops from the tile or unroll partial and some further loops from inside the body of that construct. The GENERIC representation attempts to match what the standard specifies, so there are separate OMP_TILE and OMP_UNROLL trees. If for a particular loop in a loop nest of some OpenMP loop it awaits a generated loop from a nested loop, or if in OMP_LOOPXFORM_LOWERED OMP_TILE/UNROLL construct a generated loop has been moved to some surrounding construct, that particular loop is represented by all NULL_TREEs in the OMP_FOR_{INIT,COND,INCR,ORIG_DECLS} vector. The lowering of the loop transforming constructs is done at gimplification time, at the start of gimplify_omp_for. I think this way it is more maintainable over magic clauses with various loop depths on the other looping constructs or the magic OMP_LOOP_TRANS construct. Though, I admit I'm still undecided how to represent the OpenMP 6.0 loop transformation case of say: #pragma omp for collapse (4) for (int i = 0; i < 32; ++i) #pragma omp interchange permutation (2, 1) #pragma omp reverse for (int j = 0; j < 32; ++j) #pragma omp reverse for (int k = 0; k < 32; ++k) for (int l = 0; l < 32; ++l) ; Surely the i loop would go to first vector elements of OMP_FOR_* of the work-sharing loop, then 2 loops are expecting generated loops from interchange which would be inside of the body. But the innermost l loop isn't part of the interchange, so the question is where to put it. One possibility is to have it in the 4th loop of the OMP_FOR, another possibility would be to add some artificial construct inside of the OMP_INTERCHANGE and 2 OMP_REVERSE bodies which would contain the inner loop(s), e.g. it could be OMP_INTERCHANGE without permutation clause or some artificial ones or whatever. I've recently raised various unclear things in the 5.1/5.2/TRs versions regarding loop transformations, in particular https://github.com/OpenMP/spec/issues/3908 https://github.com/OpenMP/spec/issues/3909 (sorry, private links unless you have OpenMP membership). Until those are resolved, I have a sorry on trying to mix generated loops with non-rectangular loops (way too many questions need to be answered before that can be done) and similarly for mixing non-perfectly nested loops with generated loops (again, it can be implemented somehow, but is way too unclear). The second issue is mostly about data sharing, which is ambiguous, the patch makes the artificial iterators of the loops effectively private in the associated constructs (more like local), but for user iterators doesn't do anything in particular, so for now one needs to use explicit data sharing clauses on the non-loop transformation OpenMP looping constructs or surrounding parallel/task/target etc. 2024-06-05 Jakub Jelinek <jakub@redhat.com> Frederik Harwath <frederik@codesourcery.com> Sandra Loosemore <sandra@codesourcery.com> gcc/ * tree.def (OMP_TILE, OMP_UNROLL): New tree codes. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_PARTIAL, OMP_CLAUSE_FULL and OMP_CLAUSE_SIZES. * tree.h (OMP_LOOPXFORM_CHECK): Define. (OMP_LOOPXFORM_LOWERED): Define. (OMP_CLAUSE_PARTIAL_EXPR): Define. (OMP_CLAUSE_SIZES_LIST): Define. * tree.cc (omp_clause_num_ops, omp_clause_code_name): Add entries for OMP_CLAUSE_{PARTIAL,FULL,SIZES}. * tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_{PARTIAL,FULL,SIZES}. (dump_generic_node): Handle OMP_TILE and OMP_UNROLL. Skip printing loops with NULL OMP_FOR_INIT (node) vector element. * gimplify.cc (is_gimple_stmt): Handle OMP_TILE and OMP_UNROLL. (gimplify_omp_taskloop_expr): For SAVE_EXPR use gimplify_save_expr. (gimplify_omp_loop_xform): New function. (gimplify_omp_for): Call omp_maybe_apply_loop_xforms and if that reshuffles what the passed pointer points to, retry or return GS_OK. Handle OMP_TILE and OMP_UNROLL. (gimplify_omp_loop): Call omp_maybe_apply_loop_xforms and if that reshuffles what the passed pointer points to, return GS_OK. (gimplify_expr): Handle OMP_TILE and OMP_UNROLL. * omp-general.h (omp_loop_number_of_iterations, omp_maybe_apply_loop_xforms): Declare. * omp-general.cc (omp_adjust_for_condition): For LE_EXPR and GE_EXPR with pointers, don't add/subtract one, but the size of what the pointer points to. (omp_loop_number_of_iterations, omp_apply_tile, find_nested_loop_xform, omp_maybe_apply_loop_xforms): New functions. gcc/c-family/ * c-common.h (c_omp_find_generated_loop): Declare. * c-gimplify.cc (c_genericize_control_stmt): Handle OMP_TILE and OMP_UNROLL. * c-omp.cc (c_finish_omp_for): Handle generated loops. (c_omp_is_loop_iterator): Likewise. (c_find_nested_loop_xform_r, c_omp_find_generated_loop): New functions. (c_omp_check_loop_iv): Handle generated loops. For now sorry on mixing non-rectangular loop with generated loops. (c_omp_check_loop_binding_exprs): For now sorry on mixing imperfect loops with generated loops. (c_omp_directives): Uncomment tile and unroll entries. * c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE and PRAGMA_OMP_UNROLL, change PRAGMA_OMP__LAST_ to the latter. (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and PRAGMA_OMP_CLAUSE_PARTIAL. * c-pragma.cc (omp_pragmas_simd): Add tile and unroll omp pragmas. gcc/c/ * c-parser.cc (c_parser_skip_std_attribute_spec_seq): New function. (check_omp_intervening_code): Reject imperfectly nested tile. (c_parser_compound_statement_nostart): If want_nested_loop, use c_parser_omp_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. (c_parser_omp_clause_name): Handle full and partial clause names. (c_parser_omp_clause_allocate): Remove spurious semicolon. (c_parser_omp_clause_full, c_parser_omp_clause_partial): New functions. (c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL and PRAGMA_OMP_CLAUSE_PARTIAL. (c_parser_omp_next_tokens_can_be_canon_loop): New function. (c_parser_omp_loop_nest): Parse C23 attributes. Handle tile/unroll constructs. Use c_parser_omp_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Only add_stmt (body) if it is non-NULL. (c_parser_omp_for_loop): Rename tiling variable to oacc_tiling. For OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST. Use c_parser_omp_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Remove spurious semicolon. Don't call c_omp_check_loop_binding_exprs if stmt is NULL. Skip generated loops. (c_parser_omp_tile_sizes, c_parser_omp_tile): New functions. (OMP_UNROLL_CLAUSE_MASK): Define. (c_parser_omp_unroll): New function. (c_parser_omp_construct): Handle PRAGMA_OMP_TILE and PRAGMA_OMP_UNROLL. * c-typeck.cc (c_finish_omp_clauses): Adjust wording of some of the conflicting clause diagnostic messages to include word clause. Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial conflict. gcc/cp/ * cp-tree.h (dependent_omp_for_p): Add another tree argument. * parser.cc (check_omp_intervening_code): Reject imperfectly nested tile. (cp_parser_statement_seq_opt): If want_nested_loop, use cp_parser_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. (cp_parser_omp_clause_name): Handle full and partial clause names. (cp_parser_omp_clause_full, cp_parser_omp_clause_partial): New functions. (cp_parser_omp_all_clauses): Formatting fix. Handle PRAGMA_OMP_CLAUSE_PARTIAL and PRAGMA_OMP_CLAUSE_FULL. (cp_parser_next_tokens_can_be_canon_loop): New function. (cp_parser_omp_loop_nest): Parse C++11 attributes. Handle tile/unroll constructs. Use cp_parser_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Only add_stmt cp_parser_omp_loop_nest result if it is non-NULL. (cp_parser_omp_for_loop): Rename tiling variable to oacc_tiling. For OMP_CLAUSE_SIZES set collapse to list length of OMP_CLAUSE_SIZES_LIST. Use cp_parser_next_tokens_can_be_canon_loop instead of just checking for RID_FOR keyword. Remove spurious semicolon. Don't call c_omp_check_loop_binding_exprs if stmt is NULL. Skip and/or handle generated loops. Remove spurious ()s around & operands. (cp_parser_omp_tile_sizes, cp_parser_omp_tile): New functions. (OMP_UNROLL_CLAUSE_MASK): Define. (cp_parser_omp_unroll): New function. (cp_parser_omp_construct): Handle PRAGMA_OMP_TILE and PRAGMA_OMP_UNROLL. (cp_parser_pragma): Likewise. * semantics.cc (finish_omp_clauses): Don't call fold_build_cleanup_point_expr for cases which obviously won't need it, like checked INTEGER_CSTs. Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES} and diagnose full vs. partial conflict. Adjust wording of some of the conflicting clause diagnostic messages to include word clause. (finish_omp_for): Use decl equal to global_namespace as a marker for generated loop. Pass also body to dependent_omp_for_p. Skip generated loops. (finish_omp_for_block): Skip generated loops. * pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_{FULL,PARTIAL,SIZES}. (tsubst_stmt): Handle OMP_TILE and OMP_UNROLL. Handle or skip generated loops. (dependent_omp_for_p): Add body argument. If declv vector element is NULL, find generated loop. * cp-gimplify.cc (cp_gimplify_expr): Handle OMP_TILE and OMP_UNROLL. (cp_fold_r): Likewise. (cp_genericize_r): Likewise. Skip generated loops. gcc/fortran/ * gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL, ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_END_TILE. (struct gfc_omp_clauses): Add sizes_list, partial, full and erroneous members. (enum gfc_exec_op): Add EXEC_OMP_UNROLL and EXEC_OMP_TILE. (gfc_expr_list_len): Declare. * match.h (gfc_match_omp_tile, gfc_match_omp_unroll): Declare. * openmp.cc (gfc_get_location): Declare. (gfc_free_omp_clauses): Free sizes_list. (match_oacc_expr_list): Rename to ... (match_omp_oacc_expr_list): ... this. Add is_omp argument and change diagnostic wording if it is true. (enum omp_mask2): Add OMP_CLAUSE_{FULL,PARTIAL,SIZES}. (gfc_match_omp_clauses): Parse full, partial and sizes clauses. (gfc_match_oacc_wait): Use match_omp_oacc_expr_list instead of match_oacc_expr_list. (OMP_UNROLL_CLAUSES, OMP_TILE_CLAUSES): Define. (gfc_match_omp_tile, gfc_match_omp_unroll): New functions. (resolve_omp_clauses): Diagnose full vs. partial clause conflict. Resolve sizes clause arguments. (find_nested_loop_in_chain): Use switch instead of series of ifs. Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_resolve_omp_do_blocks): Set omp_current_do_collapse to list length of sizes_list if present. (gfc_resolve_do_iterator): Return for EXEC_OMP_TILE or EXEC_OMP_UNROLL. (restructure_intervening_code): Remove spurious ()s around & operands. (is_outer_iteration_variable): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (check_nested_loop_in_chain): Likewise. (expr_is_invariant): Likewise. (resolve_omp_do): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. Diagnose tile without sizes clause. Use sizes_list length for count if non-NULL. Set code->ext.omp_clauses->erroneous on loops where we've reported diagnostics. Sorry for mixing non-rectangular loops with generated loops. (omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_resolve_omp_directive): Likewise. * parse.cc (decode_omp_directive): Parse end tile, end unroll, tile and unroll. Move nothing entry alphabetically. (case_exec_markers): Add ST_OMP_TILE and ST_OMP_UNROLL. (gfc_ascii_statement): Handle ST_OMP_END_TILE, ST_OMP_END_UNROLL, ST_OMP_TILE and ST_OMP_UNROLL. (parse_omp_do): Add nested argument. Handle ST_OMP_TILE and ST_OMP_UNROLL. (parse_omp_structured_block): Adjust parse_omp_do caller. (parse_executable): Likewise. Handle ST_OMP_TILE and ST_OMP_UNROLL. * resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_resolve_code): Likewise. * st.cc (gfc_free_statement): Likewise. * trans.cc (trans_code): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses): Handle full, partial and sizes clauses. Use tree_cons + nreverse instead of temporary vector and build_tree_list_vec for tile_list handling. (gfc_expr_list_len): New function. (gfc_trans_omp_do): Rename tile to oacc_tile. Handle sizes clause. Don't assert code->op is EXEC_DO. Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (gfc_trans_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. * dump-parse-tree.cc (show_omp_clauses): Dump full, partial and sizes clauses. (show_omp_node): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL. (show_code_node): Likewise. gcc/testsuite/ * c-c++-common/gomp/attrs-tile-1.c: New test. * c-c++-common/gomp/attrs-tile-2.c: New test. * c-c++-common/gomp/attrs-tile-3.c: New test. * c-c++-common/gomp/attrs-tile-4.c: New test. * c-c++-common/gomp/attrs-tile-5.c: New test. * c-c++-common/gomp/attrs-tile-6.c: New test. * c-c++-common/gomp/attrs-unroll-1.c: New test. * c-c++-common/gomp/attrs-unroll-2.c: New test. * c-c++-common/gomp/attrs-unroll-3.c: New test. * c-c++-common/gomp/attrs-unroll-inner-1.c: New test. * c-c++-common/gomp/attrs-unroll-inner-2.c: New test. * c-c++-common/gomp/attrs-unroll-inner-3.c: New test. * c-c++-common/gomp/attrs-unroll-inner-4.c: New test. * c-c++-common/gomp/attrs-unroll-inner-5.c: New test. * c-c++-common/gomp/imperfect-attributes.c: Adjust expected diagnostics. * c-c++-common/gomp/imperfect-loop-nest.c: New test. * c-c++-common/gomp/ordered-5.c: New test. * c-c++-common/gomp/scan-7.c: New test. * c-c++-common/gomp/tile-1.c: New test. * c-c++-common/gomp/tile-2.c: New test. * c-c++-common/gomp/tile-3.c: New test. * c-c++-common/gomp/tile-4.c: New test. * c-c++-common/gomp/tile-5.c: New test. * c-c++-common/gomp/tile-6.c: New test. * c-c++-common/gomp/tile-7.c: New test. * c-c++-common/gomp/tile-8.c: New test. * c-c++-common/gomp/tile-9.c: New test. * c-c++-common/gomp/tile-10.c: New test. * c-c++-common/gomp/tile-11.c: New test. * c-c++-common/gomp/tile-12.c: New test. * c-c++-common/gomp/tile-13.c: New test. * c-c++-common/gomp/tile-14.c: New test. * c-c++-common/gomp/tile-15.c: New test. * c-c++-common/gomp/unroll-1.c: New test. * c-c++-common/gomp/unroll-2.c: New test. * c-c++-common/gomp/unroll-3.c: New test. * c-c++-common/gomp/unroll-4.c: New test. * c-c++-common/gomp/unroll-5.c: New test. * c-c++-common/gomp/unroll-6.c: New test. * c-c++-common/gomp/unroll-7.c: New test. * c-c++-common/gomp/unroll-8.c: New test. * c-c++-common/gomp/unroll-9.c: New test. * c-c++-common/gomp/unroll-inner-1.c: New test. * c-c++-common/gomp/unroll-inner-2.c: New test. * c-c++-common/gomp/unroll-inner-3.c: New test. * c-c++-common/gomp/unroll-non-rect-1.c: New test. * c-c++-common/gomp/unroll-non-rect-2.c: New test. * c-c++-common/gomp/unroll-non-rect-3.c: New test. * c-c++-common/gomp/unroll-simd-1.c: New test. * gcc.dg/gomp/attrs-4.c: Adjust expected diagnostics. * gcc.dg/gomp/for-1.c: Likewise. * gcc.dg/gomp/for-11.c: Likewise. * g++.dg/gomp/attrs-4.C: Likewise. * g++.dg/gomp/for-1.C: Likewise. * g++.dg/gomp/pr94512.C: Likewise. * g++.dg/gomp/tile-1.C: New test. * g++.dg/gomp/tile-2.C: New test. * g++.dg/gomp/unroll-1.C: New test. * g++.dg/gomp/unroll-2.C: New test. * g++.dg/gomp/unroll-3.C: New test. * gfortran.dg/gomp/inner-loops-1.f90: New test. * gfortran.dg/gomp/inner-loops-2.f90: New test. * gfortran.dg/gomp/pure-1.f90: Add tests for !$omp unroll and !$omp tile. * gfortran.dg/gomp/pure-2.f90: Remove those tests from here. * gfortran.dg/gomp/scan-9.f90: New test. * gfortran.dg/gomp/tile-1.f90: New test. * gfortran.dg/gomp/tile-2.f90: New test. * gfortran.dg/gomp/tile-3.f90: New test. * gfortran.dg/gomp/tile-4.f90: New test. * gfortran.dg/gomp/tile-5.f90: New test. * gfortran.dg/gomp/tile-6.f90: New test. * gfortran.dg/gomp/tile-7.f90: New test. * gfortran.dg/gomp/tile-8.f90: New test. * gfortran.dg/gomp/tile-9.f90: New test. * gfortran.dg/gomp/tile-10.f90: New test. * gfortran.dg/gomp/tile-imperfect-nest-1.f90: New test. * gfortran.dg/gomp/tile-imperfect-nest-2.f90: New test. * gfortran.dg/gomp/tile-inner-loops-1.f90: New test. * gfortran.dg/gomp/tile-inner-loops-2.f90: New test. * gfortran.dg/gomp/tile-inner-loops-3.f90: New test. * gfortran.dg/gomp/tile-inner-loops-4.f90: New test. * gfortran.dg/gomp/tile-inner-loops-5.f90: New test. * gfortran.dg/gomp/tile-inner-loops-6.f90: New test. * gfortran.dg/gomp/tile-inner-loops-7.f90: New test. * gfortran.dg/gomp/tile-inner-loops-8.f90: New test. * gfortran.dg/gomp/tile-non-rectangular-1.f90: New test. * gfortran.dg/gomp/tile-non-rectangular-2.f90: New test. * gfortran.dg/gomp/tile-non-rectangular-3.f90: New test. * gfortran.dg/gomp/tile-unroll-1.f90: New test. * gfortran.dg/gomp/tile-unroll-2.f90: New test. * gfortran.dg/gomp/unroll-1.f90: New test. * gfortran.dg/gomp/unroll-2.f90: New test. * gfortran.dg/gomp/unroll-3.f90: New test. * gfortran.dg/gomp/unroll-4.f90: New test. * gfortran.dg/gomp/unroll-5.f90: New test. * gfortran.dg/gomp/unroll-6.f90: New test. * gfortran.dg/gomp/unroll-7.f90: New test. * gfortran.dg/gomp/unroll-8.f90: New test. * gfortran.dg/gomp/unroll-9.f90: New test. * gfortran.dg/gomp/unroll-10.f90: New test. * gfortran.dg/gomp/unroll-11.f90: New test. * gfortran.dg/gomp/unroll-12.f90: New test. * gfortran.dg/gomp/unroll-13.f90: New test. * gfortran.dg/gomp/unroll-inner-loop-1.f90: New test. * gfortran.dg/gomp/unroll-inner-loop-2.f90: New test. * gfortran.dg/gomp/unroll-no-clause-1.f90: New test. * gfortran.dg/gomp/unroll-non-rect-1.f90: New test. * gfortran.dg/gomp/unroll-non-rect-2.f90: New test. * gfortran.dg/gomp/unroll-simd-1.f90: New test. * gfortran.dg/gomp/unroll-simd-2.f90: New test. * gfortran.dg/gomp/unroll-simd-3.f90: New test. * gfortran.dg/gomp/unroll-tile-1.f90: New test. * gfortran.dg/gomp/unroll-tile-2.f90: New test. * gfortran.dg/gomp/unroll-tile-inner-1.f90: New test. libgomp/ * testsuite/libgomp.c-c++-common/imperfect-transform-1.c: New test. * testsuite/libgomp.c-c++-common/imperfect-transform-2.c: New test. * testsuite/libgomp.c-c++-common/matrix-1.h: New test. * testsuite/libgomp.c-c++-common/matrix-constant-iter.h: New test. * testsuite/libgomp.c-c++-common/matrix-helper.h: New test. * testsuite/libgomp.c-c++-common/matrix-no-directive-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-distribute-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-parallel-masked-taskloop-simd-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-target-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-target-teams-distribute-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-taskloop-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-omp-teams-distribute-parallel-for-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-simd-1.c: New test. * testsuite/libgomp.c-c++-common/matrix-transform-variants-1.h: New test. * testsuite/libgomp.c-c++-common/target-imperfect-transform-1.c: New test. * testsuite/libgomp.c-c++-common/target-imperfect-transform-2.c: New test. * testsuite/libgomp.c-c++-common/unroll-1.c: New test. * testsuite/libgomp.c-c++-common/unroll-non-rect-1.c: New test. * testsuite/libgomp.c++/matrix-no-directive-unroll-full-1.C: New test. * testsuite/libgomp.c++/tile-2.C: New test. * testsuite/libgomp.c++/tile-3.C: New test. * testsuite/libgomp.c++/unroll-1.C: New test. * testsuite/libgomp.c++/unroll-2.C: New test. * testsuite/libgomp.c++/unroll-full-tile.C: New test. * testsuite/libgomp.fortran/imperfect-transform-1.f90: New test. * testsuite/libgomp.fortran/imperfect-transform-2.f90: New test. * testsuite/libgomp.fortran/inner-1.f90: New test. * testsuite/libgomp.fortran/nested-fn.f90: New test. * testsuite/libgomp.fortran/target-imperfect-transform-1.f90: New test. * testsuite/libgomp.fortran/target-imperfect-transform-2.f90: New test. * testsuite/libgomp.fortran/tile-1.f90: New test. * testsuite/libgomp.fortran/tile-2.f90: New test. * testsuite/libgomp.fortran/tile-unroll-1.f90: New test. * testsuite/libgomp.fortran/tile-unroll-2.f90: New test. * testsuite/libgomp.fortran/tile-unroll-3.f90: New test. * testsuite/libgomp.fortran/tile-unroll-4.f90: New test. * testsuite/libgomp.fortran/unroll-1.f90: New test. * testsuite/libgomp.fortran/unroll-2.f90: New test. * testsuite/libgomp.fortran/unroll-3.f90: New test. * testsuite/libgomp.fortran/unroll-4.f90: New test. * testsuite/libgomp.fortran/unroll-5.f90: New test. * testsuite/libgomp.fortran/unroll-6.f90: New test. * testsuite/libgomp.fortran/unroll-7a.f90: New test. * testsuite/libgomp.fortran/unroll-7b.f90: New test. * testsuite/libgomp.fortran/unroll-7c.f90: New test. * testsuite/libgomp.fortran/unroll-7.f90: New test. * testsuite/libgomp.fortran/unroll-8.f90: New test. * testsuite/libgomp.fortran/unroll-simd-1.f90: New test. * testsuite/libgomp.fortran/unroll-tile-1.f90: New test. * testsuite/libgomp.fortran/unroll-tile-2.f90: New test.
2024-05-31Convert references with "counted_by" attributes to/from .ACCESS_WITH_SIZE.Qing Zhao1-0/+8
Including the following changes: * The definition of the new internal function .ACCESS_WITH_SIZE in internal-fn.def. * C FE converts every reference to a FAM with a "counted_by" attribute to a call to the internal function .ACCESS_WITH_SIZE. (build_component_ref in c_typeck.cc) This includes the case when the object is statically allocated and initialized. In order to make this working, the routine digest_init in c-typeck.cc is updated to fold calls to .ACCESS_WITH_SIZE to its first argument when require_constant is TRUE. However, for the reference inside "offsetof", the "counted_by" attribute is ignored since it's not useful at all. (c_parser_postfix_expression in c/c-parser.cc) In addtion to "offsetof", for the reference inside operator "typeof" and "alignof", we ignore counted_by attribute too. When building ADDR_EXPR for the .ACCESS_WITH_SIZE in C FE, replace the call with its first argument. * Convert every call to .ACCESS_WITH_SIZE to its first argument. (expand_ACCESS_WITH_SIZE in internal-fn.cc) * Provide the utility routines to check the call is .ACCESS_WITH_SIZE and get the reference from the call to .ACCESS_WITH_SIZE. (is_access_with_size_p and get_ref_from_access_with_size in tree.cc) gcc/c/ChangeLog: * c-parser.cc (c_parser_postfix_expression): Ignore the counted-by attribute when build_component_ref inside offsetof operator. * c-tree.h (build_component_ref): Add one more parameter. * c-typeck.cc (build_counted_by_ref): New function. (build_access_with_size_for_counted_by): New function. (build_component_ref): Check the counted-by attribute and build call to .ACCESS_WITH_SIZE. (build_unary_op): When building ADDR_EXPR for .ACCESS_WITH_SIZE, use its first argument. (lvalue_p): Accept call to .ACCESS_WITH_SIZE. (digest_init): Fold call to .ACCESS_WITH_SIZE to its first argument when require_constant is TRUE. gcc/ChangeLog: * internal-fn.cc (expand_ACCESS_WITH_SIZE): New function. * internal-fn.def (ACCESS_WITH_SIZE): New internal function. * tree.cc (is_access_with_size_p): New function. (get_ref_from_access_with_size): New function. * tree.h (is_access_with_size_p): New prototype. (get_ref_from_access_with_size): New prototype. gcc/testsuite/ChangeLog: * gcc.dg/flex-array-counted-by-2.c: New test.
2024-04-04Add condition coverage (MC/DC)Jørgen Kvalsvik1-0/+4
This patch adds support in gcc+gcov for modified condition/decision coverage (MC/DC) with the -fcondition-coverage flag. MC/DC is a type of test/code coverage and it is particularly important for safety-critical applicaitons in industries like aviation and automotive. Notably, MC/DC is required or recommended by: * DO-178C for the most critical software (Level A) in avionics. * IEC 61508 for SIL 4. * ISO 26262-6 for ASIL D. From the SQLite webpage: Two methods of measuring test coverage were described above: "statement" and "branch" coverage. There are many other test coverage metrics besides these two. Another popular metric is "Modified Condition/Decision Coverage" or MC/DC. Wikipedia defines MC/DC as follows: * Each decision tries every possible outcome. * Each condition in a decision takes on every possible outcome. * Each entry and exit point is invoked. * Each condition in a decision is shown to independently affect the outcome of the decision. In the C programming language where && and || are "short-circuit" operators, MC/DC and branch coverage are very nearly the same thing. The primary difference is in boolean vector tests. One can test for any of several bits in bit-vector and still obtain 100% branch test coverage even though the second element of MC/DC - the requirement that each condition in a decision take on every possible outcome - might not be satisfied. https://sqlite.org/testing.html#mcdc MC/DC comes in different flavors, the most important being unique cause MC/DC and masking MC/DC. This patch implements masking MC/DC, which is works well with short circuiting semantics, and according to John Chilenski's "An Investigation of Three Forms of the Modified Condition Decision Coverage (MCDC) Criterion" (2001) is as good as unique cause at catching bugs. Whalen, Heimdahl, and De Silva "Efficient Test Coverage Measurement for MC/DC" describes an algorithm for finding the masking table from an AST walk, but my algorithm figures this out by analyzing the control flow graph. The CFG is considered a reduced ordered binary decision diagram and an input vector a path through the BDD, which is recorded. Specific edges will mask ("null out") the contribution from earlier path segments, which can be determined by finding short circuit endpoints. Masking is most easily understood as circuiting of terms in the reverse-ordered Boolean function, and the masked conditions do not affect the decision like short-circuited conditions do not affect the decision. A tag/discriminator mapping from gcond->uid is created during gimplification and made available through the function struct. The values are unimportant as long as basic conditions constructed from a single Boolean expression are given the same identifier. This happens in the breaking down of ANDIF/ORIF trees, so the coverage generally works well for frontends that create such trees. Like Whalen et al this implementation records coverage in fixed-size bitsets which gcov knows how to interpret. Recording conditions only requires a few bitwise operations per condition and is very fast, but comes with a limit on the number of terms in a single boolean expression; the number of bits in a gcov_unsigned_type (which is usually typedef'd to uint64_t). For most practical purposes this is acceptable, and by default a warning will be issued if gcc cannot instrument the expression. This is a practical limitation in the implementation, and not a limitation of the algorithm, so support for more conditions can be supported by introducing arbitrary-sized bitsets. In action it looks pretty similar to the branch coverage. The -g short opt carries no significance, but was chosen because it was an available option with the upper-case free too. gcov --conditions: 3: 17:void fn (int a, int b, int c, int d) { 3: 18: if ((a && (b || c)) && d) conditions covered 3/8 condition 0 not covered (true false) condition 1 not covered (true) condition 2 not covered (true) condition 3 not covered (true) 1: 19: x = 1; -: 20: else 2: 21: x = 2; 3: 22:} gcov --conditions --json-format: "conditions": [ { "not_covered_false": [ 0 ], "count": 8, "covered": 3, "not_covered_true": [ 0, 1, 2, 3 ] } ], Expressions with constants may be heavily rewritten before it reaches the gimplification, so constructs like int x = a ? 0 : 1 becomes _x = (_a == 0). From source you would expect coverage, but it gets neither branch nor condition coverage. The same applies to expressions like int x = 1 || a which are simply replaced by a constant. The test suite contains a lot of small programs and functions. Some of these were designed by hand to test for specific behaviours and graph shapes, and some are previously-failed test cases in other programs adapted into the test suite. gcc/ChangeLog: * builtins.cc (expand_builtin_fork_or_exec): Check condition_coverage_flag. * collect2.cc (main): Add -fno-condition-coverage to OBSTACK. * common.opt: Add new options -fcondition-coverage and -Wcoverage-too-many-conditions. * doc/gcov.texi: Add --conditions documentation. * doc/invoke.texi: Add -fcondition-coverage documentation. * function.cc (free_after_compilation): Free cond_uids. * function.h (struct function): Add cond_uids. * gcc.cc: Link gcov on -fcondition-coverage. * gcov-counter.def (GCOV_COUNTER_CONDS): New. * gcov-dump.cc (tag_conditions): New. * gcov-io.h (GCOV_TAG_CONDS): New. (GCOV_TAG_CONDS_LENGTH): New. (GCOV_TAG_CONDS_NUM): New. * gcov.cc (class condition_info): New. (condition_info::condition_info): New. (condition_info::popcount): New. (struct coverage_info): New. (add_condition_counts): New. (output_conditions): New. (print_usage): Add -g, --conditions. (process_args): Likewise. (output_intermediate_json_line): Output conditions. (read_graph_file): Read condition counters. (read_count_file): Likewise. (file_summary): Print conditions. (accumulate_line_info): Accumulate conditions. (output_line_details): Print conditions. * gimplify.cc (next_cond_uid): New. (reset_cond_uid): New. (shortcut_cond_r): Set condition discriminator. (tag_shortcut_cond): New. (gimple_associate_condition_with_expr): New. (shortcut_cond_expr): Set condition discriminator. (gimplify_cond_expr): Likewise. (gimplify_function_tree): Call reset_cond_uid. * ipa-inline.cc (can_early_inline_edge_p): Check condition_coverage_flag. * ipa-split.cc (pass_split_functions::gate): Likewise. * passes.cc (finish_optimization_passes): Likewise. * profile.cc (struct condcov): New declaration. (cov_length): Likewise. (cov_blocks): Likewise. (cov_masks): Likewise. (cov_maps): Likewise. (cov_free): Likewise. (instrument_decisions): New. (read_thunk_profile): Control output to file. (branch_prob): Call find_conditions, instrument_decisions. (init_branch_prob): Add total_num_conds. (end_branch_prob): Likewise. * tree-core.h (struct tree_exp): Add condition_uid. * tree-profile.cc (struct conds_ctx): New. (CONDITIONS_MAX_TERMS): New. (EDGE_CONDITION): New. (topological_cmp): New. (index_of): New. (single_p): New. (single_edge): New. (contract_edge_up): New. (struct outcomes): New. (conditional_succs): New. (condition_index): New. (condition_uid): New. (masking_vectors): New. (emit_assign): New. (emit_bitwise_op): New. (make_top_index_visit): New. (make_top_index): New. (paths_between): New. (struct condcov): New. (cov_length): New. (cov_blocks): New. (cov_masks): New. (cov_maps): New. (cov_free): New. (find_conditions): New. (struct counters): New. (find_counters): New. (resolve_counter): New. (resolve_counters): New. (instrument_decisions): New. (tree_profiling): Check condition_coverage_flag. (pass_ipa_tree_profile::gate): Likewise. * tree.h (SET_EXPR_UID): New. (EXPR_COND_UID): New. libgcc/ChangeLog: * libgcov-merge.c (__gcov_merge_ior): New. gcc/testsuite/ChangeLog: * lib/gcov.exp: Add condition coverage test function. * g++.dg/gcov/gcov-18.C: New test. * gcc.misc-tests/gcov-19.c: New test. * gcc.misc-tests/gcov-20.c: New test. * gcc.misc-tests/gcov-21.c: New test. * gcc.misc-tests/gcov-22.c: New test. * gcc.misc-tests/gcov-23.c: New test.
2024-03-14OpenACC 2.7: front-end support for readonly modifierChung-Lin Tang1-0/+8
This patch implements the front-end support for the 'readonly' modifier for the OpenACC 'copyin' clause and 'cache' directive. This currently only includes front-end parsing for C/C++/Fortran and setting of new bits OMP_CLAUSE_MAP_READONLY, OMP_CLAUSE__CACHE__READONLY. Further linking of these bits to points-to analysis and/or utilization of read-only memory in accelerator target are for later patches. gcc/c/ChangeLog: * c-parser.cc (c_parser_oacc_data_clause): Add parsing support for 'readonly' modifier, set OMP_CLAUSE_MAP_READONLY if readonly modifier found, update comments. (c_parser_oacc_cache): Add parsing support for 'readonly' modifier, set OMP_CLAUSE__CACHE__READONLY if readonly modifier found, update comments. gcc/cp/ChangeLog: * parser.cc (cp_parser_oacc_data_clause): Add parsing support for 'readonly' modifier, set OMP_CLAUSE_MAP_READONLY if readonly modifier found, update comments. (cp_parser_oacc_cache): Add parsing support for 'readonly' modifier, set OMP_CLAUSE__CACHE__READONLY if readonly modifier found, update comments. gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_namelist): Print "readonly," for OMP_LIST_MAP and OMP_LIST_CACHE if n->u.map.readonly is set. Adjust 'n->u.map_op' to 'n->u.map.op'. * gfortran.h (typedef struct gfc_omp_namelist): Adjust map_op as 'ENUM_BITFIELD (gfc_omp_map_op) op:8', add 'bool readonly' field, change to named struct field 'map'. * openmp.cc (gfc_match_omp_map_clause): Adjust 'n->u.map_op' to 'n->u.map.op'. (gfc_match_omp_clause_reduction): Likewise. (gfc_match_omp_clauses): Add readonly modifier parsing for OpenACC copyin clause, set 'n->u.map.op' and 'n->u.map.readonly' for parsed clause. Adjust 'n->u.map_op' to 'n->u.map.op'. (gfc_match_oacc_declare): Adjust 'n->u.map_op' to 'n->u.map.op'. (gfc_match_oacc_cache): Add readonly modifier parsing for OpenACC cache directive. (resolve_omp_clauses): Adjust 'n->u.map_op' to 'n->u.map.op'. * trans-decl.cc (add_clause): Adjust 'n->u.map_op' to 'n->u.map.op'. (finish_oacc_declare): Likewise. * trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_MAP_READONLY, OMP_CLAUSE__CACHE__READONLY to 1 when readonly is set. Adjust 'n->u.map_op' to 'n->u.map.op'. (gfc_add_clause_implicitly): Adjust 'n->u.map_op' to 'n->u.map.op'. gcc/ChangeLog: * tree.h (OMP_CLAUSE_MAP_READONLY): New macro. (OMP_CLAUSE__CACHE__READONLY): New macro. * tree-core.h (struct GTY(()) tree_base): Adjust comments for new uses of readonly_flag bit in OMP_CLAUSE_MAP_READONLY and OMP_CLAUSE__CACHE__READONLY. * tree-pretty-print.cc (dump_omp_clause): Add support for printing OMP_CLAUSE_MAP_READONLY and OMP_CLAUSE__CACHE__READONLY. gcc/testsuite/ChangeLog: * c-c++-common/goacc/readonly-1.c: New test. * gfortran.dg/goacc/readonly-1.f90: New test.
2024-02-24Use HOST_WIDE_INT_{C,UC,0,0U,1,1U} macros some moreJakub Jelinek1-1/+1
I've searched for some uses of (HOST_WIDE_INT) constant or (unsigned HOST_WIDE_INT) constant and turned them into uses of the appropriate macros. THere are quite a few cases in non-i386 backends but I've left that out for now. The only behavior change is in build_replicated_int_cst where the left shift was done in HOST_WIDE_INT type but assigned to unsigned HOST_WIDE_INT, which I've changed into unsigned HOST_WIDE_INT shift. 2024-02-24 Jakub Jelinek <jakub@redhat.com> gcc/ * builtins.cc (fold_builtin_isascii): Use HOST_WIDE_INT_UC macro. * combine.cc (make_field_assignment): Use HOST_WIDE_INT_1U macro. * double-int.cc (double_int::mask): Use HOST_WIDE_INT_UC macros. * genattrtab.cc (attr_alt_complement): Use HOST_WIDE_INT_1 macro. (mk_attr_alt): Use HOST_WIDE_INT_0 macro. * genautomata.cc (bitmap_set_bit, CLEAR_BIT): Use HOST_WIDE_INT_1 macros. * ipa-strub.cc (can_strub_internally_p): Use HOST_WIDE_INT_1 macro. * loop-iv.cc (implies_p): Use HOST_WIDE_INT_1U macro. * pretty-print.cc (test_pp_format): Use HOST_WIDE_INT_C and HOST_WIDE_INT_UC macros. * rtlanal.cc (nonzero_bits1): Use HOST_WIDE_INT_UC macro. * tree.cc (build_replicated_int_cst): Use HOST_WIDE_INT_1U macro. * tree.h (DECL_OFFSET_ALIGN): Use HOST_WIDE_INT_1U macro. * tree-ssa-structalias.cc (dump_varinfo): Use ~HOST_WIDE_INT_0U macros. * wide-int.cc (divmod_internal_2): Use HOST_WIDE_INT_1U macro. * config/i386/constraints.md (define_constraint "L"): Use HOST_WIDE_INT_C macro. * config/i386/i386.md (movabsq split peephole2): Use HOST_WIDE_INT_C macro. (movl + movb peephole2): Likewise. * config/i386/predicates.md (x86_64_zext_immediate_operand): Likewise. (const_32bit_mask): Likewise. gcc/objc/ * objc-encoding.cc (encode_array): Use HOST_WIDE_INT_0 macros.
2024-01-03Update copyright years.Jakub Jelinek1-1/+1
2023-12-16Add support for target_version attributeAndrew Carlotti1-2/+2
This patch adds support for the "target_version" attribute to the middle end and the C++ frontend, which will be used to implement function multiversioning in the aarch64 backend. On targets that don't use the "target" attribute for multiversioning, there is no conflict between the "target" and "target_clones" attributes. This patch therefore makes the mutual exclusion in C-family, D and Ada conditonal upon the value of the expanded_clones_attribute target hook. The "target_version" attribute is only added to C++ in this patch, because this is currently the only frontend which supports multiversioning using the "target" attribute. Support for the "target_version" attribute will be extended to C at a later date. Targets that currently use the "target" attribute for function multiversioning (i.e. i386 and rs6000) are not affected by this patch. gcc/ChangeLog: * attribs.cc (decl_attributes): Pass attribute name to target. (is_function_default_version): Update comment to specify incompatibility with target_version attributes. * cgraphclones.cc (cgraph_node::create_version_clone_with_body): Call valid_version_attribute_p for target_version attributes. * defaults.h (TARGET_HAS_FMV_TARGET_ATTRIBUTE): New macro. * target.def (valid_version_attribute_p): New hook. * doc/tm.texi.in: Add new hook. * doc/tm.texi: Regenerate. * multiple_target.cc (create_dispatcher_calls): Remove redundant is_function_default_version check. (expand_target_clones): Use target macro to pick attribute name. * targhooks.cc (default_target_option_valid_version_attribute_p): New. * targhooks.h (default_target_option_valid_version_attribute_p): New. * tree.h (DECL_FUNCTION_VERSIONED): Update comment to include target_version attributes. gcc/c-family/ChangeLog: * c-attribs.cc (attr_target_exclusions): Make target/target_clones exclusion target-dependent. (attr_target_clones_exclusions): Ditto, and add target_version. (attr_target_version_exclusions): New. (c_common_attribute_table): Add target_version. (handle_target_version_attribute): New. (handle_target_attribute): Amend comment. (handle_target_clones_attribute): Ditto. gcc/ada/ChangeLog: * gcc-interface/utils.cc (attr_target_exclusions): Make target/target_clones exclusion target-dependent. (attr_target_clones_exclusions): Ditto. gcc/d/ChangeLog: * d-attribs.cc (attr_target_exclusions): Make target/target_clones exclusion target-dependent. (attr_target_clones_exclusions): Ditto. gcc/cp/ChangeLog: * decl2.cc (check_classfn): Update comment to include target_version attributes.
2023-12-13OpenMP: Pointers and member mappingsJulian Brown1-0/+4
This patch changes the mapping node arrangement used for array components of derived types in order to accommodate for changes made in the previous patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed derived-type members instead of "GOMP_MAP_ALWAYS_POINTER". We change the mapping nodes used for a derived-type mapping like this: type T integer, pointer, dimension(:) :: arrptr end type T type(T) :: tvar [...] !$omp target map(tofrom: tvar%arrptr) So that the nodes used look like this: 1) map(to: tvar%arrptr) --> GOMP_MAP_TO [implicit] *tvar%arrptr%data (the array data) GOMP_MAP_TO_PSET tvar%arrptr (the descriptor) GOMP_MAP_ATTACH_DETACH tvar%arrptr%data 2) map(tofrom: tvar%arrptr(3:8) --> GOMP_MAP_TOFROM *tvar%arrptr%data(3) (size 8-3+1, etc.) GOMP_MAP_TO_PSET tvar%arrptr GOMP_MAP_ATTACH_DETACH tvar%arrptr%data (bias 3, etc.) In this case, we can determine in the front-end that the whole-array/pointer mapping (1) is only needed to map the pointer -- so we drop it entirely. (Note also that we set -- early -- the OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer mappings. See below.) In the middle end, we process mappings using the struct sibling-list handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle of the group of three mapping nodes to the proper sorted position after the GOMP_MAP_STRUCT mapping: GOMP_MAP_STRUCT tvar (len: 1) GOMP_MAP_TO_PSET tvar%arr (size: 64, etc.) <--. moved here [...] | GOMP_MAP_TOFROM *tvar%arrptr%data(3) ___| GOMP_MAP_ATTACH_DETACH tvar%arrptr%data In another case, if we have an array of derived-type values "dtarr", and mappings like: i = 1 j = 1 map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8)) We still map the same way, but this time we cannot prove that the base expressions "dtarr(i) and "dtarr(j)" are the same in the front-end. So we keep both mappings, but we move the "[implicit]" mapping of the full-array reference to the end of the clause list in gimplify.cc (by adjusting the topological sorting algorithm): GOMP_MAP_STRUCT dtvar (len: 2) GOMP_MAP_TO_PSET dtvar(i)%arrptr GOMP_MAP_TO_PSET dtvar(j)%arrptr [...] GOMP_MAP_TOFROM *dtvar(j)%arrptr%data(3) (size: 8-3+1) GOMP_MAP_ATTACH_DETACH dtvar(j)%arrptr%data GOMP_MAP_TO [implicit] *dtvar(i)%arrptr%data(1) (size: whole array) GOMP_MAP_ATTACH_DETACH dtvar(i)%arrptr%data Always moving "[implicit]" full-array mappings after array-section mappings (without that bit set) means that we'll avoid copying the whole array unnecessarily -- even in cases where we can't prove that the arrays are the same. The patch also fixes some bugs with "enter data" and "exit data" directives with this new mapping arrangement. Also now if you have mappings like this: #pragma omp target enter data map(to: dv, dv%arr(1:20)) The whole of the derived-type variable "dv" is mapped, so the GOMP_MAP_TO_PSET for the array-section mapping can be dropped: GOMP_MAP_TO dv GOMP_MAP_TO *dv%arr%data GOMP_MAP_TO_PSET dv%arr <-- deleted (array section mapping) GOMP_MAP_ATTACH_DETACH dv%arr%data To accommodate for recent changes to mapping nodes made by Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET for "exit data" directives, in favour of using the "correct" GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion. A new flag is introduced so the middle-end knows when the latter two kinds are being used specifically for an array descriptor. This version of the patch fixes "omp target exit data" handling for GOMP_MAP_DELETE, and adds pretty-printing dump output for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra clarity). Also I noticed the handling of descriptors on *OpenACC* exit-data directives was inconsistent, so I've made those use GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as OpenMP too. In the end it doesn't actually matter to the runtime, which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for array descriptors on OpenACC "exit data" directives the same, anyway, and doing it this way in the FE avoids needless divergence. I've added a couple of new tests (gomp/target-enter-exit-data.f90 and goacc/enter-exit-data-2.f90). 2023-12-07 Julian Brown <julian@codesourcery.com> gcc/fortran/ * dependency.cc (gfc_omp_expr_prefix_same): New function. * dependency.h (gfc_omp_expr_prefix_same): Add prototype. * gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2" union. * trans-openmp.cc (dependency.h): Include. (gfc_trans_omp_array_section): Adjust mapping node arrangement for array descriptors. Use GOMP_MAP_TO_PSET or GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR flag set. (gfc_symbol_rooted_namelist): New function. (gfc_trans_omp_clauses): Check subcomponent and subarray/element accesses elsewhere in the clause list for pointers to derived types or array descriptors, and adjust or drop mapping nodes appropriately. Adjust for changes to mapping node arrangement. (gfc_trans_oacc_executable_directive): Pass code op through. gcc/ * gimplify.cc (omp_map_clause_descriptor_p): New function. (build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use above function. (omp_tsort_mapping_groups): Process nodes that have OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't. Add enter_exit_data parameter. (omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if we're mapping the whole containing derived-type variable. (omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling. Remove GOMP_MAP_ALWAYS_POINTER handling. (gimplify_scan_omp_clauses): Pass enter_exit argument to omp_tsort_mapping_groups. Don't adjust/remove GOMP_MAP_TO_PSET mappings for derived-type components here. * tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro. * tree-pretty-print.cc (dump_omp_clause): Show OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with GOMP_MAP_TO_PSET-like syntax). gcc/testsuite/ * gfortran.dg/goacc/enter-exit-data-2.f90: New test. * gfortran.dg/goacc/finalize-1.f: Adjust scan output. * gfortran.dg/gomp/map-9.f90: Adjust scan output. * gfortran.dg/gomp/map-subarray-2.f90: New test. * gfortran.dg/gomp/map-subarray.f90: New test. * gfortran.dg/gomp/target-enter-exit-data.f90: New test. libgomp/ * testsuite/libgomp.fortran/map-subarray.f90: New test. * testsuite/libgomp.fortran/map-subarray-2.f90: New test. * testsuite/libgomp.fortran/map-subarray-3.f90: New test. * testsuite/libgomp.fortran/map-subarray-4.f90: New test. * testsuite/libgomp.fortran/map-subarray-6.f90: New test. * testsuite/libgomp.fortran/map-subarray-7.f90: New test. * testsuite/libgomp.fortran/map-subarray-8.f90: New test. * testsuite/libgomp.fortran/map-subcomponents.f90: New test. * testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for descriptor-mapping changes. Remove XFAIL.
2023-12-13OpenMP/OpenACC: Rework clause expansion and nested struct handlingJulian Brown1-0/+4
This patch reworks clause expansion in the C, C++ and (to a lesser extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in GPU offloading support. At present a single clause may be turned into several mapping nodes, or have its mapping type changed, in several places scattered through the front- and middle-end. The analysis relating to which particular transformations are needed for some given expression has become quite hard to follow. Briefly, we manipulate clause types in the following places: 1. During parsing, in c_omp_adjust_map_clauses. Depending on a set of rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into ATTACH_DETACH, or mark the decl addressable. 2. In semantics.cc or c-typeck.cc, clauses are expanded in handle_omp_array_sections (called via {c_}finish_omp_clauses, or in finish_omp_clauses itself. The two cases are for processing array sections (the former), or non-array sections (the latter). 3. In gimplify.cc, we build sibling lists for struct accesses, which groups and sorts accesses along with their struct base, creating new ALLOC/RELEASE nodes for pointers. 4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be adjusted or created. This patch doesn't completely disrupt this scheme, though clause types are no longer adjusted in c_omp_adjust_map_clauses (step 1). Clause expansion in step 2 (for C and C++) now uses a single, unified mechanism, parts of which are also reused for analysis in step 3. Rather than the kind-of "ad-hoc" pattern matching on addresses used to expand clauses used at present, a new method for analysing addresses is introduced. This does a recursive-descent tree walk on expression nodes, and emits a vector of tokens describing each "part" of the address. This tokenized address can then be translated directly into mapping nodes, with the assurance that no part of the expression has been inadvertently skipped or misinterpreted. In this way, all the variations of ways pointers, arrays, references and component accesses might be combined can be teased apart into easily-understood cases - and we know we've "parsed" the whole address before we start analysis, so the right code paths can easily be selected. For example, a simple access "arr[idx]" might parse as: base-decl access-indexed-array or "mystruct->foo[x]" with a pointer "foo" component might parse as: base-decl access-pointer component-selector access-pointer A key observation is that support for "array" bases, e.g. accesses whose root nodes are not structures, but describe scalars or arrays, and also *one-level deep* structure accesses, have first-class support in gimplify and beyond. Expressions that use deeper struct accesses or e.g. multiple indirections were more problematic: some cases worked, but lots of cases didn't. This patch reimplements the support for those in gimplify.cc, again using the new "address tokenization" support. An expression like "mystruct->foo->bar[0:10]" used in a mapping node will translate the right-hand access directly in the front-end. The base for the access will be "mystruct->foo". This is handled recursively in gimplify.cc -- there may be several accesses of "mystruct"'s members on the same directive, so the sibling-list building machinery can be used again. (This was already being done for OpenACC, but the new implementation differs somewhat in details, and is more robust.) For OpenMP, in the case where the base pointer itself, i.e. "mystruct->foo" here, is NOT mapped on the same directive, we create a "fragile" mapping. This turns the "foo" component access into a zero-length allocation (which is a new feature for the runtime, so support has been added there too). A couple of changes have been made to how mapping clauses are turned into mapping nodes: The first change is based on the observation that it is probably never correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for references), because if the containing struct is already mapped on the target then the host version of the pointer in question will be corrupted if the struct is copied back from the target. This patch removes all such uses, across each of C, C++ and Fortran. The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes are processed during sibling-list creation. For OpenMP, for pointer components, we must map the base pointer separately from an array section that uses the base pointer, so e.g. we must have both "map(mystruct.base)" and "map(mystruct.base[0:10])" mappings. These create nodes such as: GOMP_MAP_TOFROM mystruct.base G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base Instead of using the first of these directly when building the struct sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH, leading to: GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that drops the GOMP_MAP_TOFROM for the base pointer, marks the second group as having had a base-pointer mapping, then omp_build_struct_sibling_lists can create: GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize] This ends up working better in many cases, particularly those involving references. (The "alloc" space is immediately overwritten by a pointer attachment, so this is mildly more efficient than a redundant TO mapping at runtime also.) There is support in the address tokenizer for "arbitrary" base expressions which aren't rooted at a decl, but that is not used as present because such addresses are disallowed at parse time. In the front-ends, the address tokenization machinery is mostly only used for clause expansion and not for diagnostics at present. It could be used for those too, which would allow more of my previous "address inspector" implementation to be removed. The new bits in gimplify.cc work with OpenACC also. This version of the patch addresses several first-pass review comments from Tobias, and fixes a few previously-missed cases for manually-managed ragged array mappings (including cases using references). Some arbitrary differences between handling of clause expansion for C vs. C++ have also been fixed, and some fragments from later in the patch series have been moved forward (where they were useful for fixing bugs). Several new test cases have been added. 2023-11-29 Julian Brown <julian@codesourcery.com> gcc/c-family/ * c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA, C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET. (omp_addr_token): Add forward declaration. (c_omp_address_inspector): New class. * c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but do not change any mapping node types. (c_omp_address_inspector::unconverted_ref_origin, c_omp_address_inspector::component_access_p, c_omp_address_inspector::check_clause, c_omp_address_inspector::get_root_term, c_omp_address_inspector::map_supported_p, c_omp_address_inspector::get_origin, c_omp_address_inspector::maybe_unconvert_ref, c_omp_address_inspector::maybe_zero_length_array_section, c_omp_address_inspector::expand_array_base, c_omp_address_inspector::expand_component_selector, c_omp_address_inspector::expand_map_clause): New methods. (omp_expand_access_chain): New function. gcc/c/ * c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for c_finish_omp_clauses call. (c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses. (c_parser_oacc_compute): Likewise. (c_parser_omp_target_data, c_parser_omp_target_enter_data): Support ATTACH kind. (c_parser_omp_target_exit_data): Support DETACH kind. (check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here. * c-typeck.cc (handle_omp_array_sections_1, handle_omp_array_sections, c_finish_omp_clauses): Use c_omp_address_inspector class and OMP address tokenizer to analyze and expand map clause expressions. Fix some diagnostics. Fix "is OpenACC" condition for C_ORT_ACC_TARGET addition. gcc/cp/ * parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use to select region type for finish_omp_clauses call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support GOMP_MAP_ATTACH kind. (cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind. (cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses. (cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses. (cp_parser_oacc_compute): Likewise. * pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to tsubst_omp_clauses for OpenACC compute regions. * semantics.cc (cp_omp_address_inspector): New class, derived from c_omp_address_inspector. (handle_omp_array_sections_1, handle_omp_array_sections, finish_omp_clauses): Use cp_omp_address_inspector class and OMP address tokenizer to analyze and expand OpenMP map clause expressions. Fix some diagnostics. Support C_ORT_ACC_TARGET. (finish_omp_target): Handle GOMP_MAP_POINTER. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter. Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for derived type components. (gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section. gcc/ * gimplify.cc (build_struct_comp_nodes): Don't process GOMP_MAP_ATTACH_DETACH "middle" nodes here. (omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for nested struct handling. (omp_strip_components_and_deref, omp_strip_indirections): Remove functions. (omp_get_attachment): Handle GOMP_MAP_DETACH here. (omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH, GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer component array sections. (omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile fields. (omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT. (omp_index_mapping_groups_1): Skip reprocess_struct groups. (omp_get_nonfirstprivate_group, omp_directive_maps_explicitly, omp_resolve_clause_dependencies, omp_first_chained_access_token): New functions. (omp_check_mapping_compatibility): Adjust accepted node combinations for "from" clauses using release instead of alloc. (omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P, REPROCESSING_STRUCT, ADDED_TAIL parameters. Use OMP address tokenizer to analyze addresses. Reimplement nested struct handling, and implement "fragile groups". (omp_build_struct_sibling_lists): Adjust for changes to omp_accumulate_sibling_list. Recalculate bias for ATTACH_DETACH nodes after GOMP_MAP_STRUCT nodes. (gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies. Use OMP address tokenizer. (gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc instead of build_simple_mem_ref_loc. * omp-general.cc (omp-general.h, tree-pretty-print.h): Include. (omp_addr_tokenizer): New namespace. (omp_addr_tokenizer::omp_addr_token): New. (omp_addr_tokenizer::omp_parse_component_selector, omp_addr_tokenizer::omp_parse_ref, omp_addr_tokenizer::omp_parse_pointer, omp_addr_tokenizer::omp_parse_access_method, omp_addr_tokenizer::omp_parse_access_methods, omp_addr_tokenizer::omp_parse_structure_base, omp_addr_tokenizer::omp_parse_structured_expr, omp_addr_tokenizer::omp_parse_array_expr, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New functions. (omp_parse_expr, debug_omp_tokenized_addr): New functions. * omp-general.h (omp_addr_tokenizer::access_method_kinds, omp_addr_tokenizer::structure_base_kinds, omp_addr_tokenizer::token_type, omp_addr_tokenizer::omp_addr_token, omp_addr_tokenizer::omp_access_chain_p, omp_addr_tokenizer::omp_accessed_addr): New. (omp_addr_token, omp_parse_expr): New. * omp-low.cc (scan_sharing_clauses): Skip error check for references to pointers. * tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro. gcc/testsuite/ * c-c++-common/gomp/clauses-2.c: Fix error output. * c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output. * c-c++-common/gomp/target-50.c: Adjust scan output. * c-c++-common/gomp/target-enter-data-1.c: Adjust scan output. * g++.dg/gomp/static-component-1.C: New test. * gcc.dg/gomp/target-3.c: Adjust scan output. * gfortran.dg/gomp/map-9.f90: Adjust scan output. libgomp/ * target.c (gomp_map_pointer): Modify zero-length array section pointer handling. (gomp_attach_pointer): Likewise. (gomp_map_fields_existing): Use gomp_map_0len_lookup. (gomp_attach_pointer): Allow attaching null pointers (or Fortran "unassociated" pointers). (gomp_map_vars_internal): Handle zero-sized struct members. Add diagnostic for unmapped struct pointer members. * testsuite/libgomp.c-c++-common/baseptrs-1.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-2.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-6.c: New test. * testsuite/libgomp.c-c++-common/baseptrs-7.c: New test. * testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test. * testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing "free". * testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test. * testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test. * testsuite/libgomp.c++/class-array-1.C: New test. * testsuite/libgomp.c++/baseptrs-3.C: New test. * testsuite/libgomp.c++/baseptrs-4.C: New test. * testsuite/libgomp.c++/baseptrs-5.C: New test. * testsuite/libgomp.c++/baseptrs-8.C: New test. * testsuite/libgomp.c++/baseptrs-9.C: New test. * testsuite/libgomp.c++/ref-mapping-1.C: New test. * testsuite/libgomp.c++/target-48.C: New test. * testsuite/libgomp.c++/target-49.C: New test. * testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test. * testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2 semantics. * testsuite/libgomp.c++/target-this-3.C: Likewise. * testsuite/libgomp.c++/target-this-4.C: Likewise. * testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL. * testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.
2023-12-08tree-optimization/112774: extend the SCEV CHREC tree with a nonwrapping flagHao Liu1-3/+5
The flag is defined as CHREC_NOWRAP(tree), and will be dumped from "{offset, +, 1}_1" to "{offset, +, 1}<nw>_1" (nw is short for nonwrapping). Two SCEV interfaces record_nonwrapping_chrec and nonwrapping_chrec_p are added to set and check the flag respectively. As resetting the SCEV cache (i.e., the chrec trees) may not reset the loop->estimate_state, free_numbers_of_iterations_estimates is called explicitly in loop vectorization to make sure the flag can be calculated propriately by niter. gcc/ChangeLog: PR tree-optimization/112774 * tree-pretty-print.cc: if nonwrapping flag is set, chrec will be printed with additional <nw> info. * tree-scalar-evolution.cc: add record_nonwrapping_chrec and nonwrapping_chrec_p to set and check the new flag respectively. * tree-scalar-evolution.h: Likewise. * tree-ssa-loop-niter.cc (idx_infer_loop_bounds, infer_loop_bounds_from_pointer_arith, infer_loop_bounds_from_signedness, scev_probably_wraps_p): call record_nonwrapping_chrec before record_nonwrapping_iv, call nonwrapping_chrec_p to check the flag is set and return false from scev_probably_wraps_p. * tree-vect-loop.cc (vect_analyze_loop): call free_numbers_of_iterations_estimates explicitly. * tree-core.h: document the nothrow_flag usage in CHREC_NOWRAP * tree.h: add CHREC_NOWRAP(NODE), base.nothrow_flag is used to represent the nonwrapping info. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/scev-16.c: New test.
2023-11-07openmp: Add support for the 'indirect' clause in C/C++Kwok Cheung Yeung1-0/+4
This adds support for the 'indirect' clause in the 'declare target' directive. Functions declared as indirect may be called via function pointers passed from the host in offloaded code. Virtual calls to member functions via the object pointer in C++ are currently not supported in target regions. 2023-11-07 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/c-family/ * c-attribs.cc (c_common_attribute_table): Add attribute for indirect functions. * c-pragma.h (enum parma_omp_clause): Add entry for indirect clause. gcc/c/ * c-decl.cc (c_decl_attributes): Add attribute for indirect functions. * c-lang.h (c_omp_declare_target_attr): Add indirect field. * c-parser.cc (c_parser_omp_clause_name): Handle indirect clause. (c_parser_omp_clause_indirect): New. (c_parser_omp_all_clauses): Handle indirect clause. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (c_parser_omp_begin): Handle indirect clause. * c-typeck.cc (c_finish_omp_clauses): Handle indirect clause. gcc/cp/ * cp-tree.h (cp_omp_declare_target_attr): Add indirect field. * decl2.cc (cplus_decl_attributes): Add attribute for indirect functions. * parser.cc (cp_parser_omp_clause_name): Handle indirect clause. (cp_parser_omp_clause_indirect): New. (cp_parser_omp_all_clauses): Handle indirect clause. (handle_omp_declare_target_clause): Add extra parameter. Add indirect attribute for indirect functions. (OMP_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_declare_target): Handle indirect clause. Emit error message if device_type or indirect clauses used alone. Emit error if indirect clause used with device_type that is not 'any'. (OMP_BEGIN_DECLARE_TARGET_CLAUSE_MASK): Add indirect clause to mask. (cp_parser_omp_begin): Handle indirect clause. * semantics.cc (finish_omp_clauses): Handle indirect clause. gcc/ * lto-cgraph.cc (enum LTO_symtab_tags): Add tag for indirect functions. (output_offload_tables): Write indirect functions. (input_offload_tables): read indirect functions. * lto-section-names.h (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. * omp-builtins.def (BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR): New. * omp-offload.cc (offload_ind_funcs): New. (omp_discover_implicit_declare_target): Add functions marked with 'omp declare target indirect' to indirect functions list. (omp_finish_file): Add indirect functions to section for offload indirect functions. (execute_omp_device_lower): Redirect indirect calls on target by passing function pointer to BUILT_IN_GOMP_TARGET_MAP_INDIRECT_PTR. (pass_omp_device_lower::gate): Run pass_omp_device_lower if indirect functions are present on an accelerator device. * omp-offload.h (offload_ind_funcs): New. * tree-core.h (omp_clause_code): Add OMP_CLAUSE_INDIRECT. * tree.cc (omp_clause_num_ops): Add entry for OMP_CLAUSE_INDIRECT. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_INDIRECT_EXPR): New. * config/gcn/mkoffload.cc (process_asm): Process offload_ind_funcs section. Count number of indirect functions. (process_obj): Emit number of indirect functions. * config/nvptx/mkoffload.cc (ind_func_ids, ind_funcs_tail): New. (process): Emit offload_ind_func_table in PTX code. Emit indirect function names and count in image. * config/nvptx/nvptx.cc (nvptx_record_offload_symbol): Mark indirect functions in PTX code with IND_FUNC_MAP. gcc/testsuite/ * c-c++-common/gomp/declare-target-7.c: Update expected error message. * c-c++-common/gomp/declare-target-indirect-1.c: New. * c-c++-common/gomp/declare-target-indirect-2.c: New. * g++.dg/gomp/attrs-21.C (v12): Update expected error message. * g++.dg/gomp/declare-target-indirect-1.C: New. * gcc.dg/gomp/attrs-21.c (v12): Update expected error message. include/ * gomp-constants.h (GOMP_VERSION): Increment to 3. (GOMP_VERSION_SUPPORTS_INDIRECT_FUNCS): New. libgcc/ * offloadstuff.c (OFFLOAD_IND_FUNC_TABLE_SECTION_NAME): New. (__offload_ind_func_table): New. (__offload_ind_funcs_end): New. (__OFFLOAD_TABLE__): Add entries for indirect functions. libgomp/ * Makefile.am (libgomp_la_SOURCES): Add target-indirect.c. * Makefile.in: Regenerate. * libgomp-plugin.h (GOMP_INDIRECT_ADDR_MAP): New define. (GOMP_OFFLOAD_load_image): Add extra argument. * libgomp.h (struct indirect_splay_tree_key_s): New. (indirect_splay_tree_node, indirect_splay_tree, indirect_splay_tree_key): New. (indirect_splay_compare): New. * libgomp.map (GOMP_5.1.1): Add GOMP_target_map_indirect_ptr. * libgomp.texi (OpenMP 5.1): Update documentation on indirect calls in target region and on indirect clause. (Other new OpenMP 5.2 features): Add entry for virtual function calls. * libgomp_g.h (GOMP_target_map_indirect_ptr): Add prototype. * oacc-host.c (host_load_image): Add extra argument. * target.c (gomp_load_image_to_device): If the GOMP_VERSION is high enough, read host indirect functions table and pass to load_image_func. * config/accel/target-indirect.c: New. * config/linux/target-indirect.c: New. * config/gcn/team.c (build_indirect_map): Add prototype. (gomp_gcn_enter_kernel): Initialize support for indirect function calls on GCN target. * config/nvptx/team.c (build_indirect_map): Add prototype. (gomp_nvptx_main): Initialize support for indirect function calls on NVPTX target. * plugin/plugin-gcn.c (struct gcn_image_desc): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, build address translation table and copy it to target memory. * plugin/plugin-nvptx.c (nvptx_tdata): Add field for indirect functions count. (GOMP_OFFLOAD_load_image): Add extra argument. If the GOMP_VERSION is high enough, Build address translation table and copy it to target memory. * testsuite/libgomp.c-c++-common/declare-target-indirect-1.c: New. * testsuite/libgomp.c-c++-common/declare-target-indirect-2.c: New. * testsuite/libgomp.c++/declare-target-indirect-1.C: New.
2023-11-07c: Refer more consistently to C23 not C2XJoseph Myers1-1/+1
Continuing the move to refer to C23 in place of C2X throughout the source tree, update documentation, diagnostics, comments, variable and function names, etc., to use the C23 name. Testsuite updates are left for a future patch, except for testcases that test diagnostics that previously mentioned C2X (but in those testcases, sometimes other comments are updated, not just the diagnostic expectations). Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/ * builtins.def (DEF_C2X_BUILTIN): Rename to DEF_C23_BUILTIN and use flag_isoc23 and function_c23_misc. * config/rl78/rl78.cc (rl78_option_override): Compare lang_hooks.name with "GNU C23" not "GNU C2X". * coretypes.h (function_c2x_misc): Rename to function_c23_misc. * doc/cpp.texi (@code{__has_attribute}): Refer to C23 instead of C2x. * doc/extend.texi: Likewise. * doc/invoke.texi: Likewise. * dwarf2out.cc (highest_c_language, gen_compile_unit_die): Compare against and return "GNU C23" language string instead of "GNU C2X". * ginclude/float.h: Refer to C23 instead of C2X in comments. * ginclude/stdint-gcc.h: Likewise. * glimits.h: Likewise. * tree.h: Likewise. gcc/ada/ * gcc-interface/utils.cc (flag_isoc2x): Rename to flag_isoc23. gcc/c-family/ * c-common.cc (flag_isoc2x): Rename to flag_isoc23. (c_common_reswords): Use D_C23 instead of D_C2X. * c-common.h: Refer throughout to C23 instead of C2X in comments. (D_C2X): Rename to D_C23. (flag_isoc2x): Rename to flag_isoc23. * c-cppbuiltin.cc (builtin_define_float_constants): Use flag_isoc23 instead of flag_isoc2x. Refer to C23 instead of C2x in comments. * c-format.cc: Use STD_C23 instead of STD_C2X and flag_isoc23 instead of flag_isoc2x. Refer to C23 instead of C2X in comments. * c-format.h: Use STD_C23 instead of STD_C2X. * c-lex.cc: Use warn_c11_c23_compat instead of warn_c11_c2x_compat and flag_isoc23 instead of flag_isoc2x. Refer to C23 instead of C2X in diagnostics. * c-opts.cc: Use flag_isoc23 instead of flag_isoc2x. Refer to C23 instead of C2X in comments. (set_std_c2x): Rename to set_std_c23. * c.opt (Wc11-c23-compat): Use CPP(cpp_warn_c11_c23_compat) CppReason(CPP_W_C11_C23_COMPAT) Var(warn_c11_c23_compat) instead of CPP(cpp_warn_c11_c2x_compat) CppReason(CPP_W_C11_C2X_COMPAT) Var(warn_c11_c2x_compat). gcc/c/ * c-decl.cc: Use flag_isoc23 instead of flag_isoc2x and c23_auto_p instead of c2x_auto_p. Refer to C23 instead of C2X in diagnostics and comments. * c-errors.cc: Use flag_isoc23 instead of flag_isoc2x and warn_c11_c23_compat instead of warn_c11_c2x_compat. Refer to C23 instead of C2X in comments. * c-parser.cc: Use flag_isoc23 instead of flag_isoc2x, warn_c11_c23_compat instead of warn_c11_c2x_compat, c23_auto_p instead of c2x_auto_p and D_C23 instead of D_C2X. Refer to C23 instead of C2X in diagnostics and comments. * c-tree.h: Refer to C23 instead of C2X in comments. (struct c_declspecs): Rename c2x_auto_p to c23_auto_p. * c-typeck.cc: Use flag_isoc23 instead of flag_isoc2x and warn_c11_c23_compat instead of warn_c11_c2x_compat. Refer to C23 instead of C2X in diagnostics and comments. gcc/fortran/ * gfortran.h (gfc_real_info): Refer to C23 instead of C2X in comment. gcc/lto/ * lto-lang.cc (flag_isoc2x): Rename to flag_isoc23. gcc/testsuite/ * gcc.dg/binary-constants-2.c: Refer to C23 instead of C2X. * gcc.dg/binary-constants-3.c: Likewise. * gcc.dg/bitint-23.c: Likewise. * gcc.dg/bitint-26.c: Likewise. * gcc.dg/bitint-27.c: Likewise. * gcc.dg/c11-attr-syntax-1.c: Likewise. * gcc.dg/c11-attr-syntax-2.c: Likewise. * gcc.dg/c11-floatn-1.c: Likewise. * gcc.dg/c11-floatn-2.c: Likewise. * gcc.dg/c11-floatn-3.c: Likewise. * gcc.dg/c11-floatn-4.c: Likewise. * gcc.dg/c11-floatn-5.c: Likewise. * gcc.dg/c11-floatn-6.c: Likewise. * gcc.dg/c11-floatn-7.c: Likewise. * gcc.dg/c11-floatn-8.c: Likewise. * gcc.dg/c2x-attr-syntax-4.c: Likewise. * gcc.dg/c2x-attr-syntax-6.c: Likewise. * gcc.dg/c2x-attr-syntax-7.c: Likewise. * gcc.dg/c2x-binary-constants-2.c: Likewise. * gcc.dg/c2x-floatn-5.c: Likewise. * gcc.dg/c2x-floatn-6.c: Likewise. * gcc.dg/c2x-floatn-7.c: Likewise. * gcc.dg/c2x-floatn-8.c: Likewise. * gcc.dg/c2x-nullptr-4.c: Likewise. * gcc.dg/c2x-qual-2.c: Likewise. * gcc.dg/c2x-qual-3.c: Likewise. * gcc.dg/c2x-qual-6.c: Likewise. * gcc.dg/cpp/c11-warning-1.c: Likewise. * gcc.dg/cpp/c11-warning-2.c: Likewise. * gcc.dg/cpp/c11-warning-3.c: Likewise. * gcc.dg/cpp/c2x-warning-2.c: Likewise. * gcc.dg/cpp/gnu11-elifdef-3.c: Likewise. * gcc.dg/cpp/gnu11-elifdef-4.c: Likewise. * gcc.dg/cpp/gnu11-warning-1.c: Likewise. * gcc.dg/cpp/gnu11-warning-2.c: Likewise. * gcc.dg/cpp/gnu11-warning-3.c: Likewise. * gcc.dg/cpp/gnu2x-warning-2.c: Likewise. * gcc.dg/dfp/c11-constants-1.c: Likewise. * gcc.dg/dfp/c11-constants-2.c: Likewise. * gcc.dg/dfp/c2x-constants-2.c: Likewise. * gcc.dg/dfp/constants-pedantic.c: Likewise. * gcc.dg/pr30260.c: Likewise. * gcc.dg/system-binary-constants-1.c: Likewise. libcpp/ * directives.cc: Refer to C23 instead of C2X in diagnostics and comments. (STDC2X): Rename to STDC23. * expr.cc: Use cpp_warn_c11_c23_compat instead of cpp_warn_c11_c2x_compat and CPP_W_C11_C23_COMPAT instead of CPP_W_C11_C2X_COMPAT. Refer to C23 instead of C2X in diagnostics and comments. * include/cpplib.h: Refer to C23 instead of C2X in diagnostics and comments. (CLK_GNUC2X): Rename to CLK_GNUC23. (CLK_STDC2X): Rename to CLK_STDC23. (CPP_W_C11_C2X_COMPAT): Rename to CPP_W_C11_C23_COMPAT. * init.cc: Use GNUC23 instead of GNUC2X, STDC23 instead of STDC2X and cpp_warn_c11_c23_compat instead of cpp_warn_c11_c2x_compat. * lex.cc (maybe_va_opt_error): Refer to C23 instead of C2X in diagnostic. * macro.cc (_cpp_arguments_ok): Refer to C23 instead of C2X in comment.
2023-10-25Consistently order 'OMP_CLAUSE_SELF' right after 'OMP_CLAUSE_IF'Thomas Schwinge1-2/+2
As noted in recent commit 3a3596389c2e539cb8fd5dc5784a4e2afe193a2a "OpenACC 2.7: Implement self clause for compute constructs", the OpenACC 'self' clause very much relates to the 'if' clause, and therefore copies a lot of the latter's handling. Therefore it makes sense to also place this handling in proximity to that of the 'if' clause, which was done in a lot but not all instances. gcc/ * tree-core.h (omp_clause_code): Move 'OMP_CLAUSE_SELF' after 'OMP_CLAUSE_IF'. * tree-pretty-print.cc (dump_omp_clause): Adjust. * tree.cc (omp_clause_num_ops, omp_clause_code_name): Likewise. * tree.h: Likewise.
2023-10-25OpenACC 2.7: Implement self clause for compute constructsChung-Lin Tang1-0/+2
This patch implements the 'self' clause for compute constructs: parallel, kernels, and serial. This clause conditionally uses the local device (the host mult-core CPU) as the executing device of the compute region. The actual implementation of the "local device" device type inside libgomp (presumably using pthreads) is still not yet completed, so the libgomp side is still implemented the exact same as host-fallback mode. (so as of now, it essentially behaves like the 'if' clause with the condition inverted) gcc/c/ChangeLog: * c-parser.cc (c_parser_oacc_compute_clause_self): New function. (c_parser_oacc_all_clauses): Add new 'bool compute_p = false' parameter, add parsing of self clause when compute_p is true. (OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF. (OACC_PARALLEL_CLAUSE_MASK): Likewise, (OACC_SERIAL_CLAUSE_MASK): Likewise. (c_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to set compute_p argument to true. * c-typeck.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case. gcc/cp/ChangeLog: * parser.cc (cp_parser_oacc_compute_clause_self): New function. (cp_parser_oacc_all_clauses): Add new 'bool compute_p = false' parameter, add parsing of self clause when compute_p is true. (OACC_KERNELS_CLAUSE_MASK): Add PRAGMA_OACC_CLAUSE_SELF. (OACC_PARALLEL_CLAUSE_MASK): Likewise, (OACC_SERIAL_CLAUSE_MASK): Likewise. (cp_parser_oacc_compute): Adjust call to c_parser_oacc_all_clauses to set compute_p argument to true. * pt.cc (tsubst_omp_clauses): Add OMP_CLAUSE_SELF case. * semantics.cc (c_finish_omp_clauses): Add OMP_CLAUSE_SELF case, merged with OMP_CLAUSE_IF case. gcc/fortran/ChangeLog: * gfortran.h (typedef struct gfc_omp_clauses): Add self_expr field. * openmp.cc (enum omp_mask2): Add OMP_CLAUSE_SELF. (gfc_match_omp_clauses): Add handling for OMP_CLAUSE_SELF. (OACC_PARALLEL_CLAUSES): Add OMP_CLAUSE_SELF. (OACC_KERNELS_CLAUSES): Likewise. (OACC_SERIAL_CLAUSES): Likewise. (resolve_omp_clauses): Add handling for omp_clauses->self_expr. * trans-openmp.cc (gfc_trans_omp_clauses): Add handling of clauses->self_expr and building of OMP_CLAUSE_SELF tree clause. (gfc_split_omp_clauses): Add handling of self_expr field copy. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Add OMP_CLAUSE_SELF case. (gimplify_adjust_omp_clauses): Likewise. * omp-expand.cc (expand_omp_target): Add OMP_CLAUSE_SELF expansion code, * omp-low.cc (scan_sharing_clauses): Add OMP_CLAUSE_SELF case. * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_SELF enum. * tree-nested.cc (convert_nonlocal_omp_clauses): Add OMP_CLAUSE_SELF case. (convert_local_omp_clauses): Likewise. * tree-pretty-print.cc (dump_omp_clause): Add OMP_CLAUSE_SELF case. * tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_SELF entry. (omp_clause_code_name): Likewise. * tree.h (OMP_CLAUSE_SELF_EXPR): New macro. gcc/testsuite/ChangeLog: * c-c++-common/goacc/self-clause-1.c: New test. * c-c++-common/goacc/self-clause-2.c: New test. * gfortran.dg/goacc/self.f95: New test. include/ChangeLog: * gomp-constants.h (GOACC_FLAG_LOCAL_DEVICE): New flag bit value. libgomp/ChangeLog: * oacc-parallel.c (GOACC_parallel_keyed): Add code to handle GOACC_FLAG_LOCAL_DEVICE case. * testsuite/libgomp.oacc-c-c++-common/self-1.c: New test.
2023-10-20tree-optimization/111000 - restrict invariant motion of shiftsRichard Biener1-0/+1
The following restricts moving variable shifts to when they are always executed in the loop as we currently do not have an efficient way to rewrite them to something that is unconditionally well-defined and value range analysis will otherwise compute invalid ranges for the shift operand. PR tree-optimization/111000 * stor-layout.h (element_precision): Move .. * tree.h (element_precision): .. here. * tree-ssa-loop-im.cc (movement_possibility_1): Restrict motion of shifts and rotates. * gcc.dg/torture/pr111000.c: New testcase.
2023-10-14middle-end: Allow _BitInt(65535) [PR102989]Jakub Jelinek1-3/+9
The following patch lifts further restrictions which limited _BitInt to at most 16319 bits up to 65535. The problem was mainly in INTEGER_CST representation, which had 3 unsigned char members to describe lengths in number of 64-bit limbs, which it wanted to fit into 32 bits. This patch removes the third one which was just a cache to save a few compile time cycles for wi::to_offset and enlarges the other two members to unsigned short. Furthermore, the same problem has been in some uses of trailing_wide_int* (in value-range-storage*) and value-range-storage* itself, while other uses of trailing_wide_int* have been fine (e.g. CONST_POLY_INT, where no constants will be larger than 3/5/9/11 limbs depending on target, so 255 limit is plenty). The patch turns all those length representations to be unsigned short for consistency, so value-range-storage* can handle even 16320-65535 bits BITINT_TYPE ranges. The cc1plus growth is about 16K, so not really significant for 38M .text section. Note, the reason for the new limit is unsigned int precision : 16; TYPE_PRECISION limit, if we wanted to overcome that, TYPE_PRECISION would need to use some other member for BITINT_TYPE from all the others and we could reach that way 4194239 limit (65535 * 64 - 1, again implied by INTEGER_CST and value-range-storage*). Dunno if that is worth it or if it is something we want to do for GCC 14 though. 2023-10-14 Jakub Jelinek <jakub@redhat.com> PR c/102989 gcc/ * tree-core.h (struct tree_base): Remove int_length.offset member, change type of int_length.unextended and int_length.extended from unsigned char to unsigned short. * tree.h (TREE_INT_CST_OFFSET_NUNITS): Remove. (wi::extended_tree <N>::get_len): Don't use TREE_INT_CST_OFFSET_NUNITS, instead compute it at runtime from TREE_INT_CST_EXT_NUNITS and TREE_INT_CST_NUNITS. * tree.cc (wide_int_to_tree_1): Don't assert TREE_INT_CST_OFFSET_NUNITS value. (make_int_cst): Don't initialize TREE_INT_CST_OFFSET_NUNITS. * wide-int.h (WIDE_INT_MAX_ELTS): Change from 255 to 1024. (WIDEST_INT_MAX_ELTS): Change from 510 to 2048, adjust comment. (trailing_wide_int_storage): Change m_len type from unsigned char * to unsigned short *. (trailing_wide_int_storage::trailing_wide_int_storage): Change second argument from unsigned char * to unsigned short *. (trailing_wide_ints): Change m_max_len type from unsigned char to unsigned short. Change m_len element type from struct{unsigned char len;} to unsigned short. (trailing_wide_ints <N>::operator []): Remove .len from m_len accesses. * value-range-storage.h (irange_storage::lengths_address): Change return type from const unsigned char * to const unsigned short *. (irange_storage::write_lengths_address): Change return type from unsigned char * to unsigned short *. * value-range-storage.cc (irange_storage::write_lengths_address): Likewise. (irange_storage::lengths_address): Change return type from const unsigned char * to const unsigned short *. (write_wide_int): Change len argument type from unsigned char *& to unsigned short *&. (irange_storage::set_irange): Change len variable type from unsigned char * to unsigned short *. (read_wide_int): Change len argument type from unsigned char to unsigned short. Use trailing_wide_int_storage <unsigned short> instead of trailing_wide_int_storage and trailing_wide_int <unsigned short> instead of trailing_wide_int. (irange_storage::get_irange): Change len variable type from unsigned char * to unsigned short *. (irange_storage::size): Multiply n by sizeof (unsigned short) in len_size variable initialization. (irange_storage::dump): Change len variable type from unsigned char * to unsigned short *. gcc/cp/ * module.cc (trees_out::start, trees_in::start): Remove TREE_INT_CST_OFFSET_NUNITS handling. gcc/testsuite/ * gcc.dg/bitint-38.c: Change into dg-do run test, in addition to checking the addition, division and right shift results at compile time check it also at runtime. * gcc.dg/bitint-39.c: New test.
2023-10-12wide-int: Fix build with gcc < 12 or clang++ [PR111787]Jakub Jelinek1-4/+19
While my wide_int patch bootstrapped/regtested fine when I used GCC 12 as system gcc, apparently it doesn't with GCC 11 and older or clang++. For GCC before PR96555 C++ DR1315 implementation the compiler complains about template argument involving template parameters, for clang++ the same + complains about missing needs_write_val_arg static data member in some wi::int_traits specializations. 2023-10-12 Jakub Jelinek <jakub@redhat.com> PR bootstrap/111787 * tree.h (wi::int_traits <unextended_tree>::needs_write_val_arg): New static data member. (int_traits <extended_tree <N>>::needs_write_val_arg): Likewise. (wi::ints_for): Provide separate partial specializations for generic_wide_int <extended_tree <N>> and INL_CONST_PRECISION or that and CONST_PRECISION, rather than using int_traits <extended_tree <N> >::precision_type as the second template argument. * rtl.h (wi::int_traits <rtx_mode_t>::needs_write_val_arg): New static data member. * double-int.h (wi::int_traits <double_int>::needs_write_val_arg): Likewise.
2023-10-12wide-int: Allow up to 16320 bits wide_int and change widest_int precision to ↵Jakub Jelinek1-6/+9
32640 bits [PR102989] As mentioned in the _BitInt support thread, _BitInt(N) is currently limited by the wide_int/widest_int maximum precision limitation, which is depending on target 191, 319, 575 or 703 bits (one less than WIDE_INT_MAX_PRECISION). That is fairly low limit for _BitInt, especially on the targets with the 191 bit limitation. The following patch bumps that limit to 16319 bits on all arches (which support _BitInt at all), which is the limit imposed by INTEGER_CST representation (unsigned char members holding number of HOST_WIDE_INT limbs). In order to achieve that, wide_int is changed from a trivially copyable type which contained just an inline array of WIDE_INT_MAX_ELTS (3, 5, 9 or 11 limbs depending on target) limbs into a non-trivially copy constructible, copy assignable and destructible type which for the usual small cases (up to WIDE_INT_MAX_INL_ELTS which is the former WIDE_INT_MAX_ELTS) still uses an inline array of limbs, but for larger precisions uses heap allocated limb array. This makes wide_int unusable in GC structures, so for dwarf2out which was the only place which needed it there is a new rwide_int type (restricted wide_int) which supports only up to RWIDE_INT_MAX_ELTS limbs inline and is trivially copyable (dwarf2out should never deal with large _BitInt constants, those should have been lowered earlier). Similarly, widest_int has been changed from a trivially copyable type which contained also an inline array of WIDE_INT_MAX_ELTS limbs (but unlike wide_int didn't contain precision and assumed that to be WIDE_INT_MAX_PRECISION) into a non-trivially copy constructible, copy assignable and destructible type which has always WIDEST_INT_MAX_PRECISION precision (32640 bits currently, twice as much as INTEGER_CST limitation allows) and unlike wide_int decides depending on get_len () value whether it uses an inline array (again, up to WIDE_INT_MAX_INL_ELTS) or heap allocated one. In wide-int.h this means we need to estimate an upper bound on how many limbs will wide-int.cc (usually, sometimes wide-int.h) need to write, heap allocate if needed based on that estimation and upon set_len which is done at the end if we guessed over WIDE_INT_MAX_INL_ELTS and allocated dynamically, while we actually need less than that copy/deallocate. The unexact guesses are needed because the exact computation of the length in wide-int.cc is sometimes quite complex and especially canonicalize at the end can decrease it. widest_int is again because of this not usable in GC structures, so cfgloop.h has been changed to use fixed_wide_int_storage <WIDE_INT_MAX_INL_PRECISION> and punt if we'd have larger _BitInt based iterators, programs having more than 128-bit iterators will be hopefully rare and I think it is fine to treat loops with more than 2^127 iterations as effectively possibly infinite, omp-general.cc is changed to use fixed_wide_int_storage <1024>, as it better should support scores with the same precision on all arches. Code which used WIDE_INT_PRINT_BUFFER_SIZE sized buffers for printing wide_int/widest_int into buffer had to be changed to use XALLOCAVEC for larger lengths. On x86_64, the patch in --enable-checking=yes,rtl,extra configured bootstrapped cc1plus enlarges the .text section by 1.01% - from 0x25725a5 to 0x25e5555 and similarly at least when compiling insn-recog.cc with the usual bootstrap option slows compilation down by 1.01%, user 4m22.046s and 4m22.384s on vanilla trunk vs. 4m25.947s and 4m25.581s on patched trunk. I'm afraid some code size growth and compile time slowdown is unavoidable in this case, we use wide_int and widest_int everywhere, and while the rare cases are marked with UNLIKELY macros, it still means extra checks for it. The patch also regresses +FAIL: gm2/pim/fail/largeconst.mod, -O +FAIL: gm2/pim/fail/largeconst.mod, -O -g +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst.mod, -Os +FAIL: gm2/pim/fail/largeconst.mod, -g +FAIL: gm2/pim/fail/largeconst2.mod, -O +FAIL: gm2/pim/fail/largeconst2.mod, -O -g +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer +FAIL: gm2/pim/fail/largeconst2.mod, -O3 -fomit-frame-pointer -finline-functions +FAIL: gm2/pim/fail/largeconst2.mod, -Os +FAIL: gm2/pim/fail/largeconst2.mod, -g tests, which previously were rejected with error: constant literal ‘12345678912345678912345679123456789123456789123456789123456789123456791234567891234567891234567891234567891234567912345678912345678912345678912345678912345679123456789123456789’ exceeds internal ZTYPE range kind of errors, but now are accepted. Seems the FE tries to parse constants into widest_int in that case and only diagnoses if widest_int overflows, that seems wrong, it should at least punt if stuff doesn't fit into WIDE_INT_MAX_PRECISION, but perhaps far less than that, if it wants support for middle-end for precisions above 128-bit, it better should be using BITINT_TYPE. Will file a PR and defer to Modula2 maintainer. 2023-10-12 Jakub Jelinek <jakub@redhat.com> PR c/102989 * wide-int.h: Adjust file comment. (WIDE_INT_MAX_INL_ELTS): Define to former value of WIDE_INT_MAX_ELTS. (WIDE_INT_MAX_INL_PRECISION): Define. (WIDE_INT_MAX_ELTS): Change to 255. Assert that WIDE_INT_MAX_INL_ELTS is smaller than WIDE_INT_MAX_ELTS. (RWIDE_INT_MAX_ELTS, RWIDE_INT_MAX_PRECISION, WIDEST_INT_MAX_ELTS, WIDEST_INT_MAX_PRECISION): Define. (WI_BINARY_RESULT_VAR, WI_UNARY_RESULT_VAR): Change write_val callers to pass 0 as a new argument. (class widest_int_storage): Likewise. (widest_int, widest2_int): Change typedefs to use widest_int_storage rather than fixed_wide_int_storage. (enum wi::precision_type): Add INL_CONST_PRECISION enumerator. (struct binary_traits): Add partial specializations for INL_CONST_PRECISION. (generic_wide_int): Add needs_write_val_arg static data member. (int_traits): Likewise. (wide_int_storage): Replace val non-static data member with a union u of it and HOST_WIDE_INT *valp. Declare copy constructor, copy assignment operator and destructor. Add unsigned int argument to write_val. (wide_int_storage::wide_int_storage): Initialize precision to 0 in the default ctor. Remove unnecessary {}s around STATIC_ASSERTs. Assert in non-default ctor T's precision_type is not INL_CONST_PRECISION and allocate u.valp for large precision. Add copy constructor. (wide_int_storage::~wide_int_storage): New. (wide_int_storage::operator=): Add copy assignment operator. In assignment operator remove unnecessary {}s around STATIC_ASSERTs, assert ctor T's precision_type is not INL_CONST_PRECISION and if precision changes, deallocate and/or allocate u.valp. (wide_int_storage::get_val): Return u.valp rather than u.val for large precision. (wide_int_storage::write_val): Likewise. Add an unused unsigned int argument. (wide_int_storage::set_len): Use write_val instead of writing val directly. (wide_int_storage::from, wide_int_storage::from_array): Adjust write_val callers. (wide_int_storage::create): Allocate u.valp for large precisions. (wi::int_traits <wide_int_storage>::get_binary_precision): New. (fixed_wide_int_storage::fixed_wide_int_storage): Make default ctor defaulted. (fixed_wide_int_storage::write_val): Add unused unsigned int argument. (fixed_wide_int_storage::from, fixed_wide_int_storage::from_array): Adjust write_val callers. (wi::int_traits <fixed_wide_int_storage>::get_binary_precision): New. (WIDEST_INT): Define. (widest_int_storage): New template class. (wi::int_traits <widest_int_storage>): New. (trailing_wide_int_storage::write_val): Add unused unsigned int argument. (wi::get_binary_precision): Use wi::int_traits <WI_BINARY_RESULT (T1, T2)>::get_binary_precision rather than get_precision on get_binary_result. (wi::copy): Adjust write_val callers. Don't call set_len if needs_write_val_arg. (wi::bit_not): If result.needs_write_val_arg, call write_val again with upper bound estimate of len. (wi::sext, wi::zext, wi::set_bit): Likewise. (wi::bit_and, wi::bit_and_not, wi::bit_or, wi::bit_or_not, wi::bit_xor, wi::add, wi::sub, wi::mul, wi::mul_high, wi::div_trunc, wi::div_floor, wi::div_ceil, wi::div_round, wi::divmod_trunc, wi::mod_trunc, wi::mod_floor, wi::mod_ceil, wi::mod_round, wi::lshift, wi::lrshift, wi::arshift): Likewise. (wi::bswap, wi::bitreverse): Assert result.needs_write_val_arg is false. (gt_ggc_mx, gt_pch_nx): Remove generic template for all generic_wide_int, instead add functions and templates for each storage of generic_wide_int. Make functions for generic_wide_int <wide_int_storage> and templates for generic_wide_int <widest_int_storage <N>> deleted. (wi::mask, wi::shifted_mask): Adjust write_val calls. * wide-int.cc (zeros): Decrease array size to 1. (BLOCKS_NEEDED): Use CEIL. (canonize): Use HOST_WIDE_INT_M1. (wi::from_buffer): Pass 0 to write_val. (wi::to_mpz): Use CEIL. (wi::from_mpz): Likewise. Pass 0 to write_val. Use WIDE_INT_MAX_INL_ELTS instead of WIDE_INT_MAX_ELTS. (wi::mul_internal): Use WIDE_INT_MAX_INL_PRECISION instead of MAX_BITSIZE_MODE_ANY_INT in automatic array sizes, for prec above WIDE_INT_MAX_INL_PRECISION estimate precision from lengths of operands. Use XALLOCAVEC allocated buffers for prec above WIDE_INT_MAX_INL_PRECISION. (wi::divmod_internal): Likewise. (wi::lshift_large): For len > WIDE_INT_MAX_INL_ELTS estimate it from xlen and skip. (rshift_large_common): Remove xprecision argument, add len argument with len computed in caller. Don't return anything. (wi::lrshift_large, wi::arshift_large): Compute len here and pass it to rshift_large_common, for lengths above WIDE_INT_MAX_INL_ELTS using estimations from xlen if possible. (assert_deceq, assert_hexeq): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. (test_printing): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.h (WIDE_INT_PRINT_BUFFER_SIZE): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION. * wide-int-print.cc (print_decs, print_decu, print_hex): For lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * tree.h (wi::int_traits<extended_tree <N>>): Change precision_type to INL_CONST_PRECISION for N == ADDR_MAX_PRECISION. (widest_extended_tree): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::ints_for): Use int_traits <extended_tree <N> >::precision_type instead of hard coded CONST_PRECISION. (widest2_int_cst): Use WIDEST_INT_MAX_PRECISION instead of WIDE_INT_MAX_PRECISION. (wi::extended_tree <N>::get_len): Use WIDEST_INT_MAX_PRECISION rather than WIDE_INT_MAX_PRECISION. (wi::ints_for::zero): Use wi::int_traits <wi::extended_tree <N> >::precision_type instead of wi::CONST_PRECISION. * tree.cc (build_replicated_int_cst): Formatting fix. Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. * print-tree.cc (print_node): Don't print TREE_UNAVAILABLE on INTEGER_CSTs, TREE_VECs or SSA_NAMEs. * double-int.h (wi::int_traits <double_int>::precision_type): Change to INL_CONST_PRECISION from CONST_PRECISION. * poly-int.h (struct poly_coeff_traits): Add partial specialization for wi::INL_CONST_PRECISION. * cfgloop.h (bound_wide_int): New typedef. (struct nb_iter_bound): Change bound type from widest_int to bound_wide_int. (struct loop): Change nb_iterations_upper_bound, nb_iterations_likely_upper_bound and nb_iterations_estimate type from widest_int to bound_wide_int. * cfgloop.cc (record_niter_bound): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (get_estimated_loop_iterations, get_max_loop_iterations, get_likely_max_loop_iterations): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * tree-vect-loop.cc (vect_transform_loop): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Use XALLOCAVEC allocated buffer for i_bound len above WIDE_INT_MAX_INL_ELTS. (record_estimate): Return early if wi::min_precision of i_bound is too large for bound_wide_int. Adjustments for the widest_int to bound_wide_int type change in non-static data members. (wide_int_cmp): Use bound_wide_int instead of widest_int. (bound_index): Use bound_wide_int instead of widest_int. (discover_iteration_bound_by_body_walk): Likewise. Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (maybe_lower_iteration_bound): Use widest_int::from to convert it to widest_int when passed to record_niter_bound. (estimate_numbers_of_iteration): Don't record upper bound if loop->nb_iterations has too large precision for bound_wide_int. (n_of_executions_at_most): Use widest_int::from. * tree-ssa-loop-ivcanon.cc (remove_redundant_iv_tests): Adjust for the widest_int to bound_wide_int changes. * match.pd (fold_sign_changed_comparison simplification): Use wide_int::from on wi::to_wide instead of wi::to_widest. * value-range.h (irange::maybe_resize): Avoid using memcpy on non-trivially copyable elements. * value-range.cc (irange_bitmask::dump): Use XALLOCAVEC allocated buffer for mask or value len above WIDE_INT_PRINT_BUFFER_SIZE. * fold-const.cc (fold_convert_const_int_from_int, fold_unary_loc): Use wide_int::from on wi::to_wide instead of wi::to_widest. * tree-ssa-ccp.cc (bit_value_binop): Zero extend r1max from width before calling wi::udiv_trunc. * lto-streamer-out.cc (output_cfg): Adjustments for the widest_int to bound_wide_int type change in non-static data members. * lto-streamer-in.cc (input_cfg): Likewise. (lto_input_tree_1): Use WIDE_INT_MAX_INL_ELTS rather than WIDE_INT_MAX_ELTS. For length above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. Formatting fix. * data-streamer-in.cc (streamer_read_wide_int, streamer_read_widest_int): Likewise. * tree-affine.cc (aff_combination_expand): Use placement new to construct name_expansion. (free_name_expansion): Destruct name_expansion. * gimple-ssa-strength-reduction.cc (struct slsr_cand_d): Change index type from widest_int to offset_int. (class incr_info_d): Change incr type from widest_int to offset_int. (alloc_cand_and_find_basis, backtrace_base_for_ref, restructure_reference, slsr_process_ref, create_mul_ssa_cand, create_mul_imm_cand, create_add_ssa_cand, create_add_imm_cand, slsr_process_add, cand_abs_increment, replace_mult_candidate, replace_unconditional_candidate, incr_vec_index, create_add_on_incoming_edge, create_phi_basis_1, replace_conditional_candidate, record_increment, record_phi_increments_1, phi_incr_cost_1, phi_incr_cost, lowest_cost_path, total_savings, ncd_with_phi, ncd_of_cand_and_phis, nearest_common_dominator_for_cands, insert_initializers, all_phi_incrs_profitable_1, replace_one_candidate, replace_profitable_candidates): Use offset_int rather than widest_int and wi::to_offset rather than wi::to_widest. * real.cc (real_to_integer): Use WIDE_INT_MAX_INL_ELTS rather than 2 * WIDE_INT_MAX_ELTS and for words above that use XALLOCAVEC allocated buffer. * tree-ssa-loop-ivopts.cc (niter_for_exit): Use placement new to construct tree_niter_desc and destruct it on failure. (free_tree_niter_desc): Destruct tree_niter_desc if value is non-NULL. * gengtype.cc (main): Remove widest_int handling. * graphite-isl-ast-to-gimple.cc (widest_int_from_isl_expr_int): Use WIDEST_INT_MAX_ELTS instead of WIDE_INT_MAX_ELTS. * gimple-ssa-warn-alloca.cc (pass_walloca::execute): Use WIDE_INT_MAX_INL_PRECISION instead of WIDE_INT_MAX_PRECISION and assert get_len () fits into it. * value-range-pretty-print.cc (vrange_printer::print_irange_bitmasks): For mask or value lengths above WIDE_INT_MAX_INL_ELTS use XALLOCAVEC allocated buffer. * gimple-ssa-sprintf.cc (adjust_range_for_overflow): Use wide_int::from on wi::to_wide instead of wi::to_widest. * omp-general.cc (score_wide_int): New typedef. (omp_context_compute_score): Use score_wide_int instead of widest_int and adjust for those changes. (struct omp_declare_variant_entry): Change score and score_in_declare_simd_clone non-static data member type from widest_int to score_wide_int. (omp_resolve_late_declare_variant, omp_resolve_declare_variant): Use score_wide_int instead of widest_int and adjust for those changes. (omp_lto_output_declare_variant_alt): Likewise. (omp_lto_input_declare_variant_alt): Likewise. * godump.cc (go_output_typedef): Assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/c-family/ * c-warn.cc (match_case_to_enum_1): Use wi::to_wide just once instead of 3 times, assert get_len () is smaller than WIDE_INT_MAX_INL_ELTS. gcc/testsuite/ * gcc.dg/bitint-38.c: New test.
2023-09-29Remove poly_int_podRichard Sandiford1-4/+4
poly_int was written before the switch to C++11 and so couldn't use explicit default constructors. This led to an awkward split between poly_int_pod and poly_int. poly_int simply inherited from poly_int_pod and added constructors, with the argumentless constructor having an empty body. But inheritance meant that poly_int had to repeat the assignment operators from poly_int_pod (again, no C++11, so no "using" to inherit base-class implementations). All that goes away if we switch to using default constructors. The main complication is ensuring that braced initialisation still gives a constexpr, so that static variables can be initialised without runtime code. The two problems here are: (1) When initialising a poly_int<N, wide_int> with fewer than N coefficients, the other coefficients need to be a zero of the same precision as the explicit coefficients. This was previously done in a for loop using wi::ints_for<...>::zero, but C++11 constexpr constructors can't have function bodies. The patch instead uses a series of delegated initialisers to fill in the implicit coefficients. (2) The initialisation in: void f(int x) { unsigned int foo {x}; } produces the warning: warning: narrowing conversion of 'x' from 'int' to 'unsigned int' [-Wnarrowing] whereas: void f(int x) { unsigned int foo = x; } does not. So switching to direct initialisation of the coeffs array would mean that: poly_uin64_t x = 0; would trigger a warning for using 0 rather than 0u. That seemed overly pedantic, so the patch adds explicit casts to the constructor. The complication is to do that without adding extra code to wide-int versions. The patch uses a new init_cast type for that. gcc/ * poly-int.h (poly_int_pod): Delete. (poly_coeff_traits::init_cast): New type. (poly_int_full, poly_int_hungry, poly_int_fullness): New structures. (poly_int): Replace constructors that take 1 and 2 coefficients with a general one that takes an arbitrary number of coefficients. Delegate initialization to two new private constructors, one of which uses the coefficients as-is and one of which adds an extra zero of the appropriate type (and precision, where applicable). (gt_ggc_mx, gt_pch_nx): Operate on poly_ints rather than poly_int_pods. * poly-int-types.h (poly_uint16_pod, poly_int64_pod, poly_uint64_pod) (poly_offset_int_pod, poly_wide_int_pod, poly_widest_int_pod): Delete. * gengtype.cc (main): Don't register poly_int64_pod. * calls.cc (initialize_argument_information): Use poly_int rather than poly_int_pod. (combine_pending_stack_adjustment_and_call): Likewise. * config/aarch64/aarch64.cc (pure_scalable_type_info): Likewise. * data-streamer.h (bp_unpack_poly_value): Likewise. * dwarf2cfi.cc (struct dw_trace_info): Likewise. (struct queued_reg_save): Likewise. * dwarf2out.h (struct dw_cfa_location): Likewise. * emit-rtl.h (struct incoming_args): Likewise. (struct rtl_data): Likewise. * expr.cc (get_bit_range): Likewise. (get_inner_reference): Likewise. * expr.h (get_bit_range): Likewise. * fold-const.cc (split_address_to_core_and_offset): Likewise. (ptr_difference_const): Likewise. * fold-const.h (ptr_difference_const): Likewise. * function.cc (try_fit_stack_local): Likewise. (instantiate_new_reg): Likewise. * function.h (struct expr_status): Likewise. (struct args_size): Likewise. * genmodes.cc (ZERO_COEFFS): Likewise. (mode_size_inline): Likewise. (mode_nunits_inline): Likewise. (emit_mode_precision): Likewise. (emit_mode_size): Likewise. (emit_mode_nunits): Likewise. * gimple-fold.cc (get_base_constructor): Likewise. * gimple-ssa-store-merging.cc (struct symbolic_number): Likewise. * inchash.h (class hash): Likewise. * ipa-modref-tree.cc (modref_access_node::dump): Likewise. * ipa-modref.cc (modref_access_analysis::merge_call_side_effects): Likewise. * ira-int.h (ira_spilled_reg_stack_slot): Likewise. * lra-eliminations.cc (self_elim_offsets): Likewise. * machmode.h (mode_size, mode_precision, mode_nunits): Likewise. * omp-low.cc (omplow_simd_context): Likewise. * pretty-print.cc (pp_wide_integer): Likewise. * pretty-print.h (pp_wide_integer): Likewise. * reload.cc (struct decomposition): Likewise. * reload.h (struct reload): Likewise. * reload1.cc (spill_stack_slot_width): Likewise. (struct elim_table): Likewise. (offsets_at): Likewise. (init_eliminable_invariants): Likewise. * rtl.h (union rtunion): Likewise. (poly_int_rtx_p): Likewise. (strip_offset): Likewise. (strip_offset_and_add): Likewise. * rtlanal.cc (strip_offset): Likewise. * tree-dfa.cc (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-dfa.h (get_ref_base_and_extent): Likewise. (get_addr_base_and_unit_offset_1): Likewise. (get_addr_base_and_unit_offset): Likewise. * tree-ssa-loop-ivopts.cc (struct iv_use): Likewise. (strip_offset): Likewise. * tree-ssa-sccvn.h (struct vn_reference_op_struct): Likewise. * tree.cc (ptrdiff_tree_p): Likewise. * tree.h (poly_int_tree_p): Likewise. (ptrdiff_tree_p): Likewise. (get_inner_reference): Likewise. gcc/testsuite/ * gcc.dg/plugin/poly-int-tests.h (test_num_coeffs_extra): Use poly_int rather than poly_int_pod.
2023-09-06Middle-end _BitInt support [PR102989]Jakub Jelinek1-6/+88
The following patch introduces the middle-end part of the _BitInt support, a new BITINT_TYPE, handling it where needed, except the lowering pass and sanitizer support. 2023-09-06 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree.def (BITINT_TYPE): New type. * tree.h (TREE_CHECK6, TREE_NOT_CHECK6): Define. (NUMERICAL_TYPE_CHECK, INTEGRAL_TYPE_P): Include BITINT_TYPE. (BITINT_TYPE_P): Define. (CONSTRUCTOR_BITFIELD_P): Return true even for BLKmode bit-fields if they have BITINT_TYPE type. (tree_check6, tree_not_check6): New inline functions. (any_integral_type_check): Include BITINT_TYPE. (build_bitint_type): Declare. * tree.cc (tree_code_size, wide_int_to_tree_1, cache_integer_cst, build_zero_cst, type_hash_canon_hash, type_cache_hasher::equal, type_hash_canon): Handle BITINT_TYPE. (bitint_type_cache): New variable. (build_bitint_type): New function. (signed_or_unsigned_type_for, verify_type_variant, verify_type): Handle BITINT_TYPE. (tree_cc_finalize): Free bitint_type_cache. * builtins.cc (type_to_class): Handle BITINT_TYPE. (fold_builtin_unordered_cmp): Handle BITINT_TYPE like INTEGER_TYPE. * cfgexpand.cc (expand_debug_expr): Punt on BLKmode BITINT_TYPE INTEGER_CSTs. * convert.cc (convert_to_pointer_1, convert_to_real_1, convert_to_complex_1): Handle BITINT_TYPE like INTEGER_TYPE. (convert_to_integer_1): Likewise. For BITINT_TYPE don't check GET_MODE_PRECISION (TYPE_MODE (type)). * doc/generic.texi (BITINT_TYPE): Document. * doc/tm.texi.in (TARGET_C_BITINT_TYPE_INFO): New. * doc/tm.texi: Regenerated. * dwarf2out.cc (base_type_die, is_base_type, modified_type_die, gen_type_die_with_usage): Handle BITINT_TYPE. (rtl_for_decl_init): Punt on BLKmode BITINT_TYPE INTEGER_CSTs or handle those which fit into shwi. * expr.cc (expand_expr_real_1): Define EXTEND_BITINT macro, reduce to bitfield precision reads from BITINT_TYPE vars, parameters or memory locations. Expand large/huge BITINT_TYPE INTEGER_CSTs into memory. * fold-const.cc (fold_convert_loc, make_range_step): Handle BITINT_TYPE. (extract_muldiv_1): For BITINT_TYPE use TYPE_PRECISION rather than GET_MODE_SIZE (SCALAR_INT_TYPE_MODE). (native_encode_int, native_interpret_int, native_interpret_expr): Handle BITINT_TYPE. * gimple-expr.cc (useless_type_conversion_p): Make BITINT_TYPE to some other integral type or vice versa conversions non-useless. * gimple-fold.cc (gimple_fold_builtin_memset): Punt for BITINT_TYPE. (clear_padding_unit): Mention in comment that _BitInt types don't need to fit either. (clear_padding_bitint_needs_padding_p): New function. (clear_padding_type_may_have_padding_p): Handle BITINT_TYPE. (clear_padding_type): Likewise. * internal-fn.cc (expand_mul_overflow): For unsigned non-mode precision operands force pos_neg? to 1. (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): New functions. * internal-fn.def (MULBITINT, DIVMODBITINT, FLOATTOBITINT, BITINTTOFLOAT): New internal functions. * internal-fn.h (expand_MULBITINT, expand_DIVMODBITINT, expand_FLOATTOBITINT, expand_BITINTTOFLOAT): Declare. * match.pd (non-equality compare simplifications from fold_binary): Punt if TYPE_MODE (arg1_type) is BLKmode. * pretty-print.h (pp_wide_int): Handle printing of large precision wide_ints which would buffer overflow digit_buffer. * stor-layout.cc (finish_bitfield_representative): For bit-fields with BITINT_TYPE, prefer representatives with precisions in multiple of limb precision. (layout_type): Handle BITINT_TYPE. Handle COMPLEX_TYPE with BLKmode element type and assert it is BITINT_TYPE. * target.def (bitint_type_info): New C target hook. * target.h (struct bitint_info): New type. * targhooks.cc (default_bitint_type_info): New function. * targhooks.h (default_bitint_type_info): Declare. * tree-pretty-print.cc (dump_generic_node): Handle BITINT_TYPE. Handle printing large wide_ints which would buffer overflow digit_buffer. * tree-ssa-sccvn.cc: Include target.h. (eliminate_dom_walker::eliminate_stmt): Punt for large/huge BITINT_TYPE. * tree-switch-conversion.cc (jump_table_cluster::emit): For more than 64-bit BITINT_TYPE subtract low bound from expression and cast to 64-bit integer type both the controlling expression and case labels. * typeclass.h (enum type_class): Add bitint_type_class enumerator. * varasm.cc (output_constant): Handle BITINT_TYPE INTEGER_CSTs. * vr-values.cc (check_for_binary_op_overflow): Use widest2_int rather than widest_int. (simplify_using_ranges::simplify_internal_call_using_ranges): Use unsigned_type_for rather than build_nonstandard_integer_type.
2023-08-25OpenMP: Add OMP_STRUCTURED_BLOCK and GIMPLE_OMP_STRUCTURED_BLOCK.Sandra Loosemore1-0/+3
In order to detect invalid jumps in and out of intervening code in imperfectly-nested loops, the front ends need to insert some sort of marker to identify the structured block sequences that they push into the inner body of the loop. The error checking happens in the diagnose_omp_blocks pass, between gimplification and OMP lowering, so we need both GENERIC and GIMPLE representations of these markers. They are removed in OMP lowering so no subsequent passes need to know about them. This patch doesn't include any front-end changes to generate the new data structures. gcc/cp/ChangeLog * constexpr.cc (cxx_eval_constant_expression): Handle OMP_STRUCTURED_BLOCK. * pt.cc (tsubst_expr): Likewise. gcc/ChangeLog * doc/generic.texi (OpenMP): Document OMP_STRUCTURED_BLOCK. * doc/gimple.texi (GIMPLE instruction set): Add GIMPLE_OMP_STRUCTURED_BLOCK. (GIMPLE_OMP_STRUCTURED_BLOCK): New subsection. * gimple-low.cc (lower_stmt): Error on GIMPLE_OMP_STRUCTURED_BLOCK. * gimple-pretty-print.cc (dump_gimple_omp_block): Handle GIMPLE_OMP_STRUCTURED_BLOCK. (pp_gimple_stmt_1): Likewise. * gimple-walk.cc (walk_gimple_stmt): Likewise. * gimple.cc (gimple_build_omp_structured_block): New. * gimple.def (GIMPLE_OMP_STRUCTURED_BLOCK): New. * gimple.h (gimple_build_omp_structured_block): Declare. (gimple_has_substatements): Handle GIMPLE_OMP_STRUCTURED_BLOCK. (CASE_GIMPLE_OMP): Likewise. * gimplify.cc (is_gimple_stmt): Handle OMP_STRUCTURED_BLOCK. (gimplify_expr): Likewise. * omp-expand.cc (GIMPLE_OMP_STRUCTURED_BLOCK): Error on GIMPLE_OMP_STRUCTURED_BLOCK. * omp-low.cc (scan_omp_1_stmt): Handle GIMPLE_OMP_STRUCTURED_BLOCK. (lower_omp_1): Likewise. (diagnose_sb_1): Likewise. (diagnose_sb_2): Likewise. * tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_STRUCTURED_BLOCK. (estimate_num_insns): Likewise. * tree-nested.cc (convert_nonlocal_reference_stmt): Likewise. (convert_local_reference_stmt): Likewise. (convert_gimple_call): Likewise. * tree-pretty-print.cc (dump_generic_node): Handle OMP_STRUCTURED_BLOCK. * tree.def (OMP_STRUCTURED_BLOCK): New. * tree.h (OMP_STRUCTURED_BLOCK_BODY): New.
2023-07-12tree: Hide wi::from_mpz from GENERATOR_FILEKewen Lin1-0/+2
Similar to r0-85707-g34917a102a4e0c for PR35051, the uses of mpz_t should be guarded with "#ifndef GENERATOR_FILE". This patch is to fix it and avoid some possible build errors. gcc/ChangeLog: * tree.h (wi::from_mpz): Hide from GENERATOR_FILE.
2023-07-04middle-end/110495 - avoid associating constants with (VL) vectorsRichard Biener1-1/+1
When trying to associate (v + INT_MAX) + INT_MAX we are using the TREE_OVERFLOW bit to check for correctness. That isn't working for VECTOR_CSTs and it can't in general when one considers VL vectors. It looks like it should work for COMPLEX_CSTs but I didn't try to single out _Complex int in this change. The following makes sure that for vectors we use the fallback of using unsigned arithmetic when associating the above to v + (INT_MAX + INT_MAX). PR middle-end/110495 * tree.h (TREE_OVERFLOW): Do not mention VECTOR_CSTs since we do not set TREE_OVERFLOW on those since the introduction of VL vectors. * match.pd (x +- CST +- CST): For VECTOR_CST do not look at TREE_OVERFLOW to determine validity of association. * gcc.dg/tree-ssa/addadd-2.c: Amend. * gcc.dg/tree-ssa/forwprop-27.c: Adjust.
2023-07-03tree+ggc: Change return type of predicate functions from int to boolUros Bizjak1-3/+3
Also change internal variable from int to bool. gcc/ChangeLog: * tree.h (tree_int_cst_equal): Change return type from int to bool. (operand_equal_for_phi_arg_p): Ditto. (tree_map_base_marked_p): Ditto. * tree.cc (contains_placeholder_p): Update function body for bool return type. (type_cache_hasher::equal): Ditto. (tree_map_base_hash): Change return type from int to void and adjust function body accordingly. (tree_int_cst_equal): Ditto. (operand_equal_for_phi_arg_p): Ditto. (get_narrower): Change "first" variable to bool. (cl_option_hasher::equal): Update function body for bool return type. * ggc.h (ggc_set_mark): Change return type from int to bool. (ggc_marked_p): Ditto. * ggc-page.cc (gt_ggc_mx): Change return type from int to void and adjust function body accordingly. (ggc_set_mark): Ditto.
2023-06-29Introduce IR bit TYPE_INCLUDES_FLEXARRAY for the GCC extensionQing Zhao1-1/+6
on a structure with a C99 flexible array member being nested in another structure GCC extension accepts the case when a struct with a flexible array member is embedded into another struct or union (possibly recursively) as the last field. This patch is to introduce the IR bit TYPE_INCLUDES_FLEXARRAY (reuse the existing IR bit TYPE_NO_NAMED_ARGS_SATDARG_P), set it correctly in C FE, stream it correctly in Middle-end, and print it during IR dumping. gcc/c/ChangeLog: * c-decl.cc (finish_struct): Set TYPE_INCLUDES_FLEXARRAY for struct/union type. gcc/lto/ChangeLog: * lto-common.cc (compare_tree_sccs_1): Compare bit TYPE_NO_NAMED_ARGS_STDARG_P or TYPE_INCLUDES_FLEXARRAY properly for its corresponding type. gcc/ChangeLog: * print-tree.cc (print_node): Print new bit type_include_flexarray. * tree-core.h (struct tree_type_common): Use bit no_named_args_stdarg_p as type_include_flexarray for RECORD_TYPE or UNION_TYPE. * tree-streamer-in.cc (unpack_ts_type_common_value_fields): Stream in bit no_named_args_stdarg_p properly for its corresponding type. * tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream out bit no_named_args_stdarg_p properly for its corresponding type. * tree.h (TYPE_INCLUDES_FLEXARRAY): New macro TYPE_INCLUDES_FLEXARRAY.
2023-06-28Prevent TYPE_PRECISION on VECTOR_TYPEsRichard Biener1-1/+3
The following makes sure that using TYPE_PRECISION on VECTOR_TYPE ICEs when tree checking is enabled. This should avoid wrong-code in cases like PR110182 and instead ICE. It also introduces a TYPE_PRECISION_RAW accessor and adjusts places I found that are eligible to use that. * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE. (TYPE_PRECISION_RAW): Provide raw access to the precision field. * tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW. (gimple_canonical_types_compatible_p): Likewise. * tree-streamer-out.cc (pack_ts_type_common_value_fields): Stream TYPE_PRECISION_RAW. * tree-streamer-in.cc (unpack_ts_type_common_value_fields): Likewise. * lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW. gcc/lto/ * lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.
2023-06-06openmp: Add support for the 'present' modifierTobias Burnus1-0/+3
This implements support for the OpenMP 5.1 'present' modifier, which can be used in map clauses in the 'target', 'target data', 'target data enter' and 'target data exit' constructs, and in the 'to' and 'from' clauses of the 'target update' construct. It is also supported in defaultmap. The modifier triggers a fatal runtime error if the data specified by the clause is not already present on the target device. It can also be combined with 'always' in map clauses. 2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> gcc/c/ * c-parser.cc (c_parser_omp_clause_defaultmap, c_parser_omp_clause_map): Parse 'present'. (c_parser_omp_clause_to, c_parser_omp_clause_from): Remove. (c_parser_omp_clause_from_to): New; parse to/from clauses with optional present modifer. (c_parser_omp_all_clauses): Update call. (c_parser_omp_target_data, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Handle new map enum values for 'present' mapping. gcc/cp/ * parser.cc (cp_parser_omp_clause_defaultmap, cp_parser_omp_clause_map): Parse 'present'. (cp_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (cp_parser_omp_all_clauses): Update call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data): Handle new enum value for 'present' mapping. * semantics.cc (finish_omp_target): Likewise. gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Display 'present' map modifier. (show_omp_clauses): Display 'present' motion modifier for 'to' and 'from' clauses. * gfortran.h (enum gfc_omp_map_op): Add entries with 'present' modifiers. (struct gfc_omp_namelist): Add 'present_modifer'. * openmp.cc (gfc_match_motion_var_list): New, handles optional 'present' modifier for to/from clauses. (gfc_match_omp_clauses): Call it for to/from clauses; parse 'present' in defaultmap and map clauses. (resolve_omp_clauses): Allow 'present' modifiers on 'target', 'target data', 'target enter' and 'target exit' directives. * trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for defaultmap. gcc/ * gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag set. (omp_get_attachment): Handle map clauses with 'present' modifier. (omp_group_base): Likewise. (gimplify_scan_omp_clauses): Reorder present maps to come first. Set GOVD flags for present defaultmaps. (gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps. * omp-low.cc (scan_sharing_clauses): Handle 'always, present' map clauses. (lower_omp_target): Handle map clauses with 'present' modifier. Handle 'to' and 'from' clauses with 'present'. * tree-core.h (enum omp_clause_defaultmap_kind): Add OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind. * tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and 'from' clauses with 'present' modifier. Handle present defaultmap. * tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define. include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New. (GOMP_MAP_FLAG_FORCE): Redefine. (GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New. (enum gomp_map_kind): Add map kinds with 'present' modifiers. (GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for map variants with 'present' (GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true for map variants with 'always, present' modifiers. (GOMP_MAP_ALWAYS): Redefine. (GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New. libgomp/ * libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses. * target.c (gomp_to_device_kind_p): Add map kinds with 'present' modifier. (gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro. (gomp_map_vars_internal, gomp_update, gomp_target_rev): Emit runtime error if memory region not present. * testsuite/libgomp.c-c++-common/target-present-1.c: New test. * testsuite/libgomp.c-c++-common/target-present-2.c: New test. * testsuite/libgomp.c-c++-common/target-present-3.c: New test. * testsuite/libgomp.fortran/target-present-1.f90: New test. * testsuite/libgomp.fortran/target-present-2.f90: New test. * testsuite/libgomp.fortran/target-present-3.f90: New test. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update dg-error, extend to test for duplicated 'present' and extend scan-dump tests for 'present'. * gfortran.dg/gomp/defaultmap-1.f90: Update dg-error. * gfortran.dg/gomp/map-7.f90: Extend parse and dump test for 'present'. * gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present' modifier checking. * c-c++-common/gomp/defaultmap-4.c: New test. * c-c++-common/gomp/map-9.c: New test. * c-c++-common/gomp/target-update-1.c: New test. * gfortran.dg/gomp/defaultmap-8.f90: New test. * gfortran.dg/gomp/map-11.f90: New test. * gfortran.dg/gomp/map-12.f90: New test. * gfortran.dg/gomp/target-update-1.f90: New test.
2023-06-05vect: Refactor to allow internal_fn'sAndre Vieira1-0/+21
Refactor vect-patterns to allow patterns to be internal_fns starting with widening_plus/minus patterns 2023-06-05 Andre Vieira <andre.simoesdiasvieira@arm.com> Joel Hutton <joel.hutton@arm.com> gcc/ChangeLog: * tree-vect-patterns.cc: Add include for gimple-iterator. (vect_recog_widen_op_pattern): Refactor to use code_helper. (vect_gimple_build): New function. * tree-vect-stmts.cc (simple_integer_narrowing): Refactor to use code_helper. (vectorizable_call): Likewise. (vect_gen_widened_results_half): Likewise. (vect_create_vectorized_demotion_stmts): Likewise. (vect_create_vectorized_promotion_stmts): Likewise. (vect_create_half_widening_stmts): Likewise. (vectorizable_conversion): Likewise. (supportable_widening_operation): Likewise. (supportable_narrowing_operation): Likewise. * tree-vectorizer.h (supportable_widening_operation): Change prototype to use code_helper. (supportable_narrowing_operation): Likewise. (vect_gimple_build): New function prototype. * tree.h (code_helper::safe_as_tree_code): New function. (code_helper::safe_as_fn_code): New function.
2023-06-02c++: make initializer_list array static again [PR110070]Jason Merrill1-0/+6
After the maybe_init_list_as_* patches, I noticed that we were putting the array of strings into .rodata, but then memcpying it into an automatic array, which is pointless; we should be able to use it directly. This doesn't happen automatically because TREE_ADDRESSABLE is set (since r12-657 for PR100464), and so gimplify_init_constructor won't promote the variable to static. Theoretically we could do escape analysis to recognize that the address, though taken, never leaves the function; that would allow promotion when we're only using the address for indexing within the function, as in initlist-opt2.C. But this would be a new pass. And in initlist-opt1.C, we're passing the array address to another function, so it definitely escapes; it's only safe in this case because it's calling a standard library function that we know only uses it for indexing. So, a flag seems needed. I first thought to put the flag on the TARGET_EXPR, but the VAR_DECL seems more appropriate. In a previous revision of the patch I called this flag DECL_NOT_OBSERVABLE, but I think DECL_MERGEABLE is a better name, especially if we're going to apply it to the backing array of initializer_list, which is observable. I then also check it in places that check for -fmerge-all-constants, so that multiple equivalent initializer-lists can also be combined. And then it seemed to make sense for [[no_unique_address]] to have this meaning for user-written variables. I think the note in [dcl.init.list]/6 intended to allow this kind of merging for initializer_lists, but it didn't actually work; for an explicit array with the same initializer, if the address escapes the program could tell whether the same variable in two frames have the same address. P2752 is trying to correct this defect, so I'm going to assume that this is the intent. PR c++/110070 PR c++/105838 gcc/ChangeLog: * tree.h (DECL_MERGEABLE): New. * tree-core.h (struct tree_decl_common): Mention it. * gimplify.cc (gimplify_init_constructor): Check it. * cgraph.cc (symtab_node::address_can_be_compared_p): Likewise. * varasm.cc (categorize_decl_for_section): Likewise. gcc/cp/ChangeLog: * call.cc (maybe_init_list_as_array): Set DECL_MERGEABLE. (convert_like_internal) [ck_list]: Set it. (set_up_extended_ref_temp): Copy it. * tree.cc (handle_no_unique_addr_attribute): Set it. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/initlist-opt1.C: Check for static array. * g++.dg/tree-ssa/initlist-opt2.C: Likewise. * g++.dg/tree-ssa/initlist-opt4.C: New test. * g++.dg/opt/icf1.C: New test. * g++.dg/opt/icf2.C: New test. * g++.dg/opt/icf3.C: New test. * g++.dg/tree-ssa/array-temp1.C: Revert r12-657 change.
2023-04-28tree-optimization/108752 - vectorize emulated vectors in lowered formRichard Biener1-0/+1
The following makes sure to emit operations lowered to bit operations when vectorizing using emulated vectors. This avoids relying on the vector lowering pass adhering to the exact same cost considerations as the vectorizer. PR tree-optimization/108752 * tree-vect-generic.cc (build_replicated_const): Rename to build_replicated_int_cst and move to tree.{h,cc}. (do_plus_minus): Adjust. (do_negate): Likewise. * tree-vect-stmts.cc (vectorizable_operation): Emit emulated arithmetic vector operations in lowered form. * tree.h (build_replicated_int_cst): Declare. * tree.cc (build_replicated_int_cst): Moved from tree-vect-generic.cc build_replicated_const.
2023-04-24c++, tree: declare some basic functions inlinePatrick Palka1-3/+29
The functions strip_array_types, is_typedef_decl, typedef_variant_p and cp_expr_location are used throughout the C++ front end including in some fairly hot parts (e.g. in the tsubst routines and cp_walk_subtree) and they're small enough that the overhead of calling them out-of-line is relatively significant. So this patch moves their definitions into the appropriate headers to enable inlining them. gcc/cp/ChangeLog: * cp-tree.h (cp_expr_location): Define here. * tree.cc (cp_expr_location): Don't define here. gcc/ChangeLog: * tree.cc (strip_array_types): Don't define here. (is_typedef_decl): Don't define here. (typedef_variant_p): Don't define here. * tree.h (strip_array_types): Define here. (is_typedef_decl): Define here. (typedef_variant_p): Define here.
2023-04-20tree: Add 3+ argument fndecl_built_in_pJakub Jelinek1-3/+24
On Wed, Feb 22, 2023 at 09:52:06AM +0000, Richard Biener wrote: > > The following testcase ICEs because we still have some spots that > > treat BUILT_IN_UNREACHABLE specially but not BUILT_IN_UNREACHABLE_TRAP > > the same. This patch uses (fndecl_built_in_p (node, BUILT_IN_UNREACHABLE) || fndecl_built_in_p (node, BUILT_IN_UNREACHABLE_TRAP)) a lot and from grepping around, we do something like that in lots of other places, or in some spots instead as (fndecl_built_in_p (node, BUILT_IN_NORMAL) && (DECL_FUNCTION_CODE (node) == BUILT_IN_WHATEVER1 || DECL_FUNCTION_CODE (node) == BUILT_IN_WHATEVER2)) The following patch adds an overload for this case, so we can write it in a shorter way, using C++11 argument packs so that it supports as many codes as one needs. 2023-04-20 Jakub Jelinek <jakub@redhat.com> Jonathan Wakely <jwakely@redhat.com> * tree.h (built_in_function_equal_p): New helper function. (fndecl_built_in_p): Turn into variadic template to support 1 or more built_in_function arguments. * builtins.cc (fold_builtin_expect): Use 3 argument fndecl_built_in_p. * gimplify.cc (goa_stabilize_expr): Likewise. * cgraphclones.cc (cgraph_node::create_clone): Likewise. * ipa-fnsummary.cc (compute_fn_summary): Likewise. * omp-low.cc (setjmp_or_longjmp_p): Likewise. * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee, cgraph_update_edges_for_call_stmt_node, cgraph_edge::verify_corresponds_to_fndecl, cgraph_node::verify_node): Likewise. * tree-stdarg.cc (optimize_va_list_gpr_fpr_size): Likewise. * gimple-ssa-warn-access.cc (matching_alloc_calls_p): Likewise. * ipa-prop.cc (try_make_edge_direct_virtual_call): Likewise.
2023-03-21tree: Fix up component_ref_sam_type handling of arrays of 0 sized elements ↵Jakub Jelinek1-2/+2
[PR109215] Our documentation sadly talks about elt_type arr[0]; as zero-length arrays, not arrays with zero elements. Unfortunately, those aren't the only arrays which can have zero size, the same size can be also result of zero-length element, like in GNU C struct whatever {} or in GNU C/C++ if the element type is [0] array or combination thereof (dunno if Ada doesn't allow something similar too). One can't do much with them, taking address of their elements, (no-op) copying of the elements in and out. But they behave differently from arr[0] arrays e.g. in that using non-zero indexes in them (as long as they are within bounds as for normal arrays) is valid. I think this naming inaccuracy resulted in Martin designing special_array_member in an inconsistent way, mixing size zero array members with array members of one or two or more elements and then using the size zero interchangeably with zero elements. The following patch changes that (but doesn't do any documentation/diagnostics renaming, as this is really a corner case), such that int_0/trail_0 for consistency is just about [0] arrays plus [] for the latter, not one or more zero sized elements case. The testcase has one xfailed case for where perhaps in later GCC versions we could add extra code to handle it, for some reason we don't diagnose out of bounds accesses for the zero sized elements cases. It will be harder because e.g. FRE will canonicalize &var.fld[0] and &var.fld[10] to just one of them because they are provably the same address. But the important thing is to fix this regression (where we warn on completely valid code in the Linux kernel). Anyway, for further work on this we don't really need any extra help from special_array_member, all code can just check integer_zerop (TYPE_SIZE_UNIT (TREE_TYPE (type))), it doesn't depend on the position of the members etc. 2023-03-21 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109215 * tree.h (enum special_array_member): Adjust comments for int_0 and trail_0. * tree.cc (component_ref_sam_type): Clear zero_elts if memtype has zero sized element type and the array has variable number of elements or constant one or more elements. (component_ref_size): Adjust comments, formatting fix. * gcc.dg/Wzero-length-array-bounds-3.c: New test.
2023-03-10tree: Use comdat tree_code_{type,length} even for C++11/14 [PR108634]Jakub Jelinek1-0/+10
The recent change to undo the tree_code_type/tree_code_length excessive duplication apparently broke building the Linux kernel plugin. While it is certainly desirable that GCC plugins are built with the same compiler as GCC has been built and with the same options (at least the important ones), it might be hard to arrange that, e.g. if gcc is built using a cross-compiler but the plugin then built natively, or GCC isn't bootstrapped for other reasons, or just as in the kernel case they were building the plugin with -std=gnu++11 while the bootstrapped GCC has been built without any such option and so with whatever the compiler defaulted to. For C++17 and later tree_code_{type,length} are UNIQUE symbols with those assembler names, while for C++11/14 they were _ZL14tree_code_type and _ZL16tree_code_length. The following patch uses a comdat var for those even for C++11/14 as suggested by Maciej Cencora. Relying on weak attribute is not an option because not all hosts support it and there are non-GNU system compilers. While we could use it unconditionally, I think defining a template just to make it comdat is weird, and the compiler itself is always built with the same compiler. Plugins, being separate shared libraries, will have a separate copy of the arrays if they are ODR-used in the plugin, so there is not a big deal if e.g. cc1plus uses tree_code_type while plugin uses _ZN19tree_code_type_tmplILi0EE14tree_code_typeE or vice versa. 2023-03-10 Jakub Jelinek <jakub@redhat.com> PR plugins/108634 * tree-core.h (tree_code_type, tree_code_length): For C++11 or C++14, don't declare as extern const arrays. (tree_code_type_tmpl, tree_code_length_tmpl): New types with static constexpr member arrays for C++11 or C++14. * tree.h (TREE_CODE_CLASS): For C++11 or C++14 use tree_code_type_tmpl <0>::tree_code_type instead of tree_code_type. (TREE_CODE_LENGTH): For C++11 or C++14 use tree_code_length_tmpl <0>::tree_code_length instead of tree_code_length. * tree.cc (tree_code_type, tree_code_length): Remove.
2023-02-16don't declare header-defined functions both static and inlinePatrick Palka1-44/+44
Many functions defined in our headers are declared 'static inline' which is a C idiom whose use predates our move to C++ as the implementation language. But in C++ the inline keyword is more than just a compiler hint, and is sufficient to give the function the intended semantics. In fact declaring a function both static and inline is a pessimization since static effectively disables the desired definition merging behavior enabled by inline, and is also a source of (harmless) ODR violations when a static inline function gets called from a non-static inline one (such as tree_operand_check calling tree_operand_length). This patch mechanically fixes the vast majority of occurrences of this anti-pattern throughout the compiler's headers via the command line sed -i 's/^static inline/inline/g' gcc/*.h gcc/*/*.h There's also a manual change to remove the redundant declarations of is_ivar and lookup_category in gcc/objc/objc-act.cc which would otherwise conflict with their modified definitions in objc-act.h (due to the difference in staticness). Besides fixing some ODR violations, this speeds up stage1 cc1plus by about 2% and reduces the size of its text segment by 1.5MB. gcc/ChangeLog: * addresses.h: Mechanically drop 'static' from 'static inline' functions via s/^static inline/inline/g. * asan.h: Likewise. * attribs.h: Likewise. * basic-block.h: Likewise. * bitmap.h: Likewise. * cfghooks.h: Likewise. * cfgloop.h: Likewise. * cgraph.h: Likewise. * cselib.h: Likewise. * data-streamer.h: Likewise. * debug.h: Likewise. * df.h: Likewise. * diagnostic.h: Likewise. * dominance.h: Likewise. * dumpfile.h: Likewise. * emit-rtl.h: Likewise. * except.h: Likewise. * expmed.h: Likewise. * expr.h: Likewise. * fixed-value.h: Likewise. * gengtype.h: Likewise. * gimple-expr.h: Likewise. * gimple-iterator.h: Likewise. * gimple-predict.h: Likewise. * gimple-range-fold.h: Likewise. * gimple-ssa.h: Likewise. * gimple.h: Likewise. * graphite.h: Likewise. * hard-reg-set.h: Likewise. * hash-map.h: Likewise. * hash-set.h: Likewise. * hash-table.h: Likewise. * hwint.h: Likewise. * input.h: Likewise. * insn-addr.h: Likewise. * internal-fn.h: Likewise. * ipa-fnsummary.h: Likewise. * ipa-icf-gimple.h: Likewise. * ipa-inline.h: Likewise. * ipa-modref.h: Likewise. * ipa-prop.h: Likewise. * ira-int.h: Likewise. * ira.h: Likewise. * lra-int.h: Likewise. * lra.h: Likewise. * lto-streamer.h: Likewise. * memmodel.h: Likewise. * omp-general.h: Likewise. * optabs-query.h: Likewise. * optabs.h: Likewise. * plugin.h: Likewise. * pretty-print.h: Likewise. * range.h: Likewise. * read-md.h: Likewise. * recog.h: Likewise. * regs.h: Likewise. * rtl-iter.h: Likewise. * rtl.h: Likewise. * sbitmap.h: Likewise. * sched-int.h: Likewise. * sel-sched-ir.h: Likewise. * sese.h: Likewise. * sparseset.h: Likewise. * ssa-iterators.h: Likewise. * system.h: Likewise. * target-globals.h: Likewise. * target.h: Likewise. * timevar.h: Likewise. * tree-chrec.h: Likewise. * tree-data-ref.h: Likewise. * tree-iterator.h: Likewise. * tree-outof-ssa.h: Likewise. * tree-phinodes.h: Likewise. * tree-scalar-evolution.h: Likewise. * tree-sra.h: Likewise. * tree-ssa-alias.h: Likewise. * tree-ssa-live.h: Likewise. * tree-ssa-loop-manip.h: Likewise. * tree-ssa-loop.h: Likewise. * tree-ssa-operands.h: Likewise. * tree-ssa-propagate.h: Likewise. * tree-ssa-sccvn.h: Likewise. * tree-ssa.h: Likewise. * tree-ssanames.h: Likewise. * tree-streamer.h: Likewise. * tree-switch-conversion.h: Likewise. * tree-vectorizer.h: Likewise. * tree.h: Likewise. * wide-int.h: Likewise. gcc/c-family/ChangeLog: * c-common.h: Mechanically drop static from static inline functions via s/^static inline/inline/g. gcc/c/ChangeLog: * c-parser.h: Mechanically drop static from static inline functions via s/^static inline/inline/g. gcc/cp/ChangeLog: * cp-tree.h: Mechanically drop static from static inline functions via s/^static inline/inline/g. gcc/fortran/ChangeLog: * gfortran.h: Mechanically drop static from static inline functions via s/^static inline/inline/g. gcc/jit/ChangeLog: * jit-dejagnu.h: Mechanically drop static from static inline functions via s/^static inline/inline/g. * jit-recording.h: Likewise. gcc/objc/ChangeLog: * objc-act.h: Mechanically drop static from static inline functions via s/^static inline/inline/g. * objc-map.h: Likewise. * objc-act.cc: Remove the redundant redeclarations of is_ivar and lookup_category.
2023-01-27Add support for conditional xorsign [PR96373]Richard Sandiford1-0/+1
This patch is an optimisation, but it's also a prerequisite for fixing PR96373 without regressing vect-xorsign_exec.c. Currently the vectoriser vectorises: for (i = 0; i < N; i++) r[i] = a[i] * __builtin_copysignf (1.0f, b[i]); as two unconditional operations (copysign and mult). tree-ssa-math-opts.cc later combines them into an "xorsign" function. This works for both Advanced SIMD and SVE. However, with the fix for PR96373, the vectoriser will instead generate a conditional multiplication (IFN_COND_MUL). Something then needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional xorsign. Three obvious options were: (1) Extend tree-ssa-math-opts.cc. (2) Do the fold in match.pd. (3) Leave it to rtl combine. I'm against (3), because this isn't a target-specific optimisation. (1) would be possible, but would involve open-coding a lot of what match.pd does for us. And, in contrast to doing the current tree-ssa-math-opts.cc optimisation in match.pd, there should be no danger of (2) happening too early. If we have an IFN_COND_MUL then we're already past the stage of simplifying the original source code. There was also a choice between adding a conditional xorsign ifn and simply open-coding the xorsign. The latter seems simpler, and means less boiler-plate for target-specific code. The signed_or_unsigned_type_for change is needed to make sure that we stay in "SVE space" when doing the optimisation on 128-bit fixed-length SVE. gcc/ PR tree-optimization/96373 * tree.h (sign_mask_for): Declare. * tree.cc (sign_mask_for): New function. (signed_or_unsigned_type_for): For vector types, try to use the related_int_vector_mode. * genmatch.cc (commutative_op): Handle conditional internal functions. * match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and. gcc/testsuite/ PR tree-optimization/96373 * gcc.target/aarch64/sve/cond_xorsign_1.c: New test. * gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
2023-01-02Update copyright years.Jakub Jelinek1-1/+1
2022-12-23tree-ssa-dom: can_infer_simple_equiv fixes [PR108068]Jakub Jelinek1-0/+1
As reported in the PR, tree-ssa-dom.cc uses real_zerop call to find if a floating point constant is zero and it shouldn't try to infer equivalences from comparison against it if signed zeros are honored. This doesn't work at all for decimal types, because real_zerop always returns false for them (one can have different representations of decimal zero beyond -0/+0), and it doesn't work for vector compares either, as real_zerop checks if all elements are zero, while we need to avoid infering equivalences from comparison against vector constants which have at least one zero element in it (if signed zeros are honored). Furthermore, as mentioned by Joseph, for decimal types many other values aren't singleton. So, this patch stops infering anything if element mode is decimal, and otherwise uses instead of real_zerop a new function, real_maybe_zerop, which will work even for decimal types and for complex or vector will return true if any element is or might be zero (so it returns true for anything but constants for now). 2022-12-23 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/108068 * tree.h (real_maybe_zerop): Declare. * tree.cc (real_maybe_zerop): Define. * tree-ssa-dom.cc (record_edge_info): Use it instead of real_zerop or TREE_CODE (op1) == SSA_NAME || real_zerop. Always set can_infer_simple_equiv to false for decimal floating point types. * gcc.dg/dfp/pr108068.c: New test.
2022-12-20c++: source position of lambda captures [PR84471]Jason Merrill1-0/+2
If the DECL_VALUE_EXPR of a VAR_DECL has EXPR_LOCATION set, then any use of that variable looks like it has that location, which leads to the debugger jumping back and forth for both lambdas and structured bindings. Rather than fix all the uses, it seems simplest to remove any EXPR_LOCATION when setting DECL_VALUE_EXPR. So the cp/ hunks aren't necessary, but they avoid the need to unshare to remove the location. PR c++/84471 PR c++/107504 gcc/cp/ChangeLog: * coroutines.cc (transform_local_var_uses): Don't specify a location for DECL_VALUE_EXPR. * decl.cc (cp_finish_decomp): Likewise. gcc/ChangeLog: * fold-const.cc (protected_set_expr_location_unshare): Not static. * tree.h: Declare it. * tree.cc (decl_value_expr_insert): Use it. include/ChangeLog: * ansidecl.h (ATTRIBUTE_WARN_UNUSED_RESULT): Add __. gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/value-expr1.C: New test. * g++.dg/tree-ssa/value-expr2.C: New test. * g++.dg/analyzer/pr93212.C: Move warning.
2022-12-12middle-end: Add new tbranch optab to add support for bit-test-and-branch ↵Tamar Christina1-0/+1
operations This adds a new test-and-branch optab that can be used to do a conditional test of a bit and branch. This is similar to the cbranch optab but instead can test any arbitrary bit inside the register. This patch recognizes boolean comparisons and single bit mask tests. gcc/ChangeLog: * dojump.cc (do_jump): Pass along value. (do_jump_by_parts_greater_rtx): Likewise. (do_jump_by_parts_zero_rtx): Likewise. (do_jump_by_parts_equality_rtx): Likewise. (do_compare_rtx_and_jump): Likewise. (do_compare_and_jump): Likewise. * dojump.h (do_compare_rtx_and_jump): New. * optabs.cc (emit_cmp_and_jump_insn_1): Refactor to take optab to check. (validate_test_and_branch): New. (emit_cmp_and_jump_insns): Optiobally take a value, and when value is supplied then check if it's suitable for tbranch. * optabs.def (tbranch_eq$a4, tbranch_ne$a4): New. * doc/md.texi (tbranch_@var{op}@var{mode}4): Document it. * optabs.h (emit_cmp_and_jump_insns): New. * tree.h (tree_zero_one_valued_p): New.
2022-12-11tree-optimization/106904 - bogus -Wstringopt-overflow with vectorsRichard Biener1-0/+1
The following avoids CSE of &ps->wp to &ps->wp.hwnd confusing -Wstringopt-overflow by making sure to produce addresses to the biggest container from vectorization. For this I introduce strip_zero_offset_components which turns &ps->wp.hwnd into &(*ps) and use that to base the vector data references on. That will also work for addresses with variable components, alternatively emitting pointer arithmetic via calling get_inner_reference and gimplifying that would be possible but likely more intrusive. This is by no means a complete fix for all of those issues (avoiding ADDR_EXPRs in favor of pointer arithmetic might be). Other passes will have similar issues. In theory that might now cause false negatives. PR tree-optimization/106904 * tree.h (strip_zero_offset_components): Declare. * tree.cc (strip_zero_offset_components): Define. * tree-vect-data-refs.cc (vect_create_addr_base_for_vector_ref): Strip zero offset components before building the address. * gcc.dg/Wstringop-overflow-pr106904.c: New testcase.
2022-12-06Update -Warray-bounds with -fstrict-flex-arrays.Qing Zhao1-4/+8
A. add the following to clarify the relationship between -Warray-bounds and the LEVEL of -fstrict-flex-array: By default, the trailing array of a structure will be treated as a flexible array member by '-Warray-bounds' or '-Warray-bounds=N' if it is declared as either a flexible array member per C99 standard onwards ('[]'), a GCC zero-length array extension ('[0]'), or an one-element array ('[1]'). As a result, out of bounds subscripts or offsets into zero-length arrays or one-element arrays are not warned by default. You can add the option '-fstrict-flex-arrays' or '-fstrict-flex-arrays=LEVEL' to control how this option treat trailing array of a structure as a flexible array member. when LEVEL<=1, no change to the default behavior. when LEVEL=2, additional warnings will be issued for out of bounds subscripts or offsets into one-element arrays; when LEVEL=3, in addition to LEVEL=2, additional warnings will be issued for out of bounds subscripts or offsets into zero-length arrays. B. change -Warray-bounds=2 to exclude its control on how to treat trailing arrays as flexible array members: '-Warray-bounds=2' This warning level also warns about the intermediate results of pointer arithmetic that may yield out of bounds values. This warning level may give a larger number of false positives and is deactivated by default. gcc/ChangeLog: * attribs.cc (strict_flex_array_level_of): New function. * attribs.h (strict_flex_array_level_of): Prototype for new function. * doc/invoke.texi: Update -Warray-bounds by specifying the impact from -fstrict-flex-arrays. Also update -Warray-bounds=2 by eliminating its impact on treating trailing arrays as flexible array members. * gimple-array-bounds.cc (get_up_bounds_for_array_ref): New function. (check_out_of_bounds_and_warn): New function. (array_bounds_checker::check_array_ref): Update with call to the above new functions. * tree.cc (array_ref_flexible_size_p): Add one new argument. (component_ref_sam_type): New function. (component_ref_size): Control with level of strict-flex-array. * tree.h (array_ref_flexible_size_p): Update prototype. (enum struct special_array_member): Add two new enum values. (component_ref_sam_type): New prototype. gcc/c/ChangeLog: * c-decl.cc (is_flexible_array_member_p): Call new function strict_flex_array_level_of. gcc/testsuite/ChangeLog: * gcc.dg/Warray-bounds-11.c: Update warnings for -Warray-bounds=2. * gcc.dg/Warray-bounds-flex-arrays-1.c: New test. * gcc.dg/Warray-bounds-flex-arrays-2.c: New test. * gcc.dg/Warray-bounds-flex-arrays-3.c: New test. * gcc.dg/Warray-bounds-flex-arrays-4.c: New test. * gcc.dg/Warray-bounds-flex-arrays-5.c: New test. * gcc.dg/Warray-bounds-flex-arrays-6.c: New test.
2022-11-24Remove ASSERT_EXPR.Aldy Hernandez1-4/+0
This removes all uses of ASSERT_EXPR except the internal one in ipa-*. gcc/ChangeLog: * doc/gimple.texi: Remove ASSERT_EXPR references. * fold-const.cc (tree_expr_nonzero_warnv_p): Same. (fold_binary_loc): Same. (tree_expr_nonnegative_warnv_p): Same. * gimple-array-bounds.cc (get_base_decl): Same. * gimple-pretty-print.cc (dump_unary_rhs): Same. * gimple.cc (get_gimple_rhs_num_ops): Same. * pointer-query.cc (handle_ssa_name): Same. * tree-cfg.cc (verify_gimple_assign_single): Same. * tree-pretty-print.cc (dump_generic_node): Same. * tree-scalar-evolution.cc (scev_dfs::follow_ssa_edge_expr):Same. (interpret_rhs_expr): Same. * tree-ssa-operands.cc (operands_scanner::get_expr_operands): Same. * tree-ssa-propagate.cc (substitute_and_fold_dom_walker::before_dom_children): Same. * tree-ssa-threadedge.cc: Same. * tree-vrp.cc (overflow_comparison_p): Same. * tree.def (ASSERT_EXPR): Add note. * tree.h (ASSERT_EXPR_VAR): Remove. (ASSERT_EXPR_COND): Remove. * vr-values.cc (simplify_using_ranges::vrp_visit_cond_stmt): Remove comment.
2022-11-09Change the name of array_at_struct_end_p to array_ref_flexible_size_pQing Zhao1-4/+4
The name of the utility routine "array_at_struct_end_p" is misleading and should be changed to a new name that more accurately reflects its real meaning. The routine "array_at_struct_end_p" is used to check whether an array reference is to an array whose actual size might be larger than its upper bound implies, which includes 3 different cases: A. a ref to a flexible array member at the end of a structure; B. a ref to an array with a different type against the original decl; C. a ref to an array that was passed as a parameter; The old name only reflects the above case A, therefore very confusing when reading the corresponding gcc source code. In this patch, A new name "array_ref_flexible_size_p" is used to replace the old name. All the references to the routine "array_at_struct_end_p" was replaced with this new name, and the corresponding comments were updated to make them clean and consistent. gcc/ChangeLog: * gimple-array-bounds.cc (trailing_array): Replace array_at_struct_end_p with new name and update comments. * gimple-fold.cc (get_range_strlen_tree): Likewise. * gimple-ssa-warn-restrict.cc (builtin_memref::builtin_memref): Likewise. * graphite-sese-to-poly.cc (bounds_are_valid): Likewise. * tree-if-conv.cc (idx_within_array_bound): Likewise. * tree-object-size.cc (addr_object_size): Likewise. * tree-ssa-alias.cc (component_ref_to_zero_sized_trailing_array_p): Likewise. (stmt_kills_ref_p): Likewise. * tree-ssa-loop-niter.cc (idx_infer_loop_bounds): Likewise. * tree-ssa-strlen.cc (maybe_set_strlen_range): Likewise. * tree.cc (array_at_struct_end_p): Rename to ... (array_ref_flexible_size_p): ... this. (component_ref_size): Replace array_at_struct_end_p with new name. * tree.h (array_at_struct_end_p): Rename to ... (array_ref_flexible_size_p): ... this.
2022-10-28c: tree: target: C2x (...) function prototypes and va_start relaxationJoseph Myers1-1/+7
C2x allows function prototypes to be given as (...), a prototype meaning a variable-argument function with no named arguments. To allow such functions to access their arguments, requirements for va_start calls are relaxed so it ignores all but its first argument (i.e. subsequent arguments, if any, can be arbitrary pp-token sequences). Implement this feature accordingly. The va_start relaxation in <stdarg.h> is itself easy: __builtin_va_start already supports a second argument of 0 instead of a parameter name, and calls get converted internally to the form using 0 for that argument, so <stdarg.h> just needs changing to use a variadic macro that passes 0 as the second argument of __builtin_va_start. (This is done only in C2x mode, on the expectation that users of older standard would expect unsupported uses of va_start to be diagnosed.) For the (...) functions, it's necessary to distinguish these from unprototyped functions, whereas previously C++ (...) functions and unprototyped functions both used NULL TYPE_ARG_TYPES. A flag is added to tree_type_common to mark the (...) functions; as discussed on gcc@, doing things this way is likely to be safer for unchanged code in GCC than adding a different form of representation in TYPE_ARG_TYPES, or adding a flag that instead signals that the function is unprototyped. There was previously an option -fallow-parameterless-variadic-functions to enable support for (...) prototypes. The support was incomplete - it treated the functions as unprototyped, and only parsed some declarations, not e.g. "int g (int (...));". This option is changed into a no-op ignored option; (...) is always accepted syntactically, with a pedwarn_c11 call to given required diagnostics when appropriate. The peculiarity of a parameter list with __attribute__ followed by '...' being accepted with that option is removed. Interfaces in tree.cc that create function types are adjusted to set this flag as appropriate. It is of course possible that some existing users of the functions to create variable-argument functions actually wanted unprototyped functions in the no-named-argument case, rather than functions with a (...) prototype; some such cases in c-common.cc (for built-in functions and implicit function declarations) turn out to need updating for that reason. I didn't do anything to change how the C++ front end creates (...) function types. It's very likely there are unchanged places in the compiler that in fact turn out to need changes to work properly with (...) function prototypes. Target setup_incoming_varargs hooks, where they used the information passed about the last named argument, needed updating to avoid using that information in the (...) case. Note that apart from the x86 changes, I haven't done any testing of those target changes beyond building cc1 to check for syntax errors. It's possible further target-specific fixes will be needed; target maintainers should watch out for failures of c2x-stdarg-4.c or c2x-stdarg-split-1a.c, the execution tests, which would indicate that this feature is not working correctly. Those tests also verify the case where there are named arguments but the last named argument has a declaration that results in undefined behavior in previous C standard versions, such as a type changed by the default argument promotions. Bootstrapped with no regressions for x86_64-pc-linux-gnu. gcc/ * config/aarch64/aarch64.cc (aarch64_setup_incoming_varargs): Check TYPE_NO_NAMED_ARGS_STDARG_P. * config/alpha/alpha.cc (alpha_setup_incoming_varargs): Likewise. * config/arc/arc.cc (arc_setup_incoming_varargs): Likewise. * config/arm/arm.cc (arm_setup_incoming_varargs): Likewise. * config/csky/csky.cc (csky_setup_incoming_varargs): Likewise. * config/epiphany/epiphany.cc (epiphany_setup_incoming_varargs): Likewise. * config/fr30/fr30.cc (fr30_setup_incoming_varargs): Likewise. * config/frv/frv.cc (frv_setup_incoming_varargs): Likewise. * config/ft32/ft32.cc (ft32_setup_incoming_varargs): Likewise. * config/i386/i386.cc (ix86_setup_incoming_varargs): Likewise. * config/ia64/ia64.cc (ia64_setup_incoming_varargs): Likewise. * config/loongarch/loongarch.cc (loongarch_setup_incoming_varargs): Likewise. * config/m32r/m32r.cc (m32r_setup_incoming_varargs): Likewise. * config/mcore/mcore.cc (mcore_setup_incoming_varargs): Likewise. * config/mips/mips.cc (mips_setup_incoming_varargs): Likewise. * config/mmix/mmix.cc (mmix_setup_incoming_varargs): Likewise. * config/nds32/nds32.cc (nds32_setup_incoming_varargs): Likewise. * config/nios2/nios2.cc (nios2_setup_incoming_varargs): Likewise. * config/riscv/riscv.cc (riscv_setup_incoming_varargs): Likewise. * config/rs6000/rs6000-call.cc (setup_incoming_varargs): Likewise. * config/sh/sh.cc (sh_setup_incoming_varargs): Likewise. * config/visium/visium.cc (visium_setup_incoming_varargs): Likewise. * config/vms/vms-c.cc (vms_c_common_override_options): Do not set flag_allow_parameterless_variadic_functions. * doc/invoke.texi (-fallow-parameterless-variadic-functions): Do not document option. * function.cc (assign_parms): Call assign_parms_setup_varargs for TYPE_NO_NAMED_ARGS_STDARG_P case. * ginclude/stdarg.h [__STDC_VERSION__ > 201710L] (va_start): Make variadic macro. Pass second argument of 0 to __builtin_va_start. * target.def (setup_incoming_varargs): Update documentation. * doc/tm.texi: Regenerate. * tree-core.h (struct tree_type_common): Add no_named_args_stdarg_p. * tree-streamer-in.cc (unpack_ts_type_common_value_fields): Unpack TYPE_NO_NAMED_ARGS_STDARG_P. * tree-streamer-out.cc (pack_ts_type_common_value_fields): Pack TYPE_NO_NAMED_ARGS_STDARG_P. * tree.cc (type_cache_hasher::equal): Compare TYPE_NO_NAMED_ARGS_STDARG_P. (build_function_type): Add argument no_named_args_stdarg_p. (build_function_type_list_1, build_function_type_array_1) (reconstruct_complex_type): Update calls to build_function_type. (stdarg_p, prototype_p): Return true for (...) functions. (gimple_canonical_types_compatible_p): Compare TYPE_NO_NAMED_ARGS_STDARG_P. * tree.h (TYPE_NO_NAMED_ARGS_STDARG_P): New. (build_function_type): Update prototype. gcc/c-family/ * c-common.cc (def_fn_type): Call build_function_type for zero-argument variable-argument function. (c_common_nodes_and_builtins): Build default_function_type with build_function_type. * c.opt (fallow-parameterless-variadic-functions): Mark as ignored option. gcc/c/ * c-decl.cc (grokdeclarator): Pass arg_info->no_named_args_stdarg_p to build_function_type. (grokparms): Check arg_info->no_named_args_stdarg_p before converting () to (void). (build_arg_info): Initialize no_named_args_stdarg_p. (get_parm_info): Set no_named_args_stdarg_p. (start_function): Pass TYPE_NO_NAMED_ARGS_STDARG_P to build_function_type. (store_parm_decls): Count (...) functions as prototyped. * c-parser.cc (c_parser_direct_declarator): Allow '...' after open parenthesis to start parameter list. (c_parser_parms_list_declarator): Always allow '...' with no arguments, call pedwarn_c11 and set no_named_args_stdarg_p. * c-tree.h (struct c_arg_info): Add field no_named_args_stdarg_p. * c-typeck.cc (composite_type): Handle TYPE_NO_NAMED_ARGS_STDARG_P. (function_types_compatible_p): Compare TYPE_NO_NAMED_ARGS_STDARG_P. gcc/fortran/ * trans-types.cc (gfc_get_function_type): Do not use build_varargs_function_type_vec for unprototyped function. gcc/lto/ * lto-common.cc (compare_tree_sccs_1): Compare TYPE_NO_NAMED_ARGS_STDARG_P. gcc/objc/ * objc-next-runtime-abi-01.cc (build_next_objc_exception_stuff): Use build_function_type to build type of objc_setjmp_decl. gcc/testsuite/ * gcc.dg/c11-stdarg-1.c, gcc.dg/c11-stdarg-2.c, gcc.dg/c11-stdarg-3.c, gcc.dg/c2x-stdarg-1.c, gcc.dg/c2x-stdarg-2.c, gcc.dg/c2x-stdarg-3.c, gcc.dg/c2x-stdarg-4.c, gcc.dg/gnu2x-stdarg-1.c, gcc.dg/torture/c2x-stdarg-split-1a.c, gcc.dg/torture/c2x-stdarg-split-1b.c: New tests. * gcc.dg/Wold-style-definition-2.c, gcc.dg/format/sentinel-1.c: Update expected diagnostics. * gcc.dg/c2x-nullptr-1.c (test5): Cast unused parameter to (void). * gcc.dg/diagnostic-token-ranges.c: Use -pedantic. Expect warning in place of error.
2022-10-24tree: add build_string_literal overloadsJason Merrill1-1/+8
Simplify several calls to build_string_literal by not requiring redundant strlen or IDENTIFIER_* in the caller. I also corrected a wrong comment on IDENTIFIER_LENGTH. gcc/ChangeLog: * tree.h (build_string_literal): New one-argument overloads that take tree (identifier) and const char *. * builtins.cc (fold_builtin_FILE) (fold_builtin_FUNCTION) * gimplify.cc (gimple_add_init_for_auto_var) * vtable-verify.cc (verify_bb_vtables): Simplify calls. gcc/cp/ChangeLog: * cp-gimplify.cc (fold_builtin_source_location) * vtable-class-hierarchy.cc (register_all_pairs): Simplify calls to build_string_literal. (build_string_from_id): Remove.
2022-10-14middle-end, c++, i386, libgcc: std::bfloat16_t and __bf16 arithmetic supportJakub Jelinek1-0/+1
Here is a complete patch to add std::bfloat16_t support on x86 (AArch64 and ARM left for later). Almost no BFmode optabs are added by the patch, so for binops/unops it extends to SFmode first and then truncates back to BFmode. For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations of all those conversions so that we avoid double rounding, for BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much it emits BFmode -> SFmode conversion first and then converts to the even wider mode, neither step should be imprecise. For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion and then SFmode -> HFmode, because neither format is subset or superset of the other, while SFmode is superset of both. expr.cc then contains a -ffast-math optimization of the BF -> SF and SF -> BF conversions if we don't optimize for space (and for the latter if -frounding-math isn't enabled either). For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16 but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math || !flag_unsafe_math_optimizations, because I think the insn doesn't raise on sNaNs, hardcodes round to nearest and flushes denormals to zero. By default (unless x86 -fexcess-precision=16) we use float excess precision for BFmode, so truncate only on explicit casts and assignments. The patch introduces a single __bf16 builtin - __builtin_nansf16b, because (__bf16) __builtin_nansf ("") will drop the sNaN into qNaN, and uses f16b suffix instead of bf16 because there would be ambiguity on log vs. logb - __builtin_logbf16 could be either log with bf16 suffix or logb with f16 suffix. In other cases libstdc++ should mostly use __builtin_*f for std::bfloat16_t overloads (we have a problem with std::nextafter though but that one we have also for std::float16_t). 2022-10-14 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE. * tree.h (bfloat16_type_node): Define. * tree.cc (excess_precision_type): Promote bfloat16_type_mode like float16_type_mode. (build_common_tree_nodes): Initialize bfloat16_type_node if BFmode is supported. * expmed.h (maybe_expand_shift): Declare. * expmed.cc (maybe_expand_shift): No longer static. * expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF conversions. If there is no optab, handle BF -> {DF,XF,TF,HF} conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add -ffast-math generic implementation for BF -> SF and SF -> BF conversions. * builtin-types.def (BT_BFLOAT16, BT_FN_BFLOAT16_CONST_STRING): New. * builtins.def (BUILT_IN_NANSF16B): New builtin. * fold-const-call.cc (fold_const_call): Handle CFN_BUILT_IN_NANSF16B. * config/i386/i386.cc (classify_argument): Handle E_BCmode. (ix86_libgcc_floating_mode_supported_p): Also return true for BFmode for -msse2. (ix86_mangle_type): Mangle BFmode as DF16b. (ix86_invalid_conversion, ix86_invalid_unary_op, ix86_invalid_binary_op): Remove. (TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP): Don't redefine. * config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove. (ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than ix86_bf16_type_node, only create it if still NULL. * config/i386/i386-builtin-types.def (BFLOAT16): Likewise. * config/i386/i386.md (cbranchbf4, cstorebf4): New expanders. gcc/c-family/ * c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node, predefine __BFLT16_*__ macros and for C++23 also __STDCPP_BFLOAT16_T__. Predefine bfloat16_type_node related macros for -fbuilding-libgcc. * c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16. gcc/c/ * c-typeck.cc (convert_arguments): Don't promote __bf16 to double. gcc/cp/ * cp-tree.h (extended_float_type_p): Return true for bfloat16_type_node. * typeck.cc (cp_compare_floating_point_conversion_ranks): Set extended{1,2} if mv{1,2} is bfloat16_type_node. Adjust comment. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_bfloat16, check_effective_target_bfloat16_runtime, add_options_for_bfloat16): New. * gcc.dg/torture/bfloat16-basic.c: New test. * gcc.dg/torture/bfloat16-builtin.c: New test. * gcc.dg/torture/bfloat16-builtin-issignaling-1.c: New test. * gcc.dg/torture/bfloat16-complex.c: New test. * gcc.dg/torture/builtin-issignaling-1.c: Allow to be includable from bfloat16-builtin-issignaling-1.c. * gcc.dg/torture/floatn-basic.h: Allow to be includable from bfloat16-basic.c. * gcc.target/i386/vect-bfloat16-typecheck_2.c: Adjust expected diagnostics. * gcc.target/i386/sse2-bfloat16-scalar-typecheck.c: Likewise. * gcc.target/i386/vect-bfloat16-typecheck_1.c: Likewise. * g++.target/i386/bfloat_cpp_typecheck.C: Likewise. libcpp/ * include/cpplib.h (CPP_N_BFLOAT16): Define. * expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for C++. libgcc/ * config/i386/t-softfp (softfp_extensions): Add bfsf. (softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf. (CFLAGS-extendbfsf2.c, CFLAGS-truncsfbf2.c, CFLAGS-truncdfbf2.c, CFLAGS-truncxfbf2.c, CFLAGS-trunctfbf2.c, CFLAGS-trunchfbf2.c): Add -msse2. * config/i386/libgcc-glibc.ver (GCC_13.0.0): Export __extendbfsf2 and __trunc{s,d,x,t,h}fbf2. * config/i386/sfp-machine.h (_FP_NANSIGN_B): Define. * config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define. * config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define. * soft-fp/brain.h: New file. * soft-fp/truncsfbf2.c: New file. * soft-fp/truncdfbf2.c: New file. * soft-fp/truncxfbf2.c: New file. * soft-fp/trunctfbf2.c: New file. * soft-fp/trunchfbf2.c: New file. * soft-fp/truncbfhf2.c: New file. * soft-fp/extendbfsf2.c: New file. libiberty/ * cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment. * cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t entry. (cplus_demangle_type): Demangle DF16b. * testsuite/demangle-expected (_Z3xxxDF16b): New test.
2022-10-07Add a new option -fstrict-flex-arrays[=n] and new attribute strict_flex_arrayQing Zhao1-4/+10
Add the following new option -fstrict-flex-arrays[=n] and a corresponding attribute strict_flex_array to GCC: '-fstrict-flex-arrays' Control when to treat the trailing array of a structure as a flexible array member for the purpose of accessing the elements of such an array. The positive form is equivalent to '-fstrict-flex-arrays=3', which is the strictest. A trailing array is treated as a flexible array member only when it declared as a flexible array member per C99 standard onwards. The negative form is equivalent to '-fstrict-flex-arrays=0', which is the least strict. All trailing arrays of structures are treated as flexible array members. '-fstrict-flex-arrays=LEVEL' Control when to treat the trailing array of a structure as a flexible array member for the purpose of accessing the elements of such an array. The value of LEVEL controls the level of strictness The possible values of LEVEL are the same as for the 'strict_flex_array' attribute (*note Variable Attributes::). You can control this behavior for a specific trailing array field of a structure by using the variable attribute 'strict_flex_array' attribute (*note Variable Attributes::). 'strict_flex_array (LEVEL)' The 'strict_flex_array' attribute should be attached to the trailing array field of a structure. It controls when to treat the trailing array field of a structure as a flexible array member for the purposes of accessing the elements of such an array. LEVEL must be an integer betwen 0 to 3. LEVEL=0 is the least strict level, all trailing arrays of structures are treated as flexible array members. LEVEL=3 is the strictest level, only when the trailing array is declared as a flexible array member per C99 standard onwards ('[]'), it is treated as a flexible array member. There are two more levels in between 0 and 3, which are provided to support older codes that use GCC zero-length array extension ('[0]') or one-element array as flexible array members('[1]'): When LEVEL is 1, the trailing array is treated as a flexible array member when it is declared as either '[]', '[0]', or '[1]'; When LEVEL is 2, the trailing array is treated as a flexible array member when it is declared as either '[]', or '[0]'. This attribute can be used with or without the '-fstrict-flex-arrays'. When both the attribute and the option present at the same time, the level of the strictness for the specific trailing array field is determined by the attribute. gcc/c-family/ChangeLog: * c-attribs.cc (handle_strict_flex_array_attribute): New function. (c_common_attribute_table): New item for strict_flex_array. * c.opt: (fstrict-flex-arrays): New option. (fstrict-flex-arrays=): New option. gcc/c/ChangeLog: * c-decl.cc (flexible_array_member_type_p): New function. (one_element_array_type_p): Likewise. (zero_length_array_type_p): Likewise. (add_flexible_array_elts_to_size): Call new utility routine flexible_array_member_type_p. (is_flexible_array_member_p): New function. (finish_struct): Set the new DECL_NOT_FLEXARRAY flag. gcc/cp/ChangeLog: * module.cc (trees_out::core_bools): Stream out new bit decl_not_flexarray. (trees_in::core_bools): Stream in new bit decl_not_flexarray. gcc/ChangeLog: * doc/extend.texi: Document strict_flex_array attribute. * doc/invoke.texi: Document -fstrict-flex-arrays[=n] option. * print-tree.cc (print_node): Print new bit decl_not_flexarray. * tree-core.h (struct tree_decl_common): New bit field decl_not_flexarray. * tree-streamer-in.cc (unpack_ts_decl_common_value_fields): Stream in new bit decl_not_flexarray. * tree-streamer-out.cc (pack_ts_decl_common_value_fields): Stream out new bit decl_not_flexarray. * tree.cc (array_at_struct_end_p): Update it with the new bit field decl_not_flexarray. * tree.h (DECL_NOT_FLEXARRAY): New flag. gcc/testsuite/ChangeLog: * g++.dg/strict-flex-array-1.C: New test. * gcc.dg/strict-flex-array-1.c: New test.
2022-09-27c++: Implement P1467R9 - Extended floating-point types and standard names ↵Jakub Jelinek1-0/+4
compiler part except for bfloat16 [PR106652] The following patch implements the compiler part of C++23 P1467R9 - Extended floating-point types and standard names compiler part by introducing _Float{16,32,64,128} as keywords and builtin types like they are implemented for C already since GCC 7, with DF{16,32,64,128}_ mangling. It also introduces _Float{32,64,128}x for C++ with the https://github.com/itanium-cxx-abi/cxx-abi/pull/147 proposed mangling of DF{32,64,128}x. The patch doesn't add anything for bfloat16_t support, as right now __bf16 type refuses all conversions and arithmetic operations. The patch wants to keep backwards compatibility with how __float128 has been handled in C++ before, both for mangling and behavior in binary operations, overload resolution etc. So, there are some backend changes where for C __float128 and _Float128 are the same type (float128_type_node and float128t_type_node are the same pointer), but for C++ they are distinct types which mangle differently and _Float128 is treated as extended floating-point type while __float128 is treated as non-standard floating point type. The various C++23 changes about how floating-point types are changed are actually implemented as written in the spec only if at least one of the types involved is _Float{16,32,64,128,32x,64x,128x} (_FloatNx are also treated as extended floating-point types) and kept previous behavior otherwise. For float/double/long double the rules are actually written that they behave the same as before. There is some backwards incompatibility at least on x86 regarding _Float16, because that type was already used by that name and with the DF16_ mangling (but only since GCC 12 and I think it isn't that widely used in the wild yet). E.g. config/i386/avx512fp16intrin.h shows the issues, where in C or in GCC 12 in C++ one could pass 0.0f to a builtin taking _Float16 argument, but with the changes that is not possible anymore, one needs to either use 0.0f16 or (_Float16) 0.0f. We have also a problem with glibc headers, where since glibc 2.27 math.h and complex.h aren't compilable with these changes. One gets errors like: In file included from /usr/include/math.h:43, from abc.c:1: /usr/include/bits/floatn.h:86:9: error: multiple types in one declaration 86 | typedef __float128 _Float128; | ^~~~~~~~~~ /usr/include/bits/floatn.h:86:20: error: declaration does not declare anything [-fpermissive] 86 | typedef __float128 _Float128; | ^~~~~~~~~ In file included from /usr/include/bits/floatn.h:119: /usr/include/bits/floatn-common.h:214:9: error: multiple types in one declaration 214 | typedef float _Float32; | ^~~~~ /usr/include/bits/floatn-common.h:214:15: error: declaration does not declare anything [-fpermissive] 214 | typedef float _Float32; | ^~~~~~~~ /usr/include/bits/floatn-common.h:251:9: error: multiple types in one declaration 251 | typedef double _Float64; | ^~~~~~ /usr/include/bits/floatn-common.h:251:16: error: declaration does not declare anything [-fpermissive] 251 | typedef double _Float64; | ^~~~~~~~ This is from snippets like: /* The remaining of this file provides support for older compilers. */ # if __HAVE_FLOAT128 /* The type _Float128 exists only since GCC 7.0. */ # if !__GNUC_PREREQ (7, 0) || defined __cplusplus typedef __float128 _Float128; # endif where it hardcodes that C++ doesn't have _Float{16,32,64,128,32x,64x,128x} support nor {f,F}{16,32,64,128}{,x} literal suffixes nor _Complex _Float{16,32,64,128,32x,64x,128x}. The patch fixincludes this for now and hopefully if this is committed, then glibc can change those. The patch changes those # if !__GNUC_PREREQ (7, 0) || defined __cplusplus conditions to # if !__GNUC_PREREQ (7, 0) || (defined __cplusplus && !__GNUC_PREREQ (13, 0)) Another thing is mangling, as said above, Itanium C++ ABI specifies DF <number> _ as _Float{16,32,64,128} mangling, but GCC was implementing a mangling incompatible with that starting with DF for fixed point types. Fixed point was never supported in C++ though, I believe the reason why the mangling has been added was that due to a bug it would leak into the C++ FE through decltype (0.0r) etc. But that has been shortly after the mangling was added fixed (I think in the same GCC release cycle), so we now reject 0.0r etc. in C++. If we ever need the fixed point mangling, I think it can be readded but better with a different prefix so that it doesn't conflict with the published standard manglings. So, this patch also kills the fixed point mangling and implements the DF <number> _ demangling. The patch predefines __STDCPP_FLOAT{16,32,64,128}_T__ macros when those types are available, but only for C++23, while the underlying types are available in C++98 and later including the {f,F}{16,32,64,128} literal suffixes (but those with a pedwarn for C++20 and earlier). My understanding is that it needs to be predefined by the compiler, on the other side predefining even for older modes when <stdfloat> is a new C++23 header would be weird. One can find out if _Float{16,32,64,128,32x,64x,128x} is supported in C++ by __GNUC__ >= 13 && defined(__FLT{16,32,64,128,32X,64X,128X}_MANT_DIG__) (but that doesn't work well with older G++ 13 snapshots). As for std::bfloat16_t, three targets (aarch64, arm and x86) apparently "support" __bf16 type which has the bfloat16 format, but isn't really usable, e.g. {aarch64,arm,ix86}_invalid_conversion disallow any conversions from or to type with BFmode, {aarch64,arm,ix86}_invalid_unary_op disallows any unary operations on those except for ADDR_EXPR and {aarch64,arm,ix86}_invalid_binary_op disallows any binary operation on those. So, I think we satisfy: "If the implementation supports an extended floating-point type with the properties, as specified by ISO/IEC/IEEE 60559, of radix (b) of 2, storage width in bits (k) of 16, precision in bits (p) of 8, maximum exponent (emax) of 127, and exponent field width in bits (w) of 8, then the typedef-name std::bfloat16_t is defined in the header <stdfloat> and names such a type, the macro __STDCPP_BFLOAT16_T__ is defined, and the floating-point literal suffixes bf16 and BF16 are supported." because we don't really support those right now. 2022-09-27 Jakub Jelinek <jakub@redhat.com> PR c++/106652 PR c++/85518 gcc/ * tree-core.h (enum tree_index): Add TI_FLOAT128T_TYPE enumerator. * tree.h (float128t_type_node): Define. * tree.cc (build_common_tree_nodes): Initialize float128t_type_node. * builtins.def (DEF_FLOATN_BUILTIN): Adjust comment now that _Float<N> is supported in C++ too. * config/i386/i386.cc (ix86_mangle_type): Only mangle as "g" float128t_type_node. * config/i386/i386-builtins.cc (ix86_init_builtin_types): Use float128t_type_node for __float128 instead of float128_type_node and create it if NULL. * config/i386/avx512fp16intrin.h (_mm_setzero_ph, _mm256_setzero_ph, _mm512_setzero_ph, _mm_set_sh, _mm_load_sh): Use 0.0f16 instead of 0.0f. * config/ia64/ia64.cc (ia64_init_builtins): Use float128t_type_node for __float128 instead of float128_type_node and create it if NULL. * config/rs6000/rs6000-c.cc (is_float128_p): Also return true for float128t_type_node if non-NULL. * config/rs6000/rs6000.cc (rs6000_mangle_type): Don't mangle float128_type_node as "u9__ieee128". * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Use float128t_type_node for __float128 instead of float128_type_node and create it if NULL. gcc/c-family/ * c-common.cc (c_common_reswords): Change _Float{16,32,64,128} and _Float{32,64,128}x flags from D_CONLY to 0. (shorten_binary_op): Punt if common_type returns error_mark_node. (shorten_compare): Likewise. (c_common_nodes_and_builtins): For C++ record _Float{16,32,64,128} and _Float{32,64,128}x builtin types if available. For C++ clear float128t_type_node. * c-cppbuiltin.cc (c_cpp_builtins): Predefine __STDCPP_FLOAT{16,32,64,128}_T__ for C++23 if supported. * c-lex.cc (interpret_float): For q/Q suffixes prefer float128t_type_node over float128_type_node. Allow {f,F}{16,32,64,128} suffixes for C++ if supported with pedwarn for C++20 and older. Allow {f,F}{32,64,128}x suffixes for C++ with pedwarn. Don't call excess_precision_type for C++. gcc/cp/ * cp-tree.h (cp_compare_floating_point_conversion_ranks): Implement P1467R9 - Extended floating-point types and standard names except for std::bfloat16_t for now. Declare. (extended_float_type_p): New inline function. * mangle.cc (write_builtin_type): Mangle float{16,32,64,128}_type_node as DF{16,32,64,128}_. Mangle float{32,64,128}x_type_node as DF{32,64,128}x. Remove FIXED_POINT_TYPE mangling that conflicts with that. * typeck2.cc (check_narrowing): If one of ftype or type is extended floating-point type, compare floating-point conversion ranks. * parser.cc (cp_keyword_starts_decl_specifier_p): Handle CASE_RID_FLOATN_NX. (cp_parser_simple_type_specifier): Likewise and diagnose missing _Float<N> or _Float<N>x support if not supported by target. * typeck.cc (cp_compare_floating_point_conversion_ranks): New function. (cp_common_type): If both types are REAL_TYPE and one or both are extended floating-point types, select common type based on comparison of floating-point conversion ranks and subranks. (cp_build_binary_op): Diagnose operation with floating point arguments with unordered conversion ranks. * call.cc (standard_conversion): For floating-point conversion, if either from or to are extended floating-point types, set conv->bad_p for implicit conversion from larger to smaller conversion rank or with unordered conversion ranks. (convert_like_internal): Emit a pedwarn on such conversions. (build_conditional_expr): Diagnose operation with floating point arguments with unordered conversion ranks. (convert_arg_to_ellipsis): Don't promote extended floating-point types narrower than double to double. (compare_ics): Implement P1467R9 [over.ics.rank]/4 changes. gcc/testsuite/ * g++.dg/cpp23/ext-floating1.C: New test. * g++.dg/cpp23/ext-floating2.C: New test. * g++.dg/cpp23/ext-floating3.C: New test. * g++.dg/cpp23/ext-floating4.C: New test. * g++.dg/cpp23/ext-floating5.C: New test. * g++.dg/cpp23/ext-floating6.C: New test. * g++.dg/cpp23/ext-floating7.C: New test. * g++.dg/cpp23/ext-floating8.C: New test. * g++.dg/cpp23/ext-floating9.C: New test. * g++.dg/cpp23/ext-floating10.C: New test. * g++.dg/cpp23/ext-floating.h: New file. * g++.target/i386/float16-1.C: Adjust expected diagnostics. libcpp/ * expr.cc (interpret_float_suffix): Allow {f,F}{16,32,64,128} and {f,F}{32,64,128}x suffixes for C++. include/ * demangle.h (enum demangle_component_type): Add DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. (struct demangle_component): Add u.s_extended_builtin member. libiberty/ * cp-demangle.c (d_dump): Handle DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. Don't handle DEMANGLE_COMPONENT_FIXED_TYPE. (d_make_extended_builtin_type): New function. (cplus_demangle_builtin_types): Add _Float entry. (cplus_demangle_type): For DF demangle it as _Float<N> or _Float<N>x rather than fixed point which conflicts with it. (d_count_templates_scopes): Handle DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. Just break; for DEMANGLE_COMPONENT_FIXED_TYPE. (d_find_pack): Handle DEMANGLE_COMPONENT_EXTENDED_BUILTIN_TYPE. Don't handle DEMANGLE_COMPONENT_FIXED_TYPE. (d_print_comp_inner): Likewise. * cp-demangle.h (D_BUILTIN_TYPE_COUNT): Bump. * testsuite/demangle-expected: Replace _Z3xxxDFyuVb test with _Z3xxxDF16_DF32_DF64_DF128_CDF16_Vb. Add _Z3xxxDF32xDF64xDF128xCDF32xVb test. fixincludes/ * inclhack.def (glibc_cxx_floatn_1, glibc_cxx_floatn_2, glibc_cxx_floatn_3): New fixes. * tests/base/bits/floatn.h: New file. * fixincl.x: Regenerated.
2022-09-03openmp: Partial OpenMP 5.2 doacross and omp_cur_iteration supportJakub Jelinek1-1/+13
The following patch implements part of the OpenMP 5.2 changes related to ordered loops and with the assumed resolution of https://github.com/OpenMP/spec/issues/3302 issues. The changes are: 1) the depend clause on stand-alone ordered constructs has been renamed to doacross (because depend clause has different syntax on other constructs) with some syntax changes below, depend clause is deprecated (we'll deprecate stuff on the GCC side only when we have everything else from 5.2 implemented) depend(source) -> doacross(source:) or doacross(source:omp_cur_iteration) depend(sink:vec) -> doacross(sink:vec) (where vec has the same syntax as before) 2) in 5.1 and before it has been significant whether ordered clause has or doesn't have an argument, if it didn't, only block-associated ordered could appear in the body, if it did, only stand-alone ordered could appear in the body, all loops had to be perfectly nested, no associated range-based for loops, no linear clause on work-sharing loop and ordered clause with an argument wasn't allowed on composite for simd. In 5.2, whether ordered clause has or doesn't have an argument is insignificant (except for bugs in the standard, #3302 mentions those), if the argument is missing, it is simply treated as equal to collapse argument (if any, otherwise 1). The implementation better should be able to differentiate between ordered and doacross loops at compile time which previously was through the absence or presence of the argument, now it is done through looking at the body of the construct lexically and looking for stand-alone ordered constructs. If there are any, it is to be handled as doacross loop, otherwise it is ordered loop (but in that case ordered argument if present must be equal to collapse argument - 5.2 says instead it must be one, but that is clearly wrong and mentioned in #3302) - stand-alone ordered constructs must appear lexically in the body (and had to before as well). For the restrictions mentioned above, the for simd restriction is gone (stand-alone ordered can't appear in simd construct, so that is enough), and the other rules are expected to be changed into something related to presence of stand-alone ordered constructs in the body 3) 5.2 allows a new syntax, doacross(sink:omp_cur_iteration-1), which means wait for previous iteration in the iteration space of all the associated loops The following patch implements that, except that we sorry for now on the doacross(sink:omp_cur_iteration-1) syntax during omp expansion because library side isn't done yet for it. It doesn't implement it for the Fortran FE either. Incrementally, I'd like to change the way we differentiate between stand-alone and block-associated ordered constructs, because the current way of looking for presence of doacross clause doesn't work well if those clauses are removed because they had been invalid (wrong syntax or unknown variables in it etc.) and of course implement doacross(sink:omp_cur_iteration-1). 2022-09-03 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_DOACROSS. (enum omp_clause_depend_kind): Remove OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK, add OMP_CLAUSE_DEPEND_INVALID. (enum omp_clause_doacross_kind): New type. (struct tree_omp_clause): Add subcode.doacross_kind member. * tree.h (OMP_CLAUSE_DEPEND_SINK_NEGATIVE): Remove. (OMP_CLAUSE_DOACROSS_KIND): Define. (OMP_CLAUSE_DOACROSS_SINK_NEGATIVE): Define. (OMP_CLAUSE_DOACROSS_DEPEND): Define. (OMP_CLAUSE_ORDERED_DOACROSS): Define. * tree.cc (omp_clause_num_ops, omp_clause_code_name): Add OMP_CLAUSE_DOACROSS entries. * tree-nested.cc (convert_nonlocal_omp_clauses, convert_local_omp_clauses): Handle OMP_CLAUSE_DOACROSS. * tree-pretty-print.cc (dump_omp_clause): Don't handle OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK. Handle OMP_CLAUSE_DOACROSS. * gimplify.cc (gimplify_omp_depend): Don't handle OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK. (gimplify_scan_omp_clauses): Likewise. Handle OMP_CLAUSE_DOACROSS. (gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_DOACROSS. (find_standalone_omp_ordered): New function. (gimplify_omp_for): When OMP_CLAUSE_ORDERED is present, search body for OMP_ORDERED with OMP_CLAUSE_DOACROSS and if found, set OMP_CLAUSE_ORDERED_DOACROSS. (gimplify_omp_ordered): Don't handle OMP_CLAUSE_DEPEND_SINK or OMP_CLAUSE_DEPEND_SOURCE, instead check OMP_CLAUSE_DOACROSS, adjust diagnostics that presence or absence of ordered clause parameter is irrelevant. Handle doacross(sink:omp_cur_iteration-1). Use actual user name of the clause - doacross or depend - in diagnostics. * omp-general.cc (omp_extract_for_data): Don't set fd->ordered if !OMP_CLAUSE_ORDERED_DOACROSS (t). If OMP_CLAUSE_ORDERED_DOACROSS (t) but !OMP_CLAUSE_ORDERED_EXPR (t), set fd->ordered to -1 and set it after the loop in that case to fd->collapse. * omp-low.cc (check_omp_nesting_restrictions): Don't handle OMP_CLAUSE_DEPEND_SOURCE nor OMP_CLAUSE_DEPEND_SINK, instead check OMP_CLAUSE_DOACROSS. Use actual user name of the clause - doacross or depend - in diagnostics. Diagnose mixing of stand-alone and block associated ordered constructs binding to the same loop. (lower_omp_ordered_clauses): Don't handle OMP_CLAUSE_DEPEND_SINK, instead handle OMP_CLAUSE_DOACROSS. (lower_omp_ordered): Look for OMP_CLAUSE_DOACROSS instead of OMP_CLAUSE_DEPEND. (lower_depend_clauses): Don't handle OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK. * omp-expand.cc (expand_omp_ordered_sink): Emit a sorry for doacross(sink:omp_cur_iteration-1). (expand_omp_ordered_source_sink): Use OMP_CLAUSE_DOACROSS_SINK_NEGATIVE instead of OMP_CLAUSE_DEPEND_SINK_NEGATIVE. Use actual user name of the clause - doacross or depend - in diagnostics. (expand_omp): Look for OMP_CLAUSE_DOACROSS clause instead of OMP_CLAUSE_DEPEND. (build_omp_regions_1): Likewise. (omp_make_gimple_edges): Likewise. * lto-streamer-out.cc (hash_tree): Handle OMP_CLAUSE_DOACROSS. * tree-streamer-in.cc (unpack_ts_omp_clause_value_fields): Likewise. * tree-streamer-out.cc (pack_ts_omp_clause_value_fields): Likewise. gcc/c-family/ * c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_DOACROSS. * c-omp.cc (c_finish_omp_depobj): Check also for OMP_CLAUSE_DOACROSS clause and diagnose it. Don't handle OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK. Assert kind is not OMP_CLAUSE_DEPEND_INVALID. gcc/c/ * c-parser.cc (c_parser_omp_clause_name): Handle doacross. (c_parser_omp_clause_depend_sink): Renamed to ... (c_parser_omp_clause_doacross_sink): ... this. Add depend_p argument. Handle parsing of doacross(sink:omp_cur_iteration-1). Use OMP_CLAUSE_DOACROSS_SINK_NEGATIVE instead of OMP_CLAUSE_DEPEND_SINK_NEGATIVE, build OMP_CLAUSE_DOACROSS instead of OMP_CLAUSE_DEPEND and set OMP_CLAUSE_DOACROSS_DEPEND flag on it. (c_parser_omp_clause_depend): Use OMP_CLAUSE_DOACROSS_SINK and OMP_CLAUSE_DOACROSS_SOURCE instead of OMP_CLAUSE_DEPEND_SINK and OMP_CLAUSE_DEPEND_SOURCE, build OMP_CLAUSE_DOACROSS for depend(source) and set OMP_CLAUSE_DOACROSS_DEPEND on it. (c_parser_omp_clause_doacross): New function. (c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DOACROSS. (c_parser_omp_depobj): Use OMP_CLAUSE_DEPEND_INVALID instead of OMP_CLAUSE_DEPEND_SOURCE. (c_parser_omp_for_loop): Don't diagnose here linear clause together with ordered with argument. (c_parser_omp_simd): Don't diagnose ordered clause with argument on for simd. (OMP_ORDERED_DEPEND_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_DOACROSS. (c_parser_omp_ordered): Handle also doacross and adjust for it diagnostic wording. * c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_DOACROSS. Don't handle OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK. gcc/cp/ * parser.cc (cp_parser_omp_clause_name): Handle doacross. (cp_parser_omp_clause_depend_sink): Renamed to ... (cp_parser_omp_clause_doacross_sink): ... this. Add depend_p argument. Handle parsing of doacross(sink:omp_cur_iteration-1). Use OMP_CLAUSE_DOACROSS_SINK_NEGATIVE instead of OMP_CLAUSE_DEPEND_SINK_NEGATIVE, build OMP_CLAUSE_DOACROSS instead of OMP_CLAUSE_DEPEND and set OMP_CLAUSE_DOACROSS_DEPEND flag on it. (cp_parser_omp_clause_depend): Use OMP_CLAUSE_DOACROSS_SINK and OMP_CLAUSE_DOACROSS_SOURCE instead of OMP_CLAUSE_DEPEND_SINK and OMP_CLAUSE_DEPEND_SOURCE, build OMP_CLAUSE_DOACROSS for depend(source) and set OMP_CLAUSE_DOACROSS_DEPEND on it. (cp_parser_omp_clause_doacross): New function. (cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DOACROSS. (cp_parser_omp_depobj): Use OMP_CLAUSE_DEPEND_INVALID instead of OMP_CLAUSE_DEPEND_SOURCE. (cp_parser_omp_for_loop): Don't diagnose here linear clause together with ordered with argument. (cp_parser_omp_simd): Don't diagnose ordered clause with argument on for simd. (OMP_ORDERED_DEPEND_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_DOACROSS. (cp_parser_omp_ordered): Handle also doacross and adjust for it diagnostic wording. * pt.cc (tsubst_omp_clause_decl): Use OMP_CLAUSE_DOACROSS_SINK_NEGATIVE instead of OMP_CLAUSE_DEPEND_SINK_NEGATIVE. (tsubst_omp_clauses): Handle OMP_CLAUSE_DOACROSS. (tsubst_expr): Use OMP_CLAUSE_DEPEND_INVALID instead of OMP_CLAUSE_DEPEND_SOURCE. * semantics.cc (cp_finish_omp_clause_depend_sink): Rename to ... (cp_finish_omp_clause_doacross_sink): ... this. (finish_omp_clauses): Handle OMP_CLAUSE_DOACROSS. Don't handle OMP_CLAUSE_DEPEND_SOURCE and OMP_CLAUSE_DEPEND_SINK. gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_clauses): Use OMP_CLAUSE_DOACROSS_SINK_NEGATIVE instead of OMP_CLAUSE_DEPEND_SINK_NEGATIVE, build OMP_CLAUSE_DOACROSS clause instead of OMP_CLAUSE_DEPEND and set OMP_CLAUSE_DOACROSS_DEPEND on it. gcc/testsuite/ * c-c++-common/gomp/doacross-2.c: Adjust expected diagnostics. * c-c++-common/gomp/doacross-5.c: New test. * c-c++-common/gomp/doacross-6.c: New test. * c-c++-common/gomp/nesting-2.c: Adjust expected diagnostics. * c-c++-common/gomp/ordered-3.c: Likewise. * c-c++-common/gomp/sink-3.c: Likewise. * gfortran.dg/gomp/nesting-2.f90: Likewise.