Age | Commit message (Collapse) | Author | Files | Lines |
|
'construct={target}' called directly
void f(void)
{
#pragma omp metadirective \
when (construct={target}: A) \
default (B)
...
}
...
{
#pragma omp target
f(); // Target call
f(); // Local call
}
With the OpenMP 5.0/5.1 specifications, we would expect A to be selected in
the metadirective when the target call is made, but B when f is called
directly outside of a target context. However, since GCC does not have
separate copies of f for local and target calls, and the construct selector
is static, it must be resolved one way or the other at compile-time (currently
in the favour of selecting A), which may be unexpected behaviour.
This patch attempts to detect the above situation, and will emit a warning
if found.
2022-01-28 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* gimplify.cc (gimplify_omp_metadirective): Mark offloadable functions
containing metadirectives with 'construct={target}' in the selector.
* omp-general.cc (omp_has_target_constructor_p): New.
* omp-general.h (omp_has_target_constructor_p): New prototype.
* omp-low.cc (lower_omp_1): Emit warning if marked functions called
outside of a target context.
gcc/testsuite/
* c-c++-common/gomp/metadirective-4.c (main): Add expected warning.
* gfortran.dg/gomp/metadirective-4.f90 (test): Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/metadirective-2.c (main): Add
expected warning.
* testsuite/libgomp.fortran/metadirective-2.f90 (test): Likewise.
|
|
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* builtin-types.def (BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR): New
type.
* omp-builtins.def (BUILT_IN_GOMP_EVALUATE_TARGET_DEVICE): New builtin.
* omp-general.cc (omp_context_selector_matches): Handle 'target_device'
selector set.
(omp_dynamic_cond): Generate expression tree for 'target_device'
selector set.
(omp_context_compute_score): Handle selectors in 'target_device' set.
gcc/c/
* c-parser.cc (omp_target_device_selectors): New.
(c_parser_omp_context_selector): Accept 'target_device' selector set.
Treat 'device_num' selector as expression.
(c_parser_omp_context_selector_specification): Handle 'target_device'
selector set.
gcc/cp/
* parser.cc (omp_target_device_selectors): New.
(cp_parser_omp_context_selector): Accept 'target_device' selector set.
Treat 'device_num' selector as expression.
(cp_parser_omp_context_selector_specification): Handle 'target_device'
selector set.
gcc/fortran/
* openmp.cc (omp_target_device_selectors): New.
(gfc_match_omp_context_selector): Accept 'target_device' selector set.
Treat 'device_num' selector as expression.
(gfc_match_omp_context_selector_specification): Handle 'target_device'
selector set.
* types.def (BT_FN_BOOL_INT_CONST_PTR_CONST_PTR_CONST_PTR): New type.
gcc/testsuite/
* c-c++-common/gomp/metadirective-7.c: New.
* gfortran.dg/gomp/metadirective-7.f90: New.
libgomp/
* Makefile.am (libgomp_la_SOURCES): Add selector.c.
* Makefile.am: Regenerate.
* config/gcn/selector.c: New.
* config/linux/selector.c: New.
* config/linux/x86/selector.c: New.
* config/nvptx/selector.c: New.
* libgomp-plugin.h (GOMP_OFFLOAD_evaluate_device): New.
* libgomp.h (struct gomp_device_descr): Add evaluate_device_func field.
* libgomp.map (GOMP_5.1): Add GOMP_evaluate_target_device.
* libgomp_g.h (GOMP_evaluate_current_device): New.
(GOMP_evaluate_target_device): New.
* oacc-host.c (host_evaluate_device): New.
(host_openacc_exec): Initialize evaluate_device_func field to
host_evaluate_device.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_evaluate_device): New.
* plugin/plugin-nvptx.c (struct ptx_device): Add compute_major and
compute_minor fields.
(nvptx_open_device): Read compute capability information from device.
(CHECK_ISA): New macro.
(GOMP_OFFLOAD_evaluate_device): New.
* selector.c: New.
* target.c (GOMP_evaluate_target_device): New.
(gomp_load_plugin_for_device): Load evaulate_device plugin function.
* testsuite/libgomp.c-c++-common/metadirective-5.c: New testcase.
* testsuite/libgomp.fortran/metadirective-5.f90: New testcase.
|
|
Fix regressions introduced by block/statement skipping.
If user condition selector is constant, do not return it as a dynamic
selector.
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c/
* c-parser.cc (c_parser_skip_to_end_of_block_or_statement): Track
bracket depth separately from nesting depth.
gcc/cp/
* parser.cc (cp_parser_skip_to_end_of_statement): Revert.
(cp_parser_skip_to_end_of_block_or_statement): Track bracket depth
separately from nesting depth.
gcc/
* omp-general.cc (omp_dynamic_cond): Do not return user condition if
constant.
|
|
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/gomp/metadirective-1.c: New.
* c-c++-common/gomp/metadirective-2.c: New.
* c-c++-common/gomp/metadirective-3.c: New.
* c-c++-common/gomp/metadirective-4.c: New.
* c-c++-common/gomp/metadirective-5.c: New.
* c-c++-common/gomp/metadirective-6.c: New.
* gcc.dg/gomp/metadirective-1.c: New.
* gfortran.dg/gomp/metadirective-1.f90: New.
* gfortran.dg/gomp/metadirective-2.f90: New.
* gfortran.dg/gomp/metadirective-3.f90: New.
* gfortran.dg/gomp/metadirective-4.f90: New.
* gfortran.dg/gomp/metadirective-5.f90: New.
* gfortran.dg/gomp/metadirective-6.f90: New.
libgomp/
* testsuite/libgomp.c-c++-common/metadirective-1.c: New.
* testsuite/libgomp.c-c++-common/metadirective-2.c: New.
* testsuite/libgomp.c-c++-common/metadirective-3.c: New.
* testsuite/libgomp.c-c++-common/metadirective-4.c: New.
* testsuite/libgomp.fortran/metadirective-1.f90: New.
* testsuite/libgomp.fortran/metadirective-2.f90: New.
* testsuite/libgomp.fortran/metadirective-3.f90: New.
* testsuite/libgomp.fortran/metadirective-4.f90: New.
|
|
This adds support for parsing OpenMP metadirectives in the Fortran front end.
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* omp-general.cc (omp_check_context_selector): Revert string length
check.
(omp_context_name_list_prop): Likewise.
gcc/fortran/
* decl.cc (gfc_match_end): Handle COMP_OMP_METADIRECTIVE and
COMP_OMP_BEGIN_METADIRECTIVE.
* dump-parse-tree.cc (show_omp_node): Handle EXEC_OMP_METADIRECTIVE.
(show_code_node): Handle EXEC_OMP_METADIRECTIVE.
* gfortran.h (enum gfc_statement): Add ST_OMP_METADIRECTIVE,
ST_OMP_BEGIN_METADIRECTIVE and ST_OMP_END_METADIRECTIVE.
(struct gfc_omp_metadirective_clause): New structure.
(gfc_get_omp_metadirective_clause): New macro.
(struct gfc_st_label): Add omp_region field.
(enum gfc_exec_op): Add EXEC_OMP_METADIRECTIVE.
(struct gfc_code): Add omp_metadirective_clauses field.
(gfc_free_omp_metadirective_clauses): New prototype.
(match_omp_directive): New prototype.
* io.cc (format_asterisk): Initialize omp_region field.
* match.h (gfc_match_omp_begin_metadirective): New prototype.
(gfc_match_omp_metadirective): New prototype.
* openmp.cc (gfc_match_omp_eos): Match ')' in context selectors.
(gfc_free_omp_metadirective_clauses): New.
(gfc_match_omp_clauses): Remove context_selector argument. Rely on
gfc_match_omp_eos to match end of clauses.
(match_omp): Remove extra argument to gfc_match_omp_clauses.
(gfc_match_omp_context_selector): Remove extra argument to
gfc_match_omp_clauses. Set gfc_matching_omp_context_selector
before call to gfc_match_omp_clauses and reset after.
(gfc_match_omp_context_selector_specification): Modify to take a
gfc_omp_set_selector** argument.
(gfc_match_omp_declare_variant): Pass set_selectors to
gfc_match_omp_context_selector_specification.
(match_omp_metadirective): New.
(gfc_match_omp_begin_metadirective): New.
(gfc_match_omp_metadirective): New.
(resolve_omp_metadirective): New.
(gfc_resolve_omp_directive): Handle EXEC_OMP_METADIRECTIVE.
* parse.cc (gfc_matching_omp_context_selector): New variable.
(gfc_in_metadirective_body): New variable.
(gfc_omp_region_count): New variable.
(decode_omp_directive): Match 'begin metadirective',
'end metadirective' and 'metadirective'.
(match_omp_directive): New.
(case_omp_structured_block): New.
(case_omp_do): New.
(gfc_ascii_statement): Handle metadirective statements.
(gfc_omp_end_stmt): New.
(parse_omp_do): Delegate to gfc_omp_end_stmt.
(parse_omp_structured_block): Delegate to gfc_omp_end_stmt. Handle
ST_OMP_END_METADIRECTIVE.
(parse_omp_metadirective_body): New.
(parse_executable): Delegate to case_omp_structured_block and
case_omp_do. Return after one statement if compiling regular
metadirective. Handle metadirective statements.
(gfc_parse_file): Reset gfc_omp_region_count,
gfc_in_metadirective_body and gfc_matching_omp_context_selector.
* parse.h (enum gfc_compile_state): Add COMP_OMP_METADIRECTIVE and
COMP_OMP_BEGIN_METADIRECTIVE.
(gfc_omp_end_stmt): New prototype.
(gfc_matching_omp_context_selector): New declaration.
(gfc_in_metadirective_body): New declaration.
(gfc_omp_region_count): New declaration.
* resolve.cc (gfc_resolve_code): Handle EXEC_OMP_METADIRECTIVE.
* st.cc (gfc_free_statement): Handle EXEC_OMP_METADIRECTIVE.
* symbol.cc (compare_st_labels): Take omp_region into account.
(gfc_get_st_labels): Incorporate omp_region into label.
* trans-decl.cc (gfc_get_label_decl): Add omp_region into translated
label name.
* trans-openmp.cc (gfc_trans_omp_directive): Handle
EXEC_OMP_METADIRECTIVE.
(gfc_trans_omp_set_selector): Hoist code from...
(gfc_trans_omp_declare_variant): ...here.
(gfc_trans_omp_metadirective): New.
* trans-stmt.h (gfc_trans_omp_metadirective): New prototype.
* trans.cc (trans_code): Handle EXEC_OMP_METADIRECTIVE.
|
|
This adds support for parsing OpenMP metadirectives in the C++ front end.
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/cp/
* parser.cc (cp_parser_skip_to_end_of_statement): Handle parentheses.
(cp_parser_skip_to_end_of_block_or_statement): Likewise.
(cp_parser_omp_context_selector): Add extra argument. Allow
non-constant expressions.
(cp_parser_omp_context_selector_specification): Add extra argument and
propagate to cp_parser_omp_context_selector.
(analyze_metadirective_body): New.
(cp_parser_omp_metadirective): New.
(cp_parser_omp_construct): Handle PRAGMA_OMP_METADIRECTIVE.
(cp_parser_pragma): Handle PRAGMA_OMP_METADIRECTIVE.
|
|
This patch adds support for streaming metadirective Gimple statements during
LTO, and adds a metadirective expansion pass that runs after LTO. This is
required for metadirectives with selectors that can only be resolved from
within the accel compiler.
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* Makefile.in (OBJS): Add omp-expand-metadirective.o.
* gimple-streamer-in.cc (input_gimple_stmt): Add case for
GIMPLE_OMP_METADIRECTIVE. Handle metadirective labels.
* gimple-streamer-out.cc (output_gimple_stmt): Likewise.
* omp-expand-metadirective.cc: New.
* passes.def: Add pass_omp_expand_metadirective.
* tree-pass.h (make_pass_omp_expand_metadirective): New prototype.
|
|
Gimplification
This adds support for resolving metadirectives according to the OpenMP 5.1
specification. The variants are sorted by score, then gathered into a list
of dynamic replacement candidates. The metadirective is then expanded into
a sequence of 'if..else' statements to test the dynamic selector and execute
the variant if the selector is satisfied.
If any of the selectors in the list are unresolvable, GCC will give up on
resolving the metadirective and try again later.
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* gimplify.cc (expand_omp_metadirective): New.
* omp-general.cc: Include tree-pretty-print.h.
(DELAY_METADIRECTIVES_AFTER_LTO): New macro.
(omp_context_selector_matches): Delay resolution of selectors. Allow
non-constant expressions.
(omp_dynamic_cond): New.
(omp_dynamic_selector_p): New.
(sort_variant): New.
(omp_get_dynamic_candidates): New.
(omp_resolve_metadirective): New.
(omp_resolve_metadirective): New.
* omp-general.h (struct omp_metadirective_variant): New.
(omp_resolve_metadirective): New prototype.
gcc/c-family/
* c-omp.cc (c_omp_expand_metadirective_r): New.
(c_omp_expand_metadirective): New.
|
|
This adds a new Gimple statement type GIMPLE_OMP_METADIRECTIVE, which
represents the metadirective in Gimple. In high Gimple, the statement
contains the body of the directive variants, whereas in low Gimple, it
only contains labels to the bodies.
This patch adds support for converting metadirectives from tree to Gimple
form, and handling of the Gimple form (Gimple lowering, OpenMP lowering
and expansion, inlining, SSA handling etc).
Metadirectives should be resolved before they reach the back-end, otherwise
the compiler will crash as GCC does not know how to convert metadirective
Gimple statements to RTX.
2022-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* gimple-low.cc (lower_omp_metadirective): New.
(lower_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
* gimple-pretty-print.cc (dump_gimple_omp_metadirective): New.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_METADIRECTIVE.
* gimple-walk.cc (walk_gimple_op): Handle GIMPLE_OMP_METADIRECTIVE.
(walk_gimple_stmt): Likewise.
* gimple.cc (gimple_alloc_omp_metadirective): New.
(gimple_build_omp_metadirective): New.
(gimple_build_omp_metadirective_variant): New.
* gimple.def (GIMPLE_OMP_METADIRECTIVE): New.
(GIMPLE_OMP_METADIRECTIVE_VARIANT): New.
* gimple.h (gomp_metadirective_variant): New.
(gomp_metadirective): New.
(is_a_helper <gomp_metadirective *>::test): New.
(is_a_helper <gomp_metadirective_variant *>::test): New.
(is_a_helper <const gomp_metadirective *>::test): New.
(is_a_helper <const gomp_metadirective_variant *>::test): New.
(gimple_alloc_omp_metadirective): New prototype.
(gimple_build_omp_metadirective): New prototype.
(gimple_build_omp_metadirective_variant): New prototype.
(gimple_has_substatements): Add GIMPLE_OMP_METADIRECTIVE case.
(gimple_has_ops): Add GIMPLE_OMP_METADIRECTIVE.
(gimple_omp_metadirective_label): New.
(gimple_omp_metadirective_set_label): New.
(gimple_omp_metadirective_variants): New.
(gimple_omp_metadirective_set_variants): New.
(CASE_GIMPLE_OMP): Add GIMPLE_OMP_METADIRECTIVE.
* gimplify.cc (is_gimple_stmt): Add OMP_METADIRECTIVE.
(expand_omp_metadirective): New.
(gimplify_omp_metadirective): New.
(gimplify_expr): Add case for OMP_METADIRECTIVE.
* gsstruct.def (GSS_OMP_METADIRECTIVE): New.
(GSS_OMP_METADIRECTIVE_VARIANT): New.
* omp-expand.cc (build_omp_regions_1): Handle GIMPLE_OMP_METADIRECTIVE.
(omp_make_gimple_edges): Likewise.
* omp-low.cc (struct omp_context): Add next_clone field.
(new_omp_context): Initialize next_clone field.
(clone_omp_context): New.
(delete_omp_context): Delete clone contexts.
(scan_omp_metadirective): New.
(scan_omp_1_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
(lower_omp_metadirective): New.
(lower_omp_1): Handle GIMPLE_OMP_METADIRECTIVE.
* tree-cfg.cc (cleanup_dead_labels): Handle GIMPLE_OMP_METADIRECTIVE.
(gimple_redirect_edge_and_branch): Likewise.
* tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_OMP_METADIRECTIVE.
(estimate_num_insns): Likewise.
* tree-pretty-print.cc (dump_generic_node): Handle OMP_METADIRECTIVE.
* tree-ssa-operands.cc (parse_ssa_operands): Handle
GIMPLE_OMP_METADIRECTIVE.
|
|
This patch implements parsing for the OpenMP metadirective introduced in
OpenMP 5.0. Metadirectives are parsed into an OMP_METADIRECTIVE node,
with the variant clauses forming a chain accessible via
OMP_METADIRECTIVE_CLAUSES. Each clause contains the context selector
and tree for the variant.
User conditions in the selector are now permitted to be non-constant when
used in metadirectives as specified in OpenMP 5.1.
2021-01-25 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* omp-general.cc (omp_context_selector_matches): Add extra argument.
(omp_resolve_metadirective): New stub function.
* omp-general.h (struct omp_metadirective_variant): New.
(omp_context_selector_matches): Add extra argument.
(omp_resolve_metadirective): New prototype.
* tree.def (OMP_METADIRECTIVE): New.
* tree.h (OMP_METADIRECTIVE_CLAUSES): New macro.
gcc/c/
* c-parser.cc (c_parser_skip_to_end_of_block_or_statement): Handle
parentheses in statement.
(c_parser_omp_metadirective): New prototype.
(c_parser_omp_context_selector): Add extra argument. Allow
non-constant expressions.
(c_parser_omp_context_selector_specification): Add extra argument and
propagate it to c_parser_omp_context_selector.
(analyze_metadirective_body): New.
(c_parser_omp_metadirective): New.
(c_parser_omp_construct): Handle PRAGMA_OMP_METADIRECTIVE.
gcc/c-family/
* c-common.h (enum c_omp_directive_kind): Add C_OMP_DIR_META.
(c_omp_expand_metadirective): New prototype.
* c-gimplify.cc (genericize_omp_metadirective_stmt): New.
(c_genericize_control_stmt): Handle OMP_METADIRECTIVE tree nodes.
* c-omp.cc (omp_directives): Classify metadirectives as C_OMP_DIR_META.
(c_omp_expand_metadirective): New stub function.
* c-pragma.cc (omp_pragmas): Add entry for metadirective.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_METADIRECTIVE.
|
|
[PR106032]
noce_try_sign_mask as documented will optimize
if (c < 0)
x = t;
else
x = 0;
into x = (c >> bitsm1) & t;
The optimization is done if either t is unconditional
(e.g. for
x = t;
if (c >= 0)
x = 0;
) or if it is cheap. We already check that t doesn't have side-effects,
but if t is conditional, we need to punt also if it may trap or fault,
as we make it unconditional.
I've briefly skimmed other noce_try* optimizations and didn't find one that
would suffer from the same problem.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/106032
* ifcvt.cc (noce_try_sign_mask): Punt if !t_unconditional, and
t may_trap_or_fault_p, even if it is cheap.
* gcc.c-torture/execute/pr106032.c: New test.
(cherry picked from commit a0c30fe3b888f20215f3e040d21b62b603804ca9)
|
|
If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.
The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).
Later on the function uses paradoxical subregs:
if (GET_MODE (op1) != mode)
op1 = gen_lowpart (mode, op1);
if (GET_MODE (op2) != mode)
op2 = gen_lowpart (mode, op2);
to change the operand modes.
The following patch fixes it by passing NULL_RTX as target if it has
promoted mode.
2022-06-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/106030
* expr.cc (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
temp to expand_operands if mode has been promoted.
* gcc.c-torture/compile/pr106030.c: New test.
(cherry picked from commit 2df1df945fac85d7b3d084001414a66a2709d8fe)
|
|
The epilogue may clobber LARCH_PROLOGUE_TEMP ($r13/$t1), so it cannot be
used for sibcalls.
gcc/ChangeLog:
PR target/106096
* config/loongarch/loongarch.h (REG_CLASS_CONTENTS): Exclude
$r13 from SIBCALL_REGS.
* config/loongarch/loongarch.cc (loongarch_regno_to_class):
Change $r13 to JIRL_REGS.
gcc/testsuite/ChangeLog:
PR target/106096
* g++.target/loongarch/loongarch.exp: New test support file.
* g++.target/loongarch/pr106096.C: New test.
(cherry picked from commit 020b7d98589bbc928b5a66b1ed56b42af8791355)
|
|
|
|
|
|
|
|
|
|
Adjust tests to changed expectations with the new Graphite-based
"kernels" handling.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr84955-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Adjust.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-acc-loop-reduction-2.f90: Adjust.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90: Removed.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/acc-icf.c: Adjust.
* c-c++-common/goacc/cache-3-1.c: Adjust.
* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c: Adjust.
* c-c++-common/goacc/classify-kernels.c: Adjust.
* c-c++-common/goacc/classify-serial.c: Adjust.
* c-c++-common/goacc/if-clause-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-1.c: Adjust.
* c-c++-common/goacc/kernels-decompose-2.c: Adjust.
* c-c++-common/goacc/kernels-decompose-ice-1.c: Adjust.
* c-c++-common/goacc/kernels-decompose-ice-2.c: Adjust.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Adjust.
* c-c++-common/goacc/kernels-loop-3.c: Adjust.
* c-c++-common/goacc/loop-2-kernels.c: Adjust.
* c-c++-common/goacc/nested-reductions-2-parallel.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Adjust.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c: Adjust.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
* c-c++-common/goacc/routine-1.c: Adjust.
* c-c++-common/goacc/routine-level-of-parallelism-2.c: Adjust.
* c-c++-common/goacc/routine-nohost-1.c: Adjust.
* c-c++-common/goacc/uninit-copy-clause.c: Adjust.
* gcc.dg/goacc/loop-processing-1.c: Adjust.
* gcc.dg/goacc/nested-function-1.c: Adjust.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust.
* gfortran.dg/goacc/classify-kernels.f95: Adjust.
* gfortran.dg/goacc/classify-parallel.f95: Adjust.
* gfortran.dg/goacc/classify-routine.f95: Adjust.
* gfortran.dg/goacc/classify-serial.f95: Adjust.
* gfortran.dg/goacc/common-block-3.f90: Adjust.
* gfortran.dg/goacc/gang-static.f95: Adjust.
* gfortran.dg/goacc/kernels-decompose-1.f95: Adjust.
* gfortran.dg/goacc/kernels-decompose-2.f95: Adjust.
* gfortran.dg/goacc/kernels-loop-2.f95: Adjust.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Adjust.
* gfortran.dg/goacc/kernels-loop-inner.f95: Adjust.
* gfortran.dg/goacc/kernels-loop.f95: Adjust.
* gfortran.dg/goacc/kernels-tree.f95: Adjust.
* gfortran.dg/goacc/loop-2-kernels.f95: Adjust.
* gfortran.dg/goacc/loop-auto-transfer-2.f90: Adjust.
* gfortran.dg/goacc/loop-auto-transfer-3.f90: Adjust.
* gfortran.dg/goacc/loop-auto-transfer-4.f90: Adjust.
* gfortran.dg/goacc/nested-function-1.f90: Adjust.
* gfortran.dg/goacc/nested-reductions-2-parallel.f90: Adjust.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Adjust.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95: Adjust.
* gfortran.dg/goacc/routine-module-mod-1.f90: Adjust.
* gfortran.dg/goacc/uninit-copy-clause.f95: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Removed.
|
|
Since around GCC 10, the condition `j < (INTMAX_MAX / 10)' will get
optimized into `j != 922337203685477580', which will result in an
infinite loop for certain inputs of `j'.
Copy the condition already used by the -DTILEPRO generator code, which
doesn't fall into this trap.
gcc/ChangeLog:
* config/tilepro/gen-mul-tables.cc (tilegx_emit): Adjust loop
condition to avoid overflow.
(cherry picked from commit c0ad48527c314a1e9354b7c26718b56ed4abc92c)
|
|
Changing the type of N from int to unsigned in decltype82.C (from
r13-986-g0ecb6b906f215e) reveals another spot where we perform constexpr
evaluation in an unevaluated context for sake of warnings, this time
from the call to shorten_compare in cp_build_binary_op, which calls
fold_for_warn.
We could (and probably should) suppress the shorten_compare warnings
when in an unevaluated context, but there's probably other callers of
fold_for_warn that are similarly affected. So this patch takes the
approach of directly suppressing fold_for_warn when in an unevaluated
context.
PR c++/105931
gcc/cp/ChangeLog:
* expr.cc (fold_for_warn): Don't fold when in an unevaluated
context.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/decltype82a.C: New test.
(cherry picked from commit b00b95198e6720eb23a2618870d67800f6180fdd)
|
|
|
|
This testcase was failing because CONSTRUCTOR_IS_DESIGNATED_INIT wasn't
getting set on the introduced CONSTRUCTOR for the anonymous union, and
build_aggr_conv uses that flag to decide whether to pay attention to the
indexes of the CONSTRUCTOR. So set the flag when we see a designator rather
than relying on copying it from another CONSTRUCTOR.
PR c++/105925
gcc/cp/ChangeLog:
* decl.cc (reshape_init_array_1): Set
CONSTRUCTOR_IS_DESIGNATED_INIT here.
(reshape_init_class): And here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/desig26.C: New test.
|
|
We already suppress various warnings for code that would be tautological if
written directly, but not when it's the result of template substitution. It
seems we need to do this for -Waddress as well.
PR c++/105885
gcc/cp/ChangeLog:
* pt.cc (tsubst_copy_and_build): Also suppress -Waddress for
comparison of dependent operands.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/constexpr-if37.C: New test.
|
|
Similarly to cgraph_nodes, it may happen that body_removed is set
during merging of symbols.
PR ipa/105600
gcc/ChangeLog:
* ipa-icf.cc (sem_item_optimizer::filter_removed_items):
Skip variables with body_removed.
(cherry picked from commit 31ce821a790caec8a2849dd67a9847e78a33d14c)
|
|
The addr_expr computation does not check for error_mark_node before
returning the size expression. This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.
Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.
gcc/ChangeLog:
PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.
gcc/testsuite/ChangeLog:
PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4): New struct.
(val3): New variable.
(test_pr105736): New test.
(main): Call it.
Signed-off-by: Siddhesh Poyarekar <siddhesh@gotplt.org>
(cherry picked from commit 70454c50b4592fe6876ecca13268264e395e058f)
|
|
helper<token>::c isn't dependent just because we haven't deduced its return
type yet. type_dependent_expression_p already knows how to deal with that
for bare FUNCTION_DECL, but needs to learn to look through a BASELINK.
PR c++/105964
gcc/cp/ChangeLog:
* pt.cc (type_dependent_expression_p): Look through BASELINK.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/nontype-auto21.C: New test.
|
|
In r12-1273 for PR91706, I removed the code in get_class_binding that
stripped BASELINK. This testcase demonstrates that we still need to strip
it in outer_binding before putting the overload set in IDENTIFIER_BINDING,
for compatibility with bindings added directly for declarations.
PR c++/105908
gcc/cp/ChangeLog:
* name-lookup.cc (outer_binding): Strip BASELINK.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/trailing16.C: New test.
|
|
In f2ebf2d98efe0ac2314b58cf474f44cb8ebd5244 I'd forced the
chosen unroll factor to be a factor of the VF, in order to
work around an exact_div ICE in PR105254. This was completely
bogus -- clearly I didn't look in enough detail at why we ended
up with an unrolled VF that wasn't a multiple of the UF.
Kewen has since fixed the bug properly for PR105940, so this
patch reverts my earlier attempt. Sorry for the stupidity.
gcc/
PR tree-optimization/105254
PR tree-optimization/105940
Revert:
* config/aarch64/aarch64.cc
(aarch64_vector_costs::determine_suggested_unroll_factor): Take a
loop_vec_info as argument. Restrict the unroll factor to values
that divide the VF.
(aarch64_vector_costs::finish_cost): Update call accordingly.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_14.c: New test.
(cherry picked from commit 2636660b6f35423e0cfbf53bfad5c5fed6ae6471)
|
|
As PR105940 shown, when rs6000 port tries to assign
m_suggested_unroll_factor by 4 or so, there will be ICE on:
exact_div (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
loop_vinfo->suggested_unroll_factor);
In function vect_analyze_loop_2, the current place of
suggested_unroll_factor applying can't guarantee it's
applied for all cases. As the case shows, vectorizer
could retry with SLP forced off, the vf is reset by
saved_vectorization_factor which isn't applied with
suggested_unroll_factor before. It means it can end
up with one vf which neglects suggested_unroll_factor.
I think it's off design, we should move the applying
of suggested_unroll_factor after start_over.
PR tree-optimization/105940
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_2): Move the place of
applying suggested_unroll_factor after start_over.
(cherry picked from commit f907cf4c07cf51863dadbe90894e2ae3382bada5)
|
|
|
|
|
|
Disallow siball when calling ifunc functions with PIC register so that
PIC register can be restored.
gcc/
PR target/105960
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
false if PIC register is used when calling ifunc functions.
gcc/testsuite/
PR target/105960
* gcc.target/i386/pr105960.c: New test.
(cherry picked from commit fe9765c0b97e6b4ce2cd226631d329fc05ba2aa5)
|
|
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops. Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions. If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.
gcc/ChangeLog:
* graphite-scop-detection.cc (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
|
|
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP. The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region. The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop. This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops. This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
gcc/ChangeLog:
* graphite-scop-detection.cc (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.cc (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
|
|
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes. Furthermore,
exceeding the limits does not alwas lead to user-visible diagnostic
messages.
This commit increases the parameter values on OpenACC functions. The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite. Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.
gcc/ChangeLog:
* graphite-optimize-isl.cc (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.
* graphite-scop-detection.cc (build_scops): Likewise for
param_graphite_max_arrays_per_scop.
gcc/testsuite/ChangeLog:
* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
|
|
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.
This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.
gcc/ChangeLog:
* tree-ssa-pre.cc (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
|
|
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable. The
pass does not know how to handle the OpenACC internal function calls
which was not necessary until recently when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.
This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop. This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process. Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.
A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.cc (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
loop_invariant_motion_in_fun.
|
|
This commit concerns loops in OpenACC "kernels" region that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences. A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences woud be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.
gcc/ChangeLog:
* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.cc (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
|
|
This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite. The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls. If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
gcc/ChangeLog:
* graphite-isl-ast-to-gimple.cc: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.cc (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.cc (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
|
|
Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.
This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private". The pass is restricted
"kernels" regions but could be extended to other types of regions.
gcc/ChangeLog:
* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.cc (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
Expect optimization messages.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
* c-c++-common/goacc/uninit-copy-clause.c: Likewise.
* gfortran.dg/goacc/uninit-copy-clause.f95: Likewise.
* c-c++-common/goacc/omp_data_optimize-1.c: New test.
* g++.dg/goacc/omp_data_optimize-1.C: New test.
* gfortran.dg/goacc/omp_data_optimize-1.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs. The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
gcc/ChangeLog:
* tree-pretty-print.cc (print_omp_clause_to_str): Add new function.
* tree-pretty-print.h (print_omp_clause_to_str): Add declaration.
|
|
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.
This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
gcc/ChangeLog:
* omp-offload.cc (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Adjust
expectations.
|
|
Add some copies of tests to continue covering the old "parloops"-based
"kernels" implementation - until it gets removed from GCC - and
add further tests for the new Graphite-based implementation.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
New test.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
New test.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
New test.
* c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
* c-c++-common/goacc/kernels-reduction-parloops.c: New test.
* c-c++-common/goacc/loop-auto-reductions.c: New test.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
New test.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
New test.
* gfortran.dg/goacc/kernels-conversion.f95: New test.
* gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
* gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
* gfortran.dg/goacc/kernels-reductions.f90: New test.
|
|
gcc/ChangeLog:
* graph.cc (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
|
|
This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, it first introduces a new
internal representation for "kernels" regions which should be analyzed
by Graphite in pass_omp_oacc_kernels_decompose. This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc_kernels=decompose-parloops". The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions. This replaces the
specialized handling for "kernels" regions that was previously used
and which was in limited in many ways.
Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure. Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs. The data-dependence analysis step in
Graphite has so far been tied to the code generation step. This
commit introduces a separate data-dependence analysis step that avoids
the code generation. This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.
gcc/ChangeLog:
* Makefile.in: Add graphite-oacc.o
* cfgloop.cc (alloc_loop): Set can_be_parallel_valid_p to false.
* cfgloop.h: Add can_be_parallel_valid_p field.
* cfgloopmanip.cc (copy_loop_info): Add assert.
* config/nvptx/nvptx.cc (nvptx_goacc_reduction_setup):
* doc/invoke.texi: Adjust param openacc-kernels description.
* doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
* flag-types.h (enum openacc_kernels):Add
OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
* gimple-pretty-print.cc (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
* gimple.h (enum gf_mask): Add
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
widen GF_OMP_TARGET_KIND_MASK.
(is_gimple_omp_oacc): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(is_gimple_omp_offloaded): Likewise.
* gimplify.cc (gimplify_omp_for): Enable reduction localization
for "kernels" regions.
(gimplify_omp_workshare): Likewise.
* graphite-dependences.cc (scop_get_reads_and_writes): Handle
"kills" and "reduction" PDRs.
(apply_schedule_on_deps): Add dump output for intermediate
steps of the dependence computation to enable understanding
of unexpected dependences.
(carries_deps): Likewise.
(scop_get_dependences): Handle "kill" operations and add dump
output.
* graphite-isl-ast-to-gimple.cc (visit_schedule_loop_node): New function.
(graphite_oacc_analyze_scop): New function.
* graphite-optimize-isl.cc (optimize_isl): Remove "static" and
add argument to identify OpenACC use; don't fail on unchanged
schedule in this case.
* graphite-poly.cc (new_poly_dr): Handle "kills".
(print_pdr): Likewise.
(new_gimple_poly_bb): Likewise.
(free_gimple_poly_bb): Likewise.
(new_scop): Handle "reduction", "private", and "firstprivate"
hash sets.
(free_scop): Likewise.
(print_isl_space): New function.
(debug_isl_space): New function.
* graphite-scop-detection.cc (scop_detection::can_represent_loop):
Don't fail if niter is 0 in OpenACC functions.
(scop_detection::add_scop): Don't reject regions with only one
loop in OpenACC functions.
(ignored_oacc_internal_call_p): New function.
(scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
(stmt_has_side_effects): Ignore internal OpenACC function calls.
(add_write): Likewise.
(add_read): Likewise.
(add_kill): New function.
(add_kills): New function.
(add_oacc_kills): New function.
(try_generate_gimple_bb): Kill false dependences for OpenACC
"private"/"firstprivate" vars.
(gather_bbs::gather_bbs): Determin OpenACC
"private"/"firstprivate" vars in region.
(gather_bbs::before_dom_children): Add assert.
(determine_openacc_reductions): New function.
(build_scops): Determine OpenACC "reduction" vars in SCoP.
* graphite-sese-to-poly.cc (oacc_ifn_call_extract): New declaration.
(oacc_internal_call_p): New function.
(build_poly_dr): Ignore internal OpenACC function calls,
handle "reduction" refs.
(build_poly_sr): Likewise; handle "kill" operations.
* graphite.cc (graphite_transform_loops): Accept functions with
only a single loop.
(oacc_enable_graphite_p): New function.
(gate_graphite_transforms): Enable pass on OpenACC functions.
* graphite.h (enum poly_dr_type): Add PDR_KILL.
(struct poly_dr): Add "is_reduction" field.
(new_poly_dr): Add argument to declaration.
(pdr_kill_p): New function.
(print_isl_space): New declaration.
(debug_isl_space): New declaration.
(struct scop): Add fields "reductions_vars",
"oacc_firstprivate_vars", and "oacc_private_scalars".
(optimize_isl): New declaration.
(graphite_oacc_analyze_scop): New declaration.
* internal-fn.cc (expand_UNIQUE): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE
* internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE
* omp-expand.cc (struct omp_region): Adjust comment.
(expand_omp_taskloop_for_inner):
(expand_omp_for): Add asserts about expected "kernels" region types.
(mark_loops_in_oacc_kernels_region): Likewise.
(expand_omp_target): Likewise; handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(build_omp_regions_1): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
Likewise.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (oacc_get_kernels_attrib): New function.
(oacc_get_fn_dim_size): Allow argument to be NULL.
* omp-general.h (oacc_get_kernels_attrib): New declaration.
* omp-low.cc (struct omp_context): Add fields
"oacc_firstprivate_vars" and "oacc_private_scalars".
(was_originally_oacc_kernels): New function.
(is_oacc_kernels):
(is_oacc_kernels_decomposed_graphite_part): New function.
(new_omp_context): Allocate "oacc_first_private_vars" and
"oacc_private_scalars" ...
(delete_omp_context): ... and free from here.
(oacc_record_firstprivate_var_clauses): New function.
(oacc_record_private_scalars): New function.
(scan_sharing_clauses): Call functions to record "private"
scalars and "firstprivate" variables.
(check_oacc_kernel_gwv): Add assert.
(ctx_in_oacc_kernels_region): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(scan_omp_for): Likewise.
(check_omp_nesting_restrictions): Likewise.
(lower_oacc_head_mark): Likewise.
(lower_omp_for): Likewise.
(lower_omp_target): Create "private" and "firstprivate" marker
call statements.
(lower_oacc_head_tail): Adjust "private" and "firstprivate"
marker calls.
(lower_oacc_reductions): Emit "private" and "firstprivate"
marker call statements.
(make_oacc_firstprivate_vars_marker): New function.
(make_oacc_private_scalars_marker): New function.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
region using the new "kernels" handling.
(make_region_seq): Adjust default region type for new
"kernels" handling; no more exceptions, let Graphite handle everything.
(make_region_loop_nest): Likewise; add dump output and assert.
(adjust_nested_loop_clauses): Stop creating "auto" clauses if
loop has "independent", "gang" etc.
(transform_kernels_loop_clauses): Likewise.
* omp-offload.cc (oacc_extract_loop_call): New function.
(oacc_loop_get_cfg_loop): New function.
(can_be_parallel_str): New function.
(oacc_loop_can_be_parallel_p): New function.
(oacc_parallel_kernels_graphite_fun_p): New function.
(oacc_parallel_fun_p): New function.
(oacc_loop_transform_auto_into_independent): New function, ...
(oacc_loop_fixed_partitions): ... called from here to transfer
the result of Graphite's analysis to the loop.
(execute_oacc_loop_designation): Handle "oacc
functions with "parallel_kernels_graphite" attribute.
(execute_oacc_device_lower): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
* omp-offload.h (oacc_extract_loop_call): Add declaration.
* params.opt: Add "param=openacc-kernels" value "decompose-parloops".
* sese.cc (scalar_evolution_in_region): "Redirect" SCEV
analysis to outer loop for IFN_GOACC_LOOP calls.
* sese.h: Add field "kill_scalar_refs".
* tree-chrec.cc (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
like CASE_CONVERT.
* tree-data-ref.cc (dump_data_reference): Include
DR_BASE_ADDRESS and DR_OFFSET in dump output.
(get_references_in_stmt): Don't reject OpenACC internal function
calls.
(graphite_find_data_references_in_stmt): Remove unused variable.
* tree-parloops.cc (pass_parallelize_loops::execute): Disable
pass with the new kernels handling, enable if requested explicitly.
* tree-scalar-evolution.cc (set_scev_analyze_openacc_calls):
Set flag to enable the analysis of internal OpenACC function
calls (use for Graphite only).
(oacc_call_analyzable_p): New function.
(oacc_ifn_call_extract): New function.
(oacc_simplify): New function.
(add_to_evolution): Simplify OpenACC internal function calls
if applicable.
(follow_ssa_edge_binary): Likewise.
(follow_ssa_edge_expr): Likewise.
(follow_copies_to_constant): Likewise.
(analyze_initial_condition): Likewise.
(interpret_loop_phi): Likewise.
(interpret_gimple_call): New function.
(interpret_rhs_expr): Likewise.
(instantiate_scev_name): Likewise.
(analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
(expression_expensive_p): Consider internal OpenACC calls to
be cheap.
* tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
New declaration.
(oacc_call_analyzable_p): New declaration.
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Mark
lhs of internal OpenACC function calls necessary.
* tree-ssa-ifcombine.c (recognize_if_then_else):
* tree-ssa-loop-niter.cc (oacc_call_analyzable_p):
(oacc_ifn_call_extract): New declaration.
(interpret_gimple_call): New delcaration.
(expand_simple_operations): Handle internal OpenACC function calls.
* tree-ssa-loop.cc (gate_oacc_kernels): Disable for new
"kernels" handling.
* graphite-oacc.cc: New file.
* graphite-oacc.h: New file.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/classify-kernels.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Removed.
* c-c++-common/goacc/kernels-reduction.c: Removed.
* gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict". This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime. Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
TODO This needs more testing on non-OpenACC code.
gcc/ChangeLog:
* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.cc
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.cc (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.cc (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/graphite/alias-1.c: New test.
|
|
Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.
gcc/ChangeLog:
* tree-loop-distribution.cc (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.
* tree-data-ref.cc (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.
* tree-data-ref.h (compute_alias_check_pairs): New declaration.
|
|
Instead of, for instance, "Loop got predicted 1 to iterate 10 times"
the message should be "Loop 1 got predicted to iterate 10 times".
gcc/ChangeLog:
* predict.cc (pass_profile::execute): Fix dump message.
|
|
gcc/ChangeLog:
* graphite-sese-to-poly.cc (build_poly_sr_1): Fix a typo and
a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.cc (gsi_insert_earliest): Fix typo
in comment.
|
|
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name. It also
explains the prefix "P_" for those names in the ISL representation.
gcc/ChangeLog:
* graphite-sese-to-poly.cc (isl_id_for_ssa_name): Rename to ...
(isl_id_for_parameter): ... this new function name.
(build_scop_context): Adjust function use.
|