Age | Commit message (Collapse) | Author | Files | Lines |
|
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performancewise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions. To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.
gcc/ChangeLog:
* omp-general.cc (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.cc (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.cc (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations. Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.cc (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.cc (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.cc (pass_phiprop::clone): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.
Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
|
|
The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations. It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends. This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR61000.
This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
gcc/
* expr.cc (get_inner_reference): Handle NOP_EXPR like
VIEW_CONVERT_EXPR.
gcc/fortran/
* lang.opt (-param=delinearize=): New.
* trans-array.cc (get_class_array_vptr): New, split from...
(build_array_ref): ...here.
(get_array_lbound, get_array_ubound): New, split from...
(gfc_conv_array_ref): ...here. Additional code refactoring
plus support for delinearization of the array access.
gcc/testsuite/
* gfortran.dg/assumed_type_2.f90: Adjust patterns.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/graphite/block-3.f90: Remove xfails.
* gfortran.dg/graphite/block-4.f90: Likewise.
* gfortran.dg/inline_matmul_24.f90: Adjust patterns.
* gfortran.dg/no_arg_check_2.f90: Likewise.
* gfortran.dg/pr32921.f: Likewise.
* gfortran.dg/reassoc_4.f: Disable delinearization for this test.
Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>
|
|
Silence a warning. The argument type did not match the definition.
gcc/ChangeLog:
* tree-cfg.h (gimple_debug_cfg): Change argument type from int
to dump_flags_t.
|
|
The following testcases expect the -Wopenacc-parallelism warning output
but did fail as not compiled with that warning; solution: add it.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: Compile
with -Wopenacc-parallelism.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: Likewise.
|
|
Currently, we do not support cases like struct-members as the base-pointer
for an OpenACC non-contiguous array. Mark such cases as unsupported in the
C/C++ front-ends, instead of ICEing on them.
gcc/c/ChangeLog:
* c-typeck.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.
gcc/cp/ChangeLog:
* semantics.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.
|
|
libgomp/ChangeLog:
* config/gcn/bar.h (gomp_barrier_init): Limit thread count to the
actual physical number.
* config/gcn/team.c (gomp_team_start): Don't attempt to set up
threads that do not exist.
|
|
2021-04-30 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Add
-Wopenacc-parallelism option.
|
|
The patch 'Merge non-contiguous array support patches' handles one of the
non-contiguous cases such that it is no longer an error.
2021-04-29 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/data-clause-1.c (foo): Remove expected message.
* g++.dg/goacc/data-clause-1.C (foo): Remove expected message.
|
|
This changes expected messages that differ between mainline and OG11. On
OG10, these messages were added in the patch:
081a01963ca8 Update expected messages, errors and warnings for "kernels" tests
2021-04-29 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c (main): Update expected
messages.
* gfortran.dg/goacc/kernels-decompose-2.f95 (main): Update expected
messages.
|
|
2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* omp-low.cc (is_oacc_parallel_or_serial): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED and
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE.
|
|
This updates the types of messages expected in the test, and the '-fopt-info'
option used to request them. The phrasing of the expected messages has also
changed somewhat and has been adjusted to match.
2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
Update additional options and expected messages.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
|
|
Ensure that the parent DWARF subprograms of offload kernel functions have a
code range, and are therefore not discarded by GDB. This is only necessary
when the parent function does not actually exist in the final binary, which is
commonly the case within the offload device's binary.
This patch replaces 808bdf1bb29 and fdcb23540a2. It should be squashed with
those before being posted upstream.
gcc/
* dwarf2out.cc (notional_parents_list): New file variable.
(gen_subprogram_die): Record offload kernel functions in
notional_parents_list.
(fixup_notional_parents): New function.
(dwarf2out_finish): Call fixup_notional_parents.
(dwarf2out_c_finalize): Reset notional_parents_list.
|
|
This sets the type precision of the collapsed iterator variable to the
sum of the precision of the collapsed loop variables, up to a maximum of
sizeof(long long) (i.e. 64-bits).
2021-03-01 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/
* omp-expand.cc (expand_oacc_for): Convert .tile variable to
diff_type before multiplying.
* omp-general.cc (omp_extract_for_data): Use accumulated precision
of all collapsed for-loops as precision of iteration variable, up
to the precision of a long long.
libgomp/
* testsuite/libgomp.c-c++-common/collapse-4.c: New.
* testsuite/libgomp.fortran/collapse5.f90: New.
|
|
The stacks are not supposed to be deleted every time memory is allocated, only
when there is insufficient memory. The unconditional call here seems to be in
error, and is causing a costly reallocation of the stacks before every launch.
libgomp/
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_alloc): Remove early call to
nvptx_stacks_free.
|
|
This should be squashed with 808bdf1bb29 and fdcb23540a2 to go to mainline.
gcc/
* dwarf2out.cc (gen_subprogram_die): Replace existing low/high PC
attributes, rather than ICE.
|
|
This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562081.html
This patch now allows multiple clauses on the same construct to map
the same variable, which was not valid in OpenMP 4.5, but now allowed
in 5.0.
This may possibly reverted/updated when a final patch is approved
for mainline.
2021-02-01 Chung-Lin Tang <cltang@codesourcery.com>
gcc/cp/ChangeLog:
* semantics.cc (finish_omp_clauses): Adjust to allow duplicate
mapped variables for OpenMP.
gcc/ChangeLog:
* omp-low.cc (install_var_field): Add new 'tree key_expr = NULL_TREE'
default parameter. Set splay-tree lookup key to key_expr instead of
var if key_expr is non-NULL. Adjust call to install_parm_decl.
Update comments.
(scan_sharing_clauses): Use clause tree expression as splay-tree key
for map/to/from and OpenACC firstprivate cases when installing the
variable field into the send/receive record type.
(maybe_lookup_field_in_outer_ctx): Add code to search through
construct clauses instead of entirely based on splay-tree lookup.
(lower_oacc_reductions): Adjust to find map-clause of reduction
variable, then create receiver-ref.
(lower_omp_target): Adjust to lookup var field using clause expression.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-2.c: Adjust testcase.
|
|
The previous patch wasn't quite right, apparently. Somehow the behaviour
changed after another clean build? This tweak fixes it.
This patch should be squashed with fdcb23540a2 to go to mainline.
gcc/ChangeLog:
* dwarf2out.cc (gen_subprogram_die): Check offload attributes only.
|
|
Add DWARF address class attributes for variables that exist outside the
generic address space. In particular, this is the case for gang-private
variables in OpenACC offload kernels.
gcc/ChangeLog:
* dwarf2out.cc (add_location_or_const_value_attribute): Set
DW_AT_address_class, if appropriate.
|
|
Add a notional code range to the notional parent function of offload kernel
functions. This is enough to prevent GDB discarding them.
gcc/ChangeLog:
* dwarf2out.cc (gen_subprogram_die): Add high/low_pc attributes for
parents of offload kernels.
|
|
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.
The attached patch adds a target hook to override the "single_element_p"
heuristic in the function as a target hook, and activates it for GCN. This
allows much better code to be generated for affected loops.
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
gather/scatter instructions to scalar/elementwise fallback.
* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
hook.
|
|
This patch adjusts OpenACC loop lowering in the AMD GCN target compiler
in such a way that the autovectorizer can vectorize the "vector" dimension
of those loops in more cases.
Rather than generating "SIMT" code that executes a scalar instruction
stream for each lane of a vector in lockstep, for GCN we model the GPU
like a typical CPU, with separate instructions to operate on scalar and
vector data. That means that unlike other offload targets, we rely on
the autovectorizer to handle the innermost OpenACC parallelism level,
which is "vector".
Because of this, the OpenACC builtin functions to return the current
vector lane and the vector width return 0 and 1 respectively, despite
the native vector width being 64 elements wide.
This allows generated code to work with our chosen compilation model,
but the way loops are lowered in omp-offload.c:oacc_xform_loop does not
understand the discrepancy between logical (OpenACC) and physical vector
sizes correctly. That means that if a loop is partitioned over e.g. the
worker AND vector dimensions, we actually lower with unit vector size --
meaning that if we then autovectorize, we end up trying to vectorize
over the "worker" dimension rather than the vector one! Then, because
the number of workers is not fixed at compile time, that means the
autovectorizer has a hard time analysing the loop and thus vectorization
often fails entirely.
We can fix this by deducing the true vector width in oacc_xform_loop,
and using that when we are on a "non-SIMT" offload target. We can then
rearrange how loops are lowered in that function so that the loop form
fed to the autovectorizer is more amenable to vectorization -- namely,
the innermost step is set to process each loop iteration sequentially.
For some benchmarks, allowing vectorization to succeed leads to quite
impressive performance improvements -- I've observed between 2.5x and
40x on one machine/GPU combination.
The low-level builtins available to user code (__builtin_goacc_parlevel_id
and __builtin_goacc_parlevel_size) continue to return 0/1 respectively
for the vector dimension for AMD GCN, even if their containing loop is
vectorized -- that's a quirk that we might possibly want to address at
some later date.
Only non-"chunking" loops are handled at present. "Chunking" loops are
still lowered as before.
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* omp-offload.cc (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
Add overloaded wrapper for previous arguments & behaviour.
(oacc_xform_loop): Lower vector loops to iterate a multiple of
omp_max_vf times over contiguous steps on non-SIMT targets.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust for loop
lowering changes.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
|
|
OpenACC loop semantics require that the loop bound be computable
before entering the loop, rather than the C/C++ semantics where the
end test is evaluated on every iteration. Formerly the kernels loop
annotater permitted only constants and variables not modified in the
loop body in the loop bound expression. This patch relaxes those
restrictions somewhat to allow many forms of expressions involving
such constants and variables, including calls to constant functions.
2020-08-30 Sandra Loosemore <sandra@codesourcery.com>
gcc/c-family/
* c-omp.cc (end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(check_and_annotate_for_loop): Use the new helper function.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-21.c: New.
* c-c++-common/goacc/kernels-loop-annotation-22.c: New.
|
|
The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test. However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable. In particular, it's ambiguous when both operands to the
end test comparison are local variables.
This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer. Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.
2020-08-30 Sandra Loosemore <sandra@codesourcery.com>
gcc/c-family/
* c-omp.cc (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here. Allow the loop
variable as either operand to the condition.
|
|
Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file. Now the "auto" clause appears first, instead of after "private".
2020-08-23 Sandra Loosemore <sandra@codesourcery.com>
gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.
|
|
This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body. Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction. If
any turn out to be problematical, we can add on here to recognize
them, or in the processing of the "auto" annotations.
2020-08-22 Sandra Loosemore <sandra@codesourcery.com>
gcc/c-family/
* c-omp.cc (annotate_loops_in_kernels_regions): Test for
calls to builtins.
gcc/fortran/
* openmp.cc (check_expr_for_invalid_calls): Check for intrinsic
functions.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.
|
|
Contrary to GCC 11, OG10 uses an error instead of a warning,
cf. commit 271c7fef548a86676d304b1eb2be5c0d47280bd6.
gcc/testsuite/
* gfortran.dg/gomp/pr67500.f90: Change dg-warning to
dg-error.
|
|
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code. However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops. This patch implements this behavior in Fortran.
2020-08-19 Sandra Loosemore <sandra@codesourcery.com>
gcc/fortran/
* openmp.cc (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.
gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.
|
|
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code. However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops. This patch implements this behavior for C and C++.
2020-08-19 Sandra Loosemore <sandra@codesourcery.com>
gcc/c-family/
* c-omp.cc (annotate_loops_in_kernels_regions): Process inner
loops in combined "acc kernels loop" directives.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/combined-directives.c: Adjust expected
patterns.
|
|
2020-08-19 Sandra Loosemore <sandra@codesourcery.com>
gcc/
* tree.h (OACC_LOOP_COMBINED): New.
gcc/c/
* c-parser.cc (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.
gcc/cp/
* parser.cc (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED. Update all call sites.
|
|
Array mapping was changed by the patch '[OpenMP, Fortran] Add
structure/derived-type element mapping'.
2020-08-19 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* gfortran.dg/goacc/pr70828.f90: Update expected output in Gimple
dump.
|
|
The C-equivalent version of the test (c-c++-common/goacc/loop-2-kernels.c)
has these tests XFAILed in the commit 'Make new OpenACC kernels conversion
the default; adjust and add tests' (commit
757f56ddc43fd80bb8740222ec352111b26d66e9), so the Fortran version should
be XFAILed too.
2020-07-24 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* gfortran.dg/goacc/loop-2-kernels.f95: Add XFAILs.
|
|
c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c testcase
This should have been part of 'Update expected messages, errors and warnings
for "kernels" tests' (commit 081a01963ca8db7ddaaf5871d281321454fd3246).
2020-07-24 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
(main): Remove obsolete expected messages.
|
|
libgomp/
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: Use
highest available integer kind instead of assuming that kind=16 exists.
|
|
2020-06-02 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/noncontig_array-1.c: Dump Gimple pass.
|
|
Replacing of a by-reference variable in a private clause by a local variable
makes sense; however, for arrays, the size is not directly known by the type.
This causes an ICE via create_tmp_var which indirectly invokes
force_constant_size in this case - but the latter only handled Ada.
gcc/ChangeLog:
* gimplify.cc (localize_reductions): Do not create local
variable for privatized arrays.
|
|
The g++ front end wraps the array length and low_bound values in
NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail.
The test case c-c++-common/goacc/kernels-loop-annotation-1.c was
tickling this bug and giving bogus errors in g++ because it was falling
through to dynamic array code instead of recognizing the constant bounds.
This patch was posted upstream here
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html
but not yet committed. It may be that some other fix for this problem
is implemented on mainline instead; check before merging this patch.
2020-03-31 Sandra Loosemore <sandra@codesourcery.com>
gcc/cp/
* semantics.cc (handle_omp_array_sections_1): Call STRIP_NOPS
on length and low_bound;
(handle_omp_array_sections): Likewise.
|
|
These testsuite fixes are specific to the og10 branch, so are being
segregated from the ones that apply to mainline in a separate commit
from the main Fortran kernels loop annotation patch.
2020-03-27 Sandra Loosemore <sandra@codesourcery.com>
gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust
line numbering.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Add
-fno-openacc-kernels-annotate-loops.
|
|
This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions. It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.
2020-03-27 Sandra Loosemore <sandra@codesourcery.com>
Gergö Barany <gergo@codesourcery.com>
gcc/fortran/
* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.cc: Include options.h.
(enum annotation_state, enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.cc (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.
gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.
|
|
The testcases being tweaked here are present on the og10 branch but not on
FSF mainline.
2020-03-27 Sandra Loosemore <sandra@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Add
-fno-openacc-kernels-annotate-loops.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
|
|
This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default. -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.
2020-03-27 Sandra Loosemore <sandra@codesourcery.com>
Kernels loops annotation: C and C++.
gcc/c-family/
* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
* c-omp.cc: Include tree-iterator.h
(enum annotation_state): New.
(struct annotation_info): New.
(do_not_annotate_loop): New.
(do_not_annotate_loop_nest): New.
(annotation_error): New.
(c_finish_omp_for_internal): Split from c_finish_omp_for. Use
annotation_error function. Code refactoring to avoid destructive
changes that cannot be undone in case of error.
(is_local_var): New.
(lang_specific_unwrap_initializer): New.
(annotate_for_loop): New.
(check_and_annotate_for_loop): New.
(annotate_loops_in_kernels_regions): New.
(c_oacc_annotate_loops_in_kernels_regions): New.
* c.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
gcc/c/
* c-decl.cc (c_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.
gcc/cp/
* decl.cc (cp_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.
gcc/
* doc/invoke.texi (Option Summary): Add entries for
-Wopenacc-kernels-annotate-loops and
-fno-openacc-kernels-annotate-loops.
(Warning Options): Document -Wopenacc-kernels-annotate-loops.
(Optimization Options): Document -fno-openacc-kernels-annotate-loops.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
-fno-openacc-kernels-annotate-loops option.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
* c-c++-common/goacc/kernels-loop-annotation-17.c: New.
|
|
The firstprivate_int optimization changes the semantics of firstprivate
in this test, so XFAIL it until the correct semantics for firstprivate
have been decided (PR92036).
2020-03-24 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-firstprivate-1.c: XFAIL
execution test.
|
|
Test case is the existing libgomp.oacc-c++/firstprivate-mappings-1.C.
* omp-low.c (convert_from_firstprivate_int):
Use VIEW_CONVERT also for vectors.
|
|
libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c (main): Fix
async-safety issue. Increase number of iterations.
|
|
2018-10-04 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
gcc/
* omp-low.cc (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise. Create local storage for
the values being pointed to. Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT. Don't privatize
firstprivate VLAs.
* tree-pretty-print.cc (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.cc (gfc_array_allocate): Call
gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
attribute set.
* trans-decl.cc (find_module_oacc_declare_clauses): Relax
oacc_declare_create to OMP_MAP_ALLOC, and oacc_declare_copyin to
OMP_MAP_TO, in order to match OpenACC 2.5 semantics.
* trans-openmp.cc (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls. Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.cc (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans.h (gfc_trans_oacc_declare_allocate): Declare.
gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.
include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.
libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.
2020-02-19 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* trans-openmp.cc (gfc_omp_check_optional_argument): Handle non-decl
case.
gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Handle
GOMP_MAP_DECLARE_ALLOCATE and GOMP_MAP_DECLARE_DEALLOCATE.
libgomp/
* libgomp.h (gomp_acc_declare_allocate): Remove prototype.
* oacc-mem.c (gomp_acc_declare_allocate): Make static. Add POINTER
argument. Use acc_delete instead of acc_free. Handle scalar
mappings.
(find_group_last): Handle GOMP_MAP_DECLARE_ALLOCATE and
GOMP_MAP_DECLARE_DEALLOCATE groupings.
(goacc_enter_data_internal): Fix kind check for
GOMP_MAP_DECLARE_ALLOCATE. Pass new pointer argument to
gomp_acc_declare_allocate.
(goacc_exit_data_internal): Unlock device mutex around
gomp_acc_declare_allocate call. Pass new pointer argument. Handle
group pointer mapping for deallocate.
2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com>
libgomp/
* oacc-mem.c (goacc_enter_data_internal): Unlock mutex before calling
gomp_acc_declare_allocate and relock it afterwards.
|
|
gcc/
* gimplify.cc (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.
|
|
gcc/
2019-09-17 Tobias Burnus <tobias@codesourcery.com>
* gimplify.cc (gomp_oacc_needs_data_present): Remove unused variable.
|
|
gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Update
expected message/warning/error output.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.
|
|
gcc/
* gimplify.cc (gimplify_omp_for): Use for_stmt in call to
localize_reductions.
|
|
gcc/
* gimplify.cc (gimplify_omp_workshare): Use OMP_CLAUSES, OMP_BODY
instead of OMP_TARGET_CLAUSES, OMP_TARGET_BODY.
|
|
gcc/
* gimplify.cc (privatize_reduction): New struct.
(localize_reductions_r, localize_reductions): New functions.
(gimplify_omp_for): Call localize_reductions.
(gimplify_omp_workshare): Likewise.
* omp-low.cc (lower_oacc_reductions): Handle localized reductions.
Create fewer temp vars.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
documentation.
* tree.cc (omp_clause_num_ops): Bump number of ops for
OMP_CLAUSE_REDUCTION to 6.
(walk_tree_1): Adjust accordingly.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.
|