Age | Commit message (Collapse) | Author | Files | Lines |
|
Just as on aarch64, x86's wider long double experiences loss of
precision with from_chars implemented in terms of double. Expect the
execution fail.
for libstdc++-v3/ChangeLog
* testsuite/20_util/to_chars/long_double.cc: Expect execution
fail on x86-vxworks.
(cherry picked from commit 7daa166fe89fca4ff1baa063c00a5a690f7e462f)
|
|
The constexpr API is only available with -std=gnu++XX (and proposed for
C++26). The proposal is to have the complete simd API usable in constant
expressions.
This patch resolves several issues with using simd in constant
expressions.
Issues why constant_evaluated branches are necessary:
* subscripting vector builtins is not allowed in constant expressions
* if the implementation needs/uses memcpy
* if the implementation would otherwise call SIMD intrinsics/builtins
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/109261
* include/experimental/bits/simd.h (_SimdWrapper::_M_set):
Avoid vector builtin subscripting in constant expressions.
(resizing_simd_cast): Avoid memcpy if constant_evaluated.
(const_where_expression, where_expression, where)
(__extract_part, simd_mask, _SimdIntOperators, simd): Add either
_GLIBCXX_SIMD_CONSTEXPR (on public APIs), or constexpr (on
internal APIs).
* include/experimental/bits/simd_builtin.h (__vector_permute)
(__vector_shuffle, __extract_part, _GnuTraits::_SimdCastType1)
(_GnuTraits::_SimdCastType2, _SimdImplBuiltin)
(_MaskImplBuiltin::_S_store): Add constexpr.
(_CommonImplBuiltin::_S_store_bool_array)
(_SimdImplBuiltin::_S_load, _SimdImplBuiltin::_S_store)
(_SimdImplBuiltin::_S_reduce, _MaskImplBuiltin::_S_load): Add
constant_evaluated case.
* include/experimental/bits/simd_fixed_size.h
(_S_masked_load): Reword comment.
(__tuple_element_meta, __make_meta, _SimdTuple::_M_apply_r)
(_SimdTuple::_M_subscript_read, _SimdTuple::_M_subscript_write)
(__make_simd_tuple, __optimize_simd_tuple, __extract_part)
(__autocvt_to_simd, _Fixed::__traits::_SimdBase)
(_Fixed::__traits::_SimdCastType, _SimdImplFixedSize): Add
constexpr.
(_SimdTuple::operator[], _M_set): Add constexpr and add
constant_evaluated case.
(_MaskImplFixedSize::_S_load): Add constant_evaluated case.
* include/experimental/bits/simd_scalar.h: Add constexpr.
* include/experimental/bits/simd_x86.h (_CommonImplX86): Add
constexpr and add constant_evaluated case.
(_SimdImplX86::_S_equal_to, _S_not_equal_to, _S_less)
(_S_less_equal): Value-initialize to satisfy constexpr
evaluation.
(_MaskImplX86::_S_load): Add constant_evaluated case.
(_MaskImplX86::_S_store): Add constexpr and constant_evaluated
case. Value-initialize local variables.
(_MaskImplX86::_S_logical_and, _S_logical_or, _S_bit_not)
(_S_bit_and, _S_bit_or, _S_bit_xor): Add constant_evaluated
case.
* testsuite/experimental/simd/pr109261_constexpr_simd.cc: New
test.
(cherry picked from commit da579188807ede4ee9466d0b5bf51559c96a0b51)
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/109949
* include/experimental/bits/simd.h (__intrinsic_type): If
__ALTIVEC__ is defined, map gnu::vector_size types to their
corresponding __vector T types without losing unsignedness of
integer types. Also prefer long long over long.
* include/experimental/bits/simd_ppc.h (_S_popcount): Cast mask
object to the expected unsigned vector type.
(cherry picked from commit efd2b55d8562c6e80cb7ee8b9b1f9418f0c00cd9)
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.
(cherry picked from commit b0a483b0a011f9cbc8b25053eae809c77dae2a12)
|
|
On ARM NEON doesn't support double, so __is_intrinsic_type_v<double,
whatever> should say false (instead of being ill-formed).
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/109261
* include/experimental/bits/simd.h (__intrinsic_type):
Specialize __intrinsic_type<double, 8> and
__intrinsic_type<double, 16> in any case, but provide the member
type only with __aarch64__.
(cherry picked from commit aa8b363171a95b8f867a74f29c75f9577e9087e1)
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_builtin.h (_S_fpclassify): Move
__infn into #ifdef'ed block.
* testsuite/experimental/simd/tests/fpclassify.cc: Declare
constants only when used.
* testsuite/experimental/simd/tests/frexp.cc: Likewise.
* testsuite/experimental/simd/tests/logarithm.cc: Likewise.
* testsuite/experimental/simd/tests/trunc_ceil_floor.cc:
Likewise.
* testsuite/experimental/simd/tests/ldexp_scalbn_scalbln_modf.cc:
Move totest and expect1 into #ifdef'ed block.
(cherry picked from commit a7129e82bed1bd4f513fc3c3f401721e2c96a865)
|
|
|
|
(forward ported from devel/omp/gcc-12)
This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html
This patch implements '-fopenmp-target=acc', which enables internally handling
a subset of OpenMP target regions as OpenACC parallel regions. This basically
includes target, teams, parallel, distribute, for/do constructs, and atomics.
Essentially, we adjust the internal kinds to OpenACC type, and let OpenACC code
paths handle them, with various needed adjustments throughout middle-end and
nvptx backend. When using this "OMPACC" mode, if there are cases the patch
doesn't handle, it issues a warning, and reverts to normal processing for that
target region.
gcc/ChangeLog:
* builtins.cc (expand_builtin_omp_builtins): New function.
(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
* cgraphunit.cc (analyze_functions): Add call to
omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
* common.opt (fopenmp-target=): Add new option and enums.
* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
prototype.
(nvptx_mem_shared_p): Likewise.
* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
symbol for number of threads in team.
(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
(need_omp_num_threads): New bool for if any function references
omp_num_threads_sym.
(nvptx_option_override): Initialize omp_num_threads_sym/align.
(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
(nvptx_declare_function_name): Disable shim function under OMPACC mode.
Disable soft-stack under OMPACC mode. Add generation of neutering init
code under OMPACC mode.
(nvptx_output_set_softstack): Return "" under OMPACC mode.
(nvptx_expand_call): Set parallelism to vector for function calls with
"ompacc for" attached.
(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
(nvptx_expand_oacc_join): Likewise.
(nvptx_expand_omp_get_num_threads): New function.
(nvptx_mem_shared_p): New function.
(nvptx_mach_max_workers): Return 1 under OMPACC mode.
(nvptx_mach_vector_length): Return 32 under OMPACC mode.
(nvptx_single): Add adjustments for OMPACC mode, which have
parallel-construct fork/joins, and regions of code where neutering is
dynamically determined.
(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
attribute is attached to function. Disable uniform-simt when under
OMPACC mode.
(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
(nvptx_goacc_fork_join): Return true under OMPACC mode.
* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
(nvptx_shared_mem_operand): New predicate.
(gomp_barrier): New expand pattern.
(omp_get_num_threads): New expand pattern.
(omp_get_num_teams): New insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_team_num): Likewise.
(get_ntid): Likewise.
(nvptx_omp_parallel_fork): Likewise.
(nvptx_omp_parallel_join): Likewise.
* flag-types.h (omp_target_mode_kind): New flag value enum.
* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_ctx_ompacc_p): New function.
(gimplify_omp_for): Handle combined loops under OMPACC.
* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
* omp-expand.cc (remove_exit_barrier): Disable addressable-var
processing for parallel construct child functions under OMPACC mode.
(expand_oacc_for): Add OMPACC mode handling.
(get_target_arguments): Force thread_limit clause value to 1 under
OMPACC mode.
(expand_omp): Under OMPACC mode, avoid child function expanding of
GIMPLE_OMP_PARALLEL.
* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
(ompacc_ctx_p): New function.
(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
(lower_oacc_head_mark): Handle OMPACC mode cases.
(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
vector/gang clauses as needed. Add other OMPACC handling.
(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
(lower_omp_teams): Forward OpenACC privatization variables to outer
target region under OMPACC mode.
(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
GIMPLE_BIND.
* omp-offload.cc (ompacc_supported_clauses_p): New function.
(struct target_region_data): New struct type for tree walk.
(scan_fndecl_for_ompacc): New function.
(scan_omp_target_region_r): New function.
(scan_omp_target_construct_r): New function.
(omp_ompacc_attribute_tagging): New function.
(oacc_dim_call): Add OMPACC case handling.
(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
and no -fopenacc.
* target-insns.def (gomp_barrier): New defined insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_num_threads): Likewise.
(omp_get_team_num): Likewise.
(omp_get_num_teams): Likewise.
* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
for internal clause.
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
OMP_CLAUSE__OMPACC_.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.
* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
mode cases.
libgomp/ChangeLog:
* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
shared memory.
(cherry picked from commit 5f881613fa9128edae5bbfa4e19f9752809e4bd7)
|
|
|
|
Some miscomputation of rtx_costs lead to sub-optimal code for
single-bit bit insertions. This patch implements TARGET_INSN_COST,
which has a chance to see the whole insn during insn combination;
in particular the SET_DEST of (set (zero_extract (...) ...)).
gcc/
* config/avr/avr.cc (avr_insn_cost): New static function.
(TARGET_INSN_COST): Define to that function.
|
|
The encoder for CONSTRUCTORs assumes that all bit-fields (DECL_BIT_FIELD)
have integral types, but that's not the case in Ada where they may have
pretty much any type, resulting in a wrong encoding for them
gcc/
* fold-const.cc (native_encode_initializer) <CONSTRUCTOR>: Apply the
specific treatment for bit-fields only if they have an integral type
and filter out non-integral bit-fields that do not start and end on
a byte boundary.
gcc/testsuite/
* gnat.dg/opt101.adb: New test.
* gnat.dg/opt101_pkg.ads: New helper.
|
|
|
|
|
|
On the following testcase we hang, because POLY_INT_CST is CONSTANT_CLASS_P,
but BIT_AND_EXPR with it and INTEGER_CST doesn't simplify and the
(x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2)
simplification actually relies on the (CST1 & CST2) simplification,
otherwise it is a deoptimization, trading 2 ops for 3 and furthermore
running into
/* Given a bit-wise operation CODE applied to ARG0 and ARG1, see if both
operands are another bit-wise operation with a common input. If so,
distribute the bit operations to save an operation and possibly two if
constants are involved. For example, convert
(A | B) & (A | C) into A | (B & C)
Further simplification will occur if B and C are constants. */
simplification which simplifies that
(x & CST2) | (CST1 & CST2) back to
CST2 & (x | CST1).
I went through all other places I could find where we have a simplification
with 2 CONSTANT_CLASS_P operands and perform some operation on those two,
while the other spots aren't that severe (just trade 2 operations for
another 2 if the two constants don't simplify, rather than as in the above
case trading 2 ops for 3), I still think all those spots really intend
to optimize only if the 2 constants simplify.
So, the following patch adds to those a ! modifier to ensure that,
even at GENERIC that modifier means !EXPR_P which is exactly what we want
IMHO.
2023-05-21 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109505
* match.pd ((x | CST1) & CST2 -> (x & CST2) | (CST1 & CST2),
Combine successive equal operations with constants,
(A +- CST1) +- CST2 -> A + CST3, (CST1 - A) +- CST2 -> CST3 - A,
CST1 - (CST2 - A) -> CST3 + A): Use ! on ops with 2 CONSTANT_CLASS_P
operands.
* gcc.target/aarch64/sve/pr109505.c: New test.
(cherry picked from commit f211757f6fa9515e3fd1a4f66f1a8b48e500c9de)
|
|
When working on a cost tweaking patch, I found that a newly
added test case has different dumpings with stage-1 and
bootstrapped gcc. By looking into it, the apparent reason
is vect_analyze_loop_2 doesn't get slp_done_for_suggested_uf
set expectedly, the following retrying will use the garbage
slp_done_for_suggested_uf instead. In fact, the setting of
slp_done_for_suggested_uf only happens when the previous
analysis succeeds, for the mentioned test case, its previous
analysis does fail, it's unexpected to use the value of
slp_done_for_suggested_uf any more.
In function vect_analyze_loop_1, we only return success when
res is true, which is the result of 1st analysis. It means
we never try to vectorize with unroll_vinfo if the previous
analysis fails. So this patch shouldn't break anything, and
just stop some useless analysis early.
gcc/ChangeLog:
* tree-vect-loop.cc (vect_analyze_loop_1): Don't retry analysis with
suggested unroll factor once the previous analysis fails.
(cherry picked from commit a04bf39f61ce7814d197d712760f08c206daf4f1)
|
|
|
|
Tools from later versions of the OS deprecate or fail to support
earlier OS revisions.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
libgcc/ChangeLog:
* config.host: Arrange to set min Darwin OS versions from
the configured host version.
* config/darwin10-unwind-find-enc-func.c: Do not use current
headers, but declare the nexessary structures locally to the
versions in use for Mac OSX 10.6.
* config/t-darwin: Amend to handle configured min OS
versions.
* config/t-darwin-min-1: New.
* config/t-darwin-min-5: New.
* config/t-darwin-min-8: New.
(cherry picked from commit 20b8779ea9bd82b26eeb195b30f695168cd7ae1d)
|
|
|
|
This patch removes the superfluous parallel in [u]divmod patterns in
the AVR backend. Effect of extra parallel is that add_clobbers reaches
gcc_unreachable() because the clobbers for [u]divmod are missing.
If an insn has multiple parts like clobbers, the parallel around the
parts of the insn pattern is implicit.
gcc/
PR target/105753
Backport from 2023-05-20 https://gcc.gnu.org/r14-1016
* config/avr/avr.md (divmodpsi, udivmodpsi, divmodsi, udivmodsi):
Remove superfluous "parallel" in insn pattern.
([u]divmod<mode>4): Tidy code. Use gcc_unreachable() instead of
printing error text to assembly.
gcc/testsuite/
PR target/105753
Backport from 2023-05-20 https://gcc.gnu.org/r14-1016
* gcc.target/avr/torture/pr105753.c: New test.
|
|
|
|
Now that we have support for inline subword atomic operations, it is no
longer necessary to link against libatomic. This also fixes testsuite
failures because the framework does not properly set up the linker flags
for finding libatomic.
The use of atomic operations is also independent of the use of libpthread.
gcc/
* config/riscv/linux.h (LIB_SPEC): Don't redefine.
|
|
This is one part of the fix for PR109128, along with a corresponding
binutils's linker change. Without this patch, what happens in the
linker, when an unused object in a .a file has offload data, is that
elf_link_is_defined_archive_symbol calls bfd_link_plugin_object_p,
which ends up calling the plugin's claim_file_handler, which then
records the object as one with offload data. That is, the linker never
decides to use the object in the first place, but use of this _p
interface (called as part of trying to decide whether to use the
object) results in the plugin deciding to use its offload data (and a
consequent mismatch in the offload data present at runtime).
The new hook allows the linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid recording the object as one
with offload data.
PR middle-end/109128
include/
* plugin-api.h (ld_plugin_claim_file_handler_v2)
(ld_plugin_register_claim_file_v2)
(LDPT_REGISTER_CLAIM_FILE_HOOK_V2): New.
(struct ld_plugin_tv): Add tv_register_claim_file_v2.
lto-plugin/
* lto-plugin.c (register_claim_file_v2): New.
(claim_file_handler_v2): New.
(claim_file_handler): Wrap claim_file_handler_v2.
(onload): Handle LDPT_REGISTER_CLAIM_FILE_HOOK_V2.
(cherry picked from commit c49d51fa8134f6c7e6c7cf6e4e3007c4fea617c5)
|
|
This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.
At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.
The approach taken is to generate a set of constructors to run on the
target: this currently works for AMD GCN, but fails on NVPTX due
to lack of constructor/destructor support there so far on mainline.
(See the new test static-aggr-constructor-destructor-3.C for a reason
why running constructors on the target is preferable to e.g. constructing
on the host and then copying the resulting object to the target.)
2023-05-12 Julian Brown <julian@codesourcery.com>
gcc/cp/
* decl2.cc (tree-inline.h): Include.
(static_init_fini_fns): Bump to four entries. Update comment.
(start_objects, start_partial_init_fini_fn): Add 'omp_target'
parameter. Support "declare target" decls. Update forward declaration.
(emit_partial_init_fini_fn): Add 'host_fn' parameter. Return tree for
the created function. Support "declare target".
(OMP_SSDF_IDENTIFIER): New macro.
(partition_vars_for_init_fini): Support partitioning "declare target"
variables also.
(generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
"declare target" decls.
(c_parse_final_cleanups): Support constructors/destructors on OpenMP
offload targets.
gcc/
* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): New builtin.
* tree.cc (get_file_function_name): Support names for on-target
constructor/destructor functions.
libgomp/
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New
test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New
test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-3.C: New
test.
|
|
add_list_candidates has logic to reject designated initialization of a
non-aggregate type, but this is inadvertently being suppressed if the type
has a list constructor due to the order of case analysis, which in the
below testcase leads to us incorrectly treating the initializer list as if
it's non-designated. This patch fixes this by making us check for invalid
designated initialization sooner.
PR c++/109871
gcc/cp/ChangeLog:
* call.cc (add_list_candidates): Check for invalid designated
initialization sooner and even for types that have a list
constructor.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/desig27.C: New test.
(cherry picked from commit d5e5007c4b534391c0a97be56f6024fde1a88682)
|
|
This adds the feature-test macro for PR0849R8, as per
https://github.com/cplusplus/CWG/issues/281.
gcc/c-family/ChangeLog:
* c-cppbuiltin.cc (c_cpp_builtins): Predefine __cpp_auto_cast
for C++23.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/feat-cxx2b.C: Test __cpp_auto_cast.
(cherry picked from commit 32b81d897629b6c3bd9e2780831a1c45b38b5ac3)
|
|
The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.
Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.
gcc/ChangeLog:
* omp-transform-loops.cc (full_unroll): Add initialization of index variable.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty pragmas.
Use -O2.
* testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
Copy of testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
but using -O0 which works only for C++.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use -Wall
and use -Wno-unknown-pragmas to disable warnings about empty pragmas.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
Likewise and fix broken function calls found by -Wall.
|
|
|
|
gcc/fortran/ChangeLog:
PR fortran/109846
* expr.cc (gfc_check_vardef_context): Check appropriate pointer
attribute for CLASS vs. non-CLASS function result in variable
definition context.
gcc/testsuite/ChangeLog:
PR fortran/109846
* gfortran.dg/ptr-func-5.f90: New test.
(cherry picked from commit fa0569e90efe8a5cb895a3f50dd502f849940828)
|
|
Fortran allows overloading of intrinsic operators also for operands of
numeric intrinsic types. The intrinsic operator versions are used
according to the rules of F2018 table 10.2 and imply type conversion as
long as the operand ranks are conformable. Otherwise no type conversion
shall be performed to allow the resolution of a matching user-defined
operator.
gcc/fortran/ChangeLog:
PR fortran/109641
* arith.cc (eval_intrinsic): Check conformability of ranks of operands
for intrinsic binary operators before performing type conversions.
* gfortran.h (gfc_op_rank_conformable): Add prototype.
* resolve.cc (resolve_operator): Check conformability of ranks of
operands for intrinsic binary operators before performing type
conversions.
(gfc_op_rank_conformable): New helper function to compare ranks of
operands of binary operator.
gcc/testsuite/ChangeLog:
PR fortran/109641
* gfortran.dg/overload_5.f90: New test.
(cherry picked from commit 185da7c2014ba41f38dd62cc719873ebf020b076)
|
|
The order of the map clauses in the Gimple representation has changed.
2023-05-10 Kwok Cheung Yeung <kcy@codesourcery.com>
* gfortran.dg/gomp/target-exit-data.f90: Update expected outputs.
|
|
2023-05-03 Kwok Cheung Yeung <kcy@codesourcery.com>
* gfortran.dg/gomp/map-10a.f90: Update expected outputs.
|
|
NULLs cannot be added to a hash_set in GCC 13, so test for NULL before adding.
Should probably be a fixup to 'Kernels loops annotation: Fortran'.
2023-04-27 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/fortran/
* openmp.cc (compute_goto_targets): Check label before adding to
goto_targets.
|
|
The GOMP_MAP_NONCONTIG_ARRAY_* map types need to be handled in omp_group_base
(which was added in GCC 13).
This should probably be a fixup to 'Merge non-contiguous array support
patches'.
2023-04-18 Kwok Cheung Yeung <kcy@codesourcery.com>
* gimplify.cc (omp_group_base): Handle GOMP_MAP_NONCONTIG_ARRAY_*
map types.
|
|
GOMP_MAP_DECLARE_ALLOCATE and GOMP_MAP_DECLARE_DEALLOCATE need to be handled
in omp_group_base (which was added in GCC 13).
This should probably be a fixup to 'Fortran "declare create"/allocate support
for OpenACC'.
2023-04-17 Kwok Cheung Yeung <kcy@codesourcery.com>
* gimplify.cc (omp_group_base): Handle GOMP_MAP_DECLARE_ALLOCATE
and GOMP_MAP_DECLARE_DEALLOCATE.
|
|
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
and initialization.
|
|
The previous code works fine on Fiji and Vega 10 devices, but bogs down in The
spin locks on Vega 20 or newer. Adding the sleep instructions fixes the
problem.
libgomp/ChangeLog:
* basic-allocator.c (basic_alloc_free): Use BASIC_ALLOC_YIELD.
(basic_alloc_realloc): Use BASIC_ALLOC_YIELD.
|
|
non-contiguous array support
Changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".
libgomp/
* target.c (gomp_map_vars_internal)
<non-contiguous array support>: Handle 'always_pinned_mode'.
|
|
Implemented for nvptx offloading via 'cuMemHostAlloc', 'cuMemHostRegister'.
gcc/
* doc/invoke.texi (-foffload-memory=pinned): Document.
include/
* cuda/cuda.h (CUresult): Add
'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'.
(CUdevice_attribute): Add
'CU_DEVICE_ATTRIBUTE_READ_ONLY_HOST_REGISTER_SUPPORTED'.
(CU_MEMHOSTREGISTER_READ_ONLY): Add.
(cuMemHostGetFlags, cuMemHostRegister, cuMemHostUnregister): Add.
libgomp/
* libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_free): Add
'struct goacc_asyncqueue *' formal parameter.
(GOMP_OFFLOAD_page_locked_host_register)
(GOMP_OFFLOAD_page_locked_host_unregister)
(GOMP_OFFLOAD_page_locked_host_p): Add.
* libgomp.h (always_pinned_mode)
(gomp_page_locked_host_register_dev)
(gomp_page_locked_host_unregister_dev): Add.
(struct splay_tree_key_s): Add 'page_locked_host_p'.
(struct gomp_device_descr): Add
'GOMP_OFFLOAD_page_locked_host_register',
'GOMP_OFFLOAD_page_locked_host_unregister',
'GOMP_OFFLOAD_page_locked_host_p'.
* libgomp.texi (-foffload-memory=pinned): Document.
* plugin/cuda-lib.def (cuMemHostGetFlags, cuMemHostRegister_v2)
(cuMemHostRegister, cuMemHostUnregister): Add.
* plugin/plugin-nvptx.c (struct ptx_device): Add
'read_only_host_register_supported'.
(nvptx_open_device): Initialize it.
(free_host_blocks, free_host_blocks_lock)
(nvptx_run_deferred_page_locked_host_free)
(nvptx_page_locked_host_free_callback, nvptx_page_locked_host_p)
(GOMP_OFFLOAD_page_locked_host_register)
(nvptx_page_locked_host_unregister_callback)
(GOMP_OFFLOAD_page_locked_host_unregister)
(GOMP_OFFLOAD_page_locked_host_p)
(nvptx_run_deferred_page_locked_host_unregister)
(nvptx_move_page_locked_host_unregister_blocks_aq1_aq2_callback):
Add.
(GOMP_OFFLOAD_fini_device, GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_run): Call
'nvptx_run_deferred_page_locked_host_free'.
(struct goacc_asyncqueue): Add
'page_locked_host_unregister_blocks_lock',
'page_locked_host_unregister_blocks'.
(nvptx_goacc_asyncqueue_construct)
(nvptx_goacc_asyncqueue_destruct): Handle those.
(GOMP_OFFLOAD_page_locked_host_free): Handle
'struct goacc_asyncqueue *' formal parameter.
(GOMP_OFFLOAD_openacc_async_test)
(nvptx_goacc_asyncqueue_synchronize): Call
'nvptx_run_deferred_page_locked_host_unregister'.
(GOMP_OFFLOAD_openacc_async_serialize): Call
'nvptx_move_page_locked_host_unregister_blocks_aq1_aq2_callback'.
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_calloc, linux_memspace_free)
(linux_memspace_realloc): Remove 'always_pinned_mode' handling.
(GOMP_enable_pinned_mode): Move...
* target.c: ... here.
(always_pinned_mode, verify_always_pinned_mode)
(gomp_verify_always_pinned_mode, gomp_page_locked_host_alloc_dev)
(gomp_page_locked_host_free_dev)
(gomp_page_locked_host_aligned_alloc_dev)
(gomp_page_locked_host_aligned_free_dev)
(gomp_page_locked_host_register_dev)
(gomp_page_locked_host_unregister_dev): Add.
(gomp_copy_host2dev, gomp_map_vars_internal)
(gomp_remove_var_internal, gomp_unmap_vars_internal)
(get_gomp_offload_icvs, gomp_load_image_to_device)
(gomp_target_rev, omp_target_memcpy_copy)
(omp_target_memcpy_rect_worker): Handle 'always_pinned_mode'.
(gomp_copy_host2dev, gomp_copy_dev2host): Handle
'verify_always_pinned_mode'.
(GOMP_target_ext): Add 'assert'.
(gomp_page_locked_host_alloc): Use
'gomp_page_locked_host_alloc_dev'.
(gomp_page_locked_host_free): Use
'gomp_page_locked_host_free_dev'.
(omp_target_associate_ptr): Adjust.
(gomp_load_plugin_for_device): Handle 'page_locked_host_register',
'page_locked_host_unregister', 'page_locked_host_p'.
* oacc-mem.c (memcpy_tofrom_device): Handle 'always_pinned_mode'.
* libgomp_g.h (GOMP_enable_pinned_mode): Adjust.
* testsuite/libgomp.c/alloc-pinned-7.c: Remove.
|
|
'goacc_noncontig_array_create_ptrblock' [PR76739]
... to simplify later changes. No functional change.
Follow-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".
PR other/76739
libgomp/
* target.c (gomp_map_vars_internal): Pass pre-allocated 'ptrblock'
to 'goacc_noncontig_array_create_ptrblock'.
* oacc-parallel.c (goacc_noncontig_array_create_ptrblock): Adjust.
* oacc-int.h (goacc_noncontig_array_create_ptrblock): Adjust.
|
|
libgomp/
* libgomp.texi (AMD Radeon, nvptx): Document OpenMP 'pinned'
memory.
|
|
gcc/ChangeLog:
* omp-transform-loops.cc (walk_omp_for_loops): Handle
GIMPLE_OMP_METADIRECTIVE.
|
|
Add the parsing of loop transformations on inner loops of a loop-nest.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(c_parser_omp_tile): ... adjust use here,
(c_parser_omp_unroll): ... and here,
(c_parser_omp_for_loop): ... and here. Stop treating loop
transformations like intervening code, parse them, and adjust
the loop-nest depth if necessary for tiling.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_is_pragma): New function.
(cp_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(cp_parser_omp_tile): ... adjust use here,
(cp_parser_omp_unroll): ... and here,
(cp_parser_omp_for_loop): ... and here. Stop treating loop
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/unroll-inner-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: New test.
libgomp/ChangeLog
* testsuite/libgomp.c++/loop-transforms/tile-1.C: Deleted, replaced by
matrix-* tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h:
New header file for new tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
New test.
|
|
So far the implementation of the "omp tile" and "omp unroll"
directives restricted their use to the outermost loop of a loop-nest.
This commit changes the Fortran front end to parse and verify the
directives on inner loops. The transformation clauses are extended to
carry the information about the level of the loop-nest at which a
transformation should be applied. The middle end transformation pass
is adjusted to apply the transformations at the right level of a loop
nest and to take their effect on the loop nest depth into account.
gcc/fortran/ChangeLog:
* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
(resolve_loop_transform_generic): Remove, and ...
(resolve_omp_unroll): ... inline and adapt here. Move function.
Move functin.
(find_nested_loop_in_block): New function.
(find_nested_loop_in_chain): New function, used ...
(is_outer_iteration_variable): ... here, and ...
(expr_is_invariant): ... here.
(resolve_omp_do): Adjust code for resolving loop transformations.
(resolve_omp_tile): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_TRANSFROM_LEVEL
on new clause.
(compute_transformed_depth): New function to compute the depth
("collapse") of a transformed loop nest, used
(gfc_trans_omp_do): ... here.
gcc/ChangeLog:
* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix type
in comment.
(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
(partial_unroll): Add argument for the loop nest level to be transformed.
(tile): Likewise.
(transform_gomp_for): Pass level to transformatoin functions.
(optimize_transformation_clauses): Handle transformation clauses for all
levels recursively.
* tree-pretty-print.cc (dump_omp_clause): Print
OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
clause operand 0.
(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
(OMP_CLAUSE_TILE_SIZES): Likewise.
gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(cp_parser_omp_clause_unroll_partial): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
(cp_parser_omp_loop_transform_clause): Likewise.
(cp_parser_omp_nested_loop_transform_clauses): Likewise.
(cp_parser_omp_unroll): Likewise.
* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
and OMP_CLAUSE_TILE handling to changed number of operands.
gcc/c/ChangeLog
* c-parser.cc (c_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(c_parser_omp_clause_unroll_partial): Likewise.
(c_parser_omp_tile_sizes): Likewise.
(c_parser_omp_loop_transform_clause): Likewise.
(c_parser_omp_nested_loop_transform_clauses): Likewise.
(c_parser_omp_unroll): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
changed diagnostic messages.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
|
|
This commit adds the C and C++ front end support for the "omp tile"
directive. The middle end support for the transformation is
implemented in a previous commit.
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_directives): Add PRAGMA_OMP_TILE.
* c-pragma.cc (omp_pragmas_simd): Likewise.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_TILE
gcc/c/ChangeLog:
* c-parser.cc (c_parser_nested_omp_unroll_clauses): Rename and
generalize ...
(c_parser_omp_nested_loop_transform_clauses): ... to this.
(c_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(c_parser_omp_tile_sizes): Parse single "sizes" clause.
(c_parser_omp_loop_transform_clause): New function.
(c_parser_omp_tile): New function for parsing "omp tile"
(c_parser_omp_unroll): Adjust to renaming.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_unroll_partial): Adjust.
(cp_parser_nested_omp_unroll_clauses): Rename ...
(cp_parser_omp_nested_loop_transform_clauses): ... to this.
(cp_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(cp_parser_omp_tile_sizes): New function, parses single "sizes" clause
(cp_parser_omp_tile): New function for parsing "omp tile".
(cp_parser_omp_loop_transform_clause): New function.
(cp_parser_omp_unroll): Adjust to renaming.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE.
(cp_parser_pragma): Likewise.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_TILE.
* semantics.cc (finish_omp_clauses): Likewise.
gcc/ChangeLog:
* gimplify.cc (omp_for_drop_tile_clauses): New function, ...
(gimplify_omp_for): ... used here.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/tile-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-2.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-3.C: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/tile-1.c: New test.
* c-c++-common/gomp/loop-transforms/tile-2.c: New test.
* c-c++-common/gomp/loop-transforms/tile-3.c: New test.
* c-c++-common/gomp/loop-transforms/tile-4.c: New test.
* c-c++-common/gomp/loop-transforms/tile-5.c: New test.
* c-c++-common/gomp/loop-transforms/tile-6.c: New test.
* c-c++-common/gomp/loop-transforms/tile-7.c: New test.
* c-c++-common/gomp/loop-transforms/tile-8.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: Adapt
to changed diagnostic messages.
* g++.dg/gomp/loop-transforms/tile-1.h: New test.
* g++.dg/gomp/loop-transforms/tile-1a.C: New test.
* g++.dg/gomp/loop-transforms/tile-1b.C: New test.
|
|
This commit implements the Fortran front end support for the "omp
tile" directive and the corresponding middle end transformation.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE.
(enum gfc_exec_op): Add EXEC_OMP_TILE.
(loop_transform_p): New declaration.
(struct gfc_omp_clauses): Add "tile_sizes" field.
* dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping.
(show_omp_node): Handle EXEC_OMP_TILE.
(show_code_node): Likewise.
* match.h (gfc_match_omp_tile): New declaration.
* openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field.
(match_tile_sizes): New function.
(OMP_TILE_CLAUSES): New macro.
(gfc_match_omp_tile): New function.
(resolve_omp_do): Handle EXEC_OMP_TILE.
(resolve_omp_tile): New function.
(omp_code_to_statement): Handle EXEC_OMP_TILE.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE
and ST_OMP_TILE.
(next_statement): Handle ST_OMP_TILE.
(gfc_ascii_statement): Likewise.
(parse_omp_do): Likewise.
(parse_executable): Likewise.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field.
(loop_transform_p): New function.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Handle EXEC_OMP_TILE.
(gfc_trans_omp_directive): Likewise.
* trans.cc (trans_code): Likewise.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_loop): Likewise.
* omp-transform-loops.cc (walk_omp_for_loops): New declaration.
(subst_var_in_op): New function.
(subst_var): New function.
(gomp_for_number_of_iterations): Adjust.
(gomp_for_iter_count_type): New function.
(gimple_assign_rhs_to_tree): New function.
(subst_defs): New function.
(gomp_for_uncollapse): Adjust.
(transformation_clause_p): Add OMP_CLAUSE_TILE.
(tile): New function.
(transform_gomp_for): Handle OMP_CLAUSE_TILE.
(optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
* omp-general.cc (omp_loop_transform_clause_p): Add
OMP_CLAUSE_TILE.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
* tree.cc: Add OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_SIZES): New macro.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
|
|
OMP_CLAUSE_TILE will be used for the OpenMP 5.1 loop transformation
construct "omp tile".
gcc/ChangeLog:
* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
(OMP_CLAUSE_OACC_TILE_LIST): ... this.
(OMP_CLAUSE_TILE_ITERVAR): Rename to ...
(OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
(OMP_CLAUSE_TILE_COUNT): Rename to ...
(OMP_CLAUSE_OACC_TILE_COUNT): this.
* gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_for): Likewise.
* omp-general.cc (omp_extract_for_data): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_oacc_head_mark): Likewise.
* tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.cc: Likewise.
gcc/c-family/ChangeLog:
* c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
(c_parser_oacc_clause_tile): Likewise.
(c_parser_omp_for_loop): Likewise.
* c-typeck.cc (c_finish_omp_clauses): Likewise.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
(cp_parser_omp_clause_collapse): Likewise.
(cp_parser_omp_for_loop): Likewise.
* pt.cc (tsubst_omp_clauses): Likewise.
* semantics.cc (finish_omp_clauses): Likewise.
(finish_omp_for): Likewise.
gcc/fortran/ChangeLog:
* openmp.cc (enum omp_mask2): Adjust to renamings.
(gfc_match_omp_clauses): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
|
|
This commit implements the C and the C++ front end changes to support
the "omp unroll" directive. The execution of the loop transformation
relies on the pass that has been added as a part of the earlier
Fortran patch.
gcc/c-family/ChangeLog:
* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_UNROLL.
* c-omp.cc: Add "unroll" to omp_directives[].
* c-pragma.cc: Add "unroll" to omp_pragmas_simd[].
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_UNROLL to
pragma_kind and adjust PRAGMA_OMP__LAST_.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Handle "full" and
"partial" clauses.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(c_parser_omp_clause_unroll_full): New function for parsing
the "unroll clause".
(c_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
and PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(c_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(c_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise.
* parser.cc (cp_parser_omp_clause_name): Handle "full" clause.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(cp_parser_omp_clause_unroll_full): New function for parsing
the "unroll clause".
(cp_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_UNROLL and
OMP_CLAUSE_FULL.
(cp_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(cp_parser_omp_for_loop): Handle "omp unroll" directives
between directive and loop.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(cp_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(cp_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
(cp_parser_pragma): Handle PRAGMA_OMP_UNROLL.
* pt.cc (tsubst_omp_clauses): Handle
OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_NONE.
(tsubst_expr): Handle OMP_UNROLL.
* semantics.cc (finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/unroll-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/unroll-2.C: New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/unroll-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-3.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-4.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-5.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-6.c: New test.
* g++.dg/gomp/loop-transforms/unroll-1.C: New test.
* g++.dg/gomp/loop-transforms/unroll-2.C: New test.
* g++.dg/gomp/loop-transforms/unroll-3.C: New test.
|
|
This commit implements the OpenMP 5.1 "omp unroll" directive for
Fortran. The Fortran front end changes encompass the parsing and the
verification of nesting restrictions etc. The actual loop
transformation is implemented in a new language-independent
"omp_transform_loops" pass which runs before omp lowering. No attempt
is made to re-use existing unrolling optimizations because a separate
implementation allows for better control of the unrolling. The new
pass will also serve as a foundation for the implementation of further
OpenMP loop transformations. This commit only implements the support
for "omp unroll" on the outermost loop of a loop nest. The support
for inner loops will be added later.
gcc/ChangeLog:
* Makefile.in: Add omp_transform_loops.o.
* gimple-pretty-print.cc (dump_gimple_omp_for): Handle "full"
and "partial" clauses.
* gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_TRANSFORM_LOOP.
* gimplify.cc (is_gimple_stmt): Handle OMP_UNROLL.
(gimplify_scan_omp_clauses): Handle OMP_UNROLL_FULL,
OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
(gimplify_adjust_omp_clauses): Handle OMP_UNROLL_FULL,
OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
(gimplify_omp_for): Handle OMP_UNROLL.
(gimplify_expr): Likewise.
* params.opt: Add omp-unroll-full-max-iteration and
omp-unroll-default-factor.
* passes.def: Add pass_omp_transform_loop before
pass_lower_omp.
* tree-core.h (enum omp_clause_code): Add
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_PARTIAL.
* tree-pass.h (make_pass_omp_transform_loops): Declare
pmake_pass_omp_transform_loops.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_PARTIAL.
(dump_generic_node): Handle OMP_UNROLL.
* tree.cc (omp_clause_num_ops): Add number of operators
for OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
OMP_CLAUSE_UNROLL_PARTIAl.
(omp_clause_code_names): Add name strings for
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
OMP_CLAUSE_UNROLL_PARTIAL.
* tree.def (OMP_UNROLL): Define.
* tree.h (OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Define.
* omp-transform-loops.cc: New file.
* omp-general.cc (omp_loop_transform_clause_p): New function.
* omp-general.h (omp_loop_transform_clause_p): New declaration.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Handle "unroll full"
and "unroll partial".
(show_omp_node): Handle OMP_UNROLL.
(show_code_node): Handle EXEC_OMP_UNROLL.
* gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL, ST_OMP_END_UNROLL.
(enum gfc_exec_op): Add EXEC_OMP_UNROLL.
* match.h (gfc_match_omp_unroll): Declare.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_PARTIAL.
(gfc_match_omp_clauses): Handle "omp unroll partial".
(OMP_UNROLL_CLAUSES): New macro definition.
(gfc_match_omp_unroll): Match "full" clause.
(omp_unroll_removes_loop_nest): New function.
(resolve_omp_unroll): New function.
(resolve_omp_do): Accept and verify "omp unroll"
directives between directive and loop.
(omp_code_to_statement): Handle EXEC_OMP_UNROLL.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle "undroll" and "end unroll".
(next_statement): Handle ST_OMP_UNROLL.
(gfc_ascii_statement): Handle ST_OMP_UNROLL and ST_OMP_END_UNROLL.
(parse_omp_do): Accept ST_OMP_UNROLL and ST_OMP_END_UNROLL
before/after loop.
(parse_executable): Handle ST_OMP_UNROLL.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_UNROLL.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle unroll clauses.
(gfc_trans_omp_do): Handle OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_NONE creation.
(gfc_trans_omp_directive): Handle EXEC_OMP_UNROLL.
* trans.cc (trans_code): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-5.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-8.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-6.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-7.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90: New test.
|
|
OpenMP 'ompx_host_mem_alloc' is available for host and nvptx offloading as of
og12 commit 84914e197d91a67b3d27db0e4c69a433462983a5
"openmp, nvptx: ompx_unified_shared_mem_alloc", and for GCN offloading as of
og12 commit c77c45a641fedc3fe770e909cc010fb1735bdbbd
"amdgcn, libgomp: low-latency allocator".
libgomp/
* testsuite/libgomp.c/alloc-ompx_host_mem_alloc-1.c: New.
|
|
Like done for nvptx in og12 commit 23f52e49368d7b26a1b1a72d6bb903d31666e961
"Miscellaneous clean-up re OpenMP 'ompx_unified_shared_mem_space', 'ompx_host_mem_space'".
Clean-up for og12 commit c77c45a641fedc3fe770e909cc010fb1735bdbbd
"amdgcn, libgomp: low-latency allocator". No functional change.
libgomp/
* config/gcn/allocator.c (gcn_memspace_free): Explicitly handle
'memspace == ompx_host_mem_space'.
|