This is a backport of:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619003.html
This patch implements '-fopenmp-target=acc', which enables internally handling
a subset of OpenMP target regions as OpenACC parallel regions. This basically
includes target, teams, parallel, distribute, for/do constructs, and atomics.
Essentially, we adjust the internal kinds to OpenACC type, and let OpenACC code
paths handle them, with various needed adjustments throughout middle-end and
nvptx backend. When using this "OMPACC" mode, if there are cases the patch
doesn't handle, it issues a warning, and reverts to normal processing for that
target region.
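
As an illustration only, the kind of region this mode targets might look like
the sketch below: a combined target/teams/distribute/parallel/for construct.
The function name and data are hypothetical. Whether this exact region is
accepted in OMPACC mode is an assumption; per the description above, an
unhandled case produces a warning and falls back to normal processing.
According to the opts.cc entry below, the option needs -fopenmp and cannot be
combined with -fopenacc, so a plausible invocation would be
'-fopenmp -fopenmp-target=acc'.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example: y = a * x + y over n elements, then a checksum.
// Built without OpenMP, the pragma is ignored and the loop runs on the host.
double saxpy_checksum(int n, double a)
{
  assert(n > 0);
  std::vector<double> x(n, 1.0), y(n, 2.0);
  double *xp = x.data();
  double *yp = y.data();

  // Combined construct built only from the constructs named in the
  // description (target, teams, distribute, parallel, for).
#pragma omp target teams distribute parallel for \
    map(to: xp[0:n]) map(tofrom: yp[0:n])
  for (int i = 0; i < n; i++)
    yp[i] = a * xp[i] + yp[i];

  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += yp[i];
  return sum;
}
```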
gcc/ChangeLog:
* builtins.cc (expand_builtin_omp_builtins): New function.
(expand_builtin): Add expand cases for BUILT_IN_GOMP_BARRIER,
BUILT_IN_OMP_GET_THREAD_NUM, BUILT_IN_OMP_GET_NUM_THREADS,
BUILT_IN_OMP_GET_TEAM_NUM, and BUILT_IN_OMP_GET_NUM_TEAMS using
expand_builtin_omp_builtins, enabled under -fopenmp-target=acc.
* cgraphunit.cc (analyze_functions): Add call to
omp_ompacc_attribute_tagging, enabled under -fopenmp-target=acc.
* common.opt (fopenmp-target=): Add new option and enums.
* config/nvptx/mkoffload.cc (main): Handle -fopenmp-target=.
* config/nvptx/nvptx-protos.h (nvptx_expand_omp_get_num_threads): New
prototype.
(nvptx_mem_shared_p): Likewise.
* config/nvptx/nvptx.cc (omp_num_threads_sym): New global static RTX
symbol for number of threads in team.
(omp_num_threads_align): New var for alignment of omp_num_threads_sym.
(need_omp_num_threads): New bool for if any function references
omp_num_threads_sym.
(nvptx_option_override): Initialize omp_num_threads_sym/align.
(write_as_kernel): Disable normal OpenMP kernel entry under OMPACC mode.
(nvptx_declare_function_name): Disable shim function under OMPACC mode.
Disable soft-stack under OMPACC mode. Add generation of neutering init
code under OMPACC mode.
(nvptx_output_set_softstack): Return "" under OMPACC mode.
(nvptx_expand_call): Set parallelism to vector for function calls with
"ompacc for" attached.
(nvptx_expand_oacc_fork): Set mode to GOMP_DIM_VECTOR under OMPACC mode.
(nvptx_expand_oacc_join): Likewise.
(nvptx_expand_omp_get_num_threads): New function.
(nvptx_mem_shared_p): New function.
(nvptx_mach_max_workers): Return 1 under OMPACC mode.
(nvptx_mach_vector_length): Return 32 under OMPACC mode.
(nvptx_single): Add adjustments for OMPACC mode, which have
parallel-construct fork/joins, and regions of code where neutering is
dynamically determined.
(nvptx_reorg): Enable neutering under OMPACC mode when "ompacc for"
attribute is attached to function. Disable uniform-simt when under
OMPACC mode.
(nvptx_file_end): Write __nvptx_omp_num_threads out when needed.
(nvptx_goacc_fork_join): Return true under OMPACC mode.
* config/nvptx/nvptx.h (struct GTY(()) machine_function): Add
omp_parallel_predicate and omp_fn_entry_num_threads_reg fields.
* config/nvptx/nvptx.md (unspecv): Add UNSPECV_GET_TID,
UNSPECV_GET_NTID, UNSPECV_GET_CTAID, UNSPECV_GET_NCTAID,
UNSPECV_OMP_PARALLEL_FORK, UNSPECV_OMP_PARALLEL_JOIN entries.
(nvptx_shared_mem_operand): New predicate.
(gomp_barrier): New expand pattern.
(omp_get_num_threads): New expand pattern.
(omp_get_num_teams): New insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_team_num): Likewise.
(get_ntid): Likewise.
(nvptx_omp_parallel_fork): Likewise.
(nvptx_omp_parallel_join): Likewise.
* flag-types.h (omp_target_mode_kind): New flag value enum.
* gimplify.cc (struct gimplify_omp_ctx): Add 'bool ompacc' field.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE__OMPACC_.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_ctx_ompacc_p): New function.
(gimplify_omp_for): Handle combined loops under OMPACC.
* lto-wrapper.cc (append_compiler_options): Add OPT_fopenmp_target_.
* omp-builtins.def (BUILT_IN_OMP_GET_THREAD_NUM): Remove CONST.
(BUILT_IN_OMP_GET_NUM_THREADS): Likewise.
* omp-expand.cc (remove_exit_barrier): Disable addressable-var
processing for parallel construct child functions under OMPACC mode.
(expand_oacc_for): Add OMPACC mode handling.
(get_target_arguments): Force thread_limit clause value to 1 under
OMPACC mode.
(expand_omp): Under OMPACC mode, avoid child function expanding of
GIMPLE_OMP_PARALLEL.
* omp-general.cc (omp_extract_for_data): Adjustments for OMPACC mode.
* omp-low.cc (struct omp_context): Add 'bool ompacc_p' field.
(scan_sharing_clauses): Handle OMP_CLAUSE__OMPACC_.
(ompacc_ctx_p): New function.
(scan_omp_parallel): Handle OMPACC mode, avoid creating child function.
(scan_omp_target): Tag "ompacc"/"ompacc for" attributes for target
construct child function, remove OMP_CLAUSE__OMPACC_ clauses.
(lower_oacc_head_mark): Handle OMPACC mode cases.
(lower_omp_for): Adjust OMP_FOR kind from OpenMP to OpenACC kinds, add
vector/gang clauses as needed. Add other OMPACC handling.
(lower_omp_taskreg): Add call to lower_oacc_head_tail for OMPACC case.
(lower_omp_target): Do OpenACC gang privatization under OMPACC case.
(lower_omp_teams): Forward OpenACC privatization variables to outer
target region under OMPACC mode.
(lower_omp_1): Do OpenACC gang privatization under OMPACC case for
GIMPLE_BIND.
* omp-offload.cc (ompacc_supported_clauses_p): New function.
(struct target_region_data): New struct type for tree walk.
(scan_fndecl_for_ompacc): New function.
(scan_omp_target_region_r): New function.
(scan_omp_target_construct_r): New function.
(omp_ompacc_attribute_tagging): New function.
(oacc_dim_call): Add OMPACC case handling.
(execute_oacc_device_lower): Make parts explicitly only OpenACC enabled.
(pass_oacc_device_lower::gate): Enable pass under OMPACC mode.
* omp-offload.h (omp_ompacc_attribute_tagging): New prototype.
* opts.cc (finish_options): Only allow -fopenmp-target= when -fopenmp
and no -fopenacc.
* target-insns.def (gomp_barrier): New defined insn pattern.
(omp_get_thread_num): Likewise.
(omp_get_num_threads): Likewise.
(omp_get_team_num): Likewise.
(omp_get_num_teams): Likewise.
* tree-core.h (enum omp_clause_code): Add new OMP_CLAUSE__OMPACC_ entry
for internal clause.
* tree-nested.cc (convert_nonlocal_omp_clauses): Handle
OMP_CLAUSE__OMPACC_.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE__OMPACC_.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE__OMPACC_ entry.
(omp_clause_code_name): Likewise.
* tree.h (OMP_CLAUSE__OMPACC__FOR): New macro for OMP_CLAUSE__OMPACC_.
* tree-ssa-loop.cc (pass_oacc_only::gate): Enable pass under OMPACC
mode cases.
libgomp/ChangeLog:
* config/nvptx/team.c (__nvptx_omp_num_threads): New global variable in
shared memory.
|
|
When offloading was enabled, top-level 'asm' statements were added to the
offloading section, confusing assemblers that did not support the syntax.
Additionally, with offloading and -flto, the top-level assembler code did not
end up in the host files.
As r14-321-g9a41d2cdbcd added a top-level 'asm' to one libstdc++ header file,
the issue became more apparent, causing failures with nvptx for some
C++ testcases.
PR libstdc++/109816
gcc/ChangeLog:
* lto-cgraph.cc (output_symtab): Guard lto_output_toplevel_asms by
'!lto_stream_offload_p'.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-class-1.C: New test.
* testsuite/libgomp.c++/target-map-class-2.C: New test.
(cherry picked from commit a835f046cdf017b9e8ad5576df4f10daaf8420d0)
|
|
The index variable initialization for the 'omp unroll'
directive with 'full' clause got lost and the testsuite
did not catch it.
Add the initialization and add -Wall to some tests
to detect uninitialized variable uses and other
potential problems in the code generation.
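
A minimal sketch of the affected pattern, with a hypothetical function name:
the loop body reads the index variable, so a fully unrolled loop whose index
is never initialized would compute garbage, and -Wall can flag such a use.
Built without OpenMP, the pragma is ignored (hence -Wno-unknown-pragmas in the
tests) and the result is the same.

```cpp
#include <cassert>

// Hypothetical example: weighted sum of four elements. The weight depends
// on the loop index, so the index must hold the right value in every
// unrolled copy of the body.
int weighted_sum_of_four(const int *v)
{
  assert(v != nullptr);
  int sum = 0;
#pragma omp unroll full
  for (int i = 0; i < 4; i++)
    sum += v[i] * (i + 1);
  return sum;
}
```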
gcc/ChangeLog:
* omp-transform-loops.cc (full_unroll): Add initialization of index variable.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
Use -Wall and add -Wno-unknown-pragmas to disable warnings about empty pragmas.
Use -O2.
* testsuite/libgomp.c++/loop-transforms/matrix-no-directive-unroll-full-1.C:
Copy of testsuite/libgomp.c-c++-common/matrix-no-directive-unroll-full-1.c,
but using -O0 which works only for C++.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c: Use -Wall
and use -Wno-unknown-pragmas to disable warnings about empty pragmas.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c:
Likewise and fix broken function calls found by -Wall.
|
|
The vec_fmsubadd instruction actually had add twice, by mistake.
Also improve code-gen for all the complex patterns by using properly
undefined values. Mostly this just prevents the compiler reserving space
in the stack frame.
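
For reference, the intended lane behaviour can be sketched as a scalar loop.
This is an illustration with a hypothetical function name, not the GCN
pattern itself: it assumes that even lanes add and odd lanes subtract the
third operand, which matches the fix described in the ChangeLog entry below
("use sub for the odd lanes").

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference for an fmsubadd-style operation (illustrative only):
// even lanes compute a*b + c, odd lanes compute a*b - c.
std::vector<double> fmsubadd_reference(const std::vector<double> &a,
                                       const std::vector<double> &b,
                                       const std::vector<double> &c)
{
  assert(a.size() == b.size() && b.size() == c.size());
  std::vector<double> r(a.size());
  for (std::size_t i = 0; i < a.size(); i++)
    {
      double prod = a[i] * b[i];
      r[i] = (i % 2 == 0) ? prod + c[i] : prod - c[i];
    }
  return r;
}
```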
gcc/ChangeLog:
* config/gcn/gcn-valu.md (cmul<conj_op><mode>3): Use gcn_gen_undef.
(cml<addsub_as><mode>4): Likewise.
(vec_addsub<mode>3): Likewise.
(cadd<rot><mode>3): Likewise.
(vec_fmaddsub<mode>4): Likewise.
(vec_fmsubadd<mode>4): Likewise, and use sub for the odd lanes.
|
|
The vop3 instructions don't support B constraint immediates.
Also, use the SV_FP iterator to delete a redundant pattern.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (vnsi, VnSI): Add scalar modes.
(ldexp<mode>3): Delete.
(ldexp<mode>3<exec>): Change "B" to "A".
|
|
The amdgcn backend can now vectorize more operations.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp
(check_effective_target_vect_call_copysignf): Add amdgcn.
(check_effective_target_vect_call_sqrtf): Add amdgcn.
(check_effective_target_vect_call_ceilf): Add amdgcn.
(check_effective_target_vect_call_floor): Add amdgcn.
(check_effective_target_vect_logical_reduc): Add amdgcn.
|
Implement FP division using hardware instructions. This replaces both the
softfp library calls, and the inaccurate -ffast-math division we had previously.
The GCN architecture does not have a single divide instruction, but it does
have a number of support instructions designed to make multiply-by-reciprocal
sufficiently accurate for non-fast-math usage.
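
The general idea of multiply-by-reciprocal can be sketched in portable code.
This is not the GCN instruction sequence (the div_scale, div_fmas and
div_fixup patterns added below); it swaps in a generic Newton-Raphson
refinement of a crude reciprocal estimate. The function names and the magic
constant are illustrative assumptions, and the sketch only handles finite,
non-zero divisors of moderate magnitude.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Crude reciprocal estimate for a positive, normal float: subtract the bit
// pattern from a magic constant. Accurate to a few percent, which is enough
// for the refinement below to converge.
static float rough_reciprocal(float b)
{
  std::uint32_t bits;
  std::memcpy(&bits, &b, sizeof bits);
  bits = 0x7EF311C7u - bits;
  float x;
  std::memcpy(&x, &bits, sizeof x);
  return x;
}

// a / b computed as a * (1 / b). Each Newton-Raphson step
// x = x * (2 - b * x) roughly squares the relative error of x.
float divide_via_reciprocal(float a, float b)
{
  assert(b != 0.0f);
  float sign = 1.0f;
  if (b < 0.0f)
    {
      b = -b;
      sign = -1.0f;
    }
  float x = rough_reciprocal(b);
  for (int i = 0; i < 3; i++)
    x = x * (2.0f - b * x);
  return sign * a * x;
}
```

Unlike the hardware sequence, this sketch does nothing special for
denormals, infinities, or correct rounding of the last bit, which is
presumably the job of the support instructions mentioned above.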
gcc/ChangeLog:
* config/gcn/gcn-valu.md (SV_SFDF): New iterator.
(SV_FP): New iterator.
(scalar_mode, SCALAR_MODE): Add identity mappings for scalar modes.
(recip<mode>2): Unify the two patterns using SV_FP.
(div_scale<mode><exec_vcc>): New insn.
(div_fmas<mode><exec>): New insn.
(div_fixup<mode><exec>): New insn.
(div<mode>3): Unify the two expanders and rewrite using hardfp.
* config/gcn/gcn.cc (gcn_md_reorg): Support "vccwait" attribute.
* config/gcn/gcn.md (unspec): Add UNSPEC_DIV_SCALE, UNSPEC_DIV_FMAS,
and UNSPEC_DIV_FIXUP.
(vccwait): New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/fpdiv.c: Remove the -ffast-math requirement.
(cherry picked from commit cfdc45f73c56ad051a53576a4e88675ced2660d4)
|
|
The original patch to fix this PR broke the if-conversion of calls into
IFN_MASK_CALL. This patch restores that original behaviour and makes sure the
tests added earlier specifically test inbranch SIMD clones.
gcc/ChangeLog:
PR tree-optimization/108888
* tree-if-conv.cc (predicate_statements): Fix gimple call check.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-16.c: Make simd clone inbranch only.
* gcc.dg/vect/vect-simd-clone-17.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18.c: Likewise.
(cherry picked from commit 58c8c1b383bc3c286d6527fc6e8fb62463f9a877)
|
|
This patch fixes an ICE with checking enabled with the patch:
abcb5dbac666513c798e574808f849f76a1c0799
There were two problems: first, OMP_CLAUSE_CHAIN was erroneously
used as the chain pointer instead of TREE_CHAIN for a non-OMP clause
list. Secondly, "copy_node" by itself is not sufficient to clone the
initialization statement for use in the on-target constructor/destructor
function. Instead we now use walk_tree with "copy_tree_body_r" and
appropriate configuration parameters.
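
The second problem is the usual shallow-versus-deep copy distinction. The
sketch below is an analogy using a hypothetical two-field node type, not
GCC's tree API: copying only the top node leaves its operand shared with the
original, whereas a recursive copy produces an independent statement.

```cpp
#include <cassert>

// Hypothetical stand-in for a statement node with one operand.
struct Node
{
  int value;
  Node *child;
};

// Shallow copy: duplicates the node itself, the child pointer is shared.
Node shallow_copy(const Node &n) { return n; }

// Deep copy: duplicates the node and everything reachable from it.
Node *deep_copy(const Node *n)
{
  if (n == nullptr)
    return nullptr;
  return new Node{n->value, deep_copy(n->child)};
}

void free_chain(Node *n)
{
  while (n != nullptr)
    {
      Node *next = n->child;
      delete n;
      n = next;
    }
}

// Returns true if modifying the deep copy leaves the original untouched.
bool deep_copy_is_independent()
{
  Node operand{7, nullptr};
  Node stmt{1, &operand};

  Node s = shallow_copy(stmt);
  assert(s.child == &operand);  // the shallow copy still points at 'operand'

  Node *d = deep_copy(&stmt);
  d->child->value = 99;         // only the copy's operand changes
  bool independent = (d->child != &operand) && (operand.value == 7);
  free_chain(d);
  return independent;
}
```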
2023-04-05 Julian Brown <julian@codesourcery.com>
gcc/cp/
* decl2.cc (tree-inline.h): Include.
(do_static_initialization_or_destruction): Change OMP_TARGET parameter
to pass the host version of the SSDF function decl. Use
copy_tree_body_r to clone init stmt. Update forward declaration.
(c_parse_final_cleanups): Update calls to
do_static_initialization_or_destruction. Use TREE_CHAIN instead of
OMP_CLAUSE_CHAIN.
|
|
gcc/ChangeLog:
* config/gcn/gcn-valu.md (one_cmpl<mode>2<exec>): New.
|
|
Implemented for nvptx offloading via 'cuMemHostAlloc', 'cuMemHostRegister'.
gcc/
* doc/invoke.texi (-foffload-memory=pinned): Document.
include/
* cuda/cuda.h (CUresult): Add
'CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED'.
(CUdevice_attribute): Add
'CU_DEVICE_ATTRIBUTE_READ_ONLY_HOST_REGISTER_SUPPORTED'.
(CU_MEMHOSTREGISTER_READ_ONLY): Add.
(cuMemHostGetFlags, cuMemHostRegister, cuMemHostUnregister): Add.
libgomp/
* libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_free): Add
'struct goacc_asyncqueue *' formal parameter.
(GOMP_OFFLOAD_page_locked_host_register)
(GOMP_OFFLOAD_page_locked_host_unregister)
(GOMP_OFFLOAD_page_locked_host_p): Add.
* libgomp.h (always_pinned_mode)
(gomp_page_locked_host_register_dev)
(gomp_page_locked_host_unregister_dev): Add.
(struct splay_tree_key_s): Add 'page_locked_host_p'.
(struct gomp_device_descr): Add
'GOMP_OFFLOAD_page_locked_host_register',
'GOMP_OFFLOAD_page_locked_host_unregister',
'GOMP_OFFLOAD_page_locked_host_p'.
* libgomp.texi (-foffload-memory=pinned): Document.
* plugin/cuda-lib.def (cuMemHostGetFlags, cuMemHostRegister_v2)
(cuMemHostRegister, cuMemHostUnregister): Add.
* plugin/plugin-nvptx.c (struct ptx_device): Add
'read_only_host_register_supported'.
(nvptx_open_device): Initialize it.
(free_host_blocks, free_host_blocks_lock)
(nvptx_run_deferred_page_locked_host_free)
(nvptx_page_locked_host_free_callback, nvptx_page_locked_host_p)
(GOMP_OFFLOAD_page_locked_host_register)
(nvptx_page_locked_host_unregister_callback)
(GOMP_OFFLOAD_page_locked_host_unregister)
(GOMP_OFFLOAD_page_locked_host_p)
(nvptx_run_deferred_page_locked_host_unregister)
(nvptx_move_page_locked_host_unregister_blocks_aq1_aq2_callback):
Add.
(GOMP_OFFLOAD_fini_device, GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_run): Call
'nvptx_run_deferred_page_locked_host_free'.
(struct goacc_asyncqueue): Add
'page_locked_host_unregister_blocks_lock',
'page_locked_host_unregister_blocks'.
(nvptx_goacc_asyncqueue_construct)
(nvptx_goacc_asyncqueue_destruct): Handle those.
(GOMP_OFFLOAD_page_locked_host_free): Handle
'struct goacc_asyncqueue *' formal parameter.
(GOMP_OFFLOAD_openacc_async_test)
(nvptx_goacc_asyncqueue_synchronize): Call
'nvptx_run_deferred_page_locked_host_unregister'.
(GOMP_OFFLOAD_openacc_async_serialize): Call
'nvptx_move_page_locked_host_unregister_blocks_aq1_aq2_callback'.
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_calloc, linux_memspace_free)
(linux_memspace_realloc): Remove 'always_pinned_mode' handling.
(GOMP_enable_pinned_mode): Move...
* target.c: ... here.
(always_pinned_mode, verify_always_pinned_mode)
(gomp_verify_always_pinned_mode, gomp_page_locked_host_alloc_dev)
(gomp_page_locked_host_free_dev)
(gomp_page_locked_host_aligned_alloc_dev)
(gomp_page_locked_host_aligned_free_dev)
(gomp_page_locked_host_register_dev)
(gomp_page_locked_host_unregister_dev): Add.
(gomp_copy_host2dev, gomp_map_vars_internal)
(gomp_remove_var_internal, gomp_unmap_vars_internal)
(get_gomp_offload_icvs, gomp_load_image_to_device)
(gomp_target_rev, omp_target_memcpy_copy)
(omp_target_memcpy_rect_worker): Handle 'always_pinned_mode'.
(gomp_copy_host2dev, gomp_copy_dev2host): Handle
'verify_always_pinned_mode'.
(GOMP_target_ext): Add 'assert'.
(gomp_page_locked_host_alloc): Use
'gomp_page_locked_host_alloc_dev'.
(gomp_page_locked_host_free): Use
'gomp_page_locked_host_free_dev'.
(omp_target_associate_ptr): Adjust.
(gomp_load_plugin_for_device): Handle 'page_locked_host_register',
'page_locked_host_unregister', 'page_locked_host_p'.
* oacc-mem.c (memcpy_tofrom_device): Handle 'always_pinned_mode'.
* libgomp_g.h (GOMP_enable_pinned_mode): Adjust.
* testsuite/libgomp.c/alloc-pinned-7.c: Remove.
|
|
'gcc/cp/decl2.cc:one_static_initialization_or_destruction'
[...]/gcc/cp/decl2.cc: In function ‘void one_static_initialization_or_destruction(tree, tree, bool, bool)’:
[...]/gcc/cp/decl2.cc:4171:48: error: unused parameter ‘omp_target’ [-Werror=unused-parameter]
4171 | bool omp_target)
| ~~~~~^~~~~~~~~~
cc1plus: all warnings being treated as errors
make[3]: *** [cp/decl2.o] Error 1
Fix-up for og12 commit abcb5dbac666513c798e574808f849f76a1c0799
"[og12] OpenMP: Constructors and destructors for "declare target" static aggregates".
gcc/cp/
* decl2.cc (one_static_initialization_or_destruction): Remove
'omp_target' formal parameter. Adjust all users.
|
|
gcc/ChangeLog:
* omp-transform-loops.cc (walk_omp_for_loops): Handle
GIMPLE_OMP_METADIRECTIVE.
|
|
... for commits b35b06e9d39..09f39bb1141
|
OpenMP: Constructors and destructors for "declare target" static aggregates
This patch adds support for running constructors and destructors for
static (file-scope) aggregates for C++ objects which are marked with
"declare target" directives on OpenMP offload targets.
At present, space is allocated on the target for such aggregates, but
nothing ever constructs them properly, so they end up zero-initialised.
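
A minimal sketch of the situation, with hypothetical names: a file-scope
object with a non-trivial constructor inside a "declare target" region, read
from a target region. On the host the function returns 42. On an offload
target without this patch, the description above implies the device copy
would read as zero.

```cpp
#include <cassert>

#pragma omp declare target
struct Counter
{
  int value;
  Counter() : value(42) {}   // non-trivial constructor
};

Counter g_counter;           // static (file-scope) aggregate
#pragma omp end declare target

// Reads the object inside a target region. Built without OpenMP, the
// pragmas are ignored and this simply reads the host object.
int read_counter_on_target()
{
  // The host copy is constructed by the normal C++ start-up code.
  assert(g_counter.value == 42);

  int result = -1;
#pragma omp target map(from: result)
  result = g_counter.value;
  return result;
}
```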
Tested with offloading to AMD GCN. I will apply to the og12 branch
shortly.
ChangeLog
2023-03-27 Julian Brown <julian@codesourcery.com>
gcc/cp/
* decl2.cc (priority_info): Add omp_tgt_initializations_p and
omp_tgt_destructions_p.
(start_objects, start_static_storage_duration_function,
do_static_initialization_or_destruction,
one_static_initialization_or_destruction,
generate_ctor_or_dtor_function): Add 'omp_target' parameter. Support
"declare target" decls. Update forward declarations.
(OMP_SSDF_IDENTIFIER): New macro.
(omp_tgt_ssdf_decls): New vec.
(get_priority_info): Initialize omp_tgt_initializations_p and
omp_tgt_destructions_p fields.
(handle_tls_init): Update call to
one_static_initialization_or_destruction.
(c_parse_final_cleanups): Support constructors/destructors on OpenMP
offload targets.
gcc/
* omp-builtins.def (BUILT_IN_OMP_IS_INITIAL_DEVICE): New builtin.
* tree.cc (get_file_function_name): Support names for on-target
constructor/destructor functions.
libgomp/
* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C: New
test.
* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C: New
test.
|
|
Add the parsing of loop transformations on inner loops of a loop-nest.
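
A sketch of the kind of input this is about, with a hypothetical function
name: a loop transformation directive attached to the inner loop of a
collapsed loop nest. The exact placement accepted by the parser (for example
with or without the braces) is an assumption here; the unroll-inner tests
added below are the authoritative examples. Built without OpenMP, the pragmas
are ignored and the result is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Doubles every element of a rows x cols matrix and returns the sum of the
// doubled values.
long doubled_sum(const std::vector<int> &in, int rows, int cols)
{
  assert(rows > 0 && cols > 0);
  assert(in.size() == static_cast<std::size_t>(rows) * cols);

  std::vector<int> out(in.size());
  const int *src = in.data();
  int *dst = out.data();

#pragma omp for collapse(2)
  for (int i = 0; i < rows; i++)
    {
      // Transformation on the inner loop of the nest.
#pragma omp unroll partial(2)
      for (int j = 0; j < cols; j++)
        dst[i * cols + j] = 2 * src[i * cols + j];
    }

  long sum = 0;
  for (int v : out)
    sum += v;
  return sum;
}
```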
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(c_parser_omp_tile): ... adjust use here,
(c_parser_omp_unroll): ... and here,
(c_parser_omp_for_loop): ... and here. Stop treating loop
transformations like intervening code, parse them, and adjust
the loop-nest depth if necessary for tiling.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_is_pragma): New function.
(cp_parser_omp_nested_loop_transform_clauses):
Add argument for the level of loop-nest at which the clauses
appear, ...
(cp_parser_omp_tile): ... adjust use here,
(cp_parser_omp_unroll): ... and here,
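(cp_parser_omp_for_loop): ... and here. Stop treating loop
transformations like intervening code, parse them, and adjust
the loop-nest depth if necessary for tiling.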
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/unroll-inner-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-inner-2.c: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/tile-1.C: Deleted, replaced by
matrix-* tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-1.h:
New header file for new tests.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-constant-iter.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-helper.h:
Likewise.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-no-directive-unroll-full-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-parallel-masked-taskloop-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-target-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-taskloop-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-omp-teams-distribute-parallel-for-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-simd-1.c:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/matrix-transform-variants-1.h:
New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-non-rect-1.c:
New test.
|
|
So far the implementation of the "omp tile" and "omp unroll"
directives restricted their use to the outermost loop of a loop-nest.
This commit changes the Fortran front end to parse and verify the
directives on inner loops. The transformation clauses are extended to
carry the information about the level of the loop-nest at which a
transformation should be applied. The middle end transformation pass
is adjusted to apply the transformations at the right level of a loop
nest and to take their effect on the loop nest depth into account.
gcc/fortran/ChangeLog:
* openmp.cc (omp_unroll_removes_loop_nest): Move down in file.
(resolve_loop_transform_generic): Remove, and ...
(resolve_omp_unroll): ... inline and adapt here. Move function.
(find_nested_loop_in_block): New function.
(find_nested_loop_in_chain): New function, used ...
(is_outer_iteration_variable): ... here, and ...
(expr_is_invariant): ... here.
(resolve_omp_do): Adjust code for resolving loop transformations.
(resolve_omp_tile): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Set OMP_CLAUSE_TRANSFORM_LEVEL
on new clause.
(compute_transformed_depth): New function to compute the depth
("collapse") of a transformed loop nest, used
(gfc_trans_omp_do): ... here.
gcc/ChangeLog:
* omp-transform-loops.cc (gimple_assign_rhs_to_tree): Fix typo
in comment.
(gomp_for_uncollapse): Adjust "collapse" value after uncollapse.
(partial_unroll): Add argument for the loop nest level to be transformed.
(tile): Likewise.
(transform_gomp_for): Pass level to transformation functions.
(optimize_transformation_clauses): Handle transformation clauses for all
levels recursively.
* tree-pretty-print.cc (dump_omp_clause): Print
OMP_CLAUSE_TRANSFORM_LEVEL for OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.cc: Increase number of operands of OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TRANSFORM_LEVEL): New macro to access
clause operand 0.
(OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Use operand 1 instead of 0.
(OMP_CLAUSE_TILE_SIZES): Likewise.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(cp_parser_omp_clause_unroll_partial): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
(cp_parser_omp_loop_transform_clause): Likewise.
(cp_parser_omp_nested_loop_transform_clauses): Likewise.
(cp_parser_omp_unroll): Likewise.
* pt.cc (tsubst_omp_clauses): Adjust OMP_CLAUSE_UNROLL_PARTIAL
and OMP_CLAUSE_TILE handling to changed number of operands.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_unroll_full): Set new
OMP_CLAUSE_TRANSFORM_LEVEL operand to default value.
(c_parser_omp_clause_unroll_partial): Likewise.
(c_parser_omp_tile_sizes): Likewise.
(c_parser_omp_loop_transform_clause): Likewise.
(c_parser_omp_nested_loop_transform_clauses): Likewise.
(c_parser_omp_unroll): Likewise.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/unroll-8.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: Adjust.
* gfortran.dg/gomp/loop-transforms/inner-loops.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-imperfect-nest.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-3a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-4a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-inner-loops-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-inner-loop.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-inner-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: Adapt to
changed diagnostic messages.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/inner-1.f90: New test.
|
|
This commit adds the C and C++ front end support for the "omp tile"
directive. The middle end support for the transformation is
implemented in a previous commit.
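
To illustrate what the directive asks for, the sketch below pairs a loop nest
carrying "omp tile" with a hand-tiled version that visits the same iterations
in tile order. All names are hypothetical, and the hand-written nest is only
an approximation of what a tiling transformation produces, not the output of
the middle-end pass. Built without OpenMP, the pragma is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose of an n x n matrix, with the tile directive on the loop nest.
void transpose_with_directive(const std::vector<int> &in,
                              std::vector<int> &out, int n)
{
  assert(n > 0 && in.size() == static_cast<std::size_t>(n) * n);
  assert(out.size() == in.size());
#pragma omp tile sizes(4, 4)
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      out[j * n + i] = in[i * n + j];
}

// The same work with 4 x 4 tiles written out by hand: two loops over tiles,
// two loops inside each tile, with bounds clamped for partial tiles.
void transpose_tiled_by_hand(const std::vector<int> &in,
                             std::vector<int> &out, int n)
{
  assert(n > 0 && in.size() == static_cast<std::size_t>(n) * n);
  assert(out.size() == in.size());
  for (int ii = 0; ii < n; ii += 4)
    for (int jj = 0; jj < n; jj += 4)
      for (int i = ii; i < ii + 4 && i < n; i++)
        for (int j = jj; j < jj + 4 && j < n; j++)
          out[j * n + i] = in[i * n + j];
}

// Runs both versions on the same input and counts differing elements.
int count_mismatches(int n)
{
  std::vector<int> in(static_cast<std::size_t>(n) * n);
  for (std::size_t k = 0; k < in.size(); k++)
    in[k] = static_cast<int>(k);

  std::vector<int> a(in.size()), b(in.size());
  transpose_with_directive(in, a, n);
  transpose_tiled_by_hand(in, b, n);

  int mismatches = 0;
  for (std::size_t k = 0; k < in.size(); k++)
    if (a[k] != b[k])
      mismatches++;
  return mismatches;
}
```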
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_directives): Add PRAGMA_OMP_TILE.
* c-pragma.cc (omp_pragmas_simd): Likewise.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_TILE.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_TILE.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_nested_omp_unroll_clauses): Rename and
generalize ...
(c_parser_omp_nested_loop_transform_clauses): ... to this.
(c_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(c_parser_omp_tile_sizes): Parse single "sizes" clause.
(c_parser_omp_loop_transform_clause): New function.
(c_parser_omp_tile): New function for parsing "omp tile".
(c_parser_omp_unroll): Adjust to renaming.
(c_parser_omp_construct): Handle PRAGMA_OMP_TILE.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_unroll_partial): Adjust.
(cp_parser_nested_omp_unroll_clauses): Rename ...
(cp_parser_omp_nested_loop_transform_clauses): ... to this.
(cp_parser_omp_for_loop): Handle "omp tile" parsing in loop nests.
(cp_parser_omp_tile_sizes): New function, parses single "sizes" clause.
(cp_parser_omp_tile): New function for parsing "omp tile".
(cp_parser_omp_loop_transform_clause): New function.
(cp_parser_omp_unroll): Adjust to renaming.
(cp_parser_omp_construct): Handle PRAGMA_OMP_TILE.
(cp_parser_pragma): Likewise.
* pt.cc (tsubst_omp_clauses): Handle OMP_CLAUSE_TILE.
* semantics.cc (finish_omp_clauses): Likewise.
gcc/ChangeLog:
* gimplify.cc (omp_for_drop_tile_clauses): New function, ...
(gimplify_omp_for): ... used here.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/tile-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-2.C: New test.
* testsuite/libgomp.c++/loop-transforms/tile-3.C: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/tile-1.c: New test.
* c-c++-common/gomp/loop-transforms/tile-2.c: New test.
* c-c++-common/gomp/loop-transforms/tile-3.c: New test.
* c-c++-common/gomp/loop-transforms/tile-4.c: New test.
* c-c++-common/gomp/loop-transforms/tile-5.c: New test.
* c-c++-common/gomp/loop-transforms/tile-6.c: New test.
* c-c++-common/gomp/loop-transforms/tile-7.c: New test.
* c-c++-common/gomp/loop-transforms/tile-8.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: Adapt
to changed diagnostic messages.
* g++.dg/gomp/loop-transforms/tile-1.h: New test.
* g++.dg/gomp/loop-transforms/tile-1a.C: New test.
* g++.dg/gomp/loop-transforms/tile-1b.C: New test.
|
|
This commit implements the Fortran front end support for the "omp
tile" directive and the corresponding middle end transformation.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_statement): Add ST_OMP_TILE, ST_OMP_END_TILE.
(enum gfc_exec_op): Add EXEC_OMP_TILE.
(loop_transform_p): New declaration.
(struct gfc_omp_clauses): Add "tile_sizes" field.
* dump-parse-tree.cc (show_omp_clauses): Handle "tile_sizes" dumping.
(show_omp_node): Handle EXEC_OMP_TILE.
(show_code_node): Likewise.
* match.h (gfc_match_omp_tile): New declaration.
* openmp.cc (gfc_free_omp_clauses): Free "tile_sizes" field.
(match_tile_sizes): New function.
(OMP_TILE_CLAUSES): New macro.
(gfc_match_omp_tile): New function.
(resolve_omp_do): Handle EXEC_OMP_TILE.
(resolve_omp_tile): New function.
(omp_code_to_statement): Handle EXEC_OMP_TILE.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle ST_OMP_END_TILE
and ST_OMP_TILE.
(next_statement): Handle ST_OMP_TILE.
(gfc_ascii_statement): Likewise.
(parse_omp_do): Likewise.
(parse_executable): Likewise.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_TILE.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle "tile_sizes" field.
(loop_transform_p): New function.
(gfc_expr_list_len): New function.
(gfc_trans_omp_do): Handle EXEC_OMP_TILE.
(gfc_trans_omp_directive): Likewise.
* trans.cc (trans_code): Likewise.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Handle OMP_CLAUSE_TILE.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_loop): Likewise.
* omp-transform-loops.cc (walk_omp_for_loops): New declaration.
(subst_var_in_op): New function.
(subst_var): New function.
(gomp_for_number_of_iterations): Adjust.
(gomp_for_iter_count_type): New function.
(gimple_assign_rhs_to_tree): New function.
(subst_defs): New function.
(gomp_for_uncollapse): Adjust.
(transformation_clause_p): Add OMP_CLAUSE_TILE.
(tile): New function.
(transform_gomp_for): Handle OMP_CLAUSE_TILE.
(optimize_transformation_clauses): Handle OMP_CLAUSE_TILE.
* omp-general.cc (omp_loop_transform_clause_p): Add
OMP_CLAUSE_TILE.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TILE.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_TILE.
* tree.cc: Add OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_SIZES): New macro.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/tile-unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-tile-2.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-1a.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/tile-unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-tile-2.f90: New test.
|
|
OMP_CLAUSE_TILE will be used for the OpenMP 5.1 loop transformation
construct "omp tile".
gcc/ChangeLog:
* tree-core.h (enum omp_clause_code): Rename OMP_CLAUSE_TILE.
* tree.h (OMP_CLAUSE_TILE_LIST): Rename to ...
(OMP_CLAUSE_OACC_TILE_LIST): ... this.
(OMP_CLAUSE_TILE_ITERVAR): Rename to ...
(OMP_CLAUSE_OACC_TILE_ITERVAR): ... this.
(OMP_CLAUSE_TILE_COUNT): Rename to ...
(OMP_CLAUSE_OACC_TILE_COUNT): ... this.
* gimplify.cc (gimplify_scan_omp_clauses): Adjust to renamings.
(gimplify_adjust_omp_clauses): Likewise.
(gimplify_omp_for): Likewise.
* omp-general.cc (omp_extract_for_data): Likewise.
* omp-low.cc (scan_sharing_clauses): Likewise.
(lower_oacc_head_mark): Likewise.
* tree-nested.cc (convert_nonlocal_omp_clauses): Likewise.
(convert_local_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Likewise.
* tree.cc: Likewise.
gcc/c-family/ChangeLog:
* c-omp.cc (c_oacc_split_loop_clauses): Adjust to renamings.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_collapse): Adjust to renamings.
(c_parser_oacc_clause_tile): Likewise.
(c_parser_omp_for_loop): Likewise.
* c-typeck.cc (c_finish_omp_clauses): Likewise.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_oacc_clause_tile): Adjust to renamings.
(cp_parser_omp_clause_collapse): Likewise.
(cp_parser_omp_for_loop): Likewise.
* pt.cc (tsubst_omp_clauses): Likewise.
* semantics.cc (finish_omp_clauses): Likewise.
(finish_omp_for): Likewise.
gcc/fortran/ChangeLog:
* openmp.cc (enum omp_mask2): Adjust to renamings.
(gfc_match_omp_clauses): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Likewise.
|
|
This commit implements the C and the C++ front end changes to support
the "omp unroll" directive. The execution of the loop transformation
relies on the pass that has been added as a part of the earlier
Fortran patch.
gcc/c-family/ChangeLog:
* c-gimplify.cc (c_genericize_control_stmt): Handle OMP_UNROLL.
* c-omp.cc: Add "unroll" to omp_directives[].
* c-pragma.cc: Add "unroll" to omp_pragmas_simd[].
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_UNROLL to
pragma_kind and adjust PRAGMA_OMP__LAST_.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_FULL and
PRAGMA_OMP_CLAUSE_PARTIAL.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_clause_name): Handle "full" and
"partial" clauses.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(c_parser_omp_clause_unroll_full): New function for parsing
the "full" clause.
(c_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_FULL
and PRAGMA_OMP_CLAUSE_PARTIAL.
(c_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(c_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(c_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
* c-typeck.cc (c_finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.
gcc/cp/ChangeLog:
* cp-gimplify.cc (cp_gimplify_expr): Handle OMP_UNROLL.
(cp_fold_r): Likewise.
(cp_genericize_r): Likewise.
* parser.cc (cp_parser_omp_clause_name): Handle "full" clause.
(check_no_duplicate_clause): Change return type to bool and
return check result.
(cp_parser_omp_clause_unroll_full): New function for parsing
the "full" clause.
(cp_parser_omp_clause_unroll_partial): New function for
parsing the "partial" clause.
(cp_parser_omp_all_clauses): Handle OMP_CLAUSE_UNROLL and
OMP_CLAUSE_FULL.
(cp_parser_nested_omp_unroll_clauses): New function for parsing
"omp unroll" directives following another directive.
(cp_parser_omp_for_loop): Handle "omp unroll" directives
between directive and loop.
(OMP_UNROLL_CLAUSE_MASK): New definition.
(cp_parser_omp_unroll): New function for parsing "omp unroll"
loops that are not associated with another directive.
(cp_parser_omp_construct): Handle PRAGMA_OMP_UNROLL.
(cp_parser_pragma): Handle PRAGMA_OMP_UNROLL.
* pt.cc (tsubst_omp_clauses): Handle
OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_NONE.
(tsubst_expr): Handle OMP_UNROLL.
* semantics.cc (finish_omp_clauses): Handle
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_PARTIAL,
and OMP_CLAUSE_UNROLL_NONE.
libgomp/ChangeLog:
* testsuite/libgomp.c++/loop-transforms/unroll-1.C: New test.
* testsuite/libgomp.c++/loop-transforms/unroll-2.C: New test.
* testsuite/libgomp.c-c++-common/loop-transforms/unroll-1.c: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/loop-transforms/unroll-1.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-2.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-3.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-4.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-5.c: New test.
* c-c++-common/gomp/loop-transforms/unroll-6.c: New test.
* g++.dg/gomp/loop-transforms/unroll-1.C: New test.
* g++.dg/gomp/loop-transforms/unroll-2.C: New test.
* g++.dg/gomp/loop-transforms/unroll-3.C: New test.
|
|
This commit implements the OpenMP 5.1 "omp unroll" directive for
Fortran. The Fortran front end changes encompass the parsing and the
verification of nesting restrictions etc. The actual loop
transformation is implemented in a new language-independent
"omp_transform_loops" pass which runs before omp lowering. No attempt
is made to re-use existing unrolling optimizations because a separate
implementation allows for better control of the unrolling. The new
pass will also serve as a foundation for the implementation of further
OpenMP loop transformations. This commit only implements the support
for "omp unroll" on the outermost loop of a loop nest. The support
for inner loops will be added later.
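To make the effect of a partial unroll concrete, here is a hand-unrolled C sketch of a simple loop. It only illustrates the idea behind "omp unroll partial(4)"; the function name is hypothetical and the new pass is free to generate different code.

```c
#include <stddef.h>

/* Hand-written counterpart of
     #pragma omp unroll partial(4)
     for (size_t i = 0; i < n; i++)
       dst[i] = 2 * src[i];
   The main loop performs four iterations per trip; a remainder loop
   handles the final n % 4 iterations.  */
void scale_unrolled4 (int *dst, const int *src, size_t n)
{
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    {
      dst[i] = 2 * src[i];
      dst[i + 1] = 2 * src[i + 1];
      dst[i + 2] = 2 * src[i + 2];
      dst[i + 3] = 2 * src[i + 3];
    }
  /* Remainder: at most three iterations are left.  */
  for (; i < n; i++)
    dst[i] = 2 * src[i];
}
```

A "full" unroll would instead replace the loop by one copy of the body per iteration, which requires a trip count known at compile time.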
gcc/ChangeLog:
* Makefile.in: Add omp_transform_loops.o.
* gimple-pretty-print.cc (dump_gimple_omp_for): Handle "full"
and "partial" clauses.
* gimple.h (enum gf_mask): Add GF_OMP_FOR_KIND_TRANSFORM_LOOP.
* gimplify.cc (is_gimple_stmt): Handle OMP_UNROLL.
(gimplify_scan_omp_clauses): Handle OMP_UNROLL_FULL,
OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
(gimplify_adjust_omp_clauses): Handle OMP_UNROLL_FULL,
OMP_UNROLL_NONE, and OMP_UNROLL_PARTIAL.
(gimplify_omp_for): Handle OMP_UNROLL.
(gimplify_expr): Likewise.
* params.opt: Add omp-unroll-full-max-iteration and
omp-unroll-default-factor.
* passes.def: Add pass_omp_transform_loops before
pass_lower_omp.
* tree-core.h (enum omp_clause_code): Add
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_PARTIAL.
* tree-pass.h (make_pass_omp_transform_loops): Declare.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_FULL, and
OMP_CLAUSE_UNROLL_PARTIAL.
(dump_generic_node): Handle OMP_UNROLL.
* tree.cc (omp_clause_num_ops): Add number of operators
for OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
OMP_CLAUSE_UNROLL_PARTIAL.
(omp_clause_code_names): Add name strings for
OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE, and
OMP_CLAUSE_UNROLL_PARTIAL.
* tree.def (OMP_UNROLL): Define.
* tree.h (OMP_CLAUSE_UNROLL_PARTIAL_EXPR): Define.
* omp-transform-loops.cc: New file.
* omp-general.cc (omp_loop_transform_clause_p): New function.
* omp-general.h (omp_loop_transform_clause_p): New declaration.
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_clauses): Handle "unroll full"
and "unroll partial".
(show_omp_node): Handle OMP_UNROLL.
(show_code_node): Handle EXEC_OMP_UNROLL.
* gfortran.h (enum gfc_statement): Add ST_OMP_UNROLL, ST_OMP_END_UNROLL.
(enum gfc_exec_op): Add EXEC_OMP_UNROLL.
* match.h (gfc_match_omp_unroll): Declare.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_NONE, OMP_CLAUSE_UNROLL_PARTIAL.
(gfc_match_omp_clauses): Handle "omp unroll partial".
(OMP_UNROLL_CLAUSES): New macro definition.
(gfc_match_omp_unroll): Match "full" clause.
(omp_unroll_removes_loop_nest): New function.
(resolve_omp_unroll): New function.
(resolve_omp_do): Accept and verify "omp unroll"
directives between directive and loop.
(omp_code_to_statement): Handle EXEC_OMP_UNROLL.
(gfc_resolve_omp_directive): Likewise.
* parse.cc (decode_omp_directive): Handle "unroll" and "end unroll".
(next_statement): Handle ST_OMP_UNROLL.
(gfc_ascii_statement): Handle ST_OMP_UNROLL and ST_OMP_END_UNROLL.
(parse_omp_do): Accept ST_OMP_UNROLL and ST_OMP_END_UNROLL
before/after loop.
(parse_executable): Handle ST_OMP_UNROLL.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_UNROLL.
(gfc_resolve_code): Likewise.
* st.cc (gfc_free_statement): Likewise.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle unroll clauses.
(gfc_trans_omp_do): Handle OMP_CLAUSE_UNROLL_FULL,
OMP_CLAUSE_UNROLL_PARTIAL, OMP_CLAUSE_UNROLL_NONE creation.
(gfc_trans_omp_directive): Handle EXEC_OMP_UNROLL.
* trans.cc (trans_code): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-2.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-3.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-4.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-5.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-6.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7a.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7b.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-7c.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-8.f90: New test.
* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/loop-transforms/unroll-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-4.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-5.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-6.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-7.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-9.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-1.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-2.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-no-clause-3.f90: New test.
* gfortran.dg/gomp/loop-transforms/unroll-simd-1.f90: New test.
|
|
Even with 'alloc' and map-entering 'from' mapping, the following should hold.
For explicit mapping, that is already the case; this commit handles the
automatic deep mapping of allocatable components. Namely:
* On the device, the array bounds (of allocated allocatables) must match the
host, implying 'to' (or 'tofrom') mapping.
* On map exiting, the copying out shall not destroy the unallocated allocation
status (nor the pointer address of allocated allocatables).
The latter was not a problem for allocated allocatables, as for those a pointer
was GOMP_MAP_ATTACHed; for unallocated allocatables, however, the previous code
copied back device-allocated memory, which might not be nullified.
While 'alloc' was not deep-mapped at all, for map-entering 'from', the array
bounds were not set, making allocated derived-type components inaccessible on
the device (and wrong on the host on copy back).
The solution is, first, to deep-map 'alloc' as well and to copy to the device
even with 'alloc' and (map-entering) 'from'. This copying is only done if there
is a scalar (for the unallocated case) or array allocatable directly in the
derived type and then it is shallowly copied; the data pointed to is then again
only alloc'ed, unless it contains in turn allocatables.
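The intended behaviour can be pictured with a toy model in plain C. This is only a sketch under strong assumptions: 'struct desc', 'map_enter' and 'map_exit_from' are hypothetical stand-ins for a Fortran array descriptor and for the mapping steps, not libgomp or gfortran interfaces.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for the descriptor of an allocatable component.  */
struct desc
{
  double *data;         /* NULL models "unallocated".  */
  long lbound, ubound;  /* Array bounds.  */
};

static size_t desc_len (const struct desc *d)
{
  long n = d->ubound - d->lbound + 1;
  return (d->data && n > 0) ? (size_t) n : 0;
}

/* Map entering with 'alloc' or 'from': the descriptor is copied
   shallowly, so bounds and allocation status match the host, while
   the payload is only allocated, not copied.  */
struct desc map_enter (const struct desc *host)
{
  struct desc dev = *host;
  if (host->data)
    dev.data = malloc (desc_len (host) * sizeof (double));
  return dev;
}

/* Map exiting with 'from': copy the payload back, but leave the host
   descriptor (pointer and allocation status) untouched.  */
void map_exit_from (struct desc *host, struct desc *dev)
{
  if (host->data && dev->data)
    memcpy (host->data, dev->data, desc_len (host) * sizeof (double));
  free (dev->data);
  dev->data = NULL;
}
```

In this model an unallocated host component stays unallocated after the round trip, and an allocated one keeps its original host address.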
gcc/fortran/
* trans-openmp.cc (gfc_has_alloc_comps): Add 'bool
shallow_alloc_only=false' arg.
(gfc_omp_replace_alloc_by_to_mapping): New, call it.
(gfc_omp_deep_map_kind_p): Return 'true' also for '(present,)alloc'.
(gfc_omp_deep_mapping_item, gfc_omp_deep_mapping_do): On map entering,
replace shallowly 'alloc'/'from' by '(from)to' mapping if there are
allocatable components.
libgomp/
* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
|
|
Proper variables/components of type BT_CLASS have 'class_ok' set; check
for that to avoid an ICE on invalid code for gfortran.dg/pr108434.f90.
gcc/fortran/
* class.cc (generate_callback_wrapper): Add attr.class_ok check.
* resolve.cc (resolve_fl_derived): Likewise.
|
|
allocatables/pointers
target exit data: Do unmap GOMP_MAP_POINTER for scalar allocatables/pointers
to prevent stale mappings.
While for allocatable/pointer arrays, there is a PSET followed by POINTER,
for allocatable/pointer scalars there is only a POINTER. Before the below
mentioned OG12 patch: For exit data, PSET was converted to RELEASE/DELETE
in gimplify.cc while all POINTER were removed; correct for arrays but leaving
POINTER behind for scalars. Since that commit, all of this is done in
trans-openmp.cc, but the scalar case was still mishandled until this follow-up
commit.
This is a follow-up to OG12's 55a18d4744258e3909568e425f9f473c49f9d13f.
While the problem is independent, it will be merged into v4 of the
mainline patch
'Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings'
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Fix unmapping of
GOMP_MAP_POINTER for scalar allocatables/pointers.
gcc/testsuite/
* gfortran.dg/gomp/map-10.f90: New test.
|
|
gcc/ChangeLog:
* config/gcn/gcn-protos.h (gcn_expand_dpp_swap_pairs_insn)
(gcn_expand_dpp_distribute_even_insn)
(gcn_expand_dpp_distribute_odd_insn): Declare.
* config/gcn/gcn-valu.md (@dpp_swap_pairs<mode>)
(@dpp_distribute_even<mode>, @dpp_distribute_odd<mode>)
(cmul<conj_op><mode>3, cml<addsub_as><mode>4, vec_addsub<mode>3)
(cadd<rot><mode>3, vec_fmaddsub<mode>4, vec_fmsubadd<mode>4)
(fms<mode>4<exec>, fms<mode>4_negop2<exec>, fms<mode>4)
(fms<mode>4_negop2): New patterns.
* config/gcn/gcn.cc (gcn_expand_dpp_swap_pairs_insn)
(gcn_expand_dpp_distribute_even_insn)
(gcn_expand_dpp_distribute_odd_insn): New functions.
* config/gcn/gcn.md: Add entries to unspec enum.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/complex.c: New test.
|
|
Fix an issue in which "vectors" of duplicate entries placed in scalar
registers caused the following 63 registers to be marked live, for the
purpose of prologue generation, which resulted in stack corruption.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_class_max_nregs): Handle vectors in SGPRs.
(move_callee_saved_registers): Detect the bug condition early.
|
|
While writing a testcase for PR106794, I noticed that we failed
to vectorise the testcase in the patch for SVE. The code that
recognises gather loads tries to optimise the point at which
the offset is calculated, to avoid unnecessary extensions or
truncations:
/* Don't include the conversion if the target is happy with
the current offset type. */
But breaking only makes sense if we're at an SSA_NAME (which could
then be vectorised). We shouldn't break on a conversion embedded
in a generic expression.
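For orientation, a loop of the general shape that gather-load recognition deals with might look like the C sketch below. It is an illustrative example only, not the contents of the new test, and whether it is vectorised depends on the target and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Each iteration loads from an address that depends on idx[i], which
   makes the load a gather candidate.  The 32-bit index is converted
   to 64 bits as part of the address computation.  */
void gather_add (int64_t *restrict dst, const int64_t *restrict src,
                 const int32_t *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] += src[(int64_t) idx[i]];
}
```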
gcc/
* tree-vect-data-refs.cc (vect_check_gather_scatter): Restrict
early-out optimisation to SSA_NAMEs.
gcc/testsuite/
* gcc.dg/vect/vect-gather-5.c: New test.
(cherry picked from commit 4a773bf2f08656a39ac75cf6b4871c8cec8b5007)
|
|
The GPU architecture requires SImode offsets on gather/scatter instructions,
but they can also take a vector of absolute addresses, so this allows
gather/scatter in more situations.
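The difference between the two addressing forms can be sketched in C as follows. The function names are made up for this example, and whether either loop is actually vectorised depends on the compiler version and flags.

```c
#include <stddef.h>

/* Offset form: every lane loads from a common base plus idx[i].  */
void gather_by_offset (int *dst, const int *base, const int *idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = base[idx[i]];
}

/* Address form: every lane loads through its own pointer, so there is
   no common base and no narrow offset to compute.  */
void gather_by_address (int *dst, int *const *ptrs, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = *ptrs[i];
}
```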
gcc/ChangeLog:
* config/gcn/gcn-valu.md (gather_load<mode><vndi>): New.
(scatter_store<mode><vndi>): New.
(mask_gather_load<mode><vndi>): New.
(mask_scatter_store<mode><vndi>): New.
|
|
Just using a move insn for no-op conversions triggers special move handling in
IRA, which declares that subregs of vectors aren't valid and routes everything
through memory. These patterns make the vec_select explicit and all is well.
gcc/ChangeLog:
* config/gcn/gcn-protos.h (gcn_stepped_zero_int_parallel_p): New.
* config/gcn/gcn-valu.md (V_1REG_ALT): New.
(V_2REG_ALT): New.
(vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): New.
(vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): New.
(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Use new patterns.
* config/gcn/gcn.cc (gcn_stepped_zero_int_parallel_p): New.
* config/gcn/predicates.md (ascending_zero_int_parallel): New.
|
|
gcc/ChangeLog:
* config/gcn/gcn-valu.md (<expander><mode>3_exec): Add patterns for
{s|u}{max|min} in QI, HI and DI modes.
(<expander><mode>3): Add pattern for {s|u}{max|min} in DI mode.
(cond_<fexpander><mode>): Add pattern for cond_f{max|min}.
(cond_<expander><mode>): Add pattern for cond_{s|u}{max|min}.
* config/gcn/gcn.cc (gcn_spill_class): Allow the exec register to be
saved in SGPRs.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/cond_fmaxnm_1.c: New test.
* gcc.target/gcn/cond_fmaxnm_1_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_2.c: New test.
* gcc.target/gcn/cond_fmaxnm_2_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_3.c: New test.
* gcc.target/gcn/cond_fmaxnm_3_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_4.c: New test.
* gcc.target/gcn/cond_fmaxnm_4_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_5.c: New test.
* gcc.target/gcn/cond_fmaxnm_5_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_6.c: New test.
* gcc.target/gcn/cond_fmaxnm_6_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_7.c: New test.
* gcc.target/gcn/cond_fmaxnm_7_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_8.c: New test.
* gcc.target/gcn/cond_fmaxnm_8_run.c: New test.
* gcc.target/gcn/cond_fminnm_1.c: New test.
* gcc.target/gcn/cond_fminnm_1_run.c: New test.
* gcc.target/gcn/cond_fminnm_2.c: New test.
* gcc.target/gcn/cond_fminnm_2_run.c: New test.
* gcc.target/gcn/cond_fminnm_3.c: New test.
* gcc.target/gcn/cond_fminnm_3_run.c: New test.
* gcc.target/gcn/cond_fminnm_4.c: New test.
* gcc.target/gcn/cond_fminnm_4_run.c: New test.
* gcc.target/gcn/cond_fminnm_5.c: New test.
* gcc.target/gcn/cond_fminnm_5_run.c: New test.
* gcc.target/gcn/cond_fminnm_6.c: New test.
* gcc.target/gcn/cond_fminnm_6_run.c: New test.
* gcc.target/gcn/cond_fminnm_7.c: New test.
* gcc.target/gcn/cond_fminnm_7_run.c: New test.
* gcc.target/gcn/cond_fminnm_8.c: New test.
* gcc.target/gcn/cond_fminnm_8_run.c: New test.
* gcc.target/gcn/cond_smax_1.c: New test.
* gcc.target/gcn/cond_smax_1_run.c: New test.
* gcc.target/gcn/cond_smin_1.c: New test.
* gcc.target/gcn/cond_smin_1_run.c: New test.
* gcc.target/gcn/cond_umax_1.c: New test.
* gcc.target/gcn/cond_umax_1_run.c: New test.
* gcc.target/gcn/cond_umin_1.c: New test.
* gcc.target/gcn/cond_umin_1_run.c: New test.
* gcc.target/gcn/smax_1.c: New test.
* gcc.target/gcn/smax_1_run.c: New test.
* gcc.target/gcn/smin_1.c: New test.
* gcc.target/gcn/smin_1_run.c: New test.
* gcc.target/gcn/umax_1.c: New test.
* gcc.target/gcn/umax_1_run.c: New test.
* gcc.target/gcn/umin_1.c: New test.
* gcc.target/gcn/umin_1_run.c: New test.
(cherry picked from commit 553ff2524f412be4e02e2ffb1a0a3dc3e2280742)
|
|
Merge up to r12-9210-gb3f9d2cf7dd5488800f867a6aae076465ecb391b (2nd Mar 2023)
|
|
For is_device_ptr, optional checks should only be done before calling
libgomp; afterwards the pointers are NULL either because the argument is absent
or, by chance, because it is unallocated or unassociated (for
pointers/allocatables).
Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.
PR middle-end/108546
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.
(cherry picked from commit 96ff97ff6574666a5509ae9fa596e7f2b6ad4f88)
|
|
Merge up to r12-9207-gb8e496d132ec087c9db5951fea23551dcc831d8c (27th Feb 2023)
|
|
and deferred-length strings"
Follow-up to commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
updating the dumps.
* For the goacc testcase, 'to' changed to 'release' and due to 'finally' then
to 'delete', which can be regarded as bugfix.
* For pr78260-2.f90, the calculation moved inside the 'if(...->data == NULL)'
block to handle deferred-string length vars better, esp. when 'optional'.
gcc/testsuite/:
* gfortran.dg/goacc/finalize-1.f: Update scan-tree-dump-times for
mapping changes.
* gfortran.dg/gomp/pr78260-2.f90: Likewise.
|
|
As mentioned in the PR, when we use LTO, we wrongly use the ltrans output
file name as the module name of a global variable. That leads to
non-reproducible output.
After the suggested change, we emit the context name for normal global
variables. For artificial variables (like .Lubsan_data3), we use
aux_base_name (e.g. "./a.ltrans0.ltrans").
PR sanitizer/108834
gcc/ChangeLog:
* asan.cc (asan_add_global): Use proper TU name for normal
global variables (and aux_base_name for the artificial one).
gcc/testsuite/ChangeLog:
* c-c++-common/asan/global-overflow-1.c: Test line and column
info for a global variable.
(cherry picked from commit 94c9b1bb79f63d000ebb05efc155c149325e332d)
|
|
As Richard pointed out in [1] and the testing on Power10, the
proposed fix for PR96373 requires some updates on a few rs6000
test cases which adopt partial vector. This patch is to fix
all of them with one extra option "-fno-trapping-math" as
Richard suggested.
Besides, the original test case also failed on Power10 without
Richard's proposed fix, so this patch adds it as well for slightly
better testing coverage.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610728.html
PR target/96373
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p9-vec-length-epil-1.c: Add -fno-trapping-math.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/pr96373.c: New test.
(cherry picked from commit 4f5a1198065dc078f8099db628da7b06a2666f34)
|
|
gcc/ChangeLog:
* config/riscv/t-rtems: Keep only -mcmodel=medany 64-bit multilibs.
Add non-compact 32-bit multilibs.
(cherry picked from commit 35a067020e41d97bc3be15b518b3dc2a64b4aae2)
|
|
The following makes sure to predicate only the calls that need it.
PR tree-optimization/108888
* tree-if-conv.cc (if_convertible_stmt_p): Set PLF_2 on
calls to predicate.
(predicate_statements): Only predicate calls with PLF_2.
* g++.dg/torture/pr108888.C: New testcase.
(cherry picked from commit 31cc5821223a096ef61743bff520f4a0dbba5872)
|
|
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch adds support for a sub-set of possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be no regressions.
The sub-set of support should cover all cases needed by amdgcn, at present.
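A minimal C example of the kind of code this targets is sketched below: a function declared with "declare simd inbranch" and called conditionally from a simd loop. The function names are invented for illustration, and whether the masked clone is actually used depends on the target and compiler options; without OpenMP enabled the pragmas are ignored and the code runs as plain scalar C.

```c
#include <stddef.h>

/* 'inbranch' requests a SIMD clone that takes a mask argument, so the
   function can be called from conditional code in a vectorised loop.  */
#pragma omp declare simd inbranch
int clamp_add (int x, int y)
{
  int s = x + y;
  return s > 100 ? 100 : s;
}

void update (int *a, const int *b, size_t n)
{
  #pragma omp simd
  for (size_t i = 0; i < n; i++)
    if (b[i] > 0)   /* The call is only reached in some lanes.  */
      a[i] = clamp_add (a[i], b[i]);
}
```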
gcc/ChangeLog:
* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD functions.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.
(cherry picked from commit 3da77f217c8b2089ecba3eb201e727c3fcdcd19d)
|
|
|