Age | Commit message | Author | Files | Lines |
|
implementation
... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it.
Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609
"libgomp/nvptx: Prepare for reverse-offload callback handling",
and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8
"libgomp: Handle OpenMP's reverse offloads".
libgomp/
* target.c (gomp_target_rev): Instead of 'dev_to_host_cpy',
'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'.
* libgomp.h (gomp_target_rev): Adjust.
* libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust.
* libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust.
* plugin/plugin-gcn.c (process_reverse_offload): Adjust.
* plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy)
(rev_off_host_to_dev_cpy): Remove.
(GOMP_OFFLOAD_run): Adjust.
|
|
An upcoming change requires that 'gomp_remove_var' be deferred until after all
'gomp_copy_dev2host' calls have been handled.
Do this in the same way that commit 275c736e732d29934e4d22e8f030d5aae8c12a52
"libgomp: Structure element mapping for OpenMP 5.0" changed 'gomp_exit_data'.
libgomp/
* target.c (gomp_unmap_vars_internal): Queue splay-tree keys for
removal after main loop.
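The "queue during the main loop, remove afterwards" pattern described above
can be sketched as follows. This is only an illustration: 'struct mapping',
'unmap_all' and 'MAX_MAPS' are hypothetical names, not libgomp's splay-tree
keys or its actual 'gomp_unmap_vars_internal' code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MAPS 16   /* Fixed bound, only to keep the sketch simple.  */

/* Hypothetical stand-in for a mapped variable.  */
struct mapping
{
  int refcount;
  bool copy_back;     /* Wants a device-to-host copy when unmapped.  */
  bool copied_back;   /* Set once that copy has been "performed".  */
  bool removed;       /* Set once the mapping has been dropped.  */
};

/* Unmap N mappings.  The main loop handles all copies and only queues
   the entries whose reference count dropped to zero; the removal runs
   in a second loop, after every copy has been handled.  Returns the
   number of mappings removed.  */
size_t
unmap_all (struct mapping *maps, size_t n)
{
  struct mapping *remove_queue[MAX_MAPS];
  size_t nqueued = 0;

  assert (n <= MAX_MAPS);

  for (size_t i = 0; i < n; i++)
    {
      struct mapping *m = &maps[i];
      if (m->refcount > 0)
        m->refcount--;
      if (m->refcount == 0)
        {
          if (m->copy_back)
            m->copied_back = true;
          remove_queue[nqueued++] = m;   /* Defer the removal.  */
        }
    }

  for (size_t i = 0; i < nqueued; i++)
    remove_queue[i]->removed = true;

  return nqueued;
}
```

With this split, no entry is removed while a copy for another entry is still
pending in the main loop.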
|
|
structures [PR76739]
Fix-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".
PR other/76739
libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Given OpenACC 'async',
defer 'free' of non-contiguous array support data structures.
* target.c (gomp_map_vars_internal): Likewise.
|
|
Even with 'alloc' and map-entering 'from' mapping, the following should hold.
For explicit mapping, that's already the case; this handles the automatic
deep mapping of allocatable components. Namely:
* On the device, the array bounds (of allocated allocatables) must match the
host, implying 'to' (or 'tofrom') mapping.
* On map exiting, the copying out shall not destroy the unallocated allocation
status (nor the pointer address of allocated allocatables).
The latter was not a problem for allocated allocatables, as for those a pointer
was GOMP_MAP_ATTACHed; for unallocated allocatables, however, the previous code
copied back device-allocated memory, which might not be nullified.
While 'alloc' was not deep-mapped at all, for map-entering 'from', the array
bounds were not set, making allocated derived-type components inaccessible on
the device (and wrong on the host on copy back).
The solution is, first, to deep-map 'alloc' as well and to copy to the device
even with 'alloc' and (map-entering) 'from'. This copying is only done if there
is a scalar (for the unallocated case) or array allocatable directly in the
derived type and then it is shallowly copied; the data pointed to is then again
only alloc'ed, unless it contains in turn allocatables.
gcc/fortran/
* trans-openmp.cc (gfc_has_alloc_comps): Add 'bool
shallow_alloc_only=false' arg.
(gfc_omp_replace_alloc_by_to_mapping): New, call it.
(gfc_omp_deep_map_kind_p): Return 'true' also for '(present,)alloc'.
(gfc_omp_deep_mapping_item, gfc_omp_deep_mapping_do): On map entering,
replace shallowly 'alloc'/'from' by '(from)to' mapping if there are
allocatable components.
libgomp/
* testsuite/libgomp.fortran/map-alloc-comp-8.f90: New test.
|
|
Proper variables/components of type BT_CLASS have 'class_ok' set; check
for that to avoid an ICE on invalid code for gfortran.dg/pr108434.f90.
gcc/fortran/
* class.cc (generate_callback_wrapper): Add attr.class_ok check.
* resolve.cc (resolve_fl_derived): Likewise.
|
|
allocatables/pointers
target exit data: Do unmap GOMP_MAP_POINTER for scalar allocatables/pointers
to prevent stale mappings.
While for allocatable/pointer arrays, there is a PSET followed by POINTER,
for allocatable/pointer scalars there is only a POINTER. Before the below
mentioned OG12 patch: For exit data, PSET was converted to RELEASE/DELETE
in gimplify.cc while all POINTER were removed; correct for arrays but leaving
POINTER behind for scalars. Since that commit, all of this is handled in
trans-openmp.cc, but the scalar case was still mishandled before this
follow-up commit.
This is a follow-up to OG12's 55a18d4744258e3909568e425f9f473c49f9d13f.
While the problem is independent, it will be merged into v4 of the
mainline patch
'Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings'.
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Fix unmapping of
GOMP_MAP_POINTER for scalar allocatables/pointers.
gcc/testsuite/
* gfortran.dg/gomp/map-10a.f90: New test.
|
|
Fix an issue in which "vectors" of duplicate entries placed in scalar
registers caused the following 63 registers to be marked live, for the
purpose of prologue generation, which resulted in stack corruption.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_class_max_nregs): Handle vectors in SGPRs.
(move_callee_saved_registers): Detect the bug condition early.
|
|
The GPU architecture requires SImode offsets on gather/scatter instructions,
but they can also take a vector of absolute addresses, so this allows
gather/scatter in more situations.
gcc/ChangeLog:
* config/gcn/gcn-valu.md (gather_load<mode><vndi>): New.
(scatter_store<mode><vndi>): New.
(mask_gather_load<mode><vndi>): New.
(mask_scatter_store<mode><vndi>): New.
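The two addressing forms mentioned above can be modelled in scalar C. This is
only a sketch of the semantics, assuming four lanes and 32-bit elements;
'gather_base_offsets' and 'gather_addresses' are hypothetical helper names
and say nothing about how the GCN patterns are actually implemented.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4   /* Assumed vector width for this sketch.  */

/* Form 1: one common base plus a 32-bit element offset per lane.  */
void
gather_base_offsets (int32_t *dst, const int32_t *base,
                     const int32_t offsets[LANES])
{
  assert (dst && base);
  for (int lane = 0; lane < LANES; lane++)
    dst[lane] = base[offsets[lane]];
}

/* Form 2: one absolute address per lane; the lanes need not share a
   base object at all.  */
void
gather_addresses (int32_t *dst, const int32_t *const addrs[LANES])
{
  assert (dst && addrs);
  for (int lane = 0; lane < LANES; lane++)
    dst[lane] = *addrs[lane];
}
```

The second form is what makes gathers possible when the lane addresses cannot
be expressed as a shared base plus 32-bit offsets.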
|
|
Just using a move insn for no-op conversions triggers special move handling in
IRA, which declares that subregs of vectors aren't valid and routes everything
through memory. These patterns make the vec_select explicit and all is well.
gcc/ChangeLog:
* config/gcn/gcn-protos.h (gcn_stepped_zero_int_parallel_p): New.
* config/gcn/gcn-valu.md (V_1REG_ALT): New.
(V_2REG_ALT): New.
(vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop): New.
(vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop): New.
(vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Use new patterns.
* config/gcn/gcn.cc (gcn_stepped_zero_int_parallel_p): New.
* config/gcn/predicates.md (ascending_zero_int_parallel): New.
|
|
Upstream has 'goacc_map_vars'; merge the new 'gomp_map_vars_openacc' into it.
(Maybe the latter didn't exist yet when the former was originally added?)
No functional change.
Clean-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".
PR other/76739
libgomp/
* libgomp.h (goacc_map_vars): Add 'struct goacc_ncarray_info *'
formal parameter.
(gomp_map_vars_openacc): Remove.
* target.c (goacc_map_vars): Adjust.
(gomp_map_vars_openacc): Remove.
* oacc-mem.c (acc_map_data, goacc_enter_datum)
(goacc_enter_data_internal): Adjust.
* oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start):
Adjust.
|
|
and deferred-length strings"
Follow-up to commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
updating the dumps.
* For the goacc testcase, 'to' changed to 'release' and due to 'finally' then
to 'delete', which can be regarded as a bugfix.
* For pr78260-2.f90, the calculation moved inside the 'if(...->data == NULL)'
block to handle deferred-string length vars better, esp. when 'optional'.
gcc/testsuite/
* gfortran.dg/goacc/finalize-1.f: Update scan-tree-dump-times for
mapping changes.
* gfortran.dg/gomp/pr78260-2.f90: Likewise.
|
|
Fix a bug in which Fortran pointers inside derived types caused a runtime
error when Unified Shared Memory was active.
libgomp/ChangeLog:
* target.c (gomp_attach_pointer): Check for USM.
* testsuite/libgomp.fortran/usm-3.f90: New test.
|
|
Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc' mappings, so the array bounds were not
updated. The 'alloc' could also appear for 'target exit data', failing
with a libgomp assert. In some cases, either the array descriptor or a
deferred-length string's length variable was not mapped. And, finally,
some offset calculations with array-section mappings went wrong.
The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings, which have several issues beyond OpenMP support.
This is the OG12 variant of the submitted but unreviewed GCC 13/mainline
patch at https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612387.html
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.
|
|
Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned". No functional change.
libgomp/
* libgomp_g.h (GOMP_enable_pinned_mode): New.
|
|
... instead of 'mmap' plus attempting to register using a device.
Implemented for nvptx offloading via 'cuMemHostAlloc'.
This re-works og12 commit a5a4800e92773da7126c00a9c79b172494d58ab5
"Attempt to register OpenMP pinned memory using a device instead of 'mlock'".
include/
* cuda/cuda.h (cuMemHostRegister, cuMemHostUnregister): Remove.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc): Add 'init0'
formal parameter. Adjust all users.
(linux_memspace_alloc, linux_memspace_free): Attempt to allocate
OpenMP pinned memory using a device instead of 'mmap' plus
attempting to register using a device.
* libgomp-plugin.h (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): Remove.
(GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_page_locked_host_free): New.
* libgomp.h (gomp_register_page_locked)
(gomp_unregister_page_locked): Remove.
(gomp_page_locked_host_alloc, gomp_page_locked_host_free): New.
(struct gomp_device_descr): Remove 'register_page_locked_func',
'unregister_page_locked_func'. Add 'page_locked_host_alloc_func',
'page_locked_host_free_func'.
* plugin/cuda-lib.def (cuMemHostRegister_v2, cuMemHostRegister)
(cuMemHostUnregister): Remove.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): Remove.
(GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_page_locked_host_free): New.
* target.c (gomp_register_page_locked)
(gomp_unregister_page_locked): Remove.
(gomp_page_locked_host_alloc, gomp_page_locked_host_free): Add.
(gomp_load_plugin_for_device): Don't handle
'register_page_locked', 'unregister_page_locked'. Handle
'page_locked_host_alloc', 'page_locked_host_free'.
Suggested-by: Andrew Stubbs <ams@codesourcery.com>
|
|
Implemented for nvptx offloading via 'cuMemHostRegister'. This means: (a) not
running into 'mlock' limitations, and (b) the device is aware of this and may
optimize host <-> device memory transfers.
This re-works og12 commit ab7520b3b4cd9fdabfd63652badde478955bd3b5
"libgomp: pinned memory".
include/
* cuda/cuda.h (cuMemHostRegister, cuMemHostUnregister): New.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_free, linux_memspace_realloc): Attempt to register
OpenMP pinned memory using a device instead of 'mlock'.
* libgomp-plugin.h (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): New.
* libgomp.h (gomp_register_page_locked)
(gomp_unregister_page_locked): New.
(struct gomp_device_descr): Add 'register_page_locked_func',
'unregister_page_locked_func'.
* plugin/cuda-lib.def (cuMemHostRegister_v2, cuMemHostRegister)
(cuMemHostUnregister): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): New.
* target.c (gomp_register_page_locked)
(gomp_unregister_page_locked): New.
(gomp_load_plugin_for_device): Handle 'register_page_locked',
'unregister_page_locked'.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.
|
|
... to not run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd
here.
Fix-up for og12 commit c5d1d7651297a273321154a5fe1b01eba9dcf604
"libgomp, nvptx: low-latency memory allocator".
libgomp/
* allocator.c (omp_realloc): Route 'free' through 'MEMSPACE_FREE'.
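Why the old block has to be released through its own memory space, rather
than through plain 'free', can be sketched as follows. The two-space setup
and the names 'space_alloc', 'space_free', 'space_realloc' are assumptions
made up for this illustration; they are not libgomp's 'MEMSPACE_*' macros.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory spaces: one backed by malloc, one by a static
   bump arena whose pointers must never be passed to free().  */
enum memspace { SPACE_MALLOC, SPACE_ARENA };

static char arena[256];
static size_t arena_used;
int arena_frees;   /* Counts releases routed to the arena.  */

void *
space_alloc (enum memspace space, size_t size)
{
  if (space == SPACE_ARENA)
    {
      if (size > sizeof arena - arena_used)
        return NULL;
      void *p = arena + arena_used;
      arena_used += size;
      return p;
    }
  return malloc (size);
}

void
space_free (enum memspace space, void *p)
{
  if (space == SPACE_ARENA)
    arena_frees++;   /* Bump arena: nothing to hand back to libc.  */
  else
    free (p);
}

/* Move a block from OLD_SPACE to NEW_SPACE.  The old block is released
   via space_free for OLD_SPACE; calling free() directly would be
   invalid for arena pointers.  */
void *
space_realloc (enum memspace old_space, void *old, size_t old_size,
               enum memspace new_space, size_t new_size)
{
  assert (old != NULL);
  void *p = space_alloc (new_space, new_size);
  if (p == NULL)
    return NULL;
  memcpy (p, old, old_size < new_size ? old_size : new_size);
  space_free (old_space, old);
  return p;
}
```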
|
|
Clarification for og12 commit ab7520b3b4cd9fdabfd63652badde478955bd3b5
"libgomp: pinned memory". No functional change.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_calloc): Clarify zero-initialization for pinned
memory.
* testsuite/libgomp.c/alloc-pinned-1.c: Verify zero-initialization
for pinned memory.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
|
|
'ompx_host_mem_space'
Clean-up for og12 commit 84914e197d91a67b3d27db0e4c69a433462983a5
"openmp, nvptx: ompx_unified_shared_mem_alloc". No functional change.
libgomp/
* config/linux/allocator.c (linux_memspace_calloc): Elide
(innocuous) duplicate 'if' condition.
* config/nvptx/allocator.c (nvptx_memspace_free): Explicitly
handle 'memspace == ompx_host_mem_space'.
* libgomp.h (gomp_is_usm_ptr): Remove.
|
|
Change '-foffload=amdgcn-amdhsa=[...]' to
'-foffload-options=amdgcn-amdhsa=[...]', so that non-GCN offloading compilation
doesn't get disabled.
Fix-up for og12 commit 6ec2c29dbbc19e7d2a8f991a5848e10c65c7c74c
"amdgcn, libgomp: USM allocation update".
libgomp/
* testsuite/libgomp.c/usm-1.c: Re-enable non-GCN offloading
compilation.
* testsuite/libgomp.c/usm-2.c: Likewise.
* testsuite/libgomp.c/usm-3.c: Likewise.
* testsuite/libgomp.c/usm-4.c: Likewise.
|
|
This implements the OpenMP low-latency memory allocator for AMD GCN using the
small per-team LDS memory (Local Data Store).
Since addresses can now refer to LDS space, the "Global" address space is
no longer compatible. This patch therefore switches the backend to use
entirely "Flat" addressing (which supports both memories). A future patch
will re-enable "global" instructions for cases where it is known to be safe
to do so.
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in.
* config/gcn/gcn.cc (gcn_init_machine_status): Disable global
addressing.
(gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR.
libgomp/ChangeLog:
* config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
(GCN_LOWLAT_HEAP): New.
* config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h.
(__gcn_lowlat_init): New prototype.
(gomp_gcn_enter_kernel): Initialize the low-latency heap.
* libgomp.h (TEAM_ARENA_START): Move to libgomp-gcn.h.
(TEAM_ARENA_FREE): Likewise.
(TEAM_ARENA_END): Likewise.
* plugin/plugin-gcn.c (lowlat_size): New variable.
(print_kernel_dispatch): Label the group_segment_size purpose.
(init_environment_variables): Read GOMP_GCN_LOWLAT_POOL.
(create_kernel_dispatch): Pass low-latency heap allocation to kernel.
(run_kernel): Use shadow; don't assume values.
* testsuite/libgomp.c/allocators-7.c: Enable for amdgcn.
* config/gcn/allocator.c: New file.
|
|
There shouldn't be a functionality change; this is just so AMD can share
the code.
The new basic-allocator.c is designed to be included so it can be used as a
template multiple times and inlined.
libgomp/ChangeLog:
* config/nvptx/allocator.c (BASIC_ALLOC_PREFIX): New define, and
include basic-allocator.c.
(__nvptx_lowlat_heap_root): Remove.
(heapdesc): Remove.
(nvptx_memspace_alloc): Move implementation to basic-allocator.c.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_free): Likewise.
(nvptx_memspace_realloc): Likewise.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): Remove.
(gomp_nvptx_main): Call __nvptx_lowlat_init.
* basic-allocator.c: New file.
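The "template by prefix" idea can be sketched in a single file. The real
code includes a source file after defining 'BASIC_ALLOC_PREFIX'; to stay
self-contained, this sketch uses a macro to stamp out the functions
instead. The trivial bump allocator and all 'demo_*' names are assumptions
for illustration and do not reflect the contents of basic-allocator.c.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an includable template: each expansion defines an
   independent allocator whose symbols all start with PREFIX.  */
#define DEFINE_BUMP_ALLOCATOR(PREFIX)                                  \
  static char *PREFIX##_heap;                                          \
  static size_t PREFIX##_size, PREFIX##_used;                          \
                                                                       \
  static inline void                                                   \
  PREFIX##_init (char *heap, size_t size)                              \
  {                                                                    \
    assert (heap != NULL);                                             \
    PREFIX##_heap = heap;                                              \
    PREFIX##_size = size;                                              \
    PREFIX##_used = 0;                                                 \
  }                                                                    \
                                                                       \
  static inline void *                                                 \
  PREFIX##_alloc (size_t size)                                         \
  {                                                                    \
    size = (size + 7) & ~(size_t) 7;                                   \
    if (size > PREFIX##_size - PREFIX##_used)                          \
      return NULL;                                                     \
    void *p = PREFIX##_heap + PREFIX##_used;                           \
    PREFIX##_used += size;                                             \
    return p;                                                          \
  }

/* Two instantiations, as two targets sharing the code might do.  */
DEFINE_BUMP_ALLOCATOR (demo_nvptx)
DEFINE_BUMP_ALLOCATOR (demo_gcn)
```

Because every instantiation is 'static inline' with its own prefix, each
user gets private state and the compiler is free to inline the calls.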
|
|
Their execution isn't expected to error out if we've been *compiling for any
offload target*, but rather if they're *executing on a non-shared memory
offload device*. For example, if (any) offloading is configured but not
effective (no device available, for example), you'd get:
PASS: libgomp.c/../libgomp.c-c++-common/target-present-1.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/target-present-1.c execution test
PASS: libgomp.c/../libgomp.c-c++-common/target-present-2.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/target-present-2.c execution test
PASS: libgomp.c/../libgomp.c-c++-common/target-present-3.c (test for excess errors)
FAIL: libgomp.c/../libgomp.c-c++-common/target-present-3.c execution test
PASS: libgomp.c++/../libgomp.c-c++-common/target-present-1.c (test for excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/target-present-1.c execution test
PASS: libgomp.c++/../libgomp.c-c++-common/target-present-2.c (test for excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/target-present-2.c execution test
PASS: libgomp.c++/../libgomp.c-c++-common/target-present-3.c (test for excess errors)
FAIL: libgomp.c++/../libgomp.c-c++-common/target-present-3.c execution test
PASS: libgomp.fortran/target-present-1.f90 -O0 (test for excess errors)
FAIL: libgomp.fortran/target-present-1.f90 -O0 execution test
[...]
PASS: libgomp.fortran/target-present-2.f90 -O0 (test for excess errors)
FAIL: libgomp.fortran/target-present-2.f90 -O0 execution test
[...]
PASS: libgomp.fortran/target-present-3.f90 -O0 (test for excess errors)
FAIL: libgomp.fortran/target-present-3.f90 -O0 execution test
[...]
Also, verify reaching a checkpoint before the expected error condition -- and
fix up one case where that didn't happen; missing OpenMP 'map' clauses
('libgomp.fortran/target-present-2.f90').
Fix-up for recent og12 commit 229b705862c1d7f9634f72272b77c22970baf821
"openmp: Add support for the 'present' modifier"
libgomp/
* testsuite/libgomp.c-c++-common/target-present-1.c: Fix.
* testsuite/libgomp.c-c++-common/target-present-2.c: Likewise.
* testsuite/libgomp.c-c++-common/target-present-3.c: Likewise.
* testsuite/libgomp.fortran/target-present-1.f90: Likewise.
* testsuite/libgomp.fortran/target-present-2.f90: Likewise.
* testsuite/libgomp.fortran/target-present-3.f90: Likewise.
|
|
While mainline (GCC 13) converts assumptions to the ASSUME internal
function, OG12 has only parsing-only support. Thus, remove the
failing dump scan from the testcase.
gcc/testsuite/
* gfortran.dg/gomp/openmp-simd-8.f90: Remove dump test.
|
|
This implements support for the OpenMP 5.1 'present' modifier, which can be
used in map clauses in the 'target', 'target data', 'target enter data' and
'target exit data' constructs, and in the 'to' and 'from' clauses of the
'target update' construct. It is also supported in defaultmap.
The modifier triggers a fatal runtime error if the data specified by the
clause is not already present on the target device. It can also be combined
with 'always' in map clauses.
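A minimal usage sketch of the modifier in a map clause follows. It assumes a
compiler that accepts 'present' (built with '-fopenmp'); without OpenMP
enabled the pragmas are ignored and the code simply runs on the host. The
function names 'sum_present' and 'run_demo' are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define N 8

/* Sum A in a target region, requiring that a[0:N] is already mapped.
   If it were not, the 'present' modifier is expected to turn the
   lookup into a fatal runtime error instead of creating a mapping.  */
int
sum_present (int *a)
{
  int sum = 0;

  assert (a != NULL);
#pragma omp target map(present, to: a[0:N]) map(tofrom: sum)
  for (int i = 0; i < N; i++)
    sum += a[i];
  return sum;
}

/* Map the array first, so that the 'present' check succeeds.  */
int
run_demo (void)
{
  int a[N];
  int result;

  for (int i = 0; i < N; i++)
    a[i] = i;

#pragma omp target enter data map(to: a[0:N])
  result = sum_present (a);
#pragma omp target exit data map(release: a[0:N])
  return result;
}
```

Dropping the 'target enter data' line is the case the modifier is meant to
catch when executing on a non-shared-memory offload device.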
2023-02-01 Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/c/
* c-parser.cc (c_parser_omp_variable_list): Set default motion
modifier.
(c_parser_omp_var_list_parens): Add new parameter with default. Parse
'present' motion modifier and apply.
(c_parser_omp_clause_defaultmap): Parse 'present' in defaultmap.
(c_parser_omp_clause_map): Parse 'present' modifier in map clauses.
(c_parser_omp_clause_to): Allow use of 'present' in variable list.
(c_parser_omp_clause_from): Likewise.
(c_parser_omp_target_data): Allow map clauses with 'present'
modifiers.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.
gcc/cp/
* parser.cc (cp_parser_omp_var_list_no_open): Add new parameter with
default. Parse 'present' motion modifier and apply.
(cp_parser_omp_clause_defaultmap): Parse 'present' in defaultmap.
(cp_parser_omp_clause_map): Parse 'present' modifier in map clauses.
(cp_parser_omp_all_clauses): Allow use of 'present' in 'to' and 'from'
clauses.
(cp_parser_omp_target_data): Allow map clauses with 'present'
modifiers.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
* semantics.cc (finish_omp_target): Accept map clauses with 'present'
modifiers.
gcc/fortran/
* dump-parse-tree.cc (show_omp_namelist): Display 'present' map
modifier.
(show_omp_clauses): Display 'present' motion modifier for 'to'
and 'from' clauses.
* gfortran.h (enum gfc_omp_map_op): Add entries with 'present'
modifiers.
(enum gfc_omp_motion_modifier): New.
(struct gfc_omp_namelist): Add motion_modifier field.
* openmp.cc (gfc_match_omp_variable_list): Add new parameter with
default. Parse 'present' motion modifier and apply.
(gfc_match_omp_clauses): Parse 'present' in defaultmap, 'from'
clauses, 'map' clauses and 'to' clauses.
(resolve_omp_clauses): Allow 'present' modifiers on 'target',
'target data', 'target enter data' and 'target exit data' directives.
* trans-openmp.cc (gfc_omp_deep_map_kind_p): Handle map kinds with
'present' modifier.
(gfc_trans_omp_clauses): Apply 'present' modifiers to tree node for
'map', 'to' and 'from' clauses. Apply 'present' for defaultmap.
gcc/
* gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag
and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag
set.
(omp_get_attachment): Handle map clauses with 'present' modifier.
(omp_group_base): Likewise.
(gimplify_scan_omp_clauses): Reorder present maps to come first.
Set GOVD flags for present defaultmaps.
(gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps.
* omp-low.cc (scan_sharing_clauses): Handle 'always, present' map
clauses.
(lower_omp_target): Handle map clauses with 'present' modifier.
Handle 'to' and 'from' clauses with 'present'.
* tree-core.h (enum omp_clause_defaultmap_kind): Add
OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind.
(enum omp_clause_motion_modifier): New.
(struct tree_omp_clause): Add motion_modifier field.
* tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and
'from' clauses with 'present' modifier. Handle present defaultmap.
* tree.h (OMP_CLAUSE_MOTION_MODIFIER): New.
(OMP_CLAUSE_SET_MOTION_MODIFIER): New.
gcc/testsuite/
* c-c++-common/gomp/defaultmap-4.c: New.
* c-c++-common/gomp/map-6.c: Update expected error messages.
* c-c++-common/gomp/map-9.c: New.
* c-c++-common/gomp/target-update-1.c: New.
* gfortran.dg/gomp/defaultmap-1.f90: Update expected error messages.
* gfortran.dg/gomp/defaultmap-8.f90: New.
* gfortran.dg/gomp/map-10.f90: New.
* gfortran.dg/gomp/target-update-1.f90: New.
include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New.
(GOMP_MAP_FLAG_FORCE): Redefine.
(GOMP_MAP_FLAG_PRESENT): New.
(GOMP_MAP_FLAG_ALWAYS_PRESENT): New.
(enum gomp_map_kind): Add map kinds with 'present' modifiers.
(GOMP_MAP_COPY_TO_P): Evaluate to true for map variants with 'present'
modifiers.
(GOMP_MAP_COPY_FROM_P): Likewise.
(GOMP_MAP_ALWAYS_TO_P): Evaluate to true for map variants with
'always, present' modifiers.
(GOMP_MAP_ALWAYS_FROM_P): Likewise.
(GOMP_MAP_ALWAYS): Redefine.
(GOMP_MAP_FORCE_P): New.
(GOMP_MAP_PRESENT_P): New.
libgomp/
* target.c (gomp_to_device_kind_p): Add map kinds with 'present'
modifier.
(gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro.
(gomp_map_vars_internal): Emit runtime error if memory region not
present.
(gomp_update): Likewise.
(gomp_target_rev): Likewise.
* testsuite/libgomp.c-c++-common/target-present-1.c: New.
* testsuite/libgomp.c-c++-common/target-present-2.c: New.
* testsuite/libgomp.c-c++-common/target-present-3.c: New.
* testsuite/libgomp.fortran/target-present-1.f90: New.
* testsuite/libgomp.fortran/target-present-2.f90: New.
* testsuite/libgomp.fortran/target-present-3.f90: New.
Add 'present' map types to gfc_omp_deep_map_kind_p
|
|
Otherwise, for build-tree testing:
[...]/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90:10:7: Fatal Error: Cannot open module file 'omp_lib.mod' for reading at (1): No such file or directory
..., and thus corresponding FAILs.
(Not renamed to 'libgomp.fortran/allocate-4.f90', as that one already exists.)
Fix-up for og12 commit 491478d12b83e102f72858e8a871a25c951df293
"Add parsing support for allocate directive (OpenMP 5.0)".
gcc/testsuite/
* gfortran.dg/gomp/allocate-4.f90: Cut.
libgomp/
* testsuite/libgomp.fortran/allocate-5.f90: Paste.
|
|
'libgomp.{c-c++-common,fortran}/uses_allocators-*'
Otherwise, for build-tree testing:
[...]/gcc/testsuite/c-c++-common/gomp/uses_allocators-1.c:4:10: fatal error: omp.h: No such file or directory
[...]/gcc/testsuite/c-c++-common/gomp/uses_allocators-2.c:3:10: fatal error: omp.h: No such file or directory
[...]/gcc/testsuite/c-c++-common/gomp/uses_allocators-3.c:4:10: fatal error: omp.h: No such file or directory
[...]/gcc/testsuite/gfortran.dg/gomp/uses_allocators-1.f90:5:7: Fatal Error: Cannot open module file 'omp_lib.mod' for reading at (1): No such file or directory
[...]/gcc/testsuite/gfortran.dg/gomp/uses_allocators-2.f90:4:7: Fatal Error: Cannot open module file 'omp_lib.mod' for reading at (1): No such file or directory
[...]/gcc/testsuite/gfortran.dg/gomp/uses_allocators-3.f90:4:7: Fatal Error: Cannot open module file 'omp_lib.mod' for reading at (1): No such file or directory
..., and thus corresponding FAILs, UNRESOLVEDs.
Fix-up for og12 commit dbc770c4351c8824e8083f8aff6117a6b4ba3c0d
"openmp: Implement uses_allocators clause".
gcc/testsuite/
* c-c++-common/gomp/uses_allocators-1.c: Cut.
* c-c++-common/gomp/uses_allocators-2.c: Likewise.
* c-c++-common/gomp/uses_allocators-3.c: Likewise.
* gfortran.dg/gomp/uses_allocators-1.f90: Likewise.
* gfortran.dg/gomp/uses_allocators-2.f90: Likewise.
* gfortran.dg/gomp/uses_allocators-3.f90: Likewise.
libgomp/
* testsuite/libgomp.c++/c++.exp (check_effective_target_c)
(check_effective_target_c++): New.
* testsuite/libgomp.c/c.exp (check_effective_target_c)
(check_effective_target_c++): Likewise.
* testsuite/libgomp.c-c++-common/uses_allocators-1.c: Paste.
* testsuite/libgomp.c-c++-common/uses_allocators-2.c: Likewise.
* testsuite/libgomp.c-c++-common/uses_allocators-3.c: Likewise.
* testsuite/libgomp.fortran/uses_allocators-1.f90: Likewise.
* testsuite/libgomp.fortran/uses_allocators-2.f90: Likewise.
* testsuite/libgomp.fortran/uses_allocators-3.f90: Likewise.
|
|
Otherwise, for build-tree testing:
xgcc: fatal error: cannot read spec file 'libgomp.spec': No such file or directory
..., and thus corresponding FAILs, UNRESOLVEDs.
Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned".
gcc/testsuite/
* c-c++-common/gomp/alloc-pinned-1.c: Cut.
libgomp/
* testsuite/libgomp.c-c++-common/alloc-pinned-1.c: Paste.
|
|
I've noticed that while 'gfortran.dg/gomp/allocate-4.f90' is all-PASS for
x86_64-pc-linux-gnu (default) '-m64' testing, it does have one FAIL for
'-m32' testing: 'test for errors, line 25'. Here's the 'diff':
@@ -1,8 +1,3 @@
-source-gcc/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90:25:34:
-
- 25 | !$omp allocate (var1) allocator(10) ! { dg-error "Expected integer expression of the 'omp_allocator_handle_kind' kind at .1." }
- | 1
-Error: Expected integer expression of the ‘omp_allocator_handle_kind’ kind at (1)
source-gcc/gcc/testsuite/gfortran.dg/gomp/allocate-4.f90:28:130:
28 | !$omp allocate (var2) ! { dg-error "'var2' in 'allocate' directive at .1. is not present in associated 'allocate' statement." }
I understand that's due to an "accidental" non-match vs. match of
'10' vs. 'omp_allocator_handle_kind' ('c_intptr_t') data types:
> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> +static void
> +gfc_resolve_omp_allocate (gfc_code *code, gfc_namespace *ns)
> +{
> + gfc_alloc *al;
> + gfc_omp_namelist *n = NULL;
> + gfc_omp_namelist *cn = NULL;
> + gfc_omp_namelist *p, *tail;
> + gfc_code *cur;
> + hash_set<gfc_symbol*> vars;
> +
> + gfc_omp_clauses *clauses = code->ext.omp_clauses;
> + gcc_assert (clauses);
> + cn = clauses->lists[OMP_LIST_ALLOCATOR];
> + gfc_expr *omp_al = cn ? cn->expr : NULL;
> +
> + if (omp_al && (omp_al->ts.type != BT_INTEGER
> + || omp_al->ts.kind != gfc_c_intptr_kind))
> + gfc_error ("Expected integer expression of the "
> + "%<omp_allocator_handle_kind%> kind at %L", &omp_al->where);
$ git grep -i parameter.\*omp_allocator_handle_kind -- libgomp/omp_lib.*
libgomp/omp_lib.f90.in: integer, parameter :: omp_allocator_handle_kind = c_intptr_t
libgomp/omp_lib.h.in: parameter (omp_allocator_handle_kind = @INTPTR_T_KIND@)
Fix-up for og12 commit 491478d12b83e102f72858e8a871a25c951df293
"Add parsing support for allocate directive (OpenMP 5.0)".
gcc/testsuite/
* gfortran.dg/gomp/allocate-4.f90: Fix 'omp_allocator_handle_kind'
example.
|
|
"minimal" mode': 'libgomp/ChangeLog.omp'
|
|
"minimal" mode'
libgomp/
* libgomp.texi (nvptx): Update for
'nvptx, libgfortran: Switch out of "minimal" mode'.
|
|
..., where it currently fails:
[...]/libgcc/config/nvptx/crt0.c:22:10: fatal error: stdlib.h: No such file or directory
22 | #include <stdlib.h>
| ^~~~~~~~~~
Fix-up for "nvptx: Support global constructors/destructors via 'collect2'".
libgcc/
* config/nvptx/crt0.c [!HAVE_STDLIB_H]: Don't '#include <stdlib.h>'.
(atexit): Prototype.
|
|
..., in order to enable (portions of) Fortran I/O, for example.
libgfortran/ChangeLog:
* configure: Regenerate.
* configure.ac: No longer set LIBGFOR_MINIMAL for nvptx.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Remove.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
Co-authored-by: Andrew Stubbs <ams@codesourcery.com>
|
|
Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.
The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.
libgcc/ChangeLog:
* config/nvptx/t-nvptx: Add unwind-nvptx.c.
* config/nvptx/unwind-nvptx.c: New file.
Co-authored-by: Andrew Stubbs <ams@codesourcery.com>
|
|
This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.
libgcc/
* config/nvptx/crtstuff.c ["mgomp"]
(__do_global_ctors__entry__mgomp)
(__do_global_dtors__entry__mgomp): New.
[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
(nvptx_close_device, GOMP_OFFLOAD_load_image)
(GOMP_OFFLOAD_unload_image): Call it.
|
|
The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this. Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more "sorry, unimplemented:
global constructors not supported on this target".
This depends on <https://github.com/MentorEmbedded/nvptx-tools/pull/40> "'nm'"
generally, and for global destructors support: newlib
<https://inbox.sourceware.org/newlib/878rjqaku5.fsf@dem-tschwing-1.ger.mentorg.com/>
"nvptx: Implement '_exit' instead of 'exit'".
gcc/
* collect2.cc (write_c_file_glob): Allow for
'COLLECT2_MAIN_REFERENCE' override.
* config.gcc <case ${target} in nvptx-*>: Set 'use_collect2=yes'.
* config/nvptx/nvptx.h: Adjust.
gcc/testsuite/
* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may appear
in identifiers.
* lib/target-supports.exp
(check_effective_target_global_constructor): Enable for nvptx.
libgcc/
* config.host <case ${host} in nvptx-*>: Add 'crtbegin.o',
'crtend.o' to 'extra_parts'.
* config/nvptx/crt0.c: Invoke '__do_global_ctors',
'__do_global_dtors'.
* config/nvptx/crtstuff.c: New.
* config/nvptx/t-nvptx: Adjust.
|
|
'__nvptx_uni'
As I have reported to Nvidia in 2022-12-01 'NVIDIA Incident Report (3891704):
ptxas: Duplicate declaration error: "cannot be resolved by a '.static'"',
'ptxas' has an inscrutable error mode for duplicate declarations:
ptxas softstack-decl-1.o, line 11; error : '.extern' variable '__nvptx_stacks' cannot be resolved by a '.static'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
ptxas uniform-simt-decl-1.o, line 12; error : '.extern' variable '__nvptx_uni' cannot be resolved by a '.static'
ptxas fatal : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
This is inscrutable, because (a) what is "cannot be resolved by a '.static'"
supposed to tell me (there is no '.static' in PTX?), and (b) why aren't
repeated declarations just verified to match the first, but otherwise a no-op
(like in other programming languages)?
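For comparison, this is the behavior meant by point (b), sketched in C. The names 'counter' and 'read_counter' are hypothetical; the point is only that repeated compatible declarations are checked against each other and otherwise have no effect.

```c
/* Repeated compatible declarations of an object are accepted.  */
extern int counter;
extern int counter;     /* Same type as before: effectively a no-op.  */
int counter = 3;        /* The single definition.  */
extern int counter;     /* Still fine after the definition.  */

/* The same holds for function declarations.  */
int read_counter (void);
int read_counter (void);

int read_counter (void)
{
  return counter;
}
```

A declaration with a conflicting type (say, 'extern long counter;') would instead be diagnosed with a message naming both declarations.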
gcc/
* config/nvptx/nvptx.cc (nvptx_assemble_undefined_decl): Notice
'__nvptx_stacks', '__nvptx_uni' declarations.
(nvptx_file_end): Don't emit duplicate declarations for those.
gcc/testsuite/
* gcc.target/nvptx/softstack-decl-1.c: Make 'dg-do assemble',
adjust.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
|
|
'gcc.target/nvptx/uniform-simt-decl-1.c'
... to document the status quo re implicit (via 'need_softstack_decl',
'need_unisimt_decl') and explicit declarations of '__nvptx_stacks',
'__nvptx_uni'.
gcc/testsuite/
* gcc.target/nvptx/softstack-decl-1.c: New.
* gcc.target/nvptx/uniform-simt-decl-1.c: Likewise.
|
|
For example, this allows for '-muniform-simt' code to be executed
single-threaded, which currently fails (device-side 'trap'), as the 0xffffffff
mask isn't correct if not all 32 threads of a warp are active. The same
issue/fix, I suppose but have not verified, would apply if we were to allow for
OpenACC 'vector_length' smaller than 32, for example for OpenACC 'serial'.
We use 'nvptx_uniform_warp_check' only for PTX ISA version less than 6.0.
Otherwise we're using 'nvptx_warpsync', which emits 'bar.warp.sync 0xffffffff',
which appears to do the right thing. (I've tested '-muniform-simt'
code executing single-threaded.)
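As a sketch of the arithmetic only (this is not the PTX emitted by 'nvptx_uniform_warp_check'): assuming the active threads occupy the lowest-numbered lanes of the warp, the mask to compare against depends on the number of active threads rather than always being 0xffffffff. The helper name 'expected_lane_mask' is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: return a mask with the low ACTIVE bits set,
   assuming active threads fill the warp from lane 0 upwards.
   A full warp (32 or more threads) yields 0xffffffff; shifting by
   32 would be undefined, so that case is handled separately.  */
static uint32_t expected_lane_mask (unsigned active)
{
  if (active >= 32)
    return UINT32_C (0xffffffff);
  return (UINT32_C (1) << active) - 1;
}
```

With a single active thread this gives 0x1, which is why a check against a fixed 0xffffffff cannot succeed for single-threaded execution.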
gcc/
* config/nvptx/nvptx.md (nvptx_uniform_warp_check): Make fit for
non-full-warp execution.
gcc/testsuite/
* gcc.target/nvptx/nvptx.exp
(check_effective_target_default_ptx_isa_version_at_least_6_0):
New.
* gcc.target/nvptx/uniform-simt-5.c: New.
libgomp/
* plugin/plugin-nvptx.c (nvptx_exec): Assert what we know about
'blockDimX'.
|
|
'abort' [GCC PR85463]"
PR target/85463
libgfortran/
* runtime/minimal.c [__nvptx__] (exit): Don't override.
libgomp/
* config/nvptx/error.c (exit): Don't override.
* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
|
|
'libgomp.oacc-c-c++-common/abort-3.c'
libgomp/
* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Force
'--param openacc-kernels=parloops'.
|
|
This should make somewhat better use of cache lines on AMD GPUs.
libgomp/ChangeLog:
* usm-allocator.c (ALIGN): Use 128-byte alignment.
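The actual definition of 'ALIGN' in 'usm-allocator.c' is not shown in this log; the following is a generic round-up-to-128 sketch under that caveat, with hypothetical names 'USM_ALIGNMENT' and 'align_up'.

```c
#include <stddef.h>

/* Hypothetical constant; must be a power of two for the mask
   trick below to be valid.  */
#define USM_ALIGNMENT 128

/* Round SIZE up to the next multiple of USM_ALIGNMENT.  */
static size_t align_up (size_t size)
{
  return (size + USM_ALIGNMENT - 1) & ~(size_t) (USM_ALIGNMENT - 1);
}
```

For instance, a request of 1 byte is rounded up to 128, and 129 bytes to 256, so no two allocations share a 128-byte block.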
|
|
There were problems with critical driver data sharing pages with USM data, so
this new allocator implementation moves USM to entirely different pages.
libgomp/ChangeLog:
* plugin/plugin-gcn.c: Include sys/mman.h and unistd.h.
(usm_heap_create): New function.
(struct usm_splay_tree_key_s): Delete.
(usm_splay_compare): Delete function.
(splay_tree_prefix): Delete define.
(GOMP_OFFLOAD_usm_alloc): Use new allocator.
(GOMP_OFFLOAD_usm_free): Likewise.
(GOMP_OFFLOAD_is_usm_ptr): Likewise.
(gomp_fatal): Delete macro.
(splay_tree_c): Delete.
* usm-allocator.c: New file.
|
|
Fix up some USM corner cases.
libgomp/ChangeLog:
* libgomp.h (OFFSET_USM): New macro.
* target.c (gomp_map_pointer): Handle USM mappings.
(gomp_map_val): Handle OFFSET_USM.
(gomp_map_vars_internal): Move USM check earlier, and use OFFSET_USM.
Add OFFSET_USM check to the second mapping pass.
* testsuite/libgomp.fortran/usm-1.f90: New test.
* testsuite/libgomp.fortran/usm-2.f90: New test.
|
|
This patch handles Unified Shared Memory (USM) in the OpenMP runtime routine
omp_target_is_accessible.
libgomp/ChangeLog:
* target.c (omp_target_is_accessible): Handle unified shared memory.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Updated.
* testsuite/libgomp.fortran/target-is-accessible-1.f90: Updated.
* testsuite/libgomp.c-c++-common/target-is-accessible-2.c: New test.
* testsuite/libgomp.fortran/target-is-accessible-2.f90: New test.
|
|
Add libgomp support for 'amdgcn' as arch, and for each processor type (as passed
to '-march') as isa traits.
Add test case for all supported 'isa' values used as context selectors in a
metadirective construct.
libgomp/ChangeLog:
* config/gcn/selector.c (GOMP_evaluate_current_device): Recognise 'amdgcn'
as arch, and '-march' values (as well as 'gfx803') as isa traits.
* testsuite/libgomp.c-c++-common/metadirective-6.c: New test.
|
|
'libgomp.oacc-fortran/declare-allocatable-array_descriptor-1*.f90'
libgomp/
* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-directive.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1-runtime.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/declare-allocatable-array_descriptor-1.f90:
New.
|
|
'libgomp.oacc-fortran/declare-allocatable-1*.f90'
libgomp/
* testsuite/libgomp.oacc-fortran/declare-allocatable-1-directive.f90:
Adjust.
* testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90:
Likewise.
|
|
'libgomp.oacc-fortran/declare-allocatable-1*.f90'
Seen only for certain optimization levels, as indicated; so there are a few
XPASSes otherwise.
There are neither OpenACC 'kernels' constructs here, nor other 'loop'
constructs with 'auto' clause, so I'm not sure what's going on.
libgomp/
* testsuite/libgomp.oacc-fortran/declare-allocatable-1-directive.f90:
XFAIL some OpenACC 'kernels' confusion.
* testsuite/libgomp.oacc-fortran/declare-allocatable-1-runtime.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90:
Likewise.
|
|
Using 'map(alloc: var, dt%comp)' requires a 'to' mapping of
the array descriptor, as otherwise the bounds are not available in the
target region; likewise for character strings.
This patch implements this; however, some additional issues are exposed
by the testcase; those are '#if 0'ed and will be handled later.
Submitted to mainline (but pending review):
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604887.html
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Ensure DT struct-comp with
array descriptor and 'alloc:' have the descriptor mapped with 'to:'.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3a.f90: New test.
|