Age | Commit message | Author | Files | Lines |
|
Upstream has 'goacc_map_vars'; merge the new 'gomp_map_vars_openacc' into it.
(Maybe the latter didn't exist yet when the former was originally added?)
No functional change.
Clean-up for og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".
PR other/76739
libgomp/
* libgomp.h (goacc_map_vars): Add 'struct goacc_ncarray_info *'
formal parameter.
(gomp_map_vars_openacc): Remove.
* target.c (goacc_map_vars): Adjust.
(gomp_map_vars_openacc): Remove.
* oacc-mem.c (acc_map_data, goacc_enter_datum)
(goacc_enter_data_internal): Adjust.
* oacc-parallel.c (GOACC_parallel_keyed, GOACC_data_start):
Adjust.
|
|
There are occasional execution failures; these changes need to be reviewed.
This reverts og12 commit 719f93c8618a134f90b5b661ab70c918d659ad05.
libgomp/
* oacc-host.c: Revert
"OpenACC profiling-interface fixes for asynchronous operations"
changes.
* oacc-mem.c: Likewise.
* oacc-parallel.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Likewise.
|
|
... as a prerequisite for reverting
"OpenACC profiling-interface fixes for asynchronous operations".
This reverts og12 commit b845d2f62e7da1c4cfdfee99690de94b648d076d.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Revert
"Revert changes to acc_prof-init-1.c and acc_prof-parallel-1.c"
changes.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Likewise.
|
|
gcc/ChangeLog:
* config/gcn/gcn-valu.md (<expander><mode>3_exec): Add patterns for
{s|u}{max|min} in QI, HI and DI modes.
(<expander><mode>3): Add pattern for {s|u}{max|min} in DI mode.
(cond_<fexpander><mode>): Add pattern for cond_f{max|min}.
(cond_<expander><mode>): Add pattern for cond_{s|u}{max|min}.
* config/gcn/gcn.cc (gcn_spill_class): Allow the exec register to be
saved in SGPRs.
gcc/testsuite/ChangeLog:
* gcc.target/gcn/cond_fmaxnm_1.c: New test.
* gcc.target/gcn/cond_fmaxnm_1_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_2.c: New test.
* gcc.target/gcn/cond_fmaxnm_2_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_3.c: New test.
* gcc.target/gcn/cond_fmaxnm_3_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_4.c: New test.
* gcc.target/gcn/cond_fmaxnm_4_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_5.c: New test.
* gcc.target/gcn/cond_fmaxnm_5_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_6.c: New test.
* gcc.target/gcn/cond_fmaxnm_6_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_7.c: New test.
* gcc.target/gcn/cond_fmaxnm_7_run.c: New test.
* gcc.target/gcn/cond_fmaxnm_8.c: New test.
* gcc.target/gcn/cond_fmaxnm_8_run.c: New test.
* gcc.target/gcn/cond_fminnm_1.c: New test.
* gcc.target/gcn/cond_fminnm_1_run.c: New test.
* gcc.target/gcn/cond_fminnm_2.c: New test.
* gcc.target/gcn/cond_fminnm_2_run.c: New test.
* gcc.target/gcn/cond_fminnm_3.c: New test.
* gcc.target/gcn/cond_fminnm_3_run.c: New test.
* gcc.target/gcn/cond_fminnm_4.c: New test.
* gcc.target/gcn/cond_fminnm_4_run.c: New test.
* gcc.target/gcn/cond_fminnm_5.c: New test.
* gcc.target/gcn/cond_fminnm_5_run.c: New test.
* gcc.target/gcn/cond_fminnm_6.c: New test.
* gcc.target/gcn/cond_fminnm_6_run.c: New test.
* gcc.target/gcn/cond_fminnm_7.c: New test.
* gcc.target/gcn/cond_fminnm_7_run.c: New test.
* gcc.target/gcn/cond_fminnm_8.c: New test.
* gcc.target/gcn/cond_fminnm_8_run.c: New test.
* gcc.target/gcn/cond_smax_1.c: New test.
* gcc.target/gcn/cond_smax_1_run.c: New test.
* gcc.target/gcn/cond_smin_1.c: New test.
* gcc.target/gcn/cond_smin_1_run.c: New test.
* gcc.target/gcn/cond_umax_1.c: New test.
* gcc.target/gcn/cond_umax_1_run.c: New test.
* gcc.target/gcn/cond_umin_1.c: New test.
* gcc.target/gcn/cond_umin_1_run.c: New test.
* gcc.target/gcn/smax_1.c: New test.
* gcc.target/gcn/smax_1_run.c: New test.
* gcc.target/gcn/smin_1.c: New test.
* gcc.target/gcn/smin_1_run.c: New test.
* gcc.target/gcn/umax_1.c: New test.
* gcc.target/gcn/umax_1_run.c: New test.
* gcc.target/gcn/umin_1.c: New test.
* gcc.target/gcn/umin_1_run.c: New test.
(cherry picked from commit 553ff2524f412be4e02e2ffb1a0a3dc3e2280742)
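As an illustration only, the kind of loop that a conditional signed-max pattern in DI mode is meant to serve might look like the sketch below. It is not copied from the new tests; the function name 'cond_smax', the 'pred' array and the data are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop: a signed 64-bit maximum that is only applied where
// 'pred' is non-zero; elsewhere the first input is passed through.
void cond_smax(std::int64_t *r, const std::int64_t *a, const std::int64_t *b,
               const int *pred, int n) {
  for (int i = 0; i < n; ++i) {
    std::int64_t m = a[i] > b[i] ? a[i] : b[i];  // signed max
    r[i] = pred[i] ? m : a[i];                   // conditional select
  }
}

// Small self-check with made-up data.
void demo_cond_smax() {
  const std::int64_t a[4] = {1, -5, 7, 0};
  const std::int64_t b[4] = {2, -9, 9, 3};
  const int pred[4] = {1, 1, 0, 0};
  std::int64_t r[4];
  cond_smax(r, a, b, pred, 4);
  assert(r[0] == 2 && r[1] == -5 && r[2] == 7 && r[3] == 0);
}
```

Whether a given compiler actually selects a conditional max instruction for this loop depends on the target and options; the sketch only shows the source-level shape.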
|
|
Merge up to r12-9210-gb3f9d2cf7dd5488800f867a6aae076465ecb391b (2nd Mar 2023)
|
|
|
|
For is_device_ptr, optional checks should only be done before calling
libgomp; afterwards, the arguments are NULL either because they are absent
or, by chance, because they are unallocated or unassociated (for
pointers/allocatables).
Additionally, this fixes an issue with explicit mapping for 'type(c_ptr)'.
PR middle-end/108546
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.
(cherry picked from commit 96ff97ff6574666a5509ae9fa596e7f2b6ad4f88)
|
|
|
|
|
|
Merge up to r12-9207-gb8e496d132ec087c9db5951fea23551dcc831d8c (27th Feb 2023)
|
|
and deferred-length strings"
Follow-up to commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
updating the dumps.
* For the goacc testcase, 'to' changed to 'release' and then, due to
'finally', to 'delete', which can be regarded as a bug fix.
* For pr78260-2.f90, the calculation moved inside the 'if(...->data == NULL)'
block to handle deferred-string length vars better, esp. when 'optional'.
gcc/testsuite/:
* gfortran.dg/goacc/finalize-1.f: Update scan-tree-dump-times for
mapping changes.
* gfortran.dg/gomp/pr78260-2.f90: Likewise.
|
|
As mentioned in the PR, when we use LTO, we wrongly use the ltrans output
file name as the module name of a global variable. That leads to
non-reproducible output.
After the suggested change, we emit the context name for normal global
variables. For artificial variables (like .Lubsan_data3), we use
aux_base_name (e.g. "./a.ltrans0.ltrans").
PR sanitizer/108834
gcc/ChangeLog:
* asan.cc (asan_add_global): Use proper TU name for normal
global variables (and aux_base_name for the artificial one).
gcc/testsuite/ChangeLog:
* c-c++-common/asan/global-overflow-1.c: Test line and column
info for a global variable.
(cherry picked from commit 94c9b1bb79f63d000ebb05efc155c149325e332d)
|
|
As Richard pointed out in [1], and as testing on Power10 confirmed, the
proposed fix for PR96373 requires updates to a few rs6000
test cases which use partial vectors. This patch fixes
all of them by adding the extra option "-fno-trapping-math", as
Richard suggested.
Besides, the original test case also failed on Power10 without
Richard's proposed fix, so this patch adds it as well for slightly
better test coverage.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610728.html
PR target/96373
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/p9-vec-length-epil-1.c: Add -fno-trapping-math.
* gcc.target/powerpc/p9-vec-length-epil-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-1.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-2.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-3.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-4.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-5.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-6.c: Likewise.
* gcc.target/powerpc/p9-vec-length-full-8.c: Likewise.
* gcc.target/powerpc/pr96373.c: New test.
(cherry picked from commit 4f5a1198065dc078f8099db628da7b06a2666f34)
|
|
|
|
|
|
|
|
gcc/ChangeLog:
* config/riscv/t-rtems: Keep only -mcmodel=medany 64-bit multilibs.
Add non-compact 32-bit multilibs.
(cherry picked from commit 35a067020e41d97bc3be15b518b3dc2a64b4aae2)
|
|
|
|
The following makes sure to predicate only the calls that need it.
PR tree-optimization/108888
* tree-if-conv.cc (if_convertible_stmt_p): Set PLF_2 on
calls to predicate.
(predicate_statements): Only predicate calls with PLF_2.
* g++.dg/torture/pr108888.C: New testcase.
(cherry picked from commit 31cc5821223a096ef61743bff520f4a0dbba5872)
|
|
There has been support for generating "inbranch" SIMD clones for a long time,
but nothing actually uses them (as far as I can see).
This patch adds support for a subset of the possible cases (those using
mask_mode == VOIDmode). The other cases fail to vectorize, just as before,
so there should be no regressions.
The supported subset should cover all cases needed by amdgcn at present.
gcc/ChangeLog:
* internal-fn.cc (expand_MASK_CALL): New.
* internal-fn.def (MASK_CALL): New.
* internal-fn.h (expand_MASK_CALL): New prototype.
* omp-simd-clone.cc (simd_clone_adjust_argument_types): Set vector_type
for mask arguments also.
* tree-if-conv.cc: Include cgraph.h.
(if_convertible_stmt_p): Do if conversions for calls to SIMD calls.
(predicate_statements): Convert functions to IFN_MASK_CALL.
* tree-vect-loop.cc (vect_get_datarefs_in_loop): Recognise
IFN_MASK_CALL as a SIMD function call.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Handle
IFN_MASK_CALL as an inbranch SIMD function call.
Generate the mask vector arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-16.c: New test.
* gcc.dg/vect/vect-simd-clone-16b.c: New test.
* gcc.dg/vect/vect-simd-clone-16c.c: New test.
* gcc.dg/vect/vect-simd-clone-16d.c: New test.
* gcc.dg/vect/vect-simd-clone-16e.c: New test.
* gcc.dg/vect/vect-simd-clone-16f.c: New test.
* gcc.dg/vect/vect-simd-clone-17.c: New test.
* gcc.dg/vect/vect-simd-clone-17b.c: New test.
* gcc.dg/vect/vect-simd-clone-17c.c: New test.
* gcc.dg/vect/vect-simd-clone-17d.c: New test.
* gcc.dg/vect/vect-simd-clone-17e.c: New test.
* gcc.dg/vect/vect-simd-clone-17f.c: New test.
* gcc.dg/vect/vect-simd-clone-18.c: New test.
* gcc.dg/vect/vect-simd-clone-18b.c: New test.
* gcc.dg/vect/vect-simd-clone-18c.c: New test.
* gcc.dg/vect/vect-simd-clone-18d.c: New test.
* gcc.dg/vect/vect-simd-clone-18e.c: New test.
* gcc.dg/vect/vect-simd-clone-18f.c: New test.
(cherry picked from commit 3da77f217c8b2089ecba3eb201e727c3fcdcd19d)
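A minimal sketch of the source-level situation described above, assuming the compiler is invoked with -fopenmp-simd or -fopenmp (without either, the pragma is ignored and the code runs as plain scalar code). The names 'scale_if_positive' and 'apply' are hypothetical and not taken from the new tests.

```cpp
#include <cassert>

// 'inbranch' asks for a masked SIMD clone: one that is only ever called
// under a per-lane condition.
#pragma omp declare simd inbranch
int scale_if_positive(int x) { return x * 2; }

// The call sits behind a condition, so vectorizing this loop needs an
// "inbranch" clone of the callee rather than an unconditional one.
void apply(const int *in, int *out, int n) {
  for (int i = 0; i < n; ++i)
    out[i] = in[i] > 0 ? scale_if_positive(in[i]) : in[i];
}

// Small self-check with made-up data.
void demo_inbranch() {
  const int in[8] = {-3, 1, -1, 4, 0, 5, -9, 2};
  int out[8];
  apply(in, out, 8);
  assert(out[0] == -3 && out[1] == 2 && out[3] == 8 && out[4] == 0);
}
```

The result is the same whether or not the loop is vectorized; only the generated code differs.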
|
|
Broadcast is a very common function. This should reduce compile-time
effort.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
PR libstdc++/108030
* include/experimental/bits/simd.h (__vector_broadcast):
Implement via __vector_broadcast_impl instead of
__call_with_n_evaluations + 2 lambdas.
(__vector_broadcast_impl): New.
(cherry picked from commit 2e29e2fbeb8936e5c85cefaf547cba42e17e137b)
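The general idea of replacing an "evaluate N times" helper plus lambdas with a single pack expansion can be sketched as follows. This is not the libstdc++ implementation: it uses std::array instead of GCC vector types, and 'broadcast' and 'broadcast_impl' are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical helper: expands the index pack once and repeats 'x' for
// every index, with no lambdas and no per-element function calls.
template <typename T, std::size_t... Is>
constexpr std::array<T, sizeof...(Is)>
broadcast_impl(T x, std::index_sequence<Is...>) {
  return {((void)Is, x)...};
}

// Hypothetical entry point: an N-element array with every element == x.
template <std::size_t N, typename T>
constexpr std::array<T, N> broadcast(T x) {
  return broadcast_impl(x, std::make_index_sequence<N>());
}

constexpr auto sevens = broadcast<4>(7);
static_assert(sevens[0] == 7 && sevens[3] == 7, "all elements equal x");
```

Fewer template instantiations per call is the presumed source of the compile-time saving; the sketch does not measure it.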
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_builtin.h (_S_set): Compare as
int. The actual range of these indexes is very small.
(cherry picked from commit ffa39f7120f6e83a567d7a83ff4437f6b41036ea)
|
|
Resolves -Wtautological-compare warnings about `if
(__builtin_is_constant_evaluated())` in the implementations of these
functions.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_x86.h (_S_bit_shift_left)
(_S_bit_shift_right): Declare constexpr. The implementation was
already expecting constexpr evaluation.
(cherry picked from commit fa37ac2b59ed1c379b35dbf9bd58f7849f9fd5b5)
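A minimal sketch of the pattern involved, assuming a compiler that provides '__builtin_is_constant_evaluated' (GCC 9 or later) and C++14 or later. The function 'shift_left' is a hypothetical stand-in, not the libstdc++ code: the check only makes sense in a function that can be constant-evaluated, which is why declaring it constexpr matters.

```cpp
#include <cassert>

// Hypothetical stand-in for a shift helper with separate paths for
// constant evaluation and for run time.
constexpr unsigned shift_left(unsigned x, unsigned n) {
  if (__builtin_is_constant_evaluated())
    return x * (1u << n);  // taken during constant evaluation
  return x << n;           // taken at run time
}

static_assert(shift_left(3u, 2u) == 12u, "usable in constant expressions");

// Run-time use of the same function.
void demo_shift() {
  volatile unsigned n = 5;  // volatile: keeps the call out of constant folding
  assert(shift_left(1u, n) == 32u);
}
```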
|
|
Fix a bug in which Fortran pointers inside derived types caused a runtime
error when Unified Shared Memory was active.
libgomp/ChangeLog:
* target.c (gomp_attach_pointer): Check for USM.
* testsuite/libgomp.fortran/usm-3.f90: New test.
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd.h (__extract_part, split):
Use reserved name for template parameter.
(cherry picked from commit bb920f561e983c64d146f173dc4ebc098441a962)
|
|
|
|
Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either the array descriptor or the deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-section mappings went wrong.
The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings, which have several issues beyond OpenMP support.
This is the OG12 variant of the submitted but unreviewed GCC 13/mainline
patch at https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612387.html
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.
|
|
When merging r13-4584-gb2e1c49b4a4 to OG12 as commit 58e0579ed87,
the 'align' handling seemingly ended up in the wrong clause.
(Result: libgomp.fortran/allocate-2a.f90 FAILED; now fixed.)
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Move align modifier
handling from OMP_LIST_ALLOCATOR to OMP_LIST_ALLOCATE.
|
|
|
|
This isn't required by the standard, but there's an LWG issue suggesting
to add it.
Also use __invoke_result instead of result_of, to match the spec in
recent standards.
libstdc++-v3/ChangeLog:
* include/bits/refwrap.h (reference_wrapper::operator()): Add
noexcept-specifier and use __invoke_result instead of result_of.
* testsuite/20_util/reference_wrapper/invoke-noexcept.cc: New test.
(cherry picked from commit e47df5eb56c4e7aca0d3e50826e5aaa1887fa446)
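The effect of a conditional noexcept-specifier on a forwarding call operator can be sketched with a stripped-down wrapper. This assumes C++17; 'ref_sketch', 'Quiet' and 'Loud' are hypothetical and this is not libstdc++'s reference_wrapper.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical minimal wrapper: forwards the call and is noexcept exactly
// when invoking the wrapped object with these arguments is noexcept.
template <typename T>
struct ref_sketch {
  T *ptr;

  template <typename... Args>
  std::invoke_result_t<T &, Args...> operator()(Args &&...args) const
      noexcept(std::is_nothrow_invocable_v<T &, Args...>) {
    return std::invoke(*ptr, std::forward<Args>(args)...);
  }
};

struct Quiet { int operator()(int x) const noexcept { return x + 1; } };
struct Loud  { int operator()(int x) const { return x + 1; } };

static_assert(noexcept(std::declval<ref_sketch<Quiet>>()(1)),
              "noexcept callee gives a noexcept wrapper call");
static_assert(!noexcept(std::declval<ref_sketch<Loud>>()(1)),
              "potentially-throwing callee does not");

// Run-time use: the wrapper forwards the call unchanged.
void demo_ref_sketch() {
  Quiet q;
  ref_sketch<Quiet> r{&q};
  assert(r(41) == 42);
}
```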
|
|
With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internals. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.
libstdc++-v3/ChangeLog:
PR libstdc++/108636
* include/bits/fs_path.h (path::path(string_view, _Type))
(path::_Cmpt::_Cmpt(string_view, _Type, size_t)): Move inline
definitions to ...
* src/c++17/fs_path.cc: ... here.
* testsuite/27_io/filesystem/path/108636.cc: New test.
(cherry picked from commit db8d6fc572ec316ccfcf70b1dffe3be0b1b37212)
|
|
|
|
Here we crash in is_capture_proxy:
/* Location wrappers should be stripped or otherwise handled by the
caller before using this predicate. */
gcc_checking_assert (!location_wrapper_p (decl));
We only crash with the redundant capture:
int abyPage = [=, abyPage] { ... }
because prune_lambda_captures is only called when there was a default
capture, and with [=] only abyPage won't be in LAMBDA_EXPR_CAPTURE_LIST.
The problem is that LAMBDA_CAPTURE_EXPLICIT_P wasn't propagated
correctly and so var_to_maybe_prune proceeded where it shouldn't.
Co-Authored by: Patrick Palka <ppalka@redhat.com>
PR c++/108829
gcc/cp/ChangeLog:
* pt.cc (prepend_one_capture): Set LAMBDA_CAPTURE_EXPLICIT_P.
(tsubst_lambda_expr): Pass LAMBDA_CAPTURE_EXPLICIT_P to
prepend_one_capture.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-108829-2.C: New test.
* g++.dg/cpp0x/lambda/lambda-108829.C: New test.
(cherry picked from commit 02d8ab3e4e2f3d9dc12157a98c976d6698e71e29)
|
|
Fix-up for og12 commit 842df187487f5b16ae29bbe7e9acd79661a9df48
"openmp: -foffload-memory=pinned". No functional change.
libgomp/
* libgomp_g.h (GOMP_enable_pinned_mode): New.
|
|
device: ChangeLog
... forgotten in og12 commit 4bd844f3e0202b3d083f0784f4343570c88bb86c
"Attempt to not just register but allocate OpenMP pinned memory using a device".
|
|
... instead of 'mmap' plus attempting to register using a device.
Implemented for nvptx offloading via 'cuMemHostAlloc'.
This re-works og12 commit a5a4800e92773da7126c00a9c79b172494d58ab5
"Attempt to register OpenMP pinned memory using a device instead of 'mlock'".
include/
* cuda/cuda.h (cuMemHostRegister, cuMemHostUnregister): Remove.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc): Add 'init0'
formal parameter. Adjust all users.
(linux_memspace_alloc, linux_memspace_free): Attempt to allocate
OpenMP pinned memory using a device instead of 'mmap' plus
attempting to register using a device.
* libgomp-plugin.h (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): Remove.
(GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_page_locked_host_free): New.
* libgomp.h (gomp_register_page_locked)
(gomp_unregister_page_locked): Remove.
(gomp_page_locked_host_alloc, gomp_page_locked_host_free): New.
(struct gomp_device_descr): Remove 'register_page_locked_func',
'unregister_page_locked_func'. Add 'page_locked_host_alloc_func',
'page_locked_host_free_func'.
* plugin/cuda-lib.def (cuMemHostRegister_v2, cuMemHostRegister)
(cuMemHostUnregister): Remove.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): Remove.
(GOMP_OFFLOAD_page_locked_host_alloc)
(GOMP_OFFLOAD_page_locked_host_free): New.
* target.c (gomp_register_page_locked)
(gomp_unregister_page_locked): Remove.
(gomp_page_locked_host_alloc, gomp_page_locked_host_free): Add.
(gomp_load_plugin_for_device): Don't handle
'register_page_locked', 'unregister_page_locked'. Handle
'page_locked_host_alloc', 'page_locked_host_free'.
Suggested-by: Andrew Stubbs <ams@codesourcery.com>
|
|
As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.
This patch fixes the constraint accordingly.
The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually assembles
when such support is available.
gcc/ChangeLog:
PR target/104921
* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
Use correct constraint for operand 3.
gcc/testsuite/ChangeLog:
PR target/104921
* gcc.target/aarch64/pr104921-1.c: New test.
* gcc.target/aarch64/pr104921-2.c: New test.
* gcc.target/aarch64/pr104921.x: Include file for new tests.
(cherry picked from commit 277e1f30a5e4e634304a7b8a532825119f0ea47f)
|
|
Merge up to r12-9189-gc6e3ecca0e3dcf567d0c843a4987e52591041372 (20th Feb 2023)
|
|
|
|
|
|
The multiarch tuple is encoded in file or directory names in
multiarch-aware distros, so one ABI should have only one multiarch
tuple. For example, "--target=loongarch64-linux-gnu --with-abi=lp64s"
and "--target=loongarch64-linux-gnusf" should both set multiarch tuple
to "loongarch64-linux-gnusf". Before this commit,
"--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib"
produced the wrong result (loongarch64-linux-gnu).
A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be
used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some
non-technical reason [1]. Note that we cannot make
"loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because
to implement such an alias, we must create thousands of symlinks in the
distro, and doing so would be completely impractical. This commit also
aligns GCC with the revision.
Tested by building cross compilers with --enable-multiarch and multiple
combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d},
and --{enable,disable}-multilib, then running "xgcc --print-multiarch" and
manually verifying the results.
[1]: https://github.com/loongson/LoongArch-Documentation/pull/80
gcc/ChangeLog:
* config.gcc (triplet_abi): Set its value based on $with_abi,
instead of $target.
(la_canonical_triplet): Set it after $triplet_abi is set
correctly.
* config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the
multiarch tuple for lp64d "loongarch64-linux-gnu" (without
"f64" suffix).
(cherry picked from commit 017849d9d88f021770a90f12fffec9aa2425ed27)
|
|
|
|
|
|
Implemented for nvptx offloading via 'cuMemHostRegister'. This means: (a) not
running into 'mlock' limitations, and (b) the device is aware of this and may
optimize host <-> device memory transfers.
This re-works og12 commit ab7520b3b4cd9fdabfd63652badde478955bd3b5
"libgomp: pinned memory".
include/
* cuda/cuda.h (cuMemHostRegister, cuMemHostUnregister): New.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_free, linux_memspace_realloc): Attempt to register
OpenMP pinned memory using a device instead of 'mlock'.
* libgomp-plugin.h (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): New.
* libgomp.h (gomp_register_page_locked)
(gomp_unregister_page_locked): New.
(struct gomp_device_descr): Add 'register_page_locked_func',
'unregister_page_locked_func'.
* plugin/cuda-lib.def (cuMemHostRegister_v2, cuMemHostRegister)
(cuMemHostUnregister): New.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_register_page_locked)
(GOMP_OFFLOAD_unregister_page_locked): New.
* target.c (gomp_register_page_locked)
(gomp_unregister_page_locked): New.
(gomp_load_plugin_for_device): Handle 'register_page_locked',
'unregister_page_locked'.
* testsuite/libgomp.c/alloc-pinned-1.c: Adjust.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-6.c: Likewise.
|
|
... to not run into a SIGSEGV if a non-'malloc'-based allocation is 'free'd
here.
Fix-up for og12 commit c5d1d7651297a273321154a5fe1b01eba9dcf604
"libgomp, nvptx: low-latency memory allocator".
libgomp/
* allocator.c (omp_realloc): Route 'free' through 'MEMSPACE_FREE'.
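Why the release has to go through a memspace-aware routine can be sketched with a toy allocator. Everything below is hypothetical (the 'memspace' enum, the bump arena, 'realloc_sketch') and is not libgomp's code; it only illustrates that memory which did not come from 'malloc' must not be handed to 'free'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

enum class memspace { heap, arena };  // hypothetical memory spaces

alignas(16) static unsigned char arena_buf[256];  // toy bump arena
static std::size_t arena_used = 0;
static int arena_frees = 0;  // counts releases routed to the arena

void *memspace_alloc(memspace m, std::size_t size) {
  if (m == memspace::heap) return std::malloc(size);
  std::size_t rounded = (size + 15) & ~std::size_t{15};  // keep alignment
  if (arena_used + rounded > sizeof arena_buf) return nullptr;
  void *p = arena_buf + arena_used;
  arena_used += rounded;
  return p;
}

void memspace_free(memspace m, void *p) {
  if (m == memspace::heap)
    std::free(p);
  else
    ++arena_frees;  // arena memory: calling free(p) here would be invalid
}

// Move a block to a (possibly different) memspace: allocate, copy, then
// release the old block through the memspace it came from.
void *realloc_sketch(void *old, std::size_t old_size, memspace old_m,
                     std::size_t new_size, memspace new_m) {
  void *fresh = memspace_alloc(new_m, new_size);
  if (!fresh) return nullptr;
  std::memcpy(fresh, old, old_size < new_size ? old_size : new_size);
  memspace_free(old_m, old);  // not plain free(old)
  return fresh;
}
```

With a plain 'free (old)' in place of 'memspace_free', the arena case would pass a pointer into a static buffer to the heap allocator, which is the kind of failure the commit message describes.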
|
|
Clarification for og12 commit ab7520b3b4cd9fdabfd63652badde478955bd3b5
"libgomp: pinned memory". No functional change.
libgomp/
* config/linux/allocator.c (linux_memspace_alloc)
(linux_memspace_calloc): Clarify zero-initialization for pinned
memory.
* testsuite/libgomp.c/alloc-pinned-1.c: Verify zero-initialization
for pinned memory.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-3.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-4.c: Likewise.
* testsuite/libgomp.c/alloc-pinned-5.c: Likewise.
|
|
'ompx_host_mem_space'
Clean-up for og12 commit 84914e197d91a67b3d27db0e4c69a433462983a5
"openmp, nvptx: ompx_unified_shared_mem_alloc". No functional change.
libgomp/
* config/linux/allocator.c (linux_memspace_calloc): Elide
(innocuous) duplicate 'if' condition.
* config/nvptx/allocator.c (nvptx_memspace_free): Explicitly
handle 'memspace == ompx_host_mem_space'.
* libgomp.h (gomp_is_usm_ptr): Remove.
|
|
In file included from [...]/libgomp/config/nvptx/allocator.c:49:
[...]/libgomp/config/nvptx/../../basic-allocator.c:52:2: error: invalid preprocessing directive #deine; did you mean #define?
52 | #deine BASIC_ALLOC_YIELD
| ^~~~~
| define
Yes, indeed.
Fix-up for og12 commit 9583738a62a33a276b2aad980a27e77097f95924
"nvptx, libgomp: Move the low-latency allocator code".
libgomp/
* basic-allocator.c (BASIC_ALLOC_YIELD): instead of '#deine',
'#define' it.
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_math.h (__hypot): Bitcasting
between scalars requires the __bit_cast helper function instead
of simd_bit_cast.
(cherry picked from commit a5de17d9120dde7e6598a05ea4d1556c2783c69b)
|
|
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
libstdc++-v3/ChangeLog:
* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_not_equal_to, _SimdImplX86::_S_less)
(_SimdImplX86::_S_less_equal): Do not call
__builtin_is_constant_evaluated in constexpr-if.
(cherry picked from commit 1fd3836463c65f695831ef04c7dbda1e7a1794ba)
|