Age | Commit message (Collapse) | Author | Files | Lines |
|
The nteams-var ICV exists per device and can be set either via the routine
omp_set_num_teams or as environment variable (OMP_NUM_TEAMS with optional
_ALL/_DEV/_DEV_<num> suffix); it is default-initialized to zero. The number
of teams created is described under the num_teams clause. If the clause is
absent, the number of teams is implementation defined but at least
one team must exist and, if nteams-var is positive, at most nteams-var
teams may exist.
The latter condition was not honored in a target region before this
commit, such that too many teams were created.
Already before this commit, both the num_teams([lower:]upper) clause
(on the host and in target regions) and, only on the host, the nteams-var
ICV were honored. And as only one teams is created for host fallback,
unless the clause specifies otherwise, the nteams-var ICV was and is
effectively honored.
libgomp/ChangeLog:
PR libgomp/109875
* config/gcn/target.c (GOMP_teams4): Honor nteams-var ICV.
* config/nvptx/target.c (GOMP_teams4): Likewise.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c: New test.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-2.c: New test.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-3.c: New test.
* testsuite/libgomp.c-c++-common/teams-nteams-icv-4.c: New test.
|
|
Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-sections mappings went wrong.
Additionally, the patch now unmaps for scalar allocatables/pointers
the GOMP_MAP_POINTER, avoiding stale mappings.
The testcases contain some comment-out tests which require follow-up
work and for which PR exist. Those mostly relate to deferred-length
strings which have several issues beyong OpenMP support.
gcc/fortran/ChangeLog:
* trans-decl.cc (gfc_get_symbol_decl): Add attributes
such as 'declare target' also to hidden artificial
variable for deferred-length character variables.
* trans-openmp.cc (gfc_trans_omp_array_section,
gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
Improve mapping of array descriptors and deferred-length
string variables.
gcc/ChangeLog:
* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
special case.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
'target exit data'.
* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.
gcc/testsuite/
* gfortran.dg/goacc/finalize-1.f: Update dg-tree; shows a fix
for 'finalize' as a ptr is now 'delete' instead of 'release'.
* gfortran.dg/gomp/pr78260-2.f90: Likewise as elem-size calc moved
to if (allocated) block
* gfortran.dg/gomp/target-exit-data.f90: Likewise as a var is now a
replaced by a MEM< _25 > expression.
* gfortran.dg/gomp/map-9.f90: Update dg-scan-tree-dump.
* gfortran.dg/gomp/map-10.f90: New test.
|
|
..., and enable if 'flock' is available for serializing execution testing.
Regarding the default of 19 parallel slots, this turned out to be a local
minimum for wall time when testing this on:
$ uname -srvi
Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
... in two configurations: case (a) standard configuration, no offloading
configured, case (b) offloading for GCN and nvptx configured but no devices
available. For both cases, default plus '-m32' variant.
$ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}"
Case (a), baseline:
6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k
6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k
This is what people have been complaining about, rightly so, in
<https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and
elsewhere.
Case (a), parallelized:
-j12 GCC_TEST_PARALLEL_SLOTS=10
3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k
-j15 GCC_TEST_PARALLEL_SLOTS=15
3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k
-j17 GCC_TEST_PARALLEL_SLOTS=17
3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k
-j18 GCC_TEST_PARALLEL_SLOTS=18
3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k
-j19 GCC_TEST_PARALLEL_SLOTS=19
3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20
3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k
-j23 GCC_TEST_PARALLEL_SLOTS=23
4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k
-j26 GCC_TEST_PARALLEL_SLOTS=26
3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k
-j32 GCC_TEST_PARALLEL_SLOTS=32
4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k
Yay!
Case (b), baseline; 2+ h:
7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k
Case (b), parallelized:
-j12 GCC_TEST_PARALLEL_SLOTS=10
7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k
-j15 GCC_TEST_PARALLEL_SLOTS=15
8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k
-j17 GCC_TEST_PARALLEL_SLOTS=17
8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k
-j18 GCC_TEST_PARALLEL_SLOTS=18
8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k
-j19 GCC_TEST_PARALLEL_SLOTS=19
9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20
9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k
-j23 GCC_TEST_PARALLEL_SLOTS=23
10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k
-j26 GCC_TEST_PARALLEL_SLOTS=26
11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k
-j32 GCC_TEST_PARALLEL_SLOTS=32
11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k
On my Dell Precision 7530 laptop:
$ uname -srvi
Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64
$ grep '^model name' < /proc/cpuinfo | uniq -c
12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
$ nvidia-smi -L
GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea)
... in two configurations: case (c) standard configuration, no offloading
configured, case (d) offloading for nvptx configured and device available.
For both cases, only default variant, no '-m32'.
$ \time make check-target-libgomp
Case (c), baseline; roughly half of case (a) (just one variant):
1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k
1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k
Case (c), parallelized:
-j12 GCC_TEST_PARALLEL_SLOTS=2
1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=6
1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k
1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=8
2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k
2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=10
2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k
2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=12
2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k
2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k
Case (d), baseline (compared to case (b): only nvptx offloading compilation,
but also nvptx offloading execution); ~1 h:
2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k
2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k
Case (d), parallelized:
-j12 GCC_TEST_PARALLEL_SLOTS=2
2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=6
3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k
3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=8
3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=10
4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k
-j12 GCC_TEST_PARALLEL_SLOTS=12
4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k
-j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe]
4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k
Worth noting is that with nvptx offloading, there is one execution test case
that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively
stalls progress for almost 5 min: quickly other executions test cases queue up
on the lock for all parallel slots. That's working as expected; just noting
this as it accordingly does skew the wall time numbers.
PR testsuite/66005
libgomp/
* configure.ac: Look for 'flock'.
* testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing.
* testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here...
* testsuite/lib/libgomp.exp: ... but here, instead.
(libgomp_load): Override for parallel testing.
* testsuite/libgomp-site-extra.exp.in (FLOCK): Set.
* configure: Regenerate.
* Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
|
|
..., while still hard-coding the number of parallel slots to one.
PR testsuite/66005
libgomp/
* testsuite/Makefile.am (PWD_COMMAND): New variable.
(%/site.exp): New target.
(check_p_numbers0, check_p_numbers1, check_p_numbers2)
(check_p_numbers3, check_p_numbers4, check_p_numbers5)
(check_p_numbers6, check_p_numbers, gcc_test_parallel_slots)
(check_p_subdirs)
(check_DEJAGNU_libgomp_targets): New variables.
($(check_DEJAGNU_libgomp_targets)): New target.
($(check_DEJAGNU_libgomp_targets)): New dependency.
(check-DEJAGNU $(check_DEJAGNU_libgomp_targets)): New targets.
* testsuite/Makefile.in: Regenerate.
* testsuite/lib/libgomp.exp: For parallel testing,
'load_file ../libgomp-test-support.exp'.
Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
|
|
[PR91884]
..., that is, 'GCC_UNDER_TEST', 'GXX_UNDER_TEST', 'GFORTRAN_UNDER_TEST' instead
of 'GCC_UNDER_TEST' for all of them. No need anymore for 'gcc -lstdc++ -x c++'
for C++ code, or 'gcc -lgfortran' plus conditional '-lquadmath' for Fortran
code. (Getting rid of explicit '-foffload=-lgfortran' is for another day.)
PR testsuite/91884
libgomp/
* configure.ac: 'AC_SUBST(CXX)'.
* configure: Regenerate.
* Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
* testsuite/libgomp-site-extra.exp.in (GXX_UNDER_TEST)
(GFORTRAN_UNDER_TEST): Set.
* testsuite/lib/libgomp.exp (libgomp_init): Adjust.
* testsuite/libgomp.c++/c++.exp: Use 'GXX_UNDER_TEST'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.fortran/fortran.exp: Use
'GFORTRAN_UNDER_TEST'.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
|
|
..., which is still 'GCC_UNDER_TEST' for all of them; no change in behavior.
PR testsuite/91884
libgomp/
* testsuite/lib/libgomp.exp (libgomp_target_compile): Don't
specify compiler.
* testsuite/libgomp.c++/c++.exp (ALWAYS_CFLAGS): Specify compiler.
* testsuite/libgomp.c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.graphite/graphite.exp (ALWAYS_CFLAGS):
Likewise.
* testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS):
Likewise.
|
|
When offloading was enabled, top-level 'asm' were added to the offloading
section, confusing assemblers which did not support the syntax. Additionally,
with offloading and -flto, the top-level assembler code did not end up
in the host files.
As r14-321-g9a41d2cdbcd added top-level 'asm' to one libstdc++ header file,
the issue became more apparent, causing fails with nvptx for some
C++ testcases.
PR libstdc++/109816
gcc/ChangeLog:
* lto-cgraph.cc (output_symtab): Guard lto_output_toplevel_asms by
'!lto_stream_offload_p'.
libgomp/ChangeLog:
* testsuite/libgomp.c++/target-map-class-1.C: New test.
* testsuite/libgomp.c++/target-map-class-2.C: New test.
|
|
'lang_library_paths'
..., and use that for libquadmath, too.
libgomp/
* testsuite/lib/libgomp.exp (libgomp_target_compile): Generalize
'lang_library_path' into a list of 'lang_library_paths'.
* testsuite/libgomp.c++/c++.exp: Adjust.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.fortran/fortran.exp: Adjust. Use that for
libquadmath, too.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
|
|
Instead, 'return' early from the '*.exp' files that we're not able to test.
Also, change 'puts' into 'verbose -log'. While re-indenting the previous
'if { $lang_test_file_found } { [...] }' code, also simplify 'ld_library_path'
setup.
libgomp/
* testsuite/lib/libgomp.exp (libgomp_target_compile): Don't look
at 'lang_test_file_found'.
* testsuite/libgomp.c++/c++.exp: Don't set and use it, and instead
'return' early if not able to test. Simplify 'ld_library_path' setup.
* testsuite/libgomp.fortran/fortran.exp: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
|
|
libgomp/
* testsuite/libgomp.c++/c++.exp: Resolve 'lang_test_file_found'
first.
* testsuite/libgomp.fortran/fortran.exp: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
|
|
..., instead of letting them bleed into the next '*.exp' file, requiring
clean-up there.
libgomp/
* testsuite/libgomp.c++/c++.exp: Localize 'lang_[...]' etc.
* testsuite/libgomp.c/c.exp: Likewise.
* testsuite/libgomp.fortran/fortran.exp: Likewise.
* testsuite/libgomp.graphite/graphite.exp: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
|
|
The value of 'lang_test_file' isn't actually used anywhere.
libgomp/
* testsuite/libgomp.c++/c++.exp: Don't set 'lang_test_file'.
* testsuite/libgomp.fortran/fortran.exp: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
* testsuite/libgomp.c/c.exp: Unset 'lang_test_file_found' instead of
'lang_test_file'.
* testsuite/libgomp.oacc-c/c.exp: Likewise.
* testsuite/libgomp.graphite/graphite.exp: Likewise.
* testsuite/lib/libgomp.exp (libgomp_target_compile): Look for
'lang_test_file_found' instead of 'lang_test_file'.
|
|
(It is unclear to me why the current working directory needs to be in
'LD_LIBRARY_PATH'; leaving that alone for now.)
libgomp/
* testsuite/lib/libgomp.exp (libgomp_init): Only use 'blddir' if
set.
* testsuite/libgomp.c++/c++.exp: Likewise.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
|
|
It has already been set in 'libgomp/testsuite/lib/libgomp.exp:libgomp_init'.
libgomp/
* testsuite/libgomp.c++/c++.exp (blddir): Don't set.
* testsuite/libgomp.oacc-c++/c++.exp (blddir): Likewise.
|
|
With nvptx offloading configured, and supported, and CUDA available:
$ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c"
[...]
Running [...]/libgomp.oacc-c/c.exp ...
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2
Running [...]/libgomp.oacc-c++/c++.exp ...
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors)
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors)
PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2
[...]
..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED:
$ make check-target-libgomp RUNTESTFLAGS_="--all c++.exp=context-1.c"
[...]
Running [...]/libgomp.oacc-c++/c++.exp ...
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2
UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2
[...]
That is, if 'c.exp' executes first, it does successfully evaluate
'dg-require-effective-target openacc_cublas' -- and does cache this result (so
it isn't reevaluated for 'c++.exp'). However, for 'c++.exp' alone (that is,
without the 'c.exp' result cached), we run into:
spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...]
In file included from /usr/include/cuda_fp16.h:3673,
from /usr/include/cublas_api.h:75,
from /usr/include/cublas_v2.h:65,
from openacc_cublas2311907.c:3:
/usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory
We're missing include paths to C++/libstdc++ build-tree headers.
Fix this by using the mechanism introduced for Fortran in
r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re
"libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}".
libgomp/
* testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead
of 'libstdcxx_includes'.
* testsuite/libgomp.oacc-c++/c++.exp: Likewise.
|
|
This patch moves several tests introduced by the following patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html
commit r14-325-gcacf65d74463600815773255e8b82b4043432bd7
into the proper location for OpenACC testing (thanks to Thomas for
spotting my mistake!), and also fixes a few additional problems --
missing diagnostics for non-pointer attaches, and a case where a pointer
was incorrectly dereferenced. Tests are also adjusted for vector-length
warnings on nvidia accelerators.
2023-04-29 Julian Brown <julian@codesourcery.com>
PR fortran/109622
gcc/fortran/
* openmp.cc (resolve_omp_clauses): Add diagnostic for
non-pointer/non-allocatable attach/detach.
* trans-openmp.cc (gfc_trans_omp_clauses): Remove dereference for
pointer-to-scalar derived type component attach/detach. Fix
attach/detach handling for descriptors.
gcc/testsuite/
* gfortran.dg/goacc/pr109622-5.f90: New test.
* gfortran.dg/goacc/pr109622-6.f90: New test.
libgomp/
* testsuite/libgomp.fortran/pr109622.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622.f90: ...to here. Ignore
vector length warning.
* testsuite/libgomp.fortran/pr109622-2.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622-2.f90: ...to here. Add
missing copyin/copyout variable. Ignore vector length warnings.
* testsuite/libgomp.fortran/pr109622-3.f90: Move test...
* testsuite/libgomp.oacc-fortran/pr109622-3.f90: ...to here. Ignore
vector length warnings.
* testsuite/libgomp.oacc-fortran/pr109622-4.f90: New test.
|
|
This patch fixes several cases where multiple attach or detach mapping
nodes were being created for stand-alone attach or detach clauses
in Fortran. After the introduction of stricter checking later during
compilation, these extra nodes could cause ICEs, as seen in the PR.
The patch also fixes cases that "happened to work" previously where
the user attaches/detaches a pointer to array using a descriptor, and
(I think!) the "_data" field has offset zero, hence the same address as
the descriptor as a whole.
2023-04-27 Julian Brown <julian@codesourcery.com>
PR fortran/109622
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Attach/detach clause fixes.
gcc/testsuite/
* gfortran.dg/goacc/attach-descriptor.f90: Adjust expected output.
libgomp/
* testsuite/libgomp.fortran/pr109622.f90: New test.
* testsuite/libgomp.fortran/pr109622-2.f90: New test.
* testsuite/libgomp.fortran/pr109622-3.f90: New test.
|
|
While OpenMP 5.0 required a single structured block before and after the
'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence,
denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or
more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but
till requires the '{' ... '}' for the final-loop-body) and updates Fortran
to accept zero or more than one statements.
If there is no preceeding or succeeding executable statement, a warning is
shown.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_scan_loop_body): Handle
zero exec statements before/after 'omp scan'.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_resolve_omp_do_blocks): Handle zero
or more than one exec statements before/after 'omp scan'.
* trans-openmp.cc (gfc_trans_omp_do): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/scan-1.c: New test.
* testsuite/libgomp.c/scan-23.c: New test.
* testsuite/libgomp.fortran/scan-2.f90: New test.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning.
* gfortran.dg/gomp/loop-2.f90: Likewise.
* gfortran.dg/gomp/reduction5.f90: Likewise.
* gfortran.dg/gomp/reduction6.f90: Likewise.
* gfortran.dg/gomp/scan-1.f90: Likewise.
* gfortran.dg/gomp/taskloop-2.f90: Likewise.
* c-c++-common/gomp/scan-6.c: New test.
* gfortran.dg/gomp/scan-8.f90: New test.
|
|
The patch that introduced the weak_undefined effective-target keyword
and corresponding dg-add-options support
commit 378ec7b87a5265dbe2d489c245fac98ef37fa638
Author: Alexandre Oliva <oliva@adacore.com>
Date: Thu Mar 23 00:45:05 2023 -0300
[testsuite] test for weak_undefined support and add options
badly broke the affected tests on macOS like so:
ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined "
ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined "
add_options_for_weak_undefined tries to call an non-existant proc "89".
Even after fixing this by escaping the brackets, two tests still failed to
link since they lacked the corresponding calls do dg-add-options
weak_undefined.
Tested on x86_64-apple-darwin20.6.0 and i386-pc-solaris2.11.
2023-03-27 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
* lib/target-supports.exp (add_options_for_weak_undefined): Escape
brackets.
* gcc.dg/visibility-22.c: Add weak_undefined options.
libgomp:
* testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Add
weak_undefined options.
|
|
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec',
'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in
particular conceptually: no more device memory allocation, host to device data
copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for
us.
This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd
"Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data",
where I said that "a use will emerge later", which is this one here.
PR libgomp/90596
libgomp/
* target.c (gomp_map_vars_internal): Allow for
'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'.
* oacc-parallel.c (GOACC_parallel_keyed): Pass
'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'.
* plugin/plugin-gcn.c (alloc_by_agent, gcn_exec)
(GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec):
Adjust, simplify.
(gomp_offload_free): Remove.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify.
(cuda_free_argmem): Remove.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Adjust.
|
|
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then
simplify the device plugins accordingly.
This is a follow-up to
Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1)
"Add OpenACC 2.6's no_create",
Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090)
"Use gomp_map_val for OpenACC host-to-device address translation".
libgomp/
* target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for
'GOMP_MAP_IF_PRESENT'.
* plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Adjust.
* plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec)
(GOMP_OFFLOAD_openacc_async_exec): Likewise.
* testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async'
testing.
* testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
|
|
'libgomp.oacc-c-c++-common/data-3.c'
... that I almost broke with later implementation changes.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-3.c: Document/verify
another aspect of OpenACC 'async' semantics.
|
|
For an OpenACC compute construct, we've currently got:
- [...]
- acc_ev_enqueue_launch_start
- launch kernel
- free memory
- acc_ev_free
- acc_ev_enqueue_launch_end
This confused another thing that I'm working on, so I adjusted that to:
- [...]
- acc_ev_enqueue_launch_start
- launch kernel
- acc_ev_enqueue_launch_end
- free memory
- acc_ev_free
Correspondingly, verify 'acc_ev_alloc', 'acc_ev_free' in
'libgomp.oacc-c-c++-common/acc_prof-parallel-1.c'.
libgomp/
* plugin/plugin-gcn.c (gcn_exec): Fix 'acc_ev_enqueue_launch_end'
position.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c:
Verify 'acc_ev_alloc', 'acc_ev_free'.
|
|
When OMP_WAIT_POLICY is not specified, current implementation will cause
icv flag GOMP_ICV_WAIT_POLICY unset, so global variable wait_policy
will remain its uninitialized value. Initialize it to -1 to make
GOMP_SPINCOUNT behavior consistent with its description.
libgomp/ChangeLog:
PR libgomp/109062
* env.c (wait_policy): Initialize to -1.
(initialize_icvs): Initialize icvs->wait_policy to -1.
* testsuite/libgomp.c-c++-common/pr109062.c: New test.
|
|
Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.
2023-03-02 Kwok Cheung Yeung <kcy@codesourcery.com>
Paul-Antoine Arras <pa@codesourcery.com>
gcc/
* builtins.cc (mathfn_built_in_explicit): New.
* config/gcn/gcn.cc: Include case-cfn-macros.h.
(mathfn_built_in_explicit): Add prototype.
(gcn_vectorize_builtin_vectorized_function): New.
(gcn_libc_has_function): New.
(TARGET_LIBC_HAS_FUNCTION): Define.
(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.
gcc/testsuite/
* gcc.target/gcn/simd-math-1.c: New testcase.
* gcc.target/gcn/simd-math-2.c: New testcase.
libgomp/
* testsuite/libgomp.c/simd-math-1.c: New testcase.
|
|
For is_device_ptr, optional checks should only be done before calling
libgomp, afterwards they are NULL either because of absent or, by
chance, because it is unallocated or unassociated (for pointers/allocatables).
Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.
PR middle-end/108546
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.
|
|
'gcc/testsuite/lib/target-supports.exp:check_compile' and elsewhere
I noticed that GCC/Rust recently lost all LTO variants in torture testing:
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O0 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O1 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 (test for excess errors)
-PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
-PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O3 -g (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -Os (test for excess errors)
Etc.
The reason is that when probing for availability of LTO, we run into:
spawn [...]/build-gcc/gcc/testsuite/rust/../../gccrs -B[...]/build-gcc/gcc/testsuite/rust/../../ -fdiagnostics-plain-output -frust-incomplete-and-experimental-compiler-do-not-use -flto -c -o lto8274.o lto8274.c
cc1: warning: command-line option '-frust-incomplete-and-experimental-compiler-do-not-use' is valid for Rust but not for C
For GCC/Rust testing, this flag is (as of recently) defaulted in
'gcc/testsuite/lib/rust.exp:rust_init':
lappend ALWAYS_RUSTFLAGS "additional_flags=-frust-incomplete-and-experimental-compiler-do-not-use"
A few more "command-line option [...] is valid for [...] but not for [...]"
instances were found in the test suite logs, when more than one language is
involved.
With '-Wno-complain-wrong-lang' used in
'gcc/testsuite/lib/target-supports.exp:check_compile', we get back:
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O0 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O1 (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 (test for excess errors)
+PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fno-use-linker-plugin -flto-partition=none (test for excess errors)
+PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -O3 -g (test for excess errors)
PASS: rust/compile/torture/all_doc_comment_line_blocks.rs -Os (test for excess errors)
Etc., and in total:
=== rust Summary for unix ===
# of expected passes [-4990-]{+6718+}
# of expected failures [-39-]{+51+}
Anything that 'gcc/opts-global.cc:complain_wrong_lang' might do is cut
short by '-Wno-complain-wrong-lang', not just the one 'warning'
diagnostic. This corresponds to what already exists via
'lang_hooks.complain_wrong_lang_p'.
The 'gcc/opts-common.cc:prune_options' changes follow the same rationale
as PR67640 "driver passes -fdiagnostics-color= always last": we need to
process '-Wno-complain-wrong-lang' early, so that it properly affects
other options appearing before it on the command line.
gcc/
* common.opt (-Wcomplain-wrong-lang): New.
* doc/invoke.texi (-Wno-complain-wrong-lang): Document it.
* opts-common.cc (prune_options): Handle it.
* opts-global.cc (complain_wrong_lang): Use it.
gcc/testsuite/
* gcc.dg/Wcomplain-wrong-lang-1.c: New.
* gcc.dg/Wcomplain-wrong-lang-2.c: Likewise.
* gcc.dg/Wcomplain-wrong-lang-3.c: Likewise.
* gcc.dg/Wcomplain-wrong-lang-4.c: Likewise.
* gcc.dg/Wcomplain-wrong-lang-5.c: Likewise.
* lib/target-supports.exp (check_compile): Use
'-Wno-complain-wrong-lang'.
* g++.dg/abi/empty12.C: Likewise.
* g++.dg/abi/empty13.C: Likewise.
* g++.dg/abi/empty14.C: Likewise.
* g++.dg/abi/empty15.C: Likewise.
* g++.dg/abi/empty16.C: Likewise.
* g++.dg/abi/empty17.C: Likewise.
* g++.dg/abi/empty18.C: Likewise.
* g++.dg/abi/empty19.C: Likewise.
* g++.dg/abi/empty22.C: Likewise.
* g++.dg/abi/empty25.C: Likewise.
* g++.dg/abi/empty26.C: Likewise.
* gfortran.dg/bind-c-contiguous-1.f90: Likewise.
* gfortran.dg/bind-c-contiguous-4.f90: Likewise.
* gfortran.dg/bind-c-contiguous-5.f90: Likewise.
libgomp/
* testsuite/libgomp.fortran/alloc-10.f90: Use
'-Wno-complain-wrong-lang'.
* testsuite/libgomp.fortran/alloc-11.f90: Likewise.
* testsuite/libgomp.fortran/alloc-7.f90: Likewise.
* testsuite/libgomp.fortran/alloc-9.f90: Likewise.
* testsuite/libgomp.fortran/allocate-1.f90: Likewise.
* testsuite/libgomp.fortran/depend-4.f90: Likewise.
* testsuite/libgomp.fortran/depend-5.f90: Likewise.
* testsuite/libgomp.fortran/depend-6.f90: Likewise.
* testsuite/libgomp.fortran/depend-7.f90: Likewise.
* testsuite/libgomp.fortran/depend-inoutset-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90:
Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90:
Likewise.
* testsuite/libgomp.fortran/order-reproducible-1.f90: Likewise.
* testsuite/libgomp.fortran/order-reproducible-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Likewise.
* testsuite/libgomp.fortran/task-detach-6.f90: Remove left-over
'dg-prune-output'.
|
|
I saw
FAIL: libgomp.fortran/target-nowait-array-section.f90 -O execution test
in my last x86_64-linux bootstrap. From quick skimming, it might be just
unreliable test, which assumes that asynchronous execution wouldn't produce
ordered sequence, but can't it happen even with asynchronous execution?
That said, while skimming the test, I've noticed a comment typo and
this patch fixes that up.
2023-02-16 Jakub Jelinek <jakub@redhat.com>
* testsuite/libgomp.fortran/target-nowait-array-section.f90: Fix
comment typo and improve its wording.
|
|
libgomp/
* target.c (gomp_target_rev): Dereference ptr
to get device address.
* testsuite/libgomp.fortran/reverse-offload-5.f90: Add test
for unallocated allocatable.
|
|
As GOMP_MAP_ALWAYS_POINTER operates on the previous map item, ensure that
with 'target enter data' both are passed together to gomp_map_vars_internal.
libgomp/ChangeLog:
* target.c (gomp_map_vars_internal): Add 'i > 0' before doing a
kind check.
(GOMP_target_enter_exit_data): If the next map item is
GOMP_MAP_ALWAYS_POINTER map it together with the current item.
* testsuite/libgomp.fortran/target-enter-data-3.f90: New test.
|
|
This patch ensures that loop bounds depending on outer loop vars use the
proper TREE_VEC format. It additionally gives a sorry if such an outer
var has a non-one/non-minus-one increment as currently a count variable
is used in this case (see PR).
Finally, it avoids 'count' and just uses a local loop variable if the
step increment is +/-1.
PR fortran/107424
gcc/fortran/ChangeLog:
* trans-openmp.cc (struct dovar_init_d): Add 'sym' and
'non_unit_incr' members.
(gfc_nonrect_loop_expr): New.
(gfc_trans_omp_do): Call it; use normal loop bounds
for unit stride - and only create local loop var.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/non-rectangular-loop-1.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-1a.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-2.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-3.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-4.f90: New test.
* testsuite/libgomp.fortran/non-rectangular-loop-5.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/privatization-1-compute-loop.f90: Update dg-note.
* gfortran.dg/goacc/privatization-1-routine_gang-loop.f90: Likewise.
|
|
Fix-up for recent commit 0b1ce70a813b98ef2893779d14ad6c90c5d06a71
"libgomp: Fix reverse offload issues".
libgomp/
* testsuite/libgomp.fortran/reverse-offload-6.f90: Fix nvptx
offloading compilation.
|
|
If there is nothing to map, skip the mapping and avoid attempting to
copy 0 bytes from addrs, sizes and kinds.
Additionally, it could happen that a non-allocated address was deallocated,
such as a pointer set, leading to a free for the actual data.
libgomp/
* target.c (gomp_target_rev): Handle mapnum == 0 and avoid
freeing not allocated memory.
* testsuite/libgomp.fortran/reverse-offload-6.f90: New test.
|
|
gcc/fortran/ChangeLog:
* openmp.cc (resolve_omp_clauses): Check also for
power of two.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-3.f90: Fix ALIGN
usage, remove unused -fdump-tree-original.
* testsuite/libgomp.fortran/allocate-4.f90: New.
|
|
gcc/fortran/ChangeLog:
PR fortran/108558
* trans-openmp.cc (gfc_split_omp_clauses): Handle has_device_addr.
libgomp/ChangeLog:
PR fortran/108558
* testsuite/libgomp.fortran/has_device_addr.f90: New test.
|
|
expand_omp_for_init_counts was using for the case where collapse(2)
inner loop has init expression dependent on non-constant multiple of
the outer iterator and the condition upper bound expression doesn't
depend on the outer iterator fold_unary (NEGATE_EXPR, ...). This
will just return NULL if it can't be folded, we need fold_build1
instead.
2023-01-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108459
* omp-expand.cc (expand_omp_for_init_counts): Use fold_build1 rather
than fold_unary for NEGATE_EXPR.
* testsuite/libgomp.c/pr108459.c: New test.
|
|
|
|
The comment in the loop says that we shouldn't add a map clause if such
a clause exists already, but the loop was actually using OMP_CLAUSE_DECL
on any clause. Target construct can have various clauses which don't
have OMP_CLAUSE_DECL at all (e.g. nowait, device or if) or clause
where it means something different (e.g. privatization clauses, allocate,
depend).
So, only check OMP_CLAUSE_DECL on OMP_CLAUSE_MAP clauses.
2023-01-05 Jakub Jelinek <jakub@redhat.com>
PR c++/108286
* semantics.cc (finish_omp_target_clauses): Ignore clauses other than
OMP_CLAUSE_MAP.
* testsuite/libgomp.c++/pr108286.C: New test.
|
|
DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR
of this->field used just during gimplification and omp lowering/expansion
to privatize individual fields in methods when needed.
As the following testcase shows, when not in templates, they were handled
right, but in templates we actually called cp_finish_decl on them and
that can result in their destruction, which is obviously undesirable,
we should only destruct the privatized copies of them created in omp
lowering.
Fixed thusly.
2022-12-21 Jakub Jelinek <jakub@redhat.com>
PR c++/108180
* pt.cc (tsubst_expr): Don't call cp_finish_decl on
DECL_OMP_PRIVATIZED_MEMBER vars.
* testsuite/libgomp.c++/pr108180.C: New test.
|
|
Commit r13-4716-ge205ec03f0794aeac3e8a89e947c12624d5a274e accidentally
included a testcase of another patch that is pending review:
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/608401.html
libgomp/
PR libfortran/108056
* testsuite/libgomp.fortran/allocate-4.f90: Remove
accidentally added file.
|
|
code [PR108056]
Since GCC 12, the conversion between the array descriptors formats - the
internal (GFC) and the C binding one (CFI) - moved to the compiler itself
such that the cfi_desc_to_gfc_desc/gfc_desc_to_cfi_desc functions are only
used with older code (GCC 9 to 11). The newly added checks caused asserts
as older code did not pass the proper values (e.g. real(4) as effective
argument arrived as BT_ASSUME type as the effective type got lost inbetween).
As proposed in the PR, revert to the GCC 11 version - known bugs is better
than some fixes and new issues. Still, GCC 12 is much better in terms of
TS29113 support and should really be used.
This patch uses the current libgomp version of the GCC 11 branch, except
it fixes the GFC version number (which is 0), uses calloc instead of malloc,
and sets the lower bound to 1 instead of keeping it as is for
CFI_attribute_other.
libgfortran/ChangeLog:
PR libfortran/108056
* runtime/ISO_Fortran_binding.c (cfi_desc_to_gfc_desc,
gfc_desc_to_cfi_desc): Mostly revert to GCC 11 version for
those backward-compatiblity-only functions.
|
|
This patch fixes a case where a combined directive (e.g. "!$omp target
parallel ...") contains both a map and a firstprivate clause for the
same variable. When the combined directive is split into two nested
directives, the outer "target" gets the "map" clause, and the inner
"parallel" gets the "firstprivate" clause, like so:
!$omp target parallel map(x) firstprivate(x)
-->
!$omp target map(x)
!$omp parallel firstprivate(x)
...
When there is no map of the same variable, the firstprivate is distributed
to both directives, e.g. for 'y' in:
!$omp target parallel map(x) firstprivate(y)
-->
!$omp target map(x) firstprivate(y)
!$omp parallel firstprivate(y)
...
This is not a recent regression, but appear to fix a long-standing ICE.
(The included testcase is based on one by Tobias.)
2022-12-06 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* trans-openmp.cc (gfc_add_firstprivate_if_unmapped): New function.
(gfc_split_omp_clauses): Call above.
libgomp/
* testsuite/libgomp.fortran/combined-directive-splitting-1.f90: New
test.
|
|
This commit enabled reverse offload for nvptx such that gomp_target_rev
actually gets called. And it fills the latter function to do all of
the following: finding the host function to the device func ptr and
copying the arguments to the host, processing the mapping/firstprivate,
calling the host function, copying back the data and freeing as needed.
The data handling is made easier by assuming that all host variables
either existed before (and are in the mapping) or that those are
devices variables not yet available on the host. Thus, the reverse
mapping can do without refcounts etc. Note that the spec disallows
inside a target region device-affecting constructs other than target
plus ancestor device-modifier and it also limits the clauses permitted
on this construct.
For the function addresses, an additional splay tree is used; for
the lookup of mapped variables, the existing splay-tree is used.
Unfortunately, its data structure requires a full walk of the tree;
Additionally, the just mapped variables are recorded in a separate
data structure an extra lookup. While the lookup is slow, assuming
that only few variables get mapped in each reverse offload construct
and that reverse offload is the exception and not performance critical,
this seems to be acceptable.
libgomp/ChangeLog:
* libgomp.h (struct target_mem_desc): Predeclare; move
below after 'reverse_splay_tree_node' and add rev_array
member.
(struct reverse_splay_tree_key_s, reverse_splay_compare): New.
(reverse_splay_tree_node, reverse_splay_tree,
reverse_splay_tree_key): New typedef.
(struct gomp_device_descr): Add mem_map_rev member.
* oacc-host.c (host_dispatch): NULL init .mem_map_rev.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices): Claim
support for GOMP_REQUIRES_REVERSE_OFFLOAD.
* splay-tree.h (splay_tree_callback_stop): New typedef; like
splay_tree_callback but returning int not void.
(splay_tree_foreach_lazy): Define; like splay_tree_foreach but
taking splay_tree_callback_stop as argument.
* splay-tree.c (splay_tree_foreach_internal_lazy,
splay_tree_foreach_lazy): New; but early exit if callback returns
nonzero.
* target.c: Instatiate splay_tree_c with splay_tree_prefix 'reverse'.
(gomp_map_lookup_rev): New.
(gomp_load_image_to_device): Handle reverse-offload function
lookup table.
(gomp_unload_image_from_device): Free devicep->mem_map_rev.
(struct gomp_splay_tree_rev_lookup_data, gomp_splay_tree_rev_lookup,
gomp_map_rev_lookup, struct cpy_data, gomp_map_cdata_lookup_int,
gomp_map_cdata_lookup): New auxiliary structs and functions for
gomp_target_rev.
(gomp_target_rev): Implement reverse offloading and its mapping.
(gomp_target_init): Init current_device.mem_map_rev.root.
* testsuite/libgomp.fortran/reverse-offload-2.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-3.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-4.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5.f90: New test.
* testsuite/libgomp.fortran/reverse-offload-5a.f90: New test without
mapping of on-device allocated variables.
|
|
gcc/fortran/ChangeLog:
* dump-parse-tree.cc (show_omp_namelist): Improve OMP_LIST_ALLOCATE
output.
* gfortran.h (struct gfc_omp_namelist): Add 'align' to 'u'.
(gfc_free_omp_namelist): Add bool arg.
* match.cc (gfc_free_omp_namelist): Likewise; free 'u.align'.
* openmp.cc (gfc_free_omp_clauses, gfc_match_omp_clause_reduction,
gfc_match_omp_flush): Update call.
(gfc_match_omp_clauses): Match 'align/allocate modifers in
'allocate' clause.
(resolve_omp_clauses): Resolve align.
* st.cc (gfc_free_statement): Update call
* trans-openmp.cc (gfc_trans_omp_clauses): Handle 'align'.
libgomp/ChangeLog:
* libgomp.texi (5.1 Impl. Status): Split allocate clause/directive
item about 'align'; mark clause as 'Y' and directive as 'N'.
* testsuite/libgomp.fortran/allocate-2.f90: New test.
* testsuite/libgomp.fortran/allocate-3.f90: New test.
|
|
omp_{gs}et_teams_thread_limit on offload devices
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order to reuse device-specific ICV values, a copy back mechanism is
implemented that copies ICV values back from device to the host.
Additionally, a limitation of the number of teams on gcn offload devices is
implemented. The number of teams is limited by twice the number of compute
units (one team is executed on one compute unit). This avoids queueing
unnessecary many teams and a corresponding allocation of large amounts of
memory. Without that limitation the memory allocation for a large number of
user-specified teams can result in an "memory access fault".
A limitation of the number of teams is already also implemented for nvptx
devices (see nvptx_adjust_launch_bounds in libgomp/plugin/plugin-nvptx.c).
gcc/ChangeLog:
* gimplify.cc (optimize_target_teams): Set initial num_teams_upper
to "-2" instead of "1" for non-existing num_teams clause in order to
disambiguate from the case of an existing num_teams clause with value 1.
libgomp/ChangeLog:
* config/gcn/icv-device.c (omp_get_teams_thread_limit): Added to
allow processing of device-specific values.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* config/nvptx/icv-device.c (omp_get_teams_thread_limit): Likewise.
(omp_set_teams_thread_limit): Likewise.
(ialias): Likewise.
* icv-device.c (omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
(omp_set_teams_thread_limit): Likewise.
* icv.c (omp_set_teams_thread_limit): Removed.
(omp_get_teams_thread_limit): Likewise.
(ialias): Likewise.
* libgomp.texi: Updated documentation for nvptx and gcn corresponding
to the limitation of the number of teams.
* plugin/plugin-gcn.c (limit_teams): New helper function that limits
the number of teams by twice the number of compute units.
(parse_target_attributes): Limit the number of teams on gcn offload
devices.
* target.c (get_gomp_offload_icvs): Added teams_thread_limit_var
handling.
(gomp_load_image_to_device): Added a size check for the ICVs struct
variable.
(gomp_copy_back_icvs): New function that is used in GOMP_target_ext to
copy back the ICV values from device to host.
(GOMP_target_ext): Update the number of teams and threads in the kernel
args also considering device-specific values.
* testsuite/libgomp.c-c++-common/icv-4.c: Fixed an error in the reading
of OMP_TEAMS_THREAD_LIMIT from the environment.
* testsuite/libgomp.c-c++-common/icv-5.c: Extended.
* testsuite/libgomp.c-c++-common/icv-6.c: Extended.
* testsuite/libgomp.c-c++-common/icv-7.c: Extended.
* testsuite/libgomp.c-c++-common/icv-9.c: New test.
* testsuite/libgomp.fortran/icv-5.f90: New test.
* testsuite/libgomp.fortran/icv-6.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/target-teams-1.c: Adapt expected values for
num_teams from "1" to "-2" in cases without num_teams clause.
* g++.dg/gomp/target-teams-1.C: Likewise.
* gfortran.dg/gomp/defaultmap-4.f90: Likewise.
* gfortran.dg/gomp/defaultmap-5.f90: Likewise.
* gfortran.dg/gomp/defaultmap-6.f90: Likewise.
|
|
Add support for gfx803 as an alias for fiji.
Add test cases for all supported 'isa' values.
gcc/ChangeLog:
* config/gcn/gcn.cc (gcn_omp_device_kind_arch_isa): Add gfx803.
* config/gcn/t-omp-device: Add gfx803.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4-fiji.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx900.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx906.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx908.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx90a.c: New test.
* testsuite/libgomp.c/declare-variant-4.h: New header file.
|
|
This patch causes the IPA simdclone pass to generate clones for
functions with the "omp declare target" attribute as if they had
"omp declare simd", provided the function appears to be suitable for
SIMD execution. The filter is conservative, rejecting functions
that write memory or that call other functions not known to be safe.
A new option -fopenmp-target-simd-clone is added to control this
transformation; it's enabled for offload processing at -O2 and higher.
gcc/ChangeLog:
* common.opt (fopenmp-target-simd-clone): New option.
(target_simd_clone_device): New enum to go with it.
* doc/invoke.texi (-fopenmp-target-simd-clone): Document.
* flag-types.h (enum omp_target_simd_clone_device_kind): New.
* omp-simd-clone.cc (auto_simd_fail): New function.
(auto_simd_check_stmt): New function.
(plausible_type_for_simd_clone): New function.
(ok_for_auto_simd_clone): New function.
(simd_clone_create): Add force_local argument, make the symbol
have internal linkage if it is true.
(expand_simd_clones): Also check for cloneable functions with
"omp declare target". Pass explicit_p argument to
simd_clone.compute_vecsize_and_simdlen target hook.
* opts.cc (default_options_table): Add -fopenmp-target-simd-clone.
* target.def (TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN):
Add bool explicit_p argument.
* doc/tm.texi: Regenerated.
* config/aarch64/aarch64.cc
(aarch64_simd_clone_compute_vecsize_and_simdlen): Update.
* config/gcn/gcn.cc
(gcn_simd_clone_compute_vecsize_and_simdlen): Update.
* config/i386/i386.cc
(ix86_simd_clone_compute_vecsize_and_simdlen): Update.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/target-simd-clone-1.C: New.
* g++.dg/gomp/target-simd-clone-2.C: New.
* gcc.dg/gomp/target-simd-clone-1.c: New.
* gcc.dg/gomp/target-simd-clone-2.c: New.
* gcc.dg/gomp/target-simd-clone-3.c: New.
* gcc.dg/gomp/target-simd-clone-4.c: New.
* gcc.dg/gomp/target-simd-clone-5.c: New.
* gcc.dg/gomp/target-simd-clone-6.c: New.
* gcc.dg/gomp/target-simd-clone-7.c: New.
* gcc.dg/gomp/target-simd-clone-8.c: New.
* lib/scanoffloadipa.exp: New.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp: Load scanoffloadipa.exp library.
* testsuite/libgomp.c/target-simd-clone-1.c: New.
* testsuite/libgomp.c/target-simd-clone-2.c: New.
* testsuite/libgomp.c/target-simd-clone-3.c: New.
|
|
OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case. This commit adds a testcase for this.
In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that
it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.
libgomp/ChangeLog:
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
as valid and the code having no reverse-offload code.
* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.
|
|
... after its deprecation in GCC 12.
* Makefile.def: Remove module 'liboffloadmic'.
* Makefile.in: Regenerate.
* configure.ac: Remove 'liboffloadmic' handling.
* configure: Regenerate.
contrib/
* gcc-changelog/git_commit.py (default_changelog_locations):
Remove 'liboffloadmic'.
* gcc_update (files_and_dependencies): Remove 'liboffloadmic'
files.
* update-copyright.py (GCCCmdLine): Remove 'liboffloadmic'
comment.
gcc/
* config.gcc [target *-intelmic-* | *-intelmicemul-*]: Remove.
* config/i386/i386-options.cc (ix86_omp_device_kind_arch_isa)
[ACCEL_COMPILER]: Remove.
* config/i386/intelmic-mkoffload.cc: Remove.
* config/i386/intelmic-offload.h: Likewise.
* config/i386/t-intelmic: Likewise.
* config/i386/t-omp-device: Likewise.
* configure.ac [target *-intelmic-* | *-intelmicemul-*]: Remove.
* configure: Regenerate.
* doc/install.texi (--enable-offload-targets=[...]): Update.
* doc/sourcebuild.texi: Remove 'liboffloadmic' documentation.
include/
* gomp-constants.h (GOMP_DEVICE_INTEL_MIC): Comment out.
(GOMP_VERSION_INTEL_MIC): Remove.
libgomp/
* libgomp-plugin.h (OFFLOAD_TARGET_TYPE_INTEL_MIC): Remove.
* libgomp.texi (OpenMP Context Selectors): Remove Intel MIC
documentation.
* plugin/configfrag.ac <enable_offload_targets>
[*-intelmic-* | *-intelmicemul-*]: Remove.
* configure: Regenerate.
* testsuite/lib/libgomp.exp (libgomp_init): Remove 'liboffloadmic'
handling.
(offload_target_to_openacc_device_type)
[$offload_target = *-intelmic*]: Remove.
(check_effective_target_offload_device_intel_mic)
(check_effective_target_offload_device_any_intel_mic): Remove.
* testsuite/libgomp.c-c++-common/on_device_arch.h
(device_arch_intel_mic, on_device_arch_intel_mic, any_device_arch)
(any_device_arch_intel_mic): Remove.
* testsuite/libgomp.c-c++-common/target-45.c: Remove
'offload_device_any_intel_mic' XFAIL.
* testsuite/libgomp.fortran/target10.f90: Likewise.
liboffloadmic/
* ChangeLog: Remove.
* Makefile.am: Likewise.
* Makefile.in: Likewise.
* aclocal.m4: Likewise.
* configure: Likewise.
* configure.ac: Likewise.
* configure.tgt: Likewise.
* doc/doxygen/config: Likewise.
* doc/doxygen/header.tex: Likewise.
* include/coi/common/COIEngine_common.h: Likewise.
* include/coi/common/COIEvent_common.h: Likewise.
* include/coi/common/COIMacros_common.h: Likewise.
* include/coi/common/COIPerf_common.h: Likewise.
* include/coi/common/COIResult_common.h: Likewise.
* include/coi/common/COISysInfo_common.h: Likewise.
* include/coi/common/COITypes_common.h: Likewise.
* include/coi/sink/COIBuffer_sink.h: Likewise.
* include/coi/sink/COIPipeline_sink.h: Likewise.
* include/coi/sink/COIProcess_sink.h: Likewise.
* include/coi/source/COIBuffer_source.h: Likewise.
* include/coi/source/COIEngine_source.h: Likewise.
* include/coi/source/COIEvent_source.h: Likewise.
* include/coi/source/COIPipeline_source.h: Likewise.
* include/coi/source/COIProcess_source.h: Likewise.
* liboffloadmic_host.spec.in: Likewise.
* liboffloadmic_target.spec.in: Likewise.
* plugin/Makefile.am: Likewise.
* plugin/Makefile.in: Likewise.
* plugin/aclocal.m4: Likewise.
* plugin/configure: Likewise.
* plugin/configure.ac: Likewise.
* plugin/libgomp-plugin-intelmic.cpp: Likewise.
* plugin/offload_target_main.cpp: Likewise.
* runtime/cean_util.cpp: Likewise.
* runtime/cean_util.h: Likewise.
* runtime/coi/coi_client.cpp: Likewise.
* runtime/coi/coi_client.h: Likewise.
* runtime/coi/coi_server.cpp: Likewise.
* runtime/coi/coi_server.h: Likewise.
* runtime/compiler_if_host.cpp: Likewise.
* runtime/compiler_if_host.h: Likewise.
* runtime/compiler_if_target.cpp: Likewise.
* runtime/compiler_if_target.h: Likewise.
* runtime/dv_util.cpp: Likewise.
* runtime/dv_util.h: Likewise.
* runtime/emulator/coi_common.h: Likewise.
* runtime/emulator/coi_device.cpp: Likewise.
* runtime/emulator/coi_device.h: Likewise.
* runtime/emulator/coi_host.cpp: Likewise.
* runtime/emulator/coi_host.h: Likewise.
* runtime/emulator/coi_version_asm.h: Likewise.
* runtime/emulator/coi_version_linker_script.map: Likewise.
* runtime/liboffload_error.c: Likewise.
* runtime/liboffload_error_codes.h: Likewise.
* runtime/liboffload_msg.c: Likewise.
* runtime/liboffload_msg.h: Likewise.
* runtime/mic_lib.f90: Likewise.
* runtime/offload.h: Likewise.
* runtime/offload_common.cpp: Likewise.
* runtime/offload_common.h: Likewise.
* runtime/offload_engine.cpp: Likewise.
* runtime/offload_engine.h: Likewise.
* runtime/offload_env.cpp: Likewise.
* runtime/offload_env.h: Likewise.
* runtime/offload_host.cpp: Likewise.
* runtime/offload_host.h: Likewise.
* runtime/offload_iterator.h: Likewise.
* runtime/offload_omp_host.cpp: Likewise.
* runtime/offload_omp_target.cpp: Likewise.
* runtime/offload_orsl.cpp: Likewise.
* runtime/offload_orsl.h: Likewise.
* runtime/offload_table.cpp: Likewise.
* runtime/offload_table.h: Likewise.
* runtime/offload_target.cpp: Likewise.
* runtime/offload_target.h: Likewise.
* runtime/offload_target_main.cpp: Likewise.
* runtime/offload_timer.h: Likewise.
* runtime/offload_timer_host.cpp: Likewise.
* runtime/offload_timer_target.cpp: Likewise.
* runtime/offload_trace.cpp: Likewise.
* runtime/offload_trace.h: Likewise.
* runtime/offload_util.cpp: Likewise.
* runtime/offload_util.h: Likewise.
* runtime/ofldbegin.cpp: Likewise.
* runtime/ofldend.cpp: Likewise.
* runtime/orsl-lite/include/orsl-lite.h: Likewise.
* runtime/orsl-lite/lib/orsl-lite.c: Likewise.
* runtime/orsl-lite/version.txt: Likewise.
|
|
OpenMP 5.0 permits to use arrays with derived type components for the list
items to the 'from'/'to' clauses of the 'target update' directive.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_clauses): Permit derived types for
the 'to' and 'from' clauses of 'target update'.
* trans-openmp.cc (gfc_trans_omp_clauses): Fixes for
derived-type changes; fix size for scalars.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/target-11.f90: New test.
* testsuite/libgomp.fortran/target-13.f90: New test.
|