aboutsummaryrefslogtreecommitdiff
path: root/libgomp
AgeCommit message (Collapse)AuthorFilesLines
2023-06-06libgomp: plugin-gcn - support 'unified_address'Tobias Burnus2-6/+7
Effectively, for GCN (as for nvptx) there is a common address space between host and device, whether being accessible or not. Thus, this commit permits to use 'omp requires unified_address' with GCN devices. (nvptx accepts this requirement since r13-3460-g131d18e928a3ea.) libgomp/ * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Regard unified_address requirement as supported. * libgomp.texi (OpenMP 5.0, AMD Radeon, nvptx): Remove 'unified_address' from the not-supported requirements.
2023-06-06openmp: Add support for the 'present' modifierTobias Burnus8-6/+227
This implements support for the OpenMP 5.1 'present' modifier, which can be used in map clauses in the 'target', 'target data', 'target data enter' and 'target data exit' constructs, and in the 'to' and 'from' clauses of the 'target update' construct. It is also supported in defaultmap. The modifier triggers a fatal runtime error if the data specified by the clause is not already present on the target device. It can also be combined with 'always' in map clauses. 2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> gcc/c/ * c-parser.cc (c_parser_omp_clause_defaultmap, c_parser_omp_clause_map): Parse 'present'. (c_parser_omp_clause_to, c_parser_omp_clause_from): Remove. (c_parser_omp_clause_from_to): New; parse to/from clauses with optional present modifer. (c_parser_omp_all_clauses): Update call. (c_parser_omp_target_data, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Handle new map enum values for 'present' mapping. gcc/cp/ * parser.cc (cp_parser_omp_clause_defaultmap, cp_parser_omp_clause_map): Parse 'present'. (cp_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (cp_parser_omp_all_clauses): Update call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data): Handle new enum value for 'present' mapping. * semantics.cc (finish_omp_target): Likewise. gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Display 'present' map modifier. (show_omp_clauses): Display 'present' motion modifier for 'to' and 'from' clauses. * gfortran.h (enum gfc_omp_map_op): Add entries with 'present' modifiers. (struct gfc_omp_namelist): Add 'present_modifer'. * openmp.cc (gfc_match_motion_var_list): New, handles optional 'present' modifier for to/from clauses. (gfc_match_omp_clauses): Call it for to/from clauses; parse 'present' in defaultmap and map clauses. (resolve_omp_clauses): Allow 'present' modifiers on 'target', 'target data', 'target enter' and 'target exit' directives. * trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for defaultmap. gcc/ * gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag set. (omp_get_attachment): Handle map clauses with 'present' modifier. (omp_group_base): Likewise. (gimplify_scan_omp_clauses): Reorder present maps to come first. Set GOVD flags for present defaultmaps. (gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps. * omp-low.cc (scan_sharing_clauses): Handle 'always, present' map clauses. (lower_omp_target): Handle map clauses with 'present' modifier. Handle 'to' and 'from' clauses with 'present'. * tree-core.h (enum omp_clause_defaultmap_kind): Add OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind. * tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and 'from' clauses with 'present' modifier. Handle present defaultmap. * tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define. include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New. (GOMP_MAP_FLAG_FORCE): Redefine. (GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New. (enum gomp_map_kind): Add map kinds with 'present' modifiers. (GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for map variants with 'present' (GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true for map variants with 'always, present' modifiers. (GOMP_MAP_ALWAYS): Redefine. (GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New. libgomp/ * libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses. * target.c (gomp_to_device_kind_p): Add map kinds with 'present' modifier. (gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro. (gomp_map_vars_internal, gomp_update, gomp_target_rev): Emit runtime error if memory region not present. * testsuite/libgomp.c-c++-common/target-present-1.c: New test. * testsuite/libgomp.c-c++-common/target-present-2.c: New test. * testsuite/libgomp.c-c++-common/target-present-3.c: New test. * testsuite/libgomp.fortran/target-present-1.f90: New test. * testsuite/libgomp.fortran/target-present-2.f90: New test. * testsuite/libgomp.fortran/target-present-3.f90: New test. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update dg-error, extend to test for duplicated 'present' and extend scan-dump tests for 'present'. * gfortran.dg/gomp/defaultmap-1.f90: Update dg-error. * gfortran.dg/gomp/map-7.f90: Extend parse and dump test for 'present'. * gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present' modifier checking. * c-c++-common/gomp/defaultmap-4.c: New test. * c-c++-common/gomp/map-9.c: New test. * c-c++-common/gomp/target-update-1.c: New test. * gfortran.dg/gomp/defaultmap-8.f90: New test. * gfortran.dg/gomp/map-11.f90: New test. * gfortran.dg/gomp/map-12.f90: New test. * gfortran.dg/gomp/target-update-1.f90: New test.
2023-06-03Daily bump.GCC Administrator1-0/+16
2023-06-02Support parallel testing in libgomp: fallback Perl 'flock' [PR66005]Thomas Schwinge4-1/+67
Follow-up to commit 6c3b30ef9e0578509bdaf59c13da4a212fe6c2ba "Support parallel testing in libgomp, part II [PR66005]" ("..., and enable if 'flock' is available for serializing execution testing"), where we saw: > On my Dell Precision 7530 laptop: > > $ uname -srvi > Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 > $ grep '^model name' < /proc/cpuinfo | uniq -c > 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz > $ nvidia-smi -L > GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea) > > ... [...]: case (c) standard configuration, no offloading > configured, [...] > $ \time make check-target-libgomp > > Case (c), baseline; [...]: > > 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k > 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k > > Case (c), parallelized [using 'flock']: > > [...] > -j12 GCC_TEST_PARALLEL_SLOTS=12 > 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k > 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k Quite the same when instead of 'flock' using this fallback Perl 'flock': 2565.23user 194.35system 4:46.77elapsed 962%CPU (0avgtext+0avgdata 505216maxresident)k 2549.38user 200.20system 4:46.08elapsed 961%CPU (0avgtext+0avgdata 505216maxresident)k PR testsuite/66005 gcc/ * doc/install.texi: Document (optional) Perl usage for parallel testing of libgomp. libgomp/ * testsuite/lib/libgomp.exp: 'flock' through stdout. * testsuite/flock: New. * configure.ac (FLOCK): Point to that if no 'flock' available, but 'perl' is. * configure: Regenerate.
2023-06-02Remove stale Autoconf checks for PerlThomas Schwinge4-47/+2
Subversion r110220 (Git commit 03b8fe495d716c004f5491eb2347537f115ab2d8) for PR25884 "libgomp should not require perl to compile" removed all '$(PERL)' usage from libgomp -- but didn't remove the then-unused Autoconf Perl check itself. Later, this Autoconf Perl check appears to have been copied from libgomp into other GCC libraries, likewise unused. libgomp/ * configure.ac (PERL): Remove. * configure: Regenerate. * Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. libatomic/ * configure.ac (PERL): Remove. * configure: Regenerate. * Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. libgm2/ * configure.ac (PERL): Remove. * configure: Regenerate. * Makefile.in: Likewise. * libm2cor/Makefile.in: Likewise. * libm2iso/Makefile.in: Likewise. * libm2log/Makefile.in: Likewise. * libm2min/Makefile.in: Likewise. * libm2pim/Makefile.in: Likewise. libitm/ * configure.ac (PERL): Remove. * configure: Regenerate. * Makefile.in: Likewise. * testsuite/Makefile.in: Likewise.
2023-06-02Daily bump.GCC Administrator1-0/+4
2023-06-01OpenMP/Fortran: Permit pure directives inside PURETobias Burnus1-1/+1
Update permitted directives for directives marked in OpenMP's 5.2 as pure. To ensure that list is updated, unimplemented directives are placed into pure-2.f90 such the test FAILs once a known to be pure directive is implemented without handling its pureness. gcc/fortran/ChangeLog: * parse.cc (decode_omp_directive): Accept all pure directives inside a PURE procedures; handle 'error at(execution). libgomp/ChangeLog: * libgomp.texi (OpenMP 5.2): Mark pure-directive handling as 'Y'. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/nothing-2.f90: Remove one dg-error. * gfortran.dg/gomp/pr79154-2.f90: Update expected dg-error wording. * gfortran.dg/gomp/pr79154-simd.f90: Likewise. * gfortran.dg/gomp/pure-1.f90: New test. * gfortran.dg/gomp/pure-2.f90: New test. * gfortran.dg/gomp/pure-3.f90: New test. * gfortran.dg/gomp/pure-4.f90: New test.
2023-05-27Daily bump.GCC Administrator1-0/+4
2023-05-26Fortran/OpenMP: Add parsing support for allocators/allocate directivesTobias Burnus1-6/+6
gcc/fortran/ChangeLog: * dump-parse-tree.cc (show_omp_namelist): Update allocator, fix align dump. (show_omp_node, show_code_node): Handle EXEC_OMP_ALLOCATE. * gfortran.h (enum gfc_statement): Add ST_OMP_ALLOCATE and ..._EXEC. (enum gfc_exec_op): Add EXEC_OMP_ALLOCATE. (struct gfc_omp_namelist): Add 'allocator' to 'u2' union. (struct gfc_namespace): Add omp_allocate. (gfc_resolve_omp_allocate): New. * match.cc (gfc_free_omp_namelist): Free 'u2.allocator'. * match.h (gfc_match_omp_allocate, gfc_match_omp_allocators): New. * openmp.cc (gfc_omp_directives): Uncomment allocate/allocators. (gfc_match_omp_variable_list): Add bool arg for rejecting listening common-block vars separately. (gfc_match_omp_clauses): Update for u2.allocators. (OMP_ALLOCATORS_CLAUSES, gfc_match_omp_allocate, gfc_match_omp_allocators, is_predefined_allocator, gfc_resolve_omp_allocate): New. (resolve_omp_clauses): Update 'allocate' clause checks. (omp_code_to_statement, gfc_resolve_omp_directive): Handle OMP ALLOCATE/ALLOCATORS. * parse.cc (in_exec_part): New global var. (check_omp_allocate_stmt, parse_openmp_allocate_block): New. (decode_omp_directive, case_exec_markers, case_omp_decl, gfc_ascii_statement, parse_omp_structured_block): Handle OMP allocate/allocators. (verify_st_order, parse_executable): Set in_exec_part. * resolve.cc (gfc_resolve_blocks, resolve_codes): Handle allocate/allocators. * st.cc (gfc_free_statement): Likewise. * trans.cc (trans_code): Likewise. * trans-openmp.cc (gfc_trans_omp_directive): Likewise. (gfc_trans_omp_clauses, gfc_split_omp_clauses): Update for u2.allocator, fix for u.align. libgomp/ChangeLog: * testsuite/libgomp.fortran/allocate-4.f90: Update dg-error. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-2.f90: Update dg-error. * gfortran.dg/gomp/allocate-4.f90: New test. * gfortran.dg/gomp/allocate-5.f90: New test. * gfortran.dg/gomp/allocate-6.f90: New test. * gfortran.dg/gomp/allocate-7.f90: New test. * gfortran.dg/gomp/allocators-1.f90: New test. * gfortran.dg/gomp/allocators-2.f90: New test.
2023-05-22Daily bump.GCC Administrator1-0/+10
2023-05-21libgomp: Honor OpenMP's nteams-var ICV as upper limit on num teams [PR109875]Tobias Burnus6-2/+234
The nteams-var ICV exists per device and can be set either via the routine omp_set_num_teams or as environment variable (OMP_NUM_TEAMS with optional _ALL/_DEV/_DEV_<num> suffix); it is default-initialized to zero. The number of teams created is described under the num_teams clause. If the clause is absent, the number of teams is implementation defined but at least one team must exist and, if nteams-var is positive, at most nteams-var teams may exist. The latter condition was not honored in a target region before this commit, such that too many teams were created. Already before this commit, both the num_teams([lower:]upper) clause (on the host and in target regions) and, only on the host, the nteams-var ICV were honored. And as only one teams is created for host fallback, unless the clause specifies otherwise, the nteams-var ICV was and is effectively honored. libgomp/ChangeLog: PR libgomp/109875 * config/gcn/target.c (GOMP_teams4): Honor nteams-var ICV. * config/nvptx/target.c (GOMP_teams4): Likewise. * testsuite/libgomp.c-c++-common/teams-nteams-icv-1.c: New test. * testsuite/libgomp.c-c++-common/teams-nteams-icv-2.c: New test. * testsuite/libgomp.c-c++-common/teams-nteams-icv-3.c: New test. * testsuite/libgomp.c-c++-common/teams-nteams-icv-4.c: New test.
2023-05-20Daily bump.GCC Administrator1-0/+6
2023-05-19libgomp: Fix up -static -fopenmp linking [PR109904]Jakub Jelinek2-4/+4
When an OpenMP program with target regions is linked statically, it fails to link on various arches (doesn't when using recent glibc because it has libdl stuff in libc), because libgomp.a(target.o) uses dlopen/dlsym/dlclose, but we aren't linking against -ldl (unless user asked for that). We already have libgomp.spec so that we can supply extra libraries to link against in the -static case, this patch adds -ldl to that if plugins are supported. 2023-05-19 Jakub Jelinek <jakub@redhat.com> PR libgomp/109904 * configure.ac (link_gomp): Include also $DL_LIBS. * configure: Regenerated.
2023-05-18Daily bump.GCC Administrator1-0/+9
2023-05-17Fortran/OpenMP: Fix mapping of array descriptors and deferred-length stringsTobias Burnus5-1/+1551
Previously, array descriptors might have been mapped as 'alloc' instead of 'to' for 'alloc', not updating the array bounds. The 'alloc' could also appear for 'data exit', failing with a libgomp assert. In some cases, either array descriptors or deferred-length string's length variable was not mapped. And, finally, some offset calculations with array-sections mappings went wrong. Additionally, the patch now unmaps for scalar allocatables/pointers the GOMP_MAP_POINTER, avoiding stale mappings. The testcases contain some comment-out tests which require follow-up work and for which PR exist. Those mostly relate to deferred-length strings which have several issues beyong OpenMP support. gcc/fortran/ChangeLog: * trans-decl.cc (gfc_get_symbol_decl): Add attributes such as 'declare target' also to hidden artificial variable for deferred-length character variables. * trans-openmp.cc (gfc_trans_omp_array_section, gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data): Improve mapping of array descriptors and deferred-length string variables. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran special case. libgomp/ChangeLog: * testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment 'target exit data'. * testsuite/libgomp.fortran/target-enter-data-4.f90: New test. * testsuite/libgomp.fortran/target-enter-data-5.f90: New test. * testsuite/libgomp.fortran/target-enter-data-6.f90: New test. * testsuite/libgomp.fortran/target-enter-data-7.f90: New test. gcc/testsuite/ * gfortran.dg/goacc/finalize-1.f: Update dg-tree; shows a fix for 'finalize' as a ptr is now 'delete' instead of 'release'. * gfortran.dg/gomp/pr78260-2.f90: Likewise as elem-size calc moved to if (allocated) block * gfortran.dg/gomp/target-exit-data.f90: Likewise as a var is now a replaced by a MEM< _25 > expression. * gfortran.dg/gomp/map-9.f90: Update dg-scan-tree-dump. * gfortran.dg/gomp/map-10.f90: New test.
2023-05-16Daily bump.GCC Administrator1-0/+62
2023-05-15Support parallel testing in libgomp, part II [PR66005]Thomas Schwinge8-6/+84
..., and enable if 'flock' is available for serializing execution testing. Regarding the default of 19 parallel slots, this turned out to be a local minimum for wall time when testing this on: $ uname -srvi Linux 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 $ grep '^model name' < /proc/cpuinfo | uniq -c 32 model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz ... in two configurations: case (a) standard configuration, no offloading configured, case (b) offloading for GCN and nvptx configured but no devices available. For both cases, default plus '-m32' variant. $ \time make check-target-libgomp RUNTESTFLAGS="--target_board=unix\{,-m32\}" Case (a), baseline: 6432.23user 332.38system 47:32.28elapsed 237%CPU (0avgtext+0avgdata 505044maxresident)k 6382.43user 319.21system 47:06.04elapsed 237%CPU (0avgtext+0avgdata 505172maxresident)k This is what people have been complaining about, rightly so, in <https://gcc.gnu.org/PR66005> "libgomp make check time is excessive" and elsewhere. Case (a), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=10 3088.49user 267.74system 6:43.82elapsed 831%CPU (0avgtext+0avgdata 505188maxresident)k -j15 GCC_TEST_PARALLEL_SLOTS=15 3308.08user 294.79system 5:56.04elapsed 1011%CPU (0avgtext+0avgdata 505360maxresident)k -j17 GCC_TEST_PARALLEL_SLOTS=17 3539.93user 298.99system 5:27.86elapsed 1170%CPU (0avgtext+0avgdata 505112maxresident)k -j18 GCC_TEST_PARALLEL_SLOTS=18 3697.50user 317.18system 5:14.63elapsed 1275%CPU (0avgtext+0avgdata 505360maxresident)k -j19 GCC_TEST_PARALLEL_SLOTS=19 3765.94user 324.27system 5:13.22elapsed 1305%CPU (0avgtext+0avgdata 505128maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 3684.66user 312.32system 5:15.26elapsed 1267%CPU (0avgtext+0avgdata 505100maxresident)k -j23 GCC_TEST_PARALLEL_SLOTS=23 4040.59user 347.10system 5:29.12elapsed 1333%CPU (0avgtext+0avgdata 505200maxresident)k -j26 GCC_TEST_PARALLEL_SLOTS=26 3973.24user 377.96system 5:24.70elapsed 1340%CPU (0avgtext+0avgdata 505160maxresident)k -j32 GCC_TEST_PARALLEL_SLOTS=32 4004.42user 346.10system 5:16.11elapsed 1376%CPU (0avgtext+0avgdata 505160maxresident)k Yay! Case (b), baseline; 2+ h: 7227.58user 700.54system 2:14:33elapsed 98%CPU (0avgtext+0avgdata 994264maxresident)k Case (b), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=10 7377.46user 777.52system 16:06.63elapsed 843%CPU (0avgtext+0avgdata 994344maxresident)k -j15 GCC_TEST_PARALLEL_SLOTS=15 8019.18user 721.42system 12:13.56elapsed 1191%CPU (0avgtext+0avgdata 994228maxresident)k -j17 GCC_TEST_PARALLEL_SLOTS=17 8530.11user 716.95system 10:45.92elapsed 1431%CPU (0avgtext+0avgdata 994176maxresident)k -j18 GCC_TEST_PARALLEL_SLOTS=18 8776.79user 645.89system 10:27.20elapsed 1502%CPU (0avgtext+0avgdata 994248maxresident)k -j19 GCC_TEST_PARALLEL_SLOTS=19 9332.37user 641.76system 10:15.09elapsed 1621%CPU (0avgtext+0avgdata 994260maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 9609.54user 789.88system 10:26.94elapsed 1658%CPU (0avgtext+0avgdata 994284maxresident)k -j23 GCC_TEST_PARALLEL_SLOTS=23 10362.40user 911.14system 10:44.47elapsed 1749%CPU (0avgtext+0avgdata 994208maxresident)k -j26 GCC_TEST_PARALLEL_SLOTS=26 11159.44user 850.99system 11:09.25elapsed 1794%CPU (0avgtext+0avgdata 994256maxresident)k -j32 GCC_TEST_PARALLEL_SLOTS=32 11453.50user 939.52system 11:00.38elapsed 1876%CPU (0avgtext+0avgdata 994240maxresident)k On my Dell Precision 7530 laptop: $ uname -srvi Linux 5.15.0-71-generic #78-Ubuntu SMP Tue Apr 18 09:00:29 UTC 2023 x86_64 $ grep '^model name' < /proc/cpuinfo | uniq -c 12 model name : Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz $ nvidia-smi -L GPU 0: Quadro P1000 (UUID: GPU-e043973b-b52a-d02b-c066-a8fdbf64e8ea) ... in two configurations: case (c) standard configuration, no offloading configured, case (d) offloading for nvptx configured and device available. For both cases, only default variant, no '-m32'. $ \time make check-target-libgomp Case (c), baseline; roughly half of case (a) (just one variant): 1180.98user 110.80system 19:36.40elapsed 109%CPU (0avgtext+0avgdata 505148maxresident)k 1133.22user 111.08system 19:35.75elapsed 105%CPU (0avgtext+0avgdata 505212maxresident)k Case (c), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=2 1143.83user 110.76system 10:20.46elapsed 202%CPU (0avgtext+0avgdata 505216maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=6 1737.08user 143.94system 4:59.48elapsed 628%CPU (0avgtext+0avgdata 505200maxresident)k 1730.31user 143.02system 4:58.75elapsed 627%CPU (0avgtext+0avgdata 505152maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=8 2192.63user 169.34system 4:52.96elapsed 806%CPU (0avgtext+0avgdata 505216maxresident)k 2219.04user 167.67system 4:53.19elapsed 814%CPU (0avgtext+0avgdata 505152maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=10 2463.93user 184.98system 4:48.39elapsed 918%CPU (0avgtext+0avgdata 505200maxresident)k 2455.62user 183.68system 4:47.40elapsed 918%CPU (0avgtext+0avgdata 505216maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=12 2591.04user 192.64system 4:44.98elapsed 976%CPU (0avgtext+0avgdata 505216maxresident)k 2581.23user 195.21system 4:47.51elapsed 965%CPU (0avgtext+0avgdata 505212maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe] 2613.18user 199.51system 4:44.06elapsed 990%CPU (0avgtext+0avgdata 505216maxresident)k Case (d), baseline (compared to case (b): only nvptx offloading compilation, but also nvptx offloading execution); ~1 h: 2841.93user 653.68system 1:02:26elapsed 93%CPU (0avgtext+0avgdata 909792maxresident)k 2842.03user 654.39system 1:02:24elapsed 93%CPU (0avgtext+0avgdata 909880maxresident)k Case (d), parallelized: -j12 GCC_TEST_PARALLEL_SLOTS=2 2856.39user 606.87system 33:58.64elapsed 169%CPU (0avgtext+0avgdata 909948maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=6 3444.90user 666.86system 18:37.57elapsed 367%CPU (0avgtext+0avgdata 909856maxresident)k 3462.13user 667.13system 18:36.87elapsed 369%CPU (0avgtext+0avgdata 909872maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=8 3929.74user 716.22system 18:02.36elapsed 429%CPU (0avgtext+0avgdata 909832maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=10 4152.84user 736.16system 17:43.05elapsed 459%CPU (0avgtext+0avgdata 909872maxresident)k -j12 GCC_TEST_PARALLEL_SLOTS=12 4209.60user 749.00system 17:35.20elapsed 469%CPU (0avgtext+0avgdata 909840maxresident)k -j20 GCC_TEST_PARALLEL_SLOTS=20 [oversubscribe] 4255.54user 756.78system 17:29.06elapsed 477%CPU (0avgtext+0avgdata 909868maxresident)k Worth noting is that with nvptx offloading, there is one execution test case that times out ('libgomp.fortran/reverse-offload-5.f90'). This effectively stalls progress for almost 5 min: quickly other executions test cases queue up on the lock for all parallel slots. That's working as expected; just noting this as it accordingly does skew the wall time numbers. PR testsuite/66005 libgomp/ * configure.ac: Look for 'flock'. * testsuite/Makefile.am (gcc_test_parallel_slots): Enable parallel testing. * testsuite/config/default.exp: Don't 'load_lib "standard.exp"' here... * testsuite/lib/libgomp.exp: ... but here, instead. (libgomp_load): Override for parallel testing. * testsuite/libgomp-site-extra.exp.in (FLOCK): Set. * configure: Regenerate. * Makefile.in: Regenerate. * testsuite/Makefile.in: Regenerate.
2023-05-15Support parallel testing in libgomp, part I [PR66005]Rainer Orth3-23/+138
..., while still hard-coding the number of parallel slots to one. PR testsuite/66005 libgomp/ * testsuite/Makefile.am (PWD_COMMAND): New variable. (%/site.exp): New target. (check_p_numbers0, check_p_numbers1, check_p_numbers2) (check_p_numbers3, check_p_numbers4, check_p_numbers5) (check_p_numbers6, check_p_numbers, gcc_test_parallel_slots) (check_p_subdirs) (check_DEJAGNU_libgomp_targets): New variables. ($(check_DEJAGNU_libgomp_targets)): New target. ($(check_DEJAGNU_libgomp_targets)): New dependency. (check-DEJAGNU $(check_DEJAGNU_libgomp_targets)): New targets. * testsuite/Makefile.in: Regenerate. * testsuite/lib/libgomp.exp: For parallel testing, 'load_file ../libgomp-test-support.exp'. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2023-05-15libgomp testsuite: As appropriate, use the 'gcc', 'g++', 'gfortran' driver ↵Thomas Schwinge10-65/+71
[PR91884] ..., that is, 'GCC_UNDER_TEST', 'GXX_UNDER_TEST', 'GFORTRAN_UNDER_TEST' instead of 'GCC_UNDER_TEST' for all of them. No need anymore for 'gcc -lstdc++ -x c++' for C++ code, or 'gcc -lgfortran' plus conditional '-lquadmath' for Fortran code. (Getting rid of explicit '-foffload=-lgfortran' is for another day.) PR testsuite/91884 libgomp/ * configure.ac: 'AC_SUBST(CXX)'. * configure: Regenerate. * Makefile.in: Likewise. * testsuite/Makefile.in: Likewise. * testsuite/libgomp-site-extra.exp.in (GXX_UNDER_TEST) (GFORTRAN_UNDER_TEST): Set. * testsuite/lib/libgomp.exp (libgomp_init): Adjust. * testsuite/libgomp.c++/c++.exp: Use 'GXX_UNDER_TEST'. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.fortran/fortran.exp: Use 'GFORTRAN_UNDER_TEST'. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
2023-05-15libgomp testsuite: Have each '*.exp' file specify the compiler to use [PR91884]Thomas Schwinge8-15/+17
..., which is still 'GCC_UNDER_TEST' for all of them; no change in behavior. PR testsuite/91884 libgomp/ * testsuite/lib/libgomp.exp (libgomp_target_compile): Don't specify compiler. * testsuite/libgomp.c++/c++.exp (ALWAYS_CFLAGS): Specify compiler. * testsuite/libgomp.c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.fortran/fortran.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.graphite/graphite.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Likewise.
2023-05-13Daily bump.GCC Administrator1-0/+44
2023-05-12LTO: Fix writing of toplevel asm with offloading [PR109816]Tobias Burnus2-0/+104
When offloading was enabled, top-level 'asm' were added to the offloading section, confusing assemblers which did not support the syntax. Additionally, with offloading and -flto, the top-level assembler code did not end up in the host files. As r14-321-g9a41d2cdbcd added top-level 'asm' to one libstdc++ header file, the issue became more apparent, causing fails with nvptx for some C++ testcases. PR libstdc++/109816 gcc/ChangeLog: * lto-cgraph.cc (output_symtab): Guard lto_output_toplevel_asms by '!lto_stream_offload_p'. libgomp/ChangeLog: * testsuite/libgomp.c++/target-map-class-1.C: New test. * testsuite/libgomp.c++/target-map-class-2.C: New test.
2023-05-12libgomp testsuite: Generalize 'lang_library_path' into a list of ↵Thomas Schwinge5-47/+63
'lang_library_paths' ..., and use that for libquadmath, too. libgomp/ * testsuite/lib/libgomp.exp (libgomp_target_compile): Generalize 'lang_library_path' into a list of 'lang_library_paths'. * testsuite/libgomp.c++/c++.exp: Adjust. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.fortran/fortran.exp: Adjust. Use that for libquadmath, too. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
2023-05-12libgomp testsuite: Get rid of 'lang_test_file_found'Thomas Schwinge5-282/+243
Instead, 'return' early from the '*.exp' files that we're not able to test. Also, change 'puts' into 'verbose -log'. While re-indenting the previous 'if { $lang_test_file_found } { [...] }' code, also simplify 'ld_library_path' setup. libgomp/ * testsuite/lib/libgomp.exp (libgomp_target_compile): Don't look at 'lang_test_file_found'. * testsuite/libgomp.c++/c++.exp: Don't set and use it, and instead 'return' early if not able to test. Simplify 'ld_library_path' setup. * testsuite/libgomp.fortran/fortran.exp: Likewise. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
2023-05-12libgomp C++, Fortran testsuites: Resolve 'lang_test_file_found' firstThomas Schwinge4-73/+68
libgomp/ * testsuite/libgomp.c++/c++.exp: Resolve 'lang_test_file_found' first. * testsuite/libgomp.fortran/fortran.exp: Likewise. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
2023-05-12libgomp testsuite: Localize 'lang_[...]' etc.Thomas Schwinge7-49/+36
..., instead of letting them bleed into the next '*.exp' file, requiring clean-up there. libgomp/ * testsuite/libgomp.c++/c++.exp: Localize 'lang_[...]' etc. * testsuite/libgomp.c/c.exp: Likewise. * testsuite/libgomp.fortran/fortran.exp: Likewise. * testsuite/libgomp.graphite/graphite.exp: Likewise. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
2023-05-10Daily bump.GCC Administrator1-0/+25
2023-05-09libgomp testsuite: Use 'lang_test_file_found' instead of 'lang_test_file'Thomas Schwinge8-24/+8
The value of 'lang_test_file' isn't actually used anywhere. libgomp/ * testsuite/libgomp.c++/c++.exp: Don't set 'lang_test_file'. * testsuite/libgomp.fortran/fortran.exp: Likewise. * testsuite/libgomp.oacc-c++/c++.exp: Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise. * testsuite/libgomp.c/c.exp: Unset 'lang_test_file_found' instead of 'lang_test_file'. * testsuite/libgomp.oacc-c/c.exp: Likewise. * testsuite/libgomp.graphite/graphite.exp: Likewise. * testsuite/lib/libgomp.exp (libgomp_target_compile): Look for 'lang_test_file_found' instead of 'lang_test_file'.
2023-05-09libgomp testsuite: Only use 'blddir' if setThomas Schwinge3-4/+7
(It is unclear to me why the current working directory needs to be in 'LD_LIBRARY_PATH'; leaving that alone for now.) libgomp/ * testsuite/lib/libgomp.exp (libgomp_init): Only use 'blddir' if set. * testsuite/libgomp.c++/c++.exp: Likewise. * testsuite/libgomp.oacc-c++/c++.exp: Likewise.
2023-05-09libgomp C++ testsuite: Don't compute 'blddir' twiceThomas Schwinge2-6/+0
It has already been set in 'libgomp/testsuite/lib/libgomp.exp:libgomp_init'. libgomp/ * testsuite/libgomp.c++/c++.exp (blddir): Don't set. * testsuite/libgomp.oacc-c++/c++.exp (blddir): Likewise.
2023-05-09Daily bump.GCC Administrator1-0/+18
2023-05-08libgomp C++ testsuite: Use 'lang_include_flags' instead of 'libstdcxx_includes'Thomas Schwinge2-8/+6
With nvptx offloading configured, and supported, and CUDA available: $ make check-target-libgomp RUNTESTFLAGS="--all c.exp=context-1.c c++.exp=context-1.c" [...] Running [...]/libgomp.oacc-c/c.exp ... PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) PASS: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test UNSUPPORTED: libgomp.oacc-c/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 Running [...]/libgomp.oacc-c++/c++.exp ... PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 (test for excess errors) PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 execution test PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 (test for excess errors) PASS: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 execution test UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 [...] ..., but for 'c++.exp=context-1.c' alone, we currently get all-UNSUPPORTED: $ make check-target-libgomp RUNTESTFLAGS_="--all c++.exp=context-1.c" [...] Running [...]/libgomp.oacc-c++/c++.exp ... UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 UNSUPPORTED: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/context-1.c -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 [...] That is, if 'c.exp' executes first, it does successfully evaluate 'dg-require-effective-target openacc_cublas' -- and does cache this result (so it isn't reevaluated for 'c++.exp'). However, for 'c++.exp' alone (that is, without the 'c.exp' result cached), we run into: spawn -ignore SIGHUP [xgcc] [...] -x c++ openacc_cublas2311907.c [...] In file included from /usr/include/cuda_fp16.h:3673, from /usr/include/cublas_api.h:75, from /usr/include/cublas_v2.h:65, from openacc_cublas2311907.c:3: /usr/include/cuda_fp16.hpp:67:10: fatal error: utility: No such file or directory We're missing include paths to C++/libstdc++ build-tree headers. Fix this by using the mechanism introduced for Fortran in r212268 (commit f707da16f714f7fe5a42391748212c84dfec639b) re "libgomp.fortran/fortran.exp - add -fintrinsic-modules-path ${blddir}". libgomp/ * testsuite/libgomp.c++/c++.exp: Use 'lang_include_flags' instead of 'libstdcxx_includes'. * testsuite/libgomp.oacc-c++/c++.exp: Likewise.
2023-05-08libgomp: Simplify OpenMP reverse offload host <-> device memory copy ↵Thomas Schwinge6-103/+96
implementation ... by using the existing 'goacc_asyncqueue' instead of re-coding parts of it. Follow-up to commit 131d18e928a3ea1ab2d3bf61aa92d68a8a254609 "libgomp/nvptx: Prepare for reverse-offload callback handling", and commit ea4b23d9c82d9be3b982c3519fe5e8e9d833a6a8 "libgomp: Handle OpenMP's reverse offloads". libgomp/ * target.c (gomp_target_rev): Instead of 'dev_to_host_cpy', 'host_to_dev_cpy', 'token', take a single 'goacc_asyncqueue'. * libgomp.h (gomp_target_rev): Adjust. * libgomp-plugin.c (GOMP_PLUGIN_target_rev): Adjust. * libgomp-plugin.h (GOMP_PLUGIN_target_rev): Adjust. * plugin/plugin-gcn.c (process_reverse_offload): Adjust. * plugin/plugin-nvptx.c (rev_off_dev_to_host_cpy) (rev_off_host_to_dev_cpy): Remove. (GOMP_OFFLOAD_run): Adjust.
2023-05-05Daily bump.GCC Administrator1-0/+14
2023-05-04OpenACC: Further attach/detach clause fixes for Fortran [PR109622]Julian Brown4-2/+58
This patch moves several tests introduced by the following patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html commit r14-325-gcacf65d74463600815773255e8b82b4043432bd7 into the proper location for OpenACC testing (thanks to Thomas for spotting my mistake!), and also fixes a few additional problems -- missing diagnostics for non-pointer attaches, and a case where a pointer was incorrectly dereferenced. Tests are also adjusted for vector-length warnings on nvidia accelerators. 2023-04-29 Julian Brown <julian@codesourcery.com> PR fortran/109622 gcc/fortran/ * openmp.cc (resolve_omp_clauses): Add diagnostic for non-pointer/non-allocatable attach/detach. * trans-openmp.cc (gfc_trans_omp_clauses): Remove dereference for pointer-to-scalar derived type component attach/detach. Fix attach/detach handling for descriptors. gcc/testsuite/ * gfortran.dg/goacc/pr109622-5.f90: New test. * gfortran.dg/goacc/pr109622-6.f90: New test. libgomp/ * testsuite/libgomp.fortran/pr109622.f90: Move test... * testsuite/libgomp.oacc-fortran/pr109622.f90: ...to here. Ignore vector length warning. * testsuite/libgomp.fortran/pr109622-2.f90: Move test... * testsuite/libgomp.oacc-fortran/pr109622-2.f90: ...to here. Add missing copyin/copyout variable. Ignore vector length warnings. * testsuite/libgomp.fortran/pr109622-3.f90: Move test... * testsuite/libgomp.oacc-fortran/pr109622-3.f90: ...to here. Ignore vector length warnings. * testsuite/libgomp.oacc-fortran/pr109622-4.f90: New test.
2023-04-29Daily bump.GCC Administrator1-0/+7
2023-04-28OpenACC: Stand-alone attach/detach clause fixes for Fortran [PR109622]Julian Brown3-0/+96
This patch fixes several cases where multiple attach or detach mapping nodes were being created for stand-alone attach or detach clauses in Fortran. After the introduction of stricter checking later during compilation, these extra nodes could cause ICEs, as seen in the PR. The patch also fixes cases that "happened to work" previously where the user attaches/detaches a pointer to array using a descriptor, and (I think!) the "_data" field has offset zero, hence the same address as the descriptor as a whole. 2023-04-27 Julian Brown <julian@codesourcery.com> PR fortran/109622 gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_clauses): Attach/detach clause fixes. gcc/testsuite/ * gfortran.dg/goacc/attach-descriptor.f90: Adjust expected output. libgomp/ * testsuite/libgomp.fortran/pr109622.f90: New test. * testsuite/libgomp.fortran/pr109622-2.f90: New test. * testsuite/libgomp.fortran/pr109622-3.f90: New test.
2023-04-26Daily bump.GCC Administrator1-0/+6
2023-04-25'omp scan' struct block seq update for OpenMP 5.xTobias Burnus3-0/+248
While OpenMP 5.0 required a single structured block before and after the 'omp scan' directive, OpenMP 5.1 changed this to a 'structured block sequence, denoting 2 or more executable statements in OpenMP 5.1 (whoops!) and zero or more in OpenMP 5.2. This commit updates C/C++ to accept zero statements (but till requires the '{' ... '}' for the final-loop-body) and updates Fortran to accept zero or more than one statements. If there is no preceeding or succeeding executable statement, a warning is shown. gcc/c/ChangeLog: * c-parser.cc (c_parser_omp_scan_loop_body): Handle zero exec statements before/after 'omp scan'. gcc/cp/ChangeLog: * parser.cc (cp_parser_omp_scan_loop_body): Handle zero exec statements before/after 'omp scan'. gcc/fortran/ChangeLog: * openmp.cc (gfc_resolve_omp_do_blocks): Handle zero or more than one exec statements before/after 'omp scan'. * trans-openmp.cc (gfc_trans_omp_do): Likewise. libgomp/ChangeLog: * testsuite/libgomp.c-c++-common/scan-1.c: New test. * testsuite/libgomp.c/scan-23.c: New test. * testsuite/libgomp.fortran/scan-2.f90: New test. gcc/testsuite/ChangeLog: * g++.dg/gomp/attrs-7.C: Update dg-error/dg-warning. * gfortran.dg/gomp/loop-2.f90: Likewise. * gfortran.dg/gomp/reduction5.f90: Likewise. * gfortran.dg/gomp/reduction6.f90: Likewise. * gfortran.dg/gomp/scan-1.f90: Likewise. * gfortran.dg/gomp/taskloop-2.f90: Likewise. * c-c++-common/gomp/scan-6.c: New test. * gfortran.dg/gomp/scan-8.f90: New test.
2023-03-29Daily bump.GCC Administrator1-0/+5
2023-03-28testsuite: Fix weak_undefined handling on DarwinRainer Orth1-0/+1
The patch that introduced the weak_undefined effective-target keyword and corresponding dg-add-options support commit 378ec7b87a5265dbe2d489c245fac98ef37fa638 Author: Alexandre Oliva <oliva@adacore.com> Date: Thu Mar 23 00:45:05 2023 -0300 [testsuite] test for weak_undefined support and add options badly broke the affected tests on macOS like so: ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined " ERROR: gcc.dg/addr_equal-1.c: unknown dg option: 89 for " dg-add-options 5 weak_undefined " add_options_for_weak_undefined tries to call an non-existant proc "89". Even after fixing this by escaping the brackets, two tests still failed to link since they lacked the corresponding calls do dg-add-options weak_undefined. Tested on x86_64-apple-darwin20.6.0 and i386-pc-solaris2.11. 2023-03-27 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> gcc/testsuite: * lib/target-supports.exp (add_options_for_weak_undefined): Escape brackets. * gcc.dg/visibility-22.c: Add weak_undefined options. libgomp: * testsuite/libgomp.oacc-c-c++-common/routine-nohost-2.c: Add weak_undefined options.
2023-03-25Daily bump.GCC Administrator1-0/+10
2023-03-24libgomp.texi: Fix wording in GCN offload specificsTobias Burnus1-1/+1
libgomp/ * libgomp.texi (Offload-Target Specifics): Grammar fix.
2023-03-24Add caveat/safeguard to OpenMP: Handle descriptors in target's firstprivate ↵Thomas Schwinge1-0/+5
[PR104949] Follow-up to commit 49d1a2f91325fa8cc011149e27e5093a988b3a49 "OpenMP: Handle descriptors in target's firstprivate [PR104949]". PR fortran/104949 libgomp/ * target.c (gomp_map_vars_internal) <GOMP_MAP_FIRSTPRIVATE>: Add caveat/safeguard.
2023-03-11Daily bump.GCC Administrator1-0/+52
2023-03-10Use 'GOMP_MAP_VARS_TARGET' for OpenACC compute constructs [PR90596]Thomas Schwinge5-238/+44
Thereby considerably simplify the device plugins' 'GOMP_OFFLOAD_openacc_exec', 'GOMP_OFFLOAD_openacc_async_exec' functions: in terms of lines of code, but in particular conceptually: no more device memory allocation, host to device data copying, device memory deallocation -- 'GOMP_MAP_VARS_TARGET' does all that for us. This depends on commit 2b2340e236c0bba8aaca358ea25a5accd8249fbd "Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data", where I said that "a use will emerge later", which is this one here. PR libgomp/90596 libgomp/ * target.c (gomp_map_vars_internal): Allow for 'param_kind == GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_TARGET'. * oacc-parallel.c (GOACC_parallel_keyed): Pass 'GOMP_MAP_VARS_TARGET' to 'goacc_map_vars'. * plugin/plugin-gcn.c (alloc_by_agent, gcn_exec) (GOMP_OFFLOAD_openacc_exec, GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (gomp_offload_free): Remove. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust, simplify. (cuda_free_argmem): Remove. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Adjust.
2023-03-10Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' dataThomas Schwinge1-34/+36
This does *allow*, but under no circumstances is this currently going to be used: all potentially applicable data is non-'ephemeral', and thus not considered for 'gomp_coalesce_buf_add' for OpenACC 'async'. (But a use will emerge later.) Follow-up to commit r12-2530-gd88a6951586c7229b25708f4486eaaf4bf4b5bbe "Don't use libgomp 'cbuf' buffering with OpenACC 'async'", addressing this TODO comment: TODO ... but we could allow CBUF usage for EPHEMERAL data? (Open question: is it more performant to use libgomp CBUF buffering or individual device asyncronous copying?) Ephemeral data is small, and therefore individual device asyncronous copying does seem dubious -- in particular given that for all those, we'd individually have to allocate and queue for deallocation a temporary buffer to capture the ephemeral data. Instead, just let the 'cbuf' *be* the temporary buffer. libgomp/ * target.c (gomp_copy_host2dev, gomp_map_vars_internal): Allow libgomp 'cbuf' buffering with OpenACC 'async' for 'ephemeral' data.
2023-03-10Simplify OpenACC 'no_create' clause implementationThomas Schwinge5-27/+54
For 'OFFSET_INLINED', 'gomp_map_val' does the right thing, and we may then simplify the device plugins accordingly. This is a follow-up to Subversion r279551 (Git commit a6163563f2ce502bd4ef444bd5de33570bb8eeb1) "Add OpenACC 2.6's no_create", Subversion r279622 (Git commit 5bcd470bf0749e1f56d05dd43aa9584ff2e3a090) "Use gomp_map_val for OpenACC host-to-device address translation". libgomp/ * target.c (gomp_map_vars_internal): Use 'OFFSET_INLINED' for 'GOMP_MAP_IF_PRESENT'. * plugin/plugin-gcn.c (gcn_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Adjust. * plugin/plugin-nvptx.c (nvptx_exec, GOMP_OFFLOAD_openacc_exec) (GOMP_OFFLOAD_openacc_async_exec): Likewise. * testsuite/libgomp.oacc-c-c++-common/no_create-1.c: Add 'async' testing. * testsuite/libgomp.oacc-c-c++-common/no_create-2.c: Likewise.
2023-03-10OpenACC: Remove 'acc_async_test' -> skip shortcut in ↵Thomas Schwinge1-3/+0
'libgomp/oacc-async.c:goacc_wait' We're not taking such a shortcut anywhere else, and (with future changes) it has potential to confuse things if synchronization in a libgomp plugin happens to have side effects even if an async queue currently is empty. libgomp/ * oacc-async.c (goacc_wait): Remove 'acc_async_test' -> skip shortcut.
2023-03-10Document/verify another aspect of OpenACC 'async' semantics in ↵Thomas Schwinge1-2/+2
'libgomp.oacc-c-c++-common/data-3.c' ... that I almost broke with later implementation changes. libgomp/ * testsuite/libgomp.oacc-c-c++-common/data-3.c: Document/verify another aspect of OpenACC 'async' semantics.