path: root/libgomp
Age  Commit message  Author  Files  Lines
2022-05-12  openmp: Add omp_all_memory support (C/C++ only so far)  Jakub Jelinek  5  -1/+498
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier which isn't allowed to be used anywhere but in the depend clause, this is against how everything else has been handled in OpenMP so far (where some identifiers could have special meaning in some OpenMP clauses or pragmas but not elsewhere). The patch handles it by making it a conditional keyword (for -fopenmp only) and emitting a better diagnostics when it is used in a primary expression. Having a nicer diagnostics when e.g. trying to do int omp_all_memory; or int *omp_all_memory[10]; etc. would mean changing too many spots and hooking into name lookups to reject declaring any such symbols would be too ugly and I'm afraid there are way too many spots where one can introduce a name (variables, functions, namespaces, struct, enum, enumerators, template arguments, ...). Otherwise, the handling is quite simple, normal depend clauses lower into addresses of variables being handed over to the library, for omp_all_memory I'm using NULL pointers. omp_all_memory can only be used with inout or out depend kinds and means that a task is dependent on all previously created sibling tasks that have any dependency (of any depend kind) and that any later created sibling tasks will be dependent on it if they have any dependency. 2022-05-12 Jakub Jelinek <jakub@redhat.com> gcc/ * gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr if null_pointer_node. (gimplify_scan_omp_clauses): Likewise. * tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node as omp_all_memory. gcc/c-family/ * c-common.h (enum rid): Add RID_OMP_ALL_MEMORY. * c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr if null_pointer_node. gcc/c/ * c-parser.cc (c_parse_init): Register omp_all_memory as keyword if flag_openmp. (c_parser_postfix_expression): Diagnose uses of omp_all_memory in postfix expressions. (c_parser_omp_variable_list): Handle omp_all_memory in depend clause. 
* c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory keyword in depend clause as null_pointer_node, diagnose invalid uses. gcc/cp/ * lex.cc (init_reswords): Register omp_all_memory as keyword if flag_openmp. * parser.cc (cp_parser_primary_expression): Diagnose uses of omp_all_memory in postfix expressions. (cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend clause. * semantics.cc (finish_omp_clauses): Handle omp_all_memory keyword in depend clause as null_pointer_node, diagnose invalid uses. * pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory. gcc/testsuite/ * c-c++-common/gomp/all-memory-1.c: New test. * c-c++-common/gomp/all-memory-2.c: New test. * c-c++-common/gomp/all-memory-3.c: New test. * g++.dg/gomp/all-memory-1.C: New test. * g++.dg/gomp/all-memory-2.C: New test. libgomp/ * libgomp.h (struct gomp_task): Add depend_all_memory member. * task.c (gomp_init_task): Initialize depend_all_memory. (gomp_task_handle_depend): Handle omp_all_memory. (gomp_task_run_post_handle_depend_hash): Clear parent->depend_all_memory if equal to current task. (gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory. * testsuite/libgomp.c-c++-common/depend-1.c: New test. * testsuite/libgomp.c-c++-common/depend-2.c: New test. * testsuite/libgomp.c-c++-common/depend-3.c: New test.
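The dependence semantics described in this message can be sketched with a small task graph. This is an illustrative example, not one of the new tests: the function name `all_memory_demo` and the variables are made up, and it assumes a compiler that accepts `omp_all_memory` (GCC with this patch, built with -fopenmp). Without -fopenmp the pragmas are ignored and the statements simply run in order.

```c
/* Illustrative only: all_memory_demo and its variables are hypothetical. */
int all_memory_demo(void)
{
    int a = 0, b = 0, sum = -1, seen = -1;

    #pragma omp parallel shared(a, b, sum, seen)
    #pragma omp single
    {
        #pragma omp task shared(a) depend(out: a)
        a = 1;

        #pragma omp task shared(b) depend(out: b)
        b = 2;

        /* Names neither a nor b, yet depends on both sibling tasks above
           because each of them has a depend clause. */
        #pragma omp task shared(a, b, sum) depend(inout: omp_all_memory)
        sum = a + b;

        /* Created later and has a dependence, so it is ordered after the
           omp_all_memory task. */
        #pragma omp task shared(a, sum, seen) depend(in: a)
        seen = sum;
    }
    return seen;
}
```

If the dependences are honoured, the last task observes the sum written by the `omp_all_memory` task.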
2022-05-12  Daily bump.  GCC Administrator  1  -0/+32
2022-05-11  libgomp: Remove unused '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib'  Thomas Schwinge  4  -86/+2
With recent commit 2e309a4eff80e55b53d32d26926a2a94eabfea21 "libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library", and commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056 "libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library", the last uses of '--with-hsa-runtime' etc. are gone. gcc/ * doc/install.texi: Don't document '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib'. libgomp/ * plugin/configfrag.ac: Remove '--with-hsa-runtime', '--with-hsa-runtime-include', '--with-hsa-runtime-lib' processing. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise.
2022-05-11  libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'  Thomas Schwinge  5  -22/+7
After recent commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056 "libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library", these aren't set anymore. libgomp/ * plugin/Makefrag.am (libgomp_plugin_gcn_la_CPPFLAGS): Don't consider 'PLUGIN_GCN_CPPFLAGS'. (libgomp_plugin_gcn_la_LDFLAGS): Don't consider 'PLUGIN_GCN_LDFLAGS'. * plugin/configfrag.ac (PLUGIN_GCN_CPPFLAGS, PLUGIN_GCN_LDFLAGS): Remove. * Makefile.in: Regenerate. * configure: Likewise. * testsuite/Makefile.in: Likewise.
2022-05-11  libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library  Thomas Schwinge  2  -20/+0
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or '--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can tell. Originally changed for the libgomp HSA plugin in commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749) "Remove build dependence on HSA run-time", and later propagated into the GCN plugin, these are no longer built against system-provided HSA Runtime library. Instead, unconditionally built against the GCC-shipped 'include/hsa*.h' header files, and at run time does 'dlopen("libhsa-runtime64.so.1")'. It thus doesn't make sense to consider references to system-provided HSA Runtime library during libgomp GCN plugin build. libgomp/ * plugin/configfrag.ac (HSA_RUNTIME_CPPFLAGS) (HSA_RUNTIME_LDFLAGS): Remove. * configure: Regenerate.
2022-05-11  libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library  Thomas Schwinge  2  -5/+0
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or '--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can tell. 'libgomp/testsuite/lib/libgomp.exp:libgomp_init' states: # For build-tree testing, also consider the library paths used for builing. # For installed testing, we assume all that to be provided in the sysroot. if { $blddir != "" } { [...] global hsa_runtime_lib if { $hsa_runtime_lib != "" } { append always_ld_library_path ":$hsa_runtime_lib" } } However, the libgomp GCN plugin is unconditionally built against the GCC-shipped 'include/hsa*.h' header files, and at run time does 'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime library "used for builing". It thus doesn't make sense to amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library. libgomp/ * testsuite/lib/libgomp.exp (libgomp_init): Don't 'append always_ld_library_path ":$hsa_runtime_lib"'. * testsuite/libgomp-test-support.exp.in (hsa_runtime_lib): Don't set.
2022-05-11  Daily bump.  GCC Administrator  1  -0/+5
2022-05-10  Fix up 'libgomp.fortran/use_device_addr-5.f90' multi-device testing  Thomas Schwinge  1  -1/+1
Fix-up for recent commit r13-116-g3f8c389fe90bf565a6221a46bb7fb745dd4c1510 "OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg", where we currently get: libgomp: use_device_ptr pointer wasn't mapped FAIL: libgomp.fortran/use_device_addr-5.f90 -O execution test libgomp/ * testsuite/libgomp.fortran/use_device_addr-5.f90: Fix up multi-device testing.
2022-05-07  Daily bump.  GCC Administrator  1  -0/+12
2022-05-06  OpenMP, libgomp: Add new runtime routine omp_target_is_accessible.  Marcel Vollweiler  8  -1/+140
gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added target_is_accessible to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_target_is_accessible. * libgomp.texi: Tagged omp_target_is_accessible as supported. * omp.h.in: Added omp_target_is_accessible. * omp_lib.f90.in: Added interface for omp_target_is_accessible. * omp_lib.h.in: Likewise. * target.c (omp_target_is_accessible): Added implementation of omp_target_is_accessible. * testsuite/libgomp.c-c++-common/target-is-accessible-1.c: New test. * testsuite/libgomp.fortran/target-is-accessible-1.f90: New test.
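A minimal usage sketch of the new routine, querying the host (initial) device. The wrapper name `host_can_access` is hypothetical, and the non-OpenMP fallback that returns 1 is an assumption added only so the file also builds without -fopenmp. The call itself requires an `omp.h` that declares `omp_target_is_accessible`, i.e. a libgomp containing this commit.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical wrapper: ask whether [ptr, ptr + size) is accessible
   from the initial (host) device. */
int host_can_access(const void *ptr, size_t size)
{
#ifdef _OPENMP
    return omp_target_is_accessible(ptr, size, omp_get_initial_device());
#else
    /* Assumed fallback for builds without OpenMP: host memory is
       trivially accessible from the host. */
    (void) ptr;
    (void) size;
    return 1;
#endif
}
```

For a real offload device, pass its device number instead of `omp_get_initial_device()`; the result then depends on the device's memory model.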
2022-05-06  Daily bump.  GCC Administrator  1  -0/+4
2022-05-05  libgomp: Update docs to reflect Fortran support for non-rectangular loops  Sandra Loosemore  1  -1/+1
libgomp/ * libgomp.texi (OpenMP 5.0): Feature is now fully supported.
2022-05-05  Daily bump.  GCC Administrator  1  -0/+8
2022-05-04  libgomp/plugin/plugin-gcn.c: Use -foffload-options= in err msg  Tobias Burnus  1  -1/+1
While -foffload=-<flag> works (never documented legacy feature), the documented way is to use -foffload-options=. libgomp/ChangeLog: * plugin/plugin-gcn.c (isa_matches_agent): Suggest -foffload-options.
2022-05-04  OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg  Tobias Burnus  1  -0/+143
For array-descriptor vars, the descriptor is assigned to a temporary. However, this failed when the clause's argument was in turn in a data-sharing clause as the outer context's VALUE_EXPR wasn't used. gcc/ChangeLog: * omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list item that is in an outer data-sharing clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
2022-05-04  Daily bump.  GCC Administrator  1  -0/+18
2022-05-02  OpenMP, libgomp: Add new runtime routine omp_get_mapped_ptr.  Marcel Vollweiler  14  -1/+660
This patch adds the OpenMP runtime routine "omp_get_mapped_ptr" which was introduced in OpenMP 5.1. gcc/ChangeLog: * omp-low.cc (omp_runtime_api_call): Added get_mapped_ptr to omp_runtime_apis array. libgomp/ChangeLog: * libgomp.map: Added omp_get_mapped_ptr. * libgomp.texi: Tagged omp_get_mapped_ptr as supported. * omp.h.in: Added omp_get_mapped_ptr. * omp_lib.f90.in: Added interface for omp_get_mapped_ptr. * omp_lib.h.in: Likewise. * target.c (omp_get_mapped_ptr): Added implementation of omp_get_mapped_ptr. * testsuite/libgomp.c-c++-common/get-mapped-ptr-1.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-2.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-3.c: New test. * testsuite/libgomp.c-c++-common/get-mapped-ptr-4.c: New test. * testsuite/libgomp.fortran/get-mapped-ptr-1.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-2.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-3.f90: New test. * testsuite/libgomp.fortran/get-mapped-ptr-4.f90: New test.
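A minimal usage sketch, assuming an `omp.h` that declares `omp_get_mapped_ptr` (a libgomp containing this commit). The helper names and the `probe` variable are hypothetical, and the non-OpenMP fallback is an assumption so the file also builds without -fopenmp. OpenMP 5.1 specifies that for the initial device the routine returns its first argument, which is the only case exercised here.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

static int probe;   /* hypothetical host object to look up */

/* Hypothetical wrapper: device pointer associated with ptr on the host. */
const void *lookup_on_host(const void *ptr)
{
#ifdef _OPENMP
    return omp_get_mapped_ptr(ptr, omp_get_initial_device());
#else
    /* Assumed fallback for builds without OpenMP. */
    return ptr;
#endif
}

/* Nonzero if the host lookup hands back the same address. */
int host_lookup_is_identity(void)
{
    return lookup_on_host(&probe) == (const void *) &probe;
}
```

For an offload device, the routine returns the device address of a mapped object, or NULL if the pointer is not mapped there.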
2022-04-29  Daily bump.  GCC Administrator  1  -0/+7
2022-04-28  Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717]  Thomas Schwinge  1  -1/+29
That got broken by recent commit b2202431910e30d8505c94d1cb9341cac7080d10 "fortran: Fix up gfc_trans_oacc_construct [PR104717]". PR fortran/104717 libgomp/ * testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC privatization scanning. For GCN offloading compilation, raise '-mgang-private-size'.
2022-04-27  Daily bump.  GCC Administrator  1  -0/+13
2022-04-26  libgomp: Fix up two non-GOMP_USE_ALIGNED_WORK_SHARES related issues [PR105358]  Jakub Jelinek  4  -7/+18
Last fall I changed struct gomp_work_share, so that it doesn't have an __attribute__((aligned (64))) lock member in the middle unless the target has a non-emulated aligned allocator, otherwise it just makes sure the first and second halves are 64 bytes apart for cache line reasons, but doesn't make the struct 64-byte aligned itself and so we can use normal allocators for it. When the struct isn't 64-byte aligned, the amount of tail padding significantly decreases, to 0 or 4 bytes or so. The library uses that tail padding for the ordered_team_ids array (array of uints) and/or the memory for lastprivate conditional temporaries (the latter wants to guarantee long long alignment). The problem with it on ia32 darwin9 is that while the struct contains long long members, long long is just 4-byte aligned while __alignof__(long long) is 8. That causes problems in gomp_init_work_share, where we currently assume that if offsetof (struct gomp_work_share, inline_ordered_team_ids) is long long aligned, then that tail array will be aligned at runtime and so no extra memory for dynamic realignment will be needed (that is false when the whole struct doesn't have long long alignment). It also causes another problem in the remaining hunks, where we compute INLINE_ORDERED_TEAM_IDS_OFF as the above offsetof aligned up to long long boundary and subtract INLINE_ORDERED_TEAM_IDS_OFF from sizeof (struct gomp_work_share). When unlucky, the former isn't a multiple of 8 and the latter is 4 bigger than that and as the subtraction is done in size_t, we end up with (size_t) -4, so the comparison doesn't really work. The fixes add additional conditions to make it work properly, but all of them should be evaluated at compile time when optimizing and so shouldn't slow anything down. 2022-04-26 Jakub Jelinek <jakub@redhat.com> PR libgomp/105358 * work.c (gomp_init_work_share): Don't mask off adjustment for dynamic long long realignment if struct gomp_work_share has smaller alignof than long long. 
* loop.c (GOMP_loop_start): Don't use inline_ordered_team_ids if struct gomp_work_share has smaller alignof than long long or if sizeof (struct gomp_work_share) is smaller than INLINE_ORDERED_TEAM_IDS_OFF. * loop_ull.c (GOMP_loop_ull_start): Likewise. * sections.c (GOMP_sections2_start): Likewise.
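The size_t wraparound described above can be reproduced in isolation. This is a sketch of the arithmetic only, not the libgomp code: the function names are hypothetical and the sizes (60 and 64) are made-up values chosen so that the offset is 4 bigger than the struct size.

```c
#include <stddef.h>

/* Naive tail-room computation: wraps around when ids_off > struct_size. */
size_t tail_room_naive(size_t struct_size, size_t ids_off)
{
    return struct_size - ids_off;
}

/* Guarded variant: checks the ordering first, in the spirit of the
   extra conditions the fix adds. */
size_t tail_room_guarded(size_t struct_size, size_t ids_off)
{
    return struct_size > ids_off ? struct_size - ids_off : 0;
}
```

With a struct size of 60 and an offset of 64, the naive version yields `(size_t) -4`, a huge positive value, so any "is there enough room" comparison against it succeeds when it should not.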
2022-04-26  Daily bump.  GCC Administrator  1  -0/+6
2022-04-25  fortran: Fix up gfc_trans_oacc_construct [PR104717]  Jakub Jelinek  1  -0/+7
So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs for which that function is invoked need an extra artificial BIND_EXPR around their body so that we move all variables of the bodies. The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK or OMP_TARGET and for OpenACC constructs that behave similarly to OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. The following patch does that for OpenACC constructs too. PR fortran/104717 gcc/fortran/ * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body in an extra BIND_EXPR. gcc/testsuite/ * gfortran.dg/goacc/pr104717.f90: New test. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust. libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
2022-04-14  Daily bump.  GCC Administrator  1  -0/+4
2022-04-13  libgomp: Fix a documentation typo  Jakub Jelinek  1  -1/+1
This fixes a typo in the 5.0 feature support table. 2022-04-13 Jakub Jelinek <jakub@redhat.com> * libgomp.texi: Fix a typo - mutexinouset -> mutexinoutset.
2022-04-07  Daily bump.  GCC Administrator  1  -0/+9
2022-04-06  Move 'libgomp/plugin/cuda/cuda.h' to 'include/cuda/cuda.h'  Thomas Schwinge  4  -198/+5
... so that it may be used by other projects that inherit GCC's 'include' directory. include/ * cuda/cuda.h: New file. libgomp/ * plugin/cuda/cuda.h: Remove file. * plugin/plugin-nvptx.c [PLUGIN_NVPTX_DYNAMIC]: Include "cuda/cuda.h" instead of <cuda.h>. * plugin/configfrag.ac <PLUGIN_NVPTX_DYNAMIC>: Don't set 'PLUGIN_NVPTX_CPPFLAGS'. * configure: Regenerate.
2022-04-06  Daily bump.  GCC Administrator  1  -0/+4
2022-04-05  OpenMP: Fix nested use_device_ptr  Chung-Lin Tang  1  -0/+41
This patch fixes a bug in lower_omp_target, where for Fortran arrays, the expanded sender assignment is wrongly using the variable in the current ctx, instead of the one looked-up outside, which is causing use_device_ptr/addr to fail to work when used inside an omp-parallel (where the omp child_fn is split away from the original). The fix is inside omp-low.cc, though because the omp_array_data langhook is used only by Fortran, this is essentially Fortran-specific. 2022-04-05 Chung-Lin Tang <cltang@codesourcery.com> gcc/ChangeLog: * omp-low.cc (lower_omp_target): Use outer context looked-up 'var' as argument to lang_hooks.decls.omp_array_data, instead of 'ovar' from current clause. libgomp/ChangeLog: * testsuite/libgomp.fortran/use_device_ptr-4.f90: New testcase.
2022-04-05  Daily bump.  GCC Administrator  1  -0/+6
2022-04-04  [libgomp/testsuite] Fix libgomp.fortran/examples-4/declare_target-{1,2}.f90  Tom de Vries  2  -24/+38
The test-cases libgomp.fortran/examples-4/declare_target-{1,2}.f90 mean to set an nvptx-specific limit using offload_target_nvptx, but also change behaviour for amd. That is, there is now a difference in behaviour between: - a compiler configured for GCN offloading, and - a compiler configured for both GCN and nvptx offloading. Fix this by using instead on_device_arch_nvptx. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-04 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Use on_device_arch_nvptx instead of offload_target_nvptx. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-04-04  Daily bump.  GCC Administrator  1  -0/+11
2022-04-01  [libgomp, testsuite, nvptx] Limit recursion in declare_target-{1,2}.f90  Tom de Vries  2  -11/+27
When running testcases libgomp.fortran/examples-4/declare_target-{1,2}.f90 on an RTX A2000 (sm_86) with driver 510.60.02 and with GOMP_NVPTX_JIT=-O0 I run into: ... FAIL: libgomp.fortran/examples-4/declare_target-1.f90 -O0 \ -DGOMP_NVPTX_JIT=-O0 execution test FAIL: libgomp.fortran/examples-4/declare_target-2.f90 -O0 \ -DGOMP_NVPTX_JIT=-O0 execution test ... Fix this by further limiting recursion depth in the test-cases for nvptx. Furthermore, make the recursion depth limiting nvptx-specific. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-01 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.fortran/examples-4/declare_target-1.f90: Define and use REC_DEPTH. * testsuite/libgomp.fortran/examples-4/declare_target-2.f90: Same.
2022-04-01  [libgomp, testsuite, nvptx] Fix dg-output test in vector-length-128-7.c  Tom de Vries  1  -1/+1
When running test-case libgomp.oacc-c-c++-common/vector-length-128-7.c on an RTX A2000 (sm_86) with driver 510.60.02 I run into: ... FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/vector-length-128-7.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ output pattern test ... The failing check verifies the launch dimensions: ... /* { dg-output "nvptx_exec: kernel main\\\$_omp_fn\\\$0: \ launch gangs=1, workers=8, vectors=128" } */ ... which fails because (as we can see with GOMP_DEBUG=1) the actual num_workers is 6: ... nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=6, vectors=128 ... This is due to the result of cuOccupancyMaxPotentialBlockSize (which suggests 'a launch configuration with reasonable occupancy') printed just before: ... cuOccupancyMaxPotentialBlockSize: grid = 52, block = 768 ... [ Note: 6 * 128 == 768. ] Fix this by updating the check to allow num_workers in the range 1 to 8. Tested on x86_64 with nvptx accelerator. libgomp/ChangeLog: 2022-04-01 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Fix num_workers check.
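The note "6 * 128 == 768" can be spelled out as a small calculation. This sketches the arithmetic only: the function names are hypothetical, and how the nvptx plugin actually derives the worker count from the occupancy result is not shown in this message.

```c
/* Workers that fit in a suggested block when each worker spans
   vector_length threads. */
int workers_from_block(int block_size, int vector_length)
{
    return block_size / vector_length;
}

/* The relaxed test expectation: any worker count from 1 to 8 passes. */
int workers_in_accepted_range(int workers)
{
    return workers >= 1 && workers <= 8;
}
```

With the reported block size of 768 and a vector length of 128 this gives 6 workers, which the old pattern (exactly 8) rejected and the updated range accepts.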
2022-03-30  Daily bump.  GCC Administrator  1  -0/+5
2022-03-29  LoongArch Port: libgomp  chenglulu  1  -0/+4
2022-03-29 Chenghua Xu <xuchenghua@loongson.cn> Lulu Cheng <chenglulu@loongson.cn> libgomp/ChangeLog: * configure.tgt: Add LoongArch triplet.
2022-03-29  Daily bump.  GCC Administrator  1  -0/+6
2022-03-28  [libgomp, testsuite] Fix hardcoded libexec in plugin/configfrag.ac  Tom de Vries  2  -2/+2
When building an nvptx offloading configuration on openSUSE Leap 15.3, the site script /usr/share/site/x86_64-unknown-linux-gnu is activated, setting libexecdir to ${exec_prefix}/lib rather than ${exec_prefix}/libexec: ... | # If user did not specify libexecdir, set the correct target: | # Nor FHS nor openSUSE allow prefix/libexec. Let's default to prefix/lib. | | if test "$libexecdir" = '${exec_prefix}/libexec' ; then | libexecdir='${exec_prefix}/lib' | fi ... However, in libgomp libgomp/plugin/configfrag.ac we hardcode libexec: ... # Configure additional search paths. if test x"$tgt_dir" != x; then offload_additional_options="$offload_additional_options \ -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) \ -B$tgt_dir/bin" ... Fix this by using /$(libexecdir:\$(exec_prefix)/%=%)/ instead of /libexec/. Tested on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: 2022-03-28 Tom de Vries <tdevries@suse.de> * plugin/configfrag.ac: Use /$(libexecdir:\$(exec_prefix)/%=%)/ instead of /libexec/. * configure: Regenerate.
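The make substitution reference used in the fix strips the leading exec_prefix directory from libexecdir and keeps the remainder ("libexec" or "lib"). A rough C model of that effect on a single path follows; the function name and the sample paths are hypothetical, and this is not how make itself implements pattern substitution.

```c
#include <string.h>

/* Model of $(var:prefix/%=%) applied to one word: drop "prefix/" if the
   word starts with it, otherwise return the word unchanged. */
const char *strip_dir_prefix(const char *path, const char *prefix)
{
    size_t n = strlen(prefix);

    if (strncmp(path, prefix, n) == 0 && path[n] == '/')
        return path + n + 1;
    return path;
}
```

With an exec_prefix of `/usr`, a libexecdir of `/usr/lib` reduces to `lib` and `/usr/libexec` reduces to `libexec`, so the `-B` search path follows whatever the site script selected.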
2022-03-26  Daily bump.  GCC Administrator  1  -0/+8
2022-03-25  [libgomp, testsuite] Scale down some OpenACC test-cases  Tom de Vries  3  -23/+46
When a display manager is running on an nvidia card, all CUDA kernel launches get a 5 seconds watchdog timer. Consequently, when running the libgomp testsuite with nvptx accelerator and GOMP_NVPTX_JIT=-O0 we run into a few FAILs like this: ... libgomp: cuStreamSynchronize error: the launch timed out and was terminated FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/parallel-dims.c \ -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O0 \ execution test ... Fix this by scaling down the failing test-cases by default, and reverting to the original behaviour for GCC_TEST_RUN_EXPENSIVE=1. Tested on x86_64-linux with nvptx accelerator. libgomp/ChangeLog: 2022-03-25 Tom de Vries <tdevries@suse.de> PR libgomp/105042 * testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Reduce execution time. * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Same. * testsuite/libgomp.oacc-fortran/parallel-dims.f90: Same.
2022-03-24  Daily bump.  GCC Administrator  1  -0/+10
2022-03-23  LTO: Fixes for renaming issues with offload/OpenMP [PR104285]  Tobias Burnus  6  -0/+290
gcc/lto/ChangeLog: PR middle-end/104285 * lto-partition.cc (maybe_rewrite_identifier): Use get_identifier for the returned string to be usable as hash key. (validize_symbol_for_target): Hence, use return value directly. (privatize_symbol_name_1): Track maybe_rewrite_identifier renames. * lto.cc (offload_handle_link_vars): Move function up before ... (do_whole_program_analysis): Call it after static renamings. (lto_main): Move call after static renamings. libgomp/ChangeLog: PR middle-end/104285 * testsuite/libgomp.c++/target-same-name-2-a.C: New test. * testsuite/libgomp.c++/target-same-name-2-b.C: New test. * testsuite/libgomp.c++/target-same-name-2.C: New test. * testsuite/libgomp.c-c++-common/target-same-name-1-a.c: New test. * testsuite/libgomp.c-c++-common/target-same-name-1-b.c: New test. * testsuite/libgomp.c-c++-common/target-same-name-1.c: New test.
2022-03-23  Daily bump.  GCC Administrator  1  -0/+6
2022-03-22  [nvptx] Add warp sync at simt exit  Tom de Vries  1  -0/+25
Consider this code (with N defined to 1024): ... float v = 0.0; #pragma omp target map(tofrom: v) #pragma omp parallel for simd for (int i = 0 ; i < N; i++) { #pragma omp atomic update v = v + 1.0; } ... It hangs when executing on target board unix/-foffload=-misa=sm_75, using drivers 470.103.01 and 510.54 on a T400 board (sm_75). I'm tentatively identifying the problem as a bug in -muniform-simt for architectures that support Independent Thread Scheduling (sm_70 and later). The problem -muniform-simt is trying to address is to make sure that a register produced outside an openmp simd region is available when used in any lane inside an simd region. The solution is to, outside an simd region, execute in all warp lanes, thus producing consistent values in result registers in each warp thread. This approach doesn't work when executing in all warp lanes multiplies the side effects from 1 to 32 separate side effects, which is the case for atomic insns. So atomic insns are rewritten to execute only in lane 0, and if there are any results, those are propagated to the other threads in the warp. [ And likewise for system calls malloc, free, vprintf. ] Now, consider a non-atomic update: ld, add, store. The store has side effects, are those multiplied or not? Pre-sm_70 we can assume that at the end of an SIMT region, any divergent control flow has reconverged, and we have a uniform warp, executing in lock step. So: - the load will load the same value into the result register across the warp, - the add will write the same value into the result register across the warp, - the store will write the same value to the same memory location, 32 times, at once, having the result of a single store. So, no side-effect multiplication (well, at least that's the observation). Starting sm_70, the threads in a warp are no longer guaranteed to reconverge after divergence. 
There's a "Convergence Optimizer" that can identify that it is safe for a warp to reconverge, but that works only as long as the code does not contain "synchronizing operations". Consequently, the ld, add, store sequence can be executed by a non-uniform warp, which means the side effects can have multiplied, and the registers are no longer guaranteed to be in sync. The atomic update in the example above is translated using an atom.cas loop, which means that we have divergence (because only one thread is allowed to succeed at a time) and the "Convergence Optimizer" doesn't reconverge probably because the atom.cas counts as a "synchronizing operation". So, it seems plausible that the root cause for the mentioned hang is the problem described above. Fix this by adding an explicit warp sync at simt exit. Note that we're assuming here that the warp will stay uniform until the next SIMT region entry. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-03-09 Tom de Vries <tdevries@suse.de> PR target/104916 PR target/104783 * config/nvptx/nvptx.md (define_expand "omp_simt_exit"): Emit warp sync (or uniform warp check for mptx < 6.0). libgomp/ChangeLog: 2022-03-15 Tom de Vries <tdevries@suse.de> PR target/104916 PR target/104783 * testsuite/libgomp.c/pr104783-2.c: New test.
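For reference, the snippet quoted at the start of this message can be wrapped into a self-contained function as below. This is a reconstruction for illustration, not the contents of pr104783-2.c: the function name `atomic_sum` is hypothetical. Without -fopenmp the pragmas are ignored and the loop runs sequentially; with host fallback or a working offload target the result should be the same.

```c
#define N 1024

/* Each of the N iterations atomically adds 1.0 to v. */
float atomic_sum(void)
{
    float v = 0.0;

    #pragma omp target map(tofrom: v)
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++) {
        #pragma omp atomic update
        v = v + 1.0;
    }
    return v;
}
```

On the affected nvptx configurations the hang occurred inside this loop; a correct run returns N as a float.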
2022-03-19  Daily bump.  GCC Administrator  1  -0/+16
2022-03-18  Fortran/OpenMP: Fix privatization of associated names  Tobias Burnus  1  -0/+92
gfc_omp_predetermined_sharing causes the associate-name pointer variable to be OMP_CLAUSE_DEFAULT_FIRSTPRIVATE, which is fine. However, the associated selector is shared. Thus, the target of the associate-name pointer should not get copied. (It was before but because of gfc_omp_privatize_by_reference returning false, the selector was not only wrongly copied but this was also not done properly.) gcc/fortran/ChangeLog: PR fortran/103039 * trans-openmp.cc (gfc_omp_clause_copy_ctor, gfc_omp_clause_dtor): Only privatize pointer for associate names. libgomp/ChangeLog: PR fortran/103039 * testsuite/libgomp.fortran/associate4.f90: New test.
2022-03-18  [openmp] Fix SIMT reduction using TRUTH_{AND,OR}IF_EXPR  Tom de Vries  2  -0/+46
Consider test-case pr104952-1.c, included in this commit, containing: ... #pragma omp target map(tofrom:result) map(to:arr) #pragma omp simd reduction(||: result) ... When run on x86_64 with nvptx accelerator, the test-case either aborts or hangs. The reduction clause is translated by the SIMT code (active for nvptx) as a butterfly reduction loop with this butterfly shuffle / update pair: ... D.2163 = D.2163 || .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) ... in the loop body. The problem is that the butterfly shuffle is possibly not executed, while it needs to be executed unconditionally. Fix this by translating instead as: ... D.tmp_bfly = .GOMP_SIMT_XCHG_BFLY (D.2163, D.2164) D.2163 = D.2163 || D.tmp_bfly ... Tested on x86_64-linux with nvptx accelerator. gcc/ChangeLog: 2022-03-17 Tom de Vries <tdevries@suse.de> PR target/104952 * omp-low.cc (lower_rec_input_clauses): Make sure GOMP_SIMT_XCHG_BFLY is executed unconditionally. libgomp/ChangeLog: 2022-03-17 Tom de Vries <tdevries@suse.de> PR target/104952 * testsuite/libgomp.c/pr104952-1.c: New test. * testsuite/libgomp.c/pr104952-2.c: New test.
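The short-circuit problem can be modelled on the host with a stand-in for the shuffle. In this sketch `fake_shuffle` and the call counter are hypothetical; the real `.GOMP_SIMT_XCHG_BFLY` is an internal function that exchanges values between warp lanes, which plain C cannot express.

```c
static int shuffle_calls;   /* counts how often the stand-in shuffle runs */

/* Stand-in for the butterfly shuffle: records the call, returns its input. */
static int fake_shuffle(int value)
{
    shuffle_calls++;
    return value;
}

/* Old lowering: || short-circuits, so the shuffle is skipped when acc != 0. */
int reduce_short_circuit(int acc, int other)
{
    return acc || fake_shuffle(other);
}

/* Fixed lowering: the shuffle runs unconditionally, then the result is combined. */
int reduce_unconditional(int acc, int other)
{
    int tmp = fake_shuffle(other);
    return acc || tmp;
}
```

Both variants compute the same value, but only the second one always performs the shuffle. On a GPU a lane that skips the exchange leaves its partner lanes without a counterpart, which is consistent with the abort or hang described above.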
2022-03-18  openmp: Fix up gomp_affinity_init_numa_domains  Jakub Jelinek  1  -1/+1
On Thu, Nov 11, 2021 at 02:14:05PM +0100, Thomas Schwinge wrote: > There appears to be yet another issue: there still are quite a number of > 'FAIL: libgomp.c/places-10.c execution test' reports on > <gcc-testresults@gcc.gnu.org>. Also in my testing testing, on a system > where '/sys/devices/system/node/online' contains '0-1', I get a FAIL: > > [...] > OPENMP DISPLAY ENVIRONMENT BEGIN > _OPENMP = '201511' > OMP_DYNAMIC = 'FALSE' > OMP_NESTED = 'FALSE' > OMP_NUM_THREADS = '8' > OMP_SCHEDULE = 'DYNAMIC' > OMP_PROC_BIND = 'TRUE' > OMP_PLACES = '{0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30},{FAIL: libgomp.c/places-10.c execution test I've finally managed to debug this (by dumping used /sys/ files from an affected system in Fedora build system, replacing /sys/ with /tmp/ in gcc sources and populating there those files), I think following patch ought to fix it. 2022-03-18 Jakub Jelinek <jakub@redhat.com> * config/linux/affinity.c (gomp_affinity_init_numa_domains): Move seen variable next to pl variable.
2022-03-18  Daily bump.  GCC Administrator  1  -0/+13
2022-03-17  Enhance further testcases to verify OpenACC 'kernels' decomposition  Thomas Schwinge  3  -6/+67
gcc/testsuite/ * c-c++-common/goacc-gomp/nesting-1.c: Enhance. * c-c++-common/goacc/kernels-loop-g.c: Likewise. * c-c++-common/goacc/nesting-1.c: Likewise. * gcc.dg/goacc/nested-function-1.c: Likewise. * gfortran.dg/goacc/common-block-3.f90: Likewise. * gfortran.dg/goacc/nested-function-1.f90: Likewise. libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Enhance. * testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise. * testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.