path: root/openmp/runtime/src
Age  Commit message  Author  Files  Lines
2024-01-09[openmp][AIX] Add AIX to __kmp_set_stack_info() (#77421)Brad Smith1-1/+1
2024-01-08[openmp][AIX]Initial changes for porting to AIX (#76841)Xing Xue12-26/+75
This PR contains initial changes for building and testing libomp on AIX. More changes will follow.
- `KMP_OS_AIX` is defined for the AIX platform
- `KMP_ARCH_PPC` is defined for 32-bit PPC
- `KMP_ARCH_PPC_XCOFF` and `KMP_ARCH_PPC64_XCOFF` are for 32- and 64-bit XCOFF object formats respectively
- Assembly file `z_AIX_asm.S` is used for AIX specific assembly code and will be added in a separate PR
- The target library is disabled because AIX does not have the device support
- OMPT is temporarily disabled
2023-12-15[OpenMP] Use simple VLA implementation to replace uses of actual VLAShilei Tian3-4/+61
Use of a VLA can cause a compile warning that was introduced in D156565. This patch implements a simple stack/heap-based VLA that can mimic the behavior of an actual VLA and prevent the warning. By default the stack accommodates the elements. If the number of elements is greater than N, which by default is 8, a heap buffer will be allocated and used to accommodate the elements.
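The stack/heap fallback described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: the class name `SimpleVLA` and its members are hypothetical, and only the threshold N = 8 comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical name; the runtime's actual helper is not shown in this log.
// Provides storage for `n` elements: inline (stack) storage when n <= N,
// otherwise a heap buffer that is released automatically on destruction.
template <typename T, std::size_t N = 8>
class SimpleVLA {
  T stack_buf_[N];            // inline storage for the common small case
  std::unique_ptr<T[]> heap_; // only allocated when n > N
  T *data_;
  std::size_t size_;

public:
  explicit SimpleVLA(std::size_t n) : data_(stack_buf_), size_(n) {
    if (n > N) {
      heap_.reset(new T[n]());
      data_ = heap_.get();
    }
  }
  // Copying would leave data_ pointing into another object's buffer.
  SimpleVLA(const SimpleVLA &) = delete;
  SimpleVLA &operator=(const SimpleVLA &) = delete;

  T &operator[](std::size_t i) {
    assert(i < size_);
    return data_[i];
  }
  std::size_t size() const { return size_; }
  bool on_heap() const { return heap_ != nullptr; }
};
```

A caller would write `SimpleVLA<int> buf(count);` where it previously declared `int buf[count];`, and index it the same way.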
2023-12-14[openmp][wasm] Allow compiling OpenMP to WebAssembly (#71297)Andrew Brown9-52/+116
This change allows building the static OpenMP runtime, `libomp.a`, as WebAssembly. It builds on the work done in [D142593] but goes further in several ways:
- it makes the OpenMP CMake files more WebAssembly-aware
- it conditions much more code (or code that had been refactored since [D142593]) for `KMP_ARCH_WASM` and `KMP_OS_WASI`
- it fixes a Clang crash due to unimplemented common symbols in WebAssembly
The commit messages have more details. Please understand this PR as a start, not the completed work, for WebAssembly support in OpenMP. Getting the tests running would be a good next step, for example, but what is contained here works, at least with recent versions of [wasi-sdk] and engines that support [wasi-threads]. I suspect the same is true for Emscripten and browsers, but I have not tested that workflow.
[D142593]: https://reviews.llvm.org/D142593
[wasi-sdk]: https://github.com/WebAssembly/wasi-sdk
[wasi-threads]: https://github.com/WebAssembly/wasi-threads
---------
Co-authored-by: Atanas Atanasov <atanas.atanasov@intel.com>
2023-12-11[OpenMP] Change check for OS to check for defined for a macro (#75012)Brad Smith1-1/+1
Check for the existence of the macro instead of checking for Solaris. illumos has this macro in sys/time.h.
/export/home/brad/llvm-brad/openmp/runtime/src/z_Linux_util.cpp:77:9: warning: 'TIMEVAL_TO_TIMESPEC' macro redefined [-Wmacro-redefined]
   77 | #define TIMEVAL_TO_TIMESPEC(tv, ts) \
      |         ^
/usr/include/sys/time.h:424:9: note: previous definition is here
  424 | #define TIMEVAL_TO_TIMESPEC(tv, ts) { \
      |         ^
2023-12-01[OpenMP] Re-enable KMP_HAVE_QUAD on NetBSD 10.0 with GCC 10.5 (#73478)Brad Smith1-2/+3
2023-11-30Revert "[OpenMP] Use simple VLA implementation to replace uses of actual VLA"Shilei Tian3-64/+4
This reverts commit 97e16da450e94c92456fa5a74768ec1b22fe6b63 because it causes a build error on i386 systems.
2023-11-29[OpenMP] Add an 'stddef.h' include to 'omp.h' (#73876)Joseph Huber1-0/+1
Summary: We use `size_t` internally in the omp.h header, which is normally provided by `stdlib.h`, which is already included. However, some cases when using `-ffreestanding` can result in this not being defined via `stdlib.h`. This patch simply adds an explicit inclusion of this header, which is provided by the `clang` resource directory, to resolve this in all cases.
2023-11-28[OpenMP] Use simple VLA implementation to replace uses of actual VLAShilei Tian3-4/+64
Use of a VLA can cause a compile warning that was introduced in D156565. This patch implements a simple stack/heap-based VLA that can mimic the behavior of an actual VLA and prevent the warning. By default the stack accommodates the elements. If the number of elements is greater than N, which by default is 8, a heap buffer will be allocated and used to accommodate the elements.
2023-11-28Revert "[OpenMP] Use simple VLA implementation to replace uses of actual VLA"Shilei Tian3-63/+4
This reverts commit d46f63553ab9ee041884b5306527afefaf00e144.
2023-11-28[OpenMP] Use simple VLA implementation to replace uses of actual VLAShilei Tian3-4/+63
Use of a VLA can cause a compile warning that was introduced in D156565. This patch implements a simple stack/heap-based VLA that can mimic the behavior of an actual VLA and prevent the warning. By default the stack accommodates the elements. If the number of elements is greater than N, which by default is 8, a heap buffer will be allocated and used to accommodate the elements.
2023-11-28Revert "[OpenMP] Use simple VLA implementation to replace uses of actual VLA (#71412)"Shilei Tian3-58/+4
This reverts commit eaab947a8aa39002e8bdaa82be08cbc31e116a11 because it causes a link error.
2023-11-28[OpenMP] Use simple VLA implementation to replace uses of actual VLA (#71412)Shilei Tian3-4/+58
Use of a VLA can cause a compile warning that was introduced in D156565. This patch implements a simple stack/heap-based VLA that can mimic the behavior of an actual VLA and prevent the warning. By default the stack accommodates the elements. If the number of elements is greater than N, which by default is 8, a heap buffer will be allocated and used to accommodate the elements.
2023-11-27[runtime] Have the runtime use the compiler builtin for alloca on NetBSD (#73480)Brad Smith1-0/+4
Most of the tests were failing with the following in their logs:
| /usr/bin/ld: /home/brad/llvm-build/runtimes/runtimes-bins/openmp/runtime/src/libomp.so: warning: Warning: reference to the libc supplied alloca(3); this most likely will not work. Please use the compiler provided version of alloca(3), by supplying the appropriate compiler flags (e.g. -std=gnu99).
By making use of __builtin_alloca:
before:
Total Discovered Tests: 353
  Unsupported: 59 (16.71%)
  Passed     : 51 (14.45%)
  Failed     : 243 (68.84%)
after:
Total Discovered Tests: 353
  Unsupported: 59 (16.71%)
  Passed     : 290 (82.15%)
  Failed     : 4 (1.13%)
2023-11-21[OpenMP] Optimized trivial multiple edges from task dependency graphJoachim Jenke2-22/+42
From "3.1 Reducing the number of edges" of this [[ https://hal.science/hal-04136674v1/ | paper ]] - Optimization (b)
Task (dependency) nodes have a `successors` list built upon passed dependency. Given the following code, B will be added to A's successors list, building the graph `A` -> `B`:
```
// A
#pragma omp task depend(out: x)
{}
// B
#pragma omp task depend(in: x)
{}
```
In the following code, B is currently added twice to A's successor list:
```
// A
#pragma omp task depend(out: x, y)
{}
// B
#pragma omp task depend(in: x, y)
{}
```
This patch removes such duplicates by checking the last inserted task in `A`'s successor list.
Authored by: Romain Pereira (rpereira-dev)
Differential Revision: https://reviews.llvm.org/D158544
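The "check the last inserted successor" idea could look roughly like this. It is a simplified sketch under stated assumptions: `TaskNode` and `add_successor` are hypothetical names, and the runtime's real dependency node carries far more state.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified dependency node; not the runtime's structure.
struct TaskNode {
  int id;
  std::vector<TaskNode *> successors;
};

// Adds `succ` to `pred`'s successor list unless it was the last one inserted.
// Returns true if an edge was actually added.
inline bool add_successor(TaskNode &pred, TaskNode &succ) {
  assert(&pred != &succ); // a task cannot be its own successor
  if (!pred.successors.empty() && pred.successors.back() == &succ)
    return false; // trivial duplicate, e.g. depend(in: x, y) after depend(out: x, y)
  pred.successors.push_back(&succ);
  return true;
}
```

Comparing only against the most recent entry keeps the check constant-time. It catches the case in the commit message, where the same task B is processed for `x` and then immediately for `y`, but it does not remove duplicates that are separated by other successors.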
2023-11-17[OpenMP] Add missing pieces in __kmp_launch_worker for Solaris support (#72613)Brad Smith1-2/+2
2023-11-09[OpenMP][SystemZ] Compile __kmpc_omp_task_begin_if0() with backchain (#71834)Ilya Leoshkevich1-0/+8
OpenMP runtime fails to build on SystemZ with clang with the following error message: LLVM ERROR: Unsupported stack frame traversal count __kmpc_omp_task_begin_if0() uses OMPT_GET_FRAME_ADDRESS(1), which delegates to __builtin_frame_address(), which in turn works with nonzero values on SystemZ only if backchain is in use. If backchain is not in use, the above error is emitted. Compile __kmpc_omp_task_begin_if0() with backchain. Note that this only resolves the build error. If at runtime its caller is compiled without backchain, __builtin_frame_address() will produce an incorrect value, but will not cause a crash. Since the value is relevant only for OMPT, this is acceptable.
2023-11-09[OpenMP] Fix a condition for KMP_OS_SOLARIS. (#71831)xingxue-ibm1-1/+1
Line 75 of `z_Linux_util.cpp` checks `#ifdef KMP_OS_SOLARIS`, which is always true regardless of the build platform because macro `KMP_OS_SOLARIS` is always defined in line 23 of `kmp_platform.h`: `#define KMP_OS_SOLARIS 0`.
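The difference between testing for a definition and testing a value can be shown in isolation. The macro `DEMO_OS_SOLARIS` below is a hypothetical stand-in for a platform macro that is always defined and set to 0 on other platforms.

```cpp
#include <cassert>

// Stand-in for the platform header: the macro is always defined, here as 0.
#define DEMO_OS_SOLARIS 0

// `#ifdef` only asks whether the name exists, so this branch is always taken.
#ifdef DEMO_OS_SOLARIS
constexpr bool taken_with_ifdef = true;
#else
constexpr bool taken_with_ifdef = false;
#endif

// `#if` evaluates the value, so this branch is skipped when the macro is 0.
#if DEMO_OS_SOLARIS
constexpr bool taken_with_if = true;
#else
constexpr bool taken_with_if = false;
#endif

static_assert(taken_with_ifdef, "#ifdef sees the definition");
static_assert(!taken_with_if, "#if sees the value 0");
```

For macros that are defined to 0 or 1 on every platform, `#if` is the check that matches the intent.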
2023-11-08[OpenMP] Add skewed iteration distribution on hybrid systems (#69946)Jonathan Peyton5-56/+275
This commit adds skewed distribution of iterations in the nonmonotonic:dynamic schedule (static steal) for hybrid systems when thread affinity is assigned. Currently, it distributes the iterations at a 60:40 ratio. Consider this loop with dynamic schedule type: `for (int i = 0; i < 100; ++i)`. In a hybrid system with 20 hardware threads (16 CORE and 4 ATOM cores), 88 iterations will be assigned to performance cores and 12 iterations will be assigned to efficient cores. Each thread on a CORE core will process 5 iterations plus extras, and each thread on an ATOM core will process 3 iterations. Differential Revision: https://reviews.llvm.org/D152955
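The numbers in that example can be checked with a few lines of arithmetic. The helper below is hypothetical and does not reproduce how the runtime derives the per-thread shares; it takes the shares quoted in the commit message (5 and 3) as inputs and assumes, as the message suggests, that leftover iterations go to the performance-core threads.

```cpp
#include <cassert>

struct Split {
  int perf_total; // iterations handled by all performance-core threads
  int eff_total;  // iterations handled by all efficient-core threads
  int extras;     // leftover iterations handed to performance-core threads
};

// Hypothetical helper: per-thread base shares are inputs, not computed here.
inline Split split_iterations(int total, int perf_threads, int perf_share,
                              int eff_threads, int eff_share) {
  int eff_total = eff_threads * eff_share;
  int extras = total - perf_threads * perf_share - eff_total;
  assert(extras >= 0); // base shares must not exceed the iteration count
  return Split{perf_threads * perf_share + extras, eff_total, extras};
}
```

With 100 iterations, 16 performance threads at 5 each and 4 efficient threads at 3 each, the base shares cover 92 iterations, leaving 8 extras, which gives the 88 and 12 totals quoted above.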
2023-11-03Add openmp support to System z (#66081)Neale Ferguson10-10/+229
* openmp/README.rst - Add s390x to those platforms supported
* openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory
* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions
* openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported
* openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE
* openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x)
* openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix
* openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIBOMP_ARCH
* openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get|set]affinity for s390x
* openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x
* openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks
* openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X
* openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X
* openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x
* openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x
* openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture
* openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x
2023-11-02[OpenMP] Add support for Solaris/x86_64 (#70593)Brad Smith5-14/+38
Tested on `amd64-pc-solaris2.11`.
2023-10-29[OpenMP] Add missing bit with the Hurd support (#70609)Brad Smith1-1/+1
Looking at 855d09855d8e541176758f38015e8b9b522d6110 it looks like a bit was missing. The padding variable is used further down by the KMP_ALLOCA() function.
2023-10-29[OpenMP] Make use of getloadavg() on *BSD OS's (#70586)Brad Smith1-1/+2
OpenBSD does not have /proc filesystem, neither does FreeBSD (by default).
2023-10-27[OpenMP] Fix building for 32-bit DragonFly, NetBSD, OpenBSD (#70527)Brad Smith1-1/+2
Fixing ```#error "Unknown or unsupported OS"```
2023-10-25[OpenMP][Obvious] Fix function prototype when used in C modeJoseph Huber1-1/+1
Summary: The `llvm_omp_target_dynamic_shared_alloc` prototype in `omp.h` accidentally left the void argument unspecified. This created unintended code when called from the C language, causing some `nvlink` failures in certain scenarios.
2023-10-24[OpenMP] Provide big-endian bitfield definitions (#69995)Ilya Leoshkevich1-1/+38
structs kmp_depend_info.flags and kmp_tasking_flags contain bitfields, which overlay integer flag values. The current bitfield definitions target little-endian machines. On big-endian machines bitfields are laid out in the opposite order, so the current definitions do not work there. There are two ways to fix this: either provide big-endian bitfield definitions, or bit-swap integer flag values. Go with the former, since it's localized to one place and therefore is more maintainable.
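A reduced illustration of the chosen approach follows. `DemoFlags` is a hypothetical type, not the runtime's `kmp_depend_info`. The sketch assumes GCC or Clang, which define `__BYTE_ORDER__` and allocate bitfields from the least significant bit on little-endian targets and from the most significant bit on big-endian ones; reading the integer view after writing a bitfield also relies on the union type-punning extension those compilers support.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag set; field names are for illustration only.
union DemoFlags {
  std::uint8_t bits; // integer view of the same byte
  struct {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    // Big-endian: declare the fields in reverse so `in` still lands on bit 0.
    std::uint8_t unused : 6;
    std::uint8_t out : 1;
    std::uint8_t in : 1;
#else
    // Little-endian: the first field occupies the least significant bit.
    std::uint8_t in : 1;
    std::uint8_t out : 1;
    std::uint8_t unused : 6;
#endif
  } f;
};

static_assert(sizeof(DemoFlags) == 1, "flags must overlay a single byte");
```

With both orderings in place, setting `f.in` yields the integer value 0x1 on either kind of machine, so code that compares against integer flag constants does not need to change.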
2023-10-19[libomptarget][OpenMP] Initial implementation of omp_target_memset() and omp_target_memset_async() (#68706)Michael Klemm5-0/+53
Implement a slow-path version of omp_target_memset*(). There is a TODO to implement a fast path that uses an on-device kernel instead of the host-based memory fill operation. This may require some additional plumbing to have kernels in libomptarget.so.
2023-09-29[OpenMP] Fix a potential memory buffer overflow (#67252)Shilei Tian1-1/+3
#67167 reports a potential memory overflow caused by the wrong size passed to the function `memcpy_s`. This patch fixes it. Fix #67167.
2023-09-20[OpenMP][VE] Limit the number of threads to create (#66729)Kazushi Marukawa2-0/+15
VE supports up to 64 threads per VE process. So, we limit the number of threads defined by KMP_MAX_NTH. We also modify the __kmp_sys_max_nth initialization to use KMP_MAX_NTH as a limit.
2023-09-13Fix /tmp approach, and add environment variable method as third fallback during library registrationTerry Wilmarth3-73/+164
The /tmp fallback for /dev/shm did not write to a fixed filename, so multiple instances of the runtime would not be able to detect each other. Now, we create the /tmp file in much the same way as the /dev/shm file was created, since the mkstemp approach would not work to create a file that other instances of the runtime would detect. Also, add the environment variable method as a third fallback to /dev/shm and /tmp for library registration, as some systems do not have either. Also, add the ability to fall back to a subsequent method should a failure occur during any part of the registration process. When unregistering, it is assumed that the method chosen during registration should work, so errors at that point are ignored. This also avoids a problem with multiple threads trying to unregister the library.
2023-09-12[OpenMP] Remove optimization skipping reduction struct initialization (#65697)Rodrigo Ceccato de Freitas1-2/+6
This commit removes an optimization that skips the initialization of the reduction struct if the number of threads in a team is 1. This optimization caused a bug with Hidden Helper Threads. When the task group is initially initialized by the master thread but a Hidden Helper Thread executes a target nowait region, it requires the reduction struct initialization to properly accumulate the data. This commit also adds a LIT test for issue #57522 to ensure that the issue is properly addressed and that the optimization removal does not introduce any regressions. Fixes: #57522
2023-09-10[OpenMP][VE] Support OpenMP runtime on VEKazushi (Jam) Marukawa8-7/+225
Support OpenMP runtime library on VE. This patch makes OpenMP compilable for VE architecture. Almost all tests run correctly on VE. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D159401
2023-09-07[OpenMP] Use the more appropriate function to retrieve the thread id on OpenBSD (#65553)Brad Smith1-1/+1
Use the getthrid() function instead of a syscall.
2023-09-06[OpenMP] Fix build issue with `libomp` when OMPT is disabledShilei Tian1-3/+3
2023-09-06[OpenMP] Fix gettid warnings on DragonFly (#65549)Brad Smith1-1/+1
Define __kmp_gettid() as appropriate for DragonFly.
2023-09-06[OpenMP] Align up the size when calling aligned_alloc (#65525)Shilei Tian1-1/+4
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size` is supposed to be a multiple of `alignment`, and it is implementation defined behavior if not. We have a non-conformant use in `kmp_barrier.h` when allocating distribute barrier. The size of the barrier is 576 and the alignment is `4*CACHE_LINE`, which is 256 on most systems. Apparently it works perfectly fine for Linux and Intel-based Mac, but not for Apple Silicon based Mac. Fix #63194.
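Rounding the size up to a multiple of the alignment is a small computation. The sketch below uses hypothetical helper names (`align_up`, `alloc_aligned`) and assumes a C library that provides C11 `aligned_alloc`; it is not the runtime's code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdlib.h>

// Rounds `size` up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) / alignment * alignment;
}

// Hypothetical wrapper: always passes a conforming size to aligned_alloc.
// The caller releases the memory with free().
inline void *alloc_aligned(std::size_t size, std::size_t alignment) {
  // aligned_alloc expects a power-of-two alignment supported by the platform.
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return aligned_alloc(alignment, align_up(size, alignment));
}

// The case from the commit message: a 576-byte barrier with 256-byte alignment.
static_assert(align_up(576, 256) == 768, "576 rounds up to 3 * 256");
```

For the barrier described above, the request grows from 576 to 768 bytes, which is the next multiple of 256.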
2023-09-06[OpenMP] Fix a wrong assertion in `__kmp_get_global_thread_id`Shilei Tian1-1/+6
The function assumes that `__kmp_gtid_get_specific` always returns a valid gtid. That is not always true, because when creating the key for thread-specific data, a destructor is assigned. The dtor will be called at thread exit. However, before the dtor is called, the thread-specific data will be reset to NULL first (https://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_key_create.html):
> At thread exit, if a key value has a non-NULL destructor pointer, and the thread
> has a non-NULL value associated with that key, the value of the key is set to NULL.
As a result, `__kmp_gtid_get_specific` returns `KMP_GTID_DNE`.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D159369
2023-09-06[OpenMP] Fix issue of indirect function call in `__kmpc_fork_call_if` (#65436)Shilei Tian1-3/+21
The outlined function is typically invoked by using `__kmp_invoke_microtask`, which is written in asm. D138495 introduces a new interface function for parallel region for OpenMPIRBuilder, where the outlined function is called via the function pointer. For some reason, it works perfectly well on x86 and x86-64 system, but doesn't work on Apple Silicon. The 3rd argument in the callee is always `nullptr`, even if it is not in caller. It appears `x2` always contains `0x0`. This patch adopts the typical method to invoke the function pointer. It works on my M2 Ultra Mac. Fix #63194.
2023-09-01[lldb] Fix duplicate word typos; NFCFangrui Song4-5/+5
Those fixes were taken from https://reviews.llvm.org/D137338
2023-08-31[OpenMP] Fix a segment fault in __kmp_get_global_thread_idShilei Tian1-0/+6
In `__kmp_get_global_thread_id`, if the gtid mode is 1, after getting the gtid from TLS, it will store the gtid value to the thread stack maintained in the thread descriptor. However, `__kmp_get_global_thread_id` can be called when the library is destructed, after the corresponding thread info has been released. This will cause a segmentation fault. This can happen on an Intel-based Mac. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D159324
2023-08-29[OpenMP] Export __kmpc_set_thread_limit on WindowsMartin Storsjö1-0/+2
This fixes the new test target/target_thread_limit.cpp on Windows, which was added recently in 08bbff4aad57c70a38d5d2680a61901977e66637 / https://reviews.llvm.org/D152054. Differential Revision: https://reviews.llvm.org/D159070
2023-08-28[OpenMP][OMPT] Fix ompt_get_task_memory implementationJoachim Jenke1-17/+4
Since td_allow_completion_event is a member of the taskdata struct, not all firstprivate/shared variables are stored at the end of the task memory allocation. Simply report the whole allocation instead. Furthermore, the function should always return 0 since in no case there is another block to report. Differential Review: https://reviews.llvm.org/D158080
2023-08-26[OpenMP] Codegen support for thread_limit on target directive for host offloadingSandeep Kosuri5-0/+37
- This patch adds support for the thread_limit clause on the target directive according to OpenMP 5.1 [2.14.5]
- The idea is to create an outer task for the target region when there is a thread_limit clause, and manipulate the thread_limit of the task instead. This way, thread_limit will be applied to all the relevant constructs enclosed by the target region.
Differential Revision: https://reviews.llvm.org/D152054
2023-08-23[OpenMP] make small memory allocations in loop collapse code on the stackVadim Paretsky1-48/+37
A few places in the loop collapse support code make small dynamic allocations that introduce a noticeable performance overhead when made on the heap. This change moves allocations up to 32 bytes to the stack instead of the heap. Differential Revision: https://reviews.llvm.org/D158220
2023-08-22[OpenMP] Let primary thread gather topology info for each worker threadJonathan Peyton6-63/+56
This change has the primary thread create each thread's initial mask and topology information so it is available immediately after forking. The setting of mask/topology information is decoupled from the actual binding. Also add this setting of topology information inside the __kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND. Without this, there could be a timing window after the primary thread signals the workers to fork where worker threads have not yet established their affinity mask or topology information. Each worker thread will then bind to the location the primary thread sets. Differential Revision: https://reviews.llvm.org/D156727
2023-08-18[OpenMP] Add option to use different units for blocktimeTerry Wilmarth10-99/+137
This change adds the option of using different units for blocktimes specified via the KMP_BLOCKTIME environment variable. The parsing of the environment now recognizes unit suffixes: ms and us. If a unit suffix is not specified, the default unit is ms. Thus default behavior is still the same, and any previous usage still works the same. Internally, blocktime is now converted to microseconds everywhere, so settings that exceed INT_MAX in microseconds are considered "infinite".
kmp_set/get_blocktime are updated to use the units the user specified with KMP_BLOCKTIME, and if not specified, ms are used.
Added better range checking and inform messages for the two time units. Large values of blocktime for the default (ms) case (beyond INT_MAX/1000) are no longer allowed, but will autocorrect with an INFORM message.
The delay for determining ticks per usec was lowered. It is now 1 million ticks, which was calculated as ~450us based on a 2.2GHz clock, which is a pretty typical base clock frequency on X86:
(1e6 Ticks) / (2.2e9 Ticks/sec) * (1e6 usec/sec) = 454 usec
Really short benchmarks can be affected by a longer delay.
Update KMP_BLOCKTIME docs.
Portions of this commit were authored by Johnny Peyton.
Differential Revision: https://reviews.llvm.org/D157646
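The suffix handling described above might be sketched as follows. `parse_blocktime_us` is a hypothetical function, not the runtime's parser, and it simplifies the behavior: it returns microseconds, treats anything above INT_MAX microseconds as "infinite" by clamping to INT_MAX, and returns -1 for malformed input instead of printing an INFORM message.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <cstring>

// Hypothetical parser: accepts "<n>", "<n>ms" or "<n>us" and returns the
// blocktime in microseconds, clamped to INT_MAX; -1 means malformed input.
inline long long parse_blocktime_us(const char *text) {
  const long long kInfinite = INT_MAX;
  char *end = nullptr;
  long long value = std::strtoll(text, &end, 10);
  if (end == text || value < 0)
    return -1; // no digits, or a negative value
  if (*end == '\0' || std::strcmp(end, "ms") == 0) // default unit is ms
    return value > kInfinite / 1000 ? kInfinite : value * 1000;
  if (std::strcmp(end, "us") == 0)
    return value > kInfinite ? kInfinite : value;
  return -1; // unknown suffix
}
```

Under these assumptions `"200"` and `"200ms"` both yield 200000, while `"200us"` yields 200, which matches the statement that the default unit remains milliseconds.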
2023-08-03[OMPX] Change `thread_dim` to `block_dim` and the original `block_dim` to `grid_dim`Shilei Tian1-6/+6
There is no `threadDim` in CUDA. Instead, it is `blockDim`. Then the current `blockDim` is `gridDim` in CUDA. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D157051
2023-07-31[OpenMP] Add ompx wrappers for __syncthreadsJohannes Doerfert1-0/+55
Differential Revision: https://reviews.llvm.org/D156729
2023-07-31[OpenMP] Introduce ompx.h and 3D wrappers (threadId, threadDim, ...)Johannes Doerfert2-0/+112
The new ompx.h header will give us a place to put extensions. The first are 3D getters for the common cuda values: `{threadId,threadDim,blockId,blockDim}.{x,y,z}` Differential Revision: https://reviews.llvm.org/D156501
2023-07-31[OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITYJonathan Peyton6-123/+455
* Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API
* Add printout of leader to hardware thread dump
* Allow OMP_PLACES to restrict fullMask
  This change fixes an issue with the OMP_PLACES=resource(#) syntax. Before this change, specifying the number of resources did NOT change the default number of threads created by the runtime. e.g., OMP_PLACES=cores(2) would still create __kmp_avail_proc number of threads. After this change, the fullMask and __kmp_avail_proc are modified if necessary so that the final place list dictates which resources are available and thus how many threads are created by default.
* Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY
  For OMP_PLACES, two new features are added:
  1) OMP_PLACES=cores:<attribute> where <attribute> is either intel_atom, intel_core, or eff# where # ranges from 0 to (number of core efficiencies - 1). This syntax also supports the optional (#) number selection of resources.
  2) OMP_PLACES=core_types|core_effs where this setting will create the number of core_types (or core_effs|core_efficiencies).
  For KMP_AFFINITY, the granularity setting is expanded to include two new keywords: core_type, and core_eff (or core_efficiency). This will set the granularity to include all cores with a particular core type (or efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will create threads which can float across a single core type.
Differential Revision: https://reviews.llvm.org/D154547