path: root/openmp/runtime/src/kmp_affinity.cpp
Age | Commit message | Author | Files | Lines
12 days | [OpenMP] Fixup bugs found during fuzz testing (#143455) | Jonathan Peyton | 1 | -5/+19
A lot of these only trip when using sanitizers with the library.
* Insert forgotten free()s
* Change (-1) << amount to 0xffffffffu, as left-shifting a negative value is UB
* Fix up the integer parser to return INT_MAX when parsing a huge string of digits, e.g., 452523423423423423 returns INT_MAX
* Fix up range parsing for the affinity mask so integer overflow does not occur
* Don't assert when branch bits are 0; instead, warn the user that the value is invalid and use the default value
* Fix up kmp_set_defaults() so the C version only uses null-terminated strings and the Fortran version uses the string + size version
* Make sure KMP_ALIGN_ALLOC is a power of two; otherwise use CACHE_LINE
* Disallow setting KMP_TASKING=1 (task barrier); this doesn't work and hasn't worked for a long time
* Limit KMP_HOT_TEAMS_MAX_LEVEL to 1024; an array is allocated based on this value
* Remove integer values for OMP_PROC_BIND. The specification only allows strings and CSV of strings.
* Fix setting KMP_AFFINITY=disabled + OMP_DISPLAY_AFFINITY=TRUE
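Three of the fixes listed above (the saturating integer parser, the unsigned shift, and the power-of-two check) follow well-known patterns. The following is a minimal sketch of those patterns, not the code from the patch; the helper names `parse_int_saturating`, `high_mask`, and `is_power_of_two` are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: parse a leading run of decimal digits, saturating at
// INT_MAX instead of overflowing (signed overflow is undefined behavior).
inline int parse_int_saturating(const char *s) {
  assert(s != nullptr);
  int value = 0;
  for (; *s >= '0' && *s <= '9'; ++s) {
    const int digit = *s - '0';
    // Check whether value * 10 + digit would exceed INT_MAX before computing it.
    if (value > (INT_MAX - digit) / 10)
      return INT_MAX;
    value = value * 10 + digit;
  }
  return value;
}

// Hypothetical helper: an all-ones 32-bit mask shifted left by `amount`.
// Starting from an unsigned constant keeps the shift well defined.
inline std::uint32_t high_mask(unsigned amount) {
  assert(amount < 32);
  return 0xffffffffu << amount;
}

// Hypothetical helper: true when x is a nonzero power of two.
inline bool is_power_of_two(std::size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}
```

With these definitions, `parse_int_saturating("452523423423423423")` yields `INT_MAX`, matching the behavior described in the message.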
2025-05-05 | [OpenMP] Fix KMP_OS_AIX handling (#138499) | Rainer Orth | 1 | -1/+1
When building `openmp` on Linux/sparc64, I get
```
In file included from openmp/runtime/src/kmp_utility.cpp:16:
openmp/runtime/src/kmp_wrapper_getpid.h:47:2: warning: No gettid found, use getpid instead [-W#warnings]
   47 | #warning No gettid found, use getpid instead
      |  ^
```
This is highly confusing since `<sys/syscall.h>` **does** define `SYS_gettid` and the header is supposed to be included:
```
#if !defined(KMP_OS_AIX) && !defined(KMP_OS_HAIKU)
#include <sys/syscall.h>
#endif
```
However, this actually is **not** the case for two reasons:
- `KMP_OS_HAIKU` is always defined, either as 1 on Haiku or as 0 otherwise.
- `KMP_OS_AIX` is even worse: it is only defined as 1 on AIX, but undefined otherwise.

All those `KMP_OS_*` macros are supposed to always be defined as 1/0 as appropriate, and to be checked with `#if`, not `#ifdef`. AIX is violating this, causing the problem above. Other targets probably get `<sys/syscall.h>` indirectly otherwise, but Linux/sparc64 does not.
This patch fixes this by also defining `KMP_OS_AIX` as 0 on other OSes and changing the checks to `#if` as necessary.
Tested on `sparc64-unknown-linux-gnu`, `sparcv9-sun-solaris2.11`, `amd64-pc-solaris2.11`, and `x86_64-pc-linux-gnu`.
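The difference between testing whether a macro is defined and testing its value can be reproduced in isolation. The sketch below uses hypothetical `DEMO_OS_*` macros as stand-ins for the `KMP_OS_*` family; it illustrates the pitfall and is not the patched header.

```cpp
#include <cassert>

// Hypothetical stand-in: always defined, 1 on the target OS and 0 elsewhere.
#define DEMO_OS_HAIKU 0
// DEMO_OS_AIX is deliberately left undefined at this point, mirroring how
// KMP_OS_AIX behaved on non-AIX hosts before the fix.

// A defined() check looks only at existence, so a macro defined as 0 still
// makes this condition false.
#if !defined(DEMO_OS_AIX) && !defined(DEMO_OS_HAIKU)
constexpr bool taken_with_defined_check = true;
#else
constexpr bool taken_with_defined_check = false;
#endif

// The convention described in the message: every macro is defined as 0 or 1
// and tested by value.
#ifndef DEMO_OS_AIX
#define DEMO_OS_AIX 0
#endif

#if !DEMO_OS_AIX && !DEMO_OS_HAIKU
constexpr bool taken_with_value_check = true;
#else
constexpr bool taken_with_value_check = false;
#endif

static_assert(!taken_with_defined_check, "defined() ignores the macro's value");
static_assert(taken_with_value_check, "#if on the value behaves as intended");
```

Both macros describe "not this OS", yet only the value-based check takes the branch that would include the header.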
2025-04-02 | [OpenMP] Add memory allocation using hwloc (#132843) | nawrinsu | 1 | -0/+1
This patch adds support for memory allocation using hwloc. To enable memory allocation using hwloc, the environment variable KMP_TOPOLOGY_METHOD=hwloc needs to be set. If hwloc is not supported or not available, allocation will fall back to the default path.
2024-10-14 | [openmp] Use core_siblings_list if physical_package_id not available (#111831) | Nikita Popov | 1 | -29/+71
On powerpc, physical_package_id may not be available. Currently, this causes openmp to fall back to flat topology and various affinity tests fail. Fix this by parsing core_siblings_list to determine which cpus belong to the same socket. This matches what the testing code does. The code to parse the CPU list format thankfully already exists. Fixes https://github.com/llvm/llvm-project/issues/111809.
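For readers unfamiliar with the CPU list format, here is a rough sketch of a parser for strings such as `0-3,8,10-11`. It assumes the comma-separated ranges used by Linux sysfs CPU lists; `parse_cpu_list` and `kMaxCpu` are hypothetical names, and this is not the parser that already exists in the runtime.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <string>

// Hypothetical upper bound so that a malformed range cannot expand without limit.
constexpr long kMaxCpu = 65535;

// Hypothetical helper: expand a CPU list such as "0-3,8,10-11" into the set of
// CPU numbers it names. Malformed input yields an empty set.
inline std::set<int> parse_cpu_list(const std::string &text) {
  std::set<int> cpus;
  const char *p = text.c_str();
  while (*p != '\0' && *p != '\n') {
    char *end = nullptr;
    const long first = std::strtol(p, &end, 10);
    if (end == p || first < 0 || first > kMaxCpu)
      return {};
    long last = first;
    p = end;
    if (*p == '-') { // a range "first-last"
      ++p;
      last = std::strtol(p, &end, 10);
      if (end == p || last < first || last > kMaxCpu)
        return {};
      p = end;
    }
    for (long cpu = first; cpu <= last; ++cpu)
      cpus.insert(static_cast<int>(cpu));
    if (*p == ',')
      ++p; // move on to the next entry
    else if (*p != '\0' && *p != '\n')
      return {}; // unexpected character
  }
  return cpus;
}
```

Two CPUs whose sibling lists expand to the same set can then be treated as belonging to the same socket.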
2024-08-15 | [OpenMP] Miscellaneous small code improvements (#95603) | Hansang Bae | 1 | -2/+1
Removes a few uninitialized variables, possible resource leaks, and redundant code.
2024-08-11 | [openmp][runtime] Silence warnings | Alexandre Ganea | 1 | -1/+2
This fixes several of those when building with MSVC on Windows:
```
[3625/7617] Building CXX object projects\openmp\runtime\src\CMakeFiles\omp.dir\kmp_affinity.cpp.obj
C:\src\git\llvm-project\openmp\runtime\src\kmp_affinity.cpp(2637): warning C4062: enumerator 'KMP_HW_UNKNOWN' in switch of enum 'kmp_hw_t' is not handled
C:\src\git\llvm-project\openmp\runtime\src\kmp.h(628): note: see declaration of 'kmp_hw_t'
```
2024-07-29 | [OpenMP] Assign thread ids in the cpuinfo topology method (#91013) | Jonathan Peyton | 1 | -0/+26
On non-hyperthreaded machines, the thread id is not always explicit in the /proc/cpuinfo file. This patch adds a check to ensure the thread ids are put in.
2024-07-29 | [OpenMP] Add topology and affinity changes for Meteor Lake (#91012) | Jonathan Peyton | 1 | -112/+378
These are Intel-specific changes for the CPUID leaf 31 method for detecting machine topology.
* Cleanup known levels usage in x2apicid topology algorithm
  Change to be a constant mask of all Intel topology type values.
* Take unknown ids into account when sorting them
  If a hardware id is unknown, then put it further down the hardware thread list so it will take last priority when assigning to threads.
* Have sub ids printed out for hardware thread dump
* Add caches to topology
  New `kmp_cache_ids_t` class helps create cache ids which are then put into the topology table after regular topology type ids have been put in.
* Allow empty masks in place list creation
  Have enumeration information and place list generation take into account that certain hardware threads may be lacking certain layers.
* Allow different procs to have different number of topology levels
  Accommodates the possible situation where CPUID.1F has a different depth for different hardware threads. Each hardware thread has a topology description which is just a small set of its topology levels. These descriptions are tracked to see if the topology is uniform or not.
* Change regular ids with logical ids
  Instead of keeping the original sub ids that the x2apicid topology detection algorithm gives, change each id to its logical id, which is a number in [0, num_items - 1]. This makes inserting new layers into the topology significantly simpler.
* Insert caches into topology
  This change takes into account that most topologies are uniform and therefore can use the quicker method of inserting caches as equivalent layers into the topology.
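The "logical id" idea, mapping arbitrary sub ids onto a dense range [0, num_items - 1], can be illustrated with a small ranking routine. This is only a sketch of the concept over a flat list of ids; `to_logical_ids` is a hypothetical name and the runtime's actual procedure (which works per topology level) may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: replace each id with its rank among the distinct ids,
// so the results fall in [0, num_distinct_ids - 1] and keep their order.
inline std::vector<int> to_logical_ids(const std::vector<int> &ids) {
  std::vector<int> distinct(ids);
  std::sort(distinct.begin(), distinct.end());
  distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());

  std::vector<int> logical;
  logical.reserve(ids.size());
  for (int id : ids) {
    // The position of id in the sorted distinct list is its logical id.
    const auto it = std::lower_bound(distinct.begin(), distinct.end(), id);
    logical.push_back(static_cast<int>(it - distinct.begin()));
  }
  return logical;
}
```

For example, the sparse ids 8, 0, 8, 32, 0 become 1, 0, 1, 2, 0.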
2024-04-26 | [OpenMP][AIX] Use syssmt() to get the number of SMTs per physical CPU (#89985) | Xing Xue | 1 | -9/+1
This patch changes to use system call `syssmt()` instead of `lpar_get_info()` to get the number of SMTs (logical processors) per physical processor for AIX. `lpar_get_info()` gives the max number of SMTs that the physical processor can support while `syssmt()` returns the number that is currently configured.
2024-04-03 | [OpenMP] Add absolute KMP_HW_SUBSET functionality (#85326) | Jonathan Peyton | 1 | -75/+95
Users can put a : in front of KMP_HW_SUBSET to indicate that the specified subset is an "absolute" subset. Currently, when a user puts KMP_HW_SUBSET=1t, this gets translated to KMP_HW_SUBSET="*s,*c,1t", where * means "use all of". If a user wants only one thread as the entire topology, they can now do KMP_HW_SUBSET=:1t.
Along with the absolute syntax is a fix for newer machines, making them easier to use with only the 3-level topology syntax. When a user puts KMP_HW_SUBSET=1s,4c,2t on a machine which actually has 4 layers (say 1s,2m,3c,2t as the entire machine), the user gets an unexpected "too many resources asked" message because KMP_HW_SUBSET currently translates the "4c" value to mean 4 cores per module. To help users out, the runtime can assume that these newer layers, module in this case, should be ignored if they are not specified, but the topology should always take into account the sockets, cores, and threads layers.
2024-03-22 | [OpenMP][AIX] Affinity implementation for AIX (#84984) | Xing Xue | 1 | -8/+122
This patch implements `affinity` for AIX, which is quite different from platforms such as Linux.
- Setting CPU affinity through masks and related functions is not supported. System call `bindprocessor()` is used to bind a thread to one CPU per call.
- There are no system routines to get the affinity info of a thread. The implementation of `get_system_affinity()` for AIX gets the mask of all available CPUs, to be used as the full mask only.
- Topology is not available from the file system. It is obtained through system SRAD (Scheduler Resource Allocation Domain).

This patch has run through the libomp LIT tests successfully with `affinity` enabled.
2024-03-13 | [OpenMP] Sort topology after adding processor group layer. (#83943) | MessyHack | 1 | -0/+3
Various behavior around creating affinity masks and detecting uniform topology depends on the topology being sorted. Re-sort the topology after adding the processor group layer to ensure that the updated topology reflects the newly added processor group info. Observed that the topology was not sorted correctly on high core count AMD Epyc Genoa (2 sockets, 96 cores, 2 threads) using NUMA (NPS 2+).
2024-03-12 | [OpenMP] Add debug checks for divide by zero (#83300) | Jonathan Peyton | 1 | -0/+2
2024-03-11 | [OpenMP] Fixup while loops to avoid bad NULL check (#83302) | Jonathan Peyton | 1 | -8/+3
2024-03-10 | [openmp] adding affinity support to DragonFlyBSD. (#84672) | David CARLIER | 1 | -3/+5
2024-03-09 | [openmp] porting affinity feature to netbsd. (#84618) | David CARLIER | 1 | -3/+5
NetBSD supports the portable hwloc layer as well. For hardware with 4 cpus, a cpu set is 4 and maxcpus is 256.
2024-01-23 | Re-land [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853) | Alexandre Ganea | 1 | -7/+18
This reverts 94f960925b7f609636fc2ffd83053814d5e45ed1 and fixes it.
2024-01-23 | Revert 10f3296dd7d74c975f208a8569221dc8f96d1db1 - [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853) | Alexandre Ganea | 1 | -20/+7
It broke the AMDGPU buildbot: https://lab.llvm.org/buildbot/#/builders/193/builds/45378
2024-01-23 | [openmp] Fix warnings when building on Windows with latest MSVC or Clang ToT (#77853) | Alexandre Ganea | 1 | -7/+20
There were quite a few compilation warnings when building openmp on Windows with the latest Visual Studio 2022 version 17.8.4. Some other warnings were visible with the latest Clang at tip. This commit fixes all of them.
2024-01-18 | [openmp] Revert 64874e5ab5fd102344d43ac9465537a44130bf19 since it was committed by mistake and the PR (https://github.com/llvm/llvm-project/pull/77853) wasn't approved yet. | Alexandre Ganea | 1 | -9/+5
2024-01-17 | [openmp] Silence warnings when building the LLVM release with MSVC | Alexandre Ganea | 1 | -5/+9
2023-11-08 | [OpenMP] Add skewed iteration distribution on hybrid systems (#69946) | Jonathan Peyton | 1 | -6/+32
This commit adds skewed distribution of iterations in nonmonotonic:dynamic schedule (static steal) for hybrid systems when thread affinity is assigned. Currently, it distributes the iterations at 60:40 ratio. Consider this loop with dynamic schedule type, for (int i = 0; i < 100; ++i). In a hybrid system with 20 hardware threads (16 CORE and 4 ATOM core), 88 iterations will be assigned to performance cores and 12 iterations will be assigned to efficient cores. Each thread with CORE core will process 5 iterations + extras and with ATOM core will process 3 iterations. Differential Revision: https://reviews.llvm.org/D152955
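The 88/12 split in the example can be reproduced with simple integer arithmetic. The sketch below assumes per-thread weights of 3 (CORE) and 2 (ATOM) for the 60:40 ratio, rounds each share down, and gives the leftover iterations to the CORE threads; it matches the numbers in the message, but the runtime's actual computation may differ. `HybridSplit` and `split_iterations` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical result type for the sketch.
struct HybridSplit {
  long per_p_thread; // base iterations for each performance-core thread
  long per_e_thread; // iterations for each efficient-core thread
  long p_total;      // all iterations handled by performance-core threads
  long e_total;      // all iterations handled by efficient-core threads
};

// Sketch: weight each P-core thread 3 and each E-core thread 2 (60:40),
// round the per-thread shares down, and hand the remainder to the P-cores.
inline HybridSplit split_iterations(long iterations, long p_threads,
                                    long e_threads) {
  assert(iterations >= 0 && p_threads > 0 && e_threads >= 0);
  const long p_weight = 3, e_weight = 2;
  const long total_weight = p_threads * p_weight + e_threads * e_weight;
  HybridSplit s;
  s.per_p_thread = iterations * p_weight / total_weight;
  s.per_e_thread = iterations * e_weight / total_weight;
  s.e_total = s.per_e_thread * e_threads;
  s.p_total = iterations - s.e_total; // base shares plus the extras
  return s;
}
```

For 100 iterations with 16 CORE and 4 ATOM threads, this gives 5 base iterations per CORE thread, 3 per ATOM thread, and totals of 88 and 12.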
2023-11-03 | Add openmp support to System z (#66081) | Neale Ferguson | 1 | -0/+33
* openmp/README.rst - Add s390x to those platforms supported
* openmp/libomptarget/plugins-nextgen/CMakeLists.txt - Add s390x subdirectory
* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt - Add s390x definitions
* openmp/runtime/CMakeLists.txt - Add s390x to those platforms supported
* openmp/runtime/cmake/LibompGetArchitecture.cmake - Define s390x ARCHITECTURE
* openmp/runtime/cmake/LibompMicroTests.cmake - Add dependencies for System z (aka s390x)
* openmp/runtime/cmake/LibompUtils.cmake - Add S390X to the mix
* openmp/runtime/cmake/config-ix.cmake - Add s390x as a supported LIBOMP_ARCH
* openmp/runtime/src/kmp_affinity.h - Define __NR_sched_[get|set]affinity for s390x
* openmp/runtime/src/kmp_config.h.cmake - Define CACHE_LINE for s390x
* openmp/runtime/src/kmp_os.h - Add KMP_ARCH_S390X to support checks
* openmp/runtime/src/kmp_platform.h - Define KMP_ARCH_S390X
* openmp/runtime/src/kmp_runtime.cpp - Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/kmp_tasking.cpp - Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h - Define ITT_ARCH_S390X
* openmp/runtime/src/z_Linux_asm.S - Instantiate __kmp_invoke_microtask for s390x
* openmp/runtime/src/z_Linux_util.cpp - Generate code when KMP_ARCH_S390X is defined
* openmp/runtime/test/ompt/callback.h - Define print_possible_return_addresses for s390x
* openmp/runtime/tools/lib/Platform.pm - Return s390x as platform and host architecture
* openmp/runtime/tools/lib/Uname.pm - Set hardware platform value for s390x
2023-09-01 | [lldb] Fix duplicate word typos; NFC | Fangrui Song | 1 | -2/+2
Those fixes were taken from https://reviews.llvm.org/D137338
2023-08-22 | [OpenMP] Let primary thread gather topology info for each worker thread | Jonathan Peyton | 1 | -16/+24
This change has the primary thread create each thread's initial mask and topology information so it is available immediately after forking. The setting of mask/topology information is decoupled from the actual binding. Also add this setting of topology information inside the __kmp_partition_places mechanism for OMP_PLACES+OMP_PROC_BIND. Without this, there could be a timing window after the primary thread signals the workers to fork where worker threads have not yet established their affinity mask or topology information. Each worker thread will then bind to the location the primary thread sets. Differential Revision: https://reviews.llvm.org/D156727
2023-07-31 | [OpenMP] Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY | Jonathan Peyton | 1 | -52/+203
* Add KMP_CPU_EQUAL and KMP_CPU_ISEMPTY to affinity mask API
* Add printout of leader to hardware thread dump
* Allow OMP_PLACES to restrict fullMask
  This change fixes an issue with the OMP_PLACES=resource(#) syntax. Before this change, specifying the number of resources did NOT change the default number of threads created by the runtime. e.g., OMP_PLACES=cores(2) would still create __kmp_avail_proc number of threads. After this change, the fullMask and __kmp_avail_proc are modified if necessary so that the final place list dictates which resources are available and thus how many threads are created by default.
* Introduce hybrid core attributes to OMP_PLACES and KMP_AFFINITY
  For OMP_PLACES, two new features are added:
  1) OMP_PLACES=cores:<attribute> where <attribute> is either intel_atom, intel_core, or eff# where # is 0 - number of core efficiencies-1. This syntax also supports the optional (#) number selection of resources.
  2) OMP_PLACES=core_types|core_effs where this setting will create the number of core_types (or core_effs|core_efficiencies).
  For KMP_AFFINITY, the granularity setting is expanded to include two new keywords: core_type, and core_eff (or core_efficiency). This will set the granularity to include all cores with a particular core type (or efficiency). e.g., KMP_AFFINITY=granularity=core_type,compact will create threads which can float across a single core type.
Differential Revision: https://reviews.llvm.org/D154547
2023-07-24 | [OpenMP] Re-use affinity raii class in worker spawning | Jonathan Peyton | 1 | -22/+0
Get rid of explicit mask alloc, getthreadaffinity, set temp affinity, reset to old affinity, dealloc steps in favor of existing kmp_affinity_raii_t to push/pop a temporary affinity. Differential Revision: https://reviews.llvm.org/D154650
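The push/pop pattern that an RAII class provides can be sketched generically. The code below is not `kmp_affinity_raii_t`; it uses a hypothetical `ScopedAffinity` class and a stand-in `current_mask()` variable instead of real operating system calls, purely to show how the constructor saves and replaces the mask and the destructor restores it.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the calling thread's affinity mask. A real runtime would query
// and set the mask through the operating system instead.
inline std::uint64_t &current_mask() {
  static std::uint64_t mask = 0xff;
  return mask;
}

// Hypothetical RAII guard: applies a temporary mask on construction and
// restores the previous one on destruction (or on an explicit restore()).
class ScopedAffinity {
public:
  explicit ScopedAffinity(std::uint64_t temporary) : saved_(current_mask()) {
    current_mask() = temporary;
  }
  ~ScopedAffinity() { restore(); }
  ScopedAffinity(const ScopedAffinity &) = delete;
  ScopedAffinity &operator=(const ScopedAffinity &) = delete;

  void restore() {
    if (!restored_) {
      current_mask() = saved_;
      restored_ = true;
    }
  }

private:
  std::uint64_t saved_;
  bool restored_ = false;
};
```

Because the restore happens in the destructor, early returns cannot leave the temporary mask in place, which is what makes the explicit alloc/get/set/reset/dealloc sequence unnecessary.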
2023-07-06 | [OpenMP] Ensure socket layer is not first in CPUID topology detection | Jonathan Peyton | 1 | -1/+5
* Return 0 length topology if socket layer is detected first
* Fix DEBUG ASSERT
2023-01-19 | [OpenMP][libomp] Insert correct HWLOC version guards | Gilles Gouaillardet | 1 | -0/+8
Put needed HWLOC version guards around relevant HWLOC API. Tested OpenMP host runtime build with HWLOC 1.11.13, 2.0-2.9. Differential Revision: https://reviews.llvm.org/D142152 Fix #54951
2023-01-18 | [OpenMP][libomp] Switch Intel topology type values: module, tile | Jonathan Peyton | 1 | -2/+2
According to the Software Developer's Manual, module should be value 3 and tile should be value 4.
2023-01-16 | [OpenMP][libomp] Add topology information to thread structure | Jonathan Peyton | 1 | -1/+132
Each time a thread gets a new affinity assigned, it will not only assign its mask, but also topology information including which socket, core, thread and core-attributes (if available) it is now assigned. This occurs for all non-disabled KMP_AFFINITY values as well as OMP_PLACES/OMP_PROC_BIND.
The information regarding which socket, core, etc. can take on three values:
1) The actual ID of the unit (0 - (N-1)), given N units
2) UNKNOWN_ID (-1) which indicates it does not know which ID
3) MULTIPLE_ID (-2) which indicates the thread is spread across multiple of this unit (e.g., affinity mask is spread across multiple hardware threads)
This new information is stored in the th_topology_ids[] array. For example, to get the socket ID, one would read th_topology_ids[KMP_HW_SOCKET]. This could be expanded in the future to something more descriptive for the "multiple" case, like a range of values. For now, the single value suffices.
The information regarding the core attributes can take on two values:
1) The actual core-type or core-eff
2) KMP_HW_CORE_TYPE_UNKNOWN if the core type is unknown, and UNKNOWN_CORE_EFF (-1) if the core eff is unknown.
This new information is stored in th_topology_attrs. For example, to get the core type, one would read th_topology_attrs.core_type.
Differential Revision: https://reviews.llvm.org/D139854
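The three-valued encoding can be illustrated by collapsing the ids of all hardware threads in a mask into one value for a given layer. This is a hedged sketch only: `summarize_ids`, `kUnknownId`, and `kMultipleId` are hypothetical names that mirror the -1 and -2 values from the message, and the runtime's own logic may handle more cases.

```cpp
#include <cassert>
#include <vector>

constexpr int kUnknownId = -1;  // mirrors UNKNOWN_ID from the message
constexpr int kMultipleId = -2; // mirrors MULTIPLE_ID from the message

// Hypothetical helper: given the id of one layer (e.g. socket) for every
// hardware thread in a mask, return the shared id, kMultipleId when the ids
// differ, or kUnknownId when the mask is empty.
// Assumption: every id passed in is a real, non-negative id.
inline int summarize_ids(const std::vector<int> &ids) {
  int result = kUnknownId;
  for (int id : ids) {
    assert(id >= 0);
    if (result == kUnknownId)
      result = id; // first id seen
    else if (result != id)
      return kMultipleId; // the mask spans more than one unit
  }
  return result;
}
```

A mask confined to socket 3 summarizes to 3, while a mask covering sockets 0 and 1 summarizes to the "multiple" marker.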
2022-12-13 | [OpenMP] Skip extra blank line when parsing /proc/cpuinfo on LoongArch64 | gonglingqin | 1 | -0/+11
This fixes the following test cases:
* affinity/kmp-affinity.c
* affinity/kmp-hw-subset.c
* affinity/omp-places.c
Differential Revision: https://reviews.llvm.org/D139802
2022-11-02 | [OpenMP][libomp] Fix disabled affinity | Jonathan Peyton | 1 | -2/+10
Fix setting affinity type and topology method when affinity is disabled and fix places that were not taking into account that affinity can be explicitly disabled by putting proper KMP_AFFINITY_CAPABLE() check. Differential Revision: https://reviews.llvm.org/D137176
2022-10-28 | [OpenMP][libomp] Add hidden helper affinity | Jonathan Peyton | 1 | -43/+51
Add new hidden helper affinity via the environment variable, KMP_HIDDEN_HELPER_AFFINITY, which allows users to assign thread affinity to hidden helper threads using the same syntax as KMP_AFFINITY. OMP_PLACES/OMP_PROC_BIND have no interaction with KMP_HIDDEN_HELPER_AFFINITY. Differential Revision: https://reviews.llvm.org/D135113
2022-10-28 | [OpenMP][libomp] Make affinity warnings parameterized | Jonathan Peyton | 1 | -32/+34
Separate change to make the warnings depend on the verbose and warnings values of the relevant affinity settings. Differential Revision: https://reviews.llvm.org/D135112
2022-10-28 | [OpenMP][libomp] Parameterize affinity functions | Jonathan Peyton | 1 | -262/+290
This patch parameterizes the affinity initialization code to allow multiple affinity settings. Almost all global affinity settings are consolidated and put into a structure kmp_affinity_t. This is in anticipation of the addition of hidden helper affinity which will have the same syntax and semantics as KMP_AFFINITY only for the hidden helper team. Differential Revision: https://reviews.llvm.org/D135109
2022-10-03 | [OpenMP][libomp] Allow unused-but-set warnings | Jonathan Peyton | 1 | -2/+0
Only a few remaining which are taken care of by this patch. Differential Revision: https://reviews.llvm.org/D133528
2022-07-19 | [OpenMP][libomp] Fix affinity warnings and unify under one macro | Jonathan Peyton | 1 | -82/+48
Warnings that occur during affinity initialization are supposed to be guarded by KMP_AFFINITY=nowarnings,noverbose, but some had been missed by this logic. Create one macro for affinity warnings that takes these settings into account. Differential Revision: https://reviews.llvm.org/D125991
2022-07-19 | [OpenMP][libomp] Allow reset affinity mask after parallel | AndreyChurbanov | 1 | -0/+22
Added control to reset affinity of primary thread after outermost parallel region to initial affinity encountered before OpenMP runtime was initialized. KMP_AFFINITY environment variable reset/noreset modifier introduced. Default behavior is unchanged. Differential Revision: https://reviews.llvm.org/D125993
2022-04-12 | [OpenMP][libomp] Replace global variable references with local object | Jonathan Peyton | 1 | -3/+3
Remove references to global __kmp_topology within a kmp_topology_t object method. There should just be implicit references to the private object.
2022-02-14 | [OpenMP][libomp] Introduce oneAPI compiler support | Jonathan Peyton | 1 | -1/+1
Introduce KMP_COMPILER_ICX macro to represent compilation with oneAPI compiler. Fixup flag detection and compiler ID detection in CMake. Older CMake versions detect IntelLLVM as Clang. Fix compiler warnings. Fixup many of the tests to have non-empty parallel regions as they are elided by oneAPI compiler.
2022-02-09 | [OpenMP][libomp] Replace accidental VLA with KMP_ALLOCA | Jonathan Peyton | 1 | -1/+1
MSVC does not support variable length arrays. Replace with KMP_ALLOCA which is already used in the same file for stack-allocated variables.
2021-12-20 | [OpenMP][libomp] Add use-all syntax to KMP_HW_SUBSET | Jonathan Peyton | 1 | -4/+8
This patch allows the user to request all resources of a particular layer (or core-attribute). The syntax of KMP_HW_SUBSET is modified so the number of units requested is optional or can be replaced with an '*' character.
e.g., KMP_HW_SUBSET=c:intel_atom@3 will use all the cores after offset 3
e.g., KMP_HW_SUBSET=*c:intel_core will use all the big cores
e.g., KMP_HW_SUBSET=*s,*c,1t will use all the sockets, all cores per each socket and 1 thread per core.
Differential Revision: https://reviews.llvm.org/D115826
2021-12-14 | [OpenMP][libomp] Fix compile errors with new KMP_HW_SUBSET changes | Jonathan Peyton | 1 | -0/+2
Add missing guards around x86-specific code. Reviewed By: kaz7 Differential Revision: https://reviews.llvm.org/D115664
2021-12-10 | [OpenMP][libomp] Add core attributes to KMP_HW_SUBSET | Jonathan Peyton | 1 | -57/+386
Allow filtering of resources based on core attributes. There are two new attributes added:
1) Core Type (intel_atom, intel_core)
2) Core Efficiency (integer) where the higher the efficiency, the more performant the core
On hybrid architectures, e.g., Alder Lake, users can specify KMP_HW_SUBSET=4c:intel_atom,4c:intel_core to select the first four Atom and first four Big cores. They can also use the efficiency syntax, e.g., KMP_HW_SUBSET=2c:eff0,2c:eff1
Differential Revision: https://reviews.llvm.org/D114901
2021-11-17 | [OpenMP][libomp] Enable HWLOC topology detection of multiple CPU kinds | Peyton, Jonathan L | 1 | -10/+62
Teach the HWLOC topology method how to detect Atom and Core types so hybrid CPUs are properly detected and represented when using the HWLOC topology method. Differential Revision: https://reviews.llvm.org/D112270
2021-11-17 | [OpenMP][libomp] Improve Windows Processor Group handling within topology | Peyton, Jonathan L | 1 | -5/+95
The current implementation of Windows Processor Groups has a separate topology method to handle them. This patch deprecates that specific method and uses the regular CPUID topology method by default and inserts the Windows Processor Group objects in the topology manually.
Notes:
* The preference for processor groups is lowered to a value less than socket so that the user will see sockets in the KMP_AFFINITY=verbose output instead of processor groups when sockets=processor groups.
* The topology's capacity is modified to handle additional topology layers without the need for reallocation.
* If a user asks for a granularity setting that is "above" the processor group layer, then the granularity is adjusted "down" to the processor group since this is the coarsest layer available for threads.
Differential Revision: https://reviews.llvm.org/D112273
2021-11-17 | [OpenMP][libomp] Add support for offline CPUs in Linux | Peyton, Jonathan L | 1 | -3/+73
If some CPUs are offline, then make sure they are not included in the fullMask even if norespect is given to KMP_AFFINITY. Differential Revision: https://reviews.llvm.org/D112274
2021-11-17 | [OpenMP][libomp] Allow users to specify KMP_HW_SUBSET in any order | Peyton, Jonathan L | 1 | -17/+3
Remove restriction forcing users to specify the KMP_HW_SUBSET value in topology order. This patch sorts the user KMP_HW_SUBSET value before trying to apply it. For example: 1s,4c,2t is equivalent to 2t,1s,4c Differential Revision: https://reviews.llvm.org/D112027
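Sorting the user's items into topology order before applying them can be sketched as follows. The sketch assumes only the simple `<count><layer>` form with the layers `s`, `c`, and `t`; the full KMP_HW_SUBSET syntax has more layers and options. `layer_rank` and `normalize_subset` are hypothetical names, not runtime functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical ranking of layers from outermost to innermost.
inline int layer_rank(char layer) {
  switch (layer) {
  case 's': return 0; // socket
  case 'c': return 1; // core
  case 't': return 2; // thread
  default:  return -1;
  }
}

// Hypothetical helper: reorder items such as "2t,1s,4c" into topology order
// ("1s,4c,2t"). Returns an empty string when an item is malformed.
inline std::string normalize_subset(const std::string &value) {
  struct Item {
    int rank;
    std::string text;
  };
  std::vector<Item> items;
  std::size_t pos = 0;
  while (pos <= value.size()) {
    std::size_t comma = value.find(',', pos);
    if (comma == std::string::npos)
      comma = value.size();
    const std::string token = value.substr(pos, comma - pos);
    if (token.size() < 2)
      return "";
    // Everything before the final layer letter must be a digit.
    for (std::size_t i = 0; i + 1 < token.size(); ++i)
      if (!std::isdigit(static_cast<unsigned char>(token[i])))
        return "";
    const int rank = layer_rank(token.back());
    if (rank < 0)
      return "";
    items.push_back({rank, token});
    pos = comma + 1;
  }
  std::stable_sort(items.begin(), items.end(),
                   [](const Item &a, const Item &b) { return a.rank < b.rank; });
  std::string result;
  for (const Item &item : items) {
    if (!result.empty())
      result += ',';
    result += item.text;
  }
  return result;
}
```

Under these assumptions, `2t,1s,4c` and `1s,4c,2t` normalize to the same string, which is the equivalence the message describes.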
2021-10-14 | [OpenMP][host runtime] Add initial hybrid CPU support | Peyton, Jonathan L | 1 | -0/+75
Detect, through CPUID.1A, and show the user different core types through the KMP_AFFINITY=verbose mechanism. Offer __kmp_is_hybrid_cpu() so that future runtime optimizations can know whether they are running on a hybrid system or not. Differential Revision: https://reviews.llvm.org/D110435