author: Jose M Monsalve Diaz <jmonsalvediaz@anl.gov> 2021-07-27 21:46:39 -0400
committer: Shilei Tian <tianshilei1992@gmail.com> 2021-07-27 21:47:12 -0400
commit: 5ab6aedda9d959a44453b7163b59f645012dbb83
tree: ab4aec1aba18256261589cf1bf483bac66f754d5 /openmp
parent: 4819b751bd875f458eb0060f7c586aa9ac41965c
[OpenMP] Folding threadLimit and numThreads when single value in kernels

The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033

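The "check all the kernels, fold only if their attributes match" rule from the commit message can be sketched in isolation. The following is a minimal illustration, not the code of `AAFoldRuntimeCall`: the type `KernelAttr` and the function `foldIfUniform` are hypothetical names introduced here, and a kernel attribute is modeled simply as an optional integer.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for one kernel's NumThreads (or NumTeams) attribute:
// holds a value when the attribute is a known constant, std::nullopt otherwise.
using KernelAttr = std::optional<int>;

// Returns the constant a runtime call could be folded to, or std::nullopt
// when folding is not justified. This is a sketch of the matching rule only.
std::optional<int> foldIfUniform(const std::vector<KernelAttr> &Kernels) {
  if (Kernels.empty())
    return std::nullopt; // No kernel to derive a value from.

  std::optional<int> Common;
  for (const KernelAttr &K : Kernels) {
    if (!K)
      return std::nullopt; // One kernel has no constant value: do not fold.
    if (Common && *Common != *K)
      return std::nullopt; // Two kernels disagree: do not fold.
    Common = K;
  }
  return Common; // Every kernel agrees on this single value.
}
```

With this sketch, three kernels that all carry the value 128 yield 128, while a mix of 128 and 256 yields no fold, which is the case the commit message defers to future specialization work.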
Diffstat (limited to 'openmp')
-rw-r--r-- openmp/libomptarget/deviceRTLs/target_interface.h | 4
1 file changed, 2 insertions, 2 deletions
diff --git a/openmp/libomptarget/deviceRTLs/target_interface.h b/openmp/libomptarget/deviceRTLs/target_interface.h
index a4961c3..c7ac065 100644
--- a/openmp/libomptarget/deviceRTLs/target_interface.h
+++ b/openmp/libomptarget/deviceRTLs/target_interface.h
@@ -18,8 +18,8 @@
// Calls to the NVPTX layer (assuming 1D layout)
EXTERN int __kmpc_get_hardware_thread_id_in_block();
EXTERN int GetBlockIdInKernel();
-EXTERN int __kmpc_get_hardware_num_blocks();
-EXTERN int __kmpc_get_hardware_num_threads_in_block();
+EXTERN NOINLINE int __kmpc_get_hardware_num_blocks();
+EXTERN NOINLINE int __kmpc_get_hardware_num_threads_in_block();
EXTERN unsigned GetWarpId();
EXTERN unsigned GetWarpSize();
EXTERN unsigned GetLaneId();
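
The effect of the `NOINLINE` marker added above can be illustrated with a small standalone sketch. It assumes `NOINLINE` expands to the GCC/Clang `noinline` attribute (the macro's definition is not part of this diff), and `hypotheticalNumThreadsInBlock` and `elementsPerThread` are illustrative names, not runtime functions. A plausible reading of the change is that keeping the query out of line leaves a named call site that a later folding pass can still recognize and replace with a constant.

```cpp
// Assumption: NOINLINE maps to the GCC/Clang noinline attribute; the real
// macro definition lives elsewhere in the device runtime.
#define NOINLINE __attribute__((noinline))

// Hypothetical stand-in for a hardware query. It returns a fixed value here
// purely for illustration.
NOINLINE int hypotheticalNumThreadsInBlock() { return 128; }

// A caller that depends on the query. Because the callee is not inlined,
// the call remains visible as a call to a named function.
int elementsPerThread(int TotalElements) {
  int NumThreads = hypotheticalNumThreadsInBlock();
  return (TotalElements + NumThreads - 1) / NumThreads; // Ceiling division.
}
```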