author | Johannes Doerfert <johannes@jdoerfert.de> | 2023-03-02 18:35:15 -0800 |
---|---|---|
committer | Johannes Doerfert <johannes@jdoerfert.de> | 2023-07-24 22:04:45 -0700 |
commit | ef9ec4bbcca2fa4f64df47bc426f1d1c59ea47e2 (patch) | |
tree | b3f8edde0b515fde75774548bf3dbb07eb98e640 /clang/lib/CodeGen/CodeGenModule.h | |
parent | fb2a971c015fa991b47aa8d93bd97379c012cb68 (diff) | |
[OpenMP] Add the `ompx_attribute` clause for target directives
CUDA and HIP have kernel attributes to tune the code generation (in the
backend). To reuse this functionality for OpenMP target regions we
introduce the `ompx_attribute` clause that takes these kernel
attributes and emits code as if they had been attached to the kernel
function (which is implicitly generated).
To limit the impact, we only support three kernel attributes:
- `amdgpu_waves_per_eu`, for AMDGPU
- `amdgpu_flat_work_group_size`, for AMDGPU
- `launch_bounds`, for NVPTX
The existing implementations of those attributes are used for error
checking and code generation. `ompx_attribute` can be attached to any
executable target region and it can hold more than one kernel attribute.
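
As a rough illustration of the clause described above, the following sketch attaches `launch_bounds` to an OpenMP target region. The function name `saxpy`, the macro `HAVE_OMPX_ATTRIBUTE`, the bounds `256, 2`, and the compiler-version guard are all assumptions made for this example and are not part of the patch; the guard is deliberately conservative so that the loop simply runs on the host when the clause is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical guard (not from the patch): only use the clause when building
// with Clang and OpenMP enabled. The version cutoff is a conservative guess.
#if defined(__clang__) && defined(_OPENMP) && (__clang_major__ >= 18)
#define HAVE_OMPX_ATTRIBUTE 1
#else
#define HAVE_OMPX_ATTRIBUTE 0
#endif

// y = a * x + y. When the clause is available, the implicitly generated
// kernel is treated as if `launch_bounds(256, 2)` had been written on it.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
  assert(x.size() == y.size());
  const int n = static_cast<int>(x.size());
  const float *xp = x.data();
  float *yp = y.data();

#if HAVE_OMPX_ATTRIBUTE
#pragma omp target teams distribute parallel for \
    map(to: xp[0:n]) map(tofrom: yp[0:n]) \
    ompx_attribute(__attribute__((launch_bounds(256, 2))))
#endif
  for (int i = 0; i < n; ++i)
    yp[i] = a * xp[i] + yp[i];
}
```

Since the clause can hold more than one kernel attribute, an AMDGPU build could presumably list `amdgpu_flat_work_group_size` and `amdgpu_waves_per_eu` in the same way; the exact spelling accepted there should be checked against the tests in the revision.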
Differential Revision: https://reviews.llvm.org/D156184
Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.h')
-rw-r--r-- | clang/lib/CodeGen/CodeGenModule.h | 15 |
1 files changed, 15 insertions, 0 deletions
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 05cb217..f5fd944 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1557,6 +1557,21 @@ public:
   /// because we'll lose all important information after each repl.
   void moveLazyEmissionStates(CodeGenModule *NewBuilder);
 
+  /// Emit the IR encoding to attach the CUDA launch bounds attribute to \p F.
+  void handleCUDALaunchBoundsAttr(llvm::Function *F,
+                                  const CUDALaunchBoundsAttr *A);
+
+  /// Emit the IR encoding to attach the AMD GPU flat-work-group-size attribute
+  /// to \p F. Alternatively, the work group size can be taken from a \p
+  /// ReqdWGS.
+  void handleAMDGPUFlatWorkGroupSizeAttr(
+      llvm::Function *F, const AMDGPUFlatWorkGroupSizeAttr *A,
+      const ReqdWorkGroupSizeAttr *ReqdWGS = nullptr);
+
+  /// Emit the IR encoding to attach the AMD GPU waves-per-eu attribute to \p F.
+  void handleAMDGPUWavesPerEUAttr(llvm::Function *F,
+                                  const AMDGPUWavesPerEUAttr *A);
+
 private:
   llvm::Constant *GetOrCreateLLVMFunction(
       StringRef MangledName, llvm::Type *Ty, GlobalDecl D, bool ForVTable,