[CUDA][HIP] Fix linkage for -fgpu-rdc

Currently, for explicit template function instantiation in CUDA/HIP device compilation, clang emits the instantiated kernel with external linkage and the instantiated device function with internal linkage.

This is fine for -fno-gpu-rdc since there is only one TU. However, for -fgpu-rdc it causes duplicate kernel symbols if the same instantiation happens in multiple TUs, or missing symbols if a device function calls an explicitly instantiated template function in a different TU.

To make explicit template function instantiation work for -fgpu-rdc, we need to follow the C++ linkage paradigm, i.e. use weak_odr linkage.

Differential Revision: https://reviews.llvm.org/D90311
author: Yaxun (Sam) Liu <yaxun.liu@amd.com> 2020-10-28 10:44:21 -0400
committer: Yaxun (Sam) Liu <yaxun.liu@amd.com> 2020-11-03 08:07:19 -0500
commit: abd8cd9199d1e14cae961e1067b78df7044179a3 (patch)
tree: 9ebeb3833d3a57f2075f0019781fbedaf789938e /clang/lib/CodeGen/CodeGenModule.cpp
parent: c009d11bdac4a7f4a3a8ae85e42da053828a6f24 (diff)
1 files changed, 7 insertions, 4 deletions
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 9512b35..1efc39b 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -4483,13 +4483,16 @@ llvm::GlobalValue::LinkageTypes CodeGenModule::getLLVMLinkageForDeclarator(
   // and must all be equivalent. However, we are not allowed to
   // throw away these explicit instantiations.
   //
-  // We don't currently support CUDA device code spread out across multiple TUs,
+  // CUDA/HIP: For -fno-gpu-rdc case, device code is limited to one TU,
   // so say that CUDA templates are either external (for kernels) or internal.
-  // This lets llvm perform aggressive inter-procedural optimizations.
+  // This lets llvm perform aggressive inter-procedural optimizations. For
+  // -fgpu-rdc case, device function calls across multiple TU's are allowed,
+  // therefore we need to follow the normal linkage paradigm.
   if (Linkage == GVA_StrongODR) {
-    if (Context.getLangOpts().AppleKext)
+    if (getLangOpts().AppleKext)
       return llvm::Function::ExternalLinkage;
-    if (Context.getLangOpts().CUDA && Context.getLangOpts().CUDAIsDevice)
+    if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice &&
+        !getLangOpts().GPURelocatableDeviceCode)
       return D->hasAttr<CUDAGlobalAttr>() ? llvm::Function::ExternalLinkage
                                           : llvm::Function::InternalLinkage;
     return llvm::Function::WeakODRLinkage;
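
The decision made by the patched GVA_StrongODR branch can be modelled outside of clang as a small standalone sketch. The names LangOptsSketch, LinkageSketch and strongODRLinkage below are hypothetical and exist only for illustration; they are not clang's LangOptions or llvm::GlobalValue::LinkageTypes, and the IsKernel flag stands in for D->hasAttr<CUDAGlobalAttr>().

```cpp
// Hypothetical stand-in for the four language options the patched branch
// reads. This is NOT clang's LangOptions class.
struct LangOptsSketch {
  bool AppleKext;
  bool CUDA;
  bool CUDAIsDevice;
  bool GPURelocatableDeviceCode; // true when compiling with -fgpu-rdc
};

// Hypothetical stand-in for the three LLVM linkage kinds returned here.
enum class LinkageSketch { External, Internal, WeakODR };

// Mirrors the control flow of the GVA_StrongODR branch after the patch.
// IsKernel models D->hasAttr<CUDAGlobalAttr>().
constexpr LinkageSketch strongODRLinkage(LangOptsSketch LO, bool IsKernel) {
  if (LO.AppleKext)
    return LinkageSketch::External;
  // Only the -fno-gpu-rdc device compilation keeps the single-TU special case.
  if (LO.CUDA && LO.CUDAIsDevice && !LO.GPURelocatableDeviceCode)
    return IsKernel ? LinkageSketch::External : LinkageSketch::Internal;
  // Everything else, including -fgpu-rdc device code, follows normal C++.
  return LinkageSketch::WeakODR;
}

// Compile-time checks: with -fgpu-rdc both kernels and device functions
// get weak_odr linkage.
static_assert(strongODRLinkage({false, true, true, true}, true) ==
                  LinkageSketch::WeakODR, "rdc kernel");
static_assert(strongODRLinkage({false, true, true, true}, false) ==
                  LinkageSketch::WeakODR, "rdc device function");
```

Under this model, the only inputs whose result changes with the patch are device compilations with GPURelocatableDeviceCode set; host compilation and -fno-gpu-rdc device compilation return the same linkage as before.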