path: root/clang/lib/CodeGen/CGCUDANV.cpp
Age  Commit message  Author  Files  Lines
2025-06-12Reland [HIP] use offload wrapper for non-device-only non-rdc (#143964)Yaxun (Sam) Liu1-1/+2
Fixed a typo:
- auto Section = (Prefix + "llvm_offload_entries").str();
+ auto Section = (Prefix + "_offload_entries").str();
which broke the buildbots, e.g. https://lab.llvm.org/buildbot/#/builders/208/builds/1948
2025-06-12Revert "Reland [HIP] use offload wrapper for non-device-only non-rdc (#132869) (#143964)"Yaxun (Sam) Liu1-2/+1
This reverts commit 22f9b4aa1dad597d908be77be1e10ba4c77330ce.
2025-06-12Reland [HIP] use offload wrapper for non-device-only non-rdc (#132869) (#143964)Yaxun (Sam) Liu1-1/+2
Fixed two issues: 1. An assertion with -flto: the linker wrapper action was missing for wrapping the device binary. Added it for -flto. 2. When there are two HIP files, the kernels in the second file were not found. This is because the -r option of the linker wrapper assumed the offload entries section of HIP to be hip_offloading_entries, but it is actually llvm_offload_entries, so the offload entries sections were not made unique for different object files. Fixed and tested working for both the -fgpu-rdc and -fno-gpu-rdc cases, with and without -r.
2025-06-09Revert "[HIP] use offload wrapper for non-device-only non-rdc (#132869)" (#143432)Joseph Huber1-2/+1
This breaks a lot of new driver HIP compilation. We should probably revert this for now until we can make a fixed version.
```c++
static __global__ void print() { printf("%s\n", "foo"); }
void b();
int main() {
  hipLaunchKernelGGL(print, dim3(1), dim3(1), 0, 0);
  auto y = hipDeviceSynchronize();
  b();
}
```
```c++
static __global__ void print() { printf("%s\n", "bar"); }
void b() {
  hipLaunchKernelGGL(print, dim3(1), dim3(1), 0, 0);
  auto y = hipDeviceSynchronize();
}
```
```console
$ clang++ a.hip b.hip --offload-arch=gfx1030 --offload-new-driver
$ ./a.out
foo
foo
```
```console
$ clang++ a.hip b.hip --offload-arch=gfx1030 --offload-new-driver -flto
<crash>
```
This reverts commit d54c28b9c1396fa92d9347ac1135da7907121cb8.
2025-05-04[clang] Remove unused local variables (NFC) (#138453)Kazu Hirata1-1/+0
2025-04-09[HIP] use offload wrapper for non-device-only non-rdc (#132869)Yaxun (Sam) Liu1-1/+2
Currently HIP still uses the offload bundler for non-rdc mode with the new offload driver. This patch switches to the offload wrapper for non-device-only non-rdc mode when the new offload driver is enabled. This makes rdc and non-rdc compilation more consistent and speeds up compilation, since the offload wrapper supports parallel compilation for different GPU architectures. It is implemented by adding a linker wrapper action for each assemble action of an input file. The linker wrapper action differentiates this special type of work from normal linker wrapper work by the file type. This type of work results in an object instead of an image. The linker wrapper adds "-r" for it and only includes the object file as input, not the host libraries. For device-only non-RDC mode, the new driver keeps the original behavior.
2025-02-06[Offload] Unify offloading entries into a single section (#125731)Joseph Huber1-13/+12
Summary: This patch unifies the existing offloading entries into a single section called `llvm_offload_entries`. This lets us use a more unified offloading infrastructure so that all targets share the same handling. The effect is that people in the runtimes now need to check if the kind is what they expect, but the expectation is that you can combine multiple potential providers into a compile job. Doesn't fully work yet because of other runtime issues, but some day. Mostly this helps the future of liboffload where we want to handle different languages than OpenMP.
2025-01-28[Offload] Rework offloading entry type to be more generic (#124018)Joseph Huber1-18/+9
Summary: The previous offloading entry type did not fit the current use-cases very well. This widens it and adds a version to prevent further annoyances. It also includes the kind to better sort who's using it. The first 64 bits are reserved as zero so the OpenMP runtime can detect the old format for binary compatibility.
2025-01-22[HIP] Support managed variables using the new driver (#123437)Joseph Huber1-6/+28
Summary: Previously, managed variables didn't work in rdc mode using the new driver because we just didn't register them. This was previously ignored because we didn't have enough space in the current struct format. This patch amends that by just emitting a struct pair for the two variables and using the single pointer. In the future, a more extensible entry format would be nice, but that can be done later.
2024-09-11[HIP][Clang][CodeGen] Handle hip bin symbols properly. (#107458)jofrn1-8/+11
Remove '_' in fatbin and gpubin symbol suffixes when missing TU hash ID. Internalize gpubin symbol so that it is not unresolved at link-time when symbol is not relocatable.
2024-08-12[Offload][CUDA] Allow CUDA kernels to use LLVM/Offload (#94549)Johannes Doerfert1-15/+82
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be lowered to the LLVM/Offload API. On the Clang side, this is simply done by using the OpenMP offload toolchain and emitting calls to `llvm*` functions to orchestrate the kernel launch rather than `cuda*` functions. These `llvm*` functions are implemented on top of the existing LLVM/Offload API. As we are about to redefine the Offload API, this will help us in the design process as a second offload language. We do not support any CUDA APIs yet, however, we could: https://www.osti.gov/servlets/purl/1892137 For proper host execution we need to resurrect/rebase https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf (which was designed for debugging).
```
❯❯❯ cat test.cu
extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native

❯❯❯ llvm-objdump --offloading test123

test123: file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
2024-08-09[DebugInfo][RemoveDIs] Use iterator-inserters in clang (#102006)Jeremy Morse1-3/+3
As part of the LLVM effort to eliminate debug-info intrinsics, we're moving to a world where only iterators should be used to insert instructions. This isn't a problem in clang when instructions get generated before any debug-info is inserted, however we're planning on deprecating and removing the instruction-pointer insertion routines. Scatter some calls to getIterator in a few places, remove a deref-then-addrof on another iterator, and add an overload for the createLoadInstBefore utility. Some callers pass a null insertion point, which we need to handle explicitly now.
2024-05-30[CodeGen] Remove useless zero-index constant GEPs (NFCI)Nikita Popov1-8/+2
Remove zero-index constant expression GEPs, which are not needed with opaque pointers and will get folded away.
2024-05-01[CUDA] make kernel stub ICF-proof (#90155)Yaxun (Sam) Liu1-0/+27
The MSVC linker merges comdat functions that have an identical set of instructions. CUDA uses the kernel stub function as the key to look up kernels in device executables. If the kernel stub functions for different kernels are merged by ICF, incorrect kernels will be launched. To prevent ICF from merging kernel stub functions, a unique global variable is created for each kernel stub function that has comdat, and a store to it is added to the kernel stub function. This makes the set of instructions in each kernel stub function unique. Fixes: https://github.com/llvm/llvm-project/issues/88883
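The ICF-proofing idea can be illustrated in plain C++. This is a minimal sketch, not the IR clang emits: the names `stubA`, `stubB`, `launch`, and the `*_icf_guard` variables are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstdint>

// Hypothetical guard variables: one unique global per kernel stub.
volatile std::uint8_t stubA_icf_guard = 0;
volatile std::uint8_t stubB_icf_guard = 0;

// Stand-in for the shared launch logic. Without the guards, both stubs
// would have identical bodies and be candidates for identical-code folding.
inline int launch(int gridSize) { return gridSize * 2; }

// Each stub stores to its own guard, so the instruction sequences differ
// and a folding linker has no reason to merge the two functions.
int stubA(int gridSize) {
  stubA_icf_guard = 1;
  return launch(gridSize);
}

int stubB(int gridSize) {
  stubB_icf_guard = 1;
  return launch(gridSize);
}

// Kernels are looked up by stub address, so the addresses must differ.
bool stubsAreDistinct() { return &stubA != &stubB; }
```

The store is cheap at run time; its only purpose is to make each stub's body unique.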
2024-04-11[NFC][Clang] Improve const correctness for IdentifierInfo (#79365)Bill Wendling1-1/+1
The IdentifierInfo isn't typically modified. Use 'const' wherever possible.
2024-03-28[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86923)Akira Hatanaka1-9/+10
To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies d9a685a9dd589486e882b722e513ee7b8c84870c, which was reverted because it broke ubsan bots. There seems to be a bug in coroutine code-gen, which is causing EmitTypeCheck to use the wrong alignment. For now, pass alignment zero to EmitTypeCheck so that it can compute the correct alignment based on the passed type (see function EmitCXXMemberOrOperatorMemberCallExpr).
2024-03-27Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721)" (#86898)Akira Hatanaka1-10/+9
This reverts commit d9a685a9dd589486e882b722e513ee7b8c84870c. The commit broke ubsan bots.
2024-03-27[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#86721)Akira Hatanaka1-9/+10
To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects. This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit broke msan bots because LValue::IsKnownNonNull was uninitialized.
2024-03-26Revert "[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454)" (#86674)Akira Hatanaka1-10/+9
This reverts commit 8bd1f9116aab879183f34707e6d21c7051d083b6. It appears that the commit broke msan bots.
2024-03-25[CodeGen][arm64e] Add methods and data members to Address, which are needed to authenticate signed pointers (#67454)Akira Hatanaka1-9/+10
To authenticate pointers, CodeGen needs access to the key and discriminators that were used to sign the pointer. That information is sometimes known from the context, but not always, which is why `Address` needs to hold that information. This patch adds methods and data members to `Address`, which will be needed in subsequent patches to authenticate signed pointers, and uses the newly added methods throughout CodeGen. Although this patch isn't strictly NFC as it causes CodeGen to use different code paths in some cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any changes in functionality as it doesn't add any information needed for authentication. In addition to the changes mentioned above, this patch introduces class `RawAddress`, which contains a pointer that we know is unsigned, and adds several new functions for creating `Address` and `LValue` objects.
2024-03-22[HIP][NFC] Refactor managed var codegen (#85976)Yaxun (Sam) Liu1-18/+9
Refactor managed variable handling in codegen so that the transformation is done separately from registration. This will allow the new driver to register the managed var in the linker wrapper.
2024-03-19Reapply "[NFC][RemoveDIs] Switch ConstantExpr::getAsInstruction to not insert (#84737)"Stephen Tozer1-1/+2
Fixes a build error caused by an unupdated getAsInstruction callsite in clang. This reverts commit ab851f7fe946e7eed700ef9d82082eb721860189.
2024-02-22[HIP] Allow partial linking for `-fgpu-rdc` (#81700)Yaxun (Sam) Liu1-11/+11
`-fgpu-rdc` mode allows device functions to call device functions in a different TU. However, currently all device objects have to be linked together since only one fat binary is supported. This is time-consuming for the AMDGPU backend since it only supports LTO. There are use cases where objects can be divided into groups in which device functions are self-contained but host functions are not. It is desirable to link/optimize/codegen the device code and generate a fatbin for each group, while partially linking the host code with `ld -r` or generating a static library with the `--emit-static-lib` option of clang. This avoids linking all device code together and therefore decreases the linking time for `-fgpu-rdc`. Previously, clang emitted an external symbol `__hip_fatbin` for all objects for `-fgpu-rdc`. With this patch, clang emits a unique external symbol `__hip_fatbin_{cuid}` for the fat binary for each object. When a group of objects is linked together to generate a fatbin, the symbols are merged by alias and point to the same fat binary. Each group has its own fat binary. One executable or shared library can have multiple fat binaries. Device linking is done for undefined fat binary symbols only, to avoid repeated linking. `__hip_gpubin_handle` is also uniquified and merged to avoid repeated registering. Symbol `__hip_cuid_{cuid}` is introduced to facilitate debugging and tooling. Fixes: https://github.com/llvm/llvm-project/issues/77018
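The per-object symbol naming and the merging by alias can be modeled roughly as follows. This is only a sketch of the bookkeeping, assuming a CUID is available as a string; `fatbinSymbol`, `linkGroup`, and the integer fat binary id are hypothetical and do not correspond to clang or linker APIs.

```cpp
#include <map>
#include <string>
#include <vector>

// Builds the per-object symbol name from a CUID string.
std::string fatbinSymbol(const std::string &cuid) {
  return "__hip_fatbin_" + cuid;
}

// Models "merged by alias": every object's symbol in one link group resolves
// to that group's single fat binary (identified by a hypothetical id).
std::map<std::string, int> linkGroup(const std::vector<std::string> &cuids,
                                     int fatbinId) {
  std::map<std::string, int> symbols;
  for (const std::string &cuid : cuids)
    symbols[fatbinSymbol(cuid)] = fatbinId;
  return symbols;
}
```

Two groups linked separately would each carry their own id, which mirrors one executable holding multiple fat binaries.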
2024-01-20[clang] Use SmallString::operator std::string (NFC)Kazu Hirata1-1/+1
2023-12-16[clang][CGCUDANV] Unify PointerType members of CGNVCUDARuntime (NFC) (#75668)Youngsuk Kim1-50/+38
Unify 3 `PointerType *` members which all refer to the same llvm type. Opaque pointer clean-up effort.
2023-12-07[CUDA][HIP] Improve variable registration with the new driver (#73177)Joseph Huber1-7/+20
Summary: This patch adds support for registering texture / surface variables from CUDA / HIP. Additionally, we now properly track the `extern` and `const` flags that are also used in these runtime functions. This does not implement the `managed` variables yet as those seem to require some extra handling I'm not familiar with. The issue is that the current offload entry isn't large enough to carry size and alignment information along with an extra global.
2023-10-25[clang] Remove redundant ptr-to-ptr bitcasts (NFC)Youngsuk Kim1-17/+13
Remove redundant bitcasts performed on instances of `llvm::GlobalVariable`, which are pointers. Opaque pointer cleanup effort. NFC.
2023-10-25[Offloading][NFC] Move creation of offloading entries from OpenMP (#70116)Joseph Huber1-14/+14
Summary: This patch is a first step to remove dependencies on the OpenMPIRBuilder for creating generic offloading entries. This patch changes no functionality and merely moves the code around. In the future the interface will be changed to allow for more code re-use in the registration and creation of offloading entries as well as a more generic interface for CUDA, HIP, OpenMP, and SYCL(?). Doing this as a first step to reduce the noise involved in the functional changes.
2023-09-25[clang][CodeGen] Simplify code based on opaque pointers (#65624)Björn Pettersson1-6/+3
- Update CodeGenTypeCache to use a single union for all pointers in address space zero.
- Introduce a UnqualPtrTy in CodeGenTypeCache, and use that (for example instead of llvm::PointerType::getUnqual) in some places.
- Drop some redundant bit/pointer casts from ptr to ptr.
2023-09-14[HIP] Fix comdat of template kernel handle (#66283)Yaxun (Sam) Liu1-1/+4
Currently, clang emits LLVM IR that fails the verifier for the following code:
```
template<typename T>
__global__ void foo(T x);

void bar() {
  foo<<<1, 1>>>(0);
}
```
This is due to clang putting the kernel handle for foo into comdat, which is not allowed, since the kernel handle is a declaration. The situation is similar to calling a declaration-only template function. The callee will be a declaration in LLVM IR and won't be put into comdat. This is in contrast to calling a template function with a body, which will be put into comdat. Fixes: SWDEV-419769
2023-07-13[clang] Support '-fgpu-default-stream=per-thread' for NVIDIA CUDAboxu.zhang1-3/+7
I'm using clang to compile CUDA code. And just found that clang doesn't support the per-thread stream option for NV CUDA. I don't know if there is another solution. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D154822
2023-06-16[clang] Replace use of Type::getPointerTo() (NFC)Youngsuk Kim1-8/+9
Partial progress towards replacing in-tree uses of `Type::getPointerTo()`. This needs to be done before deprecating the API. Reviewed By: nikic, barannikov88 Differential Revision: https://reviews.llvm.org/D152321
2023-05-27[clang-repl][CUDA] Re-land: Initial interactive CUDA support for clang-replAnubhab Ghosh1-2/+4
CUDA support can be enabled in clang-repl with --cuda flag. Device code linking is not yet supported. inline must be used with all __device__ functions. Differential Revision: https://reviews.llvm.org/D146389
2023-05-20Revert "[clang-repl][CUDA] Initial interactive CUDA support for clang-repl"Anubhab Ghosh1-4/+2
This reverts commit 80e7eed6a610ab3c7289e6f9b7ec006bc7d7ae31.
2023-05-20[clang-repl][CUDA] Initial interactive CUDA support for clang-replAnubhab Ghosh1-2/+4
CUDA support can be enabled in clang-repl with --cuda flag. Device code linking is not yet supported. inline must be used with all __device__ functions. Differential Revision: https://reviews.llvm.org/D146389
2023-03-21[CUDA] Update cached kernel handle when the function instance changes.Artem Belevich1-2/+17
Fixes clang crash caused by a stale function pointer. The bug has been present for a pretty long time, but we were lucky not to trigger it until D140663. Differential Revision: https://reviews.llvm.org/D146448
2023-01-19CUDA/HIP: Use kernel name to map to symbolDaniele Castagna1-12/+13
Currently CGCUDANV uses an llvm::Function as a key to map kernels to a symbol in host code. HIP adds one level of indirection and uses the llvm::Function to map to a global variable that will be initialized to the kernel stub ptr. Unfortunately there is no guarantee that the llvm::Function created by GetOrCreateLLVMFunction will be the same. In fact, the first time we encounter GetOrCreateLLVMFunction for a kernel, the type might not be completed yet, and the type of the llvm::Function will be a generic {}, since the complete type is not required to get a symbol to a function. In this case we end up creating two global variables, one for the llvm::Function with the incomplete type and one for the function with the complete type. The first global variable will be declared but not defined, resulting in a linking error. This change uses the mangled name of the llvm::Function as the key in the KernelHandles map, so that the same kernel is associated with the same kernel handle even if the types are different. Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D140663
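The effect of keying on the mangled name rather than on the function object can be sketched outside of clang. Everything here is a hypothetical model: `FakeFunction` stands in for llvm::Function, `KernelHandle` for the handle global, and `HandleTable` for the KernelHandles map.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for llvm::Function: one kernel may be represented
// by two different objects (incomplete type first, complete type later).
struct FakeFunction {
  std::string mangledName;
};

// Hypothetical stand-in for the global variable acting as kernel handle.
struct KernelHandle {
  int id;
};

class HandleTable {
public:
  // Key on the mangled name so that both objects describing one kernel
  // resolve to the same handle instead of creating a second one.
  KernelHandle &getOrCreate(const FakeFunction &fn) {
    auto it = handles_.find(fn.mangledName);
    if (it == handles_.end())
      it = handles_.emplace(fn.mangledName, KernelHandle{nextId_++}).first;
    return it->second;
  }

  std::size_t size() const { return handles_.size(); }

private:
  std::unordered_map<std::string, KernelHandle> handles_;
  int nextId_ = 0;
};
```

With a pointer-keyed map, the two `FakeFunction` objects would have produced two handles, which is the duplicate-global situation the commit describes.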
2022-10-17Do not append terminating NUL to the binary string with embedded fatbin.Artem Belevich1-17/+29
Extra NUL does not impact functionality of the generated code, but it confuses various NVIDIA tools used to examine embedded GPU binaries. Differential Revision: https://reviews.llvm.org/D135832
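The size difference at stake is easy to show in isolation. The following is a small sketch, not clang's code: `emitBytes` is a hypothetical helper that copies a binary blob and optionally appends the terminating NUL that a C string constant would carry.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Copies the bytes of a binary blob; when addNull is true, appends the
// terminating NUL a C string constant would have.
std::vector<unsigned char> emitBytes(const std::string &blob, bool addNull) {
  std::vector<unsigned char> out(blob.begin(), blob.end());
  if (addNull)
    out.push_back(0);
  return out;
}
```

A blob with embedded NUL bytes keeps its exact length when `addNull` is false, which is what a tool parsing the embedded binary by size would expect.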
2022-07-13[CUDA] Allow the new driver to compile CUDA in non-RDC modeJoseph Huber1-4/+4
The new driver primarily allows us to support RDC-mode compilations with proper linking. This is not needed for non-RDC mode compilation, but we still would like the new driver to be able to handle this mode so we can transition away from the old driver in the future. This patch adds the necessary code to support creating a fatbinary for CUDA code generation as well as removing old assumptions and errors about RDC-mode with the new driver. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D129655
2022-07-11[HIP] Generate offloading entries for HIP with the new driver.Joseph Huber1-1/+2
This patch adds the small change required to output offloading entries for HIP instead of CUDA. These should be placed in different sections because they need to be distinct to the offloading toolchain; otherwise we'd have HIP trying to register CUDA kernels or vice versa. This patch will precede support for HIP in the linker wrapper. Reviewed By: yaxunl, tra Differential Revision: https://reviews.llvm.org/D128850
2022-05-26[Cuda] Use fallback method to mangle externalized decls if no CUID givenJoseph Huber1-2/+1
CUDA requires that static variables be visible to the host when offloading. However, the standard semantics of a static variable dictate that it should not be visible outside of the current file. In order to access it from the host we need to perform "externalization" on the static variable on the device. This requires generating a semi-unique name that can be affixed to the variable so as not to cause linker errors. This is currently done using the CUID functionality, an MD5 hash value set up by the clang driver. This gives us a mostly unique ID that is unique even between multiple compilations of the same file. However, this is not always available. Instead, this patch uses the unique ID from the file to generate a unique symbol name. This will create a unique name that is consistent between the host and device side compilations without requiring the CUID to be entered by the driver. The one downside to this is that we are no longer stable under multiple compilations of the same file. However, this is a very niche use-case and is not supported by Nvidia's CUDA compiler, so it is likely to be good enough. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D125904
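The choice between the two ID sources might look like the sketch below. It is an illustration only: the helper name `externalizedName`, the `__static__` separator, and modeling the file's unique ID as a device/file number pair are all assumptions made for this example, not a description of clang's exact scheme.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: builds the externalized name for a static variable.
// Prefers the driver-provided CUID; if none was given, falls back to an ID
// derived from the source file (modeled here as two numbers, an assumption).
std::string externalizedName(const std::string &mangled,
                             const std::string &cuid,
                             std::uint64_t deviceId,
                             std::uint64_t fileId) {
  const std::string suffix =
      cuid.empty() ? std::to_string(deviceId) + "_" + std::to_string(fileId)
                   : cuid;
  return mangled + "__static__" + suffix;
}
```

Because host and device compilations see the same file, both sides derive the same fallback suffix; recompiling a copy of the file elsewhere would yield the same CUID-free name only if the file identity matched.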
2022-05-11[CUDA] Create offloading entries when using the new driverJoseph Huber1-2/+43
The changes made in D123460 generalized the code generation for OpenMP's offloading entries. We can use the same scheme to register globals for CUDA code. This patch adds the code generation to create these offloading entries when compiling using the new offloading driver mode. The offloading entries are simple structs that contain the information necessary to register the global. The struct used is as follows:
```
struct __tgt_offload_entry {
  void *addr;       // Pointer to the offload entry info.
                    // (function or global)
  char *name;       // Name of the function or global.
  size_t size;      // Size of the entry info (0 if it is a function).
  int32_t flags;
  int32_t reserved;
};
```
Currently CUDA handles RDC code generation by deferring the registration of globals in the current TU to a callback function containing the module's ID. Later all the module IDs will be used to register all of the globals at once. Rather than mimic this, offloading entries allow us to mimic the way OpenMP registers globals. That is, we create a simple global struct for each device global to be registered. These are placed in a special section, `cuda_offloading_entries`. Because this section name is a valid C identifier, the linker will provide `__start` and `__stop` pointers that we can use to iterate over and register all globals at runtime. The registration requires a flag variable to indicate which registration function to use. I have assigned the flags somewhat arbitrarily, but these use the following values: Kernel: 0, Variable: 0, Managed: 1, Surface: 2, Texture: 3. Depends on D120272 Reviewed By: tra Differential Revision: https://reviews.llvm.org/D123471
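How a runtime could walk such entries and pick a registration path is sketched below. This is a hedged illustration, not the actual registration code: `OffloadEntry` mirrors the struct quoted above under a hypothetical name, `classify` and `classifyAll` are invented helpers, and telling kernels from variables by `size == 0` is an assumption based on the struct's comment, since both use flag 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the entry struct quoted in the commit message (hypothetical name).
struct OffloadEntry {
  void *addr;            // Function or global being registered.
  const char *name;      // Name of the function or global.
  std::size_t size;      // 0 for a function, per the struct comment.
  std::int32_t flags;    // Kernel/Variable: 0, Managed: 1, Surface: 2, Texture: 3.
  std::int32_t reserved;
};

enum class EntryKind { Kernel, Variable, Managed, Surface, Texture, Unknown };

// Assumption: kernels and plain variables share flag 0, so size decides.
EntryKind classify(const OffloadEntry &e) {
  if (e.size == 0)
    return EntryKind::Kernel;
  switch (e.flags) {
  case 0: return EntryKind::Variable;
  case 1: return EntryKind::Managed;
  case 2: return EntryKind::Surface;
  case 3: return EntryKind::Texture;
  default: return EntryKind::Unknown;
  }
}

// Walks a [begin, end) range, the same shape as the range bounded by the
// linker-provided start and stop symbols of the entries section.
std::vector<EntryKind> classifyAll(const OffloadEntry *begin,
                                   const OffloadEntry *end) {
  std::vector<EntryKind> kinds;
  for (const OffloadEntry *e = begin; e != end; ++e)
    kinds.push_back(classify(*e));
  return kinds;
}
```

In a real link the range would come from the section's start and stop symbols rather than from a local array; the loop itself stays the same.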
2022-05-04[NFC][CUDA][HIP] rework mangling number for aux targetYaxun (Sam) Liu1-3/+0
CUDA/HIP needs to mangle for the aux target. When mangling for the aux target, the mangler should use the mangling number for the aux target. Previously, in https://reviews.llvm.org/D122734, a state was introduced in ASTContext to let the mangler get the mangling number for the aux target from ASTContext. This patch removes that state from ASTContext and adds an IsAux member to MangleContext to indicate that the mangle context is for the aux target. This reflects the reality that the mangle context is created for mangling the aux target and makes ASTContext cleaner. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D124842
2022-04-28[CUDA][HIP] Fix mangling number for local structYaxun (Sam) Liu1-0/+3
MSVC and Itanium mangling use different mangling numbers for function-scope structs, which causes inconsistent mangled kernel names in device and host compilations. This patch uses the Itanium mangling number for structs when mangling device-side names in CUDA/HIP host compilation on Windows to fix this issue. A state is added to ASTContext to indicate whether the current name mangling is for device-side names in host compilation. Device and host mangling numbers are encoded/decoded as the upper and lower halves of a 32-bit unsigned integer to fit into the original mangling number field for the AST. A diagnostic will be emitted if a mangling number exceeds the limit. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D122734 Fixes: SWDEV-328515
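The packing of two mangling numbers into one 32-bit field can be written out as a short sketch. The function names are hypothetical, and placing the device number in the upper half and the host number in the lower half is an assumption read from the wording above; the `false` return models the case where the real compiler would emit a diagnostic.

```cpp
#include <cstdint>

constexpr std::uint32_t kHalfMax = 0xFFFFu; // Largest value that fits 16 bits.

// Packs device (upper 16 bits) and host (lower 16 bits) mangling numbers.
// Returns false if either number exceeds the 16-bit limit.
bool encodeManglingNumbers(std::uint32_t host, std::uint32_t device,
                           std::uint32_t &encoded) {
  if (host > kHalfMax || device > kHalfMax)
    return false;
  encoded = (device << 16) | host;
  return true;
}

std::uint32_t hostManglingNumber(std::uint32_t encoded) {
  return encoded & kHalfMax;
}

std::uint32_t deviceManglingNumber(std::uint32_t encoded) {
  return encoded >> 16;
}
```

The round trip is lossless for any pair within the limit, so the existing 32-bit field needs no change in size.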
2022-04-10[CUDA][HIP] Externalize kernels in anonymous name spaceYaxun (Sam) Liu1-2/+2
Kernels in anonymous namespaces need to have unique names to avoid duplicate symbols. Fixes: https://github.com/llvm/llvm-project/issues/54560 Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D123353
2022-04-09[CUDA/HIP] Remove argument from module ctor/dtor signaturesJonas Hahnfeld1-6/+6
In theory, constructors can take arguments when called via .init_array where at least glibc passes in (argc, argv, envp). This isn't used in the generated code and if it was, the first argument should be an integer, not a pointer. For destructors registered via atexit, the function should never take an argument. Differential Revision: https://reviews.llvm.org/D123370
2022-03-22[CodeGen] Remove some uses of deprecated Address constructorNikita Popov1-3/+3
Remove two stray uses in CodeGenModule and CGCUDANV.
2022-02-23[HIP] Support `-fgpu-default-stream`Yaxun (Sam) Liu1-4/+11
Introduce the -fgpu-default-stream={legacy|per-thread} option to support the per-thread default stream for the HIP runtime. With -fgpu-default-stream=per-thread, HIP kernels are launched through hipLaunchKernel_spt instead of hipLaunchKernel. In addition, HIP_API_PER_THREAD_DEFAULT_STREAM=1 is defined by the preprocessor to enable other per-thread stream APIs. Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D120298
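The selection between the two launch entry points can be summarized in a few lines. This is a sketch of the decision only: `DefaultStream`, `parseDefaultStream`, and `launchFunctionName` are hypothetical names, while the option values and the two HIP function names are taken from the commit message.

```cpp
#include <string>

enum class DefaultStream { Legacy, PerThread };

// Maps the option value to a mode; anything other than "per-thread" is
// treated as legacy in this sketch (an assumption, not clang's validation).
DefaultStream parseDefaultStream(const std::string &value) {
  return value == "per-thread" ? DefaultStream::PerThread
                               : DefaultStream::Legacy;
}

// Picks the launch function the kernel stub would call for a given mode.
std::string launchFunctionName(DefaultStream mode) {
  return mode == DefaultStream::PerThread ? "hipLaunchKernel_spt"
                                          : "hipLaunchKernel";
}
```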
2022-02-17[CodeGen] Rename deprecated Address constructorNikita Popov1-3/+4
To make uses of the deprecated constructor easier to spot, and to ensure that no new uses are introduced, rename it to Address::deprecated(). While doing the rename, I've filled in element types in cases where it was relatively obvious, but we're still left with 135 calls to the deprecated constructor.
2021-12-06[HIP] Fix -fgpu-rdc for WindowsYaxun (Sam) Liu1-0/+3
This patch fixes issues for -fgpu-rdc for the Windows MSVC toolchain:
- Fix COFF specific section flags and remove section types in llvm-mc input file for Windows.
- Escape fatbin path in llvm-mc input file.
- Add -triple option to llvm-mc.
- Put __hip_gpubin_handle in comdat when it has linkonce_odr linkage.
Reviewed by: Artem Belevich Differential Revision: https://reviews.llvm.org/D115039