[LinkerWrapper] Support relocatable linking for offloading (#80066)

Summary:
The standard GPU compilation process embeds each intermediate object
file into the host file at the `.llvm.offloading` section so it can be
linked later. We also use a special section called something like
`omp_offloading_entries` to store all the globals that need to be
registered by the runtime. The linker-wrapper's job is to link the
embedded device code stored at this section and then emit code to
register the linked image and the kernels and globals in the offloading
entry section.

One downside to RDC linking is that it can become quite big for very
large projects that wish to make use of static linking. This patch
changes the support for relocatable linking via `-r` to support a kind
of "partial" RDC compilation for offloading languages.

This primarily requires manually editing the embedded data in the output
object file for the relocatable link. We need to rename the output
section to make it distinct from the input sections that will be merged.
We then delete the old embedded object code so it won't be linked
further. We then need to rename the old offloading section so that it is
private to the module. A runtime solution could also be done to defer
entries that don't belong to the given GPU executable, but this is
easier. Note that this does not work with COFF linking, only with the
ELF method for handling offloading entries, though COFF could be made to
work similarly.

Given this support, the following compilation path should produce two
distinct images for OpenMP offloading.

```
$ clang foo.c -fopenmp --offload-arch=native -c
$ clang foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang main.c merged.o -fopenmp --offload-arch=native
$ ./a.out
```

Or similarly for HIP to effectively perform non-RDC mode compilation for
a subset of files.

```
$ clang -x hip foo.c --offload-arch=native --offload-new-driver -fgpu-rdc -c
$ clang -x hip foo.c -lomptarget.devicertl --offload-link -r -o merged.o
$ clang -x hip main.c merged.o --offload-arch=native --offload-new-driver -fgpu-rdc
$ ./a.out
```

One question is whether or not this should be the default behavior of
`-r` when run through the linker-wrapper or a special option. Standard
`-r` behavior is still possible if used without invoking the
linker-wrapper, and it is guaranteed to be correct.
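
As a rough illustration of the section edits described in the summary, the following toy model treats an object file as a plain mapping from section names to bytes. It is only a sketch under stated assumptions: the function name `finalize_relocatable`, the `_merged` suffix, and the dictionary representation are all hypothetical and are not taken from the patch, which operates on real object files.

```python
# Toy model only: an "object file" is a dict of section name -> bytes.
# All names here are illustrative assumptions, not the linker-wrapper's API.
OFFLOAD_SECTION = ".llvm.offloading"        # embedded device code
ENTRIES_SECTION = "omp_offloading_entries"  # globals to register at runtime


def finalize_relocatable(sections, suffix):
    """Return a copy of `sections` with the post-link edits applied."""
    result = {}
    for name, data in sections.items():
        if name == OFFLOAD_SECTION:
            # Drop the embedded device code so a later link cannot
            # pick it up and link it a second time.
            continue
        if name == ENTRIES_SECTION:
            # Rename the entries so they stay private to this module and
            # are not merged with entries from other inputs.
            name = name + suffix
        result[name] = data
    return result


merged = {
    ".text": b"host code",
    ".llvm.offloading": b"device objects",
    "omp_offloading_entries": b"entry table",
}
out = finalize_relocatable(merged, "_merged")
print(sorted(out))
```

The model returns a new mapping rather than editing in place, which keeps the input available for comparison. It says nothing about how the registration code or the symbols that bound the entry section are handled.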
author: Joseph Huber <huberjn@outlook.com> 2024-02-07 08:20:07 -0600
committer: GitHub <noreply@github.com> 2024-02-07 08:20:07 -0600
commit: 5c84054223102b00cc0dd343a4023e3c6cba42b2 (patch)
tree: a927883f2f49a3e5af31043b51f501aac420aac6 /clang/docs/ClangLinkerWrapper.rst
parent: 52bf531630d19e115d30b4ca46f1ef03b9a724c6 (diff)
1 files changed, 18 insertions, 0 deletions
diff --git a/clang/docs/ClangLinkerWrapper.rst b/clang/docs/ClangLinkerWrapper.rst
index fbabb4f..1e851b0 100644
--- a/clang/docs/ClangLinkerWrapper.rst
+++ b/clang/docs/ClangLinkerWrapper.rst
@@ -54,12 +54,30 @@ only for the linker wrapper will be forwarded to the wrapped linker job.
     --pass-remarks=<value> Pass remarks for LTO
     --print-wrapped-module Print the wrapped module's IR for testing
     --ptxas-arg=<value>    Argument to pass to the 'ptxas' invocation
+    --relocatable          Link device code to create a relocatable offloading application
     --save-temps           Save intermediate results
     --sysroot<value>       Set the system root
     --verbose              Verbose output from tools
     --v                    Display the version number and exit
     --                     The separator for the wrapped linker arguments
 
+Relocatable Linking
+===================
+
+The ``clang-linker-wrapper`` handles linking embedded device code and then 
+registering it with the appropriate runtime. Normally, this is only done when 
+the executable is created so other files containing device code can be linked 
+together. This can be somewhat problematic for users who wish to ship static 
+libraries that contain offloading code to users without a compatible offloading 
+toolchain.
+
+When using a relocatable link with ``-r``, the ``clang-linker-wrapper`` will
+perform the device linking and registration eagerly. This removes the
+embedded device code from the output and registers the linked image with the
+runtime. Semantically, this is similar to creating a shared library object.
+If standard relocatable linking is desired, do not run the binaries through
+the ``clang-linker-wrapper``. A standard relocatable link only appends the
+embedded device code so that it can be linked later.
 
 Example
 =======