author Tobias Burnus <tobias@codesourcery.com> 2023-05-05 11:27:32 +0200
committer Tobias Burnus <tobias@codesourcery.com> 2023-05-05 11:27:32 +0200
commit 4359724cba31b2645f6106266bef019c3d6ef16a
tree 7febe16cf9c9a785c8e6f9d880e5397019ea20e6
parent 21cf5ec1993f33d7993559db25bc14c1fa57d790
nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098]
Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global
variables by NULL if the translated module does not contain any executable
code. With CUDA 11.1, this does not happen. This commit concerns reverse
offload: NULL values in the reverse-offload function table disable reverse
offload when the image is loaded.

The solution is the same as the one Thomas found for a related issue: adding
a dummy procedure. Cf. the PR of this issue and Thomas' patch
"nvptx: Support global constructors/destructors via 'collect2'"
https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html

As that approach also works here:

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>

gcc/
	PR libgomp/108098
	* config/nvptx/mkoffload.cc (process): Emit dummy procedure
	alongside reverse-offload function table to prevent NULL values
	of the function addresses.
-rw-r--r-- gcc/config/nvptx/mkoffload.cc 14
1 files changed, 14 insertions, 0 deletions
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index edb03cf..6cdea45 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -357,6 +357,20 @@ process (FILE *in, FILE *out, uint32_t omp_requires)
fputc (sm_ver2[i], out);
fprintf (out, "\"\n\t\".file 1 \\\"<dummy>\\\"\"\n");
+ /* WORKAROUND - see PR 108098
+ It seems as if older CUDA JIT compilers optimize the function pointers
+ in offload_func_table to NULL, which can be prevented by adding a
+ dummy procedure. With CUDA 11.1, it seems to work fine without the
+ workaround, while CUDA 10.2 and some older versions need the
+ workaround. Assuming CUDA 11.0 fixes it, emitting it could be
+ restricted to 'if (sm_ver2[0] < 8 && version2[0] < 7)' as sm_80 and
+ PTX ISA 7.0 are new in CUDA 11.0; for 11.1 it would be sm_86 and
+ PTX ISA 7.1. */
+ fprintf (out, "\n\t\".func __dummy$func ( );\"\n");
+ fprintf (out, "\t\".func __dummy$func ( )\"\n");
+ fprintf (out, "\t\"{\"\n");
+ fprintf (out, "\t\"}\"\n");
+
size_t fidx = 0;
for (id = func_ids; id; id = id->next)
{