author: Cesar Philippidis <cesar@codesourcery.com> 2018-08-13 05:04:24 -0700
committer: Tom de Vries <vries@gcc.gnu.org> 2018-08-13 12:04:24 +0000
commit: bd9b3d3d1a8d33e460ae137da0cb0d5a919e8f8f
tree: 0729abf2a8dd6b64d2d8313c3c16b16b428c15ab /libgomp/plugin/cuda-lib.def
parent: cdf899781c7321987a9948e5ca0847e8b38da798
[nvptx] Use CUDA driver API to select default runtime launch geometry

The CUDA driver API, starting with version 6.5, offers a set of runtime
functions to calculate several occupancy-related measures, as a replacement
for the occupancy calculator spreadsheet.

This patch adds a heuristic for default runtime launch geometry, based on
the new runtime function cuOccupancyMaxPotentialBlockSize.

Built on x86_64 with nvptx accelerator and ran libgomp testsuite.

2018-08-13  Cesar Philippidis  <cesar@codesourcery.com>
            Tom de Vries  <tdevries@suse.de>

        PR target/85590
        * plugin/cuda/cuda.h (CUoccupancyB2DSize): New typedef.
        (cuOccupancyMaxPotentialBlockSize): Declare.
        * plugin/cuda-lib.def (cuOccupancyMaxPotentialBlockSize): New
        CUDA_ONE_CALL_MAYBE_NULL.
        * plugin/plugin-nvptx.c (CUDA_VERSION < 6050): Define
        CUoccupancyB2DSize and declare
        cuOccupancyMaxPotentialBlockSize.
        (nvptx_exec): Use cuOccupancyMaxPotentialBlockSize to set the
        default num_gangs and num_workers when the driver supports it.

Co-Authored-By: Tom de Vries <tdevries@suse.de>

From-SVN: r263505

Diffstat (limited to 'libgomp/plugin/cuda-lib.def')
-rw-r--r-- libgomp/plugin/cuda-lib.def | 1
1 files changed, 1 insertions, 0 deletions
diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index 29028b5..b2a4c21 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -41,6 +41,7 @@ CUDA_ONE_CALL (cuModuleGetGlobal)
 CUDA_ONE_CALL (cuModuleLoad)
 CUDA_ONE_CALL (cuModuleLoadData)
 CUDA_ONE_CALL (cuModuleUnload)
+CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize)
 CUDA_ONE_CALL (cuStreamCreate)
 CUDA_ONE_CALL (cuStreamDestroy)
 CUDA_ONE_CALL (cuStreamQuery)
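
For readers unfamiliar with the .def list above, the following is a minimal sketch of how a list of CUDA_ONE_CALL / CUDA_ONE_CALL_MAYBE_NULL entries could be expanded so that a missing optional symbol does not make initialization fail. It is not the code in plugin-nvptx.c. The symbol tables, the stub functions, and the names lookup, init_cuda_lib, generic_fn and struct cuda_lib are hypothetical stand-ins (a real plugin would resolve symbols from the driver library), and the entry list is inlined as a macro rather than included from a file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: every entry point is modelled as the same function type.  */
typedef int (*generic_fn) (void);

static int stub_call (void) { return 0; }

/* Hypothetical symbol tables standing in for two driver versions.  */
struct symbol { const char *name; generic_fn fn; };

/* An older driver: no cuOccupancyMaxPotentialBlockSize.  */
static const struct symbol old_driver[] = {
  { "cuModuleLoad", stub_call },
  { "cuStreamCreate", stub_call },
};

/* A newer driver that also provides the occupancy query.  */
static const struct symbol new_driver[] = {
  { "cuModuleLoad", stub_call },
  { "cuStreamCreate", stub_call },
  { "cuOccupancyMaxPotentialBlockSize", stub_call },
};

/* Stand-in for a dynamic symbol lookup; returns NULL when NAME is absent.  */
static generic_fn
lookup (const struct symbol *tab, size_t n, const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < n; i++)
    if (strcmp (tab[i].name, name) == 0)
      return tab[i].fn;
  return NULL;
}

/* A shortened entry list in the style of cuda-lib.def.  */
#define CUDA_CALLS \
  CUDA_ONE_CALL (cuModuleLoad) \
  CUDA_ONE_CALL_MAYBE_NULL (cuOccupancyMaxPotentialBlockSize) \
  CUDA_ONE_CALL (cuStreamCreate)

/* First expansion: one function-pointer member per entry.  */
struct cuda_lib
{
#define CUDA_ONE_CALL(call) generic_fn call;
#define CUDA_ONE_CALL_MAYBE_NULL(call) generic_fn call;
  CUDA_CALLS
#undef CUDA_ONE_CALL
#undef CUDA_ONE_CALL_MAYBE_NULL
};

/* Second expansion: resolve each entry.  Returns 1 on success and 0 if a
   required symbol is missing; optional symbols may stay NULL.  */
static int
init_cuda_lib (struct cuda_lib *lib, const struct symbol *tab, size_t n)
{
#define CUDA_ONE_CALL(call) \
  lib->call = lookup (tab, n, #call); \
  if (lib->call == NULL) \
    return 0;
#define CUDA_ONE_CALL_MAYBE_NULL(call) \
  lib->call = lookup (tab, n, #call);
  CUDA_CALLS
#undef CUDA_ONE_CALL
#undef CUDA_ONE_CALL_MAYBE_NULL
  return 1;
}
```

Under these assumptions, a caller would check lib.cuOccupancyMaxPotentialBlockSize against NULL before using it and otherwise fall back to its previous default launch geometry, which matches the "when the driver supports it" wording in the ChangeLog entry for nvptx_exec.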