diff options
author | Olexa Bilaniuk <obilaniu@gmail.com> | 2019-01-28 05:29:04 -0500 |
---|---|---|
committer | Olexa Bilaniuk <obilaniu@gmail.com> | 2019-02-02 22:49:02 -0500 |
commit | 592af0b1afa03b10beffee481c00c60a7b7db907 (patch) | |
tree | e2e2f27733b1357f1078b675ed6cfb19f6ae5181 /docs/markdown/Cuda-module.md | |
parent | ad442b352083a48f97f3f47834b770da1ef1248b (diff) | |
download | meson-592af0b1afa03b10beffee481c00c60a7b7db907.zip meson-592af0b1afa03b10beffee481c00c60a7b7db907.tar.gz meson-592af0b1afa03b10beffee481c00c60a7b7db907.tar.bz2 |
Add unstable CUDA module.
Includes three general utility functions connected to CUDA, in
particular the crafting of -gencode flags as done in CMake:
https://github.com/Kitware/CMake/blob/master/Modules/FindCUDA/
select_compute_arch.cmake
Diffstat (limited to 'docs/markdown/Cuda-module.md')
-rw-r--r-- | docs/markdown/Cuda-module.md | 183 |
1 files changed, 183 insertions, 0 deletions
diff --git a/docs/markdown/Cuda-module.md b/docs/markdown/Cuda-module.md new file mode 100644 index 0000000..6e7be47 --- /dev/null +++ b/docs/markdown/Cuda-module.md @@ -0,0 +1,183 @@ +--- +short-description: CUDA module +authors: + - name: Olexa Bilaniuk + years: [2019] + has-copyright: false +... + +# Unstable CUDA Module (`unstable-cuda`) +_Since: 0.50.0_ + +This module provides helper functionality related to the CUDA Toolkit and +building code using it. + + +**Note**: this module is unstable. It is only provided as a technology preview. +Its API may change in arbitrary ways between releases or it might be removed +from Meson altogether. + + +## Importing the module + +The module may be imported as follows: + +``` meson +cuda = import('unstable-cuda') +``` + +It offers several useful functions that are enumerated below. + + +## Functions + +### `nvcc_arch_flags()` +_Since: 0.50.0_ + +``` meson +cuda.nvcc_arch_flags(nvcc_or_version, ..., + detected: string_or_array) +``` + +Returns a list of `-gencode` flags that should be passed to `cuda_args:` in +order to compile a "fat binary" for the architectures/compute capabilities +enumerated in the positional argument(s). The flags shall be acceptable to +the NVCC compiler object `nvcc_or_version`, or its version string. + +A set of architectures and/or compute capabilities may be specified by: + +- The single positional argument `'All'`, `'Common'` or `'Auto'` +- As (an array of) + - Architecture names (`'Kepler'`, `'Maxwell+Tegra'`, `'Turing'`) and/or + - Compute capabilities (`'3.0'`, `'3.5'`, `'5.3'`, `'7.5'`) + +A suffix of `+PTX` requests PTX code generation for the given architecture. +A compute capability given as `A.B(X.Y)` requests PTX generation for an older +virtual architecture `X.Y` before binary generation for a newer architecture +`A.B`. + +Multiple architectures and compute capabilities may be passed in using + +- Multiple positional arguments +- Lists of strings +- Space (` `), comma (`,`) or semicolon (`;`)-separated strings + +The single-word architectural sets `'All'`, `'Common'` or `'Auto'` cannot be +mixed with architecture names or compute capabilities. Their interpretation is: + +| Name | Compute Capability | +|-------------------|--------------------| +| `'All'` | All CCs supported by given NVCC compiler. | +| `'Common'` | Relatively common CCs supported by given NVCC compiler. Generally excludes Tegra and Tesla devices. | +| `'Auto'` | The CCs provided by the `detected:` keyword, filtered for support by given NVCC compiler. | + +The supported architecture names and their corresponding compute capabilities +are: + +| Name | Compute Capability | +|-------------------|--------------------| +| `'Fermi'` | 2.0, 2.1(2.0) | +| `'Kepler'` | 3.0, 3.5 | +| `'Kepler+Tegra'` | 3.2 | +| `'Kepler+Tesla'` | 3.7 | +| `'Maxwell'` | 5.0, 5.2 | +| `'Maxwell+Tegra'` | 5.3 | +| `'Pascal'` | 6.0, 6.1 | +| `'Pascal+Tegra'` | 6.2 | +| `'Volta'` | 7.0 | +| `'Volta+Tegra'` | 7.2 | +| `'Turing'` | 7.5 | + + +Examples: + + cuda.nvcc_arch_flags('10.0', '3.0', '3.5', '5.0+PTX') + cuda.nvcc_arch_flags('10.0', ['3.0', '3.5', '5.0+PTX']) + cuda.nvcc_arch_flags('10.0', [['3.0', '3.5'], '5.0+PTX']) + cuda.nvcc_arch_flags('10.0', '3.0 3.5 5.0+PTX') + cuda.nvcc_arch_flags('10.0', '3.0,3.5,5.0+PTX') + cuda.nvcc_arch_flags('10.0', '3.0;3.5;5.0+PTX') + cuda.nvcc_arch_flags('10.0', 'Kepler 5.0+PTX') + # Returns ['-gencode', 'arch=compute_30,code=sm_30', + # '-gencode', 'arch=compute_35,code=sm_35', + # '-gencode', 'arch=compute_50,code=sm_50', + # '-gencode', 'arch=compute_50,code=compute_50'] + + cuda.nvcc_arch_flags('10.0', '3.5(3.0)') + # Returns ['-gencode', 'arch=compute_30,code=sm_35'] + + cuda.nvcc_arch_flags('8.0', 'Common') + # Returns ['-gencode', 'arch=compute_30,code=sm_30', + # '-gencode', 'arch=compute_35,code=sm_35', + # '-gencode', 'arch=compute_50,code=sm_50', + # '-gencode', 'arch=compute_52,code=sm_52', + # '-gencode', 'arch=compute_60,code=sm_60', + # '-gencode', 'arch=compute_61,code=sm_61', + # '-gencode', 'arch=compute_61,code=compute_61'] + + cuda.nvcc_arch_flags('9.2', 'Auto', detected: '6.0 6.0 6.0 6.0') + cuda.nvcc_arch_flags('9.2', 'Auto', detected: ['6.0', '6.0', '6.0', '6.0']) + # Returns ['-gencode', 'arch=compute_60,code=sm_60'] + + cuda.nvcc_arch_flags(nvcc, 'All') + # Returns ['-gencode', 'arch=compute_20,code=sm_20', + # '-gencode', 'arch=compute_20,code=sm_21', + # '-gencode', 'arch=compute_30,code=sm_30', + # '-gencode', 'arch=compute_32,code=sm_32', + # '-gencode', 'arch=compute_35,code=sm_35', + # '-gencode', 'arch=compute_37,code=sm_37', + # '-gencode', 'arch=compute_50,code=sm_50', # nvcc.version() < 7.0 + # '-gencode', 'arch=compute_52,code=sm_52', + # '-gencode', 'arch=compute_53,code=sm_53', # nvcc.version() >= 7.0 + # '-gencode', 'arch=compute_60,code=sm_60', + # '-gencode', 'arch=compute_61,code=sm_61', # nvcc.version() >= 8.0 + # '-gencode', 'arch=compute_70,code=sm_70', + # '-gencode', 'arch=compute_72,code=sm_72', # nvcc.version() >= 9.0 + # '-gencode', 'arch=compute_75,code=sm_75'] # nvcc.version() >= 10.0 + +_Note:_ This function is intended to closely replicate CMake's FindCUDA module +function `CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])` + + + +### `nvcc_arch_readable()` +_Since: 0.50.0_ + +``` meson +cuda.nvcc_arch_readable(nvcc_or_version, ..., + detected: string_or_array) +``` + +Has precisely the same interface as [`nvcc_arch_flags()`](#nvcc_arch_flags), +but rather than returning a list of flags, it returns a "readable" list of +architectures that will be compiled for. The output of this function is solely +intended for informative message printing. + + archs = '3.0 3.5 5.0+PTX' + readable = cuda.nvcc_arch_readable(nvcc, archs) + message('Building for architectures ' + ' '.join(readable)) + +This will print + + Message: Building for architectures sm30 sm35 sm50 compute50 + +_Note:_ This function is intended to closely replicate CMake's FindCUDA module function +`CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable, [list of CUDA compute architectures])` + + + +### `min_driver_version()` +_Since: 0.50.0_ + +``` meson +cuda.min_driver_version(nvcc_or_version) +``` + +Returns the minimum NVIDIA proprietary driver version required, on the host +system, by kernels compiled with the given NVCC compiler or its version string. + +The output of this function is generally intended for informative message +printing, but could be used for assertions or to conditionally enable +features known to exist within the minimum NVIDIA driver required. + + |