# Unstable SIMD module

This module provides helper functionality to build code with SIMD instructions.
Available since 0.42.0.

**Note**: this module is unstable. It is only provided as a technology preview.
Its API may change in arbitrary ways between releases or it might be removed
from Meson altogether.

## Usage

This module is designed for the use case where you have an algorithm with one
or more SIMD implementations, and you choose which one to use at runtime.

The module provides one method, `check`, which is used like this:

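    simd = import('unstable-simd')
    cc = meson.get_compiler('c')

    rval = simd.check('mysimds',
      mmx : 'simd_mmx.c',
      sse : 'simd_sse.c',
      sse2 : 'simd_sse2.c',
      sse3 : 'simd_sse3.c',
      ssse3 : 'simd_ssse3.c',
      sse41 : 'simd_sse41.c',
      sse42 : 'simd_sse42.c',
      avx : 'simd_avx.c',
      avx2 : 'simd_avx2.c',
      neon : 'simd_neon.c',
      compiler : cc)

    # 'simdconfig.h', 'myprog' and 'main.c' are placeholder names.
    configure_file(output : 'simdconfig.h', configuration : rval[1])
    exe = executable('myprog', 'main.c', link_with : rval[0])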

Here the individual files contain the accelerated versions of the functions
in question. The `compiler` keyword argument takes the compiler you are
going to use to compile them. The function returns an array with two values.
The first value is a list of libraries that contain the compiled code. Any
SIMD code that the compiler can't compile (for example, Neon instructions on
an x86 machine) is ignored. You should pass this value to the desired target
using `link_with`. The second value is a `configuration_data` object that
contains an entry for each instruction set that was supported. For example,
if the compiler supports SSE2 instructions, the object has `HAVE_SSE2` set
to 1.

Selecting the proper instruction set at runtime is straightforward. First,
write the configuration object to a header (for example with
`configure_file`), then write a chooser function that looks like this:

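    /* Header generated from the configuration_data object (name is up to you). */
    #include "simdconfig.h"

    /* Example signature only; use your function's real parameter list. */
    typedef void (*simd_func)(float *data, int count);

    simd_func choose_implementation(void) {
        simd_func fptr = NULL;

    #if HAVE_NEON
        if(fptr == NULL && neon_available()) {
            fptr = neon_accelerated_function;
        }
    #endif
    #if HAVE_AVX2
        if(fptr == NULL && avx2_available()) {
            fptr = avx2_accelerated_function;
        }
    #endif

        /* ... one block per remaining instruction set ... */

        if(fptr == NULL) {
            fptr = default_function;
        }
        return fptr;
    }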

Each source file provides two functions: `xxx_available`, which queries
whether the CPU currently in use supports the instruction set, and
`xxx_accelerated_function`, the corresponding accelerated implementation.
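An `xxx_available` function has to ask the running CPU which features it has. The sketch below assumes GCC or Clang on x86, where the compiler builtin `__builtin_cpu_supports()` performs this query. The `CPU_HAS` macro is a hypothetical helper, and the function names simply follow the naming pattern described above. On other compilers or architectures the checks conservatively report that nothing is available.

```c
/* Hypothetical availability checks, one per simd_*.c source file.
 * Assumes GCC or Clang on x86: __builtin_cpu_supports() reports whether
 * the CPU the program is currently running on has the given feature. */
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
/* __builtin_cpu_init() is only strictly required before constructors run,
 * but calling it again is harmless. The result is normalized to 0 or 1. */
#define CPU_HAS(feature) (__builtin_cpu_init(), __builtin_cpu_supports(feature) != 0)
#else
/* No portable query available here: report "not supported". */
#define CPU_HAS(feature) 0
#endif

int sse2_available(void)  { return CPU_HAS("sse2"); }
int sse41_available(void) { return CPU_HAS("sse4.1"); }
int avx_available(void)   { return CPU_HAS("avx"); }
int avx2_available(void)  { return CPU_HAS("avx2"); }
```

A Neon check would need a different, platform-specific mechanism, which this sketch does not cover.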

As long as the checks are ordered from the most to the least preferred
instruction set, the function pointer ends up pointing to the fastest
available implementation, and it can then be invoked to do the computation.
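To see the whole pattern in one place, here is a minimal self-contained sketch, assuming GCC or Clang and a single hypothetical operation that adds two arrays of doubles. The `HAVE_SSE2` definition stands in for the generated configuration header, and the names `add_arrays`, `choose_add`, `sse2_add` and `default_add` are illustrative, not part of Meson. The chooser runs once, on the first call, and the stored pointer is reused afterwards.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>
/* Stand-in for the generated configuration header (assumption). */
#define HAVE_SSE2 1
#endif

/* Hypothetical operation: dst[i] = a[i] + b[i]. */
typedef void (*add_fn)(double *dst, const double *a, const double *b, size_t n);

static void default_add(double *dst, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}

#if HAVE_SSE2
static int sse2_available(void) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0; /* no known way to query the CPU: fall back to default_add */
#endif
}

static void sse2_add(double *dst, const double *a, const double *b, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) { /* two doubles per 128-bit register */
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
    for (; i < n; i++) { /* leftover element when n is odd */
        dst[i] = a[i] + b[i];
    }
}
#endif

/* Chooser: most preferred candidate first, portable default last. */
static add_fn choose_add(void) {
    add_fn fptr = NULL;
#if HAVE_SSE2
    if (fptr == NULL && sse2_available()) {
        fptr = sse2_add;
    }
#endif
    if (fptr == NULL) {
        fptr = default_add;
    }
    return fptr;
}

/* Public entry point: selects an implementation on the first call, then
 * reuses it. Not thread-safe as written. */
void add_arrays(double *dst, const double *a, const double *b, size_t n) {
    static add_fn impl = NULL;
    if (impl == NULL) {
        impl = choose_add();
    }
    assert(impl != NULL);
    impl(dst, a, b, n);
}
```

Caching the pointer keeps the CPU query off the hot path. If several threads might make the first call concurrently, run the chooser once during program start-up instead of lazily.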