aboutsummaryrefslogtreecommitdiff
path: root/clang/docs/AllocToken.rst
blob: bda84669456cef3bbaea7d237720d44318e2f2ca (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
=================
Allocation Tokens
=================

.. contents::
   :local:

Introduction
============

Clang provides support for allocation tokens to enable allocator-level heap
organization strategies. Clang assigns mode-dependent token IDs to allocation
calls; the runtime behavior depends entirely on the implementation of a
compatible memory allocator.

Possible allocator strategies include:

* **Security Hardening**: Placing allocations into separate, isolated heap
  partitions. For example, separating pointer-containing types from raw data
  can mitigate exploits that rely on overflowing a primitive buffer to corrupt
  object metadata.

* **Memory Layout Optimization**: Grouping related allocations to improve data
  locality and cache utilization.

* **Custom Allocation Policies**: Applying different management strategies to
  different partitions.

Token Assignment Mode
=====================

The default mode to calculate tokens is:

* ``typehashpointersplit``: This mode assigns a token ID based on the hash of
  the allocated type's name, where the top half ID-space is reserved for types
  that contain pointers and the bottom half for types that do not contain
  pointers.

Other token ID assignment modes are supported, but they may be subject to
change or removal. These may (experimentally) be selected with ``-mllvm
-alloc-token-mode=<mode>``:

* ``typehash``: This mode assigns a token ID based on the hash of the allocated
  type's name.

* ``random``: This mode assigns a statically-determined random token ID to each
  allocation site.

* ``increment``: This mode assigns a simple, incrementally increasing token ID
  to each allocation site.

Allocation Token Instrumentation
================================

To enable instrumentation of allocation functions, code can be compiled with
the ``-fsanitize=alloc-token`` flag:

.. code-block:: console

    % clang++ -fsanitize=alloc-token example.cc

The instrumentation transforms allocation calls to include a token ID. For
example:

.. code-block:: c

    // Original:
    ptr = malloc(size);

    // Instrumented:
    ptr = __alloc_token_malloc(size, <token id>);

The following command-line options affect generated token IDs:

* ``-falloc-token-max=<N>``
    Configures the maximum number of tokens. No max by default (tokens bounded
    by ``SIZE_MAX``).

    .. code-block:: console

        % clang++ -fsanitize=alloc-token -falloc-token-max=512 example.cc

Runtime Interface
-----------------

A compatible runtime must be provided that implements the token-enabled
allocation functions. The instrumentation generates calls to functions that
take a final ``size_t token_id`` argument.

.. code-block:: c

    // C standard library functions
    void *__alloc_token_malloc(size_t size, size_t token_id);
    void *__alloc_token_calloc(size_t count, size_t size, size_t token_id);
    void *__alloc_token_realloc(void *ptr, size_t size, size_t token_id);
    // ...

    // C++ operators (mangled names)
    // operator new(size_t, size_t)
    void *__alloc_token__Znwm(size_t size, size_t token_id);
    // operator new[](size_t, size_t)
    void *__alloc_token__Znam(size_t size, size_t token_id);
    // ... other variants like nothrow, etc., are also instrumented.

Fast ABI
--------

An alternative ABI can be enabled with ``-fsanitize-alloc-token-fast-abi``,
which encodes the token ID hint in the allocation function name.

.. code-block:: c

    void *__alloc_token_0_malloc(size_t size);
    void *__alloc_token_1_malloc(size_t size);
    void *__alloc_token_2_malloc(size_t size);
    ...
    void *__alloc_token_0_Znwm(size_t size);
    void *__alloc_token_1_Znwm(size_t size);
    void *__alloc_token_2_Znwm(size_t size);
    ...

This ABI provides a more efficient alternative where
``-falloc-token-max`` is small.

Instrumenting Non-Standard Allocation Functions
-----------------------------------------------

By default, AllocToken only instruments standard library allocation functions.
This simplifies adoption, as a compatible allocator only needs to provide
token-enabled variants for a well-defined set of standard functions.

To extend instrumentation to custom allocation functions, enable broader
coverage with ``-fsanitize-alloc-token-extended``. Such functions require being
marked with the `malloc
<https://clang.llvm.org/docs/AttributeReference.html#malloc>`_ or `alloc_size
<https://clang.llvm.org/docs/AttributeReference.html#alloc-size>`_ attributes
(or a combination).

For example:

.. code-block:: c

    void *custom_malloc(size_t size) __attribute__((malloc));
    void *my_malloc(size_t size) __attribute__((alloc_size(1)));

    // Original:
    ptr1 = custom_malloc(size);
    ptr2 = my_malloc(size);

    // Instrumented:
    ptr1 = __alloc_token_custom_malloc(size, token_id);
    ptr2 = __alloc_token_my_malloc(size, token_id);

Disabling Instrumentation
-------------------------

To exclude specific functions from instrumentation, you can use the
``no_sanitize("alloc-token")`` attribute:

.. code-block:: c

    __attribute__((no_sanitize("alloc-token")))
    void* custom_allocator(size_t size) {
        return malloc(size);  // Uses original malloc
    }

Note: Independent of any given allocator support, the instrumentation aims to
remain performance neutral. As such, ``no_sanitize("alloc-token")``
functions may be inlined into instrumented functions and vice-versa. If
correctness is affected, such functions should explicitly be marked
``noinline``.

The ``__attribute__((disable_sanitizer_instrumentation))`` is also supported to
disable this and other sanitizer instrumentations.

Suppressions File (Ignorelist)
------------------------------

AllocToken respects the ``src`` and ``fun`` entity types in the
:doc:`SanitizerSpecialCaseList`, which can be used to omit specified source
files or functions from instrumentation.

.. code-block:: bash

    [alloc-token]
    # Exclude specific source files
    src:third_party/allocator.c
    # Exclude function name patterns
    fun:*custom_malloc*
    fun:LowLevel::*

.. code-block:: console

    % clang++ -fsanitize=alloc-token -fsanitize-ignorelist=my_ignorelist.txt example.cc

Conditional Compilation with ``__SANITIZE_ALLOC_TOKEN__``
-----------------------------------------------------------

In some cases, one may need to execute different code depending on whether
AllocToken instrumentation is enabled. The ``__SANITIZE_ALLOC_TOKEN__`` macro
can be used for this purpose.

.. code-block:: c

    #ifdef __SANITIZE_ALLOC_TOKEN__
    // Code specific to -fsanitize=alloc-token builds
    #endif