===================================
Instrumentation Profile Format
===================================
.. contents::
   :local:

Overview
=========
Clang supports two types of profiling via instrumentation [1]_: frontend-based
and IR-based, and both can support a variety of use cases [2]_.

This document describes two binary serialization formats (raw and indexed) for
storing instrumented profiles, with a specific emphasis on the IRPGO use case:
where header fields or payload sections are interpreted differently across use
cases, this document describes the IRPGO interpretation.
.. note::

  Frontend-generated profiles are used together with coverage mapping for
  `source-based code coverage`_. The `coverage mapping format`_ is different
  from the profile format.

.. _`source-based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html
Raw Profile Format
===================
The raw profile is generated by running the instrumented binary. The raw profile
data from an executable or a shared library [3]_ consists of a header and
multiple sections, each of which is a memory dump. The raw profile data needs
to be reasonably compact and fast to generate.
There are no backward or forward version compatibility guarantees for the raw profile
format. That is, compilers and tools `require`_ a specific raw profile version
to parse the profiles.
.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558
To feed profiles back into compilers for an optimized build (e.g., via
``-fprofile-use`` for IR instrumentation), a raw profile must be converted into
the indexed format.
General Storage Layout
-----------------------
The storage layout of the raw profile data format is illustrated below. When
the raw profile is read into a memory buffer, the byte offset of each section
is inferred from the section's position in the layout and the size information
of all the sections ahead of it.
::

  +----+-----------------------+
  |    |        Magic          |
  |    +-----------------------+
  |    |        Version        |
  |    +-----------------------+
  H    |   Size Info for       |
  E    |      Section 1        |
  A    +-----------------------+
  D    |   Size Info for       |
  E    |      Section 2        |
  R    +-----------------------+
  |    |          ...          |
  |    +-----------------------+
  |    |   Size Info for       |
  |    |      Section N        |
  +----+-----------------------+
  P    |       Section 1       |
  A    +-----------------------+
  Y    |       Section 2       |
  L    +-----------------------+
  O    |          ...          |
  A    +-----------------------+
  D    |       Section N       |
  +----+-----------------------+

.. note::

  Sections might be padded to meet specific alignment requirements. For
  simplicity, header fields and data sections used solely for padding are
  omitted from the data layout graph above and the rest of this document.

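As a rough illustration of how a reader derives section positions, the sketch
below computes each payload section's starting offset as a running sum of the
sizes of the sections ahead of it. It is a minimal sketch, not the reader in
``InstrProfReader.cpp``: it assumes the section byte sizes have already been
derived from the header fields (for example, ``NumData`` times the size of one
data record), and, like the layout graph, it ignores padding. The function name
``sectionOffsets`` is hypothetical.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical helper: returns the starting byte offset of each payload
  // section, given the header size and the section sizes in layout order.
  // Padding between sections is ignored, as in the layout graph.
  std::vector<uint64_t> sectionOffsets(uint64_t HeaderSize,
                                       const std::vector<uint64_t> &SectionSizes) {
    std::vector<uint64_t> Offsets;
    Offsets.reserve(SectionSizes.size());
    uint64_t Cur = HeaderSize; // the payload starts right after the header
    for (uint64_t Size : SectionSizes) {
      Offsets.push_back(Cur); // a section starts where the previous one ended
      assert(Cur + Size >= Cur && "section sizes overflow");
      Cur += Size;
    }
    return Offsets;
  }

For instance, a hypothetical 96-byte header followed by sections of 16, 40 and
8 bytes yields section offsets 96, 112 and 152.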
Header
-------
``Magic``
  The magic number encodes the profile format (raw, indexed or text). For the
  raw format, the magic number also encodes the endianness (big or little) and
  the C pointer size (4 or 8 bytes) of the platform on which the profile is
  generated.

  A factory method reads the magic number to construct the appropriate reader
  and returns an error for an unrecognized format. Specifically, the factory
  method and the raw profile reader implementation ensure that a raw profile
  file can be read back on a platform with the opposite endianness and/or the
  other C pointer size.

``Version``
  The lower 32 bits specify the actual version, and the most significant 32
  bits specify the variant types of the profile. IR-based instrumentation PGO
  and context-sensitive IR-based instrumentation PGO are two examples of
  variant types.

``BinaryIdsSize``
  The byte size of the `binary id`_ section.

``NumData``
  The number of profile metadata records. The byte size of the
  `profile metadata`_ section can be computed from this field.

``NumCounters``
  The number of entries in the profile counter section. The byte size of the
  `counter`_ section can be computed from this field.

``NumBitmapBytes``
  The number of bytes in the profile `bitmap`_ section.

``NamesSize``
  The number of bytes in the name section.

.. _`CountersDelta`:
``CountersDelta``
  This field records the in-memory address difference between the
  `profile metadata`_ and counter sections in the instrumented binary, i.e.,
  ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.

  It's used jointly with the `CounterPtr`_ field to compute the counter offset
  relative to ``start(__llvm_prf_cnts)``. Check out
  calculation-of-counter-offset_ for a visualized explanation.

.. note::

  The ``__llvm_prf_data`` object file section might not be loaded into memory
  when the instrumented binary runs, or might not be generated in the
  instrumented binary in the first place. In those cases, ``CountersDelta`` is
  not used and other mechanisms are used to match counters with instrumented
  code. See `lightweight instrumentation`_ and `binary profile correlation`_
  for examples.

``BitmapDelta``
  This field records the in-memory address difference between the
  `profile metadata`_ and bitmap sections in the instrumented binary, i.e.,
  ``start(__llvm_prf_bits) - start(__llvm_prf_data)``.

  It's used jointly with the `BitmapPtr`_ field to find the bitmap of a profile
  data record, similar to how counters are referenced as explained by
  calculation-of-counter-offset_.

  Similar to the `CountersDelta`_ field, this field may not be used in non-PGO
  variants of profiles.

``NamesDelta``
  Records the in-memory address of the name section. Not used except for error
  checking in the raw profile reader.

``NumVTables``
  Records the number of instrumented vtable entries in the binary. Used for
  `type profiling`_.

``VNamesSize``
  Records the byte size of the virtual table names section. Used for
  `type profiling`_.

``ValueKindLast``
  Records the number of value kinds. The `VALUE_PROF_KIND`_ macro defines the
  value kinds, each with a description.

.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186
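The ``Magic`` and ``Version`` handling described above can be sketched as
follows. The two constants mirror, to our understanding, the
``INSTR_PROF_RAW_MAGIC_64`` and ``INSTR_PROF_RAW_MAGIC_32`` macros in
compiler-rt's ``InstrProfData.inc`` (check them against the LLVM version you
target); every function and type name here is hypothetical. The first 8 bytes
are compared against the known magic values both as read and byte-swapped: a
match after swapping means the profile came from a platform with the opposite
endianness, and which constant matches gives the C pointer size.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Mirrors INSTR_PROF_RAW_MAGIC_64 ('r') and INSTR_PROF_RAW_MAGIC_32 ('R')
  // from compiler-rt's InstrProfData.inc; verify against your LLVM version.
  constexpr uint64_t rawMagic(char PointerSizeTag) {
    return (uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 |
           (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 |
           (uint64_t)PointerSizeTag << 8 | (uint64_t)129;
  }
  constexpr uint64_t kRawMagic64 = rawMagic('r');
  constexpr uint64_t kRawMagic32 = rawMagic('R');

  // Reverses the byte order of a 64-bit value.
  uint64_t byteSwap64(uint64_t V) {
    uint64_t R = 0;
    for (int I = 0; I < 8; ++I) {
      R = (R << 8) | (V & 0xff);
      V >>= 8;
    }
    return R;
  }

  struct MagicInfo {
    bool Recognized = false;
    bool NeedsByteSwap = false; // profile came from an opposite-endian platform
    unsigned PointerSize = 0;   // 4 or 8 bytes
  };

  // Classifies the first 8 bytes of a raw profile, loaded in host byte order.
  MagicInfo classifyMagic(uint64_t HostOrderMagic) {
    MagicInfo Info;
    for (int Swap = 0; Swap < 2; ++Swap) {
      uint64_t M = Swap ? byteSwap64(HostOrderMagic) : HostOrderMagic;
      if (M == kRawMagic64 || M == kRawMagic32) {
        Info.Recognized = true;
        Info.NeedsByteSwap = Swap != 0;
        Info.PointerSize = (M == kRawMagic64) ? 8 : 4;
        break;
      }
    }
    return Info;
  }

  // The lower 32 bits of Version hold the format version; the upper 32 bits
  // hold the variant flags.
  uint32_t formatVersion(uint64_t Version) {
    return static_cast<uint32_t>(Version & 0xffffffffu);
  }
  uint32_t variantFlags(uint64_t Version) {
    return static_cast<uint32_t>(Version >> 32);
  }

A real reader would then byte-swap every subsequent header field when
``NeedsByteSwap`` is set, and test individual variant bits rather than the
whole upper half.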
Payload Sections
------------------
Binary Ids
^^^^^^^^^^^
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See the `binary id`_ RFC for the design.
.. _`profile metadata`:
Profile Metadata
^^^^^^^^^^^^^^^^^^
This section stores the metadata to map counters and value profiles back to
instrumented code regions (e.g., LLVM IR for IRPGO).
The in-memory representation of the metadata is `__llvm_profile_data`_.
Some fields are used to reference data from other sections in the profile.
The fields are documented as follows:
.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95
``NameRef``
  The MD5 of the function's PGO name. The PGO name has the format
  ``[<filepath><delimiter>]<mangled-name>``, where ``<filepath>`` and
  ``<delimiter>`` are provided for local-linkage functions to tell apart
  functions with possibly identical names.

.. _FuncHash:
``FuncHash``
  A checksum of the function's IR, taking the control flow graph and
  instrumented value sites into account. See `computeCFGHash`_ for details.

.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685
.. _`CounterPtr`:
``CounterPtr``
  The in-memory address difference between the profile data record and the
  start of its corresponding counters. The counter position is stored this way
  (as a link-time constant) to reduce the instrumented binary size compared
  with snapshotting the symbol addresses directly. See `commit a1532ed`_ for
  further information.

.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844
.. note::

  ``CounterPtr`` might represent a different value for non-IRPGO use cases. For
  example, for `binary profile correlation`_, it represents the absolute
  address of the counter. When in doubt, check the source code.

.. _`BitmapPtr`:
``BitmapPtr``
  The in-memory address difference between the profile data record and the
  start address of its corresponding bitmap.

.. note::

  Similar to `CounterPtr`_, this field may represent a different value for
  non-IRPGO use cases.

``FunctionPointer``
  Records the function address when the instrumented binary runs. This is used
  to map the profiled callee addresses of indirect calls to ``NameRef`` during
  conversion from raw to indexed profiles.

``Values``
  Represents value profiles in a two-dimensional array. The number of elements
  in the first dimension is the number of instrumented value sites across all
  kinds. Each element in the first dimension is the head of a linked list, and
  each element in the second dimension is a linked-list element carrying
  ``<profiled-value, count>`` as payload. This is used by the compiler runtime
  when writing out value profiles.

.. note::

  Value profiling is supported by frontend and IR PGO instrumentation, but it's
  not supported in all cases (e.g., `lightweight instrumentation`_).

``NumCounters``
  The number of counters for the instrumented function.

``NumValueSites``
  An array with one entry per value kind; each entry is the number of
  instrumented sites of that kind in the function.

``NumBitmapBytes``
  The number of bitmap bytes for the function.

.. _`counter`:
Profile Counters
^^^^^^^^^^^^^^^^^
For PGO [4]_, the counters within an instrumented function of a specific
`FuncHash`_ are stored contiguously, in an order consistent with the selection
of instrumentation points.
.. _calculation-of-counter-offset:
As mentioned above, the recorded counter offset is relative to the profile
metadata. To locate a function's counters in the raw profile data, the profile
reader iterates over the profile metadata records (from the `profile metadata`_
section) and uses the recorded relative distances, as illustrated below.
::

          + --> start(__llvm_prf_data) --> +---------------------+ ------------+
          |                                |       Data 1        |             |
          |                                +---------------------+  =====||    |
          |                                |       Data 2        |       ||    |
          |                                +---------------------+       ||    |
          |                                |        ...          |       ||    |
  Counters|                                +---------------------+       ||    |
  Delta   |                                |       Data N        |       ||    |
          |                                +---------------------+       ||    |   CounterPtr1
          |                                                              ||    |
          |                                                 CounterPtr2  ||    |
          |                                                              ||    |
          |                                                              ||    |
          + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                           |        ...          |       ||    |
                                           +---------------------+  -----||----+
                                           |    Counter for      |       ||
                                           |       Data 1        |       ||
                                           +---------------------+       ||
                                           |        ...          |       ||
                                           +---------------------+  =====||
                                           |    Counter for      |
                                           |       Data 2        |
                                           +---------------------+
                                           |        ...          |
                                           +---------------------+
                                           |    Counter for      |
                                           |       Data N        |
                                           +---------------------+

In the graph,

* The profile header records ``CountersDelta`` with the value
  ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``. We will call it
  ``CountersDeltaInitVal`` below for convenience.
* For each profile data record ``ProfileDataN``, ``CounterPtr`` is recorded as
  ``start(CounterN) - start(ProfileDataN)``, where ``ProfileDataN`` is the N-th
  entry in ``__llvm_prf_data``, and ``CounterN`` represents the corresponding
  profile counters.

Each time the reader advances to the next data record, it `updates`_
``CountersDelta`` by subtracting the size of one ``ProfileData``.
.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444
For the counter corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as
``CounterPtr1 - CountersDeltaInitVal``. When the profile reader advances to the
second data record, ``CountersDelta`` has been updated to
``CountersDeltaInitVal - sizeof(ProfileData)``, because ``ProfileData2`` starts
``sizeof(ProfileData)`` bytes after ``ProfileData1``. Thus the byte offset
relative to the start of the counter section is calculated as
``CounterPtr2 - (CountersDeltaInitVal - sizeof(ProfileData))``.
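The update rule above can be written as a small loop. This is a sketch with
hypothetical names (``DataRecord``, ``counterOffsets``): ``DataRecord`` keeps
only the ``CounterPtr`` field, and the record size is passed in rather than
taken from the real ``__llvm_profile_data`` layout.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>
  #include <vector>

  // Hypothetical, minimal view of a profile data record: only CounterPtr,
  // i.e. start(CounterN) - start(ProfileDataN).
  struct DataRecord {
    int64_t CounterPtr;
  };

  // Returns each record's counter offset relative to start(__llvm_prf_cnts).
  std::vector<int64_t> counterOffsets(const std::vector<DataRecord> &Records,
                                      int64_t CountersDeltaInitVal,
                                      int64_t SizeofProfileData) {
    assert(SizeofProfileData > 0);
    std::vector<int64_t> Offsets;
    Offsets.reserve(Records.size());
    int64_t CountersDelta = CountersDeltaInitVal;
    for (const DataRecord &R : Records) {
      Offsets.push_back(R.CounterPtr - CountersDelta);
      // The next record starts SizeofProfileData bytes later, so its distance
      // to start(__llvm_prf_cnts) is smaller by that amount.
      CountersDelta -= SizeofProfileData;
    }
    return Offsets;
  }

For example, with data records 64 bytes apart starting at address 0x1000 and
counters starting at 0x2000, 0x2010 and 0x2030, the recorded ``CounterPtr``
values are 4096, 4048 and 4016, ``CountersDeltaInitVal`` is 4096, and the loop
yields offsets 0, 16 and 48.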
.. _`bitmap`:
Bitmap
^^^^^^^
This section is used for source-based `Modified Condition/Decision Coverage`_
code coverage. See the `Bitmap RFC`_ for the design.
.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244
.. _`function names`:
Names
^^^^^^
This section contains the possibly compressed concatenation of functions' PGO
names. If compressed, the zlib library is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for ``llvm-profdata`` to
show the profiles in a human-readable way.
Virtual Table Profile Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section is used for `type profiling`_. Each entry corresponds to one virtual
table and is defined by the following C++ struct:

.. code-block:: c++

  #include <cstdint>

  struct VTableProfData {
    // The start address of the vtable, collected at runtime.
    uint64_t StartAddress;
    // The byte size of the vtable. `StartAddress` and `ByteSize` specify an
    // address range to look up.
    uint32_t ByteSize;
    // The hash of the vtable's (PGO) name.
    uint64_t MD5HashOfName;
  };

At profile use time, the compiler looks up a profiled address in the sorted
vtable address ranges and maps the address to a specific vtable through its
hashed name.
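One way to implement such a lookup is a binary search over the ranges sorted by
start address. The sketch below repeats the ``VTableProfData`` struct shown
above so that it is self-contained; the function name ``lookupVTable`` and the
assumption that ranges do not overlap are illustrative, and this is not
necessarily how the compiler implements the lookup.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstdint>
  #include <optional>
  #include <vector>

  struct VTableProfData {
    uint64_t StartAddress;
    uint32_t ByteSize;
    uint64_t MD5HashOfName;
  };

  // Returns the name hash of the vtable whose range contains Address.
  // Assumes Tables is sorted by StartAddress and ranges do not overlap.
  std::optional<uint64_t> lookupVTable(const std::vector<VTableProfData> &Tables,
                                       uint64_t Address) {
    // First vtable that starts strictly after Address...
    auto It = std::upper_bound(Tables.begin(), Tables.end(), Address,
                               [](uint64_t A, const VTableProfData &T) {
                                 return A < T.StartAddress;
                               });
    if (It == Tables.begin())
      return std::nullopt; // Address is below every vtable.
    --It; // ...so the one before it is the only candidate.
    assert(Address >= It->StartAddress);
    if (Address - It->StartAddress < It->ByteSize)
      return It->MD5HashOfName;
    return std::nullopt; // Address falls in a gap between vtables.
  }

Sorting once and searching in logarithmic time keeps per-address lookups cheap
when many profiled addresses must be resolved.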
Virtual Table Names
^^^^^^^^^^^^^^^^^^^^
This section is similar to the `function names`_ section above, except it
contains the PGO names of profiled virtual tables. It's a standalone section so
that raw profile readers can directly find each name set by accessing the
corresponding profile data section.

This section is stored in raw profiles so that ``llvm-profdata`` can show the
profiles in a human-readable way.
Value Profile Data
^^^^^^^^^^^^^^^^^^^^
This section contains the profile data for value profiling.

The value profiles corresponding to one profile metadata record are serialized
contiguously as one record, and value profile records are stored in the same
order as the respective profile data, so that a raw profile reader `advances`_
the pointer to profile data and the pointer to value profile records
simultaneously [5]_ to find the value profiles for each profile data record
(one per function and `FuncHash`_).
.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457
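The lock-step traversal can be sketched as follows. The record layout is an
assumption made for illustration: each serialized value record is taken to
start with a ``uint32_t`` (in host byte order) holding its total byte size, and
a data record without value sites is taken to have no value record. The name
``locateValueRecords`` is hypothetical, and real readers also handle byte order
and the per-kind contents of each record.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <optional>
  #include <vector>

  // For each profile data record i, returns the byte offset of its value
  // profile record within ValueData, or std::nullopt if it has none.
  // Assumptions (hypothetical): each value record starts with a uint32_t
  // total size, and records without value sites have no value record.
  std::vector<std::optional<std::size_t>>
  locateValueRecords(const std::vector<bool> &HasValueSites,
                     const std::vector<uint8_t> &ValueData) {
    std::vector<std::optional<std::size_t>> Offsets;
    std::size_t ValueOffset = 0;
    // The data "pointer" steps by one record per iteration (sizeof(ProfileData)
    // in the real layout); the value pointer steps by the size of the value
    // record it just consumed.
    for (bool HasSites : HasValueSites) {
      if (!HasSites) {
        Offsets.push_back(std::nullopt);
        continue;
      }
      assert(ValueOffset + sizeof(uint32_t) <= ValueData.size());
      uint32_t TotalSize = 0;
      std::memcpy(&TotalSize, ValueData.data() + ValueOffset, sizeof(TotalSize));
      assert(TotalSize >= sizeof(uint32_t) &&
             ValueOffset + TotalSize <= ValueData.size());
      Offsets.push_back(ValueOffset);
      ValueOffset += TotalSize;
    }
    return Offsets;
  }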
Indexed Profile Format
===========================
Indexed profiles are generated by ``llvm-profdata``. In indexed profiles,
function data are organized as an on-disk hash table so that compilers can
look up profile data for functions in an IR module.

Compilers and tools must retain backward compatibility with indexed profiles.
That is, a tool or a compiler built from a newer version of the code must
understand profiles generated by older tools or compilers.
General Storage Layout
-----------------------
The ASCII art below depicts the general storage layout of indexed profiles.
Specifically, the indexed profile header records the byte offsets of individual
payload sections.
::

                  +-----------------------+---+
                  |        Magic          |   |
                  +-----------------------+   |
                  |        Version        |   |
                  +-----------------------+   |
                  |       HashType        |   H
                  +-----------------------+   E
                  |      Byte Offset      |   A
          +------ |     of section A      |   D
          |       +-----------------------+   E
          |       |      Byte Offset      |   R
      +-----------|     of section B      |   |
      |   |       +-----------------------+   |
      |   |       |          ...          |   |
      |   |       +-----------------------+   |
      |   |       |      Byte Offset      |   |
  +---------------|     of section Z      |   |
  |   |   |       +-----------------------+---+
  |   |   |       |    Profile Summary    |   |
  |   |   |       +-----------------------+   P
  |   |   +------>|       Section A       |   A
  |   |           +-----------------------+   Y
  |   +---------->|       Section B       |   L
  |               +-----------------------+   O
  |               |          ...          |   A
  |               +-----------------------+   D
  +-------------->|       Section Z       |   |
                  +-----------------------+---+

.. note::

  The profile summary section is at the beginning of the payload, right after
  the header, so its position is implicitly known after reading the header.

Header
--------
The `Header struct`_ is the source of truth, and its fields document what's in
the header. At a high level, the ``*Offset`` fields record section byte
offsets, which readers use to locate interesting sections and skip
uninteresting ones.
.. note::

  To maintain backward compatibility of the indexed profiles, existing fields
  shouldn't be deleted from the struct definition, and the field order
  shouldn't be modified. New fields should be appended.

.. _`Header struct`: https://github.com/llvm/llvm-project/blob/1a2960bab6381f2b288328e2371829b460ac020c/llvm/include/llvm/ProfileData/InstrProf.h#L1053-L1080
Payload Sections
------------------
(CS) Profile Summary
^^^^^^^^^^^^^^^^^^^^^
This section is right after the profile header. It stores the serialized
profile summary. For context-sensitive IR-based instrumentation PGO, this
section stores an additional profile summary corresponding to the
context-sensitive profiles.
.. _`function data`:
Function data
^^^^^^^^^^^^^^^^^^
This section stores functions and their profiling data as an on-disk hash table.
Profile data for functions with the same name are grouped together and share one
hash table entry (the functions may come from different shared libraries, for
instance). The profile data for them are organized as a sequence of key-value
pairs, where the key is `FuncHash`_ and the value is the profiled information
(represented by `InstrProfRecord`_) for the function.
.. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693
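The grouping described above can be modeled with an in-memory map. In this
sketch, ``std::unordered_map`` stands in for the on-disk hash table, and
``RecordSketch`` is a hypothetical stand-in for ``InstrProfRecord`` that keeps
only counters. A lookup first finds the entry by function name and then the
record whose ``FuncHash`` matches.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for InstrProfRecord: only counters are kept.
  struct RecordSketch {
    std::vector<uint64_t> Counts;
  };

  // One entry per function name; each entry holds (FuncHash, record) pairs
  // for all functions that share the name.
  class FunctionDataSketch {
  public:
    void add(const std::string &Name, uint64_t FuncHash, RecordSketch R) {
      Table[Name].emplace_back(FuncHash, std::move(R));
    }

    // Returns nullptr if the name is unknown or no record has a matching
    // FuncHash (e.g. the function's CFG changed after profiling).
    const RecordSketch *find(const std::string &Name, uint64_t FuncHash) const {
      auto It = Table.find(Name);
      if (It == Table.end())
        return nullptr;
      for (const auto &[Hash, Record] : It->second)
        if (Hash == FuncHash)
          return &Record;
      return nullptr;
    }

    std::size_t numEntries() const { return Table.size(); }

  private:
    std::unordered_map<std::string,
                       std::vector<std::pair<uint64_t, RecordSketch>>>
        Table;
  };

Keying the table by name keeps one entry per name even when several
same-named functions (with different ``FuncHash`` values) are profiled.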
MemProf Profile data
^^^^^^^^^^^^^^^^^^^^^^
This section stores functions' memory profiling data. See the
`MemProf binary serialization format RFC`_ for the design.
.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
This section carries the `binary id`_ information over from raw profiles.
Temporal Profile Traces
^^^^^^^^^^^^^^^^^^^^^^^^
This section carries temporal profile information over from raw profiles.
See `temporal profiling`_ for the design.
Virtual Table Names
^^^^^^^^^^^^^^^^^^^^
This section stores the names of vtables from raw profiles in the indexed
profile.

Unlike function names, which are stored as keys of the `function data`_ hash
table, vtable names need to be stored in a standalone section in indexed
profiles. This way, ``llvm-profdata`` can show the profiled vtable information
in a human-readable way.
Profile Data Usage
=======================================
``llvm-profdata`` is the command line tool to display and process
instrumentation-based profile data. For supported usages, check out the
`llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
.. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
.. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_
   and `temporal profiling`_. Frontend instrumentation can support
   `single-byte counters`_.
.. [3] A raw profile file can contain the concatenation of multiple raw
   profiles, for example, from an executable and its shared libraries. The raw
   profile reader can parse all raw profiles from the file correctly.
.. [4] The counter section is used by a few variant types (like temporal
   profiling) and might have different semantics there.
.. [5] The step size of the data pointer is ``sizeof(ProfileData)``, and the
   step size of the value profile pointer is calculated based on the number of
   collected values.

.. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
.. _`temporal profiling`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
.. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
.. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565
.. _`binary id`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
.. _`type profiling`: https://discourse.llvm.org/t/rfc-dynamic-type-profiling-and-optimizations-in-llvm/74600