diff options
Diffstat (limited to 'llvm/docs')
-rw-r--r-- | llvm/docs/HowToReleaseLLVM.rst | 3 | ||||
-rw-r--r-- | llvm/docs/LibFuzzer.rst | 54 | ||||
-rw-r--r-- | llvm/docs/NVPTXUsage.rst | 98 |
3 files changed, 131 insertions, 24 deletions
diff --git a/llvm/docs/HowToReleaseLLVM.rst b/llvm/docs/HowToReleaseLLVM.rst index 4274717..ca55ee5 100644 --- a/llvm/docs/HowToReleaseLLVM.rst +++ b/llvm/docs/HowToReleaseLLVM.rst @@ -349,8 +349,7 @@ Below are the rules regarding patching the release branch: #. *Bug fix releases* Patches should be limited to bug fixes or very safe and critical performance improvements. Patches must maintain both API and - ABI compatibility with the previous major release. - + ABI compatibility with the X.1.0 release. Release Final Tasks ------------------- diff --git a/llvm/docs/LibFuzzer.rst b/llvm/docs/LibFuzzer.rst index 9e34530..2137740 100644 --- a/llvm/docs/LibFuzzer.rst +++ b/llvm/docs/LibFuzzer.rst @@ -354,16 +354,18 @@ Output During operation the fuzzer prints information to ``stderr``, for example:: - INFO: Seed: 1523017872 - INFO: Loaded 1 modules (16 guards): [0x744e60, 0x744ea0), - INFO: -max_len is not provided, using 64 + INFO: Running with entropic power schedule (0xFF, 100). + INFO: Seed: 1434179311 + INFO: Loaded 1 modules (8 inline 8-bit counters): 8 [0x5f03d189be90, 0x5f03d189be98), + INFO: Loaded 1 PC tables (8 PCs): 8 [0x5f03d189be98,0x5f03d189bf18), + INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: A corpus is not provided, starting from an empty corpus - #0 READ units: 1 - #1 INITED cov: 3 ft: 2 corp: 1/1b exec/s: 0 rss: 24Mb - #3811 NEW cov: 4 ft: 3 corp: 2/2b exec/s: 0 rss: 25Mb L: 1 MS: 5 ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte- - #3827 NEW cov: 5 ft: 4 corp: 3/4b exec/s: 0 rss: 25Mb L: 2 MS: 1 CopyPart- - #3963 NEW cov: 6 ft: 5 corp: 4/6b exec/s: 0 rss: 25Mb L: 2 MS: 2 ShuffleBytes-ChangeBit- - #4167 NEW cov: 7 ft: 6 corp: 5/9b exec/s: 0 rss: 25Mb L: 3 MS: 1 InsertByte- + #2 INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 31Mb + #144 NEW cov: 3 ft: 3 corp: 2/2b lim: 4 exec/s: 0 rss: 31Mb L: 1/1 MS: 2 ChangeByte-ChangeByte- + #157 NEW cov: 4 ft: 4 corp: 3/4b lim: 4 exec/s: 0 rss: 31Mb L: 2/2 MS: 3 CrossOver-ChangeBit-CrossOver- + #1345 NEW cov: 5 ft: 5 corp: 4/8b lim: 14 exec/s: 0 rss: 32Mb L: 4/4 MS: 3 InsertByte-ChangeBit-CrossOver- + #1696 NEW cov: 6 ft: 6 corp: 5/10b lim: 17 exec/s: 0 rss: 32Mb L: 2/4 MS: 1 EraseBytes- + #1832 REDUCE cov: 6 ft: 6 corp: 5/9b lim: 17 exec/s: 0 rss: 32Mb L: 3/3 MS: 1 EraseBytes- ... The early parts of the output include information about the fuzzer options and @@ -407,7 +409,7 @@ Each output line also reports the following statistics (when non-zero): ``corp:`` Number of entries in the current in-memory test corpus and its size in bytes. ``lim:`` - Current limit on the length of new entries in the corpus. Increases over time + Current limit on the length of new entries in the corpus. Increases over time until the max length (``-max_len``) is reached. ``exec/s:`` Number of fuzzer iterations per second. @@ -418,7 +420,8 @@ For ``NEW`` and ``REDUCE`` events, the output line also includes information about the mutation operation that produced the new input: ``L:`` - Size of the new input in bytes. + Size of the new/reduced input in bytes and the size of the largest input + in current in-memory test corpus. ``MS: <n> <operations>`` Count and list of the mutation operations used to generate the input. @@ -453,19 +456,26 @@ A simple function that does something interesting if it receives the input You should get an error pretty quickly:: - INFO: Seed: 1523017872 - INFO: Loaded 1 modules (16 guards): [0x744e60, 0x744ea0), - INFO: -max_len is not provided, using 64 + INFO: Running with entropic power schedule (0xFF, 100). + INFO: Seed: 1434179311 + INFO: Loaded 1 modules (8 inline 8-bit counters): 8 [0x5f03d189be90, 0x5f03d189be98), + INFO: Loaded 1 PC tables (8 PCs): 8 [0x5f03d189be98,0x5f03d189bf18), + INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes INFO: A corpus is not provided, starting from an empty corpus - #0 READ units: 1 - #1 INITED cov: 3 ft: 2 corp: 1/1b exec/s: 0 rss: 24Mb - #3811 NEW cov: 4 ft: 3 corp: 2/2b exec/s: 0 rss: 25Mb L: 1 MS: 5 ChangeBit-ChangeByte-ChangeBit-ShuffleBytes-ChangeByte- - #3827 NEW cov: 5 ft: 4 corp: 3/4b exec/s: 0 rss: 25Mb L: 2 MS: 1 CopyPart- - #3963 NEW cov: 6 ft: 5 corp: 4/6b exec/s: 0 rss: 25Mb L: 2 MS: 2 ShuffleBytes-ChangeBit- - #4167 NEW cov: 7 ft: 6 corp: 5/9b exec/s: 0 rss: 25Mb L: 3 MS: 1 InsertByte- - ==31511== ERROR: libFuzzer: deadly signal + #2 INITED cov: 2 ft: 2 corp: 1/1b exec/s: 0 rss: 31Mb + #144 NEW cov: 3 ft: 3 corp: 2/2b lim: 4 exec/s: 0 rss: 31Mb L: 1/1 MS: 2 ChangeByte-ChangeByte- + #157 NEW cov: 4 ft: 4 corp: 3/4b lim: 4 exec/s: 0 rss: 31Mb L: 2/2 MS: 3 CrossOver-ChangeBit-CrossOver- + #1345 NEW cov: 5 ft: 5 corp: 4/8b lim: 14 exec/s: 0 rss: 32Mb L: 4/4 MS: 3 InsertByte-ChangeBit-CrossOver- + #1696 NEW cov: 6 ft: 6 corp: 5/10b lim: 17 exec/s: 0 rss: 32Mb L: 2/4 MS: 1 EraseBytes- + #1832 REDUCE cov: 6 ft: 6 corp: 5/9b lim: 17 exec/s: 0 rss: 32Mb L: 3/3 MS: 1 EraseBytes- + ==840148== ERROR: libFuzzer: deadly signal ... - artifact_prefix='./'; Test unit written to ./crash-b13e8756b13a00cf168300179061fb4b91fefbed + SUMMARY: libFuzzer: deadly signal + MS: 2 CopyPart-ChangeByte-; base unit: dbee5f8c7a5da845446e75b4a5708e74428b520a + 0x48,0x49,0x21, + HI! + artifact_prefix='./'; Test unit written to ./crash-7a8dc3985d2a90fb6e62e94910fc11d31949c348 + Base64: SEkh More examples diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst index 64dd2b8..dec6ad4 100644 --- a/llvm/docs/NVPTXUsage.rst +++ b/llvm/docs/NVPTXUsage.rst @@ -962,6 +962,104 @@ The ``griddepcontrol`` intrinsics allows the dependent grids and prerequisite gr For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-griddepcontrol>`__. +TCGEN05 family of Intrinsics +---------------------------- + +The llvm.nvvm.tcgen05.* intrinsics model the TCGEN05 family of instructions +exposed by PTX. These intrinsics use 'Tensor Memory' (henceforth ``tmem``). +NVPTX represents this memory using ``addrspace(6)`` and is always 32-bits. + +For more information, refer to the PTX ISA +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory>`_. + +The tensor-memory pointers may only be used with the tcgen05 intrinsics. +There are specialized load/store instructions provided (tcgen05.ld/st) to +work with tensor-memory. + +See the PTX ISA for more information on tensor-memory load/store instructions +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-and-register-load-store-instructions>`_. + +'``llvm.nvvm.tcgen05.alloc``' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +.. code-block:: llvm + + declare void @llvm.nvvm.tcgen05.alloc.cg1(ptr %dst, i32 %ncols) + declare void @llvm.nvvm.tcgen05.alloc.cg2(ptr %dst, i32 %ncols) + declare void @llvm.nvvm.tcgen05.alloc.shared.cg1(ptr addrspace(3) %dst, i32 %ncols) + declare void @llvm.nvvm.tcgen05.alloc.shared.cg2(ptr addrspace(3) %dst, i32 %ncols) + +Overview: +""""""""" + +The '``@llvm.nvvm.tcgen05.alloc.*``' intrinsics correspond to the +``tcgen05.alloc.cta_group*.sync.aligned.b32`` family of PTX instructions. +The ``tcgen05.alloc`` is a potentially blocking instruction which dynamically +allocates the specified number of columns in the Tensor Memory and writes +the address of the allocated Tensor Memory into shared memory at the +location specified by ``%dst``. The 32-bit operand ``%ncols`` specifies +the number of columns to be allocated and it must be a power-of-two. +The ``.shared`` variant explicitly uses shared memory address space for +the ``%dst`` operand. The ``.cg1`` and ``.cg2`` variants generate +``cta_group::1`` and ``cta_group::2`` variants of the instruction respectively. + +For more information, refer to the PTX ISA +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-allocation-and-management-instructions>`_. + +'``llvm.nvvm.tcgen05.dealloc``' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +.. code-block:: llvm + + declare void @llvm.nvvm.tcgen05.dealloc.cg1(ptr addrspace(6) %tmem_addr, i32 %ncols) + declare void @llvm.nvvm.tcgen05.dealloc.cg2(ptr addrspace(6) %tmem_addr, i32 %ncols) + +Overview: +""""""""" + +The '``@llvm.nvvm.tcgen05.dealloc.*``' intrinsics correspond to the +``tcgen05.dealloc.*`` set of PTX instructions. The ``tcgen05.dealloc`` +instructions deallocates the Tensor Memory specified by the Tensor Memory +address ``%tmem_addr``. The operand ``%tmem_addr`` must point to a previous +Tensor Memory allocation. The 32-bit operand ``%ncols`` specifies the number +of columns to be de-allocated. The ``.cg1`` and ``.cg2`` variants generate +``cta_group::1`` and ``cta_group::2`` variants of the instruction respectively. + +For more information, refer to the PTX ISA +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-allocation-and-management-instructions>`_. + +'``llvm.nvvm.tcgen05.relinq.alloc.permit``' +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Syntax: +""""""" + +.. code-block:: llvm + + declare void @llvm.nvvm.tcgen05.relinq.alloc.permit.cg1() + declare void @llvm.nvvm.tcgen05.relinq.alloc.permit.cg2() + +Overview: +""""""""" + +The '``@llvm.nvvm.tcgen05.relinq.alloc.permit.*``' intrinsics correspond +to the ``tcgen05.relinquish_alloc_permit.*`` set of PTX instructions. +This instruction specifies that the CTA of the executing thread is +relinquishing the right to allocate Tensor Memory. So, it is illegal +for a CTA to perform ``tcgen05.alloc`` after any of its constituent +threads execute ``tcgen05.relinquish_alloc_permit``. The ``.cg1`` +and ``.cg2`` variants generate ``cta_group::1`` and ``cta_group::2`` +flavors of the instruction respectively. + +For more information, refer to the PTX ISA +`<https://docs.nvidia.com/cuda/parallel-thread-execution/#tensor-memory-allocation-and-management-instructions>`_. + Other Intrinsics ---------------- |