Diffstat (limited to 'llvm/docs')
-rw-r--r-- | llvm/docs/LangRef.rst | 54
-rw-r--r-- | llvm/docs/ReleaseNotes.md | 4
-rw-r--r-- | llvm/docs/SourceLevelDebugging.rst | 80
3 files changed, 98 insertions, 40 deletions
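Reading aid for the LangRef.rst hunk in this patch: a minimal Python sketch of the bit-level rule it states for `ptrtoaddr`, namely that the result is the lowest index-width bits of the pointer representation. The pointer is modeled as a plain integer, and the function name `ptrtoaddr_bits` and the sample pointer values are assumptions made for illustration. Provenance and capture behavior are not modeled.

```python
def ptrtoaddr_bits(pointer_bits: int, index_width: int) -> int:
    """Keep only the lowest `index_width` bits of a pointer representation."""
    return pointer_bits & ((1 << index_width) - 1)


# Address space 1 under the patch's example datalayout ``p1:64:64:64:32``:
# 64-bit pointer representation, 32-bit index (address) width.
INDEX_WIDTH = 32

# Hypothetical pointer whose upper 32 bits hold non-address data.
p = 0xDEADBEEF_00001000
x = ptrtoaddr_bits(p, INDEX_WIDTH)  # scalar case, like the %X line

# Vector case, like the %Y line: the rule applies to each element.
ys = [ptrtoaddr_bits(q, INDEX_WIDTH) for q in (p, 0x1_00000010)]
```

In this model the upper 32 bits never reach the result, which is the difference from `ptrtoint` that the Overview text describes.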
diff --git a/llvm/docs/LangRef.rst b/llvm/docs/LangRef.rst
index 2fbca05..99a0b17 100644
--- a/llvm/docs/LangRef.rst
+++ b/llvm/docs/LangRef.rst
@@ -5175,6 +5175,8 @@ The following is the syntax for constant expressions:
     Perform the :ref:`trunc operation <i_trunc>` on constants.
 ``ptrtoint (CST to TYPE)``
     Perform the :ref:`ptrtoint operation <i_ptrtoint>` on constants.
+``ptrtoaddr (CST to TYPE)``
+    Perform the :ref:`ptrtoaddr operation <i_ptrtoaddr>` on constants.
 ``inttoptr (CST to TYPE)``
     Perform the :ref:`inttoptr operation <i_inttoptr>` on constants.
     This one is *really* dangerous!
@@ -12523,6 +12525,58 @@ Example:
       %Y = ptrtoint ptr %P to i64 ; yields zero extension on 32-bit architecture
       %Z = ptrtoint <4 x ptr> %P to <4 x i64>; yields vector zero extension for a vector of addresses on 32-bit architecture
 
+.. _i_ptrtoaddr:
+
+'``ptrtoaddr .. to``' Instruction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+      <result> = ptrtoaddr <ty> <value> to <ty2> ; yields ty2
+
+Overview:
+"""""""""
+
+The '``ptrtoaddr``' instruction converts the pointer or a vector of
+pointers ``value`` to the underlying integer address (or vector of addresses) of
+type ``ty2``. This is different from :ref:`ptrtoint <i_ptrtoint>` in that it
+only operates on the index bits of the pointer and ignores all other bits, and
+does not capture the provenance of the pointer.
+
+Arguments:
+""""""""""
+
+The '``ptrtoaddr``' instruction takes a ``value`` to cast, which must be
+a value of type :ref:`pointer <t_pointer>` or a vector of pointers, and a
+type to cast it to ``ty2``, which must be the :ref:`integer <t_integer>`
+type (or vector of integers) matching the pointer index width of the address
+space of ``ty``.
+
+Semantics:
+""""""""""
+
+The '``ptrtoaddr``' instruction converts ``value`` to integer type ``ty2`` by
+interpreting the lowest index-width pointer representation bits as an integer.
+If the address size and the pointer representation size are the same and
+``value`` and ``ty2`` are the same size, then nothing is done (*no-op cast*)
+other than a type change.
+
+The ``ptrtoaddr`` instruction always :ref:`captures the address but not the provenance <pointercapture>`
+of the pointer argument.
+
+Example:
+""""""""
+This example assumes pointers in address space 1 are 64 bits in size with an
+address width of 32 bits (``p1:64:64:64:32`` :ref:`datalayout string<langref_datalayout>`)
+.. code-block:: llvm
+
+      %X = ptrtoaddr ptr addrspace(1) %P to i32 ; extracts low 32 bits of pointer
+      %Y = ptrtoaddr <4 x ptr addrspace(1)> %P to <4 x i32>; yields vector of low 32 bits for each pointer
+
+
 .. _i_inttoptr:
 
 '``inttoptr .. to``' Instruction
diff --git a/llvm/docs/ReleaseNotes.md b/llvm/docs/ReleaseNotes.md
index b38ed62..88b7e6d 100644
--- a/llvm/docs/ReleaseNotes.md
+++ b/llvm/docs/ReleaseNotes.md
@@ -56,6 +56,10 @@ Makes programs 10x faster by doing Special New Thing.
 Changes to the LLVM IR
 ----------------------
 
+* The `ptrtoaddr` instruction was introduced. This instruction returns the
+  address component of a pointer type variable but unlike `ptrtoint` does not
+  capture provenance ([#125687](https://github.com/llvm/llvm-project/pull/125687)).
+
 Changes to LLVM infrastructure
 ------------------------------
 
diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst
index dfc8c53e..c2084c2 100644
--- a/llvm/docs/SourceLevelDebugging.rst
+++ b/llvm/docs/SourceLevelDebugging.rst
@@ -1300,28 +1300,28 @@ calls. This descriptor results in the following DWARF tag:
 Debugging information format
 ============================
 
-Debugging Information Extension for Objective C Properties
+Debugging Information Extension for Objective-C Properties
 ----------------------------------------------------------
 
 Introduction
 ^^^^^^^^^^^^
 
-Objective C provides a simpler way to declare and define accessor methods using
+Objective-C provides a simpler way to declare and define accessor methods using
 declared properties. The language provides features to declare a property and
 to let compiler synthesize accessor methods.
 
-The debugger lets developer inspect Objective C interfaces and their instance
+The debugger lets developer inspect Objective-C interfaces and their instance
 variables and class variables. However, the debugger does not know anything
-about the properties defined in Objective C interfaces. The debugger consumes
+about the properties defined in Objective-C interfaces. The debugger consumes
 information generated by compiler in DWARF format. The format does not support
-encoding of Objective C properties. This proposal describes DWARF extensions to
-encode Objective C properties, which the debugger can use to let developers
-inspect Objective C properties.
+encoding of Objective-C properties. This proposal describes DWARF extensions to
+encode Objective-C properties, which the debugger can use to let developers
+inspect Objective-C properties.
 
 Proposal
 ^^^^^^^^
 
-Objective C properties exist separately from class members. A property can be
+Objective-C properties exist separately from class members. A property can be
 defined only by "setter" and "getter" selectors, and be calculated anew on each
 access. Or a property can just be a direct access to some declared ivar.
 Finally it can have an ivar "automatically synthesized" for it by the compiler,
@@ -1624,24 +1624,24 @@ The BUCKETS are an array of offsets to DATA for each hash:
 
 So for ``bucket[3]`` in the example above, we have an offset into the table
 0x000034f0 which points to a chain of entries for the bucket. Each bucket must
-contain a next pointer, full 32 bit hash value, the string itself, and the data
+contain a next pointer, full 32-bit hash value, the string itself, and the data
 for the current string value.
 
 .. code-block:: none
 
               .------------.
   0x000034f0: | 0x00003500 | next pointer
-              | 0x12345678 | 32 bit hash
+              | 0x12345678 | 32-bit hash
               | "erase"    | string value
               | data[n]    | HashData for this bucket
               |------------|
   0x00003500: | 0x00003550 | next pointer
-              | 0x29273623 | 32 bit hash
+              | 0x29273623 | 32-bit hash
               | "dump"     | string value
               | data[n]    | HashData for this bucket
               |------------|
   0x00003550: | 0x00000000 | next pointer
-              | 0x82638293 | 32 bit hash
+              | 0x82638293 | 32-bit hash
               | "main"     | string value
               | data[n]    | HashData for this bucket
               `------------'
@@ -1650,17 +1650,17 @@ The problem with this layout for debuggers is that we need to optimize for the
 negative lookup case where the symbol we're searching for is not present. So
 if we were to lookup "``printf``" in the table above, we would make a 32-bit
 hash for "``printf``", it might match ``bucket[3]``. We would need to go to
-the offset 0x000034f0 and start looking to see if our 32 bit hash matches. To
+the offset 0x000034f0 and start looking to see if our 32-bit hash matches. To
 do so, we need to read the next pointer, then read the hash, compare it, and
 skip to the next bucket. Each time we are skipping many bytes in memory and
-touching new pages just to do the compare on the full 32 bit hash. All of
+touching new pages just to do the compare on the full 32-bit hash. All of
 these accesses then tell us that we didn't have a match.
 
 Name Hash Tables
 """"""""""""""""
 
 To solve the issues mentioned above we have structured the hash tables a bit
-differently: a header, buckets, an array of all unique 32 bit hash values,
+differently: a header, buckets, an array of all unique 32-bit hash values,
 followed by an array of hash value data offsets, one for each hash value, then
 the data for all hash values:
 
@@ -1679,11 +1679,11 @@ the data for all hash values:
   `-------------'
 
 The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By
-making all of the full 32 bit hash values contiguous in memory, we allow
+making all of the full 32-bit hash values contiguous in memory, we allow
 ourselves to efficiently check for a match while touching as little memory as
-possible. Most often checking the 32 bit hash values is as far as the lookup
+possible. Most often checking the 32-bit hash values is as far as the lookup
 goes. If it does match, it usually is a match with no collisions. So for a
-table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash
+table with "``n_buckets``" buckets, and "``n_hashes``" unique 32-bit hash
 values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
 ``OFFSETS`` as:
 
@@ -1698,11 +1698,11 @@ values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and
   |  HEADER.header_data_len | uint32_t
   |  HEADER_DATA            | HeaderData
   |-------------------------|
-  |  BUCKETS                | uint32_t[n_buckets] // 32 bit hash indexes
+  |  BUCKETS                | uint32_t[n_buckets] // 32-bit hash indexes
   |-------------------------|
-  |  HASHES                 | uint32_t[n_hashes] // 32 bit hash values
+  |  HASHES                 | uint32_t[n_hashes] // 32-bit hash values
   |-------------------------|
-  |  OFFSETS                | uint32_t[n_hashes] // 32 bit offsets to hash value data
+  |  OFFSETS                | uint32_t[n_hashes] // 32-bit offsets to hash value data
   |-------------------------|
   |  ALL HASH DATA          |
   `-------------------------'
@@ -1761,7 +1761,7 @@ with:
               |            |
               |------------|
   0x000034f0: | 0x00001203 | .debug_str ("erase")
-              | 0x00000004 | A 32 bit array count - number of HashData with name "erase"
+              | 0x00000004 | A 32-bit array count - number of HashData with name "erase"
               | 0x........ | HashData[0]
               | 0x........ | HashData[1]
               | 0x........ | HashData[2]
@@ -1769,18 +1769,18 @@ with:
               | 0x00000000 | String offset into .debug_str (terminate data for hash)
               |------------|
   0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
-              | 0x00000002 | A 32 bit array count - number of HashData with name "collision"
+              | 0x00000002 | A 32-bit array count - number of HashData with name "collision"
               | 0x........ | HashData[0]
               | 0x........ | HashData[1]
               | 0x00001203 | String offset into .debug_str ("dump")
-              | 0x00000003 | A 32 bit array count - number of HashData with name "dump"
+              | 0x00000003 | A 32-bit array count - number of HashData with name "dump"
               | 0x........ | HashData[0]
               | 0x........ | HashData[1]
               | 0x........ | HashData[2]
               | 0x00000000 | String offset into .debug_str (terminate data for hash)
               |------------|
   0x00003550: | 0x00001203 | String offset into .debug_str ("main")
-              | 0x00000009 | A 32 bit array count - number of HashData with name "main"
+              | 0x00000009 | A 32-bit array count - number of HashData with name "main"
               | 0x........ | HashData[0]
               | 0x........ | HashData[1]
               | 0x........ | HashData[2]
@@ -1795,13 +1795,13 @@ with:
 
 So we still have all of the same data, we just organize it more efficiently for
 debugger lookup. If we repeat the same "``printf``" lookup from above, we
-would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit
+would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32-bit
 hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which
 is the index into the ``HASHES`` table. We would then compare any consecutive
-32 bit hashes values in the ``HASHES`` array as long as the hashes would be in
+32-bit hash values in the ``HASHES`` array as long as the hashes would be in
 ``BUCKETS[3]``. We do this by verifying that each subsequent hash value modulo
 ``n_buckets`` is still 3. In the case of a failed lookup we would access the
-memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
+memory for ``BUCKETS[3]``, and then compare a few consecutive 32-bit hashes
 before we know that we have no match. We don't end up marching through
 multiple words of memory and we really keep the number of processor data cache
 lines being accessed as small as possible.
@@ -1842,10 +1842,10 @@ header is:
       HeaderData header_data; // Implementation specific header data
   };
 
-The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
+The header starts with a 32-bit "``magic``" value which must be ``'HASH'``
 encoded as an ASCII integer. This allows the detection of the start of the
 hash table and also allows the table's byte order to be determined so the table
-can be correctly extracted. The "``magic``" value is followed by a 16 bit
+can be correctly extracted. The "``magic``" value is followed by a 16-bit
 ``version`` number which allows the table to be revised and modified in the
 future. The current version number is 1. ``hash_function`` is a ``uint16_t``
 enumeration that specifies which hash function was used to produce this table.
@@ -1858,8 +1858,8 @@ The current values for the hash function enumerations include:
       eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
   };
 
-``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
-are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32 bit
+``bucket_count`` is a 32-bit unsigned integer that represents how many buckets
+are in the ``BUCKETS`` array. ``hashes_count`` is the number of unique 32-bit
 hash values that are in the ``HASHES`` array, and is the same number of offsets
 are contained in the ``OFFSETS`` array. ``header_data_len`` specifies the size
 in bytes of the ``HeaderData`` that is filled in by specialized versions of
@@ -1875,12 +1875,12 @@ The header is followed by the buckets, hashes, offsets, and hash value data.
   struct FixedTable
   {
       uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
-      uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
+      uint32_t hashes [Header.hashes_count]; // Every unique 32-bit hash for the entire table is in this table
       uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
   };
 
-``buckets`` is an array of 32 bit indexes into the ``hashes`` array. The
-``hashes`` array contains all of the 32 bit hash values for all names in the
+``buckets`` is an array of 32-bit indexes into the ``hashes`` array. The
+``hashes`` array contains all of the 32-bit hash values for all names in the
 hash table. Each hash in the ``hashes`` table has an offset in the ``offsets``
 array that points to the data for the hash value.
 
@@ -1967,13 +1967,13 @@ array to be:
   HeaderData.atoms[0].form = DW_FORM_data4;
 
 This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
-encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
+encoded as a 32-bit value (DW_FORM_data4). This allows a single name to have
 multiple matching DIEs in a single file, which could come up with an inlined
 function for instance. Future tables could include more information about the
 DIE such as flags indicating if the DIE is a function, method, block, or
 inlined.
 
-The KeyType for the DWARF table is a 32 bit string table offset into the
+The KeyType for the DWARF table is a 32-bit string table offset into the
 ".debug_str" table. The ".debug_str" is the string table for the DWARF which
 may already contain copies of all of the strings. This helps make sure, with
 help from the compiler, that we reuse the strings between all of the DWARF
@@ -1982,7 +1982,7 @@ compiler generate all strings as DW_FORM_strp in the debug info, is that
 DWARF parsing can be made much faster.
 
 After a lookup is made, we get an offset into the hash data. The hash data
-needs to be able to deal with 32 bit hash collisions, so the chunk of data
+needs to be able to deal with 32-bit hash collisions, so the chunk of data
 at the offset in the hash data consists of a triple:
 
 .. code-block:: c
@@ -1992,7 +1992,7 @@ at the offset in the hash data consists of a triple:
   HashData[hash_data_count]
 
 If "str_offset" is zero, then the bucket contents are done. 99.9% of the
-hash data chunks contain a single item (no 32 bit hash collision):
+hash data chunks contain a single item (no 32-bit hash collision):
 
 .. code-block:: none
 
@@ -2025,7 +2025,7 @@ If there are collisions, you will have multiple valid string offsets:
   `------------'
 
 Current testing with real world C++ binaries has shown that there is around 1
-32 bit hash collision per 100,000 name entries.
+32-bit hash collision per 100,000 name entries.
 
 Contents
 ^^^^^^^^
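The lookup walked through in the Name Hash Tables text touched by this patch (hash the name, reduce it modulo `n_buckets`, then scan consecutive `HASHES` entries while they still belong to the same bucket) can be sketched in Python. This is an illustrative model, not LLVM's implementation: `build_table`, `lookup`, the `EMPTY_BUCKET` sentinel and the sample data offsets are assumptions, and the hash is the classic Bernstein formulation (seed 5381, multiplier 33), which the documentation names but does not spell out.

```python
EMPTY_BUCKET = 0xFFFFFFFF  # assumed sentinel for a bucket that holds no hashes


def djb_hash(name: str) -> int:
    """Classic Bernstein hash (seed 5381, multiplier 33), truncated to 32 bits."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def build_table(entries, n_buckets):
    """Lay out BUCKETS, HASHES and OFFSETS from a {name: data_offset} mapping.

    Toy simplification: every name is assumed to have a distinct 32-bit hash.
    """
    by_hash = {djb_hash(name): offset for name, offset in entries.items()}
    buckets = [EMPTY_BUCKET] * n_buckets
    hashes, offsets = [], []
    for b in range(n_buckets):
        for h in sorted(by_hash):
            if h % n_buckets == b:
                if buckets[b] == EMPTY_BUCKET:
                    buckets[b] = len(hashes)  # index of the bucket's first hash
                hashes.append(h)
                offsets.append(by_hash[h])
    return buckets, hashes, offsets


def lookup(name, buckets, hashes, offsets):
    """Return the data offset for `name`, or None if its hash is absent."""
    n_buckets = len(buckets)
    h = djb_hash(name)
    b = h % n_buckets
    i = buckets[b]
    if i == EMPTY_BUCKET:
        return None
    # Scan the contiguous hashes that still fall into bucket b.
    while i < len(hashes) and hashes[i] % n_buckets == b:
        if hashes[i] == h:
            return offsets[i]  # a real consumer would now compare the string
        i += 1
    return None


table = build_table({"erase": 0x34F0, "dump": 0x3500, "main": 0x3550}, 4)
```

With this table, `lookup("printf", *table)` returns `None` after reading one bucket entry and at most a few adjacent hashes, which is the cheap negative lookup the layout is designed for.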