diff options
author | Bill Traynor <wmat@riscv.org> | 2022-08-18 14:45:47 -0400 |
---|---|---|
committer | Bill Traynor <wmat@riscv.org> | 2022-08-18 14:45:47 -0400 |
commit | a70a30089a1e83d3227a4be3c9e4cfb4d5af75e7 (patch) | |
tree | e895b8125c6fc950f12085e860619b5427070b97 /src/zihintntl.adoc | |
parent | dc829d6b72d71b5ab8a88d4436b536702f74470b (diff) | |
download | riscv-isa-manual-a70a30089a1e83d3227a4be3c9e4cfb4d5af75e7.zip riscv-isa-manual-a70a30089a1e83d3227a4be3c9e4cfb4d5af75e7.tar.gz riscv-isa-manual-a70a30089a1e83d3227a4be3c9e4cfb4d5af75e7.tar.bz2 |
subject line
Adding zihintntl.adoc.
Diffstat (limited to 'src/zihintntl.adoc')
-rw-r--r-- | src/zihintntl.adoc | 146 |
1 files changed, 146 insertions, 0 deletions
diff --git a/src/zihintntl.adoc b/src/zihintntl.adoc new file mode 100644 index 0000000..5ea7f32 --- /dev/null +++ b/src/zihintntl.adoc @@ -0,0 +1,146 @@ +[[chap:zihintntl]] +== ``Zihintntl'' Non-Temporal Locality Hints, Version 0.2 + +The NTL instructions are HINTs that indicate that the explicit memory +accesses of the immediately subsequent instruction (henceforth ``target +instruction'') exhibit poor temporal locality of reference. The NTL +instructions do not change architectural state, nor do they alter the +architecturally visible effects of the target instruction. Four variants +are provided: + +The NTL.P1 instruction indicates that the target instruction does not +exhibit temporal locality within the capacity of the innermost level of +private cache in the memory hierarchy. NTL.P1 is encoded as +ADD _x0, x0, x2_. + +The NTL.PALL instruction indicates that the target instruction does not +exhibit temporal locality within the capacity of any level of private +cache in the memory hierarchy. NTL.PALL is encoded as ADD _x0, x0, x3_. + +The NTL.S1 instruction indicates that the target instruction does not +exhibit temporal locality within the capacity of the innermost level of +shared cache in the memory hierarchy. NTL.S1 is encoded as +ADD _x0, x0, x4_. + +The NTL.ALL instruction indicates that the target instruction does not +exhibit temporal locality within the capacity of any level of cache in +the memory hierarchy. NTL.ALL is encoded as ADD _x0, x0, x5_. + +The NTL instructions can be used to avoid cache pollution when streaming +data or traversing large data structures, or to reduce latency in +producer-consumer interactions. + +A microarchitecture might use the NTL instructions to inform the cache +replacement policy, or to decide which cache to allocate into, or to +avoid cache allocation altogether. For example, NTL.P1 might indicate +that an implementation should not allocate a line in a private L1 cache, +but should allocate in L2 (whether private or shared). In another +implementation, NTL.P1 might allocate the line in L1, but in the +least-recently used state. + +NTL.ALL will typically inform implementations not to allocate anywhere +in the cache hierarchy. Programmers should use NTL.ALL for accesses that +have no exploitable temporal locality. + +Like any HINTs, these instructions may be freely ignored. Hence, +although they are described in terms of cache-based memory hierarchies, +they do not mandate the provision of caches. + +Some implementations might respect these HINTs for some memory accesses +but not others: e.g., implementations that implement LR/SC by acquiring +a cache line in the exclusive state in L1 might ignore NTL instructions +on LR and SC, but might respect NTL instructions for AMOs and regular +loads and stores. + +Table #tab:ntl-portable[1.1] lists several software use cases and the +recommended NTL variant that _portable_ software—i.e., software not +tuned for any specific implementation’s memory hierarchy—should use in +each case. + +[[tab:ntl-portable]] +.Recommended NTL variant for portable software to employ in various +scenarios. +[cols="<,<",options="header",] +|=== +|Scenario |Recommended NTL variant +|Access to a working set between and in size |NTL.P1 +|Access to a working set between and in size |NTL.PALL +|Access to a working set greater than in size |NTL.S1 +|Access with no exploitable temporal locality (e.g., streaming) |NTL.ALL +|Access to a contended synchronization variable |NTL.PALL +|=== + +Cache sizes will obviously vary between implementations, and so the +working-set sizes listed in Table #tab:ntl-portable[1.1] are merely +rough guidelines. + +Table #tab:ntl[[tab:ntl]] lists several sample memory hierarchies and +recommends how each NTL variant maps onto each cache level. The table +also recommends which NTL variant that implementation-tuned software +should use to avoid allocating in a particular cache level. For example, +for a system with a private L1 and a shared L2, it is recommended that +NTL.P1 and NTL.PALL indicate that temporal locality cannot be exploited +by the L1, and that NTL.S1 and NTL.ALL indicate that temporal locality +cannot be exploited by the L2. Furthermore, software tuned for such a +system should use NTL.P1 to indicate a lack of temporal locality +exploitable by the L1, or should use NTL.ALL indicate a lack of temporal +locality exploitable by the L2. + +If the C extension is provided, compressed variants of these HINTs are +also provided: C.NTL.P1 is encoded as C.ADD _x0, x2_; C.NTL.PALL is +encoded as C.ADD _x0, x3_; C.NTL.S1 is encoded as C.ADD _x0, x4_; and +C.NTL.ALL is encoded as C.ADD _x0, x5_. + +The NTL instructions affect all memory-access instructions except the +cache-management instructions in the Zicbom extension. + +As of this writing, there are no other exceptions to this rule, and so +the NTL instructions affect all memory-access instructions defined in +the base ISAs and the A, F, D, Q, C, and V standard extensions, as well +as those defined within the hypervisor extension in Volume II. + +The NTL instructions can affect cache-management operations other than +those in the Zicbom extension. For example, NTL.PALL followed by +CBO.ZERO might indicate that the line should be allocated in L3 and +zeroed, but not allocated in L1 or L2. + +When an NTL instruction is applied to a prefetch hint in the Zicbop +extension, it indicates that a cache line should be prefetched into a +cache that is _outer_ from the level specified by the NTL. + +For example, in a system with a private L1 and shared L2, NTL.P1 +followed by PREFETCH.R might prefetch into L2 with read intent. + +To prefetch into the innermost level of cache, do not prefix the +prefetch instruction with an NTL instruction. + +In some systems, NTL.ALL followed by a prefetch instruction might +prefetch into a cache or prefetch buffer internal to a memory +controller. + +Software is discouraged from following an NTL instruction with an +instruction that does not explicitly access memory. Nonadherence to this +recommendation might reduce performance but otherwise has no +architecturally visible effect. + +In the event that a trap is taken on the target instruction, +implementations are discouraged from applying the NTL to the first +instruction in the trap handler. Instead, implementations are +recommended to ignore the HINT in this case. + +If an interrupt occurs between the execution of an NTL instruction and +its target instruction, execution will normally resume at the target +instruction. That the NTL instruction is not reexecuted does not change +the semantics of the program. + +Some implementations might prefer not to process the NTL instruction +until the target instruction is seen (e.g., so that the NTL can be fused +with the memory access it modifies). Such implementations might +preferentially take the interrupt before the NTL, rather than between +the NTL and the memory access. + +Since the NTL instructions are encoded as ADDs, they can be used within +LR/SC loops without voiding the forward-progress guarantee. But, since +using other loads and stores within an LR/SC loop _does_ void the +forward-progress guarantee, the only reason to use an NTL within such a +loop is to modify the LR or the SC. |