author     Bill Traynor <wmat@riscv.org> 2022-08-18 14:45:47 -0400
committer  Bill Traynor <wmat@riscv.org> 2022-08-18 14:45:47 -0400
commit     a70a30089a1e83d3227a4be3c9e4cfb4d5af75e7
tree       e895b8125c6fc950f12085e860619b5427070b97 /src/zihintntl.adoc
parent     dc829d6b72d71b5ab8a88d4436b536702f74470b
subject line
Adding zihintntl.adoc.
Diffstat (limited to 'src/zihintntl.adoc')
-rw-r--r-- src/zihintntl.adoc | 146
1 files changed, 146 insertions, 0 deletions
diff --git a/src/zihintntl.adoc b/src/zihintntl.adoc
new file mode 100644
index 0000000..5ea7f32
--- /dev/null
+++ b/src/zihintntl.adoc
@@ -0,0 +1,146 @@
+[[chap:zihintntl]]
+== ``Zihintntl'' Non-Temporal Locality Hints, Version 0.2
+
+The NTL instructions are HINTs that indicate that the explicit memory
+accesses of the immediately subsequent instruction (henceforth ``target
+instruction'') exhibit poor temporal locality of reference. The NTL
+instructions do not change architectural state, nor do they alter the
+architecturally visible effects of the target instruction. Four variants
+are provided:
+
+The NTL.P1 instruction indicates that the target instruction does not
+exhibit temporal locality within the capacity of the innermost level of
+private cache in the memory hierarchy. NTL.P1 is encoded as
+ADD _x0, x0, x2_.
+
+The NTL.PALL instruction indicates that the target instruction does not
+exhibit temporal locality within the capacity of any level of private
+cache in the memory hierarchy. NTL.PALL is encoded as ADD _x0, x0, x3_.
+
+The NTL.S1 instruction indicates that the target instruction does not
+exhibit temporal locality within the capacity of the innermost level of
+shared cache in the memory hierarchy. NTL.S1 is encoded as
+ADD _x0, x0, x4_.
+
+The NTL.ALL instruction indicates that the target instruction does not
+exhibit temporal locality within the capacity of any level of cache in
+the memory hierarchy. NTL.ALL is encoded as ADD _x0, x0, x5_.
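Because every NTL instruction is an ADD with rd = rs1 = x0, its 32-bit encoding follows from the R-type ADD format (opcode 0110011, funct3 000, funct7 0000000), with only the rs2 field selecting the variant. The C sketch below builds and recognizes the four encodings; the names `ntl_variant`, `ntl_encode`, and `ntl_decode` are hypothetical helpers for illustration, not part of any toolchain API.

```c
#include <stdint.h>

/* Hypothetical names; each value is the rs2 register number of the variant. */
typedef enum {
    NTL_NONE = 0, /* not an NTL HINT */
    NTL_P1   = 2, /* ADD x0, x0, x2 */
    NTL_PALL = 3, /* ADD x0, x0, x3 */
    NTL_S1   = 4, /* ADD x0, x0, x4 */
    NTL_ALL  = 5  /* ADD x0, x0, x5 */
} ntl_variant;

#define OPCODE_OP 0x33u /* major opcode shared by ADD, SUB, AND, ... */

/* R-type ADD x0, x0, rs2: funct7 = 0, rs1 = 0, funct3 = 0, rd = 0. */
uint32_t ntl_encode(ntl_variant v) {
    return ((uint32_t)v << 20) | OPCODE_OP;
}

/* Return the NTL variant encoded by a 32-bit instruction word, or NTL_NONE. */
ntl_variant ntl_decode(uint32_t insn) {
    uint32_t rs2 = (insn >> 20) & 0x1Fu;
    /* Every field other than rs2 must match ADD with rd = rs1 = x0. */
    if ((insn & ~(0x1Fu << 20)) != OPCODE_OP)
        return NTL_NONE;
    if (rs2 < 2 || rs2 > 5)
        return NTL_NONE;
    return (ntl_variant)rs2;
}
```

For example, `ntl_encode(NTL_S1)` yields `0x00400033`, while `ntl_decode` returns `NTL_NONE` for ADD x0, x0, x1, which is not one of the four NTL encodings.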
+
+The NTL instructions can be used to avoid cache pollution when streaming
+data or traversing large data structures, or to reduce latency in
+producer-consumer interactions.
+
+A microarchitecture might use the NTL instructions to inform the cache
+replacement policy, or to decide which cache to allocate into, or to
+avoid cache allocation altogether. For example, NTL.P1 might indicate
+that an implementation should not allocate a line in a private L1 cache,
+but should allocate in L2 (whether private or shared). In another
+implementation, NTL.P1 might allocate the line in L1, but in the
+least-recently used state.
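The two allocation policies mentioned above can be contrasted with a toy model of a single 4-way cache set under true-LRU replacement. This is a hypothetical sketch, not a description of any real design: an ordinary miss inserts the new line at the most-recently-used position, while a miss on an NTL-hinted access inserts it at the least-recently-used position, so the hinted line is the next victim and the rest of the set is left undisturbed. The names `cache_set` and `cache_access`, and the choice to promote hinted hits exactly like ordinary ones, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4

/* One cache set in recency order; valid lines always form a prefix. */
typedef struct {
    uint64_t tags[WAYS]; /* tags[0] is MRU, tags[count - 1] is LRU */
    int count;           /* number of valid lines */
} cache_set;

/* Returns true on a hit. On a miss, a free way is used if available,
 * otherwise the LRU line is evicted. */
bool cache_access(cache_set *s, uint64_t tag, bool ntl) {
    for (int i = 0; i < s->count; i++) {
        if (s->tags[i] == tag) {
            /* Hit: promote to MRU by shifting more-recent lines down one slot. */
            for (int k = i; k > 0; k--)
                s->tags[k] = s->tags[k - 1];
            s->tags[0] = tag;
            return true;
        }
    }
    int slot = (s->count < WAYS) ? s->count++ : WAYS - 1;
    if (ntl) {
        /* NTL-hinted fill: the new line occupies the LRU position. */
        s->tags[slot] = tag;
    } else {
        /* Ordinary fill: shift lines down (dropping the victim); new line is MRU. */
        for (int k = slot; k > 0; k--)
            s->tags[k] = s->tags[k - 1];
        s->tags[0] = tag;
    }
    return false;
}
```

Starting from a full set holding lines 4, 3, 2, 1 (MRU first), an NTL-hinted miss on line 5 evicts line 1 and leaves 5 in the LRU slot; the next ordinary miss then evicts 5 rather than 2.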
+
+NTL.ALL will typically inform implementations not to allocate anywhere
+in the cache hierarchy. Programmers should use NTL.ALL for accesses that
+have no exploitable temporal locality.
+
+Like all HINTs, these instructions may be freely ignored. Hence,
+although they are described in terms of cache-based memory hierarchies,
+they do not mandate the provision of caches.
+
+Some implementations might respect these HINTs for some memory accesses
+but not others: e.g., implementations that implement LR/SC by acquiring
+a cache line in the exclusive state in L1 might ignore NTL instructions
+on LR and SC, but might respect NTL instructions for AMOs and regular
+loads and stores.
+
+<<tab:ntl-portable>> lists several software use cases and the
+recommended NTL variant that _portable_ software (i.e., software not
+tuned for any specific implementation’s memory hierarchy) should use in
+each case.
+
+[[tab:ntl-portable]]
+.Recommended NTL variant for portable software to employ in various
+scenarios.
+[cols="<,<",options="header",]
+|===
+|Scenario |Recommended NTL variant
+|Access to a working set between and in size |NTL.P1
+|Access to a working set between and in size |NTL.PALL
+|Access to a working set greater than in size |NTL.S1
+|Access with no exploitable temporal locality (e.g., streaming) |NTL.ALL
+|Access to a contended synchronization variable |NTL.PALL
+|===
+
+Cache sizes will obviously vary between implementations, and so the
+working-set sizes listed in <<tab:ntl-portable>> are merely
+rough guidelines.
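Since cache sizes vary, portable software might treat the working-set boundaries in the table as tunable parameters rather than constants. The C sketch below encodes the table's rows in that form; the `ntl_scenario` and `ntl_size_bounds` types, the inclusive treatment of boundary values, and returning `NTL_NONE` for working sets below the smallest range (a case the table does not cover) are all assumptions.

```c
#include <stddef.h>

typedef enum { NTL_NONE, NTL_P1, NTL_PALL, NTL_S1, NTL_ALL } ntl_variant;

typedef enum {
    SCENARIO_WORKING_SET,   /* reuse within a working set of known size */
    SCENARIO_STREAMING,     /* no exploitable temporal locality */
    SCENARIO_CONTENDED_SYNC /* contended synchronization variable */
} ntl_scenario;

/* Hypothetical caller-supplied boundaries in bytes (lower <= middle <= upper),
 * standing in for the sizes in the table's first three rows. */
typedef struct {
    size_t lower;  /* start of the NTL.P1 range */
    size_t middle; /* boundary between NTL.P1 and NTL.PALL */
    size_t upper;  /* boundary between NTL.PALL and NTL.S1 */
} ntl_size_bounds;

ntl_variant ntl_recommend(ntl_scenario sc, size_t working_set,
                          const ntl_size_bounds *b) {
    switch (sc) {
    case SCENARIO_STREAMING:
        return NTL_ALL;
    case SCENARIO_CONTENDED_SYNC:
        return NTL_PALL;
    case SCENARIO_WORKING_SET:
        if (working_set > b->upper)  return NTL_S1;
        if (working_set > b->middle) return NTL_PALL;
        if (working_set >= b->lower) return NTL_P1;
        return NTL_NONE; /* below the table's smallest range: assumed no hint */
    }
    return NTL_NONE;
}
```

A caller would fill `ntl_size_bounds` with thresholds suited to its target platforms and then emit the returned variant ahead of the corresponding accesses.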
+
+<<tab:ntl>> lists several sample memory hierarchies and
+recommends how each NTL variant maps onto each cache level. The table
+also recommends which NTL variant implementation-tuned software
+should use to avoid allocating in a particular cache level. For example,
+for a system with a private L1 and a shared L2, it is recommended that
+NTL.P1 and NTL.PALL indicate that temporal locality cannot be exploited
+by the L1, and that NTL.S1 and NTL.ALL indicate that temporal locality
+cannot be exploited by the L2. Furthermore, software tuned for such a
+system should use NTL.P1 to indicate a lack of temporal locality
+exploitable by the L1, or NTL.ALL to indicate a lack of temporal
+locality exploitable by the L2.
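The private-L1/shared-L2 example generalizes if each variant is read as naming how many of the innermost cache levels cannot exploit the access's temporal locality. The sketch below computes that count from the variant definitions for a hierarchy of `private_levels` private caches nested inside `shared_levels` shared caches; the function name and this layout are assumptions, and an implementation remains free to choose a different mapping.

```c
typedef enum { NTL_P1, NTL_PALL, NTL_S1, NTL_ALL } ntl_variant;

/* Number of innermost cache levels (L1..Ln) that, per the variant's
 * definition, should not be relied on to exploit temporal locality.
 * Assumes private_levels >= 1 and that all private levels lie inside
 * all shared levels. */
int ntl_levels_skipped(ntl_variant v, int private_levels, int shared_levels) {
    switch (v) {
    case NTL_P1:   return 1;                                    /* innermost private level */
    case NTL_PALL: return private_levels;                       /* every private level */
    case NTL_S1:   return private_levels + (shared_levels > 0); /* plus innermost shared level */
    case NTL_ALL:  return private_levels + shared_levels;       /* every cache level */
    }
    return 0;
}
```

With one private and one shared level, NTL.P1 and NTL.PALL both give 1 (the L1) and NTL.S1 and NTL.ALL both give 2 (the L1 and L2), matching the example above. With private L1 and L2 caches and a shared L3, NTL.PALL gives 2, consistent with the NTL.PALL and CBO.ZERO example in this section.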
+
+If the C extension is provided, compressed variants of these HINTs are
+also provided: C.NTL.P1 is encoded as C.ADD _x0, x2_; C.NTL.PALL is
+encoded as C.ADD _x0, x3_; C.NTL.S1 is encoded as C.ADD _x0, x4_; and
+C.NTL.ALL is encoded as C.ADD _x0, x5_.
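The compressed forms follow the same pattern in the 16-bit CR format used by C.ADD (funct4 1001, rd/rs1 = x0, op 10), with the variant again carried in rs2. A minimal C sketch of encoding and recognizing them; the function names are hypothetical:

```c
#include <stdint.h>

/* rs2 register numbers of the four NTL variants (x2..x5). */
typedef enum { NTL_P1 = 2, NTL_PALL = 3, NTL_S1 = 4, NTL_ALL = 5 } ntl_variant;

/* CR format: funct4 (15:12) = 1001, rd/rs1 (11:7) = x0, rs2 (6:2), op (1:0) = 10. */
uint16_t c_ntl_encode(ntl_variant v) {
    return (uint16_t)((0x9u << 12) | (0u << 7) | ((unsigned)v << 2) | 0x2u);
}

/* Returns the rs2 number (2..5) of a C.NTL.* HINT, or 0 for anything else.
 * rs2 = x0 with these fields is C.EBREAK, which is rejected. */
int c_ntl_decode(uint16_t insn) {
    unsigned rs2 = (insn >> 2) & 0x1Fu;
    if ((insn & ~(0x1Fu << 2)) != 0x9002u) /* funct4 = 1001, rd = x0, op = 10 */
        return 0;
    return (rs2 >= 2 && rs2 <= 5) ? (int)rs2 : 0;
}
```

This yields 0x900A, 0x900E, 0x9012, and 0x9016 for C.NTL.P1, C.NTL.PALL, C.NTL.S1, and C.NTL.ALL respectively.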
+
+The NTL instructions affect all memory-access instructions except the
+cache-management instructions in the Zicbom extension.
+
+As of this writing, there are no other exceptions to this rule, and so
+the NTL instructions affect all memory-access instructions defined in
+the base ISAs and the A, F, D, Q, C, and V standard extensions, as well
+as those defined within the hypervisor extension in Volume II.
+
+The NTL instructions can affect cache-management operations other than
+those in the Zicbom extension. For example, NTL.PALL followed by
+CBO.ZERO might indicate that the line should be allocated in L3 and
+zeroed, but not allocated in L1 or L2.
+
+When an NTL instruction is applied to a prefetch hint in the Zicbop
+extension, it indicates that a cache line should be prefetched into a
+cache that is _outer_ to the level specified by the NTL.
+
+For example, in a system with a private L1 and shared L2, NTL.P1
+followed by PREFETCH.R might prefetch into L2 with read intent.
+
+To prefetch into the innermost level of cache, do not prefix the
+prefetch instruction with an NTL instruction.
+
+In some systems, NTL.ALL followed by a prefetch instruction might
+prefetch into a cache or prefetch buffer internal to a memory
+controller.
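The prefetch rules above can be summarized as follows: without an NTL a prefetch targets the innermost level, and with one it targets the first level outside those the NTL covers, possibly falling back to a memory-side structure when the NTL covers every cache level. The sketch below counts covered levels directly from the variant definitions for a hierarchy of private levels nested inside shared levels (an assumption), and its use of 0 to mean "beyond the last cache level" is a hypothetical convention.

```c
/* Variants, plus NTL_NONE for a prefetch with no preceding NTL. */
typedef enum { NTL_NONE, NTL_P1, NTL_PALL, NTL_S1, NTL_ALL } ntl_variant;

/* Returns the 1-based cache level a PREFETCH.* should target, or 0 meaning
 * "outside the cache hierarchy" (e.g., a buffer in the memory controller). */
int prefetch_target_level(ntl_variant v, int private_levels, int shared_levels) {
    int total = private_levels + shared_levels;
    int covered;
    switch (v) {
    case NTL_P1:   covered = 1; break;
    case NTL_PALL: covered = private_levels; break;
    case NTL_S1:   covered = private_levels + (shared_levels > 0); break;
    case NTL_ALL:  covered = total; break;
    default:       covered = 0; break; /* no NTL: nothing is excluded */
    }
    return covered < total ? covered + 1 : 0;
}
```

For a private L1 and a shared L2, a plain prefetch targets level 1, NTL.P1 followed by PREFETCH.R targets level 2 as in the example above, and NTL.ALL yields 0.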
+
+Software is discouraged from following an NTL instruction with an
+instruction that does not explicitly access memory. Nonadherence to this
+recommendation might reduce performance but otherwise has no
+architecturally visible effect.
+
+In the event that a trap is taken on the target instruction,
+implementations are discouraged from applying the NTL to the first
+instruction in the trap handler. Instead, implementations are
+recommended to ignore the HINT in this case.
+
+If an interrupt occurs between the execution of an NTL instruction and
+its target instruction, execution will normally resume at the target
+instruction. The NTL instruction is not re-executed, but this does not
+change the semantics of the program.
+
+Some implementations might prefer not to process the NTL instruction
+until the target instruction is seen (e.g., so that the NTL can be fused
+with the memory access it modifies). Such implementations might
+preferentially take the interrupt before the NTL, rather than between
+the NTL and the memory access.
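The rules in the preceding paragraphs (an NTL applies only to the immediately following instruction, is dropped if that instruction traps or an interrupt intervenes, and does not affect Zicbom cache-management instructions) can be modeled as a one-entry "pending hint" register in a hypothetical instruction-stream simulator. The instruction classes and function names below are illustrative assumptions, and discarding the hint on a trap follows the recommendation above rather than a requirement.

```c
#include <stdbool.h>

typedef enum { NTL_NONE, NTL_P1, NTL_PALL, NTL_S1, NTL_ALL } ntl_variant;

/* Coarse instruction classes for the model (hypothetical). */
typedef enum {
    INSN_NTL,      /* one of the NTL HINTs */
    INSN_MEM,      /* loads, stores, AMOs, LR/SC; also CBO.ZERO and prefetches,
                      which the text says an NTL may affect */
    INSN_CBO_MGMT, /* Zicbom CBO.CLEAN / CBO.FLUSH / CBO.INVAL */
    INSN_OTHER     /* no explicit memory access */
} insn_class;

typedef struct { ntl_variant pending; } hint_state;

/* Retire one instruction and return the hint applied to it (NTL_NONE if none).
 * `variant` is only meaningful when cls == INSN_NTL. */
ntl_variant retire(hint_state *st, insn_class cls, ntl_variant variant) {
    ntl_variant applied = (cls == INSN_MEM) ? st->pending : NTL_NONE;
    /* Any instruction consumes the pending hint; only an NTL sets a new one. */
    st->pending = (cls == INSN_NTL) ? variant : NTL_NONE;
    return applied;
}

/* A trap on the target, or an interrupt between the NTL and its target,
 * discards the hint so it never reaches the handler's first instruction. */
void take_trap_or_interrupt(hint_state *st) {
    st->pending = NTL_NONE;
}
```

In this model an interrupted NTL is simply lost, and execution resumes at the target instruction without the hint, which matches the observation above that skipping the NTL does not change program semantics.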
+
+Since the NTL instructions are encoded as ADDs, they can be used within
+LR/SC loops without voiding the forward-progress guarantee. However, since
+using other loads and stores within an LR/SC loop _does_ void the
+forward-progress guarantee, the only reason to use an NTL within such a
+loop is to modify the LR or the SC.
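To see why an NTL does not disturb a constrained LR/SC loop, consider a simplified checker over 32-bit instruction words. It is an illustrative sketch, not a complete implementation of the constrained-loop rules of the A extension: it enforces the 16-instruction limit and accepts LR, SC, and a conservative subset of base-ISA integer instructions, but it does not check branch direction and treats every OP-IMM, LUI, AUIPC, and BRANCH word as acceptable. The function names are hypothetical. Because an NTL is an ADD under the OP major opcode it passes, while an ordinary load does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Major opcodes (bits 6:0) accepted below. */
enum {
    OPC_OP_IMM = 0x13, OPC_AUIPC = 0x17, OPC_AMO = 0x2F,
    OPC_OP = 0x33, OPC_LUI = 0x37, OPC_BRANCH = 0x63
};

/* Accepts LR/SC, base-ISA register-register ALU ops (including every NTL),
 * OP-IMM, LUI, AUIPC, and conditional branches. Rejects everything else:
 * loads, stores, other AMOs, FENCE, SYSTEM, JAL/JALR, FP, and so on. */
bool lrsc_insn_ok(uint32_t insn) {
    uint32_t opcode = insn & 0x7Fu;
    uint32_t funct3 = (insn >> 12) & 0x7u;
    uint32_t funct5 = insn >> 27;
    uint32_t funct7 = insn >> 25;
    switch (opcode) {
    case OPC_AMO:
        return funct5 == 0x02u || funct5 == 0x03u; /* LR or SC only */
    case OPC_OP:
        /* funct7 0: ADD (hence NTL), SLL, SLT, SLTU, XOR, SRL, OR, AND;
         * funct7 0x20: SUB (funct3 0) or SRA (funct3 5). M-extension and
         * other extension encodings are rejected. */
        return funct7 == 0x00u ||
               (funct7 == 0x20u && (funct3 == 0u || funct3 == 5u));
    case OPC_OP_IMM: case OPC_LUI: case OPC_AUIPC: case OPC_BRANCH:
        return true;
    default:
        return false;
    }
}

/* A constrained LR/SC loop holds at most 16 instructions. */
bool lrsc_loop_ok(const uint32_t *loop, size_t n) {
    if (n > 16)
        return false;
    for (size_t i = 0; i < n; i++)
        if (!lrsc_insn_ok(loop[i]))
            return false;
    return true;
}
```

For instance, the sequence lr.w a0,(a1); ntl.all; add a0,a0,a2; sc.w a3,a0,(a1) (words 0x1005A52F, 0x00500033, 0x00C50533, 0x18A5A6AF) passes, whereas inserting lw a0,0(a1) (0x0005A503) makes it fail.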