Merge pull request #1226 from riscv/cmo

Adding Base Cache Management Operation ISA Extensions chapter.
author: Bill Traynor <wmat@riscv.org> 2024-02-27 11:39:27 -0500
committer: GitHub <noreply@github.com> 2024-02-27 11:39:27 -0500
commit: d4618d1499bc67439def92847babbf401c984357 (patch)
tree: 7e339961fb9fdedd38dbb4e769f0eeb992f77f29 /src/cmo.adoc
parent: b1940473272185c5bd2059c4663ed7537b85b3e3 (diff)
parent: 14c5798ba6272d1faf419626dd31c9659b98cbfe (diff)
1 files changed, 1149 insertions, 0 deletions
diff --git a/src/cmo.adoc b/src/cmo.adoc
new file mode 100644
index 0000000..648c4ec
--- /dev/null
+++ b/src/cmo.adoc
@@ -0,0 +1,1149 @@
+[[cmo]]
+== Base Cache Management Operation ISA Extensions
+
+[acknowledgments]
+=== Acknowledgments
+
+Contributors to this specification (in alphabetical order) include: +
+Allen Baum,
+Paul Donahue,
+Greg Favor,
+Andy Glew,
+John Ingalls,
+David Kruckemyer,
+Josh Scheid,
+Philipp Tomsich,
+Paul Walmsley,
+and
+Derek Williams
+
+We express our gratitude to everyone who contributed to, reviewed, or improved
+this specification through their comments and questions.
+
+=== Pseudocode for instruction semantics
+
+The semantics of each instruction in the <<#insns>> chapter is expressed in a
+SAIL-like syntax.
+
+[#intro,reftext="Introduction"]
+=== Introduction
+
+_Cache-management operation_ (or _CMO_) instructions perform operations on
+copies of data in the memory hierarchy. In general, CMO instructions operate on
+cached copies of data, but in some cases, a CMO instruction may operate on
+memory locations directly. Furthermore, CMO instructions are grouped by
+operation into the following classes:
+
+* A _management_ instruction manipulates cached copies of data with respect to a
+  set of agents that can access the data
+* A _zero_ instruction zeros out a range of memory locations, potentially
+  allocating cached copies of data in one or more caches
+* A _prefetch_ instruction indicates to hardware that data at a given memory
+  location may be accessed in the near future, potentially allocating cached
+  copies of data in one or more caches
+
+This document introduces a base set of CMO ISA extensions that operate
+specifically on cache blocks or the memory locations corresponding to a cache
+block; these are known as _cache-block operation_ (or _CBO_) instructions. Each
+of the above classes of instructions represents an extension in this
+specification:
+
+* The _Zicbom_ extension defines a set of cache-block management instructions:
+  `CBO.INVAL`, `CBO.CLEAN`,  and `CBO.FLUSH`
+* The _Zicboz_ extension defines a cache-block zero instruction: `CBO.ZERO`
+* The _Zicbop_ extension defines a set of cache-block prefetch instructions:
+  `PREFETCH.R`, `PREFETCH.W`, and `PREFETCH.I`
+
+The execution behavior of the above instructions is also modified by CSR state
+added by this specification.
+
+The remainder of this document provides general background information on CMO
+instructions and describes each of the above ISA extensions.
+
+[NOTE]
+====
+_The term CMO encompasses all operations on caches or resources related to
+caches. The term CBO represents a subset of CMOs that operate only on cache
+blocks. The first CMO extensions only define CBOs._
+====
+
+[#background,reftext="Background"]
+=== Background
+
+This chapter provides information common to all CMO extensions.
+
+[#memory-caches,reftext="Memory and Caches"]
+==== Memory and Caches
+
+A _memory location_ is a physical resource in a system uniquely identified by a
+_physical address_. An _agent_ is a logic block, such as a RISC-V hart,
+accelerator, I/O device, etc., that can access a given memory location.
+
+[NOTE]
+====
+_A given agent may not be able to access all memory locations in a system, and
+two different agents may or may not be able to access the same set of memory
+locations._
+====
+
+A _load operation_ (or _store operation_) is performed by an agent to consume
+(or modify) the data at a given memory location. Load and store operations are
+performed as a result of explicit memory accesses to that memory location.
+Additionally, a _read transfer_ from memory fetches the data at the memory
+location, while a _write transfer_ to memory updates the data at the memory
+location.
+
+A _cache_ is a structure that buffers copies of data to reduce average memory
+latency. Any number of caches may be interspersed between an agent and a memory
+location, and load and store operations from an agent may be satisfied by a
+cache instead of the memory location.
+
+[NOTE]
+====
+_Load and store operations are decoupled from read and write transfers by
+caches. For example, a load operation may be satisfied by a cache without
+performing a read transfer from memory, or a store operation may be satisfied by
+a cache that first performs a read transfer from memory._
+====
+
+Caches organize copies of data into _cache blocks_, each of which represents a
+contiguous, naturally aligned power-of-two (or _NAPOT_) range of memory
+locations. A cache block is identified by a physical address corresponding to
+the underlying memory locations. The capacity and organization of a cache and
+the size of a cache block are both _implementation-specific_, and the execution
+environment provides software a means to discover information about the caches
+and cache blocks in a system. In the initial set of CMO extensions, the size of
+a cache block shall be uniform throughout the system.
+
+[NOTE]
+====
+_In future CMO extensions, the requirement for a uniform cache block size may be
+relaxed._
+====
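
As a rough illustration of the NAPOT property described above, the following Python sketch computes the cache block that contains a given physical address. The 64-byte block size and the name `cache_block_range` are assumptions made for this example only; the real block size is implementation-specific and is obtained from the execution environment.

```python
def cache_block_range(phys_addr: int, block_size: int = 64) -> range:
    """Return the byte addresses of the cache block containing phys_addr.

    block_size is a hypothetical value; it must be a power of two so that
    the block is naturally aligned (NAPOT).
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    base = phys_addr & ~(block_size - 1)  # clear the offset bits
    return range(base, base + block_size)


block = cache_block_range(0x8000_1234)
print(hex(block.start), hex(block.stop - 1))
```

Because the block is naturally aligned, any address inside it maps to the same base, which is why a CBO instruction can name a block by any address it contains.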
+
+Implementation techniques such as speculative execution or hardware prefetching
+may cause a given cache to allocate or deallocate a copy of a cache block at any
+time, provided the corresponding physical addresses are accessible according to
+the supported access type PMA and are cacheable according to the cacheability
+PMA. Allocating a copy of a cache block results in a read transfer from another
+cache or from memory, while deallocating a copy of a cache block may result in a
+write transfer to another cache or to memory depending on whether the data in
+the copy were modified by a store operation. Additional details are discussed in
+<<#coherent-agents-caches>>.
+
+==== Cache-Block Operations
+
+A CBO instruction causes one or more operations to be performed on the cache
+blocks identified by the instruction. In general, a CBO instruction may identify
+one or more cache blocks; however, in the initial set of CMO extensions, CBO
+instructions identify a single cache block only.
+
+A cache-block management instruction performs one of the following operations,
+relative to the copy of a given cache block allocated in a given cache:
+
+* An _invalidate operation_ deallocates the copy of the cache block
+
+* A _clean operation_ performs a write transfer to another cache or to memory if
+  the data in the copy of the cache block have been modified by a store
+  operation
+
+* A _flush operation_ atomically performs a clean operation followed by an
+  invalidate operation
+
+Additional details, including the actual operation performed by a given
+cache-block management instruction, are described in <<#Zicbom>>.
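
The three operations above can be sketched with a toy model of one write-back cache in front of a memory. This is only an illustration: the class `ToyCache`, its dictionary-based memory, and the per-block `modified` flag are assumptions of the example, and the single-threaded model does not capture atomicity, multiple caches, or memory ordering.

```python
class ToyCache:
    """A single write-back cache in front of a dict-based memory (illustrative only)."""

    def __init__(self, memory: dict):
        self.memory = memory  # block address -> data held in memory
        self.lines = {}       # block address -> [data, modified flag]

    def load(self, block):
        if block not in self.lines:  # allocate via a read transfer
            self.lines[block] = [self.memory.get(block, 0), False]
        return self.lines[block][0]

    def store(self, block, data):
        self.load(block)  # allocate the block first
        self.lines[block] = [data, True]

    def invalidate(self, block):
        self.lines.pop(block, None)  # deallocate; modified data is dropped

    def clean(self, block):
        line = self.lines.get(block)
        if line and line[1]:  # write transfer only if modified
            self.memory[block] = line[0]
            line[1] = False

    def flush(self, block):
        self.clean(block)
        self.invalidate(block)


memory = {0x40: 1}
cache = ToyCache(memory)
cache.store(0x40, 2)
cache.clean(0x40)
after_clean = memory[0x40]  # the modified data has reached memory
memory[0x40] = 3            # a non-coherent agent writes memory directly
stale = cache.load(0x40)    # the cached copy is still returned
cache.invalidate(0x40)
fresh = cache.load(0x40)    # re-allocated from memory
```

In this model a clean makes the cached store visible in memory, and an invalidate is what lets a later load observe the data written by the non-coherent agent.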
+
+A cache-block zero instruction performs a set of store operations that write
+zeros to the set of bytes corresponding to a cache block. Unless specified
+otherwise, the store operations generated by a cache-block zero instruction have
+the same general properties and behaviors that other store instructions in the
+architecture have. An implementation may or may not update the entire set of
+bytes atomically with a single store operation. Additional details are described
+in <<#Zicboz>>.
+
+A cache-block prefetch instruction is a HINT to the hardware that software
+expects to perform a particular type of memory access in the near future.
+Additional details are described in <<#Zicbop>>.
+
+[#coherent-agents-caches,reftext="Coherent Agents and Caches"]
+=== Coherent Agents and Caches
+
+For a given memory location, a _set of coherent agents_ consists of the agents
+for which all of the following hold:
+
+* Store operations from all agents in the set appear to be serialized with
+  respect to each other
+* Store operations from all agents in the set eventually appear to all other
+  agents in the set
+* A load operation from an agent in the set returns data from a store operation
+  from an agent in the set (or from the initial data in memory)
+
+The coherent agents within such a set shall access a given memory location with
+the same physical address and the same physical memory attributes; however, if
+the coherence PMA for a given agent indicates a given memory location is not
+coherent, that agent shall not be a member of a set of coherent agents with any
+other agent for that memory location and shall be the sole member of a set of
+coherent agents consisting of itself.
+
+An agent that is a member of a set of coherent agents is said to be _coherent_
+with respect to the other agents in the set. On the other hand, an agent that is
+_not_ a member is said to be _non-coherent_ with respect to the agents in the
+set.
+
+Caches introduce the possibility that multiple copies of a given cache block may
+be present in a system at the same time. An _implementation-specific_ mechanism
+keeps these copies coherent with respect to the load and store operations from
+the agents in the set of coherent agents. Additionally, if a coherent agent in
+the set executes a CBO instruction that specifies the cache block, the resulting
+operation shall apply to any and all of the copies in the caches that can be
+accessed by the load and store operations from the coherent agents.
+
+[NOTE]
+====
+_An operation from a CBO instruction is defined to operate only on the copies of
+a cache block that are cached in the caches accessible by the explicit memory
+accesses performed by the set of coherent agents. This includes copies of a
+cache block in caches that are accessed only indirectly by load and store
+operations, e.g. coherent instruction caches._
+====
+
+The set of caches subject to the above mechanism forms a _set of coherent
+caches_, and each coherent cache has the following behaviors, assuming all
+operations are performed by the agents in a set of coherent agents:
+
+* A coherent cache is permitted to allocate and deallocate copies of a cache
+  block and perform read and write transfers as described in <<#memory-caches>> 
+
+* A coherent cache is permitted to perform a write transfer to memory provided
+  that a store operation has modified the data in the cache block since the most
+  recent invalidate, clean, or flush operation on the cache block
+
+* At least one coherent cache is responsible for performing a write transfer to
+  memory once a store operation has modified the data in the cache block until
+  the next invalidate, clean, or flush operation on the cache block, after which
+  no coherent cache is responsible for performing (or permitted to perform) a
+  write transfer to memory until the next store operation has modified the data
+  in the cache block
+
+* A coherent cache is required to perform a write transfer to memory if a store
+  operation has modified the data in the cache block since the most recent
+  invalidate, clean, or flush operation on the cache block and if the next clean
+  or flush operation requires a write transfer to memory
+
+[NOTE]
+====
+_The above restrictions ensure that a "clean" copy of a cache block, fetched by
+a read transfer from memory and unmodified by a store operation, cannot later
+overwrite the copy of the cache block in memory updated by a write transfer to
+memory from a non-coherent agent._
+====
+
+A non-coherent agent may initiate a cache-block operation that operates on the
+set of coherent caches accessed by a set of coherent agents. The mechanism to
+perform such an operation is _implementation-specific_.
+
+==== Memory Ordering
+
+===== Preserved Program Order
+
+The preserved program order (abbreviated _PPO_) rules are defined by the RVWMO
+memory ordering model. How the operations resulting from CMO instructions fit
+into these rules is described below.
+
+For cache-block management instructions, the resulting invalidate, clean, and
+flush operations behave as stores in the PPO rules subject to one additional
+overlapping address rule. Specifically, if _a_ precedes _b_ in program order,
+then _a_ will precede _b_ in the global memory order if:
+
+* _a_ is an invalidate, clean, or flush, _b_ is a load, and _a_ and _b_ access
+  overlapping memory addresses
+
+[NOTE]
+====
+_The above rule ensures that a subsequent load in program order never appears
+in the global memory order before a preceding invalidate, clean, or flush
+operation to an overlapping address._
+====
+
+Additionally, invalidate, clean, and flush operations are classified as W or O
+(depending on the physical memory attributes for the corresponding physical
+addresses) for the purposes of predecessor and successor sets in `FENCE`
+instructions. These operations are _not_ ordered by other instructions that
+order stores, e.g. `FENCE.I` and `SFENCE.VMA`.
+
+For cache-block zero instructions, the resulting store operations behave as
+stores in the PPO rules and are ordered by other instructions that order stores.
+
+Finally, for cache-block prefetch instructions, the resulting operations are
+_not_ ordered by the PPO rules nor are they ordered by any other ordering
+instructions.
+
+===== Load Values
+
+An invalidate operation may change the set of values that can be returned by a
+load. In particular, an additional condition is added to the Load Value Axiom:
+
+* If an invalidate operation _i_ precedes a load _r_ and operates on a byte _x_
+  returned by _r_, and no store to _x_ appears between _i_ and _r_ in program
+  order or in the global memory order, then _r_ returns any of the following
+  values for _x_:
+
+. If no clean or flush operations on _x_ precede _i_ in the global memory order,
+  either the initial value of _x_ or the value of any store to _x_ that precedes
+  _i_
+
+. If no store to _x_ precedes a clean or flush operation on _x_ in the global
+  memory order and if the clean or flush operation on _x_ precedes _i_ in the
+  global memory order, either the initial value of _x_ or the value of any store
+  to _x_ that precedes _i_
+
+. If a store to _x_ precedes a clean or flush operation on _x_ in the global
+  memory order and if the clean or flush operation on _x_ precedes _i_ in the
+  global memory order, either the value of the latest store to _x_ that precedes
+  the latest clean or flush operation on _x_ or the value of any store to _x_
+  that both precedes _i_ and succeeds the latest clean or flush operation on _x_
+  that precedes _i_ 
+
+. The value of any store to _x_ by a non-coherent agent regardless of the above
+  conditions
+
+[NOTE]
+====
+_The first three items describe the possible load values at different points
+in the global memory order relative to clean or flush operations. The final
+item implies that the load value may be produced by a non-coherent agent at
+any time._
+====
+
+==== Traps
+
+Execution of certain CMO instructions may result in traps due to CSR state,
+described in the <<#csr_state>> section, or due to the address translation and
+protection mechanisms. The trapping behavior of CMO instructions is described in
+the following sections.
+
+===== Illegal Instruction and Virtual Instruction Exceptions
+
+Cache-block management instructions and cache-block zero instructions may raise
+illegal instruction exceptions or virtual instruction exceptions depending on
+the current privilege mode and the state of the CMO control registers described
+in the <<#csr_state>> section.
+
+Cache-block prefetch instructions raise neither illegal instruction exceptions
+nor virtual instruction exceptions.
+
+===== Page Fault, Guest-Page Fault, and Access Fault Exceptions
+
+Similar to load and store instructions, CMO instructions are explicit memory
+access instructions that compute an effective address. The effective address is
+ultimately translated into a physical address based on the privilege mode and
+the enabled translation mechanisms, and the CMO extensions impose the following
+constraints on the physical addresses in a given cache block:
+
+* The PMP access control bits shall be the same for _all_ physical addresses in
+  the cache block, and if write permission is granted by the PMP access control
+  bits, read permission shall also be granted
+
+* The PMAs shall be the same for _all_ physical addresses in the cache block,
+  and if write permission is granted by the supported access type PMAs, read
+  permission shall also be granted
+
+If the above constraints are not met, the behavior of a CBO instruction is
+UNSPECIFIED.
+
+[NOTE]
+====
+_This specification assumes that the above constraints will typically be met for
+main memory regions and may be met for certain I/O regions._
+====
+
+The Zicboz extension introduces an additional supported access type PMA for
+cache-block zero instructions. Main memory regions are required to support
+accesses by cache-block zero instructions; however, I/O regions may specify
+whether accesses by cache-block zero instructions are supported.
+
+A cache-block management instruction is permitted to access the specified cache
+block whenever a load instruction or store instruction is permitted to access
+the corresponding physical addresses. If neither a load instruction nor store
+instruction is permitted to access the physical addresses, but an instruction
+fetch is permitted to access the physical addresses, whether a cache-block
+management instruction is permitted to access the cache block is UNSPECIFIED. If
+access to the cache block is not permitted, a cache-block management instruction
+raises a store page fault or store guest-page fault exception if address
+translation does not permit any access or raises a store access fault exception
+otherwise. During address translation, the instruction also checks the accessed
+bit and may either raise an exception or set the bit as required.
+
+[NOTE]
+====
+_The interaction between cache-block management instructions and instruction
+fetches will be specified in a future extension._
+
+_As implied by omission, a cache-block management instruction does not check the
+dirty bit and neither raises an exception nor sets the bit._
+====
+
+A cache-block zero instruction is permitted to access the specified cache block
+whenever a store instruction is permitted to access the corresponding physical
+addresses and when the PMAs indicate that cache-block zero instructions are a
+supported access type. If access to the cache block is not permitted, a
+cache-block zero instruction raises a store page fault or store guest-page fault
+exception if address translation does not permit write access or raises a store
+access fault exception otherwise. During address translation, the instruction
+also checks the accessed and dirty bits and may either raise an exception or set
+the bits as required.
+
+A cache-block prefetch instruction is permitted to access the specified cache
+block whenever a load instruction, store instruction, or instruction fetch is
+permitted to access the corresponding physical addresses. If access to the cache
+block is not permitted, a cache-block prefetch instruction does not raise any
+exceptions and shall not access any caches or memory. During address
+translation, the instruction does _not_ check the accessed and dirty bits and
+neither raises an exception nor sets the bits.
+
+[NOTE]
+====
+_Like a load or store instruction, a CMO instruction may or may not be permitted
+to access a cache block based on the states of the `MPRV`, `MPV`, and `MPP` bits
+in `mstatus` and the `SUM` and `MXR` bits in `mstatus`, `sstatus`, and
+`vsstatus`._
+
+_This specification expects that implementations will process cache-block
+management instructions like store/AMO instructions, so store/AMO exceptions are
+appropriate for these instructions, regardless of the permissions required._
+====
+
+===== Address Misaligned Exceptions
+
+CMO instructions do _not_ generate address misaligned exceptions.
+
+===== Breakpoint Exceptions and Debug Mode Entry
+
+Unless otherwise defined by the debug architecture specification, the behavior
+of trigger modules with respect to CMO instructions is UNSPECIFIED.
+
+[NOTE]
+====
+_For the Zicbom, Zicboz, and Zicbop extensions, this specification recommends
+the following common trigger module behaviors:_
+
+* Type 6 address match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=0`,
+  should be supported
+
+* Type 2 address/data match triggers, i.e. `tdata1.type=2`, should be
+  unsupported
+    
+* The size of a memory access equals the size of the cache block accessed, and
+  the compare values follow from the addresses of the NAPOT memory region
+  corresponding to the cache block containing the effective address
+  
+* Unless an encoding for a cache block is added to the `mcontrol6.size` field,
+  an address trigger should only match a memory access from a CBO instruction if
+  `mcontrol6.size=0`
+    
+_If the Zicbom extension is implemented, this specification recommends the
+following additional trigger module behaviors:_
+
+* Implementing address match triggers should be optional
+
+* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`,
+  should be unsupported
+
+* Memory accesses are considered to be stores, i.e. an address trigger matches
+  only if `mcontrol6.store=1`
+
+_If the Zicboz extension is implemented, this specification recommends the
+following additional trigger module behaviors:_
+
+* Implementing address match triggers should be mandatory
+
+* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`,
+  should be supported, and implementing these triggers should be optional
+
+* Memory accesses are considered to be stores, i.e. an address trigger matches
+  only if `mcontrol6.store=1`
+
+_If the Zicbop extension is implemented, this specification recommends the
+following additional trigger module behaviors:_
+
+* Implementing address match triggers should be optional
+
+* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`,
+  should be unsupported
+
+* Memory accesses may be considered to be loads or stores depending on the
+  implementation, i.e. whether an address trigger matches on these instructions
+  when `mcontrol6.load=1` or `mcontrol6.store=1` is _implementation-specific_
+
+_This specification also recommends that the behavior of trigger modules with
+respect to the Zicboz extension should be defined in version 1.0 of the debug
+architecture specification. The behavior of trigger modules with respect to the
+Zicbom and Zicbop extensions is expected to be defined in future extensions._
+====
+
+===== Hypervisor Extension
+
+For the purposes of writing the `mtinst` or `htinst` register on a trap, the
+following standard transformation is defined for cache-block management
+instructions and cache-block zero instructions:
+
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 'opcode'},
+	{ bits: 5,  name: 0x0 },
+	{ bits: 3,  name: 'funct3'},
+	{ bits: 5,  name: 0x0},
+	{ bits: 12, name: 'operation'},
+]}
+....
+
+The `operation` field corresponds to the 12 most significant bits of the
+trapping instruction.
+
+[NOTE]
+====
+_As described in the hypervisor extension, a zero may be written into `mtinst`
+or `htinst` instead of the standard transformation defined above._
+====
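
The standard transformation above amounts to masking the trapping instruction. The Python sketch below shows one way to compute it; the names `KEEP_MASK` and `transformed_cbo_instruction` are hypothetical, and the example encoding `0x0045200f` (taken here to be `cbo.zero` with _rs1_ = `x10`) is an assumption based on the instruction encodings defined elsewhere in the chapter.

```python
# Bits kept by the transformation: operation [31:20], funct3 [14:12], opcode [6:0].
KEEP_MASK = (0xFFF << 20) | (0x7 << 12) | 0x7F


def transformed_cbo_instruction(insn: int) -> int:
    """Zero bits [19:15] and [11:7] of a 32-bit CBO instruction encoding."""
    return insn & KEEP_MASK


# Example encoding assumed for illustration: cbo.zero with rs1 = x10.
value = transformed_cbo_instruction(0x0045200F)
print(hex(value))
```

The sketch covers only the transformed value; as the note above states, an implementation may write zero instead.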
+
+==== Effects on Constrained LR/SC Loops
+
+The following event is added to the list of events that satisfy the eventuality
+guarantee provided by constrained LR/SC loops, as defined in the A extension:
+
+* Some other hart executes a cache-block management instruction or a cache-block
+  zero instruction to the reservation set of the LR instruction in _H_'s
+  constrained LR/SC loop.
+
+[NOTE]
+====
+_The above event has been added to accommodate cache coherence protocols that
+cannot distinguish between invalidations for stores and invalidations for
+cache-block management operations._
+
+_Aside from the above event, CMO instructions neither change the properties of
+constrained LR/SC loops nor modify the eventuality guarantee provided by them.
+For example, executing a CMO instruction may cause a constrained LR/SC loop on
+any hart to fail periodically or may cause an unconstrained LR/SC sequence on the
+same hart to fail always. Additionally, executing a cache-block prefetch
+instruction does not impact the eventuality guarantee provided by constrained
+LR/SC loops executed on any hart._
+====
+
+==== Software Discovery
+
+The initial set of CMO extensions requires the following information to be
+discovered by software:
+
+* The size of the cache block for management and prefetch instructions
+* The size of the cache block for zero instructions
+* CBIE support at each privilege level
+
+Other general cache characteristics may also be specified in the discovery
+mechanism.
+
+[#csr_state,reftext="Control and Status Register State"]
+=== Control and Status Register State
+
+[NOTE]
+====
+_The CMO extensions rely on state in {csrname} CSRs that will be defined in a
+future update to the privileged architecture. If this CSR update is not
+ratified, the CMO extension will define its own CSRs._
+====
+
+Three CSRs control the execution of CMO instructions:
+
+* `m{csrname}`
+* `s{csrname}`
+* `h{csrname}`
+
+The `s{csrname}` register is used by all supervisor modes, including VS-mode. A
+hypervisor is responsible for saving and restoring `s{csrname}` on guest context
+switches. The `h{csrname}` register is only present if the H-extension is
+implemented and enabled.
+
+Each `x{csrname}` register (where `x` is `m`, `s`, or `h`) has the following
+generic format:
+
+.Generic Format for x{csrname} CSRs
+[cols="^10,^10,80a"]
+|===
+| Bits    | Name     | Description
+
+| [5:4]   | `CBIE`   | Cache Block Invalidate instruction Enable
+
+Enables the execution of the cache block invalidate instruction, `CBO.INVAL`, in
+a lower privilege mode:
+
+* `00`: The instruction raises an illegal instruction or virtual instruction
+  exception
+* `01`: The instruction is executed and performs a flush operation
+* `10`: _Reserved_
+* `11`: The instruction is executed and performs an invalidate operation
+
+| [6]     | `CBCFE`  | Cache Block Clean and Flush instruction Enable
+
+Enables the execution of the cache block clean instruction, `CBO.CLEAN`, and the
+cache block flush instruction, `CBO.FLUSH`, in a lower privilege mode:
+
+* `0`: The instruction raises an illegal instruction or virtual instruction
+  exception
+* `1`: The instruction is executed
+
+| [7]     | `CBZE`   | Cache Block Zero instruction Enable
+
+Enables the execution of the cache block zero instruction, `CBO.ZERO`, in a
+lower privilege mode:
+
+* `0`: The instruction raises an illegal instruction or virtual instruction
+  exception
+* `1`: The instruction is executed
+
+|===
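
As a small sketch of the generic format in the table above, the following Python function extracts the three fields from a raw register value. The names `CmoFields` and `decode_cmo_fields` are hypothetical, and the function does not model WARL behavior or the reserved `CBIE` encoding.

```python
from typing import NamedTuple


class CmoFields(NamedTuple):
    cbie: int   # bits [5:4]
    cbcfe: int  # bit 6
    cbze: int   # bit 7


def decode_cmo_fields(csr_value: int) -> CmoFields:
    """Split a raw register value into the CBIE, CBCFE, and CBZE fields."""
    return CmoFields(
        cbie=(csr_value >> 4) & 0b11,
        cbcfe=(csr_value >> 6) & 1,
        cbze=(csr_value >> 7) & 1,
    )


fields = decode_cmo_fields(0xD0)  # bit 7, bit 6, and bit 4 set
print(fields)
```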
+
+The x{csrname} registers control CBO instruction execution based on the current
+privilege mode and the state of the appropriate CSRs, as detailed below.
+
+A `CBO.INVAL` instruction executes or raises either an illegal instruction
+exception or a virtual instruction exception based on the state of the
+`x{csrname}.CBIE` fields:
+
+[source,sail,subs="attributes+"]
+--
+
+// illegal instruction exceptions
+if (((priv_mode != M) && (m{csrname}.CBIE == 00)) ||
+    ((priv_mode == U) && (s{csrname}.CBIE == 00)))
+{
+  <raise illegal instruction exception>
+}
+// virtual instruction exceptions
+else if (((priv_mode == VS) && (h{csrname}.CBIE == 00)) ||
+         ((priv_mode == VU) && ((h{csrname}.CBIE == 00) || (s{csrname}.CBIE == 00))))
+{
+  <raise virtual instruction exception>
+}
+// execute instruction
+else
+{
+  if (((priv_mode != M) && (m{csrname}.CBIE == 01)) ||
+      ((priv_mode == U) && (s{csrname}.CBIE == 01)) ||
+      ((priv_mode == VS) && (h{csrname}.CBIE == 01)) ||
+      ((priv_mode == VU) && ((h{csrname}.CBIE == 01) || (s{csrname}.CBIE == 01))))
+  {
+    <execute CBO.INVAL and perform flush operation>
+  }
+  else
+  {
+    <execute CBO.INVAL and perform invalidate operation>
+  }
+}
+
+
+--
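
For readers who want to experiment with the rules above, here is an executable Python transcription of the pseudocode. It is a sketch, not a reference model: the function name, the string-valued privilege modes, and the returned labels are assumptions of this example, and the reserved `CBIE` value `10` is not treated specially.

```python
def cbo_inval_action(priv: str, m_cbie: int, s_cbie: int = 0, h_cbie: int = 0) -> str:
    """Return what CBO.INVAL does in privilege mode priv.

    priv is one of "M", "S", "U", "VS", "VU"; each *_cbie argument is the
    2-bit CBIE field of the corresponding register.
    """
    # Illegal instruction exceptions
    if (priv != "M" and m_cbie == 0b00) or (priv == "U" and s_cbie == 0b00):
        return "illegal instruction"
    # Virtual instruction exceptions
    if (priv == "VS" and h_cbie == 0b00) or (
        priv == "VU" and (h_cbie == 0b00 or s_cbie == 0b00)
    ):
        return "virtual instruction"
    # The instruction executes; any applicable field set to 01 forces a flush
    if (
        (priv != "M" and m_cbie == 0b01)
        or (priv == "U" and s_cbie == 0b01)
        or (priv == "VS" and h_cbie == 0b01)
        or (priv == "VU" and (h_cbie == 0b01 or s_cbie == 0b01))
    ):
        return "flush"
    return "invalidate"


print(cbo_inval_action("U", m_cbie=0b11, s_cbie=0b01))
```

Note that M-mode always performs an invalidate operation in this model, because none of the conditions apply when `priv` is `"M"`.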
+
+[NOTE]
+====
+_Until a modified cache block has updated memory, a `CBO.INVAL` instruction may
+expose stale data values in memory if the CSRs are programmed to perform an
+invalidate operation. This behavior may result in a security hole if lower
+privileged level software performs an invalidate operation and accesses
+sensitive information in memory._
+
+_To avoid such holes, higher privileged level software must perform either a
+clean or flush operation on the cache block before permitting lower privileged
+level software to perform an invalidate operation on the block. Alternatively,
+higher privileged level software may program the CSRs so that `CBO.INVAL`
+either traps or performs a flush operation in a lower privileged level._
+====
+
+A `CBO.CLEAN` or `CBO.FLUSH` instruction executes or raises an illegal
+instruction or virtual instruction exception based on the state of the
+`x{csrname}.CBCFE` bits:
+
+[source,sail,subs="attributes+"]
+--
+
+// illegal instruction exceptions
+if (((priv_mode != M) && !m{csrname}.CBCFE) ||
+    ((priv_mode == U) && !s{csrname}.CBCFE))
+{
+  <raise illegal instruction exception>
+}
+// virtual instruction exceptions
+else if (((priv_mode == VS) && !h{csrname}.CBCFE) ||
+         ((priv_mode == VU) && !(h{csrname}.CBCFE && s{csrname}.CBCFE)))
+{
+  <raise virtual instruction exception>
+}
+// execute instruction
+else
+{
+  <execute CBO.CLEAN or CBO.FLUSH>
+}
+
+--
+
+Finally, a `CBO.ZERO` instruction executes or raises an illegal instruction or
+virtual instruction exception based on the state of the `x{csrname}.CBZE` bits:
+
+[source,sail,subs="attributes+"]
+--
+
+// illegal instruction exceptions
+if (((priv_mode != M) && !m{csrname}.CBZE) ||
+    ((priv_mode == U) && !s{csrname}.CBZE))
+{
+  <raise illegal instruction exception>
+}
+// virtual instruction exceptions
+else if (((priv_mode == VS) && !h{csrname}.CBZE) ||
+         ((priv_mode == VU) && !(h{csrname}.CBZE && s{csrname}.CBZE)))
+{
+  <raise virtual instruction exception>
+}
+// execute instruction
+else
+{
+  <execute CBO.ZERO>
+}
+
+--
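
The `CBCFE` and `CBZE` checks have the same shape, so a single executable sketch can stand in for both. The Python function below transcribes the pseudocode above; its name, the string-valued privilege modes, and the returned labels are assumptions made for illustration.

```python
def cbo_enable_check(priv: str, m_en: bool, s_en: bool = False, h_en: bool = False) -> str:
    """Model the CBCFE or CBZE check for privilege mode priv.

    priv is one of "M", "S", "U", "VS", "VU"; each *_en argument is the
    enable bit of the corresponding register.
    """
    # Illegal instruction exceptions
    if (priv != "M" and not m_en) or (priv == "U" and not s_en):
        return "illegal instruction"
    # Virtual instruction exceptions
    if (priv == "VS" and not h_en) or (priv == "VU" and not (h_en and s_en)):
        return "virtual instruction"
    return "execute"


print(cbo_enable_check("VU", m_en=True, s_en=True, h_en=False))
```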
+
+Each `x{csrname}` register is WARL; however, software should determine the legal
+values from the execution environment discovery mechanism.
+
+[#extensions,reftext="Extensions"]
+=== Extensions
+
+CMO instructions are defined in the following extensions:
+
+* <<#Zicbom>>
+* <<#Zicboz>>
+* <<#Zicbop>>
+
+[#Zicbom,reftext="Cache-Block Management Instructions"]
+==== Cache-Block Management Instructions
+
+Cache-block management instructions enable software running on a set of coherent
+agents to communicate with a set of non-coherent agents by performing one of the
+following operations:
+
+* An invalidate operation makes data from store operations performed by a set of
+  non-coherent agents visible to the set of coherent agents at a point common to
+  both sets by deallocating all copies of a cache block from the set of coherent
+  caches up to that point
+  
+* A clean operation makes data from store operations performed by the set of
+  coherent agents visible to a set of non-coherent agents at a point common to
+  both sets by performing a write transfer of a copy of a cache block to that
+  point provided a coherent agent performed a store operation that modified the
+  data in the cache block since the previous invalidate, clean, or flush
+  operation on the cache block
+  
+* A flush operation atomically performs a clean operation followed by an
+  invalidate operation
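
The three operations above can be illustrated with a toy software model. This is a sketch under strong simplifying assumptions, not a description of any implementation: it models a single cache block with one cached copy, a 64-byte block size chosen only for illustration, and no concurrency, so the atomicity of a flush is not represented. All names (`toy_block_t`, `toy_clean`, and so on) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 64  /* assumed block size, for illustration only */

/* Toy model: one cache block held on behalf of the coherent agents, backed
 * by one block of memory at the point common to both sets of agents. */
typedef struct {
    bool    valid;              /* a cached copy exists */
    bool    dirty;              /* the copy was modified since the last CBO */
    uint8_t data[BLOCK_BYTES];  /* cached copy */
    uint8_t mem[BLOCK_BYTES];   /* contents at the common point */
} toy_block_t;

/* Clean: write the block back only if a coherent store modified it. */
void toy_clean(toy_block_t *b)
{
    if (b->valid && b->dirty) {
        memcpy(b->mem, b->data, BLOCK_BYTES);
        b->dirty = false;
    }
}

/* Invalidate: deallocate the cached copy; later loads refetch from memory. */
void toy_inval(toy_block_t *b)
{
    b->valid = false;
    b->dirty = false;
}

/* Flush: a clean followed by an invalidate. */
void toy_flush(toy_block_t *b)
{
    toy_clean(b);
    toy_inval(b);
}

/* Bring the block into the cache if no copy is present. */
static void toy_fill(toy_block_t *b)
{
    if (!b->valid) {
        memcpy(b->data, b->mem, BLOCK_BYTES);
        b->valid = true;
        b->dirty = false;
    }
}

/* Store by a coherent agent; `off` must be less than BLOCK_BYTES. */
void toy_store(toy_block_t *b, unsigned off, uint8_t v)
{
    toy_fill(b);
    b->data[off] = v;
    b->dirty = true;
}

/* Load by a coherent agent; `off` must be less than BLOCK_BYTES. */
uint8_t toy_load(toy_block_t *b, unsigned off)
{
    toy_fill(b);
    return b->data[off];
}
```

In this model, a non-coherent agent is represented by reading or writing `mem` directly: a coherent store becomes visible in `mem` only after `toy_clean` or `toy_flush`, and a direct write to `mem` becomes visible to `toy_load` only after `toy_inval` or `toy_flush`.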
+
+In the Zicbom extension, the instructions operate to a point common to _all_
+agents in the system. In other words, an invalidate operation ensures that store
+operations from all non-coherent agents are visible to agents in the set of
+coherent agents, and a clean operation ensures that store operations from
+coherent agents are visible to all non-coherent agents.
+
+[NOTE]
+====
+_The Zicbom extension does not prohibit agents that fall outside of the above
+architectural definition; however, software cannot rely on the defined cache
+operations to have the desired effects with respect to those agents._
+
+_Future extensions may define different sets of agents for the purposes of
+performance optimization._
+====
+
+These instructions operate on the cache block whose effective address is
+specified in _rs1_. The effective address is translated into a corresponding
+physical address by the appropriate translation mechanisms.
+
+The following instructions comprise the Zicbom extension:
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|&#10003;
+|&#10003;
+|cbo.clean _base_
+|<<#insns-cbo_clean>>
+
+|&#10003;
+|&#10003;
+|cbo.flush _base_
+|<<#insns-cbo_flush>>
+
+|&#10003;
+|&#10003;
+|cbo.inval _base_
+|<<#insns-cbo_inval>>
+
+|===
+
+[#Zicboz,reftext="Cache-Block Zero Instructions"]
+==== Cache-Block Zero Instructions
+
+Cache-block zero instructions store zeros to the set of bytes corresponding to a
+cache block. An implementation may update the bytes in any order and with any
+granularity and atomicity, including individual bytes.
+
+[NOTE]
+====
+_Cache-block zero instructions store zeros independently of whether data from
+the underlying memory locations are cacheable. In addition, this specification
+does not constrain how the bytes are written._
+====
+
+These instructions operate on the cache block, or the memory locations
+corresponding to the cache block, whose effective address is specified in _rs1_.
+The effective address is translated into a corresponding physical address by the
+appropriate translation mechanisms.
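
As a rough illustration of the effect on memory, the following C sketch zeros the block that contains a given byte offset in a flat array. It rests on two assumptions that are not stated in this section: that the block size is a power of two supplied by the caller, and that the operation applies to the whole block containing the address rather than requiring an aligned address. The function name `toy_cbo_zero` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero the block containing byte offset `addr` within `mem`.
 * Assumes `block_bytes` is a power of two and that the containing block
 * lies entirely inside the array. Returns the offset of the block's
 * first byte. */
size_t toy_cbo_zero(uint8_t *mem, size_t addr, size_t block_bytes)
{
    size_t base = addr & ~(block_bytes - 1);  /* align down to the block */
    memset(mem + base, 0, block_bytes);       /* write order is unconstrained */
    return base;
}
```

The single `memset` here is only one permitted behavior; since the bytes may be updated in any order and with any granularity, an observer on real hardware could see a partially zeroed block.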
+
+The following instructions comprise the Zicboz extension:
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|&#10003;
+|&#10003;
+|cbo.zero _base_
+|<<#insns-cbo_zero>>
+
+|===
+
+[#Zicbop,reftext="Cache-Block Prefetch Instructions"]
+==== Cache-Block Prefetch Instructions
+
+Cache-block prefetch instructions are HINTs to the hardware to indicate that
+software intends to perform a particular type of memory access in the near
+future. The types of memory accesses are instruction fetch, data read (i.e.
+load), and data write (i.e. store).
+
+These instructions operate on the cache block whose effective address is the sum
+of the base address specified in _rs1_ and the sign-extended offset encoded in
+_imm[11:0]_, where _imm[4:0]_ shall equal `0b00000`. The effective address is
+translated into a corresponding physical address by the appropriate translation
+mechanisms.
+
+[NOTE]
+====
+_Cache-block prefetch instructions are encoded as ORI instructions with rd equal
+to `0b00000`; however, for the purposes of effective address calculation, the rd
+field is also interpreted as imm[4:0], as in a store instruction._
+====
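
The effective address calculation described above can be sketched in C as follows. The bit positions are taken from the encoding diagrams in the instruction descriptions of this chapter (offset[11:5] in bits 31:25, the prefetch selector in bits 24:20, and offset[4:0] in the rd field); the function and enumerator names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Selector values placed in bits [24:20]. */
enum { PREFETCH_I = 0x0, PREFETCH_R = 0x1, PREFETCH_W = 0x3 };

/* Encode a prefetch instruction. `offset` is assumed to be a multiple of
 * 32 in the range [-2048, 2016], so that offset[4:0] is zero. */
uint32_t prefetch_encode(unsigned kind, unsigned rs1, int32_t offset)
{
    uint32_t imm_hi = ((uint32_t)offset >> 5) & 0x7Fu;  /* offset[11:5] */
    return (imm_hi << 25)
         | ((uint32_t)(kind & 0x1Fu) << 20)
         | ((uint32_t)(rs1 & 0x1Fu) << 15)
         | (0x6u << 12)   /* funct3 = ORI */
         | (0x0u << 7)    /* rd = 0, read as offset[4:0] */
         | 0x13u;         /* OP-IMM */
}

/* Recover the sign-extended 12-bit offset from an encoded instruction,
 * combining bits 31:25 and bits 11:7 as a store instruction would. */
int32_t prefetch_offset(uint32_t insn)
{
    uint32_t imm = (((insn >> 25) & 0x7Fu) << 5) | ((insn >> 7) & 0x1Fu);
    return (int32_t)(imm ^ 0x800u) - 0x800;  /* sign-extend from bit 11 */
}

/* Effective address: base register value plus the sign-extended offset. */
uint64_t prefetch_effective_address(uint64_t base, uint32_t insn)
{
    return base + (uint64_t)(int64_t)prefetch_offset(insn);
}
```

For instance, `prefetch_encode(PREFETCH_R, 10, 64)` produces `0x04156013`, and decoding that word with a base of `0x1000` gives an effective address of `0x1040`.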
+
+The following instructions comprise the Zicbop extension:
+
+[%header,cols="^1,^1,4,8"]
+|===
+|RV32
+|RV64
+|Mnemonic
+|Instruction
+
+|&#10003;
+|&#10003;
+|prefetch.i _offset_(_base_)
+|<<#insns-prefetch_i>>
+
+|&#10003;
+|&#10003;
+|prefetch.r _offset_(_base_)
+|<<#insns-prefetch_r>>
+
+|&#10003;
+|&#10003;
+|prefetch.w _offset_(_base_)
+|<<#insns-prefetch_w>>
+
+|===
+
+[#insns,reftext="Instructions"]
+=== Instructions
+
+[#insns-cbo_clean,reftext="Cache Block Clean"]
+==== cbo.clean
+
+Synopsis::
+Perform a clean operation on a cache block
+
+Mnemonic::
+cbo.clean _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
+	{ bits: 5,  name: 0x0 },
+	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
+	{ bits: 5,  name: 'rs1', attr: ['base'] },
+	{ bits: 12, name: 0x001, attr: ['CBO.CLEAN'] },
+]}
+....
+
+Description::
+
+A *cbo.clean* instruction performs a clean operation on the cache block whose
+effective address is the base address specified in _rs1_. The offset operand may
+be omitted; otherwise, any expression that computes the offset shall evaluate to
+zero. The instruction operates on the set of coherent caches accessed by the
+agent executing the instruction.
+
+Operation::
+[source,sail]
+--
+TODO
+--
+
+[#insns-cbo_flush,reftext="Cache Block Flush"]
+==== cbo.flush
+
+Synopsis::
+Perform a flush operation on a cache block
+
+Mnemonic::
+cbo.flush _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
+	{ bits: 5,  name: 0x0 },
+	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
+	{ bits: 5,  name: 'rs1', attr: ['base'] },
+	{ bits: 12, name: 0x002, attr: ['CBO.FLUSH'] },
+]}
+....
+
+Description::
+
+A *cbo.flush* instruction performs a flush operation on the cache block whose
+effective address is the base address specified in _rs1_. The offset operand may
+be omitted; otherwise, any expression that computes the offset shall evaluate to
+zero. The instruction operates on the set of coherent caches accessed by the
+agent executing the instruction.
+
+Operation::
+[source,sail]
+--
+TODO
+--
+
+[#insns-cbo_inval,reftext="Cache Block Invalidate"]
+==== cbo.inval
+
+Synopsis::
+Perform an invalidate operation on a cache block
+
+Mnemonic::
+cbo.inval _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
+	{ bits: 5,  name: 0x0 },
+	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
+	{ bits: 5,  name: 'rs1', attr: ['base'] },
+	{ bits: 12, name: 0x000, attr: ['CBO.INVAL'] },
+]}
+....
+
+Description::
+
+A *cbo.inval* instruction performs an invalidate operation on the cache block
+whose effective address is the base address specified in _rs1_. The offset
+operand may be omitted; otherwise, any expression that computes the offset shall
+evaluate to zero. The instruction operates on the set of coherent caches
+accessed by the agent executing the instruction. Depending on CSR programming,
+the instruction may perform a flush operation instead of an invalidate
+operation.
+
+Operation::
+[source,sail]
+--
+TODO
+--
+
+[#insns-cbo_zero,reftext="Cache Block Zero"]
+==== cbo.zero
+
+Synopsis::
+Store zeros to the full set of bytes corresponding to a cache block
+
+Mnemonic::
+cbo.zero _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
+	{ bits: 5,  name: 0x0 },
+	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
+	{ bits: 5,  name: 'rs1', attr: ['base'] },
+	{ bits: 12, name: 0x004, attr: ['CBO.ZERO'] },
+]}
+....
+
+Description::
+
+A *cbo.zero* instruction performs stores of zeros to the full set of bytes
+corresponding to the cache block whose effective address is the base address
+specified in _rs1_. The offset operand may be omitted; otherwise, any expression
+that computes the offset shall evaluate to zero. An implementation may or may
+not update the entire set of bytes atomically.
+
+Operation::
+[source,sail]
+--
+TODO
+--
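
The four CBO encodings shown in the diagrams above differ only in the 12-bit field in bits 31:20, so they can be assembled by one helper. The following C sketch packs the fields exactly as drawn (MISC-MEM opcode `0xF`, a zero in bits 11:7, the CBO value `0x2` in bits 14:12, then _rs1_ and the operation selector); the names `cbo_encode` and the enumerators are hypothetical and chosen for this example.

```c
#include <stdint.h>

/* Operation selectors placed in bits [31:20], from the encoding diagrams. */
enum {
    CBO_INVAL = 0x000,
    CBO_CLEAN = 0x001,
    CBO_FLUSH = 0x002,
    CBO_ZERO  = 0x004
};

/* Assemble a CBO instruction word for base register number `rs1` (0..31). */
uint32_t cbo_encode(unsigned op, unsigned rs1)
{
    return ((uint32_t)(op & 0xFFFu) << 20)   /* operation selector */
         | ((uint32_t)(rs1 & 0x1Fu) << 15)   /* base register */
         | (0x2u << 12)                      /* CBO */
         | (0x0u << 7)                       /* bits 11:7 are zero */
         | 0x0Fu;                            /* MISC-MEM */
}
```

With register number 10 as the base, `cbo_encode(CBO_CLEAN, 10)` yields `0x0015200F` and `cbo_encode(CBO_ZERO, 10)` yields `0x0045200F`. A helper of this kind can be useful for emitting the instructions as raw words when an assembler does not recognize the mnemonics.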
+
+[#insns-prefetch_i,reftext="Cache Block Prefetch for Instruction Fetch"]
+==== prefetch.i
+
+Synopsis::
+Provide a HINT to hardware that a cache block is likely to be accessed by an
+instruction fetch in the near future
+
+Mnemonic::
+prefetch.i _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0x13,        attr: ['OP-IMM'] },
+	{ bits: 5,  name: 0x0,         attr: ['offset[4:0]'] },
+	{ bits: 3,  name: 0x6,         attr: ['ORI'] },
+	{ bits: 5,  name: 'rs1',       attr: ['base'] },
+	{ bits: 5,  name: 0x0,         attr: ['PREFETCH.I'] },
+	{ bits: 7, name: 'imm[11:5]',  attr: ['offset[11:5]'] },
+]}
+....
+
+Description::
+
+A *prefetch.i* instruction indicates to hardware that the cache block whose
+effective address is the sum of the base address specified in _rs1_ and the
+sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`,
+is likely to be accessed by an instruction fetch in the near future.
+
+[NOTE]
+====
+_An implementation may opt to cache a copy of the cache block in a cache
+accessed by an instruction fetch in order to improve memory access latency, but
+this behavior is not required._
+====
+
+Operation::
+[source,sail]
+--
+TODO
+--
+
+[#insns-prefetch_r,reftext="Cache Block Prefetch for Data Read"]
+==== prefetch.r
+
+Synopsis::
+Provide a HINT to hardware that a cache block is likely to be accessed by a data
+read in the near future
+
+Mnemonic::
+prefetch.r _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0x13,        attr: ['OP-IMM'] },
+	{ bits: 5,  name: 0x0,         attr: ['offset[4:0]'] },
+	{ bits: 3,  name: 0x6,         attr: ['ORI'] },
+	{ bits: 5,  name: 'rs1',       attr: ['base'] },
+	{ bits: 5,  name: 0x1,         attr: ['PREFETCH.R'] },
+	{ bits: 7, name: 'imm[11:5]',  attr: ['offset[11:5]'] },
+]}
+....
+
+Description::
+
+A *prefetch.r* instruction indicates to hardware that the cache block whose
+effective address is the sum of the base address specified in _rs1_ and the
+sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`,
+is likely to be accessed by a data read (i.e. load) in the near future.
+
+[NOTE]
+====
+_An implementation may opt to cache a copy of the cache block in a cache
+accessed by a data read in order to improve memory access latency, but this
+behavior is not required._
+====
+
+Operation::
+[source,sail]
+--
+TODO
+--
+
+[#insns-prefetch_w,reftext="Cache Block Prefetch for Data Write"]
+==== prefetch.w
+
+Synopsis::
+Provide a HINT to hardware that a cache block is likely to be accessed by a data
+write in the near future
+
+Mnemonic::
+prefetch.w _offset_(_base_)
+
+Encoding::
+[wavedrom, , svg]
+....
+{reg:[
+	{ bits: 7,  name: 0x13,        attr: ['OP-IMM'] },
+	{ bits: 5,  name: 0x0,         attr: ['offset[4:0]'] },
+	{ bits: 3,  name: 0x6,         attr: ['ORI'] },
+	{ bits: 5,  name: 'rs1',       attr: ['base'] },
+	{ bits: 5,  name: 0x3,         attr: ['PREFETCH.W'] },
+	{ bits: 7, name: 'imm[11:5]',  attr: ['offset[11:5]'] },
+]}
+....
+
+Description::
+
+A *prefetch.w* instruction indicates to hardware that the cache block whose
+effective address is the sum of the base address specified in _rs1_ and the
+sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`,
+is likely to be accessed by a data write (i.e. store) in the near future.
+
+[NOTE]
+====
+_An implementation may opt to cache a copy of the cache block in a cache
+accessed by a data write in order to improve memory access latency, but this
+behavior is not required._
+====
+
+Operation::
+[source,sail]
+--
+TODO
+--
+