author | Bill Traynor <wmat@riscv.org> | 2024-02-27 11:39:27 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-02-27 11:39:27 -0500 |
commit | d4618d1499bc67439def92847babbf401c984357 (patch) | |
tree | 7e339961fb9fdedd38dbb4e769f0eeb992f77f29 /src/cmo.adoc | |
parent | b1940473272185c5bd2059c4663ed7537b85b3e3 (diff) | |
parent | 14c5798ba6272d1faf419626dd31c9659b98cbfe (diff) | |
Merge pull request #1226 from riscv/cmo
Adding Base Cache Management Operation ISA Extensions chapter.
Diffstat (limited to 'src/cmo.adoc')
-rw-r--r-- | src/cmo.adoc | 1149 |
1 files changed, 1149 insertions, 0 deletions
diff --git a/src/cmo.adoc b/src/cmo.adoc new file mode 100644 index 0000000..648c4ec --- /dev/null +++ b/src/cmo.adoc @@ -0,0 +1,1149 @@ +[[cmo]] +== Base Cache Management Operation ISA Extensions + +[acknowledgments] +=== Acknowledgments + +Contributors to this specification (in alphabetical order) include: + +Allen Baum, +Paul Donahue, +Greg Favor, +Andy Glew, +John Ingalls, +David Kruckemyer, +Josh Scheid, +Philipp Tomsich, +Paul Walmsley, +and +Derek Williams + +We express our gratitude to everyone that contributed to, reviewed, or improved +this specification through their comments and questions. + +=== Pseudocode for instruction semantics + +The semantics of each instruction in the <<#insns>> chapter is expressed in a +SAIL-like syntax. + +[#intro,reftext="Introduction"] +=== Introduction + +_Cache-management operation_ (or _CMO_) instructions perform operations on +copies of data in the memory hierarchy. In general, CMO instructions operate on +cached copies of data, but in some cases, a CMO instruction may operate on +memory locations directly. Furthermore, CMO instructions are grouped by +operation into the following classes: + +* A _management_ instruction manipulates cached copies of data with respect to a + set of agents that can access the data +* A _zero_ instruction zeros out a range of memory locations, potentially + allocating cached copies of data in one or more caches +* A _prefetch_ instruction indicates to hardware that data at a given memory + location may be accessed in the near future, potentially allocating cached + copies of data in one or more caches + +This document introduces a base set of CMO ISA extensions that operate +specifically on cache blocks or the memory locations corresponding to a cache +block; these are known as _cache-block operation_ (or _CBO_) instructions. 
Each +of the above classes of instructions represents an extension in this +specification: + +* The _Zicbom_ extension defines a set of cache-block management instructions: + `CBO.INVAL`, `CBO.CLEAN`, and `CBO.FLUSH` +* The _Zicboz_ extension defines a cache-block zero instruction: `CBO.ZERO` +* The _Zicbop_ extension defines a set of cache-block prefetch instructions: + `PREFETCH.R`, `PREFETCH.W`, and `PREFETCH.I` + +The execution behavior of the above instructions is also modified by CSR state +added by this specification. + +The remainder of this document provides general background information on CMO +instructions and describes each of the above ISA extensions. + +[NOTE] +==== +_The term CMO encompasses all operations on caches or resources related to +caches. The term CBO represents a subset of CMOs that operate only on cache +blocks. The first CMO extensions only define CBOs._ +==== + +[#background,reftext="Background"] +=== Background + +This chapter provides information common to all CMO extensions. + +[#memory-caches,reftext="Memory and Caches"] +==== Memory and Caches + +A _memory location_ is a physical resource in a system uniquely identified by a +_physical address_. An _agent_ is a logic block, such as a RISC-V hart, +accelerator, I/O device, etc., that can access a given memory location. + +[NOTE] +==== +_A given agent may not be able to access all memory locations in a system, and +two different agents may or may not be able to access the same set of memory +locations._ +==== + +A _load operation_ (or _store operation_) is performed by an agent to consume +(or modify) the data at a given memory location. Load and store operations are +performed as a result of explicit memory accesses to that memory location. +Additionally, a _read transfer_ from memory fetches the data at the memory +location, while a _write transfer_ to memory updates the data at the memory +location. 
+ +A _cache_ is a structure that buffers copies of data to reduce average memory +latency. Any number of caches may be interspersed between an agent and a memory +location, and load and store operations from an agent may be satisfied by a +cache instead of the memory location. + +[NOTE] +==== +_Load and store operations are decoupled from read and write transfers by +caches. For example, a load operation may be satisfied by a cache without +performing a read transfer from memory, or a store operation may be satisfied by +a cache that first performs a read transfer from memory._ +==== + +Caches organize copies of data into _cache blocks_, each of which represents a +contiguous, naturally aligned power-of-two (or _NAPOT_) range of memory +locations. A cache block is identified by a physical address corresponding to +the underlying memory locations. The capacity and organization of a cache and +the size of a cache block are both _implementation-specific_, and the execution +environment provides software a means to discover information about the caches +and cache blocks in a system. In the initial set of CMO extensions, the size of +a cache block shall be uniform throughout the system. + +[NOTE] +==== +_In future CMO extensions, the requirement for a uniform cache block size may be +relaxed._ +==== + +Implementation techniques such as speculative execution or hardware prefetching +may cause a given cache to allocate or deallocate a copy of a cache block at any +time, provided the corresponding physical addresses are accessible according to +the supported access type PMA and are cacheable according to the cacheability +PMA. Allocating a copy of a cache block results in a read transfer from another +cache or from memory, while deallocating a copy of a cache block may result in a +write transfer to another cache or to memory depending on whether the data in +the copy were modified by a store operation. Additional details are discussed in +<<#coherent-agents-caches>>. 
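The NAPOT property of cache blocks described above means the block containing any address can be found by clearing the low-order address bits. The following is a minimal sketch of that arithmetic, not part of the specification; the function names and the 64-byte block size used in the usage note are illustrative assumptions (the actual block size is implementation-specific and discovered from the execution environment).

```python
def cache_block_base(addr: int, block_size: int) -> int:
    """Return the base address of the NAPOT cache block containing addr.

    block_size is an assumed parameter; real software would obtain it
    from the execution environment's discovery mechanism.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    # A naturally aligned power-of-two block starts where the low bits are zero.
    return addr & ~(block_size - 1)


def same_cache_block(a: int, b: int, block_size: int) -> bool:
    """True if addresses a and b fall in the same cache block."""
    return cache_block_base(a, block_size) == cache_block_base(b, block_size)
```

For example, with an assumed 64-byte block, `cache_block_base(0x1234, 64)` yields `0x1200`, and addresses `0x123F` and `0x1240` fall in different blocks.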
+ +==== Cache-Block Operations + +A CBO instruction causes one or more operations to be performed on the cache +blocks identified by the instruction. In general, a CBO instruction may identify +one or more cache blocks; however, in the initial set of CMO extensions, CBO +instructions identify a single cache block only. + +A cache-block management instruction performs one of the following operations, +relative to the copy of a given cache block allocated in a given cache: + +* An _invalidate operation_ deallocates the copy of the cache block + +* A _clean operation_ performs a write transfer to another cache or to memory if + the data in the copy of the cache block have been modified by a store + operation + +* A _flush operation_ atomically performs a clean operation followed by an + invalidate operation + +Additional details, including the actual operation performed by a given +cache-block management instruction, are described in <<#Zicbom>>. + +A cache-block zero instruction performs a set of store operations that write +zeros to the set of bytes corresponding to a cache block. Unless specified +otherwise, the store operations generated by a cache-block zero instruction have +the same general properties and behaviors that other store instructions in the +architecture have. An implementation may or may not update the entire set of +bytes atomically with a single store operation. Additional details are described +in <<#Zicboz>>. + +A cache-block prefetch instruction is a HINT to the hardware that software +expects to perform a particular type of memory access in the near future. +Additional details are described in <<#Zicbop>>. 
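The invalidate, clean, and flush operations defined above can be illustrated with a toy single-cache model. This is a hedged sketch under simplifying assumptions, not the specification's semantics: the `ToyCache` class and its methods are hypothetical, there is exactly one cache in front of memory, blocks are identified by their base address, and a store allocates a line without first performing a read transfer.

```python
class ToyCache:
    """Toy model: one cache in front of a dict acting as memory."""

    def __init__(self, memory: dict):
        self.memory = memory   # block address -> data held in memory
        self.lines = {}        # block address -> [data, modified flag]

    def load(self, block):
        # Allocate a copy via a read transfer if the block is not cached.
        if block not in self.lines:
            self.lines[block] = [self.memory[block], False]
        return self.lines[block][0]

    def store(self, block, data):
        # The store is satisfied by the cache; memory is not updated yet.
        self.lines[block] = [data, True]

    def clean(self, block):
        # Write transfer only if a store modified the cached copy.
        line = self.lines.get(block)
        if line is not None and line[1]:
            self.memory[block] = line[0]
            line[1] = False

    def invalidate(self, block):
        # Deallocate the copy; modified data is discarded.
        self.lines.pop(block, None)

    def flush(self, block):
        # Clean followed by invalidate (atomic in this sequential model).
        self.clean(block)
        self.invalidate(block)
```

In this model, a store followed by an invalidate leaves memory holding the older value, which is the stale-data hazard the CSR controls for `CBO.INVAL` are meant to manage.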
+ +[#coherent-agents-caches,reftext="Coherent Agents and Caches"] +=== Coherent Agents and Caches + +For a given memory location, a _set of coherent agents_ consists of the agents +for which all of the following hold: + +* Store operations from all agents in the set appear to be serialized with + respect to each other +* Store operations from all agents in the set eventually appear to all other + agents in the set +* A load operation from an agent in the set returns data from a store operation + from an agent in the set (or from the initial data in memory) + +The coherent agents within such a set shall access a given memory location with +the same physical address and the same physical memory attributes; however, if +the coherence PMA for a given agent indicates a given memory location is not +coherent, that agent shall not be a member of a set of coherent agents with any +other agent for that memory location and shall be the sole member of a set of +coherent agents consisting of itself. + +An agent who is a member of a set of coherent agents is said to be _coherent_ +with respect to the other agents in the set. On the other hand, an agent who is +_not_ a member is said to be _non-coherent_ with respect to the agents in the +set. + +Caches introduce the possibility that multiple copies of a given cache block may +be present in a system at the same time. An _implementation-specific_ mechanism +keeps these copies coherent with respect to the load and store operations from +the agents in the set of coherent agents. Additionally, if a coherent agent in +the set executes a CBO instruction that specifies the cache block, the resulting +operation shall apply to any and all of the copies in the caches that can be +accessed by the load and store operations from the coherent agents. 
+ +[NOTE] +==== +_An operation from a CBO instruction is defined to operate only on the copies of +a cache block that are cached in the caches accessible by the explicit memory +accesses performed by the set of coherent agents. This includes copies of a +cache block in caches that are accessed only indirectly by load and store +operations, e.g. coherent instruction caches._ +==== + +The set of caches subject to the above mechanism form a _set of coherent +caches_, and each coherent cache has the following behaviors, assuming all +operations are performed by the agents in a set of coherent agents: + +* A coherent cache is permitted to allocate and deallocate copies of a cache + block and perform read and write transfers as described in <<#memory-caches>> + +* A coherent cache is permitted to perform a write transfer to memory provided + that a store operation has modified the data in the cache block since the most + recent invalidate, clean, or flush operation on the cache block + +* At least one coherent cache is responsible for performing a write transfer to + memory once a store operation has modified the data in the cache block until + the next invalidate, clean, or flush operation on the cache block, after which + no coherent cache is responsible (or permitted) to perform a write transfer to + memory until the next store operation has modified the data in the cache block + +* A coherent cache is required to perform a write transfer to memory if a store + operation has modified the data in the cache block since the most recent + invalidate, clean, or flush operation on the cache block and if the next clean + or flush operation requires a write transfer to memory + +[NOTE] +==== +_The above restrictions ensure that a "clean" copy of a cache block, fetched by +a read transfer from memory and unmodified by a store operation, cannot later +overwrite the copy of the cache block in memory updated by a write transfer to +memory from a non-coherent agent._ +==== + +A 
non-coherent agent may initiate a cache-block operation that operates on the +set of coherent caches accessed by a set of coherent agents. The mechanism to +perform such an operation is _implementation-specific_. + +==== Memory Ordering + +===== Preserved Program Order + +The preserved program order (abbreviated _PPO_) rules are defined by the RVWMO +memory ordering model. How the operations resulting from CMO instructions fit +into these rules is described below. + +For cache-block management instructions, the resulting invalidate, clean, and +flush operations behave as stores in the PPO rules subject to one additional +overlapping address rule. Specifically, if _a_ precedes _b_ in program order, +then _a_ will precede _b_ in the global memory order if: + +* _a_ is an invalidate, clean, or flush, _b_ is a load, and _a_ and _b_ access + overlapping memory addresses + +[NOTE] +==== +_The above rule ensures that a subsequent load in program order never appears +in the global memory order before a preceding invalidate, clean, or flush +operation to an overlapping address._ +==== + +Additionally, invalidate, clean, and flush operations are classified as W or O +(depending on the physical memory attributes for the corresponding physical +addresses) for the purposes of predecessor and successor sets in `FENCE` +instructions. These operations are _not_ ordered by other instructions that +order stores, e.g. `FENCE.I` and `SFENCE.VMA`. + +For cache-block zero instructions, the resulting store operations behave as +stores in the PPO rules and are ordered by other instructions that order stores. + +Finally, for cache-block prefetch instructions, the resulting operations are +_not_ ordered by the PPO rules nor are they ordered by any other ordering +instructions. + +===== Load Values + +An invalidate operation may change the set of values that can be returned by a +load. 
In particular, an additional condition is added to the Load Value Axiom: + +* If an invalidate operation _i_ precedes a load _r_ and operates on a byte _x_ + returned by _r_, and no store to _x_ appears between _i_ and _r_ in program + order or in the global memory order, then _r_ returns any of the following + values for _x_: + +. If no clean or flush operations on _x_ precede _i_ in the global memory order, + either the initial value of _x_ or the value of any store to _x_ that precedes + _i_ + +. If no store to _x_ precedes a clean or flush operation on _x_ in the global + memory order and if the clean or flush operation on _x_ precedes _i_ in the + global memory order, either the initial value of _x_ or the value of any store + to _x_ that precedes _i_ + +. If a store to _x_ precedes a clean or flush operation on _x_ in the global + memory order and if the clean or flush operation on _x_ precedes _i_ in the + global memory order, either the value of the latest store to _x_ that precedes + the latest clean or flush operation on _x_ or the value of any store to _x_ + that both precedes _i_ and succeeds the latest clean or flush operation on _x_ + that precedes _i_ + +. The value of any store to _x_ by a non-coherent agent regardless of the above + conditions + +[NOTE] +==== +_The first three bullets describe the possible load values at different points +in the global memory order relative to clean or flush operations. The final +bullet implies that the load value may be produced by a non-coherent agent at +any time._ +==== + +==== Traps + +Execution of certain CMO instructions may result in traps due to CSR state, +described in the <<#csr_state>> section, or due to the address translation and +protection mechanisms. The trapping behavior of CMO instructions is described in +the following sections. 
+ +===== Illegal Instruction and Virtual Instruction Exceptions + +Cache-block management instructions and cache-block zero instructions may raise +illegal instruction exceptions or virtual instruction exceptions depending on +the current privilege mode and the state of the CMO control registers described +in the <<#csr_state>> section. + +Cache-block prefetch instructions raise neither illegal instruction exceptions +nor virtual instruction exceptions. + +===== Page Fault, Guest-Page Fault, and Access Fault Exceptions + +Similar to load and store instructions, CMO instructions are explicit memory +access instructions that compute an effective address. The effective address is +ultimately translated into a physical address based on the privilege mode and +the enabled translation mechanisms, and the CMO extensions impose the following +constraints on the physical addresses in a given cache block: + +* The PMP access control bits shall be the same for _all_ physical addresses in + the cache block, and if write permission is granted by the PMP access control + bits, read permission shall also be granted + +* The PMAs shall be the same for _all_ physical addresses in the cache block, + and if write permission is granted by the supported access type PMAs, read + permission shall also be granted + +If the above constraints are not met, the behavior of a CBO instruction is +UNSPECIFIED. + +[NOTE] +==== +_This specification assumes that the above constraints will typically be met for +main memory regions and may be met for certain I/O regions._ +==== + +The Zicboz extension introduces an additional supported access type PMA for +cache-block zero instructions. Main memory regions are required to support +accesses by cache-block zero instructions; however, I/O regions may specify +whether accesses by cache-block zero instructions are supported. 
+ +A cache-block management instruction is permitted to access the specified cache +block whenever a load instruction or store instruction is permitted to access +the corresponding physical addresses. If neither a load instruction nor store +instruction is permitted to access the physical addresses, but an instruction +fetch is permitted to access the physical addresses, whether a cache-block +management instruction is permitted to access the cache block is UNSPECIFIED. If +access to the cache block is not permitted, a cache-block management instruction +raises a store page fault or store guest-page fault exception if address +translation does not permit any access or raises a store access fault exception +otherwise. During address translation, the instruction also checks the accessed +bit and may either raise an exception or set the bit as required. + +[NOTE] +==== +_The interaction between cache-block management instructions and instruction +fetches will be specified in a future extension._ + +_As implied by omission, a cache-block management instruction does not check the +dirty bit and neither raises an exception nor sets the bit._ +==== + +A cache-block zero instruction is permitted to access the specified cache block +whenever a store instruction is permitted to access the corresponding physical +addresses and when the PMAs indicate that cache-block zero instructions are a +supported access type. If access to the cache block is not permitted, a +cache-block zero instruction raises a store page fault or store guest-page fault +exception if address translation does not permit write access or raises a store +access fault exception otherwise. During address translation, the instruction +also checks the accessed and dirty bits and may either raise an exception or set +the bits as required. 
+ +A cache-block prefetch instruction is permitted to access the specified cache +block whenever a load instruction, store instruction, or instruction fetch is +permitted to access the corresponding physical addresses. If access to the cache +block is not permitted, a cache-block prefetch instruction does not raise any +exceptions and shall not access any caches or memory. During address +translation, the instruction does _not_ check the accessed and dirty bits and +neither raises an exception nor sets the bits. + +[NOTE] +==== +_Like a load or store instruction, a CMO instruction may or may not be permitted +to access a cache block based on the states of the `MPRV`, `MPV`, and `MPP` bits +in `mstatus` and the `SUM` and `MXR` bits in `mstatus`, `sstatus`, and +`vsstatus`._ + +_This specification expects that implementations will process cache-block +management instructions like store/AMO instructions, so store/AMO exceptions are +appropriate for these instructions, regardless of the permissions required._ +==== + +===== Address Misaligned Exceptions + +CMO instructions do _not_ generate address misaligned exceptions. + +===== Breakpoint Exceptions and Debug Mode Entry + +Unless otherwise defined by the debug architecture specification, the behavior +of trigger modules with respect to CMO instructions is UNSPECIFIED. + +[NOTE] +==== +_For the Zicbom, Zicboz, and Zicbop extensions, this specification recommends +the following common trigger module behaviors:_ + +* Type 6 address match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=0`, + should be supported + +* Type 2 address/data match triggers, i.e. 
`tdata1.type=2`, should be + unsupported + +* The size of a memory access equals the size of the cache block accessed, and + the compare values follow from the addresses of the NAPOT memory region + corresponding to the cache block containing the effective address + +* Unless an encoding for a cache block is added to the `mcontrol6.size` field, + an address trigger should only match a memory access from a CBO instruction if + `mcontrol6.size=0` + +_If the Zicbom extension is implemented, this specification recommends the +following additional trigger module behaviors:_ + +* Implementing address match triggers should be optional + +* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`, + should be unsupported + +* Memory accesses are considered to be stores, i.e. an address trigger matches + only if `mcontrol6.store=1` + +_If the Zicboz extension is implemented, this specification recommends the +following additional trigger module behaviors:_ + +* Implementing address match triggers should be mandatory + +* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`, + should be supported, and implementing these triggers should be optional + +* Memory accesses are considered to be stores, i.e. an address trigger matches + only if `mcontrol6.store=1` + +_If the Zicbop extension is implemented, this specification recommends the +following additional trigger module behaviors:_ + +* Implementing address match triggers should be optional + +* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`, + should be unsupported + +* Memory accesses may be considered to be loads or stores depending on the + implementation, i.e. 
whether an address trigger matches on these instructions + when `mcontrol6.load=1` or `mcontrol6.store=1` is _implementation-specific_ + +_This specification also recommends that the behavior of trigger modules with +respect to the Zicboz extension should be defined in version 1.0 of the debug +architecture specification. The behavior of trigger modules with respect to the +Zicbom and Zicbop extensions is expected to be defined in future extensions._ +==== + +===== Hypervisor Extension + +For the purposes of writing the `mtinst` or `htinst` register on a trap, the +following standard transformation is defined for cache-block management +instructions and cache-block zero instructions: + +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 'opcode'}, + { bits: 5, name: 0x0 }, + { bits: 3, name: 'funct3'}, + { bits: 5, name: 0x0}, + { bits: 12, name: 'operation'}, +]} +.... + +The `operation` field corresponds to the 12 most significant bits of the +trapping instruction. + +[NOTE] +==== +_As described in the hypervisor extension, a zero may be written into `mtinst` +or `htinst` instead of the standard transformation defined above._ +==== + +==== Effects on Constrained LR/SC Loops + +The following event is added to the list of events that satisfy the eventuality +guarantee provided by constrained LR/SC loops, as defined in the A extension: + +* Some other hart executes a cache-block management instruction or a cache-block + zero instruction to the reservation set of the LR instruction in _H_'s + constrained LR/SC loop. + +[NOTE] +==== +_The above event has been added to accommodate cache coherence protocols that +cannot distinguish between invalidations for stores and invalidations for +cache-block management operations._ + +_Aside from the above event, CMO instructions neither change the properties of +constrained LR/SC loops nor modify the eventuality guarantee provided by them. 
+For example, executing a CMO instruction may cause a constrained LR/SC loop on +any hart to fail periodically or may cause an unconstrained LR/SC sequence on the +same hart to fail always. Additionally, executing a cache-block prefetch +instruction does not impact the eventuality guarantee provided by constrained +LR/SC loops executed on any hart._ +==== + +==== Software Discovery + +The initial set of CMO extensions requires the following information to be +discovered by software: + +* The size of the cache block for management and prefetch instructions +* The size of the cache block for zero instructions +* CBIE support at each privilege level + +Other general cache characteristics may also be specified in the discovery +mechanism. + +[#csr_state,reftext="Control and Status Register State"] +=== Control and Status Register State + +[NOTE] +==== +_The CMO extensions rely on state in {csrname} CSRs that will be defined in a +future update to the privileged architecture. If this CSR update is not +ratified, the CMO extension will define its own CSRs._ +==== + +Three CSRs control the execution of CMO instructions: + +* `m{csrname}` +* `s{csrname}` +* `h{csrname}` + +The `s{csrname}` register is used by all supervisor modes, including VS-mode. A +hypervisor is responsible for saving and restoring `s{csrname}` on guest context +switches. The `h{csrname}` register is only present if the H-extension is +implemented and enabled. 
+ +Each `x{csrname}` register (where `x` is `m`, `s`, or `h`) has the following +generic format: + +.Generic Format for x{csrname} CSRs +[cols="^10,^10,80a"] +|=== +| Bits | Name | Description + +| [5:4] | `CBIE` | Cache Block Invalidate instruction Enable + +Enables the execution of the cache block invalidate instruction, `CBO.INVAL`, in +a lower privilege mode: + +* `00`: The instruction raises an illegal instruction or virtual instruction + exception +* `01`: The instruction is executed and performs a flush operation +* `10`: _Reserved_ +* `11`: The instruction is executed and performs an invalidate operation + +| [6] | `CBCFE` | Cache Block Clean and Flush instruction Enable + +Enables the execution of the cache block clean instruction, `CBO.CLEAN`, and the +cache block flush instruction, `CBO.FLUSH`, in a lower privilege mode: + +* `0`: The instruction raises an illegal instruction or virtual instruction + exception +* `1`: The instruction is executed + +| [7] | `CBZE` | Cache Block Zero instruction Enable + +Enables the execution of the cache block zero instruction, `CBO.ZERO`, in a +lower privilege mode: + +* `0`: The instruction raises an illegal instruction or virtual instruction + exception +* `1`: The instruction is executed + +|=== + +The x{csrname} registers control CBO instruction execution based on the current +privilege mode and the state of the appropriate CSRs, as detailed below. 
+ +A `CBO.INVAL` instruction executes or raises either an illegal instruction +exception or a virtual instruction exception based on the state of the +`x{csrname}.CBIE` fields: + +[source,sail,subs="attributes+"] +-- + +// illegal instruction exceptions +if (((priv_mode != M) && (m{csrname}.CBIE == 00)) || + ((priv_mode == U) && (s{csrname}.CBIE == 00))) +{ + <raise illegal instruction exception> +} +// virtual instruction exceptions +else if (((priv_mode == VS) && (h{csrname}.CBIE == 00)) || + ((priv_mode == VU) && ((h{csrname}.CBIE == 00) || (s{csrname}.CBIE == 00)))) +{ + <raise virtual instruction exception> +} +// execute instruction +else +{ + if (((priv_mode != M) && (m{csrname}.CBIE == 01)) || + ((priv_mode == U) && (s{csrname}.CBIE == 01)) || + ((priv_mode == VS) && (h{csrname}.CBIE == 01)) || + ((priv_mode == VU) && ((h{csrname}.CBIE == 01) || (s{csrname}.CBIE == 01)))) + { + <execute CBO.INVAL and perform flush operation> + } + else + { + <execute CBO.INVAL and perform invalidate operation> + } +} + + +-- + +[NOTE] +==== +_Until a modified cache block has updated memory, a `CBO.INVAL` instruction may +expose stale data values in memory if the CSRs are programmed to perform an +invalidate operation. This behavior may result in a security hole if lower +privileged level software performs an invalidate operation and accesses +sensitive information in memory._ + +_To avoid such holes, higher privileged level software must perform either a +clean or flush operation on the cache block before permitting lower privileged +level software to perform an invalidate operation on the block. 
Alternatively, +higher privileged level software may program the CSRs so that `CBO.INVAL` +either traps or performs a flush operation in a lower privileged level._ +==== + +A `CBO.CLEAN` or `CBO.FLUSH` instruction executes or raises an illegal +instruction or virtual instruction exception based on the state of the +`x{csrname}.CBCFE` bits: + +[source,sail,subs="attributes+"] +-- + +// illegal instruction exceptions +if (((priv_mode != M) && !m{csrname}.CBCFE) || + ((priv_mode == U) && !s{csrname}.CBCFE)) +{ + <raise illegal instruction exception> +} +// virtual instruction exceptions +else if (((priv_mode == VS) && !h{csrname}.CBCFE) || + ((priv_mode == VU) && !(h{csrname}.CBCFE && s{csrname}.CBCFE))) +{ + <raise virtual instruction exception> +} +// execute instruction +else +{ + <execute CBO.CLEAN or CBO.FLUSH> +} + +-- + +Finally, a `CBO.ZERO` instruction executes or raises an illegal instruction or +virtual instruction exception based on the state of the `x{csrname}.CBZE` bits: + +[source,sail,subs="attributes+"] +-- + +// illegal instruction exceptions +if (((priv_mode != M) && !m{csrname}.CBZE) || + ((priv_mode == U) && !s{csrname}.CBZE)) +{ + <raise illegal instruction exception> +} +// virtual instruction exceptions +else if (((priv_mode == VS) && !h{csrname}.CBZE) || + ((priv_mode == VU) && !(h{csrname}.CBZE && s{csrname}.CBZE))) +{ + <raise virtual instruction exception> +} +// execute instruction +else +{ + <execute CBO.ZERO> +} + +-- + +Each `x{csrname}` register is WARL; however, software should determine the legal +values from the execution environment discovery mechanism. 
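The three SAIL-like decision blocks in this section share one shape and can be transliterated into an executable model for experimentation. The sketch below is an assumption-laden illustration rather than normative text: the `Priv` enum, the function names, and the string results are hypothetical, the CSR fields are passed in as plain integers or booleans, and the reserved `CBIE` encoding `10` is not modelled.

```python
from enum import Enum


class Priv(Enum):
    M = "M"
    S = "S"    # HS-mode when the hypervisor extension is present
    U = "U"
    VS = "VS"
    VU = "VU"


def cbo_inval_behavior(priv: Priv, m_cbie: int, s_cbie: int, h_cbie: int) -> str:
    """Mirror the CBO.INVAL pseudocode; CBIE values are 0b00, 0b01, or 0b11."""
    if (priv != Priv.M and m_cbie == 0b00) or (priv == Priv.U and s_cbie == 0b00):
        return "illegal instruction"
    if (priv == Priv.VS and h_cbie == 0b00) or (
        priv == Priv.VU and (h_cbie == 0b00 or s_cbie == 0b00)
    ):
        return "virtual instruction"
    # Any applicable CSR set to 01 downgrades the invalidate to a flush.
    if (
        (priv != Priv.M and m_cbie == 0b01)
        or (priv == Priv.U and s_cbie == 0b01)
        or (priv == Priv.VS and h_cbie == 0b01)
        or (priv == Priv.VU and (h_cbie == 0b01 or s_cbie == 0b01))
    ):
        return "flush"
    return "invalidate"


def cbo_enable_behavior(priv: Priv, m_en: bool, s_en: bool, h_en: bool) -> str:
    """Mirror the CBCFE (CBO.CLEAN/CBO.FLUSH) and CBZE (CBO.ZERO) pseudocode."""
    if (priv != Priv.M and not m_en) or (priv == Priv.U and not s_en):
        return "illegal instruction"
    if (priv == Priv.VS and not h_en) or (priv == Priv.VU and not (h_en and s_en)):
        return "virtual instruction"
    return "execute"
```

For instance, `cbo_inval_behavior(Priv.U, 0b11, 0b01, 0b00)` returns `"flush"`: M-mode permits a true invalidate, but the supervisor setting downgrades it for U-mode.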
+ +[#extensions,reftext="Extensions"] +=== Extensions + +CMO instructions are defined in the following extensions: + +* <<#Zicbom>> +* <<#Zicboz>> +* <<#Zicbop>> + +[#Zicbom,reftext="Cache-Block Management Instructions"] +==== Cache-Block Management Instructions + +Cache-block management instructions enable software running on a set of coherent +agents to communicate with a set of non-coherent agents by performing one of the +following operations: + +* An invalidate operation makes data from store operations performed by a set of + non-coherent agents visible to the set of coherent agents at a point common to + both sets by deallocating all copies of a cache block from the set of coherent + caches up to that point + +* A clean operation makes data from store operations performed by the set of + coherent agents visible to a set of non-coherent agents at a point common to + both sets by performing a write transfer of a copy of a cache block to that + point provided a coherent agent performed a store operation that modified the + data in the cache block since the previous invalidate, clean, or flush + operation on the cache block + +* A flush operation atomically performs a clean operation followed by an + invalidate operation + +In the Zicbom extension, the instructions operate to a point common to _all_ +agents in the system. In other words, an invalidate operation ensures that store +operations from all non-coherent agents are visible to agents in the set of coherent +agents, and a clean operation ensures that store operations from coherent agents +are visible to all non-coherent agents. 
+ +[NOTE] +==== +_The Zicbom extension does not prohibit agents that fall outside of the above +architectural definition; however, software cannot rely on the defined cache +operations to have the desired effects with respect to those agents._ + +_Future extensions may define different sets of agents for the purposes of +performance optimization._ +==== + +These instructions operate on the cache block whose effective address is +specified in _rs1_. The effective address is translated into a corresponding +physical address by the appropriate translation mechanisms. + +The following instructions comprise the Zicbom extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|cbo.clean _base_ +|<<#insns-cbo_clean>> + +|✓ +|✓ +|cbo.flush _base_ +|<<#insns-cbo_flush>> + +|✓ +|✓ +|cbo.inval _base_ +|<<#insns-cbo_inval>> + +|=== + +[#Zicboz,reftext="Cache-Block Zero Instructions"] +==== Cache-Block Zero Instructions + +Cache-block zero instructions store zeros to the set of bytes corresponding to a +cache block. An implementation may update the bytes in any order and with any +granularity and atomicity, including individual bytes. + +[NOTE] +==== +_Cache-block zero instructions store zeros independently of whether data from +the underlying memory locations are cacheable. In addition, this specification +does not constrain how the bytes are written._ +==== + +These instructions operate on the cache block, or the memory locations +corresponding to the cache block, whose effective address is specified in _rs1_. +The effective address is translated into a corresponding physical address by the +appropriate translation mechanisms. 
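The architectural effect described above can be pictured with a small software model. This C sketch is illustrative only: `model_cbo_zero` is a hypothetical name, memory is modeled as a plain byte array, the block size is assumed to be a power of two chosen by the caller, and the model assumes the instruction acts on the block that contains the given address. A single `memset` is just one of the orderings and granularities an implementation is permitted to use.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of a cache-block zero on a byte-array "memory".
 * Zeros the block_size-byte block containing addr.
 * Returns 0 on success, -1 if block_size is not a power of two or the
 * block does not lie entirely inside the modeled memory.
 */
int model_cbo_zero(uint8_t *mem, size_t mem_size, size_t addr,
                   size_t block_size)
{
    if (block_size == 0 || (block_size & (block_size - 1)) != 0)
        return -1;
    if (addr >= mem_size)
        return -1;

    /* Round the address down to the start of its block. */
    size_t base = addr & ~(block_size - 1);
    if (mem_size - base < block_size)
        return -1;

    /* The model writes all bytes at once; real hardware may write them in
     * any order and with any granularity. */
    memset(mem + base, 0, block_size);
    return 0;
}
```

With a 64-byte block, an address of 70 selects bytes 64 through 127; bytes outside that range are left untouched.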
+ +The following instructions comprise the Zicboz extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|cbo.zero _base_ +|<<#insns-cbo_zero>> + +|=== + +[#Zicbop,reftext="Cache-Block Prefetch Instructions"] +==== Cache-Block Prefetch Instructions + +Cache-block prefetch instructions are HINTs to the hardware to indicate that +software intends to perform a particular type of memory access in the near +future. The types of memory accesses are instruction fetch, data read (i.e. +load), and data write (i.e. store). + +These instructions operate on the cache block whose effective address is the sum +of the base address specified in _rs1_ and the sign-extended offset encoded in +_imm[11:0]_, where _imm[4:0]_ shall equal `0b00000`. The effective address is +translated into a corresponding physical address by the appropriate translation +mechanisms. + +[NOTE] +==== +_Cache-block prefetch instructions are encoded as ORI instructions with rd equal +to `0b00000`; however, for the purposes of effective address calculation, this +field is also interpreted as imm[4:0] like a store instruction._ +==== + +The following instructions comprise the Zicbop extension: + +[%header,cols="^1,^1,4,8"] +|=== +|RV32 +|RV64 +|Mnemonic +|Instruction + +|✓ +|✓ +|prefetch.i _offset_(_base_) +|<<#insns-prefetch_i>> + +|✓ +|✓ +|prefetch.r _offset_(_base_) +|<<#insns-prefetch_r>> + +|✓ +|✓ +|prefetch.w _offset_(_base_) +|<<#insns-prefetch_w>> + +|=== + +[#insns,reftext="Instructions"] +=== Instructions + +[#insns-cbo_clean,reftext="Cache Block Clean"] +==== cbo.clean + +Synopsis:: +Perform a clean operation on a cache block + +Mnemonic:: +cbo.clean _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x001, attr: ['CBO.CLEAN'] }, +]} +.... 
+ +Description:: + +A *cbo.clean* instruction performs a clean operation on the cache block whose +effective address is the base address specified in _rs1_. The offset operand may +be omitted; otherwise, any expression that computes the offset shall evaluate to +zero. The instruction operates on the set of coherent caches accessed by the +agent executing the instruction. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-cbo_flush,reftext="Cache Block Flush"] +==== cbo.flush + +Synopsis:: +Perform a flush operation on a cache block + +Mnemonic:: +cbo.flush _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x002, attr: ['CBO.FLUSH'] }, +]} +.... + +Description:: + +A *cbo.flush* instruction performs a flush operation on the cache block whose +effective address is the base address specified in _rs1_. The offset operand may +be omitted; otherwise, any expression that computes the offset shall evaluate to +zero. The instruction operates on the set of coherent caches accessed by the +agent executing the instruction. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-cbo_inval,reftext="Cache Block Invalidate"] +==== cbo.inval + +Synopsis:: +Perform an invalidate operation on a cache block + +Mnemonic:: +cbo.inval _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x000, attr: ['CBO.INVAL'] }, +]} +.... + +Description:: + +A *cbo.inval* instruction performs an invalidate operation on the cache block +whose effective address is the base address specified in _rs1_. The offset +operand may be omitted; otherwise, any expression that computes the offset shall +evaluate to zero. 
The instruction operates on the set of coherent caches +accessed by the agent executing the instruction. Depending on CSR programming, +the instruction may perform a flush operation instead of an invalidate +operation. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-cbo_zero,reftext="Cache Block Zero"] +==== cbo.zero + +Synopsis:: +Store zeros to the full set of bytes corresponding to a cache block + +Mnemonic:: +cbo.zero _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0xF, attr: ['MISC-MEM'] }, + { bits: 5, name: 0x0 }, + { bits: 3, name: 0x2, attr: ['CBO'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 12, name: 0x004, attr: ['CBO.ZERO'] }, +]} +.... + +Description:: + +A *cbo.zero* instruction performs stores of zeros to the full set of bytes +corresponding to the cache block whose effective address is the base address +specified in _rs1_. The offset operand may be omitted; otherwise, any expression +that computes the offset shall evaluate to zero. An implementation may or may +not update the entire set of bytes atomically. + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-prefetch_i,reftext="Cache Block Prefetch for Instruction Fetch"] +==== prefetch.i + +Synopsis:: +Provide a HINT to hardware that a cache block is likely to be accessed by an +instruction fetch in the near future + +Mnemonic:: +prefetch.i _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x0, attr: ['PREFETCH.I'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, +]} +.... 
+ +Description:: + +A *prefetch.i* instruction indicates to hardware that the cache block whose +effective address is the sum of the base address specified in _rs1_ and the +sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`, +is likely to be accessed by an instruction fetch in the near future. + +[NOTE] +==== +_An implementation may opt to cache a copy of the cache block in a cache +accessed by an instruction fetch in order to improve memory access latency, but +this behavior is not required._ +==== + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-prefetch_r,reftext="Cache Block Prefetch for Data Read"] +==== prefetch.r + +Synopsis:: +Provide a HINT to hardware that a cache block is likely to be accessed by a data +read in the near future + +Mnemonic:: +prefetch.r _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x1, attr: ['PREFETCH.R'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, +]} +.... + +Description:: + +A *prefetch.r* instruction indicates to hardware that the cache block whose +effective address is the sum of the base address specified in _rs1_ and the +sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`, +is likely to be accessed by a data read (i.e. load) in the near future. 
+ +[NOTE] +==== +_An implementation may opt to cache a copy of the cache block in a cache +accessed by a data read in order to improve memory access latency, but this +behavior is not required._ +==== + +Operation:: +[source,sail] +-- +TODO +-- + +[#insns-prefetch_w,reftext="Cache Block Prefetch for Data Write"] +==== prefetch.w + +Synopsis:: +Provide a HINT to hardware that a cache block is likely to be accessed by a data +write in the near future + +Mnemonic:: +prefetch.w _offset_(_base_) + +Encoding:: +[wavedrom, , svg] +.... +{reg:[ + { bits: 7, name: 0x13, attr: ['OP-IMM'] }, + { bits: 5, name: 0x0, attr: ['offset[4:0]'] }, + { bits: 3, name: 0x6, attr: ['ORI'] }, + { bits: 5, name: 'rs1', attr: ['base'] }, + { bits: 5, name: 0x3, attr: ['PREFETCH.W'] }, + { bits: 7, name: 'imm[11:5]', attr: ['offset[11:5]'] }, +]} +.... + +Description:: + +A *prefetch.w* instruction indicates to hardware that the cache block whose +effective address is the sum of the base address specified in _rs1_ and the +sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`, +is likely to be accessed by a data write (i.e. store) in the near future. + +[NOTE] +==== +_An implementation may opt to cache a copy of the cache block in a cache +accessed by a data write in order to improve memory access latency, but this +behavior is not required._ +==== + +Operation:: +[source,sail] +-- +TODO +-- + |
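The encoding diagrams in the instruction descriptions above can be cross-checked by assembling the fields by hand. The C sketch below packs the fields in the order the diagrams list them, from the least significant bit upward. The function and constant names are hypothetical; the field values are taken from the diagrams.

```c
#include <stdint.h>

/* Values of the 12-bit operation field for the CBO instructions. */
enum { CBO_INVAL = 0x000, CBO_CLEAN = 0x001, CBO_FLUSH = 0x002,
       CBO_ZERO = 0x004 };

/* Values of the 5-bit field that selects the prefetch type. */
enum { PREFETCH_I = 0x0, PREFETCH_R = 0x1, PREFETCH_W = 0x3 };

/* CBO: opcode MISC-MEM (0x0F), 5 zero bits, funct3 0x2, rs1, 12-bit op. */
uint32_t encode_cbo(uint32_t op, uint32_t rs1)
{
    return ((op & 0xFFFu) << 20) | ((rs1 & 0x1Fu) << 15) |
           (0x2u << 12) | 0x0Fu;
}

/*
 * Prefetch: opcode OP-IMM (0x13), offset[4:0] = 0, funct3 0x6 (ORI), rs1,
 * 5-bit prefetch type, offset[11:5].
 * Returns 0 and stores the word in *out, or -1 if the offset is outside
 * the signed 12-bit range or its low five bits are not zero.
 */
int encode_prefetch(uint32_t kind, uint32_t rs1, int32_t offset,
                    uint32_t *out)
{
    if (offset < -2048 || offset > 2047 || ((uint32_t)offset & 0x1Fu) != 0)
        return -1;

    uint32_t imm_hi = ((uint32_t)offset >> 5) & 0x7Fu; /* offset[11:5] */
    *out = (imm_hi << 25) | ((kind & 0x1Fu) << 20) |
           ((rs1 & 0x1Fu) << 15) | (0x6u << 12) | 0x13u;
    return 0;
}
```

For instance, with register x10 as the base, `encode_cbo(CBO_CLEAN, 10)` yields `0x0015200F`, and a `prefetch.r` with an offset of 32 yields `0x02156013`.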