[[cmo]]
== "CMO" Extensions for Base Cache Management Operation ISA, Version 1.0.0

=== Pseudocode for instruction semantics

The semantics of each instruction in the <<#insns>> chapter is expressed in a
SAIL-like syntax.

[#intro-cmo,reftext="Introduction"]
=== Introduction

_Cache-management operation_ (or _CMO_) instructions perform operations on
copies of data in the memory hierarchy. In general, CMO instructions operate on
cached copies of data, but in some cases, a CMO instruction may operate on
memory locations directly. Furthermore, CMO instructions are grouped by
operation into the following classes:

* A _management_ instruction manipulates cached copies of data with respect to a
  set of agents that can access the data
* A _zero_ instruction zeros out a range of memory locations, potentially
  allocating cached copies of data in one or more caches
* A _prefetch_ instruction indicates to hardware that data at a given memory
  location may be accessed in the near future, potentially allocating cached
  copies of data in one or more caches

This document introduces a base set of CMO ISA extensions that operate
specifically on cache blocks or the memory locations corresponding to a cache
block; these are known as _cache-block operation_ (or _CBO_) instructions. Each
of the above classes of instructions represents an extension in this
specification:

* The _Zicbom_ extension defines a set of cache-block management instructions:
  `CBO.INVAL`, `CBO.CLEAN`, and `CBO.FLUSH`
* The _Zicboz_ extension defines a cache-block zero instruction: `CBO.ZERO`
* The _Zicbop_ extension defines a set of cache-block prefetch instructions:
  `PREFETCH.R`, `PREFETCH.W`, and `PREFETCH.I`

The execution behavior of the above instructions is also modified by CSR state
added by this specification.

The remainder of this document provides general background information on CMO
instructions and describes each of the above ISA extensions.

[NOTE]
====
_The term CMO encompasses all operations on caches or resources related to
caches. The term CBO represents a subset of CMOs that operate only on cache
blocks. The first CMO extensions only define CBOs._
====

[#background,reftext="Background"]
=== Background

This chapter provides information common to all CMO extensions.

[#memory-caches,reftext="Memory and Caches"]
==== Memory and Caches

A _memory location_ is a physical resource in a system uniquely identified by a
_physical address_. An _agent_ is a logic block, such as a RISC-V hart,
accelerator, I/O device, etc., that can access a given memory location.

[NOTE]
====
_A given agent may not be able to access all memory locations in a system, and
two different agents may or may not be able to access the same set of memory
locations._
====

A _load operation_ (or _store operation_) is performed by an agent to consume
(or modify) the data at a given memory location. Load and store operations are
performed as a result of explicit memory accesses to that memory location.
Additionally, a _read transfer_ from memory fetches the data at the memory
location, while a _write transfer_ to memory updates the data at the memory
location.

A _cache_ is a structure that buffers copies of data to reduce average memory
latency. Any number of caches may be interspersed between an agent and a memory
location, and load and store operations from an agent may be satisfied by a
cache instead of the memory location.

[NOTE]
====
_Load and store operations are decoupled from read and write transfers by
caches. For example, a load operation may be satisfied by a cache without
performing a read transfer from memory, or a store operation may be satisfied by
a cache that first performs a read transfer from memory._
====

Caches organize copies of data into _cache blocks_, each of which represents a
contiguous, naturally aligned power-of-two (or _NAPOT_) range of memory
locations. A cache block is identified by any of the physical addresses corresponding to
the underlying memory locations. The capacity and organization of a cache and
the size of a cache block are both _implementation-specific_, and the execution
environment provides software a means to discover information about the caches
and cache blocks in a system. In the initial set of CMO extensions, the size of
a cache block shall be uniform throughout the system.

[NOTE]
====
_In future CMO extensions, the requirement for a uniform cache block size may be
relaxed._
====
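
As an illustration of the NAPOT property described above, the following minimal
sketch computes the range of physical addresses covered by the cache block that
contains a given address. The function names and the 64-byte block size used in
the usage note are assumptions made for this example only; the actual block
size is implementation-specific and is discovered from the execution
environment.

```python
def cache_block_range(phys_addr, block_size):
    """Return (base, limit) of the NAPOT cache block containing phys_addr.

    base is inclusive and limit is exclusive. block_size is assumed to be a
    power of two, as required for a naturally aligned power-of-two range.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    base = phys_addr & ~(block_size - 1)   # clear the offset-within-block bits
    return base, base + block_size


def same_cache_block(addr_a, addr_b, block_size):
    """True if both physical addresses identify the same cache block."""
    return (cache_block_range(addr_a, block_size)[0]
            == cache_block_range(addr_b, block_size)[0])
```

With an assumed 64-byte block, `cache_block_range(0x80001234, 64)` yields the
range starting at `0x80001200`, and any address in that range identifies the
same block.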

Implementation techniques such as speculative execution or hardware prefetching
may cause a given cache to allocate or deallocate a copy of a cache block at any
time, provided the corresponding physical addresses are accessible according to
the supported access type PMA and are cacheable according to the cacheability
PMA. Allocating a copy of a cache block results in a read transfer from another
cache or from memory, while deallocating a copy of a cache block may result in a
write transfer to another cache or to memory depending on whether the data in
the copy were modified by a store operation. Additional details are discussed in
<<#coherent-agents-caches>>.

==== Cache-Block Operations

A CBO instruction causes one or more operations to be performed on the cache
blocks identified by the instruction. In general, a CBO instruction may identify
one or more cache blocks; however, in the initial set of CMO extensions, CBO
instructions identify a single cache block only.

A cache-block management instruction performs one of the following operations,
relative to the copy of a given cache block allocated in a given cache:

* An _invalidate operation_ deallocates the copy of the cache block

* A _clean operation_ performs a write transfer to another cache or to memory if
  the data in the copy of the cache block have been modified by a store
  operation

* A _flush operation_ atomically performs a clean operation followed by an
  invalidate operation

Additional details, including the actual operation performed by a given
cache-block management instruction, are described in <<#Zicbom>>.
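
The three operations can be illustrated with a toy model of a single cache in
front of memory. This is only a sketch under simplifying assumptions: the class
name `ToyCache`, the dictionary-based memory, and whole-block stores are
hypothetical, and the atomicity of a flush operation is not modelled because
the example is single-threaded.

```python
class ToyCache:
    """One cache in front of a dict-based memory, keyed by block address."""

    def __init__(self, memory):
        self.memory = memory   # block address -> data
        self.lines = {}        # block address -> (data, modified flag)

    def load(self, block):
        if block not in self.lines:
            # Allocating a copy results in a read transfer from memory.
            self.lines[block] = (self.memory[block], False)
        return self.lines[block][0]

    def store(self, block, data):
        # The cached copy is now modified relative to memory.
        self.lines[block] = (data, True)

    def clean(self, block):
        if block in self.lines:
            data, modified = self.lines[block]
            if modified:
                # Write transfer only if a store modified the copy.
                self.memory[block] = data
            self.lines[block] = (data, False)

    def invalidate(self, block):
        # Deallocate the copy; any modified data is discarded.
        self.lines.pop(block, None)

    def flush(self, block):
        # Clean followed by invalidate.
        self.clean(block)
        self.invalidate(block)
```

In this model, a store followed by an invalidate operation leaves memory
holding the older data, whereas a store followed by a clean or flush operation
updates memory first.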

A cache-block zero instruction performs a set of store operations that write
zeros to the set of bytes corresponding to a cache block. Unless specified
otherwise, the store operations generated by a cache-block zero instruction have
the same general properties and behaviors that other store instructions in the
architecture have. An implementation may or may not update the entire set of
bytes atomically with a single store operation. Additional details are described
in <<#Zicboz>>.
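
A minimal sketch of the effect of a cache-block zero instruction follows,
assuming memory is modelled as a `bytearray` and that the block is the one
containing the effective address. The function name `cbo_zero` and the
`store_width` parameter are hypothetical; the loop over narrower stores only
illustrates that an implementation need not update the whole block with a
single store operation.

```python
def cbo_zero(memory, effective_addr, block_size, store_width=8):
    """Zero the cache block containing effective_addr.

    store_width is assumed to divide block_size evenly. Each loop iteration
    stands for one store operation of store_width bytes.
    """
    base = effective_addr & ~(block_size - 1)   # align down to the block
    for offset in range(0, block_size, store_width):
        start = base + offset
        memory[start:start + store_width] = bytes(store_width)
```

Only the bytes of the addressed block change; neighbouring blocks keep their
previous contents.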

A cache-block prefetch instruction is a HINT to the hardware that software
expects to perform a particular type of memory access in the near future.
Additional details are described in <<#Zicbop>>.

[#coherent-agents-caches,reftext="Coherent Agents and Caches"]
==== Coherent Agents and Caches

For a given memory location, a _set of coherent agents_ consists of the agents
for which all of the following hold:

* Store operations from all agents in the set appear to be serialized with
  respect to each other
* Store operations from all agents in the set eventually appear to all other
  agents in the set
* A load operation from an agent in the set returns data from a store operation
  from an agent in the set (or from the initial data in memory)

The coherent agents within such a set shall access a given memory location with
the same physical address and the same physical memory attributes; however, if
the coherence PMA for a given agent indicates a given memory location is not
coherent, that agent shall not be a member of a set of coherent agents with any
other agent for that memory location and shall be the sole member of a set of
coherent agents consisting of itself.

An agent that is a member of a set of coherent agents is said to be _coherent_
with respect to the other agents in the set. On the other hand, an agent that is
_not_ a member is said to be _non-coherent_ with respect to the agents in the
set.

Caches introduce the possibility that multiple copies of a given cache block may
be present in a system at the same time. An _implementation-specific_ mechanism
keeps these copies coherent with respect to the load and store operations from
the agents in the set of coherent agents. Additionally, if a coherent agent in
the set executes a CBO instruction that specifies the cache block, the resulting
operation shall apply to any and all of the copies in the caches that can be
accessed by the load and store operations from the coherent agents.

[NOTE]
====
_An operation from a CBO instruction is defined to operate only on the copies of
a cache block that are cached in the caches accessible by the explicit memory
accesses performed by the set of coherent agents. This includes copies of a
cache block in caches that are accessed only indirectly by load and store
operations, e.g. coherent instruction caches._
====

The set of caches subject to the above mechanism forms a _set of coherent
caches_, and each coherent cache has the following behaviors, assuming all
operations are performed by the agents in a set of coherent agents:

* A coherent cache is permitted to allocate and deallocate copies of a cache
  block and perform read and write transfers as described in <<#memory-caches>> 

* A coherent cache is permitted to perform a write transfer to memory provided
  that a store operation has modified the data in the cache block since the most
  recent invalidate, clean, or flush operation on the cache block

* At least one coherent cache is responsible for performing a write transfer to
  memory once a store operation has modified the data in the cache block until
  the next invalidate, clean, or flush operation on the cache block, after which
  no coherent cache is responsible for performing (or permitted to perform) a
  write transfer to memory until the next store operation has modified the data
  in the cache block

* A coherent cache is required to perform a write transfer to memory if a store
  operation has modified the data in the cache block since the most recent
  invalidate, clean, or flush operation on the cache block and if the next clean
  or flush operation requires a write transfer to memory

[NOTE]
====
_The above restrictions ensure that a "clean" copy of a cache block, fetched by
a read transfer from memory and unmodified by a store operation, cannot later
overwrite the copy of the cache block in memory updated by a write transfer to
memory from a non-coherent agent._
====

A non-coherent agent may initiate a cache-block operation that operates on the
set of coherent caches accessed by a set of coherent agents. The mechanism to
perform such an operation is _implementation-specific_.

==== Memory Ordering

===== Preserved Program Order

The preserved program order (abbreviated _PPO_) rules are defined by the RVWMO
memory ordering model. How the operations resulting from CMO instructions fit
into these rules is described below.

For cache-block management instructions, the resulting invalidate, clean, and
flush operations behave as stores in the PPO rules subject to one additional
overlapping address rule. Specifically, if _a_ precedes _b_ in program order,
then _a_ will precede _b_ in the global memory order if:

* _a_ is an invalidate, clean, or flush, _b_ is a load, and _a_ and _b_ access
  overlapping memory addresses

[NOTE]
====
_The above rule ensures that a subsequent load in program order never appears
in the global memory order before a preceding invalidate, clean, or flush
operation to an overlapping address._
====

Additionally, invalidate, clean, and flush operations are classified as W or O
(depending on the physical memory attributes for the corresponding physical
addresses) for the purposes of predecessor and successor sets in `FENCE`
instructions. These operations are _not_ ordered by other instructions that
order stores, e.g. `FENCE.I` and `SFENCE.VMA`.

For cache-block zero instructions, the resulting store operations behave as
stores in the PPO rules and are ordered by other instructions that order stores.

Finally, for cache-block prefetch instructions, the resulting operations are
ordered neither by the PPO rules nor by any other ordering instructions.

===== Load Values

An invalidate operation may change the set of values that can be returned by a
load. In particular, an additional condition is added to the Load Value Axiom:

* If an invalidate operation _i_ precedes a load _r_ and operates on a byte _x_
  returned by _r_, and no store to _x_ appears between _i_ and _r_ in program
  order or in the global memory order, then _r_ returns any of the following
  values for _x_:

. If no clean or flush operations on _x_ precede _i_ in the global memory order,
  either the initial value of _x_ or the value of any store to _x_ that precedes
  _i_

. If no store to _x_ precedes a clean or flush operation on _x_ in the global
  memory order and if the clean or flush operation on _x_ precedes _i_ in the
  global memory order, either the initial value of _x_ or the value of any store
  to _x_ that precedes _i_

. If a store to _x_ precedes a clean or flush operation on _x_ in the global
  memory order and if the clean or flush operation on _x_ precedes _i_ in the
  global memory order, either the value of the latest store to _x_ that precedes
  the latest clean or flush operation on _x_ or the value of any store to _x_
  that both precedes _i_ and succeeds the latest clean or flush operation on _x_
  that precedes _i_ 

. The value of any store to _x_ by a non-coherent agent regardless of the above
  conditions

[NOTE]
====
_The first three items describe the possible load values at different points
in the global memory order relative to clean or flush operations. The final
item implies that the load value may be produced by a non-coherent agent at
any time._
====
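
The first three cases can be sketched as a small function that, given the
operations on a byte _x_ that precede the invalidate operation _i_ in the
global memory order, returns the set of values a subsequent load may observe.
This is one reading of the rules above, not a normative model: it resolves
"a clean or flush operation" to the latest such operation before _i_, assumes
no store to _x_ falls between _i_ and the load, and omits the fourth case
(stores by non-coherent agents). The function name and the event encoding are
hypothetical.

```python
def allowed_load_values(initial, events):
    """Values a load of byte x may return after an invalidate operation i.

    events lists, in global memory order, the operations on x that precede i.
    Each event is ("store", value), ("clean",), or ("flush",).
    """
    stores = [(idx, ev[1]) for idx, ev in enumerate(events)
              if ev[0] == "store"]
    writebacks = [idx for idx, ev in enumerate(events)
                  if ev[0] in ("clean", "flush")]
    all_values = {value for _, value in stores}

    if not writebacks:
        # Case 1: no clean or flush precedes i.
        return {initial} | all_values

    last_wb = writebacks[-1]
    before = [value for idx, value in stores if idx < last_wb]
    if not before:
        # Case 2: no store precedes the clean or flush.
        return {initial} | all_values

    # Case 3: the latest store before the latest clean/flush, or any
    # store after that clean/flush and before i.
    after = {value for idx, value in stores if idx > last_wb}
    return {before[-1]} | after
```

For example, with stores of 1 and 2, then a clean operation, then a store of 3,
the sketch returns `{2, 3}`: the value written back by the clean operation or
the later store that may or may not have reached memory.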

==== Traps

Execution of certain CMO instructions may result in traps due to CSR state,
described in the <<#csr_state>> section, or due to the address translation and
protection mechanisms. The trapping behavior of CMO instructions is described in
the following sections.

===== Illegal Instruction and Virtual Instruction Exceptions

Cache-block management instructions and cache-block zero instructions may raise
illegal instruction exceptions or virtual instruction exceptions depending on
the current privilege mode and the state of the CMO control registers described
in the <<#csr_state>> section.

Cache-block prefetch instructions raise neither illegal instruction exceptions
nor virtual instruction exceptions.

===== Page Fault, Guest-Page Fault, and Access Fault Exceptions

Similar to load and store instructions, CMO instructions are explicit memory
access instructions that compute an effective address. The effective address is
ultimately translated into a physical address based on the privilege mode and
the enabled translation mechanisms, and the CMO extensions impose the following
constraints on the physical addresses in a given cache block:

* The PMP access control bits shall be the same for _all_ physical addresses in
  the cache block, and if write permission is granted by the PMP access control
  bits, read permission shall also be granted

* The PMAs shall be the same for _all_ physical addresses in the cache block,
  and if write permission is granted by the supported access type PMAs, read
  permission shall also be granted

If the above constraints are not met, the behavior of a CBO instruction is
UNSPECIFIED.

[NOTE]
====
_This specification assumes that the above constraints will typically be met for
main memory regions and may be met for certain I/O regions._
====

Additionally, for the purposes of PMP and PMA checks, the access size of a CMO
instruction equals the size of the cache block accessed by the instruction.

The Zicboz extension introduces an additional supported access type PMA for
cache-block zero instructions. Main memory regions are required to support
accesses by cache-block zero instructions; however, I/O regions may specify
whether accesses by cache-block zero instructions are supported.

A cache-block management instruction is permitted to access the specified cache
block whenever a load instruction or store instruction is permitted to access
the corresponding physical addresses. If neither a load instruction nor store
instruction is permitted to access the physical addresses, but an instruction
fetch is permitted to access the physical addresses, whether a cache-block
management instruction is permitted to access the cache block is UNSPECIFIED. If
access to the cache block is not permitted, a cache-block management instruction
raises a store page fault or store guest-page fault exception if address
translation does not permit any access or raises a store access fault exception
otherwise. During address translation, the instruction also checks the accessed
bit and may either raise an exception or set the bit as required.

[NOTE]
====
_The interaction between cache-block management instructions and instruction
fetches will be specified in a future extension._

_As implied by omission, a cache-block management instruction does not check the
dirty bit and neither raises an exception nor sets the bit._
====

A cache-block zero instruction is permitted to access the specified cache block
whenever a store instruction is permitted to access the corresponding physical
addresses and when the PMAs indicate that cache-block zero instructions are a
supported access type. If access to the cache block is not permitted, a
cache-block zero instruction raises a store page fault or store guest-page fault
exception if address translation does not permit write access or raises a store
access fault exception otherwise. During address translation, the instruction
also checks the accessed and dirty bits and may either raise an exception or set
the bits as required.

A cache-block prefetch instruction is permitted to access the specified cache
block whenever a load instruction, store instruction, or instruction fetch is
permitted to access the corresponding physical addresses. If access to the cache
block is not permitted, a cache-block prefetch instruction does not raise any
exceptions and shall not access any caches or memory. During address
translation, the instruction does _not_ check the accessed and dirty bits and
neither raises an exception nor sets the bits.
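
The permission rules in the three preceding paragraphs can be summarized in a
short sketch. The function name, the string labels for the instruction
classes, and the boolean inputs are assumptions made for illustration; the
inputs stand for the combined outcome of address translation, PMP, and PMA
checks for the corresponding access type. `None` is returned where the text
leaves the result UNSPECIFIED.

```python
# Which page-table status bits each class checks during address translation.
AD_BITS_CHECKED = {
    "management": ("A",),
    "zero": ("A", "D"),
    "prefetch": (),
}


def cbo_access_permitted(kind, can_load, can_store, can_fetch,
                         pma_supports_zero=True):
    """Return True, False, or None (UNSPECIFIED) for a CBO access."""
    if kind == "management":
        if can_load or can_store:
            return True
        # Only instruction fetch permitted: UNSPECIFIED. Nothing permitted:
        # the access is denied and a store-type fault is raised.
        return None if can_fetch else False
    if kind == "zero":
        return can_store and pma_supports_zero
    if kind == "prefetch":
        # A denied prefetch raises no exception and accesses nothing.
        return can_load or can_store or can_fetch
    raise ValueError("unknown CBO class: " + kind)
```

The sketch deliberately reports only whether the access is permitted; selecting
between a page fault, guest-page fault, and access fault depends on which
mechanism denied the access, as described above.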

When a page fault, guest-page fault, or access fault exception is taken, the
relevant *tval CSR is written with the faulting effective address (i.e. the same
faulting address value as for other causes of these exceptions).

[NOTE]
====
_Like a load or store instruction, a CMO instruction may or may not be permitted
to access a cache block based on the states of the `MPRV`, `MPV`, and `MPP` bits
in `mstatus` and the `SUM` and `MXR` bits in `mstatus`, `sstatus`, and
`vsstatus`._

_This specification expects that implementations will process cache-block
management instructions like store/AMO instructions, so store/AMO exceptions are
appropriate for these instructions, regardless of the permissions required._
====

===== Address Misaligned Exceptions

CMO instructions do _not_ generate address misaligned exceptions.

===== Breakpoint Exceptions and Debug Mode Entry

Unless otherwise defined by the debug architecture specification, the behavior
of trigger modules with respect to CMO instructions is UNSPECIFIED.

[NOTE]
====
_For the Zicbom, Zicboz, and Zicbop extensions, this specification recommends
the following common trigger module behaviors:_

* Type 6 address match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=0`,
  should be supported

* Type 2 address/data match triggers, i.e. `tdata1.type=2`, should be
  unsupported
    
* The size of a memory access equals the size of the cache block accessed, and
  the compare values follow from the addresses of the NAPOT memory region
  corresponding to the cache block containing the effective address
  
* Unless an encoding for a cache block is added to the `mcontrol6.size` field,
  an address trigger should only match a memory access from a CBO instruction if
  `mcontrol6.size=0`
    
_If the Zicbom extension is implemented, this specification recommends the
following additional trigger module behaviors:_

* Implementing address match triggers should be optional

* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`,
  should be unsupported

* Memory accesses are considered to be stores, i.e. an address trigger matches
  only if `mcontrol6.store=1`

_If the Zicboz extension is implemented, this specification recommends the
following additional trigger module behaviors:_

* Implementing address match triggers should be mandatory

* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`,
  should be supported, and implementing these triggers should be optional

* Memory accesses are considered to be stores, i.e. an address trigger matches
  only if `mcontrol6.store=1`

_If the Zicbop extension is implemented, this specification recommends the
following additional trigger module behaviors:_

* Implementing address match triggers should be optional

* Type 6 data match triggers, i.e. `tdata1.type=6` and `mcontrol6.select=1`,
  should be unsupported

* Memory accesses may be considered to be loads or stores depending on the
  implementation, i.e. whether an address trigger matches on these instructions
  when `mcontrol6.load=1` or `mcontrol6.store=1` is _implementation-specific_

_This specification also recommends that the behavior of trigger modules with
respect to the Zicboz extension should be defined in version 1.0 of the debug
architecture specification. The behavior of trigger modules with respect to the
Zicbom and Zicbop extensions is expected to be defined in future extensions._
====

===== Hypervisor Extension

For the purposes of writing the `mtinst` or `htinst` register on a trap, the
following standard transformation is defined for cache-block management
instructions and cache-block zero instructions:

[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 'opcode'},
	{ bits: 5,  name: 0x0 },
	{ bits: 3,  name: 'funct3'},
	{ bits: 5,  name: 0x0},
	{ bits: 12, name: 'operation'},
]}
....

The `operation` field corresponds to the 12 most significant bits of the
trapping instruction.

[NOTE]
====
_As described in the hypervisor extension, a zero may be written into `mtinst`
or `htinst` instead of the standard transformation defined above._
====
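
The transformation above amounts to masking the trapping instruction so that
only the `opcode`, `funct3`, and `operation` fields survive. The following
sketch derives the mask from the field layout in the diagram (least
significant field first). The names `TINST_FIELDS`, `build_keep_mask`, and
`cbo_tinst_value` are hypothetical, and the instruction word in the usage note
is just a sample value with bits 19:15 set to 10.

```python
# (name, width in bits, keep?) listed from the least significant bit upward,
# following the register diagram above.
TINST_FIELDS = [
    ("opcode", 7, True),
    ("zero", 5, False),
    ("funct3", 3, True),
    ("zero", 5, False),
    ("operation", 12, True),
]


def build_keep_mask(fields):
    """Build a bit mask with ones in every field that is preserved."""
    mask, shift = 0, 0
    for _name, width, keep in fields:
        if keep:
            mask |= ((1 << width) - 1) << shift
        shift += width
    return mask


def cbo_tinst_value(instruction):
    """Apply the standard transformation to a 32-bit instruction word."""
    return instruction & build_keep_mask(TINST_FIELDS)
```

For the sample word `0x0045200F`, the sketch produces `0x0040200F`: bits 19:15
and 11:7 are cleared and all other bits are preserved.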

==== Effects on Constrained LR/SC Loops

The following event is added to the list of events that satisfy the eventuality
guarantee provided by constrained LR/SC loops, as defined in the A extension:

* Some other hart executes a cache-block management instruction or a cache-block
  zero instruction to the reservation set of the LR instruction in _H_'s
  constrained LR/SC loop.

[NOTE]
====
_The above event has been added to accommodate cache coherence protocols that
cannot distinguish between invalidations for stores and invalidations for
cache-block management operations._

_Aside from the above event, CMO instructions neither change the properties of
constrained LR/SC loops nor modify the eventuality guarantee provided by them.
For example, executing a CMO instruction may cause a constrained LR/SC loop on
any hart to fail periodically or may cause an unconstrained LR/SC sequence on the
same hart to fail always. Additionally, executing a cache-block prefetch
instruction does not impact the eventuality guarantee provided by constrained
LR/SC loops executed on any hart._
====

==== Software Discovery

The initial set of CMO extensions requires the following information to be
discovered by software:

* The size of the cache block for management and prefetch instructions
* The size of the cache block for zero instructions
* CBIE support at each privilege level

Other general cache characteristics may also be specified in the discovery
mechanism.

[#csr_state,reftext="Control and Status Register State"]
=== Control and Status Register State

[NOTE]
====
_The CMO extensions rely on state in {csrname} CSRs that will be defined in a
future update to the privileged architecture. If this CSR update is not
ratified, the CMO extension will define its own CSRs._
====

Three CSRs control the execution of CMO instructions:

* `m{csrname}`
* `s{csrname}`
* `h{csrname}`

The `s{csrname}` register is used by all supervisor modes, including VS-mode. A
hypervisor is responsible for saving and restoring `s{csrname}` on guest context
switches. The `h{csrname}` register is only present if the H-extension is
implemented and enabled.

Each `x{csrname}` register (where `x` is `m`, `s`, or `h`) has the following
generic format:

.Generic Format for x{csrname} CSRs
[cols="^10,^10,80a"]
|===
| Bits    | Name     | Description

| [5:4]   | `CBIE`   | Cache Block Invalidate instruction Enable

Enables the execution of the cache block invalidate instruction, `CBO.INVAL`, in
a lower privilege mode:

* `00`: The instruction raises an illegal instruction or virtual instruction
  exception
* `01`: The instruction is executed and performs a flush operation
* `10`: _Reserved_
* `11`: The instruction is executed and performs an invalidate operation

| [6]     | `CBCFE`  | Cache Block Clean and Flush instruction Enable

Enables the execution of the cache block clean instruction, `CBO.CLEAN`, and the
cache block flush instruction, `CBO.FLUSH`, in a lower privilege mode:

* `0`: The instruction raises an illegal instruction or virtual instruction
  exception
* `1`: The instruction is executed

| [7]     | `CBZE`   | Cache Block Zero instruction Enable

Enables the execution of the cache block zero instruction, `CBO.ZERO`, in a
lower privilege mode:

* `0`: The instruction raises an illegal instruction or virtual instruction
  exception
* `1`: The instruction is executed

|===

The `x{csrname}` registers control CBO instruction execution based on the current
privilege mode and the state of the appropriate CSRs, as detailed below.
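
As a quick illustration of the generic format in the table above, the sketch
below extracts the three fields from a raw register value. The function name
`decode_cbo_fields` and the `CBIE_MEANING` table are hypothetical helpers; the
bit positions are taken from the table.

```python
# Meaning of each CBIE encoding, per the table above.
CBIE_MEANING = {
    0b00: "raise illegal or virtual instruction exception",
    0b01: "execute and perform a flush operation",
    0b10: "reserved",
    0b11: "execute and perform an invalidate operation",
}


def decode_cbo_fields(csr_value):
    """Extract CBIE (bits 5:4), CBCFE (bit 6), and CBZE (bit 7)."""
    return {
        "CBIE": (csr_value >> 4) & 0b11,
        "CBCFE": (csr_value >> 6) & 0b1,
        "CBZE": (csr_value >> 7) & 0b1,
    }
```

For instance, a register value of `0xD0` decodes to `CBIE=01`, `CBCFE=1`, and
`CBZE=1`.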

A `CBO.INVAL` instruction executes or raises either an illegal instruction
exception or a virtual instruction exception based on the state of the
`x{csrname}.CBIE` fields:

[source,sail,subs="attributes+"]
--

// illegal instruction exceptions
if (((priv_mode != M) && (m{csrname}.CBIE == 00)) ||
    ((priv_mode == U) && (s{csrname}.CBIE == 00)))
{
  <raise illegal instruction exception>
}
// virtual instruction exceptions
else if (((priv_mode == VS) && (h{csrname}.CBIE == 00)) ||
         ((priv_mode == VU) && ((h{csrname}.CBIE == 00) || (s{csrname}.CBIE == 00))))
{
  <raise virtual instruction exception>
}
// execute instruction
else
{
  if (((priv_mode != M) && (m{csrname}.CBIE == 01)) ||
      ((priv_mode == U) && (s{csrname}.CBIE == 01)) ||
      ((priv_mode == VS) && (h{csrname}.CBIE == 01)) ||
      ((priv_mode == VU) && ((h{csrname}.CBIE == 01) || (s{csrname}.CBIE == 01))))
  {
    <execute CBO.INVAL and perform flush operation>
  }
  else
  {
    <execute CBO.INVAL and perform invalidate operation>
  }
}


--

[NOTE]
====
_Until a modified cache block has updated memory, a `CBO.INVAL` instruction may
expose stale data values in memory if the CSRs are programmed to perform an
invalidate operation. This behavior may result in a security hole if lower
privileged level software performs an invalidate operation and accesses
sensitive information in memory._

_To avoid such holes, higher privileged level software must perform either a
clean or flush operation on the cache block before permitting lower privileged
level software to perform an invalidate operation on the block. Alternatively,
higher privileged level software may program the CSRs so that `CBO.INVAL`
either traps or performs a flush operation in a lower privileged level._
====
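
The `CBO.INVAL` decision logic above can also be modeled as an executable
function, which may be convenient for checking individual CSR settings. This is
a non-normative sketch: the `Priv` enumeration and the function name
`cbo_inval_action` are hypothetical, and the arguments `m_cbie`, `s_cbie`, and
`h_cbie` stand for the `CBIE` fields of the M-, S-, and H-level `x{csrname}`
registers.

[source,python]
--
from enum import Enum


class Priv(Enum):
    M = "M"
    S = "S"      # S-mode or HS-mode
    U = "U"
    VS = "VS"
    VU = "VU"


def cbo_inval_action(priv: Priv, m_cbie: int, s_cbie: int, h_cbie: int) -> str:
    """Return 'illegal', 'virtual', 'flush', or 'inval' for CBO.INVAL."""
    # Illegal instruction exceptions
    if (priv != Priv.M and m_cbie == 0b00) or (priv == Priv.U and s_cbie == 0b00):
        return "illegal"
    # Virtual instruction exceptions
    if (priv == Priv.VS and h_cbie == 0b00) or (
        priv == Priv.VU and (h_cbie == 0b00 or s_cbie == 0b00)
    ):
        return "virtual"
    # Executes: any applicable field set to 01 selects a flush operation
    if (
        (priv != Priv.M and m_cbie == 0b01)
        or (priv == Priv.U and s_cbie == 0b01)
        or (priv == Priv.VS and h_cbie == 0b01)
        or (priv == Priv.VU and (h_cbie == 0b01 or s_cbie == 0b01))
    ):
        return "flush"
    return "inval"
--

For example, `cbo_inval_action(Priv.U, 0b11, 0b01, 0b11)` yields `"flush"`,
which corresponds to supervisor software restricting U-mode to the flush
behavior described in the note above.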

A `CBO.CLEAN` or `CBO.FLUSH` instruction executes or raises an illegal
instruction or virtual instruction exception based on the state of the
`x{csrname}.CBCFE` bits:

[source,sail,subs="attributes+"]
--

// illegal instruction exceptions
if (((priv_mode != M) && !m{csrname}.CBCFE) ||
    ((priv_mode == U) && !s{csrname}.CBCFE))
{
  <raise illegal instruction exception>
}
// virtual instruction exceptions
else if (((priv_mode == VS) && !h{csrname}.CBCFE) ||
         ((priv_mode == VU) && !(h{csrname}.CBCFE && s{csrname}.CBCFE)))
{
  <raise virtual instruction exception>
}
// execute instruction
else
{
  <execute CBO.CLEAN or CBO.FLUSH>
}

--

Finally, a `CBO.ZERO` instruction executes or raises an illegal instruction or
virtual instruction exception based on the state of the `x{csrname}.CBZE` bits:

[source,sail,subs="attributes+"]
--

// illegal instruction exceptions
if (((priv_mode != M) && !m{csrname}.CBZE) ||
    ((priv_mode == U) && !s{csrname}.CBZE))
{
  <raise illegal instruction exception>
}
// virtual instruction exceptions
else if (((priv_mode == VS) && !h{csrname}.CBZE) ||
         ((priv_mode == VU) && !(h{csrname}.CBZE && s{csrname}.CBZE)))
{
  <raise virtual instruction exception>
}
// execute instruction
else
{
  <execute CBO.ZERO>
}

--
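
Because the `CBCFE` and `CBZE` checks share the same structure, both can be
modeled by one function that takes the relevant enable bit from each register.
The following is a non-normative sketch; the function name `cbo_enable_check`
and the string encoding of the privilege modes are hypothetical.

[source,python]
--
def cbo_enable_check(priv: str, m_en: bool, s_en: bool, h_en: bool) -> str:
    """Return 'illegal', 'virtual', or 'execute'.

    priv is one of "M", "S", "U", "VS", or "VU". The *_en arguments are the
    CBCFE (or CBZE) bits of the M-, S-, and H-level registers.
    """
    # Illegal instruction exceptions
    if (priv != "M" and not m_en) or (priv == "U" and not s_en):
        return "illegal"
    # Virtual instruction exceptions
    if (priv == "VS" and not h_en) or (priv == "VU" and not (h_en and s_en)):
        return "virtual"
    # Execute instruction
    return "execute"
--

Note that in this model the S-level bit affects only U and VU modes, so
`cbo_enable_check("VS", True, False, True)` returns `"execute"`.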

Each `x{csrname}` register is WARL; however, software should determine the legal
values from the execution environment discovery mechanism.

[#extensions,reftext="Extensions"]
=== Extensions

CMO instructions are defined in the following extensions:

* <<#Zicbom>>
* <<#Zicboz>>
* <<#Zicbop>>

[#Zicbom,reftext="Cache-Block Management Instructions"]
==== Cache-Block Management Instructions

Cache-block management instructions enable software running on a set of coherent
agents to communicate with a set of non-coherent agents by performing one of the
following operations:

* An invalidate operation makes data from store operations performed by a set of
  non-coherent agents visible to the set of coherent agents at a point common to
  both sets by deallocating all copies of a cache block from the set of coherent
  caches up to that point
  
* A clean operation makes data from store operations performed by the set of
  coherent agents visible to a set of non-coherent agents at a point common to
  both sets by performing a write transfer of a copy of a cache block to that
  point provided a coherent agent performed a store operation that modified the
  data in the cache block since the previous invalidate, clean, or flush
  operation on the cache block
  
* A flush operation atomically performs a clean operation followed by an
  invalidate operation

In the Zicbom extension, the instructions operate to a point common to _all_
agents in the system. In other words, an invalidate operation ensures that store
operations from all non-coherent agents are visible to agents in the set of
coherent agents, and a clean operation ensures that store operations from
coherent agents are visible to all non-coherent agents.

[NOTE]
====
_The Zicbom extension does not prohibit agents that fall outside of the above
architectural definition; however, software cannot rely on the defined cache
operations to have the desired effects with respect to those agents._

_Future extensions may define different sets of agents for the purposes of
performance optimization._
====
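
The three operations can be illustrated with a toy model of a single write-back
cache in front of memory. This is a non-normative sketch under simplifying
assumptions: one cache level, one value per block, memory standing in for the
common point, and no modeling of the atomicity of a flush. The class name
`ToyCache` and its methods are hypothetical.

[source,python]
--
class ToyCache:
    """A single write-back cache in front of a dict-based memory."""

    def __init__(self, memory: dict):
        self.memory = memory    # block address -> value at the common point
        self.lines = {}         # block address -> (value, dirty)

    def store(self, block: int, value: int) -> None:
        self.lines[block] = (value, True)

    def load(self, block: int) -> int:
        if block not in self.lines:
            self.lines[block] = (self.memory[block], False)
        return self.lines[block][0]

    def clean(self, block: int) -> None:
        # Write the block back only if it was modified; keep the cached copy.
        if block in self.lines and self.lines[block][1]:
            value, _ = self.lines[block]
            self.memory[block] = value
            self.lines[block] = (value, False)

    def inval(self, block: int) -> None:
        # Deallocate the cached copy without writing it back.
        self.lines.pop(block, None)

    def flush(self, block: int) -> None:
        self.clean(block)
        self.inval(block)


memory = {0x80: 1}
cache = ToyCache(memory)
cache.store(0x80, 2)            # a coherent agent modifies the block
cache.clean(0x80)               # memory now holds 2 for non-coherent agents
memory[0x80] = 3                # a non-coherent agent writes memory directly
cache.inval(0x80)               # drop the stale cached copy
value_seen = cache.load(0x80)   # the next load refetches the block from memory
--

In this model, invalidating a modified block without cleaning it first discards
the modification, so a subsequent load observes the older value still held in
memory.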

These instructions operate on the cache block whose effective address is
specified in _rs1_. The effective address is translated into a corresponding
physical address by the appropriate translation mechanisms.

The following instructions comprise the Zicbom extension:

[%header,cols="^1,^1,4,8"]
|===
|RV32
|RV64
|Mnemonic
|Instruction

|&#10003;
|&#10003;
|cbo.clean _base_
|<<#insns-cbo_clean>>

|&#10003;
|&#10003;
|cbo.flush _base_
|<<#insns-cbo_flush>>

|&#10003;
|&#10003;
|cbo.inval _base_
|<<#insns-cbo_inval>>

|===

[#Zicboz,reftext="Cache-Block Zero Instructions"]
==== Cache-Block Zero Instructions

Cache-block zero instructions store zeros to the set of bytes corresponding to a
cache block. An implementation may update the bytes in any order and with any
granularity and atomicity, including individual bytes.

[NOTE]
====
_Cache-block zero instructions store zeros independently of whether data from
the underlying memory locations are cacheable. In addition, this specification
does not constrain how the bytes are written._
====

These instructions operate on the cache block, or the memory locations
corresponding to the cache block, whose effective address is specified in _rs1_.
The effective address is translated into a corresponding physical address by the
appropriate translation mechanisms.
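
The effect of a cache-block zero instruction on memory can be sketched as
follows. This is a non-normative model with labeled assumptions: memory is a
flat `bytearray`, the cache block size is 64 bytes (the actual size is
implementation-specific), and the block is taken to be the naturally aligned
block containing the given address. The function name `cbo_zero` is
hypothetical.

[source,python]
--
def cbo_zero(memory: bytearray, address: int, block_size: int = 64) -> None:
    """Zero every byte of the block_size-aligned block containing address."""
    # block_size is assumed to be a power of two.
    base = address & ~(block_size - 1)
    # The bytes may be written in any order; this model writes them in
    # descending order to stress that software cannot rely on a specific one.
    for offset in reversed(range(block_size)):
        memory[base + offset] = 0
--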

The following instructions comprise the Zicboz extension:

[%header,cols="^1,^1,4,8"]
|===
|RV32
|RV64
|Mnemonic
|Instruction

|&#10003;
|&#10003;
|cbo.zero _base_
|<<#insns-cbo_zero>>

|===

[#Zicbop,reftext="Cache-Block Prefetch Instructions"]
==== Cache-Block Prefetch Instructions

Cache-block prefetch instructions are HINTs to the hardware to indicate that
software intends to perform a particular type of memory access in the near
future. The types of memory accesses are instruction fetch, data read (i.e.
load), and data write (i.e. store).

These instructions operate on the cache block whose effective address is the sum
of the base address specified in _rs1_ and the sign-extended offset encoded in
_imm[11:0]_, where _imm[4:0]_ shall equal `0b00000`. The effective address is
translated into a corresponding physical address by the appropriate translation
mechanisms.

[NOTE]
====
_Cache-block prefetch instructions are encoded as ORI instructions with rd equal
to `0b00000`; however, for the purposes of effective address calculation, this
field is also interpreted as imm[4:0] like a store instruction._
====
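
The effective address calculation described above can be sketched as follows.
This is a non-normative illustration; the function name
`prefetch_effective_address` is hypothetical, `imm12` is the raw 12-bit
immediate field, and wrapping the sum to `xlen` bits is an assumption of this
model.

[source,python]
--
def prefetch_effective_address(base: int, imm12: int, xlen: int = 64) -> int:
    """Return base plus the sign-extended imm[11:0], wrapped to xlen bits."""
    if not 0 <= imm12 < (1 << 12):
        raise ValueError("imm12 must be a 12-bit field value")
    if imm12 & 0b11111:
        raise ValueError("imm[4:0] must equal 0b00000")
    # Sign-extend the 12-bit immediate.
    offset = imm12 - (1 << 12) if imm12 & (1 << 11) else imm12
    return (base + offset) & ((1 << xlen) - 1)
--

Because _imm[4:0]_ is zero, the offsets that can be represented are the
multiples of 32 from -2048 to 2016.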

The following instructions comprise the Zicbop extension:

[%header,cols="^1,^1,4,8"]
|===
|RV32
|RV64
|Mnemonic
|Instruction

|&#10003;
|&#10003;
|prefetch.i _offset_(_base_)
|<<#insns-prefetch_i>>

|&#10003;
|&#10003;
|prefetch.r _offset_(_base_)
|<<#insns-prefetch_r>>

|&#10003;
|&#10003;
|prefetch.w _offset_(_base_)
|<<#insns-prefetch_w>>

|===

[#insns,reftext="Instructions"]
=== Instructions

[#insns-cbo_clean,reftext="Cache Block Clean"]
==== cbo.clean

Synopsis::
Perform a clean operation on a cache block

Mnemonic::
cbo.clean _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
	{ bits: 5,  name: 0x0 },
	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
	{ bits: 5,  name: 'rs1', attr: ['base'] },
	{ bits: 12, name: 0x001, attr: ['CBO.CLEAN'] },
]}
....

Description::

A *cbo.clean* instruction performs a clean operation on the cache block whose
effective address is the base address specified in _rs1_. The offset operand may
be omitted; otherwise, any expression that computes the offset shall evaluate to
zero. The instruction operates on the set of coherent caches accessed by the
agent executing the instruction.

[NOTE]
====
_When executing a *cbo.clean* instruction, an implementation may instead perform
a flush operation, since the result of that operation is indistinguishable from
the sequence of performing a clean operation just before deallocating all cached
copies in the set of coherent caches._
====

Operation::
[source,sail]
--
TODO
--

[#insns-cbo_flush,reftext="Cache Block Flush"]
==== cbo.flush

Synopsis::
Perform a flush operation on a cache block

Mnemonic::
cbo.flush _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
	{ bits: 5,  name: 0x0 },
	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
	{ bits: 5,  name: 'rs1', attr: ['base'] },
	{ bits: 12, name: 0x002, attr: ['CBO.FLUSH'] },
]}
....

Description::

A *cbo.flush* instruction performs a flush operation on the cache block whose
effective address is the base address specified in _rs1_. The offset operand may
be omitted; otherwise, any expression that computes the offset shall evaluate to
zero. The instruction operates on the set of coherent caches accessed by the
agent executing the instruction.

Operation::
[source,sail]
--
TODO
--

[#insns-cbo_inval,reftext="Cache Block Invalidate"]
==== cbo.inval

Synopsis::
Perform an invalidate operation on a cache block

Mnemonic::
cbo.inval _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
	{ bits: 5,  name: 0x0 },
	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
	{ bits: 5,  name: 'rs1', attr: ['base'] },
	{ bits: 12, name: 0x000, attr: ['CBO.INVAL'] },
]}
....

Description::

A *cbo.inval* instruction performs an invalidate operation on the cache block
whose effective address is the base address specified in _rs1_. The offset
operand may be omitted; otherwise, any expression that computes the offset shall
evaluate to zero. The instruction operates on the set of coherent caches
accessed by the agent executing the instruction. Depending on CSR programming,
the instruction may perform a flush operation instead of an invalidate
operation.

[NOTE]
====
_When executing a *cbo.inval* instruction, an implementation may instead perform
a flush operation, since the result of that operation is indistinguishable from
the sequence of performing a write transfer to memory just before performing an
invalidate operation._
====

Operation::
[source,sail]
--
TODO
--

[#insns-cbo_zero,reftext="Cache Block Zero"]
==== cbo.zero

Synopsis::
Store zeros to the full set of bytes corresponding to a cache block

Mnemonic::
cbo.zero _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0xF,   attr: ['MISC-MEM'] },
	{ bits: 5,  name: 0x0 },
	{ bits: 3,  name: 0x2,   attr: ['CBO'] },
	{ bits: 5,  name: 'rs1', attr: ['base'] },
	{ bits: 12, name: 0x004, attr: ['CBO.ZERO'] },
]}
....

Description::

A *cbo.zero* instruction performs stores of zeros to the full set of bytes
corresponding to the cache block whose effective address is the base address
specified in _rs1_. The offset operand may be omitted; otherwise, any expression
that computes the offset shall evaluate to zero. An implementation may or may
not update the entire set of bytes atomically.

Operation::
[source,sail]
--
TODO
--
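
The four CBO encodings shown above differ only in the 12-bit field in bits
[31:20]. As a non-normative sketch, the instruction words can be assembled as
follows; the names `CBO_IMM12` and `encode_cbo` are hypothetical.

[source,python]
--
CBO_IMM12 = {
    "cbo.inval": 0x000,
    "cbo.clean": 0x001,
    "cbo.flush": 0x002,
    "cbo.zero": 0x004,
}


def encode_cbo(mnemonic: str, rs1: int) -> int:
    """Assemble the 32-bit instruction word for a CBO instruction."""
    if not 0 <= rs1 < 32:
        raise ValueError("rs1 must be a register number in 0..31")
    opcode = 0x0F    # MISC-MEM, bits [6:0]
    rd = 0x0         # bits [11:7]
    funct3 = 0x2     # CBO, bits [14:12]
    return (
        (CBO_IMM12[mnemonic] << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (rd << 7)
        | opcode
    )
--

For example, `encode_cbo("cbo.clean", 10)` produces `0x0015200F`, that is, a
`cbo.clean` with the base address in `x10`.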

[#insns-prefetch_i,reftext="Cache Block Prefetch for Instruction Fetch"]
==== prefetch.i

Synopsis::
Provide a HINT to hardware that a cache block is likely to be accessed by an
instruction fetch in the near future

Mnemonic::
prefetch.i _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0x13,        attr: ['OP-IMM'] },
	{ bits: 5,  name: 0x0,         attr: ['offset[4:0]'] },
	{ bits: 3,  name: 0x6,         attr: ['ORI'] },
	{ bits: 5,  name: 'rs1',       attr: ['base'] },
	{ bits: 5,  name: 0x0,         attr: ['PREFETCH.I'] },
	{ bits: 7,  name: 'imm[11:5]', attr: ['offset[11:5]'] },
]}
....

Description::

A *prefetch.i* instruction indicates to hardware that the cache block whose
effective address is the sum of the base address specified in _rs1_ and the
sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`,
is likely to be accessed by an instruction fetch in the near future.

[NOTE]
====
_An implementation may opt to cache a copy of the cache block in a cache
accessed by an instruction fetch in order to improve memory access latency, but
this behavior is not required._
====

Operation::
[source,sail]
--
TODO
--

[#insns-prefetch_r,reftext="Cache Block Prefetch for Data Read"]
==== prefetch.r

Synopsis::
Provide a HINT to hardware that a cache block is likely to be accessed by a data
read in the near future

Mnemonic::
prefetch.r _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0x13,        attr: ['OP-IMM'] },
	{ bits: 5,  name: 0x0,         attr: ['offset[4:0]'] },
	{ bits: 3,  name: 0x6,         attr: ['ORI'] },
	{ bits: 5,  name: 'rs1',       attr: ['base'] },
	{ bits: 5,  name: 0x1,         attr: ['PREFETCH.R'] },
	{ bits: 7,  name: 'imm[11:5]', attr: ['offset[11:5]'] },
]}
....

Description::

A *prefetch.r* instruction indicates to hardware that the cache block whose
effective address is the sum of the base address specified in _rs1_ and the
sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`,
is likely to be accessed by a data read (i.e. load) in the near future.

[NOTE]
====
_An implementation may opt to cache a copy of the cache block in a cache
accessed by a data read in order to improve memory access latency, but this
behavior is not required._
====

Operation::
[source,sail]
--
TODO
--

[#insns-prefetch_w,reftext="Cache Block Prefetch for Data Write"]
==== prefetch.w

Synopsis::
Provide a HINT to hardware that a cache block is likely to be accessed by a data
write in the near future

Mnemonic::
prefetch.w _offset_(_base_)

Encoding::
[wavedrom, , svg]
....
{reg:[
	{ bits: 7,  name: 0x13,        attr: ['OP-IMM'] },
	{ bits: 5,  name: 0x0,         attr: ['offset[4:0]'] },
	{ bits: 3,  name: 0x6,         attr: ['ORI'] },
	{ bits: 5,  name: 'rs1',       attr: ['base'] },
	{ bits: 5,  name: 0x3,         attr: ['PREFETCH.W'] },
	{ bits: 7,  name: 'imm[11:5]', attr: ['offset[11:5]'] },
]}
....

Description::

A *prefetch.w* instruction indicates to hardware that the cache block whose
effective address is the sum of the base address specified in _rs1_ and the
sign-extended offset encoded in _imm[11:0]_, where _imm[4:0]_ equals `0b00000`,
is likely to be accessed by a data write (i.e. store) in the near future.

[NOTE]
====
_An implementation may opt to cache a copy of the cache block in a cache
accessed by a data write in order to improve memory access latency, but this
behavior is not required._
====

Operation::
[source,sail]
--
TODO
--
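
The three prefetch encodings shown above differ only in the 5-bit field in bits
[24:20]. As a non-normative sketch, the instruction words can be assembled as
follows; the names `PREFETCH_SELECT` and `encode_prefetch` are hypothetical, and
`offset` is the signed byte offset written in the assembly operand.

[source,python]
--
PREFETCH_SELECT = {
    "prefetch.i": 0x0,
    "prefetch.r": 0x1,
    "prefetch.w": 0x3,
}


def encode_prefetch(mnemonic: str, rs1: int, offset: int) -> int:
    """Assemble the 32-bit instruction word for a prefetch instruction."""
    if not 0 <= rs1 < 32:
        raise ValueError("rs1 must be a register number in 0..31")
    if not -2048 <= offset <= 2047 or offset % 32 != 0:
        raise ValueError("offset must be a multiple of 32 in -2048..2016")
    imm12 = offset & 0xFFF    # two's-complement 12-bit immediate
    opcode = 0x13             # OP-IMM, bits [6:0]
    funct3 = 0x6              # ORI, bits [14:12]
    # Bits [11:7] hold offset[4:0], which is always zero here.
    return (
        ((imm12 >> 5) << 25)
        | (PREFETCH_SELECT[mnemonic] << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | opcode
    )
--

For example, `encode_prefetch("prefetch.r", 10, 0)` produces `0x00156013`.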