1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
|
\section{An Operational Memory Model}
\label{sec:operational}
This is an alternative presentation of the RVWMO memory model in
operational style.
%
It aims to admit exactly the same extensional behavior as the
axiomatic presentation: for any given program, admitting an execution
if and only if the axiomatic presentation allows it.
The axiomatic presentation is defined as a predicate on complete
candidate executions. In contrast, this operational presentation has
an abstract microarchitectural flavor: it is expressed as a state
machine, with states that are an abstract representation of hardware
machine states, and with explicit out-of-order and speculative
execution
(but abstracting from more implementation-specific microarchitectural
details such as register renaming, store buffers, cache hierarchies, cache protocols, etc.).
As such, it can provide useful intuition.
It can also
construct executions incrementally, making it possible to
interactively and randomly explore the behavior of larger examples,
while the axiomatic model requires complete candidate executions
over which the axioms can be checked.
The operational presentation covers mixed-size execution, with
potentially overlapping memory accesses of different power-of-two byte
sizes. Misaligned accesses are broken up into single-byte accesses.
The operational model, together with a fragment of the RISC-V ISA
semantics (RV64I and A), are integrated into the {\tt rmem} exploration
tool (\url{https://github.com/rems-project/rmem}). {\tt rmem} can
explore litmus tests (see \ref{sec:litmustests}) and small ELF
binaries exhaustively,
pseudo-randomly and interactively. In {\tt rmem}, the ISA semantics
is expressed explicitly in Sail (see
\url{https://github.com/rems-project/sail} for the Sail language, and
\url{https://github.com/rems-project/sail-riscv} for the RISC-V ISA
model), and the concurrency semantics is expressed in Lem (see
\url{https://github.com/rems-project/lem} for the Lem language).
{\tt rmem} has a command-line interface and a web-interface.
The web-interface runs entirely on the client side, and is provided
online together with a library of litmus tests:
\url{http://www.cl.cam.ac.uk/~pes20/rmem}. The command-line interface
is faster than the web-interface, specially in exhaustive mode.
% A library of RISC-V litmus tests can be downloaded from
% \url{https://github.com/litmus-tests/litmus-tests-riscv}.
% This repository also provides instructions on how to run the litmus
% tests on RISC-V hardware and how to compare the results with the
% operational and axiomatic models. The library is also available
% through the web-interface.
% TODO: compare with the herd and alloy versions
Below is an informal introduction of the model states and transitions.
The description of the formal model starts in the next subsection.
Terminology: In contrast to the axiomatic presentation, here every memory operation is either a load or a store.
Hence, AMOs give rise to two distinct memory operations, a load and a store.
When used in conjunction with ``instruction'', the terms ``load'' and ``store'' refer to instructions that give rise to such memory operations.
As such, both include AMO instructions.
The term ``acquire'' refers to an instruction (or its memory operation) with the acquire-RCpc or acquire-RCsc annotation.
The term ``release'' refers to an instruction (or its memory operation) with the release-RCpc or release-RCsc annotation.
\paragraph{Model states}
A model state consists of a shared memory and a tuple of hart states.
\begin{center}
\sffamily
\begin{tabular}{ccc}
\cline{1-1}\cline{3-3}
\multicolumn{1}{|c|}{Hart 0} & \bf \dots & \multicolumn{1}{|c|}{Hart $n$} \\
\cline{1-1}\cline{3-3}
$\big\uparrow$ $\big\downarrow$ & & $\big\uparrow$ $\big\downarrow$ \\
\hline
\multicolumn{3}{|c|}{Shared Memory} \\
\hline
\end{tabular}
\end{center}
The shared memory state records all the memory store operations that have propagated so far, in the order they propagated (this can be made more efficient, but for simplicity of the presentation we keep it this way).
Each hart state consists principally of a tree of instruction instances, some of which have been \emph{finished}, and some of which have not.
Non-finished instruction instances can be subject to \emph{restart}, e.g.~if they depend on an out-of-order or speculative load that turns out to be unsound.
Conditional branch and indirect jump instructions may have multiple successors in the instruction tree.
When such instruction is finished, any un-taken alternative paths are discarded.
Each instruction instance in the instruction tree has a state that
includes an execution state of the intra-instruction semantics (the
ISA pseudocode for this instruction).
The model uses a formalization of the intra-instruction semantics in Sail.
One can think of the execution state of an instruction as a representation of the pseudocode control state, pseudocode call stack, and local variable values.
An instruction instance state also includes information about the instance's memory and register footprints, its register reads and writes, its memory operations, whether it is finished, etc.
\paragraph{Model transitions}
The model defines, for any model state, the set of allowed transitions, each of which is a single atomic step to a new abstract machine state.
Execution of a single instruction will typically involve many
transitions, and they may be interleaved in operational-model
execution with transitions arising from other instructions.
Each transition arises from a single instruction instance; it will
change the state of that instance, and it may depend on or change the
rest of its hart state and the shared memory state, but it does not depend on other hart states, and it will not change them.
% Instructions cannot be treated
% as atomic units: complete execution of a single instruction instance may
% involve many transitions, which can be interleaved with those of other
% instances in the same or other harts, and some of this is programmer-visible.
The transitions are introduced below and defined in Section~\ref{sec:omm:transitions}, with a precondition and a construction of the post-transition model state for each.
\noindent Transitions for all instructions:
\begin{itemize}
\item \nameref{omm:fetch}: This transition represents a fetch and
decode of a new instruction instance, as a program order successor
of a previously fetched instruction instance (or the initial fetch
address).
The model assumes the instruction memory is fixed; it does not
describe the behavior of self-modifying code.
In particular, the \nameref{omm:fetch} transition does not generate memory load operations, and the shared memory is not involved in the transition.
Instead, the model depends on an external oracle that provides an opcode when given a memory location.
%
\item[$\circ$] \nameref{omm:reg_write}: This is a write of a register value.
\item[$\circ$] \nameref{omm:reg_read}: This is a read of a register
value from the most recent program-order-predecessor instruction instance that writes to that register.
\item[$\circ$] \nameref{omm:sail_interp}: This covers pseudocode
internal computation: arithmetic, function calls, etc.
\item[$\circ$] \nameref{omm:finish}: At this point the instruction pseudocode is done, the instruction cannot be restarted, memory accesses cannot be discarded, and all memory effects have taken place.
For conditional branch and indirect jump instructions, any program order successors that were fetched from an address that is not the one that was written to the {\em pc} register are discarded, together with the sub-tree of instruction instances below them.
\end{itemize}
\noindent Transitions specific to load instructions:
\begin{itemize}
\item[$\circ$] \nameref{omm:initiate_load}: At this point the memory
footprint of the load instruction is provisionally known (it could change if
earlier instructions are restarted) and its individual memory load operations can start being satisfied.
\item \nameref{omm:sat_by_forwarding}: This partially or entirely
satisfies a single memory load operation by forwarding, from
program-order-previous memory store operations.
\item \nameref{omm:sat_from_mem}: This entirely satisfies the outstanding slices of a single memory load operation, from memory.
\item[$\circ$] \nameref{omm:complete_loads}: At this point all the memory load operations of the instruction have been entirely satisfied and the instruction pseudocode can continue executing.
A load instruction can be subject to being restarted until the \nameref{omm:finish} transition.
But, under some conditions, the model might treat a load instruction as non-restartable even before it is finished (e.g.~see \nameref{omm:prop_store}).
\end{itemize}
\noindent Transitions specific to store instructions:
\begin{itemize}
\item[$\circ$] \nameref{omm:initiate_store_footprint}: At this point the memory footprint of the store is provisionally known.
\item[$\circ$] \nameref{omm:instantiate_store_value}: At this point the memory store operations have their values and program-order-successor memory load operations can be satisfied by forwarding from them.
\item[$\circ$] \nameref{omm:commit_stores}: At this point the store operations are guaranteed to happen (the instruction can no longer be restarted or discarded), and they can start being propagated to memory.
\item \nameref{omm:prop_store}: This propagates a single memory store operation to memory.
\item[$\circ$] \nameref{omm:complete_stores}: At this point all the memory store operations of the instruction have been propagated to memory, and the instruction pseudocode can continue executing.
\end{itemize}
\noindent Transitions specific to {\tt sc} instructions:
\begin{itemize}
\item \nameref{omm:early_sc_fail}: This causes the {\tt sc} to fail, either a spontaneous fail or because it is not paired with a program-order-previous {\tt lr}.
\item \nameref{omm:paired_sc}: This transition indicates the {\tt sc} is paired with an {\tt lr} and might succeed.
\item \nameref{omm:commit_sc}: This is an atomic execution of the transitions \nameref{omm:commit_stores} and \nameref{omm:prop_store}, it is enabled only if the stores from which the {\tt lr} read from have not been overwritten.
\item \nameref{omm:late_sc_fail}: This causes the {\tt sc} to fail, either a spontaneous fail or because the stores from which the {\tt lr} read from have been overwritten.
\end{itemize}
\noindent Transitions specific to AMO instructions:
\begin{itemize}
\item \nameref{omm:do_amo}: This is an atomic execution of all the transitions needed to satisfy the load operation, do the required arithmetic, and propagate the store operation.
\end{itemize}
\noindent Transitions specific to fence instructions:
\begin{itemize}
\item[$\circ$] \nameref{omm:commit_fence}
\end{itemize}
The transitions labeled~$\circ$ can always be taken eagerly, as soon as their precondition is satisfied, without excluding other behavior; the $\bullet$ cannot.
Although \nameref{omm:fetch} is marked with a $\bullet$, it can be taken eagerly as long as it is not taken infinitely many times.
An instance of a non-AMO load instruction, after being fetched, will typically experience the following transitions in this order:
\begin{enumerate}
\item \nameref{omm:reg_read}
\item \nameref{omm:initiate_load}
\item \nameref{omm:sat_by_forwarding} and/or \nameref{omm:sat_from_mem} (as many as needed to satisfy all the load operations of the instance)
\item \nameref{omm:complete_loads}
\item \nameref{omm:reg_write}
\item \nameref{omm:finish}
\end{enumerate}
Before, between and after the transitions above, any number of \nameref{omm:sail_interp} transitions may appear.
In addition, a \nameref{omm:fetch} transition for fetching the instruction in the next program location will be available until it is taken.
This concludes the informal description of the operational model.
The following sections describe the formal operational model.
\subsection{Intra-instruction Pseudocode Execution}\label{sec:omm:pseudocode_exec}
The intra-instruction semantics for each instruction instance is expressed as a state machine, essentially running the instruction pseudocode.
Given a pseudocode execution state, it computes the next state. Most
states identify a pending memory or register operation, requested by
the pseudocode, which the memory model has to do. The
states are (this is a tagged union; tags in small-caps):
\begin{center}
\begin{tabular}{l@{ \quad-\quad }l}
{\sc Load\_mem}({\it kind}, {\it address}, {\it size}, {\it load\_continuation})
& memory load operation\\
{\sc Early\_sc\_fail}({\it res\_continuation})
& allow {\tt sc} to fail early\\
{\sc Store\_ea}({\it kind}, {\it address}, {\it size}, {\it next\_state})
& memory store effective address\\
{\sc Store\_memv}({\it mem\_value}, {\it store\_continuation})
& memory store value\\
{\sc Fence}({\it kind}, {\it next\_state})
& fence\\
{\sc Read\_reg}({\it reg\_name}, {\it read\_continuation})
& register read\\
{\sc Write\_reg}({\it reg\_name}, {\it reg\_value}, {\it next\_state})
& register write\\
{\sc Internal}({\it next\_state})
& pseudocode internal step\\
{\sc Done}
& end of pseudocode\\
\end{tabular}
\end{center}
Here:
\begin{tightlist}
\item {\it mem\_value} and {\it reg\_value} are lists of bytes;
\item {\it address} is an integer of XLEN bits;
\item for load/store, {\it kind} identifies whether it is {\tt lr/sc}, acquire-RCpc/release-RCpc, acquire-RCsc/release-RCsc, acquire-release-RCsc;
\item for fence, {\it kind} identifies whether it is a normal or TSO, and (for normal fences) the predecessor and successor ordering bits;
\item {\it reg\_name} identifies a register and a slice thereof (start and
end bit indices); and
\item the continuations describe how the instruction instance will continue for each value that might be provided by the surrounding memory model (the {\it load\_continuation} and {\it read\_continuation} take the value loaded from memory and read from the previous register write, the {\it store\_continuation} takes {\it false} for an {\tt sc} that failed and {\it true} in all other cases, and {\it res\_continuation} takes {\it false} if the {\tt sc} fails and {\it true} otherwise).
\end{tightlist}
\begin{commentary}
For example, given the load instruction \verb!lw x1,0(x2)!,
an execution will typically go as follows.
The initial execution state will be computed from the pseudocode for the given opcode.
This can be expected to be {\sc Read\_reg}({\tt x2}, {\it read\_continuation}).
Feeding the most recently written value of register {\tt x2} (the instruction semantics will be blocked if necessary until the register value is available), say {\tt 0x4000}, to {\it read\_continuation} returns {\sc Load\_mem}({\tt plain\_load}, {\tt 0x4000}, {\tt 4}, {\it load\_continuation}).
Feeding the 4-byte value loaded from memory location {\tt 0x4000}, say {\tt 0x42}, to {\it load\_continuation} returns
{\sc Write\_reg}({\tt x1}, {\tt 0x42}, {\sc Done}).
Many {\sc Internal}({\it next\_state}) states may appear before and between the states above.
\end{commentary}
Notice that writing to memory is split into two steps, {\sc Store\_ea} and {\sc Store\_memv}: the first one makes the memory footprint of the store provisionally known, and the second one adds the value to be stored.
We ensure these are paired in the pseudocode ({\sc Store\_ea} followed by {\sc Store\_memv}), but there may be other steps between them.
\begin{commentary}
It is observable that the {\sc Store\_ea} can occur before the value to be stored is determined.
For example, for the litmus test LB+fence.r.rw+data-po to be allowed by the operational model (as it is by RVWMO), the first store in Hart 1 has to take the {\sc Store\_ea} step before its value is determined, so that the second store can see it is to a non-overlapping memory footprint, allowing the second store to be committed out of order without violating coherence.
\end{commentary}
The pseudocode of each instruction performs at most one store or one load, except for AMOs that perform exactly one load and one store.
Those memory accesses are then split apart into the architecturally atomic units by the hart semantics (see \nameref{omm:initiate_load} and \nameref{omm:initiate_store_footprint} below).
Informally, each bit of a register read should be satisfied from a register write by the most recent (in program order) instruction instance that can write that bit (or from the hart's initial register state if there is no such write).
Hence, it is essential to know the register write footprint of each instruction instance, which we calculate when the instruction instance is created (see the action of \nameref{omm:fetch} below).
We ensure in the pseudocode that each instruction does at most one register write to each register bit, and also that it does not try to read a register value it just wrote.
Data-flow dependencies (address and data) in the model emerge from the
fact that each register read has to wait for the appropriate register write to be executed (as described above).
\subsection{Instruction Instance State}\label{sec:omm:inst_state}
Each instruction instance $i$ has a state comprising:
\begin{itemize}
\item {\it program\_loc}, the memory address from which the instruction was fetched;
\item {\it instruction\_kind}, identifying whether this is a load, store, AMO, fence, branch/jump or a `simple' instruction (this also includes a {\it kind} similar to the one described for the pseudocode execution states);
\item {\it src\_regs}, the set of source {\it reg\_name}s (including system registers), as statically determined from the pseudocode of the instruction;
\item {\it dst\_regs}, the destination {\it reg\_name}s (including system registers), as statically determined from the pseudocode of the instruction;
\item {\it pseudocode\_state} (or sometimes just `state' for short), one of (this is a tagged union; tags in small-caps):
\begin{center}
\begin{tabular}{l@{ \quad-\quad }l}
{\sc Plain}({\it isa\_state})
& ready to make a pseudocode transition \\
{\sc Pending\_mem\_loads}({\it load\_continuation})
& requesting memory load operation(s) \\
{\sc Pending\_mem\_stores}({\it store\_continuation})
& requesting memory store operation(s) \\
% {\sc Pending\_exception}({\it exception}) & performing an exception;
\end{tabular}
\end{center}
\item {\it reg\_reads}, the register reads the instance has performed, including, for each one, the register write slices it read from;
\item {\it reg\_writes}, the register writes the instance has performed;
\item {\it mem\_loads}, a set of memory load operations, and for each one
the as-yet-unsatisfied slices (the byte indices that have not been
satisfied yet), and, for the satisfied slices, the store slices
(each consisting of a memory store operation and subset of its byte indices) that satisfied it.
\item {\it mem\_stores}, a set of memory store operations, and for each one a flag that indicates whether it has been propagated (passed to the shared memory) or not.
\item information recording whether the instance is committed, finished, etc.
\end{itemize}
Each memory load operation includes a memory footprint (address and size).
Each memory store operations includes a memory footprint, and, when available, a value.
A load instruction instance with a non-empty {\it mem\_loads}, for which all the load operations are satisfied (i.e.~there are no unsatisfied load slices) is said to be {\it entirely satisfied}.
Informally, an instruction instance is said to have {\it fully determined data} if the load (and {\tt sc}) instructions feeding its source registers are finished.
Similarly, it is said to have a {\it fully determined memory footprint} if the load (and {\tt sc}) instructions feeding its memory operation address register are finished.
%
Formally, we first define the notion of {\it fully determined register write}: a register write $w$ from {\it reg\_writes} of instruction instance $i$ is said to be {\it fully determined} if one of the following conditions hold:
\begin{enumerate}
\item $i$ is finished; or
\item the value written by $w$ is not affected by a memory operation that $i$ has made (i.e. a value loaded from memory or the result of {\tt sc}), and, for every register read that $i$ has made, that affects $w$, the register write from which $i$ read is fully determined (or $i$ read from the initial register state).
\end{enumerate}
Now, an instruction instance $i$ is said to have {\it fully determined data} if for every register read $r$ from {\it reg\_reads}, the register writes that $r$ reads from are fully determined.
An instruction instance $i$ is said to have a {\it fully determined memory footprint} if for every register read $r$ from {\it reg\_reads} that feeds into $i$'s memory operation address, the register writes that $r$ reads from are fully determined.
\begin{commentary}
The {\tt rmem} tool records, for every register write, the set of register writes from other instructions that have been read by this instruction at the point of performing the write.
By carefully arranging the pseudocode of the instructions covered by the tool we were able to make it so that this is exactly the set of register writes on which the write depends on.
\end{commentary}
\subsection{Hart State}
The model state of a single hart comprises:
\begin{itemize}
\item {\it hart\_id}, a unique identifier of the hart;
%\item {\it register\_data}, the name, bit width, and start bit index for each register;
\item {\it initial\_register\_state}, the initial register value for each register;
\item {\it initial\_fetch\_address}, the initial instruction fetch address;
\item {\it instruction\_tree}, a tree of the instruction instances that have been fetched (and not discarded), in program order.
\end{itemize}
\subsection{Shared Memory State}
The model state of the shared memory comprises a list of memory store operations, in the order they propagated to the shared memory.
When a store operation is propagated to the shared memory it is simply added to the end of the list.
When a load operation is satisfied from memory, for each byte of the load operation, the most recent corresponding store slice is returned.
\begin{commentary}
For most purposes, it is simpler to think of the shared memory as an array, i.e., a map from memory locations to memory store operation slices, where each memory location is mapped to a one-byte slice of the most recent memory store operation to that location.
However, this abstraction is not detailed enough to properly handle the {\tt sc} instruction.
The RVWMO \nameref{rvwmo:ax:atom} allows store operations from the same hart as the {\tt sc} to intervene between the store operation of the {\tt sc} and the store operations the paired {\tt lr} read from.
To allow such store operations to intervene, and forbid others, the array abstraction must be extended to record more information.
Here, we use a list as it is very simple, but a more efficient and scalable implementations should probably use something better.
\end{commentary}
\subsection{Transitions}\label{sec:omm:transitions}
Each of the paragraphs below describes a single kind of system transition.
The description starts with a condition over the current system state.
The transition can be taken in the current state only if the condition is satisfied.
The condition is followed by an action that is applied to that state when the transition is taken, in order to generate the new system state.
\paragraph{Fetch instruction}\label{omm:fetch}
A possible program-order-successor of instruction instance $i$ can be fetched from address {\it loc} if:
\begin{enumerate}
\item it has not already been fetched, i.e., none of the immediate successors of $i$ in the hart's {\it instruction\_tree} are from {\it loc}; and
\item if $i$'s pseudocode has already written an address to {\em pc}, then {\it loc} must be that address, otherwise {\it loc} is:
\begin{itemize}
\item for a conditional branch, the successor address or the branch target address;
\item for a (direct) jump and link instruction ({\tt jal}), the target address;
\item for an indirect jump instruction ({\tt jalr}), any address; and
\item for any other instruction, $i.\textit{program\_loc}+4$.
\end{itemize}
\end{enumerate}
% \fixme{Does an instruction at the end of memory need special-case treatment?}
Action: construct a freshly initialized instruction instance $i'$ for the instruction in the program memory at {\it loc}, with state {\sc Plain}({\it isa\_state}), computed from the instruction pseudocode, including the static information available from the pseudocode such as its {\it instruction\_kind}, {\it src\_regs}, and {\it dst\_regs}, and add $i'$ to the hart's {\it instruction\_tree} as a successor of $i$.
% If the instruction fails to decode, set the state of $i'$ to ``{\sc Pending\_exception} {\it exception}'' with {\it exception} that describes the decoding error.
\begin{commentary}
The possible next fetch addresses ({\it loc}) are available immediately after fetching $i$ and the model does not need to wait for the pseudocode to write to {\em pc}; this allows out-of-order execution, and speculation past conditional branches and jumps.
For most instructions these addresses are easily obtained from the instruction pseudocode.
The only exception to that is the indirect jump instruction ({\tt jalr}), where the address depends on the value held in a register.
%
In principle the mathematical model should allow speculation to
arbitrary addresses here.
%
The exhaustive search in the {\tt rmem} tool handles this by running the exhaustive search multiple times with a growing set of possible next fetch addresses for each indirect jump.
The initial search uses empty sets, hence there is no fetch after indirect jump instruction until the pseudocode of the instruction writes to {\em pc}, and then we use that value for fetching the next instruction.
Before starting the next iteration of exhaustive search, we collect for each indirect jump (grouped by code location) the set of values it wrote to {\em pc} in all the executions in the previous search iteration, and use that as possible next fetch addresses of the instruction.
This process terminates when no new fetch addresses are detected.
\end{commentary}
\paragraph{Initiate memory load operations}\label{omm:initiate_load}
An instruction instance $i$ in state {\sc Plain}({\sc Load\_mem}({\it kind}, {\it address}, {\it size}, {\it load\_continuation})) can always initiate the corresponding memory load operations.
Action:
\begin{enumerate}
\item Construct the appropriate memory load operations $mlos$:
\begin{itemize}
\item if {\it address} is aligned to {\it size} then $mlos$ is a single memory load operation of {\it size} bytes from {\it address};
\item otherwise, $mlos$ is a set of {\it size} memory load operations, each of one byte, from the addresses $\textit{address}\ldots\textit{address}+\textit{size}-1$.
\end{itemize}
\item set {\it mem\_loads} of $i$ to $mlos$; and
\item update the state of $i$ to {\sc Pending\_mem\_loads}({\it load\_continuation}).
\end{enumerate}
\begin{commentary}
In Section~\ref{sec:rvwmo:primitives} it is said that misaligned memory accesses may be decomposed at any granularity.
Here we decompose them to one-byte accesses as this granularity subsumes all others.
\end{commentary}
\paragraph{Satisfy memory load operation by forwarding from unpropagated stores}\label{omm:sat_by_forwarding}
For a non-AMO load instruction instance $i$ in state {\sc Pending\_mem\_loads}({\it load\_continuation}), and a memory load operation $mlo$ in $i.\textit{mem\_loads}$ that has unsatisfied slices, the memory load operation can be partially or entirely satisfied by forwarding from unpropagated memory store operations by store instruction instances that are program-order-before $i$ if:
\begin{enumerate}
\item all program-order-previous {\tt fence} instructions with {\tt .sr} and {\tt .pw} set are finished;
\item for every program-order-previous {\tt fence} instruction, $f$, with {\tt .sr} and {\tt .pr} set, and {\tt .pw} not set, if $f$ is not finished then all load instructions that are program-order-before $f$ are entirely satisfied;
\item for every program-order-previous {\tt fence.tso} instruction, $f$, that is not finished, all load instructions that are program-order-before $f$ are entirely satisfied;
% \item all program-order-previous {\tt fence.i} instructions are finished;
\item if $i$ is a load-acquire-RCsc, all program-order-previous store-releases-RCsc are finished;
\item if $i$ is a load-acquire-release, all program-order-previous instructions are finished;
\item all non-finished program-order-previous load-acquire instructions are entirely satisfied; and
\item all program-order-previous store-acquire-release instructions are finished;
\end{enumerate}
Let $msoss$ be the set of all unpropagated memory store operation slices from non-{\tt sc} store instruction instances that are program-order-before $i$ and have already calculated the value to be stored, that overlap with the unsatisfied slices of $mlo$, and which are not superseded by intervening store operations or store operations that are read from by an intervening load.
The last condition requires, for each memory store operation slice $msos$ in $msoss$ from instruction $i'$:
\begin{tightlist}
\item that there is no store instruction program-order-between $i$ and $i'$ with a memory store operation overlapping $msos$; and
\item that there is no load instruction program-order-between $i$ and $i'$ that was satisfied from an overlapping memory store operation slice from a different hart.
\end{tightlist}
Action:
\begin{enumerate}
\item update $i.\textit{mem\_loads}$ to indicate that $mlo$ was satisfied by $msoss$; and
\item restart any speculative instructions which have violated coherence as a result of this, i.e., for every non-finished instruction $i'$ that is a program-order-successor of $i$, and every memory load operation $mlo'$ of $i'$ that was satisfied from $msoss'$, if there exists a memory store operation slice $msos'$ in $msoss'$, and an overlapping memory store operation slice from a different memory store operation in $msoss$, and $msos'$ is not from an instruction that is a program-order-successor of $i$, restart $i'$ and its {\em restart-dependents}.
\end{enumerate}
Where, the {\em restart-dependents} of instruction $j$ are:
\begin{tightlist}
\item program-order-successors of $j$ that have data-flow dependency on a register write of $j$;
\item program-order-successors of $j$ that have a memory load operation that reads from a memory store operation of $j$ (by forwarding);
\item if $j$ is a load-acquire, all the program-order-successors of $j$;
\item if $j$ is a load, for every {\tt fence}, $f$, with {\tt .sr} and {\tt .pr} set, and {\tt .pw} not set, that is a program-order-successor of $j$, all the load instructions that are program-order-successors of $f$;
\item if $j$ is a load, for every {\tt fence.tso}, $f$, that is a program-order-successor of $j$, all the load instructions that are program-order-successors of $f$;
and \item (recursively) all the restart-dependents of all the instruction instances above.
\end{tightlist}
\begin{commentary}
Forwarding memory store operations to a memory load might satisfy only some slices of the load, leaving other slices unsatisfied.
A program-order-previous store operation that was not available when taking the transition above might make $msoss$ provisionally unsound (violating coherence) when it becomes available.
That store will prevent the load from being finished (see \nameref{omm:finish}), and will cause it to restart when that store operation is propagated (see \nameref{omm:prop_store}).
A consequence of the transition condition above is that store-release-RCsc memory store operations cannot be forwarded to load-acquire-RCsc instructions:
$msoss$ does not include memory store operations from finished stores (as those must be propagated memory store operations), and the condition above requires all program-order-previous store-releases-RCsc to be finished when the load is acquire-RCsc.
\end{commentary}
\paragraph{Satisfy memory load operation from memory}\label{omm:sat_from_mem}
For an instruction instance $i$ of a non-AMO load instruction or an AMO instruction in the context of the ``\nameref{omm:do_amo}'' transition, any memory load operation $mlo$ in $i.\textit{mem\_loads}$ that has unsatisfied slices, can be satisfied from memory if all the conditions of \nameref{omm:sat_by_forwarding} are satisfied.
Action: let $msoss$ be the memory store operation slices from memory covering the unsatisfied slices of $mlo$, and apply the action of \nameref{omm:sat_by_forwarding}.
\begin{commentary}
Note that \nameref{omm:sat_by_forwarding} might leave some slices of the memory load operation unsatisfied, those will have to be satisfied by taking the transition again, or taking \nameref{omm:sat_from_mem}.
\nameref{omm:sat_from_mem}, on the other hand, will always satisfy all the unsatisfied slices of the memory load operation.
\end{commentary}
\paragraph{Complete load operations}\label{omm:complete_loads}
A load instruction instance $i$ in state {\sc Pending\_mem\_loads}({\it load\_continuation}) can be completed (not to be confused with finished) if all the memory load operations $i.\textit{mem\_loads}$ are entirely satisfied (i.e.~there are no unsatisfied slices).
Action: update the state of $i$ to {\sc Plain}({\it load\_continuation(mem\_value)}), where {\it mem\_value} is assembled from all the memory store operation slices that satisfied $i.\textit{mem\_loads}$.
\paragraph{Early {\tt sc} fail}\label{omm:early_sc_fail}
An {\tt sc} instruction instance $i$ in state {\sc Plain}({\sc Early\_sc\_fail}({\it res\_continuation})) can always be made to fail.
Action: update the state of $i$ to {\sc Plain}({\it res\_continuation(false)}).
\paragraph{Paired {\tt sc}}\label{omm:paired_sc}
An {\tt sc} instruction instance $i$ in state {\sc Plain}({\sc Early\_sc\_fail}({\it res\_continuation})) can continue its (potentially successful) execution if $i$ is paired with an {\tt lr}.
Action: update the state of $i$ to {\sc Plain}({\it res\_continuation(true)}).
\paragraph{Initiate memory store operation footprints}\label{omm:initiate_store_footprint}
An instruction instance $i$ in state {\sc Plain}({\sc Store\_ea}({\it kind}, {\it address}, {\it size}, {\it next\_state})) can always announce its pending memory store operation footprint.
Action:
\begin{enumerate}
\item construct the appropriate memory store operations $msos$ (without the store value):
\begin{itemize}
\item if {\it address} is aligned to {\it size} then $msos$ is a single memory store operation of {\it size} bytes to {\it address};
\item otherwise, $msos$ is a set of {\it size} memory store operations, each of one-byte size, to the addresses $\textit{address}\ldots\textit{address}+\textit{size}-1$.
\end{itemize}
\item set $i.\textit{mem\_stores}$ to $msos$; and
\item update the state of $i$ to {\sc Plain}({\it next\_state}).
\end{enumerate}
\begin{commentary}
Note that after taking the transition above the memory store operations do not yet have their values.
The importance of splitting this transition from the transition below is that it allows other program-order-successor store instructions to observe the memory footprint of this instruction, and if they don't overlap, propagate out of order as early as possible (i.e.~before the data register value becomes available).
\end{commentary}
\paragraph{Instantiate memory store operation values}\label{omm:instantiate_store_value}
An instruction instance $i$ in state {\sc Plain}({\sc Store\_memv}({\it mem\_value}, {\it store\_continuation})) can always instantiate the values of the memory store operations $i.\textit{mem\_stores}$.
Action:
\begin{enumerate}
\item split {\it mem\_value} between the memory store operations $i.\textit{mem\_stores}$; and
\item update the state of $i$ to {\sc Pending\_mem\_stores}({\it store\_continuation}).
\end{enumerate}
\paragraph{Commit store instruction}\label{omm:commit_stores}
An uncommitted instruction instance $i$ of a non-{\tt sc} store instruction or an {\tt sc} instruction in the context of the ``\nameref{omm:commit_sc}'' transition, in state {\sc Pending\_mem\_stores}({\it store\_continuation}), can be committed (not to be confused with propagated) if:
\begin{enumerate}
\item $i$ has fully determined data;
\item all program-order-previous conditional branch and indirect jump instructions are finished;
\item all program-order-previous {\tt fence} instructions with {\tt .sw} set are finished;
\item all program-order-previous {\tt fence.tso} instructions are finished;
% \item all program-order-previous {\tt fence.i} instructions are finished;
\item all program-order-previous load-acquire instructions are finished;
\item all program-order-previous store-acquire-release instructions are finished;
\item if $i$ is a store-release, all program-order-previous instructions are finished;
\item\label{omm:commit_store:prev_addrs} all program-order-previous memory access instructions have a fully determined memory footprint;
\item\label{omm:commit_store:prev_stores} all program-order-previous store instructions, except for {\tt sc} that failed, have initiated and so have non-empty {\it mem\_stores}; and
\item\label{omm:commit_store:prev_loads} all program-order-previous load instructions have initiated and so have non-empty {\it mem\_loads}.
\end{enumerate}
Action: record that $i$ is committed.
\begin{commentary}
Notice that if condition \ref{omm:commit_store:prev_addrs} is satisfied the conditions \ref{omm:commit_store:prev_stores} and \ref{omm:commit_store:prev_loads} are also satisfied, or will be satisfied after taking some eager transitions.
Hence, requiring them does not strengthen the model.
By requiring them, we guarantee that previous memory access instructions have taken enough transitions to make their memory operations visible for the condition check of \nameref{omm:prop_store}, which is the next transition the instruction will take, making that condition simpler.
\end{commentary}
\paragraph{Propagate store operation}\label{omm:prop_store}
For a committed instruction instance $i$ in state {\sc Pending\_mem\_stores}({\it store\_continuation}), and an unpropagated memory store operation $mso$ in $i.\textit{mem\_stores}$, $mso$ can be propagated if:
\begin{enumerate}
\item all memory store operations of program-order-previous store instructions that overlap with $mso$ have already propagated;
\item all memory load operations of program-order-previous load instructions that overlap with $mso$ have already been satisfied, and (the load instructions) are {\em non-restartable} (see definition below); and
\item all memory load operations that were satisfied by forwarding $mso$ are entirely satisfied.
\end{enumerate}
Where a non-finished instruction instance $j$ is {\em non-restartable} if:
\begin{enumerate}
\item there does not exist a store instruction $s$ and an unpropagated memory store operation $mso$ of $s$ such that applying the action of the ``\nameref{omm:prop_store}'' transition to $mso$ will result in the restart of $j$; and
\item there does not exist a non-finished load instruction $l$ and a memory load operation $mlo$ of $l$ such that applying the action of the ``\nameref{omm:sat_by_forwarding}''/``\nameref{omm:sat_from_mem}'' transition (even if $mlo$ is already satisfied) to $mlo$ will result in the restart of $j$.
\end{enumerate}
Action:
\begin{enumerate}
\item update the shared memory state with $mso$;
\item update $i.\textit{mem\_stores}$ to indicate that $mso$ was propagated; and
\item restart any speculative instructions which have violated coherence as a result of this, i.e., for every non-finished instruction $i'$ program-order-after $i$ and every memory load operation $mlo'$ of $i'$ that was satisfied from $msoss'$, if there exists a memory store operation slice $msos'$ in $msoss'$ that overlaps with $mso$ and is not from $mso$, and $msos'$ is not from a program-order-successor of $i$, restart $i'$ and its {\em restart-dependents} (see \nameref{omm:sat_by_forwarding}).
\end{enumerate}
\paragraph{Commit and propagate store operation of an {\tt sc}}\label{omm:commit_sc}
An uncommitted {\tt sc} instruction instance $i$, from hart $h$, in state {\sc Pending\_mem\_stores}({\it store\_continuation}), with a paired {\tt lr} $i'$ that has been satisfied by some store slices $msoss$, can be committed and propagated at the same time if:
\begin{enumerate}
\item $i'$ is finished;
\item every memory store operation that has been forwarded to $i'$ is propagated;
\item the conditions of \nameref{omm:commit_stores} is satisfied;
\item the conditions of \nameref{omm:prop_store} is satisfied (notice that an {\tt sc} instruction can only have one memory store operation); and
\item for every store slice $msos$ from $msoss$, $msos$ has not been overwritten, in the shared memory, by a store that is from a hart that is not $h$, at any point since $msos$ was propagated to memory.
\end{enumerate}
Action:
\begin{enumerate}
\item apply the actions of \nameref{omm:commit_stores}; and
\item apply the action of \nameref{omm:prop_store}.
\end{enumerate}
\paragraph{Late {\tt sc} fail}\label{omm:late_sc_fail}
An {\tt sc} instruction instance $i$ in state {\sc Pending\_mem\_stores}({\it store\_continuation}), that has not propagated its memory store operation, can always be made to fail.
Action:
\begin{enumerate}
\item clear $i.\textit{mem\_stores}$; and
\item update the state of $i$ to {\sc Plain}({\it store\_continuation(false)}).
\end{enumerate}
\begin{commentary}
For efficiency, the {\tt rmem} tool allows this transition only when it is not possible to take the \nameref{omm:commit_sc} transition.
This does not affect the set of allowed final states, but when explored interactively, if the {\tt sc} should fail one should use the \nameref{omm:early_sc_fail} transition instead of waiting for this transition.
\end{commentary}
\paragraph{Complete store operations}\label{omm:complete_stores}
A store instruction instance $i$ in state {\sc Pending\_mem\_stores}({\it store\_continuation}), for which all the memory store operations in $i.\textit{mem\_stores}$ have been propagated, can always be completed (not to be confused with finished).
Action: update the state of $i$ to {\sc Plain}({\it store\_continuation(true)}).
\paragraph{Satisfy, commit and propagate operations of an AMO}\label{omm:do_amo}
An AMO instruction instance $i$ in state {\sc Pending\_mem\_loads}({\it load\_continuation}) can perform its memory access if it is possible to perform the following sequence of transitions with no intervening transitions:
\begin{enumerate}
\item \nameref{omm:sat_from_mem}
\item \nameref{omm:complete_loads}
\item \nameref{omm:sail_interp} (zero or more times)
\item \nameref{omm:instantiate_store_value}
\item \nameref{omm:commit_stores}
\item \nameref{omm:prop_store}
\item \nameref{omm:complete_stores}
\end{enumerate}
and in addition, the condition of \nameref{omm:finish}, with the exception of not requiring $i$ to be in state {\sc Plain}({\sc Done}), holds after those transitions.
Action: perform the above sequence of transitions (this does not include \nameref{omm:finish}), one after the other, with no intervening transitions.
\begin{commentary}
Notice that program-order-previous stores cannot be forwarded to the load of an AMO.
This is simply because the sequence of transitions above does not include the forwarding transition.
But even if it did include it, the sequence will fail when trying to do the \nameref{omm:prop_store} transition, as this transition requires all program-order-previous store operations to overlapping memory footprints to be propagated, and forwarding requires the store operation to be unpropagated.
In addition, the store of an AMO cannot be forwarded to a program-order-successor load.
Before taking the transition above, the store operation of the AMO does not have its value and therefore cannot be forwarded; after taking the transition above the store operation is propagated and therefore cannot be forwarded.
\end{commentary}
\paragraph{Commit fence}\label{omm:commit_fence}
A fence instruction instance $i$ in state {\sc Plain}({\sc Fence}({\it kind}, {\it next\_state})) can be committed if:
\begin{enumerate}
\item if $i$ is a normal fence and it has {\tt .pr} set, all program-order-previous load instructions are finished;
\item if $i$ is a normal fence and it has {\tt .pw} set, all program-order-previous store instructions are finished; and
\item if $i$ is a {\tt fence.tso}, all program-order-previous load and store instructions are finished.
% \item if $i$ is a {\tt fence.i} instruction, all program-order-previous memory access instructions have fully determined memory footprints.
\end{enumerate}
Action:
\begin{enumerate}
\item record that $i$ is committed; and
\item update the state of $i$ to {\sc Plain}({\it next\_state}).
\end{enumerate}
\paragraph{Register read}\label{omm:reg_read}
An instruction instance $i$ in state {\sc Plain}({\sc Read\_reg}({\it reg\_name}, {\it read\_cont})) can do a register read of {\it reg\_name} if every instruction instance that it needs to read from has already performed the expected {\it reg\_name} register write.
Let {\it read\_sources} include, for each bit of {\it reg\_name}, the write to
that bit by the most recent (in program order) instruction instance that can write to that bit, if any. If there is no such instruction, the source is the initial register value from {\it initial\_register\_state}.
Let {\it reg\_value} be the value assembled from {\it read\_sources}.
Action:
\begin{enumerate}
\item add {\it reg\_name} to $i.\textit{reg\_reads}$ with {\it read\_sources} and {\it reg\_value}; and
\item update the state of $i$ to {\sc Plain}({\it read\_cont(reg\_value)}).
\end{enumerate}
\paragraph{Register write}\label{omm:reg_write}
An instruction instance $i$ in state {\sc Plain}({\sc Write\_reg}({\it reg\_name}, {\it reg\_value}, {\it next\_state})) can always do a {\it reg\_name} register write.
Action:
\begin{enumerate}
\item add {\it reg\_name} to $i.\textit{reg\_writes}$ with $deps$ and {\it reg\_value}; and
\item update the state of $i$ to {\sc Plain}({\it next\_state}).
\end{enumerate}
where $deps$ is a pair of the set of all {\it read\_sources} from $i.\textit{reg\_reads}$, and a flag that is true iff $i$ is a load instruction instance that has already been entirely satisfied.
\paragraph{Pseudocode internal step}\label{omm:sail_interp}
An instruction instance $i$ in state {\sc Plain}({\sc Internal}({\it next\_state})) can always do that pseudocode-internal step.
Action: update the state of $i$ to {\sc Plain}({\it next\_state}).
\paragraph{Finish instruction}\label{omm:finish}
A non-finished instruction instance $i$ in state {\sc Plain}({\sc Done}) can be finished if:
\begin{enumerate}
\item if $i$ is a load instruction:
\begin{enumerate}
\item all program-order-previous load-acquire instructions are finished;
\item all program-order-previous {\tt fence} instructions with {\tt .sr} set are finished;
\item for every program-order-previous {\tt fence.tso} instruction, $f$, that is not finished, all load instructions that are program-order-before $f$ are finished; and
\item it is guaranteed that the values read by the memory load operations of $i$ will not cause coherence violations, i.e., for any program-order-previous instruction instance $i'$, let $\textit{cfp}$ be the combined footprint of propagated memory store operations from store instructions program-order-between $i$ and $i'$, and {\em fixed memory store operations} that were forwarded to $i$ from store instructions program-order-between $i$ and $i'$ including $i'$, and let $\overline{\textit{cfp}}$ be the complement of $\textit{cfp}$ in the memory footprint of $i$.
If $\overline{\textit{cfp}}$ is not empty:
\begin{enumerate}
\item $i'$ has a fully determined memory footprint;
\item $i'$ has no unpropagated memory store operations that overlap with $\overline{\textit{cfp}}$; and
\item if $i'$ is a load with a memory footprint that overlaps with $\overline{\textit{cfp}}$, then all the memory load operations of $i'$ that overlap with $\overline{\textit{cfp}}$ are satisfied and $i'$ is {\em non-restartable} (see the \nameref{omm:prop_store} transition for how to determined if an instruction is non-restartable).
\end{enumerate}
Here, a memory store operation is called fixed if the store instruction has fully determined data.
\end{enumerate}
\item $i$ has a fully determined data; and
\item if $i$ is not a fence, all program-order-previous conditional branch and indirect jump instructions are finished.
\end{enumerate}
Action:
\begin{enumerate}
\item if $i$ is a conditional branch or indirect jump instruction, discard any untaken paths of execution, i.e., remove all instruction instances that are not reachable by the branch/jump taken in {\it instruction\_tree}; and
\item record the instruction as finished, i.e., set {\it finished} to {\it true}.
\end{enumerate}
\subsection{Limitations}\label{sec:omm:limitations}
\begin{itemize}
\item The model covers user-level RV64I and RV64A.
In particular, it does not support the misaligned atomics extension ``Zam'' or the total store ordering extension ``Ztso''.
It should be trivial to adapt the model to RV32I/A and to the G, Q and C extensions, but we have never tried it. This will involve, mostly, writing Sail code for the instructions, with minimal, if any, changes to the concurrency model.
% unsupported (system) instructions: ecall, ebreak, csrrw, csrrs, csrrc, csrrwi, csrrsi, csrrci
\item The model covers only normal memory accesses (it does not handle I/O accesses).
\item The model does not cover TLB-related effects.
\item The model assumes the instruction memory is fixed.
In particular, the \nameref{omm:fetch} transition does not generate memory load operations, and the shared memory is not involved in the transition.
Instead, the model depends on an external oracle that provides an opcode when given a memory location.
\item The model does not cover exceptions, traps and interrupts.
\end{itemize}
|