== "Zawrs" Standard extension for Wait-on-Reservation-Set instructions, version 1.01

The Zawrs extension defines a pair of instructions to be used in polling loops
that allow a core to enter a low-power state and wait on a store to a memory
location. Waiting for a memory location to be updated is a common pattern in
many use cases, such as:

. Contenders for a lock waiting for the lock variable to be updated.

. Consumers waiting on the tail of an empty queue for the producer to queue
  work/data. The producer may be code executing on a RISC-V hart, an accelerator
  device, or an external I/O agent.

. Code waiting on a flag to be set in memory indicative of an event occurring. 
  For example, software on a RISC-V hart may wait on a "done" flag to be set in
  memory by an accelerator device indicating completion of a job previously 
  submitted to the device.

Such use cases involve polling on memory locations, and such busy loops can be a
wasteful expenditure of energy. To mitigate the wasteful looping in such usages,
a `WRS.NTO` (WRS-with-no-timeout) instruction is provided. Instead of polling 
for a store to a specific memory location, software registers a reservation set
that includes all the bytes of the memory location using the `LR` instruction. 
Then a subsequent `WRS.NTO` instruction would cause the hart to temporarily 
stall execution in a low-power state until a store occurs to the reservation set
or an interrupt is observed.
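
The pattern described above can be sketched in C. This is only an illustrative
sketch: the function names (`load_reserved`, `wrs_nto`, `wait_until_nonzero`)
are hypothetical, the instruction is emitted as a raw word on the assumption
that the assembler may not know the mnemonic, and on non-RISC-V hosts the
helpers fall back to a plain load and a no-op so the loop degenerates to
ordinary polling. Memory-ordering annotations are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical helper: LR.W registers a reservation set covering *addr. */
static inline uint32_t load_reserved(volatile uint32_t *addr)
{
#if defined(__riscv)
    uint32_t value;
    __asm__ volatile("lr.w %0, (%1)" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    return *addr; /* host fallback: plain load, no reservation */
#endif
}

/* Hypothetical helper: WRS.NTO, emitted as a raw instruction word. */
static inline void wrs_nto(void)
{
#if defined(__riscv)
    __asm__ volatile(".4byte 0x00d00073" ::: "memory");
#endif
    /* host fallback: no-op, so the caller simply polls */
}

/* Wait until *flag becomes nonzero and return the value observed. */
uint32_t wait_until_nonzero(volatile uint32_t *flag)
{
    uint32_t value;

    /* Re-check after every wakeup: the stall may end for reasons other
     * than a store to the reservation set. */
    while ((value = load_reserved(flag)) == 0)
        wrs_nto();
    return value;
}
```

The condition is re-evaluated with a fresh `LR` on every iteration, so an early
or spurious return from `WRS.NTO` costs only one extra pass through the loop.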

Sometimes the program waiting on a memory update may also need to carry out a
task at a future time or otherwise place an upper bound on the wait. To support
such use cases, a second instruction, `WRS.STO` (WRS-with-short-timeout), is
provided. It works like `WRS.NTO` but bounds the stall duration to an
implementation-defined short timeout, so that the stall terminates at the
timeout if no other condition has terminated it first. The
program using this instruction may then determine if its deadline has been 
reached.
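
A bounded wait of this kind might look like the following sketch. All names
here are hypothetical: `wait_nonzero_until` takes the deadline and a
caller-supplied clock callback, and `demo_now` is a stand-in counter used only
for illustration (real code would read a platform timer). As in any host-side
sketch, the RISC-V instructions are guarded and replaced by a plain load and a
no-op elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in clock: advances by one tick on every read. */
static uint64_t demo_ticks;
uint64_t demo_now(void) { return demo_ticks++; }

/* Hypothetical helper: LR.W registers a reservation set covering *addr. */
static inline uint32_t lr_w(volatile uint32_t *addr)
{
#if defined(__riscv)
    uint32_t value;
    __asm__ volatile("lr.w %0, (%1)" : "=r"(value) : "r"(addr) : "memory");
    return value;
#else
    return *addr; /* host fallback: plain load, no reservation */
#endif
}

/* Hypothetical helper: WRS.STO, emitted as a raw instruction word. */
static inline void wrs_sto(void)
{
#if defined(__riscv)
    __asm__ volatile(".4byte 0x01d00073" ::: "memory");
#endif
    /* host fallback: no-op, so the caller simply polls */
}

/* Returns true once *flag is nonzero, false if the deadline passes first. */
bool wait_nonzero_until(volatile uint32_t *flag, uint64_t deadline,
                        uint64_t (*now)(void))
{
    while (lr_w(flag) == 0) {
        if (now() >= deadline)
            return false; /* deadline reached while still waiting */
        wrs_sto();        /* stall for at most the short timeout */
    }
    return true;
}
```

If reading the clock happens to invalidate the reservation, `WRS.STO` has no
valid reservation set to wait on and should return promptly; the next `LR`
registers the reservation again, so the loop stays correct but polls more
often.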

[NOTE]
====
The instructions in the Zawrs extension are only useful in conjunction with the
LR instructions, which are provided by the A extension, and which we also expect
to be provided by a narrower Zalrsc extension in the future.
====

[[Zawrs]]
=== Wait-on-Reservation-Set Instructions

The `WRS.NTO` and `WRS.STO` instructions cause the hart to temporarily stall
execution in a low-power state as long as the reservation set is valid and no
pending interrupts, even if disabled, are observed. For `WRS.STO`, the stall
duration is bounded by an implementation-defined short timeout. These
instructions are available in all privilege modes. These instructions are not
supported in a constrained `LR`/`SC` loop.

[wavedrom, ,svg]
....
{reg: [
  {bits: 7, name: 'opcode', attr: ['SYSTEM(0x73)'] },
  {bits: 5, name: 'rd', attr: ['0'] },
  {bits: 3,  name: 'funct3', attr: ['0'] },
  {bits: 5,  name: 'rs1', attr: ['0'] },
  {bits: 12,  name: 'funct12', attr:['WRS.NTO(0x0d)', 'WRS.STO(0x1d)'] },
], config:{lanes: 1, hspace:1024}}
....
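
As a cross-check of the field layout in the diagram, the 32-bit instruction
words can be assembled from the fields directly. The function and constant
names below are illustrative only; the field positions and values are taken
from the encoding above.

```c
#include <assert.h>
#include <stdint.h>

enum {
    WRS_NTO_FUNCT12 = 0x00d,
    WRS_STO_FUNCT12 = 0x01d
};

/* Assemble a WRS instruction word from the fields in the encoding diagram. */
uint32_t encode_wrs(uint32_t funct12)
{
    const uint32_t opcode = 0x73; /* SYSTEM, bits 6:0   */
    const uint32_t rd     = 0;    /* bits 11:7          */
    const uint32_t funct3 = 0;    /* bits 14:12         */
    const uint32_t rs1    = 0;    /* bits 19:15         */

    assert(funct12 <= 0xfff);     /* funct12 occupies bits 31:20 */
    return opcode | (rd << 7) | (funct3 << 12) | (rs1 << 15) | (funct12 << 20);
}
```

With these fields, `encode_wrs(WRS_NTO_FUNCT12)` yields `0x00d00073` and
`encode_wrs(WRS_STO_FUNCT12)` yields `0x01d00073`.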

Hart execution may be stalled while the following conditions are all satisfied:
[loweralpha]
    . The reservation set is valid 
    . For `WRS.STO`, the "short" timeout duration since the start of the stall has not elapsed
    . No pending interrupt is observed (see the rules below)
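
The three conditions can be read as a single predicate. The following is a
behavioral model for illustration only, with hypothetical names; it is not a
description of any hardware implementation. A `true` result means the hart is
permitted to remain stalled, not that it must.

```c
#include <stdbool.h>

/* Returns true while the hart is permitted to remain stalled. */
bool wrs_may_stall(bool reservation_valid,
                   bool is_sto,
                   bool short_timeout_elapsed,
                   bool interrupt_observed)
{
    if (!reservation_valid)
        return false;               /* condition (a) fails */
    if (is_sto && short_timeout_elapsed)
        return false;               /* condition (b) fails, WRS.STO only */
    if (interrupt_observed)
        return false;               /* condition (c) fails */
    return true;
}
```

For `WRS.NTO` the timeout input is ignored, which is why the second check is
qualified by `is_sto`.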

While stalled, an implementation is permitted to occasionally terminate the 
stall and complete execution for any reason. 

`WRS.NTO` and `WRS.STO` instructions follow the rules of the `WFI` instruction
for resuming execution on a pending interrupt.

When the `TW` (Timeout Wait) bit in `mstatus` is set and `WRS.NTO` is executed
in any privilege mode other than M-mode, and it does not complete within an
implementation-specific bounded time limit, the `WRS.NTO` instruction will cause
an illegal instruction exception.

When executing in VS or VU mode, if the `VTW` bit is set in `hstatus`, the 
`TW` bit in `mstatus` is clear, and the `WRS.NTO` does not complete within an 
implementation-specific bounded time limit, the `WRS.NTO` instruction will cause
a virtual instruction exception.
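
The two trapping rules for `WRS.NTO` can be summarized as a small decision
function. This is a sketch under stated assumptions: the enum and function
names are hypothetical, `virt` stands for executing in VS or VU mode, and
`limit_exceeded` stands for the instruction not completing within the
implementation-specific bounded time limit.

```c
#include <stdbool.h>

enum wrs_trap {
    WRS_TRAP_NONE,
    WRS_TRAP_ILLEGAL_INSTRUCTION,
    WRS_TRAP_VIRTUAL_INSTRUCTION
};

enum priv_mode { PRIV_U, PRIV_S, PRIV_M };

/* Which exception, if any, WRS.NTO raises under the two rules above. */
enum wrs_trap wrs_nto_trap(enum priv_mode mode, bool virt,
                           bool tw, bool vtw, bool limit_exceeded)
{
    if (!limit_exceeded)
        return WRS_TRAP_NONE;       /* completed within the time limit */

    /* mstatus.TW set and not in M-mode (this includes VS and VU). */
    if (tw && mode != PRIV_M)
        return WRS_TRAP_ILLEGAL_INSTRUCTION;

    /* VS or VU mode, hstatus.VTW set, mstatus.TW clear. */
    if (virt && vtw && !tw)
        return WRS_TRAP_VIRTUAL_INSTRUCTION;

    return WRS_TRAP_NONE;
}
```

The first rule is checked before the second because the virtual instruction
case applies only when `TW` is clear.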

[NOTE]
====
Since the `WRS.STO` and `WRS.NTO` instructions can complete execution for 
reasons other than stores to the reservation set, software will likely need 
a means of looping until the required stores have occurred.

The duration of a `WRS.STO` instruction's timeout may vary significantly within 
and among implementations. In typical implementations this duration should be 
roughly in the range of 10 to 100 times an on-chip cache miss latency or a 
cacheless access to main memory.

`WRS.NTO`, unlike `WFI`, is not specified to cause an illegal instruction
exception if executed in U-mode when the governing `TW` bit is 0. `WFI` is
typically not expected to be used in U-mode and on many systems may promptly
cause an illegal instruction exception if used at U-mode. Unlike `WFI`,
`WRS.NTO` is expected to be used by software in U-mode when waiting on
memory but without a deadline for that wait.
====