[[zifencei]]
== "Zifencei" Extension for Instruction-Fetch Fence, Version 2.0
This chapter defines the "Zifencei" extension, which includes the
FENCE.I instruction that provides explicit synchronization between
writes to instruction memory and instruction fetches on the same hart.
Currently, this instruction is the only standard mechanism to ensure
that stores visible to a hart will also be visible to its instruction
fetches.
(((store instruction word, not included)))

[NOTE]
====
We considered but did not include a "store instruction word"
instruction as in cite:[majc]. JIT compilers may generate a large trace of
instructions before a single FENCE.I, and amortize any instruction cache
snooping/invalidation overhead by writing translated instructions to
memory regions that are known not to reside in the I-cache.
====
'''
[TIP]
====
The FENCE.I instruction was designed to support a wide variety of
implementations. A simple implementation can flush the local instruction
cache and the instruction pipeline when the FENCE.I is executed. A more
complex implementation might snoop the instruction (data) cache on every
data (instruction) cache miss, or use an inclusive unified private L2
cache to invalidate lines from the primary instruction cache when they
are being written by a local store instruction. If instruction and data
caches are kept coherent in this way, or if the memory system consists
of only uncached RAMs, then just the fetch pipeline needs to be flushed
at a FENCE.I.

The FENCE.I instruction was previously part of the base I instruction
set. Two main issues motivated moving it out of the mandatory base,
although at the time of writing it remains the only standard method for
maintaining instruction-fetch coherence.

First, it has been recognized that on some systems FENCE.I will be
expensive to implement, and alternative mechanisms are being discussed in
the memory model task group. In particular, for designs that have an
incoherent instruction cache and an incoherent data cache, or where the
instruction cache refill does not snoop a coherent data cache, both
caches must be completely flushed when a FENCE.I instruction is
encountered. This problem is exacerbated when there are multiple levels
of I and D cache in front of a unified cache or outer memory system.

Second, the instruction is not powerful enough to be made available at
user level in a Unix-like operating system environment. FENCE.I only
synchronizes the local hart, and the OS can reschedule the user hart to
a different physical hart after the FENCE.I. This would require the OS
to execute an additional FENCE.I as part of every context migration. For
this reason, the standard Linux ABI has removed FENCE.I from user-level
and now requires a system call to maintain instruction-fetch coherence,
which allows the OS to minimize the number of FENCE.I executions
required on current systems and provides forward-compatibility with
future improved instruction-fetch coherence mechanisms.

Future approaches to instruction-fetch coherence under discussion
include providing more restricted versions of FENCE.I that only target a
given address specified in _rs1_, and/or allowing software to use an ABI
that relies on machine-mode cache-maintenance operations.
====

include::images/wavedrom/zifencei-ff.adoc[]
[[zifencei-ff]]
//.FENCE.I instruction
(((FENCE.I, synchronization)))

The FENCE.I instruction is used to synchronize the instruction and data
streams. RISC-V does not guarantee that stores to instruction memory
will be made visible to instruction fetches on a RISC-V hart until that
hart executes a FENCE.I instruction. A FENCE.I instruction ensures that
a subsequent instruction fetch on a RISC-V hart will see any previous
data stores already visible to the same RISC-V hart. FENCE.I does _not_
ensure that other RISC-V harts' instruction fetches will observe the
local hart's stores in a multiprocessor system. To make a store to
instruction memory visible to all RISC-V harts, the writing hart also
has to execute a data FENCE before requesting that all remote RISC-V
harts execute a FENCE.I.
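
The two sequences described above can be sketched in C as follows. This is
a minimal illustration, not a reference implementation: it assumes GCC or
Clang extended inline assembly and a RISC-V assembler with Zifencei
enabled, and on non-RISC-V hosts the fences degrade to compiler barriers so
that the sketch still builds. The names `patch_local`, `patch_global`, and
`demo_remote_fence_i` are hypothetical; in particular, the way remote harts
are asked to execute FENCE.I (for example, an interprocessor interrupt) is
platform-specific and is only stubbed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FENCE.I on RISC-V; elsewhere only a compiler barrier (assumption, so the sketch builds). */
static inline void fence_i(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence.i" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Data FENCE ordering earlier reads and writes before later ones. */
static inline void data_fence(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence rw, rw" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Hypothetical hook: asks every remote hart to execute FENCE.I. */
typedef void (*remote_fence_i_fn)(void);

/* Illustrative stub: counts requests instead of signalling other harts. */
static int demo_remote_requests = 0;
static inline void demo_remote_fence_i(void)
{
    demo_remote_requests++;
}

/* Local case: store the instruction word, then FENCE.I on the same hart. */
static inline void patch_local(volatile uint32_t *slot, uint32_t insn)
{
    assert(slot != NULL);
    *slot = insn;   /* ordinary data store to instruction memory */
    fence_i();      /* later fetches on this hart observe the store */
}

/* Multi-hart case: store, data FENCE, then request remote FENCE.I. */
static inline void patch_global(volatile uint32_t *slot, uint32_t insn,
                                remote_fence_i_fn request_remote_fence_i)
{
    assert(slot != NULL && request_remote_fence_i != NULL);
    *slot = insn;
    data_fence();               /* executed before the remote request */
    request_remote_fence_i();   /* each remote hart executes FENCE.I */
    fence_i();                  /* the writing hart's own fetches, as in patch_local */
}
```

A caller would typically use `patch_local` only when the patched code runs
on the writing hart, and `patch_global` with a real platform hook in place
of the counting stub otherwise.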

The unused fields in the FENCE.I instruction, _imm[11:0]_, _rs1_, and
_rd_, are reserved for finer-grain fences in future extensions. For
forward compatibility, base implementations shall ignore these fields,
and standard software shall zero these fields.
(((FENCE.I, finer-grained)))
(((FENCE.I, forward compatibility)))
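
The rule for the reserved fields can be illustrated with a small
decode-side sketch in C. It assumes the standard FENCE.I encoding (major
opcode MISC-MEM, `0001111`, with _funct3_ = `001`), which is not spelled out
in this text; the macro and function names are hypothetical. A base
implementation matches only on _funct3_ and the opcode, so nonzero
_imm[11:0]_, _rs1_, or _rd_ values are ignored, while standard software
emits the all-zero form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed standard encoding: opcode MISC-MEM (0001111), funct3 = 001. */
#define FENCE_I_MATCH 0x0000100Fu
/* Only funct3 (bits 14:12) and opcode (bits 6:0) identify the instruction. */
#define FENCE_I_MASK  0x0000707Fu

/* Decoder view: imm[11:0], rs1, and rd are ignored. */
static inline bool is_fence_i(uint32_t insn)
{
    return (insn & FENCE_I_MASK) == FENCE_I_MATCH;
}

/* Software view: the reserved fields must all be zero. */
static inline bool is_canonical_fence_i(uint32_t insn)
{
    return insn == FENCE_I_MATCH;
}

/* Clears the reserved fields of a word already known to be FENCE.I. */
static inline uint32_t canonical_fence_i(uint32_t insn)
{
    assert(is_fence_i(insn));
    return insn & FENCE_I_MASK;
}
```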

[NOTE]
====
Because FENCE.I only orders stores with a hart's own instruction
fetches, application code should only rely upon FENCE.I if the
application thread will not be migrated to a different hart. The EEI can
provide mechanisms for efficient multiprocessor instruction-stream
synchronization.
====
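
As a rough illustration of the advice above, portable application code can
avoid issuing FENCE.I directly and instead call a facility supplied by the
toolchain or execution environment. The sketch below uses the GCC/Clang
builtin `__builtin___clear_cache`; how a particular RISC-V toolchain lowers
this builtin (a direct FENCE.I, a library call, or a system call) is an
assumption to verify for the target environment. The helper name
`publish_code` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies freshly generated instruction bytes into place, then asks the
 * toolchain-provided facility to synchronize instruction fetches for
 * the written range.
 */
static inline void publish_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

Routing the request through the environment leaves the choice of mechanism,
including any handling of thread migration, to the EEI rather than to the
application.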