aboutsummaryrefslogtreecommitdiff
path: root/src
diff options
context:
space:
mode:
authorKrste Asanovic <krste@eecs.berkeley.edu>2018-08-06 00:43:16 -0700
committerKrste Asanovic <krste@eecs.berkeley.edu>2018-08-06 00:43:16 -0700
commit0ebf86910b43ee87d08c4cabe136d464541e1ac0 (patch)
tree98837764d3b7ccb5c9b53083e5328b8f007f8114 /src
parent61cadb9df80baac7e6da6574f7b81d8e8d97f283 (diff)
downloadriscv-isa-manual-0ebf86910b43ee87d08c4cabe136d464541e1ac0.zip
riscv-isa-manual-0ebf86910b43ee87d08c4cabe136d464541e1ac0.tar.gz
riscv-isa-manual-0ebf86910b43ee87d08c4cabe136d464541e1ac0.tar.bz2
Moved CSR instructions into separate chapter.
Diffstat (limited to 'src')
-rw-r--r--src/riscv-spec.tex1
-rw-r--r--src/rv32.tex448
-rw-r--r--src/rv32e.tex9
-rw-r--r--src/rv64.tex8
4 files changed, 111 insertions, 355 deletions
diff --git a/src/riscv-spec.tex b/src/riscv-spec.tex
index ccf6e92..8648fd5 100644
--- a/src/riscv-spec.tex
+++ b/src/riscv-spec.tex
@@ -74,6 +74,7 @@ Andrew Waterman and Krste Asanovi\'{c}, RISC-V Foundation, May 2017.
\input{rv32e}
\input{rv64}
\input{rv128}
+\input{csr}
\input{rvwmo}
\input{m}
\input{a}
diff --git a/src/rv32.tex b/src/rv32.tex
index f5aafd4..3c6727d 100644
--- a/src/rv32.tex
+++ b/src/rv32.tex
@@ -8,19 +8,19 @@ instruction set.
RV32I was designed to be sufficient to form a compiler target and to
support modern operating system environments. The ISA was also
designed to reduce the hardware required in a minimal implementation.
-RV32I contains 47 unique instructions, though a simple implementation
-might cover the eight ECALL/EBREAK/CSRR* instructions with a single
+RV32I contains 41 unique instructions, though a simple implementation
+might cover the ECALL/EBREAK instructions with a single
SYSTEM hardware instruction that always traps and might be able to
implement the FENCE and FENCE.I instructions as NOPs, reducing
-hardware instruction count to 38 total. RV32I can emulate almost any
+hardware instruction count to 37 total. RV32I can emulate almost any
other ISA extension (except the A extension, which requires additional
hardware support for atomicity).
Subsets of the base integer ISA might be useful for pedagogical
purposes, but the base has been defined such that there should be
little incentive to subset a real hardware implementation beyond
-omitting support for misaligned memory accesses and treating all SYSTEM
-instructions as a single trap.
+omitting support for misaligned memory accesses and treating all
+SYSTEM and FENCE instructions as a single trap.
\end{commentary}
\begin{commentary}
@@ -1079,326 +1079,7 @@ automatically optimize accesses depending on whether runtime addresses
are aligned.
\end{commentary}
-\section{Control and Status Register Instructions}
-\label{sec:csrinsts}
-
-SYSTEM instructions are used to access system functionality that might
-require privileged access and are encoded using the I-type instruction
-format. These can be divided into two main classes: those that
-atomically read-modify-write control and status registers (CSRs), and
-all other potentially privileged instructions. CSR instructions are
-described in this section, with the two other user-level SYSTEM
-instructions described in the following section.
-
-\begin{commentary}
-The SYSTEM instructions are defined to allow simpler implementations
-to always trap to a single software trap handler. More sophisticated
-implementations might execute more of each system instruction in
-hardware.
-\end{commentary}
-
-\subsubsection*{CSR Instructions}
-
-We define the full set of CSR instructions here, although in the standard
-user-level base ISA, only a handful of read-only counter CSRs are accessible.
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}S}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{csr} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-source/dest & source & CSRRW & dest & SYSTEM \\
-source/dest & source & CSRRS & dest & SYSTEM \\
-source/dest & source & CSRRC & dest & SYSTEM \\
-source/dest & uimm[4:0] & CSRRWI & dest & SYSTEM \\
-source/dest & uimm[4:0] & CSRRSI & dest & SYSTEM \\
-source/dest & uimm[4:0] & CSRRCI & dest & SYSTEM \\
-\end{tabular}
-\end{center}
-
-The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values
-in the CSRs and integer registers. CSRRW reads the old value of the
-CSR, zero-extends the value to XLEN bits, then writes it to integer
-register {\em rd}. The initial value in {\em rs1} is written to the
-CSR. If {\em rd}={\tt x0}, then the instruction shall not read the CSR
-and shall not cause any of the side-effects that might occur on a CSR
-read.
-
-The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the
-value of the CSR, zero-extends the value to XLEN bits, and writes it
-to integer register {\em rd}. The initial value in integer register
-{\em rs1} is treated as a bit mask that specifies bit positions to be
-set in the CSR. Any bit that is high in {\em rs1} will cause the
-corresponding bit to be set in the CSR, if that CSR bit is writable.
-Other bits in the CSR are unaffected (though CSRs might have side
-effects when written).
-
-The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
-value of the CSR, zero-extends the value to XLEN bits, and writes it
-to integer register {\em rd}. The initial value in integer register
-{\em rs1} is treated as a bit mask that specifies bit positions to be
-cleared in the CSR. Any bit that is high in {\em rs1} will cause the
-corresponding bit to be cleared in the CSR, if that CSR bit is
-writable. Other bits in the CSR are unaffected.
-
-For both CSRRS and CSRRC, if {\em rs1}={\tt x0}, then the instruction
-will not write to the CSR at all, and so shall not cause any of the
-side effects that might otherwise occur on a CSR write, such as
-raising illegal instruction exceptions on accesses to read-only CSRs.
-Note that if {\em rs1} specifies a register holding a zero value other
-than {\tt x0}, the instruction will still attempt to write the
-unmodified value back to the CSR and will cause any attendant side effects.
-
-The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS,
-and CSRRC respectively, except they update the CSR using an XLEN-bit
-value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field
-encoded in the {\em rs1} field instead of a value from an integer
-register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then
-these instructions will not write to the CSR, and shall not cause any
-of the side effects that might otherwise occur on a CSR write. For
-CSRRWI, if {\em rd}={\tt x0}, then the instruction shall not read the
-CSR and shall not cause any of the side-effects that might occur on a
-CSR read.
-
-Some CSRs, such as the instructions retired counter, {\tt instret}, may be
-modified as side effects of instruction execution. In these cases, if a CSR
-access instruction reads a CSR, it reads the value prior to the execution of
-the instruction. If a CSR access instruction writes a CSR, the update occurs
-after the execution of the instruction. In particular, a value written to
-{\tt instret} by one instruction will be the value read by the following
-instruction (i.e., the increment of {\tt instret} caused by the first
-instruction retiring happens before the write of the new value).
-
-The assembler pseudoinstruction to read a CSR, CSRR {\em rd, csr}, is
-encoded as CSRRS {\em rd, csr, x0}. The assembler pseudoinstruction
-to write a CSR, CSRW {\em csr, rs1}, is encoded as CSRRW {\em x0, csr,
- rs1}, while CSRWI {\em csr, uimm}, is encoded as CSRRWI {\em x0,
- csr, uimm}.
-
-Further assembler pseudoinstructions are defined to set and clear
-bits in the CSR when the old value is not required: CSRS/CSRC {\em
- csr, rs1}; CSRSI/CSRCI {\em csr, uimm}.
-
-\subsubsection*{Timers and Counters}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}S}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{csr} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-RDCYCLE[H] & 0 & CSRRS & dest & SYSTEM \\
-RDTIME[H] & 0 & CSRRS & dest & SYSTEM \\
-RDINSTRET[H] & 0 & CSRRS & dest & SYSTEM \\
-\end{tabular}
-\end{center}
-
-RV32I provides a number of 64-bit read-only user-level counters, which
-are mapped into the 12-bit CSR address space and accessed in 32-bit
-pieces using CSRRS instructions.
-
-\begin{commentary}
-Some execution environments might prohibit access to counters to
-impede timing side-channel attacks.
-\end{commentary}
-
-The RDCYCLE pseudoinstruction reads the low XLEN bits of the {\tt
- cycle} CSR which holds a count of the number of clock cycles
-executed by the processor core on which the hart is running from
-an arbitrary start time in the past. RDCYCLEH is
-an RV32I-only instruction that reads bits 63--32 of the same cycle
-counter. The underlying 64-bit counter should never overflow in
-practice. The rate at which the cycle counter advances will depend on
-the implementation and operating environment. The execution
-environment should provide a means to determine the current rate
-(cycles/second) at which the cycle counter is incrementing.
-
-\begin{commentary}
-RDCYCLE is intended to return the number of cycles executed by the
-processor core, not the hart. Precisely defining what is a ``core'' is
-difficult given some implementation choices (e.g., AMD Bulldozer).
-Precisely defining what is a ``clock cycle'' is also difficult given the
-range of implementations (including software emulations), but the
-intent is that RDCYCLE is used for performance monitoring along with the
-other performance counters. In particular, where there is one
-hart/core, one would expect cycle-count/instructions-retired to
-measure CPI for a hart.
-
-Cores don't have to be exposed to software at all, and an implementor
-might choose to pretend multiple harts on one physical core are
-running on separate cores with one hart/core, and provide separate
-cycle counters for each hart. This might make sense in a simple
-barrel processor (e.g., CDC 6600 peripheral processors) where
-inter-hart timing interactions are non-existent or minimal.
-
-Where there is more than one hart/core and dynamic multithreading, it
-is not generally possible to separate out cycles per hart (especially
-with SMT). It might be possible to define a separate performance
-counter that tried to capture the number of cycles a particular hart
-was running, but this definition would have to be very fuzzy to cover
-all the possible threading implementations. For example, should we
-only count cycles for which any instruction was issued to execution
-for this hart, and/or cycles any instruction retired, or include
-cycles this hart was occupying machine resources but couldn't execute
-due to stalls while other harts went into execution? Likely, ``all of
-the above'' would be needed to have understandable performance stats.
-This complexity of defining a per-hart cycle count, and also the need
-in any case for a total per-core cycle count when tuning multithreaded
-code led to just standardizing the per-core cycle counter, which also
-happens to work well for the common single hart/core case.
-
-Standardizing what happens during ``sleep'' is not practical given
-that what ``sleep'' means is not standardized across execution
-environments, but if the entire core is paused (entirely clock-gated
-or powered-down in deep sleep), then it is not executing clock cycles,
-and the cycle count shouldn't be increasing per the spec. There are
-many details, e.g., whether clock cycles required to reset a processor
-after waking up from a power-down event should be counted, and these
-are considered execution-environment-specific details.
-
-Even though there is no precise definition that works for all
-platforms, this is still a useful facility for most platforms, and an
-imprecise, common, ``usually correct'' standard here is better than no
-standard. The intent of RDCYCLE was primarily performance
-monitoring/tuning, and the specification was written with that goal in
-mind.
-\end{commentary}
-
-The RDTIME pseudoinstruction reads the low XLEN bits of the {\tt
- time} CSR, which counts wall-clock real time that has passed from an
-arbitrary start time in the past. RDTIMEH is an RV32I-only instruction
-that reads bits 63--32 of the same real-time counter. The underlying 64-bit
-counter should never overflow in practice. The execution environment
-should provide a means of determining the period of the real-time
-counter (seconds/tick). The period must be constant. The
-real-time clocks of all harts in a single user application
-should be synchronized to within one tick of the real-time clock. The
-environment should provide a means to determine the accuracy of the
-clock.
-
-\begin{commentary}
-On some simple platforms, cycle count might represent a valid
-implementation of RDTIME, but in this case, platforms should implement
-the RDTIME instruction as an alias for RDCYCLE to make code more
-portable, rather than using RDCYCLE to measure wall-clock time.
-\end{commentary}
-
-The RDINSTRET pseudoinstruction reads the low XLEN bits of the {\tt
- instret} CSR, which counts the number of instructions retired by
-this hart from some arbitrary start point in the past. RDINSTRETH is
-an RV32I-only instruction that reads bits 63--32 of the same
-instruction counter. The underlying 64-bit counter that should never
-overflow in practice.
-
-The following code sequence will read a valid 64-bit cycle counter value into
-{\tt x3}:{\tt x2}, even if the counter overflows between reading its upper
-and lower halves.
-
-\begin{figure}[h!]
-\begin{center}
-\begin{verbatim}
- again:
- rdcycleh x3
- rdcycle x2
- rdcycleh x4
- bne x3, x4, again
-\end{verbatim}
-\end{center}
-\caption{Sample code for reading the 64-bit cycle counter in RV32.}
-\label{rdcycle}
-\end{figure}
-
-\begin{commentary}
-We would like these basic counters be provided in all implementations as
-they are essential for basic performance analysis, adaptive and
-dynamic optimization, and to allow an application to work with
-real-time streams. Additional counters should be provided to help
-diagnose performance problems and these should be made accessible from
-user-level application code with low overhead.
-
-We required the counters be 64 bits wide, even on RV32, as otherwise
-it is very difficult for software to determine if values have
-overflowed. For a low-end implementation, the upper 32 bits of each
-counter can be implemented using software counters incremented by a
-trap handler triggered by overflow of the lower 32 bits. The sample
-code described above shows how the full 64-bit width value can be
-safely read using the individual 32-bit instructions.
-
-In some applications, it is important to be able to read multiple
-counters at the same instant in time. When run under a multitasking
-environment, a user thread can suffer a context switch while
-attempting to read the counters. One solution is for the user thread
-to read the real-time counter before and after reading the other
-counters to determine if a context switch occurred in the middle of the
-sequence, in which case the reads can be retried. We considered
-adding output latches to allow a user thread to snapshot the counter
-values atomically, but this would increase the size of the user
-context, especially for implementations with a richer set of counters.
-\end{commentary}
-
-
-\section{Environment Call and Breakpoints}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}S}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{funct12} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-ECALL & 0 & PRIV & 0 & SYSTEM \\
-EBREAK & 0 & PRIV & 0 & SYSTEM \\
-\end{tabular}
-\end{center}
-
-The ECALL instruction is used to make a request to the supporting
-execution environment, which is usually an operating system. The ABI
-for the system will define how parameters for the environment request
-are passed, but usually these will be in defined locations in the
-integer register file.
-
-The EBREAK instruction is used by debuggers to cause control to be
-transferred back to a debugging environment.
-
-\begin{commentary}
-ECALL and EBREAK were previously named SCALL and SBREAK. The
-instructions have the same functionality and encoding, but were
-renamed to reflect that they can be used more generally than to call a
-supervisor-level operating system or debugger.
-\end{commentary}
+\pagebreak
\section{Memory Ordering Instructions}
\label{sec:fence}
@@ -1451,15 +1132,15 @@ hart or external device can observe any operation in the {\em
Chapter~\ref{ch:memorymodel} provides a precise description of the
RISC-V memory consistency model.
-The execution environment
-will define what I/O operations are possible, and in particular, which
-load and store instructions might be treated and ordered as device
-input and device output operations respectively rather than memory
-reads and writes. For example, memory-mapped I/O devices will
-typically be accessed with uncached loads and stores that are ordered
-using the I and O bits rather than the R and W bits. Instruction-set
-extensions might also describe new coprocessor I/O instructions that
-will also be ordered using the I and O bits in a FENCE.
+The EEI will define what I/O operations are possible, and in
+particular, which load and store instructions might be treated and
+ordered as device input and device output operations respectively
+rather than memory reads and writes. For example, memory-mapped I/O
+devices will typically be accessed with uncached loads and stores that
+are ordered using the I and O bits rather than the R and W bits.
+Instruction-set extensions might also describe new coprocessor I/O
+instructions that will also be ordered using the I and O bits in a
+FENCE.
\begin{table}[htp]
\begin{small}
@@ -1546,10 +1227,11 @@ forward compatibility, base implementations shall ignore these fields, and
standard software shall zero these fields.
\begin{commentary}
-Because FENCE.I only orders stores with a hart's own instruction fetches,
-application code should only rely upon FENCE.I if the application thread will
-not be migrated to a different hart. The ABI will provide mechanisms for
-multiprocessor instruction-stream synchronization.
+Because FENCE.I only orders stores with a hart's own instruction
+fetches, application code should only rely upon FENCE.I if the
+application thread will not be migrated to a different hart. The EEI
+can provide mechanisms for efficient multiprocessor instruction-stream
+synchronization.
\end{commentary}
\begin{commentary}
@@ -1571,6 +1253,95 @@ instructions to memory regions that are known not to reside in the
I-cache.
\end{commentary}
+\section{Environment Call and Breakpoints}
+
+SYSTEM instructions are used to access system functionality that might
+require privileged access and are encoded using the I-type instruction
+format. These can be divided into two main classes: those that
+atomically read-modify-write control and status registers (CSRs), and
+all other potentially privileged instructions. CSR instructions are
+described in Chapter~\ref{csrinsts}, and the base unprivileged instructions
+are described in the following section.
+
+\begin{commentary}
+The SYSTEM instructions are defined to allow simpler implementations
+to always trap to a single software trap handler. More sophisticated
+implementations might execute more of each system instruction in
+hardware.
+\end{commentary}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{M@{}R@{}F@{}R@{}S}
+\\
+\instbitrange{31}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct12} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{funct3} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+12 & 5 & 3 & 5 & 7 \\
+ECALL & 0 & PRIV & 0 & SYSTEM \\
+EBREAK & 0 & PRIV & 0 & SYSTEM \\
+\end{tabular}
+\end{center}
+
+There two instructions cause a precise requested trap to the
+supporting execution environment.
+
+The ECALL instruction is used to make a service request to the
+execution environment. The EEI will define how parameters for the
+service request are passed, but usually these will be in defined
+locations in the integer register file.
+
+The EBREAK instruction is used to return control to a debugging
+environment.
+
+\begin{commentary}
+ECALL and EBREAK were previously named SCALL and SBREAK. The
+instructions have the same functionality and encoding, but were
+renamed to reflect that they can be used more generally than to call a
+supervisor-level operating system or debugger.
+\end{commentary}
+
+\begin{commentary}
+ EBREAK was primarily designed to be used by a debugger to cause
+ execution to stop and fall back into the debugger. EBREAK is also
+ used by the standard gcc compiler to mark code paths that should not
+ be executed.
+
+ Another use of EBREAK is to support ``semihosting'', where the
+ execution environment includes a debugger that can provide services
+ over an alternate system call interface built around the EBREAK
+ instruction. Because the RISC-V base ISA does not provide more than
+ one EBREAK instruction, RISC-V semihosting uses a special sequence of
+ instructions to distinguish a semihosting EBREAK from a debugger
+ inserted EBREAK.
+\begin{verbatim}
+ slli x0, x0, 0x1f # Entry NOP
+ ebreak # Break to debugger
+ srai x0, x0, 7 # NOP encoding the semihosting call number 7
+\end{verbatim}
+ The shift NOP instructions are still considered available for use as
+ HINTS.
+
+ Semihosting is a form of service call and would be more naturally
+ encoded as an ECALL using an existing ABI, but this would require
+ the debugger to be able to intercept ECALLs, which is a newer
+ addition to the debug standard. We intend to move over to using
+ ECALLs with a standard ABI, in which case, semihosting can share a
+ service ABI with an existing standard.
+
+ We note that ARM processors have also moved to using SVC instead of
+ BKPT for semihosting calls in newer designs.
+\end{commentary}
+
\section{HINT Instructions}
\label{sec:rv32i-hints}
@@ -1637,3 +1408,4 @@ simulation/emulation.
\caption{RV32I HINT instructions.}
\label{tab:rv32i-hints}
\end{table}
+
diff --git a/src/rv32e.tex b/src/rv32e.tex
index 2c08be5..7682f03 100644
--- a/src/rv32e.tex
+++ b/src/rv32e.tex
@@ -50,15 +50,6 @@ bits freed up by the reduced register-specifier fields and so these
are available for non-standard extensions.
\end{commentary}
-A further simplification is that the counter instructions ({\tt
- rdcycle[h]},{\tt rdtime[h]}, {\tt rdinstret[h]}) are no longer
-mandatory.
-
-\begin{commentary}
-The mandatory counters require additional registers and logic, and can
-be replaced with more application-specific facilities.
-\end{commentary}
-
\section{RV32E Extensions}
RV32E can be extended with the M, A, and C user-level standard extensions.
diff --git a/src/rv64.tex b/src/rv64.tex
index bac88cb..e4bfb3c 100644
--- a/src/rv64.tex
+++ b/src/rv64.tex
@@ -244,14 +244,6 @@ values, as are LB and LBU for 8-bit values. The SD, SW, SH, and SB
instructions store 64-bit, 32-bit, 16-bit, and 8-bit values from the
low bits of register {\em rs2} to memory respectively.
-\section{System Instructions}
-
-In RV64I, the CSR instructions can manipulate 64-bit CSRs. In particular, the
-RDCYCLE, RDTIME, and RDINSTRET pseudoinstructions read the full 64 bits of
-the {\tt cycle}, {\tt time}, and {\tt instret} counters. Hence, the RDCYCLEH,
-RDTIMEH, and RDINSTRETH instructions are not necessary and are illegal in
-RV64I.
-
\section{HINT Instructions}
\label{sec:rv64i-hints}