author: Krste Asanovic <krste@eecs.berkeley.edu> 2018-08-06 00:51:03 -0700
committer: Krste Asanovic <krste@eecs.berkeley.edu> 2018-08-06 00:51:03 -0700
commit: 695d84330e22b39a0b8cdfc0c54bb0e390bbcefa
tree: 8d3176f338c85c1ef6880999b09b29c7aec159dc
parent: 4845f5d61f96a827ec4d21d2c5a80b6bf7881e56
CSR instructions file.
-rw-r--r-- src/csr.tex 277
1 files changed, 277 insertions, 0 deletions
diff --git a/src/csr.tex b/src/csr.tex
new file mode 100644
index 0000000..16877ad
--- /dev/null
+++ b/src/csr.tex
@@ -0,0 +1,277 @@
+\chapter{Control and Status Register (CSR) Instructions}
+\label{csrinsts}
+
+This chapter defines the full set of CSR instructions, although the
+control and status registers are primarily used by the privileged
+architecture. CSRs also have several uses in unprivileged code,
+including counters, timers, and floating-point status.
+
+\begin{commentary}
+ The counters and timers are no longer considered mandatory parts of
+ the standard base ISAs, and so have been moved into this separate
+ chapter.
+\end{commentary}
+
+\section{CSR Instructions}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{M@{}R@{}F@{}R@{}S}
+\\
+\instbitrange{31}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{csr} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{funct3} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+12 & 5 & 3 & 5 & 7 \\
+source/dest & source & CSRRW & dest & SYSTEM \\
+source/dest & source & CSRRS & dest & SYSTEM \\
+source/dest & source & CSRRC & dest & SYSTEM \\
+source/dest & uimm[4:0] & CSRRWI & dest & SYSTEM \\
+source/dest & uimm[4:0] & CSRRSI & dest & SYSTEM \\
+source/dest & uimm[4:0] & CSRRCI & dest & SYSTEM \\
+\end{tabular}
+\end{center}
+
+The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values
+in the CSRs and integer registers. CSRRW reads the old value of the
+CSR, zero-extends the value to XLEN bits, then writes it to integer
+register {\em rd}. The initial value in {\em rs1} is written to the
+CSR. If {\em rd}={\tt x0}, then the instruction shall not read the CSR
+and shall not cause any of the side effects that might occur on a CSR
+read.
+
+The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the
+value of the CSR, zero-extends the value to XLEN bits, and writes it
+to integer register {\em rd}. The initial value in integer register
+{\em rs1} is treated as a bit mask that specifies bit positions to be
+set in the CSR. Any bit that is high in {\em rs1} will cause the
+corresponding bit to be set in the CSR, if that CSR bit is writable.
+Other bits in the CSR are unaffected (though CSRs might have side
+effects when written).
+
+The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
+value of the CSR, zero-extends the value to XLEN bits, and writes it
+to integer register {\em rd}. The initial value in integer register
+{\em rs1} is treated as a bit mask that specifies bit positions to be
+cleared in the CSR. Any bit that is high in {\em rs1} will cause the
+corresponding bit to be cleared in the CSR, if that CSR bit is
+writable. Other bits in the CSR are unaffected.
+
+For both CSRRS and CSRRC, if {\em rs1}={\tt x0}, then the instruction
+will not write to the CSR at all, and so shall not cause any of the
+side effects that might otherwise occur on a CSR write, such as
+raising illegal instruction exceptions on accesses to read-only CSRs.
+Note that if {\em rs1} specifies a register holding a zero value other
+than {\tt x0}, the instruction will still attempt to write the
+unmodified value back to the CSR and will cause any attendant side effects.
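+
+The distinction between {\em rs1}={\tt x0} and a register that merely
+holds zero can be illustrated with a short sketch. It assumes the
+execution environment permits reads of the read-only {\tt cycle} CSR;
+the register choices are arbitrary.
+
+\begin{verbatim}
+    csrrs  t0, cycle, x0   # rs1 is x0: no write is attempted, so this
+                           # is a legal read of a read-only CSR
+    li     t1, 0
+    csrrs  t0, cycle, t1   # rs1 is not x0: a write is attempted even
+                           # though t1 holds zero, so this raises an
+                           # illegal instruction exception
+\end{verbatim}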
+
+The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS,
+and CSRRC respectively, except they update the CSR using an XLEN-bit
+value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field
+encoded in the {\em rs1} field instead of a value from an integer
+register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then
+these instructions will not write to the CSR, and shall not cause any
+of the side effects that might otherwise occur on a CSR write. For
+CSRRWI, if {\em rd}={\tt x0}, then the instruction shall not read the
+CSR and shall not cause any of the side effects that might occur on a
+CSR read.
+
+Some CSRs, such as the instructions-retired counter, {\tt instret}, may be
+modified as side effects of instruction execution. In these cases, if a CSR
+access instruction reads a CSR, it reads the value prior to the execution of
+the instruction. If a CSR access instruction writes a CSR, the update occurs
+after the execution of the instruction. In particular, a value written to
+{\tt instret} by one instruction will be the value read by the following
+instruction (i.e., the increment of {\tt instret} caused by the first
+instruction retiring happens before the write of the new value).
+
+The assembler pseudoinstruction to read a CSR, CSRR {\em rd, csr}, is
+encoded as CSRRS {\em rd, csr, x0}. The assembler pseudoinstruction
+to write a CSR, CSRW {\em csr, rs1}, is encoded as CSRRW {\em x0, csr,
+ rs1}, while CSRWI {\em csr, uimm}, is encoded as CSRRWI {\em x0,
+ csr, uimm}.
+
+Further assembler pseudoinstructions are defined to set and clear
+bits in the CSR when the old value is not required: CSRS/CSRC {\em
+ csr, rs1}; CSRSI/CSRCI {\em csr, uimm}.
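+
+As an illustration, the following sketch lists each pseudoinstruction
+above next to the base instruction it stands for. It assumes a hart
+that implements the floating-point status CSRs {\tt fflags} and
+{\tt frm}; the registers and immediate values are arbitrary examples.
+
+\begin{verbatim}
+    csrr   t0, fflags      # csrrs  t0, fflags, x0   (read, no write)
+    csrw   frm, t1         # csrrw  x0, frm, t1      (write, no read)
+    csrwi  frm, 1          # csrrwi x0, frm, 1
+    csrs   fflags, t2      # csrrs  x0, fflags, t2   (set bits in t2)
+    csrc   fflags, t2      # csrrc  x0, fflags, t2   (clear bits in t2)
+    csrsi  fflags, 0x10    # csrrsi x0, fflags, 16
+    csrci  fflags, 0x1f    # csrrci x0, fflags, 31
+\end{verbatim}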
+
+\section{Timers and Counters}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{M@{}R@{}F@{}R@{}S}
+\\
+\instbitrange{31}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{csr} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{funct3} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+12 & 5 & 3 & 5 & 7 \\
+RDCYCLE[H] & 0 & CSRRS & dest & SYSTEM \\
+RDTIME[H] & 0 & CSRRS & dest & SYSTEM \\
+RDINSTRET[H] & 0 & CSRRS & dest & SYSTEM \\
+\end{tabular}
+\end{center}
+
+RV32I provides a number of 64-bit read-only user-level counters, which
+are mapped into the 12-bit CSR address space and accessed in 32-bit
+pieces using CSRRS instructions. In RV64I, the CSR instructions can
+manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and
+RDINSTRET pseudoinstructions read the full 64 bits of the {\tt cycle},
+{\tt time}, and {\tt instret} counters. Hence, the RDCYCLEH, RDTIMEH,
+and RDINSTRETH instructions are not required in RV64I.
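+
+As a minimal sketch of how these reads might be used on RV64I, the
+following sequence brackets a region of code with single reads of
+{\tt cycle} and {\tt instret}. It assumes the execution environment
+permits access to the counters; the register choices are arbitrary,
+and the resulting differences include the overhead of the measurement
+instructions themselves.
+
+\begin{verbatim}
+    rdcycle    s0            # cycle count before the region
+    rdinstret  s1            # instructions retired before the region
+    # ... code under measurement ...
+    rdcycle    t0            # cycle count after the region
+    rdinstret  t1            # instructions retired after the region
+    sub        t0, t0, s0    # elapsed cycles
+    sub        t1, t1, s1    # instructions retired in between
+\end{verbatim}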
+
+\begin{commentary}
+Some execution environments might prohibit access to counters to
+impede timing side-channel attacks.
+\end{commentary}
+
+The RDCYCLE pseudoinstruction reads the low XLEN bits of the {\tt
+ cycle} CSR, which holds a count of the number of clock cycles
+executed by the processor core on which the hart is running, measured
+from an arbitrary start time in the past. RDCYCLEH is
+an RV32I-only instruction that reads bits 63--32 of the same cycle
+counter. The underlying 64-bit counter should never overflow in
+practice. The rate at which the cycle counter advances will depend on
+the implementation and operating environment. The execution
+environment should provide a means to determine the current rate
+(cycles/second) at which the cycle counter is incrementing.
+
+\begin{commentary}
+RDCYCLE is intended to return the number of cycles executed by the
+processor core, not the hart. Precisely defining what is a ``core'' is
+difficult given some implementation choices (e.g., AMD Bulldozer).
+Precisely defining what is a ``clock cycle'' is also difficult given the
+range of implementations (including software emulations), but the
+intent is that RDCYCLE is used for performance monitoring along with the
+other performance counters. In particular, where there is one
+hart/core, one would expect cycle-count/instructions-retired to
+measure CPI for a hart.
+
+Cores don't have to be exposed to software at all, and an implementor
+might choose to pretend multiple harts on one physical core are
+running on separate cores with one hart/core, and provide separate
+cycle counters for each hart. This might make sense in a simple
+barrel processor (e.g., CDC 6600 peripheral processors) where
+inter-hart timing interactions are non-existent or minimal.
+
+Where there is more than one hart/core and dynamic multithreading, it
+is not generally possible to separate out cycles per hart (especially
+with SMT). It might be possible to define a separate performance
+counter that tried to capture the number of cycles a particular hart
+was running, but this definition would have to be very fuzzy to cover
+all the possible threading implementations. For example, should we
+only count cycles for which any instruction was issued to execution
+for this hart, and/or cycles any instruction retired, or include
+cycles this hart was occupying machine resources but couldn't execute
+due to stalls while other harts went into execution? Likely, ``all of
+the above'' would be needed to have understandable performance stats.
+This complexity of defining a per-hart cycle count, and also the need
+in any case for a total per-core cycle count when tuning multithreaded
+code, led to just standardizing the per-core cycle counter, which also
+happens to work well for the common single hart/core case.
+
+Standardizing what happens during ``sleep'' is not practical given
+that what ``sleep'' means is not standardized across execution
+environments, but if the entire core is paused (entirely clock-gated
+or powered-down in deep sleep), then it is not executing clock cycles,
+and the cycle count shouldn't be increasing per the spec. There are
+many details, e.g., whether clock cycles required to reset a processor
+after waking up from a power-down event should be counted, and these
+are considered execution-environment-specific details.
+
+Even though there is no precise definition that works for all
+platforms, this is still a useful facility for most platforms, and an
+imprecise, common, ``usually correct'' standard here is better than no
+standard. The intent of RDCYCLE was primarily performance
+monitoring/tuning, and the specification was written with that goal in
+mind.
+\end{commentary}
+
+The RDTIME pseudoinstruction reads the low XLEN bits of the {\tt
+ time} CSR, which counts wall-clock real time that has passed from an
+arbitrary start time in the past. RDTIMEH is an RV32I-only instruction
+that reads bits 63--32 of the same real-time counter. The underlying 64-bit
+counter should never overflow in practice. The execution environment
+should provide a means of determining the period of the real-time
+counter (seconds/tick). The period must be constant. The
+real-time clocks of all harts in a single user application
+should be synchronized to within one tick of the real-time clock. The
+environment should provide a means to determine the accuracy of the
+clock.
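+
+As a worked example of using the tick period, suppose software reads
+the counter twice and obtains tick counts $n_0$ and $n_1$, and the
+execution environment reports a period of $T$ seconds/tick (these
+symbols are introduced here for illustration only). Because the
+period is constant, the elapsed wall-clock time is approximately
+\[
+  t_{\mathrm{elapsed}} = (n_1 - n_0) \times T ,
+\]
+subject to the clock accuracy reported by the environment.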
+
+\begin{commentary}
+On some simple platforms, cycle count might represent a valid
+implementation of RDTIME, but in this case, platforms should implement
+the RDTIME instruction as an alias for RDCYCLE to make code more
+portable, rather than using RDCYCLE to measure wall-clock time.
+\end{commentary}
+
+The RDINSTRET pseudoinstruction reads the low XLEN bits of the {\tt
+ instret} CSR, which counts the number of instructions retired by
+this hart from some arbitrary start point in the past. RDINSTRETH is
+an RV32I-only instruction that reads bits 63--32 of the same
+instruction counter. The underlying 64-bit counter should never
+overflow in practice.
+
+The following code sequence will read a valid 64-bit cycle counter value into
+{\tt x3}:{\tt x2}, even if the counter overflows its lower half between
+reading its upper and lower halves.
+
+\begin{figure}[h!]
+\begin{center}
+\begin{verbatim}
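+    again:
+        rdcycleh     x3              # read upper 32 bits
+        rdcycle      x2              # read lower 32 bits
+        rdcycleh     x4              # re-read upper 32 bits
+        bne          x3, x4, again   # retry if the upper half changed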
+\end{verbatim}
+\end{center}
+\caption{Sample code for reading the 64-bit cycle counter in RV32.}
+\label{rdcycle}
+\end{figure}
+
+\begin{commentary}
+We would like these basic counters to be provided in all implementations as
+they are essential for basic performance analysis, adaptive and
+dynamic optimization, and to allow an application to work with
+real-time streams. Additional counters should be provided to help
+diagnose performance problems and these should be made accessible from
+user-level application code with low overhead.
+
+We required the counters be 64 bits wide, even on RV32, as otherwise
+it is very difficult for software to determine if values have
+overflowed. For a low-end implementation, the upper 32 bits of each
+counter can be implemented using software counters incremented by a
+trap handler triggered by overflow of the lower 32 bits. The sample
+code in Figure~\ref{rdcycle} shows how the full 64-bit value can be
+safely read using the individual 32-bit instructions.
+
+In some applications, it is important to be able to read multiple
+counters at the same instant in time. When run under a multitasking
+environment, a user thread can suffer a context switch while
+attempting to read the counters. One solution is for the user thread
+to read the real-time counter before and after reading the other
+counters to determine if a context switch occurred in the middle of the
+sequence, in which case the reads can be retried. We considered
+adding output latches to allow a user thread to snapshot the counter
+values atomically, but this would increase the size of the user
+context, especially for implementations with a richer set of counters.
+\end{commentary}
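+
+The retry approach mentioned in the commentary above might be
+sketched as follows for RV64I. This is only an illustration under
+stated assumptions: register {\tt t2} is assumed to hold an
+application-chosen bound on the number of real-time ticks the read
+sequence may take, and any longer interval is treated as evidence of
+an interruption. Neither the bound nor the register usage is defined
+by this specification.
+
+\begin{verbatim}
+    retry:
+        rdtime     t0              # real-time counter before the reads
+        rdcycle    a0
+        rdinstret  a1
+        rdtime     t1              # real-time counter after the reads
+        sub        t1, t1, t0      # ticks spent in the sequence
+        bgtu       t1, t2, retry   # retry if longer than assumed bound
+\end{verbatim}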
+