Diffstat (limited to 'src/csr.tex')
-rw-r--r-- src/csr.tex | 193
1 file changed, 15 insertions, 178 deletions
diff --git a/src/csr.tex b/src/csr.tex
index 16877ad..1d02710 100644
--- a/src/csr.tex
+++ b/src/csr.tex
@@ -1,19 +1,28 @@
\chapter{Control and Status Register (CSR) Instructions}
\label{csrinsts}
-This chapter defines the full set of CSR instructions, although the
-control and status registers are primarily used by the privileged
-architecture. There are several uses in unprivileged code including
-for counters and timers, and floating-point status.
+RISC-V defines a separate address space of 4096 Control and Status
+registers associated with each hart. This chapter defines the full
+set of CSR instructions that operate on these CSRs.
\begin{commentary}
+ While CSRs are primarily used by the privileged architecture, there
+ are several uses in unprivileged code including for counters and
+ timers, and for floating-point status.
+
The counters and timers are no longer considered mandatory parts of
- the standard base ISAs, and so have been moved into this separate
- chapter.
+ the standard base ISAs, and so the CSR instructions required to
+ access them have been moved out of the base ISA chapter into this
+ separate chapter.
\end{commentary}
\section{CSR Instructions}
+All CSR instructions atomically read-modify-write a single CSR, whose
+CSR specifier is encoded in the 12-bit {\em csr} field of the
+instruction held in bits 31--20. The immediate forms use a 5-bit
+zero-extended immediate encoded in the {\em rs1} field.
+
\vspace{-0.2in}
\begin{center}
\begin{tabular}{M@{}R@{}F@{}R@{}S}
@@ -103,175 +112,3 @@ Further assembler pseudoinstructions are defined to set and clear
bits in the CSR when the old value is not required: CSRS/CSRC {\em
csr, rs1}; CSRSI/CSRCI {\em csr, uimm}.
-\section{Timers and Counters}
-
-\vspace{-0.2in}
-\begin{center}
-\begin{tabular}{M@{}R@{}F@{}R@{}S}
-\\
-\instbitrange{31}{20} &
-\instbitrange{19}{15} &
-\instbitrange{14}{12} &
-\instbitrange{11}{7} &
-\instbitrange{6}{0} \\
-\hline
-\multicolumn{1}{|c|}{csr} &
-\multicolumn{1}{c|}{rs1} &
-\multicolumn{1}{c|}{funct3} &
-\multicolumn{1}{c|}{rd} &
-\multicolumn{1}{c|}{opcode} \\
-\hline
-12 & 5 & 3 & 5 & 7 \\
-RDCYCLE[H] & 0 & CSRRS & dest & SYSTEM \\
-RDTIME[H] & 0 & CSRRS & dest & SYSTEM \\
-RDINSTRET[H] & 0 & CSRRS & dest & SYSTEM \\
-\end{tabular}
-\end{center}
-
-RV32I provides a number of 64-bit read-only user-level counters, which
-are mapped into the 12-bit CSR address space and accessed in 32-bit
-pieces using CSRRS instructions. In RV64I, the CSR instructions can
-manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and
-RDINSTRET pseudoinstructions read the full 64 bits of the {\tt cycle},
-{\tt time}, and {\tt instret} counters. Hence, the RDCYCLEH, RDTIMEH,
-and RDINSTRETH instructions are not required in RV64I.
-
-\begin{commentary}
-Some execution environments might prohibit access to counters to
-impede timing side-channel attacks.
-\end{commentary}
-
-The RDCYCLE pseudoinstruction reads the low XLEN bits of the {\tt
- cycle} CSR which holds a count of the number of clock cycles
-executed by the processor core on which the hart is running from
-an arbitrary start time in the past. RDCYCLEH is
-an RV32I-only instruction that reads bits 63--32 of the same cycle
-counter. The underlying 64-bit counter should never overflow in
-practice. The rate at which the cycle counter advances will depend on
-the implementation and operating environment. The execution
-environment should provide a means to determine the current rate
-(cycles/second) at which the cycle counter is incrementing.
-
-\begin{commentary}
-RDCYCLE is intended to return the number of cycles executed by the
-processor core, not the hart. Precisely defining what is a ``core'' is
-difficult given some implementation choices (e.g., AMD Bulldozer).
-Precisely defining what is a ``clock cycle'' is also difficult given the
-range of implementations (including software emulations), but the
-intent is that RDCYCLE is used for performance monitoring along with the
-other performance counters. In particular, where there is one
-hart/core, one would expect cycle-count/instructions-retired to
-measure CPI for a hart.
-
-Cores don't have to be exposed to software at all, and an implementor
-might choose to pretend multiple harts on one physical core are
-running on separate cores with one hart/core, and provide separate
-cycle counters for each hart. This might make sense in a simple
-barrel processor (e.g., CDC 6600 peripheral processors) where
-inter-hart timing interactions are non-existent or minimal.
-
-Where there is more than one hart/core and dynamic multithreading, it
-is not generally possible to separate out cycles per hart (especially
-with SMT). It might be possible to define a separate performance
-counter that tried to capture the number of cycles a particular hart
-was running, but this definition would have to be very fuzzy to cover
-all the possible threading implementations. For example, should we
-only count cycles for which any instruction was issued to execution
-for this hart, and/or cycles any instruction retired, or include
-cycles this hart was occupying machine resources but couldn't execute
-due to stalls while other harts went into execution? Likely, ``all of
-the above'' would be needed to have understandable performance stats.
-This complexity of defining a per-hart cycle count, and also the need
-in any case for a total per-core cycle count when tuning multithreaded
-code led to just standardizing the per-core cycle counter, which also
-happens to work well for the common single hart/core case.
-
-Standardizing what happens during ``sleep'' is not practical given
-that what ``sleep'' means is not standardized across execution
-environments, but if the entire core is paused (entirely clock-gated
-or powered-down in deep sleep), then it is not executing clock cycles,
-and the cycle count shouldn't be increasing per the spec. There are
-many details, e.g., whether clock cycles required to reset a processor
-after waking up from a power-down event should be counted, and these
-are considered execution-environment-specific details.
-
-Even though there is no precise definition that works for all
-platforms, this is still a useful facility for most platforms, and an
-imprecise, common, ``usually correct'' standard here is better than no
-standard. The intent of RDCYCLE was primarily performance
-monitoring/tuning, and the specification was written with that goal in
-mind.
-\end{commentary}
-
-The RDTIME pseudoinstruction reads the low XLEN bits of the {\tt
- time} CSR, which counts wall-clock real time that has passed from an
-arbitrary start time in the past. RDTIMEH is an RV32I-only instruction
-that reads bits 63--32 of the same real-time counter. The underlying 64-bit
-counter should never overflow in practice. The execution environment
-should provide a means of determining the period of the real-time
-counter (seconds/tick). The period must be constant. The
-real-time clocks of all harts in a single user application
-should be synchronized to within one tick of the real-time clock. The
-environment should provide a means to determine the accuracy of the
-clock.
-
-\begin{commentary}
-On some simple platforms, cycle count might represent a valid
-implementation of RDTIME, but in this case, platforms should implement
-the RDTIME instruction as an alias for RDCYCLE to make code more
-portable, rather than using RDCYCLE to measure wall-clock time.
-\end{commentary}
-
-The RDINSTRET pseudoinstruction reads the low XLEN bits of the {\tt
- instret} CSR, which counts the number of instructions retired by
-this hart from some arbitrary start point in the past. RDINSTRETH is
-an RV32I-only instruction that reads bits 63--32 of the same
-instruction counter. The underlying 64-bit counter should never
-overflow in practice.
-
-The following code sequence will read a valid 64-bit cycle counter value into
-{\tt x3}:{\tt x2}, even if the counter overflows between reading its upper
-and lower halves.
-
-\begin{figure}[h!]
-\begin{center}
-\begin{verbatim}
- again:
- rdcycleh x3
- rdcycle x2
- rdcycleh x4
- bne x3, x4, again
-\end{verbatim}
-\end{center}
-\caption{Sample code for reading the 64-bit cycle counter in RV32.}
-\label{rdcycle}
-\end{figure}
-
-\begin{commentary}
-We would like these basic counters to be provided in all implementations as
-they are essential for basic performance analysis, adaptive and
-dynamic optimization, and to allow an application to work with
-real-time streams. Additional counters should be provided to help
-diagnose performance problems and these should be made accessible from
-user-level application code with low overhead.
-
-We required the counters be 64 bits wide, even on RV32, as otherwise
-it is very difficult for software to determine if values have
-overflowed. For a low-end implementation, the upper 32 bits of each
-counter can be implemented using software counters incremented by a
-trap handler triggered by overflow of the lower 32 bits. The sample
-code described above shows how the full 64-bit width value can be
-safely read using the individual 32-bit instructions.
-
-In some applications, it is important to be able to read multiple
-counters at the same instant in time. When run under a multitasking
-environment, a user thread can suffer a context switch while
-attempting to read the counters. One solution is for the user thread
-to read the real-time counter before and after reading the other
-counters to determine if a context switch occurred in the middle of the
-sequence, in which case the reads can be retried. We considered
-adding output latches to allow a user thread to snapshot the counter
-values atomically, but this would increase the size of the user
-context, especially for implementations with a richer set of counters.
-\end{commentary}
-
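The retry loop in the removed sample code (rdcycleh / rdcycle / rdcycleh / bne) can also be sketched in C. This is a minimal illustration only, not part of the specification text: the names mock_counter, mock_step, read_hi, read_lo, and read_counter64 are hypothetical, and the two read helpers simulate a 64-bit counter that advances on every access so the rollover case can be exercised on any host. On real RV32 hardware they would presumably be replaced by reads of the high and low halves of the cycle CSR.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit hardware counter such as cycle.
   It advances by mock_step on every read so that a rollover of the
   low half between reads can be simulated. */
static uint64_t mock_counter;
static uint32_t mock_step;

/* Simulated read of bits 31..0 (the role rdcycle plays on RV32). */
static uint32_t read_lo(void)
{
    uint32_t value = (uint32_t)mock_counter;
    mock_counter += mock_step;
    return value;
}

/* Simulated read of bits 63..32 (the role rdcycleh plays on RV32). */
static uint32_t read_hi(void)
{
    uint32_t value = (uint32_t)(mock_counter >> 32);
    mock_counter += mock_step;
    return value;
}

/* Read the high half, then the low half, then the high half again.
   If the two high-half reads differ, the low half wrapped somewhere
   in between, so the whole sequence is retried. */
static uint64_t read_counter64(void)
{
    uint32_t hi, lo, hi_again;
    do {
        hi = read_hi();
        lo = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}
```

Setting mock_counter to 0x00000000FFFFFFFF with mock_step equal to 1 forces the low half to wrap between the first two reads. The loop then runs a second time and returns a value whose two halves were sampled consistently, instead of pairing a stale high half with a wrapped low half.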