diff options
author | Krste Asanovic <krste@eecs.berkeley.edu> | 2018-08-06 00:51:03 -0700 |
---|---|---|
committer | Krste Asanovic <krste@eecs.berkeley.edu> | 2018-08-06 00:51:03 -0700 |
commit | 695d84330e22b39a0b8cdfc0c54bb0e390bbcefa (patch) | |
tree | 8d3176f338c85c1ef6880999b09b29c7aec159dc | |
parent | 4845f5d61f96a827ec4d21d2c5a80b6bf7881e56 (diff) | |
download | riscv-isa-manual-695d84330e22b39a0b8cdfc0c54bb0e390bbcefa.zip riscv-isa-manual-695d84330e22b39a0b8cdfc0c54bb0e390bbcefa.tar.gz riscv-isa-manual-695d84330e22b39a0b8cdfc0c54bb0e390bbcefa.tar.bz2 |
CSR instructions file.
-rw-r--r-- | src/csr.tex | 277 |
1 files changed, 277 insertions, 0 deletions
diff --git a/src/csr.tex b/src/csr.tex new file mode 100644 index 0000000..16877ad --- /dev/null +++ b/src/csr.tex @@ -0,0 +1,277 @@ +\chapter{Control and Status Register (CSR) Instructions} +\label{csrinsts} + +This chapter defines the full set of CSR instructions, although the +control and status registers are primarily used by the privileged +architecture. There are several uses in unprivileged code including +for counters and timers, and floating-point status. + +\begin{commentary} + The counters and timers are no longer considered mandatory parts of + the standard base ISAs, and so have been moved into this separate + chapter. +\end{commentary} + +\section{CSR Instructions} + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{M@{}R@{}F@{}R@{}S} +\\ +\instbitrange{31}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{csr} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{funct3} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +12 & 5 & 3 & 5 & 7 \\ +source/dest & source & CSRRW & dest & SYSTEM \\ +source/dest & source & CSRRS & dest & SYSTEM \\ +source/dest & source & CSRRC & dest & SYSTEM \\ +source/dest & uimm[4:0] & CSRRWI & dest & SYSTEM \\ +source/dest & uimm[4:0] & CSRRSI & dest & SYSTEM \\ +source/dest & uimm[4:0] & CSRRCI & dest & SYSTEM \\ +\end{tabular} +\end{center} + +The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values +in the CSRs and integer registers. CSRRW reads the old value of the +CSR, zero-extends the value to XLEN bits, then writes it to integer +register {\em rd}. The initial value in {\em rs1} is written to the +CSR. If {\em rd}={\tt x0}, then the instruction shall not read the CSR +and shall not cause any of the side-effects that might occur on a CSR +read. + +The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the +value of the CSR, zero-extends the value to XLEN bits, and writes it +to integer register {\em rd}. The initial value in integer register +{\em rs1} is treated as a bit mask that specifies bit positions to be +set in the CSR. Any bit that is high in {\em rs1} will cause the +corresponding bit to be set in the CSR, if that CSR bit is writable. +Other bits in the CSR are unaffected (though CSRs might have side +effects when written). + +The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the +value of the CSR, zero-extends the value to XLEN bits, and writes it +to integer register {\em rd}. The initial value in integer register +{\em rs1} is treated as a bit mask that specifies bit positions to be +cleared in the CSR. Any bit that is high in {\em rs1} will cause the +corresponding bit to be cleared in the CSR, if that CSR bit is +writable. Other bits in the CSR are unaffected. + +For both CSRRS and CSRRC, if {\em rs1}={\tt x0}, then the instruction +will not write to the CSR at all, and so shall not cause any of the +side effects that might otherwise occur on a CSR write, such as +raising illegal instruction exceptions on accesses to read-only CSRs. +Note that if {\em rs1} specifies a register holding a zero value other +than {\tt x0}, the instruction will still attempt to write the +unmodified value back to the CSR and will cause any attendant side effects. + +The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, +and CSRRC respectively, except they update the CSR using an XLEN-bit +value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field +encoded in the {\em rs1} field instead of a value from an integer +register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then +these instructions will not write to the CSR, and shall not cause any +of the side effects that might otherwise occur on a CSR write. For +CSRRWI, if {\em rd}={\tt x0}, then the instruction shall not read the +CSR and shall not cause any of the side-effects that might occur on a +CSR read. + +Some CSRs, such as the instructions retired counter, {\tt instret}, may be +modified as side effects of instruction execution. In these cases, if a CSR +access instruction reads a CSR, it reads the value prior to the execution of +the instruction. If a CSR access instruction writes a CSR, the update occurs +after the execution of the instruction. In particular, a value written to +{\tt instret} by one instruction will be the value read by the following +instruction (i.e., the increment of {\tt instret} caused by the first +instruction retiring happens before the write of the new value). + +The assembler pseudoinstruction to read a CSR, CSRR {\em rd, csr}, is +encoded as CSRRS {\em rd, csr, x0}. The assembler pseudoinstruction +to write a CSR, CSRW {\em csr, rs1}, is encoded as CSRRW {\em x0, csr, + rs1}, while CSRWI {\em csr, uimm}, is encoded as CSRRWI {\em x0, + csr, uimm}. + +Further assembler pseudoinstructions are defined to set and clear +bits in the CSR when the old value is not required: CSRS/CSRC {\em + csr, rs1}; CSRSI/CSRCI {\em csr, uimm}. + +\section{Timers and Counters} + +\vspace{-0.2in} +\begin{center} +\begin{tabular}{M@{}R@{}F@{}R@{}S} +\\ +\instbitrange{31}{20} & +\instbitrange{19}{15} & +\instbitrange{14}{12} & +\instbitrange{11}{7} & +\instbitrange{6}{0} \\ +\hline +\multicolumn{1}{|c|}{csr} & +\multicolumn{1}{c|}{rs1} & +\multicolumn{1}{c|}{funct3} & +\multicolumn{1}{c|}{rd} & +\multicolumn{1}{c|}{opcode} \\ +\hline +12 & 5 & 3 & 5 & 7 \\ +RDCYCLE[H] & 0 & CSRRS & dest & SYSTEM \\ +RDTIME[H] & 0 & CSRRS & dest & SYSTEM \\ +RDINSTRET[H] & 0 & CSRRS & dest & SYSTEM \\ +\end{tabular} +\end{center} + +RV32I provides a number of 64-bit read-only user-level counters, which +are mapped into the 12-bit CSR address space and accessed in 32-bit +pieces using CSRRS instructions. In RV64I, the CSR instructions can +manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and +RDINSTRET pseudoinstructions read the full 64 bits of the {\tt cycle}, +{\tt time}, and {\tt instret} counters. Hence, the RDCYCLEH, RDTIMEH, +and RDINSTRETH instructions are not required in RV64I. + +\begin{commentary} +Some execution environments might prohibit access to counters to +impede timing side-channel attacks. +\end{commentary} + +The RDCYCLE pseudoinstruction reads the low XLEN bits of the {\tt + cycle} CSR which holds a count of the number of clock cycles +executed by the processor core on which the hart is running from +an arbitrary start time in the past. RDCYCLEH is +an RV32I-only instruction that reads bits 63--32 of the same cycle +counter. The underlying 64-bit counter should never overflow in +practice. The rate at which the cycle counter advances will depend on +the implementation and operating environment. The execution +environment should provide a means to determine the current rate +(cycles/second) at which the cycle counter is incrementing. + +\begin{commentary} +RDCYCLE is intended to return the number of cycles executed by the +processor core, not the hart. Precisely defining what is a ``core'' is +difficult given some implementation choices (e.g., AMD Bulldozer). +Precisely defining what is a ``clock cycle'' is also difficult given the +range of implementations (including software emulations), but the +intent is that RDCYCLE is used for performance monitoring along with the +other performance counters. In particular, where there is one +hart/core, one would expect cycle-count/instructions-retired to +measure CPI for a hart. + +Cores don't have to be exposed to software at all, and an implementor +might choose to pretend multiple harts on one physical core are +running on separate cores with one hart/core, and provide separate +cycle counters for each hart. This might make sense in a simple +barrel processor (e.g., CDC 6600 peripheral processors) where +inter-hart timing interactions are non-existent or minimal. + +Where there is more than one hart/core and dynamic multithreading, it +is not generally possible to separate out cycles per hart (especially +with SMT). It might be possible to define a separate performance +counter that tried to capture the number of cycles a particular hart +was running, but this definition would have to be very fuzzy to cover +all the possible threading implementations. For example, should we +only count cycles for which any instruction was issued to execution +for this hart, and/or cycles any instruction retired, or include +cycles this hart was occupying machine resources but couldn't execute +due to stalls while other harts went into execution? Likely, ``all of +the above'' would be needed to have understandable performance stats. +This complexity of defining a per-hart cycle count, and also the need +in any case for a total per-core cycle count when tuning multithreaded +code led to just standardizing the per-core cycle counter, which also +happens to work well for the common single hart/core case. + +Standardizing what happens during ``sleep'' is not practical given +that what ``sleep'' means is not standardized across execution +environments, but if the entire core is paused (entirely clock-gated +or powered-down in deep sleep), then it is not executing clock cycles, +and the cycle count shouldn't be increasing per the spec. There are +many details, e.g., whether clock cycles required to reset a processor +after waking up from a power-down event should be counted, and these +are considered execution-environment-specific details. + +Even though there is no precise definition that works for all +platforms, this is still a useful facility for most platforms, and an +imprecise, common, ``usually correct'' standard here is better than no +standard. The intent of RDCYCLE was primarily performance +monitoring/tuning, and the specification was written with that goal in +mind. +\end{commentary} + +The RDTIME pseudoinstruction reads the low XLEN bits of the {\tt + time} CSR, which counts wall-clock real time that has passed from an +arbitrary start time in the past. RDTIMEH is an RV32I-only instruction +that reads bits 63--32 of the same real-time counter. The underlying 64-bit +counter should never overflow in practice. The execution environment +should provide a means of determining the period of the real-time +counter (seconds/tick). The period must be constant. The +real-time clocks of all harts in a single user application +should be synchronized to within one tick of the real-time clock. The +environment should provide a means to determine the accuracy of the +clock. + +\begin{commentary} +On some simple platforms, cycle count might represent a valid +implementation of RDTIME, but in this case, platforms should implement +the RDTIME instruction as an alias for RDCYCLE to make code more +portable, rather than using RDCYCLE to measure wall-clock time. +\end{commentary} + +The RDINSTRET pseudoinstruction reads the low XLEN bits of the {\tt + instret} CSR, which counts the number of instructions retired by +this hart from some arbitrary start point in the past. RDINSTRETH is +an RV32I-only instruction that reads bits 63--32 of the same +instruction counter. The underlying 64-bit counter that should never +overflow in practice. + +The following code sequence will read a valid 64-bit cycle counter value into +{\tt x3}:{\tt x2}, even if the counter overflows between reading its upper +and lower halves. + +\begin{figure}[h!] +\begin{center} +\begin{verbatim} + again: + rdcycleh x3 + rdcycle x2 + rdcycleh x4 + bne x3, x4, again +\end{verbatim} +\end{center} +\caption{Sample code for reading the 64-bit cycle counter in RV32.} +\label{rdcycle} +\end{figure} + +\begin{commentary} +We would like these basic counters be provided in all implementations as +they are essential for basic performance analysis, adaptive and +dynamic optimization, and to allow an application to work with +real-time streams. Additional counters should be provided to help +diagnose performance problems and these should be made accessible from +user-level application code with low overhead. + +We required the counters be 64 bits wide, even on RV32, as otherwise +it is very difficult for software to determine if values have +overflowed. For a low-end implementation, the upper 32 bits of each +counter can be implemented using software counters incremented by a +trap handler triggered by overflow of the lower 32 bits. The sample +code described above shows how the full 64-bit width value can be +safely read using the individual 32-bit instructions. + +In some applications, it is important to be able to read multiple +counters at the same instant in time. When run under a multitasking +environment, a user thread can suffer a context switch while +attempting to read the counters. One solution is for the user thread +to read the real-time counter before and after reading the other +counters to determine if a context switch occurred in the middle of the +sequence, in which case the reads can be retried. We considered +adding output latches to allow a user thread to snapshot the counter +values atomically, but this would increase the size of the user +context, especially for implementations with a richer set of counters. +\end{commentary} + |