author     elisa <elisa@riscv.org>  2021-10-05 15:06:55 -0700
committer  elisa <elisa@riscv.org>  2021-10-05 15:06:55 -0700
commit     5921e762efd81682f720130a0d72ce8a1a0da16e
tree       98f1ec7488a804b717b9d2c5e323791834e76739 /src/counters.adoc
parent     c6ae16c883f6b937c9696c427f046bfc9f8b25f6
adoc formatting and table fixes for intro, a, c, counters, m, rvmo chapters
Diffstat (limited to 'src/counters.adoc')
-rw-r--r-- src/counters.adoc 36
1 files changed, 18 insertions, 18 deletions
diff --git a/src/counters.adoc b/src/counters.adoc
index 670ed6e..a92b3fc 100644
--- a/src/counters.adoc
+++ b/src/counters.adoc
@@ -1,10 +1,10 @@
[[perf-counters]]
== Counters
-RISC-V ISAs provide a set of up to 32latexmath:[$\times$]64-bit
+RISC-V ISAs provide a set of up to 32_X_64-bit
performance counters and timers that are accessible via unprivileged
-XLEN read-only CSR registers `0xC00`–`0xC1F` (with the upper 32 bits
-accessed via CSR registers `0xC80`–`0xC9F` on RV32). The first three of
+XLEN read-only CSR registers _0xC00_–_0xC1F_ (with the upper 32 bits
+accessed via CSR registers _0xC80_–_0xC9F_ on RV32). The first three of
these (CYCLE, TIME, and INSTRET) have dedicated functions (cycle count,
real-time clock, and instructions-retired respectively), while the
remaining counters, if implemented, provide programmable event counting.
@@ -22,8 +22,8 @@ RV32I provides a number of 64-bit read-only user-level counters, which
are mapped into the 12-bit CSR address space and accessed in 32-bit
pieces using CSRRS instructions. In RV64I, the CSR instructions can
manipulate 64-bit CSRs. In particular, the RDCYCLE, RDTIME, and
-RDINSTRET pseudoinstructions read the full 64 bits of the `cycle`,
-`time`, and `instret` counters. Hence, the RDCYCLEH, RDTIMEH, and
+RDINSTRET pseudoinstructions read the full 64 bits of the _cycle_,
+_time_, and _instret_ counters. Hence, the RDCYCLEH, RDTIMEH, and
RDINSTRETH instructions are RV32I-only.
[NOTE]
@@ -33,7 +33,7 @@ timing side-channel attacks.
====
(((counters, pseudoinstruction)))
-The RDCYCLE pseudoinstruction reads the low XLEN bits of the `cycle`
+The RDCYCLE pseudoinstruction reads the low XLEN bits of the _cycle_
CSR which holds a count of the number of clock cycles executed by the
processor core on which the hart is running from an arbitrary start time
in the past. RDCYCLEH is an RV32I-only instruction that reads bits 63–32
@@ -46,9 +46,9 @@ environment should provide a means to determine the current rate
[TIP]
====
RDCYCLE is intended to return the number of cycles executed by the
-processor core, not the hart. Precisely defining what is
+processor core, not the hart. Precisely defining what is a "core"
difficult given some implementation choices (e.g., AMD Bulldozer).
-Precisely defining what is a `clock cycle` is also difficult given the
+Precisely defining what is a "clock cycle" is also difficult given the
range of implementations (including software emulations), but the intent
is that RDCYCLE is used for performance monitoring along with the other
performance counters. In particular, where there is one hart/core, one
@@ -71,7 +71,7 @@ threading implementations. For example, should we only count cycles for
which any instruction was issued to execution for this hart, and/or
cycles any instruction retired, or include cycles this hart was
occupying machine resources but couldn’t execute due to stalls while
-other harts went into execution? Likely, `all of the above` would be
+other harts went into execution? Likely, _all of the above_ would be
needed to have understandable performance stats. This complexity of
defining a per-hart cycle count, and also the need in any case for a
total per-core cycle count when tuning multithreaded code led to just
@@ -79,8 +79,8 @@ standardizing the per-core cycle counter, which also happens to work
well for the common single hart/core case.
(((counters, handling sleep cycles)))
-Standardizing what happens during `sleep` is not practical given that
-what `sleep` means is not standardized across execution environments,
+Standardizing what happens during "sleep" is not practical given that
+what "sleep" means is not standardized across execution environments,
but if the entire core is paused (entirely clock-gated or powered-down
in deep sleep), then it is not executing clock cycles, and the cycle
count shouldn’t be increasing per the spec. There are many details,
@@ -90,12 +90,12 @@ execution-environment-specific details.
Even though there is no precise definition that works for all platforms,
this is still a useful facility for most platforms, and an imprecise,
-common, `usually correct` standard here is better than no standard.
+common, "usually correct" standard here is better than no standard.
The intent of RDCYCLE was primarily performance monitoring/tuning, and
the specification was written with that goal in mind.
====
-The RDTIME pseudoinstruction reads the low XLEN bits of the ` time` CSR,
+The RDTIME pseudoinstruction reads the low XLEN bits of the *time* CSR,
which counts wall-clock real time that has passed from an arbitrary
start time in the past. RDTIMEH is an RV32I-only instruction that reads
bits 63–32 of the same real-time counter. The underlying 64-bit counter
@@ -116,14 +116,14 @@ portable, rather than using RDCYCLE to measure wall-clock time.
(((counters, pseudoinstructions)))
The RDINSTRET pseudoinstruction reads the low XLEN bits of the
-` instret` CSR, which counts the number of instructions retired by this
+*instret* CSR, which counts the number of instructions retired by this
hart from some arbitrary start point in the past. RDINSTRETH is an
RV32I-only instruction that reads bits 63–32 of the same instruction
counter. The underlying 64-bit counter should never overflow in
practice.
The following code sequence will read a valid 64-bit cycle counter value
-into `x3`:`x2`, even if the counter overflows its lower half between
+into _x3_:_x2_, even if the counter overflows its lower half between
reading its upper and lower halves.
.Sample code for reading the 64-bit cycle counter in RV32.
@@ -168,9 +168,9 @@ implementations with a richer set of counters.
(((counters, performance)))
There is CSR space allocated for 29 additional unprivileged 64-bit
-hardware performance counters, `hpmcounter3`–`hpmcounter31`. For RV32,
+hardware performance counters, _hpmcounter3_–_hpmcounter31_. For RV32,
the upper 32 bits of these performance counters is accessible via
-additional CSRs `hpmcounter3h`–` hpmcounter31h`. These counters count
+additional CSRs _hpmcounter3h_–_hpmcounter31h_. These counters count
platform-specific events and are configured via additional privileged
registers. The number and width of these additional counters, and the
set of events they count is platform-specific.
@@ -184,6 +184,6 @@ counted.
It would be useful to eventually standardize event settings to count
ISA-level metrics, such as the number of floating-point instructions
executed for example, and possibly a few common microarchitectural
-metrics, such as `L1 instruction cache misses`.
+metrics, such as _L1 instruction cache misses_.
====
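The sample-code listing in the hunk at line 116 is cut off in this view. Its prose describes reading a valid 64-bit cycle value on RV32 even when the lower half overflows between the two reads. The following portable C sketch shows that retry pattern under stated assumptions. It is not the manual's own listing. The pattern reads the high half, then the low half, then the high half again, and retries if the two high reads differ. The names `read_counter64`, `sim_counter`, `sim_read_hi`, and `sim_read_lo` are hypothetical. The simulated counter exists only so the wrap case can be reproduced off-target.

```c
#include <stdint.h>

/* Hypothetical 32-bit accessors standing in for RDCYCLEH and RDCYCLE on RV32.
   Passing them as function pointers lets the retry logic run on any host. */
typedef uint32_t (*read32_fn)(void);

/* Read a 64-bit counter exposed as two 32-bit halves: read high, low, then
   high again. If the high half changed, the low half may have wrapped
   between the reads, so the whole sequence is retried. */
static uint64_t read_counter64(read32_fn read_hi, read32_fn read_lo)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi();
        lo  = read_lo();
        hi2 = read_hi();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated free-running counter (an assumption for illustration only):
   it advances by one after every half read, so a wrap of the low half
   can land between the high and low reads. */
static uint64_t sim_counter;

static uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter++ >> 32); }
static uint32_t sim_read_lo(void) { return (uint32_t)sim_counter++; }
```

Starting the simulated counter at `0xFFFFFFFF` makes the first attempt tear. It reads a high half of 0 and a wrapped low half of 0. The second high read returns 1, so the loop retries and returns a consistent value. On real RV32 hardware, the two accessors would instead wrap the RDCYCLEH and RDCYCLE reads.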