Provide new description of misaligned load/store behavior compatible with privileged architecture.

author: Krste Asanovic <krste@eecs.berkeley.edu> 2018-08-05 20:21:36 -0700
committer: Krste Asanovic <krste@eecs.berkeley.edu> 2018-08-05 20:21:36 -0700
commit: 61cadb9df80baac7e6da6574f7b81d8e8d97f283 (patch)
tree: 9910e5dbdbaa876530fe8fcdc17f2e8707f463ea
parent: db0b5ca0c2269ea494a2e9cf442b14d3039b4ce4 (diff)
download: riscv-isa-manual-61cadb9df80baac7e6da6574f7b81d8e8d97f283.zip
riscv-isa-manual-61cadb9df80baac7e6da6574f7b81d8e8d97f283.tar.gz
riscv-isa-manual-61cadb9df80baac7e6da6574f7b81d8e8d97f283.tar.bz2
3 files changed, 122 insertions, 82 deletions
diff --git a/src/intro.tex b/src/intro.tex
index 5077ae5..a5fd3bd 100644
--- a/src/intro.tex
+++ b/src/intro.tex
@@ -53,22 +53,28 @@ including various data-parallel accelerators, is an explicit goal of
 the ISA design.
 \end{commentary}
 
-The RISC-V manual is structured in two volumes.  This volume covers
-the design of the base {\em unprivileged} instructions, including
-optional unprivileged ISA extensions.  Unprivileged instructions are
-those that are generally usable in all privilege modes, though
-behavior might vary depending on privilege mode.  The second volume
-provides the design of the first (``classic'') privileged
-architecture.
+The RISC-V ISA is defined avoiding implementation details as much as
+possible (although commentary is included on implementation-driven
+decisions) and should be read as the software-visible interface to a
+wide variety of implementations rather than as the design of a
+particular hardware artifact.  The RISC-V manual is structured in two
+volumes.  This volume covers the design of the base {\em unprivileged}
+instructions, including optional unprivileged ISA extensions.
+Unprivileged instructions are those that are generally usable in all
+privilege modes in all privileged architectures, though behavior might
+vary depending on privilege mode and privileged architecture.  The
+second volume provides the design of the first (``classic'')
+privileged architecture.
 
 \begin{commentary}
 In the unprivileged ISA design, we tried to remove any dependence on
-particular microarchitectural features or on privileged architecture
-details.  This is both for simplicity and to allow maximum flexibility
-for alternative microarchitecture or privileged architecture
-implementations.
+particular microarchitectural features, such as cache line size, or on
+privileged architecture details, such as page translation.  This is
+both for simplicity and to allow maximum flexibility for alternative
+microarchitectures or alternative privileged architectures.
 \end{commentary}
 
+
 \section{RISC-V Hardware Platform Terminology}
 
 A RISC-V hardware platform can contain one or more RISC-V-compatible
@@ -109,17 +115,20 @@ isolation between subsystems.
 \section{RISC-V Software Execution Environments and Harts}
 
 The behavior of a RISC-V program depends on the execution environment
-in which it runs.  The execution environment defines the initial state
-of the program, the number and type of harts in the environment including
-the privilege modes supported by the harts, the
-accessibility and attributes of memory and I/O regions, the behavior
-of all legal instructions executed on each hart, and the handling of
-any interrupts or exceptions raised during execution including
-environment calls.  The implementation of a RISC-V execution
-environment can be pure hardware, pure software, or a combination of
-hardware and software.  For example, opcode traps and software
-emulation can be used to implement functionality not provided in
-hardware.  Examples of execution environments include:
+in which it runs.  A RISC-V execution environment interface (EEI)
+defines the initial state of the program, the number and type of harts
+in the environment including the privilege modes supported by the
+harts, the accessibility and attributes of memory and I/O regions, the
+behavior of all legal instructions executed on each hart (i.e., the
+ISA is one component of the EEI), and the handling of any interrupts
+or exceptions raised during execution including environment calls.
+Examples of EEIs include the Linux application binary interface (ABI)
+and the RISC-V supervisor binary interface (SBI).  The implementation
+of a RISC-V execution environment can be pure hardware, pure software,
+or a combination of hardware and software.  For example, opcode traps
+and software emulation can be used to implement functionality not
+provided in hardware.  Examples of execution environment
+implementations include:
 \begin{itemize}
   \item ``Bare metal'' hardware platforms where harts are directly
     implemented by physical processor threads and instructions have
@@ -137,13 +146,14 @@ hardware.  Examples of execution environments include:
 \end{itemize}
 
 \begin{commentary}
-  A bare hardware platform can be considered an execution environment,
-  where the accessible harts, memory, and other devices populate the
-  execution environment, and the initial state is that at power-on
-  reset.  Generally, most software is designed to use a more abstract
-  interface to the hardware, as this abstract execution environment
-  provides greater portability across different hardware platforms.
-  Often execution environments are layered on top of one another.
+  A bare hardware platform can be considered to define an EEI, where
+  the accessible harts, memory, and other devices populate the
+  environment, and the initial state is that at power-on reset.
+  Generally, most software is designed to use a more abstract
+  interface to the hardware, as more abstract EEIs provide greater
+  portability across different hardware platforms.  Often EEIs are
+  layered on top of one another, where one higher-level EEI uses
+  another lower-level EEI.
 \end{commentary}
 
 From the perspective of software running in a given execution
@@ -151,9 +161,8 @@ environment, a hart is a resource that autonomously fetches and
 executes RISC-V instructions within that execution environment.  In
 this respect, a hart behaves like a hardware thread resource even if
 time-multiplexed onto real hardware by the execution environment.
-Some execution environments support the creation and destruction of
-additional harts, for example, via environment calls to fork new
-harts.
+Some EEIs support the creation and destruction of additional harts,
+for example, via environment calls to fork new harts.
 
 \begin{commentary}
 The term hart was introduced in the work on
@@ -544,13 +553,12 @@ to the transfer of control to a trap handler caused by either an
 exception or an interrupt.
 
 The instruction descriptions in following chapters describe conditions
-that can raise an exception during execution.  The general assumption
-for most RISC-V execution environments is that a trap to some handler
-occurs when an exception is signaled on an instruction (except for
-floating-point exceptions, which, in the standard floating-point
-extensions, do not cause traps).  The manner in which interrupts are
-generated, routed to, and enabled by a hart depends on the execution
-environment.
+that can raise an exception during execution.  The general behavior of
+most RISC-V EEIs is that a trap to some handler occurs when an
+exception is signaled on an instruction (except for floating-point
+exceptions, which, in the standard floating-point extensions, do not
+cause traps).  The manner in which interrupts are generated, routed
+to, and enabled by a hart depends on the EEI.
 
 \begin{commentary}
 Our use of ``exception'' and ``trap'' is compatible with that in the IEEE-754
@@ -564,7 +572,7 @@ by a hart at runtime can have four different effects:
 \begin{description}
   \item[Contained Trap:] The trap is visible to, and handled by,
     software running inside the execution environment.  For example,
-    in an execution environment providing both supervisor and user
+    in an EEI providing both supervisor and user
     mode on harts, an ECALL by a user-mode hart will generally result
     in a transfer of control to a supervisor-mode handler running on
     the same hart.  Similarly, in the same environment, when a hart is
@@ -587,10 +595,9 @@ by a hart at runtime can have four different effects:
     ignore timing effects in these definitions).
   \item[Fatal Trap:] The trap represents a fatal failure and causes
     the execution environment to terminate execution.  Examples
-    include failing a virtual-memory page-protection check or allowing a
-    watchdog timer to expire.  Each execution environment should
-    define how execution is terminated and reported to an external
-    environment.
+    include failing a virtual-memory page-protection check or allowing
+    a watchdog timer to expire.  Each EEI should define how execution
+    is terminated and reported to an external environment.
 \end{description}
 
 The following table shows the characteristics of each kind of trap:
@@ -610,20 +617,20 @@ The following table shows the characteristics of each kind of trap:
     requested, 2) imprecise fatal traps might be observable by software.}
 \end{table}
 
-The execution environment defines for each trap whether it is handled
-precisely, though the recommendation is to maintain preciseness where
-possible.  Contained and requested traps can be observed to be
-imprecise by software inside the execution environment.  Invisible
-traps, by definition, cannot be observed to be precise or imprecise by
-software running inside the execution environment.  Fatal traps can be
-observed to be imprecise by software running inside the execution
-environment, if known-errorful instructions do not cause immediate
-termination.
+The EEI defines for each trap whether it is handled precisely, though
+the recommendation is to maintain preciseness where possible.
+Contained and requested traps can be observed to be imprecise by
+software inside the execution environment.  Invisible traps, by
+definition, cannot be observed to be precise or imprecise by software
+running inside the execution environment.  Fatal traps can be observed
+to be imprecise by software running inside the execution environment,
+if known-errorful instructions do not cause immediate termination.
 
 Because this document describes unprivileged instructions, traps are
 rarely mentioned.  Architectural means to handle contained traps are
-defined in the privileged architecture manual.  Unprivileged
-instructions that are defined solely to cause requested traps are
-documented here.  Invisible traps are, by their nature, out of scope
-for this document.  Instruction encodings that are not defined here
-and not defined by some other means may cause a fatal trap.
+defined in the privileged architecture manual, along with other
+features to support richer EEIs.  Unprivileged instructions that are
+defined solely to cause requested traps are documented here.
+Invisible traps are, by their nature, out of scope for this document.
+Instruction encodings that are not defined here and not defined by
+some other means may cause a fatal trap.
diff --git a/src/preface.tex b/src/preface.tex
index 02ab421..dd37bd5 100644
--- a/src/preface.tex
+++ b/src/preface.tex
@@ -53,7 +53,14 @@ The major changes in this version of the document include:
 \item Added clearer and more precise definitions of execution
   environments, harts, and traps.
 \item Defined instruction-set categories: {\em standard}, {\em
-  reserved}, {\em custom}, {\em non-standard}, and {\em non-conforming}.
+  reserved}, {\em custom}, {\em non-standard}, and {\em
+  non-conforming}.
+\item Changed the description of misaligned load and store behavior to
+  reflect that this is now an unprivileged ISA manual rather than a user
+  ISA manual.  Execution environment interfaces may now expose visible
+  address-misaligned exceptions, rather than being required to handle
+  misaligned loads and stores invisibly in user mode.  This behavior was
+  already needed to define the classic privileged architecture.
 \item Defined the signed-zero behavior of FMIN.{\em fmt} and FMAX.{\em fmt},
   and changed their behavior on signaling-NaN inputs to conform to the
   minimumNumber and maximumNumber operations in the proposed IEEE 754-201x
diff --git a/src/rv32.tex b/src/rv32.tex
index 93525ee..f5aafd4 100644
--- a/src/rv32.tex
+++ b/src/rv32.tex
@@ -949,7 +949,7 @@ speculatively and out-of-order with respect to other code~\cite{ibmpower7}.
 RV32I is a load-store architecture, where only load and store
 instructions access memory and arithmetic instructions only operate on
 CPU registers.  RV32I provides a 32-bit address space that is
-byte-addressed and little-endian.  The execution environment will
+byte-addressed and little-endian.  The EEI will
 define what portions of the address space are legal to access with
 which instructions (e.g., some addresses might be read only, or
 support word access only).  Loads with a destination of {\tt x0} must
@@ -1015,25 +1015,41 @@ defined analogously for 8-bit values.  The SW, SH, and SB instructions
 store 32-bit, 16-bit, and 8-bit values from the low bits of register
 {\em rs2} to memory.
 
-For best performance, the effective address for all loads and stores
-should be naturally aligned for each data type (i.e., on a four-byte
-boundary for 32-bit accesses, and a two-byte boundary for 16-bit
-accesses).  The base ISA supports misaligned accesses, but these might
-run extremely slowly depending on the implementation.  Furthermore,
-naturally aligned loads and stores are guaranteed to execute
-atomically, whereas misaligned loads and stores might not, and hence
-require additional synchronization to ensure atomicity.
+Regardless of EEI, loads and stores whose effective addresses are
+naturally aligned shall not raise an address-misaligned exception.
+Loads and stores where the effective address is not naturally aligned
+to the referenced datatype (i.e., on a four-byte boundary for 32-bit
+accesses, and a two-byte boundary for 16-bit accesses) have behavior
+dependent on the EEI.
+
+An EEI may guarantee that misaligned loads and stores are fully
+supported, and so the software running inside the execution
+environment will never experience a contained or fatal
+address-misaligned trap.  In this case, the misaligned loads and
+stores can be handled in hardware, or via an invisible trap into the
+execution environment implementation, or possibly a combination of
+hardware and software depending on address.
+
+Alternatively, an EEI might not guarantee that misaligned loads and
+stores are handled invisibly.  In this case, loads and stores that are
+not naturally aligned may either complete execution successfully or
+raise an address-misaligned exception; they must not fail to execute
+because of misalignment without raising that exception.  The EEI
+will define if address-misaligned exceptions cause a contained trap
+(allowing software running inside the execution environment to handle
+the trap) or a fatal trap (terminating execution).
 
 \begin{commentary}
 Misaligned accesses are occasionally required when porting legacy
-code, and are essential for good performance on many applications when
-using any form of packed-SIMD extension.  Our rationale for supporting
-misaligned accesses via the regular load and store instructions is to
-simplify the addition of misaligned hardware support.  One option
-would have been to disallow misaligned accesses in the base ISA and
-then provide some separate ISA support for misaligned accesses, either
-special instructions to help software handle misaligned accesses or a
-new hardware addressing mode for misaligned accesses.  Special
+code, and help performance on applications when using any form of
+packed-SIMD extension or handling externally packed data structures.
+Our rationale for allowing EEIs to choose to support misaligned
+accesses via the regular load and store instructions is to simplify
+the addition of misaligned hardware support.  One option would have
+been to disallow misaligned accesses in the base ISA and then provide
+some separate ISA support for misaligned accesses, either special
+instructions to help software handle misaligned accesses or a new
+hardware addressing mode for misaligned accesses.  Special
 instructions are difficult to use, complicate the ISA, and often add
 new processor state (e.g., SPARC VIS align address offset register) or
 complicate access to existing processor state (e.g., MIPS LWL/LWR
@@ -1044,13 +1060,23 @@ alignment, which complicates code generation and adds to loop startup
 overhead.  New misaligned hardware addressing modes take considerable
 space in the instruction encoding or require very simplified
 addressing modes (e.g., register indirect only).
+\end{commentary}
 
-We do not mandate atomicity for misaligned accesses so simple
-implementations can just use a machine trap and software handler to
-handle some or all misaligned accesses.  If hardware misaligned support is
-provided, software can exploit this by simply using regular load and
-store instructions.  Hardware can then automatically optimize accesses
-depending on whether runtime addresses are aligned.
+Even when misaligned loads and stores complete successfully, these
+accesses might run extremely slowly depending on the implementation
+(e.g., when implemented via an invisible trap).  Furthermore, whereas
+naturally aligned loads and stores are guaranteed to execute
+atomically, misaligned loads and stores might not, and hence
+require additional synchronization to ensure atomicity.
+
+\begin{commentary}
+We do not mandate atomicity for misaligned accesses so execution
+environment implementations can use an invisible machine trap and
+a software handler to handle some or all misaligned accesses.  If
+hardware misaligned support is provided, software can exploit this by
+simply using regular load and store instructions.  Hardware can then
+automatically optimize accesses depending on whether runtime addresses
+are aligned.
 \end{commentary}
 
 \section{Control and Status Register Instructions}
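
The misaligned load behavior described in the rv32.tex hunk above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not part of the commit: the names `MisalignedPolicy`, `AddressMisaligned`, `FatalTrap`, `is_naturally_aligned`, and `load` are hypothetical, and the model simplifies the second case by always raising on a misaligned address, although the text also permits such an EEI to complete some misaligned accesses successfully.

```python
from enum import Enum


class MisalignedPolicy(Enum):
    """Hypothetical names for the EEI choices described in the patch."""
    SUPPORTED = 1       # misaligned accesses always complete
    CONTAINED_TRAP = 2  # exception is handled by software inside the environment
    FATAL_TRAP = 3      # exception terminates execution


class AddressMisaligned(Exception):
    """Stand-in for a contained address-misaligned trap."""


class FatalTrap(Exception):
    """Stand-in for the environment terminating execution."""


def is_naturally_aligned(addr: int, width: int) -> bool:
    # Natural alignment: the address is a multiple of the access width in bytes.
    return addr % width == 0


def load(mem: bytearray, addr: int, width: int, policy: MisalignedPolicy) -> int:
    """Model an unsigned little-endian load of `width` bytes (1, 2, or 4)."""
    if not is_naturally_aligned(addr, width):
        # Assumption: a non-SUPPORTED policy always raises here.
        if policy is MisalignedPolicy.CONTAINED_TRAP:
            raise AddressMisaligned(hex(addr))
        if policy is MisalignedPolicy.FATAL_TRAP:
            raise FatalTrap(hex(addr))
    # Assemble the value one byte at a time, as a software handler might.
    value = 0
    for i in range(width):
        value |= mem[addr + i] << (8 * i)
    return value
```

With memory bytes `11 22 33 44 55 66 77 88`, a four-byte load at address 1 under `SUPPORTED` returns `0x55443322`, while the same load under `CONTAINED_TRAP` raises `AddressMisaligned`. The byte-at-a-time loop also hints at why atomicity is not guaranteed for misaligned accesses: another hart could modify memory between the individual byte reads.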