1 file changed, 139 insertions, 0 deletions
diff --git a/src/m.tex b/src/m.tex
new file mode 100644
index 0000000..cc289e8
--- /dev/null
+++ b/src/m.tex
@@ -0,0 +1,139 @@
+\chapter{``M'' Standard Extension for Integer Multiplication and
+  Division, Version 2.0}
+
+This chapter describes the standard integer multiplication and
+division instruction extension, which is named ``M'' and contains
+instructions that multiply or divide values held in two integer
+registers.
+
+\begin{commentary}
+We separate integer multiply and divide out from the base to simplify
+low-end implementations, and to accommodate applications where integer
+multiply and divide operations are either infrequent or better handled
+in attached accelerators.
+\end{commentary}
+
+\section{Multiplication Operations}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{S@{}R@{}R@{}S@{}R@{}O}
+\\
+\instbitrange{31}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct7} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{funct3} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+7 & 5 & 5 & 3 & 5 & 7 \\
+MULDIV & multiplier & multiplicand & MUL/MULH[[S]U] & dest & OP    \\
+MULDIV & multiplier & multiplicand & MULW           & dest & OP-32 \\
+\end{tabular}
+\end{center}
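+
+As an illustrative sketch (not part of this specification), the C helper
+below packs the six fields of the table into a 32-bit instruction word.
+The helper name \texttt{encode\_rtype} is hypothetical, and the numeric
+values used for MULDIV, OP, OP-32, and the funct3 codes are taken from
+the RISC-V opcode map rather than from this section.

```c
#include <stdint.h>

/* Hypothetical helper: place each field at the bit range given in the table. */
static uint32_t encode_rtype(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                             uint32_t funct3, uint32_t rd, uint32_t opcode) {
    return ((funct7 & 0x7Fu) << 25) |  /* bits 31:25 */
           ((rs2 & 0x1Fu) << 20) |     /* bits 24:20 */
           ((rs1 & 0x1Fu) << 15) |     /* bits 19:15 */
           ((funct3 & 0x7u) << 12) |   /* bits 14:12 */
           ((rd & 0x1Fu) << 7) |       /* bits 11:7  */
           (opcode & 0x7Fu);           /* bits 6:0   */
}

/* Assumed values from the RISC-V opcode map (not stated in this chapter). */
enum { MULDIV = 0x01, OP = 0x33, OP_32 = 0x3B };
enum { F3_MUL = 0, F3_MULH = 1, F3_MULHSU = 2, F3_MULHU = 3 };
```

+Under these assumptions, \texttt{mul a0, a1, a2} (rd${}=10$, rs1${}=11$,
+rs2${}=12$) encodes as \texttt{0x02c58533}, and the same fields with the
+OP-32 opcode give the corresponding MULW.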
+
+MUL performs an XLEN-bit$\times$XLEN-bit multiplication and places the
+lower XLEN bits in the destination register.  MULH, MULHU, and MULHSU
+perform the same multiplication but return the upper XLEN bits of the
+full 2$\times$XLEN-bit product, for signed$\times$signed,
+unsigned$\times$unsigned, and signed {\em rs1}$\times$unsigned {\em rs2}
+multiplication, respectively.  If both the high and low bits of the same
+product are required, then the recommended code sequence is: MULH[[S]U]
+{\em rdh, rs1, rs2}; MUL {\em rdl, rs1, rs2} (the source register
+specifiers must be in the same order, and {\em rdh} cannot be the same
+as {\em rs1} or {\em rs2}, since MUL must still read the original source
+values).  Microarchitectures can then fuse these into a single
+multiply operation instead of performing two separate multiplies.
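+
+To make the four variants concrete, the following C sketch models them
+for RV32 (XLEN${}=32$) by forming the full 64-bit product and selecting
+its lower or upper half.  The function names are hypothetical, registers
+are modeled as \texttt{uint32\_t} bit patterns, and the conversions to
+\texttt{int32\_t} assume two's-complement wraparound, as on GCC and Clang.

```c
#include <stdint.h>

/* MUL: low XLEN bits, identical for signed and unsigned operands. */
static uint32_t rv32_mul(uint32_t rs1, uint32_t rs2) {
    return (uint32_t)((uint64_t)rs1 * rs2);
}

/* MULH: upper half of signed x signed. */
static uint32_t rv32_mulh(uint32_t rs1, uint32_t rs2) {
    int64_t p = (int64_t)(int32_t)rs1 * (int64_t)(int32_t)rs2;
    return (uint32_t)((uint64_t)p >> 32);
}

/* MULHU: upper half of unsigned x unsigned. */
static uint32_t rv32_mulhu(uint32_t rs1, uint32_t rs2) {
    uint64_t p = (uint64_t)rs1 * (uint64_t)rs2;
    return (uint32_t)(p >> 32);
}

/* MULHSU: upper half of signed rs1 x unsigned rs2.
   |product| < 2^63, so the int64_t multiply cannot overflow. */
static uint32_t rv32_mulhsu(uint32_t rs1, uint32_t rs2) {
    int64_t p = (int64_t)(int32_t)rs1 * (int64_t)rs2;
    return (uint32_t)((uint64_t)p >> 32);
}
```

+A fused MULH[[S]U]/MUL pair corresponds to computing the 64-bit product
+once and returning both halves.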
+
+MULW is an RV64-only instruction that multiplies the lower 32 bits of
+the source registers, placing the sign-extension of the lower 32 bits of
+the result into the destination register.  In RV64, MUL can be used to
+obtain the upper 32 bits of the 64-bit product of two 32-bit values, but
+signed arguments must be proper 32-bit signed values (sign-extended to
+64 bits), whereas unsigned arguments must have their upper 32 bits clear.
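+
+The C sketch below models MULW on RV64 and shows why MUL suffices for a
+full 32-bit$\times$32-bit product when the arguments are properly sign-
+or zero-extended.  The function names are hypothetical, registers are
+modeled as \texttt{uint64\_t} bit patterns, and two's-complement
+conversion is assumed, as on GCC and Clang.

```c
#include <stdint.h>

/* Sign-extend a 32-bit value to 64 bits. */
static uint64_t sext32(uint32_t x) {
    return (uint64_t)(int64_t)(int32_t)x;
}

/* Hypothetical model of RV64 MULW: multiply the low 32 bits of each source,
   keep the low 32 bits of the product, and sign-extend them. */
static uint64_t rv64_mulw(uint64_t rs1, uint64_t rs2) {
    uint32_t lo = (uint32_t)((uint64_t)(uint32_t)rs1 * (uint32_t)rs2);
    return sext32(lo);
}

/* Hypothetical model of RV64 MUL: low 64 bits of the product.
   Unsigned arithmetic in C wraps modulo 2^64, discarding the upper half. */
static uint64_t rv64_mul(uint64_t rs1, uint64_t rs2) {
    return rs1 * rs2;
}
```

+With both arguments held as sign-extended 32-bit values, shifting
+\texttt{rv64\_mul(a, b)} right by 32 bits yields the upper half of the
+signed product; with zero-extended arguments it yields the unsigned
+upper half.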
+
+\section{Division Operations}
+
+\vspace{-0.2in}
+\begin{center}
+\begin{tabular}{S@{}R@{}R@{}O@{}R@{}O}
+\\
+\instbitrange{31}{25} &
+\instbitrange{24}{20} &
+\instbitrange{19}{15} &
+\instbitrange{14}{12} &
+\instbitrange{11}{7} &
+\instbitrange{6}{0} \\
+\hline
+\multicolumn{1}{|c|}{funct7} &
+\multicolumn{1}{c|}{rs2} &
+\multicolumn{1}{c|}{rs1} &
+\multicolumn{1}{c|}{funct3} &
+\multicolumn{1}{c|}{rd} &
+\multicolumn{1}{c|}{opcode} \\
+\hline
+7 & 5 & 5 & 3 & 5 & 7 \\
+MULDIV & divisor & dividend & DIV[U]/REM[U]   & dest & OP    \\
+MULDIV & divisor & dividend & DIV[U]W/REM[U]W & dest & OP-32 \\
+\end{tabular}
+\end{center}
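+
+Conversely, a decoder can recognize every M-extension instruction from
+the funct7, funct3, and opcode fields laid out in the tables above.  The
+C sketch below is illustrative only: \texttt{m\_ext\_mnemonic} is a
+hypothetical name, and the numeric values of MULDIV, OP, OP-32, and each
+funct3 code come from the RISC-V opcode map rather than this chapter.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the mnemonic of an M-extension instruction, or NULL otherwise. */
static const char *m_ext_mnemonic(uint32_t insn) {
    /* funct3 -> mnemonic under OP (assumed 0110011). */
    static const char *const op[8] = {
        "mul", "mulh", "mulhsu", "mulhu", "div", "divu", "rem", "remu"};
    /* funct3 -> mnemonic under OP-32 (assumed 0111011); 001-011 unused. */
    static const char *const op32[8] = {
        "mulw", NULL, NULL, NULL, "divw", "divuw", "remw", "remuw"};

    uint32_t opcode = insn & 0x7Fu;         /* bits 6:0   */
    uint32_t funct3 = (insn >> 12) & 0x7u;  /* bits 14:12 */
    uint32_t funct7 = insn >> 25;           /* bits 31:25 */

    if (funct7 != 0x01u) return NULL;       /* not MULDIV */
    if (opcode == 0x33u) return op[funct3];
    if (opcode == 0x3Bu) return op32[funct3];
    return NULL;
}
```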
+
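+DIV and DIVU perform XLEN-bit by XLEN-bit signed and unsigned integer
+division of {\em rs1} by {\em rs2}, rounding towards zero.  REM and REMU
+provide the remainder of the corresponding division operation; for REM,
+a nonzero result has the same sign as the dividend.  If both the quotient and remainder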
+are required from the same division, the recommended code sequence is:
+DIV[U] {\em rdq, rs1, rs2}; REM[U] {\em rdr, rs1, rs2} ({\em rdq}
+cannot be the same as {\em rs1} or {\em rs2}).  Microarchitectures can
+then fuse these into a single divide operation instead of performing
+two separate divides.
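+
+The following C sketch shows what a fused DIV/REM pair produces: one
+signed division yields both the quotient and the remainder, with the
+remainder recovered from the quotient.  It covers only the ordinary case
+(nonzero divisor and no signed overflow), which is also the only case in
+which C's \texttt{/} is defined; the struct and function names are
+hypothetical.  C99 integer division truncates toward zero, as RISC-V
+division does.

```c
#include <stdint.h>

/* Hypothetical result of a fused DIV+REM: one division, two outputs. */
typedef struct {
    int32_t quot;
    int32_t rem;
} divrem32;

/* Ordinary case only: divisor != 0 and not INT32_MIN / -1.
   Because `/` truncates toward zero, the remainder below matches REM
   and takes the sign of the dividend. */
static divrem32 rv32_divrem(int32_t dividend, int32_t divisor) {
    divrem32 out;
    out.quot = dividend / divisor;
    out.rem = dividend - out.quot * divisor;  /* reuse the quotient */
    return out;
}
```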
+
+DIVW and DIVUW are RV64-only instructions that divide the lower 32 bits
+of {\em rs1} by the lower 32 bits of {\em rs2}, treating them as signed
+and unsigned integers respectively, and place the 32-bit quotient in
+{\em rd}, sign-extended to 64 bits.  REMW and REMUW are the corresponding
+RV64-only signed and unsigned remainder instructions.  Both REMW and
+REMUW always sign-extend the 32-bit result to 64 bits, including on a
+divide by zero.
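+
+The sign-extension of an unsigned 32-bit result is easy to overlook, so
+the C sketch below models DIVUW and REMUW on RV64, including the
+divide-by-zero results of Table~\ref{tab:divby0} with XLEN taken as 32.
+The function names are hypothetical, registers are modeled as
+\texttt{uint64\_t} bit patterns, and two's-complement conversion is
+assumed, as on GCC and Clang.

```c
#include <stdint.h>

/* Sign-extend a 32-bit result to 64 bits. */
static uint64_t sext32(uint32_t x) {
    return (uint64_t)(int64_t)(int32_t)x;
}

/* Hypothetical model of RV64 DIVUW: unsigned divide of the low halves,
   then sign-extend the 32-bit quotient. */
static uint64_t rv64_divuw(uint64_t rs1, uint64_t rs2) {
    uint32_t a = (uint32_t)rs1, b = (uint32_t)rs2;
    uint32_t q = (b == 0) ? UINT32_MAX : a / b;  /* divide by zero: all ones */
    return sext32(q);
}

/* Hypothetical model of RV64 REMUW. */
static uint64_t rv64_remuw(uint64_t rs1, uint64_t rs2) {
    uint32_t a = (uint32_t)rs1, b = (uint32_t)rs2;
    uint32_t r = (b == 0) ? a : a % b;           /* divide by zero: dividend */
    return sext32(r);
}
```

+For example, DIVUW with a dividend of $2^{31}$ and a divisor of 1 returns
+\texttt{0xFFFFFFFF80000000} rather than \texttt{0x80000000}.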
+
+The semantics for division by zero and division overflow are summarized in
+Table~\ref{tab:divby0}.  The quotient of division by zero has all bits set,
+i.e., $2^{XLEN}-1$ for unsigned division or $-1$ for signed division.  The
+remainder of division by zero equals the dividend.  Signed division overflow
+occurs only when the most-negative integer, $-2^{XLEN-1}$, is divided by $-1$.
+The quotient of signed division overflow is equal to the dividend, and the
+remainder is zero.  Unsigned division overflow cannot occur.  For the
+DIV[U]W and REM[U]W instructions, the same rules apply with XLEN replaced
+by 32.
+
+\vspace{0.1in}
+\begin{table}[h]
+\centering
+\begin{tabular}{|l|c|c||c|c|c|c|}
+\hline
+Condition              & Dividend      & Divisor & DIVU         & REMU & DIV           & REM  \\ \hline
+Division by zero       & $x$           & 0       & $2^{XLEN}-1$ & $x$  & $-1$          & $x$  \\
+Overflow (signed only) & $-2^{XLEN-1}$ & $-1$    & --           & --   & $-2^{XLEN-1}$ & 0    \\
+\hline
+\end{tabular}
+\caption{Semantics for division by zero and division overflow.}
+\label{tab:divby0}
+\end{table}
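+
+The table can be checked against a small C model.  The sketch below,
+with hypothetical function names, implements DIV, DIVU, REM, and REMU
+for RV32 (XLEN${}=32$) on register bit patterns, assuming
+two's-complement conversion as on GCC and Clang.  The explicit checks
+are needed in C as well as for the ISA semantics, because C leaves both
+division by zero and \texttt{INT32\_MIN / -1} undefined.

```c
#include <stdint.h>

static uint32_t rv32_div(uint32_t rs1, uint32_t rs2) {
    int32_t a = (int32_t)rs1, b = (int32_t)rs2;
    if (b == 0) return UINT32_MAX;                      /* -1: all bits set */
    if (a == INT32_MIN && b == -1) return rs1;          /* overflow: dividend */
    return (uint32_t)(a / b);                           /* rounds toward zero */
}

static uint32_t rv32_rem(uint32_t rs1, uint32_t rs2) {
    int32_t a = (int32_t)rs1, b = (int32_t)rs2;
    if (b == 0) return rs1;                             /* remainder = dividend */
    if (a == INT32_MIN && b == -1) return 0;            /* overflow: zero */
    return (uint32_t)(a % b);                           /* sign of dividend */
}

static uint32_t rv32_divu(uint32_t rs1, uint32_t rs2) {
    return rs2 == 0 ? UINT32_MAX : rs1 / rs2;           /* 2^XLEN - 1 on zero */
}

static uint32_t rv32_remu(uint32_t rs1, uint32_t rs2) {
    return rs2 == 0 ? rs1 : rs1 % rs2;                  /* dividend on zero */
}
```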
+
+\begin{commentary}
+We considered raising exceptions on integer divide by zero, with these
+exceptions causing a trap in most execution environments.  However,
+this would be the only arithmetic trap in the standard ISA
+(floating-point exceptions set flags and write default values, but do
+not cause traps) and would require language implementors to interact
+with the execution environment's trap handlers for this case.
+Further, where language standards mandate that a divide-by-zero
+exception must cause an immediate control-flow change, only a single
+branch instruction needs to be added to each divide operation.  This
+branch can be placed after the divide and should normally be highly
+predictable as not taken, adding little runtime overhead.
+\end{commentary}