:sectnums:
:doctype: book

[discrete]
= RISC-V Instruction Set Manual
[[rv32]]
[#sec:rv32]
= RV32I Base Integer Instruction Set, Version 2.1

This chapter describes the RV32I base integer instruction set.

[TIP]
====
RV32I was designed to be sufficient to form a compiler target and to
support modern operating system environments. The ISA was also designed
to reduce the hardware required in a minimal implementation. RV32I
contains 40 unique instructions, though a simple implementation might
cover the ECALL/EBREAK instructions with a single SYSTEM hardware
instruction that always traps and might be able to implement the FENCE
instruction as a NOP, reducing base instruction count to 38 total. RV32I
can emulate almost any other ISA extension (except the A extension,
which requires additional hardware support for atomicity).

In practice, a hardware implementation including the machine-mode
privileged architecture will also require the 6 CSR instructions.

Subsets of the base integer ISA might be useful for pedagogical
purposes, but the base has been defined such that there should be little
incentive to subset a real hardware implementation beyond omitting
support for misaligned memory accesses and treating all SYSTEM
instructions as a single trap.
====

[NOTE]
====
The standard RISC-V assembly language syntax is documented in the
Assembly Programmer's Manual cite:[riscv-asm-manual].
====

[NOTE]
====
Most of the commentary for RV32I also applies to the RV64I base.
====

[#sec:rv32i-model]
== Programmers' Model for Base Integer ISA

<<gprs>> shows the unprivileged state for the base
integer ISA. For RV32I, the 32 `x` registers are each 32 bits wide,
i.e., `XLEN=32`. Register `x0` is hardwired with all bits equal to 0.
General purpose registers `x1-x31` hold values that various
instructions interpret as a collection of Boolean values, or as two's
complement signed binary integers or unsigned binary integers.

There is one additional unprivileged register: the program counter `pc`
holds the address of the current instruction.

[[gprs]]
.RISC-V base unprivileged integer register state.
[cols="<,^,>",options="header",width="50%",align="center",grid="rows"]
|===
<| [.small]#XLEN-1#| >| [.small]#0#
3+^| [.small]#x0/zero#
3+^| [.small]#x1#
3+^| [.small]#x2#
3+^| [.small]#x3#
3+^| [.small]#x4#
3+^| [.small]#x5#
3+^| [.small]#x6#
3+^| [.small]#x7#
3+^| [.small]#x8#
3+^| [.small]#x9#
3+^| [.small]#x10#
3+^| [.small]#x11#
3+^| [.small]#x12#
3+^| [.small]#x13#
3+^| [.small]#x14#
3+^| [.small]#x15#
3+^| [.small]#x16#
3+^| [.small]#x17#
3+^| [.small]#x18#
3+^| [.small]#x19#
3+^| [.small]#x20#
3+^| [.small]#x21#
3+^| [.small]#x22#
3+^| [.small]#x23#
3+^| [.small]#x24#
3+^| [.small]#x25#
3+^| [.small]#x26#
3+^| [.small]#x27#
3+^| [.small]#x28#
3+^| [.small]#x29#
3+^| [.small]#x30#
3+^| [.small]#x31#
3+^| [.small]#XLEN#
| [.small]#XLEN-1#| >| [.small]#0#
3+^|  [.small]#pc#
3+^| [.small]#XLEN#
|===
[NOTE]
====
There is no dedicated stack pointer or subroutine return address link
register in the Base Integer ISA; the instruction encoding allows any
`x` register to be used for these purposes. However, the standard
software calling convention uses register `x1` to hold the return
address for a call, with register `x5` available as an alternate link
register. The standard calling convention uses register `x2` as the
stack pointer.

Hardware might choose to accelerate function calls and returns that use
`x1` or `x5`. See the descriptions of the JAL and JALR instructions.

The optional compressed 16-bit instruction format is designed around the
assumption that `x1` is the return address register and `x2` is the
stack pointer. Software using other conventions will operate correctly
but may have greater code size.

The number of available architectural registers can have large impacts
on code size, performance, and energy consumption. Although 16 registers
would arguably be sufficient for an integer ISA running compiled code,
it is impossible to encode a complete ISA with 16 registers in 16-bit
instructions using a 3-address format. Although a 2-address format would
be possible, it would increase instruction count and lower efficiency.
We wanted to avoid intermediate instruction sizes (such as Xtensa's
24-bit instructions) to simplify base hardware implementations, and once
a 32-bit instruction size was adopted, it was straightforward to support
32 integer registers. A larger number of integer registers also helps
performance on high-performance code, where there can be extensive use
of loop unrolling, software pipelining, and cache tiling.

For these reasons, we chose a conventional size of 32 integer registers
for RV32I. Dynamic register usage tends to be dominated by a few
frequently accessed registers, and regfile implementations can be
optimized to reduce access energy for the frequently accessed
registers cite:[jtseng:sbbci]. The optional compressed 16-bit instruction format mostly
only accesses 8 registers and hence can provide a dense instruction
encoding, while additional instruction-set extensions could support a
much larger register space (either flat or hierarchical) if desired.

For resource-constrained embedded applications, we have defined the
RV32E subset, which only has 16 registers
(xref:rv32e.adoc[RV32E and RV64E Base Integer Instruction Sets, Version 2.0]).
====
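
The register state described in this section can be modeled in a few lines. The following Python sketch is illustrative only: the class name `RegFile`, its methods, and the helper `as_signed` are assumptions made for this example and are not defined by the specification. It shows `x0` reading as zero regardless of writes, and register values being kept to XLEN bits.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


class RegFile:
    """Hypothetical model of the RV32I unprivileged integer state."""

    def __init__(self):
        self.x = [0] * 32   # x0..x31, each XLEN bits wide
        self.pc = 0         # address of the current instruction

    def read(self, n):
        # x0 is hardwired to zero.
        return 0 if n == 0 else self.x[n]

    def write(self, n, value):
        if n != 0:          # writes to x0 are discarded
            self.x[n] = value & MASK


def as_signed(value):
    """Interpret an XLEN-bit register value as a two's complement integer."""
    return value - (1 << XLEN) if value & (1 << (XLEN - 1)) else value
```

The same stored bit pattern can be read as unsigned (`read`) or as signed (`as_signed`); which view applies depends on the instruction, not on the register.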

[#sec:instr]
== Base Instruction Formats
In the base RV32I ISA, there are four core instruction formats
(R/I/S/U), as shown in <<base_instr>>. All are a fixed 32
bits in length. The base ISA has `IALIGN=32`, meaning that instructions must be aligned on a four-byte boundary in memory. An
instruction-address-misaligned exception is generated on a taken branch
or unconditional jump if the target address is not `IALIGN`-bit aligned.
This exception is reported on the branch or jump instruction, not on the
target instruction. No instruction-address-misaligned exception is
generated for a conditional branch that is not taken.

[NOTE]
====
The alignment constraint for base ISA instructions is relaxed to a
two-byte boundary when instruction extensions with 16-bit lengths or
other odd multiples of 16-bit lengths are added (i.e., IALIGN=16).

Instruction-address-misaligned exceptions are reported on the branch or
jump that would cause instruction misalignment to help debugging, and to
simplify hardware design for systems with IALIGN=32, where these are the
only places where misalignment can occur.
====
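
The alignment rule can be expressed as a small check. This is a sketch under stated assumptions: the exception class `InstructionAddressMisaligned` and the function `check_target` are hypothetical names for this example, and a real implementation would raise an architectural trap rather than a Python exception.

```python
class InstructionAddressMisaligned(Exception):
    """Hypothetical stand-in for the architectural exception."""


def check_target(target, taken=True, ialign=32):
    """Raise if a taken branch or jump targets a misaligned address.

    ialign is 32 for the base ISA, or 16 when instructions with 16-bit
    lengths are supported.
    """
    if taken and target % (ialign // 8) != 0:
        raise InstructionAddressMisaligned(hex(target))
    return target
```

Passing `taken=False` models a conditional branch that is not taken, for which no exception is generated even if the encoded target is misaligned.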

The behavior upon decoding a reserved instruction is UNSPECIFIED.

[NOTE]
====
Some platforms may require that opcodes reserved for standard use raise
an illegal-instruction exception. Other platforms may permit reserved
opcode space to be used for non-conforming extensions.
====

The RISC-V ISA keeps the source (_rs1_ and _rs2_) and destination (_rd_)
registers at the same position in all formats to simplify decoding.
Except for the 5-bit immediates used in CSR instructions
(xref:zicsr.adoc#sec:csrinsts[CSR Instructions]),  immediates are always
sign-extended, and are generally packed towards the leftmost available
bits in the instruction and have been allocated to reduce hardware
complexity. In particular, the sign bit for all immediates is always in
bit 31 of the instruction to speed sign-extension circuitry.

include::unpriv:partial$wavedrom/instruction_formats.adoc[]
[[base_instr,Base instruction formats]]
RISC-V base instruction formats. Each immediate subfield is labeled with the bit position (imm[x]) in the immediate value being produced, rather than the bit position within the instruction's immediate field as is usually done.

[NOTE]
====
Decoding register specifiers is usually on the critical paths in
implementations, and so the instruction format was chosen to keep all
register specifiers at the same position in all formats at the expense
of having to move immediate bits across formats (a property shared with
RISC-IV, also known as SPUR cite:[spur-jsscc1989]).

In practice, most immediates are either small or require all XLEN bits.
We chose an asymmetric immediate split (12 bits in regular instructions
plus a special load-upper-immediate instruction with 20 bits) to
increase the opcode space available for regular instructions.

Immediates are sign-extended because we did not observe a benefit to
using zero-extension for some immediates as in the MIPS ISA and wanted
to keep the ISA as simple as possible.
====
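
Because the register specifiers sit at the same bit positions in every format, a decoder can extract them before it knows the instruction type. The sketch below assumes the standard RV32I field positions from the format diagram; the function name `decode_fields` is hypothetical.

```python
def decode_fields(inst):
    """Extract the fixed-position fields of a 32-bit instruction word."""
    return {
        "opcode": inst & 0x7F,          # inst[6:0]
        "rd":     (inst >> 7) & 0x1F,   # inst[11:7]
        "funct3": (inst >> 12) & 0x7,   # inst[14:12]
        "rs1":    (inst >> 15) & 0x1F,  # inst[19:15]
        "rs2":    (inst >> 20) & 0x1F,  # inst[24:20]
        "funct7": (inst >> 25) & 0x7F,  # inst[31:25]
    }
```

Not every field is meaningful in every format (for example, U-type instructions have no _rs1_), but the extraction itself never depends on the format.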

[#sec:imm-variants]
== Immediate Encoding Variants

There are a further two variants of the instruction formats (B/J) based
on the handling of immediates, as shown in <<baseinstformatsimm>>.

include::unpriv:partial$wavedrom/immediate_variants.adoc[]
[[baseinstformatsimm,Base instruction formats immediate variants.]]
//.RISC-V base instruction formats showing immediate variants.


The only difference between the S and B formats is that the 12-bit
immediate field is used to encode branch offsets in multiples of 2 in
the B format. Instead of shifting all bits in the instruction-encoded
immediate left by one in hardware as is conventionally done, the middle
bits (imm[10:1]) and sign bit stay in fixed positions, while the lowest
bit in S format (inst[7]) encodes a high-order bit in B format.

Similarly, the only difference between the U and J formats is that the
20-bit immediate is shifted left by 12 bits to form U immediates and by
1 bit to form J immediates. The location of instruction bits in the U
and J format immediates is chosen to maximize overlap with the other
formats and with each other.

<<immtypes>> shows the immediates produced by
each of the base instruction formats, and is labeled to show which
instruction bit (inst[_y_]) produces each bit of the immediate value.
[[immtypes, Immediate types]]
.Types of immediate produced by RISC-V instructions.
include::unpriv:partial$wavedrom/immediate.adoc[]

The fields are labeled with the instruction bits used to construct their value. Sign extension always uses inst[31].

[NOTE]
====
Sign-extension is one of the most critical operations on immediates
(particularly for XLEN>32), and in RISC-V the sign bit for
all immediates is always held in bit 31 of the instruction to allow
sign-extension to proceed in parallel with instruction decoding.

Although more complex implementations might have separate adders for
branch and jump calculations and so would not benefit from keeping the
location of immediate bits constant across types of instruction, we
wanted to reduce the hardware cost of the simplest implementations. By
rotating bits in the instruction encoding of B and J immediates instead
of using dynamic hardware muxes to multiply the immediate by 2, we
reduce instruction signal fanout and immediate mux costs by around a
factor of 2. The scrambled immediate encoding will add negligible time
to static or ahead-of-time compilation. For dynamic generation of
instructions, there is some small additional overhead, but the most
common short forward branches have straightforward immediate encodings.
====
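
The immediate formats described above can be sketched as bit-extraction functions. This is an illustrative Python model, not normative text: the helper names (`sext`, `bits`, `imm_i`, and so on) are assumptions for this example. Each function gathers the instruction bits for one format and sign-extends from inst[31].

```python
def sext(value, width):
    """Sign-extend the low `width` bits of value to a Python int."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def bits(inst, hi, lo):
    """Return inst[hi:lo] as an unsigned integer."""
    return (inst >> lo) & ((1 << (hi - lo + 1)) - 1)


def imm_i(inst):
    return sext(bits(inst, 31, 20), 12)


def imm_s(inst):
    return sext(bits(inst, 31, 25) << 5 | bits(inst, 11, 7), 12)


def imm_b(inst):
    # inst[7] supplies imm[11]; imm[0] is always zero.
    return sext(bits(inst, 31, 31) << 12 | bits(inst, 7, 7) << 11
                | bits(inst, 30, 25) << 5 | bits(inst, 11, 8) << 1, 13)


def imm_u(inst):
    return sext(bits(inst, 31, 12) << 12, 32)


def imm_j(inst):
    # inst[20] supplies imm[11]; inst[19:12] supplies imm[19:12].
    return sext(bits(inst, 31, 31) << 20 | bits(inst, 19, 12) << 12
                | bits(inst, 20, 20) << 11 | bits(inst, 30, 21) << 1, 21)
```

For example, `imm_b(0xFE000EE3)` decodes the offset of `beq x0, x0, -4`, and `imm_j(0x008000EF)` decodes the offset of `jal x1, 8`.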

[#sec:int-comp]
== Integer Computational Instructions

Most integer computational instructions operate on `XLEN` bits of values
held in the integer register file. Integer computational instructions
are either encoded as register-immediate operations using the I-type
format or as register-register operations using the R-type format. The
destination is register _rd_ for both register-immediate and
register-register instructions. No integer computational instructions
cause arithmetic exceptions.

[TIP]
====
We did not include special instruction-set support for overflow checks
on integer arithmetic operations in the base instruction set, as many
overflow checks can be cheaply implemented using RISC-V branches.
Overflow checking for unsigned addition requires only a single
additional branch instruction after the addition:
`add t0, t1, t2; bltu t0, t1, overflow`.

For signed addition, if one operand's sign is known, overflow checking
requires only a single branch after the addition:
`addi t0, t1, +imm; blt t0, t1, overflow`. This covers the common case
of addition with an immediate operand.

For general signed addition, three additional instructions after the
addition are required, leveraging the observation that the sum should be
less than one of the operands if and only if the other operand is
negative.

[source,text]
....
         add t0, t1, t2
         slti t3, t2, 0
         slt t4, t0, t1
         bne t3, t4, overflow
....

In RV64I, checks of 32-bit signed additions can be optimized further by
comparing the results of ADD and ADDW on the operands.
====
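
The overflow-check sequences above can be mirrored in Python to see why they work. This is a sketch with hypothetical function names; each line is annotated with the instruction it models, and register values are held as unsigned 32-bit integers.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def signed(v):
    """Two's complement view of an XLEN-bit value."""
    return v - (1 << XLEN) if v >> (XLEN - 1) else v


def add_unsigned_overflows(t1, t2):
    t0 = (t1 + t2) & MASK                  # add  t0, t1, t2
    return t0 < t1                         # bltu t0, t1, overflow


def add_signed_overflows(t1, t2):
    t0 = (t1 + t2) & MASK                  # add  t0, t1, t2
    t3 = int(signed(t2) < 0)               # slti t3, t2, 0
    t4 = int(signed(t0) < signed(t1))      # slt  t4, t0, t1
    return t3 != t4                        # bne  t3, t4, overflow
```

With `t1 = 0x7FFFFFFF` and `t2 = 1`, the wrapped sum is negative while `t2` is not, so `t3` and `t4` differ and the overflow branch is taken.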

[#sec:int-reg-imm]
=== Integer Register-Immediate Instructions

include::unpriv:partial$wavedrom/integer_computational.adoc[]
//.Integer Computational Instructions

ADDI adds the sign-extended 12-bit immediate to register _rs1_.
Arithmetic overflow is ignored and the result is simply the low XLEN
bits of the result. ADDI _rd, rs1, 0_ is used to implement the MV _rd,
rs1_ assembler pseudoinstruction.

SLTI (set less than immediate) places the value 1 in register _rd_ if
register _rs1_ is less than the sign-extended immediate when both are
treated as signed numbers, else 0 is written to _rd_. SLTIU is similar
but compares the values as unsigned numbers (i.e., the immediate is
first sign-extended to XLEN bits then treated as an unsigned number).
Note, SLTIU _rd, rs1, 1_ sets _rd_ to 1 if _rs1_ equals zero, otherwise
sets _rd_ to 0 (assembler pseudoinstruction SEQZ _rd, rs_).

ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and
XOR on register _rs1_ and the sign-extended 12-bit immediate and place
the result in _rd_. Note, XORI _rd, rs1, -1_ performs a bitwise logical
inversion of register _rs1_ (assembler pseudoinstruction NOT _rd, rs_).
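
The register-immediate semantics above, including the MV, SEQZ, and NOT idioms, can be checked with a small Python model. The function names are hypothetical and register values are represented as unsigned 32-bit integers; this is a sketch, not a reference implementation.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def sext12(imm):
    """Sign-extend a 12-bit immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def signed(v):
    return v - (1 << XLEN) if v >> (XLEN - 1) else v


def addi(rs1, imm):
    return (rs1 + sext12(imm)) & MASK       # overflow ignored, low XLEN bits kept


def slti(rs1, imm):
    return int(signed(rs1) < sext12(imm))


def sltiu(rs1, imm):
    return int(rs1 < (sext12(imm) & MASK))  # sign-extend first, then compare unsigned


def xori(rs1, imm):
    return rs1 ^ (sext12(imm) & MASK)
```

Note that `sltiu(rs1, -1)` compares against `0xFFFFFFFF`, because the immediate is sign-extended before being treated as unsigned.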

include::unpriv:partial$wavedrom/int-comp-slli-srli-srai.adoc[]
[[int-comp-slli-srli-srai]]
//.Integer register-immediate, SLLI, SRLI, SRAI

Shifts by a constant are encoded as a specialization of the I-type
format. The operand to be shifted is in _rs1_, and the shift amount is
encoded in the lower 5 bits of the I-immediate field. The right shift
type is encoded in bit 30. SLLI is a logical left shift (zeros are
shifted into the lower bits); SRLI is a logical right shift (zeros are
shifted into the upper bits); and SRAI is an arithmetic right shift (the
original sign bit is copied into the vacated upper bits).

include::unpriv:partial$wavedrom/int-comp-lui-aiupc.adoc[]
[[int-comp-lui-aiupc]]
//.Integer register-immediate, U-immediate

LUI (load upper immediate) is used to build 32-bit constants and uses
the U-type format. LUI places the 32-bit U-immediate value into the
destination register _rd_, filling in the lowest 12 bits with zeros.

AUIPC (add upper immediate to `pc`) is used to build `pc`-relative
addresses and uses the U-type format. AUIPC forms a 32-bit offset from
the U-immediate, filling in the lowest 12 bits with zeros, adds this
offset to the address of the AUIPC instruction, then places the result
in register _rd_.

[NOTE]
====
The assembly syntax for `lui` and `auipc` does not represent the lower
12 bits of the U-immediate, which are always zero.

The AUIPC instruction supports two-instruction sequences to access
arbitrary offsets from the PC for both control-flow transfers and data
accesses. The combination of an AUIPC and the 12-bit immediate in a JALR
can transfer control to any 32-bit PC-relative address, while an AUIPC
plus the 12-bit immediate offset in regular load or store instructions
can access any 32-bit PC-relative data address.

The current PC can be obtained by setting the U-immediate to 0. Although
a JAL +4 instruction could also be used to obtain the local PC (of the
instruction following the JAL), it might cause pipeline breaks in
simpler microarchitectures or pollute BTB structures in more complex
microarchitectures.
====
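
One practical consequence of pairing LUI or AUIPC with a sign-extended 12-bit immediate is that the upper 20 bits must be adjusted whenever bit 11 of the value is set. The following sketch shows the split; the function names `split_hi_lo` and `rebuild` are hypothetical, and the model assumes a 32-bit value.

```python
def split_hi_lo(value):
    """Split a 32-bit value into (hi20, lo12) for LUI/AUIPC plus a 12-bit immediate."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo & 0x800:
        lo -= 0x1000                       # the 12-bit immediate is sign-extended
    hi = ((value - lo) >> 12) & 0xFFFFF    # compensate in the upper 20 bits
    return hi, lo


def rebuild(hi, lo):
    """What LUI (or AUIPC with pc = 0) followed by an add of lo produces."""
    return ((hi << 12) + lo) & 0xFFFFFFFF
```

For `0x12345FFF` the split is `(0x12346, -1)`: the low part sign-extends to -1, so the upper part is one larger than the value's own top 20 bits.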

[#sec:reg-reg]
=== Integer Register-Register Operations

RV32I defines several arithmetic R-type operations. All operations read
the _rs1_ and _rs2_ registers as source operands and write the result
into register _rd_. The _funct7_ and _funct3_ fields select the type of
operation.

include::unpriv:partial$wavedrom/int_reg-reg.adoc[]
[[int-reg-reg]]
//.Integer register-register

ADD performs the addition of _rs1_ and _rs2_. SUB performs the
subtraction of _rs2_ from _rs1_. Overflows are ignored and the low XLEN
bits of results are written to the destination _rd_. SLT and SLTU
perform signed and unsigned compares respectively, writing 1 to _rd_ if
_rs1_ < _rs2_, 0 otherwise. Note, SLTU _rd_, _x0_, _rs2_ sets _rd_ to 1 if
_rs2_ is not equal to zero, otherwise sets _rd_ to zero (assembler
pseudoinstruction SNEZ _rd, rs_). AND, OR, and XOR perform bitwise
logical operations.

SLL, SRL, and SRA perform logical left, logical right, and arithmetic
right shifts on the value in register _rs1_ by the shift amount held in
the lower 5 bits of register _rs2_.
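
A short Python model makes the difference between the shift types and between the signed and unsigned compares concrete. The function names are hypothetical, and register values are held as unsigned 32-bit integers.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def signed(v):
    return v - (1 << XLEN) if v >> (XLEN - 1) else v


def sll(rs1, rs2):
    return (rs1 << (rs2 & 31)) & MASK          # zeros enter the low bits


def srl(rs1, rs2):
    return rs1 >> (rs2 & 31)                   # zeros enter the high bits


def sra(rs1, rs2):
    return (signed(rs1) >> (rs2 & 31)) & MASK  # sign bit fills the high bits


def slt(rs1, rs2):
    return int(signed(rs1) < signed(rs2))


def sltu(rs1, rs2):
    return int(rs1 < rs2)
```

Only the low 5 bits of _rs2_ contribute to the shift amount, so `sll(1, 32)` leaves the value unchanged. Calling `sltu(0, rs2)` models the SNEZ idiom.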

[#sec:nop]
=== NOP Instruction

include::unpriv:partial$wavedrom/nop.adoc[]
[[nop]]
//.NOP instructions

The NOP instruction does not change any architecturally visible state,
except for advancing the `pc` and incrementing any applicable
performance counters. NOP is encoded as ADDI _x0, x0, 0_.

[NOTE]
====
NOPs can be used to align code segments to microarchitecturally
significant address boundaries, or to leave space for inline code
modifications. Although there are many possible ways to encode a NOP, we
define a canonical NOP encoding to allow microarchitectural
optimizations as well as for more readable disassembly output. The other
NOP encodings are made available for <<rv32i-hints>>.

ADDI was chosen for the NOP encoding as this is most likely to take
fewest resources to execute across a range of systems (if not optimized
away in decode). In particular, the instruction only reads one register.
Also, an ADDI functional unit is more likely to be available in a
superscalar design as adds are the most common operation. In particular,
address-generation functional units can execute ADDI using the same
hardware needed for base+offset address calculations, while
register-register ADD or logical/shift operations require additional
hardware.
====
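
The canonical NOP encoding follows directly from the I-type layout. The sketch below assembles it from its fields; `encode_i` and `is_canonical_nop` are hypothetical helper names, and the opcode and _funct3_ constants are those of the standard ADDI encoding.

```python
def encode_i(opcode, rd, funct3, rs1, imm):
    """Assemble an I-type instruction word from its fields."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP_IMM = 0b0010011      # major opcode shared by the register-immediate instructions
ADDI_FUNCT3 = 0b000

# ADDI x0, x0, 0
NOP = encode_i(OP_IMM, rd=0, funct3=ADDI_FUNCT3, rs1=0, imm=0)


def is_canonical_nop(inst):
    return inst == NOP
```

The resulting word is `0x00000013`. Other encodings that also leave state unchanged, such as ADDI _x0, x1, 0_, are not the canonical NOP.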

[#sec:ct-instructions]
== Control Transfer Instructions
RV32I provides two types of control transfer instructions: unconditional
jumps and conditional branches. Control transfer instructions in RV32I
do _not_ have architecturally visible delay slots.

If an instruction access-fault or instruction page-fault exception
occurs on the target of a jump or taken branch, the exception is
reported on the target instruction, not on the jump or branch
instruction.

[#sec:ct-branches]
=== Unconditional Jumps
The jump and link (JAL) instruction uses the J-type format, where the
J-immediate encodes a signed offset in multiples of 2 bytes. The offset
is sign-extended and added to the address of the jump instruction to
form the jump target address. Jumps can therefore target a
&#177;1 MiB range. JAL stores the address of the instruction
following the jump (`pc`+4) into register _rd_. The standard software
calling convention uses `x1` as the return address register and `x5` as
an alternate link register.

[NOTE]
====
The alternate link register supports calling millicode routines (e.g.,
those to save and restore registers in compressed code) while preserving
the regular return address register. The register `x5` was chosen as the
alternate link register as it maps to a temporary in the standard
calling convention, and has an encoding that is only one bit different
than the regular link register.
====

Plain unconditional jumps (assembler pseudoinstruction J) are encoded as
a JAL with _rd_=`x0`.

include::unpriv:partial$wavedrom/ct-unconditional.adoc[]
[[ct-unconditional]]
//.The unconditional-jump instruction, JAL

The indirect jump instruction JALR (jump and link register) uses the
I-type encoding. The target address is obtained by adding the
sign-extended 12-bit I-immediate to the register _rs1_, then setting the
least-significant bit of the result to zero. The address of the
instruction following the jump (`pc`+4) is written to register _rd_.
Register `x0` can be used as the destination if the result is not
required.

include::unpriv:partial$wavedrom/ct-unconditional-2.adoc[]
[[ct-unconditional-2]]
//.The indirect unconditional-jump instruction, JALR

[NOTE]
====
The unconditional jump instructions all use PC-relative addressing to
help support position-independent code. The JALR instruction was defined
to enable a two-instruction sequence to jump anywhere in a 32-bit
absolute address range. A LUI instruction can first load _rs1_ with the
upper 20 bits of a target address, then JALR can add in the lower bits.
Similarly, AUIPC then JALR can jump anywhere in a 32-bit `pc`-relative
address range.

Note that the JALR instruction does not treat the 12-bit immediate as
multiples of 2 bytes, unlike the conditional branch instructions. This
avoids one more immediate format in hardware. In practice, most uses of
JALR will have either a zero immediate or be paired with a LUI or AUIPC,
so the slight reduction in range is not significant.

Clearing the least-significant bit when calculating the JALR target
address both simplifies the hardware slightly and allows the low bit of
function pointers to be used to store auxiliary information. Although
there is potentially a slight loss of error checking in this case, in
practice jumps to an incorrect instruction address will usually quickly
raise an exception.

When used with a base _rs1_=`x0`, JALR can be used to
implement a single instruction subroutine call to the lowest or highest
address region from anywhere in the address space, which could be used
to implement fast calls to a small runtime library. Alternatively, an
ABI could dedicate a general-purpose register to point to a library
elsewhere in the address space.
====
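
The target and link computations for JAL and JALR can be sketched as follows. The function names are hypothetical; `offset` is assumed to be an already decoded J-immediate, and `imm12` the raw 12-bit I-immediate.

```python
MASK = 0xFFFFFFFF


def sext(value, width):
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def jal(pc, offset):
    """Return (target, link) for JAL."""
    return (pc + offset) & MASK, (pc + 4) & MASK


def jalr(pc, rs1_value, imm12):
    """Return (target, link) for JALR; the target's low bit is cleared."""
    target = (rs1_value + sext(imm12, 12)) & MASK & ~1
    return target, (pc + 4) & MASK
```

With `rs1_value = 0` (that is, _rs1_=`x0`), a negative immediate wraps around to the highest addresses and a positive one reaches the lowest, which is the single-instruction call case mentioned in the note above.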

The JAL and JALR instructions will generate an
instruction-address-misaligned exception if the target address is not
aligned to a four-byte boundary.

[NOTE]
====
Instruction-address-misaligned exceptions are not possible on machines
that support extensions with 16-bit aligned instructions, such as the
compressed instruction-set extension, C.
====

Return-address prediction stacks are a common feature of
high-performance instruction-fetch units, but require accurate detection
of instructions used for procedure calls and returns to be effective.
For RISC-V, hints as to the instructions' usage are encoded implicitly
via the register numbers used. A JAL instruction should push the return
address onto a return-address stack (RAS) only when _rd_ is `x1` or
`x5`. JALR instructions should push/pop a RAS as shown in <<rashints>>.

[[rashints]]
.Return-address stack prediction hints encoded in the register operands of a JALR instruction.
[%autowidth,float="center",align="center",cols="^,^,^,<",options="header"]
|===
|_rd_ is _x1/x5_ |_rs1_ is _x1/x5_ |_rd_=_rs1_ |RAS action

|No |No |-- |None

|No |Yes |-- |Pop

|Yes |No |-- |Push

|Yes |Yes |No |Pop, then push

|Yes |Yes |Yes |Push
|===


[NOTE]
====
Some other ISAs added explicit hint bits to their indirect-jump
instructions to guide return-address stack manipulation. We use implicit
hinting tied to register numbers and the calling convention to reduce
the encoding space used for these hints.

When two different link registers (`x1` and `x5`) are given as _rs1_ and
_rd_, then the RAS is both popped and pushed to support coroutines. If
_rs1_ and _rd_ are the same link register (either `x1` or `x5`), the RAS
is only pushed to enable macro-op fusion of the sequences:
`lui ra, imm20; jalr ra, imm12(ra)` and `auipc ra, imm20; jalr ra, imm12(ra)`.
====
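
The hint table for JALR and the rule for JAL translate directly into a lookup. This is a sketch of how a fetch unit might classify instructions; the function names and the string return values are assumptions for this example.

```python
LINK_REGS = {1, 5}   # x1 and x5


def jalr_ras_action(rd, rs1):
    """Return the RAS action hinted by a JALR's register operands."""
    rd_link = rd in LINK_REGS
    rs1_link = rs1 in LINK_REGS
    if not rd_link:
        return "pop" if rs1_link else "none"
    if not rs1_link or rd == rs1:
        return "push"
    return "pop, then push"


def jal_ras_action(rd):
    """A JAL pushes only when rd is a link register."""
    return "push" if rd in LINK_REGS else "none"
```

A function return written as `jalr x0, 0(x1)` yields `"pop"`, while an indirect call through `jalr x1, 0(x10)` yields `"push"`.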

[#sec:ct-cond-branches]
=== Conditional Branches

All branch instructions use the B-type instruction format. The 12-bit
B-immediate encodes signed offsets in multiples of 2 bytes. The offset
is sign-extended and added to the address of the branch instruction to
give the target address. The conditional branch range is
&#177;4 KiB.
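
The quoted ranges follow from the immediate widths. As a rough check, the sketch below computes the reachable byte offsets for a signed field that counts multiples of 2 bytes; `offset_range` is a hypothetical helper, and the same arithmetic gives the &#177;1 MiB range of the 20-bit J-immediate.

```python
def offset_range(field_bits):
    """Byte range reachable with a signed field counting multiples of 2 bytes."""
    total_bits = field_bits + 1            # the implicit low bit is always zero
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 2
    return lowest, highest


BRANCH_RANGE = offset_range(12)   # B-immediate
JAL_RANGE = offset_range(20)      # J-immediate
```

The ranges are slightly asymmetric: a branch can reach 4096 bytes backward but only 4094 bytes forward.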

include::unpriv:partial$wavedrom/ct-conditional.adoc[]
[[ct-conditional]]
//.Conditional branches

Branch instructions compare two registers. BEQ and BNE take the branch
if registers _rs1_ and _rs2_ are equal or unequal respectively. BLT and
BLTU take the branch if _rs1_ is less than _rs2_, using signed and
unsigned comparison respectively. BGE and BGEU take the branch if _rs1_
is greater than or equal to _rs2_, using signed and unsigned comparison
respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by
reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

[NOTE]
====
Signed array bounds may be checked with a single BLTU instruction, since
any negative index will compare greater than any nonnegative bound.
====
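
The bounds-check idiom can be illustrated with a small model of the BLTU comparison. The function names are hypothetical; the index may be negative and the length is assumed to be nonnegative.

```python
XLEN = 32
MASK = (1 << XLEN) - 1


def bltu_taken(rs1, rs2):
    """Unsigned less-than on XLEN-bit register values."""
    return (rs1 & MASK) < (rs2 & MASK)


def index_in_bounds(index, length):
    """Models: bltu index, length, in_bounds."""
    return bltu_taken(index, length)
```

An index of -1 is held in a register as `0xFFFFFFFF`, which is larger than any nonnegative signed length when compared as unsigned, so one comparison rejects both negative and too-large indices.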

Software should be optimized such that the sequential code path is the
most common path, with less-frequently taken code paths placed out of
line. Software should also assume that backward branches will be
predicted taken and forward branches as not taken, at least the first
time they are encountered. Dynamic predictors should quickly learn any
predictable branch behavior.

Unlike some other architectures, the RISC-V jump (JAL with _rd_=`x0`)
instruction should always be used for unconditional branches instead of
a conditional branch instruction with an always-true condition. RISC-V
jumps are also PC-relative and support a much wider offset range than
branches, and will not pollute conditional-branch prediction tables.

[TIP]
====
The conditional branches were designed to include arithmetic comparison
operations between two registers (as also done in PA-RISC, Xtensa, and
MIPS R6), rather than use condition codes (x86, ARM, SPARC, PowerPC), or
to only compare one register against zero (Alpha, MIPS), or two
registers only for equality (MIPS). This design was motivated by the
observation that a combined compare-and-branch instruction fits into a
regular pipeline, avoids additional condition code state or use of a
temporary register, and reduces static code size and dynamic instruction
fetch traffic. Another point is that comparisons against zero require
non-trivial circuit delay (especially after the move to static logic in
advanced processes) and so are almost as expensive as arithmetic
magnitude compares. Another advantage of a fused compare-and-branch
instruction is that branches are observed earlier in the front-end
instruction stream, and so can be predicted earlier. There is perhaps an
advantage to a design with condition codes in the case where multiple
branches can be taken based on the same condition codes, but we believe
this case to be relatively rare.

We considered but did not include static branch hints in the instruction
encoding. These can reduce the pressure on dynamic predictors, but
require more instruction encoding space and software profiling for best
results, and can result in poor performance if production runs do not
match profiling runs.

We considered but did not include conditional moves or predicated
instructions, which can effectively replace unpredictable short forward
branches. Conditional moves are the simpler of the two, but are
difficult to use with conditional code that might cause exceptions
(memory accesses and floating-point operations). Predication adds
additional flag state to a system, additional instructions to set and
clear flags, and additional encoding overhead on every instruction. Both
conditional move and predicated instructions add complexity to
out-of-order microarchitectures, adding an implicit third source operand
due to the need to copy the original value of the destination
architectural register into the renamed destination physical register if
the predicate is false. Also, static compile-time decisions to use
predication instead of branches can result in lower performance on
inputs not included in the compiler training set, especially given that
unpredictable branches are rare, and becoming rarer as branch prediction
techniques improve.

We note that various microarchitectural techniques exist to dynamically
convert unpredictable short forward branches into internally predicated
code to avoid the cost of flushing pipelines on a branch mispredict
cite:[heil-tr1996], cite:[Klauser-1998], cite:[Kim-micro2005] and
have been implemented in commercial processors cite:[ibmpower7]. The simplest techniques
just reduce the penalty of recovering from a mispredicted short forward
branch by only flushing instructions in the branch shadow instead of the
entire fetch pipeline, or by fetching instructions from both sides using
wide instruction fetch or idle instruction fetch slots. More complex
techniques for out-of-order cores add internal predicates on
instructions in the branch shadow, with the internal predicate value
written by the branch instruction, allowing the branch and following
instructions to be executed speculatively and out-of-order with respect
to other code.
====

The conditional branch instructions will generate an
instruction-address-misaligned exception if the target address is not
aligned to a four-byte boundary and the branch condition evaluates to
true. If the branch condition evaluates to false, the
instruction-address-misaligned exception will not be raised.

[NOTE]
====
Instruction-address-misaligned exceptions are not possible on machines
that support extensions with 16-bit aligned instructions, such as the
compressed instruction-set extension, C.
====
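
The rule above can be sketched as a small model. This is an illustrative
sketch only: the function name, the exception class, and the
`has_c_extension` flag are hypothetical and are not defined by this
specification.

```python
class InstructionAddressMisaligned(Exception):
    """Hypothetical stand-in for the architectural exception."""


def next_pc(pc, offset, taken, has_c_extension=False):
    """Return the next pc for a conditional branch located at `pc`.

    `offset` is the sign-extended branch offset (a multiple of 2).
    The target alignment is checked only when the branch is taken.
    """
    if not taken:
        # Fall through: the target address is never examined.
        return (pc + 4) & 0xFFFFFFFF
    target = (pc + offset) & 0xFFFFFFFF
    # With 16-bit-aligned instructions (e.g., the C extension), every
    # even target is legal, so the exception cannot occur.
    if not has_c_extension and target % 4 != 0:
        raise InstructionAddressMisaligned(hex(target))
    return target
```

A not-taken branch with a misaligned target simply falls through, while
the same branch traps when taken on a machine without 16-bit-aligned
instructions.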

[[ldst]]

[#sec:rv32i-load-store]
== Load and Store Instructions
RV32I is a load-store architecture, where only load and store
instructions access memory and arithmetic instructions only operate on
CPU registers. RV32I provides a 32-bit address space that is
byte-addressed. The EEI will define what portions of the address space
are legal to access with which instructions (e.g., some addresses might
be read only, or support word access only). Loads with a destination of
`x0` must still raise any exceptions and cause any other side effects
even though the load value is discarded.

The EEI will define whether the memory system is little-endian or
big-endian. In RISC-V, endianness is byte-address invariant.

[TIP]
====
In a system for which endianness is byte-address invariant, the
following property holds: if a byte is stored to memory at some address
in some endianness, then a byte-sized load from that address in any
endianness returns the stored value.

In a little-endian configuration, multibyte stores write the
least-significant register byte at the lowest memory byte address,
followed by the other register bytes in ascending order of their
significance. Loads similarly transfer the contents of the lesser memory
byte addresses to the less-significant register bytes.

In a big-endian configuration, multibyte stores write the
most-significant register byte at the lowest memory byte address,
followed by the other register bytes in descending order of their
significance. Loads similarly transfer the contents of the greater
memory byte addresses to the less-significant register bytes.
====
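
As a minimal sketch of the byte orderings described above, the following
Python fragment builds the memory image of a multibyte store, lowest
address first. The helper names `store_bytes` and `load_bytes` are
hypothetical and serve only this illustration.

```python
def store_bytes(value, size, endian):
    """Memory image (lowest address first) of the low `size` bytes of a register."""
    mask = (1 << (8 * size)) - 1
    return (value & mask).to_bytes(size, endian)


def load_bytes(image, endian):
    """Reassemble a register value from a memory image."""
    return int.from_bytes(image, endian)


word = 0x11223344
little_image = store_bytes(word, 4, "little")  # bytes 44 33 22 11
big_image = store_bytes(word, 4, "big")        # bytes 11 22 33 44

# Byte-address invariance: a single byte has the same image in both modes.
byte_invariant = store_bytes(0xAB, 1, "little") == store_bytes(0xAB, 1, "big")
```

A load in the same endianness as the store recovers the original value,
and a byte-sized access is unaffected by the choice of endianness.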

include::unpriv:partial$wavedrom/load_store.adoc[]
[[load-store,load and store]]
//.Load and store instructions

Load and store instructions transfer a value between the registers and
memory. Loads are encoded in the I-type format and stores are S-type.
The effective address is obtained by adding register _rs1_ to the
sign-extended 12-bit offset. Loads copy a value from memory to register
_rd_. Stores copy the value in register _rs2_ to memory.

The LW instruction loads a 32-bit value from memory into _rd_. LH loads
a 16-bit value from memory, then sign-extends to 32 bits before storing
in _rd_. LHU loads a 16-bit value from memory but then zero-extends to
32 bits before storing in _rd_. LB and LBU are defined analogously for
8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and
8-bit values from the low bits of register _rs2_ to memory.
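
The behavior described above can be modeled in a few lines. The sketch
below assumes a little-endian memory system and represents memory as a
Python dictionary from byte address to byte value; both choices, and all
function names, are assumptions made for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def effective_address(rs1_value, imm12):
    """rs1 plus the sign-extended 12-bit offset, wrapped to 32 bits."""
    return (rs1_value + sign_extend(imm12, 12)) & 0xFFFFFFFF


def load(mem, addr, size, signed):
    """LB/LH/LW when `signed` is true, LBU/LHU when false (size in bytes)."""
    raw = int.from_bytes(bytes(mem[addr + i] for i in range(size)), "little")
    if signed:
        raw = sign_extend(raw, 8 * size)
    return raw & 0xFFFFFFFF  # value written to rd


def store(mem, addr, size, rs2_value):
    """SB/SH/SW: write the low `size` bytes of rs2 to memory."""
    low_bits = rs2_value & ((1 << (8 * size)) - 1)
    for i, byte in enumerate(low_bits.to_bytes(size, "little")):
        mem[addr + i] = byte
```

For example, after storing `0xFFFF8001` with a word store, a signed
halfword load from the same address yields `0xFFFF8001`, while the
unsigned form yields `0x00008001`.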

Regardless of EEI, loads and stores whose effective addresses are
naturally aligned shall not raise an address-misaligned exception. Loads
and stores whose effective address is not naturally aligned to the
referenced datatype (i.e., the effective address is not divisible by the
size of the access in bytes) have behavior dependent on the EEI.
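
The natural-alignment test is a simple divisibility check. The helper
name below is hypothetical.

```python
def is_naturally_aligned(addr, size):
    """True if `addr` is divisible by the access size in bytes (1, 2, or 4)."""
    return addr % size == 0


# Byte accesses are always aligned; a word access at 0x1002 is not.
examples = {
    ("LB", 0x1003): is_naturally_aligned(0x1003, 1),
    ("LH", 0x1002): is_naturally_aligned(0x1002, 2),
    ("LW", 0x1002): is_naturally_aligned(0x1002, 4),
}
```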

An EEI may guarantee that misaligned loads and stores are fully
supported, and so the software running inside the execution environment
will never experience a contained or fatal address-misaligned trap. In
this case, the misaligned loads and stores can be handled in hardware,
or via an invisible trap into the execution environment implementation,
or possibly a combination of hardware and invisible trap depending on
address.

Alternatively, an EEI might not guarantee that misaligned loads and
stores are handled invisibly. In this case, loads and stores that are
not naturally aligned
may either complete execution successfully or raise an exception. The
exception raised can be either an address-misaligned exception or an
access-fault exception. For a memory access that would otherwise be able
to complete except for the misalignment, an access-fault exception can
be raised instead of an address-misaligned exception if the misaligned
access should not be emulated, e.g., if accesses to the memory region
have side effects. When an EEI does not guarantee misaligned loads and
stores are handled invisibly, the EEI must define if exceptions caused
by address misalignment result in a contained trap (allowing software
running inside the execution environment to handle the trap) or a fatal
trap (terminating execution).

[TIP]
====
Misaligned accesses are occasionally required when porting legacy code,
and help performance on applications when using any form of packed-SIMD
extension or handling externally packed data structures. Our rationale
for allowing EEIs to choose to support misaligned accesses via the
regular load and store instructions is to simplify the addition of
misaligned hardware support. One option would have been to disallow
misaligned accesses in the base ISAs and then provide some separate ISA
support for misaligned accesses, either special instructions to help
software handle misaligned accesses or a new hardware addressing mode
for misaligned accesses. Special instructions are difficult to use,
complicate the ISA, and often add new processor state (e.g., SPARC VIS
align address offset register) or complicate access to existing
processor state (e.g., MIPS LWL/LWR partial register writes). In
addition, for loop-oriented packed-SIMD code, the extra overhead when
operands are misaligned motivates software to provide multiple forms of
loop depending on operand alignment, which complicates code generation
and adds to loop startup overhead. New misaligned hardware addressing
modes take considerable space in the instruction encoding or require
very simplified addressing modes (e.g., register indirect only).
====

Even when misaligned loads and stores complete successfully, these
accesses might run extremely slowly depending on the implementation
(e.g., when implemented via an invisible trap). Furthermore, whereas
naturally aligned loads and stores are guaranteed to execute atomically,
misaligned loads and stores might not, and hence require additional
synchronization to ensure atomicity.

[NOTE]
====
We do not mandate atomicity for misaligned accesses so execution
environment implementations can use an invisible machine trap and a
software handler to handle some or all misaligned accesses. If hardware
misaligned support is provided, software can exploit this by simply
using regular load and store instructions. Hardware can then
automatically optimize accesses depending on whether runtime addresses
are aligned.
====

[#sec:fence]
[#sec:mem-order]
== Memory Ordering Instructions

include::unpriv:partial$wavedrom/mem_order.adoc[]
[[mem-order]]
//.Memory ordering instructions

The FENCE instruction is used to order device I/O and memory accesses as
viewed by other RISC-V harts and external devices or coprocessors. Any
combination of device input (I), device output (O), memory reads \(R),
and memory writes (W) may be ordered with respect to any combination of
the same. Informally, no other RISC-V hart or external device can
observe any operation in the _successor_ set following a FENCE before
any operation in the _predecessor_ set preceding the FENCE.
xref:rvwmo.adoc[RVWMO Memory Consistency Model] provides a precise description
of the RISC-V memory consistency model.

The FENCE instruction also orders memory reads and writes made by the
hart as observed by memory reads and writes made by an external device.
However, FENCE does not order observations of events made by an external
device using any other signaling mechanism.

[NOTE]
====
A device might observe an access to a memory location via some external
communication mechanism, e.g., a memory-mapped control register that
drives an interrupt signal to an interrupt controller. This
communication is outside the scope of the FENCE ordering mechanism and
hence the FENCE instruction can provide no guarantee on when a change in
the interrupt signal is visible to the interrupt controller. Specific
devices might provide additional ordering guarantees to reduce software
overhead but those are outside the scope of the RISC-V memory model.
====

The EEI will define what I/O operations are possible, and in particular,
which memory addresses when accessed by load and store instructions will
be treated and ordered as device input and device output operations
respectively rather than memory reads and writes. For example,
memory-mapped I/O devices will typically be accessed with uncached loads
and stores that are ordered using the I and O bits rather than the R and
W bits. Instruction-set extensions might also describe new I/O
instructions that will also be ordered using the I and O bits in a
FENCE.

[[fm]]
[float="center",align="center",cols="^1,^1,<3",options="header"]
.Fence mode encoding
|===
|_fm_ field |Mnemonic |Meaning
|0000 |_none_ |Normal Fence
|1000 |TSO |With `FENCE RW,RW`: exclude write-to-read ordering; otherwise: _Reserved for future use._
2+|_other_ |_Reserved for future use._
|===

The fence mode field _fm_ defines the semantics of the `FENCE`. A `FENCE`
with _fm_=`0000` orders all memory operations in its predecessor set
before all memory operations in its successor set.

The `FENCE.TSO` instruction is encoded as a `FENCE` instruction
with _fm_=`1000`, _predecessor_=`RW`, and _successor_=`RW`. `FENCE.TSO` orders
all load operations in its predecessor set before all memory operations
in its successor set, and all store operations in its predecessor set
before all store operations in its successor set. This leaves non-AMO
store operations in the `FENCE.TSO`'s predecessor set unordered with
non-AMO loads in its successor set.
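
The following sketch assembles FENCE instruction words from the fields
discussed above. The bit positions (_fm_ in bits 31:28, predecessor bits
I/O/R/W in 27:24, successor bits I/O/R/W in 23:20, and the MISC-MEM
major opcode) follow the standard RV32I FENCE encoding diagram, which is
not reproduced in this text. The function and constant names are
hypothetical.

```python
OPCODE_MISC_MEM = 0b0001111
ACCESS_BITS = {"i": 0b1000, "o": 0b0100, "r": 0b0010, "w": 0b0001}


def access_set(spec):
    """Convert a string such as "rw" or "iorw" into a 4-bit I/O/R/W mask."""
    mask = 0
    for ch in spec.lower():
        mask |= ACCESS_BITS[ch]
    return mask


def encode_fence(pred, succ, fm=0b0000, rs1=0, rd=0):
    """Assemble a 32-bit FENCE word; funct3 is 000 and rs1/rd default to zero."""
    return (
        (fm << 28)
        | (access_set(pred) << 24)
        | (access_set(succ) << 20)
        | (rs1 << 15)
        | (0b000 << 12)
        | (rd << 7)
        | OPCODE_MISC_MEM
    )


fence_rw_rw = encode_fence("rw", "rw")             # normal fence
fence_tso = encode_fence("rw", "rw", fm=0b1000)    # FENCE.TSO
```

Under these assumptions, `FENCE RW,RW` and `FENCE.TSO` differ only in
the _fm_ field.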

[NOTE]
====
Because FENCE RW,RW imposes a superset of the orderings that FENCE.TSO
imposes, it is correct to ignore the _fm_ field and implement FENCE.TSO as FENCE RW,RW.
====
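
The superset relationship can be checked by enumerating the four
combinations of earlier and later access kinds. This is an informal
sketch over non-AMO loads and stores only; it is not a substitute for
the precise memory model, and the function names are hypothetical.

```python
from itertools import product


def fence_rw_rw_orders(earlier, later):
    """FENCE RW,RW orders every earlier read/write before every later one."""
    return True


def fence_tso_orders(earlier, later):
    """FENCE.TSO: loads before all accesses, stores only before stores."""
    return earlier == "load" or (earlier == "store" and later == "store")


pairs = set(product(["load", "store"], repeat=2))
ordered_by_full = {p for p in pairs if fence_rw_rw_orders(*p)}
ordered_by_tso = {p for p in pairs if fence_tso_orders(*p)}

# The only pair FENCE.TSO leaves unordered is an earlier store with a later load.
only_in_full = ordered_by_full - ordered_by_tso
```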

The unused fields in the `FENCE` instructions--_rs1_ and _rd_--are reserved
for finer-grain fences in future extensions. For forward compatibility,
base implementations shall ignore these fields, and standard software
shall zero these fields. Likewise, many _fm_ and predecessor/successor
set settings in <<fm>> are also reserved for future use.
Base implementations shall treat all such reserved configurations as
normal fences with _fm_=`0000`, and standard software shall use only
non-reserved configurations.

[TIP]
====
We chose a relaxed memory model to allow high performance from simple
machine implementations and from likely future coprocessor or
accelerator extensions. We separate out I/O ordering from memory R/W
ordering to avoid unnecessary serialization within a device-driver hart
and also to support alternative non-memory paths to control added
coprocessors or I/O devices. Simple implementations may additionally
ignore the _predecessor_ and _successor_ fields and always execute a
conservative fence on all operations.
====

[#sec:env-call-breakpoints]
== Environment Call and Breakpoints
`SYSTEM` instructions are used to access system functionality that might
require privileged access and are encoded using the I-type instruction
format. These can be divided into two main classes: those that
atomically read-modify-write control and status registers (CSRs), and
all other potentially privileged instructions. CSR instructions are
described in xref:zicsr.adoc#sec:csrinsts[CSR Instructions], and the base
unprivileged instructions are described in the following section.


[TIP]
====
The SYSTEM instructions are defined to allow simpler implementations to
always trap to a single software trap handler. More sophisticated
implementations might execute more of each system instruction in
hardware.
====

include::unpriv:partial$wavedrom/env_call-breakpoint.adoc[]
[[env-call]]
//.Environment call and breakpoint instructions

These two instructions cause a precise requested trap to the supporting
execution environment.

The `ECALL` instruction is used to make a service request to the execution
environment. The EEI will define how parameters for the service request
are passed, but usually these will be in defined locations in the
integer register file.

The `EBREAK` instruction is used to return control to a debugging
environment.

[NOTE]
====
ECALL and EBREAK were previously named SCALL and SBREAK. The
instructions have the same functionality and encoding, but were renamed
to reflect that they can be used more generally than to call a
supervisor-level operating system or debugger.
====

[TIP]
====
EBREAK was primarily designed to be used by a debugger to cause
execution to stop and fall back into the debugger. EBREAK is also used
by the standard gcc compiler to mark code paths that should not be
executed.

Another use of EBREAK is to support "semihosting", where the execution
environment includes a debugger that can provide services over an
alternate system call interface built around the EBREAK instruction.
Because the RISC-V base ISAs do not provide more than one EBREAK
instruction, RISC-V semihosting uses a special sequence of instructions
to distinguish a semihosting EBREAK from a debugger-inserted EBREAK.

[source,asm]
....
    slli x0, x0, 0x1f   # Entry NOP
    ebreak              # Break to debugger
    srai x0, x0, 7      # NOP encoding the semihosting call number 7
....

Note that these three instructions must be 32-bit-wide instructions,
i.e., they must not be among the compressed 16-bit instructions described
in xref:c-st-ext.adoc["C" Extension for Compressed Instructions].

The shift NOP instructions are still considered available for use as
HINTs.

Semihosting is a form of service call and would be more naturally
encoded as an ECALL using an existing ABI, but this would require the
debugger to be able to intercept ECALLs, which is a newer addition to
the debug standard. We intend to move over to using ECALLs with a
standard ABI, in which case, semihosting can share a service ABI with an
existing standard.

We note that ARM processors have also moved to using SVC instead of BKPT
for semihosting calls in newer designs.
====
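
As an illustration of how a debugger might recognize the semihosting
sequence shown above, the sketch below encodes the two shift NOPs using
the standard RV32I shift-immediate layout and compares a window of three
32-bit instruction words. The function names are hypothetical, and a
real debugger would also need to handle fetch faults and instruction
width.

```python
OPCODE_OP_IMM = 0b0010011


def encode_shift_imm(funct7, shamt, rs1, funct3, rd):
    """Assemble an RV32I SLLI/SRLI/SRAI instruction word."""
    return (
        (funct7 << 25) | (shamt << 20) | (rs1 << 15)
        | (funct3 << 12) | (rd << 7) | OPCODE_OP_IMM
    )


ENTRY_NOP = encode_shift_imm(0b0000000, 0x1F, 0, 0b001, 0)  # slli x0, x0, 0x1f
EBREAK = 0x00100073                                         # ebreak
EXIT_NOP = encode_shift_imm(0b0100000, 7, 0, 0b101, 0)      # srai x0, x0, 7


def is_semihosting_ebreak(words, index):
    """True if words[index] is an EBREAK bracketed by the two marker NOPs."""
    if index < 1 or index + 1 >= len(words):
        return False
    return (
        words[index - 1] == ENTRY_NOP
        and words[index] == EBREAK
        and words[index + 1] == EXIT_NOP
    )
```

An EBREAK that is not surrounded by both marker instructions is treated
as an ordinary breakpoint.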

[#sec:rv32i-hints]
== HINT Instructions
//[#rv32i-hints,HINT Instructions]

[[rv32i-hints,HINT Instructions]]

RV32I reserves a large encoding space for HINT instructions, which are
usually used to communicate performance hints to the microarchitecture.
Like the NOP instruction, HINTs do not change any architecturally
visible state, except for advancing the `pc` and any applicable
performance counters. Implementations are always allowed to ignore the
encoded hints.

Most RV32I HINTs are encoded as integer computational instructions with
_rd_=x0. The other RV32I HINTs are encoded as FENCE instructions with
a null predecessor or successor set and with _fm_=0.

[NOTE]
====
These HINT encodings have been chosen so that simple implementations can
ignore HINTs altogether, and instead execute a HINT as a regular
instruction that happens not to mutate the architectural state. For
example, ADD is a HINT if the destination register is `x0`; the five-bit
_rs1_ and _rs2_ fields encode arguments to the HINT. However, a simple
implementation can simply execute the HINT as an ADD of _rs1_ and _rs2_
that writes `x0`, which has no architecturally visible effect.

As another example, a FENCE instruction with a zero _pred_ field and a
zero _fm_ field is a HINT; the _succ_, _rs1_, and _rd_ fields encode the
arguments to the HINT. A simple implementation can simply execute the
HINT as a FENCE that orders the null set of prior memory accesses before
whichever subsequent memory accesses are encoded in the _succ_ field.
Since the intersection of the predecessor and successor sets is null,
the instruction imposes no memory orderings, and so it has no
architecturally visible effect.
====
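
The two examples above can be sketched as follows. The register file is
modeled as a Python list in which writes to `x0` are discarded; the
function names are hypothetical and the model ignores performance
counters.

```python
def execute_add(regs, pc, rd, rs1, rs2):
    """Execute ADD as a regular instruction; a write to x0 is discarded."""
    result = (regs[rs1] + regs[rs2]) & 0xFFFFFFFF
    if rd != 0:
        regs[rd] = result
    return (pc + 4) & 0xFFFFFFFF


def is_fence_hint(fm, pred, succ):
    """A FENCE with fm=0 and a null predecessor or successor set is a HINT."""
    return fm == 0 and (pred == 0 or succ == 0)


regs = [0] * 32
regs[5], regs[6] = 7, 9
before = list(regs)
new_pc = execute_add(regs, 0x200, rd=0, rs1=5, rs2=6)  # ADD x0, x5, x6
state_unchanged = regs == before
```

Executing the ADD HINT advances the `pc` and leaves every register
unchanged, which is why a simple implementation need not decode HINTs
specially.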

<<t-rv32i-hints>> lists all RV32I HINT code points. 91% of the
HINT space is reserved for standard HINTs. The remainder of the HINT
space is designated for custom HINTs: no standard HINTs will ever be
defined in this subspace.

[TIP]
====
We anticipate standard hints to eventually include memory-system spatial
and temporal locality hints, branch prediction hints, thread-scheduling
hints, security tags, and instrumentation flags for simulation/emulation.
====

// this table might still have some problems--some rows might not have landed properly. It needs to be checked cell-by cell.

[[t-rv32i-hints]]
.RV32I HINT instructions.
[float="center",align="center",cols="<,<,^,<",options="header"]
|===
|Instruction |Constraints |Code Points |Purpose

|LUI |_rd_=`x0` |latexmath:[2^{20}] .8+<.^m|_Designated for future standard use_

|AUIPC |_rd_=`x0` |latexmath:[2^{20}]

|ADDI |_rd_=`x0`, and either _rs1_&#8800;``x0`` or _imm_&#8800;0 |latexmath:[2^{17}-1]

|ANDI |_rd_=`x0` |latexmath:[2^{17}]

|ORI |_rd_=`x0` |latexmath:[2^{17}]

|XORI |_rd_=`x0` |latexmath:[2^{17}]

|ADD |_rd_=`x0`, _rs1_&#8800;``x0`` |latexmath:[2^{10}-32]

|ADD |_rd_=`x0`, _rs1_=`x0`, _rs2_&#8800;``x2-x5`` | 28

|ADD |_rd_=`x0`, _rs1_=`x0`, _rs2_=`x2-x5` |4|(_rs2_=`x2`) NTL.P1 +
(_rs2_=`x3`) NTL.PALL +
(_rs2_=`x4`) NTL.S1 +
(_rs2_=`x5`) NTL.ALL

|SUB |_rd_=`x0` |latexmath:[2^{10}] .11+<.^m|_Designated for future standard use_

|AND |_rd_=`x0` |latexmath:[2^{10}]

|OR |_rd_=`x0` |latexmath:[2^{10}]

|XOR |_rd_=`x0` |latexmath:[2^{10}]

|SLL |_rd_=`x0` |latexmath:[2^{10}]

|SRL |_rd_=`x0` |latexmath:[2^{10}]

|SRA |_rd_=`x0` |latexmath:[2^{10}]

|FENCE|_rd_=`x0`, _rs1_&#8800;``x0``, _fm_=0, and either _pred_=0 or _succ_=0| latexmath:[2^{10}-63]

|FENCE|_rd_&#8800;``x0``, _rs1_=`x0`, _fm_=0, and either _pred_=0 or _succ_=0| latexmath:[2^{10}-63]

|FENCE |_rd_=_rs1_=`x0`, _fm_=0, _pred_=0, _succ_&#8800;0 |15

|FENCE |_rd_=_rs1_=`x0`, _fm_=0, _pred_&#8800;W, _succ_=0 |15

|FENCE |_rd_=_rs1_=`x0`, _fm_=0, _pred_=W, _succ_=0 |1 |PAUSE

4+|

|SLTI |_rd_=`x0` |latexmath:[2^{17}] .7+<.^m|_Designated for custom use_

|SLTIU|_rd_=`x0` |latexmath:[2^{17}]

|SLLI |_rd_=`x0` |latexmath:[2^{10}]

|SRLI |_rd_=`x0` |latexmath:[2^{10}]

|SRAI |_rd_=`x0` |latexmath:[2^{10}]

|SLT |_rd_=`x0` |latexmath:[2^{10}]

|SLTU |_rd_=`x0` |latexmath:[2^{10}]
|===
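
As a cross-check of the 91% figure quoted earlier, the code-point counts
in the table can be summed directly. The sketch below copies the counts
from the table rows; the variable names are hypothetical.

```python
standard_counts = (
    [2**20, 2**20]            # LUI, AUIPC
    + [2**17 - 1]             # ADDI (excluding the canonical NOP)
    + [2**17] * 3             # ANDI, ORI, XORI
    + [2**10 - 32, 28, 4]     # ADD rows, including the four NTL hints
    + [2**10] * 7             # SUB, AND, OR, XOR, SLL, SRL, SRA
    + [2**10 - 63] * 2        # FENCE with exactly one of rd, rs1 nonzero
    + [15, 15, 1]             # FENCE with rd=rs1=x0, including PAUSE
)
custom_counts = (
    [2**17] * 2               # SLTI, SLTIU
    + [2**10] * 5             # SLLI, SRLI, SRAI, SLT, SLTU
)

standard_total = sum(standard_counts)
custom_total = sum(custom_counts)
standard_share = standard_total / (standard_total + custom_total)
```

With these counts the standard share evaluates to roughly 0.908, which
rounds to the 91% stated in the text.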