aboutsummaryrefslogtreecommitdiff
path: root/src/c-st-ext.adoc
blob: 97aca5faef4233c004d19e4d6eca7934c6bab075 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
[[compressed]]
== "C" Extension for Compressed Instructions, Version 2.0

This chapter describes the RISC-V standard compressed instruction-set
extension, named "C", which reduces static and dynamic code size by
adding short 16-bit instruction encodings for common operations. The C
extension can be added to any of the base ISAs (RV32, RV64, RV128), and
we use the generic term "RVC" to cover any of these. Typically,
50%-60% of the RISC-V instructions in a program can be replaced with RVC
instructions, resulting in a 25%-30% code-size reduction.

=== Overview

RVC uses a simple compression scheme that offers shorter 16-bit versions
of common 32-bit RISC-V instructions when:

* the immediate or address offset is small, or
* one of the registers is the zero register (`x0`), the ABI link register
(`x1`), or the ABI stack pointer (`x2`), or
* the destination register and the first source register are identical, or
* the registers used are the 8 most popular ones.

The C extension is compatible with all other standard instruction
extensions. The C extension allows 16-bit instructions to be freely
intermixed with 32-bit instructions, with the latter now able to start
on any 16-bit boundary, i.e., IALIGN=16. With the addition of the C
extension, no instructions can raise instruction-address-misaligned
exceptions.

[NOTE]
====
Removing the 32-bit alignment constraint on the original 32-bit
instructions allows significantly greater code density.
====

The compressed instruction encodings are mostly common across RV32C,
RV64C, and RV128C, but as shown in <<rvc-instr-table0, Table 34>>, a few opcodes are used for
different purposes depending on base ISA. For example, the wider
address-space RV64C and RV128C variants require additional opcodes to
compress loads and stores of 64-bit integer values, while RV32C uses the
same opcodes to compress loads and stores of single-precision
floating-point values. Similarly, RV128C requires additional opcodes to
capture loads and stores of 128-bit integer values, while these same
opcodes are used for loads and stores of double-precision floating-point
values in RV32C and RV64C. If the C extension is implemented, the
appropriate compressed floating-point load and store instructions must
be provided whenever the relevant standard floating-point extension (F
and/or D) is also implemented. In addition, RV32C includes a compressed
jump and link instruction to compress short-range subroutine calls,
where the same opcode is used to compress ADDIW for RV64C and RV128C.

[TIP]
====
Double-precision loads and stores are a significant fraction of static
and dynamic instructions, hence the motivation to include them in the
RV32C and RV64C encoding.
   
Although single-precision loads and stores are not a significant source
of static or dynamic compression for benchmarks compiled for the
currently supported ABIs, for microcontrollers that only provide
hardware single-precision floating-point units and have an ABI that only
supports single-precision floating-point numbers, the single-precision
loads and stores will be used at least as frequently as double-precision
loads and stores in the measured benchmarks. Hence, the motivation to
provide compressed support for these in RV32C.

Short-range subroutine calls are more likely in small binaries for
microcontrollers, hence the motivation to include these in RV32C.

Although reusing opcodes for different purposes for different base ISAs
adds some complexity to documentation, the impact on implementation
complexity is small even for designs that support multiple base ISAs.
The compressed floating-point load and store variants use the same
instruction format with the same register specifiers as the wider
integer loads and stores.
====

RVC was designed under the constraint that each RVC instruction expands
into a single 32-bit instruction in either the base ISA (RV32I/E, RV64I/E,
or RV128I) or the F and D standard extensions where present. Adopting
this constraint has two main benefits:

* Hardware designs can simply expand RVC instructions during decode,
simplifying verification and minimizing modifications to existing
microarchitectures.

* Compilers can be unaware of the RVC extension and leave code compression
to the assembler and linker, although a compression-aware compiler will
generally be able to produce better results.

[NOTE]
====
We felt the multiple complexity reductions of a simple one-one mapping
between C and base IFD instructions far outweighed the potential gains
of a slightly denser encoding that added additional instructions only
supported in the C extension, or that allowed encoding of multiple IFD
instructions in one C instruction.
====

It is important to note that the C extension is not designed to be a
stand-alone ISA, and is meant to be used alongside a base ISA.

[TIP]
====
Variable-length instruction sets have long been used to improve code
density. For example, the IBM Stretch cite:[stretch], developed in the late 1950s, had
an ISA with 32-bit and 64-bit instructions, where some of the 32-bit
instructions were compressed versions of the full 64-bit instructions.
Stretch also employed the concept of limiting the set of registers that
were addressable in some of the shorter instruction formats, with short
branch instructions that could only refer to one of the index registers.
The later IBM 360 architecture cite:[ibm360] supported a simple variable-length
instruction encoding with 16-bit, 32-bit, or 48-bit instruction formats.

In 1963, CDC introduced the Cray-designed CDC 6600 cite:[cdc6600], a precursor to RISC
architectures, that introduced a register-rich load-store architecture
with instructions of two lengths, 15-bits and 30-bits. The later Cray-1
design used a very similar instruction format, with 16-bit and 32-bit
instruction lengths.

The initial RISC ISAs from the 1980s all picked performance over code
size, which was reasonable for a workstation environment, but not for
embedded systems. Hence, both ARM and MIPS subsequently made versions of
the ISAs that offered smaller code size by offering an alternative
16-bit wide instruction set instead of the standard 32-bit wide
instructions. The compressed RISC ISAs reduced code size relative to
their starting points by about 25-30%, yielding code that was
significantly smaller than 80x86. This result surprised some, as their
intuition was that the variable-length CISC ISA should be smaller than
RISC ISAs that offered only 16-bit and 32-bit formats.

Since the original RISC ISAs did not leave sufficient opcode space free
to include these unplanned compressed instructions, they were instead
developed as complete new ISAs. This meant compilers needed different
code generators for the separate compressed ISAs. The first compressed
RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only a fixed
16-bit instruction size, which gave good reductions in static code size
but caused an increase in dynamic instruction count, which led to lower
performance compared to the original fixed-width 32-bit instruction
size. This led to the development of a second generation of compressed
RISC ISA designs with mixed 16-bit and 32-bit instruction lengths (e.g.,
ARM Thumb2, microMIPS, PowerPC VLE), so that performance was similar to
pure 32-bit instructions but with significant code size savings.
Unfortunately, these different generations of compressed ISAs are
incompatible with each other and with the original uncompressed ISA,
leading to significant complexity in documentation, implementations, and
software tools support.

Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently
supports a compressed instruction format. It is surprising that the most
popular 64-bit ISA for mobile platforms (ARM v8) does not include a
compressed instruction format given that static code size and dynamic
instruction fetch bandwidth are important metrics. Although static code
size is not a major concern in larger systems, instruction fetch
bandwidth can be a major bottleneck in servers running commercial
workloads, which often have a large instruction working set.

Benefiting from 25 years of hindsight, RISC-V was designed to support
compressed instructions from the outset, leaving enough opcode space for
RVC to be added as a simple extension on top of the base ISA (along with
many other extensions). The philosophy of RVC is to reduce code size for
embedded applications _and_ to improve performance and energy-efficiency
for all applications due to fewer misses in the instruction cache.
Waterman shows that RVC fetches 25%-30% fewer instruction bits, which
reduces instruction cache misses by 20%-25%, or roughly the same
performance impact as doubling the instruction cache size. cite:[waterman-ms]
====

=== Compressed Instruction Formats
((((compressed, formats))))

<<rvc-form>> shows the nine compressed instruction
formats. CR, CI, and CSS can use any of the 32 RVI registers, but CIW,
CL, CS, CA, and CB are limited to just 8 of them.
<<registers>> lists these popular registers, which
correspond to registers `x8` to `x15`. Note that there is a separate
version of load and store instructions that use the stack pointer as the
base address register, since saving to and restoring from the stack are
so prevalent, and that they use the CI and CSS formats to allow access
to all 32 data registers. CIW supplies an 8-bit immediate for the
ADDI4SPN instruction.

[NOTE]
====
The RISC-V ABI was changed to make the frequently used registers map to
registers 'x8-x15'. This simplifies the decompression decoder by
having a contiguous naturally aligned set of register numbers, and is
also compatible with the RV32E and RV64E base ISAs, which only have 16 integer
registers.
====
Compressed register-based floating-point loads and stores also use the
CL and CS formats respectively, with the eight registers mapping to `f8` to `f15`.
((((calling convention, standard))))
[NOTE]
====
_The standard RISC-V calling convention maps the most frequently used
floating-point registers to registers `f8` to `f15`, which allows the
same register decompression decoding as for integer register numbers._
====
((((register source spcifiers, c-ext))))
The formats were designed to keep bits for the two register source
specifiers in the same place in all instructions, while the destination
register field can move. When the full 5-bit destination register
specifier is present, it is in the same place as in the 32-bit RISC-V
encoding. Where immediates are sign-extended, the sign extension is
always from bit 12. Immediate fields have been scrambled, as in the base
specification, to reduce the number of immediate muxes required.
[NOTE]
====
The immediate fields are scrambled in the instruction formats instead of
in sequential order so that as many bits as possible are in the same
position in every instruction, thereby simplifying implementations.
====

For many RVC instructions, zero-valued immediates are disallowed and
`x0` is not a valid 5-bit register specifier. These restrictions free up
encoding space for other instructions requiring fewer operand bits.

//[[cr-register]]
//include::images/wavedrom/cr-register.adoc[]
//.Compressed 16-bit RVC instructions
//(((compressed, 16-bit)))

[[rvc-form]]
.Compressed 16-bit RVC instruction formats
//[%header]
[float="center",align="center",cols="1a, 2a",frame="none",grid="none"]
|===
| 
[%autowidth,float="right",align="right",cols="^,^",frame="none",grid="none",options="noheader"]
!===
!Format ! Meaning
!CR ! Register
!CI ! Immediate
!CSS ! Stack-relative Store
!CIW ! Wide Immediate
!CL ! Load
!CS ! Store
!CA ! Arithmetic
!CB ! Branch/Arithmetic
!CJ ! Jump
!===
|
[float="left",align="left",cols="1,1,1,1,1,1,1",options="noheader"]
!===
2+^!15 14 13 12 2+^!11 10 9 8 7 2+^!6 5 4 3 2 ^!1 0
2+^!funct4 2+^!rd/rs1 2+^!rs2 ^!  op
^!funct3 ^!imm 2+^!rd/rs1  2+^!imm ^!  op
^!funct3 3+^!imm  2+^!rs2 ^!  op
^!funct3 4+^!imm ^!rd&#x2032; ^! op
^!funct3 2+^!imm ^!rs1&#x2032; ^!imm ^!rd&#x2032; ^! op
^!funct3 2+^!imm ^!rs1&#x2032; ^! imm ^!rs2&#x2032; ^! op
3+^!funct6 ^!rd&#x2032;/rs1&#x2032; ^!funct2 ^!rs2&#x2032; ^! op
^!funct3 2+^!offset ^!rd&#x2032;/rs1&#x2032; 2+^!offset ^! op
^!funct3 5+^!jump target ^! op
!===
|===

[[registers]]
.Registers specified by the three-bit _rs1_&#x2032;, _rs2_&#x2032;, and _rd_&#x2032; fields of the CIW, CL, CS, CA, and CB formats.
//[cols="20%,10%,10%,10%,10%,10%,10%,10%,10%"]
[float="center",align="center",cols="1a, 1a",frame="none",grid="none"]
|===
| 
[%autowidth,cols="<",frame="none",grid="none",options="noheader"]
!===
!RVC Register Number
!Integer Register Number
!Integer Register ABI Name 
!Floating-Point Register Number
!Floating-Point Register ABI Name 
!===
|

[%autowidth,cols="^,^,^,^,^,^,^,^",options="noheader"]
!===
!`000` !`001` !`010` !`011` !`100` !`101` !`110` !`111`
!`x8` !`x9` !`x10` !`x11` !`x12` !`x13` !`x14`!`x15`
!`s0` !`s1` !`a0` !`a1` !`a2` !`a3` !`a4`!`a5`
!`f8` !`f9` !`f10` !`f11` !`f12` !`f13`!`f14` !`f15`
!`fs0` !`fs1` !`fa0` !`fa1` !`fa2`!`fa3` !`fa4` !`fa5`
!===
|===


=== Load and Store Instructions

To increase the reach of 16-bit instructions, data-transfer instructions
use zero-extended immediates that are scaled by the size of the data in
bytes: ×4 for words, ×8 for double
words, and ×16 for quad words.

RVC provides two variants of loads and stores. One uses the ABI stack
pointer, `x2`, as the base address and can target any data register. The
other can reference one of 8 base address registers and one of 8 data
registers.

==== Stack-Pointer-Based Loads and Stores

include::images/wavedrom/c-sp-load-store.adoc[]
[[c-sp-load-store]]
//.Stack-Pointer-Based Loads and Stores--these instructions use the CI format.

These instructions use the CI format.

C.LWSP loads a 32-bit value from memory into register _rd_. It computes
an effective address by adding the _zero_-extended offset, scaled by 4,
to the stack pointer, `x2`. It expands to `lw rd, offset(x2)`. C.LWSP is
only valid when _rd_&#x2260;x0 the code points with _rd_=x0 are reserved.

C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value
from memory into register _rd_. It computes its effective address by
adding the zero-extended offset, scaled by 8, to the stack pointer,
`x2`. It expands to `ld rd, offset(x2)`. C.LDSP is only valid when
_rd_&#x2260;x0 the code points with
_rd_=x0 are reserved.

C.LQSP is an RV128C-only instruction that loads a 128-bit value from
memory into register _rd_. It computes its effective address by adding
the zero-extended offset, scaled by 16, to the stack pointer, `x2`. It
expands to `lq rd, offset(x2)`. C.LQSP is only valid when
_rd_&#x2260;x0 the code points with
_rd_=x0 are reserved.

C.FLWSP is an RV32FC-only instruction that loads a single-precision
floating-point value from memory into floating-point register _rd_. It
computes its effective address by adding the _zero_-extended offset,
scaled by 4, to the stack pointer, `x2`. It expands to
`flw rd, offset(x2)`.

C.FLDSP is an RV32DC/RV64DC-only instruction that loads a
double-precision floating-point value from memory into floating-point
register _rd_. It computes its effective address by adding the
_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It
expands to `fld rd, offset(x2)`.

include::images/wavedrom/c-sp-load-store-css.adoc[]
[[c-sp-load-store-css]]
//.Stack-Pointer-Based Loads and Stores--these instructions use the CSS format.

These instructions use the CSS format.

C.SWSP stores a 32-bit value in register _rs2_ to memory. It computes an
effective address by adding the _zero_-extended offset, scaled by 4, to
the stack pointer, `x2`. It expands to `sw rs2, offset(x2)`.

C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in
register _rs2_ to memory. It computes an effective address by adding the
_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It
expands to `sd rs2, offset(x2)`.

C.SQSP is an RV128C-only instruction that stores a 128-bit value in
register _rs2_ to memory. It computes an effective address by adding the
_zero_-extended offset, scaled by 16, to the stack pointer, `x2`. It
expands to `sq rs2, offset(x2)`.

C.FSWSP is an RV32FC-only instruction that stores a single-precision
floating-point value in floating-point register _rs2_ to memory. It
computes an effective address by adding the _zero_-extended offset,
scaled by 4, to the stack pointer, `x2`. It expands to
`fsw rs2, offset(x2)`.

C.FSDSP is an RV32DC/RV64DC-only instruction that stores a
double-precision floating-point value in floating-point register _rs2_
to memory. It computes an effective address by adding the
_zero_-extended offset, scaled by 8, to the stack pointer, `x2`. It
expands to `fsd rs2, offset(x2)`.

[NOTE]
====
Register save/restore code at function entry/exit represents a
significant portion of static code size. The stack-pointer-based
compressed loads and stores in RVC are effective at reducing the
save/restore static code size by a factor of 2 while improving
performance by reducing dynamic instruction bandwidth.

A common mechanism used in other ISAs to further reduce save/restore
code size is load-multiple and store-multiple instructions. We
considered adopting these for RISC-V but noted the following drawbacks
to these instructions:

* These instructions complicate processor implementations.
* For virtual memory systems, some data accesses could be resident in
physical memory and some could not, which requires a new restart
mechanism for partially executed instructions.
* Unlike the rest of the RVC instructions, there is no IFD equivalent to
Load Multiple and Store Multiple.
* Unlike the rest of the RVC instructions, the compiler would have to be
aware of these instructions to both generate the instructions and to
allocate registers in an order to maximize the chances of the them being
saved and stored, since they would be saved and restored in sequential
order.
* Simple microarchitectural implementations will constrain how other
instructions can be scheduled around the load and store multiple
instructions, leading to a potential performance loss.
* The desire for sequential register allocation might conflict with the
featured registers selected for the CIW, CL, CS, CA, and CB formats.

Furthermore, much of the gains can be realized in software by replacing
prologue and epilogue code with subroutine calls to common prologue and
epilogue code, a technique described in Section 5.6 of cite:[waterman-phd].

While reasonable architects might come to different conclusions, we
decided to omit load and store multiple and instead use the
software-only approach of calling save/restore millicode routines to
attain the greatest code size reduction.
====

==== Register-Based Loads and Stores

[[reg-based-ldnstr]]
include::images/wavedrom/reg-based-ldnstr.adoc[]
//.Compressed, register-based load and stores--these instructions use the CL format.
(((compressed, register-based load and store)))
These instructions use the CL format.

C.LW loads a 32-bit value from memory into register
`_rd′_`. It computes an effective address by adding the
_zero_-extended offset, scaled by 4, to the base address in register
`_rs1′_`. It expands to `lw rd′, offset(rs1′)`.

C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from
memory into register `_rd′_`. It computes an effective
address by adding the _zero_-extended offset, scaled by 8, to the base
address in register `_rs1′_`. It expands to
`ld rd′, offset(rs1′)`.

C.LQ is an RV128C-only instruction that loads a 128-bit value from
memory into register `_rd′_`. It computes an effective
address by adding the _zero_-extended offset, scaled by 16, to the base
address in register `_rs1′_`. It expands to
`lq rd′, offset(rs1′)`.

C.FLW is an RV32FC-only instruction that loads a single-precision
floating-point value from memory into floating-point register
`_rd′_`. It computes an effective address by adding the
_zero_-extended offset, scaled by 4, to the base address in register
`_rs1′_`. It expands to
`flw rd′, offset(rs1′)`.

C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision
floating-point value from memory into floating-point register
`_rd′_`. It computes an effective address by adding the
_zero_-extended offset, scaled by 8, to the base address in register
`_rs1′_`. It expands to
`fld rd′, offset(rs1′)`.

[[c-cs-format-ls]]
include::images/wavedrom/c-cs-format-ls.adoc[]
//.Compressed, CS format load and store--these instructions use the CS format.
(((compressed, cs-format load and store)))

These instructions use the CS format.

C.SW stores a 32-bit value in register `_rs2′_` to memory.
It computes an effective address by adding the _zero_-extended offset,
scaled by 4, to the base address in register `_rs1′_`. It
expands to `sw rs2′, offset(rs1′)`.

C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in
register `_rs2′_` to memory. It computes an effective
address by adding the _zero_-extended offset, scaled by 8, to the base
address in register `_rs1′_`. It expands to
`sd rs2′, offset(rs1′)`.

C.SQ is an RV128C-only instruction that stores a 128-bit value in
register `_rs2′_` to memory. It computes an effective
address by adding the _zero_-extended offset, scaled by 16, to the base
address in register `_rs1′_`. It expands to
`sq rs2′, offset(rs1′)`.

C.FSW is an RV32FC-only instruction that stores a single-precision
floating-point value in floating-point register `_rs2′_` to
memory. It computes an effective address by adding the _zero_-extended
offset, scaled by 4, to the base address in register
`_rs1′_`. It expands to
`fsw rs2′, offset(rs1′)`.

C.FSD is an RV32DC/RV64DC-only instruction that stores a
double-precision floating-point value in floating-point register
`_rs2′_` to memory. It computes an effective address by
adding the _zero_-extended offset, scaled by 8, to the base address in
register `_rs1′_`. It expands to
`fsd rs2′, offset(rs1′)`.

=== Control Transfer Instructions

RVC provides unconditional jump instructions and conditional branch
instructions. As with base RVI instructions, the offsets of all RVC
control transfer instructions are in multiples of 2 bytes.

[[c-cj-format-ls]]
include::images/wavedrom/c-cj-format-ls.adoc[]
//.Compressed, CJ format load and store--these instructions use the CJ format.
(((compressed, cj-format load and store)))

These instructions use the CJ format.

C.J performs an unconditional control transfer. The offset is
sign-extended and added to the `pc` to form the jump target address. C.J
can therefore target a &#177;2 KiB range. C.J expands to
`jal x0, offset`.

C.JAL is an RV32C-only instruction that performs the same operation as
C.J, but additionally writes the address of the instruction following
the jump (`pc+2`) to the link register, `x1`. C.JAL expands to
`jal x1, offset`.

[[c-cr-format-ls]]
include::images/wavedrom/c-cr-format-ls.adoc[]
//.Compressed, CR format load and store--these instructions use the CR format.
(((compressed, cr-format load and store)))

These instructions use the CR format.

C.JR (jump register) performs an unconditional control transfer to the
address in register _rs1_. C.JR expands to `jalr x0, 0(rs1)`. C.JR is
only valid when latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code
point with latexmath:[$\textit{rs1}{=}\texttt{x0}$] is reserved.

C.JALR (jump and link register) performs the same operation as C.JR, but
additionally writes the address of the instruction following the jump
(`pc`+2) to the link register, `x1`. C.JALR expands to
`jalr x1, 0(rs1)`. C.JALR is only valid when
latexmath:[$\textit{rs1}{\neq}\texttt{x0}$]; the code point with
latexmath:[$\textit{rs1}{=}\texttt{x0}$] corresponds to the C.EBREAK
instruction.

[TIP]
====
Strictly speaking, C.JALR does not expand exactly to a base RVI
instruction as the value added to the PC to form the link address is 2
rather than 4 as in the base ISA, but supporting both offsets of 2 and 4
bytes is only a very minor change to the base microarchitecture.
====

[[c-cb-format-ls]]
include::images/wavedrom/c-cb-format-ls.adoc[]
//.Compressed, CB format load and store--these instructions use the CB format.
(((compressed, cb-format load and store)))

These instructions use the CB format.

C.BEQZ performs conditional control transfers. The offset is
sign-extended and added to the `pc` to form the branch target address.
It can therefore target a &#177;256 B range. C.BEQZ takes the
branch if the value in register _rs1′_ is zero. It
expands to `beq rs1′, x0, offset`.

C.BNEZ is defined analogously, but it takes the branch if
_rs1′_ contains a nonzero value. It expands to
`bne rs1′, x0, offset`.

=== Integer Computational Instructions

RVC provides several instructions for integer arithmetic and constant
generation.

==== Integer Constant-Generation Instructions

The two constant-generation instructions both use the CI instruction
format and can target any integer register.

[[c-integer-const-gen]]
include::images/wavedrom/c-integer-const-gen.adoc[]
//.Integer constant generation format.
(((compressed, integer constant generation)))


C.LI loads the sign-extended 6-bit immediate, _imm_, into register _rd_.
C.LI expands into `addi rd, x0, imm`. C.LI is only valid when
`_rd_≠x0`; the code points with `_rd_=x0` encode HINTs.

C.LUI loads the non-zero 6-bit immediate field into bits 17–12 of the
destination register, clears the bottom 12 bits, and sign-extends bit 17
into all higher bits of the destination. C.LUI expands into
`lui rd, imm`. C.LUI is only valid when 
latexmath:[$\textit{rd}{\neq}{\left\{\texttt{x0},\texttt{x2}\right\}}$],
and when the immediate is not equal to zero. The code points with
_imm_=0 are reserved; the remaining code points with _rd_=`x0` are
HINTs; and the remaining code points with _rd_=`x2` correspond to the
C.ADDI16SP instruction.

==== Integer Register-Immediate Operations

These integer register-immediate operations are encoded in the CI format
and perform operations on an integer register and a 6-bit immediate.

[[c-integer-register-immediate]]
include::images/wavedrom/c-int-reg-immed.adoc[]
//.Integer register-immediate format.
(((compressed, integer register-immediate)))

C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in
register _rd_ then writes the result to _rd_. C.ADDI expands into
`addi rd, rd, imm`. C.ADDI is only valid when
`_rd_≠x0` and `_imm_≠0`. The code
points with `_rd_=x0` encode the C.NOP instruction; the remaining code
points with _imm_=0 encode HINTs.

C.ADDIW is an RV64C/RV128C-only instruction that performs the same
computation but produces a 32-bit result, then sign-extends result to 64
bits. C.ADDIW expands into `addiw rd, rd, imm`. The immediate can be
zero for C.ADDIW, where this corresponds to `sext.w rd`. C.ADDIW is
only valid when `_rd_≠x0`; the code points with
`_rd_=x0` are reserved.

C.ADDI16SP shares the opcode with C.LUI, but has a destination field of
`x2`. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to the
value in the stack pointer (`sp=x2`), where the immediate is scaled to
represent multiples of 16 in the range (-512,496). C.ADDI16SP is used to
adjust the stack pointer in procedure prologues and epilogues. It
expands into `addi x2, x2, nzimm[9:4]`. C.ADDI16SP is only valid when
_nzimm_≠0; the code point with _nzimm_=0 is reserved.

[NOTE]
====
In the standard RISC-V calling convention, the stack pointer `sp` is
always 16-byte aligned.
====

[[c-ciw]]
include::images/wavedrom/c-ciw.adoc[]
//.CIW format.
(((compressed, CIW)))
C.ADDI4SPN is a CIW-format instruction that adds a _zero_-extended
non-zero immediate, scaled by 4, to the stack pointer, `x2`, and writes
the result to `rd′`. This instruction is used to generate
pointers to stack-allocated variables, and expands to
`addi rd′, x2, nzuimm[9:2]`. C.ADDI4SPN is only valid when
_nzuimm_≠0; the code points with _nzuimm_=0 are
reserved.

[[c-ci]]
include::images/wavedrom/c-ci.adoc[]
//.CI format.
(((compressed, CI)))

C.SLLI is a CI-format instruction that performs a logical left shift of
the value in register _rd_ then writes the result to _rd_. The shift
amount is encoded in the _shamt_ field. For RV128C, a shift amount of
zero is used to encode a shift of 64. C.SLLI expands into
`slli rd, rd, shamt[5:0]`, except for RV128C with `shamt=0`, which expands to
`slli rd, rd, 64`.

For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1
are designated for custom extensions. For RV32C and RV64C, the shift
amount must be non-zero; the code points with _shamt_=0 are HINTs. For
all base ISAs, the code points with `_rd_=x0` are HINTs, except those
with _shamt[5]_=1 in RV32C.

[[c-srli-srai]]
include::images/wavedrom/c-srli-srai.adoc[]
//.C-SRLI-SRAI format.
(((compressed, C.SRLI, C.SRAI)))

C.SRLI is a CB-format instruction that performs a logical right shift of
the value in register _rd′_ then writes the result to
_rd′_. The shift amount is encoded in the _shamt_ field.
For RV128C, a shift amount of zero is used to encode a shift of 64.
Furthermore, the shift amount is sign-extended for RV128C, and so the
legal shift amounts are 1-31, 64, and 96-127. C.SRLI expands into
`srli rd′, rd′, shamt`, except for
RV128C with `shamt=0`, which expands to
`srli rd′, rd′, 64`.

For RV32C, _shamt[5]_ must be zero; the code points with _shamt[5]_=1
are designated for custom extensions. For RV32C and RV64C, the shift
amount must be non-zero; the code points with _shamt_=0 are HINTs.

C.SRAI is defined analogously to C.SRLI, but instead performs an
arithmetic right shift. C.SRAI expands to
`srai rd′, rd′, shamt`.

[NOTE]
====
Left shifts are usually more frequent than right shifts, as left shifts
are frequently used to scale address values. Right shifts have therefore
been granted less encoding space and are placed in an encoding quadrant
where all other immediates are sign-extended. For RV128, the decision
was made to have the 6-bit shift-amount immediate also be sign-extended.
Apart from reducing the decode complexity, we believe right-shift
amounts of 96-127 will be more useful than 64-95, to allow extraction of
tags located in the high portions of 128-bit address pointers. We note
that RV128C will not be frozen at the same point as RV32C and RV64C, to
allow evaluation of typical usage of 128-bit address-space codes.
====
[[c-andi]]
include::images/wavedrom/c-andi.adoc[]
//.C.ANDI format
(((compressed, C.ANDI)))

C.ANDI is a CB-format instruction that computes the bitwise AND of the
value in register _rd′_ and the sign-extended 6-bit
immediate, then writes the result to _rd′_. C.ANDI
expands to `andi rd′, rd′, imm`.

==== Integer Register-Register Operations

[[c-cr]]
include::images/wavedrom/c-int-reg-to-reg-cr-format.adoc[]
//C.CR format
((((compressed. C.CR))))
These instructions use the CR format.

C.MV copies the value in register _rs2_ into register _rd_. C.MV expands
into `add rd, x0, rs2`. C.MV is only valid when
`rs2≠x0` the code points with `rs2=x0` correspond to the C.JR instruction. The code points with `rs2≠x0` and `rd=x0` are HINTs.

[TIP]
====
_C.MV expands to a different instruction than the canonical MV
pseudoinstruction, which instead uses ADDI. Implementations that handle
MV specially, e.g. using register-renaming hardware, may find it more
convenient to expand C.MV to MV instead of ADD, at slight additional
hardware cost._
====

C.ADD adds the values in registers _rd_ and _rs2_ and writes the result
to register _rd_. C.ADD expands into `add rd, rd, rs2`. C.ADD is only
valid when `rs2≠x0` the code points with `rs2=x0` correspond to the C.JALR
and C.EBREAK instructions. The code points with `rs2≠x0` and rd=x0 are HINTs.

[[c-ca]]
include::images/wavedrom/c-int-reg-to-reg-ca-format.adoc[]
//C.CA format
((((compressed. C.CA))))

These instructions use the CA format.

`C.AND` computes the bitwise `AND` of the values in registers
_rd′_ and _rs2′_, then writes the result
to register _rd′_. `C.AND` expands into
*`_and rd′, rd′, rs2′_`*.

`C.OR` computes the bitwise `OR` of the values in registers
_rd′_ and _rs2′_, then writes the result
to register _rd′_. `C.OR` expands into
*`_or rd&#8242;, rd&#8242;, rs2&#8242;_`*.

`C.XOR` computes the bitwise `XOR` of the values in registers
_rd′_ and _rs2′_, then writes the result
to register _rd′_. `C.XOR` expands into
*`_xor rd′, rd′, rs2′_`*.

`C.SUB` subtracts the value in register _rs2′_ from the
value in register _rd′_, then writes the result to
register _rd′_. `C.SUB` expands into
*`_sub rd′, rd′, rs2′_`*.

`C.ADDW` is an RV64C/RV128C-only instruction that adds the values in
registers _rd′_ and _rs2′_, then
sign-extends the lower 32 bits of the sum before writing the result to
register _rd′_. `C.ADDW` expands into
*`_addw rd′, rd′, rs2′_`*.

`C.SUBW` is an RV64C/RV128C-only instruction that subtracts the value in
register _rs2′_ from the value in register
_rd′_, then sign-extends the lower 32 bits of the
difference before writing the result to register _rd′_.
`C.SUBW` expands into *`_subw rd′, rd′, rs2′_`*.

[NOTE]
====
This group of six instructions do not provide large savings
individually, but do not occupy much encoding space and are
straightforward to implement, and as a group provide a worthwhile
improvement in static and dynamic compression.
====

==== Defined Illegal Instruction

[[c-def-illegal-inst]]
include::images/wavedrom/c-def-illegal-inst.adoc[]
((((compressed. C.DIINST))))

A 16-bit instruction with all bits zero is permanently reserved as an
illegal instruction.

[NOTE]
====
We reserve all-zero instructions to be illegal instructions to help trap
attempts to execute zero-ed or non-existent portions of the memory
space. The all-zero value should not be redefined in any non-standard
extension. Similarly, we reserve instructions with all bits set to 1
(corresponding to very long instructions in the RISC-V variable-length
encoding scheme) as illegal to capture another common value seen in
non-existent memory regions.
====

==== NOP Instruction

[[c-nop-instr]]
include::images/wavedrom/c-nop-instr.adoc[]
((((compressed. C.NOPINSTR))))

`C.NOP` is a CI-format instruction that does not change any user-visible
state, except for advancing the `pc` and incrementing any applicable
performance counters. `C.NOP` expands to `nop`. `C.NOP` is only valid when
_imm_=0; the code points with _imm_≠0 encode HINTs.

==== Breakpoint Instruction

[[c-breakpoint-instr]]
include::images/wavedrom/c-breakpoint-instr.adoc[]
((((compressed. C.BREAKPOINTINSTR))))

Debuggers can use the `C.EBREAK` instruction, which expands to `ebreak`,
to cause control to be transferred back to the debugging environment.
`C.EBREAK` shares the opcode with the `C.ADD` instruction, but with _rd_ and
_rs2_ both zero, thus can also use the `CR` format.

=== Usage of C Instructions in LR/SC Sequences

On implementations that support the C extension, compressed forms of the
I instructions permitted inside constrained LR/SC sequences, as
described in <<sec:lrscseq>>, are also permitted
inside constrained LR/SC sequences.

[NOTE]
====
The implication is that any implementation that claims to support both
the A and C extensions must ensure that LR/SC sequences containing valid
C instructions will eventually complete.
====

[[rvc-hints]]
=== HINT Instructions

A portion of the RVC encoding space is reserved for microarchitectural
HINTs. Like the HINTs in the RV32I base ISA (see
<<rv32i-hints>>), these instructions do not
modify any architectural state, except for advancing the `pc` and any
applicable performance counters. HINTs are executed as no-ops on
implementations that ignore them.

RVC HINTs are encoded as computational instructions that do not modify
the architectural state, either because _rd_=`x0` (e.g.
`C.ADD _x0_, _t0_`), or because _rd_ is overwritten with a copy of itself
(e.g. `C.ADDI _t0_, 0`).

[NOTE]
====
This HINT encoding has been chosen so that simple implementations can
ignore HINTs altogether, and instead execute a HINT as a regular
computational instruction that happens not to mutate the architectural
state.
====

RVC HINTs do not necessarily expand to their RVI HINT counterparts. For
example, `C.ADD` _x0_, _a0_ might not encode the same HINT as
`ADD` _x0_, _x0_, _a0_.

[NOTE]
====
The primary reason to not require an RVC HINT to expand to an RVI HINT
is that HINTs are unlikely to be compressible in the same manner as the
underlying computational instruction. Also, decoupling the RVC and RVI
HINT mappings allows the scarce RVC HINT space to be allocated to the
most popular HINTs, and in particular, to HINTs that are amenable to
macro-op fusion.
====

<<rvc-t-hints, Table 32>> lists all RVC HINT code points. For RV32C, 78%
of the HINT space is reserved for standard HINTs. The remainder of the HINT space is designated for custom HINTs;
no standard HINTs will ever be defined in this subspace.

[[rvc-t-hints]]
.RVC HINT instructions.
[cols="<,<,>,<",options="header",]
|===
|Instruction |Constraints |Code Points |Purpose

|C.NOP |_imm_≠0 |63 .6+.^|_Designated for future standard use_

|C.ADDI | _rd_≠`x0`, _imm_=0 |31

|C.LI | _rd_=`x0` |64

|C.LUI | _rd_=`x0`, _imm_≠0 |63

|C.MV | _rd_=`x0`, _rs2_≠`x0` |31

|C.ADD | _rd_=`x0`, _rs2_≠`x0`, _rs2_≠`x2-x5` | 27

|C.ADD | _rd_=`x0`, _rs2_≠`x2-x5` |4|(rs2=x2) C.NTL.P1 (rs2=x3) C.NTL.PALL (rs2=x4) C.NTL.S1 (rs2=x5) C.NTL.ALL

|C.SLLI |_rd_=`x0`, _imm_≠0 |31 (RV32), 63 (RV64/128)  .5+.^|_Designated for custom use_

|C.SLLI64 | _rd_=_x0_ |1

|C.SLLI64 | _rd_≠`x0`, RV32 and RV64 only |31

|C.SRLI64 | RV32 and RV64 only |8

|C.SRAI64 | RV32 and RV64 only |8
|===

=== RVC Instruction Set Listings

<<rvcopcodemap>> shows a map of the major
opcodes for RVC. Each row of the table corresponds to one quadrant of
the encoding space. The last quadrant, which has the two
least-significant bits set, corresponds to instructions wider than 16
bits, including those in the base ISAs. Several instructions are only
valid for certain operands; when invalid, they are marked either _RES_
to indicate that the opcode is reserved for future standard extensions;
_Custom_ to indicate that the opcode is designated for custom
extensions; or _HINT_ to indicate that the opcode is reserved for
microarchitectural hints (see <<rvc-hints, Section 18.7>>).

<<<

[[rvcopcodemap]]
.RVC opcode map instructions.
[%autowidth,float="center",align="center",cols=">,^,^,^,^,^,^,^,^,^,<]
|===
2+>|inst[15:13] +
inst[1:0] ^.^s|000 ^.^s|001 ^.^s|010 ^.^s|011 ^.^s|100 ^.^s|101 ^.^s|110 ^.^s|111 |

2+>.^|00 .^|ADDI4SPN ^.^|FLD +
FLD +
LQ ^.^| LW ^.^| FLW +
LD +
LD ^.^| _Reserved_ ^.^| FSD +
FSD +
SQ ^.^| SW ^.^| FSW +
SD +
SD 
^.^| RV32 +
RV64 +
RV128

2+>.^|01 ^.^|ADDI ^.^|JAL +
ADDIW +
ADDIW ^.^|LI ^.^|LUI/ADDI16SP ^.^|MISC-ALU ^.^|J ^.^|BEQZ ^.^|BNEZ ^.^|RV32 +
RV64 +
RV128

2+>.^|10 ^.^|SLLI ^.^|FLDSP +
FLDSP +
LQSP ^.^|LWSP ^.^|FLWSP +
LDSP +
LDSP ^.^|J[AL]R/MV/ADD ^.^|FSDSP +
FSDSP +
SQSP ^.^|SWSP ^.^|FSWSP +
SDSP +
SDSP ^.^|RV32 +
RV64 +
RV128

2+>.^|11 9+^|>16b
|===

<<rvc-instr-table0>>, <<rvc-instr-table1>>, and <<rvc-instr-table2>> list the RVC instructions.

[[rvc-instr-table0]]
.Instruction listing for RVC, Quadrant 0
include::images/bytefield/rvc-instr-quad0.adoc[]
//include::images/bytefield/rvc-instr-quad0.png[]

[[rvc-instr-table1]]
.Instruction listing for RVC, Quadrant 1
include::images/bytefield/rvc-instr-quad1.adoc[]
//include::images/bytefield/rvc-instr-quad1.png[]

[[rvc-instr-table2]]
.Instruction listing for RVC, Quadrant 2
include::images/bytefield/rvc-instr-quad2.adoc[]
//include::images/bytefield/rvc-instr-quad2.png[]