\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename g++int.info
@settitle G++ internals
@setchapternewpage odd
@c %**end of header
@node Top, Limitations of g++, (dir), (dir)
@chapter Internal Architecture of the Compiler
This is meant to describe the C++ front-end for gcc in detail.
Questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}.
@menu
* Limitations of g++::
* Routines::
* Implementation Specifics::
* Glossary::
* Macros::
* Typical Behavior::
* Coding Conventions::
* Templates::
* Access Control::
* Error Reporting::
* Parser::
* Exception Handling::
* Free Store::
* Mangling:: Function name mangling for C++ and Java
* Concept Index::
@end menu
@node Limitations of g++, Routines, Top, Top
@section Limitations of g++
@itemize @bullet
@item
Limitations on input source code: about 240 nesting levels with the
parser stack size (@code{YYSTACKSIZE}) set to 500 (the default), with
around 16.4k of swap space required per nesting level. The parser needs
stack space of about 2.09 times the number of nesting levels.
@cindex pushdecl_class_level
@item
I suspect there are other uses of pushdecl_class_level that do not call
set_identifier_type_value in tandem with the call to
pushdecl_class_level. It would seem to be an omission.
@cindex access checking
@item
Access checking is unimplemented for nested types.
@cindex @code{volatile}
@item
@code{volatile} is not implemented in general.
@end itemize
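The nesting limit in the first item follows roughly from the two figures
quoted there. The following C++ fragment is only a back-of-the-envelope
check, not code from the compiler. The function names are made up for
illustration, and the constants 2.09 and 16.4 are the approximate
figures given above.

@verbatim
```cpp
#include <cassert>

// Hypothetical helper: estimated number of nesting levels that fit in a
// parser stack of `stack_size` entries, at about 2.09 entries per level.
int estimated_max_nesting(int stack_size) {
    assert(stack_size >= 0);
    return static_cast<int>(stack_size / 2.09);
}

// Hypothetical helper: estimated swap space, in kilobytes, needed for
// `levels` nesting levels at about 16.4k per level.
double estimated_swap_kbytes(int levels) {
    return levels * 16.4;
}
```
@end verbatim

With the default stack size of 500, the first estimate comes out at 239
levels, in line with the limit of about 240 quoted above.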
@node Routines, Implementation Specifics, Limitations of g++, Top
@section Routines
This section describes some of the routines used in the C++ front-end.
@code{build_vtable} and @code{prepare_fresh_vtable} are used only within
the @file{cp-class.c} file, and only in @code{finish_struct} and
@code{modify_vtable_entries}.
@code{build_vtable}, @code{prepare_fresh_vtable}, and
@code{finish_struct} are the only routines that set @code{DECL_VPARENT}.
@code{finish_struct} can steal the virtual function table from parents;
this prevents @code{related_vslot} from working. When
@code{finish_struct} steals, we know that
@example
get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
@end example
@noindent
will get the related binfo.
@code{layout_basetypes} does something with the VIRTUALS.
Supposedly (according to Tiemann) most of the breadth-first searching
done, as in @code{get_base_distance} and in @code{get_binfo}, was not
the result of any design decision. I have since found out that at least
one part of the compiler needs the notion of depth-first binfo
searching. I am going to try to convert the whole thing; it should just
work. The term left-most refers to the depth-first left-most node. It
uses @code{MAIN_VARIANT == type} as the condition to get left-most,
because the things that have @code{BINFO_OFFSET}s of zero are shared and
will have themselves as their own @code{MAIN_VARIANT}s. The non-shared
right ones are copies of the left-most one; hence if a node is its own
@code{MAIN_VARIANT}, we know it is a left-most one, and if it is not, it
is a non-left-most one.
@code{get_base_distance}'s path and distance matter in its use in:
@itemize @bullet
@item
@code{prepare_fresh_vtable} (the code is probably wrong)
@item
@code{init_vfields} depends upon distance, probably in a safe way;
@code{build_offset_ref} might use partial paths to do further lookups;
@code{hack_identifier} is probably not properly checking access.
@item
@code{get_first_matching_virtual} probably should check for
@code{get_base_distance} returning -2.
@item
@code{resolve_offset_ref} should be called in a more deterministic
manner. Right now, it is called in some random contexts, like for
arguments at @code{build_method_call} time, @code{default_conversion}
time, @code{convert_arguments} time, @code{build_unary_op} time,
@code{build_c_cast} time, @code{build_modify_expr} time,
@code{convert_for_assignment} time, and
@code{convert_for_initialization} time.
However, there are still more contexts in which it needs to be called;
one was the ever simple:
@example
if (obj.*pmi != 7)
@dots{}
@end example
It seems that the problems were due to the fact that @code{TREE_TYPE} of
the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
of the referent (like @code{INTEGER_TYPE}). This problem was fixed by
changing @code{default_conversion} to check @code{TREE_CODE (x)},
instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
was @code{OFFSET_TYPE}.
@end itemize
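For reference, the pointer-to-member comparison mentioned in the last
item can be written out as a complete fragment. This is ordinary
user-level C++, not front-end code, and the names @code{S}, @code{i} and
@code{differs_from_seven} are assumptions chosen for illustration. The
expression @code{obj.*pmi} is the kind of source construct that the text
above associates with an @code{OFFSET_REF}.

@verbatim
```cpp
#include <cassert>

struct S {
    int i;
};

// Returns true when the int member selected by `pmi` in `obj` is not 7.
// The expression `obj.*pmi` has type int here, the type of the referent.
bool differs_from_seven(const S &obj, int S::*pmi) {
    return obj.*pmi != 7;
}
```
@end verbatim

A caller would pass a pointer to member such as @code{&S::i} as the
second argument.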
@node Implementation Specifics, Glossary, Routines, Top
@section Implementation Specifics
@itemize @bullet
@item Explicit Initialization
The global list @code{current_member_init_list} contains the list of
mem-initializers specified in a constructor declaration. For example:
@example
foo::foo() : a(1), b(2) @{@}
@end example
@noindent
will initialize @samp{a} with 1 and @samp{b} with 2.
@code{expand_member_init} places each initialization (a with 1) on the
global list. Then, when the fndecl is being processed,
@code{emit_base_init} runs down the list, initializing them. It used to
be the case that g++ first ran down @code{current_member_init_list},
then ran down the list of members initializing the ones that weren't
explicitly initialized. Things were rewritten to perform the
initializations in order of declaration in the class. So, for the above
example, @samp{a} and @samp{b} will be initialized in the order that
they were declared:
@example
class foo @{ public: int b; int a; foo (); @};
@end example
@noindent
Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
initialized with 1, regardless of how they're listed in the mem-initializer.
@item The Explicit Keyword
@code{grokdeclarator} uses the presence of @code{explicit} on a
constructor to set the field @code{DECL_NONCONVERTING_P}. That value is
used by @code{build_method_call} and @code{build_user_type_conversion_1}
to decide if a particular constructor should be used as a candidate for
conversions.
@end itemize
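The declaration-order rule described under Explicit Initialization can
be observed from user code. The sketch below reuses the class @code{foo}
from the example above; the logging helper @code{note} and the global
@code{init_log} are assumptions added only to make the order visible.
Many compilers warn about the mismatched mem-initializer order, which is
the point of the demonstration.

@verbatim
```cpp
#include <cassert>
#include <string>

// Records the order in which the member initializers run.
std::string init_log;

// Hypothetical helper: logs `name`, then yields `value`.
int note(char name, int value) {
    init_log += name;
    return value;
}

class foo {
public:
    int b;   // declared first, so initialized first
    int a;
    foo();
};

// The mem-initializers are listed as a, b, but run in declaration order.
foo::foo() : a(note('a', 1)), b(note('b', 2)) {}
```
@end verbatim

After constructing a single @code{foo}, @code{init_log} holds the
initializers in the order they actually ran, @samp{b} before @samp{a}.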
@node Glossary, Macros, Implementation Specifics, Top
@section Glossary
@table @r
@item binfo
The main data structure in the compiler used to represent the
inheritance relationships between classes. The data in the binfo can be
accessed by the BINFO_ accessor macros.
@item vtable
@itemx virtual function table
The virtual function table holds information used in virtual function
dispatching. In the compiler, they are usually referred to as vtables,
or vtbls. The first index is not used in the normal way; I believe it
is probably used for the virtual destructor.
@item vfield
vfields can be thought of as the base information needed to build
vtables. For every vtable that exists for a class, there is a vfield.
See also vtable and virtual function table pointer. When a type is used
as a base class to another type, the virtual function table for the
derived class can be based upon the vtable for the base class, just
extended to include the additional virtual methods declared in the
derived class. The virtual function table from a virtual base class is
never reused in a derived class. @code{is_normal} depends upon this.
@item virtual function table pointer
These are @code{FIELD_DECL}s that are pointer types that point to
vtables. See also vtable and vfield.
@end table
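The vfield entry above says that a derived class's table can be based on
the base class's table and extended with the additional virtual methods.
The following toy model sketches that idea with plain function pointers.
It is a conceptual illustration only: the names are hypothetical, and it
does not reflect the compiler's actual vtable layout or data structures.

@verbatim
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef int (*Slot)();

int base_f() { return 1; }
int base_g() { return 2; }
int derived_g() { return 20; }   // overrides base_g
int derived_h() { return 30; }   // new in the derived class

// Index 0 is a placeholder, mirroring the remark that the first index
// is not used in the normal way.
std::vector<Slot> make_base_table() {
    std::vector<Slot> table;
    table.push_back(nullptr);
    table.push_back(base_f);
    table.push_back(base_g);
    return table;
}

// Start from the base table, replace overridden slots, append new ones.
std::vector<Slot> make_derived_table() {
    std::vector<Slot> table = make_base_table();
    table[2] = derived_g;
    table.push_back(derived_h);
    return table;
}

// Dispatch through a table by slot index.
int call_slot(const std::vector<Slot> &table, std::size_t index) {
    assert(index > 0 && index < table.size());
    return table[index]();
}
```
@end verbatim

In this model the inherited slot keeps its index in the derived table,
which is what lets code written against the base table keep working.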
@node Macros, Typical Behavior, Glossary, Top
@section Macros
This section describes some of the macros used on trees. The list
should be alphabetical. Eventually all macros should be documented
here.
@table @code
@item BINFO_BASETYPES
A vector of additional binfos for the types inherited by this basetype.
The binfos are fully unshared (except for virtual bases, in which
case the binfo structure is shared).
If this basetype describes type D as inherited in C,
and if the basetypes of D are E and F,
then this vector contains binfos for inheritance of E and F by C.
Has values of:
TREE_VECs
@item BINFO_INHERITANCE_CHAIN
Temporarily used to represent specific inheritances. It usually points
to the binfo associated with the lesser derived type, but it can be
reversed by reverse_path. For example:
@example
Z ZbY least derived
|
Y YbX
|
X Xb most derived
TYPE_BINFO (X) == Xb
BINFO_INHERITANCE_CHAIN (Xb) == YbX
BINFO_INHERITANCE_CHAIN (Yb) == ZbY
BINFO_INHERITANCE_CHAIN (Zb) == 0
@end example
Not sure if the above is really true; get_base_distance has it point
towards the most derived type, opposite from above.
Set by build_vbase_path, recursive_bounded_basetype_p,
get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
What things can this be used on:
TREE_VECs that are binfos
@item BINFO_OFFSET
The offset where this basetype appears in its containing type.
BINFO_OFFSET slot holds the offset (in bytes) from the base of the
complete object to the base of the part of the object that is allocated
on behalf of this `type'. This is always 0 except when there is
multiple inheritance.
Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.
@item BINFO_VIRTUALS
A unique list of functions for the virtual function table. See also
TYPE_BINFO_VIRTUALS.
What things can this be used on:
TREE_VECs that are binfos
@item BINFO_VTABLE
Used to find the VAR_DECL that is the virtual function table associated
with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual
function table pointer, see CLASSTYPE_VFIELD.
What things can this be used on:
TREE_VECs that are binfos
Has values of:
VAR_DECLs that are virtual function tables
@item BLOCK_SUPERCONTEXT
In the outermost scope of each function, it points to the FUNCTION_DECL
node. It aids in better DWARF support of inline functions.
@item CLASSTYPE_TAGS
CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
these and calls pushtag on them.)
finish_struct scans these to produce TYPE_DECLs to add to the
TYPE_FIELDS of the type.
It is expected that name found in the TREE_PURPOSE slot is unique,
resolve_scope_to_name is one such place that depends upon this
uniqueness.
@item CLASSTYPE_METHOD_VEC
The following is true after finish_struct has been called (on the
class?) but not before. Before finish_struct is called, things are
different to some extent. Contains a TREE_VEC of methods of the class.
The TREE_VEC_LENGTH is the number of differently named methods plus one
for the 0th entry. The 0th entry is always allocated, and reserved for
ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL,
there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a
given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next
method that has the same name (but a different signature). It would
seem not to be true that, because the DECL_CHAIN slot is used in this
way, we cannot call pushdecl to put the method in the global scope
(because that would overwrite the TREE_CHAIN slot): the two use
different _CHAINs. finish_struct_methods sets up one version of the
TREE_CHAIN slots on the FUNCTION_DECLs.
friends are kept in TREE_LISTs, so that there's no need to use their
TREE_CHAIN slot for anything.
Has values of:
TREE_VECs
@item CLASSTYPE_VFIELD
Seems to be in the process of being renamed TYPE_VFIELD. Use on types
to get the main virtual function table pointer. To get the virtual
function table use BINFO_VTABLE (TYPE_BINFO ()).
Has values of:
FIELD_DECLs that are virtual function table pointers
What things can this be used on:
RECORD_TYPEs
@item DECL_CLASS_CONTEXT
Identifies the context that the _DECL was found in. For virtual function
tables, it points to the type associated with the virtual function
table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.
The difference between this and DECL_CONTEXT is that for virtual
functions like:
@example
struct A
@{
virtual int f ();
@};
struct B : A
@{
int f ();
@};
DECL_CONTEXT (A::f) == A
DECL_CLASS_CONTEXT (A::f) == A
DECL_CONTEXT (B::f) == A
DECL_CLASS_CONTEXT (B::f) == B
@end example
Has values of:
RECORD_TYPEs, or UNION_TYPEs
What things can this be used on:
TYPE_DECLs, _DECLs
@item DECL_CONTEXT
Identifies the context that the _DECL was found in. Can be used on
virtual function tables to find the type associated with the virtual
function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
better access method. Internally the same as DECL_FIELD_CONTEXT, so
don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
DECL_CLASS_CONTEXT.
Has values of:
RECORD_TYPEs
What things can this be used on:
@display
VAR_DECLs that are virtual function tables
_DECLs
@end display
@item DECL_FIELD_CONTEXT
Identifies the context that the FIELD_DECL was found in. Internally the
same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT,
DECL_FCONTEXT and DECL_CLASS_CONTEXT.
Has values of:
RECORD_TYPEs
What things can this be used on:
@display
FIELD_DECLs that are virtual function pointers
FIELD_DECLs
@end display
@item DECL_NAME
Has values of:
@display
0 for things that don't have names
IDENTIFIER_NODEs for TYPE_DECLs
@end display
@item DECL_IGNORED_P
A bit that can be set to inform the debug information output routines in
the back-end that a certain _DECL node should be totally ignored.
Used in cases where it is known that the debugging information will be
output in another file, or where a sub-type is known not to be needed
because the enclosing type is not needed.
A compiler-constructed virtual destructor in a derived class that does
not define an explicit destructor, where one was defined explicitly in a
base class, has this bit set as well. Also used on __FUNCTION__ and
__PRETTY_FUNCTION__ to mark that they are ``compiler generated.'' c-decl and
c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
and ``user-invisible variable.''
Functions built by the C++ front-end such as default destructors,
virtual destructors and default constructors want to be marked that
they are compiler generated, but unsure why.
Currently, it is used in an absolute way in the C++ front-end, as an
optimization, to tell the debug information output routines to not
generate debugging information that will be output by another separately
compiled file.
@item DECL_VIRTUAL_P
A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is
wrong.) Used in VAR_DECLs to indicate that the variable is a vtable.
It is also used in FIELD_DECLs for vtable pointers.
What things can this be used on:
FIELD_DECLs and VAR_DECLs
@item DECL_VPARENT
Used to point to the parent type of the vtable if there is one, else it
is just the type associated with the vtable. Because of the sharing of
virtual function tables that goes on, this slot is not very useful, and
is in fact, not used in the compiler at all. It can be removed.
What things can this be used on:
VAR_DECLs that are virtual function tables
Has values of:
RECORD_TYPEs maybe UNION_TYPEs
@item DECL_FCONTEXT
Used to find the first baseclass in which this FIELD_DECL is defined.
See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.
How it is used:
Used when writing out debugging information about vfield and
vbase decls.
What things can this be used on:
FIELD_DECLs that are virtual function pointers
FIELD_DECLs
@item DECL_REFERENCE_SLOT
Used to hold the initializer for the reference.
What things can this be used on:
PARM_DECLs and VAR_DECLs that have a reference type
@item DECL_VINDEX
Used for FUNCTION_DECLs in two different ways. Before the structure
containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
FUNCTION_DECL will replace as a virtual function. When the class is
laid out, this pointer is changed to an INTEGER_CST node which is
suitable to find an index into the virtual function table. See
get_vtable_entry as to how one can find the right index into the virtual
function table. The first index, 0, of a virtual function table is not
used in the normal way, so the first real index is 1.
DECL_VINDEX may be a TREE_LIST, that would seem to be a list of
overridden FUNCTION_DECLs. add_virtual_function has code to deal with
this when it uses the variable base_fndecl_list, but it would seem that
somehow, it is possible for the TREE_LIST to persist until method_call,
and it should not.
What things can this be used on:
FUNCTION_DECLs
@item DECL_SOURCE_FILE
Identifies what source file a particular declaration was found in.
Has values of:
"<built-in>" on TYPE_DECLs to mean the typedef is built in
@item DECL_SOURCE_LINE
Identifies what source line number in the source file the declaration
was found at.
Has values of:
@display
0 for an undefined label
0 for TYPE_DECLs that are internally generated
0 for FUNCTION_DECLs for functions generated by the compiler
(not yet, but should be)
0 for ``magic'' arguments to functions, that the user has no
control over
@end display
@item TREE_USED
Has values of:
0 for unused labels
@item TREE_ADDRESSABLE
A flag that is set for any type that has a constructor.
@item TREE_COMPLEXITY
They seem to be a kludgy way to track recursion, popping, and pushing. They only
appear in cp-decl.c and cp-decl2.c, so they are a good candidate for
proper fixing, and removal.
@item TREE_HAS_CONSTRUCTOR
A flag to indicate when a CALL_EXPR represents a call to a constructor.
If set, we know that the type of the object is the complete type of the
object, and that the value returned is nonnull. When used in this
fashion, it is an optimization. Can also be used on SAVE_EXPRs to
indicate when they are of fixed type and nonnull. Can also be used on
INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.
@item TREE_PRIVATE
Set for FIELD_DECLs by finish_struct. But not uniformly set.
The following routines do something with PRIVATE access:
build_method_call, alter_access, finish_struct_methods,
finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
GNU_xref_member, dbxout_type_fields, dbxout_type_method_1
@item TREE_PROTECTED
The following routines do something with PROTECTED access:
build_method_call, alter_access, finish_struct, convert_to_aggr,
CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
dbxout_type_method_1
@item TYPE_BINFO
Used to get the binfo for the type.
Has values of:
TREE_VECs that are binfos
What things can this be used on:
RECORD_TYPEs
@item TYPE_BINFO_BASETYPES
See also BINFO_BASETYPES.
@item TYPE_BINFO_VIRTUALS
A unique list of functions for the virtual function table. See also
BINFO_VIRTUALS.
What things can this be used on:
RECORD_TYPEs
@item TYPE_BINFO_VTABLE
Points to the virtual function table associated with the given type.
See also BINFO_VTABLE.
What things can this be used on:
RECORD_TYPEs
Has values of:
VAR_DECLs that are virtual function tables
@item TYPE_NAME
Names the type.
Has values of:
@display
0 for things that don't have names.
should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
ENUM_TYPEs.
TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
shouldn't be.
TYPE_DECL for typedefs, unsure why.
@end display
What things can one use this on:
@display
TYPE_DECLs
RECORD_TYPEs
UNION_TYPEs
ENUM_TYPEs
@end display
History:
It currently points to the TYPE_DECL for RECORD_TYPEs,
UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
@item TYPE_METHODS
Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with
@code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a
class.
@item TYPE_DECL
Used to represent typedefs, and used to represent binding layers.
Components:
DECL_NAME is the name of the typedef. For example, foo would
be found in the DECL_NAME slot when @code{typedef int foo;} is
seen.
DECL_SOURCE_LINE identifies what source line number in the
source file the declaration was found at. A value of 0
indicates that this TYPE_DECL is just an internal binding layer
marker, and does not correspond to a user supplied typedef.
DECL_SOURCE_FILE
@item TYPE_FIELDS
A linked list (via @code{TREE_CHAIN}) of member types of a class. The
list can contain @code{TYPE_DECL}s, but there can also be other things
in the list apparently. See also @code{CLASSTYPE_TAGS}.
@item TYPE_VIRTUAL_P
A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
a virtual function table or a pointer to one. When used on a
@code{FUNCTION_DECL}, indicates that it is a virtual function. When
used on an @code{IDENTIFIER_NODE}, indicates that a function with this
same name exists and has been declared virtual.
When used on types, it indicates that the type has virtual functions, or
is derived from one that does.
Not sure if the above about virtual function tables is still true. See
also info on @code{DECL_VIRTUAL_P}.
What things can this be used on:
FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
@item VF_BASETYPE_VALUE
Get the associated type from the binfo that caused the given vfield to
exist. This is the least derived class (the most parent class) that
needed a virtual function table. It is probably the case that all uses
of this field are misguided, but they need to be examined on a
case-by-case basis. See history for more information on why the
previous statement was made.
Set at @code{finish_base_struct} time.
What things can this be used on:
TREE_LISTs that are vfields
History:
This field was used to determine if a virtual function table's
slot should be filled in with a certain virtual function, by
checking to see if the type returned by VF_BASETYPE_VALUE was a
parent of the context in which the old virtual function existed.
This incorrectly assumes that a given type _could_ not appear as
a parent twice in a given inheritance lattice. For single
inheritance, this would in fact work, because a type could not
possibly appear more than once in an inheritance lattice, but
with multiple inheritance, a type can appear more than once.
@item VF_BINFO_VALUE
Identifies the binfo that caused this vfield to exist. If this vfield
is from the first direct base class that has a virtual function table,
then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL}
on result to find out if it is a virtual base class. Related to the
binfo found by
@example
get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
@end example
@noindent
where @samp{t} is the type that has the given vfield.
@example
get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
@end example
@noindent
will return the binfo for the given vfield.
May or may not be set at @code{modify_vtable_entries} time. Set at
@code{finish_base_struct} time.
What things can this be used on:
TREE_LISTs that are vfields
@item VF_DERIVED_VALUE
Identifies the type of the most derived class of the vfield, excluding
the class this vfield is for.
Set at @code{finish_base_struct} time.
What things can this be used on:
TREE_LISTs that are vfields
@item VF_NORMAL_VALUE
Identifies the type of the most derived class of the vfield, including
the class this vfield is for.
Set at @code{finish_base_struct} time.
What things can this be used on:
TREE_LISTs that are vfields
@item WRITABLE_VTABLES
This is an option that can be defined when building the compiler, that
will cause the compiler to output vtables into the data segment so that
the vtables may be written. This is undefined by default, because
normally the vtables should be unwritable. People that implement object
I/O facilities, or people that want to change the dynamic type of
objects, may want to have the vtables writable. Another way of achieving
this would be to make a copy of the vtable into writable memory, but the
drawback there is that that method only changes the type for one object.
@end table
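The @code{BINFO_OFFSET} entry above notes that the offset of a base
within the complete object is 0 except under multiple inheritance. The
effect can be seen from user code by comparing addresses. The sketch
below borrows the class names D, E and F from the @code{BINFO_BASETYPES}
entry; the helper functions are assumptions for illustration, and the
exact offsets depend on the target and ABI.

@verbatim
```cpp
#include <cassert>
#include <cstddef>

struct E { int e; };
struct F { int f; };
struct D : E, F { int d; };

// Byte distance from the start of the complete D object to its E part.
std::ptrdiff_t offset_of_E(const D &obj) {
    const char *whole = reinterpret_cast<const char *>(&obj);
    const char *part =
        reinterpret_cast<const char *>(static_cast<const E *>(&obj));
    return part - whole;
}

// Byte distance from the start of the complete D object to its F part.
std::ptrdiff_t offset_of_F(const D &obj) {
    const char *whole = reinterpret_cast<const char *>(&obj);
    const char *part =
        reinterpret_cast<const char *>(static_cast<const F *>(&obj));
    return part - whole;
}
```
@end verbatim

Because E and F each hold a data member, the two base parts cannot
occupy the same address, so at least one of the two offsets is nonzero.
On typical implementations it is the second base, F.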
@node Typical Behavior, Coding Conventions, Macros, Top
@section Typical Behavior
@cindex parse errors
Whenever seemingly normal code fails with errors like
@code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
returning a NULL_TREE for whatever reason.
@node Coding Conventions, Templates, Typical Behavior, Top
@section Coding Conventions
It should never be the case that trees are modified in place by the
back-end, @emph{unless} it is guaranteed that the semantics are the same
no matter how shared the tree structure is. @file{fold-const.c} still
has some cases where this is not true, but rms hypothesizes that this
will never be a problem.
@node Templates, Access Control, Coding Conventions, Top
@section Templates
A template is represented by a @code{TEMPLATE_DECL}. The specific
fields used are:
@table @code
@item DECL_TEMPLATE_RESULT
The generic decl on which instantiations are based. This looks just
like any other decl.
@item DECL_TEMPLATE_PARMS
The parameters to this template.
@end table
The generic decl is parsed as much like any other decl as possible,
given the parameterization. The template decl is not built up until the
generic decl has been completed. For template classes, a template decl
is generated for each member function and static data member, as well.
Template members of template classes are represented by a TEMPLATE_DECL
for the class' parameters around another TEMPLATE_DECL for the member's
parameters.
All declarations that are instantiations or specializations of templates
refer to their template and parameters through DECL_TEMPLATE_INFO.
How should I handle parsing member functions with the proper param
decls? Set them up again or try to use the same ones? Currently we do
the former. We can probably do this without any extra machinery in
store_pending_inline, by deducing the parameters from the decl in
do_pending_inlines. PRE_PARSED_TEMPLATE_DECL?
If a base is a parm, we can't check anything about it. If a base is not
a parm, we need to check it for name binding. Do finish_base_struct if
no bases are parameterized (only if none, including indirect, are
parms). Nah, don't bother trying to do any of this until instantiation
-- we only need to do name binding in advance.
Always set up method vec and fields, inc. synthesized methods. Really?
We can't know the types of the copy folks, or whether we need a
destructor, or can have a default ctor, until we know our bases and
fields. Otherwise, we can assume and fix ourselves later. Hopefully.
@node Access Control, Error Reporting, Templates, Top
@section Access Control
The function compute_access returns one of three values:
@table @code
@item access_public
means that the field can be accessed by the current lexical scope.
@item access_protected
means that the field cannot be accessed by the current lexical scope
because it is protected.
@item access_private
means that the field cannot be accessed by the current lexical scope
because it is private.
@end table
DECL_ACCESS is used for access declarations; alter_access creates a list
of types and accesses for a given decl.
Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
codes of compute_access and were used as a cache for compute_access.
Now they are not used at all.
TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
granted by the containing class. BEWARE: TREE_PUBLIC means something
completely unrelated to access control!
@node Error Reporting, Parser, Access Control, Top
@section Error Reporting
The C++ front-end uses a call-back mechanism to allow functions to print
out reasonable strings for types and functions without putting extra
logic in the functions where errors are found. The interface is through
the @code{cp_error} function (or @code{cp_warning}, etc.). The
syntax is exactly like that of @code{error}, except that a few more
conversions are supported:
@itemize @bullet
@item
%C indicates a value of `enum tree_code'.
@item
%D indicates a *_DECL node.
@item
%E indicates a *_EXPR node.
@item
%L indicates a value of `enum languages'.
@item
%P indicates the name of a parameter (i.e. "this", "1", "2", ...)
@item
%T indicates a *_TYPE node.
@item
%O indicates the name of an operator (MODIFY_EXPR -> "operator =").
@end itemize
There is some overlap between these; for instance, any of the node
options can be used for printing an identifier (though only @code{%D}
tries to decipher function names).
For a more verbose message (@code{class foo} as opposed to just @code{foo},
including the return type for functions), use @code{%#c}, where @code{c}
is one of the conversion letters above (for example, @code{%#D}).
To have the line number on the error message indicate the line of the
DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
use @code{%+D}, or it will default to the first.
@node Parser, Exception Handling, Error Reporting, Top
@section Parser
Some comments on the parser:
The @code{after_type_declarator} / @code{notype_declarator} hack is
necessary in order to allow redeclarations of @code{TYPENAME}s, for
instance
@example
typedef int foo;
class A @{
char *foo;
@};
@end example
In the above, the first @code{foo} is parsed as a @code{notype_declarator},
and the second as an @code{after_type_declarator}.
Ambiguities:
There are currently four reduce/reduce ambiguities in the parser. They are:
1) Between @code{template_parm} and
@code{named_class_head_sans_basetype}, for the tokens @code{aggr
identifier}. This situation occurs in code looking like
@example
template <class T> class A @{ @};
@end example
It is ambiguous whether @code{class T} should be parsed as the
declaration of a template type parameter named @code{T} or an unnamed
constant parameter of type @code{class T}. Section 14.6, paragraph 3 of
the January '94 working paper states that the first interpretation is
the correct one. This ambiguity results in two reduce/reduce conflicts.
2) Between @code{primary} and @code{type_id} for code like @samp{int()}
in places where both can be accepted, such as the argument to
@code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies
that these ambiguous constructs will be interpreted as @code{typename}s.
This ambiguity results in six reduce/reduce conflicts between
@samp{absdcl} and @samp{functional_cast}.
3) Between @code{functional_cast} and
@code{complex_direct_notype_declarator}, for various token strings.
This situation occurs in code looking like
@example
int (*a);
@end example
This code is ambiguous; it could be a declaration of the variable
@samp{a} as a pointer to @samp{int}, or it could be a functional cast of
@samp{*a} to @samp{int}. Section 6.8 specifies that the former
interpretation is correct. This ambiguity results in 7 reduce/reduce
conflicts. Another aspect of this ambiguity is code like 'int (x[2]);',
which is resolved at the '[' and accounts for 6 reduce/reduce conflicts
between @samp{direct_notype_declarator} and
@samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r
conflicts between @samp{expr_or_declarator} and @samp{primary} over code
like 'int (a);', which could probably be resolved but would also
probably be more trouble than it's worth. In all, this situation
accounts for 17 conflicts. Ack!
The second case above is responsible for the failure to parse 'LinppFile
ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave
Math.h++) as an object declaration, and must be fixed so that it does
not resolve until later.
4) Indirectly between @code{after_type_declarator} and @code{parm}, for
type names. This occurs in (as one example) code like
@example
typedef int foo, bar;
class A @{
foo (bar);
@};
@end example
What is @code{bar} inside the class definition? We currently interpret
it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
@code{after_type_declarator}. I believe that xlC is correct, in light
of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
could possibly be a type name is taken as the @i{decl-specifier-seq} of
a @i{declaration}." However, it seems clear that this rule must be
violated in the case of constructors. This ambiguity accounts for 8
conflicts.
Unlike the others, this ambiguity is not recognized by the Working Paper.
@node Exception Handling, Free Store, Parser, Top
@section Exception Handling
Note, exception handling in g++ is still under development.
This section describes the mapping of C++ exceptions in the C++
front-end, into the back-end exception handling framework.
The basic mechanism of exception handling in the back-end is
unwind-protect a la elisp. This is a general, robust, and language
independent representation for exceptions.
The C++ front-end exceptions are mapped into the unwind-protect
semantics by the C++ front-end. The mapping is described below.
When -frtti is used, rtti is used to do exception object type checking;
when it isn't used, the encoded name for the type of the object being
thrown is used instead. All code that originates exceptions, even code
that throws exceptions as a side effect, like dynamic casting, and all
code that catches exceptions must be compiled consistently with either
-frtti or -fno-rtti. It is not possible to mix rtti-based exception
handling objects with code that doesn't use rtti. The exceptions to this are
code that doesn't catch or throw exceptions, catch (...), and code that
just rethrows an exception.
Currently we use the normal mangling used in building function names
(int's are "i", const char * is PCc) to build the non-rtti base type
descriptors for exception handling. These descriptors are just plain
NULL terminated strings, and internally they are passed around as char
*.
In C++, all cleanups should be protected by exception regions. The
region starts just after the reason why the cleanup is created has
ended. For example, with an automatic variable, that has a constructor,
it would be right after the constructor is run. The region ends just
before the finalization is expanded. Since the backend may expand the
cleanup multiple times along different paths, once for normal end of the
region, once for non-local gotos, once for returns, etc, the backend
must take special care to protect the finalization expansion, if the
expansion is for any other reason than normal region end, and it is
`inline' (it is inside the exception region). The backend can either
choose to move them out of line, or it can create an exception region
over the finalization to protect it, and in the handler associated with
it, it would not run the finalization as it otherwise would have, but
rather just rethrow to the outer handler, careful to skip the normal
handler for the original region.
In Ada, they will use the more runtime intensive approach of having
fewer regions, but at the cost of additional work at run time, to keep a
list of things that need cleanups. When a variable has finished
construction, they add the cleanup to the list; when they come to the end
of the lifetime of the variable, they run the list down. If they take a
hit before the section finishes normally, they examine the list for
actions to perform. I hope they add this logic into the back-end, as it
would be nice to get that alternative approach in C++.
On an rs6000, xlC stores exception objects on the stack, under the try
block. When it unwinds down into a handler, the frame pointer is
adjusted back to the normal value for the frame in which the handler
resides, and the stack pointer is left unchanged from the time at which
the object was thrown. This is so that there is always someplace for
the exception object, and nothing can overwrite it, once we start
throwing. The only bad part is that the stack remains large.
The following lists some things that work in g++'s exception handling.
All completely constructed temps and local variables are cleaned up in
all unwound scopes. Completely constructed parts of partially
constructed objects are cleaned up. This includes partially built
arrays. Exception specifications are now handled. Thrown objects are
now cleaned up all the time. We can now tell if we have an active
exception being thrown or not (__eh_type != 0). We use this to call
terminate if someone does a @code{throw;} without there being an active
exception object. uncaught_exception () works. Exception handling
should work right if you optimize. Exception handling should work with
-fpic or -fPIC.
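As an illustration of the cleanup guarantees for local variables and for
partially constructed objects, here is a minimal sketch that is not
taken from the g++ sources. The names @code{Tracer}, @code{Whole},
@code{events} and @code{run} are hypothetical and exist only for this
example, which relies on nothing beyond standard C++ semantics.
@example
#include <stdexcept>
#include <string>

// Hypothetical event log, used only for this illustration.
static std::string events;

struct Tracer @{
  char tag;
  explicit Tracer(char t) : tag(t) @{ events += tag; @}
  ~Tracer() @{ events += '~'; events += tag; @}
@};

struct Whole @{
  Tracer a;   // completely constructed before the throw
  Tracer b;   // never constructed, so never destroyed
  Whole() : a('a'), b((throw std::runtime_error("ctor failed"), 'b')) @{@}
@};

// Builds a local and then a Whole whose constructor throws;
// returns the resulting event log.
std::string run() @{
  events.clear();
  try @{
    Tracer local('l');  // a local in a scope that gets unwound
    Whole w;            // throws while initializing member b
  @} catch (const std::runtime_error&) @{
    events += '!';      // handler entered after all cleanups ran
  @}
  return events;
@}
@end example
@noindent
Calling @code{run ()} should yield the log @samp{la~a~l!}: the completed
member @code{a} is destroyed, then the local, and only then is the
handler entered; the member @code{b} never appears.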
The following lists some flaws in g++'s exception handling, as it now
stands.
Only exact type matching or reference matching of throw types works when
-fno-rtti is used. Only works on a SPARC (like Suns) (both -mflat and
-mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000,
PowerPC, Alpha, mips, VAX, m68k and z8k machines. SPARC v9 may not
work. HPPA is mostly done, but throwing between a shared library and
user code doesn't yet work. Some targets have support for data-driven
unwinding. Partial support is in for all other machines, but a stack
unwinder called __unwind_function has to be written, and added to
libgcc2 for them. The new EH code doesn't rely upon the
__unwind_function for C++ code, instead it creates per function
unwinders right inside the function, unfortunately, on many platforms
the definition of RETURN_ADDR_RTX in the tm.h file for the machine port
is wrong. See below for details on __unwind_function. RTL_EXPRs for EH
cond variables for && and || exprs should probably be wrapped in
UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.
We only do pointer conversions on exception matching a la 15.3 p2 case
3: `A handler with type T, const T, T&, or const T& is a match for a
throw-expression with an object of type E if [3]T is a pointer type and
E is a pointer type that can be converted to T by a standard pointer
conversion (_conv.ptr_) not involving conversions to pointers to private
or protected base classes.' when -frtti is given.
We don't call delete on new expressions that die because the ctor threw
an exception. See except/18 for a test case.
15.2 para 13: The exception being handled should be rethrown if control
reaches the end of a handler of the function-try-block of a constructor
or destructor, right now, it is not.
15.2 para 12: If a return statement appears in a handler of
function-try-block of a constructor, the program is ill-formed, but this
isn't diagnosed.
15.2 para 11: If the handlers of a function-try-block contain a jump
into the body of a constructor or destructor, the program is ill-formed,
but this isn't diagnosed.
15.2 para 9: Check that the fully constructed base classes and members
of an object are destroyed before entering the handler of a
function-try-block of a constructor or destructor for that object.
build_exception_variant should sort the incoming list, so that it
implements set compares, not exact list equality. Type smashing should
smash exception specifications using set union.
Thrown objects are usually allocated on the heap, in the usual way. If
one runs out of heap space, throwing an object will probably never work.
This could be relaxed some by passing an __in_chrg parameter to track
who has control over the exception object. Thrown objects are not
allocated on the heap when they are pointer to object types. We should
extend it so that all small (<4*sizeof(void*)) objects are stored
directly, instead of allocated on the heap.
When the backend returns a value, it can create new exception regions
that need protecting. The new region should rethrow the object in
context of the last associated cleanup that ran to completion.
The structure of the code that is generated for C++ exception handling
code is shown below:
@example
Ln: throw value;
copy value onto heap
jump throw (Ln, id, address of copy of value on heap)
try @{
+Lstart: the start of the main EH region
|... ...
+Lend: the end of the main EH region
@} catch (T o) @{
...1
@}
Lresume:
nop used to make sure there is something before
the next region ends, if there is one
... ...
jump Ldone
[
Lmainhandler: handler for the region Lstart-Lend
cleanup
] zero or more, depending upon automatic vars with dtors
+Lpartial:
| jump Lover
+Lhere:
rethrow (Lhere, same id, same obj);
Lterm: handler for the region Lpartial-Lhere
call terminate
Lover:
[
[
call throw_type_match
if (eq) @{
] these lines disappear when there is no catch condition
+Lsregion2:
| ...1
| jump Lresume
|Lhandler: handler for the region Lsregion2-Leregion2
| rethrow (Lresume, same id, same obj);
+Leregion2
@}
] there are zero or more of these sections, depending upon how many
catch clauses there are
----------------------------- expand_end_all_catch --------------------------
here we have fallen off the end of all catch
clauses, so we rethrow to outer
rethrow (Lresume, same id, same obj);
----------------------------- expand_end_all_catch --------------------------
[
L1: maybe throw routine
] depending upon if we have expanded it or not
Ldone:
ret
start_all_catch emits labels: Lresume,
@end example
The __unwind_function takes a pointer to the throw handler, and is
expected to pop the stack frame that was built to call it, as well as
the frame underneath and then jump to the throw handler. It must
restore all registers to their proper values as well as all other
machine state as determined by the context in which we are unwinding
into. The way I normally start is to compile:
@example
void *g;
foo(void* a) @{ g = a; @}
@end example
@noindent
with -S, and change the thing that alters the PC (return, or ret
usually) to not alter the PC, making sure to leave all other semantics
(like adjusting the stack pointer, or frame pointers) in. After that,
replicate the prologue once more at the end, again, changing the PC
altering instructions, and finally, at the very end, jump to `g'.
It takes about a week to write this routine. If someone wants to
volunteer to write this routine for any architecture, exception support
for that architecture will be added to g++. Please send in those code
donations. One other thing that needs to be done is to double check
that __builtin_return_address (0) works.
@subsection Specific Targets
For the alpha, the __unwind_function will be something resembling:
@example
void
__unwind_function(void *ptr)
@{
/* First frame */
asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
/* Second frame */
asm ("ldq $15, 8($30)"); /* fp */
asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
/* Return */
asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
@}
@end example
@noindent
However, there are a few problems preventing it from working. First of
all, the gcc-internal function @code{__builtin_return_address} needs to
work given an argument of 0 for the alpha. As it stands as of August
30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
will definitely not work on the alpha. Instead, we need to define
the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
@code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
definition for @code{RETURN_ADDR_RTX}.
In addition (and more importantly), we need a way to reliably find the
frame pointer on the alpha. The use of the value 8 above to restore the
frame pointer (register 15) is incorrect. On many systems, the frame
pointer is consistently offset to a specific point on the stack. On the
alpha, however, the frame pointer is pushed last. First the return
address is stored, then any other registers are saved (e.g., @code{s0}),
and finally the frame pointer is put in place. So @code{fp} could have
an offset of 8, but if the calling function saved any registers at all,
they add to the offset.
The only places the frame size is noted are with the @samp{.frame}
directive, for use by the debugger and the OSF exception handling model
(useless to us), and in the initial computation of the new value for
@code{sp}, the stack pointer. For example, the function may start with:
@example
lda $30,-32($30)
.frame $15,32,$26,0
@end example
@noindent
The 32 above is exactly the value we need. With this, we can be sure
that the frame pointer is stored 8 bytes less---in this case, at 24(sp).
The drawback is that there is no way that I (Brendan) have found to let
us discover the size of a previous frame @emph{inside} the definition
of @code{__unwind_function}.
So to accomplish exception handling support on the alpha, we need two
things: first, a way to figure out where the frame pointer was stored,
and second, a functional @code{__builtin_return_address} implementation
for except.c to be able to use it.
Or just support DWARF 2 unwind info.
@subsection New Backend Exception Support
This subsection discusses various aspects of the design of the
data-driven model being implemented for the exception handling backend.
The goal is to generate enough data during the compilation of user code,
such that we can dynamically unwind through functions at run time with a
single routine (@code{__throw}) that lives in libgcc.a, built by the
compiler, and dispatch into associated exception handlers.
This information is generated by the DWARF 2 debugging backend, and
includes all of the information __throw needs to unwind an arbitrary
frame. It specifies where all of the saved registers and the return
address can be found at any point in the function.
Major disadvantages when enabling exceptions are:
@itemize @bullet
@item
Code cannot rely on caller-saved registers when flow can be
transferred into that code from an exception handler. In high performance
code this should not usually be the case, so the effects should be minimal.
@end itemize
@subsection Backend Exception Support
The backend must be extended to fully support exceptions. Right now
there are a few hooks from that backend into the alpha exception
handling backend that resides in the C++ frontend; these hooks allow
exception handling to work in g++. An exception region is a segment of generated
code that has a handler associated with it. The exception regions are
denoted in the generated code as address ranges denoted by a starting PC
value and an ending PC value of the region. Some of the limitations
with this scheme are:
@itemize @bullet
@item
The backend replicates insns for such things as loop unrolling and
function inlining. Right now, there are no hooks into the frontend's
exception handling backend to handle the replication of insns. When
replication happens, a new exception region descriptor needs to be
generated for the new region.
@item
The backend expects to be able to rearrange code, for things like jump
optimization. Any rearranging of the code needs to have exception region
descriptors updated appropriately.
@item
The backend can eliminate dead code. Any associated exception region
descriptor that refers to fully contained code that has been eliminated
should also be removed, although not doing this is harmless in terms of
semantics.
@end itemize
The above is not meant to be exhaustive, but does include all things I
have thought of so far. I am sure other limitations exist.
Below are some notes on the migration of the exception handling code
backend from the C++ frontend to the backend.
NOTEs are to be used to denote the start of an exception region, and the
end of the region. I presume that the interface used to generate these
notes in the backend would be two functions, start_exception_region and
end_exception_region (or something like that). The frontends are
required to call them in pairs. When marking the end of a region, an
argument can be passed to indicate the handler for the marked region.
This can be passed in many ways, currently a tree is used. Another
possibility would be insns for the handler, or a label that denotes a
handler. I have a feeling insns might be the best way to pass it.
Semantics are, if an exception is thrown inside the region, control is
transferred unconditionally to the handler. If control passes through
the handler, then the backend is to rethrow the exception, in the
context of the end of the original region. The handler is protected by
the conventional mechanisms; it is the frontend's responsibility to
protect the handler, if special semantics are required.
This is a very low level view, and it would be nice if the backend
supported a somewhat higher level view in addition to this view. This
higher level could include source line number, name of the source file,
name of the language that threw the exception and possibly the name of
the exception. Kenner may want to rope you into doing more than just
the basics required by C++. You will have to resolve this. He may want
you to do support for non-local gotos, first scan for exception handler,
if none is found, allow the debugger to be entered, without any cleanups
being done. To do this, the backend would have to know the difference
between a cleanup-rethrower and a real handler; it would also have to
have a way to know if a handler `matches' a thrown exception, and this
is frontend specific.
The stack unwinder is one of the hardest parts to do. It is highly
machine dependent. The form that Kenner seemed to like was a couple of
macros, that would do the machine dependent grunt work. One preexisting
function that might be of some use is __builtin_return_address (). One
macro he seemed to want was __builtin_return_address, and the other
would do the hard work of fixing up the registers, adjusting the stack
pointer, frame pointer, arg pointer and so on.
@node Free Store, Mangling, Exception Handling, Top
@section Free Store
@code{operator new []} adds a magic cookie to the beginning of arrays
for which the number of elements will be needed by @code{operator delete
[]}. These are arrays of objects with destructors and arrays of objects
that define @code{operator delete []} with the optional size_t argument.
This cookie can be examined from a program as follows:
@example
typedef unsigned long size_t;
extern "C" int printf (const char *, ...);

/* Read the element count stored just before the array P.  */
size_t nelts (void *p)
@{
  struct cookie @{
    size_t nelts __attribute__ ((aligned (sizeof (double))));
  @};
  cookie *cp = (cookie *)p;
  --cp;
  return cp->nelts;
@}

struct A @{
  ~A() @{ @}
@};

int main()
@{
  A *ap = new A[3];
  printf ("%lu\n", nelts (ap));
  delete [] ap;
  return 0;
@}
@end example
@section Linkage
The linkage code in g++ is horribly twisted in order to meet two design goals:
1) Avoid unnecessary emission of inlines and vtables.
2) Support pedantic assemblers like the one in AIX.
To meet the first goal, we defer emission of inlines and vtables until
the end of the translation unit, where we can decide whether or not they
are needed, and how to emit them if they are.
@node Mangling, Concept Index, Free Store, Top
@section Function name mangling for C++ and Java
Both C++ and Java provide overloaded functions and methods,
which are methods with the same name but different parameter lists.
Selecting the correct version is done at compile time.
Though the overloaded functions have the same name in the source code,
they need to be translated into different assembler-level names,
since typical assemblers and linkers cannot handle overloading.
This process of encoding the parameter types with the method name
into a unique name is called @dfn{name mangling}. The inverse
process is called @dfn{demangling}.
It is convenient that C++ and Java use compatible mangling schemes,
since this makes life easier for tools such as gdb, and it eases
integration between C++ and Java.
Note there is also a standard "Java Native Interface" (JNI) which
implements a different calling convention, and uses a different
mangling scheme. The JNI is a rather abstract ABI so Java can call methods
written in C or C++;
we are concerned here about a lower-level interface primarily
intended for methods written in Java, but that can also be used for C++
(and less easily C).
Note that on systems that follow BSD tradition, a C identifier @code{var}
would get "mangled" into the assembler name @samp{_var}. On such
systems, all other mangled names are also prefixed by a @samp{_}
which is not shown in the following examples.
@subsection Method name mangling
C++ mangles a method by emitting the function name, followed by @code{__},
followed by encodings of any method qualifiers (such as @code{const}),
followed by the mangling of the method's class,
followed by the mangling of the parameters, in order.
For example @code{Foo::bar(int, long) const} is mangled
as @samp{bar__C3Fooil}.
For a constructor, the method name is left out.
That is, @code{Foo::Foo(int, long)} is mangled
as @samp{__3Fooil} (a constructor cannot be @code{const}-qualified).
GNU Java does the same.
@subsection Primitive types
The C++ types @code{int}, @code{long}, @code{short}, @code{char},
and @code{long long} are mangled as @samp{i}, @samp{l},
@samp{s}, @samp{c}, and @samp{x}, respectively.
The corresponding unsigned types have @samp{U} prefixed
to the mangling. The type @code{signed char} is mangled @samp{Sc}.
The C++ and Java floating-point types @code{float} and @code{double}
are mangled as @samp{f} and @samp{d} respectively.
The C++ @code{bool} type and the Java @code{boolean} type are
mangled as @samp{b}.
The C++ @code{wchar_t} and the Java @code{char} types are
mangled as @samp{w}.
The Java integral types @code{byte}, @code{short}, @code{int}
and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
and @samp{x}, respectively.
C++ code that has included @code{javatypes.h} will mangle
the typedefs @code{jbyte}, @code{jshort}, @code{jint}
and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
and @samp{x}. (This has not been implemented yet.)
@subsection Mangling of simple names
A simple class, package, template, or namespace name is
encoded as the number of characters in the name, followed by
the actual characters. Thus the class @code{Foo}
is encoded as @samp{3Foo}.
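The length-prefixed class name combines with the method rule and the
primitive type codes described earlier. The following is a minimal
sketch of that combination, not g++'s actual implementation; the helper
names @code{mangle_simple_name} and @code{mangle_method} are
hypothetical. It assumes plain ASCII names and takes the parameter
codes already encoded (for example @samp{il} for @code{int, long}).
@example
#include <string>

// Length-prefixed encoding of a plain ASCII name: "Foo" -> "3Foo".
std::string mangle_simple_name(const std::string& name) @{
  return std::to_string(name.size()) + name;
@}

// Method name, then "__", then "C" if the method is const, then the
// encoded class name, then the already-encoded parameter codes.
// An empty method name corresponds to a constructor.
std::string mangle_method(const std::string& method, bool is_const,
                          const std::string& class_name,
                          const std::string& param_codes) @{
  return method + "__" + (is_const ? "C" : "") +
         mangle_simple_name(class_name) + param_codes;
@}
@end example
@noindent
With these helpers, @code{mangle_method ("bar", true, "Foo", "il")}
gives @samp{bar__C3Fooil}, matching the example given for
@code{Foo::bar(int, long) const}.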
If any of the characters in the name are not alphanumeric
(i.e., not one of the standard ASCII letters, digits, or '_'),
or the initial character is a digit, then the name is
mangled as a sequence of encoded Unicode letters.
A Unicode encoding starts with a @samp{U} to indicate
that Unicode escapes are used, followed by the number of
bytes used by the Unicode encoding, followed by the bytes
representing the encoding. ASCII letters and
non-initial digits are encoded without change. However, all
other characters (including underscore and initial digits) are
translated into a sequence starting with an underscore,
followed by the big-endian 4-hex-digit lower-case encoding of the character.
If a method name contains Unicode-escaped characters, the
entire mangled method name is followed by a @samp{U}.
For example, the method @code{X\u0319::M\u002B(int)} is encoded as
@samp{M_002b__U6X_0319iU}.
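The escaping rule can be sketched as follows. This is an illustration
under stated assumptions, not the compiler's own code: the function
names @code{escape_name} and @code{encode_unicode_name} are
hypothetical, and the input is assumed to be a sequence of 16-bit
code units, since the scheme only provides four hex digits per
character.
@example
#include <cstddef>
#include <string>

// ASCII letters and non-initial digits are copied unchanged; every
// other character becomes '_' plus four lower-case hex digits.
std::string escape_name(const std::u16string& name) @{
  static const char hex[] = "0123456789abcdef";
  std::string out;
  for (std::size_t i = 0; i < name.size(); ++i) @{
    char16_t c = name[i];
    bool letter = (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z');
    bool digit = (c >= u'0' && c <= u'9');
    if (letter || (digit && i > 0)) @{
      out += static_cast<char>(c);
    @} else @{
      out += '_';
      for (int shift = 12; shift >= 0; shift -= 4)
        out += hex[(c >> shift) & 0xF];   // most significant digit first
    @}
  @}
  return out;
@}

// 'U', then the length of the escaped text, then the escaped text.
std::string encode_unicode_name(const std::u16string& name) @{
  std::string body = escape_name(name);
  return "U" + std::to_string(body.size()) + body;
@}
@end example
@noindent
For the class name in the example above,
@code{encode_unicode_name (u"X\u0319")} gives @samp{U6X_0319}. The
method name part @samp{M_002b} is the escaped text alone, as returned by
@code{escape_name (u"M+")}; the trailing @samp{U} on the whole mangled
method name is not handled by this sketch.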
@subsection Pointer and reference types
A C++ pointer type is mangled as @samp{P} followed by the
mangling of the type pointed to.
A C++ reference type is mangled as @samp{R} followed by the
mangling of the type referenced.
A Java object reference type is equivalent
to a C++ pointer parameter, so we mangle such a parameter type
as @samp{P} followed by the mangling of the class name.
@subsection Squangled type compression
Squangling (enabled with the @samp{-fsquangle} option) utilizes the
@samp{B} code to indicate reuse of a previously seen type within an
identifier. Types are recognized in a left-to-right manner and given
increasing values, which are appended to the code in the standard
manner; that is, multiple-digit numbers are delimited by @samp{_}
characters. A type is considered to be any non-primitive type,
regardless of whether it's a parameter, template parameter, or entire
template. Certain codes are considered modifiers of a type, and are not
included as part of the type. These are the @samp{C}, @samp{V},
@samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting
constant, volatile, pointer, array, reference, unsigned, and restrict.
These codes may precede a @samp{B} type in order to make the required
modifications to the type.
For example:
@example
template <class T> class class1 @{ @};
template <class T> class class2 @{ @};
class class3 @{ @};
int f(class2<class1<class3> > a ,int b, const class1<class3>&c, class3 *d) @{ @}
B0 -> class2<class1<class3> >
B1 -> class1<class3>
B2 -> class3
@end example
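This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
The first parameter is written out in full; the third and fourth
parameters then refer back to parts of it as @samp{RCB1} and @samp{PB2}.
The @code{int} parameter is a basic type, so it does not receive a
@samp{B} encoding.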
@subsection Qualified names
Both C++ and Java allow a class to be lexically nested inside another
class. C++ also supports namespaces.
Java also supports packages.
These are all mangled the same way: First the letter @samp{Q}
indicates that we are emitting a qualified name.
That is followed by the number of parts in the qualified name.
If that number is 9 or less, it is emitted with no delimiters.
Otherwise, an underscore is written before and after the count.
Then follows each part of the qualified name, as described above.
For example, @code{Foo::\u0319::Bar} is encoded as
@samp{Q33FooU5_03193Bar}.
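As a rough illustration of the counting rule, the following C++ sketch
mangles a qualified name whose parts are plain ASCII identifiers. The
function name @code{mangle_qualified} is hypothetical, and Unicode-escaped
parts are deliberately left out.

@verbatim
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: mangle a qualified name. Each part is assumed to
// be a plain ASCII identifier, encoded as <length><name>.
std::string mangle_qualified(const std::vector<std::string>& parts) {
    assert(parts.size() >= 2);  // a single name needs no 'Q' prefix
    std::string out = "Q";
    std::string count = std::to_string(parts.size());
    if (parts.size() <= 9)
        out += count;              // 9 or fewer: no delimiters
    else
        out += "_" + count + "_";  // otherwise: underscore before and after
    for (const std::string& part : parts)
        out += std::to_string(part.size()) + part;
    return out;
}
```
@end verbatim

For the two-part name @code{Foo::Bar} this gives @samp{Q23Foo3Bar}, and
for the Java name @code{java.lang.String} it gives
@samp{Q34java4lang6String}.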
Squangling uses the letter @samp{K} to indicate a
remembered portion of a qualified name. As qualified names are processed
for an identifier, the names are numbered and remembered in a
manner similar to the @samp{B} type compression code.
Names are recognized from left to right and given increasing index values,
which are appended to the code in the standard manner; that is,
multiple-digit numbers are delimited by @samp{_} characters.
For example:
@example
class Andrew
@{
  class WasHere
  @{
    class AndHereToo
    @{
    @};
  @};
@};
f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}
K0 -> Andrew
K1 -> Andrew::WasHere
K2 -> Andrew::WasHere::AndHereToo
@end example
Function @samp{f()} would be mangled as:
@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}
On some occasions either a @samp{B} or a @samp{K} code could
be chosen; preference is always given to the @samp{B} code. For instance,
the example in the section on @samp{B} mangling could have used a
@samp{K} code instead of @samp{B2}.
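The @samp{K} numbering can be sketched as a small table of remembered
names. This is a simplified model, not the compiler's algorithm: the class
@code{NameSquangler} is hypothetical, it ignores @samp{B} codes entirely,
and it assumes that every new prefix of a qualified name is remembered in
left-to-right order and that the longest remembered prefix is the one
reused.

@verbatim
```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Simplified model of K-code compression (assumptions in the lead-in).
class NameSquangler {
public:
    std::string mangle(const std::vector<std::string>& parts) {
        assert(!parts.empty());
        // Build the prefixes "A", "A::B", "A::B::C", ...
        std::vector<std::string> prefixes;
        std::string prefix;
        for (const std::string& p : parts) {
            prefix += (prefix.empty() ? "" : "::") + p;
            prefixes.push_back(prefix);
        }
        // Count how many leading parts are already remembered.
        std::size_t known = 0;
        while (known < parts.size() && table_.count(prefixes[known]))
            ++known;
        // One K code (if any) counts as a single component.
        std::size_t components = (known ? 1 : 0) + (parts.size() - known);
        std::string out;
        if (components > 1)
            out += "Q" + number(components);
        if (known)
            out += "K" + number(table_[prefixes[known - 1]]);
        // Emit the remaining parts and remember each new prefix.
        for (std::size_t i = known; i < parts.size(); ++i) {
            out += std::to_string(parts[i].size()) + parts[i];
            std::size_t next = table_.size();
            table_[prefixes[i]] = next;
        }
        return out;
    }

private:
    // Assumption: multi-digit numbers get '_' before and after.
    static std::string number(std::size_t n) {
        std::string digits = std::to_string(n);
        return n > 9 ? "_" + digits + "_" : digits;
    }
    std::map<std::string, std::size_t> table_;
};
```
@end verbatim

Feeding the three parameter types of @samp{f()} through one
@code{NameSquangler} in order gives @samp{6Andrew},
@samp{Q2K07WasHere} and @samp{Q2K110AndHereToo}, which are the pieces
that follow each @samp{R} in the mangled name above.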
@subsection Templates
A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters. If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type. If it is a
template, it is encoded as @samp{z} followed by the parameter
of the template template parameter and the template name.
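For the common case where every template parameter is a type, the
encoding can be sketched as follows. The function name
@code{mangle_class_template} is hypothetical, its arguments are assumed
to be already-mangled types, and counts above 9 are rejected because the
text does not say how they are delimited.

@verbatim
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: mangle a class template instantiation whose
// parameters are all types. `type_args` holds already-mangled types.
std::string mangle_class_template(const std::string& name,
                                  const std::vector<std::string>& type_args) {
    assert(!type_args.empty());
    assert(type_args.size() <= 9);  // larger counts are not covered here
    std::string out = "t" + std::to_string(name.size()) + name;
    out += std::to_string(type_args.size());
    for (const std::string& arg : type_args)
        out += "Z" + arg;  // 'Z' introduces a type parameter
    return out;
}
```
@end verbatim

With @samp{6class3} as the mangling of @code{class3}, the type
@code{class1<class3>} comes out as @samp{t6class11Z6class3}, and nesting
that inside @code{class2} gives @samp{t6class21Zt6class11Z6class3}.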
A function template specialization (either an instantiation or an
explicit specialization) is encoded by an @samp{H} followed by the
encoding of the template parameters, as described above, followed by an
@samp{_}, the encoding of the argument types to the template function
(not the specialization), another @samp{_}, and the return type. (Like
the argument types, the return type is the return type of the function
template, not the specialization.) Template parameters in the argument
and return types are encoded by an @samp{X} for type parameters,
@samp{zX} for template parameters,
or a @samp{Y} for constant parameters, an index indicating their position
in the template parameter list declaration, and their template depth.
@subsection Arrays
C++ array types are mangled by emitting @samp{A}, followed by
the length of the array, followed by an @samp{_}, followed by
the mangling of the element type. Of course, array parameter
types normally decay into pointer types, so you
don't see this.
Java arrays are objects. A Java type @code{T[]} is mangled
as if it were the C++ type @code{JArray<T>}.
For example @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.
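Both array forms reduce to simple string concatenation, as this sketch
suggests. The function names are hypothetical, and the element type is
assumed to be passed in already mangled.

@verbatim
```cpp
#include <cassert>
#include <string>

// C++ array: 'A', the length, '_', then the mangled element type.
std::string mangle_cxx_array(unsigned long length, const std::string& element) {
    assert(!element.empty());
    return "A" + std::to_string(length) + "_" + element;
}

// Java T[]: treated as a pointer ('P') to the template instantiation
// JArray<T>, that is 't' + "6JArray" + one parameter + 'Z' + T.
std::string mangle_java_array(const std::string& element) {
    assert(!element.empty());
    return "Pt6JArray1Z" + element;
}
```
@end verbatim

Passing @samp{PQ34java4lang6String}, the mangling of a
@code{java.lang.String} reference, to @code{mangle_java_array}
reproduces the encoding shown above.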
@subsection Static fields
Both C++ and Java classes can have static fields.
These are allocated statically, and are shared among all instances.
The mangling starts with a prefix (@samp{_} in most systems), which is
followed by the mangling
of the class name, followed by the "joiner" and finally the field name.
The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special
separator character. For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems). If the joiner is @samp{_} then the prefix
is @samp{__static_} instead of just @samp{_}.
For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
(or rarely @samp{__static_Q23Foo3Bar_var}).
If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the "joiner".
Thus @code{\u1234Foo::var\u3445} becomes @samp{_U8_1234FooU.var_3445}.
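The prefix and joiner rules for names without Unicode escapes can be
summarized in a short sketch. The function name
@code{mangle_static_field} is hypothetical, the joiner is passed in
explicitly rather than read from @code{JOINER}, and the usual prefix is
assumed to be @samp{_}.

@verbatim
```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prefix + mangled class name + joiner + field name.
// `class_mangling` is assumed to be already mangled (e.g. "Q23Foo3Bar").
std::string mangle_static_field(const std::string& class_mangling,
                                const std::string& field, char joiner) {
    assert(joiner == '$' || joiner == '.' || joiner == '_');
    // When the joiner is '_', the prefix becomes "__static_".
    std::string prefix = (joiner == '_') ? "__static_" : "_";
    return prefix + class_mangling + joiner + field;
}
```
@end verbatim

Calling it with @samp{Q23Foo3Bar}, @samp{var} and each of the three
joiners reproduces the three encodings of @code{Foo::Bar::var} listed
above.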
@subsection Table of demangling code characters
The following special characters are used in mangling:
@table @samp
@item A
Indicates a C++ array type.
@item b
Encodes the C++ @code{bool} type,
and the Java @code{boolean} type.
@item B
Used for squangling. Similar in concept to the @samp{T} non-squangled code.
@item c
Encodes the C++ @code{char} type, and the Java @code{byte} type.
@item C
A modifier to indicate a @code{const} type.
Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).
@item d
Encodes the C++ and Java @code{double} types.
@item e
Indicates extra unknown arguments @code{...}.
@item E
Indicates the opening parenthesis of an expression.
@item f
Encodes the C++ and Java @code{float} types.
@item F
Used to indicate a function type.
@item H
Used to indicate a template function.
@item i
Encodes the C++ and Java @code{int} types.
@item I
Encodes typedef names of the form @code{int@var{n}_t}, where @var{n} is a
positive decimal number. The @samp{I} is followed by either two
hexadecimal digits, which encode the value of @var{n}, or by an
arbitrary number of hexadecimal digits between underscores. For
example, @samp{I40} encodes the type @code{int64_t}, and @samp{I_200_}
encodes the type @code{int512_t}.
@item J
Indicates a complex type.
@item K
Used by squangling to compress qualified names.
@item l
Encodes the C++ @code{long} type.
@item n
Immediate repeated type. Followed by the repeat count.
@item N
Repeated type. Followed by the repeat count of the repeated type,
followed by the type index of the repeated type. Due to a bug in
g++ 2.7.2, this is only generated if the index is 0. Superseded by
@samp{n} when squangling.
@item P
Indicates a pointer type. Followed by the type pointed to.
@item Q
Used to mangle qualified names, which arise from nested classes.
Also used for namespaces.
In Java used to mangle package-qualified names, and inner classes.
@item r
Encodes the GNU C++ @code{long double} type.
@item R
Indicates a reference type. Followed by the referenced type.
@item s
Encodes the C++ and Java @code{short} types.
@item S
A modifier that indicates that the following integer type is signed.
Only used with @code{char}.
Also used as a modifier to indicate a static member function.
@item t
Indicates a template instantiation.
@item T
A back reference to a previously seen type.
@item U
A modifier that indicates that the following integer type is unsigned.
Also used to indicate that the following class or namespace name
is encoded using Unicode-mangling.
@item u
The @code{restrict} type qualifier.
@item v
Encodes the C++ and Java @code{void} types.
@item V
A modifier for a @code{volatile} type or method.
@item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.
@item W
Indicates the closing parenthesis of an expression.
@item x
Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.
@item X
Encodes a template type parameter, when part of a function type.
@item Y
Encodes a template constant parameter, when part of a function type.
@item z
Used for template template parameters.
@item Z
Used for template type parameters.
@end table
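The @samp{I} entry in the table above is the only code whose payload is
hexadecimal, so a short sketch may help. The function name
@code{mangle_intn_t} is hypothetical. The sketch assumes that the
two-digit form is used whenever @var{n} fits in two hexadecimal digits
and that the digits are lower-case; the table does not state either
point.

@verbatim
```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: encode the typedef name int<bits>_t.
// Assumptions: two hex digits when the value fits, otherwise the hex
// digits are wrapped in underscores; hex digits are lower-case.
std::string mangle_intn_t(unsigned bits) {
    assert(bits > 0);
    char buf[32];
    if (bits <= 0xff)
        std::snprintf(buf, sizeof buf, "I%02x", bits);
    else
        std::snprintf(buf, sizeof buf, "I_%x_", bits);
    return buf;
}
```
@end verbatim

With these assumptions @code{int64_t} is encoded as @samp{I40} and
@code{int512_t} as @samp{I_200_}, matching the examples in the table.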
The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
also appear to be used, but their purposes are not documented here.
@node Concept Index, , Mangling, Top
@section Concept Index
@printindex cp
@bye