.. _skiboot-5.10:
skiboot-5.10
============
skiboot v5.10 was released on Friday February 23rd 2018. It is the first
release of skiboot 5.10, and becomes the new stable release
of skiboot following the 5.9 release, first released October 31st 2017.
skiboot v5.10 contains all bug fixes as of :ref:`skiboot-5.9.8`
and :ref:`skiboot-5.4.9`. We do not foresee any further 5.9.x releases.
For details on how the skiboot stable releases work, see :ref:`stable-rules`.
Over skiboot-5.9, we have the following changes:
New Features
------------
Since skiboot-5.10-rc3:
- sensor-groups: occ: Add support to disable/enable sensor group
This patch adds a new OPAL call to enable/disable a sensor group. This
call is used to select the sensor groups that need to be copied to
main memory by the OCC at runtime.
- sensors: occ: Add energy counters
Export the accumulated power values as energy sensors. The accumulator
field of each power sensor is used to represent an energy counter, which
can be exported through the Linux hwmon interface.
- sensors: Support reading u64 sensor values
This patch adds support for reading u64 sensor values. It also changes
the core and the backend implementation code to make this API the base
call. The host can use this new API to read sensor values of up to
64 bits.
This also adds a list to store the pointer to the kernel u32 buffer, for
older kernels making async sensor u32 reads.
- dt: add /cpus/ibm,powerpc-cpu-features device tree bindings
This is a new CPU feature advertising interface that is fine-grained,
extensible, aware of privilege levels, and gives control of features
to all levels of the stack (firmware, hypervisor, and OS).
The design and binding specification is described in detail in doc/.
Since skiboot-5.10-rc2:
- DT: Add "version" property under ibm,firmware-versions node
The first line of the VERSION section in PNOR contains the firmware version.
Use that to add a "version" property under the firmware versions dt node.
Sample output:
.. code-block:: console
root@xxx2:/proc/device-tree/ibm,firmware-versions# lsprop
version "witherspoon-ibm-OP9_v1.19_1.94"
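As a rough illustration (not part of skiboot), a host-side script could read this property as sketched below. The path is taken from the sample output above. The helper names ``decode_dt_string`` and ``read_firmware_version`` are hypothetical, and the sketch assumes the property is a NUL-terminated string, as device tree string properties normally are.

.. code-block:: python

    from pathlib import Path

    # Path taken from the sample output above; it exists only on OPAL systems.
    VERSION_PROP = Path("/proc/device-tree/ibm,firmware-versions/version")


    def decode_dt_string(raw: bytes) -> str:
        """Decode a device tree string property (assumed NUL-terminated)."""
        return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


    def read_firmware_version(prop: Path = VERSION_PROP):
        """Return the firmware version string, or None if it cannot be read."""
        try:
            return decode_dt_string(prop.read_bytes())
        except OSError:
            return None

On a system without the node, ``read_firmware_version()`` simply returns ``None`` rather than raising.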
Since skiboot-5.10-rc1:
- hw/npu2: Implement logging HMI actions
Since skiboot-5.9:
- hdata: Parse IPL FW feature settings
Add parsing for the firmware feature flags in the HDAT. This
indicates the settings of various parameters which are set at IPL time
by firmware.
- opal/xstop: Use nvram option to enable/disable sw checkstop.
Add a mechanism to enable/disable sw checkstop by looking at the nvram option
opal-sw-xstop=<enable/disable>.
For now, this patch disables the sw checkstop trigger on p9 unless it is explicitly
enabled through the nvram option 'opal-sw-xstop=enable'. This gives the host
kernel an opportunity to enter its panic path or xmon on unrecoverable
HMIs or MCEs, so that the issue can be debugged effectively.
To enable sw checkstop in OPAL, issue the following command: ::
nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
**NOTE:** This is a workaround patch that disables sw checkstop by default to gain
control in the host kernel for better checkstop debugging. Once most of
the checkstop issues are stabilized/resolved, this patch should be revisited to enable sw
checkstop by default.
On p8 platforms it remains enabled by default unless explicitly disabled.
To disable sw checkstop on p8, issue the following command: ::
nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
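The defaults described above can be summarised as a small decision function. This is only a sketch of the documented behaviour, not skiboot code: the function name ``sw_xstop_enabled`` is hypothetical, and treating an unrecognised option value like an unset one is an assumption.

.. code-block:: python

    def sw_xstop_enabled(processor, nvram_value=None):
        """Decide whether the sw checkstop trigger is active.

        processor   -- "p8" or "p9"
        nvram_value -- value of the opal-sw-xstop option, or None if unset
        """
        if nvram_value == "enable":
            return True
        if nvram_value == "disable":
            return False
        # Option unset (or unrecognised, which is an assumption of this sketch):
        # enabled by default on p8, disabled by default on p9.
        return processor == "p8"

The explicit nvram setting always wins. The processor generation only selects the default.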
- hdata: Parse SPD data
Parse SPD data and populate the device tree.
List of properties parsed from SPD: ::
[root@ltc-wspoon dimm@d00f]# lsprop .
memory-id 0000000c (12) # DIMM type
product-version 00000032 (50) # Module Revision Code
device_type "memory-dimm-ddr4"
serial-number 15d9acb6 (366587062)
status "okay"
size 00004000 (16384)
phandle 000000bd (189)
ibm,loc-code "UOPWR.0000000-Node0-DIMM7"
part-number "36ASF2G72PZ-2G6B2 "
reg 0000d007 (53255)
name "dimm"
manufacturer-id 0000802c (32812) # Vendor ID, we can get vendor name from this ID
Also update documentation.
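The hexadecimal values in the listing above are 32-bit big-endian cells, as is usual for device tree properties. A minimal sketch of decoding one, assuming the raw bytes were read from the property file (``decode_u32`` is a hypothetical helper name):

.. code-block:: python

    import struct


    def decode_u32(raw: bytes) -> int:
        """Decode a single 32-bit big-endian device tree cell."""
        if len(raw) != 4:
            raise ValueError("expected exactly 4 bytes")
        (value,) = struct.unpack(">I", raw)
        return value

For example, ``decode_u32(bytes.fromhex("0000802c"))`` gives 32812, matching the manufacturer-id shown in the listing.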
- hdata: Add memory hierarchy under xscom node
We have a memory-to-chip mapping but do not have the complete memory hierarchy.
This patch adds the memory hierarchy under the xscom node. This is specific to
P9 systems, as the hierarchy may change between processor generations.
It uses memory controller ID details and populates nodes like:
xscom@<addr>/mcbist@<mcbist_id>/mcs@<mcs_id>/mca@<mca_id>/dimm@<resource_id>
This patch also adds a few properties under the dimm node.
Finally, it makes sure the xscom nodes are created before calling memory_parse().
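To make the node layout above concrete, the sketch below builds such a path from its IDs. It is illustrative only: ``dimm_node_path`` is a hypothetical helper, and formatting each ID as lowercase hexadecimal follows the usual device tree unit-address convention rather than anything stated in this patch.

.. code-block:: python

    def dimm_node_path(xscom_addr, mcbist_id, mcs_id, mca_id, resource_id):
        """Build the path of a dimm node relative to the device tree root."""
        parts = [
            ("xscom", xscom_addr),
            ("mcbist", mcbist_id),
            ("mcs", mcs_id),
            ("mca", mca_id),
            ("dimm", resource_id),
        ]
        # Unit addresses are written in hex without a 0x prefix (assumption).
        return "/".join(f"{name}@{value:x}" for name, value in parts)

With a resource ID of ``0xd00f``, the final path component is ``dimm@d00f``, the node name seen in the SPD listing earlier in these notes.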
Fast Reboot and Quiesce
^^^^^^^^^^^^^^^^^^^^^^^
We have a preliminary fast reboot implementation for POWER9 systems, which
we are looking to enable by default in the next release.
The OPAL Quiesce calls are designed to improve reliability and debuggability
around reboot and error conditions. See the full API documentation for details:
:ref:`opal-quiesce`.
- fast-reboot: bare bones fast reboot implementation for POWER9
This is an initial fast reboot implementation for p9 which has only been
tested on the Witherspoon platform, and without the use of NPUs, NX/VAS,
etc.
This has worked reasonably well so far, with no failures in about 100
reboots. It is hidden behind the traditional fast-reboot experimental
nvram option, until more platforms and configurations are tested.
- fast-reboot: move boot CPU clean-up logically together with secondaries
Move the boot CPU clean-up and state transition to active, logically
together with secondaries. Don't release secondaries from fast reboot
hold until everyone has cleaned up and transitioned to active.
This is cosmetic, but it is helpful to run the fast reboot state machine
the same way on all CPUs.
- fast-reboot: improve failure error messages
Change existing failure error messages to PR_NOTICE so they get
printed to the console, and add some new ones. It's not a more
severe class because it falls back to IPL on failure.
- fast-reboot: quiesce opal before initiating a fast reboot
Switch fast reboot to use quiescing rather than "wait for a while".
If firmware can not be quiesced, then fast reboot is skipped. This
significantly improves the robustness of fast reboot in the face of
bugs or unexpected latencies.
Complexity of synchronization in fast-reboot is reduced, because we
are guaranteed to be single-threaded when quiesce succeeds, so locks
can be removed.
In the case that firmware can be quiesced, then it will generally
reduce fast reboot times by nearly 200ms, because quiescing usually
takes very little time.
- core: Add support for quiescing OPAL
Quiescing ensures that all host controlled CPUs (except the current
one) are out of OPAL and prevented from entering. This can be used in
debug and shutdown paths, particularly with system reset sequences.
This patch adds per-CPU entry and exit tracking for OPAL calls, and
adds logic to "hold" or "reject" at entry time, if OPAL is quiesced.
An OPAL call is added, to expose the functionality to Linux, where it
can be used for shutdown, kexec, and before generating sreset IPIs for
debugging (so the debug code does not recurse into OPAL).
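The entry/exit tracking and the "hold" or "reject" decision can be pictured with a small single-threaded model. This is a conceptual sketch only and makes several assumptions: the class and method names are hypothetical, real firmware tracks state per CPU with proper synchronisation, a held caller would wait for the resume instead of getting a status back, and a real quiesce attempt would wait with a timeout instead of giving up immediately.

.. code-block:: python

    from enum import Enum


    class QuiesceMode(Enum):
        NONE = 0
        HOLD = 1    # callers wait at entry until OPAL is resumed
        REJECT = 2  # callers are turned away at entry


    class OpalModel:
        """Toy model of per-CPU OPAL entry/exit tracking."""

        def __init__(self, ncpus):
            self.in_opal = [False] * ncpus
            self.mode = QuiesceMode.NONE
            self.quiescer = None

        def enter(self, cpu):
            """Return "entered", "held" or "rejected" for an OPAL call."""
            if self.mode is not QuiesceMode.NONE and cpu != self.quiescer:
                return "held" if self.mode is QuiesceMode.HOLD else "rejected"
            self.in_opal[cpu] = True
            return "entered"

        def leave(self, cpu):
            self.in_opal[cpu] = False

        def quiesce(self, cpu, mode):
            """Block new entries, then succeed only if no other CPU is in OPAL."""
            self.mode = mode
            self.quiescer = cpu
            others = any(
                inside for i, inside in enumerate(self.in_opal) if i != cpu
            )
            if others:
                # Sketch: give up at once instead of waiting with a timeout.
                self.resume()
                return False
            return True

        def resume(self):
            self.mode = QuiesceMode.NONE
            self.quiescer = None

In this model a caller such as fast reboot would proceed only when ``quiesce()`` returns ``True``, which mirrors the rule that fast reboot is skipped if firmware cannot be quiesced.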
- dctl: p9 increase thread quiesce timeout
We require all instructions to be completed before a thread is
considered stopped, by the dctl interface. Long running instructions
like cache misses and CI loads may take a significant amount of time
to complete, and timeouts have been observed in stress testing.
Increase the timeout significantly, to cover this. The workbook
just says to poll, but we like to have timeouts to avoid getting
stuck in firmware.
POWER9 power saving
^^^^^^^^^^^^^^^^^^^
There is much improved support for deeper sleep/idle (stop) states on POWER9.
- OCC: Increase max pstate check on P9 to 255
This has changed from P8, we can now have > 127 pstates.
This was observed on Boston during WoF bring up.
- SLW: Add idle state stop5 for DD2.0 and above
Adding stop5 idle state with rough residency and latency numbers.
- SLW: Add p9_stop_api calls for IMC
Add p9_stop_api for EVENT_MASK and PDBAR scoms. These scoms are lost on
wakeup from stop11.
- SCOM restore for DARN and XIVE
While waking up from stop11, we want NCU_DARN_BAR to have the enable bit set.
Without this stop_api call, the value is restored without the enable bit set.
We lose NCU_SPEC_BAR when the quad goes into stop11; stop_api will
restore it while waking up from stop11.
- SLW: Call p9_stop_api only if deep_states are enabled
All init time p9_stop_api calls have been isolated to slw_late_init. If
p9_stop_api fails, then the deep states can be excluded from the device tree.
For p9_stop_api calls made after the device tree for cpuidle is created,
has_deep_states is used to check whether the call is even required.
- Better handle errors in setting up sleep states (p9_stop_api)
We won't put affected stop states in the device tree if the wakeup
engine is not present or has failed.
- SCOM Restore: Increased the EQ SCOM restore limit.
This commit increases the SCOM restore limit from 16 to 31.
- hw/dts: retry special wakeup operation if core still gated
It has been observed that in some cases the special wakeup
operation can "succeed" but the core is still in a gated/offline
state.
Check for this state after attempting to wakeup a core and retry
the wakeup if necessary.
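The check-and-retry pattern described here can be sketched as follows. The two callables stand in for the hardware operations and are hypothetical, as are the function name and the retry limit of three. None of these come from the skiboot sources.

.. code-block:: python

    def wake_core(set_special_wakeup, core_is_gated, max_attempts=3):
        """Attempt a special wakeup, retrying while the core stays gated.

        set_special_wakeup -- callable, returns True if the operation "succeeded"
        core_is_gated      -- callable, returns True if the core is still gated
        """
        for _ in range(max_attempts):
            if not set_special_wakeup():
                return False  # the wakeup operation itself failed
            if not core_is_gated():
                return True   # core is really awake
            # Operation reported success but the core is still gated: retry.
        return False

The important point is that a successful return from the wakeup operation is not trusted on its own. The gated state is checked after every attempt.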
- core/direct-controls: add function to read core gated state
- core/direct-controls: wait for core special wkup bit cleared
When clearing the special wakeup bit on a core, wait until the
bit is actually cleared by the hardware in the status register
before returning success.
This may help avoid issues with back-to-back reads where the
special wakeup request is cleared but the firmware is still
processing the request and the next attempt to set the bit
reads an immediate success from the previous operation.
- p9_stop_api: PM: Added support for version control in SCOM restore entries.
- adds version info in SCOM restore entry header
- adds version specific details in SCOM restore entry header
- retains old behaviour of SGPE Hcode's base version
- p9_stop_api: EQ SCOM Restore: Introduced version control in SCOM restore entry.
- introduces version control in header of SCOM restore entry
- ensures backward compatibility
- introduces flexibility to handle any number of SCOM restore entries.
Secure and Trusted Boot for POWER9
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We introduce support for Secure and Trusted Boot on POWER9 systems, with
functionality equal to what we have on POWER8 systems; that is, we have the mechanisms in
place to boot to petitboot (i.e. to BOOTKERNEL).
See the :ref:`stb-overview` for full documentation of OPAL secure and trusted boot.
Since skiboot-5.10-rc2:
- stb: Put correct label (for skiboot) into container
Hostboot will expect the label field of the stb header to contain
"PAYLOAD" for skiboot or it will fail to load and run skiboot.
The failure looks something like this: ::
53.40896|ISTEP 20. 1 - host_load_payload
53.65840|secure|Secureboot Failure plid = 0x90000755, rc = 0x1E07
53.65881|System shutting down with error status 0x1E07
53.67547|================================================
53.67954|Error reported by secure (0x1E00) PLID 0x90000755
53.67560| Container's component ID does not match expected component ID
53.67561| ModuleId 0x09 SECUREBOOT::MOD_SECURE_VERIFY_COMPONENT
53.67845| ReasonCode 0x1e07 SECUREBOOT::RC_ROM_VERIFY
53.67998| UserData1 : 0x0000000000000000
53.67999| UserData2 : 0x0000000000000000
53.67999|------------------------------------------------
53.68000| Callout type : Procedure Callout
53.68000| Procedure : EPUB_PRC_HB_CODE
53.68001| Priority : SRCI_PRIORITY_HIGH
53.68001|------------------------------------------------
53.68002| Callout type : Procedure Callout
53.68003| Procedure : EPUB_PRC_FW_VERIFICATION_ERR
53.68003| Priority : SRCI_PRIORITY_HIGH
53.68004|------------------------------------------------
Since skiboot-5.10-rc1:
- stb: Enforce secure boot if called before libstb initialized
- stb: Correctly error out when no PCR for resource
- core/init: move imc catalog preload init after the STB init.
To be on the safe side, move the imc catalog preload after the STB init
to make sure the imc catalog resource gets verified and measured
properly during loading when both secure and trusted boot modes
are on.
- libstb: fix failure of calling trusted measure without STB initialization.
When we load a flash resource during OPAL init, STB calls trusted measure
to measure the given resource. If a flash resource gets loaded
before STB initialization, trusted measure cannot measure it properly.
This patch fixes the issue by calling trusted measure only if the
corresponding trusted init was done.
The ideal fix is to make sure STB init is done first during init
and only then load flash resources, so that STB can properly
verify and measure all resources.
- libstb: fix failure of calling cvc verify without STB initialization.
Currently, at various stages of OPAL init, we load various
PNOR partition containers from the flash device. When we load a flash
resource, STB calls the CVC verify and trusted measure (sha512) functions.
So when a flash resource gets loaded before STB initialization,
the cvc verify function fails to start the verification and enforce the boot.
Below is one example failure, where our VERSION partition gets
loaded early in the boot stage, before STB initialization is done.
This is with secure mode off.
STB: VERSION NOT VERIFIED, invalid param. buf=0x305ed930, len=4096 key-hash=0x0 hash-size=0
In the same code path, when secure mode is on, the boot process will abort.
This patch fixes the issue by calling cvc verify only if
STB init was done.
We also need a permanent fix in the init path to ensure STB init is
done first, before loading any other flash resources.
- libstb/tpm_chip: Add missing new line to print messages.
- libstb: increase the log level of verify/measure messages to PR_NOTICE.
Currently libstb logs the verify and hash calculation messages at
PR_INFO level. So when secure boot enforcement happens
while loading the last flash resource (e.g. BOOTKERNEL), the previous verify
and measure messages are not logged to the console, and it is not clear
to the end user which resources were verified and measured.
This patch fixes this by increasing the log level to PR_NOTICE.
Since skiboot-5.9:
- allow secure boot if not enforcing it
We check the secure boot containers no matter what, only *enforcing*
secure boot if we're booting in secure mode. This gives us an extra
layer of checking firmware is legit even when secure mode isn't enabled,
as well as being really useful for testing.
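The policy of always checking but only sometimes enforcing can be sketched as below. This is an illustration under stated assumptions, not libstb code: ``load_resource`` and its ``verify`` callback are hypothetical names, and raising an exception stands in for whatever action firmware takes when enforcement fails.

.. code-block:: python

    def load_resource(name, verify, secure_mode):
        """Verify a container unconditionally; enforce only in secure mode.

        verify      -- callable taking the resource name, True if the check passed
        secure_mode -- True if the system is booting in secure mode
        """
        if verify(name):
            return "verified"
        if secure_mode:
            # Enforcement: a failed check stops the boot in secure mode.
            raise RuntimeError(f"{name}: verification failed, boot aborted")
        # Not enforcing: record the failure and carry on.
        return "not verified"

Outside secure mode a failed check is still visible to the caller, which is what makes the extra checking useful for testing.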
- libstb/(create|print)-container: Sync with sb-signing-utils
The sb-signing-utils project has improved upon the skeleton
create-container tool that existed in skiboot, including
being able to (quite easily) create *signed* images.
This commit brings in that code (and makes it build in the
skiboot build environment) and updates our skiboot.*.stb
generating code to use the development keys. This means that by
default, the skiboot build process will let you build firmware that can
do a secure boot with *development* keys.
See :ref:`signing-firmware-code` for details on firmware signing.
We also update print-container, syncing it with the
upstream project.
Derived from github.com:open-power/sb-signing-utils.git
at v0.3-5-gcb111c03ad7f
(Some discussion ongoing on the changes, another sync will come shortly)
- doc: update libstb documentation with POWER9 changes.
See: :ref:`stb-overview`.
POWER9 changes reflected in the libstb:
- bumped ibm,secureboot node to v2
- added ibm,cvc node
- hash-algo superseded by hw-key-hash-size
- libstb/cvc: update memory-region to point to /reserved-memory
The Linux documentation, reserved-memory.txt, says that memory-region is
a phandle that refers to a child of /reserved-memory.
This updates /ibm,secureboot/ibm,cvc/memory-region to point to
/reserved-memory/secure-crypt-algo-code instead of
/ibm,hostboot/reserved-memory/secure-crypt-algo-code.
- libstb: add support for ibm,secureboot-v2
ibm,secureboot-v2 changes:
- The Container Verification Code is represented by the ibm,cvc node.
- Each ibm,cvc child describes a CVC service.
- hash-algo is superseded by hw-key-hash-size.
- hdata/tpmrel.c: add ibm,cvc device tree node
In P9, the Container Verification Code is stored in a hostboot reserved
memory and the list of provided CVC services is stored in the
TPMREL_IDATA_HASH_VERIF_OFFSETS idata array. Each CVC service has an
offset and version.
This adds the ibm,cvc device tree node and its documentation.
- hdata/tpmrel.c: add firmware event log info to the tpm node
This parses the firmware event log information from the
secureboot_tpm_info HDAT structure and adds it to the tpm device tree
node.
There can be multiple secureboot_tpm_info entries, with each entry
corresponding to a master processor that has a tpm device; however,
multiple TPMs are not supported.
- hdata/spira: add ibm,secureboot node in P9
In P9, skiboot builds the device tree from the HDAT. These are the
"ibm,secureboot" node changes compared to P8:
- The Container-Verification-Code (CVC), a.k.a. ROM code, is no longer
stored in a secure ROM with static address. In P9, it is stored in a
hostboot reserved memory and each service provided also has a version,
not only an offset.
- The hash-algo property is not provided via HDAT, instead it provides
the hw-key-hash-size, which is indeed the information required by the
CVC to verify containers.
This parses the iplparams_sysparams HDAT structure and creates the
"ibm,secureboot", which is bumped to "ibm,secureboot-v2".
In "ibm,secureboot-v2":
- hash-algo property is superseded by hw-key-hash-size.
- container verification code is explicitly described by a child node.
Added in a subsequent patch.
See :ref:`device-tree/ibm,secureboot` for documentation.
- libstb/tpm_chip.c: define pr_fmt and fix messages logged
This defines pr_fmt and also fixes the messages logged:
- EV_SEPARATOR instead of 0xFFFFFFFF
- when an event is measured it also prints the tpm id, event type and
event log length
Now we can filter the messages logged by libstb and its
sub-modules by running: ::
grep STB /sys/firmware/opal/msglog
- libstb/tss: update the list of event types supported
Skiboot, or more precisely the tpmLogMgr, initializes the firmware event log by
calculating its length so that a new event can be recorded without
exceeding the log size. In order to calculate the size, it walks through
the log until it finds a specific event type. However, if the log has
an unknown event type, the tpmLogMgr will not be able to reach the end
of the log.
This updates the list of event types with all of those supported by
hostboot. Thus, skiboot can properly calculate the event log length.
- tpm_i2c_nuvoton: add nuvoton,npct601 to the compatible property
The Linux kernel doesn't have a driver compatible with
"nuvoton,npct650", but it does have one for "nuvoton,npct601", which should
also be compatible with npct650.
This adds "nuvoton,npct601" to the compatible devtree property.
- libstb/trustedboot.c: import stb_final() from stb.c
The primary goal of stb_final() is to measure the event EV_SEPARATOR
into PCR[0-7] when trusted boot is about to exit the boot services.
This imports stb_final() from stb.c into trustedboot.c, with
the following changes:
- Rename it to trustedboot_exit_boot_services().
- As specified in the TCG PC Client specification, EV_SEPARATOR events must
be logged with the name 0xFFFFFF.
- Remove the ROM driver clean-up call.
- Don't allow code to be measured in skiboot after
trustedboot_exit_boot_services() is called.
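The behaviour listed above (a separator event in each of PCR 0 to 7, then no further measurements) can be modelled with a toy event log. This is a sketch only: the class, its methods and the ``"separator"`` placeholder name are hypothetical, and a plain Python list stands in for the TPM and its event log.

.. code-block:: python

    class TrustedBootModel:
        """Toy stand-in for the TPM event log used during boot."""

        def __init__(self):
            self.events = []   # (pcr, event_type, name) tuples
            self.exited = False

        def measure(self, pcr, event_type, name):
            if self.exited:
                raise RuntimeError("boot services exited: measurement refused")
            self.events.append((pcr, event_type, name))

        def exit_boot_services(self):
            # Record an EV_SEPARATOR event in each of PCR 0..7 ...
            for pcr in range(8):
                self.measure(pcr, "EV_SEPARATOR", "separator")  # placeholder name
            # ... then refuse any later measurement.
            self.exited = True

After ``exit_boot_services()`` the log holds eight separator events, and any further ``measure()`` call fails.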
- libstb/cvc.c: import softrom behaviour from drivers/sw_driver.c
Softrom is used only for testing with mambo. By setting
compatible="ibm,secureboot-v1-softrom" in the "ibm,secureboot" node,
firmware images can be properly measured even if the
Container-Verification-Code (CVC) is not available. In this case, the
mbedtls_sha512() function is used to calculate the sha512 hash of the
firmware images.
This imports the softrom behaviour from libstb/drivers/sw_driver.c code
into cvc.c, but now softrom is implemented as a flag. When the flag is
set, the wrappers for the CVC services work the same way as in
sw_driver.c.
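As an illustration of softrom as a flag, the sketch below selects between a software SHA-512 and a CVC-provided service. It is not the skiboot implementation: Python's ``hashlib`` stands in for mbedtls_sha512(), and the wrapper name ``cvc_sha512`` and its ``cvc_service`` parameter are assumptions made for the example.

.. code-block:: python

    import hashlib


    def cvc_sha512(image, softrom, cvc_service=None):
        """Return the SHA-512 digest of a firmware image.

        softrom     -- when True, hash in software instead of calling the CVC
        cvc_service -- hypothetical callable wrapping the CVC sha512 service
        """
        if softrom:
            # Software fallback, standing in for mbedtls_sha512().
            return hashlib.sha512(image).digest()
        if cvc_service is None:
            raise RuntimeError("CVC sha512 service not available")
        return cvc_service(image)

Callers use the same wrapper in both cases, so the rest of the measurement path does not need to know whether the CVC is present.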
- libstb/trustedboot.c: import tb_measure() from stb.c
This imports tb_measure() from stb.c, but now it calls the CVC sha512
wrapper to calculate the sha512 hash of the firmware image provided.
In trustedboot.c, the tb_measure() is renamed to trustedboot_measure().
The new function, trustedboot_measure(), no longer checks if the
container payload hash calculated at boot time matches with the hash
found in the container header. A few reasons:
- If the system admin wants the container header to be
checked/validated, the secure boot jumper must be set. Otherwise,
the container header information may not be reliable.
- The container layout is expected to change over time. Skiboot
would need to maintain a parser for each container layout
change.
- Skiboot could be checking the hash against a container version that
is not supported by the Container-Verification-Code (CVC).
The tb_measure() calls are updated to trustedboot_measure() in a
subsequent patch.
- libstb/secureboot.c: import sb_verify() from stb.c
This imports the sb_verify() function from stb.c, but now it calls the
CVC verify wrapper in order to verify signed firmware images. The
hw-key-hash and hw-key-hash-size initialized in secureboot.c are passed
to the CVC verify function wrapper.
In secureboot.c, the sb_verify() is renamed to secureboot_verify(). The
sb_verify() calls are updated in a subsequent patch.
XIVE
----
- xive: Don't bother cleaning up disabled EQs in reset
Additionally, warn if we find an enabled one that isn't one
of the firmware built-in queues.
- xive: Warn on valid VPs found in abnormal cases
If an allocated VP is left valid at xive_reset() or Linux tries
to free a valid (enabled) VP block, print errors. The former happens
occasionally if kdump'ing while KVM is running so keep it as a debug
message. The latter is a programming error in Linux so use an
error log level.
- xive: Properly reserve built-in VPs in non-group mode
This is not normally used but if the #define is changed to
disable block group mode we would incorrectly clear the
buddy completely without marking the built-in VPs reserved.
- xive: Quieten debug messages in standard builds
This makes a bunch of messages, especially the per-CPU ones,
only enabled in debug builds. This avoids clogging up the
OPAL logs with XIVE related messages that have not proven
particularly useful for field defects.
- xive: Implement "single escalation" feature
This adds a new VP flag to control the new DD2.0
"single escalation" feature.
This feature allows us to have a single escalation
interrupt per VP instead of one per queue.
It works by hijacking queue 7 (which is thus no longer
usable when that is enabled) and exploiting two new
hardware bits that will:
- Make the normal queues (0..6) escalate unconditionally
thus ignoring the ESe bits.
- Route the above escalations to queue 7
- Have queue 7 silently escalate without notification
Thus the escalation of queue 7 becomes the one escalation
interrupt for all the other queues.
- xive: When disabling a VP, wipe all of its settings
- xive: Improve cleaning up of EQs
Factors out the function that sets an EQ back to a clean
state and adds a cleaning pass for queues left enabled
when freeing a block of VPs.
- xive: When disabling an EQ, wipe all of its settings
This avoids having configuration bits left over.
- xive: Define API for single-escalation VP mode
This mode allows all queues of a VP to use the same
escalation interrupt, at the cost of losing priority 7.
This adds the definition and documentation of the API,
the implementation will come next.
- xive: Fix ability to clear some EQ flags
We could never clear "unconditional notify" and "escalate".
- xive: Update inits for DD2.0
This updates some inits based on information from the HW
designers. This includes enabling some new DD2.0 features
that we don't yet exploit.
- xive: Ensure VC informational FIRs are masked
Some HostBoot versions leave those as checkstop. They are harmless
and can sometimes occur during normal operations.
- xive: Fix occasional VC checkstops in xive_reset
The current workaround for the scrub bug described in
__xive_cache_scrub() has an issue in that it can leave
dirty invalid entries in the cache.
When cleaning up EQs or VPs during reset, if we then
remove the underlying indirect page for these entries,
the XIVE will checkstop when trying to flush them out
of the cache.
This replaces the existing workaround with a new pair of
workarounds for VPs and EQs:
- The VP one does the dummy watch on another entry than
the one we scrubbed (which does the job of pushing old
stores out) using an entry that is known to be backed by
a permanent indirect page.
- The EQ one switches to a more efficient workaround
which consists of doing a non-side-effect ESB load from
the EQ's ESe control bits.
- xive: Do not return a trigger page for an escalation interrupt
This is bogus, we don't support them. (Thankfully the callers
didn't actually try to use this on escalation interrupts).
- xive: Mark a freed IRQ's IVE as valid and masked
Removing the valid bit means a FIR will trip if it's accessed
inadvertently. Under some circumstances, the XIVE will speculatively
access an IVE for a masked interrupt and trip it. So make sure that
freed entries are still marked valid (but masked).
PCI
---
Since skiboot-5.10-rc3:
- phb3/phb4/p7ioc: Document supported TCE sizes in DT
Add a new property, "ibm,supported-tce-sizes", to advertise to Linux how
big the available TCE sizes are. Each value is a bit shift, from
smallest to largest.
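As a rough illustration of how a consumer might interpret such a property, the sketch below turns each shift into a byte count and picks the largest supported size under a limit. The function names are hypothetical and are not taken from skiboot or Linux; it only assumes what the text states, namely that each entry is a bit shift and that entries are ordered from smallest to largest.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one property entry (a bit shift) into a page size in bytes.
 * Returns 0 for shifts that do not fit in 64 bits. */
static uint64_t tce_page_size_bytes(uint32_t shift)
{
	if (shift >= 64)
		return 0;
	return 1ULL << shift;
}

/* Return the largest listed TCE page size that does not exceed 'limit',
 * or 0 if none fits. Relies on the entries being sorted ascending. */
static uint64_t largest_tce_size_up_to(const uint32_t *shifts, size_t n,
				       uint64_t limit)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t size = tce_page_size_bytes(shifts[i]);

		if (size > limit)
			break;
		best = size;
	}
	return best;
}
```

For example, a shift of 12 corresponds to a 4KB TCE page and a shift of 16 to a 64KB page.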
- phb4: Fix TCE page size
The page sizes for TCEs on P9 were inaccurate and just copied from PHB3,
so correct them.
- Revert "pci: Shared slot state synchronisation for hot reset"
An issue was found in shared slot reset where the system can be stuck in
an infinite loop. Pull the code out until there's a proper fix.
This reverts commit 1172a6c57ff3c66f6361e572a1790cbcc0e5ff37.
- hdata/iohub: Use only wildcard slots for pluggables
We don't want to cause a VID:DID check against pluggable devices, as
they may use multiple devids.
Narrow the condition under which VID:DID is listed in the dt, so that
we'll end up creating a wildcard slot for these instead.
Since skiboot-5.9:
- pci: Shared slot state synchronisation for hot reset
When a device is shared between two PHBs, it doesn't get reset properly
unless both PHBs issue a hot reset at "the same time". Practically this
means a hot reset needs to be issued on both sides, and neither should
bring the link up until the reset on both has completed.
- pci: Track peers of slots
Witherspoon introduced a new concept where one physical slot is shared
between two PHBs. Making a slot aware of its peer enables syncing
between them where necessary.
PHB4
----
Since skiboot-5.10-rc4:
- phb4: Disable lane eq when retrying some nvidia GEN3 devices
This fixes these nvidia cards training at only GEN2 speeds rather than
GEN3 by disabling PCIe lane equalisation.
Firstly we check if the card is in a whitelist. If it is and the link
has not trained optimally, retry with lane equalisation off. We do
this on all POWER9 chip revisions since this is a device issue, not
a POWER9 chip issue.
Since skiboot-5.10-rc2:
- phb4: Only escalate freezes on MMIO load where necessary
In order to work around a hardware issue, MMIO load freezes were
escalated to fences on every chip. Now that hardware no longer requires
this, restrict escalation to the chips that actually need it.
Since skiboot-5.9:
- phb4: Change PCI MMIO timers
Currently we have a mismatch between the NCU and PCI timers for MMIO
accesses. The PCI timers must be lower than the NCU timers otherwise
it may cause checkstops.
This changes PCI timeouts controlled by skiboot to 33-50ms. It should
be forwards and backwards compatible with expected hostboot changes to
the NCU timer.
- phb4: Change default GEN3 lane equalisation setting to 0x54
Currently our GEN3 lane equalisation settings are set to 0x77. Change
this to 0x54. This change will allow us to train at GEN3 in a shorter
time and more consistently.
This setting gives us a TX preset 0x4 and RX hint 0x5. This gives a
boost in gain for high frequency signalling. It allows the most optimal
continuous time linear equalizers (CTLE) for the remote receiver port
and de-emphasis and pre-shoot for the remote transmitter port.
Machine Readable Workbooks (MRW) are moving to this new value also.
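To make the relationship between 0x54 and its two fields concrete, here is a minimal decoding sketch. It assumes the split implied by the text (TX preset in the low nibble, RX hint in the high nibble of each per-lane byte); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of one per-lane equalisation byte. */
struct lane_eq_sketch {
	uint8_t tx_preset;	/* low nibble */
	uint8_t rx_hint;	/* high nibble */
};

/* Split a per-lane setting such as 0x54 into its two fields. */
static struct lane_eq_sketch decode_lane_eq(uint8_t val)
{
	struct lane_eq_sketch eq = {
		.tx_preset = val & 0xf,
		.rx_hint = (val >> 4) & 0xf,
	};
	return eq;
}
```

Under this reading, the old value 0x77 decodes to preset 0x7 and hint 0x7, and the new value 0x54 to preset 0x4 and hint 0x5.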
- phb4: Init changes
These are init changes for phb4 from the HW team.
Link down events are now endpoint recoverable (ERC) rather than PHB
fatal errors.
BLIF Completion Timeout Errors now generate an interrupt rather than
causing freeze events.
- phb4: Fix lane equalisation setting
Fix cut and paste from phb3. The sizes have changed now that we have
GEN4, so the check here needs to change also.
Without this we end up with the default settings (all '7') rather
than what's in HDAT.
- hdata: Fix copying GEN4 lane equalisation settings
These aren't copied currently but should be.
- phb4: Fix PE mapping of M32 BAR
The M32 BAR is the PHB4 region used to map all the non-prefetchable
or 32-bit device BARs. It's supposed to have its segments remapped
via the MDT and Linux relies on that to assign them individual PE#.
However, we weren't configuring that properly and instead used the
mode where PE# == segment#, thus causing EEH to freeze the wrong
device or PE#.
- phb4: Fix lost bit in PE number on config accesses
A PE number can be up to 9 bits, so a uint8_t cannot hold it.
That was causing errors on config accesses to freeze the
wrong PE.
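The truncation can be illustrated with a small sketch. The helper names are hypothetical; the point is only that storing a 9-bit PE number in a uint8_t silently drops the top bit, so a different PE ends up being targeted.

```c
#include <stdint.h>

/* Buggy pattern: an 8-bit variable loses bit 8 of the PE number. */
static uint16_t pe_stored_in_u8(uint16_t pe)
{
	uint8_t narrow = (uint8_t)pe;

	return narrow;
}

/* Fixed pattern: a 16-bit variable keeps all 9 bits. */
static uint16_t pe_stored_in_u16(uint16_t pe)
{
	uint16_t wide = pe & 0x1ff;

	return wide;
}
```

With PE number 0x1a5, the narrow version yields 0xa5 while the wide version preserves 0x1a5.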
- phb4: Update inits
New init value from HW folks for the fence enable register.
This clears bit 17 (CFG Write Error CA or UR response) and bit 22 (MMIO Write
DAT_ERR Indication) and sets bit 21 (MMIO CFG Pending Error).
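A sketch of the bit manipulation described above, assuming a 64-bit register and IBM bit numbering (bit 0 is the most significant bit). The macro and function names are hypothetical stand-ins, and no real register address or reset value is implied.

```c
#include <stdint.h>

/* IBM bit numbering for a 64-bit value: bit 0 is the MSB. */
#define IBM_BIT64(n)	(0x8000000000000000ULL >> (n))

/* Clear bits 17 and 22 and set bit 21, leaving other bits untouched. */
static uint64_t update_fence_enable(uint64_t old)
{
	uint64_t val = old;

	val &= ~(IBM_BIT64(17) | IBM_BIT64(22));
	val |= IBM_BIT64(21);
	return val;
}
```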
CAPI
----
Since skiboot-5.10-rc2:
- capi: Enable channel tag streaming for PHB in CAPP mode
We re-enable channel tag streaming for PHB in CAPP mode as without it
PEC was waiting for cresp for each DMA write command before sending a
new DMA write command on the Powerbus. This resulted in much lower DMA
write performance than expected.
The patch updates enable_capi_mode() to remove the masking of the
channel_streaming_en bit in the PBCQ Hardware Configuration Register. It
also refactors the code that updates this register to use
xscom_write_mask instead of xscom_read followed by xscom_write.
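The idea behind a masked write can be sketched against a plain variable standing in for the hardware register. All names below (fake_reg, fake_read, fake_write, fake_write_mask) are hypothetical and only model the read-modify-write pattern; they are not the skiboot xscom API.

```c
#include <stdint.h>

/* Stand-in for a hardware register. */
static uint64_t fake_reg;

static int fake_read(uint64_t *val)
{
	*val = fake_reg;
	return 0;
}

static int fake_write(uint64_t val)
{
	fake_reg = val;
	return 0;
}

/* Update only the bits selected by 'mask', taking them from 'val'.
 * Callers no longer need to open-code the read/modify/write steps. */
static int fake_write_mask(uint64_t val, uint64_t mask)
{
	uint64_t cur;
	int rc = fake_read(&cur);

	if (rc)
		return rc;
	cur &= ~mask;
	cur |= val & mask;
	return fake_write(cur);
}
```

Folding the sequence into one helper keeps the error handling in a single place and makes it harder to clobber bits outside the mask.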
Since skiboot-5.10-rc1:
- capi: Fix the max tlbi divider and the directory size.
Switch to 512KB mode (directory size) as we don’t use bit 48 of the tag
in addressing the array. This mode is controlled by the Snoop CAPI
Configuration Register.
Set the maximum number of data polls received before signaling that the
TLBI hang detect timer has expired. The value of '0000' is equal to 16.
Since skiboot-5.9:
- capi: Disable CAPP virtual machines
When exercising more than one CAPI accelerator simultaneously in
cache coherency mode, the verification team is seeing a deadlock. To
fix this a workaround of disabling CAPP virtual machines is
suggested. These 'virtual machines' let PSL queue multiple CAPP
commands for servicing by CAPP, thereby increasing
throughput. Below is the error scenario described by the h/w team:
" With virtual machines enabled we had a deadlock scenario where with 2
or more CAPI's in a system you could get in a deadlock scenario due to
cast-outs that are required break the deadlock (evict lines that
another CAPI is requesting) get stuck in the virtual machine queue by
a command ahead of it that is being retried by the same scenario in
the other CAPI. "
- capi: Perform capp recovery sequence only when PBCQ is idle
Presently during a CRESET the CAPP recovery sequence can be executed
multiple times in case PBCQ on the PEC is still busy processing in/out
bound in-flight transactions.
- xive: Mask MMIO load/store to bad location FIR
For opencapi, the trigger page of an interrupt is mapped to user
space. The intent is to write the page to raise an interrupt but
there's nothing to prevent a user process from reading it, which has
the unfortunate consequence of checkstopping the system.
Mask the FIR bit raised when an MMIO operation targets an invalid
location. It's the recommendation from recent documentation and
hostboot is expected to mask it at some point. In the meantime, let's
play it safe.
- phb4: Dump CAPP error registers when it asserts link down
This patch introduces a new function phb4_dump_app_err_regs() that
dumps CAPP error registers in case the PEC nestfir register indicates
that the fence was due to a CAPP error (BIT-24).
Contents of these registers are helpful in diagnosing CAPP
issues. Registers that are dumped in phb4_dump_app_err_regs() are:
* CAPP FIR Register
* CAPP APC Master Error Report Register
* CAPP Snoop Error Report Register
* CAPP Transport Error Report Register
* CAPP TLBI Error Report Register
* CAPP Error Status and Control Register
- capi: move the acknowledge of the HMI interrupt
We need to acknowledge an eventual HMI initiated by the previous forced
fence on the PHB to work around a non-existent PE in the phb4_creset()
function.
For this reason do_capp_recovery_scoms() is now called at the
beginning of the step PHB4_SLOT_CRESET_WAIT_CQ.
- capi: update ci store buffers and dma engines
The number of read (APC type traffic) and mmio store (MSG type traffic)
resources assigned to the CAPP is controlled by the CAPP control
register.
Depending on the type of CAPI cards present on the server, we have to
configure the CAPP messages and the DMA read engines given to the CAPP
differently.
HMI
---
- core/hmi: Display chip location code while displaying core FIR.
- core/hmi: Do not display FIR details if none of the bits are set.
So that we don't flood OPAL console logs with information that is not
useful.
- opal/hmi: HMI logging with location code info.
Add a few HMI debug prints with location code info and a few additional details.
No functionality change.
With this patch the log messages will look like: ::
[210612.175196744,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
[210612.175200449,7] HMI: [Loc: UOPWR.1302LFA-Node0-Proc1]: P:8 C:16 T:1: TFMR(2d12000870e04020) Timer Facility Error
[210660.259689526,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000
[210660.259695649,7] HMI: [Loc: UOPWR.1302LFA-Node0-Proc0]: P:0 C:16 T:1: Processor recovery Done.
- core/hmi: Use pr_fmt macro for tagging log messages
No functionality changes.
- opal: Get chip location code
and store it under proc_chip for quick reference during HMI handling
code.
Sensors
-------
- occ-sensors: Fix up quad/gpu location mix-up
The GPU and QUAD sensor location types are swapped compared to what
exists in the OCC code base which is authoritative. Fix them up.
- sensors: occ: Skip counter type of sensors
Don't add counter type of sensors to device-tree as they don't
fit into hwmon sensor interface.
- sensors: dts: Assert special wakeup on idle cores while reading temperature
In P9, when a core enters a stop state, its clocks will be stopped
to save power and hence we will not be able to perform a SCOM
operation to read the DTS temperature sensor. Hence, assert
a special wakeup on cores that have entered a stop state in order to
successfully complete the SCOM operation.
- sensors: occ: Skip power sensors with zero sample value
APSS is not available on platforms like Zaius, Romulus where OCC
can only measure Vdd (core) and Vdn (nest) power from the AVSbus
reading. So all the sensors for APSS channels will be populated
with 0. Different component power sensors like system, memory
which point to the APSS channels will also be 0.
As per the OCC team (Martha Broyles), a zeroed power sensor means that the
system doesn't have it. So this patch filters out these sensors.
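The filtering rule can be sketched as follows. The struct and function names are hypothetical and the sensor layout is simplified; the sketch only shows the stated policy that a power sensor whose sample is zero is treated as absent.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of a power sensor entry. */
struct power_sensor_sketch {
	const char *name;
	uint16_t sample;
};

/* Copy pointers to sensors with a non-zero sample into 'out' and
 * return how many were kept. 'out' must have room for n entries. */
static size_t keep_nonzero_sensors(const struct power_sensor_sketch *in,
				   size_t n,
				   const struct power_sensor_sketch **out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (in[i].sample != 0)
			out[kept++] = &in[i];
	return kept;
}
```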
- sensors: occ: Skip GPU sensors for non-gpu systems
- sensors: Fix dtc warning for new occ in-band sensors.
dtc complains about a missing reg property when a DT node has a
unit name or address but no reg property. ::
/ibm,opal/sensors/vrm-in@c00004 has a unit name, but no reg property
/ibm,opal/sensors/gpu-in@c0001f has a unit name, but no reg property
/ibm,opal/sensor-groups/occ-js@1c00040 has a unit name, but no reg property
This patch fixes these warnings for new occ in-band sensors and also for
sensor-groups by adding necessary properties.
- sensors: Fix dtc warning for dts sensors.
dtc complains about a missing reg property when a DT node has a
unit name or address but no reg property.
Example warning for core dts sensor: ::
/ibm,opal/sensors/core-temp@5c has a unit name, but no reg property
/ibm,opal/sensors/core-temp@804 has a unit name, but no reg property
This patch fixes this by adding necessary properties.
- hw/occ: Fix psr cpu-to-gpu sensors node dtc warning.
dtc complains about a missing reg property when a DT node has a
unit name or address but no reg property. ::
/ibm,opal/power-mgt/psr/cpu-to-gpu@0 has a unit name, but no reg property
/ibm,opal/power-mgt/psr/cpu-to-gpu@100 has a unit name, but no reg property
This patch fixes this by adding necessary properties.
General fixes
-------------
Since skiboot-5.10-rc3:
- core: Fix mismatched names between reserved memory nodes & properties
OPAL exposes reserved memory regions through the device tree in both new
(nodes) and old (properties) formats.
However, the names used for these don't match - we use a generated cell
address for the nodes, but the plain region name for the properties.
This fixes a warning from FWTS.
Since skiboot-5.10-rc2:
- vas: Disable VAS/NX-842 on some P9 revisions
VAS/NX-842 are not functional on some P9 revisions, so disable them
in hardware and skip creating their device tree nodes.
Since the intent is to prevent OS from configuring VAS/NX, we remove
only the platform device nodes but leave the VAS/NX DT nodes under
xscom (i.e. we don't skip add_vas_node() in hdata/spira.c).
- core/device.c: Fix dt_find_compatible_node
dt_find_compatible_node() and dt_find_compatible_node_on_chip() are used to
find device nodes under a parent/root node with a given compatible
property.
dt_next(root, prev) is used to walk the child nodes of the given parent and
takes two arguments - root contains the parent node to walk whilst prev
contains the previous child to search from so that it can be used as an
iterator over all children nodes.
The first iteration of dt_find_compatible_node(root, prev) calls
dt_next(root, root) which is not a well defined operation as prev is
assumed to be a child of the root node. The result is that when a node
contains no children it will start returning the parent node's siblings
until it hits the top of the tree, at which point a NULL dereference is
attempted when looking for the root node's parent.
Dereferencing NULL can result in undesirable data exceptions during system
boot and untimely non-hilarious system crashes. dt_next() should not be
called with prev == root. Instead we add a check to dt_next() such that
passing prev = NULL will cause it to start iterating from the first child
node (if any).
This manifested itself in a crash on boot on ZZ systems.
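The iteration contract described above can be sketched with a simplified tree. This is not skiboot's dt_next() or its list-based node structure; struct node_sketch and next_node() are hypothetical, and the sketch only shows how treating prev == NULL as "start from the first child" gives the walk a well-defined starting point.

```c
#include <stddef.h>

/* Simplified tree node; skiboot's struct dt_node uses list heads instead. */
struct node_sketch {
	struct node_sketch *parent;
	struct node_sketch *first_child;
	struct node_sketch *next_sibling;
};

/* Pre-order successor of 'prev' within the subtree below 'root'.
 * Passing prev == NULL starts the walk at root's first child, so a
 * root without children simply yields NULL. */
static struct node_sketch *next_node(const struct node_sketch *root,
				     const struct node_sketch *prev)
{
	if (!prev)
		return root->first_child;

	/* Descend first. */
	if (prev->first_child)
		return prev->first_child;

	/* Otherwise climb until a sibling is found, never past root. */
	while (prev && prev != root) {
		if (prev->next_sibling)
			return prev->next_sibling;
		prev = prev->parent;
	}
	return NULL;
}
```

A caller would loop with `for (n = next_node(root, NULL); n; n = next_node(root, n))`, never passing the root itself as prev.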
- hw/occ: Fix fast-reboot crash in P8 platforms.
commit 85a1de35cbe4 ("fast-boot: occ: Re-parse the pstate table during fast-boot" )
breaks fast-reboot on P8 platforms while re-initing the OCC pstates. On P8
platforms OPAL adds two additional properties, #address-cells and #size-cells,
under the ibm,opal/power-mgt/ DT node. During fast-reboot, adding the same
properties back to the same node results in duplicate properties, and hence
fast-reboot fails with the traces below. ::
[ 541.410373292,5] OCC: All Chip Rdy after 0 ms
[ 541.410488745,3] Duplicate property "#address-cells" in node /ibm,opal/power-mgt
[ 541.410694290,0] Aborting!
CPU 0058 Backtrace:
S: 0000000031d639d0 R: 000000003001367c .backtrace+0x48
S: 0000000031d63a60 R: 000000003001a03c ._abort+0x4c
S: 0000000031d63ae0 R: 00000000300267d8 .new_property+0xd8
S: 0000000031d63b70 R: 0000000030026a28 .__dt_add_property_cells+0x30
S: 0000000031d63c10 R: 000000003003ea3c .occ_pstates_init+0x984
S: 0000000031d63d90 R: 00000000300142d8 .load_and_boot_kernel+0x86c
S: 0000000031d63e70 R: 000000003002586c .fast_reboot_entry+0x358
S: 0000000031d63f00 R: 00000000300029f4 fast_reset_entry+0x2c
This patch fixes this issue by removing these two properties on P8 while doing
OCC pstates re-init in fast-reboot code path.
Since skiboot-5.10-rc1:
- fast-reboot: occ: Re-parse the pstate table during fast-reboot
OCC shares the frequency list to host by copying the pstate table to
main memory in HOMER. This table is parsed during boot to create
device-tree properties for frequency and pstate IDs. OCC can update
the pstate table to present a new set of frequencies to the host. But
host will remain oblivious to these changes unless it is re-inited
with the updated device-tree CPU frequency properties. So this patch
allows the pstate table to be re-parsed and the device-tree
properties to be updated during fast-reboot.
OCC updates the pstate table when asked to do so using the pstate-table
bias command. This is mainly used by the WOF team for
characterization purposes.
- fast-reboot: move pci_reset error handling into fast-reboot code
pci_reset() currently does a platform reboot if it fails. It
should not know about fast-reboot at this level, so instead have
it return an error, and the fast reboot caller will do the
platform reboot.
The code essentially does the same thing, but flexibility is
improved. Ideally the fast reboot code should perform pci_reset
and all such fail-able operations before the CPU resets itself
and destroys its own stack. That's not the case now, but that
should be the goal.
Since skiboot-5.9:
- lpc: Clear pending IRQs at boot
When we come in from hostboot the LPC master has the bus reset indicator
set. This error isn't handled until the host kernel unmasks interrupts,
at which point we get the following spurious error: ::
[ 20.053560375,3] LPC: Got LPC reset on chip 0x0 !
[ 20.053564560,3] LPC[000]: Unknown LPC error Error address reg: 0x00000000
Fix this by clearing the various error bits in the LPC status register
before we initialise the skiboot LPC bus driver.
- hw/imc: Check ucode state before exposing units to Linux
disable_unavailable_units() checks whether the ucode
is in the running state before enabling the nest units
in the device tree. A recent debug session found that
on some system boots the ucode is not loaded and
running on all the chips in the system. This caused
the OPAL_IMC_COUNTERS_STOP call, which checks the
ucode state on each chip, to fail. The bug is that
disable_unavailable_units() checks the state of the
ucode only on the boot CPU's chip. This patch adds a
condition in disable_unavailable_units() to check
the ucode state on all chips before enabling
the nest units in the device tree node.
- hdata/vpd: Add vendor property
The ibm,vpd blob contains a VN field. Use that to populate the vendor
property for various FRUs.
- hdata/vpd: Fix DTC warnings
All the nodes under the vpd hierarchy have a unit address (their SLCA
index) but no reg properties. Add them and their size/address cells
to squash the warnings.
- HDAT/i2c: Fix SPD EEPROM compatible string
Hostboot doesn't give us accurate information about the DIMM SPD
devices. Hack around by assuming any EEPROM we find on the SPD I2C
master is an SPD EEPROM.
- hdata/i2c: Fix 512Kb EEPROM size
There's no such thing as a 412Kb EEPROM.
- libflash/mbox-flash: fall back to requesting lower MBOX versions from BMC
Some BMC mbox implementations seem to sometimes mysteriously fail when trying
to negotiate v3 when they only support v2. To work around this, we
can fall back to requesting lower mbox protocol versions until we find
one that works.
In theory, this should already "just work", but we have a counter example,
which this patch fixes.
- IPMI: Fix platform.cec_reboot() null ptr checks
Kudos to Hugo Landau who reported this in:
https://github.com/open-power/skiboot/issues/142
- hdata: Add location code property to xscom node
This patch adds chip location code property to xscom node.
- p8-i2c: Limit number of retry attempts
Currently we will attempt to start an I2C transaction until it succeeds.
In the event that the OCC does not release the lock on an I2C bus this
results in an async token being held forever and the kernel thread that
started the transaction will block forever while waiting for an async
completion message. Fix this by limiting the number of attempts to
start the transaction.
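A bounded retry loop of the kind described might look like the sketch below. The attempt limit, the callback type and the stub bus are all hypothetical and chosen only for illustration; the actual limit and mechanism used by the p8-i2c driver are not stated in this note.

```c
#include <stdbool.h>

/* Hypothetical cap on start attempts. */
#define MAX_START_ATTEMPTS	5

typedef bool (*start_fn)(void *ctx);

/* Try to start a transaction at most MAX_START_ATTEMPTS times.
 * Returns 0 on success or -1 once the limit is reached, so the caller
 * can complete the request with an error instead of waiting forever. */
static int start_with_retry_limit(start_fn try_start, void *ctx,
				  int *attempts_made)
{
	for (int i = 1; i <= MAX_START_ATTEMPTS; i++) {
		if (try_start(ctx)) {
			*attempts_made = i;
			return 0;
		}
	}
	*attempts_made = MAX_START_ATTEMPTS;
	return -1;
}

/* Stub bus for demonstration: stays busy until the counter in ctx
 * reaches zero. */
static bool stub_try_start(void *ctx)
{
	int *busy_for = ctx;

	if (*busy_for > 0) {
		(*busy_for)--;
		return false;
	}
	return true;
}
```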
- p8-i2c: Don't write the watermark register at init
On P9 the I2C master is shared with the OCC. Currently the watermark
values are set once at init time which is bad for two reasons:
a) We don't take the OCC master lock before setting it, which
may cause issues if the OCC is currently using the master.
b) The OCC might change the watermark levels and we need to reset
them.
Change this so that we set the watermark value when a new transaction
is started rather than at init time.
- hdata: Rename 'fsp-ipl-side' as 'sp-ipl-side'
OPAL builds the device tree for both FSP and BMC systems. Also, nobody
appears to be using this property today, so renaming should be fine.
- hdata/vpd: add support for parsing CPU VRML records
Allows skiboot to parse out the processor part/serial numbers
on OpenPOWER P9 machines.
- core/lock: Introduce atomic cmpxchg and implement try_lock with it
cmpxchg will be used in a subsequent change, and this reduces the
amount of asm code.
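As an illustration of building try_lock on a compare-and-exchange primitive, here is a sketch using C11 atomics rather than the PowerPC assembly skiboot uses. The names are hypothetical, and the lock word layout (owner CPU in the upper 32 bits, lock bit in bit 0) is an assumption made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct lock_sketch {
	_Atomic uint64_t val;	/* 0 means unlocked */
};

/* Compare-and-exchange that returns the value previously in memory.
 * The store happens only if that value was equal to 'old'. */
static uint64_t cmpxchg64_sketch(_Atomic uint64_t *mem, uint64_t old,
				 uint64_t new_val)
{
	uint64_t expected = old;

	atomic_compare_exchange_strong(mem, &expected, new_val);
	return expected;
}

/* Take the lock only if it is currently free; never spins. */
static bool try_lock_sketch(struct lock_sketch *l, uint32_t cpu)
{
	uint64_t want = ((uint64_t)cpu << 32) | 1;

	return cmpxchg64_sketch(&l->val, 0, want) == 0;
}

static void unlock_sketch(struct lock_sketch *l)
{
	atomic_store(&l->val, 0);
}
```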
- direct-controls: add xscom error handling for p8
Add xscom checks which will print something useful and return error
back to callers (which already have error handling plumbed in).
- direct-controls: p8 implementation of generic direct controls
This reworks the sreset functionality that was brought over from
fast-reboot, and fits it under the generic direct controls APIs.
The fast reboot APIs are implemented using generic direct controls,
which also makes them available on p9.
- fast-reboot: allow mambo fast reboot independent of CPU type
Don't tie mambo fast reboot to POWER8 CPU type.
- fast-reboot: remove delay after sreset
There is a 100ms delay when targets reach sreset which does not appear
to have a good purpose. Remove it and therefore reduce the sreset timeout
by the same amount.
- fast-reboot: add more barriers around cpu state changes
This is a bit of paranoia, but when a CPU changes state to signal it
has reached a particular point, all previous stores should be visible.
- fast-reboot: add sreset timeout detection and handling
Have the initiator wait for all its sreset targets to call in, and
time out after 200ms if they did not. Fail and revert to IPL reboot.
Testing indicates that after successful sreset_all_others(), it
takes less than 102ms (in hundreds of fast reboots) for secondaries
to call in. 100ms of that is due to an initial delay, but core
un-splitting was not measured.
- fast-reboot: make spin loops consistent and SMT friendly
- fast-reboot: add sreset_all_others error handling
Pass back failures from sreset_all_others, also change return codes to
OPAL form in sreset_all_prepare to match.
Errors will revert to the IPL path, so it's not critical to completely
clean up everything if that would complicate things. Detecting the
error and failing is the important thing.
- fast-reboot: restore SMT priority on spin loop exit
- Add documentation for ibm,firmware-versions device tree node
- NX: Print read xscom config failures.
Currently in NX, only write xscom config failures are traced.
Add trace statements for read xscom config failures too.
No functional changes.
- hw/nx: Fix NX BAR assignments
The NX rng BAR is used by each core to source random numbers for the
DARN instruction. Currently we configure each core to use the NX rng of
the chip that it exists on. Unfortunately, the NX can be de-configured by
hostboot and in this case we need to use the NX of a different chip.
This patch moves the BAR assignments for the NX into the normal nx-rng
init path. This lets us check if the normal (chip local) NX is active
when configuring which NX a core should use so that we can fall back
gracefully.
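The fallback policy can be sketched as a simple selection function. The structure and names are hypothetical, and the choice of "first active chip" as the fallback is an assumption for illustration; the note does not say how the alternative NX is chosen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-chip summary. */
struct chip_sketch {
	int id;			/* assumed non-negative */
	bool nx_rng_active;
};

/* Choose which chip's NX RNG a core on chip 'local_id' should use:
 * the local one if active, otherwise the first active one found.
 * Returns -1 when no NX RNG is available at all. */
static int pick_nx_rng_chip(const struct chip_sketch *chips, size_t n,
			    int local_id)
{
	int fallback = -1;

	for (size_t i = 0; i < n; i++) {
		if (!chips[i].nx_rng_active)
			continue;
		if (chips[i].id == local_id)
			return local_id;
		if (fallback < 0)
			fallback = chips[i].id;
	}
	return fallback;
}
```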
- FSP-elog: Reduce verbosity of elog messages
These messages just fill up the opal console log with useless messages
resulting in us losing useful information.
They have been like this since the first commit in skiboot. Make them
trace.
- core/bitmap: fix bitmap iteration limit corruption
The bitmap iterators did not reduce the number of bits to scan
when searching for the next bit, which would result in them
overrunning their bitmap.
These are only used in one place, in xive reset, and the effect
is that the xive reset code will keep zeroing memory until it
reaches a block of memory of MAX_EQ_COUNT >> 3 bits in length,
all zeroes.
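The class of bug can be illustrated with a simplified iterator. This is not skiboot's bitmap code; the helper names are hypothetical. The important detail is in find_next_bit_sketch(): when resuming after a previous hit, the number of bits left to scan must shrink so the search never goes past the end of the bitmap.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

static bool bit_is_set(const unsigned long *map, size_t n)
{
	return (map[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

/* First bit equal to 'value' in [start, start + count), or -1. */
static long find_bit_sketch(const unsigned long *map, size_t start,
			    size_t count, bool value)
{
	for (size_t n = start; n < start + count; n++)
		if (bit_is_set(map, n) == value)
			return (long)n;
	return -1;
}

/* Resume the search after 'prev'. The remaining count is reduced by
 * the bits already consumed; passing the original count unchanged
 * would scan past start + count. */
static long find_next_bit_sketch(const unsigned long *map, size_t start,
				 size_t count, size_t prev, bool value)
{
	size_t next = prev + 1;
	size_t end = start + count;

	if (next >= end)
		return -1;
	return find_bit_sketch(map, next, end - next, value);
}
```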
- hw/imc: always enable "imc_nest_chip" exports property
imc_dt_update_nest_node() adds a "imc_nest_chip" property
to the "exports" node (under opal_node) to view nest counter
region. This comes in handy when debugging ucode runtime
errors (like counter data update or control block update
and so on). The current code enables the property only if
the microcode is in the running state at system boot. To aid
debugging of ucode not running/starting issues at boot,
always add the "imc_nest_chip" property.
NVLINK2
-------
Since skiboot-5.10-rc2:
- npu2: Disable TVT range check when in bypass mode
On POWER9 the GPUs need to be able to access the MMIO memory space. Therefore
the TVT range check needs to include the MMIO address space. As any possible
range check would cover all of memory anyway this patch just disables the TVT
range check altogether when bypassing the TCE tables.
- hw/npu2: support creset of npu2 devices
creset calls into the hw procedure that resets the PHY. We don't
take them out of reset, just put them in reset.
This fixes a kexec issue.
Since skiboot-5.10-rc1:
- npu2/tce: Fix page size checking
The page size is encoded in the TVT data [59:63] as @shift+11 but
the tce_kill handler does not do the math right; this fixes it.
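One reading of this encoding is that the 5-bit field holds the page shift minus 11, so the shift is recovered by adding 11 back. The sketch below follows that reading; the macro and function names are hypothetical, and special field values (for example, whether 0 means "disabled") are not modelled.

```c
#include <stdint.h>

/* IBM bits 59:63 of a 64-bit value are its five least significant bits. */
#define TVT_PSIZE_MASK_SKETCH	0x1fULL

/* Recover the TCE page size in bytes from a TVT entry, assuming the
 * field stores (shift - 11). */
static uint64_t tvt_page_size_sketch(uint64_t tvt)
{
	uint64_t field = tvt & TVT_PSIZE_MASK_SKETCH;

	return 1ULL << (field + 11);
}
```

Under this assumption a field value of 1 gives 4KB pages and a field value of 5 gives 64KB pages.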
Since skiboot-5.9:
- npu2-hw-procedures.c: Correct phy lane mapping
Each NVLINK2 device is associated with a particular group of OBUS lanes via
a lane mask which is read from HDAT via the device-tree. However Skiboot's
interpretation of lane mask was different to what is exported from the
HDAT.
Specifically the lane mask bits in the HDAT are encoded in IBM bit ordering
for a 24-bit wide value. So for example in normal bit ordering lane-0 is
represented by having lane-mask bit 23 set and lane-23 is represented by
lane-mask bit 0. This patch alters the Skiboot interpretation to match what
is passed from HDAT.
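The bit-order conversion can be sketched as follows, using conventional numbering where bit 0 is the least significant bit. The helper names are hypothetical; the mapping itself (lane 0 at bit 23, lane 23 at bit 0) is the one stated above.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANE_MASK_WIDTH	24

/* Mask bit for a lane when the 24-bit mask uses IBM bit ordering.
 * Returns 0 for lanes outside 0..23. */
static uint32_t lane_to_mask_bit(unsigned int lane)
{
	if (lane >= LANE_MASK_WIDTH)
		return 0;
	return 1u << (LANE_MASK_WIDTH - 1 - lane);
}

/* True if 'lane' is selected by 'mask'. */
static bool lane_in_mask(uint32_t mask, unsigned int lane)
{
	return (mask & lane_to_mask_bit(lane)) != 0;
}
```

For instance, a mask of 0xf00000 selects lanes 0 to 3 under this ordering.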
- npu2-hw-procedures.c: Power up lanes during ntl reset
Newer versions of Hostboot will not power up the NVLINK2 PHY lanes by
default. The phy_reset procedure already powers up the lanes but they also
need to be powered up in order to access the DL.
The reset_ntl procedure is called by the device driver to bring the DL out
of reset and get it into a working state. Therefore we also need to add
lane and clock power up to the reset_ntl procedure.
- npu2.c: Add PE error detection
Invalid accesses from the GPU can cause a specific PE to be frozen by the
NPU. Add an interrupt handler which reports the frozen PE to the operating
system as an EEH event.
- npu2.c: Fix XIVE IRQ alignment
- npu2: hw-procedures: Refactor reset_ntl procedure
Change the implementation of reset_ntl to match the latest programming
guide documentation.
- npu2: hw-procedures: Add phy_rx_clock_sel()
Change the RX clk mux control to be done by software instead of HW. This
avoids glitches caused by changing the mux setting.
- npu2: hw-procedures: Change phy_rx_clock_sel values
The clock selection bits we set here are inputs to a state machine.
DL clock select (bits 30-31):
- 0b00: lane 0 clock
- 0b01: lane 7 clock
- 0b10: grid clock
- 0b11: invalid/no-op
To recover from a potential glitch, we need to ensure that the value we
set forces a state change. Our current sequence is to set 0x3 followed
by 0x1. With the above now known, that is actually a no-op followed by
selection of lane 7. Depending on lane reversal, that selection is not a
state change for some bricks.
The way to force a state change in all cases is to switch to the grid
clock, and then back to a lane.
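The reasoning about forced state changes can be checked with a tiny model of the selector. The enum and helpers are hypothetical and model only the 2-bit selection values listed above, not the real register or state machine.

```c
#include <stddef.h>

enum dl_clk_sel_sketch {
	CLK_LANE0 = 0x0,
	CLK_LANE7 = 0x1,
	CLK_GRID  = 0x2,
	CLK_NOOP  = 0x3,
};

/* Apply one selection; returns 1 if the modelled state changed. */
static int apply_sel(enum dl_clk_sel_sketch *state,
		     enum dl_clk_sel_sketch sel)
{
	if (sel == CLK_NOOP || sel == *state)
		return 0;
	*state = sel;
	return 1;
}

/* Count how many state changes a write sequence causes. */
static int count_state_changes(enum dl_clk_sel_sketch start,
			       const enum dl_clk_sel_sketch *seq, size_t n)
{
	enum dl_clk_sel_sketch state = start;
	int changes = 0;

	for (size_t i = 0; i < n; i++)
		changes += apply_sel(&state, seq[i]);
	return changes;
}
```

Starting from lane 7, the old sequence (0x3 then 0x1) causes no change in this model, while grid followed by a lane always causes at least one.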
- npu2: hw-procedures: Manipulate IOVALID during training
Ensure that the IOVALID bit for this brick is raised at the start of
link training, in the reset_ntl procedure.
Then, to protect us from a glitch when the PHY clock turns off or gets
chopped, lower IOVALID for the duration of the phy_reset and
phy_rx_dccal procedures.
- npu2: hw-procedures: Add check_credits procedure
As an immediate mitigation for a current hardware glitch, add a procedure
that can be used to validate NTL credit values. This will be called as a
safeguard to check that link training succeeded.
Assert that things are exactly as we expect, because if they aren't, the
system will experience a catastrophic failure shortly after the start of
link traffic.
- npu2: Print bdfn in NPU2DEV* logging macros
Revise the NPU2DEV{DBG,INF,ERR} logging macros to include the device's
bdfn. It's useful to know exactly which link we're referring to.
For instance, instead of ::
[ 234.044921238,6] NPU6: Starting procedure reset_ntl
[ 234.048578101,6] NPU6: Starting procedure reset_ntl
[ 234.051049676,6] NPU6: Starting procedure reset_ntl
[ 234.053503542,6] NPU6: Starting procedure reset_ntl
[ 234.057182864,6] NPU6: Starting procedure reset_ntl
[ 234.059666137,6] NPU6: Starting procedure reset_ntl
we'll get ::
[ 234.044921238,6] NPU6:0:0.0 Starting procedure reset_ntl
[ 234.048578101,6] NPU6:0:0.1 Starting procedure reset_ntl
[ 234.051049676,6] NPU6:0:0.2 Starting procedure reset_ntl
[ 234.053503542,6] NPU6:0:1.0 Starting procedure reset_ntl
[ 234.057182864,6] NPU6:0:1.1 Starting procedure reset_ntl
[ 234.059666137,6] NPU6:0:1.2 Starting procedure reset_ntl
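A rough sketch of how such a prefix can be built from a bdfn is shown here. It assumes the conventional PCI encoding (bus in bits 15:8, device in bits 7:3, function in bits 2:0); `npu2dev_prefix()` is a hypothetical helper, not one of the actual NPU2DEV macros.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format "NPU<chip>:<bus>:<dev>.<fn>" into buf.
 * Returns the value of snprintf(), i.e. the length of the full prefix.
 */
static int npu2dev_prefix(char *buf, size_t len, unsigned int chip_id,
                          uint32_t bdfn)
{
    return snprintf(buf, len, "NPU%u:%x:%x.%x", chip_id,
                    (unsigned int)((bdfn >> 8) & 0xff),  /* bus */
                    (unsigned int)((bdfn >> 3) & 0x1f),  /* device */
                    (unsigned int)(bdfn & 0x7));         /* function */
}
```

With these assumptions, chip 6 and bdfn 0x0009 (bus 0, device 1, function 1) would give the prefix seen in the fifth line of the sample output.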
- npu2: Move to new GPU memory map
There are three different ways we configure the MCD and memory map.
1) Old way (current way)
Skiboot configures the MCD and puts GPUs at 4TB and below
2) New way with MCD
Hostboot configures the MCD and skiboot puts GPU at 4TB and above
3) New way without MCD
No one configures the MCD and skiboot puts GPU at 4TB and below
The patch keeps option 1 and adds options 2 and 3.
The different configurations are detected using certain scoms (see
patch).
Option 1 will go away eventually as it's a configuration that can
cause xstops or data integrity problems. We are keeping it around to
support existing hostboot.
Option 2 supports only 4 GPUs and 512GB of memory per socket.
Option 3 supports 6 GPUs and 4TB of memory but may have some
performance impact.
- phys-map: Rename GPU_MEM to GPU_MEM_4T_DOWN
This map is soon to be replaced, but we are going to keep it around
for a little while so that we support older hostboot firmware.
Platform Specific Fixes
-----------------------
Witherspoon
^^^^^^^^^^^
- Witherspoon: Remove old Witherspoon platform definition
An old Witherspoon platform definition was added to aid the transition from
versions of Hostboot which didn't have the correct NVLINK2 HDAT information
available and/or planar VPD. These systems should now be updated, so remove
the possibly incorrect default assumption.
This may disable NVLINK2 on old out-dated systems but it can easily be
restored with the appropriate FW and/or VPD updates. In any case there is a
50% chance the existing default behaviour was incorrect as it only
supports 6 GPU systems. Using an incorrect platform definition leads to
undefined behaviour which is more difficult to detect/debug than not
creating the NVLINK2 devices so remove the possibly incorrect default
behaviour.
- Witherspoon: Fix VPD EEPROM type
There are user-space tools that update the planar VPD via the sysfs
interface. Currently we do not get correct information from hostboot
about the exact type of the EEPROM so we need to manually fix it up
here. This needs to be done as a platform specific fix since there is
no standardised VPD EEPROM type.
IBM FSP Systems
^^^^^^^^^^^^^^^
- nvram: Fix 'missing' nvram on FSP systems.
commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
from NVRAM rather early. This works fine on BMC based systems as
nvram_init() is actually synchronous. This is not true for FSP systems
and it turns out that the query for the console log level simply
queries blank nvram.
The simple fix is to wait for the NVRAM read to complete before
performing any query. Unfortunately it turns out that the fsp-nvram
code does not inform the generic NVRAM layer when the read is complete,
rather, it must be prompted to do so.
This patch addresses both these problems. This patch adds a check before
the first read of the NVRAM (for the console log level) that the read
has completed. The fsp-nvram code has been updated to inform the generic
layer as soon as the read completes.
The old prompt to the fsp-nvram code has been removed but a check to
ensure that the NVRAM has been loaded remains. It is conservative but
if the NVRAM is not done loading before the host is booted it will not
have an nvram device-tree node which means it won't be able to access
the NVRAM at all, ever, even after the NVRAM has loaded.
Utilities
----------
Since skiboot-5.10-rc1:
- opal-prd: Fix FTBFS with -Werror=format-overflow
i2c.c fails to compile with gcc7 and -Werror=format-overflow used in
Debian Unstable and Ubuntu 18.04: ::
i2c.c: In function ‘i2c_init’:
i2c.c:211:15: error: ‘%s’ directive writing up to 255 bytes into a
region of size 236 [-Werror=format-overflow=]
Since skiboot-5.9:
- Fix xscom-utils distclean target
In Debian/Ubuntu, the packaging system likes to have a full clean-up that
restores the tree back to original one, so add some files to the distclean
target.
- Add man pages for xscom-utils and pflash
For the need of Debian/Ubuntu packaging, I inferred some initial man
pages from their help output.
gard
^^^^
- gard: Add tests
I hear Stewart likes these for some reason. Dunno why.
- gard: Add OpenBMC vPNOR support
A big-ol-hack to add some checking for OpenBMC's vPNOR GUARD files under
/media/pnor-prsv. This isn't ideal since it doesn't handle the create
case well, but it's better than nothing.
- gard: Always use MTD to access flash
Direct mode is generally either unsafe or unsupported. We should always
access the PNOR via an MTD device so make that the default. If someone
really needs direct mode, then they can use pflash.
- gard: Fix up do_create return values
The return value of a subcommand is interpreted as a libflash error code
when it's positive or some subcommand specific error when negative.
Currently the create subcommand always returns zero when exiting (even
for errors) so fix that.
- gard: Add usage message for -p
The -p argument only really makes sense when -f is specified. Print an
actual error message rather than just the usage blob.
- gard: Fix max instance count
There's an entire byte for the instance count rather than a nibble. Only
barf if the instance number is beyond 255 rather than 16.
- gard: Fix up path parsing
Currently we assume that the Unit ID can be used as an array index into
the chip_units[] structure. There are holes in the ID space though, so
this doesn't actually work. Fix it up by walking the array looking for
the ID.
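The difference between indexing and searching can be sketched like this. The table contents and the names `struct chip_unit` and `unit_name()` are made up for the example and do not reproduce the real chip_units[] data.

```c
#include <stddef.h>
#include <stdint.h>

struct chip_unit {
    uint8_t id;
    const char *name;
};

/* Illustrative table with holes in the ID space (values are made up). */
static const struct chip_unit units[] = {
    { 0x01, "sys"  },
    { 0x02, "node" },
    { 0x05, "proc" },
    { 0x23, "core" },
};

/* Walk the table looking for the ID instead of using it as an index. */
static const char *unit_name(uint8_t id)
{
    size_t i;

    for (i = 0; i < sizeof(units) / sizeof(units[0]); i++)
        if (units[i].id == id)
            return units[i].name;

    return NULL; /* unknown ID */
}
```

Using `units[id]` directly would read past the end of this four-entry table for ID 0x23, and would return the wrong entry for ID 0x02.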
- gard: Set chip generation based on PVR
Currently we assume that this tool is being used on a P8 system by
default and allow the user to override this behaviour using the -8 and
-9 command line arguments. When running on the host we can use the
PVR to guess the chip generation, so do that.
This also changes the default behaviour to assume that the host is a P9
when running on an ARM system. This tool didn't even work when compiled
for ARM until recently and the OpenBMC vPNOR hack that we have currently
is broken for P9 systems that don't use vPNOR (Zaius and Romulus).
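A sketch of a PVR-based guess is given below. The processor type values are the ones commonly documented for POWER8 and POWER9 and should be treated as assumptions to verify; `chip_gen_from_pvr()` is a hypothetical name, and how the PVR is read on the host is left out.

```c
#include <stdint.h>

enum chip_gen { CHIP_GEN_UNKNOWN, CHIP_GEN_P8, CHIP_GEN_P9 };

/* The upper 16 bits of the PVR identify the processor type. */
#define PVR_TYPE(pvr)   (((pvr) >> 16) & 0xffff)

static enum chip_gen chip_gen_from_pvr(uint32_t pvr)
{
    switch (PVR_TYPE(pvr)) {
    case 0x004b: /* POWER8E (assumed value) */
    case 0x004c: /* POWER8NVL (assumed value) */
    case 0x004d: /* POWER8 (assumed value) */
        return CHIP_GEN_P8;
    case 0x004e: /* POWER9 (assumed value) */
        return CHIP_GEN_P9;
    default:
        return CHIP_GEN_UNKNOWN;
    }
}
```

A caller would fall back to the -8 or -9 argument, or to the platform default, when the result is `CHIP_GEN_UNKNOWN`.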
- gard: Allow records with an ID of 0xffffffff
We currently assume that a record with an ID of 0xffffffff is invalid.
Apparently this is incorrect and we should display these records, so
expand the check to compare the entire record with 0xff rather than
just the ID.
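The widened check amounts to something like the following sketch, where `is_blank()` is a hypothetical helper and the record is treated as an opaque byte buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A record is blank only if every byte of it is 0xff (erased flash). */
static bool is_blank(const void *rec, size_t len)
{
    const uint8_t *p = rec;
    size_t i;

    for (i = 0; i < len; i++)
        if (p[i] != 0xff)
            return false;

    return true;
}
```

A record whose ID field is 0xffffffff but which carries other data is therefore still displayed.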
- gard: create: Allow creating arbitrary GARD records
Add a new sub-command that allows us to create GARD records for
arbitrary chip units. There aren't a whole lot of constraints on this and
that limits how useful it can be, but it does allow a user to GARD out
individual DIMMs, chips or cores from the BMC (or host) if needed.
There are a few caveats though:
1) Not everything can, or should, have a GARD record applied to it.
2) There is no validation that the unit actually exists. Doing that
sort of validation requires something that understands the FAPI
targeting information (I think) and adding support for it here
would require some knowledge from the system XML file.
3) There's no way to get a list of paths in the system.
4) Although we can create a GARD record at runtime it won't be applied
until the next IPL.
- gard: Add path parsing support
In order to support manual GARD records we need to be able to parse the
hardware unit path strings. This patch implements that.
- gard: list: Improve output
Display the full path to the GARDed hardware unit in each record rather
than relying on the output of `gard show` and convert do_list() to use
the iterator while we're here.
- gard: {list, show}: Fix the Type field in the output
The output of `gard list` has a field named "Type", however this
doesn't actually indicate the type of the record. Rather, it
shows the type of the path used to identify the hardware being
GARDed. This is of pretty dubious value considering the Physical
path seems to always be used when referring to GARDed hardware.
- gard: Add P9 support
- gard: Update chip unit data
Source the list of units from the hostboot source rather than the
previous hard coded list. The list of path element types changes
between generations so we need to add a level of indirection to
accommodate P9. This also changes the names used to match those
printed by Hostboot at IPL time and paves the way to adding support
for manual GARD record creation.
- gard: show: Remove "Res Recovery" field
This field has never been populated by hostboot on OpenPower systems
so there's no real point in reporting its contents.
libflash / pflash
^^^^^^^^^^^^^^^^^
Anybody shipping libflash or pflash to interact with POWER9 systems must
upgrade to this version.
Since skiboot-5.10-rc2:
- pflash: Fix makefile dependency issue
Since skiboot-5.9:
- pflash: Support for volatile flag
The volatile flag was added to the PNOR image to
indicate partitions that are cleared during a host
power off. Display this flag from the pflash command.
- pflash: Support for clean_on_ecc_error flag
Add the misc flag clear_on_ecc_error to libflash/pflash. This was
the only missing flag. The generator of the virtual PNOR image
relies on libflash/pflash to provide the partition information,
so all flags are needed to build an accurate virtual PNOR partition
table.
- pflash: Respect write(2) return values
The write(2) system call returns the number of bytes written; this is
important since it is entitled to write less than what we requested.
Currently we ignore the return value and assume it wrote everything we
requested. While in practice this is likely to always be the case, it
isn't actually correct.
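The usual way to honour short writes is a loop along these lines. This is a generic sketch rather than the pflash code, and `write_all()` is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write(2)). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t rc = write(fd, p, len);

        if (rc < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing anything */
            return -1;
        }

        /* write(2) may have written fewer bytes than requested */
        p += rc;
        len -= (size_t)rc;
    }

    return 0;
}
```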
- external/pflash: Fix erasing within a single erase block
It is possible to erase within a single erase block. Currently the
pflash code assumes that if the erase starts part way into an erase
block it is because it needs to be aligned up to the boundary with the
next erase block.
Doing an erase smaller than a single erase block will cause underflows
and looping forever on erase.
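The arithmetic behind the fix can be illustrated with a small sketch. The names `struct erase_plan` and `plan_erase()` are hypothetical, the erase block size is assumed to be non-zero, and pflash's real code is organised differently.

```c
#include <stdint.h>

struct erase_plan {
    uint64_t head;  /* bytes before the first block boundary */
    uint64_t body;  /* bytes covered by whole erase blocks */
    uint64_t tail;  /* bytes after the last whole block */
};

static struct erase_plan plan_erase(uint64_t start, uint64_t len,
                                    uint64_t block)
{
    struct erase_plan p = { 0, 0, 0 };
    uint64_t off = start % block;

    if (off) {
        uint64_t to_boundary = block - off;

        /* Clamp: the request may end before the next boundary. */
        p.head = len < to_boundary ? len : to_boundary;
        len -= p.head;
    }

    p.tail = len % block;
    p.body = len - p.tail;
    return p;
}
```

Without the clamp, a 0x200 byte erase starting at offset 0x100 of a 0x1000 byte block would subtract 0xf00 from 0x200 and underflow.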
- external/pflash: Fix non-zero return code for successful read when size%256 != 0
When performing a read the return value from pflash is non-zero, even for
a successful read, when the size being read is not a multiple of 256.
This is because do_read_file returns the value from the write system
call which is then returned by pflash. When the size is a multiple of
256 we get lucky in that this wraps around back to zero. However for any
other value the return code is size % 256. This means even when the
operation is successful the return code will seem to reflect an error.
Fix this by returning zero if the entire size was read correctly,
otherwise return the corresponding error code.
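To see why the size leaked into the exit status, recall that only the low 8 bits of a return value reach the parent process. The sketch below uses hypothetical names and a placeholder error value, so it shows the idea rather than the actual pflash change.

```c
#include <stddef.h>
#include <sys/types.h>

/* A process exit status only carries the low 8 bits of the value. */
static int exit_status_seen_by_shell(long rc)
{
    return (int)(rc & 0xff);
}

/* Placeholder error value for this sketch, not pflash's actual code. */
#define READ_FAILED 1

/* Map the byte count from write(2) onto a success/failure code. */
static int read_result(ssize_t written, size_t size)
{
    if (written < 0 || (size_t)written != size)
        return READ_FAILED;

    return 0;
}
```

Returning a byte count of 512 directly looks like success, while 300 shows up as exit status 44; mapping the count to 0 or an error code removes that dependence on the size.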
- libflash: Fix parity calculation on ARM
To calculate the ECC syndrome we need to calculate the parity of a 64bit
number. On non-powerpc platforms we use the GCC builtin function
__builtin_parityl() to do this calculation. This is broken on 32bit ARM
where sizeof(unsigned long) is four bytes. Using __builtin_parityll()
instead cures this.
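A minimal sketch of the portable form, assuming a GCC-compatible compiler; `parity64()` is a hypothetical wrapper name.

```c
#include <stdint.h>

/*
 * Parity of a 64-bit value: 1 if an odd number of bits are set.
 * unsigned long long is at least 64 bits wide on every platform, whereas
 * unsigned long is only 32 bits on 32-bit ARM, so __builtin_parityl()
 * there would silently ignore the upper half of the value.
 */
static int parity64(uint64_t v)
{
    return __builtin_parityll(v);
}
```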
- libflash/mbox-flash: Add the ability to lock flash
- libflash/mbox-flash: Understand v3
- libflash/mbox-flash: Use BMC suggested timeout value
- libflash/mbox-flash: Simplify message sending
hw/lpc-mbox no longer requires that the memory associated with messages
exist for the lifetime of the message. Once it has been sent to the BMC,
that is, once bmc_mbox_enqueue() returns, lpc-mbox does not need the message
to continue to exist. On the receiving side, lpc-mbox will ensure that a
message exists for the receiving callback function.
Remove all code to deal with allocating messages.
- hw/lpc-mbox: Simplify message bookkeeping and timeouts
Currently the hw/lpc-mbox layer keeps a pointer for the currently
in-flight message for the duration of the mbox call. This creates
problems when messages time out: is that pointer still valid, and what can
we do with it? The memory is owned by the caller but if the caller has
declared a timeout, it may have freed that memory.
Another problem is locking. This patch also locks around sending and
receiving to avoid races with timeouts and possible resends. There was
some locking previously which was likely insufficient, and definitely too
hard to be sure was correct.
All this is made much easier with the previous rework which moves
sequence number allocation and verification into lpc-mbox rather than
the caller.
- libflash/mbox-flash: Allow mbox-flash to tell the driver msg timeouts
Currently when mbox-flash decides that a message times out the driver
has no way of knowing to drop the message and will continue waiting for
a response indefinitely preventing more messages from ever being sent.
This is a problem if the BMC crashes or has some other issue where it
won't ever respond to our outstanding message.
This patch provides a method for mbox-flash to tell the driver how long
it should wait before it no longer needs to care about the response.
- libflash/mbox-flash: Move sequence handling to driver level
- libflash/mbox-flash: Always close windows before opening a new window
The MBOX protocol states that if an open window command fails then all
open windows are closed. Currently, if an open window command fails
mbox-flash will erroneously assume that the previously open window is
still open.
The solution to this is to mark all windows as closed before issuing an
open window command and then on success we'll mark the new window as
open.
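The bookkeeping can be sketched as below. The structures and the `send_open` callback are hypothetical stand-ins for the mbox-flash state and the MBOX command, and the two `bmc_*` functions only simulate a BMC for the example.

```c
#include <stdbool.h>

struct window {
    bool open;
    unsigned int pos, size;
};

struct flash_state {
    struct window read, write;
};

/* Stand-ins for the BMC's answer to an open window command. */
static int bmc_accepts(unsigned int pos, unsigned int size)
{
    (void)pos; (void)size;
    return 0;
}

static int bmc_rejects(unsigned int pos, unsigned int size)
{
    (void)pos; (void)size;
    return -1;
}

static int open_window(struct flash_state *s, struct window *w,
                       unsigned int pos, unsigned int size,
                       int (*send_open)(unsigned int, unsigned int))
{
    int rc;

    /* A failed open closes every window, so assume closed up front. */
    s->read.open = false;
    s->write.open = false;

    rc = send_open(pos, size);
    if (rc)
        return rc;

    /* Only on success is the new window recorded as open. */
    w->open = true;
    w->pos = pos;
    w->size = size;
    return 0;
}
```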
- libflash/mbox-flash: Add v2 error codes
opal-prd
^^^^^^^^
Anybody shipping `opal-prd` for POWER9 systems must upgrade `opal-prd` to
this new version.
- prd: Log unsupported message type
Useful for debugging.
Sample output: ::
[29155.157050283,7] PRD: Unsupported prd message type : 0xc
- opal-prd: occ: Add support for runtime OCC load/start in ZZ
This patch adds support for handling OCC load/start events from FSP/PRD.
During IPL, on receiving an OCC load mbox message from FSP, we send a
success directly to FSP without invoking any HBRT load routines. At runtime
we forward this event to host opal-prd.
This patch provides support for invoking OCC load/start HBRT routines
like load_pm_complex() and start_pm_complex() from opal-prd.
- opal-prd: Add support for runtime OCC reset in ZZ
This patch handles OCC_RESET runtime events in host opal-prd and also
provides support for calling 'hostinterface->wakeup()' which is
required for doing the reset operation.
- prd: Enable error logging via firmware_request interface
On P9, HBRT sends error logs to FSP via the firmware_request interface.
This patch adds support to parse the error log and send it to FSP.
- prd: Add generic response structure inside prd_fw_msg
This patch adds a generic response structure. It also syncs the prd_fw_msg
type macros with hostboot.
- opal-prd: flush after logging to stdio in debug mode
When in debug mode, flush after each log output. This makes it more
likely that we'll catch failure reasons on severe errors.
Debugging and reliability improvements
--------------------------------------
Since skiboot-5.10-rc3:
- increase log verbosity in debug builds
- Add -debug to version on DEBUG builds
- cpu_wait_job: Correctly report time spent waiting for job
Since skiboot-5.10-rc2:
- ATTN: Enable flush instruction cache bit in HID register
On P9, we have to enable the "flush the instruction cache" bit along with
the "attn instruction support" bit to trigger an attention.
Since skiboot-5.10-rc1:
- core/init: manage MSR[ME] explicitly, always enable
The current boot sequence inherits MSR[ME] from the IPL firmware, and
never changes it. Some environments disable MSR[ME] (e.g., mambo), and
others can enable it (hostboot).
This has two problems. First, MSR[ME] must be disabled while in the
process of taking over the interrupt vector from the previous
environment. Second, after installing our machine check handler,
MSR[ME] should be enabled to get some useful output rather than a
checkstop.
- core/exception: beautify exception handler, add MCE-involved registers
Print DSISR and DAR, to help with deciphering machine check exceptions,
and improve the output a bit, decode NIP symbol, improve alignment, etc.
Also print a specific header for machine check, because we do expect to
see these if there is a hardware failure.
Before: ::
[ 0.005968779,3] ***********************************************
[ 0.005974102,3] Unexpected exception 200 !
[ 0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.005991782,3] LR : 000000003002ad80 CTR : 0000000000000000
[ 0.005998130,3] CFAR : 00000000300b58bc
[ 0.006002769,3] CR : 40000004 XER: 20000000
[ 0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000
[ 0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000
[...]
After: ::
[ 0.003287941,3] ***********************************************
[ 0.003561769,3] Fatal MCE at 000000003002ad80 .nvram_init+0x24
[ 0.003579628,3] CFAR : 00000000300b5964
[ 0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.003597355,3] DSISR: 00000000 DAR : 0000000000000000
[ 0.003603480,3] LR : 000000003002ad68 CTR : 0000000030093d80
[ 0.003609930,3] CR : 40000004 XER : 20000000
[ 0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000
[ 0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000
[...]
Since skiboot-5.9:
- lock: Add additional lock auditing code
Keep track of lock owner name and replace lock_depth counter
with a per-cpu list of locks held by the cpu.
This allows us to print the actual locks held in case we hit
the (in)famous message about opal_pollers being run with a
lock held.
It also allows us to warn (and drop them) if locks are still
held when returning to the OS or completing a scheduled job.
- Add support for new GCC 7 parametrized stack protector
This gives us per-cpu guard values as well. For now I just
XOR a magic constant with the CPU PIR value.
- Mambo: run hello_world and sreset_world tests with Secure and Trusted Boot
We *disable* the secure boot part, but we keep the verified boot
part as we don't currently have container verification code for Mambo.
We can run a small part of the code currently though.
- core/flash.c: extern function to get the name of a PNOR partition
This adds flash_map_resource_name() to allow skiboot subsystems to
look up the name of a PNOR partition. Thus, we don't need to duplicate
the same information in other places (e.g. libstb).
- libflash/mbox-flash: only wait for MBOX_DEFAULT_POLL_MS if busy
This makes the mbox unit test run 300x quicker and seems to
shave about 6 seconds from boot time on Witherspoon.
- make check: Make valgrind optional
To (slightly) lower the barrier for contributions, we can make valgrind
optional with just a small amount of plumbing.
This allows make check to run successfully without valgrind.
- libflash/test: Add tests for mbox-flash
A first basic set of tests for mbox-flash. These tests do their testing
by stubbing out or otherwise replacing functions not in
libflash/mbox-flash.c. The stubbed out version of the function can then
be used to emulate a BMC mbox daemon talking back to the code in
mbox-flash and it can ensure that there is some adherence to the
protocol and that from a block-level api point of view the world appears
sane.
This makes these tests simple to run and they have been integrated into
`make check`. The downside is that these tests rely on duplicated,
feature-incomplete BMC daemon behaviour. Therefore these tests are a
strong indicator of broken behaviour but a very unreliable indicator of
correctness.
Full integration tests with a 'real' BMC daemon are probably beyond the
scope of this repository.
- external/test/test.sh: fix VERSION substitution when no tags
i.e. we get a hash rather than a version number
This seems to be occurring in Travis if it doesn't pull a tag.
- external/test: make stripping out version number more robust
For some bizarre reason, Travis started failing on this
substitution when there'd been zero code changes in this
area... This at least papers over whatever the problem is
for the time being.
- io: Add load_wait() helper
This uses the standard form twi/isync pair to ensure a load
is consumed by the core before continuing. This can be necessary
under some circumstances, for example with the following
sequence:
- Store reg A
- Load reg A (ensure above store pushed out)
- delay loop
- Store reg A
That is, a mandatory delay between two stores. In theory the first store
is only guaranteed to reach the device after the load from the same
location has completed. However the processor will start executing
the delay loop without waiting for the return value from the load.
This construct enforces that the delay loop isn't executed until
the load value has been returned.
- chiptod: Keep boot timestamps contiguous
Currently we reset the timebase value to (almost) zero when
synchronising the timebase of each chip to the Chip TOD network which
results in this: ::
[ 42.374813167,5] CPU: All 80 processors called in...
[ 2.222791151,5] FLASH: Found system flash: Macronix MXxxL51235F id:0
[ 2.222977933,5] BT: Interface initialized, IO 0x00e4
This patch modifies the chiptod_init() process to use the current
timebase value rather than resetting it to zero. This results in the
timestamps remaining contiguous from the start of hostboot until
the petitboot kernel starts, e.g. ::
[ 70.188811484,5] CPU: All 144 processors called in...
[ 72.458004252,5] FLASH: Found system flash: id:0
[ 72.458147358,5] BT: Interface initialized, IO 0x00e4
- hdata/spira: Add missing newline to prlog() call
We're missing a \n here.
- opal/xscom: Add recovery for lost core wakeup SCOM failures.
Due to a hardware issue, a core's response to a SCOM can be delayed by
thread reconfiguration, which leaves the SCOM logic in a state where a
subsequent SCOM to that core can get errors. This affects Core PC SCOM
registers in the range 20010A80-20010ABF.
The solution is that if an xscom timeout occurs on one of the Core PC SCOM
registers in the range 20010A80-20010ABF, a clearing SCOM write is done to
0x20010800 with data of '0x00000000', which will also get a timeout but
clears the SCOM logic errors. After the clearing write is done, the original
SCOM operation can be retried.
The SCOM timeout is reported as status 0x4 (Invalid address) in HMER[21-23].
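The recovery flow can be sketched roughly as follows. The accessor types, `scom_read_recover()` and the fake hardware functions are all hypothetical, and the sketch treats the range quoted above as literal addresses, ignoring any per-core address bits the real code has to handle.

```c
#include <stdbool.h>
#include <stdint.h>

#define CORE_PC_FIRST   UINT32_C(0x20010A80)
#define CORE_PC_LAST    UINT32_C(0x20010ABF)
#define CORE_PC_CLEAR   UINT32_C(0x20010800)
#define SCOM_TIMEOUT    4   /* status 0x4 as quoted above */

/* Hypothetical accessor types standing in for the real SCOM layer. */
typedef int (*scom_read_fn)(uint32_t addr, uint64_t *val);
typedef int (*scom_write_fn)(uint32_t addr, uint64_t val);

static bool in_core_pc_range(uint32_t addr)
{
    return addr >= CORE_PC_FIRST && addr <= CORE_PC_LAST;
}

static int scom_read_recover(uint32_t addr, uint64_t *val,
                             scom_read_fn rd, scom_write_fn wr)
{
    int rc = rd(addr, val);

    if (rc != SCOM_TIMEOUT || !in_core_pc_range(addr))
        return rc;

    /* The clearing write is itself expected to time out; ignore it. */
    (void)wr(CORE_PC_CLEAR, 0);

    /* Retry the original operation once. */
    return rd(addr, val);
}

/* Fake hardware for the sketch: the first read times out, later ones work. */
static int fake_reads, fake_clears;

static int fake_read(uint32_t addr, uint64_t *val)
{
    (void)addr;
    if (fake_reads++ == 0)
        return SCOM_TIMEOUT;
    *val = 0x1234;
    return 0;
}

static int fake_write(uint32_t addr, uint64_t val)
{
    (void)val;
    if (addr == CORE_PC_CLEAR)
        fake_clears++;
    return SCOM_TIMEOUT;
}
```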
- opal/xscom: Move the delay inside xscom_reset() function.
This way, callers of xscom_reset() do not have to add a delay
separately. Instead, the caller can control whether to add a delay using the
second argument to xscom_reset().
- timer: Stop calling list_top() racily
This will trip the debug checks in debug builds under some circumstances
and is actually a rather bad idea as we might look at a timer that is
concurrently being removed and modified, and thus incorrectly assume
there is no work to do.
- fsp: Bail out of HIR if FSP is resetting voluntarily
a. Surveillance response times out and OPAL triggers a HIR
b. Before the HIR process kicks in, OPAL gets a PSI interrupt indicating link down
c. HIR process continues and OPAL tries to write to DRCR; PSI link inactive => xstop
OPAL should confirm that the FSP is not already in reset in the HIR path.
- sreset_kernel: only run SMT tests due to not supporting re-entry
- Use systemsim-p9 v1.1
- direct-controls: enable fast reboot direct controls for mambo
Add mambo direct controls to stop threads, which is required for
reliable fast-reboot. Enable direct controls by default on mambo.
- core/opal: always verify cpu->pir on entry
- asm/head: add entry/exit calls
Add entry and exit C functions that can do some more complex
checks before the opal proper call. This requires saving off
volatile registers that have arguments in them.
- core/lock: improve bust_locks
Prevent try_lock from modifying the lock state when bust_locks is set.
unlock will not unlock it in that case, so locks will get taken and
never released while bust_locks is set.
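The intended behaviour can be sketched as below. This is a single-threaded illustration with a plain boolean instead of the real atomic lock word, and the structure layout is an assumption.

```c
#include <stdbool.h>

struct lock {
    bool held;
};

/* Set when locks are being busted, e.g. on the way down after a crash. */
static bool bust_locks;

static bool try_lock(struct lock *l)
{
    if (bust_locks)
        return true;    /* report success but leave the state alone */

    if (l->held)
        return false;

    l->held = true;
    return true;
}

static void unlock(struct lock *l)
{
    if (bust_locks)
        return;         /* nothing was taken, so nothing to release */

    l->held = false;
}
```

Because neither function touches the lock while `bust_locks` is set, a lock can no longer end up taken and never released.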
- hw/occ: Log proper SCOM register names
This patch fixes the logging of incorrect SCOM
register names.
- mambo: Add support for NUMA
Currently the mambo scripts can do multiple chips, but only the first
ever has memory.
This patch adds support for having memory on each chip, with each
appearing as a separate NUMA node. Each node gets MEM_SIZE worth of
memory.
It's opt-in, via ``export MAMBO_NUMA=1``.
- external/mambo: Switch qtrace command to use plug-ins
The plug-in seems to be the preferred way to do this now, it works
better, and the qtracer emitter seems to generate invalid traces
in new mambo versions.
- asm/head: Loop after attn
We use the attn instruction to raise an error in early boot if OPAL
doesn't recognise the PVR. It's possible for hostboot to disable the
attn instruction before entering OPAL so add an extra busy loop after
the attn to prevent attempting to boot on an unknown processor.
Contributors
------------
- 302 csets from 32 developers
- 3 employers found
- A total of 15919 lines added, 4786 removed (delta 11133)
Extending the analysis done for some previous releases, we can see our trends
in code review across versions:
======= ====== ======== ========= ========= ===========
Release csets  Ack %    Reviews % Tested %  Reported %
======= ====== ======== ========= ========= ===========
5.0     329    15 (5%)  20 (6%)   1 (0%)    0 (0%)
5.1     372    13 (3%)  38 (10%)  1 (0%)    4 (1%)
5.2-rc1 334    20 (6%)  34 (10%)  6 (2%)    11 (3%)
5.3-rc1 302    36 (12%) 53 (18%)  4 (1%)    5 (2%)
5.4     361    16 (4%)  28 (8%)   1 (0%)    9 (2%)
5.5     408    11 (3%)  48 (12%)  14 (3%)   10 (2%)
5.6     87     12 (14%) 6 (7%)    5 (6%)    2 (2%)
5.7     232    30 (13%) 32 (14%)  5 (2%)    2 (1%)
5.8     157    13 (8%)  36 (23%)  2 (1%)    6 (4%)
5.9     209    15 (7%)  78 (37%)  3 (1%)    10 (5%)
5.10    302    20 (6%)  62 (21%)  24 (8%)   11 (4%)
======= ====== ======== ========= ========= ===========
The review count for v5.9 is largely bogus: there was a series of 25 whitespace
patches that got "Reviewed-by" and if we exclude them, we're back to 14%,
which is more like what I'd expect.
For 5.10, we've seen an increase in Reviewed-by from 5.9, back to closer to
5.8 levels. I'm hoping we can keep the ~20% up.
Initially I was really pleased with the increase in Tested-by, but with closer
examination, 17 of those are actually from various automated testing on
commits to code we bring in from hostboot/other firmware components. When
you exclude them, we're back down to 2% getting Tested-by, which isn't great.
Developers with the most changesets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
========================== === =======
Developer                  #   %
========================== === =======
Stewart Smith              40  (13.2%)
Nicholas Piggin            37  (12.3%)
Oliver O'Halloran          36  (11.9%)
Benjamin Herrenschmidt     23  (7.6%)
Claudio Carvalho           20  (6.6%)
Cyril Bur                  19  (6.3%)
Michael Neuling            13  (4.3%)
Shilpasri G Bhat           12  (4.0%)
Reza Arbab                 12  (4.0%)
Pridhiviraj Paidipeddi     11  (3.6%)
Vasant Hegde               10  (3.3%)
Akshay Adiga               10  (3.3%)
Mahesh Salgaonkar          8   (2.6%)
Russell Currey             7   (2.3%)
Alistair Popple            7   (2.3%)
Vaibhav Jain               5   (1.7%)
Prem Shanker Jha           4   (1.3%)
Robert Lippert             4   (1.3%)
Frédéric Bonnard           3   (1.0%)
Christophe Lombard         3   (1.0%)
Jeremy Kerr                2   (0.7%)
Michael Ellerman           2   (0.7%)
Balbir Singh               2   (0.7%)
Andrew Donnellan           2   (0.7%)
Madhavan Srinivasan        2   (0.7%)
Adriana Kobylak            2   (0.7%)
Sukadev Bhattiprolu        1   (0.3%)
Alexey Kardashevskiy       1   (0.3%)
Frederic Barrat            1   (0.3%)
Ananth N Mavinakayanahalli 1   (0.3%)
Suraj Jitindar Singh       1   (0.3%)
Guilherme G. Piccoli       1   (0.3%)
========================== === =======
Developers with the most changed lines
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
========================== ==== =======
Developer                  #    %
========================== ==== =======
Stewart Smith              4284 (24.5%)
Nicholas Piggin            2924 (16.7%)
Claudio Carvalho           2476 (14.2%)
Shilpasri G Bhat           1490 (8.5%)
Cyril Bur                  1475 (8.4%)
Oliver O'Halloran          1242 (7.1%)
Benjamin Herrenschmidt     736  (4.2%)
Alistair Popple            498  (2.8%)
Vasant Hegde               299  (1.7%)
Akshay Adiga               273  (1.6%)
Reza Arbab                 231  (1.3%)
Mahesh Salgaonkar          225  (1.3%)
Balbir Singh               213  (1.2%)
Frédéric Bonnard           169  (1.0%)
Michael Neuling            142  (0.8%)
Robert Lippert             97   (0.6%)
Pridhiviraj Paidipeddi     93   (0.5%)
Prem Shanker Jha           92   (0.5%)
Christophe Lombard         80   (0.5%)
Russell Currey             78   (0.4%)
Michael Ellerman           72   (0.4%)
Adriana Kobylak            71   (0.4%)
Madhavan Srinivasan        61   (0.3%)
Sukadev Bhattiprolu        58   (0.3%)
Vaibhav Jain               52   (0.3%)
Jeremy Kerr                27   (0.2%)
Ananth N Mavinakayanahalli 16   (0.1%)
Frederic Barrat            9    (0.1%)
Andrew Donnellan           5    (0.0%)
Alexey Kardashevskiy       3    (0.0%)
Suraj Jitindar Singh       1    (0.0%)
Guilherme G. Piccoli       1    (0.0%)
========================== ==== =======
Developers with the most lines removed
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
========================= ==== =======
Developer                 #    %
========================= ==== =======
Alistair Popple           304  (6.4%)
Andrew Donnellan          1    (0.0%)
========================= ==== =======
Developers with the most signoffs
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
========================== === =======
Developer                  #   %
========================== === =======
Stewart Smith              262 (99.2%)
Reza Arbab                 1   (0.4%)
Mahesh Salgaonkar          1   (0.4%)
========================== === =======
Developers with the most reviews
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
================================ ==== =======
Developer                        #    %
================================ ==== =======
Andrew Donnellan                 8    (13.6%)
Balbir Singh                     5    (8.5%)
Vasant Hegde                     5    (8.5%)
Gregory S. Still                 4    (6.8%)
Nicholas Piggin                  4    (6.8%)
Reza Arbab                       3    (5.1%)
Alistair Popple                  3    (5.1%)
RANGANATHPRASAD G. BRAHMASAMUDRA 3    (5.1%)
Jennifer A. Stofer               3    (5.1%)
Oliver O'Halloran                3    (5.1%)
Vaidyanathan Srinivasan          2    (3.4%)
Hostboot Team                    2    (3.4%)
Christian R. Geddes              2    (3.4%)
Frederic Barrat                  2    (3.4%)
Cyril Bur                        2    (3.4%)
Stewart Smith                    1    (1.7%)
Cédric Le Goater                 1    (1.7%)
Samuel Mendoza-Jonas             1    (1.7%)
Daniel M. Crowell                1    (1.7%)
Vaibhav Jain                     1    (1.7%)
Madhavan Srinivasan              1    (1.7%)
Michael Ellerman                 1    (1.7%)
Shilpasri G Bhat                 1    (1.7%)
**Total**                        59   (100%)
================================ ==== =======
Developers with the most test credits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
=========================== == =======
Developer                   #  %
=========================== == =======
FSP CI Jenkins              4  (16.7%)
Jenkins Server              4  (16.7%)
Hostboot CI                 4  (16.7%)
Oliver O'Halloran           3  (12.5%)
Jenkins OP Build CI         3  (12.5%)
Jenkins OP HW               2  (8.3%)
Pridhiviraj Paidipeddi      2  (8.3%)
Andrew Donnellan            1  (4.2%)
Vaidyanathan Srinivasan     1  (4.2%)
**Total**                   24 (100%)
=========================== == =======
Developers who gave the most tested-by credits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
=========================== == =======
Developer                   #  %
=========================== == =======
Prem Shanker Jha            17 (70.8%)
Benjamin Herrenschmidt      3  (12.5%)
Stewart Smith               2  (8.3%)
Shilpasri G Bhat            1  (4.2%)
Ananth N Mavinakayanahalli  1  (4.2%)
**Total**                   24 (100%)
=========================== == =======
Developers with the most report credits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
=========================== == =======
Developer                   #  %
=========================== == =======
Pridhiviraj Paidipeddi      2  (18.2%)
Benjamin Herrenschmidt      1  (9.1%)
Andrew Donnellan            1  (9.1%)
Michael Ellerman            1  (9.1%)
Deb McLemore                1  (9.1%)
Brad Bishop                 1  (9.1%)
Michel Normand              1  (9.1%)
Hugo Landau                 1  (9.1%)
Minda Wei                   1  (9.1%)
Francesco A Campisano       1  (9.1%)
**Total**                   11 (100%)
=========================== == =======
Developers who gave the most report credits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
=========================== == =======
Developer                   #  %
=========================== == =======
Stewart Smith               7  (63.6%)
Suraj Jitindar Singh        1  (9.1%)
Jeremy Kerr                 1  (9.1%)
Michael Neuling             1  (9.1%)
Frédéric Bonnard            1  (9.1%)
**Total**                   11 (100%)
=========================== == =======
Changesets and Employers
^^^^^^^^^^^^^^^^^^^^^^^^
Top changeset contributors by employer:
========================== === =======
Employer                   #   %
========================== === =======
IBM                        298 (98.7%)
Google                     3   (1.0%)
(Unknown)                  1   (0.3%)
========================== === =======
Top lines changed by employer:
======================== ===== =======
Employer                 #     %
======================== ===== =======
IBM                      17396 (99.4%)
Google                   73    (0.4%)
(Unknown)                24    (0.1%)
======================== ===== =======
Employers with the most signoffs (total 264):
======================== ===== =======
Employer                 #     %
======================== ===== =======
IBM                      264   (100.0%)
======================== ===== =======
Employers with the most hackers (total 33)
========================== === =======
Employer                   #   %
========================== === =======
IBM                        31  (93.9%)
Google                     1   (3.0%)
(Unknown)                  1   (3.0%)
========================== === =======
|