
<!DOCTYPE html>

<html>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

    <title>skiboot-6.3-rc3 &#8212; skiboot 16aac05
 documentation</title>
    <link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
    <link rel="stylesheet" type="text/css" href="../_static/classic.css" />
    
    <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
    <script src="../_static/jquery.js"></script>
    <script src="../_static/underscore.js"></script>
    <script src="../_static/doctools.js"></script>
    
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="skiboot-6.3.1" href="skiboot-6.3.1.html" />
    <link rel="prev" title="skiboot-6.3-rc2" href="skiboot-6.3-rc2.html" /> 
  </head><body>
    <div class="related" role="navigation" aria-label="related navigation">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="../genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li class="right" >
          <a href="skiboot-6.3.1.html" title="skiboot-6.3.1"
             accesskey="N">next</a> |</li>
        <li class="right" >
          <a href="skiboot-6.3-rc2.html" title="skiboot-6.3-rc2"
             accesskey="P">previous</a> |</li>
        <li class="nav-item nav-item-0"><a href="../index.html">skiboot 16aac05
 documentation</a> &#187;</li>
          <li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Release Notes</a> &#187;</li>
        <li class="nav-item nav-item-this"><a href="">skiboot-6.3-rc3</a></li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">
          <div class="body" role="main">
            
  <section id="skiboot-6-3-rc3">
<span id="id1"></span><h1>skiboot-6.3-rc3<a class="headerlink" href="#skiboot-6-3-rc3" title="Permalink to this headline"></a></h1>
<p>skiboot v6.3-rc3 was released on Thursday May 2nd 2019. It is the third
release candidate of skiboot 6.3, which will become the new stable release
of skiboot following the 6.2 release, first released December 14th 2018.</p>
<p>Skiboot 6.3 will mark the basis for op-build v2.3. I expect to tag the final
skiboot 6.3 in the next week (I also predicted this last time, so take my
predictions with a large amount of sodium).</p>
<p>skiboot v6.3-rc3 contains all bug fixes as of <a class="reference internal" href="skiboot-6.0.19.html#skiboot-6-0-19"><span class="std std-ref">skiboot-6.0.19</span></a>,
and <a class="reference internal" href="skiboot-6.2.3.html#skiboot-6-2-3"><span class="std std-ref">skiboot-6.2.3</span></a> (the currently maintained
stable releases).</p>
<p>For details on how the skiboot stable releases work, see <a class="reference internal" href="../process/stable-skiboot-rules.html#stable-rules"><span class="std std-ref">Skiboot stable tree rules and releases</span></a>.</p>
<p>Over <a class="reference internal" href="skiboot-6.3-rc2.html#skiboot-6-3-rc2"><span class="std std-ref">skiboot-6.3-rc2</span></a>, we have the following changes:</p>
<ul>
<li><p>Expose PNOR Flash partitions to host MTD driver via devicetree</p>
<p>This makes it possible for the host to directly address each
partition without requiring each application to directly parse
the FFS headers.  This has been in use for some time already to
allow BOOTKERNFW partition updates from the host.</p>
<p>All partitions except BOOTKERNFW are marked readonly.</p>
<p>The BOOTKERNFW partition is currently used exclusively by the Talos II platform.</p>
</li>
<li><p>Write boot progress to LPC port 80h</p>
<p>This is an adaptation of what we currently do for op_display() on FSP
machines, inventing an encoding for what we can write into the single
byte at LPC port 80h.</p>
<p>Port 80h is often used on x86 systems to indicate boot progress/status
and dates back a decent amount of time. Since a byte isn’t exactly very
expressive for everything that can go on (and wrong) during boot, it’s
all about compromise.</p>
<p>Some systems (such as Zaius/Barreleye G2) have a physical dual 7-segment
display that displays these codes. So far, this has only been driven by
hostboot (see hostboot commit 90ec2e65314c).</p>
</li>
<li><p>Write boot progress to LPC ports 81 and 82</p>
<p>There’s a thought to write more extensive boot progress codes to LPC
ports 81 and 82 to supplement/replace any reliance on port 80.</p>
<p>We want to still emit port 80 for platforms like Zaius and Barreleye
that have the physical display. Ports 81 and 82 can be monitored by a
BMC though.</p>
</li>
<li><p>Copy and convert Romulus descriptors to Talos</p>
<p>Talos II has some hardware differences from Romulus, therefore
we cannot guarantee Talos II == Romulus in skiboot.  Copy and
slightly modify the Romulus files for Talos II.</p>
</li>
<li><p>npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by default</p>
<p>V100 GPUs are known to violate the NVLink2 protocol in some cases (one is when
memory was accessed by the CPU and then by the GPU using so-called block
linear mapping) and issue double probes to the NPU, which can cope with this
problem only if CONFIG_ENABLE_SNARF_CPM (“disable/enable Probe.I.MO
snarfing a cp_m”) is not set in the CQ_SM Misc Config register #0.
If the bit is set (which is the case today), the NPU issues a machine
check stop.</p>
<p>The snarfing feature is designed to detect 2 probes in flight and combine
them into one.</p>
<p>This adds a new “opal-npu2-snarf-cpm” nvram variable which controls
CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check
stop from happening.</p>
<p>This disables snarfing by default as otherwise a broken GPU driver can
crash the entire box even when a GPU is passed through to a guest.
The nvram variable provides a dial to allow regression tests (which might be
useful on bare metal). To enable snarfing, the user needs to run:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">sudo</span> <span class="n">nvram</span> <span class="o">-</span><span class="n">p</span> <span class="n">ibm</span><span class="p">,</span><span class="n">skiboot</span> <span class="o">--</span><span class="n">update</span><span class="o">-</span><span class="n">config</span> <span class="n">opal</span><span class="o">-</span><span class="n">npu2</span><span class="o">-</span><span class="n">snarf</span><span class="o">-</span><span class="n">cpm</span><span class="o">=</span><span class="n">enable</span>
</pre></div>
</div>
<p>and reboot the host system.</p>
</li>
<li><p>hw/npu2: Show name of opencapi error interrupts</p></li>
<li><p>core/pci: Use PHB io-base-location by default for PHB slots</p>
<p>On witherspoon only the GPU slots and the three pluggable PCI slots
(SLOT0, 1, 2) have platform defined slot names. For builtin devices such
as the SATA controller or the PLX switch that fans out to the GPU slots
we have no location codes, which some people consider an issue.</p>
<p>This patch addresses the problem by making the ibm,slot-location-code for
the root port device default to the ibm,io-base-location-code which is
typically the location code for the system itself.</p>
<p>e.g.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pciex</span><span class="o">@</span><span class="mi">600</span><span class="n">c3c0100000</span><span class="o">/</span><span class="n">ibm</span><span class="p">,</span><span class="n">loc</span><span class="o">-</span><span class="n">code</span>
                 <span class="s2">&quot;UOPWR.0000000-Node0-Proc0&quot;</span>

<span class="n">pciex</span><span class="o">@</span><span class="mi">600</span><span class="n">c3c0100000</span><span class="o">/</span><span class="n">pci</span><span class="o">@</span><span class="mi">0</span><span class="o">/</span><span class="n">ibm</span><span class="p">,</span><span class="n">loc</span><span class="o">-</span><span class="n">code</span>
                 <span class="s2">&quot;UOPWR.0000000-Node0-Proc0&quot;</span>

<span class="n">pciex</span><span class="o">@</span><span class="mi">600</span><span class="n">c3c0100000</span><span class="o">/</span><span class="n">pci</span><span class="o">@</span><span class="mi">0</span><span class="o">/</span><span class="n">usb</span><span class="o">-</span><span class="n">xhci</span><span class="o">@</span><span class="mi">0</span><span class="o">/</span><span class="n">ibm</span><span class="p">,</span><span class="n">loc</span><span class="o">-</span><span class="n">code</span>
                 <span class="s2">&quot;UOPWR.0000000-Node0&quot;</span>
</pre></div>
</div>
<p>The PHB node, and the root complex nodes have a loc code of the
processor they are attached to, while the usb-xhci device under the
root port has a location code of the system itself.</p>
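<p>The defaulting rule described above can be sketched in C as follows. This
is a minimal illustration, not skiboot's implementation: the slot_info
structure, its fields, and slot_location_code() are hypothetical names
introduced here, and skiboot itself works on device-tree properties.</p>

```c
/* Hypothetical container for the two strings involved; skiboot keeps
 * these as device-tree properties rather than in a struct like this. */
struct slot_info {
    const char *platform_slot_name;  /* platform-defined name, or a null pointer */
    const char *io_base_loc_code;    /* value of ibm,io-base-location-code */
};

/* Pick the string to use as ibm,slot-location-code: prefer the
 * platform-defined slot name, else fall back to the base location code. */
static const char *slot_location_code(const struct slot_info *slot)
{
    if (slot->platform_slot_name)
        return slot->platform_slot_name;
    return slot->io_base_loc_code;
}
```

<p>Under these assumptions, a builtin device with no platform-defined name
ends up with the system location code, while a named slot such as SLOT0
keeps its name.</p>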
</li>
<li><p>hw/phb4: Read ibm,loc-code from PBCQ node</p>
<p>On P9 the PBCQs are subdivided by stacks which implement the PCI Express
logic. When phb4 was forked from phb3 most of the properties that were
in the pbcq node moved into the stack node, but ibm,loc-code was not one
of them. This patch fixes the phb4 init sequence to read the base
location code from the PBCQ node (parent of the stack node) rather than
the stack node itself.</p>
</li>
<li><p>hw/xscom: add missing P9P chip name</p></li>
<li><p>asm/head: balance branches to avoid link stack predictor mispredicts</p>
<p>The Linux wrapper for OPAL call and return is arranged like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">__opal_call</span><span class="p">:</span>
    <span class="n">mflr</span>   <span class="n">r0</span>
    <span class="n">std</span>    <span class="n">r0</span><span class="p">,</span><span class="n">PPC_STK_LROFF</span><span class="p">(</span><span class="n">r1</span><span class="p">)</span>
    <span class="n">LOAD_REG_ADDR</span><span class="p">(</span><span class="n">r11</span><span class="p">,</span> <span class="n">opal_return</span><span class="p">)</span>
    <span class="n">mtlr</span>   <span class="n">r11</span>
    <span class="n">hrfid</span>  <span class="o">-&gt;</span> <span class="n">OPAL</span>

<span class="n">opal_return</span><span class="p">:</span>
    <span class="n">ld</span>     <span class="n">r0</span><span class="p">,</span><span class="n">PPC_STK_LROFF</span><span class="p">(</span><span class="n">r1</span><span class="p">)</span>
    <span class="n">mtlr</span>   <span class="n">r0</span>
    <span class="n">blr</span>
</pre></div>
</div>
<p>When skiboot returns to Linux, it branches to LR (i.e., opal_return)
with a blr. This unbalances the link stack predictor and will cause
mispredicts back up the return stack.</p>
</li>
<li><p>external/mambo: also invoke readline for the non-autorun case</p></li>
<li><p>asm/head.S: set POWER9 radix HID bit at entry</p>
<p>When running in virtual memory mode, the radix MMU hid bit should not
be changed, so set this in the initial boot SPR setup.</p>
<p>As a side effect, fast reboot also has HID0:RADIX bit set by the
shared spr init, so no need for an explicit call.</p>
</li>
<li><p>opal-prd: Fix memory leak in is-fsp-system check</p></li>
<li><p>opal-prd: Check malloc return value</p></li>
<li><p>hw/phb4: Squash the IO bridge window</p>
<p>The PCI-PCI bridge spec says that bridges that implement an IO window
should hardcode the IO base and limit registers to zero.
Unfortunately, these registers only define the upper bits of the IO
window and the low bits are assumed to be 0 for the base and 1 for the
limit address. As a result, setting both to zero can be mis-interpreted
as a 4K IO window.</p>
<p>This patch fixes the problem the same way PHB3 does. It sets the IO base
and limit values to 0xf000 and 0x1000 respectively which most software
interprets as a disabled window.</p>
<p>lspci before patch:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">0000</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">00.0</span> <span class="n">PCI</span> <span class="n">bridge</span><span class="p">:</span> <span class="n">IBM</span> <span class="n">Device</span> <span class="mi">04</span><span class="n">c1</span> <span class="p">(</span><span class="n">prog</span><span class="o">-</span><span class="k">if</span> <span class="mi">00</span> <span class="p">[</span><span class="n">Normal</span> <span class="n">decode</span><span class="p">])</span>
        <span class="n">I</span><span class="o">/</span><span class="n">O</span> <span class="n">behind</span> <span class="n">bridge</span><span class="p">:</span> <span class="mi">00000000</span><span class="o">-</span><span class="mi">00000</span><span class="n">fff</span>
</pre></div>
</div>
<p>lspci after patch:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mi">0000</span><span class="p">:</span><span class="mi">00</span><span class="p">:</span><span class="mf">00.0</span> <span class="n">PCI</span> <span class="n">bridge</span><span class="p">:</span> <span class="n">IBM</span> <span class="n">Device</span> <span class="mi">04</span><span class="n">c1</span> <span class="p">(</span><span class="n">prog</span><span class="o">-</span><span class="k">if</span> <span class="mi">00</span> <span class="p">[</span><span class="n">Normal</span> <span class="n">decode</span><span class="p">])</span>
        <span class="n">I</span><span class="o">/</span><span class="n">O</span> <span class="n">behind</span> <span class="n">bridge</span><span class="p">:</span> <span class="kc">None</span>
</pre></div>
</div>
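<p>The window arithmetic above can be checked with a short C sketch. It
assumes the standard 8-bit I/O Base and I/O Limit register layout with 16-bit
I/O decode, and it assumes that the values 0xf000 and 0x1000 correspond to
register bytes 0xf0 and 0x10. The io_window structure and decode_io_window()
are hypothetical names, not skiboot functions.</p>

```c
/* Decoded view of a PCI-PCI bridge I/O window (16-bit I/O decode). */
struct io_window {
    unsigned int base;    /* first I/O address forwarded by the bridge */
    unsigned int limit;   /* last I/O address forwarded by the bridge */
    int enabled;          /* non-zero when base does not exceed limit */
};

/* The 8-bit I/O Base and I/O Limit registers hold address bits 15:12 in
 * their upper nibble. The low 12 address bits are implied: all zeros for
 * the base, all ones for the limit. */
static struct io_window decode_io_window(unsigned char io_base_reg,
                                         unsigned char io_limit_reg)
{
    struct io_window w;

    w.base = (unsigned int)(io_base_reg & 0xf0) << 8;
    w.limit = ((unsigned int)(io_limit_reg & 0xf0) << 8) | 0xfff;
    w.enabled = w.limit >= w.base;
    return w;
}
```

<p>With both registers zero this yields 0x0000-0x0fff, the 4K window lspci
reported before the patch. With the assumed register bytes 0xf0 and 0x10 the
base (0xf000) exceeds the limit (0x1fff), so the window reads as disabled.</p>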
</li>
<li><p>build: link with --orphan-handling=warn</p>
<p>The linker can warn when the linker script does not explicitly place
all sections. These orphan sections are placed according to
heuristics, which may not always be desirable. Enable this warning.</p>
</li>
<li><p>build: -fno-asynchronous-unwind-tables</p>
<p>skiboot does not use unwind tables; this option saves about 100kB,
mostly from .text.</p>
</li>
<li><p>hw/xscom: Enable sw xstop by default on p9</p>
<p>This was disabled at some point during bringup to make life easier for
the lab folks trying to debug NVLink issues. This hack really should
have never made it out into the wild though, so we now have the
following situation occurring in the field:</p>
<ol class="arabic simple">
<li><p>A bad happens</p></li>
<li><p>The host kernel receives an unrecoverable HMI and calls into OPAL to
request a platform reboot.</p></li>
<li><p>OPAL rejects the reboot attempt and returns to the kernel with
OPAL_PARAMETER.</p></li>
<li><p>Kernel panics and attempts to kexec into a kdump kernel.</p></li>
</ol>
<p>A side effect of the HMI seems to be CPUs becoming stuck which results
in the initialisation of the kdump kernel taking an extremely long time
(6+ hours). It’s also been observed that after performing a dump the
kdump kernel then crashes itself because OPAL has ended up in a bad
state as a side effect of the HMI.</p>
<p>All up, it’s not very good, so re-enable the software checkstop by
default. If people still want to turn it off, they can do so using the nvram
override.</p>
</li>
<li><p>opal/hmi: Initialize the hmi event with old value of TFMR.</p>
<p>Do this before we fix TFAC errors. Otherwise the event on the host console
shows no thread error reported in the TFMR register.</p>
<p>Without this patch the console event shows TFMR with no thread error
(DEC parity error TFMR[59] injection):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span>   <span class="mf">53.737572</span><span class="p">]</span> <span class="n">Severe</span> <span class="n">Hypervisor</span> <span class="n">Maintenance</span> <span class="n">interrupt</span> <span class="p">[</span><span class="n">Recovered</span><span class="p">]</span>
<span class="p">[</span>   <span class="mf">53.737596</span><span class="p">]</span>  <span class="n">Error</span> <span class="n">detail</span><span class="p">:</span> <span class="n">Timer</span> <span class="n">facility</span> <span class="n">experienced</span> <span class="n">an</span> <span class="n">error</span>
<span class="p">[</span>   <span class="mf">53.737611</span><span class="p">]</span>  <span class="n">HMER</span><span class="p">:</span> <span class="mi">0840000000000000</span>
<span class="p">[</span>   <span class="mf">53.737621</span><span class="p">]</span>  <span class="n">TFMR</span><span class="p">:</span> <span class="mf">3212000870e04000</span>
</pre></div>
</div>
<p>After this patch it shows the old TFMR value on the host console:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">2302.267271</span><span class="p">]</span> <span class="n">Severe</span> <span class="n">Hypervisor</span> <span class="n">Maintenance</span> <span class="n">interrupt</span> <span class="p">[</span><span class="n">Recovered</span><span class="p">]</span>
<span class="p">[</span> <span class="mf">2302.267305</span><span class="p">]</span>  <span class="n">Error</span> <span class="n">detail</span><span class="p">:</span> <span class="n">Timer</span> <span class="n">facility</span> <span class="n">experienced</span> <span class="n">an</span> <span class="n">error</span>
<span class="p">[</span> <span class="mf">2302.267320</span><span class="p">]</span>  <span class="n">HMER</span><span class="p">:</span> <span class="mi">0840000000000000</span>
<span class="p">[</span> <span class="mf">2302.267330</span><span class="p">]</span>  <span class="n">TFMR</span><span class="p">:</span> <span class="mf">3212000870e14010</span>
</pre></div>
</div>
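<p>To see which TFMR bits the patch preserves, the two logged values can be
compared bit by bit. The C sketch below is illustrative only:
ibm_bit_number() and differing_ibm_bits() are hypothetical helpers, and the
code assumes IBM bit numbering, where bit 0 is the most significant bit of
the 64-bit register.</p>

```c
/* Convert a conventional bit index (0 = least significant) into IBM
 * numbering for a 64-bit register, where bit 0 is the most significant. */
static int ibm_bit_number(int lsb_index)
{
    return 63 - lsb_index;
}

/* Store the IBM bit numbers that differ between two register values in
 * out[] (ascending order) and return how many were found. The caller
 * must provide room for up to 64 entries. */
static int differing_ibm_bits(unsigned long long first,
                              unsigned long long second, int out[64])
{
    unsigned long long diff = first ^ second;
    int count = 0;
    int lsb;

    for (lsb = 63; lsb >= 0; lsb--) {
        if ((diff >> lsb) & 1ULL)
            out[count++] = ibm_bit_number(lsb);
    }
    return count;
}
```

<p>For the two TFMR values logged above (3212000870e04000 and
3212000870e14010), this reports bits 47 and 59. Bit 59 is the DEC parity
error named in the injection.</p>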
</li>
</ul>
</section>


            <div class="clearer"></div>
          </div>
        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper">
  <h4>Previous topic</h4>
  <p class="topless"><a href="skiboot-6.3-rc2.html"
                        title="previous chapter">skiboot-6.3-rc2</a></p>
  <h4>Next topic</h4>
  <p class="topless"><a href="skiboot-6.3.1.html"
                        title="next chapter">skiboot-6.3.1</a></p>
  <div role="note" aria-label="source link">
    <h3>This Page</h3>
    <ul class="this-page-menu">
      <li><a href="../_sources/release-notes/skiboot-6.3-rc3.rst.txt"
            rel="nofollow">Show Source</a></li>
    </ul>
   </div>
<div id="searchbox" style="display: none" role="search">
  <h3 id="searchlabel">Quick search</h3>
    <div class="searchformwrapper">
    <form class="search" action="../search.html" method="get">
      <input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
      <input type="submit" value="Go" />
    </form>
    </div>
</div>
<script>$('#searchbox').show(0);</script>
        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer" role="contentinfo">
        &#169; Copyright 2016-2017, IBM, others.
      Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 4.3.2.
    </div>
  </body>
</html>