skiboot v6.3-rc3 release notes

Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
author: Stewart Smith <stewart@linux.ibm.com> 2019-05-02 18:29:58 +1000
committer: Stewart Smith <stewart@linux.ibm.com> 2019-05-02 18:29:58 +1000
commit: 588c39adb1ec6dba11bb0d256f13103e2ff79fbb (patch)
tree: 7646958863ad9173f4390f3efc995ceee012fa25 /doc
parent: 0f42d72abdf7e1018fade2758d20d05a0a88947c (diff)
1 files changed, 228 insertions, 0 deletions
diff --git a/doc/release-notes/skiboot-6.3-rc3.rst b/doc/release-notes/skiboot-6.3-rc3.rst
new file mode 100644
index 0000000..6591e27
--- /dev/null
+++ b/doc/release-notes/skiboot-6.3-rc3.rst
@@ -0,0 +1,228 @@
+.. _skiboot-6.3-rc3:
+
+skiboot-6.3-rc3
+===============
+
+skiboot v6.3-rc3 was released on Thursday May 2nd 2019. It is the third
+release candidate of skiboot 6.3, which will become the new stable release
+of skiboot following the 6.2 release, first released December 14th 2018.
+
+Skiboot 6.3 will mark the basis for op-build v2.3. I expect to tag the final
+skiboot 6.3 in the next week (I also predicted this last time, so take my
+predictions with a large amount of sodium).
+
+skiboot v6.3-rc3 contains all bug fixes as of :ref:`skiboot-6.0.19`,
+and :ref:`skiboot-6.2.3` (the currently maintained
+stable releases).
+
+For details on how the skiboot stable releases work, see :ref:`stable-rules`.
+
+Over :ref:`skiboot-6.3-rc2`, we have the following changes:
+
+
+- Expose PNOR Flash partitions to host MTD driver via devicetree
+
+  This makes it possible for the host to directly address each
+  partition without requiring each application to directly parse
+  the FFS headers.  This has been in use for some time already to
+  allow BOOTKERNFW partition updates from the host.
+
+  All partitions except BOOTKERNFW are marked readonly.
+
+  The BOOTKERNFW partition is currently used exclusively by the Talos II platform.
+
+- Write boot progress to LPC port 80h
+
+  This is an adaptation of what we currently do for op_display() on FSP
+  machines, inventing an encoding for what we can write into the single
+  byte at LPC port 80h.
+
+  Port 80h is often used on x86 systems to indicate boot progress/status
+  and dates back a decent amount of time. Since a byte isn't exactly very
+  expressive for everything that can go on (and wrong) during boot, it's
+  all about compromise.
+
+  Some systems (such as Zaius/Barreleye G2) have a physical dual 7 segment
+  display that displays these codes. So far, this has only been driven by
+  hostboot (see hostboot commit 90ec2e65314c).
+
+- Write boot progress to LPC ports 81 and 82
+
+  There's a thought to write more extensive boot progress codes to LPC
+  ports 81 and 82 to supplement/replace any reliance on port 80.
+
+  We want to still emit port 80 for platforms like Zaius and Barreleye
+  that have the physical display. Ports 81 and 82 can be monitored by a
+  BMC though.
+
+- Copy and convert Romulus descriptors to Talos
+
+  Talos II has some hardware differences from Romulus, therefore
+  we cannot guarantee Talos II == Romulus in skiboot.  Copy and
+  slightly modify the Romulus files for Talos II.
+
+- npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by default
+
+  V100 GPUs are known to violate the NVLink2 protocol in some cases (one is
+  when memory was accessed by the CPU and then by the GPU using so-called
+  block linear mapping) and issue double probes to the NPU, which can cope with
+  this problem only if CONFIG_ENABLE_SNARF_CPM ("disable/enable Probe.I.MO
+  snarfing a cp_m") is not set in the CQ_SM Misc Config register #0.
+  If the bit is set (which is the case today), the NPU issues a machine
+  check stop.
+
+  The snarfing feature is designed to detect 2 probes in flight and combine
+  them into one.
+
+  This adds a new "opal-npu2-snarf-cpm" nvram variable which controls
+  CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check
+  stop from happening.
+
+  This disables snarfing by default as otherwise a broken GPU driver can
+  crash the entire box even when a GPU is passed through to a guest.
+  This provides a dial to allow regression tests (which might be useful on
+  bare metal). To enable snarfing, the user needs to run: ::
+
+    sudo nvram -p ibm,skiboot --update-config opal-npu2-snarf-cpm=enable
+
+  and reboot the host system.
+
+- hw/npu2: Show name of opencapi error interrupts
+- core/pci: Use PHB io-base-location by default for PHB slots
+
+  On witherspoon only the GPU slots and the three pluggable PCI slots
+  (SLOT0, 1, 2) have platform defined slot names. For builtin devices such
+  as the SATA controller or the PLX switch that fans out to the GPU slots
+  we have no location codes which some people consider an issue.
+
+  This patch addresses the problem by making the ibm,slot-location-code for
+  the root port device default to the ibm,io-base-location-code which is
+  typically the location code for the system itself.
+
+  e.g. ::
+
+    pciex@600c3c0100000/ibm,loc-code
+                     "UOPWR.0000000-Node0-Proc0"
+
+    pciex@600c3c0100000/pci@0/ibm,loc-code
+                     "UOPWR.0000000-Node0-Proc0"
+
+    pciex@600c3c0100000/pci@0/usb-xhci@0/ibm,loc-code
+                     "UOPWR.0000000-Node0"
+
+  The PHB node and the root complex nodes have a loc code of the
+  processor they are attached to, while the usb-xhci device under the
+  root port has a location code of the system itself.
+
+- hw/phb4: Read ibm,loc-code from PBCQ node
+
+  On P9 the PBCQs are subdivided by stacks which implement the PCI Express
+  logic. When phb4 was forked from phb3 most of the properties that were
+  in the pbcq node moved into the stack node, but ibm,loc-code was not one
+  of them. This patch fixes the phb4 init sequence to read the base
+  location code from the PBCQ node (parent of the stack node) rather than
+  the stack node itself.
+- hw/xscom: add missing P9P chip name
+- asm/head: balance branches to avoid link stack predictor mispredicts
+
+  The Linux wrapper for OPAL call and return is arranged like this: ::
+
+      __opal_call:
+          mflr   r0
+          std    r0,PPC_STK_LROFF(r1)
+          LOAD_REG_ADDR(r11, opal_return)
+          mtlr   r11
+          hrfid  -> OPAL
+
+      opal_return:
+          ld     r0,PPC_STK_LROFF(r1)
+          mtlr   r0
+          blr
+
+  When skiboot returns to Linux, it branches to LR (i.e., opal_return)
+  with a blr. This unbalances the link stack predictor and will cause
+  mispredicts back up the return stack.
+- external/mambo: also invoke readline for the non-autorun case
+- asm/head.S: set POWER9 radix HID bit at entry
+
+  When running in virtual memory mode, the radix MMU hid bit should not
+  be changed, so set this in the initial boot SPR setup.
+
+  As a side effect, fast reboot also has HID0:RADIX bit set by the
+  shared spr init, so no need for an explicit call.
+- opal-prd: Fix memory leak in is-fsp-system check
+- opal-prd: Check malloc return value
+- hw/phb4: Squash the IO bridge window
+
+  The PCI-PCI bridge spec says that bridges that do not implement an IO
+  window should hardcode the IO base and limit registers to zero.
+  Unfortunately, these registers only define the upper bits of the IO
+  window and the low bits are assumed to be 0 for the base and 1 for the
+  limit address. As a result, setting both to zero can be mis-interpreted
+  as a 4K IO window.
+
+  This patch fixes the problem the same way PHB3 does. It sets the IO base
+  and limit values to 0xf000 and 0x1000 respectively which most software
+  interprets as a disabled window.
+
+  lspci before patch: ::
+
+    0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
+            I/O behind bridge: 00000000-00000fff
+
+  lspci after patch: ::
+
+    0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
+            I/O behind bridge: None
+
+- build: link with --orphan-handling=warn
+
+  The linker can warn when the linker script does not explicitly place
+  all sections. These orphan sections are placed according to
+  heuristics, which may not always be desirable. Enable this warning.
+- build: -fno-asynchronous-unwind-tables
+
+  skiboot does not use unwind tables, so this option saves about 100kB,
+  mostly from .text.
+- hw/xscom: Enable sw xstop by default on p9
+
+  This was disabled at some point during bringup to make life easier for
+  the lab folks trying to debug NVLink issues. This hack really should
+  have never made it out into the wild though, so we now have the
+  following situation occurring in the field:
+
+  1) A bad happens
+  2) The host kernel receives an unrecoverable HMI and calls into OPAL to
+     request a platform reboot.
+  3) OPAL rejects the reboot attempt and returns to the kernel with
+     OPAL_PARAMETER.
+  4) Kernel panics and attempts to kexec into a kdump kernel.
+
+  A side effect of the HMI seems to be CPUs becoming stuck which results
+  in the initialisation of the kdump kernel taking an extremely long time
+  (6+ hours). It's also been observed that after performing a dump the
+  kdump kernel then crashes itself because OPAL has ended up in a bad
+  state as a side effect of the HMI.
+
+  All up, it's not very good so re-enable the software checkstop by
+  default. If people still want to turn it off they can using the nvram
+  override.
+- opal/hmi: Initialize the hmi event with old value of TFMR.
+
+  Do this before we fix TFAC errors. Otherwise the event at host console
+  shows no thread error reported in TFMR register.
+
+  Without this patch the console event shows TFMR with no thread error:
+  (DEC parity error TFMR[59] injection) ::
+
+    [   53.737572] Severe Hypervisor Maintenance interrupt [Recovered]
+    [   53.737596]  Error detail: Timer facility experienced an error
+    [   53.737611]  HMER: 0840000000000000
+    [   53.737621]  TFMR: 3212000870e04000
+
+  After this patch it shows the old TFMR value on the host console: ::
+
+    [ 2302.267271] Severe Hypervisor Maintenance interrupt [Recovered]
+    [ 2302.267305]  Error detail: Timer facility experienced an error
+    [ 2302.267320]  HMER: 0840000000000000
+    [ 2302.267330]  TFMR: 3212000870e14010
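
A note on the two TFMR values in the last entry: the difference between them is easier to see as bit positions than as hex. The sketch below is not part of skiboot; it only XORs the two logged values and lists the bits that differ. It assumes that TFMR[59] in the notes uses the usual POWER convention where bit 0 is the most significant bit of the 64-bit register. The helper name `ibm_bits_changed` is made up for illustration.

```python
def ibm_bits_changed(old_hex, new_hex, width=64):
    """List the bit positions that differ, numbered with bit 0 as the MSB."""
    diff = int(old_hex, 16) ^ int(new_hex, 16)
    # Assumption: IBM-style numbering, so bit n has weight 2**(width - 1 - n).
    return [bit for bit in range(width) if diff & (1 << (width - 1 - bit))]


# TFMR values copied from the two console logs above.
without_patch = "3212000870e04000"
with_patch = "3212000870e14010"

print(ibm_bits_changed(without_patch, with_patch))
```

Under that numbering assumption, bit 59 is among the bits that appear only in the value logged after the patch, which is consistent with the injected DEC parity error being reported.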
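
A note on the "hw/phb4: Squash the IO bridge window" entry: the reason an all-zero base and limit reads as a 4K window can be sketched in a few lines. This is an illustration, not skiboot code. It assumes the standard PCI-to-PCI bridge layout in which the 8-bit I/O Base and I/O Limit registers carry address bits 15:12 in their upper nibble, so the 0xf000 and 0x1000 values from the notes correspond to register bytes 0xf0 and 0x10. The function name `io_window` is hypothetical.

```python
IO_ADDR_MASK = 0xF0  # assumed layout: register bits 7:4 hold address bits 15:12


def io_window(io_base_reg, io_limit_reg):
    """Decode a 16-bit bridge I/O window; return None when base > limit."""
    # The low 12 bits are implied: all zeros for the base, all ones for the limit.
    start = (io_base_reg & IO_ADDR_MASK) << 8
    end = ((io_limit_reg & IO_ADDR_MASK) << 8) | 0xFFF
    if start > end:
        return None  # software generally treats this as a disabled window
    return (start, end)


# Both registers zero: decodes as 0x0000-0x0fff, the 4K window lspci showed.
print(io_window(0x00, 0x00))

# Base 0xf000, limit 0x1000: base is above limit, so the window is disabled.
print(io_window(0xF0, 0x10))
```

The first call mirrors the "00000000-00000fff" line in the lspci output before the patch, and the second mirrors "None" after it.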