author Stewart Smith <stewart@linux.vnet.ibm.com> 2018-03-28 15:25:31 +1100
committer Stewart Smith <stewart@linux.vnet.ibm.com> 2018-03-28 15:25:31 +1100
commit a6c62330f8b08032434f7cf9587ac5bfb79ffe91
tree c1cc1206995b94711c76ca6adfb586780b6eb501
parent 54770a87cd4155abc670e322b8309e1acd9b4e4e
skiboot 5.10.3 release notes (v5.10.3)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
-rw-r--r-- doc/release-notes/skiboot-5.10.3.rst 82
1 files changed, 82 insertions, 0 deletions
diff --git a/doc/release-notes/skiboot-5.10.3.rst b/doc/release-notes/skiboot-5.10.3.rst
new file mode 100644
index 0000000..0dc87ed
--- /dev/null
+++ b/doc/release-notes/skiboot-5.10.3.rst
@@ -0,0 +1,82 @@
+.. _skiboot-5.10.3:
+
+==============
+skiboot-5.10.3
+==============
+
+skiboot 5.10.3 was released on Wednesday March 28th, 2018. It replaces
+:ref:`skiboot-5.10.2` as the current stable release in the 5.10.x series.
+
+It is recommended that 5.10.3 be used instead of any previous 5.10.x version
+due to the bug fixes and debugging enhancements in it.
+
+Over :ref:`skiboot-5.10.2`, we have a few improvements and bug fixes:
+
+- NPU2: dump NPU2 registers on npu2 HMI
+
+ Because npu2 issues are hard to debug, the full list of NPU2
+ registers is now dumped when an npu2 HMI occurs.
+
+ This differs from the solution introduced in 5.10.1, which dumped
+ the registers in a way that would trigger a FIR bit that would
+ confuse PRD.
+- npu2: Add performance tuning SCOM inits
+
+ Peer-to-peer GPU bandwidth and latency testing has produced some tunable
+ values that improve performance. Add them to our device initialization.
+
+ File these under things that need to be cleaned up with nice #defines
+ for the register names and bitfields when we get time.
+
+ A few of the settings are dependent on the system's particular NVLink
+ topology, so introduce a helper to determine how many links go to a
+ single GPU.
+- hw/npu2: Assign a unique LPARSHORTID per GPU
+
+ This gets used elsewhere to index items in the XTS tables.
+- occ: Set up OCC messaging even if we fail to set up pstates
+
+ This means that we no longer hit this bug if we fail to get valid pstates
+ from the OCC. ::
+
+ [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
+ echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
+ [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
+ [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
+ [ 10.318805] Disabling lock debugging due to kernel taint
+ [ 10.318808] Severe Machine check interrupt [Not recovered]
+ [ 10.318812] NIP [000000003003e434]: 0x3003e434
+ [ 10.318813] Initiator: CPU
+ [ 10.318815] Error type: Real address [Load/Store (foreign)]
+ [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
+ [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
+ [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
+ [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
+ [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
+ [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
+- core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors
+
+ This disables fast reboot in several more cases where serious errors
+ like lock corruption or call re-entrancy are detected.
+- core/opal: allow some re-entrant calls
+
+ This allows a small number of OPAL calls to succeed despite re-entering
+ the firmware, and rejects others rather than aborting.
+
+ This allows a system reset interrupt that interrupts OPAL to do something
+ useful: sreset other CPUs, use the console (which allows xmon to work or
+ stack traces to be printed), or reboot the system.
+
+ Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is
+ used for many other things that do not indicate a serious permanent error.
+- core/opal: abort in case of re-entrant OPAL call
+
+ The stack is already destroyed by the time we get here, so there
+ is not much point continuing.
+- npu2: Disable fast reboot
+
+ Fast reboot does not yet work right with the NPU. It's been disabled on
+ NVLink and OpenCAPI machines. Do the same for NVLink2.
+
+ This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
+ from the npu code to npu2.
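
Reviewer's note on the "core/opal: allow some re-entrant calls" entry: the policy it describes (let a small set of calls through when firmware is re-entered, reject the rest with an internal error instead of a busy code) can be sketched roughly as below. This is a minimal illustration and not skiboot's implementation. Every name prefixed with ``ex_`` or ``EX_`` is hypothetical, the token numbers are made up, and the set of permitted calls is assumed from the release note's mention of sreset, console use and reboot.

```c
#include <stdint.h>

/* Illustrative stand-ins for OPAL return codes (not taken from skiboot headers). */
#define EX_OPAL_SUCCESS          0
#define EX_OPAL_BUSY            (-2)   /* deliberately NOT used for rejection */
#define EX_OPAL_INTERNAL_ERROR  (-11)

/* Hypothetical call tokens; the numbering is invented for this sketch. */
enum ex_token {
    EX_TOKEN_CONSOLE_WRITE = 1,
    EX_TOKEN_REBOOT,
    EX_TOKEN_SIGNAL_SRESET,
    EX_TOKEN_SENSOR_CLEAR,
};

/* Hypothetical per-CPU state: how many firmware calls are currently nested. */
struct ex_cpu {
    unsigned int depth;
};

/* Assumed allow-list: calls that stay useful from a system reset handler. */
static int ex_reentry_allowed(uint64_t token)
{
    switch (token) {
    case EX_TOKEN_CONSOLE_WRITE:
    case EX_TOKEN_REBOOT:
    case EX_TOKEN_SIGNAL_SRESET:
        return 1;
    default:
        return 0;
    }
}

/*
 * Entry check: the first call on a CPU always proceeds. A nested call
 * proceeds only if its token is on the allow-list; otherwise it is
 * rejected with an internal error so callers do not treat it as a
 * retryable "busy" condition.
 */
static int64_t ex_opal_entry_check(struct ex_cpu *cpu, uint64_t token)
{
    if (cpu->depth > 0 && !ex_reentry_allowed(token))
        return EX_OPAL_INTERNAL_ERROR;

    cpu->depth++;
    return EX_OPAL_SUCCESS;
}

/* Exit hook: undo one level of nesting for a call that was admitted. */
static void ex_opal_exit(struct ex_cpu *cpu)
{
    if (cpu->depth > 0)
        cpu->depth--;
}
```

In this sketch a rejected call never increments the nesting depth, so only admitted calls need a matching ``ex_opal_exit``. Returning the internal-error code rather than the busy code mirrors the release note's reasoning that busy is used for many conditions that are not serious or permanent.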