author | Stewart Smith <stewart@linux.ibm.com> | 2018-04-06 09:38:49 +1000 |
---|---|---|
committer | Stewart Smith <stewart@linux.ibm.com> | 2018-04-06 09:38:49 +1000 |
commit | 6c53bb6db7f6999bef9d352b659c561c8208c83f | |
tree | f0799deb801f36aa3d69f053a56bd3c474eb46b2 /doc | |
parent | e0c7c89b748312244c1b034b8b5279131add20bc | |
skiboot-5.11 release notes (tag: v5.11)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/release-notes/skiboot-5.11.rst | 828 |
1 files changed, 828 insertions, 0 deletions
diff --git a/doc/release-notes/skiboot-5.11.rst b/doc/release-notes/skiboot-5.11.rst
new file mode 100644
index 0000000..53eb9ba
--- /dev/null
+++ b/doc/release-notes/skiboot-5.11.rst
@@ -0,0 +1,828 @@
+.. _skiboot-5.11:
+
+skiboot-5.11
+============
+
+skiboot v5.11 was released on Friday April 6th 2018. It is the first
+release of skiboot 5.11, which is now the new stable release
+of skiboot following the 5.10 release, first released February 23rd 2018.
+
+It is *not* expected that the 5.11 branch will be kept around for long;
+instead, we expect to quickly move on to 6.0, which will form the basis
+for op-build v2.0 and will be required for POWER9 systems.
+
+It is expected that skiboot 6.0 will follow very shortly. Consider 5.11
+more of a beta release of 6.0 than anything else. For POWER9 systems, it
+should nonetheless be more solid than previous releases.
+
+skiboot v5.11 contains all bug fixes as of :ref:`skiboot-5.10.4`
+and :ref:`skiboot-5.4.9` (the currently maintained stable releases). There
+may be more 5.10.x stable releases, depending on demand.
+
+For details on how skiboot stable releases work, see :ref:`stable-rules`.
+
+Over skiboot-5.10, we have the following changes:
+
+New Platforms
+-------------
+
+- Add VESNIN platform support
+
+  The Vesnin platform from YADRO is a 4 socket POWER8 system with up to 8TB
+  of memory with 460GB/s of memory bandwidth in only 2U. Many kudos to the
+  team from YADRO for submitting their code upstream!
+
+New Features
+------------
+
+- fast-reboot: enable by default for POWER9
+
+  - Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is used
+
+- PCI tunneled operations on PHB4
+
+  - phb4: set PBCQ Tunnel BAR for tunneled operations
+
+    P9 supports PCI tunneled operations (atomics and as_notify) that are
+    initiated by devices.
+
+    A subset of the tunneled operations requires a response that must be
+    sent back from the host to the device.
+    For example, an atomic compare
+    and swap will return the compare status, as the swap will only be
+    performed on success. Similarly, as_notify reports whether the target
+    thread has been woken up or not, because the operation may fail.
+
+    To enable tunneled operations, a device driver must tell the host where
+    it expects tunneled operation responses, by setting the PBCQ Tunnel BAR
+    Response register with a specific value within the range of its BARs.
+
+    This register is currently initialized by enable_capi_mode(). However, as
+    tunneled operations may also operate in PCI mode, a new API is required
+    to set the PBCQ Tunnel BAR Response register without switching to CAPI
+    mode.
+
+    This patch provides two new OPAL calls to get/set the PBCQ Tunnel
+    BAR Response register.
+
+    Note: as there is only one PBCQ Tunnel BAR register, shared between
+    all the devices connected to the same PHB, only one of these devices
+    can use tunneled operations at any given time.
+  - phb4: set PHB CMPM registers for tunneled operations
+
+    P9 supports PCI tunneled operations (atomics and as_notify) that require
+    setting the PHB ASN Compare/Mask register with a 16-bit indication.
+
+    This register is currently initialized by enable_capi_mode(). However, as
+    tunneled operations may also work in PCI mode, the ASN Compare/Mask
+    register should instead be initialized in phb4_init_ioda3().
+
+    This patch also adds "ibm,phb-indications" to the device tree, to tell
+    Linux the values of the CAPI, ASN, and NBW indications, when supported.
+
+    Tunneled operations were tested by IBM in CAPI mode, and by Mellanox
+    Technologies in PCI mode.
+
+- Tie tm-suspend fw-feature and opal_reinit_cpus() together
+
+  Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
+  always returns OPAL_UNSUPPORTED.
+
+  This ties the tm suspend fw-feature to the
+  opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when tm
+  suspend is disabled, we correctly report it to the kernel.
For + backwards compatibility, it's assumed tm suspend is available if the + fw-feature is not present. + + Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N + DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and + below has TM disabled completely (not just suspend). + + We are using opal_reinit_cpus() to determine this setting (rather than + the device tree/HDAT) as some future firmware may let us change this + dynamically after boot. That is not the case currently though. + +Power Management +---------------- + +- SLW: Increase stop4-5 residency by 10x + + Using DGEMM benchmark we observed there was a drop of 5-9% throughput with + and without stop4/5. In this benchmark the GPU waits on the cpu to wakeup + and provide the subsequent data block to compute. The wakup latency + accumulates over the run and shows up as a performance drop. + + Linux enters stop4/5 more aggressively for its wakeup latency. Increasing + the residency from 1ms to 10ms makes the performance drop <1% +- occ: Set up OCC messaging even if we fail to setup pstates + + This means that we no longer hit this bug if we fail to get valid pstates + from the OCC. :: + + [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear + echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear + [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 + [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! 
PIR=083d cpu @0x33cf4000 -> pir=083d token=8 + [ 10.318805] Disabling lock debugging due to kernel taint + [ 10.318808] Severe Machine check interrupt [Not recovered] + [ 10.318812] NIP [000000003003e434]: 0x3003e434 + [ 10.318813] Initiator: CPU + [ 10.318815] Error type: Real address [Load/Store (foreign)] + [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception + [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3 + [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240 + [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1) + [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000 + [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1 + + +mbox based platforms +^^^^^^^^^^^^^^^^^^^^ + +For platforms using the mbox protocol for host flash access (all BMC based +OpenPOWER systems, most OpenBMC based systems) there have been some hardening +efforts in the event of the BMC being poorly behaved. + +- mbox: Reduce default BMC timeouts + + Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for + 70 seconds waiting for a BMC to come back. This also makes the current + default of 30 seconds a bit pointless, is it far too short to be a + worse case wait time but too long to avoid hitting hardlockup detectors + and wrecking havoc inside host linux. + + Just change it to three seconds so that host linux will survive and + that, reads and writes will fail but at least the host stays up. + + Also refactored the waiting loop just a bit so that it's easier to read. +- mbox: Harden against BMC daemon errors + + Bugs present in the BMC daemon mean that skiboot gets presented with + mbox windows of size zero. These windows cannot be valid and skiboot + already detects these conditions. + + Currently skiboot warns quite strongly about the occurrence of these + problems. 
+  The problem for skiboot is that it doesn't take any action.
+  Initially I wanted to avoid putting policy like this into skiboot, but
+  since these bugs aren't going away, and skiboot barfing is leading to
+  lockups and ultimately the host going down, something needs to be done.
+
+  I propose that when we detect the problem, we fail the mbox call and punt
+  the problem back up to Linux. I don't like it, but at least it will cause
+  errors to cascade and won't bring the host down. I'm not sure how Linux
+  is supposed to detect this or what it can even do, but this is better
+  than a crash.
+
+  Diagnosing a failure to boot if skiboot itself fails to read flash may
+  be marginally more difficult with this patch. This is because skiboot
+  will now print only one warning about the zero-sized window rather than
+  continuously spitting it out.
+
+Fast Reboot Improvements
+------------------------
+
+Around fast-reboot, we have made several improvements to harden the fast
+reboot code paths and resort to a full IPL if something doesn't look right.
+
+- core/fast-reboot: zero memory after fast reboot
+
+  This improves the security and predictability of the fast reboot
+  environment.
+
+  There cannot be a secure fence between fast reboots, because a
+  malicious OS can modify the firmware itself. However, a well-behaved
+  OS can have a reasonable expectation that OS memory regions it has
+  modified will be cleared upon fast reboot.
+
+  The memory is zeroed after all other CPUs come up from fast reboot,
+  just before the new kernel is loaded and booted into. This allows
+  image preloading to run concurrently, and will allow parallelisation
+  of the clearing in future.
+- core/fast-reboot: verify mem regions before fast reboot
+
+  Run the mem_region sanity checkers before proceeding with fast
+  reboot.
+
+  This is the beginning of proactive sanity checks on OPAL data
+  for fast reboot (which complements the reactive disable_fast_reboot
+  cases).
+  This encourages re-use and sharing of any kind of debug
+  code and unit test code.
+- fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they exist
+- core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors
+
+  This disables fast reboot in several more cases where serious errors
+  like lock corruption or call re-entrancy are detected.
+- capp: Disable fast-reboot whenever enable_capi_mode() is called
+
+  This patch updates phb4_set_capi_mode() to disable fast-reboot
+  whenever enable_capi_mode() is called, irrespective of its return
+  value. This guards against fast-reboot being left enabled if a
+  change to enable_capi_mode() causes it to return an error while
+  leaving CAPP in enabled mode.
+- fast-reboot: occ: Delete OCC child nodes in /ibm,opal/power-mgt
+
+  Fast reboot on P8 fails to re-initialise OCC data because per-chip OCC
+  nodes are already present in the /ibm,opal/power-mgt node. These
+  per-chip nodes hold the voltage IDs for each pstate, and these can be
+  changed by OCC pstate table biasing. So delete these before calling
+  the re-init code to re-parse and populate the pstate data.
+
+Debugging/SRESET improvements
+-----------------------------
+
+Since :ref:`skiboot-5.11-rc1`:
+
+- core/cpu: Prevent clobbering of stack guard for boot-cpu
+
+  Commit 90d53934c2da ("core/cpu: discover stack region size before
+  initialising memory regions") introduced memzero for struct cpu_thread
+  in init_cpu_thread(). This has the unintended side effect of clobbering
+  the stack-guard canary of the boot_cpu stack. This results in OPAL
+  failing to init with this failure message: ::
+
+    CPU: P9 generation processor (max 4 threads/core)
+    CPU: Boot CPU PIR is 0x0004 PVR is 0x004e1200
+    Guard skip = 0
+    Stack corruption detected !
+    Aborting!
+    CPU 0004 Backtrace:
+    S: 0000000031c13ab0 R: 0000000030013b0c .backtrace+0x5c
+    S: 0000000031c13b50 R: 000000003001bd18 ._abort+0x60
+    S: 0000000031c13be0 R: 0000000030013bbc .__stack_chk_fail+0x54
+    S: 0000000031c13c60 R: 00000000300c5b70 .memset+0x12c
+    S: 0000000031c13d00 R: 0000000030019aa8 .init_cpu_thread+0x40
+    S: 0000000031c13d90 R: 000000003001b520 .init_boot_cpu+0x188
+    S: 0000000031c13e30 R: 0000000030015050 .main_cpu_entry+0xd0
+    S: 0000000031c13f00 R: 0000000030002700 boot_entry+0x1c0
+
+  So the patch provides a fix by tweaking the memset() call in
+  init_cpu_thread() to skip over the stack-guard canary.
+- core/lock.c: ensure valid start value for lock spin duration warning
+
+  The previous fix in a8e6cc3f4 only addressed half of the problem, as
+  we could also get an invalid value for start, causing us to fail
+  in a weird way.
+
+  This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
+  test in op-test-framework.
+
+  You'd get to this part of the test and get the erroneous lock
+  spinning warnings: ::
+
+    PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
+    0000080000000000
+    [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms
+    [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms
+    [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms
+
+  This patch checks the validity of the timebase before setting start,
+  and only checks the lock timeout if we got a valid start value.
+
+
+Since :ref:`skiboot-5.10`:
+
+- core/opal: allow some re-entrant calls
+
+  This allows a small number of OPAL calls to succeed despite re-entering
+  the firmware, and rejects others rather than aborting.
+
+  This allows a system reset interrupt that interrupts OPAL to do something
+  useful: sreset other CPUs, use the console (which allows xmon to work or
+  stack traces to be printed), or reboot the system.
+
+  Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is
+  used for many other things that do not mean a serious permanent error.
+- core/opal: abort in case of re-entrant OPAL call
+
+  The stack is already destroyed by the time we get here, so there
+  is not much point continuing.
+- core/lock: Add lock timeout warnings
+
+  There are currently no timeout warnings for locks in skiboot. We assume
+  that the lock will eventually become free, which may not always be the
+  case.
+
+  This patch adds timeout warnings for locks. Any lock which spins for more
+  than 5 seconds will throw a warning and stack trace for that thread. This is
+  useful for debugging situations where a thread hangs waiting for the
+  lock to be freed.
+- core/lock: Add deadlock detection
+
+  This adds simple deadlock detection. The detection looks for circular
+  dependencies in the lock requests. It will abort and display a stack trace
+  when a deadlock occurs.
+  The detection is enabled by DEBUG_LOCKS (enabled by default).
+  The detection may add a slight performance overhead, but as there are
+  not a huge number of locks in skiboot, this overhead isn't significant.
+- core/hmi: report processor recovery reason from core FIR bits on P9
+
+  When an error is encountered that causes processor recovery, an HMI is
+  generated if the recovery was successful. The reason is recorded in
+  the core FIR, which gets copied into the WOF.
+
+  In this case, dump the WOF register and an error string into the OPAL
+  msglog.
+
+  A broken init setting led to HMIs reported in Linux as: ::
+
+    [ 3.591547] Harmless Hypervisor Maintenance interrupt [Recovered]
+    [ 3.591648] Error detail: Processor Recovery done
+    [ 3.591714] HMER: 2040000000000000
+
+  This patch would have been useful because it tells us exactly that
+  the problem is in the d-side ERAT: ::
+
+    [ 414.489690798,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000
+    [ 414.489693339,7] HMI: [Loc: UOPWR.0000000-Node0-Proc0]: P:0 C:1 T:1: Processor recovery occurred.
+    [ 414.489699837,7] HMI: Core WOF = 0x0000000410000000 recovered error:
+    [ 414.489701543,7] HMI: LSU - SRAM (DCACHE parity, etc)
+    [ 414.489702341,7] HMI: LSU - ERAT multi hit
+
+  In future it would be good to unify this reporting, so Linux could
+  print something more useful. Until then, this gives some good data.
+
+NPU2/NVLink2 Fixes
+------------------
+- npu2: Add performance tuning SCOM inits
+
+  Peer-to-peer GPU bandwidth latency testing has produced some tunable
+  values that improve performance. Add them to our device initialization.
+
+  File these under things that need to be cleaned up with nice #defines
+  for the register names and bitfields when we get time.
+
+  A few of the settings are dependent on the system's particular NVLink
+  topology, so introduce a helper to determine how many links go to a
+  single GPU.
+- hw/npu2: Assign a unique LPARSHORTID per GPU
+
+  This gets used elsewhere to index items in the XTS tables.
+- NPU2: dump NPU2 registers on npu2 HMI
+
+  Due to the nature of debugging npu2 issues, folks want the
+  full list of NPU2 registers dumped when there's a problem.
+- npu2: Remove DD1 support
+
+  Major changes in the NPU between DD1 and DD2 necessitated a fair bit of
+  revision-specific code.
+
+  Now that all our lab machines are DD2, we no longer test anything on DD1
+  and it's time to get rid of it.
+
+  Remove DD1-specific code and abort probe if we're running on a DD1 machine.
+- npu2: Disable fast reboot
+
+  Fast reboot does not yet work right with the NPU. It's been disabled on
+  NVLink and OpenCAPI machines. Do the same for NVLink2.
+
+  This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
+  from the npu code to npu2.
+- npu2: Use unfiltered mode in XTS tables
+
+  The XTS_PID context table is limited to 256 possible pids/contexts. To
+  relieve this limitation, make use of "unfiltered mode" instead.
+
+  If an entry in the XTS_BDF table has the bit for unfiltered mode set, we
+  can just use one context for that entire bdf/lpar, regardless of pid.
+  Instead of searching the XTS_PID table, the NMMU checkout request
+  will simply use the entry indexed by lparshort id.
+
+  Change opal_npu_init_context() to create these lparshort-indexed
+  wildcard entries (0-15) instead of allocating one for each pid. Check
+  that multiple calls for the same bdf all specify the same msr value.
+
+  In opal_npu_destroy_context(), continue validating the bdf argument,
+  ensuring that it actually maps to an lpar, but no longer remove anything
+  from the XTS_PID table. If/when we start supporting virtualized GPUs, we
+  might consider actually removing these wildcard entries by keeping a
+  refcount, but keep things simple for now.
+
+CAPI/OpenCAPI
+-------------
+
+Since :ref:`skiboot-5.11-rc1`:
+
+- capi: Poll Err/Status register during CAPP recovery
+
+  This patch updates do_capp_recovery_scoms() to poll the CAPP
+  Err/Status control register, check whether CAPP-Recovery completed or
+  failed based on the indications in bits 1, 5 and 9, and then proceed
+  with the CAPP-Recovery SCOMs only if recovery completed successfully.
+  This prevents cases where we bring up the PCIe link while the recovery
+  sequencer on CAPP is still busy casting out cache lines.
+
+  If CAPP-Recovery doesn't complete successfully, an error is returned
+  from do_capp_recovery_scoms(), asking phb4_creset() to keep the PHB4
+  fenced and mark it as broken.
+
+  The loop that polls the Err/Status register will also log an error
+  on the PHB if it continues for more than 168ms, which is the maximum
+  time to failure for CAPP-Recovery.
+
+Since :ref:`skiboot-5.10`:
+
+- npu2-opencapi: Add OpenCAPI OPAL API calls
+
+  Add three OPAL API calls that are required by the ocxl driver.
+
+  - OPAL_NPU_SPA_SETUP
+
+    The Shared Process Area (SPA) is a table containing one entry (a
+    "Process Element") per memory context which can be accessed by the
+    OpenCAPI device.
+
+  - OPAL_NPU_SPA_CLEAR_CACHE
+
+    The NPU keeps a cache of recently accessed memory contexts. When a
+    Process Element is removed from the SPA, the cache for the link must be
+    cleared.
+
+  - OPAL_NPU_TL_SET
+
+    The Transaction Layer specification defines several templates for
+    messages to be exchanged on the link. During link setup, the host and
+    device must negotiate what templates are supported on both sides and at
+    what rates those messages can be sent.
+- npu2-opencapi: Train OpenCAPI links and set up devices
+
+  Scan the OpenCAPI links under the NPU, and for each link, reset the card,
+  set up a device, train the link and register a PHB.
+
+  Implement the necessary operations for the OpenCAPI PHB type.
+
+  For bringup, test and debug purposes, we allow an NVRAM setting,
+  "opencapi-link-training", that can be set to either disable link training
+  completely or to use the prbs31 test pattern.
+
+  To disable link training: ::
+
+    nvram -p ibm,skiboot --update-config opencapi-link-training=none
+
+  To use prbs31: ::
+
+    nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31
+- npu2-hw-procedures: Add support for OpenCAPI PHY link training
+
+  Unlike NVLink, which uses the pci-virt framework to fake a PCI
+  configuration space for NVLink devices, the OpenCAPI device model presents
+  us with a real configuration space handled by the device over the OpenCAPI
+  link.
+
+  As a result, we have to train the OpenCAPI link in skiboot before we do PCI
+  probing, so that config space can be accessed, rather than having link
+  training triggered by the Linux driver.
+- npu2-opencapi: Configure NPU for OpenCAPI
+
+  Scan the device tree for NPUs with OpenCAPI links and configure the NPU per
+  the initialisation sequence in the NPU OpenCAPI workbook.
+- capp: Make error in capp timebase sync a non-fatal error
+
+  Presently, when we encounter an error while synchronizing the CAPP timebase
+  with chip-tod at the end of enable_capi_mode(), we return an
+  error. This has two unintended consequences. First, it prevents
+  fast-reboot from being disabled even though CAPP is already enabled by this
+  point. Second, a failure during timebase sync is a non-fatal error for
+  CAPP initialization, as CAPP/PSL can continue working after this and an
+  AFU will only see an error when it tries to read the timebase value
+  from the PSL.
+
+  So this patch updates enable_capi_mode() not to return an error if the
+  call to chiptod_capp_timebase_sync() fails. The function will now
+  just log an error and continue with the CAPP init sequence. This
+  aligns the current implementation with the one in the kernel 'cxl'
+  driver, which also treats PSL timebase sync errors as non-fatal
+  init errors.
+- npu2-opencapi: Fix assert on link reset during init
+
+  We don't support resetting an opencapi link yet.
+
+  Commit fe6d86b9 ("pci: Make fast reboot creset PHBs in parallel")
+  tries resetting any PHB whose slot defines a 'run_sm' callback. It
+  raises an assert when applied to an opencapi PHB, as 'run_sm' calls
+  the 'freset' callback, which is not yet defined for opencapi.
+
+  Fix it for now by removing the currently useless definition of
+  'run_sm' on the opencapi slot. It will print a message in the skiboot
+  log because the PHB cannot be reset, which is correct. It will all go
+  away when we add support for resetting an opencapi link.
+- capp: Add lid definition for P9 DD-2.2
+
+  Update fsp_lid_map to include the CAPP ucode lid for phb4-chipid ==
+  0x202d1, which corresponds to the P9 DD-2.2 chip.
+- capp: Disable fast-reboot when capp is enabled
+
+
+PCI
+---
+
+Since :ref:`skiboot-5.11-rc1`:
+
+- phb4: Reset FIR/NFIR registers before PHB4 probe
+
+  The function phb4_probe_stack() resets the "ETU Reset Register" to
+  unfreeze the PHB before it performs MMIO access on the PHB. However, if
+  the FIR/NFIR registers are set on entry to this function, resetting the
+  "ETU Reset Register" won't unfreeze the PHB and it will remain fenced.
+  This leads to failure during the initial CRESET of the PHB, as MMIO
+  access is still not enabled, and an error message of the form below is
+  logged: ::
+
+    PHB#0000[0:0]: Initializing PHB4...
+    PHB#0000[0:0]: Default system config: 0xffffffffffffffff
+    PHB#0000[0:0]: New system config : 0xffffffffffffffff
+    PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff
+    PHB#0000[0:0]: Waiting for DLP PG reset to complete...
+    <snip>
+    PHB#0000[0:0]: Timeout waiting for DLP PG reset !
+    PHB#0000[0:0]: Initialization failed
+
+  This is especially seen during the MPIPL flow, where the SBE
+  quiesces and fences the PHB so that it doesn't stomp on main
+  memory. However, when skiboot enters phb4_probe_stack() after MPIPL,
+  the FIR/NFIR registers are set, forcing the PHB to re-enter fence after
+  the ETU reset is done.
+
+  So to fix this issue, the patch introduces new XSCOM writes to
+  phb4_probe_stack() to reset the FIR/NFIR registers before performing
+  the ETU reset that enables MMIO access to the PHB.
+
+Since :ref:`skiboot-5.10`:
+
+- pci: Reduce log level of error message
+
+  If a link doesn't train, we can end up with error messages like this: ::
+
+    [ 63.027261959,3] PHB#0032[8:2]: LINK: Timeout waiting for electrical link
+    [ 63.027265573,3] PHB#0032:00:00.0 Error -6 resetting
+
+  The first message is useful, but the second message is just debug output
+  from the core PCI code and is confusing to print to the console.
+
+  This reduces the second print to debug level so it's not shown on the
+  console by default.
+- Revert "platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots"
+
+  This reverts commit bda7cc4d0354eb3f66629d410b2afc08c79f795f.
+
+  Ben says:
+  It's on purpose that we do NOT compare the bus numbers,
+  they are always 0 in the slot table
+  we do a hierarchical walk of the tree, matching only the
+  devfn's along the way bcs the bus numbering isn't fixed
+  this breaks all slot naming etc... stuff on anything using
+  the "skiboot" slot tables (P8 opp typically)
+- core/pci-dt-slot: Fix booting with no slot map
+
+  Currently, if you don't have a slot map in the device tree in
+  /ibm,pcie-slots, you can crash with a backtrace like this: ::
+
+    CPU 0034 Backtrace:
+    S: 0000000031cd3370 R: 000000003001362c .backtrace+0x48
+    S: 0000000031cd3410 R: 0000000030019e38 ._abort+0x4c
+    S: 0000000031cd3490 R: 000000003002760c .exception_entry+0x180
+    S: 0000000031cd3670 R: 0000000000001f10 *
+    S: 0000000031cd3850 R: 00000000300b4f3e * cpu_features_table+0x1d9e
+    S: 0000000031cd38e0 R: 000000003002682c .dt_node_is_compatible+0x20
+    S: 0000000031cd3960 R: 0000000030030e08 .map_pci_dev_to_slot+0x16c
+    S: 0000000031cd3a30 R: 0000000030091054 .dt_slot_get_slot_info+0x28
+    S: 0000000031cd3ac0 R: 000000003001e27c .pci_scan_one+0x2ac
+    S: 0000000031cd3ba0 R: 000000003001e588 .pci_scan_bus+0x70
+    S: 0000000031cd3cb0 R: 000000003001ee74 .pci_scan_phb+0x100
+    S: 0000000031cd3d40 R: 0000000030017ff0 .cpu_process_jobs+0xdc
+    S: 0000000031cd3e00 R: 0000000030014cb0 .__secondary_cpu_entry+0x44
+    S: 0000000031cd3e80 R: 0000000030014d04 .secondary_cpu_entry+0x34
+    S: 0000000031cd3f00 R: 0000000030002770 secondary_wait+0x8c
+    [ 73.016947149,3] Fatal MCE at 0000000030026054 .dt_find_property+0x30
+    [ 73.017073254,3] CFAR : 0000000030026040
+    [ 73.017138048,3] SRR0 : 0000000030026054 SRR1 : 9000000000201000
+    [ 73.017198375,3] HSRR0: 0000000000000000 HSRR1: 0000000000000000
+    [ 73.017263210,3] DSISR: 00000008 DAR : 7c7b1b7848002524
+    [ 73.017352517,3] LR : 000000003002602c CTR : 000000003009102c
+    [ 73.017419778,3] CR : 20004204 XER : 20040000
+    [ 73.017502425,3] GPR00: 000000003002682c GPR16: 0000000000000000
+    [ 73.017586924,3] GPR01: 0000000031c23670 GPR17: 0000000000000000
+    [ 73.017643873,3] GPR02: 00000000300fd500 GPR18: 0000000000000000
+    [ 73.017767091,3] GPR03: fffffffffffffff8 GPR19: 0000000000000000
+    [ 73.017855707,3] GPR04: 00000000300b3dc6 GPR20: 0000000000000000
+    [ 73.017943944,3] GPR05: 0000000000000000 GPR21: 00000000300bb6d2
+    [ 73.018024709,3] GPR06: 0000000031c23910 GPR22: 0000000000000000
+    [ 73.018117716,3] GPR07: 0000000031c23930 GPR23: 0000000000000000
+    [ 73.018195974,3] GPR08: 0000000000000000 GPR24: 0000000000000000
+    [ 73.018278350,3] GPR09: 0000000000000000 GPR25: 0000000000000000
+    [ 73.018353795,3] GPR10: 0000000000000028 GPR26: 00000000300be6fb
+    [ 73.018424362,3] GPR11: 0000000000000000 GPR27: 0000000000000000
+    [ 73.018533159,3] GPR12: 0000000020004208 GPR28: 0000000030767d38
+    [ 73.018642725,3] GPR13: 0000000031c20000 GPR29: 00000000300b3dc6
+    [ 73.018737925,3] GPR14: 0000000000000000 GPR30: 0000000000000010
+    [ 73.018794428,3] GPR15: 0000000000000000 GPR31: 7c7b1b7848002514
+
+  This has been seen in the lab on a Witherspoon using the device tree
+  entry point (i.e. no HDAT).
+
+  This fixes the NULL pointer dereference.
+
+Bugs Fixed
+----------
+Since :ref:`skiboot-5.11-rc1`:
+
+- cpufeatures: Fix setting DARN and SCV HWCAP feature bits
+
+  DARN and SCV have been assigned AT_HWCAP2 (32-63) bits: ::
+
+    #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
+    #define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */
+
+  A cpufeatures-aware OS will not advertise these to userspace without
+  this patch.
+- xive: disable store EOI support
+
+  Hardware has limitations which would require putting a sync after each
+  store EOI to make sure the MMIO operations that change the ESB state
+  are ordered. This is a killer for performance, and the PHBs do not
+  support the sync. So remove the store EOI for the moment, until
+  hardware is improved.
+
+  Also, while we are changing the XIVE source flags, let's fix the
+  settings for the PHB4s, which should follow these rules:
+
+  - SHIFT_BUG for DD10
+  - STORE_EOI for DD20 and if enabled
+  - TRIGGER_PAGE for DDx0 and if not STORE_EOI
+
+Since :ref:`skiboot-5.10`:
+
+- xive: fix opal_xive_set_vp_info() error path
+
+  In case of error, opal_xive_set_vp_info() would return without
+  unlocking the xive object. This is most certainly a typo.
+- hw/imc: don't access homer memory if it was not initialised
+
+  This can happen under mambo, at least.
+- nvram: run nvram_validate() after nvram_reformat()
+
+  nvram_reformat() sets nvram_valid = true, but it does not set
+  skiboot_part_hdr. Call nvram_validate() instead, which sets
+  everything up properly.
+- dts: Zero struct to avoid using uninitialised value
+- hw/imc: Don't dereference possible NULL
+- libstb/create-container: munmap() signature file address
+- npu2-opencapi: Fix memory leak
+- npu2: Fix possible NULL dereference
+- occ-sensors: Remove NULL checks after dereference
+- core/ipmi-opal: Add interrupt-parent property for ipmi node on P9 and above.
+
+  dtc emits the following warning with newer (4.2+) kernels: ::
+
+    dts: Warning (interrupts_property): Missing interrupt-parent for /ibm,opal/ipmi
+
+  This fix adds the interrupt-parent property under the /ibm,opal/ipmi DT
+  node on P9 and above, which allows ipmi-opal to properly use the OPAL irqchip.
+
+Other fixes and improvements
+----------------------------
+
+- core/cpu: discover stack region size before initialising memory regions
+
+  Stack allocation first allocates a memory region sized to hold stacks
+  for all possible CPUs up to the maximum PIR of the architecture, zeros
+  the region, then initialises all stacks. Max PIR is 32768 on POWER9,
+  which is 512MB for stacks.
+
+  The stack region is then shrunk after CPUs are discovered, but this is
+  a bit of a hack, and it leaves a hole in the memory allocation regions
+  as it's done after mem regions are initialised. ::
+
+    0x000000000000..00002fffffff : ibm,os-reserve - OS
+    0x000030000000..0000303fffff : ibm,firmware-code - OPAL
+    0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
+    0x000031000000..000031bfffff : ibm,firmware-data - OPAL
+    0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
+    *** gap ***
+    0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
+    0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
+    0x000080000000..000080b3cdff : initramfs - OPAL
+    0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
+    0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS
+
+  This change moves zeroing into the per-cpu stack setup. The boot CPU
+  stack is set up based on the current PIR. Then the size of the stack
+  region is set, by discovering the maximum PIR of the system from the
+  device tree, before mem regions are initialised.
+
+  This results in all memory being accounted for within memory regions,
+  and less memory fragmentation of OPAL allocations.
+- Make gard display show that a record is cleared
+
+  When clearing gard records, Hostboot only modifies the record_id
+  portion to be 0xFFFFFFFF.
+  The rest of the entry is left unchanged, so without this change it
+  can be confusing for users to tell that the record they are looking
+  at is no longer valid.
+- Reserve OPAL API number for opal_handle_hmi2 function.
+- dts: spl_wakeup: Remove all workarounds in the spl wakeup logic
+
+  We coded a few workarounds in the special wakeup logic to handle
+  buggy firmware. Now that the firmware is fixed, remove them, as they
+  break the special wakeup protocol. As per the spec, we should not
+  de-assert before the assert is complete, so follow this protocol.
+- build: use thin archives rather than incremental linking
+
+  This changes the build system to use thin archives rather than
+  incremental linking for built-in.o, similar to a recent change to
+  Linux. built-in.o is renamed to built-in.a, and is created as a thin
+  archive with no index, for speed and size. All built-in.a are
+  aggregated into a skiboot.tmp.a, which is a thin archive built with an
+  index, making it suitable for linking. This is the input to the final
+  link.
+
+  The advantages in build size and linker code placement flexibility
+  are not as great for skiboot as for a bigger project like Linux, but
+  it's a conceptually better way to build, and is more compatible with
+  link time optimisation in toolchains, which might be interesting for
+  skiboot, particularly for size reductions.
+
+  The build tree was 34.4MB before this patch and is 23.1MB afterwards.
+- core/init: Assert when kernel not found
+
+  If the kernel doesn't load out of flash or there is nothing at
+  KERNEL_LOAD_BASE, we end up with an esoteric message as we try to
+  branch out of skiboot into nothing: ::
+
+    [ 0.007197688,3] INIT: ELF header not found. Assuming raw binary.
+    [ 0.014035267,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13029
+    [ 0.014042254,3] ***********************************************
+    [ 0.014069947,3] Fatal Exception 0xe40 at 0000000000000000
+    [ 0.014085574,3] CFAR : 00000000300051c4
+    [ 0.014090118,3] SRR0 : 0000000000000000 SRR1 : 0000000000000000
+    [ 0.014096243,3] HSRR0: 0000000000000000 HSRR1: 9000000000001000
+    [ 0.014102546,3] DSISR: 00000000 DAR : 0000000000000000
+    [ 0.014108538,3] LR : 00000000300144c8 CTR : 0000000000000000
+    [ 0.014114756,3] CR : 40002202 XER : 00000000
+    [ 0.014120301,3] GPR00: 000000003001447c GPR16: 0000000000000000
+
+  This improves the message and asserts in this case: ::
+
+    [ 0.014042685,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13049 bytes)
+    [ 0.014049556,0] FATAL: Kernel is zeros, can't execute!
+    [ 0.014054237,0] Assert fail: core/init.c:566:0
+    [ 0.014060472,0] Aborting!
+- core: Fix 'opal-runtime-size' property
+
+  We were populating 'opal-runtime-size' before calculating the actual
+  stack size, so we ended up with the wrong runtime size (e.g. on P9 it
+  showed ~540MB while the actual size is ~40MB). Note that only the
+  device tree property shows the wrong value; reserved-memory reflects
+  the correct size.
+
+  init_all_cpus() calculates and updates the actual stack size, so move
+  this function call before add_opal_node().
+
+- mambo: Add fw-feature flags for security related settings
+
+  Newer firmwares report some feature flags related to security
+  settings via HDAT. On real hardware skiboot translates these into
+  device tree properties. For testing purposes, just create the
+  properties manually in the tcl.
+
+  These values don't exactly match any actual chip revision, but the
+  code should not rely on any exact set of values anyway. We just define
+  the most interesting flags: those that, if toggled to "disable", will
+  change Linux behaviour. You can see the actual values in the hostboot
+  source in src/usr/hdat/hdatiplparms.H.
+
+  Also add an environment variable for easily toggling the top-level
+  "security on" setting.
+- direct-controls: mambo fix for multiple chips
+- libflash/blocklevel: Correct miscalculation in blocklevel_smart_erase()
+
+  If blocklevel_smart_erase() detects that the smart erase fits entirely
+  within one erase block, it takes an early bail-out path. In this path
+  it miscalculates where in the buffer the backend needs to read from to
+  perform the final write.
+- libstb/secureboot: Fix logging of secure verify messages.
+
+  Currently we log secure verify/enforce messages at PR_EMERG level
+  even when secureboot mode is not enabled. So reduce the log level to
+  PR_ERR when secureboot mode is OFF.
+
+Testing / Code coverage improvements
+------------------------------------
+
+Improvements in gcov support include support for newer GCCs, as well
+as easy export of the memory area you need to dump and feed to
+``extract-gcov``.
+
+- cpu_idle_job: relax a bit
+
+  This *dramatically* improves kernel boot time with GCOV builds,
+  from ~3 minutes between loading the kernel and switching the HILE
+  bit down to around 10 seconds.
+- gcov: Another GCC, another gcov tweak
+- Keep constructors with priorities
+
+  Fixes GCOV builds with gcc7, which uses this.
+- gcov: Add gcov data struct to sysfs
+
+  Extracting the skiboot gcov data is currently a tedious process which
+  involves taking a mem dump of skiboot and searching for the gcov_info
+  struct.
+  This patch adds the gcov struct to sysfs under /opal/exports, allowing
+  the data to be copied directly into userspace and processed.
+