path: root/core
Age | Commit message | Author | Files | Lines
2021-10-19 | pau: hmi scom dump | Christophe Lombard | 1 | -144/+132
This patch adds a new function to dump PAU registers when an HMI has been raised and an OpenCAPI link has been hit by an error. For each register, the SCOM address and the register value are printed. hmi.c has been redesigned to support the new PHB/PCIEX type (PAU OpenCAPI). The *npu* functions now support the NPU and PAU units of P8, P9 and P10 chips. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19 | pau: create phb | Christophe Lombard | 2 | -4/+9
Implement the necessary operations for the OpenCAPI PHB type and populate the associated device-tree properties. The OpenCAPI PCI config Addr/Data registers are reachable through the Generation-ID Registers MMIO BARs. The Config Address and Data registers are located at the following offsets from the AFU Config BAR plus 320 KB:
• Config Address for Brick 0 – Offset 0
• Config Data for Brick 0 – Offsets:
  ◦ 128 – 4-byte config register
• Config Address for Brick 1 – Offset 256
• Config Data for Brick 1 – Offsets:
  ◦ 384 – 4-byte config register
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
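The register layout above boils down to simple offset arithmetic. The C sketch below is a minimal illustration only: the macro and function names (PAU_CFG_REGS_BASE, pau_cfg_addr_offset(), pau_cfg_data_offset()) are hypothetical and not skiboot's actual code; only the 320 KB base and the 0/128/256/384 byte offsets come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names; only the numbers come from the commit message.
 * All offsets are relative to the AFU Config BAR.
 */
#define PAU_CFG_REGS_BASE    (320u * 1024u) /* registers start 320 KB into the BAR */
#define PAU_CFG_BRICK_STRIDE 256u           /* brick 1 registers sit 256 bytes after brick 0 */
#define PAU_CFG_DATA_OFFSET  128u           /* Config Data is 128 bytes after Config Address */

/* Offset of the Config Address register for brick 0 or 1. */
uint64_t pau_cfg_addr_offset(unsigned int brick)
{
	assert(brick <= 1); /* each PAU drives two OpenCAPI links (bricks) */
	return PAU_CFG_REGS_BASE + (uint64_t)brick * PAU_CFG_BRICK_STRIDE;
}

/* Offset of the 4-byte Config Data register for brick 0 or 1. */
uint64_t pau_cfg_data_offset(unsigned int brick)
{
	return pau_cfg_addr_offset(brick) + PAU_CFG_DATA_OFFSET;
}
```

With these definitions, brick 0 uses offsets 0x50000 (address) and 0x50080 (data), and brick 1 uses 0x50100 and 0x50180, all relative to the AFU Config BAR.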
2021-10-19 | pau: introduce support | Christophe Lombard | 1 | -0/+3
OpenCAPI for P10 is included in the P10 chip. This requires OCAPI-capable PHYs, Datalink Layer Logic and Transaction Layer Logic to be included. The PHYs are the physical connection to the OCAPI interconnect. The Datalink Layer provides link training. The Transaction Layer executes the cache-coherent and data movement commands on the P10 chip. The PAU provides the Transaction Layer functionality for the OCAPI link(s) on the P10 chip. The P10 PAU supports two OCAPI links. Six accelerator units (PAUs) are instantiated on the P10 chip, for a total of twelve OCAPI links. This patch adds the PAU OpenCAPI structure for supporting OpenCAPI 5. The hw/pau.c file contains the main PAU management functions. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19 | AWAN simulator support for P10 | Ryan Grimm | 3 | -6/+18
This patch enables Skiboot to initialize and Linux to boot to user space on the AWAN core and chip models. We need to distinguish between the core and chip models because the core models do not have an XSCOM unit, CHIPTOD, or RNG. The chip model does have them and they work. So, add a device_type property to the awan node to distinguish core from chip. Sample DTS files are provided for the core and chip models in external/awan. Just like Mambo, we need to return early from slw_init before trying to initialize SLW. Without an XSCOM unit in the device tree for the core model, the SLW code path eventually fails an assert due to the lack of chips. This commit defines QUIRK_AWAN where Mambo previously used QUIRK_MAMBO_CALLOUTS, so that both Mambo and the AWAN core model work. Also, fix up the chip quirks so the core model and chip model boot and initialize the appropriate units. Disable sreset and power management in a couple of spots because the chip model does not support stop with EC=1 and enter_p9_pm_state spins in the branch-to-self after stop. Provide an external/awan/README.md with a high-level view of booting in the environment. Signed-off-by: Ryan Grimm <grimm@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19 | Remove support for POWER8 DD1 | Nicholas Piggin | 1 | -14/+9
This significantly simplifies the SLW code. HILE is now always supported. Reviewed-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19 | cpu: add debug check in cpu_relax | Nicholas Piggin | 1 | -0/+6
If cpu_relax() is called when not at medium SMT priority, it will lose the prior priority and return at medium. Add a debug check to catch this, which would have flagged the previous bug. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19 | cpu: cpu_idle_job SMT priority fix | Nicholas Piggin | 1 | -1/+0
Calling cpu_relax resets the SMT priority to medium, so the idle loop does not run at the lowest priority. Use barrier() instead; this saves about 3 seconds on an SMT4 systemsim (mambo) boot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19 | interrupts: add_opal_interrupts avoid NULL dereference on P10 mambo | Nicholas Piggin | 1 | -1/+6
On P10, get_ics_phandle() calls xive2_get_phandle() directly. This results in a NULL dereference on mambo when xive2 is not set up. This was caught with the virtual memory boot patch on P10 mambo. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-09-09 | npu3: Remove GPU support on Swift | Frederic Barrat | 1 | -1/+0
npu3 was only used on the Swift platform to add support for GPUs (nvlink). The Swift platform has never left the lab and support for GPUs on it is pretty much dead. So let's remove it. The patch removes all related code. Device tree entries are no longer created, so in the very unlikely case that someone is still trying to boot it, the Linux nvlink discovery code should stay quiet. Tested by booting on Swift with no GPU. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-18 | interrupts: Do not advertise XICS support on P10 | Cédric Le Goater | 1 | -1/+11
We only support the XIVE interface. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | libpore: P10 stop-api support | Pratik Rajesh Sampat | 1 | -4/+27
Update libpore with the P10 STOP API. Add minor changes so that the P9 and P10 stop-api can co-exist in OPAL. These calls are required for STOP11 support on P10. STOP0, STOP2 and STOP3 on P10 do not lose full core state or SCOMs; stop-api based restore of SPRs or XSCOMs is required only for STOP11 on P10. STOP11 on P10 will be a limited lab test/stress feature and not a product feature (same as on P9). Co-authored-by: Pratik Rajesh Sampat <psampat@linux.ibm.com> Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com> Co-authored-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Co-authored-by: Ryan Grimm <grimm@linux.ibm.com> Signed-off-by: Ryan Grimm <grimm@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | hw/phb5: Add initial support | Jordan Niethe | 2 | -1/+5
The PHB5 logic on P10 is very close to the P9 version, so we keep our base phb4 implementation and just add the few differences within if statements. Signed-off-by: Jordan Niethe <jpn@ozlabs.au.ibm.com> [clg: misc cleanups and fixes ] Signed-off-by: Cédric Le Goater <clg@kaod.org> [Fixed compilation issue - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [Nick: Unify PHB4/PHB5 drivers ] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [Mikey: set default lane eq settings for phb5] Signed-off-by: Michael Neuling <mikey@neuling.org> [FB: squash commits + small cleanup ] Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | xive/p10: Add a XIVE2 driver | Cédric Le Goater | 2 | -3/+13
The XIVE2 interrupt controller of the POWER10 processor follows the same logic as on POWER9, but the HW interface has been largely revised. It has a new register interface, different BARs, extra VSDs, a new layout for the XIVE structures, and a set of new features which are described below.
The OPAL XIVE2 driver code activating this controller was duplicated from P9 for clarity, as the registers and structures have changed considerably. The same OPAL interface is implemented for OS compatibility and it should not impact existing Linux kernels, KVM included. Guest OSes are not impacted either. Support for new features will be implemented in time and will require new support from the OS.
* XIVE2 BARS
The interrupt controller BARs have a different layout, outlined below. Each sub-engine now has its own range, and the indirect TIMA access was replaced with a set of pages, one per CPU, under the IC BAR:
- IC BAR (Interrupt Controller)
  . 4 pages, one per sub-engine
  . 128 indirect TIMA pages
- TM BAR (Thread Interrupt Management Area)
  . 4 pages
- ESB BAR (ESB pages for IPIs)
  . up to 1TB
- END BAR (ESB pages for ENDs)
  . up to 2TB
- NVC BAR (Notification Virtual Crowd)
  . up to 128
- NVPG BAR (Notification Virtual Process and Group)
  . up to 1TB
- Direct mapped Thread Context Area (reads & writes)
OPAL does not use the grouping and crowd capability.
* Virtual Structure Tables
XIVE2 adds new table types and also changes the field layout of the END and NVP Virtualization Structure Descriptors.
- EAS
- END new layout
- NVT was split into:
  . NVP (Processor), 32B
  . NVG (Group), 32B
  . NVC (Crowd == P9 block group) 32B
- IC for remote configuration
- SYNC for cache injection
- ERQ for event input queue
The setup is slightly different on XIVE2 because the indexing has changed for some of the tables: either the block ID or the chip topology ID can be used.
* XIVE2 features
SCOM and MMIO registers have a new layout and XIVE2 adds new global capability and configuration registers.
The low-level hardware offers a set of new features, among which:
- cache injection mechanism
- 4 cache watch engines
- a configurable number of priorities: 1-8
- StoreEOI with load-after-store ordering is activated by default
- new sync/kill operations for cache operations
Other features will have some impact on the Hypervisor and guest OS when activated, but this is not required for initial support of the controller.
- Gen2 TIMA layout
- A P9-compat mode, or Gen1, TIMA toggle bit for SW compatibility
- Automatic Context save & restore
- increase to 24bit for VP number
- New escalation schemes: ESB, Adaptive, CPPR
POWER10 adds support for User interrupts. When configured, the XIVE2 controller can directly notify user processes using the Event Based Branch exception line of the thread. If the process is not running, the OS is notified through an escalation event. New OPAL and PAPR interfaces will be required and OS support needs to be studied.
* XIVE2 P9-compat mode, or Gen1
The thread interrupt management area (TIMA) is a set of pages mapped in the Hypervisor and in the guest OS address space, giving access to the interrupt thread context registers for interrupt management, ACK, EOI, CPPR, etc. XIVE2 slightly changes the TIMA layout, with extra bits for the new features and larger CAM lines, and the controller provides configuration switches for backward compatibility. This is called the XIVE2 P9-compat mode, or Gen1 TIMA. It impacts the layout of the TIMA and the availability of the internal features associated with it, Automatic Save & Restore for instance. Using a P9 layout also means setting the controller in such a mode at init time. The XIVE2 driver in OPAL chooses to initialize the XIVE2 controller with a XIVE2/P10 TIMA directly, because the layouts are compatible with Linux PowerNV and guest OS expectations. For KVM support, the OPAL calls abstract the HW interface and no assumption is made on the OS CAM line width.
* Activating new XIVE2 features
Everything related to OPAL internals, such as the use of the new cache sync mechanism, can be implemented in time without impact on the OS. Other features will require new device tree properties exposed to the OS and extra support in the OS. Automatic Context save & restore is one of the first features that should be looked at.
* XICS-over-XIVE driver (P8 compatibility)
The P8 emulation mode is an OPAL compat interface used for Linux kernels which did not have XIVE native support. This was useful for POWER9 bringup but much less so now. As it was adding a lot of complexity and reducing the interrupt controller resources, this mode is not available in the XIVE2 driver for POWER10. It will still be possible to add this compat mode in the future if required. The OS will have to reset the driver at boot time, like on POWER9.
* Impact on other drivers (PSI, PHB, NPU)
Interrupts are allocated in a very similar way. Each controller might have different ESB characteristics, StoreEOI support, 64K pages for PSI. Everything needed to support these differences is already in place. PHB5 will have support for "address-based trigger mode", probably in the DD2.0 time frame when verification is completed. When activated, the XIVE IC ESB pages will be used instead of the PHB ESB pages for a lower interrupt latency. LSIs will still use old-fashioned triggers without StoreEOI.
* Yet to be addressed:
- OPAL P10 interface incomplete (stop states)
- Clarify the PHB5 strategy regarding the use of the XIVE IC ESB pages instead of the PHB ones when address-based trigger mode is supported.
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | hdat/spira: Define ibm,primary-topology-index property per chip | Haren Myneni | 1 | -0/+3
HDAT provides the Topology ID table and the primary topology location on P10. This primary location points to the primary topology entry in the ID table, which contains the primary topology index; this index is used to define the paste base address per chip. This patch reads the Topology ID table and the primary topology location from hdata and retrieves the primary topology index from the ID table. Make this primary topology index value available through the ibm,primary-topology-index property per chip. VAS reads this property to set up the paste base address for each chip. Signed-off-by: Haren Myneni <haren@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | plat/qemu/p10: add a POWER10 platform | Cédric Le Goater | 1 | -0/+1
BMC is still defined as ast2500 but it should change to ast2600 when available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | cpufeatures: Add POWER10 support | Nicholas Piggin | 1 | -22/+82
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> [Folded Ravi's DAWR patch - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | p10: Workaround core recovery issue | Michael Neuling | 1 | -0/+36
This works around a core recovery issue in P10. The workaround involves the CME polling for a core recovery and performing the recovery procedure itself. For this to happen, the host leaves core recovery off (HID[5]) and then masks the PC system checkstop. This patch does this. Firmware starts skiboot with recovery already off, so we just leave it off for longer and then mask the PC system checkstop. This lengthens the window in which a core recovery can cause an xstop, but the window is still small and can still only occur during boot. Signed-off-by: Michael Neuling <mikey@neuling.org> [Added mambo check - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-06 | Initial POWER10 enablement | Nicholas Piggin | 8 | -66/+725
Co-authored-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Co-authored-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Co-authored-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Co-authored-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Co-authored-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Co-authored-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-08-04 | POWER9 Cleanups: de-assert SPW | Pratik R. Sampat | 1 | -0/+2
De-assert the special wakeup bits when the SPWU bit is set but the core is gated, to maintain a coherent special wakeup state. Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-30 | core/cpu: Initialize all cpu thread areas to avoid invalid memory access. | Mahesh Salgaonkar | 1 | -2/+24
Starting with P10, hostboot no longer clears system memory, apart from its own space. OPAL uses the memory at SKIBOOT_BASE + SKIBOOT_SIZE for the per-CPU stacks, indexed by PIR. With hostboot no longer clearing memory, this region may hold junk contents. Currently OPAL initializes CPU stack memory only for the PIRs found in the device tree; for the rest, the cpu thread contents are uninitialized. This sometimes causes the for_each_cpu* macros to return a cpu thread for a PIR/CPU that isn't present on the system. The for_each_cpu* macros iterate over the CPU stacks using the PIR as index and return the cpu thread pointer if state != cpu_state_no_cpu. For CPUs that are not found in the device tree, the state may hold a junk value, leading OPAL to access an invalid cpu thread area. This in turn leads to dereferencing pointers with junk values, causing a machine check (MCE) during OPAL init. Fix this by initializing all the cpu thread areas up to cpu_max_pir.
[ 182.049714372,3] ***********************************************
[ 182.049878580,3] Fatal MCE at 0000000030039738 .init_trace_buffers+0x21c MSR 9000000000201002
[ 182.049943811,3] Cause: load real address error
[ 182.049968681,3] Effective address: 0x480113a4791c4a50
[ 182.050000736,3] CFAR : 00000000300395b8 MSR : 9000000000201002
[ 182.050035376,3] SRR0 : 0000000030039738 SRR1 : 9000000000201002
[ 182.050072878,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000
[ 182.050117303,3] DSISR: 00000040 DAR : 480113a4791c4a50
[ 182.050149054,3] LR : 0000000030039744 CTR : 0000000000000000
[ 182.050182991,3] CR : 42000224 XER : 00000000
[ 182.050217262,3] GPR00: 000000003003962c GPR16: 0000000032d50000
[ 182.050255746,3] GPR01: 0000000032d53a50 GPR17: 0000000030003198
[ 182.050288081,3] GPR02: 000000003014cb00 GPR18: 0000000000000000
[ 182.050331474,3] GPR03: 0000000031c50000 GPR19: 0000000000000000
[ 182.050371934,3] GPR04: 0000000000000000 GPR20: 0000000000000000
[ 182.050416212,3] GPR05: ffffffffffffffff GPR21: 0000000000000001
[ 182.050454130,3] GPR06: 0000000000000005 GPR22: 00000000300f74eb
[ 182.050488053,3] GPR07: 0000000000000028 GPR23: 00000000000fffd8
[ 182.050522774,3] GPR08: 000000000000067f GPR24: 00000000000fff40
[ 182.050566878,3] GPR09: 480113a4791c4a18 GPR25: 0000000000000070
[ 182.050601524,3] GPR10: 00000000078b0353 GPR26: 00000000300f7527
[ 182.050640345,3] GPR11: 0000000000000000 GPR27: 00000000300f7516
[ 182.050680816,3] GPR12: 0000000042000222 GPR28: 000000003acd0000
[ 182.050724099,3] GPR13: 000000000025a908 GPR29: 000000003acd0000
[ 182.050759728,3] GPR14: 0000000000000000 GPR30: 0000000000000000
[ 182.050790430,3] GPR15: 0000000000000000 GPR31: 00000000301f0038
CPU 0228 Backtrace:
S: 0000000032d53d60 R: 000000003003962c .init_trace_buffers+0x110
S: 0000000032d53e30 R: 0000000030022f84 .main_cpu_entry+0x550
S: 0000000032d53f00 R: 00000000300031f8 not_fused+0x11c
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [Folded Nick's patch that added mark_all_secondary_cpus_absent() - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
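The failure mode can be illustrated with a simplified, hypothetical model in C: the struct, constants and helpers below (struct cpu_area_model, fill_with_junk(), init_all_cpu_areas(), count_present_cpus()) are illustrative stand-ins, not skiboot's real cpu_thread code. It shows how a PIR-indexed scan that trusts the state field miscounts CPUs when the areas hold junk, and how initializing every area up to the maximum PIR fixes that.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified, hypothetical model of skiboot's per-PIR cpu_thread areas.
 * Only the "state" field matters for the iteration described above.
 */
#define CPU_STATE_NO_CPU  0
#define CPU_STATE_PRESENT 1
#define MODEL_MAX_PIR     15

struct cpu_area_model {
	unsigned int pir;
	int state;
};

static struct cpu_area_model cpu_areas[MODEL_MAX_PIR + 1];

/* Simulate memory that was handed over without being cleared. */
void fill_with_junk(void)
{
	memset(cpu_areas, 0xa5, sizeof(cpu_areas));
}

/* The fix: give every area up to max_pir a defined "no CPU" state,
 * not just the PIRs that appear in the device tree. */
void init_all_cpu_areas(unsigned int max_pir)
{
	for (unsigned int pir = 0; pir <= max_pir; pir++) {
		cpu_areas[pir].pir = pir;
		cpu_areas[pir].state = CPU_STATE_NO_CPU;
	}
}

/* Mark a PIR that was found in the device tree. */
void mark_present(unsigned int pir)
{
	cpu_areas[pir].state = CPU_STATE_PRESENT;
}

/* Scan by PIR index, like the for_each_cpu* macros, counting every
 * area whose state is not "no CPU". */
unsigned int count_present_cpus(unsigned int max_pir)
{
	unsigned int n = 0;

	for (unsigned int pir = 0; pir <= max_pir; pir++)
		if (cpu_areas[pir].state != CPU_STATE_NO_CPU)
			n++;
	return n;
}
```

Before init_all_cpu_areas() runs, the scan reports every junk-filled area as a CPU; afterwards only the PIRs marked present are counted.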
2021-06-30 | fast-reboot: Fix the bonus cleanup_cpu_state() | Oliver O'Halloran | 1 | -2/+10
Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-30 | i2c,trace: Add I2C operation trace events | Oliver O'Halloran | 1 | -0/+32
Add support for tracing I2C transactions performed by skiboot. This covers both internally initiated I2C ops and those requested by the kernel via the OPAL API. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-30 | trace: Add nvram hack to use the old trace export behaviour | Oliver O'Halloran | 3 | -7/+18
Previously we put all the trace buffer exports in the exports/ node. However, there's one trace buffer for each core, so I moved them into a subdirectory since they were crowding up the place. Most kernels don't support recursively exporting subnodes though, so add a hack to restore the old behaviour for now. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [Fixed run-trace test case - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-30 | core/mce: POWER9 fix machine check decoding of async errors | Nicholas Piggin | 1 | -0/+13
Asynchronous machine check errors caused by a bad real address on a store, or by a foreign link timeout, come with the load/store bit (PPC bit 42) set in SRR1, but the cause is reported in SRR1 rather than DSISR, unlike other errors that have the load/store bit set. This behaviour was omitted from the POWER9 User Manual, but it has been confirmed as the expected one. Update the machine check decoder to match. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-30 | cpu: Add retry in cpu_pm_disable to kick cpus out of idle | Vaidyanathan Srinivasan | 1 | -2/+11
cpu_pm_idle sets pm_enabled = false and expects all CPUs to exit idle. This is needed to re-enter idle with the new settings. Right after cpu_bringup() we call copy_sreset_vector() and then cpu_set_sreset_enable(true). At this point some CPUs have yet to enter idle and hence miss the wakeup doorbell. This leads to cpu_pm_idle waiting forever. This pattern happens on some systems in fused-core mode. That the pm_enabled flag changes right in the middle of idle entry can be seen from the "cpu_idle_p9 called with pm disabled" traces. One way to fix this race is to retry the doorbell after a timeout. This patch implements a small timeout (a few seconds) and then issues the doorbell again, to kick any CPU that entered idle late after missing the pm_enabled = false flag. This checking loop runs at smt_lowest() priority, so the timeout count maps to a couple of seconds, which is sufficient to let the CPUs settle in idle, see the doorbell and exit. Example boot log:
[ 288.309322810,7] INIT: CPU PIR 0x000d called in
[ 288.309320768,7] INIT: CPU PIR 0x000b called in
[ 288.314603802,7] INIT: CPU PIR 0x0020 called in
[ 288.321303468,5] CPU: All 88 processors called in...
[ 288.315056796,6] cpu_idle_p9 called on cpu 0x024e with pm disabled
[ 288.321308091,6] cpu_idle_p9 called on cpu 0x0264 with pm disabled
[ 288.314424259,6] cpu_idle_p9 called on cpu 0x025b with pm disabled
[ 288.324928307,6] cpu_idle_p9 called on cpu 0x0065 with pm disabled
[ 305.207316004,6] cpu_pm_disable TIMEOUT on cpu 0x0261 to exit idle
[ 322.093298501,6] cpu_pm_disable TIMEOUT on cpu 0x0263 to exit idle
[ 338.491281028,6] cpu_pm_disable TIMEOUT on cpu 0x0265 to exit idle
[ 355.377263492,6] cpu_pm_disable TIMEOUT on cpu 0x0267 to exit idle
[ 372.263245960,6] cpu_pm_disable TIMEOUT on cpu 0x0269 to exit idle
[ 389.149228389,6] cpu_pm_disable TIMEOUT on cpu 0x026b to exit idle
[ 406.035210852,6] cpu_pm_disable TIMEOUT on cpu 0x026d to exit idle
[ 422.433193381,6] cpu_pm_disable TIMEOUT on cpu 0x026f to exit idle
[ 422.433277720,6] CHIPTOD: Calculated MCBS is 0x25 (Cfreq=2000000000 Tfreq=32000000)
Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> [Reworded commit message - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
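The retry logic can be sketched as a deterministic, single-threaded model in C. This is an assumption-laden illustration, not the patch itself: struct model_cpu and pm_disable_with_retry() are hypothetical, and one "round" stands in for a doorbell followed by the few-second timeout described above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, single-threaded model of the race. Real skiboot waits on a
 * timeout between doorbells; here each "round" stands for one doorbell
 * followed by one timeout period.
 */
struct model_cpu {
	bool idle;     /* asleep in the idle loop, waiting for a doorbell */
	bool entering; /* read pm_enabled == true earlier, not yet asleep */
};

/* A doorbell only wakes CPUs that are already asleep. */
static void send_doorbells(struct model_cpu *cpus, int n)
{
	for (int i = 0; i < n; i++)
		cpus[i].idle = false;
}

/* During the timeout, late CPUs finish entering idle. They missed the
 * doorbell that was sent before they went to sleep. */
static void let_late_cpus_settle(struct model_cpu *cpus, int n)
{
	for (int i = 0; i < n; i++) {
		if (cpus[i].entering) {
			cpus[i].entering = false;
			cpus[i].idle = true;
		}
	}
}

static bool all_awake(const struct model_cpu *cpus, int n)
{
	for (int i = 0; i < n; i++)
		if (cpus[i].idle || cpus[i].entering)
			return false;
	return true;
}

/* Returns how many doorbell rounds were needed, or -1 if some CPU was still
 * idle after max_rounds. max_rounds == 1 models the old, no-retry code,
 * which in reality would have waited forever instead of giving up. */
int pm_disable_with_retry(struct model_cpu *cpus, int n, int max_rounds)
{
	for (int round = 1; round <= max_rounds; round++) {
		send_doorbells(cpus, n);
		let_late_cpus_settle(cpus, n);
		if (all_awake(cpus, n))
			return round;
	}
	return -1;
}
```

A CPU that is still entering idle when the first doorbell is sent ends up asleep; with max_rounds set to 1 the model reports failure, while allowing a retry wakes it on the second round.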
2021-05-13 | hw/imc: move imc_init() towards end main_cpu_entry() | Madhavan Srinivasan | 1 | -3/+3
imc_init() checks the 24x7 microcode state at boot to verify that the microcode is in a proper state (running or paused). On larger systems, however, the loading of the 24x7 microcode by the OCC gets delayed, and as a result imc_init() removes the IMC devices from the device tree. Moving imc_init() towards the end of main_cpu_entry() works around this. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-12-15 | Fix possible deadlock with DEBUG build | Vasant Hegde | 1 | -2/+2
Sample output from Cédric:
-------------------------
[ 88.294111649,7] cpu_idle_p9 called on cpu 0x063c with pm disabled
[ 88.289365222,7] cpu_idle_p9 called on cpu 0x025f with pm disabled
[ 88.289900684,7] cpu_idle_p9 called on cpu 0x045f with pm disabled
[ 88.302621295,7] CHIPTOD: Base TFMR=0x2512000000000000
[ 88.289899701,7] cpu_idle_p9 called on cpu 0x0456 with pm disabled
LOCK ERROR: Deadlock detected @0x30402740 (state: 0x0000000400000001)
[ 88.332264757,3] ***********************************************
[ 88.332300051,3] < assert failed at core/lock.c:32 >
[ 88.332328282,3] .
[ 88.332347335,3] .
[ 88.332364894,3] .
[ 88.332377963,3] OO__)
[ 88.332395458,3] <"__/
[ 88.332412628,3] ^ ^
[ 88.332450246,3] Fatal TRAP at 00000000300286a0 .lock_error+0x64 MSR 9000000000021002
[ 88.332501812,3] CFAR : 00000000300414f4 MSR : 9000000000021002
[ 88.332536539,3] SRR0 : 00000000300286a0 SRR1 : 9000000000021002
[ 88.332574644,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000
[ 88.332610635,3] DSISR: 00000000 DAR : 0000000000000000
[ 88.332650628,3] LR : 0000000030028690 CTR : 00000000300f9fa0
[ 88.332684451,3] CR : 20002000 XER : 00000000
[ 88.332712767,3] GPR00: 0000000030028690 GPR16: 0000000032c98000
[ 88.332748046,3] GPR01: 0000000032c9b0a0 GPR17: 0000000000000000
[ 88.332784060,3] GPR02: 0000000030169d00 GPR18: 0000000000000000
[ 88.332822091,3] GPR03: 0000000032c9b310 GPR19: 0000000000000000
[ 88.332861357,3] GPR04: 0000000030041480 GPR20: 0000000000000000
[ 88.332897229,3] GPR05: 0000000000000000 GPR21: 0000000000000000
[ 88.332937051,3] GPR06: 0000000000000010 GPR22: 0000000000000000
[ 88.332968463,3] GPR07: 0000000000000000 GPR23: 0000000000000000
[ 88.333007333,3] GPR08: 000000000002cbb5 GPR24: 0000000000000000
[ 88.333041971,3] GPR09: 0000000000000000 GPR25: 0000000000000000
[ 88.333081073,3] GPR10: 0000000000000000 GPR26: 0000000000000003
[ 88.333114301,3] GPR11: 3839616263646566 GPR27: 0000000000000211
[ 88.333156040,3] GPR12: 0000000020002000 GPR28: 000000003042a134
[ 88.333189222,3] GPR13: 0000000000000000 GPR29: 0000000030402740
[ 88.333225638,3] GPR14: 0000000000000000 GPR30: 0000000000000001
[ 88.333259730,3] GPR15: 0000000000000000 GPR31: 0000000000000000
CPU 0211 Backtrace:
S: 0000000032c9b3b0 R: 0000000030028690 .lock_error+0x54
S: 0000000032c9b440 R: 0000000030028828 .add_lock_request+0xd0
S: 0000000032c9b4f0 R: 0000000030028a9c .lock_caller+0x8c
S: 0000000032c9b5a0 R: 0000000030021b30 .__mcount_stack_check+0x70
S: 0000000032c9b650 R: 00000000300fabb0 .list_check_node+0x1c
S: 0000000032c9b6f0 R: 00000000300fac98 .list_check+0x38
S: 0000000032c9b790 R: 00000000300289bc .try_lock_caller+0xac
S: 0000000032c9b830 R: 0000000030028ad8 .lock_caller+0xc8
S: 0000000032c9b8e0 R: 0000000030028d74 .lock_recursive_caller+0x54
S: 0000000032c9b980 R: 0000000030020cb8 .console_write+0x48
S: 0000000032c9ba30 R: 00000000300445a8 .vprlog+0xc8
S: 0000000032c9bc20 R: 0000000030044630 ._prlog+0x50
S: 0000000032c9bcb0 R: 0000000030029204 .cpu_idle_p9+0x74
S: 0000000032c9bd40 R: 0000000030029628 .cpu_idle_pm+0x4c
S: 0000000032c9bde0 R: 0000000030023fe0 .__secondary_cpu_entry+0xa0
S: 0000000032c9be70 R: 0000000030024034 .secondary_cpu_entry+0x40
S: 0000000032c9bf00 R: 0000000030003290 secondary_wait+0x8c
CPU 0x4:
  opal_run_pollers -> check_stacks -> takes stack_check_lock lock
  prlog -> console_write -> waits for con_lock
CPU 0x211:
  cpu_idle_p9 -> prlog -> console_write -> takes con_lock lock
  list_check_node -> tries to take stack_check_lock and hits deadlock.
I think we don't need to hold `stack_check_lock` while printing backtraces. Instead, it makes sense to hold the backtrace lock (bt_lock) while printing the output.
Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-11-27 | core/opal.c: sparse cleanup integer as NULL | Stewart Smith | 1 | -1/+1
Fixes: core/opal.c:418:61: warning: Using plain integer as NULL pointer Signed-off-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-11-27 | core/platform: Fallback to full_reboot if fast-reboot fails | Vasant Hegde | 1 | -1/+2
If fast reboot fails, we return to Linux with OPAL_SUCCESS. Current Linux code assumes the request succeeded and enters an infinite loop (see the Linux pnv_restart() code). This patch fixes the issue by returning OPAL_UNSUPPORTED if fast reboot fails. Alternatively, we could call full_reboot() directly, but I think it makes more sense to go back to Linux and report the failure, so that Linux falls back to a normal reboot request. Fixes: 10bbcd07 ("core/platform: Add an explicit fast-reboot type") Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
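The return-code contract described above can be modelled in a few lines of C. Everything here is a hypothetical stand-in (model_cec_reboot(), os_restart(), fast_reboot_ok and the MODEL_OPAL_* constants, whose values are assumed to match opal-api.h); the point is only that an honest error code lets the caller fall back instead of looping.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for OPAL return codes (values assumed to match opal-api.h). */
#define MODEL_OPAL_SUCCESS       0
#define MODEL_OPAL_UNSUPPORTED (-7)

enum reboot_type { REBOOT_FULL, REBOOT_FAST };

/* Hypothetical switch: whether the fast-reboot path succeeds. */
bool fast_reboot_ok;

/* Firmware side: report a failed fast reboot instead of claiming success.
 * (On a real system a successful reboot never returns at all.) */
int model_cec_reboot(enum reboot_type type)
{
	if (type == REBOOT_FAST && !fast_reboot_ok)
		return MODEL_OPAL_UNSUPPORTED;
	return MODEL_OPAL_SUCCESS;
}

/* OS side, loosely following the behaviour described for Linux: fall back
 * to a normal reboot when the fast-reboot request is rejected. Returns the
 * kind of reboot that was eventually requested successfully. */
enum reboot_type os_restart(void)
{
	if (model_cec_reboot(REBOOT_FAST) == MODEL_OPAL_SUCCESS)
		return REBOOT_FAST;
	model_cec_reboot(REBOOT_FULL);
	return REBOOT_FULL;
}
```

Had model_cec_reboot() returned MODEL_OPAL_SUCCESS on failure, as before the fix, os_restart() would have believed the fast reboot was under way and never requested the full one.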
2020-11-27 | core/cpu: fix next_ungarded_primary | Nicholas Piggin | 1 | -4/+2
next_ungarded_primary dereferences a NULL CPU pointer; this is undefined behaviour and ends up as an infinite loop. Fast reboot works again after this patch. Fixes: 98f5834253c7e ("cpu: Keep track of the "ec_primary" in big core more") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-10-01 | core/flash.c: add SECBOOT read and write support | Claudio Carvalho | 1 | -0/+126
In secure boot enabled systems, the petitboot Linux kernel verifies the OS kernel against x509 certificates that are wrapped in secure variables controlled by OPAL. These secure variables, as well as the updates submitted for them using userspace tools, are stored in the PNOR SECBOOT partition. This patch adds read and write support for the PNOR SECBOOT partition, in a similar fashion to NVRAM, so that OPAL can handle the secure variables. Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com> Signed-off-by: Eric Richter <erichte@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-01 | libstb/secureboot: OS Secure Boot is enabled only if FW secureboot is enabled | Nayna Jain | 1 | -1/+1
OS Secure Boot establishes a chain of trust from firmware to the OS. However, OS Secure Boot can only be secure if the chain of trust beneath it - from hardware to firmware - has been established by Firmware Secure Boot. This patch ensures that OS Secure Boot is enabled only if Firmware Secure Boot is enabled. Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Eric Richter <erichte@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-09-09 | stack: only print stack usage backtraces when we hit a new watermark | Oliver O'Halloran | 1 | -4/+4
With DEBUG=1 builds we use the mcount hook to instrument how much stack space we're using. If we detect that a function call has come within 2KB of the bottom of the stack, we currently print a backtrace. This can result in a huge amount of console IO in DEBUG=1 builds, which can cause op-test timeouts, etc. Printing a backtrace on each function call isn't terribly useful, and it ends up crowding out the backtrace that's printed when we hit a new stack usage watermark. The watermark should provide enough information to find and fix excessive stack usage issues, so drop the per-function backtrace printing and move the warning into the high-watermark check. This change is largely necessary because DEBUG=1 adds a backtrace save area to struct lock, which expands it to nearly 2KB. struct cpu_thread (which lives at the bottom of the per-thread stacks) contains three locks and an additional backtrace save area which is enabled when DEBUG=1. The extra space requirements result in cpu_thread ballooning from ~420 bytes to nearly 8KB. Any growth in cpu_thread also results in less stack space being available for the thread, so when DEBUG=1 is enabled we go from having a 16KB stack to an 8KB stack. Although this seems large, skiboot does have some fairly deep call chains (UART console flushing, TPM drivers, both combined) which can push the thread into the 2KB stack use warning zone. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
Maybe we should swap the locations of the normal and emergency stacks so cpu_thread takes space from the emergency stack rather than the normal one. The e-stack should only be used at runtime, where the call chains should be smaller.
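The reporting policy can be illustrated with a small C model. struct stack_watch and stack_check() are hypothetical names, and the sketch assumes usage is measured in bytes from the top of a fixed-size stack; it only demonstrates the rule of reporting on a new high-water mark rather than on every call inside the 2KB zone.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WARN_ZONE (2u * 1024u) /* warn within 2KB of the stack bottom */

/* Hypothetical model of the per-call stack check done from the mcount hook. */
struct stack_watch {
	uint32_t high_water;        /* deepest usage seen so far, in bytes */
	unsigned int reports;       /* backtraces "printed" for new watermarks */
	unsigned int zone_warnings; /* of those, how many were in the warning zone */
};

/* Called on every instrumented function entry with the current usage. */
void stack_check(struct stack_watch *w, uint32_t used, uint32_t stack_size)
{
	uint32_t remaining = used < stack_size ? stack_size - used : 0;

	if (used <= w->high_water)
		return; /* not a new watermark: stay quiet, even inside the zone */

	w->high_water = used;
	w->reports++; /* this is where a backtrace would be printed */
	if (remaining < STACK_WARN_ZONE)
		w->zone_warnings++; /* ...and flagged as a stack usage warning */
}
```

Feeding it usages of 1000, 500, 7000, 7000, 7000 and 7100 bytes against an 8KB stack yields three reports and two zone warnings, whereas printing on every call inside the zone would have produced four warnings.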
2020-08-07 | Enable fused core mode support in OPAL | Vaidyanathan Srinivasan | 1 | -4/+0
Previous commit 482f18adf21eeb5f6ce2a93334725509a8f6f0cd added a check for fused core mode that bailed out. The check can be removed since fused core mode is now supported in OPAL. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-08-07 | Add POWER9 Cumulus processor PVR type | Vaidyanathan Srinivasan | 1 | -0/+19
Add PVR checks and feature mapping for POWER9 Cumulus chip. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-08-07 | cpu: Make cpu_get_core_index() return the fused core number | Benjamin Herrenschmidt | 2 | -1/+14
cpu_get_core_index() currently uses pir_to_core_id(), which always returns an EC number (i.e., a normal core number), even in fused core mode. This is inconsistent with cpu_get_thread_index(), which returns a thread within a fused core (0...7) on P9. So let's make things consistent and document it. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
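A sketch of the consistency rule, assuming (the commit does not state this) that fused core N is built from the adjacent EC cores 2N and 2N+1. core_index() and threads_per_core() are hypothetical helpers, not skiboot's cpu_get_core_index(). The point is that the core index and the thread index should describe the same kind of core.

```c
#include <stdbool.h>

/*
 * Hypothetical: map an EC (small core) number to the core index that
 * matches the thread numbering in use. Assumes fused core N = EC 2N and
 * EC 2N + 1.
 */
static unsigned int core_index(unsigned int ec_id, bool fused_core_mode)
{
	return fused_core_mode ? ec_id >> 1 : ec_id;
}

/* Threads per core paired with that index: 0..7 fused, 0..3 otherwise. */
static unsigned int threads_per_core(bool fused_core_mode)
{
	return fused_core_mode ? 8 : 4;
}
```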
2020-08-07 | direct-ctl: Use the EC primary for special wakeups | Benjamin Herrenschmidt | 1 | -4/+4
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-08-07 | cpu: Keep track of the "ec_primary" in big core more | Benjamin Herrenschmidt | 1 | -6/+14
The "EC" primary is the primary thread of an EC, i.e., the corresponding small core "half" of the big core where the thread resides. The direct controls will need it to target the right half when doing special wakeups, among other things. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-08-07 | chip: Fix pir_to_thread_id for fused cores | Benjamin Herrenschmidt | 1 | -1/+1
pir_to_core_id() and pir_to_thread_id() are extensively used by the direct controls code and are expected to return the "normal" (non-fused, aka EC) core/thread IDs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-08-07 | Add basic P9 fused core support | Ryan Grimm | 2 | -11/+39
P9 cores can be configured into fused core mode, where two core chiplets function as a single 8-threaded core. So, in fused core mode, bump the thread count from four to eight in boot_entry and in cpu_thread_count in init_boot_cpu. The HID, AMOR, TSCR and RPR must be loaded by the first active thread on each core chiplet, which loads the copy for that chiplet. So, send thread 1 of a fused core to init_shared_sprs in boot_entry. The code checks for fused core mode in the core thread state register and records it in a field in struct cpu_thread. This flag is checked when updating the HID and in the XIVE code when setting the special BAR. For XSCOM, the core ID is the non-fused EX, so create macros to arrange the bits. It's fairly verbose but somewhat readable. This was tested on a P9 ZZ with 16 fused cores and ran HTX for over 24 hours. Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
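The boot-time rule above can be sketched as a predicate. This is only an illustration: skiboot does this in boot_entry assembly, and loads_shared_sprs() and boot_thread_count() are hypothetical names. It assumes, as sending thread 1 to init_shared_sprs implies, that thread 1 of a fused core is the first active thread on the second chiplet.

```c
#include <stdbool.h>

/*
 * Hypothetical: should this thread (index within its core) load the
 * per-chiplet copy of HID/AMOR/TSCR/RPR? Thread 0 always does; in fused
 * core mode thread 1 does too, for the second chiplet.
 */
static bool loads_shared_sprs(unsigned int thread, bool fused_core_mode)
{
	if (thread == 0)
		return true;
	return fused_core_mode && thread == 1;
}

/* Threads brought up per core: four normally, eight when fused. */
static unsigned int boot_thread_count(bool fused_core_mode)
{
	return fused_core_mode ? 8 : 4;
}
```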
2020-06-30 | mpipl: Move opal_mpipl_save_crashing_pir() call to platform specific code | Vasant Hegde | 2 | -11/+1
Commit 34664746 moved the opal_mpipl_save_crashing_pir() function call from platform specific code to the generic assert() path. I completely missed taking care of all terminate paths :-( This broke `opalcore` on Linux kernel initiated MPIPL, because: - Linux initiated MPIPL calls the platform termination function directly - the ELF core format needs the crashing CPU details to generate a proper core. Hence I think it makes sense to move this back to the platform specific terminate handler code. Today we have two ways to trigger MPIPL, depending on the service processor: - On BMC systems we raise the SBE S0 interrupt - On FSP systems we execute the `attn` instruction. If we add new ways to trigger MPIPL in future, we will have to add platform specific support code anyway, so it's fine to move this to platform specific code. One alternative is to make this call in every code path before calling platform.terminate, but that is more complicated than the above approach. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-30 | mpipl: Delay MPIPL registration until OPAL init is complete | Vasant Hegde | 2 | -3/+14
If OPAL boot fails after MPIPL init (opal_mpipl_init()), we do an MPIPL boot instead of a reboot. The BMC is not aware of MPIPL, so this may result in a continuous MPIPL loop (boot -> crash -> MPIPL -> boot). If OPAL boot fails (before loading the kernel), it's better to reboot, so that the BMC can detect `n` boot failures (generally n = 3) and stop booting. That way we avoid the continuous loop. This patch moves MPIPL init to the end of the init process (just before starting the kernel), so that if OPAL fails to boot we do a normal reboot. This patch also introduces a new function, is_mpipl_enabled(), to detect whether MPIPL is enabled. The assert() path now checks this function instead of the `dump` DT node, which ensures we do not trigger MPIPL until opal_mpipl_init() is complete. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
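A simplified sketch of the gating described above. Only opal_mpipl_init() and is_mpipl_enabled() are named in the commit. The mpipl_enabled flag, enum crash_action and crash_action() are hypothetical stand-ins, and the real functions do considerably more than this.

```c
#include <stdbool.h>

/* Hypothetical flag: set only once MPIPL registration has completed. */
static bool mpipl_enabled;

static bool is_mpipl_enabled(void)
{
	return mpipl_enabled;
}

/* Simplified stand-in, called at the very end of OPAL init. */
static void opal_mpipl_init(void)
{
	/* ... register the dump areas with the service processor ... */
	mpipl_enabled = true;
}

enum crash_action { CRASH_REBOOT, CRASH_MPIPL };

/*
 * Hypothetical decision taken on the assert() path: before MPIPL init
 * completes, fall back to a normal reboot so the BMC can count boot
 * failures instead of looping through MPIPL.
 */
static enum crash_action crash_action(void)
{
	return is_mpipl_enabled() ? CRASH_MPIPL : CRASH_REBOOT;
}
```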
2020-06-30 | dt: Set new property length in dt_resize_property() | Thiago Jung Bauermann | 2 | -1/+1
All callers of dt_resize_property() need to set the new property length after calling it. append_chip_id() wasn't doing it, which caused this assert when booting my machine: [ 136.387213258,3] Unable to use memory range 0 from MSAREA 0 [ 136.387356677,3] Unable to use memory range 0 from MSAREA 2 [ 136.387408390,3] *********************************************** [ 136.387454272,3] < assert failed at core/device.c:605 > [ 136.387493225,3] . [ 136.387512799,3] . [ 136.387534056,3] . [ 136.387550294,3] OO__) [ 136.387579530,3] <"__/ [ 136.387605086,3] ^ ^ [ 136.387719329,3] Fatal TRAP at 0000000030028a18 .dt_property_set_cell+0x34 MSR 9000000000021002 [ 136.387801707,3] CFAR : 00000000300bfd3c MSR : 9000000000001000 [ 136.387847032,3] SRR0 : 0000000030028a18 SRR1 : 9000000000021002 [ 136.387893119,3] HSRR0: 0000000030012524 HSRR1: 9000000000001000 [ 136.387936830,3] DSISR: 40000000 DAR : 00000002019df000 [ 136.387983570,3] LR : 00000000300bfd40 CTR : 0000000000000000 [ 136.388046031,3] CR : 20004202 XER : 00000000 [ 136.388094553,3] GPR00: 00000000300bfd40 GPR16: 0000000000000001 [ 136.388139862,3] GPR01: 0000000031e536e0 GPR17: 00000000300ca3c9 [ 136.388181131,3] GPR02: 0000000030121200 GPR18: 0000000030103e1c [ 136.388224105,3] GPR03: 000000003053fc60 GPR19: 0000000000000008 [ 136.388270356,3] GPR04: 0000000000000001 GPR20: 000000003053fba0 [ 136.388313950,3] GPR05: 0000000000000008 GPR21: 0000000000000001 [ 136.388363021,3] GPR06: 0000000031e50060 GPR22: 0000000000000001 [ 136.388416754,3] GPR07: 0000000000000000 GPR23: 0000000000000000 [ 136.388465729,3] GPR08: 0000000000000000 GPR24: 0000000000000000 [ 136.388508156,3] GPR09: 0000000000000004 GPR25: 0000000031204060 [ 136.388556203,3] GPR10: 0000000000000008 GPR26: 000000003120402c [ 136.388599076,3] GPR11: 0000000000000000 GPR27: 0000000030010000 [ 136.388642108,3] GPR12: 0000000040004204 GPR28: 0000000000000002 [ 136.388694064,3] GPR13: 0000000031e50000 GPR29: 0000000031203ee0 [ 136.388743298,3] GPR14: 
00000000300cbf03 GPR30: 0000000031202e80 [ 136.388797131,3] GPR15: 00000000300cc01c GPR31: 0000000030103a33 CPU 0048 Backtrace: S: 0000000031e539e0 R: 0000000030028874 .dt_resize_property+0x28 S: 0000000031e53a60 R: 00000000300bfd40 .memory_parse+0xd84 S: 0000000031e53c40 R: 00000000300bc4d8 .parse_hdat+0xed0 S: 0000000031e53e30 R: 000000003001504c .main_cpu_entry+0x1ac S: 0000000031e53f00 R: 0000000030002760 boot_entry+0x1b0 Avoid further appearances of the unidentified animal of doom by making dt_resize_property() do the length updating itself, freeing its callers from that need. Suggested-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
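The fix moves the length bookkeeping into the resize helper itself. The sketch below illustrates that idea with a simplified, hypothetical property struct; struct prop, prop_resize() and prop_append_u32() are not skiboot's dt_property or dt_resize_property(). The appended cell is stored in host byte order for simplicity.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified device-tree property. */
struct prop {
	size_t len;
	char *data;
};

/*
 * Resize the property AND record the new length, so callers can no
 * longer forget to update it. New bytes are zeroed.
 */
static int prop_resize(struct prop *p, size_t new_len)
{
	if (new_len == 0) {
		free(p->data);
		p->data = NULL;
		p->len = 0;
		return 0;
	}

	char *d = realloc(p->data, new_len);
	if (!d)
		return -1; /* old data and length are left untouched */
	if (new_len > p->len)
		memset(d + p->len, 0, new_len - p->len);
	p->data = d;
	p->len = new_len;
	return 0;
}

/* Example caller: append one 32-bit cell without touching p->len itself. */
static int prop_append_u32(struct prop *p, uint32_t v)
{
	size_t old = p->len;

	if (prop_resize(p, old + sizeof(v)))
		return -1;
	memcpy(p->data + old, &v, sizeof(v));
	return 0;
}
```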
2020-06-17 | test: Do gcov builds as a seperate pass | Oliver O'Halloran | 1 | -2/+1
We only really use the gcov output when doing the coverage report as part of the "docs" CI builds. For development it's useful to just run the unit tests, so make sure the "check" and "coverage" targets are separate. This also speeds up our CI builds: those jobs already do a separate GCOV pass, so building and running the GCOV binaries during the check pass is redundant. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | core/mce: add support for decoding and handling machine checks | Nicholas Piggin | 3 | -6/+240
This provides an initial facility to decode machine checks into human-readable strings, plus a minimum amount of metadata that a handler has to understand in order to deal with the machine check. For now this is only used by skiboot to make MCE reporting nicer, and for an ERAT flush recovery attempt, which is more about code coverage than being genuinely helpful. *********************************************** Fatal MCE at 00000000300c9c0c .memcmp+0x3c MSR 9000000000141002 Cause: instruction fetch TLB multi-hit error Effective address: 0x00000000300c9c0c ... The intention is to subsequently provide an OPAL API with this information that will enable an OS to implement a machine-independent OPAL machine check driver. The code and data tables are derived from Linux code that I wrote, so relicensing is okay. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
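A table-driven decoder is one way to turn raw status bits into the "Cause:" string plus a small amount of handling metadata. The sketch below uses PLACEHOLDER mask/value pairs, not the real POWER9 SRR1/DSISR encodings. struct mce_entry, mce_table and mce_decode() are hypothetical, and the fatal versus ERAT-flush pairing is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

enum mce_action { MCE_FATAL, MCE_FLUSH_ERAT };

struct mce_entry {
	uint64_t mask;          /* bits to compare (placeholder) */
	uint64_t value;         /* expected value under mask (placeholder) */
	const char *cause;      /* human-readable description */
	enum mce_action action; /* minimal metadata for the handler */
};

/* PLACEHOLDER encodings for illustration; not real hardware values. */
static const struct mce_entry mce_table[] = {
	{ 0x0f, 0x01, "instruction fetch TLB multi-hit error", MCE_FATAL },
	{ 0x0f, 0x02, "instruction fetch ERAT multi-hit error", MCE_FLUSH_ERAT },
};

/* Return the first matching entry, or NULL for an unrecognised MCE. */
static const struct mce_entry *mce_decode(uint64_t reg)
{
	for (size_t i = 0; i < sizeof(mce_table) / sizeof(mce_table[0]); i++)
		if ((reg & mce_table[i].mask) == mce_table[i].value)
			return &mce_table[i];
	return NULL;
}
```

Keeping the strings and actions in one table means a future OPAL API could return the same entry to the OS rather than a skiboot-only message.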
2020-06-11 | core: interrupt markers for stack traces | Nicholas Piggin | 9 | -7/+20
Use a magic marker in the exception stack frame, which the unwinder uses to decode the interrupt type and NIA. The example trace below comes from a modified skiboot that uses virtual memory, but any interrupt type will appear similarly. CPU 0000 Backtrace: S: 0000000031c13580 R: 0000000030028210 .vm_dsi+0x360 S: 0000000031c13630 R: 000000003003b0dc .exception_entry+0x4fc S: 0000000031c13830 R: 0000000030001f4c exception_entry_foo+0x4 --- Interrupt 0x300 at 000000003002431c --- S: 0000000031c13b40 R: 000000003002430c .make_free.isra.0+0x110 S: 0000000031c13bd0 R: 0000000030025198 .mem_alloc+0x4a0 S: 0000000031c13c80 R: 0000000030028bac .__memalign+0x48 S: 0000000031c13d10 R: 0000000030028da4 .__zalloc+0x18 S: 0000000031c13d90 R: 000000003002fb34 .opal_init_msg+0x34 S: 0000000031c13e20 R: 00000000300234b4 .main_cpu_entry+0x61c S: 0000000031c13f00 R: 00000000300031b8 boot_entry+0x1b0 --- OPAL boot --- Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [oliver: the new stackentry fields made our test heaps too small] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> fixup! core: interrupt markers for stack traces
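A sketch of how an unwinder can use such a marker: walk the back chain and, whenever a frame carries the magic value, emit an "Interrupt ... at ..." line from the type and NIA saved in that frame. The frame layout, field names and magic value here are hypothetical, not skiboot's struct stack_frame.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical marker value (ASCII "INTR"); arbitrary for this sketch. */
#define INT_FRAME_MAGIC 0x494e5452ULL

/* Simplified frame; real frames have a different layout. */
struct frame {
	const struct frame *backchain; /* caller's frame, NULL at the top */
	uint64_t lr;                   /* saved return address */
	uint64_t magic;                /* INT_FRAME_MAGIC for exception frames */
	uint64_t type;                 /* interrupt vector, e.g. 0x300 */
	uint64_t nia;                  /* interrupted instruction address */
};

/* Print the trace and return how many interrupt markers were found. */
static int unwind(const struct frame *f)
{
	int markers = 0;

	for (; f; f = f->backchain) {
		printf("S: %p R: %016llx\n", (const void *)f,
		       (unsigned long long)f->lr);
		if (f->magic == INT_FRAME_MAGIC) {
			printf("--- Interrupt 0x%llx at %016llx ---\n",
			       (unsigned long long)f->type,
			       (unsigned long long)f->nia);
			markers++;
		}
	}
	return markers;
}
```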
2020-06-11 | move opal_branch_table, opal_num_args to .rodata section | Nicholas Piggin | 1 | -6/+6
.head is for code and data which must reside at a fixed low address, mainly entry points. These tables are moved into .rodata instead. Although they are modified at runtime, this makes it possible to write-protect them in a later patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | fast-reboot: improve fast reboot sequence | Nicholas Piggin | 1 | -119/+120
The current fast reboot sequence is not as robust as it could be. It is this: - The fast reboot CPU stops all other threads with direct control xscoms; - it disables ME (machine checks become checkstops); - it resets its SPRs (to get HID[HILE] for machine check interrupts) and overwrites the exception vectors with our vectors, including a special fast reboot sreset vector that fixes endianness (because the OS owns HILE); - then the fast reboot CPU enables ME. At this point the fast reboot CPU can handle machine checks with the skiboot handler, but no other cores can if the OS had switched HILE (they'll execute garbled byte-swapped instructions and crash badly). - Then all CPUs run various cleanups, XIVE, resync TOD, etc. - The boot CPU, which is not necessarily the same as the fast reboot initiator CPU, runs xive_reset. This is a lot of code to run, including locking and xscoms, with machine checks inoperable. - Finally, secondaries are released and everyone sets SPRs and enables ME. Secondaries on other cores don't wait for their thread 0 to set shared SPRs before calling into the normal OPAL secondary code. This is mostly okay because the boot CPU pauses here until all secondaries reach their idle code, but it's not nice to release them out of the fast reboot code in a state with various per-core SPRs in flux. Fix this by having the fast reboot CPU not disable ME or reset its SPRs, because machine checks can still be handled by the OS. Then wait until all CPUs have been called into fast reboot and are spinning with ME disabled; only then reset any SPRs and copy the remaining exception vectors. At that point skiboot has taken over machine check handling, and the CPUs enable ME before cleaning up other things. This way, the region with ME disabled and SPRs and exception vectors in flux is kept absolutely minimal, with no xscoms, no MMIOs, and few significant memory modifications, and all threads kept closely in step.
There are no windows where a machine check interrupt may execute garbage due to mismatched HILE on any CPU. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
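The ordering of the new sequence can be sketched as a counting rendezvous. cpus_spinning, skiboot_owns_mce and the three functions are hypothetical names, not skiboot's code, and the SPR reset and vector copy are only marked by a comment. The sketch shows that nothing is touched until every thread has checked in, and that ME is re-enabled only after skiboot has taken over.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical rendezvous state (static storage, so zero-initialised). */
static atomic_int cpus_spinning;     /* threads parked with MSR[ME] clear */
static atomic_bool skiboot_owns_mce; /* SPRs reset and vectors copied */

/* Each CPU calls this once it is spinning in fast reboot with ME off. */
static void fast_reboot_park(void)
{
	atomic_fetch_add(&cpus_spinning, 1);
}

/*
 * Polled by the initiator. Until every CPU has parked, the OS keeps
 * handling machine checks and no SPRs or vectors are modified.
 */
static bool fast_reboot_try_takeover(int nr_cpus)
{
	if (atomic_load(&cpus_spinning) < nr_cpus)
		return false;
	/* Minimal window: reset SPRs (HID[HILE] etc.) and copy the remaining
	 * exception vectors here, with no xscoms or MMIOs. */
	atomic_store(&skiboot_owns_mce, true);
	return true;
}

/* Parked CPUs wait for this before re-enabling ME and doing the slower
 * cleanup (XIVE, TOD resync, ...). */
static bool fast_reboot_may_enable_me(void)
{
	return atomic_load(&skiboot_owns_mce);
}
```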
2020-06-11 | fast-reboot: don't back up old vectors upon fast reboot | Nicholas Piggin | 1 | -5/+5
Initial boot already saved the original exception vectors to old_vectors; copying them again upon fast reboot would overwrite old_vectors with arbitrary vectors set up by the current OS. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
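One way to express the rule is that the pristine copy is taken once, on cold boot, and never refreshed. The sketch below uses a hypothetical "saved" flag and a made-up VEC_SIZE. Per the commit, skiboot achieves the same effect by not doing the copy on the fast reboot path.

```c
#include <stdbool.h>
#include <string.h>

#define VEC_SIZE 0x100 /* hypothetical size of the saved vector area */

static char old_vectors[VEC_SIZE];
static bool old_vectors_saved; /* hypothetical flag for this sketch */

/*
 * Save the original exception vectors on the first (cold) boot only. On
 * fast reboot the live vectors were installed by the OS, so copying them
 * again would clobber the pristine copy.
 */
static void save_old_vectors(const char *live_vectors)
{
	if (old_vectors_saved)
		return;
	memcpy(old_vectors, live_vectors, VEC_SIZE);
	old_vectors_saved = true;
}
```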
2020-06-11 | fast-reboot: add missing clear memory fallback | Nicholas Piggin | 1 | -2/+8
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>