Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch adds a new function to dump PAU registers when an HMI has been
raised and an OpenCAPI link has been hit by an error.
For each register, the scom address and the register value are printed.
The hmi.c has been redesigned in order to support the new PHB/PCIEX
type (PAU OpenCapi). Now, the *npu* functions support NPU and PAU units of
P8, P9 and P10 chips.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Implement the necessary operations for the OpenCAPI PHB type and
set the associated device-tree properties.
The OpenCapi PCI config Addr/Data registers are reachable through
the Generation-ID Registers MMIO BARS.
The Config Address and Data registers are located at the following offsets
from the AFU Config BAR plus 320 KB.
• Config Address for Brick 0 – Offset 0
• Config Data for Brick 0 – Offsets:
◦ 128 – 4-byte config register
• Config Address for Brick 1 – Offset 256
• Config Data for Brick 1 – Offsets:
◦ 384 – 4-byte config register
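The offsets listed above can be sketched as a pair of small helpers. This is an illustrative sketch and not the code from the patch: every macro and function name is hypothetical, and `afu_config_bar` stands for whatever base address the AFU Config BAR resolves to. Only the numeric offsets come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the numeric offsets come from the commit message. */
#define CFG_REGS_BASE_SKETCH	(320 * 1024)	/* AFU Config BAR plus 320 KB */
#define CFG_BRICK_STRIDE_SKETCH	256		/* brick 1 registers start at offset 256 */
#define CFG_ADDR_OFF_SKETCH	0		/* Config Address register */
#define CFG_DATA_OFF_SKETCH	128		/* 4-byte Config Data register */

/* Address of the Config Address register for brick 0 or 1. */
static inline uint64_t cfg_addr_reg_sketch(uint64_t afu_config_bar,
					   unsigned int brick)
{
	assert(brick < 2);	/* the text only describes bricks 0 and 1 */
	return afu_config_bar + CFG_REGS_BASE_SKETCH +
		(uint64_t)brick * CFG_BRICK_STRIDE_SKETCH + CFG_ADDR_OFF_SKETCH;
}

/* Address of the 4-byte Config Data register for brick 0 or 1. */
static inline uint64_t cfg_data_reg_sketch(uint64_t afu_config_bar,
					   unsigned int brick)
{
	assert(brick < 2);
	return afu_config_bar + CFG_REGS_BASE_SKETCH +
		(uint64_t)brick * CFG_BRICK_STRIDE_SKETCH + CFG_DATA_OFF_SKETCH;
}
```

With a BAR value of 0, brick 1 resolves to 320 KB + 256 for the address register and 320 KB + 384 for the data register, matching the list.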
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
OpenCapi for P10 is included in the P10 chip. This requires OCAPI capable
PHYs, Datalink Layer Logic and Transaction Layer Logic to be included.
The PHYs are the physical connection to the OCAPI interconnect.
The Datalink Layer provides link training.
The Transaction Layer executes the cache coherent and data movement
commands on the P10 chip.
The PAU provides the Transaction Layer functionality for the OCAPI
link(s) on the P10 chip.
The P10 PAU supports two OCAPI links. Six PAU accelerator units are
instantiated on the P10 chip for a total of twelve OCAPI links.
This patch adds the PAU OpenCAPI structure to support OpenCapi5.
The hw/pau.c file contains the main PAU management functions.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
This patch enables Skiboot to initialize and Linux to boot to user space
on the AWAN core and chip models.
We need the distinction between core and chip models because the core
models do not have an XSCOM unit, CHIPTOD, nor RNG. The chip
model does have them and they work.
So, add a device_type property to the awan node to distinguish core from
chip. Sample DTS are provided for the core and chip models in
external/awan.
Just like Mambo, we need to return in slw_init before trying to
initialize SLW. Without an XSCOM unit in the device tree for the core
model, the SLW code path eventually fails an assert due to lack of
chips.
This commit defines a QUIRK_AWAN where previously Mambo used
QUIRK_MAMBO_CALLOUTS so now Mambo and AWAN core both work.
Also, fix up chip quirks so the core model and chip model boot and
initialize the appropriate units.
Disable sreset and power management in a couple spots because the chip
model does not support stop with EC=1 and enter_p9_pm_state spins in the
branch-to-self after stop.
Provide an external/awan/README.md with a high-level view of booting in
the environment.
Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
This significantly simplifies the SLW code.
HILE is now always supported.
Reviewed-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
If cpu_relax() is called when not at medium SMT priority, it will lose
the prior priority and return at medium. Add a debug check to catch
this, which would have flagged the previous bug.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Calling cpu_relax resets the SMT priority to medium, causing the idle
loop not to run with lowest priority. Just use barrier() instead; this
saves about 3 seconds on an SMT4 systemsim (mambo) boot.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
On P10, get_ics_phandle() calls xive2_get_phandle() directly. This
results in a NULL dereference on mambo when xive2 is not set up.
This was caught with the virtual memory boot patch on P10 mambo.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
npu3 was only used on the Swift platform to add support for
GPUs (nvlink). The Swift platform has never left the lab and support
for GPUs on it is pretty much dead. So let's remove it.
The patch removes all related code. Device tree entries are no
longer created and in the very unlikely case that someone is still
trying to boot it, the linux nvlink discovery code should be quiet.
Tested by booting on Swift with no GPU.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
We only support the XIVE interface.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Update libpore with the P10 STOP API. Add minor changes to let the
P9 stop-api and the P10 stop-api co-exist in OPAL.
These calls are required for STOP11 support on P10.
STOP0, 2 and 3 on P10 do not lose full core state or SCOMs.
stop-api based restore of SPRs or XSCOMs is required only
for STOP11 on P10.
STOP11 on P10 will be a limited lab test/stress feature
and not a product feature. (Same case as P9)
Co-authored-by: Pratik Rajesh Sampat <psampat@linux.ibm.com>
Signed-off-by: Pratik Rajesh Sampat <psampat@linux.ibm.com>
Co-authored-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Co-authored-by: Ryan Grimm <grimm@linux.ibm.com>
Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
The PHB5 logic on P10 is pretty close to the P9's version. So
we keep our base phb4 implementation and just add the few changes
within if statements.
Signed-off-by: Jordan Niethe <jpn@ozlabs.au.ibm.com>
[clg: misc cleanups and fixes ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[Fixed compilation issue - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[Nick: Unify PHB4/PHB5 drivers ]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Mikey: set default lane eq settings for phb5]
Signed-off-by: Michael Neuling <mikey@neuling.org>
[FB: squash commits + small cleanup ]
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
The XIVE2 interrupt controller of the POWER10 processor follows the
same logic as on POWER9 but the HW interface has been largely
reworked. It has a new register interface, different BARs, extra
VSDs, new layout for the XIVE structures, and a set of new features
which are described below.
The OPAL XIVE2 driver code activating this controller was duplicated
from P9 for clarity as the registers and structures have changed
considerably. The same OPAL interface is implemented for OS
compatibility and it should not impact existing Linux kernels, KVM
included. Guest OS is not impacted either.
Support for new features will be implemented in time and will require
new support from the OS.
* XIVE2 BARS
The interrupt controller BARs have a different layout outlined below.
Each sub-engine now has its own range and the indirect TIMA access was
replaced with a set of pages, one per CPU, under the IC BAR:
- IC BAR (Interrupt Controller)
. 4 pages, one per sub-engine
. 128 indirect TIMA pages
- TM BAR (Thread Interrupt Management Area)
. 4 pages
- ESB BAR (ESB pages for IPIs)
. up to 1TB
- END BAR (ESB pages for ENDs)
. up to 2TB
- NVC BAR (Notification Virtual Crowd)
. up to 128
- NVPG BAR (Notification Virtual Process and Group)
. up to 1TB
- Direct mapped Thread Context Area (reads & writes)
OPAL does not use the grouping and crowd capability.
* Virtual Structure Tables
XIVE2 adds new table types and also changes the field layout of the END
and NVP Virtualization Structure Descriptors.
- EAS
- END new layout
- NVT was split into:
. NVP (Processor), 32B
. NVG (Group), 32B
. NVC (Crowd == P9 block group) 32B
- IC for remote configuration
- SYNC for cache injection
- ERQ for event input queue
The setup is slightly different on XIVE2 because the indexing has changed
for some of the tables: the block ID or the chip topology ID can be used.
* XIVE2 features
SCOM and MMIO registers have a new layout and XIVE2 adds new global
capability and configuration registers.
The low-level hardware offers a set of new features, among which:
- cache injection mechanism
- 4 cache watch engines
- a configurable number of priorities: 1-8
- StoreEOI with load-after-store ordering is activated by default
- new sync/kill operations for cache operations
Other features will have some impact on the Hypervisor and guest OS
when activated, but this is not required for initial support of the
controller.
- Gen2 TIMA layout
- A P9-compat mode, or Gen1, TIMA toggle bit for SW compatibility
- Automatic Context save & restore
- increase to 24bit for VP number
- New escalation schemes: ESB, Adaptive, CPPR
POWER10 adds support for User interrupts. When configured, the XIVE2
controller can directly notify user processes using the Event Based
Branch exception line of the thread. If not running, the OS is
notified through an escalation event. New OPAL and PAPR interfaces
will be required and OS support needs to be studied.
* XIVE2 P9-compat mode, or Gen1
The thread interrupt management area (TIMA) is a set of pages mapped
in the Hypervisor and in the guest OS address space giving access to
the interrupt thread context registers for interrupt management, ACK,
EOI, CPPR, etc.
XIVE2 slightly changes the TIMA layout with extra bits for the new
features and larger CAM lines, and the controller provides configuration
switches for backward compatibility. This is called the XIVE2
P9-compat mode, or Gen1 TIMA. It impacts the layout of the TIMA and
the availability of the internal features associated with it,
Automatic Save & Restore for instance. Using a P9 layout also means
setting the controller in such a mode at init time.
The XIVE2 driver in OPAL chooses to initialize the XIVE2 controller
with a XIVE2/P10 TIMA directly because the layouts are compatible with
the Linux PowerNV and the guest OSes expectations.
For KVM support, the OPAL calls abstract the HW interface and no
assumption is made on the OS CAM line width.
* Activating new XIVE2 features
Everything related to OPAL internals such as the use of the new cache
sync mechanism can be implemented in time without impact on the OS.
Other features will require new device tree properties exposed to the
OS and extra support from the OS. Automatic Context save & restore is
one of the first features which should be looked at.
* XICS-over-XIVE driver (P8 compatibility)
The P8 emulation mode is an OPAL compat interface used for Linux
kernels which did not have XIVE native support. This was useful for
POWER9 bringup but it is much less so now. As it was adding a lot of
complexity and reducing the interrupt controller resources, this mode
is not available in the XIVE2 driver for POWER10.
It will still be possible to add this compat mode in the future if
required. The OS will have to reset the driver at boot time, like on
POWER9.
* Impact on other drivers (PSI, PHB, NPU)
Interrupts are allocated in a very similar way. Each controller might
have different ESB characteristics, StoreEOI support, 64K pages for
PSI. All is in place to support these changes already.
PHB5 will have support for "address-based trigger mode", probably in
the DD2.0 time frame when verification is completed. When activated,
the XIVE IC ESB pages will be used instead of the PHB ESB pages for a
lower interrupt latency.
LSI will still use old-fashioned triggers without StoreEOI.
* Yet to be addressed :
- OPAL P10 interface incomplete (stop states)
- Clarify the PHB5 strategy regarding the use of the XIVE IC ESB
pages instead of the PHB ones when address-based trigger mode is
supported.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
HDAT provides the Topology ID table and the primary topology location on
P10. This primary location points to the primary topology entry in the ID
table, which contains the primary topology index, and this index is used to
define the paste base address per chip.
This patch reads the Topology ID table and the primary topology location
from hdata and retrieves the primary topology index from the ID table.
It makes this primary topology index value available through the
ibm,primary-topology-index property per chip. VAS reads this property
to set up the paste base address for each chip.
Signed-off-by: Haren Myneni <haren@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
BMC is still defined as ast2500 but it should change to ast2600 when
available.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
[Folded Ravi's DAWR patch - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
This works around a core recovery issue in P10. The workaround involves
the CME polling for a core recovery and performing the recovery
procedure itself.
For this to happen, the host leaves core recovery off (HID[5]) and
then masks the PC system checkstop. This patch does this.
Firmware starts skiboot with recovery already off, so we just leave it
off for longer and then mask the PC system checkstop. This makes the
window longer where a core recovery can cause an xstop but this
window is still small and can still only happen on boot.
Signed-off-by: Michael Neuling <mikey@neuling.org>
[Added mambo check - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Co-authored-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Co-authored-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Co-authored-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Co-authored-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Co-authored-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Co-authored-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
De-assert the special wakeup bits when the SPWU bit is set but the core
is gated, to maintain a coherent state for special wakeup.
Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Starting from P10, hostboot will no longer clear all the system memory except
its own space. OPAL uses the memory at SKIBOOT_BASE + SKIBOOT_SIZE for cpu
stacks, with the pir as index. With hostboot no longer clearing memory, this
region may hold junk contents. Currently OPAL initializes cpu stack memory only
for the cpu pirs that are found in the device-tree. For the rest, the cpu thread
contents are uninitialized. This sometimes causes the for_each_cpu* macros to
return a cpu thread for a pir/cpu which isn't present on the system. The
for_each_cpu* macros iterate over cpu stacks using the pir as index and return
a cpu thread pointer if state != cpu_state_no_cpu. For cpus that are not found
in the device-tree the state may hold a junk value, leading OPAL to access an
invalid cpu thread area. This further leads to accessing pointers with junk
values, causing a machine check (MCE) during OPAL init code. Fix this by
initializing all the cpu thread areas up to cpu_max_pir.
[ 182.049714372,3] ***********************************************
[ 182.049878580,3] Fatal MCE at 0000000030039738 .init_trace_buffers+0x21c MSR 9000000000201002
[ 182.049943811,3] Cause: load real address error
[ 182.049968681,3] Effective address: 0x480113a4791c4a50
[ 182.050000736,3] CFAR : 00000000300395b8 MSR : 9000000000201002
[ 182.050035376,3] SRR0 : 0000000030039738 SRR1 : 9000000000201002
[ 182.050072878,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000
[ 182.050117303,3] DSISR: 00000040 DAR : 480113a4791c4a50
[ 182.050149054,3] LR : 0000000030039744 CTR : 0000000000000000
[ 182.050182991,3] CR : 42000224 XER : 00000000
[ 182.050217262,3] GPR00: 000000003003962c GPR16: 0000000032d50000
[ 182.050255746,3] GPR01: 0000000032d53a50 GPR17: 0000000030003198
[ 182.050288081,3] GPR02: 000000003014cb00 GPR18: 0000000000000000
[ 182.050331474,3] GPR03: 0000000031c50000 GPR19: 0000000000000000
[ 182.050371934,3] GPR04: 0000000000000000 GPR20: 0000000000000000
[ 182.050416212,3] GPR05: ffffffffffffffff GPR21: 0000000000000001
[ 182.050454130,3] GPR06: 0000000000000005 GPR22: 00000000300f74eb
[ 182.050488053,3] GPR07: 0000000000000028 GPR23: 00000000000fffd8
[ 182.050522774,3] GPR08: 000000000000067f GPR24: 00000000000fff40
[ 182.050566878,3] GPR09: 480113a4791c4a18 GPR25: 0000000000000070
[ 182.050601524,3] GPR10: 00000000078b0353 GPR26: 00000000300f7527
[ 182.050640345,3] GPR11: 0000000000000000 GPR27: 00000000300f7516
[ 182.050680816,3] GPR12: 0000000042000222 GPR28: 000000003acd0000
[ 182.050724099,3] GPR13: 000000000025a908 GPR29: 000000003acd0000
[ 182.050759728,3] GPR14: 0000000000000000 GPR30: 0000000000000000
[ 182.050790430,3] GPR15: 0000000000000000 GPR31: 00000000301f0038
CPU 0228 Backtrace:
S: 0000000032d53d60 R: 000000003003962c .init_trace_buffers+0x110
S: 0000000032d53e30 R: 0000000030022f84 .main_cpu_entry+0x550
S: 0000000032d53f00 R: 00000000300031f8 not_fused+0x11c
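The fix described above can be sketched as follows. This is a minimal model, not the skiboot code: the struct, the enum values and both function names are hypothetical stand-ins, and the per-PIR array stands for the cpu thread areas indexed by pir. The point it illustrates is that the state field must be written explicitly for every pir up to the maximum, because the memory can no longer be assumed to be zeroed.

```c
/* Hypothetical, simplified stand-ins for the real skiboot types. */
enum cpu_state_sketch {
	CPU_SKETCH_NO_CPU = 0,	/* no CPU at this pir */
	CPU_SKETCH_PRESENT,	/* CPU found in the device-tree */
};

struct cpu_thread_sketch {
	unsigned int pir;
	enum cpu_state_sketch state;
};

/*
 * Explicitly mark every slot up to and including max_pir as absent.
 * Slots for CPUs found in the device-tree are filled in afterwards.
 */
static void init_all_cpu_threads_sketch(struct cpu_thread_sketch *cpus,
					unsigned int max_pir)
{
	for (unsigned int pir = 0; pir <= max_pir; pir++) {
		cpus[pir].pir = pir;
		cpus[pir].state = CPU_SKETCH_NO_CPU;
	}
}

/* Models a for_each_cpu-style walk: any state other than "no cpu" counts. */
static unsigned int count_cpus_sketch(const struct cpu_thread_sketch *cpus,
				      unsigned int max_pir)
{
	unsigned int n = 0;

	for (unsigned int pir = 0; pir <= max_pir; pir++)
		if (cpus[pir].state != CPU_SKETCH_NO_CPU)
			n++;
	return n;
}
```

Without the initialization pass, a slot holding a junk state value would be counted by the walk as if a CPU were present at that pir.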
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Folded Nick's patch that added mark_all_secondary_cpus_absent() - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Add support for tracing I2C transactions performed by skiboot. This covers
both internally initiated I2C ops and those requested by the kernel
via the OPAL API.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Previously we put all the trace buffer exports in the exports/ node.
However, there's one trace buffer for each core so I moved them into a
subdirectory since they were crowding up the place. Most kernels don't
support recursively exporting subnodes though, so add a hack to restore
the old behaviour for now.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[Fixed run-trace test case - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Async machine check errors due to a bad real address from a store or a
foreign link time out come with the load/store bit (PPC bit 42)
set in SRR1, but the cause is set in SRR1, not DSISR, unlike other
errors that have the load/store bit set.
This behaviour was omitted from the POWER9 User Manual but it is
confirmed to be the expected one. Update the machine check decoder
to match.
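As a reminder of the bit numbering used above, PPC bit 42 can be tested as in the sketch below. The macro and function names are hypothetical; the sketch only shows how the load/store bit maps onto a 64-bit SRR1 value and does not decode the cause of the error.

```c
#include <stdbool.h>
#include <stdint.h>

/* PPC bit numbering counts from the most significant bit of a 64-bit value. */
#define PPC_BIT_SKETCH(bit)	(0x8000000000000000ULL >> (bit))

/* Load/store bit in SRR1 for machine checks (PPC bit 42). */
#define SRR1_MC_LOADSTORE_SKETCH	PPC_BIT_SKETCH(42)

/* Returns true when the load/store bit is set in the given SRR1 value. */
static inline bool srr1_is_loadstore_sketch(uint64_t srr1)
{
	return (srr1 & SRR1_MC_LOADSTORE_SKETCH) != 0;
}
```

PPC bit 42 corresponds to the mask 0x200000 in conventional (least significant bit is 0) notation.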
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
cpu_pm_idle sets pm_enabled = false and expects all cpus
to exit idle. This is needed to re-enter with new settings.
Right after cpu_bringup() we call copy_sreset_vector() and then
cpu_set_sreset_enable(true). At this time some cpus are still
yet to enter idle and hence miss the doorbell to wake up.
This leads to cpu_pm_idle waiting forever. This pattern happens
on some systems in fused-core mode.
The fact that the pm_enabled flag is changing right in the middle of
idle entry can be seen from the "cpu_idle_p9 called with pm disabled" traces.
One method to fix this race is to retry the doorbell after a timeout.
This patch implements a small timeout (a few seconds) and then issues
the doorbell once again to kick the cpu that entered idle late after
missing the pm_enabled = false flag.
This checking loop runs in smt_lowest() and hence the timeout number
maps to a couple of seconds, which is sufficient to let the cpus settle in
idle and make them see the doorbell and exit.
Example boot log:
[ 288.309322810,7] INIT: CPU PIR 0x000d called in
[ 288.309320768,7] INIT: CPU PIR 0x000b called in
[ 288.314603802,7] INIT: CPU PIR 0x0020 called in
[ 288.321303468,5] CPU: All 88 processors called in...
[ 288.315056796,6] cpu_idle_p9 called on cpu 0x024e with pm disabled
[ 288.321308091,6] cpu_idle_p9 called on cpu 0x0264 with pm disabled
[ 288.314424259,6] cpu_idle_p9 called on cpu 0x025b with pm disabled
[ 288.324928307,6] cpu_idle_p9 called on cpu 0x0065 with pm disabled
[ 305.207316004,6] cpu_pm_disable TIMEOUT on cpu 0x0261 to exit idle
[ 322.093298501,6] cpu_pm_disable TIMEOUT on cpu 0x0263 to exit idle
[ 338.491281028,6] cpu_pm_disable TIMEOUT on cpu 0x0265 to exit idle
[ 355.377263492,6] cpu_pm_disable TIMEOUT on cpu 0x0267 to exit idle
[ 372.263245960,6] cpu_pm_disable TIMEOUT on cpu 0x0269 to exit idle
[ 389.149228389,6] cpu_pm_disable TIMEOUT on cpu 0x026b to exit idle
[ 406.035210852,6] cpu_pm_disable TIMEOUT on cpu 0x026d to exit idle
[ 422.433193381,6] cpu_pm_disable TIMEOUT on cpu 0x026f to exit idle
[ 422.433277720,6] CHIPTOD: Calculated MCBS is 0x25 (Cfreq=2000000000 Tfreq=32000000)
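The retry-after-timeout approach can be sketched with a small self-contained model. Everything here is hypothetical and only illustrates the idea: the `fake_cpu` struct models a CPU that enters idle late, a doorbell sent before it is in idle is lost, and the waiting loop re-sends the doorbell each time a poll-count timeout expires. It is not the skiboot implementation.

```c
#include <stdbool.h>

/* Hypothetical model of a CPU that may enter idle after the first doorbell. */
struct fake_cpu {
	unsigned long polls;		/* how many times the waiter has polled */
	unsigned long idle_entry;	/* poll count at which the CPU enters idle */
	bool in_idle;			/* CPU is currently in idle */
	bool exited;			/* CPU saw a doorbell and left idle */
};

/* A doorbell only wakes a CPU that is already in idle; otherwise it is lost. */
static void fake_doorbell(struct fake_cpu *c)
{
	if (c->in_idle) {
		c->in_idle = false;
		c->exited = true;
	}
}

/* One poll of the waiter; also advances the model's notion of time. */
static bool fake_has_exited(struct fake_cpu *c)
{
	if (++c->polls == c->idle_entry)
		c->in_idle = true;
	return c->exited;
}

/*
 * Wait for the CPU to leave idle, re-sending the doorbell every
 * timeout_polls polls. Returns the number of doorbells sent.
 */
static unsigned int wait_exit_idle_sketch(struct fake_cpu *c,
					  unsigned long timeout_polls)
{
	unsigned int kicks = 1;
	unsigned long polls = 0;

	fake_doorbell(c);
	while (!fake_has_exited(c)) {
		if (++polls >= timeout_polls) {
			fake_doorbell(c);	/* kick the late CPU again */
			kicks++;
			polls = 0;
		}
	}
	return kicks;
}
```

In this model a CPU that is already in idle needs one doorbell, while a CPU that enters idle after the first doorbell needs a second one, which is sent when the timeout expires instead of waiting forever.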
Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
[Reworded commit message - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
imc_init() checks for the 24x7 microcode state at boot to
check whether the microcode is in proper state (running or paused).
But in a larger system, loading of 24x7 microcode by OCC gets delayed.
Because of this, imc_init() removes imc devices from the device tree.
Moving the imc_init() call towards the end of main_cpu_entry()
works around this.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Sample output from Cédric:
-------------------------
[ 88.294111649,7] cpu_idle_p9 called on cpu 0x063c with pm disabled
[ 88.289365222,7] cpu_idle_p9 called on cpu 0x025f with pm disabled
[ 88.289900684,7] cpu_idle_p9 called on cpu 0x045f with pm disabled
[ 88.302621295,7] CHIPTOD: Base TFMR=0x2512000000000000
[ 88.289899701,7] cpu_idle_p9 called on cpu 0x0456 with pm disabled
LOCK ERROR: Deadlock detected @0x30402740 (state: 0x0000000400000001)
[ 88.332264757,3] ***********************************************
[ 88.332300051,3] < assert failed at core/lock.c:32 >
[ 88.332328282,3] .
[ 88.332347335,3] .
[ 88.332364894,3] .
[ 88.332377963,3] OO__)
[ 88.332395458,3] <"__/
[ 88.332412628,3] ^ ^
[ 88.332450246,3] Fatal TRAP at 00000000300286a0 .lock_error+0x64 MSR 9000000000021002
[ 88.332501812,3] CFAR : 00000000300414f4 MSR : 9000000000021002
[ 88.332536539,3] SRR0 : 00000000300286a0 SRR1 : 9000000000021002
[ 88.332574644,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000
[ 88.332610635,3] DSISR: 00000000 DAR : 0000000000000000
[ 88.332650628,3] LR : 0000000030028690 CTR : 00000000300f9fa0
[ 88.332684451,3] CR : 20002000 XER : 00000000
[ 88.332712767,3] GPR00: 0000000030028690 GPR16: 0000000032c98000
[ 88.332748046,3] GPR01: 0000000032c9b0a0 GPR17: 0000000000000000
[ 88.332784060,3] GPR02: 0000000030169d00 GPR18: 0000000000000000
[ 88.332822091,3] GPR03: 0000000032c9b310 GPR19: 0000000000000000
[ 88.332861357,3] GPR04: 0000000030041480 GPR20: 0000000000000000
[ 88.332897229,3] GPR05: 0000000000000000 GPR21: 0000000000000000
[ 88.332937051,3] GPR06: 0000000000000010 GPR22: 0000000000000000
[ 88.332968463,3] GPR07: 0000000000000000 GPR23: 0000000000000000
[ 88.333007333,3] GPR08: 000000000002cbb5 GPR24: 0000000000000000
[ 88.333041971,3] GPR09: 0000000000000000 GPR25: 0000000000000000
[ 88.333081073,3] GPR10: 0000000000000000 GPR26: 0000000000000003
[ 88.333114301,3] GPR11: 3839616263646566 GPR27: 0000000000000211
[ 88.333156040,3] GPR12: 0000000020002000 GPR28: 000000003042a134
[ 88.333189222,3] GPR13: 0000000000000000 GPR29: 0000000030402740
[ 88.333225638,3] GPR14: 0000000000000000 GPR30: 0000000000000001
[ 88.333259730,3] GPR15: 0000000000000000 GPR31: 0000000000000000
CPU 0211 Backtrace:
S: 0000000032c9b3b0 R: 0000000030028690 .lock_error+0x54
S: 0000000032c9b440 R: 0000000030028828 .add_lock_request+0xd0
S: 0000000032c9b4f0 R: 0000000030028a9c .lock_caller+0x8c
S: 0000000032c9b5a0 R: 0000000030021b30 .__mcount_stack_check+0x70
S: 0000000032c9b650 R: 00000000300fabb0 .list_check_node+0x1c
S: 0000000032c9b6f0 R: 00000000300fac98 .list_check+0x38
S: 0000000032c9b790 R: 00000000300289bc .try_lock_caller+0xac
S: 0000000032c9b830 R: 0000000030028ad8 .lock_caller+0xc8
S: 0000000032c9b8e0 R: 0000000030028d74 .lock_recursive_caller+0x54
S: 0000000032c9b980 R: 0000000030020cb8 .console_write+0x48
S: 0000000032c9ba30 R: 00000000300445a8 .vprlog+0xc8
S: 0000000032c9bc20 R: 0000000030044630 ._prlog+0x50
S: 0000000032c9bcb0 R: 0000000030029204 .cpu_idle_p9+0x74
S: 0000000032c9bd40 R: 0000000030029628 .cpu_idle_pm+0x4c
S: 0000000032c9bde0 R: 0000000030023fe0 .__secondary_cpu_entry+0xa0
S: 0000000032c9be70 R: 0000000030024034 .secondary_cpu_entry+0x40
S: 0000000032c9bf00 R: 0000000030003290 secondary_wait+0x8c
CPU 0x4:
opal_run_pollers ->
check_stacks -> takes stack_check_lock lock
prlog ->
console_write -> waits for con_lock
CPU 0x211
cpu_idle_p9 ->
prlog ->
console_write -> Takes con_lock lock
list_check_node -> tries to take stack_check_lock and hits deadlock.
I think we don't need to hold `stack_check_lock` while printing
backtraces. Instead it makes sense to hold backtrace lock (bt_lock)
and print output.
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Fixes:
core/opal.c:418:61: warning: Using plain integer as NULL pointer
Signed-off-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
If fast reboot fails then we return to Linux with OPAL_SUCCESS.
Current Linux code thinks that the request succeeded and enters
an infinite loop (see Linux pnv_restart() code).
This patch fixes the above issue by returning OPAL_UNSUPPORTED if fast
reboot fails.
Alternatively we can directly call full_reboot() itself. But I
think it makes sense to go back to Linux and report the failure.
And Linux falls back to normal reboot request.
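A minimal sketch of the return-code change, under stated assumptions: the function, the enum and the `fast_reboot_works` flag are hypothetical stand-ins for the real reboot path, and the numeric values are the OPAL return codes as I understand the OPAL API (check opal-api.h before relying on them).

```c
#include <stdbool.h>
#include <stdint.h>

/* Return codes as understood from the OPAL API; verify against opal-api.h. */
#define OPAL_SUCCESS		0
#define OPAL_UNSUPPORTED	-7

/* Hypothetical reboot types for this sketch. */
enum reboot_type_sketch { REBOOT_NORMAL_SKETCH, REBOOT_FAST_SKETCH };

/*
 * Models the firmware side of a reboot request. A successful reboot is
 * modelled here as returning OPAL_SUCCESS; a failed fast reboot is
 * reported to the OS instead of being masked as success.
 */
static int64_t cec_reboot_sketch(enum reboot_type_sketch type,
				 bool fast_reboot_works)
{
	switch (type) {
	case REBOOT_FAST_SKETCH:
		if (fast_reboot_works)
			return OPAL_SUCCESS;
		/* Fast reboot failed: let the OS fall back to a normal reboot. */
		return OPAL_UNSUPPORTED;
	case REBOOT_NORMAL_SKETCH:
		return OPAL_SUCCESS;
	}
	return OPAL_UNSUPPORTED;
}
```

An OS that sees OPAL_UNSUPPORTED for the fast request can then issue a normal reboot request rather than waiting for a reboot that will never happen.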
Fixes: 10bbcd07 ("core/platform: Add an explicit fast-reboot type")
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Dan Horák <dan@danny.cz>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
next_unguarded_primary dereferences NULL CPU -> UB -> infinite loop
Fast reboot works again after this patch.
Fixes: 98f5834253c7e ("cpu: Keep track of the "ec_primary" in big core more")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
In secure boot enabled systems, the petitboot linux kernel verifies the
OS kernel against x509 certificates that are wrapped in secure variables
controlled by OPAL. These secure variables are stored in the PNOR SECBOOT
partition, as well as the updates submitted for them using userspace
tools.
This patch adds read and write support to the PNOR SECBOOT partition in
a similar fashion to that of NVRAM, so that OPAL can handle the secure
variables.
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
OS Secure Boot establishes a chain of trust from firmware to the OS.
However, OS Secure Boot can only be secure if the chain of trust
beneath it - from hardware to firmware - has been established by
Firmware Secure Boot. This patch ensures that OS Secure Boot is enabled
only if Firmware Secure Boot is enabled.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
With DEBUG=1 builds we use the mcount hook to instrument how much stack
space we're using. If we detect that a function call has come within
2KB of the bottom of the stack we currently print a backtrace. This can
result in a huge amount of console IO in DEBUG=1 builds which can cause
op-test timeouts, etc.
Printing a backtrace on each function call isn't terribly useful, and it
ends up crowding out the backtrace that's printed when we hit a new
stack usage watermark. The watermark should provide enough information
to find and fix excessive stack usage issues so drop the per-function
backtrace printing and move the warning into the high-watermark check.
This change is largely necessary because DEBUG=1 adds a
backtrace save area to struct lock, which expands its size to nearly
2KB. struct cpu_thread (which lives at the bottom of the per-thread stacks)
contains three locks and an additional backtrace save area which is
enabled when DEBUG=1. The extra space requirements result in cpu_thread
ballooning from ~420 bytes to nearly 8KB.
Any growth in cpu_thread also results in less stack space being
available for the thread, so when DEBUG=1 is enabled we go from having
a 16KB stack to an 8KB stack. Although this seems large, skiboot does
have some fairly deep call chains (UART console flushing, TPM drivers,
both combined) which can cause the thread to come within 2KB of the stack
use warning zone.
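The idea of warning only from the high-watermark check can be sketched as below. This is an assumption-laden model, not the mcount hook itself: the struct, the function name and the exact condition are hypothetical, and only the 2KB warning gap comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STACK_WARNING_GAP_SKETCH	2048	/* 2KB warning zone from the text */

/* Hypothetical per-thread bookkeeping: smallest free stack space seen so far. */
struct stack_stats_sketch {
	uint64_t min_free;
};

/*
 * Called on function entry with the current stack pointer. Returns true
 * only when a new high watermark is reached inside the warning zone, so a
 * deep call chain produces one report per new watermark rather than one
 * per function call.
 */
static bool stack_check_sketch(struct stack_stats_sketch *s,
			       uint64_t sp, uint64_t stack_bottom)
{
	uint64_t free_bytes;

	assert(sp >= stack_bottom);	/* the stack grows down towards the bottom */
	free_bytes = sp - stack_bottom;

	if (free_bytes >= s->min_free)
		return false;		/* not a new watermark: stay silent */

	s->min_free = free_bytes;	/* record the new watermark */
	return free_bytes < STACK_WARNING_GAP_SKETCH;
}
```

Initializing `min_free` to `UINT64_MAX` makes the first call record a watermark; repeated calls at the same depth stay silent.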
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
Maybe we should swap locations of the normal and emergency stacks so
cpu_thread takes space from the emergency stack rather than the normal one.
The e-stack should only be used at runtime where the call chains should be
smaller.
|
|
Previous commit 482f18adf21eeb5f6ce2a93334725509a8f6f0cd
added check for fused core mode and bailed out.
The check can be removed since fused core mode
is now supported in OPAL.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Add PVR checks and feature mapping for POWER9 Cumulus chip.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
cpu_get_core_index() currently uses pir_to_core_id() which returns
an EC number always (ie, a normal core number) even in fused core
mode. This is inconsistent with cpu_get_thread_index() which returns
a thread within a fused core (0...7) on P9.
So let's make things consistent and document it.
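The consistent behaviour can be sketched as follows. The bit layout is an assumption made for illustration (two thread bits and five core bits in normal mode, three thread bits and four core bits in fused mode), and the function names are hypothetical rather than the skiboot accessors.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed P9 PIR layout for this sketch:
 *   normal mode: thread = pir[1:0], core = pir[6:2]
 *   fused mode:  thread = pir[2:0], core = pir[6:3]
 */
static inline uint32_t thread_index_sketch(uint32_t pir, bool fused)
{
	return fused ? (pir & 0x7) : (pir & 0x3);
}

/* In fused mode this returns the fused (big) core index, not the EC number. */
static inline uint32_t core_index_sketch(uint32_t pir, bool fused)
{
	return fused ? ((pir >> 3) & 0xf) : ((pir >> 2) & 0x1f);
}
```

With this layout, PIR 0x0d is thread 1 of core 3 in normal mode and thread 5 of fused core 1 in fused mode, so the core and thread indexes always refer to the same kind of core.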
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The "EC" primary is the primary thread of an EC, ie, the corresponding
small core "half" of the big core where the thread resides.
It will be necessary for the direct controls to target the right
half when doing special wakeups among others.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
pir_to_core_id() and pir_to_thread_id() are extensively
used by the direct controls code and are expected to return
the "normal" (non-fused, aka EC) core/thread IDs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
P9 cores can be configured into fused core mode where two core chiplets
function as an 8-threaded, single core. So, bump four to eight in boot_entry
when in fused core mode and cpu_thread_count in init_boot_cpu.
The HID, AMOR, TSCR, RPR require the first active thread on that core chiplet
to load the copy for that core chiplet. So, send thread 1 of a fused core to
init_shared_sprs in boot_entry.
The code checks for fused core mode in the core thread state register and puts a
field in struct cpu_thread. This flag is checked when updating the HID and in
XIVE code when setting the special bar.
For XSCOM, the core ID is the non-fused EX. So, create macros to arrange the
bits. It's fairly verbose but somewhat readable.
This was tested on a P9 ZZ with 16 fused cores and ran HTX for over 24 hours.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Commit 34664746 moved the opal_mpipl_save_crashing_pir() call from
platform specific code to the generic assert() path. I completely missed
the other terminate paths :-(
This broke `opalcore` on Linux kernel initiated MPIPL, because:
- Linux initiated MPIPL calls platform termination function directly
- ELF core format needs crashing CPU details to generate proper code
Hence I think it makes sense to move this back to platform specific
terminate handler code.
Today we have two ways to trigger MPIPL based on service processor.
- On BMC system we call SBE S0 interrupt
- On FSP system we call `attn` instruction
In future, if we add new ways to trigger MPIPL, we will have to add platform
specific support code anyway. So it's fine to move this to platform
specific code.
One alternative is to make this call in every code path before calling
platform.terminate, which is more complicated than the above approach.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
If OPAL boot fails after MPIPL init (opal_mpipl_init()) then we call MPIPL
boot instead of reboot. BMC is not aware of MPIPL. Hence it may result in
continuous MPIPL loop (boot -> crash -> MPIPL -> boot).
If OPAL boot fails (before loading the kernel) then it's better to call reboot,
so that the BMC can detect `n` boot failures (generally n = 3) and
stop booting. That way we can avoid a continuous loop.
This patch moves MPIPL init to the end of init process (just before starting
kernel). So that if we fail to boot OPAL we call normal reboot.
This patch also introduces a new function to detect whether MPIPL is enabled
(is_mpipl_enabled()). In the assert() path we check this function
instead of the `dump` DT node, which makes sure we do not call
MPIPL until opal_mpipl_init is complete.
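The decision described above can be sketched roughly as follows. Only
is_mpipl_enabled() is named in this message. The flag, the enum and the
other helpers are hypothetical stand-ins, not the actual skiboot code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: what a failed assert should ask the platform to do. */
enum term_action { TERM_REBOOT, TERM_MPIPL };

/* Stays false until MPIPL init has completed at the end of boot. */
static bool mpipl_enabled;

/* Hypothetical stand-in for the final step of opal_mpipl_init(). */
static void mpipl_init_done(void)
{
	mpipl_enabled = true;
}

static bool is_mpipl_enabled(void)
{
	return mpipl_enabled;
}

/* Early boot failures fall back to a normal reboot, so the BMC can
 * count them. MPIPL is only chosen once init has finished. */
static enum term_action pick_term_action(void)
{
	return is_mpipl_enabled() ? TERM_MPIPL : TERM_REBOOT;
}
```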
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
All callers of dt_resize_property() need to set the new property length
after calling it. append_chip_id() wasn't doing it, which caused this
assert when booting my machine:
[ 136.387213258,3] Unable to use memory range 0 from MSAREA 0
[ 136.387356677,3] Unable to use memory range 0 from MSAREA 2
[ 136.387408390,3] ***********************************************
[ 136.387454272,3] < assert failed at core/device.c:605 >
[ 136.387493225,3] .
[ 136.387512799,3] .
[ 136.387534056,3] .
[ 136.387550294,3] OO__)
[ 136.387579530,3] <"__/
[ 136.387605086,3] ^ ^
[ 136.387719329,3] Fatal TRAP at 0000000030028a18 .dt_property_set_cell+0x34 MSR 9000000000021002
[ 136.387801707,3] CFAR : 00000000300bfd3c MSR : 9000000000001000
[ 136.387847032,3] SRR0 : 0000000030028a18 SRR1 : 9000000000021002
[ 136.387893119,3] HSRR0: 0000000030012524 HSRR1: 9000000000001000
[ 136.387936830,3] DSISR: 40000000 DAR : 00000002019df000
[ 136.387983570,3] LR : 00000000300bfd40 CTR : 0000000000000000
[ 136.388046031,3] CR : 20004202 XER : 00000000
[ 136.388094553,3] GPR00: 00000000300bfd40 GPR16: 0000000000000001
[ 136.388139862,3] GPR01: 0000000031e536e0 GPR17: 00000000300ca3c9
[ 136.388181131,3] GPR02: 0000000030121200 GPR18: 0000000030103e1c
[ 136.388224105,3] GPR03: 000000003053fc60 GPR19: 0000000000000008
[ 136.388270356,3] GPR04: 0000000000000001 GPR20: 000000003053fba0
[ 136.388313950,3] GPR05: 0000000000000008 GPR21: 0000000000000001
[ 136.388363021,3] GPR06: 0000000031e50060 GPR22: 0000000000000001
[ 136.388416754,3] GPR07: 0000000000000000 GPR23: 0000000000000000
[ 136.388465729,3] GPR08: 0000000000000000 GPR24: 0000000000000000
[ 136.388508156,3] GPR09: 0000000000000004 GPR25: 0000000031204060
[ 136.388556203,3] GPR10: 0000000000000008 GPR26: 000000003120402c
[ 136.388599076,3] GPR11: 0000000000000000 GPR27: 0000000030010000
[ 136.388642108,3] GPR12: 0000000040004204 GPR28: 0000000000000002
[ 136.388694064,3] GPR13: 0000000031e50000 GPR29: 0000000031203ee0
[ 136.388743298,3] GPR14: 00000000300cbf03 GPR30: 0000000031202e80
[ 136.388797131,3] GPR15: 00000000300cc01c GPR31: 0000000030103a33
CPU 0048 Backtrace:
S: 0000000031e539e0 R: 0000000030028874 .dt_resize_property+0x28
S: 0000000031e53a60 R: 00000000300bfd40 .memory_parse+0xd84
S: 0000000031e53c40 R: 00000000300bc4d8 .parse_hdat+0xed0
S: 0000000031e53e30 R: 000000003001504c .main_cpu_entry+0x1ac
S: 0000000031e53f00 R: 0000000030002760 boot_entry+0x1b0
Avoid further appearances of the unidentified animal of doom by making
dt_resize_property() do the length updating itself, freeing its callers
from that need.
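A minimal sketch of the idea, using a simplified, hypothetical property
type rather than skiboot's struct dt_property. The point is only that the
resize helper records the new length itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified property: a length plus inline value bytes. */
struct prop {
	size_t len;
	char data[];	/* flexible array member holding the value */
};

static struct prop *prop_new(size_t len)
{
	struct prop *p = calloc(1, sizeof(*p) + len);

	assert(p);
	p->len = len;
	return p;
}

/* Resize the property and update len in the same place, so a caller
 * cannot forget to do it. */
static void prop_resize(struct prop **pp, size_t len)
{
	struct prop *p = realloc(*pp, sizeof(*p) + len);

	assert(p);
	p->len = len;
	*pp = p;
}
```

With this shape a caller only writes the new value after resizing, and a
later length check sees the updated size.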
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
We only really use the gcov output when doing the coverage report as a
part of the "docs" CI builds. It's useful for development to just run
the unit tests, so make sure the "check" and "coverage" targets are
separate.
This also speeds up our CI builds since those jobs are already doing a
separate GCOV pass, so building and running the GCOV binaries during the
check pass is redundant.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This provides an initial facility to decode machine checks into
human readable strings, plus a minimum amount of metadata that
a handler has to understand in order to deal with the machine
check.
For now this is only used by skiboot to make MCE reporting nicer,
and an ERAT flush recovery attempt which is more about code
coverage than really being helpful.
***********************************************
Fatal MCE at 00000000300c9c0c .memcmp+0x3c MSR 9000000000141002
Cause: instruction fetch TLB multi-hit error
Effective address: 0x00000000300c9c0c
...
The intention is to subsequently provide an OPAL API with this
information that will enable an OS to implement a machine
independent OPAL machine check driver.
The code and data tables are derived from Linux code that I wrote,
so relicensing is okay.
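As a rough illustration of table-driven decoding, here is a sketch. The
structure, the flag names and the bit patterns are made up for this
example and are not real SRR1/DSISR encodings or skiboot's tables. Only
the TLB multi-hit cause string is taken from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata a handler could act on. */
enum mce_flag {
	MCE_NONE	= 0,
	MCE_ERAT_ERROR	= 1 << 0,
	MCE_TLB_ERROR	= 1 << 1,
};

struct mce_entry {
	uint64_t mask;		/* status bits to examine */
	uint64_t value;		/* pattern identifying this cause */
	unsigned int flags;	/* metadata for the handler */
	const char *cause;	/* human readable string */
};

/* Illustrative patterns only. */
static const struct mce_entry mce_table[] = {
	{ 0x7, 0x1, MCE_ERAT_ERROR, "instruction fetch ERAT multi-hit error" },
	{ 0x7, 0x2, MCE_TLB_ERROR, "instruction fetch TLB multi-hit error" },
};

/* Return the first matching entry, or NULL for an unknown cause. */
static const struct mce_entry *mce_decode(uint64_t status)
{
	size_t i;

	for (i = 0; i < sizeof(mce_table) / sizeof(mce_table[0]); i++) {
		if ((status & mce_table[i].mask) == mce_table[i].value)
			return &mce_table[i];
	}
	return NULL;
}
```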
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Use magic marker in the exception stack frame that is used by the
unwinder to decode the interrupt type and NIA. The below example trace
comes from a modified skiboot that uses virtual memory, but any
interrupt type will appear similarly.
CPU 0000 Backtrace:
S: 0000000031c13580 R: 0000000030028210 .vm_dsi+0x360
S: 0000000031c13630 R: 000000003003b0dc .exception_entry+0x4fc
S: 0000000031c13830 R: 0000000030001f4c exception_entry_foo+0x4
--- Interrupt 0x300 at 000000003002431c ---
S: 0000000031c13b40 R: 000000003002430c .make_free.isra.0+0x110
S: 0000000031c13bd0 R: 0000000030025198 .mem_alloc+0x4a0
S: 0000000031c13c80 R: 0000000030028bac .__memalign+0x48
S: 0000000031c13d10 R: 0000000030028da4 .__zalloc+0x18
S: 0000000031c13d90 R: 000000003002fb34 .opal_init_msg+0x34
S: 0000000031c13e20 R: 00000000300234b4 .main_cpu_entry+0x61c
S: 0000000031c13f00 R: 00000000300031b8 boot_entry+0x1b0
--- OPAL boot ---
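The unwinding idea can be sketched as below. The frame layout, the marker
value and the names are hypothetical. A real frame would keep the marker
at a fixed offset in the exception stack frame only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MAGIC 0x6578632d6672616dULL	/* hypothetical marker value */

/* Hypothetical, simplified frame. */
struct frame {
	struct frame *backchain;	/* next older frame, NULL at the bottom */
	uint64_t lr;			/* return address */
	uint64_t magic;			/* FRAME_MAGIC only in exception frames */
	uint64_t type;			/* interrupt vector, e.g. 0x300 */
	uint64_t nia;			/* interrupted instruction address */
};

struct bt_entry {
	uint64_t pc;
	uint64_t exception_type;	/* 0 for an ordinary call frame */
};

/* Walk the backchain. A frame carrying the marker is reported as an
 * interrupt with its type and NIA instead of a plain return address. */
static size_t backtrace_walk(const struct frame *f, struct bt_entry *out,
			     size_t max)
{
	size_t n = 0;

	for (; f && n < max; f = f->backchain) {
		if (f->magic == FRAME_MAGIC) {
			out[n].pc = f->nia;
			out[n].exception_type = f->type;
		} else {
			out[n].pc = f->lr;
			out[n].exception_type = 0;
		}
		n++;
	}
	return n;
}
```

A printer can then emit an "--- Interrupt ... ---" line for every entry
whose exception_type is non-zero.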
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[oliver: the new stackentry fields made our test heaps too small]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
fixup! core: interrupt markers for stack traces
|
|
.head is for code and data which must reside at a fixed low address,
mainly entry points.
These are moved into .rodata. Despite being modified at runtime, this
facilitates these tables being write-protected in a later patch.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The current fast reboot sequence is not as robust as it could be. It
is this:
- Fast reboot CPU stops all other threads with direct control xscoms;
- it disables ME (machine checks become checkstops);
- resets its SPRs (to get HID[HILE] for machine check interrupts) and
overwrites exception vectors with our vectors, with a special fast
reboot sreset vector that fixes endian (because OS owns HILE);
- then the fast reboot CPU enables ME.
At this point the fast reboot CPU can handle machine checks with the
skiboot handler, but no other cores can if the OS had switched HILE
(they'll execute garbled byte swapped instructions and crash badly).
- Then all CPUs run various cleanups, XIVE, resync TOD, etc.
- The boot CPU, which is not necessarily the same as the fast reboot
initiator CPU, runs xive_reset.
This is a lot of code to run, including locking and xscoms, with
machine check inoperable.
- Finally secondaries are released and everyone sets SPRs and enables
ME.
Secondaries on other cores don't wait for their thread 0 to set shared
SPRs before calling into the normal OPAL secondary code. This is
mostly okay because the boot CPU pauses here until all secondaries
reach their idle code, but it's not nice to release them out of the
fast reboot code in a state with various per-core SPRs in flux.
Fix this by having the fast reboot CPU not disable ME or reset its
SPRs, because machine checks can still be handled by the OS. Then
wait until all CPUs have been called into fast reboot and are spinning
with ME disabled. Only then reset any SPRs and copy the remaining
exception vectors. At that point skiboot has taken over machine check
handling, so the CPUs enable ME before cleaning up other things.
This way, the region with ME disabled and SPRs and exception vectors
in flux is kept absolutely minimal, with no xscoms, no MMIOs, and few
significant memory modifications, and all threads kept closely in step.
There are no windows where a machine check interrupt may execute
garbage due to mismatched HILE on any CPU.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Initial boot already saved the original exception vectors to old_vectors.
Copying them again upon fast reboot would overwrite old_vectors with some
arbitrary vectors set up by the current OS.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|