|
The current fast reboot sequence is not as robust as it could be. It
is this:
- Fast reboot CPU stops all other threads with direct control xscoms;
- it disables ME (machine checks become checkstops);
- resets its SPRs (to get HID[HILE] for machine check interrupts) and
overwrites exception vectors with our vectors, with a special fast
reboot sreset vector that fixes endian (because OS owns HILE);
- then the fast reboot CPU enables ME.
At this point the fast reboot CPU can handle machine checks with the
skiboot handler, but no other cores can if the OS had switched HILE
(they'll execute garbled byte swapped instructions and crash badly).
- Then all CPUs run various cleanups, XIVE, resync TOD, etc.
- The boot CPU, which is not necessarily the same as the fast reboot
initiator CPU, runs xive_reset.
This is a lot of code to run, including locking and xscoms, with
machine check inoperable.
- Finally secondaries are released and everyone sets SPRs and enables
ME.
Secondaries on other cores don't wait for their thread 0 to set shared
SPRs before calling into the normal OPAL secondary code. This is
mostly okay because the boot CPU pauses here until all secondaries
reach their idle code, but it's not nice to release them out of the
fast reboot code in a state with various per-core SPRs in flux.
Fix this by having the fast reboot CPU not disable ME or reset its
SPRs up front, because machine checks can still be handled by the OS.
Then wait until all CPUs have called into fast reboot and are spinning
with ME disabled. Only then reset any SPRs and copy the remaining
exception vectors. At that point skiboot has taken over machine check
handling, and the CPUs enable ME before cleaning up other things.
This way, the region with ME disabled and SPRs and exception vectors
in flux is kept absolutely minimal, with no xscoms, no MMIOs, and few
significant memory modifications, and all threads kept closely in step.
There are no windows where a machine check interrupt may execute
garbage due to mismatched HILE on any CPU.
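The HILE window described above can be illustrated with a toy model. This is only a sketch, not skiboot code: it assumes a little-endian OS and big-endian skiboot vectors, and the names core_model, mce_outcome and the enum values are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: assumes a little-endian OS and big-endian skiboot. */
enum vectors { VEC_OS, VEC_SKIBOOT };

enum mce_outcome {
	MCE_CHECKSTOP,       /* ME clear: machine check becomes a checkstop */
	MCE_OS_HANDLER,
	MCE_SKIBOOT_HANDLER,
	MCE_GARBAGE,         /* handler fetched in the wrong endian */
};

struct core_model {
	bool me;      /* MSR[ME] */
	bool hile_le; /* HID[HILE] selects little-endian interrupts */
};

/* What a machine check would do on this core, given the installed vectors. */
enum mce_outcome mce_outcome(const struct core_model *c, enum vectors v)
{
	bool vectors_le = (v == VEC_OS);

	assert(c != NULL);
	if (!c->me)
		return MCE_CHECKSTOP;
	if (c->hile_le != vectors_le)
		return MCE_GARBAGE;
	return vectors_le ? MCE_OS_HANDLER : MCE_SKIBOOT_HANDLER;
}
```

In the old sequence a secondary core can sit at me = true, hile_le = true while the skiboot vectors are already installed, which this model reports as MCE_GARBAGE. In the new sequence HILE and the vectors only change while me is false.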
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
SPDX makes it a simpler diff.
I have audited the commit history of each file to ensure that they are
exclusively authored by IBM and thus we have the right to relicense.
The motivation behind this is twofold:
1) We want to enable experiments with coreboot, which is GPLv2 licensed
2) An upcoming firmware component wants to incorporate code from skiboot
and code from the Linux kernel, which is GPLv2 licensed.
I have gone through the IBM internal way of gaining approval for this.
The following files are not exclusively authored by IBM, so are *not*
included in this update (I will be seeking approval from contributors):
core/direct-controls.c
core/flash.c
core/pcie-slot.c
external/common/arch_flash_unknown.c
external/common/rules.mk
external/gard/Makefile
external/gard/rules.mk
external/opal-prd/Makefile
external/pflash/Makefile
external/xscom-utils/Makefile
hdata/vpd.c
hw/dts.c
hw/ipmi/ipmi-watchdog.c
hw/phb4.c
include/cpu.h
include/phb4.h
include/platform.h
libflash/libffs.c
libstb/mbedtls/sha512.c
libstb/mbedtls/sha512.h
platforms/astbmc/barreleye.c
platforms/astbmc/garrison.c
platforms/astbmc/mihawk.c
platforms/astbmc/nicole.c
platforms/astbmc/p8dnu.c
platforms/astbmc/p8dtu.c
platforms/astbmc/p9dsu.c
platforms/astbmc/vesnin.c
platforms/rhesus/ec/config.h
platforms/rhesus/ec/gpio.h
platforms/rhesus/gpio.c
platforms/rhesus/rhesus.c
platforms/astbmc/talos.c
platforms/astbmc/romulus.c
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: fixed up the drift]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Fast reboot started life as a debug hack and it escaped into the wild when
Stewart enabled it by default. There were some reasons for this, but the
main one is that a full reboot takes somewhere between one and five
minutes. For those of us who spend all day rebooting our POWER systems
fast reboot is great, but its utility for end users has always been
pretty questionable.
Rebooting a system should be a fairly infrequent activity in the field
with the main reasons for doing one being:
1) Kernel updates,
2) Misbehaving hardware
Although 1) can be performed by kexec we have found that it fails due to
2) occasionally. The reason for 2) is usually hardware getting itself
into a bad state. The universal fix for that type of hardware problem
is turning the hardware off and back on again so it's preferable that
a reboot actually does that.
This patch refactors the reboot handling OPAL calls so that fast-reboot
is only used by default when explicitly enabled, or manually invoked.
This allows developers to continue to use fast-reboot without expecting
users to deal with its quirks (and understand how a "normal" reboot,
fast-reboot and MPIPL differ).
This has two user visible changes:
1. Full reboot is now the default. In order to get fast-reboot as the
default the nvram option needs to be set:
nvram -p ibm,skiboot --update-config fast-reset=1
2. The nvram option to force a fast-reboot even when some part of
skiboot has called disable_fast_reboot() has changed from
'fast-reset=im-feeling-lucky' to 'force-fast-reset=1' because
it's impossible to actually use that 'feature' if fast-reboot is
off by default.
nvram -p ibm,skiboot --update-config force-fast-reset=1
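The resulting policy can be sketched roughly as follows. This is a minimal illustration and not the actual skiboot code: struct reboot_opts and choose_reboot() are hypothetical names, and the sketch assumes force-fast-reset only overrides an earlier disable_fast_reboot() and does not select fast reboot by itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum reboot_kind { REBOOT_FULL, REBOOT_FAST };

/* Hypothetical snapshot of the relevant settings. */
struct reboot_opts {
	bool fast_reset;       /* nvram: fast-reset=1 */
	bool force_fast_reset; /* nvram: force-fast-reset=1 */
	bool manual_request;   /* fast reboot was explicitly invoked */
};

/* disabled_reason is non-NULL if something called disable_fast_reboot(). */
enum reboot_kind choose_reboot(const struct reboot_opts *o,
			       const char *disabled_reason)
{
	assert(o != NULL);

	/* Full reboot is the default. */
	if (!o->fast_reset && !o->manual_request)
		return REBOOT_FULL;

	/* A recorded disable wins unless the user forces it. */
	if (disabled_reason != NULL && !o->force_fast_reset)
		return REBOOT_FULL;

	return REBOOT_FAST;
}
```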
Cc: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
There are a number of proc_gen branches removed that are trivially
dead code and comments that refer to P7. As well as those:
- Oliver points out that add_xics_icps() must be unused on POWER8
because it asserts if number of threads > 4, so remove it.
- Change 16b7ae641 ("Remove POWER7 and POWER7+ support") removed all
references to opal_boot_trampoline, so remove that.
- It also removed the only non-trivial choose_bus implementation, so
that is removed and its caller simplified.
- Remove the paca code, later CPUs use pcia.
Cc: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The XIVE driver exposes an API to the core OPAL layer and to other
OPAL drivers. This is a minor cleanup preparing ground for future XIVE
logic.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Using traps for assertions like Linux does gives a few advantages:
- The asm code leading to the failure condition is nicer.
- The interrupt gives a clean snapshot of machine state to dump.
The difficulty with using traps for this in OPAL is that the runtime
component will not deal well with the OS taking the 0x700 interrupt
caused by a trap in OPAL.
The long term goal is to improve the ability of the OS to inspect and
debug OPAL at runtime. For now though, the traps are patched out before
passing control to the OS, and the assert falls through to in-line
failure handling.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[oliver: commit prefix, added and renamed the FWTS label, fix tests]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Sometimes it's useful to fiddle with some of the PCI NVRAM options that
we have. Currently this is mostly for enabling and disabling pci-tracing
mode, but having a common place for this stuff is a good idea.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
|
|
Use Software Package Data Exchange (SPDX) to indicate license for each
file that is unique to skiboot.
At the same time, ensure the (C) who and years are correct.
See https://spdx.org/
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: Added a few missing files]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Fast reboot gets disabled for a number of reasons e.g. the availability
of nvlink. However this doesn't actually affect the ability to perform fast
reboot if no nvlink device is actually present.
Add an nvram option for fast-reset: if it's set to
"im-feeling-lucky" then perform the fast-reboot irrespective of whether
it has previously been disabled.
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Acked-by: Russell Currey <ruscur@russell.cc>
[stewart: update nvram_query_eq to nvram_query_eq_dangerous]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This requires implementing the MSR[RI] bit. Then just allow all
non-fatal sreset exceptions to recover.
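As a rough sketch of the check this implies: an sreset is treated as recoverable when the saved MSR image in SRR1 has RI set. The helper name sreset_is_recoverable() is hypothetical; only the MSR[RI] bit position is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR[RI] (Recoverable Interrupt) is IBM bit 62, i.e. 1 << 1. */
#define MSR_RI 0x2ULL

static_assert(MSR_RI == (1ULL << 1), "MSR[RI] is the second lowest bit");

/* srr1 holds the MSR as it was when the system reset was taken. */
bool sreset_is_recoverable(uint64_t srr1)
{
	return (srr1 & MSR_RI) != 0;
}
```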
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Provide an sreset handler specifically for fast reboots, which allows
FIXUP_ENDIAN to be removed from the normal sreset handler in the next
patch.
The save_1 == 0 condition is no longer required to signal a fast
reboot.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Secondary CPUs currently run with MSR[ME]=0 during boot, which means
if they take a machine check, the system will checkstop.
Enable ME where possible and allow them to print registers.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Improve sreset and MCE handling in fast reboot. Switch the HILE bit
off before copying OPAL's exception vectors, so NMIs can be handled
properly. Also disable MSR[ME] while the vectors are being overwritten.
Some of the remaining problem cases are documented in comments.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Use the word copy, to match copy_exception_vectors.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This takes a checksum of skiboot memory after boot that should be
unchanged during OS operation, and verifies it before allowing a
fast reboot.
This is not read-only memory from skiboot's point of view, because
it includes things like the opal branch table that gets populated
during boot.
This helps to improve the integrity of firmware against host and
runtime firmware memory scribble bugs.
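The idea can be sketched as below. This is an illustration only: the function names are made up, and the simple additive checksum is an assumption for the sketch rather than the checksum skiboot uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t boot_csum; /* recorded once, after boot has populated tables */

/* Sum the region as 64-bit words. */
uint64_t region_csum(const uint64_t *start, size_t nwords)
{
	uint64_t sum = 0;

	assert(start != NULL || nwords == 0);
	for (size_t i = 0; i < nwords; i++)
		sum += start[i];
	return sum;
}

/* Call after boot, once the region is no longer expected to change. */
void record_boot_csum(const uint64_t *start, size_t nwords)
{
	boot_csum = region_csum(start, nwords);
}

/* Call before fast reboot; false means the region was scribbled on. */
bool csum_ok_for_fast_reboot(const uint64_t *start, size_t nwords)
{
	return region_csum(start, nwords) == boot_csum;
}
```

An additive sum will miss some corruptions (for example two changes that cancel out), so it stands in here for whatever stronger checksum is appropriate.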
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Mambo image payloads get overwritten by the OS and by
fast reboot memory clearing because they have no region
defined. Add them, which allows fast reboot to work.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: fix up 'make check']
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Once things start to go wrong, disable_fast_reboot can be called a
number of times, so make the first reason sticky, and also print it
to the console at disable time. This helps with making sense of
fast reboot disables.
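A minimal sketch of the sticky behaviour, assuming a single static pointer holds the reason. The accessor fast_reboot_disabled_reason() and the message text are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static const char *fast_reboot_disabled; /* NULL while fast reboot is allowed */

void disable_fast_reboot(const char *reason)
{
	assert(reason != NULL);

	/* The first reason is sticky; later callers do not overwrite it. */
	if (fast_reboot_disabled)
		return;

	printf("RESET: Fast reboot disabled: %s\n", reason);
	fast_reboot_disabled = reason;
}

/* Hypothetical accessor for the recorded reason. */
const char *fast_reboot_disabled_reason(void)
{
	return fast_reboot_disabled;
}
```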
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
I missed a hunk when merging :(
Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Fixes: 7c8e1c6f89f3aac77661cfcee75ab515bd053d75
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
For many systems, scanning PCI takes about as much time as
zeroing all of RAM, so we may as well do them at the same time
and cut a few seconds off the total fast reboot time.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This catches a few cases (e.g., fast reboot failure messages) that
don't always make it to the console before the machine is rebooted.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This is required by the architecture and the implementations. I've
observed failures to wake up on big cores without this.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Direct control xscom can take more time to complete. We seem to
wait too little on Boston, which fails fast-reboot for no good reason.
Increase the timeout to 1 sec as a reasonable value for sreset to be
delivered and the core to start executing instructions.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This improves the security and predictability of the fast reboot
environment.
There can not be a secure fence between fast reboots, because a
malicious OS can modify the firmware itself. However a well-behaved
OS can have a reasonable expectation that OS memory regions it has
modified will be cleared upon fast reboot.
The memory is zeroed after all other CPUs come up from fast reboot,
just before the new kernel is loaded and booted into. This allows
image preloading to run concurrently, and will allow parallelisation
of the clearing in future.
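In outline, the clearing step walks the known memory regions and zeroes only those the OS was allowed to modify. The sketch below uses a hypothetical region_sketch type and an os_owned flag rather than skiboot's real mem_region structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified region descriptor. */
struct region_sketch {
	void *start;
	size_t len;
	bool os_owned; /* true if the OS may have modified this region */
};

/* Zero every OS-owned region; returns the number of bytes cleared. */
size_t clear_os_regions(struct region_sketch *r, size_t n)
{
	size_t cleared = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!r[i].os_owned)
			continue; /* firmware memory is left alone */
		memset(r[i].start, 0, r[i].len);
		cleared += r[i].len;
	}
	return cleared;
}
```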
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Run the mem_region sanity checkers before proceeding with fast
reboot.
This is the beginning of proactive sanity checks on opal data
for fast reboot (which complements the reactive disable_fast_reboot
cases). Re-using and sharing any kind of debug code and unit test
code for this is encouraged.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
pci_reset() currently does a platform reboot if it fails. It
should not know about fast-reboot at this level, so instead have
it return an error, and the fast reboot caller will do the
platform reboot.
The code essentially does the same thing, but flexibility is
improved. Ideally the fast reboot code should perform pci_reset
and all such fail-able operations before the CPU resets itself
and destroys its own stack. That's not the case now, but that
should be the goal.
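The shape of the change is roughly as follows. This is a sketch with stand-in functions: pci_reset_ok(), pci_reset_fail() and fast_reboot_pci_step() are hypothetical, and a non-zero return is assumed to mean failure.

```c
#include <assert.h>
#include <stddef.h>

enum reboot_path { PATH_CONTINUE_FAST_REBOOT, PATH_PLATFORM_REBOOT };

typedef int (*pci_reset_fn)(void);

/* Hypothetical stand-ins for a pci_reset() that succeeds or fails. */
int pci_reset_ok(void)   { return 0; }
int pci_reset_fail(void) { return -1; }

/*
 * The reset step only reports an error; deciding to fall back to a
 * platform reboot is left to the fast reboot caller.
 */
enum reboot_path fast_reboot_pci_step(pci_reset_fn pci_reset)
{
	assert(pci_reset != NULL);
	if (pci_reset() != 0)
		return PATH_PLATFORM_REBOOT;
	return PATH_CONTINUE_FAST_REBOOT;
}
```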
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is an initial fast reboot implementation for p9 which has only been
tested on the Witherspoon platform, and without the use of NPUs, NX/VAS,
etc.
This has worked reasonably well so far, with no failures in about 100
reboots. It is hidden behind the traditional fast-reboot experimental
nvram option, until more platforms and configurations are tested.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Move the boot CPU cleanup and state transition to active, logically
together with secondaries. Don't release secondaries from fast reboot
hold until everyone has cleaned up and transitioned to active.
This is cosmetic, but it is helpful to run the fast reboot state machine
the same way on all CPUs.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Change existing failure error messages to PR_NOTICE so they get
printed to the console, and add some new ones. It's not a more
severe class because it falls back to IPL on failure.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Switch fast reboot to use quiescing rather than "wait for a while".
If firmware can not be quiesced, then fast reboot is skipped. This
significantly improves the robustness of fast reboot in the face of
bugs or unexpected latencies.
Complexity of synchronization in fast-reboot is reduced, because we
are guaranteed to be single-threaded when quiesce succeeds, so locks
can be removed.
In the case that firmware can be quiesced, then it will generally
reduce fast reboot times by nearly 200ms, because quiescing usually
takes very little time.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Don't tie mambo fast reboot to POWER8 CPU type.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently the boot CPU (not the initiator) clears special wakeups
after all CPUs have called in. After the earlier change to have the
initiator wait for secondaries before calling in, this is no longer
necessary.
Have the initiator finish the entire sreset sequence, clearing special
wakeups after all others have called in.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This function has shrunk to the point it's not so helpful to keep it,
it's no longer power8 specific, and getting rid of it simplifies
error handling a little in future changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
There is a 100ms delay when targets reach sreset which does not appear
to have a good purpose. Remove it and therefore reduce the sreset timeout
by the same amount.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is a bit of paranoia, but when a CPU changes state to signal it
has reached a particular point, all previous stores should be visible.
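A portable analogue of this ordering, written with C11 atomics instead of the barrier instructions firmware would use directly. The cpu_sketch type and helper names are assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

enum cpu_state_sketch { STATE_OS = 0, STATE_IN_FAST_REBOOT = 1 };

struct cpu_sketch {
	uint64_t scratch; /* ordinary data written before the state change */
	atomic_int state;
};

/* Release store: earlier stores by this CPU are visible to any
 * observer that sees the new state with an acquire load. */
void cpu_signal_state(struct cpu_sketch *c, int new_state)
{
	assert(c != NULL);
	atomic_store_explicit(&c->state, new_state, memory_order_release);
}

/* Acquire load, pairing with the release store above. */
int cpu_observe_state(struct cpu_sketch *c)
{
	assert(c != NULL);
	return atomic_load_explicit(&c->state, memory_order_acquire);
}
```

A waiter that sees STATE_IN_FAST_REBOOT through cpu_observe_state() can then read scratch and rely on finding the value written before the state change.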
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Have the initiator wait for all its sreset targets to call in, and
time out after 200ms if they did not. Fail and revert to IPL reboot.
Testing indicates that after successful sreset_all_others(), it
takes less than 102ms (in hundreds of fast reboots) for secondaries
to call in. 100ms of that is due to an initial delay, but core
un-splitting was not measured.
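The wait can be sketched as a bounded polling loop. To keep the example deterministic it uses a simulated clock: struct call_in_sim and its fields are made up for illustration, and each loop iteration stands in for a 1ms delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CALL_IN_TIMEOUT_MS 200

/* Simulated time source, so the sketch runs without real hardware. */
struct call_in_sim {
	uint64_t now_ms;    /* current simulated time */
	uint64_t arrive_ms; /* time at which the last target calls in */
};

static bool all_called_in(const struct call_in_sim *s)
{
	return s->now_ms >= s->arrive_ms;
}

/* Returns true if every target called in before the timeout.
 * On false the caller gives up and reverts to the IPL reboot path. */
bool wait_for_call_in(struct call_in_sim *s)
{
	uint64_t deadline;

	assert(s != NULL);
	deadline = s->now_ms + CALL_IN_TIMEOUT_MS;
	while (!all_called_in(s)) {
		if (s->now_ms >= deadline)
			return false;
		s->now_ms++; /* stands in for a 1ms delay */
	}
	return true;
}
```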
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Pass back failures from sreset_all_others, also change return codes to
OPAL_ form in sreset_all_prepare to match.
Errors will revert to the IPL path, so it's not critical to completely
clean up everything if that would complicate things. Detecting the
error and failing is the important thing.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Move the mambo sreset code out from the P8 implementation.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The "last man standing" logic has the initiator CPU sreset all
others, then one of them sresets the initiator.
This complicates the fast reboot process and increases potential for
errors. The initiator can simply branch to 0x100 directly.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This provides a simple API that is amenable to be implemented
by the direct-controls subsystem in a future change.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
pm idle requires the system reset vector and IPI facilities before
it can be enabled. Split these out and manage them individually.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Create a more generic helper for changing HID0 bits on all
processors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
It uses tlbiel and only cleans up the TLB of the calling core.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Recent CPUs have introduced a lower SMT priority. This uses the
Linux pattern of executing priority nops in descending order to
get a simple portable way to put the CPU into lowest SMT priority.
Introduce smt_lowest() and use it in place of smt_very_low and
smt_low ; smt_very_low sequences.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|