|
This is an experimental patch that implements "Fast reboot" on P8
machines.
The basic idea is that when the OS calls OPAL reboot, we gather all
the threads in the system using a combination of patching the reset
vector and soft-resetting them, then clean up a few bits of hardware
(we do re-probe PCIe for example), and reload & restart the bootloader.
For Trusted Boot, this means we *add* measurements to the TPM, so you
will get *different* PCR values as compared to a full IPL. This makes
sense as if you want to be sure you are running something known then,
well, do a full IPL as soft reset should never be trusted to clear any
malicious code.
This is very experimental and needs a lot of testing and also auditing
code for other bits of HW that might need to be cleaned up.
BenH TODO: I also need to check if we are properly PERST'ing PCI devices.
This is partially based on old code I had to do that on P7. I only
support it on P8 though as there are issues with the PSI interrupts
on P7 that cannot be reliably solved.
Even though this should be considered somewhat experimental, we've had
a lot of success on a variety of machines. Dozens/hundreds of reboots
across Tuleta, Garrison and Habanero.
Currently, we've hidden it behind an NVRAM config option, which *is*
liable to change in the future (to ensure that only those who know
what they're doing enable it).
You can enable the experimental support via the nvram option:
nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[stewart@linux.vnet.ibm.com: hide behind nvram option, include Mambo fixes
from Mikey]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
During an OCC reset cycle the system is forced to the Psafe pstate.
When the OCC becomes active, the system has to be restored to the
last pstate requested by the host. So the host needs to be notified
of the OCC_RESET event, or else the system will remain in the
Psafe state until the host requests a new pstate after the OCC
reset cycle.
This patch defines 'OPAL_PRD_MSG_TYPE_OCC_RESET_NOTIFY' to
notify OPAL when opal-prd issues OCC reset. OPAL will queue
OCC_RESET message to host when it receives opal_prd_msg of
type '*_OCC_RESET_NOTIFY'.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Be a bit clearer about the impact of some of these errors.
Suggested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This lets us use FWTS to check that we haven't errored out enabling
OCCs. This is a (relatively) common error, and since we continue to
boot without OCCs, things still work fine; we just don't get any
power or frequency scaling.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Parse the entire pstate table provided by OCC and filter out the
entries that are outside the Pmax and Pmin limits. This can
occur when turbo mode is disabled and OCC limits the Pmax to
nominal pstate, but includes turbo pstates in the pstate table.
We end up with wrong pstates in such cases if we do not filter
the pstate table down to the correct range.
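The filtering step described above can be sketched as follows. This is a minimal illustration, not the skiboot code: `struct pstate_entry`, its fields and `filter_pstates()` are hypothetical names, and pstate ids are assumed to be signed values with Pmin <= Pmax.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified table entry (not the real OCC layout). */
struct pstate_entry {
	int8_t id;		/* signed pstate id (assumption) */
	uint32_t freq_khz;	/* frequency for this pstate */
};

/*
 * Copy the entries of 'in' whose id lies in [pmin, pmax] into 'out'.
 * 'out' must have room for 'n' entries. Returns the number kept.
 */
static size_t filter_pstates(const struct pstate_entry *in, size_t n,
			     int8_t pmin, int8_t pmax,
			     struct pstate_entry *out)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		if (in[i].id < pmin || in[i].id > pmax)
			continue;	/* e.g. a turbo pstate above Pmax */
		out[kept++] = in[i];
	}
	return kept;
}
```

With turbo disabled, Pmax would be the nominal pstate, so any turbo entries still present in the table are dropped before they reach the device tree.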
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When constructing the pstate entries in the device tree we allocate
MAX_PSTATES, even though we know that there are nr_pstates.
Use this information to allocate nr_pstates and potentially save us some
heap.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add support to read version 2 of the OCC-OPAL shared memory region to
parse ultra-turbo pstates and the core-to-max-pstate-allowed array and
append them to the device tree. Each element of core-to-max-pstate-allowed
indicates the maximum pstate sustained with 'n' online cores.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fixes: a804c1b2c13f
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Modify the OCC reset order such that the master OCC is reset after the
slave OCCs are reset. In Tuleta/Alpine systems 'proc0' will always be
the master OCC, which has to be stopped last when FSP sends the OCC_RESET
command to OPAL.
This fixes BZ 119718, SW289036
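The ordering rule (slaves first, master last) can be sketched like this. The `struct occ_chip` record and `occ_reset_order()` are hypothetical names used only for illustration; the real code walks skiboot's chip list.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-chip OCC record. */
struct occ_chip {
	unsigned int chip_id;
	int is_master;	/* non-zero for the master OCC ('proc0' above) */
};

/*
 * Fill 'order' with indexes into 'chips': every slave first, then the
 * master. 'order' must have room for 'n' entries. Returns the number of
 * indexes written.
 */
static size_t occ_reset_order(const struct occ_chip *chips, size_t n,
			      size_t *order)
{
	size_t i, pos = 0;

	for (i = 0; i < n; i++)		/* pass 1: slaves */
		if (!chips[i].is_master)
			order[pos++] = i;
	for (i = 0; i < n; i++)		/* pass 2: master goes last */
		if (chips[i].is_master)
			order[pos++] = i;
	return pos;
}
```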
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
hw/occ.c:278:2: warning: Value stored to 'rc' is never read
rc = xscom_read(chip->id, XSCOM_ADDR_P8_EX_SLAVE(core, EX_PM_PPMCR), &tmp);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hw/occ.c:309:2: warning: Value stored to 'rc' is never read
rc = xscom_read(chip->id, XSCOM_ADDR_P8_EX_SLAVE(core, EX_PM_PPMSR), &tmp);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Since skiboot is all BE, this doesn't make a difference in code generated.
It does silence the following sparse warnings though:
hw/occ.c:354:38: warning: incorrect type in assignment (different base types)
hw/occ.c:354:38: expected restricted beint64_t [usertype] type
hw/occ.c:354:38: got int
hw/occ.c:370:46: warning: incorrect type in assignment (different base types)
hw/occ.c:370:46: expected restricted beint64_t [addressable] [assigned] [usertype] type
hw/occ.c:370:46: got int
hw/occ.c:371:46: warning: incorrect type in assignment (different base types)
hw/occ.c:371:46: expected restricted beint64_t [addressable] [assigned] [usertype] chip
hw/occ.c:371:46: got unsigned int [unsigned] [usertype] id
hw/occ.c:372:57: warning: incorrect type in assignment (different base types)
hw/occ.c:372:57: expected restricted beint64_t [addressable] [assigned] [usertype] throttle_status
hw/occ.c:372:57: got unsigned char [unsigned] [usertype] throttle
hw/occ.c:477:49: warning: incorrect type in initializer (different base types)
hw/occ.c:477:49: expected restricted beint64_t [usertype] type
hw/occ.c:477:49: got int
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In some simulation environments, we simulate a system close to an
ibm-fsp system but with a crucial difference: we don't simulate OCCs.
This means that for a P8 (well, a simulated one) that looks like it's
part of an ibm-fsp system, we'd wait around for about a minute to be
asked to start OCCs and for the OCCs to start. Obviously, this would
never happen and we'd hit the OCC initialization timeout (correctly)
logging an error.
However, in this simulation environment it isn't an error, and the
information needed to work that out is (at least now) provided in
hdat under 'OCC Functional State'.
Previously, the ibm,occ-functional-state property was just passed
through the device tree to the host through the XSCOM node and
skiboot ignored it.
This patch takes note of occ-functional-state and skips waiting for
OCCs on any chips that have been marked as having a non-functional
OCC.
In such simulation environments this means we:
a) don't log an error that isn't really an error
b) boot 1 minute quicker as we don't hit the timeout.
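As a rough sketch of the decision above: count only the chips whose OCC is marked functional, and skip the wait entirely when that count is zero. `struct chip_info` and `occs_to_wait_for()` are hypothetical names; in skiboot the flag would come from the ibm,occ-functional-state property.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-chip record for illustration only. */
struct chip_info {
	unsigned int id;
	bool occ_functional;	/* derived from ibm,occ-functional-state */
};

/* Number of chips whose OCC we should actually wait for. */
static size_t occs_to_wait_for(const struct chip_info *chips, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (chips[i].occ_functional)
			count++;
	return count;
}
```

A caller would only enter the timeout loop when the returned count is non-zero.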
Tested-by: Gajendra B Bandhu1 <gbandhu1@in.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In the event of a lot of OCC events (or many CPU cores), we could
send many OCC messages to the host, which if it wasn't calling
opal_get_msg really often, would cause skiboot to malloc() additional
messages until we ran out of skiboot heap and things didn't end up
being much fun.
Certain hardware exercisers seem to steal all the time Linux would
otherwise have to call opal_get_msg, causing these messages to queue up
and producing "opalmsg: No available node in the free list, allocating"
warnings followed by tonnes of backtraces of failing memory allocations.
|
|
Recent HostBoot & SBE firmware provide a HW timer facility that can
be used to implement OPAL timers and thus limit the reliance on the
Linux heartbeat.
This implements support for it. The side effect is that i2c from Centaurs
is now usable.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[stewart@linux.vnet.ibm.com: fix run-timer unit test]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
There are now no users of the call_out parameter and future users should
use the log_append_msg() and log_append_data() functions, so remove all
references to call_out.
Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add a new class of message definition OPAL_MSG_OCC to
opal_message_type to notify the following OCC events to host:
1) OCC Reset
2) OCC Load
3) OCC Throttle Status Change
Add an opal poller to periodically read throttle status updated by OCC
for each chip and notify any change in throttle status to host. The
throttle status indicates the reason why OCC may have limited the max
Pstate of the chip.
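The change-detection part of such a poller might look like the sketch below. `MAX_CHIPS`, `struct throttle_tracker` and `throttle_poll()` are illustrative names, not skiboot's; the returned bitmask stands in for "queue an OPAL_MSG_OCC message for this chip".

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHIPS 4	/* illustrative limit */

/* Last throttle status reported to the host, per chip. */
struct throttle_tracker {
	uint8_t last[MAX_CHIPS];
};

/*
 * Compare the current per-chip status against the last reported one.
 * Returns a bitmask of chips whose status changed and records the new
 * values, so an unchanged status is not reported twice.
 */
static uint32_t throttle_poll(struct throttle_tracker *t,
			      const uint8_t *now, size_t n)
{
	uint32_t changed = 0;
	size_t i;

	for (i = 0; i < n && i < MAX_CHIPS; i++) {
		if (now[i] == t->last[i])
			continue;
		t->last[i] = now[i];
		changed |= 1u << i;
	}
	return changed;
}
```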
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We were previously asking the OCC of the current chip to generate
the self interrupt. However, Hostboot does not configure all the PSI
Host Bridges, so if the current chip happens to have an unconfigured
PSI HB, the chip will never see the interrupt.
Instead grab a chip id from the list of configured PSIs, and ask the OCC
on that chip to generate the self-interrupt.
This adds a pointer to the chip's PSI in struct proc_chip so we can
use the current chip's PSI if it is active without having to look
through all of them.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
cc: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Commit 4db0c1e4f introduced the occ_load_req queue. With that change we
queue the OCC load request if the hostservice LID load is not complete,
and a callback function (occ_poke_load_queue) takes care of calling
__occ_do_load().
But the current code proceeds and calls __occ_do_load() after queueing,
which is not correct. So just return if we queue the OCC load request.
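The control flow of the fix can be illustrated with a small model. Everything here (`struct occ_loader`, `do_load()`, `occ_load_request()`, `poke_load_queue()`) is a hypothetical simplification; the real functions are `__occ_do_load()` and `occ_poke_load_queue()` and they carry far more state.

```c
#include <stdbool.h>

/* Hypothetical model of the loader state. */
struct occ_loader {
	bool lids_cached;	/* hostservice LID load complete? */
	int queued;		/* pending load requests */
	int loads_done;		/* times the load actually ran */
};

static void do_load(struct occ_loader *l)
{
	l->loads_done++;
}

static void occ_load_request(struct occ_loader *l)
{
	if (!l->lids_cached) {
		l->queued++;
		return;		/* the fix: do not fall through to do_load() */
	}
	do_load(l);
}

/* Called once the LIDs are cached: service everything that was queued. */
static void poke_load_queue(struct occ_loader *l)
{
	l->lids_cached = true;
	while (l->queued > 0) {
		l->queued--;
		do_load(l);
	}
}
```

Without the early return, a request arriving before the LIDs are cached would be loaded immediately and then loaded again from the queue.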
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We need to pass the PNOR access status to the OCCs, as they may write to
the PNOR in the event of a checkstop.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
|
|
This change hooks the OCC TMGT interrupt path into the PRD's
prd_tmgt_interrupt function.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
There is no guarantee that a hostservices lid load request will arrive
after we have cached the required lids. For such cases, queue the
request and service them after caching.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
And add some basic qemu quirks
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently, the occ_interrupt handler will clear the interrupt bit along
with the interrupt reason. If an irq has occurred between the read and
the clear, we'll mask out the interrupt bit for that new event.
This change checks the reason bits after clearing the interrupt bit. If
any are set, we re-set the interrupt bit to trigger another interrupt.
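A minimal model of the read, clear and re-check sequence, assuming a made-up bit layout (`OCC_IRQ_BIT`, `OCC_REASON_MASK`) and a plain memory location in place of the real register, which is accessed over XSCOM:

```c
#include <stdint.h>

#define OCC_IRQ_BIT	0x80u	/* hypothetical bit layout */
#define OCC_REASON_MASK	0x0fu

/* Step 1: read which reasons are currently pending. */
static uint8_t occ_irq_read_reasons(const volatile uint8_t *reg)
{
	return *reg & OCC_REASON_MASK;
}

/*
 * Step 2: clear the interrupt bit and the reasons we read, then re-check.
 * If another reason was raised between the read and the clear, set the
 * interrupt bit again so that event triggers a new interrupt.
 */
static void occ_irq_clear(volatile uint8_t *reg, uint8_t seen)
{
	*reg &= (uint8_t)~(OCC_IRQ_BIT | seen);

	if (*reg & OCC_REASON_MASK)
		*reg |= OCC_IRQ_BIT;
}
```

Splitting the sequence into two functions is only done here so the window between the read and the clear is visible; a handler would call them back to back.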
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The OCC interrupt register only exists on P8; accessing it on P7 not
only causes error logs but also causes PRD to eventually gard chips.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
During characterisation, we'd like to allow userspace to see the vdd and
vcs values exposed by the OCC. This change adds two new properties to
expose these:
ibm,pstate-vdds
ibm,pstate-vcss
- containing one byte per pstate, representing the Vdd or Vcs value for
that pstate.
Because we now have a few different error paths (one for each allocation
failure), we consolidate the free()s into a single path.
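The single cleanup path is the usual C `goto out` idiom. The sketch below is illustrative only: `build_pstate_props()` and its checksum output are made up, and the loop is a placeholder for filling the arrays and adding the device tree properties.

```c
#include <stdint.h>
#include <stdlib.h>

/* Returns 0 on success, -1 if any allocation failed. */
static int build_pstate_props(size_t nr_pstates, uint32_t *sum_out)
{
	uint32_t *ids = NULL;
	uint8_t *vdds = NULL, *vcss = NULL;
	size_t i;
	int rc = -1;

	ids = malloc(nr_pstates * sizeof(*ids));
	if (!ids)
		goto out;
	vdds = malloc(nr_pstates);	/* one byte per pstate */
	if (!vdds)
		goto out;
	vcss = malloc(nr_pstates);	/* one byte per pstate */
	if (!vcss)
		goto out;

	/* Placeholder for filling the arrays and adding DT properties. */
	*sum_out = 0;
	for (i = 0; i < nr_pstates; i++) {
		ids[i] = (uint32_t)i;
		vdds[i] = vcss[i] = 1;
		*sum_out += ids[i] + vdds[i] + vcss[i];
	}
	rc = 0;
out:
	/* free(NULL) is a no-op, so one path serves every failure point. */
	free(vcss);
	free(vdds);
	free(ids);
	return rc;
}
```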
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
If the OCC interrupt comes from another chip, we incorrectly try to clear
it on the local one. This causes hangs at boot on some machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Commit cf6f4e8912d29fb89ce85c84834607065ad595a5 introduced a platform
independent frontend for error logging. However it failed to move the
generic parts of the fsp-elog.h header into the platform independent
one, instead relying on the fact that up until now fsp-elog.h was
included whenever a function needed to log errors.
This patch moves the platform independent defines into the frontend
header file (errorlog.h) and removes the include of the platform
specific header in generic code paths.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Mambo doesn't implement various things such as PBA SCOMs, LPC,
ChipTOD, etc... It also provides a special console hook.
This adds detection of Mambo via the /mambo node, and enables
us to boot all the way to Linux.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
This function uses int arrays on the stack, which pushes stack usage to
more than 2kB. Reduce stack usage by allocating the memory instead.
Ben H's stack check compile option exposed this usage count:
hw/occ.c: In function 'add_cpu_pstate_properties':
hw/occ.c:187:1: warning: the frame size of 2064 bytes is larger than
2048 bytes [-Wframe-larger-than=]
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Modify the FSP response message to include the status code in the
status/error byte instead of adding a new word to it, which is
incorrect.
FSP ack messages are 2 words with status in the 3rd byte of second
word. Status byte is in the extra (3rd) word only on new status
messages from OPAL to FSP.
Code corrected based on FSP mailbox spec version 3.16.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
OPAL is expected to leave the OCC stopped after receiving a reset OCC
message from FSP. FSP will send this either at boot before
a load/start, or during runtime before a load/start. If there
is no subsequent load/start command, the OCC can be left stopped.
After a few attempts (runtime reset), FSP can just send reset and
expect OPAL to leave the OCC in the stopped state.
Call HBRT to stop OCC on FSP reset OCC command and acknowledge.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This makes OPAL use the OCC interrupt facility to send itself an interrupt
whenever the OPAL event bit is set as a result of an OPAL call that wasn't
itself opal_handle_interrupt() or opal_handle_hmi() (both of which we know
the OS will already deal with appropriately).
This ensures that OPAL event changes are notified to Linux via its
interrupt path which is necessary for it to properly broadcast the state
change to its various clients.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Basically, errors should be logged as errors and for the most part,
standard booting with log level of PR_NOTICE doesn't need debug
level output.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
Presently we log an informational event if an OCC timeout happens
during boot. Change the severity to Unrecoverable Error.
Also updated the elog description.
Sample Output:
|------------------------------------------------------------------------------|
| Entry Id     Commit Time           SubSystem                Committed by     |
| Platform Id  State                 Event Severity           Ascii Str        |
|------------------------------------------------------------------------------|
| 0x53A530C8   10/09/2014 10:13:06   CEC Hardware Subsystem   OC               |
| 0xB0000001   Sent to Hypervisor    Unrecoverable Error      BB82C013         |
|------------------------------------------------------------------------------|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When the fast/deep power management modes for the cpu idle states
are initialized, bits which are not relevant in this context are also
being set. Fix this.
Besides this, the EX_PM_GP1 register will be read/written into by the
OCC as well. We touch this register during initialization of fast/deep
cpuidle modes and during initialization of pstate transitions. The register
contents can thus get messed up due to potential race conditions between
the OCC and sapphire settings.
Hence make use of the AND and OR scoms to do the settings, and
let the hardware take care of the necessary synchronization.
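To illustrate why the AND/OR access forms help: a software read-modify-write can overwrite a bit the OCC changed in between, whereas a write to the OR (or AND) form only touches the bits named in the written value. The model below simulates such a register in plain memory; `struct gp1_reg` and the helper names are hypothetical, and on real hardware the AND/OR forms are separate SCOM addresses applied by the hardware itself.

```c
#include <stdint.h>

/* Simulated EX_PM_GP1-like register (illustration only). */
struct gp1_reg {
	uint64_t val;
};

/* Model of a write to the AND form: register &= written value. */
static void gp1_write_and(struct gp1_reg *r, uint64_t mask)
{
	r->val &= mask;
}

/* Model of a write to the OR form: register |= written value. */
static void gp1_write_or(struct gp1_reg *r, uint64_t bits)
{
	r->val |= bits;
}

/* Set or clear only our own bits; nothing is read back by software. */
static void gp1_set_bits(struct gp1_reg *r, uint64_t bits)
{
	gp1_write_or(r, bits);
}

static void gp1_clear_bits(struct gp1_reg *r, uint64_t bits)
{
	gp1_write_and(r, ~bits);
}
```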
We can also get rid of the setting of deep mode during slw_reinit since
we enable the required deep winkle mode during slw_init itself. This means
effectively removing the slw_prepare_chip() and its children functions.
They are no longer useful.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Keep it 0 for open-power platforms where OCC is going to be preloaded.
This also avoids an annoying 1 minute delay on early openpower and bml
when there is no OCC firmware to wait for.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The OCC F/w team recommends a 60s timeout waiting for OCC to init. OCC
has to wait for memory throttle calibration from a hardware procedure,
and that takes tens of seconds on large memory configurations. In one of
the failing cases, it took 24s for OCC to init.
Typically OCC takes 2-5 secs to boot and we do that in parallel with
skiboot inits. But in certain corner cases with large memory,
we have to wait for 60s before we give up.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|