Age | Commit message (Collapse) | Author | Files | Lines |
|
Commit 80fd2e963bd4 ("xscom: Don't log xscom errors caused by OPAL
calls") ensured that xscom errors caused due to XSCOM read/write OPAL
calls aren't logged in the error-log since the caller of the OPAL call
is expected to handle it.
However we are continuing to print the prerror() in the OPAL log
regarding the same. This patch reduces the severity of the log from
PR_ERROR to PR_INFO for the xscom read and write made via OPAL calls.
Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Print info only for xscom read/writes made via opal calls
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Currently we assume any xscom_read / write targeted at a chipid with 0x8
as the top four bits is intended to be a centaur SCOM. On non-P8
platforms there is no reason to assume this so covert it to use the
new struct scom_controller infrastructure.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Currently the top nibble of the "part ID" is used to determine the type
of a xscom_read() / xscom_write() call. This was mainly done for the
benefit of PRD on P8 which would do "targeted" SCOMs to EX (core)
chiplets and rely on skiboot to do find the actual scom address.
Similarly, PRD also relied on this to access the SCOMs of centaur
chips which are accessed via FSI on P8.
On P9 PRD moved to only doing non-targeted scoms where it would
only ever supply a "part ID" which was the fabric ID of the chip
to be SCOMed. The centaur support was also unnecessary since OPAL
didn't support any P9 systems with Centaurs. However, on future
systems we will have to support memory buffer chips again so we need
to expand the SCOM support to accomodate them.
To do this, allow skiboot components to register a SCOM read and write()
function for chip ID. This will allow us to ensure the P8 EX chiplet and
Centaur SCOM code is only ever used on P8, freeing up the Part ID
address space for other uses.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
SPDX makes it a simpler diff.
I have audited the commit history of each file to ensure that they are
exclusively authored by IBM and thus we have the right to relicense.
The motivation behind this is twofold:
1) We want to enable experiments with coreboot, which is GPLv2 licensed
2) An upcoming firmware component wants to incorporate code from skiboot
and code from the Linux kernel, which is GPLv2 licensed.
I have gone through the IBM internal way of gaining approval for this.
The following files are not exclusively authored by IBM, so are *not*
included in this update (I will be seeking approval from contributors):
core/direct-controls.c
core/flash.c
core/pcie-slot.c
external/common/arch_flash_unknown.c
external/common/rules.mk
external/gard/Makefile
external/gard/rules.mk
external/opal-prd/Makefile
external/pflash/Makefile
external/xscom-utils/Makefile
hdata/vpd.c
hw/dts.c
hw/ipmi/ipmi-watchdog.c
hw/phb4.c
include/cpu.h
include/phb4.h
include/platform.h
libflash/libffs.c
libstb/mbedtls/sha512.c
libstb/mbedtls/sha512.h
platforms/astbmc/barreleye.c
platforms/astbmc/garrison.c
platforms/astbmc/mihawk.c
platforms/astbmc/nicole.c
platforms/astbmc/p8dnu.c
platforms/astbmc/p8dtu.c
platforms/astbmc/p9dsu.c
platforms/astbmc/vesnin.c
platforms/rhesus/ec/config.h
platforms/rhesus/ec/gpio.h
platforms/rhesus/gpio.c
platforms/rhesus/rhesus.c
platforms/astbmc/talos.c
platforms/astbmc/romulus.c
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: fixed up the drift]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This adds missing endian conversions to most calls, sufficient at least
to handle calls from a kernel booting on mambo.
Subsystems requiring more extensive changes (e.g., xive) will be done
with individual changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The XSCOM read/write OPAL calls are largely there to support running
PRD in the OS. PRD itself handles submitting error logs (if needed)
when a XSCOM operation fails so there's no need to send an error log
from inside of OPAL.
Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
There are a number of proc_gen branches removed that are trivially
dead code and comments that refer to P7. As well as those:
- Oliver points out that add_xics_icps() must be unused on POWER8
because it asserts if number of threads > 4, so remove it.
- Change 16b7ae641 ("Remove POWER7 and POWER7+ support") removed all
references to opal_boot_trampoline, so remove that.
- It also removed the only non-trival choose_bus implementation, so
that is removed and its caller simplified.
- Remove the paca code, later CPUs use pcia.
Cc: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Use Software Package Data Exchange (SPDX) to indicate license for each
file that is unique to skiboot.
At the same time, ensure the (C) who and years are correct.
See https://spdx.org/
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: Added a few missing files]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
It's been a good long while since either OPAL POWER7 user touched a
machine, and even longer since they'd have been okay using an old
version rather than tracking master.
There's also been no testing of OPAL on POWER7 systems for an awfully
long time, so it's pretty safe to assume that it's very much bitrotted.
It also saves a whole 14kb of xz compressed payload space.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Enthusiasticly-Acked-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Most nvram options used by skiboot are just for debug or testing for
regressions. They should never be used long term.
We've hit a number of issues in testing and the field where nvram
options have been set "temporarily" but haven't been properly cleared
after, resulting in crashes or real bugs being masked.
This patch marks most nvram options used by skiboot as dangerous and
prints a chicken to remind users of the problem.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Fixes: 2c8f96534a978bb4cac3e4b7dd393a9cc4926555
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This was disabled at some point during bringup to make life easier for
the lab folks trying to debug NVLink issues. This hack really should
have never made it out into the wild though, so we now have the
following situation occuring in the field:
1) A bad happens
2) The host kernel recieves an unrecoverable HMI and calls into OPAL to
request a platform reboot.
3) OPAL rejects the reboot attempt and returns to the kernel with
OPAL_PARAMETER.
4) Kernel panics and attempts to kexec into a kdump kernel.
A side effect of the HMI seems to be CPUs becoming stuck which results
in the initialisation of the kdump kernel taking a extremely long time
(6+ hours). It's also been observed that after performing a dump the
kdump kernel then crashes itself because OPAL has ended up in a bad
state as a side effect of the HMI.
All up, it's not very good so re-enable the software checkstop by
default. If people still want to turn it off they can using the nvram
override.
Cc: skiboot-stable@lists.ozlabs.org
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Enable a new PVR to get us running on another p9 variant.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We print out a whole bunch of things on boot, most of which aren't
interesting, so we should *not* print them instead.
Printing things like what CPUs we found and what PCI devices we found
*are* useful, so continue to do that. But we don't need to splat out
a bunch of things that are always going to be true.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17.
We don't need this as we need to do it a different way, with a explicit
set of registers as otherwise we trip other random FIR bits and everything
becomes even more terrible.
I suggest alcohol.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is not the way we want to end up doing this.
This is a hack to make folk happy and not require crondump to
debug nvidia/npu2 issues.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add a mechanism to enable/disable sw checkstop by looking at nvram option
opal-sw-xstop=<enable/disable>.
For now this patch disables the sw checkstop trigger unless explicitly
enabled through nvram option 'opal-sw-xstop=enable'i for p9. This will allow
an opportunity to get host kernel in panic path or xmon for unrecoverable
HMIs or MCE, to be able to debug the issue effectively.
To enable sw checkstop in opal issue following command:
# nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
NOTE: This is a workaround patch to disable sw checkstop by default to gain
control in host kernel for better checkstop debugging. Once we have most of
the checkstop issues stabilized/resolved, revisit this patch to enable sw
checkstop by default.
For p8 platform it will remain enabled by default unless explicitly disabled.
To disable sw checkstop on p8 issue following command:
# nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Due to a hardware issue where core responding to scom was delayed due to
thread reconfiguration, leaves the SCOM logic in a state where the
subsequent scom to that core can get errors. This is affected for Core
PC scom registers in the range of 20010A80-20010ABF
The solution is if a xscom timeout occurs to one of Core PC scom registers
in the range of 20010A80-20010ABF, a clearing scom write is done to
0x20010800 with data of '0x00000000' which will also get a timeout but
clears the scom logic errors. After the clearing write is done the original
scom operation can be retried.
The scom timeout is reported as status 0x4 (Invalid address) in HMER[21-23].
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
So caller of xscom_reset() does not have to bother about adding a delay
separately. Instead caller can control whether to add a delay or not using
second argument to xscom_reset().
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Workaround on P9: PRD does operations it *knows* will fail with this
error to work around a hardware issue where accesses via the PIB
(FSI or OCC) work as expected, accesses via the ADU (what xscom goes
through) do not. The chip logic will always return all FFs if there
is any error on the scom.
Suggested-by: Daniel M Crowell <dcrowell@us.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
xscom_read/write operations returns CHIPLET_OFFLINE when chiplet is offline.
Some multicast xscom_read/write requests from HBRT results in xscom operation
on offline chiplet(s) and printing below warnings in OPAL console.
[ 135.036327572,3] XSCOM: Read failed, ret = -14
[ 135.092689829,3] XSCOM: Read failed, ret = -14
This results in unnecessary bugs. Hence remove error message for multicast
SCOM operations.
Suggested-by: Daniel M Crowell <dcrowell@us.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Current code only supports DD1. This adds support for DD2.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
It is common for xscom registers to only contain specific bit fields that
need to be modified without altering the rest of the register. This adds a
convenience function to perform xscom read-modify-write operations under a
mask.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
90% of what we print isn't useful to a normal user. This
dramatically reduces the amount of messages printed by
OPAL in normal circumstances.
We still need to add a way to bump the log level at boot
based on a BMC scratch register or some HDAT property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In the mambo tcl we set the CPU version to DD 2.0, because mambo is not
bug compatible with DD 1.
But in xscom_read_cfam_chipid() we have a hard coded value, to work
around the lack of the f000f register, which claims to be P9 DD 1.0.
This doesn't seem to cause crashes or anything, but at boot we do see:
[ 0.003893084,5] XSCOM: chip 0x0 at 0x1a0000000000 [P9N DD1.0]
So fix it to claim that the xscom is also DD 2.0 to match the CPU.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add code to perform indirect form 1 scoms.
POWER8 does form 0 only. POWER9 adds form 1. The form is determined
from the address only. Hardware only allows writes for form 1.
Only hostboot uses these scoms during IPL, so they are unused by
skiboot currently.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Indirect scoms can only set certain bits of data. Ensure only these
are set when trying to write.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add scom reset registers for POWER9.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Abstract error recovery registers to get ready for POWER9.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
SCOM doesn't exist, we end up in a sad place (machine check)
Fixes: c9cadb4fe60d4d41fc45a35a1d2ae27e0632c679
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
To differentiate between 1.00, 1.01, 1.02 etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We add some routines that let a caller get the xscom lock once and
then do a bunch of xscoms while holding it.
In some situations without this, it could take long enough to get
the xscom lock that the 1ms timeout would expire and we'd falsely
think the SLW timer didn't work when in fact it did.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Instead of mapping them to just 3 different codes, define an OPAL
error code for all known HMER error status, as different recovery
path might be needed at the call site, and it allows for more
informative logging.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In case of error, don't leave the data random. It helps debugging when
the user fails to check the error code. This happens due to a bug in the
PRD wrapper app.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The checks validate pointers sent in using
opal_addr_valid() in opal_call API's provided
via the console, cpu, fdt, flash, i2c, interrupts,
nvram, opal-msg, opal, opal-pci, xscom and
cec modules
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
[stewart@linux.vnet.ibm.com: added FWTS annotation]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
XSCOM engine blocks subsequently after querying FIR of any
sleeping core. This causes subsequent XSCOM opertions to hang
forever due to XSCOM engine being continuously busy. Reset XSCOM
engine after querying FIR of any sleeping core.
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
[stewart@linux.vnet.ibm.com: split from timebase quirk patch]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
More to go, especially we need to review recovery, but at least
this enables indirect and errors out on not-yet-supported EX
targeting.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
OPAL retries XSCOM read/write operations forever till it succeeds.
This can cause XSCOM ops to hang forever when XSCOM engine remains
busy for some reason. Changed it to retry XSCOM operations only
XSCOM_BUSY_MAX_RETRIES number of times instead of retrying forever.
Also added logic to reset XSCOM engine after XSCOM_BUSY_RESET_THRESHOLD
number of retries to unblock it when it remains busy.
Cc: stable # 9c2d82394fd2 ("xscom: Return OPAL_WRONG_STATE on XSCOM ops..")
Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add PVR detection, chip id and other misc bits for POWER9.
POWER9 changes the location of the HILE and attn enable bits in the
HID0 register, so add these definitions also.
Signed-off-by: Michael Neuling <mikey@neuling.org>
[stewart@linux.vnet.ibm.com: Fix Numbus typo, hdata_to_dt build fixes]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
xscom_read and xscom_write return OPAL_SUCCESS if they worked, and
OPAL_HARDWARE if they didn't. This doesn't provide information about why
the operation failed, such as if the CPU happens to be asleep.
This is specifically useful in error scanning, so if every CPU is being
scanned for errors, sleeping CPUs likely aren't the cause of failures.
So, return OPAL_WRONG_STATE in xscom_read and xscom_write if the CPU is
sleeping.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
hw/xscom.c:425:23: warning: constant 0x221EF04980000000 is so big it is long
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
It works just like P8, we copy the code for now rather than make
it somewhat common due to our locking differences and to limit
the risk close to release. We can refactor later.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|