|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 17661bef0e0968e60e0938e646e6d3ab0e201d46)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 7d64a8b4daa00a78e49493668ad4fd6789bfc883)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Also fixes hdat_to_dt test cases.
Fixes: ad484081ef8a51811e7902aec436fa8f1ca9604a
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Set a few new values in the PHY_RESET procedure, as specified by our
updated programming guide documentation.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add parsing for the link speed information and the OCC GPU presence
flags.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add the per-chip structures that describe how the A-Bus/NVLink/OpenCAPI
phy is configured. This generates the npu@xyz nodes for each chip on
systems that support it.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add structure definitions that describe the physical PCIe topology of
a system and parse them into the device-tree based PCIe slot
description.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Iterating the SPPCRD structures (per chip data) is a fairly common
operation in the HDAT parser. Iterating the tuples directly is somewhat
irritating since we need to check for disabled chips, etc. on every pass.
A better way to handle this is to iterate through the xscom nodes
(generated from the SPPCRD data) and map from the xscom node to the
originating structure. This patch adds a function to do that.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Adds HDIF_get_iarray() which retrieves and validates an internal array
header and HDIF_iarray_for_each() for walking the individual array
entries. This reduces the amount of get-then-check boilerplate that
we have with the existing HDIF_get_iarray_item() method for iterating
internal data arrays.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
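A minimal sketch of the validate-once, then iterate pattern described
above. The header layout, the names get_array(), array_item() and
array_for_each(), and the native-endian fields are all assumptions made
for illustration; they are not the HDIF definitions used by skiboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout, for illustration only. */
struct arr_hdr {
	uint32_t offset;     /* bytes from start of header to first entry */
	uint32_t count;      /* number of entries */
	uint32_t entry_size; /* bytes per entry */
};

/* Validate the header once; returns NULL if it does not fit in buf. */
static const struct arr_hdr *get_array(const void *buf, size_t buf_len)
{
	const struct arr_hdr *h = buf;

	if (!buf || buf_len < sizeof(*h))
		return NULL;
	if (h->offset < sizeof(*h) || h->offset > buf_len)
		return NULL;
	if (h->entry_size == 0 ||
	    h->count > (buf_len - h->offset) / h->entry_size)
		return NULL;
	return h;
}

/* Address of entry i; only valid for a header returned by get_array(). */
static const void *array_item(const struct arr_hdr *h, uint32_t i)
{
	assert(h && i < h->count);
	return (const char *)h + h->offset + (size_t)i * h->entry_size;
}

/* Walk every entry without re-checking the header on each pass. */
#define array_for_each(h, i, p) \
	for ((i) = 0; \
	     (i) < (h)->count && ((p) = array_item((h), (i))); \
	     (i)++)
```

A caller checks the result of get_array() for NULL a single time and
then uses array_for_each() with an index variable and an entry pointer,
rather than fetching and checking each item individually.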
|
|
Sometimes handy.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add a dummy is_rodata() implementation for use inside test code.
Currently we don't need this to check whether the given
pointer is actually read-only, but someone might want it to work
properly in the future.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In the future we will always create the npu nodes based on what's in the
HDAT. For now we separate witherspoon into an old and new platform where
the old platform will assume a sequoia planar and create the relevant
NPU nodes for that planar. If you have a redbud system this will be
broken, but this should be fine for most cases.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add the other PCIe devices to the witherspoon slot tables. This provides
a fallback for systems without IOSLOT information in the HDAT. This is
mainly to allow DD1 systems to continue being useful.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Move this out of the astbmc specific part into a generic helper. This
allows us to use it more commonly.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
astbmc has some code to handle devices that are behind a "slot" on a
riser card that can't be added to the static slot tables for a system.
We probably want to use this code outside the slot table handling so
move it somewhere generic and rework it so slot table specifics aren't
buried inside it.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In P9 we get information about the physical PCIe slot topology
through the HDAT. As a rule we never directly consume the HDAT
inside of Skiboot and we always parse and incorporate the data
from HDAT into the Skiboot device tree.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[stewart@linux.vnet.ibm.com: add (C) header]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
I'm seeing an infinite loop while hot unplugging
a CPU. This is a workaround until we do the right
thing for p9. It may be a candidate for backporting.
The messages I see in an infinite loop are:
[ 740.250192896,3] LIBPORE: Core ID = 20 is not within valid range of [0;15]
[ 740.250230176,3] SLW: Failed to set spr for CPU 51
This happens when trying to hot unplug core id 20. For now the
patch just skips calling p8_pore* on p9 machines.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The erase_range() function handles erasing the flash for a given start
address and length, and can handle an unaligned start address and
length. However, in the unaligned start address case we are incorrectly
calculating the remaining size, which can lead to incomplete erases.
If we're going to update the remaining size based on what the start
address was then we probably want to do that before we override the
original start address. So rearrange the code so that this is indeed the
case.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com>
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
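To illustrate the ordering issue, here is a small sketch of splitting an
unaligned erase into a partial head and an aligned remainder. The
struct, the function name plan_erase() and the block size parameter are
hypothetical and do not reproduce the actual erase_range() code; the
point is only that the remaining length is derived from the original
start address before that address is replaced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of how an unaligned erase is split. */
struct erase_plan {
	uint64_t head_len;  /* bytes erased in the partial first block */
	uint64_t next_pos;  /* where erasing continues after the head */
	uint64_t remaining; /* bytes still to erase from next_pos */
};

static struct erase_plan plan_erase(uint64_t pos, uint64_t len,
				    uint64_t block)
{
	struct erase_plan p = { 0, pos, len };
	uint64_t off;

	assert(block != 0);
	off = pos % block;
	if (off) {
		/* Head runs to the end of the current block (or len). */
		p.head_len = block - off;
		if (p.head_len > len)
			p.head_len = len;
		/*
		 * Compute the remaining size from the original start
		 * address first, and only then move the position.
		 */
		p.remaining = len - p.head_len;
		p.next_pos = pos + p.head_len;
	}
	return p;
}
```

For example, with a 0x1000 byte block, a request starting at 0x1800 for
0x2000 bytes yields a 0x800 byte head, continues at 0x2000, and leaves
0x1800 bytes to erase.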
|
|
In the recent change:
3f936bae97 phb4: Retrain link if degraded
We retrain if the link is degraded. We do 3 retries to get an optimal
link.
Unfortunately if the last retry fails, we mark the PHB as bad and
don't use it. Hence that PHB is lost even though it actually trained
(just degraded).
This fixes the problem by printing an error message (as below) but
still marking the PHB as good.
[ 7.179320404,3] PHB#0005[0:5]: LINK: Link degraded
[ 8.387346665,3] PHB#0005[0:5]: LINK: Link degraded
[ 10.078409137,3] PHB#0005[0:5]: LINK: Link degraded
[ 11.281477269,3] PHB#0005[0:5]: LINK: Link degraded
[ 11.283123885,3] PHB#0005[0:5]: LINK: Degraded but no more retries
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
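The retry behaviour described above can be sketched as follows. All
names here (LINK_RETRIES, train_link(), the enum and the stub trainer)
are assumptions for illustration and are not taken from the PHB4 code;
the sketch only shows that a link which is still degraded after the
last retry is reported as usable rather than treated as a failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINK_RETRIES 3 /* assumed, matching "3 retries" in the message */

enum phb_link_state {
	PHB_LINK_OPTIMAL,
	PHB_LINK_DEGRADED_BUT_USABLE,
};

/* Hypothetical hook: one training attempt, true if the link is optimal. */
typedef bool (*train_fn)(void *ctx);

static enum phb_link_state train_link(train_fn train, void *ctx)
{
	int retries = LINK_RETRIES;

	assert(train != NULL);
	for (;;) {
		if (train(ctx))
			return PHB_LINK_OPTIMAL;
		/* Degraded: retry if allowed, otherwise keep the PHB. */
		if (retries-- == 0)
			return PHB_LINK_DEGRADED_BUT_USABLE;
	}
}

/* Stub trainer: ctx points at the number of degraded attempts left. */
static bool stub_train(void *ctx)
{
	int *degraded_left = ctx;

	if (*degraded_left > 0) {
		(*degraded_left)--;
		return false;
	}
	return true;
}
```

With a stub that always trains degraded, train_link() makes one initial
attempt plus three retries, which lines up with the four "Link degraded"
lines followed by "Degraded but no more retries" in the log above.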
|
|
On P9 Scale Out (Nimbus) DD2.0 and Scale Up (Cumulus) DD1.0 (and
below) the PCIe PHY can lock up, causing training issues. This can cause
a degradation in speed or width in ~5% of training cases (depending on
the card). This is fixed in later chip revisions. This issue can also
cause PCIe links to not train at all, but this case is already
handled.
This patch checks if the PCIe link has trained optimally and if not,
does a full PHB reset (to fix the PHY lockup) and retrain.
One complication is some devices are known to train degraded unless
device specific configuration is performed. Because of this, we only
retrain when the device is in a whitelist. All devices in the current
whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon.
We always gather information on the link and print it in the logs even
if the card is not in the whitelist.
For testing purposes, there's an nvram option to retry all PCIe cards and all
P9 chips when a degraded link is detected. The new option is
'pci-retry-all=true' which can be set using:
nvram -p ibm,skiboot --update-config pci-retry-all=true
This option may increase the boot time if used on a badly behaving
card.
Signed-off-by: Michael Neuling <mikey@neuling.org>
[stewart@linux.vnet.ibm.com: fix Cumulus VERS_MAJ r.e. Mikey mail]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
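The retrain decision described above can be sketched as a small
predicate. The struct, the function names and the device IDs used in
the example are placeholders invented for illustration; they are not
the real whitelist or the real PHB4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

static bool in_whitelist(struct pci_id id,
			 const struct pci_id *list, size_t n)
{
	size_t i;

	assert(list != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (list[i].vendor == id.vendor &&
		    list[i].device == id.device)
			return true;
	return false;
}

/*
 * Retrain only when the link is degraded and either the device is
 * whitelisted or the (assumed) pci-retry-all setting is enabled.
 */
static bool should_retrain(bool link_optimal, bool retry_all,
			   struct pci_id id,
			   const struct pci_id *list, size_t n)
{
	if (link_optimal)
		return false;
	return retry_all || in_whitelist(id, list, n);
}
```

Link information would still be gathered and logged regardless of the
result; this predicate only decides whether a reset and retrain happens.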
|
|
Make link retries a #define rather than open coding it in the PHB4
init code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We are going to need pci_wait_crs() in the PHB4 code so make it global.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Split phb4_get_link_state() into a new function so that it can be
reused to get info on the speed and width of the link.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Move nvram read to the PHB4 init code so that it's only read once,
rather than every time we go through PHB reset.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This code was never used (since retries is set to 0), it's not very
useful and it makes the code harder to read. So let's just remove it.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The HW only supported limited access sizes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Correct the documentation in a couple of places to match the
actual behaviour and improve bits and pieces of it
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Provide a way to test recoverable data link interrupts via a new
vendor capability byte.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
====== v2 -> v3: ======
* Corrected name of NPU RING (no 2). [Andrew Donnellan]
* Corrected spelling of device. [Andrew Donnellan]
hw/npu2.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Allow the NPU2 to trigger "recoverable data link" interrupts.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When requested via OPAL_XIVE_ANY_CHIP, we need to try all
chips. We first try the current one (on which the caller
sits) and if that fails, we iterate all chips until the
allocation succeeds.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
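A sketch of the "current chip first, then every other chip" fallback
described above. The pool structure, the function names and the
NO_CHIP return value are hypothetical and stand in for the real XIVE
allocator, which is not shown in this log.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CHIP (-1)

/* Hypothetical per-chip allocator state. */
struct chip_pool {
	int free_slots;
};

/* Returns 0 on success, -1 if this chip has nothing left. */
static int alloc_on_chip(struct chip_pool *c)
{
	if (c->free_slots <= 0)
		return -1;
	c->free_slots--;
	return 0;
}

/* Returns the index of the chip that satisfied the request. */
static int alloc_any_chip(struct chip_pool *chips, size_t nchips,
			  size_t current)
{
	size_t i;

	assert(current < nchips);

	/* Prefer the chip the caller is running on. */
	if (alloc_on_chip(&chips[current]) == 0)
		return (int)current;

	/* Otherwise try every other chip until one succeeds. */
	for (i = 0; i < nchips; i++) {
		if (i == current)
			continue;
		if (alloc_on_chip(&chips[i]) == 0)
			return (int)i;
	}
	return NO_CHIP;
}
```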
|
|
Instead of trying to "pull" everything and clear VT (which didn't
work and caused some FIRs to be set), just clear and then
set the PTER thread enable bit. This has the side effect of
completely resetting the corresponding thread context.
This fixes the spurious XIVE FIRs reported by PRD and fircheck.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is high overhead so we don't enable it by default even
in debug builds, it's also a bit messy, but it allowed me to
detect and debug a locking issue earlier so it can be useful.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We normally allocate IPIs from 0x10. Make that 0x1000 on debug
builds to limit the chances of overlapping with Linux interrupt
numbers which makes debugging code that confuses them easier.
Also add a warning in emulation if we get an interrupt in the
queue whose number is below the gap.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
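As a rough illustration of the idea, the sketch below picks the IPI
base from a debug switch and flags interrupt numbers that fall below
it. The macro and function names are invented here, and treating "below
the gap" as "below the IPI base" is an assumption; only the two base
values 0x10 and 0x1000 come from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IPI_BASE_NORMAL 0x10u
#define IPI_BASE_DEBUG  0x1000u

static_assert(IPI_BASE_DEBUG > IPI_BASE_NORMAL,
	      "debug base must leave a gap above the normal base");

/* First interrupt number handed out for IPIs. */
static uint32_t ipi_base(bool debug_build)
{
	return debug_build ? IPI_BASE_DEBUG : IPI_BASE_NORMAL;
}

/* True if an interrupt number is suspicious and worth a warning. */
static bool irq_below_gap(uint32_t irq, bool debug_build)
{
	return irq < ipi_base(debug_build);
}
```

On a debug build, a small interrupt number such as 0x20 showing up in
the queue would be flagged, which is the kind of confusion between
number spaces the larger base is meant to expose.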
|
|
Thankfully the missing locking only affects debug code and
init code that doesn't run concurrently. Also adds a DEBUG
option that checks the lock is properly held.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Cosmetic fix.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Without this, we sometimes don't observe from a CPU the
values written to the ENDs or NVTs via the cache watch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This runs 1000 iterations exercising the cache watch and scrub
facilities on VPs and ENDs at boot. This exposes a HW bug with
the scrub which will be worked around in a subsequent patch.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
If this fails, print a bit more info about it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This adds debug code to check that the initial updates of
in-memory VPs and EQs via the cache watch and cache scrub
facilities have worked properly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We don't use them and we hijack the VP field with their
configuration to store the EQ reference, so make sure the
kernel or guest can't turn them back on by doing MMIO
writes to ACK#
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
That doesn't work, the HW doesn't implement it in the cache
watch facility anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We no longer update "live" memory structures, we use a temporary
copy on the stack and update the actual memory structure using
the cache watch, so those barriers are pointless.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Exports the In-Memory Collection counter nest memory to
the OS. This allows the OS to view the nest counter
region directly. This helps with nest microcode debugging
and with checking raw counter values.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add In-Memory Collection counter dummy nodes to the skiboot.tcl
to aid code testing in mambo for both OPAL and Kernel side enablement.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Minor cleanup to avoid null pointer access.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add basic handling of FLR (function level reset) by porting the changes
from commit b74841db759d ("npu: Implement FLR") to npu2.
The only difference for npu2 is that we track the reset state explicitly
with a link flag instead of inferring it from
dev->procedure_{status,number,step,data}.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add a complement to npu2_set_link_flag().
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|