|
Add a mode to PHB4 to trace the training process closely. This activates
as soon as PERST is deasserted and produces human-readable output of
the process.
This may increase training times since it duplicates some of the
training code. This code has its own simple checks for fence and
timeout but will fall through to the default training code once done.
The output produced looks like the "TRACE:" lines below:
[ 3.410799664,7] PHB#0001[0:1]: FRESET: Starts
[ 3.410802000,7] PHB#0001[0:1]: FRESET: Prepare for link down
[ 3.410806624,7] PHB#0001[0:1]: FRESET: Assert skipped
[ 3.410808848,7] PHB#0001[0:1]: FRESET: Deassert
[ 3.410812176,3] PHB#0001[0:1]: TRACE: 0x0000000101000000 0ms
[ 3.417170176,3] PHB#0001[0:1]: TRACE: 0x0000100101000000 12ms presence
[ 3.436289104,3] PHB#0001[0:1]: TRACE: 0x0000180101000000 49ms training
[ 3.436373312,3] PHB#0001[0:1]: TRACE: 0x00001d0811000000 49ms trained
[ 3.436420752,3] PHB#0001[0:1]: TRACE: Link trained.
[ 3.436967856,7] PHB#0001[0:1]: LINK: Start polling
[ 3.437482240,7] PHB#0001[0:1]: LINK: Electrical link detected
[ 3.437996864,7] PHB#0001[0:1]: LINK: Link is up
[ 4.438000048,7] PHB#0001[0:1]: LINK: Link is stable
Enabled via nvram using:
nvram -p ibm,skiboot --update-config pci-tracing=true
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This improves PHB reset and link training timing. Justifications and
reasons are included in the patch.
Polling intervals are decreased from 100ms to 10ms.
Added is a new state called PHB4_SLOT_LINK_STABLE which is now needed
since the link training can be so fast that we touch config space too
quickly (PCIe spec requires 1 second between PERST de-assert and
device config space reads). We use this new state to sanity check the
PHB and link before moving onto the PCI bus scan, where we no longer
recover from these error conditions.
Also added is simplified documentation of the PHB reset and training flow.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
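The timing rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the skiboot implementation: the 10ms poll interval and the 1 second settle time come from the commit message, while every identifier here (LINK_STABLE_MS, link_stable_ticks(), link_stable_poll()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Settle time between PERST deassert and config space access, as
 * stated in the commit message. The name is hypothetical. */
#define LINK_STABLE_MS 1000u

enum link_poll_result { LINK_WAIT, LINK_READY, LINK_FAILED };

/* Number of poll ticks needed to cover the settle time for a given
 * poll interval (rounded up). */
unsigned int link_stable_ticks(unsigned int interval_ms)
{
	assert(interval_ms > 0);
	return (LINK_STABLE_MS + interval_ms - 1) / interval_ms;
}

/* Decide what to do on one poll tick while waiting for the link to
 * be considered stable. The PHB and link are sanity checked on every
 * tick; the bus scan may only start once the settle time has passed. */
enum link_poll_result link_stable_poll(uint32_t ms_since_perst,
				       bool link_up, bool fenced)
{
	if (fenced || !link_up)
		return LINK_FAILED;
	if (ms_since_perst < LINK_STABLE_MS)
		return LINK_WAIT;
	return LINK_READY;
}
```

With a 10ms interval the sketch polls up to 100 times before reporting the link as ready; with the old 100ms interval it would poll 10 times.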
|
|
This adds a function phb4_check_reg() to sanity check when we do MMIO
reads from the PHB to make sure it's not fenced.
This also adds some uses of this function in common locations where
these may occur on PHB reset and link training.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
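A minimal sketch of the kind of check described above, assuming that a fenced PHB returns all 1's on MMIO reads. The struct and helper names are hypothetical stand-ins for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the PHB state; the real structure holds
 * far more than a fence flag. */
struct phb_sketch {
	bool fenced;
};

/* Hypothetical fence query. A real implementation would have to
 * consult the hardware rather than a cached flag. */
static bool phb_sketch_fenced(const struct phb_sketch *p)
{
	return p->fenced;
}

/* Returns true if a value read from a PHB register can be trusted.
 * An all-1's value is only suspicious, so the fence state is checked
 * before declaring the read bad. */
bool check_reg_sketch(const struct phb_sketch *p, uint64_t val)
{
	assert(p);
	if (val == UINT64_MAX)
		return !phb_sketch_fenced(p);
	return true;
}
```

A caller in the reset or training path would bail out with an error when the check fails instead of acting on the bogus value.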
|
|
Currently we retry if we don't detect an electrical link. This is
pointless as all devices should respond in the given time.
This patch removes this retry and just returns OPAL_HARDWARE if we
don't detect an electrical link.
This has the additional benefit of improving boot times on machines
that have badly wired presence detect (i.e. it reports a device as
present when there isn't one).
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
phb4_retry_state() returns a good error code, so just use that rather
than complicating the caller.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently we assume on boot that PERST is asserted so that we can skip
having to assert it ourselves.
This instead reads the PERST status and determines if we need to
assert it based on that.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Byte swap TLP headers so they are the same as the PCIe spec.
Also remove redundant print.
Suggested-by: Rob Lippert <rlippert@google.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
If there is no electrical link or the link doesn't train,
we should make that more obvious to the user.
This boosts these prints to error level.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Better log why the slot didn't work and make it a PR_ERR so users
see it by default.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Force verbose EEH. This is heavy-handed and we should turn it off
later as things stabilise, but it is useful for now.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Mostly errata workarounds, some DD1 specific.
The step Init_5 was moved to Init_16, so the numbering was updated to
reflect this.
(mikey: added section on ignoring errata ER20161123)
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently we only retry once when we have a link training failure.
This changes this to be 3 retries as 1 retry is not giving us enough
reliability.
This will increase the boot time, especially on systems where we
incorrectly detect a link presence when there really is nothing
present. I'll post a followup patch to optimise our timings to help
mitigate this later.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
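The bounded retry described above could look something like the sketch below. The retry count of 3 is from the commit message; the structure and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Retry budget per link training attempt (value from the commit
 * message, name hypothetical). */
#define LINK_RETRIES 3u

struct slot_retry_sketch {
	unsigned int retries_left;
};

/* Reset the retry budget before starting link training. */
void slot_retry_init(struct slot_retry_sketch *s)
{
	assert(s);
	s->retries_left = LINK_RETRIES;
}

/* Called when training fails. Returns true if the caller should
 * restart training, false if the budget is exhausted and the slot
 * should be reported as failed. */
bool slot_retry_on_failure(struct slot_retry_sketch *s)
{
	assert(s);
	if (s->retries_left == 0)
		return false;
	s->retries_left--;
	return true;
}
```

Each extra retry adds a full training timeout to boot on slots that falsely report presence, which is the boot time cost the message warns about.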
|
|
This reworks the pci link training retry code so that we can do more
than one retry.
This will now also print an error if a link fails to train.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
For PHB4 it's possible that the phy may end up in a bad state where it
can no longer receive data. This can manifest as the link not
retraining. A simple PERST will not clear this. The PHB must be
completely reset.
This changes the retry state to CRESET to do this.
This issue may also manifest itself as the link training in a degraded
state (lower speed or narrower width). This patch doesn't attempt to
fix that (will come later).
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently we recursively call run_sm() in phb4_retry_state(). This is
unnecessary and overly complex.
This just returns with a small wait time. 1ms should be a very small
overhead compared to having to do the actual retry.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The MPIPL procedure says to only set bit 26 when forcing the PEC into
freeze mode. Currently we set bits 24-27.
This changes the code to follow spec and only set bit 26.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
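To make the difference between "bit 26" and "bits 24-27" concrete, here is a small sketch assuming IBM bit numbering on a 64-bit register, where bit 0 is the most significant bit. The helper names are hypothetical and the register itself is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Single bit in IBM numbering: bit 0 is the MSB of a 64-bit value. */
uint64_t ibm_bit(unsigned int b)
{
	assert(b < 64);
	return 0x8000000000000000ULL >> b;
}

/* Inclusive range of bits [first, last] in IBM numbering. */
uint64_t ibm_bitmask(unsigned int first, unsigned int last)
{
	uint64_t mask = 0;
	unsigned int b;

	assert(first <= last && last < 64);
	for (b = first; b <= last; b++)
		mask |= ibm_bit(b);
	return mask;
}
```

Under these assumptions ibm_bitmask(24, 27) is 0x000000F000000000 while ibm_bit(26) is 0x0000002000000000, so the old code set three more bits than the procedure asks for.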
|
|
According to the workbook, pfir must be cleared before the nfir.
The way we have it now causes the nfir to not clear properly in some
error circumstances.
This swaps the order to match the workbook.
Also updates the comments to be clearer.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When waiting in PHB4_SLOT_CRESET_WAIT_CQ for transactions to end, we
incorrectly move onto the next state. Generally we don't hit this as
the transactions have ended already anyway.
This removes the incorrect state transition.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Set default lane equalisation if there is nothing in the device-tree.
Default value taken from hdat and confirmed by hardware team. Neatens
the code up a bit too.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We had a few problems:
- We used the wrong register to trigger the reset (spec bug)
- We should clear the PFIR and NFIR while the reset is asserted
- ... and in the right order !
- We should only apply the DD1 workaround after the reset has
been lifted.
- We should ensure we use ASB whenever we are fenced or doing a
CRESET
- Make config ops write with ASB
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This replaces use of MMIO registers with the new accessors
in places that can be called during recovery procedures at
times when the PHB can be fenced.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Those will pick between ASB (ie, XSCOM) accesses and direct MMIO
based on PHB flags, thus allowing transparent access whether the
PHB is fenced or not.
Mark as unused for now so we don't get a warning.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
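The idea of accessors that choose a path based on a PHB flag can be sketched as below. Everything here is an assumption made for illustration: the structure, the flag, and the two fake backends are hypothetical, and the fake MMIO backend simply returns all 1's to mimic a fenced PHB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-PHB access state: one flag plus two backends. */
struct phb_access_sketch {
	bool use_asb;	/* set while fenced or during a complete reset */
	uint64_t (*mmio_read)(uint32_t offset);
	uint64_t (*asb_read)(uint32_t offset);
};

/* Fake MMIO backend: behaves like a fenced PHB and returns all 1's. */
uint64_t fake_mmio_read(uint32_t offset)
{
	(void)offset;
	return UINT64_MAX;
}

/* Fake ASB backend: returns a value derived from the offset so the
 * chosen path is visible to the caller. */
uint64_t fake_asb_read(uint32_t offset)
{
	return 0x1000u + offset;
}

/* Callers use this one accessor and never pick the path themselves. */
uint64_t phb_read_sketch(const struct phb_access_sketch *p, uint32_t offset)
{
	assert(p && p->mmio_read && p->asb_read);
	return p->use_asb ? p->asb_read(offset) : p->mmio_read(offset);
}
```

Keeping the decision inside the accessor means recovery code does not need to know whether the PHB is currently fenced.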
|
|
Enabled via the nvram option pci-eeh-verbose=true, i.e.:
nvram -p ibm,skiboot --update-config pci-eeh-verbose=true
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
For now at PHBERR level. We don't have room in the diags data
passed to Linux for these unfortunately.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
90% of what we print isn't useful to a normal user. This
dramatically reduces the amount of messages printed by
OPAL in normal circumstances.
We still need to add a way to bump the log level at boot
based on a BMC scratch register or some HDAT property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The root complex config space only supports 4-byte accesses. Thus, when
the client requests a smaller size write, we do a read-modify-write to
the register.
However, some registers have bits defined as "write 1 to clear".
If we do an RMW cycle on such a register and such bits are 1 in the
part that the client doesn't intend to modify, we will accidentally
write back those 1's and clear the corresponding bits.
This avoids it by masking out those magic bits from the "old" value
read from the register.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
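The masking described above can be sketched as a pure function that computes the 32-bit value to write back. This is an illustration under assumptions, not the patch itself: the function name is hypothetical, byte offsets are taken as little-endian within the register, and the W1C mask would have to come from the definition of the specific register.

```c
#include <assert.h>
#include <stdint.h>

/* Merge a 1- or 2-byte client write into a 32-bit register value.
 *
 * old      - value just read from the register
 * w1c_mask - bits of the register that are "write 1 to clear"
 * byte_off - byte offset of the client's write within the register
 * size     - size of the client's write in bytes (1 or 2)
 * data     - client data, right-aligned
 */
uint32_t rmw_merge_sketch(uint32_t old, uint32_t w1c_mask,
			  unsigned int byte_off, unsigned int size,
			  uint32_t data)
{
	uint32_t field;

	assert(size == 1 || size == 2);
	assert(byte_off + size <= 4);

	/* Bits the client actually intends to write. */
	field = (size == 1 ? 0xffu : 0xffffu) << (8 * byte_off);

	/* W1C bits outside that field are forced to 0 in the old value
	 * so that writing them back does not clear anything. */
	old &= ~(w1c_mask & ~field);

	return (old & ~field) | ((data << (8 * byte_off)) & field);
}
```

For example, with old = 0x80000006 and a W1C mask of 0xF9000000, a one-byte write of 0x07 at offset 0 yields 0x00000007 rather than 0x80000007, so the pending status bit is left alone.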
|
|
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
PHBs don't have base location codes on non-FSP systems and that is
normal.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The code is duplicated between phb3 and phb4 for no reason.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
# Conflicts:
# core/init.c
# hw/phb3.c
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Wait for DLP PGRESET to clear *after* lifting the PCIe core reset
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Support StoreEOI, full complements of PEs (twice as big TVT)
and other updates.
Also renumber init steps to match spec 063
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently if we read all 1's from the EEH or IRQ capabilities, we end
up train wrecking on some other random code (eg. an assert() in xive).
This hardens the PHB4 code to look for these bad reads and more
gracefully fails the init for that PHB alone. This allows the rest of
the system to boot and ignore those bad PHBs.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Find the CAPP on the chip associated with the HMI event for PHB4.
The recovery mode (re-initialization of the capp, resume of functional
operations) is only available with P9 DD2. A new patch will be provided
to support this feature.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
CAPP microcode flash download and CAPP upload for PHB4.
A new file 'capp.c' is created to receive common capp code for PHB3 and
PHB4.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Enable the Coherent Accelerator Processor Interface (CAPI). The PHB is used as
a CAPI interface.
CAPI adapters can be connected to either PEC0 or PEC2. A single-port
CAPI adapter can be connected to either PEC0 or PEC2, but a dual-port
adapter can only be connected to PEC2.
CAPP0 attached to PHB0(PEC0 - single port)
CAPP1 attached to PHB3(PEC2 - single or dual port)
As we did for PHB3, a new specific file 'phb4-capp.h' is created to
contain the CAPP register definitions.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The workarounds for P9 DD1 are only needed for Nimbus. P9 Cumulus will
be DD1 but doesn't need these same workarounds.
This patch ensures the P9 DD1 workarounds only apply to Nimbus. It
also renames some things to make clear what's what.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
There are two issues in the current implementation: it should return an
error code visible to Linux, which has the prefix OPAL_*, and the code
isn't very obvious.
This returns OPAL_HARDWARE when the PHB is broken. Otherwise, OPAL_SUCCESS
is always returned. Meanwhile, it refactors the code to make it
obvious: OPAL_PCI_SLOT_PRESENT is returned when the presence signal (low active)
or PCIe link is active. Otherwise, OPAL_PCI_SLOT_EMPTY is returned.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
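The return convention described above can be sketched as follows. The SK_ constants are local stand-ins whose numeric values are assumptions made for this example, and the function signature is hypothetical; only the decision logic follows the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the OPAL return codes and presence values.
 * The numeric values are assumptions made for this sketch. */
#define SK_OPAL_SUCCESS		0
#define SK_OPAL_HARDWARE	(-6)
#define SK_PCI_SLOT_EMPTY	0
#define SK_PCI_SLOT_PRESENT	1

/* The return value reports whether the query itself worked; the
 * presence state is passed back through *val.
 *
 * presence_pin_high - raw level of the presence signal, which is
 *                     active low, so "low" means a card is present
 */
int presence_state_sketch(bool phb_broken, bool presence_pin_high,
			  bool link_active, uint8_t *val)
{
	assert(val);

	if (phb_broken)
		return SK_OPAL_HARDWARE;

	if (!presence_pin_high || link_active)
		*val = SK_PCI_SLOT_PRESENT;
	else
		*val = SK_PCI_SLOT_EMPTY;

	return SK_OPAL_SUCCESS;
}
```

Separating the status code from the presence value keeps an empty slot from being confused with a hardware failure.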
|
|
Currently we pass in a proc_chip structure to phys_map_get(). All we
really need from this structure is the Global Chip ID (GCID). This
patch reworks the function so that we only need to pass the GCID which
allows us to use it before the proc_chip structures have been
initialised (i.e in the HDAT parser).
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Acked-By: Michael Neuling <mikey@neuling.org>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Implement CFG (config space) error injection.
This works the same as PHB3. MMIO and DMA error injection require a
rewrite, so they're unsupported for now.
While it's not feature complete, this at least provides an easy way to
inject an error that will trigger EEH.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In PHB3 there were separate recovery procedures depending on the class of
error. PHB4 performs almost exactly the same steps in recovering from any
class of error, so change phbX_err_ER_clear() to phbX_err_clear() for this
implementation.
Since the same sequence gets used, call this function in phb4_creset() -
which is used to handle fatal (fence) errors - where it was not called in
previous hardware revisions.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
During a hot reset the PCI link will drop, so we need to mask link down
events to prevent unnecessary errors.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
phb4_root_port_init() was a NOP before, so fix that.
Nothing PHB4-specific here. Something may be required in future.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This implements complete reset (creset) functionality for POWER9 DD1.
Only partially tested and contends with some DD1 errata, but it's a start.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Witherspoon systems come with a 'shared' PCI slot: physically, it
looks like a x16 slot, but it's actually two x8 slots connected to two
PHBs of two different chips. Taking advantage of it requires some
logic on the PCI adapter. Only the Mellanox CX5 adapter is known to
support it at the time of this writing.
This patch enables support for the shared slot on witherspoon if a x16
adapter is detected. Each x8 slot has a presence bit, so both bits
need to be set for the activation to take place. Slot sharing is
activated through a gpio.
Note that there's no easy way to be sure that the card is indeed a
shared-slot compatible PCI adapter and not a normal x16 card. Plugging
a normal x16 adapter on the shared slot should be avoided on
witherspoon, as the link won't train on the second slot, resulting in
a timeout and a longer boot time. Only the first slot is usable and
the x16 adapter will end up using only half the lanes.
If the PCI card plugged on the physical slot is only x8 (or less),
then the presence bit of the second slot is not set, so this patch
does nothing. The x8 (or less) adapter should work like on any other
physical slot.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: re-org code, move into platform file]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
As current revisions of PHB4 don't properly handle the resulting
L1 link transition.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This was used for early broken simulators
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This renames the "poll" op to "run_sm" (short for run state machine).
I think this is a better name since the function does a bunch of
things like resetting the slot. Also it avoids confusion with the
"poll_link" op which does something different (and can even be called
from run_sm).
No functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Having the option to disable EEH for MMIO without rebuilding skiboot
could be useful for testing, so check for pci-eeh-mmio=disabled in nvram.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We always assign BARs in phb4, so this removes the unnecessary force
assign logic.
This patch also cleans up the logging to make it less verbose.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|