|
We disallow injecting errors into the reserved PE#, which on PHB3 is 255,
not 0. Because the check compared against PE#0, injecting an error into
PE#0 wrongly returned OPAL_PARAM.
This fixes the issue by checking against the correct reserved PE number, 255.
Reported-by: Pradeep Ramanna <pramann2@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
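A minimal sketch of the corrected check follows. It assumes 256 PEs per PHB3, with the last one (PE#255) reserved. The function name and the PHB3_* constants are hypothetical. OPAL_SUCCESS and OPAL_PARAMETER mirror OPAL's return-code names, with their values defined locally.

```c
#include <stdint.h>

/* Assumed values, mirroring OPAL's return codes. */
#define OPAL_SUCCESS    0
#define OPAL_PARAMETER  (-1)

/* Assumption: PHB3 exposes 256 PEs and reserves the last one. */
#define PHB3_NUM_PES      256
#define PHB3_RESERVED_PE  (PHB3_NUM_PES - 1)

/* Hypothetical validation step for an error-injection request. */
static int64_t check_inject_pe(uint64_t pe_number)
{
    /* Out of range, or the reserved PE: reject the request. */
    if (pe_number >= PHB3_NUM_PES || pe_number == PHB3_RESERVED_PE)
        return OPAL_PARAMETER;

    /* PE#0 is an ordinary PE and must be accepted. */
    return OPAL_SUCCESS;
}
```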
|
|
merge in CAPI fixes from Ian and Daniel
|
|
If the PHB is fenced during phb3_pci_msi_check_q(), it can get stuck in an
infinite loop waiting to lock the FFI. Further, because the PHB lock is held
throughout this function, no other CPU can deal with the fence, and the
entire system hangs.
If reading PHB_FFI_LOCK returns all Fs, return immediately so that the
fence can be dealt with.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
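The polling loop with the fence check can be sketched roughly as follows. This is a model, not the skiboot code. The register is a hypothetical scripted simulation (struct sim_reg, sim_read()). It also assumes that a read of 0 means the lock was granted and that any other value except all ones means "busy".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A fenced PHB returns all ones for every MMIO load. */
#define MMIO_ALL_ONES 0xffffffffffffffffULL

enum ffi_lock_result { FFI_LOCK_TAKEN, FFI_LOCK_FENCED };

/* Hypothetical simulated register: each read returns the next value. */
struct sim_reg {
    const uint64_t *vals;
    size_t n;
    size_t idx;
};

static uint64_t sim_read(struct sim_reg *r)
{
    /* Once the script runs out, keep returning the last value. */
    uint64_t v = r->vals[r->idx < r->n ? r->idx : r->n - 1];
    if (r->idx < r->n)
        r->idx++;
    return v;
}

/* Assumption: 0 means "lock granted", all ones means "fenced",
 * anything else means "still busy, try again". */
static enum ffi_lock_result ffi_lock_acquire(struct sim_reg *lock)
{
    for (;;) {
        uint64_t val = sim_read(lock);
        if (val == MMIO_ALL_ONES)
            return FFI_LOCK_FENCED;  /* bail out so the fence can be handled */
        if (val == 0)
            return FFI_LOCK_TAKEN;
    }
}
```

Consider a register script that ends in all ones. Without the MMIO_ALL_ONES check, the loop would spin forever, which is the hang described above.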
|
|
This fixes a critical bug in CAPI support.
CAPI requires that all faults are escalated into a fence, not a
freeze. This is done by setting bits in a number of MMIO
registers. phb3_set_capi_mode() calls phb3_init_capp_errors() to do
this. However, if the PHB is already in CAPP mode - for example in the
recovery case - phb3_set_capi_mode() will bail out early, and those
registers will not be set.
This is quite easy to verify. PCI config space access errors, for
example, normally cause a freeze. On a CAPI-mode PHB, they should
cause a fence. Say we have a CAPI card on PHB 0, and we inject a
PCI config space error:
echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct_inboundA;
lspci;
The first time we inject this, the PHB will fence and recover, but
won't reset the registers. Therefore, the second time we inject it,
we will incorrectly freeze, not fence.
Worse, the recovery for the resultant EEH freeze event interacts
poorly with the CAPP, triggering an EEH recovery of the PHB. The
combination of the two attempted recoveries will get the PHB into
an inoperable state.
It's quite likely that there are other side effects of bailing out
early. For example, the timebase sync probably fails to recover.
Rather than auditing all the possibilities, I verified that
repeating the entire setup procedure still works when the PHB is
already in CAPP mode. It does work, so just do the entire setup
every time instead of bailing out early.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
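The effect of dropping the early return can be shown with a small state model. All names here are hypothetical: struct phb_model, init_capp_errors(), sync_capp_timebase() and fence_recovery() stand in for the real register programming. The model also assumes that fence recovery clears the error-escalation and timebase state but leaves the PHB flagged as being in CAPP mode.

```c
#include <stdbool.h>

/* Hypothetical model of the PHB state relevant to CAPI mode. */
struct phb_model {
    bool capp_mode;          /* PHB is switched to CAPP mode */
    bool faults_to_fence;    /* error registers escalate faults to fence */
    bool timebase_synced;    /* CAPP timebase has been synchronised */
};

static void init_capp_errors(struct phb_model *p) { p->faults_to_fence = true; }
static void sync_capp_timebase(struct phb_model *p) { p->timebase_synced = true; }

/* Fence recovery resets the error registers and timebase state, but
 * (in this model) leaves the PHB flagged as being in CAPP mode. */
static void fence_recovery(struct phb_model *p)
{
    p->faults_to_fence = false;
    p->timebase_synced = false;
}

/* Fixed behaviour: no early return when p->capp_mode is already set,
 * so the full setup is redone on every call. */
static void set_capi_mode(struct phb_model *p)
{
    init_capp_errors(p);
    sync_capp_timebase(p);
    p->capp_mode = true;
}
```

With an early `if (p->capp_mode) return;`, the second call after fence_recovery() would leave faults_to_fence false. That matches the "freeze instead of fence" symptom.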
|
|
This fixes a critical bug in CAPI support.
CAPI requires that all faults are escalated into a fence, not a
freeze. This is done by setting bits in a number of MMIO
registers. phb3_set_capi_mode() calls phb3_init_capp_errors() to do
this. However, if the PHB is already in CAPP mode - for example in the
recovery case - phb3_set_capi_mode() will bail out early, and those
registers will not be set.
This is quite easy to verify. PCI config space access errors, for
example, normally cause a freeze. On a CAPI-mode PHB, they should
cause a fence. Say we have a CAPI card on PHB 0, and we inject a
PCI config space error:
echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct_inboundA;
lspci;
The first time we inject this, the PHB will fence and recover, but
won't reset the registers. Therefore, the second time we inject it,
we will incorrectly freeze, not fence.
Worse, the recovery for the resultant EEH freeze event interacts
poorly with the CAPP, triggering an EEH recovery of the PHB. The
combination of the two attempted recoveries will get the PHB into
an inoperable state.
It's quite likely that there are other side effects of bailing out
early. For example, the timebase sync probably fails to recover.
Rather than auditing all the possibilities, I verified that
repeating the entire setup procedure still works when the PHB is
already in CAPP mode. It does work, so just do the entire setup
every time instead of bailing out early.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently, PE#0 is reserved, and all RIDs are mapped to it until the
kernel requests PE assignment. The last M64 BAR is configured in shared
mode, so the kernel has to cut off the first M64 segment, which
corresponds to the reserved PE#0. If the first BAR (for example, a PF's
IOV BAR) requires a huge alignment in the kernel, a huge amount of M64
space is wasted to accommodate that alignment. Reserving the last PE
(PE#255) instead avoids this waste.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
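The arithmetic behind this waste can be sketched as below. Assumptions: the window splits into 256 equal segments, its base is already aligned to the BAR's power-of-two alignment, and the kernel must skip the reserved segment. The function names are hypothetical, and the window/alignment numbers in the usage note are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Distance from the window base to the first address the BAR can use.
 * Assumes 256 equal segments and a window base aligned to bar_align.
 */
static uint64_t m64_waste(uint64_t win_base, uint64_t win_size,
                          uint64_t bar_align, bool reserve_first)
{
    uint64_t seg = win_size / 256;
    /* Reserving PE#0 pushes the first usable address past segment 0. */
    uint64_t start = reserve_first ? win_base + seg : win_base;
    return align_up(start, bar_align) - win_base;
}
```

For example, take a 64 GiB window (256 MiB segments) and a BAR needing 16 GiB alignment. Reserving the first segment skips a full 16 GiB, while reserving the last segment skips nothing at the start of the window.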
|
|
In phb3_init_rc_cfg(), fix a logical operator issue by wrapping the
p->has_link expression in parentheses.
Fixes Coverity defect#97816.
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Extend the OPAL call phb3_set_capi_mode to configure the CAPP timebase,
and inform Linux with the device tree property "ibm,capp-timebase-sync".
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The last M64 (64-bit MMIO) BAR is always enabled and is equal to the
PHB's M64 window. This BAR is split into 256 segments, and each PE owns
one of them. However, a VF PE uses a BAR other than the last one to
accommodate its M64 resources, so the current code always computes the
wrong M64 base address and size when injecting an M64 error into a VF PE.
To fix the issue, we have to recognize the type of the target PE:
(A) bus dependent or (B) PCI device (VF) dependent.
For (A), we derive the M64 base address and length from the last M64 BAR.
For (B), we scan BAR#0 to BAR#14, and the first hit wins.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
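The two lookup paths, (A) and (B), can be sketched as follows. This assumes 16 M64 BARs, where BAR#15 is the shared one split into 256 segments and BAR#0..#14 are single-PE BARs. struct m64_bar is a hypothetical decoded view of the BAR registers, not the hardware layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define M64_NUM_BARS   16   /* BAR#0..#15; #15 is the shared, segmented one */
#define M64_SHARED_BAR (M64_NUM_BARS - 1)
#define NUM_PES        256

/* Hypothetical decoded view of one M64 BAR. */
struct m64_bar {
    bool     enabled;
    uint64_t base;
    uint64_t size;
    uint32_t pe;   /* owning PE for a single-PE BAR (BAR#0..#14) */
};

enum pe_type { PE_BUS, PE_VF };

/* Return true and fill *base and *len with the M64 range owned by pe. */
static bool pe_m64_range(const struct m64_bar bars[M64_NUM_BARS],
                         enum pe_type type, uint32_t pe,
                         uint64_t *base, uint64_t *len)
{
    if (type == PE_BUS) {
        /* (A) Each PE owns one segment of the last BAR. */
        const struct m64_bar *b = &bars[M64_SHARED_BAR];
        uint64_t seg = b->size / NUM_PES;
        if (!b->enabled || pe >= NUM_PES)
            return false;
        *base = b->base + (uint64_t)pe * seg;
        *len = seg;
        return true;
    }

    /* (B) VF PE: scan BAR#0..#14; the first enabled BAR owned by pe wins. */
    for (int i = 0; i < M64_SHARED_BAR; i++) {
        if (bars[i].enabled && bars[i].pe == pe) {
            *base = bars[i].base;
            *len = bars[i].size;
            return true;
        }
    }
    return false;
}
```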
|
|
Before SR-IOV was enabled, the only PE type supported when injecting
errors via PCI config space was the PCI bus dependent PE. That means the
device/function numbers were ignored when writing to the PAPR error
injection address/mask registers (0x2b8 and 0x2c0) to inject errors
caused by PCI config access. If the user intends to inject an error into
one VF, which is bound to its own PE, all VFs on the same PCI bus might
wrongly receive the error.
This patch fixes the issue by writing the correct PCI config address to
the registers according to the PE type: bus dependent or PCI device
dependent.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
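The address/mask choice can be sketched at the BDFN level. The usual 16-bit bus/device/function layout is assumed. The function names are hypothetical, and the sketch does not model where these fields sit inside the real 0x2b8/0x2c0 registers.

```c
#include <stdint.h>

enum pe_type { PE_BUS, PE_DEVICE };

/* Pack bus/device/function into the usual 16-bit BDFN layout. */
static uint16_t bdfn(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Compute the BDFN compare value and mask for the injection registers. */
static void cfg_inject_addr_mask(enum pe_type type, uint16_t target,
                                 uint16_t *addr, uint16_t *mask)
{
    if (type == PE_BUS) {
        *addr = target & 0xff00;   /* match any device/function on the bus */
        *mask = 0xff00;
    } else {
        *addr = target;            /* match exactly one device/function (VF) */
        *mask = 0xffff;
    }
}
```

With the bus-only mask, a sibling VF at 01:00.3 also matches an injection aimed at 01:00.2. This is the misdirected error described above.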
|
|
When enabling (x+1) VFs on a Mellanox adapter after disabling its SR-IOV
capability, which had been enabled with x VFs, we might hit an EEH error
caused by a bogus PE number in the RTC. How the bogus PE number gets into
the RTC isn't known yet. As a workaround, this patch invalidates the
entire RTC whenever the RTT is updated, which avoids the problem on this
adapter:
# lspci -s 0002:01:00.0
0002:01:00.0 Ethernet controller: Mellanox Technologies \
MT27500 Family [ConnectX-3]
BZ: 125893
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
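The workaround is a "flush the whole cache after any table update" policy. It can be modelled as below, treating the RTC as a cache of RTT lookups. The model is hypothetical: a 16-entry direct-mapped cache over a 64K-entry RID-to-PE table. It only shows why stale cached PE numbers disappear; it says nothing about how they appeared in the first place.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_RIDS     65536
#define RTC_ENTRIES  16      /* hypothetical tiny direct-mapped cache */

/* Hypothetical model: the RTT maps each RID to a PE#, the RTC caches lookups. */
struct rtt_model {
    uint16_t rtt[NUM_RIDS];
    struct { bool valid; uint16_t rid; uint16_t pe; } rtc[RTC_ENTRIES];
};

static uint16_t rid_to_pe(struct rtt_model *m, uint16_t rid)
{
    unsigned slot = rid % RTC_ENTRIES;
    if (m->rtc[slot].valid && m->rtc[slot].rid == rid)
        return m->rtc[slot].pe;               /* cache hit */
    m->rtc[slot].valid = true;                /* miss: fill from the RTT */
    m->rtc[slot].rid = rid;
    m->rtc[slot].pe = m->rtt[rid];
    return m->rtc[slot].pe;
}

/* Workaround: after any RTT update, drop the whole RTC. */
static void rtt_map(struct rtt_model *m, uint16_t first, uint16_t last,
                    uint16_t pe)
{
    for (uint32_t rid = first; rid <= last; rid++)
        m->rtt[rid] = pe;
    memset(m->rtc, 0, sizeof(m->rtc));
}
```

If the memset() were removed, the second lookup in the usage below would still return the stale PE#5.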
|
|
When the OS requests that a RID be unmapped from its corresponding PE#,
the reserved PE#, which is PE#0, should be picked to cover the RID.
This patch fixes the wrong reserved PE# used for PHB3.
BZ: 125893
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We have a problem with some Mellanox cards.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
phb3_init_hw() is also called to reset the PHB in order to recover from a
fence. In that case, we must not try to add the property
"ibm,32-bit-bypass-supported" again, because the duplicate causes a crash.
Reported-by: Chad Larson <clarson@vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Chad Larson <clarson@vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
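One way to make a re-run init path safe is an "add if absent" helper, sketched below. This is only an assumption about the approach; the actual patch may instead skip the add on the recovery path. struct dt_node_model and the helpers are hypothetical and much simpler than a real device-tree API.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PROPS 8

/* Hypothetical, minimal device-tree node: a list of property names. */
struct dt_node_model {
    const char *props[MAX_PROPS];
    int nprops;
};

static bool has_prop(const struct dt_node_model *n, const char *name)
{
    for (int i = 0; i < n->nprops; i++)
        if (strcmp(n->props[i], name) == 0)
            return true;
    return false;
}

/* Add a property only if it is not already present, so that running the
 * init path again during fence recovery is harmless. Returns false only
 * if the node is full. */
static bool add_prop_once(struct dt_node_model *n, const char *name)
{
    if (has_prop(n, name))
        return true;
    if (n->nprops >= MAX_PROPS)
        return false;
    n->props[n->nprops++] = name;
    return true;
}
```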
|
|
On failing to load the CAPP microcode, we would call
wait_for_resource_loaded() for the next PHB without issuing a new request,
making it spin forever waiting for something that will never complete.
The fix is to just track the result of the load.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
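"Track the result of the load" can be sketched as a small shared state. Only the first PHB issues and waits on a request, and later PHBs reuse the recorded outcome, including failure. The names below are hypothetical stand-ins, not skiboot's resource-loading API.

```c
#include <stdbool.h>

/* Hypothetical load states for the shared CAPP microcode. */
enum ucode_state { UCODE_NOT_REQUESTED, UCODE_LOADED, UCODE_FAILED };

struct ucode_tracker {
    enum ucode_state state;
    int waits;                 /* how many times we actually waited */
};

/* Stand-in for "issue a load request and wait for it";
 * ok simulates whether the load succeeds. */
static enum ucode_state request_and_wait(struct ucode_tracker *t, bool ok)
{
    t->waits++;
    return ok ? UCODE_LOADED : UCODE_FAILED;
}

/* Called once per PHB. Only the first caller issues a request; later
 * callers reuse the recorded result instead of waiting on a request
 * that was never issued. */
static bool capp_ucode_ready(struct ucode_tracker *t, bool load_ok)
{
    if (t->state == UCODE_NOT_REQUESTED)
        t->state = request_and_wait(t, load_ok);
    return t->state == UCODE_LOADED;
}
```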
|
|
This means the VPD LID is already loaded before we start preloading the
kernel and initramfs LIDs, so VPD doesn't have to wait for them to finish
being read from the FSP.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Instead of synchronously waiting for CAPP microcode during PCI probe,
start preload of CAPP microcode early in boot so that it's present
when we need it during PCI probing.
On some platforms (astbmc), flash access is serialized, and prior to
this patch, the async preload of BOOTKERNEL would have to finish before
loading CAPP ucode would start, needlessly slowing boot.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The comment says 1s, but we are really only waiting 100ms, and it seems
this isn't enough for some Altera FPGA cards.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
A performance issue on HPC workloads was identified with some network
adapters: the specific DMA access patterns they use hit a worst-case
scenario in the PHB.
Disabling the write scope group feature in the PHB works around this,
so let's do that when we detect such an adapter in a PCIe direct slot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Use #defines rather than magic numbers.
No functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The comment says 1s, but we are really only waiting 100ms, and it seems
this isn't enough for some Altera FPGA cards.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
A performance issue on HPC workloads was identified with some network
adapters: the specific DMA access patterns they use hit a worst-case
scenario in the PHB.
Disabling the write scope group feature in the PHB works around this,
so let's do that when we detect such an adapter in a PCIe direct slot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The presence detect bit in the standard root complex config space is
not properly implemented on some IBM PHBs, so using it during probe is
incorrect.
We already have a workaround in the PHB3 code that uses the hotplug
override "AB" detect bits, but it somewhat relies on the standard presence
detect bit returning false positives, which happened on Venice/Murano
but no longer happens on Naples.
Similarly, the slot control logic in the generic pci_enable_bridge()
can't work properly on the PHB root complex, and it is unnecessary there:
this code is only called after the upper layers have verified the
presence of a valid link on the PHB (the slot power control for the PHB
is handled separately).
This fixes both problems by removing the AB detect flag and
unconditionally using those bits for PHB3 presence detection, and by
making sure the code in pci_enable_bridge() that manipulates the slot
controls only runs on downstream ports.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
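Restricting slot-control handling to downstream ports can be sketched with the Device/Port Type field. That field is bits 7:4 of the PCI Express Capabilities register: 0x4 is a root port, 0x5 a switch upstream port, 0x6 a switch downstream port. The helper names are hypothetical, and the check in pci_enable_bridge() may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device/Port Type values in bits 7:4 of the PCI Express Capabilities register. */
#define PCIE_TYPE_ROOT_PORT       0x4
#define PCIE_TYPE_SWITCH_UPSTREAM 0x5
#define PCIE_TYPE_SWITCH_DOWN     0x6

static unsigned pcie_port_type(uint16_t pcie_caps)
{
    return (pcie_caps >> 4) & 0xf;
}

/* Only switch downstream ports get their slot controls manipulated;
 * the PHB root port is skipped because its slot power is handled
 * separately. */
static bool should_touch_slot_ctl(uint16_t pcie_caps)
{
    return pcie_port_type(pcie_caps) == PCIE_TYPE_SWITCH_DOWN;
}
```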
|
|
This detects the new PHB revision and does the appropriate updates
to the init sequence.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Now that opal.h includes opal-api.h, there are a bunch of files that
include both but don't need to.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
No functional change: we just have two calls, one for queueing the
preload and the other for waiting until it has loaded.
Future patches will introduce platform-specific queueing.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
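The split into a "queue" call and a "wait" call can be sketched without threads. A poll function stands in for background loading work. All names are hypothetical and do not reproduce skiboot's preload API. Refusing to wait on a resource that was never queued mirrors the spin-forever hazard described in the CAPP microcode commit above.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical states of one preloaded resource. */
enum res_state { RES_IDLE, RES_QUEUED, RES_LOADED };

struct resource {
    enum res_state state;
    char buf[32];
    int chunks_left;   /* simulated amount of work still to do */
};

/* Step 1: queue the preload and return immediately. */
static void queue_preload(struct resource *r, int chunks)
{
    r->state = RES_QUEUED;
    r->chunks_left = chunks;
}

/* Simulated background progress, e.g. one flash read per call. */
static void poll_loader(struct resource *r)
{
    if (r->state == RES_QUEUED && --r->chunks_left <= 0) {
        strcpy(r->buf, "capp-ucode");
        r->state = RES_LOADED;
    }
}

/* Step 2: block until the queued load completes. Waiting on a resource
 * that was never queued would spin forever, so refuse that case. */
static bool wait_loaded(struct resource *r)
{
    if (r->state == RES_IDLE)
        return false;
    while (r->state != RES_LOADED)
        poll_loader(r);
    return true;
}
```

Between queue_preload() and wait_loaded(), boot can carry on with other work, which is the point of queueing early.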
|
|
..which will result in a lock error in OPAL.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This reworks the CAPP microcode flash download and CAPP upload.
We now use load_resource() to download microcode from flash rather than
assuming we are on an FSP. This means we can download the microcode on PNOR
based systems.
Also, the current code associates the microcode upload with the PHB. This
means we store one copy of the microcode for every PHB in the system. This
patch changes this so that we only save one copy of the microcode for the
whole system. We record whether the microcode has been uploaded to the
CAPP unit per chip, rather than per PHB. We add a check in case the system
has two different chip ECs, but such a Frankenmachine should never be
built!
We keep the microcode around in case we need it for a recovery event.
It also harmonises the CAPP printks to look the same.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
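The bookkeeping described here can be sketched as one shared image plus a per-chip "uploaded" flag and an EC check. Everything is hypothetical: struct capp_ucode_model, capp_upload(), MAX_CHIPS and the EC values in the usage. The actual upload to the CAPP unit is left as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CHIPS 4

/* Hypothetical: one microcode image for the whole system, plus a
 * per-chip "already uploaded to this CAPP unit" flag. */
struct capp_ucode_model {
    const void *image;     /* single shared copy, kept for recovery */
    uint32_t    ec;        /* chip EC level the image was chosen for */
    bool        uploaded[MAX_CHIPS];
    int         uploads;   /* count of real uploads performed */
};

/* Returns false on a bad chip index or an EC mismatch (mixed-EC system). */
static bool capp_upload(struct capp_ucode_model *m, unsigned chip,
                        uint32_t chip_ec)
{
    if (chip >= MAX_CHIPS || chip_ec != m->ec)
        return false;
    if (m->uploaded[chip])
        return true;       /* another PHB on the same chip: nothing to do */
    /* The real code would write m->image to the chip's CAPP unit here. */
    m->uploaded[chip] = true;
    m->uploads++;
    return true;
}
```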
|
|
The current CAPP microcode lid parsing code assumes skiboot is running big
endian which may not always be the case.
This rewrites the lid parsing code to be endian safe.
It also cleans up the code a bunch.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
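The usual endian-safe technique is to assemble big-endian fields byte by byte instead of casting the buffer to a struct. The code below sketches that technique. The 16-byte header layout (magic, count, offset) is purely hypothetical and is not the real CAPP LID format.

```c
#include <stddef.h>
#include <stdint.h>

/* Read big-endian values byte by byte, so the result is the same
 * whether the host CPU runs big or little endian. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/* Hypothetical LID header: 4-byte magic, 4-byte count, 8-byte offset. */
struct lid_hdr {
    uint32_t magic;
    uint32_t count;
    uint64_t offset;
};

static int parse_lid_hdr(const uint8_t *buf, size_t len, struct lid_hdr *h)
{
    if (len < 16)
        return -1;          /* too short to hold the header */
    h->magic  = get_be32(buf);
    h->count  = get_be32(buf + 4);
    h->offset = get_be64(buf + 8);
    return 0;
}
```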
|
|
This moves some code around to avoid needing a forward declaration of
do_capp_recovery_scoms().
No code changes.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently we hardwire the CAPP to use PHB0. This works for Tuleta but not
for other systems.
This makes the port mapping dynamic, so that the first PHB to request a
mode change to CAPI gets the CAPP port mapped to it.
Calls to switch additional PHBs to CAPI mode will fail.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
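The "first PHB to ask gets the CAPP" rule can be sketched with a per-chip owner field. The names are hypothetical. One assumption is not stated in the commit and is flagged in the code: a repeated request from the PHB that already owns the port (for example during recovery) succeeds.

```c
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical per-chip CAPP state: which PHB index owns the port. */
struct capp_unit {
    int owner_phb;
};

/* The first PHB on the chip to request CAPI mode gets the CAPP, and
 * requests from other PHBs then fail. Assumption: a repeated request
 * from the current owner succeeds. */
static bool capp_claim(struct capp_unit *c, int phb_index)
{
    if (c->owner_phb == NO_OWNER) {
        c->owner_phb = phb_index;
        return true;
    }
    return c->owner_phb == phb_index;
}
```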
|
|
In several places, a "bus/device/function" u16 was being directly
or'ed into an address using a left-shift. This should be using
SETFIELD, especially now that all _LSH have been removed.
Change use of BDFN (bus/device/function) field from using plain
left-shift to using SETFIELD(). Add proper BDFN field definitions.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
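The difference between a plain shift and a mask-driven SETFIELD() can be sketched as below. These macros follow the mask-only style described in this log but are written for illustration; they are not skiboot's actual definitions. They rely on the GCC/Clang builtin __builtin_ctzll. EX_ADDR_BDFN is a hypothetical field in bits 47:32.

```c
#include <stdint.h>

/* Shift implied by a contiguous mask: position of its lowest set bit.
 * Uses the GCC/Clang builtin __builtin_ctzll. */
#define MASK_TO_LSH(m)       ((unsigned)__builtin_ctzll(m))
#define GETFIELD(m, v)       (((uint64_t)(v) & (m)) >> MASK_TO_LSH(m))
#define SETFIELD(m, v, val)  (((uint64_t)(v) & ~(uint64_t)(m)) | \
                              (((uint64_t)(val) << MASK_TO_LSH(m)) & (m)))

/* Hypothetical register layout with a 16-bit BDFN field in bits 47:32. */
#define EX_ADDR_BDFN  0x0000ffff00000000ULL

static uint64_t make_cfg_addr(uint16_t bdfn, uint64_t reg_offset)
{
    /* Replaces a hand-written ((uint64_t)bdfn << 32) | reg_offset. */
    return SETFIELD(EX_ADDR_BDFN, reg_offset, bdfn);
}
```

Because the shift is derived from the mask, the field position is defined in exactly one place. SETFIELD() also clears the field before inserting, which a plain OR does not.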
|
|
The last two patches updated GETFIELD() and SETFIELD() to no longer
require the user to specify the mask and shift of a field, and to
remove all _LSH defines and rename any _MASK defines. There are
some places where the masks were used directly, where the caller
needs to have the _MASK suffix removed. There are also two users
of SETFIELD() where the field name still has the _MASK suffix
because there is an existing macro with the base name.
Change users of SETFIELD() to include the _MASK suffix where needed.
Change direct users of any mask to remove the _MASK suffix.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is probably not the best collection of things in the world,
but it means that opal.h is much closer to being directly usable
by an OS.
This triggers a bunch of #include fixes throughout the tree.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
|
|
This increases various timeouts as per CQ SW283991, which should help
with some external drawers and GPUs.
We also fix up the timeouts in the PEC, which HB won't do before GA3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Merge PHB3 fixes from stable update
|
|
This increases various timeouts as per CQ SW283991, which should help
with some external drawers and GPUs.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
On a re-IPL or warm reboot, Sapphire asserts and deasserts PERST to each
slot. On a CAPP adapter, this causes the FPGA image stored in flash to be
loaded. HMIs have been observed with a 200ms wait following PERST
deassert, so bump the wait up to 1s. Do this in all cases, because re-IPL
does not preserve memory and we'd need a mechanism for Sapphire to know
that there is a CAPP adapter.
We might be able to reduce this to 750ms or 500ms, but that needs more
testing, so use 1s to be safe. PHYP firmware also waits 1s after deassert.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch enables injecting PCI errors into DMA address ranges, both
32-bit and 64-bit.
BZ: 115222
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch enables injecting PCI errors into the 64-bit MMIO range.
BZ: 115222
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When injecting errors into the 32-bit MMIO range, a fixed length of 8MB
is used. That's incorrect, because one PE might span multiple segments,
and the 32-bit MMIO segment size isn't necessarily 8MB.
This patch fixes the issue by covering all (contiguous) 32-bit MMIO
segments assigned to the specified PE. It also fixes the AIB address
comparison to use 48 of the 50 address bits instead of all of them.
BZ: 115222
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
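Covering all of a PE's 32-bit segments can be sketched by scanning a segment-to-PE map for the PE's contiguous run. Assumptions: 256 M32 segments and a hypothetical in-memory copy of the map (seg_to_pe[]). As in the commit, the PE's segments are assumed to be contiguous, so only the first run is used.

```c
#include <stdbool.h>
#include <stdint.h>

#define M32_SEGS 256   /* assumption: 256 M32 segments */

/* Find the contiguous run of M32 segments owned by pe and return its
 * base and length. seg_to_pe[] is a hypothetical copy of the segment map. */
static bool pe_m32_range(const uint16_t seg_to_pe[M32_SEGS], uint16_t pe,
                         uint64_t m32_base, uint64_t seg_size,
                         uint64_t *base, uint64_t *len)
{
    int first = -1, count = 0;

    for (int i = 0; i < M32_SEGS; i++) {
        if (seg_to_pe[i] == pe) {
            if (first < 0)
                first = i;
            count++;
        } else if (first >= 0) {
            break;   /* end of the first contiguous run */
        }
    }
    if (first < 0)
        return false;

    *base = m32_base + (uint64_t)first * seg_size;
    *len = (uint64_t)count * seg_size;
    return true;
}
```

The segment size is passed in rather than hard-coded. In the usage below it is 4MB, so a fixed 8MB length would be wrong.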
|
|
This patch refactors the code we had for PCI error injection. It
doesn't change the logic:
* Rename error types and functions according to the comments given by
Michael Ellerman when reviewing the kernel counterpart.
* Split the PHB3 and P7IOC error injection backends into multiple
functions to improve readability. Some logic is simplified without
affecting its original functionality.
* Misc cleanup, like renaming variables and functions.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Use real functionality-based flags instead of a mode list in the DT,
plus other cleanups and missing bits (this one actually builds!)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Add a flag indicating that the CAPP unit is in recovery. When a CAPP
recoverable malfunction HMI comes in, the HMI handler calls into
phb3_set_capp_recovery(), which sets the flag and sends the event to
Linux.
EEH will call phb3_next_error(), which tells it the PHB is fenced.
EEH will then call into Sapphire to reinitialize the PHB, which covers
steps 3-5 of the CAPP recovery procedure. The code increases the PERST
wait time to 1s to ensure the FPGA download is complete before polling
for link up.
EEH will then rebind the cxl driver, which completes recovery (steps 7-8
of the CAPP recovery procedure) once it initializes and turns snoops on.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
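The flag-driven handoff between the HMI handler, EEH and PHB reinitialization can be modelled roughly as below. All names are hypothetical, and the real handlers do far more (register access, event queues, PERST sequencing). The sketch only tracks the recovery flag and the pending event.

```c
#include <stdbool.h>

/* Hypothetical subset of PHB state for CAPP recovery. */
struct phb_rec {
    bool capp_recovery;   /* set by the HMI handler */
    bool events_pending;  /* event raised towards Linux */
};

enum next_err { ERR_NONE, ERR_PHB_FENCED };

/* HMI handler path for a recoverable CAPP malfunction. */
static void set_capp_recovery(struct phb_rec *p)
{
    p->capp_recovery = true;
    p->events_pending = true;
}

/* What EEH sees when it asks for the next error. */
static enum next_err next_error(struct phb_rec *p)
{
    p->events_pending = false;
    return p->capp_recovery ? ERR_PHB_FENCED : ERR_NONE;
}

/* PHB reinitialization (steps 3-5) clears the flag; the cxl driver
 * then completes steps 7-8 itself. */
static void reinit_phb(struct phb_rec *p)
{
    p->capp_recovery = false;
}
```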
|
|
For user-initiated CAPP recovery, provide a mode to turn snoops off.
PERST alone does not turn snoops off, and we need to do this as part of
the CAPP recovery procedure before reinitializing the PHB.
A second mode turns snoops back on after recovery. The driver needs to do
this after it reinitializes the PSL; otherwise, tlbies could come in
before the PSL is initialized. Also write 0 to the CAPP error status and
control register as part of the recovery procedure.
Define the modes as flags in opal.h so the driver can pick them up.
Add a DT property "ibm,capi-modes", which tells the driver which modes
Sapphire supports. This provides backwards compatibility with older
OPALs, and also lets the driver disable reset in sysfs if it isn't
supported.
Move the mode checking into phb3.c so it's all in one place.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The FLUSH_SUE_STATE_MAP change fixes a problem with recovery. We were
using an old lab value that marked PTE entries in a shared state. After
recovery, PTE entries were getting flushed out to memory with an SUE,
resulting in a machine check. The new value means PTE entries are dropped
on recovery.
For APC_MASTER_PB_CTRL, the spec says to use the initfile value with bit
3 set. The initfile is missing bit 3, so do a read-modify-write (RMW).
Bit 3 enables the CAPP combined response.
CAPP_EPOCH_TIMER_CTRL enables the epoch timers and the recovery timer
when recovery is enabled. Also relax the epoch timer period mask due to a
bug.
In the TRANSPORT_CONTROL register, set bit 37 (rfs_benign_ptr_data) in
addition to the spec value. This should be set in the initfile in the
future.
Rename APC_MASTER_CONFIG to APC_MASTER_CAPI_CTRL to match the workbook
name.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
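A read-modify-write that keeps the initfile value and adds one bit can be sketched as below. Assumption: bits 3 and 37 use IBM MSB-0 numbering (bit 0 is the most significant bit), as is common in POWER register documentation. The register is a hypothetical in-memory stand-in, and scom_read()/scom_write() are not real SCOM accessors.

```c
#include <stdint.h>

/* IBM (MSB-0) bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(n)  (0x8000000000000000ULL >> (n))

/* Hypothetical simulated SCOM register. */
struct scom_reg { uint64_t val; };

static uint64_t scom_read(const struct scom_reg *r)       { return r->val; }
static void     scom_write(struct scom_reg *r, uint64_t v) { r->val = v; }

/* Read-modify-write: keep whatever the initfile programmed, add bits. */
static void scom_set_bits(struct scom_reg *r, uint64_t bits)
{
    scom_write(r, scom_read(r) | bits);
}
```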
|
|
This patch adds the function pci_device_init(), which is called by
phb->ops->device_init() to apply common initialization to the
specified PCI device during bootup or after a PE reset.
Currently, only the MPS configuration logic is in this function, but
more will be added.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
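An MPS configuration step of this kind might look like the sketch below. It uses the standard PCIe fields: Max_Payload_Size Supported is Device Capabilities bits 2:0, Max_Payload_Size is Device Control bits 7:5, and sizes are 128 << n bytes. The policy of picking the smallest supported value among the devices is an assumption, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* PCIe encodes payload sizes as 128 << n bytes (n = 0..5). */
static unsigned mps_bytes(unsigned enc)
{
    return 128u << enc;
}

/* Assumed policy: use the smallest Max_Payload_Size Supported value
 * (Device Capabilities bits 2:0) among the given devices. */
static unsigned common_mps(const uint32_t *devcap, size_t n)
{
    unsigned mps = 5;   /* 4096 bytes, the largest encoding */
    for (size_t i = 0; i < n; i++) {
        unsigned mpss = devcap[i] & 0x7;
        if (mpss < mps)
            mps = mpss;
    }
    return mps;
}

/* Program Max_Payload_Size (Device Control bits 7:5), preserving other bits. */
static uint16_t devctl_set_mps(uint16_t devctl, unsigned enc)
{
    return (uint16_t)((devctl & ~0x00e0u) | ((enc & 0x7u) << 5));
}
```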
|