Age | Commit message (Collapse) | Author | Files | Lines |
|
Yes, a bunch of these should be.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Yes they should. Do so by adding static to the macro.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Yes they should. Also, some are unused so we comment them out to at
least keep the code as documentation complete.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
be static?
Yes they should.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Yes, yes it should.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The HIOMAP protocol was developed after the release of P8 in preparation
for P9. As a consequence P9 always uses it, but it has rarely been
enabled for P8. P8DTU has recently added IPMI HIOMAP support to its BMC
firmware, so enable its use in skiboot with P8 machines. Doing so
requires some rework to ensure fallback works correctly as in the past
the fallback was to mbox, which will only work for P9.
Tested on Garrison, Palmetto without HIOMAP, Palmetto with HIOMAP, and
Witherspoon.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The postcondition of probing with a lock sequence is easier to make
correct than with unlock. The original implementation left SuperIO
locked after execution which broke an assumption of some callers.
Tested on Garrison, Palmetto without HIOMAP, Palmetto with HIOMAP and
Witherspoon.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Putting "Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]"
(more precisely, the second of 2 its PCI functions, no matter in what
order) into the D3 state causes EEH with the "PCT timeout" error.
This has been noticed on garrison machines only and firestones do not
seem to have this issue.
This disables D-states changing for devices on root buses on Naples by
installing a config space access filter (copied from PHB4).
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-By: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
During boot OPAL makes IPMI_GET_BT_CAPS call to BMC to get BT interface
capabilities which includes IPMI message max resend count, message
timeout, etc,. Most of the time OPAL gets response from BMC within
specified timeout. In some corner cases (like mboxd daemon reset in BMC,
BMC reboot, etc) OPAL may not get response within timeout period. In
such scenarios, OPAL resends message until max resend count reaches.
OPAL uses synchronous IPMI message (ipmi_queue_msg_sync()) for few
operations like flash read, write, etc. Thread will wait in OPAL until
it gets response from BMC. In some corner cases like BMC reboot, thread
may wait in OPAL for long time (more than 20 seconds) and results in
kernel hardlockup.
This patch introduces new interface to disable message resend option. We
will disable message resend option for synchrous message. This will
greatly reduces kernel hardlock up issues.
This is short term fix. Long term solution is to convert all synchronous
messages to asynhrounous one.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In some corner cases (like BMC reboot), bt_send_and_unlock() starts
message timer, but won't send message to BMC as driver is not free to
send message. bt_expire_old_msg() function enables H2B interrupt without
actually sending message.
This patch fixes above issue.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The OPAL_PCI_EEH_FREEZE_STATUS call takes a bunch of parameters, one of
them is @phb_status. It is defined as __be64* and always NULL in
the current Linux upstream but if anyone ever decides to read that status,
then the PHB3's handler will assume it is struct OpalIoPhb3ErrorData*
(which is a lot bigger than 8 bytes) and zero it causing the stack
corruption; p7ioc-phb has the same issue.
This removes @phb_status from all eeh_freeze_status() hooks and moves
the error message from PHB4 to the affected OPAL handlers.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-By: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
I now know what an IODA cache is and I'm not happy about it. With
the power of Comments™ you too can share the misery.
Remove the big WARNING about the P8 specific hardware bug while
we're here. That seems to have been copied over from phb3.c and
no one thought about it too hard.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The PELT-V is also an in-memory table and there is no reason to have two
copies of it. Removing the cache shaves another 128KB off the size of
each struct phb4.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In ancient times we added a caches to struct phb3 for some of the IODA
tables which can only be accessed in-directly via XSCOM. A cache for the
Requester Translation Table (RTT) was also added even though this is an
in-memory table. This was carried over to PHB4 when Ben did the initial
copy and paste, but it's still largely pointless.
There's no real need to have a second copy of the table. This patch
removes the "cache" and changes all the users to reference the RTT
directly if we need to. This reduces the size of the struct phb4 by
128KB.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
When we allocate the various in-memory tables we assert() on the
allocation. There's no point in checking if the table pointer is NULL
or not at runtime.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
General cleanup. For a function that does nothing more than a
mask-and-compare the current implementation is way more convoluted
than it has any right to be.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Kernel makes reboot/shudown OPAL call for reboot/shutdown. Once kernel
gets response from OPAL it runs opal_poll_events() until firmware
handles the request.
On BMC based system, OPAL makes IPMI call (IPMI_CHASSIS_CONTROL) to
initiate system reboot/shutdown. At present OPAL queues IPMI messages
and return SUCESS to Host. If BMC is not ready to accept command (like
BMC reboot), then these message will fail. We have to manually
reboot/shutdown the system using BMC interface.
This patch adds logic to validate message return value. If message failed,
then it will resend the message. At some stage BMC will be ready to accept
message and handles IPMI message.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Enable a new PVR to get us running on another p9 variant.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This reverts commit bd9839684d482417e8c60449592f4308e9a91dac as it broke
booting on P8 systems, including Garrison (AMI BMC), Firestone (AMI BMC)
and QEMU (BMC simulator).
Issue https://github.com/open-power/skiboot/issues/217 tracks the
failure. The P8 IPMI HIOMAP feature can be re-enabled once this issue is
resolved.
Reported-by: Sam Mendoza-Jonas <sam@mendozajonas.com>
Reported-by: Sam Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Sam Mendoza-Jonas <sam@mendozajonas.com>
Acked-by: Sam Mendoza-Jonas <sam@mendozajonas.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This is not a shipping product and is no longer supported by Linux
or other firmware components.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Change print level from debug to warning for reporting
bad EC_PPM_SPECIAL_WKUP_* scom values. To reduce cluttering
in the log print only on error.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Removing init routines required for Power8 DD1, but was enabled for all
Power8 DD versions.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The HIOMAP protocol was developed after the release of P8 in preparation
for P9. As a consequence P9 always uses it, but it has rarely been
enabled for P8. P8DTU has recently added IPMI HIOMAP support to its BMC
firmware, so enable its use in skiboot with P8 machines. Doing so
requires some rework to ensure fallback works correctly as in the past
the fallback was to mbox, which will only work for P9.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We now have a dt function to do this. Use it.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In npu2_populate_devices(), we do
p->phb_nvlink.scan_map |= 0x1 << ((dev->bdfn & 0xf8) >> 3);
:
dev->nvlink.pvd = pci_virt_add_device(&p->phb_nvlink, dev->bdfn, 0x100, dev);
/* At this point, dev->nvlink.pvd->bdfn = dev->bdfn */
if (dev->nvlink.pvd) {
p->phb_nvlink.scan_map |= 0x1 << ((dev->nvlink.pvd->bdfn & 0xf8) >> 3);
:
}
Because dev->nvlink.pvd->bdfn equals dev->bdfn, the second assignment to
scan_map is redundant. Remove it.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
When killing multiple pages, npu2_tce_kill() loops doing single page
kills, but never advances the address. Fix this.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This variable is never used. Remove it.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This variable is never used. Remove it.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This cache is written but never read. Wiring it up would gain us little
(except added complexity), and it obviously hasn't been missed thus far,
so remove it altogether.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We assign pci_cmd, but it never gets used. Remove it.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We've listed npu2-common.o twice. Remove one.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
While disabling CAPP an HMI gets triggered as soon as ETU is put in
reset mode. This is caused as before we can disabled CAPP, it detects
PHB link going down and triggers an HMI requesting Opal to perform
CAPP recovery. This has an un-intended side effect of spamming the
Opal logs with malfunction alert messages and may also confuse the
user.
To prevent this we mask the CAPP FIR error 'PHB Link Down' Bit(31)
when we are disabling CAPP just before we put ETU in reset in
phb4_creset(). Also now since bringing down the PHB link now wont
trigger an HMI and CAPP recovery, hence we manually set the
PHB4_CAPP_RECOVERY flag on the phb to force recovery during creset.
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We implement h/w sequence to disable CAPP in disable_capi_mode() and
with it also enable fast-reset for CAPI mode in phb4_set_capi_mode().
Sequence to disable CAPP is executed in three phases. The first two
phase is implemented in disable_capi_mode() where we reset the CAPP
registers followed by PEC registers to their init values. The final
third final phase is to reset the PHB CAPI Compare/Mask Register and
is done in phb4_init_ioda3(). The reason to move the PHB reset to
phb4_init_ioda3() is because by the time Opal PCI reset state machine
reaches this function the PHB is already un-fenced and its
configuration registers accessible via mmio.
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This patch introduces a PHB4 flag PHB4_CAPP_DISABLE and scaffolding
necessary to handle it during CRESET flow. The flag is set when CAPP
is request to switch to PCIe mode via call to phb4_set_capi_mode()
with mode OPAL_PHB_CAPI_MODE_PCIE. This starts the below sequence that
ultimately ends in newly introduced phb4_slot_sm_run_completed()
1. Set PHB4_CAPP_DISABLE to phb4->flags.
2. Start a CRESET on the phb slot. This also starts the opal pci reset
state machine.
3. Wait for slot state to be PHB4_SLOT_CRESET_WAIT_CQ.
4. Perform CAPP recovery as PHB is still fenced, by calling
do_capp_recovery_scoms().
5. Call newly introduced 'disable_capi_mode()' to disable CAPP.
6. Wait for slot reset to complete while it transitions to
PHB4_SLOT_FRESET and optionally to PHB4_SLOT_LINK_START.
7. Once slot reset is complete opal pci-core state machine will call
slot->ops.completed_sm_run().
8. For PHB4 this branches newly introduced 'phb4_slot_sm_run_completed()'.
9. Inside this function we mark the CAPP as disabled and un-register
the opal syncer phb4_host_sync_reset().
10. Optionally if the slot reset was unsuccessful disable
fast-reboot.
****************************
Notes:
****************************
a. Function 'disable_capi_mode()' performs various sanity tests on CAPP to
to determine if its ok to disable it and perform necessary xscoms
to disable it. However the current implementation proposed in this
patch is a skeleton one that just does sanity tests. A followup patch
will be proposed that implements the xscoms necessary to disable CAPP.
b. The sequence expects that Opal PCI reset state machine makes
forward progress hence needs someone to call slot->ops.run_sm(). This
can be either from phb4_host_sync_reset() or opal_pci_poll().
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This patch introduces a new opal syncer for PHB4 named
phb4_host_sync_reset(). We register this opal syncer when CAPP is
activated successfully in phb4_set_capi_mode() so that it will be
called at kernel shutdown during fast-reset.
During kernel shutdown the function will then repeatedly call
phb->ops->set_capi_mode() to switch switch CAPP to PCIe mode. In case
set_capi_mode() indicates its OPAL_BUSY, which indicates that CAPP is
still transitioning to new state; it calls slot->ops.run_sm() to
ensure that Opal slot reset state machine makes forward progress.
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Presently phb4_set_capi_mode() performs certain CAPP checks
like, checking of CAPP ucode loaded or checks if CAPP is still in
recovery, even when the requested mode is to switch to PCI mode.
Hence this patch updates and re-factors phb4_set_capi_mode() to make
sure CAPP related checks are only performed when request to enable
CAPP is made by mode==OPAL_PHB_CAPI_MODE_CAPI/DMA_TVT1. We also update
other possible modes requests to return a more appropriate status code
based on if CAPP is activated or not.
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Previously struct proc_chip member 'capp_phb3_attached_mask' was used
for Power-8 to keep track of PHB attached to the single CAPP on the
chip. CAPP on that chip supported a flexible PHB assignment
scheme. However since then new chips only support a static assignment
i.e a CAPP can only be attached to a specific PEC.
Hence instead of using 'proc_chip.capp_phb4_attached_mask' to manage
CAPP <-> PEC assignments which needs a global lock (capi_lock) to be
updated, we introduce a new struct named 'capp' a pointer to which
resides inside struct 'phb4'. Since updates to struct 'phb4' already
happen in context of phb_lock; this eliminates the
need to use mutex 'capi_lock' while updating
'capp_phb4_attached_mask'.
This struct is also used to hold CAPP specific variables such as
pointer to the 'struct phb' to which the CAPP is attached,
'capp_xscom_offset' which is the xscom offset to be added to CAPP
registers in case there are more than 1 on the chip, 'capp_index'
which is the index of the CAPP on the chip, and attached_pe' which is
the process endpoint index to which CAPP is attached. Finally member
'chip_id' holds the chip-id thats used for performing xscom
read/writes.
Also new helpers named capp_xscom_read()/write() are introduced to
make access to CAPP xscom registers easier.
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This reverts commit d8b161f4b361f70a7bb43be47d4a32b8f937287a.
As discussed on list, a bit premature to merge, removing for now.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Okay, so maybe the static analysis warning is all useless, and maybe
having the ifdef around a call is actually useful. I'll take the less
noise in my CI static analysis thing.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
If a GPU is passed through to a guest and the guest unexpectedly terminates,
there can be cache lines in CPUs that belong to the GPU. So purge the caches
as part of the reset sequence. L1 is write through, so doesn't need to be purged.
The sequence to purge the L2 and L3 caches from the hw team:
"L2 purge:
(1) initiate purge
putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TYPE L2CAC_FLUSH -all
putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER ON -all
(2) check this is off in all caches to know purge completed
getspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_REG_BUSY -all
(3) putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER OFF -all
L3 purge:
1) Start the purge:
putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_TTYPE FULL_PURGE -all
putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ ON -all
2) Ensure that the purge has completed by checking the status bit:
getspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ -all
You should see it say OFF if it's done:
p9n.ex k0:n0:s0:p00:c0
EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ
OFF"
Suggested-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Each XTS MMIO ATSD# register is accompanied by another register -
XTS MMIO ATSD0 LPARID# - which controls LPID filtering for ATSD
transactions.
When a host system passes a GPU through to a guest, we need to enable
some ATSD for an LPAR. At the moment the host assigns one ATSD to
a NVLink bridge and this maps it to an LPAR when GPU is assigned to
the LPAR. The link number is used for an ATSD index.
ATSD6&7 stay mapped to the host (LPAR=0) all the time which seems to be
acceptable price for the simplicity.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The current kernel calls OPAL_PCI_EEH_FREEZE_STATUS with an uninitialized
@pci_error_type parameter and then analyzes it even if the OPAL call
returned OPAL_SUCCESS. This is results in unexpected EEH events and NPU
freezes.
This initializes @pci_error_type and @severity to known safe values.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The P9 NPU workbook says that only 4K/64K/16M/256M page size are supported
and in fact npu2_map_pe_dma_window() supports just these but in absence of
the "ibm,supported-tce-sizes" property Linux assumes the default P9 PHB4
page sizes - 4K/64K/2M/1G - so when Linux tries 2M/1G TCEs, we get lots of
"Unexpected TCE size" from npu2_tce_kill().
This advertises TCE page sizes so Linux could handle it correctly, i.e.
fall back to 4K/64K TCEs.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
If the link trains in degraded mode, log the ODL endpoint information
register for debug. Its content is specific to the DLx and TLx
implementation, so this is really information useful for the hardware
team.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
There's no status readily available to tell the effective link
width. Instead, we have to look at the individual status of each lane,
on the transmit and receive direction. All relevant information is in
the ODL status register.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Log the link training status register in case of failure to train.
It can have useful information for the hardware team.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|