Age | Commit message (Collapse) | Author | Files | Lines |
|
A bad GPU or other condition may leave us with a subset of links that
never get initialized. If an ATSD is sent to one of those bricks, it
will never complete, leaving us waiting forever for a response:
watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
...
Modules linked in: nvidia_uvm(O) nvidia(O)
CPU: 23 PID: 2050 Comm: acos Tainted: G W O 4.14.0 #2
task: c0000000285cfc00 task.stack: c000001fea860000
NIP: c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
REGS: c000001fea863550 TRAP: 0901 Tainted: G W O (4.14.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004484 XER: 20040000
CFAR: c0000000000abdf4 SOFTE: 1
GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
GPR12: 0000000000008000 c000000003167e80
NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370
ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
we make it all the way to creating a context for the BDF.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The cxl driver will set the capi value, like other drivers already do.
Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Add support to load the imc catalog from a lid file packaged
as part of the system firmware. Lid number allocated
is 0x80f00103.lid.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This happens normally if a slot doesn't have a working HW presence
detect and relies instead of inband presence detect.
The message we display is scary and not very useful unless ou
are debugging, so quiten it up and change it to something more
meaningful.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
pause_microcode_at_boot() loops through all the chip's ucode
control block and pause the ucode if it is in the running state.
But it does not fail if any of the chip's ucode is not initialised.
Add code to return a failure if ucode is not initialized in any
of the chip. Since pause_microcode_at_boot() is called just before
attaching the IMC device nodes in imc_init(), add code to check for
the function return.
Fixes: 9750eee802f8d ('hw/imc: pause microcode at boot')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The ASN indication is used for tunneled operations (as_notify and
atomics). Tunneled operation messages can be sent in PCI mode as
well as CAPI mode.
The address field of as_notify messages is hijacked to encode the
LPID/PID/TID of the target thread, so those messages should not go
through address translation. Therefore bit 59 is part of the ASN
indication.
This patch sets TVT#1 in bypass mode when capi mode is enabled,
to prevent as_notify messages from being dropped.
Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Hardware has limitations which would require to put a sync after each
store EOI to make sure the MMIO operations that change the ESB state
are ordered. This is a killer for performance and the PHBs do not
support the sync. So remove the store EOI for the moment, until
hardware is improved.
Also, while we are at changing the XIVE source flags, let's fix the
settings for the PHB4s which should follow these rules :
- SHIFT_BUG for DD10
- STORE_EOI for DD20 and if enabled
- TRIGGER_PAGE for DDx0 and if not STORE_EOI
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The function phb4_probe_stack() resets "ETU Reset Register" to
unfreeze the PHB before it performs mmio access on the PHB. However in
case the FIR/NFIR registers are set while entering this function,
the reset of "ETU Reset Register" wont unfreeze the PHB and it will
remain fenced. This leads to failure during initial CRESET of the PHB
as mmio access is still not enabled and an error message of the form
below is logged:
PHB#0000[0:0]: Initializing PHB4...
PHB#0000[0:0]: Default system config: 0xffffffffffffffff
PHB#0000[0:0]: New system config : 0xffffffffffffffff
PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff
PHB#0000[0:0]: Waiting for DLP PG reset to complete...
<snip>
PHB#0000[0:0]: Timeout waiting for DLP PG reset !
PHB#0000[0:0]: Initialization failed
This is especially seen happening during the MPIPL flow where SBE
would quiesces and fence the PHB so that it doesn't stomp on the main
memory. However when skiboot enters phb4_probe_stack() after MPIPL,
the FIR/NFIR registers are set forcing PHB to re-enter fence after ETU
reset is done.
So to fix this issue the patch introduces new xscom writes to
phb4_probe_stack() to reset the FIR/NFIR registers before performing
ETU reset to enable mmio access to the PHB.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch updates do_capp_recovery_scoms() to poll the CAPP
Err/Status control register, check for CAPP-Recovery to complete/fail
based on indications of BITS-1,5,9 and then proceed with the
CAPP-Recovery scoms iif recovery completed successfully. This would
prevent cases where we bring-up the PCIe link while recovery sequencer
on CAPP is still busy with casting out cache lines.
In case CAPP-Recovery didn't complete successfully an error is returned
from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4
fenced and mark it as broken.
The loop that implements polling of Err/Status register will also log
an error on the PHB when it continues for more than 168ms which is the
max time to failure for CAPP-Recovery.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This can happen under mambo, at least.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Peer-to-peer GPU bandwidth latency testing has produced some tunable
values that improve performance. Add them to our device initialization.
File these under things that need to be cleaned up with nice #defines
for the register names and bitfields when we get time.
A few of the settings are dependent on the system's particular NVLink
topology, so introduce a helper to determine how many links go to a
single GPU.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This gets used elsewhere to index items in the XTS tables.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
[arbab@linux.vnet.ibm.com: Added commit log]
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17.
We don't need this as we need to do it a different way, with a explicit
set of registers as otherwise we trip other random FIR bits and everything
becomes even more terrible.
I suggest alcohol.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fixes: CID 263056 and 263052
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fixes: ac4272bf ("fast-reboot: occ: Delete OCC child nodes in /ibm, opal/power-mgt")
Fixes: CID 263053
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fixes: CID 264267
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The follow pattern exists in several npu2 functions:
struct phb *phb = pci_get_phb(phb_id);
struct npu2 *p = phb_to_npu2_nvlink(phb);
The problem is that pci_get_phb() can return NULL and
phb_to_npu2_nvlink() dereferences its parameter. Coverity says that the
return value of pci_get_phb() is checked 43 out of 46 times which
suggests we should be more careful.
Futhurmore, functions with the baddly placed call to
phb_to_npu2_nvlink() do seem to check that the return value of
pci_get_phb() isn't NULL, but this check would be too little too late.
This patch just moves the call of phb_to_npu2_nvlink() to after the
NULL check for the return value of pci_get_phb().
Affected functions are:
opal_npu_map_lpar()
opal_npu_init_context()
opal_npu_destroy_context()
Fixes: CID 264274, 264273, 264272, 264271, 264266, 264265
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Both scale_sensor() and scale_energy() take the value to scale as a
pointer. These functions do not NULL check the pointer before the first
time they dereference it, which is fine since passing NULL would be
completely pointless.
Both functions do perform a pointless NULL check later on. This
confuses coverity and really doesn't make much sense at all. Since
calling these functions with NULL as the sensor parameter makes no
sense, and currently theres a dereference before the check, just remove
the check.
Fixes: CID 264276 and 264275
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This means that we no longer hit this bug if we fail to get valid pstates
from the OCC.
[console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
[ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
[ 10.318805] Disabling lock debugging due to kernel taint
[ 10.318808] Severe Machine check interrupt [Not recovered]
[ 10.318812] NIP [000000003003e434]: 0x3003e434
[ 10.318813] Initiator: CPU
[ 10.318815] Error type: Real address [Load/Store (foreign)]
[ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
[ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
[ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
[ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
[ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
[ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
[Additional changes from Shilpa]
Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
In case of error, opal_xive_set_vp_info() will return without
unlocking the xive object. This is most certainly a typo.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Major changes in the NPU between DD1 and DD2 necessitated a fair bit of
revision-specific code.
Now that all our lab machines are DD2, we no longer test anything on DD1
and it's time to get rid of it.
Remove DD1-specific code and abort probe if we're running on a DD1 machine.
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We've been carting around this field since the original p7ioc-phb code.
As far as I can tell we never actually use it for anything other than
checking if the PHB has been marked as broken or not. The _FENCED
state is set in a few places, but we never use it in favour of just
checking the MMIO register.
This patch just replaces it with a boolean that indicates if
the PHB has been marked as broken and removes the giant, mostly
wrong, comment explaining it's usage that is copied and pasted
into each phb header file.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Using DGEMM benchmark we observed there was a drop of 5-9% throughput with
and without stop4/5. In this benchmark the GPU waits on the cpu to wakeup
and provide the subsequent data block to compute. The wakup latency
accumulates over the run and shows up as a performance drop.
Linux enters stop4/5 more aggressively for its wakeup latency. Increasing
the residency from 1ms to 10ms makes the performance drop <1%
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We coded few workarounds in special wakeup logic to handle the
buggy firmware. Now that is fixed remove them as they break the
special wakeup protocol. As per the spec we should not de-assert
beofre assert is complete. So follow this protocol.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fast reboot does not yet work right with the NPU. It's been disabled on
NVLink and OpenCAPI machines. Do the same for NVLink2.
This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
from the npu code to npu2.
Cc: stable # 5.10.x
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The XTS_PID context table is limited to 256 possible pids/contexts. To
relieve this limitation, make use of "unfiltered mode" instead.
If an entry in the XTS_BDF table has the bit for unfiltered mode set, we
can just use one context for that entire bdf/lpar, regardless of pid.
Instead of of searching the XTS_PID table, the NMMU checkout request
will simply use the entry indexed by lparshort id instead.
Change opal_npu_init_context() to create these lparshort-indexed
wildcard entries (0-15) instead of allocating one for each pid. Check
that multiple calls for the same bdf all specify the same msr value.
In opal_npu_destroy_context(), continue validating the bdf argument,
ensuring that it actually maps to an lpar, but no longer remove anything
from the XTS_PID table. If/when we start supporting virtualized GPUs, we
might consider actually removing these wildcard entries by keeping a
refcount, but keep things simple for now.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This reverts commit 20f685a3627a2a522c465716377561a8fbcc608f.
We've hit problems on Zaius machines and the needed petitboot changes
haven't made it upstream yet.
Let's revert for the time being while we sort everything out.
We probably have to keep both around for a few years.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch updates phb4_set_capi_mode() to disable fast-reboot
whenever enable_capi_mode() is called, irrespective to its return
value. This should prevent against a possibility of not disabling
fast-reboot when some changes to enable_capi_mode() causing return of
an error and leaving CAPP in enabled mode.
Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Presently when we encounter an error while synchronizing capp timebase
with chip-tod at the end of enable_capi_mode() we return an
error. This has an to unintended consequences. First this will prevent
disabling of fast-reboot even though CAPP is already enabled by this
point. Secondly, failure during timebase sync is a non fatal error or
capp initialization as CAPP/PSL can continue working after this and an
AFU will only see an error when it tries to read the timebase value
from PSL.
So this patch updates enable_capi_mode() to not return an error in
case call to chiptod_capp_timebase_sync() fails. The function will now
just log an error and continue further with capp init sequence. This
make the current implementation align with the one in kernel 'cxl'
driver which also assumes the PSL timebase sync errors as non-fatal
init error.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We don't support resetting an opencapi link yet.
Commit fe6d86b9 ("pci: Make fast reboot creset PHBs in parallel")
tries resetting any PHB whose slot defines a 'run_sm' callback. It
raises an assert when applied to an opencapi PHB, as 'run_sm' calls
the 'freset' callback, which is not yet defined for opencapi.
Fix it for now by removing the currently useless definition of
'run_sm' on the opencapi slot. It will print a message in the skiboot
log because the PHB cannot be reset, which is correct. It will all go
away when we add support for resetting an opencapi link.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Update fsp_lid_map to include CAPP ucode lid for phb4-chipid ==
0x202d1 that corresponds to P9 DD-2.2 chip.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
P9 supports PCI tunneled operations (atomics and as_notify) that are
initiated by devices.
A subset of the tunneled operations require a response, that must be
sent back from the host to the device. For example, an atomic compare
and swap will return the compare status, as swap will only performed
in case of success. Similarly, as_notify reports if the target thread
has been woken up or not, because the operation may fail.
To enable tunneled operations, a device driver must tell the host where
it expects tunneled operation responses, by setting the PBCQ Tunnel BAR
Response register with a specific value within the range of its BARs.
This register is currently initialized by enable_capi_mode(). But, as
tunneled operations may also operate in PCI mode, a new API is required
to set the PBCQ Tunnel BAR Response register, without switching to CAPI
mode.
This patch provides two new OPAL calls to get/set the PBCQ Tunnel
BAR Response register.
Note: as there is only one PBCQ Tunnel BAR register, shared between
all the devices connected to the same PHB, only one of these devices
will be able to use tunneled operations, at any time.
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
P9 supports PCI tunneled operations (atomics and as_notify) that require
setting the PHB ASN Compare/Mask register with a 16-bit indication.
This register is currently initialized by enable_capi_mode(). But, as
tunneled operations may also work in PCI mode, the ASN Compare/Mask
register should rather be initialized in phb4_init_ioda3().
This patch also adds "ibm,phb-indications" to the device tree, to tell
Linux the values of CAPI, ASN, and NBW indications, when supported.
Tunneled operations tested by IBM in CAPI mode, by Mellanox Technologies
in PCI mode.
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
dtc tool complaining about below warning as usage of linux,stdout-path
property under /chosen node is deprecated.
dts: Warning
(chosen_node_stdout_path): Use 'stdout-path' instead of 'linux,stdout-path'
So this patch fix this by using stdout-path property on all the systems
and keep linux,stdout-path only on P8 and before. This property refers to
a node which represents the device to be used for boot console output.
Verified boot on both P8 and P9 systems with new and older kernels.
And also verified dtc warnings got fixed in both P8 and P9.
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[stewart: simplify logic]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Ref[1] enables fast-reboot by default for POWER9. Presently
fast-reboot for CAPP is not yet supported. Hence this patch disables
fast-reboot in case the CAPP is enabled after reboot until its
support is merged.
References: [1] https://patchwork.ozlabs.org/patch/878879/
Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add three OPAL API calls that are required by the ocxl driver.
- OPAL_NPU_SPA_SETUP
The Shared Process Area (SPA) is a table containing one entry (a
"Process Element") per memory context which can be accessed by the
OpenCAPI device.
- OPAL_NPU_SPA_CLEAR_CACHE
The NPU keeps a cache of recently accessed memory contexts. When a
Process Element is removed from the SPA, the cache for the link must be
cleared.
- OPAL_NPU_TL_SET
The Transaction Layer specification defines several templates for
messages to be exchanged on the link. During link setup, the host and
device must negotiate what templates are supported on both sides and at
what rates those messages can be sent.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Scan the OpenCAPI links under the NPU, and for each link, reset the card,
set up a device, train the link and register a PHB.
Implement the necessary operations for the OpenCAPI PHB type.
For bringup, test and debug purposes, we allow an NVRAM setting,
"opencapi-link-training" that can be set to either disable link training
completely or to use the prbs31 test pattern.
To disable link training:
nvram -p ibm,skiboot --update-config opencapi-link-training=none
To use prbs31:
nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Unlike NVLink, which uses the pci-virt framework to fake a PCI
configuration space for NVLink devices, the OpenCAPI device model presents
us with a real configuration space handled by the device over the OpenCAPI
link.
As a result, we have to train the OpenCAPI link in skiboot before we do PCI
probing, so that config space can be accessed, rather than having link
training being triggered by the Linux driver.
Add some helper functions to wrap the existing NVLink PHY training sequence
so we can easily run it within skiboot.
Additionally, we add OpenCAPI-specific lane settings, and a function to
"bump" lanes that haven't trained properly (this process isn't documented
in the workbook, but the hardware experts assure us that this improves link
training reliability...) We also support the PRBS31 pattern that's used for
bringup and test purposes.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Scan the device tree for NPUs with OpenCAPI links and configure the NPU per
the initialisation sequence in the NPU OpenCAPI workbook.
Training of individual links and setup of per-AFU/link configuration will
be in a later patch.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Unlike NVLink, OpenCAPI registers a separate PHB for each device, in order
to allow us to force Linux to use the correct MMIO windows for each NPU
link. This requires some reworking of NPU data structures to account for
the fact that a PHB could correspond to either an NPU (NVLink) or a single
link (OpenCAPI).
At some later point, we may want to rework the NVLink code to present a
separate PHB per device in order to simplify this. For now, we split
NVLink-specific device data into a separate struct in order to make it
clear which fields are NVLink-only.
Additionally, add helper functions to correctly translate between
OpenCAPI/NVLink PHBs and the underlying structures, and various fields
for OpenCAPI data that we're going to need later on.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Split out common helper functions for NPU register access into a separate
file, as these will be used extensively by both NVLink and OpenCAPI code.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is not the way we want to end up doing this.
This is a hack to make folk happy and not require crondump to
debug nvidia/npu2 issues.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fast-reboot in P8 fails to re-init OCC data as there are chipwise OCC
nodes which are already present in the /ibm,opal/power-mgt node. These
per-chip nodes hold the voltage IDs for each pstate and these can be
changed on OCC pstate table biasing. So delete these before calling
the re-init code to re-parse and populate the pstate data.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This changes to build system to use thin archives rather than
incremental linking for built-in.o, similar to recent change to Linux.
built-in.o is renamed to built-in.a, and is created as a thin archive
with no index, for speed and size. All built-in.a are aggregated into
a skiboot.tmp.a which is a thin archive built with an index, making it
suitable or linking. This is input into the final link.
The advantags of build size and linker code placement flexibility are
not as great with skiboot as a bigger project like Linux, but it's a
conceptually better way to build, and is more compatible with link
time optimisation in toolchains which might be interesting for skiboot
particularly for size reductions.
Size of build tree before this patch is 34.4MB, afterwards 23.1MB.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This fixes these nvidia cards training at only GEN2 spends rather than
GEN3 by disabling PCIe lane equalisation.
Firstly we check if the card is in a whitelist. If it is and the link
has not trained optimally, retry with lane equalisation off. We do
this on all POWER9 chip revisions since this is a device issue, not
a POWER9 chip issue.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch adds a new opal call to enable/disable a sensor group. This
call is used to select the sensor groups that needs to be copied to
main memory by OCC at runtime.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[stewart: rebase and bump OPAL API number]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Scale the sensor values appropriately to match the HWMON requirements.
OCC sensor values should be scaled as per "scale_factor' field and
then converted to HWMON required unit.
Sensors like temperature and power are already scaled by the kernel
driver which convert the values to millidegree Celsius and microWatt
respectively. So apart from temperature and power the remaining
sensors are converted to HWMON standards.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Export the accumulated power values as energy sensors. The accumulator
field of power sensors are used for representing energy counters which
can be exported as energy counters in Linux hwmon interface.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
[stewart: fix old gcc compiler warning]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch adds support to read u64 sensor values. This also adds
changes to the core and the backend implementation code to make this
API as the base call. Host can use this new API to read sensors
upto 64bits.
This adds a list to store the pointer to the kernel u32 buffer, for
older kernels making async sensor u32 reads.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|