Age | Commit message (Collapse) | Author | Files | Lines |
|
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17.
We don't need this as we need to do it a different way, with a explicit
set of registers as otherwise we trip other random FIR bits and everything
becomes even more terrible.
I suggest alcohol.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 80452d2cf2ce4dfc769b74c28bd0c73ec076b9be)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is not the way we want to end up doing this.
This is a hack to make folk happy and not require crondump to
debug nvidia/npu2 issues.
Cc: stable
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
These aren't API.
Fixes: b57a5380aa489fa877b2d619225aea2602f20dca
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Log HMI errors as step 1. OS will need to deduce
and interpret the HMI event.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Invalid accesses from the GPU can cause a specific PE to be frozen by the
NPU. Add an interrupt handler which reports the frozen PE to the operating
system via as an EEH event.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
As it turns out, low power mode is not yet ready for prime time. We
shouldn't write the low power config register until it is.
This reverts commit a05054c53a37850a2118d01fcf6669ebb10d1a33.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
There are three different ways we configure the MCD and memory map.
1) Old way (current way)
Skiboot configures the MCD and puts GPUs at 4TB and below
2) New way with MCD
Hostboot configures the MCD and skiboot puts GPU at 4TB and above
3) New way without MCD
No one configures the MCD and skiboot puts GPU at 4TB and below
The patch keeps option 1 and adds options 2 and 3.
The different configurations are detected using certain scoms (see
patch).
Option 1 will go away eventually as it's a configuration that can
cause xstops or data integrity problems. We are keeping it around to
support existing hostboot.
Option 2 supports only 4 GPUs and 512GB of memory per socket.
Option 3 supports 6 GPUs and 4TB of memory but may have some
performance impact.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This refactors the BAR setting code to make it clearer and handle a
larger range of BAR addresses. This is needed as we are about to move
the GPU to a physical address that is currently not supported by this
code.
This change derives group and chip sections of the BAR from the base
address rather than the chip_id now. mem sel is also derived from the
base address, rather than assuming 0.
No functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This code is replicated, so let's put it in a function. Also add some cleanups.
No functional change.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Change the RX clk mux control to be done by software instead of HW. This
avoids glitches caused by changing the mux setting.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Add a procedure which sets the NTL low power config register.
To actually enter low power mode, a corresponding change must be present
in the GPU device driver. The link will not enter low power mode unless
both sides agree, which means this change is safe to make independently.
It should have no forward or backward dependencies on other components.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Allow the NPU2 to trigger "recoverable data link" interrupts.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
POWER9 DD2 has added a new bit we'd like to set:
"XTS_CONFIG2_NO_FLUSH_ENA:
if enabled, allows MMIO ATSDs to suppress the flush"
This has passed sanity tests with 4.12 kernels, which are capable of
exercising this capability.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Reflect the changed NTL BAR layout in POWER9 DD2.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Reflect the changed GENID BAR layout in POWER9 DD2.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
POWER9 DD2 has added a second GPU memory BAR. Use it, but continue to
program things the old way on DD1 systems.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Cc: Alistair Popple <alistair@popple.id.au>
Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Change these values for POWER9 DD2, but keep backwards compatibility.
Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com>
Acked-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The memory coherence directory (MCD) needs to know which system memory addresses
belong to the GPU. This amounts to setting a BAR and a size in the MCD to cover
the addresses assigned to each of the GPUs. To ease assignment we assume GPUs
are assigned memory in a contiguous block per chip.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
NPU2 BARs were being assigned and tracked with a global static
array. This worked fine when there was only a single chip/NPU2 in the
system however multiple chips results in the a shared data structure
for BAR management which results in multiple chips getting assigned
the same BAR addresses and other incorrect sharing of BAR properties.
This patch splits the static and dynamic BAR configuration and stores
the dynamic configuration in the per-NPU2 data structure.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Unlike other system buses the NVLink2 links need to be trained at
runtime as training requires interaction from the GPU device
drivers. This patch implements the required training procedures for
NVLink2, which are different than the NVLink1 equivalents.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
NPU2 registers are split up into a series of stacks and blocks. This
adds definitions and register offset calculation macros for all of the
register stacks/blocks in preparation for NPU2.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|