aboutsummaryrefslogtreecommitdiff
path: root/include/npu2-regs.h
AgeCommit message (Collapse)AuthorFilesLines
2018-03-27Revert "NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging"Stewart Smith1-6/+1
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17. We don't need this as we need to do it a different way, with a explicit set of registers as otherwise we trip other random FIR bits and everything becomes even more terrible. I suggest alcohol. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 80452d2cf2ce4dfc769b74c28bd0c73ec076b9be) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-28NPU2 HMIs: dump out a *LOT* of npu2 registers for debuggingStewart Smith1-1/+6
This is not the way we want to end up doing this. This is a hack to make folk happy and not require crondump to debug nvidia/npu2 issues. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21npu2/opal-api: move npu2 checkstop defines to npu2-regs.hStewart Smith1-0/+98
These aren't API. Fixes: b57a5380aa489fa877b2d619225aea2602f20dca Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08hw/npu2: Implement logging HMI actionsBalbir Singh1-0/+11
Log HMI errors as step 1. OS will need to deduce and interpret the HMI event. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-14npu2.c: Add PE error detectionAlistair Popple1-16/+1
Invalid accesses from the GPU can cause a specific PE to be frozen by the NPU. Add an interrupt handler which reports the frozen PE to the operating system via as an EEH event. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15Revert "npu2: hw-procedures: Enable low power mode"Reza Arbab1-6/+0
As it turns out, low power mode is not yet ready for prime time. We shouldn't write the low power config register until it is. This reverts commit a05054c53a37850a2118d01fcf6669ebb10d1a33. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15npu2: Move to new GPU memory mapMichael Neuling1-0/+5
There are three different ways we configure the MCD and memory map. 1) Old way (current way) Skiboot configures the MCD and puts GPUs at 4TB and below 2) New way with MCD Hostboot configures the MCD and skiboot puts GPU at 4TB and above 3) New way without MCD No one configures the MCD and skiboot puts GPU at 4TB and below The patch keeps option 1 and adds options 2 and 3. The different configurations are detected using certain scoms (see patch). Option 1 will go away eventually as it's a configuration that can cause xstops or data integrity problems. We are keeping it around to support existing hostboot. Option 2 supports only 4 GPUs and 512GB of memory per socket. Option 3 supports 6 GPUs and 4TB of memory but may have some performance impact. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15npu2: Refactor BAR setting codeMichael Neuling1-1/+2
This refactors the BAR setting code to make it clearer and handle a larger range of BAR addresses. This is needed as we are about to move the GPU to a physical address that is currently not supported by this code. This change derives group and chip sections of the BAR from the base address rather than the chip_id now. mem sel is also derived from the base address, rather than assuming 0. No functional change. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15npu2: Create npu2_write_mcd()Michael Neuling1-0/+3
This code is replicated, so let's put it in a function. Also add some cleanups. No functional change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-13npu2: hw-procedures: Add phy_rx_clock_sel()Reza Arbab1-0/+1
Change the RX clk mux control to be done by software instead of HW. This avoids glitches caused by changing the mux setting. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-09npu2: hw-procedures: Enable low power modeReza Arbab1-0/+6
Add a procedure which sets the NTL low power config register. To actually enter low power mode, a corresponding change must be present in the GPU device driver. The link will not enter low power mode unless both sides agree, which means this change is safe to make independently. It should have no forward or backward dependencies on other components. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12npu2: Enable recoverable data link (no-stall) interruptsSam Bobroff1-0/+10
Allow the NPU2 to trigger "recoverable data link" interrupts. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Set the XTS config2 registerReza Arbab1-0/+1
POWER9 DD2 has added a new bit we'd like to set: "XTS_CONFIG2_NO_FLUSH_ENA: if enabled, allows MMIO ATSDs to suppress the flush" This has passed sanity tests with 4.12 kernels, which are capable of exercising this capability. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Adjust content of the NTL BARReza Arbab1-2/+4
Reflect the changed NTL BAR layout in POWER9 DD2. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Adjust content of the GENID BARReza Arbab1-2/+3
Reflect the changed GENID BAR layout in POWER9 DD2. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Add NPU2_GPU1_MEM_BARReza Arbab1-0/+1
POWER9 DD2 has added a second GPU memory BAR. Use it, but continue to program things the old way on DD1 systems. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Alistair Popple <alistair@popple.id.au> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Fix indirect SCOM addressesReza Arbab1-2/+4
Change these values for POWER9 DD2, but keep backwards compatibility. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-06hw/npu2.c: Add memory coherence directory programmingAlistair Popple1-0/+5
The memory coherence directory (MCD) needs to know which system memory addresses belong to the GPU. This amounts to setting a BAR and a size in the MCD to cover the addresses assigned to each of the GPUs. To ease assignment we assume GPUs are assigned memory in a contiguous block per chip. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-10npu2: Fix BAR mapping for multiple chipsAlistair Popple1-4/+12
NPU2 BARs were being assigned and tracked with a global static array. This worked fine when there was only a single chip/NPU2 in the system however multiple chips results in the a shared data structure for BAR management which results in multiple chips getting assigned the same BAR addresses and other incorrect sharing of BAR properties. This patch splits the static and dynamic BAR configuration and stores the dynamic configuration in the per-NPU2 data structure. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-03-30npu2: Add hardware link training proceduresAlistair Popple1-0/+7
Unlike other system buses the NVLink2 links need to be trained at runtime as training requires interaction from the GPU device drivers. This patch implements the required training procedures for NVLink2, which are different than the NVLink1 equivalents. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-03-30npu2: Add npu2 register definitionsAlistair Popple1-0/+431
NPU2 registers are split up into a series of stacks and blocks. This adds definitions and register offset calculation macros for all of the register stacks/blocks in preparation for NPU2. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>