aboutsummaryrefslogtreecommitdiff
path: root/include/npu2-regs.h
AgeCommit message (Collapse)AuthorFilesLines
2021-10-19pau: hmi scom dumpChristophe Lombard1-0/+5
This patch add a new function to dump PAU registers when a HMI has been raised and an OpenCAPI link has been hit by an error. For each register, the scom address and the register value are printed. The hmi.c has been redesigned in order to support the new PHB/PCIEX type (PAU OpenCapi). Now, the *npu* functions support NPU and PAU units of P8, P9 and P10 chips. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-12Re-license IBM written files as Apache 2.0 OR GPLv2+Stewart Smith1-1/+1
SPDX makes it a simpler diff. I have audited the commit history of each file to ensure that they are exclusively authored by IBM and thus we have the right to relicense. The motivation behind this is twofold: 1) We want to enable experiments with coreboot, which is GPLv2 licensed 2) An upcoming firmware component wants to incorporate code from skiboot and code from the Linux kernel, which is GPLv2 licensed. I have gone through the IBM internal way of gaining approval for this. The following files are not exclusively authored by IBM, so are *not* included in this update (I will be seeking approval from contributors): core/direct-controls.c core/flash.c core/pcie-slot.c external/common/arch_flash_unknown.c external/common/rules.mk external/gard/Makefile external/gard/rules.mk external/opal-prd/Makefile external/pflash/Makefile external/xscom-utils/Makefile hdata/vpd.c hw/dts.c hw/ipmi/ipmi-watchdog.c hw/phb4.c include/cpu.h include/phb4.h include/platform.h libflash/libffs.c libstb/mbedtls/sha512.c libstb/mbedtls/sha512.h platforms/astbmc/barreleye.c platforms/astbmc/garrison.c platforms/astbmc/mihawk.c platforms/astbmc/nicole.c platforms/astbmc/p8dnu.c platforms/astbmc/p8dtu.c platforms/astbmc/p9dsu.c platforms/astbmc/vesnin.c platforms/rhesus/ec/config.h platforms/rhesus/ec/gpio.h platforms/rhesus/gpio.c platforms/rhesus/rhesus.c platforms/astbmc/talos.c platforms/astbmc/romulus.c Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: fixed up the drift] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-07-26npu2: Prepare purge_l2_l3_caches() for reuseReza Arbab1-13/+0
Move this to a separate compilation unit with its own header, for reuse. The code formerly in npu2.c is copied verbatim. The #defines formerly in npu2-regs.h have been reformatted and changed to use PPC_BITMASK() instead of multiple consecutive PPC_BIT()s. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-07-26SPDX-ify all skiboot codeStewart Smith1-15/+2
Use Software Package Data Exchange (SPDX) to indicate license for each file that is unique to skiboot. At the same time, ensure the (C) who and years are correct. See https://spdx.org/ Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: Added a few missing files] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-06-27npu2: Increase timeout for L2/L3 cache purgingAlexey Kardashevskiy1-0/+2
On NVLink2 bridge reset, we purge all L2/L3 caches in the system. This is an asynchronous operation, we have a 2ms timeout here. There are reports that this is not enough and "PURGE L3 on core xxx timed out" messages appear (for the reference: on the test setup this takes 280us..780us). This defines the timeout as a macro and changes this from 2ms to 20ms. This adds a tracepoint to tell how long it took to purge all the caches. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-20hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memoryAndrew Donnellan1-0/+7
Lowest Point of Coherency (LPC) memory allows the host to access memory on an OpenCAPI device. Define 2 OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE, for assigning and clearing the memory BAR. (We try to avoid using the term "LPC" to avoid confusion with Low Pin Count.) At present, we use a fixed location in the address space, which means we are restricted to a single range of 4TB, on a single OpenCAPI device per chip. In future, we'll use some chip ID extension magic to give us more space, and some sort of allocator to assign ranges to more than one device. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-02npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by defaultAlexey Kardashevskiy1-0/+14
V100 GPUs are known to violate NVLink2 protocol in some cases (one is when memory was accessed by the CPU and they by GPU using so called block linear mapping) and issue double probes to NPU which can cope with this problem only if CONFIG_ENABLE_SNARF_CPM ("disable/enable Probe.I.MO snarfing a cp_m") is not set in the CQ_SM Misc Config register #0. If the bit is set (which is the case today), NPU issues the machine check stop. The snarfing feature is designed to detect 2 probes in flight and combine them into one. This adds a new "opal-npu2-snarf-cpm" nvram variable which controls CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check stop from happening. This disables snarfing by default as otherwise a broken GPU driver can crash the entire box even when a GPU is passed through to a guest. This provides a dial to allow regression tests (might be useful for a bare metal). To enable snarfing, the user needs to run: sudo nvram -p ibm,skiboot --update-config opal-npu2-snarf-cpm=enable and reboot the host system. While at this, define macros for register names as well to avoid touching same lines over and over again. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-09hw/npu2: Dump (more) npu2 registers on link error and HMIsFrederic Barrat1-0/+10
We were already logging some NPU registers during an HMI. This patch cleans up a bit how it is done and separates what is global from what is specific to nvlink or opencapi. Since we can now receive an error interrupt when an opencapi link goes down unexpectedly, we also dump the NPU state but we limit it to the registers of the brick which hit the error. The list of registers to dump was worked out with the hw team to allow for proper debugging. For each register, we print the name as found in the NPU workbook, the scom address and the register value. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-09hw/npu2: Setup an error interrupt on some opencapi FIRsFrederic Barrat1-1/+4
Many errors reported in the NPU FIR2 register, mostly catching unexpected errors on the opencapi link are defined as 'brick fatal' in the workbook, yet the default action is set to system checkstop. It's possible to see those errors during AFU development, where the AFU may send unexpected packets on the link, therefore triggering those errors. Checkstopping the system in this case is clearly extreme, as the error could be contained to the brick and proper analysis of a checkstop is not trivial outside of a bringup environment. This patch changes the default action of those errors so that the NPU will raise an interrupt instead. Follow-up patches will log proper information so that the error can be debugged and linux can catch the event. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Setup perf counters to detect CRC errorsFrederic Barrat1-0/+17
It's possible to set up performance counters for the PLL to detect various conditions for the links in nvlink or opencapi mode. Since those counters are currently unused, let's configure them when an obus is in opencapi mode to detect CRC errors on the link. Each link has two counters: - CRC error detected by the host - CRC error detected by the DLx (NAK received by the host) We also dump the counters shortly after the link trains, but they can be read multiple times through cronus, pdbg or linux. The counters are configured to be reset after each read. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Rework ODL register accessFrederic Barrat1-19/+11
ODL registers used to control the opencapi link state have an address built on a base address and an offset for each brick which can be computed instead of hard-coded individually for each brick. Rework how we access the ODL registers, to avoid repeating switch statements all over the place. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-25npu2: Allow ATSD for LPAR other than 0Alexey Kardashevskiy1-0/+2
Each XTS MMIO ATSD# register is accompanied by another register - XTS MMIO ATSD0 LPARID# - which controls LPID filtering for ATSD transactions. When a host system passes a GPU through to a guest, we need to enable some ATSD for an LPAR. At the moment the host assigns one ATSD to a NVLink bridge and this maps it to an LPAR when GPU is assigned to the LPAR. The link number is used for an ATSD index. ATSD6&7 stay mapped to the host (LPAR=0) all the time which seems to be acceptable price for the simplicity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-01-18sparse: Make tree 'constant is so big' warning cleanStewart Smith1-1/+1
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-12Revert "npu2: Allow ATSD for LPAR other than 0"Stewart Smith1-2/+0
This reverts commit d8b161f4b361f70a7bb43be47d4a32b8f937287a. As discussed on list, a bit premature to merge, removing for now. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10Add purging CPU L2 and L3 caches into NPU hreset.Rashmica Gupta1-0/+11
If a GPU is passed through to a guest and the guest unexpectedly terminates, there can be cache lines in CPUs that belong to the GPU. So purge the caches as part of the reset sequence. L1 is write through, so doesn't need to be purged. The sequence to purge the L2 and L3 caches from the hw team: "L2 purge: (1) initiate purge putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TYPE L2CAC_FLUSH -all putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER ON -all (2) check this is off in all caches to know purge completed getspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_REG_BUSY -all (3) putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER OFF -all L3 purge: 1) Start the purge: putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_TTYPE FULL_PURGE -all putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ ON -all 2) Ensure that the purge has completed by checking the status bit: getspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ -all You should see it say OFF if it's done: p9n.ex k0:n0:s0:p00:c0 EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ OFF" Suggested-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10npu2: Allow ATSD for LPAR other than 0Alexey Kardashevskiy1-0/+2
Each XTS MMIO ATSD# register is accompanied by another register - XTS MMIO ATSD0 LPARID# - which controls LPID filtering for ATSD transactions. When a host system passes a GPU through to a guest, we need to enable some ATSD for an LPAR. At the moment the host assigns one ATSD to a NVLink bridge and this maps it to an LPAR when GPU is assigned to the LPAR. The link number is used for an ATSD index. ATSD6&7 stay mapped to the host (LPAR=0) all the time which seems to be acceptable price for the simplicity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-28npu2-opencapi: Log ODL endpoint information registerFrederic Barrat1-0/+5
If the link trains in degraded mode, log the ODL endpoint information register for debug. Its content is specific to the DLx and TLx implementation, so this is really information useful for the hardware team. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-28npu2-opencapi: Detect if link trained in degraded modeFrederic Barrat1-0/+2
There's no status readily available to tell the effective link width. Instead, we have to look at the individual status of each lane, on the transmit and receive direction. All relevant information is in the ODL status register. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-18Add the other 7 ATSD registers to the device tree.Rashmica Gupta1-0/+2
Suggested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17npu2: Split device index into brick and link indexAndrew Donnellan1-7/+7
On Witherspoon, OpenCAPI devices attached to link indexes 0 and 1 are handled by bricks 2 and 3. Rename index to brick_index, and add a new field, link_index, to refer to the link index. For now, we set those values identically. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-08-06npu2: Add support for relaxed-ordering modeReza Arbab1-3/+18
Some device drivers support out of order access to GPU memory. This does not affect the CPU view of memory but it does affect the GPU view of memory. It should only be enabled if the GPU driver has requested it. Add OPAL APIs allowing the driver to query relaxed ordering state or request it to be set for a device. Current hardware only allows relaxed ordering to be enabled per PCIe root port. So the code here doesn't enable relaxed ordering until it has been explicitly requested for every device on the port. Signed-off-by: Alistair Popple <alistair@popple.id.au> [arbab@linux.ibm.com: Rebase/refactor original changes] Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-08-06npu2: Don't open code NPU2_RELAXED_ORDERING_CFG2Reza Arbab1-0/+2
Make the code that initializes these registers more descriptive by using macros instead of open coded literals. No functional change. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-08-06npu2: Add NPU2_SM_REG_OFFSET()Reza Arbab1-0/+4
Add a register offset calculation macro using SM block index, similar to the other NPU2_*_REG_OFFSET() macros. Signed-off-by: Alistair Popple <alistair@popple.id.au> [arbab@linux.ibm.com: Rebase/refactor original changes] Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-07-17Move pb_cen_hp_mode_curr register definition to xscom-p9-reg.hAlistair Popple1-2/+0
Currently it is defined in npu2-regs.h but needs to be used by other files as well so move it somewhere generic. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-07-17npu2/hw-procedures: Enable parity and credit overflow checksReza Arbab1-0/+4
Enable these error checking features by setting the appropriate bits in our one-off initialization of each "NTL Misc Config 2" register. The exception is NDL RX parity checking, which should be disabled during the link training procedures. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-07-17npu2/hw-procedures: Don't open code NPU2_NTL_MISC_CFG2_BRICK_ENABLEReza Arbab1-0/+1
Name this bit properly. There's a lot more cleanup like this to be done, but I'm catching this one now as part of some related changes. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-03-27Revert "NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging"Stewart Smith1-6/+1
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17. We don't need this as we need to do it a different way, with a explicit set of registers as otherwise we trip other random FIR bits and everything becomes even more terrible. I suggest alcohol. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22npu2: Remove DD1 supportAndrew Donnellan1-2/+0
Major changes in the NPU between DD1 and DD2 necessitated a fair bit of revision-specific code. Now that all our lab machines are DD2, we no longer test anything on DD1 and it's time to get rid of it. Remove DD1-specific code and abort probe if we're running on a DD1 machine. Cc: Alistair Popple <alistair@popple.id.au> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-12npu2: Use unfiltered mode in XTS tablesReza Arbab1-0/+2
The XTS_PID context table is limited to 256 possible pids/contexts. To relieve this limitation, make use of "unfiltered mode" instead. If an entry in the XTS_BDF table has the bit for unfiltered mode set, we can just use one context for that entire bdf/lpar, regardless of pid. Instead of of searching the XTS_PID table, the NMMU checkout request will simply use the entry indexed by lparshort id instead. Change opal_npu_init_context() to create these lparshort-indexed wildcard entries (0-15) instead of allocating one for each pid. Check that multiple calls for the same bdf all specify the same msr value. In opal_npu_destroy_context(), continue validating the bdf argument, ensuring that it actually maps to an lpar, but no longer remove anything from the XTS_PID table. If/when we start supporting virtualized GPUs, we might consider actually removing these wildcard entries by keeping a refcount, but keep things simple for now. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-opencapi: Add OpenCAPI OPAL API callsFrederic Barrat1-0/+4
Add three OPAL API calls that are required by the ocxl driver. - OPAL_NPU_SPA_SETUP The Shared Process Area (SPA) is a table containing one entry (a "Process Element") per memory context which can be accessed by the OpenCAPI device. - OPAL_NPU_SPA_CLEAR_CACHE The NPU keeps a cache of recently accessed memory contexts. When a Process Element is removed from the SPA, the cache for the link must be cleared. - OPAL_NPU_TL_SET The Transaction Layer specification defines several templates for messages to be exchanged on the link. During link setup, the host and device must negotiate what templates are supported on both sides and at what rates those messages can be sent. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-opencapi: Train OpenCAPI links and setup devicesAndrew Donnellan1-1/+51
Scan the OpenCAPI links under the NPU, and for each link, reset the card, set up a device, train the link and register a PHB. Implement the necessary operations for the OpenCAPI PHB type. For bringup, test and debug purposes, we allow an NVRAM setting, "opencapi-link-training" that can be set to either disable link training completely or to use the prbs31 test pattern. To disable link training: nvram -p ibm,skiboot --update-config opencapi-link-training=none To use prbs31: nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31 Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-hw-procedures: Add support for OpenCAPI PHY link trainingAndrew Donnellan1-0/+6
Unlike NVLink, which uses the pci-virt framework to fake a PCI configuration space for NVLink devices, the OpenCAPI device model presents us with a real configuration space handled by the device over the OpenCAPI link. As a result, we have to train the OpenCAPI link in skiboot before we do PCI probing, so that config space can be accessed, rather than having link training being triggered by the Linux driver. Add some helper functions to wrap the existing NVLink PHY training sequence so we can easily run it within skiboot. Additionally, we add OpenCAPI-specific lane settings, and a function to "bump" lanes that haven't trained properly (this process isn't documented in the workbook, but the hardware experts assure us that this improves link training reliability...) We also support the PRBS31 pattern that's used for bringup and test purposes. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-opencapi: Configure NPU for OpenCAPIAndrew Donnellan1-0/+89
Scan the device tree for NPUs with OpenCAPI links and configure the NPU per the initialisation sequence in the NPU OpenCAPI workbook. Training of individual links and setup of per-AFU/link configuration will be in a later patch. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2: Split out common helper functions into separate fileAndrew Donnellan1-0/+5
Split out common helper functions for NPU register access into a separate file, as these will be used extensively by both NVLink and OpenCAPI code. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-28NPU2 HMIs: dump out a *LOT* of npu2 registers for debuggingStewart Smith1-1/+6
This is not the way we want to end up doing this. This is a hack to make folk happy and not require crondump to debug nvidia/npu2 issues. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21npu2/opal-api: move npu2 checkstop defines to npu2-regs.hStewart Smith1-0/+98
These aren't API. Fixes: b57a5380aa489fa877b2d619225aea2602f20dca Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08hw/npu2: Implement logging HMI actionsBalbir Singh1-0/+11
Log HMI errors as step 1. OS will need to deduce and interpret the HMI event. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-14npu2.c: Add PE error detectionAlistair Popple1-16/+1
Invalid accesses from the GPU can cause a specific PE to be frozen by the NPU. Add an interrupt handler which reports the frozen PE to the operating system via as an EEH event. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15Revert "npu2: hw-procedures: Enable low power mode"Reza Arbab1-6/+0
As it turns out, low power mode is not yet ready for prime time. We shouldn't write the low power config register until it is. This reverts commit a05054c53a37850a2118d01fcf6669ebb10d1a33. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15npu2: Move to new GPU memory mapMichael Neuling1-0/+5
There are three different ways we configure the MCD and memory map. 1) Old way (current way) Skiboot configures the MCD and puts GPUs at 4TB and below 2) New way with MCD Hostboot configures the MCD and skiboot puts GPU at 4TB and above 3) New way without MCD No one configures the MCD and skiboot puts GPU at 4TB and below The patch keeps option 1 and adds options 2 and 3. The different configurations are detected using certain scoms (see patch). Option 1 will go away eventually as it's a configuration that can cause xstops or data integrity problems. We are keeping it around to support existing hostboot. Option 2 supports only 4 GPUs and 512GB of memory per socket. Option 3 supports 6 GPUs and 4TB of memory but may have some performance impact. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15npu2: Refactor BAR setting codeMichael Neuling1-1/+2
This refactors the BAR setting code to make it clearer and handle a larger range of BAR addresses. This is needed as we are about to move the GPU to a physical address that is currently not supported by this code. This change derives group and chip sections of the BAR from the base address rather than the chip_id now. mem sel is also derived from the base address, rather than assuming 0. No functional change. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-15npu2: Create npu2_write_mcd()Michael Neuling1-0/+3
This code is replicated, so let's put it in a function. Also add some cleanups. No functional change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-13npu2: hw-procedures: Add phy_rx_clock_sel()Reza Arbab1-0/+1
Change the RX clk mux control to be done by software instead of HW. This avoids glitches caused by changing the mux setting. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-11-09npu2: hw-procedures: Enable low power modeReza Arbab1-0/+6
Add a procedure which sets the NTL low power config register. To actually enter low power mode, a corresponding change must be present in the GPU device driver. The link will not enter low power mode unless both sides agree, which means this change is safe to make independently. It should have no forward or backward dependencies on other components. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12npu2: Enable recoverable data link (no-stall) interruptsSam Bobroff1-0/+10
Allow the NPU2 to trigger "recoverable data link" interrupts. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Set the XTS config2 registerReza Arbab1-0/+1
POWER9 DD2 has added a new bit we'd like to set: "XTS_CONFIG2_NO_FLUSH_ENA: if enabled, allows MMIO ATSDs to suppress the flush" This has passed sanity tests with 4.12 kernels, which are capable of exercising this capability. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Adjust content of the NTL BARReza Arbab1-2/+4
Reflect the changed NTL BAR layout in POWER9 DD2. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Adjust content of the GENID BARReza Arbab1-2/+3
Reflect the changed GENID BAR layout in POWER9 DD2. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Add NPU2_GPU1_MEM_BARReza Arbab1-0/+1
POWER9 DD2 has added a second GPU memory BAR. Use it, but continue to program things the old way on DD1 systems. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Alistair Popple <alistair@popple.id.au> Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-08-04npu2: Fix indirect SCOM addressesReza Arbab1-2/+4
Change these values for POWER9 DD2, but keep backwards compatibility. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>