aboutsummaryrefslogtreecommitdiff
path: root/hw/npu2-opencapi.c
AgeCommit message (Collapse)AuthorFilesLines
2021-10-19pau: Add support for OpenCAPI Persistent Memory devices.Christophe Lombard1-15/+3
Lowest Point of Coherency (LPC) memory allows the host to access memory on an OpenCAPI device. When the P10 chip accesses memory addresses on the AFU, the Real Address on the PowerBus must hit a BAR in the PAU such as GPU-Memory BAR. The BAR defines the range of Real Addresses that represent AFU memory. The two existing OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE are used to manage the AFU momory. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-10-19npu2: move opal apiChristophe Lombard1-35/+5
Move the OPAL entry points for npu2 opencapi to the common opal NPU file. This prepares us to add same entries for PAU opencapi in this common file. No functional change. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-11io: endian annotations and fixNicholas Piggin1-6/+6
Annotate io accessor pointer types with endian. sparse caught a bug in memcpy_from_ci, which is fixed. From: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-12Re-license IBM written files as Apache 2.0 OR GPLv2+Stewart Smith1-1/+1
SPDX makes it a simpler diff. I have audited the commit history of each file to ensure that they are exclusively authored by IBM and thus we have the right to relicense. The motivation behind this is twofold: 1) We want to enable experiments with coreboot, which is GPLv2 licensed 2) An upcoming firmware component wants to incorporate code from skiboot and code from the Linux kernel, which is GPLv2 licensed. I have gone through the IBM internal way of gaining approval for this. The following files are not exclusively authored by IBM, so are *not* included in this update (I will be seeking approval from contributors): core/direct-controls.c core/flash.c core/pcie-slot.c external/common/arch_flash_unknown.c external/common/rules.mk external/gard/Makefile external/gard/rules.mk external/opal-prd/Makefile external/pflash/Makefile external/xscom-utils/Makefile hdata/vpd.c hw/dts.c hw/ipmi/ipmi-watchdog.c hw/phb4.c include/cpu.h include/phb4.h include/platform.h libflash/libffs.c libstb/mbedtls/sha512.c libstb/mbedtls/sha512.h platforms/astbmc/barreleye.c platforms/astbmc/garrison.c platforms/astbmc/mihawk.c platforms/astbmc/nicole.c platforms/astbmc/p8dnu.c platforms/astbmc/p8dtu.c platforms/astbmc/p9dsu.c platforms/astbmc/vesnin.c platforms/rhesus/ec/config.h platforms/rhesus/ec/gpio.h platforms/rhesus/gpio.c platforms/rhesus/rhesus.c platforms/astbmc/talos.c platforms/astbmc/romulus.c Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: fixed up the drift] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-02-12npu2-opencapi: Allow platforms to identify physical slotsFrederic Barrat1-3/+13
This patch lets each platform define the name of the opencapi slots. It makes it easier to identify which physical card is generating errors or messages in the linux or skiboot log files. The patch provides slot names for mihawk and witherspoon. If the platform doesn't define any, then we default to 'OPENCAPI-xxxx' There are various ways to find out about the slot names: skiboot log lspci command (if the PCI hotplug driver pnv-php is loaded) lshw checking the device tree and probably others.... Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-02-12npu2-opencapi: Don't drive reset signal permanentlyFrederic Barrat1-6/+40
A problem was found with the way we manage the I2C signal to reset adapters. Skiboot currently always drives the value of the opencapi reset signal. We set the I2C pin for reset in output mode and keep it in output mode permanently. And since the reset signal is inverted, it is explicitly set to high by the I2C controller pretty much all the time. When the opencapi card is powered off, for example on a reboot, actively driving the I2C reset pin to high keeps applying a voltage to part of the FPGA, which can leak current, send the FPGA in a bad state since it's unexpected or even damage the card. To prevent damaging adapters, the recommendation from the hardware team is to switch back the pin to input mode at the end of a reset cycle. There are pull-up resistors on the planar of all the platforms to make sure the reset signal is high "naturally". When the slot is powered off, the reset pin won't be kept high by the i2c controller any more. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-01-29npu2-opencapi: don't fence on masked XSL errorsFrederic Barrat1-2/+9
An upcoming change in the initfile is going to modify the default action and fence behavior of some of the NPU FIR2 bits. We're already overriding the settings of most of those. The one exception is for bits 41 and 42, which are XSL errors impacting 2 links that we mask (instead we rely on the subsequent OTL error, which is per link). The new initfile will fence-on-error for bits 41 and 42. And even if the FIRs are masked, the NPU logic could fence the links, which is not what we want. So this patch makes sure we don't fence on the FIRs we want to ignore. It has no effect on existing firmware. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-01-29hw/npu2-opencapi: Support multiple LPC devicesAndrew Donnellan1-13/+36
Currently, we only have a single range for LPC memory per chip, and we only allow a single device to use that range. With upcoming Hostboot/SBE changes, we'll use the chip address extension mask to give us multiple ranges by using the masked bits of the group ID. Each device can now allocate a whole 4TB non-mirrored region. We still don't do >4TB ranges. If the extension mask is not set correctly, we'll fall back to only permitting one device and printing an error suggesting a firmware upgrade. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-01-29npu2: Rework phb-index assignments for virtual PHBsFrederic Barrat1-0/+2
Until now, opencapi PHBs were not using the 'ibm,phb-index' property, as it was thought unnecessary. For nvlink, a phb-index was associated to the npu when parsing hdat data, and the nvlink PHB was reusing the same value. It turns out it helps to have the 'ibm,phb-index' property for opencapi PHBs after all. Otherwise it can lead to wrong results on platforms like mihawk when trying to match entries in the slot table. We end up with an opencapi device inheriting wrong properties in the device tree, because match_slot_phb_entry() default to phb-index 0 if it cannot find the property. Though it doesn't seem to cause any harm, it's wrong and a future patch is expected to start using the slot table for opencapi, so it needs fixing. The twist is that with opencapi, we can have multiple virtual PHBs for a single NPU on P9. There's one PHB per (opencapi) brick. Therefore there's no 1-to-1 mapping between the NPU and PHB index and it no longer makes sense to associate a phb-index to a npu. With this patch, opencapi PHBs created under a NPU use a fixed mapping for their phb-index, based on the brick index. The range of possible values is 7 to 12. Because there can only be one nvlink PHB per NPU, it is always using a phb-index of 7. A side effect is that 2 virtual PHBs on 2 different chips can have the same phb-index, which is similar to what happens for 'real' PCI PHBs, but is different from what was happening on a nvlink-only witherspoon so far. Reviewed-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-12-16opal-api: add endian conversions to most opal callsNicholas Piggin1-3/+9
This adds missing endian conversions to most calls, sufficient at least to handle calls from a kernel booting on mambo. Subsystems requiring more extensive changes (e.g., xive) will be done with individual changes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-11Remove dead POWER7 codeNicholas Piggin1-1/+0
There are a number of proc_gen branches removed that are trivially dead code and comments that refer to P7. As well as those: - Oliver points out that add_xics_icps() must be unused on POWER8 because it asserts if number of threads > 4, so remove it. - Change 16b7ae641 ("Remove POWER7 and POWER7+ support") removed all references to opal_boot_trampoline, so remove that. - It also removed the only non-trival choose_bus implementation, so that is removed and its caller simplified. - Remove the paca code, later CPUs use pcia. Cc: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-05npu2-opencapi: Fix integer promotion bug in LPC allocationAndrew Donnellan1-1/+1
If you try to allocate an amount of LPC memory that's not a power of 2, we round the value up to the nearest power of 2. By the magic of C, "1 << n" gets treated as an int, even if you're assigning it to a uint64_t. Change 1 to 1ULL to fix this. (C, it's great.) Reported-by: Alastair D'Silva <alistair@d-silva.org> Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Log a warning when resetting a broken deviceFrederic Barrat1-0/+4
On P9, the NPU doesn't support recovery if the link goes down unexpectedly. It was not fully verified. We mark the device as broken when we receive an error interrupt from the NPU. However, there's nothing to prevent the OS from trying to reset the device; It may or may not work, it's unsupported territory, so let's log a message to make it clear, as it could help when debugging. We haven't hit any cases where the reset goes badly enough that we'd want to prevent it, so let it go for now. We can revisit later if we have evidence that it's causing more problems than it is worth. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Handle OPAL_UNMAP_PE operation on set_pe() callbackFrederic Barrat1-1/+6
In a hot-unplug scenario, the OS will try to unmap the PE. Skiboot doesn't do anything with the linux PE for opencapi other than being a mailbox, but at least let's be consistent. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Activate PCI hotplug on opencapi slotFrederic Barrat1-4/+65
Implement the get_power_state() and set_power_state() callbacks for the opencapi slot and add properties in the device tree to mark the opencapi slot as hot-pluggable. We don't really power off/on the opencapi adapter. The slot at play here is the virtual slot associated to the virtual opencapi PHB. The real PCIe slot where the card is drawing its power from is untouched (skiboot is not even aware which PCIe slot the card is seated on). So the 'fake' power off is fencing the card and set it in reset so that the FPGA image can be updated. The 'fake' power on is not doing much, as the unfencing happens on the subsequent link training. Opencapi slots are named 'OPENCAPI-xxxx' where xxxx is the opal ID of the PHB/slot. This is meant to easily identify the slot used by an AFU device, as the AFU device names are also built around that ID. For example, the device /dev/ocxl/AFP3.0006:00:00.1.0 uses the slot OPENCAPI-0006. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Improve error reporting to the OSFrederic Barrat1-4/+17
When resetting an opencapi link, the brick will be fenced temporarily. Therefore we can't rely on the fencing state of the brick any more to check for the health of an opencapi PHB, as we could report errors if queried for a PHB state at the same time a link is being reset. Instead, we flag the device as 'broken' when an error interrupt is received, just before raising an event to the OS. When the OS is querying for the state of a PHB, we only have to look at the 'broken' attribute. Note that there's no recovery possible on P9 when an error interrupt is received unexpectedly, as recovery is not supported by hardware. So when a device/link is marked as 'broken', it stays broken. All the OS can do is log the error and notify the drivers. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Detect PHY reset errorsFrederic Barrat1-1/+6
PHY reset can fail! Though past problems are now fixed, let's handle any future failure. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Simplify freset statesFrederic Barrat1-13/+3
Let's get rid of one transitional state, since there's no need to pause in between releasing the reset signals of the ODL and the adapter. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Tweak fundamental reset sequenceFrederic Barrat1-22/+26
Modify slightly the ordering of a few steps in our init sequence on fundamental reset, so that it can be called from the OS, when the link is already up: - when the card is reset, the link goes down, so we need to fence the brick to prevent errors propagating to the NPU and OS - since fencing and unfencing don't require any delay, let's also fence/unfence during the very first reset at boot. It's useless but doesn't hurt and keep the code simpler. - resetting the PHY must be done a bit later, while fenced and the ODL and DLx in reset Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Rework link training timeoutFrederic Barrat1-4/+5
Opencapi link state should be polled for up to 3 seconds. Current code assumes a tight retry loop during fundamental reset at boot, which is not going to be true on link retraining. So update the timeout detection code to use a timebase instead of a simple retry count which could be way too long. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Make sure the PCI slot has the proper IDFrederic Barrat1-1/+2
The PCI slot created for the opencapi PHB didn't have its ID properly defined because it was created before we assign an ID to the PHB. Simply switch the PCI slot creation and PHB registration calls to fix it. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-07-26SPDX-ify all skiboot codeStewart Smith1-16/+3
Use Software Package Data Exchange (SPDX) to indicate license for each file that is unique to skiboot. At the same time, ensure the (C) who and years are correct. See https://spdx.org/ Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: Added a few missing files] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-06-04npu2-opencapi: Mask 2 XSL errorsFrederic Barrat1-9/+20
Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some opencapi FIRs") converted some FIR bits default action from system checkstop to raising an error interrupt. For 2 XSL error events that can be triggered by a misbehaving AFU, the error interrupt is raised twice, once for each link (the XSL logic in the NPU is shared between 2 links). So a badly behaving AFU could impact another, unsuspecting opencapi adapter. It doesn't look good and it turns out we can do better. We can mask those 2 XSL errors. The error will also be picked up by the OTL logic, which is per link. So we'll still get an error interrupt, but only on the relevant link, and the other opencapi adapter can stay functional. Fixes: f8dfd699f584 ("hw/npu2: Setup an error interrupt on some opencapi FIRs") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03Remove remnants of OPAL_PCI_GET_PHB_DIAG_DATAStewart Smith1-1/+0
Never present in a public OPAL release, and only kernels prior to 3.11 would ever attempt to call it. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-20hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memoryAndrew Donnellan1-0/+181
Lowest Point of Coherency (LPC) memory allows the host to access memory on an OpenCAPI device. Define 2 OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE, for assigning and clearing the memory BAR. (We try to avoid using the term "LPC" to avoid confusion with Low Pin Count.) At present, we use a fixed location in the address space, which means we are restricted to a single range of 4TB, on a single OpenCAPI device per chip. In future, we'll use some chip ID extension magic to give us more space, and some sort of allocator to assign ranges to more than one device. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-15nvram: Flag dangerous NVRAM optionsMichael Neuling1-1/+1
Most nvram options used by skiboot are just for debug or testing for regressions. They should never be used long term. We've hit a number of issues in testing and the field where nvram options have been set "temporarily" but haven't been properly cleared after, resulting in crashes or real bugs being masked. This patch marks most nvram options used by skiboot as dangerous and prints a chicken to remind users of the problem. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-09hw/npu2: Report errors to the OS if an OpenCAPI brick is fencedFrederic Barrat1-4/+51
Now that the NPU may report interrupts due to the link going down unexpectedly, report those errors to the OS when queried by the 'next_error' PHB callback. The hardware doesn't support recovery of the link when it goes down unexpectedly. So we report the PHB as dead, so that the OS can log the proper message, notify the drivers and take the devices down. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-09hw/npu2: Setup an error interrupt on some opencapi FIRsFrederic Barrat1-7/+32
Many errors reported in the NPU FIR2 register, mostly catching unexpected errors on the opencapi link are defined as 'brick fatal' in the workbook, yet the default action is set to system checkstop. It's possible to see those errors during AFU development, where the AFU may send unexpected packets on the link, therefore triggering those errors. Checkstopping the system in this case is clearly extreme, as the error could be contained to the brick and proper analysis of a checkstop is not trivial outside of a bringup environment. This patch changes the default action of those errors so that the NPU will raise an interrupt instead. Follow-up patches will log proper information so that the error can be debugged and linux can catch the event. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-09hw/npu2: Use NVLink irq setup for OpenCAPIFrederic Barrat1-46/+4
Start using the irq setup code from NVLink for OpenCAPI, since the 2 versions are so close. There are only 2 differences: - the NPU may trigger more interrupts for OpenCAPI, 35 vs. 23, though none are configured to be triggered for now. - we need to enable the 4 translation faults interrupts for OpenCAPI. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-09hw/npu2: Fix OpenCAPI PE assignmentAndrew Donnellan1-40/+34
When we support mixing NVLink and OpenCAPI devices on the same NPU, we're going to have to share the same range of 16 PE numbers between NVLink and OpenCAPI PHBs. For OpenCAPI devices, PE assignment is only significant for determining which System Interrupt Log register is used for a particular brick - unlike NVLink, it doesn't play any role in determining how links are fenced. Split the PE range into a lower half which is used for NVLink, and an upper half that is used for OpenCAPI, with a fixed PE number assigned per brick. As the PE assignment for OpenCAPI devices is fixed, set the PE once during device init and then ignore calls to the set_pe() operation. Suggested-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-20npu2/hw-procedures: Fix parallel zcal for opencapiFrederic Barrat1-2/+3
For opencapi, we currently do impedance calibration when initializing the PHY for the device, which could run in parallel if we were rich and had multiple opencapi devices. But if 2 devices are on the same obus, the 2 calibration sequences could overlap, which likely yields bad results and is useless anyway since it only needs to be done once per obus. This patch splits the opencapi PHY reset in 2 parts: - a 'init' part called serially at boot. That's when zcal is done. If we have 2 devices on the same socket, the zcal won't be redone, since we're called serially and we'll see it has already be done for the obus - a 'reset' part called during fundamental reset as a prereq for link training. It does the PHY setup for a set of lanes and the dccal. The PHY team confirmed there's no dependency between zcal and the other reset steps and it can be moved earlier. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Fix adapter reset when using 2 adaptersFrederic Barrat1-7/+27
If two opencapi adapters are on the same obus, we may try to train the two links in parallel at boot time, when all the PCI links are being trained. Both links use the same i2c controller to handle the reset signal, so some care is needed to make sure resetting one doesn't interfere with the reset of the other. We need to keep track of the current state of the i2c controller (and use locking). This went mostly unnoticed as you need to have 2 opencapi cards on the same socket and links tended to train anyway because of the retries. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Extend delay after releasing reset on adapterFrederic Barrat1-2/+2
Give more time to the FPGA to process the reset signal. The previous delay, 5ms, is too short for newer adapters with bigger FPGAs. Extend it to 250ms. Ultimately, that delay will likely end up being added to the opencapi specification, but we are not there yet. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: ODL should be in reset when enabledFrederic Barrat1-0/+6
We haven't hit any problem so far, but from the ODL designer, the ODL should be in reset when it is enabled. The ODL remains in reset until we start a fundamental reset to initiate link training. We still assert and deassert the ODL reset signal as part of the normal procedure just before training the link. Asserting is therefore useless at boot, since the ODL is already in reset, but we keep it as it's only a scom write and it's needed when we reset/retrain from the OS. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Keep ODL and adapter in reset at the same timeFrederic Barrat1-25/+43
Split the function to assert and deassert the reset signal on the ODL, so that we can keep the ODL in reset while we reset the adapter, therefore having a window where both sides are in reset. It is actually not required with our current DLx at boot time, but I need to split the ODL reset function for the following patch and it will become useful/required later when we introduce resetting an opencapi link from the OS. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Rename functions used to reset an adapterFrederic Barrat1-4/+4
This is really to avoid confusion with a later patch and clarify whether we're resetting the ODL or the adapter. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Setup perf counters to detect CRC errorsFrederic Barrat1-0/+62
It's possible to set up performance counters for the PLL to detect various conditions for the links in nvlink or opencapi mode. Since those counters are currently unused, let's configure them when an obus is in opencapi mode to detect CRC errors on the link. Each link has two counters: - CRC error detected by the host - CRC error detected by the DLx (NAK received by the host) We also dump the counters shortly after the link trains, but they can be read multiple times through cronus, pdbg or linux. The counters are configured to be reset after each read. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-13npu2-opencapi: Rework ODL register accessFrederic Barrat1-100/+9
ODL registers used to control the opencapi link state have an address built on a base address and an offset for each brick which can be computed instead of hard-coded individually for each brick. Rework how we access the ODL registers, to avoid repeating switch statements all over the place. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-01-18sparse: Make tree 'constant is so big' warning cleanStewart Smith1-2/+2
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-01-16npu2: Remove unused npu2::bdf2pe_cacheReza Arbab1-1/+0
This cache is written but never read. Wiring it up would gain us little (except added complexity), and it obviously hasn't been missed thus far, so remove it altogether. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-28npu2-opencapi: Log ODL endpoint information registerFrederic Barrat1-1/+28
If the link trains in degraded mode, log the ODL endpoint information register for debug. Its content is specific to the DLx and TLx implementation, so this is really information useful for the hardware team. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-28npu2-opencapi: Detect if link trained in degraded modeFrederic Barrat1-19/+31
There's no status readily available to tell the effective link width. Instead, we have to look at the individual status of each lane, on the transmit and receive direction. All relevant information is in the ODL status register. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-28npu2-opencapi: Log extra information on link training failureFrederic Barrat1-3/+34
Log the link training status register in case of failure to train. It can have useful information for the hardware team. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17hw/npu2, platform: Restructure OpenCAPI i2c reset/presence pinsAndrew Donnellan1-3/+7
In platform_ocapi, we define i2c_{reset,presence}_odl{0,1} to specify the appropriate reset/presence GPIO pins for devices connected to ODL0 and ODL1 respectively. This is obviously wrong, because a device connected to brick 2 and a device connected to brick 4 are going to be different devices connected to different I2C pins, but rather conveniently we haven't had to deal with systems that can use the full 4 bricks as yet. Now that we're adding OpenCAPI support for Witherspoon, we should change this to specify pins separately for all 4 bricks. Replace i2c_{reset,presence}_odl{0,1} with i2c_{reset,presence}_brick{2,3,4,5} and update the presence detection code, device reset code, and existing platforms accordingly. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17hw/npu2, platform: Add NPU2 platform device detection callbackAndrew Donnellan1-79/+14
There is no standardised way to determine the presence and type of devices connected to an NPU on POWER9. Currently, we hardcode device types based on platform type (as no platform currently supports both OpenCAPI and NVLink), and for OpenCAPI platforms we use I2C to detect presence. Witherspoon (and potentially other platforms later on) supports both NVLink and OpenCAPI, and additionally uses SXM2 connectors which can carry more than one link, rather than the SlimSAS connectors used for OpenCAPI on Zaius and ZZ. This necessitates some special handling. Add a platform callback for NPU device detection. In a later patch, we will use this to implement Witherspoon-specific device detection. For now, add a Witherspoon stub that sets all links to NVLink (i.e. current behaviour). Move the existing I2C-based presence detection for OpenCAPI devices on Zaius/ZZ into common code, which we use by default for platforms which do not define a callback. Clean up the use of the ibm,npu-link-type property, which will now be exposed solely for debugging and not consumed internally. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17hw/npu2: Common NPU2 init routine between NVLink and OpenCAPIAndrew Donnellan1-105/+53
Replace probe_npu2() and probe_npu2_opencapi() with a new shared probe_npu2(). Refactor some of the common NPU setup code into shared code. No functional change. This patch does not implement support for using both types of devices simultaneously on the same NPU - we expect to add this sometime in the future. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17npu2: Split device index into brick and link indexAndrew Donnellan1-37/+40
On Witherspoon, OpenCAPI devices attached to link indexes 0 and 1 are handled by bricks 2 and 3. Rename index to brick_index, and add a new field, link_index, to refer to the link index. For now, we set those values identically. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17hw/npu2-opencapi: Fix setting of supported OpenCAPI templatesAndrew Donnellan1-2/+2
In opal_npu_tl_set(), we made a typo that means the OPAL_NPU_TL_SET call may not clear the enable bits for templates that were previously enabled but are now disabled. Fix the typo so we clear NPU2_OTL_CONFIG1_TX_TEMP2_EN as well as TEMP{1,3}_EN. Reported-by: Tyler Seredynski <tseredynski@gmail.com> Fixes: cd8b82a8e83ed ("npu2-opencapi: Add OpenCAPI OPAL API calls") Cc: stable Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-07-26npu2-opencapi: Don't send commands to NPU when link is downFrederic Barrat1-1/+2
Even if an opencapi link is down, we currently always try to issue a config read operation when probing for PCI devices, because of the default scan map used for an opencapi PHB. The config operation fails, as expected, but it can also raise a FIR bit and trigger an HMI. For opencapi, there's no root device like for a "normal" PCI PHB, so there's no reason to do the config operation. To fix it, we keep the scan map blank by default, and only add a device once the link is trained. CC: stable # v6.1+ Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-07-03npu2: Use same compatible string for NVLink and OpenCAPI link nodes in ↵Andrew Donnellan1-5/+8
device tree Currently, we distinguish between NPU links for NVLink devices and OpenCAPI devices through the use of two different compatible strings - ibm,npu-link and ibm,npu-link-opencapi. As we move towards supporting configurations with both NVLink and OpenCAPI devices behind a single NPU, we need to detect the device type as part of presence detection, which can't happen until well after the point where the HDAT or platform code has created the NPU device tree nodes. Changing a node's compatible string after it's been created is a bit ugly, so instead we should move the device type to a new property which we can add to the node later on. Get rid of the ibm,npu-link-opencapi compatible string, add a new ibm,npu-link-type property, and a helper function to check the link type. Add an "unknown" device type in preparation for later patches to detect device type dynamically. These device tree bindings are entirely internal to skiboot and are not consumed directly by Linux, so this shouldn't break anything (other than internal BML lab environments). Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>