|
Lowest Point of Coherency (LPC) memory allows the host to access memory on
an OpenCAPI device.
When the P10 chip accesses memory addresses on the AFU, the Real Address
on the PowerBus must hit a BAR in the PAU such as GPU-Memory BAR. The BAR
defines the range of Real Addresses that represent AFU memory.
The two existing OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE,
are used to manage the AFU memory.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Move the OPAL entry points for npu2 opencapi to the common opal NPU
file. This prepares us to add the same entries for PAU opencapi in this
common file.
No functional change.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Annotate io accessor pointer types with endian.
sparse caught a bug in memcpy_from_ci, which is fixed.
From: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
SPDX makes it a simpler diff.
I have audited the commit history of each file to ensure that they are
exclusively authored by IBM and thus we have the right to relicense.
The motivation behind this is twofold:
1) We want to enable experiments with coreboot, which is GPLv2 licensed
2) An upcoming firmware component wants to incorporate code from skiboot
and code from the Linux kernel, which is GPLv2 licensed.
I have gone through the IBM internal way of gaining approval for this.
The following files are not exclusively authored by IBM, so are *not*
included in this update (I will be seeking approval from contributors):
core/direct-controls.c
core/flash.c
core/pcie-slot.c
external/common/arch_flash_unknown.c
external/common/rules.mk
external/gard/Makefile
external/gard/rules.mk
external/opal-prd/Makefile
external/pflash/Makefile
external/xscom-utils/Makefile
hdata/vpd.c
hw/dts.c
hw/ipmi/ipmi-watchdog.c
hw/phb4.c
include/cpu.h
include/phb4.h
include/platform.h
libflash/libffs.c
libstb/mbedtls/sha512.c
libstb/mbedtls/sha512.h
platforms/astbmc/barreleye.c
platforms/astbmc/garrison.c
platforms/astbmc/mihawk.c
platforms/astbmc/nicole.c
platforms/astbmc/p8dnu.c
platforms/astbmc/p8dtu.c
platforms/astbmc/p9dsu.c
platforms/astbmc/vesnin.c
platforms/rhesus/ec/config.h
platforms/rhesus/ec/gpio.h
platforms/rhesus/gpio.c
platforms/rhesus/rhesus.c
platforms/astbmc/talos.c
platforms/astbmc/romulus.c
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: fixed up the drift]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This patch lets each platform define the name of the opencapi
slots. It makes it easier to identify which physical card is
generating errors or messages in the linux or skiboot log files.
The patch provides slot names for mihawk and witherspoon. If the
platform doesn't define any, then we default to 'OPENCAPI-xxxx'
There are various ways to find out about the slot names:
skiboot log
lspci command (if the PCI hotplug driver pnv-php is loaded)
lshw
checking the device tree
and probably others....
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
A problem was found with the way we manage the I2C signal to reset
adapters. Skiboot currently always drives the value of the opencapi
reset signal. We set the I2C pin for reset in output mode and keep it
in output mode permanently. And since the reset signal is inverted, it
is explicitly set to high by the I2C controller pretty much all the
time.
When the opencapi card is powered off, for example on a reboot,
actively driving the I2C reset pin to high keeps applying a voltage to
part of the FPGA, which can leak current, put the FPGA in a bad state
since it's unexpected, or even damage the card. To prevent damaging
adapters, the recommendation from the hardware team is to switch back
the pin to input mode at the end of a reset cycle. There are pull-up
resistors on the planar of all the platforms to make sure the reset
signal is high "naturally". When the slot is powered off, the reset
pin won't be kept high by the i2c controller any more.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
An upcoming change in the initfile is going to modify the default
action and fence behavior of some of the NPU FIR2 bits. We're already
overriding the settings of most of those. The one exception is for
bits 41 and 42, which are XSL errors impacting 2 links that we
mask (instead we rely on the subsequent OTL error, which is per link).
The new initfile will fence-on-error for bits 41 and 42. And even if
the FIRs are masked, the NPU logic could fence the links, which is not
what we want. So this patch makes sure we don't fence on the FIRs we
want to ignore. It has no effect on existing firmware.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Currently, we only have a single range for LPC memory per chip, and we only
allow a single device to use that range.
With upcoming Hostboot/SBE changes, we'll use the chip address extension
mask to give us multiple ranges by using the masked bits of the group ID.
Each device can now allocate a whole 4TB non-mirrored region. We still
don't do >4TB ranges.
If the extension mask is not set correctly, we'll fall back to only
permitting one device and printing an error suggesting a firmware upgrade.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Until now, opencapi PHBs were not using the 'ibm,phb-index' property,
as it was thought unnecessary. For nvlink, a phb-index was associated
to the npu when parsing hdat data, and the nvlink PHB was reusing the
same value.
It turns out it helps to have the 'ibm,phb-index' property for
opencapi PHBs after all. Otherwise it can lead to wrong results on
platforms like mihawk when trying to match entries in the slot
table. We end up with an opencapi device inheriting wrong properties
in the device tree, because match_slot_phb_entry() defaults to
phb-index 0 if it cannot find the property. Though it doesn't seem to
cause any harm, it's wrong and a future patch is expected to start
using the slot table for opencapi, so it needs fixing.
The twist is that with opencapi, we can have multiple virtual PHBs for
a single NPU on P9. There's one PHB per (opencapi) brick. Therefore
there's no 1-to-1 mapping between the NPU and PHB index and it no
longer makes sense to associate a phb-index to a npu.
With this patch, opencapi PHBs created under a NPU use a fixed mapping
for their phb-index, based on the brick index. The range of possible
values is 7 to 12. Because there can only be one nvlink PHB per NPU,
it is always using a phb-index of 7.
A side effect is that 2 virtual PHBs on 2 different chips can have the
same phb-index, which is similar to what happens for 'real' PCI PHBs,
but is different from what was happening on a nvlink-only witherspoon
so far.
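A minimal sketch of the fixed mapping described above, assuming the phb-index is a base of 7 plus the brick index and that brick indexes run from 0 to 5. That reproduces the stated 7 to 12 range, but the exact formula is not spelled out in this message. The names OCAPI_PHB_INDEX_BASE, OCAPI_MAX_BRICKS and ocapi_phb_index() are hypothetical.

```c
#include <assert.h>

/* Assumed values, chosen to reproduce the 7..12 range from the text. */
#define OCAPI_PHB_INDEX_BASE	7
#define OCAPI_MAX_BRICKS	6

/* Hypothetical helper: derive the phb-index from the brick index. */
static unsigned int ocapi_phb_index(unsigned int brick_index)
{
	assert(brick_index < OCAPI_MAX_BRICKS);
	return OCAPI_PHB_INDEX_BASE + brick_index;
}
```

Because the mapping depends only on the brick, two PHBs on different chips can end up with the same phb-index, as noted above.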
Reviewed-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This adds missing endian conversions to most calls, sufficient at least
to handle calls from a kernel booting on mambo.
Subsystems requiring more extensive changes (e.g., xive) will be done
with individual changes.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
There are a number of proc_gen branches removed that are trivially
dead code and comments that refer to P7. As well as those:
- Oliver points out that add_xics_icps() must be unused on POWER8
because it asserts if number of threads > 4, so remove it.
- Change 16b7ae641 ("Remove POWER7 and POWER7+ support") removed all
references to opal_boot_trampoline, so remove that.
- It also removed the only non-trivial choose_bus implementation, so
that is removed and its caller simplified.
- Remove the paca code, later CPUs use pcia.
Cc: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
If you try to allocate an amount of LPC memory that's not a power of 2,
we round the value up to the nearest power of 2.
By the magic of C, "1 << n" gets treated as an int, even if you're
assigning it to a uint64_t.
Change 1 to 1ULL to fix this.
(C, it's great.)
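To illustrate the fix, here is a minimal sketch of rounding up to a power of 2 with a 64-bit shift. The helper name round_up_pow2() is hypothetical and is not the function touched by this change.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy form, shown only as a comment: the literal 1 is an int, so the
 * shift is evaluated in int arithmetic before the result is widened.
 * With a 32-bit int, n >= 31 is undefined behaviour.
 *
 *     uint64_t size = 1 << n;
 */

/* Hypothetical helper: round size up to the nearest power of 2. */
static uint64_t round_up_pow2(uint64_t size)
{
	unsigned int n = 0;

	assert(size != 0 && size <= (1ULL << 63));
	/* 1ULL makes the shift happen in 64-bit arithmetic. */
	while ((1ULL << n) < size)
		n++;
	return 1ULL << n;
}
```

With the ULL suffix, a request slightly above 2GB rounds up to 4GB, where the int-typed shift would be undefined.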
Reported-by: Alastair D'Silva <alistair@d-silva.org>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
On P9, the NPU doesn't support recovery if the link goes down
unexpectedly. It was not fully verified. We mark the device as broken
when we receive an error interrupt from the NPU. However, there's
nothing to prevent the OS from trying to reset the device. It may or
may not work, it's unsupported territory, so let's log a message to
make it clear, as it could help when debugging. We haven't hit any
cases where the reset goes badly enough that we'd want to prevent it,
so let it go for now. We can revisit later if we have evidence that
it's causing more problems than it is worth.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
In a hot-unplug scenario, the OS will try to unmap the PE. Skiboot
doesn't do anything with the linux PE for opencapi other than being a
mailbox, but at least let's be consistent.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Implement the get_power_state() and set_power_state() callbacks for
the opencapi slot and add properties in the device tree to mark the
opencapi slot as hot-pluggable.
We don't really power off/on the opencapi adapter. The slot at play
here is the virtual slot associated to the virtual opencapi PHB. The
real PCIe slot where the card is drawing its power from is
untouched (skiboot is not even aware which PCIe slot the card is
seated on). So the 'fake' power off fences the card and puts it in
reset so that the FPGA image can be updated. The 'fake' power on is
not doing much, as the unfencing happens on the subsequent link
training.
Opencapi slots are named 'OPENCAPI-xxxx' where xxxx is the opal ID of
the PHB/slot. This is meant to easily identify the slot used by an AFU
device, as the AFU device names are also built around that ID.
For example, the device /dev/ocxl/AFP3.0006:00:00.1.0 uses the slot
OPENCAPI-0006.
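A minimal sketch of how such a name could be built, assuming the ID is printed as four zero-padded hexadecimal digits (the example above is consistent with that, but the exact format string is an assumption). The helper ocapi_slot_name() is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the slot name from the opal ID of the
 * PHB/slot. Returns the snprintf() result, i.e. the length of the name.
 */
static int ocapi_slot_name(char *buf, size_t len, unsigned int opal_id)
{
	return snprintf(buf, len, "OPENCAPI-%04x", opal_id);
}
```

Called with an opal ID of 6, this fills the buffer with the slot name used in the example above.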
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
When resetting an opencapi link, the brick will be fenced
temporarily. Therefore we can't rely on the fencing state of the brick
any more to check for the health of an opencapi PHB, as we could
report errors if queried for a PHB state at the same time a link is
being reset.
Instead, we flag the device as 'broken' when an error interrupt is
received, just before raising an event to the OS. When the OS is
querying for the state of a PHB, we only have to look at the 'broken'
attribute.
Note that there's no recovery possible on P9 when an error interrupt
is received unexpectedly, as recovery is not supported by hardware. So
when a device/link is marked as 'broken', it stays broken. All the OS
can do is log the error and notify the drivers.
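The idea can be sketched as follows. All names here (struct ocapi_dev, ocapi_error_interrupt(), ocapi_phb_state()) are hypothetical and only illustrate the sticky 'broken' flag. This is not the actual skiboot code.

```c
#include <stdbool.h>

enum ocapi_phb_state {
	OCAPI_PHB_OK,
	OCAPI_PHB_BROKEN,
};

/* Hypothetical per-device state. */
struct ocapi_dev {
	bool broken;
};

/* Called from the error interrupt path, before notifying the OS. */
static void ocapi_error_interrupt(struct ocapi_dev *dev)
{
	/* Sticky: nothing ever clears it, as there is no recovery. */
	dev->broken = true;
}

/* State query: ignores brick fencing, which also happens on reset. */
static enum ocapi_phb_state ocapi_phb_state(const struct ocapi_dev *dev)
{
	return dev->broken ? OCAPI_PHB_BROKEN : OCAPI_PHB_OK;
}
```

Since the query no longer reads the fence state, a link reset in progress cannot be mistaken for an error.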
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
PHY reset can fail! Though past problems are now fixed, let's handle
any future failure.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Let's get rid of one transitional state, since there's no need to
pause in between releasing the reset signals of the ODL and the
adapter.
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Slightly modify the ordering of a few steps in our init sequence on
fundamental reset, so that it can be called from the OS, when the link
is already up:
- when the card is reset, the link goes down, so we need to fence the
brick to prevent errors propagating to the NPU and OS
- since fencing and unfencing don't require any delay, let's also
fence/unfence during the very first reset at boot. It's useless but
doesn't hurt and keeps the code simpler.
- resetting the PHY must be done a bit later, while fenced and the ODL
and DLx in reset
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Opencapi link state should be polled for up to 3 seconds. Current code
assumes a tight retry loop during fundamental reset at boot, which is
not going to be true on link retraining. So update the timeout
detection code to use a timebase instead of a simple retry count which
could be way too long.
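A minimal sketch of a deadline-based poll, as opposed to counting retries. The clock and link probe are passed in as hypothetical callbacks so the example stays self-contained. Real firmware would read the timebase register instead, and the fake clock below exists only for demonstration.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINK_TRAINING_TIMEOUT_MS	3000

/* Hypothetical callbacks: a millisecond clock and a link state probe. */
typedef uint64_t (*now_ms_fn)(void *ctx);
typedef bool (*link_up_fn)(void *ctx);

/* Poll until the link is up or 3 seconds of clock time have elapsed. */
static bool wait_for_link(now_ms_fn now_ms, link_up_fn link_up, void *ctx)
{
	uint64_t deadline = now_ms(ctx) + LINK_TRAINING_TIMEOUT_MS;

	do {
		if (link_up(ctx))
			return true;
	} while (now_ms(ctx) < deadline);

	return false;
}

/* Simulated clock for demonstration: each read advances time by 100ms. */
struct fake_link {
	uint64_t now_ms;
	uint64_t up_at_ms;
};

static uint64_t fake_now_ms(void *ctx)
{
	struct fake_link *f = ctx;

	f->now_ms += 100;
	return f->now_ms;
}

static bool fake_link_up(void *ctx)
{
	struct fake_link *f = ctx;

	return f->now_ms >= f->up_at_ms;
}
```

The timeout now depends on elapsed time only, so it behaves the same whether the caller polls in a tight loop or with long gaps between polls.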
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The PCI slot created for the opencapi PHB didn't have its ID properly
defined because it was created before we assign an ID to the
PHB. Simply switch the PCI slot creation and PHB registration calls to
fix it.
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Use Software Package Data Exchange (SPDX) to indicate license for each
file that is unique to skiboot.
At the same time, ensure the (C) who and years are correct.
See https://spdx.org/
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
[oliver: Added a few missing files]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some
opencapi FIRs") converted some FIR bits default action from system
checkstop to raising an error interrupt. For 2 XSL error events that
can be triggered by a misbehaving AFU, the error interrupt is raised
twice, once for each link (the XSL logic in the NPU is shared between
2 links). So a badly behaving AFU could impact another, unsuspecting
opencapi adapter.
It doesn't look good and it turns out we can do better. We can mask
those 2 XSL errors. The error will also be picked up by the OTL logic,
which is per link. So we'll still get an error interrupt, but only on
the relevant link, and the other opencapi adapter can stay functional.
Fixes: f8dfd699f584 ("hw/npu2: Setup an error interrupt on some opencapi FIRs")
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Never present in a public OPAL release, and only kernels prior to 3.11
would ever attempt to call it.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Lowest Point of Coherency (LPC) memory allows the host to access memory on
an OpenCAPI device.
Define 2 OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE, for
assigning and clearing the memory BAR. (We try to avoid using the term
"LPC" to avoid confusion with Low Pin Count.)
At present, we use a fixed location in the address space, which means we
are restricted to a single range of 4TB, on a single OpenCAPI device per
chip. In future, we'll use some chip ID extension magic to give us more
space, and some sort of allocator to assign ranges to more than one device.
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Most nvram options used by skiboot are just for debug or testing for
regressions. They should never be used long term.
We've hit a number of issues in testing and the field where nvram
options have been set "temporarily" but haven't been properly cleared
after, resulting in crashes or real bugs being masked.
This patch marks most nvram options used by skiboot as dangerous and
prints a chicken to remind users of the problem.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Acked-By: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Now that the NPU may report interrupts due to the link going down
unexpectedly, report those errors to the OS when queried by the
'next_error' PHB callback.
The hardware doesn't support recovery of the link when it goes down
unexpectedly. So we report the PHB as dead, so that the OS can log the
proper message, notify the drivers and take the devices down.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Many errors reported in the NPU FIR2 register, mostly catching
unexpected errors on the opencapi link, are defined as 'brick fatal' in
the workbook, yet the default action is set to system checkstop. It's
possible to see those errors during AFU development, where the AFU may
send unexpected packets on the link, therefore triggering those
errors. Checkstopping the system in this case is clearly extreme, as
the error could be contained to the brick and proper analysis of a
checkstop is not trivial outside of a bringup environment.
This patch changes the default action of those errors so that the NPU
will raise an interrupt instead. Follow-up patches will log
proper information so that the error can be debugged and linux can
catch the event.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Start using the irq setup code from NVLink for OpenCAPI, since the 2
versions are so close. There are only 2 differences:
- the NPU may trigger more interrupts for OpenCAPI, 35 vs. 23, though
none are configured to be triggered for now.
- we need to enable the 4 translation fault interrupts for OpenCAPI.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
When we support mixing NVLink and OpenCAPI devices on the same NPU, we're
going to have to share the same range of 16 PE numbers between NVLink and
OpenCAPI PHBs.
For OpenCAPI devices, PE assignment is only significant for determining
which System Interrupt Log register is used for a particular brick - unlike
NVLink, it doesn't play any role in determining how links are fenced.
Split the PE range into a lower half which is used for NVLink, and an upper
half that is used for OpenCAPI, with a fixed PE number assigned per brick.
As the PE assignment for OpenCAPI devices is fixed, set the PE once
during device init and then ignore calls to the set_pe() operation.
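A minimal sketch of the split, assuming the OpenCAPI PE is the start of the upper half plus the brick index. The text only says the PE is fixed per brick, so the formula and all names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NPU_PE_COUNT	16
/* Lower half (0..7) for NVLink, upper half (8..15) for OpenCAPI. */
#define OCAPI_PE_BASE	(NPU_PE_COUNT / 2)

/* True if the PE number falls in the half reserved for NVLink. */
static bool pe_is_nvlink(unsigned int pe)
{
	assert(pe < NPU_PE_COUNT);
	return pe < OCAPI_PE_BASE;
}

/* Assumed mapping: one fixed PE per brick, set once at device init. */
static unsigned int ocapi_pe_for_brick(unsigned int brick_index)
{
	assert(brick_index < NPU_PE_COUNT - OCAPI_PE_BASE);
	return OCAPI_PE_BASE + brick_index;
}
```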
Suggested-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
For opencapi, we currently do impedance calibration when initializing
the PHY for the device, which could run in parallel if we were rich
and had multiple opencapi devices. But if 2 devices are on the same
obus, the 2 calibration sequences could overlap, which likely yields
bad results and is useless anyway since it only needs to be done once
per obus.
This patch splits the opencapi PHY reset in 2 parts:
- a 'init' part called serially at boot. That's when zcal is done. If
we have 2 devices on the same socket, the zcal won't be redone,
since we're called serially and we'll see it has already been done for
the obus
- a 'reset' part called during fundamental reset as a prereq for link
training. It does the PHY setup for a set of lanes and the dccal.
The PHY team confirmed there's no dependency between zcal and the
other reset steps and it can be moved earlier.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
If two opencapi adapters are on the same obus, we may try to train the
two links in parallel at boot time, when all the PCI links are being
trained. Both links use the same i2c controller to handle the reset
signal, so some care is needed to make sure resetting one doesn't
interfere with the reset of the other. We need to keep track of the
current state of the i2c controller (and use locking).
This went mostly unnoticed as you need to have 2 opencapi cards on the
same socket and links tended to train anyway because of the retries.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Give more time to the FPGA to process the reset signal. The previous
delay, 5ms, is too short for newer adapters with bigger FPGAs. Extend
it to 250ms.
Ultimately, that delay will likely end up being added to the opencapi
specification, but we are not there yet.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We haven't hit any problem so far, but according to the ODL designer, the
ODL should be in reset when it is enabled.
The ODL remains in reset until we start a fundamental reset to
initiate link training. We still assert and deassert the ODL reset
signal as part of the normal procedure just before training the
link. Asserting is therefore useless at boot, since the ODL is already
in reset, but we keep it as it's only a scom write and it's needed
when we reset/retrain from the OS.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Split the function to assert and deassert the reset signal on the ODL,
so that we can keep the ODL in reset while we reset the adapter,
therefore having a window where both sides are in reset.
It is actually not required with our current DLx at boot time, but I
need to split the ODL reset function for the following patch and it
will become useful/required later when we introduce resetting an
opencapi link from the OS.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This is really to avoid confusion with a later patch and clarify
whether we're resetting the ODL or the adapter.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
It's possible to set up performance counters for the PLL to detect
various conditions for the links in nvlink or opencapi mode. Since
those counters are currently unused, let's configure them when an obus
is in opencapi mode to detect CRC errors on the link. Each link has
two counters:
- CRC error detected by the host
- CRC error detected by the DLx (NAK received by the host)
We also dump the counters shortly after the link trains, but they can
be read multiple times through cronus, pdbg or linux. The counters are
configured to be reset after each read.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
ODL registers used to control the opencapi link state have an address
built on a base address and an offset for each brick which can be
computed instead of hard-coded individually for each brick.
Rework how we access the ODL registers, to avoid repeating switch
statements all over the place.
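The computed-address approach can be sketched like this. The base address, stride and brick range are placeholders chosen for illustration, not real register addresses, and odl_reg_addr() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only, not real addresses. */
#define ODL_REG_BASE		0x1000ULL
#define ODL_BRICK_STRIDE	0x10ULL
/* Assumed range of bricks that can be in opencapi mode. */
#define ODL_FIRST_BRICK		2
#define ODL_LAST_BRICK		5

/* Compute a per-brick register address instead of using a switch. */
static uint64_t odl_reg_addr(uint64_t reg_offset, unsigned int brick_index)
{
	assert(brick_index >= ODL_FIRST_BRICK &&
	       brick_index <= ODL_LAST_BRICK);
	return ODL_REG_BASE +
	       (brick_index - ODL_FIRST_BRICK) * ODL_BRICK_STRIDE +
	       reg_offset;
}
```

One helper then serves every ODL register, where each register previously needed its own switch over the brick index.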
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This cache is written but never read. Wiring it up would gain us little
(except added complexity), and it obviously hasn't been missed thus far,
so remove it altogether.
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
If the link trains in degraded mode, log the ODL endpoint information
register for debug. Its content is specific to the DLx and TLx
implementation, so this is really information useful for the hardware
team.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
There's no status readily available to tell the effective link
width. Instead, we have to look at the individual status of each lane,
on the transmit and receive direction. All relevant information is in
the ODL status register.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Log the link training status register in case of failure to train.
It can have useful information for the hardware team.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In platform_ocapi, we define i2c_{reset,presence}_odl{0,1} to specify the
appropriate reset/presence GPIO pins for devices connected to ODL0 and ODL1
respectively.
This is obviously wrong, because a device connected to brick 2 and a device
connected to brick 4 are going to be different devices connected to
different I2C pins, but rather conveniently we haven't had to deal with
systems that can use the full 4 bricks as yet. Now that we're adding
OpenCAPI support for Witherspoon, we should change this to specify pins
separately for all 4 bricks.
Replace i2c_{reset,presence}_odl{0,1} with
i2c_{reset,presence}_brick{2,3,4,5} and update the presence detection code,
device reset code, and existing platforms accordingly.
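A minimal sketch of the per-brick layout. The field names come from the description above, while the field types, the struct name and the lookup helper are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field names as described above; types and struct name are assumed. */
struct platform_ocapi_pins {
	uint8_t i2c_reset_brick2;
	uint8_t i2c_reset_brick3;
	uint8_t i2c_reset_brick4;
	uint8_t i2c_reset_brick5;
	uint8_t i2c_presence_brick2;
	uint8_t i2c_presence_brick3;
	uint8_t i2c_presence_brick4;
	uint8_t i2c_presence_brick5;
};

/* Hypothetical lookup: pick the reset pin for a given brick. */
static uint8_t reset_pin_for_brick(const struct platform_ocapi_pins *p,
				   unsigned int brick_index)
{
	switch (brick_index) {
	case 2:
		return p->i2c_reset_brick2;
	case 3:
		return p->i2c_reset_brick3;
	case 4:
		return p->i2c_reset_brick4;
	case 5:
		return p->i2c_reset_brick5;
	default:
		assert(0);
		return 0;
	}
}
```

With one field per brick, devices on bricks 2 and 4 no longer share a pin definition the way they did under the ODL0/ODL1 naming.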
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
There is no standardised way to determine the presence and type of devices
connected to an NPU on POWER9.
Currently, we hardcode device types based on platform type (as no platform
currently supports both OpenCAPI and NVLink), and for OpenCAPI platforms
we use I2C to detect presence.
Witherspoon (and potentially other platforms later on) supports both
NVLink and OpenCAPI, and additionally uses SXM2 connectors which can carry
more than one link, rather than the SlimSAS connectors used for OpenCAPI on
Zaius and ZZ. This necessitates some special handling.
Add a platform callback for NPU device detection. In a later patch, we
will use this to implement Witherspoon-specific device detection. For now,
add a Witherspoon stub that sets all links to NVLink (i.e. current
behaviour).
Move the existing I2C-based presence detection for OpenCAPI devices on
Zaius/ZZ into common code, which we use by default for platforms which do
not define a callback. Clean up the use of the ibm,npu-link-type property,
which will now be exposed solely for debugging and not consumed internally.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Replace probe_npu2() and probe_npu2_opencapi() with a new shared
probe_npu2(). Refactor some of the common NPU setup code into shared code.
No functional change. This patch does not implement support for using both
types of devices simultaneously on the same NPU - we expect to add this
sometime in the future.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
On Witherspoon, OpenCAPI devices attached to link indexes 0 and 1 are
handled by bricks 2 and 3.
Rename index to brick_index, and add a new field, link_index, to
refer to the link index. For now, we set those values identically.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Alistair Popple <alistair@popple.id.au>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In opal_npu_tl_set(), we made a typo that means the OPAL_NPU_TL_SET call
may not clear the enable bits for templates that were previously enabled
but are now disabled.
Fix the typo so we clear NPU2_OTL_CONFIG1_TX_TEMP2_EN as well as
TEMP{1,3}_EN.
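The class of bug can be sketched as follows. The bit positions and names below are placeholders and do not reflect the real NPU2_OTL_CONFIG1 layout. The point is only that the clear mask must cover every enable bit before the requested ones are set again.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions; the real register layout differs. */
#define TX_TEMP1_EN	(1ULL << 0)
#define TX_TEMP2_EN	(1ULL << 1)
#define TX_TEMP3_EN	(1ULL << 2)

/* Hypothetical helper: rewrite the template enable bits in reg. */
static uint64_t set_tx_templates(uint64_t reg, bool t1, bool t2, bool t3)
{
	/* Clear all template enables first, including TEMP2. */
	reg &= ~(TX_TEMP1_EN | TX_TEMP2_EN | TX_TEMP3_EN);

	if (t1)
		reg |= TX_TEMP1_EN;
	if (t2)
		reg |= TX_TEMP2_EN;
	if (t3)
		reg |= TX_TEMP3_EN;

	return reg;
}
```

If TX_TEMP2_EN were missing from the mask, a template 2 that was enabled by an earlier call would stay enabled even when the caller no longer requests it.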
Reported-by: Tyler Seredynski <tseredynski@gmail.com>
Fixes: cd8b82a8e83ed ("npu2-opencapi: Add OpenCAPI OPAL API calls")
Cc: stable
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Even if an opencapi link is down, we currently always try to issue a
config read operation when probing for PCI devices, because of the
default scan map used for an opencapi PHB. The config operation fails,
as expected, but it can also raise a FIR bit and trigger an HMI.
For opencapi, there's no root device like for a "normal" PCI PHB, so
there's no reason to do the config operation. To fix it, we keep the
scan map blank by default, and only add a device once the link is
trained.
CC: stable # v6.1+
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
device tree
Currently, we distinguish between NPU links for NVLink devices and OpenCAPI
devices through the use of two different compatible strings - ibm,npu-link
and ibm,npu-link-opencapi.
As we move towards supporting configurations with both NVLink and OpenCAPI
devices behind a single NPU, we need to detect the device type as part of
presence detection, which can't happen until well after the point where the
HDAT or platform code has created the NPU device tree nodes. Changing a
node's compatible string after it's been created is a bit ugly, so instead
we should move the device type to a new property which we can add to the
node later on.
Get rid of the ibm,npu-link-opencapi compatible string, add a new
ibm,npu-link-type property, and a helper function to check the link type.
Add an "unknown" device type in preparation for later patches to detect
device type dynamically.
These device tree bindings are entirely internal to skiboot and are not
consumed directly by Linux, so this shouldn't break anything (other than
internal BML lab environments).
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|