author | Frederic Barrat <fbarrat@linux.ibm.com> | 2019-10-09 21:38:08 +0200 |
---|---|---|
committer | Oliver O'Halloran <oohall@gmail.com> | 2019-10-22 17:31:47 +1100 |
commit | 40bc636eb6b64473ec9ef167bd2bc91a9c032806 (patch) | |
tree | aa4f89e6cddb12bdee6be30776d4521ce38a21bc /hw/npu2-common.c | |
parent | dbc70aea3a2eec5d8d3c092c2397b2997e35ba60 (diff) | |
npu2-opencapi: Improve error reporting to the OS
When resetting an opencapi link, the brick is fenced temporarily.
Therefore we can no longer rely on the fencing state of the brick to
check the health of an opencapi PHB: if the PHB state is queried while
a link is being reset, we could report a spurious error.

Instead, we flag the device as 'broken' when an error interrupt is
received, just before raising an event to the OS. When the OS queries
the state of a PHB, we only have to look at the 'broken' attribute.

Note that there's no recovery possible on P9 when an error interrupt
is received unexpectedly, as recovery is not supported by hardware. So
when a device/link is marked as 'broken', it stays broken. All the OS
can do is log the error and notify the drivers.

Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Diffstat (limited to 'hw/npu2-common.c')
-rw-r--r-- | hw/npu2-common.c | 7 |
1 files changed, 7 insertions, 0 deletions
diff --git a/hw/npu2-common.c b/hw/npu2-common.c
index 6d5c35a..51ecd0c 100644
--- a/hw/npu2-common.c
+++ b/hw/npu2-common.c
@@ -406,6 +406,13 @@ static void npu2_err_interrupt(struct irq_source *is, uint32_t isn)
 			p->chip_id, irq_name);
 		free(irq_name);
 		show_all_regs(p, brick);
+		/*
+		 * P9 NPU doesn't support recovering a link going down
+		 * unexpectedly. So we mark the device as broken and
+		 * report it to the OS, so that the error is logged
+		 * and the drivers notified.
+		 */
+		npu2_opencapi_set_broken(p, brick);
 		opal_update_pending_evt(OPAL_EVENT_PCI_ERROR,
 					OPAL_EVENT_PCI_ERROR);
 		break;
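
The commit message describes a sticky 'broken' flag that replaces the transient fence state as the source of truth for PHB health. The following is a minimal, standalone sketch of that idea, not the skiboot implementation: every type, constant, and function name prefixed with `sketch_` or `SKETCH_` is hypothetical and invented for illustration, and the real `npu2_opencapi_set_broken()` and the PHB state query may be structured quite differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; these are NOT the skiboot structures. */
#define SKETCH_MAX_BRICKS       6
#define SKETCH_EVENT_PCI_ERROR  0x1u

enum sketch_phb_state { SKETCH_PHB_OK, SKETCH_PHB_BROKEN };

struct sketch_dev {
	bool fenced;	/* transient: set only while the link is being reset */
	bool broken;	/* sticky: set on an error interrupt, never cleared */
};

struct sketch_npu {
	struct sketch_dev devs[SKETCH_MAX_BRICKS];
	uint32_t pending_events;	/* events waiting to be seen by the OS */
};

/* Mark a device as permanently broken (assumed: no recovery path). */
static void sketch_set_broken(struct sketch_npu *p, int brick)
{
	assert(brick >= 0 && brick < SKETCH_MAX_BRICKS);
	p->devs[brick].broken = true;
}

/* A link reset fences the brick for a while, then lifts the fence. */
static void sketch_link_reset_begin(struct sketch_npu *p, int brick)
{
	assert(brick >= 0 && brick < SKETCH_MAX_BRICKS);
	p->devs[brick].fenced = true;
}

static void sketch_link_reset_end(struct sketch_npu *p, int brick)
{
	assert(brick >= 0 && brick < SKETCH_MAX_BRICKS);
	p->devs[brick].fenced = false;
}

/* Error interrupt: flag the device first, then raise the event to the OS. */
static void sketch_err_interrupt(struct sketch_npu *p, int brick)
{
	sketch_set_broken(p, brick);
	p->pending_events |= SKETCH_EVENT_PCI_ERROR;
}

/* State query: looks only at 'broken' and deliberately ignores 'fenced'. */
static enum sketch_phb_state sketch_query_state(const struct sketch_npu *p,
						int brick)
{
	assert(brick >= 0 && brick < SKETCH_MAX_BRICKS);
	return p->devs[brick].broken ? SKETCH_PHB_BROKEN : SKETCH_PHB_OK;
}
```

In this sketch, a state query issued between `sketch_link_reset_begin()` and `sketch_link_reset_end()` still reports the PHB as healthy, because the fence is ignored. Once `sketch_err_interrupt()` has run for a brick, every later query reports it as broken, which mirrors the "it stays broken" behaviour described in the commit message.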