author | Daniel Axtens <dja@axtens.net> | 2015-08-13 14:40:04 +1000 |
---|---|---|
committer | Stewart Smith <stewart@linux.vnet.ibm.com> | 2015-08-18 08:10:51 +1000 |
commit | efe71cdf61aab1802b69e9aba768856e6c846a35 (patch) | |
tree | 44c45dbb849905ed0a0cf5d23b3b3bc383186ccf /doc | |
parent | 8ec792b604481fb02ffc5c41bd02e74cffda9b1d (diff) | |
phb3: Continue CAPP setup even if PHB is already in CAPP mode
This fixes a critical bug in CAPI support.
CAPI requires that all faults are escalated into a fence, not a
freeze. This is done by setting bits in a number of MMIO
registers. phb3_set_capi_mode() calls phb3_init_capp_errors() to do
this. However, if the PHB is already in CAPP mode - for example in the
recovery case - phb3_set_capi_mode() will bail out early, and those
registers will not be set.
This is quite easy to verify. PCI config space access errors, for
example, normally cause a freeze. On a CAPI-mode PHB, they should
cause a fence. Say we have a CAPI card on PHB 0, and we inject a
PCI config space error:
    echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0000/err_injct_inboundA;
    lspci;
The first time we inject this, the PHB will fence and recover, but
won't reset the registers. Therefore, the second time we inject it,
we will incorrectly freeze, not fence.
Worse, the recovery for the resultant EEH freeze event interacts
poorly with the CAPP, triggering an EEH recovery of the PHB. The
combination of the two attempted recoveries will get the PHB into
an inoperable state.
It's quite likely that there are other side effects of bailing out
early. For example, the timebase sync probably fails to recover.
Rather than auditing all the possibilities, I verified that
repeating the entire setup procedure still works when the PHB is
already in CAPP mode. It does work, so just do the entire setup
every time instead of bailing out early.
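
To illustrate the reasoning above, here is a minimal toy model of the
control flow, not the actual skiboot code. All names prefixed with toy_
are hypothetical, the single escalate_to_fence flag stands in for the
real MMIO registers, and the assumption that recovery clears that flag
while the PHB still reports CAPP mode is taken from the description in
this message rather than from the hardware documentation.

```c
#include <stdbool.h>

/* Hypothetical toy model; this is not skiboot's real struct phb3. */
enum fault_result { FAULT_FREEZE, FAULT_FENCE };

struct toy_phb {
	bool capp_mode;         /* PHB has been switched into CAPP mode */
	bool escalate_to_fence; /* stands in for the MMIO escalation bits */
};

/* Stand-in for phb3_init_capp_errors(): program the escalation bits. */
static void toy_init_capp_errors(struct toy_phb *p)
{
	p->escalate_to_fence = true;
}

/* Old behaviour: bail out early if the PHB is already in CAPP mode. */
static void toy_set_capi_mode_old(struct toy_phb *p)
{
	if (p->capp_mode)
		return;
	p->capp_mode = true;
	toy_init_capp_errors(p);
}

/* New behaviour: always repeat the entire setup. */
static void toy_set_capi_mode_new(struct toy_phb *p)
{
	p->capp_mode = true;
	toy_init_capp_errors(p);
}

/*
 * Assumed recovery model: the reset clears the escalation bits, the
 * PHB still reports CAPP mode, and the setup routine is called again.
 */
static void toy_recover_from_fence(struct toy_phb *p,
				   void (*set_capi_mode)(struct toy_phb *))
{
	p->escalate_to_fence = false;
	set_capi_mode(p);
}

/* A fault is a fence only if the escalation bits are set. */
static enum fault_result toy_inject_error(const struct toy_phb *p)
{
	return p->escalate_to_fence ? FAULT_FENCE : FAULT_FREEZE;
}

/* Inject twice, recovering in between; return the second outcome. */
static enum fault_result
toy_second_injection(void (*set_capi_mode)(struct toy_phb *))
{
	struct toy_phb p = { false, false };

	set_capi_mode(&p);
	if (toy_inject_error(&p) == FAULT_FENCE)
		toy_recover_from_fence(&p, set_capi_mode);
	return toy_inject_error(&p);
}
```

In this model, toy_second_injection(toy_set_capi_mode_old) yields
FAULT_FREEZE, matching the incorrect freeze on the second injection,
while toy_second_injection(toy_set_capi_mode_new) yields FAULT_FENCE.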
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>