author | Stewart Smith <stewart@linux.ibm.com> | 2019-04-30 14:38:43 +1000 |
---|---|---|
committer | Stewart Smith <stewart@linux.ibm.com> | 2019-05-02 09:57:15 +1000 |
commit | cb2e148df960b86821e10bc55ae39aeac9d56e4e (patch) | |
tree | 126cea0cf02ac118ba6f7f13882d930df08ca74b /doc | |
parent | c3e38ba93c5f5b6e8983fbd3aafe227756483d97 (diff) | |
doc: Add (most) nvram debugging options
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Diffstat (limited to 'doc')
-rw-r--r-- | doc/console-log.rst | 17 |
-rw-r--r-- | doc/device-tree/ibm,opal/power-mgt.rst | 2 |
-rw-r--r-- | doc/index.rst | 1 |
-rw-r--r-- | doc/opal-api/opal-cec-reboot-6-116.rst | 11 |
-rw-r--r-- | doc/pci.rst | 119 |
-rw-r--r-- | doc/power-management.rst | 17 |
6 files changed, 166 insertions, 1 deletions
diff --git a/doc/console-log.rst b/doc/console-log.rst
index ca9ec3f..c758e9a 100644
--- a/doc/console-log.rst
+++ b/doc/console-log.rst
@@ -74,3 +74,20 @@ still only PR_NOTICE through drivers.
 
 People who write something like 0x1f will get a very quiet boot indeed.
 
+Debugging
+---------
+
+You can change the log level of what goes to the in-memory buffer and what
+goes to the driver (i.e. serial port / IPMI Serial over LAN) at boot time
+by setting NVRAM variables: ::
+
+  nvram -p ibm,skiboot --update-config log-level-driver=7
+  nvram -p ibm,skiboot --update-config log-level-memory=7
+
+You can also use the named versions of emerg, alert, crit, err,
+warning, notice, printf, info, debug, trace or insane, i.e. ::
+
+  nvram -p ibm,skiboot --update-config log-level-driver=insane
+
+
+You can also write to the debug_descriptor to change it at runtime.
diff --git a/doc/device-tree/ibm,opal/power-mgt.rst b/doc/device-tree/ibm,opal/power-mgt.rst
index b326a24..8d9439d 100644
--- a/doc/device-tree/ibm,opal/power-mgt.rst
+++ b/doc/device-tree/ibm,opal/power-mgt.rst
@@ -1,3 +1,5 @@
+.. _power-mgt-devtree:
+
 ibm,opal/power-mgt device tree entries
 ======================================
 
diff --git a/doc/index.rst b/doc/index.rst
index b7a868c..79a5acc 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -46,6 +46,7 @@ Developer Guide and Internals
    xscom-node-bindings
    xive
    imc
+   power-management
 
 
 OPAL ABI
diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
index 516d4fc..e9e53ce 100644
--- a/doc/opal-api/opal-cec-reboot-6-116.rst
+++ b/doc/opal-api/opal-cec-reboot-6-116.rst
@@ -66,3 +66,14 @@ OPAL_REBOOT_FULL_IPL = 2
 Unsupported Reboot type
   For unsupported reboot type, this function will return with
   OPAL_UNSUPPORTED and no reboot will be triggered.
+
+Debugging
+^^^^^^^^^
+
+This is **not** ABI and may change or be removed at any time.
+
+You can change whether the software checkstop trigger is used via an NVRAM
+variable: ::
+
+  nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
+  nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
diff --git a/doc/pci.rst b/doc/pci.rst
index f72fc14..d18d35d 100644
--- a/doc/pci.rst
+++ b/doc/pci.rst
@@ -1,7 +1,124 @@
 PCI
 ===
 
-**WARNING**: This documentation **urgently needs updating** and is *woefully* incomplete.
+Debugging
+---------
+
+There exist a couple of NVRAM options for enabling extra debug functionality
+to help debug PCI issues. These are not ABI and may be changed or removed at
+**any** time.
+
+Verbose EEH
+^^^^^^^^^^^
+
+::
+
+  nvram -p ibm,skiboot --update-config pci-eeh-verbose=true
+
+Disable EEH MMIO
+^^^^^^^^^^^^^^^^
+::
+  nvram -p ibm,skiboot --update-config pci-eeh-mmio=disabled
+
+
+Check for RX errors after link training
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some PHB4 PHYs can get stuck in a bad state where they are constantly
+retraining the link. This happens transparently to skiboot and Linux
+but will cause PCIe to be slow. Resetting the PHB4 clears the
+problem.
+
+We can detect this case by looking at the RX errors count where we
+check for link stability. This patch does this by modifying the link
+optimal code to check for RX errors. If errors are occurring we
+retrain the link irrespective of the chip rev or card.
+
+Normally when this problem occurs, the RX error count is maxed out at
+255. When there is no problem, the count is 0. We chose 8 as the max
+rx errors value to give us some margin for a few errors. There is also
+a knob that can be used to set the error threshold for when we should
+retrain the link. i.e. ::
+
+  nvram -p ibm,skiboot --update-config phb-rx-err-max=8
+
+Retrain link if degraded
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+On P9 Scale Out (Nimbus) DD2.0 and Scale Up (Cumulus) DD1.0 (and
+below) the PCIe PHY can lock up, causing training issues. This can cause
+a degradation in speed or width in ~5% of training cases (depending on
+the card). This is fixed in later chip revisions. This issue can also
+cause PCIe links to not train at all, but this case is already
+handled.
+
+There is code in skiboot that checks if the PCIe link has trained optimally
+and if not, does a full PHB reset (to fix the PHY lockup) and retrain.
+
+One complication is some devices are known to train degraded unless
+device specific configuration is performed. Because of this, we only
+retrain when the device is in a whitelist. All devices in the current
+whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon.
+
+We always gather information on the link and print it in the logs even
+if the card is not in the whitelist.
+
+For testing purposes, there's an nvram option to retry all PCIe cards and all
+P9 chips when a degraded link is detected. The new option is
+'pci-retry-all=true' which can be set using: ::
+
+  nvram -p ibm,skiboot --update-config pci-retry-all=true
+
+This option may increase the boot time if used on a badly behaving
+card.
+
+Maximum link speed
+^^^^^^^^^^^^^^^^^^
+
+Was useful during bringup on P9 DD1.
+
+::
+  nvram -p ibm,skiboot --update-config pcie-max-link-speed=4
+
+
+Ric Mata Mode
+^^^^^^^^^^^^^
+
+This mode (for PHB4) will trace the training process closely. This activates
+as soon as PERST is deasserted and produces human readable output of
+the process.
+
+It will also add the PCIe Link Training and Status State Machine (LTSSM) tracing
+and details on speed and link width.
+
+Output looks a bit like this ::
+
+  [ 1.096995141,3] PHB#0000[0:0]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+  [ 1.102849137,3] PHB#0000[0:0]: TRACE:0x0000102101000000 11ms presence GEN1:x16:polling
+  [ 1.104341838,3] PHB#0000[0:0]: TRACE:0x0000182101000000 14ms training GEN1:x16:polling
+  [ 1.104357444,3] PHB#0000[0:0]: TRACE:0x00001c5101000000 14ms training GEN1:x16:recovery
+  [ 1.104580394,3] PHB#0000[0:0]: TRACE:0x00001c5103000000 14ms training GEN3:x16:recovery
+  [ 1.123259359,3] PHB#0000[0:0]: TRACE:0x00001c5104000000 51ms training GEN4:x16:recovery
+  [ 1.141737656,3] PHB#0000[0:0]: TRACE:0x0000144104000000 87ms presence GEN4:x16:L0
+  [ 1.141752318,3] PHB#0000[0:0]: TRACE:0x0000154904000000 87ms trained GEN4:x16:L0
+  [ 1.141757964,3] PHB#0000[0:0]: TRACE: Link trained.
+  [ 1.096834019,3] PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+  [ 1.105578525,3] PHB#0001[0:1]: TRACE:0x0000102101000000 17ms presence GEN1:x16:polling
+  [ 1.112763075,3] PHB#0001[0:1]: TRACE:0x0000183101000000 31ms training GEN1:x16:config
+  [ 1.112778956,3] PHB#0001[0:1]: TRACE:0x00001c5081000000 31ms training GEN1:x08:recovery
+  [ 1.113002083,3] PHB#0001[0:1]: TRACE:0x00001c5083000000 31ms training GEN3:x08:recovery
+  [ 1.114833873,3] PHB#0001[0:1]: TRACE:0x0000144083000000 35ms presence GEN3:x08:L0
+  [ 1.114848832,3] PHB#0001[0:1]: TRACE:0x0000154883000000 35ms trained GEN3:x08:L0
+  [ 1.114854650,3] PHB#0001[0:1]: TRACE: Link trained.
+
+Enabled via NVRAM: ::
+
+  nvram -p ibm,skiboot --update-config pci-tracing=true
+
+Named after the person the output of this mode is typically sent to.
+
+
+**WARNING**: The documentation below **urgently needs updating** and is *woefully* incomplete.
 
 IODA PE Setup Sequences
 -----------------------
diff --git a/doc/power-management.rst b/doc/power-management.rst
new file mode 100644
index 0000000..76491a7
--- /dev/null
+++ b/doc/power-management.rst
@@ -0,0 +1,17 @@
+Power Management
+================
+
+See :ref:`power-mgt-devtree` for device tree structure describing power management facilities.
+
+Debugging
+---------
+
+There exist a few debug knobs that can be set via nvram settings. These are
+**not** ABI and may be changed or removed at *any* time.
+
+Disabling specific stop states
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+On boot, specific stop states can be disabled via setting a mask. For example,
+to disable all but stop 0,1,2, use ~0xE0000000. ::
+
+  nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
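
The stop-state example in the new doc/power-management.rst gives the mask as ~0xE0000000, which is 0x1FFFFFFF when truncated to 32 bits. A minimal sketch of that arithmetic follows. It assumes, based only on that one example, that stop state N corresponds to bit `0x80000000 >> N` of a 32-bit mask. The helper names `stop_state_bit` and `disable_all_but` are hypothetical and are not part of skiboot or the nvram tool.

```python
# Sketch only: the bit layout is inferred from the single example in the
# patch (stop 0,1,2 <-> 0xE0000000); it is an assumption, not a documented ABI.

MASK_WIDTH = 32
FULL_MASK = (1 << MASK_WIDTH) - 1  # 0xFFFFFFFF


def stop_state_bit(state: int) -> int:
    """Return the assumed mask bit for one stop state (state 0 = top bit)."""
    if not 0 <= state < MASK_WIDTH:
        raise ValueError(f"stop state out of range: {state}")
    return 0x80000000 >> state


def disable_all_but(keep) -> int:
    """Return a disable mask with every bit set except the states to keep."""
    enabled = 0
    for state in keep:
        enabled |= stop_state_bit(state)
    # Python integers are unbounded, so truncate the complement to 32 bits.
    return ~enabled & FULL_MASK


mask = disable_all_but([0, 1, 2])
print(f"nvram -p ibm,skiboot --update-config "
      f"opal-stop-state-disable-mask=0x{mask:08X}")
```

For stop states 0, 1 and 2 this prints the same command line as the patch, ending in `opal-stop-state-disable-mask=0x1FFFFFFF`.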