author Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> 2016-08-13 18:41:11 +0530
committer Stewart Smith <stewart@linux.vnet.ibm.com> 2016-08-25 19:01:58 +1000
commit bb18811054c71c8b8a56df597f3d790b4c08908d
tree a3172d43301759d2caa0d3c31acd9ed9c50db0e1
parent ec1cf514b83eb22c57bf8aa65d8acd712759a0c2
opal/hmi: Fix a TOD HMI failure during a race condition.

Another interrupt can occasionally wake a CPU at the 0x100 vector just as an HMI for a TOD error is also pending. In this rare race condition, if the CPU has woken up from a tb_loss power saving mode, it invokes an OPAL call to resync the TB. Since the TOD is already in an error state, the TB resync times out and leaves TFMR bit 18 set to '1'. (TFMR[18]=1 means the TB is prepared to receive a new value from the TOD. Once the new value is received, this bit is reset to '0'; otherwise the TB stays in the waiting state.)

When the HMI is delivered, it may find that all TFMR errors are already cleared, yet fail to restore the TB because TFMR bit 18 is still set. This leads to an HMI recovery failure, causing a kernel crash.

This patch fixes the problem by clearing TB errors if TFMR[18] is set to 1. This makes sure that the TB is in a clean state before the TB restore process starts.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 026b9a13bf8d61a7e72721d59961b40cbc98b410)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>

-rw-r--r-- hw/chiptod.c 7
1 files changed, 7 insertions, 0 deletions
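
For readers unfamiliar with the bit numbering used in the commit message, here is a minimal standalone sketch of the check the patch extends. It assumes IBM (big-endian) bit numbering, where bit 0 is the most significant bit of the 64-bit TFMR. The names IBM_BIT, TFMR_MOVE_TOD_TO_TB, TFMR_TB_MISSING_SYNC and tb_needs_cleanup are hypothetical and chosen for illustration; they are not the skiboot definitions. Only the bit positions 18 and 44 come from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: IBM bit numbering, bit 0 is the MSB of a 64-bit register. */
#define IBM_BIT(n) (1ULL << (63 - (n)))

/* Illustrative names; the bit positions are the ones quoted in the commit. */
#define TFMR_MOVE_TOD_TO_TB  IBM_BIT(18) /* TB waiting for a new value from TOD */
#define TFMR_TB_MISSING_SYNC IBM_BIT(44) /* sync check error */

/*
 * Return true when the TB should be put back into a clean state:
 * either one of the error bits in error_mask is set, or the TB is
 * still waiting for a TOD value (bit 18) after a failed resync.
 */
static bool tb_needs_cleanup(uint64_t tfmr, uint64_t error_mask)
{
    return (tfmr & (error_mask | TFMR_MOVE_TOD_TO_TB)) != 0;
}
```

With this sketch, a TFMR value that has no error bits set but still has bit 18 set is treated as needing cleanup, which is the case the race condition produces.
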
diff --git a/hw/chiptod.c b/hw/chiptod.c
index 88f6c8e..91c8ce4 100644
--- a/hw/chiptod.c
+++ b/hw/chiptod.c
@@ -1499,11 +1499,18 @@ int chiptod_recover_tb_errors(void)
* Check for TB errors.
* On Sync check error, bit 44 of TFMR is set. Check for it and
* clear it.
+ *
+ * In some rare situations we may have all TB errors already cleared,
+ * but TB stuck in waiting for new value from TOD with TFMR bit 18
+ * set to '1'. This uncertain state of TB would fail the process
+ * of getting TB back into running state. Get TB in clean initial
+ * state by clearing TB errors if TFMR[18] is set.
*/
if ((tfmr & SPR_TFMR_TB_MISSING_STEP) ||
(tfmr & SPR_TFMR_TB_RESIDUE_ERR) ||
(tfmr & SPR_TFMR_FW_CONTROL_ERR) ||
(tfmr & SPR_TFMR_TBST_CORRUPT) ||
+ (tfmr & SPR_TFMR_MOVE_CHIP_TOD_TO_TB) ||
(tfmr & SPR_TFMR_TB_MISSING_SYNC)) {
if (!tfmr_recover_tb_errors(tfmr)) {
rc = 0;