author: Reza Arbab <arbab@linux.ibm.com> (2019-11-14 10:13:04 -0600)
committer: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> (2019-12-05 15:11:23 +0530)
commit: f67887e3ff68d0986702ae79961d795d9793cc8f
tree: 9e717ed0ae77744111711f816319c1b54495108c /hw
parent: f25d660bd5622c59a11adaa174a0d01c9b7263d8
npu2/hw-procedures: Remove assertion from check_credits()
[ Upstream commit 24664b48642845d620e225111bf6184f3c102f60 ]
The RX clock mux in the NVLink PHY can glitch, which manifests as
hard-to-diagnose behavior: at best, a checkstop during the first link
traffic. The only reliable way we found to detect this was to check
for a discrepancy in the credits we expect to receive during link
training.
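As a minimal sketch of the kind of comparison check_credits() performs, the C below tabulates the expected value for each RX credit register and counts how many observed values differ. The register names and expected values are copied from the CHECK_CREDIT() calls in the patch. The credit_check table, count_credit_mismatches(), and the idea of passing in simulated observed values are hypothetical illustrations, not skiboot's CHECK_CREDIT() implementation, which reads the registers with npu2_read().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical table: expected post-training value of each RX credit
 * register. Names and values mirror the CHECK_CREDIT() calls in the patch. */
struct credit_check {
	const char *name;
	uint64_t expected;
};

static const struct credit_check npu2_credit_checks[] = {
	{ "NPU2_NTL_CRED_HDR_CREDIT_RX",  0x0BE0BE0000000000ULL },
	{ "NPU2_NTL_RSP_HDR_CREDIT_RX",   0x0BE0BE0000000000ULL },
	{ "NPU2_NTL_CRED_DATA_CREDIT_RX", 0x1001000000000000ULL },
	{ "NPU2_NTL_RSP_DATA_CREDIT_RX",  0x1001000000000000ULL },
	{ "NPU2_NTL_DBD_HDR_CREDIT_RX",   0x0640640000000000ULL },
	{ "NPU2_NTL_ATSD_HDR_CREDIT_RX",  0x0200200000000000ULL },
};

#define NUM_CREDIT_CHECKS (sizeof(npu2_credit_checks) / sizeof(npu2_credit_checks[0]))

/* Compare each observed value (simulated here; real firmware reads the
 * hardware register) with its expected value. Every discrepancy is
 * logged to stderr; the return value is the number of mismatches. */
static unsigned int count_credit_mismatches(const uint64_t observed[NUM_CREDIT_CHECKS])
{
	unsigned int fail = 0;

	assert(observed != NULL);
	for (size_t i = 0; i < NUM_CREDIT_CHECKS; i++) {
		if (observed[i] != npu2_credit_checks[i].expected) {
			fprintf(stderr, "%s: expected 0x%016" PRIx64 ", got 0x%016" PRIx64 "\n",
				npu2_credit_checks[i].name,
				npu2_credit_checks[i].expected, observed[i]);
			fail++;
		}
	}
	return fail;
}
```

A nonzero count is the "discrepancy" signal described above. Before this patch, the real check_credits() summed such results into `fail` and then called `assert(!fail)`.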
Since the check was added, we have found that:
* Commit ac6f1599ff33 ("npu2: hw-procedures: Add phy_rx_clock_sel()")
does work around the original glitch.
* Asserting is too harsh. Before the root cause was established, it was
thought this could have been a manufacturing defect, and we wanted
hardware acceptance boot-cycle tests to fail loudly.
* There appears to be a valid situation in which the credits differ from
the expected value. During GPU hot reset, a CPU prefetch across the link
can alter the credit count before we check it.
Given all of the above, remove the assert().
Cc: stable # 6.0.x
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Diffstat (limited to 'hw')
-rw-r--r-- hw/npu2-hw-procedures.c | 15
1 file changed, 6 insertions, 9 deletions
diff --git a/hw/npu2-hw-procedures.c b/hw/npu2-hw-procedures.c
index c1ae8f1..24447a4 100644
--- a/hw/npu2-hw-procedures.c
+++ b/hw/npu2-hw-procedures.c
@@ -783,17 +783,14 @@ static uint32_t check_credit(struct npu2_dev *ndev, uint64_t reg,
 
 static uint32_t check_credits(struct npu2_dev *ndev)
 {
-	int fail = 0;
 	uint64_t val;
 
-	fail += CHECK_CREDIT(ndev, NPU2_NTL_CRED_HDR_CREDIT_RX, 0x0BE0BE0000000000ULL);
-	fail += CHECK_CREDIT(ndev, NPU2_NTL_RSP_HDR_CREDIT_RX, 0x0BE0BE0000000000ULL);
-	fail += CHECK_CREDIT(ndev, NPU2_NTL_CRED_DATA_CREDIT_RX, 0x1001000000000000ULL);
-	fail += CHECK_CREDIT(ndev, NPU2_NTL_RSP_DATA_CREDIT_RX, 0x1001000000000000ULL);
-	fail += CHECK_CREDIT(ndev, NPU2_NTL_DBD_HDR_CREDIT_RX, 0x0640640000000000ULL);
-	fail += CHECK_CREDIT(ndev, NPU2_NTL_ATSD_HDR_CREDIT_RX, 0x0200200000000000ULL);
-
-	assert(!fail);
+	CHECK_CREDIT(ndev, NPU2_NTL_CRED_HDR_CREDIT_RX, 0x0BE0BE0000000000ULL);
+	CHECK_CREDIT(ndev, NPU2_NTL_RSP_HDR_CREDIT_RX, 0x0BE0BE0000000000ULL);
+	CHECK_CREDIT(ndev, NPU2_NTL_CRED_DATA_CREDIT_RX, 0x1001000000000000ULL);
+	CHECK_CREDIT(ndev, NPU2_NTL_RSP_DATA_CREDIT_RX, 0x1001000000000000ULL);
+	CHECK_CREDIT(ndev, NPU2_NTL_DBD_HDR_CREDIT_RX, 0x0640640000000000ULL);
+	CHECK_CREDIT(ndev, NPU2_NTL_ATSD_HDR_CREDIT_RX, 0x0200200000000000ULL);
 
 	val = npu2_read(ndev->npu, NPU2_NTL_MISC_CFG1(ndev));
 	val &= 0xFF3FFFFFFFFFFFFFUL;
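The C below sketches the post-patch behavior in the hot-reset case described in the commit message. It defines a hypothetical credit_reg model, a log-only check that reports a mismatch without asserting, and a hypothetical simulate_prefetch() that perturbs the count before the check runs. None of these are skiboot functions. The real code reads registers with npu2_read() and uses CHECK_CREDIT(), whose internals are not shown in this diff, and the subtraction below does not model the real register's bit layout.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of one RX credit register; in skiboot the value
 * would come from npu2_read() rather than a struct field. */
struct credit_reg {
	const char *name;
	uint64_t value;
	uint64_t expected;
};

/* Log-only check in the spirit of the patched check_credits(): a
 * mismatch is reported on stderr but nothing aborts. Returns true if
 * the register holds its expected value. */
static bool check_credit_log_only(const struct credit_reg *reg)
{
	if (reg->value == reg->expected)
		return true;
	fprintf(stderr, "%s: credit mismatch, expected 0x%016" PRIx64
		" got 0x%016" PRIx64 " (continuing)\n",
		reg->name, reg->expected, reg->value);
	return false;
}

/* Hypothetical stand-in for a CPU prefetch across the link during GPU
 * hot reset: it alters the credit count before the check. The raw
 * subtraction only creates a discrepancy; it is not the real encoding. */
static void simulate_prefetch(struct credit_reg *reg, uint64_t consumed)
{
	assert(consumed <= reg->value); /* avoid unsigned wraparound */
	reg->value -= consumed;
}
```

After a simulated prefetch, check_credit_log_only() returns false and logs the discrepancy, but the caller is free to carry on. That matches the patched check_credits(), which discards the CHECK_CREDIT() return values instead of feeding them into `assert(!fail)`.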