author: Alexey Kardashevskiy <aik@ozlabs.ru> 2019-04-29 19:12:27 +1000
committer: Stewart Smith <stewart@linux.ibm.com> 2019-05-02 09:57:15 +1000
commit: 0f492a92590850af6360bdcc93e2047b285d41c7
tree: f9f79e96583903bc23546280c6457f62560c6b5c /include
parent: 44afdc1afb1fe17ad7ae6758279f856a464cb922
npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by default

V100 GPUs are known to violate the NVLink2 protocol in some cases (one is when memory was accessed by the CPU and then by the GPU using a so-called block linear mapping) and issue double probes to the NPU. The NPU can cope with this problem only if CONFIG_ENABLE_SNARF_CPM ("disable/enable Probe.I.MO snarfing a cp_m") is not set in the CQ_SM Misc Config register #0. If the bit is set (which is the case today), the NPU raises a machine check stop. The snarfing feature is designed to detect 2 probes in flight and combine them into one.

This adds a new "opal-npu2-snarf-cpm" nvram variable which controls CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check stop from happening.

This disables snarfing by default, as otherwise a broken GPU driver can crash the entire box even when a GPU is passed through to a guest. This provides a dial to allow regression tests (might be useful on bare metal). To enable snarfing, the user needs to run:

    sudo nvram -p ibm,skiboot --update-config opal-npu2-snarf-cpm=enable

and reboot the host system.

While at it, define macros for register names as well to avoid touching the same lines over and over again.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
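
The policy described above (snarfing off unless the nvram variable is set to "enable") can be sketched roughly as follows. This is only an illustration, not the code from this commit: the function names are hypothetical, and the bit position of CONFIG_ENABLE_SNARF_CPM is not given in this part of the diff, so `SNARF_CPM_BIT_PLACEHOLDER` is an arbitrary stand-in rather than the real mask.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * ASSUMPTION: placeholder mask only. The real position of
 * CONFIG_ENABLE_SNARF_CPM in Misc Config register #0 is not shown here.
 */
#define SNARF_CPM_BIT_PLACEHOLDER (1ULL << 0)

/* Hypothetical helper: true only when the variable is exactly "enable". */
static bool snarf_requested(const char *nvram_value)
{
	return nvram_value != NULL && strcmp(nvram_value, "enable") == 0;
}

/*
 * Hypothetical helper: return the register value with the snarf bit
 * set when requested and cleared otherwise (the default).
 */
static uint64_t apply_snarf_policy(uint64_t reg, const char *nvram_value)
{
	if (snarf_requested(nvram_value))
		return reg | SNARF_CPM_BIT_PLACEHOLDER;
	return reg & ~SNARF_CPM_BIT_PLACEHOLDER;
}
```

An unset variable or any value other than "enable" leaves the bit cleared, which matches the "disabled by default" behaviour the message describes.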
Diffstat (limited to 'include')
-rw-r--r-- include/npu2-regs.h 14
1 files changed, 14 insertions, 0 deletions
diff --git a/include/npu2-regs.h b/include/npu2-regs.h
index ba10b8e..61e8ea8 100644
--- a/include/npu2-regs.h
+++ b/include/npu2-regs.h
@@ -791,4 +791,18 @@ void npu2_scom_write(uint64_t gcid, uint64_t scom_base,
#define L3_PRD_PURGE_TTYPE_MASK PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3) | PPC_BIT(4)
#define L3_FULL_PURGE 0x0
+/* Config registers for NPU2 */
+#define NPU_STCK0_CS_SM0_MISC_CONFIG0 0x5011000
+#define NPU_STCK0_CS_SM1_MISC_CONFIG0 0x5011030
+#define NPU_STCK0_CS_SM2_MISC_CONFIG0 0x5011060
+#define NPU_STCK0_CS_SM3_MISC_CONFIG0 0x5011090
+#define NPU_STCK1_CS_SM0_MISC_CONFIG0 0x5011200
+#define NPU_STCK1_CS_SM1_MISC_CONFIG0 0x5011230
+#define NPU_STCK1_CS_SM2_MISC_CONFIG0 0x5011260
+#define NPU_STCK1_CS_SM3_MISC_CONFIG0 0x5011290
+#define NPU_STCK2_CS_SM0_MISC_CONFIG0 0x5011400
+#define NPU_STCK2_CS_SM1_MISC_CONFIG0 0x5011430
+#define NPU_STCK2_CS_SM2_MISC_CONFIG0 0x5011460
+#define NPU_STCK2_CS_SM3_MISC_CONFIG0 0x5011490
+
#endif /* __NPU2_REGS_H */
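
The twelve addresses added above follow a regular pattern: consecutive stacks are 0x200 apart and consecutive SMs within a stack are 0x30 apart. As a minimal sketch of that arithmetic, the helper below computes the same values from a stack and SM index. The helper and the two stride macro names are hypothetical and not part of this commit, which defines each register name explicitly.

```c
#include <assert.h>
#include <stdint.h>

#define NPU_STCK0_CS_SM0_MISC_CONFIG0	0x5011000

/* Hypothetical names; the strides are inferred from the listed addresses. */
#define MISC_CONFIG0_STACK_STRIDE	0x200
#define MISC_CONFIG0_SM_STRIDE		0x30

/* Address of Misc Config register #0 for stack 0..2 and SM 0..3. */
static uint64_t misc_config0_addr(unsigned int stack, unsigned int sm)
{
	assert(stack < 3 && sm < 4);
	return NPU_STCK0_CS_SM0_MISC_CONFIG0 +
	       (uint64_t)stack * MISC_CONFIG0_STACK_STRIDE +
	       (uint64_t)sm * MISC_CONFIG0_SM_STRIDE;
}
```

For example, stack 1 with SM 2 gives 0x5011000 + 0x200 + 0x60, which is the value of NPU_STCK1_CS_SM2_MISC_CONFIG0 in the header.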