path: root/hw
Age | Commit message | Author | Files | Lines
2020-10-22 | FSP/NVRAM: Do not assert in vNVRAM statistics call | Vasant Hegde | 1 | -2/+1
[ Upstream commit 9ca8bf1bde56330075634bd3cb601d0f6ee90514 ] `msg` is valid pointer here. I don't recall why I added assert here :-( This is not correct. We shouldn't call assert here. Also we are not using `msg`. Hence convert it to `__unused`. Fixes: 19d4f98e ('FSP/NVRAM: Handle "get vNVRAM statistics" command') Cc: skiboot-stable@lists.ozlabs.org # v5.4.x + Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-11-30 | FSP/IPMI: Handle FSP reset reload | Vasant Hegde | 1 | -0/+34
[ Upstream commit 2a63db6511b63a75efe820f90bb7972afc2fcdef ]

The FSP IPMI driver serializes IPMI messages: it sends a message to the FSP and waits for the response before sending the next one. This works as long as the FSP responds in time. If an IPMI message is in flight during an FSP R/R, we never get a response from the FSP. So if we initiate an in-band FSP R/R, all subsequent in-band IPMI messages are blocked.

Sequence:
  - ipmitool mc reset cold
  - <FSP R/R complete>
  - ipmitool <any command>  <-- gets blocked

This patch clears in-flight IPMI messages after the FSP R/R completes.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2017-12-20 | p8-i2c: Limit number of retry attempts | Oliver O'Halloran | 1 | -2/+10
Currently we attempt to start an I2C transaction until it succeeds. If the OCC does not release the lock on an I2C bus, the async token is held forever and the kernel thread that started the transaction blocks forever while waiting for an async completion message. Fix this by limiting the number of attempts to start the transaction.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit c2e404aedd52da91fdf605e24b9d1ae7894974c5)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
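The bounded-retry idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the cap `MAX_START_ATTEMPTS`, the `i2c_request` layout, and the `try_start` callback are all hypothetical and are not taken from the p8-i2c driver.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_START_ATTEMPTS 5   /* hypothetical cap, not the driver's value */

enum start_result { START_OK, START_BUSY };
enum req_state { REQ_PENDING, REQ_STARTED, REQ_FAILED };

struct i2c_request {
	int attempts;          /* failed start attempts so far */
	enum req_state state;
};

/*
 * Make one attempt to start the request. try_start() stands in for the
 * hardware access and reports START_BUSY while the bus is locked by
 * someone else. Returns true if the caller should try again later.
 */
static bool i2c_try_start(struct i2c_request *req,
			  enum start_result (*try_start)(void))
{
	assert(req && try_start);

	if (try_start() == START_OK) {
		req->state = REQ_STARTED;
		return false;
	}

	if (++req->attempts >= MAX_START_ATTEMPTS) {
		/* Give up so that whoever waits on the request is released. */
		req->state = REQ_FAILED;
		return false;
	}
	return true;
}

/* Stubs: a bus whose lock is never released, and one that is always free. */
static enum start_result always_busy(void) { return START_BUSY; }
static enum start_result never_busy(void) { return START_OK; }
```

The point of the cap is that a request always reaches a terminal state, so a completion can be delivered even when the bus never becomes available.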
2017-10-31 | FSP/CONSOLE: Disable notification on unresponsive consoles | Vasant Hegde | 1 | -3/+5
Commit fd6b71fc fixed the situation where ipmi console was open (hvc0) but got data on different console (hvc1). During FSP R/R OPAL closes all consoles. After R/R complete FSP requests to open hvc1 and sends data on this. If hvc1 registration failed or not opened in host kernel then it will not read data and results in RCU stalls. Note that this is workaround for older kernel where we don't have separate irq for each console. Latest kernel works fine without this patch. CC: stable CC: Sam Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c9cc5ef5772ebfbfff978f1c25763d733e45752c) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP/CONSOLE: Limit number of error logging | Vasant Hegde | 1 | -8/+13
Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added error logging when the buffer is full. In some corner cases the kernel may call this function multiple times and we may end up logging the error again and again. This patch fixes that by generating the error log only once. I think this is enough to indicate something went wrong.

Also, with the previous patch, once the console buffer is full OPAL returns an error to the payload from fsp_console_write_buffer_space(), so the payload will never call fsp_console_write(). Hence move the error logging logic to the right place.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit d798c276b4da7559970702c2f31b12549c92741e)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
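A log-once guard of the kind described here might look like the sketch below. The `console` structure and `report_buffer_full()` are hypothetical names for illustration and do not reflect the actual skiboot code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct console {
	bool full_logged;  /* set after the first "buffer full" report */
	int  log_count;    /* number of reports actually emitted */
};

/* Report a full buffer, but only the first time it is noticed. */
static void report_buffer_full(struct console *c)
{
	assert(c);

	if (c->full_logged)
		return;        /* already reported, stay quiet */

	c->full_logged = true;
	c->log_count++;
	fprintf(stderr, "console: output buffer full\n");
}
```

Keeping the flag per console means one stuck console does not suppress the report for another.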
2017-10-11 | FSP/CONSOLE: Fix fsp_console_write_buffer_space() call | Vasant Hegde | 1 | -1/+35
Kernel calls fsp_console_write_buffer_space() to check console buffer space availability. If there is enough buffer space to write data, then kernel will call fsp_console_write() to write actual data. In some extreme corner cases (like one explained in commit c8a7535f) console becomes full and this function returns 0 to kernel (or space available in console buffer < next incoming data size). Kernel will continue retrying until it gets enough space. So we will start seeing RCU stalls. This patch keeps track of previous available space. If previous space is same as current means not enough space in console buffer to write incoming data. It may be due to very high console write operation and slow response from FSP -OR- FSP has stopped processing data (ex: because of ipmi daemon died). At this point we will start timer with timeout of SER_BUFFER_OUT_TIMEOUT (10 secs). If situation is not improved within 10 seconds means something went bad. Lets return OPAL_RESOURCE so that kernel can drop console write and continue. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart: reset timeout in fsp_console_write() path] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 6557a728385c3fbcef297d2c61c7d93bc539f8bb) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
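The stall detection described above (compare the current space with the previous value, start a timer, give up after the timeout) can be sketched as follows. This is a simplified model under assumptions: `out_buf`, `buffer_space()`, and `RC_RESOURCE` are hypothetical stand-ins, time is passed in explicitly in seconds, and only the 10 second window comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STALL_TIMEOUT_SECS 10  /* the 10 s window from the commit message */
#define RC_RESOURCE (-1)       /* placeholder, not the real OPAL_RESOURCE value */

struct out_buf {
	uint32_t prev_space;    /* space reported by the previous call */
	bool     timer_running; /* true while a suspected stall is being timed */
	uint64_t stall_start;   /* time (seconds) the stall was first seen */
};

/* Returns the available space, or RC_RESOURCE once the stall has timed out. */
static int64_t buffer_space(struct out_buf *b, uint32_t space_now,
			    uint64_t now_secs)
{
	assert(b);

	if (space_now != b->prev_space) {
		/* The consumer made progress: forget any suspected stall. */
		b->prev_space = space_now;
		b->timer_running = false;
		return space_now;
	}

	if (!b->timer_running) {
		/* Same space as last time: start timing the stall. */
		b->timer_running = true;
		b->stall_start = now_secs;
		return space_now;
	}

	if (now_secs - b->stall_start >= STALL_TIMEOUT_SECS)
		return RC_RESOURCE;   /* let the caller drop the write */

	return space_now;
}
```

Returning a distinct error after the timeout gives the caller a way out of its retry loop, which is the behaviour the message is after.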
2017-10-11 | FSP/CONSOLE: Close SOL session during R/R | Vasant Hegde | 1 | -3/+0
Presently we are not closing SOL and FW console sessions during R/R. Host will continue to write to SOL buffer during FSP R/R. If there is heavy console write operation happening during FSP R/R (like running `top` command inside console), then at some point console buffer becomes full. fsp_console_write_buffer_space() returns 0 (or less than required space to write data) to host. While one thread is busy writing to console, if some other threads tries to write data to console we may see RCU stalls (like below) in kernel. kernel call trace: ------------------ [ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769) [ 2082.828365] Task dump for CPU 32: [ 2082.828368] kworker/32:3 R running task 0 4637 2 0x00000884 [ 2082.828375] Workqueue: events dump_work_fn [ 2082.828376] Call Trace: [ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable) [ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0 [ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98 [ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250 [ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0 [ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0 [ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130 [ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68 Hence lets close SOL (and FW console) during FSP R/R. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 9d1755179112071652bb4a317f9006da630ce25d) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP/CONSOLE: Do not associate unavailable console | Vasant Hegde | 1 | -0/+11
Presently OPAL sends associate/unassociate MBOX command for all FSP serial console (like below OPAL message). We have to check console is available or not before sending this message. OPAL log: ------- [ 5013.227994012,7] FSP: Reassociating HVSI console 1 [ 5013.227997540,7] FSP: Reassociating HVSI console 2 Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 175e406ac29435017c5bd08b2c45d93b2c5a7669) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP: Disable PSI link whenever FSP tells OPAL about impending R/R | Vasant Hegde | 1 | -17/+8
Commit 42d5d047 fixed scenario where DPO has been initiated, but FSP went into reset before the CEC power down came in. But this is generic issue that can happen in normal shutdown path as well. Hence disable PSI link as soon as we detect FSP impending R/R. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a4788a49f004a91bb8ca015336abf9ae119fbc52) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | GCC7: fixes for -Wimplicit-fallthrough expected regexes | Stewart Smith | 1 | -1/+1
It turns out GCC7 adds a useful warning and does fancy things like parsing your comments to work out that you intended to do the fallthrough. There's a few places where we don't match the regex. Fix them, as it's harmless to do so. Found by building on Fedora Rawhide in Travis. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit e58aeb1ca304835d65beed98db7f118d3c154cd4) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
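For readers unfamiliar with the warning, the sketch below shows an intentional fallthrough marked with a comment. The function `stages_from()` is a made-up example; `/* fallthrough */` is, as far as I know, one of the comment forms GCC documents as accepted at the default warning level.

```c
#include <assert.h>

/* Count how many cleanup stages run when starting from a given stage. */
static int stages_from(int stage)
{
	int n = 0;

	assert(stage >= 0);

	switch (stage) {
	case 0:
		n++;
		/* fallthrough */
	case 1:
		n++;
		/* fallthrough */
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

The comment has to sit directly before the next `case` label for the compiler to associate it with the fallthrough.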
2017-10-11 | FSP/NVRAM: Handle "get vNVRAM statistics" command | Vasant Hegde | 1 | -0/+41
FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM statistics. OPAL doesn't maintain any such statistics. Hence return FSP_STATUS_INVALID_SUBCMD. Sample OPAL log: [16944.384670488,3] FSP: Unhandled message eb0500 [16944.474110465,3] FSP: Unhandled message eb0500 [16945.111280784,3] FSP: Unhandled message eb0500 [16945.293393485,3] FSP: Unhandled message eb0500 With this patch, I don't think FSP will ever call "free vNVRAM" MBOX command. But to be safer side lets return FSP_STATUS_INVALID_SUBCMD for this MBOX command as well. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 19d4f98e9483e4c1cae1d5a59491d8ab4f9a6e7f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-21 | FSP: Add check to detect FSP R/R inside fsp_sync_msg() | Vasant Hegde | 1 | -2/+11
OPAL sends an MBOX message to the FSP and updates the message state from fsp_msg_queued -> fsp_msg_sent. fsp_sync_msg() queues the message and waits until we get a response from the FSP. During FSP R/R we move outstanding MBOX messages from msgq to rr_queue, including the in-flight message (fsp_reset_cmdclass()), but we do not reset the in-flight message state.

In the extreme corner case where we sent a message to the FSP via the fsp_sync_msg() path and FSP R/R happens before we get a response from the FSP, we end up waiting in fsp_sync_msg() until everything returns to normal.

This patch adds an fsp_in_rr() check to fsp_sync_msg() and returns an error to the caller if the FSP is in R/R.

CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit c74e88e8614de0a82cba5c30812d5aa39db747a9)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
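The shape of the fix, a synchronous wait loop that bails out when a reset/reload is detected, can be modelled as below. Everything here is a hypothetical simulation: `sim`, `poll_once()`, and `sync_wait()` are illustrative names, not the skiboot functions.

```c
#include <assert.h>
#include <stdbool.h>

enum msg_state { MSG_QUEUED, MSG_SENT, MSG_DONE };

struct msg { enum msg_state state; };

/* Simulated environment: counts polls until a reset or a reply occurs. */
struct sim {
	bool in_rr;            /* true once reset/reload has started */
	int  polls_until_rr;   /* reset starts after this many polls, -1 = never */
	int  polls_until_done; /* reply arrives after this many polls, -1 = never */
};

static void poll_once(struct sim *s, struct msg *m)
{
	if (s->polls_until_rr > 0 && --s->polls_until_rr == 0)
		s->in_rr = true;
	if (!s->in_rr && s->polls_until_done > 0 && --s->polls_until_done == 0)
		m->state = MSG_DONE;
}

/* Wait for completion; return -1 instead of spinning through a reset. */
static int sync_wait(struct sim *s, struct msg *m)
{
	assert(s && m);

	while (m->state != MSG_DONE) {
		if (s->in_rr)
			return -1;
		poll_once(s, m);
	}
	return 0;
}
```

Without the `in_rr` check the second scenario in the test below would loop forever, since the reply never arrives.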
2017-06-14 | FSP/CONSOLE: Workaround for unresponsive ipmi daemon | Vasant Hegde | 1 | -1/+17
We use a TCE-mapped area to write data to the console. The console header (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates the next_in pointer in fsp_serbuf_hdr and FSP updates the next_out pointer).

The kernel makes the opal_console_write() OPAL call to write data to the console. OPAL writes the data to the TCE-mapped area and sends an MBOX command to FSP. If our console becomes full and we have data to write to the console, we keep waiting until FSP reads data.

In some corner cases, where FSP is active but not responding to console MBOX messages (due to buggy IPMI) and the kernel is writing heavily to the console, our console buffer eventually becomes full. At this point OPAL starts returning OPAL_BUSY_EVENT to the kernel, and the kernel keeps retrying. This creates kernel soft lockups. In some extreme cases, when every CPU is trying to write to the console, the user is unable to ssh in and thinks the system is hung. If we reset FSP or restart the IPMI daemon on FSP, the system recovers and everything returns to normal.

This patch adds a workaround for the above issue by returning OPAL_HARDWARE when the console is full. The side effect of this patch is that we may end up dropping the latest console data. But it is better to drop console data than to hang the system.

An alternative approach is to drop old data from the console buffer to make space for new data. But under normal conditions only FSP can update the 'next_out' pointer, and if we touch that pointer it may introduce other race conditions. Hence we decided to just drop the new console write request.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit c8a7535f3539c79955645e6b3714b367a994b1e9)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
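The producer/consumer split (one side owns next_in, the other owns next_out) and the "reject the new write when full" policy can be illustrated with a toy ring buffer. This is a sketch under assumptions: the `ring` layout, the buffer size, and `RC_FULL` are hypothetical, and the real buffer format is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE 8
#define RC_FULL (-1)   /* placeholder for the error returned to the caller */

struct ring {
	uint8_t  data[BUF_SIZE];
	uint32_t next_in;   /* advanced only by the producer */
	uint32_t next_out;  /* advanced only by the consumer */
};

/* One slot stays empty so that next_in == next_out means "empty". */
static uint32_t ring_space(const struct ring *r)
{
	return (r->next_out + BUF_SIZE - r->next_in - 1) % BUF_SIZE;
}

/*
 * Producer side. When there is not enough room the new data is rejected;
 * next_out is never touched, so the producer cannot race with the consumer.
 */
static int ring_write(struct ring *r, const uint8_t *src, uint32_t len)
{
	uint32_t i;

	assert(r && src);

	if (len > ring_space(r))
		return RC_FULL;

	for (i = 0; i < len; i++) {
		r->data[r->next_in] = src[i];
		r->next_in = (r->next_in + 1) % BUF_SIZE;
	}
	return (int)len;
}
```

Dropping old data instead would require the producer to move `next_out`, which is exactly the cross-owner update the message argues against.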
2017-06-14 | FSP: Set status field in response message for timed out message | Vasant Hegde | 1 | -1/+4
For timed out FSP messages, we set the message status to "fsp_msg_timeout". But most FSP driver users (like surveillance) ignore this field. They always look for the FSP-returned status value in the callback function (second byte in word1). So we end up treating a timed out message as a success response from FSP.

Sample output:
[69902.432509048,7] SURV: Sending the heartbeat command to FSP
[70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3
....
[70023.226901445,7] SURV: Received heartbeat acknowledge from FSP
[70023.226903251,3] FSP: fsp_trigger_reset() entry

Here the SURV code thought it got a valid response from FSP, but we never actually received one. This patch fixes the issue by updating the status field in the response structure.

CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 4cef4d8d6000936b1a4e1065bf69ee2edd3fcc1f)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14 | FSP: Improve timeout message | Vasant Hegde | 1 | -4/+5
Presently we print word0 and word1 in error log. word0 contains sequence number and command class. One has to understand word0 format to identify command class. Lets explicitly print command class, sub command etc. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 807a3acc8fd66af1e1c6e7154aa5029c9b91bb3b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14 | FSP/RTC: Remove local fsp_in_reset variable | Vasant Hegde | 1 | -10/+0
Now that we are using fsp_in_rr() to detect FSP reset/reload, fsp_in_reset becomes redundant. Let's remove this local variable.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit a34369631e6d85c26966eb0b8d5e4c44bcf96c7c)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14 | FSP/RTC: Fix possible FSP R/R issue in rtc write path | Vasant Hegde | 1 | -9/+11
fsp_opal_rtc_write() checks FSP status before queueing a message to FSP. But if FSP R/R starts before we get a response to the queued message, we continue to return OPAL_BUSY_EVENT to the host. In some extreme conditions the host may hang. Once FSP is back we repost the message, get a response from FSP and return OPAL_SUCCESS to the host.

This patch caches the new values and returns OPAL_SUCCESS if FSP R/R is happening. Once FSP is back we send the cached value to FSP.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit f4757fbfcf616365c74b1aa6508b2ab27480cdd0)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
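The cache-and-replay behaviour can be sketched as a small state holder. All names below (`rtc`, `rtc_write()`, `rtc_rr_complete()`, `send_to_sp()`, `RC_SUCCESS`) are hypothetical and only model the idea; they are not the skiboot RTC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RC_SUCCESS 0   /* placeholder for the success code given to the host */

struct rtc {
	bool     in_rr;        /* service processor is in reset/reload */
	bool     have_pending; /* a cached write is waiting to be sent */
	uint64_t cached;       /* most recent value requested by the host */
	uint64_t sent;         /* last value actually sent */
	int      sends;        /* number of sends performed */
};

/* Stand-in for queueing a message to the service processor. */
static void send_to_sp(struct rtc *r, uint64_t value)
{
	r->sent = value;
	r->sends++;
}

/* Host write: during reset/reload, cache the value and report success. */
static int rtc_write(struct rtc *r, uint64_t value)
{
	assert(r);

	r->cached = value;
	if (r->in_rr) {
		r->have_pending = true;
		return RC_SUCCESS;
	}
	send_to_sp(r, value);
	return RC_SUCCESS;
}

/* Called when reset/reload completes: replay the latest cached value. */
static void rtc_rr_complete(struct rtc *r)
{
	assert(r);

	r->in_rr = false;
	if (r->have_pending) {
		send_to_sp(r, r->cached);
		r->have_pending = false;
	}
}
```

Only the latest cached value is replayed, so several writes issued during the reset collapse into a single send.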
2017-06-14 | hw/fsp/rtc: read/write cached rtc tod on fsp hir. | ppaidipe@linux.vnet.ibm.com | 1 | -2/+2
Currently fsp-rtc reads/writes the cached RTC TOD on an fsp reset. Use latest fsp_in_rr() function to properly read the cached rtc value when fsp reset initiated by the hir. Below is the kernel trace when we set hw clock, when hir process starts. [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688] [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000 [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70 [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic) [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 1727.775889] CR: 28024442 XER: 20000000 [ 1727.775890] CFAR: c00000000008472c SOFTE: 1 GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4 GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000 GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003 GPR12: c0000000000846e8 c00000000fba0100 [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48 [ 1727.775899] Call Trace: [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable) [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0 [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630 [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0 [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0 [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0 [ 1727.775908] Instruction dump: [ 
1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020 [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4 This is found when executing the testcase https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py With this fix ran fsp hir torture testcase in the above test which is working fine. Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 447ccc4de529f001271fd4dfd78401bc4c90832e) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14 | FSP/CHIPTOD: Return false in error path | Vasant Hegde | 1 | -0/+1
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 740d00b1036188c6e248418fb0a13faf14723e7a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-09 | FSP: Notify FSP of Platform Log ID after Host Initiated Reset Reload | Stewart Smith | 2 | -22/+50
Triggering a Host Initiated Reset (when the host detects the FSP has gone out to lunch and should be rebooted) would cause "Unknown Command" messages to appear in the OPAL log. This patch implements those messages.

How to trigger FSP RR(HIR):
$ putmemproc 300000f8 0x00000000deadbeef s1 k0:n0:s0:p00 ecmd_ppc putmemproc 300000f8 0x00000000deadbeef

Log showing unknown command:
/ # cat /sys/firmware/opal/msglog | grep -i ,3
[ 110.232114723,3] FSP: fsp_trigger_reset() entry
[ 188.431793837,3] FSP #0: Link down, starting R&R
[ 464.109239162,3] FSP #0: Got XUP with no pending message !
[ 466.340598554,3] FSP-DPO: Unknown command 0xce0900
[ 466.340600126,3] FSP: Unhandled message ce0900

The message we need to handle is "Get PLID after host initiated FipS reset/reload". When the FSP comes back from HIR, it asks "hey, so, which error log explains why you rebooted me?". So, we tell it.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit f3a5741408a11be6992cf8779f2eae10b08c020a)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-19 | hw/i2c: Fix early lock drop | Oliver O'Halloran | 1 | -2/+0
When interacting with an I2C master the p8-i2c driver (common to p9) acquires a per-master lock which it holds for the duration of its interaction with the master. Unfortunately, when p8_i2c_check_initial_status() detects that the master is busy with another transaction it drops the lock and returns OPAL_BUSY. This is contrary to the driver's locking strategy, which requires that the caller acquire and drop the lock. This leads to a crash due to the double unlock(), which skiboot treats as fatal.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit bb192fd55ffb20d619101c5e3e1f4fd24f844d11)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
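The locking rule at stake, where the caller takes and releases the lock and helpers never release it, can be shown with a toy lock that asserts on a double unlock. The names `toy_lock`, `master`, `check_initial_status()`, and `start_request()` are hypothetical and only mirror the structure described in the message.

```c
#include <assert.h>
#include <stdbool.h>

#define RC_OK   0
#define RC_BUSY (-2)   /* placeholder, not the real OPAL_BUSY value */

struct toy_lock { bool held; };

static void toy_lock(struct toy_lock *l)   { assert(!l->held); l->held = true; }
static void toy_unlock(struct toy_lock *l) { assert(l->held);  l->held = false; }

struct master {
	struct toy_lock lock;
	bool busy;   /* another transaction is in progress */
};

/* Helper: called with the lock held and returns with it still held. */
static int check_initial_status(struct master *m)
{
	assert(m->lock.held);
	return m->busy ? RC_BUSY : RC_OK;   /* note: no unlock on the busy path */
}

/* Caller owns the lock, so there is exactly one release point. */
static int start_request(struct master *m)
{
	int rc;

	toy_lock(&m->lock);
	rc = check_initial_status(m);
	toy_unlock(&m->lock);
	return rc;
}
```

If the helper released the lock on its busy path, the `toy_unlock()` in the caller would trip the assertion, which is the analogue of the fatal double unlock.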
2017-03-16 | hw/fsp: Do not queue SP and SPCN class messages during reset/reload | Ananth N Mavinakayanahalli | 3 | -0/+31
During FSP R/R, the FSP is inaccessible and will lose state. Messages to the FSP are generally queued for sending later. It does seem like the FSP fails to process any subsequent messages of certain classes (SP info -- ipmi) if it receives queued mbox messages it isn't expecting. In certain other cases (sensors), the FSP driver returns a default code (async completion) even though there is no known bound from the time of this error return to the actual data being available. The kernel driver keeps waiting, leading to a soft lockup on the host side.

Mitigate both these (known) cases by returning OPAL_BUSY so the host driver knows to retry later. With this change, the sensors command works fine when the FSP comes back.

This version also resolves the remaining IPMI issues.

Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 4940b8148640c06e139aec8c6d0370af7dd3b184)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-22 | phb3: Lock the PHB on set_xive callbacks | Benjamin Herrenschmidt | 1 | -0/+8
Those are called by the interrupts core and thus skip the locking implicit in the PCI opal calls. However IODA table access can be racy, so make sure we lock the PHB. Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 55af871041a4b09e53013671450980bdb36f91e3) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-14 | hw/phb3: fix error handling in complete reset | Andrew Donnellan | 1 | -2/+1
During a complete reset, when we get a timeout waiting for pending transaction in state PHB3_STATE_CRESET_WAIT_CQ, we mark the PHB as permanently broken. Set the state to PHB3_STATE_FENCED so that the kernel can retry the complete reset. Reported-by: Pradipta Ghosh <pradghos@in.ibm.com> Suggested-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 7fe3de438b19545471d2fb72e54ed01a40b12706) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-02 | p8-i2c reset things manually in some error conditions | Stewart Smith | 1 | -22/+157
It appears that our reset code wasn't entirely correct, and what we're meant to do is reset each port and wait for command complete. In the event where that fails, we can then bitbang things to recover to a state where at least the i2c engine isn't in a weird state.

Practically, this means that "i2cdetect -y 10; i2cdetect -y 10" (where 10 is the bus where a TPM is attached, typically p8e1p2) doesn't hard lock the machine (things are still bad and you won't reboot successfully, but it's *better*).

One downside to this patch is that we spend a *long* time in OPAL (tens of ms) when doing the reset. This is something that we really need to fix, as it's not at all nice. The full fix for this though will involve changing a decent chunk of the p8-i2c code, as we don't want to write *any* registers while doing this extended reset (while existing code checks status a bit later).

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit cf6ec98fe79c59fd5de5c5a77917913af8d2cede)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-24 | slw: do SLW timer testing while holding xscom lock | Stewart Smith | 2 | -9/+29
We add some routines that let a caller get the xscom lock once and then do a bunch of xscoms while holding it. In some situations without this, it could take long enough to get the xscom lock that the 1ms timeout would expire and we'd falsely think the SLW timer didn't work when in fact it did. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a5761fb4585520983716d17fdb33f04891cf0479) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
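The pattern of taking a lock once and then performing several accesses through "already locked" variants can be sketched as below. This is a toy model with hypothetical names (`bus`, `bus_read_locked()`, `bus_write_locked()`, `timer_selftest()`); it does not reproduce the xscom API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NREGS 4

/* Toy register bus; every name here is hypothetical. */
struct bus {
	bool     locked;
	int      acquisitions;   /* how many times the lock was taken */
	uint64_t regs[NREGS];
};

static void bus_lock(struct bus *b)
{
	assert(!b->locked);
	b->locked = true;
	b->acquisitions++;
}

static void bus_unlock(struct bus *b)
{
	assert(b->locked);
	b->locked = false;
}

/* "_locked" variants assume the caller already holds the lock. */
static void bus_write_locked(struct bus *b, unsigned int reg, uint64_t val)
{
	assert(b->locked && reg < NREGS);
	b->regs[reg] = val;
}

static uint64_t bus_read_locked(struct bus *b, unsigned int reg)
{
	assert(b->locked && reg < NREGS);
	return b->regs[reg];
}

/* Program a register, start the test, and read back under one lock hold. */
static bool timer_selftest(struct bus *b)
{
	bool ok;

	bus_lock(b);
	bus_write_locked(b, 0, 0x1234);
	bus_write_locked(b, 1, 1);              /* "start" */
	ok = bus_read_locked(b, 0) == 0x1234;   /* no lock wait in between */
	bus_unlock(b);
	return ok;
}
```

Because the lock is taken once, time spent waiting for it cannot fall between the timed accesses and be mistaken for a hardware timeout.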
2016-11-24 | p8i2c: Use calculated poll_interval when booting OPAL | Stewart Smith | 1 | -10/+13
Otherwise we'd default to 2seconds (TIMER_POLL) during boot on chips with a functional i2c interrupt, leading to slow i2c during boot (or hitting timeouts instead). Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 6c077e9ed08c7af7db56ef6f334d204f78e6de8d) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-24 | i2c: Add i2c_run_req() to crank the state machine for a request | Stewart Smith | 1 | -0/+18
Doing everything asynchronously is brilliant, it's exactly what we want to do. Except... the tpm driver wants to do things synchronously, which isn't so cool. For reasons that are not yet completely known, we spend an awful lot of time in the main thread *not* running pollers (potentially seconds), which doesn't bode well for I2C timeouts. Since the TPM measure is done in a secondary thread, we do *not* run pollers there either (as of 323c8aeb54bd4e0b9004091fcbb4a9daeda2f576 - which is roughly as of skiboot 2.1.1). But we still need to crank the i2c state machine, so we introduce a call to do just that. It will return how long the poll interval should be, so that we can time_wait() for a more appropriate time for whatever i2c implementation is sitting behind things. Without this, it was "easy" to get to a situation where the i2c state machine wasn't cranked at all, and you'd hit the i2c timeout (for the issued operation) before the poller to crank i2c was ever called. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Tested-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 441ddb9b4719a092bbbf81fcf775632f282a3fca) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
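A synchronous caller that cranks a request by hand and waits for the interval the state machine suggests could look like the sketch below. The names `request`, `run_req()`, `wait_for_req()`, and the interval value are hypothetical; the real i2c_run_req() signature is not shown in this log, and the wait is simulated by summing delays rather than sleeping.

```c
#include <assert.h>
#include <stdint.h>

#define POLL_INTERVAL_US 500   /* illustrative value only */

struct request {
	int steps_left;   /* state machine steps still to run */
};

/*
 * Advance the request by one step. Returns the suggested delay before
 * the next call, or 0 once the request has finished.
 */
static uint64_t run_req(struct request *r)
{
	assert(r);

	if (r->steps_left > 0)
		r->steps_left--;
	return r->steps_left > 0 ? POLL_INTERVAL_US : 0;
}

/* Synchronous wrapper: crank until done and return the total delay. */
static uint64_t wait_for_req(struct request *r)
{
	uint64_t waited = 0, delay;

	while ((delay = run_req(r)) != 0)
		waited += delay;   /* a real caller would wait for 'delay' here */
	return waited;
}
```

Letting the state machine pick the delay keeps the caller independent of whichever implementation sits behind the request.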
2016-11-24 | fsp: Don't recurse pollers in ibm_fsp_terminate | Stewart Smith | 2 | -1/+38
If we were to terminate in a poller, we'd call op_display() which called pollers which hit the recursive poller warning, which ended in not much fun at all. This patch will skip the running of pollers and instead run the FSP poller to set the op-panel display before attn. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 9fcb109218b1374a8caa3cac62e83fbedb1f7f2f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-08 | bmc_platform: fail PNOR access request if no bmc *before* we reserve it | Stewart Smith | 1 | -4/+3
Fixes: 5611389876a748e19b7593d4eb426ced7a6ed31f Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-08 | Add BMC platform to enable correct OEM IPMI commands | Stewart Smith | 1 | -2/+19
An out of tree platform (p8dtu) uses a different IPMI OEM command for IPMI_PARTIAL_ADD_ESEL. This exposed some assumptions about the BMC implementation in our core code. Now, with platform.bmc, each platform can dictate (or detect) the BMC that is present. We allow it to be set at runtime rather than purely statically in struct platform as it's possible to have differing BMC implementations on the one machine (e.g. AMI BMC or OpenBMC). Acked-by: Jeremy Kerr <jk@ozlabs.org> [stewart@linux.vnet.ibm.com: remove enum, update (C) years] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-08 | hw/ipmi-sensor: Fix setting of firmware progress sensor properly. | Pridhiviraj Paidipeddi | 1 | -2/+42
Currently Hostboot populates the /bmc/sensors dt node and the corresponding sensors only for BMC platforms. For FSP platforms Hostboot does not populate any FSP sensors (management sensors), and no firmware progress sensor exists on FSP platforms. As a result OPAL incorrectly sets the firmware status on sensor id "00", which does not exist.

On a FSP system:
cat /sys/firmware/opal/msglog | grep -i setting
[ 21.189204883,6] IPMI: setting fw progress sensor 00 to 07
[ 21.189559121,6] IPMI: setting fw progress sensor 00 to 13
cat /sys/firmware/opal/msglog | grep -i skiboot
[ 84.127416495,5] SkiBoot skiboot-5.4.0-rc3 starting...

On a BMC system:
cat /sys/firmware/opal/msglog | grep -i setting
[ 3.166286901,6] IPMI: setting fw progress sensor 05 to 14
[ 14.259153338,6] IPMI: setting fw progress sensor 05 to 07
[ 14.469070593,5] IPMI: Resetting boot count on successful boot
[ 15.001210324,6] IPMI: setting fw progress sensor 05 to 13

So this patch fixes the incorrect setting on an FSP system, and also sets the sensor only if OPAL initialises IPMI sensors and the corresponding sensor exists for a given sensor type in the device tree.

After patch:

On a FSP system:
cat /sys/firmware/opal/msglog | grep -i setting

On a BMC system:
cat /sys/firmware/opal/msglog | grep -i setting
[ 3.164859816,6] IPMI: setting fw progress sensor 05 to 14
[ 14.024941077,6] IPMI: setting fw progress sensor 05 to 07
[ 14.211514767,5] IPMI: Resetting boot count on successful boot
[ 14.252554375,6] IPMI: setting fw progress sensor 05 to 13

Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[stewart@linux.vnet.ibm.com: return OPAL_UNSUPPORTED on !sensors_present, make ipmi_sensor_type_present() static in ipmi-sensor.c]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25 | errorlog: Removal of elog_reject_head() from 'opal_kexec_elog_notify' routine | Mukesh Ojha | 1 | -1/+0
elog_reject_head() routine makes the state 'elog_read_from_fsp_head_state' either 'ELOG_STATE_REJECTED' or 'ELOG_STATE_NONE' depending on the current state of 'elog_read_from_fsp_head_state'. We can remove this elog_reject_head() from 'opal_kexec_elog_notify()' as just after that it is called inside 'fsp_opal_resend_pending_logs()'. So, it is redundant inside opal_kexec_elog_notify() routine. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25 | errorlog: Remove the elog enable check from 'fsp_elog_check_and_fetch_head' | Mukesh Ojha | 1 | -3/+0
We use the 'elog_enabled' flag to check whether the host OS is ready to receive error logs. This has nothing to do with reading error logs from the service processor. This patch removes the check, keeping 'elog_enabled' free of FSP-specific code so that it can move into core/errorlog.c in upcoming patches.

With this change, in some corner cases we may end up reading the same error log twice from FSP. This happens because we call 'elog_reject_head' inside 'fsp_opal_resend_pending_logs', which sets the state to either 'ELOG_STATE_REJECTED' or 'ELOG_STATE_NONE'. So a call to the 'fsp_elog_check_and_fetch_head' routine ends up reading an error log from FSP that was already read. This case happens twice in a reboot, whenever 'fsp_opal_resend_pending_logs' gets called. So, we can ignore it.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25 | errorlog : Modification as per coding guidelines to make the code more legible | Mukesh Ojha | 3 | -173/+188
Fix typos, alignment, and letter-case mismatches to make the code clearer. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25 | fast-reboot: disable on FSP code update or unrecoverable HMI | Stewart Smith | 1 | -0/+3
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [stewart@linux.vnet.ibm.com: unlock before return (suggested by Mahesh/Andrew), disable only on non-cancelling fsp codeupdate call (suggested by Vasant)] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-24 | pci: Remove obsoleted PCI slot pfreset() operation | Gavin Shan | 4 | -39/+30
The PCI slot pfreset() operation is obsolete as nobody uses it. This removes it and the related PCI slot states. No functional changes are introduced. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-17 | hw/phb3: Override root slot's prepare_link_change() with PHB's | Gavin Shan | 1 | -0/+8
For a PCI slot behind the root port, prepare_link_change() should be the same as the PHB's. Otherwise, the UTL events cannot be masked when the slot is reset, leading to an EEH error because of the UTL link-down event. Cc: stable # 5.3.0+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
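The override can be pictured as swapping one entry in a per-slot operations table. The following is a sketch under assumptions: only the name prepare_link_change() comes from the message, while the structures, the `sketch_*` names and the single boolean standing in for the UTL event mask are hypothetical.

```c
#include <stdbool.h>

struct sketch_slot;

/* Hypothetical per-slot operations table. */
struct sketch_slot_ops {
	void (*prepare_link_change)(struct sketch_slot *slot, bool is_up);
};

struct sketch_slot {
	struct sketch_slot_ops ops;
	bool utl_events_masked;	/* stands in for the PHB's UTL event mask */
};

/* Generic handler: knows nothing about PHB-level UTL events. */
static void generic_prepare_link_change(struct sketch_slot *slot, bool is_up)
{
	(void)slot;
	(void)is_up;
}

/* PHB handler: mask UTL events before the link drops, unmask once it is up. */
static void phb_prepare_link_change(struct sketch_slot *slot, bool is_up)
{
	slot->utl_events_masked = !is_up;
}

/* A slot behind the root port gets the PHB's handler instead of the generic one. */
static void sketch_slot_init(struct sketch_slot *slot, bool behind_root_port)
{
	slot->utl_events_masked = false;
	slot->ops.prepare_link_change = generic_prepare_link_change;
	if (behind_root_port)
		slot->ops.prepare_link_change = phb_prepare_link_change;
}
```

In this model a reset of the root slot goes through the PHB handler, so the link-down event is masked before the link drops and is not reported as an error.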
2016-10-17 | hw/phb3: Disable surprise link down event on PCI slots | Gavin Shan | 1 | -8/+49
This masks the surprise link down event on RC or downstream ports if the PCI slots behind them support PCI surprise hotplug. The event should be handled by the PCI hotplug driver instead of the EEH subsystem. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-17 | Fast reboot for P8 | Benjamin Herrenschmidt | 5 | -62/+61
This is an experimental patch that implements "Fast reboot" on P8 machines.

The basic idea is that when the OS calls OPAL reboot, we gather all the threads in the system using a combination of patching the reset vector and soft-resetting them, then clean up a few bits of hardware (we do re-probe PCIe, for example), and reload and restart the bootloader.

For Trusted Boot, this means we *add* measurements to the TPM, so you will get *different* PCR values compared to a full IPL. This makes sense: if you want to be sure you are running something known, do a full IPL, as soft reset should never be trusted to clear any malicious code.

This is very experimental and needs a lot of testing, and also auditing of code for other bits of HW that might need to be cleaned up.

BenH TODO: I also need to check if we are properly PERST'ing PCI devices.

This is partially based on old code I had to do that on P7. I only support it on P8 though, as there are issues with the PSI interrupts on P7 that cannot be reliably solved.

Even though this should be considered somewhat experimental, we've had a lot of success on a variety of machines: dozens/hundreds of reboots across Tuleta, Garrison and Habanero. Currently, we've hidden it behind an NVRAM config option, which *is* liable to change in the future (to ensure that only those who know what they're doing enable it).

You can enable the experimental support via the nvram option:

  nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[stewart@linux.vnet.ibm.com: hide behind nvram option, include Mambo fixes from Mikey]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
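A minimal sketch of gating a feature on that config option, assuming the configuration is available as an array of "key=value" strings. The helper names are hypothetical and this is not the firmware's real NVRAM API; only the key and value strings are taken from the command above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup over "key=value" strings; not the real nvram API. */
static bool sketch_config_eq(const char *const *cfg, size_t n,
			     const char *key, const char *value)
{
	size_t klen = strlen(key);

	for (size_t i = 0; i < n; i++) {
		const char *entry = cfg[i];

		/* Require "key=" exactly, then compare the whole value. */
		if (strncmp(entry, key, klen) == 0 && entry[klen] == '=')
			return strcmp(entry + klen + 1, value) == 0;
	}
	return false;
}

/* Fast reboot stays off unless the option is set to the exact opt-in value. */
static bool sketch_fast_reboot_allowed(const char *const *cfg, size_t n)
{
	return sketch_config_eq(cfg, n, "experimental-fast-reset",
				"feeling-lucky");
}
```

Requiring an exact value rather than any non-empty setting keeps the default path a full IPL unless the user opts in explicitly.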
2016-10-12 | pci: Avoid hot resets at boot time | Russell Currey | 3 | -12/+4
In the PCI post-fundamental reset code, a hot reset is performed at the end. At boot time this sends a reset signal downstream before the links are up, which causes problems on adapters behind switches. No errors result in skiboot, but the adapters are not usable in Linux as a result. Hot resets also occur in the FSP platform-specific code for conventional PCI slots, which could cause issues. This patch fixes some adapters not being configurable in Linux on some systems. The issue was not present in skiboot 5.2.x. Cc: stable # 5.3.x Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-11 | hw/phb3: Disable ECRC on Broadcom adapter behind PMC switch | Gavin Shan | 1 | -2/+24
ECRC generation and checking can't be enabled on Broadcom's NIC (14e4:168a) when it sits behind a PMC PCIe switch downstream port (11f8:8546). Otherwise, the NIC's config space cannot be accessed and reads return 0xFF's because of an EEH error, even after the error is cleared. The issue was reported on Firestone. This disables ECRC generation and checking on Broadcom's NIC when it sits behind a PMC PCIe switch downstream port. With this applied, the NIC can be detected successfully. Reported-by: Li Meng <shlimeng@cn.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: add description of device workaround is for] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
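The quirk amounts to matching a device and its upstream port by vendor and device ID. A hedged sketch follows: the IDs are the ones quoted in the message, while the structure, macro and function names are hypothetical and do not reflect the PHB3 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a PCI device and its upstream port. */
struct sketch_pci_dev {
	uint16_t vendor;
	uint16_t device;
	const struct sketch_pci_dev *parent;	/* NULL at the root */
};

/* IDs quoted in the commit message. */
#define SKETCH_BCM_VENDOR	0x14e4
#define SKETCH_BCM_NIC_DEVICE	0x168a
#define SKETCH_PMC_VENDOR	0x11f8
#define SKETCH_PMC_SWITCH_DEV	0x8546

/* ECRC is allowed unless the Broadcom NIC sits directly behind the PMC port. */
static bool sketch_ecrc_allowed(const struct sketch_pci_dev *pd)
{
	const struct sketch_pci_dev *up = pd->parent;

	if (pd->vendor != SKETCH_BCM_VENDOR ||
	    pd->device != SKETCH_BCM_NIC_DEVICE)
		return true;
	if (up == NULL)
		return true;

	return !(up->vendor == SKETCH_PMC_VENDOR &&
		 up->device == SKETCH_PMC_SWITCH_DEV);
}
```

Keying the check on the NIC and switch pair leaves ECRC enabled for the same NIC behind other switches.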
2016-10-11 | Revert "hw/phb3.c: adjust offset to run CAPP containers" | Stewart Smith | 1 | -22/+2
This reverts commit cf39c2a7dd1a2ee9b19a5490f7fa25690b8e8ae3. Fixes: cf39c2a7dd1a2ee9b19a5490f7fa25690b8e8ae3 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-11 | Revert "hw/phb3.c: preload the whole CAPP partition" | Stewart Smith | 1 | -2/+2
We should use the API properly. This reverts commit 0657bccb778cbe71fc8c00879826ca0217b7010d. Fixes: 0657bccb778cbe71fc8c00879826ca0217b7010d Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-10 | hw/phb3.c: adjust offset to run CAPP containers | Claudio Carvalho | 1 | -2/+22
This adjusts the CAPP header offset if CAPP is a secure boot container. Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-10 | hw/phb3.c: preload the whole CAPP partition | Claudio Carvalho | 1 | -2/+2
This change preloads the whole CAPP partition. We decided to build a container for the whole CAPP lid as opposed to having one for the TOC and one for each subpartition. Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-26 | occ/prd/opal-prd: Queue OCC_RESET event message to host in OpenPOWER | Shilpasri G Bhat | 2 | -23/+37
During an OCC reset cycle the system is forced to the Psafe pstate. When the OCC becomes active, the system has to be restored to the last pstate requested by the host. So the host needs to be notified of the OCC_RESET event, or else the system will remain in the Psafe state until the host requests a new pstate after the OCC reset cycle. This patch defines 'OPAL_PRD_MSG_TYPE_OCC_RESET_NOTIFY' to notify OPAL when opal-prd issues an OCC reset. OPAL will queue an OCC_RESET message to the host when it receives an opal_prd_msg of type '*_OCC_RESET_NOTIFY'. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
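The notification flow can be sketched as a dispatch on the incoming message type. Everything below is an assumption for illustration: the enum values, the fixed-depth queue and the `sketch_*` names are hypothetical and do not reflect the real opal-prd message ABI.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message types; values are placeholders, not the real ABI. */
enum sketch_prd_msg_type {
	SKETCH_PRD_MSG_OTHER = 0,
	SKETCH_PRD_MSG_OCC_RESET_NOTIFY = 1,
};

enum sketch_host_msg {
	SKETCH_HOST_MSG_OCC_RESET = 1,
};

#define SKETCH_QUEUE_DEPTH 4

/* Hypothetical queue of messages waiting to be delivered to the host. */
struct sketch_host_queue {
	enum sketch_host_msg msgs[SKETCH_QUEUE_DEPTH];
	size_t count;
};

/* Append a message for the host; fails when the queue is full. */
static bool sketch_queue_host_msg(struct sketch_host_queue *q,
				  enum sketch_host_msg m)
{
	if (q->count >= SKETCH_QUEUE_DEPTH)
		return false;
	q->msgs[q->count++] = m;
	return true;
}

/* Returns true if the message resulted in a host notification. */
static bool sketch_handle_prd_msg(struct sketch_host_queue *q,
				  enum sketch_prd_msg_type type)
{
	switch (type) {
	case SKETCH_PRD_MSG_OCC_RESET_NOTIFY:
		return sketch_queue_host_msg(q, SKETCH_HOST_MSG_OCC_RESET);
	default:
		return false;
	}
}
```

In this model, once the host dequeues the OCC_RESET message it can re-request its last pstate instead of remaining in Psafe.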
2016-09-22 | fsp/console: Allocate irq for each hvc console | Sam Mendoza-Jonas | 1 | -5/+21
Allocate an irq number for each hvc console and set its interrupt-parent property so that Linux can use the opal irqchip instead of the OPAL_EVENT_CONSOLE_INPUT interface. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-17 | SLW: Actually print the register dump only to memory | Stewart Smith | 1 | -1/+1
Fixes: 81154ba9b2d418cd5f9eda3a6f89ca6631556510 Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-16 | fsi-master: Whitespace cleanups | Benjamin Herrenschmidt | 1 | -5/+5
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>