aboutsummaryrefslogtreecommitdiff
path: root/hw/fsp
AgeCommit message (Collapse)AuthorFilesLines
2020-10-22FSP/NVRAM: Do not assert in vNVRAM statistics callVasant Hegde1-2/+1
[ Upstream commit 9ca8bf1bde56330075634bd3cb601d0f6ee90514 ] `msg` is valid pointer here. I don't recall why I added assert here :-( This is not correct. We shouldn't call assert here. Also we are not using `msg`. Hence convert it to `__unused`. Fixes: 19d4f98e ('FSP/NVRAM: Handle "get vNVRAM statistics" command') Cc: skiboot-stable@lists.ozlabs.org # v5.4.x + Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-11-30FSP/IPMI: Handle FSP reset reloadVasant Hegde1-0/+34
[ Upstream commit 2a63db6511b63a75efe820f90bb7972afc2fcdef ] FSP IPMI driver serializes ipmi messages. It sends message to FSP and waits for response before sending new message. It works fine as long as we get response from FSP on time. If we have inflight ipmi message during FSP R/R, we will not get resonse from FSP. So if we initiate inband FSP R/R then all subsequent inband ipmi message gets blocked. Sequence: - ipmitool mc reset cold - <FSP R/R complete> - ipmitool <any command> <-- gets blocked This patch clears inflight ipmi messages after FSP R/R complete. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2017-10-31FSP/CONSOLE: Disable notification on unresponsive consolesVasant Hegde1-3/+5
Commit fd6b71fc fixed the situation where ipmi console was open (hvc0) but got data on different console (hvc1). During FSP R/R OPAL closes all consoles. After R/R complete FSP requests to open hvc1 and sends data on this. If hvc1 registration failed or not opened in host kernel then it will not read data and results in RCU stalls. Note that this is workaround for older kernel where we don't have separate irq for each console. Latest kernel works fine without this patch. CC: stable CC: Sam Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c9cc5ef5772ebfbfff978f1c25763d733e45752c) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/CONSOLE: Limit number of error loggingVasant Hegde1-8/+13
Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added error logging when buffer is full. In some corner cases kernel may call this function multiple time and we may endup logging error again and again. This patch fixes it by generating error log only once. I think this is enough to indicate something went wrong. Also with previous patch, once console buffer is full, OPAL is returning error to payload from fsp_console_write_buffer_space(). So payload will never call fsp_console_write(). Hence move error logging logic to right place. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit d798c276b4da7559970702c2f31b12549c92741e) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/CONSOLE: Fix fsp_console_write_buffer_space() callVasant Hegde1-1/+35
Kernel calls fsp_console_write_buffer_space() to check console buffer space availability. If there is enough buffer space to write data, then kernel will call fsp_console_write() to write actual data. In some extreme corner cases (like one explained in commit c8a7535f) console becomes full and this function returns 0 to kernel (or space available in console buffer < next incoming data size). Kernel will continue retrying until it gets enough space. So we will start seeing RCU stalls. This patch keeps track of previous available space. If previous space is same as current means not enough space in console buffer to write incoming data. It may be due to very high console write operation and slow response from FSP -OR- FSP has stopped processing data (ex: because of ipmi daemon died). At this point we will start timer with timeout of SER_BUFFER_OUT_TIMEOUT (10 secs). If situation is not improved within 10 seconds means something went bad. Lets return OPAL_RESOURCE so that kernel can drop console write and continue. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart: reset timeout in fsp_console_write() path] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 6557a728385c3fbcef297d2c61c7d93bc539f8bb) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/CONSOLE: Close SOL session during R/RVasant Hegde1-3/+0
Presently we are not closing SOL and FW console sessions during R/R. Host will continue to write to SOL buffer during FSP R/R. If there is heavy console write operation happening during FSP R/R (like running `top` command inside console), then at some point console buffer becomes full. fsp_console_write_buffer_space() returns 0 (or less than required space to write data) to host. While one thread is busy writing to console, if some other threads tries to write data to console we may see RCU stalls (like below) in kernel. kernel call trace: ------------------ [ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769) [ 2082.828365] Task dump for CPU 32: [ 2082.828368] kworker/32:3 R running task 0 4637 2 0x00000884 [ 2082.828375] Workqueue: events dump_work_fn [ 2082.828376] Call Trace: [ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable) [ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0 [ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98 [ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250 [ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0 [ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0 [ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130 [ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68 Hence lets close SOL (and FW console) during FSP R/R. CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 9d1755179112071652bb4a317f9006da630ce25d) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/CONSOLE: Do not associate unavailable consoleVasant Hegde1-0/+11
Presently OPAL sends associate/unassociate MBOX command for all FSP serial console (like below OPAL message). We have to check console is available or not before sending this message. OPAL log: ------- [ 5013.227994012,7] FSP: Reassociating HVSI console 1 [ 5013.227997540,7] FSP: Reassociating HVSI console 2 Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 175e406ac29435017c5bd08b2c45d93b2c5a7669) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP: Disable PSI link whenever FSP tells OPAL about impending R/RVasant Hegde1-17/+8
Commit 42d5d047 fixed scenario where DPO has been initiated, but FSP went into reset before the CEC power down came in. But this is generic issue that can happen in normal shutdown path as well. Hence disable PSI link as soon as we detect FSP impending R/R. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a4788a49f004a91bb8ca015336abf9ae119fbc52) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11GCC7: fixes for -Wimplicit-fallthrough expected regexesStewart Smith1-1/+1
It turns out GCC7 adds a useful warning and does fancy things like parsing your comments to work out that you intended to do the fallthrough. There's a few places where we don't match the regex. Fix them, as it's harmless to do so. Found by building on Fedora Rawhide in Travis. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit e58aeb1ca304835d65beed98db7f118d3c154cd4) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/NVRAM: Handle "get vNVRAM statistics" commandVasant Hegde1-0/+41
FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM statistics. OPAL doesn't maintain any such statistics. Hence return FSP_STATUS_INVALID_SUBCMD. Sample OPAL log: [16944.384670488,3] FSP: Unhandled message eb0500 [16944.474110465,3] FSP: Unhandled message eb0500 [16945.111280784,3] FSP: Unhandled message eb0500 [16945.293393485,3] FSP: Unhandled message eb0500 With this patch, I don't think FSP will ever call "free vNVRAM" MBOX command. But to be safer side lets return FSP_STATUS_INVALID_SUBCMD for this MBOX command as well. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 19d4f98e9483e4c1cae1d5a59491d8ab4f9a6e7f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-21FSP: Add check to detect FSP R/R inside fsp_sync_msg()Vasant Hegde1-2/+11
OPAL sends MBOX message to FSP and updates message state from fsp_msg_queued -> fsp_msg_sent. fsp_sync_msg() queues message and waits until we get response from FSP. During FSP R/R we move outstanding MBOX messages from msgq to rr_queue including inflight message (fsp_reset_cmdclass()). But we are not resetting inflight message state. In extreme croner case where we sent message to FSP via fsp_sync_msg() path and FSP R/R happens before getting respose from FSP, then we will endup waiting in fsp_sync_msg() until everything becomes normal. This patch adds fsp_in_rr() check to fsp_sync_msg() and return error to caller if FSP is in R/R. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c74e88e8614de0a82cba5c30812d5aa39db747a9) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/CONSOLE: Workaround for unresponsive ipmi daemonVasant Hegde1-1/+17
We use TCE mapped area to write data to console. Console header (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer). Kernel makes opal_console_write() OPAL call to write data to console. OPAL write data to TCE mapped area and sends MBOX command to FSP. If our console becomes full and we have data to write to console, we keep on waiting until FSP reads data. In some corner cases, where FSP is active but not responding to console MBOX message (due to buggy IPMI) and we have heavy console write happening from kernel, then eventually our console buffer becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to kernel. Kernel will keep on retrying. This is creating kernel soft lockups. In some extreme case when every CPU is trying to write to console, user will not be able to ssh and thinks system is hang. If we reset FSP or restart IPMI daemon on FSP, system recovers and everything becomes normal. This patch adds workaround to above issue by returning OPAL_HARDWARE when cosole is full. Side effect of this patch is, we may endup dropping latest console data. But better to drop console data than system hang. Alternative approach is to drop old data from console buffer, make space for new data. But in normal condition only FSP can update 'next_out' pointer and if we touch that pointer, it may introduce some other race conditions. Hence we decided to just new console write request. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c8a7535f3539c79955645e6b3714b367a994b1e9) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP: Set status field in response message for timed out messageVasant Hegde1-1/+4
For timed out FSP messages, we set message status as "fsp_msg_timeout". But most FSP driver users (like surviellance) are ignoring this field. They always look for FSP returned status value in callback function (second byte in word1). So we endup treating timed out message as success response from FSP. Sample output: [69902.432509048,7] SURV: Sending the heartbeat command to FSP [70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3 .... [70023.226901445,7] SURV: Received heartbeat acknowledge from FSP [70023.226903251,3] FSP: fsp_trigger_reset() entry Here SURV code thought it got valid response from FSP. But actually we didn't receive response from FSP. This patch fixes above issue by updating status field in response structure. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4cef4d8d6000936b1a4e1065bf69ee2edd3fcc1f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP: Improve timeout messageVasant Hegde1-4/+5
Presently we print word0 and word1 in error log. word0 contains sequence number and command class. One has to understand word0 format to identify command class. Lets explicitly print command class, sub command etc. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 807a3acc8fd66af1e1c6e7154aa5029c9b91bb3b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/RTC: Remove local fsp_in_reset variableVasant Hegde1-10/+0
Now that we are using fsp_in_rr() to detect FSP reset/reload, fsp_in_reset become redundant. Lets remove this local variable. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a34369631e6d85c26966eb0b8d5e4c44bcf96c7c) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/RTC: Fix possible FSP R/R issue in rtc write pathVasant Hegde1-9/+11
fsp_opal_rtc_write() checks FSP status before queueing message to FSP. But if FSP R/R starts before getting response to queued message then we will continue to return OPAL_BUSY_EVENT to host. In some extreme condition host may experience hang. Once FSP is back we will repost message, get response from FSP and return OPAL_SUCCES to host. This patch caches new values and returns OPAL_SUCCESS if FSP R/R is happening. And once FSP is back we will send cached value to FSP. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit f4757fbfcf616365c74b1aa6508b2ab27480cdd0) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14hw/fsp/rtc: read/write cached rtc tod on fsp hir.ppaidipe@linux.vnet.ibm.com1-2/+2
Currently fsp-rtc reads/writes the cached RTC TOD on an fsp reset. Use latest fsp_in_rr() function to properly read the cached rtc value when fsp reset initiated by the hir. Below is the kernel trace when we set hw clock, when hir process starts. [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688] [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000 [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70 [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic) [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> [ 1727.775889] CR: 28024442 XER: 20000000 [ 1727.775890] CFAR: c00000000008472c SOFTE: 1 GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4 GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000 GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003 GPR12: c0000000000846e8 c00000000fba0100 [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48 [ 1727.775899] Call Trace: [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable) [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0 [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630 [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0 [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0 [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0 [ 1727.775908] Instruction dump: [ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020 [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4 This is found when executing the testcase https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py With this fix ran fsp hir torture testcase in the above test which is working fine. Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 447ccc4de529f001271fd4dfd78401bc4c90832e) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/CHIPTOD: Return false in error pathVasant Hegde1-0/+1
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 740d00b1036188c6e248418fb0a13faf14723e7a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-09FSP: Notify FSP of Platform Log ID after Host Initiated Reset ReloadStewart Smith2-22/+50
Trigging a Host Initiated Reset (when the host detects the FSP has gone out to lunch and should be rebooted), would cause "Unknown Command" messages to appear in the OPAL log. This patch implements those messages How to trigger FSP RR(HIR): $ putmemproc 300000f8 0x00000000deadbeef s1 k0:n0:s0:p00 ecmd_ppc putmemproc 300000f8 0x00000000deadbeef Log showing unknown command: / # cat /sys/firmware/opal/msglog | grep -i ,3 [ 110.232114723,3] FSP: fsp_trigger_reset() entry [ 188.431793837,3] FSP #0: Link down, starting R&R [ 464.109239162,3] FSP #0: Got XUP with no pending message ! [ 466.340598554,3] FSP-DPO: Unknown command 0xce0900 [ 466.340600126,3] FSP: Unhandled message ce0900 The message we need to handle is "Get PLID after host initiated FipS reset/reload". When the FSP comes back from HIR, it asks "hey, so, which error log explains why you rebooted me?". So, we tell it. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit f3a5741408a11be6992cf8779f2eae10b08c020a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-03-16hw/fsp: Do not queue SP and SPCN class messages during reset/reloadAnanth N Mavinakayanahalli3-0/+31
During FSP R/R, the FSP is inaccessible and will lose state. Messages to the FSP are generally queued for sending later. It does seem like the FSP fails to process any subseuqent messages of certain classes (SP info -- ipmi) if it receives queued mbox messages it isn't expecting. In certain other cases (sensors), the FSP driver returns a default code (async completion) even though there is no known bound from the time of this error return to the actual data being available. The kernel driver keeps waiting leading to soft-lockup on the host side. Mitigate both these (known) cases by returning OPAL_BUSY so the host driver knows to retry later. With this change, the sensors command works fine when the FSP comes back. This version also resolves the remaining IPMI issues Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4940b8148640c06e139aec8c6d0370af7dd3b184) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-24fsp: Don't recurse pollers in ibm_fsp_terminateStewart Smith2-1/+38
If we were to terminate in a poller, we'd call op_display() which called pollers which hit the recursive poller warning, which ended in not much fun at all. This patch will skip the running of pollers and instead run the FSP poller to set the op-panel display before attn. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 9fcb109218b1374a8caa3cac62e83fbedb1f7f2f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25errorlog: Removal of elog_reject_head() from 'opal_kexec_elog_notify' routineMukesh Ojha1-1/+0
elog_reject_head() routine makes the state 'elog_read_from_fsp_head_state' either 'ELOG_STATE_REJECTED' or 'ELOG_STATE_NONE' depending on the current state of 'elog_read_from_fsp_head_state'. We can remove this elog_reject_head() from 'opal_kexec_elog_notify()' as just after that it is called inside 'fsp_opal_resend_pending_logs()'. So, it is redundant inside opal_kexec_elog_notify() routine. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25errorlog: Remove the elog enable check from 'fsp_elog_check_and_fetch_head'Mukesh Ojha1-3/+0
We use 'elog_enabled' flag to check whether host OS is ready to receive error log or not. This is nothing to do with reading error log from service processor. This patch is to remove the check and keep this 'elog_enabled' free from FSP specific code and move it into core/errorlog.c in later upcoming patches. With this changes, in some corner cases we may endup reading same error log twice from FSP. It happens as we call 'elog_reject_head' inside 'fsp_opal_resend_pending_logs' which makes the state either 'ELOG_STATE_REJECTED' or 'ELOG_STATE_NONE'. So, a call to 'fsp_elog_check_and_fetch_head' routine ends up reading the error log from FSP which was already read. This case happens twice in a reboot as whenever 'fsp_opal_resend_pending_logs' gets called. So, we can ignore it. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25errorlog : Modification as per coding guidelines to make the code more legibleMukesh Ojha2-137/+147
Some modifications related to typo errors, alignment, case letter mismatch to add more clarity to the code. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-25fast-reboot: disable on FSP code update or unrecoverable HMIStewart Smith1-0/+3
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> [stewart@linux.vnet.ibm.com: unlock before return (suggested by Mahesh/Andrew), disable only on non-cancelling fsp codeupdate call (suggested by Vasant)] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-17Fast reboot for P8Benjamin Herrenschmidt2-0/+8
This is an experimental patch that implements "Fast reboot" on P8 machines. The basic idea is that when the OS calls OPAL reboot, we gather all the threads in the system using a combination of patching the reset vector and soft-resetting them, then cleanup a few bits of hardware (we do re-probe PCIe for example), and reload & restart the bootloader. For Trusted Boot, this means we *add* measurements to the TPM, so you will get *different* PCR values as compared to a full IPL. This makes sense as if you want to be sure you are running something known then, well, do a full IPL as soft reset should never be trusted to clear any malicious code. This is very experimental and needs a lot of testing and also auditing code for other bits of HW that might need to be cleaned up. BenH TODO: I also need to check if we are properly PERST'ing PCI devices. This is partially based on old code I had to do that on P7. I only support it on P8 though as there are issues with the PSI interrupts on P7 that cannot be reliably solved. Even though this should be considered somewhat experimental, we've had a lot of success on a variety of machines. Dozens/hundreds of reboots across Tuleta, Garrison and Habanero. Currently, we've hidden it behind a NVRAM config option, which *is* liable to change in the future (to ensure that only those who know what they're doing enable it) You can enable the experimental support via nvram option: nvram -p ibm,skiboot --update-config experimental-fast-reset=feeling-lucky Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [stewart@linux.vnet.ibm.com: hide behind nvram option, include Mambo fixes from Mikey] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-22fsp/console: Allocate irq for each hvc consoleSam Mendoza-Jonas1-5/+21
Allocate an irq number for each hvc console and set its interrupt-parent property so that Linux can use the opal irqchip instead of the OPAL_EVENT_CONSOLE_INPUT interface. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-26FSP/ELOG: Fix elog timeout issueVasant Hegde1-8/+12
Presently we set timeout value as soon as we add elog to queue. If we have multiple elogs to write, it doesn't consider queue wait time. Instead set timeout value when we are actually sending elog to FSP. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-26FSP/ELOG: Remove redundant elog stateVasant Hegde1-2/+2
OPAL gets elog notification from service processor which contains log information. Once we get notification we start reading log data and change elog state to ELOG_STATE_FETCHING. Hence we don't need ELOG_STATE_FETCHED_INFO state. Lets remove this variable. Also in some places we have used this state after sending event information to host. Replace such usage with better state (ELOG_STATE_HOST_INFO). Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-11psi: Remove psi->workingBenjamin Herrenschmidt1-2/+0
I was only ever set to true Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10FSP/ELOG: elog_enable flag should be false by defaultMukesh Ojha1-3/+1
This issue is one of the corner case, which is related to recent change went upstream and only observed in the petitboot prompt, where we see only one error log instead of getting all error log in /sys/firmware/opal/elog. Below is snippet of the code, where elog module in the kernel initialised. { .. ... rc = request_threaded_irq(irq, NULL, elog_event, =<======= IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL); | if (rc) { | pr_err("%s: Can't request OPAL event irq (%d)\n", | __func__, rc); | return rc; | } | /* We are now ready to pull error logs from opal. */ | if (opal_check_token(OPAL_ELOG_RESEND)) | opal_resend_pending_logs(); =<======= } Scenario: While elog_enabled is true, OPAL_EVENT_ERROR_LOG_AVAIL will be set from OPAL, whenever it has error logs that are waiting to be fetched from the kernel. Race occurs between the code arrowed above, as soon as kernel registers error log handler, it sees OPAL_EVENT_ERROR_LOG_AVAIL is set, so it schedule the handler. Which makes 'opal_get_elog_size'(kernel) call on the error log set the state from ELOG_STATE_FETCHED_DATA to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. During the same time 'opal_resend_pending_logs'(kernel) call which will set the state machine from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL. Because of that, read call from the kernel, which was to be made after the 'opal_get_elog_size' ends up failing. But, the elog kobject was created for the particular error log. Further in the resend routine in the OPAL, we make opal_commit_elog_in_host() call that sets OPAL_EVENT_ERROR_LOG_AVAIL. So, Kernel again makes 'opal_get_elog_size' which results in getting the error log info of the same error log which was fetched earlier. It also changes the state machine to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. Below is the snippet from the elog_event registered handler call { ... ... /* we may get notified twice, let's handle * that gracefully and not create two conflicting * entries. */ if (kset_find_obj(elog_kset, name)) return IRQ_HANDLED; ... ... } In the kernel, we search kobject for the error log whether it already exist. So kobject is found and it returns without reading error log data. So, this patch makes the flag which was true during initialisation to false. And that solves the race. Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28FSP/MDST: Fix TCE alignment issueVasant Hegde1-1/+1
We have used TCE_MASK value (4095) instead of TCE_PSIZE (4096) to align memory source address. In some corner cases (like source memory size = 4097) we may endup doing wrong mapping and corrupting part of SYSDUMP. This patch uses ALIGN_UP macro with TCE_PSIZE value for alignining memory. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21FSP/ELOG: Fix OPAL generated elog resend logicVasant Hegde1-5/+0
Fix resend logic in opal_resend_pending_logs, so that it actually restarts sending remaining logs. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21FSP/ELOG: Fix possible event notifier hangsVasant Hegde2-3/+18
In some corner cases host may send acknowledgement without reading actual data (fsp_opal_elog_info -> fsp_opal_elog_ack). Because of this elog_read_from_fsp_head_state may be stuck in wrong state (ELOG_STATE_HOST_INFO) and not able to send remaining ELOG's to host. Hence reset ELOG state and start sending remaining ELOG's. Also in normal case we will ACK the logs which are already processed (elog_read_processed). Hence rearrange the code such that we go through elog_read_processed first. Finally return OPAL_PARAMETER if we are not able to find ELOG ID. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: spelling fix] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21FSP/ELOG: Disable event notification if list is not consistentVasant Hegde1-0/+2
Chances of elog_read_pending inconsistent state is very very less. Just to be on safer side, disable notification if list is not in consistent state. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21FSP/ELOG: Improve elog event statesVasant Hegde1-1/+2
ELOG enables event notification once new log is available. And this will be disabled after host completes reading logs (it has to complete both fsp_opal_elog_info and fsp_opal_elog_read). Ideally we should disable notification as soon as host consumes event (after fsp_opal_elog_info). Also if host fails to call fsp_opal_elog_read (ex: situations like duplicate event), then we endup keeping notification forever. This patch introduces new ELOG state (ELOG_STATE_HOST_INFO). As soon as host consumes event elog will move to this new state so that event notification is disabled. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21FSP/ELOG: Fix OPAL generated elog event notificationVasant Hegde2-18/+32
We use elog notifier to notify logs from multiple sources (FSP generated logs - fsp-elog-read.c and OPAL generated logs - fsp-elog-write.c). OPAL generated logs sets elog event bit whenever it has new logs to send to host. But it relies on fsp-elog-read.c to disable the event bit..which is wrong! This patch creates common function to enable/disable event notification. It will enable event notification if any of the source is ready to send error log to host and disables notification once it completes sending all errors to host. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21FSP/ELOG: Disable event notification during kexecVasant Hegde1-13/+33
ELOG enables event notification once new log is available. And this will be disabled after host completes reading logs (it has to complete both fsp_opal_elog_info and fsp_opal_elog_read). In some corner cases like kexec, host may endup reading same ELOG id twice (calling fsp_opal_elog_info twice because of resend request). Host finds it as duplicate and it will not read actual log (fsp_opal_elog_read()). In such situations we fails to disable event notification :-( Scenario : OPAL Host ------------------------------------- OPAL_EVENT_ELOG_AVAIL --> kexec OPAL_EVENT_ELOG_AVAIL --> elog client registered <-- read ELOG (id=x) <-- resend elog (opal_resend_pending_logs()) resend all ELOG --> read ELOG (id=x) -- Duplicate ELOG ! bhoom!! kernel call trace: ------------------ [ 28.055923] CPU: 10 PID: 20 Comm: irq/29-opal-elo Not tainted 4.4.0-24-generic #43-Ubuntu [ 28.056012] task: c0000000ef982a20 ti: c0000000efa38000 task.ti: c0000000efa38000 [ 28.056100] NIP: c000000008010a24 LR: c000000008010a24 CTR: 0000000030033758 [ 28.056188] REGS: c0000000efa3b9c0 TRAP: 0901 Not tainted (4.4.0-24-generic) [ 28.056274] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22000844 XER: 20000000 [ 28.056499] CFAR: c000000008009958 SOFTE: 1 GPR00: c000000008131e8c c0000000efa3bc40 c0000000095b4200 0000000000000900 GPR04: c0000000094a63c8 0000000000000001 9000000100009033 0000000000000062 GPR08: 0000000000000000 0000000000000000 c0000000ef960400 9000000100001003 GPR12: c00000000806de48 c00000000fb45f00 [ 28.057042] NIP [c000000008010a24] arch_local_irq_restore+0x74/0x90 [ 28.057117] LR [c000000008010a24] arch_local_irq_restore+0x74/0x90 [ 28.057189] Call Trace: [ 28.057221] [c0000000efa3bc40] [c0000000f108a980] 0xc0000000f108a980 (unreliable) [ 28.057326] [c0000000efa3bc60] [c000000008131e8c] irq_finalize_oneshot.part.2+0xbc/0x250 [ 28.057429] [c0000000efa3bcb0] [c000000008132170] irq_thread_fn+0x80/0xa0 [ 28.057519] [c0000000efa3bcf0] [c00000000813263c] irq_thread+0x1ac/0x280 [ 28.057609] [c0000000efa3bd80] [c0000000080e61e0] kthread+0x110/0x130 [ 28.057698] [c0000000efa3be30] [c000000008009538] ret_from_kernel_thread+0x5c/0xa4 [ 28.057799] Instruction dump: [ 28.057844] 994d02ca 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010 [ 28.057995] 7c0803a6 4e800020 60420000 4bff17ad <60000000> 4bffffe4 60420000 e92d0020 This patch adds kexec notifier client. It will disable event notification during kexec. Once host is ready to receive ELOG's again it will call fsp_opal_resend_pending_logs(). This call re-enables ELOG notication. It will fix above issue. I will add follow up patch to improve event state. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-15fsp/console: Ignore data on unresponsive consolesSam Mendoza-Jonas1-2/+16
Linux kernels from v4.1 onwards will try to request an irq for each hvc console using OPAL_EVENT_CONSOLE_INPUT, however because the IRQF_SHARED flag is not set any console after the first will fail. If there is data on one of these failed consoles OPAL will set OPAL_EVENT_CONSOLE_INPUT every time fsp_console_read is called, leading to RCU stalls in the kernel. As a workaround for unpatched kernels, cease setting OPAL_EVENT_CONSOLE_INPUT for consoles that we have noticed are not being read. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-14FSP/ELOG: Remove redundant validationVasant Hegde1-2/+0
We don't need to validate msg->resp message as its always allocated. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-11FSP: Validate fsp_msg response memory allocationVasant Hegde1-1/+7
fsp_allocmsg() returns true even if msg->resp memory allocation fails. Validate msg->resp memory allocation as well. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-30fsp/op-panel: Fix out of bounds array access and #define display dimensionsSuraj Jitindar Singh1-17/+13
In the function __opal_write_oppanel() coverity complains about an out of bounds array access. While the pointer is never actually dereferenced, this isn't immediately obvious from the code. Additionally the number and length of the lines on the operator panel display are hard coded into the function. While we are here we might as well move these into a #define statement. Rework the code in __opal_write_oppanel() where the message is copied into the buffer so that coverity won't complain about an out of bounds array access and so that it is line number and length agnostic (now relying on the #defined values). Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-24fsp-surveillance: add FWTS annotation on failing to reply to heartbeatStewart Smith1-1/+8
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-24fsp-leds: Add FWTS annotations for some errorsStewart Smith1-0/+34
Only the major ones at this point in time. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-24fsp-elog: add FWTS annotations for several errorsStewart Smith2-0/+29
These errors are essentially assert()s - something has gone wrong and it's likely because of a bug somewhere. Things we should *never* it regards to inconsistency, so have FWTS throw warnings on them. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-24fsp-sensor: add FWTS annotation for already existing sensor nodeStewart Smith1-0/+7
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-24fsp-epow: add FWTS annotationStewart Smith1-0/+10
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-06-24fsp-elog-write: display error code from FSP on error writing error logStewart Smith1-1/+1
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-05-10Make trigger_attn() enable attn alsoMichael Neuling1-5/+0
This changes trigger_attn() to also enable attn via HID0, so callers don't have to do it themselves. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-04-27fsp: Add CAPP lid definition for NaplesPhilippe Bergheaud1-0/+2
Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>