Age | Commit message | Author | Files | Lines
2018-05-28skiboot 5.4.10 release notesskiboot-5.4.10Stewart Smith1-0/+58
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-28opal-prd: Do not error out on first failure for soft/hard offline.Mahesh Salgaonkar1-3/+3
The memory errors (CEs and UEs) that are detected as part of background memory scrubbing are reported by PRD asynchronously to opal-prd along with the affected memory ranges. hservice_memory_error() converts these ranges to page granularity before handing them to the soft/hard offlining infrastructure. But the current implementation of hservice_memory_error() stops as soon as the offline action fails for any page, so the remaining pages are never submitted for soft/hard offlining. For example, hard offline can fail for:
- Pages that are not part of the buddy-managed pool.
- Pages that are reserved by the kernel using memblock_reserved().
- Pages that are in use by the kernel.
But for pages that are in use by a user space application, hard offline marks the page as hwpoison, sends SIGBUS to kill the affected application as a recovery action, and returns success. Hence, it is possible that some of the pages in that memory range are in use by an application or free. By stopping on the first error we lose the opportunity to hwpoison the subsequent pages, which may be free or in use by an application. This patch fixes this issue.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit e9ee7c7d357160a704c8248a1787124f94df8c54)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
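The "keep going after a failure" behaviour described above can be sketched as follows. This is a minimal illustration, not the opal-prd code: `offline_range`, `offline_fn`, and the `demo_offline` stub are hypothetical names, and the real kernel interface is replaced by a callback.

```c
#include <stdint.h>

/* Hypothetical callback: returns 0 on success, non-zero on failure. */
typedef int (*offline_fn)(uint64_t page_addr);

/*
 * Try to offline every page in [start, end), even if some attempts fail.
 * Returns 0 if every page was handled, otherwise the first error seen.
 */
int offline_range(uint64_t start, uint64_t end, uint64_t page_size,
                  offline_fn offline_page)
{
    int first_err = 0;

    for (uint64_t addr = start; addr < end; addr += page_size) {
        int rc = offline_page(addr);

        if (rc && !first_err)
            first_err = rc; /* remember the failure, but keep going */
    }
    return first_err;
}

/* Illustrative stub: fails for one page and counts how often it is called. */
int demo_calls;

int demo_offline(uint64_t addr)
{
    demo_calls++;
    return addr == 0x20000 ? -1 : 0;
}
```

Recording the first error while still visiting every page keeps the caller informed of the failure without giving up on pages that could still be poisoned.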
2018-04-29OPAL_PCI_SET_POWER_STATE: fix locking in error pathsStewart Smith1-4/+12
Otherwise we could exit OPAL holding locks, potentially leading to all sorts of problems later on. Cc: stable # 5.3+ Fixes: 7a3e2c4ee3aa0 Signed-off-by: Stewart Smith <stewart@linux.ibm.com> (cherry picked from commit a22ba4576ad35dccb86622e71442794d09e62bce) Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-01-05external/test: make stripping out version number more robustStewart Smith1-2/+2
For some bizarre reason, Travis started failing on this substitution when there'd been zero code changes in this area... This at least papers over whatever the problem is for the time being. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 41f51c834a1be508ca2e7446fe8fa6abc3af473c) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-05Merge skiboot-5.4.9 into 5.4.xStewart Smith7-5/+158
2018-01-05hdata: Parse IPL FW feature settingsskiboot-5.4.9Oliver O'Halloran2-0/+54
Add parsing for the firmware feature flags in the HDAT. This indicates the settings of various parameters which are set at IPL time by firmware. Cc: stable # 5.4.x 371e88e23662 eeba2d64fb7a 0abc3af7e8f6 Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4e23b42d2ad76da21422a1d2de471df29f76b8df) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-05hdata: Add an idata array iteratorOliver O'Halloran2-0/+88
Adds HDIF_get_iarray() which retrieves and validates an internal array header and HDIF_iarray_for_each() for walking the individual array entries. This reduces the amount of get-then-check boilerplate that we have with the existing HDIF_get_iarray_item() method for iterating internal data arrays. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 0abc3af7e8f607aa2fe6bffda9bc072e86126bc9) [stewart: include fix for backtrace()] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-05core/opal: check ibm,opal existsOliver O'Halloran1-5/+1
The ibm,opal node is normally created by Skiboot either in the HDAT parser or after the input FDT has been unflattened. However, in order to supply the /ibm,opal/power-mgt/enabled-stop-states property via the FDT, we need to tolerate /ibm,opal/ existing in the input tree. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit eeba2d64fb7ac929283ed4611ca04209278eb777) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-05dt: add dt_new_check()Oliver O'Halloran2-0/+15
This is similar to dt_new(), but if the node already exists it will return the existing node. This is useful because some init code depends on the presence of certain nodes, but where the node is actually created is unimportant. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 371e88e2366215b5a033c2cedaf7486d1e66914d) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
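The get-or-create behaviour can be sketched on a toy tree. The `struct node` type and the `node_*` helpers below are hypothetical stand-ins for illustration only; they are not skiboot's device-tree API.

```c
#include <stdlib.h>
#include <string.h>

/* Toy tree node; skiboot's real device-tree node type is richer than this. */
struct node {
    const char *name;
    struct node *first_child;
    struct node *next_sibling;
};

/* Look up a direct child by name; NULL if it does not exist. */
struct node *node_find(struct node *parent, const char *name)
{
    for (struct node *n = parent->first_child; n; n = n->next_sibling)
        if (strcmp(n->name, name) == 0)
            return n;
    return NULL;
}

/* Unconditionally create a new child (the caller must know it is absent). */
struct node *node_new(struct node *parent, const char *name)
{
    struct node *n = calloc(1, sizeof(*n));

    if (!n)
        return NULL;
    n->name = name;
    n->next_sibling = parent->first_child;
    parent->first_child = n;
    return n;
}

/* Return the existing child if there is one, otherwise create it. */
struct node *node_new_check(struct node *parent, const char *name)
{
    struct node *n = node_find(parent, name);

    return n ? n : node_new(parent, name);
}
```

With this shape, init code can ask for a node without caring which earlier step created it.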
2017-12-20p8-i2c: Limit number of retry attemptsOliver O'Halloran2-2/+11
Currently we will attempt to start an I2C transaction until it succeeds. In the event that the OCC does not release the lock on an I2C bus, this results in an async token being held forever, and the kernel thread that started the transaction will block forever while waiting for an async completion message. Fix this by limiting the number of attempts to start the transaction. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c2e404aedd52da91fdf605e24b9d1ae7894974c5) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
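A bounded retry loop of the kind described above might look like this. The names, return codes, and the limit of 5 are assumptions made for the sketch; the value used by the p8-i2c driver is not stated in the commit message.

```c
#include <stdbool.h>

#define MAX_START_ATTEMPTS 5 /* illustrative limit, not the driver's value */

enum { RC_OK = 0, RC_TIMEOUT = -2 }; /* hypothetical return codes */

/* Hypothetical hook: returns true once the transaction could be started. */
typedef bool (*try_start_fn)(void *ctx);

/* Give up after a fixed number of attempts instead of retrying forever. */
int start_with_retry_limit(try_start_fn try_start, void *ctx)
{
    for (int attempt = 0; attempt < MAX_START_ATTEMPTS; attempt++)
        if (try_start(ctx))
            return RC_OK;
    return RC_TIMEOUT; /* caller can now complete the request with an error */
}

/* Illustrative stub: fails until the counter behind ctx reaches zero. */
bool demo_try(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return true;
    (*remaining)--;
    return false;
}
```

Returning an error after the last attempt lets the waiter be completed rather than blocking on a token that will never be released.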
2017-10-31FSP/CONSOLE: Disable notification on unresponsive consolesVasant Hegde1-3/+5
Commit fd6b71fc fixed the situation where ipmi console was open (hvc0) but got data on different console (hvc1). During FSP R/R OPAL closes all consoles. After R/R complete FSP requests to open hvc1 and sends data on this. If hvc1 registration failed or not opened in host kernel then it will not read data and results in RCU stalls. Note that this is workaround for older kernel where we don't have separate irq for each console. Latest kernel works fine without this patch. CC: stable CC: Sam Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c9cc5ef5772ebfbfff978f1c25763d733e45752c) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11skiboot 5.4.8 release notesskiboot-5.4.8Stewart Smith1-0/+158
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/CONSOLE: Limit number of error loggingVasant Hegde1-8/+13
Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added error logging when the buffer is full. In some corner cases the kernel may call this function multiple times and we may end up logging the error again and again. This patch fixes it by generating the error log only once. I think this is enough to indicate something went wrong. Also, with the previous patch, once the console buffer is full, OPAL returns an error to the payload from fsp_console_write_buffer_space(), so the payload will never call fsp_console_write(). Hence move the error logging logic to the right place. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit d798c276b4da7559970702c2f31b12549c92741e) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
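The "log only once" idea reduces to a per-console flag. A minimal sketch, with a hypothetical `struct console_state` and a counter standing in for the real error-log call:

```c
#include <stdbool.h>

/* Hypothetical per-console state; the counter stands in for the error log. */
struct console_state {
    bool full_logged;
    int logs_emitted;
};

/* Called every time a write finds the buffer full; logs the first time only. */
void note_buffer_full(struct console_state *c)
{
    if (c->full_logged)
        return;
    c->full_logged = true;
    c->logs_emitted++; /* a real driver would raise its error log here */
}
```

Whether and when the flag should be cleared again is a policy decision that the commit message does not describe, so the sketch leaves it out.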
2017-10-11FSP/CONSOLE: Fix fsp_console_write_buffer_space() callVasant Hegde1-1/+35
Kernel calls fsp_console_write_buffer_space() to check console buffer space availability. If there is enough buffer space to write data, then kernel will call fsp_console_write() to write actual data. In some extreme corner cases (like one explained in commit c8a7535f) console becomes full and this function returns 0 to kernel (or space available in console buffer < next incoming data size). Kernel will continue retrying until it gets enough space. So we will start seeing RCU stalls. This patch keeps track of previous available space. If previous space is same as current means not enough space in console buffer to write incoming data. It may be due to very high console write operation and slow response from FSP -OR- FSP has stopped processing data (ex: because of ipmi daemon died). At this point we will start timer with timeout of SER_BUFFER_OUT_TIMEOUT (10 secs). If situation is not improved within 10 seconds means something went bad. Lets return OPAL_RESOURCE so that kernel can drop console write and continue. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart: reset timeout in fsp_console_write() path] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 6557a728385c3fbcef297d2c61c7d93bc539f8bb) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
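The stall detection described above (remember the previous free space, start a timer when it stops changing, give up after 10 seconds) can be sketched as a small state machine. All names here are hypothetical and the clock is passed in as a parameter; this is not the skiboot implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STALL_TIMEOUT_MS 10000 /* 10 seconds, as in the commit message */

enum { SPACE_OK = 0, SPACE_STALLED = -1 }; /* hypothetical result codes */

struct space_tracker {
    size_t prev_space;       /* free space seen on the previous call */
    bool timing;             /* true while a possible stall is being timed */
    uint64_t stall_start_ms; /* when the free space stopped changing */
};

/*
 * Report SPACE_STALLED once the free space has stayed the same for
 * STALL_TIMEOUT_MS; any change in free space resets the timer.
 */
int check_space(struct space_tracker *t, size_t space_now, uint64_t now_ms)
{
    if (space_now != t->prev_space) {
        t->prev_space = space_now;
        t->timing = false;
        return SPACE_OK;
    }
    if (!t->timing) {
        t->timing = true;
        t->stall_start_ms = now_ms;
        return SPACE_OK;
    }
    if (now_ms - t->stall_start_ms >= STALL_TIMEOUT_MS)
        return SPACE_STALLED; /* caller would report a resource error */
    return SPACE_OK;
}
```

Passing the time in explicitly keeps the logic easy to exercise without a real timer.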
2017-10-11FSP/CONSOLE: Close SOL session during R/RVasant Hegde1-3/+0
Presently we are not closing SOL and FW console sessions during R/R. The host will continue to write to the SOL buffer during FSP R/R. If there is a heavy console write operation happening during FSP R/R (like running the `top` command inside the console), then at some point the console buffer becomes full. fsp_console_write_buffer_space() returns 0 (or less than the space required to write the data) to the host. While one thread is busy writing to the console, if some other thread tries to write data to the console, we may see RCU stalls (like below) in the kernel.
kernel call trace:
------------------
[ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
[ 2082.828365] Task dump for CPU 32:
[ 2082.828368] kworker/32:3 R running task 0 4637 2 0x00000884
[ 2082.828375] Workqueue: events dump_work_fn
[ 2082.828376] Call Trace:
[ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
[ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
[ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
[ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
[ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
[ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
[ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
[ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68
Hence let's close SOL (and FW console) during FSP R/R.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 9d1755179112071652bb4a317f9006da630ce25d)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP/CONSOLE: Do not associate unavailable consoleVasant Hegde1-0/+11
Presently OPAL sends the associate/unassociate MBOX command for all FSP serial consoles (see the OPAL messages below). We have to check whether the console is available before sending this message.
OPAL log:
-------
[ 5013.227994012,7] FSP: Reassociating HVSI console 1
[ 5013.227997540,7] FSP: Reassociating HVSI console 2
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 175e406ac29435017c5bd08b2c45d93b2c5a7669)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11FSP: Disable PSI link whenever FSP tells OPAL about impending R/RVasant Hegde2-18/+8
Commit 42d5d047 fixed scenario where DPO has been initiated, but FSP went into reset before the CEC power down came in. But this is generic issue that can happen in normal shutdown path as well. Hence disable PSI link as soon as we detect FSP impending R/R. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a4788a49f004a91bb8ca015336abf9ae119fbc52) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_REBOOT / DEEP_REBOOTStewart Smith1-2/+2
See 696d378d7b7295366e115e89a785640bf72a5043 for all the details. Suggested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 296ca2acf00fde764e3ea6436f46468c6caef174) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORMStewart Smith3-5/+21
We had a race condition between FSP Reset/Reload and powering down the system from the host:

Roughly:

  FSP                        Host
  ---                        ----
  Power on
                             Power on
  (inject EPOW)
  (trigger FSP R/R)
                             Processes EPOW event, starts shutting down
                             calls OPAL_CEC_POWER_DOWN
  (is still in R/R)
                             gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
  (FSP comes back)
                             spinning in opal_poll_events
  (thinks host is running)

The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload error path for fsp_sync_msg() is to return -1, which means we give the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API docs give us the opportunity to return OPAL_BUSY when trying again later may be successful, and we're ambiguous as to if you should retry on OPAL_INTERNAL_ERROR.

For reference, the linux code looks like this:
>static void __noreturn pnv_power_off(void)
>{
>        long rc = OPAL_BUSY;
>
>        pnv_prepare_going_down();
>
>        while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>                rc = opal_cec_power_down(0);
>                if (rc == OPAL_BUSY_EVENT)
>                        opal_poll_events(NULL);
>                else
>                        mdelay(10);
>        }
>        for (;;)
>                opal_poll_events(NULL);
>}

Which means that *practically* our only option is to return OPAL_BUSY or OPAL_BUSY_EVENT. We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're running pollers to communicate with the FSP and do the final bits of Reset/Reload handling before we power off the system.

Additionally, we really should update our documentation to point all of these return codes and what action an OS should take.

CC: stable
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 696d378d7b7295366e115e89a785640bf72a5043)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11libflash/file: Handle short read()s and write()s correctlyCyril Bur1-2/+4
Currently we don't move the buffer along for a short read() or write() and nor do we request only the remaining amount. Fixes: c7c3a4cd53d libflash/file: Add a file access backend to for the blocklevel interface. Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c06ed583d05d8c8b86584b3c4afda71adbd5301a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
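Handling a short read() means advancing the buffer pointer by what was received and asking only for the remainder. A minimal POSIX sketch follows; `read_fully` is a hypothetical helper written for illustration, not the libflash function, and the EINTR retry is an addition of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Read up to len bytes, looping over short reads.
 * Returns the number of bytes read (less than len only at EOF), or -1 on error.
 */
ssize_t read_fully(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        /* Move the buffer along and request only what is still missing. */
        ssize_t rc = read(fd, p + done, len - done);

        if (rc < 0) {
            if (errno == EINTR)
                continue; /* interrupted before any data arrived: retry */
            return -1;
        }
        if (rc == 0)
            break; /* end of file */
        done += (size_t)rc;
    }
    return (ssize_t)done;
}
```

The write() side follows the same pattern with `write(fd, p + done, len - done)`.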
2017-10-11libc/stdio/vsnprintf.c: add explicit fallthroughStewart Smith1-0/+1
silences recent GCC warning Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 11cf409be293091470b3f75619416e2bb2697265) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11GCC7: fixes for -Wimplicit-fallthrough expected regexesStewart Smith3-3/+5
It turns out GCC7 adds a useful warning and does fancy things like parsing your comments to work out that you intended to do the fallthrough. There's a few places where we don't match the regex. Fix them, as it's harmless to do so. Found by building on Fedora Rawhide in Travis. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit e58aeb1ca304835d65beed98db7f118d3c154cd4) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
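For reference, a comment of the form GCC 7 recognises looks like this. The function is a made-up example, not code from skiboot; the point is only the `/* fallthrough */` marker placed directly before the next case label.

```c
/* Count how many setup stages remain, starting from the given stage. */
int steps_remaining(int stage)
{
    int steps = 0;

    switch (stage) {
    case 0:
        steps++;
        /* fallthrough */
    case 1:
        steps++;
        /* fallthrough */
    case 2:
        steps++;
        break;
    default:
        break;
    }
    return steps;
}
```

Without the comments, building with -Wimplicit-fallthrough reports each case that runs into the next one.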
2017-10-11FSP/NVRAM: Handle "get vNVRAM statistics" commandVasant Hegde1-0/+41
FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM statistics. OPAL doesn't maintain any such statistics. Hence return FSP_STATUS_INVALID_SUBCMD. Sample OPAL log: [16944.384670488,3] FSP: Unhandled message eb0500 [16944.474110465,3] FSP: Unhandled message eb0500 [16945.111280784,3] FSP: Unhandled message eb0500 [16945.293393485,3] FSP: Unhandled message eb0500 With this patch, I don't think FSP will ever call "free vNVRAM" MBOX command. But to be safer side lets return FSP_STATUS_INVALID_SUBCMD for this MBOX command as well. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 19d4f98e9483e4c1cae1d5a59491d8ab4f9a6e7f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19skiboot-5.4.7 release notesskiboot-5.4.7Stewart Smith1-0/+30
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-21FSP: Add check to detect FSP R/R inside fsp_sync_msg()Vasant Hegde1-2/+11
OPAL sends an MBOX message to the FSP and updates the message state from fsp_msg_queued -> fsp_msg_sent. fsp_sync_msg() queues the message and waits until we get a response from the FSP. During FSP R/R we move outstanding MBOX messages from msgq to rr_queue, including the inflight message (fsp_reset_cmdclass()). But we are not resetting the inflight message state. In an extreme corner case where we sent a message to the FSP via the fsp_sync_msg() path and FSP R/R happens before we get a response from the FSP, we will end up waiting in fsp_sync_msg() until everything becomes normal. This patch adds an fsp_in_rr() check to fsp_sync_msg() and returns an error to the caller if the FSP is in R/R. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c74e88e8614de0a82cba5c30812d5aa39db747a9) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-16platforms/ibm-fsp/firenze: Fix PCI slot power-off patternGavin Shan1-2/+2
When powering off the PCI slot, the corresponding bits should be set to 0bxx00xx00 instead of 0bxx11xx11. Otherwise, the specified PCI slot can't be put into power-off state. Fortunately, it didn't introduce any side-effects so far. Cc: stable # 5.3.0+ Fixes: 6884fe63ba1e ("platforms/ibm-fsp: Support PCI slot") Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 72540af036218373a16c4d15dbb0583c46b0b328) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
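The difference between the two bit patterns can be shown with a mask. This sketch assumes the bits shown as 00/11 in the patterns above are bits 0-1 and 4-5 of an 8-bit value, that 11 means powered on, and that the remaining "x" bits must be preserved; the helper names are hypothetical.

```c
#include <stdint.h>

/* Bits written as 00 or 11 in 0bxx00xx00 / 0bxx11xx11 (assumed layout). */
#define SLOT_POWER_MASK 0x33u

/* Power off: clear the power bits, leave the "x" bits untouched. */
uint8_t slot_power_off(uint8_t reg)
{
    return (uint8_t)(reg & ~SLOT_POWER_MASK);
}

/* Power on: set the power bits, leave the "x" bits untouched. */
uint8_t slot_power_on(uint8_t reg)
{
    return (uint8_t)(reg | SLOT_POWER_MASK);
}
```

Writing the "on" pattern in the power-off path is exactly the mistake the commit describes: the slot never leaves the powered state.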
2017-06-14Add skiboot-5.4.6 release notesskiboot-5.4.6Stewart Smith1-0/+117
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/CONSOLE: Workaround for unresponsive ipmi daemonVasant Hegde2-1/+20
We use a TCE mapped area to write data to the console. The console header (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates the next_in pointer in fsp_serbuf_hdr and FSP updates the next_out pointer). The kernel makes the opal_console_write() OPAL call to write data to the console. OPAL writes data to the TCE mapped area and sends an MBOX command to the FSP. If our console becomes full and we have data to write to the console, we keep on waiting until the FSP reads data. In some corner cases, where the FSP is active but not responding to console MBOX messages (due to buggy IPMI) and we have heavy console writes happening from the kernel, our console buffer eventually becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to the kernel. The kernel will keep on retrying. This creates kernel soft lockups. In some extreme cases, when every CPU is trying to write to the console, the user will not be able to ssh in and will think the system is hung. If we reset the FSP or restart the IPMI daemon on the FSP, the system recovers and everything becomes normal. This patch adds a workaround for the above issue by returning OPAL_HARDWARE when the console is full. A side effect of this patch is that we may end up dropping the latest console data. But it is better to drop console data than to hang the system. An alternative approach is to drop old data from the console buffer to make space for new data. But in normal conditions only the FSP can update the 'next_out' pointer, and if we touch that pointer, it may introduce other race conditions. Hence we decided to just drop the new console write request. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit c8a7535f3539c79955645e6b3714b367a994b1e9) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP: Set status field in response message for timed out messageVasant Hegde1-1/+4
For timed out FSP messages, we set the message status to "fsp_msg_timeout". But most FSP driver users (like surveillance) ignore this field. They always look for the FSP returned status value in the callback function (second byte in word1). So we end up treating a timed out message as a success response from the FSP.
Sample output:
[69902.432509048,7] SURV: Sending the heartbeat command to FSP
[70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3
....
[70023.226901445,7] SURV: Received heartbeat acknowledge from FSP
[70023.226903251,3] FSP: fsp_trigger_reset() entry
Here the SURV code thought it got a valid response from the FSP. But actually we didn't receive a response from the FSP. This patch fixes the above issue by updating the status field in the response structure.
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 4cef4d8d6000936b1a4e1065bf69ee2edd3fcc1f)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP: Improve timeout messageVasant Hegde1-4/+5
Presently we print word0 and word1 in error log. word0 contains sequence number and command class. One has to understand word0 format to identify command class. Lets explicitly print command class, sub command etc. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 807a3acc8fd66af1e1c6e7154aa5029c9b91bb3b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/RTC: Remove local fsp_in_reset variableVasant Hegde1-10/+0
Now that we are using fsp_in_rr() to detect FSP reset/reload, fsp_in_reset become redundant. Lets remove this local variable. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a34369631e6d85c26966eb0b8d5e4c44bcf96c7c) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/RTC: Fix possible FSP R/R issue in rtc write pathVasant Hegde1-9/+11
fsp_opal_rtc_write() checks the FSP status before queueing a message to the FSP. But if FSP R/R starts before we get a response to the queued message, we will continue to return OPAL_BUSY_EVENT to the host. In some extreme conditions the host may experience a hang. Once the FSP is back we will repost the message, get a response from the FSP and return OPAL_SUCCESS to the host. This patch caches the new values and returns OPAL_SUCCESS if FSP R/R is happening. Once the FSP is back we will send the cached value to the FSP. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit f4757fbfcf616365c74b1aa6508b2ab27480cdd0) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14hw/fsp/rtc: read/write cached rtc tod on fsp hir.ppaidipe@linux.vnet.ibm.com1-2/+2
Currently fsp-rtc reads/writes the cached RTC TOD on an FSP reset. Use the latest fsp_in_rr() function to properly read the cached RTC value when the FSP reset is initiated by the HIR. Below is the kernel trace seen when we set the hw clock as the HIR process starts.
[ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
[ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
[ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
[ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
[ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
[ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic)
[ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 1727.775889] CR: 28024442 XER: 20000000
[ 1727.775890] CFAR: c00000000008472c SOFTE: 1
GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
GPR12: c0000000000846e8 c00000000fba0100
[ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
[ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
[ 1727.775899] Call Trace:
[ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
[ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
[ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
[ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
[ 1727.775908] Instruction dump:
[ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
[ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4
This was found when executing the testcase https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py
With this fix, the FSP HIR torture testcase in the above test runs fine.
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 447ccc4de529f001271fd4dfd78401bc4c90832e)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-14FSP/CHIPTOD: Return false in error pathVasant Hegde1-0/+1
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 740d00b1036188c6e248418fb0a13faf14723e7a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-09Add skiboot-5.4.5 release notesskiboot-5.4.5Stewart Smith1-0/+56
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-09FSP: Notify FSP of Platform Log ID after Host Initiated Reset ReloadStewart Smith6-33/+83
Triggering a Host Initiated Reset (when the host detects the FSP has gone out to lunch and should be rebooted) would cause "Unknown Command" messages to appear in the OPAL log. This patch implements those messages. How to trigger FSP RR(HIR): $ putmemproc 300000f8 0x00000000deadbeef s1 k0:n0:s0:p00 ecmd_ppc putmemproc 300000f8 0x00000000deadbeef Log showing unknown command: / # cat /sys/firmware/opal/msglog | grep -i ,3 [ 110.232114723,3] FSP: fsp_trigger_reset() entry [ 188.431793837,3] FSP #0: Link down, starting R&R [ 464.109239162,3] FSP #0: Got XUP with no pending message ! [ 466.340598554,3] FSP-DPO: Unknown command 0xce0900 [ 466.340600126,3] FSP: Unhandled message ce0900 The message we need to handle is "Get PLID after host initiated FipS reset/reload". When the FSP comes back from HIR, it asks "hey, so, which error log explains why you rebooted me?". So, we tell it. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit f3a5741408a11be6992cf8779f2eae10b08c020a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-19hw/i2c: Fix early lock dropOliver O'Halloran1-2/+0
When interacting with an I2C master the p8-i2c driver (common to p9) acquires a per-master lock which it holds for the duration of its interaction with the master. Unfortunately, when p8_i2c_check_initial_status() detects that the master is busy with another transaction it drops the lock and returns OPAL_BUSY. This is contrary to the driver's locking strategy, which requires that the caller acquire and drop the lock. This leads to a crash due to the double unlock(), which skiboot treats as fatal. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit bb192fd55ffb20d619101c5e3e1f4fd24f844d11) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-08head.S: store LR rather than CTR when trying to store LRStewart Smith1-1/+1
Long existing typo of r5 rather than r6, meaning we were storing CTR instead of LR. Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit d55194c5d9ada77eee2c9a69814708304f34d334) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-08head.S: store all of LR and CTROliver O'Halloran1-2/+2
When saving the CTR and LR registers the skiboot exception handlers use the 'stw' instruction which only saves the lower 32 bits of the register. Given these are both 64 bit registers this leads to some strange register dumps, for example: *********************************************** Unexpected exception 200 ! SRR0 : 0000000030016968 SRR1 : 9000000000201000 HSRR0: 0000000000000180 HSRR1: 9000000000001000 LR : 3003438830823f50 CTR : 3003438800000018 CFAR : 00000000300168fc CR : 40004208 XER: 00000000 In this dump the upper 32 bits of LR and CTR are actually stack gunk which obscures the underlying issue. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 70bc370883330c8b1076555c126647a3cdf88706) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-03Skiboot 5.4.4 release notesskiboot-5.4.4Stewart Smith1-0/+76
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-03-16hw/fsp: Do not queue SP and SPCN class messages during reset/reloadAnanth N Mavinakayanahalli4-0/+32
During FSP R/R, the FSP is inaccessible and will lose state. Messages to the FSP are generally queued for sending later. It does seem like the FSP fails to process any subsequent messages of certain classes (SP info -- ipmi) if it receives queued mbox messages it isn't expecting. In certain other cases (sensors), the FSP driver returns a default code (async completion) even though there is no known bound from the time of this error return to the actual data being available. The kernel driver keeps waiting, leading to a soft lockup on the host side. Mitigate both these (known) cases by returning OPAL_BUSY so the host driver knows to retry later. With this change, the sensors command works fine when the FSP comes back. This version also resolves the remaining IPMI issues. Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4940b8148640c06e139aec8c6d0370af7dd3b184) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-02-22core/pci: Fix PCIe slot's presenceGavin Shan2-1/+4
According to the PCIe spec, the presence bit is hardcoded to 1 if a PCIe switch downstream port doesn't support slot capability. The register used for the check in pcie_slot_get_presence_state() is wrong: it should be the PCIe capability register instead of the PCIe slot capability register. Otherwise, the presence bit is always reported as set in the PCI topology. The issue was found on Supermicro's p8dtu2u machine:
    # lspci -t
    -+-[0022:00]---00.0-[01-08]----00.0-[02-08]--+-01.0-[03]----00.0
     |                                           \-02.0-[04-08]--
    # cat /sys/bus/pci/slots/S002204/adapter
    1
    # lspci -vvs 0022:02:02.0
    0022:02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, \
        5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab) (prog-if 00 [Normal decode])
        :
        Capabilities: [68] Express (v2) Downstream Port (Slot+), MSI 00
        :
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                Changed: MRL- PresDet- LinkState-
This fixes the issue by checking the correct register (PCIe capability). Also, the register's value is cached in advance, as is already done for the slot and link capabilities. Fixes: bc66fb67aee ("core/pci: Support PCI slot") Cc: stable # 5.3.0+ Signed-off-by: Gavin Shan <gwhsan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 9e3c7ee4086fc9123134209aebcecd0c1f95e2ca) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
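The intended check can be sketched as below. This is a minimal illustration and not the skiboot implementation: the function and macro names are hypothetical, while the bit positions are those of the PCI Express Capabilities register (bit 8, Slot Implemented) and the Slot Status register (bit 6, Presence Detect State).

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI Express Capabilities register, bit 8: Slot Implemented */
#define EXP_CAP_SLOT_IMPL	0x0100
/* Slot Status register, bit 6: Presence Detect State */
#define EXP_SLOTSTAT_PDS	0x0040

/* Hypothetical helper: decide presence from the cached PCIe capability
 * register and the current slot status register. */
bool slot_is_present(uint16_t pcie_cap, uint16_t slot_status)
{
	/* No slot implemented on this port: presence is fixed to 1 */
	if (!(pcie_cap & EXP_CAP_SLOT_IMPL))
		return true;

	/* Slot implemented: trust the presence detect state bit */
	return (slot_status & EXP_SLOTSTAT_PDS) != 0;
}
```

For the port shown in the lspci output (Slot+ with PresDet-), this helper reports the slot as empty, whereas a check against the wrong register can fall into the "always present" branch.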
2017-02-22core/pci: More reliable way to update PCI slot power stateGavin Shan1-1/+1
The power control bit (SLOT_CTL, offset: PCIe cap + 0x18) isn't reliable enough to reflect the PCI slot's power state; the power indication bits are comparatively more reliable. Relying on the power control bit leads to a mismatch between the cached power state and the PCI slot's presence state, so the hotplug driver in the kernel refuses to unplug the devices properly on request. The issue was found with the NVMe card below on a "supermicro,p8dtu2u" machine. We don't have this issue on the integrated PLX 8718 switch.
    # lspci
    0022:01:00.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, \
                 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)
    0022:02:01.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, \
                 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)
    0022:02:04.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, \
                 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)
    0022:02:05.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, \
                 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)
    0022:02:06.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, \
                 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)
    0022:02:07.0 PCI bridge: PLX Technology, Inc. PEX 9733 33-lane, \
                 9-port PCI Express Gen 3 (8.0 GT/s) Switch (rev aa)
    0022:17:00.0 Non-Volatile memory controller: Device 19e5:0123 (rev 45)
This updates the cached PCI slot's power state using the power indication bits instead of the power control bit, to fix the above issue. Cc: stable #5.4.0+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit b03d75da4a7f1211e59166115ec66d1dd674fbad) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
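The difference between the two sources of power state can be illustrated with the Slot Control register layout from the PCIe spec (bits 9:8 Power Indicator Control, bit 10 Power Controller Control). The helper and macro names below are hypothetical, and treating every indicator value other than "off" as powered on is an assumption of this sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Slot Control register (PCIe cap + 0x18) */
#define SLOTCTL_PWR_IND_MASK	0x0300	/* bits 9:8: Power Indicator Control */
#define SLOTCTL_PWR_IND_OFF	0x0300	/* 11b: indicator off */
#define SLOTCTL_PWR_CTRL_OFF	0x0400	/* bit 10: 1 = power off */

/* Old approach: derive the cached state from the power control bit */
bool power_on_from_control(uint16_t slot_ctl)
{
	return !(slot_ctl & SLOTCTL_PWR_CTRL_OFF);
}

/* New approach: derive it from the power indication bits.
 * Assumption: anything other than "off" counts as powered on. */
bool power_on_from_indicator(uint16_t slot_ctl)
{
	return (slot_ctl & SLOTCTL_PWR_IND_MASK) != SLOTCTL_PWR_IND_OFF;
}
```

With a register value such as 0x0300 (indicator off, power control bit clear), the two helpers disagree, which is the kind of mismatch the commit describes.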
2017-02-16core/pci: Avoid hreset after fresetGavin Shan1-1/+2
Commit 5ac71c9 ("pci: Avoid hot resets at boot time") did not avoid the hot reset after a fundamental reset for PCIe common slots. This fixes it. Cc: stable # 5.3.x Reported-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 53a08f13e3310bec362c2ddf2aba1851e053fa14) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-01-16Add skiboot 5.4.3 release notesskiboot-5.4.3Stewart Smith1-0/+18
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-22Makefile: Disable stack protector due to gcc problemsBenjamin Herrenschmidt1-3/+9
Depending on how it was built, gcc will use the canary from a global (works for us) or from the TLS (doesn't work for us and accesses random stuff instead). Fixing that would be tricky. There are talks of adding a gcc option to force use of globals, but in the meantime, disable the stack protector. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [stewart@linux.vnet.ibm.com: add -fno-stack-protector] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit fe6f1f982b562ba855bb68fb51545f104078f546) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-22stack: Don't recurse into __stack_chk_failBenjamin Herrenschmidt1-2/+7
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit d7ffce9096d5a23ee4ff309910983d823e953bd2) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-22Makefile: Use -ffixed-r13Benjamin Herrenschmidt1-0/+1
We use r13 for our own stuff; make sure it's properly fixed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit d45b9bc4f98dfeac3ce6ee906948b56944f6aa6b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-22phb3: Lock the PHB on set_xive callbacksBenjamin Herrenschmidt1-0/+8
Those are called by the interrupts core and thus skip the locking implicit in the PCI opal calls. However IODA table access can be racy, so make sure we lock the PHB. Signed-off-by: Benjamin Herrenschmidt Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 55af871041a4b09e53013671450980bdb36f91e3) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-21arch_flash_arm: Don't assume mtd labels are shortJoel Stanley1-1/+1
pflash relies on arch_flash_arm parsing /proc/mtd to discover the pnor partition. It helpfully uses strcasestr so it can handle the string changing, which is what has happened as we moved to upstream compliant mtd device tree bindings. We currently have a string like this:
    dev:    size   erasesize  name
    mtd0: 00060000 00001000 "u-boot"
    mtd1: 00020000 00001000 "u-boot-env"
    mtd2: 00280000 00001000 "kernel"
    mtd3: 001c0000 00001000 "initramfs"
    mtd4: 01740000 00001000 "rofs"
    mtd5: 00400000 00001000 "rwfs"
    mtd6: 02000000 00001000 "1e620000.flash-controller:flash@1"
    mtd7: 08000000 00001000 "1e630000.flash-controller:pnor@0"
Unfortunately arch_flash_arm assumes the string will be at most 50 characters. That's right before the label we're looking for starts so we ignore that line and keep searching. Fix it by allowing for a 255 character line. Fixes: 48ab7ce09504 (external/pflash: Add --mtd) Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 7d6e73810dec029678a0d14a3f47485d4025520e) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
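The truncation problem can be reproduced with a small stand-alone sketch. It is not the arch_flash_arm code: the function names are hypothetical, next_chunk() only imitates how fgets() splits an over-long line, and plain strstr() is used here instead of strcasestr() to stay within standard C.

```c
#include <stdio.h>
#include <string.h>

/* Copy the next chunk of text into buf the way fgets() would: stop
 * after a newline or after bufsz - 1 characters. Returns the advanced
 * cursor, or NULL at end of input. */
const char *next_chunk(const char *cur, char *buf, size_t bufsz)
{
	size_t n = 0;

	if (*cur == '\0')
		return NULL;
	while (*cur != '\0' && n + 1 < bufsz) {
		buf[n++] = *cur;
		if (*cur++ == '\n')
			break;
	}
	buf[n] = '\0';
	return cur;
}

/* Return the index of the mtd device whose label contains "pnor",
 * or -1 if no chunk both parses as "mtdN:" and contains the label. */
int find_pnor_mtd(const char *proc_mtd, size_t bufsz)
{
	char buf[256];
	const char *cur = proc_mtd;
	int idx;

	if (bufsz < 2)
		return -1;
	if (bufsz > sizeof(buf))
		bufsz = sizeof(buf);

	while ((cur = next_chunk(cur, buf, bufsz)) != NULL) {
		if (sscanf(buf, "mtd%d:", &idx) == 1 && strstr(buf, "pnor"))
			return idx;
	}
	return -1;
}
```

With a 50-byte buffer, the first chunk of the mtd7 line ends inside "flash-controller", just before "pnor", and the remainder no longer starts with "mtdN:", so neither chunk matches. With a 255-byte buffer the whole line is seen at once and the device is found.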