Age | Commit message | Author | Files | Lines
2021-09-28 | SBE: Account cancelled timer request [skiboot-op940.x] | Vasant Hegde | 1 | -0/+3
[ Upstream commit b44c7594523d20945179e497c45ec9007981ac75 ] Currently we do not account for cancelled timer requests, so in some corner cases we may schedule a new timer request with new-timer-value > inflight-timer-value. Let's explicitly check the new_target value against the inflight timer value. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-09-28 | SBE: Rate limit timer requests | Vasant Hegde | 1 | -0/+22
[ Upstream commit 2e654443050acdd4deffdbb44723a847ca11e6b2 ]

We schedule a timer and wait for the `timer expiry` interrupt from the SBE. If we get a new timer request that is shorter than the inflight timer expiry value, we can update the timer (essentially sending a new timer chip-op; the SBE takes care of stopping the inflight timer and scheduling the new one).

The SBE runs at a much slower speed than the host CPU. If we do continuous timer updates like below, the SBE will be busy handling PSU-side timer messages and will not get time to handle FIFO-side requests:

  send timer chip-op -> Got ACK -> send timer chip-op

Hence this patch limits the number of continuous timer updates; we restart sending timer requests as soon as we get the timer expiry interrupt. The rate limit value (2) was suggested by the SBE team.

With this patch, if our timer requests are 2ms, 1500us, 1000us and 800us (each request arriving after the previous message was sent):
  - We schedule the timer for 2ms and then update it to 1500us and 1000us (these updates happen after getting the ACK interrupt from the SBE).
  - We do not send the 800us request.
  - At 1000us we get `timer expiry` and we are good to send the next timer request. (At this stage both the 1000us and 800us timeouts have expired. We schedule the next timer request with a timeout value of 500us (1500-1000).)

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
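The rate-limiting behaviour described in this commit message can be modelled in a few lines. This is only a sketch under stated assumptions, not the skiboot code: `struct timer_model`, `timer_request()`, `timer_expired()` and `MAX_TIMER_UPDATES` are hypothetical names, and targets are absolute times in microseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TIMER_UPDATES 2    /* rate limit value quoted in the commit message */

/* Hypothetical model of the timer state; not skiboot's data structure. */
struct timer_model {
    bool inflight;         /* a timer chip-op is programmed in the SBE */
    uint64_t target_us;    /* expiry of the inflight timer */
    unsigned int updates;  /* updates sent since the timer was scheduled */
};

/* Returns true if this request would cause a chip-op to be sent. */
static bool timer_request(struct timer_model *t, uint64_t new_target_us)
{
    assert(t != NULL);

    if (!t->inflight) {
        /* Nothing programmed yet: schedule a fresh timer. */
        t->inflight = true;
        t->target_us = new_target_us;
        t->updates = 0;
        return true;
    }
    if (new_target_us >= t->target_us)
        return false;      /* the inflight timer fires first anyway */
    if (t->updates >= MAX_TIMER_UPDATES)
        return false;      /* rate limited until the expiry interrupt */

    t->target_us = new_target_us;
    t->updates++;
    return true;
}

/* Models the `timer expiry` interrupt: updates are allowed again. */
static void timer_expired(struct timer_model *t)
{
    assert(t != NULL);
    t->inflight = false;
    t->updates = 0;
}
```

Feeding the model the sequence from the message (2000, 1500, 1000, 800) sends the first three requests and drops the fourth; after `timer_expired()` the next request is sent again.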
2021-09-28 | SBE: Check timer state before scheduling timer | Vasant Hegde | 1 | -2/+4
[ Upstream commit 47ab3a92298e72e44b9477a02b1312a09272a54a ]

Timer flow:
  - OPAL sends a timer chip-op to the SBE and waits for the ACK.
  - Until we get the ACK interrupt from the SBE we do not schedule any new timer.
  - Once we get the ACK, we either wait for timer expiry -OR- schedule a new one if new-timer-request < inflight-timer-timeout value.
  - If we get a new timer request while processing the current one, the p9_sbe_update_timer_expiry() code sets `has_new_target` and we schedule it in the ACK path (p9_sbe_timer_resp()).

p9_sbe_timer_resp() is a callback handler and it is called without the lock. It does not check whether the timer message (timer_ctrl_msg) is busy. So in theory we may hit the scenario below and corrupt msg_list:

  CPU 1 -> Timer ACK (callback handler) -- it is not holding any lock
  CPU 2 -> Grabbed sbe_timer_lock -> scheduled timer -> done
  CPU 3 -> p9_sbe_update_timer_expiry() -> sees timer is busy -> sets has_new_timer -> done
  CPU 1 -> gets a chance to grab sbe_timer_lock -> sees has_new_timer -> calls p9_sbe_timer_schedule() -> list corrupted!

This patch adds a timer message busy check in p9_sbe_timer_resp().

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
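The effect of the added busy check can be illustrated with a small single-threaded model that replays the CPU 1 / CPU 2 / CPU 3 interleaving in order. Everything here is a hypothetical stand-in (`struct sbe_timer_model`, `model_schedule()`, `model_timer_resp()`); the real handler operates on skiboot's message list under sbe_timer_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the fields stand in for the real timer state. */
struct sbe_timer_model {
    bool msg_busy;        /* timer message is queued, ACK not yet seen */
    bool has_new_target;  /* a shorter request arrived in the meantime */
    int queued;           /* how many times the message was queued */
};

/* Queue the (single, reused) timer message. */
static void model_schedule(struct sbe_timer_model *t)
{
    assert(t != NULL);
    t->msg_busy = true;
    t->has_new_target = false;
    t->queued++;
}

/* ACK path: only reschedule when the message is not already queued. */
static void model_timer_resp(struct sbe_timer_model *t)
{
    assert(t != NULL);
    if (t->msg_busy)
        return;           /* the check this commit adds */
    if (t->has_new_target)
        model_schedule(t);
}
```

Without the `msg_busy` test, the late ACK handler would queue a message that is already on the list, which is the list corruption described in the scenario.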
2021-09-28 | xscom: Fix xscom error logging caused due to xscom OPAL call | Gautham R. Shenoy | 1 | -2/+19
[ Upstream commit a4101173cacf79fcd91d395ab12aac9cb6840975 ] Commit 80fd2e963bd4 ("xscom: Don't log xscom errors caused by OPAL calls") ensured that xscom errors caused due to XSCOM read/write OPAL calls aren't logged in the error-log since the caller of the OPAL call is expected to handle it. However we are continuing to print the prerror() in the OPAL log regarding the same. This patch reduces the severity of the log from PR_ERROR to PR_INFO for the xscom read and write made via OPAL calls. Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Print info only for xscom read/writes made via opal calls Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-09-28 | xive/p9: Remove assert from xive_eq_for_target() | Cédric Le Goater | 1 | -1/+1
[ Upstream commit f07ea9564425d8005ab334dfa40f7cebe4e71fbf ] XIVE VPs are structures describing the vCPUs of guests. When starting a guest, these are allocated and enabled and some checks are done on the location of the associated ENDs, which describe the event queues. If the block of the VP and the block of the ENDs do not match, the XIVE driver asserts. Unfortunately, there is no way to check that a VP identifier is part of a VP block that was previously allocated and it is relatively easy to crash the host with a bogus VP id. That can be done with a QEMU hack on a machine using vsmt. Simply remove the assert, the OS should gracefully handle the error. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-09-28 | phb4: Disable TCE cache line buffer | Frederic Barrat | 2 | -0/+2
[ Upstream commit 15b93a301509ba7813343540e25b47ba395674b9 ] This patch implements a circumvention for HW557787. It disables the TCE cache line buffer as, under heavy loads, there's a possibility of an entry being re-allocated incorrectly. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-23 | platform/mihawk: Fix IPMI double-free | nichole | 1 | -2/+0
[ Upstream commit 68dc040a6540c218d20517764ff5d740a3626c55 ] Commit 6826095 ("platform/mihawk: support dynamic PCIe slot table") added an IPMI OEM command to communicate with the BMC. We called ipmi_free_msg(msg) twice, which caused Fast-reboot to fail. This patch removes the IPMI double-free to restore Fast-reboot. Signed-off-by: Nichole Wang <Nichole_Wang@wistron.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-6.6.x Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Fixes: commit 6826095 ("platform/mihawk: support dynamic PCIe slot table") Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | platform/mihawk: Tune equalization settings for opencapi | Frederic Barrat | 3 | -4/+33
[ Upstream commit afe6bc9051907d25082309895f8cfe44f59e2f25 ] The Bittware 250SOC adapter on Mihawk was showing a high count of CRC errors on one of the opencapi slots. The PHY team suggested new equalization settings to correct the errors. All existing adapters have been tested on mihawk to make sure the settings are compatible. However, the new settings should not be used on platforms other than mihawk. The changes specific to mihawk are: - Update the tx_ffe_pre_coeff and tx_ffe_post_coeff input parameters used during zcal - turn off the tx_ffe_boost parameter through scom Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | uart: Drop console write data if BMC becomes unresponsive | Vasant Hegde | 1 | -26/+74
[ Upstream commit 6bf21350da32776aac8ba75bf48933854647bd7e ] If the BMC becomes unresponsive (e.g. during a BMC reboot) during a console write, we may get stuck in uart_wait_tx_room(). This leaves the CPU stuck in OPAL, which results in kernel lockups and, in some cases, an unresponsive host. This patch introduces a timeout option: if a UART operation doesn't complete within a predefined time, the write data is dropped and the call returns. Note that this patch fixes both the OPAL internal console and the console write APIs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [Various fixes on top of Nick's proposal to have single timer - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | npu2: Invalidate entire TCE cache if many entries requested | Alexey Kardashevskiy | 1 | -5/+12
[ Upstream commit 2a0455ba0f7784b2d7e9e3915fd30f815afd2ae1 ] It turned out that invalidating entries in the NPU TCE cache is so slow that it becomes visible when running a 30+GB guest with GPU+NVlink2 passed through; a 100GB guest takes about 20s to map all 100GB. This patch falls through to invalidating the entire cache if more than 128 TCEs were requested to be invalidated, which reduces the 20s above to less than 1s. The KVM change [1] is required to see this difference. The threshold of 128 was chosen in an attempt not to affect performance much, as it is not clear how expensive it is to populate the TCE cache again; all we know for sure is that mapping the guest produces invalidation requests of 512 TCEs each. Note that TCE cache invalidation in PHB4 is faster and does not require the same workaround. [1] KVM: PPC: vfio/spapr_tce: Split out TCE invalidation from TCE updates https://patchwork.ozlabs.org/patch/1149003/ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
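The threshold decision described in this message can be sketched as below. The enum, the function name and the threshold parameter are assumptions made for illustration; only the value 128 and the "more than" comparison come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define TCE_KILL_ALL_THRESHOLD 128  /* value quoted in the commit message */

/* Hypothetical result type: what kind of invalidation to issue. */
enum tce_kill {
    TCE_KILL_NONE,   /* nothing to invalidate */
    TCE_KILL_PAGES,  /* invalidate the requested entries one by one */
    TCE_KILL_ALL     /* invalidate the entire TCE cache */
};

/* Pick the invalidation strategy for a request covering npages TCEs. */
static enum tce_kill choose_tce_kill(uint64_t npages, uint64_t threshold)
{
    assert(threshold > 0);

    if (npages == 0)
        return TCE_KILL_NONE;
    if (npages > threshold)
        return TCE_KILL_ALL;  /* per-entry kills would be too slow */
    return TCE_KILL_PAGES;
}
```

With the 512-TCE requests mentioned in the message, every guest mapping request would take the whole-cache path.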
2020-04-03 | platform/mihawk: support dynamic PCIe slot table | Joy Chu | 1 | -16/+212
[ Upstream commit 6826095796c91583f344950ff76ef8ccf6e756b5 ] Slot table auto-detection for different riser cards by using IPMI OEM command to communicate with BMC. Signed-off-by: Joy Chu <joy_chu@wistron.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-04-03 | hw/phb4: Tune GPU direct performance on witherspoon in PCI mode | Frederic Barrat | 3 | -24/+78
[ Upstream commit e876514b3773dcecc0b39317ca341d27db96d81c ]

Good GPU direct performance on witherspoon, with a Mellanox adapter on the shared slot, requires reallocating some DMA engines within PEC2, "stealing" some from PHB4&5 and giving extras to PHB3. This is currently done when using CAPI mode, but the same is true if the adapter stays in PCI mode.

In preparation for upcoming versions of MOFED, which may not use CAPI mode, this patch reallocates DMA engines even in PCI mode for a series of Mellanox adapters that can be used with GPU direct, on witherspoon and on the shared slot only.

The loss of DMA engines for PHB4&5 on witherspoon has not shown problems in testing, nor in current deployments where CAPI mode is used.

Here is a comparison of the bandwidth numbers seen with the PHB in PCI mode (no CAPI) with and without this patch. Variations on smaller packet sizes can be attributed to jitter and are not that meaningful.

# OSU MPI-CUDA Bi-Directional Bandwidth Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)    Bandwidth (MB/s)
#           with patch          without patch
1                  1.29                1.48
2                  2.66                3.04
4                  5.34                5.93
8                 10.68               11.86
16                21.39               23.71
32                42.78               49.15
64                85.43               97.67
128              170.82              196.64
256              385.47              383.02
512              774.68              755.54
1024            1535.14             1495.30
2048            2599.31             2561.60
4096            5192.31             5092.47
8192            9930.30             9566.90
16384          18189.81            16803.42
32768          24671.48            21383.57
65536          28977.71            24104.50
131072         31110.55            25858.95
262144         32180.64            26470.61
524288         32842.23            26961.93
1048576        33184.87            27217.38
2097152        33342.67            27338.08

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-26 | platform/mihawk: add nvme devices slot table | Joy Chu | 1 | -16/+76
[ Upstream commit d6ab89dbdbb894e835b08022bf9d46999ffc9df6 ] Add nvme slot table for broadcom gen4 nvme hba card support. Signed-off-by: Joy Chu <joy_chu@wistron.com> [oliver: fixed statement with no effect warning] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-20 | errorlog: Increase the severity of abnormal reboot events | Vasant Hegde | 1 | -1/+1
[ Upstream commit daf9215c85f910043c472984683948baaf18da39 ] Currently Linux will usually call opal_cec_reboot2() in response to unrecoverable HMIs and other serious hardware errors. OPAL handles platform errors by sending an error log to the BMC / FSP and triggering a software checkstop. Sending error logs to the BMC / FSP is normally an async operation, but in this path we need to ensure that error logs are sent out before the xstop is triggered. The easiest way to do that is to escalate the severity of the generated error log from "abnormal reboot" to "panic", since we force panic logs to be sent synchronously. It's also a more accurate description of what's happening. CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> [oliver: commit message] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-20 | eSEL: Make sure PANIC logs are sent to BMC before calling assert | Vasant Hegde | 1 | -2/+15
[ Upstream commit 033e797cb0d77b151ab6e47d8fb7666a09641107 ] eSEL logs are split into multiple smaller chunks and sent to the BMC. We use the ipmi_queue_msg_sync() interface for sending OPAL_ERROR_PANIC severity events to the BMC. But the callback handler (ipmi_cmd_done()) clears 'sync_msg' after getting the response to the first chunk, as it is not aware that we have more data to send. So in the assert()/checkstop path we may end up checkstopping the system before the error log is completely sent to the BMC, and we will miss a useful error log. This patch introduces a new wait loop in ipmi_elog_commit(). It waits until the error log is sent to the BMC. I think this is safe because even if something goes wrong (like a BMC reset) we will hit the timeout and eventually come out of this loop. Alternatively we could add an additional check in the ipmi_cmd_done() path, but I didn't want to make this path aware of the message type. Reviewed-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-12 | npu2-opencapi: Allow platforms to identify physical slots | Frederic Barrat | 5 | -3/+74
[ Upstream commit 9de4f2284c54433f7f4ff3dc3d13a39c657e2c19 ]

This patch lets each platform define the name of the opencapi slots. It makes it easier to identify which physical card is generating errors or messages in the linux or skiboot log files. The patch provides slot names for mihawk and witherspoon. If the platform doesn't define any, then we default to 'OPENCAPI-xxxx'.

There are various ways to find out about the slot names:
  - skiboot log
  - lspci command (if the PCI hotplug driver pnv-php is loaded)
  - lshw
  - checking the device tree
and probably others...

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2, npu3: Remove ibm,phb-index property from the NPU dt node | Frederic Barrat | 7 | -14/+2
[ Upstream commit bbb4777f682dab0f1411a493861af9e340e81229 ] The 'ibm,phb-index' property of the NPU node is now useless, as we can have multiple PHBs associated to the same NPU on P9. Let's remove it to avoid confusion. Reviewed-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu3: Don't use the device tree to assign the phb-index of the PHB | Frederic Barrat | 2 | -1/+7
[ Upstream commit 57d43efd6bbb052b467df3a19ca84feccdd0649b ] On Axone, there's a 1-to-1 mapping between virtual PHBs and NPUs. We could keep assigning the phb-index of the virtual PHB from the value found in the npu node of the device tree, but to be consistent with P9/npu2 and avoid confusion, this patch assigns the phb-index when the virtual PHB is created, based on the npu index, similarly to what we do on P9. Reviewed-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2: Rework phb-index assignments for virtual PHBs | Frederic Barrat | 4 | -3/+22
[ Upstream commit da28a6642b79a68d6c8773f149692a3702a31240 ] Until now, opencapi PHBs were not using the 'ibm,phb-index' property, as it was thought unnecessary. For nvlink, a phb-index was associated to the npu when parsing hdat data, and the nvlink PHB was reusing the same value. It turns out it helps to have the 'ibm,phb-index' property for opencapi PHBs after all. Otherwise it can lead to wrong results on platforms like mihawk when trying to match entries in the slot table. We end up with an opencapi device inheriting wrong properties in the device tree, because match_slot_phb_entry() default to phb-index 0 if it cannot find the property. Though it doesn't seem to cause any harm, it's wrong and a future patch is expected to start using the slot table for opencapi, so it needs fixing. The twist is that with opencapi, we can have multiple virtual PHBs for a single NPU on P9. There's one PHB per (opencapi) brick. Therefore there's no 1-to-1 mapping between the NPU and PHB index and it no longer makes sense to associate a phb-index to a npu. With this patch, opencapi PHBs created under a NPU use a fixed mapping for their phb-index, based on the brick index. The range of possible values is 7 to 12. Because there can only be one nvlink PHB per NPU, it is always using a phb-index of 7. A side effect is that 2 virtual PHBs on 2 different chips can have the same phb-index, which is similar to what happens for 'real' PCI PHBs, but is different from what was happening on a nvlink-only witherspoon so far. Reviewed-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
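One way to picture the fixed mapping is a linear offset from the brick index. Note the assumptions: the message only states that the mapping is based on the brick index and that values range from 7 to 12; the linear form, the six-brick limit and the names `PHB_INDEX_BASE`, `MAX_BRICKS` and `opencapi_phb_index()` are illustrative guesses consistent with that range.

```c
#include <assert.h>

#define PHB_INDEX_BASE 7  /* lowest phb-index quoted in the commit message */
#define MAX_BRICKS     6  /* assumed: 7..12 leaves room for six bricks */

/*
 * Hypothetical mapping from a brick index to the phb-index of its
 * virtual PHB. Returns -1 for a brick index outside the assumed range.
 */
static int opencapi_phb_index(unsigned int brick_index)
{
    /* Sanity check: the mapping must cover exactly 7..12. */
    assert(PHB_INDEX_BASE + MAX_BRICKS - 1 == 12);

    if (brick_index >= MAX_BRICKS)
        return -1;
    return PHB_INDEX_BASE + (int)brick_index;
}
```

Because the result depends only on the brick index, two virtual PHBs on different chips get the same phb-index, which matches the side effect noted in the message.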
2020-03-11 | npu2-opencapi: Log a warning when resetting a broken device | Frederic Barrat | 1 | -0/+4
[ Upstream commit 233e863c8b1dccad8be7c39336d232a4a3994e6b ] On P9, the NPU doesn't support recovery if the link goes down unexpectedly. It was not fully verified. We mark the device as broken when we receive an error interrupt from the NPU. However, there's nothing to prevent the OS from trying to reset the device; It may or may not work, it's unsupported territory, so let's log a message to make it clear, as it could help when debugging. We haven't hit any cases where the reset goes badly enough that we'd want to prevent it, so let it go for now. We can revisit later if we have evidence that it's causing more problems than it is worth. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Handle OPAL_UNMAP_PE operation on set_pe() callback | Frederic Barrat | 1 | -1/+6
[ Upstream commit 9d5faafc56f5cac7ba848bc684835353e039f048 ] In a hot-unplug scenario, the OS will try to unmap the PE. Skiboot doesn't do anything with the linux PE for opencapi other than being a mailbox, but at least let's be consistent. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Activate PCI hotplug on opencapi slot | Frederic Barrat | 1 | -4/+65
[ Upstream commit 6299d3e51b16b47a00685da70688df634dedc7df ] Implement the get_power_state() and set_power_state() callbacks for the opencapi slot and add properties in the device tree to mark the opencapi slot as hot-pluggable. We don't really power off/on the opencapi adapter. The slot at play here is the virtual slot associated to the virtual opencapi PHB. The real PCIe slot where the card is drawing its power from is untouched (skiboot is not even aware which PCIe slot the card is seated on). So the 'fake' power off is fencing the card and set it in reset so that the FPGA image can be updated. The 'fake' power on is not doing much, as the unfencing happens on the subsequent link training. Opencapi slots are named 'OPENCAPI-xxxx' where xxxx is the opal ID of the PHB/slot. This is meant to easily identify the slot used by an AFU device, as the AFU device names are also built around that ID. For example, the device /dev/ocxl/AFP3.0006:00:00.1.0 uses the slot OPENCAPI-0006. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
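The naming scheme can be sketched with a single formatted print. This assumes that 'xxxx' is the ID printed as four zero-padded hexadecimal digits, which fits the OPENCAPI-0006 example but is not spelled out in the message; `opencapi_slot_label()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build the default slot label for a given opal ID.
 * Assumption: the ID is rendered as 4 zero-padded hex digits.
 * Returns the snprintf() result (length of the full label).
 */
static int opencapi_slot_label(char *buf, size_t len, unsigned int opal_id)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "OPENCAPI-%04x", opal_id & 0xffffu);
}
```

For the example in the message, an ID of 6 gives the label used by /dev/ocxl/AFP3.0006:00:00.1.0.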
2020-03-11 | npu2-opencapi: Improve error reporting to the OS | Frederic Barrat | 3 | -4/+29
[ Upstream commit 40bc636eb6b64473ec9ef167bd2bc91a9c032806 ] When resetting an opencapi link, the brick will be fenced temporarily. Therefore we can't rely on the fencing state of the brick any more to check for the health of an opencapi PHB, as we could report errors if queried for a PHB state at the same time a link is being reset. Instead, we flag the device as 'broken' when an error interrupt is received, just before raising an event to the OS. When the OS is querying for the state of a PHB, we only have to look at the 'broken' attribute. Note that there's no recovery possible on P9 when an error interrupt is received unexpectedly, as recovery is not supported by hardware. So when a device/link is marked as 'broken', it stays broken. All the OS can do is log the error and notify the drivers. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Detect PHY reset errors | Frederic Barrat | 3 | -5/+17
[ Upstream commit dbc70aea3a2eec5d8d3c092c2397b2997e35ba60 ] PHY reset can fail! Though past problems are now fixed, let's handle any future failure. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Simplify freset states | Frederic Barrat | 1 | -13/+3
[ Upstream commit 7989d6edfcbbe9ba061b667f53b82f6860ffff01 ] Let's get rid of one transitional state, since there's no need to pause in between releasing the reset signals of the ODL and the adapter. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Tweak fundamental reset sequence | Frederic Barrat | 2 | -24/+26
[ Upstream commit c5db832570a73e30fb5b668d8cce759178b2e4b7 ] Modify slightly the ordering of a few steps in our init sequence on fundamental reset, so that it can be called from the OS, when the link is already up: - when the card is reset, the link goes down, so we need to fence the brick to prevent errors propagating to the NPU and OS - since fencing and unfencing don't require any delay, let's also fence/unfence during the very first reset at boot. It's useless but doesn't hurt and keep the code simpler. - resetting the PHY must be done a bit later, while fenced and the ODL and DLx in reset Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Rework link training timeout | Frederic Barrat | 2 | -4/+7
[ Upstream commit 2600cfac4db106b219deee042f53e8d9c54d857d ] Opencapi link state should be polled for up to 3 seconds. Current code assumes a tight retry loop during fundamental reset at boot, which is not going to be true on link retraining. So update the timeout detection code to use a timebase instead of a simple retry count which could be way too long. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
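The difference between counting retries and comparing against a time-based deadline can be sketched as follows. The clock is passed in as a plain millisecond value so the example stays self-contained; `struct link_poll`, `link_poll_start()` and `link_poll_step()` are hypothetical names, and only the 3 second limit comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINK_TRAINING_TIMEOUT_MS 3000  /* "up to 3 seconds" */

enum poll_result { POLL_AGAIN, POLL_LINK_UP, POLL_TIMEOUT };

/* Hypothetical polling state: an absolute deadline, not a retry count. */
struct link_poll {
    uint64_t deadline_ms;
};

/* Record the deadline once, when link training starts. */
static void link_poll_start(struct link_poll *p, uint64_t now_ms)
{
    assert(p != NULL);
    p->deadline_ms = now_ms + LINK_TRAINING_TIMEOUT_MS;
}

/*
 * One polling step. The outcome depends on elapsed time only, so it is
 * the same whether the caller polls in a tight loop or sporadically.
 */
static enum poll_result link_poll_step(const struct link_poll *p,
                                       uint64_t now_ms, bool link_up)
{
    assert(p != NULL);
    if (link_up)
        return POLL_LINK_UP;
    if (now_ms >= p->deadline_ms)
        return POLL_TIMEOUT;
    return POLL_AGAIN;
}
```

A retry counter gives the right limit only if the interval between polls is known, which is the assumption the message says no longer holds on link retraining.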
2020-03-11 | npu2-hw-procedures: Fix link retraining on reset | Frederic Barrat | 1 | -0/+16
[ Upstream commit fed081dcbd0a1fb84a61bc3429a615e1fc8bd780 ] Link retraining was showing reliability problems due to some opencapi-only settings not being optimized. This patch updates some extra PHY state, as agreed with the PHY team. Though they mostly impact link retraining behavior, they should also be set at boot. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-opencapi: Make sure the PCI slot has the proper ID | Frederic Barrat | 1 | -1/+2
[ Upstream commit 544ce7ef2b8c926cbdd7a23c0f796c1fb157c096 ] The PCI slot created for the opencapi PHB didn't have its ID properly defined because it was created before we assign an ID to the PHB. Simply switch the PCI slot creation and PHB registration calls to fix it. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-11 | npu2-hw-procedures: Move some opencapi PHY settings in one-off init | Frederic Barrat | 1 | -19/+16
[ Upstream commit 13e1a7e54cf0f46e2fe414b8499661dd4b9b903d ] The PHY_RX_AC_COUPLED and PHY_RX_SPEED_SELECT for opencapi are group settings for the obus. They should be set in the one-off PHY init function at boot and not on the link reset path, as they theoretically impact more than one link. Since we cannot mix link type and/or speed on an optical bus, it has no practical impact; it just looks cleaner. Also use the OCAPIINF macro for the associated traces. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | core/pci: Fix scan of devices for opencapi slots | Frederic Barrat | 1 | -5/+15
[ Upstream commit 94bc2d7a85110d752a8424d5f85382c4f02ec155 ] Opencapi devices are found directly under the PHB and the PHB slot doesn't have an associated PCI device (root complex). So when scanning a PHB, devices are added directly under the PHB, like it's done at boot time. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | core/pci: Train link of PHB slots when hotplugging | Frederic Barrat | 1 | -22/+85
[ Upstream commit 30642155862c6fab9c8eca98b0a3f84636a98c01 ] The link of PHB slots must be trained after powering on. This can be done by calling the fundamental reset callback of the slot. We could force a reset for all the slots and have a common path in set_power_state(). But this patch only resets the PHB slot. Some slot implementations do a power cycle during fundamental reset, so calling a reset after powering on would repeat that operation. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | core/pci: Use proper phandle during hotplug for PHB slots | Frederic Barrat | 1 | -6/+15
[ Upstream commit 8bae237693b6e05cd979c28a22e8a8efef1cb2bc ] PHB slots don't have an associated device (slot->pd = NULL). They were not used by the PCI hotplug framework so far, but with opencapi virtual PHBs, that's changing. With opencapi, devices are directly under the PHB (no root complex or intermediate bridge) and the slot used for hotplug is the PHB slot. This patch uses the proper phandle when replying asynchronously to the OS when using a PHB slot. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | core/pci: Add missing lock in set_power_timer | Frederic Barrat | 2 | -0/+12
[ Upstream commit 38e51a31c815befd9b455fd6896d0753ebeb4a38 ] set_power_timer() was not using any lock, though it alters the slot state and devices found under it. There's a remote possibility that set_power_timer() is called through check_timers() by a thread already holding the phb lock, so we try to take the lock but yield and rearm the timer if somebody else is already owning it. There really shouldn't be any contention here. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
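The "try the lock, otherwise yield and rearm" pattern can be sketched with a toy lock. This is a single-threaded illustration only: the boolean lock is not atomic, and `struct phb_model`, `struct slot_model`, `phb_try_lock()` and `power_timer_expired()` are hypothetical stand-ins for the real PHB lock and timer callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock owner; a real implementation needs an atomic try-lock. */
struct phb_model {
    bool locked;
};

struct slot_model {
    struct phb_model *phb;
    bool timer_armed;    /* timer is pending and will fire again */
    int state_updates;   /* slot state changes done under the lock */
};

static bool phb_try_lock(struct phb_model *phb)
{
    if (phb->locked)
        return false;
    phb->locked = true;
    return true;
}

static void phb_unlock(struct phb_model *phb)
{
    phb->locked = false;
}

/* Timer callback: never blocks on the lock, since the caller may own it. */
static void power_timer_expired(struct slot_model *slot)
{
    assert(slot != NULL && slot->phb != NULL);

    slot->timer_armed = false;
    if (!phb_try_lock(slot->phb)) {
        slot->timer_armed = true;  /* rearm and retry on the next expiry */
        return;
    }
    slot->state_updates++;         /* safe: slot state changed under lock */
    phb_unlock(slot->phb);
}
```

Blocking on the lock instead would deadlock in the case the message describes, where the thread running the timers already holds the PHB lock.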
2020-03-10 | core/pci: Refactor common paths on slot hotplug | Frederic Barrat | 1 | -17/+26
[ Upstream commit f6f247a8c46848a0c42fec7cab27f09cd8f2d5e2 ] Refactor code executed to remove or rescan devices when a slot power state changes, synchronously or asynchronously through a timer callback. It will be more useful in a future patch. No functional changes. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | skiboot v6.5.3 release notes [v6.5.3] | Vasant Hegde | 1 | -0/+24
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | npu2-opencapi: Don't drive reset signal permanently | Frederic Barrat | 1 | -6/+40
[ Upstream commit 53408440edb30de7ad18f12db285f15a0863fbc3 ] A problem was found with the way we manage the I2C signal to reset adapters. Skiboot currently always drives the value of the opencapi reset signal. We set the I2C pin for reset in output mode and keep it in output mode permanently. And since the reset signal is inverted, it is explicitly set to high by the I2C controller pretty much all the time. When the opencapi card is powered off, for example on a reboot, actively driving the I2C reset pin to high keeps applying a voltage to part of the FPGA, which can leak current, send the FPGA in a bad state since it's unexpected or even damage the card. To prevent damaging adapters, the recommendation from the hardware team is to switch back the pin to input mode at the end of a reset cycle. There are pull-up resistors on the planar of all the platforms to make sure the reset signal is high "naturally". When the slot is powered off, the reset pin won't be kept high by the i2c controller any more. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10 | mpipl: Rework memory reservation for OPAL dump | Vasant Hegde | 3 | -23/+40
[ Upstream commit b0e024216a3b1d35aa2273b6f64742db7ae49861 ]

During boot, OPAL reserves the memory required to capture the OPAL dump and architected register data. During MPIPL, hostboot copies the OPAL dump to this memory. Post MPIPL, the kernel uses this memory to create opalcore. We use mem_reserve_fw() for this reservation. At present this reservation happens late in the init path, so it may clash with memory allocated by local_alloc().

We have two options to fix the above issue:
  - Use local_alloc() for allocating memory for the OPAL dump.
    This works fine on first boot, and we can use this method to reserve memory. But post MPIPL we still want to reserve the destination memory to make sure no one stomps on this area. Also, this reservation might have happened in between other local allocations, so in the post-MPIPL boot the allocator may not find enough memory in the first region for other local_alloc() requests and may throw a mem_alloc() error before trying to allocate from other regions.
  - Early memory reservation for the OPAL dump.
    Allocate and reserve memory just after memory region init.

This patch uses the second approach to fix the reservation issue.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-10xscom: Don't log xscom errors caused by OPAL callsOliver O'Halloran2-4/+14
[ Upstream commit 80fd2e963bd4364ee8c3b5a06215d8cbdfe04fcb ] The XSCOM read/write OPAL calls are largely there to support running PRD in the OS. PRD itself handles submitting error logs (if needed) when an XSCOM operation fails, so there's no need to send an error log from inside of OPAL. Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-09mpipl: Disable fast-reboot during post MPIPL bootVasant Hegde1-0/+2
[ Upstream commit b858aef5210e98b19419ad4dc347cf96d89cbf85 ] Otherwise the device tree will continue to have `mpipl-boot` and the kernel may think it's an MPIPL boot. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-09hdata: Update MPIPL support IPL parameterVasant Hegde1-1/+1
[ Upstream commit a448c4e2f6c3962b1690e367d1a2a8e03180b320 ] We used bit 4 of the `sys_attributes` attribute for the MPIPL supported flag. Unfortunately we forgot to update the HDAT spec, and bit 4 is now used for a different purpose. Hence use bit 5 for MPIPL. Fortunately we don't have any released firmware with MPIPL supported yet, so it's safe to make this change. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-01-29npu2: Clear fence on all bricksAlexey Kardashevskiy1-5/+12
[ Upstream commit 9be9a77a8352aee0bb74ac0d79f55e1238f76285 ] A bug in the NVidia driver can cause a UR HMI which fences bricks (links). At the moment we clear fence status only for the bricks of a specific device; however, this does not appear to be enough and we need to clear fences for all bricks. This is ok as we do not allow using GPUs individually anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-09skiboot v6.5.2 release notesv6.5.2Vasant Hegde1-0/+28
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-09libstb/tpm: block access to unknown i2c devs on the tpm busOliver O'Halloran1-4/+43
[ Upstream commit 9f7b726ccf7ee9b6fe50277e79e0bd285bfa9918 ] Our favourite TPM is capable of listening on multiple I2C bus addresses and although this feature is supposed to be disabled by default we have some systems in the wild where the TPM appears to be listening on these secondary addresses. The secondary addresses are also susceptible to the bus-lockup problem that we see with certain traffic patterns to the "main" TPM address. Since we don't know what addresses the TPM might be listening on, it's best to take a conservative approach and only allow traffic to I2C bus addresses that we are explicitly told about by firmware. This is only required on the TPM bus, so this patch extends the existing TPM workaround to also check that a DT node exists for any I2C bus address the OS wants to talk to. If there isn't one, we don't forward the I2C request to the bus and return an I2C timeout error to the OS. Acked-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
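The allow-list idea described above can be sketched in standalone C. This is only an illustration, not the skiboot code: the real patch looks for a device-tree node per address, whereas this sketch assumes a plain array of known addresses, and the names `addr_is_known`, `filter_tpm_bus_request`, `SKETCH_RC_OK` and `SKETCH_RC_TIMEOUT` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes for this sketch (not OPAL's real values). */
#define SKETCH_RC_OK       0
#define SKETCH_RC_TIMEOUT  (-1)

/* Return true if addr appears in the list of addresses firmware told us about. */
static bool addr_is_known(const uint8_t *known, size_t n, uint8_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (known[i] == addr)
			return true;
	return false;
}

/*
 * Decide whether a request on the TPM bus may be forwarded.
 * Unknown addresses are not forwarded; the caller sees a timeout,
 * as if no device had answered.
 */
static int filter_tpm_bus_request(const uint8_t *known, size_t n, uint8_t addr)
{
	return addr_is_known(known, n, addr) ? SKETCH_RC_OK : SKETCH_RC_TIMEOUT;
}
```

Reporting a timeout rather than a distinct "denied" error means the OS sees the same result it would get from an empty bus address.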
2019-12-06slw: slw_reinit fix array overrunNicholas Piggin1-1/+1
[ Upstream commit 6b512fceb4210d5cf166912ef72c90cd29caec67 ] The slw patch saving array is too small, which results in slw_reinit overwriting 32 bytes beyond the end of it. The size is increased to 0x100, which is the architecture interrupt vector size. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-06IPMI: Trigger OPAL TI in abort path.Mahesh Salgaonkar1-7/+23
[ Upstream commit a810d1fe7a1631dc1eb211ff70885f044cb40904 ] The current assert/abort implementation for BMC based systems invokes cec reboot after printing a backtrace. This means that the BMC never gets notified about an OPAL crash/termination. This sometimes leads to a never-ending IPL loop if OPAL keeps aborting very early in the boot path. Trigger a software xstop (OPAL TI) to inform the BMC about the OPAL termination. The BMC is capable of catching the checkstop signal and facilitating a reboot (IPL) of the host. With the AutoReboot policy, OpenBMC handles checkstop signals and counts them against the reboot counter. In cases where OPAL is crashing before the host reaches runtime, OpenBMC will move the system into the Quiesced state after 3 or so attempts of IPL/reboot so that the system can be debugged. When OPAL triggers a software checkstop it causes all the CPU threads to be stopped and moved to the quiesced state. Hence OPAL doesn't need to explicitly stop all CPUs before calling software xstop. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05platform/mihawk: Add system VPD EEPROM to I2C busJoy Chu1-0/+20
[ Upstream commit 52952aca9d6148e7ae3c3725ae43d48e27b61357 ] Add VPD EEPROM type fix for planar VPD update. Signed-off-by: Joy Chu <joy_chu@wistron.com> [oliver: commit subject] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05platform/mihawk: Detect old system compatible stringFrederic Barrat1-1/+2
[ Upstream commit 425340bd2809808f21856d80bf76b785b87b6041 ] Newer firmware declares the system as "ibm,mihawk", but the labs are full of older installs, which were using "wistron,mihawk". Let's keep detecting the older string since it allows running recent skiboot on an older fw stack and makes people's lives a little tiny bit easier. Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
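As a rough illustration of accepting both the new and the old compatible string, here is a minimal standalone sketch. It assumes the compatible value is available as a plain C string; skiboot itself works through its device-tree helpers, and the names `mihawk_compat` and `is_mihawk` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Both strings come from the commit message: new first, legacy second. */
static const char *const mihawk_compat[] = {
	"ibm,mihawk",
	"wistron,mihawk",
};

/* Return true if compat matches either the current or the legacy string. */
static bool is_mihawk(const char *compat)
{
	size_t i;

	if (!compat)
		return false;
	for (i = 0; i < sizeof(mihawk_compat) / sizeof(mihawk_compat[0]); i++)
		if (strcmp(compat, mihawk_compat[i]) == 0)
			return true;
	return false;
}
```

Keeping the accepted strings in one table makes it cheap to retire the legacy entry later, should the older installs disappear.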
2019-12-05npu2/hw-procedures: Remove assertion from check_credits()Reza Arbab1-9/+6
[ Upstream commit 24664b48642845d620e225111bf6184f3c102f60 ] The RX clock mux in the NVLink PHY can glitch, which will manifest in hard to diagnose behavior--at best, a checkstop during the first link traffic. The only reliable way we found to detect this was by checking for a discrepancy in the credits we expect to receive during link training. Since the time the check was added, we've found that * Commit ac6f1599ff33 ("npu2: hw-procedures: Add phy_rx_clock_sel()") does work around the original glitch. * Asserting is too harsh. Before root cause was established, it was thought this could have been a manufacturing defect and we wanted to loudly fail hardware acceptance boot cycle tests. * It seems there is a valid situation in which credits are off from the expected value. During GPU hot reset, a CPU prefetch across the link can affect the credit count before we check. Given all of the above, remove the assert(). Cc: stable # 6.0.x Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05npu2-opencapi: Fix integer promotion bug in LPC allocationAndrew Donnellan1-1/+1
[ Upstream commit e85e2e2b8b0a4dd96e10343df19df1a161cd6aac ] If you try to allocate an amount of LPC memory that's not a power of 2, we round the value up to the nearest power of 2. By the magic of C, "1 << n" gets treated as an int, even if you're assigning it to a uint64_t. Change 1 to 1ULL to fix this. (C, it's great.) Reported-by: Alastair D'Silva <alistair@d-silva.org> Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
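The pitfall described above can be shown with a small standalone sketch. It is not the npu2-opencapi code; `round_up_pow2` is a hypothetical helper that only illustrates why the literal needs to be `1ULL`. With a plain `1`, the shift is evaluated as `int` before any assignment to `uint64_t`, so shift counts of 31 or more overflow (undefined behaviour in C).

```c
#include <stdint.h>

/*
 * Round size up to the nearest power of two (hypothetical helper).
 *
 * Writing "1 << n" here would perform the shift in int arithmetic,
 * regardless of the uint64_t it is compared with or assigned to.
 * "1ULL << n" keeps the whole computation in 64 bits.
 */
static uint64_t round_up_pow2(uint64_t size)
{
	unsigned int n = 0;

	while (n < 63 && (1ULL << n) < size)
		n++;
	return 1ULL << n;
}
```

For example, a request slightly above 8 GiB should round to 16 GiB, which is only representable when the shift is done in 64-bit arithmetic.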