Age | Commit message | Author | Files | Lines
2021-09-29 | skiboot v6.6.6 release notes [v6.6.6] [skiboot-6.6.x] | Cédric Le Goater | 1 | -0/+15
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-09-29 | phb4: Disable TCE cache line buffer | Frederic Barrat | 2 | -0/+2
This patch implements a circumvention for HW557787. It disables the TCE cache line buffer as, under heavy loads, there's a possibility of an entry being re-allocated incorrectly. [ Upstream commit 15b93a301509ba7813343540e25b47ba395674b9 ] Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-01-06 | skiboot v6.6.5 release notes [v6.6.5] | Vasant Hegde | 1 | -0/+25
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | SBE: Account cancelled timer request | Vasant Hegde | 1 | -0/+3
[ Upstream commit b44c7594523d20945179e497c45ec9007981ac75 ] Currently we do not account for cancelled timer requests, so in some corner cases we may schedule a new timer request with new-timer-value > inflight-timer-value. Let's explicitly check the new_target value against the in-flight timer value. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | SBE: Rate limit timer requests | Vasant Hegde | 1 | -0/+22
[ Upstream commit 2e654443050acdd4deffdbb44723a847ca11e6b2 ]
We schedule a timer and wait for the `timer expiry` interrupt from the SBE. If we get a new timer request whose expiry is earlier than that of the in-flight timer, we can update the timer (essentially sending a new timer chip-op; the SBE takes care of stopping the in-flight timer and scheduling the new one).
The SBE runs at a much slower speed than the host CPU. If we update the timer continuously, as below, the SBE will be busy handling PSU-side timer messages and will not get time to handle FIFO-side requests:
  send timer chip-op -> Got ACK -> send timer chip-op
Hence this patch limits the number of continuous timer updates; we restart sending timer requests as soon as we get the timer expiry interrupt. The rate limit value (2) was suggested by the SBE team.
With this patch, if our timer requests are 2ms, 1500us, 1000us and 800us (and each request arrives after the previous message has been sent):
  - We schedule the timer for 2ms and then update it to 1500us and 1000us (these updates happen after getting the ACK interrupt from the SBE).
  - We do not send the 800us request.
  - At 1000us we get `timer expiry` and are good to send the next timer request. (At this stage both the 1000us and 800us timeouts have happened. We schedule the next timer request with a timeout value of 500us (1500-1000).)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
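The rate limit described in the commit above can be sketched as a small state machine. This is a simplified model, not the skiboot implementation: `struct timer_model`, `timer_request()`, `timer_expired()` and `MAX_CONT_UPDATES` are hypothetical names, and only the limit value of 2 comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the rate limit; these names are not skiboot's. */
#define MAX_CONT_UPDATES 2 /* limit value quoted in the commit message */

struct timer_model {
	bool in_flight;       /* a timer chip-op is currently scheduled */
	uint64_t target_us;   /* expiry of the in-flight timer */
	unsigned int updates; /* updates sent since the timer was scheduled */
};

/* Returns true if a timer chip-op would be sent for this request. */
static bool timer_request(struct timer_model *t, uint64_t new_target_us)
{
	if (!t->in_flight) {
		/* Nothing scheduled yet: always send. */
		t->in_flight = true;
		t->target_us = new_target_us;
		t->updates = 0;
		return true;
	}
	if (new_target_us >= t->target_us)
		return false; /* the in-flight timer fires first anyway */
	if (t->updates >= MAX_CONT_UPDATES)
		return false; /* rate limited until the next expiry */

	t->target_us = new_target_us;
	t->updates++;
	return true;
}

/* Timer expiry interrupt: updates may be sent again. */
static void timer_expired(struct timer_model *t)
{
	t->in_flight = false;
	t->updates = 0;
}
```

Feeding the model the requests from the commit message (2000, 1500, 1000, 800) sends the first three and suppresses the fourth; after `timer_expired()` the next request goes out again.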
2021-01-06 | SBE: Check timer state before scheduling timer | Vasant Hegde | 1 | -2/+4
[ Upstream commit 47ab3a92298e72e44b9477a02b1312a09272a54a ]
Timer flow:
  - OPAL sends a timer chip-op to the SBE and waits for the ACK.
  - Until we get the ACK interrupt from the SBE we will not schedule any new timer.
  - Once we get the ACK, either we wait for timer expiry or we schedule a new one if new-timer-request < inflight-timer-timeout value.
  - If we get a new timer request while processing the current one, the p9_sbe_update_timer_expiry() code sets `has_new_target` and we schedule it in the ACK path (p9_sbe_timer_resp()).
p9_sbe_timer_resp() is a callback handler and it is called without the lock held. It does not check whether the timer message (timer_ctrl_msg) is busy. So in theory we may hit the scenario below and corrupt msg_list:
  CPU 1 -> Timer ACK (callback handler) -- it's not holding any lock
  CPU 2 -> Grabbed sbe_timer_lock -> scheduled timer --> done
  CPU 3 -> p9_sbe_update_timer_expiry() -> sees timer is busy -> sets has_new_target -> done
  CPU 1 -> gets a chance to grab sbe_timer_lock -> sees has_new_target -> calls p9_sbe_timer_schedule() --> List corrupted!
This patch adds a timer message busy check in p9_sbe_timer_resp().
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
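The effect of the busy check in the commit above can be illustrated with a single-threaded model. This is only a sketch: `struct sbe_timer_model`, `model_timer_schedule()` and `model_timer_resp()` are hypothetical stand-ins, locking is omitted, and queuing the one shared message while it is still busy stands in for the list corruption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the SBE timer state; not skiboot's real types. */
struct sbe_timer_model {
	bool msg_busy;       /* the single timer message is queued or in flight */
	bool has_new_target; /* an earlier expiry was requested meanwhile */
	int times_queued;    /* how often the message has been queued */
};

/* Queue the shared timer message. Doing this while busy models the bug. */
static void model_timer_schedule(struct sbe_timer_model *t)
{
	t->msg_busy = true;
	t->times_queued++;
}

/* ACK callback with the busy check; assumed to run under the timer lock. */
static void model_timer_resp(struct sbe_timer_model *t)
{
	if (!t->has_new_target)
		return;
	if (t->msg_busy)
		return; /* another CPU already re-queued the message */

	t->has_new_target = false;
	model_timer_schedule(t);
}
```

Replaying the CPU 2, CPU 3, CPU 1 interleaving from the commit message against this model leaves `times_queued` at 1, because the callback backs off while the message is busy and the pending target is picked up on a later ACK.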
2021-01-06 | xscom: Fix xscom error logging caused due to xscom OPAL call | Gautham R. Shenoy | 1 | -2/+19
[ Upstream commit a4101173cacf79fcd91d395ab12aac9cb6840975 ] Commit 80fd2e963bd4 ("xscom: Don't log xscom errors caused by OPAL calls") ensured that xscom errors caused due to XSCOM read/write OPAL calls aren't logged in the error-log since the caller of the OPAL call is expected to handle it. However we are continuing to print the prerror() in the OPAL log regarding the same. This patch reduces the severity of the log from PR_ERROR to PR_INFO for the xscom read and write made via OPAL calls. Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Print info only for xscom read/writes made via opal calls Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | xive/p9: Remove assert from xive_eq_for_target() | Cédric Le Goater | 1 | -1/+1
[ Upstream commit f07ea9564425d8005ab334dfa40f7cebe4e71fbf ] XIVE VPs are structures describing the vCPUs of guests. When starting a guest, these are allocated and enabled and some checks are done on the location of the associated ENDs, which describe the event queues. If the block of the VP and the block of the ENDs do not match, the XIVE driver asserts. Unfortunately, there is no way to check that a VP identifier is part of a VP block that was previously allocated and it is relatively easy to crash the host with a bogus VP id. That can be done with a QEMU hack on a machine using vsmt. Simply remove the assert, the OS should gracefully handle the error. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | core/platform: Fallback to full_reboot if fast-reboot fails | Vasant Hegde | 1 | -1/+2
[ Upstream commit 8256da311027176dd22885205f16869f55b79f3b ] If fast reboot fails then we return to Linux with OPAL_SUCCESS. Current Linux code thinks that the request succeeded and enters an infinite loop (see Linux pnv_restart() code). This patch fixes the above issue by returning OPAL_UNSUPPORTED if fast reboot fails. Alternatively we can directly call full_reboot() itself. But I think it makes sense to go back to Linux and report the failure. And Linux falls back to normal reboot request. Fixes: 10bbcd07 ("core/platform: Add an explicit fast-reboot type") Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
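The contract change in the commit above can be sketched from the caller's point of view. Everything here is a hypothetical model: the `MODEL_*` codes, `model_fast_reboot_request()` and `model_os_restart()` are illustrative names, and the numeric values are placeholders rather than the real OPAL return codes.

```c
#include <stdbool.h>

/* Placeholder return codes; the real ones are defined by the OPAL API. */
enum model_rc { MODEL_OPAL_SUCCESS = 0, MODEL_OPAL_UNSUPPORTED = -7 };

enum model_reboot { MODEL_REBOOT_FAST, MODEL_REBOOT_FULL };

/* Firmware side: report a failed fast reboot instead of claiming success. */
static int model_fast_reboot_request(bool fast_reboot_ok)
{
	if (fast_reboot_ok)
		return MODEL_OPAL_SUCCESS; /* a real fast reboot would not return */
	return MODEL_OPAL_UNSUPPORTED;
}

/* OS side: a failure code lets the caller fall back to a normal reboot. */
static enum model_reboot model_os_restart(bool fast_reboot_ok)
{
	if (model_fast_reboot_request(fast_reboot_ok) == MODEL_OPAL_SUCCESS)
		return MODEL_REBOOT_FAST;
	return MODEL_REBOOT_FULL;
}
```

With the old behaviour the failure path would also have returned the success code, leaving the caller no way to tell that it should fall back.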
2020-10-22 | skiboot v6.6.4 release notes [v6.6.4] | Vasant Hegde | 1 | -0/+18
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-10-22 | asm/head: fix power save wakeup register corruption | Nicholas Piggin | 1 | -4/+4
[ Upstream commit 355a7dc193292b5b543e1bba1ff8b4a295fe8381 ] Power save wakeup handlers can clobber r30 before testing for state loss and avoiding restoring non-volatile GPRs. Fix this by using r5 instead (and move the register usage to one place, for clarity). Cc: skiboot-stable@lists.ozlabs.org Fixes: 8a43bf86b7 ("core/exceptions: implement an exception handler for non-powersave sresets") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-10-22 | FSP/NVRAM: Do not assert in vNVRAM statistics call | Vasant Hegde | 1 | -2/+1
[ Upstream commit 9ca8bf1bde56330075634bd3cb601d0f6ee90514 ] `msg` is valid pointer here. I don't recall why I added assert here :-( This is not correct. We shouldn't call assert here. Also we are not using `msg`. Hence convert it to `__unused`. Fixes: 19d4f98e ('FSP/NVRAM: Handle "get vNVRAM statistics" command') Cc: skiboot-stable@lists.ozlabs.org # v5.4.x + Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-09-09 | skiboot v6.6.3 release notes [v6.6.3] | Vasant Hegde | 1 | -0/+21
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-09-09 | fsp/dump: Handle non-MPIPL scenario | Vasant Hegde | 1 | -4/+4
[ Upstream commit 0ad0ab3e24a322b79bec8451bc21e9bdd40a6657 ] If MPIPL is not enabled then we will not create `/ibm,opal/dump` node and we should continue to parse/retrieve SYSDUMP. I missed this scenario when I fixed similar issue last time :-( Fixes: 92b7968 (fsp: Skip sysdump retrieval only in MPIPL boot) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-09-09 | hw/phb4: Verify AER support before initialising AER regs | Oliver O'Halloran | 1 | -0/+3
[ Upstream commit 0a5f2812a7e9007f2a89502a3f07bac34bfacbdb ]
Check the AER capability offset pointer is non-zero before enabling the AER messages. If the device doesn't support AER we end up writing garbage to config offset 0x0 + PCIECAP_AER_CAPCTL, or 0x18. For a normal device this is one of the BARs so this doesn't do much, but for a bridge this results in overriding:
  0x18 - The primary bus number
  0x19 - The secondary bus number
  0x1A - The subordinate bus number
  0x1B - The latency timer
0x1B is hardwired to zero for PCIe devices, but overwriting the bus number register can cause issues with routing of config space accesses. It's worth pointing out that we write actual values for the secondary and subordinate bus numbers before scanning the secondary bus, but the primary bus number is never restored.
Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
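The guard described in the commit above amounts to refusing to compute a register address from a zero capability offset. A minimal sketch, assuming a hypothetical helper `aer_capctl_cfg_offset()`; only the name `PCIECAP_AER_CAPCTL` and its value 0x18 are taken from the commit message.

```c
#include <stdint.h>

#define PCIECAP_AER_CAPCTL 0x18 /* register offset quoted in the commit message */

/*
 * Hypothetical helper: returns the config-space offset of the AER
 * capabilities/control register, or -1 when the device has no AER
 * capability (offset 0) and the write must be skipped.
 */
static int32_t aer_capctl_cfg_offset(uint16_t aer_cap_offset)
{
	if (aer_cap_offset == 0)
		return -1; /* no AER: writing here would hit offset 0x18 */

	return (int32_t)aer_cap_offset + PCIECAP_AER_CAPCTL;
}
```

A caller would perform the config write only when the returned offset is non-negative, so a bridge without AER keeps its bus number registers intact.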
2020-09-09 | hw/phb4: Actually enable error reporting | Oliver O'Halloran | 1 | -0/+1
[ Upstream commit 9b594262eeec7699836ff50c8762241d1f2570a3 ] PHB3 had an errata about correctable errors and when Ben was doing the initial PHB4 port he deleted the corresponding config write to DEVCTL. Whoops. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-09-09 | hdata: Add new "smp-cable-connector" VPD keyword | Klaus Heinrich Kiwi | 1 | -0/+1
[ Upstream commit ef58f69f34faf42c64f5d6df857f07a69707c0e7 ] Recent FSP versions are defining a new VPD keyword 'SN' that brings SMP Cable Connector FRU info. Signed-off-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-07-03 | skiboot v6.6.2 release notes [v6.6.2] | Vasant Hegde | 1 | -0/+17
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-07-03 | fsp: Skip sysdump retrieval only in MPIPL boot | Vasant Hegde | 1 | -3/+11
[ Upstream commit 92b79689cae560ff0cb3620a0221147bb947138c ] It seems we should continue to retrieve SYSDUMP except in MPIPL boot. Fixes: d6eb510 (fsp: Ignore platform dump notification on P9) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-23 | platform/mihawk: Fix IPMI double-free | nichole | 1 | -2/+0
[ Upstream commit 68dc040a6540c218d20517764ff5d740a3626c55 ] The commit 6826095 ("platform/mihawk: support dynamic PCIe slot table") added the IPMI OEM command to communicate with the BMC. We call ipmi_free_msg(msg) twice, which caused Fast-reboot to fail. This patch fixes it by removing the IPMI double-free bug to restore Fast-reboot. Signed-off-by: Nichole Wang <Nichole_Wang@wistron.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-6.6.x Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Fixes: commit 6826095 ("platform/mihawk: support dynamic PCIe slot table") Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | skiboot v6.6.1 release notes [v6.6.1] | Vasant Hegde | 1 | -0/+31
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | occ: Fix false negatives in wait_for_all_occ_init() | Gautham R. Shenoy | 3 | -32/+154
[ Upstream commit ec3c45f3889cd5f7615db5615dd6824abe32f759 ]
Currently the wait_for_all_occ_init() function determines that the OCCs associated with every chip have been initialized by verifying that the "Valid" bit in the pstate table of each OCC is set. However, on chips where all the EX units are guarded, the OCC, even though it is active, does not update the pstate_table. As a result, OPAL concludes that the OCC is not functional and not only disables Pstate initialization, but also incorrectly reports that the OCCs were not initialized, thereby cutting other features such as sensors.
Fix this by ensuring that:
  * We check if there is at least one active EX unit in the chip before checking if the OCC is active.
  * On platforms with OCC-OPAL communication interface version 0x90:
    * wait_for_all_occ_init() only checks if the occ_state in the OCC dynamic area is set to "Active State".
    * The "Valid" bit check moves to add_cpu_pstate_properties(), which is where we create the device-tree entries for frequency scaling.
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | uart: Drop console write data if BMC becomes unresponsive | Vasant Hegde | 1 | -26/+74
[ Upstream commit 6bf21350da32776aac8ba75bf48933854647bd7e ] If the BMC becomes unresponsive (e.g. during a BMC reboot) during a console write then we may get stuck in uart_wait_tx_room(). This leaves the CPU stuck in OPAL, which results in kernel lockups and, in some cases, an unresponsive host. This patch introduces a timeout option. If a UART operation doesn't complete within a predefined time then it drops the write data and returns. Note that this patch fixes both the OPAL internal console and the console write APIs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [Various fixes on top of Nick's proposal to have single timer - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
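The "give up and drop the data" behaviour from the commit above can be sketched as follows. This is a simplified model under stated assumptions: `struct model_uart`, `model_wait_tx_room()` and `model_console_write()` are hypothetical, and a bounded poll count (`MODEL_MAX_POLLS`) stands in for whatever time-based timeout the real driver uses.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_POLLS 1000 /* stand-in for a real time-based timeout */

struct model_uart {
	bool responsive; /* false models a BMC that is rebooting */
	size_t sent;     /* bytes accepted by the transmitter */
};

/* Poll for transmit room, but give up after a bounded number of tries. */
static bool model_wait_tx_room(const struct model_uart *u)
{
	for (unsigned int i = 0; i < MODEL_MAX_POLLS; i++) {
		if (u->responsive)
			return true;
	}
	return false; /* timed out */
}

/* Write up to len bytes; on timeout, drop the remainder and return. */
static size_t model_console_write(struct model_uart *u, const char *buf,
				  size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (!model_wait_tx_room(u))
			break; /* drop the rest instead of spinning forever */
		(void)buf[i]; /* a real driver would write this byte out */
		u->sent++;
	}
	return i; /* number of bytes actually written */
}
```

The key design point is that the wait loop has an exit condition that does not depend on the remote side, so a stalled BMC costs some console output rather than a stuck CPU.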
2020-06-06 | hw/phys-map: Fix OCAPI_MEM BAR values | Andrew Donnellan | 1 | -3/+3
[ Upstream commit 75198f668911830bb5df27da59786199eac2e47c ] The comment next to the OCAPI_MEM entries in the Nimbus phys-map claims that we are "varying the upper 2 bits of the group ID" for each OpenCAPI link, as matches the chip address extension mask that will be set by future versions of Hostboot. The actual entries, on the other hand, vary the *lower* 2 bits of the group ID. Whoops. This didn't appear to cause us problems on the specific machines that we had access to at the time, but now that this is being tested a bit harder it's crashing machines... Fixes: bc72973d13215 ("hw/npu2-opencapi: Support multiple LPC devices") Cc: Frederic Barrat <fbarrat@linux.ibm.com> Reported-by: Wael El-Essawy <welessa@us.ibm.com> Reported-by: Milton Miller <miltonm@us.ibm.com> Reported-by: Jenny Huynh <jhuynh@us.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | Detect fused core mode and bail out | Joel Stanley | 2 | -0/+20
[ Upstream commit 482f18adf21eeb5f6ce2a93334725509a8f6f0cd ] Fused core mode is currently not supported in OPAL. Continuing to boot the system would result in errors at later stages of boot. Wait for the console to be up and print a message for developers to check and fix the system modes. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | platform/mihawk: Tune equalization settings for opencapi | Frederic Barrat | 3 | -4/+33
[ Upstream commit afe6bc9051907d25082309895f8cfe44f59e2f25 ]
The Bittware 250SOC adapter on Mihawk was showing a high count of CRC errors on one of the opencapi slots. The PHY team suggested new equalization settings to correct the errors. All existing adapters have been tested on mihawk to make sure the settings are compatible. However, the new settings should not be used on platforms other than mihawk.
The changes specific to mihawk are:
  - Update the tx_ffe_pre_coeff and tx_ffe_post_coeff input parameters used during zcal
  - turn off the tx_ffe_boost parameter through scom
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | hdata/memory.c: Fix "Inconsistent MSAREA" warnings | Klaus Heinrich Kiwi | 1 | -0/+3
[ Upstream commit 11d12c6fb60af42b89930fe776958f0eb208dd23 ] add_memory_buffer_mmio() should be exclusive to P9P (AXONE). Running it on non P9P systems resulted in warnings such as: MS AREA: Inconsistent MSAREA version 40 for P9P system So check for PVR and quietly return if not P9P. Fixes: 38b5c3179 (Add support for memory-buffer mmio) Cc: skiboot-stable@lists.ozlabs.org Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | PSI: Convert prerror to PR_NOTICE | Vasant Hegde | 1 | -1/+1
[ Upstream commit 071f00d661feaca05d9f610a21bd7c4d643e6b29 ] "Spurious interrupt" is not severe. Reduce message severity and keep msglog happy! Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-06 | sensors: occ: Fix a bug when sensor values are zero | Gautham R. Shenoy | 1 | -1/+2
[ Upstream commit 1beb1519f4c39c3d4c418aafa219236568c38c8d ] The commit 1b9a449d ("opal-api: add endian conversions to most opal calls") modified the code in opal_read_sensor() to make it Little-Endian safe. In the process, it changed the code so that if a sensor value was zero, it would simply return OPAL_SUCCESS without updating the return buffer. As a result, the return buffer contained bogus values which were reflected on those sensors being read by the Kernel. This patch fixes it by ensuring that the return buffer is updated with the value read from the sensor every time. Thanks to Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> for spotting the missing return-buffer update. cc: skiboot-stable@lists.ozlabs.org Fixes: commit 1b9a449d ("opal-api: add endian conversions to most opal calls") Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
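The failure mode described in the commit above (a zero reading returning success without touching the caller's buffer) can be reproduced in a few lines. This is a simplified model and not the actual opal_read_sensor() code: both functions and `MODEL_OPAL_SUCCESS` are hypothetical, and the endian conversion mentioned in the commit is left out.

```c
#include <stdint.h>

#define MODEL_OPAL_SUCCESS 0 /* placeholder success code */

/* Models the bug: a zero reading returns success but never writes *out. */
static int model_read_sensor_buggy(uint32_t raw, uint32_t *out)
{
	if (raw == 0)
		return MODEL_OPAL_SUCCESS; /* *out keeps whatever it held */
	*out = raw;
	return MODEL_OPAL_SUCCESS;
}

/* Models the fix: the return buffer is updated on every read. */
static int model_read_sensor_fixed(uint32_t raw, uint32_t *out)
{
	*out = raw;
	return MODEL_OPAL_SUCCESS;
}
```

If the caller's buffer holds stale data before the call, the buggy variant reports that stale data as the sensor value whenever the true reading is zero, which matches the bogus values described in the commit.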
2020-06-06 | sensors: occ: Fix the GPU detection code | Gautham R. Shenoy | 1 | -2/+20
[ Upstream commit f3ac046b386fea80286c72c3217acb407230a8c6 ] commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu systems") assumes that presence of "ibm,power9-npu" compatible node indicates the presence of GPUs. However this is incorrect, as even OpenCAPI is supported via NPU. Thus ZZ systems, which have OpenCAPI connectors but not GPUs will have "ibm,power9-npu" compatible nodes. This results in OPAL creating device-tree entries for the GPU sensors on ZZ systems which don't even have GPUs. This patch fixes the GPU detection code in occ-sensors, by first checking for "ibm,ioda2-npu2-phb" compatible node which indicates the presence of nvlink. Only if such a node exists, do we check with the OCC for presence of GPUs on systems to confirm the presence of the GPU. Otherwise, we cut the GPU sensors. Thanks to Frederic Barrat <fbarrat@linux.ibm.com> for suggesting "ibm,ioda2-npu2-phb" for detecting the presence of nvlink GPUs. cc: skiboot-stable@lists.ozlabs.org Fixes: commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu systems") Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-04-23 | skiboot v6.6 release notes [v6.6] | Oliver O'Halloran | 1 | -0/+65
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-15 | ZZ: Fix System Attention Indicator location code | Vasant Hegde | 1 | -1/+5
We are using the SAI indicator location from SLCA to represent the System Attention Indicator location code. In P9, this is mapped to the op-panel location code. The op-panel has identify and fault LEDs as well, and our SPCN command lists the op-panel location code too. Hence we get the OPAL warning below.
OPAL msglog:
  FSPLED: duplicate location code U78D3.001.WT0004T-D1
Because of the above issue we are not creating a device tree node for the D1 identify/fault indicators. We have a System Attention Indicator at the enclosure level as well, which is a replica of the attention indicator in the op-panel. Hence use the System VPD location code to represent the attention indicator. Note that we have a dedicated MBOX command to read/update the System Attention Indicator which doesn't need a location code. Hence we are fine with this change. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-15 | MPIPL: Add support to save crash CPU details on FSP system | Vasant Hegde | 3 | -3/+13
OPAL uses different paths to trigger MPIPL:
  - On BMC systems we call the SBE S0 interrupt
  - On FSP systems we call the `attn` instruction
Currently on BMC systems we collect crash CPU PIR details, which are needed to generate a proper dump. This happens just before calling the SBE S0 interrupt. Since we don't use this path on FSP systems, OPAL is not saving the crashing CPU details. Hence by default `opalcore` is not pointing to the crashing CPU and not showing a proper backtrace. We have to go through all CPUs to find the crashing CPU backtrace. This patch moves this function to a common place so that if MPIPL is supported we collect crashing CPU data. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-15 | fsp: Ignore platform dump notification on P9 | Vasant Hegde | 1 | -0/+3
After a system crash the FSP collects a dump and passes the dump details via HDAT. OPAL/Linux uses these details to extract SYSDUMP. On P9 FSP systems we have MPIPL support. The FSP folks say we have to ignore the platform dump notification passed by HDAT and use the inband MPIPL mechanism to extract the dump. CC: Murulidhar Nataraju <murulidhar@in.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-09 | platform: add Raptor Blackbird support | Stewart Smith | 2 | -1/+108
Based off the Raptor patch: https://git.raptorcs.com/git/blackbird-skiboot/commit/?id=c81f9d66592dc2a7cf7f6c59c3def5cee0638c1f
Notable changes:
  - slot names matching what's silkscreened on the board
  - Expose IPL Observer over op-panel OPAL calls
This means you can "printf '\xfe\xfe\xfe' > /dev/op_panel" to make the IPL Observer on the Raptor BMC builds realise it can turn on fan control. Signed-off-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-09 | skiboot v6.0.23 release notes | Vasant Hegde | 1 | -0/+17
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-04-08 | hw/ocmb: Add OCMB SCOM support | Oliver O'Halloran | 4 | -0/+185
Add a driver for the SCOM ranges of the OCMB. Unlike most chips the OCMB has two different (three if you count OpenCAPI config space) register spaces and we need to ensure that the right access size is used on each. Additionally the SCOM interface is a bit non-standard in that a full physical address is passed as the SCOM address rather than a register number so we don't need to perform any address transformations, we just need to verify that the address falls into one of the nominated address ranges. Cc: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
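The address check described in the commit above (accept an address only if it falls inside a nominated range, then use that range's access size) can be sketched as a table lookup. The structure, the function name and any concrete addresses used with it are hypothetical; the commit message does not give the real ranges or sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one register space of a memory buffer. */
struct model_scom_range {
	uint64_t start;           /* first physical address in the range */
	uint64_t len;             /* length of the range in bytes */
	unsigned int access_size; /* bytes per access for this space */
};

/*
 * Return the range containing addr, or NULL if the address is outside
 * every nominated range and the access should be rejected.
 */
static const struct model_scom_range *
model_find_range(const struct model_scom_range *ranges, size_t n, uint64_t addr)
{
	for (size_t i = 0; i < n; i++) {
		if (addr >= ranges[i].start &&
		    addr - ranges[i].start < ranges[i].len)
			return &ranges[i];
	}
	return NULL;
}
```

Because the full physical address is passed in, no translation step is needed: the lookup both validates the address and tells the caller which access width to use.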
2020-04-08 | hdata/memory: Add support for memory-buffer mmio | Oliver O'Halloran | 1 | -14/+125
HDAT now allows associating a set of MMIO address ranges with an MSAREA. This is to allow for exporting the MMIO register space associated with a memory-buffer chip to the hypervisor so we can wire up access to that for PRD. The DT format is similar to the old centaur memory-buffer@<addr> nodes that we had on P8 OpenPower systems. The biggest difference is that the HDAT format allows for multiple memory ranges on each "chip" and each of these ranges may have a different register size. Cc: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-08 | hw/centaur: Convert to use the new scom API | Oliver O'Halloran | 3 | -10/+16
Currently we assume any xscom_read / write targeted at a chipid with 0x8 as the top four bits is intended to be a centaur SCOM. On non-P8 platforms there is no reason to assume this, so convert it to use the new struct scom_controller infrastructure. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-08 | hw/xscom: Add scom infrastructure | Oliver O'Halloran | 2 | -0/+87
Currently the top nibble of the "part ID" is used to determine the type of a xscom_read() / xscom_write() call. This was mainly done for the benefit of PRD on P8 which would do "targeted" SCOMs to EX (core) chiplets and rely on skiboot to find the actual scom address. Similarly, PRD also relied on this to access the SCOMs of centaur chips which are accessed via FSI on P8. On P9 PRD moved to only doing non-targeted scoms where it would only ever supply a "part ID" which was the fabric ID of the chip to be SCOMed. The centaur support was also unnecessary since OPAL didn't support any P9 systems with Centaurs. However, on future systems we will have to support memory buffer chips again so we need to expand the SCOM support to accommodate them. To do this, allow skiboot components to register SCOM read() and write() functions for a chip ID. This will allow us to ensure the P8 EX chiplet and Centaur SCOM code is only ever used on P8, freeing up the Part ID address space for other uses. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
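The registration scheme in the commit above replaces "decode the top nibble" with "look up a handler registered for this part ID". A minimal sketch of that idea, assuming hypothetical names throughout (`model_scom_register()`, `model_scom_read()` and so on); the real struct scom_controller and its fields are not shown in the commit message, and only the read path is modelled.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*model_scom_read_fn)(uint32_t part_id, uint64_t addr,
				  uint64_t *val);

/* Hypothetical registry entry: one handler per part ID. */
struct model_scom_controller {
	uint32_t part_id;
	model_scom_read_fn read;
};

#define MODEL_MAX_CONTROLLERS 8
static struct model_scom_controller model_controllers[MODEL_MAX_CONTROLLERS];
static size_t model_nr_controllers;

/* Register a read handler for a part ID; returns -1 if the table is full. */
static int model_scom_register(uint32_t part_id, model_scom_read_fn read)
{
	if (model_nr_controllers == MODEL_MAX_CONTROLLERS)
		return -1;
	model_controllers[model_nr_controllers].part_id = part_id;
	model_controllers[model_nr_controllers].read = read;
	model_nr_controllers++;
	return 0;
}

/* Default path for unregistered part IDs (stands in for a plain XSCOM). */
static int model_default_read(uint32_t part_id, uint64_t addr, uint64_t *val)
{
	(void)part_id;
	*val = addr; /* echo the address so the path taken is observable */
	return 0;
}

/* Dispatch to a registered controller, else fall back to the default. */
static int model_scom_read(uint32_t part_id, uint64_t addr, uint64_t *val)
{
	for (size_t i = 0; i < model_nr_controllers; i++) {
		if (model_controllers[i].part_id == part_id)
			return model_controllers[i].read(part_id, addr, val);
	}
	return model_default_read(part_id, addr, val);
}

/* Example handler: a fake memory buffer that returns a fixed pattern. */
static int model_membuf_read(uint32_t part_id, uint64_t addr, uint64_t *val)
{
	(void)part_id;
	(void)addr;
	*val = 0xabcdULL;
	return 0;
}
```

Since the dispatch is driven by what was registered rather than by bit patterns in the ID, platform-specific handlers only exist on platforms that register them.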
2020-04-01 | docs: Fix ref to skiboot-6.4 in 6.5 release notes [v6.6-rc1] | Oliver O'Halloran | 1 | -1/+1
I like to click things. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-31 | platform/mihawk: support dynamic PCIe slot table | Joy Chu | 1 | -16/+212
Slot table auto-detection for different riser cards by using IPMI OEM command to communicate with BMC. Signed-off-by: Joy Chu <joy_chu@wistron.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-30 | hw/phb4: Tune GPU direct performance on witherspoon in PCI mode | Frederic Barrat | 3 | -24/+78
Good GPU direct performance on witherspoon, with a Mellanox adapter on the shared slot, requires reallocating some dma engines within PEC2, "stealing" some from PHB4&5 and giving extras to PHB3. It's currently done when using CAPI mode. But the same is true if the adapter stays in PCI mode. In preparation for upcoming versions of MOFED, which may not use CAPI mode, this patch reallocates dma engines even in PCI mode for a series of Mellanox adapters that can be used with GPU direct, on witherspoon and on the shared slot only. The loss of dma engines for PHB4&5 on witherspoon has not shown problems in testing, as well as in current deployments where CAPI mode is used.
Here is a comparison of the bandwidth numbers seen with the PHB in PCI mode (no CAPI) with and without this patch. Variations on smaller packet sizes can be attributed to jitter and are not that meaningful.
  # OSU MPI-CUDA Bi-Directional Bandwidth Test v5.6.1
  # Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
  # Size      Bandwidth (MB/s)    Bandwidth (MB/s)
  #           with patch          without patch
  1           1.29                1.48
  2           2.66                3.04
  4           5.34                5.93
  8           10.68               11.86
  16          21.39               23.71
  32          42.78               49.15
  64          85.43               97.67
  128         170.82              196.64
  256         385.47              383.02
  512         774.68              755.54
  1024        1535.14             1495.30
  2048        2599.31             2561.60
  4096        5192.31             5092.47
  8192        9930.30             9566.90
  16384       18189.81            16803.42
  32768       24671.48            21383.57
  65536       28977.71            24104.50
  131072      31110.55            25858.95
  262144      32180.64            26470.61
  524288      32842.23            26961.93
  1048576     33184.87            27217.38
  2097152     33342.67            27338.08
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-30 | hw/imc: Add error message on failing cases for imc_init | Madhavan Srinivasan | 1 | -3/+11
Add a couple more debug messages to help understand possible failures in imc_init(). Currently the only message printed is "IMC Devices not added", which is not very helpful when debugging. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-30 | Revert "FSP: Disable PSI link whenever FSP tells OPAL about impending R/R" | Vasant Hegde | 3 | -9/+19
This reverts commit a4788a49f004a91bb8ca015336abf9ae119fbc52. The above patch was added to handle host power down with the FSP in R/R state. But the FSP does not like OPAL giving up the PSI link early in the R/R process. For an FSP-initiated R/R, OPAL should wait until we get the PSI interrupt. Hence reverting the above commit. Also partially reverting commit e04a34af to make fsp_dpo_pending a global variable. We have made several improvements in the way we handle FSP communication and also in the power down path. Now if the host sends a powerdown message when the FSP is in R/R, OPAL returns OPAL_BUSY_EVENT. The kernel will run poller() and retry the power down message after some time. So I think this patch will not have any side effect on the power down path. Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-30 | skiboot v6.0.22 release notes | Vasant Hegde | 1 | -0/+21
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-20 | hw/prd: Hold FSP notifications while PRD is inactive | Oliver O'Halloran | 1 | -1/+12
On FSP systems we rely on a service on the FSP to send us a notification when the OCCs become active. On systems with NVDIMMs this is especially critical because the OCC is responsible for starting the NVDIMM save procedure when power fails. The message sent from the FSP isn't sent to OPAL itself, rather it's sent to the PRD service running on the host (via OPAL). If this service is not running OPAL will currently send an error response back to the FSP and drop the message. This causes problems because the OCCs active message is generally sent while OPAL is still booting the system so the PRD daemon never gets notified that the OCC is active. Once the OS is running we rely on PRD to report the protection status of the NVDIMMs on the system. However, because it never receives the notification from the FSP it will always report the DIMMs as un-protected because it thinks the OCCs are inactive. This patch fixes the issue by allowing a single message to be held in OPAL while PRD is inactive. Once OPAL receives a notification that PRD has started we deliver the message. It's worth pointing out that this is kind of janky and brittle and would probably break horribly if FSP notify messages were multi-part since we could end up in a situation where only a single part of a multi-part message is queued, with the rest being dropped. However, the only user of the FSP notification message appears to be the OCC, and the OCC team says it's not a problem. I'll take their word for it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
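The "hold exactly one message until PRD starts" behaviour from the commit above can be modelled with a one-slot buffer. This is a sketch under assumptions: all names are hypothetical, messages are reduced to integers, and a second message arriving while the slot is occupied is assumed to be rejected, which the commit message does not spell out.

```c
#include <stdbool.h>

/* Hypothetical model of the PRD notification path. */
struct model_prd {
	bool active;        /* PRD daemon is running */
	bool held_valid;    /* the one-message slot is occupied */
	int held_msg;       /* the held notification */
	int last_delivered; /* -1 until something is delivered */
	int dropped;        /* notifications rejected while inactive */
};

static void model_deliver(struct model_prd *p, int msg)
{
	p->last_delivered = msg;
}

/* FSP notification arrives. Returns false if it had to be rejected. */
static bool model_fsp_notify(struct model_prd *p, int msg)
{
	if (p->active) {
		model_deliver(p, msg);
		return true;
	}
	if (!p->held_valid) {
		/* PRD not running yet: keep a single message for later. */
		p->held_valid = true;
		p->held_msg = msg;
		return true;
	}
	p->dropped++; /* slot already in use (assumed behaviour) */
	return false;
}

/* PRD has started: flush the held message, if any. */
static void model_prd_started(struct model_prd *p)
{
	p->active = true;
	if (p->held_valid) {
		p->held_valid = false;
		model_deliver(p, p->held_msg);
	}
}
```

The one-slot design is also what makes the multi-part caveat in the commit message concrete: only the first part of a multi-part message would fit in the slot.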
2020-03-20 | skiboot v6.5.4 release notes | Vasant Hegde | 1 | -0/+16
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-13 | Re-license contributions from Yadro | Oliver O'Halloran | 3 | -3/+3
Cc: Ilya Kuznetsov <ilya@yadro.com> Cc: Artem Senichev <artemsen@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-13 | Re-license contributions from Dan Horák | Oliver O'Halloran | 3 | -3/+3
Cc: Dan Horák <dan@danny.cz> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>