aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-03-10skiboot v6.5.3 release notesv6.5.3Vasant Hegde1-0/+24
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10npu2-opencapi: Don't drive reset signal permanentlyFrederic Barrat1-6/+40
[ Upstream commit 53408440edb30de7ad18f12db285f15a0863fbc3 ] A problem was found with the way we manage the I2C signal to reset adapters. Skiboot currently always drives the value of the opencapi reset signal. We set the I2C pin for reset in output mode and keep it in output mode permanently. And since the reset signal is inverted, it is explicitly set to high by the I2C controller pretty much all the time. When the opencapi card is powered off, for example on a reboot, actively driving the I2C reset pin to high keeps applying a voltage to part of the FPGA, which can leak current, send the FPGA in a bad state since it's unexpected or even damage the card. To prevent damaging adapters, the recommendation from the hardware team is to switch back the pin to input mode at the end of a reset cycle. There are pull-up resistors on the planar of all the platforms to make sure the reset signal is high "naturally". When the slot is powered off, the reset pin won't be kept high by the i2c controller any more. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-10mpipl: Rework memory reservation for OPAL dumpVasant Hegde3-23/+40
[ Upstream commit b0e024216a3b1d35aa2273b6f64742db7ae49861 ] During boot, OPAL reserves memory required to capture OPAL dump and architected register data. During MPIPL, hostboot will copy OPAL dump to this memory. Post MPIPL kernel will use this memory to create opalcore. We use mem_reserve_fw() for this reservation. At present this reservation happens late in the init path. It may clash with memory allocated by local_alloc(). We have two option to fix above issue: - Use local_alloc() for allocating memory for OPAL dump This works fine on first boot. We can use this method to reserve memory. But Post MPIPL we still want to reserve destination memory to make sure no one is stomping this area. Also this reservation might have happened in between other local_allocations. So in Post MPIPL boot allocator may not find enough memory in first region for other local_alloc() requests and may throw mem_alloc() error before trying to allocate from other regions. - Early memory reservation for OPAL dump Allocate and reserve memory just after memory region init. This patch uses second approach to fix reservation issue. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-10xscom: Don't log xscom errors caused by OPAL callsOliver O'Halloran2-4/+14
[ Upstream commit 80fd2e963bd4364ee8c3b5a06215d8cbdfe04fcb ] The XSCOM read/write OPAL calls are largely there to support running PRD in the OS. PRD itself handles submitting error logs (if needed) when a XSCOM operation fails so there's no need to send an error log from inside of OPAL. Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-09mpipl: Disable fast-reboot during post MPIPL bootVasant Hegde1-0/+2
[ Upstream commit b858aef5210e98b19419ad4dc347cf96d89cbf85 ] Otherwise device tree will continue to have `mpipl-boot` and kernel may think its MPIPL boot. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-03-09hdata: Update MPIPL support IPL parameterVasant Hegde1-1/+1
[ Upstream commit a448c4e2f6c3962b1690e367d1a2a8e03180b320 ] We used bit 4 of `sys_attributes` attribute for MPIPL supported flag. Unfortunately we forgot to update HDAT spec. Now bit 4 is used for different purpose. Hence use bit 5 for MPIPL. Fortunately we don't have any released firmware with MPIPL supported yet. Hence its safe to make this change. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-01-29npu2: Clear fence on all bricksAlexey Kardashevskiy1-5/+12
[ Upstream commit 9be9a77a8352aee0bb74ac0d79f55e1238f76285 ] A bug in the NVidia driver can cause an UR HMI which fences bricks (links). At the moment we clear fence status only for bricks of a specific devices, however this does not appear to be enough and we need to clear fences for all bricks. This is ok as we do not allow using GPUs individually anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-09skiboot v6.5.2 release notesv6.5.2Vasant Hegde1-0/+28
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-09libstb/tpm: block access to unknown i2c devs on the tpm busOliver O'Halloran1-4/+43
[ Upstream commit 9f7b726ccf7ee9b6fe50277e79e0bd285bfa9918 ] Our favourite TPM is capable of listening on multiple I2C bus addresses and although this feature is supposed to be disabled by default we have some systems in the wild where the TPM appears to be listening on these secondary addresses. The secondary addresses are also susceptible to the bus-lockup problem that we see with certain traffic patterns to the "main" TPM address. We don't know what addresses the TPM might be listening on it's best to take a conservitve approach and only allow traffic to I2C bus addresses that we are explicitly told about by firmware. This is only required on the TPM bus, so this patch extends the existing TPM workaround to also check that a DT node exists for any I2C bus address the OS wants to talk to. If there isn't one, we don't forward the I2C request to the bus and return an I2C timeout error to the OS. Acked-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-06slw: slw_reinit fix array overrunNicholas Piggin1-1/+1
[ Upstream commit 6b512fceb4210d5cf166912ef72c90cd29caec67 ] The slw patch saving array is too small, which results in slw_reinit overwriting 32 bytes beyond the end of it. The size is increased to 0x100, which is the architecture interrupt vector size. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-06IPMI: Trigger OPAL TI in abort path.Mahesh Salgaonkar1-7/+23
[ Upstream commit a810d1fe7a1631dc1eb211ff70885f044cb40904 ] The current assert/abort implementation for BMC based system invokes cec reboot after printing backtrace. This means that BMC never gets notified about OPAL crash/termination. This sometimes leads into never ending IPL-ing loop if OPAL keeps aborting very early in boot path. Trigger a software xstop (OPAL TI) to inform BMC about the OPAL termination. BMC is capable of catching checkstop signal and facilitate in rebooting (IPL-ing) host. With AutoReboot policy, OpenBMC handles checkstop signals and counts them against the reboot counter. In cases where OPAL is crashing before host reaches to runtime, OpenBMC will move the system in Quiesced state after 3 or so attempts of IPL/reboot so that system can be debugged. When OPAL triggers software checkstop it causes all the CPU threads to be stooped and moved to quiesced state. Hence OPAL don't need to explicitly stop all CPUs before calling software xstop. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05platform/mihawk: Add system VPD EEPROM to I2C busJoy Chu1-0/+20
[ Upstream commit 52952aca9d6148e7ae3c3725ae43d48e27b61357 ] Add VPD EEPROM type fix for planar VPD update. Signed-off-by: Joy Chu <joy_chu@wistron.com> [oliver: commit subject] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05platform/mihawk: Detect old system compatible stringFrederic Barrat1-1/+2
[ Upstream commit 425340bd2809808f21856d80bf76b785b87b6041 ] Newer firmware declares the system as "ibm,mihawk", but the labs are full of older installs, which were using "wistron,mihawk". Let's keep detecting the older string since it allows to run recent skiboot on older fw stack and make people's lives a little tiny bit easier. Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05npu2/hw-procedures: Remove assertion from check_credits()Reza Arbab1-9/+6
[ Upstream commit 24664b48642845d620e225111bf6184f3c102f60 ] The RX clock mux in the NVLink PHY can glitch, which will manifest in hard to diagnose behavior--at best, a checkstop during the first link traffic. The only reliable way we found to detect this was by checking for a discrepancy in the credits we expect to receive during link training. Since the time the check was added, we've found that * Commit ac6f1599ff33 ("npu2: hw-procedures: Add phy_rx_clock_sel()") does work around the original glitch. * Asserting is too harsh. Before root cause was established, it was thought this could have been a manufacturing defect and we wanted to loudly fail hardware acceptance boot cycle tests. * It seems there is a valid situation in which credits are off from the expected value. During GPU hot reset, a CPU prefetch across the link can affect the credit count before we check. Given all of the above, remove the assert(). Cc: stable # 6.0.x Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05npu2-opencapi: Fix integer promotion bug in LPC allocationAndrew Donnellan1-1/+1
[ Upstream commit e85e2e2b8b0a4dd96e10343df19df1a161cd6aac ] If you try to allocate an amount of LPC memory that's not a power of 2, we round the value up to the nearest power of 2. By the magic of C, "1 << n" gets treated as an int, even if you're assigning it to a uint64_t. Change 1 to 1ULL to fix this. (C, it's great.) Reported-by: Alastair D'Silva <alistair@d-silva.org> Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-12-05hw/port80: Squash No SYNC errorOliver O'Halloran2-4/+10
[ Upstream commit 6cf9ace9d69dcb5c37b328625132bc5c9624b778 ] On Aspeed BMCs can be configured to route LPC IO address 0x80 to a GPIO port. Some systems use this to implement a boot progress indicator, but not all of them. There's no easy way to tell if this has been setup or not and if it hasn't we get an LPC SYNC no-response error from out LPC master. When we reach Linux and enable interrupts this results in this spurious error being printed: LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010082 lpc_probe_write() is intended to catch situations where the peripherial being written to might not be configured, so use that instead of lpc_outb() to squash the error. Cc: Ranga <stewart@flamingspork.com> Cc: Andrew Jeffery <andrew@aj.id.au> Acked-by: Andrew Jeffery <andrew@aj.id.au> [oliver: fixed the test] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23skiboot v6.5.1 release notesv6.5.1Vasant Hegde1-0/+27
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23core/ipmi: Fix use-after-freeVasant Hegde1-3/+11
[ Upstream commit d75e82dbfbb9443efeb3f9a5921ac23605aab469 ] Commit f01cd77 introduced backend poller() for ipmi message. But in some corner cases its possible that we endup calling poller() after freeing ipmi message. Thread 1 : ipmi_queue_msg_sync() Waiting for ipmi sync message to complete Thread 2 : bt_poll() -> ipmi_cmd_done() -> callback handler -> free message Oliver hit this issue during fast-reboot test with skiboot DEBUG build. In debug build we poision the memory after free. That helped us to catch this issue. [ 460.295570781,3] *********************************************** [ 460.295773157,3] Fatal MCE at 0000000030035cb4 .ipmi_queue_msg_sync+0x110 MSR 9000000000201002 [ 460.295887496,3] CFAR : 0000000030035ce8 MSR : 9000000000000000 [ 460.295956419,3] SRR0 : 0000000030035cb4 SRR1 : 9000000000201002 [ 460.296035015,3] HSRR0: 0000000030012624 HSRR1: 9000000002803002 [ 460.296102413,3] DSISR: 00000008 DAR : 99999999999999d1 [ 460.296169710,3] LR : 0000000030035ce4 CTR : 0000000030002880 [ 460.296248482,3] CR : 28002422 XER : 20040000 [ 460.296336621,3] GPR00: 0000000030035ce4 GPR16: 00000000301d36d8 [ 460.296415449,3] GPR01: 0000000031c133d0 GPR17: 00000000300f5cd8 [ 460.296482811,3] GPR02: 0000000030142700 GPR18: 0000000030407ff0 [ 460.296550265,3] GPR03: 0000000000000100 GPR19: 0000000000000000 [ 460.296629041,3] GPR04: 0000000028002424 GPR20: 0000000000000000 [ 460.296696369,3] GPR05: 0000000020040000 GPR21: 0000000030121d73 [ 460.296820977,3] GPR06: c000001fffffd480 GPR22: 0000000030121dd2 [ 460.296888226,3] GPR07: c000001fffffd480 GPR23: 0000000030613400 [ 460.296978218,3] GPR08: 0000000000000001 GPR24: 0000000000000001 [ 460.297056871,3] GPR09: 9999999999999999 GPR25: 0000000031c13960 [ 460.297124647,3] GPR10: 0000000000000000 GPR26: 0000000000000004 [ 460.297203811,3] GPR11: 0000000000000000 GPR27: 0000000000000003 [ 460.297271250,3] GPR12: 0000000028002424 GPR28: 0000000030613400 [ 460.297339026,3] GPR13: 0000000031c10000 GPR29: 0000000030406b50 [ 460.297417605,3] GPR14: 00000000300f58f8 GPR30: 0000000030406b40 [ 460.297485176,3] GPR15: 00000000300f58d8 GPR31: 00000000309249c8 Reported-by: Oliver O'Halloran <oohall@gmail.com> Fixes: f01cd77 (ipmi: ensure forward progress on ipmi_queue_msg_sync()) Cc: skiboot-stable@lists.ozlabs.org # v6.3+ Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Fix unaligned writes to ECC partitionsAndrew Jeffery4-7/+65
[ Upstream commit 96ddf4b5d3e18ed9fbf726d0dabebe1ec2424fac ] Currently trying to clear a gard record results in errors: $ ./opal-gard -pef part create /sys0/node0/proc1 $ ./opal-gard -pef part list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 ========================================================= $ ./opal-gard -pef part clear 00000001 Clearing gard record 0x00000001...done ECC: uncorrectable error: ffffff00ffffffff ff libflash ecc invalid $ A little wrapper around hexdump(1) helps show where the error lies by grouping output in blocks of nine such that the last byte is the ECC byte: $ declare -f ecchd ecchd () { hexdump -e '"%08_ax""\t"' -e '9/1 "%02x ""\t""|"' -e '9/1 "%_p""|\n"' "$@" } A clean GARD partition displays as: $ ecchd part 0002c000 ff ff ff ff ff ff ff ff 00 |.........| * 00030ffb ff ff ff ff ff |.....| $ Dumping the corrupt partition shows: $ ecchd part 0002c000 ff ff ff ff ff ff ff ff 00 |.........| * 0002c024 ff ff ff ff ff ff ff ff ff |.........| 0002c02d ff ff ff 00 ff ff ff ff ff |.........| * 0002c051 ff ff ff 00 ff ff ff ff 00 |.........| 0002c05a ff ff ff ff ff ff ff ff 00 |.........| * 00030ffb ff ff ff ff ff |.....| $ blocklevel_smart_write() turned out to not be quite as smart as it thought it was in that any unaligned write to ECC protected partitions aligned the calculated ECC values to the start of the write buffer and not relative to the start of the partition. Fixes: 29d1e6f78109 ("libflash/blocklevel: add a smart write function which wraps up eraseing and writing") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Tidy local variable declarationsAndrew Jeffery1-4/+7
[ Upstream commit a950fd789c1ce0fbdc4f5486ccdd8301d6258ba7 ] Group them by use (and name). It's not reverse christmas tree, but it's a bit easier on the eye. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Avoid reuse of formal parametersAndrew Jeffery1-6/+13
[ Upstream commit aa52f9439b91db5949c04acb8e1d2d21c37ddba5 ] Lays the ground-work for fixing unaligned writes to ECC protected partitions. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Deny writes intersecting ECC protected regionsAndrew Jeffery1-1/+9
[ Upstream commit bdbbfcacccdb768db587ef73a2a28cf3dc8ceda9 ] Other code paths don't handle writes spanning mixed regions, and it's a headache, so deny it here too. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Avoid indirectly testing formal parametersAndrew Jeffery1-1/+1
[ Upstream commit 6867bd54c21b023b74e924abd6f4c3f1cd9959c2 ] The early-exit tests write_buf, but write_buf is assigned to buf on declaration. Test buf directly instead to avoid unnecessary indirection. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Rename size variable for clarityAndrew Jeffery1-8/+9
[ Upstream commit 518db2b216af5178a18f8b7ad06769b6acc51bcc ] We're writing in chunks, so lets make it clear that size is relative to the chunk that we're writing. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Rename write bufferAndrew Jeffery1-7/+7
[ Upstream commit 5c935e78f335597f502e1fda1911ca50bbeed561 ] The buffer is only used for ECC protected partitions, so lets call it ecc_buf for clarity. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23blocklevel: smart_write: Terminate line for debug output in no-change caseAndrew Jeffery1-0/+2
[ Upstream commit 8f204c12ebc07111c99d2059e1df0b86d7a203ae ] Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23gard: Fix data corruption when clearing single recordsAndrew Jeffery4-1/+31
[ Upstream commit e08fee36a5196ca094c5fb7dd1421279b146531f ] Attempting to clear a specific gard record leads to corruption of the target record rather than the expected removal: $ ./opal-gard -f romulus.pnor list No GARD entries to display $ ./opal-gard -f romulus.pnor create /sys0/node0/proc1 $ ./opal-gard -f romulus.pnor list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 ========================================================= $ ./opal-gard -f romulus.pnor clear 00000001 Clearing gard record 0x00000001...done $ ./opal-gard -f romulus.pnor list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Unknown | /Sys0/Node0/Proc1 ========================================================= The GUARD partition needs to be compacted when clearing records as the end of the list is a sentinel represented by the erased-flash state. The compaction strategy is to read the trailing records and write them to the offset of the record to be removed, followed by writing the sentinel record at the offset of what was previously the last valid record. The corruption occurs due to incorrect calculation of the offset at which the trailing records will be written. Cc: Skiboot Stable <skiboot-stable@lists.ozlabs.org> Fixes: 5616c42d900a ("libflash/blocklevel: Make read/write be ECC agnostic for callers") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23core/platform: Actually disable fast-reboot on P8Oliver O'Halloran1-3/+5
[ Upstream commit 923b5a5342a7a37bd376327e35c7fcb98138d41c ] There was an attempt. It was not successful. Fixes: 14f709b8eeda ("Disable fast-reset for POWER8") Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23xive: fix return value of opal_xive_allocate_irq()Cédric Le Goater1-1/+1
[ Upstream commit e97391ae2bb5a146a5041453f9185326654264d9 ] When the maximum number of interrupts per chip is reached, xive_try_allocate_irq() returns an internal XIVE error: XIVE_ALLOC_NO_SPACE. But its value 0xffffffff is interpreted as a positive value by its caller opal_xive_allocate_irq() and not as an error. opal_xive_allocate_irq() returns this value to Linux which also considers 0xffffffff as a valid interrupt number and tries to get the interrupt characteritics using opal_xive_get_irq_info(). This OPAL calls finally fails leading to all sort of errors on the host which is not prepared for such a scenario. Code impacted are the IPI setup and the both XIVE KVM devices. Fix by returning OPAL_RESOURCE from xive_try_allocate_irq() which is consistent with the other errors returned by this routine. This fixes the behavior in opal_xive_allocate_irq() and in Linux. A workaround could be introduced in Linux to consider 0xffffffff as a OPAL_RESOURCE value. This assumption is valid with the current XIVE IRQ number encoding. Fixes: 07946e68f47a ("xive: Add interrupt allocator") Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> [oliver: Added fixes tag] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23MPIPL: struct opal_mpipl_fadump doesn't needs to be packedVasant Hegde1-1/+1
[ Upstream commit cc02885770f63d00d2483be4e1627d2cadfffa8a ] [CC] core/opal-dump.o core/opal-dump.c: In function ‘post_mpipl_get_opal_data’: core/opal-dump.c:471:11: warning: taking address of packed member of ‘struct opal_mpipl_fadump’ may result in an unaligned pointer value [-Waddress-of-packed-member] 471 | region = opal_mpipl_data->region; | ^~~~~~~~~~~~~~~ Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-10-23core/flash: Validate secure boot content sizeOliver O'Halloran1-0/+11
[ Upstream commit e2018d2a3d46491dc2abd758c67c1937910b3a67 ] Currently we don't check if the secure boot payload size fits within the partition that we are reading it from. This results in strange failures later on in boot if we cross the boundary between an ECCed and a non-ECCed partition since libflash does not support reading from regions with mixed ECC status. Without this patch: blocklevel_read: Can't cope with partial ecc FLASH: failed to read content size 15728640 BOOTKERNEL partition, rc 3 With: FLASH: Cannot load BOOTKERNEL. Content is larger than the partition Cc: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Acked-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-08-16skiboot 6.5 release notesv6.5Oliver O'Halloran1-0/+20
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16ipmi: Use standard MIN() macro definitionJordan Niethe1-3/+1
There is a MIN() macro definition in skiboot.h. Remove the redundant definition from here and use that one. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Acked-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16hw/phb4: Use standard MIN/MAX macro definitionsJordan Niethe1-6/+3
The max() macro definition incorrectly returns the minimum value. The max() macro is used to ensure that PERST has been asserted for 250ms and that we wait 100ms seconds for the ETU logic in the CRESET_START PHB4 PCI slot state. However, by returning the minimum value there is no guarantee that either of these requirements are met. Correct macro definitions for MIN and MAX are already provided in skiboot.h. Remove the redundant/incorrect versions here and switch to using the standard ones. Fixes: 70edcbb4b39d ("hw/phb4: Skip FRESET PERST when coming from CRESET") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16hw/phb4: Prevent register accesses when in resetOliver O'Halloran2-0/+11
While the the ETU is in reset we cannot access any of the PHB registers. If a PHB register is accessed via the XSCOM indirect interface then we'll cause an ETU reset error which may prevent the PHB from being re-initialised once the reset is lifted. Prevent register accesses while in reset by adding a flag that is set while the ETU reset bit is high and checking that flag in the XSCOM (ASB) backdoor register access path. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16npu: Fix device binding error messageReza Arbab1-2/+6
Helping someone troubleshoot a Garrison machine, I noticed some of the BDFs printed here are wrong: npu_dev_bind_pci_dev: No PCI device for NPU device 0004:00:00.0 to bind to. If you expect a GPU to be there, this is a problem. npu_dev_bind_pci_dev: No PCI device for NPU device 0004:00:01.0 to bind to. If you expect a GPU to be there, this is a problem. npu_dev_bind_pci_dev: No PCI device for NPU device 0004:00:04.0 to bind to. If you expect a GPU to be there, this is a problem. npu_dev_bind_pci_dev: No PCI device for NPU device 0004:00:05.0 to bind to. If you expect a GPU to be there, this is a problem. Change the prlog() call to print them correctly. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16npu3: Expose remaining ATSD launch registersReza Arbab2-9/+13
List all 16 ATSD registers in the device tree, not just the first 8. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16npu3: Initialize NPU3_SNP_MISC_CFG0Reza Arbab2-0/+11
Enable powerbus snooping here, or else MMIO to any NTL/NDL registers will cause a checkstop. This was not an issue in Simics simulation, but discovered rather quickly during bringup on a real Axone chip. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16npu3: Rename NPU3_SM_MISC_CFGn register macrosReza Arbab2-12/+12
The SM blocks have multiple MISC_CFG registers. For example, there are both CS.SM0.MCP.MISC.CONFIG0 and CS.SM0.SNP.MISC.CONFIG0. Rename our macro for the former to more clearly reflect this and avoid a clash when the latter is added. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16pci: Use a macro for accessing PCI BDF Function NumberJordan Niethe8-17/+18
Currently when the Function Number bits of a BDF are needed the bit operations to get it are free coded. There are many places where the Function Number is used, so make a macro to use instead of free coding it everytime. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16pci: Use a macro for accessing PCI BDF Device NumberJordan Niethe9-15/+16
Currently when the Device Number bits of a BDF are needed the bit operations to get it are free coded. There are many places where the Device Number is used, so make a macro to use instead of free coding it everytime. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16pci: Use a macro for accessing PCI BDF Bus NumberJordan Niethe9-18/+21
Currently when the Bus Number bits of a BDF are needed the bit operations to get it are free coded. There are many places where the Bus Number is used, so make a macro to use instead of free coding it everytime. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16core/pci-dt-slots: Remove duplicate PCIDBG() definitionJordan Niethe1-6/+0
PCIDBG() is already defined in pci.h, which is included by pci-dt-slot.c. It should not be defined again so remove this definition. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16include/xscom: Use the name EQ rather than EPOliver O'Halloran2-11/+15
The P9 pervasive spec uses the term "EP" to refer to the combination of an EQ chiplet and its two child EX chiplets. Nothing else seems to use the term EP and in Skiboot all the uses of the XSCOM_ADDR_P9_EP() macro are to translate the address of EQ specific SCOM registers. Change the name of our address calculation macros to match the general terminology to make what it does clearer. Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16include/xscom: Remove duplicate p9 definitionsOliver O'Halloran1-5/+0
These are already defined in xscom-p9-regs.h Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-16include/xscom: Remove duplicate p8 definitionsOliver O'Halloran1-40/+0
Duplicates of what's already in xscom-p8-regs.h Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-15MPIPL: Add documentationVasant Hegde6-0/+250
Document MPIPL device tree and OPAL APIs. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> [oliver: rebased] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-15MPIPL: Prepare architected registers data tagVasant Hegde1-0/+37
Post MPIPL kernel needs saved CPU register details to create vmcore/opalcore. This patch prepares CPU register data tag and add it to tags list. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [oliver: rebased] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-15MPIPL: Reserve memory to capture architected registers dataVasant Hegde4-1/+53
- Split SPIRAH memory to accommodate architected register ntuple. Today we have 1K memory for SPIRAH and it uses 288 bytes. Lets split this into two parts : SPIRAH (756 bytes) architected register memory (256 bytes) - Update SPIRAH architected register ntuple - Calculate memory required to capture architected registers data Ideally we should use HDAT provided data (proc_dump_area->thread_size). But we are not getting this data during boot. Hence lets reserve fixed memory for architected registers data collection. - Add architected registers destination memory to reserve-memory DT node. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [oliver: rebased] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-15MPIPL: Clear tags and metadataVasant Hegde1-0/+6
Post dump process, kernel sends FREE_PRESERVE_MEMEORY notification to OPAL. OPAL will clear metadata section and tags. Subsequent opal_mpipl_query_tag() call will return OPAL_EMPTY. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [oliver: rebased] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>