aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-04xive/p9: obsolete OPAL_XIVE_IRQ_SHIFT_BUG flagsCédric Le Goater4-9/+3
These were needed to workaround HW bugs in PHB4 LSIs of POWER9 DD1.0 processors. HW395455 P9/PHB4: Wrong Interrupt ESB CI Load Opcode Location in 64K page mode Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: obsolete OPAL_XIVE_IRQ_*_VIA_FW flagsCédric Le Goater2-14/+2
These were needed to workaround HW bugs in PHB4 LSIs of POWER9 DD1.0 processors. Keep the flags in case of a similar issue in the next generation of the XIVE logic and keep it also for Linux which still has handlers in its XIVE layer. However, there is no need to keep the code in POWER9 XIVE driver. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: remove dead codeCédric Le Goater1-4/+0
Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: remove code not using block group modeCédric Le Goater1-208/+1
block group mode is now required, it can not be disabled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: remove code not using indirect modeCédric Le Goater1-111/+12
An indirect table is a one page array of XIVE VSDs pointing to subpages containing XIVE virtual structures: NVTs, ENDs, EASs or ESBs. The OPAL XIVE driver uses direct tables for the EAS and ESB tables. The number of interrupts being 1M, the tables are respectivelly 256K and 8M per chip. We want the EAS lookup to be fast so we keep this table direct. The NVT and END are bigger structures (64 and 32 bytes). If the table were direct, we would use 32M and 32M of OPAL memory per chip. They are indirect today and Linux allocates the pages on behalf of OPAL when a new indirect subpage is needed. We plan to increase the NVT space and END space in P10. Remove USE_INDIRECT ifdef and associated code not used anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: use MMIO access for VC_EQC_CONFIGCédric Le Goater1-1/+1
There is no reason to issue loads on XSCOM when syncing the interrupt controller. All should be in place to use MMIOs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: minor cleanup of the interfaceCédric Le Goater6-8/+8
The XIVE driver exposes an API to the core OPAL layer and to other OPAL drivers. This is a minor cleanup preparing ground for future XIVE logic. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04xive/p9: introduce header files for the registersCédric Le Goater4-459/+490
This is moving the definitions of the registers of the P9 XIVE interrupt controller and the P9 XIVE internal structures in a specific header file and moving the definitions related to the thread interrupt context area to a common file. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04npu3: Make SALT CMD_REG writableReza Arbab1-3/+4
CMD_REG should be writable, not read-only. Fix this, initializing it with a default "unset" value (0xffffffff). Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04npu3: Improve SALT log outputReza Arbab1-3/+5
Add a log line for when the PPE indicates it's not in the ready state, and make all the SALT lines start with a capital to look nicer. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04npu3: Add ibm, ioda2-npu3-phb to compatible propertyReza Arbab1-2/+7
Though they are currently identical to the OS, it may become necessary to distinguish npu3 phbs from npu2 ones at some point. Add a unique string to the compatible property. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04platforms/swift: Remove spurious error messageReza Arbab1-5/+0
npu3_chip_possible_gpus() works by dividing the number of NVLink-mode bricks by the number of bricks connecting a single GPU. In a system with no GPUs, the latter value is unknown, so the function returns zero and we trip a somewhat misleading error message. The code afterward is safe to execute in any case, so there's no need to return either. Remove the check entirely. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-24skiboot v6.5.1 release notesVasant Hegde1-0/+27
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-24skiboot v6.3.4 release notesVasant Hegde1-0/+29
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22gard: Add support to run gard tests on FSP platformVasant Hegde2-6/+11
gard tool is not supported on FSP based system. But we can still run gard tests on FSP based system. Acked-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22core/ipmi: Remove redundant variableVasant Hegde1-7/+3
Previous commit d75e82dbf introduced unnecessary variable/check. Remove that and add barrier after setting sync_msg to NULL. Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Fixes: d75e82dbf (core/ipmi: Fix use-after-free) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22chip: enable HOMER/OCC common area region in Qemu emulated PowerNV hostBalamuruhan S2-2/+3
Recent work on Qemu adds support to emulate homer memory region and occ common area region with respective device models, so remove `QUIRK_NO_PBA` to enable HOMER/OCC common area region for Qemu emulated PowerNV host. Introduce `QUIRK_QEMU` in enum proc_chip_quirks that will be used for future work. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22occ-sensor: clean dt properties if sensor is not availableBalamuruhan S1-0/+4
In `occ_sensor_init()` device tree node is created for sensor-goups and performs `occ_sensor_sanity()` check to initialize the device tree. But if there are no sensors like in Qemu, sanity check fails but still device tree populates the sensor-groups node wrongly as the node created is not cleaned up. Signed-off-by: Balamuruhan S <bala24@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Log a warning when resetting a broken deviceFrederic Barrat1-0/+4
On P9, the NPU doesn't support recovery if the link goes down unexpectedly. It was not fully verified. We mark the device as broken when we receive an error interrupt from the NPU. However, there's nothing to prevent the OS from trying to reset the device; It may or may not work, it's unsupported territory, so let's log a message to make it clear, as it could help when debugging. We haven't hit any cases where the reset goes badly enough that we'd want to prevent it, so let it go for now. We can revisit later if we have evidence that it's causing more problems than it is worth. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Handle OPAL_UNMAP_PE operation on set_pe() callbackFrederic Barrat1-1/+6
In a hot-unplug scenario, the OS will try to unmap the PE. Skiboot doesn't do anything with the linux PE for opencapi other than being a mailbox, but at least let's be consistent. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Activate PCI hotplug on opencapi slotFrederic Barrat1-4/+65
Implement the get_power_state() and set_power_state() callbacks for the opencapi slot and add properties in the device tree to mark the opencapi slot as hot-pluggable. We don't really power off/on the opencapi adapter. The slot at play here is the virtual slot associated to the virtual opencapi PHB. The real PCIe slot where the card is drawing its power from is untouched (skiboot is not even aware which PCIe slot the card is seated on). So the 'fake' power off is fencing the card and set it in reset so that the FPGA image can be updated. The 'fake' power on is not doing much, as the unfencing happens on the subsequent link training. Opencapi slots are named 'OPENCAPI-xxxx' where xxxx is the opal ID of the PHB/slot. This is meant to easily identify the slot used by an AFU device, as the AFU device names are also built around that ID. For example, the device /dev/ocxl/AFP3.0006:00:00.1.0 uses the slot OPENCAPI-0006. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Improve error reporting to the OSFrederic Barrat3-4/+29
When resetting an opencapi link, the brick will be fenced temporarily. Therefore we can't rely on the fencing state of the brick any more to check for the health of an opencapi PHB, as we could report errors if queried for a PHB state at the same time a link is being reset. Instead, we flag the device as 'broken' when an error interrupt is received, just before raising an event to the OS. When the OS is querying for the state of a PHB, we only have to look at the 'broken' attribute. Note that there's no recovery possible on P9 when an error interrupt is received unexpectedly, as recovery is not supported by hardware. So when a device/link is marked as 'broken', it stays broken. All the OS can do is log the error and notify the drivers. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Detect PHY reset errorsFrederic Barrat3-5/+17
PHY reset can fail! Though past problems are now fixed, let's handle any future failure. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Simplify freset statesFrederic Barrat1-13/+3
Let's get rid of one transitional state, since there's no need to pause in between releasing the reset signals of the ODL and the adapter. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Tweak fundamental reset sequenceFrederic Barrat2-24/+26
Modify slightly the ordering of a few steps in our init sequence on fundamental reset, so that it can be called from the OS, when the link is already up: - when the card is reset, the link goes down, so we need to fence the brick to prevent errors propagating to the NPU and OS - since fencing and unfencing don't require any delay, let's also fence/unfence during the very first reset at boot. It's useless but doesn't hurt and keep the code simpler. - resetting the PHY must be done a bit later, while fenced and the ODL and DLx in reset Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Rework link training timeoutFrederic Barrat2-4/+7
Opencapi link state should be polled for up to 3 seconds. Current code assumes a tight retry loop during fundamental reset at boot, which is not going to be true on link retraining. So update the timeout detection code to use a timebase instead of a simple retry count which could be way too long. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-hw-procedures: Fix link retraining on resetFrederic Barrat1-0/+16
Link retraining was showing reliability problems due to some opencapi-only settings not being optimized. This patch updates some extra PHY state, as agreed with the PHY team. Though they mostly impact link retraining behavior, they should also be set at boot. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-opencapi: Make sure the PCI slot has the proper IDFrederic Barrat1-1/+2
The PCI slot created for the opencapi PHB didn't have its ID properly defined because it was created before we assign an ID to the PHB. Simply switch the PCI slot creation and PHB registration calls to fix it. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22npu2-hw-procedures: Move some opencapi PHY settings in one-off initFrederic Barrat1-19/+16
The PHY_RX_AC_COUPLED and PHY_RX_SPEED_SELECT for opencapi are group settings for the obus. They should be set in the one-off PHY init function at boot and not on the link reset path, as they theoretically impact more than one link. Since we cannot mix link type and/or speed on an optical bus, it has no pratical impact, it just looks cleaner. Also use the OCAPIINF macro for the associated traces. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22core/pci: Fix scan of devices for opencapi slotsFrederic Barrat1-5/+15
Opencapi devices are found directly under the PHB and the PHB slot doesn't have an associated PCI device (root complex). So when scanning a PHB, devices are added directly under the PHB, like it's done at boot time. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22core/pci: Train link of PHB slots when hotpluggingFrederic Barrat1-22/+85
The link of PHB slots must be trained after powering on. This can be done by calling the fundamental reset callback of the slot. We could force a reset for all the slots and have a common path in set_power_state(). But this patch only resets the PHB slot. Some slot implementations do a power cycle during fundamental reset, so calling a reset after powering on would repeat that operation. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22core/pci: Use proper phandle during hotplug for PHB slotsFrederic Barrat1-6/+15
PHB slots don't have an associated device (slot->pd = NULL). They were not used by the PCI hotplug framework so far, but with opencapi virtual PHBs, that's changing. With opencapi, devices are directly under the PHB (no root complex or intermediate bridge) and the slot used for hotplug is the PHB slot. This patch uses the proper phandle when replying asynchronously to the OS when using a PHB slot. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22core/pci: Add missing lock in set_power_timerFrederic Barrat2-0/+12
set_power_timer() was not using any lock, though it alters the slot state and devices found under it. There's a remote possibility that set_power_timer() is called through check_timers() by a thread already holding the phb lock, so we try to take the lock but yield and rearm the timer if somebody else is already owning it. There really shouldn't be any contention here. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-22core/pci: Refactor common paths on slot hotplugFrederic Barrat1-17/+26
Refactor code executed to remove or rescan devices when a slot power state changes, synchronously or asynchronously through a timer callback. It will be more useful in a future patch. No functional changes. Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-15core/ipmi: Fix use-after-freeVasant Hegde1-3/+11
Commit f01cd77 introduced backend poller() for ipmi message. But in some corner cases its possible that we endup calling poller() after freeing ipmi message. Thread 1 : ipmi_queue_msg_sync() Waiting for ipmi sync message to complete Thread 2 : bt_poll() -> ipmi_cmd_done() -> callback handler -> free message Oliver hit this issue during fast-reboot test with skiboot DEBUG build. In debug build we poision the memory after free. That helped us to catch this issue. [ 460.295570781,3] *********************************************** [ 460.295773157,3] Fatal MCE at 0000000030035cb4 .ipmi_queue_msg_sync+0x110 MSR 9000000000201002 [ 460.295887496,3] CFAR : 0000000030035ce8 MSR : 9000000000000000 [ 460.295956419,3] SRR0 : 0000000030035cb4 SRR1 : 9000000000201002 [ 460.296035015,3] HSRR0: 0000000030012624 HSRR1: 9000000002803002 [ 460.296102413,3] DSISR: 00000008 DAR : 99999999999999d1 [ 460.296169710,3] LR : 0000000030035ce4 CTR : 0000000030002880 [ 460.296248482,3] CR : 28002422 XER : 20040000 [ 460.296336621,3] GPR00: 0000000030035ce4 GPR16: 00000000301d36d8 [ 460.296415449,3] GPR01: 0000000031c133d0 GPR17: 00000000300f5cd8 [ 460.296482811,3] GPR02: 0000000030142700 GPR18: 0000000030407ff0 [ 460.296550265,3] GPR03: 0000000000000100 GPR19: 0000000000000000 [ 460.296629041,3] GPR04: 0000000028002424 GPR20: 0000000000000000 [ 460.296696369,3] GPR05: 0000000020040000 GPR21: 0000000030121d73 [ 460.296820977,3] GPR06: c000001fffffd480 GPR22: 0000000030121dd2 [ 460.296888226,3] GPR07: c000001fffffd480 GPR23: 0000000030613400 [ 460.296978218,3] GPR08: 0000000000000001 GPR24: 0000000000000001 [ 460.297056871,3] GPR09: 9999999999999999 GPR25: 0000000031c13960 [ 460.297124647,3] GPR10: 0000000000000000 GPR26: 0000000000000004 [ 460.297203811,3] GPR11: 0000000000000000 GPR27: 0000000000000003 [ 460.297271250,3] GPR12: 0000000028002424 GPR28: 0000000030613400 [ 460.297339026,3] GPR13: 0000000031c10000 GPR29: 0000000030406b50 [ 460.297417605,3] GPR14: 00000000300f58f8 GPR30: 0000000030406b40 [ 460.297485176,3] GPR15: 00000000300f58d8 GPR31: 00000000309249c8 Reported-by: Oliver O'Halloran <oohall@gmail.com> Fixes: f01cd77 (ipmi: ensure forward progress on ipmi_queue_msg_sync()) Cc: skiboot-stable@lists.ozlabs.org # v6.3+ Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Fix unaligned writes to ECC partitionsAndrew Jeffery4-7/+65
Currently trying to clear a gard record results in errors: $ ./opal-gard -pef part create /sys0/node0/proc1 $ ./opal-gard -pef part list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 ========================================================= $ ./opal-gard -pef part clear 00000001 Clearing gard record 0x00000001...done ECC: uncorrectable error: ffffff00ffffffff ff libflash ecc invalid $ A little wrapper around hexdump(1) helps show where the error lies by grouping output in blocks of nine such that the last byte is the ECC byte: $ declare -f ecchd ecchd () { hexdump -e '"%08_ax""\t"' -e '9/1 "%02x ""\t""|"' -e '9/1 "%_p""|\n"' "$@" } A clean GARD partition displays as: $ ecchd part 0002c000 ff ff ff ff ff ff ff ff 00 |.........| * 00030ffb ff ff ff ff ff |.....| $ Dumping the corrupt partition shows: $ ecchd part 0002c000 ff ff ff ff ff ff ff ff 00 |.........| * 0002c024 ff ff ff ff ff ff ff ff ff |.........| 0002c02d ff ff ff 00 ff ff ff ff ff |.........| * 0002c051 ff ff ff 00 ff ff ff ff 00 |.........| 0002c05a ff ff ff ff ff ff ff ff 00 |.........| * 00030ffb ff ff ff ff ff |.....| $ blocklevel_smart_write() turned out to not be quite as smart as it thought it was in that any unaligned write to ECC protected partitions aligned the calculated ECC values to the start of the write buffer and not relative to the start of the partition. Fixes: 29d1e6f78109 ("libflash/blocklevel: add a smart write function which wraps up eraseing and writing") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Tidy local variable declarationsAndrew Jeffery1-4/+7
Group them by use (and name). It's not reverse christmas tree, but it's a bit easier on the eye. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Avoid reuse of formal parametersAndrew Jeffery1-6/+13
Lays the ground-work for fixing unaligned writes to ECC protected partitions. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Deny writes intersecting ECC protected regionsAndrew Jeffery1-1/+9
Other code paths don't handle writes spanning mixed regions, and it's a headache, so deny it here too. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Avoid indirectly testing formal parametersAndrew Jeffery1-1/+1
The early-exit tests write_buf, but write_buf is assigned to buf on declaration. Test buf directly instead to avoid unnecessary indirection. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Rename size variable for clarityAndrew Jeffery1-8/+9
We're writing in chunks, so lets make it clear that size is relative to the chunk that we're writing. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Rename write bufferAndrew Jeffery1-7/+7
The buffer is only used for ECC protected partitions, so lets call it ecc_buf for clarity. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14blocklevel: smart_write: Terminate line for debug output in no-change caseAndrew Jeffery1-0/+2
Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14gard: Fix data corruption when clearing single recordsAndrew Jeffery4-1/+31
Attempting to clear a specific gard record leads to corruption of the target record rather than the expected removal: $ ./opal-gard -f romulus.pnor list No GARD entries to display $ ./opal-gard -f romulus.pnor create /sys0/node0/proc1 $ ./opal-gard -f romulus.pnor list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Manual | /Sys0/Node0/Proc1 ========================================================= $ ./opal-gard -f romulus.pnor clear 00000001 Clearing gard record 0x00000001...done $ ./opal-gard -f romulus.pnor list ID | Error | Type | Path --------------------------------------------------------- 00000001 | 00000000 | Unknown | /Sys0/Node0/Proc1 ========================================================= The GUARD partition needs to be compacted when clearing records as the end of the list is a sentinel represented by the erased-flash state. The compaction strategy is to read the trailing records and write them to the offset of the record to be removed, followed by writing the sentinel record at the offset of what was previously the last valid record. The corruption occurs due to incorrect calculation of the offset at which the trailing records will be written. Cc: Skiboot Stable <skiboot-stable@lists.ozlabs.org> Fixes: 5616c42d900a ("libflash/blocklevel: Make read/write be ECC agnostic for callers") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14core/init: Checksum romem after patching out trapsOliver O'Halloran1-2/+4
Currently we checksum the read-only parts of skiboot's memory just before loading and booting petitboot. Commit 9ddc1a6bfaef ("core/util: trap based assertions") modifies the .text after this point since it needs to disable the trap instructions that we use to trigger an abort() before entering the kernel. We can fix this by moving the checksum to after the point where the traps are patched out. We could do the patching sooner, but since load_and_boot_kernel() is a fairly complex function it's perferable to keep boot-time assertion infrastructure active until just before we enter the kernel. Reported-by: Carol L Soto <clsoto@us.ibm.com> Tested-by: Carol L Soto <clsoto@us.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Fixes: 9ddc1a6bfaef ("core/util: trap based assertions") Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-14core/init: Don't checksum MPIPL data areasOliver O'Halloran3-1/+15
Right now the romem checksum runs from _start until the start of our data area. This spans the area used for the MPIPL data structures since they're included in the SPIRA-H data area. Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-03core/exceptions.c: do not include handler code in exception backtraceNicholas Piggin3-6/+31
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-03core/util: branch-to-NULL assert for ELFv2 ABINicholas Piggin2-6/+17
The ELFv1 branch to NULL catcher puts a function descriptor at 0 which points to a function that asserts. For ELFv2, put a trap at address 0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [oliver: commit message prefix] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-03core/util: trap based assertionsNicholas Piggin10-35/+130
Using traps for assertions like Linux does gives a few advantages: - The asm code leading to the failure condition is nicer. - The interrupt gives a clean snapshot of machine state to dump. The difficulty with using traps for this in OPAL is that the runtime component will not deal well with the OS taking the 0x700 interrupt caused by a trap in OPAL. The long term goal is to improve the ability of the OS to inspect and debug OPAL at runtime. For now though, the traps are patched out before passing control to the OS, and the assert falls through to in-line failure handling. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [oliver: commit prefix, added and renamed the FWTS label, fix tests] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-03core/exceptions.c: rearrange code to allow more interrupt typesNicholas Piggin1-4/+12
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>