Age | Commit message | Author | Files | Lines
2025-06-27 | opal-ci: Drop ubuntu-20.04, add ubuntu-24.04 [HEAD, master] | Reza Arbab | 3 | -4/+4
Standard support for Ubuntu 20.04 ended on May 31, 2025. Remove it and add Ubuntu 24.04. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-06-27 | opal-ci: Drop fedora40, add fedora42 | Reza Arbab | 4 | -3/+3
Fedora 40 has reached end-of-life. Remove it and add Fedora 42. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz>
2025-05-16 | libstb/(create|print)-container: Enable custom ssl dir | Reza Arbab | 1 | -2/+7
Respect SSL_DIR if it is set, to use ssl headers and libs that are in a nonstandard location. When skiboot is built by op-build, the system ssl installation is being used instead of the buildroot one. This change will let us fix that. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-05-16 | libstb: Add print-container to `make clean` | Reza Arbab | 1 | -3/+3
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-05-16 | opal-ci: Force mambo rpm install on fedora-rawhide | Reza Arbab | 1 | -1/+1
Starting with RPM v6, "packages built with RPM < 4.14.0 cannot be verified due to their use of weak, obsolete MD5 and SHA1 digests." [1] So, our ancient p9 mambo package is now failing to install on fedora-rawhide. Use the suggested workaround of setting %_pkgverify_flags to 0 to restore the old behavior. [1] https://rpm.org/releases/6.0.0#compatibility-notes Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz>
2025-05-16 | uart: Handle read failure case in UART communication | Abhishek Singh Tomar | 1 | -1/+7
Add logic to handle the case where the Line Status Register (LSR) reads 0xFF, which may indicate an error reading the register through LPC or the presence of multiple simultaneous UART errors. Previously, this false read of set bits led to a soft lockup or hang on older production systems. In such scenarios, processing data read/write operations does not make sense. The function now returns `false` to signal the failure and halt further operations. Signed-off-by: Abhishek Singh Tomar <abhishek@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
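The check described above can be sketched as follows. This is a minimal illustration, not the skiboot code: the helper names `lsr_read_valid` and `uart_tx_ready` are hypothetical, and only the THRE bit position (0x20) is taken from the standard 16550 register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LSR_THRE        0x20  /* 16550: transmit holding register empty */
#define LSR_READ_FAILED 0xff  /* all-ones read, treated as a failed read */

/* A failed read has every bit set, so it would look like "THRE set". */
static_assert((LSR_READ_FAILED & LSR_THRE) != 0,
	      "a failed read is indistinguishable from THRE by bit test alone");

/* Hypothetical helper: reject an all-ones LSR value. */
static bool lsr_read_valid(uint8_t lsr)
{
	return lsr != LSR_READ_FAILED;
}

/* Hypothetical helper: only trust THRE when the read itself is valid. */
static bool uart_tx_ready(uint8_t lsr)
{
	if (!lsr_read_valid(lsr))
		return false;  /* signal failure, do not process data */
	return (lsr & LSR_THRE) != 0;
}
```

Without the validity check, a 0xFF read would pass the THRE test and the caller would keep writing to a UART that is not responding.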
2025-05-16 | uart: Fix uninitialised comparison | Abhishek Singh Tomar | 1 | -1/+1
According to doc/opal-api/opal-console-read-write-1-2.rst, the length argument of OPAL_CONSOLE_WRITE_BUFFER_SPACE is only used to return a value. Indeed, the API is called twice in the kernel code, and __length remains uninitialized in both cases. This can lead to a hang/softlock issue in older hardware. Eliminate the problematic comparison which uses the uninitialized value. Fixes: 6bf21350da32 ("uart: Drop console write data if BMC becomes unresponsive") Signed-off-by: Abhishek Singh Tomar <abhishek@linux.ibm.com> Reviewed-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | ipmi: Improve handling of time errors | Nicholas Piggin | 1 | -27/+90
When BMC is in NTP mode, SET_SEL_TIME returns IPMI wrong state errors. This better tracks and returns IPMI errors from OPAL_RTC_WRITE, which prevents Linux from continuing to retry this non-transient failure. Could the BMC be switched to non-NTP mode some time after the OS is up? The host could see the OPAL_WRONG_STATE return and deal with this if necessary. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | ipmi: Return error from ipmi_queue_msg_sync | Nicholas Piggin | 3 | -5/+9
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | astbmc/ipmi: Improve Set Enables handling | Nicholas Piggin | 2 | -15/+58
The IPMI Set Enables command is supposed to do a read-modify-write operation to change bits if it is not coded specifically for the system. Since the various BMCs supported by astbmc platform code (e.g., QEMU and OpenBMC) are a bit different and subject to change, it's safer to set bits with RMW. Then bits should be set one by one to help isolate failures. And the Set Enables command is changed to run synchronously so that host/BMC behaviour is a bit more deterministic when setting up IPMI. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
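The read-modify-write approach, one bit at a time, might look like the sketch below. Everything here is a stand-in: `struct fake_bmc` and its accessors model a BMC that rejects unsupported bits, and none of these names come from skiboot or the IPMI specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a BMC's global-enables byte. */
struct fake_bmc {
	uint8_t enables;   /* current enable bits */
	uint8_t writable;  /* bits this BMC accepts changes to */
};

static int bmc_get_enables(struct fake_bmc *bmc, uint8_t *val)
{
	assert(bmc && val);
	*val = bmc->enables;
	return 0;
}

static int bmc_set_enables(struct fake_bmc *bmc, uint8_t val)
{
	assert(bmc);
	/* Fail if the request changes a bit the BMC does not support. */
	if ((uint8_t)(val ^ bmc->enables) & (uint8_t)~bmc->writable)
		return -1;
	bmc->enables = val;
	return 0;
}

/*
 * Set each wanted bit with its own read-modify-write so a rejected bit
 * does not take the others down with it. Returns a mask of failed bits.
 */
static uint8_t set_enables_rmw(struct fake_bmc *bmc, uint8_t wanted)
{
	uint8_t failed = 0;

	for (int bit = 0; bit < 8; bit++) {
		uint8_t mask = (uint8_t)(1u << bit);
		uint8_t cur;

		if (!(wanted & mask))
			continue;
		if (bmc_get_enables(bmc, &cur) ||
		    bmc_set_enables(bmc, (uint8_t)(cur | mask)))
			failed |= mask;
	}
	return failed;
}
```

Setting bits individually costs extra commands, but the returned mask shows exactly which enable a given BMC refused, and bits that were already set are preserved.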
2025-04-18 | hw/xive: Warn if pushing HW context to already enabled CAM | Nicholas Piggin | 2 | -43/+47
The push-context operation is not defined if a context is already valid; it should only be performed if the CAM is pulled. Add a check to ensure the TIMA reset was performed properly before pushing a context. QEMU does not yet model the reset via PTER toggle correctly, so this causes some noise during boot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | core/timers: Try to process all poll timers | Nicholas Piggin | 2 | -9/+12
Poll timers are not delay based and have no kind of ordering, so processing does not have to stop if a busy timer is encountered. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | core/timers: Fix running-timer delay | Nicholas Piggin | 1 | -11/+10
When the timer run code encounters an already-running timer, it has to stop processing and run the remaining timers later. In the case of poll timers the SBE timer is scheduled for a minimum delay, and for delay timers nothing is done. This looks backwards: poll timers do not get called from the SBE interrupt so that delay is pointless, whereas it is helpful for delay timers to ensure they're processed again soon. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | hw/lpc: Fix firmware memory space access for devices > 0 | Nicholas Piggin | 1 | -13/+20
The LPC firmware memory space is 256MB in size, and it may select up to 16 devices with the IDSEL field. OPB addresses FW space as a 32-bit value with the top 4 bits selecting the device and the bottom 28 addressing the FW memory space. Therefore the top bits should be ignored when calculating the offset into the FW window. Fix this by allowing lpc_opb_prepare() to adjust the address directly and correctly mask it. Now there's no need to return opb_base to the caller either, so fold that in at the same time. This bug could be observed with QEMU's PNOR implementation that placed some of the PNOR in device 1, though that has been changed in QEMU 10.0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
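The address split described above can be illustrated with a short sketch. The macro and function names are hypothetical and do not reflect the actual lpc_opb_prepare() code; only the 4-bit/28-bit layout is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define LPC_FW_IDSEL_SHIFT 28           /* top 4 bits select the device */
#define LPC_FW_OFFSET_MASK 0x0fffffffu  /* bottom 28 bits: window offset */

/* 28 address bits cover exactly the 256MB firmware window. */
static_assert(LPC_FW_OFFSET_MASK + 1u == 256u * 1024 * 1024,
	      "offset mask must span the 256MB FW space");

/* Hypothetical helper: which of the 16 devices an address selects. */
static uint32_t lpc_fw_idsel(uint32_t addr)
{
	return addr >> LPC_FW_IDSEL_SHIFT;
}

/* Hypothetical helper: offset into the FW window, IDSEL bits ignored. */
static uint32_t lpc_fw_offset(uint32_t addr)
{
	return addr & LPC_FW_OFFSET_MASK;
}
```

For an address such as 0x10001000, the device is 1 and the window offset is 0x1000. Without the mask, the offset would land 256MB past the start of the window.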
2025-04-18 | hw/bt: Add FIFO buffer length capability validation | Nicholas Piggin | 1 | -2/+13
Add validation of BT FIFO sizes against IPMI message allocations. The BT interface capabilities command returns one less than the FIFO size, so fix this off by one error in the sanity check. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
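A minimal sketch of the off-by-one handling, assuming (as the commit message states) that the capabilities command reports one less than the FIFO size. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: 'reported' is the value returned by the BT
 * capabilities command, which is one less than the real FIFO size.
 */
static bool bt_msg_fits_fifo(uint8_t reported, size_t msg_len)
{
	size_t fifo_size = (size_t)reported + 1;

	return msg_len <= fifo_size;
}
```

Comparing against the raw reported value instead would wrongly reject a message that exactly fills the FIFO.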
2025-04-18 | cpu: Do not access xscom registers for guarded CPUs | Nicholas Piggin | 1 | -1/+1
Guarded CPUs are powered down so access to their PC xscom registers fails. This prevents the failed attempt and accompanying warnings. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-18 | fsp: Fix opal boot failure on systems with redundant FSP | Mahesh Salgaonkar | 1 | -3/+32
On systems with redundant FSP, opal detects primary/backup FSPs. However, it ends up considering Backup FSP as active_fsp and starts mailbox communication with it. This causes opal to send IPL messages to backup FSP instead of primary. Since primary FSP never receives IPL messages from OPAL, it assumes that opal failed to boot and enters into termination state. The active_fsp is set during fsp_update_links_states() function which is invoked during boot, through fsp_create_fsp(), as well as reset/reload, through fsp_reinit_fsp(). During the boot, when 2 FSPs are detected by opal, fsp_update_links_states() sets the last one as active_fsp which may not be primary FSP. Fix this issue by detecting/setting primary FSP as active_fsp during opal boot. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-04 | Revert "hdata/test: Build with -Wno-error=unterminated-string-initialization" | Reza Arbab | 1 | -2/+0
The compile errors this ignores are now being resolved by use of the "nonstring" attribute. This reverts commit 009fd0976006d0327cf374c1ff8ae73dd4895efa. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-04 | Fix -Werror=unterminated-string-initialization errors | Aditya Gupta | 3 | -2/+9
GCC 15 introduced the "unterminated-string-initialization" warning, which fires when a character array is initialised with a string literal too long for the terminating null character to fit in the array. Because warnings are treated as errors in the skiboot build, this causes the following errors when compiling skiboot with GCC 15: core/init.c:79:27: error: initializer-string for array of ‘unsigned char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (9 chars into 8 available) [-Werror=unterminated-string-initialization] 79 | .eye_catcher = "OPALdbug", | ^~~~~~~~~~ cc1: all warnings being treated as errors ... In file included from hdata/hdata.h:8, from hdata/spira.c:17: hdata/spira.c:35:32: error: initializer-string for array of ‘char’ truncates NUL terminator but destination lacks ‘nonstring’ attribute (7 chars into 6 available) [-Werror=unterminated-string-initialization] 35 | .hdr = HDIF_SIMPLE_HDR("PROCIN", 1, struct proc_init_data), | ^~~~~~~~ hdata/hdif.h:45:68: note: in definition of macro ‘HDIF_ID’ 45 | #define HDIF_ID(_id) .d1f0 = CPU_TO_BE16(0xd1f0), .id = _id | ^~~ hdata/spira.c:35:16: note: in expansion of macro ‘HDIF_SIMPLE_HDR’ 35 | .hdr = HDIF_SIMPLE_HDR("PROCIN", 1, struct proc_init_data), | ^~~~~~~~~~~~~~~ ... (similar errors a few more times with hdata) ... cc1: all warnings being treated as errors Fix the errors by marking character arrays that are not supposed to be null-terminated strings with the "nonstring" attribute, such as the eye-catchers in the skiboot debug descriptor and the HDIF header. Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
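The fix pattern can be sketched as below. The struct is a cut-down, hypothetical stand-in for the debug descriptor, and the `NONSTRING` wrapper macro is an assumption added so the sketch also compiles on compilers without the attribute. Only the eye-catcher value "OPALdbug" and the `nonstring` attribute name come from the commit message.

```c
#include <assert.h>
#include <string.h>

/* Use the attribute where the compiler knows it, otherwise expand to nothing. */
#if defined(__has_attribute)
# if __has_attribute(nonstring)
#  define NONSTRING __attribute__((nonstring))
# endif
#endif
#ifndef NONSTRING
# define NONSTRING
#endif

/* Hypothetical, reduced descriptor: 8 raw bytes, deliberately no NUL. */
struct debug_descriptor_sketch {
	unsigned char eye_catcher[8] NONSTRING;
};

static const struct debug_descriptor_sketch desc = {
	.eye_catcher = "OPALdbug",  /* 8 chars fill the array exactly */
};

static_assert(sizeof(desc.eye_catcher) == 8, "eye catcher is 8 bytes");

/* Compare with memcmp, never strcmp: the field is not a C string. */
static int eye_catcher_matches(const struct debug_descriptor_sketch *d)
{
	return memcmp(d->eye_catcher, "OPALdbug", sizeof(d->eye_catcher)) == 0;
}
```

Marking the field keeps the on-disk layout unchanged, which matters for fixed-format headers, while telling the compiler the missing terminator is intentional.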
2025-04-04 | hdata: Prevent NULL dereference on duplicate entries in TPMREL section | Mahesh Salgaonkar | 1 | -0/+7
Currently if you encounter duplicate entries in TPMREL section while parsing HDAT, opal crashes with below back trace: [ 119.205498180,3] DT: dt_attach_root failed, duplicate ibm,cvc-service@40 [ 119.206975658,3] *********************************************** [ 119.208669044,3] Fatal MCE at 000000003003729c .dt_find_property+0x30 MSR 9000000000001002 [ 119.210355268,3] Cause: unknown error [ 119.211273270,3] CFAR : 0000000030037288 MSR : 9000000000001002 [ 119.212502638,3] SRR0 : 000000003003729c SRR1 : 9000000000001002 [ 119.214037362,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000 [ 119.215266730,3] DSISR: 40000000 DAR : a600607d01006b79 [...] CPU 0008 Backtrace: S: 0000000031c53980 R: 0000000030026b0c .__memalign+0x58 S: 0000000031c53a10 R: 0000000030037378 .new_property+0xb0 S: 0000000031c53aa0 R: 0000000030037778 .__dt_add_property_strings+0x58 S: 0000000031c53b40 R: 000000003010bf74 .node_stb_parse+0x414 S: 0000000031c53c30 R: 0000000030102ee4 .parse_hdat+0x20cc S: 0000000031c53e30 R: 0000000030022c04 .main_cpu_entry+0x1d0 S: 0000000031c53f00 R: 000000003000321c go_primary+0x10c --- OPAL boot --- Fix the null pointer deref and proceed with warning message instead of crashing. Also add debug prints to display all entries. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-02 | extract-gcov: Add checksum field | Reza Arbab | 1 | -0/+3
Since gcc commit 72e0c742bd01 ("gcov: make profile merging smarter"), gcov expects there to be a checksum field after the stamp. For our purposes it's not necessary for it to be a valid value. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-02 | extract-gcov: Fix tag lengths | Reza Arbab | 1 | -3/+10
Starting with GCC 12, gcov cannot parse the .gcda files we generate: $ powerpc64le-linux-gcov-dump platforms/qemu/qemu.gcda platforms/qemu/qemu.gcda:data:magic `adcg':version `B23*' (swapped endianness) platforms/qemu/qemu.gcda:stamp 1742688024 platforms/qemu/qemu.gcda:checksum 2623079854 platforms/qemu/qemu.gcda: 01000000: 3:FUNCTION ident=1191288390, lineno_checksum=0xdb12f55c, cfg_checksum=0xf9e50e8f platforms/qemu/qemu.gcda:tag `46db12f5' is incorrectly nested platforms/qemu/qemu.gcda: 46db12f5:1559880974:UNKNOWN This is due to gcc commit 23eb66d1d46a ("gcov: Use system IO buffering"), where the length field of tags in the file changed to represent total bytes, not a count of words. Change what we write accordingly. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
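The change in the meaning of the length field can be sketched as a small conversion. The function name is hypothetical and this is not the extract-gcov code. It assumes, per the commit message, that GCC 12 and later expect bytes where earlier releases expected a count of 32-bit words.

```c
#include <assert.h>
#include <stdint.h>

/* gcov data files are made of 32-bit words. */
static_assert(sizeof(uint32_t) == 4, "gcov word is 4 bytes");

/*
 * Hypothetical helper: value to write in a tag's length field for a
 * payload of 'words' 32-bit words, given the GCC major version.
 */
static uint32_t gcov_tag_length(unsigned int gcc_major, uint32_t words)
{
	if (gcc_major >= 12)
		return words * (uint32_t)sizeof(uint32_t);  /* total bytes */
	return words;                                       /* word count */
}
```

A FUNCTION record with three payload words would therefore carry a length of 3 for GCC 11 and 12 for GCC 12 or later.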
2025-04-02 | extract-gcov: Add counters for GCC > 7 | Reza Arbab | 1 | -12/+12
The number of gcov counters can be derived from the gcc source by doing git grep -c ^DEF_GCOV_COUNTER $(git tag | grep ^releases/gcc) gcc/gcov-counter.def Add the newer GCC releases to extract-gcov. While we're at it, rewrite the preprocessor statements to use #elif for easier reading. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-02 | gcov: Don't print 0x with %p | Reza Arbab | 1 | -1/+1
Remove the extra 0x from this message: [ 0.042561024,5] GCOV: gcov_info_list at 0x0x30481280 Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-02 | gcov: Document sysfs interface | Reza Arbab | 1 | -1/+5
Add a line to the gcov documentation for what was added in commit 8d0f41e021b3 ("gcov: Add gcov data struct to sysfs"). Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-02 | opal-ci: Remove ubuntu-18.04 | Reza Arbab | 3 | -42/+1
Standard support for Ubuntu 18.04 ended on May 31, 2023. Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-04-02 | flash: Handle nullptr dereference of system_flash | Aditya Gupta | 1 | -1/+10
With QEMU with NO support for MPIPL, 'p9_sbe_terminate' returns early at: /* Return if MPIPL is not supported */ if (!is_mpipl_enabled()) return; But with MPIPL supported in QEMU, 'p9_sbe_terminate' continues further and calls 'flash_unregister' which causes a Machine Check due to nullptr dereference of 'system_flash': [ 13.240783728,5] Reboot: OS reported error. Performing MPIPL [ 13.241662601,5] DUMP: Crashing PIR = 0x0 [ 13.244049276,5] RESET: Fast reboot disabled: Kernel re-entered OPAL [ 1.815018] Disabling lock debugging due to kernel taint [ 1.815518] MCE: CPU0: machine check (Severe) Real address Load (bad) DAR: 0000006000000098 [Not recovered] [ 1.815544] MCE: CPU0: NIP: [0000000030040f54] 0x30040f54 [ 1.815911] MCE: CPU0: Initiator CPU [ 1.815930] MCE: CPU0: Hardware error [ 1.816110] opal: Hardware platform error: Unrecoverable Machine Check exception [ 1.816338] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G M 6.12.0-rc4+ #1 [ 1.816531] Tainted: [M]=MACHINE_CHECK [ 1.816546] Hardware name: IBM PowerNV (emulated by qemu) POWER10 0x801200 opal:v7.1 PowerNV [ 1.816629] NIP: 0000000030040f54 LR: 000000003007e528 CTR: 000000003004d75c [ 1.816646] REGS: c0000004d5e47d60 TRAP: 0200 Tainted: G M (6.12.0-rc4+) [ 1.816684] MSR: 9000000002a03002 <SF,HV,VEC,VSX,FP,ME,RI> CR: 28002284 XER: 00000000 [ 1.816863] CFAR: 000000003007e524 DAR: 0000006000000098 DSISR: 00000040 IRQMASK: 3 [ 1.816863] GPR00: 000000003007e528 0000000031c13ac0 0000000030192900 0000006000000060 [ 1.816863] GPR04: 0000000030500028 000000000000000a 0000000031c10068 0000000031c10068 [ 1.816863] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 1.816863] GPR12: 0000000028002284 c000000002e80000 c00000000001192c 0000000000000000 [ 1.816863] GPR16: 0000000031c10000 0000000000000000 0000000000000000 0000000000000000 [ 1.816863] GPR20: 0000000000000003 0000000000000074 0000000000000000 0000000000000000 [ 1.816863] GPR24: 0000000000000000 0000000000000000 0000000000000000 
0000000000000000 [ 1.816863] GPR28: c000000002d0e8c8 00000000301257de c000000002d0e8c8 000000000000000c [ 1.817061] NIP [0000000030040f54] 0x30040f54 [ 1.817074] LR [000000003007e528] 0x3007e528 [ 1.817165] Call Trace: [ 1.817337] Code: 00000060 80002138 e01d0d48 00000000 01000000 00000180 a602087c 3700223d 602e29e9 100001f8 91ff21f8 180069e8 <380023e9> 0000292c 34008241 280041f8 [ 13.247702490,0] OPAL: Reboot requested due to Platform error. [ 13.247857686,3] OPAL: failed to log an error [ 13.248012502,2] NVRAM: Failed to load Previously the above machine check was never hit because the QEMU platform did not have MPIPL, so the caller 'p9_sbe_terminate' returned early. Add a null check to ignore the unregister request if system_flash is not set. Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
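The guard described above follows a common pattern, sketched here with hypothetical types and function names. The real `flash_unregister` and `system_flash` in skiboot have more state than this reduced model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced flash descriptor. */
struct flash_sketch {
	bool busy;
};

/* NULL until a flash device has been registered. */
static struct flash_sketch *system_flash_sketch;

static void flash_register_sketch(struct flash_sketch *dev)
{
	assert(dev != NULL);
	system_flash_sketch = dev;
}

/* Returns false when there was nothing to unregister. */
static bool flash_unregister_sketch(void)
{
	if (!system_flash_sketch)
		return false;  /* no flash registered: ignore the request */

	system_flash_sketch->busy = false;
	system_flash_sketch = NULL;
	return true;
}
```

On a platform that never registers a flash device, the early return turns what would be a wild pointer dereference into a harmless no-op.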
2025-03-07 | platform: Identify correct bmc platform based on bmc hw version | Mahesh Salgaonkar | 1 | -1/+13
At the moment the generic platform sets bmc_generic() as bmc platform which does not have any support to initialize the flash and hence it fails to load petitboot kernel. [ 583.105000325,4] FLASH: Failed to load VERSION data [ 583.105490257,5] INIT: Waiting for kernel... [ 583.105523156,5] INIT: platform wait for kernel load failed [ 583.105555219,5] INIT: Assuming kernel at 0x20000000 [ 583.105589925,3] INIT: ELF header not found. Assuming raw binary. [...] [ 583.299682673,5] INIT: Starting kernel at 0x20000000, fdt at 0x30a44eb0 1274673 bytes [ 583.344432417,3] *********************************************** [ 583.344490230,3] Fatal Exception 0x800 at 0000000020000000 MSR 9000000000000000 [ 583.344535875,3] CFAR : 0000000030022948 MSR : 9000000000000000 [ 583.344578019,3] SRR0 : 0000000020000000 SRR1 : 9000000000000000 [ 583.344620242,3] HSRR0: 0000000020000000 HSRR1: 9000000000000000 OPAL builds the device tree for BMC based system using HDAT. It populates bmc/compatible node with bmc hw version e.g. "ibm,ast2600,openbmc". Use that to identify proper BMC hw board and initialize BMC platform with proper backend. This allows opal to successfully load and boot into petitboot kernel. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-03-07 | iohub: Add HUB ID for everest systems | Aditya Gupta | 2 | -0/+5
Everest's hub id is 0x52, which OPAL earlier didn't recognise: [ 574.179390090,6] CEC: HUB FRU 0 is CPU Card [ 574.179430286,6] CEC: 2 chips in FRU [ 574.179464930,7] CEC: IO Hub Chip #0 OK [ 574.179497312,7] CEC: PChip: 0 HUB ID: 0052 [EC=0x20] Hub#=0) [ 574.179543358,3] CEC: Hub ID 0x0052 unsupported ! <-------- Due to not recognising the HUB id, it doesn't initialise the PCI slots. Define 0x52 as Everest's hub id, so OPAL initialises PCIe slots also for Everest Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-03-07 | external/mambo: skiboot.tcl add Power11 config | Mahesh Salgaonkar | 1 | -0/+28
Set up skiboot.tcl with a Power11 config so that it can boot on Power11 mambo. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-03-07 | plat/qemu: add support for Power11 platform | Aditya Gupta | 2 | -2/+26
Add support for QEMU simulator for Power11 when it starts supporting "qemu,powernv11" machines. Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-03-07 | cpufeatures: Add Power11 support | Mahesh Salgaonkar | 1 | -27/+35
Update the cpu_feature structure to support Power11. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-03-07 | Initial Power11 enablement | Mahesh Salgaonkar | 36 | -63/+158
Detect Power11 PVR and use P10 code path. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> [adityag: Add Power11 chiptod device node] [adityag: Fix the proc_gen checks in pir_to_thread_id and bmc sensor] Signed-off-by: Aditya Gupta <adityag@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-03-07 | external: Add support for aarch64 | Eddie James | 2 | -0/+4
Update the external architecture checker script and Makefile for aarch64. Signed-off-by: Eddie James <eajames@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-02-26 | external/ffspart: Avoid makefile race condition | Reza Arbab | 1 | -1/+1
In ffspart we assign this make variable: FFSPART_VERSION ?= $(shell ./make_version.sh $(EXE)) However, ./make_version.sh is actually a make target, and whether it exists or not at the time of this assignment is by chance, depending on how the make concurrency works out. In practice, this intermittently causes CI build failure: make -j${MAKE_J} check + make -j4 check ... [ RUN-TEST ] check-ffspart ... make[1]: ./make_version.sh: No such file or directory ... make[1]: *** [Makefile:13: check] Error 1 make[1]: Entering directory '/build/external/ffspart' ... running test/tests/00-usage running test/tests/01-param-sanity Fatal error, cannot execute binary './ffspart'. Did you make? make[1]: Leaving directory '/build/external/ffspart' make: *** [/build/external/Makefile.check:21: check-ffspart] Error 2 make: *** Waiting for unfinished jobs.... The rule for make_version.sh is just a symlink: make_version.sh: $(Q_LN)ln -sf ../../make_version.sh To avoid the race, call make_version.sh from its actual location instead of relying on the link to be created. The same thing was done for gard in commit 8ab0caf26de9 ("external/gard: Fix make dist target"). Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-02-26 | external/opal-prd: generate path to opal-prd in its service file | Dan Horák | 2 | -3/+6
Currently the path where the opal-prd binary is installed is defined in the Makefile by the $sbindir variable, but its service file hard-codes the path to /usr/sbin/opal-prd. The build should generate the service file based on the actual $sbindir value. Also strip the trailing slash from the $prefix variable. Signed-off-by: Dan Horák <dan@danny.cz> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-02-26 | external/mambo: pmem: make persistent memory disk mapping 2MB aligned | Mahesh Salgaonkar | 1 | -0/+32
Commit 0a6a2ff30c9e ("mambo: Add persistent memory disk support") allows the user to map disk images as persistent memory using the PMEM_DISK environment variable. However, if the size of the disk image file is not 2MB aligned, the Linux kernel fails to detect the pmem device with a misaligned error. nd_pmem namespace0.0: [mem 0x20000000000-0x203fffe01ff flags 0x200] misaligned, unable to map nd_pmem namespace0.0: probe with driver nd_pmem failed with error -95 The Linux kernel then fails to mount the root fs from /dev/pmem0. md: ... autorun DONE. /dev/root: Can't open blockdev VFS: Cannot open root device "/dev/pmem0" or unknown-block(0,0): error -6 [...] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) Fix this by adding the remaining bytes as padding to make the pmem device memory map 2MB aligned. Reported-by: Brad Thomasson <bthomas@us.ibm.com> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
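The padding arithmetic can be worked through in a few lines. The actual change is in mambo's Tcl scripts, so this C version is only an illustration with a hypothetical function name. The example size 0x3fffe0200 is derived from the inclusive range 0x20000000000-0x203fffe01ff in the kernel message above.

```c
#include <assert.h>
#include <stdint.h>

#define PMEM_ALIGN (2ull * 1024 * 1024)  /* 2MB */

static_assert((PMEM_ALIGN & (PMEM_ALIGN - 1)) == 0,
	      "alignment must be a power of two");

/* Bytes to append so that 'size' becomes a multiple of 2MB. */
static uint64_t pmem_padding(uint64_t size)
{
	uint64_t rem = size % PMEM_ALIGN;

	return rem ? PMEM_ALIGN - rem : 0;
}
```

For the image in the log, the size is 0x3fffe0200 bytes, the remainder modulo 2MB is 0x1e0200, and the padding is 0x1fe00 bytes, which brings the mapping to 0x400000000 (exactly 16GB).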
2025-02-26 | hdata/test: Build with -Wno-error=unterminated-string-initialization | Reza Arbab | 1 | -0/+2
Six bytes of the HDIF header are used as an eye catcher: struct HDIF_common_hdr { ... char id[6]; /* eye catcher string */ ... } We assign all six characters of this string without a terminating nul, so now that GCC 15 enables -Werror=unterminated-string-initialization by default, the build breaks: In file included from hdata/test/../spira.h:7, from hdata/test/../cpu-common.c:5, from hdata/test/hdata_to_dt.c:148: hdata/test/../spira.c:35:32: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-initialization] 35 | .hdr = HDIF_SIMPLE_HDR("PROCIN", 1, struct proc_init_data), | ^~~~~~~~ hdata/test/../hdif.h:45:68: note: in definition of macro 'HDIF_ID' 45 | #define HDIF_ID(_id) .d1f0 = CPU_TO_BE16(0xd1f0), .id = _id | ^~~ hdata/test/../spira.c:35:16: note: in expansion of macro 'HDIF_SIMPLE_HDR' 35 | .hdr = HDIF_SIMPLE_HDR("PROCIN", 1, struct proc_init_data), | ^~~~~~~~~~~~~~~ hdata/test/../spira.h:797:33: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-initialization] 797 | #define CPU_CTL_HDIF_SIG "CPUCTL" | ^~~~~~~~ hdata/test/../hdif.h:45:68: note: in definition of macro 'HDIF_ID' 45 | #define HDIF_ID(_id) .d1f0 = CPU_TO_BE16(0xd1f0), .id = _id | ^~~ hdata/test/../spira.c:73:16: note: in expansion of macro 'HDIF_SIMPLE_HDR' 73 | .hdr = HDIF_SIMPLE_HDR(CPU_CTL_HDIF_SIG, 2, struct cpu_ctl_init_data), | ^~~~~~~~~~~~~~~ hdata/test/../spira.c:73:32: note: in expansion of macro 'CPU_CTL_HDIF_SIG' 73 | .hdr = HDIF_SIMPLE_HDR(CPU_CTL_HDIF_SIG, 2, struct cpu_ctl_init_data), | ^~~~~~~~~~~~~~~~ hdata/test/../spira.h:30:33: error: initializer-string for array of 'char' is too long [-Werror=unterminated-string-initialization] 30 | #define SPIRAH_HDIF_SIG "SPIRAH" | ^~~~~~~~ hdata/test/../hdif.h:45:68: note: in definition of macro 'HDIF_ID' 45 | #define HDIF_ID(_id) .d1f0 = CPU_TO_BE16(0xd1f0), .id = _id | ^~~ hdata/test/../spira.c:126:16: note: in expansion of macro 'HDIF_SIMPLE_HDR' 126 | .hdr = 
HDIF_SIMPLE_HDR(SPIRAH_HDIF_SIG, SPIRAH_VERSION, struct spirah), | ^~~~~~~~~~~~~~~ hdata/test/../spira.c:126:32: note: in expansion of macro 'SPIRAH_HDIF_SIG' 126 | .hdr = HDIF_SIMPLE_HDR(SPIRAH_HDIF_SIG, SPIRAH_VERSION, struct spirah), | ^~~~~~~~~~~~~~~ To ignore the spurious error, build the single testcase that trips this with -Wno-error=unterminated-string-initialization. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz>
2025-01-24 | hw/sbe-p9: P10 additions | Nicholas Piggin | 1 | -5/+43
P10 has a lower minimum timeout threshold than P9 (100usecs). Some P10 SBE timers run about 6.7% slow, which must be a hardware or firmware issue. Use the SBE timer health checking code to detect this and compensate for it. Speeding up timers as a rule is dangerous because early expiry is a bug; however, the core timer code checks expiry against the CPU's timebase when running timers, and with the previous changes it will schedule a new SBE timer for the remaining delay. So if this adjustment speeds things up slightly too much, it won't cause bugs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-24 | hw/sbe-p9: Limit SBE timer to 10s | Nicholas Piggin | 1 | -0/+5
The SBE in P10 has a maximum expiry limit of just over 10s, so limit SBE timers to 10s. If the desired timeout is longer than 10s, additional SBE timers will be scheduled as the 10s timers are serviced. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
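The clamping behaviour can be sketched as follows, working in microseconds for simplicity. The function names and the choice of unit are assumptions, and the real code may well operate on timebase ticks instead.

```c
#include <assert.h>
#include <stdint.h>

#define SBE_TIMER_MAX_US (10ull * 1000 * 1000)  /* 10s cap per SBE timer */

static_assert(SBE_TIMER_MAX_US == 10000000ull, "10 seconds in microseconds");

/* Hypothetical helper: timeout to program for the remaining delay. */
static uint64_t sbe_next_timeout_us(uint64_t remaining_us)
{
	return remaining_us > SBE_TIMER_MAX_US ? SBE_TIMER_MAX_US
					       : remaining_us;
}

/*
 * Hypothetical helper: how many SBE timers a delay takes when each
 * expiry re-arms the timer for whatever is left.
 */
static unsigned int sbe_timers_needed(uint64_t delay_us)
{
	unsigned int n = 0;

	while (delay_us) {
		delay_us -= sbe_next_timeout_us(delay_us);
		n++;
	}
	return n;
}
```

A 25s delay would thus be served by three SBE timers of 10s, 10s and 5s, each one scheduled as the previous one is serviced.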
2025-01-24 | hw/sbe-p9: Fix sbe_last_gen_stamp timing inconsistency | Nicholas Piggin | 1 | -8/+7
sbe_last_gen_stamp isn't a very clear name, so rename it to sbe_current_timer_tb first of all. This is used to detect if the timer should be programmed to get an earlier timeout. One issue with it is that it is set *after* the SBE acks the timer message, at which point the SBE could already have started counting the timer. This means the SBE timer interrupt could come in before that time, which is confusing and error prone. Set the field at the point the timer is submitted to the SBE. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-24 | hw/sbe-p9: Rename timer limits | Nicholas Piggin | 1 | -9/+13
These aren't "defaults", but really minimum advertised accurate timeouts. Rename them and make them variables to accommodate changes for P10. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-24 | hw/sbe-p9: Better handle SBE timer rate-limiting | Nicholas Piggin | 1 | -6/+6
SBE timer messages are rate-limited so as not to flood the SBE. 2 timer updates are permitted before the next timer interrupt. The problem with this is that any subsequent sooner timers will not reprogram the interrupt earlier so will be arbitrarily delayed. Change this code to allow 3 updates, and have the 3rd update program the SBE to the minimum expiry time, which gives rate-limiting without compromising timer accuracy. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
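One way to read the three-update scheme is sketched below. The struct, the function names and the return convention (0 meaning "update dropped") are all hypothetical. Only the limit of three updates and the minimum-expiry third update come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SBE_MAX_UPDATES 3  /* updates allowed between SBE interrupts */

/* Hypothetical rate-limit state: updates sent since the last interrupt. */
struct sbe_rate_limit {
	unsigned int updates;
};

/*
 * Returns the timeout to program into the SBE, or 0 if the update is
 * dropped. The last permitted update always uses the minimum expiry, so
 * the next interrupt arrives soon and later targets are not starved.
 */
static uint64_t sbe_limit_update(struct sbe_rate_limit *rl,
				 uint64_t wanted_us, uint64_t min_us)
{
	assert(rl && min_us > 0);

	if (rl->updates >= SBE_MAX_UPDATES)
		return 0;
	rl->updates++;
	if (rl->updates == SBE_MAX_UPDATES)
		return min_us;
	return wanted_us < min_us ? min_us : wanted_us;
}

/* The SBE timer interrupt opens a fresh window of updates. */
static void sbe_interrupt_sketch(struct sbe_rate_limit *rl)
{
	assert(rl);
	rl->updates = 0;
}
```

The minimum-expiry third update bounds the delay of any timer scheduled afterwards: it waits at most one minimum period for the interrupt, after which the timer code can reprogram the SBE with the soonest target.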
2025-01-24 | hw/sbe-p9: Change SBE lagging detection | Nicholas Piggin | 7 | -11/+31
Disabling the SBE timer entirely is counter-productive: the SBE interrupt can be delayed for a number of reasons including booting or OS bugs, and there is no other timer to replace it. If the SBE timer is detected to be lagging, increase polling rate until it fires but keep it running. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-15 | hw/sbe-p9: Re-set the SBE timer after SBE interrupt | Nicholas Piggin | 1 | -1/+3
When the SBE interrupt fires, clear the previous sbe_timer_target and has_new_target variables, because the timer code will send us an updated timer expiry after running check_timers(). This allows, for example, an SBE timer that has fired too early to be rescheduled rather than left to be picked up by polling. The SBE timer can fire early if the timer exceeds its maximum timeout, or if the SBE timing is a little off. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-15 | core/timer: Always update hardware timer | Nicholas Piggin | 2 | -8/+38
Have the core timer code always call into the SBE timers with the soonest time, so the SBE code can be more careful with maintaining the hardware timer. This fixes a bug where the SBE timer is not being set immediately on schedule_timer. With a subsequent change to SBE code, it allows an SBE timer that fires too early to cause a re-schedule of the SBE timer. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-15 | hw/sbe-p9: Add error message for unexpected msg list state | Nicholas Piggin | 1 | -1/+4
SBE message acks should always apply to the first message in the list. An empty message list at that point would be a bug, so print an error message in that case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-15 | hw/sbe-p9: Check SBE health on each chip | Nicholas Piggin | 1 | -0/+94
Add an SBE health check when initialising the SBEs, which sends a timer message and checks for the ack and timer expiry responses. This is better than eventually finding that a timer is not firing and shutting down the SBE timer; it also tests SBEs on all chips in the system, not just the primary. This bypasses the queueing code to make things simpler, which is okay because the SBEs are not up yet so no other messages are being sent to the SBE. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-15 | hw/sbe-p9: Move seq increment from queue to submit | Nicholas Piggin | 1 | -6/+6
The sequence number is a low level SBE hardware detail, so it can be assigned later when the message is being sent to the SBE. This allows SBE messages to be sent without queueing in special cases. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
2025-01-15 | hw/sbe: Add SBE quirk for mambo and awan | Nicholas Piggin | 4 | -3/+11
There appears to be no device-tree test for the P9 SBE presence like there is for P8. The P9 device tree test looks for the "primary" property, but this doesn't really test SBE presence because all chips have an SBE. It just happens to work because mambo must not add that property. So add a platform quirk, and mark mambo and awan as not having SBE. This is needed for a later change that runs a health check on every SBE in the system. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [arbab: Add #include <chip.h>] Signed-off-by: Reza Arbab <arbab@linux.ibm.com>