Age | Commit message | Author | Files | Lines
2020-06-11 | io: endian annotations and fix | Nicholas Piggin | 3 | -31/+36
Annotate io accessor pointer types with endian. sparse caught a bug in memcpy_from_ci, which is fixed. From: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | platform/blackbird: endian fix | Nicholas Piggin | 1 | -1/+1
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | build: -fverbose-asm for .s targets | Nicholas Piggin | 2 | -1/+5
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | core/mce: add support for decoding and handling machine checks | Nicholas Piggin | 5 | -6/+293
This provides an initial facility to decode machine checks into human readable strings, plus a minimum amount of metadata that a handler has to understand in order to deal with the machine check. For now this is only used by skiboot to make MCE reporting nicer, and an ERAT flush recovery attempt which is more about code coverage than really being helpful.

    ***********************************************
    Fatal MCE at 00000000300c9c0c .memcmp+0x3c MSR 9000000000141002
    Cause: instruction fetch TLB multi-hit error
    Effective address: 0x00000000300c9c0c
    ...

The intention is to subsequently provide an OPAL API with this information that will enable an OS to implement a machine independent OPAL machine check driver.

The code and data tables are derived from Linux code that I wrote, so relicensing is okay.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | core: interrupt markers for stack traces | Nicholas Piggin | 13 | -8/+33
Use a magic marker in the exception stack frame that is used by the unwinder to decode the interrupt type and NIA. The example trace below comes from a modified skiboot that uses virtual memory, but any interrupt type will appear similarly.

    CPU 0000 Backtrace:
     S: 0000000031c13580 R: 0000000030028210 .vm_dsi+0x360
     S: 0000000031c13630 R: 000000003003b0dc .exception_entry+0x4fc
     S: 0000000031c13830 R: 0000000030001f4c exception_entry_foo+0x4
     --- Interrupt 0x300 at 000000003002431c ---
     S: 0000000031c13b40 R: 000000003002430c .make_free.isra.0+0x110
     S: 0000000031c13bd0 R: 0000000030025198 .mem_alloc+0x4a0
     S: 0000000031c13c80 R: 0000000030028bac .__memalign+0x48
     S: 0000000031c13d10 R: 0000000030028da4 .__zalloc+0x18
     S: 0000000031c13d90 R: 000000003002fb34 .opal_init_msg+0x34
     S: 0000000031c13e20 R: 00000000300234b4 .main_cpu_entry+0x61c
     S: 0000000031c13f00 R: 00000000300031b8 boot_entry+0x1b0
     --- OPAL boot ---

Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [oliver: the new stackentry fields made our test heaps too small] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> fixup! core: interrupt markers for stack traces
2020-06-11 | skiboot.lds.S: introduce PAGE_SIZE, use it to lay out sections | Nicholas Piggin | 2 | -8/+17
Separate code, data, read-only data, and other significant sections with PAGE_SIZE alignment. This enables memory protection for these sections with a later patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | skiboot.lds.S: remove dynsym/dynstr and plt | Nicholas Piggin | 1 | -8/+4
skiboot is static so these are always empty. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | move opal_branch_table, opal_num_args to .rodata section | Nicholas Piggin | 3 | -13/+16
.head is for code and data which must reside at a fixed low address, mainly entry points. These are moved into .rodata. Despite being modified at runtime, this facilitates these tables being write-protected in a later patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | fast-reboot: improve fast reboot sequence | Nicholas Piggin | 2 | -119/+121
The current fast reboot sequence is not as robust as it could be. It is this:

- Fast reboot CPU stops all other threads with direct control xscoms;
- it disables ME (machine checks become checkstops);
- resets its SPRs (to get HID[HILE] for machine check interrupts) and overwrites exception vectors with our vectors, with a special fast reboot sreset vector that fixes endian (because OS owns HILE);
- then the fast reboot CPU enables ME.

At this point the fast reboot CPU can handle machine checks with the skiboot handler, but no other cores can if the OS had switched HILE (they'll execute garbled byte swapped instructions and crash badly).

- Then all CPUs run various cleanups, XIVE, resync TOD, etc.
- The boot CPU, which is not necessarily the same as the fast reboot initiator CPU, runs xive_reset.

This is a lot of code to run, including locking and xscoms, with machine check inoperable.

- Finally secondaries are released and everyone sets SPRs and enables ME.

Secondaries on other cores don't wait for their thread 0 to set shared SPRs before calling into the normal OPAL secondary code. This is mostly okay because the boot CPU pauses here until all secondaries reach their idle code, but it's not nice to release them out of the fast reboot code in a state with various per-core SPRs in flux.

Fix this by having the fast reboot CPU not disable ME or reset its SPRs, because machine checks can still be handled by the OS. Then wait until all CPUs are called into fast reboot and spinning with ME disabled, only then reset any SPRs, copy remaining exception vectors, and now skiboot has taken over the machine check handling, then the CPUs enable ME before cleaning up other things.

This way, the region with ME disabled and SPRs and exception vectors in flux is kept absolutely minimal, with no xscoms, no MMIOs, and few significant memory modifications, and all threads kept closely in step.
There are no windows where a machine check interrupt may execute garbage due to mismatched HILE on any CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | fast-reboot: don't back up old vectors upon fast reboot | Nicholas Piggin | 1 | -5/+5
Initial boot already saved original exception vectors to old_vectors, copying again upon fast reboot will overwrite old_vectors with some arbitrary vectors set up by the current OS. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | fast-reboot: add missing clear memory fallback | Nicholas Piggin | 1 | -2/+8
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | move the __this_cpu register to r16, reserve r13-r15 | Nicholas Piggin | 4 | -25/+37
There have been several bugs between Linux and OPAL caused by both using r13 for their primary per-CPU data address. This patch moves OPAL to use r16 for this, and prevents the compiler from touching r13-r15 (r14,r15 allow Linux to use additional fixed registers in future). This helps code to be a little more robust, and may make crashes in OPAL (or debugging with pdbg or in simulators) easier to debug by having easy access to the PACA. Later, if we allow interrupts (other than non-maskable) to be taken when running in skiboot, Linux's interrupt return handler does not restore r13 if the interrupt was taken in PR=0 state, which would corrupt the skiboot r13 register, so this allows for the possibility, although it will have to become a formal OPAL ABI requirement if we rely on it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [oliver: x86_64 has an r13, but not an r16 so the tests broke] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> wip: fix __this_cpu() in the test cases
2020-06-11 | asm/head.S: QUIESCE_REJECT fix | Nicholas Piggin | 1 | -1/+2
This was returning to the wrong point and loading some garbage that had not been set up yet. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | libstb/container: Add missing includes | Nicholas Piggin | 1 | -0/+2
libstb will sometimes randomly fail to compile due to missing types. This appears to solve it but I didn't look too far into why it mostly works (or can be made to work with make clean) without this. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | opal-ci: Add Ubuntu20.04 support | Vasant Hegde | 3 | -7/+3
And drop Ubuntu 16.04. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | opal-ci: Fix broken fedora builds | Vasant Hegde | 5 | -11/+4
- Our device tree test cases are passing with fedora shipped `dtc` command. Hence remove `dtc` build process.
- Replace fedora30 with fedora32.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | Properly check mmap error code | orbitcowboy | 2 | -5/+5
Signed-off-by: orbitcowboy <orbitcowboy@web.de> [oliver: misplaced paren] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-11 | stb/print-container: Properly check mmap error code | Hanno Böck | 1 | -1/+1
Signed-off-by: Hanno Böck <hanno@gentoo.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
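The two mmap commits above carry no description, so the following is only a sketch of the general pitfall their subjects point at, not the actual diff. mmap() signals failure by returning MAP_FAILED, not NULL, so a NULL check never catches an error. The helper name `map_file_ro` is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from skiboot): map a whole file read-only.
 * Returns NULL on any failure so callers can use an ordinary NULL test.
 */
void *map_file_ro(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;

	/* A zero-length mapping is an error, so reject empty files up front */
	if (fstat(fd, &st) < 0 || st.st_size <= 0) {
		close(fd);
		return NULL;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);

	/* mmap() reports failure with MAP_FAILED, which is not NULL */
	if (p == MAP_FAILED)
		return NULL;

	*len_out = (size_t)st.st_size;
	return p;
}
```

Translating MAP_FAILED to NULL at one place keeps the unusual sentinel out of the rest of the caller's code.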
2020-06-11 | external/ffspart: Use read() rather than mmap() | Oliver O'Halloran | 1 | -8/+34
The various ffspart test cases use /dev/zero as an input which doesn't behave like a normal file. The stat.st_size field for char devs is generally zero so the subsequent attempt to mmap() the file fails because we requested a zero size mapping. Previously we didn't notice this, but it sort of worked since the partitions in the test script that used /dev/zero as an input were also marked as ECC. This resulted in the partition contents being re-generated (using a buffer libflash allocates) and the source data pointer being ignored since we said it was zero length. Fix all this by dropping mmap() entirely and inhaling the input file into a buffer we malloc() instead. This works for any file, including /dev/urandom, which can't be mmap()ed. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
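The read()-into-a-buffer approach described above can be sketched as follows. This is an illustration under stated assumptions, not the ffspart code: the helper name `read_up_to` and the caller-supplied size limit are hypothetical (a limit is needed because a source such as /dev/zero never reports end-of-file).

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read at most `max` bytes from `path` into a
 * malloc()ed buffer. Works for regular files and for character devices
 * whose st_size is zero. Returns NULL on error; the caller frees the
 * buffer. `*got` receives the number of bytes actually read.
 */
void *read_up_to(const char *path, size_t max, size_t *got)
{
	size_t off = 0;
	char *buf;
	int fd;

	*got = 0;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	buf = malloc(max ? max : 1);
	if (!buf) {
		close(fd);
		return NULL;
	}

	while (off < max) {
		ssize_t rc = read(fd, buf + off, max - off);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			free(buf);
			close(fd);
			return NULL;
		}
		if (rc == 0)
			break;			/* end of file */
		off += (size_t)rc;
	}

	close(fd);
	*got = off;
	return buf;
}
```

Looping on short reads matters here because read() on a device may return fewer bytes than requested even when more data is available.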
2020-06-09 | skiboot v6.6.1 release notes | Vasant Hegde | 1 | -0/+31
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-05 | skiboot v6.3.5 release notes | Vasant Hegde | 1 | -0/+17
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-06-05 | github: update pull request template | Oliver O'Halloran | 1 | -3/+18
The current wording is a bit curt. Flesh it out a bit and put in some useful detail. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com>
2020-06-04 | Disable protected execution facility | Ryan Grimm | 5 | -0/+110
This patch disables Protected Execution Facility (PEF). This software procedure is needed for the lab because Cronus will be configured to bring the machine up with PEF on. Hostboot has a similar procedure for running with PEF off. Skiboot can run with PEF on but the kernel cannot; the kernel will take a machine check when trying to write a protected resource, such as the PTCR. So, use this until we have an ultravisor, or if we want to use BML with Cronus without UV = 1. Signed-off-by: Ryan Grimm <grimm@linux.ibm.com> Tested-by: Alistair Popple <alistair@popple.id.au> [oliver: replaced bare urfid with a macro for toolchain compatibility] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | xive: Fix typo and spelling in a comment | Gustavo Romero | 1 | -1/+1
This commit fixes a typo and a spelling in a comment about the XIVE set translate mechanism. Signed-off-by: Gustavo Romero <gromero@linux.ibm.com> Reviewed-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | occ: Fix false negatives in wait_for_all_occ_init() | Gautham R. Shenoy | 3 | -32/+154
Currently the wait_for_all_occ_init() function determines that the OCC associated with every chip has been initialized by verifying that the "Valid" bit in the pstate table of that OCC is set. However, on chips where all the EX units are guarded, the OCC, even though it is active, does not update the pstate_table. As a result, OPAL concludes that the OCC is not functional and not only disables Pstate initialization, but also incorrectly reports that the OCCs were not initialized, thereby cutting other features such as sensors.

Fix this by ensuring that
* We check if there is at least one active EX unit in the chip before checking if the OCC is active.
* On platforms with OCC-OPAL communication interface version 0x90
  * wait_for_all_occ_init() only checks if the occ_state in the OCC dynamic area is set to "Active State".
  * move the "Valid" bit check to add_cpu_pstate_properties(), which is where we create the device-tree entries for frequency scaling.

Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | opal-gard: sync up the chip unit data list with upstream hostboot. | Mahesh Salgaonkar | 1 | -1/+9
opal-gard on POWER9P system fails to identify few chip targets while displaying gard records. This patch fixes that.

Before:
    # opal-gard list
     ID       | Error    | Type       | Path
    -------------------------------------------------------------------------------
     00000001 | 90004af4 | Predictive | /Sys0/Node0/Proc0/MC0/MI0/UNKNOWN0/UNKNOWN0
    ===============================================================================

After this patch:
    # ./opal-gard list
     ID       | Error    | Type       | Path
    ---------------------------------------------------------------------------
     00000001 | 90004af4 | Predictive | /Sys0/Node0/Proc0/MC0/MI0/MCC0/OMI0
    ===========================================================================

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | Honor DEAD_CODE_ELIMINATION flag | Mauro S. M. Rodrigues | 1 | -0/+1
While trying to reduce the size of the final binary I found DEAD_CODE_ELIMINATION=1 but it didn't change the binary size and known unused functions were seen when inspecting the elf with nm. Even though the necessary parameters for compiler, -ffunction-sections and -fdata-sections, are set, ld's --gc-sections wasn't, so add it in order to honor the flag. Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Enable error interrupts | Oliver O'Halloran | 1 | -1/+39
In PHB4 the PHB's error and informational interrupts were changed to behave more like actual LSIs. On PHB3 these interrupts would only be raised on a 0 -> 1 transition of an error status bit (i.e. they were rising edge triggered). On PHB4 the error interrupts are "true" LSIs and will be re-raised as long as the underlying error status bit is set. This causes a headache for us because OPAL's PHB error handling model requires Skiboot to preserve the state of the PHB (including errors) until the kernel is ready to handle the error. As a result we can't do anything in Skiboot to handle the interrupt condition and we need to mask the error internally. We can do this by clearing the relevant bits in the IRQ_ENABLE registers of the PHB. It's worth pointing out that we don't want to mask the interrupt by setting the Q bit in the XIVE ESBs. The ESBs are owned by the OS which may be masking and unmasking the interrupt for its own reasons (e.g. migrating IRQs). Skiboot modifying the ESB state could potentially cause problems and should be avoided. Cc: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Factor out interrupt setup | Oliver O'Halloran | 1 | -5/+11
Move the unmasking (enabling) of the various PHB error and informational interrupts out of the main init sequence. We'll need this elsewhere to enable the PHB error interrupts. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Don't disable TXE(12) interrupt if on P9 DD2.0 | Oliver O'Halloran | 1 | -4/+1
Commit 7dbf80d1db45 ("phb4: Generate checkstop on AIB ECC corr/uncorr for DD2.0 parts") changed the PHB inits so that on DD2.0 TXE error bit 12 would cause a checkstop. The patch also changes the TXE_ERR_IRQ_ENABLE settings to prevent this bit from causing a PHB error interrupt. However, there's not much point in doing this since the system is going to checkstop anyway. Removing the code to disable the interrupt simplifies the situation a bit and avoids conflating FIR propagation with the normal PHB error interrupts. The PHB spec is actively confusing in this area since it describes the TXE Error summary bit in the LEM FIR as an "interrupt" even though it's completely separate from the PHB's LSI error reporting interrupt. Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Fix interrupt names | Oliver O'Halloran | 1 | -0/+17
Linux doesn't seem to parse the interrupt-names property when there are unnamed (zero length string) interrupts. Add a name callback to the interrupt source and go from there. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Make error interrupt handler compile | Oliver O'Halloran | 1 | -3/+1
When phb4.c was copied from phb3.c the error interrupts were disabled by default and apparently never re-enabled. Remove the #if 0 block and call phb4_set_err_pending() rather than the phb3 version. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | uart: Drop console write data if BMC becomes unresponsive | Vasant Hegde | 1 | -26/+74
If the BMC becomes unresponsive (e.g. during a BMC reboot) during a console write then we may get stuck in uart_wait_tx_room(). This leaves the CPU stuck in OPAL, which results in kernel lockups and in some cases an unresponsive host. This patch introduces a timeout. If a UART operation doesn't complete within a predefined time then the write data is dropped and the call returns. Note that this patch fixes both the OPAL internal console and the console write APIs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [Various fixes on top of Nick's proposal to have single timer - Vasant] Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
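The bounded-wait idea in the commit above can be sketched like this. It is a simplified model, not the skiboot UART driver: every name is hypothetical, the hardware "transmitter has room" check is replaced by a callback, and a poll count stands in for the real time-based timeout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hardware "transmit FIFO has room" check */
typedef bool (*tx_room_fn)(void *ctx);

/* Demonstration stubs: a healthy transmitter and a hung one */
bool stub_tx_always_room(void *ctx) { (void)ctx; return true; }
bool stub_tx_never_room(void *ctx) { (void)ctx; return false; }

/* Poll for room, but give up after max_polls attempts instead of spinning forever */
static bool wait_tx_room_bounded(tx_room_fn has_room, void *ctx,
				 unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (has_room(ctx))
			return true;
	}
	return false;
}

/*
 * Write up to len bytes. If the transmitter stays busy past the bound,
 * the remaining bytes are dropped rather than blocking the caller.
 * Returns the number of bytes sent; *dropped gets the number discarded.
 */
size_t console_write_bounded(tx_room_fn has_room, void *ctx,
			     const char *buf, size_t len,
			     unsigned int max_polls, size_t *dropped)
{
	size_t sent = 0;

	(void)buf;	/* a real driver would write buf[sent] to the UART here */
	while (sent < len) {
		if (!wait_tx_room_bounded(has_room, ctx, max_polls))
			break;
		sent++;
	}
	*dropped = len - sent;
	return sent;
}
```

Dropping console output loses log data, but it keeps the calling CPU from being held indefinitely by an unresponsive peer.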
2020-05-26 | hw/phys-map: Fix OCAPI_MEM BAR values | Andrew Donnellan | 1 | -3/+3
The comment next to the OCAPI_MEM entries in the Nimbus phys-map claims that we are "varying the upper 2 bits of the group ID" for each OpenCAPI link, as matches the chip address extension mask that will be set by future versions of Hostboot. The actual entries, on the other hand, vary the *lower* 2 bits of the group ID. Whoops. This didn't appear to cause us problems on the specific machines that we had access to at the time, but now that this is being tested a bit harder it's crashing machines... Fixes: bc72973d13215 ("hw/npu2-opencapi: Support multiple LPC devices") Cc: Frederic Barrat <fbarrat@linux.ibm.com> Reported-by: Wael El-Essawy <welessa@us.ibm.com> Reported-by: Milton Miller <miltonm@us.ibm.com> Reported-by: Jenny Huynh <jhuynh@us.ibm.com> Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
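To make the upper-versus-lower distinction concrete, here is a small sketch. The 4-bit group ID width, the function names, and the way the link index is combined with a base value are all assumptions for illustration; the commit message does not give the field layout or the actual phys-map entries.

```c
#include <stdint.h>

/* Assumption for illustration only: the group ID is a 4-bit field */
#define SKETCH_GROUP_ID_BITS 4u
#define SKETCH_GROUP_ID_MASK ((1u << SKETCH_GROUP_ID_BITS) - 1u)

/* Intended scheme per the comment: the link index selects the upper 2 bits */
uint32_t sketch_group_id_vary_upper(uint32_t base, uint32_t link)
{
	uint32_t shift = SKETCH_GROUP_ID_BITS - 2u;
	uint32_t low_mask = (1u << shift) - 1u;

	return ((link & 0x3u) << shift) | (base & low_mask);
}

/* What the entries did instead: the link index lands in the lower 2 bits */
uint32_t sketch_group_id_vary_lower(uint32_t base, uint32_t link)
{
	return (base & SKETCH_GROUP_ID_MASK & ~0x3u) | (link & 0x3u);
}
```

Under this assumed layout, link 2 yields group ID 0x8 with the intended scheme and 0x2 with the mistaken one, so the two only agree for link 0.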
2020-05-26 | Detect fused core mode and bail out | Joel Stanley | 2 | -0/+20
Fused core mode is currently not supported in OPAL. Continuing to boot the system would result in errors at later stages of boot. Wait for console to be up and print message for developers to check and fix the system modes. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: Joel Stanley <joel@jms.id.au> Tested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | platform/mihawk: Tune equalization settings for opencapi | Frederic Barrat | 3 | -4/+33
The Bittware 250SOC adapter on Mihawk was showing a high count of CRC errors on one of the opencapi slots. The PHY team suggested new equalization settings to correct the errors. All existing adapters have been tested on mihawk to make sure the settings are compatible. However, the new settings should not be used on platforms other than mihawk. The changes specific to mihawk are:
- Update the tx_ffe_pre_coeff and tx_ffe_post_coeff input parameters used during zcal
- turn off the tx_ffe_boost parameter through scom
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | hdata/memory.c: Fix "Inconsistent MSAREA" warnings | Klaus Heinrich Kiwi | 1 | -0/+3
add_memory_buffer_mmio() should be exclusive to P9P (AXONE). Running it on non P9P systems resulted in warnings such as: MS AREA: Inconsistent MSAREA version 40 for P9P system So check for PVR and quietly return if not P9P. Fixes: 38b5c3179 (Add support for memory-buffer mmio) Cc: skiboot-stable@lists.ozlabs.org Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | opal entry: Fix LE skiboot clobbering r10 argument | Nicholas Piggin | 2 | -5/+6
Fortunately no OPAL calls seem to use 8 arguments yet. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | buddy: Fix warnings when undefining BUDDY_DEBUG | Ryan Grimm | 1 | -2/+2
In simulation, hundreds of millions of cycles are chewed up in this code path:

    PC: 0x0000000030033450 -> <.bitmap_tst_bit>+0x18
    LR: 0x000000003003347c -> <.buddy_check_alloc>+0x14
    0x0000000031c13b30 -> <_ebss>+0x1803b30
    0x000000003003351c -> <.buddy_check_alloc_down>+0x4c
    0x00000000300339c4 -> <.buddy_free>+0x7c
    0x0000000030033be8 -> <.buddy_create>+0xcc
    0x0000000030089bbc -> <.xive_init>+0xf0
    0x00000000300157cc -> <.main_cpu_entry>+0x8a0
    0x000000003000275c -> <boot_entry>+0x1bc

Undefining BUDDY_DEBUG saves 30+ minutes of wall clock time, so fix the "warning: unused parameter" messages when compiling.

Signed-off-by: Ryan Grimm <grimm@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | PSI: Convert prerror to PR_NOTICE | Vasant Hegde | 1 | -1/+1
"Spurious interrupt" is not severe. Reduce message severity and keep msglog happy! Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | sensors: occ: Fix a bug when sensor values are zero | Gautham R. Shenoy | 1 | -1/+2
The commit 1b9a449d ("opal-api: add endian conversions to most opal calls") modified the code in opal_read_sensor() to make it Little-Endian safe. In the process, it changed the code so that if a sensor value was zero, it would simply return OPAL_SUCCESS without updating the return buffer. As a result, the return buffer contained bogus values which were reflected on those sensors being read by the Kernel. This patch fixes it by ensuring that the return buffer is updated with the value read from the sensor every time. Thanks to Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> for spotting the missing return-buffer update. cc: skiboot-stable@lists.ozlabs.org Fixes: commit 1b9a449d ("opal-api: add endian conversions to most opal calls") Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
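The failure mode described above can be shown with a reduced sketch. It is not the opal_read_sensor() code: the function names and the status constant are hypothetical, and the endian conversion the commit mentions is left out to keep the focus on the skipped store.

```c
#include <stdint.h>

#define SKETCH_SUCCESS 0	/* hypothetical status code */

/*
 * Buggy shape: the store is guarded by the value itself, so a reading of
 * zero returns success while leaving the caller's buffer untouched.
 */
int sketch_read_sensor_buggy(uint32_t raw, uint32_t *out)
{
	if (raw)
		*out = raw;
	return SKETCH_SUCCESS;
}

/* Fixed shape: the return buffer is written on every successful read */
int sketch_read_sensor_fixed(uint32_t raw, uint32_t *out)
{
	*out = raw;
	return SKETCH_SUCCESS;
}
```

With the buggy shape, a caller that passes an uninitialized buffer sees whatever was there before, which matches the bogus sensor values the commit reports.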
2020-05-26 | sensors: occ: Fix the GPU detection code | Gautham R. Shenoy | 1 | -2/+20
commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu systems") assumes that presence of "ibm,power9-npu" compatible node indicates the presence of GPUs. However this is incorrect, as even OpenCAPI is supported via NPU. Thus ZZ systems, which have OpenCAPI connectors but not GPUs will have "ibm,power9-npu" compatible nodes. This results in OPAL creating device-tree entries for the GPU sensors on ZZ systems which don't even have GPUs. This patch fixes the GPU detection code in occ-sensors, by first checking for "ibm,ioda2-npu2-phb" compatible node which indicates the presence of nvlink. Only if such a node exists, do we check with the OCC for presence of GPUs on systems to confirm the presence of the GPU. Otherwise, we cut the GPU sensors. Thanks to Frederic Barrat <fbarrat@linux.ibm.com> for suggesting "ibm,ioda2-npu2-phb" for detecting the presence of nvlink GPUs. cc: skiboot-stable@lists.ozlabs.org Fixes: commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu systems") Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-05-26 | libstb: Don't exit trustedboot services multiple times | Mauro S. M. Rodrigues | 1 | -2/+7
In the very specific scenario where fast-reboot is used, we see multiple error messages about the trustedboot measurements not being done. Fast-reboot performs just the fundamental operations, like PCI initialization, needed to get skiboot into good shape to boot the kernel, and later the host's kernel. That means fast-reboot contains data structures filled since the last full reboot. In this process trustedboot is not re-initialized, but it still tries to perform the STB measurements and event logging done in trustedboot_exit_services, showing multiple failure messages. This patch avoids that situation by returning early and logging that trustedboot already exited. If something eventually changes and trustedboot gets re-initialized during fast-reboot, this patch also sets boot_services_exited to false after every initialization, so we always exit trustedboot whenever it gets initialized. Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
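The guard described above follows a common run-once pattern, sketched here. Only the flag name `boot_services_exited` comes from the commit message; the function names and return values are hypothetical, and the measurement and logging work is reduced to a comment.

```c
#include <stdbool.h>

/* Flag name taken from the commit message; everything else is illustrative */
static bool boot_services_exited;

/* Initialization re-arms the guard so a later exit runs again */
void sketch_trustedboot_init(void)
{
	boot_services_exited = false;
}

/*
 * Returns 1 if the exit work ran, 0 if it was skipped because the
 * services had already been exited since the last initialization.
 */
int sketch_trustedboot_exit_services(void)
{
	if (boot_services_exited)
		return 0;	/* already exited: return early */

	/* measurements and event logging would happen here */
	boot_services_exited = true;
	return 1;
}
```

Because the flag lives in memory that survives a fast reboot, the second call is skipped unless initialization has run in between.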
2020-04-23 | skiboot v6.6 release notes (tag: v6.6) | Oliver O'Halloran | 1 | -0/+65
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-15 | ZZ: Fix System Attention Indicator location code | Vasant Hegde | 1 | -1/+5
We are using the SAI indicator location from SLCA to represent the System Attention Indicator location code. On P9, this is mapped to the op-panel location code. The op-panel has identify and fault LEDs as well, and our SPCN command lists the op-panel location code too. Hence we get the OPAL warning below.

OPAL msglog:
    FSPLED: duplicate location code U78D3.001.WT0004T-D1

Because of this issue we are not creating a device tree node for the D1 identify/fault indicators. We have a System Attention Indicator at enclosure level as well, which is a replica of the attention indicator in the op-panel. Hence use the System VPD location code to represent the attention indicator. Note that we have a dedicated MBOX command to read/update the System Attention Indicator which doesn't need a location code. Hence we are fine with this change.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-15 | MPIPL: Add support to save crash CPU details on FSP system | Vasant Hegde | 3 | -3/+13
OPAL uses different paths to trigger MPIPL:
- On BMC systems we call the SBE S0 interrupt
- On FSP systems we call the `attn` instruction
Currently on BMC systems we collect the crash CPU PIR details, which are needed to generate a proper dump. This happens just before calling the SBE S0 interrupt. Since we don't use this path on FSP systems, OPAL is not saving the crashing CPU details. Hence by default `opalcore` does not point to the crashing CPU and does not show a proper backtrace; we have to go through all CPUs to find the crashing CPU's backtrace. This patch moves this function to a common place so that if MPIPL is supported we collect the crashing CPU data.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-15 | fsp: Ignore platform dump notification on P9 | Vasant Hegde | 1 | -0/+3
After a system crash the FSP collects a dump and passes the dump details via HDAT. OPAL/Linux uses these details to extract SYSDUMP. On P9 FSP systems we have MPIPL support. The FSP team says we have to ignore the platform dump notification passed by HDAT and use the inband MPIPL mechanism to extract the dump. CC: Murulidhar Nataraju <murulidhar@in.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-09 | platform: add Raptor Blackbird support | Stewart Smith | 2 | -1/+108
Based off the Raptor patch: https://git.raptorcs.com/git/blackbird-skiboot/commit/?id=c81f9d66592dc2a7cf7f6c59c3def5cee0638c1f
Notable changes:
- slot names matching what's silkscreened on the board
- Expose IPL Observer over op-panel OPAL calls
This means you can "printf '\xfe\xfe\xfe' > /dev/op_panel" to make the IPL Observer on the Raptor BMC builds realise it can turn on fan control.
Signed-off-by: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-04-09 | skiboot v6.0.23 release notes | Vasant Hegde | 1 | -0/+17
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-04-08 | hw/ocmb: Add OCMB SCOM support | Oliver O'Halloran | 4 | -0/+185
Add a driver for the SCOM ranges of the OCMB. Unlike most chips the OCMB has two different (three if you count OpenCAPI config space) register spaces and we need to ensure that the right access size is used on each. Additionally the SCOM interface is a bit non-standard in that a full physical address is passed as the SCOM address rather than a register number so we don't need to perform any address transformations, we just need to verify that the address falls into one of the nominated address ranges. Cc: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
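The range check described above, where an address is accepted only if it falls in a nominated range and each range dictates its own access size, can be sketched as follows. The struct, the function name, and in particular the addresses, lengths, and access sizes in the table are made up for illustration; they are not the OCMB register map.

```c
#include <stddef.h>
#include <stdint.h>

/* One nominated address range and the access width it requires */
struct sketch_range {
	uint64_t start;
	uint64_t len;
	unsigned int access_bytes;
};

/* Hypothetical values, not the real OCMB layout */
static const struct sketch_range sketch_ranges[] = {
	{ 0x1000ull, 0x100ull, 4 },
	{ 0x8000ull, 0x200ull, 8 },
};

/*
 * Returns the access size in bytes to use for addr, or 0 if addr does
 * not fall into any nominated range and the access should be rejected.
 * No address transformation is applied: addr is used as given.
 */
unsigned int sketch_access_size(uint64_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_ranges) / sizeof(sketch_ranges[0]); i++) {
		const struct sketch_range *r = &sketch_ranges[i];

		if (addr >= r->start && addr - r->start < r->len)
			return r->access_bytes;
	}
	return 0;
}
```

Comparing `addr - r->start` against the length, after checking `addr >= r->start`, avoids the overflow that `r->start + r->len` could cause for ranges near the top of the address space.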