Age | Commit message | Author | Files | Lines |
|
Annotate io accessor pointer types with endian.
sparse caught a bug in memcpy_from_ci, which is fixed.
From: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This provides an initial facility to decode machine checks into
human readable strings, plus a minimum amount of metadata that
a handler has to understand in order to deal with the machine
check.
For now this is only used by skiboot to make MCE reporting nicer,
and an ERAT flush recovery attempt which is more about code
coverage than really being helpful.
***********************************************
Fatal MCE at 00000000300c9c0c .memcmp+0x3c MSR 9000000000141002
Cause: instruction fetch TLB multi-hit error
Effective address: 0x00000000300c9c0c
...
The intention is to subsequently provide an OPAL API with this
information that will enable an OS to implement a machine
independent OPAL machine check driver.
The code and data tables are derived from Linux code that I wrote,
so relicensing is okay.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Use magic marker in the exception stack frame that is used by the
unwinder to decode the interrupt type and NIA. The below example trace
comes from a modified skiboot that uses virtual memory, but any
interrupt type will appear similarly.
CPU 0000 Backtrace:
S: 0000000031c13580 R: 0000000030028210 .vm_dsi+0x360
S: 0000000031c13630 R: 000000003003b0dc .exception_entry+0x4fc
S: 0000000031c13830 R: 0000000030001f4c exception_entry_foo+0x4
--- Interrupt 0x300 at 000000003002431c ---
S: 0000000031c13b40 R: 000000003002430c .make_free.isra.0+0x110
S: 0000000031c13bd0 R: 0000000030025198 .mem_alloc+0x4a0
S: 0000000031c13c80 R: 0000000030028bac .__memalign+0x48
S: 0000000031c13d10 R: 0000000030028da4 .__zalloc+0x18
S: 0000000031c13d90 R: 000000003002fb34 .opal_init_msg+0x34
S: 0000000031c13e20 R: 00000000300234b4 .main_cpu_entry+0x61c
S: 0000000031c13f00 R: 00000000300031b8 boot_entry+0x1b0
--- OPAL boot ---
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[oliver: the new stackentry fields made our test heaps too small]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
fixup! core: interrupt markers for stack traces
|
|
Separate code, data, read-only data, and other significant sections
with PAGE_SIZE alignment. This enables memory protection for these
sections with a later patch.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
skiboot is static so these are always empty.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
.head is for code and data which must reside at a fixed low address,
mainly entry points.
These are moved into .rodata. Despite being modified at runtime, this
facilitates these tables being write-protected in a later patch.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The current fast reboot sequence is not as robust as it could be. It
is this:
- Fast reboot CPU stops all other threads with direct control xscoms;
- it disables ME (machine checks become checkstops);
- resets its SPRs (to get HID[HILE] for machine check interrupts) and
overwrites exception vectors with our vectors, with a special fast
reboot sreset vector that fixes endian (because OS owns HILE);
- then the fast reboot CPU enables ME.
At this point the fast reboot CPU can handle machine checks with the
skiboot handler, but no other cores can if the OS had switched HILE
(they'll execute garbled byte swapped instructions and crash badly).
- Then all CPUs run various cleanups, XIVE, resync TOD, etc.
- The boot CPU, which is not necessarily the same as the fast reboot
initiator CPU, runs xive_reset.
This is a lot of code to run, including locking and xscoms, with
machine check inoperable.
- Finally secondaries are released and everyone sets SPRs and enables
ME.
Secondaries on other cores don't wait for their thread 0 to set shared
SPRs before calling into the normal OPAL secondary code. This is
mostly okay because the boot CPU pauses here until all secondaries
reach their idle code, but it's not nice to release them out of the
fast reboot code in a state with various per-core SPRs in flux.
Fix this by having the fast reboot CPU not disable ME or reset its
SPRs, because machine checks can still be handled by the OS. Then
wait until all CPUs are called into fast reboot and spinning with
ME disabled, only then reset any SPRs, copy remaining exception
vectors, and now skiboot has taken over the machine check handling,
then the CPUs enable ME before cleaning up other things.
This way, the region with ME disabled and SPRs and exception vectors
in flux is kept absolutely minimal, with no xscoms, no MMIOs, and few
significant memory modifications, and all threads kept closely in step.
There are no windows where a machine check interrupt may execute
garbage due to mismatched HILE on any CPU.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Initial boot already saved original exception vectors to old_vectors,
copying again upon fast reboot will overwrite old_vectors with some
arbitrary vectors set up by the current OS.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
There have been several bugs between Linux and OPAL caused by both
using r13 for their primary per-CPU data address. This patch moves
OPAL to use r16 for this, and prevents the compiler from touching
r13-r15 (r14,r15 allow Linux to use additional fixed registers in
future).
This helps code to be a little more robust, and may make crashes
in OPAL (or debugging with pdbg or in simulators) easier to debug by
having easy access to the PACA.
Later, if we allow interrupts (other than non-maskable) to be taken when
running in skiboot, Linux's interrupt return handler does not restore
r13 if the interrupt was taken in PR=0 state, which would corrupt the
skiboot r13 register, so this allows for the possibility, although it
will have to become a formal OPAL ABI requirement if we rely on it.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[oliver: x86_64 has an r13, but not an r16 so the tests broke]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
wip: fix __this_cpu() in the test cases
|
|
This was returning to the wrong point and loading some garbage that
had not been set up yet.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
libstb will sometimes randomly fail to compile due to missing types.
This appears to solve it but I didn't look too far into why it mostly
works (or can be made to work with make clean) without this.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
And drop Ubuntu 16.04.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Dan Horák <dan@danny.cz>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
- Our device tree test cases pass with the `dtc` command shipped by
Fedora, so remove the `dtc` build step.
- Replace fedora30 with fedora32.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Dan Horák <dan@danny.cz>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: orbitcowboy <orbitcowboy@web.de>
[oliver: misplaced paren]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Hanno Böck <hanno@gentoo.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The various ffspart test cases use /dev/zero as an input which doesn't
behave like a normal file. The stat.st_size field for char devs is
generally zero so the subsequent attempt to mmap() the file fails
because we requested a zero size mapping.
Previously we didn't notice this, but it sort of worked because the
partitions in the test script that used /dev/zero as an input were
also marked as ECC. This resulted in the partition
contents being re-generated (using a buffer libflash allocates) and
the source data pointer being ignored since we said it was zero length.
Fix all this by dropping mmap() entirely and reading the input file into
a buffer we malloc() instead. This works for any file, including
/dev/urandom, which can't be mmap()ed.
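A minimal sketch of the read-into-a-buffer approach, assuming the caller already knows how many bytes it wants (for example, the partition size). The helper name `read_input` and its signature are hypothetical and are not taken from ffspart:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the actual ffspart code: read exactly `len`
 * bytes from `path` into a freshly malloc()ed buffer. It never consults
 * st_size, so it also works for char devices such as /dev/zero.
 * Returns NULL on error or if the input ends before `len` bytes.
 */
static uint8_t *read_input(const char *path, size_t len)
{
	uint8_t *buf;
	size_t done = 0;
	int fd;

	assert(path != NULL && len > 0);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return NULL;

	buf = malloc(len);
	if (!buf) {
		close(fd);
		return NULL;
	}

	while (done < len) {
		ssize_t rc = read(fd, buf + done, len - done);

		if (rc < 0 && errno == EINTR)
			continue;	/* interrupted, just retry */
		if (rc <= 0) {		/* hard error or early EOF */
			free(buf);
			close(fd);
			return NULL;
		}
		done += (size_t)rc;
	}

	close(fd);
	return buf;
}
```

The caller owns the returned buffer and must free() it.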
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
The current wording is a bit curt. Flesh it out a bit and put in some
useful detail.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com>
|
|
This patch disables Protected Execution Facility (PEF).
This software procedure is needed for the lab because Cronus will be
configured to bring the machine up with PEF on. Hostboot has a similar
procedure for running with PEF off.
Skiboot can run with PEF on but the kernel cannot; the kernel will take
a machine check when trying to write a protected resource, such as the
PTCR.
So, use this until we have an ultravisor, or if we want to use BML with
Cronus without UV = 1.
Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>
Tested-by: Alistair Popple <alistair@popple.id.au>
[oliver: replaced bare urfid with a macro for toolchain compatibility]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This commit fixes a typo and a spelling mistake in a comment about the
XIVE set translate mechanism.
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Currently the wait_for_all_occ_init() function determines that the
OCCs associated with every chip have been initialized by verifying that
the "Valid" bit in the pstate table of that OCC is set.
However, on chips where all the EX units are guarded, the OCC, even
though it is active, does not update the pstate_table. As a
result, OPAL concludes that the OCC is not functional and not
only disables Pstate initialization, but also incorrectly reports that
the OCCs were not initialized, thereby cutting other features such as
sensors.
Fix this by ensuring that
* We check if there is at least one active EX unit in the chip
before checking if the OCC is active.
* On platforms with OCC-OPAL communication interface version 0x90
* wait_for_all_occ_init() only checks if the occ_state in the
OCC dynamic area is set to "Active State".
* move the "Valid" bit check to add_cpu_pstate_properties(),
which is where we create the device-tree entries for
frequency scaling.
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
opal-gard on POWER9P systems fails to identify a few chip targets while
displaying gard records. This patch fixes that.
Before:
# opal-gard list
ID | Error | Type | Path
-------------------------------------------------------------------------------
00000001 | 90004af4 | Predictive | /Sys0/Node0/Proc0/MC0/MI0/UNKNOWN0/UNKNOWN0
===============================================================================
After this patch:
# ./opal-gard list
ID | Error | Type | Path
---------------------------------------------------------------------------
00000001 | 90004af4 | Predictive | /Sys0/Node0/Proc0/MC0/MI0/MCC0/OMI0
===========================================================================
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Reviewed-by: Dan Horák <dan@danny.cz>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
While trying to reduce the size of the final binary I found
DEAD_CODE_ELIMINATION=1, but it didn't change the binary size and
known unused functions were still seen when inspecting the ELF with nm.
Even though the necessary compiler parameters, -ffunction-sections
and -fdata-sections, are set, ld's --gc-sections wasn't, so add it in
order to honor the flag.
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
In PHB4 the PHB's error and informational interrupts were changed to behave
more like actual LSIs. On PHB3 these interrupts would only be raised on
a 0 -> 1 transition of an error status bit (i.e. they were rising edge
triggered). On PHB4 the error interrupts are "true" LSIs and will be
re-raised as long as the underlying error status bit is set.
This causes a headache for us because OPAL's PHB error handling model
requires Skiboot to preserve the state of the PHB (including errors) until
the kernel is ready to handle the error. As a result we can't do anything
in Skiboot to handle the interrupt condition and we need to mask the error
internally. We can do this by clearing the relevant bits in the IRQ_ENABLE
registers of the PHB.
It's worth pointing out that we don't want to mask the interrupt by setting
the Q bit in the XIVE ESBs. The ESBs are owned by the OS which may be
masking and unmasking the interrupt for its own reasons (e.g. migrating
IRQs). Skiboot modifying the ESB state could potentially cause problems and
should be avoided.
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Move the unmasking (enabling) of the various PHB error and informational
interrupts out of the main init sequence. We'll need this elsewhere to
enable the PHB error interrupts.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Commit 7dbf80d1db45 ("phb4: Generate checkstop on AIB ECC corr/uncorr
for DD2.0 parts") changed the PHB inits so that on DD2.0 TXE error bit
12 would cause a checkstop. The patch also changes the
TXE_ERR_IRQ_ENABLE settings to prevent this bit from causing a PHB error
interrupt. However, there's not much point in doing this since the
system is going to checkstop anyway.
Removing the code to disable the interrupt simplifies the situation a
bit and avoids conflating FIR propagation with the normal PHB error
interrupts. The PHB spec is actively confusing in this area since it
describes the TXE Error summary bit in the LEM FIR as an "interrupt"
even though it's completely separate from the PHB's LSI error reporting
interrupt.
Cc: Michael Neuling <mikey@neuling.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Linux doesn't seem to parse the interrupt-names property when there are
unnamed (zero length string) interrupts. Add a name callback to the
interrupt source and go from there.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
When phb4.c was copied from phb3.c the error interrupts were disabled by
default and apparently never re-enabled. Remove the #if 0 block and call
phb4_set_err_pending() rather than the phb3 version.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
If the BMC becomes unresponsive (e.g. during a BMC reboot) during a
console write then we may get stuck in uart_wait_tx_room(). This leaves
the CPU stuck in OPAL, which results in kernel lockups and in some
cases an unresponsive host.
This patch introduces a timeout. If a UART operation doesn't complete
within a predefined time then the write data is dropped and the call
returns.
Note that this patch fixes both the OPAL internal console and the
console write APIs.
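A rough sketch of a bounded wait that drops data on timeout is shown below. Everything here is a stand-in: `struct toy_uart`, the fake clock, and the `TX_TIMEOUT_TICKS` value are assumptions made for illustration and are not the skiboot UART driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the hardware and the timebase. */
struct toy_uart {
	bool tx_ready;		/* does the transmitter have room? */
	uint64_t now;		/* fake clock, advanced on each poll */
	size_t sent;		/* bytes accepted by the "hardware" */
	size_t dropped;		/* bytes discarded after a timeout */
};

#define TX_TIMEOUT_TICKS 1000	/* assumed value, illustration only */

/* Poll for TX room until the deadline; returns false on timeout. */
static bool toy_wait_tx_room(struct toy_uart *u)
{
	uint64_t deadline = u->now + TX_TIMEOUT_TICKS;

	while (!u->tx_ready) {
		if (u->now >= deadline)
			return false;
		u->now++;	/* real code would read a timebase here */
	}
	return true;
}

/* Write up to len bytes; on timeout drop the remainder and return. */
static size_t toy_uart_write(struct toy_uart *u, const char *buf, size_t len)
{
	size_t i;

	assert(u != NULL && buf != NULL);

	for (i = 0; i < len; i++) {
		if (!toy_wait_tx_room(u)) {
			u->dropped += len - i;	/* give up, don't hang */
			break;
		}
		u->sent++;
	}
	return i;
}
```

The point of the design is that the caller always gets control back: a dead BMC costs some console output rather than a stuck CPU.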
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[Various fixes on top of Nick's proposal to have single timer - Vasant]
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The comment next to the OCAPI_MEM entries in the Nimbus phys-map claims
that we are "varying the upper 2 bits of the group ID" for each OpenCAPI
link, as matches the chip address extension mask that will be set by future
versions of Hostboot.
The actual entries, on the other hand, vary the *lower* 2 bits of the group
ID. Whoops.
This didn't appear to cause us problems on the specific machines that we
had access to at the time, but now that this is being tested a bit harder
it's crashing machines...
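To make the upper-versus-lower distinction concrete, here is a small sketch. It assumes a 4-bit group ID field purely for illustration, and both helper names are hypothetical rather than anything from the phys-map code.

```c
#include <assert.h>

#define TOY_GROUP_ID_BITS 4	/* assumed field width, illustration only */
#define TOY_GROUP_ID_MASK ((1u << TOY_GROUP_ID_BITS) - 1)

/* What the comment describes: the link index goes in the UPPER 2 bits. */
static unsigned int toy_group_vary_upper2(unsigned int base, unsigned int link)
{
	unsigned int shift = TOY_GROUP_ID_BITS - 2;
	unsigned int low_mask = (1u << shift) - 1;

	assert(link < 4);
	return (link << shift) | (base & low_mask);
}

/* What the entries actually did: the link index goes in the LOWER 2 bits. */
static unsigned int toy_group_vary_lower2(unsigned int base, unsigned int link)
{
	assert(link < 4);
	return (base & TOY_GROUP_ID_MASK & ~0x3u) | link;
}
```

With a base of 0 and link 1, the first helper gives 0x4 while the second gives 0x1, so the two schemes place each link's memory at different addresses.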
Fixes: bc72973d13215 ("hw/npu2-opencapi: Support multiple LPC devices")
Cc: Frederic Barrat <fbarrat@linux.ibm.com>
Reported-by: Wael El-Essawy <welessa@us.ibm.com>
Reported-by: Milton Miller <miltonm@us.ibm.com>
Reported-by: Jenny Huynh <jhuynh@us.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Fused core mode is currently not supported in OPAL. Continuing to
boot the system would result in errors at later stages of boot.
Wait for the console to be up and print a message for developers to
check and fix the system modes.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Tested-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The Bittware 250SOC adapter on Mihawk was showing a high count of CRC
errors on one of the opencapi slots. The PHY team suggested new
equalization settings to correct the errors.
All existing adapters have been tested on mihawk to make sure the
settings are compatible. However, the new settings should not be used
on platforms other than mihawk.
The changes specific to mihawk are:
- Update the tx_ffe_pre_coeff and tx_ffe_post_coeff input parameters
used during zcal
- turn off the tx_ffe_boost parameter through scom
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
add_memory_buffer_mmio() should be exclusive to P9P (AXONE).
Running it on non P9P systems resulted in warnings such as:
MS AREA: Inconsistent MSAREA version 40 for P9P system
So check for PVR and quietly return if not P9P.
Fixes: 38b5c3179 (Add support for memory-buffer mmio)
Cc: skiboot-stable@lists.ozlabs.org
Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Fortunately no OPAL calls seem to use 8 arguments yet.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
In simulation, hundreds of millions of cycles are chewed up in this code
path:
PC: 0x0000000030033450 -> <.bitmap_tst_bit>+0x18
LR: 0x000000003003347c -> <.buddy_check_alloc>+0x14
0x0000000031c13b30 -> <_ebss>+0x1803b30
0x000000003003351c -> <.buddy_check_alloc_down>+0x4c
0x00000000300339c4 -> <.buddy_free>+0x7c
0x0000000030033be8 -> <.buddy_create>+0xcc
0x0000000030089bbc -> <.xive_init>+0xf0
0x00000000300157cc -> <.main_cpu_entry>+0x8a0
0x000000003000275c -> <boot_entry>+0x1bc
Undefining BUDDY_DEBUG saves 30+ minutes of wall clock time, so fix
the "warning: unused parameter" messages when compiling.
Signed-off-by: Ryan Grimm <grimm@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
"Spurious interrupt" is not severe. Reduce message severity
and keep msglog happy!
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The commit 1b9a449d ("opal-api: add endian conversions to most opal
calls") modified the code in opal_read_sensor() to make it
Little-Endian safe. In the process, it changed the code so that if a
sensor value was zero, it would simply return OPAL_SUCCESS without
updating the return buffer. As a result, the return buffer contained
bogus values which were reflected on those sensors being read by the
Kernel.
This patch fixes it by ensuring that the return buffer is updated with
the value read from the sensor every time.
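The shape of the bug and of the fix can be sketched as follows. This is a simplified illustration, not the real opal_read_sensor() code: the function names, the `TOY_SUCCESS` constant, and the byte-swap helper are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_SUCCESS 0

/* Portable stand-in for a cpu-to-big-endian 64-bit conversion. */
static uint64_t toy_cpu_to_be64(uint64_t v)
{
	uint8_t b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (uint8_t)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Buggy shape: a zero reading leaves the caller's buffer untouched. */
static int toy_read_sensor_buggy(uint64_t raw, uint64_t *out_be)
{
	assert(out_be != NULL);
	if (raw)
		*out_be = toy_cpu_to_be64(raw);
	return TOY_SUCCESS;
}

/* Fixed shape: the buffer is always written, including for zero. */
static int toy_read_sensor_fixed(uint64_t raw, uint64_t *out_be)
{
	assert(out_be != NULL);
	*out_be = toy_cpu_to_be64(raw);
	return TOY_SUCCESS;
}
```

With the buggy shape, a caller that passes an uninitialized buffer and gets a success code has no way to tell that the contents are stale.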
Thanks to Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> for
spotting the missing return-buffer update.
cc: skiboot-stable@lists.ozlabs.org
Fixes: commit 1b9a449d ("opal-api: add endian conversions to most opal
calls")
Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu
systems") assumes that presence of "ibm,power9-npu" compatible node
indicates the presence of GPUs. However this is incorrect, as even
OpenCAPI is supported via NPU. Thus ZZ systems, which have OpenCAPI
connectors but not GPUs will have "ibm,power9-npu" compatible nodes.
This results in OPAL creating device-tree entries for the GPU sensors
on ZZ systems which don't even have GPUs.
This patch fixes the GPU detection code in occ-sensors, by first
checking for "ibm,ioda2-npu2-phb" compatible node which indicates the
presence of nvlink. Only if such a node exists, do we check with the
OCC for presence of GPUs on systems to confirm the presence of the
GPU. Otherwise, we cut the GPU sensors.
Thanks to Frederic Barrat <fbarrat@linux.ibm.com> for suggesting
"ibm,ioda2-npu2-phb" for detecting the presence of nvlink GPUs.
cc: skiboot-stable@lists.ozlabs.org
Fixes: commit bebe096ee242 ("sensors: occ: Skip GPU sensors for non-gpu
systems")
Reported-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
For the very specific scenario when fast-reboot is used, we see
multiple error messages regarding the trustedboot measurements not being
done.
Fast-reboot works by performing just the fundamental operations, like
PCI initialization, needed to get skiboot into good shape to boot the
kernel, and later the host's kernel. That means fast-reboot keeps data
structures filled in since the last full reboot.
In this process trustedboot is not re-initialized, but it still tries
to perform the STB measurements and event logging done in
trustedboot_exit_services, showing multiple failure messages.
This patch avoids that situation by returning early and logging that
trustedboot has already exited.
If something eventually changes and trustedboot gets re-initialized during
fast-reboot, this patch also sets boot_services_exited to false after every
initialization so we always exit trustedboot whenever it gets initialized.
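The guard described above might look roughly like this. Only the `boot_services_exited` flag name comes from the commit message; the `toy_` functions and the measurement counter are hypothetical stand-ins for the real init and exit paths.

```c
#include <assert.h>
#include <stdbool.h>

static bool boot_services_exited;
static int toy_measurements_done;	/* stand-in for the real STB work */

/* Every (re-)initialization re-arms the exit path. */
static void toy_trustedboot_init(void)
{
	boot_services_exited = false;
}

/*
 * Returns true if the exit work was performed, false if it was
 * skipped because trustedboot had already exited.
 */
static bool toy_trustedboot_exit_services(void)
{
	if (boot_services_exited)
		return false;	/* real code would log and return here */

	toy_measurements_done++;
	boot_services_exited = true;
	return true;
}
```

Resetting the flag in the init path keeps the pairing intact: one exit per initialization, regardless of whether a fast-reboot happened in between.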
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
We are using the SAI indicator location from SLCA to represent the System
Attention Indicator location code. In P9, this is mapped to the op-panel
location code. The op-panel has identify and fault LEDs as well, and our
SPCN command lists the op-panel location code too. Hence we get the OPAL
warning below.
OPAL msglog:
FSPLED: duplicate location code U78D3.001.WT0004T-D1
Because of the above issue we are not creating a device tree node for the
D1 identify/fault indicators.
We have a System Attention Indicator at the enclosure level as well, which
is a replica of the attention indicator in the op-panel. Hence use the
System VPD location code to represent the attention indicator.
Note that we have a dedicated MBOX command to read/update the System
Attention Indicator which doesn't need a location code. Hence we are fine
with this change.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
OPAL uses different paths to trigger MPIPL:
- On BMC systems we call the SBE S0 interrupt
- On FSP systems we call the `attn` instruction
Currently on BMC systems we collect the crashing CPU's PIR details, which
are needed to generate a proper dump. This happens just before calling the
SBE S0 interrupt. Since we don't use this path on FSP systems, OPAL is not
saving the crashing CPU details.
Hence by default `opalcore` does not point to the crashing CPU and does not
show a proper backtrace. We have to go through all CPUs to find the crashing
CPU's backtrace.
This patch moves this function to a common place so that if MPIPL is
supported we collect the crashing CPU data.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
After a system crash the FSP collects a dump and passes the dump details
via HDAT. OPAL/Linux uses these details to extract the SYSDUMP.
On P9 FSP systems we have MPIPL support. The FSP folks say we have to
ignore the platform dump notification passed by HDAT and use the inband
MPIPL mechanism to extract the dump.
CC: Murulidhar Nataraju <murulidhar@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Based off the Raptor patch:
https://git.raptorcs.com/git/blackbird-skiboot/commit/?id=c81f9d66592dc2a7cf7f6c59c3def5cee0638c1f
Notable changes:
- slot names matching what's silkscreened on the board
- Expose IPL Observer over op-panel OPAL calls
This means you can "printf '\xfe\xfe\xfe' > /dev/op_panel" to
make the IPL Observer on the Raptor BMC builds realise it
can turn on fan control.
Signed-off-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Add a driver for the SCOM ranges of the OCMB. Unlike most chips the OCMB
has two different (three if you count OpenCAPI config space) register
spaces and we need to ensure that the right access size is used on each.
Additionally the SCOM interface is a bit non-standard in that a full
physical address is passed as the SCOM address rather than a register
number so we don't need to perform any address transformations, we just
need to verify that the address falls into one of the nominated address
ranges.
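The range check described here could be sketched as a small lookup table. The address ranges and access sizes below are made-up placeholders, not the real OCMB register map, and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_scom_range {
	uint64_t start;			/* first valid address */
	uint64_t end;			/* last valid address, inclusive */
	unsigned int access_size;	/* required access width in bytes */
};

/* Placeholder values only; the real ranges live in the driver. */
static const struct toy_scom_range toy_ranges[] = {
	{ 0x1000, 0x1fff, 8 },
	{ 0x2000, 0x2fff, 4 },
};

/*
 * The address is used as-is (no translation). Return the access size
 * for the range containing addr, or -1 if it falls outside all of them.
 */
static int toy_ocmb_access_size(uint64_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(toy_ranges) / sizeof(toy_ranges[0]); i++) {
		assert(toy_ranges[i].start <= toy_ranges[i].end);
		if (addr >= toy_ranges[i].start && addr <= toy_ranges[i].end)
			return (int)toy_ranges[i].access_size;
	}
	return -1;
}
```

A read or write handler would call this first and reject the access when it returns -1, then issue the access with the width it reports.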
Cc: Klaus Heinrich Kiwi <klaus@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|