|
OMG Kees Cook was right, the code is *smaller*. We save like a dozen
instructions in the exception path!
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The ISA specifies that MCE interrupts in power saving modes will enter
at 0x200 with powersave bits in SRR1 set. This is not currently
supported properly: the MCE is handled like a normal interrupt,
but GPRs (e.g., r1, r2, r13) could be lost, which would lead to
crashes.
So check the power save bits similarly to the sreset vector, and
handle this properly.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
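As an illustration of the check described in this commit, here is a minimal C sketch of testing the power save bits in SRR1. The commit only says "powersave bits in SRR1"; the bit positions used here (SRR1 bits 46:47 in IBM numbering) and the names SRR1_WAKE_STATE_MASK and srr1_is_powersave_wakeup are assumptions for illustration, not taken from the patch, and the real check is likely done in the assembly exception vectors.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: the wake state lives in SRR1 bits 46:47 (IBM numbering,
 * where bit 63 is the least significant bit), giving the mask below.
 */
#define SRR1_WAKE_STATE_MASK 0x0000000000030000ULL

/* Hypothetical helper: true if the interrupt woke the thread from a
 * power-saving mode, in which case register state may need restoring. */
static bool srr1_is_powersave_wakeup(uint64_t srr1)
{
    return (srr1 & SRR1_WAKE_STATE_MASK) != 0;
}
```

A handler built on this sketch would restore the lost GPRs first when the helper returns true, and only then continue with normal MCE handling.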
|
|
This requires implementing the MSR[RI] bit. Then just allow all
non-fatal sreset exceptions to recover.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
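A minimal sketch of what an MSR[RI] based recovery decision could look like. MSR[RI] is bit 62 in IBM numbering (value 0x2), and the interrupted MSR is visible in SRR1; the helper name sreset_can_recover and the `fatal` flag are hypothetical and only illustrate the idea, not the actual patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSR[RI] (recoverable interrupt): bit 62 in IBM numbering. */
#define MSR_RI 0x2ULL

/*
 * Hypothetical helper: a system reset may return to the interrupted
 * context only if it is not fatal and the interrupted MSR (as saved
 * in SRR1) had RI set.
 */
static bool sreset_can_recover(uint64_t srr1, bool fatal)
{
    if (fatal)
        return false;
    return (srr1 & MSR_RI) != 0;
}
```

The design point is that code which is updating SRR0/SRR1 or other state that an interrupt would clobber clears RI first, so an interrupt taken in that window is treated as unrecoverable.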
|
|
Detect non-powersave sresets and send them to the normal exception
handler which prints registers and stack.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This adds the redzone to the interrupt stack, and code to restore
registers.
This can be used for a number of things. Initially it will be used
to recover from system reset interrupts. It could later be used to
handle recoverable machine checks, use the decrementer to implement
a watchdog, handle HMI interrupts at boot, and implement virtual
memory.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
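To illustrate why the redzone matters for returning from an interrupt, here is a rough C sketch of where an interrupt frame could be placed relative to the interrupted stack pointer. The sizes and the names REDZONE_SIZE, INT_FRAME_SIZE and interrupt_frame_base are assumptions for illustration (512 bytes matches the protected zone of the ELFv2 ABI, but the value used by the patch is not stated here).

```c
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define REDZONE_SIZE   512u  /* bytes below r1 the interrupted code may use */
#define INT_FRAME_SIZE 512u  /* hypothetical size of the register save area */

/*
 * Hypothetical helper: compute the base of the interrupt frame so that
 * it does not overwrite the redzone of the interrupted context.
 */
static uint64_t interrupt_frame_base(uint64_t interrupted_r1)
{
    uint64_t sp = interrupted_r1 - REDZONE_SIZE - INT_FRAME_SIZE;

    return sp & ~0xfULL; /* keep the stack 16-byte aligned */
}
```

Without skipping the redzone, saving registers would corrupt data that a leaf function keeps below its stack pointer, and restoring registers afterwards would not be enough to resume safely.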
|
|
Save and print the MSR of the interrupt context. This can be derived
from the interrupt type, SRR1, and other system register settings. But
it can be useful to quickly verify what's happening.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Print DSISR and DAR to help with deciphering machine check exceptions,
and improve the output a bit: decode the NIP symbol, improve alignment, etc.
Also print a specific header for machine checks, because we do expect to
see these if there is a hardware failure.
Before:
[ 0.005968779,3] ***********************************************
[ 0.005974102,3] Unexpected exception 200 !
[ 0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.005991782,3] LR : 000000003002ad80 CTR : 0000000000000000
[ 0.005998130,3] CFAR : 00000000300b58bc
[ 0.006002769,3] CR : 40000004 XER: 20000000
[ 0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000
[ 0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000
[...]
After:
[ 0.003287941,3] ***********************************************
[ 0.003561769,3] Fatal MCE at 000000003002ad80 .nvram_init+0x24
[ 0.003579628,3] CFAR : 00000000300b5964
[ 0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000
[ 0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000
[ 0.003597355,3] DSISR: 00000000 DAR : 0000000000000000
[ 0.003603480,3] LR : 000000003002ad68 CTR : 0000000030093d80
[ 0.003609930,3] CR : 40000004 XER : 20000000
[ 0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000
[ 0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000
[...]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
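The aligned register pairs in the "After" output can be produced with fixed-width format specifiers. The following is only a sketch of that formatting idea in standard C; format_reg_pair is a hypothetical helper, and the real code prints through the firmware's own logging routines rather than snprintf into a caller buffer.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format two named 64-bit registers on one line.
 * Names are left-justified and padded to 5 characters so that the
 * colons line up for "LR", "SRR0" and "HSRR0" alike.
 */
static int format_reg_pair(char *buf, size_t len,
                           const char *name1, uint64_t val1,
                           const char *name2, uint64_t val2)
{
    return snprintf(buf, len, "%-5s: %016" PRIx64 " %-5s: %016" PRIx64,
                    name1, val1, name2, val2);
}
```

For example, calling it with "SRR0", 0x3002ad80, "SRR1" and 0x9000000000001000 yields a line in the same shape as the SRR0/SRR1 line shown in the log above.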
|
|
Presently the abort() call sets up HID0, triggers an attention, and finally
enters an infinite for loop. The FSP takes care of collecting the required
logs and rebooting the system. This sequence is specific to FSP machines
and will not work on BMC-based machines. Hence move the FSP-specific
code to hw/fsp/fsp-attn.c.
Note that this patch adds a new parameter to the abort call. Hence
_abort() is replaced by abort() in exception.c so that we can capture
file info as well.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Now that opal.h includes opal-api.h, there are a bunch of files that
include both but don't need to.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
And print some information about GPR state, backtrace, etc...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Linux no longer calls it, it never worked on LE and generally
speaking never really did anything useful anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This is probably not the best collection of things in the world,
but it means that opal.h is much closer to being directly usable
by an OS.
This triggers a bunch of #include fixes throughout the tree.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Removed the following:
- Machine check handler and other related routines.
- Per-cpu MCE event used to record machine check data,
cpu_thread->mc_event.
- Machine check related definitions, including the MCE event structure, from
include/opal.h.
- A comment above the GET_STACK() #define that warns about runtime modification
made to the GET_STACK macro by the MC patching code.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Now that we catch and handle machine check interrupts directly in the Linux
host PowerNV kernel, we no longer depend on OPAL firmware to do the MCE
handling for us. The MCE handling code in OPAL has an exclusive stack
space (4k in size) reserved, which remains unused now that the Linux host
no longer depends on it. Hence, this patch removes the code that allows
machine check interrupt patching in OPAL and reclaims the 4k of stack
space for use by the normal stack. For older kernels the patching request
will result in an error.
The subsequent patch will remove the rest of the MCE handling code from OPAL.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
With the new proposed change, Linux will get the HMI interrupt directly. Linux
will then invoke opal_handle_hmi to handle HMI recovery in OPAL. After
handling HMI errors, OPAL will generate an OPAL HMI event and queue it up
in the OPAL message infrastructure so that the Linux host can pull the event
and act upon it accordingly. This patch also adds a new message type for
the HMI event.
Changes in v2:
- Removed the token argument from opal_handle_hmi()
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Move the original HMI handler to a new file, core/hmi.c. No functional
change, just code movement and a variable name change.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|