aboutsummaryrefslogtreecommitdiff
path: root/core/stack.c
AgeCommit message (Collapse)AuthorFilesLines
2020-12-15Fix possible deadlock with DEBUG buildVasant Hegde1-2/+2
Sample output from Cédric: ------------------------- [ 88.294111649,7] cpu_idle_p9 called on cpu 0x063c with pm disabled [ 88.289365222,7] cpu_idle_p9 called on cpu 0x025f with pm disabled [ 88.289900684,7] cpu_idle_p9 called on cpu 0x045f with pm disabled [ 88.302621295,7] CHIPTOD: Base TFMR=0x2512000000000000 [ 88.289899701,7] cpu_idle_p9 called on cpu 0x0456 with pm disabled LOCK ERROR: Deadlock detected @0x30402740 (state: 0x0000000400000001) [ 88.332264757,3] *********************************************** [ 88.332300051,3] < assert failed at core/lock.c:32 > [ 88.332328282,3] . [ 88.332347335,3] . [ 88.332364894,3] . [ 88.332377963,3] OO__) [ 88.332395458,3] <"__/ [ 88.332412628,3] ^ ^ [ 88.332450246,3] Fatal TRAP at 00000000300286a0 .lock_error+0x64 MSR 9000000000021002 [ 88.332501812,3] CFAR : 00000000300414f4 MSR : 9000000000021002 [ 88.332536539,3] SRR0 : 00000000300286a0 SRR1 : 9000000000021002 [ 88.332574644,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000 [ 88.332610635,3] DSISR: 00000000 DAR : 0000000000000000 [ 88.332650628,3] LR : 0000000030028690 CTR : 00000000300f9fa0 [ 88.332684451,3] CR : 20002000 XER : 00000000 [ 88.332712767,3] GPR00: 0000000030028690 GPR16: 0000000032c98000 [ 88.332748046,3] GPR01: 0000000032c9b0a0 GPR17: 0000000000000000 [ 88.332784060,3] GPR02: 0000000030169d00 GPR18: 0000000000000000 [ 88.332822091,3] GPR03: 0000000032c9b310 GPR19: 0000000000000000 [ 88.332861357,3] GPR04: 0000000030041480 GPR20: 0000000000000000 [ 88.332897229,3] GPR05: 0000000000000000 GPR21: 0000000000000000 [ 88.332937051,3] GPR06: 0000000000000010 GPR22: 0000000000000000 [ 88.332968463,3] GPR07: 0000000000000000 GPR23: 0000000000000000 [ 88.333007333,3] GPR08: 000000000002cbb5 GPR24: 0000000000000000 [ 88.333041971,3] GPR09: 0000000000000000 GPR25: 0000000000000000 [ 88.333081073,3] GPR10: 0000000000000000 GPR26: 0000000000000003 [ 88.333114301,3] GPR11: 3839616263646566 GPR27: 0000000000000211 [ 88.333156040,3] GPR12: 0000000020002000 GPR28: 000000003042a134 [ 88.333189222,3] GPR13: 0000000000000000 GPR29: 0000000030402740 [ 88.333225638,3] GPR14: 0000000000000000 GPR30: 0000000000000001 [ 88.333259730,3] GPR15: 0000000000000000 GPR31: 0000000000000000 CPU 0211 Backtrace: S: 0000000032c9b3b0 R: 0000000030028690 .lock_error+0x54 S: 0000000032c9b440 R: 0000000030028828 .add_lock_request+0xd0 S: 0000000032c9b4f0 R: 0000000030028a9c .lock_caller+0x8c S: 0000000032c9b5a0 R: 0000000030021b30 .__mcount_stack_check+0x70 S: 0000000032c9b650 R: 00000000300fabb0 .list_check_node+0x1c S: 0000000032c9b6f0 R: 00000000300fac98 .list_check+0x38 S: 0000000032c9b790 R: 00000000300289bc .try_lock_caller+0xac S: 0000000032c9b830 R: 0000000030028ad8 .lock_caller+0xc8 S: 0000000032c9b8e0 R: 0000000030028d74 .lock_recursive_caller+0x54 S: 0000000032c9b980 R: 0000000030020cb8 .console_write+0x48 S: 0000000032c9ba30 R: 00000000300445a8 .vprlog+0xc8 S: 0000000032c9bc20 R: 0000000030044630 ._prlog+0x50 S: 0000000032c9bcb0 R: 0000000030029204 .cpu_idle_p9+0x74 S: 0000000032c9bd40 R: 0000000030029628 .cpu_idle_pm+0x4c S: 0000000032c9bde0 R: 0000000030023fe0 .__secondary_cpu_entry+0xa0 S: 0000000032c9be70 R: 0000000030024034 .secondary_cpu_entry+0x40 S: 0000000032c9bf00 R: 0000000030003290 secondary_wait+0x8c CPU 0x4: opal_run_pollers -> check_stacks -> takes stack_check_lock lock prlog -> console_write -> waits for con_lock CPU 0x211 cpu_idle_p9 -> prlog -> console_write -> Takes con_lock lock list_check_node -> tries to take stack_check_lock and hits deadlock. I think we don't need to hold `stack_check_lock` while printing backtraces. Instead it makes sense to hold backtrace lock (bt_lock) and print output. Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-09-09stack: only print stack usage backtraces when we hit a new watermarkOliver O'Halloran1-4/+4
With DEBUG=1 builds we use the mcount hook to instrument how much stack space we're using. If we detect that a function call has come within 2KB of the bottom of the stack we currently print a backtrace. This can result in a huge amount of console IO in DEBUG=1 builds which can cause op-test timeouts, etc. Printing a backtrace on each function call isn't terribly useful, and it ends up crowding out the backtrace that's printed when we hit a new stack usage watermark. The watermark should provide enough information to find and fix excessive stack usage issues so drop the per-function backtrace printing and move the warning into the high-watermark check. This change is largely necessary because of DEBUG=1 expands adds a backtrace save area to struct lock which expands the size of it to nearly 2KB. struct cpu_thread (which lives at the bottom of the per-thread stacks) contains three locks and an additional backtrace save area which is enabled when DEBUG=1. The extra space requirements result in cpu_thread ballooning from ~420 bytes to nearly 8KB. Any growth in cpu_thread also results in less stack space being available for the thread, so when DEBUG=1 is enabled we go from having a 16KB stack to an 8KB stack. Although this seems large, skiboot does have some fairly deep call chains (UART console flushing, TPM drivers, both combined) which can cause the thread to come within 2KB of the stack use warning zone. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> --- Maybe we should swap locations of the normal and emergency stacks so cpu_thread takes space from the emergency stack rather than the normal one. The e-stack should only be used at runtime where the call chains should be smaller.
2020-06-11core: interrupt markers for stack tracesNicholas Piggin1-0/+11
Use magic marker in the exception stack frame that is used by the unwinder to decode the interrupt type and NIA. The below example trace comes from a modified skiboot that uses virtual memory, but any interrupt type will appear similarly. CPU 0000 Backtrace: S: 0000000031c13580 R: 0000000030028210 .vm_dsi+0x360 S: 0000000031c13630 R: 000000003003b0dc .exception_entry+0x4fc S: 0000000031c13830 R: 0000000030001f4c exception_entry_foo+0x4 --- Interrupt 0x300 at 000000003002431c --- S: 0000000031c13b40 R: 000000003002430c .make_free.isra.0+0x110 S: 0000000031c13bd0 R: 0000000030025198 .mem_alloc+0x4a0 S: 0000000031c13c80 R: 0000000030028bac .__memalign+0x48 S: 0000000031c13d10 R: 0000000030028da4 .__zalloc+0x18 S: 0000000031c13d90 R: 000000003002fb34 .opal_init_msg+0x34 S: 0000000031c13e20 R: 00000000300234b4 .main_cpu_entry+0x61c S: 0000000031c13f00 R: 00000000300031b8 boot_entry+0x1b0 --- OPAL boot --- Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [oliver: the new stackentry fields made our test heaps too small] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> fixup! core: interrupt markers for stack traces
2020-03-12Re-license IBM written files as Apache 2.0 OR GPLv2+Stewart Smith1-1/+1
SPDX makes it a simpler diff. I have audited the commit history of each file to ensure that they are exclusively authored by IBM and thus we have the right to relicense. The motivation behind this is twofold: 1) We want to enable experiments with coreboot, which is GPLv2 licensed 2) An upcoming firmware component wants to incorporate code from skiboot and code from the Linux kernel, which is GPLv2 licensed. I have gone through the IBM internal way of gaining approval for this. The following files are not exclusively authored by IBM, so are *not* included in this update (I will be seeking approval from contributors): core/direct-controls.c core/flash.c core/pcie-slot.c external/common/arch_flash_unknown.c external/common/rules.mk external/gard/Makefile external/gard/rules.mk external/opal-prd/Makefile external/pflash/Makefile external/xscom-utils/Makefile hdata/vpd.c hw/dts.c hw/ipmi/ipmi-watchdog.c hw/phb4.c include/cpu.h include/phb4.h include/platform.h libflash/libffs.c libstb/mbedtls/sha512.c libstb/mbedtls/sha512.h platforms/astbmc/barreleye.c platforms/astbmc/garrison.c platforms/astbmc/mihawk.c platforms/astbmc/nicole.c platforms/astbmc/p8dnu.c platforms/astbmc/p8dtu.c platforms/astbmc/p9dsu.c platforms/astbmc/vesnin.c platforms/rhesus/ec/config.h platforms/rhesus/ec/gpio.h platforms/rhesus/gpio.c platforms/rhesus/rhesus.c platforms/astbmc/talos.c platforms/astbmc/romulus.c Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: fixed up the drift] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-10-03core/exceptions.c: do not include handler code in exception backtraceNicholas Piggin1-4/+26
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-07-26SPDX-ify all skiboot codeStewart Smith1-14/+4
Use Software Package Data Exchange (SPDX) to indicate license for each file that is unique to skiboot. At the same time, ensure the (C) who and years are correct. See https://spdx.org/ Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: Added a few missing files] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-06-13sparse: fix bt_lock should be staticStewart Smith1-1/+1
Fix this sparse warning: core/stack.c:123:13: warning: symbol 'bt_lock' was not declared. Should it be static? Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28core/stack: Rename backtrace functions, get rid of wrappersAndrew Donnellan1-6/+7
Rename ___backtrace() to backtrace_create() and ___print_backtrace() to backtrace_print(). Get rid of __backtrace() and __print_backtrace() wrappers. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28core/stack: Convert stack check code to not use backtrace wrapperAndrew Donnellan1-5/+5
We're about to get rid of __backtrace() and __print_backtrace(), convert the stack check code to not use them. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28core/stack: Store PIR in ___backtrace()Andrew Donnellan1-3/+3
In ___backtrace(), store the current PIR in the metadata struct, rather than relying on the caller to do it. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28core/stack: Define a backtrace metadata structAndrew Donnellan1-27/+26
Every time we take a backtrace, we have to store the number of entries, the OPAL API token, r1 caller and PIR values. Rather than defining these and passing them around all over the place, let's throw them in a struct. Define a struct, struct bt_metadata, to store these details, and convert ___backtrace() and ___print_backtrace() to use it. We change the wrapper functions __backtrace() and __print_backtrace() to call ___backtrace()/___print_backtrace() with struct bt_metadata, but don't change their parameter profiles for now - we'll do that later. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28core/stack: Remove r1 argument from ___backtrace()Andrew Donnellan1-4/+2
___backtrace() is always called with r1 = __builtin_frame_address(0), and it's unlikely we're going to need it to do something else any time soon, so simplify the API by removing the parameter. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-20skiboot.lds.S: move read-write data after the end of symbol mapNicholas Piggin1-2/+0
This also tidies up linker script symbol declarations and adds _rodata_mem symbol for the next change to use. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-18core/opal: Emergency stack for re-entryNicholas Piggin1-2/+11
This detects OPAL being re-entered by the OS, and switches to an emergency stack if it was. This protects the firmware's main stack from re-entrancy and allows the OS to use NMI facilities for crash / debug functionality. Further nested re-entry will destroy the previous emergency stack and prevent returning, but those should be rare cases. This stack is sized at 16kB, which doubles the size of CPU stacks, so as not to introduce a regression in primary stack size. The 16kB stack originally had a 4kB machine check stack at the top, which was removed by 80eee1946 ("opal: Remove machine check interrupt patching in OPAL."). So it is possible the size could be tightened again, but that would require further analysis. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-18core/stack: backtrace unwind basic OPAL call detailsNicholas Piggin1-7/+26
Put OPAL callers' r1 into the stack back chain, and then use that to unwind back to the OPAL entry frame (as opposed to boot entry, which has a 0 back chain). >From there, dump the OPAL call token and the caller's r1. A backtrace looks like this: CPU 0000 Backtrace: S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4 S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 --- This is pretty basic for the moment, but it does give you the bottom of the Linux stack. It will allow some interesting improvements in future. First, with the eframe, all the call's parameters can be printed out as well. The ___backtrace / ___print_backtrace API needs to be reworked in order to support this, but it's otherwise very simple (see opal_trace_entry()). Second, it will allow Linux's stack to be passed back to Linux via a debugging opal call. This will allow Linux's BUG() or xmon to also print the Linux back trace in case of a NMI or MCE or watchdog lockup that hits in OPAL. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-02-08core/utils: add snprintf_symbolNicholas Piggin1-13/+4
get_symbol is difficult to use. Add snprintf_symbol helper which prints a symbol into a buffer with length, and returns the number of bytes used, similarly to snprintf. Use this in the stack dumping code rather than open-coding it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-20lock: Add additional lock auditing codeBenjamin Herrenschmidt1-0/+1
Keep track of lock owner name and replace lock_depth counter with a per-cpu list of locks held by the cpu. This allows us to print the actual locks held in case we hit the (in)famous message about opal_pollers being run with a lock held. It also allows us to warn (and drop them) if locks are still held when returning to the OS or completing a scheduled job. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [stewart: fix unit tests] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25core/backtrace: Serialise printing backtracesOliver O'Halloran1-0/+12
Add a lock so that only one thread can print a backtrace at a time. This should prevent multiple threads from garbaling each other's backtraces. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-12-22stack: Don't recurse into __stack_chk_failBenjamin Herrenschmidt1-2/+7
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-03-30core: Fix backtrace for gcc 6Joel Stanley1-2/+2
GCC 6 warns when we look at any stack frame other than our own, ie any argument to __builtin_frame_address other than zero. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-03-04Fix early backtracesRyan Grimm1-1/+6
If we fail an assert() before we add a mem region, such as missing chip-id in a dt xscom node, we don't get a backtrace: [836883,0] Assert fail: core/device.c:870:id != 0xffffffff [848859,0] Aborting! CPU 0000 Backtrace: This patch adjusts the top_of_ram value compared to the fp stack frame to assume one stack early on so we get a backtrace: [440546,0] Assert fail: core/device.c:822:id != 0xffffffff [452522,0] Aborting! CPU 0000 Backtrace: S: 0000000031c03b70 R: 00000000300135d0 .backtrace+0x24 S: 0000000031c03bf0 R: 0000000030017f38 ._abort+0x4c S: 0000000031c03c70 R: 0000000030017fb4 .assert_fail+0x34 S: 0000000031c03cf0 R: 0000000030021830 .dt_get_chip_id+0x24 S: 0000000031c03d60 R: 00000000300143cc .init_chips+0x23c S: 0000000031c03e30 R: 0000000030013ab8 .main_cpu_entry+0x110 S: 0000000031c03f00 R: 000000003000254c boot_entry+0x19c Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2015-08-11Don't try to print symbol when we didn't find oneStewart Smith1-1/+3
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2014-11-18Add symbolic backtraces and expose skiboot map to LinuxBenjamin Herrenschmidt1-23/+28
We use a double link technique, doing a first pass with a .o containing a dummy symbol map, then re-linking with a new .o Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-18Fix backtracesBenjamin Herrenschmidt1-15/+18
__builtin_frame_address really wants constants Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-18Capture backtraces when measuring stack depthBenjamin Herrenschmidt1-13/+24
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-11-18Rename backtrace.c to stack.c and move stack related bitsBenjamin Herrenschmidt1-0/+184
... from util.c to stack.c Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>