aboutsummaryrefslogtreecommitdiff
path: root/core
AgeCommit message (Collapse)AuthorFilesLines
2018-03-27NPU2: dump NPU2 registers on npu2 HMIStewart Smith1-2/+73
Due to the nature of debugging npu2 issues, folk are wanting the full list of NPU2 registers dumped when there's a problem. We have to list out each register as traversing the range triggers FIR bits that confuse PRD. Suggested-by: Ryan Black <rblack@us.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 215a7ce1f1863d61936799568a2ea53afba92ddc) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27Revert "NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging"Stewart Smith1-37/+1
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17. We don't need this as we need to do it a different way, with a explicit set of registers as otherwise we trip other random FIR bits and everything becomes even more terrible. I suggest alcohol. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 80452d2cf2ce4dfc769b74c28bd0c73ec076b9be) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errorsNicholas Piggin2-0/+3
This disables fast reboot in several more cases where serious errors like lock corruption or call re-entrancy are detected. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 1f53f9fa766f0e8ccf4ce59821c541dba268340b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22core/opal: allow some re-entrant callsNicholas Piggin1-3/+16
This allows a small number of OPAL calls to succeed despite re-entering the firmware, and rejects others rather than aborting. This allows a system reset interrupt that interrupts OPAL to do something useful. Sreset other CPUs, use the console, which allows xmon to work or stack traces to be printed, reboot the system. Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is used for many other things that does not mean a serious permanent error. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 82fd5d06beee81b7f219937f3ed311c42b1eb3d8) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22core/opal: abort in case of re-entrant OPAL callNicholas Piggin1-1/+1
The stack is already destroyed by the time we get here, so there is not much point continuing. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 884f97b25b49b961992a782a7173f7b18e50c8f0) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-04Tie tm-suspend fw-feature and opal_reinit_cpus() togetherMichael Neuling1-5/+22
Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) always returns OPAL_UNSUPPORTED. This ties the tm suspend fw-feature to the opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when tm suspend is disabled, we correctly report it to the kernel. For backwards compatibility, it's assumed tm suspend is available if the fw-feature is not present. Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and below has TM disabled completely (not just suspend). We are using opal_reinit_cpus() to determine this setting (rather than the device tree/HDAT) as some future firmware may let us change this dynamically after boot. That is not the case currently though. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 730bccbbb6154bd9bf7e98d3fcb121521f325996) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-28NPU2 HMIs: dump out a *LOT* of npu2 registers for debuggingStewart Smith1-1/+37
This is not the way we want to end up doing this. This is a hack to make folk happy and not require crondump to debug nvidia/npu2 issues. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21core: Fix mismatched names between reserved memory nodes & propertiesJeremy Kerr2-15/+24
OPAL exposes reserved memory regions through the device tree in both new (nodes) and old (properties) formats. However, the names used for these don't match - we use a generated cell address for the nodes, but the plain region name for the properties. This change, heavily based on code from Oliver O'Halloran <oohall@gmail.com>, reworks the dt-generation code to firstly generate the new-format nodes, then uses those same names to generate the property data. Reported-by: Deb McLemore <debmc@linux.vnet.ibm.com> CC: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> [stewart: fix test case] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21sensor-groups: occ: Add support to disable/enable sensor groupShilpasri G Bhat1-1/+13
This patch adds a new opal call to enable/disable a sensor group. This call is used to select the sensor groups that needs to be copied to main memory by OCC at runtime. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [stewart: rebase and bump OPAL API number] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21sensors: Support reading u64 sensor valuesShilpasri G Bhat1-2/+76
This patch adds support to read u64 sensor values. This also adds changes to the core and the backend implementation code to make this API as the base call. Host can use this new API to read sensors upto 64bits. This adds a list to store the pointer to the kernel u32 buffer, for older kernels making async sensor u32 reads. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21dt: add /cpus/ibm, powerpc-cpu-features device tree bindingsNicholas Piggin4-1/+931
This is a new CPU feature advertising interface that is fine-grained, extensible, aware of privilege levels, and gives control of features to all levels of the stack (firmware, hypervisor, and OS). The design and binding specification is described in detail in doc/. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [stewart: fix maybe-uninitialized warning from older GCC, doc cleanup] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-20Revert "pci: Shared slot state synchronisation for hot reset"Russell Currey1-14/+0
An issue was found in shared slot reset where the system can be stuck in an infinite loop, pull the code out until there's a proper fix. This reverts commit 1172a6c57ff3c66f6361e572a1790cbcc0e5ff37. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-18increase log verbosity in debug buildsStewart Smith1-1/+1
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-19Add -debug to version on DEBUG buildsStewart Smith1-1/+7
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-19cpu_wait_job: Correctly report time spent waiting for jobStewart Smith1-3/+3
Way back when, we got confused between timebase and ms, so let's just use ms and be done with it. Fixes: 514406fa44279996bfc9c85c1e4e53689d375e64 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-14DT: Add "version" property under ibm, firmware-versions nodeVasant Hegde1-0/+20
First line of VERSION section in PNOR contains firmware version. Use that to add "version" property under firmware versions dt node. Sample output: -------------- root@xxx2:/proc/device-tree/ibm,firmware-versions# lsprop version "witherspoon-ibm-OP9_v1.19_1.94" .... ... Suggested-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-13core: hostservices: Remove redundant special wakeup codeShilpasri G Bhat1-161/+3
Use the generic dctl_{set/clear}_special_wakeup() in hostservices to assert and de-assert core special wakeup for P8 and remove the duplicated code. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-13core/device.c: Fix dt_find_compatible_nodeAlistair Popple2-11/+39
dt_find_compatible_node() and dt_find_compatible_node_on_chip() are used to find device nodes under a parent/root node with a given compatible property. dt_next(root, prev) is used to walk the child nodes of the given parent and takes two arguments - root contains the parent node to walk whilst prev contains the previous child to search from so that it can be used as an iterator over all children nodes. The first iteration of dt_find_compatible_node(root, prev) calls dt_next(root, root) which is not a well defined operation as prev is assumed to be child of the root node. The result is that when a node contains no children it will start returning the parent nodes siblings until it hits the top of the tree at which point a NULL derefence is attempted when looking for the root nodes parent. Dereferencing NULL can result in undesirable data exceptions during system boot and untimely non-hilarious system crashes. dt_next() should not be called with prev == root. Instead we add a check to dt_next() such that passing prev = NULL will cause it to start iterating from the first child node (if any). Also add a unit test for this case to run-device.c. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08hw/npu2: Implement logging HMI actionsBalbir Singh1-1/+82
Log HMI errors as step 1. OS will need to deduce and interpret the HMI event. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08core/exception: beautify exception handler, add MCE-involved registersNicholas Piggin1-3/+36
Print DSISR and DAR, to help with deciphering machine check exceptions, and improve the output a bit, decode NIP symbol, improve alignment, etc. Also print a specific header for machine check, because we do expect to see these if there is a hardware failure. Before: [ 0.005968779,3] *********************************************** [ 0.005974102,3] Unexpected exception 200 ! [ 0.005978696,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000 [ 0.005985239,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000 [ 0.005991782,3] LR : 000000003002ad80 CTR : 0000000000000000 [ 0.005998130,3] CFAR : 00000000300b58bc [ 0.006002769,3] CR : 40000004 XER: 20000000 [ 0.006008069,3] GPR00: 000000003002ad80 GPR16: 0000000000000000 [ 0.006015170,3] GPR01: 0000000031c03bd0 GPR17: 0000000000000000 [...] After: [ 0.003287941,3] *********************************************** [ 0.003561769,3] Fatal MCE at 000000003002ad80 .nvram_init+0x24 [ 0.003579628,3] CFAR : 00000000300b5964 [ 0.003584268,3] SRR0 : 000000003002ad80 SRR1 : 9000000000001000 [ 0.003590812,3] HSRR0: 00000000300027b4 HSRR1: 9000000030001000 [ 0.003597355,3] DSISR: 00000000 DAR : 0000000000000000 [ 0.003603480,3] LR : 000000003002ad68 CTR : 0000000030093d80 [ 0.003609930,3] CR : 40000004 XER : 20000000 [ 0.003615698,3] GPR00: 00000000300149e8 GPR16: 0000000000000000 [ 0.003622799,3] GPR01: 0000000031c03bc0 GPR17: 0000000000000000 [...] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08core/init: manage MSR[ME] explicitly, always enableNicholas Piggin1-0/+15
The current boot sequence inherits MSR[ME] from the IPL firmware, and never changes it. Some environments disable MSR[ME] (e.g., mambo), and others can enable it (hostboot). This has two problems. First, MSR[ME] must be disabled while in the process of taking over the interrupt vector from the previous environment. Second, after installing our machine check handler, MSR[ME] should be enabled to get some useful output rather than a checkstop. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08core/utils: add snprintf_symbolNicholas Piggin2-14/+29
get_symbol is difficult to use. Add snprintf_symbol helper which prints a symbol into a buffer with length, and returns the number of bytes used, similarly to snprintf. Use this in the stack dumping code rather than open-coding it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08fast-boot: occ: Re-parse the pstate table during fast-bootShilpasri G Bhat1-1/+2
OCC shares the frequency list to host by copying the pstate table to main memory in HOMER. This table is parsed during boot to create device-tree properties for frequency and pstate IDs. OCC can update the pstate table to present a new set of frequencies to the host. But host will remain oblivious to these changes unless it is re-inited with the updated device-tree CPU frequency properties. So this patch allows to re-parse the pstate table and update the device-tree properties during fast-reboot. OCC updates the pstate table when asked to do so using pstate-table bias command. And this is mainly used by WOF team for characterization purposes. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08fast-reboot: move pci_reset error handling into fast-reboot codeNicholas Piggin2-7/+17
pci_reset() currently does a platform reboot if it fails. It should not know about fast-reboot at this level, so instead have it return an error, and the fast reboot caller will do the platform reboot. The code essentially does the same thing, but flexibility is improved. Ideally the fast reboot code should perform pci_reset and all such fail-able operations before the CPU resets itself and destroys its own stack. That's not the case now, but that should be the goal. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-08core/init: move imc catalog preload init after the STB init.Pridhiviraj Paidipeddi1-3/+3
As a safer side move the imc catalog preload after the STB init to make sure the imc catalog resource get's verified and measured properly during loading when both secure and trusted boot modes are on. Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-30core: Avoid possible uninitialized pointer read (CID 209502)Cyril Bur1-1/+1
A likely copy and paste oversight. Fixes: 0d84ea6b (core: Add support for quiescing OPAL) Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-01-14SLW: Delay cpuidle device-tree creationAkshay Adiga1-1/+0
Create cpuidle device-tree after slw_init(), so that we can stop the deeper states from being added , when wakeup engine is not present or failed. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-20lock: Add additional lock auditing codeBenjamin Herrenschmidt17-23/+74
Keep track of lock owner name and replace lock_depth counter with a per-cpu list of locks held by the cpu. This allows us to print the actual locks held in case we hit the (in)famous message about opal_pollers being run with a lock held. It also allows us to warn (and drop them) if locks are still held when returning to the OS or completing a scheduled job. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [stewart: fix unit tests] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-20Add support for new gcc 7 parametrized stack protectorBenjamin Herrenschmidt2-5/+10
This gives us per-cpu guard values as well. For now I just xor a magic constant with the CPU PIR value. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-20Mambo: run hello_world and sreset_world tests with Secure and Trusted BootStewart Smith1-2/+13
We *disable* the secure boot part, but we keep the verified boot part as we don't currently have container verification code for Mambo. We can run a small part of the code currently though. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-18libstb/cvc: update memory-region to point to /reserved-memoryClaudio Carvalho1-0/+7
The linux documentation, reserved-memory.txt, says that memory-region is a phandle that pairs to a children of /reserved-memory. This updates /ibm,secureboot/ibm,cvc/memory-region to point to /reserved-memory/secure-crypt-algo-code instead of /ibm,hostboot/reserved-memory/secure-crypt-algo-code. Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-18core: update superseded libstb calls in flash.c and init.cClaudio Carvalho2-8/+9
List of libstb calls that were superseded: sb_verify() -> secureboot_verify() tb_measure() -> trustedboot_measure() stb_final() -> trustedboot_exit_boot_services() stb_init() -> secureboot_init() and trustedboot_init() The new functions are supported in both P8 and P9. Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-18core/init.c: remove redundant calls to verify and measure BOOTKERNELClaudio Carvalho1-22/+4
The flash driver calls libstb to verify and measure every PNOR partition requested at boot time. This removes redundat code from init.c used to verify and measure BOOTKERNEL. Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-18core/flash.c: extern function to get the name of a PNOR partitionClaudio Carvalho1-0/+10
This adds the flash_map_resource_name() to allow skiboot subsystems to lookup the name of a PNOR partition. Thus, we don't need to duplicate the same information in other places (e.g. libstb). Signed-off-by: Claudio Carvalho <cclaudio@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-18make check: Make valgrind optionalMichael Ellerman1-1/+1
To (slightly) lower the barrier for contributions, we can make valgrind optional with just a small amount of plumbing. This allows make check to run successfully without valgrind. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-14opal-prd: Add support for runtime OCC reset in ZZShilpasri G Bhat1-13/+28
This patch handles OCC_RESET runtime events in host opal-prd and also provides support for calling 'hostinterface->wakeup()' which is required for doing the reset operation. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-14prd: Enable error logging via firmware_request interfaceVasant Hegde1-1/+1
In P9 HBRT sends error logs to FSP via firmware_request interface. This patch adds support to parse error log and send it to FSP. CC: Jeremy Kerr <jk@ozlabs.org> CC: Daniel M Crowell <dcrowell@us.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-13core/hmi: Display chip location code while displaying core FIR.Mahesh Salgaonkar1-1/+4
No functionality change. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-13core/hmi: Do not display FIR details if none of the bits are set.Mahesh Salgaonkar1-0/+3
So that we don't flood OPAL console logs with information that is not useful. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-13opal/hmi: HMI logging with location code info.Mahesh Salgaonkar1-4/+36
Add few HMI debug prints with location code info few additional info. No functionality change. With this patch the log messages will look like: [210612.175196744,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000 [210612.175200449,7] HMI: [Loc: UOPWR.1302LFA-Node0-Proc1]: P:8 C:16 T:1: TFMR(2d12000870e04020) Timer Facility Error [210660.259689526,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000 [210660.259695649,7] HMI: [Loc: UOPWR.1302LFA-Node0-Proc0]: P:0 C:16 T:1: Processor recovery Done. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-13core/hmi: Use pr_fmt macro for tagging log messagesMahesh Salgaonkar1-16/+19
No functionality changes. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-13opal: Get chip location codeMahesh Salgaonkar1-0/+10
and store it under proc_chip for quick reference during HMI handling code. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-11chiptod: Keep boot timestamps contiguousOliver O'Halloran1-3/+5
Currently we reset the timebase value to (almost) zero when synchronising the timebase of each chip to the Chip TOD network which results in this: [ 42.374813167,5] CPU: All 80 processors called in... [ 2.222791151,5] FLASH: Found system flash: Macronix MXxxL51235F id:0 [ 2.222977933,5] BT: Interface initialized, IO 0x00e4 This patch modifies the chiptod_init() process to use the current timebase value rather than resetting it to zero. This results in the timestamps remaining contigious from the start of hostboot until the petikernel starts. e.g. [ 70.188811484,5] CPU: All 144 processors called in... [ 72.458004252,5] FLASH: Found system flash: id:0 [ 72.458147358,5] BT: Interface initialized, IO 0x00e4 Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-11timer: Stop calling list_top() racilyBenjamin Herrenschmidt1-4/+5
This will trip the debug checks in debug builds under some circumstances and is actually a rather bad idea as we might look at a timer that is concurrently being removed and modified, and thus incorrectly assume there is no work to do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-06Fix booting & OPAL call return values with DEBUG=1Stewart Smith1-2/+3
On a debug build, _mcount would trash r3 and opal_exit_check would not restore it, leaving OPAL calls returning garbage. this fix simply preserves the return value and doesn't let the compiler get fancy on us. We effectively just get an extra `mr` instruction to restore r3. Fixes: 9c565ee6bca4b665d9d1120bfff5e88ee80615bc Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reported-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-04dctl: p9 increase thread quiesce timeoutNicholas Piggin1-4/+14
We require all instructions to be completed before a thread is considered stopped, by the dctl interface. Long running instructions like cache misses and CI loads may take a significant amount of time to complete, and timeouts have been observed in stress testing. Increase the timeout significantly, to cover this. The workbook just says to poll, but we like to have timeouts to avoid getting stuck in firmware. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-03direct-controls: enable fast reboot direct controls for mamboNicholas Piggin2-3/+19
Add mambo direct controls to stop threads, which is required for reliable fast-reboot. Enable direct controls by default on mambo. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-03fast-reboot: bare bones fast reboot implementation for POWER9Nicholas Piggin2-28/+50
This is an initial fast reboot implementation for p9 which has only been tested on the Witherspoon platform, and without the use of NPUs, NX/VAS, etc. This has worked reasonably well so far, with no failures in about 100 reboots. It is hidden behind the traditional fast-reboot experimental nvram option, until more platforms and configurations are tested. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-03fast-reboot: move boot CPU cleanup logically together with secondariesNicholas Piggin1-8/+8
Move the boot CPU cleanup and state transition to active, logically together with secondaries. Don't release secondaries from fast reboot hold until everyone has cleaned up and transitioned to active. This is cosmetic, but it is helpful to run the fast reboot state machine the same way on all CPUs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-12-03fast-reboot: move fdt freeing into initNicholas Piggin2-7/+9
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>