path: root/core
Age | Commit message | Author | Files | Lines
2019-03-04 | hw/bt: Do not disable ipmi message retry during OPAL boot | Vasant Hegde | 1 | -1/+2
[ Upstream commit c0ab7b45db3dc44daf001f61324bd1418091dede ] Currently OPAL doesn't know whether the BMC is functioning or not. If the BMC is down (for example during a BMC reboot), we keep retrying to send the message to the BMC, so in some corner cases we may hit a hard lockup in the kernel. Ideally we should avoid the synchronous path as much as possible, but for now commit 01f977c3 added an option to disable message retry in the synchronous path. That fix is not required during boot, so do not disable IPMI message retry during OPAL boot. Fixes: 01f977c3 (hw/bt: Add backend interface to disable ipmi message) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-03-04 | core/ipmi: Add ipmi sync messages to top of the list | Vasant Hegde | 1 | -1/+1
[ Upstream commit 968c30905d7a61d777606a5f5c7949027564efd8 ] In the ipmi_queue_msg_sync() path OPAL waits until it gets a response from the BMC. If we do not get a response in time we may end up with kernel hard lockups. Hence add sync messages to the top of the queue. This reduces the chance of hard lockups. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
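The effect of queuing at the head can be sketched with a small singly linked queue. This is an illustration only: `struct msg`, `struct msg_queue`, `queue_tail()`, `queue_head()` and `queue_pop()` are hypothetical names, not the skiboot IPMI interface.

```c
#include <stddef.h>

/* Hypothetical message type; not the skiboot struct ipmi_msg. */
struct msg {
	int id;
	struct msg *next;
};

struct msg_queue {
	struct msg *head;
	struct msg *tail;
};

/* Ordinary (asynchronous) messages go to the back of the queue. */
static void queue_tail(struct msg_queue *q, struct msg *m)
{
	m->next = NULL;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

/* Synchronous messages jump to the front so they are sent next. */
static void queue_head(struct msg_queue *q, struct msg *m)
{
	m->next = q->head;
	q->head = m;
	if (!q->tail)
		q->tail = m;
}

/* Remove and return the next message to send, or NULL if empty. */
static struct msg *queue_pop(struct msg_queue *q)
{
	struct msg *m = q->head;

	if (m) {
		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
	}
	return m;
}
```

With this ordering a caller blocked on a synchronous message waits only for the message currently in flight, not for everything already queued.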
2019-03-04 | hw/bt: Add backend interface to disable ipmi message retry option | Vasant Hegde | 1 | -0/+2
[ Upstream commit 01f977c33d46f35ae6735c874d415a793d7bd9af ] During boot OPAL makes an IPMI_GET_BT_CAPS call to the BMC to get the BT interface capabilities, which include the IPMI message max resend count, message timeout, etc. Most of the time OPAL gets a response from the BMC within the specified timeout. In some corner cases (like an mboxd daemon reset in the BMC, a BMC reboot, etc.) OPAL may not get a response within the timeout period. In such scenarios, OPAL resends the message until the max resend count is reached. OPAL uses synchronous IPMI messages (ipmi_queue_msg_sync()) for a few operations like flash read, write, etc. The thread waits in OPAL until it gets a response from the BMC. In some corner cases like a BMC reboot, the thread may wait in OPAL for a long time (more than 20 seconds), which results in a kernel hard lockup. This patch introduces a new interface to disable the message resend option. We will disable the message resend option for synchronous messages. This greatly reduces kernel hard lockup issues. This is a short term fix. The long term solution is to convert all synchronous messages to asynchronous ones. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-02-15 | core/opal: Print PIR value in exit path | Vasant Hegde | 1 | -2/+2
[ Upstream commit 554062d7fe5aac2e1a65a15a0385946a1fb6f8f4 ] Useful for debugging. CC: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-02-15 | core/ipmi: Improve error message | Vasant Hegde | 1 | -1/+2
[ Upstream commit 7516e3827e5044442b9b79fa44fe118e101207c8 ] Useful for debugging. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2019-02-13 | firmware-versions: Add test case for parsing VERSION | Stewart Smith | 18 | -147/+364
[ Upstream commit 3170270be92ad945600d25ced9352c39fc7f156a ] Also make it possible to use with afl-lop/afl-fuzz just to help make *sure* we're all good. Additionally, if we hit an entry in VERSION that is larger than our buffer size, we skip over it gracefully rather than overwriting the stack. This is only a problem if VERSION isn't trusted, which as of 4b8cc05a94513816d43fb8bd6178896b430af08f it is verified as part of Secure Boot. CC: stable # v5.9+ Fixes: 9727fe384b8685270d344201f7e051475eea3a0b [stewart: fix up include ordering for building on centos7] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
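Skipping an oversized entry instead of copying it can be sketched as follows. This is a simplified illustration rather than the skiboot parser: `ENTRY_MAX` and `count_valid_entries()` are hypothetical, and the entry format is assumed to be one newline-terminated string per line.

```c
#include <stddef.h>
#include <string.h>

#define ENTRY_MAX 16	/* hypothetical size of the on-stack buffer */

/*
 * Count the newline-separated entries in buf that fit in an
 * ENTRY_MAX-byte buffer (including the terminating NUL). Entries
 * that are too long are stepped over and never copied.
 */
static int count_valid_entries(const char *buf, size_t len)
{
	char entry[ENTRY_MAX];
	size_t pos = 0;
	int valid = 0;

	while (pos < len) {
		const char *nl = memchr(buf + pos, '\n', len - pos);
		size_t n = nl ? (size_t)(nl - (buf + pos)) : len - pos;

		if (n > 0 && n < sizeof(entry)) {
			memcpy(entry, buf + pos, n);
			entry[n] = '\0';
			valid++;
		}
		/* Advance past this entry and its newline, copied or not. */
		pos += n + 1;
	}
	return valid;
}
```

The length check happens before any copy, so the stack buffer can never be written past its end regardless of what the input contains.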
2019-02-13 | core/cpu: HID update race | Nicholas Piggin | 1 | -2/+2
[ Upstream commit d27180b55d7740a711f2a6417eed02782a1cd536 ] If the per-core HID register is updated concurrently by multiple threads, updates can get lost. This has been observed during fast reboot where the HILE bit does not get cleared on all cores, which can cause machine check exception interrupts to crash. Fix this by only updating HID on thread0. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-11 | cpufeatures: Always advertise POWER8NVL as DD2 | Alexey Kardashevskiy | 1 | -4/+5
[ Upstream commit 17975a6e645768c4651199055052a13858db6506 ] Despite the major version of PVR being 1 (0x004c0100) for POWER8NVL, these chips are functionally equivalent to P8/P8E DD2 levels. This advertises POWER8NVL as DD2. As a result, skiboot adds ibm,powerpc-cpu-features/processor-control-facility for such CPUs and the linux kernel can use hypervisor doorbell messages to wake secondary threads; otherwise "KVM: CPU %d seems to be stuck" would appear because of missing LPCR_PECEDH. Fixes: 7f4c8e8ce0b "dt: add /cpus/ibm,powerpc-cpu-features device tree bindings" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-11 | core/lock: Stop drop_my_locks() from always causing abort | Reza Arbab | 1 | -1/+1
[ Upstream commit 9ef153f6f013b224db8e9b78764ef6cf89c152fa ] The loop in drop_my_locks() looks like this:
    while((l = list_pop(&this_cpu()->locks_held, struct lock, list)) != NULL) {
        if (warn)
            prlog(PR_ERR, " %s\n", l->owner);
        unlock(l);
    }
Both list_pop() and unlock() call list_del(). This means that on the last iteration of the loop, the list will be empty when we get to unlock_check(), causing this:
    LOCK ERROR: Releasing lock we don't hold depth @0x30493d20 (state: 0x0000000000000001)
    [13836.000173140,0] Aborting!
    CPU 0000 Backtrace:
     S: 0000000031c03930 R: 000000003001d840 ._abort+0x60
     S: 0000000031c039c0 R: 000000003001a0c4 .lock_error+0x64
     S: 0000000031c03a50 R: 0000000030019c70 .unlock+0x54
     S: 0000000031c03af0 R: 000000003001a040 .drop_my_locks+0xf4
To fix this, change list_pop() to list_top(). Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
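The pop-versus-top difference can be reproduced in a simplified model. Everything below is a hypothetical stand-in: `held_pop()` and `held_top()` play the roles of list_pop() and list_top(), and the empty-list test in `unlock()` models the unlock_check() failure described above. It is a sketch of the failure mode, not the skiboot lock code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for skiboot's struct lock. */
struct lock {
	bool held;
	struct lock *next;
};

static struct lock *locks_held;	/* models this_cpu()->locks_held */
static int lock_errors;		/* counts "lock we don't hold" reports */

/* Record a lock as held by pushing it on the list. */
static void take(struct lock *l)
{
	l->held = true;
	l->next = locks_held;
	locks_held = l;
}

/* Unlink l from the held list if it is there. */
static void held_del(struct lock *l)
{
	struct lock **p;

	for (p = &locks_held; *p; p = &(*p)->next) {
		if (*p == l) {
			*p = l->next;
			l->next = NULL;
			return;
		}
	}
}

/* Like list_pop(): remove and return the first entry. */
static struct lock *held_pop(void)
{
	struct lock *l = locks_held;

	if (l)
		held_del(l);
	return l;
}

/* Like list_top(): return the first entry without removing it. */
static struct lock *held_top(void)
{
	return locks_held;
}

/* unlock() checks the held list, then unlinks the lock itself. */
static void unlock(struct lock *l)
{
	if (!locks_held) {
		lock_errors++;	/* models the unlock_check() failure */
		return;
	}
	held_del(l);
	l->held = false;
}

/* Buggy loop: the pop empties the list before the final unlock(). */
static void drop_my_locks_buggy(void)
{
	struct lock *l;

	while ((l = held_pop()) != NULL)
		unlock(l);
}

/* Fixed loop: peek with top and let unlock() do the removal. */
static void drop_my_locks(void)
{
	struct lock *l;

	while ((l = held_top()) != NULL)
		unlock(l);
}
```

In this model the buggy loop reports exactly one error, on the last lock, while the fixed loop releases every lock cleanly.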
2018-12-11 | i2c: Fix i2c request hang during opal init if timers are not checked | Frederic Barrat | 1 | -0/+16
If an i2c request cannot go through the first time, because the bus is found in error and needs a reset or it's locked by the OCC for example, the underlying i2c implementation uses timers to manage the request. However during opal init, opal pollers may not be called; it depends on the context in which the i2c request is made. If the pollers are not called, the timers are not checked and we can end up with an i2c request which will not move forward, and skiboot hangs. Fix it by explicitly checking the timers if we are waiting for an i2c request to complete and it seems to be taking a while. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10 | vpd: Force static analysis to not think about NULL term strings | Stewart Smith | 1 | -1/+1
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10 | opal_sync_host_reboot: clarify when we return OPAL_BUSY_EVENT | Stewart Smith | 1 | -6/+4
Basically to shut up static analysis of using a boolean in a non-boolean context (bitwise). Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10 | opal_trace_entry: Move ifdef around to shut up static analysis | Stewart Smith | 1 | -2/+4
Again, this makes things look slightly different so I don't keep seeing the static analysis warning. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10 | mem_region.c: Move ifdef for MEM_POISON to shut up static analysis | Stewart Smith | 1 | -2/+6
The static analysis tool is arguably wrong and should go away. But... I'm sick of keeping coming back to it and reviewing the false positives enough to make a slight change to where ifdefs are. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10 | Change ifdef around dump_fdt() to shut up static analysis | Stewart Smith | 1 | -2/+2
This is a dumb warning from a certain static analysis tool that a function has no effect when the ifdef that would make it have an effect isn't defined and we replace it with a no-op impl. Putting the #ifdef around the call just so I don't have to discount this damn static analysis false positive every time I go and look at the results. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-12-10 | core/cpu.c: avoid container_of(NULL) in next_cpu() | Stewart Smith | 1 | -5/+5
A certain finicky static analysis tool did point out that we were operating on a value that could be null (and since first_cpu() calls next_cpu(NULL) to get the first one, it also gets to be complained about as next_cpu() could act on that NULL pointer). So, rework things to shut the static analysis tool up, when in fact this was never a problem. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-28 | Don't warn on "long" OPAL_RESYNC_TIMEBASE calls | Stewart Smith | 1 | -1/+1
On P8 this is called when we exit fastsleep, and we shouldn't measure the "time" spent in the call for what (in retrospect) is an obvious reason. Fixes: 50ea35c2d07874755c03e6ae2bdf7a33ad2c768a Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-21 | Warn on long OPAL calls | Stewart Smith | 1 | -0/+9
Measure entry/exit time for OPAL calls and warn appropriately if the calls take too long (>100ms gets us a DEBUG log, > 1000ms gets us a warning). Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
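The two thresholds can be expressed as a small classification function. This is a minimal sketch assuming millisecond timestamps; `classify_call()` and the `call_severity` enum are hypothetical names, and only the 100ms and 1000ms limits come from the commit message.

```c
#include <stdint.h>

enum call_severity { CALL_OK, CALL_DEBUG, CALL_WARNING };

/* Thresholds from the commit message: >100ms debug, >1000ms warning. */
#define DEBUG_THRESHOLD_MS	100
#define WARN_THRESHOLD_MS	1000

/* Decide how loudly to log a call given its entry and exit times. */
static enum call_severity classify_call(uint64_t entry_ms, uint64_t exit_ms)
{
	uint64_t elapsed = exit_ms - entry_ms;

	if (elapsed > WARN_THRESHOLD_MS)
		return CALL_WARNING;
	if (elapsed > DEBUG_THRESHOLD_MS)
		return CALL_DEBUG;
	return CALL_OK;
}
```

Checking the larger threshold first keeps the comparisons strict and non-overlapping, so a call of exactly 1000ms is still only a debug message.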
2018-11-19 | ipmi: Reduce ipmi_queue_msg_sync() polling loop time to 10ms | Stewart Smith | 1 | -1/+1
On a plain boot, this reduces the time spent in OPAL by ~170ms on p9dsu. This is due to hiomap (currently) using synchronous IPMI messages. It will also *significantly* reduce latency on runtime flash operations, as we'll spend typically 10-20ms in OPAL rather than 100-200ms. It's not an ideal solution to that, but it's a quick and obvious win for jitter. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-11-01 | core/flash: Log return code when ffs_init() fails | Andrew Jeffery | 1 | -1/+1
Knowing the return code is at least better than not knowing the return code. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-31 | Run pollers in time_wait() when not booting | Stewart Smith | 1 | -2/+2
This only bit us hard with hiomap in one scenario. Our OPAL API has been that OPAL_POLL_EVENTS may be needed to make forward progress on ongoing operations, and the internal to skiboot API has been that time_wait() of a suitable time will run pollers (on at least one CPU) to help ensure forward progress can be made. In a perfect world, interrupts are used but they may a) be disabled, or b) the thing we're doing can't use interrupts because computers are generally terrible. Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run pollers only on the boot CPU, and not if we held any locks. This was to reduce the chance of programming code that could deadlock, as well as to ensure that we didn't just thrash all the cachelines for running pollers all over a large system during boot, or hard spin on the same locks on all secondary CPUs. The problem arises if the OS we're booting makes an OPAL call early on, with interrupts disabled, that requires a poller to run to make forward progress. An example of this would be OPAL_WRITE_NVRAM early in Linux boot (where Linux sets up the partitions it wants) - something that occurs iff we've had to reformat NVRAM this boot (i.e. first boot or corrupted NVRAM). The hiomap implementation should arguably *not* rely on synchronous IPMI messages, but this is a future improvement (as it was for mbox before it). The mbox-flash code solved this problem by spinning on check_timers(). More generically though, the approach of running the pollers when no longer booting means we behave more in line with what the API is meant to be, rather than have this odd case of "time_wait() for a condition that could also be tripped by an interrupt works fine unless the OS is up and running but hasn't set interrupts up yet". Fixes: 529bdca0bc546a7ae3ecbd2c3134b7260072d8b0 Fixes: 3db397ea5892a8b348cf412739996731884561b3 Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
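One reading of the resulting policy is sketched below as a pure decision function. The function name and its three boolean parameters are hypothetical; the rules are inferred from the commit message (no polling while holding locks, and during boot only the boot CPU polls) and are not copied from the skiboot source.

```c
#include <stdbool.h>

/*
 * Decide whether a wait should run pollers.
 *  - Never poll while holding locks (avoids deadlock-prone code).
 *  - While booting, only the boot CPU polls.
 *  - Once the OS is running, any CPU may poll.
 */
static bool should_run_pollers(bool holds_locks, bool is_boot_cpu,
			       bool booting)
{
	if (holds_locks)
		return false;
	if (booting && !is_boot_cpu)
		return false;
	return true;
}
```

The case this commit changes is the third argument: a secondary CPU making an early OPAL call after boot now gets pollers run on its behalf, where previously it would have waited without polling.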
2018-10-25 | Revert "TEMPORARY HACK: Disable verifying VERSION" | Stewart Smith | 1 | -6/+1
This reverts commit f835684365273c5ff1b7c700ddc0f9c1a859363f. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-25 | Quieten 'warnings' now that SIO is disabled | Stewart Smith | 1 | -1/+2
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-25 | npu2-opencapi: Enable presence detection on ZZ | Frederic Barrat | 1 | -6/+0
Presence detection for opencapi adapters was broken for ZZ planars v3 and below. All ZZ systems currently used in the lab have had their planar upgraded, so we can now remove the override we had to force presence and activate presence detection, which should improve boot time. Considering the state of opal support on ZZ, this is really only for lab usage on BML. The opencapi enablement team has okay'd the change. In the unlikely case somebody tries opencapi on an old ZZ, the presence detection through i2c will show that no adapter is present and skiboot won't try to access or train the link. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-25 | cpu: Quieten OS endian switch messages | Joel Stanley | 1 | -2/+2
Users see these when loading an OS from Petitboot:
    [ 119.486794100,5] OPAL: Switch to big-endian OS
    [ 120.022302604,5] OPAL: Switch to little-endian OS
This is expected and doesn't provide any information the user can act on. Switch them to PR_INFO so they still appear in the log, but not on the serial console. Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-23 | core/device: NULL pointer dereference fix | Nicholas Piggin | 1 | -1/+4
This was caught with unmapped memory dereference page faults. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-23 | core/flash: NULL pointer dereference fixes | Nicholas Piggin | 2 | -7/+14
These were caught with unmapped memory dereference page faults. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-16 | opal/hmi: Wakeup the cpu before reading core_fir | Vaibhav Jain | 1 | -6/+15
When stop state 5 is enabled, reading the core_fir during an HMI can result in an xscom read error, with xscom_read() returning an OPAL_XSCOM_PARTIAL_GOOD error code and a core_fir value of all FFs. At present this error code is not handled in decode_core_fir(), hence the invalid core_fir value is sent to the kernel, which interprets it as a FATAL hmi, causing a system check-stop. This can be prevented by forcing the core to wake up before reading the core_fir. Hence this patch wraps the call to read_core_fir() within calls to dctl_set_special_wakeup() and dctl_clear_special_wakeup(). Suggested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-16 | core/flash: Ignore prefix when comparing versions. | Samuel Mendoza-Jonas | 1 | -2/+16
The Skiboot version can include a "skiboot-" prefix if built with something like Buildroot. The property being compared against won't include this so ignore it. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
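Ignoring the prefix amounts to stepping past it before comparing. The helpers below are a minimal sketch with hypothetical names (`strip_version_prefix()`, `versions_match()`); only the "skiboot-" prefix itself comes from the commit message.

```c
#include <string.h>

#define VERSION_PREFIX "skiboot-"

/* Skip a leading "skiboot-" if present; otherwise return v unchanged. */
static const char *strip_version_prefix(const char *v)
{
	size_t n = strlen(VERSION_PREFIX);

	if (strncmp(v, VERSION_PREFIX, n) == 0)
		return v + n;
	return v;
}

/* Compare two version strings, ignoring the prefix on either side. */
static int versions_match(const char *a, const char *b)
{
	return strcmp(strip_version_prefix(a), strip_version_prefix(b)) == 0;
}
```

Stripping both arguments keeps the comparison symmetric, so it does not matter which side carries the Buildroot-style prefix.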
2018-10-16 | core/device: Test dt_new_check() | Stewart Smith | 1 | -2/+4
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-16 | core/device: increase test coverage for dt_new_addr and dt_new_2addr | Stewart Smith | 1 | -0/+2
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-16 | core/device: add test for dt_new() a duplicate node | Stewart Smith | 1 | -0/+1
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-16 | core/device: Add test for duplicate nodes with dt_attach_root() | Stewart Smith | 1 | -1/+8
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-16 | gcov: Fix building with GCC8 | Stewart Smith | 2 | -7/+8
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-11 | platform: Restructure bmc_platform type | Andrew Jeffery | 2 | -4/+25
Segregate the BMC platform configuration into hardware and software components. This allows population of platform default values for hardware configuration that may no longer be accessible by the host. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> [stewart: fixup pci-quirk unit test] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-10 | core/flash: Unlock around blocklevel calls in NVRAM accessors | Andrew Jeffery | 1 | -0/+11
This ensures progress when we don't have interrupts available for IPMI. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-10 | core/flash: Only lock around flashes update in flash_register() | Andrew Jeffery | 1 | -6/+2
Previously flash_register() held flash_lock across ffs_init(), which calls through the blocklevel layer to read the flash. This is unhelpful with the IPMI HIOMAP protocol transport as LPC interrupts have not yet been enabled and we are relying on polling to progress. The held lock stalls the boot as we take the nopoll path in time_wait() while completing ipmi_queue_msg_sync() in libflash/ipmi-flash.c. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-10 | core/lock: Use try_lock_caller() in lock_caller() to capture owner | Andrew Jeffery | 1 | -1/+1
Otherwise we can get reports of core/lock.c owning the lock, which is not helpful when tracking down ownership issues. Signed-off-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-10 | core/lock: don't set bust_locks on lock error | Nicholas Piggin | 1 | -2/+0
bust_locks is a big hammer that guarantees a mess if it's set while all other threads are not stopped. I propose removing this in the lock error paths. In debugging the previous deadlock false positive, none of the error messages printed, and the in-memory console was totally garbled due to lack of locking. I think it's generally better for debugging and system integrity to keep locks held when lock errors occur. Lock busting should be used carefully, just to allow messages to be printed out or machine to be restarted, probably when the whole system is single-threaded. Skiboot is slowly working toward that being feasible with co-operative debug APIs between firmware and host, but for the time being, difficult lock crashes are better not to corrupt everything by busting locks. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-10-10 | core/lock: fix timeout warning causing a deadlock false positive | Nicholas Piggin | 1 | -6/+15
If a lock waiter exceeds the warning timeout, it prints a message while still registered as requesting the lock. Printing the message can take locks, so if one is held when the owner of the original lock tries to print a message, it will get a false positive deadlock detection, which brings down the system. This can easily be hit when there is a lot of HMI activity from a KVM guest, where the timebase was not returned to host timebase before calling the HMI handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-27 | init: Fix starting stripped kernel | Michael Neuling | 1 | -0/+1
Currently if we try to run a raw/stripped binary kernel (i.e. without the ELF header) we crash with:
    [ 0.008757768,5] INIT: Waiting for kernel...
    [ 0.008762937,5] INIT: platform wait for kernel load failed
    [ 0.008768171,5] INIT: Assuming kernel at 0x20000000
    [ 0.008779241,3] INIT: ELF header not found. Assuming raw binary.
    [ 0.017047348,5] INIT: Starting kernel at 0x0, fdt at 0x3044b230 14339 bytes
    [ 0.017054251,0] FATAL: Kernel is zeros, can't execute!
    [ 0.017059054,0] Assert fail: core/init.c:590:0
    [ 0.017065371,0] Aborting!
This is because we haven't set kernel_entry correctly in this path. This fixes it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-27 | opal/hmi: Handle early HMIs on thread0 when secondaries are still in OPAL. | Mahesh Salgaonkar | 1 | -0/+49
When the primary thread receives a CORE level HMI for timer facility errors while secondaries are still in OPAL, thread 0 ends up in rendez-vous waiting for secondaries to get into hmi handling. This is because OPAL runs with MSR(EE=0) and hence HMIs are delayed on secondary threads until they are given to Linux OS. Fix this by adding a check for secondary state and forcing them into hmi handling by queuing a job on secondary threads. I have tested this by injecting an HDEC parity error very early during Linux kernel boot. Recovery works fine for non-TB errors. But if TB is bad at this very early stage we are already doomed. Without this patch we see:
    [ 285.046347408,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
    [ 285.051160609,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
    [ 285.055359021,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [ 285.055361439,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e14000) Timer Facility Error
    [ 286.232183823,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc1)
    [ 287.409002056,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc1)
    [ 289.073820164,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc1)
    [ 290.250638683,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc2)
    [ 291.427456821,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc2)
    [ 293.092274807,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc2)
    [ 294.269092904,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc3)
    [ 295.445910944,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc3)
    [ 297.110728970,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc3)
After this patch:
    [ 259.401719351,7] OPAL: Start CPU 0x0841 (PIR 0x0841) -> 0x000000000000a83c
    [ 259.406259572,7] OPAL: Start CPU 0x0842 (PIR 0x0842) -> 0x000000000000a83c
    [ 259.410615534,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
    [ 259.415444519,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
    [ 259.419641401,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [ 259.419644124,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e04000) Timer Facility Error
    [ 259.419650678,7] HMI: Sending hmi job to thread 1
    [ 259.419652744,7] HMI: Sending hmi job to thread 2
    [ 259.419653051,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [ 259.419654725,7] HMI: Sending hmi job to thread 3
    [ 259.419654916,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [ 259.419658025,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
    [ 259.419658406,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:2: TFMR(2e12002870e04000) Timer Facility Error
    [ 259.419663095,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:3: TFMR(2e12002870e04000) Timer Facility Error
    [ 259.419655234,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:1: TFMR(2e12002870e04000) Timer Facility Error
    [ 259.425109779,7] OPAL: Start CPU 0x0845 (PIR 0x0845) -> 0x000000000000a83c
    [ 259.429870681,7] OPAL: Start CPU 0x0846 (PIR 0x0846) -> 0x000000000000a83c
    [ 259.434549250,7] OPAL: Start CPU 0x0847 (PIR 0x0847) -> 0x000000000000a83c
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-20 | fast-reboot: verify firmware "romem" checksum | Nicholas Piggin | 2 | -0/+52
This takes a checksum of skiboot memory after boot that should be unchanged during OS operation, and verifies it before allowing a fast reboot. This is not read-only memory from skiboot's point of view, because it includes things like the opal branch table that gets populated during boot. This helps to improve the integrity of firmware against host and runtime firmware memory scribble bugs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
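The record-then-verify pattern can be sketched as below. This is an illustration under stated assumptions: the additive checksum is chosen only for brevity (the commit message does not say which algorithm skiboot uses), and `mem_csum()`, `romem_record()` and `romem_verify()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t romem_csum;	/* recorded once boot has finished */

/* Simple additive checksum, chosen only for illustration. */
static uint64_t mem_csum(const void *start, size_t len)
{
	const uint8_t *p = start;
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/* Call after boot, once the region has reached its final contents. */
static void romem_record(const void *start, size_t len)
{
	romem_csum = mem_csum(start, len);
}

/* Call before a fast reboot; false means the region was scribbled on. */
static bool romem_verify(const void *start, size_t len)
{
	return mem_csum(start, len) == romem_csum;
}
```

Recording after boot rather than at build time is what allows the region to include data that is populated during boot, such as the branch table mentioned above.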
2018-09-20 | skiboot.lds.S: move read-write data after the end of symbol map | Nicholas Piggin | 1 | -2/+0
This also tidies up linker script symbol declarations and adds _rodata_mem symbol for the next change to use. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-20 | core/mem_region: mambo reserve kernel payload areas | Nicholas Piggin | 11 | -3/+40
Mambo image payloads get overwritten by the OS and by fast reboot memory clearing because they have no region defined. Add them, which allows fast reboot to work. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [stewart: fix up 'make check'] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-19 | core/fast-reboot: print the fast reboot disable reason | Nicholas Piggin | 1 | -5/+7
Once things start to go wrong, disable_fast_reboot can be called a number of times, so make the first reason sticky, and also print it to the console at disable time. This helps with making sense of fast reboot disables. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
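A sticky first reason can be kept with a single pointer that is only ever set once. The sketch below assumes a hypothetical global and prints with `fprintf()` in place of skiboot's logging; the message wording is illustrative, not the actual console text.

```c
#include <stddef.h>
#include <stdio.h>

/* NULL while fast reboot is still allowed; otherwise the first reason. */
static const char *fast_reboot_disabled;

static void disable_fast_reboot(const char *reason)
{
	if (fast_reboot_disabled)
		return;	/* keep the first reason, ignore later ones */

	/* Illustrative message; the real wording may differ. */
	fprintf(stderr, "Fast reboot disabled: %s\n", reason);
	fast_reboot_disabled = reason;
}
```

Because later calls return early, the console shows one line naming the original cause instead of a cascade of follow-on reasons.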
2018-09-18 | Actually add /ibm,opal/fast-reboot property | Stewart Smith | 1 | -0/+11
I missed a hunk when merging :( Reported-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Fixes: 7c8e1c6f89f3aac77661cfcee75ab515bd053d75 Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-18 | Add fast-reboot property to /ibm,opal DT node | Stewart Smith | 1 | -0/+2
This means that if it's permanently disabled on boot, the test suite can pick that up and not try a fast reboot test. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17 | nvram: Fix wait-for-nvram message | Oliver O'Halloran | 1 | -2/+3
We print a message when nvram_query() needs to wait for the NVRAM to be loaded from the BMC/FSP. Currently this is printed at PR_WARNING which is excessive since this doesn't actually indicate that anything is wrong. There's also nothing that we can really do about loading the NVRAM being slow, so just print this at PR_DEBUG. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-09-17 | nvram: Print how long we waited for nvram | Oliver O'Halloran | 2 | -0/+11
Print how long we had to wait for NVRAM to become available if we needed to wait. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>