Age | Commit message | Author | Files | Lines |
|
This increases various timeouts as per CQ SW283991, which should help
with some external drawers and GPUs.
We also fix up the timeouts in the PEC, which HB won't do before GA3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
On a re-ipl or warm reboot, Sapphire asserts and deasserts PERST to each slot.
This results in the fpga image loaded into the flash for a CAPP adapter. HMIs
have been observed with a 200ms wait following PERST deassert, so bump time up
to 1s. Do this for all cases because re-ipl does not preserve memory and we'd
need a mechanism for Sapphire to know that there is a CAPP adapter.
We might be able to reduce this to 750ms or 500ms but need more testing. Use
1s to be safe. Also, phyp fw uses 1s after deassert.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Presently we log an informational event if an OCC timeout happens
during boot. Change the severity to Unrecoverable Error.
Also update the elog description.
Sample Output:
|------------------------------------------------------------------------------|
| Entry Id Commit Time SubSystem Committed by |
| Platform Id State Event Severity Ascii Str |
|------------------------------------------------------------------------------|
| 0x53A530C8 10/09/2014 10:13:06 CEC Hardware Subsystem OC |
| 0xB0000001 Sent to Hypervisor Unrecoverable Error BB82C013 |
|------------------------------------------------------------------------------|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch enables injecting PCI errors to DMA addresses,
including 32-bit and 64-bit ranges.
BZ: 115222
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch enables injecting PCI errors to the 64-bit MMIO range.
BZ: 115222
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When doing error injection to the 32-bit MMIO range, a fixed length of 8MB
is used. That's incorrect as one PE might span multiple segments.
Also, the 32-bit MMIO segment size isn't necessarily 8MB.
The patch fixes the issue to cover all (contiguous) 32-bit MMIO
segments assigned to the specified PE. Also, it fixes the 48 bits
of 50 bits AIB address, instead of all bits used for comparison.
BZ: 115222
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
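The idea of covering every contiguous 32-bit MMIO segment owned by a PE, rather than a fixed 8MB window, can be sketched as below. This is only an illustration under stated assumptions: the segment-to-PE table, `struct mmio_range` and `pe_m32_range()` are hypothetical names, not the PHB3 or P7IOC code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the MMIO window to match for injection */
struct mmio_range {
	uint64_t base;
	uint64_t size;
};

/*
 * seg_to_pe[i] holds the PE number that owns 32-bit MMIO segment i
 * (hypothetical table). Find the first contiguous run of segments
 * owned by 'pe' and turn it into a base/size pair, using the real
 * segment size instead of assuming 8MB.
 */
bool pe_m32_range(const uint16_t *seg_to_pe, size_t nsegs,
		  uint64_t m32_base, uint64_t seg_size,
		  uint16_t pe, struct mmio_range *out)
{
	size_t first = 0, count = 0;

	for (size_t i = 0; i < nsegs; i++) {
		if (seg_to_pe[i] == pe) {
			if (count == 0)
				first = i;
			count++;
		} else if (count) {
			break;	/* end of the contiguous run */
		}
	}
	if (!count)
		return false;	/* PE owns no 32-bit MMIO segment */

	out->base = m32_base + first * seg_size;
	out->size = count * seg_size;
	return true;
}
```

With 1MB segments and a PE that owns segments 1 to 3, the resulting range starts one segment above the window base and is 3MB long.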
|
|
The patch refactors the code we had for PCI error injection. It
doesn't change the logic:
* Rename error types and functions according to the
comments given by Michael Ellerman when reviewing the kernel
counterpart.
* Split the backend of error injection for PHB3 and P7IOC into
multiple functions to improve code readability. Some logic
is simplified without affecting the original functionality.
* Misc cleanup like renaming variables and functions.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Fix for nodes > 0. No need to map to node and local chip id. Just pass i as
chip id. Remove unnecessary braces.
In set_capp_recoverable, return not recovered if phb not found.
Found by Milton Miller.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Code cleanup.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Use real functionality based flags instead of a mode list in the DT
and other cleanups & missing bits (this one actually builds !)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Based on email from JT Kellington, Dave Larson, and Joe McGill and feedback
from Ben H.
handle_malfunction reads the bits in the malf alert reg, checks for
is_capp_recoverable, and returns 1 if recoverable. It also calls into phb3 to
put phb3 in capp error recovery state. Returns 0 if not capp recoverable and
it's a TODO to add the logic to check the other FIRs.
Don't send message when malf alert empty. Use return code -1 to tell
opal_handle_hmi to swallow the event. Also, with locking, only one thread per
core will send the message instead of all threads.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Take a lock before handle_hmi_event per Ben's suggestion. So, when we clear
events, only one thread per core will report it.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Add a flag indicating the CAPP unit is in recovery. When a capp recoverable
malfunction HMI comes in, the HMI handler will call into
phb3_set_capp_recovery, which will set the flag and send the event to
Linux.
EEH will call phb3_next_error which will tell it the phb is fenced.
EEH will then call into sapphire to reinitialize the phb which contains steps
3-5 of capp recovery procedure. The code increases wait time of PERST to 1s to
ensure fpga download is complete before polling linkup.
EEH will then rebind the cxl driver and it will complete recovery once it
initializes and turns snoops on, steps 7-8, completing capp recovery procedure.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For user initiated capp recovery, provide a mode to turn snoops off. The perst
alone does not turn snoops off and we need to do this as part of the capp
recovery procedure before reinitializing the phb.
A second mode turns snoops back on after recovery. The driver needs to do this
after it reinitializes the PSL otherwise tlbies could come in before the psl is
initialized. Also write 0 to capp error status and control as part of the
recovery procedure.
Put modes as flag defines in opal.h so the driver can pick them up.
Add a dt property "ibm,capi-modes" which tells the driver which modes sapphire
supports, for backwards compatibility with older opals. Also, the driver can
disable reset in sysfs if not supported.
Move the mode checking into phb3.c so it's all in one place.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
FLUSH_SUE_STATE_MAP change fixes a problem with recovery. We were using an old
lab value that marked PTE entries in a shared state. After recovery, PTE
entries were getting flushed out to memory with an SUE, resulting in a machine
check. The new value means PTE entries are dropped on recovery.
For APC_MASTER_PB_CTRL, the spec says to use the initfile value and bit 3 should
be set. The initfile is missing bit 3, so do a RMW. Bit 3 enables CAPP combined
response.
CAPP_EPOCH_TIMER_CTRL enables epoch timers and the recovery timer when recovery
is enabled. Also relax epoch timer period mask due to a bug.
TRANSPORT_CONTROL reg: set bit 37 (rfs_benign_ptr_data) in addition to the spec
value. Should be set in the initfile in future.
Rename APC_MASTER_CONFIG to APC_MASTER_CAPI_CTRL to match workbook name.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch changes fsp_opal_get_dpo_status function to return
OPAL_WRONG_STATE when not in DPO pending state. This will help
the host to differentiate whether the system is in DPO pending
state or not and then analyse the returned timeout value correctly.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
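A minimal sketch of how a host might use the new return code to decide whether the timeout value is meaningful. The numeric value of OPAL_WRONG_STATE is an assumption taken from the OPAL API headers, and `classify_dpo_status()` and `enum dpo_state` are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define OPAL_SUCCESS		0
#define OPAL_WRONG_STATE	(-14)	/* assumed value, check opal.h */

/* Hypothetical host-side view of the DPO state */
enum dpo_state {
	DPO_NOT_PENDING,	/* no DPO in progress, ignore the timeout */
	DPO_PENDING,		/* DPO in progress, timeout is valid */
	DPO_QUERY_FAILED	/* any other error from the call */
};

/*
 * Interpret the return code of the DPO status call. 'timeout_secs'
 * is the value the call handed back; it is copied to 'secs_left'
 * only when the system really is in the DPO pending state.
 */
enum dpo_state classify_dpo_status(int64_t rc, int64_t timeout_secs,
				   int64_t *secs_left)
{
	if (rc == OPAL_WRONG_STATE)
		return DPO_NOT_PENDING;
	if (rc != OPAL_SUCCESS)
		return DPO_QUERY_FAILED;

	*secs_left = timeout_secs;
	return DPO_PENDING;
}
```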
|
|
Right now if the OPAL message queuing fails, the FSP never gets
the ack back for the original DPO initiation message it had sent
previously. With this patch, if the OPAL message queuing fails to
send the DPO message to the host, it still acks the FSP about the
original message but with error flags.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch adds a positive return statement after handling
the DPO message from the FSP. Previously it returned a negative
value in all cases.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch cleans up multiple printf statements and also
introduces a couple of defines to reflect the byte position
signatures present on the FSP DPO initiation command.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The FPGA used on some open power machines generates regular pulses instead
of levels. In that case, reading the status might fail since it's not
latched, so also check the latched event bit in the XIVR.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Create a device-node which will be used by Linux for matching
and use a saner default time if IPMI doesn't work.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The platform probe code might want to add things to it.
While at it, make add_cpu_idle_state_properties() local to slw.c
and call it from slw_init() instead of from add_opal_node().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For PCIe devices, there are 2 bits used to control completion
timeout as follows:
PCIe Cap + 0x24, Device Capabilities 2 Register, bit#4
PCIe Cap + 0x28, Device Control 2 Register, bit#4
The patch adds function pci_disable_completion_timeout(), which
is called during bootup or after PE reset.
It's in response to bug#114961
Suggested-by: Michael A. Perez <perezma@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
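The two bits above work as a capability/control pair: bit#4 of Device Capabilities 2 reports whether the completion timeout can be disabled, and bit#4 of Device Control 2 actually disables it. A minimal sketch of that check follows. It operates on register values that are already read; the macro names and `disable_completion_timeout()` are hypothetical and this is not the body of pci_disable_completion_timeout().

```c
#include <stdbool.h>
#include <stdint.h>

/* Offsets relative to the PCIe capability, as listed above */
#define PCIE_DEVCAP2_OFFSET	0x24	/* Device Capabilities 2 */
#define PCIE_DEVCTL2_OFFSET	0x28	/* Device Control 2 */

/* bit#4 in both registers */
#define DEVCAP2_COMP_TIMEOUT_DIS_SUPP	(1u << 4)
#define DEVCTL2_COMP_TIMEOUT_DIS	(1u << 4)

/*
 * Given the current register values, set the disable bit in the
 * control value only if the capability register says the device
 * supports it. Returns true when *devctl2 must be written back to
 * config space at (PCIe cap + PCIE_DEVCTL2_OFFSET).
 */
bool disable_completion_timeout(uint32_t devcap2, uint16_t *devctl2)
{
	if (!(devcap2 & DEVCAP2_COMP_TIMEOUT_DIS_SUPP))
		return false;	/* not supported, leave control alone */

	*devctl2 |= DEVCTL2_COMP_TIMEOUT_DIS;
	return true;
}
```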
|
|
The patch adds function pci_device_init(), which is called by
phb->ops->device_init() to apply common initialization on the
specified PCI device during bootup or after PE reset.
Currently, we only put the logic of MPS configuration to the
function, but more will be put there.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Keep it 0 for open-power platforms where OCC is going to be preloaded.
This also avoids an annoying 1 minute delay on early openpower and bml when
there is no OCC firmware to wait for.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Probably due to the way we spin, we seem to still be hitting the
odd case where we fail to reinit due to a secondary not having quite
reached the right state inside skiboot. Let's bump the timeout up.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch provides the in-band support for reading the 'console-select'
system parameter. It also adds the console support to honour the system
param for switching the console type in P8 systems.
Tested-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
|
|
Commit 9f64cb20 introduced a spurious unconditional byteswap, which we
don't need for HAVE_BIG_ENDIAN.
Signed-off-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
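The fix amounts to swapping bytes only when the host is not already big-endian. A minimal sketch of that pattern is below. It keys off the compiler's `__BYTE_ORDER__` macro rather than the HAVE_BIG_ENDIAN define mentioned above, and `HOST_IS_BIG_ENDIAN`, `swap32()` and `to_be32()` are illustrative names, not the ones used in the tree.

```c
#include <stdint.h>

/* Assumption: a compiler without __BYTE_ORDER__ is treated as little-endian */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define HOST_IS_BIG_ENDIAN 1
#else
#define HOST_IS_BIG_ENDIAN 0
#endif

/* Unconditional 32-bit byte swap */
static inline uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) |
	       ((v & 0x0000ff00u) << 8)  |
	       ((v & 0x00ff0000u) >> 8)  |
	       ((v & 0xff000000u) >> 24);
}

/* Convert a CPU-order value to big-endian: a no-op on big-endian hosts */
uint32_t to_be32(uint32_t v)
{
	return HOST_IS_BIG_ENDIAN ? v : swap32(v);
}
```

Whatever the host byte order, the most significant byte of the input ends up first in memory.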
|
|
Match the fast-sleep name between OPAL and HB
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
|
|
Add IPMI GET_SEL_TIME and SET_SEL_TIME commands to the IPMI stack.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Our libc now has a proper implementation of mktime, which makes adding
tm structures together easy. This patch makes the FSP RTC functions
use the library functions and removes the generic time calculation
code from the FSP RTC driver.
The OPAL<->tm conversion functions are also made public as they will
be useful for the IPMI RTC implementation.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For the case where the survserver on the fsp server is dead (for whatever
reason [1]), even before the first time query via sysparam of the surv
status by sapphire, we get an error response to the sysparam query.
We should apparently trigger a HIR in that case (same as phyp).
[1] survserver has a real bug on a 'fsptelinit --disablerecovery'
followed by a 'kill -9 <survserver_pid>'
Fixes https://bugzilla.linux.ibm.com/show_bug.cgi?id=114646.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
|
|
Both the FSP RTC and the upcoming IPMI RTC implementation need to
manipulate time in various ways. Rather than re-implementing slightly
different versions of the calculations twice, let's implement some
standard library functions (with tests) and use those.
This patch adds mktime and gmtime_r to the libc.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
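For readers unfamiliar with what a mktime-style conversion has to compute, here is a minimal UTC-only sketch that turns a broken-down `struct tm` into seconds since 1970-01-01. It uses a well-known days-from-civil-date calculation and is not the implementation added by this patch; `days_from_civil()` and `utc_mktime()` are hypothetical names.

```c
#include <stdint.h>
#include <time.h>

/* Days since 1970-01-01 for a Gregorian date (month 1..12, day 1..31) */
static int64_t days_from_civil(int64_t y, int m, int d)
{
	int64_t era, yoe, doy, doe;

	y -= (m <= 2);			/* treat Jan/Feb as end of previous year */
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;					/* [0, 399] */
	doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;	/* [0, 365] */
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;		/* [0, 146096] */
	return era * 146097 + doe - 719468;
}

/* UTC-only counterpart of mktime(): no timezone or DST handling */
int64_t utc_mktime(const struct tm *tm)
{
	int64_t days = days_from_civil(tm->tm_year + 1900,
				       tm->tm_mon + 1, tm->tm_mday);

	return ((days * 24 + tm->tm_hour) * 60 + tm->tm_min) * 60 + tm->tm_sec;
}
```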
|
|
Commit e810dcbc (ATTN: Set up attention area to handle attention) broke
tests, as the family of CPU_TO_BEXX macros are not compile-time constants.
hdata/test/../spira.c:60:4: error: initializer element is not constant
.addr = CPU_TO_BE64((unsigned long)&(cpu_ctl_spat_area) + SKIBOOT_BASE)
There is no test coverage of this code, so for now we can comment out
these areas in order to allow the tests to pass.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
We are missing a prlog for tests. This adds a dumb version that ignores
the log level and uses printf to display all messages.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
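A test-only logging stub of the kind described can be as small as the sketch below: it accepts a log level, discards it, and forwards everything to printf-style output. The name `test_prlog()` and its return value are assumptions for illustration; the stub in the tree may well be a macro with a different signature.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * "Dumb" logger for unit tests: the level is ignored and every
 * message is printed to stdout. Returns what vprintf() returns,
 * i.e. the number of characters written.
 */
int test_prlog(int level, const char *fmt, ...)
{
	va_list ap;
	int n;

	(void)level;		/* all levels are displayed */

	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return n;
}
```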
|
|
This pulls in a fix for warnings in our tests:
hdata/test/../spira.c:64:64: warning: suggest parentheses around ‘+’
in operand of ‘&’ [-Wparentheses]
.addr = CPU_TO_BE64((unsigned long)&(cpu_ctl_sp_attn_area1) + SKIBOOT_BASE)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
I misread the spec when implementing the chassis control message.
This fixes the message, as well as correcting the naming of the IPMI
fields to better reflect what they represent.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For the case where the survserver on the fsp server is dead (for whatever
reason [1]), even before the first time query via sysparam of the surv
status by sapphire, we get an error response to the sysparam query.
We should apparently trigger a HIR in that case (same as phyp).
[1] survserver has a real bug on a 'fsptelinit --disablerecovery'
followed by a 'kill -9 <survserver_pid>'
Fixes https://bugzilla.linux.ibm.com/show_bug.cgi?id=114646.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
|
|
This change fixes a bug found by Alistair Popple: we have a stray '9' in
the count of non-leap-years in 400 years. This will cause an incorrect
result from tm_add if the TOD cache is >400 years old.
Signed-off-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
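The constant in question can be cross-checked by brute force. The sketch below counts the non-leap years in a 400-year Gregorian cycle; `is_leap_year()` and `non_leap_years_in_cycle()` are illustrative helpers, not functions from the tree.

```c
#include <stdbool.h>

/* Gregorian rule: divisible by 4, except centuries not divisible by 400 */
bool is_leap_year(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Count the non-leap years among the 400 years starting at 'start' */
int non_leap_years_in_cycle(int start)
{
	int count = 0;

	for (int y = start; y < start + 400; y++)
		if (!is_leap_year(y))
			count++;
	return count;
}
```

Any 400 consecutive years contain 97 leap years and 303 non-leap years, so a full cycle is 303 * 365 + 97 * 366 = 146097 days.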
|
|
Now that the log automatically timestamps entries, remove the tb print
in the error paths.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
It seems that when we committed the IPMI/BT driver we updated the
device tree compatible property for the iBT interface. Unfortunately
Palmetto still requires a DT fixup for this node and somewhere along
the way there was a typo.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch adds an OPAL interface to fetch the DPO timeout. This
functionality is required to synchronously query Sapphire about
how many seconds remain before a forced system shutdown, which
is useful in cases where the host has missed the OPAL_MSG_DPO for
some reason such as system boot, reboot or kexec operations. This
ensures the host can still query the DPO timeout status and act.
This patch also adds helper routine to convert time base into seconds.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
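A minimal sketch of the timebase-to-seconds conversion and of how a remaining-time query could use it. The 512MHz timebase frequency is an assumption about the platform, and `TB_HZ`, `tb_to_secs()` and `dpo_secs_remaining()` are illustrative names rather than the helpers added by this patch.

```c
#include <stdint.h>

#define TB_HZ	512000000ULL	/* assumed timebase frequency (512MHz) */

/* Convert a timebase tick count to whole seconds (rounds down) */
uint64_t tb_to_secs(uint64_t tb)
{
	return tb / TB_HZ;
}

/*
 * Seconds left before the forced shutdown, given the timebase value
 * captured when the DPO was initiated, the current timebase value
 * and the total DPO timeout. Returns 0 once the timeout has expired.
 */
uint64_t dpo_secs_remaining(uint64_t start_tb, uint64_t now_tb,
			    uint64_t timeout_secs)
{
	uint64_t elapsed = tb_to_secs(now_tb - start_tb);

	return elapsed >= timeout_secs ? 0 : timeout_secs - elapsed;
}
```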
|
|
This patch moves the DPO message handling from FSP core code into
a separate file to make it cleaner and to add OPAL interfaces
in the subsequent patch. It does not change anything functionally.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch changes the log message prefix from EPOW to FSPEPOW,
as this standard is followed everywhere in the FSP-specific code
base. This also changes a bit in the file header.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For better debuggability, the patch adds git version and backtrace
details to user data section (along with file info which was already
present).
After adding required details TermImmedData looks like:
TermImmedData |
| 00000000 63386631 6639322D 64697274 793A0000 c8f1f92-dirty:.. |
| 00000010 00000000 00000000 00000000 00000000 ................ |
| 00000020 00000000 00000000 43505520 30303030 ........CPU 0000 |
| 00000030 30303264 20426163 6B747261 63653A0A 002d Backtrace:. |
| 00000040 20533A20 30303030 30303030 33316162 S: 0000000031ab |
| 00000050 36626130 20523A20 30303030 30303030 6ba0 R: 00000000 |
| 00000060 33303031 33306238 0A20533A 20303030 300130b8. S: 000 |
| 00000070 30303030 30333161 62366334 3020523A 0000031ab6c40 R: |
| 00000080 20303030 30303030 30333030 34623738 000000003004b78 |
| 00000090 380A2053 3A203030 30303030 30303331 8. S: 0000000031 |
| 000000A0 61623663 63302052 3A203030 30303030 ab6cc0 R: 000000 |
| 000000B0 30303330 30313736 31300A20 533A2030 0030017610. S: 0 |
| 000000C0 30303030 30303033 31616236 64343020 000000031ab6d40 |
| 000000D0 523A2030 30303030 30303033 30303035 R: 0000000030005 |
| 000000E0 3133340A 20533A20 30303030 30303030 134. S: 00000000 |
| 000000F0 33316162 36663030 20523A20 30303030 31ab6f00 R: 0000 |
| 00000100 30303030 33303030 32353534 0A000000 000030002554.... |
| 00000110 00000000 00000000 00000000 00000000 ................ |
| 00000120 00000000 00000000 00000000 00000000 ................ |
| 00000130 00000000 00000000 00000000 00000000 ................ |
| 00000140 00000000 00000000 00000000 00000000 ................ |
| 00000150 00000000 00000000 00000000 00000000 ................ |
| 00000160 00000000 00000000 00000000 00000000 ................ |
| 00000170 00000000 00000000 00000000 00000000 ................ |
| 00000180 00000000 00000000 00000000 00000000 ................ |
| 00000190 00000000 00000000 00000000 00000000 ................ |
| 000001A0 00000000 00000000 00000000 00000000 ................ |
| 000001B0 00000000 00000000 00000000 00000000 ................ |
| 000001C0 00000000 00000000 00000000 636F7265 ............core |
| 000001D0 2F6F7061 6C2E633A 3233383A 30000000 /opal.c:238:0... |
|------------------------------------------------------------------------------|
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Generating src dynamically results in:
1. Difficulty in documenting and for field people to understand.
2. It might also conflict with existing srcs.
Hence add default SRC in SRC section.
Assert function call address in hex word 2.
errl -d <elog-entry-id>:
..
| Reference Code : BB821410 |
| Hex Words 2 - 5 : 30017610 00000000 00000000 00000000 |
..
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The existing backtrace() dumps the backtrace to stderr.
Add __backtrace(), which dumps the backtrace to a buffer. backtrace()
now calls __backtrace() internally and dumps the result to stderr.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
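The split described above, a buffer-filling core plus a thin stderr wrapper, can be sketched as follows. Stack walking is left out: the caller passes in the return addresses. `format_trace()`, `print_trace()` and the line format are hypothetical stand-ins, not the actual __backtrace() and backtrace() code.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Core routine: format 'count' return addresses into 'buf', one per
 * line. Stops when the next line would not fit. Returns the number
 * of characters written, not counting the terminating NUL.
 */
size_t format_trace(char *buf, size_t len,
		    const unsigned long *addrs, size_t count)
{
	size_t used = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (size_t i = 0; i < count; i++) {
		int n = snprintf(buf + used, len - used,
				 " R: %016lx\n", addrs[i]);

		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0';	/* drop the partial line */
			break;
		}
		used += (size_t)n;
	}
	return used;
}

/* Wrapper: build the text with the core routine, then dump to stderr */
void print_trace(const unsigned long *addrs, size_t count)
{
	char buf[512];

	format_trace(buf, sizeof(buf), addrs, count);
	fputs(buf, stderr);
}
```

Keeping the formatting in a buffer-based routine lets other consumers, such as an error log entry, reuse the same text without going through stderr.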
|