Age | Commit message (Collapse) | Author | Files | Lines |
|
The patch refactors the code we had for PCI error injection. It
doesn't change the logic:
* Rename names of error types and functions according to the
comments given by Michael Ellerman when reviewing the kernel
counterpart.
* Split the backend of error injection for PHB3 and P7IOC into
multiple functions to improve code readability. Some logic
is simplified without affecting its original functionality.
* Misc cleanup like renaming variables and functions.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Fix for nodes > 0. No need to map to node and local chip id. Just pass i as
chip id. Remove unnecessary braces.
In set_capp_recoverable, return not recovered if phb not found.
Found by Milton Miller.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Code cleanup.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Based on email from JT Kellington, Dave Larson, and Joe McGill and feedback
from Ben H.
handle_malfunction reads the bits in the malf alert reg, checks for
is_capp_recoverable, and returns 1 if recoverable. It also calls into phb3 to
put phb3 in capp error recovery state. Returns 0 if not capp recoverable and
it's a TODO to add the logic to check the other FIRs.
Don't send message when malf alert empty. Use return code -1 to tell
opal_handle_hmi to swallow the event. Also, with locking, only one thread per
core will send the message instead of all threads.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Take a lock before handle_hmi_event per Ben's suggestion. So, when we clear
events, only one thread per core will report it.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For user initiated capp recovery, provide a mode to turn snoops off. The perst
alone does not turn snoops off and we need to do this as part of the capp
recovery procedure before reinitializing the phb.
A second mode turns snoops back on after recovery. The driver needs to do this
after it reinitializes the PSL otherwise tlbies could come in before the psl is
initialized. Also write 0 to capp error status and control as part of the
recovery procedure.
Put modes as flag defines in opal.h so the driver can pick them up.
Add a dt property "ibm,capi-modes" which tells the driver which modes sapphire
supports, for backwards compatibility with older opals. Also, the driver can
disable reset in sysfs if not supported.
Move the mode checking into phb3.c so it's all in one place.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Create a device-node which will be used by Linux for matching
and use a saner default time if IPMI doesn't work.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The platform probe code might want to add things to it.
While at it, make add_cpu_idle_state_properties() local to slw.c
and call it from slw_init() instead of from add_opal_node().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For PCIe devices, there are 2 bits used to control completion
timeout as follows:
PCIe Cap + 0x24, Device Capabilities 2 Register, bit#4
PCIe Cap + 0x28, Device Control 2 Register, bit#4
The patch adds function pci_disable_completion_timeout(), which
is called during bootup or after PE reset.
It's in response to bug#114961
Suggested-by: Michael A. Perez <perezma@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
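
The two bits named above can be illustrated with a minimal sketch. It
assumes, following the PCIe specification, that bit 4 of Device
Capabilities 2 advertises support for disabling the completion timeout
and that bit 4 of Device Control 2 performs the disable. The fake
config space and the helper names are hypothetical and are not
skiboot's actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets relative to the PCIe capability, as listed in the commit message. */
#define PCIE_DEVCAP2      0x24
#define PCIE_DEVCAP2_CTDS (1u << 4) /* Completion Timeout Disable Supported */
#define PCIE_DEVCTL2      0x28
#define PCIE_DEVCTL2_CTD  (1u << 4) /* Completion Timeout Disable */

/* Hypothetical stand-in for one device's config space (little-endian bytes). */
struct fake_cfg {
	uint8_t space[256];
};

static uint16_t cfg_read16(const struct fake_cfg *c, uint32_t off)
{
	return (uint16_t)(c->space[off] | (c->space[off + 1] << 8));
}

static uint32_t cfg_read32(const struct fake_cfg *c, uint32_t off)
{
	return (uint32_t)cfg_read16(c, off) |
	       ((uint32_t)cfg_read16(c, off + 2) << 16);
}

static void cfg_write16(struct fake_cfg *c, uint32_t off, uint16_t val)
{
	c->space[off] = (uint8_t)(val & 0xff);
	c->space[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Returns true if the timeout was disabled, false if the device does not
 * advertise support. `ecap` is the offset of the PCIe capability.
 */
static bool disable_completion_timeout(struct fake_cfg *c, uint32_t ecap)
{
	uint16_t ctl;

	assert(ecap + PCIE_DEVCTL2 + 2 <= sizeof(c->space));

	if (!(cfg_read32(c, ecap + PCIE_DEVCAP2) & PCIE_DEVCAP2_CTDS))
		return false;

	ctl = cfg_read16(c, ecap + PCIE_DEVCTL2);
	cfg_write16(c, ecap + PCIE_DEVCTL2, ctl | PCIE_DEVCTL2_CTD);
	return true;
}
```

Checking the capability bit first keeps the write away from devices
that do not implement the control bit.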
|
|
The patch adds function pci_device_init(), which is called by
phb->ops->device_init() to apply common initialization on the
specified PCI device during bootup or after PE reset.
Currently, we only put the logic of MPS configuration to the
function, but more will be put there.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Probably due to the way we spin, we seem to still be hitting the
odd case where we fail to reinit due to a secondary not having quite
reached the right state inside skiboot. Let's bump the timeout up.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Add IPMI GET_SEL_TIME and SET_SEL_TIME commands to the IPMI stack.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Our libc now has a proper implementation of mktime, which makes adding
tm structures together easy. This patch makes the FSP RTC functions
use the library functions and removes the generic time calculation
code from the FSP RTC driver.
The OPAL<->tm conversion functions are also made public as they will
be useful for the IPMI RTC implementation.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
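
As a rough illustration of why a proper mktime helps: standard
mktime() renormalizes out-of-range fields, so time arithmetic can be
done by adjusting one field and letting the library carry the
overflow. The helper name below is hypothetical and the commit does
not show the FSP RTC code itself.

```c
#include <assert.h>
#include <time.h>

/*
 * Add `secs` seconds to *tm in place. mktime() carries any overflow in
 * tm_sec into minutes, hours, days, months and years.
 */
static void tm_add_seconds(struct tm *tm, int secs)
{
	time_t t;

	tm->tm_sec += secs;
	tm->tm_isdst = -1; /* let the library decide about DST */
	t = mktime(tm);
	assert(t != (time_t)-1);
}
```

For example, adding 45 seconds to 2014-12-31 23:59:30 rolls over to
2015-01-01 00:00:15 without any hand-written calendar logic.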
|
|
We are missing a prlog for tests. This adds a dumb version that ignores
the log level and uses printf to display all messages.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
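
A test-only stub of this kind could look roughly like the sketch
below. The function name test_prlog is hypothetical; the actual patch
may well use a macro instead.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Test-only stand-in: the level is accepted but deliberately ignored. */
static int test_prlog(int log_level, const char *fmt, ...)
{
	va_list ap;
	int n;

	(void)log_level;
	assert(fmt != NULL);

	va_start(ap, fmt);
	n = vprintf(fmt, ap); /* every message goes to stdout */
	va_end(ap);
	return n;
}
```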
|
|
I misread the spec when implementing the chassis control message.
This fixes the message, as well as correcting the naming of the IPMI
fields to better reflect what they represent.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Acked-by: Jeremy Kerr <jeremy.kerr@au.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Existing backtrace will dump the backtrace to stderr.
__backtrace will dump the backtrace to buffer. backtrace()
will call __backtrace internally and dump it to stderr.
Signed-off-by: Aruna Balakrishnaiah <aruna@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Ensure a thread is not stopping its siblings from making forward
progress when we are busy-waiting on older DD1.x CPU revisions where
SMT priorities are somewhat broken.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Rebooting and power down for the Palmetto is done by the BMC, which we
speak to over the BT interface using IPMI. Implement the IPMI chassis
commands which are used for power control, and hook them up to the
palmetto platform callbacks for shutdown and reboot.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch adds a basic IPMI layer to the sapphire core and support
for a BT IPMI interface as found on the Aspeed BMC of the Palmetto
platform.
[ Changed the compatible property -- BenH ]
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Adds a fake RTC that can be initialized via a named reserve in the
device tree that may, at some point, be on NVRAM.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The complete reset could be issued by the kdump kernel to remove pending
PCI traffic in order to avoid EEH errors in the kdump scenario. However,
the bus numbers configured into PCI bridges would be lost after the
reset, so some PCI devices (e.g. IPR) could not be probed successfully
by the kdump kernel.
The patch fixes the above issue by restoring bus numbers after complete
reset. It's in response to bug#113210
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Currently, the tasks of scanning PHBs are done on the master CPU one
by one. The patch does the same tasks on multiple CPUs in order to
save boot time, with the help of additional flags on the PHB.
With the patch applied, we save 22 seconds: the time to reset and
scan 8 PHBs on one P8 box drops from 37 seconds to 15 seconds.
NOTE: the printed logs during PCI enumeration should include
PHB index to be self-explaining enough. I'll fix it later.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
As Rolf reported, 2 downstream ports from different PHBs are
connected to the same physical bridge, which supports virtual
"partitioned" functionalities. A fundamental reset issued on
one PHB affects the functionality used by the other PHB during
PCI enumeration. Eventually, we can't detect the functionality
and all devices behind it on one of the two PHBs.
The patch splits PCI enumeration to reset all PHBs first and then scan
them one by one to avoid the above issue. Also, the patch replaces
PCI_MAX_PHBs, which is used heavily, with ARRAY_SIZE.
Reported-by: Rolf Brudeseth <rolfb@us.ibm.com>
Suggested-by: Benjamin Herrenschmidt <benh@au1.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This enables (advanced) users to vary what level of output they want
at runtime in the memory console and through console drivers (fsp/uart)
You can vary two things by poking in the debug descriptor:
a) what log level is printed at all
e.g. only turn on PR_TRACE at specific points during runtime
b) what log level goes out the fsp/uart console
defaults to PR_PRINTF
We use two 4bit numbers (1 byte) for this in debug descriptor (saving
some space, not needlessly wasting space that we may want in future).
The default is 0x75 (7=PR_DEBUG to in memory console, 5=PR_PRINTF to drivers)
If you write 0x77 you will get debug info on uart/fsp console as
well as in memory. If you write 0x95 you get PR_INSANE in memory but
still only PR_NOTICE through drivers.
People who write something like 0x1f will get a very quiet boot indeed.
A future patch would (when possible) peek at device tree entries
to decide whether we should change the default.
A future patch would add an OPAL API to get/set this.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
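
The packing described above can be sketched as follows. The level
numbers are the ones quoted in the message (5, 7, 9). The helper names
are hypothetical, and the "message level <= threshold" comparison is
an assumption about how the thresholds are applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Level numbers as quoted in the commit message. */
#define PR_PRINTF 5
#define PR_DEBUG  7
#define PR_INSANE 9

/* High nibble: in-memory console threshold. Low nibble: driver threshold. */
static uint8_t pack_log_levels(unsigned int mem, unsigned int drv)
{
	assert(mem <= 0xf && drv <= 0xf);
	return (uint8_t)((mem << 4) | drv);
}

static unsigned int memory_level(uint8_t b) { return b >> 4; }
static unsigned int driver_level(uint8_t b) { return b & 0xf; }

/* Assumed rule: lower numbers are more severe, so keep level <= threshold. */
static bool logged_to_memory(uint8_t b, unsigned int level)
{
	return level <= memory_level(b);
}

static bool sent_to_drivers(uint8_t b, unsigned int level)
{
	return level <= driver_level(b);
}
```

With the default 0x75, a PR_DEBUG message lands in the memory console
but is not pushed to the fsp/uart drivers; writing 0x77 sends it to
both.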
|
|
We modify write() (adding console_write()) which calls down to
a modified __flush_console() which can now decide if it's flushing
the added console contents to the console drivers or not.
A future patch may add support for changing PR_NOTICE to some other level
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Moving assert_fail() out of libc and into core/utils.c so that we can
sanely call prlog(PR_EMERG).
We shorten it from three fputs calls down to one prlog() call.
This may increase the number of cycles and stack usage for when we
hit an assert, which may not be desirable.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
When handling assert and we're going to fail, get the message out
with a high priority.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
If we're printing a backtrace, things have probably gone horribly,
horribly wrong - highest log priority.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Replace the libc printf implementation with a wrapper that does
fancy log things such as display timestamp and the log level.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This is the initial patch for having timestamps in the log.
It currently only wraps prerror to our prlog() function and thus only
(very slightly) modifies bootup log.
We use the timebase as an indication of the progression of time. It is
not perfect, and is indeed reset back to zero during boot, but it should
serve adequately for our needs of "approximately this much time elapsed
between log entries".
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
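
A minimal sketch of turning a timebase value into a log prefix is
shown below. The 512 MHz timebase frequency, the "[seconds.micros] "
format and the function name are all assumptions made for
illustration; the prefix the patch actually emits may differ.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TB_HZ 512000000ull /* assumed timebase frequency */

/* Write a "[sec.usec] " prefix for timebase value `tb` into buf. */
static int format_log_prefix(char *buf, size_t len, uint64_t tb)
{
	uint64_t secs, usecs;

	assert(buf != NULL && len > 0);

	secs = tb / TB_HZ;
	usecs = (tb % TB_HZ) / (TB_HZ / 1000000); /* 512 ticks per usec */
	return snprintf(buf, len, "[%" PRIu64 ".%06" PRIu64 "] ", secs, usecs);
}
```

Because the timebase is reset during boot, such a prefix is only
useful for the elapsed time between nearby log entries, which matches
the stated goal.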
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
This patch implements basic framework for TOD error recovery. To start
with, this patch implements TOD sync check error recovery as an example.
Currently this patch recovers from sync check errors on non-master chips.
We can use same framework and recover from more TOD errors.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
With new proposed change, Linux will get the HMI interrupt directly. Linux
will then invoke opal_handle_hmi to handle HMI recovery in opal. After
handling HMI errors, opal will generate an OPAL HMI event and queue it up
in opal message infrastructure so that Linux host can pull the event
and act upon it accordingly. This patch also adds new message type for
HMI event.
Changes in v2:
- Removed the token argument from opal_handle_hmi()
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Move the original hmi handler to new file core/hmi.c. No functionality
change, just a code movement and variable name change.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch introduces a new OPAL API opal_pci_eeh_freeze_set(),
which allows setting the frozen state for the specified PE, so that
we can support "compound" PE in the kernel.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Though the p7ioc spec states the errors triggered by PAPR error
injection register set (0x2b0, 0x2b8, 0x2c0) should be one-shot
without "sticky" bit, Firebird-L machine doesn't follow the rule.
It will cause endless frozen PE until we have to remove the PE
permanently.
The patch extends opal_pci_reset() allowing kernel to clear PAPR
error injection register set at appropriate point.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch introduces a new OPAL API opal_pci_err_inject() for injecting
PCI errors.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
So my great attempt at avoiding all re-entrancies fails due to HBRT... at
least until we have some kind of way to thread things, it will have to
re-enter so let's bite the bullet, make the poller list walking lockless
(we'll handle removal when we have to, ie, not yet) and slightly extend
the coverage of the PSI lock while at it. All the other pollers already
have their own locks anyway so we are actually removing some overhead.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
And check & warn inside opal_run_pollers() as well
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
In case where we don't want to recurse into opal pollers
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
For debug purposes essentially
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Otherwise we don't handle surveillance and PSI link monitoring.
This should fix cases of surveillance timeouts during things
like code update, such as BZ109939.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|