Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We have an implementation for non-FSP systems now, and we shouldn't be
calling that from code in an fsp/ directory, so move op_display() to a
platform function.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
It's been a good long while since either OPAL POWER7 user touched a
machine, and even longer since they'd have been okay using an old
version rather than tracking master.
There's also been no testing of OPAL on POWER7 systems for an awfully
long time, so it's pretty safe to assume that it's very much bitrotted.
It also saves a whole 14kb of xz compressed payload space.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Enthusiasticly-Acked-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This patch implements generic interface to pass data from FSP to HBRT during
runtime (FSP -> OPAL -> opal-prd -> HBRT).
OPAL gets notification from FSP for new HBRT messages. We will convert MBOX
message to firmware_notify format and send it to HBRT.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Kernel calls opal_get_msg() API to read OPAL message. In this path OPAL
calls "callback" handler to inform caller that kernel read the opal
message. It assumes that read is always success. This assumption was
fine as message was always fixed size.
Next patch introduces variable size opal message. In that situation
opal_get_msg() may fail due to insufficient buffer size (ex: old kernel
and new OPAL combination). So lets add `return value` parameter to
"callback" handler. So that caller knows kernel didn't read the
message and take appropriate action.
Cc: Jeremy Kerr <jk@ozlabs.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The current code has a few possible issues with string handling, and
gcc flags a number of string / buffer warnings when enabling more
checking.
Some of the issues in the file:
- Mixing of null-terminated arrays (in most cases), and non-null in the
input/output buffer format. memcpy generally should be used when the
length is known.
- Lack of input data length bounds checking. Malformed input could
cause overruns.
- String copying from same sized source and destination array sizes,
where the source is a NUL terminated string, so the strncpy copies
the string without its NUL terminator, which becomes NUL terminated
at the zeroed destination array. Compiler does not like this, and
it only works if the destination has been zeroed, so not a great
pattern.
- Attemping to NUL terminate string using strcat, which will overwrite
a byte past the end of the array if the string length is at maximum,
or worse if the input was malformed.
This patch fixes several of these issues and fixes a number of compiler
warnings. In general, the buffer and string handling could probably
benefit from a more in-depth audit.
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
BT responses are handled using a timer doing the polling. To hope to
get an answer to an IPMI synchronous message, the timer needs to run.
We can't just check all timers though as there may be a timer that
wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
and if we did enforce that as a requirement, it's a pretty subtle
API that is asking to be broken.
So, if we just run a poll function to crank anything that the IPMI
backend needs, then we should be fine.
This issue shows up very quickly under QEMU when loading the first
flash resource with the IPMI HIOMAP backend.
Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This is an adaptation of what we currently do for op_display() on FSP
machines, inventing an encoding for what we can write into the single
byte at LPC port 80h.
Port 80h is often used on x86 systems to indicate boot progress/status
and dates back a decent amount of time. Since a byte isn't exactly very
expressive for everything that can go on (and wrong) during boot, it's
all about compromise.
Some systems (such as Zaius/Barreleye G2) have a physical dual 7 segment
display that display these codes. So far, this has only been driven by
hostboot (see hostboot commit 90ec2e65314c).
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Rename ___backtrace() to backtrace_create() and ___print_backtrace() to
backtrace_print(). Get rid of __backtrace() and __print_backtrace()
wrappers.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We're about to get rid of __backtrace() and __print_backtrace(), convert
the FSP/IPMI attn code to not use them.
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Below message is confusing. Lets make it clear.
FSP sends "R/R complete notification" whenever there is a dump. We use `flag`
to identify whether its its R/R completion -OR- just new dump notification.
[ 483.406351956,6] FSP: SP says Reset/Reload complete
[ 483.406354278,5] DUMP: FipS dump available. ID = 0x1a00001f [size: 6367640 bytes]
[ 483.406355968,7] A Reset/Reload was NOT done
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
It's only used there, let's minimise our needed includes.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
If FSP changes next IPL side, then disable fast reboot.
sample output:
[ 620.196442259,5] FSP: Got sysparam update, param ID 0xf0000007
[ 620.196444501,5] CUPD: FW IPL side changed. Disable fast reboot
[ 620.196445389,5] CUPD: Next IPL side : perm
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Caught by scan-build, also constant-ify the input
parameter.
Signed-off-by: Balbir singh <bsingharora@gmail.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Currently we only call set_opal_console() to establish the backend
used by the OPAL console API if we find at least one FSP serial
port in HDAT.
On systems where there is none (IPMI only), we fail to set it,
causing the console code to try to use the dummy console causing
an assertion failure during boot due to clashing on the device-tree
node names.
So always set it if an FSP is present
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
hw/fsp/fsp.c:1011:17: warning: passing an object that undergoes default argument promotion to
'va_start' has undefined behavior [-Wvarargs]
va_start(list, add_words);
^
hw/fsp/fsp.c:1007:59: note: parameter of type 'u8' (aka 'unsigned char') is declared here
void fsp_fillmsg(struct fsp_msg *msg, u32 cmd_sub_mod, u8 add_words, ...)
^
[CC] platforms/ibm-fsp/apollo-pci.o
hw/fsp/fsp.c:1026:17: warning: passing an object that undergoes default argument promotion to
'va_start' has undefined behavior [-Wvarargs]
va_start(list, add_words);
^
hw/fsp/fsp.c:1016:47: note: parameter of type 'u8' (aka 'unsigned char') is declared here
struct fsp_msg *fsp_mkmsg(u32 cmd_sub_mod, u8 add_words, ...)
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Add support to load the imc catalog from a lid file packaged
as part of the system firmware. Lid number allocated
is 0x80f00103.lid.
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This reverts commit 20f685a3627a2a522c465716377561a8fbcc608f.
We've hit problems on Zaius machines and the needed petitboot changes
haven't made it upstream yet.
Let's revert for the time being while we sort everything out.
We probably have to keep both around for a few years.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Update fsp_lid_map to include CAPP ucode lid for phb4-chipid ==
0x202d1 that corresponds to P9 DD-2.2 chip.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
dtc tool complaining about below warning as usage of linux,stdout-path
property under /chosen node is deprecated.
dts: Warning
(chosen_node_stdout_path): Use 'stdout-path' instead of 'linux,stdout-path'
So this patch fix this by using stdout-path property on all the systems
and keep linux,stdout-path only on P8 and before. This property refers to
a node which represents the device to be used for boot console output.
Verified boot on both P8 and P9 systems with new and older kernels.
And also verified dtc warnings got fixed in both P8 and P9.
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[stewart: simplify logic]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This changes to build system to use thin archives rather than
incremental linking for built-in.o, similar to recent change to Linux.
built-in.o is renamed to built-in.a, and is created as a thin archive
with no index, for speed and size. All built-in.a are aggregated into
a skiboot.tmp.a which is a thin archive built with an index, making it
suitable or linking. This is input into the final link.
The advantags of build size and linker code placement flexibility are
not as great with skiboot as a bigger project like Linux, but it's a
conceptually better way to build, and is more compatible with link
time optimisation in toolchains which might be interesting for skiboot
particularly for size reductions.
Size of build tree before this patch is 34.4MB, afterwards 23.1MB.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This patch adds support to read u64 sensor values. This also adds
changes to the core and the backend implementation code to make this
API as the base call. Host can use this new API to read sensors
upto 64bits.
This adds a list to store the pointer to the kernel u32 buffer, for
older kernels making async sensor u32 reads.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
a. Surveillance response times out and OPAL triggers a HIR
b. Before the HIR process kicks in, OPAL gets a PSI interrupt indicating link down
c. HIR process continues and OPAL tries to write to DRCR; PSI link inactive => xstop
OPAL should confirm that the FSP is not already in reset in the HIR path.
[V2] Handle the case where a second reset is triggered due to the two resets
happening in succession.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
commit ba4d46fdd9eb ("console: Set log level from nvram") wants to read
from NVRAM rather early. This works fine on BMC based systems as
nvram_init() is actually synchronous. This is not true for FSP systems
and it turns out that the query for the console log level simply
queries blank nvram.
The simple fix is to wait for the NVRAM read to complete before
performing any query. Unfortunately it turns out that the fsp-nvram
code does not inform the generic NVRAM layer when the read is complete,
rather, it must be prompted to do so.
This patch addresses both these problems. This patch adds a check before
the first read of the NVRAM (for the console log level) that the read
has completed. The fsp-nvram code has been updated to inform the generic
layer as soon as the read completes.
The old prompt to the fsp-nvram code has been removed but a check to
ensure that the NVRAM has been loaded remains. It is conservative but
if the NVRAM is not done loading before the host is booted it will not
have an nvram device-tree node which means it won't be able to access
the NVRAM at all, ever, even after the NVRAM has loaded.
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
These messages just fill up the opal console log with useless messages
resulting in us losing useful information.
They have been like this since the first commit in skiboot. Make them
trace.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Commit fd6b71fc fixed the situation where ipmi console was open (hvc0) but got
data on different console (hvc1).
During FSP R/R OPAL closes all consoles. After R/R complete FSP requests to
open hvc1 and sends data on this. If hvc1 registration failed or not opened in
host kernel then it will not read data and results in RCU stalls.
Note that this is workaround for older kernel where we don't have separate irq
for each console. Latest kernel works fine without this patch.
CC: stable
CC: Sam Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added
error logging when buffer is full. In some corner cases kernel may call this
function multiple time and we may endup logging error again and again.
This patch fixes it by generating error log only once. I think this is enough
to indicate something went wrong.
Also with previous patch, once console buffer is full, OPAL is returning error
to payload from fsp_console_write_buffer_space(). So payload will never call
fsp_console_write(). Hence move error logging logic to right place.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Kernel calls fsp_console_write_buffer_space() to check console buffer space
availability. If there is enough buffer space to write data, then kernel will
call fsp_console_write() to write actual data.
In some extreme corner cases (like one explained in commit c8a7535f)
console becomes full and this function returns 0 to kernel (or space available
in console buffer < next incoming data size). Kernel will continue retrying
until it gets enough space. So we will start seeing RCU stalls.
This patch keeps track of previous available space. If previous space is same
as current means not enough space in console buffer to write incoming data.
It may be due to very high console write operation and slow response from FSP
-OR- FSP has stopped processing data (ex: because of ipmi daemon died). At this
point we will start timer with timeout of SER_BUFFER_OUT_TIMEOUT (10 secs).
If situation is not improved within 10 seconds means something went bad. Lets
return OPAL_RESOURCE so that kernel can drop console write and continue.
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
CC: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[stewart: reset timeout in fsp_console_write() path]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Presently we are not closing SOL and FW console sessions during R/R. Host will
continue to write to SOL buffer during FSP R/R. If there is heavy console write
operation happening during FSP R/R (like running `top` command inside console),
then at some point console buffer becomes full. fsp_console_write_buffer_space()
returns 0 (or less than required space to write data) to host. While one thread
is busy writing to console, if some other threads tries to write data to console
we may see RCU stalls (like below) in kernel.
kernel call trace:
------------------
[ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
[ 2082.828365] Task dump for CPU 32:
[ 2082.828368] kworker/32:3 R running task 0 4637 2 0x00000884
[ 2082.828375] Workqueue: events dump_work_fn
[ 2082.828376] Call Trace:
[ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
[ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
[ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
[ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
[ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
[ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
[ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
[ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68
Hence lets close SOL (and FW console) during FSP R/R.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Presently OPAL sends associate/unassociate MBOX command for all
FSP serial console (like below OPAL message). We have to check
console is available or not before sending this message.
OPAL log:
-------
[ 5013.227994012,7] FSP: Reassociating HVSI console 1
[ 5013.227997540,7] FSP: Reassociating HVSI console 2
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Commit 42d5d047 fixed scenario where DPO has been initiated, but FSP went
into reset before the CEC power down came in. But this is generic issue
that can happen in normal shutdown path as well.
Hence disable PSI link as soon as we detect FSP impending R/R.
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
CC: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM
statistics. OPAL doesn't maintain any such statistics. Hence return
FSP_STATUS_INVALID_SUBCMD.
Sample OPAL log:
[16944.384670488,3] FSP: Unhandled message eb0500
[16944.474110465,3] FSP: Unhandled message eb0500
[16945.111280784,3] FSP: Unhandled message eb0500
[16945.293393485,3] FSP: Unhandled message eb0500
With this patch, I don't think FSP will ever call "free vNVRAM" MBOX command.
But to be safer side lets return FSP_STATUS_INVALID_SUBCMD for this MBOX
command as well.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Update fsp_lid_map to include CAPP ucode lids for phb4-chipid ==
0x200d1 and phb4-chipid == 0x201d1 that corresponds to P9 DD-2.0 &
DD-2.1 chips respectively.
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
These two prints just end up filling the skiboot logs on any machine
that's been booted for more than a few hours.
They have never been useful, so make them trace level.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We use irq for reading input from console, but not in output path.
Hence do not enable input irq in write path.
Fixes : 583c8203 (fsp/console: Allocate irq for each hvc console)
CC: Sam Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-By: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
OPAL sends MBOX message to FSP and updates message state from fsp_msg_queued
-> fsp_msg_sent. fsp_sync_msg() queues message and waits until we get response
from FSP. During FSP R/R we move outstanding MBOX messages from msgq to rr_queue
including inflight message (fsp_reset_cmdclass()). But we are not resetting
inflight message state.
In extreme croner case where we sent message to FSP via fsp_sync_msg() path
and FSP R/R happens before getting respose from FSP, then we will endup waiting
in fsp_sync_msg() until everything becomes normal.
This patch adds fsp_in_rr() check to fsp_sync_msg() and return error to caller
if FSP is in R/R.
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
CAPP microcode flash download and CAPP upload for PHB4.
A new file 'capp.c' is created to receive common capp code for PHB3 and
PHB4.
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Fix coverity warning message.
Null pointer dereferences (NULL_RETURNS)
/hw/fsp/fsp-console.c: 295 in fsp_open_vserial()
289
290 fs->open = true;
291
292 fs->poke_msg = fsp_mkmsg(FSP_CMD_VSERIAL_OUT, 2,
293 msg->data.words[0],
294 msg->data.words[1] & 0xffff);
>>> CID 145796: Null pointer dereferences (NULL_RETURNS)
>>> Dereferencing a null pointer "fs->poke_msg".
295 fs->poke_msg->user_data = fs;
296
297 fs->in_buf->partition_id = fs->out_buf->partition_id = part_id;
298 fs->in_buf->session_id = fs->out_buf->session_id = sess_id;
299 fs->in_buf->hmc_id = fs->out_buf->hmc_id = hmc_indx;
300 fs->in_buf->data_offset = fs->out_buf->data_offset =
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We use TCE mapped area to write data to console. Console header
(fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates
next_in pointer in fsp_serbuf_hdr and FSP updates next_out pointer).
Kernel makes opal_console_write() OPAL call to write data to console.
OPAL write data to TCE mapped area and sends MBOX command to FSP.
If our console becomes full and we have data to write to console,
we keep on waiting until FSP reads data.
In some corner cases, where FSP is active but not responding to
console MBOX message (due to buggy IPMI) and we have heavy console
write happening from kernel, then eventually our console buffer
becomes full. At this point OPAL starts sending OPAL_BUSY_EVENT to
kernel. Kernel will keep on retrying. This is creating kernel soft
lockups. In some extreme case when every CPU is trying to write to
console, user will not be able to ssh and thinks system is hang.
If we reset FSP or restart IPMI daemon on FSP, system recovers and
everything becomes normal.
This patch adds workaround to above issue by returning OPAL_HARDWARE
when cosole is full. Side effect of this patch is, we may endup dropping
latest console data. But better to drop console data than system hang.
Alternative approach is to drop old data from console buffer, make space
for new data. But in normal condition only FSP can update 'next_out'
pointer and if we touch that pointer, it may introduce some other
race conditions. Hence we decided to just new console write request.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
For timed out FSP messages, we set message status as "fsp_msg_timeout".
But most FSP driver users (like surviellance) are ignoring this field.
They always look for FSP returned status value in callback function
(second byte in word1). So we endup treating timed out message as success
response from FSP.
Sample output:
[69902.432509048,7] SURV: Sending the heartbeat command to FSP
[70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3
....
[70023.226901445,7] SURV: Received heartbeat acknowledge from FSP
[70023.226903251,3] FSP: fsp_trigger_reset() entry
Here SURV code thought it got valid response from FSP. But actually we didn't
receive response from FSP.
This patch fixes above issue by updating status field in response structure.
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Presently we print word0 and word1 in error log. word0 contains
sequence number and command class. One has to understand word0
format to identify command class.
Lets explicitly print command class, sub command etc.
CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Now that we are using fsp_in_rr() to detect FSP reset/reload, fsp_in_reset
become redundant. Lets remove this local variable.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
fsp_opal_rtc_write() checks FSP status before queueing message to FSP. But if
FSP R/R starts before getting response to queued message then we will continue
to return OPAL_BUSY_EVENT to host. In some extreme condition host may
experience hang. Once FSP is back we will repost message, get response from FSP
and return OPAL_SUCCES to host.
This patch caches new values and returns OPAL_SUCCESS if FSP R/R is happening.
And once FSP is back we will send cached value to FSP.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently fsp-rtc reads/writes the cached RTC TOD on an fsp
reset. Use latest fsp_in_rr() function to properly read the cached rtc
value when fsp reset initiated by the hir.
Below is the kernel trace when we set hw clock, when hir process starts.
[ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
[ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
[ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
[ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
[ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
[ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic)
[ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
[ 1727.775889] CR: 28024442 XER: 20000000
[ 1727.775890] CFAR: c00000000008472c SOFTE: 1
GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
GPR12: c0000000000846e8 c00000000fba0100
[ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
[ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
[ 1727.775899] Call Trace:
[ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
[ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
[ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
[ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
[ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
[ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
[ 1727.775908] Instruction dump:
[ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
[ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4
This is found when executing the testcase
https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py
With this fix ran fsp hir torture testcase in the above test
which is working fine.
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
CC: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
.. as we reuse same msg to send next output message.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|