Age | Commit message (Collapse) | Author | Files | Lines |
|
There are a couple of places we can set/unset fence for a brick:
1. MISC register: NPU2_MISC_FENCE_STATE
2. NTL register for the brick: NPU2_NTL_MISC_CFG1(ndev)
Recent testing of ATS in combination with GPU reset has exposed a side
effect of using (1); if fence is set for all six bricks, it triggers a
sticky nmmu latch which prevents the NPU from getting ATR responses.
This manifests as a hang in the tests.
We have npu2_dev_fence_brick() which uses (1), and only two calls to it.
Replace the call which sets fence with a write to (2). Remove the
corresponding unset call entirely. It's unneeded because the procedures
already do a progression from full fence to half to idle using (2).
Signed-off-by: Reza Arbab <arbab@linux.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 5ff8763c9b0421d8de0f4346ca211c853d2406d4)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This helps some cards train on the second PERST (ie fast-reboot). The
reason is not clear why but it helps, so YOLO!
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 9078f8268922b44c3b0f2cd44f567b9389073142)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
I'm going to defer training to this state soon, so move the tracing
first.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit efc4020a32fbb199c58ada9315d64a175162d066)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We want to get through this as fast as possible so minimise by
removing msecs_to_tb() call. Changes number passed from 512 -> 1.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit da05882b8e6e146b5b4121b1e177c4aea47de8f2)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Processor FRU vpd doesn't contain vendor detail. We have to parse
module VPD to get vendor detail.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 861350941f9a3fb76ebcae3e5a32b3cbec929d03)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
On OpenPower system, VPD keyword size tells us the maximum size of the data.
But they fill trailing end with space (0x20) instead of NULL. Also spec
doesn't stop user to have space (0x20) within actual data.
This patch discards trailing spaces before populating device tree.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[stewart: fixup make check]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 77f510d35e8d60faed989496fac2de16663ff332)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
3d019581c98153 introduced clearing PCR on reinit cpus,
and until (the near future from now) qemu didn't support
this register.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 021f6f39b9bfb60e1cda5432bfe6430cb9adfed7)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Ensure that quirks are run (or not) for given PCI vendor and device IDs.
This tests the quirk infrastructure and the PCI_VENDOR_ID() and
PCI_DEVICE_ID() macros, the latter of which was recently found to be
broken.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit dc24a1fd61e0b3fbceb1027b2c458bde0257fb38)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The vendor ID is 16 bits not 8. This error leaves the top of the vendor
ID in the bottom bits of the device ID, which resulted in e.g. a failure
to run the PCI quirk for the AST VGA device.
Fixes: 2b841bf0ef1b ("core/pci: Use cached vendor/device IDs in quirks")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 50dfd067835af53736ec8106b1f0e99339b54a81)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The read offset (4:11) in Receive FIFO control register is incremented
by FIFO size whenever CRB read by NX. But the index in RxFIFO has to
match with the corresponding entry in FIFO maintained by VAS in kernel.
VAS entry is reset to 0 when opening the receive window during driver
initialization. So when NX842 is reloaded or in kexec boot, possibility
of mismatch between RxFIFO control register and VAS entries in kernel.
It could cause CRB failure / timeout from NX.
This patch adds nx_coproc_init opal call for kernel to initialize
readOffset (4:11) and Queued (15:23) in RxFIFO control register.
Fixes: 3b3c5962f432 ("NX: Add P9 NX support for 842 compression engine")
CC: stable # v5.8+
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 56026a13292453b072ad3cc9adf3dee960077f38)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
HMIs for NPU xstops are broadcasted to all chips. All cores on all the
chips receive HMI. HMI handler correctly identifies and extracts the
NPU FIR details from affected chip, but while printing FIR data it
prints chip id and location code details of this_cpu()->chip_id which
may not be correct. This patch fixes this issue.
CC: stable # v6.0+
Fixes: 7bcbc78c ("Add location code to NPU2 HMI logging")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[stewart: add fixes and cc stable]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit fa82d360a73a5e52131d8a19e9fdb9d6e9c2eeb9)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
stop1_lite has been removed since it adds no additional benefit
over stop0_lite. stop2_lite has been removed since currently it adds
minimal benefit over stop2. However, the benefit is eclipsed by the time
required to ungate the clocks
Moreover, Lite states don't give up the SMT resources, can potentially
have a performance impact on sibling threads.
Since current OSs (Linux) aren't smart enough to make good decisions
with these stop states, we're (temporarly) removing them from what
we expose to the OS, the idea being to bring them back in a new
DT representation so that only an OS that knows what to do will
do things with them.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[stewart: add to explanation]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 34e9c3c1edb3eed02f428f9cbf97d99b3db43d4d)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The memory errors (CEs and UEs) that are detected as part of background
memory scrubbing are reported by PRD asynchronously to opal-prd along with
affected memory ranges. hservice_memory_error() converts these ranges into
page granularity before hooking up them to soft/hard offline-ing
infrastructure.
But the current implementation of hservice_memory_error() does not hookup
all the pages to soft/hard offline-ing if any of the page offline action
fails. e.g hard offline can fail for:
- Pages that are not part of buddy managed pool.
- Pages that are reserved by kernel using memblock_reserved()
- Pages that are in use by kernel.
But for the pages that are in use by user space application, the hard
offline marks the page as hwpoison, sends SIGBUS signal to kill the
affected application as recovery action and returns success.
Hence, It is possible that some of the pages in that memory range are in
use by application or free. By stopping on first error we loose the
opportunity to hwpoison the subsequent pages which may be free or in use by
application. This patch fixes this issue.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit e9ee7c7d357160a704c8248a1787124f94df8c54)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Force reset was added as an attempt to work around some issues with TPM
devices locking up their I2C bus. In that particular case the problem
was that the device would hold the SCL line down permanently due to a
device firmware bug. The force reset doesn't actually do anything to
alleviate the situation here, it just happens to reset the internal
master state enough to make the I2C driver appear to work until
something tries to access the bus again.
On P9 systems with secure boot enabled there is the added problem
of the "diagostic mode" not being supported on I2C masters A,B,C and
D. Diagnostic mode allows the SCL and SDA lines to be driven directly
by software. Without this force reset is impossible to implement.
This patch removes the force reset functionality entirely since:
a) it doesn't do what it's supposed to, and
b) it's butt ugly code
Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port().
There's no need to reset every port on a master in response to an
error that occurred on a specific port.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 49656a181133013d0b436db8052e23895ad4ff11)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We have observed some TPMs clock streching the I2C bus for signifigant
amounts of time when processing commands. The same TPMs also have
errata that can result in permernantly locking up a bus in response to
an I2C transaction they don't understand. Using an excessively long
timeout to prevent this in the field.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 81d52fb22cc95cc1a8fc1001dbac361843da2662)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Set the default timeout for any bus containing a TPM to one second. This
is needed to work around a bug in the firmware of certain TPMs that will
clock strech the I2C port the for up to a second. Additionally, when the
TPM is clock streching it responds to a STOP condition on the bus by
bricking itself. Clearing this error requires a hard power cycle of the
system since the TPM is powered by standby power.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 3668dc88a1bdfc087ab7d329eabde5ac0086f9bf)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Add support for setting a default timeout for the I2C port to the
device-tree. This is consumed by skiboot.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit ac6059026442f0da98293f800aa002271d579097)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Currently if Linux boots with a non-zero PCR, things can go bad where
some early userspace programs can take illegal instructions. This is
being fixed in Linux, but in the mean time, we should cleanup in
skiboot also.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
(cherry picked from commit 3d019581c98153fc047821e3266d90852141f70d)
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
It turns out this is the IBM way to do this and we (OPAL) were just
never really told about the change.
So, with the best public docs that are available being Hostboot
source, let's make the change!
See https://github.com/open-power/hostboot/blob/8e05a4399b/src/include/usr/ipmi/ipmiif.H#L115
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The current HMI error message does not specifiy where the HMI
error occured.
The original error message was
NPU: FIR#0 FIR 0x0080100000000000 mask 0x009a48180f01ffff
The enhanced error message is
NPU2: [Loc: UOPWR.0000000-Node0-Proc0] P:0 FIR#0 FIR 0x0000100000000000 mask 0x009a48180f03ffff
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Currently the slot table init happens twice in both probe and init
functions due to the variant detection logic called with in-correct
condition check.
Fixes: d32ddea9 ("p9dsu: detect p9dsu variant even when hostboot
doesn't tell us")
Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
[stewart: renanem variant-detected to p9dsu-riser-detected]
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Without the WOF registers it's hard to figure out what went wrong first,
so print those when we print the FIRs when a fence is detected.
Suggested-by: Mike Perez <perezma@us.ibm.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
An earlier commit introduced a counter variable poller_recursion to
limit to the number number of error messages shown when opal_pollers
are run recursively. However the check for the counter value was
placed in a way that the poller recursion was only detected first 16
times and then allowed afterwards.
This patch fixes this by moving the check for the counter value inside
the conditional branch with some re-factoring so that opal_poller
recursion is not erroneously allowed after poll_recursion is detected
first 16 times.
Fixes: b6a729e118f4 ("Limit number of Poller recursion detected errors to display")
Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Stability improvements in microcode for stop4/stop5 are
available in upstream hcode images. Stop4 and stop5 can
be safely enabled by default.
Use ~0xE0000000 to cut all but stop0,1,2 in case there
are any issues with stop4/5.
example:
nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In the recent patch:
eddff9bf40 hmi: Clear unknown debug trigger
I rebased the code from an older skiboot before the HMI rework. When I
did this, I missed the handled flag. Without this the HMER is not
cleared properly and the HMI keeps happening.
This properly sets the handled flag and hence clears the HMER bit.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
op-build commit 736a08b996e292a449c4996edb264011dfe56a40
added hcode to the VERSION partition, let's parse it out
and let the user know.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
BMC Get device ID command gives BMC firmware version details. Lets add this
to device tree. User space tools will use this information to display BMC
version details.
Stewart,
I have added bmc information under /ibm,firmware-version node as its firmware
version. But may be we should add new node (/bmc/firmware). So that we can
keep BMC related information separately. Let me know your thoughts on this.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
POWER9 adds 32 bit carry and overflow bits to the XER, but we need to
set the relevant CTRL1 bit to enable them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
When on ppc64le and CROSS is not set by the environment, make assumes
ppc64 and sets a default CROSS. Check for ppc64le as well, so that
'make' works out of the box on ppc64le.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This snuck in recently.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
While CROSS can be set to a ppc64le toolchian, we don't want to build
for that target. Hardcode the target to powerpc64-linux-gnu.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Link with ld instead of gcc so we can build with clang as cc.
Remove the linker script and unnecessary flags. The application links
just fine without them.
Add cflags required by clang in order to build for the correct target.
Remove the dependency file generation. The assembly files don't include
any headers, so they weren't doing anything.
Simplify clean rule, as the $(RM) alias does -f for us, and we no longer
have .d files.
Build tested on ppc64le and amd64. Booted in Qemu on both using:
qemu-system-ppc64 -M powernv -nodefaults -nographic -serial stdio \
-kernel test/hello_world/hello_kernel/hello_kernel
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
When running this:
qemu-system-ppc64 -m 2G -M powernv -kernel debian-jessie-vmlinux \
-initrd debian-jessie-initrd.gz -nographic \
-device ipmi-bmc-sim,id=ipmi0 -device isa-ipmi-bt,bmc=ipmi0 \
-hda /tmp/debian-jessie-install.qcow2.kDubGYDrqa
We die with this error:
qemu-system-ppc64: -hda /tmp/debian-jessie-install.qcow2.kDubGYDrqa: machine type does not support if=ide,bus=0,unit=0
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Docker tries to cache stuff and it bites us.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
This contains the latest features. It's close to upstream, but there's
enough stuff in there that it probably makes sense to continue using it
for now.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
We can just use whatever qemu-img binary that's laying around,
including the distro one.
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
p9_stop_thread should fail the operation if it finds the thread was
already quiescd. This implies something else is doing direct controls
on the thread (e.g., pdbg) or there is some exceptional condition we
don't know how to deal with. Proceeding here would cause things to
trample on each other, for example the hard lockup watchdog trying to
send a sreset to the core while it is stopped for debugging with pdbg
will end in tears.
If p9_stop_thread times out waiting for the thread to quiesce, do
not hit it with a core_start direct control, because we don't know
what state things are in and doing more things at this point is worse
than doing nothing. There is no good recipe described in the workbook
to de-assert the core_stop control if it fails to quiesce the thread.
After timing out here, the thread may eventually quiesce and get
stuck, but that's simpler to debug than undefied behaviour.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Firstly, p9_cont_thread should check that the thread actually was
quiesced before it tries to resume it. Anything could happen if we
try this from an arbitrary thread state.
Then when resuming a quiesced thread that is inactive or stopped (in
a stop idle state), we must not send a core_start direct control,
clear_maint must be used in these cases.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
The minor version increments of the pstate table are backward
compatible. The minor version is changed when the pstate table
remains same and the existing reserved bytes are used for pointing
new data. So use only major version number while parsing the pstate
table. This will allow old skiboot to parse the pstate table and
handle minor version updates.
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
Now that skiboot supports building with clang we can use a modern distro
to cross compile using that compiler. Ubuntu 18.04 ships with clang 6,
so start with that.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|
|
In today's lesson, Stewart learns shell.
Fixes: e101e85c9ff65e82f7ede4d5541d921b4a3ed923
Reported-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
|