|
[ Upstream commit d51eb6f95e7078235ba2217e2dc9fc53e65bc902 ]
The commit f397cc30bdf8 ("phb4: Only escalate freezes on MMIO load where
necessary") introduced a change to restrict escalation to the chips that
actually need it. However it missed one case which still causes the
escalation on every chip. As a result, EEH recovery performs a full PHB
reset on some chips where it is not necessary. This patch fixes that.
Also, add a check for the P9 chip in the phb4_escalation_required() function.
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit 15b93a301509ba7813343540e25b47ba395674b9 ]
This patch implements a circumvention for HW557787. It disables the
TCE cache line buffer as, under heavy loads, there's a possibility of
an entry being re-allocated incorrectly.
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit d505f4037976ac540be1608653272ee57ae737ee ]
During OPAL boot, imc_init() checks the 24x7/IMC microcode state and,
if it is not in the running or pause state, currently removes all the
IMC devices from the device tree. Instead, remove only
the nest IMC devices. Core/Thread/Trace IMC devices are not related
to the 24x7 microcode. The patch adds a function to remove a specific IMC
device type and uses it, when pause_microcode() fails, to
remove the nest IMC device types from the device tree.
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit 46d7eafbda4006b9b858b49f9df9c63575582a92 ]
The BT IRQ may preempt the BT timer if the BMC responds to the host just as a BT message times out.
When the BT IRQ preempts the BT timer, inflight_bt_msg is not properly protected by bt.lock,
and we see the following log:
[29006114.163785853,3] BT: seq 0x81 netfn 0x0a cmd 0x23: Timeout sending message
[29006114.288029290,3] BT: seq 0x81 netfn 0x0b cmd 0x23: Timeout sending message
[29006114.288917798,3] IPMI: Incorrect netfn 0x0b in response
This may cause 'CPU Hardlock UP', 'memory refree', a kernel crash, or other failures.
Signed-off-by: lixg <867314078@qq.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit b44c7594523d20945179e497c45ec9007981ac75 ]
Currently we do not account for cancelled timer requests. So in some
corner cases we may schedule a new timer request with
new-timer-value > inflight-timer-value.
Let's explicitly check the new_target value against the inflight timer value.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit 2e654443050acdd4deffdbb44723a847ca11e6b2 ]
We schedule a timer and wait for the `timer expiry` interrupt from SBE.
If we get a new timer request that is less than the inflight timer
expiry value, we can update the timer (essentially sending a new timer chip-op;
SBE takes care of stopping the inflight timer and scheduling the new one).
SBE runs at a much slower speed than the host CPU. If we do continuous timer
updates like below, then SBE will be busy handling PSU side timer
messages and will not get time to handle FIFO side requests.
send timer chip-op -> Got ACK -> send timer chip-op
Hence this patch limits the number of continuous timer updates, and we
restart sending timer requests as soon as we get the timer expiry interrupt.
The rate limit value (2) was suggested by the SBE team.
With this patch:
If our timer requests are: 2ms, 1500us, 1000us and 800us
(and requests arrive after sending each message)
We will schedule a timer for 2ms and then update the timer for 1500us and 1000us
(These updates happen after getting the ACK interrupt from SBE)
We will not send the 800us request.
At 1000us we get `timer expiry` and we are good to send the next timer requests
(At this stage both the 1000us and 800us timeouts have happened. We will schedule
the next timer request with timeout value 500us (1500-1000)).
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
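The rate limiting described in the commit above can be sketched as a small state model. This is only an illustration of the rule as stated in the message, not the skiboot implementation: `struct timer_model`, `timer_request()`, `timer_expired()` and `TIMER_UPDATE_LIMIT` are hypothetical names, and the ACK handshake with SBE is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the value 2 is the rate limit quoted in the commit. */
#define TIMER_UPDATE_LIMIT 2

/* Hypothetical model of the timer state, not the real skiboot structure. */
struct timer_model {
	bool inflight;          /* a timer chip-op is outstanding */
	uint64_t inflight_us;   /* timeout of the inflight timer */
	unsigned int updates;   /* updates sent since the timer was scheduled */
};

/* Returns true if a timer chip-op would be sent for this request. */
bool timer_request(struct timer_model *t, uint64_t target_us)
{
	assert(t);
	if (!t->inflight) {
		/* Nothing inflight: schedule a fresh timer. */
		t->inflight = true;
		t->inflight_us = target_us;
		t->updates = 0;
		return true;
	}
	if (target_us >= t->inflight_us)
		return false;   /* inflight timer already fires early enough */
	if (t->updates >= TIMER_UPDATE_LIMIT)
		return false;   /* rate limited: wait for timer expiry */
	t->inflight_us = target_us;
	t->updates++;
	return true;
}

/* Timer expiry interrupt: updates may be sent again. */
void timer_expired(struct timer_model *t)
{
	assert(t);
	t->inflight = false;
	t->updates = 0;
}
```

Feeding it the sequence from the message (2000, 1500, 1000, 800) sends the first three requests and holds back the 800us one until `timer_expired()` is called.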
|
|
[ Upstream commit 47ab3a92298e72e44b9477a02b1312a09272a54a ]
Timer flow:
- OPAL sends timer chip-op to SBE and waits for ACK
- Until we get ACK interrupt from SBE we will not schedule any new timer
- Once we get ACK either we wait for timer expiry -OR- schedule
new one if new-timer-request < inflight-timer-timeout value.
- If we get new timer request while processing current one
p9_sbe_update_timer_expiry code sets `has_new_target` and we
schedule it in ACK path (p9_sbe_timer_resp()).
p9_sbe_timer_resp() is a callback handler and it is called without a lock.
It does not check whether the timer message (timer_ctrl_msg) is busy or not.
So in theory we may hit the scenario below and corrupt msg_list.
CPU 1 -> Timer ACK (callback handler) -- it's not holding any lock
CPU 2 -> Grabbed sbe_timer_lock -> scheduled timer --> done
CPU 3 -> p9_sbe_update_timer_expiry() -> sees timer is busy -> sets has_new_target -> done
CPU 1 -> gets chance to grab sbe_timer_lock -> saw has_new_target -> Called p9_sbe_timer_schedule() --> List corrupted !
This patch adds a timer message busy check in p9_sbe_timer_resp().
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
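A single-threaded sketch of the busy check the commit above describes. The structure and function names here are hypothetical stand-ins, and the real code runs this logic under sbe_timer_lock, which the model omits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: flags standing in for the timer message state. */
struct sbe_timer_model {
	bool msg_busy;          /* timer message is already queued */
	bool has_new_target;    /* a new target arrived while busy */
	unsigned int scheduled; /* number of times we queued the message */
};

/*
 * Model of the ACK callback. Returns true if it queued the timer message.
 * In the real code this section would run with the timer lock held.
 */
bool timer_resp_model(struct sbe_timer_model *m)
{
	assert(m);
	if (!m->has_new_target)
		return false;
	if (m->msg_busy)
		return false;   /* the check the commit adds: never queue twice */
	m->has_new_target = false;
	m->msg_busy = true;
	m->scheduled++;
	return true;
}
```

Without the `msg_busy` test, the callback would queue a message that another CPU had already queued, which is the list corruption the message walks through.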
|
|
[ Upstream commit de20b93849c3cdee62ff066e079b5460737e8609 ]
This reverts commit 5262cdd1b99f77bca5951fc8132f9795ef0c2b87.
On link reset/retrain, this method cannot maintain the max-link-speed limit, so remove it.
Signed-off-by: LuluTHSu <Lulu_Su@wistron.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit a4101173cacf79fcd91d395ab12aac9cb6840975 ]
Commit 80fd2e963bd4 ("xscom: Don't log xscom errors caused by OPAL
calls") ensured that xscom errors caused by XSCOM read/write OPAL
calls aren't logged in the error-log, since the caller of the OPAL call
is expected to handle them.
However, we still print a prerror() message in the OPAL log
for the same errors. This patch reduces the severity of the log from
PR_ERROR to PR_INFO for the xscom reads and writes made via OPAL calls.
Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Print info only for xscom read/writes made via opal calls
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
[ Upstream commit f07ea9564425d8005ab334dfa40f7cebe4e71fbf ]
XIVE VPs are structures describing the vCPUs of guests. When starting
a guest, these are allocated and enabled and some checks are done on
the location of the associated ENDs, which describe the event
queues. If the block of the VP and the block of the ENDs do not match,
the XIVE driver asserts.
Unfortunately, there is no way to check that a VP identifier is part
of a VP block that was previously allocated and it is relatively easy
to crash the host with a bogus VP id. That can be done with a QEMU
hack on a machine using vsmt.
Simply remove the assert, the OS should gracefully handle the error.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reported-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
|
|
Commit ad7e9a67c4e4 ("xive/p9: obsolete OPAL_XIVE_IRQ_SHIFT_BUG
flags") forgot to remove the internal flag.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Per the Mowgli platform spec, limit the slot to Gen3 speed.
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: LuluTHSu <Lulu_Su@wistron.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
`msg` is a valid pointer here. I don't recall why I added an assert here :-(
This is not correct; we shouldn't call assert here. Also, we are not using
`msg`, hence mark it `__unused`.
Fixes: 19d4f98e ('FSP/NVRAM: Handle "get vNVRAM statistics" command')
Cc: skiboot-stable@lists.ozlabs.org # v5.4.x +
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
If MPIPL is not enabled then we will not create the `/ibm,opal/dump` node
and we should continue to parse/retrieve SYSDUMP. I missed this scenario
when I fixed a similar issue last time :-(
Fixes: 92b7968 (fsp: Skip sysdump retrieval only in MPIPL boot)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Check the AER capability offset pointer is non-zero before enabling the
AER messages. If the device doesn't support AER we end up writing
garbage to config offset 0x0 + PCIECAP_AER_CAPCTL, or 0x18. For a normal
device this is one of the BARs so this doesn't do much, but for a bridge
this results in overriding:
0x18 - The primary bus number
0x19 - The secondary bus number
0x1A - The subordinate bus number
0x1B - The latency timer
0x1B is hardwired to zero for PCIe devices, but overwriting the bus
number register can cause issues with routing of config space accesses.
It's worth pointing out that we write actual values for the secondary
and subordinate bus numbers before scanning the secondary bus, but the
primary bus number is never restored.
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
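A minimal sketch of the guard described in the commit above. The helper name is hypothetical; the 0x18 offset for PCIECAP_AER_CAPCTL is taken from the commit text. With a zero capability pointer the computed offset would be 0x18, which on a bridge is the primary bus number register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the AER control register within the capability (from the commit). */
#define AER_CAPCTL_OFFSET 0x18

/*
 * Hypothetical helper: compute the config space offset to write, or
 * return false when the device has no AER capability (aercap == 0).
 */
bool aer_capctl_target(uint32_t aercap, uint32_t *offset)
{
	assert(offset);
	if (!aercap)
		return false;   /* no AER: writing would hit config offset 0x18 */
	*offset = aercap + AER_CAPCTL_OFFSET;
	return true;
}
```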
|
|
PHB3 had an errata about correctable errors and when Ben was doing the
initial PHB4 port he deleted the corresponding config write to DEVCTL.
Whoops.
Cc: skiboot-stable@lists.ozlabs.org
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The IMC HW targets HW ECs, not fused cores on P9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
FROM: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linux doesn't know how to properly restore state on "both halves" of
a fused core, so limit ourselves to STOP states that don't require
HV state restore for bare metal kernels (KVM is still broken) until
we add a new representation for STOP states.
The new representation will have per-state versioning so that we
can control their individual enablement based on whether the OS
has the necessary workarounds to make them work.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Set or clear the fused core mode bit in the XIVE inits
properly. While HostBoot is supposed to do it, I prefer
not depending on it doing the right thing, since we already
configure that register ourselves anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
P9 cores can be configured into fused core mode where two core chiplets
function as an 8-threaded, single core. So, bump four to eight in boot_entry
when in fused core mode and cpu_thread_count in init_boot_cpu.
The HID, AMOR, TSCR, RPR require the first active thread on that core chiplet
to load the copy for that core chiplet. So, send thread 1 of a fused core to
init_shared_sprs in boot_entry.
The code checks for fused core mode in the core thread state register and puts a
field in struct cpu_thread. This flag is checked when updating the HID and in
XIVE code when setting the special bar.
For XSCOM, the core ID is the non-fused EX. So, create macros to arrange the
bits. It's fairly verbose but somewhat readable.
This was tested on a P9 ZZ with 16 fused cores and ran HTX for over 24 hours.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
To activate the HW thread context ring, and its associated thread
interrupt registers, a thread needs to raise the VT bit in word2. This
requires access to the TIMA and this access is only granted if the
thread was first enabled at the XIVE IC level.
This is done in a sequence in xive_cpu_callin() but there is a chance
that the accesses done on the TIMA do not see the update of the enable
register.
To make sure that the enablement has completed, add an extra load on
the PC_THREAD_EN_REGx register. This guarantees that the TIMA accesses
will see the latest state of the enable register.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Fix a typo in comment about Presentation Controller Base Address Register
and another typo about code to configure the queue overflows.
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
When configuring the XIVE notification address any currently pending
interrupts will be delivered once the valid bit in the BAR is set.
Currently we enable the notify BAR before we've configured the global
interrupt number offset for the PSI interrupts. If any PSI interrupt is
pending at this point we'll send an interrupt trigger notification for
the wrong interrupt vector. This can potentially cause a checkstop since
there may not be an EAS / IVT configured for that vector.
Fix this by reordering the setup so we set up the offset before the XIVE
notification address.
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
It seems we should continue to retrieve SYSDUMP except in MPIPL boot.
Fixes: d6eb510 (fsp: Ignore platform dump notification on P9)
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This commit fixes two typos in XIVE comments about how to handle an
escalation event.
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
There are a few PRD functions which are specific to FSP/BMC. If HBRT
accidentally makes those calls we assert today, which is not good.
This patch replaces those assert()'s with an OPAL_UNSUPPORTED return value.
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
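The pattern in the commit above, returning an error code instead of asserting, might look like the sketch below. Everything here is hypothetical: `prd_call()`, `prd_fn` and `sample_prd_handler()` are made up for illustration, and the numeric value used for OPAL_UNSUPPORTED is an assumption that should be checked against opal-api.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values; check opal-api.h for the authoritative definitions. */
#define OPAL_SUCCESS      0
#define OPAL_UNSUPPORTED -7

/* Hypothetical type for a platform specific PRD handler. */
typedef int (*prd_fn)(void);

/* Hypothetical handler that a platform might provide. */
int sample_prd_handler(void)
{
	return OPAL_SUCCESS;
}

/*
 * Dispatch a platform specific PRD call. A missing handler is reported
 * to the caller instead of taking down the firmware with an assert.
 */
int prd_call(prd_fn fn)
{
	if (!fn)
		return OPAL_UNSUPPORTED;
	return fn();
}
```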
|
|
Commit 34664746 moved the opal_mpipl_save_crashing_pir() function call from
platform specific code to the generic assert() path. I completely missed
taking care of all the terminate paths :-(
This resulted in breaking `opalcore` on Linux kernel initiated MPIPL, as:
- Linux initiated MPIPL calls the platform termination function directly
- The ELF core format needs crashing CPU details to generate proper code
Hence I think it makes sense to move this back to the platform specific
terminate handler code.
Today we have two ways to trigger MPIPL based on the service processor.
- On BMC systems we call the SBE S0 interrupt
- On FSP systems we call the `attn` instruction
In future, if we add new ways to trigger MPIPL then we have to add platform
specific support code anyway, so it's fine to move this to platform
specific code.
One alternative is to make this call in every code path before making the
platform.terminate call... which is more complicated than the above approach.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
If OPAL boot fails after MPIPL init (opal_mpipl_init()) then we call MPIPL
boot instead of reboot. The BMC is not aware of MPIPL. Hence it may result in a
continuous MPIPL loop (boot -> crash -> MPIPL -> boot).
If OPAL boot fails (before loading the kernel) then it's better to call reboot,
so that the BMC can detect `n` boot failures (generally n = 3) and
stop booting. That way we can avoid a continuous loop.
This patch moves MPIPL init to the end of the init process (just before starting
the kernel), so that if we fail to boot OPAL we call a normal reboot.
This patch also introduces a new function to detect whether MPIPL is enabled
(is_mpipl_enabled()). In the assert() path we check this function
instead of the `dump` DT node, which ensures we will not call
MPIPL until opal_mpipl_init is complete.
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
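The decision described in the commit above can be modelled as a flag that is only set once MPIPL init has finished. This is a sketch with hypothetical names (the `_model` suffix marks them as stand-ins), not the skiboot code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the terminate decision. */
enum terminate_action { ACTION_REBOOT, ACTION_MPIPL };

static bool mpipl_enabled;

/* Stand-in for the end of MPIPL init, which now runs last in boot. */
void mpipl_init_model(void)
{
	mpipl_enabled = true;
}

/* Stand-in for is_mpipl_enabled(). */
bool is_mpipl_enabled_model(void)
{
	return mpipl_enabled;
}

/* What the assert path would do at this point in boot. */
enum terminate_action assert_path_action(void)
{
	return is_mpipl_enabled_model() ? ACTION_MPIPL : ACTION_REBOOT;
}
```

A failure before `mpipl_init_model()` runs takes the reboot path, so a BMC counting boot failures sees a normal reboot and not an MPIPL loop.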
|
|
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
All callers of dt_resize_property() need to set the new property length
after calling it. append_chip_id() wasn't doing it, which caused this
assert when booting my machine:
[ 136.387213258,3] Unable to use memory range 0 from MSAREA 0
[ 136.387356677,3] Unable to use memory range 0 from MSAREA 2
[ 136.387408390,3] ***********************************************
[ 136.387454272,3] < assert failed at core/device.c:605 >
[ 136.387493225,3] .
[ 136.387512799,3] .
[ 136.387534056,3] .
[ 136.387550294,3] OO__)
[ 136.387579530,3] <"__/
[ 136.387605086,3] ^ ^
[ 136.387719329,3] Fatal TRAP at 0000000030028a18 .dt_property_set_cell+0x34 MSR 9000000000021002
[ 136.387801707,3] CFAR : 00000000300bfd3c MSR : 9000000000001000
[ 136.387847032,3] SRR0 : 0000000030028a18 SRR1 : 9000000000021002
[ 136.387893119,3] HSRR0: 0000000030012524 HSRR1: 9000000000001000
[ 136.387936830,3] DSISR: 40000000 DAR : 00000002019df000
[ 136.387983570,3] LR : 00000000300bfd40 CTR : 0000000000000000
[ 136.388046031,3] CR : 20004202 XER : 00000000
[ 136.388094553,3] GPR00: 00000000300bfd40 GPR16: 0000000000000001
[ 136.388139862,3] GPR01: 0000000031e536e0 GPR17: 00000000300ca3c9
[ 136.388181131,3] GPR02: 0000000030121200 GPR18: 0000000030103e1c
[ 136.388224105,3] GPR03: 000000003053fc60 GPR19: 0000000000000008
[ 136.388270356,3] GPR04: 0000000000000001 GPR20: 000000003053fba0
[ 136.388313950,3] GPR05: 0000000000000008 GPR21: 0000000000000001
[ 136.388363021,3] GPR06: 0000000031e50060 GPR22: 0000000000000001
[ 136.388416754,3] GPR07: 0000000000000000 GPR23: 0000000000000000
[ 136.388465729,3] GPR08: 0000000000000000 GPR24: 0000000000000000
[ 136.388508156,3] GPR09: 0000000000000004 GPR25: 0000000031204060
[ 136.388556203,3] GPR10: 0000000000000008 GPR26: 000000003120402c
[ 136.388599076,3] GPR11: 0000000000000000 GPR27: 0000000030010000
[ 136.388642108,3] GPR12: 0000000040004204 GPR28: 0000000000000002
[ 136.388694064,3] GPR13: 0000000031e50000 GPR29: 0000000031203ee0
[ 136.388743298,3] GPR14: 00000000300cbf03 GPR30: 0000000031202e80
[ 136.388797131,3] GPR15: 00000000300cc01c GPR31: 0000000030103a33
CPU 0048 Backtrace:
S: 0000000031e539e0 R: 0000000030028874 .dt_resize_property+0x28
S: 0000000031e53a60 R: 00000000300bfd40 .memory_parse+0xd84
S: 0000000031e53c40 R: 00000000300bc4d8 .parse_hdat+0xed0
S: 0000000031e53e30 R: 000000003001504c .main_cpu_entry+0x1ac
S: 0000000031e53f00 R: 0000000030002760 boot_entry+0x1b0
Avoid further appearances of the unidentified animal of doom by making
dt_resize_property() do the length updating itself, freeing its callers
from that need.
Suggested-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
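A simplified sketch of the idea in the commit above: the resize helper updates the stored length itself, so no caller can forget to. `struct prop_model` and `prop_resize()` are hypothetical and use plain `realloc()`; the real dt_resize_property() works on device tree properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a device tree property. */
struct prop_model {
	size_t len;     /* length the rest of the code trusts */
	char *data;     /* property payload */
};

/* Resize the payload and keep len in sync in one place. */
bool prop_resize(struct prop_model *p, size_t len)
{
	char *n;

	assert(p);
	if (len == 0) {
		free(p->data);
		p->data = NULL;
		p->len = 0;
		return true;
	}
	n = realloc(p->data, len);
	if (!n)
		return false;   /* old buffer and old len are still valid */
	p->data = n;
	p->len = len;           /* callers no longer set this themselves */
	return true;
}
```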
|
|
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Replace 0x20000 with a clear define.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[oliver: added prev patch, minor style fix]
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
I think Cedric forgot this patch at some point.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The CAM line of the HW context is 23bits wide and its value is
hardcoded in the XIVE IC presenter with :
|chip|000000000001|thrdid |
To make sure that we won't assign a VP id overlapping with the HW CAM
line, we reserve range 0x80..0xff in our VP allocator. Make that
clear.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
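A worked example of why the reserved range in the commit above is 0x80..0xff. The field widths are an assumption read off the diagram: a 4-bit chip field, the 12-bit constant 000000000001, and a 7-bit thread id, which add up to the 23 bits quoted. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field widths: |chip (4)|000000000001 (12)|thrdid (7)| = 23 bits. */
#define CAM_THREAD_BITS 7
#define CAM_CONST_BITS  12

static_assert(4 + CAM_CONST_BITS + CAM_THREAD_BITS == 23,
	      "fields must add up to the 23-bit CAM line");

/* Build the hardcoded HW CAM line for a chip and thread id. */
uint32_t hw_cam_line(uint32_t chip, uint32_t thrdid)
{
	return ((chip & 0xf) << (CAM_CONST_BITS + CAM_THREAD_BITS)) |
	       (1u << CAM_THREAD_BITS) |
	       (thrdid & 0x7f);
}

/* VP ids in 0x80..0xff would collide with a HW CAM line on the chip. */
bool vp_id_reserved(uint32_t vp_idx)
{
	return vp_idx >= 0x80 && vp_idx <= 0xff;
}
```

With the constant 1 sitting just above a 7-bit thread id, thread ids 0..0x7f map to 0x80..0xff, which matches the range the VP allocator reserves.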
|
|
The VP space is 19bits wide but the number of XIVE VPs software can
use depends on the configured number of EQs. We have 1M EQs and we use
8 priorities per VP. Therefore, our VP space is limited to 128k.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
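The arithmetic in the commit above, written out as a small check. The macro and function names are hypothetical; the numbers (1M EQs, 8 priorities per VP, a 19-bit VP space) come from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values are the ones quoted in the commit. */
#define EQ_COUNT_ORDER     20   /* 1M EQs */
#define PRIORITIES_PER_VP  8
#define VP_SPACE_BITS      19   /* width of the VP space */

/* Number of VPs that can each get one EQ per priority. */
uint32_t usable_vps(unsigned int eq_order, unsigned int prios)
{
	return (1u << eq_order) / prios;
}

/* 1M / 8 = 128k, which is below the 512k the 19-bit space could address. */
static_assert((1u << EQ_COUNT_ORDER) / PRIORITIES_PER_VP == 128 * 1024,
	      "VP space is limited to 128k");
```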
|
|
It is possible to configure the IC and TM BAR mappings using 4k pages
but we never do. Remove the code doing so.
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
The XIVE interrupt controller uses a set of Virtualization Structure
Tables (VST) whose characteristics (type, address, size) are described
by Virtual Structure Descriptors (VSD). A VSD is 64 bits wide.
The EQ and VP tables are indirect tables. The VSD points to a single
page of VSDs each pointing to a page of virtual structures. Indirect
tables are limited to a single top page which is enough to cover the
whole range of EQs (24 bits) and VPs (19bits).
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Each EQ descriptor is associated with a pair of ESB pages. The even
page controls the ESn PQ bits and the odd page controls the ESe PQ
bits.
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
When an interrupt can not be delivered, an escalation interrupt can be
triggered. The EQ descriptor of the pending interrupt should be
configured to generate an escalation event, using the EQ_W0_ESCALATE_CTL
'e' bit, and words 4 and 5 of the EQ descriptor should contain an IVE
pointing to the escalation EQ to trigger. This is why EQ descriptors
are considered as interrupt sources and registered as such when
initializing the interrupt controller.
These interrupts are identified as escalations by the OPAL XIVE
interface, OPAL calls and internal routines, by setting a special bit
in their global interrupt number. Clarify that and check that the
number of EQ descriptors is not overflowing the global interrupt
encoding.
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
XIVE_EQ_ORDER defines the number of EQ descriptors per chip the system
can use.
The EQ descriptors can be controlled by ESB pages also and the driver
defines in the VC BAR of the controller a range of 128G of ESB pages
giving access to 1M EQs. All ESB pages are backed by a memory table,
so we are fine but we could improve the configuration.
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Each interrupt source is associated with a pair of ESB pages. The even
page is for trigger and the odd page is for management.
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
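The even/odd page pairing in the commit above can be expressed as address arithmetic. This sketch assumes 64K ESB pages (an assumption, chosen because it matches 128K of ESB space per source elsewhere in this log) and uses hypothetical function names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ESB pages are 64K. */
#define ESB_PAGE_SHIFT 16

/* Even page of the pair: used to trigger the interrupt. */
uint64_t esb_trigger_page(uint64_t esb_base, uint32_t source)
{
	return esb_base + (((uint64_t)source * 2) << ESB_PAGE_SHIFT);
}

/* Odd page of the pair: used to manage the interrupt. */
uint64_t esb_mgmt_page(uint64_t esb_base, uint32_t source)
{
	return esb_base + (((uint64_t)source * 2 + 1) << ESB_PAGE_SHIFT);
}
```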
|
|
On P9, the global IRQ number is limited to 24 bits because the XICS
emulation encodes the CPPR value in the top 8 bits. The following
4 bits are used to encode the XIVE block number, which leaves 20 bits
for the interrupt index number. Introduce a definition reflecting the
size of this bitfield and check that number of interrupts per chip is
not overflowing our encoding.
Reviewed-by: Gustavo Romero <gromero@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
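The bit layout described in the commit above, as a sketch: the top 8 bits of a 32-bit word carry the CPPR, the next 4 bits the XIVE block, and the low 20 bits the interrupt index. The macro and function names are hypothetical; only the widths come from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; widths are taken from the commit text. */
#define GIRQ_INDEX_BITS 20
#define GIRQ_BLOCK_BITS 4
#define GIRQ_TOTAL_BITS 24

static_assert(GIRQ_INDEX_BITS + GIRQ_BLOCK_BITS == GIRQ_TOTAL_BITS,
	      "block and index must fill the 24-bit global IRQ number");

/* Pack a block number and an index into a global IRQ number. */
uint32_t girq_make(uint32_t block, uint32_t index)
{
	return ((block & 0xf) << GIRQ_INDEX_BITS) | (index & 0xfffff);
}

uint32_t girq_block(uint32_t girq)
{
	return (girq >> GIRQ_INDEX_BITS) & 0xf;
}

uint32_t girq_index(uint32_t girq)
{
	return girq & 0xfffff;
}

/* The per-chip interrupt count (as a power of two) must fit the index. */
bool irq_order_fits(unsigned int order)
{
	return order <= GIRQ_INDEX_BITS;
}
```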
|
|
The size of the interrupt number space is constrained by the block and
index fields of the trigger data exchanged between source units and
the XIVE IC. These are respectively 4 and 28 bits, which gives us a 32
bits interrupt number space. But the XICS emulation requires 8 bits to
encode the CPPR value. The system interrupt number space is therefore
constrained to 24 bits and on a chip, to 20 bits because the XIVE
driver configures the HW to use one block per chip.
XIVE_INT_ORDER defines the size of the interrupt number space : 1M per
chip.
To control these interrupts, the driver defines in the VC BAR of the
controller a range of 384G of ESB pages giving access to 3M interrupts.
The VSD for the memory table is smaller than the index and accesses to
some ESB pages are not backed by a memory table structure. If such an
access occurred, it would result in a FIR.
It never happened but this is something to fix with a finer configuration
of the VC BAR.
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
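The sizes quoted in the commit above can be cross-checked with a little arithmetic: 384G of ESB space for 3M interrupts works out to 128K per interrupt, and with a 1M entry memory table the remaining 2M sources are reachable but unbacked. The helper names and macros are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GIB   (1ull << 30)
#define MEGA  (1ull << 20)   /* "1M" as a count, 2^20 */

/* ESB bytes available per interrupt source in a given BAR range. */
uint64_t esb_bytes_per_source(uint64_t range_bytes, uint64_t nr_sources)
{
	return range_bytes / nr_sources;
}

/* Sources reachable through the BAR but not backed by the memory table. */
uint64_t unbacked_sources(uint64_t reachable, uint64_t backed)
{
	return reachable > backed ? reachable - backed : 0;
}
```

The same 128K per entry falls out of the EQ numbers in this log (128G for 1M EQs), which is consistent with a pair of 64K pages per entry.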
|
|
We only really use the gcov output when doing the coverage report as a
part of the "docs" CI builds. It's useful for development to just run
the unit tests so make sure the "check" and "coverage" targets are
separate.
This also speeds up our CI builds since those jobs are already doing a
separate GCOV pass so building and running the GCOV binaries during the
check pass is redundant.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Annotate io accessor pointer types with endian.
sparse caught a bug in memcpy_from_ci, which is fixed.
From: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
This commit fixes a typo and a spelling error in a comment about the XIVE set
translate mechanism.
Signed-off-by: Gustavo Romero <gromero@linux.ibm.com>
Reviewed-by: Stewart Smith <stewart@flamingspork.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
Currently the wait_for_all_occ_init() function determines that the
OCCs associated with every chip have been initialized by verifying that
the "Valid" bit in the pstate table of that OCC is set.
However, on chips where all the EX units are guarded, the OCC, even
though it is active, does not update the pstate_table. As a
result, OPAL currently concludes that the OCC is not functional and not
only disables Pstate initialization, but also incorrectly reports that the
OCCs were not initialized, thereby cutting off other features such as
sensors.
Fix this by ensuring that
* We check if there is at least one active EX unit in the chip
  before checking if the OCC is active.
* On platforms with OCC-OPAL communication interface version 0x90
  * wait_for_all_occ_init() only checks if the occ_state in the
    OCC dynamic area is set to "Active State".
  * move the "Valid" bit check to add_cpu_pstate_properties(),
    which is where we create the device-tree entries for
    frequency scaling.
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
|
|
In PHB4 the PHB's error and informational interrupts were changed to behave
more like actual LSIs. On PHB3 these interrupts would only be raised on
a 0 -> 1 transition of an error status bit (i.e. they were rising edge
triggered). On PHB4 the error interrupts are "true" LSIs and will be
re-raised as long as the underlying error status bit is set.
This causes a headache for us because OPAL's PHB error handling model
requires Skiboot to preserve the state of the PHB (including errors) until
the kernel is ready to handle the error. As a result we can't do anything
in Skiboot to handle the interrupt condition and we need to mask the error
internally. We can do this by clearing the relevant bits in the IRQ_ENABLE
registers of the PHB.
It's worth pointing out that we don't want to mask the interrupt by setting
the Q bit in the XIVE ESBs. The ESBs are owned by the OS which may be
masking and unmasking the interrupt for its own reasons (e.g. migrating
IRQs). Skiboot modifying the ESB state could potentially cause problems and
should be avoided.
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
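The masking approach in the commit above amounts to clearing the relevant enable bits in the PHB and leaving the OS owned ESB state alone. A register level sketch, with a hypothetical function name and made-up bit values:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: return the new IRQ_ENABLE value with the bits for
 * the currently latched errors cleared. The error status itself is left
 * untouched so the kernel can still inspect it later.
 */
uint64_t phb_mask_error_irqs(uint64_t irq_enable, uint64_t error_bits)
{
	return irq_enable & ~error_bits;
}
```

Applying the mask twice gives the same result, so re-running it while the level triggered interrupt is still asserted is harmless in this model.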
|