Age | Commit message (Collapse) | Author | Files | Lines |
|
pci_set_cap needs a callback to free data and we need to call
that when we're doing __pci_reset()
We also need to free pcrf entries.
In the future, __pci_reset() and pci_remove_bus() need to come
together to be one canonical place on how to free a PCI device
rather than the two we have now. This patch *purely* focuses
on the problem of not leaking memory across fast-reboot.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We are going need pci_wait_crs() in the PHB4 code so make it global.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This can be checked from config space, but we will need to know this when
restoring the PCI topology, and it is not always safe to access config
space during this period.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
P9 supports PCI peer-to-peer: a PCI device can write directly to the
mmio space of another PCI device. It completely by-passes the CPU.
It requires some configuration on the PHBs involved:
1. on the initiating side, the address for the read/write operation is
in the mmio space of the target, i.e. well outside the range normally
allowed. So we disable range-checking on the TVT entry in bypass mode.
2. on the target side, we need to explicitly enable p2p by setting a
bit in a configuration register. It has the side-effect of reserving
an outbound (as seen from the CPU) store queue for p2p. Therefore we
only enable p2p on the PHBs using it, as we don't want to waste the
resource if we don't have to.
P9 supports p2p mmio writes. Reads are currently only supported if the
two devices are under the same PHB but that is expected to change in
the future, and it raises questions about intermediate switches
configuration, so we report an error for the time being.
The patch adds a new OPAL call to allow the OS to declare a p2p
(initiator, target) pair.
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The PCI device vendor/device IDs have been cached to pd->vdid, no
need to pass them in pci_handle_quirk(). This also introduces two
macros to extract vendor/device fields and they are useful afterwards.
No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Move phb3_pcicfg_filter() to pci.c, rename it to pci_handle_cfg_filters()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This avoids doing a search through the list of all devices on
every config space access to every device under a PHB.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This adds another PHB callback (device_remove()), corresponding to
device_init(). With it, the PHB3 layer can receive notification
upon PCI topology changes. This functionality will be used by the
subsequent patches.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
NVLink2 is a new feature introduced on POWER9 systems. It is an
evolution of of the NVLink1 feature included in POWER8+ systems but
adds several new features including support for GPU address
translation using the Nest MMU and cache coherence.
Similar to NVLink1 the functionality is exposed to the OS as a series
of virtual PCIe devices. However the actual hardware interfaces are
significantly different which limits the amount of common code that
can be shared between implementations in the firmware.
This patch adds basic hardware initialisation and exposure of the
virtual NVLink2 PCIe devices to the running OS.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When we start to support SRIOV capability in subsequent patches,
a data struct will be instantiated and associated with the SRIOV
capability. This extends the current implementation for that.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The logic initializing device's PCIe capability is resident in the
function pci_scan_one() from day one. It's because information
(e.g. vendor/device IDs) aren't stored into PCI device instance in
old days. Now, the PCI device instance contains all information
required to initialize its PCIe capability and others.
This moves the logic initializing PCIe capability from pci_scan_one()
to separate functions, pci_init_capabilities() and pci_init_pcie_cap().
pci_scan_one() is simplified to make code maintaining a bit easier.
Also, it will allow us to intorduce separate functions to initialize other
capabilities as we're doing for PCIe capability.
This also exports pci_init_capabilities() so that it can be reused by
SRIOV VFs in future. No logical changes introduced by this.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The NVLinks (v1 and v2 to be supported in future) are exposed to
Linux kernel by emulated PCI devices (aka PCI virtual devices).
Currently, the implementation is covered by NVLink driver (npu.c),
meaning npu2.c will have similar implementation though it will be
totally duplicated with that in npu.c.
This supports PCI virtual device in the generic layer so that it
can be shared by all NVLink drivers. The design is highlighted as:
* There are 3 config spaces for every PCI virtual device, corresponds
to the cached config space, readonly space, write-1-clear space.
* Reuse PCI config register filter mechanism to allow NVLink driver
to emulate the access to the designated config registers. The config
values are fetched from or written to the cached config space when
the config registers aren't covered by filter.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This improves PCI config register filter so that it can be reused
by PCI virtual device in subsequent patch:
* First argument to pci_cfg_reg_func() is changed to "void *".
It allows to accept variable data types including PCI virtual
device in future.
* Return value from pci_cfg_reg_func() to be used by PCI virtual
device in future.
* Shortened name of function phb3_pcicfg_filter_rc_pref_window().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Throughout skiboot (and the kernel) PE numbers are named "pe_no",
"pe_num" and "pe_number", and sized as 16, 32 and 64bit uints depending
on where you look. This is annoying and potentially misleading in cases
such as the OPAL API, where different calls have different int sizes
even though the PE number they want is the same.
Fix this by making *everything* uint64_t pe_number. In doing this, there
are some whitespace fixes and mve_number gets dragged into this as well
for cases like set_msi_{32/64} where they essentially mean the same thing.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This adds the base support for the PHB4. It currently only support
the M32 window, EEH or in general error recovery aren't supported
yet.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[stewart@linux.vnet.ibm.com: update (C) year, fix indenting]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The value might be different for different PHB instances
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The various reset requests are completed by PHB's callbacks. All
of them (except reset on IODA table or error injection) are covered
by PCI slot. opal_pci_poll() faces similar situation.
This reimplements opal_pci_reset() and opal_pci_poll() based on
the callbacks provided by PCI slot instead of PHB. Also, couple of
new APIs are introduced based on the callbacks in PCI slot as below:
* opal_pci_get_presence_state(): Check if there is adapter presented
behind the specified PHB or PCI slot.
* opal_pci_get_power_state(): Returns power supply state (on or off)
on the specified PHB or PCI slot.
* opal_pci_set_power_state(): Sets power supply state (on or off)
on the specified PHB or PCI slot. Besides, the state can be (offline
or online) without changing the PCI slot's power state.
Eventually, the definition of unused PHB's callbacks are removed.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
During PCI enumeration, the root complex's link and fundamental
reset are carried out by PHB's callbacks which are replaced by
the corresponding PCI slot's callbacks. Also, the hotplug related
device node properties are populated based on the PCI slot info
that is included in PCI slot now.
This uses PCI slot in enumeration:
* Use PCI slot's callbacks for fundamental reset and link status
retrieval in PCI enumeration.
* Simplify the code by removing traditional PCI/PCI-x related
logic as we don't have PCI/PCI-X root complex.
* Replace pci_add_slot_properties() with pci_slot_add_properties()
to populate PCI slot properties in device-tree.
* PHB is always not hotpluggable. No hotpluggable properties in
its device node are needed.
* Remove "struct pci_slot_info" definition as its info is included
in "struct pci_slot".
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Every PCIE bridge port or PHB is expected to be bound with PCI slot
, to which various PCI slot's functionalities are attached (e.g. power,
link, reset). This supports PCI slot:
* PCI slot is reprsented by "struct pci_slot".
* "struct pci_slot_ops" represents the functions supported on the
PCI slot. It's initialized by PCI slot core at the beginning and
allowed to be overrided by platform partially or completely.
* On PCI hot plugging event, the PCI devices behind the slot are
enumarated. Device sub-tree is populated and sent to OS by OPAL
message.
* On PCI hot unplugging event, the PCI devices behind the slot are
destroyed. Device sub-tree is removed and the slot is powered off.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This adds @data argument and "int" return value for struct phb_ops::
device_init() so that it can be called in pci_walk_dev() directly to
reinitialize the PCI devices behind the specified slot in subsequent
patches.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently, pci_restore_bridge_buses() restores the assigned bus
ranges for all PCI bridges behind the specified PHB. This extends
the function and allows doing same thing for the PCI bridges behind
the specified slot. The extended functionality is going to be used
by PCI hotplug logic in the subsequent patches.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Currently, pci_walk_dev() iterates all PCI devices behind the
specified PHB. This extends the function to allow iteration
on PCI devices behind the specified PCI slot so that it can
be used by PCI hotplug logic in the subsequent patches.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When scanning to non-existing PCI device, EEH (frozen) error is
usually happening. We clear the unexpected frozen PE state after
it. The reserved PE number is assumed to be 0 wrongly. So the
frozen state on the reserved PE number isn't cleared properly.
This introduces struct phb_ops::get_reserved_pe_number() to
retrieve the reserved PE number from platforms. Then the EEH
frozen state checking and clearing are applied to the reserved
PE number.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
pci_{get,put}_phb() were introduced to increase/decrease refcount or
similar thing to PHB. They should show up in pairs and some code is
obvious breaking the semantics, but the logic is good as pci_put_phb()
does nothing.
As we do not maintain refcount for PHB and we should not have PHB
unplugging in near future, it simply drops pci_put_phb(). No functional
changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
phb_ops->device_node_fixup() was introduced for NPU1 so that the
chip backend can bind the emulated NPU device with the GPU device
and fixes the device-tree node accordingly. There're couple of
issues as I can image:
* In pci_fixup_nodes(), one PHB has only one level of device
depth in the hierarchy tree. It's true for NPU PHBs, but
false for other PHBs. That indicates the function can be
called for NPU PHBs.
* The callback name indicates the specific work to be done
there. That doesn't make sense. We need another name without
indicating the specific work to do. It will give the backend
on chip level more freedom. Similarly, the callback is called
on basis of PCI device. It's hard for backend to manuplate
the PHB. More freedom the backend will get with more bold
granularity.
This fixes above issues by replacing phb_ops->device_node_fixup()
with phb_ops->phb_final_fixup(). More freedom will be received in
the backends.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
All kinds of PHBs are maintaining a spinlock. At mean while, the
spinlock is acquired or released by backends for phb_ops->lock()
or phb_ops->unlock(). There're no difference of the logic on all
kinds of PHBs. So it's reasonable to maintain the lock in the
generic layer (struct phb).
This moves lock from specific PHB to generic one. The spinlock is
initialized when the generic PHB is registered in pci_register_phb().
Also, two inline functions phb_{lock, unlock}() are introduced to
acquire/release it. No logical changes introduced.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
Similar to for_each_cpu, adding a for_each_phb makes PHB traversal easy.
Suggested-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
No functional change, but static analysis showed up the oddity of
something that is generally unsigned (opal_id) having a signed
value assigned to it.
Took the opportunity to use a define to increase readability.
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
When nvLink and nVida's GPU included in PCI topology, we have the
emulated PCI devices to represent nvLinks, which is associated with
the real GPU PCI device with help of device-tree. The patch introduces
one more field "dn" to "struct pci_device" to make the job easier.
The patch also adds one more PHB operations "device_node_fixup", which
is to be called when populating PCI device node so that we have chance
to link the emulated PCI device and the real GPU device through device
tree.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The patch caches IDs (vendor, device, sub-vendor, sub-device and
class) for PCI devices. Those IDs could be used to identify one
specific PCI device later.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
We have to provide the emulated result for PCI config register
access on some devices to eleminate the gap between hardware and
software. One example would be the 0x28 (prefetchable memory window
upper 32-bits) of the root complex on Naples isn't writable. Linux
kernel relies on that to detect 64-bits window successfully.
This introduces config register filter to PCI device to eleminate
above gap. Each PCI device maintains a list of filters, which are
populated when the PCI device is initialized. When PCI config space
is accessed, the filter is searched to override the result from user
(write) or hardware (read) if necessary.
Reported-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
On P8, we calculate the OPAL ID of the PHB as a function of the physical
chip number and PHB index on that chip. P7 continues using "allocated"
numbers for now.
We also consistently print the PHB ID as a 4-digit hex number which
facilitates decoding it, and print the chip:index location in the probe
code to make it easier to correlate log entries.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[stewart@linux.vnet.ibm.com: use next_chip rather than get_chip]
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
This moves some fields that are specific to the LXVPD mechanism out
of the generic pci_slot_info into a private wrapper. Additionally,
most fields in pci_slot_info are made signed integers in order to
allow them to be set to "-1" which indicates that the field doesn't
have a meaningful value, and inhibits creation of the corresponding
device-tree property.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The opal eeh interrupt handlers raise an opal event
(OPAL_EVENT_PCI_ERROR) whenever there is some processing required from
the OS. The OS then needs to call opal_pci_next_error(...) in a loop
passing each phb in turn to clear the event.
However opal_pci_next_error(...) clears the event unconditionally
meaning it would be possible for eeh events to be cleared without
processing them leading to missed events.
This patch fixes the problem by keeping track of eeh events on a
per-phb basis and only clearing the opal event once all phb eeh events
have been cleared.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
|
|
The patch refactors the code we had for PCI error injection. It
doesn't change the logic:
* Rename names of error types and functions according to the
comments given by Michael Ellerman when reviewing the kernel
counterpart.
* Split The backend of error injection for PHB3 and P7IOC to
multiple functions to improve code readability. Some logics
are simplified without affecting their original functionality.
* Misc cleanup like renaming variables and functions.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Add a flag indicating the CAPP unit is in recovery. When a capp recoverable
malfunction HMI comes in, the HMI handler will call into
phb3_set_capp_recovery, which will put set the flag and send the event to
Linux.
EEH will call phb3_next_error which will tell it the phb is fenced.
EEH will then call into sapphire to reinitialize the phb which contains steps
3-5 of capp recovery procedure. The code increases wait time of PERST to 1s to
ensure fpga download is complete before polling linkup.
EEH will then rebind the cxl driver and it will complete recovery once it
initializes and turns snoops on, steps 7-8, completing capp recovery procedure.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch adds function pci_device_init(), which is called by
phb->ops->device_init() to apply common initialization on the
specified PCI device during bootup or after PE reset.
Currently, we only put the logic of MPS configuration to the
function, but more will be put there.
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The complete reset could be issued by kdump kernel to remove pending
PCI traffic in order to avoid EEH errors in kdump scenario. However,
the bus numbers configured into PCI bridges would be lost after the
reset and it would cause that some of PCI devices (e.g. IPR) can't
be probed by kdump kernel successfully.
The patch fixes above issue by restoring bus numbers after complete
reset. It's responsing to bug#113210
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch introduces a new OPAL API opal_pci_eeh_freeze_set(),
which allows to set frozen state for the specified PE, so that
we can support "compound" PE in kernel.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Though the p7ioc spec states the errors triggered by PAPR error
injection register set (0x2b0, 0x2b8, 0x2c0) should be one-shot
without "sticky" bit, Firebird-L machine doesn't follow the rule.
It will cause endless frozen PE until we have to remove the PE
permanently.
The patch extends opal_pci_reset() allowing kernel to clear PAPR
error injection register set at appropriate point.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
The patch introduces new OPAL API opal_pci_err_injct() for injecting
PCI errors.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|