path: root/hw/phb4.c
Age | Commit message | Author | Files | Lines
2017-07-25 | phb4: Add link training trace mode | Michael Neuling | 1 | -0/+56
Add a mode to PHB4 to trace the training process closely. This activates as soon as PERST is deasserted and produces human-readable output of the process. This may increase training times since it duplicates some of the training code. This code has its own simple checks for fence and timeout but will fall through to the default training code once done.

The output produced looks like the "TRACE:" lines below:

[ 3.410799664,7] PHB#0001[0:1]: FRESET: Starts
[ 3.410802000,7] PHB#0001[0:1]: FRESET: Prepare for link down
[ 3.410806624,7] PHB#0001[0:1]: FRESET: Assert skipped
[ 3.410808848,7] PHB#0001[0:1]: FRESET: Deassert
[ 3.410812176,3] PHB#0001[0:1]: TRACE: 0x0000000101000000 0ms
[ 3.417170176,3] PHB#0001[0:1]: TRACE: 0x0000100101000000 12ms presence
[ 3.436289104,3] PHB#0001[0:1]: TRACE: 0x0000180101000000 49ms training
[ 3.436373312,3] PHB#0001[0:1]: TRACE: 0x00001d0811000000 49ms trained
[ 3.436420752,3] PHB#0001[0:1]: TRACE: Link trained.
[ 3.436967856,7] PHB#0001[0:1]: LINK: Start polling
[ 3.437482240,7] PHB#0001[0:1]: LINK: Electrical link detected
[ 3.437996864,7] PHB#0001[0:1]: LINK: Link is up
[ 4.438000048,7] PHB#0001[0:1]: LINK: Link is stable

Enabled via nvram using:
nvram -p ibm,skiboot --update-config pci-tracing=true

Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Improve reset and link training timing | Michael Neuling | 1 | -10/+105
This improves PHB reset and link training timing. Justifications and reasons are included in the patch. Polling intervals are decreased from 100ms to 10ms. Added is a new state called PHB4_SLOT_LINK_STABLE, which is now needed since the link training can be so fast that we touch config space too quickly (PCIe spec requires 1 second between PERST de-assert and device config space reads). We use this new state to sanity check the PHB and link before moving on to the PCI bus scan, where we no longer recover from these error conditions. Also added is simplified documentation of the PHB reset and training flow. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
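The one-second rule mentioned above can be illustrated with a small timing helper. This is a minimal sketch only, assuming millisecond timestamps from a single monotonic clock; `sketch_cfg_wait_ms` and `SKETCH_PERST_TO_CFG_MS` are hypothetical names and are not taken from the skiboot source.

```c
#include <stdint.h>

/* Minimum gap between PERST de-assert and the first config space read,
 * as stated in the commit message (1 second). */
#define SKETCH_PERST_TO_CFG_MS 1000u

/*
 * Return how many more milliseconds to wait before config space may be
 * touched. Assumes both timestamps come from the same monotonic clock
 * and that now_ms >= perst_deassert_ms.
 */
static uint64_t sketch_cfg_wait_ms(uint64_t perst_deassert_ms, uint64_t now_ms)
{
    uint64_t elapsed = now_ms - perst_deassert_ms;

    if (elapsed >= SKETCH_PERST_TO_CFG_MS)
        return 0;
    return SKETCH_PERST_TO_CFG_MS - elapsed;
}
```

A link that trains 26ms after PERST de-assert would therefore still need to wait 974ms before the bus scan in this model.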
2017-07-25 | phb4: Add phb4_check_reg() to sanity check failures | Michael Neuling | 1 | -0/+17
This adds a function phb4_check_reg() to sanity check when we do MMIO reads from the PHB to make sure it's not fenced. This also adds some uses of this function in common locations where these may occur on PHB reset and link training. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Remove retry on electrical link timeout | Michael Neuling | 1 | -1/+1
Currently we retry if we don't detect an electrical link. This is pointless as all devices should respond in the given time. This patch removes this retry and just returns OPAL_HARDWARE if we don't detect an electrical link. This has the additional benefit of improving boot times on machines that have badly wired presence detect (i.e. it says a device is present when there isn't one). Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Simplify calling phb4_retry_state() | Michael Neuling | 1 | -10/+2
phb4_retry_state() returns a good error code, so just use that rather than complicating the caller. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Read PERST signal rather than assuming it's asserted | Michael Neuling | 1 | -3/+3
Currently we assume on boot that PERST is asserted so that we can skip having to assert it ourselves. This instead reads the PERST status and determines if we need to assert it based on that. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Fix endian of TLP headers print | Michael Neuling | 1 | -5/+5
Byte swap TLP headers so they are the same as the PCIe spec. Also remove redundant print. Suggested-by: Rob Lippert <rlippert@google.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Change timeouts prints to error level | Michael Neuling | 1 | -2/+2
If we don't detect an electrical link or the link doesn't train, we should make that more obvious to the user. This boosts these prints to error level. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Better logs why the slot didn't work | Michael Neuling | 1 | -1/+10
Log more clearly why the slot didn't work and make it a PR_ERR so users see it by default. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Force verbose EEH logging | Michael Neuling | 1 | -0/+2
Force verbose EEH. This is heavy handed and we should turn it off later as things stabilise, but it is useful for now. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-25 | phb4: Initialization sequence updates | Russell Currey | 1 | -13/+22
Mostly errata workarounds, some DD1 specific. The step Init_5 was moved to Init_16, so the numbering was updated to reflect this. (mikey: added section on ignoring errata ER20161123) Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Do more retries on link training failures | Michael Neuling | 1 | -1/+1
Currently we only retry once when we have a link training failure. This changes this to be 3 retries as 1 retry is not giving us enough reliability. This will increase the boot time, especially on systems where we incorrectly detect a link presence when there really is nothing present. I'll post a followup patch to optimise our timings to help mitigate this later. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
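The bounded retry described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the skiboot state machine: `sketch_try_train`, `sketch_train_with_retries` and `SKETCH_LINK_RETRIES` are hypothetical names, and the trainer is a stand-in that fails a configurable number of times.

```c
#include <stdbool.h>

#define SKETCH_LINK_RETRIES 3   /* retries allowed after the first attempt */

/* Hypothetical trainer: fails until *fails_left reaches zero. */
static bool sketch_try_train(int *fails_left)
{
    if (*fails_left > 0) {
        (*fails_left)--;
        return false;
    }
    return true;
}

/*
 * Attempt training once, then retry up to SKETCH_LINK_RETRIES times.
 * Returns the number of attempts used on success, or -1 if the link
 * never trained.
 */
static int sketch_train_with_retries(int *fails_left)
{
    int attempt;

    for (attempt = 1; attempt <= 1 + SKETCH_LINK_RETRIES; attempt++) {
        if (sketch_try_train(fails_left))
            return attempt;
        /* A real implementation would reset the PHB here before retrying. */
    }
    return -1;
}
```

With three retries, a link that fails three times still trains on the fourth attempt; a slot that never trains costs four full attempts, which is the boot time penalty the message warns about.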
2017-07-13 | phb4: Rework retries so we can do more than one | Michael Neuling | 1 | -7/+7
This reworks the pci link training retry code so that we can do more than one retry. This will now also print an error if a link fails to train. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Workaround phy lockup by doing full PHB reset on retry | Michael Neuling | 1 | -1/+1
For PHB4 it's possible that the phy may end up in a bad state where it can no longer receive data. This can manifest as the link not retraining. A simple PERST will not clear this. The PHB must be completely reset. This changes the retry state to CRESET to do this. This issue may also manifest itself as the link training in a degraded state (lower speed or narrower width). This patch doesn't attempt to fix that (will come later). Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Avoid recursive call into run state machine | Michael Neuling | 1 | -1/+1
Currently we recursively call run_sm() in phb4_retry_state(). This is unnecessary and overly complex. This just returns with a small wait time. 1ms should be a very small overhead compared to having to do the actual retry. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Only set one bit in nfir | Michael Neuling | 1 | -1/+1
The MPIPL procedure says to only set bit 26 when forcing the PEC into freeze mode. Currently we set bits 24-27. This changes the code to follow spec and only set bit 26. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
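To make the difference between "bits 24-27" and "bit 26" concrete, the sketch below computes both masks. It assumes a 64-bit register using IBM bit numbering, where bit 0 is the most significant bit; that assumption and the `SKETCH_` names are not taken from the commit itself.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define SKETCH_PPC_BIT(b) (0x8000000000000000ull >> (b))

/* Value with bits 24-27 set, as described for the old behaviour. */
static uint64_t sketch_nfir_before(void)
{
    return SKETCH_PPC_BIT(24) | SKETCH_PPC_BIT(25) |
           SKETCH_PPC_BIT(26) | SKETCH_PPC_BIT(27);
}

/* Value with only bit 26 set, as the procedure requires. */
static uint64_t sketch_nfir_after(void)
{
    return SKETCH_PPC_BIT(26);
}
```

Under this numbering the old value is 0x000000f000000000 and the new value is 0x0000002000000000.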
2017-07-13 | phb4: Fix order of pfir/nfir clearing in CRESET | Michael Neuling | 1 | -3/+4
According to the workbook, pfir must be cleared before the nfir. The way we have it now causes the nfir to not clear properly in some error circumstances. This swaps the order to match the workbook. Also updates the comments to be clearer. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Remove incorrect state transition | Michael Neuling | 1 | -1/+0
When waiting in PHB4_SLOT_CRESET_WAIT_CQ for transactions to end, we incorrectly move on to the next state. Generally we don't hit this as the transactions have ended already anyway. This removes the incorrect state transition. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Set default lane equalisation | Michael Neuling | 1 | -18/+27
Set default lane equalisation if there is nothing in the device-tree. Default value taken from hdat and confirmed by hardware team. Neatens the code up a bit too. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Fix PHB4 fence recovery | Benjamin Herrenschmidt | 1 | -13/+20
We had a few problems:
- We used the wrong register to trigger the reset (spec bug)
- We should clear the PFIR and NFIR while the reset is asserted
- ... and in the right order!
- We should only apply the DD1 workaround after the reset has been lifted
- We should ensure we use ASB whenever we are fenced or doing a CRESET
- Make config ops write with ASB
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Use new accessors in a few places | Benjamin Herrenschmidt | 1 | -43/+43
This replaces use of MMIO registers with the new accessors in places that can be called during recovery procedures at times when the PHB can be fenced. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Add register access helpers | Benjamin Herrenschmidt | 1 | -0/+16
Those will pick between ASB (ie, XSCOM) accesses and direct MMIO based on PHB flags, thus allowing transparent access whether the PHB is fenced or not. Mark as unused for now so we don't get a warning. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
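The idea of a helper that picks the access path from a PHB flag can be sketched as follows. Everything here is hypothetical: the struct, the `use_asb` flag and the two backends are stand-ins, and the assumption that an MMIO read from a fenced PHB returns all 1s is a modelling choice for the sketch, not a statement about the real accessors.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NREGS 4

struct sketch_phb {
    bool fenced;                   /* MMIO path is unusable while set */
    bool use_asb;                  /* hypothetical flag selecting the ASB path */
    uint64_t regs[SKETCH_NREGS];   /* stand-in for hardware registers */
    unsigned mmio_reads;           /* counters so callers can see the path taken */
    unsigned asb_reads;
};

/* Direct MMIO: modelled as returning all 1s when the PHB is fenced. */
static uint64_t sketch_mmio_read(struct sketch_phb *p, unsigned idx)
{
    p->mmio_reads++;
    return p->fenced ? ~0ull : p->regs[idx];
}

/* ASB (XSCOM-style) access: modelled as working even when fenced. */
static uint64_t sketch_asb_read(struct sketch_phb *p, unsigned idx)
{
    p->asb_reads++;
    return p->regs[idx];
}

/* The helper: callers never need to know which path is in use. */
static uint64_t sketch_read_reg(struct sketch_phb *p, unsigned idx)
{
    return p->use_asb ? sketch_asb_read(p, idx) : sketch_mmio_read(p, idx);
}
```

Recovery code can then set the flag once when a fence is detected and keep calling the same helper throughout.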
2017-07-13 | phb4: Verbose EEH options | Benjamin Herrenschmidt | 1 | -24/+118
Enabled via nvram pci-eeh-verbose=true. ie. nvram -p ibm,skiboot --update-config pci-eeh-verbose=true Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-07-13 | phb4: Print more info when PHB fences | Benjamin Herrenschmidt | 1 | -7/+37
For now at PHBERR level. We don't have room in the diags data passed to Linux for these unfortunately. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-26 | Big log level reduction... | Benjamin Herrenschmidt | 1 | -11/+7
90% of what we print isn't useful to a normal user. This dramatically reduces the amount of messages printed by OPAL in normal circumstances. We still need to add a way to bump the log level at boot based on a BMC scratch register or some HDAT property. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-26 | phb4: Mask out write-1-to-clear registers in RC cfg | Russell Currey | 1 | -5/+39
The root complex config space only supports 4-byte accesses. Thus, when the client requests a smaller size write, we do a read-modify-write to the register. However, some registers have bits defined as "write 1 to clear". If we do an RMW cycle on such a register and such bits are 1 in the part that the client doesn't intend to modify, we will accidentally write back those 1's and clear the corresponding bit. This avoids it by masking out those magic bits from the "old" value read from the register. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
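The read-modify-write hazard described above can be reproduced with a small software model. This is a sketch under stated assumptions: `struct sketch_reg`, its `w1c_mask` field and the helper names are hypothetical, and byte lanes are numbered little-endian purely for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a register that only accepts 4-byte writes. */
struct sketch_reg {
    uint32_t value;      /* current register contents */
    uint32_t w1c_mask;   /* bits that clear when written as 1 */
};

/* Model a 32-bit write: W1C bits clear where a 1 is written, other
 * bits simply take the written value. */
static void sketch_write32(struct sketch_reg *r, uint32_t v)
{
    uint32_t w1c_kept = r->value & r->w1c_mask & ~v;

    r->value = (v & ~r->w1c_mask) | w1c_kept;
}

/* Write one byte at byte offset off (0..3) using read-modify-write. */
static void sketch_write8_rmw(struct sketch_reg *r, unsigned off, uint8_t byte)
{
    assert(off < 4);

    uint32_t shift = off * 8;
    uint32_t lane = 0xffu << shift;
    uint32_t old = r->value;      /* read */

    old &= ~r->w1c_mask;          /* mask W1C bits so no stray 1s are written back */
    sketch_write32(r, (old & ~lane) | ((uint32_t)byte << shift));
}
```

Without the masking step, a byte write to lane 0 would write back any set W1C bits in the other lanes and silently clear them. A write aimed at the W1C lane itself still clears exactly the bits the caller set to 1.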
2017-06-26 | phb4: Properly mask out link down errors during reset | Russell Currey | 1 | -7/+20
Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-26 | phb3/4: Silence a useless warning | Benjamin Herrenschmidt | 1 | -1/+1
PHBs don't have base location codes on non-FSP systems, and that is normal. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-26 | phb3/4: Move IO VPD preload out to a common place | Benjamin Herrenschmidt | 1 | -11/+0
The code is duplicated between phb3 and phb4 for no reason Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> # Conflicts: # core/init.c # hw/phb3.c Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-26 | phb4: Workaround bug in spec 053 | Benjamin Herrenschmidt | 1 | -2/+5
Wait for DLP PGRESET to clear *after* lifting the PCIe core reset Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-26 | phb4: DD2.0 updates | Benjamin Herrenschmidt | 1 | -113/+249
Support StoreEOI, full complements of PEs (twice as big TVT) and other updates. Also renumber init steps to match spec 063 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-21 | phb4: Harden init with bad PHBs | Michael Neuling | 1 | -0/+8
Currently if we read all 1's from the EEH or IRQ capabilities, we end up train wrecking on some other random code (eg. an assert() in xive). This hardens the PHB4 code to look for these bad reads and more gracefully fails the init for that PHB alone. This allows the rest of the system to boot and ignore those bad PHBs. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
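The hardening described above amounts to treating an all-1s capability read as a failed PHB and carrying on with the rest. The following is a minimal sketch of that pattern; `struct sketch_caps`, `sketch_caps_valid` and `sketch_init_all` are hypothetical names and do not reflect the actual init code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two capability registers read at init. */
struct sketch_caps {
    uint64_t eeh_cap;
    uint64_t irq_cap;
};

/* An all-1s read is treated as a sign that the PHB is not responding. */
static bool sketch_caps_valid(const struct sketch_caps *c)
{
    return c->eeh_cap != ~0ull && c->irq_cap != ~0ull;
}

/*
 * Try to initialise n PHBs. A bad PHB is skipped on its own rather than
 * aborting the whole boot. ok[i] records the outcome for PHB i; the
 * return value is the number of PHBs that initialised.
 */
static unsigned sketch_init_all(const struct sketch_caps *caps, bool *ok,
                                unsigned n)
{
    unsigned i, good = 0;

    for (i = 0; i < n; i++) {
        ok[i] = sketch_caps_valid(&caps[i]);
        if (!ok[i])
            continue;   /* fail this PHB alone, keep going */
        good++;
    }
    return good;
}
```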
2017-06-19 | capi: Handle HMI events | Christophe Lombard | 1 | -0/+33
Find the CAPP on the chip associated with the HMI event for PHB4. The recovery mode (re-initialization of the capp, resume of functional operations) is only available with P9 DD2. A new patch will be provided to support this feature. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-19 | capi: Load capp microcode | Christophe Lombard | 1 | -1/+25
CAPP microcode flash download and CAPP upload for PHB4. A new file 'capp.c' is created to receive common capp code for PHB3 and PHB4. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-19 | capi: Enable capi mode for PHB4 | Christophe Lombard | 1 | -5/+354
Enable the Coherent Accelerator Processor Interface (CAPI). The PHB is used as a CAPI interface. CAPI adapters can be connected to either PEC0 or PEC2. A single-port CAPI adapter can be connected to either PEC0 or PEC2, but a dual-port adapter can only be connected to PEC2:
CAPP0 attached to PHB0 (PEC0 - single port)
CAPP1 attached to PHB3 (PEC2 - single or dual port)
As we did for PHB3, a new specific file 'phb4-capp.h' is created to contain the CAPP register definitions. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-19 | Ensure P9 DD1 workarounds apply only to Nimbus | Michael Neuling | 1 | -6/+6
The workarounds for P9 DD1 are only needed for Nimbus. P9 Cumulus will be DD1 but doesn't need these same workarounds. This patch ensures the P9 DD1 workarounds only apply to Nimbus. It also renames some things to make clear what's what. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-19 | hw/phb4: Rework phb4_get_presence_state() | Gavin Shan | 1 | -21/+11
There are two issues in the current implementation: it should return an error code visible to Linux, which has the prefix OPAL_*, and the code isn't very obvious. This returns OPAL_HARDWARE when the PHB is broken. Otherwise, OPAL_SUCCESS is always returned. At the same time, it refactors the code to make it obvious: OPAL_PCI_SLOT_PRESENT is returned when the presence signal (low active) or PCIe link is active. Otherwise, OPAL_PCI_SLOT_EMPTY is returned. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
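The return-code and presence logic described above can be sketched as a pure function. The enum names and values below are placeholders chosen for the sketch (they stand in for the OPAL_* constants and are not claimed to match them), and the boolean inputs replace the register reads a real implementation would perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder return codes standing in for OPAL_SUCCESS / OPAL_HARDWARE. */
enum sketch_rc { SKETCH_SUCCESS = 0, SKETCH_HARDWARE = -1 };

/* Placeholder presence values standing in for OPAL_PCI_SLOT_EMPTY / _PRESENT. */
enum sketch_presence { SKETCH_SLOT_EMPTY = 0, SKETCH_SLOT_PRESENT = 1 };

/*
 * presence_pin_high is the raw level of the presence signal. The signal
 * is active low, so a low level means an adapter is present.
 * *val is only written when the function returns SKETCH_SUCCESS.
 */
static int sketch_get_presence(bool phb_broken, bool presence_pin_high,
                               bool link_active, uint8_t *val)
{
    if (phb_broken)
        return SKETCH_HARDWARE;

    if (!presence_pin_high || link_active)
        *val = SKETCH_SLOT_PRESENT;
    else
        *val = SKETCH_SLOT_EMPTY;

    return SKETCH_SUCCESS;
}
```

Keeping the error code separate from the presence value means the caller can tell a broken PHB apart from an empty slot.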
2017-06-16 | hw/phys_map: Use GCIDs as a chip index | Oliver O'Halloran | 1 | -6/+4
Currently we pass in a proc_chip structure to phys_map_get(). All we really need from this structure is the Global Chip ID (GCID). This patch reworks the function so that we only need to pass the GCID, which allows us to use it before the proc_chip structures have been initialised (i.e. in the HDAT parser). Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Acked-By: Michael Neuling <mikey@neuling.org> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-15 | phb4: Error injection for config space | Russell Currey | 1 | -1/+181
Implement CFG (config space) error injection. This works the same as PHB3. MMIO and DMA error injection require a rewrite, so they're unsupported for now. While it's not feature complete, this at least provides an easy way to inject an error that will trigger EEH. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-15 | phb4: Error clear implementation | Russell Currey | 1 | -68/+57
In PHB3 there were separate recovery procedures depending on the class of error. PHB4 performs almost exactly the same steps in recovering from any class of error, so change phbX_err_ER_clear() to phbX_err_clear() for this implementation. Since the same sequence gets used, call this function in phb4_creset() - which is used to handle fatal (fence) errors - where it was not called in previous hardware revisions. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-15 | phb4: Mask link down errors during reset | Russell Currey | 1 | -0/+8
During a hot reset the PCI link will drop, so we need to mask link down events to prevent unnecessary errors. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-15 | phb4: Implement root port initialization | Russell Currey | 1 | -6/+11
phb4_root_port_init() was a NOP before, so fix that. Nothing PHB4-specific here. Something may be required in future. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-15 | phb4: Complete reset implementation | Russell Currey | 1 | -2/+43
This implements complete reset (creset) functionality for POWER9 DD1. Only partially tested and contends with some DD1 errata, but it's a start. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-07 | phb4: Activate shared PCI slot on witherspoon | Frederic Barrat | 1 | -1/+1
Witherspoon systems come with a 'shared' PCI slot: physically, it looks like a x16 slot, but it's actually two x8 slots connected to two PHBs of two different chips. Taking advantage of it requires some logic on the PCI adapter. Only the Mellanox CX5 adapter is known to support it at the time of this writing. This patch enables support for the shared slot on witherspoon if a x16 adapter is detected. Each x8 slot has a presence bit, so both bits need to be set for the activation to take place. Slot sharing is activated through a gpio. Note that there's no easy way to be sure that the card is indeed a shared-slot compatible PCI adapter and not a normal x16 card. Plugging a normal x16 adapter into the shared slot should be avoided on witherspoon, as the link won't train on the second slot, resulting in a timeout and a longer boot time. Only the first slot is usable and the x16 adapter will end up using only half the lanes. If the PCI card plugged into the physical slot is only x8 (or less), then the presence bit of the second slot is not set, so this patch does nothing. The x8 (or less) adapter should work like on any other physical slot. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: re-org code, move into platform file] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-06 | phb4: Block D-state power management on direct slots | Benjamin Herrenschmidt | 1 | -4/+30
As current revisions of PHB4 don't properly handle the resulting L1 link transition. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-06 | phb4: Remove long unused CFG_4B_WORKAROUND | Benjamin Herrenschmidt | 1 | -37/+0
This was used for early broken simulators Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-06-06 | phb4: Call pci config filters | Benjamin Herrenschmidt | 1 | -0/+12
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-31 | core/pci: Rename pci_slot_op poll to run_sm | Michael Neuling | 1 | -1/+1
This renames the "poll" op to "run_sm" (short for run state machine). I think this is a better name since the function does a bunch of things like resetting the slot. Also it avoids confusion with the "poll_link" op which does something different (and can even be called from run_sm). No functional change. Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-12 | phb4: Add an option for disabling EEH MMIO in nvram | Russell Currey | 1 | -3/+8
Having the option to disable EEH for MMIO without rebuilding skiboot could be useful for testing, so check for pci-eeh-mmio=disabled in nvram. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-05-10 | phb4: Cleanup BAR inits and logging | Michael Neuling | 1 | -57/+20
We always assign BARs in phb4, so this removes the unnecessary force assign logic. This patch also cleans up the logging to make it less verbose. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>