path: root/hw/phb4.c
Age | Commit message | Author | Files | Lines
2021-06-24 | phb4: Avoid MMIO load freeze escalation on every chip | Mahesh Salgaonkar | 1 | -1/+5
[ Upstream commit d51eb6f95e7078235ba2217e2dc9fc53e65bc902 ] Commit f397cc30bdf8 ("phb4: Only escalate freezes on MMIO load where necessary") restricted escalation to the chips that actually need it. However, it missed one case which still causes the escalation on every chip. As a result, EEH recovery performs a full PHB reset on some chips where it is not necessary. This patch fixes that. Also, add a check for a P9 chip in the phb4_escalation_required() function. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | phb4: Disable TCE cache line buffer | Frederic Barrat | 1 | -0/+1
[ Upstream commit 15b93a301509ba7813343540e25b47ba395674b9 ] This patch implements a circumvention for HW557787. It disables the TCE cache line buffer as, under heavy loads, there's a possibility of an entry being re-allocated incorrectly. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | Revert "mowgli: Limit slot1 to Gen3 by default" | LuluTHSu | 1 | -20/+0
[ Upstream commit de20b93849c3cdee62ff066e079b5460737e8609 ] This reverts commit 5262cdd1b99f77bca5951fc8132f9795ef0c2b87. When the link is reset or retrained, this method cannot maintain the max-link-speed limit, so remove it. Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-11-02 | phb4: Finish removing P9 DD1 workaround on LSIs | Cédric Le Goater | 1 | -4/+1
Commit ad7e9a67c4e4 ("xive/p9: obsolete OPAL_XIVE_IRQ_SHIFT_BUG flags") forgot to remove the internal flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-11-02 | mowgli: Limit slot1 to Gen3 by default | LuluTHSu | 1 | -0/+21
Per the mowgli platform spec, limit the slot to Gen3 speed. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-08-28 | hw/phb4: Verify AER support before initialising AER regs | Oliver O'Halloran | 1 | -0/+3
Check the AER capability offset pointer is non-zero before enabling the AER messages. If the device doesn't support AER we end up writing garbage to config offset 0x0 + PCIECAP_AER_CAPCTL, or 0x18. For a normal device this is one of the BARs so this doesn't do much, but for a bridge this results in overwriting:
  0x18 - The primary bus number
  0x19 - The secondary bus number
  0x1A - The subordinate bus number
  0x1B - The latency timer
0x1B is hardwired to zero for PCIe devices, but overwriting the bus number register can cause issues with routing of config space accesses. It's worth pointing out that we write actual values for the secondary and subordinate bus numbers before scanning the secondary bus, but the primary bus number is never restored.
Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
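The guard described in this commit can be sketched as follows. This is a minimal illustration, not the skiboot code: `struct fake_dev`, `fake_cfg_write32()`, `init_aer()` and `AER_CAPCTL_OFFSET` are hypothetical names, and the only values taken from the message are the zero capability offset and the 0x18 register offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative offset of the AER capability control register within the
 * AER capability; 0x18 is the value quoted in the commit message. */
#define AER_CAPCTL_OFFSET 0x18u

/* Hypothetical device model: aer_cap is the config space offset of the AER
 * capability, or 0 when the device does not implement AER. */
struct fake_dev {
	uint32_t aer_cap;
	uint32_t last_write_offset;
	bool wrote;
};

/* Records which config offset would have been written. */
static void fake_cfg_write32(struct fake_dev *d, uint32_t offset)
{
	assert(d);
	d->last_write_offset = offset;
	d->wrote = true;
}

/* Returns true if the AER registers were touched. */
static bool init_aer(struct fake_dev *d)
{
	assert(d);
	if (d->aer_cap == 0)
		return false;	/* no AER: 0x0 + 0x18 would hit the bus number registers */

	fake_cfg_write32(d, d->aer_cap + AER_CAPCTL_OFFSET);
	return true;
}
```

With the check in place, a device without AER is left untouched, while a device whose capability sits at, say, 0x100 gets its write at 0x118.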
2020-08-28 | hw/phb4: Actually enable error reporting | Oliver O'Halloran | 1 | -0/+1
PHB3 had an errata about correctable errors and when Ben was doing the initial PHB4 port he deleted the corresponding config write to DEVCTL. Whoops. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Enable error interrupts | Oliver O'Halloran | 1 | -1/+39
In PHB4 the PHB's error and informational interrupts were changed to behave more like actual LSIs. On PHB3 these interrupts would only be raised on a 0 -> 1 transition of an error status bit (i.e. they were rising edge triggered). On PHB4 the error interrupts are "true" LSIs and will be re-raised as long as the underlying error status bit is set. This causes a headache for us because OPAL's PHB error handling model requires Skiboot to preserve the state of the PHB (including errors) until the kernel is ready to handle the error. As a result we can't do anything in Skiboot to handle the interrupt condition and we need to mask the error internally. We can do this by clearing the relevant bits in the IRQ_ENABLE registers of the PHB. It's worth pointing out that we don't want to mask the interrupt by setting the Q bit in the XIVE ESBs. The ESBs are owned by the OS which may be masking and unmasking the interrupt for its own reasons (e.g. migrating IRQs). Skiboot modifying the ESB state could potentially cause problems and should be avoided. Cc: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
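The masking approach can be modelled in a few lines. This is only a sketch under simplifying assumptions: a single status register and a single enable register, with the level-triggered interrupt asserted whenever an enabled status bit is set. `struct fake_phb_err` and both helpers are hypothetical and do not reflect the real register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one error status / IRQ enable register pair. */
struct fake_phb_err {
	uint64_t status;	/* latched error bits, preserved for the kernel */
	uint64_t irq_enable;	/* which status bits may raise the LSI */
};

/* A level-triggered interrupt stays asserted while any enabled bit is set. */
static bool lsi_asserted(const struct fake_phb_err *r)
{
	assert(r);
	return (r->status & r->irq_enable) != 0;
}

/* Mask by clearing enable bits; the error state itself is left alone. */
static void mask_err_irq(struct fake_phb_err *r, uint64_t bits)
{
	assert(r);
	r->irq_enable &= ~bits;
}
```

The point of the design is visible in the model: after masking, the interrupt is quiet but `status` still holds the error for the OS to inspect.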
2020-06-03 | hw/phb4: Factor out interrupt setup | Oliver O'Halloran | 1 | -5/+11
Move the unmasking (enabling) of the various PHB error and informational interrupts out of the main init sequence. We'll need this elsewhere to enable the PHB error interrupts. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Don't disable TXE(12) interrupt if on P9 DD2.0 | Oliver O'Halloran | 1 | -4/+1
Commit 7dbf80d1db45 ("phb4: Generate checkstop on AIB ECC corr/uncorr for DD2.0 parts") changed the PHB inits so that on DD2.0 TXE error bit 12 would cause a checkstop. The patch also changes the TXE_ERR_IRQ_ENABLE settings to prevent this bit from causing a PHB error interrupt. However, there's not much point in doing this since the system is going to checkstop anyway. Removing the code to disable the interrupt simplifies the situation a bit and avoids conflating FIR propagation with the normal PHB error interrupts. The PHB spec is actively confusing in this area since it describes the TXE Error summary bit in the LEM FIR as an "interrupt" even though it's completely separate from the PHB's LSI error reporting interrupt. Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Fix interrupt names | Oliver O'Halloran | 1 | -0/+17
Linux doesn't seem to parse the interrupt-names property when there are unnamed (zero length string) interrupts. Add a name callback to the interrupt source and go from there. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-06-03 | hw/phb4: Make error interrupt handler compile | Oliver O'Halloran | 1 | -3/+1
When phb4.c was copied from phb3.c the error interrupts were disabled by default and apparently never re-enabled. Remove the #if 0 block and call phb4_set_err_pending() rather than the phb3 version. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-30 | hw/phb4: Tune GPU direct performance on witherspoon in PCI mode | Frederic Barrat | 1 | -24/+29
Good GPU direct performance on witherspoon, with a Mellanox adapter on the shared slot, requires reallocating some DMA engines within PEC2, "stealing" some from PHB4&5 and giving extras to PHB3. This is currently done when using CAPI mode, but the same is true if the adapter stays in PCI mode. In preparation for upcoming versions of MOFED, which may not use CAPI mode, this patch reallocates DMA engines even in PCI mode for a series of Mellanox adapters that can be used with GPU direct, on witherspoon and on the shared slot only. The loss of DMA engines for PHB4&5 on witherspoon has not shown problems in testing, nor in current deployments where CAPI mode is used. Here is a comparison of the bandwidth numbers seen with the PHB in PCI mode (no CAPI) with and without this patch. Variations on smaller packet sizes can be attributed to jitter and are not that meaningful.
# OSU MPI-CUDA Bi-Directional Bandwidth Test v5.6.1
# Send Buffer on DEVICE (D) and Receive Buffer on DEVICE (D)
# Size      Bandwidth (MB/s)   Bandwidth (MB/s)
#           with patch         without patch
1           1.29               1.48
2           2.66               3.04
4           5.34               5.93
8           10.68              11.86
16          21.39              23.71
32          42.78              49.15
64          85.43              97.67
128         170.82             196.64
256         385.47             383.02
512         774.68             755.54
1024        1535.14            1495.30
2048        2599.31            2561.60
4096        5192.31            5092.47
8192        9930.30            9566.90
16384       18189.81           16803.42
32768       24671.48           21383.57
65536       28977.71           24104.50
131072      31110.55           25858.95
262144      32180.64           26470.61
524288      32842.23           26961.93
1048576     33184.87           27217.38
2097152     33342.67           27338.08
Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Cc: skiboot-stable@lists.ozlabs.org # skiboot-op940.x Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-03-12 | Re-license contributions from Raptor Computer Systems | Oliver O'Halloran | 1 | -1/+1
The following files contain contributions from Timothy Pearson at Raptor Computer Systems. He has agreed to re-license these contributions as Dual Apache 2.0 / GPLv2+, so amend the SPDX tag to reflect that. hw/phb4.c include/phb4.h include/platform.h platforms/astbmc/talos.c platforms/astbmc/romulus.c Cc: Timothy Pearson <tpearson@raptorengineering.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-12-16 | dt: assorted cleanups | Nicholas Piggin | 1 | -16/+12
This replaces several instances of dt accesses with higher level primitives throughout the tree. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-12-16 | phb4: make endian-clean | Nicholas Piggin | 1 | -166/+176
Convert phb4 dt construction and in-memory hardware tables to use explicit endian conversions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
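As a rough illustration of what "explicit endian conversions" means for an in-memory hardware table, the sketch below stores and loads a 64-bit value in big-endian byte order regardless of host endianness. The helper names are hypothetical; skiboot's own conversion helpers are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Store val at dst in big-endian byte order, independent of the host CPU. */
static void store_be64(uint8_t *dst, uint64_t val)
{
	assert(dst);
	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(val >> (56 - 8 * i));
}

/* Load a big-endian 64-bit value from src. */
static uint64_t load_be64(const uint8_t *src)
{
	uint64_t val = 0;

	assert(src);
	for (int i = 0; i < 8; i++)
		val = (val << 8) | src[i];
	return val;
}
```

Code written this way produces the same table contents whether the firmware runs big-endian or little-endian, which is the property an endian-clean conversion is after.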
2019-12-16 | opal-api: add endian conversions to most opal calls | Nicholas Piggin | 1 | -3/+3
This adds missing endian conversions to most calls, sufficient at least to handle calls from a kernel booting on mambo. Subsystems requiring more extensive changes (e.g., xive) will be done with individual changes. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-12-04 | phb4: Add PHB options get/set OPAL calls | Alexey Kardashevskiy | 1 | -0/+59
These are new OPAL calls to tweak various PHB parameters. The first two are:
- TVT Select 'GTE4GB' Option of the PHB control register to enable use of the second TVE for DMA traffic just above 4GB;
- MMIO EEH Disable to disable EEH for all MMIO commands.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-11 | Remove dead POWER7 code | Nicholas Piggin | 1 | -11/+0
There are a number of proc_gen branches removed that are trivially dead code and comments that refer to P7. As well as those:
- Oliver points out that add_xics_icps() must be unused on POWER8 because it asserts if number of threads > 4, so remove it.
- Change 16b7ae641 ("Remove POWER7 and POWER7+ support") removed all references to opal_boot_trampoline, so remove that.
- It also removed the only non-trivial choose_bus implementation, so that is removed and its caller simplified.
- Remove the paca code, later CPUs use pcia.
Cc: Stewart Smith <stewart@flamingspork.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-11-04 | xive/p9: obsolete OPAL_XIVE_IRQ_SHIFT_BUG flags | Cédric Le Goater | 1 | -0/+1
These were needed to work around HW bugs in PHB4 LSIs of POWER9 DD1.0 processors. HW395455 P9/PHB4: Wrong Interrupt ESB CI Load Opcode Location in 64K page mode Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-08-23 | pci: recheck pci nvram hacks on fast-reboot | Oliver O'Halloran | 1 | -2/+0
Sometimes it's useful to fiddle with some of the PCI NVRAM options that we have. Currently this is mostly for enabling and disabling pci-tracing mode, but having a common place for this stuff is a good idea. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2019-08-16 | hw/phb4: Use standard MIN/MAX macro definitions | Jordan Niethe | 1 | -6/+3
The max() macro definition incorrectly returns the minimum value. The max() macro is used to ensure that PERST has been asserted for 250ms and that we wait 100ms for the ETU logic in the CRESET_START PHB4 PCI slot state. However, by returning the minimum value there is no guarantee that either of these requirements is met. Correct macro definitions for MIN and MAX are already provided in skiboot.h. Remove the redundant/incorrect versions here and switch to using the standard ones. Fixes: 70edcbb4b39d ("hw/phb4: Skip FRESET PERST when coming from CRESET") Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
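To make the failure mode concrete, here is a small sketch contrasting a max() that picks the smaller operand with a correct one. Both macros are written for this example only (the exact text of the removed macro and of skiboot.h's MAX is not reproduced), and `creset_wait_ms()` is a hypothetical helper using the 100ms and 250ms figures from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a "max" whose comparison is inverted, so it
 * actually yields the smaller operand. */
#define BROKEN_MAX(a, b)	((a) < (b) ? (a) : (b))

/* A conventional definition, written here for the example. */
#define EXAMPLE_MAX(a, b)	((a) > (b) ? (a) : (b))

/* Hypothetical helper: wait long enough to satisfy both requirements. */
static uint64_t creset_wait_ms(uint64_t etu_ms, uint64_t perst_ms)
{
	uint64_t wait = EXAMPLE_MAX(etu_ms, perst_ms);

	/* The chosen wait must cover both delays. */
	assert(wait >= etu_ms && wait >= perst_ms);
	return wait;
}
```

With the inverted macro a 100ms/250ms pair would yield a 100ms wait, which cuts the PERST assertion short; the corrected form yields 250ms and covers both.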
2019-08-16 | hw/phb4: Prevent register accesses when in reset | Oliver O'Halloran | 1 | -0/+10
While the ETU is in reset we cannot access any of the PHB registers. If a PHB register is accessed via the XSCOM indirect interface then we'll cause an ETU reset error which may prevent the PHB from being re-initialised once the reset is lifted. Prevent register accesses while in reset by adding a flag that is set while the ETU reset bit is high and checking that flag in the XSCOM (ASB) backdoor register access path. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
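The flag-based guard might look roughly like the sketch below. Everything here is hypothetical (`struct fake_phb`, the helper names, and in particular the choice to return all-ones for a refused read); it only illustrates the idea of checking a software flag before touching the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PHB; not skiboot's struct phb4. */
struct fake_phb {
	bool etu_in_reset;	/* set while the ETU reset bit is high */
	uint64_t reg;		/* a single pretend PHB register */
};

/* Value returned when an access is refused (an assumption of this sketch). */
#define FAKE_REG_INVALID UINT64_MAX

static void fake_set_etu_reset(struct fake_phb *p, bool asserted)
{
	assert(p);
	p->etu_in_reset = asserted;
}

static uint64_t fake_asb_read(const struct fake_phb *p)
{
	assert(p);
	if (p->etu_in_reset)
		return FAKE_REG_INVALID;	/* never touch the hardware while in reset */
	return p->reg;
}

static void fake_asb_write(struct fake_phb *p, uint64_t val)
{
	assert(p);
	if (p->etu_in_reset)
		return;		/* drop the write rather than cause an ETU reset error */
	p->reg = val;
}
```

Keeping the check inside the accessors means callers do not need to know whether a reset is in progress.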
2019-08-16 | pci: Use a macro for accessing PCI BDF Bus Number | Jordan Niethe | 1 | -3/+3
Currently when the Bus Number bits of a BDF are needed the bit operations to get it are open-coded. There are many places where the Bus Number is used, so make a macro to use instead of open-coding it every time. Signed-off-by: Jordan Niethe <jniethe5@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
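For reference, a 16-bit PCI BDF packs the bus number in bits 15:8, the device number in bits 7:3 and the function number in bits 2:0. The macros below are written for this example and are not necessarily the names skiboot uses.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative accessors for a 16-bit bus/device/function value. */
#define EX_BDF_BUS(bdf)	(((bdf) >> 8) & 0xff)
#define EX_BDF_DEV(bdf)	(((bdf) >> 3) & 0x1f)
#define EX_BDF_FN(bdf)	((bdf) & 0x7)

/* Hypothetical helper: rebuild a BDF from its parts. */
static uint16_t ex_make_bdf(unsigned int bus, unsigned int dev, unsigned int fn)
{
	assert(bus <= 0xff && dev <= 0x1f && fn <= 0x7);
	return (uint16_t)((bus << 8) | (dev << 3) | fn);
}
```

For example, 0x1a2b decodes to bus 0x1a, device 5, function 3, and `ex_make_bdf(0x1a, 5, 3)` gives back 0x1a2b.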
2019-07-26 | SPDX-ify all skiboot code | Stewart Smith | 1 | -17/+4
Use Software Package Data Exchange (SPDX) to indicate license for each file that is unique to skiboot. At the same time, ensure the (C) who and years are correct. See https://spdx.org/ Signed-off-by: Stewart Smith <stewart@linux.ibm.com> [oliver: Added a few missing files] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-06-27 | pci: Make the pci-eeh-verbose nvram option generic | Oliver O'Halloran | 1 | -7/+3
We currently have the "pci-eeh-verbose" NVRAM flag that causes phb4 to print a register dump when it detects the PHB has been fenced. This is useful for debugging most EEH issues since the kernel may not be ready to handle EEH events when the problem is first detected. There's no real reason this needs to be specific to PHB4 so this patch moves the nvram flag handling into the generic init path (along with the pcie_max_link_speed flag) so we can add a similar function for PHB3. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Move FSP-specific VPD functionality to platforms/ibm-fsp/ | Stewart Smith | 1 | -1/+2
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Make phb4_training_trace() more general | Oliver O'Halloran | 1 | -19/+25
phb4_training_trace() is used to monitor the Link Training Status State Machine (LTSSM) of the PHB's data link layer. Currently it is only used to observe the LTSSM while bringing up the link, but sometimes it's useful to see what's occurring in other situations (e.g. link disable, or secondary bus reset). This patch renames it to phb4_link_trace() and lets the caller specify the target LTSSM state and a timeout to help in these situations. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Set trace enable where it's used | Oliver O'Halloran | 1 | -12/+22
The current LTSSM state was added to the PHB4 link trace output in 961547bceed3 ("phb4: Enhanced PCIe training tracing"). That patch split enabling the LTSSM state output from the rest of the tracing code in phb4_training_trace() to ensure that it would capture events from right after PERST is lifted. This is not really necessary since LTSSM state changes occur over milliseconds. We lose nothing by delaying the enable slightly so this patch moves it into phb4_training_trace() to keep the tracing code in one place. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Add missing LTSSM states | Oliver O'Halloran | 1 | -0/+6
The "disabled" and "loopback" states are missing from the table. We never expect to see the second, but the first does occasionally come up. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Use read/write_reg in assert_perst | Oliver O'Halloran | 1 | -2/+2
While the PHB is fenced we can't use the MMIO interface to access PHB registers. While processing a complete reset we inject a PHB fence to isolate the PHB from the rest of the system because the PHB won't respond to MMIOs from the rest of the system while being reset. We assert PERST after the fence has been erected which requires us to use the XSCOM indirect interface to access the PHB registers rather than the MMIO interface. Previously we did that when asserting PERST in the CRESET path. However, in b8b4c79d4419 ("hw/phb4: Factor out PERST control") this was re-written to use the raw in_be64() accessor. This means that CRESET would not be asserted in the reset path. On some Mellanox cards this would prevent them from re-loading their firmware when the system was fast-reset. This patch fixes the problem by replacing the raw {in|out}_be64() accessors with the phb4_{read|write}_reg() functions. Reported-by: Carol L Soto <clsoto@us.ibm.com> Fixes: b8b4c79d4419 ("hw/phb4: Factor out PERST control") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Carol L Soto <clsoto@us.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
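The reason a wrapper is preferable to a raw MMIO accessor can be sketched as below: the wrapper picks the access path based on the fence state, so callers work in both situations. The structure and counters are hypothetical and stand in for the real accessors named in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model that only counts which path a write took. */
struct fake_phb_paths {
	bool fenced;			/* PHB is fenced: MMIO will not respond */
	unsigned int mmio_writes;
	unsigned int xscom_writes;
};

/* Wrapper in the spirit of a read/write_reg helper: fenced means XSCOM. */
static void fake_write_reg(struct fake_phb_paths *p)
{
	assert(p);
	if (p->fenced)
		p->xscom_writes++;	/* indirect path still works behind the fence */
	else
		p->mmio_writes++;	/* normal fast path */
}
```

A caller that always used the MMIO path directly would silently lose its write once the fence is up, which is the class of bug the commit describes.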
2019-06-03 | hw/phb4: Assert Link Disable bit after ETU init | Oliver O'Halloran | 1 | -0/+6
The cursed RAID card in ozrom1 has a bug where it ignores PERST being asserted. The PCIe Base spec is a little vague about what happens while PERST is asserted, but it does clearly specify that when PERST is de-asserted the Link Training and Status State Machine (LTSSM) of a device should return to the initial state (Detect) defined in the spec and the link training process should restart. This bug was worked around in 9078f8268922 ("phb4: Delay training till after PERST is deasserted") by setting the link disable bit at the start of the FRESET process and clearing it after PERST was de-asserted. Although this fixed the bug, the patch offered no explanation of why the fix worked. In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable workaround was moved into phb4_assert_perst(). This is always called in the CRESET case, but a following patch resulted in assert_perst() not being called if phb4_freset() was entered following a CRESET since p->skip_perst was set in the CRESET handler. This is bad since a side-effect of the CRESET is that the Link Disable bit is cleared. This, combined with the RAID card ignoring PERST, results in the PCIe link being trained by the PHB while we're waiting out the 100ms ETU reset time.
If we hack skiboot to print a DLP trace after returning from phb4_init_hw() we get:
  PHB#0001[0:1]: Initialization complete
  PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
  PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
  PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
  PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
  PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
  PHB#0001[0:1]: CRESET: wait_time = 100
  PHB#0001[0:1]: FRESET: Starts
  PHB#0001[0:1]: FRESET: Prepare for link down
  PHB#0001[0:1]: FRESET: Assert skipped
  PHB#0001[0:1]: FRESET: Deassert
  PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
  PHB#0001[0:1]: TRACE: Reached target state
  PHB#0001[0:1]: LINK: Start polling
  PHB#0001[0:1]: LINK: Electrical link detected
  PHB#0001[0:1]: LINK: Link is up
  PHB#0001[0:1]: LINK: Went down waiting for stabilty
  PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
  PHB#0001[0:1]: CRESET: Starts
What has happened here is that the link is trained to 8x Gen3 33ms after we return from phb4_init_hw(), and before we've waited the 100ms that we normally wait after re-initialising the ETU. When we "deassert" PERST later on in the FRESET handler the link is in L0 (normal) state. At this point we try to read from the Vendor/Device ID register to verify that the link is stable and immediately get a PHB fence due to a PCIe Completion Timeout. Skiboot attempts to recover by doing another CRESET, but this will encounter the same issue. This patch fixes the problem by setting the Link Disable bit (by calling phb4_assert_perst()) immediately after we return from phb4_init_hw().
This prevents the link from being trained while PERST is asserted which seems to avoid the Completion Timeout. With the patch applied we get:
  PHB#0001[0:1]: Initialization complete
  PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
  PHB#0001[0:1]: CRESET: wait_time = 100
  PHB#0001[0:1]: FRESET: Starts
  PHB#0001[0:1]: FRESET: Prepare for link down
  PHB#0001[0:1]: FRESET: Assert skipped
  PHB#0001[0:1]: FRESET: Deassert
  PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
  PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
  PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
  PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
  PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
  PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
  PHB#0001[0:1]: TRACE: Reached target state
  PHB#0001[0:1]: LINK: Start polling
  PHB#0001[0:1]: LINK: Electrical link detected
  PHB#0001[0:1]: LINK: Link is up
  PHB#0001[0:1]: LINK: Link is stable
  PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
  PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
  PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
  PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000
Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | Remove remnants of OPAL_PCI_GET_PHB_DIAG_DATA | Stewart Smith | 1 | -1/+0
Never present in a public OPAL release, and only kernels prior to 3.11 would ever attempt to call it. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-20 | hw/phb4: Make pci-tracing print at PR_NOTICE | Oliver O'Halloran | 1 | -4/+7
When pci-tracing is enabled we print each trace status message and the final trace status at PR_ERROR. The final status messages are similar to those printed when we fail to train in the non-pci-tracing path and this has resulted in spurious op-test failures. This patch reduces the log-level of the tracing message to PR_NOTICE so they're not accidentally interpreted as actual error messages. PR_NOTICE messages are still printed to the console during boot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-15 | nvram: Flag dangerous NVRAM options | Michael Neuling | 1 | -6/+5
Most nvram options used by skiboot are just for debug or testing for regressions. They should never be used long term. We've hit a number of issues in testing and the field where nvram options have been set "temporarily" but haven't been properly cleared after, resulting in crashes or real bugs being masked. This patch marks most nvram options used by skiboot as dangerous and prints a chicken to remind users of the problem. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-02 | hw/phb4: Fix references to PHB3 | Oliver O'Halloran | 1 | -2/+2
Currently most of the functionality of phb4_lsi_attributes() is disabled when we have #defined DISABLE_ERR_INTS. This is the default behaviour and #undefing the constant results in skiboot not compiling because the code was not updated when it was copied across from PHB3. This patch fixes the problem by changing the names to the phb4 versions. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-29 | hw/phb4: Read ibm,loc-code from PBCQ node | Oliver O'Halloran | 1 | -2/+2
On P9 the PBCQs are subdivided by stacks which implement the PCI Express logic. When phb4 was forked from phb3 most of the properties that were in the pbcq node moved into the stack node, but ibm,loc-code was not one of them. This patch fixes the phb4 init sequence to read the base location code from the PBCQ node (parent of the stack node) rather than the stack node itself. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-04-17 | hw/phb4: Squash the IO bridge window | Oliver O'Halloran | 1 | -0/+8
The PCI-PCI bridge spec says that bridges that do not implement an IO window should hardcode the IO base and limit registers to zero. Unfortunately, these registers only define the upper bits of the IO window and the low bits are assumed to be 0 for the base and 1 for the limit address. As a result, setting both to zero can be mis-interpreted as a 4K IO window. This patch fixes the problem the same way PHB3 does. It sets the IO base and limit values to 0xf000 and 0x1000 respectively which most software interprets as a disabled window.
lspci before patch:
  0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
    I/O behind bridge: 00000000-00000fff
lspci after patch:
  0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
    I/O behind bridge: None
Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
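The decoding that leads to the phantom 4K window can be sketched as follows. This assumes 16-bit I/O decoding only (the upper 16-bit base/limit registers are ignored) and takes the 8-bit register values 0xf0 and 0x10 as the encodings of the 0xf000 and 0x1000 addresses mentioned above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a bridge I/O window (hypothetical type). */
struct io_window {
	uint32_t base;
	uint32_t limit;
	bool enabled;
};

/* The top nibble of each register supplies address bits 15:12. The low 12
 * bits are implied: all zeroes for the base, all ones for the limit. */
static struct io_window decode_io_window(uint8_t base_reg, uint8_t limit_reg)
{
	struct io_window w;

	w.base = (uint32_t)(base_reg & 0xf0) << 8;
	w.limit = ((uint32_t)(limit_reg & 0xf0) << 8) | 0xfff;
	w.enabled = w.base <= w.limit;	/* base above limit means no window */
	assert(w.limit >= 0xfff);	/* the implied low bits are always set */
	return w;
}
```

With both registers at zero this yields 0x0000-0x0fff, matching the "before" lspci output; with 0xf0/0x10 the base exceeds the limit, so the window reads as disabled.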
2019-03-28 | hw/phb4: Drop FRESET_DEASSERT_DELAY state | Oliver O'Halloran | 1 | -5/+0
The delay between the ASSERT_DELAY and DEASSERT_DELAY states is set to one timebase tick. This state seems to have been a hold over from PHB3 where it was used to add a 1s delay between de-asserting PERST and polling the link for the CAPI FPGA. There's no requirement for that here since the link polling on PHB4 is a bit smarter so we should be fine. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28 | hw/phb4: Factor out PERST control | Oliver O'Halloran | 1 | -28/+36
Some time ago Mikey added some code to work around a bug we found where a certain RAID card wouldn't come back again after a fast-reboot. The workaround is setting the Link Disable bit before asserting PERST and clearing it after de-asserting PERST. Currently we do this in the FRESET path, but not in the CRESET path. This patch moves the PERST control into its own function to reduce duplication and so that the workaround is applied in all circumstances. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28 | hw/phb4: Remove FRESET presence check | Oliver O'Halloran | 1 | -12/+2
When we do an freset the first step is to check if a card is present in the slot. However, this only occurs when we enter phb4_freset() with the slot state set to SLOT_NORMAL. This occurs in:
a) The creset path, and
b) When the OS manually requests an FRESET via an OPAL call.
a) is problematic because in the boot path the generic code will put the slot into FRESET_START manually before calling into phb4_freset(). This can result in a situation where a device is detected on boot, but not after a CRESET. I've noticed this occurring on systems where the PHB's slot presence detect signal is not wired to an adapter. In this situation we can rely on the in-band presence mechanism, but the presence check will make us exit before that has a chance to work. Additionally, if we enter from the CRESET path this early exit leaves the slot's PERST signal asserted. This isn't currently an issue, but if we want to support hotplug of devices into the root port it will be.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28 | hw/phb4: Skip FRESET PERST when coming from CRESET | Oliver O'Halloran | 1 | -1/+23
PERST is asserted at the beginning of the CRESET process to prevent the downstream device from interacting with the host while the PHB logic is being reset and re-initialised. There is at least a 100ms wait during the CRESET processing so it's not necessary to wait this time again in the FRESET handler. This patch extends the delay after resetting the PHB logic to the 250ms PERST wait period that we typically use and sets the skip_perst flag so that we don't wait this time again in the FRESET handler. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-03-28 | hw/phb4: Look for the hub-id from in the PBCQ node | Oliver O'Halloran | 1 | -3/+9
The hub-id is stored in the PBCQ node rather than the stack node so we never add it to the PHB node. This breaks the lxvpd slot lookup code since the hub-id is encoded in the VPD record that we need to find the slot information. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-25 | hw/phb4: Fix indentation of brdgCtl | Oliver O'Halloran | 1 | -2/+1
Come on bridge control register. You're letting the team down. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-26 | Retry link training at PCIe GEN1 if presence detected but training repeatedly failed | Timothy Pearson | 1 | -13/+46
Certain older PCIe 1.0 devices will not train unless the training process starts at GEN1 speeds. As a last resort when a device will not train, fall back to GEN1 speed for the last training attempt. This is verified to fix devices based on the Conexant CX23888 on the Talos II platform. Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com> [stewart: cut P9NDD1.0 support, fixup dt_max_link_speed] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
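The last-resort policy can be expressed as a small selection function. This is a sketch under assumptions: the attempt counting, the function name and its parameters are hypothetical, and only the rule itself (drop to GEN1 for the final attempt when a card is present) comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Pick the link speed (PCIe generation) for a given training attempt.
 * attempt is zero-based; max_attempts is the total number of tries. */
static unsigned int speed_for_attempt(unsigned int attempt,
				      unsigned int max_attempts,
				      unsigned int max_speed,
				      bool presence_detected)
{
	assert(max_attempts > 0 && attempt < max_attempts);
	assert(max_speed >= 1);

	/* Only the final attempt falls back, and only if a card is there. */
	if (presence_detected && attempt + 1 == max_attempts)
		return 1;	/* GEN1 */
	return max_speed;
}
```

Earlier attempts still run at the configured maximum, so devices that train normally are unaffected by the fallback.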
2019-02-18 | opal: Deprecate reading the PHB status | Alexey Kardashevskiy | 1 | -7/+2
The OPAL_PCI_EEH_FREEZE_STATUS call takes a bunch of parameters, one of them is @phb_status. It is defined as __be64* and always NULL in the current Linux upstream but if anyone ever decides to read that status, then the PHB3's handler will assume it is struct OpalIoPhb3ErrorData* (which is a lot bigger than 8 bytes) and zero it causing the stack corruption; p7ioc-phb has the same issue. This removes @phb_status from all eeh_freeze_status() hooks and moves the error message from PHB4 to the affected OPAL handlers. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Reviewed-By: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-18 | phb4: Update some comments | Oliver O'Halloran | 1 | -19/+13
I now know what an IODA cache is and I'm not happy about it. With the power of Comments™ you too can share the misery. Remove the big WARNING about the P8 specific hardware bug while we're here. That seems to have been copied over from phb3.c and no one thought about it too hard. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-18 | phb4: Eliminate peltv_cache | Oliver O'Halloran | 1 | -18/+12
The PELT-V is also an in-memory table and there is no reason to have two copies of it. Removing the cache shaves another 128KB off the size of each struct phb4. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-02-18 | phb4: Eliminate p->rte_cache | Oliver O'Halloran | 1 | -22/+15
In ancient times we added caches to struct phb3 for some of the IODA tables which can only be accessed indirectly via XSCOM. A cache for the Requester Translation Table (RTT) was also added even though this is an in-memory table. This was carried over to PHB4 when Ben did the initial copy and paste, but it's still largely pointless. There's no real need to have a second copy of the table. This patch removes the "cache" and changes all the users to reference the RTT directly if we need to. This reduces the size of the struct phb4 by 128KB. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
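The 128KB figure is consistent with a table indexed by a 16-bit requester ID holding one 16-bit entry per ID. Both of those widths are assumptions of this sketch rather than something stated in the message, and the names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed geometry: one entry per possible 16-bit requester ID. */
#define EX_RTT_ENTRIES	(1u << 16)

/* Assumed entry type: a 16-bit value per requester ID. */
typedef uint16_t ex_rtt_entry_t;

/* Size in bytes of a full shadow copy of such a table. */
static size_t ex_rtt_cache_bytes(void)
{
	size_t bytes = (size_t)EX_RTT_ENTRIES * sizeof(ex_rtt_entry_t);

	assert(bytes % 1024 == 0);
	return bytes;
}
```

Under those assumptions the shadow copy costs 65536 x 2 bytes = 131072 bytes, i.e. 128KB per PHB.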
2019-02-18 | phb4: Remove pointless NULL checks | Oliver O'Halloran | 1 | -12/+2
When we allocate the various in-memory tables we assert() on the allocation. There's no point in checking if the table pointer is NULL or not at runtime. Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>