path: root/hw
Age | Commit message | Author | Files | Lines
2017-10-18 | occ-sensors : Add OCC inband sensor region to exports | Shilpasri G Bhat | 1 | -1/+13
Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-18 | phb4: Fix PCIe GEN4 on DD2.1 and above | Michael Neuling | 1 | -4/+3
Commit eef0e197ab ("PHB4: Default to PCIe GEN3 on POWER9 DD2.00") clamped DD2.00 parts to GEN3, but unfortunately the clamp also applied to DD2.1 and above. This change restricts the clamp to DD2.00 only, and also cleans up the documentation and printing. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-16 | Revert "npu2: Add vendor cap for IRQ testing" | Alistair Popple | 1 | -28/+0
This reverts commit 9817c9e29b6fe00daa3a0e4420e69a97c90eb373, which seems to break setting the PCI dev flag and the link number in the PCIe vendor-specific config space. This leads to the device driver attempting to re-init the DL when it shouldn't, which can cause HMIs. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-16 | hw/imc: Fix the pvr (sub_id) for IMC Catalog load | Madhavan Srinivasan | 1 | -2/+2
Currently the IMC catalog carries multiple dtbs in the pnor partition, one for each POWER9 major version. The system pvr value (pvr_type and pvr_major version) is used as the sub-id to load the right dtb from the partition. Since the minor version of the pvr is not used, mask it out. Reported-by: Shriya <shriyak@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
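The masking described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the mask names, the helper `example_imc_subid()` and the assumed PVR layout (type in the upper 16 bits, major version in bits 11..8, minor version in bits 7..0) are assumptions made for the example.

```c
#include <stdint.h>

/* Assumed PVR layout, for illustration only:
 * bits 31..16 = processor type, bits 11..8 = major version,
 * bits 7..0 = minor version. */
#define EXAMPLE_PVR_TYPE_MASK     0xffff0000u
#define EXAMPLE_PVR_VERS_MAJ_MASK 0x00000f00u
#define EXAMPLE_PVR_VERS_MIN_MASK 0x000000ffu

/* Hypothetical helper: build the catalog sub-id from the PVR by keeping
 * the type and major version and dropping the minor version. */
static uint32_t example_imc_subid(uint32_t pvr)
{
	return pvr & (EXAMPLE_PVR_TYPE_MASK | EXAMPLE_PVR_VERS_MAJ_MASK);
}
```

With this sketch, two parts that differ only in minor version map to the same sub-id, so they would select the same dtb.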
2017-10-15 | hw/imc: pause microcode at boot | Madhavan Srinivasan | 1 | -0/+24
IMC nest counters have both in-band (ucode access) and out-of-band access. Since not all nest counter configurations are supported by ucode, out-of-band tools are used to characterize the other configurations. It is therefore preferable to pause the nest microcode at boot to aid the out-of-band tools. If the ucode is not paused and the OS does not have IMC driver support, the out-of-band tools will race with ucode and end up reading undesirable values. This patch checks and pauses the ucode at boot. OPAL provides APIs to control IMC counters. OPAL_IMC_COUNTERS_INIT is used to initialize these counters at boot. The OPAL_IMC_COUNTERS_START and OPAL_IMC_COUNTERS_STOP API calls should be used to start and pause these IMC engines. `doc/opal-api/opal-imc-counters.rst` details the OPAL APIs and their usage. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-15 | hw/imc: Use ARRAY_SIZE instead of static macro | Madhavan Srinivasan | 1 | -1/+1
disable_unavailable_units() loops through the nest_pmus array to filter out the unsupported nest units from the IMC catalog dtb. The current code uses a static macro ('MAX_NEST_UNITS') as the array limit; use ARRAY_SIZE instead. This avoids having to update the static macro whenever the nest_pmus array is updated. Fixes: 712837cedca06 ('skiboot/imc: Update the nest_pmus array with occ/gpe microcode uav updates') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
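A minimal sketch of why ARRAY_SIZE is the safer loop bound. The table `example_units` and the function `example_count_units()` are hypothetical stand-ins and do not reproduce the real nest_pmus contents; the macro is the common C idiom and is defined locally here so the example is self-contained.

```c
#include <stddef.h>

/* Common C idiom: element count derived from the array itself. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a unit table. Adding or removing an entry
 * changes ARRAY_SIZE(example_units) automatically; a separate
 * hand-maintained limit would have to be edited in step. */
static const char *const example_units[] = {
	"unit_a", "unit_b", "unit_c", "unit_d",
};

/* Walk the whole table using the derived bound. */
static size_t example_count_units(void)
{
	size_t i, n = 0;

	for (i = 0; i < ARRAY_SIZE(example_units); i++) {
		if (example_units[i])
			n++;
	}
	return n;
}
```

The point of the design is that the bound cannot drift out of sync with the table, which is the failure mode the commit message describes for a static limit.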
2017-10-15 | xive: Fix VP free block group mode false-positive parameter check | Nicholas Piggin | 1 | -1/+3
The check to ensure the buddy allocation idx is aligned to its allocation order was not taking into account the allocation split. This would result in opal_xive_free_vp_block failures despite giving the same value as returned by opal_xive_alloc_vp_block. E.g., starting then stopping 4 KVM guests gives the following pattern in the host:
opal_xive_alloc_vp_block(5)=0x45000020
opal_xive_alloc_vp_block(5)=0x45000040
opal_xive_alloc_vp_block(5)=0x45000060
opal_xive_alloc_vp_block(5)=0x45000080
opal_xive_free_vp_block(0x45000020)=-1
opal_xive_free_vp_block(0x45000040)=0
opal_xive_free_vp_block(0x45000060)=-1
opal_xive_free_vp_block(0x45000080)=0
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-15 | hw/p8-i2c: Fix deadlock in p9_i2c_bus_owner_change | Anton Blanchard | 1 | -1/+1
When debugging a system where Linux was taking soft lockup errors, I noticed two CPUs were stuck in OPAL:
CPU0: lock, p8_i2c_recover, opal_handle_interrupt
CPU1: sync_timer, cancel_timer, p9_i2c_bus_owner_change, occ_p9_interrupt, xive_source_interrupt, opal_handle_interrupt
p8_i2c_recover() is a timer, and is stuck trying to take master->lock. p9_i2c_bus_owner_change() has taken master->lock, but then is stuck waiting for all timers to complete. We deadlock. Fix this by using cancel_timer_async(), as suggested by Oliver.
Fixes: 201fd50f208d ("hw/p8-i2c: Fix OCC locking") Suggested-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP/CONSOLE: Limit number of error logging | Vasant Hegde | 1 | -8/+13
Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added error logging when the buffer is full. In some corner cases the kernel may call this function multiple times and we may end up logging the error again and again. This patch fixes that by generating the error log only once, which should be enough to indicate that something went wrong. Also, with the previous patch, once the console buffer is full, OPAL returns an error to the payload from fsp_console_write_buffer_space(), so the payload will never call fsp_console_write(). Hence move the error logging logic to the right place. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP/CONSOLE: Fix fsp_console_write_buffer_space() call | Vasant Hegde | 1 | -1/+35
The kernel calls fsp_console_write_buffer_space() to check console buffer space availability. If there is enough buffer space to write data, the kernel then calls fsp_console_write() to write the actual data. In some extreme corner cases (like the one explained in commit c8a7535f) the console becomes full and this function returns 0 to the kernel (or the space available in the console buffer is less than the size of the next incoming data). The kernel will keep retrying until it gets enough space, so we start seeing RCU stalls. This patch keeps track of the previously available space. If the previous space is the same as the current space, there is not enough space in the console buffer to write the incoming data. This may be due to a very high rate of console writes and slow response from the FSP, or because the FSP has stopped processing data (e.g. because the ipmi daemon died). At this point we start a timer with a timeout of SER_BUFFER_OUT_TIMEOUT (10 secs). If the situation has not improved within 10 seconds, something went wrong. Let's return OPAL_RESOURCE so that the kernel can drop the console write and continue. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [stewart: reset timeout in fsp_console_write() path] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
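The stall detection described above can be sketched as a small state machine. This is a simplified model under stated assumptions, not the skiboot implementation: the struct, the function `example_check_space()`, the `EXAMPLE_*` constants and the use of a caller-supplied time in seconds are all hypothetical, and the numeric value used for the resource error is assumed for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_OPAL_SUCCESS   0
#define EXAMPLE_OPAL_RESOURCE  (-10)  /* assumed value, for illustration */
#define EXAMPLE_TIMEOUT_SECS   10     /* mirrors the 10 second timeout */

/* Hypothetical per-console bookkeeping. */
struct example_console {
	uint32_t prev_space;   /* space reported on the previous call */
	uint64_t stall_start;  /* time at which space stopped changing */
	bool stalled;          /* true while the timeout is running */
};

/* Report whether the caller should keep retrying or give up.
 * now_secs is supplied by the caller so the logic is easy to test. */
static int example_check_space(struct example_console *c,
			       uint32_t space_now, uint64_t now_secs)
{
	if (space_now != c->prev_space) {
		/* Progress was made: remember it and cancel any timeout. */
		c->prev_space = space_now;
		c->stalled = false;
		return EXAMPLE_OPAL_SUCCESS;
	}
	if (!c->stalled) {
		/* First call with unchanged space: start the timeout. */
		c->stalled = true;
		c->stall_start = now_secs;
		return EXAMPLE_OPAL_SUCCESS;
	}
	if (now_secs - c->stall_start >= EXAMPLE_TIMEOUT_SECS)
		return EXAMPLE_OPAL_RESOURCE;  /* caller drops the write */
	return EXAMPLE_OPAL_SUCCESS;
}
```

Passing the clock in as a parameter is a choice made for this sketch only; it keeps the example deterministic without modelling firmware timers.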
2017-10-11 | FSP/CONSOLE: Close SOL session during R/R | Vasant Hegde | 1 | -3/+0
Presently we are not closing SOL and FW console sessions during R/R. The host will continue to write to the SOL buffer during FSP R/R. If there is heavy console write activity during FSP R/R (like running the `top` command inside the console), then at some point the console buffer becomes full and fsp_console_write_buffer_space() returns 0 (or less than the space required to write data) to the host. While one thread is busy writing to the console, if other threads try to write data to the console we may see RCU stalls (like below) in the kernel.
kernel call trace:
------------------
[ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
[ 2082.828365] Task dump for CPU 32:
[ 2082.828368] kworker/32:3 R running task 0 4637 2 0x00000884
[ 2082.828375] Workqueue: events dump_work_fn
[ 2082.828376] Call Trace:
[ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
[ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
[ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
[ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
[ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
[ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
[ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
[ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68
Hence let's close SOL (and FW console) during FSP R/R.
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP/CONSOLE: Do not associate unavailable console | Vasant Hegde | 1 | -0/+11
Presently OPAL sends associate/unassociate MBOX commands for all FSP serial consoles (see the OPAL messages below). We have to check whether a console is available before sending this message.
OPAL log:
-------
[ 5013.227994012,7] FSP: Reassociating HVSI console 1
[ 5013.227997540,7] FSP: Reassociating HVSI console 2
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-11 | FSP: Disable PSI link whenever FSP tells OPAL about impending R/R | Vasant Hegde | 1 | -17/+8
Commit 42d5d047 fixed the scenario where DPO has been initiated but the FSP went into reset before the CEC power down came in. But this is a generic issue that can happen in the normal shutdown path as well. Hence disable the PSI link as soon as we detect an impending FSP R/R. CC: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-10 | FSP/NVRAM: Handle "get vNVRAM statistics" command | Vasant Hegde | 1 | -0/+41
The FSP sends an MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM statistics. OPAL doesn't maintain any such statistics, hence return FSP_STATUS_INVALID_SUBCMD.
Sample OPAL log:
[16944.384670488,3] FSP: Unhandled message eb0500
[16944.474110465,3] FSP: Unhandled message eb0500
[16945.111280784,3] FSP: Unhandled message eb0500
[16945.293393485,3] FSP: Unhandled message eb0500
With this patch, I don't think the FSP will ever call the "free vNVRAM" MBOX command. But to be on the safe side, let's return FSP_STATUS_INVALID_SUBCMD for this MBOX command as well.
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-10 | xscom: Do not print error message for 'chiplet offline' return values | Vasant Hegde | 1 | -0/+14
xscom_read/write operations return CHIPLET_OFFLINE when the chiplet is offline. Some multicast xscom_read/write requests from HBRT result in xscom operations on offline chiplet(s), which print the warnings below on the OPAL console:
[ 135.036327572,3] XSCOM: Read failed, ret = -14
[ 135.092689829,3] XSCOM: Read failed, ret = -14
This results in unnecessary bug reports. Hence remove the error message for multicast SCOM operations.
Suggested-by: Daniel M Crowell <dcrowell@us.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-10 | ipmi: Convert common debug prints to trace | Vasant Hegde | 1 | -2/+2
OPAL logs messages for every IPMI request from the host. Sometimes the OPAL console is filled with only these messages. This path is pretty stable now and we have enough logs to cover the bad path. Hence let's convert these debug messages to trace/info messages.
[ 1356.423958816,7] opal_ipmi_recv(cmd: 0xf0 netfn: 0x3b resp_size: 0x02)
[ 1356.430774496,7] opal_ipmi_send(cmd: 0xf0 netfn: 0x3a len: 0x3b)
[ 1356.430797392,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: Message sent to host
[ 1356.431668496,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: IPMI MSG done
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-06 | nx-compress: PR_DEBUG not prerror in the normal case | Stewart Smith | 1 | -1/+1
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-06 | capp: Add lid definitions for P9 DD-2.0 & DD-2.1 | Vaibhav Jain | 1 | -0/+4
Update fsp_lid_map to include CAPP ucode lids for phb4-chipid == 0x200d1 and phb4-chipid == 0x201d1, which correspond to P9 DD-2.0 & DD-2.1 chips respectively. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-06 | hw/lpc-uart: read from RBR to clear character timeout interrupts | Jeremy Kerr | 1 | -0/+21
When using the aspeed SUART, we see a condition where the UART sends continuous character timeout interrupts. This change adds a (heavily commented) dummy read from the RBR to clear the interrupt condition on init. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-02 | phb4: Reassign link_retries counter in IODA purge | Guilherme G. Piccoli | 1 | -0/+2
Recently, a link_retries counter was added in pci/phb4 so that Skiboot can retry training a link a few times; the default number of attempts to retrain a link is 3. However, if during a regular boot we exhaust the link retries and fail to train a PHB, the link_retries variable is stuck at 0. If a kdump happens later, a PHB reset procedure is triggered by Linux and, since we have a decrement-and-test on this variable, we end up setting it to -1; it's unsigned, so the value wraps around. This patch fixes the issue by reassigning the default value to link_retries on every IODA purge. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
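The wraparound and its fix can be illustrated with a small sketch. The struct and function names below are hypothetical and the decrement-and-test is written in one plausible form; the actual phb4 code may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_LINK_RETRIES 3  /* default number of retries per the message */

/* Hypothetical, reduced view of the PHB state. */
struct example_phb {
	uint32_t link_retries;
};

/* Decrement-and-test: returns true if a retry was available.
 * When the counter is already 0 the post-decrement wraps the unsigned
 * value around to UINT32_MAX, which is the bug being described. */
static bool example_take_retry(struct example_phb *p)
{
	return p->link_retries-- != 0;
}

/* The fix in spirit: restore the default on every IODA purge, so a
 * later reset starts from a sane retry budget. */
static void example_ioda_purge(struct example_phb *p)
{
	p->link_retries = EXAMPLE_LINK_RETRIES;
}
```

In this model, a reset after the budget is exhausted leaves the counter at a huge value unless the purge path reassigns it first.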
2017-10-02 | phb4: Add additional adapter to retrain whitelist | John W Walthour | 1 | -2/+3
The single port version of the ConnectX-5 has a different device ID 0x1017. Updated descriptions to match pciutils database. Signed-off-by: John Walthour <jwalthour@us.ibm.com> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-10-02 | phb4: make retry_whitelist static | Stewart Smith | 1 | -1/+1
Silences sparse warning: hw/phb4.c:XX:20: warning: symbol 'retry_whitelist' was not declared. Should it be static? Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-28 | capi: Mask Psl Credit timeout error for P9 | Vaibhav Jain | 1 | -0/+4
Mask the PSL credit timeout error in CAPP FIR Mask register bit(46). As per the h/w team this error is now deprecated and shouldn't cause any fir-action for P9. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-28 | cpu: idle POWER9 power management implementation | Nicholas Piggin | 1 | -1/+1
Add pm idle support to POWER9. IPIs are implemented with doorbells. POWER9 can use the EC=ESL=0 (lite) stop when sreset is not available. EC=ESL=1 state with RL=3 is enabled when we have a sreset wakeup. Deep idle states are not implemented. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-27 | skiboot/imc: Update the nest_pmus array with occ/gpe microcode uav updates | Madhavan Srinivasan | 1 | -2/+5
The OCC/gpe nest microcode maintains the list of individual nest units supported. Sync the recent updates to the UAV with the nest_pmus array. For reference, the occ/gpe microcode link for the UAV: https://github.com/open-power/occ/blob/master/src/occ_gpe1/gpe1_24x7.h Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: add in reference to ucode] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-27 | phb4: Update link training documentation | Michael Neuling | 1 | -0/+8
We added degraded link retries in 3f936bae97 ("phb4: Retrain link if degraded") but forgot to update the documentation. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-27 | phb4: Additional RXE_ARB: DEC Stage Valid Error fix | Michael Neuling | 1 | -3/+3
The recent fix 8b4c7a3cef ("phb4: Mask RXE_ARB: DEC Stage Valid Error") worked around a problem, but the workaround wasn't complete. Now that we have full documentation and details on the issue, there are additional registers whose inits we need to change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-20 | hw/p8-i2c: Rework timeout handling | Oliver O'Halloran | 1 | -13/+22
Currently we treat a timeout as a hard failure and will automatically fail any transactions that hit their timeout. This results in unnecessarily failing I2C requests if interrupts are dropped, etc. Although these are bad things that we should log, we can handle them better by checking the actual hardware status and completing the transaction if there are no real errors. This patch reworks the timeout handling to check the status and continue the transaction if it can, while logging an error if it detects a timeout due to a dropped interrupt. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-20 | capi: CAPP recovery | Christophe Lombard | 1 | -5/+54
CAPP recovery is initiated when a CAPP Machine Check is detected. The capp recovery procedure is initiated via a Hypervisor Maintenance interrupt (HMI). CAPP Machine Check may arise from either an error that results in a PHB freeze or from an internal CAPP error with CAPP checkstop FIR action. An error that causes a PHB freeze will result in the link down signal being asserted. The system continues running and the CAPP and PSL will be re-initialized. Tests performed on some of the old/new hardware. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-20 | npu2: Read slot label from the link node | Oliver O'Halloran | 1 | -3/+17
Binding GPUs to emulated NPU PCI devices is done using the slot labels. Since the NPU devices do not have a matching slot node, we need to copy the label in here. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-20 | npu2: Copy link speed from the npu node | Oliver O'Halloran | 1 | -7/+10
This needs to be in the PCI device node so the speed of the NVLink can be passed to the GPU driver. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-20 | fsp: Move common prints to trace | Michael Neuling | 1 | -2/+2
These two prints just end up filling the skiboot logs on any machine that's been booted for more than a few hours. They have never been useful, so make them trace level. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-20 | phb4: Mask RXE_ARB: DEC Stage Valid Error | Michael Neuling | 1 | -2/+2
Change the inits to mask out the RXE ARB: DEC Stage Valid Error (bit 370). This has been a fatal error but should be informational only. This update will be in the next version of the phb4 workbook. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19 | SLW: Removing timebase related flags for stop4 | Akshay Adiga | 1 | -2/+2
When a core enters stop4, it does not lose the decrementer and time base. Hence remove the flags OPAL_PM_DEC_STOP and OPAL_PM_TIMEBASE_STOP. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19 | SLW: Allow deep states if homer address is known | Akshay Adiga | 1 | -6/+17
Use a common variable has_wakeup_engine instead of has_slw to tell whether a) the SLW image is populated in the case of POWER8, or b) the CME image is populated in the case of POWER9. Currently we expect the CME to be loaded if the homer address is known (except on simulators). Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19 | SLW: Configure self-restore for HRMOR | Akshay Adiga | 1 | -0/+29
Make a stop API call using libpore to restore the HRMOR register. HRMOR needs to be cleared so that when threads exit stop, they arrive at the Linux system_reset vector (0x100). Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19 | SLW: Add opal_slw_set_reg support for power9 | Akshay Adiga | 1 | -20/+40
This OPAL call is made from Linux to OPAL to configure values in various SPRs after wakeup from a deep idle state. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19 | Revert "hw/slw.c: Offline code still uses p8 bits" | Stewart Smith | 1 | -3/+0
This reverts commit 0a2710381f34e6b4c03cff1fa76bc1b74f280ecd. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-19 | PHB4: Default to PCIe GEN3 on POWER9 DD2.00 | Stewart Smith | 1 | -0/+6
You can use the NVRAM override for DD2.00 screened parts. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-15 | npu2: hw-procedures: Add settings to PHY_RESET | Reza Arbab | 1 | -0/+10
Set a few new values in the PHY_RESET procedure, as specified by our updated programming guide documentation. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-15 | hw/slw.c: Offline code still uses p8 bits | Balbir Singh | 1 | -0/+3
I'm seeing an infinite loop while hot unplugging a CPU. This is a workaround until we do the right thing for p9. It may be a candidate for backporting. The messages I see in an infinite loop, when trying to hot unplug core id 20, are:
[ 740.250192896,3] LIBPORE: Core ID = 20 is not within valid range of [0;15]
[ 740.250230176,3] SLW: Failed to set spr for CPU 51
For now the patch just skips calling p8_pore* on p9 machines.
Signed-off-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-15 | phb4: Use link if degraded | Michael Neuling | 1 | -1/+7
The recent change 3f936bae97 ("phb4: Retrain link if degraded") retrains the link if it is degraded, with 3 retries to get an optimal link. Unfortunately, if the last retry fails, we mark the PHB as bad and don't use it. Hence that PHB is lost even though it actually trained (just degraded). This fixes the problem by printing an error message (as below) but still marking the PHB as good.
[ 7.179320404,3] PHB#0005[0:5]: LINK: Link degraded
[ 8.387346665,3] PHB#0005[0:5]: LINK: Link degraded
[ 10.078409137,3] PHB#0005[0:5]: LINK: Link degraded
[ 11.281477269,3] PHB#0005[0:5]: LINK: Link degraded
[ 11.283123885,3] PHB#0005[0:5]: LINK: Degraded but no more retries
Signed-off-by: Michael Neuling <mikey@neuling.org> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
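The retry policy described in this entry can be modelled as below. This is a sketch under assumptions rather than the phb4 code: training outcomes are passed in as an array so the logic can be exercised without hardware, and the names `example_train()` and `struct example_result` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_LINK_RETRIES 3  /* retries after the first attempt */

/* Hypothetical summary of how training ended. */
struct example_result {
	unsigned int attempts;  /* training attempts made */
	bool degraded;          /* link is used, but below optimal */
};

/* optimal[i] says whether training attempt i reached full speed/width.
 * Attempts beyond n_outcomes are treated as degraded. */
static struct example_result example_train(const bool *optimal,
					   size_t n_outcomes)
{
	struct example_result r = { 0, true };
	unsigned int retries = EXAMPLE_LINK_RETRIES;
	size_t i = 0;

	for (;;) {
		bool ok = (i < n_outcomes) ? optimal[i] : false;

		i++;
		r.attempts++;
		if (ok) {
			r.degraded = false;
			break;
		}
		if (retries == 0)
			break;  /* degraded, no more retries: keep the link */
		retries--;
	}
	return r;
}
```

One initial attempt plus three retries gives four degraded results before giving up, which lines up with the four "Link degraded" messages followed by "Degraded but no more retries" in the log above. In every case the link stays usable; only the `degraded` flag differs.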
2017-09-12 | phb4: Retrain link if degraded | Michael Neuling | 1 | -1/+133
On P9 Scale Out (Nimbus) DD2.0 and Scale Up (Cumulus) DD1.0 (and below) the PCIe PHY can lock up, causing training issues. This can cause a degradation in speed or width in ~5% of training cases (depending on the card). This is fixed in later chip revisions. This issue can also cause PCIe links to not train at all, but that case is already handled. This patch checks whether the PCIe link has trained optimally and, if not, does a full PHB reset (to fix the PHY lockup) and retrains. One complication is that some devices are known to train degraded unless device-specific configuration is performed. Because of this, we only retrain when the device is in a whitelist. All devices in the current whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon. We always gather information on the link and print it in the logs, even if the card is not in the whitelist. For testing purposes, there's an nvram option to retry all PCIe cards and all P9 chips when a degraded link is detected. The new option is 'pci-retry-all=true', which can be set using:
nvram -p ibm,skiboot --update-config pci-retry-all=true
This option may increase the boot time if used on a badly behaving card.
Signed-off-by: Michael Neuling <mikey@neuling.org> [stewart@linux.vnet.ibm.com: fix Cumulus VERS_MAJ r.e. Mikey mail] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | phb4: Make link retries a #define | Michael Neuling | 1 | -1/+1
Make link retries a #define rather than open coding it in the PHB4 init code. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | phb4: Split phb4_get_link_state() into a new function | Michael Neuling | 1 | -6/+21
Split phb4_get_link_state() into a new function so that it can be reused to get info on the speed and width of the link. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | phb4: Move nvram read of pci-eeh-mmio init | Michael Neuling | 1 | -1/+3
Move the nvram read to the PHB4 init code so that it's only read once, rather than every time we go through PHB reset. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | phb4: Remove stable retries | Michael Neuling | 1 | -7/+0
This code was never used (since retries is set to 0), it's not very useful and it makes the code harder to read. So let's just remove it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | xive: Fix opal_xive_dump_tm() to access W2 properly | Benjamin Herrenschmidt | 1 | -1/+7
The HW only supported limited access sizes. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | npu2: Add vendor cap for IRQ testing | Sam Bobroff | 1 | -0/+28
Provide a way to test recoverable data link interrupts via a new vendor capability byte. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au>
v2 -> v3:
* Corrected name of NPU RING (no 2). [Andrew Donnellan]
* Corrected spelling of device. [Andrew Donnellan]
hw/npu2.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2017-09-12 | npu2: Enable recoverable data link (no-stall) interrupts | Sam Bobroff | 1 | -15/+121
Allow the NPU2 to trigger "recoverable data link" interrupts. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>