aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)AuthorFilesLines
2018-04-30SBE: Add timer supportVasant Hegde1-2/+183
SBE on P9 provides one shot programmable timer facility. We can use this to implement OPAL timers and hence limit the reliance on the Linux heartbeat (similar to HW timer facility provided by SLW on P8). Design: - We will continue to run Linux heartbeat. - Each chip has SBE. This patch always schedules timer on SBE on master chip. - Start timer option starts new timer or modifies an active timer for the specified timeout. - SBE expects timeout value in microseconds. We track timeout value in TB. Hence we convert tb to microseconds before sending request to SBE. - We are requesting ack from SBE for timer message. It gaurantees that SBE has scheduled timer. - Disabling SBE timer We expect SBE to send timer expiry interrupt whenever timer expires. We wait for 10 more ms before disabling timer. In future we can consider below alternative approaches: - Presently SBE timer disable is permanent (until we reboot system). SBE sends "I'm back" interrupt after reset. We can consider restarting timer after SBE reset. - Reset SBE and start timer again. - Each chip has SBE. On multi chip system we can try to schedule timer on different chip. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-30Move P8 timer code to separate fileVasant Hegde3-184/+206
Lets move P8 timer support code from slw.c to sbe-p8.c (as suggested by BenH). There is a difference between timer support in P8 and P9. Hence I think it makes sense to name it as sbe-p8.c. Note that this is pure code movement and renaming functions/variables. No functionality changes. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-30Add SBE driver supportVasant Hegde2-19/+679
SBE (Self Boot Engine) on P9 has two different jobs: - Boot the chip up to the point the core is functional - Provide various services like timer, scom, stash MPIPL, etc., at runtime OPAL can communicate to SBE via a set of data and control registers provided by the PSU block in P9 chip. - Four 8 byte registers for Host to send command packets to SBE - Four 8 byte registers for SBE to send response packets to Host - Two doorbell registers (1 on each side) to alert either party when data is placed in above mentioned data register Protocol constraints: Only one command is accepted in the command buffer until the response for the command is enqueued in the response buffer by SBE. Usage: We will use SBE for various purposes like timer, MPIPL, etc. This patch implements the SBE MBOX spec for OPAL to communicate with SBE. Design consideration: - Each chip has SBE. We need to track SBE messages per chip. Hence added per chip sbe structure and list of messages to that chip - SBE accepts only one command at a time. Hence serialized MBOX commands. - OPAL gets interrupted once SBE sets doorbell register - OPAL has to clear doorbell register after reading response - Every command class has timeout option. Timed out messages are discarded - SBE MBOX commands can be classified into four types : - Those that must be sent to the master only (ex: sending MDST/MDDT info) - Those that must be sent to slaves only (ex: continue MPIPL) - Those that must be sent to all chips (ex: close insecure window) - Those that can be sent to any chip (ex: timer) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-29uart: fix uart_opal_flush to take console lock over uart_con_flushNicholas Piggin1-2/+8
Cc: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-29xive: fix missing unlock in error pathStewart Smith1-0/+1
Found with sparse and some added lock annotations. CC: stable # 5.10+ Fixes: de82c2e0e Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-29hw/slw: Don't assert on a unknown chipOliver O'Halloran1-2/+10
For some reason skiboot populates nodes in /cpus/ for the cores on chips that are deconfigured. As a result Linux includes the threads of those cores in it's set of possible CPUs in the system and attempts to set the SPR values that should be used when waking a thread from a deep sleep state. However, in the case where we have deconfigured chip we don't create a xscom node for that chip and as a result we don't have a proc_chip structure for that chip either. In turn, this results in an assertion failure when calling opal_slw_set_reg() since it expects the chip structure to exist. Fix this up and print an error instead. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-23npu2: Use ibm, loc-code rather than ibm, slot-labelOliver O'Halloran1-13/+7
The ibm,slot-label property is to name the slot that appears under a PCIe bridge. In the past we (ab)used the slot tables to attach names to GPU devices and their corresponding NVLinks which resulted in npu2.c using slot-label as a location code rather than as a way to name slots. Fix this up since it's confusing. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-23npu2/hw-procedures: fence bricks on GPU resetBalbir Singh1-7/+45
The NPU workbook defines a way of fencing a brick and getting the brick out of fence state. We do have an implementation of bringing the brick out of fenced/quiesced state. We do the latter in our procedures, but to support run time reset we need to do the former. The fencing ensures that access to memory behind the links will not lead to HMI's, but instead SUE's will be populated in cache (in the case of speculation). The expectation is then that prior to and after reset, the operating system components will flush the cache for the region of memory behind the GPU. This patch does the following: 1. Implements a npu2_dev_fence_brick() function to set/clear fence state 2. Clear FIR bits prior to clearing the fence status 3. Clear's the fence status 4. We take the powerbus out of CQ fence much later now, in credits_check() which is the last hardware procedure called after link training. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-19hw/npu2.c: Remove static configuration of NPU2 registerAlistair Popple1-12/+12
The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to select NVLink mode, however Hostboot should configure other bits in this register. For some reason Skiboot was explicitly clearing bit-6 (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared as recent Hostboot versions explicitly set it to the correct value based on the specific system configuration. Therefore Skiboot should not alter it. Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or not. Hostboot does not configure this bit so Skiboot should continue to configure it. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-19npu2: Improve log output of GPU-to-link mappingReza Arbab1-4/+4
Debugging issues related to unconnected NVLinks can be a little less irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog(). In short, change this: NPU2: comparing GPU 'GPU2' and NPU2 'GPU1' NPU2: comparing GPU 'GPU3' and NPU2 'GPU1' NPU2: comparing GPU 'GPU4' and NPU2 'GPU1' NPU2: comparing GPU 'GPU5' and NPU2 'GPU1' : npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem. to this: NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1' NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1' NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1' NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1' : NPU6:0:1.0 No PCI device found for slot 'GPU1' Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-19sensors: Dont add DTS sensors when OCC inband sensors are availableShilpasri G Bhat1-6/+8
There are two sets of core temperature sensors today. One is DTS scom based core temperature sensors and the second group is the sensors provided by OCC. DTS is the highest temperature among the different temperature zones in the core while OCC core temperature sensors are the average temperature of the core. DTS sensors are read directly by the host by SCOMing the DTS sensors while OCC sensors are read and updated by OCC to main memory. Reading DTS sensors by SCOMing is a heavy and slower operation as compared to reading OCC sensors which is as good as reading memory. So dont add DTS sensors when OCC sensors are available. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Generate hmi event for recovered HDEC parity error.Mahesh Salgaonkar1-4/+7
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Stop flooding HMI event for TOD errors.Mahesh Salgaonkar1-2/+5
Fix the issue where every thread on the chip sends HMI event to host for TOD errors. TOD errors are reported to all the core/threads on the chip. Any one thread can fix the error and send event. Rest of the threads don't need to send HMI event unnecessarily. This patch fixes this by modifying __chiptod_recover_tod_errors() function to return -1 if no errors found. Without this change every thread that see TFMR[51]=1 sends HMI event to the host kernel. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Fix soft lockups during TOD errorsMahesh Salgaonkar1-2/+12
There are some TOD errors which do not affect working of TOD and TB. They stay in valid state. Hence we don't need rendez vous for TOD errors that does not affect TB working. TOD errors that affects TOD/TB will report a global error on TFMR[44] alongwith bit 51, and they will go in rendez vous path as expected. But the TOD errors that does not affect TB register sets only TFMR bit 51. The TFMR bit 51 is cleared when any single thread clears the TOD error. Once cleared, the bit 51 is reflected to all the cores on that chip. Any thread that reads the TFMR register after the error is cleared will see TFMR bit 51 reset. Hence the threads that see TFMR[51]=1, falls through rendez-vous path and threads that see TFMR[51]=0, returns doing nothing. This ends up in a soft lockups in host kernel. This patch fixes this issue by not considering TOD interrupt (TFMR[51]) as a core-global error and hence avoiding rendez-vous path completely. Instead threads that see TFMR[51]=1 will now take different path that just do the TOD error recovery. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Do not send HMI event if no errors are found.Mahesh Salgaonkar1-1/+5
For TOD errors, all the cores in the chip get HMIs. Any one thread from any core can fix the issue and TFMR will have error conditions cleared. Rest of the threads need take any action if TOD errors are already cleared. Hence thread 0 of every core should get a fresh copy of TFMR before going ahead recovery path. Initialize recover = -1, so that if no errors found that thread need not send a HMI event to linux. This helps in stop flooding host with hmi event by every thread even there are no errors found. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Rework HMI handling of TFAC errorsBenjamin Herrenschmidt1-75/+43
This patch reworks the HMI handling for TFAC errors by introducing 4 rendez-vous points improve the thread synchronization while handling timebase errors that requires all thread to clear dirty data from TB/HDEC register before clearing the errors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11phb4: Restore bus numbers after CRSMichael Neuling1-12/+1
Currently we restore PCIe bus numbers right after the link is up. Unfortunately as this point we haven't done CRS so config space may not be accessible. This moves the bus number restore till after CRS has happened. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11phb4: Enable the PCIe slotcap on pluggable slotsOliver O'Halloran1-0/+20
Enables reporting of slot status information, etc in the config space of the root complex. Currently this is only used to set the slot power limit in our generic PCI code, but we might use it for other things later on. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11interrupts: Create an "interrupts" property in the OPAL nodeBenjamin Herrenschmidt8-10/+10
Deprecate the old "opal-interrupts", it's still there, but the new property follows the standard and allow us to specify whether an interrupt is level or edge sensitive. Similarly create "interrupt-names" whose content is identical to "opal-interrupts-names". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11capi: Keep the current mmio windows in the mbt cache table.Christophe Lombard1-32/+36
When the phb is used as a CAPI interface, the current mmio windows list is cleaned before adding the capi and the prefetchable memory (M64) windows, which implies that the non-prefetchable BAR is no more configured. This patch allows to set only the mbt bar to pass capi mmio window and to keep, as defined, the other mmio values (M32 and M64). Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11npu2-opencapi: Fix 'link internal error' FIR, take 2Frederic Barrat1-4/+25
When setting up an opencapi link, we set the transport muxes first, then set the PHY training config register, which includes disabling nvlink mode for the bricks. That's the order of the init sequence, as found in the NPU workbook. In reality, doing so works, but it raises 2 FIR bits in the PowerBus OLL FIR Register for the 2 links when we configure the transport muxes. Presumably because nvlink is not disabled yet and we are configuring the transport muxes for opencapi. bit 60: link0 internal error bit 61: link1 internal error Overall the current setup ends up being correct and everything works, but we raise 2 FIR bits. So tweak the order of operations to disable nvlink before configuring the transport muxes. Incidentally, this is what the scripts from the opencapi enablement team were doing all along. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11npu2-opencapi: Fix 'link internal error' FIR, take 1Frederic Barrat1-3/+17
When we setup a link, we always enable ODL0 and ODL1 at the same time in the PHY training config register, even though we are setting up only one OTL/ODL, so it raises a "link internal error" FIR bit in the PowerBus OLL FIR Register for the second link. The error is harmless, as we'll eventually setup the second link, but there's no reason to raise that FIR bit. The fix is simply to only enable the ODL we are using for the link. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11occ: sensors-groups: Add DT properties to mark HWMON sensor groupsShilpasri G Bhat1-3/+14
Fix the sensor type to match HWMON sensor types. Add compatible flag to indicate the environmental sensor groups so that operations on these groups can be handled by HWMON linux interface. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11Disable stop states from OPALStewart Smith1-0/+23
On ZZ, stop4,5,11 are enabled for PHYP, even though doing so may cause problems with OPAL due to bugs in hcode. For other platforms, this isn't so much of an issue as we can just control stop states by the MRW. However the rebuild-the-world approach to changing values there is a bit annoying if you just want to rule out a specific stop state from being problematic. Provide an nvram option to override what's disabled in OPAL. The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2) You can set an NVRAM override with: nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF This nvram override will disable *all* stop states. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context initReza Arbab1-7/+8
A bad GPU or other condition may leave us with a subset of links that never get initialized. If an ATSD is sent to one of those bricks, it will never complete, leaving us waiting forever for a response: watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050] ... Modules linked in: nvidia_uvm(O) nvidia(O) CPU: 23 PID: 2050 Comm: acos Tainted: G W O 4.14.0 #2 task: c0000000285cfc00 task.stack: c000001fea860000 NIP: c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60 REGS: c000001fea863550 TRAP: 0901 Tainted: G W O (4.14.0) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004484 XER: 20040000 CFAR: c0000000000abdf4 SOFTE: 1 GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820 GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560 GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff GPR12: 0000000000008000 c000000003167e80 NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0 LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370 ATSDs are only sent to bricks which have a valid entry in the XTS_BDF table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless we make it all the way to creating a context for the BDF. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.Philippe Bergheaud1-19/+0
The cxl driver will set the capi value, like other drivers already do. Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10hw/imc: Add support to load imc catalog lid fileMadhavan Srinivasan1-0/+3
Add support to load the imc catalog from a lid file packaged as part of the system firmware. Lid number allocated is 0x80f00103.lid. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10phb4: Quieten and improve "Timeout waiting for electrical link"Benjamin Herrenschmidt1-3/+2
This happens normally if a slot doesn't have a working HW presence detect and relies instead of inband presence detect. The message we display is scary and not very useful unless ou are debugging, so quiten it up and change it to something more meaningful. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10hw/imc: Check for pause_microcode_at_boot() return statusMadhavan Srinivasan1-2/+7
pause_microcode_at_boot() loops through all the chip's ucode control block and pause the ucode if it is in the running state. But it does not fail if any of the chip's ucode is not initialised. Add code to return a failure if ucode is not initialized in any of the chip. Since pause_microcode_at_boot() is called just before attaching the IMC device nodes in imc_init(), add code to check for the function return. Fixes: 9750eee802f8d ('hw/imc: pause microcode at boot') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-09phb4: set TVT1 for tunneled operations in capi modePhilippe Bergheaud1-15/+7
The ASN indication is used for tunneled operations (as_notify and atomics). Tunneled operation messages can be sent in PCI mode as well as CAPI mode. The address field of as_notify messages is hijacked to encode the LPID/PID/TID of the target thread, so those messages should not go through address translation. Therefore bit 59 is part of the ASN indication. This patch sets TVT#1 in bypass mode when capi mode is enabled, to prevent as_notify messages from being dropped. Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-03xive: disable store EOI supportCédric Le Goater2-7/+24
Hardware has limitations which would require to put a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This is a killer for performance and the PHBs do not support the sync. So remove the store EOI for the moment, until hardware is improved. Also, while we are at changing the XIVE source flags, let's fix the settings for the PHB4s which should follow these rules : - SHIFT_BUG for DD10 - STORE_EOI for DD20 and if enabled - TRIGGER_PAGE for DDx0 and if not STORE_EOI Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-04-03phb4: Reset FIR/NFIR registers before PHB4 probeVaibhav Jain1-0/+4
The function phb4_probe_stack() resets "ETU Reset Register" to unfreeze the PHB before it performs mmio access on the PHB. However in case the FIR/NFIR registers are set while entering this function, the reset of "ETU Reset Register" wont unfreeze the PHB and it will remain fenced. This leads to failure during initial CRESET of the PHB as mmio access is still not enabled and an error message of the form below is logged: PHB#0000[0:0]: Initializing PHB4... PHB#0000[0:0]: Default system config: 0xffffffffffffffff PHB#0000[0:0]: New system config : 0xffffffffffffffff PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff PHB#0000[0:0]: Waiting for DLP PG reset to complete... <snip> PHB#0000[0:0]: Timeout waiting for DLP PG reset ! PHB#0000[0:0]: Initialization failed This is especially seen happening during the MPIPL flow where SBE would quiesces and fence the PHB so that it doesn't stomp on the main memory. However when skiboot enters phb4_probe_stack() after MPIPL, the FIR/NFIR registers are set forcing PHB to re-enter fence after ETU reset is done. So to fix this issue the patch introduces new xscom writes to phb4_probe_stack() to reset the FIR/NFIR registers before performing ETU reset to enable mmio access to the PHB. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-04-03capi: Poll Err/Status register during CAPP recoveryVaibhav Jain1-17/+68
This patch updates do_capp_recovery_scoms() to poll the CAPP Err/Status control register, check for CAPP-Recovery to complete/fail based on indications of BITS-1,5,9 and then proceed with the CAPP-Recovery scoms iif recovery completed successfully. This would prevent cases where we bring-up the PCIe link while recovery sequencer on CAPP is still busy with casting out cache lines. In case CAPP-Recovery didn't complete successfully an error is returned from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4 fenced and mark it as broken. The loop that implements polling of Err/Status register will also log an error on the PHB when it continues for more than 168ms which is the max time to failure for CAPP-Recovery. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27hw/imc: don't access homer memory if it was not initialisedNicholas Piggin1-0/+3
This can happen under mambo, at least. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27npu2: Add performance tuning SCOM initsReza Arbab1-1/+96
Peer-to-peer GPU bandwidth latency testing has produced some tunable values that improve performance. Add them to our device initialization. File these under things that need to be cleaned up with nice #defines for the register names and bitfields when we get time. A few of the settings are dependent on the system's particular NVLink topology, so introduce a helper to determine how many links go to a single GPU. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27hw/npu2: Assign a unique LPARSHORTID per GPUAlistair Popple1-0/+1
This gets used elsewhere to index items in the XTS tables. Signed-off-by: Alistair Popple <alistair@popple.id.au> [arbab@linux.vnet.ibm.com: Added commit log] Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27Revert "NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging"Stewart Smith2-24/+16
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17. We don't need this as we need to do it a different way, with a explicit set of registers as otherwise we trip other random FIR bits and everything becomes even more terrible. I suggest alcohol. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27dts: Zero struct to avoid using uninitialised valueCyril Bur1-2/+2
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27hw/imc: Don't dereference possible NULLCyril Bur1-1/+2
Fixes: CID 263056 and 263052 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27fast-reboot: occ: Only delete /ibm, opal/power-mgt nodes if they existCyril Bur1-5/+7
Fixes: ac4272bf ("fast-reboot: occ: Delete OCC child nodes in /ibm, opal/power-mgt") Fixes: CID 263053 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27npu2-opencapi: Fix memory leakCyril Bur1-1/+1
Fixes: CID 264267 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27npu2: Fix possible NULL dereferenceCyril Bur1-3/+6
The follow pattern exists in several npu2 functions: struct phb *phb = pci_get_phb(phb_id); struct npu2 *p = phb_to_npu2_nvlink(phb); The problem is that pci_get_phb() can return NULL and phb_to_npu2_nvlink() dereferences its parameter. Coverity says that the return value of pci_get_phb() is checked 43 out of 46 times which suggests we should be more careful. Futhurmore, functions with the baddly placed call to phb_to_npu2_nvlink() do seem to check that the return value of pci_get_phb() isn't NULL, but this check would be too little too late. This patch just moves the call of phb_to_npu2_nvlink() to after the NULL check for the return value of pci_get_phb(). Affected functions are: opal_npu_map_lpar() opal_npu_init_context() opal_npu_destroy_context() Fixes: CID 264274, 264273, 264272, 264271, 264266, 264265 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27occ-sensors: Remove NULL checks after dereferenceCyril Bur1-2/+2
Both scale_sensor() and scale_energy() take the value to scale as a pointer. These functions do not NULL check the pointer before the first time they dereference it, which is fine since passing NULL would be completely pointless. Both functions do perform a pointless NULL check later on. This confuses coverity and really doesn't make much sense at all. Since calling these functions with NULL as the sensor parameter makes no sense, and currently theres a dereference before the check, just remove the check. Fixes: CID 264276 and 264275 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27occ: Set up OCC messaging even if we fail to setup pstatesStewart Smith1-8/+21
This means that we no longer hit this bug if we fail to get valid pstates from the OCC. [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 [ 10.318805] Disabling lock debugging due to kernel taint [ 10.318808] Severe Machine check interrupt [Not recovered] [ 10.318812] NIP [000000003003e434]: 0x3003e434 [ 10.318813] Initiator: CPU [ 10.318815] Error type: Real address [Load/Store (foreign)] [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3 [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240 [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1) [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000 [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> [Additional changes from Shilpa] Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22xive: fix opal_xive_set_vp_info() error pathCédric Le Goater1-1/+1
In case of error, opal_xive_set_vp_info() will return without unlocking the xive object. This is most certainly a typo. Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22npu2: Remove DD1 supportAndrew Donnellan2-85/+52
Major changes in the NPU between DD1 and DD2 necessitated a fair bit of revision-specific code. Now that all our lab machines are DD2, we no longer test anything on DD1 and it's time to get rid of it. Remove DD1-specific code and abort probe if we're running on a DD1 machine. Cc: Alistair Popple <alistair@popple.id.au> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22phb*: Remove the state field in the various phb structuresOliver O'Halloran4-47/+37
We've been carting around this field since the original p7ioc-phb code. As far as I can tell we never actually use it for anything other than checking if the PHB has been marked as broken or not. The _FENCED state is set in a few places, but we never use it in favour of just checking the MMIO register. This patch just replaces it with a boolean that indicates if the PHB has been marked as broken and removes the giant, mostly wrong, comment explaining it's usage that is copied and pasted into each phb header file. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22SLW: Increase stop4-5 residency by 10xAkshay Adiga1-2/+2
Using DGEMM benchmark we observed there was a drop of 5-9% throughput with and without stop4/5. In this benchmark the GPU waits on the cpu to wakeup and provide the subsequent data block to compute. The wakup latency accumulates over the run and shows up as a performance drop. Linux enters stop4/5 more aggressively for its wakeup latency. Increasing the residency from 1ms to 10ms makes the performance drop <1% Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-14dts: spl_wakeup: Remove all workarounds in the spl wakeup logicShilpasri G Bhat1-29/+1
We coded few workarounds in special wakeup logic to handle the buggy firmware. Now that is fixed remove them as they break the special wakeup protocol. As per the spec we should not de-assert beofre assert is complete. So follow this protocol. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-14npu2: Disable fast rebootReza Arbab1-0/+3
Fast reboot does not yet work right with the NPU. It's been disabled on NVLink and OpenCAPI machines. Do the same for NVLink2. This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset") from the npu code to npu2. Cc: stable # 5.10.x Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>