aboutsummaryrefslogtreecommitdiff
path: root/hw
AgeCommit message (Collapse)AuthorFilesLines
2018-04-10npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context initReza Arbab1-7/+8
A bad GPU or other condition may leave us with a subset of links that never get initialized. If an ATSD is sent to one of those bricks, it will never complete, leaving us waiting forever for a response: watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050] ... Modules linked in: nvidia_uvm(O) nvidia(O) CPU: 23 PID: 2050 Comm: acos Tainted: G W O 4.14.0 #2 task: c0000000285cfc00 task.stack: c000001fea860000 NIP: c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60 REGS: c000001fea863550 TRAP: 0901 Tainted: G W O (4.14.0) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004484 XER: 20040000 CFAR: c0000000000abdf4 SOFTE: 1 GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820 GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560 GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff GPR12: 0000000000008000 c000000003167e80 NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0 LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370 ATSDs are only sent to bricks which have a valid entry in the XTS_BDF table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless we make it all the way to creating a context for the BDF. Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.Philippe Bergheaud1-19/+0
The cxl driver will set the capi value, like other drivers already do. Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10hw/imc: Add support to load imc catalog lid fileMadhavan Srinivasan1-0/+3
Add support to load the imc catalog from a lid file packaged as part of the system firmware. Lid number allocated is 0x80f00103.lid. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10phb4: Quieten and improve "Timeout waiting for electrical link"Benjamin Herrenschmidt1-3/+2
This happens normally if a slot doesn't have a working HW presence detect and relies instead of inband presence detect. The message we display is scary and not very useful unless ou are debugging, so quiten it up and change it to something more meaningful. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-By: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-10hw/imc: Check for pause_microcode_at_boot() return statusMadhavan Srinivasan1-2/+7
pause_microcode_at_boot() loops through all the chip's ucode control block and pause the ucode if it is in the running state. But it does not fail if any of the chip's ucode is not initialised. Add code to return a failure if ucode is not initialized in any of the chip. Since pause_microcode_at_boot() is called just before attaching the IMC device nodes in imc_init(), add code to check for the function return. Fixes: 9750eee802f8d ('hw/imc: pause microcode at boot') Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-09phb4: set TVT1 for tunneled operations in capi modePhilippe Bergheaud1-15/+7
The ASN indication is used for tunneled operations (as_notify and atomics). Tunneled operation messages can be sent in PCI mode as well as CAPI mode. The address field of as_notify messages is hijacked to encode the LPID/PID/TID of the target thread, so those messages should not go through address translation. Therefore bit 59 is part of the ASN indication. This patch sets TVT#1 in bypass mode when capi mode is enabled, to prevent as_notify messages from being dropped. Signed-off-by: Philippe Bergheaud <felix@linux.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-03xive: disable store EOI supportCédric Le Goater2-7/+24
Hardware has limitations which would require to put a sync after each store EOI to make sure the MMIO operations that change the ESB state are ordered. This is a killer for performance and the PHBs do not support the sync. So remove the store EOI for the moment, until hardware is improved. Also, while we are at changing the XIVE source flags, let's fix the settings for the PHB4s which should follow these rules : - SHIFT_BUG for DD10 - STORE_EOI for DD20 and if enabled - TRIGGER_PAGE for DDx0 and if not STORE_EOI Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-04-03phb4: Reset FIR/NFIR registers before PHB4 probeVaibhav Jain1-0/+4
The function phb4_probe_stack() resets "ETU Reset Register" to unfreeze the PHB before it performs mmio access on the PHB. However in case the FIR/NFIR registers are set while entering this function, the reset of "ETU Reset Register" wont unfreeze the PHB and it will remain fenced. This leads to failure during initial CRESET of the PHB as mmio access is still not enabled and an error message of the form below is logged: PHB#0000[0:0]: Initializing PHB4... PHB#0000[0:0]: Default system config: 0xffffffffffffffff PHB#0000[0:0]: New system config : 0xffffffffffffffff PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff PHB#0000[0:0]: Waiting for DLP PG reset to complete... <snip> PHB#0000[0:0]: Timeout waiting for DLP PG reset ! PHB#0000[0:0]: Initialization failed This is especially seen happening during the MPIPL flow where SBE would quiesces and fence the PHB so that it doesn't stomp on the main memory. However when skiboot enters phb4_probe_stack() after MPIPL, the FIR/NFIR registers are set forcing PHB to re-enter fence after ETU reset is done. So to fix this issue the patch introduces new xscom writes to phb4_probe_stack() to reset the FIR/NFIR registers before performing ETU reset to enable mmio access to the PHB. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-04-03capi: Poll Err/Status register during CAPP recoveryVaibhav Jain1-17/+68
This patch updates do_capp_recovery_scoms() to poll the CAPP Err/Status control register, check for CAPP-Recovery to complete/fail based on indications of BITS-1,5,9 and then proceed with the CAPP-Recovery scoms iif recovery completed successfully. This would prevent cases where we bring-up the PCIe link while recovery sequencer on CAPP is still busy with casting out cache lines. In case CAPP-Recovery didn't complete successfully an error is returned from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4 fenced and mark it as broken. The loop that implements polling of Err/Status register will also log an error on the PHB when it continues for more than 168ms which is the max time to failure for CAPP-Recovery. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Alastair D'Silva <alastair@d-silva.org> Acked-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27hw/imc: don't access homer memory if it was not initialisedNicholas Piggin1-0/+3
This can happen under mambo, at least. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27npu2: Add performance tuning SCOM initsReza Arbab1-1/+96
Peer-to-peer GPU bandwidth latency testing has produced some tunable values that improve performance. Add them to our device initialization. File these under things that need to be cleaned up with nice #defines for the register names and bitfields when we get time. A few of the settings are dependent on the system's particular NVLink topology, so introduce a helper to determine how many links go to a single GPU. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27hw/npu2: Assign a unique LPARSHORTID per GPUAlistair Popple1-0/+1
This gets used elsewhere to index items in the XTS tables. Signed-off-by: Alistair Popple <alistair@popple.id.au> [arbab@linux.vnet.ibm.com: Added commit log] Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27Revert "NPU2 HMIs: dump out a *LOT* of npu2 registers for debugging"Stewart Smith2-24/+16
This reverts commit fbdc91e693fc3103f7e2a65054ed32bfb26a2e17. We don't need this as we need to do it a different way, with a explicit set of registers as otherwise we trip other random FIR bits and everything becomes even more terrible. I suggest alcohol. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27dts: Zero struct to avoid using uninitialised valueCyril Bur1-2/+2
Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27hw/imc: Don't dereference possible NULLCyril Bur1-1/+2
Fixes: CID 263056 and 263052 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27fast-reboot: occ: Only delete /ibm, opal/power-mgt nodes if they existCyril Bur1-5/+7
Fixes: ac4272bf ("fast-reboot: occ: Delete OCC child nodes in /ibm, opal/power-mgt") Fixes: CID 263053 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27npu2-opencapi: Fix memory leakCyril Bur1-1/+1
Fixes: CID 264267 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27npu2: Fix possible NULL dereferenceCyril Bur1-3/+6
The follow pattern exists in several npu2 functions: struct phb *phb = pci_get_phb(phb_id); struct npu2 *p = phb_to_npu2_nvlink(phb); The problem is that pci_get_phb() can return NULL and phb_to_npu2_nvlink() dereferences its parameter. Coverity says that the return value of pci_get_phb() is checked 43 out of 46 times which suggests we should be more careful. Futhurmore, functions with the baddly placed call to phb_to_npu2_nvlink() do seem to check that the return value of pci_get_phb() isn't NULL, but this check would be too little too late. This patch just moves the call of phb_to_npu2_nvlink() to after the NULL check for the return value of pci_get_phb(). Affected functions are: opal_npu_map_lpar() opal_npu_init_context() opal_npu_destroy_context() Fixes: CID 264274, 264273, 264272, 264271, 264266, 264265 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27occ-sensors: Remove NULL checks after dereferenceCyril Bur1-2/+2
Both scale_sensor() and scale_energy() take the value to scale as a pointer. These functions do not NULL check the pointer before the first time they dereference it, which is fine since passing NULL would be completely pointless. Both functions do perform a pointless NULL check later on. This confuses coverity and really doesn't make much sense at all. Since calling these functions with NULL as the sensor parameter makes no sense, and currently theres a dereference before the check, just remove the check. Fixes: CID 264276 and 264275 Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-27occ: Set up OCC messaging even if we fail to setup pstatesStewart Smith1-8/+21
This means that we no longer hit this bug if we fail to get valid pstates from the OCC. [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 [ 10.318805] Disabling lock debugging due to kernel taint [ 10.318808] Severe Machine check interrupt [Not recovered] [ 10.318812] NIP [000000003003e434]: 0x3003e434 [ 10.318813] Initiator: CPU [ 10.318815] Error type: Real address [Load/Store (foreign)] [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3 [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240 [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1) [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000 [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> [Additional changes from Shilpa] Reviewed-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22xive: fix opal_xive_set_vp_info() error pathCédric Le Goater1-1/+1
In case of error, opal_xive_set_vp_info() will return without unlocking the xive object. This is most certainly a typo. Signed-off-by: Cédric Le Goater <clg@kaod.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22npu2: Remove DD1 supportAndrew Donnellan2-85/+52
Major changes in the NPU between DD1 and DD2 necessitated a fair bit of revision-specific code. Now that all our lab machines are DD2, we no longer test anything on DD1 and it's time to get rid of it. Remove DD1-specific code and abort probe if we're running on a DD1 machine. Cc: Alistair Popple <alistair@popple.id.au> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-By: Alistair Popple <alistair@popple.id.au> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22phb*: Remove the state field in the various phb structuresOliver O'Halloran4-47/+37
We've been carting around this field since the original p7ioc-phb code. As far as I can tell we never actually use it for anything other than checking if the PHB has been marked as broken or not. The _FENCED state is set in a few places, but we never use it in favour of just checking the MMIO register. This patch just replaces it with a boolean that indicates if the PHB has been marked as broken and removes the giant, mostly wrong, comment explaining it's usage that is copied and pasted into each phb header file. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-22SLW: Increase stop4-5 residency by 10xAkshay Adiga1-2/+2
Using DGEMM benchmark we observed there was a drop of 5-9% throughput with and without stop4/5. In this benchmark the GPU waits on the cpu to wakeup and provide the subsequent data block to compute. The wakup latency accumulates over the run and shows up as a performance drop. Linux enters stop4/5 more aggressively for its wakeup latency. Increasing the residency from 1ms to 10ms makes the performance drop <1% Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-14dts: spl_wakeup: Remove all workarounds in the spl wakeup logicShilpasri G Bhat1-29/+1
We coded few workarounds in special wakeup logic to handle the buggy firmware. Now that is fixed remove them as they break the special wakeup protocol. As per the spec we should not de-assert beofre assert is complete. So follow this protocol. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-14npu2: Disable fast rebootReza Arbab1-0/+3
Fast reboot does not yet work right with the NPU. It's been disabled on NVLink and OpenCAPI machines. Do the same for NVLink2. This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset") from the npu code to npu2. Cc: stable # 5.10.x Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-12npu2: Use unfiltered mode in XTS tablesReza Arbab1-44/+31
The XTS_PID context table is limited to 256 possible pids/contexts. To relieve this limitation, make use of "unfiltered mode" instead. If an entry in the XTS_BDF table has the bit for unfiltered mode set, we can just use one context for that entire bdf/lpar, regardless of pid. Instead of of searching the XTS_PID table, the NMMU checkout request will simply use the entry indexed by lparshort id instead. Change opal_npu_init_context() to create these lparshort-indexed wildcard entries (0-15) instead of allocating one for each pid. Check that multiple calls for the same bdf all specify the same msr value. In opal_npu_destroy_context(), continue validating the bdf argument, ensuring that it actually maps to an lpar, but no longer remove anything from the XTS_PID table. If/when we start supporting virtualized GPUs, we might consider actually removing these wildcard entries by keeping a refcount, but keep things simple for now. Signed-off-by: Reza Arbab <arbab@linux.vnet.ibm.com> Acked-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-06Revert "console(lpc/fsp-console): Use only stdout-path property on P9 and above"Stewart Smith2-22/+6
This reverts commit 20f685a3627a2a522c465716377561a8fbcc608f. We've hit problems on Zaius machines and the needed petitboot changes haven't made it upstream yet. Let's revert for the time being while we sort everything out. We probably have to keep both around for a few years. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-06capp: Disable fast-reboot whenever enable_capi_mode() is calledVaibhav Jain1-4/+2
This patch updates phb4_set_capi_mode() to disable fast-reboot whenever enable_capi_mode() is called, irrespective to its return value. This should prevent against a possibility of not disabling fast-reboot when some changes to enable_capi_mode() causing return of an error and leaving CAPP in enabled mode. Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-06capp: Make error in capp timebase sync a non-fatal errorVaibhav Jain1-3/+1
Presently when we encounter an error while synchronizing capp timebase with chip-tod at the end of enable_capi_mode() we return an error. This has an to unintended consequences. First this will prevent disabling of fast-reboot even though CAPP is already enabled by this point. Secondly, failure during timebase sync is a non fatal error or capp initialization as CAPP/PSL can continue working after this and an AFU will only see an error when it tries to read the timebase value from PSL. So this patch updates enable_capi_mode() to not return an error in case call to chiptod_capp_timebase_sync() fails. The function will now just log an error and continue further with capp init sequence. This make the current implementation align with the one in kernel 'cxl' driver which also assumes the PSL timebase sync errors as non-fatal init error. Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-06npu2-opencapi: Fix assert on link reset during initFrederic Barrat1-0/+9
We don't support resetting an opencapi link yet. Commit fe6d86b9 ("pci: Make fast reboot creset PHBs in parallel") tries resetting any PHB whose slot defines a 'run_sm' callback. It raises an assert when applied to an opencapi PHB, as 'run_sm' calls the 'freset' callback, which is not yet defined for opencapi. Fix it for now by removing the currently useless definition of 'run_sm' on the opencapi slot. It will print a message in the skiboot log because the PHB cannot be reset, which is correct. It will all go away when we add support for resetting an opencapi link. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-06capp: Add lid definition for P9 DD-2.2Christophe Lombard1-0/+2
Update fsp_lid_map to include CAPP ucode lid for phb4-chipid == 0x202d1 that corresponds to P9 DD-2.2 chip. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01phb4: set PBCQ Tunnel BAR for tunneled operationsPhilippe Bergheaud1-5/+66
P9 supports PCI tunneled operations (atomics and as_notify) that are initiated by devices. A subset of the tunneled operations require a response, that must be sent back from the host to the device. For example, an atomic compare and swap will return the compare status, as swap will only performed in case of success. Similarly, as_notify reports if the target thread has been woken up or not, because the operation may fail. To enable tunneled operations, a device driver must tell the host where it expects tunneled operation responses, by setting the PBCQ Tunnel BAR Response register with a specific value within the range of its BARs. This register is currently initialized by enable_capi_mode(). But, as tunneled operations may also operate in PCI mode, a new API is required to set the PBCQ Tunnel BAR Response register, without switching to CAPI mode. This patch provides two new OPAL calls to get/set the PBCQ Tunnel BAR Response register. Note: as there is only one PBCQ Tunnel BAR register, shared between all the devices connected to the same PHB, only one of these devices will be able to use tunneled operations, at any time. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01phb4: set PHB CMPM registers for tunneled operationsPhilippe Bergheaud1-7/+42
P9 supports PCI tunneled operations (atomics and as_notify) that require setting the PHB ASN Compare/Mask register with a 16-bit indication. This register is currently initialized by enable_capi_mode(). But, as tunneled operations may also work in PCI mode, the ASN Compare/Mask register should rather be initialized in phb4_init_ioda3(). This patch also adds "ibm,phb-indications" to the device tree, to tell Linux the values of CAPI, ASN, and NBW indications, when supported. Tunneled operations tested by IBM in CAPI mode, by Mellanox Technologies in PCI mode. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01console(lpc/fsp-console): Use only stdout-path property on P9 and abovePridhiviraj Paidipeddi2-6/+22
dtc tool complaining about below warning as usage of linux,stdout-path property under /chosen node is deprecated. dts: Warning (chosen_node_stdout_path): Use 'stdout-path' instead of 'linux,stdout-path' So this patch fix this by using stdout-path property on all the systems and keep linux,stdout-path only on P8 and before. This property refers to a node which represents the device to be used for boot console output. Verified boot on both P8 and P9 systems with new and older kernels. And also verified dtc warnings got fixed in both P8 and P9. Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> [stewart: simplify logic] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01capp: Disable fast-reboot when capp is enabledVaibhav Jain1-19/+19
Ref[1] enables fast-reboot by default for POWER9. Presently fast-reboot for CAPP is not yet supported. Hence this patch disables fast-reboot in case the CAPP is enabled after reboot until its support is merged. References: [1] https://patchwork.ozlabs.org/patch/878879/ Suggested-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-opencapi: Add OpenCAPI OPAL API callsFrederic Barrat1-0/+206
Add three OPAL API calls that are required by the ocxl driver. - OPAL_NPU_SPA_SETUP The Shared Process Area (SPA) is a table containing one entry (a "Process Element") per memory context which can be accessed by the OpenCAPI device. - OPAL_NPU_SPA_CLEAR_CACHE The NPU keeps a cache of recently accessed memory contexts. When a Process Element is removed from the SPA, the cache for the link must be cleared. - OPAL_NPU_TL_SET The Transaction Layer specification defines several templates for messages to be exchanged on the link. During link setup, the host and device must negotiate what templates are supported on both sides and at what rates those messages can be sent. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-opencapi: Train OpenCAPI links and setup devicesAndrew Donnellan2-7/+687
Scan the OpenCAPI links under the NPU, and for each link, reset the card, set up a device, train the link and register a PHB. Implement the necessary operations for the OpenCAPI PHB type. For bringup, test and debug purposes, we allow an NVRAM setting, "opencapi-link-training" that can be set to either disable link training completely or to use the prbs31 test pattern. To disable link training: nvram -p ibm,skiboot --update-config opencapi-link-training=none To use prbs31: nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31 Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-hw-procedures: Add support for OpenCAPI PHY link trainingAndrew Donnellan1-12/+116
Unlike NVLink, which uses the pci-virt framework to fake a PCI configuration space for NVLink devices, the OpenCAPI device model presents us with a real configuration space handled by the device over the OpenCAPI link. As a result, we have to train the OpenCAPI link in skiboot before we do PCI probing, so that config space can be accessed, rather than having link training being triggered by the Linux driver. Add some helper functions to wrap the existing NVLink PHY training sequence so we can easily run it within skiboot. Additionally, we add OpenCAPI-specific lane settings, and a function to "bump" lanes that haven't trained properly (this process isn't documented in the workbook, but the hardware experts assure us that this improves link training reliability...) We also support the PRBS31 pattern that's used for bringup and test purposes. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2-opencapi: Configure NPU for OpenCAPIAndrew Donnellan3-0/+833
Scan the device tree for NPUs with OpenCAPI links and configure the NPU per the initialisation sequence in the NPU OpenCAPI workbook. Training of individual links and setup of per-AFU/link configuration will be in a later patch. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2: Rework NPU data structures for OpenCAPIAndrew Donnellan2-65/+71
Unlike NVLink, OpenCAPI registers a separate PHB for each device, in order to allow us to force Linux to use the correct MMIO windows for each NPU link. This requires some reworking of NPU data structures to account for the fact that a PHB could correspond to either an NPU (NVLink) or a single link (OpenCAPI). At some later point, we may want to rework the NVLink code to present a separate PHB per device in order to simplify this. For now, we split NVLink-specific device data into a separate struct in order to make it clear which fields are NVLink-only. Additionally, add helper functions to correctly translate between OpenCAPI/NVLink PHBs and the underlying structures, and various fields for OpenCAPI data that we're going to need later on. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-03-01npu2: Split out common helper functions into separate fileAndrew Donnellan3-93/+115
Split out common helper functions for NPU register access into a separate file, as these will be used extensively by both NVLink and OpenCAPI code. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-28NPU2 HMIs: dump out a *LOT* of npu2 registers for debuggingStewart Smith2-16/+24
This is not the way we want to end up doing this. This is a hack to make folk happy and not require crondump to debug nvidia/npu2 issues. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-28fast-reboot: occ: Delete OCC child nodes in /ibm, opal/power-mgtShilpasri G Bhat1-1/+5
Fast-reboot in P8 fails to re-init OCC data as there are chipwise OCC nodes which are already present in the /ibm,opal/power-mgt node. These per-chip nodes hold the voltage IDs for each pstate and these can be changed on OCC pstate table biasing. So delete these before calling the re-init code to re-parse and populate the pstate data. Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-28build: use thin archives rather than incremental linkingNicholas Piggin5-5/+5
This changes to build system to use thin archives rather than incremental linking for built-in.o, similar to recent change to Linux. built-in.o is renamed to built-in.a, and is created as a thin archive with no index, for speed and size. All built-in.a are aggregated into a skiboot.tmp.a which is a thin archive built with an index, making it suitable or linking. This is input into the final link. The advantags of build size and linker code placement flexibility are not as great with skiboot as a bigger project like Linux, but it's a conceptually better way to build, and is more compatible with link time optimisation in toolchains which might be interesting for skiboot particularly for size reductions. Size of build tree before this patch is 34.4MB, afterwards 23.1MB. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-22phb4: Disable lane eq when retrying some nvidia GEN3 devicesMichael Neuling1-9/+45
This fixes these nvidia cards training at only GEN2 spends rather than GEN3 by disabling PCIe lane equalisation. Firstly we check if the card is in a whitelist. If it is and the link has not trained optimally, retry with lane equalisation off. We do this on all POWER9 chip revisions since this is a device issue, not a POWER9 chip issue. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21sensor-groups: occ: Add support to disable/enable sensor groupShilpasri G Bhat2-263/+137
This patch adds a new opal call to enable/disable a sensor group. This call is used to select the sensor groups that needs to be copied to main memory by OCC at runtime. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [stewart: rebase and bump OPAL API number] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21sensors: occ: Scale the sensor valuesShilpasri G Bhat1-4/+107
Scale the sensor values appropriately to match the HWMON requirements. OCC sensor values should be scaled as per "scale_factor' field and then converted to HWMON required unit. Sensors like temperature and power are already scaled by the kernel driver which convert the values to millidegree Celsius and microWatt respectively. So apart from temperature and power the remaining sensors are converted to HWMON standards. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21sensors: occ: Add energy countersShilpasri G Bhat1-70/+119
Export the accumulated power values as energy sensors. The accumulator field of power sensors are used for representing energy counters which can be exported as energy counters in Linux hwmon interface. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [stewart: fix old gcc compiler warning] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2018-02-21sensors: Support reading u64 sensor valuesShilpasri G Bhat3-8/+10
This patch adds support to read u64 sensor values. This also adds changes to the core and the backend implementation code to make this API as the base call. Host can use this new API to read sensors upto 64bits. This adds a list to store the pointer to the kernel u32 buffer, for older kernels making async sensor u32 reads. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>