path: root/hw
Age | Commit message | Author | Files | Lines
2019-06-27 | npu2: Increase timeout for L2/L3 cache purging | Alexey Kardashevskiy | 1 | -7/+13
On NVLink2 bridge reset, we purge all L2/L3 caches in the system. This is an asynchronous operation with a 2ms timeout. There are reports that this is not enough and "PURGE L3 on core xxx timed out" messages appear (for reference, on the test setup the purge takes 280us..780us). This defines the timeout as a macro and raises it from 2ms to 20ms. It also adds a tracepoint reporting how long it took to purge all the caches. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
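The timeout handling described in the npu2 purge commit above can be sketched as a bounded polling loop that also reports how long the wait took. This is a minimal illustration, not the skiboot code: `PURGE_TIMEOUT_US`, `wait_for_purge()` and the fake completion helper are hypothetical names, and time is simulated by adding the poll step rather than by a real delay.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical macro name; the commit only says the timeout becomes a macro set to 20ms. */
#define PURGE_TIMEOUT_US 20000u

/* Hypothetical stand-in for "has the asynchronous purge finished?". */
typedef bool (*purge_done_fn)(void *ctx);

/*
 * Poll done() until it reports completion or the timeout expires.
 * Time is simulated: each unsuccessful poll accounts for step_us.
 * Returns true on completion; *elapsed_us receives the time waited,
 * which is the value a tracepoint could report.
 */
bool wait_for_purge(purge_done_fn done, void *ctx,
		    uint32_t step_us, uint32_t *elapsed_us)
{
	uint32_t waited = 0;

	while (!done(ctx)) {
		if (waited >= PURGE_TIMEOUT_US) {
			*elapsed_us = waited;
			return false;	/* timed out */
		}
		waited += step_us;	/* real firmware would delay here */
	}
	*elapsed_us = waited;
	return true;
}

/* Fake purge that completes after a fixed number of polls. */
struct fake_purge {
	uint32_t polls_left;
};

bool fake_purge_done(void *ctx)
{
	struct fake_purge *p = ctx;

	if (p->polls_left == 0)
		return true;
	p->polls_left--;
	return false;
}
```

With a 10us step, a purge that needs 78 polls reports 780us, matching the upper end of the range quoted in the commit message, while a purge that never completes gives up once 20ms have been accounted for.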
2019-06-27 | hw/phb3: Add verbose EEH output | Oliver O'Halloran | 1 | -1/+96
Add support for the pci-eeh-verbose NVRAM flag on PHB3. We've had this on PHB4 since forever and it has proven very useful when debugging EEH issues. When testing changes to the Linux kernel's EEH implementation it's fairly common for the kernel to crash before printing the EEH log so it's helpful to have it in the OPAL log where it can be dumped from XMON. Note that unlike PHB4 we do not enable verbose mode by default. The nvram option must be used to explicitly enable it. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-27 | pci: Make the pci-eeh-verbose nvram option generic | Oliver O'Halloran | 1 | -7/+3
We currently have the "pci-eeh-verbose" NVRAM flag that causes phb4 to print a register dump when it detects the PHB has been fenced. This is useful for debugging most EEH issues since the kernel may not be ready to handle EEH events when the problem is first detected. There's no real reason this needs to be specific to PHB4 so this patch moves the nvram flag handling into the generic init path (along with the pcie_max_link_speed flag) so we can add a similar function for PHB3. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-27 | external/mambo: Bump default POWER9 to Nimbus DD2.3 | Nicholas Piggin | 1 | -1/+1
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Experimental support for building without FSP code | Stewart Smith | 1 | -0/+3
Now, with CONFIG_FSP=0/1 we have: 1.6M/1.4M skiboot.lid 323K/375K skiboot.lid.xz Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Move platform specific PRD functionality to struct platform | Stewart Smith | 1 | -11/+25
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Separate FSP specific PSI code | Stewart Smith | 3 | -86/+107
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Move FSP-specific VPD functionality to platforms/ibm-fsp/ | Stewart Smith | 2 | -2/+4
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | hw/fsp/fsp.c: remove lying comments | Stewart Smith | 1 | -9/+0
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | core/opal: move HIR trigger to FSP poller | Stewart Smith | 1 | -0/+17
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | host_services_occ_base_setup is core homer code not host_services | Stewart Smith | 1 | -0/+25
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Move core/hostservices.c to platforms/ibm-fsp/ | Stewart Smith | 1 | -1/+0
It's only used on FSP systems so should really just be part of that platform support. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | Split FSP OCC code out into hw/fsp/ | Stewart Smith | 3 | -388/+426
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-24 | op_display: make platform function rather than "FSP" specific | Stewart Smith | 1 | -4/+2
We have an implementation for non-FSP systems now, and we shouldn't be calling that from code in an fsp/ directory, so move op_display() to a platform function. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-13 | npu2: Purge cache when resetting a GPU | Reza Arbab | 1 | -0/+6
After putting all of a GPU's links in reset, do a cache purge in case we have CPU cache lines belonging to the now-inaccessible GPU memory. Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU") Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-13 | npu2: Fix typo | Reza Arbab | 1 | -2/+2
Change "brigde" to "bridge". Fixes: 68d11e4460ec ("npu2: Reset NVLinks when resetting a GPU") Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-04 | npu2-opencapi: Mask 2 XSL errors | Frederic Barrat | 1 | -9/+20
Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some opencapi FIRs") converted some FIR bits default action from system checkstop to raising an error interrupt. For 2 XSL error events that can be triggered by a misbehaving AFU, the error interrupt is raised twice, once for each link (the XSL logic in the NPU is shared between 2 links). So a badly behaving AFU could impact another, unsuspecting opencapi adapter. It doesn't look good and it turns out we can do better. We can mask those 2 XSL errors. The error will also be picked up by the OTL logic, which is per link. So we'll still get an error interrupt, but only on the relevant link, and the other opencapi adapter can stay functional. Fixes: f8dfd699f584 ("hw/npu2: Setup an error interrupt on some opencapi FIRs") Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-04 | hw/lpc: Fix theoretical possible out-of-bounds-read | Stewart Smith | 1 | -2/+2
An off-by-one: the number of elements was confused with the highest valid index when counting from 0. Found by static analysis. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
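The class of bug named in the hw/lpc commit above (element count versus zero-based index) can be shown with a small bounds check. This is a generic sketch, not the hw/lpc code: the table, its contents and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table standing in for whatever array the real code indexes. */
static const uint8_t sample_table[4] = { 10, 11, 12, 13 };

/* Buggy check: idx == 4 is accepted, which would read one element past the end. */
int index_ok_buggy(size_t idx)
{
	return idx <= ARRAY_SIZE(sample_table);
}

/* Fixed check: with N elements the valid indices are 0 .. N-1. */
int index_ok(size_t idx)
{
	return idx < ARRAY_SIZE(sample_table);
}

/* Returns the table entry, or -1 when idx is out of range. */
int lookup(size_t idx)
{
	if (!index_ok(idx))
		return -1;
	return sample_table[idx];
}
```

The two checks differ only for `idx == 4`, which is why such a bug tends to be "theoretical" until a caller passes exactly the element count.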
2019-06-04 | Remove P7 remnants: hw/cec.c, apollo platform | Stewart Smith | 2 | -73/+1
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-04 | Remove POWER7 and POWER7+ support | Stewart Smith | 15 | -5391/+15
It's been a good long while since either OPAL POWER7 user touched a machine, and even longer since they'd have been okay using an old version rather than tracking master. There's also been no testing of OPAL on POWER7 systems for an awfully long time, so it's pretty safe to assume that it's very much bitrotted. It also saves a whole 14kb of xz compressed payload space. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Enthusiasticly-Acked-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | prd: Implement generic FSP - HBRT interface | Vasant Hegde | 2 | -1/+78
This patch implements a generic interface to pass data from FSP to HBRT during runtime (FSP -> OPAL -> opal-prd -> HBRT). OPAL gets a notification from FSP for new HBRT messages. We convert the MBOX message to firmware_notify format and send it to HBRT. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | prd: Implement generic HBRT - FSP interface | Vasant Hegde | 1 | -1/+129
This patch implements a generic interface to pass data from HBRT to FSP during runtime (HBRT -> opal-prd -> kernel -> OPAL -> FSP). HBRT sends data via the firmware_request interface. We have to convert that to MBOX format and send it to FSP. OPAL uses TCE mapped memory to send data. FSP will reuse the same memory for the response. Once processing is complete, FSP sends a response to OPAL. Finally, OPAL calls HBRT with a firmware_response message. This also introduces a new opal_msg type (OPAL_MSG_PRD2) to pass bigger prd messages to the kernel: if (prd_msg > OPAL_MSG_FIXED_PARAMS_SIZE), use OPAL_MSG_PRD2. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
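The size-based choice of message type mentioned at the end of the HBRT - FSP commit above can be sketched as follows. The value of `OPAL_MSG_FIXED_PARAMS_SIZE` is an assumption taken from the "8 * 8 bytes" figure in the opal-msg commit message, and the enum values and function name are hypothetical placeholders rather than the real OPAL definitions.

```c
#include <stddef.h>

/* Assumed from the opal-msg commit message: 8 parameters of 8 bytes each. */
#define OPAL_MSG_FIXED_PARAMS_SIZE (8 * 8)

/* Illustrative values only; the real message type numbers are not given here. */
enum sketch_msg_type {
	SKETCH_MSG_PRD = 1,	/* fits in the fixed-size parameter area */
	SKETCH_MSG_PRD2 = 2	/* variable-size message for larger payloads */
};

/* Pick the message type from the size of the prd message in bytes. */
enum sketch_msg_type prd_msg_type(size_t prd_msg_size)
{
	if (prd_msg_size > OPAL_MSG_FIXED_PARAMS_SIZE)
		return SKETCH_MSG_PRD2;
	return SKETCH_MSG_PRD;
}
```

A message of exactly 64 bytes still uses the fixed-size type under this assumption; only larger messages switch to the new one.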
2019-06-03 | prd: Validate _opal_queue_msg() return value | Vasant Hegde | 1 | -6/+12
To be on the safe side, validate the _opal_queue_msg() return value. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | opal-msg: Pass parameter size to _opal_queue_msg() | Vasant Hegde | 2 | -11/+13
Currently _opal_queue_msg() takes the number of parameters. So far this was fine, as opal_queue_msg() supported only a fixed number of parameters (8 * 8 bytes). Soon we are going to introduce variable-size parameters, hence num_params -> params_size. Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | opal-msg: Pass return value to callback handler | Vasant Hegde | 3 | -7/+7
The kernel calls the opal_get_msg() API to read an OPAL message. In this path OPAL calls the "callback" handler to inform the caller that the kernel read the OPAL message. It assumes that the read always succeeds. This assumption was fine as long as messages were always fixed size. The next patch introduces variable-size OPAL messages. In that situation opal_get_msg() may fail due to insufficient buffer size (e.g. an old kernel combined with a new OPAL). So let's add a `return value` parameter to the "callback" handler, so that the caller knows the kernel didn't read the message and can take appropriate action. Cc: Jeremy Kerr <jk@ozlabs.org> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
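The idea in the opal-msg callback commit above, passing the read status to the completion callback, can be sketched like this. Everything here is a simplified model under stated assumptions: the struct, the status codes and the function names are hypothetical and do not reproduce the skiboot API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the sketch. */
#define SKETCH_RC_SUCCESS	0
#define SKETCH_RC_TOO_SMALL	(-1)

/* The callback now receives the status of the read, not just its data pointer. */
typedef void (*consumed_cb)(void *data, int rc);

struct queued_msg {
	const char *payload;
	size_t size;		/* payload size in bytes */
	consumed_cb consumed;	/* called once the reader has been served */
	void *cb_data;
};

/*
 * Copy the message into the reader's buffer if it fits, then tell the
 * producer whether the reader actually received it.
 */
int get_msg(struct queued_msg *msg, char *buf, size_t buf_size)
{
	int rc = SKETCH_RC_SUCCESS;

	if (buf_size < msg->size)
		rc = SKETCH_RC_TOO_SMALL;
	else
		memcpy(buf, msg->payload, msg->size);

	if (msg->consumed)
		msg->consumed(msg->cb_data, rc);
	return rc;
}

/* Example callback: records the last status it was given. */
void record_rc(void *data, int rc)
{
	*(int *)data = rc;
}
```

A producer that previously assumed success can now notice a too-small reader buffer and, for example, keep the message or fall back to another path.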
2019-06-03 | npu2: Clear fence state for a brick being reset | Alexey Kardashevskiy | 1 | -0/+8
Resetting a GPU before resetting an NVLink leads to occasional HMIs which fence some bricks and prevent the "reset_ntl" procedure from succeeding at the "reset_ntl_release" step - the host system requires reboot; there may be other cases like this as well. This adds clearing of the fence bit in NPU.MISC.FENCE_STATE for the NVLink which we are about to reset. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Make phb4_training_trace() more general | Oliver O'Halloran | 1 | -19/+25
phb4_training_trace() is used to monitor the Link Training Status State Machine (LTSSM) of the PHB's data link layer. Currently it is only used to observe the LTSSM while bringing up the link, but sometimes it's useful to see what's occurring in other situations (e.g. link disable, or secondary bus reset). This patch renames it to phb4_link_trace() and allows the target LTSSM state and a flexible timeout to help in these situations. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Set trace enable where it's used | Oliver O'Halloran | 1 | -12/+22
The current LTSSM state was added to the PHB4 link trace output in 961547bceed3 ("phb4: Enhanced PCIe training tracing"). That patch split enabling the LTSSM state output from the rest of the tracing code in phb4_training_trace() to ensure that it would capture events from right after PERST is lifted. This is not really necessary since LTSSM state changes occur over milliseconds. We lose nothing by delaying the enable slightly so this patch moves it into phb4_training_trace() to keep the tracing code in one place. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Add missing LTSSM states | Oliver O'Halloran | 1 | -0/+6
The "disabled" and "loopback" states are missing from the table. We never expect to see the second, but the first does occasionally come up. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | hw/phb4: Use read/write_reg in assert_perst | Oliver O'Halloran | 1 | -2/+2
While the PHB is fenced we can't use the MMIO interface to access PHB registers. While processing a complete reset we inject a PHB fence to isolate the PHB from the rest of the system because the PHB won't respond to MMIOs from the rest of the system while being reset. We assert PERST after the fence has been erected which requires us to use the XSCOM indirect interface to access the PHB registers rather than the MMIO interface. Previously we did that when asserting PERST in the CRESET path. However, in b8b4c79d4419 ("hw/phb4: Factor out PERST control") this was re-written to use the raw in_be64() accessor. This means that CRESET would not be asserted in the reset path. On some Mellanox cards this would prevent them from re-loading their firmware when the system was fast-reset. This patch fixes the problem by replacing the raw {in|out}_be64() accessors with the phb4_{read|write}_reg() functions. Reported-by: Carol L Soto <clsoto@us.ibm.com> Fixes: b8b4c79d4419 ("hw/phb4: Factor out PERST control") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Tested-by: Carol L Soto <clsoto@us.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
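The difference between a raw MMIO accessor and a fence-aware wrapper, as described in the assert_perst commit above, can be modelled in a few lines. This is a toy model under stated assumptions, not the phb4 code: `struct fake_phb` and all function names are hypothetical, and the model simply drops MMIO writes while fenced to stand in for "the PHB won't respond to MMIOs".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PHB with a fence flag and a few registers. */
struct fake_phb {
	bool fenced;
	uint64_t regs[4];
	unsigned int mmio_accesses;
	unsigned int xscom_accesses;
};

/* Raw MMIO write: in this model it has no effect while the PHB is fenced. */
void raw_mmio_write(struct fake_phb *p, unsigned int reg, uint64_t val)
{
	p->mmio_accesses++;
	if (!p->fenced)
		p->regs[reg] = val;
}

/* Indirect (XSCOM-style) write: reaches the register even while fenced. */
void xscom_indirect_write(struct fake_phb *p, unsigned int reg, uint64_t val)
{
	p->xscom_accesses++;
	p->regs[reg] = val;
}

/* Fence-aware wrapper: picks the access path that works in the current state. */
void phb_write_reg(struct fake_phb *p, unsigned int reg, uint64_t val)
{
	if (p->fenced)
		xscom_indirect_write(p, reg, val);
	else
		raw_mmio_write(p, reg, val);
}
```

In the model, a caller that uses the raw accessor while fenced silently fails to change the register, which is the shape of the regression the commit describes; routing every access through the wrapper avoids having to know the fence state at each call site.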
2019-06-03 | hw/phb4: Assert Link Disable bit after ETU init | Oliver O'Halloran | 1 | -0/+6
The cursed RAID card in ozrom1 has a bug where it ignores PERST being asserted. The PCIe Base spec is a little vague about what happens while PERST is asserted, but it does clearly specify that when PERST is de-asserted the Link Training and Status State Machine (LTSSM) of a device should return to the initial state (Detect) defined in the spec and the link training process should restart.

This bug was worked around in 9078f8268922 ("phb4: Delay training till after PERST is deasserted") by setting the link disable bit at the start of the FRESET process and clearing it after PERST was de-asserted. Although this fixed the bug, the patch offered no explanation of why the fix worked.

In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable workaround was moved into phb4_assert_perst(). This is always called in the CRESET case, but a following patch resulted in assert_perst() not being called if phb4_freset() was entered following a CRESET since p->skip_perst was set in the CRESET handler. This is bad since a side-effect of the CRESET is that the Link Disable bit is cleared. This, combined with the RAID card ignoring PERST, results in the PCIe link being trained by the PHB while we're waiting out the 100ms ETU reset time.

If we hack skiboot to print a DLP trace after returning from phb4_init_hw() we get:

PHB#0001[0:1]: Initialization complete
PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
PHB#0001[0:1]: CRESET: wait_time = 100
PHB#0001[0:1]: FRESET: Starts
PHB#0001[0:1]: FRESET: Prepare for link down
PHB#0001[0:1]: FRESET: Assert skipped
PHB#0001[0:1]: FRESET: Deassert
PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
PHB#0001[0:1]: TRACE: Reached target state
PHB#0001[0:1]: LINK: Start polling
PHB#0001[0:1]: LINK: Electrical link detected
PHB#0001[0:1]: LINK: Link is up
PHB#0001[0:1]: LINK: Went down waiting for stabilty
PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
PHB#0001[0:1]: CRESET: Starts

What has happened here is that the link is trained to 8x Gen3 33ms after we return from phb4_init_hw(), and before we've waited the 100ms that we normally wait after re-initialising the ETU. When we "deassert" PERST later on in the FRESET handler the link is in L0 (normal) state. At this point we try to read from the Vendor/Device ID register to verify that the link is stable and immediately get a PHB fence due to a PCIe Completion Timeout. Skiboot attempts to recover by doing another CRESET, but this will encounter the same issue.

This patch fixes the problem by setting the Link Disable bit (by calling phb4_assert_perst()) immediately after we return from phb4_init_hw(). This prevents the link from being trained while PERST is asserted which seems to avoid the Completion Timeout. With the patch applied we get:

PHB#0001[0:1]: Initialization complete
PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
PHB#0001[0:1]: CRESET: wait_time = 100
PHB#0001[0:1]: FRESET: Starts
PHB#0001[0:1]: FRESET: Prepare for link down
PHB#0001[0:1]: FRESET: Assert skipped
PHB#0001[0:1]: FRESET: Deassert
PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
PHB#0001[0:1]: TRACE: Reached target state
PHB#0001[0:1]: LINK: Start polling
PHB#0001[0:1]: LINK: Electrical link detected
PHB#0001[0:1]: LINK: Link is up
PHB#0001[0:1]: LINK: Link is stable
PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000

Cc: Michael Neuling <mikey@neuling.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | lpc-port80h: Don't write port 80h when running under Simics | Alistair Popple | 2 | -0/+6
Simics doesn't model LPC port 80h. Writing to it terminates the simulation due to an invalid LPC memory access. This patch adds a check to ensure port 80h isn't accessed if we are running under Simics. Signed-off-by: Alistair Popple <alistair@popple.id.au> [stewart: fixup run-port80h test] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | Remove remnants of OPAL_PCI_GET_PHB_DIAG_DATA | Stewart Smith | 6 | -6/+0
Never present in a public OPAL release, and only kernels prior to 3.11 would ever attempt to call it. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | Remove unused OPAL_GET_XIVE_SOURCE | Stewart Smith | 1 | -14/+0
While this call was technically implemented by skiboot, no code has ever called it, and it was only ever implemented for the p7ioc-phb back-end (i.e. POWER7). Since this call was unused in Linux, and POWER7 with OPAL was only ever available internally, it should be safe to remove the call. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-06-03 | Remove last remnants of OPAL_PCI_SET_PHB_TCE_MEMORY and OPAL_PCI_SET_HUB_TCE_MEMORY | Stewart Smith | 1 | -14/+0
Since we have not supported p5ioc systems since skiboot 5.2, it's pretty safe to just wholesale remove these OPAL calls now. Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-24 | npu2: Fix clearing the FIR bits | Alexey Kardashevskiy | 1 | -1/+1
FIR registers are SCOM-only so they cannot be accessed with the indirect write, and yet we use SCOM-based addresses for these; fix this. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-By: Alistair Popple <alistair@popple.id.au> Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-24 | npu2: Reset NVLinks when resetting a GPU | Alexey Kardashevskiy | 1 | -0/+55
Resetting a V100 GPU brings its NVLinks down and if an NPU tries using those, an HMI occurs. We were lucky not to observe this as the bare metal does not normally reset a GPU and when passed through, GPUs are usually before NPUs in QEMU command line or Libvirt XML and because of that NPUs are naturally reset first. However, a simple change of the device order brings HMIs. This defines a bus control filter for a PCI slot with a GPU with NVLinks so when the host system issues secondary bus reset to the slot, it resets associated NVLinks. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-24 | npu2: Reset PID wildcard and refcounter when mapped to LPID | Alexey Kardashevskiy | 1 | -0/+7
Since 105d80f85b "npu2: Use unfiltered mode in XTS tables" we do not register every PID in the XTS table so the table has one entry per LPID. Then we added a reference counter to keep track of the entry use when switching GPU between the host and guest systems (the "Fixes:" tag below). The POWERNV platform setup creates such entries and references them at the boot time when initializing IOMMUs and only removes it when a GPU is passed through to a guest. This creates a problem as POWERNV boots via kexec and no dereferencing happens; the XTS table state remains undefined. So when the host kernel boots, skiboot thinks there are valid XTS entries and does not update the XTS table which breaks ATS. This adds the reference counter and the XTS entry reset when a GPU is assigned to LPID and we cannot rely on the kernel to clean that up. Fixes: ba1d95a1d460 ("npu2: Add XTS_BDF_MAP wildcard refcount") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: Reza Arbab <arbab@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-21 | hw/xive.c: Fix memcmp() in DEBUG build to compare struct not ptr | Stewart Smith | 1 | -2/+2
With GCC9:

hw/xive.c: In function ‘xive_check_eq_update’:
hw/xive.c:3034:29: error: argument to ‘sizeof’ in ‘__builtin_memcmp’ call is the same expression as the first source; did you mean to dereference it? [-Werror=sizeof-pointer-memaccess]
  if (memcmp(eq, &eq2, sizeof(eq)) != 0) {
                             ^
hw/xive.c: In function ‘xive_check_vpc_update’:
hw/xive.c:3056:29: error: argument to ‘sizeof’ in ‘__builtin_memcmp’ call is the same expression as the first source; did you mean to dereference it? [-Werror=sizeof-pointer-memaccess]
  if (memcmp(vp, &vp2, sizeof(vp)) != 0) {
                             ^
cc1: all warnings being treated as errors

Fixes: 2eea386767728 Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
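The bug GCC flags in the hw/xive.c commit above is the classic `sizeof(pointer)` versus `sizeof(*pointer)` mix-up. The sketch below shows the difference with a hypothetical descriptor struct; it deliberately avoids writing `memcmp(a, b, sizeof(a))` literally so that it compiles cleanly even with `-Werror=sizeof-pointer-memaccess`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor, standing in for the structures compared in hw/xive.c. */
struct queue_desc {
	uint64_t w[8];
};

/* What the buggy code effectively compared: the size of a pointer. */
size_t compared_bytes_buggy(const struct queue_desc *d)
{
	return sizeof(d);
}

/* What it meant to compare: the size of the whole structure. */
size_t compared_bytes_fixed(const struct queue_desc *d)
{
	return sizeof(*d);
}

/* Correct whole-structure comparison. */
int desc_equal(const struct queue_desc *a, const struct queue_desc *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Two descriptors that differ only in their last word compare as equal over the first `sizeof(pointer)` bytes, so the buggy debug check could miss exactly the mismatches it was written to catch.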
2019-05-20 | nx: remove check on the "qemu, powernv" property | Cédric Le Goater | 1 | -5/+1
commit 95f7b3b9698b ("nx: Don't abort on missing NX when using a QEMU machine") introduced a check on the property "qemu,powernv" to skip NX initialization when running under a QEMU machine. The QEMU platforms now expose a QUIRK_NO_RNG in the chip. Testing the "qemu,powernv" property is not necessary anymore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-20 | hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory | Andrew Donnellan | 2 | -0/+192
Lowest Point of Coherency (LPC) memory allows the host to access memory on an OpenCAPI device. Define 2 OPAL calls, OPAL_NPU_MEM_ALLOC and OPAL_NPU_MEM_RELEASE, for assigning and clearing the memory BAR. (We try to avoid using the term "LPC" to avoid confusion with Low Pin Count.) At present, we use a fixed location in the address space, which means we are restricted to a single range of 4TB, on a single OpenCAPI device per chip. In future, we'll use some chip ID extension magic to give us more space, and some sort of allocator to assign ranges to more than one device. Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-20 | hw/phb4: Make pci-tracing print at PR_NOTICE | Oliver O'Halloran | 1 | -4/+7
When pci-tracing is enabled we print each trace status message and the final trace status at PR_ERROR. The final status messages are similar to those printed when we fail to train in the non-pci-tracing path and this has resulted in spurious op-test failures. This patch reduces the log-level of the tracing message to PR_NOTICE so they're not accidentally interpreted as actual error messages. PR_NOTICE messages are still printed to the console during boot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-20 | occ-sensors: Check if OCC is reset while reading inband sensors | Shilpasri G Bhat | 2 | -0/+8
OCC may not be able to mark the sensor buffer as invalid while going down RESET. If OCC never comes back we will continue to read the stale sensor data. So verify if OCC is reset while reading the sensor values and propagate the appropriate error. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-15 | Add P9 DIO interrupt support | Lei YU | 3 | -1/+149
On P9, GPIO ports 0, 1 and 2 can raise GPIO interrupts, and the DIO interrupt is used to handle them. Add support for the DIO interrupts:
1. Add dio_interrupt_register(chip, port, callback) to register the interrupt;
2. Add dio_interrupt_deregister(chip, port, callback) to deregister;
3. When an interrupt on the port occurs, the callback is invoked and the interrupt status is cleared.
Signed-off-by: Lei YU <mine260309@gmail.com> [oliver: Fixed Makefile.inc merge conflict] Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
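The register / deregister / dispatch behaviour listed in the DIO commit above can be sketched as a small per-port callback table. This is a toy model, not the skiboot implementation: `struct toy_chip`, the `toy_dio_*` functions and the `pending` bitmask are hypothetical, and only the three-port layout and the "invoke callback, then clear status" order come from the commit message.

```c
#include <stddef.h>

#define TOY_DIO_PORTS 3	/* ports 0, 1 and 2, per the commit message */

typedef void (*toy_dio_cb)(int port);

/* Hypothetical per-chip state: one handler per port plus a pending bitmask. */
struct toy_chip {
	toy_dio_cb handlers[TOY_DIO_PORTS];
	unsigned int pending;	/* bit n set = interrupt pending on port n */
};

/* Register a callback for a port; fails if the port is invalid or taken. */
int toy_dio_register(struct toy_chip *c, int port, toy_dio_cb cb)
{
	if (port < 0 || port >= TOY_DIO_PORTS || !cb)
		return -1;
	if (c->handlers[port])
		return -1;
	c->handlers[port] = cb;
	return 0;
}

/* Deregister; only succeeds if cb is the callback currently registered. */
int toy_dio_deregister(struct toy_chip *c, int port, toy_dio_cb cb)
{
	if (port < 0 || port >= TOY_DIO_PORTS || !cb)
		return -1;
	if (c->handlers[port] != cb)
		return -1;
	c->handlers[port] = NULL;
	return 0;
}

/* Dispatch: invoke the callback for each pending port, then clear its status. */
void toy_dio_handle(struct toy_chip *c)
{
	for (int port = 0; port < TOY_DIO_PORTS; port++) {
		unsigned int bit = 1u << port;

		if (!(c->pending & bit))
			continue;
		if (c->handlers[port])
			c->handlers[port](port);
		c->pending &= ~bit;
	}
}

/* Example callback used to observe dispatch. */
int toy_last_port = -1;

void toy_note_port(int port)
{
	toy_last_port = port;
}
```

Requiring the same callback on deregistration, as the commit's signature suggests, protects one user of a port from being removed by another.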
2019-05-15 | fsp/leds: improve string operations bounds checking | Nicholas Piggin | 1 | -14/+17
The current code has a few possible issues with string handling, and gcc flags a number of string / buffer warnings when enabling more checking. Some of the issues in the file:
- Mixing of null-terminated arrays (in most cases), and non-null in the input/output buffer format. memcpy generally should be used when the length is known.
- Lack of input data length bounds checking. Malformed input could cause overruns.
- String copying from same sized source and destination array sizes, where the source is a NUL terminated string, so the strncpy copies the string without its NUL terminator, which becomes NUL terminated at the zeroed destination array. Compiler does not like this, and it only works if the destination has been zeroed, so not a great pattern.
- Attempting to NUL terminate string using strcat, which will overwrite a byte past the end of the array if the string length is at maximum, or worse if the input was malformed.
This patch fixes several of these issues and fixes a number of compiler warnings. In general, the buffer and string handling could probably benefit from a more in-depth audit. Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Tested-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
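One way to avoid the string pitfalls listed in the fsp/leds commit above is a single bounded copy helper that accepts a possibly unterminated input field and always terminates the destination. This is a generic sketch of that pattern, not the code from the patch; `copy_field()` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a field of at most src_len bytes, which may or may not contain a
 * NUL terminator, into dst (capacity dst_size). The result is always
 * NUL terminated and never written past dst_size bytes.
 * Returns the number of characters copied, excluding the terminator.
 */
size_t copy_field(char *dst, size_t dst_size, const char *src, size_t src_len)
{
	const char *nul;
	size_t n;

	if (dst_size == 0)
		return 0;

	/* Stop at the first NUL inside the field, if there is one. */
	nul = memchr(src, '\0', src_len);
	n = nul ? (size_t)(nul - src) : src_len;

	/* Truncate so the terminator still fits. */
	if (n > dst_size - 1)
		n = dst_size - 1;

	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

Unlike strncpy into a same-sized array, this does not depend on the destination having been zeroed beforehand, and unlike terminating with strcat it cannot write past the end when the input is at maximum length.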
2019-05-15 | xive: Remove xive rev field and recognize P9P | Nicholas Piggin | 1 | -11/+12
All supported P9s are the revision 2 xive model, so there is no point to keeping it around. This avoids P9P being reported as unknown rev (which doesn't cause any other problems). Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-15 | xscom: move more register definitions into processor-specific includes | Nicholas Piggin | 3 | -0/+8
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-15 | nvram: Flag dangerous NVRAM options | Michael Neuling | 7 | -12/+11
Most nvram options used by skiboot are just for debug or testing for regressions. They should never be used long term. We've hit a number of issues in testing and the field where nvram options have been set "temporarily" but haven't been properly cleared after, resulting in crashes or real bugs being masked. This patch marks most nvram options used by skiboot as dangerous and prints a chicken to remind users of the problem. Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Acked-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2019-05-09 | platforms/astbmc: Check for SBE validation step | Samuel Mendoza-Jonas | 1 | -3/+106
On some POWER8 astbmc systems an update to the SBE requires pausing at runtime to ensure integrity of the SBE. If this is required the BMC will set a chassis boot option IPMI flag using the OEM parameter 0x62. If Skiboot sees this flag is set it waits until the SBE update is complete and the flag is cleared. Unfortunately the mystery operation that validates the SBE also leaves it in a bad state and unable to be used for timer operations. To work around this the flag is checked as soon as possible (i.e. when IPMI and the console are set up), and once complete the system is rebooted. Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2019-05-09 | ipmi: ensure forward progress on ipmi_queue_msg_sync() | Stewart Smith | 2 | -0/+8
BT responses are handled using a timer doing the polling. To hope to get an answer to an IPMI synchronous message, the timer needs to run. We can't just check all timers though as there may be a timer that wants a lock that's held by a code path calling ipmi_queue_msg_sync(), and if we did enforce that as a requirement, it's a pretty subtle API that is asking to be broken. So, if we just run a poll function to crank anything that the IPMI backend needs, then we should be fine. This issue shows up very quickly under QEMU when loading the first flash resource with the IPMI HIOMAP backend. Reported-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Reviewed-by: Andrew Jeffery <andrew@aj.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
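The forward-progress idea in the ipmi commit above, driving a backend poll function from inside the synchronous wait instead of relying on timers, can be sketched as below. It is a toy model under stated assumptions: `struct toy_backend` and the `toy_*` functions are hypothetical, and the poll limit exists only so the example terminates.

```c
#include <stdbool.h>

/* Hypothetical backend that needs to be polled before a reply shows up. */
struct toy_backend {
	int polls_until_reply;
	bool reply_ready;
};

/* Backend poll hook: each call performs one unit of backend work. */
void toy_backend_poll(struct toy_backend *b)
{
	if (b->polls_until_reply > 0 && --b->polls_until_reply == 0)
		b->reply_ready = true;
}

/*
 * Synchronous wait that makes its own progress by polling the backend.
 * Returns the number of polls performed, or -1 if max_polls ran out.
 */
int toy_wait_sync(struct toy_backend *b, int max_polls)
{
	int polls = 0;

	while (!b->reply_ready) {
		if (polls == max_polls)
			return -1;
		toy_backend_poll(b);
		polls++;
	}
	return polls;
}
```

Because the waiter only cranks the backend it is waiting on, it does not run unrelated timers that might want a lock the caller already holds, which is the hazard the commit message calls out.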