path: root/hw
Age | Commit message | Author | Files | Lines
2018-06-01ipmi-watchdog: Add a flag to determine if we are still tickingWilliam A. Kennington III1-3/+12
This makes it easier for future changes to ensure that the watchdog stops ticking and doesn't requeue itself for execution in the background. This way it is safe for resets to be performed after the ticks are assumed to be stopped and it won't start the timer again. Signed-off-by: William A. Kennington III <wak@google.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
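The idea behind such a flag can be sketched as below. This is only an illustration under assumptions, not the skiboot code: `wdt_ticking`, `wdt_start`, `wdt_stop` and `wdt_tick` are hypothetical names, and rescheduling the timer is modeled by a simple counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state, not taken from the skiboot sources. */
static bool wdt_ticking;   /* true while the periodic tick may requeue itself */
static int  wdt_requeues;  /* stands in for "timer scheduled again" */

/* Start periodic ticking. */
void wdt_start(void)
{
	assert(!wdt_ticking);  /* starting twice would double-queue the timer */
	wdt_ticking = true;
}

/* Stop ticking; a tick that fires later must not requeue itself. */
void wdt_stop(void)
{
	wdt_ticking = false;
}

/* Timer callback: requeue only while ticking is still wanted.
 * Returns true if the timer was rescheduled. */
bool wdt_tick(void)
{
	if (!wdt_ticking)
		return false;
	wdt_requeues++;
	return true;
}
```

With this shape, a reset issued after `wdt_stop()` cannot restart the timer, because the only place that requeues checks the flag first.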
2018-06-01ipmi-watchdog: Don't disable at shutdownWilliam A. Kennington III1-6/+1
The op-build linux kernel has been configured to support the ipmi watchdog. This driver will always handle the watchdog by either leaving it enabled if configured, or by disabling it during module load if no configuration is provided. This increases the coverage of the watchdog during the boot process. The watchdog should no longer be disabled at any point during skiboot execution. Signed-off-by: William A. Kennington III <wak@google.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-06-01ipmi-watchdog: Don't reset the watchdog twiceWilliam A. Kennington III1-4/+0
There is no clarification for why this change was needed, but presumably this is due to a buggy BMC implementation where the Watchdog Set command was processed concurrently or after the initial Watchdog Reset. This inversion would cause the watchdog to stop since the DONT_STOP bit was not set. Since we are now using the DONT_STOP bit during initialization, the watchdog should not be stopped even if an inversion occurs. Signed-off-by: William A. Kennington III <wak@google.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-06-01ipmi-watchdog: Make it possible to set DONT_STOPWilliam A. Kennington III1-6/+8
The IPMI standard supports setting a DONT_STOP bit during a Watchdog Set operation. Most of the time we don't want to stop the Watchdog when updating the settings, so we should be using this bit. This patch makes it possible for callers of set_wdt to prevent the watchdog from being stopped. This only changes the behavior of the watchdog during the initial settings update when initializing skiboot. The watchdog is no longer disabled and then immediately re-enabled. Signed-off-by: William A. Kennington III <wak@google.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-06-01ipmi-watchdog: WD_POWER_CYCLE_ACTION -> WD_RESET_ACTIONWilliam A. Kennington III1-3/+3
The IPMI specification denotes that action 0x1 is Host Reset and 0x3 is Host Power Cycle. Use the correct name for Reset in our watchdog code. Signed-off-by: William A. Kennington III <wak@google.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
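The action codes and the DONT_STOP bit from the two commits above fit together as in the following sketch of a Set Watchdog Timer request body. The macro and function names are hypothetical (they are not the identifiers used in skiboot); the byte layout follows the IPMI Set Watchdog Timer command, with the "don't stop" flag in bit 6 of the timer-use byte and the countdown in 100 ms units, LSB first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values follow the IPMI Set Watchdog Timer layout. */
#define WDT_USE_DONT_STOP    0x40  /* byte 1, bit 6: keep a running timer running */
#define WDT_ACTION_NONE      0x00
#define WDT_ACTION_RESET     0x01  /* host (hard) reset */
#define WDT_ACTION_PWR_DOWN  0x02
#define WDT_ACTION_PWR_CYCLE 0x03

/* Fill the 6-byte request body. */
void wdt_build_set_req(uint8_t req[6], uint8_t timer_use, uint8_t action,
		       uint16_t count_100ms, int dont_stop)
{
	assert(timer_use <= 0x07 && action <= 0x07);
	req[0] = timer_use | (dont_stop ? WDT_USE_DONT_STOP : 0);
	req[1] = action;              /* no pre-timeout interrupt */
	req[2] = 0;                   /* pre-timeout interval, in seconds */
	req[3] = 0;                   /* expiration flags to clear: none */
	req[4] = count_100ms & 0xff;  /* countdown, LSB first, 100 ms units */
	req[5] = count_100ms >> 8;
}
```

For example, a 30 second countdown with reset action and DONT_STOP set yields a first byte of `timer_use | 0x40` and countdown bytes `0x2c 0x01`.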
2018-06-01npu2-opencapi: Fix link state to report link downFrederic Barrat1-2/+11
The PHB callback 'get_link_state' is always reporting the link width, irrespective of the link status and even when the link is down. It is causing too much work (and failures) when the PHB is probed during pci init. The fix is to look at the link status first and report the link as down when appropriate. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
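A minimal sketch of the "status first, width second" logic described above. The `link_status` structure, its fields, and the convention that 0 means "link down" are assumptions made for illustration; the real callback reads hardware registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LINK_DOWN 0  /* assumed convention: width 0 means "link down" */

/* Hypothetical snapshot of what the hardware reports for one link. */
struct link_status {
	bool    trained;  /* link training completed */
	uint8_t width;    /* lane count, meaningful only when trained */
};

/* Check the status first; report a width only for a trained link. */
int get_link_state(const struct link_status *st)
{
	assert(st);
	if (!st->trained)
		return LINK_DOWN;
	return st->width;
}
```

Returning "down" early lets the generic PCI probing code skip a link that has nothing behind it instead of trying to scan it.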
2018-06-01npu2-opencapi: Cleanup traces printed during link trainingFrederic Barrat1-39/+41
Now that links may train in parallel, traces shown during training can be all mixed up. So add a prefix to all the traces to clearly identify the chip and link the trace refers to: OCAPI[<chip id>:<link id>]: this is a very useful message The lower-level hardware procedures (npu2-hw-procedures.c) also print traces which would need work. But that code is being reworked to be better integrated with opencapi and nvidia, so leave it alone for now. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
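The prefix format quoted above could be produced by a helper along these lines. This is a hedged sketch: `ocapi_format` is a hypothetical function, the ids are printed in decimal as an assumption, and skiboot itself logs through its own logging macros rather than `snprintf` into a caller buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical formatter that prepends "OCAPI[<chip id>:<link id>]: ". */
int ocapi_format(char *buf, size_t len, unsigned int chip_id,
		 unsigned int link_id, const char *msg)
{
	assert(buf && msg);
	return snprintf(buf, len, "OCAPI[%u:%u]: %s", chip_id, link_id, msg);
}
```

Because every line carries its own chip and link id, interleaved output from links training in parallel can still be attributed to the right link.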
2018-06-01npu2-opencapi: Train links on fundamental resetFrederic Barrat1-113/+266
Reorder our link training steps so that they are executed on fundamental reset instead of during the initial setup. Skiboot always calls a fundamental reset on all the PHBs during pci init. It is done through a state machine, similarly to what is done for 'real' PHBs. This is the first step for a longer term goal to be able to trigger an adapter reset from linux. We'll need the reset callbacks of the PHB to be defined. We have to handle the various delays differently, since a linux thread shouldn't stay stuck waiting in opal for too long. No functional changes. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-06-01npu2-opencapi: Rework adapter resetFrederic Barrat1-20/+51
Rework a bit the code to reset the opencapi adapter: - make clearer which i2c pin is resetting which device - break the reset operation in smaller chunks. This is really to prepare for a future patch. No functional changes. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-06-01npu2-opencapi: Use presence detectionFrederic Barrat1-22/+99
Presence detection is not part of the opencapi specification. So each platform may choose to implement it the way it wants. All current platforms implement it through an i2c device where we can query a pin to know if a device is connected or not. ZZ and Zaius have a similar design and even use the same i2c information and pin numbers. However, presence detection on older ZZ planar (older than v4) doesn't work, so we don't activate it for now, until our lab systems are upgraded and it's better tested. Presence detection on witherspoon is still being worked on. It's shaping up to be quite different, so we may have to revisit the topic in a later patch. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-24SLW: Remove stop1_lite and stop2_liteAkshay Adiga1-28/+8
stop1_lite has been removed since it adds no additional benefit over stop0_lite. stop2_lite has been removed since it currently adds minimal benefit over stop2, and that benefit is eclipsed by the time required to ungate the clocks. Moreover, lite states don't give up the SMT resources, so they can potentially have a performance impact on sibling threads. Since current OSs (Linux) aren't smart enough to make good decisions with these stop states, we're (temporarily) removing them from what we expose to the OS, the idea being to bring them back in a new DT representation so that only an OS that knows what to do will do things with them. Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [stewart: add to explanation] Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-24check for NULL input string in is_sai_loc_codeBalbir singh1-2/+5
Caught by scan-build, also constant-ify the input parameter. Signed-off-by: Balbir singh <bsingharora@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-24fsp/console: Always establish OPAL console API backendBenjamin Herrenschmidt1-2/+3
Currently we only call set_opal_console() to establish the backend used by the OPAL console API if we find at least one FSP serial port in HDAT. On systems where there is none (IPMI only), we fail to set it, so the console code tries to use the dummy console, which causes an assertion failure during boot due to clashing device-tree node names. So always set it if an FSP is present. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-24cpu: Do an isync after setting LPCRBenjamin Herrenschmidt1-0/+2
This is required by the architecture and the implementations, I've observed failures to wake up on big cores without this. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-22p8-i2c: Remove force resetOliver O'Halloran1-135/+38
Force reset was added as an attempt to work around some issues with TPM devices locking up their I2C bus. In that particular case the problem was that the device would hold the SCL line down permanently due to a device firmware bug. The force reset doesn't actually do anything to alleviate the situation here, it just happens to reset the internal master state enough to make the I2C driver appear to work until something tries to access the bus again. On P9 systems with secure boot enabled there is the added problem of the "diagnostic mode" not being supported on I2C masters A, B, C and D. Diagnostic mode allows the SCL and SDA lines to be driven directly by software. Without this, force reset is impossible to implement. This patch removes the force reset functionality entirely since: a) it doesn't do what it's supposed to, and b) it's butt ugly code. Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port(). There's no need to reset every port on a master in response to an error that occurred on a specific port. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-22p8-i2c: Allow a per-port default timeoutOliver O'Halloran1-7/+13
Add support for setting a default timeout for the I2C port to the device-tree. This is consumed by skiboot. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-22capi: Add a comment for the Transport Control RegisterChristophe Lombard1-1/+5
The transport control register needs to be loaded in two steps: once the register values have been set, we have to write bit 63 to a '1', which loads the register values into the ci store buffer logic. Bit 63 always reads back as a zero, but to load the ci store buffer values in capp the transition from 0 to 1 of bit 63 must be seen. A new comment is added in the code to avoid confusion and to clarify the behavior of this register. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
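The two-step load can be modeled in software as follows. This is a sketch under assumptions: `tcr_model`, `tcr_write` and `tcr_load` are hypothetical, the register is simulated in memory (real code would go through XSCOM accesses), and bit 63 is taken in IBM numbering, where bit 0 is the most significant bit of the 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB, bit 63 the LSB. */
#define PPC_BIT(n) (1ull << (63 - (n)))

/* Hypothetical software model of the register. */
struct tcr_model {
	uint64_t value;   /* last value written; bit 63 reads back as zero */
	uint64_t loaded;  /* value latched into the CI store buffer logic */
};

static void tcr_write(struct tcr_model *r, uint64_t v)
{
	/* Bit 63 is never stored, so writing it as 1 is a 0 -> 1
	 * transition, which latches the other bits. */
	if (v & PPC_BIT(63))
		r->loaded = v & ~PPC_BIT(63);
	r->value = v & ~PPC_BIT(63);
}

/* Step 1: set the values. Step 2: write them again with bit 63 set. */
void tcr_load(struct tcr_model *r, uint64_t settings)
{
	assert((settings & PPC_BIT(63)) == 0);
	tcr_write(r, settings);
	tcr_write(r, settings | PPC_BIT(63));
}
```

In this model a plain write of the settings leaves `loaded` unchanged; only the second write, with bit 63 set, makes them take effect.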
2018-05-11phb4: Print WOF registers on fence detectRussell Currey1-1/+7
Without the WOF registers it's hard to figure out what went wrong first, so print those when we print the FIRs when a fence is detected. Suggested-by: Mike Perez <perezma@us.ibm.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-09Update default stop-state-disable mask to cut only stop11Vaidyanathan Srinivasan1-1/+1
Stability improvements in microcode for stop4/stop5 are available in upstream hcode images. Stop4 and stop5 can be safely enabled by default. Use ~0xE0000000 to cut all but stop0,1,2 in case there are any issues with stop4/5. example: nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
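The mask arithmetic in this message can be checked with a short sketch. It assumes that stop level N corresponds to bit N counted from the most significant bit of a 32-bit mask, which is what `~0xE0000000` meaning "all but stop0,1,2" implies; `STOP_LEVEL_BIT` and `stop_level_disabled` are hypothetical names, not skiboot identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: stop level N is bit N counted from the MSB. */
#define STOP_LEVEL_BIT(n) (UINT32_C(1) << (31 - (n)))

/* True if the disable mask cuts the given stop level. */
bool stop_level_disabled(uint32_t disable_mask, unsigned int level)
{
	assert(level < 32);
	return (disable_mask & STOP_LEVEL_BIT(level)) != 0;
}
```

Under that assumption, `~0xE0000000` equals `0x1FFFFFFF` (the value passed to `nvram` in the example), and a mask that cuts only stop11 would have the single bit `0x00100000` set.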
2018-05-09ipmi: Add BMC firmware version to device treeVasant Hegde2-1/+113
The BMC Get Device ID command gives BMC firmware version details. Let's add this to the device tree. User space tools will use this information to display BMC version details. Stewart, I have added the BMC information under the /ibm,firmware-version node as it is a firmware version. But maybe we should add a new node (/bmc/firmware), so that we can keep BMC related information separately. Let me know your thoughts on this. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-06occ: Use major version number while checking the pstate table formatShilpasri G Bhat1-24/+23
The minor version increments of the pstate table are backward compatible. The minor version is changed when the pstate table remains same and the existing reserved bytes are used for pointing new data. So use only major version number while parsing the pstate table. This will allow old skiboot to parse the pstate table and handle minor version updates. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-04fsp: Fix msg vaargs usageJoel Stanley1-2/+2
hw/fsp/fsp.c:1011:17: warning: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior [-Wvarargs] va_start(list, add_words); ^ hw/fsp/fsp.c:1007:59: note: parameter of type 'u8' (aka 'unsigned char') is declared here void fsp_fillmsg(struct fsp_msg *msg, u32 cmd_sub_mod, u8 add_words, ...) ^ [CC] platforms/ibm-fsp/apollo-pci.o hw/fsp/fsp.c:1026:17: warning: passing an object that undergoes default argument promotion to 'va_start' has undefined behavior [-Wvarargs] va_start(list, add_words); ^ hw/fsp/fsp.c:1016:47: note: parameter of type 'u8' (aka 'unsigned char') is declared here struct fsp_msg *fsp_mkmsg(u32 cmd_sub_mod, u8 add_words, ...) Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
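The warning arises because a `u8` parameter undergoes default argument promotion, and passing such a parameter as the second argument of `va_start` is undefined behavior. One way to avoid it is to give the last named parameter a type that is not promoted, as in this sketch; `sum_words` is a hypothetical function written for illustration, not the FSP code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>

/* The last named parameter is 'unsigned int', which default argument
 * promotion leaves unchanged, so va_start(list, add_words) is well
 * defined. With a 'uint8_t add_words' parameter it would not be. */
uint32_t sum_words(unsigned int add_words, ...)
{
	va_list list;
	uint32_t sum = 0;
	unsigned int i;

	assert(add_words <= 16);
	va_start(list, add_words);
	for (i = 0; i < add_words; i++)
		sum += va_arg(list, uint32_t);
	va_end(list);
	return sum;
}
```

Callers are unaffected: a small integer constant passed for `add_words` converts to `unsigned int` exactly as it would have converted to `u8`.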
2018-05-04imc: Remove extra parentheses in testJoel Stanley1-1/+1
These make clang angry: hw/imc.c:690:29: warning: equality comparison with extraneous parentheses [-Wparentheses-equality] if ((wakeup_engine_state == WAKEUP_ENGINE_PRESENT)) { ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ hw/imc.c:690:29: note: remove extraneous parentheses around the comparison to silence this warning if ((wakeup_engine_state == WAKEUP_ENGINE_PRESENT)) { ~ ^ ~ Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-02SLW: Fix mambo boot to use stop statesAnton Blanchard1-0/+1
After commit 35c66b8ce5a2 ("SLW: Move MAMBO simulator checks to slw_init"), mambo boot no longer calls add_cpu_idle_state_properties() and as such we never enable stop states. After adding the call back, we get more testing coverage as well as faster mambo SMT boots. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-02phb4: Hardware init updatesRussell Currey1-3/+3
CFG Write Request Timeout was incorrectly set to informational and not fatal for both non-CAPI and CAPI, so set it to fatal. This was a mistake in the specification. Correcting this fixes a niche bug in escalation (which is necessary on pre-DD2.2) that can cause a checkstop due to a NCU timeout. In addition, set the values in the timeout control registers to match. This fixes an extremely rare and unreproducible bug, though the current timings don't make sense since they're higher than the NCU timeout (16) which will checkstop the machine anyway. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # CAPI Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-05-01SLW: quieten 'Configuring self-restore' for DARN,NCU_SPEC_BAR and HRMORStewart Smith3-3/+3
Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-30SBE: Add timer supportVasant Hegde1-2/+183
SBE on P9 provides a one-shot programmable timer facility. We can use this to implement OPAL timers and hence limit the reliance on the Linux heartbeat (similar to the HW timer facility provided by SLW on P8). Design: - We will continue to run the Linux heartbeat. - Each chip has an SBE. This patch always schedules the timer on the SBE of the master chip. - The start timer option starts a new timer or modifies an active timer for the specified timeout. - SBE expects the timeout value in microseconds. We track the timeout value in TB. Hence we convert TB to microseconds before sending the request to SBE. - We request an ack from SBE for the timer message. It guarantees that SBE has scheduled the timer. - Disabling the SBE timer: we expect SBE to send a timer expiry interrupt whenever the timer expires. We wait for 10 more ms before disabling the timer. In future we can consider the alternative approaches below: - Presently SBE timer disable is permanent (until we reboot the system). SBE sends an "I'm back" interrupt after reset. We can consider restarting the timer after SBE reset. - Reset SBE and start the timer again. - Each chip has an SBE. On a multi-chip system we can try to schedule the timer on a different chip. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
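The timebase-to-microseconds conversion mentioned in the design notes might look like the sketch below. It assumes a 512 MHz timebase (512 ticks per microsecond), and rounding up is a design choice made here so that a timer never fires early; the commit message does not say how the real code rounds, and `tb_to_usecs_roundup` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 512 MHz timebase, i.e. 512 ticks per microsecond. */
#define TB_HZ             512000000ull
#define TB_TICKS_PER_USEC (TB_HZ / 1000000ull)

/* Convert a relative timeout in timebase ticks to whole microseconds,
 * rounding up so the requested timer never fires early. */
uint64_t tb_to_usecs_roundup(uint64_t tb_ticks)
{
	assert(tb_ticks <= UINT64_MAX - (TB_TICKS_PER_USEC - 1));
	return (tb_ticks + TB_TICKS_PER_USEC - 1) / TB_TICKS_PER_USEC;
}
```

For instance, 512 ticks map to 1 microsecond and 513 ticks map to 2 microseconds under these assumptions.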
2018-04-30Move P8 timer code to separate fileVasant Hegde3-184/+206
Let's move the P8 timer support code from slw.c to sbe-p8.c (as suggested by BenH). There is a difference between timer support in P8 and P9. Hence I think it makes sense to name it sbe-p8.c. Note that this is pure code movement and renaming of functions/variables. No functionality changes. Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-30Add SBE driver supportVasant Hegde2-19/+679
SBE (Self Boot Engine) on P9 has two different jobs: - Boot the chip up to the point the core is functional - Provide various services like timer, scom, stash MPIPL, etc., at runtime OPAL can communicate to SBE via a set of data and control registers provided by the PSU block in P9 chip. - Four 8 byte registers for Host to send command packets to SBE - Four 8 byte registers for SBE to send response packets to Host - Two doorbell registers (1 on each side) to alert either party when data is placed in above mentioned data register Protocol constraints: Only one command is accepted in the command buffer until the response for the command is enqueued in the response buffer by SBE. Usage: We will use SBE for various purposes like timer, MPIPL, etc. This patch implements the SBE MBOX spec for OPAL to communicate with SBE. Design consideration: - Each chip has SBE. We need to track SBE messages per chip. Hence added per chip sbe structure and list of messages to that chip - SBE accepts only one command at a time. Hence serialized MBOX commands. - OPAL gets interrupted once SBE sets doorbell register - OPAL has to clear doorbell register after reading response - Every command class has timeout option. Timed out messages are discarded - SBE MBOX commands can be classified into four types : - Those that must be sent to the master only (ex: sending MDST/MDDT info) - Those that must be sent to slaves only (ex: continue MPIPL) - Those that must be sent to all chips (ex: close insecure window) - Those that can be sent to any chip (ex: timer) Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-29uart: fix uart_opal_flush to take console lock over uart_con_flushNicholas Piggin1-2/+8
Cc: Russell Currey <ruscur@russell.cc> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-29xive: fix missing unlock in error pathStewart Smith1-0/+1
Found with sparse and some added lock annotations. CC: stable # 5.10+ Fixes: de82c2e0e Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
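This class of bug, and the usual fix of routing every exit through a single unlock, can be illustrated with a toy example. Everything here is hypothetical (`toy_lock`, `lock_acquire`, `lock_release`, `update_entry`); it is not the xive code and does not use skiboot's lock API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock, only to make the acquire/release pairing visible. */
struct toy_lock { bool held; };

static void lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Every exit, including the error path, goes through 'out'. */
int update_entry(struct toy_lock *l, int *entry, int value)
{
	int rc = 0;

	lock_acquire(l);
	if (value < 0) {
		rc = -1;  /* error: must not return with the lock held */
		goto out;
	}
	*entry = value;
out:
	lock_release(l);
	return rc;
}
```

A bare `return -1;` in the error branch would leave the lock held, which is the kind of imbalance that lock annotations let sparse detect.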
2018-04-29hw/slw: Don't assert on a unknown chipOliver O'Halloran1-2/+10
For some reason skiboot populates nodes in /cpus/ for the cores on chips that are deconfigured. As a result Linux includes the threads of those cores in its set of possible CPUs in the system and attempts to set the SPR values that should be used when waking a thread from a deep sleep state. However, in the case where we have a deconfigured chip we don't create an xscom node for that chip, and as a result we don't have a proc_chip structure for that chip either. In turn, this results in an assertion failure when calling opal_slw_set_reg() since it expects the chip structure to exist. Fix this up and print an error instead. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
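The change in behavior, returning an error where an assertion used to fire, can be sketched as follows. The lookup table, `find_chip`, `set_reg` and the return codes are placeholders for illustration; they are not the real skiboot functions or OPAL return values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define RC_SUCCESS    0
#define RC_PARAMETER (-1)  /* placeholder error code */

struct chip { unsigned int id; };

/* Hypothetical table: chip 8 is deconfigured, so it has no entry. */
static struct chip chips[] = { { 0 }, { 16 } };

static struct chip *find_chip(unsigned int id)
{
	size_t i;

	for (i = 0; i < sizeof(chips) / sizeof(chips[0]); i++)
		if (chips[i].id == id)
			return &chips[i];
	return NULL;
}

int set_reg(unsigned int chip_id)
{
	struct chip *c = find_chip(chip_id);

	if (!c) {
		/* An assert(c) here would take the whole system down. */
		fprintf(stderr, "SLW: chip %u not found\n", chip_id);
		return RC_PARAMETER;
	}
	assert(c->id == chip_id);
	return RC_SUCCESS;
}
```

The caller (here, the OS asking for an SPR value to be set) gets an error it can handle, rather than firmware aborting on input it does not fully control.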
2018-04-23npu2: Use ibm, loc-code rather than ibm, slot-labelOliver O'Halloran1-13/+7
The ibm,slot-label property is to name the slot that appears under a PCIe bridge. In the past we (ab)used the slot tables to attach names to GPU devices and their corresponding NVLinks which resulted in npu2.c using slot-label as a location code rather than as a way to name slots. Fix this up since it's confusing. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-23npu2/hw-procedures: fence bricks on GPU resetBalbir Singh1-7/+45
The NPU workbook defines a way of fencing a brick and getting the brick out of fence state. We do have an implementation of bringing the brick out of fenced/quiesced state. We do the latter in our procedures, but to support run time reset we need to do the former. The fencing ensures that access to memory behind the links will not lead to HMIs, but instead SUEs will be populated in cache (in the case of speculation). The expectation is then that prior to and after reset, the operating system components will flush the cache for the region of memory behind the GPU. This patch does the following: 1. Implements a npu2_dev_fence_brick() function to set/clear fence state 2. Clears FIR bits prior to clearing the fence status 3. Clears the fence status 4. Takes the powerbus out of CQ fence much later now, in credits_check(), which is the last hardware procedure called after link training. Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-By: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-19hw/npu2.c: Remove static configuration of NPU2 registerAlistair Popple1-12/+12
The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to select NVLink mode, however Hostboot should configure other bits in this register. For some reason Skiboot was explicitly clearing bit-6 (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared as recent Hostboot versions explicitly set it to the correct value based on the specific system configuration. Therefore Skiboot should not alter it. Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or not. Hostboot does not configure this bit so Skiboot should continue to configure it. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-19npu2: Improve log output of GPU-to-link mappingReza Arbab1-4/+4
Debugging issues related to unconnected NVLinks can be a little less irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog(). In short, change this: NPU2: comparing GPU 'GPU2' and NPU2 'GPU1' NPU2: comparing GPU 'GPU3' and NPU2 'GPU1' NPU2: comparing GPU 'GPU4' and NPU2 'GPU1' NPU2: comparing GPU 'GPU5' and NPU2 'GPU1' : npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem. to this: NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1' NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1' NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1' NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1' : NPU6:0:1.0 No PCI device found for slot 'GPU1' Signed-off-by: Reza Arbab <arbab@linux.ibm.com> Reviewed-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-19sensors: Dont add DTS sensors when OCC inband sensors are availableShilpasri G Bhat1-6/+8
There are two sets of core temperature sensors today. One is the DTS scom based core temperature sensors and the second group is the sensors provided by OCC. DTS is the highest temperature among the different temperature zones in the core, while OCC core temperature sensors are the average temperature of the core. DTS sensors are read directly by the host by SCOMing the DTS sensors, while OCC sensors are read and updated by OCC to main memory. Reading DTS sensors by SCOMing is a heavier and slower operation than reading OCC sensors, which is as good as reading memory. So don't add DTS sensors when OCC sensors are available. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Generate hmi event for recovered HDEC parity error.Mahesh Salgaonkar1-4/+7
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Stop flooding HMI event for TOD errors.Mahesh Salgaonkar1-2/+5
Fix the issue where every thread on the chip sends an HMI event to the host for TOD errors. TOD errors are reported to all the cores/threads on the chip. Any one thread can fix the error and send the event. The rest of the threads don't need to send an HMI event. This patch fixes this by modifying the __chiptod_recover_tod_errors() function to return -1 if no errors are found. Without this change every thread that sees TFMR[51]=1 sends an HMI event to the host kernel. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Fix soft lockups during TOD errorsMahesh Salgaonkar1-2/+12
There are some TOD errors which do not affect the working of TOD and TB. They stay in a valid state. Hence we don't need a rendez-vous for TOD errors that do not affect TB working. TOD errors that affect TOD/TB will report a global error on TFMR[44] along with bit 51, and they will go in the rendez-vous path as expected. But a TOD error that does not affect the TB register sets only TFMR bit 51. TFMR bit 51 is cleared when any single thread clears the TOD error. Once cleared, bit 51 is reflected to all the cores on that chip. Any thread that reads the TFMR register after the error is cleared will see TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through the rendez-vous path and the threads that see TFMR[51]=0 return doing nothing. This ends up in soft lockups in the host kernel. This patch fixes this issue by not considering the TOD interrupt (TFMR[51]) as a core-global error and hence avoiding the rendez-vous path completely. Instead, threads that see TFMR[51]=1 will now take a different path that just does the TOD error recovery. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Do not send HMI event if no errors are found.Mahesh Salgaonkar1-1/+5
For TOD errors, all the cores in the chip get HMIs. Any one thread from any core can fix the issue and TFMR will have the error conditions cleared. The rest of the threads need not take any action if the TOD errors are already cleared. Hence thread 0 of every core should get a fresh copy of TFMR before going ahead with the recovery path. Initialize recover = -1, so that if no errors are found that thread need not send an HMI event to linux. This helps to stop flooding the host with HMI events from every thread even when no errors are found. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-17opal/hmi: Rework HMI handling of TFAC errorsBenjamin Herrenschmidt1-75/+43
This patch reworks the HMI handling for TFAC errors by introducing 4 rendez-vous points to improve the thread synchronization while handling timebase errors that require all threads to clear dirty data from the TB/HDEC registers before clearing the errors. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11phb4: Restore bus numbers after CRSMichael Neuling1-12/+1
Currently we restore PCIe bus numbers right after the link is up. Unfortunately at this point we haven't done CRS, so config space may not be accessible. This moves the bus number restore until after CRS has happened. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11phb4: Enable the PCIe slotcap on pluggable slotsOliver O'Halloran1-0/+20
Enables reporting of slot status information, etc in the config space of the root complex. Currently this is only used to set the slot power limit in our generic PCI code, but we might use it for other things later on. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11interrupts: Create an "interrupts" property in the OPAL nodeBenjamin Herrenschmidt8-10/+10
Deprecate the old "opal-interrupts", it's still there, but the new property follows the standard and allow us to specify whether an interrupt is level or edge sensitive. Similarly create "interrupt-names" whose content is identical to "opal-interrupts-names". Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11capi: Keep the current mmio windows in the mbt cache table.Christophe Lombard1-32/+36
When the phb is used as a CAPI interface, the current mmio windows list is cleaned before adding the capi and the prefetchable memory (M64) windows, which implies that the non-prefetchable BAR is no longer configured. This patch sets only the mbt bar to pass the capi mmio window and keeps the other mmio values (M32 and M64) as defined. Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11npu2-opencapi: Fix 'link internal error' FIR, take 2Frederic Barrat1-4/+25
When setting up an opencapi link, we set the transport muxes first, then set the PHY training config register, which includes disabling nvlink mode for the bricks. That's the order of the init sequence, as found in the NPU workbook. In reality, doing so works, but it raises 2 FIR bits in the PowerBus OLL FIR Register for the 2 links when we configure the transport muxes. Presumably because nvlink is not disabled yet and we are configuring the transport muxes for opencapi. bit 60: link0 internal error bit 61: link1 internal error Overall the current setup ends up being correct and everything works, but we raise 2 FIR bits. So tweak the order of operations to disable nvlink before configuring the transport muxes. Incidentally, this is what the scripts from the opencapi enablement team were doing all along. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11npu2-opencapi: Fix 'link internal error' FIR, take 1Frederic Barrat1-3/+17
When we setup a link, we always enable ODL0 and ODL1 at the same time in the PHY training config register, even though we are setting up only one OTL/ODL, so it raises a "link internal error" FIR bit in the PowerBus OLL FIR Register for the second link. The error is harmless, as we'll eventually setup the second link, but there's no reason to raise that FIR bit. The fix is simply to only enable the ODL we are using for the link. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11occ: sensors-groups: Add DT properties to mark HWMON sensor groupsShilpasri G Bhat1-3/+14
Fix the sensor type to match HWMON sensor types. Add compatible flag to indicate the environmental sensor groups so that operations on these groups can be handled by HWMON linux interface. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.ibm.com>
2018-04-11Disable stop states from OPALStewart Smith1-0/+23
On ZZ, stop4,5,11 are enabled for PHYP, even though doing so may cause problems with OPAL due to bugs in hcode. For other platforms, this isn't so much of an issue as we can just control stop states by the MRW. However the rebuild-the-world approach to changing values there is a bit annoying if you just want to rule out a specific stop state from being problematic. Provide an nvram option to override what's disabled in OPAL. The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2). You can set an NVRAM override with: nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFFF This nvram override will disable *all* stop states. Signed-off-by: Stewart Smith <stewart@linux.ibm.com>