Age | Commit message | Author | Files | Lines
2017-03-16 | hw/fsp: Do not queue SP and SPCN class messages during reset/reload [skiboot-5.3.x] | Ananth N Mavinakayanahalli | 4 | -0/+32
During FSP reset/reload (R/R), the FSP is inaccessible and will lose state. Messages to the FSP are generally queued for sending later. It does seem like the FSP fails to process any subsequent messages of certain classes (SP info -- ipmi) if it receives queued mbox messages it isn't expecting. In certain other cases (sensors), the FSP driver returns a default code (async completion) even though there is no known bound from the time of this error return to the actual data being available. The kernel driver keeps waiting, leading to soft-lockup on the host side. Mitigate both these (known) cases by returning OPAL_BUSY so the host driver knows to retry later. With this change, the sensors command works fine when the FSP comes back. This version also resolves the remaining IPMI issues. Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4940b8148640c06e139aec8c6d0370af7dd3b184) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-11-24 | Limit number of "Poller recursion detected" errors to display | Stewart Smith | 1 | -1/+5
In some error conditions, we could spiral out of control on this and spend all of our time printing the exact same backtrace. Limit it to 16 times, because 16 is a nice number. Cc: stable Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit b6a729e118f42dae88ebf70a09a7e2aa4f788fdc) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
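The limit described above can be sketched as a simple counter. This is a minimal illustration, not the code from the commit: the macro and function names are hypothetical, and only the cap of 16 comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical name; the value 16 is taken from the commit message. */
#define MAX_RECURSION_WARNINGS 16

/* Number of warnings emitted so far (hypothetical bookkeeping variable). */
static int recursion_warnings_printed;

/*
 * Returns true while the warning may still be printed and counts the
 * print; returns false once the cap has been reached.
 */
bool should_print_recursion_warning(void)
{
    if (recursion_warnings_printed >= MAX_RECURSION_WARNINGS)
        return false;
    recursion_warnings_printed++;
    return true;
}
```

A caller would wrap the backtrace print in `if (should_print_recursion_warning())`, so a runaway error path produces at most 16 copies of the same message.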
2016-11-24 | fsp: Don't recurse pollers in ibm_fsp_terminate | Stewart Smith | 3 | -1/+45
If we were to terminate in a poller, we'd call op_display() which called pollers which hit the recursive poller warning, which ended in not much fun at all. This patch will skip the running of pollers and instead run the FSP poller to set the op-panel display before attn. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 9fcb109218b1374a8caa3cac62e83fbedb1f7f2f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-24 | pci: Check power state before powering off slot | Gavin Shan | 1 | -1/+17
I made an inappropriate assumption: that a PCI slot's power state is always on from the beginning. We don't check the slot's power state before turning it off in the PCI enumeration path when there are no PCI adapters behind the slot. The PCI slot's power might already have been turned off, and we needn't power it off again. Otherwise, the below (not harmful) message is raised:

[ 47.243635711,5] SkiBoot skiboot-5.4.0-rc1 starting...
:
[ 13.239871630,5] PHB#0001:02:01.0 Error -1 powering off slot

This checks the power state and avoids turning the slot off again if it's already in the off state. Flag PCI_SLOT_FLAG_BOOTUP is also removed after the requested operation is completed, as the flag should be used at the skiboot booting stage.

Cc: stable # 5.3.0+
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 1408f6f9baa684f280dfb2c4a66daa4d5db996b2)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
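The check-before-power-off idea can be sketched as follows. The struct and function here are hypothetical stand-ins, not skiboot's real PCI slot API; the sketch only shows that an already-off slot is left alone instead of producing an error.

```c
#include <stdbool.h>

/* Hypothetical, simplified slot model; not skiboot's struct pci_slot. */
struct sketch_slot {
    bool powered_on;       /* current power state */
    int  power_off_calls;  /* times the hardware was asked to power off */
};

/* Power off the slot only if it is currently on. Returns 0 on success. */
int sketch_slot_power_off(struct sketch_slot *slot)
{
    if (!slot->powered_on)
        return 0;               /* already off: nothing to do, no error */

    slot->power_off_calls++;    /* stands in for the real hardware request */
    slot->powered_on = false;
    return 0;
}
```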
2016-10-17 | hw/phb3: Override root slot's prepare_link_change() with PHB's | Gavin Shan | 1 | -0/+8
For a PCI slot behind a root port, its prepare_link_change() should be the same as the PHB's. Otherwise, the UTL events cannot be masked when the slot is reset, leading to an EEH error because of the UTL link-down event. Cc: stable # 5.3.0+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 2e19de4c2f4b47f0259ba265ea7d1786f1b44cd7) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-17 | core/pci: Reserve PCI buses for RC's slot | Gavin Shan | 1 | -2/+21
When RC's downstream link is down, we need to reserve spare PCI buses if it has an associated PCI hotplug slot. Otherwise, the adapter behind it can't be probed successfully in the PCI hot add scenario. This reserves all available buses (to 255) for RC's hotplug slot when its downstream link is down so that a PCI adapter can be hot added to the slot afterwards. Cc: stable # 5.3.0+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 481ad7330e332770b1dcd2c9f56d0a2caac67755) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-17 | core/pci: Get PCI slot before applying quirk | Gavin Shan | 1 | -12/+10
We might need to know the associated PCI slot before applying the chip level quirk (phb->ops->device_init()) so that special configuration on the specific PCI slot can be applied. This moves the logic of creating the PCI slot, applying the quirk and linking the newly probed device to the parent's child list to function pci_scan_one(). Also, the PCI slot is created prior to applying the quirk. Cc: stable # 5.3.0+ Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 39aad95618fea977464bfc38ec0c190075a26304) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-12 | skiboot-5.3.7 release notes [skiboot-5.3.7] | Stewart Smith | 1 | -0/+75
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-10-12 | pci: Avoid hot resets at boot time | Russell Currey | 4 | -53/+10
In the PCI post-fundamental reset code, a hot reset is performed at the end. This is causing issues at boot time as a reset signal is being sent downstream before the links are up, which is causing issues on adapters behind switches. No errors result in skiboot, but the adapters are not usable in Linux as a result. Hot resets also occur in the FSP platform-specific code for conventional PCI slots, which could cause issues. This patch fixes some adapters not being configurable in Linux on some systems. The issue was not present in skiboot 5.2.x. Cc: stable # 5.3.x Signed-off-by: Russell Currey <ruscur@russell.cc> Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 5ac71c9b9f7a6c7dc909bdcf121f8d1d11a10dc2) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-28 | occ/prd/opal-prd: Queue OCC_RESET event message to host in OpenPOWER | Shilpasri G Bhat | 5 | -23/+45
During an OCC reset cycle the system is forced to Psafe pstate. When OCC becomes active, the system has to be restored to its last pstate as requested by host. So host needs to be notified of OCC_RESET event or else system will continue to remain in Psafe state until host requests a new pstate after the OCC reset cycle. This patch defines 'OPAL_PRD_MSG_TYPE_OCC_RESET_NOTIFY' to notify OPAL when opal-prd issues OCC reset. OPAL will queue OCC_RESET message to host when it receives opal_prd_msg of type '*_OCC_RESET_NOTIFY'. Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit b5e54375bdc424eb2e709d41d2306d854f7e07bb) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-28 | platforms/firenze: Fix clock frequency dt property | Vasant Hegde | 1 | -6/+5
Commit 5cda6f6d added an 8-byte property instead of a 4-byte one, which resulted in the call trace below. I think it's fine to convert u64 to u32 here as we divide the bus frequency by 4.

Backtrace:
----------
[ 1.212366090,3] DT: Unexpected property length /xscom@3fc0000000000/i2cm@a0020/clock-frequency
[ 1.212369108,3] DT: Expected len: 4 got len: 8
[ 1.212370117,0] Assert fail: core/device.c:603:0
[ 1.212371550,0] Aborting!
CPU 0870 Backtrace:
 S: 0000000033dc39e0 R: 0000000030013758 .backtrace+0x24
 S: 0000000033dc3a60 R: 0000000030018e0c ._abort+0x4c
 S: 0000000033dc3ae0 R: 0000000030018e88 .assert_fail+0x34
 S: 0000000033dc3b60 R: 0000000030023da4 .dt_require_property+0xb4
 S: 0000000033dc3bf0 R: 000000003002403c .dt_prop_get_u32+0x14
 S: 0000000033dc3c60 R: 000000003004e884 .p8_i2c_init+0x12c
 S: 0000000033dc3e30 R: 0000000030014684 .main_cpu_entry+0x4a8
 S: 0000000033dc3f00 R: 00000000300025a0 boot_entry+0x198

Fixes: 5cda6f6d (platforms/firenze: Fix I2C clock source frequency)
Fixes: 5acf424a (HDAT: Fix typo in nest-frequency property)
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
CC: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 94125da004650df0133e7dbdbd8c3833c53b4902)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
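The u64-to-u32 narrowing mentioned above can be sketched with an explicit range check. This is an illustration under assumptions: the function name is hypothetical, and the real patch may simply cast without checking. Only the divide-by-4 and the 4-byte property width come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Derive a 32-bit clock value from a 64-bit bus frequency by dividing
 * by 4 (as the commit message describes). Returns false if the result
 * would not fit in the 4-byte device-tree cell, true otherwise.
 * The function name is hypothetical.
 */
bool sketch_i2c_clock_hz(uint64_t bus_hz, uint32_t *clock_hz)
{
    uint64_t clk = bus_hz / 4;

    if (clk > UINT32_MAX)
        return false;           /* would be truncated by the cast */

    *clock_hz = (uint32_t)clk;
    return true;
}
```

For example, a 2 GHz bus frequency yields 500 MHz, which fits comfortably in 32 bits.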
2016-09-27 | HDAT: Fix typo in nest-frequency property | Vasant Hegde | 3 | -3/+9
nest-frquency -> nest-frequency Fixes: 5cda6f6d (platforms/firenze: Fix I2C clock source frequency) CC: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 5acf424ad6e1376f0513262b5f9ffdd00e83a94a) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-22 | gard/pflash: Honor linker flags passed from the environment | Frederic Bonnard | 2 | -2/+2
Let's use LDFLAGS from the environment for gard and pflash. Debian/Ubuntu use this mechanism to do hardened builds. Signed-off-by: Frederic Bonnard <frediz@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: also use LDFLAGS for pflash] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 011c0acbce0e15abb55482a67a56d4188cef1147) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-22 | external: Utilize DESTDIR in shared makefile | Patrick Williams | 1 | -6/+6
Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 03b21dc7a1cf8da546250a99c76c2cbb34d2da39) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-22 | gard: Fix Makefile race condition | Patrick Williams | 1 | -1/+5
Commit fd599965 added some dependencies in 'external/pflash' for libflash files that are created via symlink. Replicate that same behavior in 'external/gard' to prevent race conditions where we attempt to compile files from libflash before they are symlink'd. Signed-off-by: Patrick Williams <patrick@stwcx.xyz> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit cd72d0fbdd7184ee88e3a55b0eda089cf7d8528d) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-22 | core/pci: Fix the power-off timeout in pci_slot_power_off() | Gavin Shan | 1 | -1/+1
The timeout should be 1000ms instead of 1000 ticks while powering off a PCI slot in pci_slot_power_off(). Otherwise, it's likely to hit a timeout powering off the PCI slot, as the skiboot logs below reveal:

[47912590456,5] SkiBoot skiboot-5.3.6 starting...
:
[5399532365,7] PHB#0005:02:11.0 Bus 0f..ff scanning...
[5399540804,7] PHB#0005:02:11.0 No card in slot
[5399576870,5] PHB#0005:02:11.0 Timeout powering off slot
[5401431782,3] FIRENZE-PCI: Wrong state 00000000 on slot 8000000002880005

This replaces time_wait() with time_wait_ms() to resolve the issue.

Fixes: 358b4d654f100cfdfcba939cae012099a851b3bc
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 171726631d014cd5e61170f06028474d900a827e)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
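To see why 1000 ticks is far shorter than 1000ms, here is a small conversion sketch. It assumes a 512 MHz timebase, which is an assumption of this example rather than something the commit states; the macro and function names are hypothetical and are not skiboot's time_wait() helpers.

```c
#include <stdint.h>

/* Assumed timebase frequency for this sketch: 512 MHz. */
#define SKETCH_TB_HZ 512000000ULL

/* Convert milliseconds to timebase ticks under the assumed frequency. */
uint64_t sketch_msecs_to_ticks(uint64_t ms)
{
    return ms * (SKETCH_TB_HZ / 1000);
}

/* Convert timebase ticks to whole microseconds (rounded down). */
uint64_t sketch_ticks_to_usecs(uint64_t ticks)
{
    return ticks / (SKETCH_TB_HZ / 1000000);
}
```

Under that assumption, 1000ms corresponds to 512,000,000 ticks, whereas 1000 ticks is only about 2 microseconds, so passing a millisecond count where ticks are expected makes the wait expire almost immediately.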
2016-09-17 | Merge tag 'skiboot-5.3.6' into skiboot-5.3.x | Stewart Smith | 2 | -1/+17
skiboot-5.3.6
2016-09-17 | skiboot 5.3.6 release notes [skiboot-5.3.6] | Stewart Smith | 1 | -0/+16
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-17 | SLW: Actually print the register dump only to memory | Stewart Smith | 1 | -1/+1
Fixes: 81154ba9b2d418cd5f9eda3a6f89ca6631556510 Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit af569fa2aebd52748f3056fe3104b16991a3f6de) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-16 | opal-prd: Fix error code from scom_read & scom_write | Benjamin Herrenschmidt | 2 | -6/+19
Currently, we always return a zero value from scom_read & scom_write, so the HBRT implementation has no way of detecting errors during scom operations. This change uses the actual return value from the scom operation from the kernel instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 925f03c184f42104a1ebd676f4d3c3d50b8a44b8) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-16 | opal-prd: Add get_interface_capabilities to host interfaces | Jeremy Kerr | 3 | -2/+22
We need a way to indicate behaviour changes & fixes in the prd interface, without requiring a major version bump. This change introduces the get_interface_capabilities callback, returning a bitmask of capability flags, pertaining to 'sets' of capabilities. We currently return 0 for all. Signed-off-by: Jeremy Kerr <jk@ozlabs.org> [stewart@linux.vnet.ibm.com: Dan Crowell says "The interface looks good to me"] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 3d41d2831a05d531f152e0f9c7fc223a82aee128) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-14 | pflash and opal-prd: Fix dist-clean | Breno Leitao | 2 | -1/+2
Currently, the pflash and opal-prd trees do not return to their original state after a 'make distclean'. As I understand it, distclean should return the tree to its original state so that make can restart from scratch. On Debian[1], we need to clear these remaining files 'manually' to make the build 'reproducible' (two builds in sequence). [1] https://packages.debian.org/source/sid/skiboot Signed-off-by: Breno Leitao <breno.leitao@gmail.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit f22edb88cefdfbcb19ce6d296ccc4d8f1fbbc73e) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-14 | hw/npu.c: Fix reserved PE# | Alistair Popple | 1 | -3/+2
Currently the reserved PE is set to NPU_NUM_OF_PES, which is one greater than the maximum PE resulting in the following kernel errors at boot: [ 0.000000] pnv_ioda_reserve_pe: Invalid PE 4 on PHB#4 [ 0.000000] pnv_ioda_reserve_pe: Invalid PE 4 on PHB#5 Due to a HW errata PE#0 is already reserved in the kernel, so update the opal-reserved-pe device-tree property to match this. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4802d470208093139e06b126868b76e1dc0864a1) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-14 | skiboot 5.3.5 release notes [skiboot-5.3.5] | Stewart Smith | 1 | -0/+16
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-14 | centaur: print message on disabling xscoms to centaur due to many errors | Stewart Smith | 1 | -1/+13
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit aa341a3fbf23dc7c74a93ecff8662688a063cb8b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-14 | slw: improve error message for SLW timer stuck | Stewart Smith | 1 | -3/+21
We still register dump, but only to in memory console buffer by default. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 81154ba9b2d418cd5f9eda3a6f89ca6631556510) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-13 | add skiboot-5.3.4 release notes [skiboot-5.3.4] | Stewart Smith | 1 | -0/+22
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-13 | centaur: Mark centaur offline after 10 consecutive access errors | Benjamin Herrenschmidt | 3 | -0/+31
This avoids spamming the logs when the centaur is dead and PRD constantly tries to access it Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 459a7e1f012df02393c0bc0a0885024e044cbafd) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
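A consecutive-error threshold of this kind can be sketched as below. The struct, macro, and function names are hypothetical; only the threshold of 10 consecutive access errors comes from the commit subject, and the reset-on-success behaviour is what "consecutive" is taken to mean here.

```c
#include <stdbool.h>

/* Hypothetical name; the value 10 is taken from the commit subject. */
#define SKETCH_CENTAUR_ERR_THRESHOLD 10

/* Hypothetical per-chip state, not skiboot's real centaur structure. */
struct sketch_centaur {
    int  consecutive_errors;
    bool online;
};

/* Record the outcome of one access and update the online flag. */
void sketch_centaur_record_access(struct sketch_centaur *c, bool ok)
{
    if (ok) {
        c->consecutive_errors = 0;   /* a success breaks the streak */
        return;
    }

    c->consecutive_errors++;
    if (c->consecutive_errors >= SKETCH_CENTAUR_ERR_THRESHOLD)
        c->online = false;           /* stop further access attempts */
}
```

Callers would check `online` before attempting an access, which is what keeps a dead chip from flooding the log with repeated errors.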
2016-09-13 | xscom: Map all HMER status codes to OPAL errors | Benjamin Herrenschmidt | 2 | -3/+20
Instead of mapping them to just 3 different codes, define an OPAL error code for all known HMER error status, as different recovery path might be needed at the call site, and it allows for more informative logging. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit d6a64f99f3c9c39b00d0821cc04dc9a51fe06490) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-13 | xscom: Initialize the data to a known value in xscom_read | Benjamin Herrenschmidt | 1 | -0/+7
In case of error, don't leave the data random. It helps debugging when the user fails to check the error code. This happens due to a bug in the PRD wrapper app. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit be4843f47baa5f1b36c2c6e7ad6bc4743a8bc43f) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-05 | nvlink: Fix bad PE number check | Russell Currey | 1 | -1/+1
NPUs have 4 PEs which are zero indexed, so {0, 1, 2, 3}. A bad PE number check in npu_err_inject checks if the PE number is greater than 4 as a fail case, so it would wrongly perform operations on a non-existent PE 4. Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Cc: stable Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 731d1a8d680bd0bf649cf0c4668a449e299e5f55) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
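The off-by-one can be shown with two small predicates. This is a sketch, not the patched code: the function names are hypothetical, the constant name NPU_NUM_OF_PES appears elsewhere in this log, and its value of 4 is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Valid PE numbers are 0..3, per the commit message. */
#define NPU_NUM_OF_PES 4

/* Buggy form: only rejects values greater than 4, so PE 4 slips through. */
bool sketch_pe_valid_buggy(uint32_t pe)
{
    return !(pe > NPU_NUM_OF_PES);
}

/* Corrected form: rejects everything at or above the PE count. */
bool sketch_pe_valid(uint32_t pe)
{
    return pe < NPU_NUM_OF_PES;
}
```

With zero-indexed PEs the bound has to be a strict less-than against the count, which is the distinction the two predicates make visible.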
2016-09-02 | skiboot 5.3.3 release notes [skiboot-5.3.3] | Stewart Smith | 1 | -0/+19
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-09-02 | hw/npu: assert the NPU irq min is aligned. | Milton Miller | 1 | -1/+3
The hardware enforces the buid range is on a 16 irsn boundary even though there are only 8 irqs. Enforce that here and show where the value comes from when programming the lsi source id field in the npu register block. Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit ac83440c8241902d2c32410050a2fd1e96b20fcf) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
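The 16-source boundary requirement amounts to a power-of-two alignment test. The sketch below uses hypothetical names and is not the assertion added by the patch; only the boundary of 16 comes from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the boundary of 16 is taken from the commit message. */
#define SKETCH_NPU_IRQ_ALIGN 16

/* True if the first interrupt source number sits on a 16-source boundary. */
bool sketch_npu_irq_base_aligned(uint32_t irq_base)
{
    /* For a power-of-two alignment, the low bits must all be zero. */
    return (irq_base & (SKETCH_NPU_IRQ_ALIGN - 1)) == 0;
}
```

In firmware this condition would typically be wrapped in an assert so a misaligned base is caught at initialisation rather than showing up later as misrouted interrupts.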
2016-09-02 | hw/npu: program NPU BUID reg properly | Milton Miller | 2 | -8/+16
The NPU BUID register was incorrectly programmed, resulting in npu interrupt level 0 causing a PB_CENT_CRESP_ADDR_ERROR checkstop, and irqs from npus in odd chips being aliased to and processed as the interrupts from the corresponding npu on the even chips. The documentation for the BUID register is confusing, describing required values of some bits and bits of differing meaning contained within one field. This patch separates the per-irq-level irq enable mask from the documented buid base field, leaving the buid base as the part that is directly compared. It documents the buid as the boundary of a block of 16 sources (in the form of a 4 bit shift), and documents that some bits are sourced from another register and are always compared to that register, so they are not required to be set in the base and mask fields. Fixes: cc61799 Nvlink: Add NPU PHB functions Signed-off-by: Milton Miller <miltonm@us.ibm.com> Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 8f67ee3b7fa573885c2bda34c7934418e12287db) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-26 | Add skiboot-5.3.2 release notes [skiboot-5.3.2] | Stewart Smith | 1 | -0/+28
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-25 | opal/hmi: Fix a TOD HMI failure during a race condition. | Mahesh Salgaonkar | 1 | -0/+7
There are chances where another interrupt can wake a CPU in 0x100 vector just when HMI for TOD error is also pending. In such a rare race condition if CPU has woken up with tb_loss power saving mode, it will invoke opal call to resync the TB. Since TOD is already in error state, resync TB will timeout leaving TFMR bit 18 set to '1'. (TFMR[18]=1 means TB is prepared to receive new value from TOD. Once the new value is received this bit gets reset to '0', otherwise TB would stay in waiting state). When HMI is delivered, it may find all TFMR errors are already cleared but would fail to restore TB since TFMR bit 18 is already set. This leads to HMI recovery failure causing a kernel crash. This patch fixes this by clearing of TB errors if TFMR[18] is set to 1. This makes sure that TB is in clean state before TB restore process starts. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 026b9a13bf8d61a7e72721d59961b40cbc98b410) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-25 | lpc: Log LPC SYNC errors as unrecoverable ones for manufacturing | Vipin K Parashar | 4 | -6/+32
A high volume of SYNC errors on the LPC bus causes degraded system performance and is likely due to bad hardware present on the system. Thus, once LPC SYNC errors cross a certain threshold, OPAL should log them to the BMC as unrecoverable errors in manufacturing mode. This will help manufacturing screen out bad parts causing such errors. Cc: stable Signed-off-by: Vipin K Parashar <vipin@linux.vnet.ibm.com> [stewart@linux.vnet.ibm.com: s/mfg/manufacturing/] Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 51b9eeb66ebbd1706248d8f2277afa9b7dcdbc3b) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-24 | hw/phb3: Update capi initialization sequence | Frederic Barrat | 1 | -1/+2
The capi initialization sequence was revised in a circumvention document when a 'link down' error was converted from fatal to Endpoint Recoverable. Other, non-capi, register setup was corrected even before the initial open-source release of skiboot, but a few capi-related registers were not updated then, so this patch fixes it. The point is that a link-down error detected by the UTL logic will lead to an AIB fence, so that the CAPP unit can detect the error. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit e36f4f219b642c6c5032208fca7191fbd75fe1a3) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | Add skiboot 5.3.1 release notes [skiboot-5.3.1] | Stewart Smith | 1 | -0/+36
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | FSP/ELOG: elog_enable flag should be false by default | Mukesh Ojha | 1 | -3/+1
This issue is a corner case related to a recent change that went upstream. It is only observed at the petitboot prompt, where we see only one error log instead of all the error logs in /sys/firmware/opal/elog.

Below is the snippet of code where the elog module is initialised in the kernel:

    {
        ..
        ...
        rc = request_threaded_irq(irq, NULL, elog_event,        =<=======
                IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL);   |
        if (rc) {                                                       |
            pr_err("%s: Can't request OPAL event irq (%d)\n",           |
                   __func__, rc);                                       |
            return rc;                                                  |
        }                                                               |
        /* We are now ready to pull error logs from opal. */            |
        if (opal_check_token(OPAL_ELOG_RESEND))                         |
            opal_resend_pending_logs();                         =<=======
    }

Scenario: While elog_enabled is true, OPAL_EVENT_ERROR_LOG_AVAIL will be set from OPAL whenever it has error logs that are waiting to be fetched from the kernel.

A race occurs between the code arrowed above. As soon as the kernel registers the error log handler, it sees that OPAL_EVENT_ERROR_LOG_AVAIL is set, so it schedules the handler. The handler makes the 'opal_get_elog_size' call (kernel) on the error log, which sets the state from ELOG_STATE_FETCHED_DATA to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL.

At the same time, the 'opal_resend_pending_logs' call (kernel) sets the state machine from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL. Because of that, the read call from the kernel, which was to be made after 'opal_get_elog_size', ends up failing. But the elog kobject was already created for that particular error log.

Further, in the resend routine in OPAL, we make the opal_commit_elog_in_host() call, which sets OPAL_EVENT_ERROR_LOG_AVAIL. So the kernel again calls 'opal_get_elog_size', which returns the error log info of the same error log that was fetched earlier. It also changes the state machine to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL.

Below is the snippet from the registered elog_event handler:

    {
        ...
        ...
        /* we may get notified twice, let's handle
         * that gracefully and not create two conflicting
         * entries.
         */
        if (kset_find_obj(elog_kset, name))
            return IRQ_HANDLED;
        ...
        ...
    }

In the kernel, we search for a kobject for the error log to see whether it already exists. The kobject is found, so the handler returns without reading the error log data.

So, this patch changes the flag, which was true during initialisation, to false. That solves the race.

Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 218f4ae791c6f66532579d06a0bfe45e56bb3c4e)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | npu: reword "error" to indicate it's actually a warning | Stewart Smith | 1 | -6/+1
Confirmed with Alistair on IRC, and earlier discussions with Russell. Basically, I was a bit of an idiot and didn't think hard enough before adding the FWTS annotation. Without this patch, you get spurious FirmWare Test Suite (FWTS) warnings about NVLink not working on machines that aren't fully populated with GPUs. Fixes: 00e3e275344a42f6a682be72c88c015df87a0e28 Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit a339d4779a6a382c0e197177b0142d62e26a6416) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | Make: Add skiboot.lid.xz to make clean | Vasant Hegde | 1 | -1/+1
Fixes: 5fc07eaa (Produce XZ compressed skiboot.lid as part of build) CC: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 742b7226124b140f129d181c07d23dbe36c18ea3) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | hmi: Clean up NPU FIR debug messages | Russell Currey | 1 | -3/+4
With the skiboot log set to debug, the FIR (and related registers) were logged all in the same message. It was too much for one line, didn't clarify if the numbers were in hex, and didn't show leading zeroes. So, split it into two lines, with leading zeroes and a "0x" prefix. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> (cherry picked from commit 4eabfa056562e144c1a011bf4159387337023659) Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
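The formatting change described above (a "0x" prefix plus leading zeroes) can be sketched with standard C formatting. The helper name and the "name: value" layout are hypothetical; only the prefix and zero-padding come from the commit message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit register value as "<name>: 0x<16 hex digits>".
 * The explicit "0x" prefix shows the base, and the %016 width keeps
 * leading zeroes so every value lines up. Returns snprintf's result.
 */
int sketch_format_reg(char *buf, size_t len, const char *name, uint64_t val)
{
    return snprintf(buf, len, "%s: 0x%016" PRIx64, name, val);
}
```

Formatting the value 0x1234 under the name "FIR" with this helper gives `FIR: 0x0000000000001234`.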
2016-08-09 | asm: Fix backtrace for unexpected exception | Michael Neuling | 1 | -1/+0
If we take an unknown exception at boot time, we attempt to put the exception vector in the back trace. The result looks like this (when we take a 0x700):

 S: 0000000031e838a0 R: 000000003001365c .backtrace+0x38
 S: 0000000031e83930 R: 00000000300186cc ._abort+0x4c
 S: 0000000031e839b0 R: 0000000030023a78 .exception_entry+0x114
 S: 0000000031e83a40 R: 0000000000001f04 * +0x1f04
 S: 0000000031e83c10 R: 0000000000000700 * +0x700
 S: 0000000031e83e30 R: 0000000030014444 .main_cpu_entry+0x444
 S: 0000000031e83f00 R: 000000003000259c boot_entry+0x19c

We overwrite the link address in the current stack frame with the exception vector (i.e. 0x700 in the above example). Unfortunately this overwrites the location that caused the exception, which is much more useful information when debugging the problem.

This patch removes the write to the link register slot in the current stack frame, so the back trace now looks like this:

 S: 0000000031da38a0 R: 000000003001365c .backtrace+0x38
 S: 0000000031da3930 R: 00000000300186cc ._abort+0x4c
 S: 0000000031da39b0 R: 0000000030023a78 .exception_entry+0x114
 S: 0000000031da3a40 R: 0000000000001f00 * +0x1f00
 S: 0000000031da3c10 R: 00000000300323f8 .psi_init+0x1f4
 S: 0000000031da3e30 R: 0000000030014444 .main_cpu_entry+0x444
 S: 0000000031da3f00 R: 000000003000259c boot_entry+0x19c

This loses the exception vector from the back trace, but this information is already available in the exception dump just above it.

Suggestion by benh.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit e8c3f4ce21c24eee58489149769e84315d4d647d)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-02 | log_level: Reduce the in memory console log_level to lower priority | Pridhiviraj Paidipeddi | 3 | -3/+3
Below are the in-memory console log messages observed at error level (PR_ERROR):

[54460318,3] HBRT: Mem region 'ibm,homer-image' not found !
[54465404,3] HBRT: Mem region 'ibm,homer-image' not found !
[54470372,3] HBRT: Mem region 'ibm,homer-image' not found !
[54475369,3] HBRT: Mem region 'ibm,homer-image' not found !
[11540917382,3] NVRAM: Layout appears sane
[11694529822,3] OPAL: Trying a CPU re-init with flags: 0x2
[61291003267,3] OPAL: Trying a CPU re-init with flags: 0x1
[61394005956,3] OPAL: Trying a CPU re-init with flags: 0x2

Lowering the log level of the "mem region not found" messages to PR_WARNING and the remaining messages to PR_INFO level:

[54811683,4] HBRT: Mem region 'ibm,homer-image' not found !
[10923382751,6] NVRAM: Layout appears sane
[55533988976,6] OPAL: Trying a CPU re-init with flags: 0x1

Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
(cherry picked from commit 341daa8104af3231b908e6fcffeedb5e47b33990)
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-02 | Add skiboot-5.3.0 release notes [skiboot-5.3.0] | Stewart Smith | 1 | -0/+16
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-01 | Adopt libtool rules for soname versioning for libflash | Stewart Smith | 1 | -1/+18
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | Merge skiboot 5.2.5 release notes | Stewart Smith | 1 | -0/+37
2016-07-28 | Add skiboot-5.2.5 release notes [skiboot-5.2.5] | Stewart Smith | 1 | -0/+37
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | Merge branch 'skiboot-5.2.x' | Stewart Smith | 0 | -0/+0