Age | Commit message | Author | Files | Lines
2016-08-10 | Add skiboot 5.3.1 release notes [skiboot-5.3.1] | Stewart Smith | 1 | -0/+36
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | FSP/ELOG: elog_enable flag should be false by default | Mukesh Ojha | 1 | -3/+1
    This is a corner case related to a recent change that went upstream. It has only been observed at the petitboot prompt, where we see only one error log instead of all error logs in /sys/firmware/opal/elog.

    Below is a snippet of the kernel code where the elog module is initialised:

        {
        ..
        ...
        rc = request_threaded_irq(irq, NULL, elog_event,                        =<=======
                        IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL);   |
        if (rc) {                                                               |
                pr_err("%s: Can't request OPAL event irq (%d)\n",               |
                       __func__, rc);                                           |
                return rc;                                                      |
        }                                                                       |
        /* We are now ready to pull error logs from opal. */                    |
        if (opal_check_token(OPAL_ELOG_RESEND))                                 |
                opal_resend_pending_logs();                                     =<=======
        }

    Scenario: While elog_enabled is true, OPAL sets OPAL_EVENT_ERROR_LOG_AVAIL whenever it has error logs waiting to be fetched by the kernel. A race occurs within the code marked by the arrows above: as soon as the kernel registers the error log handler, it sees that OPAL_EVENT_ERROR_LOG_AVAIL is set, so it schedules the handler. The handler's 'opal_get_elog_size' (kernel) call on the error log moves the state from ELOG_STATE_FETCHED_DATA to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. At the same time, the 'opal_resend_pending_logs' (kernel) call moves the state machine in OPAL from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE. Because of that, the read call from the kernel, which was to be made after 'opal_get_elog_size', ends up failing, even though the elog kobject has already been created for that error log.

    Further on in the OPAL resend routine, we call opal_commit_elog_in_host(), which sets OPAL_EVENT_ERROR_LOG_AVAIL. So the kernel calls 'opal_get_elog_size' again, which returns the error log info of the same error log that was fetched earlier. It also changes the state machine to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL.

    Below is a snippet from the registered elog_event handler:

        {
        ...
        ...
        /* we may get notified twice, let's handle
         * that gracefully and not create two conflicting
         * entries.
         */
        if (kset_find_obj(elog_kset, name))
                return IRQ_HANDLED;
        ...
        ...
        }

    The kernel checks whether a kobject already exists for the error log. The kobject is found, so the handler returns without reading the error log data.

    This patch therefore changes the flag's initial value from true to false, which resolves the race.

    Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
    Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit 218f4ae791c6f66532579d06a0bfe45e56bb3c4e)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | npu: reword "error" to indicate it's actually a warning | Stewart Smith | 1 | -6/+1
    Confirmed with Alistair on IRC, and earlier discussions with Russell. Basically, I was a bit of an idiot and didn't think hard enough before adding the FWTS annotation.

    Without this patch, you get spurious FirmWare Test Suite (FWTS) warnings about NVLink not working on machines that aren't fully populated with GPUs.

    Fixes: 00e3e275344a42f6a682be72c88c015df87a0e28
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit a339d4779a6a382c0e197177b0142d62e26a6416)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | Make: Add skiboot.lid.xz to make clean | Vasant Hegde | 1 | -1/+1
    Fixes: 5fc07eaa (Produce XZ compressed skiboot.lid as part of build)
    CC: Stewart Smith <stewart@linux.vnet.ibm.com>
    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit 742b7226124b140f129d181c07d23dbe36c18ea3)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-10 | hmi: Clean up NPU FIR debug messages | Russell Currey | 1 | -3/+4
    With the skiboot log set to debug, the FIR (and related registers) were logged all in the same message. It was too much for one line, didn't clarify if the numbers were in hex, and didn't show leading zeroes.

    So, split it into two lines, with leading zeroes and a "0x" prefix.

    Signed-off-by: Russell Currey <ruscur@russell.cc>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit 4eabfa056562e144c1a011bf4159387337023659)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
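The formatting change in the commit above can be sketched roughly as follows. This is only an illustration: the helper name `format_reg` and the sample register name are hypothetical, and the real skiboot code goes through its own logging macros rather than `snprintf`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a 64-bit register as fixed-width hex with an
 * explicit "0x" prefix, so the base and the leading zeroes are unambiguous. */
static int format_reg(char *buf, size_t len, const char *name, uint64_t val)
{
	return snprintf(buf, len, "%s: 0x%016" PRIx64, name, val);
}
```

Printing each register with a fixed width also keeps the values aligned when the output is split over two lines.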
2016-08-09 | asm: Fix backtrace for unexpected exception | Michael Neuling | 1 | -1/+0
    If we take an unknown exception at boot time we attempt to put the exception vector in the back trace. The result looks like this (when we take an 0x700):

        S: 0000000031e838a0 R: 000000003001365c .backtrace+0x38
        S: 0000000031e83930 R: 00000000300186cc ._abort+0x4c
        S: 0000000031e839b0 R: 0000000030023a78 .exception_entry+0x114
        S: 0000000031e83a40 R: 0000000000001f04 * +0x1f04
        S: 0000000031e83c10 R: 0000000000000700 * +0x700
        S: 0000000031e83e30 R: 0000000030014444 .main_cpu_entry+0x444
        S: 0000000031e83f00 R: 000000003000259c boot_entry+0x19c

    We overwrite the link address in the current stack frame with the exception vector (i.e. 0x700 in the above example). Unfortunately this overwrites the location that caused the exception, which is much more useful information when debugging the problem.

    This patch stops writing the link register into the current stack frame, so the back trace now looks like this:

        S: 0000000031da38a0 R: 000000003001365c .backtrace+0x38
        S: 0000000031da3930 R: 00000000300186cc ._abort+0x4c
        S: 0000000031da39b0 R: 0000000030023a78 .exception_entry+0x114
        S: 0000000031da3a40 R: 0000000000001f00 * +0x1f00
        S: 0000000031da3c10 R: 00000000300323f8 .psi_init+0x1f4
        S: 0000000031da3e30 R: 0000000030014444 .main_cpu_entry+0x444
        S: 0000000031da3f00 R: 000000003000259c boot_entry+0x19c

    This loses the exception vector from the back trace, but this information is already available in the exception dump just above it.

    Suggestion by benh.

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Acked-by: Balbir Singh <bsingharora@gmail.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit e8c3f4ce21c24eee58489149769e84315d4d647d)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-02 | log_level: Reduce the in memory console log_level to lower priority | Pridhiviraj Paidipeddi | 3 | -3/+3
    Below are the in-memory console log messages observed at error level (PR_ERROR):

        [54460318,3] HBRT: Mem region 'ibm,homer-image' not found !
        [54465404,3] HBRT: Mem region 'ibm,homer-image' not found !
        [54470372,3] HBRT: Mem region 'ibm,homer-image' not found !
        [54475369,3] HBRT: Mem region 'ibm,homer-image' not found !
        [11540917382,3] NVRAM: Layout appears sane
        [11694529822,3] OPAL: Trying a CPU re-init with flags: 0x2
        [61291003267,3] OPAL: Trying a CPU re-init with flags: 0x1
        [61394005956,3] OPAL: Trying a CPU re-init with flags: 0x2

    Lower the log level of the "mem region not found" messages to PR_WARNING and of the remaining messages to PR_INFO:

        [54811683,4] HBRT: Mem region 'ibm,homer-image' not found !
        [10923382751,6] NVRAM: Layout appears sane
        [55533988976,6] OPAL: Trying a CPU re-init with flags: 0x1

    Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit 341daa8104af3231b908e6fcffeedb5e47b33990)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
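To illustrate why lowering the level matters, here is a minimal sketch of a priority filter. The numeric values 3, 4 and 6 are taken from the log prefixes in the commit above; the function `should_log` and the "lower number means more severe" threshold rule are assumptions made for illustration, not skiboot's actual console code.

```c
#include <stdbool.h>

/* Priority values as seen in the "[timestamp,level]" prefixes above.
 * Assumption: lower numbers are more severe. */
#define PR_ERROR   3
#define PR_WARNING 4
#define PR_INFO    6

/* Hypothetical filter: show a message only if it is at least as severe
 * as the configured threshold. */
static bool should_log(int msg_level, int threshold)
{
	return msg_level <= threshold;
}
```

Under this model, a reader filtering at PR_ERROR no longer sees the informational NVRAM and CPU re-init messages once they are emitted at PR_INFO.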
2016-08-02 | Add skiboot-5.3.0 release notes [skiboot-5.3.0] | Stewart Smith | 1 | -0/+16
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-08-01 | Adopt libtool rules for soname versioning for libflash | Stewart Smith | 1 | -1/+18
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | Merge skiboot 5.2.5 release notes | Stewart Smith | 1 | -0/+37
2016-07-28 | Add skiboot-5.2.5 release notes [skiboot-5.2.5] | Stewart Smith | 1 | -0/+37
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | Merge branch 'skiboot-5.2.x' | Stewart Smith | 0 | -0/+0
2016-07-28 | pflash: Fix the makefile | Joel Stanley | 1 | -9/+17
    Someone was a bit too keen with the cleanups last time. Restore the ability for pflash to build in non-shared mode.

    Fixes: c327eddd9b29 (pflash: Clean up makefiles and resolve build race)
    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit fd599965f723330da5ec55519c20cdb6aa2b3a2d)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | Add skiboot-5.3.0-rc2 release notes [skiboot-5.3.0-rc2] | Stewart Smith | 1 | -0/+41
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | pflash: Fix the makefile | Joel Stanley | 1 | -9/+17
    Someone was a bit too keen with the cleanups last time. Restore the ability for pflash to build in non-shared mode.

    Fixes: c327eddd9b29 (pflash: Clean up makefiles and resolve build race)
    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | core/flash: Fix passing pointer instead of value | Cyril Bur | 1 | -1/+1
    flash_find_subpartition() accepts a pointer to a boolean variable indicating ecc for a region of flash and passes the pointer directly to flash_read_corrected() which actually only wants the value.

    This has always worked probably because there has always been ECC on sub partitions. How there aren't any warnings triggered by this condition escapes me.

    Fixes: 6c26bc7 ("libflash: move ffs_flash_read into libflash")
    Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
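A small stand-alone sketch may help explain why the bug above went unnoticed. In C, a pointer converts implicitly to `bool`, and any non-NULL pointer becomes `true`, so the callee saw "ECC enabled" regardless of the flag's value. The function names below are hypothetical stand-ins, not the real libflash signatures.

```c
#include <stdbool.h>

/* Stand-in for a reader that expects the ECC flag by value. */
static bool reader_sees_ecc(bool ecc)
{
	return ecc;
}

/* Buggy pattern: the pointer itself is converted to bool, so the
 * callee sees true for any non-NULL pointer. */
static bool call_with_pointer(bool *ecc)
{
	return reader_sees_ecc(ecc);
}

/* Fixed pattern: dereference so the callee gets the flag's value. */
static bool call_with_value(bool *ecc)
{
	return reader_sees_ecc(*ecc);
}
```

With a flag that is `false`, `call_with_pointer()` still reports `true`, which matches the commit's observation that things only worked because sub partitions always had ECC.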
2016-07-28 | mambo: Update Radix Tree Size as per ISA 3.0 | Michael Neuling | 1 | -1/+1
    Fix Radix Tree Size (RTS) encoding as per ISA 3.0. This is controlled via a SIM_CTRL1 bit in mambo.

    In Linux we recently changed to this encoding, so we no longer boot. The associated Linux commit is:

        commit b23d9c5b9c83c05e013aa52460f12a8365062cf4
        Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
        Date: Fri Jun 17 11:40:36 2016 +0530

            powerpc/mm/radix: Update Radix tree size as per ISA 3.0

    Signed-off-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | pflash: use atexit for musl compatibility | Joel Stanley | 1 | -3/+2
    I accidentally built myself a cross-toolchain with the musl libc. It does not support on_exit, which we use to clean up in pflash.

    Instead use atexit, which is supported by uclibc, musl and glibc.

    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
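As a minimal sketch of the portable pattern described above: `atexit()` is part of standard C, whereas `on_exit()` is a non-standard extension. The handler and the `register_cleanup` wrapper below are hypothetical; pflash's real cleanup code is not shown in this log.

```c
#include <stdlib.h>

/* Set once the handler has been registered successfully. */
static int cleanup_registered;

/* Hypothetical cleanup handler. Unlike an on_exit() handler, it receives
 * no exit status and no user argument, so any state it needs must be
 * reachable through file-scope variables. */
static void cleanup(void)
{
	/* release resources here */
}

/* atexit() returns 0 on success and non-zero on failure. */
static int register_cleanup(void)
{
	if (atexit(cleanup) != 0)
		return -1;
	cleanup_registered = 1;
	return 0;
}
```

The loss of the status and argument parameters is the main thing to check when converting an `on_exit()` caller.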
2016-07-28 | include/errorlog.h : Renames SRC component's macro name | Mukesh Ojha | 1 | -134/+134
    Replace the two-letter SRC component macro names with meaningful component names to make them more legible.

    E.g.: OPAL_XS => OPAL_SRC_COMPONENT_XSCOM

    Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-28 | platforms/ibm-fsp: Fix incorrect struct member access and comparison | Suraj Jitindar Singh | 1 | -1/+1
    For a 1004 slot mapping bit 6 (0x40) of the P0 field represents the pwr_ctl bit. This code previously accessed the wrong field (power_ctl) which is a single bit which corresponds to the 1005 mapping (which is the wrong mapping), performed a bitwise and with 0x40 (which will always be 0), and then compared to 1 (which will also always be 0).

    Fix this to access the byte struct member, bitwise and with 0x40 to mask the power_ctl bit, and double negate to guarantee 0 or 1 result.

    Fixes: Coverity Bug #97820
    Fixes: 6884fe63 ("platforms/ibm-fsp: Support PCI slot")
    Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
    Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
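The bug and the fix above can be sketched with a simplified structure. The struct layout and the field name `p0_byte` are hypothetical (the real slot map structure is not shown in this log); only the 0x40 mask, the single-bit `power_ctl` field and the double negation come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified slot entry: a whole byte plus a 1-bit field. */
struct slot_entry {
	uint8_t p0_byte;            /* full P0 byte; bit 6 (0x40) is pwr_ctl */
	unsigned int power_ctl : 1; /* single-bit field from the other mapping */
};

/* Buggy pattern: a 1-bit value ANDed with 0x40 is always 0, so the
 * comparison with 1 is always false. */
static bool pwr_ctl_buggy(const struct slot_entry *e)
{
	return (e->power_ctl & 0x40) == 1;
}

/* Fixed pattern: mask bit 6 of the byte, then double negate so the
 * result is exactly 0 or 1. */
static bool pwr_ctl_fixed(const struct slot_entry *e)
{
	return !!(e->p0_byte & 0x40);
}
```

The double negation matters when the result is stored in an integer rather than a `bool`, since `byte & 0x40` on its own yields 0 or 64.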
2016-07-28 | FSP/MDST: Fix TCE alignment issue | Vasant Hegde | 1 | -1/+1
    We have used the TCE_MASK value (4095) instead of TCE_PSIZE (4096) to align the memory source address. In some corner cases (like source memory size = 4097) we may end up doing a wrong mapping and corrupting part of the SYSDUMP.

    This patch uses the ALIGN_UP macro with the TCE_PSIZE value for aligning memory.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
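To see why the alignment value matters, consider the usual power-of-two round-up idiom. This is a sketch under assumptions: the values 4095 and 4096 come from the commit above, but the `align_up` function is a generic idiom and may differ from skiboot's own ALIGN_UP macro and from the exact expression the old code used.

```c
#include <stdint.h>

#define TCE_PSIZE 4096u
#define TCE_MASK  (TCE_PSIZE - 1) /* 4095 */

/* Generic round-up idiom. It is only correct when 'a' is a power of two,
 * because ~(a - 1) must clear exactly the low-order bits. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

With the page size, `align_up(4097, TCE_PSIZE)` gives 8192, i.e. two full pages. With the mask value, `align_up(4097, TCE_MASK)` gives 4097, so the size is not rounded to a page boundary at all, which fits the corner case described in the commit.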
2016-07-28 | hdat/vpd: Add chip-id property to processor chip node under vpd | Vasant Hegde | 3 | -4/+17
    We have core information under /cpus node and processor chip VPD data (like SN, LN, etc) under /vpd directory. Presently we don't have any property to relate cores and chip information.

    This patch adds chip-id information for processor chip nodes under vpd.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-27 | nvram: Add extra debug printing when NVRAM needs formatting | Cyril Bur | 2 | -4/+11
    Be more verbose (at debug level) when formatting the NVRAM; this can help catch errors at other levels of the stack.

    Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-27 | .gitignore: Add vgcore.* | Cyril Bur | 1 | -0/+1
    Signed-off-by: Cyril Bur <cyril.bur@au1.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-27 | pflash: Clean up makefiles and resolve build race | Joel Stanley | 3 | -21/+20
    The pflash build process has regressed since it was last fixed in 6c21c4ffaf82. This patch resolves that issue and performs some cleanups:

     - Remove duplicated rules. Patches had moved rules into common files, but forgotten to remove them from the pflash makefiles.

     - Make assignments simply expanded variables where possible. From the make manual:

         Functions referenced in the definition will be executed every time the variable is expanded. This makes make run slower; worse, it causes the wildcard and shell functions to give unpredictable results because you cannot easily control when they are called, or even how many times.

         To avoid all the problems and inconveniences of recursively expanded variables, there is another flavor: simply expanded variables.

     - Set the 'shared' target as a dependency of the libflash objects. This was the final piece to resolve the race condition.

    The failed build could be reproduced by doing a `git clean -f -x` and then running the following:

        $ make -j 32 CROSS_COMPILE=arm-linux-gnueabi- SKIBOOT_VERSION=5.2.4 PFLASH_VERSION=5.2.4 V=1 -C external/pflash all LINKAGE=dynamic
        make: Entering directory '/home/joel/dev/skiboot/external/pflash'
        ln -sf ../../libflash ./libflash
        ln -sf ../../ccan ./ccan
        ln -sf ../common ./common
        cc -O2 -Wall -I. -c pflash.c -o pflash.o
        cc -O2 -Wall -I. -c progress.c -o progress.o
        make -C ../shared
        make[1]: Entering directory '/home/joel/dev/skiboot/external/shared'
        ln -sf ../../hw/ast-bmc/ast-sf-ctrl.c common/ast-sf-ctrl.c
        ln -sf ../../include/ast.h common/ast.h
        ln -sf arch_flash_arm_io.h common/io.h
        cc -O2 -Wall -I. -c common/arch_flash_common.c -o common-arch_flash_common.o
        cc -O2 -Wall -I. -c common/arch_flash_arm.c -o common-arch_flash_arm.o
        cc -O2 -Wall -I. -c common/ast-sf-ctrl.c -o common-ast-sf-ctrl.o
        cc -O2 -Wall -I. -c version.c -o version.o
        ld -r common-arch_flash_common.o common-arch_flash_arm.o common-ast-sf-ctrl.o -o common-arch_flash.o
        ln -sf ../../libflash ./libflash
        ln -sf ../../ccan ./ccan
        ln -sf ../common ./common
        make[1]: *** No rule to make target 'libflash/file.c', needed by 'libflash-file.o'. Stop.
        make[1]: *** Waiting for unfinished jobs....
        make[1]: Leaving directory '/home/joel/dev/skiboot/external/shared'
        rules.mk:25: recipe for target '../shared/libflash.so.skiboot-5.2.4-1-g9f13f64c322f-joel-dirty-d5873ce' failed
        make: *** [../shared/libflash.so.skiboot-5.2.4-1-g9f13f64c322f-joel-dirty-d5873ce] Error 2

    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit c327eddd9b291a0e6e54001fa3b1e547bad3fca2)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-27 | pflash: Clean up makefiles and resolve build race | Joel Stanley | 3 | -21/+20
    The pflash build process has regressed since it was last fixed in 6c21c4ffaf82. This patch resolves that issue and performs some cleanups:

     - Remove duplicated rules. Patches had moved rules into common files, but forgotten to remove them from the pflash makefiles.

     - Make assignments simply expanded variables where possible. From the make manual:

         Functions referenced in the definition will be executed every time the variable is expanded. This makes make run slower; worse, it causes the wildcard and shell functions to give unpredictable results because you cannot easily control when they are called, or even how many times.

         To avoid all the problems and inconveniences of recursively expanded variables, there is another flavor: simply expanded variables.

     - Set the 'shared' target as a dependency of the libflash objects. This was the final piece to resolve the race condition.

    The failed build could be reproduced by doing a `git clean -f -x` and then running the following:

        $ make -j 32 CROSS_COMPILE=arm-linux-gnueabi- SKIBOOT_VERSION=5.2.4 PFLASH_VERSION=5.2.4 V=1 -C external/pflash all LINKAGE=dynamic
        make: Entering directory '/home/joel/dev/skiboot/external/pflash'
        ln -sf ../../libflash ./libflash
        ln -sf ../../ccan ./ccan
        ln -sf ../common ./common
        cc -O2 -Wall -I. -c pflash.c -o pflash.o
        cc -O2 -Wall -I. -c progress.c -o progress.o
        make -C ../shared
        make[1]: Entering directory '/home/joel/dev/skiboot/external/shared'
        ln -sf ../../hw/ast-bmc/ast-sf-ctrl.c common/ast-sf-ctrl.c
        ln -sf ../../include/ast.h common/ast.h
        ln -sf arch_flash_arm_io.h common/io.h
        cc -O2 -Wall -I. -c common/arch_flash_common.c -o common-arch_flash_common.o
        cc -O2 -Wall -I. -c common/arch_flash_arm.c -o common-arch_flash_arm.o
        cc -O2 -Wall -I. -c common/ast-sf-ctrl.c -o common-ast-sf-ctrl.o
        cc -O2 -Wall -I. -c version.c -o version.o
        ld -r common-arch_flash_common.o common-arch_flash_arm.o common-ast-sf-ctrl.o -o common-arch_flash.o
        ln -sf ../../libflash ./libflash
        ln -sf ../../ccan ./ccan
        ln -sf ../common ./common
        make[1]: *** No rule to make target 'libflash/file.c', needed by 'libflash-file.o'. Stop.
        make[1]: *** Waiting for unfinished jobs....
        make[1]: Leaving directory '/home/joel/dev/skiboot/external/shared'
        rules.mk:25: recipe for target '../shared/libflash.so.skiboot-5.2.4-1-g9f13f64c322f-joel-dirty-d5873ce' failed
        make: *** [../shared/libflash.so.skiboot-5.2.4-1-g9f13f64c322f-joel-dirty-d5873ce] Error 2

    Signed-off-by: Joel Stanley <joel@jms.id.au>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-27 | centaur: Initialize i2c master list | Benjamin Herrenschmidt | 1 | -0/+1
    It was left uninitialized, which could cause issues later on.

    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-25 | core/console: use char literals instead of numeric | Oliver O'Halloran | 1 | -2/+2
    Save everyone a trip to asciitable.com. I realise everyone has probably memorised \n and friends, but THAT'S NOT THE POINT.

    Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
    Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
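For illustration, the two styles compare as follows. Both functions are hypothetical and are not taken from core/console; they only show that the character literals express the same ASCII values more readably.

```c
#include <stdbool.h>

/* Numeric style: the reader has to know that 10 is LF and 13 is CR. */
static bool is_line_end_numeric(char c)
{
	return c == 10 || c == 13;
}

/* Literal style: same comparison on ASCII systems, self-documenting. */
static bool is_line_end_literal(char c)
{
	return c == '\n' || c == '\r';
}
```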
2016-07-25 | hw/phb3: Increase AIB TX command credit for DMA read in CAPP DMA mode | Andrew Donnellan | 1 | -2/+11
    When enabling CAPI in DMA mode, set the AIB TX command credits for channel 2 (DMA read) to 28, rather than 1. This significantly improves DMA read performance in CAPI DMA mode.

    Fixes: 5477148a439f ("phb3: Add support for CAPP DMA mode")
    Reported-by: John Walthour <jwalthour@us.ibm.com>
    Reported-by: Ricardo Mata <ricmata@us.ibm.com>
    Reported-by: Michael Perez <perezma@us.ibm.com>
    Cc: Ian Munsie <imunsie@au1.ibm.com>
    Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-25 | Add skiboot-5.3.0-rc1 release notes [skiboot-5.3.0-rc1] | Stewart Smith | 1 | -0/+270
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-22 | test/hello_world: always use shutdown type zero | Oliver O'Halloran | 1 | -0/+1
    The hello world kernel fails to correctly set r3 before making the shutdown opal call. On FSP machines only shutdown types 0 and 1 are recognised as valid shutdown types. If any other type is specified (in r3) the call is rejected with an OPAL_PARAMETER error and the machine will continue running.

    Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
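The firmware-side check that the commit above runs into might look roughly like this. It is a sketch under assumptions: the function name `check_shutdown_type` is hypothetical, and the numeric values given to OPAL_SUCCESS and OPAL_PARAMETER are assumed for illustration; only the rule that types 0 and 1 are accepted on FSP machines comes from the commit message.

```c
#include <stdint.h>

/* Return codes; the numeric values are assumptions for this sketch. */
#define OPAL_SUCCESS     0
#define OPAL_PARAMETER (-1)

/* Hypothetical validation: only shutdown types 0 and 1 are accepted.
 * Anything else is rejected, and the machine keeps running. */
static int64_t check_shutdown_type(uint64_t type)
{
	if (type > 1)
		return OPAL_PARAMETER;
	return OPAL_SUCCESS;
}
```

Since the test kernel left r3 unset, whatever value happened to be in the register was passed as the type, which is why explicitly loading zero fixes the shutdown.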
2016-07-22 | fwts/generate-fwts-olog: Fix whitespace on json.dumps | Deb McLemore | 1 | -0/+1
    When the indent option is used with json.dumps, extra whitespace precedes the newline in the JSON separator formatting. We need to remove the extra whitespace when the indent option is used to allow clean patch application.

    Signed-off-by: Deb McLemore <debmc@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-22 | platform/mambo: Add a heartbeat time | Chris Smart | 1 | -0/+10
    The console is very slow when using Skiboot with Mambo. This adds a heartbeat timer as a platform quirk so that the console is refreshed more quickly.

    This results in Skiboot doing the right thing without requiring custom settings in skiboot.tcl files.

    Signed-off-by: Chris Smart <chris@distroguy.com>
    Acked-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-22 | core/timer: Add support for platform specific heartbeat | Chris Smart | 2 | -6/+14
    The timer code currently has a default and a special check for FSP machines or those with the SLW timer facility. This patch adds support for a platform quirk to set the timer.

    Signed-off-by: Chris Smart <chris@distroguy.com>
    Acked-by: Michael Neuling <mikey@neuling.org>
    [stewart@linux.vnet.ibm.com: fix whitespace issue]
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
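One way to picture the resulting selection logic is the sketch below. Everything here is an assumption made for illustration: the struct, the `heartbeat_time` hook name, the precedence of the quirk over the FSP/SLW check, and the placeholder value of HEARTBEAT_DEFAULT_MS. Only the existence of a default, a slower setting for FSP/SLW machines, and a platform quirk comes from the commit messages in this log.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEARTBEAT_DEFAULT_MS 200 /* placeholder value, assumed */

/* Hypothetical platform description. */
struct platform_sketch {
	bool has_fsp_or_slw_timer;
	int (*heartbeat_time)(void); /* optional quirk, may be NULL */
};

/* Pick the heartbeat: the platform quirk wins if present (assumed order),
 * FSP/SLW machines run 10 times slower, everyone else gets the default. */
static int heartbeat_ms(const struct platform_sketch *p)
{
	if (p->heartbeat_time)
		return p->heartbeat_time();
	if (p->has_fsp_or_slw_timer)
		return HEARTBEAT_DEFAULT_MS * 10;
	return HEARTBEAT_DEFAULT_MS;
}

/* Hypothetical quirk for a simulator that wants a faster console refresh. */
static int fast_sim_heartbeat(void)
{
	return 100;
}
```

A function pointer keeps the generic timer code free of platform checks: a platform that needs a different rate supplies the hook, and all others fall through to the defaults.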
2016-07-22 | core/timer: Actually use default heartbeat value | Chris Smart | 1 | -3/+3
    HEARTBEAT_DEFAULT_MS sets the default heartbeat timeout, however this was not actually used as the default. The default was ten times quicker than this (HEARTBEAT_DEFAULT_MS / 10) while HEARTBEAT_DEFAULT_MS was actually used as a special case for FSP machines or those with the SLW timer facility.

    This patch makes the default use HEARTBEAT_DEFAULT_MS and makes FSP machines or those with the SLW timer facility run 10 times slower (HEARTBEAT_DEFAULT_MS * 10).

    This will also now match the existing in-line comment.

    Signed-off-by: Chris Smart <chris@distroguy.com>
    Acked-by: Michael Neuling <mikey@neuling.org>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | Merge branch 'skiboot-5.2.x' | Stewart Smith | 0 | -0/+0
2016-07-21 | Merge branch 'skiboot-5.1.x' into skiboot-5.2.x | Stewart Smith | 3 | -39/+89
2016-07-21 | FSP/ELOG: Fix OPAL generated elog resend logic | Vasant Hegde | 1 | -5/+0
    Fix resend logic in opal_resend_pending_logs, so that it actually restarts sending remaining logs.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit a6d4a7884e95cb9c918b8a217c11e46b01218358)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Fix possible event notifier hangs | Vasant Hegde | 2 | -3/+18
    In some corner cases the host may send an acknowledgement without reading the actual data (fsp_opal_elog_info -> fsp_opal_elog_ack). Because of this, elog_read_from_fsp_head_state may be stuck in the wrong state (ELOG_STATE_HOST_INFO) and unable to send the remaining ELOGs to the host. Hence reset the ELOG state and start sending the remaining ELOGs.

    Also, in the normal case we will ACK the logs which are already processed (elog_read_processed). Hence rearrange the code such that we go through elog_read_processed first.

    Finally, return OPAL_PARAMETER if we are not able to find the ELOG ID.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    [stewart@linux.vnet.ibm.com: spelling fix]
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit e7c8cba4ad773055f390632c2996d3242b633bf4)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Disable event notification if list is not consistent | Vasant Hegde | 1 | -0/+2
    The chance of elog_read_pending being in an inconsistent state is very small. Just to be on the safe side, disable notification if the list is not in a consistent state.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit 1fb10de164d3ca034193df81c1f5d007aec37781)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Improve elog event states | Vasant Hegde | 2 | -1/+3
    ELOG enables event notification once a new log is available, and disables it after the host completes reading logs (it has to complete both fsp_opal_elog_info and fsp_opal_elog_read).

    Ideally we should disable notification as soon as the host consumes the event (after fsp_opal_elog_info). Also, if the host fails to call fsp_opal_elog_read (e.g. in situations like a duplicate event), then we end up keeping the notification forever.

    This patch introduces a new ELOG state (ELOG_STATE_HOST_INFO). As soon as the host consumes the event, the elog moves to this new state so that event notification is disabled.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit cec5750a4a86ff3f69e1d8817eda023f4d40c492)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
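The effect of the new state can be sketched as a tiny state machine. This is a simplified model under assumptions: the enum below contains only three states, the helper names are hypothetical, and the transition shown is an interpretation of the commit message rather than the actual fsp-elog-read.c code.

```c
#include <stdbool.h>

/* Simplified state set; the real code has more states. */
enum elog_state_sketch {
	ELOG_STATE_NONE,
	ELOG_STATE_FETCHED_DATA, /* a log is ready for the host */
	ELOG_STATE_HOST_INFO,    /* host has consumed the event via the info call */
};

/* Notification is only wanted while a log is ready and unannounced. */
static bool notification_wanted(enum elog_state_sketch s)
{
	return s == ELOG_STATE_FETCHED_DATA;
}

/* Hypothetical transition taken when the host makes the info call. */
static enum elog_state_sketch on_host_info(enum elog_state_sketch s)
{
	return s == ELOG_STATE_FETCHED_DATA ? ELOG_STATE_HOST_INFO : s;
}
```

Because the notification depends only on the state, it is dropped as soon as the info call is made, even if the host never follows up with a read.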
2016-07-21 | FSP/ELOG: Fix OPAL generated elog event notification | Vasant Hegde | 3 | -18/+34
    We use the elog notifier to notify logs from multiple sources (FSP generated logs - fsp-elog-read.c and OPAL generated logs - fsp-elog-write.c). OPAL generated logs set the elog event bit whenever there are new logs to send to the host, but rely on fsp-elog-read.c to disable the event bit, which is wrong!

    This patch creates a common function to enable/disable event notification. It enables event notification if any of the sources is ready to send an error log to the host, and disables notification once all errors have been sent to the host.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit ec366ad4e2e871096fa4c614ad7e89f5bb6f884f)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
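The idea of a shared enable/disable function can be sketched with a pending-source bitmask. All names here are hypothetical and the boolean `event_bit` merely stands in for the real OPAL event; the sketch only shows the rule from the commit above, that the event stays set while any source still has logs to deliver.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two log sources. */
enum elog_source {
	ELOG_SRC_FSP  = 1 << 0, /* FSP generated logs */
	ELOG_SRC_OPAL = 1 << 1, /* OPAL generated logs */
};

static unsigned int pending_sources; /* bitmask of sources with logs queued */
static bool event_bit;               /* stand-in for the elog event bit */

/* Single entry point used by every source: the event is raised while at
 * least one source has pending logs and cleared once none do. */
static void elog_set_pending(enum elog_source src, bool pending)
{
	if (pending)
		pending_sources |= (unsigned int)src;
	else
		pending_sources &= ~(unsigned int)src;
	event_bit = pending_sources != 0;
}
```

With one owner for the event bit, a source finishing its own logs can no longer clear a notification that another source still needs.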
2016-07-21 | FSP/ELOG: Disable event notification during kexec | Vasant Hegde | 1 | -13/+33
    ELOG enables event notification once a new log is available, and disables it after the host completes reading logs (it has to complete both fsp_opal_elog_info and fsp_opal_elog_read).

    In some corner cases like kexec, the host may end up reading the same ELOG id twice (calling fsp_opal_elog_info twice because of a resend request). The host detects the duplicate and does not read the actual log (fsp_opal_elog_read()). In such situations we fail to disable event notification :-(

    Scenario :
        OPAL                              Host
        -------------------------------------
        OPAL_EVENT_ELOG_AVAIL   -->
                                          kexec
        OPAL_EVENT_ELOG_AVAIL   -->
                                          elog client registered
                                <--       read ELOG (id=x)
                                <--       resend elog (opal_resend_pending_logs())
        resend all ELOG         -->
                                          read ELOG (id=x) -- Duplicate ELOG !

        bhoom!!

    kernel call trace:
    ------------------
        [ 28.055923] CPU: 10 PID: 20 Comm: irq/29-opal-elo Not tainted 4.4.0-24-generic #43-Ubuntu
        [ 28.056012] task: c0000000ef982a20 ti: c0000000efa38000 task.ti: c0000000efa38000
        [ 28.056100] NIP: c000000008010a24 LR: c000000008010a24 CTR: 0000000030033758
        [ 28.056188] REGS: c0000000efa3b9c0 TRAP: 0901 Not tainted (4.4.0-24-generic)
        [ 28.056274] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22000844 XER: 20000000
        [ 28.056499] CFAR: c000000008009958 SOFTE: 1
        GPR00: c000000008131e8c c0000000efa3bc40 c0000000095b4200 0000000000000900
        GPR04: c0000000094a63c8 0000000000000001 9000000100009033 0000000000000062
        GPR08: 0000000000000000 0000000000000000 c0000000ef960400 9000000100001003
        GPR12: c00000000806de48 c00000000fb45f00
        [ 28.057042] NIP [c000000008010a24] arch_local_irq_restore+0x74/0x90
        [ 28.057117] LR [c000000008010a24] arch_local_irq_restore+0x74/0x90
        [ 28.057189] Call Trace:
        [ 28.057221] [c0000000efa3bc40] [c0000000f108a980] 0xc0000000f108a980 (unreliable)
        [ 28.057326] [c0000000efa3bc60] [c000000008131e8c] irq_finalize_oneshot.part.2+0xbc/0x250
        [ 28.057429] [c0000000efa3bcb0] [c000000008132170] irq_thread_fn+0x80/0xa0
        [ 28.057519] [c0000000efa3bcf0] [c00000000813263c] irq_thread+0x1ac/0x280
        [ 28.057609] [c0000000efa3bd80] [c0000000080e61e0] kthread+0x110/0x130
        [ 28.057698] [c0000000efa3be30] [c000000008009538] ret_from_kernel_thread+0x5c/0xa4
        [ 28.057799] Instruction dump:
        [ 28.057844] 994d02ca 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
        [ 28.057995] 7c0803a6 4e800020 60420000 4bff17ad <60000000> 4bffffe4 60420000 e92d0020

    This patch adds a kexec notifier client that disables event notification during kexec. Once the host is ready to receive ELOGs again, it calls fsp_opal_resend_pending_logs(), which re-enables ELOG notification.

    This fixes the issue above. I will add a follow-up patch to improve the event state.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
    (cherry picked from commit d2ae07fd97bb9408456279cec799f72cb78680a6)
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Fix OPAL generated elog resend logic | Vasant Hegde | 1 | -5/+0
    Fix resend logic in opal_resend_pending_logs, so that it actually restarts sending remaining logs.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Fix possible event notifier hangs | Vasant Hegde | 2 | -3/+18
    In some corner cases the host may send an acknowledgement without reading the actual data (fsp_opal_elog_info -> fsp_opal_elog_ack). Because of this, elog_read_from_fsp_head_state may be stuck in the wrong state (ELOG_STATE_HOST_INFO) and unable to send the remaining ELOGs to the host. Hence reset the ELOG state and start sending the remaining ELOGs.

    Also, in the normal case we will ACK the logs which are already processed (elog_read_processed). Hence rearrange the code such that we go through elog_read_processed first.

    Finally, return OPAL_PARAMETER if we are not able to find the ELOG ID.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    [stewart@linux.vnet.ibm.com: spelling fix]
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Disable event notification if list is not consistent | Vasant Hegde | 1 | -0/+2
    The chance of elog_read_pending being in an inconsistent state is very small. Just to be on the safe side, disable notification if the list is not in a consistent state.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Reviewed-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Improve elog event states | Vasant Hegde | 2 | -1/+3
    ELOG enables event notification once a new log is available, and disables it after the host completes reading logs (it has to complete both fsp_opal_elog_info and fsp_opal_elog_read).

    Ideally we should disable notification as soon as the host consumes the event (after fsp_opal_elog_info). Also, if the host fails to call fsp_opal_elog_read (e.g. in situations like a duplicate event), then we end up keeping the notification forever.

    This patch introduces a new ELOG state (ELOG_STATE_HOST_INFO). As soon as the host consumes the event, the elog moves to this new state so that event notification is disabled.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Fix OPAL generated elog event notification | Vasant Hegde | 3 | -18/+34
    We use the elog notifier to notify logs from multiple sources (FSP generated logs - fsp-elog-read.c and OPAL generated logs - fsp-elog-write.c). OPAL generated logs set the elog event bit whenever there are new logs to send to the host, but rely on fsp-elog-read.c to disable the event bit, which is wrong!

    This patch creates a common function to enable/disable event notification. It enables event notification if any of the sources is ready to send an error log to the host, and disables notification once all errors have been sent to the host.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | FSP/ELOG: Disable event notification during kexec | Vasant Hegde | 1 | -13/+33
    ELOG enables event notification once a new log is available, and disables it after the host completes reading logs (it has to complete both fsp_opal_elog_info and fsp_opal_elog_read).

    In some corner cases like kexec, the host may end up reading the same ELOG id twice (calling fsp_opal_elog_info twice because of a resend request). The host detects the duplicate and does not read the actual log (fsp_opal_elog_read()). In such situations we fail to disable event notification :-(

    Scenario :
        OPAL                              Host
        -------------------------------------
        OPAL_EVENT_ELOG_AVAIL   -->
                                          kexec
        OPAL_EVENT_ELOG_AVAIL   -->
                                          elog client registered
                                <--       read ELOG (id=x)
                                <--       resend elog (opal_resend_pending_logs())
        resend all ELOG         -->
                                          read ELOG (id=x) -- Duplicate ELOG !

        bhoom!!

    kernel call trace:
    ------------------
        [ 28.055923] CPU: 10 PID: 20 Comm: irq/29-opal-elo Not tainted 4.4.0-24-generic #43-Ubuntu
        [ 28.056012] task: c0000000ef982a20 ti: c0000000efa38000 task.ti: c0000000efa38000
        [ 28.056100] NIP: c000000008010a24 LR: c000000008010a24 CTR: 0000000030033758
        [ 28.056188] REGS: c0000000efa3b9c0 TRAP: 0901 Not tainted (4.4.0-24-generic)
        [ 28.056274] MSR: 9000000100009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 22000844 XER: 20000000
        [ 28.056499] CFAR: c000000008009958 SOFTE: 1
        GPR00: c000000008131e8c c0000000efa3bc40 c0000000095b4200 0000000000000900
        GPR04: c0000000094a63c8 0000000000000001 9000000100009033 0000000000000062
        GPR08: 0000000000000000 0000000000000000 c0000000ef960400 9000000100001003
        GPR12: c00000000806de48 c00000000fb45f00
        [ 28.057042] NIP [c000000008010a24] arch_local_irq_restore+0x74/0x90
        [ 28.057117] LR [c000000008010a24] arch_local_irq_restore+0x74/0x90
        [ 28.057189] Call Trace:
        [ 28.057221] [c0000000efa3bc40] [c0000000f108a980] 0xc0000000f108a980 (unreliable)
        [ 28.057326] [c0000000efa3bc60] [c000000008131e8c] irq_finalize_oneshot.part.2+0xbc/0x250
        [ 28.057429] [c0000000efa3bcb0] [c000000008132170] irq_thread_fn+0x80/0xa0
        [ 28.057519] [c0000000efa3bcf0] [c00000000813263c] irq_thread+0x1ac/0x280
        [ 28.057609] [c0000000efa3bd80] [c0000000080e61e0] kthread+0x110/0x130
        [ 28.057698] [c0000000efa3be30] [c000000008009538] ret_from_kernel_thread+0x5c/0xa4
        [ 28.057799] Instruction dump:
        [ 28.057844] 994d02ca 2fa30000 409e0024 e92d0020 61298000 7d210164 38210020 e8010010
        [ 28.057995] 7c0803a6 4e800020 60420000 4bff17ad <60000000> 4bffffe4 60420000 e92d0020

    This patch adds a kexec notifier client that disables event notification during kexec. Once the host is ready to receive ELOGs again, it calls fsp_opal_resend_pending_logs(), which re-enables ELOG notification.

    This fixes the issue above. I will add a follow-up patch to improve the event state.

    Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
    Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
2016-07-21 | Merge branch 'skiboot-5.2.x' (5.1.17 release notes) | Stewart Smith | 1 | -0/+19