Age | Commit message | Author | Files | Lines
2021-11-09 | skiboot v6.7.4 release notes [v6.7.4, skiboot-6.7.x] | Cédric Le Goater | 1 | -0/+19
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-11-04 | secvar/edk2: store timestamp variable in protected storage | Eric Richter | 2 | -1/+4
Each signed variable update contains a timestamp. This timestamp is checked against the previous timestamp seen for that particular variable (if any), and the update is rejected if the timestamp is not later than the previous one. This timestamp check is intended to prevent re-use of signed update files.

Currently, the code stores the timestamps in the TS variable, which is then stored in regular variable storage (typically PNOR). This patch promotes the variable to "protected storage" (typically TPM NV), to avoid this variable being accidentally cleared.

This change should only come into effect when either:
 - initializing secvar for the first time (i.e. first boot, or after a key-clear-request)
 - processing any variable update

Systems that already have a TS variable in PNOR will not be affected until either of the above actions is taken.

Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Tested-by: Nick Child <nick.child@ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit 59a247e7f4e9df2521ebb53cdc47aaa34c225fea)
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-11-04 | secvar/secboot_tpm: unify behavior for bank hash check and secboot header check | Eric Richter | 2 | -16/+29
As the PNOR variable space cannot be locked, the data must be integrity checked when loaded to ensure it has not been modified by an unauthorized party. In the event that a modification has been detected (i.e. hash mismatch), we must not load in data that could potentially be compromised.

However, the previous code was a bit overzealous in its reaction to detecting a compromised SECBOOT partition, and also had some inconsistencies in behavior.

Case 1: SECBOOT partition cleared.
.init() checks the header for the magic number and version. As neither matches, it will reformat the entire partition. Now, .load_bank() will pass, as the data was just freshly reformatted (note: this also could trigger the bug addressed in the previous patch). Only variables in the TPM will be loaded by .load_bank(), as the data in SECBOOT is now empty.

Case 2: Bank hash mismatch.
.load_bank() panics and returns an error code, causing secvar_main() to jump to the error scenario, which prevents the secvar API from being exposed. os-secure-enforcing is set unconditionally, and the user will have no API to manage or attempt to fix their system without issuing a key clear request.

This patch unifies the behavior of both of these cases. Now, .init() handles checking the header AND comparing the bank hash. If either check fails, the SECBOOT partition will be reformatted. Variables in the TPM will still be loaded in the .load_bank() step, and provided the backend stores its secure boot state in the TPM, secure boot state can be preserved.

Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Tested-by: Nick Child <nick.child@ibm.com>
Reviewed-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
(cherry picked from commit 8f72fe3071228fe71c0862ba1b5527ff11dbbfd3)
Signed-off-by: Cédric Le Goater <clg@kaod.org>
2021-11-04 | secvar/secboot_tpm: correctly reset the control index on secboot format | Eric Richter | 1 | -4/+7
When the SECBOOT partition is formatted, the bank hash stored in the control TPM NV index must be updated to match, or else we will immediately fail to load the freshly formatted data at the .load_bank() step. However, while the secboot_format() function does calculate and update the bank hash, it only writes the new hash for bank 0. It does not update the value for bank 1, or set the current active bank. This works as expected if the active bank bit happens to be set to 0. On the other hand, if the active bit is set to 1, the freshly formatted bank 1 will be compared against the unchanged bank hash in bank 1 at the load step, therefore causing an error. This patch fixes this issue by also setting the active bit to 0 to match the freshly calculated hash. Signed-off-by: Eric Richter <erichte@linux.ibm.com> Tested-by: Nick Child <nick.child@ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Cédric Le Goater <clg@kaod.org> (cherry picked from commit 5cb28dd14e202b66e95d5420923a157fe9639132) Signed-off-by: Cédric Le Goater <clg@kaod.org>
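A minimal sketch of the fix described above, under the assumption that the control index holds one active-bank selector and one hash per bank. `struct control_index`, `control_reset_on_format()` and `active_bank_hash_matches()` are hypothetical names, not the actual skiboot definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HASH_LEN 32

/* Hypothetical stand-in for the control data kept in the TPM NV index. */
struct control_index {
	uint8_t active_bank;            /* 0 or 1 */
	uint8_t bank_hash[2][HASH_LEN]; /* one hash per bank */
};

/* On format: store the hash of the fresh data for bank 0 AND select bank 0. */
static void control_reset_on_format(struct control_index *ctrl,
				    const uint8_t fresh_hash[HASH_LEN])
{
	assert(ctrl != NULL && fresh_hash != NULL);
	memcpy(ctrl->bank_hash[0], fresh_hash, HASH_LEN);
	ctrl->active_bank = 0; /* without this, a stale bank 1 hash may be checked */
}

/* The load step compares the data hash against the active bank's stored hash. */
static bool active_bank_hash_matches(const struct control_index *ctrl,
				     const uint8_t hash[HASH_LEN])
{
	assert(ctrl != NULL && hash != NULL);
	return memcmp(ctrl->bank_hash[ctrl->active_bank], hash, HASH_LEN) == 0;
}
```

If only the bank 0 hash were written while `active_bank` stayed at 1, the load-time comparison would use the unchanged bank 1 hash and fail.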
2021-07-22 | skiboot v6.7.3 release notes [v6.7.3] | Vasant Hegde | 1 | -0/+29
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-07-22 | pkcs7: pkcs7_get_content_info_type should reset *p on error | Daniel Axtens | 2 | -1/+35
[ Upstream commit d8e13853e506e00713d15fa5e23457ba21a16829 ] Fuzzing revealed a crash where pkcs7_get_signed_data was accessing beyond the bounds of the object, despite valid data being passed in to mbedtls_pkcs7_parse_der. Further investigation revealed that pkcs7_get_content_info_type will reset *p to start if the second call to mbedtls_asn1_get_tag fails, but not if the first call fails. mbedtls_asn1_get_tag does indeed advance *p even in some failure cases, so a reset is required. Reset *p to start if the first call to mbedtls_asn1_get_tag fails. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
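The pattern behind this fix can be illustrated with a toy parser. `read_tag()` below is a hypothetical stand-in, not `mbedtls_asn1_get_tag`; it only mimics the property that the cursor may have advanced when a failure is reported.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_UNEXPECTED_TAG (-1)
#define ERR_OUT_OF_DATA    (-2)

/* Toy tag reader: consumes the tag byte before noticing the length is missing. */
static int read_tag(const unsigned char **p, const unsigned char *end,
		    size_t *len, unsigned char tag)
{
	if (end - *p < 1)
		return ERR_OUT_OF_DATA;
	if (**p != tag)
		return ERR_UNEXPECTED_TAG;
	(*p)++; /* cursor moves past the tag */
	if (end - *p < 1)
		return ERR_OUT_OF_DATA; /* failure, but *p already advanced */
	*len = *(*p)++;
	return 0;
}

/* Restore the caller's cursor on every failure path, not only the second. */
static int get_content_type(const unsigned char **p, const unsigned char *end,
			    size_t *len)
{
	const unsigned char *start;
	int rc;

	assert(p != NULL && *p != NULL && end != NULL && len != NULL);
	start = *p;

	rc = read_tag(p, end, len, 0x30);
	if (rc) {
		*p = start;
		return rc;
	}
	rc = read_tag(p, end, len, 0x06);
	if (rc) {
		*p = start;
		return rc;
	}
	return 0;
}
```

Callers that retry or fall back after an error can then rely on `*p` still pointing at the start of the object.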
2021-07-22 | secvar/backend: fix a memory leak in get_pkcs7 | Daniel Axtens | 4 | -1/+181
[ Upstream commit 8dd8b6e4abb4d61cdf98470f3fe5cb750def7a18 ] We need to actually free the pkcs7 structure, not just pass it to mbedtls_pkcs7_free(). Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-07-22 | secvar/backend: fix an integer underflow bug | Daniel Axtens | 3 | -0/+182
[ Upstream commit 0c265ace91b9d9ee08e09392a7d4a78a1301a3ab ] If a declared size is smaller than the uuid size, we end up requesting an allocation of a 'negative' number of bytes, which is a huge 64 bit number. This will probably then fail with an OPAL_NO_MEM, but it is better to catch it and return OPAL_PARAMETER instead. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
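The underflow can be avoided by validating the declared size before subtracting. A minimal sketch, assuming a 16-byte UUID prefix; `RC_SUCCESS`, `RC_PARAMETER` and `payload_size()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UUID_SIZE    16
#define RC_SUCCESS   0
#define RC_PARAMETER (-1) /* illustrative value, not the OPAL constant */

/* Compute the payload length that follows the UUID, rejecting short inputs. */
static int payload_size(uint32_t declared_size, size_t *out)
{
	assert(out != NULL);
	if (declared_size < UUID_SIZE)
		return RC_PARAMETER; /* would otherwise wrap to a huge value */
	*out = (size_t)declared_size - UUID_SIZE;
	return RC_SUCCESS;
}
```

Checking first means the caller gets a parameter error directly instead of a failed allocation of a wrapped-around size.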
2021-07-22 | secvar/backend: Don't overread data in auth descriptor | Daniel Axtens | 2 | -0/+22
[ Upstream commit 15da2fd447c04a9f6ea53b8f8bdfaa7cbc6ea520 ] Catch another OOB read picked up by the fuzzer. Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-07-22 | secvar: return error if verify_signature runs out of ESLs | Nick Child | 2 | -1/+29
[ Upstream commit 56658ad4a0249cdf516e6bc21781cce901965998 ] Currently, in `verify_signature`, the return code `rc` is initialized as 0 (our success value). While looping through the ESLs in the given secvar, the function will break if the remaining data in the secvar is not enough to contain another ESL. This break from the loop was not setting a return code, which means that the successful return code can pass to the end of the function if the first iteration meets this condition. In other words, if a current secvar has a size that is less than the minimum size for an ESL, then it will approve any update. In response to this bug, this commit will return an error code if the described condition is met. Additionally, a test case has been added to ensure that this unlikely event is handled correctly. Fixes: 87562bc5c1a6 ("secvar/backend: add edk2 derived key updates processing") Signed-off-by: Nick Child <nick.child@ibm.com> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-07-22 | secvar: return error if validate_esl has extra data | Nick Child | 2 | -1/+19
[ Upstream commit 355176a9405c83320748f804e8655e6a8ee2324f ] Currently, in `validate_esl_list`, the return code is initialized to zero (our success value). While looping through the ESLs in the submitted ESL chain, the loop will break if there is not enough data to meet minimum ESL requirements. This condition was not setting a return code, meaning that the successful return code can pass to the end of the function if there is extra data at the end of the ESL. As a consequence, any properly signed update can successfully commit any data (as long as it is less than the min size of an ESL) to the secvars. This commit will return an error if the described condition is met. This means all data in the appended ESL of an auth file must be accounted for. No extra bytes can be added to the end since, on success, this data will become the updated secvar. Additionally, a test case has been added to ensure that this commit addresses the issue correctly. Fixes: 87562bc5c1a6 ("secvar/backend: add edk2 derived key updates processing") Signed-off-by: Nick Child <nick.child@ibm.com> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-07-22 | secvar: Make `validate_esl_list` iterate through esl chain | Nick Child | 3 | -4/+244
[ Upstream commit 0917fd18ac30f8935563d26629a02f210a485687 ] Currently, the loop in validate_esl_list is not iterating through the ESL entries. As a consequence, all of the entries after the first are not being validated and can contain any data. In order to iterate, the pointer to the esl buffer must be incremented by the number of bytes already read. This commit also adds a new test case and file. The file is `multipletrimmedKEK.h`. Its array is very similar to the one in `trimmedKEK.h`, except that only the second ESL in the chain is invalid. This exercises the condition that this commit fixes, because only the second ESL is invalid. Fixes: 87562bc5c1a6 ("secvar/backend: add edk2 derived key updates processing") Signed-off-by: Nick Child <nick.child@ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
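Walking a chain of variable-length entries requires advancing the cursor by each entry's size. The sketch below uses a deliberately simplified, hypothetical header (`struct esl_hdr` with a single host-endian size field), not the real EFI signature list layout; it also treats leftover bytes that cannot form an entry as an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry header: total entry size including header. */
struct esl_hdr {
	uint32_t list_size;
};

/* Returns the number of entries, or -1 if any entry or trailing data is bad. */
static int count_esl_entries(const uint8_t *buf, size_t len)
{
	int count = 0;

	assert(buf != NULL);
	while (len > 0) {
		struct esl_hdr hdr;

		if (len < sizeof(hdr))
			return -1; /* trailing bytes too small to be an entry */
		memcpy(&hdr, buf, sizeof(hdr));
		if (hdr.list_size < sizeof(hdr) || hdr.list_size > len)
			return -1; /* entry claims more than the buffer holds */

		buf += hdr.list_size; /* advance, or every pass re-reads entry 0 */
		len -= hdr.list_size;
		count++;
	}
	return count;
}
```

Without the `buf += hdr.list_size` step, the loop would validate the first entry repeatedly and never look at the rest of the chain.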
2021-07-22 | secvar: ensure ESL buf size is at least what ESL header expects | Nick Child | 3 | -1/+182
[ Upstream commit 8a31163a0271f11b4597bca4e803f559e38e3d24 ] Currently, `get_esl_cert` receives a data buffer containing an ESL and its length. It is to return a data buffer of the certificate that is contained inside the ESL. The ESL has header info that contains the certificate's `size` and the size of the header (`sig_data_offset`). We use this information to copy `size` bytes starting `sig_data_offset` bytes after the given ESL buffer. Currently we are checking that the length of the ESL buffer is at least `sig_data_offset` bytes, but we are not checking that it also has enough bytes to contain `size` bytes of the certificate. This becomes problematic if some data at the end of the ESL gets lost. Since the ESL claims it has more than it actually does, this will lead to a buffer over-read. What is even worse is that this buffer over-read can go unnoticed, since the last 256 bytes of the ESL are usually the x509 2048 bit signature, so the extra garbage bytes that are copied will appear to be a valid rsa signature. To resolve this, this commit ensures that the ESL buffer length is large enough to hold the data that it claims it contains. Lastly, a new test case is added to test the described condition. It includes a new test file `trimmedKEK.h` which contains a valid KEK auth file minus 5 bytes, therefore making it invalid. Fixes: 87562bc5c1a6 ("secvar/backend: add edk2 derived key updates processing") Signed-off-by: Nick Child <nick.child@ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Nayna Jain <nayna@linux.ibm.com> Tested-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
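The missing check amounts to verifying both the header offset and the claimed certificate size against the buffer length. A minimal sketch, with `esl_cert_span()` as a hypothetical helper name; the subtraction form avoids overflow when the two header fields are added.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Check that a buffer of esl_len bytes can hold cert_size bytes starting at
 * sig_data_offset. On success, *start receives the certificate offset.
 */
static int esl_cert_span(size_t esl_len, uint32_t sig_data_offset,
			 uint32_t cert_size, size_t *start)
{
	assert(start != NULL);
	if (sig_data_offset > esl_len)
		return -1; /* header alone does not fit */
	if (cert_size > esl_len - sig_data_offset)
		return -1; /* claimed certificate runs past the buffer */
	*start = sig_data_offset;
	return 0;
}
```

A buffer trimmed by a few bytes, as in the `trimmedKEK.h` test file described above, fails the second check instead of being over-read.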
2021-06-30 | skiboot v6.7.2 release notes [v6.7.2] | Vasant Hegde | 1 | -0/+29
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | secvar: fix endian conversion | Nayna Jain | 1 | -1/+1
[ Upstream commit 5be38b672c1410e2f10acd3ad2eecfdc81d5daf7 ] unpack_timestamp() calls le32_to_cpu() for endian conversion of uint16_t "year" value. This patch fixes the code to use le16_to_cpu(). Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | secvar/secvar_util: Properly free memory on zalloc fail | Nick Child | 1 | -1/+1
[ Upstream commit e964b78d567fab5263ed339c89516039a8714295 ] If allocating the secure variable name of a secure variable struct, `secvar->key`, fails then the secvar struct should be freed before returning NULL. Previously, if this allocation fails, then only the `secvar->key` is freed (which is likely a typo) leaving the allocated `secvar` struct allocated and returning NULL. This memory leak can be seen with the static analysis tool `cppcheck`. After running valgrind tests, this commit ensures that memory is properly freed if an error occurs when allocating the `key` field of the `secvar` struct. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
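The cleanup rule in this fix is that a failed inner allocation must release the outer structure. A minimal sketch using the standard allocator; `struct var`, `var_new()` and `var_free()` are hypothetical names rather than the skiboot secvar API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable record with a separately allocated key. */
struct var {
	char *key;
	size_t key_len;
};

static struct var *var_new(const char *key, size_t key_len)
{
	struct var *v;

	assert(key != NULL);
	v = calloc(1, sizeof(*v));
	if (!v)
		return NULL;

	v->key = calloc(1, key_len + 1);
	if (!v->key) {
		free(v); /* free the outer struct, not the NULL key */
		return NULL;
	}
	memcpy(v->key, key, key_len);
	v->key_len = key_len;
	return v;
}

static void var_free(struct var *v)
{
	if (!v)
		return;
	free(v->key);
	free(v);
}
```

Freeing `v->key` in the failure branch would be a no-op on a NULL pointer and would leak `v`, which matches the leak the commit message describes.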
2021-06-24 | edk2-compat-process.c: Remove repetitive debug print statements | Nick Child | 1 | -7/+2
[ Upstream commit ec884c19bfff75bc5b59093113f170837be28ec9 ] Functions `get_esl_cert`, `validate_esl_list` and `get_esl_signature_list_size` all contain the same debug print statement. This statement prints the size of the ESL. `validate_esl_list` calls `get_esl_cert` so the same debug information prints twice when validating the newly submitted ESL. Additionally, the same debug prints twice when validating the current ESL since `get_esl_cert` and `get_esl_signature_list_size` are both called by the function `verify_signature`. Since `get_esl_cert` is the common factor, this commit removes the other two print statements (and adds some information to an error message to maintain clarity, in case `validate_esl_list` fails before calling `validate_esl_cert`). After double checking that these functions are not being called anywhere else, the only real change is to reduce the number of redundant print statements for the secvar update process. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | phb4: Avoid MMIO load freeze escalation on every chip | Mahesh Salgaonkar | 1 | -1/+5
[ Upstream commit d51eb6f95e7078235ba2217e2dc9fc53e65bc902 ] The commit f397cc30bdf8 ("phb4: Only escalate freezes on MMIO load where necessary") introduced a change to restrict escalation to the chips that actually need it. However it missed one case which still causes the escalation on every chip. This affects EEH recovery to cause full PHB reset on some chips which is not necessary. This patch fixes that. Also, add a check for p9 chip in phb4_escalation_required() function. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | phb4: Disable TCE cache line buffer | Frederic Barrat | 2 | -0/+2
[ Upstream commit 15b93a301509ba7813343540e25b47ba395674b9 ] This patch implements a circumvention for HW557787. It disables the TCE cache line buffer as, under heavy loads, there's a possibility of an entry being re-allocated incorrectly. Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | hw/imc: Disable only nest_imc devices if pause_microcode() fails | Madhavan Srinivasan | 1 | -2/+16
[ Upstream commit d505f4037976ac540be1608653272ee57ae737ee ] During opal boot, in imc_init(), 24x7/IMC microcode state is checked and if it is not in running or pause state, currently all the imc devices are removed from device tree. Instead, remove only the nest imc devices. Core/Thread/Trace imc devices are not related to 24x7 microcode. Patch adds a function to remove specific imc device type and the same is used, when pause_microcode() fails, to remove nest imc device types from the device tree. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | hw/imc: move imc_init() towards end main_cpu_entry() | Madhavan Srinivasan | 1 | -3/+3
[ Upstream commit fbcbd4e47cdcf00fffc7665a297c875ce9ef951a ] imc_init() checks for the 24x7 microcode state at boot to check whether the microcode is in proper state (running or paused). But in a larger system, loading of 24x7 microcode by OCC gets delayed. Because of this, imc_init() removes imc devices from the device tree. Moving imc_init() function towards end of the main_cpu_entry() works around this. Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-06-24 | Fix lock error when BT IRQ preempt BT timer | lixg | 1 | -3/+5
[ Upstream commit 46d7eafbda4006b9b858b49f9df9c63575582a92 ]

The BT IRQ may preempt the BT timer if the BMC responds to the host just as a BT message times out. When the BT IRQ preempts the BT timer, inflight_bt_msg is not properly protected by bt.lock, and we see the following log:

[29006114.163785853,3] BT: seq 0x81 netfn 0x0a cmd 0x23: Timeout sending message
[29006114.288029290,3] BT: seq 0x81 netfn 0x0b cmd 0x23: Timeout sending message
[29006114.288917798,3] IPMI: Incorrect netfn 0x0b in response

It may cause 'CPU Hardlock UP', 'memory refree', 'kernel crash' or something else...

Signed-off-by: lixg <867314078@qq.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | skiboot v6.7.1 release notes [v6.7.1] | Vasant Hegde | 1 | -0/+33
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | SBE: Account cancelled timer request | Vasant Hegde | 1 | -0/+3
[ Upstream commit b44c7594523d20945179e497c45ec9007981ac75 ] Currently we are not accounting for cancelled timer requests. So in some corner cases we may schedule a new timer request with new-timer-value > inflight-timer-value. Let's explicitly check the new_target value against the inflight timer value. Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | SBE: Rate limit timer requests | Vasant Hegde | 1 | -0/+22
[ Upstream commit 2e654443050acdd4deffdbb44723a847ca11e6b2 ]

We schedule a timer and wait for the `timer expiry` interrupt from SBE. If we get a new timer request whose expiry is earlier than the inflight timer's, we can update the timer (essentially sending a new timer chip-op; the SBE takes care of stopping the inflight timer and scheduling the new one).

SBE runs at a much slower speed than the host CPU. If we do continuous timer updates like below, then SBE will be busy handling PSU side timer messages and will not get time to handle FIFO side requests.

  send timer chip-op -> Got ACK -> send timer chip-op

Hence this patch limits the number of continuous timer updates, and we will restart sending timer requests as soon as we get the timer expiry interrupt. The rate limit value (2) is suggested by the SBE team.

With this patch:
  If our timer requests are 2ms, 1500us, 1000us and 800us (and requests arrive after each message is sent), we will schedule a timer for 2ms and then update the timer for 1500us and 1000us (these updates happen after getting the ACK interrupt from SBE). We will not send the 800us request.

  At 1000us we get `timer expiry` and we are good to send the next timer request. (At this stage both the 1000us and 800us timeouts have happened. We will schedule the next timer request with timeout value 500us (1500-1000).)

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | SBE: Check timer state before scheduling timer | Vasant Hegde | 1 | -2/+4
[ Upstream commit 47ab3a92298e72e44b9477a02b1312a09272a54a ]

Timer flow:
  - OPAL sends timer chip-op to SBE and waits for ACK
  - Until we get the ACK interrupt from SBE we will not schedule any new timer
  - Once we get the ACK, either we wait for timer expiry -OR- schedule a new one if new-timer-request < inflight-timer-timeout value.
  - If we get a new timer request while processing the current one, p9_sbe_update_timer_expiry code sets `has_new_target` and we schedule it in the ACK path (p9_sbe_timer_resp()).

p9_sbe_timer_resp() is a callback handler and it is called without the lock. It does not check whether the timer message (timer_ctrl_msg) is busy or not. So in theory we may hit the scenario below and corrupt msg_list.

  CPU 1 -> Timer ACK (callback handler) -- it's not holding any lock
  CPU 2 -> Grabbed sbe_timer_lock -> scheduled timer --> done
  CPU 3 -> p9_sbe_update_timer_expiry() -> see timer is busy -> sets has_new_timer -> done
  CPU 1 -> gets chance to grab sbe_timer_lock -> saw has_new_timer -> Called p9_sbe_timer_schedule() --> List corrupted !

This patch adds a timer message busy check in p9_sbe_timer_resp().

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | platform/mowgli: Limit PHB0/(pec0) to gen3 speed | LuluTHSu | 1 | -0/+16
[ Upstream commit 127a3ee2417a2a71b63cca5fd7055d9b64939bb1 ] Use the method provided by Frederic: Add the "ibm, maximum link speed" attribute to the PHB device tree at index 0. The phb4.c code looks for it and sets up the link correctly. Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | Revert "mowgli: Limit slot1 to Gen3 by default" | LuluTHSu | 3 | -36/+0
[ Upstream commit de20b93849c3cdee62ff066e079b5460737e8609 ] This reverts commit 5262cdd1b99f77bca5951fc8132f9795ef0c2b87. When the link is reset or retrained, this method cannot maintain the max-link-speed limit, so remove it. Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | xscom: Fix xscom error logging caused due to xscom OPAL call | Gautham R. Shenoy | 1 | -2/+19
[ Upstream commit a4101173cacf79fcd91d395ab12aac9cb6840975 ] Commit 80fd2e963bd4 ("xscom: Don't log xscom errors caused by OPAL calls") ensured that xscom errors caused due to XSCOM read/write OPAL calls aren't logged in the error-log since the caller of the OPAL call is expected to handle it. However we are continuing to print the prerror() in the OPAL log regarding the same. This patch reduces the severity of the log from PR_ERROR to PR_INFO for the xscom read and write made via OPAL calls. Tested-by: Pavaman Subramaniyam <pavsubra@in.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Print info only for xscom read/writes made via opal calls Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | xive/p9: Remove assert from xive_eq_for_target() | Cédric Le Goater | 1 | -1/+1
[ Upstream commit f07ea9564425d8005ab334dfa40f7cebe4e71fbf ] XIVE VPs are structures describing the vCPUs of guests. When starting a guest, these are allocated and enabled and some checks are done on the location of the associated ENDs, which describe the event queues. If the block of the VP and the block of the ENDs do not match, the XIVE driver asserts. Unfortunately, there is no way to check that a VP identifier is part of a VP block that was previously allocated and it is relatively easy to crash the host with a bogus VP id. That can be done with a QEMU hack on a machine using vsmt. Simply remove the assert, the OS should gracefully handle the error. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reported-by: Greg Kurz <groug@kaod.org> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | Fix possible deadlock with DEBUG build | Vasant Hegde | 1 | -2/+2
[ Upstream commit b2ac14f459feebd774e2177d74cda0708f38da2a ]

Sample output from Cédric:
-------------------------
[ 88.294111649,7] cpu_idle_p9 called on cpu 0x063c with pm disabled
[ 88.289365222,7] cpu_idle_p9 called on cpu 0x025f with pm disabled
[ 88.289900684,7] cpu_idle_p9 called on cpu 0x045f with pm disabled
[ 88.302621295,7] CHIPTOD: Base TFMR=0x2512000000000000
[ 88.289899701,7] cpu_idle_p9 called on cpu 0x0456 with pm disabled
LOCK ERROR: Deadlock detected @0x30402740 (state: 0x0000000400000001)
[ 88.332264757,3] ***********************************************
[ 88.332300051,3] < assert failed at core/lock.c:32 >
[ 88.332328282,3] .
[ 88.332347335,3] .
[ 88.332364894,3] .
[ 88.332377963,3] OO__)
[ 88.332395458,3] <"__/
[ 88.332412628,3] ^ ^
[ 88.332450246,3] Fatal TRAP at 00000000300286a0 .lock_error+0x64 MSR 9000000000021002
[ 88.332501812,3] CFAR : 00000000300414f4 MSR : 9000000000021002
[ 88.332536539,3] SRR0 : 00000000300286a0 SRR1 : 9000000000021002
[ 88.332574644,3] HSRR0: 0000000030020024 HSRR1: 9000000000001000
[ 88.332610635,3] DSISR: 00000000 DAR : 0000000000000000
[ 88.332650628,3] LR : 0000000030028690 CTR : 00000000300f9fa0
[ 88.332684451,3] CR : 20002000 XER : 00000000
[ 88.332712767,3] GPR00: 0000000030028690 GPR16: 0000000032c98000
[ 88.332748046,3] GPR01: 0000000032c9b0a0 GPR17: 0000000000000000
[ 88.332784060,3] GPR02: 0000000030169d00 GPR18: 0000000000000000
[ 88.332822091,3] GPR03: 0000000032c9b310 GPR19: 0000000000000000
[ 88.332861357,3] GPR04: 0000000030041480 GPR20: 0000000000000000
[ 88.332897229,3] GPR05: 0000000000000000 GPR21: 0000000000000000
[ 88.332937051,3] GPR06: 0000000000000010 GPR22: 0000000000000000
[ 88.332968463,3] GPR07: 0000000000000000 GPR23: 0000000000000000
[ 88.333007333,3] GPR08: 000000000002cbb5 GPR24: 0000000000000000
[ 88.333041971,3] GPR09: 0000000000000000 GPR25: 0000000000000000
[ 88.333081073,3] GPR10: 0000000000000000 GPR26: 0000000000000003
[ 88.333114301,3] GPR11: 3839616263646566 GPR27: 0000000000000211
[ 88.333156040,3] GPR12: 0000000020002000 GPR28: 000000003042a134
[ 88.333189222,3] GPR13: 0000000000000000 GPR29: 0000000030402740
[ 88.333225638,3] GPR14: 0000000000000000 GPR30: 0000000000000001
[ 88.333259730,3] GPR15: 0000000000000000 GPR31: 0000000000000000
CPU 0211 Backtrace:
 S: 0000000032c9b3b0 R: 0000000030028690 .lock_error+0x54
 S: 0000000032c9b440 R: 0000000030028828 .add_lock_request+0xd0
 S: 0000000032c9b4f0 R: 0000000030028a9c .lock_caller+0x8c
 S: 0000000032c9b5a0 R: 0000000030021b30 .__mcount_stack_check+0x70
 S: 0000000032c9b650 R: 00000000300fabb0 .list_check_node+0x1c
 S: 0000000032c9b6f0 R: 00000000300fac98 .list_check+0x38
 S: 0000000032c9b790 R: 00000000300289bc .try_lock_caller+0xac
 S: 0000000032c9b830 R: 0000000030028ad8 .lock_caller+0xc8
 S: 0000000032c9b8e0 R: 0000000030028d74 .lock_recursive_caller+0x54
 S: 0000000032c9b980 R: 0000000030020cb8 .console_write+0x48
 S: 0000000032c9ba30 R: 00000000300445a8 .vprlog+0xc8
 S: 0000000032c9bc20 R: 0000000030044630 ._prlog+0x50
 S: 0000000032c9bcb0 R: 0000000030029204 .cpu_idle_p9+0x74
 S: 0000000032c9bd40 R: 0000000030029628 .cpu_idle_pm+0x4c
 S: 0000000032c9bde0 R: 0000000030023fe0 .__secondary_cpu_entry+0xa0
 S: 0000000032c9be70 R: 0000000030024034 .secondary_cpu_entry+0x40
 S: 0000000032c9bf00 R: 0000000030003290 secondary_wait+0x8c

CPU 0x4:
  opal_run_pollers -> check_stacks -> takes stack_check_lock lock
  prlog -> console_write -> waits for con_lock

CPU 0x211
  cpu_idle_p9 -> prlog -> console_write -> Takes con_lock lock
  list_check_node -> tries to take stack_check_lock and hits deadlock.

I think we don't need to hold `stack_check_lock` while printing backtraces. Instead it makes sense to hold backtrace lock (bt_lock) and print output.

Reported-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | core/platform: Fallback to full_reboot if fast-reboot fails | Vasant Hegde | 1 | -1/+2
[ Upstream commit 8256da311027176dd22885205f16869f55b79f3b ] If fast reboot fails then we return to Linux with OPAL_SUCCESS. Current Linux code thinks that the request succeeded and enters an infinite loop (see Linux pnv_restart() code). This patch fixes the above issue by returning OPAL_UNSUPPORTED if fast reboot fails. Alternatively we can directly call full_reboot() itself. But I think it makes sense to go back to Linux and report the failure. And Linux falls back to normal reboot request. Fixes: 10bbcd07 ("core/platform: Add an explicit fast-reboot type") Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Dan Horák <dan@danny.cz> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2021-01-06 | core/cpu: fix next_ungarded_primary | Nicholas Piggin | 1 | -4/+2
[ Upstream commit 3f65437bb367ccf479fa6b9e905bf50ede359e9d ] next_unguarded_primary dereferences NULL CPU -> UB -> infinite loop Fast reboot works again after this patch. Fixes: 98f5834253c7e ("cpu: Keep track of the "ec_primary" in big core more") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
2020-11-03 | skiboot v6.7 release notes [v6.7] | Oliver O'Halloran | 1 | -0/+37
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-11-02 | phb4: Finish removing P9 DD1 workaround on LSIs | Cédric Le Goater | 2 | -5/+1
Commit ad7e9a67c4e4 ("xive/p9: obsolete OPAL_XIVE_IRQ_SHIFT_BUG flags") forgot to remove the internal flag. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-11-02 | platform/mowgli: modify slot_name | LuluTHSu | 1 | -5/+5
Since Mowgli has only one slot, modify the names of other slots to avoid confusion. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-11-02 | mowgli: Limit slot1 to Gen3 by default | LuluTHSu | 3 | -0/+38
Per the mowgli platform spec, limit the slot to Gen3 speed. Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-23 | skiboot v6.6.4 release notes | Vasant Hegde | 1 | -0/+18
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-23 | skiboot 5.4.12 release notes | Vasant Hegde | 1 | -0/+14
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-21 | external/pci-scripts: Add PHB error parsing script | Oliver O'Halloran | 2 | -0/+686
A very hacky, but very useful script that parses the PowerNV EEH register dump from the kernel log, and the verbose EEH dump from the opal message log and renders it into something mostly readable. Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-21 | FSP/NVRAM: Do not assert in vNVRAM statistics call | Vasant Hegde | 1 | -2/+1
`msg` is valid pointer here. I don't recall why I added assert here :-( This is not correct. We shouldn't call assert here. Also we are not using `msg`. Hence convert it to `__unused`. Fixes: 19d4f98e ('FSP/NVRAM: Handle "get vNVRAM statistics" command') Cc: skiboot-stable@lists.ozlabs.org # v5.4.x + Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-15 | platform/mowgli: modify VPD to export correct data to system VPD EEPROM | LuluTHSu | 1 | -0/+20
Hostboot doesn't export the correct data for the system VPD EEPROM for this system. So add vpd_dt_fixup(). Cc: skiboot-stable@lists.ozlabs.org Signed-off-by: LuluTHSu <Lulu_Su@wistron.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-15 | opal-prd: handle devtmpfs mounted with noexec | Georgy Yakovlev | 1 | -2/+34
On systems using recent versions of systemd, /dev (devtmpfs) is mounted with the noexec option. Such a mount prevents mapping the HBRT image code region as RWX from /dev. This commit, as suggested in the github PR linked below, attempts to work around the situation by copying the HBRT image to an anonymous mmapped memory region and setting it RWX with mprotect, allowing opal-prd to successfully execute the code region. Having a memory region set as RWX is not ideal for security, but fixing that is a separate and hard to solve problem. The original code also mmapped the region as RWX, so this PR does not make things worse at least. Closes: https://github.com/open-power/skiboot/issues/258 Signed-off-by: Georgy Yakovlev <gyakovlev@gentoo.org> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> [oliver: whitespace fix, add a comment, reflow commit message] Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
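The workaround can be sketched with standard POSIX calls. This is an illustration only, not the opal-prd code: `load_executable_copy()` is a hypothetical helper, and on systems that forbid writable and executable mappings the `mprotect()` call may fail.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS with strict -std= settings */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Copy an image into anonymous memory and make the copy RWX.
 * Returns the new mapping, or NULL on failure.
 */
static void *load_executable_copy(const void *image, size_t size)
{
	void *buf;

	assert(image != NULL && size > 0);
	buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return NULL;

	memcpy(buf, image, size);

	/* Anonymous memory is not subject to the noexec flag of /dev. */
	if (mprotect(buf, size, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) {
		munmap(buf, size);
		return NULL;
	}
	return buf;
}
```

The caller owns the mapping and releases it with `munmap(buf, size)` once the image is no longer needed.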
2020-10-15mowgli: Enable secvar support for Host OS Secure BootNayna Jain1-0/+6
Secure variable support is needed for Host OS Secure Boot key management. This needs to be enabled for each platform, as each platform needs to select the storage and backend drivers to use. This patch adds secure variable support to the mowgli platform.
Test Results: After applying the patch, sysfs and device-tree show the secvar entries correctly.
    # cd /sys/firmware/secvar/
    # ls
    format  vars
    # cat format
    ibm,edk2-compat-v1
    # cd vars
    # ls
    KEK  PK  TS  db  dbx
    # cat PK/size
    0
    # cat KEK/size
    0
    # cat TS/size
    64
    # cat db/size
    0
    # cat dbx/size
    0
    # ls /proc/device-tree/ibm,secureboot/
    compatible  hw-key-hash-size  name  secure-enabled
    hw-key-hash  ibm,cvc  phandle  trusted-enabled
    # ls /proc/device-tree/ibm,opal/secvar/status
    /proc/device-tree/ibm,opal/secvar/status
    # ls /proc/device-tree/ibm,opal/secvar/
    compatible  max-var-key-len  name  status
    format  max-var-size  phandle  update-status
    # cat /proc/device-tree/ibm,opal/secvar/status
    okay#
    # cat /proc/device-tree/ibm,opal/secvar/format
    ibm,edk2-compat-v1#
Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Signed-off-by: Klaus Heinrich Kiwi <klaus@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-02test: Skip qemu tests if skiboot.lid is too largeOliver O'Halloran2-0/+14
With the addition of the secvar patches the GCOV enabled builds now produce a skiboot.lid that is greater than 4MB. This is larger than the historical max firmware image size supported by the PowerNV Qemu model, so we need to skip the Qemu boot tests in that case. Non-GCOV builds are still well under the limit (2.3MB or so) and mambo tests are not affected, so this shouldn't be a big deal. If the Qemu in use happens to support a larger image size this should continue to work without issues. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-02secvar/test: use mbedtls cflags when building the test binariesEric Richter2-2/+4
The edk2 test file includes some mbedtls files directly, so make sure that those also include the correct mbedtls config file. Without this, the default config file is used, which conflicts with the version we build as part of skiboot. As host libc includes a SIZE_MAX macro, this also changes the SIZE_MAX macro defined in mbedtls_config.h (needed for some mbedtls functions) to only be defined if it isn't already. Signed-off-by: Eric Richter <erichte@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
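The SIZE_MAX change is the usual "define only if missing" guard. A minimal sketch of the pattern follows; the fallback value `((size_t)-1)` and the helper `fits_in_size_t` are assumptions for illustration, and the actual definition in mbedtls_config.h may differ.

```c
#include <stddef.h>
#include <stdint.h> /* a hosted libc defines SIZE_MAX here */

/*
 * Only provide a fallback when nothing has defined SIZE_MAX yet, so a
 * host build keeps libc's definition and avoids a redefinition warning.
 * The fallback value below is an assumption, not copied from skiboot.
 */
#ifndef SIZE_MAX
#define SIZE_MAX ((size_t)-1)
#endif

/* Hypothetical user of the macro: does `n` fit in a size_t? */
static int fits_in_size_t(unsigned long long n)
{
	return n <= SIZE_MAX;
}
```

With the guard in place the same header can be included both in the firmware build, where nothing else may define the macro, and in host-side test binaries.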
2020-10-02secvar/test: use vendored mbedtls instead of hostEric Richter2-7/+10
Linking against the host mbedtls introduces problems if the host does not have the library, or if the host has a different version installed. This patch changes the tests to instead build mbedtls from the version included in skiboot using the host compiler, removing the dependency on external mbedtls. Signed-off-by: Eric Richter <erichte@linux.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-01secvar: Clean up makefiles and fix out of tree buildsOliver O'Halloran4-19/+13
The secvar makefiles use $(SRC) in a few places they shouldn't and don't use it in a few places they should. Also drop the _SRCS rules and the pattern substitution that turns them into _OBJS rules because chaining dependent rules is infuriating at the best of times. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
2020-10-01secvar/test: Remove broken initalizersOliver O'Halloran1-2/+2
Some versions of GCC complain about this. Besides, since it's a static global it goes in the BSS and is initialized to zero anyway. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
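The zero-initialization point can be illustrated with a small sketch. The struct and names below are hypothetical and not taken from the secvar tests; the guarantee itself comes from the C standard, which zero-initializes objects with static storage duration that have no explicit initializer.

```c
#include <stddef.h>

/* Hypothetical structure standing in for the test's global. */
struct fake_state {
	int count;
	void *ptr;
	char name[8];
};

/* No initializer: static storage duration means this starts out zeroed. */
static struct fake_state state;

/* Returns 1 if the untouched global reads back as all zero/NULL members. */
static int state_is_zeroed(void)
{
	size_t i;

	if (state.count != 0 || state.ptr != NULL)
		return 0;
	for (i = 0; i < sizeof(state.name); i++)
		if (state.name[i] != 0)
			return 0;
	return 1;
}
```

Dropping the explicit initializer therefore changes nothing at runtime while avoiding the compiler warning.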
2020-10-01skiboot.lds.S: Move BSS start up a bit to accommodate a larger .dataOliver O'Halloran2-5/+5
With the addition of libtss and mbedtls the .data section now overlaps the start of the .bss section. Adding a few MB to the offset doesn't hurt. Signed-off-by: Oliver O'Halloran <oohall@gmail.com>