diff options
author | Alexandre Ganea <alexandre.ganea@ubisoft.com> | 2020-11-12 08:14:20 -0500 |
---|---|---|
committer | Alexandre Ganea <alexandre.ganea@ubisoft.com> | 2020-11-12 08:14:43 -0500 |
commit | 45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3 (patch) | |
tree | 36f232547cfcc617e32d65736908a71758094d14 /llvm/lib/Support/CrashRecoveryContext.cpp | |
parent | f37834c7dcbe69405bf3e182d2b3e3227cc4a569 (diff) | |
download | llvm-45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3.zip llvm-45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3.tar.gz llvm-45b8a741fbbf271e0fb71294cb7cdce3ad4b9bf3.tar.bz2 |
[LLD][COFF] When using LLD-as-a-library, always prevent re-entrance on failures
This is a follow-up for D70378 (Cover usage of LLD as a library).
While debugging an intermittent failure on a bot, I recalled this scenario which
causes the issue:
1.When executing lld/test/ELF/invalid/symtab-sh-info.s L45, we reach
lld::elf::Obj-File::ObjFile() which goes straight into its base ELFFileBase(),
then ELFFileBase::init().
2.At that point fatal() is thrown in lld/ELF/InputFiles.cpp L381, leaving a
half-initialized ObjFile instance.
3.We then end up in lld::exitLld() and since we are running with LLD_IN_TEST, we
hapily restore the control flow to CrashRecoveryContext::RunSafely() then back
in lld::safeLldMain().
4.Before this patch, we called errorHandler().reset() just after, and this
attempted to reset the associated SpecificAlloc<ObjFile<ELF64LE>>. That tried
to free the half-initialized ObjFile instance, and more precisely its
ObjFile::dwarf member.
Sometimes that worked, sometimes it failed and was catched by the
CrashRecoveryContext. This scenario was the reason we called
errorHandler().reset() through a CrashRecoveryContext.
But in some rare cases, the above repro somehow corrupted the heap, creating a
stack overflow. When the CrashRecoveryContext's filter (that is,
__except (ExceptionFilter(GetExceptionInformation()))) tried to handle the
exception, it crashed again since the stack was exhausted -- and that took the
whole application down. That is the issue seen on the bot. Locally it happens
about 1 times out of 15.
Now this situation can happen anywhere in LLD. Since catching stack overflows is
not a reliable scenario ATM when using CrashRecoveryContext, we're now
preventing further re-entrance when such failures occur, by signaling
lld::SafeReturn::canRunAgain=false. When running with LLD_IN_TEST=2 (or above),
only one iteration will be executed, instead of two.
Differential Revision: https://reviews.llvm.org/D88348
Diffstat (limited to 'llvm/lib/Support/CrashRecoveryContext.cpp')
-rw-r--r-- | llvm/lib/Support/CrashRecoveryContext.cpp | 20 |
1 files changed, 20 insertions, 0 deletions
diff --git a/llvm/lib/Support/CrashRecoveryContext.cpp b/llvm/lib/Support/CrashRecoveryContext.cpp index 7609f04..77f0018 100644 --- a/llvm/lib/Support/CrashRecoveryContext.cpp +++ b/llvm/lib/Support/CrashRecoveryContext.cpp @@ -442,6 +442,26 @@ void CrashRecoveryContext::HandleExit(int RetCode) { llvm_unreachable("Most likely setjmp wasn't called!"); } +bool CrashRecoveryContext::throwIfCrash(int RetCode) { +#if defined(_WIN32) + // On Windows, the high bits are reserved for kernel return codes. Values + // starting with 0x80000000 are reserved for "warnings"; values of 0xC0000000 + // and up are for "errors". In practice, both are interpreted as a + // non-continuable signal. + unsigned Code = ((unsigned)RetCode & 0xF0000000) >> 28; + if (Code != 0xC && Code != 8) + return false; + ::RaiseException(RetCode, 0, 0, NULL); +#else + // On Unix, signals are represented by return codes of 128 or higher. + if (RetCode <= 128) + return false; + llvm::sys::unregisterHandlers(); + raise(RetCode - 128); +#endif + return true; +} + // FIXME: Portability. static void setThreadBackgroundPriority() { #ifdef __APPLE__ |