aboutsummaryrefslogtreecommitdiff
path: root/gdb/i386-tdep.c
diff options
context:
space:
mode:
authorAndrew Burgess <aburgess@redhat.com>2023-02-22 12:15:34 +0000
committerAndrew Burgess <aburgess@redhat.com>2023-04-06 14:22:10 +0100
commitcf141dd8ccd36efe833aae3ccdb060b517cc1112 (patch)
tree1265aef99419f042e91ed4a21490e5eb4a2938dc /gdb/i386-tdep.c
parent11aa9f628e28c077c860480571c152e07e6a4938 (diff)
downloadgdb-cf141dd8ccd36efe833aae3ccdb060b517cc1112.zip
gdb-cf141dd8ccd36efe833aae3ccdb060b517cc1112.tar.gz
gdb-cf141dd8ccd36efe833aae3ccdb060b517cc1112.tar.bz2
gdb: fix reg corruption from displaced stepping on amd64
This commit aims to address a problem that exists with the current approach to displaced stepping, and was identified in PR gdb/22921. Displaced stepping is currently supported on AArch64, ARM, amd64, i386, rs6000 (ppc), and s390. Of these, I believe there is a problem with the current approach which will impact amd64 and ARM, and can lead to random register corruption when the inferior makes use of asynchronous signals and GDB is using displaced stepping. The problem can be found in displaced_step_buffers::finish in displaced-stepping.c, and is this; after GDB tries to perform a displaced step, and the inferior stops, GDB classifies the stop into one of two states, either the displaced step succeeded, or the displaced step failed. If the displaced step succeeded then gdbarch_displaced_step_fixup is called, which has the job of fixing up the state of the current inferior as if the step had not been performed in a displaced manner. This all seems just fine. However, if the displaced step is considered to have not completed then GDB doesn't call gdbarch_displaced_step_fixup, instead GDB remains in displaced_step_buffers::finish and just performs a minimal fixup which involves adjusting the program counter back to its original value. The problem here is that for amd64 and ARM setting up for a displaced step can involve changing the values in some temporary registers. If the displaced step succeeds then this is fine; after the step the temporary registers are restored to their original values in the architecture specific code. But if the displaced step does not succeed then the temporary registers are never restored, and they retain their modified values. In this context a temporary register is simply any register that is not otherwise used by the instruction being stepped that the architecture specific code considers safe to borrow for the lifetime of the instruction being stepped. In the bug PR gdb/22921, the amd64 instruction being stepped is an rip-relative instruction like this: jmp *0x2fe2(%rip) When we displaced step this instruction we borrow a register, and modify the instruction to something like: jmp *0x2fe2(%rcx) with %rcx having its value adjusted to contain the original %rip value. Now if the displaced step does not succeed, then %rcx will be left with a corrupted value. Obviously corrupting any register is bad; in the bug report this problem was spotted because %rcx is used as a function argument register. And finally, why might a displaced step not succeed? Asynchronous signals provides one reason. GDB sets up for the displaced step and, at that precise moment, the OS delivers a signal (SIGALRM in the bug report), the signal stops the inferior at the address of the displaced instruction. GDB cancels the displaced instruction, handles the signal, and then tries again with the displaced step. But it is that first cancellation of the displaced step that causes the problem; in that case GDB (correctly) sees the displaced step as having not completed, and so does not perform the architecture specific fixup, leaving the register corrupted. The reason why I think AArch64, rs600, i386, and s390 are not effected by this problem is that I don't believe these architectures make use of any temporary registers, so when a displaced step is not completed successfully, the minimal fix up is sufficient. On amd64 we use at most one temporary register. On ARM, looking at arm_displaced_step_copy_insn_closure, we could modify up to 16 temporary registers, and the instruction being displaced stepped could be expanded to multiple replacement instructions, which increases the chances of this bug triggering. This commit only aims to address the issue on amd64 for now, though I believe that the approach I'm proposing here might be applicable for ARM too. What I propose is that we always call gdbarch_displaced_step_fixup. We will now pass an extra argument to gdbarch_displaced_step_fixup, this a boolean that indicates whether GDB thinks the displaced step completed successfully or not. When this flag is false this indicates that the displaced step halted for some "other" reason. On ARM GDB can potentially read the inferior's program counter in order figure out how far through the sequence of replacement instructions we got, and from that GDB can figure out what fixup needs to be performed. On targets like amd64 the problem is slightly easier as displaced stepping only uses a single replacement instruction. If the displaced step didn't complete the GDB knows that the single instruction didn't execute. The point is that by always calling gdbarch_displaced_step_fixup, each architecture can now ensure that the inferior state is fixed up correctly in all cases, not just the success case. On amd64 this ensures that we always restore the temporary register value, and so bug PR gdb/22921 is resolved. In order to move all architectures to this new API, I have moved the minimal roll-back version of the code inside the architecture specific fixup functions for AArch64, rs600, s390, and ARM. For all of these except ARM I think this is good enough, as no temporaries are used all that's needed is the program counter restore anyway. For ARM the minimal code is no worse than what we had before, though I do consider this architecture's displaced-stepping broken. I've updated the gdb.arch/amd64-disp-step.exp test to cover the 'jmpq*' instruction that was causing problems in the original bug, and also added support for testing the displaced step in the presence of asynchronous signal delivery. I've also added two new tests (for amd64 and i386) that check that GDB can correctly handle displaced stepping over a single instruction that branches to itself. I added these tests after a first version of this patch relied too much on checking the program-counter value in order to see if the displaced instruction had executed. This works fine in almost all cases, but when an instruction branches to itself a pure program counter check is not sufficient. The new tests expose this problem. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=22921 Approved-By: Pedro Alves <pedro@palves.net>
Diffstat (limited to 'gdb/i386-tdep.c')
-rw-r--r--gdb/i386-tdep.c24
1 files changed, 12 insertions, 12 deletions
diff --git a/gdb/i386-tdep.c b/gdb/i386-tdep.c
index e93479c..1ab9fc0 100644
--- a/gdb/i386-tdep.c
+++ b/gdb/i386-tdep.c
@@ -843,7 +843,7 @@ void
i386_displaced_step_fixup (struct gdbarch *gdbarch,
struct displaced_step_copy_insn_closure *closure_,
CORE_ADDR from, CORE_ADDR to,
- struct regcache *regs)
+ struct regcache *regs, bool completed_p)
{
enum bfd_endian byte_order = gdbarch_byte_order (gdbarch);
@@ -886,14 +886,14 @@ i386_displaced_step_fixup (struct gdbarch *gdbarch,
the displaced instruction; make it relative. Well, signal
handler returns don't need relocation either, but we use the
value of %eip to recognize those; see below. */
- if (! i386_absolute_jmp_p (insn)
- && ! i386_absolute_call_p (insn)
- && ! i386_ret_p (insn))
+ if (!completed_p
+ || (!i386_absolute_jmp_p (insn)
+ && !i386_absolute_call_p (insn)
+ && !i386_ret_p (insn)))
{
- ULONGEST orig_eip;
int insn_len;
- regcache_cooked_read_unsigned (regs, I386_EIP_REGNUM, &orig_eip);
+ CORE_ADDR pc = regcache_read_pc (regs);
/* A signal trampoline system call changes the %eip, resuming
execution of the main program after the signal handler has
@@ -910,25 +910,25 @@ i386_displaced_step_fixup (struct gdbarch *gdbarch,
it unrelocated. Goodness help us if there are PC-relative
system calls. */
if (i386_syscall_p (insn, &insn_len)
- && orig_eip != to + (insn - insn_start) + insn_len
+ && pc != to + (insn - insn_start) + insn_len
/* GDB can get control back after the insn after the syscall.
Presumably this is a kernel bug.
i386_displaced_step_copy_insn ensures its a nop,
we add one to the length for it. */
- && orig_eip != to + (insn - insn_start) + insn_len + 1)
+ && pc != to + (insn - insn_start) + insn_len + 1)
displaced_debug_printf ("syscall changed %%eip; not relocating");
else
{
- ULONGEST eip = (orig_eip - insn_offset) & 0xffffffffUL;
+ ULONGEST eip = (pc - insn_offset) & 0xffffffffUL;
/* If we just stepped over a breakpoint insn, we don't backup
the pc on purpose; this is to match behaviour without
stepping. */
- regcache_cooked_write_unsigned (regs, I386_EIP_REGNUM, eip);
+ regcache_write_pc (regs, eip);
displaced_debug_printf ("relocated %%eip from %s to %s",
- paddress (gdbarch, orig_eip),
+ paddress (gdbarch, pc),
paddress (gdbarch, eip));
}
}
@@ -941,7 +941,7 @@ i386_displaced_step_fixup (struct gdbarch *gdbarch,
/* If the instruction was a call, the return address now atop the
stack is the address following the copied instruction. We need
to make it the address following the original instruction. */
- if (i386_call_p (insn))
+ if (completed_p && i386_call_p (insn))
{
ULONGEST esp;
ULONGEST retaddr;