Age | Commit message (Collapse) | Author | Files | Lines |
|
2015-09-08 Sandra Loosemore <sandra@codesourcery.com>
gdb/testsuite/
* gdb.threads/hand-call-in-threads.exp: Make sure the thread
command actually switches threads. Give up on remaining
tests if target fails to stop at breakpoint.
|
|
On a target that is both always in non-stop mode and can do displaced
stepping (such as native x86_64 GNU/Linux, with "maint set
target-non-stop on"), the step-over-trips-on-watchpoint.exp test
sometimes fails like this:
(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: thread 1
set scheduler-locking off
(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off
step
-[Switching to Thread 0x7ffff7fc0700 (LWP 11782)]
-Hardware watchpoint 4: watch_me
-
-Old value = 0
-New value = 1
-child_function (arg=0x0) at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:39
-39 other = 1; /* set thread-specific breakpoint here */
-(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
+wait_threads () at /home/pedro/gdb/mygit/src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49
+49 return 1; /* in wait_threads */
+(gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
Note "scheduler-locking" was set off. The problem is that on such
targets, the step-over of thread 2 and the "step" of thread 1 can be
set to run simultaneously (since with displaced stepping the
breakpoint isn't ever removed from the target), and sometimes, the
"step" of thread 1 finishes first, so it'd take another resume to see
the watchpoint trigger. Fix this by replacing the wait_threads
function with a one-line infinite loop that doesn't call any function,
so that the "step" of thread 1 never finishes.
gdb/testsuite/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* gdb.threads/step-over-lands-on-breakpoint.c (wait_threads):
Delete function.
(main): Add alarm. Run an infinite loop instead of calling
wait_threads.
* gdb.threads/step-over-lands-on-breakpoint.exp (do_test): Change
comment.
* gdb.threads/step-over-trips-on-watchpoint.c (wait_threads):
Delete function.
(main): Add alarm. Run an infinite loop instead of calling
wait_threads.
* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Change
comment.
|
|
With "maint set target-non-stop on" we get:
-PASS: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step
+FAIL: gdb.threads/signal-while-stepping-over-bp-other-thread.exp: step
The issue is simply that switch_back_to_stepped_thread is not used in
non-stop mode, thus infrun doesn't output the expected "switching back
to stepped thread" log.
gdb/testsuite/ChangeLog:
2015-08-07 Pedro Alves <palves@redhat.com>
* signal-while-stepping-over-bp-other-thread.exp: Expect "restart
threads" as alternative to "switching back to stepped thread".
|
|
That was pushed by mistake.
|
|
This adds a kfailed test that has the whole process exit just while
several threads continuously step over a breakpoint. Usually, the
process exits just while GDB or GDBserver is handling the breakpoint
hit. In other words, the process disappears while the event thread is
(ptrace-) stopped. This exposes several issues in GDB and GDBserver.
Errors, crashes, etc.
I fixed some of these issues recently, but there's a lot more to do.
It's a bit like playing whack-a-mole at the moment. You fix an issue,
which then exposes several others.
E.g., with the native target, you get (among other errors):
(...)
[New Thread 0x7ffff47b9700 (LWP 18077)]
[New Thread 0x7ffff3fb8700 (LWP 18078)]
[New Thread 0x7ffff37b7700 (LWP 18079)]
Cannot find user-level thread for LWP 18076: generic error
(gdb) KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
gdb/testsuite/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
PR gdb/18749
* gdb.threads/process-dies-while-handling-bp.c: New file.
* gdb.threads/process-dies-while-handling-bp.exp: New file.
|
|
|
|
Ref: https://sourceware.org/ml/gdb-patches/2015-07/msg00868.html
This adds a test that has a multithreaded program have several threads
continuously fork, while another thread continuously steps over a
breakpoint.
This exposes several intertwined issues, which this patch addresses:
- When we're stopping and suspending threads, some thread may fork,
and we missed setting its suspend count to 1, like we do when a new
clone/thread is detected. When we next unsuspend threads, the fork
child's suspend count goes below 0, which is bogus and fails an
assertion.
- If a step-over is cancelled because a signal arrives, but then gdb
is not interested in the signal, we pass the signal straight back
to the inferior. However, we miss that we need to re-increment the
suspend counts of all other threads that had been paused for the
step-over. As a result, other threads indefinitely end up stuck
stopped.
- If a detach request comes in just while gdbserver is handling a
step-over (in the test at hand, this is GDB detaching the fork
child), gdbserver internal errors in stabilize_thread's helpers,
which assert that all thread's suspend counts are 0 (otherwise we
wouldn't be able to move threads out of the jump pads). The
suspend counts aren't 0 while a step-over is in progress, because
all threads but the one stepping past the breakpoint must remain
paused until the step-over finishes and the breakpoint can be
reinserted.
- Occasionally, we see "BAD - reinserting but not stepping." being
output (from within linux_resume_one_lwp_throw). That was because
GDB pokes memory while gdbserver is busy with a step-over, and that
suspends threads, and then re-resumes them with proceed_one_lwp,
which missed another reason to tell linux_resume_one_lwp that the
thread should be set back to stepping.
- In a couple places, we were resuming threads that are meant to be
suspended. E.g., when a vCont;c/s request for thread B comes in
just while gdbserver is stepping thread A past a breakpoint. The
resume for thread B must be deferred until the step-over finishes.
- The test runs with both "set detach-on-fork" on and off. When off,
it exercises the case of GDB detaching the fork child explicitly.
When on, it exercises the case of gdb resuming the child
explicitly. In the "off" case, gdb seems to exponentially become
slower as new inferiors are created. This is _very_ noticeable as
with only 100 inferiors gdb is crawling already, which makes the
test take quite a bit to run. For that reason, I've disabled the
"off" variant for now.
gdb/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* target/waitstatus.h (enum target_stop_reason)
<TARGET_STOPPED_BY_SINGLE_STEP>: New value.
gdb/gdbserver/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* linux-low.c (handle_extended_wait): Set the fork child's suspend
count if stopping and suspending threads.
(check_stopped_by_breakpoint): If stopped by trace, set the LWP's
stop reason to TARGET_STOPPED_BY_SINGLE_STEP.
(linux_detach): Complete an ongoing step-over.
(lwp_suspended_inc, lwp_suspended_decr): New functions. Use
throughout.
(resume_stopped_resumed_lwps): Don't resume a suspended thread.
(linux_wait_1): If passing a signal to the inferior after
finishing a step-over, unsuspend and re-resume all lwps. If we
see a single-step event but the thread should be continuing, don't
pass the trap to gdb.
(stuck_in_jump_pad_callback, move_out_of_jump_pad_callback): Use
internal_error instead of gdb_assert.
(enqueue_pending_signal): New function.
(check_ptrace_stopped_lwp_gone): Add debug output.
(start_step_over): Use internal_error instead of gdb_assert.
(complete_ongoing_step_over): New function.
(linux_resume_one_thread): Don't resume a suspended thread.
(proceed_one_lwp): If the LWP is stepping over a breakpoint, reset
it stepping.
gdb/testsuite/ChangeLog:
2015-08-06 Pedro Alves <palves@redhat.com>
* gdb.threads/forking-threads-plus-breakpoint.exp: New file.
* gdb.threads/forking-threads-plus-breakpoint.c: New file.
|
|
At https://sourceware.org/ml/gdb-patches/2015-08/msg00097.html, Joel
observed that trying to next/step a program on GNU/Linux sometimes
results in the following failed assertion:
% gdb -q .obj/gprof/main
(gdb) start
(gdb) n
(gdb) step
[...]/infrun.c:2391: internal-error:
resume: Assertion `sig != GDB_SIGNAL_0' failed.
What happened is that, during the "next" operation, GDB hit a
longjmp/exception/step-resume breakpoint but failed to see that this
breakpoint was set for a different thread than the one being stepped.
Joel's detailed analysis follows:
More precisely, at the end of the "start" command, we are stopped at
the start of function Main in main.adb; there are 4 threads in total,
and we are in the main thread (which is thread 1):
(gdb) info thread
Id Target Id Frame
4 Thread 0xb7a56ba0 (LWP 28379) 0xffffe410 in __kernel_vsyscall ()
3 Thread 0xb7c5aba0 (LWP 28378) 0xffffe410 in __kernel_vsyscall ()
2 Thread 0xb7e5eba0 (LWP 28377) 0xffffe410 in __kernel_vsyscall ()
* 1 Thread 0xb7ea18c0 (LWP 28370) main () at /[...]/main.adb:57
All the logs below reference Thread ID/LWP, but it'll be easier to
talk about the threads by GDB thread number. For instance, thread 1
is LWP 28370 while thread 3 is LWP 28378. So, the explanations below
translate the LWPs into thread numbers.
Back to what happens while we are trying to "next' our program:
(gdb) n
infrun: clear_proceed_status_thread (Thread 0xb7a56ba0 (LWP 28379))
infrun: clear_proceed_status_thread (Thread 0xb7c5aba0 (LWP 28378))
infrun: clear_proceed_status_thread (Thread 0xb7e5eba0 (LWP 28377))
infrun: clear_proceed_status_thread (Thread 0xb7ea18c0 (LWP 28370))
infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x805451e
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28370.0 [Thread 0xb7ea18c0 (LWP 28370)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x8054523
We've resumed thread 1 (LWP 28370), and received in return a signal
that the same thread stopped slightly further. It's still in the
range of instructions for the line of source we started the "next"
from, as evidenced by the following trace...
infrun: stepping inside range [0x805451e-0x8054531]
... and thus, we decide to continue stepping the same thread:
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
infrun: prepare_to_wait
That's when we get an event from a different thread (thread 3)...
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x80782d0
infrun: context switch
infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
... which we find to be at the address where we set a breakpoint on
"the unwinder debug hook" (namely "_Unwind_DebugHook"). But GDB fails
to notice that the breakpoint was inserted for thread 1 only, and so
decides to handle it as...
infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME
... and inserts a breakpoint at the corresponding resume address, as
evidenced by this the next log:
infrun: exception resume at 80542a2
That breakpoint seems innocent right now, but will play a role fairly
quickly. But for now, GDB has inserted the exception-resume
breakpoint, and needs to single-step thread 3 past the breakpoint it
just hit. Thus, it temporarily disables the exception breakpoint, and
requests a step of that thread:
infrun: skipping breakpoint: stepping past insn at: 0x80782d0
infrun: skipping breakpoint: stepping past insn at: 0x80782d0
infrun: skipping breakpoint: stepping past insn at: 0x80782d0
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0xb7c5aba0 (LWP 28378)] at 0x80782d0
infrun: prepare_to_wait
We then get a notification, still from thread 3, that it's now past
that breakpoint...
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x8078424
... so we can resume what we were doing before, which is single-stepping
thread 1 until we get to a new line of code:
infrun: switching back to stepped thread
infrun: Switching context from Thread 0xb7c5aba0 (LWP 28378) to Thread 0xb7ea18c0 (LWP 28370)
infrun: expected thread still hasn't advanced
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
The "resume" log above shows that we're resuming thread 1 from where
we left off (0x8054523). We get one more stop at 0x8054529, which is
still inside our stepping range so we go again. That's when we get
the following event, from thread 3:
infrun: prepare_to_wait
infrun: target_wait (-1.0.0, status) =
infrun: 28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x80542a2
Now the stop_pc address is interesting, because it's the address of
"exception resume" breakpoint...
infrun: context switch
infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
infrun: BPSTAT_WHAT_CLEAR_LONGJMP_RESUME
... and since that location is at a different line of code, this is
where it decides the "next" operation should stop:
infrun: stop_waiting
[Switching to Thread 0xb7c5aba0 (LWP 28378)]
0x080542a2 in inte_tache_rt.ttache_rt (
<_task>=0x80968ec <inte_tache_rt_inst.tache2>)
at /[...]/inte_tache_rt.adb:54
54 end loop;
However, what GDB should have noticed earlier that the exception
breakpoint we hit was for a different thread, thus should have
single-stepped that thread out of the breakpoint _without_ inserting
the exception-return breakpoint, and then resumed the single-stepping
of the initial thread (thread 1) until that thread stepped out of its
stepping range.
This is what this patch does, and after applying it, GDB now correctly
stops on the next line of code.
The patch adds a C++ test that exercises this, both for setjmp/longjmp
and exception breakpoints. With an unpatched GDB it shows:
(gdb) next
[Switching to Thread 22445.22455]
thread_try_catch (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/next-other-thr-longjmp.c:59
59 catch (...)
(gdb) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 1
next
/home/pedro/gdb/mygit/build/../src/gdb/infrun.c:4865: internal-error: process_event_stop_test: Assertion `ecs->event_thread->control.exception_resume_breakpoint != NULL' fa
iled.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 2 (GDB internal error)
Resyncing due to internal error.
n
Tested on x86_64-linux, no regressions.
gdb/ChangeLog:
2015-08-05 Pedro Alves <palves@redhat.com>
Joel Brobecker <brobecker@adacore.com>
* breakpoint.c (bpstat_what) <bp_longjmp, bp_longjmp_call_dummy>
<bp_exception, bp_longjmp_resume, bp_exception_resume>: Handle the
case where BS->STOP is not set.
gdb/testsuite/ChangeLog:
2015-08-05 Pedro Alves <palves@redhat.com>
* gdb.threads/next-while-other-thread-longjmps.c: New file.
* gdb.threads/next-while-other-thread-longjmps.exp: New file.
|
|
(attach-many-short-lived-thread.exp races and others)
The buildbots show that attach-many-short-lived-thread.exp is racy.
But after staring at debug logs and playing with SystemTap scripts for
a (long) while, I figured out that neither GDB, nor the kernel nor the
test's program itself are at fault.
The problem is simply that the testsuite machinery is currently
subject to PID-reuse races. The attach-many-short-lived-threads.c
test program just happens to be much more susceptible to trigger this
race because threads and processes share the same number space on
Linux, and the test spawns many many short lived threads in
succession, thus enlarging the race window a lot.
Part of the problem is that several tests spawn processes with "exec&"
(in order to test the "attach" command) , and then at the end of the
test, to make sure things are cleaned up, issue a 'remote_spawn "kill
-p $testpid"'. Since with tcl's "exec&", tcl itself is responsible
for reaping the process's exit status, when we go kill the process,
testpid may have already exited _and_ its status may have (and often
has) been reaped already. Thus it can happen that another process
meanwhile reuses $testpid, and that "kill" command kills the wrong
process... Frequently, that happens to be
attach-many-short-lived-thread, but this explains other test's races
as well.
In the attach-many-short-lived-threads test, it sometimes manifests
like this:
(gdb) file /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads
Reading symbols from /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads...done.
(gdb) Loaded /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads into /home/pedro/gdb/mygit/build/gdb/testsuite/../../gdb/gdb
attach 5940
Attaching to program: /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/attach-many-short-lived-threads, process 5940
warning: process 5940 is a zombie - the process has already terminated
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ptrace: Operation not permitted.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: attach
info threads
No threads.
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: no new threads
set breakpoint always-inserted on
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 1: set breakpoint always-inserted on
Other times the process dies while the test is ongoing (the process is
ptrace-stopped):
(gdb) print again = 1
Cannot access memory at address 0x6020cc
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: reset timer in the inferior
(Recall that on Linux, SIGKILL is not interceptable)
And other times it dies just while we're detaching:
$4 = 319
(gdb) PASS: gdb.threads/attach-many-short-lived-threads.exp: iter 2: print seconds_left
detach
Can't detach Thread 0x7fb13b7de700 (LWP 1842): No such process
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 2: detach
GDB mishandles the latter (it should ignore ESRCH while detaching just
like when continuing), but that's another story.
The fix here is to change spawn_wait_for_attach to use Expect's
'spawn' command instead of Tcl's 'exec&' to spawn programs, because
with spawn we control when to wait for/reap the process. That allows
killing the process by PID without being subject to pid-reuse races,
because even if the process is already dead, the kernel won't reuse
the process's PID until the zombie is reaped.
The other part of the problem lies in DejaGnu itself, unfortunately.
I have occasionally seen tests (attach-many-short-lived-threads
included, but not only that one) die with a random inexplicable
SIGTERM too, and that too is caused by the same reason, except that in
that case, the rogue SIGTERM is sent from this bit in DejaGnu's remote.exp:
exec sh -c "exec > /dev/null 2>&1 && (kill -2 $pgid || kill -2 $pid) && sleep 5 && (kill $pgid || kill $pid) && sleep 5 && (kill -9 $pgid || kill -9 $pid) &"
...
catch "wait -i $shell_id"
Even if the program exits promptly, that whole cascade of kills
carries on in the background, thus potentially killing the poor
process that manages to reuse $pid...
I sent a fix for that to the DejaGnu list:
http://lists.gnu.org/archive/html/dejagnu/2015-07/msg00000.html
With both patches in place, I haven't seen
attach-many-short-lived-threads.exp fail again.
Tested on x86_64 Fedora 20, native, gdbserver and extended-gdbserver.
gdb/testsuite/ChangeLog:
2015-07-31 Pedro Alves <palves@redhat.com>
* gdb.base/attach-pie-misread.exp: Rename $res to $test_spawn_id.
Use spawn_id_get_pid. Wait for spawn id after eof. Use
kill_wait_spawned_process instead of explicit "kill -9".
* gdb.base/attach-pie-noexec.exp: Adjust to spawn_wait_for_attach
returning a spawn id instead of a pid. Use spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/attach-twice.exp: Likewise.
* gdb.base/attach.exp: Likewise.
(do_command_attach_tests): Use gdb_spawn_with_cmdline_opts and
gdb_test_multiple.
* gdb.base/solib-overlap.exp: Adjust to spawn_wait_for_attach
returning a spawn id instead of a pid. Use spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/valgrind-infcall.exp: Likewise.
* gdb.multi/multi-attach.exp: Likewise.
* gdb.python/py-prompt.exp: Likewise.
* gdb.python/py-sync-interp.exp: Likewise.
* gdb.server/ext-attach.exp: Likewise.
* gdb.threads/attach-into-signal.exp (corefunc): Use
spawn_wait_for_attach, spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.threads/attach-many-short-lived-threads.exp: Adjust to
spawn_wait_for_attach returning a spawn id instead of a pid. Use
spawn_id_get_pid and kill_wait_spawned_process.
* gdb.threads/attach-stopped.exp (corefunc): Use
spawn_wait_for_attach, spawn_id_get_pid and
kill_wait_spawned_process.
* gdb.base/break-interp.exp: Rename $res to $test_spawn_id.
Use spawn_id_get_pid. Wait for spawn id after eof. Use
kill_wait_spawned_process instead of explicit "kill -9".
* lib/gdb.exp (can_spawn_for_attach): Adjust comment.
(kill_wait_spawned_process, spawn_id_get_pid): New procedures.
(spawn_wait_for_attach): Use spawn instead of exec to spawn
processes. Don't map cygwin/windows pids here. Now returns a
spawn id list.
|
|
Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:
(gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
continue &
Continuing.
(gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
[New Thread 28092.28092]
[Thread 28092.28092] #2 stopped.
[New Thread 28094.28094]
[Inferior 2 (process 28092) exited normally]
[New Thread 28094.28105]
[New Thread 28094.28109]
...
[Thread 28174.28174] #18 stopped.
[New Thread 28185.28185]
[Inferior 10 (process 28174) exited normally]
[New Thread 28185.28196]
[Thread 28185.28185] #20 stopped.
Cannot remove breakpoints because program is no longer writable.
Further execution is probably impossible.
[Inferior 11 (process 28185) exited normally]
[Inferior 1 (process 28091) exited normally]
PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
info threads
No threads.
(gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
info inferiors
Num Description Executable
* 1 <null> /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
(gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left
All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.
The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event. GDB later decides whether to follow the parent
or the child. But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:
...
>>>> entering unstop_all_lwps
unstopping all lwps
proceed_one_lwp: lwp 1600
client wants LWP to remain 1600 stopped
proceed_one_lwp: lwp 1828
Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sending sigstop to lwp 1828
pc is 0x3615ebc7cc
Resuming lwp 1828 (continue, signal 0, stop expected)
continue from pc 0x3615ebc7cc
unstop_all_lwps done
sigchld_handler
<<<< exiting unstop_all_lwps
handling possible target event
>>>> entering linux_wait_1
linux_wait_1: [<all threads>]
my_waitpid (-1, 0x40000001)
my_waitpid (-1, 0x1): status(137f), 1828
LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
LLW: waitpid 1828 received Stopped (signal) (stopped)
pc is 0x3615ebc7cc
Expected stop.
LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
linux_wait_1 ret = LWP 1828.1828, 1, 0
<<<< exiting linux_wait_1
Writing resume reply for LWP 1828.1828:1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Tested on x86_64 Fedora 20, extended-remote.
gdb/gdbserver/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
* linux-low.c (handle_extended_wait): Set the child's last
reported status to TARGET_WAITKIND_STOPPED.
|
|
The new gdb.threads/fork-plus-threads.exp test exposes one more
problem. When one types "info inferiors" after running the program,
one see's a couple inferior left still, while there should only be
inferior #1 left. E.g.:
(gdb) info inferiors
Num Description Executable
4 process 8393 /home/pedro/bugs/src/test
2 process 8388 /home/pedro/bugs/src/test
* 1 <null> /home/pedro/bugs/src/test
(gdb) info threads
Calling prune_inferiors() manually at this point (from a top gdb) does
not remove them, because they still have inf->pid != 0 (while they
shouldn't). This suggests that we never mourned those inferiors.
Enabling logs (master + previous patch) we see:
...
WL: waitpid Thread 0x7ffff7fc2740 (LWP 9513) received Trace/breakpoint trap (stopped)
WL: Handling extended status 0x03057f
LHEW: Got clone event from LWP 9513, new child is LWP 9579
[New Thread 0x7ffff37b8700 (LWP 9579)]
WL: waitpid Thread 0x7ffff7fc2740 (LWP 9508) received 0 (exited)
WL: Thread 0x7ffff7fc2740 (LWP 9508) exited.
^^^^^^^^
[Thread 0x7ffff7fc2740 (LWP 9508) exited]
WL: waitpid Thread 0x7ffff7fc2740 (LWP 9499) received 0 (exited)
WL: Thread 0x7ffff7fc2740 (LWP 9499) exited.
[Thread 0x7ffff7fc2740 (LWP 9499) exited]
RSRL: resuming stopped-resumed LWP Thread 0x7ffff37b8700 (LWP 9579) at 0x3615ef4ce1: step=0
...
(gdb) info inferiors
Num Description Executable
5 process 9508 /home/pedro/bugs/src/test
^^^^
4 process 9503 /home/pedro/bugs/src/test
3 process 9500 /home/pedro/bugs/src/test
2 process 9499 /home/pedro/bugs/src/test
* 1 <null> /home/pedro/bugs/src/test
(gdb)
...
Note the "Thread 0x7ffff7fc2740 (LWP 9508) exited." line.
That's this in wait_lwp:
/* Check if the thread has exited. */
if (WIFEXITED (status) || WIFSIGNALED (status))
{
thread_dead = 1;
if (debug_linux_nat)
fprintf_unfiltered (gdb_stdlog, "WL: %s exited.\n",
target_pid_to_str (lp->ptid));
}
}
That was the leader thread reporting an exit, meaning the whole
process is gone. So the problem is that this code doesn't understand
that an WIFEXITED status of the leader LWP should be reported to
infrun as process exit.
gdb/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
PR threads/18600
* linux-nat.c (wait_lwp): Report to the core when thread group
leader exits.
gdb/testsuite/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
PR threads/18600
* gdb.threads/fork-plus-threads.exp: Test that "info inferiors"
only shows inferior 1.
|
|
When a program forks and another process start threads while gdb is
handling the fork event, newly created threads are left stuck stopped
by gdb, even though gdb presents them as "running", to the user.
This can be seen with the test added by this patch. The test has the
inferior fork a certain number of times and waits for all children to
exit. Each fork child spawns a number of threads that do nothing and
joins them immediately. Normally, the program should run unimpeded
(from the point of view of the user) and exit very quickly. Without
this fix, it doesn't because of some threads left stopped by gdb, so
inferior 1 never exits.
The program triggers when a new clone thread is found while inside the
linux_stop_and_wait_all_lwps call in linux-thread-db.c:
linux_stop_and_wait_all_lwps ();
ALL_LWPS (lp)
if (ptid_get_pid (lp->ptid) == pid)
thread_from_lwp (lp->ptid);
linux_unstop_all_lwps ();
Within linux_stop_and_wait_all_lwps, we reach
linux_handle_extended_wait with the "stopping" parameter set to 1, and
because of that we don't mark the new lwp as resumed. As consequence,
the subsequent resume_stopped_resumed_lwps, called from
linux_unstop_all_lwps, never resumes the new LWP.
There's lots of cruft in linux_handle_extended_wait that no longer
makes sense. On systems with CLONE events support, we don't rely on
libthread_db for thread listing anymore, so the code that preserves
stop_requested and the handling of last_resume_kind is all dead.
So the fix is to remove all that, and simply always mark the new LWP
as resumed, so that resume_stopped_resumed_lwps re-resumes it.
gdb/ChangeLog:
2015-07-30 Pedro Alves <palves@redhat.com>
Simon Marchi <simon.marchi@ericsson.com>
PR threads/18600
* linux-nat.c (linux_handle_extended_wait): On CLONE event, always
mark the new thread as resumed. Remove STOPPING parameter.
(wait_lwp): Adjust call to linux_handle_extended_wait.
(linux_nat_filter_event): Adjust call to
linux_handle_extended_wait.
(resume_stopped_resumed_lwps): Add debug output.
gdb/testsuite/ChangeLog:
2015-07-30 Simon Marchi <simon.marchi@ericsson.com>
Pedro Alves <palves@redhat.com>
PR threads/18600
* gdb.threads/fork-plus-threads.c: New file.
* gdb.threads/fork-plus-threads.exp: New file.
|
|
Hi,
While examining BuildBot's logs, I noticed:
<https://sourceware.org/ml/gdb-testers/2015-q3/msg03767.html>
gdb.threads/attach-into-signal.exp has two nested loops and don't use
unique messages. This commit fixes that. Pushed under the obvious
rule.
gdb/testsuite/ChangeLog:
2015-07-29 Sergio Durigan Junior <sergiodj@redhat.com>
* gdb.threads/attach-into-signal.exp (corefunc): Use
with_test_prefix on nested loops, uniquefying the test messages.
|
|
If a non-leader thread exits the process while all other threads are
ptrace-stopped, native gdb fails an assertion. The test added by this
commit catches it:
/home/pedro/gdb/mygit/build/../src/gdb/linux-nat.c:3198: internal-error: linux_nat_filter_event: Assertion `lp->resumed' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
FAIL: gdb.threads/non-leader-exit-process.exp: program exits normally (GDB internal error)
The fix is just to remove the assertion.
With that out of the way, neither GDB not GDBserver handle this
perfectly though, so I'm adding a KFAIL:
(gdb) continue
Continuing.
[Thread 0x7ffff7fc0700 (LWP 15350) exited]
No unwaited-for children left.
Couldn't get registers: No such process.
(gdb) KFAIL: gdb.threads/non-ldr-exit.exp: program exits normally (PRMS: gdb/18717)
gdb/ChangeLog:
2015-07-24 Pedro Alves <palves@redhat.com>
PR gdb/18717
* linux-nat.c (linux_nat_filter_event): Don't assert that the lwp
is resumed, and extend the debug log.
gdb/testsuite/ChangeLog:
2015-07-24 Pedro Alves <palves@redhat.com>
PR gdb/18717
* gdb.threads/non-ldr-exit.c: New file.
* gdb.threads/non-ldr-exit.exp: New file.
|
|
Refs:
https://sourceware.org/ml/gdb/2015-03/msg00024.html
https://sourceware.org/ml/gdb/2015-06/msg00005.html
On GNU/Linux, if an infcall spawns a thread, that thread ends up with
stuck running state. This happens because:
- when linux-nat.c detects a new thread, it marks them as running,
and does not report anything to the core.
- we skip finish_thread_state when the thread that is running the
infcall stops.
As result, that new thread ends up with stuck "running" state, even
though it really is stopped.
On Windows, _all_ threads end up stuck in running state, not just the
one that was spawned. That happens because when a new thread is
detected, unlike linux-nat.c, windows-nat.c reports
TARGET_WAITKIND_SPURIOUS to infrun. It's the fact that that event
does not cause a user-visible stop that triggers the problem. When
the target is re-resumed, we call set_running with a wildcard ptid,
which marks all thread as running. That set_running is not suppressed
because the (leader) thread being resumed does not have in_infcall
set. Later, when the infcall finally finishes successfully, nothing
marks all threads back to stopped.
We can trigger the same problem on all targets by having a thread
other than the one that is running the infcall report a breakpoint hit
to infrun, and then have that breakpoint not cause a stop. That's
what the included test does.
The fix is to stop GDB from suppressing the set_running calls while
doing an infcall, and then set the threads back to stopped when the
call finishes, iff they were originally stopped before the infcall
started. (Note the MI *running/*stopped event suppression isn't
affected.)
Tested on x86_64 GNU/Linux.
gdb/ChangeLog:
2015-06-29 Pedro Alves <palves@redhat.com>
PR threads/18127
* infcall.c (run_inferior_call): On infcall success, if the thread
was marked stopped before, reset it back to stopped.
* infrun.c (resume): Don't suppress the set_running calls when
doing an infcall.
(normal_stop): Only discard the finish_thread_state cleanup if the
infcall succeeded.
gdb/testsuite/ChangeLog:
2015-06-29 Pedro Alves <palves@redhat.com>
PR threads/18127
* gdb.threads/hand-call-new-thread.c: New file.
* gdb.threads/hand-call-new-thread.c: New file.
|
|
gdb/testsuite/ChangeLog:
2015-04-10 Pedro Alves <palves@redhat.com>
* gdb.threads/signal-while-stepping-over-bp-other-thread.exp: Use
gdb_test_sequence and gdb_assert.
|
|
Diffing test results, I noticed:
-PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b *0x0000000000400811 thread 1
+PASS: gdb.threads/step-over-trips-on-watchpoint.exp: displaced=on: with thread-specific bp: next: b *0x00000000004007d1 thread 1
gdb/testsuite/ChangeLog:
2015-04-10 Pedro Alves <palves@redhat.com>
* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Use
test messages that don't include the breakpoint address.
|
|
stepping
These tests exercise the infrun.c:proceed code that needs to know to
start new step overs (along with switch_back_to_stepped_thread, etc.).
That code is tricky to get right in the multitude of possible
combinations (at least):
(native | remote)
X (all-stop | all-stop-but-target-always-in-non-stop)
X (displaced-stepping | in-line step-over).
The first two above are properties of the target, but the different
step-over-breakpoint methods should work with any target that supports
them. This patch makes sure we always test both methods on all
targets.
Tested on x86-64 Fedora 20.
gdb/testsuite/ChangeLog:
2015-04-10 Pedro Alves <palves@redhat.com>
* gdb.threads/step-over-lands-on-breakpoint.exp (do_test): New
procedure, factored out from ...
(top level): ... here. Add "set displaced-stepping" testing axis.
* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): New
parameter "displaced". Use it.
(top level): Use foreach and add "set displaced-stepping" testing
axis.
|
|
This test is currently failing like this on (at least) PPC64 and s390x:
FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: next: next
FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: step: step
FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: with thread-specific bp: next: next
gdb.log:
(gdb) PASS: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: set scheduler-locking off
step
wait_threads () at ../../../src/gdb/testsuite/gdb.threads/step-over-trips-on-watchpoint.c:49
49 return 1; /* in wait_threads */
(gdb) FAIL: gdb.threads/step-over-trips-on-watchpoint.exp: no thread-specific bp: step: step
The problem is that the test assumes that both the "watch_me = 1;" and
the "other = 1;" lines compile to a single instruction each, which
happens to be true on x86, but no necessarily true everywhere else.
The result is that the test doesn't really test what it wants to test.
Fix it by looking for the instruction that triggers the watchpoint.
gdb/ChangeLog:
2015-04-10 Pedro Alves <palves@redhat.com>
* gdb.threads/step-over-trips-on-watchpoint.c (child_function):
Remove comment.
* gdb.threads/step-over-trips-on-watchpoint.exp (do_test): Find
both the address of the instruction that triggers the watchpoint
and the address of the instruction immediately after, and use
those addresses for the test. Fix comment.
|
|
The problem is that with hardware step targets and displaced stepping,
"signal FOO" when stopped at a breakpoint steps the breakpoint
instruction at the same time it delivers a signal. This results in
tp->stepped_breakpoint set, but no step-resume breakpoint set. When
the next stop event arrives, GDB crashes. Irrespective of whether we
should do something more/different to step past the breakpoint in this
scenario (e.g., PR 18225), it's just wrong to assume there'll be a
step-resume breakpoint set (and was not the original intention).
gdb/ChangeLog:
2015-04-10 Pedro Alves <palves@redhat.com>
PR gdb/18216
* infrun.c (process_event_stop_test): Don't assume a step-resume
is set if tp->stepped_breakpoint is true.
gdb/testsuite/ChangeLog:
2015-04-10 Pedro Alves <palves@redhat.com>
PR gdb/18216
* gdb.threads/multiple-step-overs.exp: Remove expected eof.
|
|
Both PRs are triggered by the same use case.
PR18214 is about software single-step targets. On those, the 'resume'
code that detects that we're stepping over a breakpoint and delivering
a signal at the same time:
/* Currently, our software single-step implementation leads to different
results than hardware single-stepping in one situation: when stepping
into delivering a signal which has an associated signal handler,
hardware single-step will stop at the first instruction of the handler,
while software single-step will simply skip execution of the handler.
...
Fortunately, we can at least fix this particular issue. We detect
here the case where we are about to deliver a signal while software
single-stepping with breakpoints removed. In this situation, we
revert the decisions to remove all breakpoints and insert single-
step breakpoints, and instead we install a step-resume breakpoint
at the current address, deliver the signal without stepping, and
once we arrive back at the step-resume breakpoint, actually step
over the breakpoint we originally wanted to step over. */
doesn't handle the case of _another_ thread also needing to step over
a breakpoint. Because the other thread is just resumed at the PC
where it had stopped and a breakpoint is still inserted there, the
thread immediately re-traps the same breakpoint. This test exercises
that. On software single-step targets, it fails like this:
KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler
KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr2: continue to sigusr1_handler
gdb.log (simplified):
(gdb) continue
Continuing.
Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
66 callme (); /* set breakpoint thread 2 here */
(gdb) thread 3
(gdb) queue-signal SIGUSR1
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fc1740 (LWP 24824))]
#0 main () at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:106
106 wait_threads (); /* set wait-threads breakpoint here */
(gdb) break sigusr1_handler
Breakpoint 5 at 0x400837: file src/gdb/testsuite/gdb.threads/multiple-step-overs.c, line 31.
(gdb) continue
Continuing.
[Switching to Thread 0x7ffff7fc0700 (LWP 24828)]
Breakpoint 4, child_function_2 (arg=0x0) at src/gdb/testsuite/gdb.threads/multiple-step-overs.c:66
66 callme (); /* set breakpoint thread 2 here */
(gdb) KFAIL: gdb.threads/multiple-step-overs.exp: displaced=off: signal thr3: continue to sigusr1_handler
For good measure, I made the test try displaced stepping too. And
then I found it crashes GDB on x86-64 (a hardware step target), but
only when displaced stepping... :
KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr1: continue to sigusr1_handler (PRMS: gdb/18216)
KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr2: continue to sigusr1_handler (PRMS: gdb/18216)
KFAIL: gdb.threads/multiple-step-overs.exp: displaced=on: signal thr3: continue to sigusr1_handler (PRMS: gdb/18216)
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
4964 if (sr_bp->loc->permanent
Setting up the environment for debugging gdb.
Breakpoint 1 at 0x79fcfc: file src/gdb/common/errors.c, line 54.
Breakpoint 2 at 0x50a26c: file src/gdb/cli/cli-cmds.c, line 217.
(top-gdb) p sr_bp
$1 = (struct breakpoint *) 0x0
(top-gdb) bt
#0 0x000000000062a83a in process_event_stop_test (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4964
#1 0x000000000062a1af in handle_signal_stop (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4715
#2 0x0000000000629097 in handle_inferior_event (ecs=0x7fff847eeee0) at src/gdb/infrun.c:4165
#3 0x0000000000627482 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3298
#4 0x000000000064ad7b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#5 0x00000000004c375f in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4658
#6 0x0000000000648c47 in handle_file_event (file_ptr=0x2e0eaa0, ready_mask=1) at src/gdb/event-loop.c:658
The all-stop-non-stop series fixes this, but meanwhile, this augments
the multiple-step-overs.exp test to cover this, KFAILed.
gdb/testsuite/ChangeLog:
2015-04-08 Pedro Alves <palves@redhat.com>
PR gdb/18214
PR gdb/18216
* gdb.threads/multiple-step-overs.c (sigusr1_handler): New
function.
(main): Install it as SIGUSR1 handler.
* gdb.threads/multiple-step-overs.exp (setup): Remove 'prefix'
parameter. Always use "setup" as prefix. Toggle "set
displaced-stepping" off/on depending on global. Don't switch to
thread 1 here.
(top level): Add displaced stepping "off/on" test axis. Update
"setup" calls. Wrap each subtest with with_test_prefix. Test
continuing with a queued signal in each thread.
|
|
Nowadays, the alarm value is 60, and alarm is generated on some slow
boards. This patch is to pass DejaGNU timeout value to the program,
and move the alarm call before going to infinite loop. If any thread
has activities, the alarm is reset.
gdb/testsuite:
2015-04-07 Yao Qi <yao.qi@linaro.org>
* gdb.threads/non-stop-fair-events.c (SECONDS): New macro.
(child_function): Call alarm.
(main): Move call to alarm into the loop.
* gdb.threads/non-stop-fair-events.exp: Build program with
-DTIMEOUT=$timeout.
|
|
I see these two fails in no-unwaited-for-left.exp in remote testing
for aarch64-linux target.
...
continue
Continuing.
warning: Remote failure reply: E.No unwaited-for children left.
[Thread 1084] #2 stopped.
(gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when thread 2 exits
....
continue
Continuing.
warning: Remote failure reply: E.No unwaited-for children left.
[Thread 1081] #1 stopped.
(gdb) FAIL: gdb.threads/no-unwaited-for-left.exp: continue stops when the main thread exits
I checked the gdb.log on buildbot, and find that these two fails also
appear on Debian-i686-native-extended-gdbserver and Fedora-ppc64be-native-gdbserver-m64.
I recall that they are about local/remote parity, and related RSP is missing.
There has been already a PR 14618 about it. This patch is to kfail them
on remote target.
gdb/testsuite:
2015-04-02 Yao Qi <yao.qi@linaro.org>
* gdb.threads/no-unwaited-for-left.exp: Set up kfail if target
is remote.
|
|
If interrupt_and_wait manages to trigger the FAIL path, we get:
ERROR OCCURED: can't read "test": no such variable
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/manythreads.exp (interrupt_and_wait): Pass $message
to fail instead of non-existent $test.
|
|
On GNU/Linux, if the target reuses the TID of a thread that GDB still
has in its list marked as THREAD_EXITED, GDB crashes, like:
(gdb) continue
Continuing.
src/gdb/thread.c:789: internal-error: set_running: Assertion `tp->state != THREAD_EXITED' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) FAIL: gdb.threads/tid-reuse.exp: continue to breakpoint: after_reuse_time (GDB internal error)
Here:
(top-gdb) bt
#0 internal_error (file=0x953dd8 "src/gdb/thread.c", line=789, fmt=0x953da0 "%s: Assertion `%s' failed.")
at src/gdb/common/errors.c:54
#1 0x0000000000638514 in set_running (ptid=..., running=1) at src/gdb/thread.c:789
#2 0x00000000004bda42 in linux_handle_extended_wait (lp=0x16f5760, status=0, stopping=0) at src/gdb/linux-nat.c:2114
#3 0x00000000004bfa24 in linux_nat_filter_event (lwpid=20570, status=198015) at src/gdb/linux-nat.c:3127
#4 0x00000000004c070e in linux_nat_wait_1 (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3478
#5 0x00000000004c1015 in linux_nat_wait (ops=0xe193d0, ptid=..., ourstatus=0x7fffffffd2c0, target_options=1) at src/gdb/linux-nat.c:3722
#6 0x00000000004c92d2 in thread_db_wait (ops=0xd80b60 <thread_db_ops>, ptid=..., ourstatus=0x7fffffffd2c0, options=1)
at src/gdb/linux-thread-db.c:1525
#7 0x000000000066db43 in delegate_wait (self=0xd80b60 <thread_db_ops>, arg1=..., arg2=0x7fffffffd2c0, arg3=1) at src/gdb/target-delegates.c:116
#8 0x000000000067e54b in target_wait (ptid=..., status=0x7fffffffd2c0, options=1) at src/gdb/target.c:2206
#9 0x0000000000625111 in fetch_inferior_event (client_data=0x0) at src/gdb/infrun.c:3275
#10 0x0000000000648a3b in inferior_event_handler (event_type=INF_REG_EVENT, client_data=0x0) at src/gdb/inf-loop.c:56
#11 0x00000000004c2ecb in handle_target_event (error=0, client_data=0x0) at src/gdb/linux-nat.c:4655
I managed to come up with a test that reliably reproduces this. It
spawns enough threads for the pid number space to wrap around, so
could potentially take a while. On my box that's 4 seconds; on
gcc110, a PPC box which has max_pid set to 65536, it's over 10
seconds. So I made the test compute how long that would take, and cap
the time waited if it would be unreasonably long.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* linux-thread-db.c (record_thread): Readd the thread to gdb's
list if it was marked exited.
gdb/testsuite/ChangeLog:
2015-04-01 Pedro Alves <palves@redhat.com>
* gdb.threads/tid-reuse.c: New file.
* gdb.threads/tid-reuse.exp: New file.
|
|
I noticed that "thread apply all" sometimes crashes.
The problem is that thread_apply_all_command doesn take exited threads
into account, and we qsort and then walk more elements than there
really ever were put in the array. Valgrind shows:
The current thread <Thread ID 3> has terminated. See `help thread'.
(gdb) thread apply all p 1
Thread 1 (Thread 0x7ffff7fc2740 (LWP 29579)):
$1 = 1
==29576== Use of uninitialised value of size 8
==29576== at 0x639CA8: set_thread_refcount (thread.c:1337)
==29576== by 0x5C2C7B: do_my_cleanups (cleanups.c:155)
==29576== by 0x5C2CE8: do_cleanups (cleanups.c:177)
==29576== by 0x63A191: thread_apply_all_command (thread.c:1477)
==29576== by 0x50374D: do_cfunc (cli-decode.c:105)
==29576== by 0x506865: cmd_func (cli-decode.c:1893)
==29576== by 0x7562CB: execute_command (top.c:476)
==29576== by 0x647DA4: command_handler (event-top.c:494)
==29576== by 0x648367: command_line_handler (event-top.c:692)
==29576== by 0x7BF7C9: rl_callback_read_char (callback.c:220)
==29576== by 0x64784C: rl_callback_read_char_wrapper (event-top.c:171)
==29576== by 0x647CB5: stdin_event_handler (event-top.c:432)
==29576==
...
This can happen easily today as linux-nat.c/linux-thread-db.c are
forgetting to purge non-current exited threads. But even with that
fixed, we can always do "thread apply all" with an exited thread
selected, which won't be deleted until the user switches to another
thread. That's what the test added by this commit exercises.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* thread.c (thread_apply_all_command): Take exited threads into
account.
gdb/testsuite/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* gdb.threads/no-unwaited-for-left.exp: Test "thread apply all".
|
|
Currently, "set scheduler-locking step" is a bit odd. The manual
documents it as being optimized for stepping, so that focus of
debugging does not change unexpectedly, but then it says that
sometimes other threads may run, and thus focus may indeed change
unexpectedly... A user can then be excused to get confused and wonder
why does GDB behave like this.
I don't think a user should have to know about details of how "next"
or whatever other run control command is implemented internally to
understand when does the "scheduler-locking step" setting take effect.
This patch completes a transition that the code has been moving
towards for a while. It makes "set scheduler-locking step" hold
threads depending on whether the _command_ the user entered was a
stepping command [step/stepi/next/nexti], or not.
Before, GDB could end up locking threads even on "continue" if for
some reason run control decides a thread needs to be single stepped
(e.g., for a software watchpoint).
After, if a "continue" happens to need to single-step for some reason,
we won't lock threads (unless when stepping over a breakpoint,
naturally). And if a stepping command wants to continue a thread for
bit, like when skipping a function to a step-resume breakpoint, we'll
still lock threads, so focus of debugging doesn't change.
In order to make this work, we need to record in the thread structure
whether what set it running was a stepping command.
(A follow up patch will remove the "step" parameters of 'proceed' and 'resume')
FWIW, Fedora GDB, which defaults to "scheduler-locking step" (mainline
defaults to "off") carries a different patch that goes in this
direction as well.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* gdbthread.h (struct thread_control_state) <stepping_command>:
New field.
* infcmd.c (step_once): Pass step=1 to clear_proceed_status. Set
the thread's stepping_command field.
* infrun.c (resume): Check the thread's stepping_command flag to
determine which threads should be resumed. Rename 'entry_step'
local to user_step.
(clear_proceed_status_thread): Clear 'stepping_command'.
(schedlock_applies): Change parameter type to struct thread_info
pointer. Adjust.
(find_thread_needs_step_over): Remove 'step' parameter. Adjust.
(switch_back_to_stepped_thread): Adjust calls to
'schedlock_applies'.
(_initialize_infrun): Adjust "set scheduler-locking step" help.
gdb/testsuite/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* gdb.threads/schedlock.exp (test_step): No longer expect that
"set scheduler-locking step" with "next" over a function call runs
threads unlocked.
gdb/doc/ChangeLog:
2015-03-24 Pedro Alves <palves@redhat.com>
* gdb.texinfo (test_step) <set scheduler-locking step>: No longer
mention that threads may sometimes run unlocked.
|
|
Wanting to make sure the new continue-pending-status.exp test tests
both cases of threads 2 and 3 reporting an event, I added counters to
the test, to make it FAIL if events for both threads aren't seen.
Assuming a well behaved backend, and given a reasonable number of
iterations, it should PASS.
However, running that against GNU/Linux gdbserver, I found that
surprisingly, that FAILed. GDBserver always reported the breakpoint
hit for the same thread.
Turns out that I broke gdbserver's thread event randomization
recently, with git commit 582511be ([gdbserver] linux-low.c: better
starvation avoidance, handle non-stop mode too). In that commit I
missed that the thread structure also has a status_pending_p field...
The end result was that count_events_callback always returns 0, and
then if no thread is stepping, select_event_lwp always returns the
event thread. IOW, no randomization is happening at all. Quite
curious how all the other changes in that patch were sufficient to fix
non-stop-fair-events.exp anyway even with that broken.
Tested on x86_64 Fedora 20, native and gdbserver.
gdb/gdbserver/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* linux-low.c (count_events_callback, select_event_lwp_callback):
Use the lwp's status_pending_p field, not the thread's.
gdb/testsuite/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* gdb.threads/continue-pending-status.exp (saw_thread_2)
(saw_thread_3): New globals.
(top level): Increment them when an event for the corresponding
thread is seen.
(no thread starvation): New test.
|
|
If the linux_nat_resume's short-circuits the resume because the
current thread has a pending status, and, a thread with a higher
number was previously stopped for a breakpoint, GDB internal errors,
like:
/home/pedro/gdb/mygit/src/gdb/linux-nat.c:2590: internal-error: status_callback: Assertion `lp->status != 0' failed.
Fix this by make status_callback bail out earlier. GDBserver is
already doing the same.
New test added that exercises this.
gdb/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* linux-nat.c (status_callback): Return early if the LWP has no
status pending.
gdb/testsuite/ChangeLog:
2015-03-19 Pedro Alves <palves@redhat.com>
* gdb.threads/continue-pending-status.c: New file.
* gdb.threads/continue-pending-status.exp: New file.
|
|
Gary stumbled on this:
(gdb) PASS: gdb.threads/thread-specific-bp.exp: all-stop: continue to end
info threads
Id Target Id Frame
* 1 Thread 0x7ffff7fdb700 (LWP 13717) "thread-specific" end () at /home/gary/work/archer/startswith/src/gdb/testsuite/gdb.threads/thread-specific-bp.c:29
(gdb) FAIL: gdb.threads/thread-specific-bp.exp: all-stop: thread start is gone
info breakpoint
The problem is that "...archer/startswith/src..." has a "start" in it,
which matches the too-lax regex in the test.
Rather than tweaking the regex, we can just remove the whole "info
threads", like we removed similar ones in other files -- GDB nowadays
does this implicitly already, so things should work without it. Thus
removing this even improves testing here a bit.
gdb/testsuite/ChangeLog:
2015-03-04 Pedro Alves <palves@redhat.com>
* gdb.threads/thread-specific-bp.exp: Delete "info threads" test.
|
|
This fixes:
> gdb compile failed, /gdb/testsuite/gdb.threads/clone-thread_db.c: In function 'main':
> /gdb/testsuite/gdb.threads/clone-thread_db.c:67:3: warning: implicit declaration of function 'alarm' [-Wimplicit-function-declaration]
> alarm (300);
> ^
> /gdb/testsuite/gdb.threads/clone-thread_db.c:69:3: warning: implicit declaration of function 'pthread_create' [-Wimplicit-function-declaration]
> pthread_create (&child, NULL, thread_fn, NULL);
> ^
> /gdb/testsuite/gdb.threads/clone-thread_db.c:70:3: warning: implicit declaration of function 'pthread_join' [-Wimplicit-function-declaration]
> pthread_join (child);
> ^
And then adding the missing headers revealed the pthread_join call was
incorrect. This probably fixes the crash we see on ppc64be, e.g., at
https://sourceware.org/ml/gdb-testers/2015-q1/msg04415.html
the logs there show:
...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x3fffb7ff54a0 (LWP 9275)]
0x00003fffb7f3ce74 in .pthread_join () from /lib64/libpthread.so.0
(gdb) FAIL: gdb.threads/clone-thread_db.exp: continue to end
...
Tested on x86_64 Fedora 20.
gdb/testsuite/
2015-03-04 Pedro Alves <palves@redhat.com>
* gdb.threads/clone-thread_db.c: Include unistd.h and pthread.h.
(main): Pass missing retval argument to pthread_join call.
|
|
This fixes invalid reads Valgrind first caught when debugging against
a GDBserver patched with a series that adds exec events to the remote
protocol. Like these, using the gdb.threads/thread-execl.exp test:
$ valgrind ./gdb -data-directory=data-directory ./testsuite/gdb.threads/thread-execl -ex "tar extended-remote :9999" -ex "b thread_execler" -ex "c" -ex "set scheduler-locking on"
...
Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29 if (execl (image, image, NULL) == -1)
(gdb) n
Thread 32509.32509 is executing new program: build/gdb/testsuite/gdb.threads/thread-execl
[New Thread 32509.32532]
==32510== Invalid read of size 4
==32510== at 0x5AA7D8: delete_breakpoint (breakpoint.c:13989)
==32510== by 0x6285D3: delete_thread_breakpoint (thread.c:100)
==32510== by 0x628603: delete_step_resume_breakpoint (thread.c:109)
==32510== by 0x61622B: delete_thread_infrun_breakpoints (infrun.c:2928)
==32510== by 0x6162EF: for_each_just_stopped_thread (infrun.c:2958)
==32510== by 0x616311: delete_just_stopped_threads_infrun_breakpoints (infrun.c:2969)
==32510== by 0x616C96: fetch_inferior_event (infrun.c:3267)
==32510== by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510== by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510== by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510== by 0x4AF6F0: fd_event (ser-base.c:182)
==32510== by 0x63806D: handle_file_event (event-loop.c:762)
==32510== Address 0xcf333e0 is 16 bytes inside a block of size 200 free'd
==32510== at 0x4A07577: free (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==32510== by 0x77CB74: xfree (common-utils.c:98)
==32510== by 0x5AA954: delete_breakpoint (breakpoint.c:14056)
==32510== by 0x5988BD: update_breakpoints_after_exec (breakpoint.c:3765)
==32510== by 0x61360F: follow_exec (infrun.c:1091)
==32510== by 0x6186FA: handle_inferior_event (infrun.c:4061)
==32510== by 0x616C55: fetch_inferior_event (infrun.c:3261)
==32510== by 0x63A2DE: inferior_event_handler (inf-loop.c:57)
==32510== by 0x4E0E56: remote_async_serial_handler (remote.c:11877)
==32510== by 0x4AF620: run_async_handler_and_reschedule (ser-base.c:137)
==32510== by 0x4AF6F0: fd_event (ser-base.c:182)
==32510== by 0x63806D: handle_file_event (event-loop.c:762)
==32510==
[Switching to Thread 32509.32532]
Breakpoint 1, thread_execler (arg=0x0) at src/gdb/testsuite/gdb.threads/thread-execl.c:29
29 if (execl (image, image, NULL) == -1)
(gdb)
The breakpoint in question is the step-resume breakpoint of the
non-main thread, the one that was "next"ed.
The exact same issue can be seen on mainline with native debugging, by
running the thread-execl.exp test in non-stop mode, because the kernel
doesn't report a thread exit event for the execing thread.
Tested on x86_64 Fedora 20.
gdb/ChangeLog:
2015-03-02 Pedro Alves <palves@redhat.com>
* infrun.c (follow_exec): Delete all threads of the process except
the event thread. Extended comments.
gdb/testsuite/ChangeLog:
2015-03-02 Pedro Alves <palves@redhat.com>
* gdb.threads/thread-execl.exp (do_test): Handle non-stop.
(top level): Call do_test with non-stop as well.
|
|
The buildbot shows that the new
gdb.threads/multi-create-ns-info-thr.exp test is timing out when
tested with --target=native-extended-remote. The reason is:
No breakpoints or watchpoints.
(gdb) break main
Breakpoint 1 at 0x10000b00: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 72.
(gdb) run
Starting program: /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-cre
ate-ns-info-thr
Process /home/gdb-buildbot/fedora-21-ppc64be-1/fedora-ppc64be-native-extended-gdbserver/build/gdb/testsuite/outputs/gdb.threads/multi-create-ns-info-thr/multi-create-ns-inf
o-thr created; pid = 16266
Unexpected vCont reply in non-stop mode: T0501:00003fffffffd190;40:00000080560fe290;thread:p3f8a.3f8a;core:0;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(gdb) break multi-create.c:45
Breakpoint 2 at 0x10000994: file ../../../binutils-gdb/gdb/testsuite/gdb.threads/multi-create.c, line 45.
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
Non-stop tests don't really work with the
--target_board=native-extended-remote board, because tests toggle
non-stop on after GDB is already connected to gdbserver, while
Currently, non-stop must be enabled before connecting.
This adjusts the test to bail if running to main fails, like all other
non-stop tests.
Note non-stop tests do work with --target_board=native-gdbserver.
gdb/testsuite/ChangeLog:
2015-02-21 Pedro Alves <palves@redhat.com>
* gdb.threads/multi-create-ns-info-thr.exp: Return early if
runto_main fails.
|
|
TL;DR - GDB can hang if something refreshes the thread list out of the
target while the target is running. GDB hangs inside td_ta_thr_iter.
The fix is to not use that libthread_db function anymore.
Long version:
Running the testsuite against my all-stop-on-top-of-non-stop series is
still exposing latent non-stop bugs.
I was originally seeing this with the multi-create.exp test, back when
we were still using libthread_db thread event breakpoints. The
all-stop-on-top-of-non-stop series forces a thread list refresh each
time GDB needs to start stepping over a breakpoint (to pause all
threads). That test hits the thread event breakpoint often, resulting
in a bunch of step-over operations, thus a bunch of thread list
refreshes while some threads in the target are running.
The commit adds a real non-stop mode test that triggers the issue,
based on multi-create.exp, that does an explicit "info threads" when a
breakpoint is hit. IOW, it does the same things the as-ns series was
doing when testing multi-create.exp.
The bug is a race, so it unfortunately takes several runs for the test
to trigger it. In fact, even when setting the test running in a loop,
it sometimes takes several minutes for it to trigger for me.
The race is related to libthread_db's td_ta_thr_iter. This is
libthread_db's entry point for walking the thread list of the
inferior.
Sometimes, when GDB refreshes the thread list from the target,
libthread_db's td_ta_thr_iter can somehow see glibc's thread list as a
cycle, and get stuck in an infinite loop.
The issue is that when a thread exits, its thread control structure in
glibc is moved from a "used" list to a "cache" list. These lists are
simply circular linked lists where the "next/prev" pointers are
embedded in the thread control structure itself. The "next" pointer
of the last element of the list points back to the list's sentinel
"head". There's only one set of "next/prev" pointers for both lists;
thus a thread can only be in one of the lists at a time, not in both
simultaneously.
So when thread C exits, simplifying, the following happens. A-C are
threads. stack_used and stack_cache are the list's heads.
Before:
stack_used -> A -> B -> C -> (&stack_used)
stack_cache -> (&stack_cache)
After:
stack_used -> A -> B -> (&stack_used)
stack_cache -> C -> (&stack_cache)
td_ta_thr_iter starts by iterating at the list's head's next, and
iterates until it sees a thread whose next pointer points to the
list's head again. Thus in the before case above, C's next points to
stack_used, indicating end of list. In the same case, the stack_cache
list is empty.
For each thread being iterated, td_ta_thr_iter reads the whole thread
object out of the inferior. This includes the thread's "next"
pointer.
In the scenario above, it may happen that td_ta_thr_iter is iterating
thread B and has already read B's thread structure just before thread
C exits and its control structure moves to the cached list.
Now, recall that td_ta_thr_iter is running in the context of GDB, and
there's no locking between GDB and the inferior. From it's local copy
of B, td_ta_thr_iter believes that the next thread after B is thread
C, so it happilly continues iterating to C, a thread that has already
exited, and is now in the stack cache list.
After iterating C, td_ta_thr_iter finds the stack_cache head, which
because it is not stack_used, td_ta_thr_iter assumes it's just another
thread. After this, unless the reverse race triggers, GDB gets stuck
in td_ta_thr_iter forever walking the stack_cache list, as no thread
in thatlist has a next pointer that points back to stack_used (the
terminating condition).
Before fully understanding the issue, I tried adding cycle detection
to GDB's td_ta_thr_iter callback. However, td_ta_thr_iter skips
calling the callback in some cases, which means that it's possible
that the callback isn't called at all, making it impossible for GDB to
break the loop. I did manage to get GDB stuck in that state more than
once.
Fortunately, we can avoid the issue altogether. We don't really need
td_ta_thr_iter for live debugging nowadays, given PTRACE_EVENT_CLONE.
We already know how to map and lwp id to a thread id without iterating
(thread_from_lwp), so use that more.
gdb/ChangeLog:
2015-02-20 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_handle_extended_wait): Call
thread_db_notice_clone whenever a new clone LWP is detected.
(linux_stop_and_wait_all_lwps, linux_unstop_all_lwps): New
functions.
* linux-nat.h (thread_db_attach_lwp): Delete declaration.
(thread_db_notice_clone, linux_stop_and_wait_all_lwps)
(linux_unstop_all_lwps): Declare.
* linux-thread-db.c (struct thread_get_info_inout): Delete.
(thread_get_info_callback): Delete.
(thread_from_lwp): Use td_thr_get_info and record_thread.
(thread_db_attach_lwp): Delete.
(thread_db_notice_clone): New function.
(try_thread_db_load_1): If /proc is mounted and shows the
process'es task list, walk over all LWPs and call thread_from_lwp
instead of relying on td_ta_thr_iter.
(attach_thread): Don't call check_thread_signals here. Split the
tail part of the function (which adds the thread to the core GDB
thread list) to ...
(record_thread): ... this function. Call check_thread_signals
here.
(thread_db_wait): Don't call thread_db_find_new_threads_1. Always
call thread_from_lwp.
(thread_db_update_thread_list): Rename to ...
(thread_db_update_thread_list_org): ... this.
(thread_db_update_thread_list): New function.
(thread_db_find_thread_from_tid): Delete.
(thread_db_get_ada_task_ptid): Simplify.
* nat/linux-procfs.c: Include <sys/stat.h>.
(linux_proc_task_list_dir_exists): New function.
* nat/linux-procfs.h (linux_proc_task_list_dir_exists): Declare.
gdb/gdbserver/ChangeLog:
2015-02-20 Pedro Alves <palves@redhat.com>
* thread-db.c: Include "nat/linux-procfs.h".
(thread_db_init): Skip listing new threads if the kernel supports
PTRACE_EVENT_CLONE and /proc/PID/task/ is accessible.
gdb/testsuite/ChangeLog:
2015-02-20 Pedro Alves <palves@redhat.com>
* gdb.threads/multi-create-ns-info-thr.exp: New file.
|
|
On GNU/Linux, if a pthreaded program has a thread call clone(CLONE_VM)
directly, and then that clone LWP hits a debug event (breakpoint,
etc.) GDB internal errors. Threaded programs shouldn't really be
calling clone directly, but GDB shouldn't crash either.
The crash looks like this:
(gdb) break clone_fn
Breakpoint 2 at 0x4007d8: file clone-thread_db.c, line 35.
(gdb) r
...
[Thread debugging using libthread_db enabled]
...
src/gdb/linux-nat.c:1030: internal-error: lin_lwp_attach_lwp: Assertion `lwpid > 0' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
The problem is that 'clone' ends up clearing the parent thread's tid
field in glibc's thread data structure. For x86_64, the glibc code in
question is here:
sysdeps/unix/sysv/linux/x86_64/clone.S:
...
testq $CLONE_THREAD, %rdi
jne 1f
testq $CLONE_VM, %rdi
movl $-1, %eax <----
jne 2f
movl $SYS_ify(getpid), %eax
syscall
2: movl %eax, %fs:PID
movl %eax, %fs:TID <----
1:
When GDB refreshes the thread list out of libthread_db, it finds a
thread with LWP with pid -1 (the clone's parent), which naturally
isn't yet on the thread list. GDB then tries to attach to that bogus
LWP id, which is caught by that assertion.
The fix is to detect the bad PID early.
Tested on x86-64 Fedora 20. GDBserver doesn't need any fix.
gdb/ChangeLog:
2015-02-20 Pedro Alves <palves@redhat.com>
PR threads/18006
* linux-thread-db.c (thread_get_info_callback): Return early if
the thread's lwp id is -1.
gdb/testsuite/ChangeLog:
2015-02-20 Pedro Alves <palves@redhat.com>
PR threads/18006
* gdb.threads/clone-thread_db.c: New file.
* gdb.threads/clone-thread_db.exp: New file.
|
|
On decr_pc_after_break targets, GDB adjusts the PC incorrectly if a
background single-step stops somewhere where PC-$decr_pc has a
breakpoint, and the thread that finishes the step is not the current
thread, like:
ADDR1 nop <-- breakpoint here
ADDR2 jmp PC
IOW, say thread A is stepping ADDR2's line in the background (an
infinite loop), and the user switches focus to thread B. GDB's
adjust_pc_after_break logic confuses the single-step stop of thread A
for a hit of the breakpoint at ADDR1, and thus adjusts thread A's PC
to point at ADDR1 when it should not, and reports a breakpoint hit,
when thread A did not execute the instruction at ADDR1 at all.
The test added by this patch exercises exactly that.
I can't find any reason we'd need the "thread to be examined is still
the current thread" condition in adjust_pc_after_break, at least
nowadays; it might have made sense in the past. Best just remove it,
and rely on currently_stepping().
Here's the test's log of a run with an unpatched GDB:
35 while (1);
(gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next over nop
next&
(gdb) PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: next& over inf loop
thread 1
[Switching to thread 1 (Thread 0x7ffff7fc2740 (LWP 29027))](running)
(gdb)
PASS: gdb.threads/step-bg-decr-pc-switch-thread.exp: switch to main thread
Breakpoint 2, thread_function (arg=0x0) at ...src/gdb/testsuite/gdb.threads/step-bg-decr-pc-switch-thread.c:34
34 NOP; /* set breakpoint here */
FAIL: gdb.threads/step-bg-decr-pc-switch-thread.exp: no output while stepping
gdb/ChangeLog:
2015-02-11 Pedro Alves <pedro@codesourcery.com>
* infrun.c (adjust_pc_after_break): Don't adjust the PC just
because the event thread is not the current thread.
gdb/testsuite/ChangeLog:
2015-02-11 Pedro Alves <pedro@codesourcery.com>
* gdb.threads/step-bg-decr-pc-switch-thread.c: New file.
* gdb.threads/step-bg-decr-pc-switch-thread.exp: New file.
|
|
Some local changes I was working on related to SIGTRAP handling
resulted in "signal SIGTRAP" no longer passing the SIGTRAP to the
inferior.
Surprisingly, only annota1.exp catches this. This commit adds a test
that doesn't rely on annotations, so that at the point annotations are
finaly dropped, we still have this use case covered ...
This is a multi-threaded test to also exercise the case of first
needing to do a step-over before delivering the signal.
Tested on x86_64 Fedora 20, native, remote/extended-remote gdbserver.
gdb/testsuite/
2015-02-10 Pedro Alves <palves@redhat.com>
* gdb.threads/signal-sigtrap.c: New file.
* gdb.threads/signal-sigtrap.exp: New file.
|
|
The buildbot shows that this test is still racy, and occasionally
fails with time outs on some machines. I'd like to get major issues
with load out of the way.
The test currently exits after 180s, which is just a random number,
that has no relation to what the .exp file considers a time out. This
commit makes the program wait a bit longer than what the .exp file
considers a time out, and, resets the timer for each iteration.
Tested on x86_64 Fedora 20, native and extended-remote gdbserver.
gdb/testsuite/
2015-02-06 Pedro Alves <palves@redhat.com>
* gdb.threads/attach-many-short-lived-threads.c (SECONDS): New
macro.
(seconds_left, again): New globals.
(main): Wait seconds_left in a 1-second sleep loop instead of
sleeping 180 seconds. If 'again' is set, reset the seconds
counter.
* gdb.threads/attach-many-short-lived-threads.exp (test): Set
'again' in the inferior before detaching. Print the seconds left.
(options): New global.
(top level): Build program with -DTIMEOUT=$timeout.
|
|
GCC5 defaults to the GNU11 standard for C and warns by default for
implicit function declarations and implicit return types.
https://gcc.gnu.org/gcc-5/porting_to.html
Fixing these issues in the testsuite turns 9 untested and 17 unsupported
testcases into 417 new passes when compiling with GCC5.
gdb/testsuite/ChangeLog:
* gdb.arch/i386-bp_permanent.c (standard): New declaration.
* gdb.base/disp-step-fork.c: Include unistd.h.
* gdb.base/siginfo-obj.c: Include stdio.h.
* gdb.base/siginfo-thread.c: Likewise.
* gdb.mi/non-stop.c: Include unistd.h.
* gdb.mi/nsthrexec.c: Include stdio.h.
* gdb.mi/pthreads.c: Include unistd.h.
* gdb.modula2/unbounded1.c (main): Declare returns int.
* gdb.reverse/consecutive-reverse.c: Likewise.
* gdb.threads/create-fail.c: Include unistd.h.
* gdb.threads/killed.c: Likewise.
* gdb.threads/linux-dp.c: Likewise.
* gdb.threads/non-ldr-exc-1.c: Include stdio.h and string.h.
* gdb.threads/non-ldr-exc-2.c: Likewise.
* gdb.threads/non-ldr-exc-3.c: Likewise.
* gdb.threads/non-ldr-exc-4.c: Likewise.
* gdb.threads/pthreads.c: Include unistd.h.
(main): Declare returns int.
* gdb.threads/tls-main.c (foo): New declaration.
* gdb.threads/watchpoint-fork-mt.c: Define _GNU_SOURCE.
|
|
linux_nat_is_async_p currently always returns true, even when the
target is _not_ async. That confuses
gdb_readline_wrapper/gdb_readline_wrapper_cleanup, which
force-disables target-async while the secondary prompt is active. As
a result, when gdb_readline_wrapper returns, the target is left async,
even through it was sync to begin with.
That can result in weird bugs, like the one the test added by this
commit exposes.
Ref: https://sourceware.org/ml/gdb-patches/2015-01/msg00592.html
gdb/ChangeLog:
2015-01-23 Pedro Alves <palves@redhat.com>
* linux-nat.c (linux_is_async_p): New macro.
(linux_nat_is_async_p):
(linux_nat_terminal_inferior): Check whether the target can async
instead of whether it is already async.
(linux_nat_terminal_ours): Don't check whether the target is
async.
(linux_async_pipe): Use linux_is_async_p.
gdb/testsuite/ChangeLog:
2015-01-23 Pedro Alves <palves@redhat.com>
* gdb.threads/continue-pending-after-query.c: New file.
* gdb.threads/continue-pending-after-query.exp: New file.
|
|
This commit adds a non-stop mode test originally inspired by
signal-while-stepping-over-bp-other-thread.exp, that exposes the
thread starvation issues fixed by the previous patches. It sets a set
of threads stepping in parallel, and has one of them get a signal.
Without the previous fixes, this would fail with timeouts.
gdb/testsuite/
2015-01-09 Pedro Alves <palves@redhat.com>
* gdb.threads/non-stop-fair-events.c: New file.
* gdb.threads/non-stop-fair-events.exp: New file.
|
|
with GDB
These three test all spawn a few threads and then send a SIGSTOP to
their parent GDB in order to pause it while the new threads set things
up for the test. With a GDB patch that changes the inferior thread's
scheduling a bit, I sometimes see:
FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
...
FAIL: gdb.threads/watchthreads-reorder.exp: reorder1: continue a (timeout)
...
FAIL: gdb.threads/ia64-sigill.exp: continue (timeout)
...
The issue is that the test program stops GDB before it had a chance of
processing the new thread's clone event:
(gdb) PASS: gdb.threads/siginfo-threads.exp: get pid
continue
Continuing.
Stopping GDB PID 21541.
Waiting till the threads initialize their TIDs.
FAIL: gdb.threads/siginfo-threads.exp: catch signal 0 (timeout)
On Linux (at least), new threads start stopped, and the debugger must
resume them. The fix is to make the test program wait for the new
threads to be running before stopping GDB.
gdb/testsuite/
2015-01-09 Pedro Alves <palves@redhat.com>
* gdb.threads/ia64-sigill.c (threads_started_barrier): New global.
(thread_func): Wait on barrier.
(main): Wait for all threads to start before stopping GDB.
* gdb.threads/siginfo-threads.c (threads_started_barrier): New
global.
(thread1_func, thread2_func): Wait on barrier.
(main): Wait for all threads to start before stopping GDB.
* gdb.threads/watchthreads-reorder.c (threads_started_barrier):
New global.
(thread1_func, thread2_func): Wait on barrier.
(main): Wait for all threads to start before stopping GDB.
|
|
Before the previous fixes, on Linux, this would trigger several
different problems, like:
[New LWP 27106]
[New LWP 27047]
warning: unable to open /proc file '/proc/-1/status'
[New LWP 27813]
[New LWP 27869]
warning: Can't attach LWP 11962: No child processes
Warning: couldn't activate thread debugging using libthread_db: Cannot find new threads: debugger service failed
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
gdb/testsuite/
2015-01-09 Pedro Alves <palves@redhat.com>
* gdb.threads/attach-many-short-lived-threads.c: New file.
* gdb.threads/attach-many-short-lived-threads.exp: New file.
|
|
[A test I wrote stumbled on a libthread_db issue related to thread
event breakpoints. See glibc PR17705:
[nptl_db: stale thread create/death events if debugger detaches]
https://sourceware.org/bugzilla/show_bug.cgi?id=17705
This patch avoids that whole issue by making GDB stop using thread
event breakpoints in the first place, which is good for other reasons
as well, anyway.]
Before PTRACE_EVENT_CLONE (Linux 2.6), the only way to learn about new
threads in the inferior (to attach to them) or to learn about thread
exit was to coordinate with the inferior's glibc/runtime, using
libthread_db. That works by putting a breakpoint at a magic address
which is called when a new thread is spawned, or when a thread is
about to exit. When that breakpoint is hit, all threads are stopped,
and then GDB coordinates with libthread_db to read data structures out
of the inferior to learn about what happened. Then the breakpoint is
single-stepped, and then all threads are re-resumed. This isn't very
efficient (stops all threads) and is more fragile (inferior's thread
list in memory may be corrupt; libthread_db bugs, etc.) than ideal.
When the kernel supports PTRACE_EVENT_CLONE (which we already make use
of), there's really no need to use libthread_db's event reporting
mechanism to learn about new LWPs. And if the kernel supports that,
then we learn about LWP exits through regular WIFEXITED wait statuses,
so no need for the death event breakpoint either.
GDBserver has been likewise skipping the thread_db events for a long
while:
https://sourceware.org/ml/gdb-patches/2007-10/msg00547.html
There's one user-visible difference: we'll no longer print about
threads being created and exiting while the program is running, like:
[Thread 0x7ffff7dbb700 (LWP 30670) exited]
[New Thread 0x7ffff7db3700 (LWP 30671)]
[Thread 0x7ffff7dd3700 (LWP 30667) exited]
[New Thread 0x7ffff7dab700 (LWP 30672)]
[Thread 0x7ffff7db3700 (LWP 30671) exited]
[Thread 0x7ffff7dcb700 (LWP 30668) exited]
This is exactly the same behavior as when debugging against remote
targets / gdbserver. I actually think that's a good thing (and as
such have listed this in the local/remote parity wiki page a while
ago), as the printing slows down the inferior. It's also a
distraction to keep bothering the user about short-lived threads that
she won't be able to interact with anyway. Instead, the user (and
frontend) will be informed about new threads that currently exist in
the program when the program next stops:
(gdb) c
...
* ctrl-c *
[New Thread 0x7ffff7963700 (LWP 7797)]
[New Thread 0x7ffff796b700 (LWP 7796)]
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7ffff796b700 (LWP 7796)]
clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
81 testq %rax,%rax
(gdb) info threads
A couple of tests had assumptions on GDB thread numbers that no longer
hold.
Tested on x86_64 Fedora 20.
gdb/
2014-01-09 Pedro Alves <palves@redhat.com>
Skip enabling event reporting if the kernel supports
PTRACE_EVENT_CLONE.
* linux-thread-db.c: Include "nat/linux-ptrace.h".
(thread_db_use_events): New function.
(try_thread_db_load_1): Check thread_db_use_events before enabling
event reporting.
(update_thread_state): New function.
(attach_thread): Use it. Check thread_db_use_events before
enabling event reporting.
(thread_db_detach): Check thread_db_use_events before disabling
event reporting.
(find_new_threads_callback): Check thread_db_use_events before
enabling event reporting. Update the thread's state if not using
libthread_db events.
gdb/testsuite/
2014-01-09 Pedro Alves <palves@redhat.com>
* gdb.threads/fork-thread-pending.exp: Switch to the main thread
instead of to thread 2.
* gdb.threads/signal-command-multiple-signals-pending.c (main):
Add barrier around each pthread_create call instead of around all
calls.
* gdb.threads/signal-command-multiple-signals-pending.exp (test):
Set a break on thread_function and have the child threads hit it
one at at a time.
|
|
gdb/ChangeLog:
Update year range in copyright notice of all files.
|
|
The target->request_interrupt callback implements the handling for
ctrl-c. User types ctrl-c in GDB, GDB sends a \003 to the remote
target, and the remote targets stops the program with a SIGINT, just
like if the user typed ctrl-c in GDBserver's terminal.
The trouble is that using kill_lwp(signal_pid, SIGINT) sends the
SIGINT directly to the program's main thread. If that thread has
exited already, then that kill won't do anything.
Instead, send the SIGINT to the process group, just like GDB
does (see inf-ptrace.c:inf_ptrace_stop).
gdb.threads/leader-exit.exp is extended to cover the scenario. It
fails against GDBserver before the patch.
Tested on x86_64 Fedora 20, native and GDBserver.
gdb/gdbserver/
2014-11-12 Pedro Alves <palves@redhat.com>
* linux-low.c (linux_request_interrupt): Always send a SIGINT to
the process group instead of to a specific LWP.
gdb/testsuite/
2014-11-12 Pedro Alves <palves@redhat.com>
* gdb.threads/leader-exit.exp: Test sending ctrl-c works after the
leader has exited.
|
|
This PR shows that GDB can easily trigger an assertion here, in
infrun.c:
5392 /* Did we find the stepping thread? */
5393 if (tp->control.step_range_end)
5394 {
5395 /* Yep. There should only one though. */
5396 gdb_assert (stepping_thread == NULL);
5397
5398 /* The event thread is handled at the top, before we
5399 enter this loop. */
5400 gdb_assert (tp != ecs->event_thread);
5401
5402 /* If some thread other than the event thread is
5403 stepping, then scheduler locking can't be in effect,
5404 otherwise we wouldn't have resumed the current event
5405 thread in the first place. */
5406 gdb_assert (!schedlock_applies (currently_stepping (tp)));
5407
5408 stepping_thread = tp;
5409 }
Like:
gdb/infrun.c:5406: internal-error: switch_back_to_stepped_thread: Assertion `!schedlock_applies (1)' failed.
The way the assertion is written is assuming that with schedlock=step
we'll always leave threads other than the one with the stepping range
locked, while that's not true with the "next" command. With schedlock
"step", other threads still run unlocked when "next" detects a
function call and steps over it. Whether that makes sense or not,
still, it's documented that way in the manual. If another thread hits
an event that doesn't cause a stop while the nexting thread steps over
a function call, we'll get here and fail the assertion.
The fix is just to adjust the assertion. Even though we found the
stepping thread, we'll still step-over the breakpoint that just
triggered correctly.
Surprisingly, gdb.threads/schedlock.exp doesn't have any test that
steps over a function call. This commits fixes that. This ensures
that "next" doesn't switch focus to another thread, and checks whether
other threads run locked or not, depending on scheduler locking mode
and command. There's a lot of duplication in that file that this ends
cleaning up. There's more that could be cleaned up, but that would
end up an unrelated change, best done separately.
This new coverage in schedlock.exp happens to trigger the internal
error in question, like so:
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (1) (GDB internal error)
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (3) (GDB internal error)
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (5) (GDB internal error)
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (7) (GDB internal error)
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next to increment (9) (GDB internal error)
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: next does not change thread (switched to thread 0)
FAIL: gdb.threads/schedlock.exp: schedlock=step: cmd=next: call_function=1: current thread advanced - unlocked (wrong amount)
That's because we have more than one thread running the same loop, and
while one thread is stepping over a function call, the other thread
hits the step-resume breakpoint of the first, which needs to be
stepped over, and we end up in switch_back_to_stepped_thread exactly
in the problem case.
I think a simpler and more directed test is also useful, to not rely
on internal breakpoint magics. So this commit also adds a test that
has a thread trip on a conditional breakpoint that doesn't cause a
user-visible stop while another thread is stepping over a call. That
currently fails like this:
FAIL: gdb.threads/next-bp-other-thread.exp: schedlock=step: next over function call (GDB internal error)
Tested on x86_64 Fedora 20.
gdb/
2014-10-29 Pedro Alves <palves@redhat.com>
PR gdb/17408
* infrun.c (switch_back_to_stepped_thread): Use currently_stepping
instead of assuming a thread with a stepping range is always
stepping.
gdb/testsuite/
2014-10-29 Pedro Alves <palves@redhat.com>
PR gdb/17408
* gdb.threads/schedlock.c (some_function): New function.
(call_function): New global.
(MAYBE_CALL_SOME_FUNCTION): New macro.
(thread_function): Call it.
* gdb.threads/schedlock.exp (get_args): Add description parameter,
and use it instead of a global counter. Adjust all callers.
(get_current_thread): Use "find current thread" for test message
here rather than having all callers pass down the same string.
(goto_loop): New procedure, factored out from ...
(my_continue): ... this.
(step_ten_loops): Change parameter from test message to command to
use. Adjust.
(list_count): Delete global.
(check_result): New procedure, factored out from duplicate top
level code.
(continue tests): Wrap in with_test_prefix.
(test_step): New procedure, factored out from duplicate top level
code.
(top level): Test "step" in combination with all scheduler-locking
modes. Test "next" in combination with all scheduler-locking
modes, and in combination with stepping over a function call or
not.
* gdb.threads/next-bp-other-thread.c: New file.
* gdb.threads/next-bp-other-thread.exp: New file.
|
|
Built and tested on x86_64 Fedora 20, with --enable-targets=all.
gdb/
2014-10-24 Pedro Alves <palves@redhat.com>
* Makefile.in (ALLDEPFILES): Remove vax-nat.c.
* NEWS (Removed targets): Add VAX BSD and VAX Ultrix.
* config/vax/vax.mh: Delete.
* configure.host: Move vax-*-bsd* and vax-*-ultrix* to the
obsolete configurations section.
* configure.tgt (vax-*-*): Don't mention 4.2BSD nor Ultrix.
* vax-nat.c: Delete file.
gdb/testsuite/
2014-10-24 Pedro Alves <palves@redhat.com>
* gdb.base/corefile.exp: Remove references to ultrix.
* gdb.base/interrupt.exp: Likewise.
* gdb.base/whatis.exp: Likewise.
* gdb.gdb/selftest.exp: Likewise.
* gdb.threads/manythreads.exp: Likewise.
* gdb.threads/print-threads.exp: Likewise.
* gdb.threads/pthreads.exp:: Likewise.
* gdb.threads/schedlock.exp: Likewise.
|
|
This commit does most of the mechanical removal. IOW, the easy part.
procfs.c isn't touched beyond removing a couple obvious bits that are
guarded by a couple macros defined in config/alpha/nm-osf3.h. Going
beyond that for procfs.c & co would be a harder excision that
potentially affects Solaris.
Some comments in the generic alpha code ABIs that may still be
relevant and I wouldn't know what to do with them. That can always be
done on a separate pass, preferably by someone who can test on alpha.
A couple other spots have references to OSF/Tru64 and related files
being removed, but it felt like removing them would make things worse,
not better. We can revisit those when we next need to touch that
code.
I didn't remove a reference to osf in testsuite/lib/future.exp, as I
believe that code is imported from DejaGNU.
Built and tested on x86_64 Fedora 20, with --enable-targets=all.
Tested that building for --target=alpha-osf3 on x86_64 Fedora 20
fails with:
checking for default auto-load directory... $debugdir:$datadir/auto-load
checking for default auto-load safe-path... $debugdir:$datadir/auto-load
*** Configuration alpha-unknown-osf3 is obsolete.
*** Support has been REMOVED.
make[1]: *** [configure-gdb] Error 1
make[1]: Leaving directory `build-osf'
make: *** [all] Error 2
gdb/
2014-10-17 Pedro Alves <palves@redhat.com>
* Makefile.in (ALL_64_TARGET_OBS): Remove alpha-osf1-tdep.o.
(HFILES_NO_SRCDIR): Remove config/alpha/nm-osf3.h.
(ALLDEPFILES): Remove alpha-nat.c, alpha-osf1-tdep.c and
solib-osf.c.
* NEWS: Mention that support for alpha*-*-osf* has been removed.
* ada-lang.h [__alpha__ && __osf__]
(ADA_KNOWN_RUNTIME_FILE_NAME_PATTERNS): Delete.
* alpha-nat.c, alpha-osf1-tdep.c: Delete files.
* alpha-tdep.c (alpha_gdbarch_init): Remove reference to
GDB_OSABI_OSF1.
* config/alpha/alpha-osf3.mh, config/alpha/nm-osf3.h: Delete
files.
* config/djgpp/fnchange.lst (config/alpha/alpha-osf1.mh)
(config/alpha/alpha-osf2.mh, config/alpha/alpha-osf3.mh): Delete.
* configure: Regenerate.
* configure.ac: Remove references to osf.
* configure.host: Handle alpha*-*-osf* in the obsolete hosts
section. Remove all other references to osf.
* configure.tgt: Add alpha*-*-osf* to the obsolete targets section.
Remove all other references to osf.
* dec-thread.c: Delete file.
* defs.h (GDB_OSABI_OSF1): Delete.
* inferior.h (START_INFERIOR_TRAPS_EXPECTED): New unconditionally
defined.
* osabi.c (gdb_osabi_names): Delete "OSF/1".
* procfs.c (procfs_debug_inferior) [PROCFS_DONT_TRACE_FAULTS]:
Delete code.
(unconditionally_kill_inferior)
[PROCFS_NEED_CLEAR_CURSIG_FOR_KILL]: Delete code.
* solib-osf.c: Delete file.
gdb/testsuite/
2014-10-17 Pedro Alves <palves@redhat.com>
* gdb.base/callfuncs.exp: emove references to osf.
* gdb.base/sigall.exp: Likewise.
* gdb.gdb/selftest.exp: Likewise.
* gdb.hp/gdb.base-hp/callfwmall.exp: Likewise.
* gdb.mi/non-stop.c: Likewise.
* gdb.mi/pthreads.c: Likewise.
* gdb.reverse/sigall-precsave.exp: Likewise.
* gdb.reverse/sigall-reverse.exp: Likewise.
* gdb.threads/pthreads.c: Likewise.
* gdb.threads/pthreads.exp: Likewise.
gdb/doc/
2014-10-17 Pedro Alves <palves@redhat.com>
* gdb.texinfo (Ada Tasks and Core Files): Delete mention of Tru64.
(SVR4 Process Information): Delete mention of OSF/1.
|
|
As the result of the patch below, GDB updates thread list when a stop is
presented to user. The tests don't have to fetch thread list explicitly.
[PATCH 3/3] Fix non-stop regressions caused by "breakpoints always-inserted off" changes
https://sourceware.org/ml/gdb-patches/2014-09/msg00734.html
This patch is to remove the test code updating thread list.
Run these three tests many times on arm-linux-gnueabi and x86-linux.
No regressions.
gdb/testsuite:
2014-10-11 Yao Qi <yao@codesourcery.com>
* gdb.threads/thread-find.exp: Don't execute command
"info threads".
* gdb.threads/attach-into-signal.exp (corefunc): Likewise.
* gdb.threads/linux-dp.exp: Don't check the condition
$threads_created equals to zero.
|