aboutsummaryrefslogtreecommitdiff
path: root/gdb/linux-thread-db.c
diff options
context:
space:
mode:
authorPedro Alves <palves@redhat.com>2015-02-20 20:21:59 +0000
committerPedro Alves <palves@redhat.com>2015-02-20 21:40:31 +0000
commit2db9a4275ceada4aad3443dc157b96dd2e23afc0 (patch)
tree59a27d7c2e9734db5bbc2237a75c7c5f6402a385 /gdb/linux-thread-db.c
parent3b27ef472df3b4cdcdd54629281610d594c99c97 (diff)
downloadgdb-2db9a4275ceada4aad3443dc157b96dd2e23afc0.zip
gdb-2db9a4275ceada4aad3443dc157b96dd2e23afc0.tar.gz
gdb-2db9a4275ceada4aad3443dc157b96dd2e23afc0.tar.bz2
GNU/Linux: Stop using libthread_db/td_ta_thr_iter
TL;DR - GDB can hang if something refreshes the thread list out of the target while the target is running. GDB hangs inside td_ta_thr_iter. The fix is to not use that libthread_db function anymore. Long version: Running the testsuite against my all-stop-on-top-of-non-stop series is still exposing latent non-stop bugs. I was originally seeing this with the multi-create.exp test, back when we were still using libthread_db thread event breakpoints. The all-stop-on-top-of-non-stop series forces a thread list refresh each time GDB needs to start stepping over a breakpoint (to pause all threads). That test hits the thread event breakpoint often, resulting in a bunch of step-over operations, thus a bunch of thread list refreshes while some threads in the target are running. The commit adds a real non-stop mode test that triggers the issue, based on multi-create.exp, that does an explicit "info threads" when a breakpoint is hit. IOW, it does the same things the as-ns series was doing when testing multi-create.exp. The bug is a race, so it unfortunately takes several runs for the test to trigger it. In fact, even when setting the test running in a loop, it sometimes takes several minutes for it to trigger for me. The race is related to libthread_db's td_ta_thr_iter. This is libthread_db's entry point for walking the thread list of the inferior. Sometimes, when GDB refreshes the thread list from the target, libthread_db's td_ta_thr_iter can somehow see glibc's thread list as a cycle, and get stuck in an infinite loop. The issue is that when a thread exits, its thread control structure in glibc is moved from a "used" list to a "cache" list. These lists are simply circular linked lists where the "next/prev" pointers are embedded in the thread control structure itself. The "next" pointer of the last element of the list points back to the list's sentinel "head". There's only one set of "next/prev" pointers for both lists; thus a thread can only be in one of the lists at a time, not in both simultaneously. So when thread C exits, simplifying, the following happens. A-C are threads. stack_used and stack_cache are the list's heads. Before: stack_used -> A -> B -> C -> (&stack_used) stack_cache -> (&stack_cache) After: stack_used -> A -> B -> (&stack_used) stack_cache -> C -> (&stack_cache) td_ta_thr_iter starts by iterating at the list's head's next, and iterates until it sees a thread whose next pointer points to the list's head again. Thus in the before case above, C's next points to stack_used, indicating end of list. In the same case, the stack_cache list is empty. For each thread being iterated, td_ta_thr_iter reads the whole thread object out of the inferior. This includes the thread's "next" pointer. In the scenario above, it may happen that td_ta_thr_iter is iterating thread B and has already read B's thread structure just before thread C exits and its control structure moves to the cached list. Now, recall that td_ta_thr_iter is running in the context of GDB, and there's no locking between GDB and the inferior. From it's local copy of B, td_ta_thr_iter believes that the next thread after B is thread C, so it happilly continues iterating to C, a thread that has already exited, and is now in the stack cache list. After iterating C, td_ta_thr_iter finds the stack_cache head, which because it is not stack_used, td_ta_thr_iter assumes it's just another thread. After this, unless the reverse race triggers, GDB gets stuck in td_ta_thr_iter forever walking the stack_cache list, as no thread in thatlist has a next pointer that points back to stack_used (the terminating condition). Before fully understanding the issue, I tried adding cycle detection to GDB's td_ta_thr_iter callback. However, td_ta_thr_iter skips calling the callback in some cases, which means that it's possible that the callback isn't called at all, making it impossible for GDB to break the loop. I did manage to get GDB stuck in that state more than once. Fortunately, we can avoid the issue altogether. We don't really need td_ta_thr_iter for live debugging nowadays, given PTRACE_EVENT_CLONE. We already know how to map and lwp id to a thread id without iterating (thread_from_lwp), so use that more. gdb/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> * linux-nat.c (linux_handle_extended_wait): Call thread_db_notice_clone whenever a new clone LWP is detected. (linux_stop_and_wait_all_lwps, linux_unstop_all_lwps): New functions. * linux-nat.h (thread_db_attach_lwp): Delete declaration. (thread_db_notice_clone, linux_stop_and_wait_all_lwps) (linux_unstop_all_lwps): Declare. * linux-thread-db.c (struct thread_get_info_inout): Delete. (thread_get_info_callback): Delete. (thread_from_lwp): Use td_thr_get_info and record_thread. (thread_db_attach_lwp): Delete. (thread_db_notice_clone): New function. (try_thread_db_load_1): If /proc is mounted and shows the process'es task list, walk over all LWPs and call thread_from_lwp instead of relying on td_ta_thr_iter. (attach_thread): Don't call check_thread_signals here. Split the tail part of the function (which adds the thread to the core GDB thread list) to ... (record_thread): ... this function. Call check_thread_signals here. (thread_db_wait): Don't call thread_db_find_new_threads_1. Always call thread_from_lwp. (thread_db_update_thread_list): Rename to ... (thread_db_update_thread_list_org): ... this. (thread_db_update_thread_list): New function. (thread_db_find_thread_from_tid): Delete. (thread_db_get_ada_task_ptid): Simplify. * nat/linux-procfs.c: Include <sys/stat.h>. (linux_proc_task_list_dir_exists): New function. * nat/linux-procfs.h (linux_proc_task_list_dir_exists): Declare. gdb/gdbserver/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> * thread-db.c: Include "nat/linux-procfs.h". (thread_db_init): Skip listing new threads if the kernel supports PTRACE_EVENT_CLONE and /proc/PID/task/ is accessible. gdb/testsuite/ChangeLog: 2015-02-20 Pedro Alves <palves@redhat.com> * gdb.threads/multi-create-ns-info-thr.exp: New file.
Diffstat (limited to 'gdb/linux-thread-db.c')
-rw-r--r--gdb/linux-thread-db.c255
1 files changed, 111 insertions, 144 deletions
diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index 60d8742..f09685e 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -217,6 +217,13 @@ struct thread_db_info *thread_db_list;
static void thread_db_find_new_threads_1 (ptid_t ptid);
static void thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new);
+static void check_thread_signals (void);
+
+static void record_thread (struct thread_db_info *info,
+ struct thread_info *tp,
+ ptid_t ptid, const td_thrhandle_t *th_p,
+ const td_thrinfo_t *ti_p);
+
/* Add the current inferior to the list of processes using libpthread.
Return a pointer to the newly allocated object that was added to
THREAD_DB_LIST. HANDLE is the handle returned by dlopen'ing
@@ -398,64 +405,6 @@ have_threads (ptid_t ptid)
return iterate_over_threads (have_threads_callback, &pid) != NULL;
}
-struct thread_get_info_inout
-{
- struct thread_info *thread_info;
- struct thread_db_info *thread_db_info;
-};
-
-/* A callback function for td_ta_thr_iter, which we use to map all
- threads to LWPs.
-
- THP is a handle to the current thread; if INFOP is not NULL, the
- struct thread_info associated with this thread is returned in
- *INFOP.
-
- If the thread is a zombie, TD_THR_ZOMBIE is returned. Otherwise,
- zero is returned to indicate success. */
-
-static int
-thread_get_info_callback (const td_thrhandle_t *thp, void *argp)
-{
- td_thrinfo_t ti;
- td_err_e err;
- ptid_t thread_ptid;
- struct thread_get_info_inout *inout;
- struct thread_db_info *info;
-
- inout = argp;
- info = inout->thread_db_info;
-
- err = info->td_thr_get_info_p (thp, &ti);
- if (err != TD_OK)
- error (_("thread_get_info_callback: cannot get thread info: %s"),
- thread_db_err_str (err));
-
- if (ti.ti_lid == -1)
- {
- /* We'll get this if a threaded program has a thread call clone
- with CLONE_VM. `clone' sets the pthread LID of the new LWP
- to -1, which ends up clearing the parent thread's LID. */
- return 0;
- }
-
- /* Fill the cache. */
- thread_ptid = ptid_build (info->pid, ti.ti_lid, 0);
- inout->thread_info = find_thread_ptid (thread_ptid);
-
- if (inout->thread_info == NULL)
- {
- /* New thread. Attach to it now (why wait?). */
- if (!have_threads (thread_ptid))
- thread_db_find_new_threads_1 (thread_ptid);
- else
- attach_thread (thread_ptid, thp, &ti);
- inout->thread_info = find_thread_ptid (thread_ptid);
- gdb_assert (inout->thread_info != NULL);
- }
-
- return 0;
-}
/* Fetch the user-level thread id of PTID. */
@@ -463,9 +412,10 @@ static void
thread_from_lwp (ptid_t ptid)
{
td_thrhandle_t th;
+ td_thrinfo_t ti;
td_err_e err;
struct thread_db_info *info;
- struct thread_get_info_inout io = {0};
+ struct thread_info *tp;
/* Just in case td_ta_map_lwp2thr doesn't initialize it completely. */
th.th_unique = 0;
@@ -484,56 +434,37 @@ thread_from_lwp (ptid_t ptid)
error (_("Cannot find user-level thread for LWP %ld: %s"),
ptid_get_lwp (ptid), thread_db_err_str (err));
- /* Long-winded way of fetching the thread info. */
- io.thread_db_info = info;
- io.thread_info = NULL;
- thread_get_info_callback (&th, &io);
+ err = info->td_thr_get_info_p (&th, &ti);
+ if (err != TD_OK)
+ error (_("thread_get_info_callback: cannot get thread info: %s"),
+ thread_db_err_str (err));
+
+ /* Fill the cache. */
+ tp = find_thread_ptid (ptid);
+ record_thread (info, tp, ptid, &th, &ti);
}
-/* Attach to lwp PTID, doing whatever else is required to have this
- LWP under the debugger's control --- e.g., enabling event
- reporting. Returns true on success. */
+/* See linux-nat.h. */
+
int
-thread_db_attach_lwp (ptid_t ptid)
+thread_db_notice_clone (ptid_t parent, ptid_t child)
{
td_thrhandle_t th;
td_thrinfo_t ti;
td_err_e err;
struct thread_db_info *info;
- info = get_thread_db_info (ptid_get_pid (ptid));
+ info = get_thread_db_info (ptid_get_pid (child));
if (info == NULL)
return 0;
- /* This ptid comes from linux-nat.c, which should always fill in the
- LWP. */
- gdb_assert (ptid_get_lwp (ptid) != 0);
+ thread_from_lwp (child);
- /* Access an lwp we know is stopped. */
- info->proc_handle.ptid = ptid;
-
- /* If we have only looked at the first thread before libpthread was
- initialized, we may not know its thread ID yet. Make sure we do
- before we add another thread to the list. */
- if (!have_threads (ptid))
- thread_db_find_new_threads_1 (ptid);
-
- err = info->td_ta_map_lwp2thr_p (info->thread_agent, ptid_get_lwp (ptid),
- &th);
- if (err != TD_OK)
- /* Cannot find user-level thread. */
- return 0;
-
- err = info->td_thr_get_info_p (&th, &ti);
- if (err != TD_OK)
- {
- warning (_("Cannot get thread info: %s"), thread_db_err_str (err));
- return 0;
- }
-
- attach_thread (ptid, &th, &ti);
+ /* If we do not know about the main thread yet, this would be a good
+ time to find it. */
+ thread_from_lwp (parent);
return 1;
}
@@ -821,7 +752,30 @@ try_thread_db_load_1 (struct thread_db_info *info)
info->td_thr_tls_get_addr_p = dlsym (info->handle, "td_thr_tls_get_addr");
info->td_thr_tlsbase_p = dlsym (info->handle, "td_thr_tlsbase");
- if (thread_db_find_new_threads_silently (inferior_ptid) != 0)
+ /* It's best to avoid td_ta_thr_iter if possible. That walks data
+ structures in the inferior's address space that may be corrupted,
+ or, if the target is running, may change while we walk them. If
+ there's execution (and /proc is mounted), then we're already
+ attached to all LWPs. Use thread_from_lwp, which uses
+ td_ta_map_lwp2thr instead, which does not walk the thread list.
+
+ td_ta_map_lwp2thr uses ps_get_thread_area, but we can't use that
+ currently on core targets, as it uses ptrace directly. */
+ if (target_has_execution
+ && linux_proc_task_list_dir_exists (ptid_get_pid (inferior_ptid)))
+ {
+ struct lwp_info *lp;
+ int pid = ptid_get_pid (inferior_ptid);
+
+ linux_stop_and_wait_all_lwps ();
+
+ ALL_LWPS (lp)
+ if (ptid_get_pid (lp->ptid) == pid)
+ thread_from_lwp (lp->ptid);
+
+ linux_unstop_all_lwps ();
+ }
+ else if (thread_db_find_new_threads_silently (inferior_ptid) != 0)
{
/* Even if libthread_db initializes, if the thread list is
corrupted, we'd not manage to list any threads. Better reject this
@@ -1302,9 +1256,7 @@ static int
attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
const td_thrinfo_t *ti_p)
{
- struct private_thread_info *private;
struct thread_info *tp;
- td_err_e err;
struct thread_db_info *info;
/* If we're being called after a TD_CREATE event, we may already
@@ -1337,9 +1289,6 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
}
}
- if (target_has_execution)
- check_thread_signals ();
-
/* Under GNU/Linux, we have to attach to each and every thread. */
if (target_has_execution
&& tp == NULL)
@@ -1363,31 +1312,48 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
/* Otherwise, we sucessfully attached to the thread. */
}
+ info = get_thread_db_info (ptid_get_pid (ptid));
+ record_thread (info, tp, ptid, th_p, ti_p);
+ return 1;
+}
+
+/* Record a new thread in GDB's thread list. Creates the thread's
+ private info. If TP is NULL, creates a new thread. Otherwise,
+ uses TP. */
+
+static void
+record_thread (struct thread_db_info *info,
+ struct thread_info *tp,
+ ptid_t ptid, const td_thrhandle_t *th_p,
+ const td_thrinfo_t *ti_p)
+{
+ td_err_e err;
+ struct private_thread_info *private;
+ int new_thread = (tp == NULL);
+
+ /* A thread ID of zero may mean the thread library has not
+ initialized yet. Leave private == NULL until the thread library
+ has initialized. */
+ if (ti_p->ti_tid == 0)
+ return;
+
/* Construct the thread's private data. */
private = xmalloc (sizeof (struct private_thread_info));
memset (private, 0, sizeof (struct private_thread_info));
- /* A thread ID of zero may mean the thread library has not initialized
- yet. But we shouldn't even get here if that's the case. FIXME:
- if we change GDB to always have at least one thread in the thread
- list this will have to go somewhere else; maybe private == NULL
- until the thread_db target claims it. */
- gdb_assert (ti_p->ti_tid != 0);
private->th = *th_p;
private->tid = ti_p->ti_tid;
update_thread_state (private, ti_p);
/* Add the thread to GDB's thread list. */
if (tp == NULL)
- add_thread_with_info (ptid, private);
+ tp = add_thread_with_info (ptid, private);
else
tp->private = private;
- info = get_thread_db_info (ptid_get_pid (ptid));
-
/* Enable thread event reporting for this thread, except when
debugging a core file. */
- if (target_has_execution && thread_db_use_events ())
+ if (target_has_execution && thread_db_use_events () && new_thread)
{
err = info->td_thr_event_enable_p (th_p, 1);
if (err != TD_OK)
@@ -1395,7 +1361,8 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
target_pid_to_str (ptid), thread_db_err_str (err));
}
- return 1;
+ if (target_has_execution)
+ check_thread_signals ();
}
static void
@@ -1581,21 +1548,13 @@ thread_db_wait (struct target_ops *ops,
return ptid;
}
- /* If we do not know about the main thread yet, this would be a good time to
- find it. */
- if (ourstatus->kind == TARGET_WAITKIND_STOPPED && !have_threads (ptid))
- thread_db_find_new_threads_1 (ptid);
-
if (ourstatus->kind == TARGET_WAITKIND_STOPPED
&& ourstatus->value.sig == GDB_SIGNAL_TRAP)
/* Check for a thread event. */
check_event (ptid);
- if (have_threads (ptid))
- {
- /* Fill in the thread's user-level thread id. */
- thread_from_lwp (ptid);
- }
+ /* Fill in the thread's user-level thread id and status. */
+ thread_from_lwp (ptid);
return ptid;
}
@@ -1727,6 +1686,9 @@ find_new_threads_once (struct thread_db_info *info, int iteration,
data.info = info;
data.new_threads = 0;
+ /* See comment in thread_db_update_thread_list. */
+ gdb_assert (!target_has_execution || thread_db_use_events ());
+
TRY_CATCH (except, RETURN_MASK_ERROR)
{
/* Iterate over all user-space threads to discover new threads. */
@@ -1805,8 +1767,10 @@ update_thread_core (struct lwp_info *info, void *closure)
return 0;
}
+/* Update the thread list using td_ta_thr_iter. */
+
static void
-thread_db_update_thread_list (struct target_ops *ops)
+thread_db_update_thread_list_td_ta_thr_iter (struct target_ops *ops)
{
struct thread_db_info *info;
struct inferior *inf;
@@ -1830,6 +1794,29 @@ thread_db_update_thread_list (struct target_ops *ops)
thread_db_find_new_threads_1 (thread->ptid);
}
+}
+
+/* Implement the to_update_thread_list target method for this
+ target. */
+
+static void
+thread_db_update_thread_list (struct target_ops *ops)
+{
+ /* It's best to avoid td_ta_thr_iter if possible. That walks data
+ structures in the inferior's address space that may be corrupted,
+ or, if the target is running, the list may change while we walk
+ it. In the latter case, it's possible that a thread exits just
+ at the exact time that causes GDB to get stuck in an infinite
+ loop. To avoid pausing all threads whenever the core wants to
+ refresh the thread list, if the kernel supports clone events
+ (meaning we're always already attached to all LWPs), we use
+ thread_from_lwp immediately when we see an LWP stop. That uses
+ thread_db entry points that do not walk libpthread's thread list,
+ so should be safe, as well as more efficient. */
+ if (target_has_execution && !thread_db_use_events ())
+ ops->beneath->to_update_thread_list (ops->beneath);
+ else
+ thread_db_update_thread_list_td_ta_thr_iter (ops);
if (target_has_execution)
iterate_over_lwps (minus_one_ptid /* iterate over all */,
@@ -1962,33 +1949,13 @@ thread_db_get_thread_local_address (struct target_ops *ops,
return beneath->to_get_thread_local_address (beneath, ptid, lm, offset);
}
-/* Callback routine used to find a thread based on the TID part of
- its PTID. */
-
-static int
-thread_db_find_thread_from_tid (struct thread_info *thread, void *data)
-{
- long *tid = (long *) data;
-
- if (thread->private->tid == *tid)
- return 1;
-
- return 0;
-}
-
/* Implement the to_get_ada_task_ptid target method for this target. */
static ptid_t
thread_db_get_ada_task_ptid (struct target_ops *self, long lwp, long thread)
{
- struct thread_info *thread_info;
-
- thread_db_find_new_threads_1 (inferior_ptid);
- thread_info = iterate_over_threads (thread_db_find_thread_from_tid, &thread);
-
- gdb_assert (thread_info != NULL);
-
- return (thread_info->ptid);
+ /* NPTL uses a 1:1 model, so the LWP id suffices. */
+ return ptid_build (ptid_get_pid (inferior_ptid), lwp, 0);
}
static void