aboutsummaryrefslogtreecommitdiff
path: root/gdb/infrun.h
diff options
context:
space:
mode:
authorSimon Marchi <simon.marchi@efficios.com>2021-01-24 23:57:29 -0500
committerPedro Alves <pedro@palves.net>2021-03-26 15:58:47 +0000
commit1192f124a308601f5fef7a35715ccd6f904e7b17 (patch)
tree0aec8738ccfc8f3948a521f30257dae75e342e65 /gdb/infrun.h
parente5b9b39f8872fa01efb9c7f8ce7283fb9cd5122d (diff)
downloadgdb-1192f124a308601f5fef7a35715ccd6f904e7b17.zip
gdb-1192f124a308601f5fef7a35715ccd6f904e7b17.tar.gz
gdb-1192f124a308601f5fef7a35715ccd6f904e7b17.tar.bz2
gdb: generalize commit_resume, avoid commit-resuming when threads have pending statuses
The rationale for this patch comes from the ROCm port [1], the goal being to reduce the number of back and forths between GDB and the target when doing successive operations. I'll start with explaining the rationale and then go over the implementation. In the ROCm / GPU world, the term "wave" is somewhat equivalent to a "thread" in GDB. So if you read if from a GPU stand point, just s/thread/wave/. ROCdbgapi, the library used by GDB [2] to communicate with the GPU target, gives the illusion that it's possible for the debugger to control (start and stop) individual threads. But in reality, this is not how it works. Under the hood, all threads of a queue are controlled as a group. To stop one thread in a group of running ones, the state of all threads is retrieved from the GPU, all threads are destroyed, and all threads but the one we want to stop are re-created from the saved state. The net result, from the point of view of GDB, is that the library stopped one thread. The same thing goes if we want to resume one thread while others are running: the state of all running threads is retrieved from the GPU, they are all destroyed, and they are all re-created, including the thread we want to resume. This leads to some inefficiencies when combined with how GDB works, here are two examples: - Stopping all threads: because the target operates in non-stop mode, when the user interface mode is all-stop, GDB must stop all threads individually when presenting a stop. Let's suppose we have 1000 threads and the user does ^C. GDB asks the target to stop one thread. Behind the scenes, the library retrieves 1000 thread states and restores the 999 others still running ones. GDB asks the target to stop another one. The target retrieves 999 thread states and restores the 998 remaining ones. That means that to stop 1000 threads, we did 1000 back and forths with the GPU. It would have been much better to just retrieve the states once and stop there. - Resuming with pending events: suppose the 1000 threads hit a breakpoint at the same time. The breakpoint is conditional and evaluates to true for the first thread, to false for all others. GDB pulls one event (for the first thread) from the target, decides that it should present a stop, so stops all threads using stop_all_threads. All these other threads have a breakpoint event to report, which is saved in `thread_info::suspend::waitstatus` for later. When the user does "continue", GDB resumes that one thread that did hit the breakpoint. It then processes the pending events one by one as if they just arrived. It picks one, evaluates the condition to false, and resumes the thread. It picks another one, evaluates the condition to false, and resumes the thread. And so on. In between each resumption, there is a full state retrieval and re-creation. It would be much nicer if we could wait a little bit before sending those threads on the GPU, until it processed all those pending events. To address this kind of performance issue, ROCdbgapi has a concept called "forward progress required", which is a boolean state that allows its user (i.e. GDB) to say "I'm doing a bunch of operations, you can hold off putting the threads on the GPU until I'm done" (the "forward progress not required" state). Turning forward progress back on indicates to the library that all threads that are supposed to be running should now be really running on the GPU. It turns out that GDB has a similar concept, though not as general, commit_resume. One difference is that commit_resume is not stateful: the target can't look up "does the core need me to schedule resumed threads for execution right now". It is also specifically linked to the resume method, it is not used in other contexts. The target accumulates resumption requests through target_ops::resume calls, and then commits those resumptions when target_ops::commit_resume is called. The target has no way to check if it's ok to leave resumed threads stopped in other target methods. To bridge the gap, this patch generalizes the commit_resume concept in GDB to match the forward progress concept of ROCdbgapi. The current name (commit_resume) can be interpreted as "commit the previous resume calls". I renamed the concept to "commit_resumed", as in "commit the threads that are resumed". In the new version, we have two things: - the commit_resumed_state field in process_stratum_target: indicates whether GDB requires target stacks using this target to have resumed threads committed to the execution target/device. If false, an execution target is allowed to leave resumed threads un-committed at the end of whatever method it is executing. - the commit_resumed target method: called when commit_resumed_state transitions from false to true. While commit_resumed_state was false, the target may have left some resumed threads un-committed. This method being called tells it that it should commit them back to the execution device. Let's take the "Stopping all threads" scenario from above and see how it would work with the ROCm target with this change. Before stopping all threads, GDB would set the target's commit_resumed_state field to false. It would then ask the target to stop the first thread. The target would retrieve all threads' state from the GPU and mark that one as stopped. Since commit_resumed_state is false, it leaves all the other threads (still resumed) stopped. GDB would then proceed to call target_stop for all the other threads. Since resumed threads are not committed, this doesn't do any back and forth with the GPU. To simplify the implementation of targets, this patch makes it so that when calling certain target methods, the contract between the core and the targets guarantees that commit_resumed_state is false. This way, the target doesn't need two paths, one for commit_resumed_state == true and one for commit_resumed_state == false. It can just assert that commit_resumed_state is false and work with that assumption. This also helps catch places where we forgot to disable commit_resumed_state before calling the method, which represents a probable optimization opportunity. The commit adds assertions in the target method wrappers (target_resume and friends) to have some confidence that this contract between the core and the targets is respected. The scoped_disable_commit_resumed type is used to disable the commit resumed state of all process targets on construction, and selectively re-enable it on destruction (see below for criteria). Note that it only sets the process_stratum_target::commit_resumed_state flag. A subsequent call to maybe_call_commit_resumed_all_targets is necessary to call the commit_resumed method on all target stacks with process targets that got their commit_resumed_state flag turned back on. This separation is because we don't want to call the commit_resumed methods in scoped_disable_commit_resumed's destructor, as they may throw. On destruction, commit-resumed is not re-enabled for a given target if: 1. this target has no threads resumed, or 2. this target has at least one resumed thread with a pending status known to the core (saved in thread_info::suspend::waitstatus). The first point is not technically necessary, because a proper commit_resumed implementation would be a no-op if the target has no resumed threads. But since we have a flag do to a quick check, it shouldn't hurt. The second point is more important: together with the scoped_disable_commit_resumed instance added in fetch_inferior_event, it makes it so the "Resuming with pending events" described above is handled efficiently. Here's what happens in that case: 1. The user types "continue". 2. Upon destruction, the scoped_disable_commit_resumed in the `proceed` function does not enable commit-resumed, as it sees some threads have pending statuses. 3. fetch_inferior_event is called to handle another event, the breakpoint hit evaluates to false, and that thread is resumed. Because there are still more threads with pending statuses, the destructor of scoped_disable_commit_resumed in fetch_inferior_event still doesn't enable commit-resumed. 4. Rinse and repeat step 3, until the last pending status is handled by fetch_inferior_event. In that case, scoped_disable_commit_resumed's destructor sees there are no more threads with pending statues, so it asks the target to commit resumed threads. This allows us to avoid all unnecessary back and forths, there is a single commit_resumed call once all pending statuses are processed. This change required remote_target::remote_stop_ns to learn how to handle stopping threads that were resumed but pending vCont. The simplest example where that happens is when using the remote target in all-stop, but with "maint set target-non-stop on", to force it to operate in non-stop mode under the hood. If two threads hit a breakpoint at the same time, GDB will receive two stop replies. It will present the stop for one thread and save the other one in thread_info::suspend::waitstatus. Before this patch, when doing "continue", GDB first resumes the thread without a pending status: Sending packet: $vCont;c:p172651.172676#f3 It then consumes the pending status in the next fetch_inferior_event call: [infrun] do_target_wait_1: Using pending wait status status->kind = stopped, signal = GDB_SIGNAL_TRAP for Thread 1517137.1517137. [infrun] target_wait (-1.0.0, status) = [infrun] 1517137.1517137.0 [Thread 1517137.1517137], [infrun] status->kind = stopped, signal = GDB_SIGNAL_TRAP It then realizes it needs to stop all threads to present the stop, so stops the thread it just resumed: [infrun] stop_all_threads: Thread 1517137.1517137 not executing [infrun] stop_all_threads: Thread 1517137.1517174 executing, need stop remote_stop called Sending packet: $vCont;t:p172651.172676#04 This is an unnecessary resume/stop. With this patch, we don't commit resumed threads after proceeding, because of the pending status: [infrun] maybe_commit_resumed_all_process_targets: not requesting commit-resumed for target extended-remote, a thread has a pending waitstatus When GDB handles the pending status and stop_all_threads runs, we stop a resumed but pending vCont thread: remote_stop_ns: Enqueueing phony stop reply for thread pending vCont-resume (1520940, 1520976, 0) That thread was never actually resumed on the remote stub / gdbserver, so we shouldn't send a packet to the remote side asking to stop the thread. Note that there are paths that resume the target and then do a synchronous blocking wait, in sort of nested event loop, via wait_sync_command_done. For example, inferior function calls, or any run control command issued from a breakpoint command list. We handle that making wait_sync_command_one a "sync" point -- force forward progress, or IOW, force-enable commit-resumed state. gdb/ChangeLog: yyyy-mm-dd Simon Marchi <simon.marchi@efficios.com> Pedro Alves <pedro@palves.net> * infcmd.c (run_command_1, attach_command, detach_command) (interrupt_target_1): Use scoped_disable_commit_resumed. * infrun.c (do_target_resume): Remove target_commit_resume call. (commit_resume_all_targets): Remove. (maybe_set_commit_resumed_all_targets): New. (maybe_call_commit_resumed_all_targets): New. (enable_commit_resumed): New. (scoped_disable_commit_resumed::scoped_disable_commit_resumed) (scoped_disable_commit_resumed::~scoped_disable_commit_resumed) (scoped_disable_commit_resumed::reset) (scoped_disable_commit_resumed::reset_and_commit) (scoped_enable_commit_resumed::scoped_enable_commit_resumed) (scoped_enable_commit_resumed::~scoped_enable_commit_resumed): New. (proceed): Use scoped_disable_commit_resumed and maybe_call_commit_resumed_all_targets. (fetch_inferior_event): Use scoped_disable_commit_resumed. * infrun.h (struct scoped_disable_commit_resumed): New. (maybe_call_commit_resumed_all_process_targets): New. (struct scoped_enable_commit_resumed): New. * mi/mi-main.c (exec_continue): Use scoped_disable_commit_resumed. * process-stratum-target.h (class process_stratum_target): <commit_resumed_state>: New. * record-full.c (record_full_wait_1): Change commit_resumed_state around calling commit_resumed. * remote.c (class remote_target) <commit_resume>: Rename to... <commit_resumed>: ... this. (struct stop_reply): Move up. (remote_target::commit_resume): Rename to... (remote_target::commit_resumed): ... this. Check if there is any thread pending vCont resume. (remote_target::remote_stop_ns): Generate stop replies for resumed but pending vCont threads. (remote_target::wait_ns): Add gdb_assert. * target-delegates.c: Regenerate. * target.c (target_wait, target_resume): Assert that the current process_stratum target isn't in commit-resumed state. (defer_target_commit_resume): Remove. (target_commit_resume): Remove. (target_commit_resumed): New. (make_scoped_defer_target_commit_resume): Remove. (target_stop): Assert that the current process_stratum target isn't in commit-resumed state. * target.h (struct target_ops) <commit_resume>: Rename to ... <commit_resumed>: ... this. (target_commit_resume): Remove. (target_commit_resumed): New. (make_scoped_defer_target_commit_resume): Remove. * top.c (wait_sync_command_done): Use scoped_enable_commit_resumed. [1] https://github.com/ROCm-Developer-Tools/ROCgdb/ [2] https://github.com/ROCm-Developer-Tools/ROCdbgapi Change-Id: I836135531a29214b21695736deb0a81acf8cf566
Diffstat (limited to 'gdb/infrun.h')
-rw-r--r--gdb/infrun.h98
1 files changed, 98 insertions, 0 deletions
diff --git a/gdb/infrun.h b/gdb/infrun.h
index e643c84..220ccc7 100644
--- a/gdb/infrun.h
+++ b/gdb/infrun.h
@@ -273,4 +273,102 @@ extern void all_uis_on_sync_execution_starting (void);
detach. */
extern void restart_after_all_stop_detach (process_stratum_target *proc_target);
+/* RAII object to temporarily disable the requirement for target
+ stacks to commit their resumed threads.
+
+ On construction, set process_stratum_target::commit_resumed_state
+ to false for all process_stratum targets in all target
+ stacks.
+
+ On destruction (or if reset_and_commit() is called), set
+ process_stratum_target::commit_resumed_state to true for all
+ process_stratum targets in all target stacks, except those that:
+
+ - have no resumed threads
+ - have a resumed thread with a pending status
+
+ target_commit_resumed is not called in the destructor, because its
+ implementations could throw, and we don't to swallow that error in
+ a destructor. Instead, the caller should call the
+ reset_and_commit_resumed() method so that an eventual exception can
+ propagate. "reset" in the method name refers to the fact that this
+ method has the same effect as the destructor, in addition to
+ committing resumes.
+
+ The creation of nested scoped_disable_commit_resumed objects is
+ tracked, such that only the outermost instance actually does
+ something, for cases like this:
+
+ void
+ inner_func ()
+ {
+ scoped_disable_commit_resumed disable;
+
+ // do stuff
+
+ disable.reset_and_commit ();
+ }
+
+ void
+ outer_func ()
+ {
+ scoped_disable_commit_resumed disable;
+
+ for (... each thread ...)
+ inner_func ();
+
+ disable.reset_and_commit ();
+ }
+
+ In this case, we don't want the `disable` destructor in
+ `inner_func` to require targets to commit resumed threads, so that
+ the `reset_and_commit()` call in `inner_func` doesn't actually
+ resume threads. */
+
+struct scoped_disable_commit_resumed
+{
+ explicit scoped_disable_commit_resumed (const char *reason);
+ ~scoped_disable_commit_resumed ();
+
+ DISABLE_COPY_AND_ASSIGN (scoped_disable_commit_resumed);
+
+ /* Undoes the disabling done by the ctor, and calls
+ maybe_call_commit_resumed_all_targets(). */
+ void reset_and_commit ();
+
+private:
+ /* Undoes the disabling done by the ctor. */
+ void reset ();
+
+ /* Whether this object has been reset. */
+ bool m_reset = false;
+
+ const char *m_reason;
+ bool m_prev_enable_commit_resumed;
+};
+
+/* Call target_commit_resumed method on all target stacks whose
+ process_stratum target layer has COMMIT_RESUME_STATE set. */
+
+extern void maybe_call_commit_resumed_all_targets ();
+
+/* RAII object to temporarily enable the requirement for target stacks
+ to commit their resumed threads. This is the inverse of
+ scoped_disable_commit_resumed. The constructor calls the
+ maybe_call_commit_resumed_all_targets function itself, since it's
+ OK to throw from a constructor. */
+
+struct scoped_enable_commit_resumed
+{
+ explicit scoped_enable_commit_resumed (const char *reason);
+ ~scoped_enable_commit_resumed ();
+
+ DISABLE_COPY_AND_ASSIGN (scoped_enable_commit_resumed);
+
+private:
+ const char *m_reason;
+ bool m_prev_enable_commit_resumed;
+};
+
+
#endif /* INFRUN_H */