gdb: reject inserting breakpoints between functions

When debugging ROCm code, you might have something like this: __global__ void kernel () { ... // break here ... } int main () { // Code to call `kernel` } ... where kernel is a function compiled to execute on the GPU. It does not exist in the host x86-64 program that runs the main function, and GDB doesn't know about that function until it is called, at which point the runtime loads the corresponding code object and GDB learns about the code of the "kernel" function. Before the GPU code object is loaded, from the point of view of GDB, you might as well have blank lines instead of the "kernel" function. The DWARF in the host program doesn't describe anything at these lines. So, a common problem that users face is: - Start GDB with the host binary - Place a breakpoint by line number at the "break here" line - At this point, GDB only knows about the host code, the lines of the `kernel` function are a big void. - GDB finds no code mapped to the "break here" line and searches for the first following line that has code mapped to it. - GDB finds that the line with the opening bracket of the `main` function (or around there) has code mapped to it, places breakpoint there. - User runs the program. - The programs hits the breakpoint at the start of main. - User is confused, because they didn't ask for a breakpoint in main. If they continue, the code object eventually gets loaded, GDB reads the debug info from it, re-evaluates the breakpoint locations, and at this point the breakpoint is placed at the expected location. The goal of this patch is to get rid of this annoyance. A case similar to the one shown above can actually be simulated without GPU-specific code: using a single source file to generate a library and an executable loading that library (see the new test gdb.linespec/line-breakpoint-outside-function.c for an example). Before the library is loaded, trying to place a breakpoint in the library code results in the breakpoint "drifting" down to the main function. To address this problem, make it so that when a user requests a breakpoint outside a function, GDB makes a pending breakpoint, rather than placing a breakpoint at the next line with code, which happens to be in the next function. When the GPU kernel or shared library gets loaded, the breakpoint resolves to a location in the kernel or library. Note that we still want breakpoints placed inside a function to "drift" down to the next line with code. For example, here: 9 10 void foo() 11 { 12 int x; 13 14 x++; There is probably no code associated to lines 10, 12 and 13, but the user can still reasonably expect to be able to put a breakpoint there. In my experience, GCC maps the function prologue to the line with the opening curly bracket, so the user will be able to place a breakpoint there anyway (line 11 in the example). But I don't really see a use case to put a breakpoint above line 10 and expect to get a breakpoint in foo. So I think that is a reasonable behavior change for GDB. This is implemented using the following heuristic: - If a breakpoint is requested at line L but there is no code mapped to L, search for a following line with associated code (this already exists today). - However, if: 1. the found location falls in a function symbol's block 2. the found location's address is equal the entry PC of that function 3. the found location's line is greater that the requested line ... then we don't place a breakpoint at the found location, we will end up with a pending breakpoint. Change the message "No line X in file..." to "No compiled code for line X in file...". There is clearly a line 9 in the example above, so it would be weird to say "No line 9 in file...". What we mean is that there is no code associated to line 9. All the regressions that I found this patch to cause were: 1. tests specifically this behavior where placing a breakpoint before a function results in a breakpoint on that function, in which case I removed the tests or changed them to expect a pending breakpoint 2. linespec tests expecting things like "break -line N garbage" to error out because of the following garbage, but we now got a different error because line N now doesn't resolve to something anymore. For example, before: (gdb) break -line 3 if foofoofoo == 1 No symbol "foofoofoo" in current context. became (gdb) break -line 3 if foofoofoo == 1 No line 3 in the current file. These tests were modified to refer to a valid line with code, so that we can still test what we intended to test. Notes: - The CUDA compiler "solves" this problem by adding dummy function symbols between functions, that are never called. So when you try to insert a breakpoint in the not-yet-loaded kernel, the breakpoint still drifts, but is placed on some dummy symbol. For reasons that would be too long to explain here, the ROCm compiler does not do that, and it is not a desirable option. - You can have constructs like this: void host_function() { struct foo { static void __global__ kernel () { // Place breakpoint here } }; // Host code that calls `kernel` } The heuristic won't work then, as the breakpoint will drift somewhere inside the enclosing function, but won't be at the start of that function. So a bogus breakpoint location will be created on the host side. I don't think that people are going to use this kind of construct often though, so we can probably ignore it (or at least it shouldn't prevent making the more common case better). ROCm doesn't support passing a lambda kernel function to hipLaunchKernelGGL (the function used to launch kernels on the device), but if it eventually does, there will be the same problem. I think that to properly support this, we will need some DWARF improvements to be able to say "there is really nothing at these lines" in the line table. Co-Authored-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I3cc12cfa823dc7d8e24dd4d35bced8e8baf7f9b6
author: Andrew Burgess <aburgess@redhat.com> 2024-05-01 10:47:47 +0100
committer: Simon Marchi <simon.marchi@polymtl.ca> 2024-08-29 14:56:59 -0400
commit: dcaa85e58c4ef50a92908e071ded631ce48c971c (patch)
tree: 453aa17b0b8822327b41c454b0c91be094ee9571 /gdb/testsuite/gdb.python
parent: 83fbcee1a1afb4d2251109a436af82f8dac5b142 (diff)
download: gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.zip
gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.tar.gz
gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.tar.bz2
1 files changed, 3 insertions, 1 deletions
diff --git a/gdb/testsuite/gdb.python/py-breakpoint.exp b/gdb/testsuite/gdb.python/py-breakpoint.exp
index c44477c..934690d 100644
--- a/gdb/testsuite/gdb.python/py-breakpoint.exp
+++ b/gdb/testsuite/gdb.python/py-breakpoint.exp
@@ -743,7 +743,9 @@ proc_with_prefix test_bkpt_explicit_loc {} {
 	"No source file named foo.*" \
 	"set invalid explicit breakpoint by missing source and line"
     gdb_test "python bp1 = gdb.Breakpoint (source=\"$srcfile\", line=\"900\")" \
-	"No line 900 in file \"$srcfile\".*" \
+	[multi_line \
+	     "^No compiled code for line 900 in file \"$srcfile\"\\." \
+	     "Breakpoint $::decimal \[^\r\n\]+ pending\\."] \
 	"set invalid explicit breakpoint by source and invalid line"
     gdb_test "python bp1 = gdb.Breakpoint (function=\"blah\")" \
 	"Function \"blah\" not defined.*" \
author	Andrew Burgess <aburgess@redhat.com>	2024-05-01 10:47:47 +0100
committer	Simon Marchi <simon.marchi@polymtl.ca>	2024-08-29 14:56:59 -0400
commit	dcaa85e58c4ef50a92908e071ded631ce48c971c (patch)
tree	453aa17b0b8822327b41c454b0c91be094ee9571 /gdb/testsuite/gdb.python
parent	83fbcee1a1afb4d2251109a436af82f8dac5b142 (diff)
download	gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.zip gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.tar.gz gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.tar.bz2