diff options
author | Andrew Burgess <aburgess@redhat.com> | 2024-05-01 10:47:47 +0100 |
---|---|---|
committer | Simon Marchi <simon.marchi@polymtl.ca> | 2024-08-29 14:56:59 -0400 |
commit | dcaa85e58c4ef50a92908e071ded631ce48c971c (patch) | |
tree | 453aa17b0b8822327b41c454b0c91be094ee9571 /gdb/testsuite/gdb.python | |
parent | 83fbcee1a1afb4d2251109a436af82f8dac5b142 (diff) | |
download | gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.zip gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.tar.gz gdb-dcaa85e58c4ef50a92908e071ded631ce48c971c.tar.bz2 |
gdb: reject inserting breakpoints between functions
When debugging ROCm code, you might have something like this:
__global__ void kernel ()
{
...
// break here
...
}
int main ()
{
// Code to call `kernel`
}
... where kernel is a function compiled to execute on the GPU. It does
not exist in the host x86-64 program that runs the main function, and
GDB doesn't know about that function until it is called, at which point
the runtime loads the corresponding code object and GDB learns about the
code of the "kernel" function. Before the GPU code object is loaded,
from the point of view of GDB, you might as well have blank lines
instead of the "kernel" function. The DWARF in the host program doesn't
describe anything at these lines.
So, a common problem that users face is:
- Start GDB with the host binary
- Place a breakpoint by line number at the "break here" line
- At this point, GDB only knows about the host code, the lines of the
`kernel` function are a big void.
- GDB finds no code mapped to the "break here" line and searches for
the first following line that has code mapped to it.
- GDB finds that the line with the opening bracket of the `main`
function (or around there) has code mapped to it, places breakpoint
there.
- User runs the program.
- The programs hits the breakpoint at the start of main.
- User is confused, because they didn't ask for a breakpoint in main.
If they continue, the code object eventually gets loaded, GDB reads the
debug info from it, re-evaluates the breakpoint locations, and at this
point the breakpoint is placed at the expected location.
The goal of this patch is to get rid of this annoyance.
A case similar to the one shown above can actually be simulated without
GPU-specific code: using a single source file to generate a library and
an executable loading that library (see the new test
gdb.linespec/line-breakpoint-outside-function.c for an example). Before
the library is loaded, trying to place a breakpoint in the library code
results in the breakpoint "drifting" down to the main function.
To address this problem, make it so that when a user requests a
breakpoint outside a function, GDB makes a pending breakpoint, rather
than placing a breakpoint at the next line with code, which happens to
be in the next function. When the GPU kernel or shared library gets
loaded, the breakpoint resolves to a location in the kernel or library.
Note that we still want breakpoints placed inside a function to
"drift" down to the next line with code. For example, here:
9
10 void foo()
11 {
12 int x;
13
14 x++;
There is probably no code associated to lines 10, 12 and 13, but the
user can still reasonably expect to be able to put a breakpoint there.
In my experience, GCC maps the function prologue to the line with the
opening curly bracket, so the user will be able to place a breakpoint
there anyway (line 11 in the example). But I don't really see a use
case to put a breakpoint above line 10 and expect to get a breakpoint in
foo. So I think that is a reasonable behavior change for GDB.
This is implemented using the following heuristic:
- If a breakpoint is requested at line L but there is no code mapped to
L, search for a following line with associated code (this already
exists today).
- However, if:
1. the found location falls in a function symbol's block
2. the found location's address is equal the entry PC of that
function
3. the found location's line is greater that the requested line
... then we don't place a breakpoint at the found location, we will
end up with a pending breakpoint.
Change the message "No line X in file..." to "No compiled code for line
X in file...". There is clearly a line 9 in the example above, so it
would be weird to say "No line 9 in file...". What we mean is that
there is no code associated to line 9.
All the regressions that I found this patch to cause were:
1. tests specifically this behavior where placing a breakpoint before
a function results in a breakpoint on that function, in which case I
removed the tests or changed them to expect a pending breakpoint
2. linespec tests expecting things like "break -line N garbage" to
error out because of the following garbage, but we now got a
different error because line N now doesn't resolve to something
anymore. For example, before:
(gdb) break -line 3 if foofoofoo == 1
No symbol "foofoofoo" in current context.
became
(gdb) break -line 3 if foofoofoo == 1
No line 3 in the current file.
These tests were modified to refer to a valid line with code, so
that we can still test what we intended to test.
Notes:
- The CUDA compiler "solves" this problem by adding dummy function
symbols between functions, that are never called. So when you try to
insert a breakpoint in the not-yet-loaded kernel, the breakpoint
still drifts, but is placed on some dummy symbol. For reasons that
would be too long to explain here, the ROCm compiler does not do
that, and it is not a desirable option.
- You can have constructs like this:
void host_function()
{
struct foo
{
static void __global__ kernel ()
{
// Place breakpoint here
}
};
// Host code that calls `kernel`
}
The heuristic won't work then, as the breakpoint will drift somewhere
inside the enclosing function, but won't be at the start of that
function. So a bogus breakpoint location will be created on the host
side. I don't think that people are going to use this kind of
construct often though, so we can probably ignore it (or at least it
shouldn't prevent making the more common case better).
ROCm doesn't support passing a lambda kernel function to
hipLaunchKernelGGL (the function used to launch kernels on the
device), but if it eventually does, there will be the same
problem.
I think that to properly support this, we will need some DWARF
improvements to be able to say "there is really nothing at these
lines" in the line table.
Co-Authored-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I3cc12cfa823dc7d8e24dd4d35bced8e8baf7f9b6
Diffstat (limited to 'gdb/testsuite/gdb.python')
-rw-r--r-- | gdb/testsuite/gdb.python/py-breakpoint.exp | 4 |
1 files changed, 3 insertions, 1 deletions
diff --git a/gdb/testsuite/gdb.python/py-breakpoint.exp b/gdb/testsuite/gdb.python/py-breakpoint.exp index c44477c..934690d 100644 --- a/gdb/testsuite/gdb.python/py-breakpoint.exp +++ b/gdb/testsuite/gdb.python/py-breakpoint.exp @@ -743,7 +743,9 @@ proc_with_prefix test_bkpt_explicit_loc {} { "No source file named foo.*" \ "set invalid explicit breakpoint by missing source and line" gdb_test "python bp1 = gdb.Breakpoint (source=\"$srcfile\", line=\"900\")" \ - "No line 900 in file \"$srcfile\".*" \ + [multi_line \ + "^No compiled code for line 900 in file \"$srcfile\"\\." \ + "Breakpoint $::decimal \[^\r\n\]+ pending\\."] \ "set invalid explicit breakpoint by source and invalid line" gdb_test "python bp1 = gdb.Breakpoint (function=\"blah\")" \ "Function \"blah\" not defined.*" \ |