Age | Commit message (Collapse) | Author | Files | Lines |
|
Currently GDB's source cache doesn't track whether the entries within
the cache are styled or not. This is pretty much fine, the assumption
is that any time we are fetching source code, we do so in order to
print it to the terminal, so where possible we always want styling
applied, and if styling is not applied, then it is because that file
cannot be styled for some reason.
Changes to 'set style enabled' cause the source cache to be flushed,
so future calls to fetch source code will regenerate the cache entries
with styling enabled or not as appropriate.
But this all assumes that styling is either on or off, and that
switching between these two states isn't done very often.
However, the Python API allows for individual commands to be executed
with styling turned off via gdb.execute(). See commit:
commit e5348a7ab3f11f4c096ee4ebcdb9eb2663337357
Date: Thu Feb 13 15:39:31 2025 +0000
gdb/python: new styling argument to gdb.execute
Currently the source cache doesn't handle this case. Consider this:
(gdb) list main
... snip, styled source code displayed here ...
(gdb) python gdb.execute("list main", True, False, False)
... snip, styled source code is still shown here ...
In the second case, the final `False` passed to gdb.execute() is
asking for unstyled output.
The problem is that, `get_source_lines` calls `ensure` to prime the
cache for the file in question, then `extract_lines` just pulls the
lines of interest from the cached contents.
In `ensure`, if there is a cache entry for the desired filename, then
that is considered good enough. There is no consideration about
whether the cache entry is styled or not.
This commit aims to fix this, after this commit, the `ensure` function
will make sure that the cache entry used by `get_source_lines` is
styled correctly.
I think there are two approaches I could take:
1. Allow multiple cache entries for a single file, a styled, and
non-styled entry. The `ensure` function would then place the
correct cache entry into the last position so that
`get_source_lines` would use the correct entry, or
2. Have `ensure` recalculate entries if the required styling mode is
different to the styling mode of the current entry.
Approach #1 is better if we are rapidly switching between styling
modes, while #2 might be better if we want to keep more files in the
cache and we only rarely switch styling modes.
In the end I chose approach #2, but the good thing is that the changes
are all contained within the `ensure` function. If in the future we
wanted to change to strategy #1, this could be done transparently to
the rest of GDB.
So after this commit, the `ensure` function checks if styling is
currently possible or not. If it is not, and the current entry is
styled, then the current entry only is dropped from the cache, and a
new, unstyled entry is created. Likewise, if the current entry is
non-styled, but styling is required, we drop one entry and
recalculate.
With this change in place, I have updated set_style_enabled (in
cli/cli-style.c) so the source cache is no longer flushed when the
style settings are changed, the source cache will automatically handle
changes to the style settings now.
This problem was discovered in PR gdb/32676.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32676
Approved-By: Tom Tromey <tom@tromey.com>
|
|
I noticed that it was not possible to return a string containing non
utf-8 characters using gdb.execute(). For example, using the binary
from the gdb.python/py-source-styling.exp test:
(gdb) file ./gdb/testsuite/outputs/gdb.python/py-source-styling/py-source-styling
Reading symbols from ./gdb/testsuite/outputs/gdb.python/py-source-styling/py-source-styling...
(gdb) set style enabled off
(gdb) list 26
21 int some_variable = 1234;
22
23 /* The following line contains a character that is non-utf-8. This is a
24 critical part of the test as Python 3 can't convert this into a string
25 using its default mechanism. */
26 char c[] = "�"; /* List this line. */
27
28 return 0;
29 }
(gdb) python print(gdb.execute('list 26', to_string=True))
Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xc0 in position 250: invalid start byte
Error occurred in Python: 'utf-8' codec can't decode byte 0xc0 in position 250: invalid start byte
It is necessary to disable styling before the initial 'list 26',
otherwise the source will be passed through GNU source highlight, and
GNU source highlight seems to be smart enough to figure out the
character encoding, and convert it to UTF-8. This conversion is then
cached in the source cache, and the later Python gdb.execute call will
get back a pure UTF-8 string.
If source styling is disabled, then GDB caches the string without the
conversion to UTF-8, now the gdb.execute call gets back the string
with a non-UTF-8 character within it, and Python throws an error
during its attempt to create a string object.
I'm not, at this point, proposing a solution that tries to guess the
source file encoding, though I guess such a thing could be done.
Instead, I think we should make use of the host_charset(), as set by
the user with 'set host-charset ....' during the creation of the
Python string.
To do this, in execute_gdb_command, we should switch from
PyUnicode_FromString, which requires the input be a UTF-8 string, to
using PyUnicode_Decode, which allows GDB to specify the string
encoding. We will use host_charset().
With this done, it is now possible to list the file contents using
gdb.execute(), with the contents passing through a string:
(gdb) set host-charset ISO-8859-1
(gdb) python print(gdb.execute('list 26', to_string=True), end='')
21 int some_variable = 1234;
22
23 /* The following line contains a character that is non-utf-8. This is a
24 critical part of the test as Python 3 can't convert this into a string
25 using its default mechanism. */
26 char c[] = "À"; /* List this line. */
27
28 return 0;
29 }
(gdb)
There are already plenty of other places in GDB's Python code where we
use PyUnicode_Decode to create a string from something that might
contain user generated content, so I believe this is the correct
approach.
|
|
The top comment in gdb.python/py-source-styling.exp was completely
wrong, clearly a cut&paste job from elsewhere. Write a comment that
actually reflects what the test does.
I've also moved the allow_python_tests check earlier in the file.
And I changed some 'return -1' into just 'return'. I'm not aware that
the '-1' adds any value.
I also folded a 'pass $gdb_test_name' into the preceding gdb_assert,
which I think is neater.
There is no change in what is actually being tested after this commit.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
This commit is the result of the following actions:
- Running gdb/copyright.py to update all of the copyright headers to
include 2024,
- Manually updating a few files the copyright.py script told me to
update, these files had copyright headers embedded within the
file,
- Regenerating gdbsupport/Makefile.in to refresh it's copyright
date,
- Using grep to find other files that still mentioned 2023. If
these files were updated last year from 2022 to 2023 then I've
updated them this year to 2024.
I'm sure I've probably missed some dates. Feel free to fix them up as
you spot them.
|
|
I ran across this site:
https://no-color.org/
... which lobbies for tools to recognize the NO_COLOR environment
variable and disable any terminal styling when it is seen.
This patch implements this for gdb.
Regression tested on x86-64 Fedora 38.
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Reviewed-by: Kevin Buettner <kevinb@redhat.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
This changes skip_python_tests to invert the sense, and renames it to
allow_python_tests.
|
|
This commit is the result of running the gdb/copyright.py script,
which automated the update of the copyright year range for all
source files managed by the GDB project to be updated to include
year 2023.
|
|
This commit adds support for source files that contain non utf-8
characters when performing source styling using the Python pygments
package. This does not change the behaviour of GDB when the GNU
Source Highlight library is used.
For the following problem description, assume that either GDB is built
without GNU Source Highlight support, of that this has been disabled
using 'maintenance set gnu-source-highlight enabled off'.
The initial problem reported was that a source file containing non
utf-8 characters would cause GDB to print a Python exception, and then
display the source without styling, e.g.:
Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xc0 in position 142: invalid start byte
/* Source code here, without styling... */
Further, as the user steps through different source files, each time
the problematic source file was evicted from the source cache, and
then later reloaded, the exception would be printed again.
Finally, this problem is only present when using Python 3, this issue
is not present for Python 2.
What makes this especially frustrating is that GDB can clearly print
the source file contents, they're right there... If we disable
styling completely, or make use of the GNU Source Highlight library,
then everything is fine. So why is there an error when we try to
apply styling using Python?
The problem is the use of PyString_FromString (which is an alias for
PyUnicode_FromString in Python 3), this function converts a C string
into a either a Unicode object (Py3) or a str object (Py2). For
Python 2 there is no unicode encoding performed during this function
call, but for Python 3 the input is assumed to be a uft-8 encoding
string for the purpose of the conversion. And here of course, is the
problem, if the source file contains non utf-8 characters, then it
should not be treated as utf-8, but that's what we do, and that's why
we get an error.
My first thought when looking at this was to spot when the
PyString_FromString call failed with a UnicodeDecodeError and silently
ignore the error. This would mean that GDB would print the source
without styling, but would also avoid the annoying exception message.
However, I also make use of `pygmentize`, a command line wrapper
around the Python pygments module, which I use to apply syntax
highlighting in the output of `less`. And this command line wrapper
is quite happy to syntax highlight my source file that contains non
utf-8 characters, so it feels like the problem should be solvable.
It turns out that inside the pygments module there is already support
for guessing the encoding of the incoming file content, if the
incoming content is not already a Unicode string. This is what
happens for Python 2 where the incoming content is of `str` type.
We could try and make GDB smarter when it comes to converting C
strings into Python Unicode objects; this would probably require us to
just try a couple of different encoding schemes rather than just
giving up after utf-8.
However, I figure, why bother? The pygments module already does this
for us, and the colorize API is not part of the documented external
API of GDB. So, why not just change the colorize API, instead of the
content being a Unicode string (for Python 3), lets just make the
content be a bytes object. The pygments module can then take
responsibility for guessing the encoding.
So, currently, the colorize API receives a unicode object, and returns
a unicode object. I propose that the colorize API receive a bytes
object, and return a bytes object.
|