path: root/gdb/testsuite/gdb.python/py-source-styling.exp
Age  Commit message  Author  Files  Lines
2025-03-21  gdb: check styled status of source cache entries  Andrew Burgess  1  -13/+94
Currently GDB's source cache doesn't track whether the entries within the cache are styled or not. This is mostly fine; the assumption is that any time we are fetching source code, we do so in order to print it to the terminal, so where possible we always want styling applied, and if styling is not applied, then it is because that file cannot be styled for some reason.

Changes to 'set style enabled' cause the source cache to be flushed, so future calls to fetch source code will regenerate the cache entries with styling enabled or not as appropriate.

But this all assumes that styling is either on or off, and that switching between these two states isn't done very often.

However, the Python API allows for individual commands to be executed with styling turned off via gdb.execute(). See commit:

  commit e5348a7ab3f11f4c096ee4ebcdb9eb2663337357
  Date:   Thu Feb 13 15:39:31 2025 +0000

      gdb/python: new styling argument to gdb.execute

Currently the source cache doesn't handle this case. Consider this:

  (gdb) list main
  ... snip, styled source code displayed here ...
  (gdb) python gdb.execute("list main", True, False, False)
  ... snip, styled source code is still shown here ...

In the second case, the final `False` passed to gdb.execute() is asking for unstyled output.

The problem is that `get_source_lines` calls `ensure` to prime the cache for the file in question, then `extract_lines` just pulls the lines of interest from the cached contents.

In `ensure`, if there is a cache entry for the desired filename, then that is considered good enough. There is no consideration of whether the cache entry is styled or not.

This commit aims to fix this: after this commit, the `ensure` function will make sure that the cache entry used by `get_source_lines` is styled correctly.

I think there are two approaches I could take:

  1. Allow multiple cache entries for a single file, a styled and a non-styled entry. The `ensure` function would then place the correct cache entry into the last position so that `get_source_lines` would use the correct entry, or

  2. Have `ensure` recalculate entries if the required styling mode is different to the styling mode of the current entry.

Approach #1 is better if we are rapidly switching between styling modes, while #2 might be better if we want to keep more files in the cache and we only rarely switch styling modes.

In the end I chose approach #2, but the good thing is that the changes are all contained within the `ensure` function. If in the future we wanted to change to strategy #1, this could be done transparently to the rest of GDB.

So after this commit, the `ensure` function checks whether styling is currently possible. If it is not, and the current entry is styled, then only the current entry is dropped from the cache, and a new, unstyled entry is created. Likewise, if the current entry is non-styled, but styling is required, we drop one entry and recalculate.

With this change in place, I have updated set_style_enabled (in cli/cli-style.c) so the source cache is no longer flushed when the style settings are changed; the source cache now handles changes to the style settings automatically.

This problem was discovered in PR gdb/32676.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32676
Approved-By: Tom Tromey <tom@tromey.com>
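To make approach #2 concrete, here is a minimal Python model of a cache whose entries remember whether they are styled. It is a sketch under stated assumptions, not GDB's C++ source_cache: the names `SourceCache`, `CacheEntry` and the `load(filename, styled)` callback are hypothetical, and the required styling mode is passed in as a parameter rather than queried from GDB's style settings. The entry that `ensure` touches is moved to the last position, which is the one `get_source_lines` reads from, and a styling mismatch recalculates just that one entry instead of flushing the whole cache.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CacheEntry:
    filename: str
    contents: str
    styled: bool  # Whether `contents` has styling applied.


class SourceCache:
    """Toy model of a source cache that tracks the styled status of entries.

    Hypothetical sketch: the most recently ensured entry is kept in the last
    position, and get_source_lines() always reads from that position.
    """

    def __init__(self, load: Callable[[str, bool], str], max_entries: int = 5):
        self._load = load  # load(filename, styled) -> file contents
        self._max_entries = max_entries
        self._entries: List[CacheEntry] = []

    def ensure(self, filename: str, styled: bool) -> CacheEntry:
        for i, entry in enumerate(self._entries):
            if entry.filename != filename:
                continue
            # Found an entry for this file; remove it from its current slot.
            del self._entries[i]
            if entry.styled != styled:
                # Approach #2: the styling mode differs, so recalculate only
                # this entry rather than flushing the whole cache.
                entry = CacheEntry(filename, self._load(filename, styled), styled)
            self._entries.append(entry)
            return entry
        # No entry for this file yet; evict the oldest one if the cache is full.
        if len(self._entries) >= self._max_entries:
            self._entries.pop(0)
        entry = CacheEntry(filename, self._load(filename, styled), styled)
        self._entries.append(entry)
        return entry

    def get_source_lines(self, filename: str, styled: bool,
                         first: int, last: int) -> List[str]:
        self.ensure(filename, styled)
        # Equivalent of extract_lines: take lines first..last (1-based,
        # inclusive) from the entry in the last position.
        lines = self._entries[-1].contents.splitlines()
        return lines[first - 1:last]
```

With a loader that wraps the text in an ANSI bold sequence when `styled` is true, a styled request followed by an unstyled one for the same file triggers exactly one reload, and a second unstyled request is served from the cache.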
2025-03-15  gdb/python: handle non-utf-8 character from gdb.execute()  Andrew Burgess  1  -14/+55
I noticed that it was not possible to return a string containing non UTF-8 characters using gdb.execute(). For example, using the binary from the gdb.python/py-source-styling.exp test:

  (gdb) file ./gdb/testsuite/outputs/gdb.python/py-source-styling/py-source-styling
  Reading symbols from ./gdb/testsuite/outputs/gdb.python/py-source-styling/py-source-styling...
  (gdb) set style enabled off
  (gdb) list 26
  21        int some_variable = 1234;
  22
  23        /* The following line contains a character that is non-utf-8.  This is a
  24           critical part of the test as Python 3 can't convert this into a string
  25           using its default mechanism.  */
  26        char c[] = "�";          /* List this line.  */
  27
  28        return 0;
  29      }
  (gdb) python print(gdb.execute('list 26', to_string=True))
  Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xc0 in position 250: invalid start byte
  Error occurred in Python: 'utf-8' codec can't decode byte 0xc0 in position 250: invalid start byte

It is necessary to disable styling before the initial 'list 26', otherwise the source will be passed through GNU Source Highlight, and GNU Source Highlight seems to be smart enough to figure out the character encoding and convert it to UTF-8. This conversion is then cached in the source cache, and the later Python gdb.execute call will get back a pure UTF-8 string.

If source styling is disabled, then GDB caches the string without the conversion to UTF-8; now the gdb.execute call gets back the string with a non-UTF-8 character within it, and Python throws an error during its attempt to create a string object.

I'm not, at this point, proposing a solution that tries to guess the source file encoding, though I guess such a thing could be done. Instead, I think we should make use of host_charset(), as set by the user with 'set host-charset ....', during the creation of the Python string.
To do this, in execute_gdb_command, we should switch from PyUnicode_FromString, which requires the input to be a UTF-8 string, to PyUnicode_Decode, which allows GDB to specify the string encoding. We will use host_charset().

With this done, it is now possible to list the file contents using gdb.execute(), with the contents passing through a string:

  (gdb) set host-charset ISO-8859-1
  (gdb) python print(gdb.execute('list 26', to_string=True), end='')
  21        int some_variable = 1234;
  22
  23        /* The following line contains a character that is non-utf-8.  This is a
  24           critical part of the test as Python 3 can't convert this into a string
  25           using its default mechanism.  */
  26        char c[] = "À";          /* List this line.  */
  27
  28        return 0;
  29      }
  (gdb)

There are already plenty of other places in GDB's Python code where we use PyUnicode_Decode to create a string from something that might contain user-generated content, so I believe this is the correct approach.
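The difference between the two C API calls can be reproduced from pure Python, since `bytes.decode` performs the same conversion. PyUnicode_FromString always treats its input as UTF-8, while PyUnicode_Decode takes the encoding as an argument. The helper names below are hypothetical stand-ins for those calls, and the strict error handling is an assumption of this sketch rather than something the commit states. The byte 0xc0 is "À" in ISO-8859-1 but can never start a valid UTF-8 sequence, which is exactly the "invalid start byte" error shown in the session.

```python
def from_string(raw: bytes) -> str:
    """Stand-in for PyUnicode_FromString: the input must be UTF-8."""
    return raw.decode("utf-8")


def decode_with_host_charset(raw: bytes, host_charset: str) -> str:
    """Stand-in for PyUnicode_Decode(raw, len(raw), host_charset, NULL)."""
    return raw.decode(host_charset)


# Line 26 of the test source as raw bytes, with the ISO-8859-1 byte 0xc0.
line_26 = b'  char c[] = "\xc0";  /* List this line.  */'

try:
    from_string(line_26)
except UnicodeDecodeError as err:
    print("UTF-8 decode failed:", err.reason)  # invalid start byte

# Decoding with the host charset succeeds; 0xc0 becomes U+00C0 ('À').
print(decode_with_host_charset(line_26, "ISO-8859-1"))
```

Latin-1 style charsets map every byte to a code point, so decoding with ISO-8859-1 cannot fail; whether the result looks right still depends on the user choosing a host charset that matches the file.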
2025-02-14  gdb/testsuite: clean ups in gdb.python/py-source-styling.exp  Andrew Burgess  1  -8/+8
The top comment in gdb.python/py-source-styling.exp was completely wrong, clearly a cut&paste job from elsewhere. Write a comment that actually reflects what the test does.

I've also moved the allow_python_tests check earlier in the file.

And I changed some 'return -1' into just 'return'. I'm not aware that the '-1' adds any value.

I also folded a 'pass $gdb_test_name' into the preceding gdb_assert, which I think is neater.

There is no change in what is actually being tested after this commit.

Approved-By: Tom Tromey <tom@tromey.com>
2024-01-12  Update copyright year range in header of all files managed by GDB  Andrew Burgess  1  -1/+1
This commit is the result of the following actions:

  - Running gdb/copyright.py to update all of the copyright headers to include 2024,

  - Manually updating a few files the copyright.py script told me to update; these files had copyright headers embedded within the file,

  - Regenerating gdbsupport/Makefile.in to refresh its copyright date,

  - Using grep to find other files that still mentioned 2023. If these files were updated last year from 2022 to 2023 then I've updated them this year to 2024.

I'm sure I've probably missed some dates. Feel free to fix them up as you spot them.
2023-09-29  Support the NO_COLOR environment variable  Tom Tromey  1  -3/+1
I ran across this site:

  https://no-color.org/

... which lobbies for tools to recognize the NO_COLOR environment variable and disable any terminal styling when it is seen.

This patch implements this for gdb.

Regression tested on x86-64 Fedora 38.

Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Reviewed-by: Kevin Buettner <kevinb@redhat.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
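The NO_COLOR convention is simple enough to show in a few lines. The sketch below follows the wording on no-color.org, where the variable disables colour when it is present and non-empty, whatever its value. It is an illustration only: the function name `styling_allowed` is hypothetical, and GDB's own check may differ in detail (for example, in how it treats an empty value).

```python
import os
from typing import Mapping, Optional


def styling_allowed(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return False when the NO_COLOR convention asks for unstyled output.

    Hypothetical helper: NO_COLOR disables styling when it is set to any
    non-empty value; an unset or empty NO_COLOR leaves styling allowed.
    """
    if environ is None:
        environ = os.environ
    return not environ.get("NO_COLOR")
```

Note that the value itself is not inspected, so even `NO_COLOR=0` disables styling under this convention.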
2023-01-13  Rename to allow_python_tests  Tom Tromey  1  -1/+1
This changes skip_python_tests to invert the sense, and renames it to allow_python_tests.
2023-01-01  Update copyright year range in header of all files managed by GDB  Joel Brobecker  1  -1/+1
This commit is the result of running the gdb/copyright.py script, which automatically updated the copyright year range of all source files managed by the GDB project to include 2023.
2022-01-26  gdb/python: handle non utf-8 characters when source highlighting  Andrew Burgess  1  -0/+64
This commit adds support for source files that contain non UTF-8 characters when performing source styling using the Python pygments package. This does not change the behaviour of GDB when the GNU Source Highlight library is used.

For the following problem description, assume that either GDB is built without GNU Source Highlight support, or that this has been disabled using 'maintenance set gnu-source-highlight enabled off'.

The initial problem reported was that a source file containing non UTF-8 characters would cause GDB to print a Python exception, and then display the source without styling, e.g.:

  Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xc0 in position 142: invalid start byte
  /* Source code here, without styling...  */

Further, as the user steps through different source files, each time the problematic source file is evicted from the source cache and then later reloaded, the exception is printed again.

Finally, this problem is only present when using Python 3; it is not present for Python 2.

What makes this especially frustrating is that GDB can clearly print the source file contents; they're right there... If we disable styling completely, or make use of the GNU Source Highlight library, then everything is fine. So why is there an error when we try to apply styling using Python?

The problem is the use of PyString_FromString (which is an alias for PyUnicode_FromString in Python 3). This function converts a C string into either a Unicode object (Py3) or a str object (Py2). For Python 2 no Unicode decoding is performed during this call, but for Python 3 the input is assumed to be a UTF-8 encoded string for the purpose of the conversion.

And here, of course, is the problem: if the source file contains non UTF-8 characters, then it should not be treated as UTF-8, but that's what we do, and that's why we get an error.
My first thought when looking at this was to spot when the PyString_FromString call failed with a UnicodeDecodeError and silently ignore the error. This would mean that GDB would print the source without styling, but would also avoid the annoying exception message.

However, I also make use of `pygmentize`, a command line wrapper around the Python pygments module, which I use to apply syntax highlighting in the output of `less`. This command line wrapper is quite happy to syntax highlight my source file that contains non UTF-8 characters, so it feels like the problem should be solvable.

It turns out that the pygments module already supports guessing the encoding of the incoming file content, if the incoming content is not already a Unicode string. This is what happens for Python 2, where the incoming content is of `str` type.

We could try to make GDB smarter when it comes to converting C strings into Python Unicode objects; this would probably require us to try a couple of different encoding schemes rather than giving up after UTF-8.

However, I figure, why bother? The pygments module already does this for us, and the colorize API is not part of GDB's documented external API. So, why not just change the colorize API? Instead of the content being a Unicode string (for Python 3), let's just make the content a bytes object. The pygments module can then take responsibility for guessing the encoding.

So, currently, the colorize API receives a Unicode object and returns a Unicode object. I propose that the colorize API receive a bytes object, and return a bytes object.
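A bytes-in, bytes-out colorize hook can be sketched without pygments to show why the change helps. In this sketch the hook decodes the raw file contents itself, styles the text, and re-encodes it with the same encoding. The `colorize` and `guess_decode` functions here are hypothetical, not GDB's actual hook or pygments' internals. The "highlighter" only wraps C block comments in an ANSI green sequence, and the encoding guess is a deliberately simple assumption: try UTF-8, then fall back to Latin-1, which can decode any byte sequence. In the real change pygments performs the guessing.

```python
import re
from typing import Optional, Tuple

# Toy highlighter: colour C block comments green with ANSI escape sequences.
_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
_GREEN, _RESET = "\x1b[32m", "\x1b[0m"


def guess_decode(contents: bytes) -> Tuple[str, str]:
    """Try UTF-8 first; fall back to Latin-1, which accepts every byte."""
    try:
        return contents.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return contents.decode("latin-1"), "latin-1"


def colorize(filename: str, contents: bytes) -> Optional[bytes]:
    """Hypothetical bytes-in, bytes-out colorize hook.

    Returns None for files this toy highlighter does not handle, so the
    caller can fall back to printing the source unstyled.
    """
    if not filename.endswith((".c", ".h")):
        return None
    text, encoding = guess_decode(contents)
    styled = _COMMENT_RE.sub(lambda m: _GREEN + m.group(0) + _RESET, text)
    # Re-encode with the encoding we decoded with, so non-UTF-8 bytes in the
    # source come back unchanged alongside the added escape sequences.
    return styled.encode(encoding)
```

Because the hook never forces the contents through UTF-8, a file containing the byte 0xc0 is styled instead of raising UnicodeDecodeError, and that byte appears unchanged in the returned bytes.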