riscv-gnu-toolchain/gdb.git - Unnamed repository; edit this file 'description' to name the repository.

diff options

author	Andrew Burgess <aburgess@redhat.com>	2025-02-14 11:51:41 +0000
committer	Andrew Burgess <aburgess@redhat.com>	2025-03-15 12:36:46 +0000
commit	8bfe8a6bfdd45de43c626a12fd176750486a0759 (patch)
tree	f009f658a4f681d144a0c390a7ec3f23524f11a6 /gdb/python/python-config.py
parent	c7d973ab6189290fc894529cec5db7f585074ab4 (diff)
download	gdb-8bfe8a6bfdd45de43c626a12fd176750486a0759.zip gdb-8bfe8a6bfdd45de43c626a12fd176750486a0759.tar.gz gdb-8bfe8a6bfdd45de43c626a12fd176750486a0759.tar.bz2

gdb/python: handle non-utf-8 character from gdb.execute()

I noticed that it was not possible to return a string containing non utf-8 characters using gdb.execute(). For example, using the binary from the gdb.python/py-source-styling.exp test: (gdb) file ./gdb/testsuite/outputs/gdb.python/py-source-styling/py-source-styling Reading symbols from ./gdb/testsuite/outputs/gdb.python/py-source-styling/py-source-styling... (gdb) set style enabled off (gdb) list 26 21 int some_variable = 1234; 22 23 /* The following line contains a character that is non-utf-8. This is a 24 critical part of the test as Python 3 can't convert this into a string 25 using its default mechanism. */ 26 char c[] = "�"; /* List this line. */ 27 28 return 0; 29 } (gdb) python print(gdb.execute('list 26', to_string=True)) Python Exception <class 'UnicodeDecodeError'>: 'utf-8' codec can't decode byte 0xc0 in position 250: invalid start byte Error occurred in Python: 'utf-8' codec can't decode byte 0xc0 in position 250: invalid start byte It is necessary to disable styling before the initial 'list 26', otherwise the source will be passed through GNU source highlight, and GNU source highlight seems to be smart enough to figure out the character encoding, and convert it to UTF-8. This conversion is then cached in the source cache, and the later Python gdb.execute call will get back a pure UTF-8 string. If source styling is disabled, then GDB caches the string without the conversion to UTF-8, now the gdb.execute call gets back the string with a non-UTF-8 character within it, and Python throws an error during its attempt to create a string object. I'm not, at this point, proposing a solution that tries to guess the source file encoding, though I guess such a thing could be done. Instead, I think we should make use of the host_charset(), as set by the user with 'set host-charset ....' during the creation of the Python string. To do this, in execute_gdb_command, we should switch from PyUnicode_FromString, which requires the input be a UTF-8 string, to using PyUnicode_Decode, which allows GDB to specify the string encoding. We will use host_charset(). With this done, it is now possible to list the file contents using gdb.execute(), with the contents passing through a string: (gdb) set host-charset ISO-8859-1 (gdb) python print(gdb.execute('list 26', to_string=True), end='') 21 int some_variable = 1234; 22 23 /* The following line contains a character that is non-utf-8. This is a 24 critical part of the test as Python 3 can't convert this into a string 25 using its default mechanism. */ 26 char c[] = "À"; /* List this line. */ 27 28 return 0; 29 } (gdb) There are already plenty of other places in GDB's Python code where we use PyUnicode_Decode to create a string from something that might contain user generated content, so I believe this is the correct approach.

Diffstat (limited to 'gdb/python/python-config.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: