aboutsummaryrefslogtreecommitdiff
path: root/gdb/ada-casefold.h
AgeCommit message (Collapse)AuthorFilesLines
2024-01-12Update copyright year range in header of all files managed by GDBAndrew Burgess1-1/+1
This commit is the result of the following actions: - Running gdb/copyright.py to update all of the copyright headers to include 2024, - Manually updating a few files the copyright.py script told me to update, these files had copyright headers embedded within the file, - Regenerating gdbsupport/Makefile.in to refresh it's copyright date, - Using grep to find other files that still mentioned 2023. If these files were updated last year from 2022 to 2023 then I've updated them this year to 2024. I'm sure I've probably missed some dates. Feel free to fix them up as you spot them.
2023-02-27Remove old GNU indent directivesTom Tromey1-1/+1
Now that gdb_indent.sh has been removed, I think it makes sense to also remove the directives intended for GNU indent.
2023-01-01Update copyright year range in header of all files managed by GDBJoel Brobecker1-1/+1
This commit is the result of running the gdb/copyright.py script, which automated the update of the copyright year range for all source files managed by the GDB project to be updated to include year 2023.
2022-03-07Handle non-ASCII identifiers in AdaTom Tromey1-0/+1345
Ada allows non-ASCII identifiers, and GNAT supports several such encodings. This patch adds the corresponding support to gdb. GNAT encodes non-ASCII characters using special symbol names. For character sets like Latin-1, where all characters are a single byte, it uses a "U" followed by the hex for the character. So, for example, thorn would be encoded as "Ufe" (0xFE being lower case thorn). For wider characters, despite what the manual says (it claims Shift-JIS and EUC can be used), in practice recent versions only support Unicode. Here, characters in the base plane are represented using "Wxxxx" and characters outside the base plane using "WWxxxxxxxx". GNAT has some further quirks here. Ada is case-insensitive, and GNAT emits symbols that have been case-folded. For characters in ASCII, and for all characters in non-Unicode character sets, lower case is used. For Unicode, however, characters that fit in a single byte are converted to lower case, but all others are converted to upper case. Furthermore, there is a bug in GNAT where two symbols that differ only in the case of "Y WITH DIAERESIS" (and potentially others, I did not check exhaustively) can be used in one program. I chose to omit handling this case from gdb, on the theory that it is hard to figure out the logic, and anyway if the bug is ever fixed, we'll regret having a heuristic. This patch introduces a new "ada source-charset" setting. It defaults to Latin-1, as that is GNAT's default. This setting controls how "U" characters are decoded -- W/WW are always handled as UTF-32. The ada_tag_name_from_tsd change is needed because this function will read memory from the inferior and interpret it -- and this caused an encoding failure on PPC when running a test that tries to read uninitialized memory. This patch implements its own UTF-32-based case folder. This avoids host platform quirks, and is relatively simple. A short Python program to generate the case-folding table is included. It simply relies on whatever version of Unicode is used by the host Python, which seems basically acceptable. Test cases for UTF-8, Latin-1, and Latin-3 are included. This exercises most of the new code paths, aside from Y WITH DIAERESIS as noted above.