From 4f01ae3761ca1f8dd7a33b833ae30624f047ac9c Mon Sep 17 00:00:00 2001
From: David Malcolm
Date: Wed, 21 Jun 2023 21:49:00 -0400
Subject: diagnostics: add support for "text art" diagrams

Existing text output in GCC has to be implemented by writing
sequentially to a pretty_printer instance. This makes it hard to
implement some kinds of diagnostic output (see e.g.
diagnostic-show-locus.cc).

This patch adds more flexible ways of creating text output:
- a canvas class, which can be "painted" to via random-access (rather
  than sequentially)
- a table class for 2D grid layout, supporting items that span
  multiple rows/columns
- a widget class for organizing diagrams hierarchically.

The patch also expands GCC's diagnostics subsystem so that diagnostics
can have "text art" diagrams - think ASCII art, but potentially
including some Unicode characters, such as box-drawing chars.

The new code is in a new "gcc/text-art" subdirectory and "text_art"
namespace.

The patch adds a new "-fdiagnostics-text-art-charset=VAL" option, with
values:
- "none": don't emit diagrams (added to -fdiagnostics-plain-output)
- "ascii": use pure ASCII in diagrams
- "unicode": allow for conservative use of unicode drawing characters
  (such as box-drawing characters).
- "emoji" (the default): as "unicode", but potentially allow for
  conservative use of emoji in the output (such as U+26A0 WARNING SIGN).
I made it possible to disable emoji separately from unicode as I believe
there's a generation gap in acceptance of these characters (some older
programmers have a visceral reaction against them, whereas younger
programmers may have no problem with them).

Diagrams are emitted to stderr by default. With SARIF output they are
captured as a location in "relatedLocations", with the diagram as a
code block in Markdown within a "markdown" property of a message.

This patch doesn't add any such diagram usage to GCC, saving that for
followups, apart from adding a plugin to the test suite to exercise the
functionality.
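To illustrate the difference between sequential output and a canvas that is
"painted" to via random access, here is a minimal Python sketch. It is not
the text_art::canvas API from this patch (that code is C++ and is not shown
in this diff); the Canvas class, its methods, and draw_box are hypothetical
names chosen for illustration only.

```python
class Canvas:
    """Hypothetical random-access character grid (not GCC's text_art::canvas)."""

    def __init__(self, width: int, height: int, fill: str = " "):
        self.width = width
        self.height = height
        # One list of single characters per row.
        self.cells = [[fill] * width for _ in range(height)]

    def paint(self, x: int, y: int, ch: str) -> None:
        """Set one cell; cells can be painted in any order."""
        self.cells[y][x] = ch

    def paint_text(self, x: int, y: int, text: str) -> None:
        """Paint a string left to right, starting at (x, y)."""
        for offset, ch in enumerate(text):
            self.paint(x + offset, y, ch)

    def to_lines(self) -> list:
        """Flatten the grid into lines only once painting is finished."""
        return ["".join(row).rstrip() for row in self.cells]


def draw_box(canvas: Canvas, x: int, y: int, w: int, h: int) -> None:
    """Paint an ASCII box outline whose top-left corner is at (x, y)."""
    for dx in range(w):
        canvas.paint(x + dx, y, "-")
        canvas.paint(x + dx, y + h - 1, "-")
    for dy in range(h):
        canvas.paint(x, y + dy, "|")
        canvas.paint(x + w - 1, y + dy, "|")
    for cx, cy in ((x, y), (x + w - 1, y), (x, y + h - 1), (x + w - 1, y + h - 1)):
        canvas.paint(cx, cy, "+")


canvas = Canvas(9, 3)
canvas.paint_text(2, 1, "hello")  # The middle row is painted first...
draw_box(canvas, 0, 0, 9, 3)      # ...and the border around it afterwards.
print("\n".join(canvas.to_lines()))
```

With a purely sequential printer the border's top row would have to be
written before the text it encloses; here the painting order is independent
of the output order, which is the property the canvas class is described as
providing.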

contrib/ChangeLog:
	* unicode/gen-box-drawing-chars.py: New file.
	* unicode/gen-combining-chars.py: New file.
	* unicode/gen-printable-chars.py: New file.

gcc/ChangeLog:
	* Makefile.in (OBJS-libcommon): Add text-art/box-drawing.o,
	text-art/canvas.o, text-art/ruler.o, text-art/selftests.o,
	text-art/style.o, text-art/styled-string.o, text-art/table.o,
	text-art/theme.o, and text-art/widget.o.
	* color-macros.h (COLOR_FG_BRIGHT_BLACK): New.
	(COLOR_FG_BRIGHT_RED): New.
	(COLOR_FG_BRIGHT_GREEN): New.
	(COLOR_FG_BRIGHT_YELLOW): New.
	(COLOR_FG_BRIGHT_BLUE): New.
	(COLOR_FG_BRIGHT_MAGENTA): New.
	(COLOR_FG_BRIGHT_CYAN): New.
	(COLOR_FG_BRIGHT_WHITE): New.
	(COLOR_BG_BRIGHT_BLACK): New.
	(COLOR_BG_BRIGHT_RED): New.
	(COLOR_BG_BRIGHT_GREEN): New.
	(COLOR_BG_BRIGHT_YELLOW): New.
	(COLOR_BG_BRIGHT_BLUE): New.
	(COLOR_BG_BRIGHT_MAGENTA): New.
	(COLOR_BG_BRIGHT_CYAN): New.
	(COLOR_BG_BRIGHT_WHITE): New.
	* common.opt (fdiagnostics-text-art-charset=): New option.
	(diagnostic-text-art.h): New SourceInclude.
	(diagnostic_text_art_charset): New Enum and EnumValues.
	* configure: Regenerate.
	* configure.ac (gccdepdir): Add text-art to loop.
	* diagnostic-diagram.h: New file.
	* diagnostic-format-json.cc (json_emit_diagram): New.
	(diagnostic_output_format_init_json): Wire it up to
	context->m_diagrams.m_emission_cb.
	* diagnostic-format-sarif.cc: Include "diagnostic-diagram.h" and
	"text-art/canvas.h".
	(sarif_result::on_nested_diagnostic): Move code to...
	(sarif_result::add_related_location): ...this new function.
	(sarif_result::on_diagram): New.
	(sarif_builder::emit_diagram): New.
	(sarif_builder::make_message_object_for_diagram): New.
	(sarif_emit_diagram): New.
	(diagnostic_output_format_init_sarif): Set
	context->m_diagrams.m_emission_cb to sarif_emit_diagram.
	* diagnostic-text-art.h: New file.
	* diagnostic.cc: Include "diagnostic-text-art.h",
	"diagnostic-diagram.h", and "text-art/theme.h".
	(diagnostic_initialize): Initialize context->m_diagrams and
	call diagnostics_text_art_charset_init.
	(diagnostic_finish): Clean up context->m_diagrams.m_theme.
	(diagnostic_emit_diagram): New.
	(diagnostics_text_art_charset_init): New.
	* diagnostic.h (text_art::theme): New forward decl.
	(class diagnostic_diagram): Likewise.
	(diagnostic_context::m_diagrams): New field.
	(diagnostic_emit_diagram): New decl.
	* doc/invoke.texi (Diagnostic Message Formatting Options): Add
	-fdiagnostics-text-art-charset=.
	(-fdiagnostics-plain-output): Add
	-fdiagnostics-text-art-charset=none.
	* gcc.cc: Include "diagnostic-text-art.h".
	(driver_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
	* opts-common.cc (decode_cmdline_options_to_array): Add
	"-fdiagnostics-text-art-charset=none" to expanded_args for
	-fdiagnostics-plain-output.
	* opts.cc: Include "diagnostic-text-art.h".
	(common_handle_option): Handle OPT_fdiagnostics_text_art_charset_.
	* pretty-print.cc (pp_unicode_character): New.
	* pretty-print.h (pp_unicode_character): New decl.
	* selftest-run-tests.cc: Include "text-art/selftests.h".
	(selftest::run_tests): Call text_art_tests.
	* text-art/box-drawing-chars.inc: New file, generated by
	contrib/unicode/gen-box-drawing-chars.py.
	* text-art/box-drawing.cc: New file.
	* text-art/box-drawing.h: New file.
	* text-art/canvas.cc: New file.
	* text-art/canvas.h: New file.
	* text-art/ruler.cc: New file.
	* text-art/ruler.h: New file.
	* text-art/selftests.cc: New file.
	* text-art/selftests.h: New file.
	* text-art/style.cc: New file.
	* text-art/styled-string.cc: New file.
	* text-art/table.cc: New file.
	* text-art/table.h: New file.
	* text-art/theme.cc: New file.
	* text-art/theme.h: New file.
	* text-art/types.h: New file.
	* text-art/widget.cc: New file.
	* text-art/widget.h: New file.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-text-art-ascii-bw.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-ascii-color.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-none.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-unicode-bw.c: New test.
	* gcc.dg/plugin/diagnostic-test-text-art-unicode-color.c: New test.
	* gcc.dg/plugin/diagnostic_plugin_test_text_art.c: New test plugin.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add them.

libcpp/ChangeLog:
	* charset.cc (get_cppchar_property): New function template, based
	on...
	(cpp_wcwidth): ...this function. Rework to use the above.
	Include "combining-chars.inc".
	(cpp_is_combining_char): New function.
	Include "printable-chars.inc".
	(cpp_is_printable_char): New function.
	* combining-chars.inc: New file, generated by
	contrib/unicode/gen-combining-chars.py.
	* include/cpplib.h (cpp_is_combining_char): New function decl.
	(cpp_is_printable_char): New function decl.
	* printable-chars.inc: New file, generated by
	contrib/unicode/gen-printable-chars.py.

Signed-off-by: David Malcolm
---
 contrib/unicode/gen-box-drawing-chars.py | 94 ++++++++++++++++++++++++++++++++
 contrib/unicode/gen-combining-chars.py   | 75 +++++++++++++++++++++++++
 contrib/unicode/gen-printable-chars.py   | 77 ++++++++++++++++++++++++++
 3 files changed, 246 insertions(+)
 create mode 100755 contrib/unicode/gen-box-drawing-chars.py
 create mode 100755 contrib/unicode/gen-combining-chars.py
 create mode 100755 contrib/unicode/gen-printable-chars.py

(limited to 'contrib/unicode')

diff --git a/contrib/unicode/gen-box-drawing-chars.py b/contrib/unicode/gen-box-drawing-chars.py
new file mode 100755
index 0000000..9a55266
--- /dev/null
+++ b/contrib/unicode/gen-box-drawing-chars.py
@@ -0,0 +1,94 @@
+#!/usr/bin/env python3
+#
+# Script to generate gcc/text-art/box-drawing-chars.inc
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# . */
+
+import unicodedata
+
+def get_box_drawing_char_name(up: bool,
+                              down: bool,
+                              left: bool,
+                              right: bool) -> str:
+    if 0:
+        print(f'{locals()=}')
+    if up and down:
+        vertical = True
+        up = False
+        down = False
+    else:
+        vertical = False
+
+    if left and right:
+        horizontal = True
+        left = False
+        right = False
+    else:
+        horizontal = False
+
+    weights = []
+    heavy = []
+    light = []
+    dirs = []
+    for dir_name in ('up', 'down', 'vertical', 'left', 'right', 'horizontal'):
+        val = locals()[dir_name]
+        if val:
+            dirs.append(dir_name.upper())
+
+    if not dirs:
+        return 'SPACE'
+
+    name = 'BOX DRAWINGS'
+    #print(f'{light=} {heavy=}')
+
+    if 0:
+        print(dirs)
+
+    def weights_frag(weight: str, dirs: list, prefix: bool):
+        """
+        Generate a fragment where all directions share the same weight, e.g.:
+        'HEAVY HORIZONTAL'
+        'DOWN LIGHT'
+        'LEFT DOWN HEAVY'
+        'HEAVY DOWN AND RIGHT'
+        """
+        assert len(dirs) >= 1
+        assert len(dirs) <= 2
+        if prefix:
+            return f' {weight} ' + (' AND '.join(dirs))
+        else:
+            return ' ' + (' '.join(dirs)) + f' {weight}'
+
+    assert(len(dirs) >= 1 and len(dirs) <= 2)
+    name += weights_frag('LIGHT', dirs, True)
+
+    return name
+
+print('/* Generated by contrib/unicode/gen-box-drawing-chars.py. */')
+print()
+for i in range(16):
+    up = (i & 8)
+    down = (i & 4)
+    left = (i & 2)
+    right = (i & 1)
+    name = get_box_drawing_char_name(up, down, left, right)
+    if i < 15:
+        trailing_comma = ','
+    else:
+        trailing_comma = ' '
+    unichar = unicodedata.lookup(name)
+    print(f'0x{ord(unichar):04X}{trailing_comma} /* "{unichar}": U+{ord(unichar):04X}: {name} */')
diff --git a/contrib/unicode/gen-combining-chars.py b/contrib/unicode/gen-combining-chars.py
new file mode 100755
index 0000000..fb5ef50
--- /dev/null
+++ b/contrib/unicode/gen-combining-chars.py
@@ -0,0 +1,75 @@
+#!/usr/bin/env python3
+#
+# Script to generate libcpp/combining-chars.inc
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# . */
+
+from pprint import pprint
+import unicodedata
+
+def is_combining_char(code_point) -> bool:
+    return unicodedata.combining(chr(code_point)) != 0
+
+class Range:
+    def __init__(self, start, end, value):
+        self.start = start
+        self.end = end
+        self.value = value
+
+    def __repr__(self):
+        return f'Range({self.start:x}, {self.end:x}, {self.value})'
+
+def make_ranges(value_callback):
+    ranges = []
+    for code_point in range(0x10FFFF):
+        value = is_combining_char(code_point)
+        if 0:
+            print(f'{code_point=:x} {value=}')
+        if ranges and ranges[-1].value == value:
+            # Extend current range
+            ranges[-1].end = code_point
+        else:
+            # Start a new range
+            ranges.append(Range(code_point, code_point, value))
+    return ranges
+
+ranges = make_ranges(is_combining_char)
+if 0:
+    pprint(ranges)
+
+print(f"/* Generated by contrib/unicode/gen-combining-chars.py")
+print(f" using version {unicodedata.unidata_version}"
+      " of the Unicode standard. */")
+print("\nstatic const cppchar_t combining_range_ends[] = {", end="")
+for i, r in enumerate(ranges):
+    if i % 8:
+        print(" ", end="")
+    else:
+        print("\n ", end="")
+    print("0x%x," % r.end, end="")
+print("\n};\n")
+print("static const bool is_combining[] = {", end="")
+for i, r in enumerate(ranges):
+    if i % 24:
+        print(" ", end="")
+    else:
+        print("\n ", end="")
+    if r.value:
+        print("1,", end="")
+    else:
+        print("0,", end="")
+print("\n};")
diff --git a/contrib/unicode/gen-printable-chars.py b/contrib/unicode/gen-printable-chars.py
new file mode 100755
index 0000000..7684c08
--- /dev/null
+++ b/contrib/unicode/gen-printable-chars.py
@@ -0,0 +1,77 @@
+#!/usr/bin/env python3
+#
+# Script to generate libcpp/printable-chars.inc
+#
+# This file is part of GCC.
+#
+# GCC is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation; either version 3, or (at your option) any later
+# version.
+#
+# GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3. If not see
+# . */
+
+from pprint import pprint
+import unicodedata
+
+def is_printable_char(code_point) -> bool:
+    category = unicodedata.category(chr(code_point))
+    # "Cc" is "control" and "Cf" is "format"
+    return category[0] != 'C'
+
+class Range:
+    def __init__(self, start, end, value):
+        self.start = start
+        self.end = end
+        self.value = value
+
+    def __repr__(self):
+        return f'Range({self.start:x}, {self.end:x}, {self.value})'
+
+def make_ranges(value_callback):
+    ranges = []
+    for code_point in range(0x10FFFF):
+        value = is_printable_char(code_point)
+        if 0:
+            print(f'{code_point=:x} {value=}')
+        if ranges and ranges[-1].value == value:
+            # Extend current range
+            ranges[-1].end = code_point
+        else:
+            # Start a new range
+            ranges.append(Range(code_point, code_point, value))
+    return ranges
+
+ranges = make_ranges(is_printable_char)
+if 0:
+    pprint(ranges)
+
+print(f"/* Generated by contrib/unicode/gen-printable-chars.py")
+print(f" using version {unicodedata.unidata_version}"
+      " of the Unicode standard. */")
+print("\nstatic const cppchar_t printable_range_ends[] = {", end="")
+for i, r in enumerate(ranges):
+    if i % 8:
+        print(" ", end="")
+    else:
+        print("\n ", end="")
+    print("0x%x," % r.end, end="")
+print("\n};\n")
+print("static const bool is_printable[] = {", end="")
+for i, r in enumerate(ranges):
+    if i % 24:
+        print(" ", end="")
+    else:
+        print("\n ", end="")
+    if r.value:
+        print("1,", end="")
+    else:
+        print("0,", end="")
+print("\n};")
--
cgit v1.1
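
The two generator scripts above emit a pair of parallel tables: the inclusive
end of each run of code points, and the boolean value shared by that run. The
libcpp code that consumes those tables is not part of this diff, so the
following Python sketch only assumes that a lookup is a binary search over the
range ends. The names build_tables and lookup are hypothetical, and the
example covers only code points below 0x100 to keep it small.

```python
import bisect
import unicodedata


def build_tables(predicate, limit):
    """Run-length encode predicate over range(limit).

    Returns (range_ends, values): range_ends[i] is the inclusive last
    code point of run i, and values[i] is the predicate's value there.
    """
    range_ends = []
    values = []
    for code_point in range(limit):
        value = predicate(code_point)
        if values and values[-1] == value:
            range_ends[-1] = code_point   # Extend the current run.
        else:
            range_ends.append(code_point)  # Start a new run.
            values.append(value)
    return range_ends, values


def lookup(range_ends, values, code_point):
    """Find the run containing code_point by binary search (an assumption
    about how such tables might be consumed, not libcpp's actual code)."""
    idx = bisect.bisect_left(range_ends, code_point)
    if idx == len(range_ends):
        raise ValueError(f"code point {code_point:#x} is beyond the table")
    return values[idx]


def is_printable_char(code_point) -> bool:
    # Same test as gen-printable-chars.py: not in a "C*" general category.
    return unicodedata.category(chr(code_point))[0] != 'C'


range_ends, values = build_tables(is_printable_char, 0x100)
print([hex(end) for end in range_ends])
print(lookup(range_ends, values, ord('A')))   # A letter.
print(lookup(range_ends, values, 0x07))       # BEL, a control character.
print(lookup(range_ends, values, 0xAD))       # SOFT HYPHEN, a format character.
```

Storing only range ends keeps the tables compact, since most of the code
space falls into long runs with the same value, and a lookup costs a number
of comparisons logarithmic in the number of runs.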