author | Lewis Hyatt <lhyatt@gmail.com> | 2019-12-09 20:03:47 +0000 |
---|---|---|
committer | David Malcolm <dmalcolm@gcc.gnu.org> | 2019-12-09 20:03:47 +0000 |
commit | ee9256409f21eab5df5076e46d220d6a0b995f79 (patch) | |
tree | 68053762905d3e64e86dc0db19b7d1f3d65b5ba8 /libcpp/charset.c | |
parent | 763c9f4a8544318998c7adf04e4c92e9a4b85614 (diff) | |
Byte vs column awareness for diagnostic-show-locus.c (PR 49973)
contrib/ChangeLog
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* unicode/from_glibc/unicode_utils.py: Support script from
glibc (commit 464cd3) to extract character widths from Unicode data
files.
* unicode/from_glibc/utf8_gen.py: Likewise.
* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
* unicode/EastAsianWidth.txt: Likewise.
* unicode/PropList.txt: Likewise.
* unicode/gen_wcwidth.py: New utility to generate
libcpp/generated_cpp_wcwidth.h with help from the glibc support
scripts and the Unicode data files.
* unicode/unicode-license.txt: Added.
* unicode/README: New explanatory file.
libcpp/ChangeLog
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* generated_cpp_wcwidth.h: New file generated by
../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
* charset.c (compute_next_display_width): New function to help
implement display columns.
(cpp_byte_column_to_display_column): Likewise.
(cpp_display_column_to_byte_column): Likewise.
(cpp_wcwidth): Likewise.
* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
(cpp_display_column_to_byte_column): Declare.
(cpp_wcwidth): Declare.
(cpp_display_width): New function.
gcc/ChangeLog
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* input.c (location_compute_display_column): New function to help with
multibyte awareness in diagnostics.
(test_cpp_utf8): New self-test.
(input_c_tests): Call the new test.
* input.h (location_compute_display_column): Declare.
* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
to all classes and functions.
(enum column_unit): New enum.
(class exploc_with_display_col): New class.
(class layout_point): Convert m_column member to array m_columns[2].
(layout_range::contains_point): Add col_unit argument.
(test_layout_range_for_single_point): Pass new argument.
(test_layout_range_for_single_line): Likewise.
(test_layout_range_for_multiple_lines): Likewise.
(line_bounds::convert_to_display_cols): New function.
(layout::get_state_at_point): Add col_unit argument.
(make_range): Use empty filename rather than dummy filename.
(get_line_width_without_trailing_whitespace): Rename to...
(get_line_bytes_without_trailing_whitespace): ...this.
(test_get_line_width_without_trailing_whitespace): Rename to...
(test_get_line_bytes_without_trailing_whitespace): ...this.
(class layout): m_exploc changed to exploc_with_display_col from
plain expanded_location.
(layout::get_linenum_width): New accessor member function.
(layout::get_x_offset_display): Likewise.
(layout::calculate_linenum_width): New subroutine for the constructor.
(layout::calculate_x_offset_display): Likewise.
(layout::layout): Use the new subroutines. Add multibyte awareness.
(layout::print_source_line): Add multibyte awareness.
(layout::print_line): Likewise.
(layout::print_annotation_line): Likewise.
(line_label::line_label): Likewise.
(layout::print_any_labels): Likewise.
(layout::annotation_line_showed_range_p): Likewise.
(get_printed_columns): Likewise.
(class line_label): Rename m_length to m_display_width.
(get_affected_columns): Rename to...
(get_affected_range): ...this; add col_unit argument and multibyte
awareness.
(class correction): Add m_affected_bytes and m_display_cols
members. Rename m_len to m_byte_length for clarity. Add multibyte
awareness throughout.
(correction::insertion_p): Add multibyte awareness.
(correction::compute_display_cols): New function.
(correction::ensure_terminated): Use new member name m_byte_length.
(line_corrections::add_hint): Add multibyte awareness.
(layout::print_trailing_fixits): Likewise.
(layout::get_x_bound_for_row): Likewise.
(test_one_liner_simple_caret_utf8): New self-test analogous to the one
with _utf8 suffix removed, testing multibyte awareness.
(test_one_liner_caret_and_range_utf8): Likewise.
(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
(test_one_liner_fixit_insert_before_utf8): Likewise.
(test_one_liner_fixit_insert_after_utf8): Likewise.
(test_one_liner_fixit_remove_utf8): Likewise.
(test_one_liner_fixit_replace_utf8): Likewise.
(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
(test_one_liner_many_fixits_1_utf8): Likewise.
(test_one_liner_many_fixits_2_utf8): Likewise.
(test_one_liner_labels_utf8): Likewise.
(test_diagnostic_show_locus_one_liner_utf8): Likewise.
(test_overlapped_fixit_printing_utf8): Likewise.
(test_overlapped_fixit_printing): Adapt for changes to
get_affected_columns, get_printed_columns and class corrections.
(test_overlapped_fixit_printing_2): Likewise.
(test_linenum_sep): New constant.
(test_left_margin): Likewise.
(test_offset_impl): Helper function for new test.
(test_layout_x_offset_display_utf8): New test.
(diagnostic_show_locus_c_tests): Call new tests.
gcc/testsuite/ChangeLog:
2019-12-09 Lewis Hyatt <lhyatt@gmail.com>
PR preprocessor/49973
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
(test_show_locus): Tweak so that expected output is the same as
before the diagnostic-show-locus.c changes.
* gcc.dg/cpp/pr66415-1.c: Likewise.
From-SVN: r279137
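
The byte-versus-display-column distinction named in the commit title can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: every `demo_*` name is hypothetical, the width rule is a deliberately crude assumption (CJK unified ideographs count as two columns, everything else as one), and, unlike the functions described in the ChangeLog, the sketch clamps out-of-range arguments instead of counting phantom bytes.

```c
#include <stddef.h>

/* Hypothetical width rule for illustration only: CJK unified ideographs
   (U+4E00..U+9FFF) occupy two columns, everything else one.  */
static int
demo_width (unsigned long c)
{
  return (c >= 0x4E00 && c <= 0x9FFF) ? 2 : 1;
}

/* Consume one character from S (LEFT > 0 bytes remain), store the number of
   bytes used in *USED, and return its display width.  A malformed or
   truncated sequence is consumed as a single byte of width 1.  This
   simplified decoder does not reject overlong encodings.  */
static int
demo_next_width (const unsigned char *s, size_t left, size_t *used)
{
  unsigned long c;
  size_t n;
  *used = 1;
  if (s[0] < 0x80)
    return demo_width (s[0]);
  else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
  else
    return 1;
  if (n > left)
    return 1;
  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 1;
      c = (c << 6) | (s[i] & 0x3F);
    }
  *used = n;
  return demo_width (c);
}

/* Display columns occupied by the first BYTES bytes of LINE (LEN bytes long).
   Simplification: BYTES is clamped to LEN.  */
int
demo_byte_to_display (const char *line, size_t len, size_t bytes)
{
  const unsigned char *s = (const unsigned char *) line;
  size_t pos = 0, used;
  int cols = 0;
  if (bytes > len)
    bytes = len;
  while (pos < bytes)
    {
      cols += demo_next_width (s + pos, bytes - pos, &used);
      pos += used;
    }
  return cols;
}

/* Least number of bytes of LINE that cover at least COLS display columns.
   Simplification: the result never exceeds LEN.  */
size_t
demo_display_to_byte (const char *line, size_t len, int cols)
{
  const unsigned char *s = (const unsigned char *) line;
  size_t pos = 0, used;
  int seen = 0;
  while (seen < cols && pos < len)
    {
      seen += demo_next_width (s + pos, len - pos, &used);
      pos += used;
    }
  return pos;
}
```

For the 7-byte line `"a\xC3\xA9\xE4\xB8\xADz"` (the characters a, é, 中, z), the byte offset of the final character differs from the column at which a terminal would draw it, which is why a caret placed by byte count ends up misaligned.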
Diffstat (limited to 'libcpp/charset.c')
-rw-r--r-- | libcpp/charset.c | 103 |
1 files changed, 103 insertions, 0 deletions
diff --git a/libcpp/charset.c b/libcpp/charset.c
index d457441..956d2da 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -2265,3 +2265,106 @@ cpp_string_location_reader::get_next ()
   m_loc += m_offset_per_column;
   return result;
 }
+
+/* Helper for cpp_byte_column_to_display_column and its inverse. Given a
+   pointer to a UTF-8-encoded character, compute its display width. *INBUFP
+   points on entry to the start of the UTF-8 encoding of the character, and
+   is updated to point just after the last byte of the encoding. *INBYTESLEFTP
+   contains on entry the remaining size of the buffer into which *INBUFP
+   points, and this is also updated accordingly. If *INBUFP does not
+   point to a valid UTF-8-encoded sequence, then it will be treated as a single
+   byte with display width 1. */
+
+static inline int
+compute_next_display_width (const uchar **inbufp, size_t *inbytesleftp)
+{
+  cppchar_t c;
+  if (one_utf8_to_cppchar (inbufp, inbytesleftp, &c) != 0)
+    {
+      /* Input is not convertible to UTF-8. This could be fine, e.g. in a
+         string literal, so don't complain. Just treat it as if it has a width
+         of one. */
+      ++*inbufp;
+      --*inbytesleftp;
+      return 1;
+    }
+
+  /* one_utf8_to_cppchar() has updated inbufp and inbytesleftp for us. */
+  return cpp_wcwidth (c);
+}
+
+/* For the string of length DATA_LENGTH bytes that begins at DATA, compute
+   how many display columns are occupied by the first COLUMN bytes. COLUMN
+   may exceed DATA_LENGTH, in which case the phantom bytes at the end are
+   treated as if they have display width 1. */
+
+int
+cpp_byte_column_to_display_column (const char *data, int data_length,
+                                   int column)
+{
+  int display_col = 0;
+  const uchar *udata = (const uchar *) data;
+  const int offset = MAX (0, column - data_length);
+  size_t inbytesleft = column - offset;
+  while (inbytesleft)
+    display_col += compute_next_display_width (&udata, &inbytesleft);
+  return display_col + offset;
+}
+
+/* For the string of length DATA_LENGTH bytes that begins at DATA, compute
+   the least number of bytes that will result in at least DISPLAY_COL display
+   columns. The return value may exceed DATA_LENGTH if the entire string does
+   not occupy enough display columns. */
+
+int
+cpp_display_column_to_byte_column (const char *data, int data_length,
+                                   int display_col)
+{
+  int column = 0;
+  const uchar *udata = (const uchar *) data;
+  size_t inbytesleft = data_length;
+  while (column < display_col && inbytesleft)
+    column += compute_next_display_width (&udata, &inbytesleft);
+  return data_length - inbytesleft + MAX (0, display_col - column);
+}
+
+/* Our own version of wcwidth(). We don't use the actual wcwidth() in glibc,
+   because that will inspect the user's locale, and in particular in an ASCII
+   locale, it will not return anything useful for extended characters. But GCC
+   in other respects (see e.g. _cpp_default_encoding()) behaves as if
+   everything is UTF-8. We also make some tweaks that are useful for the way
+   GCC needs to use this data, e.g. tabs and other control characters should be
+   treated as having width 1. The lookup tables are generated from
+   contrib/unicode/gen_wcwidth.py and were made by simply calling glibc
+   wcwidth() on all codepoints, then applying the small tweaks. These tables
+   are not highly optimized, but for the present purpose of outputting
+   diagnostics, they are sufficient. */
+
+#include "generated_cpp_wcwidth.h"
+int cpp_wcwidth (cppchar_t c)
+{
+  if (__builtin_expect (c <= wcwidth_range_ends[0], true))
+    return wcwidth_widths[0];
+
+  /* Binary search the tables. */
+  int begin = 1;
+  static const int end
+      = sizeof wcwidth_range_ends / sizeof (*wcwidth_range_ends);
+  int len = end - begin;
+  do
+    {
+      int half = len/2;
+      int middle = begin + half;
+      if (c > wcwidth_range_ends[middle])
+        {
+          begin = middle + 1;
+          len -= half + 1;
+        }
+      else
+        len = half;
+    } while (len);
+
+  if (__builtin_expect (begin != end, true))
+    return wcwidth_widths[begin];
+  return 1;
+}
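
The range-table lookup in cpp_wcwidth can be tried in isolation with the sketch below. It follows the same search shape as the patch, but the two tables are hypothetical miniatures written by hand for this example, not the contents of generated_cpp_wcwidth.h, and the widths they assign are a rough simplification. Entry i is assumed to cover the code points after `demo_range_ends[i - 1]` up to and including `demo_range_ends[i]`.

```c
#include <stddef.h>

/* Hypothetical miniature tables, NOT the generated ones.  Entry i covers the
   code points from demo_range_ends[i - 1] + 1 through demo_range_ends[i].  */
static const unsigned int demo_range_ends[]
  = { 0x7F, 0x2FF, 0x36F, 0x10FF, 0x115F };
static const unsigned char demo_widths[]
  = { 1,    1,     0,     1,      2 };

/* Return the display width assigned to code point C by the demo tables, or 1
   if C lies beyond the last range.  */
int
demo_wcwidth (unsigned int c)
{
  const int end = sizeof demo_range_ends / sizeof *demo_range_ends;

  /* Fast path: the first range holds the most common characters.  */
  if (c <= demo_range_ends[0])
    return demo_widths[0];

  /* Binary search for the first range whose end is not below C.  */
  int begin = 1;
  int len = end - begin;
  while (len > 0)
    {
      int half = len / 2;
      int middle = begin + half;
      if (c > demo_range_ends[middle])
        {
          begin = middle + 1;
          len -= half + 1;
        }
      else
        len = half;
    }

  return begin != end ? demo_widths[begin] : 1;
}
```

With these demo tables, U+0301 (a combining accent) falls in the zero-width range and U+1100 (a Hangul leading consonant) in the two-column range, while a code point past the last range end falls back to width 1, matching the fallback in the patch.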