Byte vs column awareness for diagnostic-show-locus.c (PR 49973)

contrib/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* unicode/from_glibc/unicode_utils.py: Support script from glibc
	(commit 464cd3) to extract character widths from Unicode data files.
	* unicode/from_glibc/utf8_gen.py: Likewise.
	* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/gen_wcwidth.py: New utility to generate
	libcpp/generated_cpp_wcwidth.h with help from the glibc support
	scripts and the Unicode data files.
	* unicode/unicode-license.txt: Added.
	* unicode/README: New explanatory file.

libcpp/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* generated_cpp_wcwidth.h: New file generated by
	../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
	* charset.c (compute_next_display_width): New function to help
	implement display columns.
	(cpp_byte_column_to_display_column): Likewise.
	(cpp_display_column_to_byte_column): Likewise.
	(cpp_wcwidth): Likewise.
	* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
	(cpp_display_column_to_byte_column): Declare.
	(cpp_wcwidth): Declare.
	(cpp_display_width): New function.

gcc/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* input.c (location_compute_display_column): New function to help
	with multibyte awareness in diagnostics.
	(test_cpp_utf8): New self-test.
	(input_c_tests): Call the new test.
	* input.h (location_compute_display_column): Declare.
	* diagnostic-show-locus.c: Pervasive changes to add multibyte
	awareness to all classes and functions.
	(enum column_unit): New enum.
	(class exploc_with_display_col): New class.
	(class layout_point): Convert m_column member to array m_columns[2].
	(layout_range::contains_point): Add col_unit argument.
	(test_layout_range_for_single_point): Pass new argument.
	(test_layout_range_for_single_line): Likewise.
	(test_layout_range_for_multiple_lines): Likewise.
	(line_bounds::convert_to_display_cols): New function.
	(layout::get_state_at_point): Add col_unit argument.
	(make_range): Use empty filename rather than dummy filename.
	(get_line_width_without_trailing_whitespace): Rename to...
	(get_line_bytes_without_trailing_whitespace): ...this.
	(test_get_line_width_without_trailing_whitespace): Rename to...
	(test_get_line_bytes_without_trailing_whitespace): ...this.
	(class layout): m_exploc changed to exploc_with_display_col from
	plain expanded_location.
	(layout::get_linenum_width): New accessor member function.
	(layout::get_x_offset_display): Likewise.
	(layout::calculate_linenum_width): New subroutine for the constructor.
	(layout::calculate_x_offset_display): Likewise.
	(layout::layout): Use the new subroutines.  Add multibyte awareness.
	(layout::print_source_line): Add multibyte awareness.
	(layout::print_line): Likewise.
	(layout::print_annotation_line): Likewise.
	(line_label::line_label): Likewise.
	(layout::print_any_labels): Likewise.
	(layout::annotation_line_showed_range_p): Likewise.
	(get_printed_columns): Likewise.
	(class line_label): Rename m_length to m_display_width.
	(get_affected_columns): Rename to...
	(get_affected_range): ...this; add col_unit argument and multibyte
	awareness.
	(class correction): Add m_affected_bytes and m_display_cols
	members.  Rename m_len to m_byte_length for clarity.  Add multibyte
	awareness throughout.
	(correction::insertion_p): Add multibyte awareness.
	(correction::compute_display_cols): New function.
	(correction::ensure_terminated): Use new member name m_byte_length.
	(line_corrections::add_hint): Add multibyte awareness.
	(layout::print_trailing_fixits): Likewise.
	(layout::get_x_bound_for_row): Likewise.
	(test_one_liner_simple_caret_utf8): New self-test analogous to the one
	with _utf8 suffix removed, testing multibyte awareness.
	(test_one_liner_caret_and_range_utf8): Likewise.
	(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
	(test_one_liner_fixit_insert_before_utf8): Likewise.
	(test_one_liner_fixit_insert_after_utf8): Likewise.
	(test_one_liner_fixit_remove_utf8): Likewise.
	(test_one_liner_fixit_replace_utf8): Likewise.
	(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
	(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
	(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
	(test_one_liner_many_fixits_1_utf8): Likewise.
	(test_one_liner_many_fixits_2_utf8): Likewise.
	(test_one_liner_labels_utf8): Likewise.
	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
	(test_overlapped_fixit_printing_utf8): Likewise.
	(test_overlapped_fixit_printing): Adapt for changes to
	get_affected_columns, get_printed_columns and class corrections.
	(test_overlapped_fixit_printing_2): Likewise.
	(test_linenum_sep): New constant.
	(test_left_margin): Likewise.
	(test_offset_impl): Helper function for new test.
	(test_layout_x_offset_display_utf8): New test.
	(diagnostic_show_locus_c_tests): Call new tests.

gcc/testsuite/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(test_show_locus): Tweak so that expected output is the same as
	before the diagnostic-show-locus.c changes.
	* gcc.dg/cpp/pr66415-1.c: Likewise.

From-SVN: r279137
author: Lewis Hyatt <lhyatt@gmail.com> 2019-12-09 20:03:47 +0000
committer: David Malcolm <dmalcolm@gcc.gnu.org> 2019-12-09 20:03:47 +0000
commit: ee9256409f21eab5df5076e46d220d6a0b995f79
tree: 68053762905d3e64e86dc0db19b7d1f3d65b5ba8 /libcpp/charset.c
parent: 763c9f4a8544318998c7adf04e4c92e9a4b85614
1 file changed, 103 insertions, 0 deletions
diff --git a/libcpp/charset.c b/libcpp/charset.c
index d457441..956d2da 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -2265,3 +2265,106 @@ cpp_string_location_reader::get_next ()
     m_loc += m_offset_per_column;
   return result;
 }
+
+/* Helper for cpp_byte_column_to_display_column and its inverse.  Given a
+   pointer to a UTF-8-encoded character, compute its display width.  *INBUFP
+   points on entry to the start of the UTF-8 encoding of the character, and
+   is updated to point just after the last byte of the encoding.  *INBYTESLEFTP
+   contains on entry the remaining size of the buffer into which *INBUFP
+   points, and this is also updated accordingly.  If *INBUFP does not
+   point to a valid UTF-8-encoded sequence, then it will be treated as a single
+   byte with display width 1.  */
+
+static inline int
+compute_next_display_width (const uchar **inbufp, size_t *inbytesleftp)
+{
+  cppchar_t c;
+  if (one_utf8_to_cppchar (inbufp, inbytesleftp, &c) != 0)
+    {
+      /* Input is not convertible to UTF-8.  This could be fine, e.g. in a
+	 string literal, so don't complain.  Just treat it as if it has a width
+	 of one.  */
+      ++*inbufp;
+      --*inbytesleftp;
+      return 1;
+    }
+
+  /*  one_utf8_to_cppchar() has updated inbufp and inbytesleftp for us.  */
+  return cpp_wcwidth (c);
+}
+
+/*  For the string of length DATA_LENGTH bytes that begins at DATA, compute
+    how many display columns are occupied by the first COLUMN bytes.  COLUMN
+    may exceed DATA_LENGTH, in which case the phantom bytes at the end are
+    treated as if they have display width 1.  */
+
+int
+cpp_byte_column_to_display_column (const char *data, int data_length,
+				   int column)
+{
+  int display_col = 0;
+  const uchar *udata = (const uchar *) data;
+  const int offset = MAX (0, column - data_length);
+  size_t inbytesleft = column - offset;
+  while (inbytesleft)
+    display_col += compute_next_display_width (&udata, &inbytesleft);
+  return display_col + offset;
+}
+
+/*  For the string of length DATA_LENGTH bytes that begins at DATA, compute
+    the least number of bytes that will result in at least DISPLAY_COL display
+    columns.  The return value may exceed DATA_LENGTH if the entire string does
+    not occupy enough display columns.  */
+
+int
+cpp_display_column_to_byte_column (const char *data, int data_length,
+				   int display_col)
+{
+  int column = 0;
+  const uchar *udata = (const uchar *) data;
+  size_t inbytesleft = data_length;
+  while (column < display_col && inbytesleft)
+      column += compute_next_display_width (&udata, &inbytesleft);
+  return data_length - inbytesleft + MAX (0, display_col - column);
+}
+
+/* Our own version of wcwidth().  We don't use the actual wcwidth() in glibc,
+   because that will inspect the user's locale, and in particular in an ASCII
+   locale, it will not return anything useful for extended characters.  But GCC
+   in other respects (see e.g. _cpp_default_encoding()) behaves as if
+   everything is UTF-8.  We also make some tweaks that are useful for the way
+   GCC needs to use this data, e.g. tabs and other control characters should be
+   treated as having width 1.  The lookup tables are generated from
+   contrib/unicode/gen_wcwidth.py and were made by simply calling glibc
+   wcwidth() on all codepoints, then applying the small tweaks.  These tables
+   are not highly optimized, but for the present purpose of outputting
+   diagnostics, they are sufficient.  */
+
+#include "generated_cpp_wcwidth.h"
+int cpp_wcwidth (cppchar_t c)
+{
+  if (__builtin_expect (c <= wcwidth_range_ends[0], true))
+    return wcwidth_widths[0];
+
+  /* Binary search the tables.  */
+  int begin = 1;
+  static const int end
+      = sizeof wcwidth_range_ends / sizeof (*wcwidth_range_ends);
+  int len = end - begin;
+  do
+    {
+      int half = len/2;
+      int middle = begin + half;
+      if (c > wcwidth_range_ends[middle])
+	{
+	  begin = middle + 1;
+	  len -= half + 1;
+	}
+      else
+	len = half;
+    } while (len);
+
+  if (__builtin_expect (begin != end, true))
+    return wcwidth_widths[begin];
+  return 1;
+}
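The three helpers added to charset.c (compute_next_display_width and the two column conversions) can be tried outside libcpp with a small standalone sketch. It is not the libcpp code: `sketch_decode_utf8` is a hypothetical, simplified stand-in for `one_utf8_to_cppchar` (unlike the real decoder it does not reject overlong forms or surrogates), and `sketch_wcwidth` is a hypothetical two-range stand-in for the table-driven `cpp_wcwidth`. The conversion functions follow the logic shown in the diff, including counting invalid or truncated bytes, and "phantom" bytes past the end of the data, as width 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cpp_wcwidth(): U+1100..U+115F and U+4E00..U+9FFF
   are treated as wide (2 columns), everything else as 1 column.  */
static int sketch_wcwidth (uint32_t c)
{
  if ((c >= 0x1100 && c <= 0x115F) || (c >= 0x4E00 && c <= 0x9FFF))
    return 2;
  return 1;
}

/* Simplified UTF-8 decoder (hypothetical).  On success, stores the code point
   in *OUT, advances *P and decrements *LEFT, and returns 0.  On a malformed
   or truncated sequence, returns -1 without advancing.  */
static int sketch_decode_utf8 (const unsigned char **p, size_t *left,
                               uint32_t *out)
{
  const unsigned char *s = *p;
  size_t n;
  uint32_t c;
  if (*left == 0)
    return -1;
  if (s[0] < 0x80)                { n = 1; c = s[0]; }
  else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
  else
    return -1;
  if (n > *left)
    return -1;                      /* sequence cut off by the buffer end */
  for (size_t i = 1; i < n; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return -1;                  /* not a continuation byte */
      c = (c << 6) | (s[i] & 0x3F);
    }
  *p += n;
  *left -= n;
  *out = c;
  return 0;
}

/* Mirrors compute_next_display_width: consume one character (or one
   invalid byte, which counts as width 1) and return its display width.  */
static int next_display_width (const unsigned char **p, size_t *left)
{
  uint32_t c;
  if (sketch_decode_utf8 (p, left, &c) != 0)
    {
      ++*p;
      --*left;
      return 1;
    }
  return sketch_wcwidth (c);
}

/* Display columns occupied by the first COLUMN bytes of DATA; bytes past
   DATA_LENGTH are counted as width 1.  */
int byte_col_to_display_col (const char *data, int data_length, int column)
{
  int display_col = 0;
  const unsigned char *u = (const unsigned char *) data;
  const int offset = column > data_length ? column - data_length : 0;
  size_t left = (size_t) (column - offset);
  while (left)
    display_col += next_display_width (&u, &left);
  return display_col + offset;
}

/* Least number of bytes covering at least DISPLAY_COL display columns;
   may exceed DATA_LENGTH if the data is too narrow.  */
int display_col_to_byte_col (const char *data, int data_length,
                             int display_col)
{
  int column = 0;
  const unsigned char *u = (const unsigned char *) data;
  size_t left = (size_t) data_length;
  while (column < display_col && left)
    column += next_display_width (&u, &left);
  return data_length - (int) left
         + (display_col > column ? display_col - column : 0);
}
```

For the four-byte string "a\xe4\xb8\xad" ("a" followed by U+4E2D), byte column 4 maps to display column 3, while byte column 2 maps to display column 2 because the cut-off multibyte sequence is counted one byte at a time. In the other direction, display column 2 maps to byte column 4, since no shorter prefix reaches two columns.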
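The lookup in `cpp_wcwidth` is a lower-bound binary search over a sorted table of range end points, each paired with a width: entry 0 covers code points up to `wcwidth_range_ends[0]` (the fast path), and entry i covers the code points after `wcwidth_range_ends[i-1]` up to `wcwidth_range_ends[i]`. The sketch below reproduces that search against a hypothetical seven-entry table; the arrays `range_ends` and `widths` and the function name `table_wcwidth` are invented for illustration, whereas the real tables in generated_cpp_wcwidth.h are produced by contrib/unicode/gen_wcwidth.py and are much larger.

```c
#include <stdint.h>

/* Hypothetical miniature tables in the same shape as the generated ones.
   Entry i covers (range_ends[i-1], range_ends[i]] with width widths[i];
   entry 0 covers [0, range_ends[0]].  Widths are illustrative only.  */
static const uint32_t range_ends[] =
  { 0x02FF, 0x036F, 0x10FF, 0x115F, 0x4DFF, 0x9FFF, 0x10FFFF };
static const unsigned char widths[] =
  { 1,      0,      1,      2,      1,      2,      1 };

int table_wcwidth (uint32_t c)
{
  /* Fast path: the first range covers the most common code points.  */
  if (c <= range_ends[0])
    return widths[0];

  /* Find the first index in [1, end) whose range end is >= C.  */
  int begin = 1;
  const int end = (int) (sizeof range_ends / sizeof range_ends[0]);
  int len = end - begin;
  while (len > 0)
    {
      int half = len / 2;
      int middle = begin + half;
      if (c > range_ends[middle])
        {
          begin = middle + 1;     /* answer lies strictly after MIDDLE */
          len -= half + 1;
        }
      else
        len = half;               /* MIDDLE is a candidate; search left */
    }

  /* Past the last range end: fall back to width 1, as cpp_wcwidth does.  */
  return begin != end ? widths[begin] : 1;
}
```

The sketch uses a `while` loop where the diff uses `do`/`while`, so it stays in bounds even if the table held only the fast-path entry; with the real generated table that case does not arise.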