7 Commits

Author SHA1 Message Date
Jakub Jelinek
0c0847158c Update to Unicode 17.0.0
The following patch updates GCC from Unicode 16.0.0 to 17.0.0.

I've followed what the README says and updated also one script from
glibc, but that needed another Unicode file - HangulSyllableType.txt -
around as well, so I'm adding it.
I've added one new test to named-universal-char-escape-1.c for
randomly chosen character from new CJK block.
Note, Unicode 17.0.0 authors forgot to adjust the 4-8 table, I've filed
bugreports about that but the UnicodeData.txt changes for the range ends
and the new range seems to match e.g. what is in the glyph tables, so
the patch follows UnicodeData.txt and not 4-8 table here.

Another thing was that makeuname2c.cc didn't handle correctly when
the size of the generated string table modulo 77 was 76 or 77, in which
case it forgot to emit a semicolon after the string literal and so failed
to compile.

And as can be seen in the emoji-data.txt diff, some properties like
Extended_Pictographic have been removed from certain characters, e.g.
from the Mahjong cards characters except U+1F004, and one libstdc++
test was testing that property exactly on U+1F000.  Dunno why that was
changed, but U+1F004 is the only colored one among tons of black and white
ones.

2025-10-08  Jakub Jelinek  <jakub@redhat.com>

contrib/
	* unicode/README: Add HangulSyllableType.txt file to the
	list as newest utf8_gen.py from glibc now needs it.  Adjust
	git commit hash and change unicode 16 version to 17.
	* unicode/from_glibc/utf8_gen.py: Updated from glibc.
	* unicode/DerivedCoreProperties.txt: Updated from Unicode 17.0.0.
	* unicode/emoji-data.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/GraphemeBreakProperty.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/UnicodeData.txt: Likewise.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/DerivedGeneralCategory.txt: Likewise.
	* unicode/HangulSyllableType.txt: New file.
gcc/testsuite/
	* c-c++-common/cpp/named-universal-char-escape-1.c: Add test for
	\N{CJK UNIFIED IDEOGRAPH-3340E}.
libcpp/
	* makeucnid.cc (write_copyright): Adjust copyright year.
	* makeuname2c.cc (generated_ranges): Adjust end points for a couple
	of ranges based on UnicodeData.txt Last changes and add a whole new
	CJK UNIFIED IDEOGRAPH- entry.  None of these changes are in the 4-8
	table, but clearly it has just been forgotten.
	(write_copyright): Adjust copyright year.
	(write_dict): Fix up condition when to print semicolon.
	* generated_cpp_wcwidth.h: Regenerate.
	* ucnid.h: Regenerate.
	* uname2c.h: Regenerate.
libstdc++-v3/
	* include/bits/unicode-data.h: Regenerate.
	* testsuite/ext/unicode/properties.cc: Test __is_extended_pictographic
	on U+1F004 rather than U+1F000.
2025-10-08 18:02:39 +02:00
Jakub Jelinek
d0e8f58b81 contrib, libcpp, libstdc++: Update to Unicode 16.0
It is autumn again and there is a new Unicode version 16.0.

The following patch updates our Unicode stuff in contrib, libcpp and
libstdc++ from that Unicode version.

2024-10-08  Jakub Jelinek  <jakub@redhat.com>

contrib/
	* unicode/README: Update glibc git commit hash, replace
	Unicode 15 or 15.1 versions with 16.
	* unicode/gen_libstdcxx_unicode_data.py: Use 160000 instead of
	150100 in _GLIBCXX_GET_UNICODE_DATA test.
	* unicode/from_glibc/utf8_gen.py: Updated from glibc
	064c708c78cc2a6b5802dce73108fc0c1c6bfc80 commit.
	* unicode/DerivedCoreProperties.txt: Updated from Unicode 16.0.
	* unicode/emoji-data.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/GraphemeBreakProperty.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/UnicodeData.txt: Likewise.
	* unicode/EastAsianWidth.txt: Likewise.
gcc/testsuite/
	* c-c++-common/cpp/named-universal-char-escape-1.c: Add tests
	for some Unicode 16.0 characters, both normal and generated.
libcpp/
	* makeucnid.cc (write_copyright): Update Unicode Copyright years.
	* makeuname2c.cc (generated_ranges): Adjust Unicode version from 15.1
	to 16.0.  Add EGYPTIAN HIEROGLYPH- generated range, adjust indexes in
	following entries.
	(write_copyright): Update Unicode Copyright years.
	* generated_cpp_wcwidth.h: Regenerated.
	* ucnid.h: Regenerated.
	* uname2c.h: Regenerated.
libstdc++-v3/
	* include/bits/unicode.h (std::__unicode::__v15_1_0): Rename inline
	namespace to ...
	(std::__unicode::__v16_0_0): ... this.
	(_GLIBCXX_GET_UNICODE_DATA): Change from 150100 to 160000.
	* include/bits/unicode-data.h: Regenerated.
	* testsuite/ext/unicode/properties.cc: Check for _Gcb_SpacingMark
	on U+11F03 rather than U+1D16D as the latter lost SpacingMark property
	in Unicode 16.0.
2024-10-08 10:01:47 +02:00
Jakub Jelinek
d64b7c82da libcpp, contrib: Update to Unicode 15.1
The following patch (in plaintext just a pseudo-patch where I've left out
the too big parts of either wget downloaded or regenerated files out with
..., full patch attached compressed) updates to Unicode 15.1 from 15.0
we had last year.  Apparently Unicode forgot to add a new range to 4-8 Table
we are using, but from the other files it is clear what should have been
added; I've filed a bugreport against Unicode.

2023-11-14  Jakub Jelinek  <jakub@redhat.com>

contrib/
	* unicode/README: Adjust glibc git commit hash, number of Unicode
	data files to be updated and latest Unicode version.
	* unicode/from_glibc/utf8_gen.py: Update from glibc.
	* unicode/UnicodeData.txt: Update from Unicode 15.1.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/NameAliases.txt: Likewise.
	* unicode/DerivedCoreProperties.txt: Likewise.
	* unicode/PropList.txt: Likewise.
libcpp/
	* makeucnid.cc (write_copyright): Update copyright year.
	* makeuname2c.cc (write_copyright): Likewise.
	(struct generated): Update latest Unicode version.
	(generated_ranges): Add 2ebf0-2ee5d CJK UNIFIED IDEOGRAPH
	range which was forgotten to be added to 4-8 table, but
	clearly is expected to be there from the 15.1 additions.
	* ucnid.h: Regenerated.
	* uname2c.h: Regenerated.
	* generated_cpp_wcwidth.h: Regenerated.
2023-11-14 18:32:37 +01:00
Lewis Hyatt
73dd5c6c88 libcpp: Update cpp_wcwidth() to Unicode 15
Updates cpp_wcwidth() to Unicode 15, following the procedure in
contrib/unicode/README mechanically without incident.

contrib/ChangeLog:

	* unicode/DerivedCoreProperties.txt: Update to Unicode 15.
	* unicode/DerivedNormalizationProps.txt: Likewise.
	* unicode/EastAsianWidth.txt: Likwise.
	* unicode/PropList.txt: Likewise.
	* unicode/README: Likewise.
	* unicode/UnicodeData.txt: Likewise.

libcpp/ChangeLog:

	* generated_cpp_wcwidth.h: Regenerated for Unicode 15.
2023-03-13 07:40:50 -04:00
Lewis Hyatt
57988cbe73 libcpp: Update cpp_wcwidth() to Unicode 14.0.0
The procedure detailed in contrib/unicode/README was followed with nothing
notable coming up. The glibc scripts did not require any update, so the
only change was retrieving new versions of the Unicode data files and
rerunning gen_wcwidth.py.

contrib/ChangeLog:

	* unicode/EastAsianWidth.txt: Update to Unicode 14.0.0.
	* unicode/PropList.txt: Likewise.
	* unicode/README: Likewise.
	* unicode/UnicodeData.txt: Likewise.

libcpp/ChangeLog:

	* generated_cpp_wcwidth.h: Generated from updated Unicode data files.
2022-06-26 14:13:26 -04:00
Lewis Hyatt
497c9f8d4d libcpp: Update cpp_wcwidth() to Unicode 13.0.0
generated_cpp_wcwidth.h was regenerated using Unicode 13.0.0 data files. No
material changes to the parsing scripts (either GCC- or glibc-sourced) were
necessary; glibc's utf8_gen.py was tweaked slightly by glibc and matched here.

contrib/ChangeLog:

	* unicode/EastAsianWidth.txt: Update to Unicode 13.0.0.
	* unicode/PropList.txt: Likewise.
	* unicode/README: Likewise.
	* unicode/UnicodeData.txt: Likewise.
	* unicode/from_glibc/unicode_utils.py: Update to latest glibc version.
	* unicode/from_glibc/utf8_gen.py: Likewise.

libcpp/ChangeLog:

	* generated_cpp_wcwidth.h: Regenerated from Unicode 13.0.0 data.
2020-11-07 09:36:43 -05:00
Lewis Hyatt
ee9256409f Byte vs column awareness for diagnostic-show-locus.c (PR 49973)
contrib/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* unicode/from_glibc/unicode_utils.py: Support script from
	glibc (commit 464cd3) to extract character widths from Unicode data
	files.
	* unicode/from_glibc/utf8_gen.py: Likewise.
	* unicode/UnicodeData.txt: Unicode v. 12.1.0 data file.
	* unicode/EastAsianWidth.txt: Likewise.
	* unicode/PropList.txt: Likewise.
	* unicode/gen_wcwidth.py: New utility to generate
	libcpp/generated_cpp_wcwidth.h with help from the glibc support
	scripts and the Unicode data files.
	* unicode/unicode-license.txt: Added.
	* unicode/README: New explanatory file.

libcpp/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* generated_cpp_wcwidth.h: New file generated by
	../contrib/unicode/gen_wcwidth.py, supports new cpp_wcwidth function.
	* charset.c (compute_next_display_width): New function to help
	implement display columns.
	(cpp_byte_column_to_display_column): Likewise.
	(cpp_display_column_to_byte_column): Likewise.
	(cpp_wcwidth): Likewise.
	* include/cpplib.h (cpp_byte_column_to_display_column): Declare.
	(cpp_display_column_to_byte_column): Declare.
	(cpp_wcwidth): Declare.
	(cpp_display_width): New function.

gcc/ChangeLog

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* input.c (location_compute_display_column): New function to help with
	multibyte awareness in diagnostics.
	(test_cpp_utf8): New self-test.
	(input_c_tests): Call the new test.
	* input.h (location_compute_display_column): Declare.
	* diagnostic-show-locus.c: Pervasive changes to add multibyte awareness
	to all classes and functions.
	(enum column_unit): New enum.
	(class exploc_with_display_col): New class.
	(class layout_point): Convert m_column member to array m_columns[2].
	(layout_range::contains_point): Add col_unit argument.
	(test_layout_range_for_single_point): Pass new argument.
	(test_layout_range_for_single_line): Likewise.
	(test_layout_range_for_multiple_lines): Likewise.
	(line_bounds::convert_to_display_cols): New function.
	(layout::get_state_at_point): Add col_unit argument.
	(make_range): Use empty filename rather than dummy filename.
	(get_line_width_without_trailing_whitespace): Rename to...
	(get_line_bytes_without_trailing_whitespace): ...this.
	(test_get_line_width_without_trailing_whitespace): Rename to...
	(test_get_line_bytes_without_trailing_whitespace): ...this.
	(class layout): m_exploc changed to exploc_with_display_col from
	plain expanded_location.
	(layout::get_linenum_width): New accessor member function.
	(layout::get_x_offset_display): Likewise.
	(layout::calculate_linenum_width): New subroutine for the constuctor.
	(layout::calculate_x_offset_display): Likewise.
	(layout::layout): Use the new subroutines. Add multibyte awareness.
	(layout::print_source_line): Add multibyte awareness.
	(layout::print_line): Likewise.
	(layout::print_annotation_line): Likewise.
	(line_label::line_label): Likewise.
	(layout::print_any_labels): Likewise.
	(layout::annotation_line_showed_range_p): Likewise.
	(get_printed_columns): Likewise.
	(class line_label): Rename m_length to m_display_width.
	(get_affected_columns): Rename to...
	(get_affected_range): ...this; add col_unit argument and multibyte
	awareness.
	(class correction): Add m_affected_bytes and m_display_cols
	members.  Rename m_len to m_byte_length for clarity.  Add multibyte
	awareness throughout.
	(correction::insertion_p): Add multibyte awareness.
	(correction::compute_display_cols): New function.
	(correction::ensure_terminated): Use new member name m_byte_length.
	(line_corrections::add_hint): Add multibyte awareness.
	(layout::print_trailing_fixits): Likewise.
	(layout::get_x_bound_for_row): Likewise.
	(test_one_liner_simple_caret_utf8): New self-test analogous to the one
	with _utf8 suffix removed, testing multibyte awareness.
	(test_one_liner_caret_and_range_utf8): Likewise.
	(test_one_liner_multiple_carets_and_ranges_utf8): Likewise.
	(test_one_liner_fixit_insert_before_utf8): Likewise.
	(test_one_liner_fixit_insert_after_utf8): Likewise.
	(test_one_liner_fixit_remove_utf8): Likewise.
	(test_one_liner_fixit_replace_utf8): Likewise.
	(test_one_liner_fixit_replace_non_equal_range_utf8): Likewise.
	(test_one_liner_fixit_replace_equal_secondary_range_utf8): Likewise.
	(test_one_liner_fixit_validation_adhoc_locations_utf8): Likewise.
	(test_one_liner_many_fixits_1_utf8): Likewise.
	(test_one_liner_many_fixits_2_utf8): Likewise.
	(test_one_liner_labels_utf8): Likewise.
	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
	(test_overlapped_fixit_printing_utf8): Likewise.
	(test_overlapped_fixit_printing): Adapt for changes to
	get_affected_columns, get_printed_columns and class corrections.
	(test_overlapped_fixit_printing_2): Likewise.
	(test_linenum_sep): New constant.
	(test_left_margin): Likewise.
	(test_offset_impl): Helper function for new test.
	(test_layout_x_offset_display_utf8): New test.
	(diagnostic_show_locus_c_tests): Call new tests.

gcc/testsuite/ChangeLog:

2019-12-09  Lewis Hyatt  <lhyatt@gmail.com>

	PR preprocessor/49973
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(test_show_locus): Tweak so that expected output is the same as
	before the diagnostic-show-locus.c changes.
	* gcc.dg/cpp/pr66415-1.c: Likewise.

From-SVN: r279137
2019-12-09 20:03:47 +00:00