it turns out that half of the global decl stream of cc1 LTO build consits
TREE_LISTS, identifiers and integer cosntats representing TYPE_VALUES of enums.
Those are streamed only to produce ODR warning and used otherwise, so this
patch moves the info to a separate section that is represented and streamed
more effectively.
This also adds place for more info that may be used for ODR diagnostics
(i.e. at the moment we do not warn when the declarations differs i.e. by the
associated member functions and their types) and the type inheritance graph
rather then poluting the global stream.
I was bit unsure what enums we want to store into the section. All parsed
enums is probably too expensive, only those enums streamed to represent IL is
bit hard to get, so I went for those seen by free lang data.
As a plus we now get bit more precise warning because also the location of
mismatched enum CONST_DECL is streamed.
It changes:
[WPA] read 4608466 unshared trees
[WPA] read 2942094 mergeable SCCs of average size 1.365328
[WPA] 8625389 tree bodies read in total
[WPA] tree SCC table: size 524287, 247652 elements, collision ratio: 0.383702
[WPA] tree SCC max chain length 2 (size 1)
[WPA] Compared 2694442 SCCs, 228 collisions (0.000085)
[WPA] Merged 2694419 SCCs
[WPA] Merged 3731982 tree bodies
[WPA] Merged 633335 types
[WPA] 122077 types prevailed (155548 associated trees)
...
[WPA] Compression: 110593119 input bytes, 287696614 uncompressed bytes (ratio: 2.601397)
[WPA] Size of mmap'd section decls: 85628556 bytes
[WPA] Size of mmap'd section function_body: 13842928 bytes
[WPA] read 1720989 unshared trees
[WPA] read 1252217 mergeable SCCs of average size 1.858507
[WPA] 4048243 tree bodies read in total
[WPA] tree SCC table: size 524287, 226524 elements, collision ratio: 0.491759
[WPA] tree SCC max chain length 2 (size 1)
[WPA] Compared 1025693 SCCs, 196 collisions (0.000191)
[WPA] Merged 1025670 SCCs
[WPA] Merged 2063373 tree bodies
[WPA] Merged 633497 types
[WPA] 122299 types prevailed (155827 associated trees)
...
[WPA] Compression: 103428770 input bytes, 281151423 uncompressed bytes (ratio: 2.718310)
[WPA] Size of mmap'd section decls: 49390917 bytes
[WPA] Size of mmap'd section function_body: 13858258 bytes
...
[WPA] Size of mmap'd section odr_types: 29054816 bytes
So number of SCCs streamed drops to 38% and the number of unshared trees (that
are bit misnamed since it is mostly integer_cst) to 37%.
Things speeds up correspondingly, but I did not save time report from previous
build.
The enum values are still quite surprisingly large. I may take a look into
ways getting it smaller incrementally, but it streams reasonably fast:
Time variable usr sys wall GGC
phase opt and generate : 25.20 ( 68%) 10.88 ( 72%) 36.13 ( 69%) 868060 kB ( 52%)
phase stream in : 4.46 ( 12%) 0.90 ( 6%) 5.38 ( 10%) 790724 kB ( 48%)
phase stream out : 6.69 ( 18%) 3.32 ( 22%) 10.03 ( 19%) 8 kB ( 0%)
ipa lto gimple in : 0.79 ( 2%) 1.86 ( 12%) 2.39 ( 5%) 252612 kB ( 15%)
ipa lto gimple out : 2.48 ( 7%) 0.78 ( 5%) 3.26 ( 6%) 0 kB ( 0%)
ipa lto decl in : 1.71 ( 5%) 0.46 ( 3%) 2.34 ( 4%) 417883 kB ( 25%)
ipa lto decl out : 3.28 ( 9%) 0.07 ( 0%) 3.27 ( 6%) 0 kB ( 0%)
whopr wpa I/O : 0.40 ( 1%) 2.24 ( 15%) 2.77 ( 5%) 8 kB ( 0%)
lto stream decompression : 1.38 ( 4%) 0.31 ( 2%) 1.36 ( 3%) 0 kB ( 0%)
ipa ODR types : 0.18 ( 0%) 0.02 ( 0%) 0.25 ( 0%) 0 kB ( 0%)
ipa inlining heuristics : 11.64 ( 31%) 1.45 ( 10%) 13.12 ( 25%) 453160 kB ( 27%)
ipa pure const : 1.74 ( 5%) 0.00 ( 0%) 1.76 ( 3%) 0 kB ( 0%)
ipa icf : 1.72 ( 5%) 5.33 ( 35%) 7.06 ( 13%) 16593 kB ( 1%)
whopr partitioning : 2.22 ( 6%) 0.01 ( 0%) 2.23 ( 4%) 5689 kB ( 0%)
TOTAL : 37.17 15.20 52.46 1660886 kB
LTO-bootstrapped/regtested x86_64-linux, will comit it shortly.
gcc/ChangeLog:
2020-06-03 Jan Hubicka <hubicka@ucw.cz>
* ipa-devirt.c: Include data-streamer.h, lto-streamer.h and
streamer-hooks.h.
(odr_enums): New static var.
(struct odr_enum_val): New struct.
(class odr_enum): New struct.
(odr_enum_map): New hashtable.
(odr_types_equivalent_p): Drop code testing TYPE_VALUES.
(add_type_duplicate): Likewise.
(free_odr_warning_data): Do not free TYPE_VALUES.
(register_odr_enum): New function.
(ipa_odr_summary_write): New function.
(ipa_odr_read_section): New function.
(ipa_odr_summary_read): New function.
(class pass_ipa_odr): New pass.
(make_pass_ipa_odr): New function.
* ipa-utils.h (register_odr_enum): Declare.
* lto-section-in.c: (lto_section_name): Add odr_types section.
* lto-streamer.h (enum lto_section_type): Add odr_types section.
* passes.def: Add odr_types pass.
* lto-streamer-out.c (DFS::DFS_write_tree_body): Do not stream
TYPE_VALUES.
(hash_tree): Likewise.
* tree-streamer-in.c (lto_input_ts_type_non_common_tree_pointers):
Likewise.
* tree-streamer-out.c (write_ts_type_non_common_tree_pointers):
Likewise.
* timevar.def (TV_IPA_ODR): New timervar.
* tree-pass.h (make_pass_ipa_odr): Declare.
* tree.c (free_lang_data_in_type): Regiser ODR types.
gcc/lto/ChangeLog:
2020-06-03 Jan Hubicka <hubicka@ucw.cz>
* lto-common.c (compare_tree_sccs_1): Do not compare TYPE_VALUES.
gcc/testsuite/ChangeLog:
2020-06-03 Jan Hubicka <hubicka@ucw.cz>
* g++.dg/lto/pr84805_0.C: Update.
(cherry picked from commit 3fb68f2e66)
Fix typo.
this patch avoids stremaing completely useless stray references to gobal decl
stream. I am re-testing the patch (rebased to current tree) on x86_64-linux
and intend to commit once testing finishes.
gcc/ChangeLog:
2020-05-22 Jan Hubicka <hubicka@ucw.cz>
* lto-streamer-out.c (lto_output_tree): Do not stream final ref if
it is not needed.
gcc/lto/ChangeLog:
2020-05-22 Jan Hubicka <hubicka@ucw.cz>
* lto-common.c (lto_read_decls): Do not skip stray refs.
(cherry picked from commit bcb63eb2cb)
(cherry picked from commit 57400cf273f8052c601d90d86a47705faa17aaa9)
This is new incarantion of patch to identify unmergeable tree at streaming out
time rather than streaming in and to avoid pickling them to sccs with with hash
codes.
Building cc1 plus this patch reduces:
[WPA] read 4452927 SCCs of average size 1.986030
[WPA] 8843646 tree bodies read in total
[WPA] tree SCC table: size 524287, 205158 elements, collision ratio: 0.505204
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 947551 SCCs, 780270 collisions (0.823460)
[WPA] Merged 944038 SCCs
[WPA] Merged 5253521 tree bodies
[WPA] Merged 590027 types
...
[WPA] Size of mmap'd section decls: 99229066 bytes
[WPA] Size of mmap'd section function_body: 18398837 bytes
[WPA] Size of mmap'd section refs: 733678 bytes
[WPA] Size of mmap'd section jmpfuncs: 2965981 bytes
[WPA] Size of mmap'd section pureconst: 170248 bytes
[WPA] Size of mmap'd section profile: 17985 bytes
[WPA] Size of mmap'd section symbol_nodes: 3392736 bytes
[WPA] Size of mmap'd section inline: 2693920 bytes
[WPA] Size of mmap'd section icf: 435557 bytes
[WPA] Size of mmap'd section offload_table: 0 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651660 bytes
... to ...
[WPA] read 3312246 unshared trees
[WPA] read 1144381 mergeable SCCs of average size 4.833785
[WPA] 8843938 tree bodies read in total
[WPA] tree SCC table: size 524287, 197767 elements, collision ratio: 0.506446
[WPA] tree SCC max chain length 43 (size 1)
[WPA] Compared 946614 SCCs, 775077 collisions (0.818789)
[WPA] Merged 943798 SCCs
[WPA] Merged 5253336 tree bodies
[WPA] Merged 590105 types
....
[WPA] Size of mmap'd section decls: 81262144 bytes
[WPA] Size of mmap'd section function_body: 14702611 bytes
[WPA] Size of mmap'd section ext_symtab: 0 bytes
[WPA] Size of mmap'd section refs: 733695 bytes
[WPA] Size of mmap'd section jmpfuncs: 2332150 bytes
[WPA] Size of mmap'd section pureconst: 170292 bytes
[WPA] Size of mmap'd section profile: 17986 bytes
[WPA] Size of mmap'd section symbol_nodes: 3393358 bytes
[WPA] Size of mmap'd section inline: 2567939 bytes
[WPA] Size of mmap'd section icf: 435633 bytes
[WPA] Size of mmap'd section lto: 4320 bytes
[WPA] Size of mmap'd section ipa_sra: 651824 bytes
so results in about 22% reduction in global decl stream and 24% reduction on
function bodies stream (which is read mostly by ICF)
Martin, the zstd compression breaks the compression statistics (it works when
GCC is configured for zlib)
At first ltrans I get:
[LTRANS] Size of mmap'd section decls: 3734248 bytes
[LTRANS] Size of mmap'd section function_body: 4895962 bytes
... to ...
[LTRANS] Size of mmap'd section decls: 3479850 bytes
[LTRANS] Size of mmap'd section function_body: 3722935 bytes
So 7% reduction of global stream and 31% reduction of function bodies.
Stream in seems to get about 3% faster and stream out about 5% but it is
close to noise factor of my experiment. I expect bigger speedups on
Firefox but I did not test it today since my Firefox setup broke again.
GCC is not very good example on the problem with anonymous namespace
types since we do not have so many of them.
Sice of object files in gcc directory is reduced by 11% (because hash
numbers do not compress well I guess).
The patch makes DFS walk to recognize trees that are not merged (anonymous
namespace, local function/variable decls, anonymous types etc). As discussed
on IRC this is now done during the SCC walk rather than during the hash
computation. When local tree is discovered we know that SCC components of everything that is on
the stack reffers to it and thus is also local. Moreover we mark trees into hash set in output block
so if we get a cross edge referring to local tree it gets marked too.
Patch also takes care of avoiding SCC wrappers around some trees. In particular
1) singleton unmergeable SCCs are now streamed inline in global decl stream
This includes INTEGER_CSTs and IDENTIFIER_NODEs that are shared by different
code than rest of tree merging.
2) We use LTO_trees instead of LTO_tree_scc to wrap unmergeable SCC components.
It is still necessary to mark them because of forward references. LTO_trees
has simple header with number of trees and then things are streamed same way
as for LTO_tree_scc. That is tree headers first followed by pickled references
so things may point to future.
Of course it is not necessary for LTO_tree_scc to be single component and
streamer out may group more components together, but I decided to not snowball
the patch even more
3) In local streams when lto_output_tree is called and the topmost SCC components
turns out to be singleton we stream the tree directly
instead of LTO_tree_scc, hash code, pickled tree, reference to just stremaed tree.
LTO_trees is used to wrap all trees needed to represent tree being streamed.
It would make sense again to use only one LTO_trees rather than one per SCC
but I think this can be done incrementally.
In general local trees are now recognized by new predicate local_tree_p
Bit subtle is handing of TRANLSATION_UNIT_DECL, INTEGER_CST and
IDENTIFIER_NODE.
TRANSLATION_UNIT_DECL a local tree but references to it does not make
other trees local (because we also understand local decls now).
So I check for it later after localness propagation is done.
INTEGER_CST and IDENTIFIER_NODEs are merged but not via the tree merging
machinery. So it makes sense to stream them as unmergeable trees but we
still need to compute their hashes so SCCs referring them do not get too
large collision chains. For this reason they are checked just prior
stream out.
lto-bootstrapped/regteted x86_64-linux, OK?
gcc/ChangeLog:
2020-05-19 Jan Hubicka <hubicka@ucw.cz>
* lto-streamer-in.c (lto_input_scc): Add SHARED_SCC parameter.
(lto_input_tree_1): Strenghten sanity check.
(lto_input_tree): Update call of lto_input_scc.
* lto-streamer-out.c: Include ipa-utils.h
(create_output_block): Initialize local_trees if merigng is going
to happen.
(destroy_output_block): Destroy local_trees.
(DFS): Add max_local_entry.
(local_tree_p): New function.
(DFS::DFS): Initialize and maintain it.
(DFS::DFS_write_tree): Decide on streaming format.
(lto_output_tree): Stream inline singleton SCCs
* lto-streamer.h (enum LTO_tags): Add LTO_trees.
(struct output_block): Add local_trees.
(lto_input_scc): Update prototype.
gcc/lto/ChangeLog:
2020-05-19 Jan Hubicka <hubicka@ucw.cz>
* lto-common.c (compare_tree_sccs_1): Sanity check that we never
read TRANSLATION_UNIT_DECL.
(process_dref): Break out from ...
(unify_scc): ... here.
(process_new_tree): Break out from ...
(lto_read_decls): ... here; handle streaming of singleton trees.
(print_lto_report_1): Update statistics.
(cherry picked from commit 03d90a20a1)
(cherry picked from commit c6328b32770132efa004a3cad127cf74be84e911)
As reported by Iain and David, powerpc-darwin and powerpc-aix* have C++14
vs. C++17 ABI incompatibilities which are not fixed by mere adding of
cxx17_empty_base_field_p calls. Unlike the issues that were seen on other
targets where the artificial empty base field affected function argument
passing or returning of values, on these two targets the difference is
during class layout, not afterwards (e.g.
struct empty_base {};
struct S : public empty_base { unsigned long long l[2]; };
will have different __alignof__ (S) between C++14 and C++17 (or possibly
with double instead of unsigned long long too)).
I've tried:
struct X { };
struct Y { int : 0; };
struct Z { int : 0; Y y; };
struct U : public X { X q; };
struct A { float a, b, c, d; };
struct B : public X { float a, b, c, d; };
struct C : public Y { float a, b, c, d; };
struct D : public Z { float a, b, c, d; };
struct E : public U { float a, b, c, d; };
struct F { [[no_unique_address]] X x; float a, b, c, d; };
struct G { [[no_unique_address]] Y y; float a, b, c, d; };
struct H { [[no_unique_address]] Z z; float a, b, c, d; };
struct I { [[no_unique_address]] U u; float a, b, c, d; };
struct J { float a, b; [[no_unique_address]] X x; float c, d; };
struct K { float a, b; [[no_unique_address]] Y y; float c, d; };
struct L { float a, b; [[no_unique_address]] Z z; float c, d; };
struct M { float a, b; [[no_unique_address]] U u; float c, d; };
#define T(S, s) extern S s; extern void foo##s (S); int bar##s () { foo##s (s); return 0; }
T (A, a)
T (B, b)
T (C, c)
T (D, d)
T (E, e)
T (F, f)
T (G, g)
T (H, h)
T (I, i)
T (J, j)
T (K, k)
T (L, l)
T (M, m)
testcase on powerpc64-linux. Results:
G++ 9 -std=c++14 A, B, C passed in fprs, the rest in gprs
G++ 9 -std=c++17 A passed in fprs, the rest in gprs
current trunk -std=c++14 & 17 A, B, C passed in fprs, the rest in gprs
patched trunk -std=c++14 & 17 A, B, C, F, G, J, K passed in fprs, the rest in gprs
clang++ [*] -std=c++14 & 17 A, B, C, F, G, J, K passed in fprs, the rest in gprs
[*] clang version 11.0.0 (git@github.com:llvm/llvm-project.git 5c352e69e76a26e4eda075e20aa6a9bb7686042c)
Is that what we want? I think it matches the stated intent of P0840R2 or
what Jason/Jonathan said, and doing something different like e.g. not
treating C, G and K as homogenous because of the int : 0 in empty bases
or in zero sized [[no_unique_address] fields would be quite hard to
implement (because for C++14 the FIELD_DECL just isn't there).
2020-04-29 Jakub Jelinek <jakub@redhat.com>
PR target/94707
* tree-core.h (tree_decl_common): Note decl_flag_0 used for
DECL_FIELD_ABI_IGNORED.
* tree.h (DECL_FIELD_ABI_IGNORED): Define.
* calls.h (cxx17_empty_base_field_p): Change into a temporary
macro, check DECL_FIELD_ABI_IGNORED flag with no "no_unique_address"
attribute.
* calls.c (cxx17_empty_base_field_p): Remove.
* tree-streamer-out.c (pack_ts_decl_common_value_fields): Handle
DECL_FIELD_ABI_IGNORED.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Likewise.
* lto-streamer-out.c (hash_tree): Likewise.
* config/rs6000/rs6000-call.c (rs6000_aggregate_candidate): Rename
cxx17_empty_base_seen to empty_base_seen, change type to int *,
adjust recursive calls, use DECL_FIELD_ABI_IGNORED instead of
cxx17_empty_base_field_p, if "no_unique_address" attribute is
present, propagate that to the caller too.
(rs6000_discover_homogeneous_aggregate): Adjust
rs6000_aggregate_candidate caller, emit different diagnostics
when c++17 empty base fields are present and when empty
[[no_unique_address]] fields are present.
* config/rs6000/rs6000.c (rs6000_special_round_type_align,
darwin_rs6000_special_round_type_align): Skip DECL_FIELD_ABI_IGNORED
fields.
* class.c (build_base_field): Set DECL_FIELD_ABI_IGNORED on C++17 empty
base artificial FIELD_DECLs.
(layout_class_type): Set DECL_FIELD_ABI_IGNORED on empty class
field_poverlapping_p FIELD_DECLs.
* lto-common.c (compare_tree_sccs_1): Handle DECL_FIELD_ABI_IGNORED.
* g++.target/powerpc/pr94707-1.C: New test.
* g++.target/powerpc/pr94707-2.C: New test.
* g++.target/powerpc/pr94707-3.C: New test.
* g++.target/powerpc/pr94707-4.C: New test.
* g++.target/powerpc/pr94707-5.C: New test.
* g++.target/powerpc/pr94707-4.C: New test.
PR lto/94612
* lto-common.c: Initialize file_data->lto_section_header
before lto_mode_identity_table call. It is needed because
it decompresses a LTO section.
As mentioned in the PR, we don't guarantee DECL_UID to be the same between
corresponding decls in -g and -g0 builds, -g can create more decls and all
that is guaranteed is that the DECL_UIDs of the corresponding decls compare
the same.
The following testcase gets a -fcompare-debug failure because these
functions use DECL_UID as the number in ASM_FORMAT_PRIVATE_NAME.
The patch fixes it by using just a sequential number there instead.
I don't think this can be called during PCH writing, this only happens for
non-public decls and the C/C++ FEs shouldn't mangling those at that point
(furthermore C++ FE uses a different set_decl_assembler_name hook and this
one is something only the gimplifier calls on C.NNNN temporaries.
2020-03-25 Jakub Jelinek <jakub@redhat.com>
PR c++/94223
* langhooks.c (lhd_set_decl_assembler_name): Use a static ulong
counter instead of DECL_UID.
* lto-lang.c (lto_set_decl_assembler_name): Use a static ulong
counter instead of DECL_UID.
* g++.dg/opt/pr94223.C: New test.
* cgraph.c (cgraph_node_cannot_be_local_p_1): Prevent targets of
symver attributes to be localized.
* ipa-visibility.c (cgraph_externally_visible_p,
varpool_node::externally_visible_p): Likewise.
* symtab.c (symtab_node::verify_base): Check visibility of symbol
versions.
* lto-common.c (read_cgraph_and_symbols): Work around binutils
PR25424
Co-Authored-By: Xi Ruoyao <xry111@mengyan1223.wang>
From-SVN: r279566
* cgraph.c (cgraph_node::verify_node): Verify tp_first_run.
* cgraph.h (cgrpah_node): Turn tp_first_run back to int.
* cgraphunit.c (tp_first_run_node_cmp): Do not watch for overflows.
(expand_all_functions): First expand ordered section and then
unordered.
* lto-partition.c (lto_balanced_map): Fix printing of tp_first_run.
* profile.c (compute_value_histograms): Error on out of range
tp_first_runs.
From-SVN: r279178
This patch fixes three sissues with -fprofile-reorder-functions:
1) First is that tp_first_run is stored as 32bit integer while it can easily
overflow (and does so during Firefox profiling).
2) Second problem is that flag_profile_functions can
not be tested w/o function context.
The changes to expand_all_functions makes it to work on mixed units by
first outputting all functions w/o -fprofile-reorder-function (or with no
profile info) and then outputting in first_run order
3) LTO partitioner was mixing up order by tp_first_run and by order.
for no_reorder we definitly want to order via first, while for everything
else we want to roder by second.
I have also merged duplicated comparators since they are bit fragile into
tp_first_run_node_cmp.
I originaly started to look into this because of undefined symbols with
Firefox PGO builds. These symbols went away with fixing these bug but I am not
quite sure how. it is possible that there is another problem in lto_blanced_map
but even after reading the noreorder code few times carefuly I did not find it.
Other explanation would be that our new qsort with broken comparator due to
overflow can actualy remove some entries in the array, but that sounds bit
crazy.
Bootstrapped/regested x86_64-linux.
* cgraph.c (cgraph_node::dump): Make tp_first_run 64bit.
* cgraph.h (cgrpah_node): Likewise.
(tp_first_run_node_cmp): Deeclare.
* cgraphunit.c (node_cmp): Rename to ...
(tp_first_run_node_cmp): ... this; export; watch for 64bit overflows;
clear tp_first_run for no_reorder and !flag_profile_reorder_functions.
(expand_all_functions): Collect tp_first_run and normal functions to
two vectors so the other functions remain sorted. Do not check for
flag_profile_reorder_functions it is function local flag.
* profile.c (compute_value_histograms): Update tp_first_run printing.
* lto-partition.c (node_cmp): Turn into simple order comparsions.
(varpool_node_cmp): Remove.
(add_sorted_nodes): Use node_cmp.
(lto_balanced_map): Use tp_first_run_node_cmp.
From-SVN: r279093
Code that directly uses _Decimal* types on architectures not
supporting DFP is properly diagnosed ("error: decimal floating-point
not supported for this target"), via a call to
targetm.decimal_float_supported_p, if the _Decimal32, _Decimal64 or
_Decimal128 keywords are used to access it. Use via mode attributes
is also diagnosed ("unable to emulate 'SD'"); so is use of the
FLOAT_CONST_DECIMAL64 pragma. However, it is possible to access those
types via typeof applied to constants or built-in functions without
such an error. I expect that there are ways to get an ICE from this;
certainly it uses a completely undefined ABI.
This patch arranges for the types not to exist in the compiler at all
when DFP is not supported. As is done with unsupported _FloatN /
_FloatNx types, the global tree nodes are left as NULL_TREE, and the
built-in function machinery is made to use error_mark_node for them in
that case in builtin-types.def, so that the built-in functions are
unavailable. Code handling constants is adjusted to give an error,
and other code that might not work with the global tree nodes being
NULL_TREE is also updated.
Bootstrapped with no regressions for x86_64-pc-linux-gnu. Also tested
with no regressions for cross to aarch64-linux-gnu, as a configuration
without DFP support.
PR c/91985
gcc:
* builtin-types.def (BT_DFLOAT32, BT_DFLOAT64, BT_DFLOAT128)
(BT_DFLOAT32_PTR, BT_DFLOAT64_PTR, BT_DFLOAT128_PTR): Define to
error_mark_node if corresponding global tree node is NULL.
* tree.c (build_common_tree_nodes): Do not initialize
dfloat32_type_node, dfloat64_type_node or dfloat128_type_node if
decimal floating-point not supported.
gcc/c:
* c-decl.c (finish_declspecs): Use int instead of decimal
floating-point types if decimal floating-point not supported.
gcc/c-family:
* c-common.c (c_common_type_for_mode): Handle decimal
floating-point types being NULL_TREE.
* c-format.c (get_format_for_type_1): Handle specified types being
NULL_TREE.
* c-lex.c (interpret_float): Give an error for decimal
floating-point constants when decimal floating-point not
supported.
gcc/lto:
* lto-lang.c (lto_type_for_mode): Handle decimal floating-point
types being NULL_TREE.
gcc/testsuite:
* gcc.dg/c2x-no-dfp-1.c, gcc.dg/gnu2x-builtins-no-dfp-1.c: New
tests.
* gcc.dg/fltconst-pedantic-dfp.c: Expect errors when decimal
floating-point not supported.
From-SVN: r278684
2019-11-11 Martin Liska <mliska@suse.cz>
* Make-lang.in: Relax dependency of lto-dump.o to
LTO_OBJS which will allow faster linking (mainly with LTO).
From-SVN: r278054
PR middle-end/92231
* tree.h (fndecl_built_in_p): Use fndecl_built_in_p instead of
DECL_BUILT_IN in comment. Remove redundant ()s around return
argument.
* tree.c (free_lang_data_in_decl): Check if var is FUNCTION_DECL
before calling fndecl_built_in_p.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Check if
TREE_OPERAND (fn, 0) is a FUNCTION_DECL before calling
fndecl_built_in_p on it.
lto/
* lto-lang.c (handle_const_attribute): Don't call fndecl_built_in_p
on *node that is not FUNCTION_DECL.
testsuite/
* gcc.c-torture/compile/pr92231.c: New test.
From-SVN: r277660
2019-10-29 Martin Liska <mliska@suse.cz>
* cgraphunit.c (symbol_table::compile): Pass
title as dump_memory_report argument.
* toplev.c (dump_memory_report): New argument.
(finalize): Pass new argument.
* toplev.h (dump_memory_report): Add argument.
2019-10-29 Martin Liska <mliska@suse.cz>
* lto.c (do_whole_program_analysis): Pass
title as dump_memory_report argument.
From-SVN: r277559
* gimple-streamer-out.c (output_gimple_stmt): Add explicit function
parameter.
* lto-streamer-out.c: Include tree-dfa.h.
(output_cfg): Do not use cfun.
(lto_prepare_function_for_streaming): New.
(output_function): Do not push cfun; do not initialize loop optimizer.
* lto-streamer.h (lto_prepare_function_for_streaming): Declare.
* passes.c (ipa_write_summaries): Use it.
(ipa_write_optimization_summaries): Do not modify bodies.
* tree-dfa.c (renumber_gimple_stmt_uids): Add function parameter.
* tree.dfa.h (renumber_gimple_stmt_uids): Update prototype.
* tree-ssa-dse.c (pass_dse::execute): Update use of
renumber_gimple_stmt_uids.
* tree-ssa-math-opts.c (pass_optimize_widening_mul::execute): Likewise.
* lto.c (lto_wpa_write_files): Prepare all bodies for streaming.
From-SVN: r276870
Various built-in functions that GCC has as extensions are now standard
functions in C2x. This patch adds DEF_C2X_BUILTIN and uses it to mark
them as such. Some of the so-marked functions were previously
DEF_EXT_LIB_BUILTIN, while some DFP ones were DEF_GCC_BUILTIN
(i.e. __builtin_* only); both sets become DEF_C2X_BUILTIN. This in
turn requires flag_isoc2x to be defined in various front ends using
builtins.def.
As the semantics of the built-in functions should already be tested,
the tests added only verify that they are declared in C2x mode but not
in C11 mode. The test of DFP built-in functions being declared for
C2x goes in gcc.dg/dfp/, as while such built-in functions currently
don't depend on whether DFP is supported, that looks like a bug to me
(see bug 91985), so it seems best for the tests not to depend on
exactly how that bug might be fixed.
Bootstrapped with no regressions on x86_64-pc-linux-gnu.
gcc:
* builtins.def (DEF_C2X_BUILTIN): New macro.
(exp10, exp10f, exp10l, fabsd32, fabsd64, fabsd128, nand32)
(nand64, nand128, roundeven, roundevenf, roundevenl, strdup)
(strndup): Use DEF_C2X_BUILTIN.
* coretypes.h (enum function_class): Add function_c2x_misc.
gcc/ada:
* gcc-interface/utils.c (flag_isoc2x): New variable.
gcc/brig:
* brig-lang.c (flag_isoc2x): New variable.
gcc/lto:
* lto-lang.c (flag_isoc2x): New variable.
gcc/testsuite:
* gcc.dg/c11-builtins-1.c, gcc.dg/c2x-builtins-1.c,
gcc.dg/dfp/c2x-builtins-dfp-1.c: New tests.
From-SVN: r276588
2019-09-18 Richard Biener <rguenther@suse.de>
PR lto/91763
* lto-streamer-in.c (input_eh_regions): Move EH init to
lto_materialize_function.
* tree-streamer-in.c (lto_input_ts_function_decl_tree_pointers):
Likewise.
lto/
* lto.c (lto_materialize_function): Initialize EH by looking
at the function personality and flag_exceptions setting.
From-SVN: r275872
We were shoe-horning all built-in enumerations (including frontend
and target-specific ones) into a field of type built_in_function. This
was accessed as either an lvalue or an rvalue using DECL_FUNCTION_CODE.
The obvious danger with this (as was noted by several ??? comments)
is that the ranges have nothing to do with each other, and targets can
easily have more built-in functions than generic code. But my patch to
make the field bigger was the straw that finally made the problem visible.
This patch therefore:
- replaces the field with a plain unsigned int
- turns DECL_FUNCTION_CODE into an rvalue-only accessor that checks
that the function really is BUILT_IN_NORMAL
- adds corresponding DECL_MD_FUNCTION_CODE and DECL_FE_FUNCTION_CODE
accessors for BUILT_IN_MD and BUILT_IN_FRONTEND respectively
- adds DECL_UNCHECKED_FUNCTION_CODE for places that need to access the
underlying field (should be low-level code only)
- adds new helpers for setting the built-in class and function code
- makes DECL_BUILT_IN_CLASS an rvalue-only accessor too, since all
assignments should go through the new helpers
2019-08-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR middle-end/91421
* tree-core.h (function_decl::function_code): Change type to
unsigned int.
* tree.h (DECL_FUNCTION_CODE): Rename old definition to...
(DECL_UNCHECKED_FUNCTION_CODE): ...this.
(DECL_BUILT_IN_CLASS): Make an rvalue macro only.
(DECL_FUNCTION_CODE): New function. Assert that the built-in class
is BUILT_IN_NORMAL.
(DECL_MD_FUNCTION_CODE, DECL_FE_FUNCTION_CODE): New functions.
(set_decl_built_in_function, copy_decl_built_in_function): Likewise.
(fndecl_built_in_p): Change the type of the "name" argument to
unsigned int.
* builtins.c (expand_builtin): Move DECL_FUNCTION_CODE use
after check for DECL_BUILT_IN_CLASS.
* cgraphclones.c (build_function_decl_skip_args): Use
set_decl_built_in_function.
* ipa-param-manipulation.c (ipa_modify_formal_parameters): Likewise.
* ipa-split.c (split_function): Likewise.
* langhooks.c (add_builtin_function_common): Likewise.
* omp-simd-clone.c (simd_clone_create): Likewise.
* tree-streamer-in.c (unpack_ts_function_decl_value_fields): Likewise.
* config/darwin.c (darwin_init_cfstring_builtins): Likewise.
(darwin_fold_builtin): Use DECL_MD_FUNCTION_CODE instead of
DECL_FUNCTION_CODE.
* fold-const.c (operand_equal_p): Compare DECL_UNCHECKED_FUNCTION_CODE
instead of DECL_FUNCTION_CODE.
* lto-streamer-out.c (hash_tree): Use DECL_UNCHECKED_FUNCTION_CODE
instead of DECL_FUNCTION_CODE.
* tree-streamer-out.c (pack_ts_function_decl_value_fields): Likewise.
* print-tree.c (print_node): Use DECL_MD_FUNCTION_CODE when
printing DECL_BUILT_IN_MD. Handle DECL_BUILT_IN_FRONTEND.
* config/aarch64/aarch64-builtins.c (aarch64_expand_builtin)
(aarch64_fold_builtin, aarch64_gimple_fold_builtin): Use
DECL_MD_FUNCTION_CODE instead of DECL_FUNCTION_CODE.
* config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Likewise.
* config/alpha/alpha.c (alpha_expand_builtin, alpha_fold_builtin):
(alpha_gimple_fold_builtin): Likewise.
* config/arc/arc.c (arc_expand_builtin): Likewise.
* config/arm/arm-builtins.c (arm_expand_builtin): Likewise.
* config/avr/avr-c.c (avr_resolve_overloaded_builtin): Likewise.
* config/avr/avr.c (avr_expand_builtin, avr_fold_builtin): Likewise.
* config/bfin/bfin.c (bfin_expand_builtin): Likewise.
* config/c6x/c6x.c (c6x_expand_builtin): Likewise.
* config/frv/frv.c (frv_expand_builtin): Likewise.
* config/gcn/gcn.c (gcn_expand_builtin_1): Likewise.
(gcn_expand_builtin): Likewise.
* config/i386/i386-builtins.c (ix86_builtin_reciprocal): Likewise.
(fold_builtin_cpu): Likewise.
* config/i386/i386-expand.c (ix86_expand_builtin): Likewise.
* config/i386/i386.c (ix86_fold_builtin): Likewise.
(ix86_gimple_fold_builtin): Likewise.
* config/ia64/ia64.c (ia64_fold_builtin): Likewise.
(ia64_expand_builtin): Likewise.
* config/iq2000/iq2000.c (iq2000_expand_builtin): Likewise.
* config/mips/mips.c (mips_expand_builtin): Likewise.
* config/msp430/msp430.c (msp430_expand_builtin): Likewise.
* config/nds32/nds32-intrinsic.c (nds32_expand_builtin_impl): Likewise.
* config/nios2/nios2.c (nios2_expand_builtin): Likewise.
* config/nvptx/nvptx.c (nvptx_expand_builtin): Likewise.
* config/pa/pa.c (pa_expand_builtin): Likewise.
* config/pru/pru.c (pru_expand_builtin): Likewise.
* config/riscv/riscv-builtins.c (riscv_expand_builtin): Likewise.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Likewise.
* config/rs6000/rs6000-call.c (htm_expand_builtin): Likewise.
(altivec_expand_dst_builtin, altivec_expand_builtin): Likewise.
(rs6000_gimple_fold_builtin, rs6000_expand_builtin): Likewise.
* config/rs6000/rs6000.c (rs6000_builtin_md_vectorized_function)
(rs6000_builtin_reciprocal): Likewise.
* config/rx/rx.c (rx_expand_builtin): Likewise.
* config/s390/s390-c.c (s390_resolve_overloaded_builtin): Likewise.
* config/s390/s390.c (s390_expand_builtin): Likewise.
* config/sh/sh.c (sh_expand_builtin): Likewise.
* config/sparc/sparc.c (sparc_expand_builtin): Likewise.
(sparc_fold_builtin): Likewise.
* config/spu/spu-c.c (spu_resolve_overloaded_builtin): Likewise.
* config/spu/spu.c (spu_expand_builtin): Likewise.
* config/stormy16/stormy16.c (xstormy16_expand_builtin): Likewise.
* config/tilegx/tilegx.c (tilegx_expand_builtin): Likewise.
* config/tilepro/tilepro.c (tilepro_expand_builtin): Likewise.
* config/xtensa/xtensa.c (xtensa_fold_builtin): Likewise.
(xtensa_expand_builtin): Likewise.
gcc/ada/
PR middle-end/91421
* gcc-interface/trans.c (gigi): Call set_decl_buillt_in_function.
(Call_to_gnu): Use DECL_FE_FUNCTION_CODE instead of DECL_FUNCTION_CODE.
gcc/c/
PR middle-end/91421
* c-decl.c (merge_decls): Use copy_decl_built_in_function.
gcc/c-family/
PR middle-end/91421
* c-common.c (resolve_overloaded_builtin): Use
copy_decl_built_in_function.
gcc/cp/
PR middle-end/91421
* decl.c (duplicate_decls): Use copy_decl_built_in_function.
* pt.c (declare_integer_pack): Use set_decl_built_in_function.
gcc/d/
PR middle-end/91421
* intrinsics.cc (maybe_set_intrinsic): Use set_decl_built_in_function.
gcc/jit/
PR middle-end/91421
* jit-playback.c (new_function): Use set_decl_built_in_function.
gcc/lto/
PR middle-end/91421
* lto-common.c (compare_tree_sccs_1): Use DECL_UNCHECKED_FUNCTION_CODE
instead of DECL_FUNCTION_CODE.
* lto-symtab.c (lto_symtab_merge_p): Likewise.
From-SVN: r274404