As reported on gcc-regression, this test FAILs on aarch64. My
r15-2090 change didn't change anything in the generated assembly;
it just added the forgotten dg-do run directive to the test, so the
test has been failing all along and we just didn't know it.
I can actually reproduce it on x86_64 with -funsigned-char too.
s2.b.a has int type and -1 is stored to it, so we should compare
it against -1 rather than (char) -1; the latter is appropriate for
testing char fields into which we've stored -1.
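The mismatch can be illustrated with a minimal sketch. Here unsigned char
stands in for plain char under -funsigned-char, and the struct and function
names are hypothetical, not taken from the test:

```cpp
#include <cassert>

// Hypothetical stand-in for the int field s2.b.a in the test.
struct Inner { int a; };

// Models `s.a == (char) -1` when plain char is unsigned: the cast yields
// the largest unsigned char value (255 with 8-bit chars), which is then
// promoted to int, so it can never equal -1.
bool matches_char_minus_one(const Inner &s)
{
  return s.a == static_cast<unsigned char>(-1);
}

// The corrected comparison for an int field holding -1.
bool matches_int_minus_one(const Inner &s)
{
  return s.a == -1;
}

// Small self-check that can be called from a test driver.
void check_comparisons()
{
  Inner s{-1};
  assert(matches_int_minus_one(s));
  assert(!matches_char_minus_one(s));
}
```

With signed plain char both comparisons succeed, which is presumably why
the problem only shows up where char is unsigned.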
2024-07-18 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/torture/builtin-clear-padding-3.c (main): Compare
s2.b.a against -1 rather than (char) -1.
(cherry picked from commit 958ee13874)
The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size. That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
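As a rough model of that arithmetic (not the actual gimple-fold code): the
struct, the function name and the 8-byte word size below are assumptions
made for illustration only.

```cpp
#include <cassert>

// Assumption for illustration: 8-byte words, as on a typical 64-bit target.
constexpr long kUnitsPerWord = 8;

// Hypothetical mirror of the two buffer members discussed above.
struct BufPos { long off; long size; };

// Recompute the position after an array cleared by a runtime loop.
// The key point is that the computation starts from the END of the
// array (start + bytes), not from its start.
BufPos position_after_array(long array_start, long array_bytes)
{
  long end = array_start + array_bytes;
  long size = end % kUnitsPerWord;   // bytes already inside the current word
  return BufPos{end - size, size};   // off is the word-aligned part
}

// Small self-check that can be called from a test driver.
void check_position()
{
  BufPos p = position_after_array(4, 14);   // array occupies bytes 4..17
  assert(p.off == 16 && p.size == 2);
}
```

Using the array start instead of its end in this model would give off 0 and
size 4 for the same input, which is the kind of mismatch described above.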
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
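For reference, a runtime test of this kind might look roughly like the
sketch below. It is not one of the actual testcases; the struct and helper
names are made up, and the __has_builtin guard is only there so the sketch
still compiles where the builtin is missing.

```cpp
/* { dg-do run } */
#include <cassert>
#include <cstddef>
#include <cstring>

#if defined(__has_builtin)
# if __has_builtin(__builtin_clear_padding)
#  define HAVE_CLEAR_PADDING 1
# endif
#endif

// Hypothetical struct; on common ABIs there is padding between c and i.
struct WithPadding { char c; int i; };

// Returns true if every byte between c and i is zero after the builtin
// ran (or trivially true if the builtin is unavailable).
bool padding_cleared()
{
#ifdef HAVE_CLEAR_PADDING
  WithPadding s;
  std::memset(&s, 0xff, sizeof s);   // dirty the whole object first
  s.c = 1;
  s.i = -1;
  __builtin_clear_padding(&s);

  unsigned char bytes[sizeof s];
  std::memcpy(bytes, &s, sizeof s);
  for (std::size_t k = sizeof s.c; k < offsetof(WithPadding, i); ++k)
    if (bytes[k] != 0)
      return false;
#endif
  return true;
}
```

Without the dg-do run comment on the first line, DejaGnu would only compile
such a file, so a failing check in it would go unnoticed.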
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some. If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
(including via RMW cycles) to anything outside of the object:
      if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
          > (unsigned HOST_WIDE_INT) buf->sz)
        {
          gcc_assert (wordsize > 1);
          wordsize /= 2;
          i -= wordsize;
          continue;
        }
but for a full==true flush in the middle this doesn't happen, and we
still process just the buffer bytes before the current end. If that end
is not on a wordsize boundary (e.g. in the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18),
nonzero_last might be equal to end - i, i.e. 2 here, while all_ones
might still be true, so in some spots we just didn't emit any clearing in
that last chunk.
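The numbers from that example can be checked with a small sketch. It
assumes that the size to compare nonzero_last against is the number of
buffer bytes left in the last chunk, capped at wordsize; that definition
and the function name are assumptions, not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Assumed model: number of valid buffer bytes in the chunk starting at i,
// never more than one word.
std::size_t chunk_end_size(std::size_t i, std::size_t end, std::size_t wordsize)
{
  return std::min(wordsize, end - i);
}

// Small self-check using the values quoted above (i = 16, end = 18).
void check_chunk_end_size()
{
  const std::size_t wordsize = 8;   // assumption: 8-byte words
  std::size_t endsize = chunk_end_size(16, 18, wordsize);
  assert(endsize == 2);             // what nonzero_last can reach here
  assert(endsize != wordsize);      // so comparing against wordsize never matches
}
```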
2024-07-17 Jakub Jelinek <jakub@redhat.com>
PR middle-end/115527
* gimple-fold.c (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.
* c-c++-common/torture/builtin-clear-padding-1.c: Add dg-do run
directive.
* c-c++-common/torture/builtin-clear-padding-2.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-3.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-5.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-6.c: New test.
(cherry picked from commit 8b5919bae1)
Currently the unaligned YMM and ZMM load and store costs are cheaper than
the aligned ones, which causes the vectorizer to purposely misalign accesses
by adding an alignment prologue. It looks like the unaligned costs
were simply left untouched from znver3, where they equal the aligned
costs, when the aligned costs were tweaked for znver4. The following makes
the unaligned costs equal to the aligned costs.
This avoids the miscompile seen in PR115843 but it's of course not
a real fix for the issue uncovered there. But it makes it qualify
as a regression fix.
PR tree-optimization/115843
* config/i386/x86-tune-costs.h (znver4_cost): Update unaligned
load and store cost from the aligned costs.
(cherry picked from commit 1e3aa9c927)
The --with-long-double-abi=ibm build is missing some exports that are
present in the --with-long-double-abi=ieee build. Those symbols never
should have been exported at all, but now that they have been, they
should be exported consistently by both ibm and ieee.
This simply defines them as aliases for equivalent symbols that are
already present. The abi-tag on num_get::_M_extract_int isn't really
needed, because it only uses a std::string as a local variable, not in
the return type or function parameters, so it's safe to define the
_M_extract_int[abi:cxx11] symbols as aliases for the corresponding
function without the abi-tag.
This causes some new symbols to be added to the GLIBCXX_3.4.29 version
for the ibm long double build mode, but there is no advantage to adding
them to 3.4.30 for that build. That would just create more
inconsistencies.
libstdc++-v3/ChangeLog:
PR libstdc++/105417
* config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt:
Regenerate.
* src/c++11/compatibility-ldbl-alt128.cc [_GLIBCXX_USE_DUAL_ABI]:
Define __gnu_ieee128::num_get<C>::_M_extract_int[abi:cxx11]<I>
symbols as aliases for corresponding symbols without abi-tag.
(cherry picked from commit bb7cf39b05)
This is a workaround for a possible compiler bug that causes constraint
recursion in the operator<=>(const optional<T>&, const U&) overload.
libstdc++-v3/ChangeLog:
PR libstdc++/104606
* include/std/optional (operator<=>(const optional<T>&, const U&)):
Reverse order of three_way_comparable_with template arguments.
* testsuite/20_util/optional/relops/104606.cc: New test.
(cherry picked from commit 7f65d8267f)
This patch fixes the backend pattern that was printing the wrong input
scalar register pair when inserting into lane 1.
Added a new test to force float-abi=hard so we can use scan-assembler to check
correct codegen.
gcc/ChangeLog:
PR target/115611
* config/arm/mve.md (mve_vec_setv2di_internal): Fix printing of input
scalar register pair when lane = 1.
gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/intrinsics/vsetq_lane_su64.c: New test.
(cherry picked from commit 7c11fdd2cc)
emit_store_flag_1 calculates scode (swapped condition code) at the
beginning of the function from the value of the code variable. However,
the code variable may change before the scode usage site, resulting in
an invalid, stale scode value.
Move the calculation of scode to just before its only usage site to
avoid the stale value.
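The pattern can be shown with a toy model. The enum, the helper and the
way code gets modified below are all hypothetical; they only mimic the
shape of the bug, not the real expmed code.

```cpp
#include <cassert>

// Toy condition codes; not GCC's rtx_code.
enum class Cond { LT, GT, EQ };

// Condition that holds after swapping the two operands.
Cond swap_cond(Cond c)
{
  switch (c)
    {
    case Cond::LT: return Cond::GT;
    case Cond::GT: return Cond::LT;
    default:       return c;
    }
}

// Buggy shape: scode is derived from code before code is modified.
Cond stale_scode(Cond code, bool code_changes)
{
  Cond scode = swap_cond(code);   // computed too early
  if (code_changes)
    code = Cond::EQ;              // stand-in for any later change to code
  (void) code;
  return scode;                   // may no longer correspond to code
}

// Fixed shape: derive scode right where it is used.
Cond fresh_scode(Cond code, bool code_changes)
{
  if (code_changes)
    code = Cond::EQ;
  return swap_cond(code);
}

// Small self-check that can be called from a test driver.
void check_scode()
{
  assert(stale_scode(Cond::LT, true) == Cond::GT);
  assert(fresh_scode(Cond::LT, true) == Cond::EQ);
}
```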
PR middle-end/115836
gcc/ChangeLog:
* expmed.c (emit_store_flag_1): Move calculation of
scode just before its only usage site.
(cherry picked from commit 44933fdeb3)
The allocator objects in container node handles were not being destroyed
after the node was re-inserted into a container. They are stored in a
union and so need to be explicitly destroyed when the node becomes
empty. The containers were zeroing the node handle's pointer, which
makes it empty, causing the handle's destructor to think there's nothing
to clean up.
Add a new member function to the node handle which destroys the
allocator and zeros the pointer. Change the containers to call that
instead of just changing the pointer manually.
We can also remove the _M_empty member of the union which is not
necessary.
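The leak can be reproduced in miniature with the sketch below. ToyHandle
and CountingAlloc are made-up types that only imitate the layout described
above (a pointer plus an allocator stored in a union); they are not the
libstdc++ classes.

```cpp
#include <cassert>
#include <new>

// Number of CountingAlloc objects currently alive.
int g_live_allocs = 0;

// Stand-in for an allocator with a non-trivial destructor.
struct CountingAlloc
{
  CountingAlloc() { ++g_live_allocs; }
  CountingAlloc(const CountingAlloc &) { ++g_live_allocs; }
  ~CountingAlloc() { --g_live_allocs; }
};

class ToyHandle
{
  int *ptr_ = nullptr;
  union { CountingAlloc alloc_; };   // must be destroyed explicitly

public:
  explicit ToyHandle(int *p) : ptr_(p) { ::new (&alloc_) CountingAlloc(); }

  // Like the real destructor, assumes nothing to clean up when empty.
  ~ToyHandle() { if (ptr_) alloc_.~CountingAlloc(); }

  // Old container behaviour: only zero the pointer.
  void zero_pointer_only() { ptr_ = nullptr; }

  // Fixed behaviour: destroy the allocator, then zero the pointer.
  void release()
  {
    if (ptr_)
      alloc_.~CountingAlloc();
    ptr_ = nullptr;
  }
};

// Returns how many allocators are left alive after a handle is emptied
// and destroyed.
int leaked_allocs(bool use_release)
{
  int before = g_live_allocs;
  int node = 0;
  {
    ToyHandle h(&node);
    if (use_release)
      h.release();
    else
      h.zero_pointer_only();
  }
  return g_live_allocs - before;
}
```

In this model leaked_allocs(false) leaves one allocator undestroyed, while
leaked_allocs(true) leaves none.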
libstdc++-v3/ChangeLog:
PR libstdc++/114401
* include/bits/hashtable.h (_Hashtable::_M_reinsert_node): Call
release() on node handle instead of just zeroing its pointer.
(_Hashtable::_M_reinsert_node_multi): Likewise.
(_Hashtable::_M_merge_unique): Likewise.
* include/bits/node_handle.h (_Node_handle_common::release()):
New member function.
(_Node_handle_common::_Optional_alloc::_M_empty): Remove
unnecessary union member.
(_Node_handle_common): Declare _Hashtable as a friend.
* include/bits/stl_tree.h (_Rb_tree::_M_reinsert_node_unique):
Call release() on node handle instead of just zeroing its
pointer.
(_Rb_tree::_M_reinsert_node_equal): Likewise.
(_Rb_tree::_M_reinsert_node_hint_unique): Likewise.
(_Rb_tree::_M_reinsert_node_hint_equal): Likewise.
* testsuite/23_containers/multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/set/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_multiset/modifiers/114401.cc: New test.
* testsuite/23_containers/unordered_set/modifiers/114401.cc: New test.
(cherry picked from commit c2e28df90a)
The ACLE requires __ARM_FEATURE_SVE_BF16 to be enabled when SVE and BF16
and the associated intrinsics are available.
GCC does support the required intrinsics for TARGET_SVE_BF16 so define
this macro too.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115475
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_SVE_BF16 for TARGET_SVE_BF16.
gcc/testsuite/
PR target/115475
* gcc.target/aarch64/acle/bf16_sve_feature.c: New test.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit 6492c7130d)
The ACLE asks the user to test for __ARM_FEATURE_BF16 before using the
<arm_bf16.h> header but GCC doesn't set this up.
LLVM does, so this is an inconsistency between the compilers.
This patch enables that macro for TARGET_BF16_FP.
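User code following that ACLE guidance might look like the sketch below;
kHaveBf16 and can_use_bf16 are hypothetical names used only for this
example.

```cpp
#include <cassert>

// Only pull in the header when the compiler advertises BF16 support.
#if defined(__ARM_FEATURE_BF16)
# include <arm_bf16.h>
constexpr bool kHaveBf16 = true;
#else
constexpr bool kHaveBf16 = false;
#endif

// Lets callers pick a BF16 code path or a fallback.
bool can_use_bf16()
{
  return kHaveBf16;
}
```

If the compiler does not define the macro, code written this way silently
takes the fallback path even when the target supports BF16.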
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/115457
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
Define __ARM_FEATURE_BF16 for TARGET_BF16_FP.
gcc/testsuite/
PR target/115457
* gcc.target/aarch64/acle/bf16_feature.c: New test.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
(cherry picked from commit c10942134f)
This testcase was fixed by r14-5934-gf26d68d5d128c8 but we should add
one to make sure it does not regress again.
Committed as obvious after a quick test on the testcase.
PR c++/97990
gcc/testsuite/ChangeLog:
* g++.dg/torture/vector-struct-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
(cherry picked from commit 5f1438db41)
The following fixes a stray TYPE_ALIAS_SET in a type variant built
by build_opaque_vector_type which is diagnosed by type checking
enabled with -flto.
PR middle-end/112732
* tree.c (build_opaque_vector_type): Reset TYPE_ALIAS_SET
of the newly built type.
(cherry picked from commit f26d68d5d1)
Use INCLUDE_VECTOR before including system.h, instead of directly
including <vector>, to avoid running into poisoned identifiers.
Signed-off-by: Dimitry Andric <dimitry@andric.com>
libcc1/ChangeLog:
PR middle-end/111632
* libcc1plugin.cc: Fix include.
* libcp1plugin.cc: Fix include.
(cherry picked from commit 5213047b1d)
When building gcc's C++ sources against recent libc++, the poisoning of
the ctype macros due to including safe-ctype.h before including C++
standard headers such as <list>, <map>, etc, causes many compilation
errors, similar to:
  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:546:5: error: '__abi_tag__' attribute only applies to structs, variables, functions, and namespaces
    546 | _LIBCPP_INLINE_VISIBILITY
        | ^
  /usr/include/c++/v1/__config:813:37: note: expanded from macro '_LIBCPP_INLINE_VISIBILITY'
    813 | # define _LIBCPP_INLINE_VISIBILITY _LIBCPP_HIDE_FROM_ABI
        | ^
  /usr/include/c++/v1/__config:792:26: note: expanded from macro '_LIBCPP_HIDE_FROM_ABI'
    792 | __attribute__((__abi_tag__(_LIBCPP_TOSTRING(_LIBCPP_VERSIONED_IDENTIFIER))))
        | ^
  In file included from /home/dim/src/gcc/master/gcc/gensupport.cc:23:
  In file included from /home/dim/src/gcc/master/gcc/system.h:233:
  In file included from /usr/include/c++/v1/vector:321:
  In file included from /usr/include/c++/v1/__format/formatter_bool.h:20:
  In file included from /usr/include/c++/v1/__format/formatter_integral.h:32:
  In file included from /usr/include/c++/v1/locale:202:
  /usr/include/c++/v1/__locale:547:37: error: expected ';' at end of declaration list
    547 | char_type toupper(char_type __c) const
        | ^
  /usr/include/c++/v1/__locale:553:48: error: too many arguments provided to function-like macro invocation
    553 | const char_type* toupper(char_type* __low, const char_type* __high) const
        | ^
  /home/dim/src/gcc/master/gcc/../include/safe-ctype.h:146:9: note: macro 'toupper' defined here
    146 | #define toupper(c) do_not_use_toupper_with_safe_ctype
        | ^
This is because libc++ uses different transitive includes than
libstdc++, and some of those transitive includes pull in various ctype
declarations (typically via <locale>).
There was already a special case for including <string> before
safe-ctype.h, so move the rest of the C++ standard header includes to
the same location, to fix the problem.
gcc/ChangeLog:
* system.h: Include safe-ctype.h after C++ standard headers.
Signed-off-by: Dimitry Andric <dimitry@andric.com>
(cherry picked from commit 9970b576b7)
Most of the time we get away with using the dsymutil that is
installed with the latest Xcode, however for some cross-compilation
cases that does not work.
We now have the ability to specify the correct dsymutil to use for
the toolchain (--with-dsymutil=) and we should use that specified
tool for debug link. Fixes cross-compilers from x86-64 to powerpc.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ada/ChangeLog:
* gcc-interface/Makefile.in: Use DSYMUTIL_FOR_TARGET in
libgnat/libgnarl recipes.
We check that the final_suspend () method returns a sane type (i.e. a class
or structure) but, unfortunately, that check has to be later than the one
for a throwing case. If the user returns some nonsensical type from the
method, we need to handle that in the checking for noexcept.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/104051
gcc/cp/ChangeLog:
* coroutines.cc (coro_diagnose_throwing_final_aw_expr): Handle
non-target expression inputs.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr104051.C: New test.
(cherry picked from commit 7b96274a34)
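We do not yet support variable-length arrays (VLAs) among a coroutine's
local variables, so emit a sorry when one is encountered.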
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/101765
gcc/cp/ChangeLog:
* coroutines.cc (register_local_var_uses): Emit a sorry if
we encounter a VLA in the coroutine local variables.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr101765.C: New test.
(cherry picked from commit fdf0b6ce6c)
The wording of the standard has been clarified to be explicit that
the parameters to any user-defined operator-new in the promise
class should be lvalues.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/100772
gcc/cp/ChangeLog:
* coroutines.cc (morph_fn_to_coro): Convert function parms
from reference before constructing any operator-new args
list.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr100772-a.C: New test.
* g++.dg/coroutines/pr100772-b.C: New test.
(cherry picked from commit 921942a8a1)
C++20 [expr.await] / 2
An await-expression shall appear only in a potentially-evaluated expression
within the compound-statement of a function-body outside of a handler.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR c++/99710
gcc/cp/ChangeLog:
* coroutines.cc (await_statement_walker): Report an error if
an await expression is found in a handler body.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr99710.C: New test.
(cherry picked from commit 650beb1105)
This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.
This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
libstdc++-v3/ChangeLog:
* doc/xml/manual/status_cxx2023.xml: Change reference from
mainline GCC to the release branch.
* doc/html/manual/status.html: Regenerate.
As the associated test case in PR114846 shows, when eh_return is
involved, some of the register restores for EH RETURN DATA in the
epilogue can currently clobber the register holding the return value.
Following the existing handling in some other targets, this patch
makes the eh_return expander call a new define_insn_and_split,
eh_return_internal, which directly calls rs6000_emit_epilogue with
epilogue_type EPILOGUE_TYPE_EH_RETURN instead of the previous special
treatment of a normal return with crtl->calls_eh_return.
PR target/114846
gcc/ChangeLog:
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): As
EPILOGUE_TYPE_EH_RETURN would be passed as epilogue_type directly
now, adjust the relevant handlings on it.
* config/rs6000/rs6000.md (eh_return expander): Append by calling
gen_eh_return_internal and emit_barrier.
(eh_return_internal): New define_insn_and_split, call function
rs6000_emit_epilogue with epilogue type EPILOGUE_TYPE_EH_RETURN.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr114846.c: New test.
(cherry picked from commit e5fc5d42d2)