OpenMP/USM implies that memory is accessible from the host as well as the
device, but doesn't imply that memory allocated in one context may be
deallocated in the other.
For most of the test cases, we're (by construction) not allocating memory
during device execution, so there is nothing to clean up (but we still
document these semantics). For a few, however, we do have to clean up:
'libgomp.c++/target-std__map-concurrent-usm.C',
'libgomp.c++/target-std__multimap-concurrent-usm.C',
'libgomp.c++/target-std__multiset-concurrent-usm.C',
'libgomp.c++/target-std__set-concurrent-usm.C'.
For 'libgomp.c++/target-std__multimap-concurrent-usm.C' (only), this issue
already got addressed in commit 90f2ab4b6e
"libgomp.c++/target-std__multimap-concurrent.C: Fix USM memory freeing".
However, instead of invoking the 'clear' function (which isn't generally
guaranteed to release dynamically allocated memory; for example, see PR123582
"C++ unordered associative container: dynamic memory management"), we properly
restore the respective object to a pristine state.
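For illustration, a minimal sketch of the pristine-state idiom (container type
and context are illustrative, not the actual test code):

#include <set>

void
reset (std::set<int> &s)
{
  /* Assign a freshly constructed object so all dynamically allocated
     nodes are actually released; 'clear' alone does not generally
     guarantee that.  */
  s = std::set<int> ();
}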
libgomp/
* testsuite/libgomp.c++/target-std__array-concurrent-usm.C:
'#define OMP_USM'.
* testsuite/libgomp.c++/target-std__forward_list-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__list-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__span-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__map-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__multimap-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__multiset-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__set-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__valarray-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__vector-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__bitset-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__deque-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__array-concurrent.C: Comment.
* testsuite/libgomp.c++/target-std__bitset-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__deque-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__forward_list-concurrent.C:
Likewise.
* testsuite/libgomp.c++/target-std__list-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__span-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__valarray-concurrent.C:
Likewise.
* testsuite/libgomp.c++/target-std__vector-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__map-concurrent.C [OMP_USM]:
Fix up dynamic memory allocation.
* testsuite/libgomp.c++/target-std__multimap-concurrent.C
[OMP_USM]: Likewise.
* testsuite/libgomp.c++/target-std__multiset-concurrent.C
[OMP_USM]: Likewise.
* testsuite/libgomp.c++/target-std__set-concurrent.C [OMP_USM]:
Likewise.
All targets' definitions of EH_RETURN_DATA_REGNO use the macro argument,
except for NVPTX, which uses the default definition.
The problem is that we then get a -Wunused-but-set-variable warning
when building df-scan.cc for the NVPTX target with GCC 16 (post r16-2258
PR44677) on:
unsigned int i;
/* Mark the registers that will contain data for the handler. */
for (i = 0; ; ++i)
{
unsigned regno = EH_RETURN_DATA_REGNO (i);
if (regno == INVALID_REGNUM)
break;
If multiple targets were suffering from this, I'd think about
adding something to use i in loops like this, but as it is
just the default definition, the following patch fixes it by
using the argument.
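A sketch of the adjusted default, per the ChangeLog entry below:

#define EH_RETURN_DATA_REGNO(N) ((void) (N), INVALID_REGNUM)

This uses the argument in a comma expression and still yields INVALID_REGNUM,
so the loop variable no longer appears set but unused.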
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR middle-end/123115
* defaults.h (EH_RETURN_DATA_REGNO): Add void (N) to the macro
definition inside of a comma expression before INVALID_REGNUM.
The r12-4475 change added extra code to recog_for_combine to attempt to
force some constants into the constant pool.
Unfortunately, as this (UB at runtime) testcase shows, such changes are
harmful for computed_jump_p jumps. computed_jump_p returns false
for loads from constant pool MEMs:
case MEM:
return ! (GET_CODE (XEXP (x, 0)) == SYMBOL_REF
&& CONSTANT_POOL_ADDRESS_P (XEXP (x, 0)));
and so if we try to optimize a computed jump that way, it becomes
a non-computed jump which doesn't match any other jump category
(simplejump_p, tablejump_p, condjump_p, returnjump_p, eh_returnjump_p,
asm goto) and doesn't have any label recorded in JUMP_LABEL (because
it doesn't really jump to any LABEL), so some passes like dwarf2cfi
can get confused about it and ICE.
The following patch just prevents that, by only doing the r12-4475
changes if it is not a jump.
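The shape of the guard, sketched (pat is an illustrative name; not the exact
code):

/* Forcing SET_SRC into the constant pool would turn a computed jump
   into something computed_jump_p no longer recognizes, so only do it
   for non-jumps.  */
if (GET_CODE (pat) == SET && SET_DEST (pat) != pc_rtx)
  {
    /* ... try force_const_mem on SET_SRC (pat) ... */
  }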
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR target/120250
* combine.cc (recog_for_combine): Don't try to put SET_SRC
into a constant pool if SET_DEST is pc_rtx.
* gcc.c-torture/compile/pr120250.c: New test.
The following fixes a regression from the time we split load groups
along SLP boundaries. When we face a permuted load from an access
that is contiguous across loop iterations we emit code that loads
the whole group and then emit required permutations. The permutations
might not need all those loads, and if we split the group we would
not have emitted them. Fortunately when analyzing a permutation
we compute both the number of required permutes and the number of
loads that will survive the following DCE. So make sure to use that
when costing. This allows the previously added testcase for PR123190
to undergo epilogue vectorization also at -O2, and also when using
non-generic tuning, such as tuning for Zen4, which ups the cost for
XMM loads.
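A hypothetical example of the kind of loop affected (not one of the
testcases): the group { a[2*i], a[2*i+1] } is loaded whole and permuted, but
only the even elements survive the following DCE, so only those loads should
be costed.

void
f (double *__restrict x, double *__restrict a, int n)
{
  for (int i = 0; i < n; i++)
    x[i] = a[2 * i];
}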
PR tree-optimization/123190
* tree-vectorizer.h (vect_load_store_data): Add n_loads member.
* tree-vect-stmts.cc (get_load_store_type): Record the
number of required loads for permuted loads.
(vectorizable_load): Make use of this when costing loads
for VMAT_CONTIGUOUS[_REVERSE].
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: Do not
require -mtune=generic.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-2.c: Add
variant with -O2 instead of -O3, inner loop not unrolled.
The following adjusts the condition where we reject vectorization
because the scalar loop runs only for a single iteration (or two,
in case we need to peel for gaps). This is over-eager for the case
of VF == 1, where instead the cost model should decide whether
vectorization is worthwhile or not. I'm playing it conservative here
and exclude the case of two iterations, as I do not have benchmark
evidence.
This helps fix a regression observed with improved SLP handling;
not exactly for the options used in the PR, but for the more common
-O3 -march=x86-64-v3 this speeds up 433.milc by 6%.
PR tree-optimization/123190
* tree-vect-loop.cc (vect_analyze_loop_costing): Allow
vectorizing loops with a single scalar iteration iff the
vectorization factor is 1.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123190-1.c: New testcase.
* gcc.dg/vect/slp-28.c: Avoid epilogue vectorization for
simplicity.
On Tue, Nov 04, 2025 at 12:59:03PM +0530, Kishan Parmar wrote:
> PR rtl-optimization/93738
> * simplify-rtx.cc (simplify_binary_operation_1): Canonicalize
> SUBREG(LSHIFTRT) into LSHIFTRT(SUBREG) when valid.
This change regressed the following testcase on aarch64-linux.
From what I can see, the PR93738 change has been written with non-paradoxical
SUBREGs in mind but on this testcase on aarch64 we have a paradoxical SUBREG,
in particular simplify_binary_operation_1 is called with AND, SImode,
(subreg:SI (lshiftrt:HI (subreg:HI (reg/v:SI 108 [ x ]) 0)
(const_int 8 [0x8])) 0)
and op1 (const_int 32767 [0x7fff]) and simplifies that since the PR93738
optimization was added into
(and:SI (lshiftrt:SI (reg/v:SI 108 [ x ])
(const_int 8 [0x8]))
(const_int 32767 [0x7fff]))
This looks wrong to me.
Consider that (reg/v:SI 108 [ x ]) could have value 0x12345678U.
The original expression takes lowpart 16-bits from that, i.e. 0x5678U,
shifts that right logically by 8 bits, so 0x56U, makes a paradoxical SUBREG
from that, i.e. 0x????0056U and masks that with 0x7fff, i.e. result is 0x56U.
The new expression shifts 0x12345678U logically right by 8 bits, i.e. 0x123456U and
masks it by 0x7fff, result 0x3456U.
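In C terms, a rough analogue of the two computations (hypothetical; assume
x = 0x12345678U):

unsigned
orig (unsigned x)
{
  return ((unsigned short) x >> 8) & 0x7fffu;  /* 0x56 */
}

unsigned
simplified (unsigned x)
{
  return (x >> 8) & 0x7fffu;                   /* 0x3456 */
}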
Thus, I think we need to limit this to non-paradoxical SUBREGs.
On the rlwimi-2.c testcase I see on powerpc64le-linux no differences in
emitted assembly without/with the patch.
2026-01-14 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/123544
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
<case AND>: Don't canonicalize (subreg (lshiftrt (x cnt)) low) into
(lshiftrt (subreg x low) cnt) if the SUBREG is paradoxical.
* gcc.dg/pr123544.c: New test.
It meets the Cpp17RandomAccessIterator requirements, but does not satisfy
the random_access_iterator concept.
libstdc++-v3/ChangeLog:
* testsuite/util/testsuite_iterators.h: Modify comment.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
The patch enables time-profile-based function reordering with AutoFDO
(-fauto-profile -fprofile-reorder-functions), by mapping timestamps obtained
from perf into node->tp_first_run.
The rationale for doing this is:
(1) GCC already implements time-profile function reordering with PGO; the
patch enables it with AutoFDO.
(2) While time-profile ordering is primarily meant for optimizing startup time,
we've also observed good effects on code locality for large internal workloads.
(3) Possibly useful for function reordering when accurate profile annotation is
hard with AutoFDO -- for example, if branch samples are missing (due to the
absence of an LBR-like structure).
On the AutoFDO tools side, a corresponding patch extends gcov to emit a 64-bit
perf timestamp recording the first execution of a function, which loosely
corresponds to PGO's time_profile counter.
The timestamp is stored adjacent to the head field in the toplevel function info.
On the GCC side, this patch makes the following changes:
(1) Changes to auto-profile pass:
The patch adds a new field timestamp to function_instance,
and populates it in read_function_instance.
It maintains a new timestamp_info_map from timestamp -> <name, tp_first_run>,
which maps the timestamps, sorted in ascending order, to (1..N), so the
lowest timestamp is mapped to 1 and so on. The rationale for this is that
timestamps are 64-bit integers, and we don't need the full 64-bit range
for ordering by tp_first_run.
During annotation, the timestamp associated with a function_instance is looked
up in timestamp_info_map, and the corresponding mapped value is assigned
to node->tp_first_run.
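A minimal sketch of that dense-rank mapping (names are illustrative; the
actual map also carries the function name):

#include <cstdint>
#include <map>
#include <set>

/* Map each distinct 64-bit perf timestamp to a small ordinal 1..N,
   lowest timestamp first, suitable for node->tp_first_run.  */
static std::map<uint64_t, int>
rank_timestamps (const std::set<uint64_t> &timestamps)
{
  std::map<uint64_t, int> rank;
  int next = 1;
  for (uint64_t ts : timestamps)  /* iterates in ascending order */
    rank[ts] = next++;
  return rank;
}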
Dhruv's sourcefile tracking patch already handles LTO-privatized symbols.
The patch adds a workaround for mismatched/empty filenames, which should go
away once the issues with the AutoFDO tools' DWARF parsing are resolved.
(2) Param to disable profile-driven opts.
The patch adds the param auto-profile-reorder-only, which enables only
time-profile reordering with AutoFDO:
(a) Useful as a debugging aid to isolate a regression to either function reordering or profile-driven opts.
(b) As a stopgap measure to avoid regressions with AutoFDO profile-driven opts.
(c) Possibly useful for architectures which do not support branch sampling.
gcc/ChangeLog:
* auto-profile.cc (string_table::filenames): New method.
(function_instance::timestamp_): New member.
(function_instance::timestamp): New accessor for timestamp_ member.
(function_instance::set_timestamp): New function.
(function_instance::prop_timestamp): Likewise.
(function_instance::prop_timestamp_1): Likewise.
(function_instance::function_instance): Initialize timestamp_ to 0.
(function_instance::read_function_instance): Adjust prototype by
replacing head_count with toplevel param with default value true, and
stream in head_count and timestamp values from gcov file.
(autofdo::timestamp_info_map): New std::map.
(autofdo_source_profile::get_function_instance_by_decl): New argument
filename with default value NULL.
(autofdo_source_profile::read): Populate timestamp_info_map and
propagate timestamp to inlined instances from toplevel function.
(afdo_annotate_cfg): Assign node->tp_first_run based on
timestamp_info_map and bail out of annotation if
param_auto_profile_reorder_only is enabled.
* params.opt: New param auto-profile-reorder-only.
Signed-off-by: Prathamesh Kulkarni <prathameshk@nvidia.com>
For the E-core front end, aligning tight loops provides little benefit.
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_ALIGN_TIGHT_LOOPS):
Disable tight loop alignment for m_CORE_ATOM.
During ML discussions of a match.pd pattern that was introducing a new
instance of 'warn_strict_overflow', Richard mentioned that this use
should be discouraged [1]. After pointing out that this usage was
documented in tree.h he then explained that we should remove the note
from the header [2]. Here's the reasoning:
"Ah, we should remove that note. -Wstrict-overflow proved useless IMO,
it's way too noisy as it diagnoses when the compiler relies on overflow
not happening, not diagnosing when it possibly happens. That's not a
very useful diagnostic to have - it does not point to a possible problem
in the code (we could as well diagnose _all_ signed arithmetic
operations for the same argument that we might eventually rely on
overflow not happening)."
Aside from removing the tree.h note, we're also removing the 2 references
in match.pd. match.pd patterns tend to be copied around to serve as a
base for new patterns (like I did in [3] adding a
'fold_overflow_warning'), and if we want to discourage the use avoiding
its spread is a good start.
Note that there are a lot of references left, most of them in
gcc/fold-const.cc. Some references are used in nested helpers inside
the file, entangled with code that does other things. Removing all
references from the project is out of scope for this quick patch.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2026-January/705320.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2026-January/705482.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2026-January/704992.html
gcc/ChangeLog:
* match.pd: Remove 'fold_overflow_warning' references.
* tree.h (TYPE_OVERFLOW_UNDEFINED): Remove the note saying
that we must use warn_strict_overflow for every optimization
based on TYPE_OVERFLOW_UNDEFINED.
gcc/testsuite/ChangeLog:
* gcc.dg/Wstrict-overflow-1.c: Removed because we no longer
issue a 'fold_overflow_warning' with the
`(le (minus (@0 INTEGER_CST@1)) INTEGER_CST@2)` pattern.
Signed-off-by: Daniel Barboza <daniel.barboza@oss.qualcomm.com>
As mentioned in https://gcc.gnu.org/pipermail/gcc-patches/2026-January/705657.html,
there were some redundant checks in this pattern. In the first if,
the check for pointer and OFFSET_TYPE is redundant as there is a check for
INTEGRAL_TYPE_P beforehand. For the second one, the check for INTEGRAL_TYPE_P
on the innermost type is not needed as there is a types_match right afterwards.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`(T1)(a bit_op (T2)b)`): Remove redundant
type checks.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
While working on another issue I found that currently modules do not
work with coroutines at all. This patch fixes a number of issues in
both the coroutines logic and modules logic to ensure that they play
well together. To summarize:
- The coroutine proxy objects did not have a DECL_CONTEXT set (required
for modules to merge declarations).
- The coroutine transformation functions are always non-inline, even
for an inline ramp function, which means that modules need an override
to ensure the definitions are available where needed.
- Coroutine transformation functions were not marked DECL_COROUTINE_P,
despite accessors implying that they were.
- In an importing TU we had lost the connection between the ramp
functions and the transform functions, as they were kept in a pair
of global maps.
- Modules streaming couldn't discriminate between the actor or destroy
functions when merging.
- Modules streaming wasn't setting the cfun->coroutine_component flag,
needed to activate the middle-end coroutine lowering pass.
This patch also separates the coroutine_info_table initialization from
the ensure_coro_initialized function. If the first time we see a
coroutine is from a module import, we need to register the
transformation functions immediately, but calling ensure_coro_initialized would
look up e.g. std::coroutine_traits, which may only be visible from this
module that we're currently reading, causing a recursive load.
Separating the concerns allows this to work correctly.
gcc/cp/ChangeLog:
* coroutines.cc (create_coroutine_info_table): New function.
(get_or_insert_coroutine_info): Mark static.
(ensure_coro_initialized): Likewise; use
create_coroutine_info_table.
(coro_promise_type_found_p): Set DECL_CONTEXT for proxies.
(coro_set_ramp_function): New function.
(coro_set_transform_functions): New function.
(coro_build_actor_or_destroy_function): Use
coro_set_ramp_function, mark as DECL_COROUTINE_P.
* cp-tree.h (coro_set_transform_functions): Declare.
(coro_set_ramp_function): Declare.
* module.cc (struct merge_key): New field coro_disc.
(dumper::impl::nested_name): Distinguish coroutine transform
functions.
(get_coroutine_discriminator): New function.
(trees_out::key_mergeable): Stream coroutine discriminator.
(check_mergeable_decl): Adjust comment, check for matching
coroutine discriminator.
(trees_in::key_mergeable): Read coroutine discriminator.
(has_definition): Override for coroutine transform functions.
(trees_out::write_function_def): Stream linked ramp, actor, and
destroy functions for coroutines.
(trees_in::read_function_def): Read them.
(module_state::read_cluster): Set cfun->coroutine_component.
gcc/testsuite/ChangeLog:
* g++.dg/modules/coro-1_a.C: New test.
* g++.dg/modules/coro-1_b.C: New test.
Reviewed-by: Iain Sandoe <iain@sandoe.co.uk>
Reviewed-by: Jason Merrill <jason@redhat.com>
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
The set of lang_decl flags that we were streaming had gotten out of sync
with the current list; update them.
One notable change is that anticipated_p, which had previously been
deliberately skipped, is now only used for DECL_OMP_PRIVATIZED_MEMBER,
and so should probably be streamed as well.
gcc/cp/ChangeLog:
* module.cc (trees_out::lang_decl_bools): Update list of flags.
(trees_in::lang_decl_bools): Likewise.
Reviewed-by: Jason Merrill <jason@redhat.com>
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Implement (X >> C) NE/EQ 0 -> X LT/GE 0 in match.pd instead of fold-const.cc.
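Presumably C must equal the precision of X minus 1, so that only sign-bit
copies remain; a C illustration for 32-bit int:

int
sign_set (int x)
{
  return (x >> 31) != 0;  /* folds to: x < 0 */
}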
Bootstrapped and tested on x86_64 and aarch64.
PR tree-optimization/123109
gcc/ChangeLog:
* fold-const.cc (fold_binary_loc): Remove (X >> C) NE/EQ 0 -> X LT/GE 0
folding.
* match.pd (`(X >> C) NE/EQ 0 -> X LT/GE 0`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/vrp99.c: Update test.
* gcc.dg/pr123109.c: New test.
Signed-off-by: Pengxuan Zheng <pengxuan.zheng@oss.qualcomm.com>
This is a small reassociation of `a*bool & b` into `(a & b) * bool`, checking
whether `a & b` simplifies, since it could be the case that `b` is `~a` or `a`
or something else that simplifies when ANDed with `a`.
Note this fixes a regression for aarch64, where the cost of a multiply vs.
`&-` changed in GCC 14, so some cases can no longer be optimized at the RTL
level.
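An illustrative (hypothetical) source shape:

int
f (int a, bool c)
{
  int b = ~a;
  /* (a * c) & b reassociates to (a & b) * c; with b == ~a this is
     (a & ~a) * c, i.e. 0.  */
  return (a * c) & b;
}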
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119402
gcc/ChangeLog:
* match.pd (`(a*zero_one_valued_p) & b`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bitops-14.c: New test.
* gcc.dg/tree-ssa/bitops-15.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The problem here is that after some heuristics changes the check
loop is now unrolled, so we eliminate the array. This means
the check for not having -2147483648 no longer works, as
we don't handle SLP in this case.
So the best option is to force the check loop not to unroll
(nor vectorize), as this is just testing that we SLP the normal
signbit places rather than dealing with the checking loop.
Pushed as obvious after testing the testcase on aarch64-linux-gnu.
PR testsuite/122522
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/signbitv2sf.c (main): Disable
unrolling and vectorizer for the checking loop.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
When matching function arguments in composite_type_internal and one
type comes from a transparent union, it is possible to end up with
mismatched atomic and non-atomic types because this case is not handled
correctly.
The type matching logic is rewritten in a cleaner way to use helper
functions and to not walk the argument lists three times. With this
change, a checking assertion can be added to test for matching qualifiers
for pointers. (In general, this assumption is still violated for
function return types.)
PR c/123309
gcc/c/ChangeLog:
* c-typeck.cc (transparent_union_replacement): New function.
(composite_type_internal): Rewrite logic.
(type_lists_compatible_p): Remove dead code for NULL arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123309.c: New test.
* gcc.dg/union-composite-type.c: New test.
This patch replaces uses of subscripts in heap algorithms, which were
introduced in r16-4100-gaaeca77a79a9a8, with dereferences of advanced iterators.
The Cpp17RandomAccessIterator requirements allow operator[] to return any
type that is convertible to reference; however, user-provided comparators are
only required to accept the result of dereferencing the iterator (i.e. reference
directly). This is visible when a comparator defines an operator() whose
template arguments can be deduced from reference (which will fail on a proxy)
or that accepts types convertible from reference (see the included tests).
For testing we introduce a new proxy_random_access_iterator_wrapper iterator
in testsuite_iterators.h that returns a proxy type from its subscript operator.
This is a separate type (instead of an additional template argument and
aliases), as it is used for tests that must work with C++98.
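The gist of the change, sketched (not the exact libstdc++ code): use
*(first + n) rather than first[n], so a user comparator is always invoked
with the iterator's reference type, never an operator[] proxy type.

template <typename It, typename Cmp>
bool
less_at (It first, long i, long j, Cmp comp)
{
  return comp (*(first + i), *(first + j));  // not comp (first[i], first[j])
}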
libstdc++-v3/ChangeLog:
* include/bits/stl_heap.h (std::__is_heap_until, std::__push_heap)
(std::__adjust_heap): Replace subscript with dereference of
advanced iterator.
* testsuite/util/testsuite_iterators.h (__gnu_test::subscript_proxy)
(__gnu_test::proxy_random_access_iterator_wrapper): Define.
* testsuite/25_algorithms/sort_heap/check_proxy_brackets.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
PR libfortran/123012
libgfortran/ChangeLog:
* io/list_read.c (read_character): Add new check after
get_string and provide better comments.
gcc/testsuite/ChangeLog:
* gfortran.dg/namelist_101.f90: New test.
After the current improvements to ifcvt, on some targets, for
cmp?a&b:a it is better to produce `(cmp?b:-1) & a` rather than
`(!cmp?a:0)|(a & b)`. So this extends noce_try_cond_zero_arith (renamed
to noce_try_cond_arith) to see if `cmp ? a : -1` is cheaper than
`!cmp?a:0`.
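An illustrative (hypothetical) source shape:

int
f (int a, int b, int cmp)
{
  /* May now be if-converted to (cmp ? b : -1) & a rather than
     (!cmp ? a : 0) | (a & b), whichever the target costs cheaper.  */
  return cmp ? (a & b) : a;
}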
Bootstrapped and tested on x86_64-linux-gnu.
PR rtl-optimization/123312
gcc/ChangeLog:
* ifcvt.cc (noce_try_cond_zero_arith): Rename to ...
(noce_try_cond_arith): This. For AND, also try `cmp ? a : -1`
to see which one costs less.
(noce_process_if_block): Handle the rename.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The gimple optimization passes can create negative shift counts and pass them
into the simplification routines, as seen with the code in pr123530. If we
then call tree_to_uhwi on those values, we get a nice little ICE.
This guards the tree_to_uhwi calls on tree_fits_uhwi_p and resolves the ICE. I
just protected them all in this recently added pattern.
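The shape of the guard, sketched (cnt is an illustrative name; the real change
is in the match.pd pattern's condition):

/* Only extract a shift count once we know it fits; negative counts
   created upstream would otherwise ICE in tree_to_uhwi.  */
if (tree_fits_uhwi_p (cnt))
  {
    unsigned HOST_WIDE_INT c = tree_to_uhwi (cnt);
    /* ... use c ... */
  }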
Bootstrapped and regression tested on x86 and riscv. Also tested on the rest
of the embedded targets without any regressions.
Pushing to the trunk.
PR tree-optimization/123530
gcc/
* match.pd (reassociating xor to enable rotations): Verify constants
fit into a uhwi before trying to extract them as a uhwi.
gcc/testsuite/
* gcc.dg/torture/pr123530.c: New test.
The following fixes the fix from r16-6709-ga4716ece529dfd some
more by making sure the permute-to-one-operand folding sees vectors
with the same number of elements, but also by inserting a
VIEW_CONVERT_EXPR for the case where one is VLA and one is VLS (when
the VLA case is actually constant, like with -msve-vector-bits=128).
It also turns the assert in fold_vec_perm, which this pattern
eventually dispatches to, that output and input element numbers match
into a check (as the comment already indicates).
Testcases are in the target specific aarch64 testsuite already.
PR middle-end/123573
* fold-const.cc (fold_vec_perm): Actually check, not assert,
that input and output vector element numbers agree.
* match.pd (vec_perm @0 @1 @2): Make sure element numbers
are the same when folding to an input vector and wrap that
inside a VIEW_CONVERT_EXPR.
This issue got raised after r16-6671 in which I removed checks for
number-of-element equality. In the splat case with conversion:
vector(16) int w;
vector(8) long int v;
_13 = BIT_FIELD_REF <w_12(D), 32, 160>;
_2 = (long int) _13;
_3 = (long int) _13;
...
_9 = (long int) _13;
_1 = {_2, _3, _4, _5, _6, _7, _8, _9};
right now we do
_16 = VEC_PERM_EXPR <w_12(D), w_12(D), { 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5 }>;
_17 = VIEW_CONVERT_EXPR<vector(8) intD.6>(_16);
where the view convert is actually an optimized
_17 = BIT_FIELD_REF (_16, 512, 0);
512 is the size of the unconverted source but we should actually use the
converted source type. That's what this patch does.
PR tree-optimization/123525
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_vector_constructor): Use
converted source type for conversion bit field ref.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/pr123525.c: New test.
* g++.dg/vect/pr123525-2.cc: New test.
Currently we allow vector types in scalar conditional reductions by
accident (via the GNU vector extension). This patch prevents that.
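The added rejection, sketched (illustrative; lhs names the reduction
statement's result):

/* A GNU-extension vector type is not a scalar reduction; reject it.  */
if (VECTOR_TYPE_P (TREE_TYPE (lhs)))
  return false;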
PR tree-optimization/123301
gcc/ChangeLog:
* tree-if-conv.cc (convert_scalar_cond_reduction):
Disallow vector types.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr123301.c: New test.
The PR involves large mask vectors (e.g. V128BI) from which we take
the popcount. Currently a (popcount:DI (V128BI)) is assumed to have
at most 8 set bits as we assume the popcount operand also has DImode.
This patch uses the operand mode for unary operations and thus
calculates a proper nonzero-bits mask.
We could do the same estimate for ctz and clz but they use nonzero in a
non-poly way and I didn't want to change more than necessary. Therefore
the patch just returns -1 when we have a different operand mode for
ctz/clz.
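A sketch of the unary-case idea (illustrative, not the exact code):

/* Bound the POPCOUNT result by the width of the *operand's* mode,
   not the mode of the operation itself.  */
machine_mode op_mode = GET_MODE (XEXP (x, 0));
unsigned int opbits;
if (GET_MODE_BITSIZE (op_mode).is_constant (&opbits))
  /* e.g. a V128BI operand yields a popcount of at most 128.  */
  nonzero = (HOST_WIDE_INT_1U << (floor_log2 (opbits) + 1)) - 1;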
PR rtl-optimization/123501
PR rtl-optimization/123444
gcc/ChangeLog:
* rtlanal.cc (nonzero_bits1): Use operand mode instead of
operation mode.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/reduc/pr123501.c: New test.
The change/rationale that commit 1cf9fda493
"amdgcn: Adjust failure mode for gfx908 USM" applied to a number of test cases
likewise applies to 'libgomp.fortran/map-alloc-comp-9-usm.f90'.
libgomp/
* testsuite/libgomp.fortran/map-alloc-comp-9-usm.f90: Require
working Unified Shared Memory to run the test.
'libgomp.oacc-c-c++-common/vred2d-128.c' had gotten '-Wno-deprecated-openmp'
applied as part of commit 382edf047e
"openmp: Bump Version from 4.5 to 5.2 (2/4)", which conceptually doesn't make
sense, as 'libgomp.oacc-c-c++-common/vred2d-128.c' isn't an OpenMP test case.
In commit 9c119b0fdd
"openmp: Limit - reduction -Wdeprecated-openmp diagnostics to OpenMP, testsuite fixes [PR123098]",
the erroneous diagnostic got disabled, so we don't need
'-Wno-deprecated-openmp' anymore.
PR testsuite/123098
libgomp/
* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Remove
'-Wno-deprecated-openmp'.
On Mon, Jan 12, 2026 at 12:13:35PM +0100, Florian Weimer wrote:
> One way to work around the libtool problem would be to stick the
> as-needed into an existing .so linker script, or create a new one under
> a different name (say libatomic_optional.so) that has AS_NEEDED in it,
> and link with -latomic_optional. Then libtool would not have to be
> taught about --push-state/--pop-state etc.
That seems to work.
So far bootstrapped (c,c++,fortran,lto only) and make install tested
on x86_64-linux, and tested on a small program without need for libatomic and
struct S { char a[25]; };
_Atomic struct S s;
int main () { struct S t = s; s = t; }
which does need it at -O0.
Before this patch I got
for i in `find x86_64-pc-linux-gnu/ -name lib\*.so.\*.\*`; do ldd -u $i 2>&1 | grep -q libatomic.so.1 && echo $i; done
x86_64-pc-linux-gnu/libsanitizer/ubsan/.libs/libubsan.so.1.0.0
x86_64-pc-linux-gnu/libsanitizer/asan/.libs/libasan.so.8.0.0
x86_64-pc-linux-gnu/libsanitizer/hwasan/.libs/libhwasan.so.0.0.0
x86_64-pc-linux-gnu/libsanitizer/lsan/.libs/liblsan.so.0.0.0
x86_64-pc-linux-gnu/libsanitizer/tsan/.libs/libtsan.so.2.0.0
x86_64-pc-linux-gnu/32/libsanitizer/ubsan/.libs/libubsan.so.1.0.0
x86_64-pc-linux-gnu/32/libsanitizer/asan/.libs/libasan.so.8.0.0
x86_64-pc-linux-gnu/32/libstdc++-v3/src/.libs/libstdc++.so.6.0.35
x86_64-pc-linux-gnu/libgcobol/.libs/libgcobol.so.2.0.0
x86_64-pc-linux-gnu/libstdc++-v3/src/.libs/libstdc++.so.6.0.35
With this patch it prints nothing.
2026-01-13 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/123396
gcc/
* configure.ac (gcc_cv_ld_use_as_needed_ldscript): New test.
(USE_LD_AS_NEEDED_LDSCRIPT): New AC_DEFINE.
* gcc.cc (LINK_LIBATOMIC_SPEC): Use "-latomic_asneeded" instead
of LD_AS_NEEDED_OPTION " -latomic " LD_NO_AS_NEEDED_OPTION
if USE_LD_AS_NEEDED_LDSCRIPT is defined.
(init_gcc_specs): Use "-lgcc_s_asneeded" instead of
LD_AS_NEEDED_OPTION " -lgcc_s " LD_NO_AS_NEEDED_OPTION
if USE_LD_AS_NEEDED_LDSCRIPT is defined.
* config.in: Regenerate.
* configure: Regenerate.
libatomic/
* acinclude.m4 (LIBAT_BUILD_ASNEEDED_SOLINK): New AM_CONDITIONAL.
* libatomic_asneeded.so: New file.
* libatomic_asneeded.a: New file.
* Makefile.am (toolexeclib_DATA): Set if LIBAT_BUILD_ASNEEDED_SOLINK.
(all-local): Install those files into gcc subdir.
* Makefile.in: Regenerate.
* configure: Regenerate.
libgcc/
* config/t-slibgcc (SHLIB_ASNEEDED_SOLINK,
SHLIB_MAKE_ASNEEDED_SOLINK, SHLIB_INSTALL_ASNEEDED_SOLINK): New
vars.
(SHLIB_LINK): Include $(SHLIB_MAKE_ASNEEDED_SOLINK).
(SHLIB_INSTALL): Include $(SHLIB_INSTALL_ASNEEDED_SOLINK).
2026-01-14 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/112460
* array.cc (resolve_array_list): Stash the first PDT element
and check its type specification parameters against those of
subsequent elements.
* expr.cc (get_parm_list_from_expr): New function to extract the
type spec lists from expressions to be compared.
(gfc_check_type_spec_parms): New function to compare type spec
lists between two expressions. Emit an error if any constant
values are different.
(gfc_check_assign): Check that the PDT type specification parms
are the same on lhs and rhs.
* gfortran.h : Add prototype for gfc_check_type_spec_parms.
* trans-expr.cc (copyable_array_p): PDT arrays are not copyable
gcc/testsuite
PR fortran/112460
* gfortran.dg/pdt_81.f03: New test.
With previous changes I overlooked one use of vectype.
PR tree-optimization/123539
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Use the compute vectype to pun down to smaller or element
size for by-element reductions.
store_merging_19.c is almost the same as store_merging_18.c, except
it has assume-align in it to allow it to work on strict-align targets.
Somehow, when looking into failures in the testresults, I noticed 18
but not 19.
Pushed as obvious.
gcc/testsuite/ChangeLog:
* gcc.dg/store_merging_19.c: xfail.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
gcc.dg/torture/bitint-18.c triggers an ICE in push_partial_def when
compiling for RISC-V with -O2. The issue occurs because
build_nonstandard_integer_type cannot handle bit widths larger than
MAX_FIXED_MODE_SIZE.
For BITINT_TYPE with maxsizei > MAX_FIXED_MODE_SIZE, use build_bitint_type
instead of build_nonstandard_integer_type, similar to what tree-sra.cc does.
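A sketch of the type selection described above (illustrative; ref_type stands
for the type being examined):

tree type;
if (TREE_CODE (ref_type) == BITINT_TYPE && maxsizei > MAX_FIXED_MODE_SIZE)
  type = build_bitint_type (maxsizei, 1);
else
  type = build_nonstandard_integer_type (maxsizei, 1);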
gcc/ChangeLog:
* tree-ssa-sccvn.cc (vn_walk_cb_data::push_partial_def): Use
build_bitint_type for BITINT_TYPE when maxsizei exceeds
MAX_FIXED_MODE_SIZE.
This patch implements _BitInt support for RISC-V target by defining the
type layout and ABI requirements. The limb mode selection is based on
the bit width, using appropriate integer modes from QImode to TImode.
The implementation also adds the necessary libgcc version symbols for
_BitInt runtime support functions.
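An illustrative sketch of the new hook, modeled on other targets'
implementations (not the exact RISC-V code; in particular the abi_limb_mode
handling is simplified here, see the v2 note below):

static bool
riscv_bitint_type_info (int n, struct bitint_info *info)
{
  if (n <= 8)
    info->limb_mode = QImode;
  else if (n <= 16)
    info->limb_mode = HImode;
  else if (n <= 32)
    info->limb_mode = SImode;
  else
    info->limb_mode = DImode;  /* capped at XLEN for large N */
  info->abi_limb_mode = info->limb_mode;
  info->big_endian = false;
  info->extended = false;
  return true;
}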
Changes in v3:
- Require sync_char_short effective target for bitint-64.c, bitint-82.c
and bitint-84.c tests since they use atomic operations.
- Add -fno-section-anchors to bitint-32-on-rv64.c and adjust expected
assembly output patterns.
Changes in v2:
- limb_mode use up to XLEN when N > XLEN, which is different setting from
the abi_limb_mode.
- Adding missing floatbitinthf in libgcc.
gcc/ChangeLog:
PR target/117581
* config/riscv/riscv.cc (riscv_bitint_type_info): New function.
(TARGET_C_BITINT_TYPE_INFO): Define.
gcc/testsuite/ChangeLog:
PR target/117581
* gcc.dg/torture/bitint-64.c: Add sync_char_short effective target
requirement.
* gcc.dg/torture/bitint-82.c: Likewise.
* gcc.dg/torture/bitint-84.c: Likewise.
* gcc.target/riscv/bitint-32-on-rv64.c: New test.
* gcc.target/riscv/bitint-alignments.c: New test.
* gcc.target/riscv/bitint-args.c: New test.
* gcc.target/riscv/bitint-sizes.c: New test.
libgcc/ChangeLog:
PR target/117581
* config/riscv/libgcc-riscv.ver: New file.
* config/riscv/t-elf (SHLIB_MAPFILES): Add libgcc-riscv.ver.
* config/riscv/t-softfp32 (softfp_extras): Add floatbitinttf and
fixtfbitint.
This adds the simplification of:
```
<unnamed-signed:3> _1;
_2 = (signed char) _1;
_3 = _2 ^ -47;
_4 = (<unnamed-signed:3>) _3;
```
to:
```
<unnamed-signed:3> _n;
_4 = _1 ^ -47;
```
This also fixes PR 122843 by optimizing out the xor such that we get:
```
_1 = b.a;
_21 = (<unnamed-signed:3>) t_23(D);
// t_23 in the original testcase was 200 so this is reduced to 0
_5 = _1 ^ _21;
# .MEM_24 = VDEF <.MEM_13>
b.a = _5;
```
And then there is no cast catch this pattern:
`(bit_xor (convert1? (bit_xor:c @0 @1)) (convert2? (bit_xor:c @0 @2)))`
As we get:
```
_21 = (<unnamed-signed:3>) t_23(D);
_5 = _1 ^ _21;
_22 = (<unnamed-signed:3>) t_23(D);
_7 = _5 ^ _22;
_25 = (<unnamed-signed:3>) t_23(D);
_8 = _7 ^ _25;
_26 = (<unnamed-signed:3>) t_23(D);
_9 = _7 ^ _26;
```
After unrolling, FRE will then optimize away all of those xors.
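A hypothetical source shape that introduces such casts is xor on a signed
bit-field (which has the <unnamed-signed:3> type above):

struct B { int a : 3; };

void
f (struct B *b, int t)
{
  for (int i = 0; i < 4; i++)
    b->a ^= t;  /* after unrolling, FRE removes the redundant xors */
}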
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/122845
PR tree-optimization/122843
gcc/ChangeLog:
* match.pd (`(T1)(a bit_op (T2)b)`): Also
simplify if T1 is the same type as b and T2 is wider
type than T1.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/bitops-12.c: New test.
* gcc.dg/tree-ssa/bitops-13.c: New test.
* gcc.dg/store_merging_18.c: xfail store merging.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
PR fortran/91960
gcc/fortran/ChangeLog:
* resolve.cc (resolve_fl_parameter): Check the righthand symbol
is a constant expression.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr69962.f90: Adjust testcase to ignore new error message.
* gfortran.dg/pr91960_1.f90: New test.
* gfortran.dg/pr91960_2.f90: New test.
Since we now defer noexcept parsing for templated friends, a couple of
routines related to deferred parsing need to be updated to cope with friend
template specializations -- their TI_TEMPLATE is a TREE_LIST rather than
a TEMPLATE_DECL, and they don't introduce new template parameters.
PR c++/123189
gcc/cp/ChangeLog:
* name-lookup.cc (binding_to_template_parms_of_scope_p):
Gracefully handle TEMPLATE_INFO whose TI_TEMPLATE is a TREE_LIST.
* pt.cc (maybe_begin_member_template_processing): For a friend
template specialization consider its class context instead.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/noexcept92.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Implicit constexpr makes the use of x disappear, avoiding the exposure and
thus the diagnostic.
gcc/testsuite/ChangeLog:
* g++.dg/modules/internal-17_b.C: Add -fno-implicit-constexpr.