Commit Graph

223579 Commits

Author SHA1 Message Date
David Malcolm
21fe45f111 diagnostics/libcpp: convert enum location_aspect to enum class
Modernization; no functional change intended.
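
As a rough illustration of what this kind of conversion involves, here is a minimal sketch. The enumerator names and the helper `aspect_as_int` are assumptions made for the example, not taken from the patch.

```cpp
// Before: an unscoped enum.  Enumerators are visible in the enclosing
// scope and convert implicitly to int.
enum location_aspect_before
{
  LOCATION_ASPECT_CARET,
  LOCATION_ASPECT_START,
  LOCATION_ASPECT_FINISH
};

// After: a scoped enum.  Enumerators must be qualified and only convert
// to int through an explicit cast.  (Enumerator names are assumed.)
enum class location_aspect
{
  caret,
  start,
  finish
};

// Hypothetical helper showing the explicit cast that callers now need.
constexpr int aspect_as_int (location_aspect aspect)
{
  return static_cast<int> (aspect);
}

constexpr int old_style = LOCATION_ASPECT_START;                  // implicit
constexpr int new_style = aspect_as_int (location_aspect::start); // explicit
```

Callers that previously wrote a bare enumerator would now spell it as `location_aspect::start`, which is why several files need a mechanical update.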

gcc/ChangeLog:
	* diagnostics/paths-output.cc: Update for conversion of
	location_aspect to enum class.
	* diagnostics/source-printing.cc: Likewise.
	* input.cc: Likewise.
	* input.h: Likewise.

libcpp/ChangeLog:
	* include/line-map.h (enum location_aspect): Convert to...
	(enum class location_aspect): ...this.
	* line-map.cc: Update for conversion of location_aspect to enum
	class.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-09-18 16:07:04 -04:00
David Malcolm
60d7488008 diagnostics: use diagnostic.h in fewer places
No functional change intended.

gcc/ChangeLog:
	* diagnostics/buffering.cc: Drop include of "diagnostic.h".
	* diagnostics/buffering.h: Likewise.
	* diagnostics/context.h (diagnostics::metadata): Add forward decl.
	* diagnostics/html-sink.cc: Drop include of "diagnostic.h".
	* diagnostics/lazy-paths.cc: Likewise.
	* diagnostics/macro-unwinding.cc: Likewise.
	* diagnostics/macro-unwinding.h (diagnostics::diagnostic_info): Add
	forward decl.
	* diagnostics/option-classifier.h: Include
	"diagnostics/option-id.h" and "diagnostics/kinds.h".
	(diagnostics::diagnostic_info): Add forward decl.
	* diagnostics/output-spec.cc: Drop include of "diagnostic.h".
	* diagnostics/paths-output.cc: Likewise.
	* diagnostics/paths.cc: Likewise.
	* diagnostics/sarif-sink.cc: Likewise.
	* diagnostics/selftest-context.cc: Likewise.
	* diagnostics/selftest-paths.cc: Likewise.
	* diagnostics/source-printing-options.h: Include
	"rich-location.h".
	* diagnostics/text-sink.cc: Drop include of "diagnostic.h".

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-09-18 16:06:39 -04:00
Qing Zhao
f613fdc692 Fix SRA issue with -ftrivial-auto-var-init= [PR121894]
In tree-sra.cc, for the following stmt (initialization in source code):
s = {};

for the above lhs "s", the field "grp_assignment_write" of the created
struct access is 1;

however, for the following stmt (compiler added initialization):
s = .DEFERRED_INIT (size, init_type, &"s"[0]);

for the above lhs "s", the field "grp_assignment_write" of the created
struct access is 0;

Since the field "grp_assignment_write" of the struct access for the
corresponding LHS "s" is not set correctly when the RHS is .DEFERRED_INIT,
SRA phase didn't do a correct transformation for call to .DEFERRED_INIT.

To fix this issue, we should set the field "grp_assignment_write" correctly
for .DEFERRED_INIT.

	PR tree-optimization/121894

gcc/ChangeLog:

	* tree-sra.cc (scan_function): Set grp_assignment_write to 1 when
	specially handling the call to .DEFERRED_INIT.

gcc/testsuite/ChangeLog:

	* g++.dg/opt/auto-init-sra-pr121894.C: New test.
2025-09-18 18:31:33 +00:00
Paul Thomas
c52c745c98 Fortran: Implement PDT constructors with syntax variants [PR114815]
2025-09-18  Paul Thomas  <pault@gcc.gnu.org>

gcc/fortran
	PR fortran/114815
	* decl.cc (gfc_get_pdt_instance): Copy the contents of 'tb' and
	not the pointer.
	* primary.cc (gfc_match_rvalue): If there is only one actual
	argument list, use it for the type spec parameter values. If
	this fails try the default type specification values and use
	the actual arguments for the component values.
	* resolve.cc (build_init_assign): Don't initialize implicit PDT
	function results.

gcc/testsuite/
	PR fortran/114815
	* gfortran.dg/pdt_3.f03: Add missing deallocation of 'matrix'.
	* gfortran.dg/pdt_17.f03: Change dg-error text.
	* gfortran.dg/pdt_47.f03: New test.
2025-09-18 19:00:08 +01:00
Zhongyao Chen
642504b41c RISC-V: Correct lmul estimation
The vectorizer's compute_estimated_lmul function could previously
return a bad value when the estimated lmul was larger than RVV_M8.
This is corrected to return RVV_M8, preventing a register spill.
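
The clamping described above can be sketched as follows. This is only an illustration: the enum `rvv_lmul_sketch` and the function `clamp_estimated_lmul` are hypothetical stand-ins, and the real compute_estimated_lmul derives its estimate from vector modes rather than taking an integer.

```cpp
// Hypothetical stand-in for the RVV LMUL values; the numeric values are
// assumed for the sake of the example.
enum rvv_lmul_sketch
{
  RVV_M1 = 1,
  RVV_M2 = 2,
  RVV_M4 = 4,
  RVV_M8 = 8
};

// Clamp an estimated LMUL so that it never exceeds the largest legal
// register group, RVV_M8.
constexpr int clamp_estimated_lmul (int estimated)
{
  return estimated > RVV_M8 ? static_cast<int> (RVV_M8) : estimated;
}
```

With this shape, an estimate of 16 is reported as 8, while estimates of 8 or less pass through unchanged.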

The patch includes a new regression test for PR target/121910, based
on the x264 mc_chroma function. The test uses scan-tree-dump to
confirm that the compiler chooses the expected vector mode (RVVM1QI)
at -O3, verifying the fix.

	PR target/121910
gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (compute_estimated_lmul):
	Return RVV_M8 when estimated lmul is too large.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr121910.c: New file.
2025-09-18 09:32:20 -06:00
Jakub Jelinek
b49f1dad54 openmp: Fix up ICE in lower_omp_regimplify_operands_p [PR121977]
The following testcase ICEs in functions called from
lower_omp_regimplify_operands_p, because maybe_lookup_decl returns
NULL for this (on the outer taskloop context) when regimplifying the
taskloop pre body.  If it isn't found in current context, we should
look in outer ones.
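
A simplified sketch of the fallback idea, assuming a toy context type: `omp_ctx_sketch`, `lookup_here`, `lookup_in_outer` and `lookup_with_fallback` are hypothetical names, each context maps a single declaration, and 0 plays the role of NULL.

```cpp
// Toy stand-in for an OMP lowering context.
struct omp_ctx_sketch
{
  const omp_ctx_sketch *outer; // enclosing context, or nullptr
  int decl;                    // the one declaration mapped in this context
  int mapped;                  // its replacement in this context
};

// Look in CTX only; returns 0 when the declaration is not mapped here.
constexpr int lookup_here (int decl, const omp_ctx_sketch *ctx)
{
  return ctx->decl == decl ? ctx->mapped : 0;
}

// Walk CTX and then its enclosing contexts until a mapping is found.
constexpr int lookup_in_outer (int decl, const omp_ctx_sketch *ctx)
{
  return ctx == nullptr ? 0
	 : lookup_here (decl, ctx) != 0 ? lookup_here (decl, ctx)
	 : lookup_in_outer (decl, ctx->outer);
}

// Try the current context first and fall back to the outer ones.
constexpr int lookup_with_fallback (int decl, const omp_ctx_sketch *ctx)
{
  return lookup_here (decl, ctx) != 0
	 ? lookup_here (decl, ctx)
	 : lookup_in_outer (decl, ctx->outer);
}

constexpr omp_ctx_sketch outer_ctx = { nullptr, 7, 70 };
constexpr omp_ctx_sketch taskloop_ctx = { &outer_ctx, 3, 30 };
```

Here declaration 7 is unknown to `taskloop_ctx` itself, so a lookup restricted to that context yields 0, whereas the fallback finds the mapping in `outer_ctx`.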

2025-09-18  Jakub Jelinek  <jakub@redhat.com>

	PR c++/121977
	* omp-low.cc (lower_omp_regimplify_operands_p): If maybe_lookup_decl
	returns NULL, use maybe_lookup_decl_in_outer_ctx as fallback.

	* g++.dg/gomp/pr121977.C: New test.
2025-09-18 16:41:32 +02:00
Karl Meakin
1d0a5e9fcb AArch64: Add SME LUTv2 intrinsics
Add intrinsic functions for the SME LUTv2 architecture extension
(`svluti4_zt`, `svwrite_lane_zt` and `svwrite_zt`).

gcc/ChangeLog:

	* config/aarch64/aarch64-sme.md (@aarch64_sme_write_zt<SVE_FULL:mode>): New insn.
	(aarch64_sme_lut_zt): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.cc (parse_type): New type format "%T".
	(struct luti_lane_zt_base): New function shape.
	(SHAPE): Likewise.
	(struct write_zt_def): Likewise.
	(struct write_lane_zt_def): Likewise.
	* config/aarch64/aarch64-sve-builtins-shapes.h: New function shape.
	* config/aarch64/aarch64-sve-builtins-sme.cc (class svluti_zt_impl): New function expander.
	(class svwrite_zt_impl): Likewise.
	(class svwrite_lane_zt_impl): Likewise.
	(FUNCTION): Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.def (svwrite_zt): New function shape.
	(svwrite_lane_zt): Likewise.
	(svluti4_zt): Likewise.
	* config/aarch64/aarch64-sve-builtins-sme.h: New function base.
	* config/aarch64/aarch64-sve-builtins.h: Mention the arrays of function_group_info by name.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sme2/acle-asm/svluti4_zt_1.c: New test.
	* gcc.target/aarch64/sme2/acle-asm/svwrite_lane_zt_1.c: New test.
	* gcc.target/aarch64/sme2/acle-asm/svwrite_zt_1.c: New test.
	* gcc.target/aarch64/sve/acle/general-c/svluti4_zt_1.c: New test.
	* gcc.target/aarch64/sve/acle/general-c/svwrite_lane_zt_1.c: New test.
	* gcc.target/aarch64/sve/acle/general-c/svwrite_zt_1.c: New test.
2025-09-18 15:15:49 +01:00
Karl Meakin
45ddf55353 AArch64: Add SME LUTv2 architecture extension
Add the SME LUTv2 architecture extension. Users can enable the extension
by adding `+sme-lutv2` to `-march` or `-mcpu`, and test for its presence
with the `__ARM_FEATURE_SME_LUTv2` macro. The intrinsics will be added
in the next commit.

gcc/ChangeLog:

	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Conditionally define
	`__ARM_FEATURE_SME_LUTv2` macro.
	* config/aarch64/aarch64-option-extensions.def (AARCH64_OPT_EXTENSION("sme-lutv2")): New
	optional architecture extension.
	* config/aarch64/aarch64.h (TARGET_SME_LUTv2): New macro.
	* doc/invoke.texi: Document `+sme-lutv2` flag.
2025-09-18 15:15:45 +01:00
Pan Li
80e85c627a RISC-V: Add test case of unsigned scalar SAT_MUL form 5 for widen-mul
Form 5 of unsigned scalar SAT_MUL is already covered in middle-expand;
add test cases here to cover form 5.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat/sat_arith.h: Add test helper macros.
	* gcc.target/riscv/sat/sat_u_mul-6-u16-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-6-u32-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-6-u64-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-6-u8-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-run-6-u16-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-run-6-u32-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-run-6-u64-from-u128.c: New test.
	* gcc.target/riscv/sat/sat_u_mul-run-6-u8-from-u128.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-09-18 21:47:25 +08:00
Pan Li
f9c72bc02a Match: Add form 5 of unsigned SAT_MUL for widen-mul
This patch tries to match form 5 of the unsigned
SAT_MUL, shown below:

  #define DEF_SAT_U_MUL_FMT_5(NT, WT)             \
  NT __attribute__((noinline))                    \
  sat_u_mul_##NT##_from_##WT##_fmt_5 (NT a, NT b) \
  {                                               \
    WT x = (WT)a * (WT)b;                         \
    NT hi = x >> (sizeof(NT) * 8);                \
    NT lo = (NT)x;                                \
    return lo | -!!hi;                            \
  }

  where WT is uint128_t and NT is uint8_t, uint16_t, uint32_t or uint64_t.
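
To make the saturating behaviour of this form concrete, the macro body can be restated as a C++ template. This is a sketch only: the name `sat_u_mul_fmt_5` is chosen for the example, and the usage below picks wide types narrower than uint128_t purely so that it compiles everywhere.

```cpp
#include <cstdint>

// Same steps as the macro: widen, multiply, split into high and low
// halves, then force all bits on when the high half is non-zero.
template <typename NT, typename WT>
constexpr NT sat_u_mul_fmt_5 (NT a, NT b)
{
  WT x = static_cast<WT> (a) * static_cast<WT> (b);
  NT hi = static_cast<NT> (x >> (sizeof (NT) * 8));
  NT lo = static_cast<NT> (x);
  // -!!hi is -1 (all ones) on overflow and 0 otherwise.
  return static_cast<NT> (lo | -!!hi);
}
```

For example, with NT = uint32_t and WT = uint64_t, 3 * 4 yields 12, while 0xFFFFFFFF * 2 overflows the narrow type and yields the saturated value 0xFFFFFFFF.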

gcc/ChangeLog:

	* match.pd: Add pattern for SAT_MUL form 5.
	* tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children):
	Try match pattern for IOR.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-09-18 21:47:25 +08:00
Jan Hubicka
8c6b6adce4 Fix verification ICE after ipa-cp
I managed to reproduce the ICE.  The error is a verification error about
the callgraph edge count being different from the basic block count of the
corresponding call stmt.  This verification is only done for local profiles,
since IPA profiles are scaled during inlining, cloning and other
transformations.

Ipa-cp has logic special-casing self-recursive functions, and it adjusts the
probability of the recursion to avoid nonsensical IPA profiles.  Normally
this is not done for local profiles; however, if the IPA profile
is broken enough, the earlier logic will push clone profiles to be local,
which in turn causes the verifier error.

This patch simply disables the update.  Its main purpose is to keep the IPA
profile seemingly meaningful, and the update is never applied back to
gimple code.  An alternative would be to add machinery to adjust frequencies
of call edges, but I do not think it is worth the effort at this moment.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	* ipa-cp.cc (update_counts_for_self_gen_clones): Do not update
	call frequency for local profiles.
2025-09-18 14:16:30 +02:00
Richard Biener
d5e80bf757 tree-optimization/87615 - VN predication is expensive
The following restricts the number of locations we register a predicate
as valid which avoids the expensive linear search for cases like

   if (a)
     A;
   if (a)
     B;
   if (a)
     C;
   ...

where we register a != 0 as true for locations A, B, C ... in an
unlimited way.  The patch simply chooses 8 as the limit.  The underlying
issue for this case is the data structure which does not allow for
easy "forgetting" or more optimal searching when locations become
no longer relevant (the whole point of the location list is to
represent where predicates are relevant).

The patch also splits the search/copy loop into two to avoid copying
stuff that we'll not need when finding an existing better entry or,
new with this patch, when we find that we would run over the limit.
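
The search-then-insert split with a hard cap can be sketched as below. This is a simplification under stated assumptions: `predicate_locations`, `register_location` and `count_after_registering` are hypothetical, locations are plain integers, and "an existing better entry" is reduced to plain equality rather than a dominance check.

```cpp
constexpr int max_locations = 8; // the limit chosen by the patch

// Toy stand-in for the list of locations where a predicate is valid.
struct predicate_locations
{
  int blocks[max_locations] = {};
  int count = 0;
};

// Returns true if BLOCK is (now) covered by the list.
constexpr bool register_location (predicate_locations &p, int block)
{
  // First loop: search only, nothing is copied or modified.
  for (int i = 0; i < p.count; ++i)
    if (p.blocks[i] == block)
      return true;

  // Give up before doing any work once the limit is reached.
  if (p.count >= max_locations)
    return false;

  // Second step: only now modify the list.
  p.blocks[p.count++] = block;
  return true;
}

// Register N distinct locations and report how many were kept.
constexpr int count_after_registering (int n)
{
  predicate_locations p;
  for (int b = 0; b < n; ++b)
    register_location (p, b);
  return p.count;
}
```

Registering 20 distinct locations leaves 8 entries in the list, so later linear searches stay bounded.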

	PR tree-optimization/87615
	* tree-ssa-sccvn.cc (vn_nary_op_insert_into): When inserting
	a new predicate or location into an existing predicate list
	make sure to not exceed 8 locations.  Avoid copying things
	when we later eventually throw them away.
	(vn_nary_op_insert_pieces_predicated): Avoid expensive check
	when not checking.
	(dominated_by_p_w_unex): Apply the limit on a single successor's
	predecessor count consistently.
2025-09-18 13:09:33 +02:00
Tobias Burnus
97c1d2fa97 OpenMP: Unshare expr in context-selector condition [PR121922]
As the testcase shows, a missing unshare_expr meant that the condition
was evaluated only once instead of every time a 'declare variant'
was resolved.

	PR middle-end/121922

gcc/ChangeLog:

	* omp-general.cc (omp_dynamic_cond): Use 'unshare_expr' for
	the user condition.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/declare-variant-1.c: New test.

Co-authored-by: Sandra Loosemore <sloosemore@baylibre.com>
2025-09-18 11:07:50 +02:00
Richard Biener
c30f58c3f7 tree-optimization/121720 - missed PRE hoisting
The following re-implements the fix for PR84830 where the original
fix causes missed optimizations.  The issue with PR84830 is that
we end up growing ANTIC_IN value set during iteration which happens
because we conditionally prune values based on ANTIC_OUT - TMP_GEN
expressions.  But when ANTIC_OUT was computed including the
MAX set on one edge we fail to take into account the implicitly
represented MAX expression set.  The following rectifies this by
not pruning the value set in bitmap_set_subtract_expressions in
such case.  This avoids the pruning from the ANTIC_IN value
set when MAX is involved and thus later growing, removing the
need to explicitly prune it with the last iteration set.

	PR tree-optimization/121720
	* tree-ssa-pre.cc (bitmap_set_subtract_expressions): Add
	flag to tell whether we should copy instead of prune the
	value set.
	(compute_antic_aux): Remove intersection of ANTIC_IN with
	the old solution.  When subtracting TMP_GEN from
	ANTIC_OUT do not prune the value set when MAX was involved
	in the ANTIC_OUT computation.

	* gcc.dg/tree-ssa/ssa-pre-36.c: New testcase.
2025-09-18 09:44:15 +02:00
Jakub Jelinek
c1e1691b95 libstdc++: Implement C++23 P2590R2 - Explicit lifetime management [PR106658]
As I can't think of how the middle-end would treat
__builtin_start_lifetime_as other than as a black box, and it would probably
need to be implemented as such inline asm in RTL, this patch
just implements it using inline asm in the library.
If nothing else, it can serve as a fallback before we and/or clang
get some builtin for it.

Right now the inline asms pretend to (potentially) read from and write to the
whole memory region and make optimizers forget where the return value points to.
If the optimizers don't know where it points to, I think that should be
good enough, but I'm a little bit afraid of possible future optimizations
trying to optimize
  q->c = 1;
  q->d = 2;
  auto p = std::start_lifetime_as<S>(q);
  if (p == reinterpret_cast<decltype (p)>(q))
    return p->a + p->b;
in such a way that, because of the guarding condition or perhaps an assertion,
we could simply use the q pointer in MEM_REFs with S type and be surprised by TBAA.
Though if it is a must-alias case, then we should be fine as well.
Though I guess that would be the same case with a builtin.

2025-09-18  Jakub Jelinek  <jakub@redhat.com>

	PR c++/106658
	* include/bits/version.def: Implement C++23 P2590R2 - Explicit
	lifetime management.
	(start_lifetime_as): New.
	* include/bits/version.h: Regenerate.
	* include/std/memory (std::start_lifetime_as,
	std::start_lifetime_as_array): New function templates.
	* src/c++23/std.cc.in (std::start_lifetime_as,
	std::start_lifetime_as_array): Export.
	* testsuite/std/memory/start_lifetime_as/start_lifetime_as.cc: New test.
2025-09-18 07:44:54 +02:00
hongtao.liu
dd713d0f3f Remove SPR/GNR/DMR from avx512_{move,store}_by_pieces tune.
Align move_max with prefer_vector_width for SPR/GNR/DMR, similarly to the
commit below.

commit 6ea25c0419
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Aug 15 12:54:07 2024 +0800

    Align ix86_{move_max,store_max} with vectorizer.

    When none of mprefer-vector-width, avx256_optimal/avx128_optimal,
    avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC will
    set ix86_{move_max,store_max} as max available vector length except
    for AVX part.

                  if (TARGET_AVX512F_P (opts->x_ix86_isa_flags)
                      && TARGET_EVEX512_P (opts->x_ix86_isa_flags2))
                    opts->x_ix86_move_max = PVW_AVX512;
                  else
                    opts->x_ix86_move_max = PVW_AVX128;

    So for -mavx2, vectorizer will choose 256-bit for vectorization, but
    128-bit is used for struct copy, there could be a potential STLF issue
    due to this "misalign".

gcc/ChangeLog:

	* config/i386/x86-tune.def (X86_TUNE_AVX512_MOVE_BY_PIECES):
	Remove SPR/GNR/DMR.
	(X86_TUNE_AVX512_STORE_BY_PIECES): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pieces-memcpy-18.c: Use -mtune=znver5
	instead of -mtune=sapphirerapids.
	* gcc.target/i386/pieces-memcpy-21.c: Ditto.
	* gcc.target/i386/pieces-memset-46.c: Ditto.
	* gcc.target/i386/pieces-memset-49.c: Ditto.
2025-09-17 18:35:52 -07:00
GCC Administrator
9bd24f83a1 Daily bump.
2025-09-18 00:20:40 +00:00
David Malcolm
41f071a64f c++: improve nesting in print_z_candidate [PR121966]
Comment #2 of PR c++/121966 notes that the "inherited here" messages
should be nested *within* the note they describe.

Implemented by this patch, which also nests other notes emitted for
rejection_reason within the first note of print_z_candidate.

gcc/cp/ChangeLog:
	PR c++/121966
	* call.cc (print_z_candidate): Consolidate instances of
	auto_diagnostic_nesting_level into one, above the "inherited here"
	message so that any such message is nested within the note,
	and any messages emitted due to the switch on rejection_reason are
	similarly nested within the note.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-09-17 16:39:32 -04:00
David Malcolm
c0b21d1f45 c++: fix count of z candidates for non-viable candidates, nesting [PR121966]
In r15-6116-gd3dd24acd74605 I updated print_z_candidates to show the
number of candidates, and a number for each candidate.

PR c++/121966 notes that the printed count is sometimes higher than
what's actually printed: I missed the case where candidates in the
list aren't printed due to not being viable.

Fixed thusly.

gcc/cp/ChangeLog:
	PR c++/121966
	* call.cc (print_z_candidates): Copy the filtering logic on viable
	candidates from the printing loop to the counting loop, so that
	num_candidates matches the number of iterations of the latter
	loop.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-09-17 16:39:31 -04:00
David Malcolm
76fd69ef3d testsuite: add 'std-' prefix to c++ analyzer test cases
gcc/testsuite/ChangeLog:
	* g++.dg/analyzer/unique_ptr-1.C: Rename to...
	* g++.dg/analyzer/std-unique_ptr-1.C: ...this.
	* g++.dg/analyzer/unique_ptr-2.C: Rename to...
	* g++.dg/analyzer/std-unique_ptr-2.C: ...this.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-09-17 16:39:31 -04:00
David Malcolm
ddabda614f sarif-replay: fix uninitialized m_debug_physical_locations
In r16-2766-g7969e4859ed007 I added a new field to replay_opts
but forgot to initialize it in set_defaults.

Fixed thusly.

Spotted thanks to valgrind.

gcc/ChangeLog:
	* sarif-replay.cc (set_defaults): Initialize
	m_debug_physical_locations.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-09-17 16:39:31 -04:00
Andrew Pinski
2664206495 uninclude: Add lib/gcc/<anything>/include as a possible include dir
While running uninclude on PR99912's preprocessed source, uninclude
didn't uninclude some of the x86_64 target headers. This was because
`lib/gcc/<anything>/include` was not noticed as a possible system
include dir. It supported `gcc-lib/<anything>/include` though.

contrib/ChangeLog:

	* uninclude: Add `lib/gcc/<anything>/include`.
2025-09-17 13:15:26 -07:00
Andrew Pinski
a7a9f0de4f forwprop: Fix up "nop" copies after recent changes [PR121962]
After r16-3887-g597b50abb0d2fc, the check to see if the copy is
a nop copy becomes ineffective: the code goes into an infinite
loop as the copy keeps on being propagated over and over again.

That is if we have:
```
  struct s1 *b = &a.t;
  a.t = *b;
  p = *b;
```

This goes into an infinite loop propagating over and over again the
`MEM[&a]`.
To solve this a new function is needed for the comparison that is
similar to new_src_based_on_copy.

	PR tree-optimization/121962

gcc/ChangeLog:

	* tree-ssa-forwprop.cc (same_for_assignment): New function.
	(optimize_agr_copyprop_1): Use same_for_assignment to check for
	nop copies.
	(optimize_agr_copyprop): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr121962-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-09-17 11:36:42 -07:00
Andrew Pinski
4b83df548f forwprop: Add a quick out for new_src_based_on_copy when both are decls
If both operands that are being compared are decls, operand_equal_p will already
handle that case so an early out can be done here.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-ssa-forwprop.cc (new_src_based_on_copy): An early out
	if both are decls.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-09-17 11:36:05 -07:00
Andrew Pinski
09e1ba1cc0 forwprop: Handle memcpy for arguments with respect to copies
This moves the code used in optimize_agr_copyprop_1 (r16-3887-g597b50abb0d)
to handle this same case into its own new function and uses it inside
optimize_agr_copyprop_arg. This allows removing more copies that show up only
in arguments.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-ssa-forwprop.cc (optimize_agr_copyprop_1): Split out
	the case where `operand_equal_p (dest, src2)` is false into ...
	(new_src_based_on_copy): This. New function.
	(optimize_agr_copyprop_arg): Use new_src_based_on_copy
	instead of operand_equal_p to find the new src.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/copy-prop-aggregate-arg-2.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-09-17 11:35:39 -07:00
Patrick Palka
3268c47c08 libstdc++/ranges: Fix more wrong value type init from reference type [PR111861]
As in r16-3912-g412a1f78b53709, this fixes some other spots where we
wrongly use a deduced type and non-direct-initialization when trying
to initialize a value type from an iterator's reference type.

	PR libstdc++/111861

libstdc++-v3/ChangeLog:

	* include/bits/ranges_algo.h (ranges::unique_copy): When
	initializing a value type object from *iter, use
	direct-initialization and don't use a deduced type.
	(ranges::push_heap): Use direct-initialization when initializing
	a value type object from ranges::iter_move.
	(ranges::max): As in ranges::unique_copy.
	* include/bits/ranges_util.h (ranges::min): Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2025-09-17 14:14:37 -04:00
Thomas Koenig
9a68895fee Implement -fexternal-blas64 option.
Libraries like Intel MKL use 64-bit integers in their API, but gfortran
up to now only provides external BLAS for matmul with 32-bit
integers.  This straightforward patch provides a new option -fexternal-blas64
to remedy that situation.

gcc/fortran/ChangeLog:

	* frontend-passes.cc (optimize_namespace): Handle
	flag_external_blas64.
	(call_external_blas): If flag_external_blas is set, use
	gfc_integer_4_kind as the argument kind, gfc_integer_8_kind otherwise.
	* gfortran.h (gfc_integer_8_kind): Define.
	* invoke.texi: Document -fexternal-blas64.
	* lang.opt: Add -fexternal-blas64.
	* lang.opt.urls: Regenerated.
	* options.cc (gfc_post_options): -fexternal-blas is incompatible
	with -fexternal-blas64.

gcc/testsuite/ChangeLog:

	* gfortran.dg/matmul_blas_3.f90: New test.
2025-09-17 18:50:22 +02:00
Shreya Munnangi
cda451531c [PR tree-optimization/58727] Don't over-simplify constants
Here's Shreya's next patch.

In pr58727 we have a case where the tree/gimple optimizers have decided to
"simplify" constants involved in logical ops by turning off as many bits as
they can in the hope that the simplified constant will be easier/smaller to
encode.  That "simplified" constant gets passed down into the RTL optimizers
where it can ultimately cause a missed optimization.

Concretely let's assume we have insns 6, 7, 9 as shown in the combine dump
below:

> Trying 6, 7 -> 9:
>     6: r139:SI=r141:SI&0xfffffffffffffffd
>       REG_DEAD r141:SI
>     7: r140:SI=r139:SI&0xffffffffffbfffff
>       REG_DEAD r139:SI
>     9: r137:SI=r140:SI|0x2
>       REG_DEAD r140:SI

We can obviously see that insn 6 is redundant as the bit we turn off would be
turned on by insn 9.  But combine ultimately tries to generate:

> (set (reg:SI 137 [ _3 ])
>     (ior:SI (and:SI (reg:SI 141 [ a ])
>             (const_int -4194307 [0xffffffffffbffffd]))
>         (const_int 2 [0x2])))

That does actually match a pattern on RISC-V, but it's a pattern that generates
two bit-clear insns (or a bit-clear followed by andi and a pattern we'll be
removing someday).  But if instead we IOR 0x2 back into the simplified constant
we get:

> (set (reg:SI 137 [ _3 ])
>     (ior:SI (and:SI (reg:SI 141 [ a ])
>             (const_int -4194305 [0xffffffffffbfffff]))
>         (const_int 2 [0x2])))

That doesn't match, but when split by generic code in the combiner we get:

> Successfully matched this instruction:
> (set (reg:SI 140)
>     (and:SI (reg:SI 141 [ a ])
>         (const_int -4194305 [0xffffffffffbfffff])))
> Successfully matched this instruction:
> (set (reg:SI 137 [ _3 ])
>     (ior:SI (reg:SI 140)
>         (const_int 2 [0x2])))

Which is bclr+bset/ori, i.e., we dropped one of the logical AND operations.
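
The arithmetic behind this can be checked with a small sketch. It assumes 32-bit constants for brevity, and the helper names `single_bit_clear_p`, `and_then_ior` and `adjust_c1` are hypothetical; the actual change lives in simplify_binary_operation_1 and operates on RTL.

```cpp
#include <cstdint>

// True when X has exactly one bit clear, i.e. ~X is a power of two.
constexpr bool single_bit_clear_p (std::uint32_t x)
{
  return ~x != 0 && (~x & (~x - 1)) == 0;
}

// The expression shape under discussion: (A & C1) | C2.
constexpr std::uint32_t and_then_ior (std::uint32_t a, std::uint32_t c1,
				      std::uint32_t c2)
{
  return (a & c1) | c2;
}

// If IORing C2 back into C1 leaves a constant with a single bit clear,
// use that constant instead; otherwise keep C1 as it is.
constexpr std::uint32_t adjust_c1 (std::uint32_t c1, std::uint32_t c2)
{
  return single_bit_clear_p (c1 | c2) ? (c1 | c2) : c1;
}
```

Bits set in C2 are forced on by the IOR regardless of what the AND did to them, so replacing C1 = 0xffbffffd with C1 | C2 = 0xffbfffff does not change the result, but the new mask clears only one bit.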

Bootstrapped and regression tested on x86 and riscv.  Regression tested on the
30 or so embedded targets as well without new failures.

I'll give this a couple days for folks to chime in before pushing on Shreya's
behalf.  This doesn't fix pr58727 for the other targets as they would need
target dependent hackery.

Jeff

	PR tree-optimization/58727
gcc/
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1):
	In (A & C1) | C2, if (C1|C2) results in a constant with a single bit
	clear, then adjust C1 appropriately.

gcc/testsuite/
	* gcc.target/riscv/pr58727.c: New test.
2025-09-17 07:43:48 -06:00
Richard Biener
f8cf09130b [gimplefe] fix SSA operand creation
When transitioning gcc.dg/torture/pr84830.c to a GIMPLE testcase to
feed the IL into PRE that caused the original issue (and verify it's
still there with the fix reverted), I noticed we put up SSA operands
before having fully parsed the function and thus with not all
variables having the final TREE_ADDRESSABLE state.  The following
fixes this, delaying update_stmt calls to when we create PHI nodes.
It also makes the pr84830.c not rely on the particular fake exit
edge source location by making the loop have an exit.

gcc/c/
	* gimple-parser.cc (c_parser_parse_gimple_body): Initialize
	SSA operands for each stmt.
	(c_parser_gimple_compound_statement): Append stmts without
	updating SSA operands.

gcc/testsuite/
	* gcc.dg/torture/pr84830.c: Turn into GIMPLE unit test for PRE.
2025-09-17 15:36:44 +02:00
Stefan Schulze Frielinghaus
282c1e682e s390: testsuite: Fix bitops-{1,2}.c and andc-splitter-2.c
After r16-2649-g0340177d54d tests fail for
gcc.target/s390/arch13/bitops-{1,2}.c since sign extends in conjunction
with (subreg (not a)) are now folded.  That is, of course, wanted.
Since the original tests were about 32-bit operations, circumvent the
sign extend by not returning a value but rather writing it to memory.

Similarly for andc-splitter-2.c: sign extends are folded there, too.  Since
the test is not about 32- or 64-bit operations, adjust only the scan
assembler directives.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/arch13/bitops-1.c: Do not return a 32bit value
	but write it to memory.
	* gcc.target/s390/arch13/bitops-2.c: Ditto.
	* gcc.target/s390/md/andc-splitter-2.c: Adjust scan assembler
	directive because sign extends are now folded.
2025-09-17 13:12:30 +02:00
Eric Botcazou
d81e24bbb9 Preserve TREE_THIS_NOTRAP during inlining in more cases
For parameters passed by reference, the Ada compiler sets TREE_THIS_NOTRAP
on their dereference to prevent tree_could_trap_p from returning true and
then causing a new basic block to be created for every access to them,
given that in Ada the -fnon-call-exceptions flag is enabled by default.

However, when the subprogram is inlined, this TREE_THIS_NOTRAP flag cannot
be blindly preserved because the call may pass the dereference of a pointer
as the argument: even if the compiler generates a check that the pointer is
not null just before, preserving TREE_THIS_NOTRAP could cause an access to
be hoisted before the check; therefore it gets cleared for parameters.

Now that's suboptimal if the argument is a full object because accessing it
through the dereference of the parameter cannot trap, which causes MEM_REFs
of the form MEM_REF [&DECL] to be considered as trapping in the case where
the nominal subtype of DECL is self-referential.

gcc/
	* tree-inline.cc (maybe_copy_this_notrap): New function.  Also copy
	the TREE_THIS_NOTRAP flag for parameters when the argument is a full
	object and the parameter's type is self-referential.
	(remap_gimple_op_r): Call maybe_copy_this_notrap.
	(copy_tree_body_r): Likewise.
2025-09-17 11:38:26 +02:00
Iain Sandoe
8a7346964d testsuite, objective-c: Fix duplicate test names in 'special'.
For macOS/Darwin, we run Objective-C tests for both the GNU and
NeXT runtimes (and these runs are usually differentiated by
identifying the runtime in the test name).

However, the 'special' sub-set of tests had a non-standard driver
since it needs two sources for each test (but did not report the
runtime in the test name and so shows duplicates).

We can now automate the multi-source case with dg-additional-sources
but need to do a little work to filter these additional sources
from the set (since they also have a .m suffix).

This addresses the FIXME in the original driver.

Resolving the duplicated names means amending the reported name
to include the runtime as a differentiator; as a result, test
comparisons will temporarily report new and missing tests for any
comparison that includes this change.

gcc/testsuite/ChangeLog:

	* objc.dg/special/load-category-1.m: Add second source.
	* objc.dg/special/load-category-2.m: Likewise.
	* objc.dg/special/load-category-3.m: Likewise.
	* objc.dg/special/unclaimed-category-1.m: Likewise.
	* objc.dg/special/special.exp: Rewrite to make use of generic
	testsuite facilities.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2025-09-17 09:03:58 +01:00
Torbjörn SVENSSON
1cf8cb45d8 testsuite: arm: Simplify fp16-aapcs tests
Reduce fp16-aapcs testcases to return value testing since parameter
passing is already tested in aapcs/vfp*.c

gcc/testsuite/ChangeLog:
	* gcc.target/arm/fp16-aapcs.c: New test.
	* gcc.target/arm/fp16-aapcs-1.c: Removed.
	* gcc.target/arm/fp16-aapcs-2.c: Likewise.
	* gcc.target/arm/fp16-aapcs-3.c: Likewise.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
2025-09-17 09:56:11 +02:00
Tobias Burnus
da5803c794 libgomp: Init hash table for 'indirect'-clause of 'declare target' on the host [PR114445, PR119857]
Especially with unified-shared memory and especially with C++'s virtual
functions, it is not uncommon to have on the device a function pointer
that points to the host function - but has an associated device.
If the pointed-to function is (explicitly or implicitly) 'declare target'
with the 'indirect' clause, it is added to the lookup table.

Before this commit, the conversion of the lookup table into a lookup
hash table happened every time a device kernel was launched on the first
team - albeit if already converted, the function immediately returned.

Ignoring the overhead, there was also a race: If multiple teams were
launched, it could happen that another team of the same target region
already tried to use the lookup table while it was still being created.
Likewise when launching a kernel with 'nowait' and directly afterward
another kernel, there could be a race of creating the table.

With this commit, the creation of the hash table has been moved to the
host-plugin's GOMP_OFFLOAD_load_image. The previous code stored a
pointer to the host/device pointer array, which makes it hard to
create the hash table on the host (data is needed for finding the
slot) while accessing it on the device (where the lookup has to work
as well). As the hash-table implementation (only) supports integral
values as payload (0 and 1 having special meaning), the solution was
to move to a uint128_t variable to store both the host and device
address.
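
A minimal sketch of storing two 64-bit addresses in one 128-bit integer follows. The layout (device address in the upper half, host address in the lower half) and the names `addr_pair_t`, `pack_addrs`, `host_addr` and `dev_addr` are assumptions for illustration, not the layout used by libgomp; `unsigned __int128` is a GCC/Clang extension.

```cpp
#include <cstdint>

// 128-bit integer type; a compiler extension, not standard C++.
using addr_pair_t = unsigned __int128;

// Pack both addresses into a single integral payload.
// Assumed layout: device address in bits 64..127, host address in bits 0..63.
constexpr addr_pair_t pack_addrs (std::uint64_t host, std::uint64_t dev)
{
  return (static_cast<addr_pair_t> (dev) << 64) | host;
}

// Recover the host address from the lower half.
constexpr std::uint64_t host_addr (addr_pair_t v)
{
  return static_cast<std::uint64_t> (v);
}

// Recover the device address from the upper half.
constexpr std::uint64_t dev_addr (addr_pair_t v)
{
  return static_cast<std::uint64_t> (v >> 64);
}
```

Because the payload is a plain integer, the same entry can be hashed and compared on the host and read back on the device, and any pair of real addresses stays distinct from the reserved payload values 0 and 1.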

As the host-side library is typically dynamically linked and the
device-side one statically, there is the problem of backward
compatibility. The current implementation permits both older
binaries and newer libgomp and newer binaries with older libgomp.
I could imagine us breaking the latter eventually, but for now
there is up and downward compatibility. (Obviously, the race is
only fixed if new + new is combined.)

Code-wise, on the device there exists GOMP_INDIRECT_ADDR_MAP, which was
updated to point to the host/device-address array. Now additionally
GOMP_INDIRECT_ADDR_HMAP exists, which contains the hash-table map.

If the latter exists, libgomp only updates it and the former remains
a NULL pointer; it is also untouched if there are no indirect functions.
Being NULL therefore avoids the call to the device-side build_indirect_map.
The code also currently supports having no hash table and doing a linear
walk. I think that remained from testing; due to the backward-compat
feature, it can actually be turned off on either side.

libgomp/ChangeLog:

	PR libgomp/119857
	PR libgomp/114445
	* config/accel/target-indirect.c: Change to use uint128_t instead
	of a struct as data structure and add GOMP_INDIRECT_ADDR_HMAP as
	host-accessible variable.
	(struct indirect_map_t): Remove.
	(USE_HASHTAB_LOOKUP, INDIRECT_DEV_ADDR, INDIRECT_HOST_ADDR,
	SET_INDIRECT_HOST_ADDR, SET_INDIRECT_ADDRS): Define.
	(htab_free): Use __builtin_unreachable.
	(htab_hash, htab_eq, GOMP_target_map_indirect_ptr,
	build_indirect_map): Update for new representation and new
	pointer-to-hash variable.
	* config/gcn/team.c (gomp_gcn_enter_kernel): Only call
	build_indirect_map when GOMP_INDIRECT_ADDR_MAP.
	* config/nvptx/team.c (gomp_nvptx_main): Likewise.
	* libgomp-plugin.h (GOMP_INDIRECT_ADDR_HMAP): Define.
	* plugin/plugin-gcn.c: Conditionally include
	build-target-indirect-htab.h.
	(USE_HASHTAB_LOOKUP_FOR_INDIRECT): Define.
	(create_target_indirect_map): New prototype.
	(GOMP_OFFLOAD_load_image): Update to create the device's
	indirect-function hash table on the host.
	* plugin/plugin-nvptx.c: Conditionally include
	build-target-indirect-htab.h.
	(USE_HASHTAB_LOOKUP_FOR_INDIRECT): Define.
	(create_target_indirect_map): New prototype.
	(GOMP_OFFLOAD_load_image): Update to create the device's
	indirect-function hash table on the host.
	* plugin/build-target-indirect-htab.h: New file.
2025-09-17 08:47:36 +02:00
Tobias Burnus
16d2b8881c libgomp: Add Fortran version of acc_copyout_finalize_async and acc_delete_finalize_async
OpenACC 2.5 added several functions for C and Fortran; while
acc_{copyout,delete}{,_finalize,_async} exist for both, for some
reason only the C version of acc_{copyout,delete}_finalize_async
was actually added, even though the documentation (.texi) and
the .map file also listed the auxiliary Fortran functions!

OpenACC 2.5 added the Fortran version with the following odd
interface:  'type, dimension(:[,:]...)'. In OpenACC 2.6, it
was then updated to the Fortran 2018 syntax:
'type(*), dimension(..)', which is also used in openacc.f90
internally.
This commit now also updates the documentation to the newer
syntax - plus fixes a function-name typo: acc_delete_async_finalize
should have the _async at the end not in the middle!

libgomp/ChangeLog:

	* libgomp.map (OACC_2.5): Move previously unimplemented
	acc_{copyout,delete}_finalize_async_{32,64,array}_h_ to ...
	(OACC_2.6.1): ... here.
	* libgomp.texi (acc_copyin, acc_present_or_copyin, acc_create,
	acc_present_or_create, acc_copyout, acc_update_device,
	acc_update_self, acc_is_present): Use 'type(*), dimension(..)'
	instead of 'type, dimension(:[,:]...)' for Fortran.
	(acc_delete): Likewise; change acc_delete_async_finalize to
	acc_delete_finalize_async.
	* openacc.f90 (openacc_internal): Add interfaces for
	acc_{copyout,delete}_finalize_async_{{32,64,array}_h,_l}.
	(openacc): Add generic interfaces for
	acc_copyout_finalize_async and acc_delete_finalize_async.
	(acc_{copyout,delete}_finalize_async_{32,64,array}_h): New.
	* openacc_lib.h: Add generic interfaces for
	acc_copyout_finalize_async and acc_delete_finalize_async.
	* testsuite/libgomp.oacc-fortran/pr92970-1.f90: New test.
2025-09-17 08:43:58 +02:00
Pan Li
f666b14cf1 RISC-V: Add test for vec_duplicate + vwmulu.vv signed combine with GR2VR cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vwmulu.vv
combine to vwmulu.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
	for vwmulu.vx.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Add test helper
	macros.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test
	data for vwmulu.vx run test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_vwmulu-run-1-u64.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-09-17 11:42:20 +08:00
Pan Li
f3d6d41abf RISC-V: Add test for vec_duplicate + vwsubu.vv signed combine with GR2VR cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vwsubu.vv
combine to vwsubu.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
	for vwsubu.vx.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test
	data for run test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_vwsubu-run-1-u64.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-09-17 11:42:20 +08:00
Pan Li
b653093572 RISC-V: Add test for vec_duplicate + vwaddu.vv signed combine with GR2VR cost 0, 1 and 15
Add asm dump check and run test for vec_duplicate + vwaddu.vv
combine to vwaddu.vx, with GR2VR costs of 0, 2 and 15.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
	for vwaddu.vx.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_vwaddu-run-1-u64.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_vx_run.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-09-17 11:42:19 +08:00
Pan Li
638320686c RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost
This patch combines vec_duplicate + vwaddu.vv into vwaddu.vx, as in the
example below.  The related pattern depends on the cost of vec_duplicate
from GR2VR: late-combine takes action if the GR2VR cost is zero, and
rejects the combination if the GR2VR cost is greater than zero.

Assume we have example code like below, with a GR2VR cost of 0.

Before this patch:
  11       beq a3,zero,.L8
  12       vsetvli a5,zero,e32,m1,ta,ma
  13       vmv.v.x v2,a2
  ...
  16   .L3:
  17       vsetvli a5,a3,e32,m1,ta,ma
  ...
  22       vwaddu.vv v1,v2,v3
  ...
  25       bne a3,zero,.L3

After this patch:
  11       beq a3,zero,.L8
  ...
  14    .L3:
  15       vsetvli a5,a3,e32,m1,ta,ma
  ...
  20       vwaddu.vx v1,a2,v3
  ...
  23       bne a3,zero,.L3

The pattern in this patch only works for DImode, i.e. the pattern below.
v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
  + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));

Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose
this extend op after expand.

For uint16_t => uint32_t we have:
(set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))

For uint32_t => uint64_t we have:
(set (reg:DI 148 [ _6 ])
     (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))

We can see there is no zero_extend for uint16_t to uint32_t, and we
cannot hit the pattern above.  So the combine will try below pattern
for uint16_t to uint32_t.

v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
  + (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))

But it cannot match the vwaddu semantics, thus we need separate handling
of vwaddu.vv for uint16_t to uint32_t, as well as for uint8_t to
uint16_t.
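
The source loop behind the assembly above is not included in this
message.  A hypothetical loop of the kind that would produce the
uint32_t to uint64_t case is sketched here; the function names are
made up for illustration and are not taken from the testsuite.

```cpp
#include <cstddef>
#include <cstdint>

// Element operation: both 32-bit operands are zero-extended to 64 bits
// before the add, which is what a widening unsigned add computes.
constexpr std::uint64_t widen_add_u32(std::uint32_t a, std::uint32_t b) {
    return static_cast<std::uint64_t>(a) + static_cast<std::uint64_t>(b);
}

// Hypothetical kernel: rs1 is loop-invariant, so a vectorizer would
// normally broadcast it once (vec_duplicate) and use a vector-vector
// widening add; the combine in this patch targets the vector-scalar form.
void widen_add_loop(std::uint64_t* vd, const std::uint32_t* vs2,
                    std::uint32_t rs1, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        vd[i] = widen_add_u32(vs2[i], rs1);
}
```

The zero extension of the scalar corresponds to the
(zero_extend:DI ...) inside the vec_dup of the DImode pattern.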

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*widen_first_<any_extend:su>_vx_<mode>):
	Add helper bridge pattern for vwaddu.vx combine.
	(*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>): Add
	new pattern to match vwaddu.vx combine.
	* config/riscv/iterators.md: Add code attr to get extend CODE.
	* config/riscv/vector-iterators.md: Add Dmode iterator for
	widen.

Signed-off-by: Pan Li <pan2.li@intel.com>
2025-09-17 11:42:19 +08:00
Haochen Jiang
39c7b08d4e i386/testsuite: Correct res_ref2 array size for avx512bw-vpmov{,us}wb-2.c
Both of the tests raise the following warning under 128 bit:

warning: writing 16 bytes into a region of size 8 [-Wstringop-overflow=]

when compiling, leading to a test failure. The warning is caused by the
incorrect array size for res_ref2, which results in the overflow.

Correct the size in this patch to fix the test failure.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bw-vpmovuswb-2.c: Correct res_ref2
	array size.
	* gcc.target/i386/avx512bw-vpmovwb-2.c: Ditto.
2025-09-17 11:03:34 +08:00
Haochen Jiang
0aef0232ee i386/testsuite: Fix scan tree dump in vect-epilogue-4.c
vect-epilogue-4.c uses masked 64-byte vectors in the epilogue part.
Similar to the r16-876 fix for vect-epilogue-5.c, we need to adjust the
scan tree dump.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/vect-epilogues-4.c: Fix for epilogue
	vect tree dump.
2025-09-17 11:03:27 +08:00
Patrick Palka
e690b97761 libstdc++: Explicitly pass -Wsystem-headers in tests that need it
When running libstdc++ tests using an installed gcc (as opposed to an
in-tree gcc), we naturally use system stdlib headers instead of the
in-tree headers.  But warnings from within system headers are suppressed
by default, so tests that check for such warnings spuriously fail in such
a setup.  This patch makes us compile such tests with -Wsystem-headers so
that they consistently pass.

libstdc++-v3/ChangeLog:

	* testsuite/20_util/bind/dangling_ref.cc: Compile with
	-Wsystem-headers.
	* testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Likewise.
	* testsuite/20_util/unique_ptr/lwg4148.cc: Likewise.
	* testsuite/29_atomics/atomic/operators/pointer_partial_void.cc:
	Likewise.
	* testsuite/30_threads/packaged_task/cons/dangling_ref.cc:
	Likewise.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2025-09-16 20:59:10 -04:00
GCC Administrator
02666ff894 Daily bump. 2025-09-17 00:20:31 +00:00
Andrew Pinski
df5088e9a2 c: Reject gimple and rtl functions as nested functions [PR121421]
These two don't make sense as nested functions, since they do not handle
the unnesting and/or lack support for the static chain.

So let's reject them.

Bootstrapped and tested on x86_64-linux-gnu.

	PR c/121421

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_declaration_or_fndef): Error out for gimple
	and rtl functions as nested functions.

gcc/testsuite/ChangeLog:

	* gcc.dg/gimplefe-error-16.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2025-09-16 15:28:48 -07:00
Jakub Jelinek
5eb86c29d2 docs: Adjust -Wimplicit-fallthrough= documentation for C23
I've noticed that in the -Wimplicit-fallthrough= documentation we talk
about [[fallthrough]]; for C++17 but don't mention that it is also the
standard way to suppress the warning in C23.
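
For reference, a small example of the attribute in use.  It is written
as C++ here; the [[fallthrough]]; statement is spelled the same way in
C23.  The function itself is made up for illustration.

```cpp
// Counts how many steps apply for a level; level 2 deliberately
// includes everything level 1 does.
constexpr int count_steps(int level) {
    int steps = 0;
    switch (level) {
    case 2:
        ++steps;
        [[fallthrough]];  // intentional: no -Wimplicit-fallthrough warning here
    case 1:
        ++steps;
        break;
    default:
        break;
    }
    return steps;
}
```

Without the attribute statement, the fall through from case 2 into
case 1 is the kind of code the warning is meant to flag.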

2025-09-16  Jakub Jelinek  <jakub@redhat.com>

	* doc/invoke.texi (Wimplicit-fallthrough=): Document that also C23
	provides a standard way to suppress the warning with [[fallthrough]];.
2025-09-16 19:26:48 +02:00
Jonathan Wakely
5d774ec80b libstdc++: Optimize determination of std::tuple_cat return type
The std::tuple_cat function has to determine a std::tuple return type
from zero or more tuple-like arguments. This uses the __make_tuple class
template to transform a tuple-like type into a std::tuple, and the
__combine_tuples class template to combine zero or more std::tuple types
into a single std::tuple type.

This change optimizes the __make_tuple class template to use an
_Index_tuple and pack expansion instead of recursive instantiation, and
optimizes __combine_tuples to use fewer levels of recursion.

For ranges::adjacent_view's __detail::__repeated_tuple helper we can
just use the __make_tuple class template directly, instead of doing
overload resolution on std::tuple_cat to get its return type.
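
A rough sketch of the index-sequence technique, for readers unfamiliar
with it.  This uses std::index_sequence and made-up names; libstdc++
uses its internal _Index_tuple and its own helper names, so treat this
as an illustration of the idea rather than the actual implementation.

```cpp
#include <array>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Primary template; only the index_sequence specialization is defined.
template <typename TupleLike, typename Indices>
struct to_tuple_impl;

// One pack expansion names every element type at once, so no recursive
// instantiation per element is needed.
template <typename TupleLike, std::size_t... Is>
struct to_tuple_impl<TupleLike, std::index_sequence<Is...>> {
    using type = std::tuple<std::tuple_element_t<Is, TupleLike>...>;
};

// Hypothetical alias: maps a tuple-like type to the matching std::tuple.
template <typename TupleLike>
using to_tuple_t = typename to_tuple_impl<
    TupleLike,
    std::make_index_sequence<std::tuple_size<TupleLike>::value>>::type;

static_assert(std::is_same<to_tuple_t<std::pair<int, char>>,
                           std::tuple<int, char>>::value,
              "pair<int, char> maps to tuple<int, char>");
```

The number of class template instantiations here does not grow with a
chain of partial results, which is the motivation for replacing the
recursive form.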

libstdc++-v3/ChangeLog:

	* include/std/ranges (__detail::__repeated_tuple): Use
	__make_tuple helper alias directly, instead of doing overload
	resolution on std::tuple_cat.
	* include/std/tuple (__make_tuple_impl): Remove.
	(__do_make_tuple): Replace recursion with _Index_tuple and pack
	expansion.
	(__make_tuple): Adjust to new __do_make_tuple definition.
	(__combine_tuples<tuple<T1s...>, tuple<T2s...>, Rem...>): Replace
	with a partial specialization for exactly two tuples and a
	partial specialization for three or more tuples.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
2025-09-16 18:25:02 +01:00
Jonathan Wakely
412a1f78b5 libstdc++: ranges::rotate should not use 'auto' with ranges::iter_move [PR121913]
The r16-3835-g7801236069a95c change to use ranges::iter_move should also
have used iter_value_t<_Iter> to ensure we get an object of the value
type, not a proxy reference.
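
To illustrate the difference between a deduced type and the value
type, here is a sketch using std::vector<bool>, whose iterators yield
a proxy.  It uses std::iterator_traits and plain dereference as
stand-ins so that it compiles before C++20; the fix itself uses
iter_value_t and ranges::iter_move.  All names are illustrative.

```cpp
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// What 'auto tmp = *it;' would deduce for a given iterator type.
template <typename Iter>
using deduced_t = std::decay_t<decltype(*std::declval<Iter&>())>;

// What naming the value type explicitly gives instead.
template <typename Iter>
using value_t = typename std::iterator_traits<Iter>::value_type;

using BoolIter = std::vector<bool>::iterator;

// Ordinary iterators: both spellings agree.
static_assert(std::is_same<deduced_t<int*>, value_t<int*>>::value, "");
// Proxy iterators: 'auto' keeps the proxy, the value type is a real bool.
static_assert(!std::is_same<deduced_t<BoolIter>, bool>::value, "");
static_assert(std::is_same<value_t<BoolIter>, bool>::value, "");

// Save the first element before overwriting it, as a rotate-like
// algorithm must; 'saved' is an independent object, not a view of *first.
template <typename Iter>
value_t<Iter> replace_first(Iter first, value_t<Iter> replacement) {
    value_t<Iter> saved = *first;
    *first = replacement;
    return saved;
}
```

With a deduced proxy type, the saved object would still refer to the
element that is about to be overwritten.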

libstdc++-v3/ChangeLog:

	PR libstdc++/121913
	* include/bits/ranges_algo.h (__rotate_fn::operator()): Use
	iter_value_t<_Iter> instead of deduced type.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
2025-09-16 17:38:56 +01:00
Jonathan Wakely
0c762f79a9 libstdc++: Fix missing change to views::pairwise from P2165R4 [PR121956]
ranges::adjacent_view::_Iterator::value_type should have been changed by
r14-8710-g65b4cba9d6a9ff to always produce std::tuple, even for the
N == 2 views::pairwise specialization.

libstdc++-v3/ChangeLog:

	PR libstdc++/121956
	* include/std/ranges (adjacent_view::_Iterator::value_type):
	Always define as std::tuple<T, N>, not std::pair<T, T>.
	* testsuite/std/ranges/adaptors/adjacent/1.cc: Check value type
	of views::pairwise.

Reviewed-by: Patrick Palka <ppalka@redhat.com>
2025-09-16 17:38:41 +01:00
Takayuki 'January June' Suwa
fe7cf719a9 xtensa: Simplify the definition of REGNO_OK_FOR_BASE_P() and avoid calling it directly
In recent gcc versions, REGNO_OK_FOR_BASE_P() is not called directly, but
rather via regno_ok_for_base_p(), which is a wrapper in gcc/addresses.h.
The wrapper obtains a hard register number from a pseudo via the
reg_renumber array, so REGNO_OK_FOR_BASE_P() does not need to take this
into consideration.

On the other hand, since there is only one use of REGNO_OK_FOR_BASE_P()
in the target-specific code, it would make more sense to simplify the
definition of REGNO_OK_FOR_BASE_P() and replace its call with that of
regno_ok_for_base_p().
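
The division of labour can be pictured with a toy model.  The register
numbers, the table and the function names below are invented for this
sketch and are not the Xtensa or GCC definitions.

```cpp
// Toy register file: hard registers 0..3 are general-purpose, 4..5 are
// special, and numbers from 6 upward are pseudos.
constexpr int kFirstPseudo = 6;

// Toy renumbering table: hard register assigned to each pseudo, or -1
// if none has been assigned.  Entries for hard registers are unused.
constexpr int kRenumber[] = {-1, -1, -1, -1, -1, -1, 2, 5, -1, 0};

// Target-level predicate: only has to understand hard register numbers.
constexpr bool target_ok_for_base(int hard_regno) {
    return hard_regno >= 0 && hard_regno < 4;
}

// Wrapper: translate an allocated pseudo to its hard register first,
// then ask the target predicate.  Callers must pass regno < 10 here.
constexpr bool wrapped_ok_for_base(int regno) {
    if (regno >= kFirstPseudo && kRenumber[regno] >= 0)
        regno = kRenumber[regno];
    return target_ok_for_base(regno);
}
```

Since the translation lives in the wrapper, the target predicate can
stay a simple range check on hard registers.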

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (#include):
	Add "addresses.h".
	* config/xtensa/xtensa.h (REGNO_OK_FOR_BASE_P):
	Simplify to just a call to GP_REG_P().
	(BASE_REG_P): Replace REGNO_OK_FOR_BASE_P() with the equivalent
	call to regno_ok_for_base_p().
2025-09-16 09:08:30 -07:00
Wilco Dijkstra
5b531aa5cc AArch64: Add isnan expander [PR 66462]
Add an expander for isnan using integer arithmetic.  Since isnan is
just a compare, enable it only with -fsignaling-nans to avoid
generating spurious exceptions.  This fixes part of PR66462.

int isnan1 (float x) { return __builtin_isnan (x); }

Before:
	fcmp	s0, s0
	cset	w0, vs
	ret

After:
	fmov	w1, s0
	mov	w0, -16777216
	cmp	w0, w1, lsl 1
	cset	w0, cc
	ret
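
The integer sequence above can be read as follows.  This is a sketch in
portable C++ under the assumption of a 32-bit IEEE 754 float; the
function names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Integer-only NaN test on the raw bits.  Shifting left by one drops the
// sign bit; 0xFF000000 is then "exponent all ones, mantissa zero"
// (infinity), so anything strictly above it has a non-zero mantissa.
constexpr bool bits_are_nan(std::uint32_t bits) {
    return static_cast<std::uint32_t>(bits << 1) > 0xFF000000u;
}

// Float wrapper: copy the bits out (like 'fmov w1, s0') and test them
// without executing any floating-point compare.
inline bool isnan_bits(float x) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits_are_nan(bits);
}
```

The constant 0xFF000000 is the -16777216 loaded by the mov, and the
unsigned greater-than corresponds to the 'cc' condition after the cmp.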

gcc:
	PR middle-end/66462
	* config/aarch64/aarch64.md (isnan<mode>2): Add new expander.

gcc/testsuite:
	PR middle-end/66462
	* gcc.target/aarch64/pr66462.c: Update test.
2025-09-16 12:31:13 +00:00