Commit Graph

193603 Commits

Author SHA1 Message Date
Srinath Parvathaneni
96213951b7 arm: Optimize arm-mlib.h header inclusion [pr108505].
I have committed a fix [1] into gcc trunk for a build
issue mentioned in pr108505 and latter received few upstream
comments proposing more robust fix for this issue.

In this patch I'm addressing those comments and sending this
as a followup patch.

gcc/ChangeLog:

2023-01-27  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	PR target/108505
	* config.gcc (tm_mlib_file): Define new variable.
release-12.2.mpacbti-rel1
2023-03-09 16:11:15 +01:00
Srinath Parvathaneni
f2b8a0057d arm: Fix inclusion of arm-mlib.h header more than once (pr108505).
The patch fixes the build issue for arm-none-eabi target configured with
--with-multilib-list=aprofile,rmprofile, in which case the header file
arm/arm-mlib.h is being included more than once and the toolchain build
is failing (PR108505).

gcc/ChangeLog:

2023-01-24  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	PR target/108505
	* config.gcc (tm_file): Move the variable out of loop.
2023-03-09 16:11:15 +01:00
Srinath Parvathaneni
f430585dbb arm: Documentation fix for -mbranch-protection option.
This patch fixes the documentation for -mbranch-protection command line option.

gcc/ChangeLog:

2023-01-23  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* doc/invoke.texi (-mbranch-protection): Update documentation.
2023-03-09 16:11:15 +01:00
Srinath Parvathaneni
28c545f81a arm: Add support for new frame unwinding instruction "0xb5".
This patch adds support for Arm frame unwinding instruction "0xb5" [1]. When
an exception is taken and "0xb5" instruction is encounter during runtime
stack-unwinding, we use effective vsp as modifier in pointer authentication.
On completion of stack unwinding if "0xb5" instruction is not encountered
then CFA will be used as modifier in pointer authentication.

[1] https://github.com/ARM-software/abi-aa/releases/download/2022Q3/ehabi32.pdf

libgcc/ChangeLog:

2022-11-09  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/pr-support.c (__gnu_unwind_execute): Decode opcode
	"0xb5".
2023-03-09 16:11:15 +01:00
Srinath Parvathaneni
eb26fa78a8 arm: Add support for dwarf debug directives and pseudo hard-register for PAC feature.
This patch teaches the DWARF support in gcc about RA_AUTH_CODE pseudo hard-register and also
updates the ".save", ".cfi_register", ".cfi_offset", ".cfi_restore" directives accordingly.
This patch also adds support to emit ".pacspval" directive when "pac ip, lr, sp" instruction
in generated in the assembly.

RA_AUTH_CODE register number is 107 and it's dwarf register number is 143.

Applying this patch on top of PACBTI series posted here
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599658.html and when compiling the following
test.c with "-march=armv8.1-m.main+mve+pacbti -mbranch-protection=pac-ret -mthumb -mfloat-abi=hard
fasynchronous-unwind-tables -g -O0 -S" command line options, the assembly output after this patch
looks like below:

$cat test.c

void fun1(int a);
void fun(int a,...)
{
  fun1(a);
}

int main()
{
  fun (10);
  return 0;
}

$ arm-none-eabi-gcc -march=armv8.1-m.main+mve+pacbti -mbranch-protection=pac-ret -mthumb -mfloat-abi=hard
-fasynchronous-unwind-tables -g -O0 -S test.s

Assembly output:
...
fun:
...
        .pacspval
        pac     ip, lr, sp
        .cfi_register 143, 12
        push    {r3, r7, ip, lr}
        .save {r3, r7, ra_auth_code, lr}
...
        .cfi_offset 143, -24
...
        .cfi_restore 143
...
        aut     ip, lr, sp
        bx      lr
...
main:
...
        .pacspval
        pac     ip, lr, sp
        .cfi_register 143, 12
        push    {r3, r7, ip, lr}
        .save {r3, r7, ra_auth_code, lr}
...
        .cfi_offset 143, -8
...
        .cfi_restore 143
...
        aut     ip, lr, sp
        bx      lr
...

gcc/ChangeLog:

2023-01-11  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/aout.h (ra_auth_code): Add entry in enum.
	* config/arm/arm.cc (emit_multi_reg_push): Add RA_AUTH_CODE register
	to dwarf frame expression.
	(arm_emit_multi_reg_pop): Restore RA_AUTH_CODE register.
	(arm_expand_prologue): Update frame related information and reg notes
	for pac/pacbit insn.
	(arm_regno_class): Check for pac pseudo reigster.
	(arm_dbx_register_number): Assign ra_auth_code register number in dwarf.
	(arm_init_machine_status): Set pacspval_needed to zero.
	(arm_debugger_regno): Check for PAC register.
	(arm_unwind_emit_sequence): Print .save directive with ra_auth_code
	register.
	(arm_unwind_emit_set): Add entry for IP_REGNUM in switch case.
	(arm_unwind_emit): Update REG_CFA_REGISTER case._
	* config/arm/arm.h (FIRST_PSEUDO_REGISTER): Modify.
	(DWARF_PAC_REGNUM): Define.
	(IS_PAC_REGNUM): Likewise.
	(enum reg_class): Add PAC_REG entry.
	(machine_function): Add pacbti_needed state to structure.
	* config/arm/arm.md (RA_AUTH_CODE): Define.

gcc/testsuite/ChangeLog:

2023-01-11  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* g++.target/arm/pac-1.C: New test.
	* gcc.target/arm/pac-15.c: Likewise.
2023-03-09 16:11:15 +01:00
Srinath Parvathaneni
014175926e arm: Add pacbti related multilib support for armv8.1-m.main.
This patch adds the support for pacbti multlilib linking by making
"-mbranch-protection=none" as default multilib option for arm-none-eabi
target.

Eg 1.

If the passed command line flags are (without mbranch-protection):
a) -march=armv8.1-m.main+mve -mfloat-abi=hard -mfpu=auto

"-mbranch-protection=none" will be used in the multilib matching.

Eg 2.

If the passed command line flags are (with mbranch-protection):
a) -march=armv8.1-m.main+mve+pacbti -mfloat-abi=hard -mfpu=auto  -mbranch-protection=pac-ret

"-mbranch-protection=standard" will be used in the multilib matching.

gcc/ChangeLog:

2023-01-11  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config.gcc ($tm_file): Update variable.
	* config/arm/arm-mlib.h: Create new header file.
	* config/arm/t-rmprofile (MULTI_ARCH_DIRS_RM): Rename mbranch-protection
	multilib arch directory.
	(MULTILIB_REUSE): Add multilib reuse rules.
	(MULTILIB_MATCHES): Add multilib match rules.

gcc/testsuite/ChangeLog:

2023-01-11  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/multilib.exp (multilib_config "rmprofile"): Update
	tests.
	* gcc.target/arm/pac-12.c: New test.
	* gcc.target/arm/pac-13.c: Likewise.
	* gcc.target/arm/pac-14.c: Likewise.
2023-03-09 16:11:15 +01:00
Srinath Parvathaneni
403d1b1760 arm: Add support for Arm Cortex-M85 CPU.
This patch adds the -mcpu support for the Arm Cortex-M85 CPU which is
an Armv8.1-M Mainline CPU supporting MVE and PACBTI by default.

-mpcu=cortex-m85 switch by default matches to -march=armv8.1-m.main+pacbti+mve.fp+fp.dp.

Also following options are provided to disable default features.
+nomve.fp (disables MVE Floating point)
+nomve (disables MVE Integer and MVE Floating point)
+nodsp (disables dsp, MVE Integer and MVE Floating point)
+nopacbti (disables pacbti)
+nofp (disables floating point and MVE floating point)

gcc/ChangeLog:

2022-08-12  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config/arm/arm-cpus.in (cortex-m85): Define new CPU.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm-tune.md: Likewise.
	* doc/invoke.texi (Arm Options): Document -mcpu=cortex-m85.
	* (-mfix-cmse-cve-2021-35465): Likewise.

gcc/testsuite/ChangeLog:

2022-08-12  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/multilib.exp: Add tests for cortex-m85.
2023-03-09 16:11:14 +01:00
Srinath Parvathaneni
db86931b08 [Committed] arm: Document +no options for Cortex-M55 CPU.
This patch documents the following options for Arm Cortex-M55 CPU
under -mcpu= list.

+nomve.fp (disables MVE single precision floating point instructions)
+nomve (disables MVE integer and single precision floating point instructions)
+nodsp (disables dsp, MVE integer and single precision floating point instructions)
+nofp (disables floating point instructions)
Committed as obvious to master.

gcc/ChangeLog:

2022-08-12  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* doc/invoke.texi (Arm Options): Document -mcpu=cortex-m55 options.
2023-03-09 16:11:14 +01:00
Andrea Corallo
8b7f28f2e8 arm: Implement arm Function target attribute 'branch-protection'
gcc/

	* config/arm/arm.cc (arm_valid_target_attribute_rec): Add ARM function
	attribute 'branch-protection' and parse its options.
	* doc/extend.texi: Document ARM Function attribute 'branch-protection'.

gcc/testsuite/

	* gcc.target/arm/acle/pacbti-m-predef-13.c: New test.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
2023-03-09 16:11:14 +01:00
Andrea Corallo
9d560b2365 aarch64: Fix return_address_sign_ab_exception.C regression
Hi all,

this is to fix the regression of
g++.target/aarch64/return_address_sign_ab_exception.C that I
introduced with d8dadbc9a5.

'aarch_ra_sign_key' for aarch64 ended up being non defined in the opt
file and the function attribute "branch-protection=pac-ret+leaf+b-key"
stopped working as expected.

This patch moves the definition of 'aarch_ra_sign_key' to the opt
files for both Arm back-ends.

Regards

  Andera Corallo

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (aarch_ra_sign_key): Remove
	declaration.
	* config/aarch64/aarch64.cc (aarch_ra_sign_key): Remove
	definition.
	* config/aarch64/aarch64.opt (aarch64_ra_sign_key): Rename
	to 'aarch_ra_sign_key'.
	* config/arm/aarch-common.cc (aarch_ra_sign_key): Remove
	declaration.
	* config/arm/arm-protos.h (aarch_ra_sign_key): Likewise.
	* config/arm/arm.cc (enum aarch_key_type): Remove definition.
	* config/arm/arm.opt: Define.
2023-03-09 16:11:14 +01:00
Andrea Corallo
a93bf01521 [PATCH 12/15] arm: implement bti injection
Hi all,

this patch enables Branch Target Identification Armv8.1-M Mechanism
[1].

This is achieved by using the bti pass made common with Aarch64.

The pass iterates through the instructions and adds the necessary BTI
instructions at the beginning of every function and at every landing
pads targeted by indirect jumps.

Best Regards

  Andrea

[1]
<https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension>

gcc/ChangeLog

2022-04-07  Andrea Corallo  <andrea.corallo@arm.com>

	* config.gcc (arm*-*-*): Add 'aarch-bti-insert.o' object.
	* config/arm/arm-protos.h: Update.
	* config/arm/aarch-common-protos.h: Declare
	'aarch_bti_arch_check'.
	* config/arm/arm.cc (aarch_bti_enabled) Update.
	(aarch_bti_j_insn_p, aarch_pac_insn_p, aarch_gen_bti_c)
	(aarch_gen_bti_j, aarch_bti_arch_check): New functions.
	* config/arm/arm.md (bti_nop): New insn.
	* config/arm/t-arm (PASSES_EXTRA): Add 'arm-passes.def'.
	(aarch-bti-insert.o): New target.
	* config/arm/unspecs.md (VUNSPEC_BTI_NOP): New unspec.
	* config/arm/aarch-bti-insert.cc (rest_of_insert_bti): Verify arch
	compatibility.
	(gate): Make use of 'aarch_bti_arch_check'.
	* config/arm/arm-passes.def: New file.
	* config/aarch64/aarch64.cc (aarch_bti_arch_check): New function.

gcc/testsuite/ChangeLog

2022-04-07  Andrea Corallo  <andrea.corallo@arm.com>

	* gcc.target/arm/bti-1.c: New testcase.
	* gcc.target/arm/bti-2.c: Likewise.
2023-03-09 14:57:23 +00:00
Andrea Corallo
f1ba236eaa [PATCH 11/15] aarch64: Make bti pass generic so it can be used by the arm backend
Hi all,

this patch splits and restructures the aarch64 bti pass code in order
to have it usable by the arm backend as well.  These changes have no
functional impact.

Best Regards

  Andrea

gcc/Changelog

	* config.gcc (aarch64*-*-*): Rename 'aarch64-bti-insert.o' into
	'aarch-bti-insert.o'.
	* config/aarch64/aarch64-protos.h: Remove 'aarch64_bti_enabled'
	proto.
	* config/aarch64/aarch64.cc (aarch_bti_enabled): Rename.
	(aarch_bti_j_insn_p, aarch_pac_insn_p): New functions.
	(aarch64_output_mi_thunk)
	(aarch64_print_patchable_function_entry)
	(aarch64_file_end_indicate_exec_stack): Update renamed function
	calls to renamed functions.
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Likewise.
	* config/aarch64/t-aarch64 (aarch-bti-insert.o): Update
	target.
	* config/aarch64/aarch64-bti-insert.cc: Delete.
	* config/arm/aarch-bti-insert.cc: New file including and
	generalizing code from aarch64-bti-insert.cc.
	* config/arm/aarch-common-protos.h: Update.
2023-03-09 14:57:23 +00:00
Andrea Corallo
45e636ff48 [PATCH 10/15] arm: Implement cortex-M return signing address codegen
Hi all,

this patch enables address return signature and verification based on
Armv8.1-M Pointer Authentication [1].

To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This is signing LR using SP and storing the
result in R12.  R12 will be pushed into the stack.

During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.

Here an example of PAC instrumented function prologue and epilogue:

void foo (void);

int main()
{
  foo ();
  return 0;
}

Compiled with '-march=armv8.1-m.main -mbranch-protection=pac-ret
-mthumb' translates into:

main:
	pac	ip, lr, sp
	push	{r3, r7, ip, lr}
	add	r7, sp, #0
	bl	foo
	movs	r3, #0
	mov	r0, r3
	pop	{r3, r7, ip, lr}
	aut	ip, lr, sp
	bx	lr

The patch also takes care of generating a PACBTI instruction in place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.

Ex. the previous example compiled with '-march=armv8.1-m.main
-mbranch-protection=pac-ret+bti -mthumb' translates into:

main:
	pacbti	ip, lr, sp
	push	{r3, r7, ip, lr}
	add	r7, sp, #0
	bl	foo
	movs	r3, #0
	mov	r0, r3
	pop	{r3, r7, ip, lr}
	aut	ip, lr, sp
	bx	lr

As part of previous upstream suggestions a test for varargs has been
added and '-mtpcs-frame' is deemed being incompatible with this return
signing address feature being introduced.

[1] <https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension>

gcc/
	* config/arm/arm.h (arm_arch8m_main): Declare it.
	* config/arm/arm-protos.h (arm_current_function_pac_enabled_p):
	Declare it.
	* config/arm/arm.cc (arm_arch8m_main): Define it.
	(arm_option_reconfigure_globals): Set arm_arch8m_main.
	(arm_compute_frame_layout, arm_expand_prologue)
	(thumb2_expand_return, arm_expand_epilogue)
	(arm_conditional_register_usage): Update for pac codegen.
	(arm_current_function_pac_enabled_p): New function.
	(aarch_bti_enabled) New function.
	(use_return_insn): Return zero when pac is enabled.
	* config/arm/arm.md (pac_ip_lr_sp, pacbti_ip_lr_sp, aut_ip_lr_sp):
	Add new patterns.
	* config/arm/unspecs.md (UNSPEC_PAC_NOP)
	(VUNSPEC_PACBTI_NOP, VUNSPEC_AUT_NOP): Add unspecs.

gcc/testsuite/

	* gcc.target/arm/pac.h : New file.
	* gcc.target/arm/pac-1.c : New test case.
	* gcc.target/arm/pac-2.c : Likewise.
	* gcc.target/arm/pac-3.c : Likewise.
	* gcc.target/arm/pac-4.c : Likewise.
	* gcc.target/arm/pac-5.c : Likewise.
	* gcc.target/arm/pac-6.c : Likewise.
	* gcc.target/arm/pac-7.c : Likewise.
	* gcc.target/arm/pac-8.c : Likewise.
	* gcc.target/arm/pac-9.c : Likewise.
	* gcc.target/arm/pac-10.c : Likewise.
	* gcc.target/arm/pac-11.c : Likewise.
2023-03-09 14:57:23 +00:00
Andrea Corallo
e7c1e80da6 [PATCH 8/15] arm: Introduce multilibs for PACBTI target feature
This patch add the following new multilibs.

thumb/v8.1-m.main+pacbti/mbranch-protection/nofp
thumb/v8.1-m.main+pacbti+dp/mbranch-protection/soft
thumb/v8.1-m.main+pacbti+dp/mbranch-protection/hard
thumb/v8.1-m.main+pacbti+fp/mbranch-protection/soft
thumb/v8.1-m.main+pacbti+fp/mbranch-protection/hard
thumb/v8.1-m.main+pacbti+mve/mbranch-protection/hard

Triggering the following compiler flags:

-mthumb -march=armv8.1-m.main+pacbti -mbranch-protection=standard -mfloat-abi=soft
-mthumb -march=armv8.1-m.main+pacbti+fp -mbranch-protection=standard -mfloat-abi=softfp
-mthumb -march=armv8.1-m.main+pacbti+fp -mbranch-protection=standard -mfloat-abi=hard
-mthumb -march=armv8.1-m.main+pacbti+fp.dp -mbranch-protection=standard -mfloat-abi=softfp
-mthumb -march=armv8.1-m.main+pacbti+fp.dp -mbranch-protection=standard -mfloat-abi=hard
-mthumb -march=armv8.1-m.main+pacbti+mve -mbranch-protection=standard -mfloat-abi=hard

gcc/

	* config/arm/t-rmprofile: Add multilib rules for march +pacbti and
	mbranch-protection.

gcc/testsuite/

	* gcc.target/arm/multilib.exp: Add pacbti related entries.
2023-03-09 14:57:23 +00:00
Andrea Corallo
a09ba7a168 [PATCH 7/15] arm: Emit build attributes for PACBTI target feature
This patch emits assembler directives for PACBTI build attributes as
defined by the
ABI.

<https://github.com/ARM-software/abi-aa/releases/download/2021Q1/addenda32.pdf>

gcc/ChangeLog:

	* config/arm/arm.cc (arm_file_start): Emit EABI attributes for
	Tag_PAC_extension, Tag_BTI_extension, TAG_BTI_use, TAG_PACRET_use.

gcc/testsuite/ChangeLog:

	* gcc.target/arm/acle/pacbti-m-predef-1.c: New test.
	* gcc.target/arm/acle/pacbti-m-predef-3.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-6.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-7.c: Likewise.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
2023-03-09 14:57:23 +00:00
Andrea Corallo
a2301e7f25 [PATCH 6/15] arm: Add pointer authentication for stack-unwinding runtime
This patch adds authentication for when the stack is unwound when an
exception is taken.  All the changes here are done to the runtime code
in libgcc's unwinder code for Arm target. All the changes are guarded
under defined (__ARM_FEATURE_PAC_DEFAULT) and activated only if the
+pacbti feature is switched on for the architecture. This means that
switching on the target feature via -march or -mcpu is sufficient and
-mbranch-protection need not be enabled. This ensures that the
unwinder is authenticated only if the PACBTI instructions are
available in the non-NOP space as it uses AUTG.  Just generating
PAC/AUT instructions using -mbranch-protection will not enable
authentication on the unwinder.

Pre-approved with the requested changes here
<https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586555.html>.

gcc/ChangeLog:

	* ginclude/unwind-arm-common.h (_Unwind_VRS_RegClass): Introduce
	new pseudo register class _UVRSC_PAC.

libgcc/ChangeLog:
	* config/arm/pr-support.c (__gnu_unwind_execute): Decode
	exception opcode (0xb4) for saving RA_AUTH_CODE and authenticate
	with AUTG if found.
	* config/arm/unwind-arm.c (struct pseudo_regs): New.
	(phase1_vrs): Introduce new field to store pseudo-reg state.
	(phase2_vrs): Likewise.
	(_Unwind_VRS_Get): Load pseudo register state from virtual reg set.
	(_Unwind_VRS_Set): Store pseudo register state to virtual reg set.
	(_Unwind_VRS_Pop): Load pseudo register value from stack into VRS.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
Co-Authored-By: Srinath Parvathaneni <srinath.parvathaneni@arm.com>
2023-03-09 14:57:23 +00:00
Andrea Corallo
df22dc050f [PATCH 5/15] arm: Implement target feature macros for PACBTI
This patch implements target feature macros when PACBTI is enabled
through the -march option or -mbranch-protection.  The target feature
macros __ARM_FEATURE_PAC_DEFAULT and __ARM_FEATURE_BTI_DEFAULT are
specified in ARM ACLE
<https://developer.arm.com/documentation/101028/0012/5--Feature-test-macros?lang=en>
__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI are specified in the
pull-request <https://github.com/ARM-software/acle/pull/55>.

Approved here
<https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586334.html>.

gcc/

	* config/arm/arm-c.cc (arm_cpu_builtins): Define
	__ARM_FEATURE_BTI_DEFAULT, __ARM_FEATURE_PAC_DEFAULT,
	__ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI.

gcc/testsuite/

	* lib/target-supports.exp
	(check_effective_target_mbranch_protection_ok): New function.
	* gcc.target/arm/acle/pacbti-m-predef-2.c: New test.
	* gcc.target/arm/acle/pacbti-m-predef-4.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-5.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-8.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-9.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-10.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-11.c: Likewise.
	* gcc.target/arm/acle/pacbti-m-predef-12.c: Likewise.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
2023-03-09 14:57:23 +00:00
Andrea Corallo
b3069e3ae4 [PATCH 4/15] arm: Add testsuite library support for PACBTI target
Add targeting-checking entities for PACBTI in testsuite
framework.

Pre-approved with the requested changes here
<https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586331.html>.

gcc/testsuite/ChangeLog

	* lib/target-supports.exp:
	(check_effective_target_arm_pacbti_hw): New.

gcc/ChangeLog:
	* doc/sourcebuild.texi: Document arm_pacbti_hw.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
2023-03-09 14:57:23 +00:00
Andrea Corallo
f9ed6deaec [PATCH 3/15] arm: Add option -mbranch-protection
Add -mbranch-protection option.  This option enables the
code-generation of pointer signing and authentication instructions in
function prologues and epilogues.

gcc/ChangeLog:

	* config/arm/arm.cc (arm_configure_build_target): Parse and validate
	-mbranch-protection option and initialize appropriate data structures.
	* config/arm/arm.opt (-mbranch-protection): New option.
	* doc/invoke.texi (Arm Options): Document it.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
2023-03-09 14:57:23 +00:00
Andrea Corallo
8c86f2c21b [PATCH 2/15] arm: Add Armv8.1-M Mainline target feature +pacbti
This patch adds the -march feature +pacbti to Armv8.1-M Mainline.
This feature enables pointer signing and authentication instructions
on M-class architectures.

Pre-approved here
<https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586144.html>.

gcc/Changelog:

	* config/arm/arm.h (TARGET_HAVE_PACBTI): New macro.
	* config/arm/arm-cpus.in (pacbti): New feature.
	* doc/invoke.texi (Arm Options): Document it.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
2023-03-09 14:57:23 +00:00
Andrea Corallo
2f2817bc05 [PATCH 1/15] arm: Make mbranch-protection opts parsing common to AArch32/64
Hi all,

This change refactors all the mbranch-protection option parsing code and
types to make it common to both AArch32 and AArch64 backends.

This change also pulls in some supporting types from AArch64 to make
it common (aarch_parse_opt_result).

The significant changes in this patch are the movement of all branch
protection parsing routines from aarch64.c to aarch-common.c and
supporting data types and static data structures.

This patch also pre-declares variables and types required in the
aarch32 back-end for moved variables for function sign scope and key
to prepare for the impending series of patches that support parsing
the feature mbranch-protection in the aarch32 back-end.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.cc: Include aarch-common.h.
	(all_architectures): Fix comment.
	(aarch64_parse_extension): Rename return type, enum value names.
	* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Rename
	factored out aarch_ra_sign_scope and aarch_ra_sign_key variables.
	Also rename corresponding enum values.
	* config/aarch64/aarch64-opts.h (aarch64_function_type): Factor
	out aarch64_function_type and move it to common code as
	aarch_function_type in aarch-common.h.
	* config/aarch64/aarch64-protos.h: Include common types header,
	move out types aarch64_parse_opt_result and aarch64_key_type to
	aarch-common.h
	* config/aarch64/aarch64.cc: Move mbranch-protection parsing types
	and functions out into aarch-common.h and aarch-common.cc.  Fix up
	all the name changes resulting from the move.
	* config/aarch64/aarch64.md: Fix up aarch64_ra_sign_key type name change
	and enum value.
	* config/aarch64/aarch64.opt: Include aarch-common.h to import
	type move.  Fix up name changes from factoring out common code and
	data.
	* config/arm/aarch-common-protos.h: Export factored out routines to both
	backends.
	* config/arm/aarch-common.cc: Include newly factored out types.
	Move all mbranch-protection code and data structures from
	aarch64.cc.
	* config/arm/aarch-common.h: New header that declares types shared
	between aarch32 and aarch64 backends.
	* config/arm/arm-protos.h: Declare types and variables that are
	made common to aarch64 and aarch32 backends - aarch_ra_sign_key,
	aarch_ra_sign_scope and aarch_enable_bti.
	* config/arm/arm.opt (config/arm/aarch-common.h): Include header.
	(aarch_ra_sign_scope, aarch_enable_bti): Declare variable.
	* config/arm/arm.cc: Add missing includes.

Co-Authored-By: Tejas Belagod  <tbelagod@arm.com>
2023-03-09 14:57:23 +00:00
Eric Botcazou
2155c14612 Fix small regression in Ada
gcc/
	* gimplify.cc (gimplify_save_expr): Add missing guard.
gcc/testsuite/
	* gnat.dg/shift2.adb: New test.
2023-02-14 13:34:07 +01:00
Thomas W Rodgers
dec869c955 libstdc++: Add missing free functions for atomic_flag [PR103934]
This patch adds -
  atomic_flag_test
  atomic_flag_test_explicit

Which were missed when commit 491ba6 introduced C++20 atomic flag
test.

libstdc++-v3/ChangeLog:

	PR libstdc++/103934
	* include/std/atomic (atomic_flag_test): Add.
	(atomic_flag_test_explicit): Add.
	* testsuite/29_atomics/atomic_flag/test/explicit.cc: Add
	test case to cover missing atomic_flag free functions.
	* testsuite/29_atomics/atomic_flag/test/implicit.cc:
	Likewise.

(cherry picked from commit a8d769045b)
2023-02-13 17:53:37 -08:00
GCC Administrator
35c6d02981 Daily bump. 2023-02-14 00:21:07 +00:00
Kewen Lin
cb6861acc4 rs6000: Fix typo on vec_vsubcuq in rs6000-overload.def [PR108396]
As Andrew pointed out in PR108396, there is one typo in
rs6000-overload.def on built-in function vec_vsubcuq:

  [VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq]

"vec_vsubcuqP" should be "vec_vsubcuq", this typo caused
us to define vec_vsubcuqP in rs6000-vecdefines.h instead
of vec_vsubcuq, so that compiler is not able to realize
the built-in function name vec_vsubcuq any more.

Co-authored-By: Andrew Pinski <apinski@marvell.com>

	PR target/108396

gcc/ChangeLog:

	* config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo
	vec_vsubcuqP with vec_vsubcuq.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr108396.c: New test.

(cherry picked from commit aaf29ae6cd)
2023-02-12 20:00:53 -06:00
Kewen Lin
3c7bb6c0b0 rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]
PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then.  This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.

btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.

	PR target/108348

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
	support for invalid uses of MMA opaque type in function arguments.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr108348-1.c: New test.
	* gcc.target/powerpc/pr108348-2.c: New test.

(cherry picked from commit 5d9529687d)
2023-02-12 20:00:49 -06:00
Kewen Lin
e5a63c9869 rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]
As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements.  This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.

	PR target/108272

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
	support for invalid uses in inline asm, factor out the checking and
	erroring to lambda function check_and_error_invalid_use.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr108272-1.c: New test.
	* gcc.target/powerpc/pr108272-2.c: New test.
	* gcc.target/powerpc/pr108272-3.c: New test.
	* gcc.target/powerpc/pr108272-4.c: New test.

(cherry picked from commit 074b0c03ea)
2023-02-12 20:00:40 -06:00
GCC Administrator
b9f5d34ddc Daily bump. 2023-02-13 00:21:15 +00:00
GCC Administrator
d01d4bfc6b Daily bump. 2023-02-12 00:20:51 +00:00
John David Anglin
abaa8f9cc4 Suppress -fstack-protector warning on hppa.
Some package builds enable -fstack-protector and -Werror. Since
-fstack-protector is not supported on hppa because the stack grows
up, these packages must check for the warning generated by
-fstack-protector and suppress it on hppa. This is problematic
since hppa is the only significant architecture where the stack
grows up.

2022-12-16  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

	* config/pa/pa.cc (pa_option_override): Disable -fstack-protector.

gcc/testsuite/ChangeLog:

	* lib/target-supports.exp (check_effective_target_static): Return 0
	on hppa*-*-*.
2023-02-11 16:23:24 +00:00
GCC Administrator
8f68a9b8e4 Daily bump. 2023-02-11 00:20:54 +00:00
Jakub Jelinek
c3ba668049 i386: Fix up -Wuninitialized warnings in avx512erintrin.h [PR105593]
As reported in the PR, there are some -Wuninitialized warnings in
avx512erintrin.h.  One can see that by compiling sse-23.c testcase with
-Wuninitialized (or when actually using those intrinsics).
Those 6 spots use an uninitialized variable and pass it as one of the
argument to a builtin with constant mask -1, because there is no unmasked
builtin.  It is true that expansion of those builtins into RTL will see
mask is all ones and ignore the unneeded argument, but -Wuninitialized
is diagnosed on GIMPLE and on GIMPLE these builtins are just builtin calls.
avx512fintrin.h and other headers use in these cases the _mm*_undefined_* ()
intrinsics, like:
  return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
                                                 (__v8di) __Y,
                                                 (__v8di)
                                                 _mm512_undefined_epi32 (),
                                                 (__mmask8) -1);
etc. and the following patch does the same for avx512erintrin.h.
With the recent changes in C++ FE and the _mm*_undefined_* intrinsics,
we don't emit -Wuninitialized warnings for those (previously we didn't
just in C due to self-initialization).  Of course we could also
just self-initialize these uninitialized vars and add the #pragma GCC
diagnostic dances around it, but using the intrinsics is consistent with
the rest and IMHO cleaner.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR c++/105593
	* config/i386/avx512erintrin.h (_mm512_exp2a23_round_pd,
	_mm512_exp2a23_round_ps, _mm512_rcp28_round_pd, _mm512_rcp28_round_ps,
	_mm512_rsqrt28_round_pd, _mm512_rsqrt28_round_ps): Use
	_mm512_undefined_pd () or _mm512_undefined_ps () instead of using
	uninitialized automatic variable __W.

	* gcc.target/i386/sse-23.c: Add -Wuninitialized to dg-options.

(cherry picked from commit 4160239045)
2023-02-10 16:26:18 +01:00
Jakub Jelinek
72af61b122 x86: Avoid -Wuninitialized warnings on _mm*_undefined_* in C++ [PR105593]
In https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609844.html
I've posted a patch to allow ignoring -Winit-self using GCC diagnostic
pragmas, such that one can mark self-initialization as intentional
disabling of -Wuninitialized warnings.

The following incremental patch uses that in the x86 intrinsic
headers.

2023-01-16  Jakub Jelinek  <jakub@redhat.com>

	PR c++/105593
gcc/
	* config/i386/xmmintrin.h (_mm_undefined_ps): Temporarily
	disable -Winit-self using pragma GCC diagnostic ignored.
	* config/i386/emmintrin.h (_mm_undefined_pd, _mm_undefined_si128):
	Likewise.
	* config/i386/avxintrin.h (_mm256_undefined_pd, _mm256_undefined_ps,
	_mm256_undefined_si256): Likewise.
	* config/i386/avx512fintrin.h (_mm512_undefined_pd,
	_mm512_undefined_ps, _mm512_undefined_epi32): Likewise.
	* config/i386/avx512fp16intrin.h (_mm_undefined_ph,
	_mm256_undefined_ph, _mm512_undefined_ph): Likewise.
gcc/testsuite/
	* g++.target/i386/pr105593.C: New test.

(cherry picked from commit 6b0907b4fc)
2023-02-10 16:23:22 +01:00
Jakub Jelinek
732d744e82 c, c++: Allow ignoring -Winit-self through pragmas [PR105593]
As mentioned in the PR, various x86 intrinsics need to return
an uninitialized vector.  Currently they use self initialization
to avoid -Wuninitialized warnings, which works fine in C, but
doesn't work in C++ where -Winit-self is enabled in -Wall.
We don't have an attribute to mark a variable as knowingly
uninitialized (the uninitialized attribute exists but means
something else, only in the -ftrivial-auto-var-init context),
and trying to suppress either -Wuninitialized or -Winit-self
inside of the _mm_undefined_ps etc. intrinsic definitions
doesn't work, one needs to currently disable through pragmas
-Wuninitialized warning at the point where _mm_undefined_ps etc.
result is actually used, but that goes against the intent of
those intrinsics.

The -Winit-self warning option actually doesn't do any warning,
all we do is record a suppression for -Winit-self if !warn_init_self
on the decl definition and later look that up in uninit pass.

The following patch changes those !warn_init_self tests which
are true only based on the command line option setting, not based
on GCC diagnostic pragma overrides to
!warning_enabled_at (DECL_SOURCE_LOCATION (decl), OPT_Winit_self)
such that it takes them into account.

2023-01-16  Jakub Jelinek  <jakub@redhat.com>

	PR c++/105593
gcc/c/
	* c-parser.cc (c_parser_initializer): Check warning_enabled_at
	at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead
	of warn_init_self.
gcc/cp/
	* decl.cc (cp_finish_decl): Check warning_enabled_at
	at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead
	of warn_init_self.
gcc/testsuite/
	* c-c++-common/Winit-self3.c: New test.
	* c-c++-common/Winit-self4.c: New test.
	* c-c++-common/Winit-self5.c: New test.

(cherry picked from commit 98b41fd404)
2023-02-10 16:22:34 +01:00
Marek Polacek
aabebf76e9 c-family: Honor -Wno-init-self for cv-qual vars [PR102633]
Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r
conversion by creating a NOP_EXPR.  For e.g.

  const int i = i;

that means that the DECL_INITIAL is '(int) i' and not 'i' anymore.
Consequently, we don't suppress_warning here:

711     case DECL_EXPR:
715       if (VAR_P (DECL_EXPR_DECL (*expr_p))
716           && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p))
717           && !TREE_STATIC (DECL_EXPR_DECL (*expr_p))
718           && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL (*expr_p))
719           && !warn_init_self)
720         suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self);

because of the check on line 718 -- (int) i is not i.  So -Wno-init-self
doesn't disable the warning as it's supposed to.

The following patch fixes it by moving the suppress_warning call from
c_gimplify_expr to the front ends, at points where we haven't created
the NOP_EXPR yet.

	PR middle-end/102633

gcc/c-family/ChangeLog:

	* c-gimplify.cc (c_gimplify_expr) <case DECL_EXPR>: Don't call
	suppress_warning here.

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_initializer): Add new tree parameter.  Use it.
	Call suppress_warning.
	(c_parser_declaration_or_fndef): Pass d down to c_parser_initializer.
	(c_parser_omp_declare_reduction): Pass omp_priv down to
	c_parser_initializer.

gcc/cp/ChangeLog:

	* decl.cc (cp_finish_decl): Call suppress_warning.

gcc/testsuite/ChangeLog:

	* c-c++-common/Winit-self1.c: New test.
	* c-c++-common/Winit-self2.c: New test.

(cherry picked from commit 04ce2400b3)
2023-02-10 16:19:36 +01:00
Jakub Jelinek
c153fb6a5b c++: Handle structured bindings like anon unions in initializers [PR108474]
As reported by Andrew Pinski, structured bindings (with the exception
of the ones using std::tuple_{size,element} and get which are really
standalone variables in addition to the binding one) also use
DECL_VALUE_EXPR and needs the same treatment in static initializers.

On Sun, Jan 22, 2023 at 07:19:07PM -0500, Jason Merrill wrote:
> Though, actually, why not instead fix expand_expr_real_1 (and staticp) to
> look through DECL_VALUE_EXPR?

Doing it when emitting the initializers seems to be too late to me,
we in various spots try to put parts of the static var DECL_INITIAL expressions
into the IL, or e.g. for varpool purposes remember which vars are referenced
there.

This patch moves it to record_reference, which is called from varpool_node::analyze
and so about the same time as gimplification of the bodies which also
replaces DECL_VALUE_EXPRs.

2023-01-24  Jakub Jelinek  <jakub@redhat.com>

	PR c++/108474
	* cp-gimplify.cc (cp_fold_r): Handle structured bindings
	vars like anon union artificial vars.

	* g++.dg/cpp1z/decomp57.C: New test.
	* g++.dg/cpp1z/decomp58.C: New test.

(cherry picked from commit b84e211157)
2023-02-10 15:52:50 +01:00
Jakub Jelinek
a015ebe382 forwprop: Further fixes for simplify_rotate [PR108440]
As mentioned in the simplify_rotate comment, for e.g.
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
we already emit
   X r<< (Y & (B - 1))
as replacement.  This PR is about the
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms if T2 is wider than T.  Unlike e.g.
   (X << Y) OP (X >> (B - Y))
which is valid just for Y in [1, B - 1], the above 2 forms are actually
valid and do the rotates for Y in [0, B] - for Y 0 the X value is preserved
by the left shift and right logical shift by B adds just zeros (but because
the shift is in wider precision B is still valid shift count), while for
Y equal to B X is preserved through the latter shift and the former adds
just zeros.
Now, it is unclear if we in the middle-end treat rotates with rotate count
equal or larger than precision as UB or not, unlike shifts there are less
reasons to do so, but e.g. expansion of X r<< Y if there is no rotate optab
for the mode is emitted as (X << Y) | (((unsigned) X) >> ((-Y) & (B - 1)))
and so with UB on Y == B.

The following patch does multiple things:
1) for the above 2, asks the ranger if Y could be equal to B and if so,
   instead of using X r<< Y uses X r<< (Y & (B - 1))
2) for the
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
   forms that were fixed 2 days ago it only punts if Y might be in the
   [B,B2-1] range but isn't known to be in the
   [0,B][2*B,2*B][3*B,3*B]... range.  Because for Y which is a multiple
   of B but smaller than B2 it acts as a rotate too, left shift provides
   0 and (-Y) & (B - 1) is 0 and so preserves X.  Though, for the cases
   where Y is not known to be in [0,B-1] the patch also uses
   X r<< (Y & (B - 1)) rather than X r<< Y
3) as discussed with Aldy, instead of using global ranger it uses a pass
   specific copy but lazily created on first simplify_rotate that needs it;
   this e.g. handles rotate inside of if body where the guarding condition
   limits the shift count to some range which will not work with the
   global ranger (unless there is some SSA_NAME to attach the range to).

Note, e.g. on x86 X r<< (Y & (B - 1)) and X r<< Y actually emit the
same assembly because rotates work the same even for larger rotate counts,
but that is handled only during combine.

2023-01-19  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/108440
	* tree-ssa-forwprop.cc: Include gimple-range.h.
	(simplify_rotate): For the forms with T2 wider than T and shift counts of
	Y and B - Y add & (B - 1) masking for the rotate count if Y could be equal
	to B.  For the forms with T2 wider than T and shift counts of
	Y and (-Y) & (B - 1), don't punt if range could be [B, B2], but only if
	range doesn't guarantee Y < B or Y = N * B.  If range doesn't guarantee
	Y < B, also add & (B - 1) masking for the rotate count.  Use lazily created
	pass specific ranger instead of get_global_range_query.
	(pass_forwprop::execute): Disable that ranger at the end of pass if it has
	been created.

	* c-c++-common/rotate-10.c: New test.
	* c-c++-common/rotate-11.c: New test.

(cherry picked from commit 05b9868b18)
2023-02-10 15:38:12 +01:00
Jakub Jelinek
a558a4d3d1 forwprop: Fix up rotate pattern matching [PR106523]
The comment above simplify_rotate roughly describes what patterns
are matched into what:
   We are looking for X with unsigned type T with bitsize B, OP being
   +, | or ^, some type T2 wider than T.  For:
   (X << CNT1) OP (X >> CNT2)                           iff CNT1 + CNT2 == B
   ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))     iff CNT1 + CNT2 == B

   transform these into:
   X r<< CNT1

   Or for:
   (X << Y) OP (X >> (B - Y))
   (X << (int) Y) OP (X >> (int) (B - Y))
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
   (X << Y) | (X >> ((-Y) & (B - 1)))
   (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into (last 2 only if ranger can prove Y < B):
   X r<< Y

   Or for:
   (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
   (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
   ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) (Y & (B - 1)))) \
     | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))

   transform these into:
   X r<< (Y & (B - 1))

The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
of the shift counts but Y on the can do something different from
rotate.  E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (t << y) | (t >> ((-y) & 7));
}
if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there.  Say for y 5 and x 0xaa the expression gives
0x55 which is the same thing as rotate, while for y 19 and x 0xaa
0x5, which is different.
Now, I believe the
   ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
   ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.

The following patch disables the
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
unless ranger says Y is not in [B, B2 - 1] range.

And, looking at it again this morning, actually the Y equal to B
case is still fine, if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X.  So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1].  And, if we wanted to go further, even multiplies
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.

2023-01-17  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/106523
	* tree-ssa-forwprop.cc (simplify_rotate): For the
	patterns with (-Y) & (B - 1) in one operand's shift
	count and Y in another, if T2 has wider precision than T,
	punt if Y could have a value in [B, B2 - 1] range.

	* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
	f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
	__builtin_unreachable about shift count.
	* c-c++-common/rotate-2b.c: New test.
	* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
	f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
	__builtin_unreachable about shift count.
	* c-c++-common/rotate-4b.c: New test.
	* gcc.c-torture/execute/pr106523.c: New test.

(cherry picked from commit 001121e892)
2023-02-10 15:21:51 +01:00
Jakub Jelinek
369454ecb5 c++: Avoid incorrect shortening of divisions [PR108365]
The following testcase is miscompiled, because we shorten the division
in a case where it should not be shortened.
Divisions (and modulos) can be shortened if it is unsigned division/modulo,
or if it is signed division/modulo where we can prove the dividend will
not be the minimum signed value or divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was for the implicit integral promotions of {,signed,unsigned} {char,short}
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion is from unsigned type for the divisor
or if the dividend is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a positive int, so never the most negative.

Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
the same as orig_op0 or that promoted to int, I think that works fine,
if it isn't promoted, either the division/modulo common type will have the
same precision as op0 but then the division/modulo is unsigned and so
without UB, or it will be done in wider precision (e.g. because op1 has
wider precision), but then op0 can't be minimum signed value.  Or it has
been promoted to int, but in that case it was again from narrower type and
so never minimum signed int.

But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
even if it is a cast from unsigned integral type, we only know it can't be
minimum signed value if it is a widening cast, if it is same precision or
narrowing cast, we know nothing.

So, the following patch for the NOP_EXPR cases checks just in case that
it is from integral type and more importantly checks it is a widening
conversion.

2023-01-14  Jakub Jelinek  <jakub@redhat.com>

	PR c++/108365
	* typeck.cc (cp_build_binary_op): For integral division or modulo,
	shorten if type0 is unsigned, or op0 is cast from narrower unsigned
	integral type or stripped_op1 is INTEGER_CST other than -1.

	* g++.dg/opt/pr108365.C: New test.
	* g++.dg/warn/pr108365.C: New test.

(cherry picked from commit 5b3a88640f)
2023-02-10 15:16:56 +01:00
Jakub Jelinek
9c0cbe071b libstdc++: Another merge from fast_float upstream [PR107468]
Upstream fast_float came up with a cheaper test for
fegetround () == FE_TONEAREST using one float addition, one subtraction
and one comparison.  If we know we are rounding to nearest, we can use
fast path in more cases as before.
The following patch merges those changes into libstdc++.

2022-11-24  Jakub Jelinek  <jakub@redhat.com>

	PR libstdc++/107468
	* src/c++17/fast_float/fast_float.h: Partial merge from fast_float
	2ef9abbcf6a11958b6fa685a89d0150022e82e78 commit.

(cherry picked from commit ec73b55c75)
2023-02-10 14:59:24 +01:00
Jakub Jelinek
0061c84d2d libstdc++: Update from latest fast_float [PR107468]
The following patch partially updates from fast_float trunk.
That way it gets fix for the incorrect rounding case,
PR107468 aka https://github.com/fastfloat/fast_float/issues/149
Using std::fegetround showed in benchmarks too slow, so instead of
doing that the patch limits the fast path where it uses floating
point multiplication rather than integral to cases where we can
prove there will be no rounding (the multiplication will be exact, not
just that the two multiplication or division operation arguments are
exactly representable).

2022-11-07  Jakub Jelinek  <jakub@redhat.com>

	PR libstdc++/107468
	* src/c++17/fast_float/fast_float.h: Partial merge from fast_float
	662497742fea7055f0e0ee27e5a7ddc382c2c38e commit.
	* testsuite/20_util/from_chars/pr107468.cc: New test.

(cherry picked from commit cb0ceeaee9)
2023-02-10 14:54:33 +01:00
Andrew Pinski
a5453c659b match.pd: When simplifying BFR of an insert, require a mode precision integral type [PR108688]
The same problem as PR 88739 has crept in but
this time in match.pd when simplifying bit_field_ref of
an bit_insert. That is we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.

	PR tree-optimization/108688
	* match.pd (bit_field_ref [bit_insert]): Avoid generating
	BIT_FIELD_REFs of non-mode-precision integral operands.

	* gcc.c-torture/compile/pr108688-1.c: New test.

(cherry picked from commit 44f308e59b)
2023-02-10 14:38:10 +01:00
Jakub Jelinek
00136f439e vect-patterns: Fix up vect_widened_op_tree [PR108692]
The following testcase is miscompiled on aarch64-linux since r11-5160.
Given
  <bb 3> [local count: 955630225]:
  # i_22 = PHI <i_20(6), 0(5)>
  # r_23 = PHI <r_19(6), 0(5)>
...
  a.0_5 = (unsigned char) a_15;
  _6 = (int) a.0_5;
  b.1_7 = (unsigned char) b_17;
  _8 = (int) b.1_7;
  c_18 = _6 - _8;
  _9 = ABS_EXPR <c_18>;
  r_19 = _9 + r_23;
...
where SSA_NAMEs 15/17 have signed char, 5/7 unsigned char and rest is int
we first pattern recognize c_18 as
patt_34 = (a.0_5) w- (b.1_7);
which is still correct, 5/7 are unsigned char subtracted in wider type,
but then vect_recog_sad_pattern turns it into
SAD_EXPR <a_15, b_17, r_23>
which is incorrect, because 15/17 are signed char and so it is
sum of absolute signed differences rather than unsigned sum of
absolute unsigned differences.
The reason why this happens is that vect_recog_sad_pattern calls
vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the
patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree
calls vect_look_through_possible_promotion on the operands of the
WIDEN_MINUS_EXPR, which looks through the further casts.
vect_look_through_possible_promotion has careful code to stop when there
would be nested casts that need to be preserved, but the problem here
is that the WIDEN_*_EXPR operation itself has an implicit cast on the
operands already - in this case of WIDEN_MINUS_EXPR the unsigned char
5/7 SSA_NAMEs are widened to unsigned short before the subtraction,
and vect_look_through_possible_promotion obviously isn't told about that.

Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had
to look through possible promotions already when creating those and so
vect_look_through_possible_promotion again isn't really needed, all we need
to do is arrange what that function will do if the operand isn't result
of any cast.  Other option would be let vect_look_through_possible_promotion
know about the implicit promotion from the WIDEN_*_EXPR, but I'm afraid
that would be much harder.

2023-02-08  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/108692
	* tree-vect-patterns.cc (vect_widened_op_tree): If rhs_code is
	widened_code which is different from code, don't call
	vect_look_through_possible_promotion but instead just check op is
	SSA_NAME with integral type for which vect_is_simple_use is true
	and call set_op on this_unprom.

	* gcc.dg/pr108692.c: New test.

(cherry picked from commit 6ad1c10276)
2023-02-10 14:37:15 +01:00
Jakub Jelinek
f2731d1b9a fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]
The first testcase in the PR (which I haven't included in the patch because
it is unclear to me if it is supposed to be valid or not) ICEs since extra
hash table checking has been added recently.  The problem is that
gfc_trans_use_stmts does
          tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash,
                                                          INSERT);
          if (*slot == NULL)
and later on doesn't store anything into *slot and continues.  Another spot
a few lines later correctly clears the slot if it decides not to use the
slot, so the following patch does the same.

2023-02-03  Jakub Jelinek  <jakub@redhat.com>

	PR fortran/108451
	* trans-decl.cc (gfc_trans_use_stmts): Call clear_slot before
	doing continue.

(cherry picked from commit 76f7f0eddc)
2023-02-10 14:35:59 +01:00
Jakub Jelinek
8e74c4389c nested, openmp: Wrap OMP_CLAUSE_*_GIMPLE_SEQ into GIMPLE_BIND for declare_vars [PR108435]
When gimplifying OMP_CLAUSE_{LASTPRIVATE,LINEAR}_STMT, we wrap it always
into a GIMPLE_BIND, but when putting statements directly into
OMP_CLAUSE_{LASTPRIVATE,LINEAR}_GIMPLE_SEQ, we do it only if needed (there
are any temporaries that need to be declared in the sequence).
convert_nonlocal_omp_clauses was relying on the GIMPLE_BIND to be there always
because it called declare_vars on it.

The following patch wraps it into GIMPLE_BIND in tree-nested if we need to
declare_vars on it on demand.

2023-02-02  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/108435
	* tree-nested.cc (convert_nonlocal_omp_clauses)
	<case OMP_CLAUSE_LASTPRIVATE>: If info->new_local_var_chain and *seq
	is not a GIMPLE_BIND, wrap the sequence into a new GIMPLE_BIND
	before calling declare_vars.
	(convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LINEAR>: Merge
	with the OMP_CLAUSE_LASTPRIVATE handling except for whether
	seq is initialized to &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (clause)
	or &OMP_CLAUSE_LINEAR_GIMPLE_SEQ (clause).

	* gcc.dg/gomp/pr108435.c: New test.

(cherry picked from commit 0f349928e1)
2023-02-10 14:35:23 +01:00
Jakub Jelinek
7bd8b65bd5 ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]
The PR78437 r7-4871 changes made combine_reaching_defs punt on
WORD_REGISTER_OPERATIONS targets if a setter of smaller than word
register has wider uses.  This unfortunately breaks -fcompare-debug,
because if such a use appears only in DEBUG_INSN(s), while all other
uses aren't wider than the setter, we can REE optimize it without -g
and not with -g.

Such decisions shouldn't be based on debug instructions.  We could try
to reset them or adjust in some other way after we decide to perform the
change, but at least on the testcase which used to fail on riscv64-linux
the
(debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160])
                (const_int 1 [0x1])) 0)
        (subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151])
                (debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1
     (nil))
clearly doesn't care about the upper bits and I have hard time imaging how
could one end up with DEBUG_INSN which actually cares about those upper
bits.

So, the following patch just ignores uses on DEBUG_INSNs in this case,
if we run into something where we'd need to do something further later on,
let's deal with it when we have a testcase for it.

2023-02-01  Jakub Jelinek  <jakub@redhat.com>

	PR debug/108573
	* ree.cc (combine_reaching_defs): Don't return false for paradoxical
	subregs in DEBUG_INSNs.

	* gcc.dg/pr108573.c: New test.

(cherry picked from commit e4473d7cf8)
2023-02-10 14:34:38 +01:00
Jakub Jelinek
a62d952064 c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression evaluation [PR108607]
While potential_constant_expression_1 handled most of OMP_* codes (by saying that
they aren't potential constant expressions), OMP_SCOPE was missing in that list.
I've also added OMP_SCAN, though that is less important (similarly to OMP_SECTION
it ought to appear solely inside of OMP_{FOR,SIMD} resp. OMP_SECTIONS).
As the testcase shows, it isn't enough, potential_constant_expression_1
can catch only some cases, as soon as one uses switch or ifs where at least
one of the possible paths could be constant expression, we can run into the
same codes during cxx_eval_constant_expression, so this patch handles those
there as well.

2023-02-01  Jakub Jelinek  <jakub@redhat.com>

	PR c++/108607
	* constexpr.cc (cxx_eval_constant_expression): Handle OMP_*
	and OACC_* constructs as non-constant.
	(potential_constant_expression_1): Handle OMP_SCAN and OMP_SCOPE.

	* g++.dg/gomp/pr108607.C: New test.

(cherry picked from commit bfc070595b)
2023-02-10 14:33:51 +01:00
Jakub Jelinek
7d7f275ebe i386: Fix up ix86_convert_const_wide_int_to_broadcast [PR108599]
The following testcase is miscompiled.  The problem is that during
RTL DSE we see a V4DI register is being loaded { 16, 16, 0, 0 }
value and DSE mostly works in terms of scalar modes, so it calls
movoi to set an OImode REG to (const_wide_int 0x100000000000000010)
and ix86_convert_const_wide_int_to_broadcast thinks it can compute
that value by broadcasting DImode 0x10.  While it is true that
for TImode result the broadcast could be used, for OImode/XImode
it can't be, because all but the lowest 2 HOST_WIDE_INTs aren't
present (so are 0 or -1 depending on sign), not 0x10 in this case.
The function checks if the least significant HOST_WIDE_INT elt
of the CONST_WIDE_INT is broadcastable from QI/HI/SI/DImode and then
  /* Check if OP can be broadcasted from VAL.  */
  for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++)
    if (val != CONST_WIDE_INT_ELT (op, i))
      return nullptr;
That is needed of course, but nothing checks that
CONST_WIDE_INT_NUNITS (op) isn't too small for the mode in question.
I think if op would be 0 or -1, it ought to be never CONST_WIDE_INT,
but CONST_INT and so we can just punt whenever the number of
CONST_WIDE_INT elts is not the expected one.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR target/108599
	* config/i386/i386-expand.cc
	(ix86_convert_const_wide_int_to_broadcast): Return nullptr if
	CONST_WIDE_INT_NUNITS (op) times HOST_BITS_PER_WIDE_INT isn't
	equal to bitsize of mode.

	* gcc.target/i386/avx2-pr108599.c: New test.

(cherry picked from commit 963315a922)
2023-02-10 14:29:16 +01:00
Jakub Jelinek
e365bfacf2 bbpart: Fix up ICE on asm goto [PR108596]
On the following testcase we have asm goto in hot block with 2 successors,
one cold to which it both falls through and has one of the label
pointing to it and another hot successor with another label.

Now, during bbpart we want to ensure that no blocks from one partition fall
through into a block in a different partition.  fix_up_fall_thru_edges
does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge,
calling force_nonfallthru and then depending on whether it created a new
bb either set EDGE_CROSSING on the single successor edge from the new bb
(the new bb is kept in the same partition as the predecessor block), or
if no new bb has been created setting EDGE_CROSSING back on the fallthru
edge which has been forced non-EDGE_FALLTHRU.
For asm goto this doesn't always work, force_nonfallthru can create a new bb
and change the fallthrough edge to point to that, but if the original
fallthru destination block has its label referenced among the asm goto
labels, it will create a new non-fallthru edge for the label(s).
But because we've temporarily cheated and cleared EDGE_CROSSING on the edge,
it is cleared on the new edge as well, then the caller sees we've created
a new bb and just sets EDGE_CROSSING on the single fallthru edge from the
new bb.  But the direct edge from cur_bb to fallthru edge's destination
isn't handled and fails afterwards consistency checks, because it crosses
partitions.

The following patch notes the case and sets EDGE_CROSSING on that edge too.

2023-01-31  Jakub Jelinek  <jakub@redhat.com>

	PR rtl-optimization/108596
	* bb-reorder.cc (fix_up_fall_thru_edges): Handle the case where cur_bb
	ends with asm goto and has a crossing fallthrough edge to the same bb
	that contains at least one of its labels by restoring EDGE_CROSSING
	flag even on possible edge from cur_bb to new_bb successor.

	* gcc.c-torture/compile/pr108596.c: New test.

(cherry picked from commit 603a6fbcaa)
2023-02-10 14:28:36 +01:00
Jakub Jelinek
ca1e81324a doc: Fix up return type of __builtin_va_arg_pack_len [PR108560]
__builtin_va_arg_pack_len as implemented returned int since its introduction
in 2007.  The initial documentation didn't mention any return type,
which changed in 2010 in r0-103077-gab940b73bfabe2cec4 during some
documentation formatting cleanups
https://gcc.gnu.org/legacy-ml/gcc-patches/2010-09/msg01632.html
I can understand that for formatting some type was needed there
but what exactly hasn't been really discussed.

So, I think we should change documentation to match the implementation,
rather than change implementation to match the documentation.
Most people don't use more than 2147483647 arguments to inline functions,
and on poor targets with 16-bit ints I bet even having more than 65535
arguments to inline functions would be highly unexpected.

2023-01-27  Jakub Jelinek  <jakub@redhat.com>

	PR other/108560
	* doc/extend.texi: Fix up return type of __builtin_va_arg_pack_len
	from size_t to int.

(cherry picked from commit 16f30680f4)
2023-02-10 14:27:52 +01:00