riscv-gnu-toolchain/gcc.git

Age	Commit message	Author	Files	Lines
2021-04-30	aarch64: Use RTL builtins for FP ml[as][q]_laneq intrinsics	Jonathan Wright	3	-4/+62
	Rewrite floating-point vml[as][q]_laneq Neon intrinsics to use RTL builtins rather than relying on the GCC vector extensions. Using RTL builtins allows control over the emission of fmla/fmls instructions (which we don't want here.) With this commit, the code generated by these intrinsics changes from a fused multiply-add/subtract instruction to an fmul followed by an fadd/fsub instruction. If the programmer really wants fmla/fmls instructions, they can use the vfm[as] intrinsics. gcc/ChangeLog: 2021-02-17 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add float_ml[as][q]_laneq builtin generator macros. * config/aarch64/aarch64-simd.md (mul_laneq<mode>3): Define. (aarch64_float_mla_laneq<mode>): Define. (aarch64_float_mls_laneq<mode>): Define. * config/aarch64/arm_neon.h (vmla_laneq_f32): Use RTL builtin instead of GCC vector extensions. (vmlaq_laneq_f32): Likewise. (vmls_laneq_f32): Likewise. (vmlsq_laneq_f32): Likewise.
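The rounding difference behind this change can be shown with a minimal scalar sketch. This is not the Neon intrinsics themselves: the names `mla_unfused` and `mla_fused` are hypothetical, and the fused result is only emulated by doing the arithmetic in double precision, which is an assumption that holds for the small example inputs but is not a bit-exact model of fmla.

```c
/* Multiply, then add, rounding to float after each step.  This models an
   fmul followed by an fadd.  The volatile temporary keeps the compiler
   from contracting the two operations into one fused instruction.  */
static float
mla_unfused (float acc, float a, float b)
{
  volatile float product = a * b;   /* first rounding */
  return acc + product;             /* second rounding */
}

/* Emulated fused multiply-add.  The product of two floats is exact in
   double, so the value is rounded to float only once, at the end.  */
static float
mla_fused (float acc, float a, float b)
{
  return (float) ((double) a * (double) b + (double) acc);
}
```

With a = b = 1 + 2^-12 and acc = -(1 + 2^-11), the unfused form loses the low bit of the product and returns 0, while the fused form keeps it and returns 2^-24. Code that depended on one behaviour may therefore see different results after this commit.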
2021-04-30	aarch64: Use RTL builtins for FP ml[as][q]_lane intrinsics	Jonathan Wright	3	-13/+55
	Rewrite floating-point vml[as][q]_lane Neon intrinsics to use RTL builtins rather than relying on the GCC vector extensions. Using RTL builtins allows control over the emission of fmla/fmls instructions (which we don't want here.) With this commit, the code generated by these intrinsics changes from a fused multiply-add/subtract instruction to an fmul followed by an fadd/fsub instruction. If the programmer really wants fmla/fmls instructions, they can use the vfm[as] intrinsics. gcc/ChangeLog: 2021-02-16 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add float_ml[as]_lane builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_mul3_elt<mode>): Rename to... (mul_lane<mode>3): This, and re-order arguments. (aarch64_float_mla_lane<mode>): Define. (aarch64_float_mls_lane<mode>): Define. * config/aarch64/arm_neon.h (vmla_lane_f32): Use RTL builtin instead of GCC vector extensions. (vmlaq_lane_f32): Likewise. (vmls_lane_f32): Likewise. (vmlsq_lane_f32): Likewise.
2021-04-30	aarch64: Use RTL builtins for FP ml[as] intrinsics	Jonathan Wright	4	-8/+43
	Rewrite floating-point vml[as][q] Neon intrinsics to use RTL builtins rather than relying on the GCC vector extensions. Using RTL builtins allows control over the emission of fmla/fmls instructions (which we don't want here.) With this commit, the code generated by these intrinsics changes from a fused multiply-add/subtract instruction to an fmul followed by an fadd/fsub instruction. If the programmer really wants fmla/fmls instructions, they can use the vfm[as] intrinsics. gcc/ChangeLog: 2021-02-16 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add float_ml[as] builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_float_mla<mode>): Define. (aarch64_float_mls<mode>): Define. * config/aarch64/arm_neon.h (vmla_f32): Use RTL builtin instead of relying on GCC vector extensions. (vmla_f64): Likewise. (vmlaq_f32): Likewise. (vmlaq_f64): Likewise. (vmls_f32): Likewise. (vmls_f64): Likewise. (vmlsq_f32): Likewise. (vmlsq_f64): Likewise. * config/aarch64/iterators.md: Define VDQF_DF mode iterator.
2021-04-30	aarch64: Use RTL builtins for FP ml[as]_n intrinsics	Jonathan Wright	3	-34/+47
	Rewrite floating-point vml[as][q]_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-01-18 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add float_ml[as]_n builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_mul3_elt_from_dup<mode>): Rename to... (mul_n<mode>3): This, and re-order arguments. (aarch64_float_mla_n<mode>): Define. (aarch64_float_mls_n<mode>): Define. * config/aarch64/arm_neon.h (vmla_n_f32): Use RTL builtin instead of inline asm. (vmlaq_n_f32): Likewise. (vmls_n_f32): Likewise. (vmlsq_n_f32): Likewise.
2021-04-30	aarch64: Use RTL builtins for vmull[_high]_p8 intrinsics	Jonathan Wright	3	-12/+44
	Rewrite vmull[_high]_p8 Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-05 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add pmull[2] builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_pmullv8qi): Define. (aarch64_pmull_hiv16qi_insn): Define. (aarch64_pmull_hiv16qi): Define. * config/aarch64/arm_neon.h (vmull_high_p8): Use RTL builtin instead of inline asm. (vmull_p8): Likewise.
2021-04-30	AVR cc0 conversion - adjust peepholes	Senthil Kumar Selvaraj	1	-216/+308
	This patch adjusts peepholes to match and generate parallels with a clobber of REG_CC. It also sets mov<mode>_insn as the name of the pattern for the split insn (rather than the define_insn_and_split), so that avr_2word_insn_p, which looks for CODE_FOR_mov<mode>_insn, works correctly. This is required for the cpse.eq peephole to fire, and also helps generate better code for avr_out_sbxx_branch. gcc/ChangeLog: config/avr/avr.md: Adjust peepholes to match and generate parallels with clobber of REG_CC. (mov<mode>_insn): Rename to mov<mode>_insn_split. (*mov<mode>_insn): Rename to mov<mode>_insn.
2021-04-30	Define target hook to emit KFmode constants for libgcc.	Michael Meissner	1	-0/+29
	This patch defines a target hook so that the KFmode constants (__LIBGCC_KF_MAX__, __LIBGCC_KF_MIN__, and __LIBGCC_KF_EPSILON__) needed to build _divkc3.c in libgcc are defined. The need for these constants was added in the April 28th changes to libgcc that added complex division optimizations. We only define the KFmode constants if IEEE 128-bit floating point is supported, but long double does not use the IEEE 128-bit format. If long double uses the IEEE 128-bit format, it will use TFmode and not KFmode. gcc/ 2021-04-30 Michael Meissner <meissner@linux.ibm.com> PR bootstrap/100327 * config/rs6000/rs6000.c (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Define. (rs6000_libgcc_floating_mode_supported_p): New target hook.
2021-04-30	i386: Introduce reversed ADC and SBB patterns [PR98060]	Uros Bizjak	3	-7/+84
	The compiler is able to merge LTU comparisons with PLUS or MINUS pattern to form addition with carry (ADC) and subtraction with borrow (SBB) instructions: op = op + carry [ADC $0, op] op = op - carry [SBB $0, op] The patch introduces reversed ADC and SBB insn patterns: op = op + !carry [SBB $-1, op] op = op - !carry [ADC $-1, op] allowing the compiler to also merge GEU comparisons. 2021-04-30 Uroš Bizjak <ubizjak@gmail.com> gcc/ PR target/98060 * config/i386/i386.md (add<mode>3_carry_0r): New insn pattern. (addsi3_carry_zext_0r): Ditto. (sub<mode>3_carry_0): Ditto. (subsi3_carry_zext_0r): Ditto. * config/i386/predicates.md (ix86_carry_flag_unset_operator): New predicate. * config/i386/i386.c (ix86_rtx_costs) <case PLUS, case MINUS>: Also consider ix86_carry_flag_unset_operator to calculate the cost of adc/sbb insn. gcc/testsuite/ PR target/98060 * gcc.target/i386/pr98060.c: New test.
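The four forms above can be checked with plain modular arithmetic. The sketch below models ADC as op + imm + CF and SBB as op - imm - CF on 32-bit values; the helper names are hypothetical and the code makes no claim about which instructions a given compiler version actually emits.

```c
#include <stdint.h>

/* ADC computes op + imm + CF; SBB computes op - imm - CF (modulo 2^32).  */
static uint32_t
adc32 (uint32_t op, uint32_t imm, unsigned cf)
{
  return op + imm + cf;
}

static uint32_t
sbb32 (uint32_t op, uint32_t imm, unsigned cf)
{
  return op - imm - cf;
}

/* LTU form: op = op + carry, where carry is (a < b).  */
static uint32_t
add_if_below (uint32_t op, uint32_t a, uint32_t b)
{
  return op + (a < b);
}

/* GEU form: op = op + !carry, where !carry is (a >= b).  */
static uint32_t
add_if_not_below (uint32_t op, uint32_t a, uint32_t b)
{
  return op + (a >= b);
}
```

Because op - (-1) - CF equals op + 1 - CF, `sbb32 (op, UINT32_MAX, cf)` yields op + !carry, and likewise `adc32 (op, UINT32_MAX, cf)` yields op - !carry.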
2021-04-29	RISC-V: For '-march' and '-mabi' options, add 'Negative' property mentions itself.	Geng Qi	1	-2/+2
	When using a multilib riscv-tool-chain, a bug is triggered when there are two '-march' options on the command line. riscv64-unknown-elf-gcc -march=rv32gcp -mabi=ilp32f -march=rv32gcpzp64 HelloWorld.c /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/lib/crt0.o: ABI is incompatible with that of the selected emulation: target emulation `elf64-littleriscv' does not match `elf32-littleriscv' /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: failed to merge target specific data of file /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/lib/crt0.o /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/crtbegin.o: ABI is incompatible with that of the selected emulation: target emulation `elf64-littleriscv' does not match `elf32-littleriscv' /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: failed to merge target specific data of file /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/crtbegin.o ...... This patch fixes it: the driver prunes the extra '-march' and '-mabi' options and keeps only the last one. gcc/ChangeLog: * config/riscv/riscv.opt (march=,mabi=): Negative itself.
2021-04-29	RISC-V: Add patterns for builtin overflow.	LevyHsu	3	-0/+257
	gcc/ * config/riscv/riscv.c (riscv_min_arithmetic_precision): New. * config/riscv/riscv.h (TARGET_MIN_ARITHMETIC_PRECISION): New. * config/riscv/riscv.md (addv<mode>4, uaddv<mode>4): New. (subv<mode>4, usubv<mode>4, mulv<mode>4, umulv<mode>4): New.
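As a rough illustration of the source-level feature these patterns serve, the sketch below uses GCC's `__builtin_add_overflow` and `__builtin_mul_overflow`. The wrapper names are hypothetical, and the assumption that these calls expand through the new addv/umulv patterns on RISC-V is an inference from the pattern names, not something the commit message states.

```c
#include <stdbool.h>

/* Returns true when a + b does not fit in int.  */
static bool
add_overflows (int a, int b)
{
  int sum;
  return __builtin_add_overflow (a, b, &sum);
}

/* Returns true when a * b does not fit in unsigned int.  */
static bool
umul_overflows (unsigned a, unsigned b)
{
  unsigned product;
  return __builtin_mul_overflow (a, b, &product);
}

/* Returns a + b, or fallback when the addition overflows.  */
static int
checked_add (int a, int b, int fallback)
{
  int sum;
  return __builtin_add_overflow (a, b, &sum) ? fallback : sum;
}
```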
2021-04-29	add ASM_OUTPUT_MAX_SKIP_ALIGN to i386.h	Alexandre Oliva	13	-165/+11
	Several i386 align tests expect p2align to be used, but not all configurations define ASM_OUTPUT_MAX_SKIP_ALIGN, even when HAVE_GAS_MAX_SKIP_P2ALIGN. i386.h had an equivalent ASM_OUTPUT_MAX_SKIP_PAD. I've renamed it and its uses to the documented _ALIGN spelling, and dropped all redundant defines elsewhere in gcc/config/i386/. for gcc/ChangeLog * config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Rename to... (ASM_OUTPUT_MAX_SKIP_ALIGN): ... this. Enclose in do/while(0). * config/i386/i386.c: Adjust. * config/i386/i386.md: Adjust. * config/i386/darwin.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Drop. * config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/freebsd.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/gnu-user.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/iamcu.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/openbsdelf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. * config/i386/x86-64.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Likewise. (ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
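The "Enclose in do/while(0)" change follows a common C macro idiom, sketched below. `BUMP_BOTH` and `bump_if` are hypothetical names chosen for illustration and have nothing to do with the real body of ASM_OUTPUT_MAX_SKIP_ALIGN.

```c
/* A two-statement macro wrapped in do/while(0) so that it expands to
   exactly one statement.  */
#define BUMP_BOTH(a, b) \
  do                    \
    {                   \
      (a)++;            \
      (b)++;            \
    }                   \
  while (0)

static int
bump_if (int cond)
{
  int x = 0, y = 0;
  if (cond)
    BUMP_BOTH (x, y);   /* the trailing semicolon completes the statement */
  else
    y = 100;            /* with a bare { ... } body this else would not parse */
  return x + y;
}
```

With only a brace-enclosed body, the semicolon after the macro call would end the if statement and leave the else dangling, which is why the wrapper matters for macros used as statements.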
2021-04-29	i386: Optimize carry flag comparisons a bit	Uros Bizjak	2	-2/+10
	In ix86_expand_int_compare, opportunistically swap operands of GTU and LEU comparisons to emit carry flag comparison, with the expectation that the comparison will combine to add<mode>3_carry_0 or sub<mode>3_carry_0 insn pattern. Do not use ix86_expand_carry_flag_compare because this function prefers carry flag comparisons too much - it forces the constants into registers and/or emits additional arithmetic instructions to convert simple comparisons into carry flag comparisons - but simply swap operands to convert GTU and LEU comparisons into GEU and LTU ones. Also, change the insn predicates of add<mode>3_carry_0 and sub<mode>3_carry_0 insn patterns to allow more combine opportunities with memory operands. 2021-04-29 Uroš Bizjak <ubizjak@gmail.com> gcc/ * config/i386/i386-expand.c (ix86_expand_int_compare): Swap operands of GTU and LEU comparison to emit carry flag comparison. * config/i386/i386.md (add<mode>3_carry_0): Change insn predicate to allow more combine opportunities with memory operands. (sub<mode>3_carry_0): Ditto.
2021-04-29	Fix nios2 build failure	Jeff Law	1	-1/+1
	gcc * config/nios2/nios2-protos.h (nios2_fpu_insn_enabled): Move outside of RTX_CODE guard.
2021-04-29	i386: Mark x86 masked load builtins pure [PR100312]	Uros Bizjak	3	-11/+44
	Mark x86 AVX and AVX2 masked load builtins pure to enable dead code elimination and more appropriate alias analysis. 2021-04-29 Uroš Bizjak <ubizjak@gmail.com> Richard Biener <rguenther@suse.de> gcc/ PR target/100312 * config/i386/i386-builtin.def (IX86_BUILTIN_MASKLOADPD) (IX86_BUILTIN_MASKLOADPS, IX86_BUILTIN_MASKLOADPD256) (IX86_BUILTIN_MASKLOADPS256, IX86_BUILTIN_MASKLOADD) (IX86_BUILTIN_MASKLOADQ, IX86_BUILTIN_MASKLOADD256) (IX86_BUILTIN_MASKLOADQ256): Move from SPECIAL_ARGS to PURE_ARGS category. * config/i386/i386-builtins.c (ix86_init_mmx_sse_builtins): Handle PURE_ARGS category. * config/i386/i386-expand.c (ix86_expand_builtin): Ditto.
2021-04-29	i386: Cleanup comparison predicates.	Uros Bizjak	1	-18/+13
	CCCmode is allowed only with GEU and LTU comparison codes. Also allow CCGZmode for these two codes. There is no need to check for trivial FP comparison operator, ix86_fp_compare_code_to_integer will return UNKNOWN code for unsupported operators. 2021-04-29 Uroš Bizjak <ubizjak@gmail.com> gcc/ * config/i386/predicates.md (fcmov_comparison_operator): Do not check for trivial FP comparison operator. <case GEU, case LTU>: Allow CCGZmode. <case GTU, case LEU>: Do not allow CCCmode. (ix86_comparison_operator) <case GTU, case LEU>: Allow only CCmode. (ix86_carry_flag_operator): Match only LTU and UNLT code. Do not check for trivial FP comparison operator. Allow CCGZmode.
2021-04-29	Small housekeeping work in SPARC back-end	Eric Botcazou	2	-127/+70
	gcc/ * config/sparc/sparc.c (gen_load_pcrel_sym): Delete. (load_got_register): Do the PIC dance here. (sparc_legitimize_tls_address): Simplify. (sparc_emit_probe_stack_range): Likewise. (sparc32_initialize_trampoline): Likewise. (sparc64_initialize_trampoline): Likewise. * config/sparc/sparc.md (load_pcrel_sym<P:mode>): Add @ marker. (probe_stack_range<P:mode>): Likewise. (flush<P:mode>): Likewise. (tgd_hi22<P:mode>): Likewise. (tgd_lo10<P:mode>): Likewise. (tgd_add<P:mode>): Likewise. (tgd_call<P:mode>): Likewise. (tldm_hi22<P:mode>): Likewise. (tldm_lo10<P:mode>): Likewise. (tldm_add<P:mode>): Likewise. (tldm_call<P:mode>): Likewise. (tldo_hix22<P:mode>): Likewise. (tldo_lox10<P:mode>): Likewise. (tldo_add<P:mode>): Likewise. (tie_hi22<P:mode>): Likewise. (tie_lo10<P:mode>): Likewise. (tie_add<P:mode>): Likewise. (tle_hix22<P:mode>): Likewise. (tle_lox10<P:mode>): Likewise. (stack_protect_setsi): Rename to... (stack_protect_set32): ...this. (stack_protect_setdi): Rename to... (stack_protect_set64): ...this. (stack_protect_set): Adjust calls to above. (stack_protect_testsi): Rename to... (stack_protect_test32): ...this. (stack_protect_testdi): Rename to... (stack_protect_test64): ...this. (stack_protect_test): Adjust calls to above.
2021-04-29	Generate offset adjusted operation for op_by_pieces operations	H.J. Lu	1	-0/+3
	Add an overlap_op_by_pieces_p target hook for op_by_pieces operations between two areas of memory to generate one offset adjusted operation in the smallest integer mode for the remaining bytes on the last piece operation of a memory region to avoid doing more than one smaller operation. Pass the RTL information from the previous iteration to m_constfn in op_by_pieces operation so that builtin_memset_[read|gen]_str can generate the new RTL from the previous RTL. Tested on Linux/x86-64. gcc/ PR middle-end/90773 * builtins.c (builtin_memcpy_read_str): Add a dummy argument. (builtin_strncpy_read_str): Likewise. (builtin_memset_read_str): Add an argument for the previous RTL information and generate the new RTL from the previous RTL info. (builtin_memset_gen_str): Likewise. * builtins.h (builtin_strncpy_read_str): Update the prototype. (builtin_memset_read_str): Likewise. * expr.c (by_pieces_ninsns): If targetm.overlap_op_by_pieces_p() returns true, round up size and alignment to the widest integer mode for maximum size. (pieces_addr::adjust): Add a pointer to by_pieces_prev argument and pass it to m_constfn. (op_by_pieces_d): Add m_push and m_overlap_op_by_pieces. (op_by_pieces_d::op_by_pieces_d): Add a bool argument to initialize m_push. Initialize m_overlap_op_by_pieces with targetm.overlap_op_by_pieces_p (). (op_by_pieces_d::run): Pass the previous RTL information to pieces_addr::adjust and generate overlapping operations if m_overlap_op_by_pieces is true. (PUSHG_P): New. (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d change. (store_by_pieces_d::store_by_pieces_d): Updated for op_by_pieces_d change. (can_store_by_pieces): Use by_pieces_constfn on constfun. (store_by_pieces): Use by_pieces_constfn on constfun. Updated for op_by_pieces_d change. (clear_by_pieces_1): Add a dummy argument. (clear_by_pieces): Updated for op_by_pieces_d change. (compare_by_pieces_d::compare_by_pieces_d): Likewise. (string_cst_read_str): Add a dummy argument.
* expr.h (by_pieces_constfn): Add a dummy argument. (by_pieces_prev): New. * target.def (overlap_op_by_pieces_p): New target hook. * config/i386/i386.c (TARGET_OVERLAP_OP_BY_PIECES_P): New. * doc/tm.texi.in: Add TARGET_OVERLAP_OP_BY_PIECES_P. * doc/tm.texi: Regenerated. gcc/testsuite/ PR middle-end/90773 * g++.dg/pr90773-1.h: New test. * g++.dg/pr90773-1a.C: Likewise. * g++.dg/pr90773-1b.C: Likewise. * g++.dg/pr90773-1c.C: Likewise. * g++.dg/pr90773-1d.C: Likewise. * gcc.target/i386/pr90773-1.c: Likewise. * gcc.target/i386/pr90773-2.c: Likewise. * gcc.target/i386/pr90773-3.c: Likewise. * gcc.target/i386/pr90773-4.c: Likewise. * gcc.target/i386/pr90773-5.c: Likewise. * gcc.target/i386/pr90773-6.c: Likewise. * gcc.target/i386/pr90773-7.c: Likewise. * gcc.target/i386/pr90773-8.c: Likewise. * gcc.target/i386/pr90773-9.c: Likewise. * gcc.target/i386/pr90773-10.c: Likewise. * gcc.target/i386/pr90773-11.c: Likewise. * gcc.target/i386/pr90773-12.c: Likewise. * gcc.target/i386/pr90773-13.c: Likewise. * gcc.target/i386/pr90773-14.c: Likewise.
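The idea of an offset-adjusted, overlapping final piece can be sketched at the source level. The code below is a hand-written illustration under the assumption of 8-byte pieces; `copy_8_to_16` and `copy_13_ok` are hypothetical names, and the code is not how expr.c implements the hook.

```c
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16, using two 8-byte moves.  The second move is
   offset-adjusted so that it ends exactly at byte n, which means it may
   overlap the first move instead of being split into 4-, 2- and 1-byte
   pieces.  */
static void
copy_8_to_16 (unsigned char *dst, const unsigned char *src, size_t n)
{
  uint64_t head, tail;
  memcpy (&head, src, 8);
  memcpy (&tail, src + n - 8, 8);
  memcpy (dst, &head, 8);
  memcpy (dst + n - 8, &tail, 8);
}

/* Copies 13 bytes and reports whether they match the source and whether
   the byte just past the copied region was left untouched.  */
static int
copy_13_ok (void)
{
  const unsigned char src[16] = "overlap-by-13!!";
  unsigned char dst[16];
  memset (dst, 0xAA, sizeof dst);
  copy_8_to_16 (dst, src, 13);
  return memcmp (dst, src, 13) == 0 && dst[13] == 0xAA;
}
```

For 13 bytes, a non-overlapping split would need an 8-byte, a 4-byte and a 1-byte move; the overlapping form needs two 8-byte moves.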
2021-04-29	aarch64: Fix ICE in aarch64_add_offset_1_temporaries [PR100302]	Jakub Jelinek	1	-1/+1
	In PR94121 I've changed aarch64_add_offset_1 to use absu_hwi instead of abs_hwi because offset can be HOST_WIDE_INT_MIN. As can be seen with the testcase below, aarch64_add_offset_1_temporaries suffers from the same problem and should be in sync with aarch64_add_offset_1, i.e. for HOST_WIDE_INT_MIN it needs a temporary. 2021-04-29 Jakub Jelinek <jakub@redhat.com> PR target/100302 * config/aarch64/aarch64.c (aarch64_add_offset_1_temporaries): Use absu_hwi instead of abs_hwi. * gcc.target/aarch64/sve/pr100302.c: New test.
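The hazard with the most negative value can be seen in a small sketch. `abs_unsigned` below is a hypothetical stand-in written in the spirit of absu_hwi, using int64_t in place of HOST_WIDE_INT; it is not copied from GCC.

```c
#include <stdint.h>

/* Absolute value returned as an unsigned quantity.  Negating after the
   conversion to unsigned is well defined even for INT64_MIN, whose
   magnitude does not fit in int64_t.  */
static uint64_t
abs_unsigned (int64_t x)
{
  return x < 0 ? -(uint64_t) x : (uint64_t) x;
}
```

A signed absolute value has no representable result for INT64_MIN, which is the case the two aarch64 functions must agree on.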
2021-04-28	aarch64: Remove unspecs from [su]qmovn RTL pattern	Jonathan Wright	3	-10/+5
	Saturating truncation can be expressed using the RTL expressions ss_truncate and us_truncate. This patch changes the implementation of the vqmovn_* intrinsics to use these RTL expressions rather than a pair of unspecs. The redundant unspecs are removed along with their code iterator. gcc/ChangeLog: 2021-04-12 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Modify comment to make consistent with updated RTL pattern. * config/aarch64/aarch64-simd.md (aarch64_<sur>qmovn<mode>): Implement using ss_truncate and us_truncate rather than unspecs. * config/aarch64/iterators.md: Remove redundant unspecs and iterator: UNSPEC_[SU]QXTN and SUQMOVN respectively.
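For readers unfamiliar with saturating truncation, here is a scalar sketch of what the signed and unsigned forms compute when narrowing 32-bit values to 16 bits. The function names are hypothetical and simply echo the RTL code names; the real patterns operate on vector modes.

```c
#include <stdint.h>

/* Signed saturating truncation: clamp to [INT16_MIN, INT16_MAX].  */
static int16_t
ss_truncate_32_to_16 (int32_t x)
{
  if (x > INT16_MAX)
    return INT16_MAX;
  if (x < INT16_MIN)
    return INT16_MIN;
  return (int16_t) x;
}

/* Unsigned saturating truncation: clamp to [0, UINT16_MAX].  */
static uint16_t
us_truncate_32_to_16 (uint32_t x)
{
  return x > UINT16_MAX ? UINT16_MAX : (uint16_t) x;
}
```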
2021-04-28	aarch64: Update attributes of arm_acle.h intrinsics	Jonathan Wright	1	-23/+46
	Update the attributes of all intrinsics defined in arm_acle.h to be consistent with the attributes of the intrinsics defined in arm_neon.h. Specifically, this means updating the attributes from: __extension__ static __inline <type> __attribute__ ((__always_inline__)) to: __extension__ extern __inline <type> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) gcc/ChangeLog: 2021-03-18 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/arm_acle.h (__attribute__): Make intrinsic attributes consistent with those defined in arm_neon.h.
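A wrapper written with the new attribute set looks roughly like the sketch below. `example_add_one` and `call_example` are hypothetical and are not intrinsics from arm_acle.h.

```c
/* With extern plus __gnu_inline__, the definition is used only for
   inlining and no standalone copy of the function is emitted;
   __always_inline__ forces the inlining, and __artificial__ asks the
   debugger to treat the wrapper as a unit.  */
__extension__ extern __inline int
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
example_add_one (int x)
{
  return x + 1;
}

static int
call_example (void)
{
  return example_add_one (41);
}
```

One visible consequence of this form is that the header no longer contributes a per-translation-unit static copy of each wrapper.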
2021-04-28	aarch64: Update attributes of arm_fp16.h intrinsics	Jonathan Wright	1	-89/+178
	Update the attributes of all intrinsics defined in arm_fp16.h to be consistent with the attributes of the intrinsics defined in arm_neon.h. Specifically, this means updating the attributes from: __extension__ static __inline <type> __attribute__ ((__always_inline__)) to: __extension__ extern __inline <type> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__)) gcc/ChangeLog: 2021-03-18 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/arm_fp16.h (__attribute__): Make intrinsic attributes consistent with those defined in arm_neon.h.
2021-04-28	aarch64: Use RTL builtins for vcvtx intrinsics	Jonathan Wright	4	-18/+62
	Rewrite vcvtx Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-18 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add float_trunc_rodd builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_float_trunc_rodd_df): Define. (aarch64_float_trunc_rodd_lo_v2sf): Define. (aarch64_float_trunc_rodd_hi_v4sf_le): Define. (aarch64_float_trunc_rodd_hi_v4sf_be): Define. (aarch64_float_trunc_rodd_hi_v4sf): Define. * config/aarch64/arm_neon.h (vcvtx_f32_f64): Use RTL builtin instead of inline asm. (vcvtx_high_f32_f64): Likewise. (vcvtxd_f32_f64): Likewise. * config/aarch64/iterators.md: Add FCVTXN unspec.
2021-04-28	aarch64: Use RTL builtins for v[q]tbx intrinsics	Jonathan Wright	3	-54/+30
	Rewrite v[q]tbx Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-12 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add tbx1 builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_tbx1<mode>): Define. * config/aarch64/arm_neon.h (vqtbx1_s8): Use RTL builtin instead of inline asm. (vqtbx1_u8): Likewise. (vqtbx1_p8): Likewise. (vqtbx1q_s8): Likewise. (vqtbx1q_u8): Likewise. (vqtbx1q_p8): Likewise. (vtbx2_s8): Likewise. (vtbx2_u8): Likewise. (vtbx2_p8): Likewise.
2021-04-28	aarch64: Use RTL builtins for v[q]tbl intrinsics	Jonathan Wright	2	-81/+32
	Rewrite v[q]tbl Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-12 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add tbl1 builtin generator macros. * config/aarch64/arm_neon.h (vqtbl1_p8): Use RTL builtin instead of inline asm. (vqtbl1_s8): Likewise. (vqtbl1_u8): Likewise. (vqtbl1q_p8): Likewise. (vqtbl1q_s8): Likewise. (vqtbl1q_u8): Likewise. (vtbl1_s8): Likewise. (vtbl1_u8): Likewise. (vtbl1_p8): Likewise. (vtbl2_s8): Likewise. (vtbl2_u8): Likewise. (vtbl2_p8): Likewise.
2021-04-28	aarch64: Use RTL builtins for polynomial vsri[q]_n intrinsics	Jonathan Wright	2	-77/+42
	Rewrite vsri[q]_n_p* Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-10 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add polynomial ssri_n builtin generator macro. * config/aarch64/arm_neon.h (vsri_n_p8): Use RTL builtin instead of inline asm. (vsri_n_p16): Likewise. (vsri_n_p64): Likewise. (vsriq_n_p8): Likewise. (vsriq_n_p16): Likewise. (vsriq_n_p64): Likewise.
2021-04-28	aarch64: Use RTL builtins for polynomial vsli[q]_n intrinsics	Jonathan Wright	3	-49/+28
	Rewrite vsli[q]_n_p* Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-10 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use VALLP mode iterator for polynomial ssli_n builtin generator macro. * config/aarch64/arm_neon.h (vsli_n_p8): Use RTL builtin instead of inline asm. (vsli_n_p16): Likewise. (vsliq_n_p8): Likewise. (vsliq_n_p16): Likewise. * config/aarch64/iterators.md: Define VALLP mode iterator.
2021-04-28	aarch64: Use RTL builtins for vpadal_[su]32 intrinsics	Jonathan Wright	3	-16/+6
	Rewrite vpadal_[su]32 Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-09 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use VDQV_L iterator to generate [su]adalp RTL builtins. * config/aarch64/aarch64-simd.md: Use VDQV_L iterator in [su]adalp RTL pattern. * config/aarch64/arm_neon.h (vpadal_s32): Use RTL builtin instead of inline asm. (vpadal_u32): Likewise.
2021-04-28	aarch64: Use RTL builtins for [su]paddl[q] intrinsics	Jonathan Wright	4	-72/+31
	Rewrite [su]paddl[q] Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add [su]addlp builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_<su>addlp<mode>): Define. * config/aarch64/arm_neon.h (vpaddl_s8): Use RTL builtin instead of inline asm. (vpaddl_s16): Likewise. (vpaddl_s32): Likewise. (vpaddl_u8): Likewise. (vpaddl_u16): Likewise. (vpaddl_u32): Likewise. (vpaddlq_s8): Likewise. (vpaddlq_s16): Likewise. (vpaddlq_s32): Likewise. (vpaddlq_u8): Likewise. (vpaddlq_u16): Likewise. (vpaddlq_u32): Likewise. * config/aarch64/iterators.md: Define [SU]ADDLP unspecs with appropriate attributes.
2021-04-28	aarch64: Use RTL builtins for vpaddq intrinsics	Jonathan Wright	3	-53/+17
	Rewrite vpaddq Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Use VDQ_I iterator for aarch64_addp<mode> builtin macro generator. * config/aarch64/aarch64-simd.md: Use VDQ_I iterator in aarch64_addp<mode> RTL pattern. * config/aarch64/arm_neon.h (vpaddq_s8): Use RTL builtin instead of inline asm. (vpaddq_s16): Likewise. (vpaddq_s32): Likewise. (vpaddq_s64): Likewise. (vpaddq_u8): Likewise. (vpaddq_u16): Likewise. (vpaddq_u32): Likewise. (vpaddq_u64): Likewise.
2021-04-28	aarch64: Use RTL builtins for vq[r]dmulh[q]_n intrinsics	Jonathan Wright	3	-48/+23
	Rewrite vq[r]dmulh[q]_n Neon intrinsics to use RTL builtins rather than inline assembly code, allowing for better scheduling and optimization. gcc/ChangeLog: 2021-02-08 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd-builtins.def: Add sq[r]dmulh_n builtin generator macros. * config/aarch64/aarch64-simd.md (aarch64_sq<r>dmulh_n<mode>): Define. * config/aarch64/arm_neon.h (vqdmulh_n_s16): Use RTL builtin instead of inline asm. (vqdmulh_n_s32): Likewise. (vqdmulhq_n_s16): Likewise. (vqdmulhq_n_s32): Likewise. (vqrdmulh_n_s16): Likewise. (vqrdmulh_n_s32): Likewise. (vqrdmulhq_n_s16): Likewise. (vqrdmulhq_n_s32): Likewise.
2021-04-28	AVR cc0 conversion	Senthil Kumar Selvaraj	5	-1183/+4111
	See https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563638.html for background. This patch converts the avr backend to MODE_CC. It addresses some of the comments made in the previous submission over here (https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561757.html). Specifically, this patch has 1. Automatic clobber of REG_CC in inline asm statements, via TARGET_MD_ASM_ADJUST hook. 2. Direct clobber of REG_CC in insns emitted after reload (pro and epilogue). 3. Regression testing done on atmega8, atmega128, attiny40 and atxmega128a3 devices (more details below). 4. Verification and fixes for casesi and avr_compare_pattern related code that inspects insns, by looking at avr-casesi and mach RTL dumps. 5. Use length of parallel instead of passing in operand counts when generating code for shift patterns. 6. Fixes for indentation glitches. 7. Removal of CC_xxx stuff in avr-protos.h. In the places where the macros were still used (cond_string), I've replaced them with a bool hardcoded to false. I expect this will go away/get fixed when I eventually add specific CC modes. Things still to do: 1. Adjustment of peepholes/define_splits to match against patterns with REG_CC clobber. 2. Model effect of non-compare insns on REG_CC using additional CC modes. I'm hoping to use a modified version of the cc attribute and define_subst (again inspired by the cris port), to do this. 3. RTX cost adjustment. gcc/ * config/avr/avr-dimode.md: Turn existing patterns into define_insn_and_split style patterns where the splitter adds a clobber of the condition code register. Drop "cc" attribute. Add new patterns to match output of the splitters. * config/avr/avr-fixed.md: Likewise. * config/avr/avr.c (cc_reg_rtx): New. (avr_parallel_insn_from_insns): Adjust insn count for removal of set of cc0. (avr_is_casesi_sequence): Likewise. (avr_casei_sequence_check_operands): Likewise. (avr_optimize_casesi): Likewise. Also insert new insns after jump_insn.
(avr_pass_casesi::avr_rest_of_handle_casesi): Adjust for removal of set of cc0. (avr_init_expanders): Initialize cc_reg_rtx. (avr_regno_reg_class): Handle REG_CC. (cond_string): Remove usage of CC_OVERFLOW_UNUSABLE. (avr_notice_update_cc): Remove function. (ret_cond_branch): Remove usage of CC_OVERFLOW_UNUSABLE. (compare_condition): Adjust for PARALLEL with REG_CC clobber. (out_shift_with_cnt): Likewise. (ashlhi3_out): Likewise. (ashrhi3_out): Likewise. (lshrhi3_out): Likewise. (avr_class_max_nregs): Return single reg for REG_CC. (avr_compare_pattern): Check for REG_CC instead of cc0_rtx. (avr_reorg_remove_redundant_compare): Likewise. (avr_reorg): Adjust for PARALLEL with REG_CC clobber. (avr_hard_regno_nregs): Return single reg for REG_CC. (avr_hard_regno_mode_ok): Allow only CCmode for REG_CC. (avr_md_asm_adjust): Clobber REG_CC. (TARGET_HARD_REGNO_NREGS): Define. (TARGET_CLASS_MAX_NREGS): Define. (TARGET_MD_ASM_ADJUST): Define. * config/avr/avr.h (FIRST_PSEUDO_REGISTER): Adjust for REG_CC. (enum reg_class): Add CC_REG class. (NOTICE_UPDATE_CC): Remove. (CC_OVERFLOW_UNUSABLE): Remove. (CC_NO_CARRY): Remove. * config/avr/avr.md: Turn existing patterns into define_insn_and_split style patterns where the splitter adds a clobber of the condition code register. Drop "cc" attribute. Add new patterns to match output of the splitters. (sez): Remove unused pattern.
2021-04-28	arm: fix UB due to missing mode check [PR100311]	Richard Earnshaw	1	-1/+1
	Some places in the compiler iterate over all the fixed registers to check if that register can be used in a particular mode. The idiom is to iterate over the register and then for that register, if it supports the current mode to check all that register and any additional registers needed (HARD_REGNO_NREGS). If these two checks are not fully aligned then it is possible to generate a buffer overrun when testing data objects that are sized by the number of hard regs in the machine. The VPR register is a case where these checks were not consistent and because this is the last HARD register the result was that we ended up overflowing the fixed_regs array. gcc: PR target/100311 * config/arm/arm.c (arm_hard_regno_mode_ok): Only allow VPR to be used in HImode.
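The class of bug described here can be sketched with a toy register file. Everything below is hypothetical (the register count, the 2-byte register width, and the function names); it shows a generic consistent check, whereas the actual fix restricts VPR to HImode.

```c
#include <stdbool.h>

enum { NUM_HARD_REGS = 4 };   /* hypothetical register file size */

/* Only the last register is fixed in this toy model.  */
static const bool fixed_regs_model[NUM_HARD_REGS] = { false, false, false, true };

/* Hypothetical: a value of WIDTH bytes occupies this many 2-byte registers.  */
static int
nregs_for (int width)
{
  return (width + 1) / 2;
}

/* A register accepts a value only if every register the value would
   occupy exists, so regno + nregs never runs past the array.  */
static bool
mode_ok (int regno, int width)
{
  return regno + nregs_for (width) <= NUM_HARD_REGS;
}

/* The iteration idiom: check the mode first, then walk all registers the
   value occupies.  Skipping the first check is what allows an overrun.  */
static bool
any_fixed (int regno, int width)
{
  if (!mode_ok (regno, width))
    return false;
  for (int i = 0; i < nregs_for (width); i++)
    if (fixed_regs_model[regno + i])
      return true;
  return false;
}
```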
2021-04-28	aarch64: Fix address mode for vec_concat pattern [PR100305]	Richard Sandiford	1	-0/+2
	The load_pair_lanes<mode> patterns match a vec_concat of two adjacent 64-bit memory locations as a single 128-bit load. The Utq constraint made sure that the address was suitable for a 128-bit vector, but this meant that it allowed some addresses that aren't valid for the 64-bit element mode. Two obvious fixes were: (1) Continue to accept addresses that aren't valid for the element modes. This would mean changing the mode of operands[1] before printing it. It would also mean using a custom predicate instead of the current memory_operand. (2) Restrict addresses to the intersection of those that are valid element and vector addresses. The problem with (1) is that, as well as being more complicated, it doesn't deal with the fact that we still have a memory_operand for the second element. If we encourage the first operand to be outside the range of a normal element memory_operand, we'll have to reload the second operand to make it valid. This reload will often be dead code, but will be kept around because the RTL pattern makes it look as though the second element address is still needed. This patch therefore does (2) instead. As mentioned in the PR notes, I think we have a general problem with the way that the aarch64 port deals with paired addresses. There's nothing to guarantee that the two addresses will be reloaded in a way that keeps them “obviously” adjacent, so the rtx_equal_p conditions could fail if something rechecked them later. For this particular pattern, I think it would be better to teach simplify-rtx.c to fold the vec_concat to a normal vector memory reference, to remove any suggestion that targets should try to match the unsimplified form. That obviously wouldn't be suitable for backports though. gcc/ PR target/100305 * config/aarch64/constraints.md (Utq): Require the address to be valid for both the element mode and for V2DImode. gcc/testsuite/ PR target/100305 * gcc.c-torture/compile/pr100305.c: New test.
2021-04-27	aix: Alias -m64 to -maix64 and -m32 to -maix32.	David Edelsohn	2	-0/+12
	GCC on AIX historically has used -maix64 and -maix32 to switch to 64 bit mode or 32 bit mode, unlike other ports that use -m64 and -m32. The Alias() directive for options cannot be used because aix64 is expected in multiple parts of the compiler infrastructure and one cannot switch to -m64 due to backward compatibility. This patch defines DRIVER_SELF_SPECS to translate -m64 to -maix64 and -m32 to -maix32 so that the command line option compatible with other targets can be used while continuing to allow the historical options. gcc/ChangeLog: * config/rs6000/aix.h (SUBTARGET_DRIVER_SELF_SPECS): New. * config/rs6000/aix64.opt (m64): New. (m32): New.
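The effect of the translation can be modelled in a few lines of C. This is only an illustrative sketch of the option mapping described above, not the driver's spec machinery; the function name `translate_aix_option` is hypothetical and does not exist in GCC.

```c
#include <string.h>

/* Hypothetical helper: map the generic option spelling to the
   historical AIX spelling and leave every other option untouched.  */
const char *
translate_aix_option (const char *arg)
{
  if (strcmp (arg, "-m64") == 0)
    return "-maix64";
  if (strcmp (arg, "-m32") == 0)
    return "-maix32";
  return arg;
}
```

Because the historical spellings pass through unchanged, both `-m64` and `-maix64` end up as the same internal option, which is the backward-compatible behaviour the commit message asks for.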
2021-04-27	VAX: Accept ASHIFT in address expressions	Maciej W. Rozycki	1	-13/+21
	Fix regressions: FAIL: gcc.c-torture/execute/20090113-2.c -O1 (internal compiler error) FAIL: gcc.c-torture/execute/20090113-2.c -O1 (test for excess errors) FAIL: gcc.c-torture/execute/20090113-3.c -O1 (internal compiler error) FAIL: gcc.c-torture/execute/20090113-3.c -O1 (test for excess errors) triggering if LRA is used rather than old reload and caused by: (plus:SI (plus:SI (mult:SI (reg:SI 30 [ _10 ]) (const_int 4 [0x4])) (reg/f:SI 26 [ _6 ])) (const_int 12 [0xc])) coming from: (insn 58 57 59 10 (set (reg:SI 33 [ _13 ]) (zero_extract:SI (mem:SI (plus:SI (plus:SI (mult:SI (reg:SI 30 [ _10 ]) (const_int 4 [0x4])) (reg/f:SI 26 [ _6 ])) (const_int 12 [0xc])) [4 _6->bits[_10]+0 S4 A32]) (reg:QI 56) (reg:SI 53))) ".../gcc/testsuite/gcc.c-torture/execute/20090113-2.c":64:12 490 {extzv_non_const} (expr_list:REG_DEAD (reg:QI 56) (expr_list:REG_DEAD (reg:SI 53) (expr_list:REG_DEAD (reg:SI 30 [ _10 ]) (expr_list:REG_DEAD (reg/f:SI 26 [ _6 ]) (nil)))))) being converted into: (plus:SI (plus:SI (ashift:SI (reg:SI 30 [ _10 ]) (const_int 2 [0x2])) (reg/f:SI 26 [ _6 ])) (const_int 12 [0xc])) which is an rtx the VAX backend currently does not recognize as a valid machine address, although apparently it is only inside MEM rtx's that indexed addressing is supposed to be canonicalized to a MULT rather than ASHIFT form. Handle the ASHIFT form too throughout the backend then. 
The change appears to also improve code generation with old reload and code size stats are as follows, collected from 18153 executables built in `check-c' GCC testing:

              samples average  median
--------------------------------------
regressions       47   0.702%  0.521%
unchanged      17503   0.000%  0.000%
progressions     603  -0.920% -0.403%
--------------------------------------
total          18153  -0.029%  0.000%

with a small number of outliers (over 5% size change):

  old    new  change   %change  filename
----------------------------------------------------
 1885   1645    -240  -12.7320  pr53505.exe
 1331   1221    -110   -8.2644  pr89634.exe
 1553   1473     -80   -5.1513  stdatomic-vm.exe
 1413   1341     -72   -5.0955  pr45830.exe
 1415   1343     -72   -5.0883  stdatomic-vm.exe
25765  24463   -1302   -5.0533  strlen-5.exe
25765  24463   -1302   -5.0533  strlen-5.exe
25765  24463   -1302   -5.0533  strlen-5.exe
 1191   1131     -60   -5.0377  20050527-1.exe

(all changes on the expansion side are below 5%). gcc/ * config/vax/vax.c (print_operand_address, vax_address_cost_1) (index_term_p): Handle ASHIFT too.
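The equivalence the backend has to recognize (a multiplication by 4 and a left shift by 2 describe the same index scaling) can be sketched as follows. The enum, the function name `index_scale`, and the assumed set of valid scale factors (1, 2, 4, 8) are illustrative stand-ins, not the actual code in vax.c.

```c
/* Hypothetical, simplified stand-ins for the two RTL codes involved.  */
enum index_code { CODE_MULT, CODE_ASHIFT };

/* Return the scale factor implied by an index term, or 0 if the term
   is not a valid index.  The accepted scales (1, 2, 4, 8) are an
   assumption made for this sketch.  */
int
index_scale (enum index_code code, long constant)
{
  switch (code)
    {
    case CODE_MULT:
      /* Multiplication form: the constant is the scale itself.  */
      if (constant == 1 || constant == 2 || constant == 4 || constant == 8)
        return (int) constant;
      return 0;
    case CODE_ASHIFT:
      /* Shift form: the constant is log2 of the scale.  */
      if (constant >= 0 && constant <= 3)
        return 1 << constant;
      return 0;
    }
  return 0;
}
```

With this shape, `(mult reg 4)` and `(ashift reg 2)` both yield a scale of 4, so either form can be accepted wherever an index term is expected.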
2021-04-27	VAX: Fix ill-formed `jbb<ccss>i<mode>' insn operands	Maciej W. Rozycki	1	-6/+4
	The insn has extraneous operand #3 that is aliased in RTL to operand #0 with a constraint. The operands specify a single-bit field in memory that the machine instruction produced both reads for the purpose of determining whether to branch or not and either clears or sets according to the machine operation selected with the `ccss' iterator. The caller of the insn is supposed to supply the same rtx for both operands. This odd arrangement happens to work with old reload, but breaks with libatomic if LRA is used instead: .../libatomic/flag.c: In function 'atomic_flag_test_and_set': .../libatomic/flag.c:36:1: error: unable to generate reloads for: 36 | } | ^ (jump_insn 7 6 19 2 (unspec_volatile [ (set (pc) (if_then_else (eq (zero_extract:SI (mem/v:QI (reg:SI 27) [-1 S1 A8]) (const_int 1 [0x1]) (const_int 0 [0])) (const_int 1 [0x1])) (label_ref:SI 25) (pc))) (set (zero_extract:SI (mem/v:QI (reg:SI 28) [-1 S1 A8]) (const_int 1 [0x1]) (const_int 0 [0])) (const_int 1 [0x1])) ] 100) ".../libatomic/flag.c":35:10 669 {jbbssiqi} (nil) -> 25) during RTL pass: reload .../libatomic/flag.c:36:1: internal compiler error: in curr_insn_transform, at lra-constraints.c:4098 0x1112c587 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*) .../gcc/rtl-error.c:108 0x10ee6563 curr_insn_transform .../gcc/lra-constraints.c:4098 0x10eeaf87 lra_constraints(bool) .../gcc/lra-constraints.c:5133 0x10ec97e3 lra(_IO_FILE*) .../gcc/lra.c:2336 0x10e4633f do_reload .../gcc/ira.c:5827 0x10e46b27 execute .../gcc/ira.c:6013 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. Switch to using `match_dup' as expected then for a machine instruction that in its encoding only has one actual operand for the single-bit field. gcc/ * config/vax/builtins.md (jbb<ccss>i<mode>): Remove operand #3. (sync_lock_test_and_set<mode>): Adjust accordingly. 
(sync_lock_release<mode>): Likewise.
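For context, the libatomic function named in the error implements the C11 `atomic_flag_test_and_set` operation: read a flag and set it in a single step, which is the kind of read-and-modify the single-bit insn provides. A minimal user-level sketch with the standard `<stdatomic.h>` API follows; the names `lock_flag`, `try_acquire` and `release` are made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative flag; starts in the clear state.  */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Atomically set the flag and report whether this caller was the one
   that changed it from clear to set.  */
bool
try_acquire (void)
{
  return !atomic_flag_test_and_set (&lock_flag);
}

/* Clear the flag again so that a later try_acquire can succeed.  */
void
release (void)
{
  atomic_flag_clear (&lock_flag);
}
```

The flag is read and written through the same object, which mirrors why the machine instruction needs only one operand for the bit field.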
2021-04-27	VAX: Remove dead `adjacent_operands_p' function	Maciej W. Rozycki	2	-74/+0
	This function has never been used and it is unclear what its intended purpose was. gcc/ * config/vax/vax-protos.h (adjacent_operands_p): Remove prototype. * config/vax/vax.c (adjacent_operands_p): Remove.
2021-04-27	powerpc: fix bootstrap.	David Edelsohn	1	-0/+2
	gcc/ChangeLog: * config/rs6000/rs6000.c (rs6000_aix_precompute_tls_p): Protect with TARGET_AIX_OS.
2021-04-27	aix: TLS precompute register parameters (PR 94177)	David Edelsohn	2	-1/+15
	AIX uses a compiler-managed TOC for global data, including TLS symbols. The GCC TOC implementation manages the TOC entries through the constant pool. TLS symbols sometimes require a function call to obtain the TLS base pointer. The arguments to the TLS call can conflict with arguments to a normal function call if the TLS symbol is an argument in the normal call. GCC specifically checks for this situation and precomputes the TLS arguments, but the mechanism to check for this requirement utilizes legitimate_constant_p(). The results of legitimate_constant_p() needed for correct TOC behavior and for correct TLS argument behavior are in conflict. This patch adds a new target hook precompute_tls_p() to decide if an argument should be precomputed regardless of the result from legitimate_constant_p(). gcc/ChangeLog: PR target/94177 * calls.c (precompute_register_parameters): Additionally test targetm.precompute_tls_p to pre-compute argument. * config/rs6000/aix.h (TARGET_PRECOMPUTE_TLS_P): Define. * config/rs6000/rs6000.c (rs6000_aix_precompute_tls_p): New. * target.def (precompute_tls_p): New. * doc/tm.texi.in (TARGET_PRECOMPUTE_TLS_P): Add hook documentation. * doc/tm.texi: Regenerated.
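The source-level situation described above, a TLS symbol used directly as an argument of an ordinary call, looks roughly like the sketch below. It is not the PR 94177 test case; the names are invented, and whether computing the TLS address requires a helper call depends on the target and TLS model.

```c
/* Illustrative thread-local variable (GCC `__thread' extension).  */
static __thread int tls_counter;

/* An ordinary function that receives a pointer argument.  */
static int
bump (int *p, int by)
{
  *p += by;
  return *p;
}

/* The address of the TLS variable is an argument of a normal call.
   On targets where obtaining that address needs its own call, the
   address has to be computed before the argument registers of
   `bump' are set up.  */
int
bump_tls_counter (int by)
{
  return bump (&tls_counter, by);
}
```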
2021-04-27	aarch64: Fix up last commit [PR100200]	Jakub Jelinek	1	-1/+1
	Pedantically, signed vs. unsigned mismatches in va_arg are only well defined if the value can be represented in both signed and unsigned integer types. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR target/100200 * config/aarch64/aarch64.c (aarch64_print_operand): Cast -UINTVAL back to HOST_WIDE_INT.
2021-04-27	arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]	Alex Coplan	3	-18/+51
	The PR shows two ICEs with __sync_bool_compare_and_swap and -mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one later on, after the CAS insn is split. The LRA ICE occurs because the @atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie two output operands together (operands 0 and 1 in the third alternative). LRA can't handle this, since it doesn't make sense for an insn to assign to the same operand twice. The later (post-splitting) ICE occurs because the expansion of the cbranchsi4_scratch insn doesn't quite go according to plan. As it stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch, attempting to pass a register (neg_bval) to use as a scratch register. However, since the RTL template has a match_scratch here, gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx. Since this is all happening after RA, this is doomed to fail (and we get an ICE about the insn not matching its constraints). It seems that the motivation for the choice of constraints in the atomic_compare_and_swap pattern comes from an attempt to satisfy the constraints of the cbranchsi4_scratch insn. This insn requires the scratch register to be the same as the input register in the case that we use a larger negative immediate (one that satisfies J, but not L). Of course, as noted above, LRA refuses to assign two output operands to the same register, so this was never going to work. The solution I'm proposing here is to collapse the alternatives to the CAS insn (allowing the two output register operands to be matched to different registers) and to ensure that the constraints for cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by inserting a move to ensure the source and destination registers match if necessary (i.e. in the case of large negative immediates). Another notable change here is that we only do: emit_move_insn (neg_bval, const1_rtx); for non-negative immediates. 
This is because the ADDS instruction used in the negative case suffices to leave a suitable value in neg_bval: if the operands compare equal, we don't take the branch (so neg_bval will be set by the load exclusive). Otherwise, the ADDS will leave a nonzero value in neg_bval, which will correctly signal that the CAS has failed when it is later negated. gcc/ChangeLog: PR target/99977 * config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen with negative immediates: ensure we expand cbranchsi4_scratch correctly and ensure we satisfy its constraints. * config/arm/sync.md (@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't attempt to tie two output operands together with constraints; collapse two alternatives. (@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise. * config/arm/thumb1.md (cbranchsi4_neg_late): New. gcc/testsuite/ChangeLog: PR target/99977 * gcc.target/arm/pr99977.c: New test.
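A user-level call of the kind discussed here could look like the sketch below. It is not the pr99977.c test; the function name and the constant -8 are chosen for illustration only, and whether the negative-immediate path is taken depends on the target options in use.

```c
/* Compare *p against a negative constant and, if equal, store
   `desired'.  Returns nonzero when the swap happened.  Uses the GCC
   __sync_bool_compare_and_swap builtin named in the commit message.  */
int
swap_if_minus_eight (int *p, int desired)
{
  return __sync_bool_compare_and_swap (p, -8, desired);
}
```

The boolean result is what the split sequence ultimately derives from neg_bval: zero in neg_bval means the compare succeeded, nonzero means the CAS failed.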
2021-04-27	aarch64: Fix UB in the compiler [PR100200]	Jakub Jelinek	3	-7/+8
	The following patch fixes UBs in the compiler when negating a CONST_INT containing HOST_WIDE_INT_MIN. I've changed the spots where there wasn't an obvious earlier condition check or predicate that would fail for such CONST_INTs. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR target/100200 * config/aarch64/predicates.md (aarch64_sub_immediate, aarch64_plus_immediate): Use -UINTVAL instead of -INTVAL. * config/aarch64/aarch64.md (casesi, rotl<mode>3): Likewise. * config/aarch64/aarch64.c (aarch64_print_operand, aarch64_split_atomic_op, aarch64_expand_subvti): Likewise.
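The underlying C issue can be shown without any GCC internals. Negating the most negative signed value overflows, which is undefined, whereas negating the same bits as an unsigned value wraps and is well defined. The sketch below uses `int64_t`/`uint64_t` as stand-ins for HOST_WIDE_INT and its unsigned counterpart; the function name is hypothetical.

```c
#include <stdint.h>

/* Negate X without signed overflow: do the arithmetic in the unsigned
   type, where wraparound is defined, then convert back.  The final
   conversion is implementation-defined for out-of-range values; GCC
   documents it as reduction modulo 2^64.  */
int64_t
negate_wrapping (int64_t x)
{
  return (int64_t) (-(uint64_t) x);
}
```

For ordinary values this is plain negation; for `INT64_MIN` it returns `INT64_MIN` on GCC instead of invoking undefined behaviour as `-x` would.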
2021-04-27	arm: fix UB when compiling thumb2 with PIC [PR100236]	Richard Earnshaw	1	-3/+7
	arm_compute_save_core_reg_mask contains UB in that the saved PIC register number is used to create a bit mask. However, for some target options this register is undefined and we end up with a shift of ~0. On native compilations this is benign since the shift will still be large enough to move the bit outside of the range of the mask, but if cross compiling from a system that truncates out-of-range shifts to zero (or worse, raises a trap for such values) we'll get potentially wrong code (or a fault). gcc: PR target/100236 * config/arm/arm.c (THUMB2_WORK_REGS): Check PIC_OFFSET_TABLE_REGNUM is valid before including it in the mask.
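The general pattern of the fix, checking that a register number is usable before shifting by it, can be sketched as below. The sentinel `NO_REGISTER`, the 32-bit mask width, and the function name are assumptions for illustration; the actual change is to the THUMB2_WORK_REGS macro.

```c
/* Hypothetical sentinel meaning "no such register".  */
#define NO_REGISTER (~0u)

/* Return the mask bit for REGNO, or 0 when REGNO is out of range.
   Shifting a 32-bit value by 32 or more is undefined in C, so the
   range check must come before the shift.  */
unsigned int
reg_mask_bit (unsigned int regno)
{
  return regno < 32 ? 1u << regno : 0u;
}
```

With the check in place, an undefined register contributes nothing to the mask on every host, instead of relying on how the host happens to treat an oversized shift.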
2021-04-27	aarch64: Handle SVE attributes in comp_type_attributes [PR100270]	Richard Sandiford	1	-0/+4
	Even though "SVE type" and "SVE sizeless type" are marked as affecting type identity, the middle end doesn't truly believe it unless we also handle them in comp_type_attributes. gcc/ PR target/100270 * config/aarch64/aarch64.c (aarch64_comp_type_attributes): Handle SVE attributes. gcc/testsuite/ PR target/100270 * gcc.target/aarch64/sve/acle/general-c/pr100270_1.c: New test. * gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Change expected error message when subtracting pointers to different vector types. Expect warnings when mixing them elsewhere. * gcc.target/aarch64/sve/acle/general/attributes_7.c: Remove XFAILs. Tweak error messages for some cases.
2021-04-27	i386: Improve [QH]Imode rotates with masked shift count [PR99405]	Jakub Jelinek	1	-19/+19
	The following testcase shows that while we nicely optimize away the useless and? of shift count before rotation for [SD]Imode rotates, we don't do that for [QH]Imode. The following patch optimizes that by using the right iterator on those 4 patterns. 2021-04-27 Jakub Jelinek <jakub@redhat.com> PR target/99405 * config/i386/i386.md (<insn><mode>3_mask, <insn><mode>3_mask_1): For any_rotate define_insn_split and following splitters, use SWI iterator instead of SWI48. * gcc.target/i386/pr99405.c: New test.
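The kind of source that benefits is a narrow rotate whose count is masked explicitly. The functions below are an illustrative sketch, not the contents of pr99405.c; they use the common masked-count rotate idiom for 8-bit and 16-bit values.

```c
/* Rotate an 8-bit value left by N; the count is masked to 0..7.  */
unsigned char
rotl8 (unsigned char x, unsigned int n)
{
  return (unsigned char) ((x << (n & 7)) | (x >> (-n & 7)));
}

/* Rotate a 16-bit value left by N; the count is masked to 0..15.  */
unsigned short
rotl16 (unsigned short x, unsigned int n)
{
  return (unsigned short) ((x << (n & 15)) | (x >> (-n & 15)));
}
```

The explicit `& 7` and `& 15` are the masks of the shift count that the commit message describes as useless before a rotate.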
2021-04-27	Synchronize Rocket Lake's processor_names and processor_cost_table with processor_type	Cui,Lili	1	-1/+1
	gcc/ChangeLog * common/config/i386/i386-common.c (processor_names): Sync processor_names with processor_type. * config/i386/i386-options.c (processor_cost_table): Sync processor_cost_table with processor_type.
2021-04-26	aarch64: Handle V4BF V8BF modes in vwcore attribute	Kyrylo Tkachov	1	-0/+1
	While playing with other unrelated changes I hit an assemble-failure bug: a pattern (one of the get_lane ones) was using V4BF and V8BF as part of a mode iterator and outputting registers with the vwcore attribute, but there is no vwcore mapping for V4BF and V8BF. This patch fixes that in the obvious way by adding the missing mappings. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: * config/aarch64/iterators.md (vwcore): Handle V4BF, V8BF.
2021-04-26	Simplify {gimplify_and_,}update_call_from_tree API	Richard Biener	1	-1/+1
	This removes update_call_from_tree in favor of gimplify_and_update_call_from_tree, removing some code duplication and simplifying the API use. Some users of update_call_from_tree have been transitioned to replace_call_with_value and the API and its dependences have been moved to gimple-fold.h. This shaves off another user of valid_gimple_rhs_p which is now only used from within gimple-fold.c and thus moved and made private. 2021-04-14 Richard Biener <rguenther@suse.de> * tree-ssa-propagate.h (valid_gimple_rhs_p): Remove. (update_gimple_call): Likewise. (update_call_from_tree): Likewise. * tree-ssa-propagate.c (valid_gimple_rhs_p): Remove. (valid_gimple_call_p): Likewise. (move_ssa_defining_stmt_for_defs): Likewise. (finish_update_gimple_call): Likewise. (update_gimple_call): Likewise. (update_call_from_tree): Likewise. (propagate_tree_value_into_stmt): Use replace_call_with_value. * gimple-fold.h (update_gimple_call): Declare. * gimple-fold.c (valid_gimple_rhs_p): Move here from tree-ssa-propagate.c. (update_gimple_call): Likewise. (valid_gimple_call_p): Likewise. (finish_update_gimple_call): Likewise, and simplify. (gimplify_and_update_call_from_tree): Implement update_call_from_tree functionality, avoid excessive push/pop_gimplify_context. (gimple_fold_builtin): Use only gimplify_and_update_call_from_tree. (gimple_fold_call): Likewise. * gimple-ssa-sprintf.c (try_substitute_return_value): Likewise. * tree-ssa-ccp.c (ccp_folder::fold_stmt): Likewise. (pass_fold_builtins::execute): Likewise. (optimize_stack_restore): Use replace_call_with_value. * tree-cfg.c (fold_loop_internal_call): Likewise. * tree-ssa-dce.c (maybe_optimize_arith_overflow): Use only gimplify_and_update_call_from_tree. * tree-ssa-strlen.c (handle_builtin_strlen): Likewise. (handle_builtin_strchr): Likewise. * tsan.c: Include gimple-fold.h instead of tree-ssa-propagate.h. * config/rs6000/rs6000-call.c (rs6000_gimple_fold_builtin): Use replace_call_with_value.
2021-04-25	Add folding and remove expanders for x86 pcmp{eq,gt} builtins [PR target/98911]	liuhongt	3	-41/+61
	gcc/ChangeLog: PR target/98911 * config/i386/i386-builtin.def (BDESC): Change the icode of the following builtins to CODE_FOR_nothing. * config/i386/i386.c (ix86_gimple_fold_builtin): Fold IX86_BUILTIN_PCMPEQB128, IX86_BUILTIN_PCMPEQW128, IX86_BUILTIN_PCMPEQD128, IX86_BUILTIN_PCMPEQQ, IX86_BUILTIN_PCMPEQB256, IX86_BUILTIN_PCMPEQW256, IX86_BUILTIN_PCMPEQD256, IX86_BUILTIN_PCMPEQQ256, IX86_BUILTIN_PCMPGTB128, IX86_BUILTIN_PCMPGTW128, IX86_BUILTIN_PCMPGTD128, IX86_BUILTIN_PCMPGTQ, IX86_BUILTIN_PCMPGTB256, IX86_BUILTIN_PCMPGTW256, IX86_BUILTIN_PCMPGTD256, IX86_BUILTIN_PCMPGTQ256. * config/i386/sse.md (avx2_eq<mode>3): Deleted. (sse2_eq<mode>3): Ditto. (sse4_1_eqv2di3): Ditto. (sse2_gt<mode>3): Rename to .. (sse2_gt<mode>3): .. this. gcc/testsuite/ChangeLog: PR target/98911 * gcc.target/i386/pr98911.c: New test. * gcc.target/i386/funcspec-8.c: Replace __builtin_ia32_pcmpgtq with __builtin_ia32_pcmpistrm128 since it has been folded.
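Folding these builtins means expressing them as generic element-wise vector comparisons, which GCC's vector extensions also expose directly in C. The sketch below shows the semantics of such comparisons (each lane becomes -1 when true and 0 when false); the typedef and function names are illustrative and this is not the pr98911.c test.

```c
/* Four 32-bit signed lanes, using the GCC vector extension.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise equality: -1 in each lane where A and B match, else 0.  */
v4si
lanes_equal (v4si a, v4si b)
{
  return a == b;
}

/* Lane-wise signed greater-than: -1 where A > B, else 0.  */
v4si
lanes_greater (v4si a, v4si b)
{
  return a > b;
}
```

Once a comparison is in this generic form, the middle end can constant-fold and combine it like any other vector operation, which an opaque target builtin does not allow.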
2021-04-24	Revert "Darwin : Adjust darwin_binds_local_p for PIC code [PR100152]."	Iain Sandoe	1	-13/+4
	Unfortunately, although this is required to fix the PR, and is notionally correct, it regresses some of the sanitizer and IPA tests. Reverting until this can be analysed. This reverts commit b6600392bf71c4a9785f8f49948b611425896830.