|
A recent commit introduced a compiler warning in thead.cc:
error: invalid suffix on literal; C++11 requires a space between literal and string macro [-Werror=literal-suffix]
1144 | fprintf (file, "(%s),"HOST_WIDE_INT_PRINT_DEC",%u", reg_names[REGNO (addr.reg)],
| ^
This commit addresses this issue and breaks the line such that it won't
exceed 80 characters.
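For context, a minimal illustration of the C++11 rule behind the warning,
using a hypothetical format macro in place of HOST_WIDE_INT_PRINT_DEC (this
is not the thead.cc code): without the space, the literal plus the macro
name is lexed as a user-defined string literal.
#include <cstdio>
#define FMT "%ld"   /* hypothetical stand-in for HOST_WIDE_INT_PRINT_DEC */
int main ()
{
  long x = 42;
  /* std::printf ("x="FMT"\n", x);   C++11: FMT is taken as a literal suffix,
     triggering -Wliteral-suffix (an error with -Werror=literal-suffix).  */
  std::printf ("x=" FMT "\n", x);    /* OK: space separates literal and macro */
  return 0;
}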
gcc/ChangeLog:
* config/riscv/thead.cc (th_print_operand_address): Fix compiler
warning.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
Two scratch registers, %r10 and %r11, are available at function entry for
large model profiling, but %r10 may be used by stack realignment and can't
be used in that case. Add x86_64_select_profile_regnum to find a
caller-saved register which isn't live, or a callee-saved register which
has been saved on the stack in the prologue, at entry for large model
profiling, and issue a sorry () diagnostic if no such register can be found.
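A hypothetical reproducer shape (not the actual pr113689 testcases, whose
contents are not shown here), to be compiled with -mcmodel=large -pg: the
over-aligned local forces stack realignment, so %r10 is not available as
the profiling scratch register.
extern void consume (void *);
void
needs_realignment (void)
{
  alignas (64) char buf[64];   /* over-aligned local forces stack realignment */
  consume (buf);
}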
gcc/
PR target/113689
* config/i386/i386.cc (x86_64_select_profile_regnum): New.
(x86_function_profiler): Call x86_64_select_profile_regnum to
get a scratch register for large model profiling.
gcc/testsuite/
PR target/113689
* gcc.target/i386/pr113689-1.c: New file.
* gcc.target/i386/pr113689-2.c: Likewise.
* gcc.target/i386/pr113689-3.c: Likewise.
|
|
Adds a missing bti instruction at the beginning of a virtual
thunk when BTI is enabled.
gcc/ChangeLog:
* config/arm/arm.cc (arm_output_mi_thunk): Emit
insn for bti_c when bti is enabled.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add v8_1_m_main_pacbti.
* g++.target/arm/bti_thunk.C: New test.
|
|
I was too sleepy writing this :(.
gcc/ChangeLog:
* config/mips/mips-msa.md (neg<mode:MSA>2): Add missing mode for
neg.
|
|
We expanded (neg x) to (minus const0 x) for MSA FP vectors. This is wrong
because -0.0 is not 0 - 0.0, and it causes some Python tests to fail when
Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
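A target-independent illustration of why (0 - x) is not a correct FP
negation: negating +0.0 must yield -0.0, but 0.0 - 0.0 yields +0.0 in the
default rounding mode.
#include <cmath>
#include <cstdio>
int main ()
{
  double x = 0.0;
  double by_sub = 0.0 - x;   /* +0.0: subtraction does not flip the sign bit */
  double by_neg = -x;        /* -0.0: negation must flip the sign bit */
  std::printf ("%d %d\n", std::signbit (by_sub), std::signbit (by_neg));  /* 0 1 */
  return 0;
}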
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
|
|
vzeroupper pass [PR113059]
The move of the vzeroupper pass from after the reload pass to after
postreload_cse helped only partially: CSE-like passes can still invalidate
those notes (especially REG_UNUSED) if they reuse, later in the IL, an
earlier register holding some value.
So, either we could try to move it one pass further after gcse2 and hope
no later pass invalidates the notes, or the following patch attempts to
restore the REG_DEAD/REG_UNUSED state from GCC 13 and earlier, where
the LRA or reload passes remove all REG_DEAD/REG_UNUSED notes and the notes
reappear only at the start of dse2 pass when it calls
df_note_add_problem ();
df_analyze ();
So, effectively
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
NEXT_PASS (pass_ree);
NEXT_PASS (pass_compare_elim_after_reload);
NEXT_PASS (pass_thread_prologue_and_epilogue);
passes operate without those notes in the IL.
While in GCC 14 mode switching computes the notes problem at the start of
vzeroupper, the patch below removes them at the end of the pass again, so
that the above passes continue to operate without them.
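A minimal sketch of that cleanup (assumed shape, not the verbatim patch),
placed at the end of rest_of_handle_insert_vzeroupper before its df_analyze
call: strip the REG_DEAD/REG_UNUSED notes again so the passes listed above
keep operating without them.
basic_block bb;
FOR_EACH_BB_FN (bb, cfun)
  {
    rtx_insn *insn;
    FOR_BB_INSNS (bb, insn)
      if (NONDEBUG_INSN_P (insn))
        {
          rtx *pnote = &REG_NOTES (insn);
          while (*pnote)
            if (REG_NOTE_KIND (*pnote) == REG_DEAD
                || REG_NOTE_KIND (*pnote) == REG_UNUSED)
              *pnote = XEXP (*pnote, 1);   /* unlink the note */
            else
              pnote = &XEXP (*pnote, 1);
        }
  }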
2024-02-05 Jakub Jelinek <jakub@redhat.com>
PR target/113059
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Remove REG_DEAD/REG_UNUSED notes at the end of the pass before
df_analyze call.
|
|
The following avoids re-using a register holding a pointer (and
thus possibly marked REG_POINTER) for the result of a pointer difference
computation. That might confuse heuristics in (broken) RTL alias
analysis which rely on REG_POINTER indicating that we're
dealing with a pointer.
This alone doesn't fix anything.
PR target/113255
* config/i386/i386-expand.cc
(expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves):
Use a new pseudo for the skipped number of bytes.
|
|
gcc/ChangeLog:
* config/riscv/riscv-cores.def: Add sifive-p450, sifive-p670.
* doc/invoke.texi (RISC-V Options): Add sifive-p450,
sifive-p670.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/mcpu-sifive-p450.c: New test.
* gcc.target/riscv/mcpu-sifive-p670.c: New test.
|
|
Add sifive p400 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p450-470.
gcc/ChangeLog:
* config/riscv/riscv.md: Include sifive-p400.md.
* config/riscv/sifive-p400.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p400.
* config/riscv/riscv.cc (sifive_p400_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p400-series.
|
|
gcc/ChangeLog:
* config/xtensa/xtensa.md (*eqne_zero_masked_bits):
Add missing ":SI" to the match_operator.
|
|
After LRA transition, HImode constants that don't fit into signed 12 bits
are no longer subject to constant synthesis:
/* example */
void test(void) {
short foo = 32767;
__asm__ ("" :: "r"(foo));
}
;; before
.literal_position
.literal .LC0, 32767
test:
l32r a9, .LC0
ret.n
This patch fixes that:
;; after
test:
movi.n a9, -1
extui a9, a9, 17, 15
ret.n
gcc/ChangeLog:
* config/xtensa/xtensa.md (SHI): New mode iterator.
(2 split patterns related to constsynth):
Change to also accept HImode operands.
|
|
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.
This was motivated by bt_skip_func and bt_find_func in xz and results in
nearly a 5% improvement in the dynamic instruction count for input #2, and
smaller but still visible improvements pretty much across the board. The
exceptions are perlbench input #1 and exchange2, which showed very small
regressions.
In the bt_find_func and bt_skip_func cases we have something like this:
> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
> (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
> (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
> (plus:DI (reg/v:DI 136 [ x ])
> (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
> (nil))
[ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
> (plus:DI (reg/v:DI 136 [ x ])
> (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
> (nil))
Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split. But there's a much better approach if we look at fwprop...
(set (reg:DI 142 [ _1 ])
(plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
(reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)
So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled. But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG. That results in the RTL above
having too high a cost and fwprop gives up.
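For readers less fluent in RTL, a hypothetical C reduction of the shape
above (names and types assumed; this is not the zz.c source): a
zero-extended 32-bit value feeding two 64-bit additions, which is exactly
what fwprop tries to propagate.
extern unsigned long sink (unsigned long, unsigned long);
unsigned long
example (unsigned int a, unsigned long b, unsigned long c)
{
  unsigned long x = a;       /* (zero_extend:DI (subreg:SI (reg a) 0)) */
  unsigned long t1 = x + b;  /* (plus:DI (reg x) (reg b)) */
  unsigned long t2 = x + c;  /* (plus:DI (reg x) (reg c)) */
  return sink (t1, t2);
}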
One approach would be to replace the REG_P with REG_P || SUBREG_P in the
costing code. I ultimately decided against that and instead check if the
operand in question passes register_operand.
By far the most important case to handle is the DImode PLUS. But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well. For
those other cases we're talking about improvements in the .000001% range.
While we are in stage4, this just hits cost modeling, which we've generally
agreed is still appropriate (though we were mostly talking about vector). So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)
gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
similarly.
gcc/testsuite/
* gcc.target/riscv/reg_subreg_costs.c: New test.
Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
|
|
We expanded (neg x) to (minus const0 x) for LSX FP vectors. This is wrong
because -0.0 is not 0 - 0.0, and it causes some Python tests to fail when
Python is built with LSX enabled.
Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead. We are already doing this for LASX and now we can unify them
into simd.md.
gcc/ChangeLog:
* config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg<mode:FVEC>2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
|
|
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:
if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
return 0;
And LSX_SUPPORTED_MODE_P is defined as:
#define LSX_SUPPORTED_MODE_P(MODE) \
(ISA_HAS_LSX \
&& GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...
GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:
ALWAYS_INLINE poly_uint16
mode_to_bytes (machine_mode mode)
{
#if GCC_VERSION >= 4001
return (__builtin_constant_p (mode)
? mode_size_inline (mode) : mode_size[mode]);
#else
return mode_size[mode];
#endif
}
There is an assertion in mode_size_inline:
gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);
Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bounds array access (the length of the
mode_size array is NUM_MACHINE_MODES).
So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
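A minimal sketch of the guard this implies (assumed shape, not the verbatim
diff), based on the check quoted above from loongarch_symbol_insns:
if (mode != MAX_MACHINE_MODE
    && (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)))
  return 0;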
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
|
|
This FAIL was introduced by r14-6908. When the constant vector permutation
implementations were merged, the 128-bit matching cases were not fully
considered: after the merge, the 128-bit expansion only supported
value-based 4-element set shuffles. This patch therefore implements the
full set of 128-bit vector constant permutations and makes some structural
adjustments to the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
(loongarch_expand_vselect_vconcat): Ditto.
(loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement
all 128-bit constant permutation situations.
(loongarch_expand_lsx_shuffle): Adjust and rename function name.
(loongarch_is_imm_set_shuffle): Renamed function name.
(loongarch_expand_vec_perm_even_odd): Function forward declaration.
(loongarch_expand_vec_perm_even_odd_1): Add implement for 128-bit
extract-even and extract-odd permutations.
(loongarch_is_odd_extraction): Delete.
(loongarch_is_even_extraction): Ditto.
(loongarch_expand_vec_perm_const): Adjust.
|
|
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR
violation is detected:
../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used
Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating local declaration.
|
|
This change implements __builtin_get_fpsr() and __builtin_set_fpsr(x)
to get and set the floating-point status register. They are used to
implement pa_atomic_assign_expand_fenv().
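A hedged usage sketch of the new builtins (the value type is assumed here;
the commit only names the builtins): save and restore the FP status
register around a call.
extern void work (void);
void
with_saved_fpsr (void)
{
  unsigned int fpsr = __builtin_get_fpsr ();   /* read the FP status register */
  work ();
  __builtin_set_fpsr (fpsr);                   /* restore the saved value */
}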
2024-02-02 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
PR target/59778
* config/pa/pa.cc (enum pa_builtins): Add PA_BUILTIN_GET_FPSR
and PA_BUILTIN_SET_FPSR builtins.
* (pa_builtins_icode): Declare.
* (def_builtin, pa_fpu_init_builtins): New.
* (pa_init_builtins): Initialize FPU builtins.
* (pa_builtin_decl, pa_expand_builtin_1): New.
* (pa_expand_builtin): Handle PA_BUILTIN_GET_FPSR and
PA_BUILTIN_SET_FPSR builtins.
* (pa_atomic_assign_expand_fenv): New.
* config/pa/pa.md (UNSPECV_GET_FPSR, UNSPECV_SET_FPSR): New
UNSPECV constants.
(get_fpsr, put_fpsr): New expanders.
(get_fpsr_32, get_fpsr_64, set_fpsr_32, set_fpsr_64): New
insn patterns.
|
|
This patch fixes the following:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,zero
vsetvli a5,zero,e32,m1,ta,ma ---> Redundant vsetvl.
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
The VSETVL pass is able to fuse the avl = 1 of the scalar move with the VLMAX avl of the reduction.
However, the following RTL blocks the fusion in the dependence analysis of the VSETVL pass:
(insn 49 24 50 5 (set (reg:RVVM1SI 98 v2 [148])
(if_then_else:RVVM1SI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI [
(const_int 1 [0x1])
repeat [
(const_int 0 [0])
]
])
(const_int 1 [0x1])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(const_vector:RVVM1SI repeat [
(const_int 0 [0])
])
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) 3813 {*pred_broadcastrvvm1si_zero}
(nil))
(insn 50 49 51 5 (set (reg:DI 15 a5 [151]) ----> It sets a5 and blocks fusing the following VLMAX into the scalar move above.
(unspec:DI [
(const_int 32 [0x20])
] UNSPEC_VLMAX)) 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 32 [0x20])
] UNSPEC_VLMAX)
(nil)))
(insn 51 50 52 5 (set (reg:RVVM1SI 97 v1 [150])
(unspec:RVVM1SI [
(unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 15 a5 [151])
(const_int 2 [0x2])
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(unspec:RVVM1SI [
(reg:RVVM1SI 97 v1 [orig:134 vect_result_14.6 ] [134])
(reg:RVVM1SI 98 v2 [148])
] UNSPEC_REDUC_SUM)
(unspec:RVVM1SI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF)
] UNSPEC_REDUC)) 17541 {pred_redsumrvvm1si}
(expr_list:REG_DEAD (reg:RVVM1SI 98 v2 [148])
(expr_list:REG_DEAD (reg:SI 66 vl)
(expr_list:REG_DEAD (reg:DI 15 a5 [151])
(expr_list:REG_DEAD (reg:DI 0 zero)
(nil))))))
This situation can only happen with auto-vectorization; it never happens with intrinsic code.
Since the reduction is passed a VLMAX AVL, it is more natural to also pass VLMAX to the scalar move that initializes the value of the reduction.
After this patch:
vsetvli a5,a1,e32,m1,tu,ma
slli a4,a5,2
sub a1,a1,a5
vle32.v v2,0(a0)
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
vsetvli a5,zero,e32,m1,ta,ma
vmv.s.x v2,zero
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
Tested on both RV32 and RV64 with no regressions.
PR target/113697
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_reduction): Pass VLMAX avl to scalar move.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr113697.c: New test.
|
|
This reverts commit 74489c19070703361acc20bc172f304cae845a96.
|
|
I realized in a recent benchmark evaluation (coremark-pro zip-test):
vid.v v2
vmv.v.i v5,0
.L9:
vle16.v v3,0(a4)
vrsub.vx v4,v2,a6 ---> LICM failed to hoist it outside the loop.
The root cause is:
(insn 56 47 57 4 (set (subreg:DI (reg:HI 220) 0)
(reg:DI 223)) "rvv.c":11:9 208 {*movdi_64bit} -> Its result is used by the following vrsub.vx, which suppresses hoisting of the vrsub.vx
(nil))
(insn 57 56 59 4 (set (reg:RVVMF2HI 216)
(if_then_else:RVVMF2HI (unspec:RVVMF32BI [
(const_vector:RVVMF32BI repeat [
(const_int 1 [0x1])
])
(reg:DI 350)
(const_int 2 [0x2]) repeated x2
(const_int 1 [0x1])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(minus:RVVMF2HI (vec_duplicate:RVVMF2HI (reg:HI 220))
(reg:RVVMF2HI 217))
(unspec:RVVMF2HI [
(reg:DI 0 zero)
] UNSPEC_VUNDEF))) "rvv.c":11:9 6938 {pred_subrvvmf2hi_reverse_scalar}
(expr_list:REG_DEAD (reg:HI 220)
(nil)))
This patch fixes it to generate (set (reg:HI) (subreg:HI (reg:DI))) instead of (set (subreg:DI (reg:HI) 0) (reg:DI)).
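A heavily hedged sketch of that kind of sequence (assumed, not the verbatim
riscv_legitimize_move change; dest and poly_val are placeholders):
rtx tmp = gen_reg_rtx (DImode);
emit_move_insn (tmp, poly_val);                     /* (set (reg:DI tmp) ...)                */
emit_move_insn (dest, gen_lowpart (HImode, tmp));   /* (set (reg:HI dest) (subreg:HI tmp 0)) */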
After this patch:
vid.v v2
vrsub.vx v2,v2,a7
vmv.v.i v4,0
.L3:
vle16.v v3,0(a4)
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Fix poly_int dest generation.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/poly_licm-1.c: New test.
* gcc.target/riscv/rvv/autovec/poly_licm-2.c: New test.
|
|
This patch cleans up some comments that are out of date or incorrect.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_get_arg_info): Cleanup comments.
(riscv_pass_by_reference): Ditto.
(riscv_fntype_abi): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
I realized there is an RTL regression between GCC-14 and GCC-13.
https://godbolt.org/z/Ga7K6MqaT
GCC-14:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
(insn 31 9 10 2 (parallel [
(set (reg:DI 15 a5 [138])
(unspec:DI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
(set (reg:SI 66 vl)
(unspec:SI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "/app/example.c":5:15 3281 {vsetvldi}
(nil))
GCC-13:
(insn 10 7 26 2 (set (reg/f:DI 11 a1 [139])
(plus:DI (reg:DI 11 a1 [142])
(const_int 800 [0x320]))) "/app/example.c":6:32 5 {adddi3}
(nil))
(insn 26 10 9 2 (parallel [
(set (reg:DI 15 a5)
(unspec:DI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
(set (reg:SI 66 vl)
(unspec:SI [
(reg:DI 0 zero)
(const_int 32 [0x20])
(const_int 7 [0x7])
] UNSPEC_VSETVL))
(set (reg:SI 67 vtype)
(unspec:SI [
(const_int 32 [0x20])
(const_int 7 [0x7])
(const_int 1 [0x1]) repeated x2
] UNSPEC_VSETVL))
]) "/app/example.c":5:15 792 {vsetvldi}
(nil))
GCC-13 doesn't have:
(insn 9 13 31 2 (set (reg:DI 15 a5 [138])
(unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)) "/app/example.c":5:15 2566 {vlmax_avldi}
(expr_list:REG_EQUIV (unspec:DI [
(const_int 64 [0x40])
] UNSPEC_VLMAX)
(nil)))
vsetvl_pre doesn't emit any assembly; it is only used to occupy a scalar register.
It should be removed in the VSETVL pass.
Tested on both RV32 and RV64 with no regressions.
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (vsetvl_pre_insn_p): New function.
(pre_vsetvl::cleaup): Remove vsetvl_pre.
(pre_vsetvl::remove_vsetvl_pre_insns): New function.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/vsetvl_pre-1.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
|
|
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt entries that match the multiply-add pattern to
facilitate 128-bit vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.
gcc/testsuite/ChangeLog:
* gfortran.dg/vect/vect-10.f90: New test.
|
|
model.
The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent. So model them as "one large
instruction", i.e. define_insn, with two output registers. The real
address is the sum of these two registers.
The advantage of this approach is that the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.
gcc/ChangeLog:
* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr. Emit a REG_EQUAL
note to allow CSE addressing __tls_get_addr.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code. If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add support for call36.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.
|
|
-mexplicit-relocs=auto.
Binutils does not support relaxation of sequences that use four
instructions to obtain symbol addresses.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the code model of the symbol is extreme and -mexplicit-relocs=auto,
the macro instruction loading symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, a non-zero addend in "la.local $rd,$rt,sym+addend"
is not allowed.
(loongarch_load_tls): Added macro support in extreme mode.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New predicate.
(symbolic_off64_or_reg_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_load_tls):
Load all types of tls symbols through one function.
(loongarch_got_load_tls_gd): Delete.
(loongarch_got_load_tls_ld): Delete.
(loongarch_got_load_tls_ie): Delete.
(loongarch_got_load_tls_le): Delete.
(loongarch_call_tls_get_addr): Modify the called function name.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete.
(@load_tls<mode>): New template.
(@got_load_tls_ld<mode>): Delete.
(@got_load_tls_le<mode>): Delete.
(@got_load_tls_ie<mode>): Delete.
|
|
values through fp.
Modify the address calculation logic from (((a x C) + fp) + offset) to
((fp + offset) + a x C), thereby changing the register dependencies and
optimizing the code. The value of C is 2, 4, or 8.
The following is the assembly code before and after a loop modification in spec2006 401.bzip2:
old | new
735 .L71: | 735 .L71:
736 slli.d $r12,$r15,2 | 736 slli.d $r12,$r15,2
737 ldx.w $r13,$r22,$r12 | 737 ldx.w $r13,$r22,$r12
738 addi.d $r15,$r15,-1 | 738 addi.d $r15,$r15,-1
739 slli.w $r16,$r15,0 | 739 slli.w $r16,$r15,0
740 addi.w $r13,$r13,-1 | 740 addi.w $r13,$r13,-1
741 slti $r14,$r13,0 | 741 slti $r14,$r13,0
742 add.w $r12,$r26,$r13 | 742 add.w $r12,$r26,$r13
743 maskeqz $r12,$r12,$r14 | 743 maskeqz $r12,$r12,$r14
744 masknez $r14,$r13,$r14 | 744 masknez $r14,$r13,$r14
745 or $r12,$r12,$r14 | 745 or $r12,$r12,$r14
746 ldx.bu $r14,$r30,$r12 | 746 ldx.bu $r14,$r30,$r12
747 lu12i.w $r13,4096>>12 | 747 alsl.d $r14,$r14,$r18,2
748 ori $r13,$r13,432 | 748 ldptr.w $r13,$r14,0
749 add.d $r13,$r13,$r3 | 749 addi.w $r17,$r13,-1
750 alsl.d $r14,$r14,$r13,2 | 750 stptr.w $r17,$r14,0
751 ldptr.w $r13,$r14,-1968 | 751 slli.d $r13,$r13,2
752 addi.w $r17,$r13,-1 | 752 stx.w $r12,$r22,$r13
753 st.w $r17,$r14,-1968 | 753 ldptr.w $r12,$r19,0
754 slli.d $r13,$r13,2 | 754 blt $r12,$r16,.L71
755 stx.w $r12,$r22,$r13 | 755 .align 4
756 ldptr.w $r12,$r18,-2048 | 756
757 blt $r12,$r16,.L71 | 757
758 .align 4 | 758
This patch is ported from riscv's commit r14-3111.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function.
(loongarch_legitimize_address): Add logical transformation code.
|
|
The fix for PR70321 introduced a splitter that split a doubleword
comparison into a pair of XORs followed by an IOR to set the (zero)
flags register. To help reload, the splitter forced SUBREG pieces of
double-word input values into a pseudo, but this regressed
gcc.target/i386/pr82580.c:
int f0 (U x, U y) { return x == y; }
from:
xorq %rdx, %rdi
xorq %rcx, %rsi
xorl %eax, %eax
orq %rsi, %rdi
sete %al
ret
to:
xchgq %rdi, %rsi
movq %rdx, %r8
movq %rcx, %rax
movq %rsi, %rdx
movq %rdi, %rcx
xorq %rax, %rcx
xorq %r8, %rdx
xorl %eax, %eax
orq %rcx, %rdx
sete %al
ret
To mitigate the regression, remove this legacy heuristic (workaround?).
There have been many incremental changes and improvements to x86 TImode
and register allocation, so this legacy workaround is not only no longer
useful, but it actually hurts register allocation. The patched compiler
now produces:
xchgq %rdi, %rsi
xorl %eax, %eax
xorq %rsi, %rdx
xorq %rdi, %rcx
orq %rcx, %rdx
sete %al
ret
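For reference, a self-contained variant of the quoted test (the typedef is
assumed here; the original pr82580.c source is not reproduced):
typedef unsigned __int128 U;   /* assumed 128-bit unsigned type */
int
f0 (U x, U y)
{
  return x == y;   /* doubleword compare: xor/xor/or/sete on x86-64 */
}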
PR target/113701
gcc/ChangeLog:
* config/i386/i386.md (*cmp<dwi>_doubleword):
Do not force SUBREG pieces to pseudos.
|
|
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
|
|
gcc/
* config/avr/avr.cc: Tabify.
|
|
Also add 'STATIC_ASSERT's for number of SGPR/VGPR/AVGPR registers (in
'#ifndef USED_FOR_TARGET', as otherwise 'STATIC_ASSERT' isn't available).
gcc/
* config/gcn/gcn.cc (gcn_hsa_declare_function_name): Don't
hard-code number of SGPR/VGPR/AVGPR registers.
* config/gcn/gcn.h: Add 'STATIC_ASSERT's for number of
SGPR/VGPR/AVGPR registers.
|
|
Add sifive p600 series scheduler module. For more information
see https://www.sifive.com/cores/performance-p650-670.
Adding sifive-p650 and sifive-p670 for the -mcpu option will come in separate patches.
gcc/ChangeLog:
* config/riscv/riscv.md: Add "fcvt_i2f", "fcvt_f2i" type
attribute, and include sifive-p600.md.
* config/riscv/generic-ooo.md: Update type attribute.
* config/riscv/generic.md: Update type attribute.
* config/riscv/sifive-7.md: Update type attribute.
* config/riscv/sifive-p600.md: New file.
* config/riscv/riscv-cores.def (RISCV_TUNE): Add parameter.
* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
Add sifive_p600.
* config/riscv/riscv.cc (sifive_p600_tune_info): New.
* config/riscv/riscv.h (TARGET_SFB_ALU): Update.
* doc/invoke.texi (RISC-V Options): Add sifive-p600-series.
|
|
The RISC-V Profiles specification is here:
https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#7-new-isa-extensions
These extensions don't add any new features but describe existing
features, so this patch only adds parsing.
Za64rs: Reservation set size of 64 bytes
Za128rs: Reservation set size of 128 bytes
Ziccif: Main memory supports instruction fetch with atomicity requirement
Ziccrse: Main memory supports forward progress on LR/SC sequences
Ziccamoa: Main memory supports all atomics in A
Zicclsm: Main memory supports misaligned loads/stores
Zic64b: Cache block size is 64 bytes
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b items.
* config/riscv/riscv.opt: New macro for 7 new unprivileged
extensions.
* doc/invoke.texi (RISC-V Options): Add Za64rs, Za128rs,
Ziccif, Ziccrse, Ziccamoa, Zicclsm, Zic64b extensions.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/za-ext.c: New test.
* gcc.target/riscv/zi-ext.c: New test.
|
|
g++.dg/asan/default-options-1.C FAILs on Solaris/SPARC and x86:
FAIL: g++.dg/asan/default-options-1.C -O0 execution test
FAIL: g++.dg/asan/default-options-1.C -O1 execution test
FAIL: g++.dg/asan/default-options-1.C -O2 execution test
FAIL: g++.dg/asan/default-options-1.C -O2 -flto execution test
FAIL: g++.dg/asan/default-options-1.C -O2 -flto -flto-partition=none execution test
FAIL: g++.dg/asan/default-options-1.C -O3 -g execution test
FAIL: g++.dg/asan/default-options-1.C -Os execution test
The failure is always the same:
AddressSanitizer: CHECK failed: asan_rtl.cpp:397 "((!AsanInitIsRunning() && "ASan init calls itself!")) != (0)" (0x0, 0x0) (tid=1)
This happens because libasan makes unportable assumptions about
initialization order that don't hold on Solaris. The problem has
already been fixed in clang by
[Driver] Link shared asan runtime lib with -z now on Solaris/x86
https://reviews.llvm.org/D156325
where it was way more prevalent.
This patch applies the same fix to gcc.
Tested on i386-pc-solaris2.11 (ld and gld) and sparc-sun-solaris2.11.
2024-01-30 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* config/sol2.h (LIBASAN_EARLY_SPEC): Add -z now unless
-static-libasan. Add missing whitespace.
|
|
machine description
They're not used there, and we avoid potentially out-of-sync definitions.
gcc/
* config/gcn/gcn.md (FIRST_SGPR_REG, LAST_SGPR_REG)
(FIRST_VGPR_REG, LAST_VGPR_REG, FIRST_AVGPR_REG, LAST_AVGPR_REG):
Don't 'define_constants'.
|
|
..., which was always (a) unused, and (b) bogus: always-false.
gcc/
* config/gcn/gcn.h (SGPR_OR_VGPR_REGNO_P): Remove.
|
|
For OpenACC/GCN '-march=gfx1100', a lot of libgomp OpenACC test cases FAIL:
/tmp/ccGfLJ8a.mkoffload.2.s:406:2: error: instruction not supported on this GPU
ds_cmpst_rtn_b32 v0, v0, v4, v3
^
In RDNA 3, 'ds_cmpst_[...]' has been replaced by 'ds_cmpstore_[...]', and the
notes for 'ds_cmpst_[...]' in pre-RDNA 3 ISA manuals:
Caution, the order of src and cmp are the *opposite* of the BUFFER_ATOMIC_CMPSWAP opcode.
..., have been resolved for 'ds_cmpstore_[...]' in the RDNA 3 ISA manual:
In this architecture the order of src and cmp agree with the BUFFER_ATOMIC_CMPSWAP opcode.
..., and therefore '%2', '%3' now swapped with regards to GCC operand order.
Most of the affected libgomp OpenACC test cases then PASS their execution test.
gcc/
* config/gcn/gcn.md (sync_compare_and_swap<mode>_lds_insn)
[TARGET_RDNA3]: Adjust.
|
|
This reverts commit 26c34b809cd1a6249027730a8b52bbf6a1c0f4a8.
|
|
This reverts commit e56fb037d9d265682f5e7217d8a4c12a8d3fddf8.
|
|
This reverts commit 23cd2961bd2ff63583f46e3499a07bd54491d45c.
|
|
Enables an assert that every typed instruction is associated with a
DFA reservation.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_sched_variable_issue): Enable assert.
|
|
Creates a new generic vector pipeline file common to all CPU tunes.
Moves all vector-related pipelines from generic-ooo to generic-vector-ooo.
Creates new vector-crypto-related insn reservations.
gcc/ChangeLog:
* config/riscv/generic-ooo.md (generic_ooo): Move reservation
(generic_ooo_vec_load): ditto
(generic_ooo_vec_store): ditto
(generic_ooo_vec_loadstore_seg): ditto
(generic_ooo_vec_alu): ditto
(generic_ooo_vec_fcmp): ditto
(generic_ooo_vec_imul): ditto
(generic_ooo_vec_fadd): ditto
(generic_ooo_vec_fmul): ditto
(generic_ooo_crypto): ditto
(generic_ooo_perm): ditto
(generic_ooo_vec_reduction): ditto
(generic_ooo_vec_ordered_reduction): ditto
(generic_ooo_vec_idiv): ditto
(generic_ooo_vec_float_divsqrt): ditto
(generic_ooo_vec_mask): ditto
(generic_ooo_vec_vesetvl): ditto
(generic_ooo_vec_setrm): ditto
(generic_ooo_vec_readlen): ditto
* config/riscv/riscv.md: Include generic-vector-ooo.md.
* config/riscv/generic-vector-ooo.md: New file; move the above reservations to here.
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Robin Dapp <rdapp.gcc@gmail.com>
|
|
This patch adds non-vector related insn reservations and updates/creates
new insn reservations so all non-vector typed instructions have a reservation.
gcc/ChangeLog:
* config/riscv/generic-ooo.md (generic_ooo_sfb_alu): Add reservation
(generic_ooo_branch): ditto
* config/riscv/generic.md (generic_sfb_alu): ditto
(generic_fmul_half): ditto
* config/riscv/riscv.md: Remove cbo, pushpop, and rdfrm types
* config/riscv/sifive-7.md (sifive_7_hfma): Add reservation
(sifive_7_popcount): ditto
* config/riscv/vector.md: change rdfrm to fmove
* config/riscv/zc.md: change pushpop to load/store
Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
|
|
After r14-1187-gd6b756447cd58b, simplify_gen_subreg can return
NULL for an "unaligned" memory subreg. Since V8DI has an alignment of 8 bytes,
using TImode causes simplify_gen_subreg to return NULL.
Fix the issue by using DImode for the loop instead; the LDP/STP pass can
later combine the accesses back into LDP/STP if needed.
Since strict alignment is less important (usually used only for firmware and
early boot), not emitting LDP/STP here is OK.
Built and tested for aarch64-linux-gnu with no regressions.
PR target/113657
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (split for movv8di):
For strict aligned mode, use DImode instead of TImode.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/acle/ls64_strict_align.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.
As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
For T{F,D}mode in GCC 15 I think we could consider relaxing the
restriction imposed in aarch64_classify_address, as typically T{F,D}mode
should be allocated to FPRs. But such a change seems too invasive to
consider for GCC 14 at this stage (let alone backports).
Fortunately the new flexible load/store pair patterns in GCC 14 allow
this mode change to work without further changes. The backports are
more involved as we need to adjust the load/store pair handling to cater
for V16QImode in a few places.
Note that for the testcase we are relying on the torture options to add
-funroll-loops at -O3 which is necessary to trigger the ICE on trunk
(but not on the 13 branch).
gcc/ChangeLog:
PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
gcc/testsuite/ChangeLog:
PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
|
|
gcc/
* config/avr/avr-mcus.def: Add AVR64DU28, AVR64DU32, ATA5787,
ATA5835, ATtiny64AUTO, ATA5700M322.
* doc/avr-mmcu.texi: Rebuild.
|
|
strub: introduce STACK_ADDRESS_OFFSET
Since STACK_POINTER_OFFSET is not necessarily at the boundary between
caller- and callee-owned stack, as desired by
__builtin_stack_address(), and using it as if it were or not causes
problems, introduce a new macro so that ports can define it suitably,
without modifying STACK_POINTER_OFFSET.
for gcc/ChangeLog
PR middle-end/112917
PR middle-end/113100
* builtins.cc (expand_builtin_stack_address): Use
STACK_ADDRESS_OFFSET.
* doc/extend.texi (__builtin_stack_address): Adjust.
* config/sparc/sparc.h (STACK_ADDRESS_OFFSET): Define.
* doc/tm.texi.in (STACK_ADDRESS_OFFSET): Document.
* doc/tm.texi: Rebuilt.
|