Commit log: gcc/config
2023-06-26  aarch64: Use <DWI> instead of <V2XWIDE> in scalar SQRSHRUN pattern  (Kyrylo Tkachov; 1 file changed, -10/+10)

In the scalar pattern for SQRSHRUN it's a bit clearer to use DWI instead of
V2XWIDE, making it obvious that no vector modes are involved.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

    * config/aarch64/aarch64-simd.md (aarch64_sqrshrun_n<mode>_insn):
    Use <DWI> instead of <V2XWIDE>.
    (aarch64_sqrshrun_n<mode>): Likewise.
2023-06-26  aarch64: Clean up some rounding immediate predicates  (Kyrylo Tkachov; 4 files changed, -24/+20)

aarch64_simd_rsra_rnd_imm_vec is now used for more than just RSRA
and accepts more than just vectors, so rename it to make it more truthful.
The aarch64_simd_rshrn_imm_vec is now unused and can be deleted.
No behavioural change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

    * config/aarch64/aarch64-protos.h (aarch64_const_vec_rsra_rnd_imm_p):
    Rename to...
    (aarch64_rnd_imm_p): ... This.
    * config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec):
    Rename to...
    (aarch64_int_rnd_operand): ... This.
    (aarch64_simd_rshrn_imm_vec): Delete.
    * config/aarch64/aarch64-simd.md (aarch64_<sra_op>rsra_n<mode>_insn):
    Adjust for the above.
    (aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): Likewise.
    (*aarch64_<shrn_op>rshrn_n<mode>_insn): Likewise.
    (*aarch64_sqrshrun_n<mode>_insn<vczle><vczbe>): Likewise.
    (aarch64_sqrshrun_n<mode>_insn): Likewise.
    (aarch64_<shrn_op>rshrn2_n<mode>_insn_le): Likewise.
    (aarch64_<shrn_op>rshrn2_n<mode>_insn_be): Likewise.
    (aarch64_sqrshrun2_n<mode>_insn_le): Likewise.
    (aarch64_sqrshrun2_n<mode>_insn_be): Likewise.
    * config/aarch64/aarch64.cc (aarch64_const_vec_rsra_rnd_imm_p):
    Rename to...
    (aarch64_rnd_imm_p): ... This.
2023-06-26  IBM zSystems: Assume symbols without explicit alignment to be ok  (Andreas Krebbel; 1 file changed, -2/+4)

A change we committed back in 2015 relies on the backend-requested ABI
alignment being applied to ALL symbols by the middle-end. However, this
does not appear to be the case for external symbols. With this commit we
assume all symbols without explicit alignment to be aligned according to
the ABI. That's the behavior we had before.

This fixes a performance regression caused by the 2015 patch. Since then
the addresses of external char type symbols have been pushed to the
literal pool, although it is safe to access them with larl (which
requires symbols to reside at even addresses).

gcc/

    * config/s390/s390.cc (s390_encode_section_info): Set
    SYMBOL_FLAG_SET_NOTALIGN2 only if the symbol has explicitly been
    misaligned.

gcc/testsuite/

    * gcc.target/s390/larl-1.c: New test.
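A hedged sketch of the kind of access this affects (the symbol and function names are hypothetical, not from the patch): an external char object with no explicit alignment is now again assumed to be ABI-aligned, so it can be addressed directly with larl rather than through the literal pool.

    /* 'flag' has no explicit alignment attribute; the s390 ABI
       guarantees at least 2-byte alignment, which is what larl
       requires, so its address need not go to the literal pool.  */
    extern char flag;

    int
    read_flag (void)
    {
      return flag;
    }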
2023-06-26  RISC-V: Remove duplicated extern function_base decl  (Pan Li; 1 file changed, -5/+0)

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins-bases.h: Remove duplicated decl.
2023-06-26  RISC-V: Remove redundant vcond patterns  (Juzhe-Zhong; 3 files changed, -61/+0)

Previously, Richi suggested that vcond patterns are only needed when the
target supports comparison + select in a single instruction. I did the
experiment of removing those "vcond" patterns, and it works perfectly:
all testcases PASS. I really appreciate that Richi helped us recognize
this issue. Now remove all "vcond" patterns as Richi suggested.

gcc/ChangeLog:

    * config/riscv/autovec.md (vcond<V:mode><VI:mode>): Remove
    redundant vcond patterns.
    (vcondu<V:mode><VI:mode>): Ditto.
    * config/riscv/riscv-protos.h (expand_vcond): Ditto.
    * config/riscv/riscv-v.cc (expand_vcond): Ditto.
2023-06-26  i386: New *ashl<dwi3>_doubleword_highpart define_insn_and_split  (Roger Sayle; 1 file changed, -0/+34)

This patch contains a pair of (related) optimizations in i386.md that
allow us to generate better code for the example below (this is a step
towards fixing a bugzilla PR, but I've forgotten the number).

    __int128 foo64 (__int128 x, long long y)
    {
      __int128 t = (__int128) y << 64;
      return x ^ t;
    }

The hidden issue is that the RTL currently seen by reload contains the
sign extension of y from DImode to TImode, even though this is dead
(not required) for left shifts by more than WORD_SIZE bits.

    (insn 11 8 12 2 (parallel [
        (set (reg:TI 0 ax [orig:91 y ] [91])
             (sign_extend:TI (reg:DI 1 dx [97])))
        (clobber (reg:CC 17 flags))
        (clobber (scratch:DI))
      ]) {extendditi2}

What makes this particularly undesirable is that the sign-extension
pattern above requires an additional DImode scratch register, indicated
by the clobber, which unnecessarily increases register pressure.

The proposed solution is to add a define_insn_and_split for such left
shifts (of sign or zero extensions) that only have a non-zero highpart,
where the extension is redundant and eliminated, that can be split after
reload, without scratch registers or early clobbers.

This (late split) exposes a second optimization opportunity where
setting the lowpart to zero can sometimes be combined/simplified with
the following instruction during peephole2.

For the test case above, we previously generated with -O2:

    foo64:
            xorl    %eax, %eax
            xorq    %rsi, %rdx
            xorq    %rdi, %rax
            ret

with this patch, we now generate:

    foo64:
            movq    %rdi, %rax
            xorq    %rsi, %rdx
            ret

Likewise for the related -m32 test case, we go from:

    foo32:
            movl    12(%esp), %eax
            movl    %eax, %edx
            xorl    %eax, %eax
            xorl    8(%esp), %edx
            xorl    4(%esp), %eax
            ret

to the improved:

    foo32:
            movl    12(%esp), %edx
            movl    4(%esp), %eax
            xorl    8(%esp), %edx
            ret

2023-06-26  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

    * config/i386/i386.md (peephole2): Simplify zeroing a register
    followed by an IOR, XOR or PLUS operation on it, into a move.
    (*ashl<dwi>3_doubleword_highpart): New define_insn_and_split to
    eliminate (and hide from reload) unnecessary word to doubleword
    extensions that are followed by left shifts by sufficiently large,
    but valid, bit counts.

gcc/testsuite/ChangeLog

    * gcc.target/i386/ashldi3-1.c: New 32-bit test case.
    * gcc.target/i386/ashlti3-2.c: New 64-bit test case.
2023-06-26  i386: Sync tune_string with arch_string for target attribute arch=*  (Hongyu Wang; 1 file changed, -1/+5)

For a function with target attribute arch=*, the current logic sets its
tune to the -mtune from the command line, so all target_clones get the
same tuning flags, which can hurt the performance of individual clones.
Override tune with arch if tune was not explicitly specified, to get
proper tuning flags for target_clones.

gcc/ChangeLog:

    * config/i386/i386-options.cc (ix86_valid_target_attribute_tree):
    Override tune_string with arch_string if tune_string is not
    explicitly specified.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/mvc17.c: New test.
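For context, a minimal sketch of the scenario being fixed (the clone list and function are illustrative only): each resolved clone of a target_clones function carries its own arch=*, and with this change each clone also gets tuning derived from that arch unless -mtune was given explicitly.

    /* Each clone carries a target attribute arch=*; previously all
       clones were tuned per the command-line -mtune rather than per
       their own arch.  */
    __attribute__ ((target_clones ("arch=atom", "default")))
    int
    dot (const int *a, const int *b, int n)
    {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i] * b[i];
      return s;
    }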
2023-06-25  RISC-V: Optimize VSETVL codegen of SELECT_VL with LEN_MASK_{LOAD, STORE}  (Juzhe-Zhong; 2 files changed, -3/+47)

This patch depends on the LEN_MASK_{LOAD,STORE} patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622742.html

After enabling LEN_MASK_{LOAD,STORE}, I noticed a case that the VSETVL
pass needs to optimize:

    void
    f (int32_t *__restrict a,
       int32_t *__restrict b,
       int32_t *__restrict cond,
       int n)
    {
      for (int i = 0; i < 8; i++)
        if (cond[i])
          a[i] = b[i];
    }

Before this patch:

    f:
        vsetivli a5,8,e8,mf4,tu,mu    --> Propagate "8" to the following vsetvl
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        li a3,8
        vmsne.vi v0,v0,0
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
        sub a4,a3,a5
        beq a3,a5,.L6
        slli a5,a5,2
        add a2,a2,a5
        add a1,a1,a5
        add a0,a0,a5
        vsetvli a5,a4,e8,mf4,tu,mu    --> Propagate "a4" to the following vsetvl
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        vmsne.vi v0,v0,0
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
    .L6:
        ret

The current VSETVL pass only enables AVL propagation of VLMAX AVL
("zero"). Now, we enable AVL propagation of immediate and conservative
non-VLMAX AVLs.

After this patch:

    f:
        vsetivli a5,8,e8,mf4,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        li a3,8
        vmsne.vi v0,v0,0
        vsetivli zero,8,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
        sub a4,a3,a5
        beq a3,a5,.L6
        slli a5,a5,2
        vsetvli a4,a4,e8,mf4,ta,ma
        add a2,a2,a5
        vle32.v v0,0(a2)
        add a1,a1,a5
        vsetvli a6,zero,e32,m1,ta,ma
        add a0,a0,a5
        vmsne.vi v0,v0,0
        vsetvli zero,a4,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
    .L6:
        ret

gcc/ChangeLog:

    * config/riscv/riscv-vsetvl.cc (vector_insn_info::parse_insn):
    Enhance AVL propagation.
    * config/riscv/riscv-vsetvl.h: New function.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/partial/select_vl-1.c: Add dump checks.
    * gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: New test.
2023-06-25  RISC-V: fix expand function of vlmul_ext RVV intrinsic  (Li Xu; 1 file changed, -1/+1)

Consider this following case:

    void test_vlmul_ext_v_i8mf8_i8mf4 (vint8mf8_t op1)
    {
      vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4 (op1);
    }

Compilation fails with:

    test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
    test.c:5:1: error: unrecognizable insn:
        5 | }
          | ^
    (insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
            (mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
         (nil))
    during RTL pass: vregs
    test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
    0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
            ../.././riscv-gcc/gcc/rtl-error.cc:108
    0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
            ../.././riscv-gcc/gcc/rtl-error.cc:116
    0xed58a7 extract_insn(rtx_insn*)
            ../.././riscv-gcc/gcc/recog.cc:2791
    0xb7f789 instantiate_virtual_regs_in_insn
            ../.././riscv-gcc/gcc/function.cc:1611
    0xb7f789 instantiate_virtual_regs
            ../.././riscv-gcc/gcc/function.cc:1984

gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins-bases.cc: Change emit_insn to
    emit_move_insn.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.
2023-06-25  RISC-V: Enable len_mask{load, store} and remove len_{load, store}  (Juzhe-Zhong; 6 files changed, -15/+105)

This patch enables len_mask{load,store} to support flow control in RVV
auto-vectorization.

Consider this following case:

    void
    f (int32_t *__restrict a,
       int32_t *__restrict b,
       int32_t *__restrict cond,
       int n)
    {
      for (int i = 0; i < n; i++)
        if (cond[i])
          a[i] = b[i];
    }

Before this patch:

    <source>:9:21: missed: couldn't vectorize loop
    <source>:9:21: missed: not vectorized: control flow in loop.

After this patch:

    f:
        ble a3,zero,.L5
    .L3:
        vsetvli a5,a3,e32,m1,ta,ma
        vle32.v v0,0(a2)
        vsetvli a6,zero,e32,m1,ta,ma
        slli a4,a5,2
        vmsne.vi v0,v0,0
        sub a3,a3,a5
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a1),v0.t
        vse32.v v1,0(a0),v0.t
        add a2,a2,a4
        add a1,a1,a4
        add a0,a0,a4
        bne a3,zero,.L3
    .L5:
        ret

gcc/ChangeLog:

    * config/riscv/autovec.md (len_load_<mode>): Remove.
    (len_maskload<mode><vm>): Remove.
    (len_store_<mode>): New pattern.
    (len_maskstore<mode><vm>): New pattern.
    * config/riscv/predicates.md (autovec_length_operand): New
    predicate.
    * config/riscv/riscv-protos.h (enum insn_type): New enum.
    (expand_load_store): New function.
    * config/riscv/riscv-v.cc (emit_vlmax_masked_insn): Ditto.
    (emit_nonvlmax_masked_insn): Ditto.
    (expand_load_store): Ditto.
    * config/riscv/riscv-vector-builtins.cc
    (function_expander::use_contiguous_store_insn): Add avl_type
    operand into pred_store.
    * config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/single_rgroup-2.h: New test.
    * gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/single_rgroup-3.h: New test.
    * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-2.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/single_rgroup_run-3.c: New test.
2023-06-25  Revert "RISC-V:Add float16 tuple type abi"  (Pan Li; 1 file changed, -18/+13)

This reverts commit f9ab5d62c94547499de52c800ab914cc8e802212 due to a
bootstrap failure from an out-of-range machine mode memory access.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

    * config/riscv/vector.md: Revert.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/abi-10.c: Revert.
    * gcc.target/riscv/rvv/base/abi-11.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-12.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-15.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-8.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-9.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-17.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-18.c: Ditto.
2023-06-25  Revert "RISC-V:Add float16 tuple type support"  (Pan Li; 7 files changed, -144/+3)

This reverts commit 8a96f240d71d367a2955ab9e0f0fef3a0b0e2a74 due to a
bootstrap failure from an out-of-range mode memory access; it will be
recommitted after the issue is addressed.

gcc/ChangeLog:

    * config/riscv/genrvv-type-indexer.cc (valid_type): Revert changes.
    * config/riscv/riscv-modes.def (RVV_TUPLE_MODES): Ditto.
    (ADJUST_ALIGNMENT): Ditto.  (RVV_TUPLE_PARTIAL_MODES): Ditto.
    (ADJUST_NUNITS): Ditto.
    * config/riscv/riscv-vector-builtins-types.def (vfloat16mf4x2_t):
    Ditto.  (vfloat16mf4x3_t): Ditto.  (vfloat16mf4x4_t): Ditto.
    (vfloat16mf4x5_t): Ditto.  (vfloat16mf4x6_t): Ditto.
    (vfloat16mf4x7_t): Ditto.  (vfloat16mf4x8_t): Ditto.
    (vfloat16mf2x2_t): Ditto.  (vfloat16mf2x3_t): Ditto.
    (vfloat16mf2x4_t): Ditto.  (vfloat16mf2x5_t): Ditto.
    (vfloat16mf2x6_t): Ditto.  (vfloat16mf2x7_t): Ditto.
    (vfloat16mf2x8_t): Ditto.  (vfloat16m1x2_t): Ditto.
    (vfloat16m1x3_t): Ditto.  (vfloat16m1x4_t): Ditto.
    (vfloat16m1x5_t): Ditto.  (vfloat16m1x6_t): Ditto.
    (vfloat16m1x7_t): Ditto.  (vfloat16m1x8_t): Ditto.
    (vfloat16m2x2_t): Ditto.  (vfloat16m2x3_t): Ditto.
    (vfloat16m2x4_t): Ditto.  (vfloat16m4x2_t): Ditto.
    * config/riscv/riscv-vector-builtins.def (vfloat16mf4x2_t): Ditto.
    (vfloat16mf4x3_t): Ditto.  (vfloat16mf4x4_t): Ditto.
    (vfloat16mf4x5_t): Ditto.  (vfloat16mf4x6_t): Ditto.
    (vfloat16mf4x7_t): Ditto.  (vfloat16mf4x8_t): Ditto.
    (vfloat16mf2x2_t): Ditto.  (vfloat16mf2x3_t): Ditto.
    (vfloat16mf2x4_t): Ditto.  (vfloat16mf2x5_t): Ditto.
    (vfloat16mf2x6_t): Ditto.  (vfloat16mf2x7_t): Ditto.
    (vfloat16mf2x8_t): Ditto.  (vfloat16m1x2_t): Ditto.
    (vfloat16m1x3_t): Ditto.  (vfloat16m1x4_t): Ditto.
    (vfloat16m1x5_t): Ditto.  (vfloat16m1x6_t): Ditto.
    (vfloat16m1x7_t): Ditto.  (vfloat16m1x8_t): Ditto.
    (vfloat16m2x2_t): Ditto.  (vfloat16m2x3_t): Ditto.
    (vfloat16m2x4_t): Ditto.  (vfloat16m4x2_t): Ditto.
    * config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
    * config/riscv/riscv.md: Ditto.
    * config/riscv/vector-iterators.md: Ditto.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/tuple-28.c: Removed.
    * gcc.target/riscv/rvv/base/tuple-29.c: Removed.
    * gcc.target/riscv/rvv/base/tuple-30.c: Removed.
    * gcc.target/riscv/rvv/base/tuple-31.c: Removed.
    * gcc.target/riscv/rvv/base/tuple-32.c: Removed.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-06-25  Refine maskloadmn pattern with UNSPEC_MASKLOAD  (liuhongt; 1 file changed, -14/+18)

If mem_addr points to a memory region with less than whole-vector-size
bytes of accessible memory, and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent the
load from being transformed into vpblendd.

gcc/ChangeLog:

    PR target/110309
    * config/i386/sse.md (maskload<mode><avx512fmaskmodelower>):
    Refine pattern with UNSPEC_MASKLOAD.
    (maskload<mode><avx512fmaskmodelower>): Ditto.
    (*<avx512>_load<mode>_mask): Extend mode iterator to
    VI12HFBF_AVX512VL.
    (*<avx512>_load<mode>): Ditto.

gcc/testsuite/ChangeLog:

    * gcc.target/i386/pr110309.c: New test.
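A hedged AVX512VL intrinsics sketch of the guarantee being preserved (the function name and shapes are illustrative): only the masked-in elements may be read, so folding the masked load into a full load plus vpblendd would touch inaccessible memory.

    #include <immintrin.h>

    /* buf points to only n (n <= 8) accessible ints; the mask keeps
       the load from reading the inaccessible tail, so it must not be
       widened into a full load plus blend.  Compile with -mavx512vl.  */
    __m256i
    load_prefix (const int *buf, unsigned n)
    {
      __mmask8 k = (__mmask8) ((1u << n) - 1);
      return _mm256_maskz_loadu_epi32 (k, buf);
    }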
2023-06-25  RISC-V:Add float16 tuple type abi  (yulong; 1 file changed, -13/+18)

gcc/ChangeLog:

    * config/riscv/vector.md: Add float16 attr at sew, vlmul and ratio.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/base/abi-10.c: Add float16 tuple type case.
    * gcc.target/riscv/rvv/base/abi-11.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-12.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-15.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-8.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-9.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-17.c: New test.
    * gcc.target/riscv/rvv/base/abi-18.c: New test.
2023-06-24  i386: Add alternate representation for {and,or,xor}b %ah,%dh  (Roger Sayle; 1 file changed, -0/+22)

A patch that I'm working on to improve RTL simplifications in the
middle-end results in the regression of pr78904-1b.c, due to changes in
the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic.
See also PR target/78904. This patch avoids/prevents those failures by
adding support for the alternate representation, duplicating the
existing *<code>qi_ext<mode>_2 as *<code>qi_ext<mode>_3 (the new version
also replacing any_or with any_logic to provide *andqi_ext<mode>_3 in
the same pattern). Removing the original pattern isn't trivial, as it's
generated by define_split, but this can be investigated after the other
pieces are approved.

The current representation of this instruction is:

    (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
                          (const_int 8 [0x8])
                          (const_int 8 [0x8]))
         (subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94)
                                                        (const_int 8 [0x8])
                                                        (const_int 8 [0x8])) 0)
                            (subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
                                                        (const_int 8 [0x8])
                                                        (const_int 8 [0x8])) 0)) 0))

after my proposed middle-end improvement, we attempt to recognize:

    (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
                          (const_int 8 [0x8])
                          (const_int 8 [0x8]))
         (zero_extract:DI (xor:DI (reg:DI 94)
                                  (reg/v:DI 87 [ aD.2763 ]))
                          (const_int 8 [0x8])
                          (const_int 8 [0x8])))

2023-06-24  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

    * config/i386/i386.md (*<code>qi_ext<mode>_3): New define_insn.
2023-06-24  RISC-V: Refactor the integer ternary autovec pattern  (Juzhe-Zhong; 1 file changed, -26/+28)

A long time ago I encountered an ICE when trying to set the clobber
register to Pmode, and I forgot the reason. So I clobbered an SI scratch
and used PUT_MODE to make it Pmode after reload, which made the patterns
look unreasonable. Following Jeff's comments, I tried it again: it works
now when we set the clobber register to Pmode directly, and the patterns
look more reasonable. All tests pass. OK for trunk.

gcc/ChangeLog:

    * config/riscv/autovec.md (*fma<mode>): Set clobber to Pmode in
    expand stage.
    (*fma<VI:mode><P:mode>): Ditto.
    (*fnma<mode>): Ditto.
    (*fnma<VI:mode><P:mode>): Ditto.
2023-06-24  RISC-V: Support RVV floating-point auto-vectorization  (Juzhe-Zhong; 4 files changed, -14/+224)

This patch adds RVV floating-point auto-vectorization. Also, fix an
attribute bug of floating-point ternary operations in vector.md.

gcc/ChangeLog:

    * config/riscv/autovec.md (fma<mode>4): New pattern.
    (*fma<mode>): Ditto.
    (fnma<mode>4): Ditto.
    (*fnma<mode>): Ditto.
    (fms<mode>4): Ditto.
    (*fms<mode>): Ditto.
    (fnms<mode>4): Ditto.
    (*fnms<mode>): Ditto.
    * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New
    function.
    * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
    * config/riscv/vector.md: Fix attribute bug.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Adjust tests.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-10.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-11.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-12.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-7.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-8.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run-9.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-1.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-10.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-11.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-12.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-2.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-3.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-4.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-5.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-6.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-7.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-8.c: New test.
    * gcc.target/riscv/rvv/autovec/ternop/ternop_run_zvfh-9.c: New test.
2023-06-23  Fix power10 fusion bug with prefixed loads, PR target/105325  (Michael Meissner; 4 files changed, -36/+46)

This change fixes PR target/105325. PR target/105325 is a bug where an
invalid lwa instruction is generated due to power10 fusion of a load
instruction to a GPR and a compare immediate instruction with the
immediate being -1, 0, or 1. In some cases, when the load instruction is
done, the GCC compiler would generate a load instruction with an offset
that was too large to fit into the normal load instruction.

In particular, loads from the stack might originally have a small
offset, so that the load is not a prefixed load. However, after the
stack is set up and register allocation has been done, the offset now is
large enough that we would have to use a prefixed load instruction. The
support for prefixed loads did not consider that patterns with a fused
load and compare might have a prefixed address. Without this support,
the proper prefixed load won't be generated.

In the original code, when the split2 pass is run after reload has
finished, the ds_form_mem_operand predicate that was used for lwa and ld
no longer returns true. When the pattern was created,
ds_form_mem_operand recognized the insn as being valid since the offset
was small. But after register allocation, ds_form_mem_operand did not
return true. Because it didn't return true, the insn could not be split.
Since the insn was not split and the prefix support did not indicate a
prefixed instruction was used, the wrong load is generated.

The solution involves:

1) Don't use ds_form_mem_operand for ld and lwa, always use
   non_update_memory_operand.

2) Delete ds_form_mem_operand since it is no longer used.

3) Use the "YZ" constraints for ld/lwa instead of "m".

4) If we don't need to sign extend the lwa, convert it to lwz, and use
   cmpwi instead of cmpdi.  Adjust the insn name to reflect the code
   generated.

5) Ensure that the insn using lwa will be recognized as having a
   prefixed operand (and hence the insn length will be 16 bytes instead
   of 8 bytes):

   5a) Set the prefixed and maybe_prefix attributes to know that
       fused_load_cmpi are also load insns;

   5b) In the case where we are just setting CC and not using the memory
       afterward, set the clobber to use a DI register, and put an
       explicit sign_extend operation in the split;

   5c) Set the sign_extend attribute to "yes" for lwa.

   5d) 5a-5c are the things that prefixed_load_p in rs6000.cc checks to
       ensure that lwa is treated as a ds-form instruction and not as a
       d-form instruction (i.e. lwz).

6) Add a new test case for this case.

7) Adjust the insn counts in fusion-p10-ldcmpi.c.  Because we are no
   longer using ds_form_mem_operand, the ld and lwa instructions will
   fuse x-form (reg+reg) addresses in addition to ds-form (reg+offset
   or reg) addresses.

2023-06-23  Michael Meissner  <meissner@linux.ibm.com>

gcc/

    PR target/105325
    * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems
    that allowed prefixed lwa to be generated.
    * config/rs6000/fusion.md: Regenerate.
    * config/rs6000/predicates.md (ds_form_mem_operand): Delete.
    * config/rs6000/rs6000.md (prefixed attribute): Add support for
    load plus compare immediate fused insns.
    (maybe_prefixed): Likewise.

gcc/testsuite/

    PR target/105325
    * g++.target/powerpc/pr105325.C: New test.
    * gcc.target/powerpc/fusion-p10-ldcmpi.c: Update insn counts.

Co-Authored-By: Aaron Sawdey <acsawdey@linux.ibm.com>
2023-06-22  Change fma_reassoc_width tuning for ampere1  (Di Zhao OS; 1 file changed, -1/+1)

This patch enables reassociation of floating-point additions on ampere1.
This brings about 1% overall benefit on spec2017 fprate cases. (There are
minor regressions in 510.parest_r and 508.namd_r, analyzed here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110279 .)

gcc/ChangeLog:

    * config/aarch64/aarch64.cc: Change fma_reassoc_width for ampere1.
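A loose, hypothetical illustration of the kind of chain affected (function and operands are not from the patch): with a reassociation width above 1, floating-point additions feeding FMAs need not form one serial dependency chain (this requires reassociation to be enabled, e.g. via -Ofast).

    double
    dot2 (double a, double b, double c, double d, double e)
    {
      /* With reassociation enabled, the two fused multiply-adds below
         can be computed in parallel chains rather than serially.  */
      return e + a * b + c * d;
    }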
2023-06-22  i386: Convert ptestz of pandn into ptestc  (Roger Sayle; 3 files changed, -8/+91)

This patch is the next installment in a set of backend patches around
improvements to ptest/vptest. A previous patch optimized the sequence
t=pand(x,y); ptestz(t,t) into the equivalent ptestz(x,y), using the
property that ZF is set to (X&Y) == 0. This patch performs a similar
transformation, converting t=pandn(x,y); ptestz(t,t) into the (almost)
equivalent ptestc(y,x), using the property that the CF flag is set to
(~X&Y) == 0. The tricky bit is that this sets the CF flag instead of
the ZF flag, so we can only perform this transformation when we can
also convert the flags consumer, as well as the producer.

For the test case:

    int foo (__m128i x, __m128i y)
    {
      __m128i a = x & ~y;
      return __builtin_ia32_ptestz128 (a, a);
    }

With -O2 -msse4.1 we previously generated:

    foo:
            pandn   %xmm0, %xmm1
            xorl    %eax, %eax
            ptest   %xmm1, %xmm1
            sete    %al
            ret

with this patch we now generate:

    foo:
            xorl    %eax, %eax
            ptest   %xmm0, %xmm1
            setc    %al
            ret

At the same time, this patch also provides alternative fixes for
PR target/109973 and PR target/110118, by recognizing that ptestc(x,x)
always sets the carry flag (X&~X is always zero). This is achieved both
by recognizing the special case in ix86_expand_sse_ptest and with a
splitter to convert an eligible ptest into an stc.

2023-06-22  Roger Sayle  <roger@nextmovesoftware.com>
            Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog

    * config/i386/i386-expand.cc (ix86_expand_sse_ptest): Recognize
    expansion of ptestc with equal operands as producing const1_rtx.
    * config/i386/i386.cc (ix86_rtx_costs): Provide accurate cost
    estimates of UNSPEC_PTEST, where the ptest performs the PAND or
    PANDN of its operands.
    * config/i386/sse.md (define_split): Transform CCCmode UNSPEC_PTEST
    of reg_equal_p operands into an x86_stc instruction.
    (define_split): Split pandn/ptestz/set{n?}e into ptestc/set{n?}c.
    (define_split): Similar to above for strict_low_part destinations.
    (define_split): Split pandn/ptestz/j{n?}e into ptestc/j{n?}c.

gcc/testsuite/ChangeLog

    * gcc.target/i386/avx-vptest-4.c: New test case.
    * gcc.target/i386/avx-vptest-5.c: Likewise.
    * gcc.target/i386/avx-vptest-6.c: Likewise.
    * gcc.target/i386/pr109973-1.c: Update test case.
    * gcc.target/i386/pr109973-2.c: Likewise.
    * gcc.target/i386/sse4_1-ptest-4.c: New test case.
    * gcc.target/i386/sse4_1-ptest-5.c: Likewise.
    * gcc.target/i386/sse4_1-ptest-6.c: Likewise.
2023-06-21  c-family: implement -ffp-contract=on  (Alexander Monakov; 1 file changed, -1/+1)

Implement -ffp-contract=on for C and C++ without changing default
behavior (=off for -std=cNN, =fast for C++ and -std=gnuNN).

gcc/c-family/ChangeLog:

    * c-gimplify.cc (fma_supported_p): New helper.
    (c_gimplify_expr) [PLUS_EXPR, MINUS_EXPR]: Implement FMA
    contraction.

gcc/ChangeLog:

    * common.opt (fp_contract_mode) [on]: Remove fallback.
    * config/sh/sh.md (*fmasf4): Correct flag_fp_contract_mode test.
    * doc/invoke.texi (-ffp-contract): Update.
    * trans-mem.cc (diagnose_tm_1): Skip internal function calls.
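A minimal example of the newly allowed contraction (illustrative only):

    /* With -ffp-contract=on and an FMA-capable target (e.g. -mfma),
       the expression below may be contracted into a single fused
       multiply-add, with no intermediate rounding of a * b.  */
    double
    muladd (double a, double b, double c)
    {
      return a * b + c;
    }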
2023-06-21  aarch64: Avoid same input and output Z register for gather loads  (Kyrylo Tkachov; 2 files changed, -71/+135)

The architecture recommends that load-gather instructions avoid using the
same Z register for the load address and the destination, and the
Software Optimization Guides for Arm cores recommend that as well.
This means that for code like:

    svuint64_t
    food (svbool_t p, uint64_t *in, svint64_t offsets, svuint64_t a)
    {
      return svadd_u64_x (p, a, svld1_gather_offset (p, in, offsets));
    }

we'll want to avoid generating the current:

    food:
            ld1d    z0.d, p0/z, [x0, z0.d] // Z0 reused as input and output.
            add     z0.d, z1.d, z0.d
            ret

However, we still want to avoid generating extra moves where there were
none before, so the tight aarch64-sve-acle.exp tests for load gathers
should still pass as they are.

This patch implements that recommendation for the load gather patterns by:
* duplicating the alternatives
* marking the output operand as early clobber
* tying the input Z register operand in the original alternatives to 0
* penalising the original alternatives with '?'

This results in a large-ish patch in terms of diff lines but the new
compact syntax (thanks Tamar) makes it quite a readable and regular
change.

The benchmark numbers on a Neoverse V1 on fprate look okay:

                        diff
    503.bwaves_r        0.00%
    507.cactuBSSN_r     0.00%
    508.namd_r          0.00%
    510.parest_r        0.55%
    511.povray_r        0.22%
    519.lbm_r           0.00%
    521.wrf_r           0.00%
    526.blender_r       0.00%
    527.cam4_r          0.56%
    538.imagick_r       0.00%
    544.nab_r           0.00%
    549.fotonik3d_r     0.00%
    554.roms_r          0.00%
    fprate              0.10%

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

    * config/aarch64/aarch64-sve.md
    (mask_gather_load<mode><v_int_container>): Add alternatives to
    prefer to avoid same input and output Z register.
    (mask_gather_load<mode><v_int_container>): Likewise.
    (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
    (*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
    (*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
    (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
    Likewise.
    (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
    Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_sxtw): Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_uxtw): Likewise.
    (@aarch64_ldff1_gather<mode>): Likewise.
    (@aarch64_ldff1_gather<mode>): Likewise.
    (*aarch64_ldff1_gather<mode>_sxtw): Likewise.
    (*aarch64_ldff1_gather<mode>_uxtw): Likewise.
    (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
    <VNx4_NARROW:mode>): Likewise.
    (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>): Likewise.
    (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>_sxtw): Likewise.
    (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>_uxtw): Likewise.
    * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>):
    Likewise.
    (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
    <SVE_PARTIAL_I:mode>): Likewise.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/sve/gather_earlyclobber.c: New test.
    * gcc.target/aarch64/sve2/gather_earlyclobber.c: New test.
2023-06-21  aarch64: Convert SVE gather patterns to compact syntax  (Kyrylo Tkachov; 2 files changed, -191/+211)

This patch converts the SVE load gather patterns to the new compact
syntax that Tamar introduced. This allows for a future patch I want to
contribute to add more alternatives that are better viewed in the more
compact form.

The lines in some patterns are now more than 80 characters long, but I
think that's unavoidable; those patterns already had overly long
constraint strings.

No functional change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

    * config/aarch64/aarch64-sve.md
    (mask_gather_load<mode><v_int_container>): Convert to compact
    alternatives syntax.
    (mask_gather_load<mode><v_int_container>): Likewise.
    (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
    (*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
    (*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
    (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
    Likewise.
    (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
    Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_sxtw): Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_uxtw): Likewise.
    (@aarch64_ldff1_gather<mode>): Likewise.
    (@aarch64_ldff1_gather<mode>): Likewise.
    (*aarch64_ldff1_gather<mode>_sxtw): Likewise.
    (*aarch64_ldff1_gather<mode>_uxtw): Likewise.
    (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
    <VNx4_NARROW:mode>): Likewise.
    (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>): Likewise.
    (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>_sxtw): Likewise.
    (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>_uxtw): Likewise.
    * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>):
    Likewise.
    (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
    <SVE_PARTIAL_I:mode>): Likewise.
2023-06-21  Revert "aarch64: Convert SVE gather patterns to compact syntax"  (Kyrylo Tkachov; 2 files changed, -275/+191)
This reverts commit bb3c69058a5fb874ea3c5c26bfb331d33d0497c3.
2023-06-21  aarch64: Convert SVE gather patterns to compact syntax  (Kyrylo Tkachov; 2 files changed, -191/+275)

This patch converts the SVE load gather patterns to the new compact
syntax that Tamar introduced. This allows for a future patch I want to
contribute to add more alternatives that are better viewed in the more
compact form.

The lines in some patterns are now more than 80 characters long, but I
think that's unavoidable; those patterns already had overly long
constraint strings.

No functional change intended.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

    * config/aarch64/aarch64-sve.md
    (mask_gather_load<mode><v_int_container>): Convert to compact
    alternatives syntax.
    (mask_gather_load<mode><v_int_container>): Likewise.
    (*mask_gather_load<mode><v_int_container>_<su>xtw_unpacked): Likewise.
    (*mask_gather_load<mode><v_int_container>_sxtw): Likewise.
    (*mask_gather_load<mode><v_int_container>_uxtw): Likewise.
    (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_4HSI:mode><SVE_4BHI:mode>):
    Likewise.
    (@aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode><SVE_2BHSI:mode>):
    Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_<ANY_EXTEND2:su>xtw_unpacked): Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_sxtw): Likewise.
    (*aarch64_gather_load_<ANY_EXTEND:optab><SVE_2HSDI:mode>
    <SVE_2BHSI:mode>_uxtw): Likewise.
    (@aarch64_ldff1_gather<mode>): Likewise.
    (@aarch64_ldff1_gather<mode>): Likewise.
    (*aarch64_ldff1_gather<mode>_sxtw): Likewise.
    (*aarch64_ldff1_gather<mode>_uxtw): Likewise.
    (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx4_WIDE:mode>
    <VNx4_NARROW:mode>): Likewise.
    (@aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>): Likewise.
    (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>_sxtw): Likewise.
    (*aarch64_ldff1_gather_<ANY_EXTEND:optab><VNx2_WIDE:mode>
    <VNx2_NARROW:mode>_uxtw): Likewise.
    * config/aarch64/aarch64-sve2.md (@aarch64_gather_ldnt<mode>):
    Likewise.
    (@aarch64_gather_ldnt_<ANY_EXTEND:optab><SVE_FULL_SDI:mode>
    <SVE_PARTIAL_I:mode>): Likewise.
2023-06-21  [i386] Reject too large vectors for partial vector vectorization  (Richard Biener; 1 file changed, -0/+26)

The following works around the lack of the x86 backend making the
vectorizer compare the costs of the different possible vector sizes the
backend advertises through the vector_modes hook. When enabling masked
epilogues or main loops, this means we will select the preferred vector
mode, which is usually the largest, even for loops that do not iterate
close to the number of lanes the vector has. When not using masking the
vectorizer would reject any mode resulting in a VF bigger than the
number of iterations, but with masking those lanes are simply masked
out. So this overloads the finish_cost function and matches for the
problematic case, forcing a high cost to make us try a smaller vector
size.

    * config/i386/i386.cc (ix86_vector_costs::finish_cost): Overload.
    For masked main loops make sure the vectorization factor isn't
    more than double the number of iterations.

    * gcc.target/i386/vect-partial-vectors-1.c: New testcase.
    * gcc.target/i386/vect-partial-vectors-2.c: Likewise.
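A sketch of the problematic loop shape (the types and trip count are illustrative): with masked main loops and a 512-bit preferred size, the vectorizer would otherwise pick a 16-lane vectorization factor for a loop that runs only 4 iterations.

    void
    f (float *restrict a, const float *restrict b)
    {
      /* A VF of 16 would leave 12 of 16 lanes permanently masked out;
         the forced high cost makes the vectorizer retry with a smaller
         vector size.  */
      for (int i = 0; i < 4; i++)
        a[i] += b[i];
    }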
2023-06-21  x86: make VPTERNLOG* usable on less than 512-bit operands with just AVX512F  (Jan Beulich; 2 files changed, -11/+27)

There's no reason to constrain this to AVX512VL, unless instructed so by
-mprefer-vector-width=, as the wider operation is unusable for more
narrow operands only when the possible memory source is a non-broadcast
one. This way even the scalar copysign<mode>3 can benefit from the
operation being a single-insn one (leaving aside moves which the
compiler decides to insert for unclear reasons, and leaving aside the
fact that bcst_mem_operand() is too restrictive for broadcast to be
embedded right into VPTERNLOG*).

While there, also bring *<avx512>_vternlog<mode>_all in sync with the
three splitters.

Along with this, also request value duplication in
ix86_expand_copysign()'s call to ix86_build_signbit_mask(), eliminating
excess space allocation in .rodata.*, filled with zeros which are never
read.

gcc/

    * config/i386/i386-expand.cc (ix86_expand_copysign): Request value
    duplication by ix86_build_signbit_mask() when AVX512F and not
    HFmode.
    * config/i386/sse.md (*<avx512>_vternlog<mode>_all): Convert to
    2-alternative form.  Adjust "mode" attribute.  Add "enabled"
    attribute.
    (*<avx512>_vpternlog<mode>_1): Also permit when TARGET_AVX512F
    && !TARGET_PREFER_AVX256.
    (*<avx512>_vpternlog<mode>_2): Likewise.
    (*<avx512>_vpternlog<mode>_3): Likewise.

gcc/testsuite/

    * gcc.target/i386/avx512f-copysign.c: New test.
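As one example of the scalar benefit mentioned above (a hedged sketch; the actual code generation depends on options):

    /* With -mavx512f but without AVX512VL, the sign-bit selection in
       copysign can now still be carried out by a single vpternlog.  */
    float
    my_copysign (float mag, float sgn)
    {
      return __builtin_copysignf (mag, sgn);
    }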
2023-06-20  aarch64: Robustify stack tie handling  (Richard Sandiford; 2 files changed, -7/+18)

The SVE handling of stack clash protection copied the stack pointer to
X11 before the probe and set up X11 as the CFA for unwind purposes:

    /* This is done to provide unwinding information for the stack
       adjustments we're about to do, however to prevent the optimizers
       from removing the R11 move and leaving the CFA note (which would be
       very wrong) we tie the old and new stack pointer together.
       The tie will expand to nothing but the optimizers will not touch
       the instruction.  */
    rtx stack_ptr_copy = gen_rtx_REG (Pmode, STACK_CLASH_SVE_CFA_REGNUM);
    emit_move_insn (stack_ptr_copy, stack_pointer_rtx);
    emit_insn (gen_stack_tie (stack_ptr_copy, stack_pointer_rtx));

    /* We want the CFA independent of the stack pointer for the
       duration of the loop.  */
    add_reg_note (insn, REG_CFA_DEF_CFA, stack_ptr_copy);
    RTX_FRAME_RELATED_P (insn) = 1;

-fcprop-registers is now smart enough to realise that X11 = SP, replace
X11 with SP in the stack tie, and delete the instruction created above.

This patch tries to prevent that by making stack_tie fussy about the
register numbers. It fixes failures in
gcc.target/aarch64/sve/pcs/stack_clash*.c.

gcc/

    * config/aarch64/aarch64.md (stack_tie): Hard-code the first
    register operand to the stack pointer.  Require the second register
    operand to have the number specified in a separate const_int
    operand.
    * config/aarch64/aarch64.cc (aarch64_emit_stack_tie): New function.
    (aarch64_allocate_and_probe_stack_space): Use it.
    (aarch64_expand_prologue, aarch64_expand_epilogue): Likewise.
    (aarch64_expand_epilogue): Likewise.
2023-06-20  rs6000: Add builtins for IEEE 128-bit floating point values  (Carl Love; 5 files changed, -21/+62)

Add support for the following builtins:

    __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
    __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
    __ieee128 scalar_insert_exp (__vector unsigned __int128,
                                 __vector unsigned long long);

The instructions used in the builtins operate on vector registers. Thus
the result must be moved to a scalar type. There is no clean, performant
way to do this. The user code typically needs the result as a vector
anyway.

gcc/

    * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Rename
    CODE_FOR_xsxsigqp_tf to CODE_FOR_xsxsigqp_tf_ti.  Rename
    CODE_FOR_xsxsigqp_kf to CODE_FOR_xsxsigqp_kf_ti.  Rename
    CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.  Rename
    CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
    (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
    CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
    * config/rs6000/rs6000-builtins.def
    (__builtin_vsx_scalar_extract_exp_to_vec,
    __builtin_vsx_scalar_extract_sig_to_vec,
    __builtin_vsx_scalar_insert_exp_vqp): Add new builtin definitions.
    Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
    xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
    * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
    Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
    overloaded instance.  Update comments.
    * config/rs6000/rs6000-overload.def (__builtin_vec_scalar_insert_exp):
    Add new overload definition with vector arguments.
    (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
    overloaded definitions.
    * config/rs6000/vsx.md (V2DI_DI): New mode iterator.
    (DI_to_TI): New mode attribute.  Rename xsxexpqp_<mode> to
    xsxexpqp_<IEEE128:mode>_<V2DI_DI:mode>.  Rename xsxsigqp_<mode> to
    xsxsigqp_<IEEE128:mode>_<VEC_TI:mode>.  Rename xsiexpqp_<mode> to
    xsiexpqp_<IEEE128:mode>_<V2DI_DI:mode>.
    * doc/extend.texi (scalar_extract_exp_to_vec,
    scalar_extract_sig_to_vec): Add documentation for new builtins.
    (scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/

    * gcc.target/powerpc/bfp/scalar-extract-exp-8.c: New test case.
    * gcc.target/powerpc/bfp/scalar-extract-sig-8.c: New test case.
    * gcc.target/powerpc/bfp/scalar-insert-exp-16.c: New test case.
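A hedged usage sketch based on the prototypes above, assuming the overloaded names are exposed to user code exactly as declared:

    /* Extract the biased exponent of an IEEE 128-bit value directly
       into a vector register via the new overload, avoiding a
       vector-to-scalar move.  */
    __vector unsigned long long int
    get_exp (__ieee128 x)
    {
      return scalar_extract_exp_to_vec (x);
    }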
2023-06-20  RISC-V: Set the natural size of constant vector mask modes to one RVV data vector  (Li Xu; 1 file changed, -0/+5)

If we reinterpret vnx2bi as vnx16qi, vnx16qi must occupy no more of the
underlying registers than vnx2bi. Consider this following case:

    void test_vreinterpret_v_b64_i8m1 (uint8_t *in, int8_t *out)
    {
      vbool64_t vmask = __riscv_vlm_v_b64 (in, 2);
      vint8m1_t vout = __riscv_vreinterpret_v_b64_i8m1 (vmask);
      __riscv_vse8_v_i8m1 (out, vout, 16);
    }

Compiler parameters: -march=rv64gcv -mabi=lp64d
--param=riscv-autovec-preference=fixed-vlmax -O3

Compilation fails with:

    test.c: In function 'test_vreinterpret_v_b64_i8m1':
    during RTL pass: expand
    test.c:11:22: internal compiler error: in gen_lowpart_general, at rtlhooks.cc:57
       11 |   vint8m1_t vout = __riscv_vreinterpret_v_b64_i8m1(src);
          |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    0xf11876 gen_lowpart_general(machine_mode, rtx_def*)
            ../.././riscv-gcc/gcc/rtlhooks.cc:57
    0x191435e gen_vreinterpretvnx16qi(rtx_def*, rtx_def*)
            ../.././riscv-gcc/gcc/config/riscv/vector.md:486
    0xe08858 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
            ../.././riscv-gcc/gcc/optabs.cc:8213
    0x1471209 riscv_vector::function_expander::generate_insn(insn_code)
            ../.././riscv-gcc/gcc/config/riscv/riscv-vector-builtins.cc:3813
    0x147629c riscv_vector::function_expander::expand()
            ../.././riscv-gcc/gcc/config/riscv/riscv-vector-builtins.h:520
    0x147629c riscv_vector::expand_builtin(unsigned int, tree_node*, rtx_def*)
            ../.././riscv-gcc/gcc/config/riscv/riscv-vector-builtins.cc:4103
    0x9868f9 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
            ../.././riscv-gcc/gcc/builtins.cc:7342

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_regmode_natural_size): Set the
    natural size of vector mask mode to one RVV register.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/vreinterpet-fixed.c: New test.
2023-06-20  RISC-V: Optimize codegen of VLA SLP  (Juzhe-Zhong; 1 file changed, -45/+36)

Add comments for Robin: we want to create a pattern where
value[ix] = floor (ix / NPATTERNS) * NPATTERNS. As NPATTERNS is always
a power of two, we can rewrite this as value[ix] = ix & -NPATTERNS.

Recently, I figured out a better approach to the codegen for VLA stepped
vectors. Here are the detailed descriptions:

Case 1:

    void f (uint8_t *restrict a, uint8_t *restrict b)
    {
      for (int i = 0; i < 100; ++i)
        {
          a[i * 8] = b[i * 8 + 37] + 1;
          a[i * 8 + 1] = b[i * 8 + 37] + 2;
          a[i * 8 + 2] = b[i * 8 + 37] + 3;
          a[i * 8 + 3] = b[i * 8 + 37] + 4;
          a[i * 8 + 4] = b[i * 8 + 37] + 5;
          a[i * 8 + 5] = b[i * 8 + 37] + 6;
          a[i * 8 + 6] = b[i * 8 + 37] + 7;
          a[i * 8 + 7] = b[i * 8 + 37] + 8;
        }
    }

We need to generate the stepped vector (NPATTERNS = 8):

    { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8 }

Before this patch:

    vid.v    v4                    ;; {0,1,2,3,4,5,6,7,...}
    vsrl.vi  v4,v4,3               ;; {0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,...}
    li       a3,8                  ;; {8}
    vmul.vx  v4,v4,a3              ;; {0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

After this patch:

    vid.v    v4                    ;; {0,1,2,3,4,5,6,7,...}
    vand.vi  v4,v4,-8(-NPATTERNS)  ;; {0,0,0,0,0,0,0,8,8,8,8,8,8,8,8,...}

Case 2:

    void f (uint8_t *restrict a, uint8_t *restrict b)
    {
      for (int i = 0; i < 100; ++i)
        {
          a[i * 8] = b[i * 8 + 3] + 1;
          a[i * 8 + 1] = b[i * 8 + 2] + 2;
          a[i * 8 + 2] = b[i * 8 + 1] + 3;
          a[i * 8 + 3] = b[i * 8 + 0] + 4;
          a[i * 8 + 4] = b[i * 8 + 7] + 5;
          a[i * 8 + 5] = b[i * 8 + 6] + 6;
          a[i * 8 + 6] = b[i * 8 + 5] + 7;
          a[i * 8 + 7] = b[i * 8 + 4] + 8;
        }
    }

We need to generate the stepped vector (NPATTERNS = 4):

    { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

Before this patch:

    li       a6,134221824
    slli     a6,a6,5
    addi     a6,a6,3        ;; 64-bit: 0x0003000200010000
    vmv.v.x  v6,a6          ;; {3, 2, 1, 0, ... }
    vid.v    v4             ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
    vsrl.vi  v4,v4,2        ;; {0, 0, 0, 0, 1, 1, 1, 1, ... }
    li       a3,4           ;; {4}
    vmul.vx  v4,v4,a3       ;; {0, 0, 0, 0, 4, 4, 4, 4, ... }
    vadd.vv  v4,v4,v6       ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

After this patch:

    li       a3,-536875008
    slli     a3,a3,4
    addi     a3,a3,1
    slli     a3,a3,16
    vmv.v.x  v2,a3          ;; {3, 1, -1, -3, ... }
    vid.v    v4             ;; {0, 1, 2, 3, 4, 5, 6, 7, ... }
    vadd.vv  v4,v4,v2       ;; {3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12, ... }

gcc/ChangeLog:

    * config/riscv/riscv-v.cc (expand_const_vector): Optimize codegen.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt testcase.
    * gcc.target/riscv/rvv/autovec/partial/slp-16.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-16.c: New test.
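A standalone sanity check of the identity relied on above (assumptions: nonnegative indices and a power-of-two NPATTERNS):

    #include <assert.h>

    int
    main (void)
    {
      const int npatterns = 8;  /* must be a power of two */
      for (int ix = 0; ix < 64; ix++)
        /* floor (ix / N) * N == ix & -N for power-of-two N and ix >= 0.  */
        assert ((ix / npatterns) * npatterns == (ix & -npatterns));
      return 0;
    }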
2023-06-20  RISC-V: Fix compiler warning of riscv_arg_has_vector  (Lehua Ding; 1 file changed, -2/+4)

Hi,

This little patch fixes a compiler warning that my previous patch
introduced; sorry for introducing the issue.

Best,
Lehua

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_arg_has_vector): Add default switch
    handler.
2023-06-20  aarch64: Optimise ADDP with same source operands  (Kyrylo Tkachov; 1 file changed, -0/+30)

We've been asked to optimise the testcase in this patch: a 64-bit ADDP
with the low and high halves of the same 128-bit vector. This can be
done by a single .4s ADDP followed by just reading the bottom 64 bits.
A splitter for this is quite straightforward now that all the
vec_concat stuff is collapsed by simplify-rtx.

With this patch we generate a single:

    addp    v0.4s, v0.4s, v0.4s

instead of:

    dup     d31, v0.d[1]
    addp    v0.2s, v0.2s, v31.2s
    ret

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.

gcc/ChangeLog:

    * config/aarch64/aarch64-simd.md (*aarch64_addp_same_reg<mode>):
    New define_insn_and_split.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/simd/addp-same-low_1.c: New test.
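An intrinsics-level sketch of the shape being optimised (the function name is illustrative, not from the testsuite):

    #include <arm_neon.h>

    /* Pairwise add of the low and high halves of the same 128-bit
       vector; with this patch it compiles to a single .4s ADDP whose
       bottom 64 bits hold the result.  */
    int32x2_t
    pairwise_halves (int32x4_t x)
    {
      return vpadd_s32 (vget_low_s32 (x), vget_high_s32 (x));
    }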
2023-06-20  AArch64: remove test comment from *mov<mode>_aarch64  (Tamar Christina; 1 file changed, -1/+1)

I accidentally left a test comment in the final version of the patch.
This removes the comment.

gcc/ChangeLog:

    * config/aarch64/aarch64.md (*mov<mode>_aarch64): Drop test comment.
2023-06-20  x86: correct and improve "*vec_dupv2di"  (Jan Beulich; 1 file changed, -6/+22)

The input constraint for the %vmovddup alternative was wrong, as the
upper 16 XMM registers require AVX512VL to be used with this insn. To
compensate, introduce a new alternative permitting all 32 registers, by
broadcasting to the full 512 bits in that case if AVX512VL is not
available.

gcc/

    * config/i386/sse.md (vec_dupv2di): Correct %vmovddup input
    constraint.  Add new AVX512F alternative.

gcc/testsuite/

    * gcc.target/i386/avx512f-dupv2di.c: New test.
2023-06-20  RISC-V: Add tuple vector mode psABI checking and simplify code  (Lehua Ding; 1 file changed, -36/+17)

Hi,

This patch does several things:

1. Adds the missed checking of tuple vector modes.
2. Extends the scope of checking to all vector types; previously it was
   only for scalable vector types.
3. Simplifies the logic of determining the code of the vector type that
   will be lowered to vector tmode code.

Best,
Lehua

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_scalable_vector_type_p): Delete.
    (riscv_arg_has_vector): Simplify.
    (riscv_pass_in_vector_p): Adjust warning message.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Add -Wno-psabi
    option.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge-7.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/merge_run-7.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: Ditto.
    * gcc.target/riscv/rvv/base/pr110119-1.c: Ditto.
    * gcc.target/riscv/rvv/base/pr110119-2.c: Ditto.
    * gcc.target/riscv/vector-abi-1.c: Ditto.
    * gcc.target/riscv/vector-abi-2.c: Ditto.
    * gcc.target/riscv/vector-abi-3.c: Ditto.
    * gcc.target/riscv/vector-abi-4.c: Ditto.
    * gcc.target/riscv/vector-abi-5.c: Ditto.
    * gcc.target/riscv/vector-abi-6.c: Ditto.
    * gcc.target/riscv/vector-abi-7.c: New test.
    * gcc.target/riscv/vector-abi-8.c: New test.
    * gcc.target/riscv/vector-abi-9.c: New test.
2023-06-19  RISC-V: Save and restore FCSR in interrupt functions to avoid program errors  (Jin Ma; 2 files changed, -3/+58)

To keep interrupt functions from changing the FCSR, it needs to be saved
and restored at the beginning and end of the function.

gcc/ChangeLog:

    * config/riscv/riscv.cc (riscv_compute_frame_info): Allocate frame
    for FCSR.
    (riscv_for_each_saved_reg): Save and restore FCSR in interrupt
    functions.
    * config/riscv/riscv.md (riscv_frcsr): New patterns.
    (riscv_fscsr): Likewise.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/interrupt-fcsr-1.c: New test.
    * gcc.target/riscv/interrupt-fcsr-2.c: New test.
    * gcc.target/riscv/interrupt-fcsr-3.c: New test.
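A minimal sketch of an affected handler (the handler body and names are hypothetical; the attribute is the documented RISC-V interrupt attribute): FP arithmetic inside the handler can set FCSR exception flags, which the prologue/epilogue now preserve.

    volatile float scale;

    /* FCSR is saved on entry and restored on return, so the
       interrupted code's floating-point state is unchanged by
       the handler's arithmetic.  */
    void __attribute__ ((interrupt))
    timer_handler (void)
    {
      scale = scale * 0.5f;
    }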
2023-06-19  AArch64: convert some patterns to compact MD syntax  (Tamar Christina; 1 file changed, -83/+78)

Hi All,

This converts some patterns in the AArch64 backend to use the new
compact syntax.

gcc/ChangeLog:

    * config/aarch64/aarch64.md (arches): Add nosimd.
    (*mov<mode>_aarch64, *movsi_aarch64, *movdi_aarch64): Rewrite to
    compact syntax.
2023-06-19  RISC-V: Fix VWEXTF iterator requirement  (Li Xu; 1 file changed, -6/+6)

gcc/ChangeLog:

    * config/riscv/vector-iterators.md: zvfh/zvfhmin depends on the
    Zve32f extension.
2023-06-19  RISC-V: Bugfix for RVV widening reduction in ZVE32/64  (Pan Li; 3 files changed, -199/+163)

The RVV widening reduction has 3 different patterns for zve128+, zve64
and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1,
mode2). The implementation of code_for_reduc may look like below:

    code_for_reduc (code, mode1, mode2)
    {
      if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
        return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

      if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
        return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

      if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
        return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
    }

Thus there is a problem here. For example, on zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which returns the code for ZVE128+
instead of ZVE32, logically.

This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector modes to code_for_reduc. For example,
ZVE32 will use code_for_reduc (max, VNx1HF, VNx2HF), and the correct
code for ZVE32 is returned as expected.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

    PR target/110299
    * config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
    modes.
    * config/riscv/vector-iterators.md: Remove VWLMUL1, VWLMUL1_ZVE64,
    VWLMUL1_ZVE32, VI_ZVE64, VI_ZVE32, VWI, VWI_ZVE64, VWI_ZVE32,
    VF_ZVE63 and VF_ZVE32.
    * config/riscv/vector.md
    (@pred_widen_reduc_plus<v_su><mode><vwlmul1>): Removed.
    (@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve64>): Ditto.
    (@pred_widen_reduc_plus<v_su><mode><vwlmul1_zve32>): Ditto.
    (@pred_widen_reduc_plus<order><mode><vwlmul1>): Ditto.
    (@pred_widen_reduc_plus<order><mode><vwlmul1_zve64>): Ditto.
    (@pred_widen_reduc_plus<v_su><VQI:mode><VHI_LMUL1:mode>): New pattern.
    (@pred_widen_reduc_plus<v_su><VHI:mode><VSI_LMUL1:mode>): Ditto.
    (@pred_widen_reduc_plus<v_su><VSI:mode><VDI_LMUL1:mode>): Ditto.
    (@pred_widen_reduc_plus<order><VHF:mode><VSF_LMUL1:mode>): Ditto.
    (@pred_widen_reduc_plus<order><VSF:mode><VDF_LMUL1:mode>): Ditto.

gcc/testsuite/ChangeLog:

    PR target/110299
    * gcc.target/riscv/rvv/base/pr110299-1.c: New test.
    * gcc.target/riscv/rvv/base/pr110299-1.h: New test.
    * gcc.target/riscv/rvv/base/pr110299-2.c: New test.
    * gcc.target/riscv/rvv/base/pr110299-2.h: New test.
    * gcc.target/riscv/rvv/base/pr110299-3.c: New test.
    * gcc.target/riscv/rvv/base/pr110299-3.h: New test.
    * gcc.target/riscv/rvv/base/pr110299-4.c: New test.
    * gcc.target/riscv/rvv/base/pr110299-4.h: New test.
2023-06-19  RISC-V: Bugfix for RVV float reduction in ZVE32/64  (Pan Li; 3 files changed, -216/+280)

The RVV floating-point reduction has 3 different patterns for zve128+,
zve64 and zve32. They take the same iterator with different attributes.
However, we need the generated function code_for_reduc (code, mode1,
mode2). The implementation of code_for_reduc may look like below:

    code_for_reduc (code, mode1, mode2)
    {
      if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
        return CODE_FOR_pred_reduc_maxvnx1hfvnx16hf; // ZVE128+

      if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
        return CODE_FOR_pred_reduc_maxvnx1hfvnx8hf;  // ZVE64

      if (code == max && mode1 == VNx1HF && mode2 == VNx1HF)
        return CODE_FOR_pred_reduc_maxvnx1hfvnx4hf;  // ZVE32
    }

Thus there is a problem here. For example, on zve32 we will have
code_for_reduc (max, VNx1HF, VNx1HF), which returns the code for ZVE128+
instead of ZVE32, logically.

This patch merges the 3 patterns into one pattern, and passes both the
input_vector and the ret_vector modes to code_for_reduc. For example,
ZVE32 will use code_for_reduc (max, VNx1HF, VNx2HF), and the correct
code for ZVE32 is returned as expected.

Please note both GCC 13 and 14 are impacted by this issue.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>

gcc/ChangeLog:

    PR target/110277
    * config/riscv/riscv-vector-builtins-bases.cc: Adjust expand for
    ret_mode.
    * config/riscv/vector-iterators.md: Add VHF, VSF, VDF, VHF_LMUL1,
    VSF_LMUL1, VDF_LMUL1, and remove unused attr.
    * config/riscv/vector.md (@pred_reduc_<reduc><mode><vlmul1>):
    Removed.
    (@pred_reduc_<reduc><mode><vlmul1_zve64>): Ditto.
    (@pred_reduc_<reduc><mode><vlmul1_zve32>): Ditto.
    (@pred_reduc_plus<order><mode><vlmul1>): Ditto.
    (@pred_reduc_plus<order><mode><vlmul1_zve32>): Ditto.
    (@pred_reduc_plus<order><mode><vlmul1_zve64>): Ditto.
    (@pred_reduc_<reduc><VHF:mode><VHF_LMUL1:mode>): New pattern.
    (@pred_reduc_<reduc><VSF:mode><VSF_LMUL1:mode>): Ditto.
    (@pred_reduc_<reduc><VDF:mode><VDF_LMUL1:mode>): Ditto.
    (@pred_reduc_plus<order><VHF:mode><VHF_LMUL1:mode>): Ditto.
    (@pred_reduc_plus<order><VSF:mode><VSF_LMUL1:mode>): Ditto.
    (@pred_reduc_plus<order><VDF:mode><VDF_LMUL1:mode>): Ditto.

gcc/testsuite/ChangeLog:

    PR target/110277
    * gcc.target/riscv/rvv/base/pr110277-1.c: New test.
    * gcc.target/riscv/rvv/base/pr110277-1.h: New test.
    * gcc.target/riscv/rvv/base/pr110277-2.c: New test.
    * gcc.target/riscv/rvv/base/pr110277-2.h: New test.
2023-06-19amdgcn: implement vector div and mod libfuncsAndrew Stubbs1-0/+244
Also divmod, but only for scalar modes, for now (because there are no complex int vectors yet). gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_divmod_libfunc): New function. (gcn_init_libfuncs): Add div and mod functions for all modes. Add placeholders for divmod functions. (TARGET_EXPAND_DIVMOD_LIBFUNC): Define. libgcc/ChangeLog: * config/gcn/lib2-divmod-di.c: Reimplement like lib2-divmod.c. * config/gcn/lib2-divmod.c: Likewise. * config/gcn/lib2-gcn.h: Add new types and prototypes for all the new vector libfuncs. * config/gcn/t-amdgcn: Add new files. * config/gcn/amdgcn_veclib.h: New file. * config/gcn/lib2-vec_divmod-di.c: New file. * config/gcn/lib2-vec_divmod-hi.c: New file. * config/gcn/lib2-vec_divmod-qi.c: New file. * config/gcn/lib2-vec_divmod.c: New file. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/predcom-2.c: Avoid vectors on amdgcn. * gcc.dg/unroll-8.c: Likewise. * gcc.dg/vect/slp-26.c: Change expected results on amdgdn. * lib/target-supports.exp (check_effective_target_vect_int_mod): Add amdgcn. (check_effective_target_divmod): Likewise. * gcc.target/gcn/simd-math-3-16.c: New test. * gcc.target/gcn/simd-math-3-2.c: New test. * gcc.target/gcn/simd-math-3-32.c: New test. * gcc.target/gcn/simd-math-3-4.c: New test. * gcc.target/gcn/simd-math-3-8.c: New test. * gcc.target/gcn/simd-math-3-char-16.c: New test. * gcc.target/gcn/simd-math-3-char-2.c: New test. * gcc.target/gcn/simd-math-3-char-32.c: New test. * gcc.target/gcn/simd-math-3-char-4.c: New test. * gcc.target/gcn/simd-math-3-char-8.c: New test. * gcc.target/gcn/simd-math-3-char-run-16.c: New test. * gcc.target/gcn/simd-math-3-char-run-2.c: New test. * gcc.target/gcn/simd-math-3-char-run-32.c: New test. * gcc.target/gcn/simd-math-3-char-run-4.c: New test. * gcc.target/gcn/simd-math-3-char-run-8.c: New test. * gcc.target/gcn/simd-math-3-char-run.c: New test. * gcc.target/gcn/simd-math-3-char.c: New test. * gcc.target/gcn/simd-math-3-long-16.c: New test. * gcc.target/gcn/simd-math-3-long-2.c: New test. * gcc.target/gcn/simd-math-3-long-32.c: New test. * gcc.target/gcn/simd-math-3-long-4.c: New test. * gcc.target/gcn/simd-math-3-long-8.c: New test. * gcc.target/gcn/simd-math-3-long-run-16.c: New test. * gcc.target/gcn/simd-math-3-long-run-2.c: New test. * gcc.target/gcn/simd-math-3-long-run-32.c: New test. * gcc.target/gcn/simd-math-3-long-run-4.c: New test. * gcc.target/gcn/simd-math-3-long-run-8.c: New test. * gcc.target/gcn/simd-math-3-long-run.c: New test. * gcc.target/gcn/simd-math-3-long.c: New test. * gcc.target/gcn/simd-math-3-run-16.c: New test. * gcc.target/gcn/simd-math-3-run-2.c: New test. * gcc.target/gcn/simd-math-3-run-32.c: New test. * gcc.target/gcn/simd-math-3-run-4.c: New test. * gcc.target/gcn/simd-math-3-run-8.c: New test. * gcc.target/gcn/simd-math-3-run.c: New test. * gcc.target/gcn/simd-math-3-short-16.c: New test. * gcc.target/gcn/simd-math-3-short-2.c: New test. * gcc.target/gcn/simd-math-3-short-32.c: New test. * gcc.target/gcn/simd-math-3-short-4.c: New test. * gcc.target/gcn/simd-math-3-short-8.c: New test. * gcc.target/gcn/simd-math-3-short-run-16.c: New test. * gcc.target/gcn/simd-math-3-short-run-2.c: New test. * gcc.target/gcn/simd-math-3-short-run-32.c: New test. * gcc.target/gcn/simd-math-3-short-run-4.c: New test. * gcc.target/gcn/simd-math-3-short-run-8.c: New test. * gcc.target/gcn/simd-math-3-short-run.c: New test. * gcc.target/gcn/simd-math-3-short.c: New test. * gcc.target/gcn/simd-math-3.c: New test. * gcc.target/gcn/simd-math-4-char-run.c: New test. 
* gcc.target/gcn/simd-math-4-char.c: New test. * gcc.target/gcn/simd-math-4-long-run.c: New test. * gcc.target/gcn/simd-math-4-long.c: New test. * gcc.target/gcn/simd-math-4-run.c: New test. * gcc.target/gcn/simd-math-4-short-run.c: New test. * gcc.target/gcn/simd-math-4-short.c: New test. * gcc.target/gcn/simd-math-4.c: New test. * gcc.target/gcn/simd-math-5-16.c: New test. * gcc.target/gcn/simd-math-5-32.c: New test. * gcc.target/gcn/simd-math-5-4.c: New test. * gcc.target/gcn/simd-math-5-8.c: New test. * gcc.target/gcn/simd-math-5-char-16.c: New test. * gcc.target/gcn/simd-math-5-char-32.c: New test. * gcc.target/gcn/simd-math-5-char-4.c: New test. * gcc.target/gcn/simd-math-5-char-8.c: New test. * gcc.target/gcn/simd-math-5-char-run-16.c: New test. * gcc.target/gcn/simd-math-5-char-run-32.c: New test. * gcc.target/gcn/simd-math-5-char-run-4.c: New test. * gcc.target/gcn/simd-math-5-char-run-8.c: New test. * gcc.target/gcn/simd-math-5-char-run.c: New test. * gcc.target/gcn/simd-math-5-char.c: New test. * gcc.target/gcn/simd-math-5-long-16.c: New test. * gcc.target/gcn/simd-math-5-long-32.c: New test. * gcc.target/gcn/simd-math-5-long-4.c: New test. * gcc.target/gcn/simd-math-5-long-8.c: New test. * gcc.target/gcn/simd-math-5-long-run-16.c: New test. * gcc.target/gcn/simd-math-5-long-run-32.c: New test. * gcc.target/gcn/simd-math-5-long-run-4.c: New test. * gcc.target/gcn/simd-math-5-long-run-8.c: New test. * gcc.target/gcn/simd-math-5-long-run.c: New test. * gcc.target/gcn/simd-math-5-long.c: New test. * gcc.target/gcn/simd-math-5-run-16.c: New test. * gcc.target/gcn/simd-math-5-run-32.c: New test. * gcc.target/gcn/simd-math-5-run-4.c: New test. * gcc.target/gcn/simd-math-5-run-8.c: New test. * gcc.target/gcn/simd-math-5-run.c: New test. * gcc.target/gcn/simd-math-5-short-16.c: New test. * gcc.target/gcn/simd-math-5-short-32.c: New test. * gcc.target/gcn/simd-math-5-short-4.c: New test. * gcc.target/gcn/simd-math-5-short-8.c: New test. * gcc.target/gcn/simd-math-5-short-run-16.c: New test. * gcc.target/gcn/simd-math-5-short-run-32.c: New test. * gcc.target/gcn/simd-math-5-short-run-4.c: New test. * gcc.target/gcn/simd-math-5-short-run-8.c: New test. * gcc.target/gcn/simd-math-5-short-run.c: New test. * gcc.target/gcn/simd-math-5-short.c: New test. * gcc.target/gcn/simd-math-5.c: New test.
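For illustration, a loop of the kind the new libfuncs serve (the function name here is an assumption, not taken from the patch): on amdgcn the vectorizer can now keep such division/modulo loops vectorized, expanding the operations into calls to the new vector libfuncs instead of falling back to scalar code.

  void
  vdivmod (int *restrict quot, int *restrict rem,
           const int *restrict a, const int *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      {
        quot[i] = a[i] / b[i];  /* expands to a vector div libfunc call */
        rem[i]  = a[i] % b[i];  /* expands to a vector mod libfunc call */
      }
  }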
2023-06-19amdgcn: minimal V64TImode vector supportAndrew Stubbs3-130/+299
Just enough support for TImode vectors to exist, load, store, move, without any real instructions available. This is primarily for the use of divmodv64di4, which uses TImode to return a pair of DImode values. gcc/ChangeLog: * config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function. * config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators. (V_MOV, V_MOV_ALT): Likewise. (scalar_mode, SCALAR_MODE): Add TImode. (vnsi, VnSI, vndi, VnDI): Likewise. (vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV. (mov<mode>, mov<mode>_unspec): Use V_MOV. (*mov<mode>_4reg): New insn. (mov<mode>_exec): New 4reg variant. (mov<mode>_sgprbase): Likewise. (reload_in<mode>, reload_out<mode>): Use V_MOV. (vec_set<mode>): Likewise. (vec_duplicate<mode><exec>): New 4reg variant. (vec_extract<mode><scalar_mode>): Likewise. (vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Rename to ... (vec_extract<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV. (vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop): New 4reg variant. (fold_extract_last_<mode>): Use V_MOV. (vec_init<V_ALL:mode><V_ALL_ALT:mode>): Rename to ... (vec_init<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV. (gather_load<mode><vnsi>, gather<mode>_expr<exec>, gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>, gather<mode>_insn_2offsets<exec>): Use V_MOV. (scatter_store<mode><vnsi>, scatter<mode>_expr<exec_scatter>, scatter<mode>_insn_1offset<exec_scatter>, scatter<mode>_insn_1offset_ds<exec_scatter>, scatter<mode>_insn_2offsets<exec_scatter>): Likewise. (maskload<mode>di, maskstore<mode>di, mask_gather_load<mode><vnsi>, mask_scatter_store<mode><vnsi>): Likewise. * config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p. (gcn_hard_regno_mode_ok): Likewise. (GEN_VNM): Add TImode support. (USE_TI): New macro. Separate TImode operations from non-TImode ones. (gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode, V8TImode, and V2TImode. (print_operand): Add 'J' and 'K' print codes.
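A hedged C sketch of the pairing trick mentioned above, using GCC's __int128 extension (the high/low layout is an assumption): a 128-bit value can carry both DImode results of a divmod at once, which is why TImode support is needed at all.

  typedef unsigned __int128 u128;

  static u128
  divmod_di (long long a, long long b)
  {
    u128 q = (unsigned long long) (a / b);
    u128 r = (unsigned long long) (a % b);
    return (r << 64) | q;  /* assumed layout: remainder high, quotient low */
  }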
2023-06-19Fix build of aarch64Richard Biener1-1/+2
The following fixes a reference to the LOOP_VINFO_MASKS array in the aarch64 backend after my changes. * config/aarch64/aarch64.cc (aarch64_vector_costs::analyze_loop_vinfo): Fix reference to LOOP_VINFO_MASKS.
2023-06-19avr: Fix wrong array bounds warning on SFR accessSenthil Kumar Selvaraj1-0/+17
The warning was raised on accessing SFRs at addresses below the default page size, as gcc considers accessing addresses in the first page of memory suspicious. This doesn't apply to an embedded target like the avr, where both flash and RAM have zero as a valid address. Zero is also a valid address in named address spaces (__memx, __flash<n> etc.). This commit implements TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID for the avr target and reports to gcc that zero is a valid address on all address spaces. It also disables flag_delete_null_pointer_checks based on the target hook, and modifies target-supports.exp to add avr to the list of targets that always keep null pointer checks. This fixes a bunch of DejaGnu failures that occur otherwise. PR target/105523 gcc/ChangeLog: * common/config/avr/avr-common.cc: Remove setting of OPT_fdelete_null_pointer_checks. * config/avr/avr.cc (avr_option_override): Clear flag_delete_null_pointer_checks if zero_address_valid. (avr_addr_space_zero_address_valid): New function. (TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID): Provide target hook. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_keeps_null_pointer_checks): Add avr. * gcc.target/avr/pr105523.c: New test.
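A sketch of what the new hook plausibly looks like (the typedef and the body are assumptions based on the commit text, not the actual avr.cc code): it simply reports address zero as valid in every address space, and avr_option_override then keeps null-pointer checks alive.

  #include <stdbool.h>

  typedef unsigned char addr_space_t;   /* stand-in for GCC's typedef */

  /* Address 0 is valid in all avr address spaces (flash, RAM,
     __memx, __flash<n>, ...), so report it as such unconditionally.
     With this in place, avr_option_override can clear
     flag_delete_null_pointer_checks.  */
  static bool
  avr_addr_space_zero_address_valid (addr_space_t as)
  {
    (void) as;
    return true;
  }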
2023-06-19RISC-V: Add autovec FP unary operations.Robin Dapp1-1/+35
This patch adds floating-point autovec expanders for vfneg and vfabs as well as vfsqrt, and the accompanying tests. Similarly to the binop tests, there are now flavors for zvfh. gcc/ChangeLog: * config/riscv/autovec.md (<optab><mode>2): Add unop expanders. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP. * gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test. * gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/zvfhmin-1.c: Add unops.
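Illustrative loops of the kind the new unop expanders cover (sketches in the spirit of the tests, not the test files themselves); each should now vectorize to vfneg.v, vfabs.v (assembler pseudos for vfsgnjn.vv/vfsgnjx.vv) and vfsqrt.v respectively:

  #include <math.h>

  void vneg_loop (float *restrict d, const float *restrict s, int n)
  { for (int i = 0; i < n; i++) d[i] = -s[i]; }        /* vfneg.v */

  void vabs_loop (float *restrict d, const float *restrict s, int n)
  { for (int i = 0; i < n; i++) d[i] = fabsf (s[i]); } /* vfabs.v */

  void vsqrt_loop (float *restrict d, const float *restrict s, int n)
  { for (int i = 0; i < n; i++) d[i] = sqrtf (s[i]); } /* vfsqrt.v */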
2023-06-19RISC-V: Add autovec FP binary operations.Robin Dapp5-14/+137
This implements the floating-point autovec expanders for binary operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin, and adds tests. The existing tests are split up into non-_Float16 and _Float16 flavors as we cannot rely on the zvfh extension being present. As long as we do not have full middle-end support we need -ffast-math for the tests. In order to allow proper _Float16 support, this patch disables general _Float16 promotion to float when TARGET_ZVFH is defined, similar to TARGET_ZFH or TARGET_ZHINX. gcc/ChangeLog: * config/riscv/autovec.md (<optab><mode>3): Implement binop expander. * config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare. (enum vxrm_field_enum): Rename this... (enum fixed_point_rounding_mode): ...to this. (enum frm_field_enum): Rename this... (enum floating_point_rounding_mode): ...to this. * config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function. * config/riscv/riscv.cc (riscv_const_insns): Clarify const vector handling. (riscv_libgcc_floating_mode_supported_p): Adjust comment. (riscv_excess_precision): Do not convert to float for ZVFH. * config/riscv/vector-iterators.md: Add VF_AUTO iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP. * gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP. * gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test. * lib/target-supports.exp: Add riscv_vector_hw and riscv_zvfh_hw target selectors.
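A short example of the excess-precision point above (flags and extension behavior as described in the commit; the code itself is illustrative, compiled for e.g. rv64gcv_zvfh): with ZVFH, _Float16 arithmetic can stay in half precision rather than being promoted to float and truncated back, and simple loops vectorize to vfadd.vv.

  _Float16
  add16 (_Float16 a, _Float16 b)
  {
    return a + b;   /* no promote-to-float round trip under ZVFH */
  }

  void
  vadd16 (_Float16 *restrict d, const _Float16 *restrict a,
          const _Float16 *restrict b, int n)
  {
    for (int i = 0; i < n; i++)
      d[i] = a[i] + b[i];   /* expected: vfadd.vv (with -ffast-math for now) */
  }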
2023-06-19RISC-V: Add sign-extending variants for vmv.x.s.Robin Dapp2-0/+34
When the destination register of a vmv.x.s needs to be sign-extended to XLEN, we currently emit an sext insn. Since vmv.x.s performs this automatically, this patch adds two instruction patterns that include sign_extend for the destination operand. gcc/ChangeLog: * config/riscv/vector-iterators.md: Add VI_QH iterator. * config/riscv/autovec-opt.md (@pred_extract_first_sextdi<mode>): New vmv.x.s pattern that includes sign extension. (@pred_extract_first_sextsi<mode>): Ditto for SImode. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: Ensure that no sext insns are present. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: Ditto.
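A small example of the effect, using the GNU vector extension (illustrative, not one of the tests): extracting a narrower element into a 64-bit result previously needed vmv.x.s plus a separate sext insn; the new patterns fold the extension into vmv.x.s itself.

  typedef int v4si __attribute__ ((vector_size (16)));

  long
  extract_first (v4si v)
  {
    return v[0];   /* vec_extract + sign extension to XLEN in one insn */
  }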
2023-06-19RISC-V: Implement vec_set and vec_extract.Robin Dapp3-2/+132
This implements the vec_set and vec_extract patterns for integer and floating-point data types. For vec_set we broadcast the insert value to a vector register and then perform a vslideup with effective length 1 to the requested index. vec_extract is done by sliding the requested element down to index 0 and moving it to a scalar register with v(f)mv.[xf].s. The patch does not include vector-vector extraction, which will be done at a later time. gcc/ChangeLog: * config/riscv/autovec.md (vec_set<mode>): Implement. (vec_extract<mode><vel>): Implement. * config/riscv/riscv-protos.h (enum insn_type): Add slide insn. (emit_vlmax_slide_insn): Declare. (emit_nonvlmax_slide_tu_insn): Declare. (emit_scalar_move_insn): Export. (emit_nonvlmax_integer_move_insn): Export. * config/riscv/riscv-v.cc (emit_vlmax_slide_insn): New function. (emit_nonvlmax_slide_tu_insn): New function. (emit_vlmax_masked_mu_insn): No change. (emit_vlmax_integer_move_insn): Export. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_extract-zvfh-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-1.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-2.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-3.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-4.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/vec_set-zvfh-run.c: New test.
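An illustrative mapping from GNU vector extension code to the new patterns (the correspondence follows the description above; the snippet itself is an assumption, not one of the new tests):

  typedef int v4si __attribute__ ((vector_size (16)));

  v4si
  set_elem (v4si v, int x)
  {
    v[3] = x;      /* vec_set: broadcast x, then vslideup with length 1 */
    return v;
  }

  int
  get_elem (v4si v)
  {
    return v[2];   /* vec_extract: vslidedown to index 0, then vmv.x.s */
  }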
2023-06-19avr: Fix ICE on optimize attribute.Senthil Kumar Selvaraj1-2/+2
This commit fixes an ICE when an optimize attribute changes the prevailing optimization level. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105069 describes the same ICE for the sh target, where the fix was to enable save/restore of target specific options modified via the TARGET_OPTIMIZATION_TABLE hook. For the AVR target, -mgas-isr-prologues and -mmain-is-OS_task are those target specific options. As they enable generation of more optimal code, this commit adds the Optimization option property to those option records, and that fixes the ICE. A regression run shows no regressions and >100 new PASSes. PR target/110086 gcc/ChangeLog: * config/avr/avr.opt (mgas-isr-prologues, mmain-is-OS_task): Add Optimization option property. gcc/testsuite/ChangeLog: * gcc.target/avr/pr110086.c: New test.
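A minimal reproducer shape for this class of ICE (assumed; the committed test is gcc.target/avr/pr110086.c, whose contents are not shown here): an optimize attribute that changes the prevailing level forces target option save/restore.

  /* Compiled for avr with e.g. -O1, the attribute switches the level and
     previously ICEd because mgas-isr-prologues/mmain-is-OS_task were not
     marked with the Optimization property.  */
  __attribute__ ((optimize ("Os")))
  int
  f (int x)
  {
    return x + 1;
  }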