This patch supports the 2 intrinsic APIs of the FP16 ZVFHMIN extension, i.e.
SEW=16 for the instructions below:
vfwcvt.f.f.v
vfncvt.f.f.w
Users can then leverage the intrinsic APIs to perform conversions between
RVV single-precision and half-precision floating-point vectors.
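As a minimal illustration, usage might look as follows (a sketch; the
intrinsic spellings follow the usual __riscv_ naming scheme and are
assumptions here, not taken from this patch):
#include <riscv_vector.h>
vfloat32m2_t
f16_to_f32 (vfloat16m1_t x, size_t vl)
{
  return __riscv_vfwcvt_f_f_v_f32m2 (x, vl); /* vfwcvt.f.f.v */
}
vfloat16m1_t
f32_to_f16 (vfloat32m2_t x, size_t vl)
{
  return __riscv_vfncvt_f_f_w_f16m1 (x, vl); /* vfncvt.f.f.w */
}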
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): Add vfloat32mf2_t type to vfncvt.f.f.w operations.
(vfloat32m1_t): Likewise.
(vfloat32m2_t): Likewise.
(vfloat32m4_t): Likewise.
(vfloat32m8_t): Likewise.
* config/riscv/riscv-vector-builtins.def: Fix typo in comments.
* config/riscv/vector-iterators.md: Add single to half machine
mode conversion.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/zvfhmin-intrinsic.c: New test.
|
|
Move all optimization patterns into autovec-opt.md to make the organization
easier to maintain.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>not<mode>): Move to autovec-opt.md.
(*n<optab><mode>): Ditto.
* config/riscv/autovec.md (*<optab>not<mode>): Ditto.
(*n<optab><mode>): Ditto.
* config/riscv/vector.md: Ditto.
|
|
This patch fixes PR target/110083, an ICE-on-valid regression exposed by
my recent PTEST improvements (to address PR target/109973). The latent
bug (admittedly mine) is that the scalar-to-vector (STV) pass doesn't update
or delete REG_EQUAL notes attached to COMPARE instructions. As a result
the operands of COMPARE would be mismatched, with the register transformed
to V1TImode, but the immediate operand left as const_wide_int, which is
valid for TImode but not V1TImode. This remained latent while the STV
conversion changed the mode of the COMPARE to CCmode, as later passes
recognized the REG_EQUAL note was obviously invalid since the modes didn't
match; but now that we (correctly) preserve CCZmode on the COMPARE, the
mismatched operand modes trigger a sanity-checking ICE downstream.
Fixed by updating (or deleting) any REG_EQUAL notes in convert_compare.
Before:
(expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
(const_wide_int 0x80000000000000000000000000000000))
After:
(expr_list:REG_EQUAL (compare:CCZ (reg:V1TI 119 [ ivin.29_38 ])
(const_vector:V1TI [
(const_wide_int 0x80000000000000000000000000000000)
]))
2023-06-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/110083
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Update or delete REG_EQUAL notes, converting CONST_INT and
CONST_WIDE_INT immediate operands to a suitable CONST_VECTOR.
gcc/testsuite/ChangeLog
PR target/110083
* gcc.target/i386/pr110083.c: New test case.
|
|
This patch would like to allow the mov and spill operation for the RVV
vfloat16*_t types. The involved machine mode includes VNx1HF, VNx2HF,
VNx4HF, VNx8HF, VNx16HF, VNx32HF and VNx64HF.
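A trivial sketch of code that exercises such a spill (hypothetical; the real
coverage is in the new mov-14.c and spill-13.c tests):
#include <riscv_vector.h>
void g (void);
vfloat16m1_t
spill_f16 (vfloat16m1_t x)
{
  g (); /* The call clobbers vector registers, forcing x to be spilled
           to and reloaded from the stack in a VNx*HF mode.  */
  return x;
}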
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add the float16 type to DEF_RVV_F_OPS.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/riscv.md: Add vfloat16*_t to attr mode.
* config/riscv/vector-iterators.md: Add vfloat16*_t machine mode
to V, V_WHOLE, V_FRACT, VINDEX, VM, VEL and sew.
* config/riscv/vector.md: Add vfloat16*_t machine mode to sew,
vlmul and ratio.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/mov-14.c: New test.
* gcc.target/riscv/rvv/base/spill-13.c: New test.
|
|
This patch fixes a CFI issue introduced by
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=60524be1e3929d83e15fceac6e2aa053c8a6fb20
Test code:
char my_getchar();
float getf();
int test_f0()
{
int s0 = my_getchar();
float f0 = getf();
int b = my_getchar();
return f0+s0+b;
}
cflags: -g -Os -march=rv32imafc -mabi=ilp32f -msave-restore -mcmodel=medlow
before patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32
...
addi sp,sp,16
.cfi_def_cfa_offset 0 // issue here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset -16 // issue here
.cfi_endproc
after patch:
test_f0:
...
.cfi_startproc
call t0,__riscv_save_1
.cfi_offset 8, -8
.cfi_offset 1, -4
.cfi_def_cfa_offset 16
...
addi sp,sp,-16
.cfi_def_cfa_offset 32
...
addi sp,sp,16
.cfi_def_cfa_offset 16 // corrected here
...
tail __riscv_restore_1
.cfi_restore 8
.cfi_restore 1
.cfi_def_cfa_offset 0 // corrected here
.cfi_endproc
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_epilogue): Fix the CFI issue by
using the correct offset.
|
|
There are 2 small changes in this patch, but they do not affect the result.
1. Remove an unnecessary md pattern for TARGET_XTHEADCONDMOV in thead.md. operands[4]
of the "if_then_else" is always a comparison operation, so the generated RTL never
matches the pattern being deleted.
2. Change operands[4] from const0_rtx to operands[1] to maintain RTL consistency,
although, when outputting assembly, only the CODE of operands[4] affects the result.
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/thead.md (*th_cond_gpr_mov<GPR:mode><GPR2:mode>): Delete.
|
|
Notice there is a warning in predicates.md:
../../../riscv-gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../../riscv-gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
../../../riscv-gcc/gcc/config/riscv/predicates.md:34:20: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
|| INTVAL (op) == GET_MODE_MASK (SImode)"))))
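The fix is simply to use the unsigned accessor, making both sides of the
comparison unsigned:
(match_test "UINTVAL (op) == GET_MODE_MASK (HImode)
             || UINTVAL (op) == GET_MODE_MASK (SImode)")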
gcc/ChangeLog:
* config/riscv/predicates.md: Change INTVAL into UINTVAL.
|
|
This patch is to enhance vwmul.vv combine optimizations.
Consider this following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int16_t *__restrict dst4,
int8_t *__restrict a, int8_t *__restrict b,
int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] * (int16_t) b[i];
dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
In such a complicated case, the operands are not single-use; they are used by multiple statements.
GCC's combine optimization will iterate over combinations of the operands.
Also, we add another pattern of vwmulsu.vv to enhance the vwmulsu.vv optimization.
Currently, we have format:
(mult: (sign_extend) (zero_extend)) in vector.md for intrinsics calling.
Now, we add a new vwmulsu.vv with this format:
(mult: (zero_extend) (sign_extend))
to handle the following case (code mixing signed and unsigned widening multiplication):
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int16_t *__restrict dst4,
int8_t *__restrict a, uint8_t *__restrict b,
uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] * (int16_t) b[i];
dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
Before this patch:
...
vsext.vf2 v6,v1
add t0,a0,t4
vzext.vf2 v4,v1
vmul.vv v2,v4,v6
add t0,a1,t4
vzext.vf2 v2,v1
vmul.vv v4,v2,v4
add t0,a2,t4
vmul.vv v2,v2,v6
add t0,a3,t4
sub t6,t6,t1
vsext.vf2 v2,v1
vmul.vv v2,v2,v6
...
After this patch:
...
add t0,a0,t3
vwmulsu.vv v2,v1,v3
add t0,a1,t3
vwmulu.vv v4,v3,v2
add t0,a2,t3
vwmulsu.vv v3,v1,v2
add t0,a3,t3
sub t4,t4,t1
vwmul.vv v2,v1,v3
...
gcc/ChangeLog:
* config/riscv/vector.md: Add autovec-opt.md.
* config/riscv/autovec-opt.md: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
|
|
Add missing insn patterns for v2si -> v2hi/v2qi and v2hi-> v2qi vector
truncate.
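One way such a truncation can arise from C (a hypothetical trigger, not
necessarily the PR testcase):
typedef short v2hi __attribute__ ((vector_size (4)));
typedef int v2si __attribute__ ((vector_size (8)));
v2hi
trunc_v2si_v2hi (v2si x)
{
  return __builtin_convertvector (x, v2hi); /* v2si -> v2hi truncate */
}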
gcc/ChangeLog:
PR target/92658
* config/i386/mmx.md (truncv2hiv2qi2): New define_insn.
(truncv2si<mode>2): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr92658-avx512bw-trunc-2.c: New test.
|
|
This bug was essentially that darwin_rs6000_special_round_type_align()
was ignoring externally-imposed capping of field alignment.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/110044
gcc/ChangeLog:
* config/rs6000/rs6000.cc (darwin_rs6000_special_round_type_align):
Make sure that we do not have a cap on field alignment before altering
the struct layout based on the type alignment of the first entry.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/darwin-abi-13-0.c: New test.
* gcc.target/powerpc/darwin-abi-13-1.c: New test.
* gcc.target/powerpc/darwin-abi-13-2.c: New test.
* gcc.target/powerpc/darwin-structs-0.h: New test.
|
|
The third argument for __builtin_altivec_tr_stxvrhx should be short *
not int *. Similarly, the third argument for __builtin_altivec_tr_stxvrwx
should be int * not short *. This patch fixes the arguments in the two
builtins.
A runnable test case is added to test the __builtin_altivec_tr_stxvrbx,
__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and
__builtin_altivec_tr_stxvrdx builtins.
gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.
gcc/testsuite/
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test
for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.
|
|
After reload, there may be sequences like
lreg = dreg
lreg = lreg <op> const
with an LD_REGS dreg, non-LD_REGS lreg, and <op> in PLUS, IOR, AND.
If dreg dies after the first insn, it is possible to use
dreg = dreg <op> const
lreg = dreg
instead, which is more efficient.
gcc/
PR target/110088
* config/avr/avr.md: Add an RTL peephole to optimize operations on
non-LD_REGS after a move from LD_REGS.
(piaop): New code iterator.
|
|
Based on these:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
Add _mu C++ overloaded intrinsics for load, viota and vid.
Co-authored-by: KuanLin Chen <best124612@gmail.com>
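Assuming the overloaded _mu spellings from the linked rvv-intrinsic-doc
issue/PR, usage would look roughly like this (a sketch, not taken from this
patch):
#include <riscv_vector.h>
vuint32m1_t
iota_mu (vbool32_t mask, vuint32m1_t maskedoff, size_t vl)
{
  return __riscv_viota_mu (mask, maskedoff, mask, vl); /* viota.m with mu */
}
vuint32m1_t
id_mu (vbool32_t mask, vuint32m1_t maskedoff, size_t vl)
{
  return __riscv_vid_mu (mask, maskedoff, vl); /* vid.v with mu */
}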
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded intrinsics.
* config/riscv/riscv-vector-builtins-shapes.cc (struct fault_load_def): Ditto.
|
|
This patch optimizes the following series vector:
[nunits - 1, nunits - 2, ...., 0]
Before this patch:
vid
vmul
vadd
After this patch:
vid
vrsub
This patch is an obvious and simple optimization, ok for trunk?
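For example, a reverse index vector such as the following (hypothetical
4-element sketch) is now emitted as vid.v plus a single vrsub, instead of
vid.v + vmul + vadd:
typedef int v4si __attribute__ ((vector_size (16)));
v4si
rev_index (void)
{
  return (v4si) { 3, 2, 1, 0 }; /* [nunits - 1, ..., 0] = (nunits - 1) - vid */
}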
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vec_series): Optimize reverse series index vector.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Add assembly check.
|
|
According to the doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/222/files
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226
Add __RISCV_ prefix to VXRM and FRM enum.
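After this change the enumerators are spelled, for example (a sketch based on
the enum names registered for VXRM and FRM; treat the exact spellings as
assumptions):
#include <riscv_vector.h>
unsigned int vxrm = __RISCV_VXRM_RNU; /* previously plain RNU */
unsigned int frm = __RISCV_FRM_RNE;   /* previously plain RNE */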
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_VXRM_ENUM): Add
__RISCV_ prefix.
(DEF_RVV_FRM_ENUM): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/frm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-1.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-10.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-11.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-12.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-6.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-7.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-8.c: Ditto.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
|
|
1. This patch optimizes the codegen of the following auto-vectorized code:
void foo (int32_t * __restrict a, int64_t * __restrict b, int64_t * __restrict c, int n)
{
for (int i = 0; i < n; i++)
c[i] = (int64_t)a[i] + b[i];
}
Combine instruction from:
...
vsext.vf2
vadd.vv
...
into:
...
vwadd.wv
...
Since for the PLUS operation, GCC prefers the following RTL operand order when combining:
(plus: (sign_extend:..)
(reg:))
instead of
(plus: (reg:..)
(sign_extend:))
which is different from the MINUS pattern.
I split patterns of vwadd/vwsub, and add dedicated patterns for them.
2. This patch not only optimizes the case mentioned in (1) above, but also enhances the
vwadd.vv/vwsub.vv optimization for more complicated PLUS/MINUS code. Consider the following code:
__attribute__ ((noipa)) void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
int16_t *__restrict dst3, int8_t *__restrict a,
int8_t *__restrict b, int8_t *__restrict a2,
int8_t *__restrict b2, int n)
{
for (int i = 0; i < n; i++)
{
dst[i] = (int16_t) a[i] + (int16_t) b[i];
dst2[i] = (int16_t) a2[i] + (int16_t) b[i];
dst3[i] = (int16_t) a2[i] + (int16_t) a[i];
}
}
Before this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v2,0(a3)
vle8.v v1,0(a4)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v3,v2
vsext.vf2 v2,v1
vadd.vv v1,v2,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v1,0(a0)
vle8.v v4,0(a5)
vsetvli t1,zero,e16,m1,ta,ma
vsext.vf2 v1,v4
vadd.vv v2,v1,v2
...
After this patch:
...
vsetvli zero,a6,e8,mf2,ta,ma
vle8.v v3,0(a4)
vle8.v v1,0(a3)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v2,v1,v3
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v2,0(a0)
vle8.v v2,0(a5)
vsetvli t4,zero,e8,mf2,ta,ma
vwadd.vv v4,v3,v2
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v4,0(a1)
vsetvli t4,zero,e8,mf2,ta,ma
sub a7,a7,a6
vwadd.vv v3,v2,v1
vsetvli zero,a6,e16,m1,ta,ma
vse16.v v3,0(a2)
...
The reason current upstream GCC cannot thoroughly optimize such code using vwadd is that
the combine pass needs an intermediate RTL IR (the pattern that extends only one of the
operands, i.e. vwadd.wv); based on that intermediate IR, it can then extend the other
operand to generate vwadd.vv.
So vwadd.wv/vwsub.wv definitely helps the vwadd.vv/vwsub.vv code optimizations.
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc: Change the vwadd.wv/vwsub.wv
intrinsic API expander.
* config/riscv/vector.md
(@pred_single_widen_<plus_minus:optab><any_extend:su><mode>): Remove it.
(@pred_single_widen_sub<any_extend:su><mode>): New pattern.
(@pred_single_widen_add<any_extend:su><mode>): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: New test.
|
|
This patch supports vector permutation for VLS only, via the vec_perm pattern.
We will support TARGET_VECTORIZE_VEC_PERM_CONST to support VLA permutation
in the future.
Fixed following comments from Robin.
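A VLS permutation that should now go through the new vec_perm pattern (a
hypothetical example in the style of the perm-*.c tests):
typedef int v4si __attribute__ ((vector_size (16)));
v4si
perm (v4si a, v4si b)
{
  /* Two-source shuffle: a vrgather on one source, then a masked
     gather (mu) merging in the elements from the other.  */
  return __builtin_shuffle (a, b, (v4si) { 0, 5, 2, 7 });
}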
gcc/ChangeLog:
* config/riscv/autovec.md (vec_perm<mode>): New pattern.
* config/riscv/predicates.md (vector_perm_operand): New predicate.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_perm): New function.
* config/riscv/riscv-v.cc (const_vec_all_in_range_p): Ditto.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Ditto.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_vec_perm): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/perm_run-7.c: New test.
|
|
More optimized than the default RTL generation.
gcc/ChangeLog:
* config/xtensa/xtensa.md (adddi3, subdi3):
New RTL generation patterns implemented according to the instruction
idioms described in the Xtensa ISA reference manual (p. 600).
|
|
This is my proposed minimal fix for PR target/109973 (hopefully suitable
for backporting) that follows Jakub Jelinek's suggestion that we introduce
CCZmode and CCCmode variants of ptest and vptest, so that the i386
backend treats [v]ptest instructions similarly to testl instructions;
using different CCmodes to indicate which condition flags are desired,
and then relying on the RTL cmpelim pass to eliminate redundant tests.
This conveniently matches Intel's intrinsics, which provide different
functions for retrieving different flags: _mm_testz_si128 tests the
Z flag, _mm_testc_si128 the carry flag. Currently we use the
same instruction (pattern) for both, and unfortunately the *ptest<mode>_and
optimization is only valid when the ptest/vptest instruction is used to
set/test the Z flag.
The downside, as predicted by Jakub, is that GCC's cmpelim pass is
currently COMPARE-centric and not able to merge the ptests from expressions
such as _mm256_testc_si256 (a, b) + _mm256_testz_si256 (a, b), which is a
known issue, PR target/80040.
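The PR target/80040 situation mentioned above is essentially (shown here with
the 128-bit variants for brevity):
#include <immintrin.h>
int
both_flags (__m128i a, __m128i b)
{
  /* One ptest could set both Z and C, but cmpelim cannot yet merge them.  */
  return _mm_testz_si128 (a, b) + _mm_testc_si128 (a, b);
}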
2023-06-01 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/109973
* config/i386/i386-builtin.def (__builtin_ia32_ptestz128): Use new
CODE_FOR_sse4_1_ptestzv2di.
(__builtin_ia32_ptestc128): Use new CODE_FOR_sse4_1_ptestcv2di.
(__builtin_ia32_ptestz256): Use new CODE_FOR_avx_ptestzv4di.
(__builtin_ia32_ptestc256): Use new CODE_FOR_avx_ptestcv4di.
* config/i386/i386-expand.cc (ix86_expand_branch): Use CCZmode
when expanding UNSPEC_PTEST to compare against zero.
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Likewise generate CCZmode UNSPEC_PTESTs when converting comparisons.
(general_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
(timode_scalar_chain::convert_insn): Use CCZmode for COMPARE result.
* config/i386/i386-protos.h (ix86_match_ptest_ccmode): Prototype.
* config/i386/i386.cc (ix86_match_ptest_ccmode): New predicate to
check for suitable matching modes for the UNSPEC_PTEST pattern.
* config/i386/sse.md (define_split): When splitting UNSPEC_MOVMSK
to UNSPEC_PTEST, preserve the FLAG_REG mode as CCZ.
(*<sse4_1>_ptest<mode>): Add asterisk to hide define_insn. Remove
":CC" mode of FLAGS_REG, instead use ix86_match_ptest_ccmode.
(<sse4_1>_ptestz<mode>): New define_expand to specify CCZ.
(<sse4_1>_ptestc<mode>): New define_expand to specify CCC.
(<sse4_1>_ptest<mode>): A define_expand using CC to preserve the
current behavior.
(*ptest<mode>_and): Specify CCZ to only perform this optimization
when only the Z flag is required.
gcc/testsuite/ChangeLog
PR target/109973
* gcc.target/i386/pr109973-1.c: New test case.
* gcc.target/i386/pr109973-2.c: Likewise.
|
|
We can use the X registers to load and store 64-bit vector modes; we just need to add the alternatives
to the mov patterns. This straightforward patch does that, and for the pair variants too.
For the testcase in the code we now generate the optimal assembly without any superfluous
GP<->SIMD moves.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>):
Add =r,m and =m,r alternatives.
(load_pair<DREG:mode><DREG2:mode>): Likewise.
(vec_store_pair<DREG:mode><DREG2:mode>): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/xreg-vec-modes_1.c: New test.
|
|
This patch would like to introduce the built-in types vfloat16m{f}*_t, as
well as their machine modes VNx*HF. They depend on the zvfhmin or zvfh
architecture extensions.
When given zvfhmin or zvfh, the macro TARGET_VECTOR_ELEN_FP_16 will
be true.
A follow-up patch will implement the zvfhmin extension based on this.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
and zvfh.
* config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
(main): Disable FP16 tuple.
* config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
(TARGET_VECTOR_ELEN_FP_16): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add FP16.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16):
New macro.
* config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16
machine mode based on TARGET_VECTOR_ELEN_FP_16.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins.cc (register_frm): New function.
(DEF_RVV_FRM_ENUM): New macro.
(handle_pragma_vector): Add FRM enum.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_FRM_ENUM): New macro.
(RNE): Ditto.
(RTZ): Ditto.
(RDN): Ditto.
(RUP): Ditto.
(RMM): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/frm-1.c: New test.
|
|
This straightforward patch annotates the dotproduct instructions, including the i8mm ones.
Tests included.
Nothing unexpected here.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (<sur>dot_prod<vsi2qi>): Rename to...
(<sur>dot_prod<vsi2qi><vczle><vczbe>): ... This.
(usdot_prod<vsi2qi>): Rename to...
(usdot_prod<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_lane<vsi2qi>): Rename to...
(aarch64_<sur>dot_lane<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<sur>dot_laneq<vsi2qi>): Rename to...
(aarch64_<sur>dot_laneq<vsi2qi><vczle><vczbe>): ... This.
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi>): Rename to...
(aarch64_<DOTPROD_I8MM:sur>dot_lane<VB:isquadop><VS:vsi2qi><vczle><vczbe>):
... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_11.c: New test.
|
|
This patch goes through the various alphabet soup saturating multiplication patterns, including those in TARGET_RDMA
and annotates them with <vczle><vczbe>. Many other patterns are widening and always write the full 128-bit vectors
so this annotation doesn't apply to them. Nothing out of the ordinary in this patch.
Bootstrapped and tested on aarch64-none-linux and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_sq<r>dmulh<mode>): Rename to...
(aarch64_sq<r>dmulh<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_n<mode>): Rename to...
(aarch64_sq<r>dmulh_n<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_lane<mode>): Rename to...
(aarch64_sq<r>dmulh_lane<mode><vczle><vczbe>): ... This.
(aarch64_sq<r>dmulh_laneq<mode>): Rename to...
(aarch64_sq<r>dmulh_laneq<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode><vczle><vczbe>): ... This.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Rename to...
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode><vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for qdmulh, qrdmulh.
* gcc.target/aarch64/simd/pr99195_10.c: New test.
|
|
Based on the V1 patch, adding a comment:
;; Use define_insn_and_split to define vsext.vf2/vzext.vf2 will help combine PASS
;; to combine instructions as below:
;; vsext.vf2 + vsext.vf2 + vadd.vv ==> vwadd.vv
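That is, after this change combine can turn code like the following into
vwadd.vv (a sketch in the style of the widen-*.c tests):
#include <stdint.h>
void
f (int16_t *__restrict d, int8_t *__restrict a, int8_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = (int16_t) a[i] + (int16_t) b[i]; /* vsext.vf2 x2 + vadd.vv -> vwadd.vv */
}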
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><v_double_trunc><mode>2): Change
expand into define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add widen tests.
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-4.c: New test.
|
|
Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884
vfwcvt doesn't depend on FRM. So remove FRM, preparing for mode-switching support.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Based on the discussion here:
https://github.com/riscv/riscv-v-spec/issues/884
vfwcvt.f.x<u>.v doesn't depend on FRM. So remove FRM, preparing for mode-switching support.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Apparently the vfncvt.rod rounding mode is statically encoded in the instruction, so we don't need FRM.
gcc/ChangeLog:
* config/riscv/vector.md: Remove FRM.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
After commit g:d8545fb2c71683f407bfd96706103297d4d6e27b, we missed a
pattern to match the new GIMPLE form.
With this patch, gcc.target/aarch64/rev16_2.c passes again.
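The rev16 idiom exercised by that test is along these lines (a sketch; the
exact test source may differ):
unsigned int
rev16 (unsigned int x)
{
  /* Swap the two bytes within each 16-bit half; matched by rev16.  */
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}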
2023-05-31 Christophe Lyon <christophe.lyon@linaro.org>
PR target/110039
gcc/
* config/aarch64/aarch64.md (aarch64_rev16si2_alt3): New
pattern.
|
|
If the output code for a define_insn just does a switch (which_alternative) with no other computation, we can almost always
replace it with more compact MD syntax for each alternative in a multi-alternative '@' block.
This patch cleans up some such patterns in the aarch64 backend, making them shorter and more concise.
No behavioural change intended.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>): Rewrite
output template to avoid explicit switch on which_alternative.
(*aarch64_simd_mov<VQMOV:mode>): Likewise.
(and<mode>3): Likewise.
(ior<mode>3): Likewise.
* config/aarch64/aarch64.md (*mov<mode>_aarch64): Likewise.
|
|
The insn "*shlrd_reg" shifts two registers with a funnel shifter by the
third register to get a single word result:
reg0 = (reg1 SHIFT_OP0 reg3) BIT_JOIN_OP (reg2 SHIFT_OP1 (32 - reg3))
where the funnel left shift is SHIFT_OP0 := ASHIFT, SHIFT_OP1 := LSHIFTRT
and its right shift is SHIFT_OP0 := LSHIFTRT, SHIFT_OP1 := ASHIFT,
respectively. And also, BIT_JOIN_OP can be either PLUS or IOR in either
shift direction.
[(set (match_operand:SI 0 "register_operand" "=a")
(match_operator:SI 6 "xtensa_bit_join_operator"
[(match_operator:SI 4 "logical_shift_operator"
[(match_operand:SI 1 "register_operand" "r")
(match_operand:SI 3 "register_operand" "r")])
(match_operator:SI 5 "logical_shift_operator"
[(match_operand:SI 2 "register_operand" "r")
(neg:SI (match_dup 3))])]))]
Although the RTL matching template can express it as above, there is no
way of indicating that the operator (operands[6]) that combines the two
individual shifts is commutative.
Thus, if multiple insn sequences matching the above pattern appear
adjacently, the combiner may accidentally mix them up and get partial
results.
This patch adds a new insn-and-split pattern, lacking until now, that
represents the bit-combining operation described above with both sides
swapped.
And also changes the other "*shlrd" variants from previously describing
the arbitrariness of bit-combining operations with code iterators to a
combination of the match_operator and the predicate above.
gcc/ChangeLog:
* config/xtensa/predicates.md (xtensa_bit_join_operator):
New predicate.
* config/xtensa/xtensa.md (ior_op): Remove.
(*shlrd_reg): Rename from "*shlrd_reg_<code>", and add the
insn_and_split pattern of the same name to express and capture
the bit-combining operation with both sides swapped.
In addition, replace use of code iterator with new operator
predicate.
(*shlrd_const, *shlrd_per_byte):
Likewise regarding the code iterator.
|
|
This patch would like to add a new sub-extension (aka ZVFH) to the -march= option.
To keep it simple, only the sub-extension itself is involved in this patch, and
the underlying FP16-related RVV intrinsic API depends on TARGET_ZVFH.
The Zvfh extension depends on the Zve32f and Zfhmin extensions. You can find
more information about ZVFH in the spec doc below.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#185-zvfh-vector-extension-for-half-precision-floating-point
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc:
(riscv_implied_info): Add zvfh item.
(riscv_ext_version_table): Ditto.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv-opts.h (MASK_ZVFH): New macro.
(TARGET_ZVFH): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/arch-21.c: New test.
* gcc.target/riscv/predef-27.c: New test.
|
|
gcc/ChangeLog:
PR target/110041
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Fix misleading indentation.
|
|
As we can always broadcast an integer constant to a vector register,
allow them in riscv_const_insns. We need as many instructions as
it takes to generate the constant, plus one vmv.v.x.
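A simple loop that benefits (hypothetical example; the constant below needs
lui + addi to materialize, plus a single vmv.v.x to broadcast):
#include <stdint.h>
void
f (int32_t *d, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = 0x12345;
}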
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_const_insns): Allow
const_vec_duplicates.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: Add vmv.v.x
tests.
* gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Ditto.
|
|
This patch converts the patterns for the integer widen and pairwise-add instructions
to standard RTL operations. The pairwise addition within a vector can be represented
as an addition of two vec_selects, one selecting the even elements, and one selecting odd.
Thus for the intrinsic vpaddlq_s8 we can generate:
(set (reg:V8HI 92)
(plus:V8HI (vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 0 [0])
(const_int 2 [0x2])
(const_int 4 [0x4])
(const_int 6 [0x6])
(const_int 8 [0x8])
(const_int 10 [0xa])
(const_int 12 [0xc])
(const_int 14 [0xe])
]))
(vec_select:V8HI (sign_extend:V16HI (reg/v:V16QI 93 [ a ]))
(parallel [
(const_int 1 [0x1])
(const_int 3 [0x3])
(const_int 5 [0x5])
(const_int 7 [0x7])
(const_int 9 [0x9])
(const_int 11 [0xb])
(const_int 13 [0xd])
(const_int 15 [0xf])
]))))
Similarly for the accumulating forms where there's an extra outer PLUS for the accumulation.
We already have the handy helper functions aarch64_stepped_int_parallel_p and
aarch64_gen_stepped_int_parallel defined in aarch64.cc that we can make use of to define
the right predicate for the VEC_SELECT PARALLEL.
This patch allows us to remove some code iterators and the UNSPEC definitions for SADDLP and UADDLP.
UNSPEC_UADALP and UNSPEC_SADALP are retained because they are used by SVE2 patterns still.
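For reference, the intrinsic mentioned above has the standard ACLE signature:
#include <arm_neon.h>
int16x8_t
paddl (int8x16_t a)
{
  return vpaddlq_s8 (a); /* saddlp: pairwise widening add */
}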
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>): Delete.
(aarch64_<su>adalp<mode>): New define_expand.
(*aarch64_<su>adalp<mode><vczle><vczbe>_insn): New define_insn.
(aarch64_<su>addlp<mode>): Convert to define_expand.
(*aarch64_<su>addlp<mode><vczle><vczbe>_insn): New define_insn.
* config/aarch64/iterators.md (UNSPEC_SADDLP, UNSPEC_UADDLP): Delete.
(ADALP): Likewise.
(USADDLP): Likewise.
* config/aarch64/predicates.md (vect_par_cnst_even_or_odd_half): Define.
|
|
This patch reimplements the MD patterns for the UHADD,SHADD,UHSUB,SHSUB,URHADD,SRHADD instructions using
standard RTL operations rather than unspecs. The correct RTL representation involves widening
the inputs before adding them and halving, followed by a truncation back to the original mode.
An unfortunate wart in the patch is that we end up having very similar expanders for the intrinsics
through the aarch64_<su>h<ADDSUB:optab><mode> and aarch64_<su>rhadd<mode> names and the standard names
for the vector averaging optabs <su>avg<mode>3_floor and <su>avg<mode>3_ceil.
I'd like to reuse <su>avg<mode>3_ceil for the intrinsics builtin as well but our scheme
in aarch64-simd-builtins.def and aarch64-builtins.cc makes it awkward by only allowing mappings
of entries in aarch64-simd-builtins.def to:
0 - CODE_FOR_aarch64_<name><mode>
1-9 - CODE_FOR_<name><mode><1-9>
10 - CODE_FOR_<name><mode>
whereas here we want a string after the <mode> i.e. CODE_FOR_uavg<mode>3_ceil.
This patch adds a bit of remapping logic in aarch64-builtins.cc before the construction of the
builtin info that remaps the CODE_FOR_* definitions in aarch64-simd-builtins.def to the
optab-derived ones. CODE_FOR_aarch64_srhaddv4si gets remapped to CODE_FOR_avgv4si3_ceil, for example.
It's a bit specific to this case, but this solution requires the least invasive changes while avoiding
having duplicate expanders just for the sake of a different pattern name.
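For reference, the halving-add semantics being modelled, via the standard
ACLE intrinsic (per-lane semantics in the comment):
#include <arm_neon.h>
int32x4_t
hadd (int32x4_t a, int32x4_t b)
{
  /* Per lane: (int32_t) (((int64_t) a[i] + (int64_t) b[i]) >> 1), i.e. shadd;
     the rhadd/_ceil forms add 1 before shifting.  */
  return vhaddq_s32 (a, b);
}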
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (VAR1): Move to after inclusion of
aarch64-builtin-iterators.h. Add definition to remap shadd, uhadd,
srhadd, urhadd builtin codes for standard optab ones.
* config/aarch64/aarch64-simd.md (<u>avg<mode>3_floor): Rename to...
(<su_optab>avg<mode>3_floor): ... This. Expand to RTL codes rather than
unspec.
(<u>avg<mode>3_ceil): Rename to...
(<su_optab>avg<mode>3_ceil): ... This. Expand to RTL codes rather than
unspec.
(aarch64_<su>hsub<mode>): New define_expand.
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): Split into...
(*aarch64_<su>h<ADDSUB:optab><mode><vczle><vczbe>_insn): ... This...
(*aarch64_<su>rhadd<mode><vczle><vczbe>_insn): ... And this.
|
|
gcc/
PR target/110036
* config/riscv/riscv.cc (riscv_asan_shadow_offset): Update to
match libsanitizer.
|
|
This patch expresses the intrinsics for the SRA and RSRA instructions with
standard RTL codes rather than relying on UNSPECs.
These instructions perform a vector shift right plus accumulate with an
optional rounding constant addition for the RSRA variant.
There are a number of interesting points:
* The scalar-in-SIMD-registers variant for DImode SRA e.g. ssra d0, d1, #N
is left using the UNSPECs. Expressing it as a DImode plus+shift led to all
kinds of trouble, as it started matching the existing define_insns for
"add x0, x0, asr #N" instructions, and adding the SRA form as an extra
alternative required a significant amount of deduplication of iterators, and
things still didn't work out well. I decided not to tackle that case in
this patch. It can be attempted later.
* For the RSRA variants that add a rounding constant (1 << (shift-1)) the
addition is notionally performed in a wider mode than the input types so that
overflow is handled properly (see the sketch after this list). In RTL this can
be represented with an appropriate extend operation followed by a truncate back
to the original modes.
However for 128-bit input modes such as V4SI we don't have appropriate modes
defined for this widening i.e. we'd need a V4DI mode to represent the
intermediate widened result. This patch defines such modes for
V16HI,V8SI,V4DI,V2TI. These will come in handy in the future too, as we have
more Advanced SIMD instructions that have similar intermediate widening
semantics.
* The above new modes led to a problem with stor-layout.cc. The new modes only
exist for the sake of the RTL optimisers understanding the semantics of the
instruction but are not intended to be moved to and from registers or memory,
assigned to types, used as TYPE_MODE or to participate in auto-vectorisation.
This is expressed in aarch64 by aarch64_classify_vector_mode returning zero
for these new modes. However, the code in stor-layout.cc:<mode_for_vector>
explicitly doesn't check this when picking a TYPE_MODE due to modes being made
potentially available later through target switching (PR38240).
This led to these modes being picked as TYPE_MODE for declarations such as:
typedef int16_t vnx8hi __attribute__((vector_size (32))) when 256-bit
fixed-length SVE modes are available and vector_type_mode later struggling
to rectify this.
This issue is addressed with the new target hook
TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P that is intended to check if a
vector mode can be used in any legal target attribute configuration of the
port, as opposed to the existing TARGET_VECTOR_MODE_SUPPORTED_P that checks
only the initial target configuration. This allows a simple adjustment in
stor-layout.cc that still disqualifies these limited modes early on while
allowing consideration of modes that can be turned on in the future with
target attributes.
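For reference, an RSRA intrinsic whose rounding addition needs the widened
intermediate (standard ACLE signature; per-lane semantics in the comment):
#include <arm_neon.h>
int32x4_t
rsra (int32x4_t acc, int32x4_t x)
{
  /* Per lane: acc[i] + (int32_t) (((int64_t) x[i] + (1 << 2)) >> 3),
     with the addition notionally performed in the wider (V4DI) mode.  */
  return vrsraq_n_s32 (acc, x, 3);
}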
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-modes.def (V16HI, V8SI, V4DI, V2TI): New modes.
* config/aarch64/aarch64-protos.h (aarch64_const_vec_rnd_cst_p):
Declare prototype.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
* config/aarch64/aarch64-simd.md (*aarch64_simd_sra<mode>): Rename to...
(aarch64_<sra_op>sra_n<mode>_insn): ... This.
(aarch64_<sra_op>rsra_n<mode>_insn): New define_insn.
(aarch64_<sra_op>sra_n<mode>): New define_expand.
(aarch64_<sra_op>rsra_n<mode>): Likewise.
(aarch64_<sur>sra_n<mode>): Rename to...
(aarch64_<sur>sra_ndi): ... This.
* config/aarch64/aarch64.cc (aarch64_classify_vector_mode): Add
any_target_p argument.
(aarch64_extract_vec_duplicate_wide_int): Define.
(aarch64_const_vec_rsra_rnd_imm_p): Likewise.
(aarch64_const_vec_rnd_cst_p): Likewise.
(aarch64_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/aarch64/iterators.md (UNSPEC_SRSRA, UNSPEC_URSRA): Delete.
(VSRA): Adjust for the above.
(sur): Likewise.
(V2XWIDE): New mode_attr.
(vec_or_offset): Likewise.
(SHIFTEXTEND): Likewise.
* config/aarch64/predicates.md (aarch64_simd_rsra_rnd_imm_vec): New
predicate.
* doc/tm.texi (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Document.
* doc/tm.texi.in: Regenerate.
* stor-layout.cc (mode_for_vector): Check
vector_mode_supported_any_target_p when iterating through vector modes.
* target.def (TARGET_VECTOR_MODE_SUPPORTED_P): Adjust description to
clarify that it applies to current target options.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Define.
|
|
Even though we can't support floating-point operations which depend
on FRM yet (for example, vfadd support is blocked, since the RVV
intrinsic doc is not updated and we can't support mode switching for this),
we can support floating-point to integer conversion now, since it does not
depend on FRM and we don't need mode-switching support for it ('rtz'
conversions are independent of FRM).
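A loop that can now be auto-vectorized (a sketch in the style of the
vfcvt_rtz tests):
void
f (int *__restrict d, float *__restrict s, int n)
{
  for (int i = 0; i < n; i++)
    d[i] = (int) s[i]; /* truncation -> vfcvt.rtz.x.f.v, independent of FRM */
}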
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (<optab><mode><vconvert>2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New test.
|
|
Notice there are warnings in riscv.md:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md: In function ‘rtx_def* gen_anddi3(rtx, rtx, rtx)’:
../../../riscv-gcc/gcc/config/riscv/riscv.md:1356:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
../../../riscv-gcc/gcc/config/riscv/riscv.md:1358:37: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
Add an unsigned conversion to fix these warnings.
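Presumably the fixed code reads as follows (the commit only says to add an
unsigned conversion, so the exact spelling is an assumption):
if (UINTVAL (operands[2]) == GET_MODE_MASK (HImode))
  ...
else if (UINTVAL (operands[2]) == GET_MODE_MASK (SImode))
  ...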
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv.md: Fix signed and unsigned comparison
warning.
|
|
Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
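An FNMA loop in C looks like this (a sketch in the style of the ternop tests):
#include <stdint.h>
void
f (int32_t *__restrict d, int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] -= a[i] * b[i]; /* d = -(a * b) + d: vnmsac.vv or vnmsub.vv */
}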
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (fnma<mode>4): New pattern.
(*fnma<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.
|
|
This patch allows fewer instructions to be used when TARGET_XTHEADCONDMOV is enabled.
Here is an example from the existing testcases.
Testcase:
int ConEmv_imm_imm_reg(int x, int y){
if (x == 1000) return 10;
return y;
}
Cflags:
-O2 -march=rv64gc_xtheadcondmov -mabi=lp64d
before patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,zero,a5
th.mveqz a1,zero,a5
or a0,a0,a1
ret
after patch:
ConEmv_imm_imm_reg:
addi a5,a0,-1000
li a0,10
th.mvnez a0,a1,a5
ret
Signed-off-by: Die Li <lidie@eswincomputing.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_conditional_move_onesided):
Delete.
(riscv_expand_conditional_move): Reuse the TARGET_SFB_ALU expand
process for TARGET_XTHEADCONDMOV.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadcondmov-indirect-rv32.c: Update the output.
* gcc.target/riscv/xtheadcondmov-indirect-rv64.c: Likewise.
|
|
gcc/ChangeLog:
PR target/110021
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2): Also require
TARGET_AVX512BW to generate truncv16hiv16qi2.
|
|
In cases where the target supports extension instructions,
it is preferable to use them instead of achieving the same result in other ways.
For the following case
void foo (unsigned long a, unsigned long* ptr) {
ptr[0] = a & 0xffffffffUL;
ptr[1] &= 0xffffffffUL;
}
GCC generates
foo:
li a5,-1
srli a5,a5,32
and a0,a0,a5
sd a0,0(a1)
ld a4,8(a1)
and a5,a4,a5
sd a5,8(a1)
ret
but it is more profitable to generate this one:
foo:
zext.w a0,a0
sd a0,0(a1)
lwu a5,8(a1)
sd a5,8(a1)
ret
This patch fixes the mentioned issue.
It supports HI -> DI, HI -> SI and SI -> DI extensions.
gcc/ChangeLog:
* config/riscv/riscv.md (and<mode>3): New expander.
(*and<mode>3): New pattern.
* config/riscv/predicates.md (arith_operand_or_mode_mask): New
predicate.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/and-extend-1.c: New test.
* gcc.target/riscv/and-extend-2.c: New test.
|
|
This patch would like to remove unnecessary comments on some self-explanatory
parameters and use better names to avoid being misleading.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn): Remove unnecessary
comments and rename local variables.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(emit_scalar_move_insn): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to remove the magic numbers in riscv-v.cc and
align them to one macro.
Signed-off-by: Pan Li <pan2.li@intel.com>
gcc/ChangeLog:
* config/riscv/riscv-v.cc (emit_vlmax_insn): Eliminate the
magic number.
(emit_nonvlmax_insn): Ditto.
(emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_series): Ditto.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to optimize VLS vector initialization with a repeating
sequence, going from vslide1down to vmerge under a simple cost model, i.e.
every instruction has a cost of 1.
Given this code with -march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax:
typedef int64_t vnx32di __attribute__ ((vector_size (256)));
__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
vnx32di v = {
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
a, b, a, b, a, b, a, b,
};
*(vnx32di *) out = v;
}
Before this patch:
vslide1down.vx (x31 times)
After this patch:
li a5,-1431654400
addi a5,a5,-1365
li a3,-1431654400
addi a3,a3,-1366
slli a5,a5,32
add a5,a5,a3
vsetvli a4,zero,e64,m8,ta,ma
vmv.v.x v8,a0
vmv.s.x v0,a5
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a2)
Since we don't have SEW = 128 in vec_duplicate, we can't combine a and b into an
SEW = 128 element and then broadcast this big element.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the referenced
class member.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): New function
to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the merge
mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get the dup
machine mode.
(expand_vector_init_merge_repeating_sequence): New function to perform
the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
Fix bug reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974
PR target/109974
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (source_equal_p): Fix ICE.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/vsetvl/pr109974.c: New test.
|
|
This patch supports the FMA auto-vectorization pattern, letting RA decide
between vmacc and vmadd.
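An FMA loop in C looks like this (a sketch in the style of the ternop tests):
#include <stdint.h>
void
f (int32_t *__restrict d, int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    d[i] += a[i] * b[i]; /* d = a * b + d: RA picks vmacc.vv or vmadd.vv */
}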
Signed-off-by: Juzhe-Zhong <juzhe.zhong@rivai.ai>
gcc/ChangeLog:
* config/riscv/autovec.md (fma<mode>4): New pattern.
(*fma<mode>): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_vlmax_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/rvv.exp: Add ternary tests.
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.
|