path: root/gcc/config/riscv/riscv-protos.h
Age  Commit message  Author  Files  Lines
2023-10-27  RISC-V: Add rawmemchr expander.  Robin Dapp  1  -0/+2
This patch adds a vectorized rawmemchr expander. It also moves the vectorized expand_block_move to riscv-string.cc. gcc/ChangeLog: * config/riscv/autovec.md (rawmemchr<ANYI:mode>): New expander. * config/riscv/riscv-protos.h (gen_no_side_effects_vsetvl_rtx): Define. (expand_rawmemchr): Define. * config/riscv/riscv-v.cc (force_vector_length_operand): Remove static. (expand_block_move): Move from here... * config/riscv/riscv-string.cc (expand_block_move): ...to here. (expand_rawmemchr): Add vectorized expander. * internal-fn.cc (expand_RAWMEMCHR): Fix typo. gcc/testsuite/ChangeLog: * gcc.dg/tree-prof/peel-2.c: Add -fno-tree-loop-distribute-patterns. * gcc.dg/tree-ssa/ldist-rawmemchr-1.c: Add riscv. * gcc.dg/tree-ssa/ldist-rawmemchr-2.c: Ditto. * gcc.target/riscv/rvv/rvv.exp: Add builtin directory. * gcc.target/riscv/rvv/autovec/builtin/rawmemchr-1.c: New test.
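For readers unfamiliar with rawmemchr, the following is a minimal scalar sketch of the semantics the vectorized expander has to preserve, restricted to byte elements (the expander above is parameterised over the ANYI modes, so wider elements are also handled there). The name rawmemchr_ref is hypothetical and is not part of the patch.

```c
#include <stddef.h>

/* Scalar reference for rawmemchr on bytes (hypothetical helper, not from
   the patch): scan forward from S until a byte equal to C is found.
   There is no length bound, so the caller must guarantee that a match
   exists somewhere at or after S.  */
static const unsigned char *
rawmemchr_ref (const unsigned char *s, unsigned char c)
{
  while (*s != c)
    s++;
  return s;
}
```

Because there is no length argument, any vectorized version must avoid faulting on bytes past the match; how the patch achieves that is not described in the commit message above.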
2023-10-27  RISC-V: Add AVL propagation PASS for RVV auto-vectorization  Juzhe-Zhong  1  -0/+1
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization, a long-standing known issue that I have finally found the time to address.

Consider a simple vector addition operation:

https://godbolt.org/z/7hfGfEjW3

void
foo (int *__restrict a, int *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}

Optimized IR:

Loop body:
  _38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]);
    -> vsetvli a5,a2,e8,mf4,ta,ma
  ...
  vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0);
    -> vle32.v v2,0(a0)
  vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0);
    -> vle32.v v1,0(a1)
  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
    -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
  .MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19);
    -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)

We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling. The toggling happens because the simple PLUS_EXPR GIMPLE assignment carries no LEN information:

  vect__7.12_19 = vect__6.11_20 + vect__4.8_27;

For partial vectorization, GCC applies partially predicated loads/stores and un-predicated full-vector operations. This flow is used by other targets such as ARM SVE (RVV uses it as well):

ARM SVE:

.L3:
  ld1w z30.s, p7/z, [x0, x3, lsl 2]  -> predicated load
  ld1w z31.s, p7/z, [x1, x3, lsl 2]  -> predicated load
  add  z31.s, z31.s, z30.s           -> un-predicated add
  st1w z31.s, p7, [x0, x3, lsl 2]    -> predicated store

This vectorization flow causes AVL/VL toggling on RVV, so we need an AVL propagation PASS for it.

Also, it is very unlikely that we can apply predicated operations to all vectorization, for the following reasons:

1. Supporting them in all vectorization is a very heavy workload, and we see no benefit if the target backend can handle the issue.
2. Changing the loop vectorizer for it would make the code base ugly and hard to maintain.
3. We would need a great many patterns to cover all operations.
Not only COND_LEN_ADD, COND_LEN_SUB, ...; we would also need COND_LEN_EXTEND, ..., COND_LEN_CEIL, ...: over 100 patterns, an unreasonable number.

To conclude, we prefer un-predicated operations here, and design a clean AVL propagation PASS to elide the redundant vsetvls caused by AVL/VL toggling.

The second question is why we add a separate PASS for AVL propagation instead of optimizing it in the VSETVL PASS (we definitely could optimize AVL in the VSETVL PASS).

Frankly, I was planning to address this issue in the VSETVL PASS, which is why we recently refactored it. However, I changed my mind after several experiments. The reasons are as follows:

1. Code base management and maintenance. The current VSETVL PASS is complicated enough and already has enough aggressive and fancy optimizations that it generates optimal code in most cases. It is not a good idea to keep adding features to the VSETVL PASS and make it heavier and heavier, only to need another refactoring in the future. Actually, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully we will not need to change it any more apart from minor fixes.

2. vsetvl insertion (what the VSETVL PASS does) and AVL propagation are two different things; I don't think we should fuse them into the same PASS.

3. The VSETVL PASS is a post-RA PASS, whereas AVL propagation should be done before RA, which can reduce register usage.

4. This patch's AVL propagation PASS only handles RVV partial auto-vectorization. It is only a few hundred lines of code, which is very manageable, and it can easily be extended and enhanced in a clean, separate PASS in the future. (Doing this in the VSETVL PASS would further complicate a PASS that is already so complicated.)
Here is an example to demonstrate more:

https://godbolt.org/z/bE86sv3q5

void
foo2 (int *__restrict a, int *__restrict b, int *__restrict c,
      int *__restrict a2, int *__restrict b2, int *__restrict c2,
      int *__restrict a3, int *__restrict b3, int *__restrict c3,
      int *__restrict a4, int *__restrict b4, int *__restrict c4,
      int *__restrict a5, int *__restrict b5, int *__restrict c5,
      int n)
{
  for (int i = 0; i < n; i++)
    {
      a[i] = b[i] + c[i];
      b5[i] = b[i] + c[i];
      a2[i] = b2[i] + c2[i];
      a3[i] = b3[i] + c3[i];
      a4[i] = b4[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a5[i] + b5[i] + a[i];
      a[i] = a[i] + c[i];
      b5[i] = a[i] + c[i];
      a2[i] = a[i] + c2[i];
      a3[i] = a[i] + c3[i];
      a4[i] = a[i] + c4[i];
      a5[i] = a[i] + a4[i];
      a[i] = a[i] + b5[i] + a[i];
    }
}

1. Loop Body:

Before this patch:

  vsetvli a4,t1,e8,mf4,ta,ma
  vle32.v v2,0(a2)
  vle32.v v4,0(a1)
  vle32.v v1,0(t2)
  vsetvli a7,zero,e32,m1,ta,ma
  vadd.vv v4,v2,v4
  vsetvli zero,a4,e32,m1,ta,ma
  vle32.v v3,0(s0)
  vsetvli a7,zero,e32,m1,ta,ma
  vadd.vv v1,v3,v1
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v4
  vsetvli zero,a4,e32,m1,ta,ma
  vle32.v v4,0(a5)
  vsetvli a7,zero,e32,m1,ta,ma
  vadd.vv v1,v1,v2
  vadd.vv v2,v1,v2
  vadd.vv v4,v1,v4
  vsetvli zero,a4,e32,m1,ta,ma
  vse32.v v2,0(t5)
  vse32.v v4,0(a3)
  vsetvli a7,zero,e32,m1,ta,ma
  vadd.vv v3,v1,v3
  vadd.vv v2,v2,v1
  vadd.vv v2,v2,v1
  vsetvli zero,a4,e32,m1,ta,ma
  vse32.v v2,0(a0)
  vse32.v v3,0(t3)
  vle32.v v2,0(t0)
  vsetvli a7,zero,e32,m1,ta,ma
  vadd.vv v3,v3,v1
  vsetvli zero,a4,e32,m1,ta,ma
  vse32.v v3,0(t4)
  vsetvli a7,zero,e32,m1,ta,ma
  slli a7,a4,2
  vadd.vv v1,v1,v2
  sub t1,t1,a4
  vsetvli zero,a4,e32,m1,ta,ma
  vse32.v v1,0(a6)

After this patch:

  vsetvli a4,t1,e32,m1,ta,ma
  vle32.v v2,0(a2)
  vle32.v v3,0(t2)
  vle32.v v4,0(a1)
  vle32.v v1,0(t0)
  vadd.vv v4,v2,v4
  vadd.vv v1,v3,v1
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v2
  vadd.vv v2,v1,v2
  vse32.v v2,0(t5)
  vadd.vv v2,v2,v1
  vadd.vv v2,v2,v1
  slli a7,a4,2
  vadd.vv v3,v1,v3
  vle32.v v5,0(a5)
  vle32.v v6,0(t6)
  vse32.v v3,0(t3)
  vse32.v v2,0(a0)
  vadd.vv v3,v3,v1
  vadd.vv v2,v1,v5
  vse32.v v3,0(t4)
  vadd.vv v1,v1,v6
  vse32.v v2,0(a3)
  vse32.v v1,0(a6)

It is quite obvious that all the heavy and redundant vsetvls inside the loop body are eliminated.

2. Epilogue:

Before this patch:

.L5:
  ld s0,8(sp)
  addi sp,sp,16
  jr ra

After this patch:

.L5:
  ret

This is the benefit of doing AVL propagation before RA: we eliminate the use of the 'a7' register by the redundant AVL/VL toggling instruction 'vsetvli a7,zero,e32,m1,ta,ma'.

The final codegen after this patch:

foo2:
  lw t1,56(sp)
  ld t6,0(sp)
  ld t3,8(sp)
  ld t0,16(sp)
  ld t2,24(sp)
  ld t4,32(sp)
  ld t5,40(sp)
  ble t1,zero,.L5
.L3:
  vsetvli a4,t1,e32,m1,ta,ma
  vle32.v v2,0(a2)
  vle32.v v3,0(t2)
  vle32.v v4,0(a1)
  vle32.v v1,0(t0)
  vadd.vv v4,v2,v4
  vadd.vv v1,v3,v1
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v4
  vadd.vv v1,v1,v2
  vadd.vv v2,v1,v2
  vse32.v v2,0(t5)
  vadd.vv v2,v2,v1
  vadd.vv v2,v2,v1
  slli a7,a4,2
  vadd.vv v3,v1,v3
  vle32.v v5,0(a5)
  vle32.v v6,0(t6)
  vse32.v v3,0(t3)
  vse32.v v2,0(a0)
  vadd.vv v3,v3,v1
  vadd.vv v2,v1,v5
  vse32.v v3,0(t4)
  vadd.vv v1,v1,v6
  vse32.v v2,0(a3)
  vse32.v v1,0(a6)
  sub t1,t1,a4
  add a1,a1,a7
  add a2,a2,a7
  add a5,a5,a7
  add t6,t6,a7
  add t0,t0,a7
  add t2,t2,a7
  add t5,t5,a7
  add a3,a3,a7
  add a6,a6,a7
  add t3,t3,a7
  add t4,t4,a7
  add a0,a0,a7
  bne t1,zero,.L3
.L5:
  ret

PR target/111318
PR target/111888

gcc/ChangeLog:

* config.gcc: Add AVL propagation pass.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-avlprop.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
* gcc.target/riscv/rvv/autovec/pr111888.c: New test.

Tested-by: Patrick O'Neill <patrick@rivosinc.com>
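The role of the vector length in the commit above can be modelled in scalar C. This is only a rough sketch under stated assumptions: select_vl and foo_model are hypothetical names, and select_vl simply returns min(remaining, vlmax), whereas the real .SELECT_VL / vsetvli is allowed to pick other values. The point it illustrates is that the loads, the add and the store of one iteration can all run with the same vl, so a single vsetvli per iteration is enough.

```c
#include <stddef.h>

/* Hypothetical stand-in for .SELECT_VL: handle at most VLMAX elements
   per iteration.  VLMAX must be greater than zero.  */
static size_t
select_vl (size_t remaining, size_t vlmax)
{
  return remaining < vlmax ? remaining : vlmax;
}

/* Scalar model of the strip-mined loop for a[i] = a[i] + b[i].
   Returns the number of outer iterations, i.e. the number of
   vsetvli instructions that are actually required.  */
static size_t
foo_model (int *a, const int *b, size_t n, size_t vlmax)
{
  size_t iters = 0;
  size_t done = 0;

  while (done < n)
    {
      /* One vector length is chosen per iteration...  */
      size_t vl = select_vl (n - done, vlmax);

      /* ...and the load, add and store all operate on VL elements.  */
      for (size_t i = 0; i < vl; i++)
        a[done + i] += b[done + i];

      done += vl;
      iters++;
    }
  return iters;
}
```

With n = 10 and an assumed vlmax of 4 the model runs three iterations with vl = 4, 4 and 2.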
2023-10-25  RISC-V: Export some functions from riscv-vsetvl to riscv-v[NFC]  Juzhe-Zhong  1  -0/+8
Address kito's comments on the AVL propagation patch.

Export the functions that are used by both the VSETVL PASS and the AVL propagation PASS.

No functional change.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (has_vl_op): Export from riscv-vsetvl to riscv-v.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-v.cc (has_vl_op): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(validate_change_or_fail): Ditto.
(nonvlmax_avl_type_p): Ditto.
(vlmax_avl_p): Ditto.
(get_sew): Ditto.
(enum vlmul_type): Ditto.
(get_vlmul): Ditto.
(count_regno_occurrences): Ditto.
* config/riscv/riscv-vsetvl.cc (vlmax_avl_p): Ditto.
(has_vl_op): Ditto.
(get_sew): Ditto.
(get_vlmul): Ditto.
(get_default_ta): Ditto.
(tail_agnostic_p): Ditto.
(count_regno_occurrences): Ditto.
(validate_change_or_fail): Ditto.
2023-10-25  RISC-V: Change MD attribute avl_type into avl_type_idx[NFC]  Juzhe-Zhong  1  -0/+1
Address kito's comments on the AVL propagation patch.

Change avl_type into avl_type_idx.

No functional change.

gcc/ChangeLog:

* config/riscv/riscv-protos.h (vlmax_avl_type_p): New function.
* config/riscv/riscv-v.cc (vlmax_avl_type_p): Ditto.
* config/riscv/riscv-vsetvl.cc (get_avl): Adapt function.
* config/riscv/vector.md: Change avl_type into avl_type_idx.
2023-10-23  RISC-V: Add popcount fallback expander.  Robin Dapp  1  -0/+1
I didn't manage to get back to the generic vectorizer fallback for popcount so I figured I'd rather create a popcount fallback in the riscv backend. It uses the WWG algorithm from libgcc. gcc/ChangeLog: * config/riscv/autovec.md (popcount<mode>2): New expander. * config/riscv/riscv-protos.h (expand_popcount): Define. * config/riscv/riscv-v.cc (expand_popcount): Vectorize popcount with the WWG algorithm. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/popcount-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/popcount-2.c: New test. * gcc.target/riscv/rvv/autovec/unop/popcount-run-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/popcount.c: New test.
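As background for the popcount commit above, this is a scalar sketch of the bit-parallel population count that is commonly attributed to Wilkes, Wheeler and Gill, here for 32-bit values. It is shown only to illustrate the kind of shift/mask/add sequence that can be applied to every vector element; the exact sequence the backend emits is not stated in the commit message and may differ. The name popcount32_wwg is hypothetical.

```c
#include <stdint.h>

/* Bit-parallel popcount for a 32-bit value (hypothetical helper name).
   Each step sums adjacent bit fields of twice the previous width.  */
static uint32_t
popcount32_wwg (uint32_t x)
{
  /* 2-bit fields: each field holds the count of its two bits.  */
  x = x - ((x >> 1) & 0x55555555u);
  /* 4-bit fields: add neighbouring 2-bit counts.  */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  /* 8-bit fields: add neighbouring 4-bit counts.  */
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  /* Sum the four byte counts into the top byte and extract it.  */
  return (x * 0x01010101u) >> 24;
}
```

Every operation here is a plain shift, AND, add, subtract or multiply, which is why the algorithm maps naturally onto element-wise vector instructions.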
2023-10-23  RISC-V: Bugfix for merging undefined tmp register in math  Pan Li  1  -0/+5
Auto-vectorization of the math functions contains a step like:

  rtx tmp = gen_reg_rtx (vec_int_mode);
  emit_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_DYN, vec_fp_mode);

With MU (mask undisturbed), the masked-off elements of tmp (the destination register) keep their previous values, but tmp is freshly created here, so those values are undefined. This patch changes MU to MA (mask agnostic).

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum insn_type): Add new type values.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f): Add undef merge operand handling.
(expand_vec_ceil): Take MA instead of MU for tmp register.
(expand_vec_floor): Ditto.
(expand_vec_nearbyint): Ditto.
(expand_vec_rint): Ditto.
(expand_vec_round): Ditto.
(expand_vec_roundeven): Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-21  RISC-V: Support partial VLS mode when preference fixed-vlmax [PR111857]  Pan Li  1  -0/+1
Given code like the following:

typedef char vnx16i __attribute__ ((vector_size (16)));

vnx16i __attribute__ ((noinline, noclone))
test (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, 11, 12, 13, 14, 11, 12, 13, 14,
                                  11, 12, 13, 14, 11, 12, 13, 14);
}

It can be auto-vectorized with

  -march=rv64gcv_zvl1024b --param=riscv-autovec-preference=fixed-vlmax

but not with

  -march=rv64gcv_zvl2048b --param=riscv-autovec-preference=fixed-vlmax

The reason is that the minimal machine mode for QI is RVVMF8QI, which is 1024 / 8 = 128 bits with zvl1024b, i.e. the size of VNx16QI. With zvl2048b, the size of RVVMF8QI is 2048 / 8 = 256 bits, which does not match the size of VNx16QI (128 bits).

Thus, this patch enables the VLS mode for such cases, i.e. the VNx16QI VLS mode for zvl2048b.

Before this patch:

test:
  srli a4,a1,40
  andi a4,a4,0xff
  srli a3,a1,32
  srli a5,a1,48
  slli a0,a4,8
  andi a3,a3,0xff
  andi a5,a5,0xff
  slli a2,a5,16
  or a0,a3,a0
  srli a1,a1,56
  or a0,a0,a2
  slli a2,a1,24
  slli a3,a3,32
  or a0,a0,a2
  slli a4,a4,40
  or a0,a0,a3
  slli a5,a5,48
  or a0,a0,a4
  or a0,a0,a5
  slli a1,a1,56
  or a0,a0,a1
  mv a1,a0
  ret

After this patch:

test:
  vsetivli zero,16,e8,mf8,ta,ma
  vle8.v v2,0(a1)
  vsetivli zero,4,e32,mf2,ta,ma
  vrgather.vi v1,v2,3
  vsetivli zero,16,e8,mf8,ta,ma
  vse8.v v1,0(a0)
  ret

PR target/111857

gcc/ChangeLog:

* config/riscv/riscv-opts.h (TARGET_VECTOR_VLS): Remove.
* config/riscv/riscv-protos.h (vls_mode_valid_p): New func decl.
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Replace macro reference to func.
(vls_mode_valid_p): New func impl for vls mode valid or not.
* config/riscv/riscv-vector-switch.def (VLS_ENTRY): Replace macro reference to func.
* config/riscv/vector-iterators.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/perm-4.c: Adjust checker.
* gcc.target/riscv/rvv/autovec/vls/def.h: Add help define.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-0.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/pr111857-6.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
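The size arithmetic in the PR111857 commit above can be written down as a tiny sketch. Both function names are hypothetical; the only facts used are the ones stated in the message, namely that the smallest QI mode occupies one eighth of the minimum VLEN and that a 16-byte vector is 128 bits.

```c
/* Size in bits of the smallest (LMUL = 1/8) vector mode for a given
   minimum VLEN, e.g. 1024 for zvl1024b.  Hypothetical helper.  */
static unsigned
min_mode_bits (unsigned zvl_bits)
{
  return zvl_bits / 8;
}

/* Nonzero if a fixed-size vector of VLS_BITS bits has exactly the size
   of that smallest mode, which is the case that already worked before
   the patch according to the message above.  */
static int
matches_min_mode (unsigned vls_bits, unsigned zvl_bits)
{
  return vls_bits == min_mode_bits (zvl_bits);
}
```

For the 128-bit vector in the example, matches_min_mode holds for zvl1024b but not for zvl2048b, which is the mismatch the patch handles by allowing the smaller VLS mode.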
2023-10-16  RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc  Christoph Müllner  1  -2/+3
This just moves a few functions out of riscv.cc into riscv-string.cc in an attempt to keep riscv.cc manageable. This was originally Christoph's code and I'm just pushing it on his behalf. Full disclosure: I built rv64gc after changing to verify everything still builds. Given it was just lifting code from one place to another, I didn't run the testsuite. gcc/ * config/riscv/riscv-protos.h (emit_block_move): Remove redundant prototype. Improve comment. * config/riscv/riscv.cc (riscv_block_move_straight): Move from riscv.cc into riscv-string.cc. (riscv_adjust_block_mem, riscv_block_move_loop): Likewise. (riscv_expand_block_move): Likewise. * config/riscv/riscv-string.cc (riscv_block_move_straight): Add moved function. (riscv_adjust_block_mem, riscv_block_move_loop): Likewise. (riscv_expand_block_move): Likewise.
2023-10-13  RISC-V: Support FP lfloor/lfloorf auto vectorization  Pan Li  1  -0/+2
This patch would like to support the FP lfloor/lfloorf auto vectorization. * long lfloor (double) for rv64 * long lfloorf (float) for rv32 Due to the limitation that only the same size of data type are allowed in the vectorier, the standard name lfloormn2 only act on DF => DI for rv64, and SF => SI for rv32. Given we have code like: void test_lfloor (long *out, double *in, unsigned count) { for (unsigned i = 0; i < count; i++) out[i] = __builtin_lfloor (in[i]); } Before this patch: .L3: ... fld fa5,0(a1) fcvt.l.d a5,fa5,rdn sd a5,-8(a0) ... bne a1,a4,.L3 After this patch: frrm a6 ... fsrmi 2 // RDN .L3: ... vsetvli a3,zero,e64,m1,ta,ma vfcvt.x.f.v v1,v1 vsetvli zero,a2,e64,m1,ta,ma vse32.v v1,0(a0) ... bne a2,zero,.L3 ... fsrm a6 The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION. gcc/ChangeLog: * config/riscv/autovec.md (lfloor<mode><v_i_l_ll_convert>2): New pattern for lfloor/lfloorf. * config/riscv/riscv-protos.h (enum insn_type): New enum value. (expand_vec_lfloor): New func decl for expanding lfloor. * config/riscv/riscv-v.cc (expand_vec_lfloor): New func impl for expanding lfloor. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-lfloor-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lfloor-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lfloor-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lfloor-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lfloor-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-13  RISC-V: Support FP lceil/lceilf auto vectorization  Pan Li  1  -0/+2
This patch would like to support the FP lceil/lceilf auto vectorization. * long lceil (double) for rv64 * long lceilf (float) for rv32 Due to the limitation that only the same size of data type are allowed in the vectorier, the standard name lceilmn2 only act on DF => DI for rv64, and SF => SI for rv32. Given we have code like: void test_lceil (long *out, double *in, unsigned count) { for (unsigned i = 0; i < count; i++) out[i] = __builtin_lceil (in[i]); } Before this patch: .L3: ... fld fa5,0(a1) fcvt.l.d a5,fa5,rup sd a5,-8(a0) ... bne a1,a4,.L3 After this patch: frrm a6 ... fsrmi 3 // RUP .L3: ... vsetvli a3,zero,e64,m1,ta,ma vfcvt.x.f.v v1,v1 vsetvli zero,a2,e64,m1,ta,ma vse32.v v1,0(a0) ... bne a2,zero,.L3 ... fsrm a6 The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION. gcc/ChangeLog: * config/riscv/autovec.md (lceil<mode><v_i_l_ll_convert>2): New pattern] for lceil/lceilf. * config/riscv/riscv-protos.h (enum insn_type): New enum value. (expand_vec_lceil): New func decl for expanding lceil. * config/riscv/riscv-v.cc (expand_vec_lceil): New func impl for expanding lceil. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-lceil-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lceil-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lceil-run-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lceil-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lceil-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lceil-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-12  RISC-V: Support FP lround/lroundf auto vectorization  Pan Li  1  -0/+2
This patch would like to support the FP lround/lroundf auto vectorization. * long lround (double) for rv64 * long lroundf (float) for rv32 Due to the limitation that only the same size of data type are allowed in the vectorier, the standard name lroundmn2 only act on DF => DI for rv64, and SF => SI for rv32. Given we have code like: void test_lround (long *out, double *in, unsigned count) { for (unsigned i = 0; i < count; i++) out[i] = __builtin_lround (in[i]); } Before this patch: .L3: ... fld fa5,0(a1) fcvt.l.d a5,fa5,rmm sd a5,-8(a0) ... bne a1,a4,.L3 After this patch: frrm a6 ... fsrmi 4 // RMM .L3: ... vsetvli a3,zero,e64,m1,ta,ma vfcvt.x.f.v v1,v1 vsetvli zero,a2,e64,m1,ta,ma vse32.v v1,0(a0) ... bne a2,zero,.L3 ... fsrm a6 The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION. gcc/ChangeLog: * config/riscv/autovec.md (lround<mode><v_i_l_ll_convert>2): New pattern for lround/lroundf. * config/riscv/riscv-protos.h (enum insn_type): New enum value. (expand_vec_lround): New func decl for expanding lround. * config/riscv/riscv-v.cc (expand_vec_lround): New func impl for expanding lround. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-lround-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lround-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lround-run-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lround-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lround-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lround-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-11  RISC-V: Fix incorrect index(offset) of gather/scatter  Juzhe-Zhong  1  -0/+1
I discovered a mistake of mine that, luckily, had not been exposed so far.

https://godbolt.org/z/c3jzrh7or

GCC is using a 32-bit index offset:

  vsll.vi v1,v1,2
  vsetvli zero,a5,e32,m1,ta,ma
  vluxei32.v v1,(a1),v1

This is wrong since v1 may overflow 32 bits after vsll.vi.

After this patch:

  vsext.vf2 v8,v4
  vsll.vi v8,v8,2
  vluxei64.v v8,(a1),v8

Same as Clang.

Regression passed. Ok for trunk?

gcc/ChangeLog:

* config/riscv/autovec.md: Fix index bug.
* config/riscv/riscv-protos.h (gather_scatter_valid_offset_mode_p): New function.
* config/riscv/riscv-v.cc (expand_gather_scatter): Fix index bug.
(gather_scatter_valid_offset_mode_p): New function.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/gather-scatter/offset_extend-1.c: New test.
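The overflow described in the gather/scatter commit above is easy to reproduce in scalar C. The sketch below assumes 4-byte elements (the shift by 2 in the assembly) and a signed 32-bit index (the vsext.vf2 in the fixed sequence); the two function names are hypothetical.

```c
#include <stdint.h>

/* Byte offset computed entirely in 32 bits, modelling vsll.vi followed
   by vluxei32.v: bits shifted past bit 31 are lost.  */
static uint32_t
byte_offset_32 (int32_t index)
{
  return (uint32_t) index << 2;
}

/* Byte offset computed after sign-extending the index to 64 bits,
   modelling vsext.vf2 + vsll.vi + vluxei64.v.  */
static uint64_t
byte_offset_64 (int32_t index)
{
  return (uint64_t) (int64_t) index << 2;
}
```

For small indices the two agree, but for an index of 0x40000000 the 32-bit computation wraps to 0 while the 64-bit one yields 0x100000000, so the 32-bit form would load from the wrong address.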
2023-10-11  RISC-V: Support FP lrint/lrintf auto vectorization  Pan Li  1  -0/+1
This patch would like to support the FP lrint/lrintf auto vectorization. * long lrint (double) for rv64 * long lrintf (float) for rv32 Due to the limitation that only the same size of data type are allowed in the vectorier, the standard name lrintmn2 only act on DF => DI for rv64, and SF => SI for rv32. Given we have code like: void test_lrint (long *out, double *in, unsigned count) { for (unsigned i = 0; i < count; i++) out[i] = __builtin_lrint (in[i]); } Before this patch: .L3: ... fld fa5,0(a1) fcvt.l.d a5,fa5,dyn sd a5,-8(a0) ... bne a1,a4,.L3 After this patch: .L3: ... vsetvli a3,zero,e64,m1,ta,ma vfcvt.x.f.v v1,v1 vsetvli zero,a2,e64,m1,ta,ma vse32.v v1,0(a0) ... bne a2,zero,.L3 The rest part like SF => DI/HF => DI/DF => SI/HF => SI will be covered by TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION. gcc/ChangeLog: * config/riscv/autovec.md (lrint<mode><vlconvert>2): New pattern for lrint/lintf. * config/riscv/riscv-protos.h (expand_vec_lrint): New func decl for expanding lint. * config/riscv/riscv-v.cc (emit_vec_cvt_x_f): New helper func impl for vfcvt.x.f.v. (expand_vec_lrint): New function impl for expanding lint. * config/riscv/vector-iterators.md: New mode attr and iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/test-math.h: New define for CVT like test case. * gcc.target/riscv/rvv/autovec/vls/def.h: Ditto. * gcc.target/riscv/rvv/autovec/unop/math-lrint-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lrint-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lrint-run-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-lrint-run-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lrint-0.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-lrint-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-02  cpymem for RISC-V with v extension  Joern Rennecke  1  -0/+1
gcc/ * config/riscv/riscv-protos.h (riscv_vector::expand_block_move): Declare. * config/riscv/riscv-v.cc (riscv_vector::expand_block_move): New function. * config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move. Change to .. (cpymem<P:mode>) .. this. gcc/testsuite/ * gcc.target/riscv/rvv/base/cpymem-1.c: New test. * gcc.target/riscv/rvv/base/cpymem-2.c: Likewise. Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
2023-10-01  Make riscv_vector::legitimize_move adjust SRC in the caller.  Joern Rennecke  1  -1/+1
2023-09-29 Joern Rennecke <joern.rennecke@embecosm.com> Juzhe-Zhong <juzhe.zhong@rivai.ai> PR target/111566 gcc/ * config/riscv/riscv-protos.h (riscv_vector::legitimize_move): Change second parameter to rtx *. * config/riscv/riscv-v.cc (risv_vector::legitimize_move): Likewise. * config/riscv/vector.md: Changed callers of riscv_vector::legitimize_move. (*mov<mode>_mem_to_mem): Remove. gcc/testsuite/ * gcc.target/riscv/rvv/autovec/vls/mov-1.c: Adapt test. * gcc.target/riscv/rvv/autovec/vls/mov-10.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-3.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-5.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-7.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-8.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/mov-9.c: Ditto.1 * gcc.target/riscv/rvv/autovec/vls/mov-2.c: Removed. * gcc.target/riscv/rvv/autovec/vls/mov-4.c: Removed. * gcc.target/riscv/rvv/autovec/vls/mov-6.c: Removed. * gcc.target/riscv/rvv/fortran/pr111566.f90: New test. Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>
2023-09-27  RISC-V: Support FP roundeven auto-vectorization  Pan Li  1  -0/+5
This patch supports auto-vectorization of the roundeven API in math.h. It depends on the -ffast-math option.

When we call roundeven like v2 = roundeven (v1), we convert it into the insns below (following the llvm implementation).

* vfcvt.x.f v3, v1, RNE
* vfcvt.f.x v2, v3

However, a floating-point value does not need the conversions above if it has no fractional bits. Take single precision as an example:

+-----------+---------------+-----------------+
| raw float | binary layout | after roundeven |
+-----------+---------------+-----------------+
| 8388607.5 | 0x4affffff    | 8388608.0       |
| 8388608.0 | 0x4b000000    | 8388608.0       |
| 8388609.0 | 0x4b000001    | 8388609.0       |
+-----------+---------------+-----------------+

Every single-precision value whose magnitude is >= 8388608.0 has no fractional bits. We use vmflt and a mask to filter such values out of the vector and only perform the conversions on the masked elements.

Before this patch:

math-roundeven-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call roundeven
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

  ...
  fsrmi 0 // Rounding to nearest, ties to even
.L4:
  vfabs.v v1,v2
  vmflt.vf v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv v1,v1,v2
  bne .L4
.L14:
  fsrm a6
  ret

Please note that VLS mode is also involved in this patch and covered by the test cases. We will add more run tests with zfa support later.

gcc/ChangeLog:

* config/riscv/autovec.md (roundeven<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_roundeven): New func decl.
* config/riscv/riscv-v.cc (expand_vec_roundeven): New func impl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-roundeven-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-roundeven-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-roundeven-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-09-27  RISC-V: Support FP trunc auto-vectorization  Pan Li  1  -0/+1
This patch supports auto-vectorization of the trunc API in math.h. It depends on the -ffast-math option.

When we call trunc/truncf like v2 = trunc (v1), we convert it into the insns below (following the llvm implementation).

* vfcvt.rtz.x.f v3, v1
* vfcvt.f.x v2, v3

However, a floating-point value does not need the conversions above if it has no fractional bits. Take single precision as an example:

+------------+---------------+-----------------+
| raw float  | binary layout | after trunc     |
+------------+---------------+-----------------+
| -8388607.5 | 0xcaffffff    | -8388607.0      |
| 8388607.5  | 0x4affffff    | 8388607.0       |
| 8388608.0  | 0x4b000000    | 8388608.0       |
| 8388609.0  | 0x4b000001    | 8388609.0       |
+------------+---------------+-----------------+

Every single-precision value whose magnitude is >= 8388608.0 has no fractional bits. We use vmflt and a mask to filter such values out of the vector and only perform the conversions on the masked elements.

Before this patch:

math-trunc-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call trunc
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

  vfabs.v v2,v1
  vmflt.vf v0,v2,fa5
  vfcvt.rtz.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  vfsgnj.vv v2,v2,v1
  bne .L4

Please note that VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (btrunc<mode>2): New pattern.
* config/riscv/riscv-protos.h (expand_vec_trunc): New func decl.
* config/riscv/riscv-v.cc (emit_vec_cvt_x_f_rtz): New func impl.
(expand_vec_trunc): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-trunc-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-trunc-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
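The masked convert sequence in the trunc commit above can be modelled per element in scalar C. This is a sketch rather than the backend's code: trunc_model is a hypothetical name, and the comments map each step to the instruction it is meant to mirror. The threshold 8388608.0 is 2^23; a binary32 value has a 24-bit significand, so any value of that magnitude or larger is already an integer and is left untouched.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical) of the vectorized trunc sequence for one
   single-precision element.  */
static float
trunc_model (float x)
{
  uint32_t bits, sign;
  float mag, r;

  memcpy (&bits, &x, sizeof bits);
  sign = bits & 0x80000000u;
  bits &= 0x7fffffffu;                  /* vfabs.v */
  memcpy (&mag, &bits, sizeof mag);

  /* vmflt.vf: the mask is false for |x| >= 2^23 and for NaN, so those
     elements are passed through unchanged.  */
  if (!(mag < 8388608.0f))
    return x;

  /* vfcvt.rtz.x.f.v then vfcvt.f.x.v: |x| < 2^23, so the value fits in
     int32_t and the cast truncates toward zero.  */
  r = (float) (int32_t) x;

  /* vfsgnj.vv: restore the original sign, so that e.g. -0.5 becomes
     -0.0 rather than +0.0.  */
  memcpy (&bits, &r, sizeof bits);
  bits = (bits & 0x7fffffffu) | sign;
  memcpy (&r, &bits, sizeof r);
  return r;
}
```

The same structure applies to the other rounding functions in this series; only the rounding mode used for the float-to-integer conversion changes.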
2023-09-26  RISC-V: Support FP round auto-vectorization  Pan Li  1  -0/+5
This patch would like to support auto-vectorization for the round API in math.h. It depends on the -ffast-math option. When we would like to call round/roundf like v2 = round (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1, RMM * vfcvt.f.x v2, v3 However, the floating point value may not need the cvt as above if its mantissa is zero. Take single precision floating point as example: +------------+---------------+-----------------+ | raw float | binary layout | after round | +------------+---------------+-----------------+ | -8388607.5 | 0xcaffffff | -8388608.0 | | 8388607.5 | 0x4affffff | 8388608.0 | | 8388608.0 | 0x4b000000 | 8388608.0 | | 8388609.0 | 0x4b000001 | 8388609.0 | +------------+---------------+-----------------+ All single floating point >= 8388608.0 will have all zero mantisaa. We leverage vmflt and mask to filter them out in vector and only do the cvt on mask. Befor this patch: math-round-1.c:21:1: missed: couldn't vectorize loop ... .L3: flw fa0,0(s0) addi s0,s0,4 addi s1,s1,4 call round fsw fa0,-4(s1) bne s0,s2,.L3 After this patch: ... fsrmi 4 // RMM, rounding to nearest, ties to max magnitude .L4: vfabs.v v2,v1 vmflt.vf v0,v2,fa5 vfcvt.x.f.v v4,v1,v0.t vfcvt.f.x.v v2,v4,v0.t vfsgnj.vv v2,v2,v1 bne .L4 .L14: fsrm a6 ret Please note VLS mode is also involved in this patch and covered by the test cases. gcc/ChangeLog: * config/riscv/autovec.md (round<mode>2): New pattern. * config/riscv/riscv-protos.h (enum insn_flags): New enum type. (enum insn_type): Ditto. (expand_vec_round): New function decl. * config/riscv/riscv-v.cc (expand_vec_round): New function impl. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-round-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-round-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-round-2.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-round-3.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-round-run-1.c: New test. 
* gcc.target/riscv/rvv/autovec/unop/math-round-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-round-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-09-26  RISC-V: Support FP rint auto-vectorization  Pan Li  1  -0/+1
This patch would like to support auto-vectorization for the rint API in math.h. It depends on the -ffast-math option. When we would like to call rint/rintf like v2 = rint (v1), we will convert it into below insns (reference the implementation of llvm). * vfcvt.x.f v3, v1 * vfcvt.f.x v2, v3 However, the floating point value may not need the cvt as above if its mantissa is zero. Take single precision floating point as example: Assume we have RTZ rounding mode +------------+---------------+-----------------+ | raw float | binary layout | after int | +------------+---------------+-----------------+ | -8388607.5 | 0xcaffffff | -8388607.0 | | 8388607.5 | 0x4affffff | 8388607.0 | | 8388608.0 | 0x4b000000 | 8388608.0 | | 8388609.0 | 0x4b000001 | 8388609.0 | +------------+---------------+-----------------+ All single floating point >= 8388608.0 will have all zero mantisaa. We leverage vmflt and mask to filter them out in vector and only do the cvt on mask. Befor this patch: math-rint-1.c:21:1: missed: couldn't vectorize loop ... .L3: flw fa0,0(s0) addi s0,s0,4 addi s1,s1,4 call rint fsw fa0,-4(s1) bne s0,s2,.L3 After this patch: vfabs.v v2,v1 vmflt.vf v0,v2,fa5 vfcvt.x.f.v v4,v1,v0.t vfcvt.f.x.v v2,v4,v0.t vfsgnj.vv v2,v2,v1 Please note VLS mode is also involved in this patch and covered by the test cases. gcc/ChangeLog: * config/riscv/autovec.md (rint<mode>2): New pattern. * config/riscv/riscv-protos.h (expand_vec_rint): New function decl. * config/riscv/riscv-v.cc (expand_vec_rint): New function impl. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/math-rint-0.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-rint-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-rint-2.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-rint-3.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-rint-run-1.c: New test. * gcc.target/riscv/rvv/autovec/unop/math-rint-run-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/math-rint-1.c: New test. 
Signed-off-by: Pan Li <pan2.li@intel.com>
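The "binary layout" column of the rint commit above can be checked with a few lines of C. This is a minimal sketch and not part of the patch: float_bits, float_exponent and needs_no_cvt are hypothetical helper names, and the code assumes that float is IEEE 754 binary32.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw binary32 encoding of f. */
uint32_t float_bits(float f) {
    uint32_t u;
    assert(sizeof u == sizeof f);  /* the sketch assumes a 32-bit float */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: biased exponent field (bits 30..23). */
unsigned float_exponent(float f) {
    return (float_bits(f) >> 23) & 0xffu;
}

/* With 23 stored mantissa bits, an unbiased exponent of 23 or more
   (biased value >= 150) leaves no bit with a weight below 1.0, so the
   value is already an integer (or Inf/NaN) and needs no conversion. */
int needs_no_cvt(float f) {
    return float_exponent(f) >= 150u;
}
```

For the four table rows, float_bits gives 0xcaffffff, 0x4affffff, 0x4b000000 and 0x4b000001, and needs_no_cvt is true only for the last two, which is the set of elements the vmflt mask leaves untouched.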
2023-09-26RISC-V: Support FP nearbyint auto-vectorizationPan Li1-0/+2
This patch supports auto-vectorization for the nearbyint API in math.h. It depends on the -ffast-math option.

When we call nearbyint/nearbyintf like v2 = nearbyint (v1), we convert it into the insns below (following the implementation of llvm).

* frflags a5
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
* fsflags a5

However, a floating point value does not need the cvt above if it has no fractional part. Take single precision floating point as an example, assuming the RTZ rounding mode:

+------------+---------------+-----------------+
| raw float  | binary layout | after nearbyint |
+------------+---------------+-----------------+
| 8388607.5  | 0x4affffff    | 8388607.0       |
| 8388608.0  | 0x4b000000    | 8388608.0       |
| 8388609.0  | 0x4b000001    | 8388609.0       |
+------------+---------------+-----------------+

All single precision values with magnitude >= 8388608.0 have no fractional bits left in the mantissa. We leverage vmflt and a mask to filter them out in the vector and only do the cvt on the masked elements.

Before this patch:

math-nearbyint-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call nearbyint
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

  vfabs.v v2,v1
  vmflt.vf v0,v2,fa5
  frflags a7
  vfcvt.x.f.v v4,v1,v0.t
  vfcvt.f.x.v v2,v4,v0.t
  fsflags a7
  vfsgnj.vv v2,v2,v1

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:
* config/riscv/autovec.md (nearbyint<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_vec_nearbyint): New function decl.
* config/riscv/riscv-v.cc (expand_vec_nearbyint): New func impl.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/test-math.h: Add helper function.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-nearbyint-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-09-23RISC-V: Suport FP floor auto-vectorizationPan Li1-0/+5
This patch supports auto-vectorization for the floor API in math.h. It depends on the -ffast-math option.

When we call floor/floorf like v2 = floor (v1), we convert it into the insns below (following the implementation of llvm).

* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3

However, a floating point value does not need the cvt above if it has no fractional part. For example, the single precision floating point values below:

+-----------+---------------+-------------+
| raw float | binary layout | after floor |
+-----------+---------------+-------------+
| 8388607.5 | 0x4affffff    | 8388607.0   |
| 8388608.0 | 0x4b000000    | 8388608.0   |
| 8388609.0 | 0x4b000001    | 8388609.0   |
+-----------+---------------+-------------+

All single precision values with magnitude >= 8388608.0 have no fractional bits left in the mantissa. We leverage vmflt and a mask to filter them out in the vector and only do the cvt on the masked elements.

Before this patch:

math-floor-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call floorf
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

  ...
  fsrmi 2 // Rounding Down
.L4:
  vfabs.v v1,v2
  vmflt.vf v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv v1,v1,v2
  bne .L4
.L14:
  fsrm a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:
* config/riscv/autovec.md (floor<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
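The masked sequence of the floor commit above can be modelled per element in scalar C. This is a minimal sketch under stated assumptions: floor_sketch and copy_sign_bits are hypothetical names, 8388608.0f (2^23) stands in for the constant held in fa5, and the truncate-then-adjust step only imitates a round-down conversion. It is not how the expander itself is written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: magnitude of mag combined with the sign bit of sgn
   (models vfsgnj.vv on one element). */
static float copy_sign_bits(float mag, float sgn) {
    uint32_t m, s;
    assert(sizeof m == sizeof mag);  /* the sketch assumes a 32-bit float */
    memcpy(&m, &mag, sizeof m);
    memcpy(&s, &sgn, sizeof s);
    m = (m & 0x7fffffffu) | (s & 0x80000000u);
    memcpy(&mag, &m, sizeof m);
    return mag;
}

/* Scalar model of the vectorized floor for one element. */
float floor_sketch(float x) {
    const float limit = 8388608.0f;        /* 2^23, assumed threshold */
    float ax = copy_sign_bits(x, 0.0f);    /* vfabs.v */
    if (!(ax < limit))                     /* vmflt.vf: element is masked off */
        return x;                          /* large values, Inf and NaN pass through */
    int32_t i = (int32_t)x;                /* truncates toward zero */
    if ((float)i > x)                      /* adjust truncation to round-down */
        i -= 1;
    return copy_sign_bits((float)i, x);    /* restore the sign, keeps -0.0 */
}
```

The final sign copy matters for inputs such as -0.0, where the integer round trip alone would produce +0.0.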
2023-09-22RISC-V: Add VLS conditional patterns supportJuzhe-Zhong1-0/+3
Regression passed. Committed. gcc/ChangeLog: * config/riscv/autovec.md: Add VLS conditional patterns. * config/riscv/riscv-protos.h (expand_cond_unop): Ditto. (expand_cond_binop): Ditto. (expand_cond_ternop): Ditto. * config/riscv/riscv-v.cc (expand_cond_unop): Ditto. (expand_cond_binop): Ditto. (expand_cond_ternop): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS conditional tests. * gcc.target/riscv/rvv/autovec/vls/cond_add-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_add-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_and-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_div-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_div-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_fma-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_fma-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_fms-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_fnma-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_fnma-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_fnms-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_ior-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_max-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_max-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_min-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_min-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_mod-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_mul-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_mul-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_neg-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_not-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_shift-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_shift-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_sub-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/cond_sub-2.c: New test. 
* gcc.target/riscv/rvv/autovec/vls/cond_xor-1.c: New test.
2023-09-22RISC-V: Support combine cond extend and reduce sum to widen reduce sumLehua Ding1-0/+1
This patch supports combining cond extend and reduce_sum into cond widen reduce_sum, for example combining the following three insns:

(set (reg:RVVM2HI 149) (if_then_else:RVVM2HI (unspec:RVVMF8BI [ (const_vector:RVVMF8BI repeat [ (const_int 1 [0x1]) ]) (reg:DI 146) (const_int 2 [0x2]) repeated x2 (const_int 1 [0x1]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (const_vector:RVVM2HI repeat [ (const_int 0 [0]) ]) (unspec:RVVM2HI [ (reg:SI 0 zero) ] UNSPEC_VUNDEF)))
(set (reg:RVVM2HI 138) (if_then_else:RVVM2HI (reg:RVVMF8BI 135) (reg:RVVM2HI 148) (reg:RVVM2HI 149)))
(set (reg:HI 150) (unspec:HI [ (reg:RVVM2HI 138) ] UNSPEC_REDUC_SUM))

into one insn:

(set (reg:SI 147) (unspec:SI [ (if_then_else:RVVM2SI (reg:RVVMF16BI 135) (sign_extend:RVVM2SI (reg:RVVM1HI 136)) (if_then_else:RVVM2HI (unspec:RVVMF8BI [ (const_vector:RVVMF8BI repeat [ (const_int 1 [0x1]) ]) (reg:DI 146) (const_int 2 [0x2]) repeated x2 (const_int 1 [0x1]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (const_vector:RVVM2HI repeat [ (const_int 0 [0]) ]) (unspec:RVVM2HI [ (reg:SI 0 zero) ] UNSPEC_VUNDEF))) ] UNSPEC_REDUC_SUM))

Consider the following C code:

int16_t foo (int8_t *restrict a, int8_t *restrict pred)
{
  int16_t sum = 0;
  for (int i = 0; i < 16; i += 1)
    if (pred[i])
      sum += a[i];
  return sum;
}

Assembly before this patch:

foo:
  vsetivli zero,16,e16,m2,ta,ma
  li a5,0
  vmv.v.i v2,0
  vsetvli zero,zero,e8,m1,ta,ma
  vl1re8.v v0,0(a1)
  vmsne.vi v0,v0,0
  vsetvli zero,zero,e16,m2,ta,mu
  vle8.v v4,0(a0),v0.t
  vmv.s.x v1,a5
  vsext.vf2 v2,v4,v0.t
  vredsum.vs v2,v2,v1
  vmv.x.s a0,v2
  slliw a0,a0,16
  sraiw a0,a0,16
  ret

Assembly after this patch:

foo:
  li a5,0
  vsetivli zero,16,e16,m1,ta,ma
  vmv.s.x v3,a5
  vsetivli zero,16,e8,m1,ta,ma
  vl1re8.v v0,0(a1)
  vmsne.vi v0,v0,0
  vle8.v v2,0(a0),v0.t
  vwredsum.vs v1,v2,v3,v0.t
  vsetivli zero,0,e16,m1,ta,ma
  vmv.x.s a0,v1
  slliw a0,a0,16
  sraiw a0,a0,16
  ret

gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_widen_reduc_plus_scal_<mode>): New combine patterns.
* config/riscv/riscv-protos.h (enum insn_type): New insn_type.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_widen_reduc_run-2.c: New test.
2023-09-22RISC-V: Split VLS avl_type from NONVLMAX avl_typeLehua Ding1-2/+19
This patch splits a VLS avl_type from the NONVLMAX avl_type, denoting those RVV insns whose length is set to the number of units of the VLS mode. gcc/ChangeLog: * config/riscv/riscv-protos.h (enum avl_type): New VLS avl_type. * config/riscv/riscv-v.cc (autovec_use_vlmax_p): Move comments.
2023-09-22RISC-V: Support ceil and ceilf auto-vectorizationPan Li1-0/+5
Update in v4:
* Add test for _Float16.
* Remove unnecessary macro in def.h for test.

Original log:

This patch supports auto-vectorization for both the ceil and ceilf of math.h. It depends on the -ffast-math option.

When we call ceil/ceilf like v2 = ceil (v1), we convert it into the insns below (following the implementation of llvm).

* vfcvt.x.f v3, v1, RUP
* vfcvt.f.x v2, v3

However, a floating point value does not need the cvt above if it has no fractional part. For example, the single precision floating point values below:

+-----------+---------------+
| float     | binary layout |
+-----------+---------------+
| 8388607.5 | 0x4affffff    |
| 8388608.0 | 0x4b000000    |
| 8388609.0 | 0x4b000001    |
+-----------+---------------+

All single precision values with magnitude >= 8388608.0 have no fractional bits left in the mantissa. We leverage vmflt and a mask to filter them out in the vector and only do the cvt on the masked elements.

Before this patch:

math-ceil-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw fa0,0(s0)
  addi s0,s0,4
  addi s1,s1,4
  call ceilf
  fsw fa0,-4(s1)
  bne s0,s2,.L3

After this patch:

  ...
  fsrmi 3
.L4:
  vfabs.v v0,v1
  vmv1r.v v2,v1
  vmflt.vv v0,v0,v4
  sub a3,a3,a4
  vfcvt.x.f.v v3,v1,v0.t
  vfcvt.f.x.v v2,v3,v0.t
  vfsgnj.vv v2,v2,v1
  bne .L4
.L14:
  fsrm a6
  ret

Please note VLS mode is also involved in this patch and covered by the test cases.

gcc/ChangeLog:
* config/riscv/autovec.md (ceil<mode>2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_ceil): New function decl.
* config/riscv/riscv-v.cc (gen_ceil_const_fp): New function impl.
(expand_vec_float_cmp_mask): Ditto.
(expand_vec_copysign): Ditto.
(expand_vec_ceil): Ditto.
* config/riscv/vector.md: Add VLS mode support.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/math-ceil-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/test-math.h: New test.
* gcc.target/riscv/rvv/autovec/vls/math-ceil-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-09-19RISC-V: Refactor and cleanup fma patternsLehua Ding1-1/+1
At present, the FMA autovec patterns do not fully use the corresponding patterns in vector.md. The previous reason was that the merge operand of the patterns in vector.md could not be VUNDEF. Now that it is allowed to be VUNDEF, the insns used for the reload pass are unified into vector.md, and the corresponding vlmax patterns in autovec.md are used for combine. This patch also refactors the corresponding combine patterns inside autovec-opt.md and removes the unused ones. gcc/ChangeLog: * config/riscv/autovec-opt.md (*<optab>_fma<mode>): Removed old combine patterns. (*single_<optab>mult_plus<mode>): Ditto. (*double_<optab>mult_plus<mode>): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. (*double_widen_fma<mode>): Ditto. (*single_widen_fma<mode>): Ditto. (*double_widen_fnma<mode>): Ditto. (*single_widen_fnma<mode>): Ditto. (*double_widen_fms<mode>): Ditto. (*single_widen_fms<mode>): Ditto. (*double_widen_fnms<mode>): Ditto. (*single_widen_fnms<mode>): Ditto. (*reduc_plus_scal_<mode>): Adjust name. (*widen_reduc_plus_scal_<mode>): Adjust name. (*dual_widen_fma<mode>): New combine pattern. (*dual_widen_fmasu<mode>): Ditto. (*dual_widen_fmaus<mode>): Ditto. (*dual_fma<mode>): Ditto. (*single_fma<mode>): Ditto. (*dual_fnma<mode>): Ditto. (*single_fnma<mode>): Ditto. (*dual_fms<mode>): Ditto. (*single_fms<mode>): Ditto. (*dual_fnms<mode>): Ditto. (*single_fnms<mode>): Ditto. * config/riscv/autovec.md (fma<mode>4): Refactor fma pattern. (*fma<VI:mode><P:mode>): Removed. (fnma<mode>4): Refactor. (*fnma<VI:mode><P:mode>): Removed. (*fma<VF:mode><P:mode>): Removed. (*fnma<VF:mode><P:mode>): Removed. (fms<mode>4): Refactor. (*fms<VF:mode><P:mode>): Removed. (fnms<mode>4): Refactor. (*fnms<VF:mode><P:mode>): Removed. * config/riscv/riscv-protos.h (prepare_ternary_operands): Adjust prototype. * config/riscv/riscv-v.cc (prepare_ternary_operands): Refactor. * config/riscv/vector.md (*pred_mul_plus<mode>_undef): New pattern. (*pred_mul_plus<mode>): Removed.
(*pred_mul_plus<mode>_scalar): Removed. (*pred_mul_plus<mode>_extended_scalar): Removed. (*pred_minus_mul<mode>_undef): New pattern. (*pred_minus_mul<mode>): Removed. (*pred_minus_mul<mode>_scalar): Removed. (*pred_minus_mul<mode>_extended_scalar): Removed. (*pred_mul_<optab><mode>_undef): New pattern. (*pred_mul_<optab><mode>): Removed. (*pred_mul_<optab><mode>_scalar): Removed. (*pred_mul_neg_<optab><mode>_undef): New pattern. (*pred_mul_neg_<optab><mode>): Removed. (*pred_mul_neg_<optab><mode>_scalar): Removed.
2023-09-19RISC-V: Bugfix for scalar move with merged operandPan Li1-0/+4
Given the example below for VLS mode:

void test (vl_t *u)
{
  vl_t t;
  long long *p = (long long *)&t;
  p[0] = p[1] = 2;
  *u = t;
}

The vec_set will simplify the insn to vmv.s.x when the index is 0, without a merged operand. That results in some problems in DCE, aka:

1: 137[DI] = a0
2: 138[V2DI] = 134[V2DI] // deleted by DCE
3: 139[DI] = #2 // deleted by DCE
4: 140[DI] = #2 // deleted by DCE
5: 141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE
6: 138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
7: 135[V2DI] = 138[V2DI] // deleted by DCE
8: 142[V2DI] = 135[V2DI] // deleted by DCE
9: 143[DI] = #2
10: 142[V2DI] = vec_dup:V2DI (143[DI])
11: (137[DI]) = 142[V2DI]

The higher 64 bits of 142[V2DI] are unknown here, which generates incorrect code when it is stored back to memory. This patch fixes the issue by adding a new SCALAR_MOVE_MERGED_OP for vec_set.

Please note this patch doesn't enable VLS for vec_set; follow-up patches will support this soon.

gcc/ChangeLog:
* config/riscv/autovec.md: Bugfix.
* config/riscv/riscv-protos.h (SCALAR_MOVE_MERGED_OP): New enum.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2023-09-15RISC-V: Fix using wrong mode to get reduction insn vlmaxLehua Ding1-6/+6
This patch fixes the use of the wrong mode when emitting a vlmax reduction insn. We should use the src operand instead of the dest operand (which always has LMUL=m1) to get the vlmax length. This patch also removes dest_mode and mask_mode from the insn_expander constructor, since they can be obtained from insn_flags. gcc/ChangeLog: * config/riscv/riscv-protos.h (enum insn_flags): Change name. (enum insn_type): Ditto. * config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): Removed. (emit_vlmax_insn): Adjust. (emit_nonvlmax_insn): Adjust. (emit_vlmax_insn_lra): Adjust. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: New test.
2023-09-15RISC-V: Refactor expand_reduction and cleanup enum reduction_typeLehua Ding1-8/+1
This patch refactors expand_reduction, removes the reduction_type argument and adds an insn_flags argument to determine how the operands are passed. ops has also been modified to restrict it to only two cases and to remove operands that are not in use. gcc/ChangeLog: * config/riscv/autovec-opt.md: Adjust. * config/riscv/autovec.md: Ditto. * config/riscv/riscv-protos.h (enum class): Delete enum reduction_type. (expand_reduction): Adjust expand_reduction prototype. * config/riscv/riscv-v.cc (need_mask_operand_p): New helper function. (expand_reduction): Refactor expand_reduction.
2023-09-14RISC-V: Refactor vector reduction patternsLehua Ding1-1/+1
This patch adjusts the structure of the reduction patterns, changing it from:

(any_reduc:VI
  (vec_duplicate:VI
    (vec_select:<VEL>
      (match_operand:<V_LMUL1> 4 "register_operand" " vr, vr")
      (parallel [(const_int 0)])))
  (match_operand:VI 3 "register_operand" " vr, vr"))

to:

(unspec:<V_LMUL1> [
  (match_operand:VI 3 "register_operand" " vr, vr")
  (match_operand:<V_LMUL1> 4 "register_operand" " vr, vr")
] ANY_REDUC)

The reason for the change is that the semantics of the previous pattern are incorrect. GCC does not have a standard rtx code to express the reduction calculation process, so it makes more sense to use UNSPEC. Further, all reduction icodes are obtained from the UNSPEC and MODE (code_for_pred (unspec, mode)), so that all reduction patterns can have a uniform icode name. After this adjustment, widen_reducop and widen_freducop are redundant.

gcc/ChangeLog:
* config/riscv/autovec.md: Change rtx code to unspec.
* config/riscv/riscv-protos.h (expand_reduction): Change prototype.
* config/riscv/riscv-v.cc (expand_reduction): Change prototype.
* config/riscv/riscv-vector-builtins-bases.cc (class widen_reducop): Removed.
(class widen_freducop): Removed.
* config/riscv/vector-iterators.md (minu): Add reduc unspec, iterators, attrs.
* config/riscv/vector.md (@pred_reduc_<reduc><mode>): Change name.
(@pred_<reduc_op><mode>): New name.
(@pred_widen_reduc_plus<v_su><mode>): Change name.
(@pred_reduc_plus<order><mode>): Change name.
(@pred_widen_reduc_plus<order><mode>): Change name.
2023-09-12riscv: Add support for str(n)cmp inline expansionChristoph Müllner1-0/+1
This patch implements expansions for the cmpstrsi and cmpstrnsi builtins for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb instructions are available. The expansion basically emits a comparison sequence which compares XLEN bits per step if possible.

This allows calls to strcmp() and strncmp() to be inlined if both strings are xlen-aligned. For strncmp() the length parameter needs to be known. The benefits over calls to libc are:

* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment tests

The inlining mechanism is gated by new switches ('-minline-strcmp' and '-minline-strncmp') and by the variable 'optimize_size'. The number of emitted unrolled loop iterations can be controlled by the parameter '--param=riscv-strcmp-inline-limit=N', which defaults to 64.

The comparison sequence is inspired by the strcmp example in the appendix of the Bitmanip specification (incl. the fast result calculation in case the first word does not contain a NUL byte). Additional inspiration comes from rs6000-string.c.

The emitted sequence does not trigger any readahead pagefault issues, because only aligned strings are accessed by aligned xlen-loads.

This patch has been tested using the glibc string tests on QEMU:

* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=64
* rv64gc_zbb/rv64gc_xtheadbb with riscv-strcmp-inline-limit=8
* rv32gc_zbb/rv32gc_xtheadbb with riscv-strcmp-inline-limit=64

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/ChangeLog:
* config/riscv/bitmanip.md (*<optab>_not<mode>): Export INSN name.
(<optab>_not<mode>3): Likewise.
* config/riscv/riscv-protos.h (riscv_expand_strcmp): New prototype.
* config/riscv/riscv-string.cc (GEN_EMIT_HELPER3): New helper macros.
(GEN_EMIT_HELPER2): Likewise.
(emit_strcmp_scalar_compare_byte): New function.
(emit_strcmp_scalar_compare_subword): Likewise.
(emit_strcmp_scalar_compare_word): Likewise.
(emit_strcmp_scalar_load_and_compare): Likewise.
(emit_strcmp_scalar_call_to_libc): Likewise.
(emit_strcmp_scalar_result_calculation_nonul): Likewise.
(emit_strcmp_scalar_result_calculation): Likewise.
(riscv_expand_strcmp_scalar): Likewise.
(riscv_expand_strcmp): Likewise.
* config/riscv/riscv.md (*slt<u>_<X:mode><GPR:mode>): Export INSN name.
(@slt<u>_<X:mode><GPR:mode>3): Likewise.
(cmpstrnsi): Invoke expansion function for str(n)cmp.
(cmpstrsi): Likewise.
* config/riscv/riscv.opt: Add new parameter '-mstring-compare-inline-limit'.
* doc/invoke.texi: Document new parameter '-mstring-compare-inline-limit'.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strcmp.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled-2.c: New test.
* gcc.target/riscv/zbb-strcmp-disabled.c: New test.
* gcc.target/riscv/zbb-strcmp-unaligned.c: New test.
* gcc.target/riscv/zbb-strcmp.c: New test.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
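The word-at-a-time idea behind the str(n)cmp commit above can be sketched in portable C for a 32-bit word. This is an illustration under stated assumptions, not the emitted sequence: strcmp_sketch, has_nul_byte, load_le32 and the max_words parameter are hypothetical, both buffers are assumed to be readable in whole 4-byte words up to and including the word holding the terminator, and the byte loop merely stands in for the patch's own result calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if any byte of w is NUL. */
static int has_nul_byte(uint32_t w) {
    for (int i = 0; i < 4; i++)
        if (((w >> (8 * i)) & 0xffu) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: little-endian 32-bit load built from bytes. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Compare up to max_words words inline, then hand over to libc. */
int strcmp_sketch(const unsigned char *a, const unsigned char *b,
                  size_t max_words) {
    assert(a != NULL && b != NULL);
    for (size_t n = 0; n < max_words; n++, a += 4, b += 4) {
        uint32_t wa = load_le32(a), wb = load_le32(b);
        if (wa == wb && !has_nul_byte(wa))
            continue;                 /* equal words, no terminator yet */
        /* The answer lies inside this word: find the deciding byte. */
        for (int i = 0; i < 4; i++)
            if (a[i] != b[i] || a[i] == 0)
                return (int)a[i] - (int)b[i];
    }
    /* Inline budget exhausted: fall back to the library routine. */
    return strcmp((const char *)a, (const char *)b);
}
```

The fast path costs one compare and one NUL test per word; only the word that decides the result is inspected byte by byte.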
2023-09-12riscv: Add support for strlen inline expansionChristoph Müllner1-0/+3
This patch implements the expansion of the strlen builtin for RV32/RV64 for xlen-aligned strings if Zbb or XTheadBb instructions are available. The inserted sequences are:

rv32gc_zbb (RV64 is similar):
  add a3,a0,4
  li a4,-1
.L1:
  lw a5,0(a0)
  add a0,a0,4
  orc.b a5,a5
  beq a5,a4,.L1
  not a5,a5
  ctz a5,a5
  srl a5,a5,0x3
  add a0,a0,a5
  sub a0,a0,a3

rv64gc_xtheadbb (RV32 is similar):
  add a4,a0,8
.L2:
  ld a5,0(a0)
  add a0,a0,8
  th.tstnbz a5,a5
  beqz a5,.L2
  th.rev a5,a5
  th.ff1 a5,a5
  srl a5,a5,0x3
  add a0,a0,a5
  sub a0,a0,a4

This allows calls to strlen() to be inlined, with optimized code for xlen-aligned strings, resulting in the following benefits over a call to libc:

* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test

The inlining mechanism is gated by a new switch ('-minline-strlen') and by the variable 'optimize_size'.

Tested using the glibc string tests.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>

gcc/ChangeLog:
* config.gcc: Add new object riscv-string.o. riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen): New function.
* config/riscv/riscv.md (strlen<mode>): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
(th_rev<mode>2): Likewise.
(th_tstnbz<mode>2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.
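The rv32gc_zbb sequence of the strlen commit above can be followed step by step in portable C. This is a sketch, not the expander: strlen_sketch, orc_b, ctz32 and load_le32 are hypothetical helpers that emulate orc.b, ctz and a little-endian lw, and the buffer is assumed to be readable in whole 4-byte words up to and including the word that holds the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulates orc.b on 32 bits: each nonzero byte becomes 0xff, each zero byte stays 0x00. */
static uint32_t orc_b(uint32_t w) {
    uint32_t r = 0;
    for (int i = 0; i < 4; i++)
        if ((w >> (8 * i)) & 0xffu)
            r |= 0xffu << (8 * i);
    return r;
}

/* Emulates ctz; the caller guarantees w != 0. */
static unsigned ctz32(uint32_t w) {
    unsigned n = 0;
    assert(w != 0);
    while ((w & 1u) == 0) {
        w >>= 1;
        n++;
    }
    return n;
}

/* Little-endian 32-bit load built from bytes, standing in for lw. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

size_t strlen_sketch(const unsigned char *s) {
    const unsigned char *p = s;
    uint32_t w;
    do {
        w = orc_b(load_le32(p));     /* lw + orc.b */
        p += 4;                      /* add a0,a0,4 */
    } while (w == 0xffffffffu);      /* beq a5,a4,.L1: no NUL in this word */
    w = ~w;                          /* not: set bits mark the NUL bytes */
    /* ctz >> 3 is the byte index of the first NUL inside the last word;
       p - 4 - s is the offset of that word (add a0,a0,a5; sub a0,a0,a3). */
    return (size_t)(p - 4 - s) + (ctz32(w) >> 3);
}
```

For the 8-byte buffer "hello" padded with zeros, the first word contains no NUL and the second yields a byte index of 1, giving 4 + 1 = 5.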
2023-09-11RISC-V: Remove redundant functionsJuzhe-Zhong1-2/+0
I just finished the V2 version of the LMUL cost model. It turns out we don't need these redundant functions, so remove them. gcc/ChangeLog: * config/riscv/riscv-protos.h (get_all_predecessors): Remove. (get_all_successors): Ditto. * config/riscv/riscv-v.cc (get_all_predecessors): Ditto. (get_all_successors): Ditto.
2023-09-11RISC-V: Add VLS modes VEC_PERM support[PR111311]Juzhe-Zhong1-0/+2
This patch adds VEC_PERM support for VLS modes, which fixes the following FAILs reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311:

FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-40.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_FIELD_REF" 0
FAIL: gcc.dg/tree-ssa/forwprop-41.c scan-tree-dump-times optimized "BIT_INSERT_EXPR" 1

PR target/111311

gcc/ChangeLog:
* config/riscv/autovec.md: Add VLS modes.
* config/riscv/riscv-protos.h (cmp_lmul_le_one): New function.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv-v.cc (cmp_lmul_le_one): Ditto.
(cmp_lmul_gt_one): Ditto.
* config/riscv/riscv.cc (riscv_print_operand): Add VLS modes.
(riscv_vectorize_vec_perm_const): Ditto.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: Adapt test.
* gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/compress-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/compress-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/merge-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/perm-1.c: New test. * gcc.target/riscv/rvv/autovec/vls/perm-2.c: New test. * gcc.target/riscv/rvv/autovec/vls/perm-3.c: New test. * gcc.target/riscv/rvv/autovec/vls/perm-4.c: New test. * gcc.target/riscv/rvv/autovec/vls/perm-5.c: New test. * gcc.target/riscv/rvv/autovec/vls/perm-6.c: New test. * gcc.target/riscv/rvv/autovec/vls/perm-7.c: New test.
2023-09-06RISC-V: Part-3: Output .variant_cc directive for vector functionLehua Ding1-0/+3
Functions which follow the vector calling convention variant need to be annotated with the .variant_cc directive according to the RISC-V Assembly Programmer's Manual[1] and the RISC-V ELF Specification[2].

[1] https://github.com/riscv-non-isa/riscv-asm-manual/blob/master/riscv-asm.md#pseudo-ops
[2] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc#dynamic-linking

gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_declare_function_name): Add protos.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.cc (riscv_asm_output_variant_cc): Output .variant_cc directive for vector function.
(riscv_declare_function_name): Ditto.
(riscv_asm_output_alias): Ditto.
(riscv_asm_output_external): Ditto.
* config/riscv/riscv.h (ASM_DECLARE_FUNCTION_NAME): Implement ASM_DECLARE_FUNCTION_NAME.
(ASM_OUTPUT_DEF_FROM_DECLS): Implement ASM_OUTPUT_DEF_FROM_DECLS.
(ASM_OUTPUT_EXTERNAL): Implement ASM_OUTPUT_EXTERNAL.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-call-variant_cc.c: New test.
2023-09-06RISC-V: Part-1: Select suitable vector registers for vector type args and returnsLehua Ding1-0/+1
I post the vector register calling convention rules from the proposal[1] directly here:

v0 is used to pass the first vector mask argument to a function, and to return a vector mask result from a function. v8-v23 are used to pass vector data arguments, vector tuple arguments and the remaining vector mask arguments to a function, and to return vector data and vector tuple results from a function.

Each vector data type and vector tuple type has an LMUL attribute that indicates a vector register group. The value of LMUL indicates the number of vector registers in the vector register group and requires that the first vector register number in the vector register group be a multiple of it. For example, the LMUL of `vint64m8_t` is 8, so the v8-v15 vector register group can be allocated to this type, but v9-v16 cannot because the v9 register number is not a multiple of 8. If LMUL is less than 1, it is treated as 1. If it is a vector mask type, its LMUL is 1.

Each vector tuple type also has an NFIELDS attribute that indicates how many vector register groups the type contains. Thus a vector tuple type needs to take up LMUL×NFIELDS registers.

The rules for passing vector arguments are as follows:

1. For the first vector mask argument, use v0 to pass it. The argument has now been allocated.
2. For vector data arguments or the remaining vector mask arguments, starting from the v8 register, if a vector register group between v8-v23 that has not been allocated can be found and the first register number is a multiple of LMUL, then allocate this vector register group to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.
3. For vector tuple arguments, starting from the v8 register, if NFIELDS consecutive vector register groups between v8-v23 that have not been allocated can be found and the first register number is a multiple of LMUL, then allocate these vector register groups to the argument and mark these registers as allocated. Otherwise, pass it by reference. The argument has now been allocated.

NOTE: It should be stressed that the search for the appropriate vector register groups starts at v8 each time and does not start at the next register after the registers allocated for the previous vector argument. Therefore, it is possible that the vector register number allocated to a vector argument is less than the vector register number allocated to previous vector arguments. For example, for the function `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)`, according to the rules of allocation, v8 will be allocated to `a`, v10-v11 will be allocated to `b` and v9 will be allocated to `c`. This approach allows more vector registers to be allocated to arguments in some cases.

Vector values are returned in the same manner as the first named argument of the same type would be passed.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/389

gcc/ChangeLog:
* config/riscv/riscv-protos.h (builtin_type_p): New function for checking vector type.
* config/riscv/riscv-vector-builtins.cc (builtin_type_p): Ditto.
* config/riscv/riscv.cc (struct riscv_arg_info): New fields.
(riscv_init_cumulative_args): Setup variant_cc field.
(riscv_vector_type_p): New function for checking vector type.
(riscv_hard_regno_nregs): Hoist declare.
(riscv_get_vector_arg): Subroutine of riscv_get_arg_info.
(riscv_get_arg_info): Support vector cc.
(riscv_function_arg_advance): Update cum.
(riscv_pass_by_reference): Handle vector args.
(riscv_v_abi): New function return vector abi.
(riscv_return_value_is_vector_type_p): New function for check vector arguments.
(riscv_arguments_is_vector_type_p): New function for check vector returns. (riscv_fntype_abi): Implement TARGET_FNTYPE_ABI. (TARGET_FNTYPE_ABI): Implement TARGET_FNTYPE_ABI. * config/riscv/riscv.h (GCC_RISCV_H): Define macros for vector abi. (MAX_ARGS_IN_VECTOR_REGISTERS): Ditto. (MAX_ARGS_IN_MASK_REGISTERS): Ditto. (V_ARG_FIRST): Ditto. (V_ARG_LAST): Ditto. (enum riscv_cc): Define all RISCV_CC variants. * config/riscv/riscv.opt: Add --param=riscv-vector-abi. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-call-args-1-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-1.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-2-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-2.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-3-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-3.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-4-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-args-4.c: New test. * gcc.target/riscv/rvv/base/abi-call-error-1.c: New test. * gcc.target/riscv/rvv/base/abi-call-return-run.c: New test. * gcc.target/riscv/rvv/base/abi-call-return.c: New test.
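The allocation rules quoted in the Part-1 commit above can be simulated in a few lines of C. This is a sketch of rules 2 and 3 only, not the code in riscv.cc: vreg_state, alloc_vector_arg, ARG_FIRST and ARG_LAST are hypothetical names, the first mask argument (v0) is not modelled, and the caller is assumed to pass an LMUL already rounded up to 1 for fractional LMUL and mask types.

```c
#include <assert.h>
#include <stdbool.h>

#define ARG_FIRST 8    /* v8, first register usable for vector arguments */
#define ARG_LAST  23   /* v23, last register usable for vector arguments */

/* Hypothetical bookkeeping: which of v0..v31 are already taken. */
typedef struct {
    bool used[32];
} vreg_state;

/* Returns the first register of the allocated run, or -1 when the
   argument has to be passed by reference. nfields is 1 for non-tuples. */
int alloc_vector_arg(vreg_state *st, int lmul, int nfields) {
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    assert(nfields >= 1);
    int need = lmul * nfields;  /* a tuple takes LMUL x NFIELDS registers */

    /* The search restarts at v8 for every argument. */
    for (int first = ARG_FIRST; first + need - 1 <= ARG_LAST; first++) {
        if (first % lmul != 0)
            continue;           /* group must start at a multiple of LMUL */
        bool is_free = true;
        for (int r = first; r < first + need; r++)
            if (st->used[r]) {
                is_free = false;
                break;
            }
        if (!is_free)
            continue;
        for (int r = first; r < first + need; r++)
            st->used[r] = true; /* mark the whole run as allocated */
        return first;
    }
    return -1;
}
```

Calling it for `void foo (vint32m1_t a, vint32m2_t b, vint32m1_t c)` with LMUL values 1, 2, 1 yields v8, v10 and v9, matching the example in the NOTE: the third argument falls back into the gap left by the alignment of `b`.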
2023-09-05RISC-V: Export functions as global extern preparing for dynamic LMUL patch useJuzhe-Zhong1-0/+3
These functions need to be used by the cost model for dynamic LMUL, so they are extracted into a separate patch and committed. gcc/ChangeLog: * config/riscv/riscv-protos.h (lookup_vector_type_attribute): Export global. (get_all_predecessors): New function. (get_all_successors): Ditto. * config/riscv/riscv-v.cc (get_all_predecessors): Ditto. (get_all_successors): Ditto. * config/riscv/riscv-vector-builtins.cc (sizeless_type_p): Export global. * config/riscv/riscv-vsetvl.cc (get_all_predecessors): Remove it.
2023-09-01RISC-V: Adjust expand_cond_len_{unary,binop,op} apiLehua Ding1-2/+2
This patch changes the `rtx_code code` argument of expand_cond_len_{unop,binop} to `unsigned icode` and uses the icode directly to determine whether the rounding_mode operand is required. gcc/ChangeLog: * config/riscv/autovec.md: Adjust. * config/riscv/riscv-protos.h (expand_cond_len_unop): Ditto. (expand_cond_len_binop): Ditto. * config/riscv/riscv-v.cc (needs_fp_rounding): Ditto. (expand_cond_len_op): Ditto. (expand_cond_len_unop): Ditto. (expand_cond_len_binop): Ditto. (expand_cond_len_ternop): Ditto.
2023-08-31RISC-V: Change vsetvl tail and mask policy to default policyLehua Ding1-0/+3
This patch changes the vsetvl policy from a fixed policy to the default policy (returned by get_prefer_mask_policy and get_prefer_tail_policy). These functions currently return the "any" policy, which allows a later change to either agnostic or undisturbed. In the future, users may be able to control the default policy through compiler options, for example to keep it agnostic. gcc/ChangeLog: * config/riscv/riscv-protos.h (IS_AGNOSTIC): Move to here. * config/riscv/riscv-v.cc (gen_no_side_effects_vsetvl_rtx): Change to default policy. * config/riscv/riscv-vector-builtins-bases.cc: Change to default policy. * config/riscv/riscv-vsetvl.h (IS_AGNOSTIC): Delete. * config/riscv/riscv.cc (riscv_print_operand): Use IS_AGNOSTIC to test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: Adjust. * gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: Adjust. * gcc.target/riscv/rvv/vsetvl/vsetvl-24.c: New test.
2023-08-31RISC-V: Refactor and clean emit_{vlmax,nonvlmax}_xxx functionsLehua Ding1-34/+161
This patch refactors the code of the emit_{vlmax,nonvlmax}_xxx functions, which are used to generate RVV insns. There are currently 31 such functions, a few of them duplicates. So many functions are needed because RVV instructions come in many kinds: there are patterns without a mask operand, patterns without a merge operand, patterns that don't need a tail policy operand, and so on. Previously there was the insn_type enum, but its value was only used to indicate how many operands the caller passed in. The rest of the operand information was scattered throughout these functions. For example, emit_vlmax_fp_insn indicates that a rounding mode operand of FRM_DYN should also be passed, and emit_vlmax_merge_insn means that there is no mask operand or mask policy operand. I introduced a new enum insn_flags to indicate the properties of these RVV patterns. These insn_flags are then used to define the insn_type enum. For example, the definition of WIDEN_TERNARY_OP: WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P, These flags mean that the RVV pattern has no merge operand; this combination applies only to vwmacc instructions. After the desired insn_type is defined, all the emit_{vlmax,nonvlmax}_xxx functions are unified into three functions: emit_vlmax_insn (icode, insn_flags, ops); emit_nonvlmax_insn (icode, insn_flags, ops, vl); emit_vlmax_insn_lra (icode, insn_flags, ops, vl); Users can then select the appropriate insn_type and the appropriate emit_xxx function to generate RVV patterns as needed. gcc/ChangeLog: * config/riscv/autovec-opt.md: Adjust. * config/riscv/autovec-vls.md: Ditto. * config/riscv/autovec.md: Ditto. * config/riscv/riscv-protos.h (enum insn_type): Add insn_type. (enum insn_flags): Add insn flags. (emit_vlmax_insn): Adjust. (emit_vlmax_fp_insn): Delete. (emit_vlmax_ternary_insn): Delete. (emit_vlmax_fp_ternary_insn): Delete. (emit_nonvlmax_insn): Adjust.
(emit_vlmax_slide_insn): Delete. (emit_nonvlmax_slide_tu_insn): Delete. (emit_vlmax_merge_insn): Delete. (emit_vlmax_cmp_insn): Delete. (emit_vlmax_cmp_mu_insn): Delete. (emit_vlmax_masked_mu_insn): Delete. (emit_scalar_move_insn): Delete. (emit_nonvlmax_integer_move_insn): Delete. (emit_vlmax_insn_lra): Add. * config/riscv/riscv-v.cc (get_mask_mode_from_insn_flags): New. (emit_vlmax_insn): Adjust. (emit_nonvlmax_insn): Adjust. (emit_vlmax_insn_lra): Add. (emit_vlmax_fp_insn): Delete. (emit_vlmax_ternary_insn): Delete. (emit_vlmax_fp_ternary_insn): Delete. (emit_vlmax_slide_insn): Delete. (emit_nonvlmax_slide_tu_insn): Delete. (emit_nonvlmax_slide_insn): Delete. (emit_vlmax_merge_insn): Delete. (emit_vlmax_cmp_insn): Delete. (emit_vlmax_cmp_mu_insn): Delete. (emit_vlmax_masked_insn): Delete. (emit_nonvlmax_masked_insn): Delete. (emit_vlmax_masked_store_insn): Delete. (emit_nonvlmax_masked_store_insn): Delete. (emit_vlmax_masked_mu_insn): Delete. (emit_vlmax_masked_fp_mu_insn): Delete. (emit_nonvlmax_tu_insn): Delete. (emit_nonvlmax_fp_tu_insn): Delete. (emit_nonvlmax_tumu_insn): Delete. (emit_nonvlmax_fp_tumu_insn): Delete. (emit_scalar_move_insn): Delete. (emit_cpop_insn): Delete. (emit_vlmax_integer_move_insn): Delete. (emit_nonvlmax_integer_move_insn): Delete. (emit_vlmax_gather_insn): Delete. (emit_vlmax_masked_gather_mu_insn): Delete. (emit_vlmax_compress_insn): Delete. (emit_nonvlmax_compress_insn): Delete. (emit_vlmax_reduction_insn): Delete. (emit_vlmax_fp_reduction_insn): Delete. (emit_nonvlmax_fp_reduction_insn): Delete. (expand_vec_series): Adjust. (expand_const_vector): Adjust. (legitimize_move): Adjust. (sew64_scalar_helper): Adjust. (expand_tuple_move): Adjust. (expand_vector_init_insert_elems): Adjust. (expand_vector_init_merge_repeating_sequence): Adjust. (expand_vec_cmp): Adjust. (expand_vec_cmp_float): Adjust. (expand_vec_perm): Adjust. (shuffle_merge_patterns): Adjust. (shuffle_compress_patterns): Adjust. (shuffle_decompress_patterns): Adjust. 
(expand_load_store): Adjust. (expand_cond_len_op): Adjust. (expand_cond_len_unop): Adjust. (expand_cond_len_binop): Adjust. (expand_gather_scatter): Adjust. (expand_cond_len_ternop): Adjust. (expand_reduction): Adjust. (expand_lanes_load_store): Adjust. (expand_fold_extract_last): Adjust. * config/riscv/riscv.cc (vector_zero_call_used_regs): Adjust. * config/riscv/vector.md: Adjust.
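The idea of composing an insn_type out of insn_flags bits, as in the WIDEN_TERNARY_OP definition quoted in this commit message, can be illustrated with a short C sketch. The flag names used in WIDEN_TERNARY_OP come from the message; the bit positions, the `HAS_MERGE_P` flag, and the `has_flag` and `count_prefix_operands` helpers are assumptions made up for this illustration and are not taken from riscv-protos.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions are illustrative assumptions.  */
enum insn_flags
{
  HAS_DEST_P = 1 << 0,
  HAS_MASK_P = 1 << 1,
  USE_ALL_TRUES_MASK_P = 1 << 2,
  HAS_MERGE_P = 1 << 3, /* Hypothetical flag for "pattern has a merge operand".  */
  TDEFAULT_POLICY_P = 1 << 4,
  MDEFAULT_POLICY_P = 1 << 5,
  TERNARY_OP_P = 1 << 6
};

/* An insn_type is just a named combination of flags.  */
enum insn_type
{
  WIDEN_TERNARY_OP = HAS_DEST_P | HAS_MASK_P | USE_ALL_TRUES_MASK_P
                     | TDEFAULT_POLICY_P | MDEFAULT_POLICY_P | TERNARY_OP_P
};

/* Hypothetical helper: test one property of a pattern.  */
static bool
has_flag (unsigned flags, unsigned flag)
{
  return (flags & flag) != 0;
}

/* Hypothetical helper: how many operands (dest, mask, merge) a unified
   emitter would place in front of the source operands.  */
static int
count_prefix_operands (unsigned flags)
{
  /* Assumed sanity rule: an all-trues mask only makes sense when the
     pattern has a mask operand at all.  */
  assert (!has_flag (flags, USE_ALL_TRUES_MASK_P)
          || has_flag (flags, HAS_MASK_P));
  return has_flag (flags, HAS_DEST_P) + has_flag (flags, HAS_MASK_P)
         + has_flag (flags, HAS_MERGE_P);
}
```

With this encoding a single emitter can query the flags instead of needing one function per operand layout; for WIDEN_TERNARY_OP the merge flag is absent, so only the destination and mask operands precede the sources.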
2023-08-30RISC-V: support cm.push cm.pop cm.popret in zcmpFei Gao1-0/+2
Zcmp can share the same stack-allocation logic as save-restore: pre-allocation by cm.push, then step 1 and step 2. Pre-allocation not only saves the callee-saved GPRs but also makes room for the callee-saved FPRs and local variables, if any. Note that cm.push pushes ra, s0-s11 in the reverse order of save-restore, so the .cfi directives have been adapted accordingly in this patch. gcc/ChangeLog: * config/riscv/iterators.md (slot0_offset): slot 0 offset in stack GPRs area in bytes (slot1_offset): slot 1 offset in stack GPRs area in bytes (slot2_offset): likewise (slot3_offset): likewise (slot4_offset): likewise (slot5_offset): likewise (slot6_offset): likewise (slot7_offset): likewise (slot8_offset): likewise (slot9_offset): likewise (slot10_offset): likewise (slot11_offset): likewise (slot12_offset): likewise * config/riscv/predicates.md (stack_push_up_to_ra_operand): predicates of stack adjust pushing ra (stack_push_up_to_s0_operand): predicates of stack adjust pushing ra, s0 (stack_push_up_to_s1_operand): likewise (stack_push_up_to_s2_operand): likewise (stack_push_up_to_s3_operand): likewise (stack_push_up_to_s4_operand): likewise (stack_push_up_to_s5_operand): likewise (stack_push_up_to_s6_operand): likewise (stack_push_up_to_s7_operand): likewise (stack_push_up_to_s8_operand): likewise (stack_push_up_to_s9_operand): likewise (stack_push_up_to_s11_operand): likewise (stack_pop_up_to_ra_operand): predicates of stack adjust popping ra (stack_pop_up_to_s0_operand): predicates of stack adjust popping ra, s0 (stack_pop_up_to_s1_operand): likewise (stack_pop_up_to_s2_operand): likewise (stack_pop_up_to_s3_operand): likewise (stack_pop_up_to_s4_operand): likewise (stack_pop_up_to_s5_operand): likewise (stack_pop_up_to_s6_operand): likewise (stack_pop_up_to_s7_operand): likewise (stack_pop_up_to_s8_operand): likewise (stack_pop_up_to_s9_operand): likewise (stack_pop_up_to_s11_operand): likewise * config/riscv/riscv-protos.h (riscv_zcmp_valid_stack_adj_bytes_p):declaration *
config/riscv/riscv.cc (struct riscv_frame_info): comment change (riscv_avoid_multi_push): helper function of riscv_use_multi_push (riscv_use_multi_push): true if multi push is used (riscv_multi_push_sregs_count): num of sregs in multi-push (riscv_multi_push_regs_count): num of regs in multi-push (riscv_16bytes_align): align to 16 bytes (riscv_stack_align): moved to a better place (riscv_save_libcall_count): no functional change (riscv_compute_frame_info): add zcmp frame info (riscv_for_each_saved_reg): save or restore fprs in specified slot for zcmp (riscv_adjust_multi_push_cfi_prologue): adjust cfi for cm.push (riscv_gen_multi_push_pop_insn): gen function for multi push and pop (get_multi_push_fpr_mask): get mask for the fprs pushed by cm.push (riscv_expand_prologue): allocate stack by cm.push (riscv_adjust_multi_pop_cfi_epilogue): adjust cfi for cm.pop[ret] (riscv_expand_epilogue): allocate stack by cm.pop[ret] (zcmp_base_adj): calculate stack adjustment base size (zcmp_additional_adj): calculate stack adjustment additional size (riscv_zcmp_valid_stack_adj_bytes_p): check if stack adjustment valid * config/riscv/riscv.h (RETURN_ADDR_MASK): mask of ra (S0_MASK): likewise (S1_MASK): likewise (S2_MASK): likewise (S3_MASK): likewise (S4_MASK): likewise (S5_MASK): likewise (S6_MASK): likewise (S7_MASK): likewise (S8_MASK): likewise (S9_MASK): likewise (S10_MASK): likewise (S11_MASK): likewise (MULTI_PUSH_GPR_MASK): GPR_MASK that cm.push can cover at most (ZCMP_MAX_SPIMM): max spimm value (ZCMP_SP_INC_STEP): zcmp sp increment step (ZCMP_INVALID_S0S10_SREGS_COUNTS): num of s0-s10 (ZCMP_S0S11_SREGS_COUNTS): num of s0-s11 (ZCMP_MAX_GRP_SLOTS): max slots of pushing and poping in zcmp (CALLEE_SAVED_FREG_NUMBER): get x of fsx(fs0 ~ fs11) * config/riscv/riscv.md: include zc.md * config/riscv/zc.md: New file. machine description for zcmp gcc/testsuite/ChangeLog: * gcc.target/riscv/rv32e_zcmp.c: New test. * gcc.target/riscv/rv32i_zcmp.c: New test. 
* gcc.target/riscv/zcmp_push_fpr.c: New test. * gcc.target/riscv/zcmp_stack_alignment.c: New test.
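The ChangeLog above mentions a base stack adjustment, an additional adjustment, and a validity check for cm.push/cm.pop. A rough C model of that check is sketched below. It is not the code from riscv.cc: the function signatures and the `word_bytes` parameter are hypothetical, and the values 16 for `ZCMP_SP_INC_STEP` and 3 for `ZCMP_MAX_SPIMM` are assumptions based on the Zcmp rule that the stack adjustment is a 16-byte-aligned base plus a 2-bit spimm field times 16.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, not copied from riscv.h.  */
#define ZCMP_SP_INC_STEP 16
#define ZCMP_MAX_SPIMM 3

/* Base adjustment: space for REGS_NUM saved registers of WORD_BYTES each,
   rounded up to the 16-byte step.  */
static long
zcmp_base_adj (int regs_num, int word_bytes)
{
  assert (regs_num >= 1 && (word_bytes == 4 || word_bytes == 8));
  long bytes = (long) regs_num * word_bytes;
  return (bytes + ZCMP_SP_INC_STEP - 1) / ZCMP_SP_INC_STEP * ZCMP_SP_INC_STEP;
}

/* Additional adjustment: whatever TOTAL needs beyond the base.  */
static long
zcmp_additional_adj (long total, int regs_num, int word_bytes)
{
  return total - zcmp_base_adj (regs_num, word_bytes);
}

/* TOTAL can be allocated by a single cm.push only if the additional part
   is 0, 16, 32 or 48 bytes.  */
static bool
zcmp_valid_stack_adj_bytes_p (long total, int regs_num, int word_bytes)
{
  long extra = zcmp_additional_adj (total, regs_num, word_bytes);
  return extra >= 0 && extra % ZCMP_SP_INC_STEP == 0
         && extra / ZCMP_SP_INC_STEP <= ZCMP_MAX_SPIMM;
}
```

Under these assumptions, pushing ra, s0 and s1 on RV32 has a 16-byte base, so total frame sizes of 16, 32, 48 and 64 bytes fit in one cm.push, while an 80-byte frame would need a separate stack adjustment.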
2023-08-29RISC-V: Refactor and clean expand_cond_len_{unop,binop,ternop}Lehua Ding1-10/+6
This patch refactors the code of expand_cond_len_{unop,binop,ternop}. It introduces a new unified function, expand_cond_len_op, that does the main work; the expand_cond_len_{unop,binop,ternop} functions now only decide how to pass the operands to the intrinsic patterns. gcc/ChangeLog: * config/riscv/autovec.md: Adjust * config/riscv/riscv-protos.h (RVV_VUNDEF): Clean. (get_vlmax_rtx): Exported. * config/riscv/riscv-v.cc (emit_nonvlmax_fp_ternary_tu_insn): Deleted. (emit_vlmax_masked_gather_mu_insn): Adjust. (get_vlmax_rtx): New func. (expand_load_store): Adjust. (expand_cond_len_unop): Call expand_cond_len_op. (expand_cond_len_op): New subroutine. (expand_cond_len_binop): Call expand_cond_len_op. (expand_cond_len_ternop): Call expand_cond_len_op. (expand_lanes_load_store): Adjust.
2023-08-26RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorizationJuzhe-Zhong1-0/+2
Consider this following case: int __attribute__ ((noinline, noclone)) condition_reduction (int *a, int min_v) { int last = 66; /* High start value. */ for (int i = 0; i < 4; i++) if (a[i] < min_v) last = i; return last; } --param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8 condition_reduction: vsetvli a4,zero,e32,m8,ta,ma li a5,32 vmv.v.x v8,a1 vl8re32.v v0,0(a0) vid.v v16 vmslt.vv v0,v0,v8 vsetvli zero,a5,e8,m2,ta,ma vcpop.m a5,v0 beq a5,zero,.L2 addi a5,a5,-1 vsetvli a4,zero,e32,m8,ta,ma vcompress.vm v8,v16,v0 vslidedown.vx v8,v8,a5 vmv.x.s a0,v8 ret .L2: li a0,66 ret --param=riscv-autovec-preference=scalable condition_reduction: csrr a6,vlenb mv a2,a0 li a3,32 li a0,66 srli a6,a6,2 vsetvli a4,zero,e32,m1,ta,ma vmv.v.x v4,a1 vid.v v1 .L4: vsetvli a5,a3,e8,mf4,tu,mu vsetvli zero,a5,e32,m1,ta,ma ----> redundant vsetvl vle32.v v0,0(a2) vsetvli a4,zero,e32,m1,ta,ma slli a1,a5,2 vmv.v.x v2,a6 vmslt.vv v0,v0,v4 sub a3,a3,a5 vmv1r.v v3,v1 vadd.vv v1,v1,v2 vsetvli zero,a5,e8,mf4,ta,ma vcpop.m a5,v0 beq a5,zero,.L3 addi a5,a5,-1 vsetvli a4,zero,e32,m1,ta,ma vcompress.vm v2,v3,v0 vslidedown.vx v2,v2,a5 vmv.x.s a0,v2 .L3: sext.w a0,a0 add a2,a2,a1 bne a3,zero,.L4 ret There is a redundant vsetvli instruction in VLA vectorized codes which is the VSETVL PASS issue. vsetvl issue is not included in this patch but will be fixed soon. gcc/ChangeLog: * config/riscv/autovec.md (len_fold_extract_last_<mode>): New pattern. * config/riscv/riscv-protos.h (enum insn_type): New enum. (expand_fold_extract_last): New function. * config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto. (emit_cpop_insn): Ditto. (emit_nonvlmax_compress_insn): Ditto. (expand_fold_extract_last): Ditto. * config/riscv/vector.md: Fix vcpop.m ratio demand. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test. 
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test. * gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.
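The instruction sequence in the assembly above (vcpop.m, a branch on zero, vcompress.vm, vslidedown.vx, vmv.x.s) can be read as "return the last active element, or the default if the mask is empty". The scalar C model below is a sketch of that reading only; `fold_extract_last` and `MAX_VL` are hypothetical names for this illustration and do not correspond to the expander in riscv-v.cc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_VL 64 /* Arbitrary bound for this model (assumption).  */

/* Scalar model of the vectorized sequence: MASK selects the active
   elements of VALUES; the result is the last active element, or
   DEFAULT_VALUE when no element is active.  */
static int
fold_extract_last (int default_value, const bool *mask, const int *values,
                   size_t vl)
{
  assert (vl <= MAX_VL);

  /* vcpop.m: count the active elements.  */
  size_t count = 0;
  for (size_t i = 0; i < vl; i++)
    if (mask[i])
      count++;

  /* beq ...,zero: keep the default when nothing is active.  */
  if (count == 0)
    return default_value;

  /* vcompress.vm: pack the active elements at the front.  */
  int packed[MAX_VL];
  size_t n = 0;
  for (size_t i = 0; i < vl; i++)
    if (mask[i])
      packed[n++] = values[i];

  /* vslidedown.vx by count - 1, then vmv.x.s reads element 0.  */
  return packed[count - 1];
}
```

For the condition_reduction loop in the message, `values` plays the role of the vid.v index vector and `mask` the result of the `a[i] < min_v` comparison, so the model returns the last index that satisfied the condition, or 66 if none did.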
2023-08-25[PATCH v10] RISC-V: Add support for the Zfa extensionJin Ma1-0/+1
This patch adds the 'Zfa' extension for RISC-V, which is based on: https://github.com/riscv/riscv-isa-manual/commits/zfb The binutils-gdb support for the 'Zfa' extension: https://sourceware.org/pipermail/binutils/2023-April/127060.html Two points need special explanation: 1. According to the RISC-V spec, "The FCVTMOD.W.D instruction was added principally to accelerate the processing of JavaScript Numbers.", so it seems that no implementation is required. 2. The instructions FMINM and FMAXM correspond to the C23 library functions fminimum and fmaximum. Therefore, this patch simply implements the patterns fminm<hf\sf\df>3 and fmaxm<hf\sf\df>3 in preparation for later use. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add zfa extension version, which depends on the F extension. * config/riscv/constraints.md (zfli): Constrain the floating point number that the instructions FLI.H/S/D can load. * config/riscv/iterators.md (ceil): New. * config/riscv/riscv-opts.h (MASK_ZFA): New. (TARGET_ZFA): New. * config/riscv/riscv-protos.h (riscv_float_const_rtx_index_for_fli): New. * config/riscv/riscv.cc (riscv_float_const_rtx_index_for_fli): New. (riscv_cannot_force_const_mem): If instruction FLI.H/S/D can be used, memory is not applicable. (riscv_const_insns): Likewise. (riscv_legitimize_const_move): Likewise. (riscv_split_64bit_move_p): If instruction FLI.H/S/D can be used, no split is required. (riscv_split_doubleword_move): Likewise. (riscv_output_move): Output the mov instructions in zfa extension. (riscv_print_operand): Output the floating-point value of the FLI.H/S/D immediate in assembly. (riscv_secondary_memory_needed): Likewise. * config/riscv/riscv.md (fminm<mode>3): New. (fmaxm<mode>3): New. (movsidf2_low_rv32): New. (movsidf2_high_rv32): New. (movdfsisi3_rv32): New. (f<quiet_pattern>_quiet<ANYF:mode><X:mode>4_zfa): New. * config/riscv/riscv.opt: New. gcc/testsuite/ChangeLog: * gcc.target/riscv/zfa-fleq-fltq.c: New test. * gcc.target/riscv/zfa-fli-zfh.c: New test.
* gcc.target/riscv/zfa-fli.c: New test. * gcc.target/riscv/zfa-fmovh-fmovp.c: New test. * gcc.target/riscv/zfa-fli-1.c: New test. * gcc.target/riscv/zfa-fli-2.c: New test. * gcc.target/riscv/zfa-fli-3.c: New test. * gcc.target/riscv/zfa-fli-4.c: New test. * gcc.target/riscv/zfa-fli-6.c: New test. * gcc.target/riscv/zfa-fli-7.c: New test. * gcc.target/riscv/zfa-fli-8.c: New test. Co-authored-by: Tsukasa OI <research_trasio@irq.a4lg.com>
2023-08-23RISC-V: Add conditional unary neg/abs/not autovec patternsLehua Ding1-2/+5
Hi, This patch add conditional unary neg/abs/not autovec patterns to RISC-V backend. For this C code: void test_3 (float *__restrict a, float *__restrict b, int *__restrict pred, int n) { for (int i = 0; i < n; i += 1) { a[i] = pred[i] ? __builtin_fabsf (b[i]) : a[i]; } } Before this patch: ... vsetvli a7,zero,e32,m1,ta,ma vfabs.v v2,v2 vmerge.vvm v1,v1,v2,v0 ... After this patch: ... vsetvli a7,zero,e32,m1,ta,mu vfabs.v v1,v2,v0.t ... For int neg/not and FP neg patterns, Defining the corresponding cond_xxx paterns is enough. For the FP abs pattern, We need to change the definition of `abs<mode>2` and `@vcond_mask_<mode><vm>` pattern from define_expand to define_insn_and_split in order to fuse them into a new pattern `*cond_abs<mode>` at the combine pass. A fusion process similar to the one below: (insn 30 29 31 4 (set (reg:RVVM1SF 152 [ vect_iftmp.15 ]) (abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ]))) "float.c":15:56 discrim 1 12799 {absrvvm1sf2} (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ]) (nil))) (insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ]) (if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ]) (reg:RVVM1SF 152 [ vect_iftmp.15 ]) (reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 12707 {vcond_mask_rvvm1sfrvvmf32bi} (expr_list:REG_DEAD (reg:RVVM1SF 152 [ vect_iftmp.15 ]) (expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ]) (expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ]) (nil))))) ==> (insn 31 30 32 4 (set (reg:RVVM1SF 140 [ vect_iftmp.19 ]) (if_then_else:RVVM1SF (reg:RVVMF32BI 136 [ mask__27.11 ]) (abs:RVVM1SF (reg:RVVM1SF 137 [ vect__6.14 ])) (reg:RVVM1SF 139 [ vect_iftmp.18 ]))) 13444 {*cond_absrvvm1sf} (expr_list:REG_DEAD (reg:RVVM1SF 137 [ vect__6.14 ]) (expr_list:REG_DEAD (reg:RVVMF32BI 136 [ mask__27.11 ]) (expr_list:REG_DEAD (reg:RVVM1SF 139 [ vect_iftmp.18 ]) (nil))))) Best, Lehua gcc/ChangeLog: * config/riscv/autovec-opt.md (*cond_abs<mode>): New combine pattern. (*copysign<mode>_neg): Ditto. 
* config/riscv/autovec.md (@vcond_mask_<mode><vm>): Adjust. (<optab><mode>2): Ditto. (cond_<optab><mode>): New. (cond_len_<optab><mode>): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New. (expand_cond_len_unop): New helper func. * config/riscv/riscv-v.cc (shuffle_merge_patterns): Adjust. (expand_cond_len_unop): New helper func. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_unary_run-8.c: New test.
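The before/after assembly in this commit message differs in how the condition is applied: an unconditional vfabs.v followed by vmerge.vvm, versus a single masked vfabs.v under a mask-undisturbed policy. The scalar C model below sketches why the two forms give the same result. The function names and `MAX_N` are hypothetical, and `__builtin_fabsf` is used as in the test case quoted in the message, so a GCC-compatible compiler is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64 /* Arbitrary bound for this model (assumption).  */

/* Model of the code before the patch: compute fabs for every element,
   then select per element between the result and the old value.  */
static void
abs_then_merge (float *a, const float *b, const int *pred, size_t n)
{
  assert (n <= MAX_N);
  float tmp[MAX_N];
  for (size_t i = 0; i < n; i++)
    tmp[i] = __builtin_fabsf (b[i]);   /* vfabs.v v2,v2 */
  for (size_t i = 0; i < n; i++)
    a[i] = pred[i] ? tmp[i] : a[i];    /* vmerge.vvm v1,v1,v2,v0 */
}

/* Model of the code after the patch: one masked operation; inactive
   elements of the destination keep their old value (mask undisturbed).  */
static void
masked_abs (float *a, const float *b, const int *pred, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (pred[i])
      a[i] = __builtin_fabsf (b[i]);   /* vfabs.v v1,v2,v0.t with mu */
}
```

Both functions leave `a[i]` untouched where `pred[i]` is zero, which is the property the `*cond_abs<mode>` combine pattern relies on when it fuses the two insns into one.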
2023-08-16RISC-V: Support MASK_LEN_{LOAD_LANES,STORE_LANES}Juzhe-Zhong1-0/+1
This patch allow us auto-vectorize this following case: void __attribute__ ((noinline, noclone)) \ NAME##_8 (OUTTYPE *__restrict dest, INTYPE *__restrict src, \ MASKTYPE *__restrict cond, intptr_t n) \ { \ for (intptr_t i = 0; i < n; ++i) \ if (cond[i]) \ dest[i] = (src[i * 8] + src[i * 8 + 1] + src[i * 8 + 2] \ + src[i * 8 + 3] + src[i * 8 + 4] + src[i * 8 + 5] \ + src[i * 8 + 6] + src[i * 8 + 7]); \ } TEST_LOOP (NAME##_f32, OUTTYPE, INTYPE, int32_t) \ TEST2 (NAME##_i32, OUTTYPE, int32_t) \ TEST1 (NAME##_i32, int32_t) \ TEST (test) ASM: test_i32_i32_f32_8: ble a3,zero,.L5 .L3: vsetvli a4,a3,e8,mf4,ta,ma vle32.v v0,0(a2) vsetvli a5,zero,e32,m1,ta,ma vmsne.vi v0,v0,0 vsetvli zero,a4,e32,m1,ta,ma vlseg8e32.v v8,(a1),v0.t vsetvli a5,zero,e32,m1,ta,ma slli a6,a4,2 vadd.vv v1,v9,v8 slli a7,a4,5 vadd.vv v1,v1,v10 sub a3,a3,a4 vadd.vv v1,v1,v11 vadd.vv v1,v1,v12 vadd.vv v1,v1,v13 vadd.vv v1,v1,v14 vadd.vv v1,v1,v15 vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a0),v0.t add a2,a2,a6 add a1,a1,a7 add a0,a0,a6 bne a3,zero,.L3 .L5: ret gcc/ChangeLog: * config/riscv/autovec.md (vec_mask_len_load_lanes<mode><vsingle>): New pattern. (vec_mask_len_store_lanes<mode><vsingle>): Ditto. * config/riscv/riscv-protos.h (expand_lanes_load_store): New function. * config/riscv/riscv-v.cc (get_mask_mode): Add tuple mask mode. (expand_lanes_load_store): New function. * config/riscv/vector-iterators.md: New iterator. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Adapt test. * gcc.target/riscv/rvv/autovec/partial/slp-1.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-16.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-17.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-2.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-3.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-4.c: Ditto. 
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-6.c: Ditto. * gcc.target/riscv/rvv/rvv.exp: Add lanes tests. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-1.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-2.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-3.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-4.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-5.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-6.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load-7.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-1.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-2.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-3.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-4.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-5.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-6.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_load_run-7.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-1.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-2.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-3.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-4.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-5.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-6.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store-7.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-1.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-2.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-3.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-4.c: New test. 
* gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-5.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-6.c: New test. * gcc.target/riscv/rvv/autovec/struct/mask_struct_store_run-7.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-1.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-10.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-11.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-12.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-13.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-14.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-15.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-16.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-17.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-18.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-2.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-3.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-4.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-5.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-6.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-7.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-8.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect-9.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-1.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-10.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-11.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-12.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-13.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-14.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-15.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-16.c: New test. 
* gcc.target/riscv/rvv/autovec/struct/struct_vect_run-17.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-18.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-2.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-3.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-4.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-5.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-6.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-7.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-8.c: New test. * gcc.target/riscv/rvv/autovec/struct/struct_vect_run-9.c: New test.
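The loop in this commit message is vectorized with a masked segment load (vlseg8e32.v with v0.t), which de-interleaves eight consecutive fields into eight vector registers for the active elements only. The scalar C model below is a sketch of that behaviour under stated assumptions: `sum8_lanes`, `NFIELDS` and the `VLMAX` of 4 are hypothetical choices for this illustration, and all three types are fixed to int32_t for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NFIELDS 8 /* Fields per element group, as in vlseg8e32.v.  */
#define VLMAX 4   /* Elements per vector in this model (assumption).  */

/* Scalar model of the strip-mined loop: dest[i] becomes the sum of the
   eight fields of group i when cond[i] is nonzero, and is left unchanged
   otherwise.  */
static void
sum8_lanes (int32_t *dest, const int32_t *src, const int32_t *cond, size_t n)
{
  while (n > 0)
    {
      /* vsetvli: process at most VLMAX elements per iteration.  */
      size_t vl = n < VLMAX ? n : VLMAX;
      assert (vl >= 1 && vl <= VLMAX);

      /* Masked segment load: lanes[f][i] receives field f of group i,
         but only for active elements.  */
      int32_t lanes[NFIELDS][VLMAX] = {{0}};
      for (size_t i = 0; i < vl; i++)
        if (cond[i] != 0)
          for (size_t f = 0; f < NFIELDS; f++)
            lanes[f][i] = src[i * NFIELDS + f];

      /* vadd.vv chain followed by the masked store.  */
      for (size_t i = 0; i < vl; i++)
        if (cond[i] != 0)
          {
            int32_t sum = 0;
            for (size_t f = 0; f < NFIELDS; f++)
              sum += lanes[f][i];
            dest[i] = sum;
          }

      dest += vl;
      cond += vl;
      src += vl * NFIELDS;
      n -= vl;
    }
}
```

The pointer updates at the end of each iteration mirror the slli/add instructions in the assembly: `src` advances by eight fields per processed element, while `dest` and `cond` advance by one element each.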
2023-08-10RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsicPan Li1-0/+4
The frm_mode attr makes the following assumptions about each define_insn: 1. The define_insn has at least 9 operands. 2. operands[9] must be the frm reg. 3. operands[9] must be a const int. In fact, the frm operand can be operands[8], operands[9] or operands[10], and not every define_insn has an frm operand. This patch refactors frm to eliminate these assumptions and to unblock support for the underlying rounding mode intrinsic API. After the refactor, the default frm is none and the selected insn type is dyn. For floating-point insns that honor the frm, the frm_mode attr is set explicitly in the define_insn. Signed-off-by: Pan Li <pan2.li@intel.com> Co-Authored-by: Kito Cheng <kito.cheng@sifive.com> gcc/ChangeLog: * config/riscv/riscv-protos.h (enum floating_point_rounding_mode): Add NONE, DYN_EXIT and DYN_CALL. (get_frm_mode): New declaration. * config/riscv/riscv-v.cc (get_frm_mode): New function to get frm mode. * config/riscv/riscv-vector-builtins.cc (function_expander::use_ternop_insn): Take care of frm reg. * config/riscv/riscv.cc (riscv_static_frm_mode_p): Migrate to FRM_XXX. (riscv_emit_frm_mode_set): Ditto. (riscv_emit_mode_set): Ditto. (riscv_frm_adjust_mode_after_call): Ditto. (riscv_frm_mode_needed): Ditto. (riscv_frm_mode_after): Ditto. (riscv_mode_entry): Ditto. (riscv_mode_exit): Ditto. * config/riscv/riscv.h (NUM_MODES_FOR_MODE_SWITCHING): Ditto. * config/riscv/vector.md (rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none): Removed (symbol_ref): * config/riscv/vector.md: Set frm_mode attr explicitly.
2023-08-08RISC-V: Support CALL conditional autovec patternsJuzhe-Zhong1-0/+5
This patch is depending on middle-end patch on vectorizable_call. Consider this following case: void foo (float * __restrict a, float * __restrict b, int * __restrict cond, int n) { for (int i = 0; i < n; i++) if (cond[i]) a[i] = b[i] + a[i]; } Before this patch (**NO** -ffast-math): <source>:5:21: missed: couldn't vectorize loop <source>:5:21: missed: not vectorized: control flow in loop. After this patch: foo: ble a3,zero,.L5 mv a6,a0 .L3: vsetvli a5,a3,e8,mf4,ta,ma vle32.v v0,0(a2) vsetvli a7,zero,e32,m1,ta,ma slli a4,a5,2 vmsne.vi v0,v0,0 sub a3,a3,a5 vsetvli zero,a5,e32,m1,tu,mu ------> must be TUMU vle32.v v2,0(a0),v0.t vle32.v v1,0(a1),v0.t vfadd.vv v1,v1,v2,v0.t ------> generated by COND_LEN_ADD with real mask and len. vse32.v v1,0(a6),v0.t add a2,a2,a4 add a1,a1,a4 add a0,a0,a4 add a6,a6,a4 bne a3,zero,.L3 .L5: ret gcc/ChangeLog: * config/riscv/autovec.md (cond_<optab><mode>): New pattern. (cond_len_<optab><mode>): Ditto. (cond_fma<mode>): Ditto. (cond_len_fma<mode>): Ditto. (cond_fnma<mode>): Ditto. (cond_len_fnma<mode>): Ditto. (cond_fms<mode>): Ditto. (cond_len_fms<mode>): Ditto. (cond_fnms<mode>): Ditto. (cond_len_fnms<mode>): Ditto. * config/riscv/riscv-protos.h (riscv_get_v_regno_alignment): Export global. (enum insn_type): Add new enum type. (prepare_ternary_operands): New function. * config/riscv/riscv-v.cc (emit_vlmax_masked_fp_mu_insn): Ditto. (emit_nonvlmax_tumu_insn): Ditto. (emit_nonvlmax_fp_tumu_insn): Ditto. (expand_cond_len_binop): Add condtional operations. (expand_cond_len_ternop): Ditto. (prepare_ternary_operands): New function. * config/riscv/riscv.cc (riscv_memmodel_needs_amo_release): Export riscv_get_v_regno_alignment as global scope. * config/riscv/vector.md: Fix ternary bugs. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Add condition tests. * gcc.target/riscv/rvv/autovec/cond/cond_arith-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-2.c: New test. 
* gcc.target/riscv/rvv/autovec/cond/cond_arith-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith-9.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_arith_run-9.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fadd_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-7.c: New test. 
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma_run-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmax_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmin_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: New test. 
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms_run-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_fmul_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_logical_run-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-5.c: New test. 
* gcc.target/riscv/rvv/autovec/cond/cond_shift-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift-9.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-1.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-2.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-3.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-4.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-5.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-6.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-7.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-8.c: New test. * gcc.target/riscv/rvv/autovec/cond/cond_shift_run-9.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-1.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-3.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: New test. * gcc.target/riscv/rvv/autovec/reduc/reduc_call-5.c: New test.
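For readers unfamiliar with the "real mask and len" remark on the vfadd.vv line above, here is a minimal scalar sketch of what one masked, length-limited add does to a single vector's worth of data. The function name cond_len_add_model and its parameters are hypothetical and exist only for this illustration; this is not GCC code. It assumes the TUMU policy noted in the assembly: elements whose mask bit is clear and elements at or beyond len keep their old values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a masked, length-limited add.
// `a` plays the role of both an input and the destination, as in
// the loop body a[i] = b[i] + a[i] from the commit message.
void cond_len_add_model(std::vector<float>& a,
                        const std::vector<float>& b,
                        const std::vector<int>& mask,
                        std::size_t len)
{
  // The active length may not exceed any operand's size.
  assert(len <= a.size() && len <= b.size() && len <= mask.size());

  for (std::size_t i = 0; i < len; ++i) {
    // Mask undisturbed: inactive elements keep their old value.
    if (mask[i] != 0)
      a[i] = b[i] + a[i];
  }
  // Tail undisturbed: elements with index >= len are not written.
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, mask = {1, 0, 1, 1} and len = 3, only elements 0 and 2 are updated: element 1 is masked off and element 3 lies in the tail.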
2023-08-07 [committed] [RISC-V] Handle more cases in riscv_expand_conditional_move Raphael Zinsly 1 -1/+1
As I've mentioned in the main zicond thread, Ventana has had patches that support more cases by first emitting a suitable scc instruction, essentially as a canonicalization step of the condition for zicond. For example, if we have (set (target) (if_then_else (op (reg1) (reg2)) (true_value) (false_value))) the two-register comparison isn't handled by zicond directly. But we can generate something like this instead: (set (temp) (op (reg1) (reg2))) (set (target) (if_then_else (op (temp) (const_int 0)) (true_value) (false_value))) Then let the remaining code from Xiao handle the true_value/false_value to make sure it's zicond compatible. This is primarily Raphael's work. My involvement has been mostly to move it from its original location (in the .md file) into the expander function and fix minor problems with the FP case. gcc/ * config/riscv/riscv.cc (riscv_expand_int_scc): Add invert_ptr as an argument and pass it to riscv_emit_int_order_test. (riscv_expand_conditional_move): Handle cases where the condition is not EQ/NE or the second argument to the conditional is not (const_int 0). * config/riscv/riscv-protos.h (riscv_expand_int_scc): Update prototype. Co-authored-by: Jeff Law <jlaw@ventanamicro.com>
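The canonicalization described above can be illustrated with a small scalar sketch. It models the two Zicond instructions, czero.eqz and czero.nez, as plain functions and shows a less-than select done in two steps: first the comparison is reduced to a 0/1 value (the scc step), then the select tests that value against zero. The function names are hypothetical, and combining the two czero results with an OR is just one possible way to form the final value; it is not claimed to be the sequence the expander emits.

```cpp
#include <cassert>
#include <cstdint>

// czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1
inline std::int64_t czero_eqz(std::int64_t value, std::int64_t cond)
{
  return cond == 0 ? 0 : value;
}

// czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1
inline std::int64_t czero_nez(std::int64_t value, std::int64_t cond)
{
  return cond != 0 ? 0 : value;
}

// Hypothetical model of: target = (reg1 < reg2) ? true_value : false_value
std::int64_t select_lt(std::int64_t reg1, std::int64_t reg2,
                       std::int64_t true_value, std::int64_t false_value)
{
  // Step 1 (scc): materialize the two-register comparison as 0 or 1.
  std::int64_t temp = reg1 < reg2 ? 1 : 0;
  assert(temp == 0 || temp == 1);

  // Step 2: the condition is now "temp != 0", which the czero
  // instructions can test directly. Exactly one operand survives.
  return czero_eqz(true_value, temp) | czero_nez(false_value, temp);
}
```

For instance, select_lt(1, 2, 10, 20) yields 10 and select_lt(2, 1, 10, 20) yields 20.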
2023-07-31 RISC-V: Return machine_mode rather than opt_machine_mode for get_mask_mode, NFC Kito Cheng 1 -1/+1
We always want get_mask_mode to return a valid mode; something is wrong if it fails, so I think we can just move the `.require ()` into get_mask_mode instead of calling it at every call site. The only exception is riscv_get_mask_mode, which might pass a mode that is not a valid vector mode into get_mask_mode, so a check with riscv_v_ext_mode_p is added to make sure that only valid vector modes reach get_mask_mode. gcc/ChangeLog: * config/riscv/autovec.md (abs<mode>2): Remove `.require ()`. * config/riscv/riscv-protos.h (get_mask_mode): Update return type. * config/riscv/riscv-v.cc (rvv_builder::rvv_builder): Remove `.require ()`. (emit_vlmax_insn): Ditto. (emit_vlmax_fp_insn): Ditto. (emit_vlmax_ternary_insn): Ditto. (emit_vlmax_fp_ternary_insn): Ditto. (emit_nonvlmax_fp_ternary_tu_insn): Ditto. (emit_nonvlmax_insn): Ditto. (emit_vlmax_slide_insn): Ditto. (emit_nonvlmax_slide_tu_insn): Ditto. (emit_vlmax_merge_insn): Ditto. (emit_vlmax_masked_insn): Ditto. (emit_nonvlmax_masked_insn): Ditto. (emit_vlmax_masked_store_insn): Ditto. (emit_nonvlmax_masked_store_insn): Ditto. (emit_vlmax_masked_mu_insn): Ditto. (emit_nonvlmax_tu_insn): Ditto. (emit_nonvlmax_fp_tu_insn): Ditto. (emit_scalar_move_insn): Ditto. (emit_vlmax_compress_insn): Ditto. (emit_vlmax_reduction_insn): Ditto. (emit_vlmax_fp_reduction_insn): Ditto. (emit_nonvlmax_fp_reduction_insn): Ditto. (expand_vec_series): Ditto. (expand_vector_init_merge_repeating_sequence): Ditto. (expand_vec_perm): Ditto. (shuffle_merge_patterns): Ditto. (shuffle_compress_patterns): Ditto. (shuffle_decompress_patterns): Ditto. (expand_reduction): Ditto. (get_mask_mode): Update return type. * config/riscv/riscv.cc (riscv_get_mask_mode): Check vector type is valid, and use new get_mask_mode interface.
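The interface change can be sketched outside of GCC roughly as follows. Everything here is a hypothetical stand-in: std::optional plays the role of opt_machine_mode, the assertion plays the role of `.require ()`, and the Mode enumerators, find_mask_mode, get_mask_mode_model, is_vector_mode and hook_mask_mode are invented for the illustration. The point is only the shape: the unwrap happens once inside the getter, and the one caller that may see arbitrary modes checks validity before calling it.

```cpp
#include <cassert>
#include <optional>

// Hypothetical modes; not GCC's real machine_mode enumerators.
enum class Mode { ScalarInt, VecInt32, MaskBool };

// Stand-in for a validity predicate in the spirit of riscv_v_ext_mode_p.
bool is_vector_mode(Mode m) { return m == Mode::VecInt32; }

// Old shape: an optional result that every caller had to unwrap.
std::optional<Mode> find_mask_mode(Mode m)
{
  if (is_vector_mode(m))
    return Mode::MaskBool;
  return std::nullopt;
}

// New shape: unwrap once here, so call sites receive a plain Mode.
Mode get_mask_mode_model(Mode m)
{
  std::optional<Mode> mask = find_mask_mode(m);
  assert(mask.has_value() && "caller passed a mode without a mask mode");
  return *mask;
}

// The one caller that can see arbitrary modes guards before calling.
std::optional<Mode> hook_mask_mode(Mode m)
{
  if (!is_vector_mode(m))
    return std::nullopt;
  return get_mask_mode_model(m);
}
```

Internal call sites then use get_mask_mode_model directly, while hook_mask_mode(Mode::ScalarInt) returns an empty optional instead of tripping the assertion.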