aboutsummaryrefslogtreecommitdiff
path: root/gcc/config/riscv
AgeCommit message (Collapse)AuthorFilesLines
4 days[PATCH] RISC-V: Fix FIXED_REGISTERS comment missing return address registerYixuan Chen1-1/+1
gcc/ChangeLog: * config/riscv/riscv.h: Fix FIXED_REGISTERS comment missing return address register.
5 daysRISC-V: Add more vector-vector extract cases.Robin Dapp2-0/+212
This adds a V16SI -> V4SI and related i.e. "quartering" vector-vector extract expander for VLS modes. It helps with spills in x264 that may cause a load-hit-store. gcc/ChangeLog: * config/riscv/autovec.md (vec_extract<mode><vls_quarter>): Add quarter vec-vec extract. * config/riscv/vector-iterators.md: New iterators.
10 days[PATCH v3] RISC-V: Fixed incorrect semantic description in DF to DI pattern ↵Jin Ma1-7/+9
in the Zfa extension on rv32. gcc/ChangeLog: * config/riscv/riscv.md: Change "truncate" to unspec for the Zfa extension on rv32. gcc/testsuite/ChangeLog: * gcc.target/riscv/zfa-fmovh-fmovp-bug.c: New test.
10 days[PATCH 1/2] RISC-V: Fix the outer_code when calculating the cost of SET ↵Xianmiao Qu1-1/+1
expression. I think it is a typo. When calculating the 'SET_SRC (x)' cost, outer_code should be set to SET. gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Fix the outer_code when calculating the cost of SET expression.
10 days[PATCH] RISC-V: Fix th.extu operands exceeding range on rv32.Xianmiao Qu1-1/+3
The Combine Pass may generate zero_extract instructions that are out of range. Drawing from other architectures like AArch64, we should impose restrictions on the "*th_extu<mode>4" pattern. gcc/ * config/riscv/thead.md (*th_extu<mode>4): Fix th.extu operands exceeding range on rv32. gcc/testsuite/ * gcc.target/riscv/xtheadbb-extu-4.c: New.
10 days[PATCH] RISC-V: Allow zero operand for DI variants of vssubu.vxBohan Lei1-4/+4
The RISC-V vector machine description relies on the helper function `sew64_scalar_helper` to emit actual insns for the DI variants of vssub.vx and vssubu.vx. This works with vssub.vx, but can cause problems with vssubu.vx with the scalar operand being constant zero, because `has_vi_variant_p` returns false, and the operand will be taken without being loaded into a reg. The attached testcases can cause an internal compiler error as a result. Allowing a constant zero operand in those insns seems to be a simple solution that only affects minimum existing code. gcc/ChangeLog: * config/riscv/vector.md: Allow zero operand for DI variants of vssubu.vx gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/vssubu-1.c: New test. * gcc.target/riscv/rvv/base/vssubu-2.c: New test.
11 daysRISC-V: Implement SAT_ADD for signed integer vectorPan Li3-0/+21
This patch would like to implement the ssadd for vector integer. Aka form 1 of ssadd vector. Form 1: #define DEF_VEC_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_add_##T##_fmt_1 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T sum = (UT)x + (UT)y; \ out[i] = (x ^ y) < 0 \ ? sum \ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } \ } DEF_VEC_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) Before this patch: vec_sat_s_add_int64_t_fmt_1: ... vsetvli t1,zero,e64,m1,ta,mu vadd.vv v3,v1,v2 vxor.vv v0,v1,v3 vmslt.vi v0,v0,0 vxor.vv v2,v1,v2 vmsge.vi v2,v2,0 vmand.mm v0,v0,v2 vsra.vx v1,v1,t3 vxor.vv v3,v1,v4,v0.t ... After this patch: vec_sat_s_add_int64_t_fmt_1: ... vsetvli a6,zero,e64,m1,ta,ma vsadd.vv v1,v1,v2 ... The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec.md (ssadd<mode>3): Add new pattern for signed integer vector SAT_ADD. * config/riscv/riscv-protos.h (expand_vec_ssadd): Add new func decl for vector ssadd expanding. * config/riscv/riscv-v.cc (expand_vec_ssadd): Add new func impl to expand vector ssadd pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: Add test data for vector ssadd. * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-2.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-3.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-4.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-2.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-3.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-4.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
13 daysriscv: Fix duplicate assmbler label in @tlsdesc<mode> insnAndreas Schwab2-11/+8
Use %= instead of maintaining a sequence number manually, so that it doesn't result in a duplicate assembler label when the insn is duplicated. PR target/116693 * config/riscv/riscv.cc (riscv_legitimize_tls_address): Don't pass seqno to gen_tlsdesc and remove it. * config/riscv/riscv.md (@tlsdesc<mode>): Remove operand 1. Use %= instead of %1 in template.
2024-09-12RISC-V: Eliminate latter vsetvl when fusedBohan Lei1-0/+3
Hi all, A simple assembly check has been added in this version. Previous version: https://gcc.gnu.org/pipermail/gcc-patches/2024-September/662783.html Thanks, Bohan ------ The current vsetvl pass eliminates a vsetvl instruction when the previous info is "available," but does not when "compatible." This can lead to not only redundancy, but also incorrect behaviors when the previous info happens to be compatible with a later vector instruction, which ends of using the vsetvl info that should have been eliminated, as is shown in the testcase. This patch eliminates the vsetvl when the previous info is "compatible." gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pre_vsetvl::fuse_local_vsetvl_info): Delete vsetvl insn when `prev_info` is compatible gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/vsetvl_bug-4.c: New test.
2024-09-12RISC-V: Fix vl_used_by_non_rvv_insn logic of vsetvl passgarthlei1-5/+11
This patch fixes a bug in the current vsetvl pass. The current pass uses `m_vl` to determine whether the dest operand has been used by non-RVV instructions. However, `m_vl` may have been modified as a result of an `update_avl` call, and thus would be no longer the dest operand of the original instruction. This can lead to incorrect vsetvl eliminations, as is shown in the testcase. In this patch, we create a `dest_vl` variable for this scenerio. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Use `dest_vl` for dest VL operand gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/vsetvl_bug-3.c: New test.
2024-09-07[PATCH] RISC-V: Add missing insn types for XiangShan Nanhu scheduler modelZhao Dingyi1-3/+8
This patch aims to add the missing instruction types to the XiangShan-Nanhu scheduler model. The current XiangShan -Nanhu model lacks the trap, atomic trap, fcvt_i2f, and fcvt_f2i instructions. The trap, atomic, and i2f instructions belong to xs_jmp_rs. [1] The f2i instruction belongs to xs_fmisc_rs.[2] [1] https://github.com/OpenXiangShan/XiangShan/blob/v2.0/src/main/scala/xiangshan/package.scala#L780 [2] https://github.com/OpenXiangShan/XiangShan/blob/v2.0/src/main/scala/xiangshan/backend/decode/DecodeUnit.scala#L290 gcc/ChangeLog: * config/riscv/xiangshan.md: Add atomic, trap, fcvt_i2f, fcvt_f2i.
2024-09-07[PATCH v4] [target/116592] RISC-V: Fix illegal operands "th.vsetvli ↵Jin Ma1-2/+2
zero,0,e32,m8" for XTheadVector Since the THeadVector vsetvli does not support vl as an immediate, we need to convert 0 to zero when outputting asm. PR target/116592 gcc/ChangeLog: * config/riscv/thead.cc (th_asm_output_opcode): Change '0' to "zero" gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/xtheadvector/pr116592.c: New test.
2024-09-05[PATCH 2/2 v2] RISC-V: Constant synthesis of inverted halvesRaphael Moreira Zinsly1-0/+30
Changes since v1: - Fix synthesis-15.c. -- >8 -- Improve handling of constants where the high half can be constructed by inverting the lower half. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_build_integer): Detect constants were the higher half is the lower half inverted. gcc/testsuite/ChangeLog: * gcc.target/riscv/synthesis-15.c: New test.
2024-09-05[PATCH 1/2 v2] RISC-V: Additional large constant synthesis improvementsRaphael Moreira Zinsly1-6/+132
Changes since v1: - Fix bit31. - Remove negative shift checks. - Fix synthesis-7.c expected output. -- >8 -- Improve handling of large constants in riscv_build_integer, generate better code for constants where the high half can be constructed by shifting/shiftNadding the low half or if the halves differ by less than 2k. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_build_integer): Detect new case of constants that can be improved. (riscv_move_integer): Add synthesys for concatening constants without Zbkb. gcc/testsuite/ChangeLog: * gcc.target/riscv/synthesis-7.c: Adjust expected output. * gcc.target/riscv/synthesis-12.c: New test. * gcc.target/riscv/synthesis-13.c: New test. * gcc.target/riscv/synthesis-14.c: New test.
2024-09-05[V2][RISC-V] Avoid unnecessary extensions after sCC insnsJeff Law1-5/+41
So the first patch failed the pre-commit CI; it didn't fail in my testing because I'm using --with-arch to set a default configuration that includes things like zicond to ensure that's always tested. And the failing test is skipped when zicond is enabled by default. The failing test is designed to ensure that we don't miss an if-conversion due to costing issues around the extension that was typically done in an sCC sequence (which is why it's only run when zicond is off). The test failed because we have a little routine that is highly dependent on the code generated by the sCC expander and will adjust the costing to account for expansion quirks that usually go away in register allocation. That code needs to be enhanced to work after the sCC expansion change. Essentially it needs to account for the subreg extraction that shows up in the sequence as well as being a bit looser on mode checking. I kept the code working for the old sequences -- in theory a user could conjure up the old sequence so handling them seems useful. This also drops the testsuite changes. Palmer's change makes them unnecessary. --- So I was looking at a performance regression in spec with Ventana's internal tree. Ultimately the problem was a bad interaction with an internal patch (REP_MODE_EXTENDED), fwprop and ext-dce. The details of that problem aren't particularly important. Removal of the local patch went reasonably well. But I did see some secondary cases where we had redundant sign extensions. The most notable cases come from the integer sCC insns. Expansion of those cases for rv64 can be improved using Jivan's trick. ie, if the target is not DImode, then create a DImode temporary for the result and copy the low bits out with a promoted subreg to the real target. With the change in expansion the final code we generate is slightly different for a few tests at -O1/-Og, but should perform the same. The key for the affected tests is we're not seeing the introduction of unnecessary extensions. Rather than adjust the regexps to handle the -O1/-Og output, skipping for those seemed OK to me. I didn't extract a testcase. I'm a bit fried from digging through LTO'd code right now. gcc/ * config/riscv/riscv.cc (riscv_expand_int_scc): For rv64, use a DI temporary for the output and a promoted subreg to extract it into SI arget. (riscv_noce_conversion_profitable_p): Recognize new output from sCC expansion too.
2024-09-04[PATCH 1/3] RISC-V: Improve codegen for negative repeating large constantsRaphael Moreira Zinsly1-8/+21
Improve handling of constants where its upper and lower 32-bit halves are the same and have negative values. e.g. for: unsigned long f (void) { return 0xf0f0f0f0f0f0f0f0UL; } Without the patch: li a0,-252645376 addi a0,a0,240 li a5,-252645376 addi a5,a5,241 slli a5,a5,32 add a0,a5,a0 With the patch: li a5,252645376 addi a5,a5,-241 slli a0,a5,32 add a0,a0,a5 xori a0,a0,-1 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_split_integer_cost): Adjust the cost of negative repeating constants. (riscv_split_integer): Handle negative repeating constants. gcc/testsuite/ChangeLog: * gcc.target/riscv/synthesis-11.c: New test.
2024-09-04RISC-V: Allow IMM operand for unsigned scalar .SAT_ADDPan Li2-3/+3
This patch would like to allow the IMM operand of the unsigned scalar .SAT_ADD. Like the operand 0, the operand 1 of .SAT_ADD will be zero extended to Xmode before underlying code generation. The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_usadd): Zero extend the second operand of usadd as the first operand does. * config/riscv/riscv.md (usadd<m>3): Allow imm operand for scalar usadd pattern. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_add-11.c: Make asm check robust. * gcc.target/riscv/sat_u_add-15.c: Ditto. * gcc.target/riscv/sat_u_add-19.c: Ditto. * gcc.target/riscv/sat_u_add-23.c: Ditto. * gcc.target/riscv/sat_u_add-3.c: Ditto. * gcc.target/riscv/sat_u_add-7.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-03[PR target/115921] Improve reassociation for rv64Jeff Law1-4/+6
As Jovan pointed out in pr115921, we're not reassociating expressions like this on rv64: (x & 0x3e) << 12 It generates something like this: li a5,258048 slli a0,a0,12 and a0,a0,a5 We have a pattern that's designed to clean this up. Essentially reassociating the operations so that we don't need to load the constant resulting in something like this: andi a0,a0,63 slli a0,a0,12 That pattern wasn't working for certain constants due to its condition. The condition is trying to avoid cases where this kind of reassociation would hinder shadd generation on rv64. That condition was just written poorly. This patch tightens up that condition in a few ways. First, there's no need to worry about shadd cases if ZBA is not enabled. Second we can't use shadd if the shift value isn't 1, 2 or 3. Finally rather than open-coding one of the tests, we can use an existing operand predicate. The net is we'll start performing this transformation in more cases on rv64 while still avoiding reassociation if it would spoil shadd generation. PR target/115921 gcc/ * config/riscv/riscv.md (reassociate bitwise ops): Tighten test for cases we do not want reassociate. gcc/testsuite/ * gcc.target/riscv/pr115921.c: New test.
2024-09-03RISC-V: Support form 1 of integer scalar .SAT_ADDPan Li3-0/+102
This patch would like to support the scalar signed ssadd pattern for the RISC-V backend. Aka Form 1: #define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \ T __attribute__((noinline)) \ sat_s_add_##T##_fmt_1 (T x, T y) \ { \ T sum = (UT)x + (UT)y; \ return (x ^ y) < 0 \ ? sum \ : (sum ^ x) >= 0 \ ? sum \ : x < 0 ? MIN : MAX; \ } DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX) Before this patch: 10 │ sat_s_add_int64_t_fmt_1: 11 │ mv a5,a0 12 │ add a0,a0,a1 13 │ xor a1,a5,a1 14 │ not a1,a1 15 │ xor a4,a5,a0 16 │ and a1,a1,a4 17 │ blt a1,zero,.L5 18 │ ret 19 │ .L5: 20 │ srai a5,a5,63 21 │ li a0,-1 22 │ srli a0,a0,1 23 │ xor a0,a5,a0 24 │ ret After this patch: 10 │ sat_s_add_int64_t_fmt_1: 11 │ add a2,a0,a1 12 │ xor a1,a0,a1 13 │ xor a5,a0,a2 14 │ srli a5,a5,63 15 │ srli a1,a1,63 16 │ xori a1,a1,1 17 │ and a5,a5,a1 18 │ srai a4,a0,63 19 │ li a3,-1 20 │ srli a3,a3,1 21 │ xor a3,a3,a4 22 │ neg a4,a5 23 │ and a3,a3,a4 24 │ addi a5,a5,-1 25 │ and a0,a2,a5 26 │ or a0,a0,a3 27 │ ret The below test suites are passed for this patch: 1. The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func decl for expanding ssadd. * config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func impl to gen the max int rtx. (riscv_expand_ssadd): Add new func impl to expand the ssadd. * config/riscv/riscv.md (ssadd<mode>3): Add new pattern for signed integer .SAT_ADD. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat_arith_data.h: Add test data. * gcc.target/riscv/sat_s_add-1.c: New test. * gcc.target/riscv/sat_s_add-2.c: New test. * gcc.target/riscv/sat_s_add-3.c: New test. * gcc.target/riscv/sat_s_add-4.c: New test. * gcc.target/riscv/sat_s_add-run-1.c: New test. * gcc.target/riscv/sat_s_add-run-2.c: New test. * gcc.target/riscv/sat_s_add-run-3.c: New test. * gcc.target/riscv/sat_s_add-run-4.c: New test. * gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-01[PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.Xianmiao Qu1-0/+5
Currently, in RV32, even with the D extension enabled, the cost of DFmode register moves is still set to 'COSTS_N_INSNS (2)'. This results in the 'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG register moves, leading to the generation of many redundant instructions. As an example, consider the following test case: double foo (int t, double a, double b) { if (t > 0) return a; else return b; } When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated: .cfi_startproc addi sp,sp,-32 .cfi_def_cfa_offset 32 fsd fa0,8(sp) fsd fa1,16(sp) lw a4,8(sp) lw a5,12(sp) lw a2,16(sp) lw a3,20(sp) bgt a0,zero,.L1 mv a4,a2 mv a5,a3 .L1: sw a4,24(sp) sw a5,28(sp) fld fa0,24(sp) addi sp,sp,32 .cfi_def_cfa_offset 0 jr ra .cfi_endproc After adjust the DFmode register move's cost to 'COSTS_N_INSNS (1)', the generated code is as follows, with a significant reduction in the number of instructions. .cfi_startproc ble a0,zero,.L5 ret .L5: fmv.d fa0,fa1 ret .cfi_endproc gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the DFmode register move for RV32. gcc/testsuite/ * gcc.target/riscv/rv32-movdf-cost.c: New test.
2024-09-02RISC-V: Refactor gen zero_extend rtx for SAT_* when expand SImode in RV64Pan Li1-53/+46
In previous, we have some specially handling for both the .SAT_ADD and .SAT_SUB for unsigned int. There are similar to take care of SImode in RV64 for zero extend. Thus refactor these two helper function into one for possible code duplication. The below test suite are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Merge the zero_extend handing from func riscv_gen_unsigned_xmode_reg. (riscv_gen_unsigned_xmode_reg): Remove. (riscv_expand_ussub): Leverage riscv_gen_zero_extend_rtx instead of riscv_gen_unsigned_xmode_reg. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_sub-11.c: Adjust asm check. * gcc.target/riscv/sat_u_sub-15.c: Ditto. * gcc.target/riscv/sat_u_sub-19.c: Ditto. * gcc.target/riscv/sat_u_sub-23.c: Ditto. * gcc.target/riscv/sat_u_sub-27.c: Ditto. * gcc.target/riscv/sat_u_sub-3.c: Ditto. * gcc.target/riscv/sat_u_sub-31.c: Ditto. * gcc.target/riscv/sat_u_sub-35.c: Ditto. * gcc.target/riscv/sat_u_sub-39.c: Ditto. * gcc.target/riscv/sat_u_sub-43.c: Ditto. * gcc.target/riscv/sat_u_sub-47.c: Ditto. * gcc.target/riscv/sat_u_sub-7.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-11.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-11_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-11_2.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-15.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-15_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-15_2.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-3.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-3_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-3_2.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-7.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-7_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-7_2.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-08-29Use std::unique_ptr for optinfo_itemDavid Malcolm2-0/+2
As preliminary work towards an overhaul of how optinfo_items interact with dump_pretty_printer, replace uses of optinfo_item * with std::unique_ptr<optinfo_item> to make ownership clearer. No functional change intended. gcc/ChangeLog: * config/aarch64/aarch64.cc: Define INCLUDE_MEMORY. * config/arm/arm.cc: Likewise. * config/i386/i386.cc: Likewise. * config/loongarch/loongarch.cc: Likewise. * config/riscv/riscv-vector-costs.cc: Likewise. * config/riscv/riscv.cc: Likewise. * config/rs6000/rs6000.cc: Likewise. * dump-context.h (dump_context::emit_item): Convert "item" param from * to const &. (dump_pretty_printer::stash_item): Convert "item" param from optinfo_ * to std::unique_ptr<optinfo_item>. (dump_pretty_printer::emit_item): Likewise. * dumpfile.cc: Include "make-unique.h". (make_item_for_dump_gimple_stmt): Replace uses of optinfo_item * with std::unique_ptr<optinfo_item>. (dump_context::dump_gimple_stmt): Likewise. (make_item_for_dump_gimple_expr): Likewise. (dump_context::dump_gimple_expr): Likewise. (make_item_for_dump_generic_expr): Likewise. (dump_context::dump_generic_expr): Likewise. (make_item_for_dump_symtab_node): Likewise. (dump_pretty_printer::emit_items): Likewise. (dump_pretty_printer::emit_any_pending_textual_chunks): Likewise. (dump_pretty_printer::emit_item): Likewise. (dump_pretty_printer::stash_item): Likewise. (dump_pretty_printer::decode_format): Likewise. (dump_context::dump_printf_va): Fix overlong line. (make_item_for_dump_dec): Replace uses of optinfo_item * with std::unique_ptr<optinfo_item>. (dump_context::dump_dec): Likewise. (dump_context::dump_symtab_node): Likewise. (dump_context::begin_scope): Likewise. (dump_context::emit_item): Likewise. * gimple-loop-interchange.cc: Define INCLUDE_MEMORY. * gimple-loop-jam.cc: Likewise. * gimple-loop-versioning.cc: Likewise. * graphite-dependences.cc: Likewise. * graphite-isl-ast-to-gimple.cc: Likewise. * graphite-optimize-isl.cc: Likewise. * graphite-poly.cc: Likewise. * graphite-scop-detection.cc: Likewise. * graphite-sese-to-poly.cc: Likewise. * graphite.cc: Likewise. * opt-problem.cc: Likewise. * optinfo.cc (optinfo::add_item): Convert "item" param from optinfo_ * to std::unique_ptr<optinfo_item>. (optinfo::emit_for_opt_problem): Update for change to dump_context::emit_item. * optinfo.h: Add #error to fail immediately if INCLUDE_MEMORY wasn't defined, rather than fail to find std::unique_ptr. (optinfo::add_item): Convert "item" param from optinfo_ * to std::unique_ptr<optinfo_item>. * sese.cc: Define INCLUDE_MEMORY. * targhooks.cc: Likewise. * tree-data-ref.cc: Likewise. * tree-if-conv.cc: Likewise. * tree-loop-distribution.cc: Likewise. * tree-parloops.cc: Likewise. * tree-predcom.cc: Likewise. * tree-ssa-live.cc: Likewise. * tree-ssa-loop-ivcanon.cc: Likewise. * tree-ssa-loop-ivopts.cc: Likewise. * tree-ssa-loop-prefetch.cc: Likewise. * tree-ssa-loop-unswitch.cc: Likewise. * tree-ssa-phiopt.cc: Likewise. * tree-ssa-threadbackward.cc: Likewise. * tree-ssa-threadupdate.cc: Likewise. * tree-vect-data-refs.cc: Likewise. * tree-vect-generic.cc: Likewise. * tree-vect-loop-manip.cc: Likewise. * tree-vect-loop.cc: Likewise. * tree-vect-patterns.cc: Likewise. * tree-vect-slp-patterns.cc: Likewise. * tree-vect-slp.cc: Likewise. * tree-vect-stmts.cc: Likewise. * tree-vectorizer.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/dump_plugin.c: Define INCLUDE_MEMORY. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-08-29RISC-V: Fix subreg of VLS modes larger than a vector [PR116086].Robin Dapp3-0/+248
When the source mode is potentially larger than one vector (e.g. an LMUL2 mode for VLEN=128) we don't know which vector the subreg actually refers to. For zvl128b and LMUL=2 the subreg in (subreg:V2DI (reg:V4DI)) could actually be the a full (high) vector register of a two-register group (at VLEN=128) or the higher part of a single register (at VLEN>128). As the subreg is statically ambiguous we prevent such situations in can_change_mode_class. The culprit in PR116086 is _12 = BIT_FIELD_REF <vect_cst__42, 128, 128>; which can be expanded with a vector-vector extract (from V4DI to V2DI). This patch adds a VLS-mode vector-vector extract that handles "halving" cases like this one by sliding down the source vector, thus making sure the correct part is used. PR target/116086 gcc/ChangeLog: * config/riscv/autovec.md (vec_extract<mode><v_half>): Add vector-vector extract for VLS modes. * config/riscv/riscv.cc (riscv_can_change_mode_class): Forbid VLS modes larger than one vector. * config/riscv/vector-iterators.md: Add vector-vector extract iterators. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add effective target checks for zvl256b and zvl512b. * gcc.target/riscv/rvv/autovec/pr116086-2-run.c: New test. * gcc.target/riscv/rvv/autovec/pr116086-2.c: New test. * gcc.target/riscv/rvv/autovec/pr116086.c: New test.
2024-08-28RISC-V: Add missing mode_idx for vrol and vrorKito Cheng1-1/+1
We add pattern for vector rotate, but seems like we forgot adding mode_idx which used in AVL propgation (riscv-avlprop.cc). gcc/ChangeLog: * config/riscv/vector.md (mode_idx): Add vrol and vror. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/rotr.c: New.
2024-08-27RISC-V: Move helper functions above expand_const_vectorPatrick O'Neill1-66/+66
These subroutines will be used in expand_const_vector in a future patch. Relocate so expand_const_vector can use them. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_vector_init_insert_elems): Relocate. (expand_vector_init_trailing_same_elem): Ditto. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Allow non-duplicate bool patterns in expand_const_vectorPatrick O'Neill1-15/+8
Currently we assert when encountering a non-duplicate boolean vector. This patch allows non-duplicate vectors to fall through to the gcc_unreachable and assert there. This will be useful when adding a catch-all pattern to emit costs and handle arbitary vectors. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Allow non-duplicate to fall through other patterns before asserting. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Handle 0.0 floating point pattern costing to match const_vector expanderPatrick O'Neill3-6/+15
The comment previously here stated that the Wc0/Wc1 cases are handled by the vi constraint but that is not true for the 0.0 Wc0 case. gcc/ChangeLog: * config/riscv/riscv-v.h (valid_vec_immediate_p): Add new helper. * config/riscv/riscv-v.cc (valid_vec_immediate_p): Ditto. (expand_const_vector): Use new helper. * config/riscv/riscv.cc (riscv_const_insns): Handle 0.0 floating-point case. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Emit costs for bool and stepped const vectorsPatrick O'Neill3-52/+131
These cases are handled in the expander (riscv-v.cc:expand_const_vector). We need the vector builder to detect these cases so extract that out into a new riscv-v.h header file. gcc/ChangeLog: * config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h. * config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for bool/stepped const vectors. * config/riscv/riscv-v.h: New file. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Handle case when constant vector construction target rtx is not a ↵Patrick O'Neill1-32/+41
register This manifests in RTL that is optimized away which causes runtime failures in the testsuite. Update all patterns to use a temp result register if required. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if needed. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Reorder insn cost match order to match corresponding expander match ↵Patrick O'Neill1-9/+9
order The corresponding expander (riscv-v.cc:expand_const_vector) matches const_vec_duplicate_p before const_vec_series_p. Reorder to match this behavior when calculating costs. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_const_insns): Relocate. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Fix vid const vector expander for non-npatterns size stepsPatrick O'Neill1-6/+42
Prior to this patch the expander would emit vectors like: { 0, 0, 5, 5, 10, 10, ...} as: { 0, 0, 2, 2, 4, 4, ...} This patch sets the step size to the requested value. gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Fix STEP size in expander. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-27RISC-V: Support IMM for operand 1 of ussub patternPan Li2-2/+2
This patch would like to allow IMM for the operand 1 of ussub pattern. Aka .SAT_SUB(x, 22) as the below example. Form 2: #define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \ T __attribute__((noinline)) \ sat_u_sub_imm##IMM##_##T##_fmt_2 (T x) \ { \ return x >= (T)IMM ? x - (T)IMM : 0; \ } DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022) It is almost the as support imm for operand 0 of ussub pattern, but allow the second operand to be imm insted of the first operand. The below test suites are passed for this patch: 1. The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_expand_ussub): Gen xmode for the second operand, aka y in parameter. * config/riscv/riscv.md (ussub<mode>3): Allow const_int for operand 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat_u_sub_imm-5.c: New test. * gcc.target/riscv/sat_u_sub_imm-5_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-5_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-6.c: New test. * gcc.target/riscv/sat_u_sub_imm-6_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-6_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-7.c: New test. * gcc.target/riscv/sat_u_sub_imm-7_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-7_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-8.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-5.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-6.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-7.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-08-26RISC-V: Support IMM for operand 0 of ussub patternPan Li2-2/+46
This patch would like to allow IMM for the operand 0 of ussub pattern. Aka .SAT_SUB(1023, y) as the below example. Form 1: #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \ T __attribute__((noinline)) \ sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \ { \ return (T)IMM >= y ? (T)IMM - y : 0; \ } DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023) Before this patch: 10 │ sat_u_sub_imm82_uint64_t_fmt_1: 11 │ li a5,82 12 │ bgtu a0,a5,.L3 13 │ sub a0,a5,a0 14 │ ret 15 │ .L3: 16 │ li a0,0 17 │ ret After this patch: 10 │ sat_u_sub_imm82_uint64_t_fmt_1: 11 │ li a5,82 12 │ sltu a4,a5,a0 13 │ addi a4,a4,-1 14 │ sub a0,a5,a0 15 │ and a0,a4,a0 16 │ ret The below test suites are passed for this patch: 1. The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new func impl to gen xmode rtx reg from operand rtx. (riscv_expand_ussub): Gen xmode reg for operand 1. * config/riscv/riscv.md: Allow const_int for operand 1. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macro. * gcc.target/riscv/sat_u_sub_imm-1.c: New test. * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-2.c: New test. * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-3.c: New test. * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-4.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-3.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-08-25RISC-V: Fix double mode under RV32 not utilize vfdemin.han1-1/+2
Currently, some binops of vector vs double scalar under RV32 can't translated to vf but vfmv+vxx.vv. The cause is that vec_duplicate is also expanded to broadcast for double mode under RV32. last-combine can't process expanded broadcast. gcc/ChangeLog: * config/riscv/vector.md: Add !FLOAT_MODE_P constraint. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto.
2024-08-23RISC-V: Use encoded nelts when calling repeating_sequence_pPatrick O'Neill1-7/+3
repeating_sequence_p operates directly on the encoded pattern and does not derive elements using the .elt() accessor. Passing in the length of the unencoded vector can cause an out-of-bounds read of the encoded pattern. gcc/ChangeLog: * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Use encoded_nelts when calling repeating_sequence_p. (rvv_builder::is_repeating_sequence): Ditto. (rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-23RISC-V: Expand vec abs without masking.Robin Dapp1-18/+8
Standard abs synthesis during expand is max (a, -a). This expansion has the advantage of avoiding masking and is thus potentially faster than the a < 0 ? -a : a synthesis. gcc/ChangeLog: * config/riscv/autovec.md (abs<mode>2): Expand via max (a, -a). gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Adjust test expectation. * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
2024-08-22RISC-V: Fix vector cfi notes for stack-clash protectionRaphael Moreira Zinsly1-2/+16
The stack-clash code is generating wrong cfi directives in riscv_v_adjust_scalable_frame because REG_CFA_DEF_CFA has a different encoding than REG_FRAME_RELATED_EXPR, this patch fixes the offset sign in prologue and starts using REG_CFA_DEF_CFA in the epilogue. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_v_adjust_scalable_frame): Add epilogue code for stack-clash and fix prologue cfi note. gcc/testsuite/ChangeLog: * gcc.target/riscv/stack-check-cfa-3.c: Fix the expected output.
2024-08-18RISC-V: Implement the quad and oct .SAT_TRUNC for scalarPan Li2-0/+40
This patch would like to implement the quad and oct .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ { \ bool overflow = x > (WT)(NT)(-1); \ return ((NT)x) | (NT)-overflow; \ } DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t) Before this patch: 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ _Bool overflow; 8 │ short unsigned int _1; 9 │ short unsigned int _2; 10 │ short unsigned int _3; 11 │ uint16_t _6; 12 │ 13 │ ;; basic block 2, loop depth 0 14 │ ;; pred: ENTRY 15 │ overflow_5 = x_4(D) > 65535; 16 │ _1 = (short unsigned int) x_4(D); 17 │ _2 = (short unsigned int) overflow_5; 18 │ _3 = -_2; 19 │ _6 = _1 | _3; 20 │ return _6; 21 │ ;; succ: EXIT 22 │ 23 │ } After this patch: 3 │ 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ uint16_t _6; 8 │ 9 │ ;; basic block 2, loop depth 0 10 │ ;; pred: ENTRY 11 │ _6 = .SAT_TRUNC (x_4(D)); [tail call] 12 │ return _6; 13 │ ;; succ: EXIT 14 │ 15 │ } The below tests suites are passed for this patch 1. The rv64gcv fully regression test. 2. The rv64gcv build with glibc gcc/ChangeLog: * config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for quad truncation. (ANYI_OCT_TRUNC): New iterator for oct truncation. (ANYI_QUAD_TRUNCATED): New attr for truncated quad modes. (ANYI_OCT_TRUNCATED): New attr for truncated oct modes. (anyi_quad_truncated): Ditto but for lower case. (anyi_oct_truncated): Ditto but for lower case. * config/riscv/riscv.md (ustrunc<mode><anyi_quad_truncated>2): Add new pattern for quad truncation. (ustrunc<mode><anyi_oct_truncated>2): Ditto but for oct. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust the expand dump check times. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. * gcc.target/riscv/sat_arith_data.h: Add test helper macros. * gcc.target/riscv/sat_u_trunc-4.c: New test. * gcc.target/riscv/sat_u_trunc-5.c: New test. * gcc.target/riscv/sat_u_trunc-6.c: New test. * gcc.target/riscv/sat_u_trunc-run-4.c: New test. * gcc.target/riscv/sat_u_trunc-run-5.c: New test. * gcc.target/riscv/sat_u_trunc-run-6.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-08-18RISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278]Pan Li1-12/+22
For QI/HImode of .SAT_ADD, the operands may be sign-extended and the high bits of Xmode may be all 1 which is not expected. For example as below code. signed char b[1]; unsigned short c; signed char *d = b; int main() { b[0] = -40; c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9; __builtin_printf("%d\n", c); } After expanding we have: ;; _6 = .SAT_ADD (_3, 9); (insn 8 7 9 (set (reg:DI 143) (high:DI (symbol_ref:DI ("d") [flags 0x86] <var_decl d>))) (nil)) (insn 9 8 10 (set (reg/f:DI 142) (mem/f/c:DI (lo_sum:DI (reg:DI 143) (symbol_ref:DI ("d") [flags 0x86] <var_decl d>)) [1 d+0 S8 A64])) (nil)) (insn 10 9 11 (set (reg:HI 144 [ _3 ]) (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) "test.c":7:10 -1 (nil)) The convert from signed char to unsigned short will have sign_extend rtl as above. And finally become the lb insn as below: lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8 lui a0,0x1a addi a5,a1,9 slli a5,a5,0x30 srli a5,a5,0x30 // a5 is 65505 sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE The sltu try to compare 65505 and 0xffffffffffffffd8 here, but we actually want to compare 65505 and 65496 (0xffd8). Thus we need to clean up the high bits to ensure this. The below test suites are passed for this patch: * The rv64gcv fully regression test. PR target/116278 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new func impl to zero extend rtx. (riscv_expand_usadd): Leverage above func to cleanup operands 0 and remove the special handing for SImode in RV64. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_add-11.c: Adjust asm check body. * gcc.target/riscv/sat_u_add-15.c: Ditto. * gcc.target/riscv/sat_u_add-19.c: Ditto. * gcc.target/riscv/sat_u_add-23.c: Ditto. * gcc.target/riscv/sat_u_add-3.c: Ditto. * gcc.target/riscv/sat_u_add-7.c: Ditto. * gcc.target/riscv/sat_u_add_imm-11.c: Ditto. * gcc.target/riscv/sat_u_add_imm-15.c: Ditto. * gcc.target/riscv/sat_u_add_imm-3.c: Ditto. * gcc.target/riscv/sat_u_add_imm-7.c: Ditto. * gcc.target/riscv/pr116278-run-1.c: New test. * gcc.target/riscv/pr116278-run-2.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-08-17t-rtems: add rv32imf architecture to the RTEMS multilib for RISC-VKevin Kirspel1-2/+3
The attach patch is specific to the RTEMS RISC-V architecture multilib which is controlled by the t-rtems file in the gcc/config/riscv/ directory. The patch file was created from the gcc-13.3.0 branch. It was successfully tested within RTEMS Source Builder. gcc/ * config/riscv/t-rtems: Add ilp32f multilib.
2024-08-17RISC-V: Fix ICE for vector single-width integer multiply-add intrinsicsJin Ma1-40/+40
When rs1 is the immediate 0, the following ICE occurs: error: unrecognizable insn: (insn 8 5 12 2 (set (reg:RVVM1DI 134 [ <retval> ]) (if_then_else:RVVM1DI (unspec:RVVMF64BI [ (const_vector:RVVMF64BI repeat [ (const_int 1 [0x1]) ]) (reg/v:DI 137 [ vl ]) (const_int 2 [0x2]) repeated x2 (const_int 0 [0]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 [0])) (reg/v:RVVM1DI 136 [ vs2 ])) (reg/v:RVVM1DI 135 [ vd ])) (reg/v:RVVM1DI 135 [ vd ]))) gcc/ChangeLog: * config/riscv/vector.md: Allow scalar operand to be 0. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-7.c: New test. * gcc.target/riscv/rvv/base/bug-8.c: New test.
2024-08-17[RISC-V][PR target/116282] Stabilize pattern conditionsJeff Law4-31/+55
So as expected the core problem with target/116282 is that the cost of certain constant synthesis cases varied depending on whether or not we're allowed to generate new pseudos or not. That in turn meant that in obscure cases an insn might change from recognizable to unrecognizable and triggers the observed failure. So we need to keep the cost stable, at least when called from a pattern's condition. So we pass another boolean down when necessary. I've tried to keep API fallout minimized. Built and tested on rv32 in my tester. Let's see what pre-commit testing has to say though 🙂 Note this will also require a minor change to the in-flight constant synthesis work. PR target/116282 gcc/ * config/riscv/riscv-protos.h (riscv_const_insns): Add new argument. * config/riscv/riscv.cc (riscv_build_integer): Add new argument ALLOW_NEW_PSEUDOS. Pass it down to recursive calls and check it before using synthesis which allows new registers to be created. (riscv_split_integer_cost): Pass new argument to riscv_build_integer. (riscv_integer_cost): Add ALLOW_NEW_PSEUDOS argument, pass it down to riscv_build_integer. (riscv_legitimate_constant_p): Pass new argument to riscv_const_insns. (riscv_const_insns): New argment ALLOW_NEW_PSEUDOS. Pass it down to riscv_integer_cost and riscv_const_insns. (riscv_split_const_insns): Pass new argument to riscv_const_insns. (riscv_move_integer, riscv_rtx_costs): Similarly. * config/riscv/riscv.md (shadd with costly constant): Pass new argument to riscv_const_insns. * config/riscv/bitmanip.md (and with costly constant): Pass new argument to riscv_const_insns. gcc/testsuite/ * gcc.target/riscv/pr116282.c: New test.
2024-08-17RISC-V: Bugfix for RVV rounding intrinsic ICE in function checkerJin Ma3-3/+7
When compiling an interface for rounding of type 'vfloat16*' without using zvfh or zvfhmin, it is not enough to use FLOAT_MODE_P because the type does not support it. Although the subsequent riscv_validate_vector_type checks will still fail and throw exceptions, I don't think we should have ICE here. internal compiler error: in check, at config/riscv/riscv-vector-builtins-shapes.cc:444 10 | return __riscv_vfadd_vv_f16m1_rm (vs2, vs1, 0, vl); | ^~~~~~ 0x4191794 internal_error(char const*, ...) /iothome/jin.ma/code/master/gcc/gcc/diagnostic-global-context.cc:491 0x416ebf5 fancy_abort(char const*, int, char const*) /iothome/jin.ma/code/master/gcc/gcc/diagnostic.cc:1772 0x220aae6 riscv_vector::build_frm_base::check(riscv_vector::function_checker&) const /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins-shapes.cc:444 0x2205323 riscv_vector::function_checker::check() /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:4414 gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_vector_float_type_p): New. * config/riscv/riscv-vector-builtins.cc (function_instance::any_type_float_p): Use riscv_vector_float_type_p instead of FLOAT_MODE_P for judgment. * config/riscv/riscv.cc (riscv_vector_int_type_p): Change static to extern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-9.c: New test.
2024-08-17RISC-V: Bugfix incorrect operand for vwsll auto-vectPan Li1-0/+4
This patch would like to fix one ICE when rv64gcv_zvbb for vwsll. Consider below example. void vwsll_vv_test (short *restrict dst, char *restrict a, int *restrict b, int n) { for (int i = 0; i < n; i++) dst[i] = a[i] << b[i]; } It will hit the vwsll pattern with following operands. operand 0 -> (reg:RVVMF2HI 146 [ vect__7.13 ]) operand 1 -> (reg:RVVMF4QI 165 [ vect_cst__33 ]) operand 2 -> (reg:RVVM1SI 171 [ vect_cst__36 ]) According to the ISA, operand 2 should be the same as operand 1. Aka operand 2 should have RVVMF4QI mode as above. Thus, add quad truncation for operand 2 before emit vwsll. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/116280 gcc/ChangeLog: * config/riscv/autovec-opt.md: Add quad truncation to align the mode requirement for vwsll. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr116280-1.c: New test. * gcc.target/riscv/rvv/base/pr116280-2.c: New test.
2024-08-17RISC-V: Add auto-vect pattern for vector rotate shiftFeng Wang1-0/+16
This patch add the vector rotate shift pattern for auto-vect. With this patch, the scalar rotate shift can be automatically vectorized into vector rotate shift. gcc/ChangeLog: * config/riscv/autovec.md (v<bitmanip_optab><mode>3): Add new define_expand pattern for vector rotate shift. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test.
2024-08-17RISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]曾治金1-2/+2
This patch is to fix the bug (BugId:116305) introduced by the commit bd93ef for risc-v target. The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So it changes the value of BYTES_PER_RISCV_VECTOR. For example, before merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer equal. Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb register value in riscv_legitimize_poly_move, and dwarf2cfi will also get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value to calculate the number of times to multiply the vlenb register value. So need to change the factor from riscv_bytes_per_vector_chunk to BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf information. The incorrect example as follow: ``` csrr    t0,vlenb slli    t1,t0,1 sub     sp,sp,t1 .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 ``` The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means the literal 4, '0x1e' means the multiply operation. But in fact, the vlenb register value just need to multiply the literal 2. PR target/116305 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Take BYTES_PER_RISCV_VECTOR for *factor instead of riscv_bytes_per_vector_chunk. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. Signed-off-by: Zhijin Zeng <zhijin.zeng@spacemit.com>
2024-08-15RISC-V: use fclass insns to implement isfinite,isnormal and isinf builtinsVineet Gupta1-0/+63
Currently these builtins use float compare instructions which require FP flags to be saved/restored which could be costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP values w/o disturbing FP exception flags. Now that upstream supports the corresponding optabs, wire them up in the backend. gcc/ChangeLog: * config/riscv/riscv.md: define_insn for fclass insn. define_expand for isfinite, isnormal, isinf. gcc/testsuite/ChangeLog: * gcc.target/riscv/fclass.c: New tests. Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #2060 Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2024-08-13RISC-V: Fix non-obvious comment typosPatrick O'Neill5-8/+8
This fixes the remainder of the typos I found when reading various parts of the RISC-V backend. gcc/ChangeLog: * config/riscv/riscv-v.cc (legitimize_move): extrac -> extract. (expand_vec_cmp_float): Remove duplicate vmnor.mm. * config/riscv/riscv-vector-builtins.cc: ins -> insns. * config/riscv/riscv.cc (riscv_init_machine_status): mwrvv -> mrvv. * config/riscv/vector-iterators.md: RVVM8QImde -> RVVM8QImode * config/riscv/vector.md: Replaced non-existant vsetivl with vsetivli. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-09[RISC-V][PR target/116283] Fix split code for recent Zbs improvements with ↵Jeff Law1-6/+18
masked bit positions So Patrick's fuzzer found an interesting little buglet in the Zbs improvements I added a couple months back. Specifically when we have masked bit position for a Zbs instruction. If the mask has extraneous bits set we'll generate an unrecognizable insn due to an invalid constant. More concretely, let's take this pattern: > (define_insn_and_split "" > [(set (match_operand:DI 0 "register_operand" "=r") > (any_extend:DI > (ashift:SI (const_int 1) > (subreg:QI (and:DI (match_operand:DI 1 "register_operand" "r") > (match_operand 2 "const_int_operand")) 0))))] What we need to know to transform this into bset for rv64. After masking the shift count we want to know the low 5 bits aren't 0x1f. If they were 0x1f, then the constant generated would be 0x80000000 which would then need sign extension out to 64bits, which the bset instruction will not do for us. We can ignore anything outside the low 5 bits. The mode of the shift is SI, so shifting by 32+ bits is undefined behavior. It's also worth explicitly mentioning that the hardware is going to mask the count against 0x3f. The net is if (operands[2] & 0x1f) != 0x1f, then this transformation is safe. So onto the generated split code... > [(set (match_dup 0) (and:DI (match_dup 1) (match_dup 2))) > (set (match_dup 0) (zero_extend:DI (ashift:SI > (const_int 1) > (subreg:QI (match_dup 0) 0))))] Which would seemingly do exactly what we want. The problem is the first split insn. If the constant does not fit into a simm12, that insn won't be recognized resulting in the ICE. The fix is simple, we just need to mask the constant before generating RTL. We can just mask it against 0x1f since we only care about the low 5 bits. This affects multiple patterns. I've added the appropriate fix to all of them. Tested in my tester. Waiting for the pre-commit bits to run before pushing. PR target/116283 gcc/ * config/riscv/bitmanip.md (Zbs combiner patterns/splitters): Mask the bit position in the split code appropriately. gcc/testsuite/ * gcc.target/riscv/pr116283.c: New test
2024-08-09RISC-V: Enable stack clash in allocaRaphael Moreira Zinsly2-0/+34
Add the TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE to riscv in order to enable stack clash protection when using alloca. The code and tests are the same used by aarch64. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_compute_frame_info): Update outgoing args size. (riscv_stack_clash_protection_alloca_probe_range): New. (TARGET_STACK_CLASH_PROTECTION_ALLOCA_PROBE_RANGE): New. * config/riscv/riscv.h (STACK_CLASH_MIN_BYTES_OUTGOING_ARGS): New. (STACK_DYNAMIC_OFFSET): New. gcc/testsuite/ChangeLog: * gcc.target/riscv/stack-check-14.c: New test. * gcc.target/riscv/stack-check-15.c: New test. * gcc.target/riscv/stack-check-alloca-1.c: New test. * gcc.target/riscv/stack-check-alloca-2.c: New test. * gcc.target/riscv/stack-check-alloca-3.c: New test. * gcc.target/riscv/stack-check-alloca-4.c: New test. * gcc.target/riscv/stack-check-alloca-5.c: New test. * gcc.target/riscv/stack-check-alloca-6.c: New test. * gcc.target/riscv/stack-check-alloca-7.c: New test. * gcc.target/riscv/stack-check-alloca-8.c: New test. * gcc.target/riscv/stack-check-alloca-9.c: New test. * gcc.target/riscv/stack-check-alloca-10.c: New test. * gcc.target/riscv/stack-check-alloca.h: New.