path: root/gcc/config
Age | Commit message | Author | Files | Lines
10 hours | RISC-V: Combine vec_duplicate + vwaddu.vv to vwaddu.vx on GR2VR cost | Pan Li | 3 | -0/+61
This patch combines vec_duplicate + vwaddu.vv into vwaddu.vx, as in the example below. The pattern depends on the cost of vec_duplicate from GR2VR: late-combine performs the combination if the GR2VR cost is zero and rejects it if the GR2VR cost is greater than zero. Assume we have example code like the one below, with a GR2VR cost of 0.
Before this patch:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    ...
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    ...
    vwaddu.vv v1,v2,v3
    ...
    bne a3,zero,.L3
After this patch:
    beq a3,zero,.L8
    ...
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    ...
    vwaddu.vx v1,a2,v3
    ...
    bne a3,zero,.L3
The pattern in this patch only works for DImode, i.e. the pattern below:
    v1:RVVM1DImode = (zero_extend:RVVM1DImode v2:RVVM1SImode)
      + (vec_dup:RVVM1DImode (zero_extend:DImode x2:SImode));
Unfortunately, for uint16_t to uint32_t or uint8_t to uint16_t, we lose this extend op after expand.
For uint16_t => uint32_t we have:
    (set (reg:SI 149) (subreg/s/v:SI (reg/v:DI 146 [ rs1 ]) 0))
For uint32_t => uint64_t we have:
    (set (reg:DI 148 [ _6 ]) (zero_extend:DI (subreg/s/u:SI (reg/v:DI 146 [ rs1 ]) 0)))
There is no zero_extend for uint16_t to uint32_t, so we cannot hit the pattern above. Instead, combine tries the pattern below for uint16_t to uint32_t:
    v1:RVVM1SImode = (zero_extend:RVVM1SImode v2:RVVM1HImode)
      + (vec_dup:RVVM1SImode (subreg:SIMode (:DImode x2:SImode)))
But this cannot match the vwaddu semantics, so vwaddu.vv for uint16_t to uint32_t, as well as for uint8_t to uint16_t, needs separate handling.
gcc/ChangeLog:
    * config/riscv/autovec-opt.md (*widen_first_<any_extend:su>_vx_<mode>): Add helper bridge pattern for vwaddu.vx combine.
    (*widen_<any_widen_binop:optab>_<any_extend:su>_vx_<mode>): Add new pattern to match vwaddu.vx combine.
    * config/riscv/iterators.md: Add code attr to get extend CODE.
    * config/riscv/vector-iterators.md: Add Dmode iterator for widen.
Signed-off-by: Pan Li <pan2.li@intel.com>
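To make concrete what vwaddu.vx computes, here is a scalar C loop of the uint32_t => uint64_t kind that the DImode pattern targets. The function name is hypothetical, and the loop is only an assumed source shape, not the patch's testcase. Each element of `a` and the scalar `x` are zero-extended to 64 bits before the add, so the sum never wraps at 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source loop: out[i] = zext64(a[i]) + zext64(x).
   If the GR2VR cost is zero, late-combine may fold the vec_duplicate of x
   plus vwaddu.vv into one vwaddu.vx that reads x directly from a GPR. */
static void widen_add_u32 (uint64_t *out, const uint32_t *a, uint32_t x,
                           size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint64_t) a[i] + (uint64_t) x;  /* widening unsigned add */
}
```

The explicit casts correspond to the two zero_extend operations in the RTL pattern. Without them, the add would be done in 32 bits and would no longer be a widening add.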
22 hours | xtensa: Simplify the definition of REGNO_OK_FOR_BASE_P() and avoid calling it directly | Takayuki 'January June' Suwa | 2 | -3/+4
In recent GCC versions, REGNO_OK_FOR_BASE_P() is not called directly, but rather via regno_ok_for_base_p(), a wrapper in gcc/addresses.h. The wrapper obtains a hard register number from a pseudo via the reg_renumber array, so REGNO_OK_FOR_BASE_P() does not need to take this into consideration. On the other hand, since there is only one use of REGNO_OK_FOR_BASE_P() in the target-specific code, it makes more sense to simplify the definition of REGNO_OK_FOR_BASE_P() and replace that call with one to regno_ok_for_base_p().
gcc/ChangeLog:
    * config/xtensa/xtensa.cc (#include): Add "addresses.h".
    * config/xtensa/xtensa.h (REGNO_OK_FOR_BASE_P): Simplify to just a call to GP_REG_P().
    (BASE_REG_P): Replace REGNO_OK_FOR_BASE_P() with the equivalent call to regno_ok_for_base_p().
25 hours | AArch64: Add isnan expander [PR 66462] | Wilco Dijkstra | 1 | -0/+16
Add an expander for isnan using integer arithmetic. Since isnan is just a compare, enable it only with -fsignaling-nans to avoid generating spurious exceptions. This fixes part of PR66462.
    int isnan1 (float x) { return __builtin_isnan (x); }
Before:
    fcmp s0, s0
    cset w0, vs
    ret
After:
    fmov w1, s0
    mov w0, -16777216
    cmp w0, w1, lsl 1
    cset w0, cc
    ret
gcc:
    PR middle-end/66462
    * config/aarch64/aarch64.md (isnan<mode>2): Add new expander.
gcc/testsuite:
    PR middle-end/66462
    * gcc.target/aarch64/pr66462.c: Update test.
29 hours | aarch64: Force vector in SVE gimple_folder::fold_active_lanes_to. | Jennifer Schmitz | 1 | -0/+1
An ICE was reported in the following test case: svint8_t foo(svbool_t pg, int8_t op2) { return svmul_n_s8_z(pg, svdup_s8(1), op2); } with a type mismatch in 'vec_cond_expr': _4 = VEC_COND_EXPR <v16_2(D), v32_3(D), { 0, ... }>; The reason is that svmul_impl::fold folds calls where one of the operands is all ones to the other operand using gimple_folder::fold_active_lanes_to. However, we implicitly assumed that the argument that is passed to fold_active_lanes_to is a vector type. In the given test case op2 is a scalar type, resulting in the type mismatch in the vec_cond_expr. This patch fixes the ICE by forcing a vector type of the argument in fold_active_lanes_to before the statement with the vec_cond_expr. In the initial version of this patch, the force_vector statement was placed in svmul_impl::fold, but it was moved to fold_active_lanes_to to align it with fold_const_binary which takes care of the fixup from scalar to vector type using vector_const_binop. The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. OK for trunk? OK to backport to GCC 15? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ PR target/121602 * config/aarch64/aarch64-sve-builtins.cc (gimple_folder::fold_active_lanes_to): Add force_vector statement. gcc/testsuite/ PR target/121602 * gcc.target/aarch64/sve/acle/asm/mul_s16.c: New test. * gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise. * gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
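A scalar C model of the fold may help show why the scalar operand has to be turned into a vector first. It assumes the zeroing (`_z`) predication of the ACLE intrinsics. The function names and MAX_LANES are hypothetical, and this is not the GIMPLE that the folder actually emits. With op1 all ones, the product in each active lane is just op2 and inactive lanes are zero. That is a VEC_COND_EXPR whose "then" arm must be a vector, so the scalar op2 is broadcast first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LANES 256  /* 2048-bit SVE / 8-bit elements; n must not exceed this */

/* Reference: svmul_n_s8_z (pg, op1, op2), inactive lanes zeroed.
   The int8_t conversion wraps on GCC/Clang for out-of-range products. */
static void mul_n_z_model (int8_t *out, const bool *pg, const int8_t *op1,
                           int8_t op2, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? (int8_t) (op1[i] * op2) : 0;
}

/* Folded form when op1 is all ones: VEC_COND_EXPR <pg, op2_vec, {0, ...}>. */
static void fold_active_lanes_to_model (int8_t *out, const bool *pg,
                                        int8_t op2, size_t n)
{
  int8_t op2_vec[MAX_LANES];
  for (size_t i = 0; i < n; i++)
    op2_vec[i] = op2;                 /* "force vector": broadcast the scalar */
  for (size_t i = 0; i < n; i++)
    out[i] = pg[i] ? op2_vec[i] : 0;  /* select per lane, zero elsewhere */
}
```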
34 hours | RISC-V: Allow profiles input in '--with-arch' option. | Jiawei | 1 | -1/+22
Allows profiles input in '--with-arch'. Check profiles with 'riscv-profiles.def'. gcc/ChangeLog: * config.gcc: Accept RISC-V profiles in `--with-arch`. * config/riscv/arch-canonicalize: Add profile detection and skip canonicalization for profiles.
34 hours | RISC-V: Configure Profiles definitions in the definition file. | Jiawei | 1 | -0/+82
Move the RISC-V profile definitions into 'riscv-profiles.def'. Add comments for 'riscv_profiles'.
gcc/ChangeLog:
    * common/config/riscv/riscv-common.cc (struct riscv_profiles): Add comments.
    (RISCV_PROFILE): Removed.
    * config/riscv/riscv-profiles.def: New file.
34 hours | RISC-V: Imply zicsr for sdtrig and ssstrict extensions. | Dongyan Chen | 1 | -2/+2
This patch implies zicsr for sdtrig and ssstrict extensions. According to the riscv-privileged spec, the sdtrig and ssstrict extensions are privileged extensions, so they should imply zicsr. gcc/ChangeLog: * config/riscv/riscv-ext.def: Imply zicsr.
36 hours | Optimize vpermpd to vbroadcastf128 for specific permutations. | liuhongt | 2 | -0/+34
gcc/ChangeLog: * config/i386/predicates.md (avx_vbroadcast128_operand): New predicate. * config/i386/sse.md (*avx_vbroadcastf128_<mode>_perm): New pre_reload splitter. gcc/testsuite/ChangeLog: * gcc.target/i386/avx_vbroadcastf128.c: New test.
39 hours | [ppc] [vxworks] allow code model selection | Alexandre Oliva | 1 | -0/+5
Bring code model selection logic to vxworks.h as well. for gcc/ChangeLog * config/rs6000/vxworks.h (TARGET_CMODEL, SET_CMODEL): Define.
2 days | AVR: Support AVR32EB14/20/28/32. | Georg-Johann Lay | 1 | -0/+4
Add support for some recent AVR devices. gcc/ * config/avr/avr-mcus.def: Add avr32eb14, avr32eb20, avr32eb28, avr32eb32. * doc/avr-mmcu.texi: Rebuild.
2 days | x86: Don't align destination for a single instruction | H.J. Lu | 1 | -24/+38
If a single instruction can store or move the whole block of memory, use vector instruction and don't align destination. gcc/ PR target/121934 * config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): If a single instruction can store or move the whole block of memory, use vector instruction and don't align destination. gcc/testsuite/ PR target/121934 * gcc.target/i386/pr121934-1a.c: New test. * gcc.target/i386/pr121934-1b.c: Likewise. * gcc.target/i386/pr121934-2a.c: Likewise. * gcc.target/i386/pr121934-2b.c: Likewise. * gcc.target/i386/pr121934-3a.c: Likewise. * gcc.target/i386/pr121934-3b.c: Likewise. * gcc.target/i386/pr121934-4a.c: Likewise. * gcc.target/i386/pr121934-4b.c: Likewise. * gcc.target/i386/pr121934-5a.c: Likewise. * gcc.target/i386/pr121934-5b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2 days | LoongArch: Fix wrong code from bstrpick split | Xi Ruoyao | 1 | -7/+7
After late-combine is added, split1 can see an input like
    (insn 56 55 169 5 (set (reg/v:DI 87 [ n ])
            (ior:DI (and:DI (reg/v:DI 87 [ n ])
                    (const_int 281474976710655 [0xffffffffffff]))
                (and:DI (reg:DI 131 [ _45 ])
                    (const_int -281474976710656 [0xffff000000000000])))) "pr121906.c":22:8 108 {*bstrins_di_for_ior_mask}
         (nil))
And the splitter ends up emitting
    (insn 184 55 185 5 (set (reg/v:DI 87 [ n ])
            (reg:DI 131 [ _45 ])) "pr121906.c":22:8 -1
         (nil))
    (insn 185 184 169 5 (set (zero_extract:DI (reg/v:DI 87 [ n ])
                (const_int 48 [0x30])
                (const_int 0 [0]))
            (reg/v:DI 87 [ n ])) "pr121906.c":22:8 -1
         (nil))
which obviously loses everything in r87 instead of retaining its lower bits as we expect. This is because the splitter didn't anticipate that the output register may be one of the input registers.
    PR target/121906
gcc/
    * config/loongarch/loongarch.md (*bstrins_<mode>_for_ior_mask): Always create a new pseudo for the input register of the bstrins instruction.
gcc/testsuite/
    * gcc.target/loongarch/pr121906.c: New test.
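Here is a C model of why the split goes wrong when the output register is also an input. It assumes the usual bstrins.d semantics: bits [msb:lsb] of rd are replaced by the low bits of rj, and the other bits of rd are kept. The helper names are hypothetical. The buggy sequence copies t into n first and then inserts "n's low 48 bits", which by that point are t's. The fix keeps the original n in a fresh temporary, which plays the role of the new pseudo.

```c
#include <stdint.h>

/* bstrins.d rd, rj, msb, lsb (as assumed here): rd[msb:lsb] = rj[msb-lsb:0]. */
static uint64_t bstrins (uint64_t rd, uint64_t rj, unsigned msb, unsigned lsb)
{
  unsigned width = msb - lsb + 1;
  uint64_t field = (width == 64) ? ~0ull : ((1ull << width) - 1);
  return (rd & ~(field << lsb)) | ((rj & field) << lsb);
}

/* Semantics of insn 56: keep n's low 48 bits, take t's high 16 bits. */
static uint64_t ior_mask_ref (uint64_t n, uint64_t t)
{
  return (n & 0xffffffffffffull) | (t & 0xffff000000000000ull);
}

/* Buggy split (insns 184/185): n = t, then insert n into n. */
static uint64_t split_buggy (uint64_t n, uint64_t t)
{
  n = t;                         /* old n is overwritten here */
  return bstrins (n, n, 47, 0);  /* no-op: original low bits are lost */
}

/* Fixed split: save the input in a new temporary before clobbering n. */
static uint64_t split_fixed (uint64_t n, uint64_t t)
{
  uint64_t tmp = n;
  n = t;
  return bstrins (n, tmp, 47, 0);
}
```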
5 days | Aarch64: Add support for addhn vectorizer optabs for Adv.SIMD | Tamar Christina | 1 | -0/+11
This implements the new vector optabs vec_<su>addh_narrow<mode> adding support for in-vectorizer use for early break. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (vec_addh_narrow<mode>): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/vect-addhn_1.c: New test.
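For reference, this is a scalar C model of what an add-high-narrow does on 16-bit lanes. It is a sketch of the unsigned 16-bit case only, and the function name is hypothetical. The add wraps in the element width, and each result keeps only the most significant half of the sum.

```c
#include <stddef.h>
#include <stdint.h>

/* ADDHN-style operation on 16-bit lanes: out[i] = high byte of (a[i] + b[i]) mod 2^16. */
static void addhn_u16 (uint8_t *out, const uint16_t *a, const uint16_t *b,
                       size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint16_t sum = (uint16_t) (a[i] + b[i]);  /* wraps at 16 bits */
      out[i] = (uint8_t) (sum >> 8);            /* keep the high half */
    }
}
```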
6 days | AArch64: Add isfinite expander [PR 66462] | Wilco Dijkstra | 1 | -0/+16
Add an expander for isfinite using integer arithmetic. This is typically faster and avoids generating spurious exceptions on signaling NaNs. This fixes part of PR66462.
    int isfinite1 (float x) { return __builtin_isfinite (x); }
Before:
    fabs s0, s0
    mov w0, 2139095039
    fmov s31, w0
    fcmp s0, s31
    cset w0, hi
    eor w0, w0, 1
    ret
After:
    fmov w1, s0
    mov w0, -16777216
    cmp w0, w1, lsl 1
    cset w0, hi
    ret
gcc:
    PR middle-end/66462
    * config/aarch64/aarch64.md (isfinite<mode>2): Add new expander.
gcc/testsuite:
    PR middle-end/66462
    * gcc.target/aarch64/pr66462.c: Add tests for isfinite.
6 days | LoongArch: Fix the semantic of 16B CAS | Xi Ruoyao | 1 | -41/+63
In a CAS operation, even if expected != *memory, we still need to do an atomic load of *memory into output. But I made a mistake in the initial implementation, causing the output to contain junk in this situation. Like a normal atomic load, the atomic load embedded in the CAS semantic is required to work on a read-only page. Thus we cannot rely on sc.q to ensure the atomicity of the load. Use LSX to perform the load instead, and also use LSX to compare the 16B values to keep the ll-sc loop body short.
gcc/ChangeLog:
    * config/loongarch/sync.md (atomic_compare_and_swapti_scq): Require LSX. Change the operands for the output, the memory, and the expected value to LSX vector modes. Add an FCCmode output to indicate if CAS has written the desired value into memory. Use LSX to atomically load both words of the 16B value in memory.
    (atomic_compare_and_swapti): Pun the modes to satisfy the new atomic_compare_and_swapti_scq implementation. Read the bool return value from the FCC instead of performing a comparison.
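The required semantic can be stated as a short, deliberately non-atomic C model. The output always receives the value that was observed in memory, whether or not the swap happens. The 16-byte struct and the function name are hypothetical stand-ins for the TImode value and the md pattern.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128_pair;  /* hypothetical 16-byte value */

/* Non-atomic model of strong CAS: returns true iff desired was stored. */
static bool cas16_model (u128_pair *mem, u128_pair expected,
                         u128_pair desired, u128_pair *output)
{
  u128_pair observed = *mem;   /* the embedded load (atomic in the real code) */
  *output = observed;          /* written on success AND on failure */
  if (observed.lo == expected.lo && observed.hi == expected.hi)
    {
      *mem = desired;
      return true;
    }
  return false;
}
```

The failure path is the one the original implementation got wrong. The caller still needs the current memory contents there, for example to retry with an updated expected value.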
6 days | LoongArch: Fix the "%t" modifier handling for (const_int 0) | Xi Ruoyao | 1 | -2/+1
This modifier is intended to output $r0 for (const_int 0), but the logic: GET_MODE (op) != TImode || (op != CONST0_RTX (TImode) && code != REG) will reject (const_int 0) because (const_int 0) actually does not have a mode and GET_MODE will return VOIDmode for it. Use reg_or_0_operand instead to fix the issue. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_print_operand): Call reg_or_0_operand for checking the sanity of %t.
6 days | RISC-V: Suppress cross CC sibcall optimization from vector | Tsukasa OI | 1 | -0/+6
In general, tail call optimization requires that the callee's saved registers be a superset of the caller's. The Standard Vector Calling Convention Variant (assembler: .variant_cc) requires that a function with this calling convention preserve vector registers v1-v7 and v24-v31 across calls (i.e. they are callee-saved). However, the same registers are (function-local) temporary registers (i.e. caller-saved) in the normal (non-vector) calling convention. Even if a function with this calling convention variant calls another function with a non-vector calling convention, those vector registers are correctly clobbered, except when sibling (tail) call optimization occurs, because it violates the general rule mentioned above. If this happens, the following function body:
  1. Save v1-v7 and v24-v31 for clobbering
  2. Call another function with a non-vector calling convention (which may destroy v1-v7 and/or v24-v31)
  3. Restore v1-v7 and v24-v31
  4. Return.
may be incorrectly optimized into the following sequence:
  1. Save v1-v7 and v24-v31 for clobbering
  2. Restore v1-v7 and v24-v31 (?!)
  3. Jump to another function with a non-vector calling convention (which may destroy v1-v7 and/or v24-v31).
This commit suppresses cross calling convention sibling call optimization from the vector calling convention variant.
gcc/ChangeLog:
    * config/riscv/riscv.cc (riscv_function_ok_for_sibcall): Suppress cross calling convention sibcall optimization from the vector calling convention variant.
gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall.c: New test.
    * gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall-indirect-1.c: Ditto.
    * gcc.target/riscv/rvv/base/abi-call-variant_cc-sibcall-indirect-2.c: Ditto.
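The superset rule can be written as a bitmask check. This is a simplified model of the rule stated in the commit message, not the actual logic of riscv_function_ok_for_sibcall. The masks, the macro, and the function name are hypothetical. Bit i stands for vector register v<i>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bits lo..hi set (0 <= lo <= hi <= 31). */
#define VREG_RANGE(lo, hi) \
  ((uint32_t) ((((uint64_t) 1 << ((hi) + 1)) - 1) & ~(((uint64_t) 1 << (lo)) - 1)))

/* Vector registers each convention preserves across calls, per the text. */
static const uint32_t VECTOR_CC_PRESERVED = VREG_RANGE (1, 7) | VREG_RANGE (24, 31);
static const uint32_t NORMAL_CC_PRESERVED = 0;  /* all vector regs are temporaries */

/* After a sibcall the callee returns straight to our caller, so the callee
   must preserve every register the current function promised to preserve. */
static bool sibcall_ok (uint32_t caller_preserves, uint32_t callee_preserves)
{
  return (caller_preserves & ~callee_preserves) == 0;
}
```

Under this model, a vector-CC function tail-calling a normal-CC function fails the check. The reverse direction, and vector-CC to vector-CC, pass.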
6 days | RISC-V: Add min/max patterns for ifcvt. | Robin Dapp | 3 | -3/+21
ifcvt likes to emit
    (set (reg) (if_then_else (ge (reg 1) (reg 2)) (reg 1) (reg 2)))
which can be recognized as min/max patterns in the backend. This patch adds such patterns and the respective iterators, as well as a test.
gcc/ChangeLog:
    * config/riscv/bitmanip.md (*<bitmanip_minmax_cmp_insn>_cmp_<mode>3): New min/max ifcvt pattern.
    * config/riscv/iterators.md (minu): New iterator.
    * config/riscv/riscv.cc (riscv_noce_conversion_profitable_p): Remove riscv-specific adjustment.
gcc/testsuite/ChangeLog:
    * gcc.target/riscv/zbb-min-max-04.c: New test.
6 days | RISC-V: Fix can_find_related_mode_p for VLS types | Kito Cheng | 2 | -1/+158
can_find_related_mode_p incorrectly handled VLS (Vector Length Specific) types by using TARGET_MIN_VLEN directly, which is in bits, instead of converting it to bytes as required. This patch fixes the issue by dividing TARGET_MIN_VLEN by 8 to convert from bits to bytes when calculating the number of units for VLS modes. The fix enables proper vectorization for several test cases: - zve32f-1.c: Now correctly finds vector mode for SF mode in foo3, enabling vectorization of an additional loop. - zve32f_zvl256b-1.c and zve32x_zvl256b-1.c: Added -mrvv-max-lmul=m2 option to handle V8SI[2] (vector array mode) requirements during vectorizer analysis, which needs V16SI to pass, and V16SI was enabled incorrectly before. Changes since V4: - Fix testsuite, also triaged why changed. gcc/ChangeLog: * config/riscv/riscv-selftests.cc (riscv_run_selftests): Call run_vectorize_related_mode_selftests. (test_vectorize_related_mode): New function to test vectorize_related_mode behavior. (run_vectorize_related_mode_selftests): New function to run all vectorize_related_mode tests. (run_vectorize_related_mode_vla_selftests): New function to test VLA modes. (run_vectorize_related_mode_vls_rv64gcv_selftests): New function to test VLS modes on rv64gcv. (run_vectorize_related_mode_vls_rv32gc_zve32x_zvl256b_selftests): New function to test VLS modes on rv32gc_zve32x_zvl256b. (run_vectorize_related_mode_vls_selftests): New function to run all VLS mode tests. * config/riscv/riscv-v.cc (can_find_related_mode_p): Fix VLS type handling by converting TARGET_MIN_VLEN from bits to bytes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/zve32f-1.c: Update expected vectorization count from 2 to 3. * gcc.target/riscv/rvv/autovec/zve32f_zvl256b-1.c: Add -mrvv-max-lmul=m2 option. * gcc.target/riscv/rvv/autovec/zve32x_zvl256b-1.c: Add -mrvv-max-lmul=m2 option.
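The unit mistake is easy to see numerically. This sketch isolates only the bits-to-bytes conversion and ignores whatever other factors the real can_find_related_mode_p considers. The helper names are hypothetical, and `min_vlen_bits` plays the role of TARGET_MIN_VLEN.

```c
/* Elements of elem_bytes each that fit in one minimum-length vector. */
static unsigned vls_nunits (unsigned min_vlen_bits, unsigned elem_bytes)
{
  unsigned vlen_bytes = min_vlen_bits / 8;  /* the fix: convert bits to bytes */
  return vlen_bytes / elem_bytes;
}

/* Pre-fix behaviour: the bit count is used as if it were a byte count,
   overstating the number of units by a factor of 8. */
static unsigned vls_nunits_buggy (unsigned min_vlen_bits, unsigned elem_bytes)
{
  return min_vlen_bits / elem_bytes;
}
```

For example, with a 128-bit minimum VLEN and 4-byte SImode elements, the corrected calculation gives 4 units. Dividing the bit count directly gives 32.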
7 days | xtensa: Correct a typo | Takayuki 'January June' Suwa | 1 | -1/+1
That was introduced in commit 42db504c2fb2b460e24ca0e73b8559fc33b3cf33, over 15 years ago! gcc/ChangeLog: * config/xtensa/xtensa.h (TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P): Change "Xtrnase" in the comment to "Xtensa".
7 days | RISC-V: Fix typo in tt-ascalon-d8's pipeline description [PR121878] | Peter Bergner | 1 | -6/+6
PR121878 shows a typo in tt-ascalon-d8's pipeline description that leads to an ICE. The problem is that the vector define_insn_reservation patterns test for scalar modes rather than vector modes, meaning the insns don't get handled correctly. We could correct the modes, but given that we could have multiple VLEN values, the number of modes we'd have to check can be large, and mode iterators are not allowed in the mode attribute check. Instead, I've removed the mode check and replaced it with a test of the Selected Element Width (SEW).
2025-09-09 Peter Bergner <bergner@tenstorrent.com>
gcc/
    PR target/121878
    * config/riscv/tt-ascalon-d8.md (tt_ascalon_d8_vec_idiv_half): Test the Selected Element Width (SEW) rather than the mode.
    (tt_ascalon_d8_vec_idiv_single): Likewise.
    (tt_ascalon_d8_vec_idiv_double): Likewise.
    (tt_ascalon_d8_vec_float_divsqrt_half): Likewise.
    (tt_ascalon_d8_vec_float_divsqrt_single): Likewise.
    (tt_ascalon_d8_vec_float_divsqrt_double): Likewise.
gcc/testsuite/
    PR target/121878
    * gcc.target/riscv/pr121878.c: New test.
Signed-off-by: Peter Bergner <bergner@tenstorrent.com>
7 days | RISC-V: Add pattern for vector-scalar single widening floating-point sub | Paul-Antoine Arras | 1 | -0/+22
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a minus RTL instruction. The other minus operand is already wide. Before this patch, we have four instructions, e.g.: fcvt.d.s fa0,fa0 vsetvli a5,zero,e64,m1,ta,ma vfmv.v.f v2,fa0 vfsub.vv v1,v1,v2 After, we get only one: vfwsub.wf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwsub_wf_<mode>): New pattern to combine float_extend + vec_duplicate + vfsub.vv into vfwsub.wf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwsub.wf. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h (DEF_VF_BINOP_WIDEN_CASE_2, DEF_VF_BINOP_WIDEN_CASE_3): Swap operands. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-2-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-2-f32.c: New test.
7 days | s390: Implement clz and ctz for SI mode | Juergen Christ | 2 | -2/+25
To properly implement __builtin_ffs for SI mode, implement clz and (for >= z17) ctz for SI mode. Otherwise, gcc falls back to a libcall which causes problems for Linux kernel code. Also adjust the C?Z_DEFINED_VALUE_AT_ZERO macros to return 2. Since the optabs now return exactly the value set by these macros, return value 2 is more appropriate and leads to better code. gcc/ChangeLog: * config/s390/s390.h (CLZ_DEFINED_VALUE_AT_ZERO): Adjust and return 2. (CTZ_DEFINED_VALUE_AT_ZERO): Return 2. * config/s390/s390.md (clzsi2): Implement. (ctzsi2): Implement. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr109011-2.c: Fix expected outcome. * gcc.dg/vect/pr109011-4.c: Fix expected outcome. * gcc.target/s390/ffs-1.c: New test. Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
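One common identity shows why a native SI-mode clz is enough to open-code ffs without a libcall: isolate the lowest set bit with `x & -x`, then turn its position into a 1-based index. This sketch defines clz32 (0) as 32 so that ffs (0) comes out as 0 with no special case. Whether s390 uses that exact value at zero is not stated in the commit. The portable loop stands in for the hardware instruction, and the function names are hypothetical.

```c
#include <stdint.h>

/* Count leading zeros, defined at zero: clz32 (0) == 32. */
static unsigned clz32 (uint32_t x)
{
  unsigned n = 0;
  for (uint32_t bit = UINT32_C (1) << 31; bit != 0 && !(x & bit); bit >>= 1)
    n++;
  return n;
}

/* ffs: 1-based index of the least significant set bit, 0 if x == 0. */
static int ffs32 (uint32_t x)
{
  uint32_t lowest = x & (0u - x);  /* isolate the lowest set bit */
  return 32 - (int) clz32 (lowest);
}
```

With ctz available (for >= z17, per the commit), the equivalent form is `x ? ctz (x) + 1 : 0`.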
7 days | s390: fix vec_extract_plus define insn | Maximilian Immanuel Brandtner | 1 | -6/+21
Because of a wrong define_insn for vec_extract_plus, a vector access wasn't combined with a preceding plus operation that set the offset at which to perform the vector access, even though the instruction offers that capability. Bootstrapped and regtested on s390x.
gcc/ChangeLog:
    * config/s390/vector.md (*vec_extract<mode>_plus_zero_extend): Fix define insn.
gcc/testsuite/ChangeLog:
    * gcc.target/s390/vector/vec-extract-3.c: New test.
Signed-off-by: Maximilian Immanuel Brandtner <maxbr@linux.ibm.com>
8 days | amdgcn: fix GFX10/GFX11 VGPR counts | Andrew Stubbs | 2 | -21/+17
The previous definition had all the GFX11 register counts doubled to fix a bug that was encountered in early testing. This seems to have been a misunderstanding of the problem (which is no longer reproducible). gcc/ChangeLog: * config/gcn/gcn-devices.def: Correct the Max ISA VGPRs counts for GFX10 and GFX11 devices. * config/gcn/gcn.cc (gcn_hsa_declare_function_name): Remove the wave64 VGPR count fudge.
8 days | amdgcn: fix builtin codegen at -O0 | Andrew Stubbs | 1 | -1/+25
Fix an unrecognised insn ICE that only shows while using builtins at -O0. gcc/ChangeLog: * config/gcn/gcn.cc (gcn_expand_builtin_1): Enable the "mode" parameter and ensure that "target" is a register for most of the builtins.
8 days | RISC-V: Add pattern for vector-scalar dual widening floating-point sub | Paul-Antoine Arras | 1 | -0/+23
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a minus RTL instruction. Both minus operands are widened. Before this patch, we have six instructions, e.g.: fcvt.d.s fa0,fa0 vsetvli a5,zero,e64,m1,ta,ma vfmv.v.f v3,fa0 vfwcvt.f.f.v v1,v2 vsetvli zero,zero,e64,m1,ta,ma vfsub.vv v1,v1,v3 After, we get only one: vfwsub.vf v1,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwsub_vf_<mode>): New pattern to combine float_extend + vec_duplicate + vfwsub.vv into vfwsub.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwsub.vf. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h (DEF_VF_BINOP_WIDEN_CASE_0, DEF_VF_BINOP_WIDEN_CASE_1): Swap operands. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwsub-run-1-f32.c: New test.
8 days | RISC-V: Add pattern for vector-scalar single widening floating-point add | Paul-Antoine Arras | 2 | -4/+26
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a plus RTL instruction. The other plus operand is already wide. Before this patch, we have four instructions, e.g.: fcvt.d.s fa0,fa0 vsetvli a5,zero,e64,m1,ta,ma vfmv.v.f v2,fa0 vfadd.vv v1,v1,v2 After, we get only one: vfwadd.wf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwadd_wf_<mode>): New pattern to combine float_extend + vec_duplicate + vfadd.vv into vfwadd.wf. * config/riscv/vector.md (@pred_single_widen_<plus_minus:optab><mode>_scalar): Swap and reorder operands to match the RTL emitted by expand. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwadd.wf. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for single widening variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: Add support for single widening variants. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-2-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-2-f32.c: New test.
8 days | Revert "aarch64: Handle DImode BCAX operations" | Kyrylo Tkachov | 1 | -29/+0
This reverts commit 1b7bcac0327ccd84f1966c748f4d1aedef64a9c5. PR target/121785 gcc/ * config/aarch64/aarch64-simd.md (*bcaxqdi4): Delete. gcc/testsuite/ * gcc.target/aarch64/simd/bcax_d.c: Remove tests for DImode arguments.
8 days | x86: Enable SSE4.1 ceil/floor/trunc for -Os | H.J. Lu | 1 | -5/+4
Enable SSE4.1 ceil/floor/trunc for -Os to replace a function call with roundss or roundsd by dropping the !flag_trapping_math check. gcc/ PR target/121861 * config/i386/i386.cc (ix86_optab_supported_p): Drop !flag_trapping_math check for floor_optab, ceil_optab and btrunc_optab. gcc/testsuite/ PR target/121861 * gcc.target/i386/pr121861-1a.c: New file. * gcc.target/i386/pr121861-1b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
9 days | Use vpermil{ps,pd} instead of vperm{d,q} when permutation is in-lane. | liuhongt | 3 | -10/+31
gcc/ChangeLog: * config/i386/i386-expand.cc (expand_vec_perm_vpermil): Extend to handle V8SImode. * config/i386/i386.cc (avx_vpermilp_parallel): Extend to handle vector integer modes with same vector size and same component size. * config/i386/sse.md (<sse2_avx_avx512f>_vpermilp<mode><mask_name>): Ditto. (V48_AVX): New mode iterator. (ssefltmodesuffix): Extend for V16SI/V8DI/V16SF/V8DF. gcc/testsuite/ChangeLog: * gcc.target/i386/avx256_avoid_vec_perm-3.c: New test. * gcc.target/i386/avx256_avoid_vec_perm-4.c: New test. * gcc.target/i386/avx512bw-vpalignr-4.c: Adjust testcase. * gcc.target/i386/avx512vl-vpalignr-4.c: Ditto.
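The "in-lane" condition can be sketched as a check on the permutation indices. On a 256-bit vector, each result element must come from the same 128-bit lane it lands in. This is a necessary condition under that assumption, not a copy of avx_vpermilp_parallel, which may add further constraints (for instance on how the immediate encodes both lanes). The function name is hypothetical.

```c
#include <stdbool.h>

/* nelt elements, elems_per_lane of them per 128-bit lane
   (4 for 32-bit elements, 2 for 64-bit elements). */
static bool perm_is_in_lane (const unsigned *idx, unsigned nelt,
                             unsigned elems_per_lane)
{
  for (unsigned i = 0; i < nelt; i++)
    if (idx[i] / elems_per_lane != i / elems_per_lane)
      return false;  /* source lane differs from destination lane */
  return true;
}
```

When this holds, an in-lane vpermilps/vpermilpd can stand in for a full cross-lane vpermd/vpermq.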
9 days | Exclude fake cross-lane permutation from avx256_avoid_vec_perm. | liuhongt | 1 | -2/+57
SLP may treat a broadcast as a kind of vec_perm; the patch checks the permutation index to exclude those false positives.
gcc/ChangeLog:
    * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Check permutation index for vec_perm, don't count it if we know it's not a cross-lane permutation.
gcc/testsuite/ChangeLog:
    * gcc.target/i386/avx256_avoid_vec_perm.c: Adjust testcase.
    * gcc.target/i386/avx256_avoid_vec_perm-2.c: New test.
    * gcc.target/i386/avx256_avoid_vec_perm-5.c: New test.
9 days | RISC-V: Add pattern for vector-scalar widening floating-point add | Paul-Antoine Arras | 1 | -0/+23
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a plus RTL instruction. Before this patch, we have four instructions, e.g.: fcvt.d.s fa0,fa0 vsetvli a5,zero,e64,m1,ta,ma vfmv.v.f v3,fa0 vfwadd.wv v1,v3,v2 After, we get only one: vfwadd.vf v1,v2,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwadd_vf_<mode>): New pattern to combine float_extend + vec_duplicate + vfwadd.vv into vfwadd.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwadd. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h (DEF_VF_BINOP_WIDEN_CASE_0): Fix OP. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwadd-run-1-f32.c: New test.
9 days | RISC-V: Adjust tt-ascalon-d8 branch cost | Anton Blanchard | 1 | -1/+1
If-conversion isn't being applied to this nbench code:
    #include <stdint.h>
    #define INTERNAL_FPF_PRECISION 4
    typedef uint16_t u16;

    void ShiftMantLeft1(u16 *carry, u16 *mantissa)
    {
      int i;
      int new_carry;
      u16 accum;

      for(i=INTERNAL_FPF_PRECISION-1;i>=0;i--) {
        accum=mantissa[i];
        new_carry=accum & 0x8000;
        accum=accum<<1;
        if(*carry)
          accum|=1;
        *carry=new_carry;
        mantissa[i]=accum;
      }
      return;
    }
Bumping branch_cost from 3 to 4 triggers if-conversion, significantly improving the nbench FP EMULATION result on Ascalon. There's a risk that more aggressive use of conditional zero instructions will negatively impact workloads whose branches predict well, but we haven't seen anything obvious.
gcc/ChangeLog:
    * config/riscv/riscv.cc (tt_ascalon_d8_tune_info): Increase branch_cost from 3 to 4.
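For illustration, here is a branch-free C equivalent of ShiftMantLeft1, the kind of code that if-conversion aims to produce. The conditional `if (*carry) accum |= 1` becomes an unconditional OR with `(*carry != 0)`. The function name is hypothetical, and this is hand-written C, not GCC's actual output.

```c
#include <stdint.h>

typedef uint16_t u16;
#define INTERNAL_FPF_PRECISION 4

/* Shift a multi-word mantissa (mantissa[0] most significant) left by one,
   propagating the carry from word to word without a data-dependent branch. */
static void ShiftMantLeft1_branchless (u16 *carry, u16 *mantissa)
{
  for (int i = INTERNAL_FPF_PRECISION - 1; i >= 0; i--)
    {
      u16 accum = mantissa[i];
      u16 new_carry = accum & 0x8000;                  /* bit shifted out */
      accum = (u16) ((accum << 1) | (*carry != 0));    /* bit shifted in */
      *carry = new_carry;
      mantissa[i] = accum;
    }
}
```

Whether the branchy or branch-free form is faster depends on how predictable `*carry` is. That is the trade-off the branch_cost bump accepts.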
9 days | RISC-V: Add pattern for vector-scalar single-width floating-point reverse sub | Paul-Antoine Arras | 1 | -0/+20
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a minus RTL instruction. The vec_duplicate is the minuend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfsub.vv v1,v2,v1 After, we get only one: vfrsub.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfrsub_vf_<mode>): New pattern to combine vec_duplicate + vfsub.vv into vfrsub.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfrsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for vfrsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfrsub-run-1-f64.c: New test.
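Operand order is what separates vfrsub.vf from vfsub.vf. Per the RVV specification, `vfrsub.vf vd, vs2, rs1` computes rs1 - vs2[i], with the scalar as the minuend. `vfsub.vf` computes vs2[i] - rs1. Here is a scalar C model of both, with hypothetical function names.

```c
#include <stddef.h>

/* vfrsub.vf vd, vs2, rs1: vd[i] = rs1 - vs2[i] (scalar is the minuend). */
static void vfrsub_vf_model (float *vd, const float *vs2, float rs1, size_t n)
{
  for (size_t i = 0; i < n; i++)
    vd[i] = rs1 - vs2[i];
}

/* vfsub.vf vd, vs2, rs1: vd[i] = vs2[i] - rs1 (scalar is the subtrahend). */
static void vfsub_vf_model (float *vd, const float *vs2, float rs1, size_t n)
{
  for (size_t i = 0; i < n; i++)
    vd[i] = vs2[i] - rs1;
}
```

In the "before" sequence, `vfsub.vv v1,v2,v1` with v2 holding the broadcast of fa0 computes fa0 - v1, which is exactly the reverse form.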
9 days | RISC-V: Add pattern for vector-scalar single-width floating-point sub | Paul-Antoine Arras | 2 | -6/+25
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a minus RTL instruction. The vec_duplicate is the subtrahend operand. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfsub.vv v1,v1,v2 After, we get only one: vfsub.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfsub_vf_<mode>): New pattern to combine vec_duplicate + vfsub.vv into vfsub.vf. * config/riscv/vector.md (@pred_<optab><mode>_scalar): Allow VLS modes. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/floating-point-sub-2.c: Adjust scan dumps. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for vfsub. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfsub-run-1-f64.c: New test.
9 days | RISC-V: Add pattern for vector-scalar single-width floating-point add | Paul-Antoine Arras | 1 | -0/+19
This pattern enables the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into a plus RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v2,fa0 vfadd.vv v1,v1,v2 After, we get only one: vfadd.vf v1,v1,fa0 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfadd_vf_<mode>): New pattern to combine vec_duplicate + vfadd.vv into vfadd.vf. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/floating-point-add-2.c: Adjust scan dump. * gcc.target/riscv/rvv/autovec/vls/floating-point-add-3.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/floating-point-sub-3.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfadd. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f64.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_data.h: Add data for vfadd. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfadd-run-1-f64.c: New test.
9 days | RISC-V: Add pattern for vector-scalar widening floating-point multiply | Paul-Antoine Arras | 2 | -2/+25
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a mult RTL instruction.

Before this patch, we have six instructions, e.g.:

    fcvt.d.s fa0,fa0
    vsetvli a5,zero,e64,m1,ta,ma
    vfmv.v.f v3,fa0
    vfwcvt.f.f.v v1,v2
    vsetvli zero,zero,e64,m1,ta,ma
    vfmul.vv v1,v3,v1

After, we get only one:

    vfwmul.vf v1,v2,fa0

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*vfwmul_vf_<mode>): New pattern to combine float_extend + vec_duplicate + vfmul.vv into vfwmul.vf.
	* config/riscv/vector.md (*@pred_dual_widen_<optab><mode>_scalar): Swap operands to match the RTL emitted by expand, i.e. first float_extend then vec_duplicate.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwmul.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop.h: Add support for widening variants.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_binop_widen_run.h: New test helper.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmul-run-1-f16.c: New test.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwmul-run-1-f32.c: New test.
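The six-instruction sequence widens the scalar (fcvt.d.s), splats it, widens the vector (vfwcvt.f.f.v) and multiplies at the wide type. A minimal scalar C sketch of that semantics, assuming f32 sources and f64 results (the function name is hypothetical), shows what the single vfwmul.vf has to reproduce: both operands are extended before the multiply, so no intermediate f32 rounding happens.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference model of vfwmul.vf for f32 -> f64: the scalar
   and each element are widened (float_extend) before the multiply. */
static void vfwmul_vf_ref(double *vd, const float *vs2, float fa0, size_t vl) {
    double s = (double)fa0;            /* float_extend of the scalar */
    for (size_t i = 0; i < vl; i++)
        vd[i] = (double)vs2[i] * s;    /* float_extend of each element, then mult */
}
```

Because a product of two floats always fits exactly in a double, the widened result can differ from a plain f32 multiply, as the checks below illustrate with (1 + 2^-23)^2.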
9 days | RISC-V: Add patterns for vector-scalar IEEE floating-point max | Paul-Antoine Arras | 1 | -6/+8
These patterns enable the combine pass (or late-combine, depending on the case) to merge a vec_duplicate into an unspec_vfmax RTL instruction.

Before this patch, we have two instructions, e.g.:

    vfmv.v.f v2,fa0
    vfmax.vv v1,v2,v1

After, we get only one:

    vfmax.vf v1,v1,fa0

In some cases, it also shaves off one vsetvli.

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*vfmin_vf_ieee_<mode>): Rename into...
	(*v<ieee_fmaxmin_op>_vf_<mode>): New pattern to combine vec_duplicate + vf{max,min}.vv (unspec) into vf{max,min}.vf.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f16.c: Add vfmax.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-5-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-6-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f16.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-7-f64.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f16.c: Add vfmax. Also add missing -fno-fast-math.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f32.c: Likewise.
	* gcc.target/riscv/rvv/autovec/vx_vf/vf-8-f64.c: Likewise.
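The "IEEE" variants matter because RISC-V fmax semantics differ from a naive `a > b ? a : b`: if exactly one operand is NaN the other operand is returned, and -0.0 is treated as less than +0.0. The sketch below models that per element, assuming f32 data; the names are hypothetical, and the both-NaN case simply returns a NaN rather than the canonical NaN the hardware produces.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical scalar model of the IEEE-style maximum used by vfmax:
   a lone NaN operand is ignored, and +0.0 wins over -0.0. */
static float max_num(float a, float b) {
    if (isnan(a)) return b;                  /* also covers "both NaN" (b is NaN) */
    if (isnan(b)) return a;
    if (a == b) return signbit(a) ? b : a;   /* prefer +0.0 over -0.0 */
    return a > b ? a : b;
}

/* Hypothetical model of vfmax.vf: the scalar is used directly instead
   of first being splatted into a vector register. */
static void vfmax_vf_ref(float *vd, const float *vs2, float fa0, size_t vl) {
    for (size_t i = 0; i < vl; i++)
        vd[i] = max_num(vs2[i], fa0);
}
```

This NaN and signed-zero behavior is why these patterns match the unspec form rather than a plain smax RTL code.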
10 days | RISC-V: Add support for the XAndesvdot ISA extension. | Kuan-Lin Chen | 8 | -3/+148
This extension defines vector instructions that calculate the signed/unsigned dot product of four SEW/4-bit data elements and accumulate the result into a SEW-bit element, for all elements in a vector register.

gcc/ChangeLog:

	* config/riscv/andes-vector-builtins-bases.cc (nds_vd4dot): New class.
	(nds_vd4dotsu): New class.
	* config/riscv/andes-vector-builtins-bases.h: New def.
	* config/riscv/andes-vector-builtins-functions.def (nds_vd4dots): Ditto.
	(nds_vd4dotsu): Ditto.
	(nds_vd4dotu): Ditto.
	* config/riscv/andes-vector.md (@pred_nds_vd4dot<su><mode>): New pattern.
	(@pred_nds_vd4dotsu<mode>): New pattern.
	* config/riscv/genrvv-type-indexer.cc (main): Modify SEW of QUAD_FIX, QUAD_FIX_SIGNED and QUAD_FIX_UNSIGNED.
	* config/riscv/riscv-vector-builtins.cc (qexti_vvvv_ops): New operand information.
	(qexti_su_vvvv_ops): New operand information.
	(qextu_vvvv_ops): New operand information.
	* config/riscv/riscv-vector-builtins.h (XANDESVDOT_EXT): New def.
	(required_ext_to_isa_name): Add case XANDESVDOT_EXT.
	(required_extensions_specified): Ditto.
	(struct function_group_info): Ditto.
	* config/riscv/vector-iterators.md (NDS_QUAD_FIX): New iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dots.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotsu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vd4dotu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dots.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotsu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vd4dotu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dots.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotsu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vd4dotu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dots.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotsu.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vd4dotu.c: New test.
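A minimal scalar sketch of the arithmetic, assuming SEW = 32 (so each element holds four 8-bit values) and assuming byte lane 0 is the least significant byte. The helper names are hypothetical; for the mixed signed/unsigned form, which operand is treated as signed is also an assumption here. Sums are accumulated in uint32_t so that overflow wraps modulo 2^32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte lane j (0 = least significant) of a 32-bit element. */
static uint32_t lane_u8(uint32_t x, int j) { return (x >> (8 * j)) & 0xFFu; }
static int32_t lane_s8(uint32_t x, int j) {
    int32_t v = (int32_t)lane_u8(x, j);
    return v >= 128 ? v - 256 : v;          /* sign-extend the 8-bit lane */
}

/* signed x signed: four products added to the 32-bit accumulator */
static uint32_t dot4_ss(uint32_t acc, uint32_t a, uint32_t b) {
    for (int j = 0; j < 4; j++)
        acc += (uint32_t)(lane_s8(a, j) * lane_s8(b, j));
    return acc;
}

/* unsigned x unsigned */
static uint32_t dot4_uu(uint32_t acc, uint32_t a, uint32_t b) {
    for (int j = 0; j < 4; j++)
        acc += lane_u8(a, j) * lane_u8(b, j);
    return acc;
}

/* signed (a) x unsigned (b): operand roles are an assumption */
static uint32_t dot4_su(uint32_t acc, uint32_t a, uint32_t b) {
    for (int j = 0; j < 4; j++)
        acc += (uint32_t)(lane_s8(a, j) * (int32_t)lane_u8(b, j));
    return acc;
}

/* Hypothetical vector form: one dot4 per element, accumulated into vd. */
static void vd4dots_ref(uint32_t *vd, const uint32_t *vs1,
                        const uint32_t *vs2, size_t vl) {
    for (size_t i = 0; i < vl; i++)
        vd[i] = dot4_ss(vd[i], vs1[i], vs2[i]);
}
```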
10 days | [RISC-V] Fix ordering of pipeline models | Jeff Law | 1 | -1/+1
I missed during review that the new Ascalon pipeline description was put into the wrong place. The net result is that tests which explicitly use generic-ooo for stable test output ended up getting a different pipeline model, and different codegen, than they expected. This tripped a small number of vsetvl failures in the testsuite.

This has spun on riscv64-elf and riscv32-elf in my tester and fixes the regression. I'm going to go ahead and push it as I'm likely offline this afternoon/evening and don't want anyone else to waste their time chasing the regression down.

gcc/

	* config/riscv/riscv-opts.h (riscv_microarchitecture_type): Fix ordering.
10 days | RISC-V: Add support for the XAndesvpackfph ISA extension. | Kuan-Lin Chen | 9 | -0/+113
This extension defines vector instructions that extract a pair of FP16 values from a floating-point register, multiply the top FP16 value with the FP16 vector elements, and add the bottom FP16 value to the result.

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc: Turn on VECTOR_ELEN_FP_16 for XAndesvpackfph.
	* config/riscv/andes-vector-builtins-bases.cc (nds_vfpmad): New class.
	* config/riscv/andes-vector-builtins-bases.h: New def.
	* config/riscv/andes-vector-builtins-functions.def (nds_vfpmadt): Ditto.
	(nds_vfpmadb): Ditto.
	(nds_vfpmadt_frm): Ditto.
	(nds_vfpmadb_frm): Ditto.
	* config/riscv/andes-vector.md (@pred_nds_vfpmad<nds_tb><mode>): New pattern.
	* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_F16_OPS): New def.
	* config/riscv/riscv-vector-builtins.cc (f16_ops): Ditto.
	* config/riscv/riscv-vector-builtins.def (float32_type_node): Ditto.
	* config/riscv/riscv-vector-builtins.h (XANDESVPACKFPH_EXT): Ditto.
	(required_ext_to_isa_name): Add case XANDESVPACKFPH_EXT.
	(required_extensions_specified): Ditto.
	* config/riscv/vector-iterators.md (VHF): New iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadb.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfpmadt.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadb.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfpmadt.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadb.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfpmadt.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadb.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfpmadt.c: New test.
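A rough scalar sketch of the described arithmetic, modeling only what the text states (top value multiplies, bottom value is added). Assumptions, all hypothetical: the FP16 pair sits in the low 32 bits of the register as bits [31:16] = top and [15:0] = bottom; the helper names are invented; and results are kept in float, whereas the real instruction rounds them to FP16 (the _frm builtins suggest a selectable rounding mode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE-754 binary16 bit pattern into a float (exact). */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float f;
    if (exp == 0) {
        /* Zero or subnormal: the value is mant * 2^-24, exact in float. */
        f = (float)mant / 16777216.0f;
        return sign ? -f : f;
    }
    if (exp == 31)
        bits = sign | 0x7F800000u | (mant << 13);            /* Inf or NaN */
    else
        bits = sign | ((exp + 112u) << 23) | (mant << 13);   /* rebias 15 -> 127 */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical model: vd[i] = vs2[i] * top + bottom, with the FP16 pair
   assumed to be packed as [31:16] = top, [15:0] = bottom. */
static void vfpmad_top_ref(float *vd, const uint16_t *vs2, uint32_t pair,
                           size_t vl) {
    float top = half_to_float((uint16_t)(pair >> 16));
    float bottom = half_to_float((uint16_t)(pair & 0xFFFFu));
    for (size_t i = 0; i < vl; i++)
        vd[i] = half_to_float(vs2[i]) * top + bottom;
}
```

In the example checks, 0x4000 and 0x3800 are the FP16 encodings of 2.0 and 0.5, so the pair 0x40003800 means "multiply by 2.0, then add 0.5".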
10 days | AVR: ad target/121794 - Invoke zero_reg less. | Georg-Johann Lay | 1 | -5/+5
gcc/
	PR target/121794
	* config/avr/avr.md (cmpqi3): Use cpi R,0 if possible.
10 days | RISC-V: Combine vec_duplicate + vnmsub.vv to vnmsub.vx on GR2VR cost | Pan Li | 2 | -27/+31
This patch combines vec_duplicate + vnmsub.vv into vnmsub.vx, as in the example below. Whether the combination happens depends on the cost of a vec_duplicate from GR2VR: late-combine performs it when the GR2VR cost is zero and rejects it when the GR2VR cost is greater than zero.

Assume the example below, with a GR2VR cost of 0.

Before this patch:

    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    ...
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    ...
    vnmsub.vv v1,v2,v3
    ...
    bne a3,zero,.L3

After this patch:

    beq a3,zero,.L8
    ...
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    ...
    vnmsub.vx v1,a2,v3
    ...
    bne a3,zero,.L3

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*vnmsac_vx_<mode>): Rename from.
	(*mul_minus_vx_<mode>): Rename to and add nmsub support.
	* config/riscv/vector.md (@pred_vnmsac_vx_<mode>): Rename from.
	(@pred_mul_minus_vx_<mode>): Rename to and add nmsub support.
	(*pred_nmsac_<mode>_scalar_undef): Rename from.
	(*pred_mul_minus_vx<mode>_undef): Rename to and add nmsub support.

Signed-off-by: Pan Li <pan2.li@intel.com>
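The rename to *mul_minus_vx_<mode> makes sense once the two RVV forms are written side by side: both compute "addend - (scalar * multiplicand)" and differ only in which vector operand is overwritten. A scalar sketch following the RVV 1.0 definitions, assuming e32 elements (the function names are hypothetical; uint32_t keeps overflow well defined):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vnmsac.vx vd, rs1, vs2:  vd[i] = -(x[rs1] * vs2[i]) + vd[i]
   (vd is the addend and is overwritten) */
static void vnmsac_vx_ref(uint32_t *vd, uint32_t rs1, const uint32_t *vs2,
                          size_t vl) {
    for (size_t i = 0; i < vl; i++)
        vd[i] = vd[i] - rs1 * vs2[i];
}

/* vnmsub.vx vd, rs1, vs2:  vd[i] = -(x[rs1] * vd[i]) + vs2[i]
   (vd is the multiplicand and is overwritten) */
static void vnmsub_vx_ref(uint32_t *vd, uint32_t rs1, const uint32_t *vs2,
                          size_t vl) {
    for (size_t i = 0; i < vl; i++)
        vd[i] = vs2[i] - rs1 * vd[i];
}
```

Register allocation then picks whichever form avoids an extra copy, which is why one pattern can serve both.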
11 days | RISC-V: Add support for the XAndesvsintload ISA extension. | Kuan-Lin Chen | 9 | -1/+136
This extension defines vector load instructions to move sign-extended or zero-extended INT4 data into 8-bit vector register elements.

gcc/ChangeLog:

	* config/riscv/andes-vector-builtins-bases.cc (nds_nibbleload): New class.
	* config/riscv/andes-vector-builtins-bases.h (nds_vln8): New def.
	(nds_vlnu8): Ditto.
	* config/riscv/andes-vector-builtins-functions.def (nds_vln8): Ditto.
	(nds_vlnu8): Ditto.
	* config/riscv/andes-vector.md (@pred_intload_mov<su><mode>): New pattern.
	* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_Q_OPS): New def.
	(DEF_RVV_QU_OPS): Ditto.
	* config/riscv/riscv-vector-builtins.cc (q_v_void_const_ptr_ops): New operand information.
	(qu_v_void_const_ptr_ops): Ditto.
	* config/riscv/riscv-vector-builtins.def (void_const_ptr): New def.
	* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
	(required_ext_to_isa_name): Add case XANDESVSINTLOAD_EXT.
	(required_extensions_specified): Ditto.
	* config/riscv/vector-iterators.md (NDS_QVI): New iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vln8.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vln8.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vln8.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vln8.c: New test.
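A minimal sketch of what such a nibble load does, assuming two INT4 values per byte in memory and assuming the low nibble of each byte maps to the lower-numbered element. Both assumptions, and the function names, are hypothetical. Reading nds_vln8 as the sign-extending form and nds_vlnu8 as the zero-extending form is inferred from the names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sign-extending nibble load: n INT4 values packed two per
   byte are expanded into n int8_t elements.  ASSUMPTION: the low nibble
   of each byte maps to the lower-numbered element. */
static void nibble_load_s8(int8_t *vd, const uint8_t *mem, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned nib = (mem[i / 2] >> (4 * (i % 2))) & 0xFu;
        vd[i] = (int8_t)(nib >= 8 ? (int)nib - 16 : (int)nib);  /* sign-extend */
    }
}

/* Hypothetical zero-extending nibble load, same layout assumption. */
static void nibble_load_u8(uint8_t *vd, const uint8_t *mem, size_t n) {
    for (size_t i = 0; i < n; i++)
        vd[i] = (uint8_t)((mem[i / 2] >> (4 * (i % 2))) & 0xFu); /* zero-extend */
}
```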
11 days | RISC-V: Add support for the XAndesvbfhcvt ISA extension. | Kuan-Lin Chen | 12 | -1/+344
This patch adds support for the XAndesvbfhcvt ISA extension. This extension defines instructions to perform vector floating-point conversion between BFLOAT16 floating-point data and IEEE-754 32-bit single-precision (SP) floating-point data in a vector register.

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc: Turn on VECTOR_ELEN_BF_16 for XAndesvbfhcvt.
	* config.gcc: Add extra_objs andes-vector-builtins-bases.o and extra_headers andes_vector.h.
	* config/riscv/riscv-vector-builtins-shapes.cc (BASE_NAME_MAX_LEN): Increase size to 20.
	* config/riscv/riscv-vector-builtins.cc (f32_to_bf16_nf_w_ops): New operand information.
	(f32_to_bf16_nf_w_ops): New operand information.
	(DEF_RVV_FUNCTION): New def.
	* config/riscv/riscv-vector-builtins.def (bf16): Ditto.
	* config/riscv/riscv-vector-builtins.h (enum required_ext): Ditto.
	(required_ext_to_isa_name): Add case XANDESVBFHCVT_EXT.
	(required_extensions_specified): Ditto.
	* config/riscv/t-riscv: Add andes-vector-builtins-functions.def, andes-vector-builtins-bases.h and andes-vector-builtins-bases.o.
	* config/riscv/vector-iterators.md (NDS_VWEXTBF): New iterator.
	(NDS_V_DOUBLE_TRUNC_BF): New attr.
	* config/riscv/andes-vector-builtins-bases.cc: New file.
	* config/riscv/andes-vector-builtins-bases.h: New file.
	* config/riscv/andes-vector-builtins-functions.def: New file.
	* config/riscv/andes_vector.h: New file.
	* config/riscv/andes-vector.md: New file.
	* config/riscv/vector.md: Include andes-vector.md.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/rvv.exp: Add regression for xandesvector.
	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfncvtbf16s.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfncvtbf16s.c: New test.
	* gcc.target/riscv/rvv/xandesvector/non-policy/overloaded/nds_vfwcvtsbf16.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfncvtbf16s.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/non-overloaded/nds_vfwcvtsbf16.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfncvtbf16s.c: New test.
	* gcc.target/riscv/rvv/xandesvector/policy/overloaded/nds_vfwcvtsbf16.c: New test.
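BFLOAT16 is the upper 16 bits of an FP32 bit pattern, which is why the two conversion directions behave so differently. A scalar sketch of both directions follows. The narrowing helper assumes round-to-nearest-even and quietens NaNs; the commit does not state the rounding behavior of the Andes instructions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BF16 -> FP32 (widening): shift into the upper half; always exact. */
static float bf16_to_f32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* FP32 -> BF16 (narrowing).  ASSUMPTION: round-to-nearest-even; NaNs
   are kept quiet by setting the top mantissa bit. */
static uint16_t f32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)        /* NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    uint32_t lsb = (bits >> 16) & 1u;               /* ties go to even */
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}
```

Widening never loses information, while narrowing drops 16 mantissa bits and needs a rounding rule, as the tie cases in the checks show.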
11 days | RISC-V: Add tt-ascalon-d8 pipeline description | Anton Blanchard | 4 | -2/+357
Add a pipeline description for the Tenstorrent Ascalon 8-wide CPU.

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Update.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add tt_ascalon_d8.
	* config/riscv/riscv.md: Update tune attribute and include tt-ascalon-d8.md.
	* config/riscv/tt-ascalon-d8.md: New file.
12 days | RISC-V: Check if we can vec_extract [PR121510]. | Robin Dapp | 1 | -1/+2
For Zvfhmin, a vector mode exists but the corresponding vec_extract does not. This patch checks that a vec_extract is available and otherwise falls back to standard handling.

	PR target/121510

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_legitimize_move): Check if we can vec_extract.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/pr121510.c: New test.
12 days | AVR: target/121794 - Invoke zero_reg less. | Georg-Johann Lay | 1 | -27/+47
There are some cases where invoking zero_reg is not needed and where other sequences with the same efficiency exist. An example is using SBCI R,0 instead of SBC R,__zero_reg__ when R >= R16. This may turn out to be better for small ISRs.

	PR target/121794
gcc/
	* config/avr/avr.cc (avr_out_compare): Only use zero_reg when there is no other sequence of the same length.
	(avr_out_plus_ext): Same.
	(avr_out_plus_1): Same.
12 days | aarch64: Use SVE for V2DImode integer min/max operations | Kyrylo Tkachov | 3 | -6/+19
Unlike Advanced SIMD, SVE has instructions to perform smin, smax, umin and umax on 64-bit elements. Thus, we can use them with the fixed-width V2DImode expander. Most of the machinery is already there on the define_insn side, supporting V2DImode operands of the SVE pattern. We just need to wire up the RTL emission to the v2di standard names for the TARGET_SVE case.

So for the smin case we now generate:

min_di:
    ldr q30, [x0]
    ptrue p7.b, all
    ldr q31, [x1]
    smin z30.d, p7/m, z30.d, z31.d
    str q30, [x2]
    ret
min_imm_di:
    ldr q31, [x0]
    smin z31.d, z31.d, #5
    str q31, [x2]
    ret

instead of the previous:

min_di:
    ldr q30, [x0]
    ldr q31, [x1]
    cmgt v29.2d, v30.2d, v31.2d
    bsl v29.16b, v31.16b, v30.16b
    str q29, [x2]
    ret
min_imm_di:
    ldr q31, [x0]
    mov z30.d, #5
    cmgt v29.2d, v30.2d, v31.2d
    bsl v29.16b, v31.16b, v30.16b
    str q29, [x2]
    ret

The register operand case is the same length, though the new ptrue can now be shared and moved away. The immediate operand case is clearly better, as the SVE immediate form doesn't require a predicate operand.

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/

	* config/aarch64/iterators.md (sve_di_suf): New mode attribute.
	* config/aarch64/aarch64-sve.md (<optab><mode>3 SVE_INT_BINARY_MULTI): Rename to...
	(<optab><mode>3<sve_di_suf>): ... This. Use SVE_I_SIMD_DI mode iterator.
	* config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Use the above for TARGET_SVE.

gcc/testsuite/

	* gcc.target/aarch64/sve/usminmax_di.c: New test.
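The old min_di code is a compare followed by a bitwise select, because Advanced SIMD has no 64-bit-element min. The sketch below writes that idiom with GCC/Clang vector extensions so it runs on any host. It only illustrates the two-step cmgt + bsl pattern that a single SVE smin replaces; the function name is hypothetical, and what a compiler actually emits for it depends on the target flags.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang vector extension: two 64-bit lanes, i.e. a V2DImode value. */
typedef int64_t v2di __attribute__((vector_size(16)));

/* Signed min as compare + bitwise select, mirroring cmgt + bsl: the
   comparison yields an all-ones/all-zeros mask per lane. */
static v2di smin_v2di_select(v2di a, v2di b) {
    v2di gt = a > b;                 /* cmgt: -1 where a > b, else 0 */
    return (b & gt) | (a & ~gt);     /* bsl: take b where a > b, else a */
}
```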