aboutsummaryrefslogtreecommitdiff
path: root/gcc/config/loongarch
AgeCommit message (Collapse)AuthorFilesLines
2024-02-04LoongArch: Fix wrong LSX FP vector negationXi Ruoyao3-27/+18
We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is wrong because -0.0 is not 0 - 0.0. This causes some Python tests to fail when Python is built with LSX enabled. Use the vbitrevi.{d/w} instructions to simply reverse the sign bit instead. We are already doing this for LASX and now we can unify them into simd.md. gcc/ChangeLog: * config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the incorrect expand. * config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr. (elmsgnbit): Likewise. (neg<mode:FVEC>2): New define_insn. * config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they are now instantiated in simd.md.
2024-02-04LoongArch: Avoid out-of-bounds access in loongarch_symbol_insnsXi Ruoyao1-1/+2
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes. But in loongarch_symbol_insns: if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode)) return 0; And LSX_SUPPORTED_MODE_P is defined as: #define LSX_SUPPORTED_MODE_P(MODE) \ (ISA_HAS_LSX \ && GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ... GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined: ALWAYS_INLINE poly_uint16 mode_to_bytes (machine_mode mode) { #if GCC_VERSION >= 4001 return (__builtin_constant_p (mode) ? mode_size_inline (mode) : mode_size[mode]); #else return mode_size[mode]; #endif } There is an assertion in mode_size_inline: gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES); Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc), thus if __builtin_constant_p (mode) is evaluated true (it happens when GCC is bootstrapped with LTO+PGO), the assertion will be triggered and cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false, mode_size[mode] is still an out-of-bound array access (the length or the mode_size array is NUM_MACHINE_MODES). So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a MIPS bug PR98491 fixed by me about 3 years ago. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is MAX_MACHINE_MODE.
2024-02-04LoongArch: testsuite: Fix gcc.dg/vect/vect-reduc-mul_{1, 2}.c FAIL.Li Wei1-55/+163
This FAIL was introduced from r14-6908. The reason is that when merging constant vector permutation implementations, the 128-bit matching situation was not fully considered. In fact, the expansion of 128-bit vectors after merging only supports value-based 4 elements set shuffle, so this time is a complete implementation of the entire 128-bit vector constant permutation, and some structural adjustments have also been made to the code. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust. (loongarch_expand_vselect_vconcat): Ditto. (loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement all 128-bit constant permutation situations. (loongarch_expand_lsx_shuffle): Adjust and rename function name. (loongarch_is_imm_set_shuffle): Renamed function name. (loongarch_expand_vec_perm_even_odd): Function forward declaration. (loongarch_expand_vec_perm_even_odd_1): Add implement for 128-bit extract-even and extract-odd permutations. (loongarch_is_odd_extraction): Delete. (loongarch_is_even_extraction): Ditto. (loongarch_expand_vec_perm_const): Adjust.
2024-02-03LoongArch: Fix an ODR violationXi Ruoyao2-2/+3
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR violation is detected: ../../gcc/config/loongarch/loongarch-opts.cc:57: warning: 'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr] 57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES]; ../../gcc/config/loongarch/loongarch-def.cc:186: note: 'abi_minimal_isa' was previously declared here 186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>, ../../gcc/config/loongarch/loongarch-def.cc:186: note: code may be misoptimized unless '-fno-strict-aliasing' is used Fix it by adding a proper declaration of abi_minimal_isa into loongarch-def.h and remove the ODR-violating local declaration in loongarch-opts.cc. gcc/ChangeLog: * config/loongarch/loongarch-def.h (abi_minimal_isa): Declare. * config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove the ODR-violating locale declaration.
2024-02-02LoongArch: Fix incorrect return type for frecipe/frsqrte intrinsic functionsJiahao Xu1-8/+8
gcc/ChangeLog: * config/loongarch/larchintrin.h (__frecipe_s): Update function return type. (__frecipe_d): Ditto. (__frsqrte_s): Ditto. (__frsqrte_d): Ditto. gcc/testsuite/ChangeLog: * gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
2024-02-02LoongArch: Adjust cost of vector_stmt that match multiply-add pattern.Li Wei1-0/+48
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r failed to vectorize effectively. For this reason, we adjust the cost of 128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit vectorization. The experimental results show that after the modification, 549.fotonik3d_r performance can be improved by 9.77% under the 128-bit vectorization option. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_multiply_add_p): New. (loongarch_vector_costs::add_stmt_cost): Adjust. gcc/testsuite/ChangeLog: * gfortran.dg/vect/vect-10.f90: New test.
2024-02-02LoongArch: Don't split the instructions containing relocs for extreme code ↵Xi Ruoyao2-57/+94
model. The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for addressing a symbol to be adjacent. So model them as "one large instruction", i.e. define_insn, with two output registers. The real address is the sum of these two registers. The advantage of this approach is the RTL passes can still use ldx/stx instructions to skip an addi.d instruction. gcc/ChangeLog: * config/loongarch/loongarch.md (unspec): Add UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2. (la_pcrel64_two_parts): New define_insn. * config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a typo in the comment. (loongarch_call_tls_get_addr): If -mcmodel=extreme -mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for addressing the TLS symbol and __tls_get_addr. Emit an REG_EQUAL note to allow CSE addressing __tls_get_addr. (loongarch_legitimize_tls_address): If -mcmodel=extreme -mexplicit-relocs={always,auto}, address TLS IE symbols with la_pcrel64_two_parts. (loongarch_split_symbol): If -mcmodel=extreme -mexplicit-relocs={always,auto}, address symbols with la_pcrel64_two_parts. (loongarch_output_mi_thunk): Clean up unreachable code. If -mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI thunks with la_pcrel64_two_parts. gcc/testsuite/ChangeLog: * gcc.target/loongarch/func-call-extreme-1.c (dg-options): Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i instruction sequences are not reordered by the compiler. (NOIPA): Disallow interprocedural optimizations. * gcc.target/loongarch/func-call-extreme-2.c: Remove the content duplicated from func-call-extreme-1.c, include it instead. (dg-options): Likewise. * gcc.target/loongarch/func-call-extreme-3.c (dg-options): Likewise. * gcc.target/loongarch/func-call-extreme-4.c (dg-options): Likewise. * gcc.target/loongarch/cmodel-extreme-1.c: New test. * gcc.target/loongarch/cmodel-extreme-2.c: New test. * g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test. * g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test. * g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
2024-02-02LoongArch: Added support for loading __get_tls_addr symbol address using call36.Lulu Cheng1-6/+16
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_call_tls_get_addr): Add support for call36. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.
2024-02-02LoongArch: Enable explicit reloc for extreme TLS GD/LD with ↵Lulu Cheng1-10/+9
-mexplicit-relocs=auto. Binutils does not support relaxation using four instructions to obtain symbol addresses gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p): When the code model of the symbol is extreme and -mexplicit-relocs=auto, the macro instruction loading symbol address is not applicable. (loongarch_call_tls_get_addr): Adjust code. (loongarch_legitimize_tls_address): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test. * gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
2024-02-02LoongArch: Add the macro implementation of mcmodel=extreme.Lulu Cheng4-44/+127
gcc/ChangeLog: * config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p): Add function declaration. * config/loongarch/loongarch.cc (loongarch_symbolic_constant_p): For SYMBOL_PCREL64, non-zero addend of "la.local $rd,$rt,sym+addend" is not allowed (loongarch_load_tls): Added macro support in extreme mode. (loongarch_call_tls_get_addr): Likewise. (loongarch_legitimize_tls_address): Likewise. (loongarch_force_address): Likewise. (loongarch_legitimize_move): Likewise. (loongarch_output_mi_thunk): Likewise. (loongarch_option_override_internal): Remove the code that detects explicit relocs status. (loongarch_handle_model_attribute): Likewise. * config/loongarch/loongarch.md (movdi_symbolic_off64): New template. * config/loongarch/predicates.md (symbolic_off64_operand): New predicate. (symbolic_off64_or_reg_operand): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/attr-model-5.c: New test. * gcc.target/loongarch/func-call-extreme-5.c: New test. * gcc.target/loongarch/func-call-extreme-6.c: New test. * gcc.target/loongarch/tls-extreme-macro.c: New test.
2024-02-02LoongArch: Merge template got_load_tls_{ld/gd/le/ie}.Lulu Cheng2-76/+30
gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_load_tls): Load all types of tls symbols through one function. (loongarch_got_load_tls_gd): Delete. (loongarch_got_load_tls_ld): Delete. (loongarch_got_load_tls_ie): Delete. (loongarch_got_load_tls_le): Delete. (loongarch_call_tls_get_addr): Modify the called function name. (loongarch_legitimize_tls_address): Likewise. * config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete. (@load_tls<mode>): New template. (@got_load_tls_ld<mode>): Delete. (@got_load_tls_le<mode>): Delete. (@got_load_tls_ie<mode>): Delete.
2024-02-02LoongArch: Modify the address calculation logic for obtaining array element ↵Lulu Cheng1-0/+43
values through fp. Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C). Thereby modifying the register dependencies and optimizing the code. The value of C is 2 4 or 8. The following is the assembly code before and after a loop modification in spec2006 401.bzip: old | new 735 .L71: | 735 .L71: 736 slli.d $r12,$r15,2 | 736 slli.d $r12,$r15,2 737 ldx.w $r13,$r22,$r12 | 737 ldx.w $r13,$r22,$r12 738 addi.d $r15,$r15,-1 | 738 addi.d $r15,$r15,-1 739 slli.w $r16,$r15,0 | 739 slli.w $r16,$r15,0 740 addi.w $r13,$r13,-1 | 740 addi.w $r13,$r13,-1 741 slti $r14,$r13,0 | 741 slti $r14,$r13,0 742 add.w $r12,$r26,$r13 | 742 add.w $r12,$r26,$r13 743 maskeqz $r12,$r12,$r14 | 743 maskeqz $r12,$r12,$r14 744 masknez $r14,$r13,$r14 | 744 masknez $r14,$r13,$r14 745 or $r12,$r12,$r14 | 745 or $r12,$r12,$r14 746 ldx.bu $r14,$r30,$r12 | 746 ldx.bu $r14,$r30,$r12 747 lu12i.w $r13,4096>>12 | 747 alsl.d $r14,$r14,$r18,2 748 ori $r13,$r13,432 | 748 ldptr.w $r13,$r14,0 749 add.d $r13,$r13,$r3 | 749 addi.w $r17,$r13,-1 750 alsl.d $r14,$r14,$r13,2 | 750 stptr.w $r17,$r14,0 751 ldptr.w $r13,$r14,-1968 | 751 slli.d $r13,$r13,2 752 addi.w $r17,$r13,-1 | 752 stx.w $r12,$r22,$r13 753 st.w $r17,$r14,-1968 | 753 ldptr.w $r12,$r19,0 754 slli.d $r13,$r13,2 | 754 blt $r12,$r16,.L71 755 stx.w $r12,$r22,$r13 | 755 .align 4 756 ldptr.w $r12,$r18,-2048 | 756 757 blt $r12,$r16,.L71 | 757 758 .align 4 | 758 This patch is ported from riscv's commit r14-3111. gcc/ChangeLog: * config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function. (loongarch_legitimize_address): Add logical transformation code.
2024-01-26LoongArch: Split vec_selects of bottom elements into simple moveJiahao Xu1-0/+15
For below pattern, can be treated as a simple move because floating point and vector share a common register on loongarch64. (set (reg/v:SF 32 $f0 [orig:93 res ] [93]) (vec_select:SF (reg:V8SF 32 $f0 [115]) (parallel [ (const_int 0 [0]) ]))) gcc/ChangeLog: * config/loongarch/lasx.md (vec_extract<mode>_0): New define_insn_and_split patten. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-extract.c: New test.
2024-01-26LoongArch: Define LOGICAL_OP_NON_SHORT_CIRCUITJiahao Xu1-0/+1
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0, for a short-circuit branch, use the short-circuit operation instead of the non-short-circuit operation. SPEC2017 performance evaluation shows 1% performance improvement for fprate GEOMEAN and no obvious regression for others. Especially, 526.blender_r +10.6% on 3A6000. This modification will introduce the following FAIL items: FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Conditional combines static and invariant" 1 FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Will duplicate bb" 2 FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid sum" 0 gcc/ChangeLog: * config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define. gcc/testsuite/ChangeLog: * gcc.target/loongarch/short-circuit.c: New test.
2024-01-26LoongArch: Optimize implementation of single-precision floating-point ↵Li Wei1-6/+13
approximate division. We found that in the spec17 521.wrf program, some loop invariant code generated from single-precision floating-point approximate division calculation failed to propose a loop. This is because the pseudo-register that stores the intermediate temporary calculation results is rewritten in the implementation of single-precision floating-point approximate division, failing to propose invariants in the loop2_invariant pass. To this end, the intermediate temporary calculation results are stored in new pseudo-registers without destroying the read-write dependency, so that they could be recognized as loop invariants in the loop2_invariant pass. After optimization, the number of instructions of 521.wrf is reduced by 0.18% compared with before optimization (1716612948501 -> 1713471771364). gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust. gcc/testsuite/ChangeLog: * gcc.target/loongarch/invariant-recip.c: New test.
2024-01-25LoongArch: Remove vec_concatz<mode> pattern.Jiahao Xu2-26/+6
It is incorrect to use vld/vori to implement the vec_concatz<mode> because when the LSX instruction is used to update the value of the vector register, the upper 128 bits of the vector register will not be zeroed. gcc/ChangeLog: * config/loongarch/lasx.md (@vec_concatz<mode>): Remove this define_insn pattern. * config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): Use vec_concat<mode>.
2024-01-25LoongArch: Disable TLS type symbols from generating non-zero offsets.Lulu Cheng1-9/+9
TLS gd ld and ie type symbols will generate corresponding GOT entries, so non-zero offsets cannot be generated. The address of TLS le type symbol+addend is not implemented in binutils, so non-zero offset is not generated here for the time being. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_symbolic_constant_p): For symbols of type tls, non-zero Offset is not generated.
2024-01-23LoongArch: Disable explicit reloc for TLS LD/GD with -mexplicit-relocs=autoXi Ruoyao1-5/+5
Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler macro. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_explicit_relocs_p): If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false for SYMBOL_TLS_LDM and SYMBOL_TLS_GD. (loongarch_call_tls_get_addr): Do not split symbols of SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check for la.tls.ld and la.tls.gd.
2024-01-18LoongArch: Remove constraint z from movsi_internalXi Ruoyao1-3/+3
We don't allow SImode in FCC, so constraint z is never really used here. gcc/ChangeLog: * config/loongarch/loongarch.md (movsi_internal): Remove constraint z.
2024-01-18LoongArch: Assign the '/u' attribute to the mem to which the global offset ↵Lulu Cheng1-0/+5
table belongs. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_split_symbol): Assign the '/u' attribute to the mem. gcc/testsuite/ChangeLog: * g++.target/loongarch/got-load.C: New test.
2024-01-12LoongArch: Redundant sign extension elimination optimization 2.Li Wei1-0/+6
Eliminate the redundant sign extension that exists after the conditional move when the target register is SImode. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_conditional_move): Adjust. gcc/testsuite/ChangeLog: * gcc.target/loongarch/sign-extend-2.c: Adjust.
2024-01-12LoongArch: Redundant sign extension elimination optimization.Li Wei1-24/+69
We found that the current combine optimization pass in gcc cannot handle the following redundant sign extension situations: (insn 77 76 78 5 (set (reg:SI 143) (plus:SI (subreg/s/u:SI (reg/v:DI 104 [ len ]) 0) (const_int 1 [0x1]))) {addsi3} (expr_list:REG_DEAD (reg/v:DI 104 [ len ]) (nil))) (insn 78 77 82 5 (set (reg/v:DI 104 [ len ]) (sign_extend:DI (reg:SI 143))) {extendsidi2} (nil)) Because reg:SI 143 is not died or set in insn 78, no replacement merge will be performed for the insn sequence. We adjusted the add template to eliminate redundant sign extensions during the expand pass. Adjusted based on upstream comments: https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641988.html gcc/ChangeLog: * config/loongarch/loongarch.md (add<mode>3): Removed. (*addsi3): New. (addsi3): Ditto. (adddi3): Ditto. (*addsi3_extended): Removed. (addsi3_extended): New. gcc/testsuite/ChangeLog: * gcc.target/loongarch/sign-extend.c: Moved to... * gcc.target/loongarch/sign-extend-1.c: ...here. * gcc.target/loongarch/sign-extend-2.c: New test.
2024-01-11LoongArch: Implement option save/restoreYang Yujie5-54/+111
LTO option streaming and target attributes both require per-function target configuration, which is achieved via option save/restore. We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target context in addition to other automatically maintained option states (via the "Save" option property in the .opt files). Tested on loongarch64-linux-gnu without regression. PR target/113233 gcc/ChangeLog: * config/loongarch/genopts/loongarch.opt.in: Mark options with the "Save" property. * config/loongarch/loongarch.opt: Same. * config/loongarch/loongarch-opts.cc: Refresh -mcmodel= state according to la_target. * config/loongarch/loongarch.cc: Implement TARGET_OPTION_{SAVE, RESTORE} for the la_target structure; Rename option conditions to have the same "la_" prefix. * config/loongarch/loongarch.h: Same.
2024-01-11LoongArch: Optimized some of the symbolic expansion instructions generated ↵Lulu Cheng1-24/+69
during bitwise operations. There are two mode iterators defined in the loongarch.md: (define_mode_iterator GPR [SI (DI "TARGET_64BIT")]) and (define_mode_iterator X [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")]) Replace the mode in the bit arithmetic from GPR to X. Since the bitwise operation instruction does not distinguish between 64-bit, 32-bit, etc., it is necessary to perform symbolic expansion if the bitwise operation is less than 64 bits. The original definition would have generated a lot of redundant symbolic extension instructions. This problem is optimized with reference to the implementation of RISCV. Add this patch spec2017 500.perlbench performance improvement by 1.8% gcc/ChangeLog: * config/loongarch/loongarch.md (one_cmpl<mode>2): Replace GPR with X. (*nor<mode>3): Likewise. (nor<mode>3): Likewise. (*negsi2_extended): New template. (*<optab>si3_internal): Likewise. (*one_cmplsi2_internal): Likewise. (*norsi3_internal): Likewise. (*<optab>nsi_internal): Likewise. (bytepick_w_<bytepick_imm>_extend): Modify this template according to the modified bit operation to make the optimization work. gcc/testsuite/ChangeLog: * gcc.target/loongarch/sign-extend-bitwise.c: New test.
2024-01-10LoongArch: Simplify -mexplicit-reloc definitionsYang Yujie5-28/+5
Since we do not need printing or manual parsing of this option, (whether in the driver or for target attributes to be supported later) it can be handled in the .opt file framework. gcc/ChangeLog: * config/loongarch/genopts/loongarch-strings: Remove explicit-reloc argument string definitions. * config/loongarch/loongarch-str.h: Same. * config/loongarch/genopts/loongarch.opt.in: Mark -m[no-]explicit-relocs as aliases to -mexplicit-relocs={always,none} * config/loongarch/loongarch.opt: Regenerate. * config/loongarch/loongarch.cc: Same.
2024-01-10LoongArch: Use enums for constantsYang Yujie1-48/+67
Target features constants from loongarch-def.h are currently defined as macros. Switch to enums for better look in the debugger. gcc/ChangeLog: * config/loongarch/loongarch-def.h: Define constants with enums instead of Macros.
2024-01-10LoongArch: Rename ISA_BASE_LA64V100 to ISA_BASE_LA64Yang Yujie9-21/+21
LoongArch ISA manual v1.10 suggests that software should not depend on the ISA version number for marking processor features. The ISA version number is now defined as a collective name of individual ISA evolutions. Since there is a independent ISA evolution mask now, we can drop the version information from the base ISA. gcc/ChangeLog: * config/loongarch/genopts/loongarch-strings: Rename. * config/loongarch/genopts/loongarch.opt.in: Same. * config/loongarch/loongarch-cpu.cc: Same. * config/loongarch/loongarch-def.cc: Same. * config/loongarch/loongarch-def.h: Same. * config/loongarch/loongarch-opts.cc: Same. * config/loongarch/loongarch-opts.h: Same. * config/loongarch/loongarch-str.h: Same. * config/loongarch/loongarch.opt: Same.
2024-01-10LoongArch: Handle ISA evolution switches along with other optionsYang Yujie14-59/+90
gcc/ChangeLog: * config/loongarch/genopts/genstr.sh: Prepend the isa_evolution variable with the common la_ prefix. * config/loongarch/genopts/loongarch.opt.in: Mark ISA evolution flags as saved using TargetVariable. * config/loongarch/loongarch.opt: Same. * config/loongarch/loongarch-def.h: Define evolution_set to mark changes to the -march default. * config/loongarch/loongarch-driver.cc: Same. * config/loongarch/loongarch-opts.cc: Same. * config/loongarch/loongarch-opts.h: Define and use ISA evolution conditions around the la_target structure. * config/loongarch/loongarch.cc: Same. * config/loongarch/loongarch.md: Same. * config/loongarch/loongarch-builtins.cc: Same. * config/loongarch/loongarch-c.cc: Same. * config/loongarch/lasx.md: Same. * config/loongarch/lsx.md: Same. * config/loongarch/sync.md: Same.
2024-01-09LoongArch: Implement vec_init<M><N> where N is a LSX vector modeJiahao Xu2-7/+63
This patch implements more vec_init optabs that can handle two LSX vectors producing a LASX vector by concatenating them. When an lsx vector is concatenated with an LSX const_vector of zeroes, the vec_concatz pattern can be used effectively. For example as below typedef short v8hi __attribute__ ((vector_size (16))); typedef short v16hi __attribute__ ((vector_size (32))); v8hi a, b; v16hi vec_initv16hiv8hi () { return __builtin_shufflevector (a, b, 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15); } Before this patch: vec_initv16hiv8hi: addi.d $r3,$r3,-64 .cfi_def_cfa_offset 64 xvrepli.h $xr0,0 la.local $r12,.LANCHOR0 xvst $xr0,$r3,0 xvst $xr0,$r3,32 vld $vr0,$r12,0 vst $vr0,$r3,0 vld $vr0,$r12,16 vst $vr0,$r3,32 xvld $xr1,$r3,32 xvld $xr2,$r3,32 xvld $xr0,$r3,0 xvilvh.h $xr0,$xr1,$xr0 xvld $xr1,$r3,0 xvilvl.h $xr1,$xr2,$xr1 addi.d $r3,$r3,64 .cfi_def_cfa_offset 0 xvpermi.q $xr0,$xr1,32 jr $r1 After this patch: vec_initv16hiv8hi: la.local $r12,.LANCHOR0 vld $vr0,$r12,32 vld $vr2,$r12,48 xvilvh.h $xr1,$xr2,$xr0 xvilvl.h $xr0,$xr2,$xr0 xvpermi.q $xr1,$xr0,32 xvst $xr1,$r4,0 jr $r1 gcc/ChangeLog: * config/loongarch/lasx.md (vec_initv32qiv16qi): Rename to .. (vec_init<mode><lasxhalf>): .. this, and extend to mode. (@vec_concatz<mode>): New insn pattern. * config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): Handle VALS containing two vectors. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: New test.
2024-01-06LoongArch: Improve lasx_xvpermi_q_<LASX:mode> insn patternJiahao Xu1-1/+8
For instruction xvpermi.q, unused bits in operands[3] need be set to 0 to avoid causing undefined behavior on LA464. gcc/ChangeLog: * config/loongarch/lasx.md: Set the unused bits in operand[3] to 0. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lasx/lasx-xvpremi.c: Removed. * gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c: New test.
2024-01-05LoongArch: Fixed the problem of incorrect judgment of the immediate field of ↵Lulu Cheng4-83/+4
the [x]vld/[x]vst instruction. The [x]vld/[x]vst directive is defined as follows: [x]vld/[x]vst {x/v}d, rj, si12 When not modified, the immediate field of [x]vld/[x]vst is between 10 and 14 bits depending on the type. However, in loongarch_valid_offset_p, the immediate field is restricted first, so there is no error. However, in some cases redundant instructions will be generated, see test cases. Now modify it according to the description in the instruction manual. gcc/ChangeLog: * config/loongarch/lasx.md (lasx_mxld_<lasxfmt_f>): Modify the method of determining the memory offset of [x]vld/[x]vst. (lasx_mxst_<lasxfmt_f>): Likewise. * config/loongarch/loongarch.cc (loongarch_valid_offset_p): Delete. (loongarch_address_insns): Likewise. * config/loongarch/lsx.md (lsx_ld_<lsxfmt_f>): Likewise. (lsx_st_<lsxfmt_f>): Likewise. * config/loongarch/predicates.md (aq10b_operand): Likewise. (aq10h_operand): Likewise. (aq10w_operand): Likewise. (aq10d_operand): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vect-ld-st-imm12.c: New test.
2024-01-04Add generated .opt.urls filesDavid Malcolm1-0/+66
Changed in v5: regenerated Changed in v4: regenerated Changed in v3: regenerated Changed in v2: the files now contain some lang-specific URLs. gcc/ada/ChangeLog: * gcc-interface/lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/analyzer/ChangeLog: * analyzer.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/c-family/ChangeLog: * c.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/ChangeLog: * common.opt.urls: New file, autogenerated by regenerate-opt-urls.py. * config/aarch64/aarch64.opt.urls: Likewise. * config/alpha/alpha.opt.urls: Likewise. * config/alpha/elf.opt.urls: Likewise. * config/arc/arc-tables.opt.urls: Likewise. * config/arc/arc.opt.urls: Likewise. * config/arm/arm-tables.opt.urls: Likewise. * config/arm/arm.opt.urls: Likewise. * config/arm/vxworks.opt.urls: Likewise. * config/avr/avr.opt.urls: Likewise. * config/bpf/bpf.opt.urls: Likewise. * config/c6x/c6x-tables.opt.urls: Likewise. * config/c6x/c6x.opt.urls: Likewise. * config/cris/cris.opt.urls: Likewise. * config/cris/elf.opt.urls: Likewise. * config/csky/csky.opt.urls: Likewise. * config/csky/csky_tables.opt.urls: Likewise. * config/darwin.opt.urls: Likewise. * config/dragonfly.opt.urls: Likewise. * config/epiphany/epiphany.opt.urls: Likewise. * config/fr30/fr30.opt.urls: Likewise. * config/freebsd.opt.urls: Likewise. * config/frv/frv.opt.urls: Likewise. * config/ft32/ft32.opt.urls: Likewise. * config/fused-madd.opt.urls: Likewise. * config/g.opt.urls: Likewise. * config/gcn/gcn.opt.urls: Likewise. * config/gnu-user.opt.urls: Likewise. * config/h8300/h8300.opt.urls: Likewise. * config/hpux11.opt.urls: Likewise. * config/i386/cygming.opt.urls: Likewise. * config/i386/cygwin.opt.urls: Likewise. * config/i386/djgpp.opt.urls: Likewise. * config/i386/i386.opt.urls: Likewise. * config/i386/mingw-w64.opt.urls: Likewise. * config/i386/mingw.opt.urls: Likewise. * config/i386/nto.opt.urls: Likewise. * config/ia64/ia64.opt.urls: Likewise. * config/ia64/ilp32.opt.urls: Likewise. * config/ia64/vms.opt.urls: Likewise. * config/iq2000/iq2000.opt.urls: Likewise. * config/linux-android.opt.urls: Likewise. * config/linux.opt.urls: Likewise. * config/lm32/lm32.opt.urls: Likewise. * config/loongarch/loongarch.opt.urls: Likewise. * config/lynx.opt.urls: Likewise. * config/m32c/m32c.opt.urls: Likewise. * config/m32r/m32r.opt.urls: Likewise. * config/m68k/ieee.opt.urls: Likewise. * config/m68k/m68k-tables.opt.urls: Likewise. * config/m68k/m68k.opt.urls: Likewise. * config/m68k/uclinux.opt.urls: Likewise. * config/mcore/mcore.opt.urls: Likewise. * config/microblaze/microblaze.opt.urls: Likewise. * config/mips/mips-tables.opt.urls: Likewise. * config/mips/mips.opt.urls: Likewise. * config/mips/sde.opt.urls: Likewise. * config/mmix/mmix.opt.urls: Likewise. * config/mn10300/mn10300.opt.urls: Likewise. * config/moxie/moxie.opt.urls: Likewise. * config/msp430/msp430.opt.urls: Likewise. * config/nds32/nds32-elf.opt.urls: Likewise. * config/nds32/nds32-linux.opt.urls: Likewise. * config/nds32/nds32.opt.urls: Likewise. * config/netbsd-elf.opt.urls: Likewise. * config/netbsd.opt.urls: Likewise. * config/nios2/elf.opt.urls: Likewise. * config/nios2/nios2.opt.urls: Likewise. * config/nvptx/nvptx-gen.opt.urls: Likewise. * config/nvptx/nvptx.opt.urls: Likewise. * config/openbsd.opt.urls: Likewise. * config/or1k/elf.opt.urls: Likewise. * config/or1k/or1k.opt.urls: Likewise. * config/pa/pa-hpux.opt.urls: Likewise. * config/pa/pa-hpux1010.opt.urls: Likewise. * config/pa/pa-hpux1111.opt.urls: Likewise. * config/pa/pa-hpux1131.opt.urls: Likewise. * config/pa/pa.opt.urls: Likewise. * config/pa/pa64-hpux.opt.urls: Likewise. * config/pdp11/pdp11.opt.urls: Likewise. * config/pru/pru.opt.urls: Likewise. * config/riscv/riscv.opt.urls: Likewise. * config/rl78/rl78.opt.urls: Likewise. * config/rpath.opt.urls: Likewise. * config/rs6000/476.opt.urls: Likewise. * config/rs6000/aix64.opt.urls: Likewise. * config/rs6000/darwin.opt.urls: Likewise. * config/rs6000/linux64.opt.urls: Likewise. * config/rs6000/rs6000-tables.opt.urls: Likewise. * config/rs6000/rs6000.opt.urls: Likewise. * config/rs6000/sysv4.opt.urls: Likewise. * config/rtems.opt.urls: Likewise. * config/rx/elf.opt.urls: Likewise. * config/rx/rx.opt.urls: Likewise. * config/s390/s390.opt.urls: Likewise. * config/s390/tpf.opt.urls: Likewise. * config/sh/sh.opt.urls: Likewise. * config/sh/superh.opt.urls: Likewise. * config/sol2.opt.urls: Likewise. * config/sparc/long-double-switch.opt.urls: Likewise. * config/sparc/sparc.opt.urls: Likewise. * config/stormy16/stormy16.opt.urls: Likewise. * config/v850/v850.opt.urls: Likewise. * config/vax/elf.opt.urls: Likewise. * config/vax/vax.opt.urls: Likewise. * config/visium/visium.opt.urls: Likewise. * config/vms/vms.opt.urls: Likewise. * config/vxworks-smp.opt.urls: Likewise. * config/vxworks.opt.urls: Likewise. * config/xtensa/elf.opt.urls: Likewise. * config/xtensa/uclinux.opt.urls: Likewise. * config/xtensa/xtensa.opt.urls: Likewise. gcc/d/ChangeLog: * lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/fortran/ChangeLog: * lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/go/ChangeLog: * lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/lto/ChangeLog: * lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/m2/ChangeLog: * lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/ChangeLog: * params.opt.urls: New file, autogenerated by regenerate-opt-urls.py. gcc/rust/ChangeLog: * lang.opt.urls: New file, autogenerated by regenerate-opt-urls.py. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-01-04LoongArch: Merge constant vector permuatation implementations.Li Wei1-1092/+204
There are currently two versions of the implementations of constant vector permutation: loongarch_expand_vec_perm_const_1 and loongarch_expand_vec_perm_const_2. The implementations of the two versions are different. Currently, only the implementation of loongarch_expand_vec_perm_const_1 is used for 256-bit vectors. We hope to streamline the code as much as possible while retaining the better-performing implementation of the two. By repeatedly testing spec2006 and spec2017, we got the following Merged version. Compared with the pre-merger version, the number of lines of code in loongarch.cc has been reduced by 888 lines. At the same time, the performance of SPECint2006 under Ofast has been improved by 0.97%, and the performance of SPEC2017 fprate has been improved by 0.27%. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_is_odd_extraction): Remove useless forward declaration. (loongarch_is_even_extraction): Remove useless forward declaration. (loongarch_try_expand_lsx_vshuf_const): Removed. (loongarch_expand_vec_perm_const_1): Merged. (loongarch_is_double_duplicate): Removed. (loongarch_is_center_extraction): Ditto. (loongarch_is_reversing_permutation): Ditto. (loongarch_is_di_misalign_extract): Ditto. (loongarch_is_si_misalign_extract): Ditto. (loongarch_is_lasx_lowpart_extract): Ditto. (loongarch_is_op_reverse_perm): Ditto. (loongarch_is_single_op_perm): Ditto. (loongarch_is_divisible_perm): Ditto. (loongarch_is_triple_stride_extract): Ditto. (loongarch_expand_vec_perm_const_2): Merged. (loongarch_expand_vec_perm_const): New. (loongarch_vectorize_vec_perm_const): Adjust.
2024-01-03Update copyright years.Jakub Jelinek43-45/+45
2024-01-03LoongArch: Provide fmin/fmax RTL pattern for vectorsXi Ruoyao1-0/+31
We already had smin/smax RTL pattern using vfmin/vfmax instructions. But for smin/smax, it's unspecified what will happen if either operand contains any NaN operands. So we would not vectorize the loop with -fno-finite-math-only (the default for all optimization levels expect -Ofast). But, LoongArch vfmin/vfmax instruction is IEEE-754-2008 conformant so we can also use them and vectorize the loop. gcc/ChangeLog: * config/loongarch/simd.md (fmax<mode>3): New define_insn. (fmin<mode>3): Likewise. (reduc_fmax_scal_<mode>3): New define_expand. (reduc_fmin_scal_<mode>3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vfmax-vfmin.c: New test.
2024-01-02LoongArch: Added TLS Le Relax support.Lulu Cheng3-3/+59
Check whether the assembler supports tls le relax. If it supports it, the assembly instruction sequence of tls le relax will be generated by default. The original way to obtain the tls le symbol address: lu12i.w $rd, %le_hi20(sym) ori $rd, $rd, %le_lo12(sym) add.{w/d} $rd, $rd, $tp If the assembler supports tls le relax, the following sequence is generated: lu12i.w $rd, %le_hi20_r(sym) add.{w/d} $rd,$rd,$tp,%le_add_r(sym) addi.{w/d} $rd,$rd,%le_lo12_r(sym) gcc/ChangeLog: * config.in: Regenerate. * config/loongarch/loongarch-opts.h (HAVE_AS_TLS_LE_RELAXATION): Define. * config/loongarch/loongarch.cc (loongarch_legitimize_tls_address): Added TLS Le Relax support. (loongarch_print_operand_reloc): Add the output string of TLS Le Relax. * config/loongarch/loongarch.md (@add_tls_le_relax<mode>): New template. * configure: Regenerate. * configure.ac: Check if binutils supports TLS le relax. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add a function to check whether binutil supports TLS Le Relax. * gcc.target/loongarch/tls-le-relax.c: New test.
2023-12-29LoongArch: Fix the format of bstrins_<mode>_for_ior_mask condition (NFC)Xi Ruoyao1-2/+2
gcc/ChangeLog: * config/loongarch/loongarch.md (bstrins_<mode>_for_ior_mask): For the condition, remove unneeded trailing "\" and move "&&" to follow GNU coding style. NFC.
2023-12-29LoongArch: Replace -mexplicit-relocs=auto simple-used address peephole2 with ↵Xi Ruoyao4-87/+57
combine The problem with peephole2 is it uses a naive sliding-window algorithm and misses many cases. For example: float a[10000]; float t() { return a[0] + a[8000]; } is compiled to: la.local $r13,a la.local $r12,a+32768 fld.s $f1,$r13,0 fld.s $f0,$r12,-768 fadd.s $f0,$f1,$f0 by trunk. But as we've explained in r14-4851, the following would be better with -mexplicit-relocs=auto: pcalau12i $r13,%pc_hi20(a) pcalau12i $r12,%pc_hi20(a+32000) fld.s $f1,$r13,%pc_lo12(a) fld.s $f0,$r12,%pc_lo12(a+32000) fadd.s $f0,$f1,$f0 However the sliding-window algorithm just won't detect the pcalau12i/fld pair to be optimized. Use a define_insn_and_rewrite in combine pass will work around the issue. gcc/ChangeLog: * config/loongarch/predicates.md (symbolic_pcrel_offset_operand): New define_predicate. (mem_simple_ldst_operand): Likewise. * config/loongarch/loongarch-protos.h (loongarch_rewrite_mem_for_simple_ldst): Declare. * config/loongarch/loongarch.cc (loongarch_rewrite_mem_for_simple_ldst): Implement. * config/loongarch/loongarch.md (simple_load<mode>): New define_insn_and_rewrite. (simple_load_<su>ext<SUBDI:mode><GPR:mode>): Likewise. (simple_store<mode>): Likewise. (define_peephole2): Remove la.local/[f]ld peepholes. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c: New test. * gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c: New test.
2023-12-27LoongArch: Fix infinite secondary reloading of FCCmode [PR113148]Xi Ruoyao1-1/+2
The GCC internal doc says: X might be a pseudo-register or a 'subreg' of a pseudo-register, which could either be in a hard register or in memory. Use 'true_regnum' to find out; it will return -1 if the pseudo is in memory and the hard register number if it is in a register. So "MEM_P (x)" is not enough for checking if we are reloading from/to the memory. This bug has caused reload pass to stall and finally ICE complaining with "maximum number of generated reload insns per insn achieved", since r14-6814. Check if "true_regnum (x)" is -1 besides "MEM_P (x)" to fix the issue. gcc/ChangeLog: PR target/113148 * config/loongarch/loongarch.cc (loongarch_secondary_reload): Check if regno == -1 besides MEM_P (x) for reloading FCCmode from/to FPR to/from memory. gcc/testsuite/ChangeLog: PR target/113148 * gcc.target/loongarch/pr113148.c: New test.
2023-12-27LoongArch: Expand left rotate to right rotate with negated amountXi Ruoyao2-0/+41
gcc/ChangeLog: * config/loongarch/loongarch.md (rotl<mode>3): New define_expand. * config/loongarch/simd.md (vrotl<mode>3): Likewise. (rotl<mode>3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotl-with-rotr.c: New test. * gcc.target/loongarch/rotl-with-vrotr-b.c: New test. * gcc.target/loongarch/rotl-with-vrotr-h.c: New test. * gcc.target/loongarch/rotl-with-vrotr-w.c: New test. * gcc.target/loongarch/rotl-with-vrotr-d.c: New test. * gcc.target/loongarch/rotl-with-xvrotr-b.c: New test. * gcc.target/loongarch/rotl-with-xvrotr-h.c: New test. * gcc.target/loongarch/rotl-with-xvrotr-w.c: New test. * gcc.target/loongarch/rotl-with-xvrotr-d.c: New test.
2023-12-27LoongArch: Fix ICE when passing two same vector argument consecutivelyChenghui Pan4-57/+10
Following code will cause ICE on LoongArch target: #include <lsxintrin.h> extern void bar (__m128i, __m128i); __m128i a; void foo () { bar (a, a); } It is caused by missing constraint definition in mov<mode>_lsx. This patch fixes the template and remove the unnecessary processing from loongarch_split_move () function. This patch also cleanup the redundant definition from loongarch_split_move () and loongarch_split_move_p (). gcc/ChangeLog: * config/loongarch/lasx.md: Use loongarch_split_move and loongarch_split_move_p directly. * config/loongarch/loongarch-protos.h (loongarch_split_move): Remove unnecessary argument. (loongarch_split_move_insn_p): Delete. (loongarch_split_move_insn): Delete. * config/loongarch/loongarch.cc (loongarch_split_move_insn_p): Delete. (loongarch_load_store_insns): Use loongarch_split_move_p directly. (loongarch_split_move): remove the unnecessary processing. (loongarch_split_move_insn): Delete. * config/loongarch/lsx.md: Use loongarch_split_move and loongarch_split_move_p directly. gcc/testsuite/ChangeLog: * gcc.target/loongarch/vector/lsx/lsx-mov-1.c: New test.
2023-12-27LoongArch: Fix insn output of vec_concat templates for LASX.Chenghui Pan1-67/+7
When investigaing failure of gcc.dg/vect/slp-reduc-sad.c, following instruction block are being generated by vec_concatv32qi (which is generated by vec_initv32qiv16qi) at entrance of foo() function: vldx $vr3,$r5,$r6 vld $vr2,$r5,0 xvpermi.q $xr2,$xr3,0x20 causes the reversion of vec_initv32qiv16qi operation's high and low 128-bit part. According to other target's similar impl and LSX impl for following RTL representation, current definition in lasx.md of "vec_concat<mode>" are wrong: (set (op0) (vec_concat (op1) (op2))) For correct behavior, the last argument of xvpermi.q should be 0x02 instead of 0x20. This patch fixes this issue and cleanup the vec_concat template impl. gcc/ChangeLog: * config/loongarch/lasx.md (vec_concatv4di): Delete. (vec_concatv8si): Delete. (vec_concatv16hi): Delete. (vec_concatv32qi): Delete. (vec_concatv4df): Delete. (vec_concatv8sf): Delete. (vec_concat<mode>): New template with insn output fixed.
2023-12-27LoongArch: Fixed bug in *bstrins_<mode>_for_ior_mask template.Li Wei1-1/+1
We found that using the latest compiled gcc will cause a miscompare error when running spec2006 400.perlbench test with -flto turned on. After testing, it was found that only the LoongArch architecture will report errors. The first error commit was located through the git bisect command as r14-3773-g5b857e87201335. Through debugging, it was found that the problem was that the split condition of the *bstrins_<mode>_for_ior_mask template was empty, which should actually be consistent with the insn condition. gcc/ChangeLog: * config/loongarch/loongarch.md: Adjust.
2023-12-23LoongArch: Add sign_extend pattern for 32-bit rotate shiftXi Ruoyao1-0/+10
Remove a redundant sign extension. gcc/ChangeLog: * config/loongarch/loongarch.md (rotrsi3_extend): New define_insn. gcc/testsuite/ChangeLog: * gcc.target/loongarch/rotrw.c: New test.
2023-12-23LoongArch: Implement FCCmode reload and cstore<ANYF:mode>4Xi Ruoyao5-33/+138
We used a branch to load floating-point comparison results into GPR. This is very slow when the branch is not predictable. Implement movfcc so we can reload FCCmode into GPRs, FPRs, and MEM. Then implement cstore<ANYF:mode>4. gcc/ChangeLog: * config/loongarch/loongarch-tune.h (loongarch_rtx_cost_data::movcf2gr): New field. (loongarch_rtx_cost_data::movcf2gr_): New method. (loongarch_rtx_cost_data::use_movcf2gr): New method. * config/loongarch/loongarch-def.cc (loongarch_rtx_cost_data::loongarch_rtx_cost_data): Set movcf2gr to COSTS_N_INSNS (7) and movgr2cf to COSTS_N_INSNS (15), based on timing on LA464. (loongarch_cpu_rtx_cost_data): Set movcf2gr and movgr2cf to COSTS_N_INSNS (1) for LA664. (loongarch_rtx_cost_optimize_size): Set movcf2gr and movgr2cf to COSTS_N_INSNS (1) + 1. * config/loongarch/predicates.md (loongarch_fcmp_operator): New predicate. * config/loongarch/loongarch.md (movfcc): Change to define_expand. (movfcc_internal): New define_insn. (fcc_to_<X:mode>): New define_insn. (cstore<ANYF:mode>4): New define_expand. * config/loongarch/loongarch.cc (loongarch_hard_regno_mode_ok_uncached): Allow FCCmode in GPRs and GPRs. (loongarch_secondary_reload): Reload FCCmode via FPR and/or GPR. (loongarch_emit_float_compare): Call gen_reg_rtx instead of loongarch_allocate_fcc. (loongarch_allocate_fcc): Remove. (loongarch_move_to_gpr_cost): Handle FCC_REGS -> GR_REGS. (loongarch_move_from_gpr_cost): Handle GR_REGS -> FCC_REGS. (loongarch_register_move_cost): Handle FCC_REGS -> FCC_REGS, FCC_REGS -> FP_REGS, and FP_REGS -> FCC_REGS. gcc/testsuite/ChangeLog: * gcc.target/loongarch/movcf2gr.c: New test. * gcc.target/loongarch/movcf2gr-via-fr.c: New test.
2023-12-21LoongArch: Fix incorrect code generation for sad patternJiahao Xu2-8/+8
When I attempt to enable vect_usad_char effective target for LoongArch, slp-reduc-sad.c and vect-reduc-sad*.c tests fail. These tests fail because the sad pattern generates bad code. This patch to fixed them, for sad patterns, use zero expansion instead of sign expansion for reduction. Currently, we are fixing failed vectorized tests, and in the future, we will enable more tests of "vect" for LoongArch. gcc/ChangeLog: * config/loongarch/lasx.md: Use zero expansion instruction. * config/loongarch/lsx.md: Ditto.
2023-12-20LoongArch: Clean up vec_init expanderXi Ruoyao1-9/+9
Non functional change, clean up the code. gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_expand_vector_init_same): Remove "temp2" and reuse "temp" instead. (loongarch_expand_vector_init): Use gcc_unreachable () instead of gcc_assert (0), and fix the comment for it.
2023-12-20LoongArch: Use force_reg instead of gen_reg_rtx + emit_move_insn in vec_init ↵Xi Ruoyao1-26/+12
expander [PR113033] Jakub says: Then that seems like a bug in the loongarch vec_init pattern(s). Those really don't have a predicate in any of the backends on the input operand, so they need to force_reg it if it is something it can't handle. I've looked e.g. at i386 vec_init and that is exactly what it does, see the various tests + force_reg calls in ix86_expand_vector_init*. So replace gen_reg_rtx + emit_move_insn with force_reg to fix PR 113033. gcc/ChangeLog: PR target/113033 * config/loongarch/loongarch.cc (loongarch_expand_vector_init_same): Replace gen_reg_rtx + emit_move_insn with force_reg. (loongarch_expand_vector_init): Likewise. gcc/testsuite/ChangeLog: PR target/113033 * gcc.target/loongarch/pr113033.c: New test.
2023-12-20LoongArch: Fix FP vector comparsons [PR113034]Xi Ruoyao4-218/+119
We had the following mappings between <x>vfcmp submenmonics and RTX codes: (define_code_attr fcc [(unordered "cun") (ordered "cor") (eq "ceq") (ne "cne") (uneq "cueq") (unle "cule") (unlt "cult") (le "cle") (lt "clt")]) This is inconsistent with scalar code: (define_code_attr fcond [(unordered "cun") (uneq "cueq") (unlt "cult") (unle "cule") (eq "ceq") (lt "slt") (le "sle") (ordered "cor") (ltgt "sne") (ne "cune") (ge "sge") (gt "sgt") (unge "cuge") (ungt "cugt")]) For every RTX code for which the LSX/LASX code is different from the scalar code, the scalar code is correct and the LSX/LASX code is wrong. Most seriously, the RTX code NE should be mapped to "cneq", not "cne". Rewrite <x>vfcmp define_insns in simd.md using the same mapping as scalar fcmp. Note that GAS does not support [x]vfcmp.{c/s}[u]{ge/gt} (pseudo) instruction (although fcmp.{c/s}[u]{ge/gt} is supported), so we need to switch the order of inputs and use [x]vfcmp.{c/s}[u]{le/lt} instead. The <x>vfcmp.{sult/sule/clt/cle}.{s/d} instructions do not have a single RTX code, but they can be modeled as an inversed RTX code following a "not" operation. Doing so allows the compiler to optimized vectorized __builtin_isless etc. to a single instruction. This optimization should be added for scalar code too and I'll do it later. Tests are added for mapping between C code, IEC 60559 operations, and vfcmp instructions. [1]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640713.html gcc/ChangeLog: PR target/113034 * config/loongarch/lasx.md (UNSPEC_LASX_XVFCMP_*): Remove. (lasx_xvfcmp_caf_<flasxfmt>): Remove. (lasx_xvfcmp_cune_<FLASX:flasxfmt>): Remove. (FSC256_UNS): Remove. (fsc256): Remove. (lasx_xvfcmp_<vfcond:fcc>_<FLASX:flasxfmt>): Remove. (lasx_xvfcmp_<fsc256>_<FLASX:flasxfmt>): Remove. * config/loongarch/lsx.md (UNSPEC_LSX_XVFCMP_*): Remove. (lsx_vfcmp_caf_<flsxfmt>): Remove. (lsx_vfcmp_cune_<FLSX:flsxfmt>): Remove. (vfcond): Remove. (fcc): Remove. (FSC_UNS): Remove. (fsc): Remove. (lsx_vfcmp_<vfcond:fcc>_<FLSX:flsxfmt>): Remove. (lsx_vfcmp_<fsc>_<FLSX:flsxfmt>): Remove. * config/loongarch/simd.md (fcond_simd): New define_code_iterator. (<simd_isa>_<x>vfcmp_<fcond:fcond_simd>_<simdfmt>): New define_insn. (fcond_simd_rev): New define_code_iterator. (fcond_rev_asm): New define_code_attr. (<simd_isa>_<x>vfcmp_<fcond:fcond_simd_rev>_<simdfmt>): New define_insn. (fcond_inv): New define_code_iterator. (fcond_inv_rev): New define_code_iterator. (fcond_inv_rev_asm): New define_code_attr. (<simd_isa>_<x>vfcmp_<fcond_inv>_<simdfmt>): New define_insn. (<simd_isa>_<x>vfcmp_<fcond_inv:fcond_inv_rev>_<simdfmt>): New define_insn. (UNSPEC_SIMD_FCMP_CAF, UNSPEC_SIMD_FCMP_SAF, UNSPEC_SIMD_FCMP_SEQ, UNSPEC_SIMD_FCMP_SUN, UNSPEC_SIMD_FCMP_SUEQ, UNSPEC_SIMD_FCMP_CNE, UNSPEC_SIMD_FCMP_SOR, UNSPEC_SIMD_FCMP_SUNE): New unspecs. (SIMD_FCMP): New define_int_iterator. (fcond_unspec): New define_int_attr. (<simd_isa>_<x>vfcmp_<fcond_unspec>_<simdfmt>): New define_insn. * config/loongarch/loongarch.cc (loongarch_expand_lsx_cmp): Remove unneeded special cases. gcc/testsuite/ChangeLog: PR target/113034 * gcc.target/loongarch/vfcmp-f.c: New test. * gcc.target/loongarch/vfcmp-d.c: New test. * gcc.target/loongarch/xvfcmp-f.c: New test. * gcc.target/loongarch/xvfcmp-d.c: New test. * gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Scan for cune instead of cne. * gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Likewise.
2023-12-18LoongArch: Add support for D frontend.liushuyu3-0/+107
gcc/ChangeLog: * config.gcc: Add loongarch-d.o to d_target_objs for LoongArch architecture. * config/loongarch/t-loongarch: Add object target for loongarch-d.cc. * config/loongarch/loongarch-d.cc (loongarch_d_target_versions): add interface function to define builtin D versions for LoongArch architecture. (loongarch_d_handle_target_float_abi): add interface function to define builtin D traits for LoongArch architecture. (loongarch_d_register_target_info): add interface function to register loongarch_d_handle_target_float_abi function. * config/loongarch/loongarch-d.h (loongarch_d_target_versions): add function prototype. (loongarch_d_register_target_info): Likewise. libphobos/ChangeLog: * configure.tgt: Enable libphobos for LoongArch architecture. * libdruntime/gcc/sections/elf.d: Add TLS_DTV_OFFSET constant for LoongArch64. * libdruntime/gcc/unwind/generic.d: Add __aligned__ constant for LoongArch64.