Age | Commit message (Collapse) | Author | Files | Lines |
|
We expanded (neg x) to (minus const0 x) for LSX FP vectors, this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with LSX enabled.
Use the vbitrevi.{d/w} instructions to simply reverse the sign bit
instead. We are already doing this for LASX and now we can unify them
into simd.md.
gcc/ChangeLog:
* config/loongarch/lsx.md (neg<mode:FLSX>2): Remove the
incorrect expand.
* config/loongarch/simd.md (simdfmt_as_i): New define_mode_attr.
(elmsgnbit): Likewise.
(neg<mode:FVEC>2): New define_insn.
* config/loongarch/lasx.md (negv4df2, negv8sf2): Remove as they
are now instantiated in simd.md.
|
|
We call loongarch_symbol_insns with mode = MAX_MACHINE_MODE sometimes.
But in loongarch_symbol_insns:
if (LSX_SUPPORTED_MODE_P (mode) || LASX_SUPPORTED_MODE_P (mode))
return 0;
And LSX_SUPPORTED_MODE_P is defined as:
#define LSX_SUPPORTED_MODE_P(MODE) \
(ISA_HAS_LSX \
&& GET_MODE_SIZE (MODE) == UNITS_PER_LSX_REG ... ...
GET_MODE_SIZE is expanded to a call to mode_to_bytes, which is defined:
ALWAYS_INLINE poly_uint16
mode_to_bytes (machine_mode mode)
{
#if GCC_VERSION >= 4001
return (__builtin_constant_p (mode)
? mode_size_inline (mode) : mode_size[mode]);
#else
return mode_size[mode];
#endif
}
There is an assertion in mode_size_inline:
gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);
Note that NUM_MACHINE_MODES = MAX_MACHINE_MODE (emitted by genmodes.cc),
thus if __builtin_constant_p (mode) is evaluated true (it happens when
GCC is bootstrapped with LTO+PGO), the assertion will be triggered and
cause an ICE. OTOH if __builtin_constant_p (mode) is evaluated false,
mode_size[mode] is still an out-of-bound array access (the length or the
mode_size array is NUM_MACHINE_MODES).
So we shouldn't call LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P with
MAX_MACHINE_MODE in loongarch_symbol_insns. This is very similar to a
MIPS bug PR98491 fixed by me about 3 years ago.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbol_insns): Do not
use LSX_SUPPORTED_MODE_P or LASX_SUPPORTED_MODE_P if mode is
MAX_MACHINE_MODE.
|
|
This FAIL was introduced from r14-6908. The reason is that when merging
constant vector permutation implementations, the 128-bit matching situation
was not fully considered. In fact, the expansion of 128-bit vectors after
merging only supports value-based 4 elements set shuffle, so this time is a
complete implementation of the entire 128-bit vector constant permutation,
and some structural adjustments have also been made to the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_vselect): Adjust.
(loongarch_expand_vselect_vconcat): Ditto.
(loongarch_try_expand_lsx_vshuf_const): New, use vshuf to implement
all 128-bit constant permutation situations.
(loongarch_expand_lsx_shuffle): Adjust and rename function name.
(loongarch_is_imm_set_shuffle): Renamed function name.
(loongarch_expand_vec_perm_even_odd): Function forward declaration.
(loongarch_expand_vec_perm_even_odd_1): Add implement for 128-bit
extract-even and extract-odd permutations.
(loongarch_is_odd_extraction): Delete.
(loongarch_is_even_extraction): Ditto.
(loongarch_expand_vec_perm_const): Adjust.
|
|
When bootstrapping GCC 14 with --with-build-config=bootstrap-lto, an ODR
violation is detected:
../../gcc/config/loongarch/loongarch-opts.cc:57: warning:
'abi_minimal_isa' violates the C++ One Definition Rule [-Wodr]
57 | abi_minimal_isa[N_ABI_BASE_TYPES][N_ABI_EXT_TYPES];
../../gcc/config/loongarch/loongarch-def.cc:186: note:
'abi_minimal_isa' was previously declared here
186 | abi_minimal_isa = array<array<loongarch_isa, N_ABI_EXT_TYPES>,
../../gcc/config/loongarch/loongarch-def.cc:186: note:
code may be misoptimized unless '-fno-strict-aliasing' is used
Fix it by adding a proper declaration of abi_minimal_isa into
loongarch-def.h and remove the ODR-violating local declaration in
loongarch-opts.cc.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h (abi_minimal_isa): Declare.
* config/loongarch/loongarch-opts.cc (abi_minimal_isa): Remove
the ODR-violating locale declaration.
|
|
gcc/ChangeLog:
* config/loongarch/larchintrin.h
(__frecipe_s): Update function return type.
(__frecipe_d): Ditto.
(__frsqrte_s): Ditto.
(__frsqrte_d): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/larch-frecipe-intrinsic.c: New test.
|
|
We found that when only 128-bit vectorization was enabled, 549.fotonik3d_r
failed to vectorize effectively. For this reason, we adjust the cost of
128-bit vector_stmt that match the multiply-add pattern to facilitate 128-bit
vectorization.
The experimental results show that after the modification, 549.fotonik3d_r
performance can be improved by 9.77% under the 128-bit vectorization option.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_multiply_add_p): New.
(loongarch_vector_costs::add_stmt_cost): Adjust.
gcc/testsuite/ChangeLog:
* gfortran.dg/vect/vect-10.f90: New test.
|
|
model.
The ABI mandates the pcalau12i/addi.d/lu32i.d/lu52i.d instructions for
addressing a symbol to be adjacent. So model them as "one large
instruction", i.e. define_insn, with two output registers. The real
address is the sum of these two registers.
The advantage of this approach is the RTL passes can still use ldx/stx
instructions to skip an addi.d instruction.
gcc/ChangeLog:
* config/loongarch/loongarch.md (unspec): Add
UNSPEC_LA_PCREL_64_PART1 and UNSPEC_LA_PCREL_64_PART2.
(la_pcrel64_two_parts): New define_insn.
* config/loongarch/loongarch.cc (loongarch_tls_symbol): Fix a
typo in the comment.
(loongarch_call_tls_get_addr): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, use la_pcrel64_two_parts for
addressing the TLS symbol and __tls_get_addr. Emit an REG_EQUAL
note to allow CSE addressing __tls_get_addr.
(loongarch_legitimize_tls_address): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address TLS IE symbols with
la_pcrel64_two_parts.
(loongarch_split_symbol): If -mcmodel=extreme
-mexplicit-relocs={always,auto}, address symbols with
la_pcrel64_two_parts.
(loongarch_output_mi_thunk): Clean up unreachable code. If
-mcmodel=extreme -mexplicit-relocs={always,auto}, address the MI
thunks with la_pcrel64_two_parts.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/func-call-extreme-1.c (dg-options):
Use -O2 instead of -O0 to ensure the pcalau12i/addi/lu32i/lu52i
instruction sequences are not reordered by the compiler.
(NOIPA): Disallow interprocedural optimizations.
* gcc.target/loongarch/func-call-extreme-2.c: Remove the content
duplicated from func-call-extreme-1.c, include it instead.
(dg-options): Likewise.
* gcc.target/loongarch/func-call-extreme-3.c (dg-options):
Likewise.
* gcc.target/loongarch/func-call-extreme-4.c (dg-options):
Likewise.
* gcc.target/loongarch/cmodel-extreme-1.c: New test.
* gcc.target/loongarch/cmodel-extreme-2.c: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-1.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-2.C: New test.
* g++.target/loongarch/cmodel-extreme-mi-thunk-3.C: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_call_tls_get_addr):
Add support for call36.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-medium-call36-auto-tls-ld-gd.c: New test.
|
|
-mexplicit-relocs=auto.
Binutils does not support relaxation using four instructions to obtain
symbol addresses
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
When the code model of the symbol is extreme and -mexplicit-relocs=auto,
the macro instruction loading symbol address is not applicable.
(loongarch_call_tls_get_addr): Adjust code.
(loongarch_legitimize_tls_address): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-extreme-auto-tls-ld-gd.c: New test.
* gcc.target/loongarch/explicit-relocs-medium-auto-tls-ld-gd.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h (loongarch_symbol_extreme_p):
Add function declaration.
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For SYMBOL_PCREL64, non-zero addend of "la.local $rd,$rt,sym+addend"
is not allowed
(loongarch_load_tls): Added macro support in extreme mode.
(loongarch_call_tls_get_addr): Likewise.
(loongarch_legitimize_tls_address): Likewise.
(loongarch_force_address): Likewise.
(loongarch_legitimize_move): Likewise.
(loongarch_output_mi_thunk): Likewise.
(loongarch_option_override_internal): Remove the code that detects
explicit relocs status.
(loongarch_handle_model_attribute): Likewise.
* config/loongarch/loongarch.md (movdi_symbolic_off64): New template.
* config/loongarch/predicates.md (symbolic_off64_operand): New predicate.
(symbolic_off64_or_reg_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/attr-model-5.c: New test.
* gcc.target/loongarch/func-call-extreme-5.c: New test.
* gcc.target/loongarch/func-call-extreme-6.c: New test.
* gcc.target/loongarch/tls-extreme-macro.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_load_tls):
Load all types of tls symbols through one function.
(loongarch_got_load_tls_gd): Delete.
(loongarch_got_load_tls_ld): Delete.
(loongarch_got_load_tls_ie): Delete.
(loongarch_got_load_tls_le): Delete.
(loongarch_call_tls_get_addr): Modify the called function name.
(loongarch_legitimize_tls_address): Likewise.
* config/loongarch/loongarch.md (@got_load_tls_gd<mode>): Delete.
(@load_tls<mode>): New template.
(@got_load_tls_ld<mode>): Delete.
(@got_load_tls_le<mode>): Delete.
(@got_load_tls_ie<mode>): Delete.
|
|
values through fp.
Modify address calculation logic from (((a x C) + fp) + offset) to ((fp + offset) + a x C).
Thereby modifying the register dependencies and optimizing the code.
The value of C is 2 4 or 8.
The following is the assembly code before and after a loop modification in spec2006 401.bzip:
old | new
735 .L71: | 735 .L71:
736 slli.d $r12,$r15,2 | 736 slli.d $r12,$r15,2
737 ldx.w $r13,$r22,$r12 | 737 ldx.w $r13,$r22,$r12
738 addi.d $r15,$r15,-1 | 738 addi.d $r15,$r15,-1
739 slli.w $r16,$r15,0 | 739 slli.w $r16,$r15,0
740 addi.w $r13,$r13,-1 | 740 addi.w $r13,$r13,-1
741 slti $r14,$r13,0 | 741 slti $r14,$r13,0
742 add.w $r12,$r26,$r13 | 742 add.w $r12,$r26,$r13
743 maskeqz $r12,$r12,$r14 | 743 maskeqz $r12,$r12,$r14
744 masknez $r14,$r13,$r14 | 744 masknez $r14,$r13,$r14
745 or $r12,$r12,$r14 | 745 or $r12,$r12,$r14
746 ldx.bu $r14,$r30,$r12 | 746 ldx.bu $r14,$r30,$r12
747 lu12i.w $r13,4096>>12 | 747 alsl.d $r14,$r14,$r18,2
748 ori $r13,$r13,432 | 748 ldptr.w $r13,$r14,0
749 add.d $r13,$r13,$r3 | 749 addi.w $r17,$r13,-1
750 alsl.d $r14,$r14,$r13,2 | 750 stptr.w $r17,$r14,0
751 ldptr.w $r13,$r14,-1968 | 751 slli.d $r13,$r13,2
752 addi.w $r17,$r13,-1 | 752 stx.w $r12,$r22,$r13
753 st.w $r17,$r14,-1968 | 753 ldptr.w $r12,$r19,0
754 slli.d $r13,$r13,2 | 754 blt $r12,$r16,.L71
755 stx.w $r12,$r22,$r13 | 755 .align 4
756 ldptr.w $r12,$r18,-2048 | 756
757 blt $r12,$r16,.L71 | 757
758 .align 4 | 758
This patch is ported from riscv's commit r14-3111.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (mem_shadd_or_shadd_rtx_p): New function.
(loongarch_legitimize_address): Add logical transformation code.
|
|
For below pattern, can be treated as a simple move because floating point
and vector share a common register on loongarch64.
(set (reg/v:SF 32 $f0 [orig:93 res ] [93])
(vec_select:SF (reg:V8SF 32 $f0 [115])
(parallel [
(const_int 0 [0])
])))
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_extract<mode>_0):
New define_insn_and_split patten.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-extract.c: New test.
|
|
Define LOGICAL_OP_NON_SHORT_CIRCUIT as 0, for a short-circuit branch, use the
short-circuit operation instead of the non-short-circuit operation.
SPEC2017 performance evaluation shows 1% performance improvement for fprate
GEOMEAN and no obvious regression for others. Especially, 526.blender_r +10.6%
on 3A6000.
This modification will introduce the following FAIL items:
FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Conditional combines static and invariant" 1
FAIL: gcc.dg/tree-ssa/copy-headers-8.c scan-tree-dump-times ch2 "Will duplicate bb" 2
FAIL: gcc.dg/tree-ssa/update-threading.c scan-tree-dump-times optimized "Invalid sum" 0
gcc/ChangeLog:
* config/loongarch/loongarch.h (LOGICAL_OP_NON_SHORT_CIRCUIT): Define.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/short-circuit.c: New test.
|
|
approximate division.
We found that in the spec17 521.wrf program, some loop invariant code generated
from single-precision floating-point approximate division calculation failed to
propose a loop. This is because the pseudo-register that stores the
intermediate temporary calculation results is rewritten in the implementation
of single-precision floating-point approximate division, failing to propose
invariants in the loop2_invariant pass. To this end, the intermediate temporary
calculation results are stored in new pseudo-registers without destroying the
read-write dependency, so that they could be recognized as loop invariants in
the loop2_invariant pass.
After optimization, the number of instructions of 521.wrf is reduced by 0.18%
compared with before optimization (1716612948501 -> 1713471771364).
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_emit_swdivsf): Adjust.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/invariant-recip.c: New test.
|
|
It is incorrect to use vld/vori to implement the vec_concatz<mode> because when the LSX
instruction is used to update the value of the vector register, the upper 128 bits of
the vector register will not be zeroed.
gcc/ChangeLog:
* config/loongarch/lasx.md (@vec_concatz<mode>): Remove this define_insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init): Use vec_concat<mode>.
|
|
TLS gd ld and ie type symbols will generate corresponding GOT entries,
so non-zero offsets cannot be generated.
The address of TLS le type symbol+addend is not implemented in binutils,
so non-zero offset is not generated here for the time being.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_symbolic_constant_p):
For symbols of type tls, non-zero Offset is not generated.
|
|
Binutils 2.42 supports TLS LD/GD relaxation which requires the assembler
macro.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_explicit_relocs_p):
If la_opt_explicit_relocs is EXPLICIT_RELOCS_AUTO, return false
for SYMBOL_TLS_LDM and SYMBOL_TLS_GD.
(loongarch_call_tls_get_addr): Do not split symbols of
SYMBOL_TLS_LDM or SYMBOL_TLS_GD if la_opt_explicit_relocs is
EXPLICIT_RELOCS_AUTO.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-auto-tls-ld-gd.c: Check
for la.tls.ld and la.tls.gd.
|
|
We don't allow SImode in FCC, so constraint z is never really used
here.
gcc/ChangeLog:
* config/loongarch/loongarch.md (movsi_internal): Remove
constraint z.
|
|
table belongs.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_split_symbol):
Assign the '/u' attribute to the mem.
gcc/testsuite/ChangeLog:
* g++.target/loongarch/got-load.C: New test.
|
|
Eliminate the redundant sign extension that exists after the conditional
move when the target register is SImode.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Adjust.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/sign-extend-2.c: Adjust.
|
|
We found that the current combine optimization pass in gcc cannot handle
the following redundant sign extension situations:
(insn 77 76 78 5 (set (reg:SI 143)
(plus:SI (subreg/s/u:SI (reg/v:DI 104 [ len ]) 0)
(const_int 1 [0x1]))) {addsi3}
(expr_list:REG_DEAD (reg/v:DI 104 [ len ])
(nil)))
(insn 78 77 82 5 (set (reg/v:DI 104 [ len ])
(sign_extend:DI (reg:SI 143))) {extendsidi2}
(nil))
Because reg:SI 143 is not died or set in insn 78, no replacement merge will
be performed for the insn sequence. We adjusted the add template to eliminate
redundant sign extensions during the expand pass.
Adjusted based on upstream comments:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641988.html
gcc/ChangeLog:
* config/loongarch/loongarch.md (add<mode>3): Removed.
(*addsi3): New.
(addsi3): Ditto.
(adddi3): Ditto.
(*addsi3_extended): Removed.
(addsi3_extended): New.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/sign-extend.c: Moved to...
* gcc.target/loongarch/sign-extend-1.c: ...here.
* gcc.target/loongarch/sign-extend-2.c: New test.
|
|
LTO option streaming and target attributes both require per-function
target configuration, which is achieved via option save/restore.
We implement TARGET_OPTION_{SAVE,RESTORE} to switch the la_target
context in addition to other automatically maintained option states
(via the "Save" option property in the .opt files).
Tested on loongarch64-linux-gnu without regression.
PR target/113233
gcc/ChangeLog:
* config/loongarch/genopts/loongarch.opt.in: Mark options with
the "Save" property.
* config/loongarch/loongarch.opt: Same.
* config/loongarch/loongarch-opts.cc: Refresh -mcmodel= state
according to la_target.
* config/loongarch/loongarch.cc: Implement TARGET_OPTION_{SAVE,
RESTORE} for the la_target structure; Rename option conditions
to have the same "la_" prefix.
* config/loongarch/loongarch.h: Same.
|
|
during bitwise operations.
There are two mode iterators defined in the loongarch.md:
(define_mode_iterator GPR [SI (DI "TARGET_64BIT")])
and
(define_mode_iterator X [(SI "!TARGET_64BIT") (DI "TARGET_64BIT")])
Replace the mode in the bit arithmetic from GPR to X.
Since the bitwise operation instruction does not distinguish between 64-bit,
32-bit, etc., it is necessary to perform symbolic expansion if the bitwise
operation is less than 64 bits.
The original definition would have generated a lot of redundant symbolic
extension instructions. This problem is optimized with reference to the
implementation of RISCV.
Add this patch spec2017 500.perlbench performance improvement by 1.8%
gcc/ChangeLog:
* config/loongarch/loongarch.md (one_cmpl<mode>2): Replace GPR with X.
(*nor<mode>3): Likewise.
(nor<mode>3): Likewise.
(*negsi2_extended): New template.
(*<optab>si3_internal): Likewise.
(*one_cmplsi2_internal): Likewise.
(*norsi3_internal): Likewise.
(*<optab>nsi_internal): Likewise.
(bytepick_w_<bytepick_imm>_extend): Modify this template according to the
modified bit operation to make the optimization work.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/sign-extend-bitwise.c: New test.
|
|
Since we do not need printing or manual parsing of this option,
(whether in the driver or for target attributes to be supported later)
it can be handled in the .opt file framework.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch-strings: Remove explicit-reloc
argument string definitions.
* config/loongarch/loongarch-str.h: Same.
* config/loongarch/genopts/loongarch.opt.in: Mark -m[no-]explicit-relocs
as aliases to -mexplicit-relocs={always,none}
* config/loongarch/loongarch.opt: Regenerate.
* config/loongarch/loongarch.cc: Same.
|
|
Target features constants from loongarch-def.h are currently defined as macros.
Switch to enums for better look in the debugger.
gcc/ChangeLog:
* config/loongarch/loongarch-def.h: Define constants with
enums instead of Macros.
|
|
LoongArch ISA manual v1.10 suggests that software should not depend on
the ISA version number for marking processor features. The ISA version
number is now defined as a collective name of individual ISA evolutions.
Since there is a independent ISA evolution mask now, we can drop the
version information from the base ISA.
gcc/ChangeLog:
* config/loongarch/genopts/loongarch-strings: Rename.
* config/loongarch/genopts/loongarch.opt.in: Same.
* config/loongarch/loongarch-cpu.cc: Same.
* config/loongarch/loongarch-def.cc: Same.
* config/loongarch/loongarch-def.h: Same.
* config/loongarch/loongarch-opts.cc: Same.
* config/loongarch/loongarch-opts.h: Same.
* config/loongarch/loongarch-str.h: Same.
* config/loongarch/loongarch.opt: Same.
|
|
gcc/ChangeLog:
* config/loongarch/genopts/genstr.sh: Prepend the isa_evolution
variable with the common la_ prefix.
* config/loongarch/genopts/loongarch.opt.in: Mark ISA evolution
flags as saved using TargetVariable.
* config/loongarch/loongarch.opt: Same.
* config/loongarch/loongarch-def.h: Define evolution_set to
mark changes to the -march default.
* config/loongarch/loongarch-driver.cc: Same.
* config/loongarch/loongarch-opts.cc: Same.
* config/loongarch/loongarch-opts.h: Define and use ISA evolution
conditions around the la_target structure.
* config/loongarch/loongarch.cc: Same.
* config/loongarch/loongarch.md: Same.
* config/loongarch/loongarch-builtins.cc: Same.
* config/loongarch/loongarch-c.cc: Same.
* config/loongarch/lasx.md: Same.
* config/loongarch/lsx.md: Same.
* config/loongarch/sync.md: Same.
|
|
This patch implements more vec_init optabs that can handle two LSX vectors producing a LASX
vector by concatenating them. When an lsx vector is concatenated with an LSX const_vector of
zeroes, the vec_concatz pattern can be used effectively. For example as below
typedef short v8hi __attribute__ ((vector_size (16)));
typedef short v16hi __attribute__ ((vector_size (32)));
v8hi a, b;
v16hi vec_initv16hiv8hi ()
{
return __builtin_shufflevector (a, b, 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15);
}
Before this patch:
vec_initv16hiv8hi:
addi.d $r3,$r3,-64
.cfi_def_cfa_offset 64
xvrepli.h $xr0,0
la.local $r12,.LANCHOR0
xvst $xr0,$r3,0
xvst $xr0,$r3,32
vld $vr0,$r12,0
vst $vr0,$r3,0
vld $vr0,$r12,16
vst $vr0,$r3,32
xvld $xr1,$r3,32
xvld $xr2,$r3,32
xvld $xr0,$r3,0
xvilvh.h $xr0,$xr1,$xr0
xvld $xr1,$r3,0
xvilvl.h $xr1,$xr2,$xr1
addi.d $r3,$r3,64
.cfi_def_cfa_offset 0
xvpermi.q $xr0,$xr1,32
jr $r1
After this patch:
vec_initv16hiv8hi:
la.local $r12,.LANCHOR0
vld $vr0,$r12,32
vld $vr2,$r12,48
xvilvh.h $xr1,$xr2,$xr0
xvilvl.h $xr0,$xr2,$xr0
xvpermi.q $xr1,$xr0,32
xvst $xr1,$r4,0
jr $r1
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_initv32qiv16qi): Rename to ..
(vec_init<mode><lasxhalf>): .. this, and extend to mode.
(@vec_concatz<mode>): New insn pattern.
* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
Handle VALS containing two vectors.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-vec-init-2.c: New test.
|
|
For instruction xvpermi.q, unused bits in operands[3] need be set to 0 to avoid
causing undefined behavior on LA464.
gcc/ChangeLog:
* config/loongarch/lasx.md: Set the unused bits in operand[3] to 0.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lasx/lasx-xvpremi.c: Removed.
* gcc.target/loongarch/vector/lasx/lasx-xvpermi_q.c: New test.
|
|
the [x]vld/[x]vst instruction.
The [x]vld/[x]vst directive is defined as follows:
[x]vld/[x]vst {x/v}d, rj, si12
When not modified, the immediate field of [x]vld/[x]vst is between 10 and
14 bits depending on the type. However, in loongarch_valid_offset_p, the
immediate field is restricted first, so there is no error. However, in
some cases redundant instructions will be generated, see test cases.
Now modify it according to the description in the instruction manual.
gcc/ChangeLog:
* config/loongarch/lasx.md (lasx_mxld_<lasxfmt_f>):
Modify the method of determining the memory offset of [x]vld/[x]vst.
(lasx_mxst_<lasxfmt_f>): Likewise.
* config/loongarch/loongarch.cc (loongarch_valid_offset_p): Delete.
(loongarch_address_insns): Likewise.
* config/loongarch/lsx.md (lsx_ld_<lsxfmt_f>): Likewise.
(lsx_st_<lsxfmt_f>): Likewise.
* config/loongarch/predicates.md (aq10b_operand): Likewise.
(aq10h_operand): Likewise.
(aq10w_operand): Likewise.
(aq10d_operand): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vect-ld-st-imm12.c: New test.
|
|
Changed in v5: regenerated
Changed in v4: regenerated
Changed in v3: regenerated
Changed in v2: the files now contain some lang-specific URLs.
gcc/ada/ChangeLog:
* gcc-interface/lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/analyzer/ChangeLog:
* analyzer.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/c-family/ChangeLog:
* c.opt.urls: New file, autogenerated by regenerate-opt-urls.py.
gcc/ChangeLog:
* common.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
* config/aarch64/aarch64.opt.urls: Likewise.
* config/alpha/alpha.opt.urls: Likewise.
* config/alpha/elf.opt.urls: Likewise.
* config/arc/arc-tables.opt.urls: Likewise.
* config/arc/arc.opt.urls: Likewise.
* config/arm/arm-tables.opt.urls: Likewise.
* config/arm/arm.opt.urls: Likewise.
* config/arm/vxworks.opt.urls: Likewise.
* config/avr/avr.opt.urls: Likewise.
* config/bpf/bpf.opt.urls: Likewise.
* config/c6x/c6x-tables.opt.urls: Likewise.
* config/c6x/c6x.opt.urls: Likewise.
* config/cris/cris.opt.urls: Likewise.
* config/cris/elf.opt.urls: Likewise.
* config/csky/csky.opt.urls: Likewise.
* config/csky/csky_tables.opt.urls: Likewise.
* config/darwin.opt.urls: Likewise.
* config/dragonfly.opt.urls: Likewise.
* config/epiphany/epiphany.opt.urls: Likewise.
* config/fr30/fr30.opt.urls: Likewise.
* config/freebsd.opt.urls: Likewise.
* config/frv/frv.opt.urls: Likewise.
* config/ft32/ft32.opt.urls: Likewise.
* config/fused-madd.opt.urls: Likewise.
* config/g.opt.urls: Likewise.
* config/gcn/gcn.opt.urls: Likewise.
* config/gnu-user.opt.urls: Likewise.
* config/h8300/h8300.opt.urls: Likewise.
* config/hpux11.opt.urls: Likewise.
* config/i386/cygming.opt.urls: Likewise.
* config/i386/cygwin.opt.urls: Likewise.
* config/i386/djgpp.opt.urls: Likewise.
* config/i386/i386.opt.urls: Likewise.
* config/i386/mingw-w64.opt.urls: Likewise.
* config/i386/mingw.opt.urls: Likewise.
* config/i386/nto.opt.urls: Likewise.
* config/ia64/ia64.opt.urls: Likewise.
* config/ia64/ilp32.opt.urls: Likewise.
* config/ia64/vms.opt.urls: Likewise.
* config/iq2000/iq2000.opt.urls: Likewise.
* config/linux-android.opt.urls: Likewise.
* config/linux.opt.urls: Likewise.
* config/lm32/lm32.opt.urls: Likewise.
* config/loongarch/loongarch.opt.urls: Likewise.
* config/lynx.opt.urls: Likewise.
* config/m32c/m32c.opt.urls: Likewise.
* config/m32r/m32r.opt.urls: Likewise.
* config/m68k/ieee.opt.urls: Likewise.
* config/m68k/m68k-tables.opt.urls: Likewise.
* config/m68k/m68k.opt.urls: Likewise.
* config/m68k/uclinux.opt.urls: Likewise.
* config/mcore/mcore.opt.urls: Likewise.
* config/microblaze/microblaze.opt.urls: Likewise.
* config/mips/mips-tables.opt.urls: Likewise.
* config/mips/mips.opt.urls: Likewise.
* config/mips/sde.opt.urls: Likewise.
* config/mmix/mmix.opt.urls: Likewise.
* config/mn10300/mn10300.opt.urls: Likewise.
* config/moxie/moxie.opt.urls: Likewise.
* config/msp430/msp430.opt.urls: Likewise.
* config/nds32/nds32-elf.opt.urls: Likewise.
* config/nds32/nds32-linux.opt.urls: Likewise.
* config/nds32/nds32.opt.urls: Likewise.
* config/netbsd-elf.opt.urls: Likewise.
* config/netbsd.opt.urls: Likewise.
* config/nios2/elf.opt.urls: Likewise.
* config/nios2/nios2.opt.urls: Likewise.
* config/nvptx/nvptx-gen.opt.urls: Likewise.
* config/nvptx/nvptx.opt.urls: Likewise.
* config/openbsd.opt.urls: Likewise.
* config/or1k/elf.opt.urls: Likewise.
* config/or1k/or1k.opt.urls: Likewise.
* config/pa/pa-hpux.opt.urls: Likewise.
* config/pa/pa-hpux1010.opt.urls: Likewise.
* config/pa/pa-hpux1111.opt.urls: Likewise.
* config/pa/pa-hpux1131.opt.urls: Likewise.
* config/pa/pa.opt.urls: Likewise.
* config/pa/pa64-hpux.opt.urls: Likewise.
* config/pdp11/pdp11.opt.urls: Likewise.
* config/pru/pru.opt.urls: Likewise.
* config/riscv/riscv.opt.urls: Likewise.
* config/rl78/rl78.opt.urls: Likewise.
* config/rpath.opt.urls: Likewise.
* config/rs6000/476.opt.urls: Likewise.
* config/rs6000/aix64.opt.urls: Likewise.
* config/rs6000/darwin.opt.urls: Likewise.
* config/rs6000/linux64.opt.urls: Likewise.
* config/rs6000/rs6000-tables.opt.urls: Likewise.
* config/rs6000/rs6000.opt.urls: Likewise.
* config/rs6000/sysv4.opt.urls: Likewise.
* config/rtems.opt.urls: Likewise.
* config/rx/elf.opt.urls: Likewise.
* config/rx/rx.opt.urls: Likewise.
* config/s390/s390.opt.urls: Likewise.
* config/s390/tpf.opt.urls: Likewise.
* config/sh/sh.opt.urls: Likewise.
* config/sh/superh.opt.urls: Likewise.
* config/sol2.opt.urls: Likewise.
* config/sparc/long-double-switch.opt.urls: Likewise.
* config/sparc/sparc.opt.urls: Likewise.
* config/stormy16/stormy16.opt.urls: Likewise.
* config/v850/v850.opt.urls: Likewise.
* config/vax/elf.opt.urls: Likewise.
* config/vax/vax.opt.urls: Likewise.
* config/visium/visium.opt.urls: Likewise.
* config/vms/vms.opt.urls: Likewise.
* config/vxworks-smp.opt.urls: Likewise.
* config/vxworks.opt.urls: Likewise.
* config/xtensa/elf.opt.urls: Likewise.
* config/xtensa/uclinux.opt.urls: Likewise.
* config/xtensa/xtensa.opt.urls: Likewise.
gcc/d/ChangeLog:
* lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/fortran/ChangeLog:
* lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/go/ChangeLog:
* lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/lto/ChangeLog:
* lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/m2/ChangeLog:
* lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/ChangeLog:
* params.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
gcc/rust/ChangeLog:
* lang.opt.urls: New file, autogenerated by
regenerate-opt-urls.py.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
There are currently two versions of the implementations of constant
vector permutation: loongarch_expand_vec_perm_const_1 and
loongarch_expand_vec_perm_const_2. The implementations of the two
versions are different. Currently, only the implementation of
loongarch_expand_vec_perm_const_1 is used for 256-bit vectors. We
hope to streamline the code as much as possible while retaining the
better-performing implementation of the two. By repeatedly testing
spec2006 and spec2017, we got the following Merged version.
Compared with the pre-merger version, the number of lines of code
in loongarch.cc has been reduced by 888 lines. At the same time,
the performance of SPECint2006 under Ofast has been improved by 0.97%,
and the performance of SPEC2017 fprate has been improved by 0.27%.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_is_odd_extraction):
Remove useless forward declaration.
(loongarch_is_even_extraction): Remove useless forward declaration.
(loongarch_try_expand_lsx_vshuf_const): Removed.
(loongarch_expand_vec_perm_const_1): Merged.
(loongarch_is_double_duplicate): Removed.
(loongarch_is_center_extraction): Ditto.
(loongarch_is_reversing_permutation): Ditto.
(loongarch_is_di_misalign_extract): Ditto.
(loongarch_is_si_misalign_extract): Ditto.
(loongarch_is_lasx_lowpart_extract): Ditto.
(loongarch_is_op_reverse_perm): Ditto.
(loongarch_is_single_op_perm): Ditto.
(loongarch_is_divisible_perm): Ditto.
(loongarch_is_triple_stride_extract): Ditto.
(loongarch_expand_vec_perm_const_2): Merged.
(loongarch_expand_vec_perm_const): New.
(loongarch_vectorize_vec_perm_const): Adjust.
|
|
|
|
We already had smin/smax RTL pattern using vfmin/vfmax instructions.
But for smin/smax, it's unspecified what will happen if either operand
contains any NaN operands. So we would not vectorize the loop with
-fno-finite-math-only (the default for all optimization levels expect
-Ofast).
But, LoongArch vfmin/vfmax instruction is IEEE-754-2008 conformant so we
can also use them and vectorize the loop.
gcc/ChangeLog:
* config/loongarch/simd.md (fmax<mode>3): New define_insn.
(fmin<mode>3): Likewise.
(reduc_fmax_scal_<mode>3): New define_expand.
(reduc_fmin_scal_<mode>3): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vfmax-vfmin.c: New test.
|
|
Check whether the assembler supports tls le relax. If it supports it, the assembly
instruction sequence of tls le relax will be generated by default.
The original way to obtain the tls le symbol address:
lu12i.w $rd, %le_hi20(sym)
ori $rd, $rd, %le_lo12(sym)
add.{w/d} $rd, $rd, $tp
If the assembler supports tls le relax, the following sequence is generated:
lu12i.w $rd, %le_hi20_r(sym)
add.{w/d} $rd,$rd,$tp,%le_add_r(sym)
addi.{w/d} $rd,$rd,%le_lo12_r(sym)
gcc/ChangeLog:
* config.in: Regenerate.
* config/loongarch/loongarch-opts.h (HAVE_AS_TLS_LE_RELAXATION): Define.
* config/loongarch/loongarch.cc (loongarch_legitimize_tls_address):
Added TLS Le Relax support.
(loongarch_print_operand_reloc): Add the output string of TLS Le Relax.
* config/loongarch/loongarch.md (@add_tls_le_relax<mode>): New template.
* configure: Regenerate.
* configure.ac: Check if binutils supports TLS le relax.
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add a function to check whether binutil supports
TLS Le Relax.
* gcc.target/loongarch/tls-le-relax.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.md (bstrins_<mode>_for_ior_mask):
For the condition, remove unneeded trailing "\" and move "&&" to
follow GNU coding style. NFC.
|
|
combine
The problem with peephole2 is it uses a naive sliding-window algorithm
and misses many cases. For example:
float a[10000];
float t() { return a[0] + a[8000]; }
is compiled to:
la.local $r13,a
la.local $r12,a+32768
fld.s $f1,$r13,0
fld.s $f0,$r12,-768
fadd.s $f0,$f1,$f0
by trunk. But as we've explained in r14-4851, the following would be
better with -mexplicit-relocs=auto:
pcalau12i $r13,%pc_hi20(a)
pcalau12i $r12,%pc_hi20(a+32000)
fld.s $f1,$r13,%pc_lo12(a)
fld.s $f0,$r12,%pc_lo12(a+32000)
fadd.s $f0,$f1,$f0
However the sliding-window algorithm just won't detect the pcalau12i/fld
pair to be optimized. Use a define_insn_and_rewrite in combine pass
will work around the issue.
gcc/ChangeLog:
* config/loongarch/predicates.md
(symbolic_pcrel_offset_operand): New define_predicate.
(mem_simple_ldst_operand): Likewise.
* config/loongarch/loongarch-protos.h
(loongarch_rewrite_mem_for_simple_ldst): Declare.
* config/loongarch/loongarch.cc
(loongarch_rewrite_mem_for_simple_ldst): Implement.
* config/loongarch/loongarch.md (simple_load<mode>): New
define_insn_and_rewrite.
(simple_load_<su>ext<SUBDI:mode><GPR:mode>): Likewise.
(simple_store<mode>): Likewise.
(define_peephole2): Remove la.local/[f]ld peepholes.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-2.c:
New test.
* gcc.target/loongarch/explicit-relocs-auto-single-load-store-3.c:
New test.
|
|
The GCC internal doc says:
X might be a pseudo-register or a 'subreg' of a pseudo-register,
which could either be in a hard register or in memory. Use
'true_regnum' to find out; it will return -1 if the pseudo is in
memory and the hard register number if it is in a register.
So "MEM_P (x)" is not enough for checking if we are reloading from/to
the memory. This bug has caused reload pass to stall and finally ICE
complaining with "maximum number of generated reload insns per insn
achieved", since r14-6814.
Check if "true_regnum (x)" is -1 besides "MEM_P (x)" to fix the issue.
gcc/ChangeLog:
PR target/113148
* config/loongarch/loongarch.cc (loongarch_secondary_reload):
Check if regno == -1 besides MEM_P (x) for reloading FCCmode
from/to FPR to/from memory.
gcc/testsuite/ChangeLog:
PR target/113148
* gcc.target/loongarch/pr113148.c: New test.
|
|
gcc/ChangeLog:
* config/loongarch/loongarch.md (rotl<mode>3):
New define_expand.
* config/loongarch/simd.md (vrotl<mode>3): Likewise.
(rotl<mode>3): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/rotl-with-rotr.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-vrotr-d.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-b.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-h.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-w.c: New test.
* gcc.target/loongarch/rotl-with-xvrotr-d.c: New test.
|
|
Following code will cause ICE on LoongArch target:
#include <lsxintrin.h>
extern void bar (__m128i, __m128i);
__m128i a;
void
foo ()
{
bar (a, a);
}
It is caused by missing constraint definition in mov<mode>_lsx. This
patch fixes the template and remove the unnecessary processing from
loongarch_split_move () function.
This patch also cleanup the redundant definition from
loongarch_split_move () and loongarch_split_move_p ().
gcc/ChangeLog:
* config/loongarch/lasx.md: Use loongarch_split_move and
loongarch_split_move_p directly.
* config/loongarch/loongarch-protos.h
(loongarch_split_move): Remove unnecessary argument.
(loongarch_split_move_insn_p): Delete.
(loongarch_split_move_insn): Delete.
* config/loongarch/loongarch.cc
(loongarch_split_move_insn_p): Delete.
(loongarch_load_store_insns): Use loongarch_split_move_p
directly.
(loongarch_split_move): remove the unnecessary processing.
(loongarch_split_move_insn): Delete.
* config/loongarch/lsx.md: Use loongarch_split_move and
loongarch_split_move_p directly.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/vector/lsx/lsx-mov-1.c: New test.
|
|
When investigaing failure of gcc.dg/vect/slp-reduc-sad.c, following
instruction block are being generated by vec_concatv32qi (which is
generated by vec_initv32qiv16qi) at entrance of foo() function:
vldx $vr3,$r5,$r6
vld $vr2,$r5,0
xvpermi.q $xr2,$xr3,0x20
causes the reversion of vec_initv32qiv16qi operation's high and
low 128-bit part.
According to other target's similar impl and LSX impl for following
RTL representation, current definition in lasx.md of "vec_concat<mode>"
are wrong:
(set (op0) (vec_concat (op1) (op2)))
For correct behavior, the last argument of xvpermi.q should be 0x02
instead of 0x20. This patch fixes this issue and cleanup the vec_concat
template impl.
gcc/ChangeLog:
* config/loongarch/lasx.md (vec_concatv4di): Delete.
(vec_concatv8si): Delete.
(vec_concatv16hi): Delete.
(vec_concatv32qi): Delete.
(vec_concatv4df): Delete.
(vec_concatv8sf): Delete.
(vec_concat<mode>): New template with insn output fixed.
|
|
We found that using the latest compiled gcc will cause a miscompare error
when running spec2006 400.perlbench test with -flto turned on. After testing,
it was found that only the LoongArch architecture will report errors.
The first error commit was located through the git bisect command as
r14-3773-g5b857e87201335. Through debugging, it was found that the problem
was that the split condition of the *bstrins_<mode>_for_ior_mask template was
empty, which should actually be consistent with the insn condition.
gcc/ChangeLog:
* config/loongarch/loongarch.md: Adjust.
|
|
Remove a redundant sign extension.
gcc/ChangeLog:
* config/loongarch/loongarch.md (rotrsi3_extend): New
define_insn.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/rotrw.c: New test.
|
|
We used a branch to load floating-point comparison results into GPR.
This is very slow when the branch is not predictable.
Implement movfcc so we can reload FCCmode into GPRs, FPRs, and MEM.
Then implement cstore<ANYF:mode>4.
gcc/ChangeLog:
* config/loongarch/loongarch-tune.h
(loongarch_rtx_cost_data::movcf2gr): New field.
(loongarch_rtx_cost_data::movcf2gr_): New method.
(loongarch_rtx_cost_data::use_movcf2gr): New method.
* config/loongarch/loongarch-def.cc
(loongarch_rtx_cost_data::loongarch_rtx_cost_data): Set movcf2gr
to COSTS_N_INSNS (7) and movgr2cf to COSTS_N_INSNS (15), based
on timing on LA464.
(loongarch_cpu_rtx_cost_data): Set movcf2gr and movgr2cf to
COSTS_N_INSNS (1) for LA664.
(loongarch_rtx_cost_optimize_size): Set movcf2gr and movgr2cf to
COSTS_N_INSNS (1) + 1.
* config/loongarch/predicates.md (loongarch_fcmp_operator): New
predicate.
* config/loongarch/loongarch.md (movfcc): Change to
define_expand.
(movfcc_internal): New define_insn.
(fcc_to_<X:mode>): New define_insn.
(cstore<ANYF:mode>4): New define_expand.
* config/loongarch/loongarch.cc
(loongarch_hard_regno_mode_ok_uncached): Allow FCCmode in GPRs
and GPRs.
(loongarch_secondary_reload): Reload FCCmode via FPR and/or GPR.
(loongarch_emit_float_compare): Call gen_reg_rtx instead of
loongarch_allocate_fcc.
(loongarch_allocate_fcc): Remove.
(loongarch_move_to_gpr_cost): Handle FCC_REGS -> GR_REGS.
(loongarch_move_from_gpr_cost): Handle GR_REGS -> FCC_REGS.
(loongarch_register_move_cost): Handle FCC_REGS -> FCC_REGS,
FCC_REGS -> FP_REGS, and FP_REGS -> FCC_REGS.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/movcf2gr.c: New test.
* gcc.target/loongarch/movcf2gr-via-fr.c: New test.
|
|
When I attempt to enable vect_usad_char effective target for LoongArch, slp-reduc-sad.c
and vect-reduc-sad*.c tests fail. These tests fail because the sad pattern generates bad
code. This patch to fixed them, for sad patterns, use zero expansion instead of sign
expansion for reduction.
Currently, we are fixing failed vectorized tests, and in the future, we will
enable more tests of "vect" for LoongArch.
gcc/ChangeLog:
* config/loongarch/lasx.md: Use zero expansion instruction.
* config/loongarch/lsx.md: Ditto.
|
|
Non functional change, clean up the code.
gcc/ChangeLog:
* config/loongarch/loongarch.cc
(loongarch_expand_vector_init_same): Remove "temp2" and reuse
"temp" instead.
(loongarch_expand_vector_init): Use gcc_unreachable () instead
of gcc_assert (0), and fix the comment for it.
|
|
expander [PR113033]
Jakub says:
Then that seems like a bug in the loongarch vec_init pattern(s).
Those really don't have a predicate in any of the backends on the
input operand, so they need to force_reg it if it is something it
can't handle. I've looked e.g. at i386 vec_init and that is exactly
what it does, see the various tests + force_reg calls in
ix86_expand_vector_init*.
So replace gen_reg_rtx + emit_move_insn with force_reg to fix PR 113033.
gcc/ChangeLog:
PR target/113033
* config/loongarch/loongarch.cc
(loongarch_expand_vector_init_same): Replace gen_reg_rtx +
emit_move_insn with force_reg.
(loongarch_expand_vector_init): Likewise.
gcc/testsuite/ChangeLog:
PR target/113033
* gcc.target/loongarch/pr113033.c: New test.
|
|
We had the following mappings between <x>vfcmp submenmonics and RTX
codes:
(define_code_attr fcc
[(unordered "cun")
(ordered "cor")
(eq "ceq")
(ne "cne")
(uneq "cueq")
(unle "cule")
(unlt "cult")
(le "cle")
(lt "clt")])
This is inconsistent with scalar code:
(define_code_attr fcond [(unordered "cun")
(uneq "cueq")
(unlt "cult")
(unle "cule")
(eq "ceq")
(lt "slt")
(le "sle")
(ordered "cor")
(ltgt "sne")
(ne "cune")
(ge "sge")
(gt "sgt")
(unge "cuge")
(ungt "cugt")])
For every RTX code for which the LSX/LASX code is different from the
scalar code, the scalar code is correct and the LSX/LASX code is wrong.
Most seriously, the RTX code NE should be mapped to "cneq", not "cne".
Rewrite <x>vfcmp define_insns in simd.md using the same mapping as
scalar fcmp.
Note that GAS does not support [x]vfcmp.{c/s}[u]{ge/gt} (pseudo)
instruction (although fcmp.{c/s}[u]{ge/gt} is supported), so we need to
switch the order of inputs and use [x]vfcmp.{c/s}[u]{le/lt} instead.
The <x>vfcmp.{sult/sule/clt/cle}.{s/d} instructions do not have a single
RTX code, but they can be modeled as an inversed RTX code following a
"not" operation. Doing so allows the compiler to optimized vectorized
__builtin_isless etc. to a single instruction. This optimization should
be added for scalar code too and I'll do it later.
Tests are added for mapping between C code, IEC 60559 operations, and
vfcmp instructions.
[1]:https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640713.html
gcc/ChangeLog:
PR target/113034
* config/loongarch/lasx.md (UNSPEC_LASX_XVFCMP_*): Remove.
(lasx_xvfcmp_caf_<flasxfmt>): Remove.
(lasx_xvfcmp_cune_<FLASX:flasxfmt>): Remove.
(FSC256_UNS): Remove.
(fsc256): Remove.
(lasx_xvfcmp_<vfcond:fcc>_<FLASX:flasxfmt>): Remove.
(lasx_xvfcmp_<fsc256>_<FLASX:flasxfmt>): Remove.
* config/loongarch/lsx.md (UNSPEC_LSX_XVFCMP_*): Remove.
(lsx_vfcmp_caf_<flsxfmt>): Remove.
(lsx_vfcmp_cune_<FLSX:flsxfmt>): Remove.
(vfcond): Remove.
(fcc): Remove.
(FSC_UNS): Remove.
(fsc): Remove.
(lsx_vfcmp_<vfcond:fcc>_<FLSX:flsxfmt>): Remove.
(lsx_vfcmp_<fsc>_<FLSX:flsxfmt>): Remove.
* config/loongarch/simd.md
(fcond_simd): New define_code_iterator.
(<simd_isa>_<x>vfcmp_<fcond:fcond_simd>_<simdfmt>):
New define_insn.
(fcond_simd_rev): New define_code_iterator.
(fcond_rev_asm): New define_code_attr.
(<simd_isa>_<x>vfcmp_<fcond:fcond_simd_rev>_<simdfmt>):
New define_insn.
(fcond_inv): New define_code_iterator.
(fcond_inv_rev): New define_code_iterator.
(fcond_inv_rev_asm): New define_code_attr.
(<simd_isa>_<x>vfcmp_<fcond_inv>_<simdfmt>): New define_insn.
(<simd_isa>_<x>vfcmp_<fcond_inv:fcond_inv_rev>_<simdfmt>):
New define_insn.
(UNSPEC_SIMD_FCMP_CAF, UNSPEC_SIMD_FCMP_SAF,
UNSPEC_SIMD_FCMP_SEQ, UNSPEC_SIMD_FCMP_SUN,
UNSPEC_SIMD_FCMP_SUEQ, UNSPEC_SIMD_FCMP_CNE,
UNSPEC_SIMD_FCMP_SOR, UNSPEC_SIMD_FCMP_SUNE): New unspecs.
(SIMD_FCMP): New define_int_iterator.
(fcond_unspec): New define_int_attr.
(<simd_isa>_<x>vfcmp_<fcond_unspec>_<simdfmt>): New define_insn.
* config/loongarch/loongarch.cc (loongarch_expand_lsx_cmp):
Remove unneeded special cases.
gcc/testsuite/ChangeLog:
PR target/113034
* gcc.target/loongarch/vfcmp-f.c: New test.
* gcc.target/loongarch/vfcmp-d.c: New test.
* gcc.target/loongarch/xvfcmp-f.c: New test.
* gcc.target/loongarch/xvfcmp-d.c: New test.
* gcc.target/loongarch/vector/lasx/lasx-vcond-2.c: Scan for cune
instead of cne.
* gcc.target/loongarch/vector/lsx/lsx-vcond-2.c: Likewise.
|
|
gcc/ChangeLog:
* config.gcc: Add loongarch-d.o to d_target_objs for LoongArch
architecture.
* config/loongarch/t-loongarch: Add object target for loongarch-d.cc.
* config/loongarch/loongarch-d.cc
(loongarch_d_target_versions): add interface function to define builtin
D versions for LoongArch architecture.
(loongarch_d_handle_target_float_abi): add interface function to define
builtin D traits for LoongArch architecture.
(loongarch_d_register_target_info): add interface function to register
loongarch_d_handle_target_float_abi function.
* config/loongarch/loongarch-d.h
(loongarch_d_target_versions): add function prototype.
(loongarch_d_register_target_info): Likewise.
libphobos/ChangeLog:
* configure.tgt: Enable libphobos for LoongArch architecture.
* libdruntime/gcc/sections/elf.d: Add TLS_DTV_OFFSET constant for
LoongArch64.
* libdruntime/gcc/unwind/generic.d: Add __aligned__ constant for
LoongArch64.
|