Age | Commit message (Collapse) | Author | Files | Lines |
|
gcc/ChangeLog:
* config.gcc: Add a case for loongarch*-*-linux-musl*.
* config/loongarch/linux.h: Disable the multilib-compatible
treatment for *musl* targets.
* config/loongarch/musl.h: New file.
|
|
gcc/ChangeLog:
* config.gcc (target_gtfiles): Change coreout to btfext-out.
(extra_objs): Change coreout to btfext-out.
* config/bpf/coreout.cc: Rename to btfext-out.cc.
* config/bpf/btfext-out.cc: Add.
* config/bpf/coreout.h: Rename to btfext-out.h.
* config/bpf/btfext-out.h: Add.
* config/bpf/core-builtins.cc: Change include.
* config/bpf/core-builtins.h: Change include.
* config/bpf/t-bpf: Accomodate renamed files.
|
|
The following deprecates ia64*-*-* for GCC 14. Since we plan to
force LRA for GCC 15 and the target only has slim chances of getting
updated this notifies people in advance. Given both Linux and
glibc have axed the target further development is also made difficult.
There is no listed maintainer for ia64 either.
PR target/90785
gcc/
* config.gcc: Add ia64*-*-* to the list of obsoleted targets.
contrib/
* config-list.mk (LIST): --enable-obsolete for ia64*-*-*.
|
|
gcc/ChangeLog:
* config.gcc (amdgcn-*-*): Add gfx1030 and gfx1100 to
TM_MULTILIB_CONFIG.
* doc/install.texi (Configuration amdgcn-*-*): Mention gfx1030/gfx1100.
* doc/invoke.texi (AMD GCN Options): Add gfx1030 and gfx1100 to
-march/-mtune.
libgomp/ChangeLog:
* testsuite/libgomp.c/declare-variant-4.h: Add variant functions
for gfx1030 and gfx1100.
* testsuite/libgomp.c/declare-variant-4-gfx1030.c: New test.
* testsuite/libgomp.c/declare-variant-4-gfx1100.c: New test.
Signed-off-by: Tobias Burnus <tburnus@baylibre.com>
|
|
This patch is to handle the differences in instruction generation
between Vector and XTheadVector. In this version, we only support
partial xtheadvector instructions that leverage directly from current
RVV1.0 with simple adding "th." prefix. For different name xtheadvector
instructions but share same patterns as RVV1.0 instructions, we will
use ASM targethook to rewrite the whole string of the instructions in
the following patches.
For some vector patterns that cannot be avoided, we use
"!TARGET_XTHEADVECTOR" to disable them in vector.md in order
not to generate instructions that xtheadvector does not support,
like vmv1r.
gcc/ChangeLog:
* config.gcc: Add files for XTheadVector intrinsics.
* config/riscv/autovec.md: Guard XTheadVector.
* config/riscv/predicates.md: Disable immediate vl
for XTheadVector.
* config/riscv/riscv-c.cc (riscv_pragma_intrinsic):
Add pragma for XTheadVector.
* config/riscv/riscv-string.cc (riscv_expand_block_move):
Guard XTheadVector.
* config/riscv/riscv-v.cc (vls_mode_valid_p):
Avoid autovec.
* config/riscv/riscv-vector-builtins-bases.cc:
Do not normalize vsetvl instructions for XTheadVector.
* config/riscv/riscv-vector-builtins-shapes.cc (check_type):
New check type function.
(build_one): Adjust for XTheadVector.
* config/riscv/riscv-vector-switch.def (ENTRY):
Disable fractional mode for the XTheadVector extension.
(TUPLE_ENTRY): Likewise.
* config/riscv/riscv.cc (riscv_v_adjust_bytesize):
Guard XTheadVector.
(riscv_preferred_simd_mode): Likewsie.
(riscv_autovectorize_vector_modes): Likewise.
(riscv_vector_mode_supported_any_target_p): Likewise.
(TARGET_VECTOR_MODE_SUPPORTED_ANY_TARGET_P): Likewise.
* config/riscv/thead.cc (th_asm_output_opcode):
Rewrite vsetvl instructions.
* config/riscv/vector.md:
Include thead-vector.md and change fractional LMUL
into 1 for vbool.
* config/riscv/riscv_th_vector.h: New file.
* config/riscv/thead-vector.md: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/pragma-1.c: Add XTheadVector.
* gcc.target/riscv/rvv/base/abi-1.c: Exclude XTheadVector.
* lib/target-supports.exp: Add target for XTheadVector.
Co-authored-by: Jin Ma <jinma@linux.alibaba.com>
Co-authored-by: Xianmiao Qu <cooper.qu@linux.alibaba.com>
Co-authored-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
This patch adds C intrinsics for Bitmanip Extension.
RISCV_BUILTIN_NO_PREFIX is a new riscv_builtin_description like RISCV_BUILTIN.
But it uses CODE_FOR_##INSN rather than CODE_FOR_riscv_##INSN.
Changed orcb, clmul, brev8 pattern's mode form X to GPR because orcbsi, clmul_si,
brev8_si are both included in rv32 and rv64. Test them in scalar_bitmanip_intrinsic-64-emulated.c.
gcc/ChangeLog:
* config.gcc: Include riscv_bitmanip.h.
* config/riscv/bitmanip.md: Changed mode form X to GPR in orcb and clmul pattern.
* config/riscv/crypto.md: Changed mode form X to GPR in brev8 pattern.
* config/riscv/riscv-builtins.cc (AVAIL): Adding new bitmanip builtins.
(RISCV_BUILTIN_NO_PREFIX): New helper macro.
* config/riscv/riscv-cmo.def (RISCV_BUILTIN): Add '_32'/'_64' postfix to builtins.
* config/riscv/riscv-ftypes.def (2): New ftypes.
* config/riscv/riscv-scalar-crypto.def (RISCV_BUILTIN): New builtins.
(RISCV_BUILTIN_NO_PREFIX): Likewise.
* config/riscv/riscv_bitmanip.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/scalar_bitmanip_intrinsic-32.c: New test.
* gcc.target/riscv/scalar_bitmanip_intrinsic-64-emulated.c: New test.
* gcc.target/riscv/scalar_bitmanip_intrinsic-64.c: New test.
|
|
This patch adds C intrinsics for Scalar Crypto Extension.
gcc/ChangeLog:
* config.gcc: Include riscv_crypto.h.
* config/riscv/riscv_crypto.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/scalar_crypto_intrinsic-32.c: New test.
* gcc.target/riscv/scalar_crypto_intrinsic-64.c: New test.
|
|
The r14-6524 changes created aarch64-builtins.h header and moved
struct aarch64_simd_type_info definition in there.
Unfortunately, the new header wasn't added to target_gtfiles, so the
trees and const char * pointer elements in the aarch64_simd_types
array aren't marked as GC roots anymore. That breaks e.g. PCH, when
the array elements then can refer to ggc_freed memory instead of the expected
types, but also any other GC collection could free them and further uses would
not work correctly.
Unfortunately, just adding the new header to target_gtfiles doesn't fix this,
because non-static variable definitions marked with GTY(()) aren't considered
by gengtype, it looks in those cases for an extern GTY(()) declaration, and
there was none - the aarch64-builtins.h header contains an extern declaration
without GTY(()). Adding GTY(()) to that extern declaration doesn't work, because
then gengtype attempts to emit the aarch64_simd_types GC roots in gtype-desc.cc
but the corresponding header isn't included there.
So, the patch instead adds another extern declaration in aarch64-builtins.cc
right before the actual definition, which makes sure the GC roots are registered
correctly in gt-aarch64-builtins.h (where we want them).
2024-01-09 Jakub Jelinek <jakub@redhat.com>
PR target/113270
* config.gcc (aarch64*-*-*): Add aarch64-builtins.h to target_gtfiles.
* config/aarch64/aarch64-builtins.cc (aarch64_simd_types): Add extern
GTY(()) declaration before the definition, drop GTY(()) drom the
definition.
|
|
ROCm since 5.7.1 supports gfx1100 (RDNA3) cards. This commit adds support
for it, mostly by assuming gfx1100 behaves identical to gfx1030. Like gfx1030,
gfx1100 support is neither documented nor the build of the multilib enabled by
default.
But contrary to gfx1030, gfx1100 has a known issue causing some libraries not
to build, including newlib: The sdwa variant of v_mov_b32_sdwa is not supported
by the hardware but GCC current does generates this instruction.
This will be addressed in a later commit.
gcc/ChangeLog:
* config.gcc (amdgcn-*-amdhsa): Accept --with-arch=gfx1100.
* config/gcn/gcn-hsa.h (NO_XNACK): Add gfx1100:
(ASM_SPEC): Handle gfx1100.
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1100.
(enum gcn_isa): Add ISA_RDNA3.
(TARGET_GFX1100, TARGET_RDNA2_PLUS, TARGET_RDNA3): Define.
* config/gcn/gcn-valu.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS.
* config/gcn/gcn.cc (gcn_option_override,
gcn_omp_device_kind_arch_isa, output_file_start): Handle gfx1100.
(gcn_global_address_p, gcn_addr_space_legitimate_address_p): Change
TARGET_RDNA2 to TARGET_RDNA2_PLUS.
(gcn_hsa_declare_function_name): Don't use '.amdhsa_reserve_flat_scratch'
with gfx1100.
* config/gcn/gcn.h (ASSEMBLER_DIALECT): Likewise.
(TARGET_CPU_CPP_BUILTINS): Define __RDNA3__, __gfx1030__ and
__gfx1100__.
* config/gcn/gcn.md: Change TARGET_RDNA2 to TARGET_RDNA2_PLUS.
* config/gcn/gcn.opt (Enum gpu_type): Add gfx1100.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1100): Define.
(isa_has_combined_avgprs, main): Handle gfx1100.
* config/gcn/t-omp-device (isa): Add gfx1100.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (gcn_gfx1100_s): New const string.
(gcn_isa_name_len): Fix length.
(isa_hsa_name, isa_code, max_isa_vgprs): Handle gfx1100.
|
|
|
|
gcc/ChangeLog:
* config.gcc: Add loongarch-d.o to d_target_objs for LoongArch
architecture.
* config/loongarch/t-loongarch: Add object target for loongarch-d.cc.
* config/loongarch/loongarch-d.cc
(loongarch_d_target_versions): add interface function to define builtin
D versions for LoongArch architecture.
(loongarch_d_handle_target_float_abi): add interface function to define
builtin D traits for LoongArch architecture.
(loongarch_d_register_target_info): add interface function to register
loongarch_d_handle_target_float_abi function.
* config/loongarch/loongarch-d.h
(loongarch_d_target_versions): add function prototype.
(loongarch_d_register_target_info): Likewise.
libphobos/ChangeLog:
* configure.tgt: Enable libphobos for LoongArch architecture.
* libdruntime/gcc/sections/elf.d: Add TLS_DTV_OFFSET constant for
LoongArch64.
* libdruntime/gcc/unwind/generic.d: Add __aligned__ constant for
LoongArch64.
|
|
This adds a new aarch64-specific RTL-SSA pass dedicated to forming load
and store pairs (LDPs and STPs).
As a motivating example for the kind of thing this improves, take the
following testcase:
extern double c[20];
double f(double x)
{
double y = x*x;
y += c[16];
y += c[17];
y += c[18];
y += c[19];
return y;
}
for which we currently generate (at -O2):
f:
adrp x0, c
add x0, x0, :lo12:c
ldp d31, d29, [x0, 128]
ldr d30, [x0, 144]
fmadd d0, d0, d0, d31
ldr d31, [x0, 152]
fadd d0, d0, d29
fadd d0, d0, d30
fadd d0, d0, d31
ret
but with the pass, we generate:
f:
.LFB0:
adrp x0, c
add x0, x0, :lo12:c
ldp d31, d29, [x0, 128]
fmadd d0, d0, d0, d31
ldp d30, d31, [x0, 144]
fadd d0, d0, d29
fadd d0, d0, d30
fadd d0, d0, d31
ret
The pass is local (only considers a BB at a time). In theory, it should
be possible to extend it to run over EBBs, at least in the case of pure
(MEM_READONLY_P) loads, but this is left for future work.
The pass works by identifying two kinds of bases: tree decls obtained
via MEM_EXPR, and RTL register bases in the form of RTL-SSA def_infos.
If a candidate memory access has a MEM_EXPR base, then we track it via
this base, and otherwise if it is of a simple reg + <imm> form, we track
it via the RTL-SSA def_info for the register.
For each BB, for a given kind of base, we build up a hash table mapping
the base to an access_group. The access_group data structure holds a
list of accesses at each offset relative to the same base. It uses a
splay tree to support efficient insertion (while walking the bb), and
the nodes are chained using a linked list to support efficient
iteration (while doing the transformation).
For each base, we then iterate over the access_group to identify
adjacent accesses, and try to form load/store pairs for those insns that
access adjacent memory.
The pass is currently run twice, both before and after register
allocation. The first copy of the pass is run late in the pre-RA RTL
pipeline, immediately after sched1, since it was found that sched1 was
increasing register pressure when the pass was run before. The second
copy of the pass runs immediately before peephole2, so as to get any
opportunities that the existing ldp/stp peepholes can handle.
There are some cases that we punt on before RA, e.g.
accesses relative to eliminable regs (such as the soft frame pointer).
We do this since we can't know the elimination offset before RA, and we
want to avoid the RA reloading the offset (due to being out of ldp/stp
immediate range) as this can generate worse code.
The post-RA copy of the pass is there to pick up the crumbs that were
left behind / things we punted on in the pre-RA pass. Among other
things, it's needed to handle accesses relative to the stack pointer.
It can also handle code that didn't exist at the time the pre-RA pass
was run (spill code, prologue/epilogue code).
This is an initial implementation, and there are (among other possible
improvements) the following notable caveats / missing features that are
left for future work, but could give further improvements:
- Moving accesses between BBs within in an EBB, see above.
- Out-of-range opportunities: currently the pass refuses to form pairs
if there isn't a suitable base register with an immediate in range
for ldp/stp, but it can be profitable to emit anchor addresses in the
case that there are four or more out-of-range nearby accesses that can
be formed into pairs. This is handled by the current ldp/stp
peepholes, so it would be good to support this in the future.
- Discovery: currently we prioritize MEM_EXPR bases over RTL bases, which can
lead to us missing opportunities in the case that two accesses have distinct
MEM_EXPR bases (i.e. different DECLs) but they are still adjacent in memory
(e.g. adjacent variables on the stack). I hope to address this for GCC 15,
hopefully getting to the point where we can remove the ldp/stp peepholes and
scheduling hooks. Furthermore it would be nice to make the pass aware of
section anchors (adding these as a third kind of base) allowing merging
accesses to adjacent variables within the same section.
gcc/ChangeLog:
* config.gcc: Add aarch64-ldp-fusion.o to extra_objs for aarch64.
* config/aarch64/aarch64-passes.def: Add copies of pass_ldp_fusion
before and after RA.
* config/aarch64/aarch64-protos.h (make_pass_ldp_fusion): Declare.
* config/aarch64/aarch64.opt (-mearly-ldp-fusion): New.
(-mlate-ldp-fusion): New.
(--param=aarch64-ldp-alias-check-limit): New.
(--param=aarch64-ldp-writeback): New.
* config/aarch64/t-aarch64: Add rule for aarch64-ldp-fusion.o.
* config/aarch64/aarch64-ldp-fusion.cc: New file.
* doc/invoke.texi (AArch64 Options): Document new
-m{early,late}-ldp-fusion options.
|
|
ACLE has added intrinsics to bridge between SVE and Neon.
The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.
This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq
gcc/ChangeLog:
* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h:
Add handle_arm_neon_sve_bridge_h
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definiton.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_<mode>): Likewise.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include
arm_neon_sve_bridge header file
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.
|
|
This pass adds a simple register allocator for FP & SIMD registers.
Its main purpose is to make use of SME2's strided LD1, ST1 and LUTI2/4
instructions, which require a very specific grouping structure,
and so would be difficult to exploit with general allocation.
The allocator is very simple. It gives up on anything that would
require spilling, or that it might not handle well for other reasons.
The allocator needs to track liveness at the level of individual FPRs.
Doing that fixes a lot of the PRs relating to redundant moves caused by
structure loads and stores. That particular problem is going to be
fixed more generally for GCC 15 by Lehua's RA patches.
However, the early-RA pass runs before scheduling, so it has a chance
to bag a spill-free allocation of vector code before the scheduler moves
things around. It could therefore still be useful for non-SME code
(e.g. for hand-scheduled ACLE code) even after Lehua's patches are in.
The pass is controlled by a tristate switch:
- -mearly-ra=all: run on all functions
- -mearly-ra=strided: run on functions that have access to strided registers
- -mearly-ra=none: don't run on any function
The patch makes -mearly-ra=all the default at -O2 and above for now.
We can revisit this for GCC 15 once Lehua's patches are in;
-mearly-ra=strided might then be more appropriate.
As said previously, the pass is very naive. There's much more that we
could do, such as handling invariants better. The main focus is on not
committing to a bad allocation, rather than on handling as much as
possible.
gcc/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* config.gcc: Add aarch64-early-ra.o for AArch64 targets.
* config/aarch64/t-aarch64 (aarch64-early-ra.o): New rule.
* config/aarch64/aarch64-opts.h (aarch64_early_ra_scope): New enum.
* config/aarch64/aarch64.opt (mearly_ra): New option.
* doc/invoke.texi: Document it.
* common/config/aarch64/aarch64-common.cc
(aarch_option_optimization_table): Use -mearly-ra=strided by
default for -O2 and above.
* config/aarch64/aarch64-passes.def (pass_aarch64_early_ra): New pass.
* config/aarch64/aarch64-protos.h (aarch64_strided_registers_p)
(make_pass_aarch64_early_ra): Declare.
* config/aarch64/aarch64-sme.md (@aarch64_sme_lut<LUTI_BITS><mode>):
Add a stride_type attribute.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided2): New pattern.
(@aarch64_sme_lut<LUTI_BITS><mode>_strided4): Likewise.
* config/aarch64/aarch64-sve-builtins-base.cc (svld1_impl::expand)
(svldnt1_impl::expand, svst1_impl::expand, svstn1_impl::expand): Handle
new way of defining multi-register loads and stores.
* config/aarch64/aarch64-sve.md (@aarch64_ld1<SVE_FULLx24:mode>)
(@aarch64_ldnt1<SVE_FULLx24:mode>, @aarch64_st1<SVE_FULLx24:mode>)
(@aarch64_stnt1<SVE_FULLx24:mode>): Delete.
* config/aarch64/aarch64-sve2.md (@aarch64_<LD1_COUNT:optab><mode>)
(@aarch64_<LD1_COUNT:optab><mode>_strided2): New patterns.
(@aarch64_<LD1_COUNT:optab><mode>_strided4): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided2): Likewise.
(@aarch64_<ST1_COUNT:optab><mode>_strided4): Likewise.
* config/aarch64/aarch64.cc (aarch64_strided_registers_p): New
function.
* config/aarch64/aarch64.md (UNSPEC_LD1_SVE_COUNT): Delete.
(UNSPEC_ST1_SVE_COUNT, UNSPEC_LDNT1_SVE_COUNT): Likewise.
(UNSPEC_STNT1_SVE_COUNT): Likewise.
(stride_type): New attribute.
* config/aarch64/constraints.md (Uwd, Uwt): New constraints.
* config/aarch64/iterators.md (UNSPEC_LD1_COUNT, UNSPEC_LDNT1_COUNT)
(UNSPEC_ST1_COUNT, UNSPEC_STNT1_COUNT): New unspecs.
(optab): Handle them.
(LD1_COUNT, ST1_COUNT): New iterators.
* config/aarch64/aarch64-early-ra.cc: New file.
gcc/testsuite/
PR rtl-optimization/106694
PR rtl-optimization/109078
PR rtl-optimization/109391
* gcc.target/aarch64/ldp_stp_16.c (cons4_4_float): Tighten expected
output test.
* gcc.target/aarch64/sve/shift_1.c: Allow reversed shifts for .s
as well as .d.
* gcc.target/aarch64/sme/strided_1.c: New test.
* gcc.target/aarch64/pr109078.c: Likewise.
* gcc.target/aarch64/pr109391.c: Likewise.
* gcc.target/aarch64/sve/pr106694.c: Likewise.
|
|
This adds support for the SME parts of arm_sme.h.
gcc/
* doc/invoke.texi: Document +sme-i16i64 and +sme-f64f64.
* config.gcc (aarch64*-*-*): Add arm_sme.h to the list of headers
to install and aarch64-sve-builtins-sme.o to the list of objects
to build.
* config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define
or undefine TARGET_SME, TARGET_SME_I16I64 and TARGET_SME_F64F64.
(aarch64_pragma_aarch64): Handle arm_sme.h.
* config/aarch64/aarch64-option-extensions.def (sme-i16i64)
(sme-f64f64): New extensions.
* config/aarch64/aarch64-protos.h (aarch64_sme_vq_immediate)
(aarch64_addsvl_addspl_immediate_p, aarch64_output_addsvl_addspl)
(aarch64_output_sme_zero_za): Declare.
(aarch64_output_move_struct): Delete.
(aarch64_sme_ldr_vnum_offset): Declare.
(aarch64_sve::handle_arm_sme_h): Likewise.
* config/aarch64/aarch64.h (AARCH64_ISA_SM_ON): New macro.
(AARCH64_ISA_SME_I16I64, AARCH64_ISA_SME_F64F64): Likewise.
(TARGET_STREAMING, TARGET_STREAMING_SME): Likewise.
(TARGET_SME_I16I64, TARGET_SME_F64F64): Likewise.
* config/aarch64/aarch64.cc (aarch64_sve_rdvl_factor_p): Rename to...
(aarch64_sve_rdvl_addvl_factor_p): ...this.
(aarch64_sve_rdvl_immediate_p): Update accordingly.
(aarch64_rdsvl_immediate_p, aarch64_add_offset): Likewise.
(aarch64_sme_vq_immediate): Likewise. Make public.
(aarch64_sve_addpl_factor_p): New function.
(aarch64_sve_addvl_addpl_immediate_p): Use
aarch64_sve_rdvl_addvl_factor_p and aarch64_sve_addpl_factor_p.
(aarch64_addsvl_addspl_immediate_p): New function.
(aarch64_output_addsvl_addspl): Likewise.
(aarch64_cannot_force_const_mem): Return true for RDSVL immediates.
(aarch64_classify_index): Handle .Q scaling for VNx1TImode.
(aarch64_classify_address): Likewise for vnum offsets.
(aarch64_output_sme_zero_za): New function.
(aarch64_sme_ldr_vnum_offset_p): Likewise.
* config/aarch64/predicates.md (aarch64_addsvl_addspl_immediate):
New predicate.
(aarch64_pluslong_operand): Include it for SME.
* config/aarch64/constraints.md (Ucj, Uav): New constraints.
* config/aarch64/iterators.md (VNx1TI_ONLY): New mode iterator.
(SME_ZA_I, SME_ZA_SDI, SME_ZA_SDF_I, SME_MOP_BHI): Likewise.
(SME_MOP_HSDF): Likewise.
(UNSPEC_SME_ADDHA, UNSPEC_SME_ADDVA, UNSPEC_SME_FMOPA)
(UNSPEC_SME_FMOPS, UNSPEC_SME_LD1_HOR, UNSPEC_SME_LD1_VER)
(UNSPEC_SME_READ_HOR, UNSPEC_SME_READ_VER, UNSPEC_SME_SMOPA)
(UNSPEC_SME_SMOPS, UNSPEC_SME_ST1_HOR, UNSPEC_SME_ST1_VER)
(UNSPEC_SME_SUMOPA, UNSPEC_SME_SUMOPS, UNSPEC_SME_UMOPA)
(UNSPEC_SME_UMOPS, UNSPEC_SME_USMOPA, UNSPEC_SME_USMOPS)
(UNSPEC_SME_WRITE_HOR, UNSPEC_SME_WRITE_VER): New unspecs.
(elem_bits): Handle x2 and x4 structure modes, plus VNx1TI.
(Vetype, Vesize, VPRED): Handle VNx1TI.
(b): New mode attribute.
(SME_LD1, SME_READ, SME_ST1, SME_WRITE, SME_BINARY_SDI, SME_INT_MOP)
(SME_FP_MOP): New int iterators.
(optab): Handle SME unspecs.
(hv): New int attribute.
* config/aarch64/aarch64.md (*add<mode>3_aarch64): Handle ADDSVL
and ADDSPL.
* config/aarch64/aarch64-sme.md (UNSPEC_SME_LDR): New unspec.
(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
(aarch64_sme_ldr0, @aarch64_sme_ldrn<mode>): New patterns.
(UNSPEC_SME_STR): New unspec.
(@aarch64_sme_<optab><mode>, @aarch64_sme_<optab><mode>_plus)
(aarch64_sme_str0, @aarch64_sme_strn<mode>): New patterns.
(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
(@aarch64_sme_<optab><v_int_container><mode>): Likewise.
(*aarch64_sme_<optab><v_int_container><mode>_plus): Likewise.
(@aarch64_sme_<optab><VNx1TI_ONLY:mode><SVE_FULL:mode>): Likewise.
(UNSPEC_SME_ZERO): New unspec.
(aarch64_sme_zero): New pattern.
(@aarch64_sme_<SME_BINARY_SDI:optab><mode>): Likewise.
(@aarch64_sme_<SME_INT_MOP:optab><mode>): Likewise.
(@aarch64_sme_<SME_FP_MOP:optab><mode>): Likewise.
* config/aarch64/aarch64-sve-builtins.def: Add ZA type suffixes.
Include aarch64-sve-builtins-sme.def.
(DEF_SME_ZA_FUNCTION): New macro.
* config/aarch64/aarch64-sve-builtins.h (CP_READ_ZA): New call
property.
(CP_WRITE_ZA): Likewise.
(PRED_za_m): New predication type.
(type_suffix_index): Handle DEF_SME_ZA_SUFFIX.
(type_suffix_info): Add vector_p and za_p fields.
(function_instance::num_za_tiles): New member function.
(function_builder::get_attributes): Add an aarch64_feature_flags
argument.
(function_expander::get_contiguous_base): Take a base argument
number, a vnum argument number, and an argument that indicates
whether the vnum parameter is a factor of the SME vector length
or the prevailing vector length.
(function_expander::add_integer_operand): Take a poly_int64.
(sve_switcher::sve_switcher): Take a base set of flags.
(sme_switcher): New class.
(scalar_types): Add a null entry for NUM_VECTOR_TYPES.
* config/aarch64/aarch64-sve-builtins.cc: Include
aarch64-sve-builtins-sme.h.
(pred_suffixes): Add an entry for PRED_za_m.
(type_suffixes): Initialize vector_p and za_p. Handle ZA suffixes.
(TYPES_all_za, TYPES_d_za, TYPES_za_bhsd_data, TYPES_za_all_data)
(TYPES_za_s_integer, TYPES_za_d_integer, TYPES_mop_base)
(TYPES_mop_base_signed, TYPES_mop_base_unsigned, TYPES_mop_i16i64)
(TYPES_mop_i16i64_signed, TYPES_mop_i16i64_unsigned, TYPES_za): New
type suffix macros.
(preds_m, preds_za_m): New predication lists.
(function_groups): Handle DEF_SME_ZA_FUNCTION.
(scalar_types): Add an entry for NUM_VECTOR_TYPES.
(find_type_suffix_for_scalar_type): Check positively for vectors
rather than negatively for predicates.
(check_required_extensions): Handle PSTATE.SM and PSTATE.ZA
requirements.
(report_out_of_range): Handle the case where the minimum and
maximum are the same.
(function_instance::reads_global_state_p): Return true for functions
that read ZA.
(function_instance::modifies_global_state_p): Return true for functions
that write to ZA.
(sve_switcher::sve_switcher): Add a base flags argument.
(function_builder::get_name): Handle "__arm_" prefixes.
(add_attribute): Add an overload that takes a namespaces.
(add_shared_state_attribute): New function.
(function_builder::get_attributes): Take the required feature flags
as argument. Add streaming and ZA attributes where appropriate.
(function_builder::add_unique_function): Update calls accordingly.
(function_resolver::check_gp_argument): Assert that the predication
isn't ZA _m predication.
(function_checker::function_checker): Don't bias the argument
number for ZA _m predication.
(function_expander::get_contiguous_base): Add arguments that
specify the base argument number, the vnum argument number,
and an argument that indicates whether the vnum parameter is
a factor of the SME vector length or the prevailing vector length.
Handle the SME case.
(function_expander::add_input_operand): Handle pmode_register_operand.
(function_expander::add_integer_operand): Take a poly_int64.
(init_builtins): Call handle_arm_sme_h for LTO.
(handle_arm_sve_h): Skip SME intrinsics.
(handle_arm_sme_h): New function.
* config/aarch64/aarch64-sve-builtins-functions.h
(read_write_za, write_za): New classes.
(unspec_based_sme_function, za_arith_function): New using aliases.
(quiet_za_arith_function): Likewise.
* config/aarch64/aarch64-sve-builtins-shapes.h
(binary_za_int_m, binary_za_m, binary_za_uint_m, bool_inherent)
(inherent_za, inherent_mask_za, ldr_za, load_za, read_za_m, store_za)
(str_za, unary_za_m, write_za_m): Declare.
* config/aarch64/aarch64-sve-builtins-shapes.cc (apply_predication):
Expect za_m functions to have an existing governing predicate.
(binary_za_m_base, binary_za_int_m_def, binary_za_m_def): New classes.
(binary_za_uint_m_def, bool_inherent_def, inherent_za_def): Likewise.
(inherent_mask_za_def, ldr_za_def, load_za_def, read_za_m_def)
(store_za_def, str_za_def, unary_za_m_def, write_za_m_def): Likewise.
* config/aarch64/arm_sme.h: New file.
* config/aarch64/aarch64-sve-builtins-sme.h: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.cc: Likewise.
* config/aarch64/aarch64-sve-builtins-sme.def: Likewise.
* config/aarch64/t-aarch64 (aarch64-sve-builtins.o): Depend on
aarch64-sve-builtins-sme.def and aarch64-sve-builtins-sme.h.
(aarch64-sve-builtins-sme.o): New rule.
gcc/testsuite/
* lib/target-supports.exp: Add sme and sme-i16i64 features.
* gcc.target/aarch64/pragma_cpp_predefs_4.c: Test __ARM_FEATURE_SME*
macros.
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Allow functions
to be marked as __arm_streaming, __arm_streaming_compatible, and
__arm_inout("za").
* g++.target/aarch64/sve/acle/general-c++/func_redef_4.c: Mark the
function as __arm_streaming_compatible.
* g++.target/aarch64/sve/acle/general-c++/func_redef_5.c: Likewise.
* g++.target/aarch64/sve/acle/general-c++/func_redef_7.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/func_redef_4.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/func_redef_5.c: Likewise.
* g++.target/aarch64/sme/aarch64-sme-acle-asm.exp: New test harness.
* gcc.target/aarch64/sme/aarch64-sme-acle-asm.exp: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_int_m_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_m_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/binary_za_uint_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/read_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/unary_za_m_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general-c/write_za_m_1.c: Likewise.
|
|
The LLVM 13.0.1 assembler ('llvm-mc') indeed still does complain in presence of
multiple '-mcpu=[...]' options:
as: for the --mcpu option: may only occur zero or one times!
However, as of
"GCN: Tag '-march=[...]', '-mtune=[...]' as 'Negative' of themselves [PR112669]",
the GCC-side special handling is no longer necessary.
gcc/
* config.gcc <amdgcn-*-amdhsa> (extra_gcc_objs): Don't set.
* config/gcn/driver-gcn.cc: Remove.
* config/gcn/gcn-hsa.h (ASM_SPEC, EXTRA_SPEC_FUNCTIONS): Remove
'last_arg' spec function.
* config/gcn/t-gcn-hsa (driver-gcn.o): Remove.
|
|
We need the multilib paths in gcc to find e.g. glibc crt files on
Debian. This is essentially based on t-linux64 version.
gcc/ChangeLog:
* config/i386/t-gnu64: New file.
* config.gcc [x86_64-*-gnu*]: Add i386/t-gnu64 to
tmake_file.
|
|
As reported in the PR, mipsisa64r2-sde-elf doesn't build because HEAP_TRAMPOLINES_INIT
macro isn't defined anywhere.
It is normally defined by
# Figure out if we need to enable heap trampolines by default
case ${target} in
*-*-darwin2*)
# Currently, we do this for macOS 11 and above.
tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=1"
;;
*)
tm_defines="$tm_defines HEAP_TRAMPOLINES_INIT=0"
;;
esac
in config.gcc, but mips*-sde-elf* is the only target which overwrites
tm_defines shell variable rather than just appending to it (or in one case
prepending), all other targets append something to it, including other
mips* triplets.
I believe (just from looking at config.gcc) that the difference is that
LIBC_GLIBC=1 LIBC_UCLIBC=2 LIBC_BIONIC=3 LIBC_MUSL=4 HEAP_TRAMPOLINES_INIT=0
isn't defined without the patch and is with the patch.
I think defining those first 4 shouldn't cause any harm and defining the
last one is required for it to actually build at all.
2023-11-27 Jakub Jelinek <jakub@redhat.com>
PR target/112300
* config.gcc (mips*-sde-elf*): Append to tm_defines rather than
overwriting them.
|
|
Along with RV32E, RV64E is ratified. Though ILP32E and LP64E ABIs are
still draft, it's worth supporting it.
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc
(riscv_ext_version_table): Set version to ratified 2.0.
(riscv_subset_list::parse_std_ext): Allow RV64E.
* config.gcc: Parse base ISA 'rv64e' and ABI 'lp64e'.
* config/riscv/arch-canonicalize: Parse base ISA 'rv64e'.
* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
Define different macro per XLEN. Add handling for ABI_LP64E.
* config/riscv/riscv-d.cc (riscv_d_handle_target_float_abi):
Add handling for ABI_LP64E.
* config/riscv/riscv-opts.h (enum riscv_abi_type): Add ABI_LP64E.
* config/riscv/riscv.cc (riscv_option_override): Enhance error
handling to support RV64E and LP64E.
(riscv_conditional_register_usage): Change "RV32E" in a comment
to "RV32E/RV64E".
* config/riscv/riscv.h
(UNITS_PER_FP_ARG): Add handling for ABI_LP64E.
(STACK_BOUNDARY): Ditto.
(ABI_STACK_BOUNDARY): Ditto.
(MAX_ARGS_IN_REGISTERS): Ditto.
(ABI_SPEC): Add support for "lp64e".
* config/riscv/riscv.opt: Parse -mabi=lp64e as ABI_LP64E.
* doc/invoke.texi: Add documentation of the LP64E ABI.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/predef-1.c: Test for __riscv_64e.
* gcc.target/riscv/predef-2.c: Ditto.
* gcc.target/riscv/predef-3.c: Ditto.
* gcc.target/riscv/predef-4.c: Ditto.
* gcc.target/riscv/predef-5.c: Ditto.
* gcc.target/riscv/predef-6.c: Ditto.
* gcc.target/riscv/predef-7.c: Ditto.
* gcc.target/riscv/predef-8.c: Ditto.
* gcc.target/riscv/predef-9.c: New test for RV64E and LP64E,
based on predef-7.c.
|
|
Now that we are finally able to use the kernel provided bpf_helpers.h
file and associated machinery, there is no longer need to distribute
our own version.
This patch removes bpf-helpers.h and deletes most of the associated
tests from the gcc.target/bpf testsuite. Two tests are adapted and
retained: one testing the kernel_helper attribute, which is still
useful, and the other making sure that proper constant propagation is
performed with -O2, which is necessary to use the helpers defined as
static pointers in the kernel's bpf_helpers.h.
Regtested in target bpf-unknown-none and host x86_64-linux-gnu.
gcc/ChangeLog
* config/bpf/bpf-helpers.h: Remove.
* config.gcc: Adapt accordingly.
gcc/testsuite/ChangeLog
* gcc.target/bpf/helper-bind.c: Do not include bpf-helpers.h.
* gcc.target/bpf/helper-skb-ancestor-cgroup-id.c: Likewise, and
renamed from skb-ancestor-cgroup-id.c.
* gcc.target/bpf/helper-bpf-redirect.c: Remove.
* gcc.target/bpf/helper-clone-redirect.c: Likewise.
* gcc.target/bpf/helper-csum-diff.c: Likewise.
* gcc.target/bpf/helper-csum-update.c: Likewise.
* gcc.target/bpf/helper-current-task-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-fib-lookup.c: Likewise.
* gcc.target/bpf/helper-get-cgroup-classid.c: Likewise.
* gcc.target/bpf/helper-get-current-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-get-current-comm.c: Likewise.
* gcc.target/bpf/helper-get-current-pid-tgid.c: Likewise.
* gcc.target/bpf/helper-get-current-task.c: Likewise.
* gcc.target/bpf/helper-get-current-uid-gid.c: Likewise.
* gcc.target/bpf/helper-get-hash-recalc.c: Likewise.
* gcc.target/bpf/helper-get-listener-sock.c: Likewise.
* gcc.target/bpf/helper-get-local-storage.c: Likewise.
* gcc.target/bpf/helper-get-numa-node-id.c: Likewise.
* gcc.target/bpf/helper-get-prandom-u32.c: Likewise.
* gcc.target/bpf/helper-get-route-realm.c: Likewise.
* gcc.target/bpf/helper-get-smp-processor-id.c: Likewise.
* gcc.target/bpf/helper-get-socket-cookie.c: Likewise.
* gcc.target/bpf/helper-get-socket-uid.c: Likewise.
* gcc.target/bpf/helper-get-stack.c: Likewise.
* gcc.target/bpf/helper-get-stackid.c: Likewise.
* gcc.target/bpf/helper-getsockopt.c: Likewise.
* gcc.target/bpf/helper-ktime-get-ns.c: Likewise.
* gcc.target/bpf/helper-l3-csum-replace.c: Likewise.
* gcc.target/bpf/helper-l4-csum-replace.c: Likewise.
* gcc.target/bpf/helper-lwt-push-encap.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-action.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-adjust-srh.c: Likewise.
* gcc.target/bpf/helper-lwt-seg6-store-bytes.c: Likewise.
* gcc.target/bpf/helper-map-delete-elem.c: Likewise.
* gcc.target/bpf/helper-map-lookup-elem.c: Likewise.
* gcc.target/bpf/helper-map-peek-elem.c: Likewise.
* gcc.target/bpf/helper-map-pop-elem.c: Likewise.
* gcc.target/bpf/helper-map-push-elem.c: Likewise.
* gcc.target/bpf/helper-map-update-elem.c: Likewise.
* gcc.target/bpf/helper-msg-apply-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-cork-bytes.c: Likewise.
* gcc.target/bpf/helper-msg-pop-data.c: Likewise.
* gcc.target/bpf/helper-msg-pull-data.c: Likewise.
* gcc.target/bpf/helper-msg-push-data.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-msg-redirect-map.c: Likewise.
* gcc.target/bpf/helper-override-return.c: Likewise.
* gcc.target/bpf/helper-perf-event-output.c: Likewise.
* gcc.target/bpf/helper-perf-event-read-value.c: Likewise.
* gcc.target/bpf/helper-perf-event-read.c: Likewise.
* gcc.target/bpf/helper-perf-prog-read-value.c: Likewise.
* gcc.target/bpf/helper-probe-read-str.c: Likewise.
* gcc.target/bpf/helper-probe-read.c: Likewise.
* gcc.target/bpf/helper-probe-write-user.c: Likewise.
* gcc.target/bpf/helper-rc-keydown.c: Likewise.
* gcc.target/bpf/helper-rc-pointer-rel.c: Likewise.
* gcc.target/bpf/helper-rc-repeat.c: Likewise.
* gcc.target/bpf/helper-redirect-map.c: Likewise.
* gcc.target/bpf/helper-set-hash-invalid.c: Likewise.
* gcc.target/bpf/helper-set-hash.c: Likewise.
* gcc.target/bpf/helper-setsockopt.c: Likewise.
* gcc.target/bpf/helper-sk-fullsock.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-tcp.c: Likewise.
* gcc.target/bpf/helper-sk-lookup-upd.c: Likewise.
* gcc.target/bpf/helper-sk-redirect-hash.c: Likewise.
* gcc.target/bpf/helper-sk-redirect-map.c: Likewise.
* gcc.target/bpf/helper-sk-release.c: Likewise.
* gcc.target/bpf/helper-sk-select-reuseport.c: Likewise.
* gcc.target/bpf/helper-sk-storage-delete.c: Likewise.
* gcc.target/bpf/helper-sk-storage-get.c: Likewise.
* gcc.target/bpf/helper-skb-adjust-room.c: Likewise.
* gcc.target/bpf/helper-skb-cgroup-id.c: Likewise.
* gcc.target/bpf/helper-skb-change-head.c: Likewise.
* gcc.target/bpf/helper-skb-change-proto.c: Likewise.
* gcc.target/bpf/helper-skb-change-tail.c: Likewise.
* gcc.target/bpf/helper-skb-change-type.c: Likewise.
* gcc.target/bpf/helper-skb-ecn-set-ce.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-get-tunnel-opt.c: Likewise.
* gcc.target/bpf/helper-skb-get-xfrm-state.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes-relative.c: Likewise.
* gcc.target/bpf/helper-skb-load-bytes.c: Likewise.
* gcc.target/bpf/helper-skb-pull-data.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-key.c: Likewise.
* gcc.target/bpf/helper-skb-set-tunnel-opt.c: Likewise.
* gcc.target/bpf/helper-skb-store-bytes.c: Likewise.
* gcc.target/bpf/helper-skb-under-cgroup.c: Likewise.
* gcc.target/bpf/helper-skb-vlan-pop.c: Likewise.
* gcc.target/bpf/helper-skb-vlan-push.c: Likewise.
* gcc.target/bpf/helper-skc-lookup-tcp.c: Likewise.
* gcc.target/bpf/helper-sock-hash-update.c: Likewise.
* gcc.target/bpf/helper-sock-map-update.c: Likewise.
* gcc.target/bpf/helper-sock-ops-cb-flags-set.c: Likewise.
* gcc.target/bpf/helper-spin-lock.c: Likewise.
* gcc.target/bpf/helper-spin-unlock.c: Likewise.
* gcc.target/bpf/helper-strtol.c: Likewise.
* gcc.target/bpf/helper-strtoul.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-current-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-name.c: Likewise.
* gcc.target/bpf/helper-sysctl-get-new-value.c: Likewise.
* gcc.target/bpf/helper-sysctl-set-new-value.c: Likewise.
* gcc.target/bpf/helper-tail-call.c: Likewise.
* gcc.target/bpf/helper-tcp-check-syncookie.c: Likewise.
* gcc.target/bpf/helper-tcp-sock.c: Likewise.
* gcc.target/bpf/helper-trace-printk.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-head.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-meta.c: Likewise.
* gcc.target/bpf/helper-xdp-adjust-tail.c: Likewise.
* gcc.target/bpf/skb-ancestor-cgroup-id.c: Likewise.
|
|
Define ISA_BASE_LA64V110, which represents the base instruction set defined in LoongArch1.1.
Support the configure setting --with-arch =la664, and support -march=la664,-mtune=la664.
gcc/ChangeLog:
* config.gcc: Support LA664.
* config/loongarch/genopts/loongarch-strings: Likewise.
* config/loongarch/genopts/loongarch.opt.in: Likewise.
* config/loongarch/loongarch-cpu.cc (fill_native_cpu_config): Likewise.
* config/loongarch/loongarch-def.c: Likewise.
* config/loongarch/loongarch-def.h (N_ISA_BASE_TYPES): Likewise.
(ISA_BASE_LA64V110): Define macro.
(N_ARCH_TYPES): Update value.
(N_TUNE_TYPES): Update value.
(CPU_LA664): New macro.
* config/loongarch/loongarch-opts.cc (isa_default_abi): Likewise.
(isa_base_compat_p): Likewise.
* config/loongarch/loongarch-opts.h (TARGET_64BIT): This parameter is enabled
when la_target.isa.base is equal to ISA_BASE_LA64V100 or ISA_BASE_LA64V110.
(TARGET_uARCH_LA664): Define macro.
* config/loongarch/loongarch-str.h (STR_CPU_LA664): Likewise.
* config/loongarch/loongarch.cc (loongarch_cpu_sched_reassociation_width):
Add LA664 support.
* config/loongarch/loongarch.opt: Regenerate.
|
|
The target attribute which proposed in [1], target attribute allow user
to specify a local setting per-function basis.
The syntax of target attribute is `__attribute__((target("<ATTR-STRING>")))`.
and the syntax of `<ATTR-STRING>` describes below:
```
ATTR-STRING := ATTR-STRING ';' ATTR
| ATTR
ATTR := ARCH-ATTR
| CPU-ATTR
| TUNE-ATTR
ARCH-ATTR := 'arch=' EXTENSIONS-OR-FULLARCH
EXTENSIONS-OR-FULLARCH := <EXTENSIONS>
| <FULLARCHSTR>
EXTENSIONS := <EXTENSION> ',' <EXTENSIONS>
| <EXTENSION>
FULLARCHSTR := <full-arch-string>
EXTENSION := <OP> <EXTENSION-NAME> <VERSION>
OP := '+'
VERSION := [0-9]+ 'p' [0-9]+
| [1-9][0-9]*
|
EXTENSION-NAME := Naming rule is defined in RISC-V ISA manual
CPU-ATTR := 'cpu=' <valid-cpu-name>
TUNE-ATTR := 'tune=' <valid-tune-name>
```
Changes since v1:
- Use std::unique_ptr rather than alloca to prevent memory issue.
- Error rather than warning when attribute duplicated.
[1] https://github.com/riscv-non-isa/riscv-c-api-doc/pull/35
gcc/ChangeLog:
* config.gcc (riscv): Add riscv-target-attr.o.
* config/riscv/riscv-protos.h (riscv_declare_function_size) New.
(riscv_option_valid_attribute_p): New.
(riscv_override_options_internal): New.
(struct riscv_tune_info): New.
(riscv_parse_tune): New.
* config/riscv/riscv-target-attr.cc
(class riscv_target_attr_parser): New.
(struct riscv_attribute_info): New.
(riscv_attributes): New.
(riscv_target_attr_parser::parse_arch): New.
(riscv_target_attr_parser::handle_arch): New.
(riscv_target_attr_parser::handle_cpu): New.
(riscv_target_attr_parser::handle_tune): New.
(riscv_target_attr_parser::update_settings): New.
(riscv_process_one_target_attr): New.
(num_occurences_in_str): New.
(riscv_process_target_attr): New.
(riscv_option_valid_attribute_p): New.
* config/riscv/riscv.cc: Include target-globals.h and
riscv-subset.h.
(struct riscv_tune_info): Move to riscv-protos.h.
(get_tune_str): New.
(riscv_parse_tune): New parameter null_p.
(riscv_declare_function_size): New.
(riscv_option_override): Build target_option_default_node and
target_option_current_node.
(riscv_save_restore_target_globals): New.
(riscv_option_restore): New.
(riscv_previous_fndecl): New.
(riscv_set_current_function): Apply the target attribute.
(TARGET_OPTION_RESTORE): Define.
(TARGET_OPTION_VALID_ATTRIBUTE_P): Ditto.
* config/riscv/riscv.h (SWITCHABLE_TARGET): Define to 1.
(ASM_DECLARE_FUNCTION_SIZE) Define.
* config/riscv/riscv.opt (mtune=): Add Save attribute.
(mcpu=): Ditto.
(mcmodel=): Ditto.
* config/riscv/t-riscv: Add build rule for riscv-target-attr.o
* doc/extend.texi: Add doc for target attribute.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/target-attr-01.c: New.
* gcc.target/riscv/target-attr-02.c: Ditto.
* gcc.target/riscv/target-attr-03.c: Ditto.
* gcc.target/riscv/target-attr-04.c: Ditto.
* gcc.target/riscv/target-attr-05.c: Ditto.
* gcc.target/riscv/target-attr-06.c: Ditto.
* gcc.target/riscv/target-attr-07.c: Ditto.
* gcc.target/riscv/target-attr-bad-01.c: Ditto.
* gcc.target/riscv/target-attr-bad-02.c: Ditto.
* gcc.target/riscv/target-attr-bad-03.c: Ditto.
* gcc.target/riscv/target-attr-bad-04.c: Ditto.
* gcc.target/riscv/target-attr-bad-05.c: Ditto.
* gcc.target/riscv/target-attr-bad-06.c: Ditto.
* gcc.target/riscv/target-attr-bad-07.c: Ditto.
* gcc.target/riscv/target-attr-bad-08.c: Ditto.
* gcc.target/riscv/target-attr-bad-09.c: Ditto.
* gcc.target/riscv/target-attr-bad-10.c: Ditto.
Reviewed-by: Christoph Müllner <christoph.muellner@vrull.eu>
|
|
This adds GTY markers to s390_builtin_types, s390_builtin_fn_types,
and s390_builtin_decls. These were missing causing problems in
particular when using builtins after including a precompiled header.
Unfortunately the declaration of these data structures use enum values
from s390-builtins.h. This file however is not included everywhere
and is rather large. In order to include it only for the purpose of
gtype-desc.cc we place a preprocessed copy of it in the build
directory and include only this.
This is going to be backported to GCC 12 and 13.
gcc/ChangeLog:
* config.gcc: Add s390-gen-builtins.h to target_gtfiles.
* config/s390/s390-builtins.h (s390_builtin_types)
(s390_builtin_fn_types, s390_builtin_decls): Add GTY marker.
* config/s390/t-s390 (EXTRA_GTYPE_DEPS): Add s390-gen-builtins.h.
Add build rule for s390-gen-builtins.h.
|
|
Enable -march/-mtune=yongfeng. Costs and tunings are set according
to the characteristics of the processor. Add a new .md file to describe
yongfeng processor.
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize yongfeng.
* common/config/i386/i386-common.cc: Add yongfeng.
* common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
Add ZHAOXIN_FAM7H_YONGFENG.
* config.gcc: Add yongfeng.
* config/i386/driver-i386.cc (host_detect_local_cpu):
Let -march=native recognize yongfeng processors.
* config/i386/i386-c.cc (ix86_target_macros_internal): Add yongfeng.
* config/i386/i386-options.cc (m_YONGFENG): New definition.
(m_ZHAOXIN): Ditto.
* config/i386/i386.h (enum processor_type): Add PROCESSOR_YONGFENG.
* config/i386/i386.md: Add yongfeng.
* config/i386/lujiazui.md: Fix typo.
* config/i386/x86-tune-costs.h (struct processor_costs):
Add yongfeng costs.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Add yongfeng.
(ix86_adjust_cost): Ditto.
* config/i386/x86-tune.def (X86_TUNE_SCHEDULE): Replace
m_LUJIAZUI with m_ZHAOXIN.
(X86_TUNE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Ditto.
(X86_TUNE_MOVX): Ditto.
(X86_TUNE_MEMORY_MISMATCH_STALL): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_32): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_64): Ditto.
(X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS): Ditto.
(X86_TUNE_FUSE_ALU_AND_BRANCH): Ditto.
(X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Ditto.
(X86_TUNE_USE_LEAVE): Ditto.
(X86_TUNE_PUSH_MEMORY): Ditto.
(X86_TUNE_LCP_STALL): Ditto.
(X86_TUNE_INTEGER_DFMODE_MOVES): Ditto.
(X86_TUNE_OPT_AGU): Ditto.
(X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Ditto.
(X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Ditto.
(X86_TUNE_USE_SAHF): Ditto.
(X86_TUNE_USE_BT): Ditto.
(X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Ditto.
(X86_TUNE_ONE_IF_CONV_INSN): Ditto.
(X86_TUNE_AVOID_MFENCE): Ditto.
(X86_TUNE_EXPAND_ABS): Ditto.
(X86_TUNE_USE_SIMODE_FIOP): Ditto.
(X86_TUNE_USE_FFREEP): Ditto.
(X86_TUNE_EXT_80387_CONSTANTS): Ditto.
(X86_TUNE_SSE_UNALIGNED_LOAD_OPTIMAL): Ditto.
(X86_TUNE_SSE_UNALIGNED_STORE_OPTIMAL): Ditto.
(X86_TUNE_SSE_TYPELESS_STORES): Ditto.
(X86_TUNE_SSE_LOAD0_BY_PXOR): Ditto.
(X86_TUNE_USE_GATHER_2PARTS): Add m_YONGFENG.
(X86_TUNE_USE_GATHER_4PARTS): Ditto.
(X86_TUNE_USE_GATHER_8PARTS): Ditto.
(X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
* doc/extend.texi: Add details about yongfeng.
* doc/invoke.texi: Ditto.
* config/i386/yongfeng.md: New file to describe yongfeng processor.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv32.C: Handle new -march.
* gcc.target/i386/funcspec-56.inc: Ditto.
|
|
This patch addresses the redundant AVL/VL toggling in RVV partial auto-vectorization
which is a known issue for a long time and I finally find the time to address it.
Consider a simple vector addition operation:
https://godbolt.org/z/7hfGfEjW3
void
foo (int *__restrict a,
int *__restrict b,
int *__restrict n)
{
for (int i = 0; i < n; i++)
a[i] = a[i] + b[i];
}
Optimized IR:
Loop body:
_38 = .SELECT_VL (ivtmp_36, POLY_INT_CST [4, 4]); -> vsetvli a5,a2,e8,mf4,ta,ma
...
vect__4.8_27 = .MASK_LEN_LOAD (vectp_a.6_29, 32B, { -1, ... }, _38, 0); -> vle32.v v2,0(a0)
vect__6.11_20 = .MASK_LEN_LOAD (vectp_b.9_25, 32B, { -1, ... }, _38, 0); -> vle32.v v1,0(a1)
vect__7.12_19 = vect__6.11_20 + vect__4.8_27; -> vsetvli a6,zero,e32,m1,ta,ma + vadd.vv v1,v1,v2
.MASK_LEN_STORE (vectp_a.13_11, 32B, { -1, ... }, _38, 0, vect__7.12_19); -> vsetvli zero,a5,e32,m1,ta,ma + vse32.v v1,0(a4)
We can see 2 redundant vsetvls inside the loop body due to AVL/VL toggling.
The AVL/VL toggling is because we are missing LEN information in simple PLUS_EXPR GIMPLE assignment:
vect__7.12_19 = vect__6.11_20 + vect__4.8_27;
GCC apply partial predicate load/store and un-predicated full vector operation on partial vectorization.
Such flow are used by all other targets like ARM SVE (RVV also uses such flow):
ARM SVE:
.L3:
ld1w z30.s, p7/z, [x0, x3, lsl 2] -> predicated load
ld1w z31.s, p7/z, [x1, x3, lsl 2] -> predicated load
add z31.s, z31.s, z30.s -> un-predicated add
st1w z31.s, p7, [x0, x3, lsl 2] -> predicated store
Such vectorization flow causes AVL/VL toggling on RVV so we need AVL propagation PASS for it.
Also, It's very unlikely that we can apply predicated operations on all vectorization for following reasons:
1. It's very heavy workload to support them on all vectorization and we don't see any benefits if we can handle that on targets backend.
2. Changing Loop vectorizer for it will make code base ugly and hard to maintain.
3. We will need so many patterns for all operations. Not only COND_LEN_ADD, COND_LEN_SUB, ....
We also need COND_LEN_EXTEND, ...., COND_LEN_CEIL, ... .. over 100+ patterns, unreasonable number of patterns.
To conclude, we prefer un-predicated operations here, and design a nice and clean AVL propagation PASS for it to elide the redundant vsetvls
due to AVL/VL toggling.
The second question is that why we separate a PASS called AVL propagation. Why not optimize it in VSETVL PASS (We definitetly can optimize AVL in VSETVL PASS)
Frankly, I was planning to address such issue in VSETVL PASS that's why we recently refactored VSETVL PASS. However, I changed my mind recently after several
experiments and tries.
The reasons as follows:
1. For code base management and maintainience. Current VSETVL PASS is complicated enough and aleady has enough aggressive and fancy optimizations which
turns out it can always generate optimal codegen in most of the cases. It's not a good idea keep adding more features into VSETVL PASS to make VSETVL
PASS become heavy and heavy again, then we will need to refactor it again in the future.
Actuall, the VSETVL PASS is very stable and optimal after the recent refactoring. Hopefully, we should not change VSETVL PASS any more except the minor
fixes.
2. vsetvl insertion (VSETVL PASS does this thing) and AVL propagation are 2 different things, I don't think we should fuse them into same PASS.
3. VSETVL PASS is an post-RA PASS, wheras AVL propagtion should be done before RA which can reduce register allocation.
4. This patch's AVL propagation PASS only does AVL propagation for RVV partial auto-vectorization situations.
This patch's codes are only hundreds lines which is very managable and can be very easily extended features and enhancements.
We can easily extend and enhance more AVL propagation in a clean and separate PASS in the future. (If we do it on VSETVL PASS, we will complicate
VSETVL PASS again which is already so complicated.)
Here is an example to demonstrate more:
https://godbolt.org/z/bE86sv3q5
void foo2 (int *__restrict a,
int *__restrict b,
int *__restrict c,
int *__restrict a2,
int *__restrict b2,
int *__restrict c2,
int *__restrict a3,
int *__restrict b3,
int *__restrict c3,
int *__restrict a4,
int *__restrict b4,
int *__restrict c4,
int *__restrict a5,
int *__restrict b5,
int *__restrict c5,
int n)
{
for (int i = 0; i < n; i++){
a[i] = b[i] + c[i];
b5[i] = b[i] + c[i];
a2[i] = b2[i] + c2[i];
a3[i] = b3[i] + c3[i];
a4[i] = b4[i] + c4[i];
a5[i] = a[i] + a4[i];
a[i] = a5[i] + b5[i]+ a[i];
a[i] = a[i] + c[i];
b5[i] = a[i] + c[i];
a2[i] = a[i] + c2[i];
a3[i] = a[i] + c3[i];
a4[i] = a[i] + c4[i];
a5[i] = a[i] + a4[i];
a[i] = a[i] + b5[i]+ a[i];
}
}
1. Loop Body:
Before this patch: After this patch:
vsetvli a4,t1,e8,mf4,ta,ma vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2) vle32.v v2,0(a2)
vle32.v v4,0(a1) vle32.v v3,0(t2)
vle32.v v1,0(t2) vle32.v v4,0(a1)
vsetvli a7,zero,e32,m1,ta,ma vle32.v v1,0(t0)
vadd.vv v4,v2,v4 vadd.vv v4,v2,v4
vsetvli zero,a4,e32,m1,ta,ma vadd.vv v1,v3,v1
vle32.v v3,0(s0) vadd.vv v1,v1,v4
vsetvli a7,zero,e32,m1,ta,ma vadd.vv v1,v1,v4
vadd.vv v1,v3,v1 vadd.vv v1,v1,v4
vadd.vv v1,v1,v4 vadd.vv v1,v1,v2
vadd.vv v1,v1,v4 vadd.vv v2,v1,v2
vadd.vv v1,v1,v4 vse32.v v2,0(t5)
vsetvli zero,a4,e32,m1,ta,ma vadd.vv v2,v2,v1
vle32.v v4,0(a5) vadd.vv v2,v2,v1
vsetvli a7,zero,e32,m1,ta,ma slli a7,a4,2
vadd.vv v1,v1,v2 vadd.vv v3,v1,v3
vadd.vv v2,v1,v2 vle32.v v5,0(a5)
vadd.vv v4,v1,v4 vle32.v v6,0(t6)
vsetvli zero,a4,e32,m1,ta,ma vse32.v v3,0(t3)
vse32.v v2,0(t5) vse32.v v2,0(a0)
vse32.v v4,0(a3) vadd.vv v3,v3,v1
vsetvli a7,zero,e32,m1,ta,ma vadd.vv v2,v1,v5
vadd.vv v3,v1,v3 vse32.v v3,0(t4)
vadd.vv v2,v2,v1 vadd.vv v1,v1,v6
vadd.vv v2,v2,v1 vse32.v v2,0(a3)
vsetvli zero,a4,e32,m1,ta,ma vse32.v v1,0(a6)
vse32.v v2,0(a0)
vse32.v v3,0(t3)
vle32.v v2,0(t0)
vsetvli a7,zero,e32,m1,ta,ma
vadd.vv v3,v3,v1
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v3,0(t4)
vsetvli a7,zero,e32,m1,ta,ma
slli a7,a4,2
vadd.vv v1,v1,v2
sub t1,t1,a4
vsetvli zero,a4,e32,m1,ta,ma
vse32.v v1,0(a6)
It's quite obvious, all heavy && redundant vsetvls inside loop body are eliminated.
2. Epilogue:
Before this patch: After this patch:
.L5: .L5:
ld s0,8(sp) ret
addi sp,sp,16
jr ra
This is the benefit we do the AVL propation before RA since we eliminate the use of 'a7' register
which is used by the redudant AVL/VL toggling instruction: 'vsetvli a7,zero,e32,m1,ta,ma'
The final codegen after this patch:
foo2:
lw t1,56(sp)
ld t6,0(sp)
ld t3,8(sp)
ld t0,16(sp)
ld t2,24(sp)
ld t4,32(sp)
ld t5,40(sp)
ble t1,zero,.L5
.L3:
vsetvli a4,t1,e32,m1,ta,ma
vle32.v v2,0(a2)
vle32.v v3,0(t2)
vle32.v v4,0(a1)
vle32.v v1,0(t0)
vadd.vv v4,v2,v4
vadd.vv v1,v3,v1
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v4
vadd.vv v1,v1,v2
vadd.vv v2,v1,v2
vse32.v v2,0(t5)
vadd.vv v2,v2,v1
vadd.vv v2,v2,v1
slli a7,a4,2
vadd.vv v3,v1,v3
vle32.v v5,0(a5)
vle32.v v6,0(t6)
vse32.v v3,0(t3)
vse32.v v2,0(a0)
vadd.vv v3,v3,v1
vadd.vv v2,v1,v5
vse32.v v3,0(t4)
vadd.vv v1,v1,v6
vse32.v v2,0(a3)
vse32.v v1,0(a6)
sub t1,t1,a4
add a1,a1,a7
add a2,a2,a7
add a5,a5,a7
add t6,t6,a7
add t0,t0,a7
add t2,t2,a7
add t5,t5,a7
add a3,a3,a7
add a6,a6,a7
add t3,t3,a7
add t4,t4,a7
add a0,a0,a7
bne t1,zero,.L3
.L5:
ret
PR target/111318
PR target/111888
gcc/ChangeLog:
* config.gcc: Add AVL propagation pass.
* config/riscv/riscv-passes.def (INSERT_PASS_AFTER): Ditto.
* config/riscv/riscv-protos.h (make_pass_avlprop): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-avlprop.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/select_vl-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr111318.c: New test.
* gcc.target/riscv/rvv/autovec/pr111888.c: New test.
Tested-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Currently, the sed command used to parse --with-{cpu,tune,arch} are
using GNU-specific extension (automatically recognising extended regex).
This is failing on Darwin, which defualts to Posix behaviour.
However '-E' is accepted to indicate an extended RE. Strictly, this
is also not really sufficient, since we should only require a Posix
sed.
gcc/ChangeLog:
* config.gcc: Use -E to to sed to indicate that we are using
extended REs.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
Enable -ftrampoline-impl=heap by default if we are on macOS 11
or later.
Co-Authored-By: Maxim Blinov <maxim.blinov@embecosm.com>
Co-Authored-By: Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
Co-Authored-By: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config.gcc: Default to heap trampolines on macOS 11 and above.
* config/i386/darwin.h: Define X86_CUSTOM_FUNCTION_TEST.
* config/i386/i386.h: Define X86_CUSTOM_FUNCTION_TEST.
* config/i386/i386.cc: Use X86_CUSTOM_FUNCTION_TEST.
|
|
Accept the architecture configure option and resolve build failures. This is
enough to build binaries, but I've not got a device to test it on, so there
are probably runtime issues to fix. The cache control instructions might be
unsafe (or too conservative), and the kernel metadata might be off. Vector
reductions will need to be reworked for RDNA2. In principle, it would be
better to use wavefrontsize32 for this architecture, but that would mean
switching everything to allow SImode masks, so wavefrontsize64 it is.
The multilib is not included in the default configuration so either configure
--with-arch=gfx1030 or include it in --with-multilib-list=gfx1030,....
The majority of this patch has no effect on other devices, but changing from
using scalar writes for the exit value to vector writes means we don't need
the scalar cache write-back instruction anywhere (which doesn't exist in RDNA2).
gcc/ChangeLog:
* config.gcc: Allow --with-arch=gfx1030.
* config/gcn/gcn-hsa.h (NO_XNACK): gfx1030 does not support xnack.
(ASM_SPEC): gfx1030 needs -mattr=+wavefrontsize64 set.
* config/gcn/gcn-opts.h (enum processor_type): Add PROCESSOR_GFX1030.
(TARGET_GFX1030): New.
(TARGET_RDNA2): New.
* config/gcn/gcn-valu.md (@dpp_move<mode>): Disable for RDNA2.
(addc<mode>3<exec_vcc>): Add RDNA2 syntax variant.
(subc<mode>3<exec_vcc>): Likewise.
(<convop><mode><vndi>2_exec): Add RDNA2 alternatives.
(vec_cmp<mode>di): Likewise.
(vec_cmp<u><mode>di): Likewise.
(vec_cmp<mode>di_exec): Likewise.
(vec_cmp<u><mode>di_exec): Likewise.
(vec_cmp<mode>di_dup): Likewise.
(vec_cmp<mode>di_dup_exec): Likewise.
(reduc_<reduc_op>_scal_<mode>): Disable for RDNA2.
(*<reduc_op>_dpp_shr_<mode>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Recognise gfx1030.
(gcn_global_address_p): RDNA2 only allows smaller offsets.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_omp_device_kind_arch_isa): Recognise gfx1030.
(gcn_expand_epilogue): Use VGPRs instead of SGPRs.
(output_file_start): Configure gfx1030.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Add __RDNA2__;
(ASSEMBLER_DIALECT): New.
* config/gcn/gcn.md (rdna): New define_attr.
(enabled): Use "rdna" attribute.
(gcn_return): Remove s_dcache_wb.
(addcsi3_scalar): Add RDNA2 syntax variant.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(mulsi3): v_mul_lo_i32 should be v_mul_lo_u32 on all ISA.
(*memory_barrier): Add RDNA2 syntax variant.
(atomic_load<mode>): Add RDNA2 cache control variants, and disable
scalar atomics for RDNA2.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx1030.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(main): Recognise -march=gfx1030.
* config/gcn/t-omp-device: Add gfx1030 isa.
libgcc/ChangeLog:
* config/gcn/amdgcn_veclib.h (CDNA3_PLUS): Set false for __RDNA2__.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (EF_AMDGPU_MACH_AMDGCN_GFX1030): New.
(isa_hsa_name): Recognise gfx1030.
(isa_code): Likewise.
* team.c (defined): Remove s_endpgm.
|
|
LLVM wants to remove it, which breaks our build. This patch means that
most users won't notice that change, when it comes, and those that do will
have chosen to enable Fiji explicitly.
I'm selecting gfx900 as the new default as that's the least likely for users
to want, which means most users will specify -march explicitly, which means
we'll be free to change the default again, when we need to, without breaking
anybody's makefiles.
gcc/ChangeLog:
* config.gcc (amdgcn): Switch default to --with-arch=gfx900.
Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther
Lake.
* common/config/i386/i386-common.cc (processor_name):
Ditto.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_PANTHERLAKE.
* config.gcc: Add -march=pantherlake.
* config/i386/driver-i386.cc (host_detect_local_cpu): Refactor
the if clause. Handle pantherlake.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Handle pantherlake.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_PANTHERLAKE): New.
(m_CORE_HYBRID): Add pantherlake.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle Clearwater Forest.
* common/config/i386/i386-common.cc (processor_name):
Add Clearwater Forest.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types):
Add INTEL_CLEARWATERFOREST.
* config.gcc: Add -march=clearwaterforest.
* config/i386/driver-i386.cc (host_detect_local_cpu): Handle
clearwaterforest.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_CLEARWATERFOREST): New.
(m_CORE_ATOM): Add clearwaterforest.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.
gcc/testsuite/ChangeLog:
* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.
|
|
gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Detect USER_MSR.
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_USER_MSR_SET): New.
(OPTION_MASK_ISA2_USER_MSR_UNSET): Ditto.
(ix86_handle_option): Handle -musermsr.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_USER_MSR.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY for usermsr.
* config.gcc: Add usermsrintrin.h
* config/i386/cpuid.h (bit_USER_MSR): New.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (VOID, UINT64, UINT64).
* config/i386/i386-builtins.cc (ix86_init_mmx_sse_builtins):
Add __builtin_urdmsr and __builtin_uwrmsr.
* config/i386/i386-builtins.h (ix86_builtins):
Add IX86_BUILTIN_URDMSR and IX86_BUILTIN_UWRMSR.
* config/i386/i386-c.cc (ix86_target_macros_internal):
Define __USER_MSR__.
* config/i386/i386-expand.cc (ix86_expand_builtin):
Handle new builtins.
* config/i386/i386-isa.def (USER_MSR): Add DEF_PTA(USER_MSR).
* config/i386/i386-options.cc (ix86_valid_target_attribute_inner_p):
Handle usermsr.
* config/i386/i386.md (urdmsr): New define_insn.
(uwrmsr): Ditto.
* config/i386/i386.opt: Add option -musermsr.
* config/i386/x86gprintrin.h: Include usermsrintrin.h
* doc/extend.texi: Document usermsr.
* doc/invoke.texi: Document -musermsr.
* doc/sourcebuild.texi: Document target usermsr.
* config/i386/usermsrintrin.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/x86gprintrin-1.c: Add -musermsr for 64bit target.
* gcc.target/i386/x86gprintrin-2.c: Ditto.
* gcc.target/i386/x86gprintrin-3.c: Ditto.
* gcc.target/i386/x86gprintrin-4.c: Add musermsr for 64bit target.
* gcc.target/i386/x86gprintrin-5.c: Ditto
* gcc.target/i386/user_msr-1.c: New test.
* gcc.target/i386/user_msr-2.c: Ditto.
|
|
gcc/ChangeLog:
* config.gcc: Add loongarch-driver.h to tm_files.
* config/loongarch/loongarch.h: Do not include loongarch-driver.h.
* config/loongarch/t-loongarch: Append loongarch-multilib.h to $(GTM_H)
instead of $(TM_H) for building generator programs.
|
|
gcc/ChangeLog:
* config.gcc: Add avx512bitalgvlintrin.h.
* config/i386/avx5124fmapsintrin.h: Add evex512 target for 512 bit
intrins.
* config/i386/avx5124vnniwintrin.h: Ditto.
* config/i386/avx512bf16intrin.h: Ditto.
* config/i386/avx512bitalgintrin.h: Add evex512 target for 512 bit
intrins. Split 128/256 bit intrins to avx512bitalgvlintrin.h.
* config/i386/avx512erintrin.h: Add evex512 target for 512 bit
intrins
* config/i386/avx512ifmaintrin.h: Ditto
* config/i386/avx512pfintrin.h: Ditto
* config/i386/avx512vbmi2intrin.h: Ditto.
* config/i386/avx512vbmiintrin.h: Ditto.
* config/i386/avx512vnniintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vpopcntdqintrin.h: Ditto.
* config/i386/gfniintrin.h: Ditto.
* config/i386/immintrin.h: Add avx512bitalgvlintrin.h.
* config/i386/vaesintrin.h: Add evex512 target for 512 bit intrins.
* config/i386/vpclmulqdqintrin.h: Ditto.
* config/i386/avx512bitalgvlintrin.h: New.
|
|
gcc/ChangeLog:
* config.gcc (*linux*): Set rust target_objs, and
target_has_targetrustm,
* config/t-linux (linux-rust.o): New rule.
* config/linux-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (i[34567]86-*-mingw* | x86_64-*-mingw*): Set
rust_target_objs and target_has_targetrustm.
* config/t-winnt (winnt-rust.o): New rule.
* config/winnt-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-fuchsia): Set tmake_rule, rust_target_objs,
and target_has_targetrustm.
* config/fuchsia-rust.cc: New file.
* config/t-fuchsia: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-vxworks*): Set rust_target_objs and
target_has_targetrustm.
* config/t-vxworks (vxworks-rust.o): New rule.
* config/vxworks-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-dragonfly*): Set rust_target_objs and
target_has_targetrustm.
* config/t-dragonfly (dragonfly-rust.o): New rule.
* config/dragonfly-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-solaris2*): Set rust_target_objs and
target_has_targetrustm.
* config/t-sol2 (sol2-rust.o): New rule.
* config/sol2-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-openbsd*): Set rust_target_objs and
target_has_targetrustm.
* config/t-openbsd (openbsd-rust.o): New rule.
* config/openbsd-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-netbsd*): Set rust_target_objs and
target_has_targetrustm.
* config/t-netbsd (netbsd-rust.o): New rule.
* config/netbsd-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-freebsd*): Set rust_target_objs and
target_has_targetrustm.
* config/t-freebsd (freebsd-rust.o): New rule.
* config/freebsd-rust.cc: New file.
|
|
gcc/ChangeLog:
* config.gcc (*-*-darwin*): Set rust_target_objs and
target_has_targetrustm.
* config/t-darwin (darwin-rust.o): New rule.
* config/darwin-rust.cc: New file.
|
|
gcc/ChangeLog:
* Makefile.in (tm_rust_file_list, tm_rust_include_list, TM_RUST_H,
RUST_TARGET_DEF, RUST_TARGET_H, RUST_TARGET_OBJS): New variables.
(tm_rust.h, cs-tm_rust.h, default-rust.o,
rust/rust-target-hooks-def.h, s-rust-target-hooks-def-h): New rules.
(s-tm-texi): Also check timestamp on rust-target.def.
(generated_files): Add TM_RUST_H and rust-target-hooks-def.h.
(build/genhooks.o): Also depend on RUST_TARGET_DEF.
* config.gcc (tm_rust_file, rust_target_objs, target_has_targetrustm):
New variables.
* configure: Regenerate.
* configure.ac (tm_rust_file_list, tm_rust_include_list,
rust_target_objs): Add substitutes.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (targetrustm): Document.
(target_has_targetrustm): Document.
* genhooks.cc: Include rust/rust-target.def.
* config/default-rust.cc: New file.
gcc/rust/ChangeLog:
* rust-target-def.h: New file.
* rust-target.def: New file.
* rust-target.h: New file.
|
|
Library build options from --with-multilib-list used to be processed with
*self_spec, which missed the driver's initial canonicalization. This
caused limitations on CFLAGS override and the use of driver-only options
like -m[no]-lsx.
The problem is solved by promoting the injection rules of --with-multilib-list
options to the first element of DRIVER_SELF_SPECS, to make them execute before
the canonialization. The library-build options are also hard-coded in
the driver and can be used conveniently by the builders of other non-gcc
libraries via the use of -fmultiflags.
Bootstrapped and tested on loongarch64-linux-gnu.
ChangeLog:
* config-ml.in: Remove unneeded loongarch clause.
* configure.ac: Register custom makefile fragments mt-loongarch-*
for loongarch targets.
* configure: Regenerate.
config/ChangeLog:
* mt-loongarch-mlib: New file. Pass -fmultiflags when building
target libraries (FLAGS_FOR_TARGET).
* mt-loongarch-elf: New file.
* mt-loongarch-gnu: New file.
gcc/ChangeLog:
* config.gcc: Pass the default ABI via TM_MULTILIB_CONFIG.
* config/loongarch/loongarch-driver.h: Invoke MLIB_SELF_SPECS
before the driver canonicalization routines.
* config/loongarch/loongarch.h: Move definitions of CC1_SPEC etc.
to loongarch-driver.h
* config/loongarch/t-linux: Move multilib-related definitions to
t-multilib.
* config/loongarch/t-multilib: New file. Inject library build
options obtained from --with-multilib-list.
* config/loongarch/t-loongarch: Same.
|
|
This patch implements the expansion of the strlen builtin for RV32/RV64
for xlen-aligned aligned strings if Zbb or XTheadBb instructions are available.
The inserted sequences are:
rv32gc_zbb (RV64 is similar):
add a3,a0,4
li a4,-1
.L1: lw a5,0(a0)
add a0,a0,4
orc.b a5,a5
beq a5,a4,.L1
not a5,a5
ctz a5,a5
srl a5,a5,0x3
add a0,a0,a5
sub a0,a0,a3
rv64gc_xtheadbb (RV32 is similar):
add a4,a0,8
.L2: ld a5,0(a0)
add a0,a0,8
th.tstnbz a5,a5
beqz a5,.L2
th.rev a5,a5
th.ff1 a5,a5
srl a5,a5,0x3
add a0,a0,a5
sub a0,a0,a4
This allows to inline calls to strlen(), with optimized code for
xlen-aligned strings, resulting in the following benefits over
a call to libc:
* no call/ret instructions
* no stack frame allocation
* no register saving/restoring
* no alignment test
The inlining mechanism is gated by a new switch ('-minline-strlen')
and by the variable 'optimize_size'.
Tested using the glibc string tests.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
gcc/ChangeLog:
* config.gcc: Add new object riscv-string.o.
riscv-string.cc.
* config/riscv/riscv-protos.h (riscv_expand_strlen):
New function.
* config/riscv/riscv.md (strlen<mode>): New expand INSN.
* config/riscv/riscv.opt: New flag 'minline-strlen'.
* config/riscv/t-riscv: Add new object riscv-string.o.
* config/riscv/thead.md (th_rev<mode>2): Export INSN name.
(th_rev<mode>2): Likewise.
(th_tstnbz<mode>2): New INSN.
* doc/invoke.texi: Document '-minline-strlen'.
* emit-rtl.cc (emit_likely_jump_insn): New helper function.
(emit_unlikely_jump_insn): Likewise.
* rtl.h (emit_likely_jump_insn): New prototype.
(emit_unlikely_jump_insn): Likewise.
* config/riscv/riscv-string.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadbb-strlen-unaligned.c: New test.
* gcc.target/riscv/xtheadbb-strlen.c: New test.
* gcc.target/riscv/zbb-strlen-disabled-2.c: New test.
* gcc.target/riscv/zbb-strlen-disabled.c: New test.
* gcc.target/riscv/zbb-strlen-unaligned.c: New test.
* gcc.target/riscv/zbb-strlen.c: New test.
|
|
gcc/ChangeLog:
* config.gcc: remove non-POSIX syntax "<<<".
|
|
gcc/ChangeLog:
* config.gcc: Export the header file lasxintrin.h.
* config/loongarch/loongarch-builtins.cc (enum loongarch_builtin_type):
Add Loongson ASX builtin functions support.
(AVAIL_ALL): Ditto.
(LASX_BUILTIN): Ditto.
(LASX_NO_TARGET_BUILTIN): Ditto.
(LASX_BUILTIN_TEST_BRANCH): Ditto.
(CODE_FOR_lasx_xvsadd_b): Ditto.
(CODE_FOR_lasx_xvsadd_h): Ditto.
(CODE_FOR_lasx_xvsadd_w): Ditto.
(CODE_FOR_lasx_xvsadd_d): Ditto.
(CODE_FOR_lasx_xvsadd_bu): Ditto.
(CODE_FOR_lasx_xvsadd_hu): Ditto.
(CODE_FOR_lasx_xvsadd_wu): Ditto.
(CODE_FOR_lasx_xvsadd_du): Ditto.
(CODE_FOR_lasx_xvadd_b): Ditto.
(CODE_FOR_lasx_xvadd_h): Ditto.
(CODE_FOR_lasx_xvadd_w): Ditto.
(CODE_FOR_lasx_xvadd_d): Ditto.
(CODE_FOR_lasx_xvaddi_bu): Ditto.
(CODE_FOR_lasx_xvaddi_hu): Ditto.
(CODE_FOR_lasx_xvaddi_wu): Ditto.
(CODE_FOR_lasx_xvaddi_du): Ditto.
(CODE_FOR_lasx_xvand_v): Ditto.
(CODE_FOR_lasx_xvandi_b): Ditto.
(CODE_FOR_lasx_xvbitsel_v): Ditto.
(CODE_FOR_lasx_xvseqi_b): Ditto.
(CODE_FOR_lasx_xvseqi_h): Ditto.
(CODE_FOR_lasx_xvseqi_w): Ditto.
(CODE_FOR_lasx_xvseqi_d): Ditto.
(CODE_FOR_lasx_xvslti_b): Ditto.
(CODE_FOR_lasx_xvslti_h): Ditto.
(CODE_FOR_lasx_xvslti_w): Ditto.
(CODE_FOR_lasx_xvslti_d): Ditto.
(CODE_FOR_lasx_xvslti_bu): Ditto.
(CODE_FOR_lasx_xvslti_hu): Ditto.
(CODE_FOR_lasx_xvslti_wu): Ditto.
(CODE_FOR_lasx_xvslti_du): Ditto.
(CODE_FOR_lasx_xvslei_b): Ditto.
(CODE_FOR_lasx_xvslei_h): Ditto.
(CODE_FOR_lasx_xvslei_w): Ditto.
(CODE_FOR_lasx_xvslei_d): Ditto.
(CODE_FOR_lasx_xvslei_bu): Ditto.
(CODE_FOR_lasx_xvslei_hu): Ditto.
(CODE_FOR_lasx_xvslei_wu): Ditto.
(CODE_FOR_lasx_xvslei_du): Ditto.
(CODE_FOR_lasx_xvdiv_b): Ditto.
(CODE_FOR_lasx_xvdiv_h): Ditto.
(CODE_FOR_lasx_xvdiv_w): Ditto.
(CODE_FOR_lasx_xvdiv_d): Ditto.
(CODE_FOR_lasx_xvdiv_bu): Ditto.
(CODE_FOR_lasx_xvdiv_hu): Ditto.
(CODE_FOR_lasx_xvdiv_wu): Ditto.
(CODE_FOR_lasx_xvdiv_du): Ditto.
(CODE_FOR_lasx_xvfadd_s): Ditto.
(CODE_FOR_lasx_xvfadd_d): Ditto.
(CODE_FOR_lasx_xvftintrz_w_s): Ditto.
(CODE_FOR_lasx_xvftintrz_l_d): Ditto.
(CODE_FOR_lasx_xvftintrz_wu_s): Ditto.
(CODE_FOR_lasx_xvftintrz_lu_d): Ditto.
(CODE_FOR_lasx_xvffint_s_w): Ditto.
(CODE_FOR_lasx_xvffint_d_l): Ditto.
(CODE_FOR_lasx_xvffint_s_wu): Ditto.
(CODE_FOR_lasx_xvffint_d_lu): Ditto.
(CODE_FOR_lasx_xvfsub_s): Ditto.
(CODE_FOR_lasx_xvfsub_d): Ditto.
(CODE_FOR_lasx_xvfmul_s): Ditto.
(CODE_FOR_lasx_xvfmul_d): Ditto.
(CODE_FOR_lasx_xvfdiv_s): Ditto.
(CODE_FOR_lasx_xvfdiv_d): Ditto.
(CODE_FOR_lasx_xvfmax_s): Ditto.
(CODE_FOR_lasx_xvfmax_d): Ditto.
(CODE_FOR_lasx_xvfmin_s): Ditto.
(CODE_FOR_lasx_xvfmin_d): Ditto.
(CODE_FOR_lasx_xvfsqrt_s): Ditto.
(CODE_FOR_lasx_xvfsqrt_d): Ditto.
(CODE_FOR_lasx_xvflogb_s): Ditto.
(CODE_FOR_lasx_xvflogb_d): Ditto.
(CODE_FOR_lasx_xvmax_b): Ditto.
(CODE_FOR_lasx_xvmax_h): Ditto.
(CODE_FOR_lasx_xvmax_w): Ditto.
(CODE_FOR_lasx_xvmax_d): Ditto.
(CODE_FOR_lasx_xvmaxi_b): Ditto.
(CODE_FOR_lasx_xvmaxi_h): Ditto.
(CODE_FOR_lasx_xvmaxi_w): Ditto.
(CODE_FOR_lasx_xvmaxi_d): Ditto.
(CODE_FOR_lasx_xvmax_bu): Ditto.
(CODE_FOR_lasx_xvmax_hu): Ditto.
(CODE_FOR_lasx_xvmax_wu): Ditto.
(CODE_FOR_lasx_xvmax_du): Ditto.
(CODE_FOR_lasx_xvmaxi_bu): Ditto.
(CODE_FOR_lasx_xvmaxi_hu): Ditto.
(CODE_FOR_lasx_xvmaxi_wu): Ditto.
(CODE_FOR_lasx_xvmaxi_du): Ditto.
(CODE_FOR_lasx_xvmin_b): Ditto.
(CODE_FOR_lasx_xvmin_h): Ditto.
(CODE_FOR_lasx_xvmin_w): Ditto.
(CODE_FOR_lasx_xvmin_d): Ditto.
(CODE_FOR_lasx_xvmini_b): Ditto.
(CODE_FOR_lasx_xvmini_h): Ditto.
(CODE_FOR_lasx_xvmini_w): Ditto.
(CODE_FOR_lasx_xvmini_d): Ditto.
(CODE_FOR_lasx_xvmin_bu): Ditto.
(CODE_FOR_lasx_xvmin_hu): Ditto.
(CODE_FOR_lasx_xvmin_wu): Ditto.
(CODE_FOR_lasx_xvmin_du): Ditto.
(CODE_FOR_lasx_xvmini_bu): Ditto.
(CODE_FOR_lasx_xvmini_hu): Ditto.
(CODE_FOR_lasx_xvmini_wu): Ditto.
(CODE_FOR_lasx_xvmini_du): Ditto.
(CODE_FOR_lasx_xvmod_b): Ditto.
(CODE_FOR_lasx_xvmod_h): Ditto.
(CODE_FOR_lasx_xvmod_w): Ditto.
(CODE_FOR_lasx_xvmod_d): Ditto.
(CODE_FOR_lasx_xvmod_bu): Ditto.
(CODE_FOR_lasx_xvmod_hu): Ditto.
(CODE_FOR_lasx_xvmod_wu): Ditto.
(CODE_FOR_lasx_xvmod_du): Ditto.
(CODE_FOR_lasx_xvmul_b): Ditto.
(CODE_FOR_lasx_xvmul_h): Ditto.
(CODE_FOR_lasx_xvmul_w): Ditto.
(CODE_FOR_lasx_xvmul_d): Ditto.
(CODE_FOR_lasx_xvclz_b): Ditto.
(CODE_FOR_lasx_xvclz_h): Ditto.
(CODE_FOR_lasx_xvclz_w): Ditto.
(CODE_FOR_lasx_xvclz_d): Ditto.
(CODE_FOR_lasx_xvnor_v): Ditto.
(CODE_FOR_lasx_xvor_v): Ditto.
(CODE_FOR_lasx_xvori_b): Ditto.
(CODE_FOR_lasx_xvnori_b): Ditto.
(CODE_FOR_lasx_xvpcnt_b): Ditto.
(CODE_FOR_lasx_xvpcnt_h): Ditto.
(CODE_FOR_lasx_xvpcnt_w): Ditto.
(CODE_FOR_lasx_xvpcnt_d): Ditto.
(CODE_FOR_lasx_xvxor_v): Ditto.
(CODE_FOR_lasx_xvxori_b): Ditto.
(CODE_FOR_lasx_xvsll_b): Ditto.
(CODE_FOR_lasx_xvsll_h): Ditto.
(CODE_FOR_lasx_xvsll_w): Ditto.
(CODE_FOR_lasx_xvsll_d): Ditto.
(CODE_FOR_lasx_xvslli_b): Ditto.
(CODE_FOR_lasx_xvslli_h): Ditto.
(CODE_FOR_lasx_xvslli_w): Ditto.
(CODE_FOR_lasx_xvslli_d): Ditto.
(CODE_FOR_lasx_xvsra_b): Ditto.
(CODE_FOR_lasx_xvsra_h): Ditto.
(CODE_FOR_lasx_xvsra_w): Ditto.
(CODE_FOR_lasx_xvsra_d): Ditto.
(CODE_FOR_lasx_xvsrai_b): Ditto.
(CODE_FOR_lasx_xvsrai_h): Ditto.
(CODE_FOR_lasx_xvsrai_w): Ditto.
(CODE_FOR_lasx_xvsrai_d): Ditto.
(CODE_FOR_lasx_xvsrl_b): Ditto.
(CODE_FOR_lasx_xvsrl_h): Ditto.
(CODE_FOR_lasx_xvsrl_w): Ditto.
(CODE_FOR_lasx_xvsrl_d): Ditto.
(CODE_FOR_lasx_xvsrli_b): Ditto.
(CODE_FOR_lasx_xvsrli_h): Ditto.
(CODE_FOR_lasx_xvsrli_w): Ditto.
(CODE_FOR_lasx_xvsrli_d): Ditto.
(CODE_FOR_lasx_xvsub_b): Ditto.
(CODE_FOR_lasx_xvsub_h): Ditto.
(CODE_FOR_lasx_xvsub_w): Ditto.
(CODE_FOR_lasx_xvsub_d): Ditto.
(CODE_FOR_lasx_xvsubi_bu): Ditto.
(CODE_FOR_lasx_xvsubi_hu): Ditto.
(CODE_FOR_lasx_xvsubi_wu): Ditto.
(CODE_FOR_lasx_xvsubi_du): Ditto.
(CODE_FOR_lasx_xvpackod_d): Ditto.
(CODE_FOR_lasx_xvpackev_d): Ditto.
(CODE_FOR_lasx_xvpickod_d): Ditto.
(CODE_FOR_lasx_xvpickev_d): Ditto.
(CODE_FOR_lasx_xvrepli_b): Ditto.
(CODE_FOR_lasx_xvrepli_h): Ditto.
(CODE_FOR_lasx_xvrepli_w): Ditto.
(CODE_FOR_lasx_xvrepli_d): Ditto.
(CODE_FOR_lasx_xvandn_v): Ditto.
(CODE_FOR_lasx_xvorn_v): Ditto.
(CODE_FOR_lasx_xvneg_b): Ditto.
(CODE_FOR_lasx_xvneg_h): Ditto.
(CODE_FOR_lasx_xvneg_w): Ditto.
(CODE_FOR_lasx_xvneg_d): Ditto.
(CODE_FOR_lasx_xvbsrl_v): Ditto.
(CODE_FOR_lasx_xvbsll_v): Ditto.
(CODE_FOR_lasx_xvfmadd_s): Ditto.
(CODE_FOR_lasx_xvfmadd_d): Ditto.
(CODE_FOR_lasx_xvfmsub_s): Ditto.
(CODE_FOR_lasx_xvfmsub_d): Ditto.
(CODE_FOR_lasx_xvfnmadd_s): Ditto.
(CODE_FOR_lasx_xvfnmadd_d): Ditto.
(CODE_FOR_lasx_xvfnmsub_s): Ditto.
(CODE_FOR_lasx_xvfnmsub_d): Ditto.
(CODE_FOR_lasx_xvpermi_q): Ditto.
(CODE_FOR_lasx_xvpermi_d): Ditto.
(CODE_FOR_lasx_xbnz_v): Ditto.
(CODE_FOR_lasx_xbz_v): Ditto.
(CODE_FOR_lasx_xvssub_b): Ditto.
(CODE_FOR_lasx_xvssub_h): Ditto.
(CODE_FOR_lasx_xvssub_w): Ditto.
(CODE_FOR_lasx_xvssub_d): Ditto.
(CODE_FOR_lasx_xvssub_bu): Ditto.
(CODE_FOR_lasx_xvssub_hu): Ditto.
(CODE_FOR_lasx_xvssub_wu): Ditto.
(CODE_FOR_lasx_xvssub_du): Ditto.
(CODE_FOR_lasx_xvabsd_b): Ditto.
(CODE_FOR_lasx_xvabsd_h): Ditto.
(CODE_FOR_lasx_xvabsd_w): Ditto.
(CODE_FOR_lasx_xvabsd_d): Ditto.
(CODE_FOR_lasx_xvabsd_bu): Ditto.
(CODE_FOR_lasx_xvabsd_hu): Ditto.
(CODE_FOR_lasx_xvabsd_wu): Ditto.
(CODE_FOR_lasx_xvabsd_du): Ditto.
(CODE_FOR_lasx_xvavg_b): Ditto.
(CODE_FOR_lasx_xvavg_h): Ditto.
(CODE_FOR_lasx_xvavg_w): Ditto.
(CODE_FOR_lasx_xvavg_d): Ditto.
(CODE_FOR_lasx_xvavg_bu): Ditto.
(CODE_FOR_lasx_xvavg_hu): Ditto.
(CODE_FOR_lasx_xvavg_wu): Ditto.
(CODE_FOR_lasx_xvavg_du): Ditto.
(CODE_FOR_lasx_xvavgr_b): Ditto.
(CODE_FOR_lasx_xvavgr_h): Ditto.
(CODE_FOR_lasx_xvavgr_w): Ditto.
(CODE_FOR_lasx_xvavgr_d): Ditto.
(CODE_FOR_lasx_xvavgr_bu): Ditto.
(CODE_FOR_lasx_xvavgr_hu): Ditto.
(CODE_FOR_lasx_xvavgr_wu): Ditto.
(CODE_FOR_lasx_xvavgr_du): Ditto.
(CODE_FOR_lasx_xvmuh_b): Ditto.
(CODE_FOR_lasx_xvmuh_h): Ditto.
(CODE_FOR_lasx_xvmuh_w): Ditto.
(CODE_FOR_lasx_xvmuh_d): Ditto.
(CODE_FOR_lasx_xvmuh_bu): Ditto.
(CODE_FOR_lasx_xvmuh_hu): Ditto.
(CODE_FOR_lasx_xvmuh_wu): Ditto.
(CODE_FOR_lasx_xvmuh_du): Ditto.
(CODE_FOR_lasx_xvssran_b_h): Ditto.
(CODE_FOR_lasx_xvssran_h_w): Ditto.
(CODE_FOR_lasx_xvssran_w_d): Ditto.
(CODE_FOR_lasx_xvssran_bu_h): Ditto.
(CODE_FOR_lasx_xvssran_hu_w): Ditto.
(CODE_FOR_lasx_xvssran_wu_d): Ditto.
(CODE_FOR_lasx_xvssrarn_b_h): Ditto.
(CODE_FOR_lasx_xvssrarn_h_w): Ditto.
(CODE_FOR_lasx_xvssrarn_w_d): Ditto.
(CODE_FOR_lasx_xvssrarn_bu_h): Ditto.
(CODE_FOR_lasx_xvssrarn_hu_w): Ditto.
(CODE_FOR_lasx_xvssrarn_wu_d): Ditto.
(CODE_FOR_lasx_xvssrln_bu_h): Ditto.
(CODE_FOR_lasx_xvssrln_hu_w): Ditto.
(CODE_FOR_lasx_xvssrln_wu_d): Ditto.
(CODE_FOR_lasx_xvssrlrn_bu_h): Ditto.
(CODE_FOR_lasx_xvssrlrn_hu_w): Ditto.
(CODE_FOR_lasx_xvssrlrn_wu_d): Ditto.
(CODE_FOR_lasx_xvftint_w_s): Ditto.
(CODE_FOR_lasx_xvftint_l_d): Ditto.
(CODE_FOR_lasx_xvftint_wu_s): Ditto.
(CODE_FOR_lasx_xvftint_lu_d): Ditto.
(CODE_FOR_lasx_xvsllwil_h_b): Ditto.
(CODE_FOR_lasx_xvsllwil_w_h): Ditto.
(CODE_FOR_lasx_xvsllwil_d_w): Ditto.
(CODE_FOR_lasx_xvsllwil_hu_bu): Ditto.
(CODE_FOR_lasx_xvsllwil_wu_hu): Ditto.
(CODE_FOR_lasx_xvsllwil_du_wu): Ditto.
(CODE_FOR_lasx_xvsat_b): Ditto.
(CODE_FOR_lasx_xvsat_h): Ditto.
(CODE_FOR_lasx_xvsat_w): Ditto.
(CODE_FOR_lasx_xvsat_d): Ditto.
(CODE_FOR_lasx_xvsat_bu): Ditto.
(CODE_FOR_lasx_xvsat_hu): Ditto.
(CODE_FOR_lasx_xvsat_wu): Ditto.
(CODE_FOR_lasx_xvsat_du): Ditto.
(loongarch_builtin_vectorized_function): Ditto.
(loongarch_expand_builtin_insn): Ditto.
(loongarch_expand_builtin): Ditto.
* config/loongarch/loongarch-ftypes.def (1): Ditto.
(2): Ditto.
(3): Ditto.
(4): Ditto.
* config/loongarch/lasxintrin.h: New file.
|
|
gcc/ChangeLog:
* config.gcc: Export the header file lsxintrin.h.
* config/loongarch/loongarch-builtins.cc (LARCH_FTYPE_NAME4): Add builtin function support.
(enum loongarch_builtin_type): Ditto.
(AVAIL_ALL): Ditto.
(LARCH_BUILTIN): Ditto.
(LSX_BUILTIN): Ditto.
(LSX_BUILTIN_TEST_BRANCH): Ditto.
(LSX_NO_TARGET_BUILTIN): Ditto.
(CODE_FOR_lsx_vsadd_b): Ditto.
(CODE_FOR_lsx_vsadd_h): Ditto.
(CODE_FOR_lsx_vsadd_w): Ditto.
(CODE_FOR_lsx_vsadd_d): Ditto.
(CODE_FOR_lsx_vsadd_bu): Ditto.
(CODE_FOR_lsx_vsadd_hu): Ditto.
(CODE_FOR_lsx_vsadd_wu): Ditto.
(CODE_FOR_lsx_vsadd_du): Ditto.
(CODE_FOR_lsx_vadd_b): Ditto.
(CODE_FOR_lsx_vadd_h): Ditto.
(CODE_FOR_lsx_vadd_w): Ditto.
(CODE_FOR_lsx_vadd_d): Ditto.
(CODE_FOR_lsx_vaddi_bu): Ditto.
(CODE_FOR_lsx_vaddi_hu): Ditto.
(CODE_FOR_lsx_vaddi_wu): Ditto.
(CODE_FOR_lsx_vaddi_du): Ditto.
(CODE_FOR_lsx_vand_v): Ditto.
(CODE_FOR_lsx_vandi_b): Ditto.
(CODE_FOR_lsx_bnz_v): Ditto.
(CODE_FOR_lsx_bz_v): Ditto.
(CODE_FOR_lsx_vbitsel_v): Ditto.
(CODE_FOR_lsx_vseqi_b): Ditto.
(CODE_FOR_lsx_vseqi_h): Ditto.
(CODE_FOR_lsx_vseqi_w): Ditto.
(CODE_FOR_lsx_vseqi_d): Ditto.
(CODE_FOR_lsx_vslti_b): Ditto.
(CODE_FOR_lsx_vslti_h): Ditto.
(CODE_FOR_lsx_vslti_w): Ditto.
(CODE_FOR_lsx_vslti_d): Ditto.
(CODE_FOR_lsx_vslti_bu): Ditto.
(CODE_FOR_lsx_vslti_hu): Ditto.
(CODE_FOR_lsx_vslti_wu): Ditto.
(CODE_FOR_lsx_vslti_du): Ditto.
(CODE_FOR_lsx_vslei_b): Ditto.
(CODE_FOR_lsx_vslei_h): Ditto.
(CODE_FOR_lsx_vslei_w): Ditto.
(CODE_FOR_lsx_vslei_d): Ditto.
(CODE_FOR_lsx_vslei_bu): Ditto.
(CODE_FOR_lsx_vslei_hu): Ditto.
(CODE_FOR_lsx_vslei_wu): Ditto.
(CODE_FOR_lsx_vslei_du): Ditto.
(CODE_FOR_lsx_vdiv_b): Ditto.
(CODE_FOR_lsx_vdiv_h): Ditto.
(CODE_FOR_lsx_vdiv_w): Ditto.
(CODE_FOR_lsx_vdiv_d): Ditto.
(CODE_FOR_lsx_vdiv_bu): Ditto.
(CODE_FOR_lsx_vdiv_hu): Ditto.
(CODE_FOR_lsx_vdiv_wu): Ditto.
(CODE_FOR_lsx_vdiv_du): Ditto.
(CODE_FOR_lsx_vfadd_s): Ditto.
(CODE_FOR_lsx_vfadd_d): Ditto.
(CODE_FOR_lsx_vftintrz_w_s): Ditto.
(CODE_FOR_lsx_vftintrz_l_d): Ditto.
(CODE_FOR_lsx_vftintrz_wu_s): Ditto.
(CODE_FOR_lsx_vftintrz_lu_d): Ditto.
(CODE_FOR_lsx_vffint_s_w): Ditto.
(CODE_FOR_lsx_vffint_d_l): Ditto.
(CODE_FOR_lsx_vffint_s_wu): Ditto.
(CODE_FOR_lsx_vffint_d_lu): Ditto.
(CODE_FOR_lsx_vfsub_s): Ditto.
(CODE_FOR_lsx_vfsub_d): Ditto.
(CODE_FOR_lsx_vfmul_s): Ditto.
(CODE_FOR_lsx_vfmul_d): Ditto.
(CODE_FOR_lsx_vfdiv_s): Ditto.
(CODE_FOR_lsx_vfdiv_d): Ditto.
(CODE_FOR_lsx_vfmax_s): Ditto.
(CODE_FOR_lsx_vfmax_d): Ditto.
(CODE_FOR_lsx_vfmin_s): Ditto.
(CODE_FOR_lsx_vfmin_d): Ditto.
(CODE_FOR_lsx_vfsqrt_s): Ditto.
(CODE_FOR_lsx_vfsqrt_d): Ditto.
(CODE_FOR_lsx_vflogb_s): Ditto.
(CODE_FOR_lsx_vflogb_d): Ditto.
(CODE_FOR_lsx_vmax_b): Ditto.
(CODE_FOR_lsx_vmax_h): Ditto.
(CODE_FOR_lsx_vmax_w): Ditto.
(CODE_FOR_lsx_vmax_d): Ditto.
(CODE_FOR_lsx_vmaxi_b): Ditto.
(CODE_FOR_lsx_vmaxi_h): Ditto.
(CODE_FOR_lsx_vmaxi_w): Ditto.
(CODE_FOR_lsx_vmaxi_d): Ditto.
(CODE_FOR_lsx_vmax_bu): Ditto.
(CODE_FOR_lsx_vmax_hu): Ditto.
(CODE_FOR_lsx_vmax_wu): Ditto.
(CODE_FOR_lsx_vmax_du): Ditto.
(CODE_FOR_lsx_vmaxi_bu): Ditto.
(CODE_FOR_lsx_vmaxi_hu): Ditto.
(CODE_FOR_lsx_vmaxi_wu): Ditto.
(CODE_FOR_lsx_vmaxi_du): Ditto.
(CODE_FOR_lsx_vmin_b): Ditto.
(CODE_FOR_lsx_vmin_h): Ditto.
(CODE_FOR_lsx_vmin_w): Ditto.
(CODE_FOR_lsx_vmin_d): Ditto.
(CODE_FOR_lsx_vmini_b): Ditto.
(CODE_FOR_lsx_vmini_h): Ditto.
(CODE_FOR_lsx_vmini_w): Ditto.
(CODE_FOR_lsx_vmini_d): Ditto.
(CODE_FOR_lsx_vmin_bu): Ditto.
(CODE_FOR_lsx_vmin_hu): Ditto.
(CODE_FOR_lsx_vmin_wu): Ditto.
(CODE_FOR_lsx_vmin_du): Ditto.
(CODE_FOR_lsx_vmini_bu): Ditto.
(CODE_FOR_lsx_vmini_hu): Ditto.
(CODE_FOR_lsx_vmini_wu): Ditto.
(CODE_FOR_lsx_vmini_du): Ditto.
(CODE_FOR_lsx_vmod_b): Ditto.
(CODE_FOR_lsx_vmod_h): Ditto.
(CODE_FOR_lsx_vmod_w): Ditto.
(CODE_FOR_lsx_vmod_d): Ditto.
(CODE_FOR_lsx_vmod_bu): Ditto.
(CODE_FOR_lsx_vmod_hu): Ditto.
(CODE_FOR_lsx_vmod_wu): Ditto.
(CODE_FOR_lsx_vmod_du): Ditto.
(CODE_FOR_lsx_vmul_b): Ditto.
(CODE_FOR_lsx_vmul_h): Ditto.
(CODE_FOR_lsx_vmul_w): Ditto.
(CODE_FOR_lsx_vmul_d): Ditto.
(CODE_FOR_lsx_vclz_b): Ditto.
(CODE_FOR_lsx_vclz_h): Ditto.
(CODE_FOR_lsx_vclz_w): Ditto.
(CODE_FOR_lsx_vclz_d): Ditto.
(CODE_FOR_lsx_vnor_v): Ditto.
(CODE_FOR_lsx_vor_v): Ditto.
(CODE_FOR_lsx_vori_b): Ditto.
(CODE_FOR_lsx_vnori_b): Ditto.
(CODE_FOR_lsx_vpcnt_b): Ditto.
(CODE_FOR_lsx_vpcnt_h): Ditto.
(CODE_FOR_lsx_vpcnt_w): Ditto.
(CODE_FOR_lsx_vpcnt_d): Ditto.
(CODE_FOR_lsx_vxor_v): Ditto.
(CODE_FOR_lsx_vxori_b): Ditto.
(CODE_FOR_lsx_vsll_b): Ditto.
(CODE_FOR_lsx_vsll_h): Ditto.
(CODE_FOR_lsx_vsll_w): Ditto.
(CODE_FOR_lsx_vsll_d): Ditto.
(CODE_FOR_lsx_vslli_b): Ditto.
(CODE_FOR_lsx_vslli_h): Ditto.
(CODE_FOR_lsx_vslli_w): Ditto.
(CODE_FOR_lsx_vslli_d): Ditto.
(CODE_FOR_lsx_vsra_b): Ditto.
(CODE_FOR_lsx_vsra_h): Ditto.
(CODE_FOR_lsx_vsra_w): Ditto.
(CODE_FOR_lsx_vsra_d): Ditto.
(CODE_FOR_lsx_vsrai_b): Ditto.
(CODE_FOR_lsx_vsrai_h): Ditto.
(CODE_FOR_lsx_vsrai_w): Ditto.
(CODE_FOR_lsx_vsrai_d): Ditto.
(CODE_FOR_lsx_vsrl_b): Ditto.
(CODE_FOR_lsx_vsrl_h): Ditto.
(CODE_FOR_lsx_vsrl_w): Ditto.
(CODE_FOR_lsx_vsrl_d): Ditto.
(CODE_FOR_lsx_vsrli_b): Ditto.
(CODE_FOR_lsx_vsrli_h): Ditto.
(CODE_FOR_lsx_vsrli_w): Ditto.
(CODE_FOR_lsx_vsrli_d): Ditto.
(CODE_FOR_lsx_vsub_b): Ditto.
(CODE_FOR_lsx_vsub_h): Ditto.
(CODE_FOR_lsx_vsub_w): Ditto.
(CODE_FOR_lsx_vsub_d): Ditto.
(CODE_FOR_lsx_vsubi_bu): Ditto.
(CODE_FOR_lsx_vsubi_hu): Ditto.
(CODE_FOR_lsx_vsubi_wu): Ditto.
(CODE_FOR_lsx_vsubi_du): Ditto.
(CODE_FOR_lsx_vpackod_d): Ditto.
(CODE_FOR_lsx_vpackev_d): Ditto.
(CODE_FOR_lsx_vpickod_d): Ditto.
(CODE_FOR_lsx_vpickev_d): Ditto.
(CODE_FOR_lsx_vrepli_b): Ditto.
(CODE_FOR_lsx_vrepli_h): Ditto.
(CODE_FOR_lsx_vrepli_w): Ditto.
(CODE_FOR_lsx_vrepli_d): Ditto.
(CODE_FOR_lsx_vsat_b): Ditto.
(CODE_FOR_lsx_vsat_h): Ditto.
(CODE_FOR_lsx_vsat_w): Ditto.
(CODE_FOR_lsx_vsat_d): Ditto.
(CODE_FOR_lsx_vsat_bu): Ditto.
(CODE_FOR_lsx_vsat_hu): Ditto.
(CODE_FOR_lsx_vsat_wu): Ditto.
(CODE_FOR_lsx_vsat_du): Ditto.
(CODE_FOR_lsx_vavg_b): Ditto.
(CODE_FOR_lsx_vavg_h): Ditto.
(CODE_FOR_lsx_vavg_w): Ditto.
(CODE_FOR_lsx_vavg_d): Ditto.
(CODE_FOR_lsx_vavg_bu): Ditto.
(CODE_FOR_lsx_vavg_hu): Ditto.
(CODE_FOR_lsx_vavg_wu): Ditto.
(CODE_FOR_lsx_vavg_du): Ditto.
(CODE_FOR_lsx_vavgr_b): Ditto.
(CODE_FOR_lsx_vavgr_h): Ditto.
(CODE_FOR_lsx_vavgr_w): Ditto.
(CODE_FOR_lsx_vavgr_d): Ditto.
(CODE_FOR_lsx_vavgr_bu): Ditto.
(CODE_FOR_lsx_vavgr_hu): Ditto.
(CODE_FOR_lsx_vavgr_wu): Ditto.
(CODE_FOR_lsx_vavgr_du): Ditto.
(CODE_FOR_lsx_vssub_b): Ditto.
(CODE_FOR_lsx_vssub_h): Ditto.
(CODE_FOR_lsx_vssub_w): Ditto.
(CODE_FOR_lsx_vssub_d): Ditto.
(CODE_FOR_lsx_vssub_bu): Ditto.
(CODE_FOR_lsx_vssub_hu): Ditto.
(CODE_FOR_lsx_vssub_wu): Ditto.
(CODE_FOR_lsx_vssub_du): Ditto.
(CODE_FOR_lsx_vabsd_b): Ditto.
(CODE_FOR_lsx_vabsd_h): Ditto.
(CODE_FOR_lsx_vabsd_w): Ditto.
(CODE_FOR_lsx_vabsd_d): Ditto.
(CODE_FOR_lsx_vabsd_bu): Ditto.
(CODE_FOR_lsx_vabsd_hu): Ditto.
(CODE_FOR_lsx_vabsd_wu): Ditto.
(CODE_FOR_lsx_vabsd_du): Ditto.
(CODE_FOR_lsx_vftint_w_s): Ditto.
(CODE_FOR_lsx_vftint_l_d): Ditto.
(CODE_FOR_lsx_vftint_wu_s): Ditto.
(CODE_FOR_lsx_vftint_lu_d): Ditto.
(CODE_FOR_lsx_vandn_v): Ditto.
(CODE_FOR_lsx_vorn_v): Ditto.
(CODE_FOR_lsx_vneg_b): Ditto.
(CODE_FOR_lsx_vneg_h): Ditto.
(CODE_FOR_lsx_vneg_w): Ditto.
(CODE_FOR_lsx_vneg_d): Ditto.
(CODE_FOR_lsx_vshuf4i_d): Ditto.
(CODE_FOR_lsx_vbsrl_v): Ditto.
(CODE_FOR_lsx_vbsll_v): Ditto.
(CODE_FOR_lsx_vfmadd_s): Ditto.
(CODE_FOR_lsx_vfmadd_d): Ditto.
(CODE_FOR_lsx_vfmsub_s): Ditto.
(CODE_FOR_lsx_vfmsub_d): Ditto.
(CODE_FOR_lsx_vfnmadd_s): Ditto.
(CODE_FOR_lsx_vfnmadd_d): Ditto.
(CODE_FOR_lsx_vfnmsub_s): Ditto.
(CODE_FOR_lsx_vfnmsub_d): Ditto.
(CODE_FOR_lsx_vmuh_b): Ditto.
(CODE_FOR_lsx_vmuh_h): Ditto.
(CODE_FOR_lsx_vmuh_w): Ditto.
(CODE_FOR_lsx_vmuh_d): Ditto.
(CODE_FOR_lsx_vmuh_bu): Ditto.
(CODE_FOR_lsx_vmuh_hu): Ditto.
(CODE_FOR_lsx_vmuh_wu): Ditto.
(CODE_FOR_lsx_vmuh_du): Ditto.
(CODE_FOR_lsx_vsllwil_h_b): Ditto.
(CODE_FOR_lsx_vsllwil_w_h): Ditto.
(CODE_FOR_lsx_vsllwil_d_w): Ditto.
(CODE_FOR_lsx_vsllwil_hu_bu): Ditto.
(CODE_FOR_lsx_vsllwil_wu_hu): Ditto.
(CODE_FOR_lsx_vsllwil_du_wu): Ditto.
(CODE_FOR_lsx_vssran_b_h): Ditto.
(CODE_FOR_lsx_vssran_h_w): Ditto.
(CODE_FOR_lsx_vssran_w_d): Ditto.
(CODE_FOR_lsx_vssran_bu_h): Ditto.
(CODE_FOR_lsx_vssran_hu_w): Ditto.
(CODE_FOR_lsx_vssran_wu_d): Ditto.
(CODE_FOR_lsx_vssrarn_b_h): Ditto.
(CODE_FOR_lsx_vssrarn_h_w): Ditto.
(CODE_FOR_lsx_vssrarn_w_d): Ditto.
(CODE_FOR_lsx_vssrarn_bu_h): Ditto.
(CODE_FOR_lsx_vssrarn_hu_w): Ditto.
(CODE_FOR_lsx_vssrarn_wu_d): Ditto.
(CODE_FOR_lsx_vssrln_bu_h): Ditto.
(CODE_FOR_lsx_vssrln_hu_w): Ditto.
(CODE_FOR_lsx_vssrln_wu_d): Ditto.
(CODE_FOR_lsx_vssrlrn_bu_h): Ditto.
(CODE_FOR_lsx_vssrlrn_hu_w): Ditto.
(CODE_FOR_lsx_vssrlrn_wu_d): Ditto.
(loongarch_builtin_vector_type): Ditto.
(loongarch_build_cvpointer_type): Ditto.
(LARCH_ATYPE_CVPOINTER): Ditto.
(LARCH_ATYPE_BOOLEAN): Ditto.
(LARCH_ATYPE_V2SF): Ditto.
(LARCH_ATYPE_V2HI): Ditto.
(LARCH_ATYPE_V2SI): Ditto.
(LARCH_ATYPE_V4QI): Ditto.
(LARCH_ATYPE_V4HI): Ditto.
(LARCH_ATYPE_V8QI): Ditto.
(LARCH_ATYPE_V2DI): Ditto.
(LARCH_ATYPE_V4SI): Ditto.
(LARCH_ATYPE_V8HI): Ditto.
(LARCH_ATYPE_V16QI): Ditto.
(LARCH_ATYPE_V2DF): Ditto.
(LARCH_ATYPE_V4SF): Ditto.
(LARCH_ATYPE_V4DI): Ditto.
(LARCH_ATYPE_V8SI): Ditto.
(LARCH_ATYPE_V16HI): Ditto.
(LARCH_ATYPE_V32QI): Ditto.
(LARCH_ATYPE_V4DF): Ditto.
(LARCH_ATYPE_V8SF): Ditto.
(LARCH_ATYPE_UV2DI): Ditto.
(LARCH_ATYPE_UV4SI): Ditto.
(LARCH_ATYPE_UV8HI): Ditto.
(LARCH_ATYPE_UV16QI): Ditto.
(LARCH_ATYPE_UV4DI): Ditto.
(LARCH_ATYPE_UV8SI): Ditto.
(LARCH_ATYPE_UV16HI): Ditto.
(LARCH_ATYPE_UV32QI): Ditto.
(LARCH_ATYPE_UV2SI): Ditto.
(LARCH_ATYPE_UV4HI): Ditto.
(LARCH_ATYPE_UV8QI): Ditto.
(loongarch_builtin_vectorized_function): Ditto.
(LARCH_GET_BUILTIN): Ditto.
(loongarch_expand_builtin_insn): Ditto.
(loongarch_expand_builtin_lsx_test_branch): Ditto.
(loongarch_expand_builtin): Ditto.
* config/loongarch/loongarch-ftypes.def (1): Ditto.
(2): Ditto.
(3): Ditto.
(4): Ditto.
* config/loongarch/lsxintrin.h: New file.
|