The legacy code allowed building ranges of unknown types, so passes
like IPA could build and propagate VARYING. For now it's easiest to
allow the old behavior, it's not like you can do anything with these
ranges except build them and copy them.
Eventually we should convert all users of set_varying() to use
supported types. I will address this in my upcoming IPA work.
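For illustration, the now-tolerated usage inside GCC looks roughly
like this (a sketch only; "unsupported_type" is a placeholder, and
the int_range API is assumed from value-range.h):

  // A pass such as IPA may build VARYING for a type the underlying
  // irange does not otherwise support, and copy it around.
  int_range<1> r;
  r.set_varying (unsupported_type); // not rejected by verify_range
  int_range<1> copy = r;            // copying is fine too
  // Any other operation on such a range remains unsupported.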
PR tree-optimization/109711
gcc/ChangeLog:
* value-range.cc (irange::verify_range): Allow types of
error_mark_node.
---
When instrumentation is requested via -fsanitize-coverage=trace-pc, GCC
emits a call to the __sanitizer_cov_trace_pc callback in each basic
block. This callback is supposed to be implemented by the user, and
should be able to identify the containing basic block by inspecting its
return address. Tail-calling the callback prevents that, so disallow it.
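For reference, a minimal sketch of such a user-supplied callback
(hypothetical code; compile this particular file without
-fsanitize-coverage so the callback itself is not instrumented):

  #include <cstdio>

  // The instrumented program calls this at the start of each basic
  // block; since GCC no longer tail-calls it, the return address
  // points into the block that triggered the call.
  extern "C" void __sanitizer_cov_trace_pc (void)
  {
    void *pc = __builtin_return_address (0);
    std::printf ("covered block near %p\n", pc);
  }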
gcc/ChangeLog:
PR sanitizer/90746
* calls.cc (can_implement_as_sibling_call_p): Reject calls
to __sanitizer_cov_trace_pc.
gcc/testsuite/ChangeLog:
PR sanitizer/90746
* gcc.dg/sancov/basic0.c: Verify absence of tailcall.
---
aarch64_function_arg_alignment has traditionally taken the alignment
of a scalar type T from TYPE_ALIGN (TYPE_MAIN_VARIANT (T)). This is
supposed to discard any user alignment and give the alignment of the
underlying fundamental type.
PR109661 shows that this did the wrong thing for enums with
a defined underlying type, because:
(1) The enum itself could be aligned, using attributes.
(2) The enum would pick up any user alignment on the underlying type.
We get the right behaviour if we look at the TYPE_MAIN_VARIANT
of the underlying type instead.
As always, this affects register and stack arguments differently,
because:
(a) The code that handles register arguments only considers the
alignment of types that occupy two registers, whereas the
stack alignment is applied regardless of size.
(b) The code that handles register arguments tests the alignment
for equality with 16 bytes, so that (unexpected) greater alignments
are ignored. The code that handles stack arguments instead caps the
alignment to 16 bytes.
There is now (since GCC 13) an assert to trap the difference between
(a) and (b), which is how the new incompatibility showed up.
Clang already handled the testcases correctly, so this patch aligns
the GCC behaviour with the Clang behaviour.
I'm planning to remove the asserts on the branches, since we don't
want to change the ABI there.
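A hypothetical testcase in the spirit of the PR (names invented for
illustration):

  #include <cstdint>

  // Underlying type carrying user alignment.
  typedef std::uint64_t aligned_u64 __attribute__ ((aligned (16)));
  static_assert (alignof (aligned_u64) == 16, "user alignment");

  // Enum with a defined underlying type.  The fix takes its argument
  // alignment from the TYPE_MAIN_VARIANT of the underlying type
  // (8-byte uint64_t), not from the 16-byte user alignment.
  enum e64 : aligned_u64 { E0 };

  extern void callee (int, int, int, int, int, int, int, e64);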
gcc/
PR target/109661
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Add
a new ABI break parameter for GCC 14. Set it to the alignment
of enums that have an underlying type. Take the true alignment
of such enums from the TYPE_ALIGN of the underlying type's
TYPE_MAIN_VARIANT.
(aarch64_function_arg_boundary): Update accordingly.
(aarch64_layout_arg, aarch64_gimplify_va_arg_expr): Likewise.
Warn about ABI differences.
gcc/testsuite/
* g++.target/aarch64/pr109661-1.C: New test.
* g++.target/aarch64/pr109661-2.C: Likewise.
* g++.target/aarch64/pr109661-3.C: Likewise.
* g++.target/aarch64/pr109661-4.C: Likewise.
* gcc.target/aarch64/pr109661-1.c: Likewise.
---
aarch64_function_arg_alignment has two related abi_break
parameters: abi_break for a change in GCC 9, and abi_break_packed
for a related follow-on change in GCC 13. In a sense, abi_break_packed
is a "subfix" of abi_break.
PR109661 now requires a third ABI break that is independent
of the other two. Having abi_break for the GCC 9 break and
abi_break_<something> for the GCC 13 and GCC 14 breaks might
give the impression that they're all related, and that the GCC 14
fix (like the GCC 13 fix) is a "subfix" of the GCC 9 one.
It therefore seemed like a good idea to rename the existing
variables first.
It would be difficult to choose names that describe briefly and
precisely what went wrong in each case. The next best thing
seemed to be to name them after the relevant GCC version.
(Of course, this might break down in future if we need two
independent fixes in the same version. Let's hope not.)
I wondered about putting all the variables in a structure,
but one advantage of using independent variables is that it's
harder to forget to update a caller. Maybe a fourth parameter
would be a tipping point.
gcc/
PR target/109661
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Rename
ABI break variables to abi_break_gcc_9 and abi_break_gcc_13.
(aarch64_layout_arg, aarch64_function_arg_boundary): Likewise.
(aarch64_gimplify_va_arg_expr): Likewise.
---
Implement vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq,
vrmulhq using the new MVE builtins framework.
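User-visible behavior is unchanged; for reference, a small usage
example (hypothetical code, built with an MVE-enabled -march) of two
of the reworked intrinsics:

  #include <arm_mve.h>

  // Full-name form: halving addition of signed bytes.
  int8x16_t halving_add (int8x16_t a, int8x16_t b)
  {
    return vhaddq_s8 (a, b);
  }

  // Overloaded predicated form: saturating addition, with the
  // "inactive" argument in first position as usual for MVE.
  uint16x8_t sat_add_m (uint16x8_t inactive, uint16x8_t a,
                        uint16x8_t b, mve_pred16_t p)
  {
    return vqaddq_m (inactive, a, b, p);
  }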
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_M_N_NO_F)
(FUNCTION_WITHOUT_N_NO_F, FUNCTION_WITH_M_N_NO_U_F): New.
(vhaddq, vhsubq, vmulhq, vqaddq, vqsubq, vqdmulhq, vrhaddq)
(vrmulhq): New.
* config/arm/arm-mve-builtins-base.def (vhaddq, vhsubq, vmulhq)
(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
* config/arm/arm-mve-builtins-base.h (vhaddq, vhsubq, vmulhq)
(vqaddq, vqsubq, vqdmulhq, vrhaddq, vrmulhq): New.
* config/arm/arm_mve.h (vhsubq): Remove.
(vhaddq): Remove.
(vhaddq_m): Remove.
(vhsubq_m): Remove.
(vhaddq_x): Remove.
(vhsubq_x): Remove.
(vhsubq_u8): Remove.
(vhsubq_n_u8): Remove.
(vhaddq_u8): Remove.
(vhaddq_n_u8): Remove.
(vhsubq_s8): Remove.
(vhsubq_n_s8): Remove.
(vhaddq_s8): Remove.
(vhaddq_n_s8): Remove.
(vhsubq_u16): Remove.
(vhsubq_n_u16): Remove.
(vhaddq_u16): Remove.
(vhaddq_n_u16): Remove.
(vhsubq_s16): Remove.
(vhsubq_n_s16): Remove.
(vhaddq_s16): Remove.
(vhaddq_n_s16): Remove.
(vhsubq_u32): Remove.
(vhsubq_n_u32): Remove.
(vhaddq_u32): Remove.
(vhaddq_n_u32): Remove.
(vhsubq_s32): Remove.
(vhsubq_n_s32): Remove.
(vhaddq_s32): Remove.
(vhaddq_n_s32): Remove.
(vhaddq_m_n_s8): Remove.
(vhaddq_m_n_s32): Remove.
(vhaddq_m_n_s16): Remove.
(vhaddq_m_n_u8): Remove.
(vhaddq_m_n_u32): Remove.
(vhaddq_m_n_u16): Remove.
(vhaddq_m_s8): Remove.
(vhaddq_m_s32): Remove.
(vhaddq_m_s16): Remove.
(vhaddq_m_u8): Remove.
(vhaddq_m_u32): Remove.
(vhaddq_m_u16): Remove.
(vhsubq_m_n_s8): Remove.
(vhsubq_m_n_s32): Remove.
(vhsubq_m_n_s16): Remove.
(vhsubq_m_n_u8): Remove.
(vhsubq_m_n_u32): Remove.
(vhsubq_m_n_u16): Remove.
(vhsubq_m_s8): Remove.
(vhsubq_m_s32): Remove.
(vhsubq_m_s16): Remove.
(vhsubq_m_u8): Remove.
(vhsubq_m_u32): Remove.
(vhsubq_m_u16): Remove.
(vhaddq_x_n_s8): Remove.
(vhaddq_x_n_s16): Remove.
(vhaddq_x_n_s32): Remove.
(vhaddq_x_n_u8): Remove.
(vhaddq_x_n_u16): Remove.
(vhaddq_x_n_u32): Remove.
(vhaddq_x_s8): Remove.
(vhaddq_x_s16): Remove.
(vhaddq_x_s32): Remove.
(vhaddq_x_u8): Remove.
(vhaddq_x_u16): Remove.
(vhaddq_x_u32): Remove.
(vhsubq_x_n_s8): Remove.
(vhsubq_x_n_s16): Remove.
(vhsubq_x_n_s32): Remove.
(vhsubq_x_n_u8): Remove.
(vhsubq_x_n_u16): Remove.
(vhsubq_x_n_u32): Remove.
(vhsubq_x_s8): Remove.
(vhsubq_x_s16): Remove.
(vhsubq_x_s32): Remove.
(vhsubq_x_u8): Remove.
(vhsubq_x_u16): Remove.
(vhsubq_x_u32): Remove.
(__arm_vhsubq_u8): Remove.
(__arm_vhsubq_n_u8): Remove.
(__arm_vhaddq_u8): Remove.
(__arm_vhaddq_n_u8): Remove.
(__arm_vhsubq_s8): Remove.
(__arm_vhsubq_n_s8): Remove.
(__arm_vhaddq_s8): Remove.
(__arm_vhaddq_n_s8): Remove.
(__arm_vhsubq_u16): Remove.
(__arm_vhsubq_n_u16): Remove.
(__arm_vhaddq_u16): Remove.
(__arm_vhaddq_n_u16): Remove.
(__arm_vhsubq_s16): Remove.
(__arm_vhsubq_n_s16): Remove.
(__arm_vhaddq_s16): Remove.
(__arm_vhaddq_n_s16): Remove.
(__arm_vhsubq_u32): Remove.
(__arm_vhsubq_n_u32): Remove.
(__arm_vhaddq_u32): Remove.
(__arm_vhaddq_n_u32): Remove.
(__arm_vhsubq_s32): Remove.
(__arm_vhsubq_n_s32): Remove.
(__arm_vhaddq_s32): Remove.
(__arm_vhaddq_n_s32): Remove.
(__arm_vhaddq_m_n_s8): Remove.
(__arm_vhaddq_m_n_s32): Remove.
(__arm_vhaddq_m_n_s16): Remove.
(__arm_vhaddq_m_n_u8): Remove.
(__arm_vhaddq_m_n_u32): Remove.
(__arm_vhaddq_m_n_u16): Remove.
(__arm_vhaddq_m_s8): Remove.
(__arm_vhaddq_m_s32): Remove.
(__arm_vhaddq_m_s16): Remove.
(__arm_vhaddq_m_u8): Remove.
(__arm_vhaddq_m_u32): Remove.
(__arm_vhaddq_m_u16): Remove.
(__arm_vhsubq_m_n_s8): Remove.
(__arm_vhsubq_m_n_s32): Remove.
(__arm_vhsubq_m_n_s16): Remove.
(__arm_vhsubq_m_n_u8): Remove.
(__arm_vhsubq_m_n_u32): Remove.
(__arm_vhsubq_m_n_u16): Remove.
(__arm_vhsubq_m_s8): Remove.
(__arm_vhsubq_m_s32): Remove.
(__arm_vhsubq_m_s16): Remove.
(__arm_vhsubq_m_u8): Remove.
(__arm_vhsubq_m_u32): Remove.
(__arm_vhsubq_m_u16): Remove.
(__arm_vhaddq_x_n_s8): Remove.
(__arm_vhaddq_x_n_s16): Remove.
(__arm_vhaddq_x_n_s32): Remove.
(__arm_vhaddq_x_n_u8): Remove.
(__arm_vhaddq_x_n_u16): Remove.
(__arm_vhaddq_x_n_u32): Remove.
(__arm_vhaddq_x_s8): Remove.
(__arm_vhaddq_x_s16): Remove.
(__arm_vhaddq_x_s32): Remove.
(__arm_vhaddq_x_u8): Remove.
(__arm_vhaddq_x_u16): Remove.
(__arm_vhaddq_x_u32): Remove.
(__arm_vhsubq_x_n_s8): Remove.
(__arm_vhsubq_x_n_s16): Remove.
(__arm_vhsubq_x_n_s32): Remove.
(__arm_vhsubq_x_n_u8): Remove.
(__arm_vhsubq_x_n_u16): Remove.
(__arm_vhsubq_x_n_u32): Remove.
(__arm_vhsubq_x_s8): Remove.
(__arm_vhsubq_x_s16): Remove.
(__arm_vhsubq_x_s32): Remove.
(__arm_vhsubq_x_u8): Remove.
(__arm_vhsubq_x_u16): Remove.
(__arm_vhsubq_x_u32): Remove.
(__arm_vhsubq): Remove.
(__arm_vhaddq): Remove.
(__arm_vhaddq_m): Remove.
(__arm_vhsubq_m): Remove.
(__arm_vhaddq_x): Remove.
(__arm_vhsubq_x): Remove.
(vmulhq): Remove.
(vmulhq_m): Remove.
(vmulhq_x): Remove.
(vmulhq_u8): Remove.
(vmulhq_s8): Remove.
(vmulhq_u16): Remove.
(vmulhq_s16): Remove.
(vmulhq_u32): Remove.
(vmulhq_s32): Remove.
(vmulhq_m_s8): Remove.
(vmulhq_m_s32): Remove.
(vmulhq_m_s16): Remove.
(vmulhq_m_u8): Remove.
(vmulhq_m_u32): Remove.
(vmulhq_m_u16): Remove.
(vmulhq_x_s8): Remove.
(vmulhq_x_s16): Remove.
(vmulhq_x_s32): Remove.
(vmulhq_x_u8): Remove.
(vmulhq_x_u16): Remove.
(vmulhq_x_u32): Remove.
(__arm_vmulhq_u8): Remove.
(__arm_vmulhq_s8): Remove.
(__arm_vmulhq_u16): Remove.
(__arm_vmulhq_s16): Remove.
(__arm_vmulhq_u32): Remove.
(__arm_vmulhq_s32): Remove.
(__arm_vmulhq_m_s8): Remove.
(__arm_vmulhq_m_s32): Remove.
(__arm_vmulhq_m_s16): Remove.
(__arm_vmulhq_m_u8): Remove.
(__arm_vmulhq_m_u32): Remove.
(__arm_vmulhq_m_u16): Remove.
(__arm_vmulhq_x_s8): Remove.
(__arm_vmulhq_x_s16): Remove.
(__arm_vmulhq_x_s32): Remove.
(__arm_vmulhq_x_u8): Remove.
(__arm_vmulhq_x_u16): Remove.
(__arm_vmulhq_x_u32): Remove.
(__arm_vmulhq): Remove.
(__arm_vmulhq_m): Remove.
(__arm_vmulhq_x): Remove.
(vqsubq): Remove.
(vqaddq): Remove.
(vqaddq_m): Remove.
(vqsubq_m): Remove.
(vqsubq_u8): Remove.
(vqsubq_n_u8): Remove.
(vqaddq_u8): Remove.
(vqaddq_n_u8): Remove.
(vqsubq_s8): Remove.
(vqsubq_n_s8): Remove.
(vqaddq_s8): Remove.
(vqaddq_n_s8): Remove.
(vqsubq_u16): Remove.
(vqsubq_n_u16): Remove.
(vqaddq_u16): Remove.
(vqaddq_n_u16): Remove.
(vqsubq_s16): Remove.
(vqsubq_n_s16): Remove.
(vqaddq_s16): Remove.
(vqaddq_n_s16): Remove.
(vqsubq_u32): Remove.
(vqsubq_n_u32): Remove.
(vqaddq_u32): Remove.
(vqaddq_n_u32): Remove.
(vqsubq_s32): Remove.
(vqsubq_n_s32): Remove.
(vqaddq_s32): Remove.
(vqaddq_n_s32): Remove.
(vqaddq_m_n_s8): Remove.
(vqaddq_m_n_s32): Remove.
(vqaddq_m_n_s16): Remove.
(vqaddq_m_n_u8): Remove.
(vqaddq_m_n_u32): Remove.
(vqaddq_m_n_u16): Remove.
(vqaddq_m_s8): Remove.
(vqaddq_m_s32): Remove.
(vqaddq_m_s16): Remove.
(vqaddq_m_u8): Remove.
(vqaddq_m_u32): Remove.
(vqaddq_m_u16): Remove.
(vqsubq_m_n_s8): Remove.
(vqsubq_m_n_s32): Remove.
(vqsubq_m_n_s16): Remove.
(vqsubq_m_n_u8): Remove.
(vqsubq_m_n_u32): Remove.
(vqsubq_m_n_u16): Remove.
(vqsubq_m_s8): Remove.
(vqsubq_m_s32): Remove.
(vqsubq_m_s16): Remove.
(vqsubq_m_u8): Remove.
(vqsubq_m_u32): Remove.
(vqsubq_m_u16): Remove.
(__arm_vqsubq_u8): Remove.
(__arm_vqsubq_n_u8): Remove.
(__arm_vqaddq_u8): Remove.
(__arm_vqaddq_n_u8): Remove.
(__arm_vqsubq_s8): Remove.
(__arm_vqsubq_n_s8): Remove.
(__arm_vqaddq_s8): Remove.
(__arm_vqaddq_n_s8): Remove.
(__arm_vqsubq_u16): Remove.
(__arm_vqsubq_n_u16): Remove.
(__arm_vqaddq_u16): Remove.
(__arm_vqaddq_n_u16): Remove.
(__arm_vqsubq_s16): Remove.
(__arm_vqsubq_n_s16): Remove.
(__arm_vqaddq_s16): Remove.
(__arm_vqaddq_n_s16): Remove.
(__arm_vqsubq_u32): Remove.
(__arm_vqsubq_n_u32): Remove.
(__arm_vqaddq_u32): Remove.
(__arm_vqaddq_n_u32): Remove.
(__arm_vqsubq_s32): Remove.
(__arm_vqsubq_n_s32): Remove.
(__arm_vqaddq_s32): Remove.
(__arm_vqaddq_n_s32): Remove.
(__arm_vqaddq_m_n_s8): Remove.
(__arm_vqaddq_m_n_s32): Remove.
(__arm_vqaddq_m_n_s16): Remove.
(__arm_vqaddq_m_n_u8): Remove.
(__arm_vqaddq_m_n_u32): Remove.
(__arm_vqaddq_m_n_u16): Remove.
(__arm_vqaddq_m_s8): Remove.
(__arm_vqaddq_m_s32): Remove.
(__arm_vqaddq_m_s16): Remove.
(__arm_vqaddq_m_u8): Remove.
(__arm_vqaddq_m_u32): Remove.
(__arm_vqaddq_m_u16): Remove.
(__arm_vqsubq_m_n_s8): Remove.
(__arm_vqsubq_m_n_s32): Remove.
(__arm_vqsubq_m_n_s16): Remove.
(__arm_vqsubq_m_n_u8): Remove.
(__arm_vqsubq_m_n_u32): Remove.
(__arm_vqsubq_m_n_u16): Remove.
(__arm_vqsubq_m_s8): Remove.
(__arm_vqsubq_m_s32): Remove.
(__arm_vqsubq_m_s16): Remove.
(__arm_vqsubq_m_u8): Remove.
(__arm_vqsubq_m_u32): Remove.
(__arm_vqsubq_m_u16): Remove.
(__arm_vqsubq): Remove.
(__arm_vqaddq): Remove.
(__arm_vqaddq_m): Remove.
(__arm_vqsubq_m): Remove.
(vqdmulhq): Remove.
(vqdmulhq_m): Remove.
(vqdmulhq_s8): Remove.
(vqdmulhq_n_s8): Remove.
(vqdmulhq_s16): Remove.
(vqdmulhq_n_s16): Remove.
(vqdmulhq_s32): Remove.
(vqdmulhq_n_s32): Remove.
(vqdmulhq_m_n_s8): Remove.
(vqdmulhq_m_n_s32): Remove.
(vqdmulhq_m_n_s16): Remove.
(vqdmulhq_m_s8): Remove.
(vqdmulhq_m_s32): Remove.
(vqdmulhq_m_s16): Remove.
(__arm_vqdmulhq_s8): Remove.
(__arm_vqdmulhq_n_s8): Remove.
(__arm_vqdmulhq_s16): Remove.
(__arm_vqdmulhq_n_s16): Remove.
(__arm_vqdmulhq_s32): Remove.
(__arm_vqdmulhq_n_s32): Remove.
(__arm_vqdmulhq_m_n_s8): Remove.
(__arm_vqdmulhq_m_n_s32): Remove.
(__arm_vqdmulhq_m_n_s16): Remove.
(__arm_vqdmulhq_m_s8): Remove.
(__arm_vqdmulhq_m_s32): Remove.
(__arm_vqdmulhq_m_s16): Remove.
(__arm_vqdmulhq): Remove.
(__arm_vqdmulhq_m): Remove.
(vrhaddq): Remove.
(vrhaddq_m): Remove.
(vrhaddq_x): Remove.
(vrhaddq_u8): Remove.
(vrhaddq_s8): Remove.
(vrhaddq_u16): Remove.
(vrhaddq_s16): Remove.
(vrhaddq_u32): Remove.
(vrhaddq_s32): Remove.
(vrhaddq_m_s8): Remove.
(vrhaddq_m_s32): Remove.
(vrhaddq_m_s16): Remove.
(vrhaddq_m_u8): Remove.
(vrhaddq_m_u32): Remove.
(vrhaddq_m_u16): Remove.
(vrhaddq_x_s8): Remove.
(vrhaddq_x_s16): Remove.
(vrhaddq_x_s32): Remove.
(vrhaddq_x_u8): Remove.
(vrhaddq_x_u16): Remove.
(vrhaddq_x_u32): Remove.
(__arm_vrhaddq_u8): Remove.
(__arm_vrhaddq_s8): Remove.
(__arm_vrhaddq_u16): Remove.
(__arm_vrhaddq_s16): Remove.
(__arm_vrhaddq_u32): Remove.
(__arm_vrhaddq_s32): Remove.
(__arm_vrhaddq_m_s8): Remove.
(__arm_vrhaddq_m_s32): Remove.
(__arm_vrhaddq_m_s16): Remove.
(__arm_vrhaddq_m_u8): Remove.
(__arm_vrhaddq_m_u32): Remove.
(__arm_vrhaddq_m_u16): Remove.
(__arm_vrhaddq_x_s8): Remove.
(__arm_vrhaddq_x_s16): Remove.
(__arm_vrhaddq_x_s32): Remove.
(__arm_vrhaddq_x_u8): Remove.
(__arm_vrhaddq_x_u16): Remove.
(__arm_vrhaddq_x_u32): Remove.
(__arm_vrhaddq): Remove.
(__arm_vrhaddq_m): Remove.
(__arm_vrhaddq_x): Remove.
(vrmulhq): Remove.
(vrmulhq_m): Remove.
(vrmulhq_x): Remove.
(vrmulhq_u8): Remove.
(vrmulhq_s8): Remove.
(vrmulhq_u16): Remove.
(vrmulhq_s16): Remove.
(vrmulhq_u32): Remove.
(vrmulhq_s32): Remove.
(vrmulhq_m_s8): Remove.
(vrmulhq_m_s32): Remove.
(vrmulhq_m_s16): Remove.
(vrmulhq_m_u8): Remove.
(vrmulhq_m_u32): Remove.
(vrmulhq_m_u16): Remove.
(vrmulhq_x_s8): Remove.
(vrmulhq_x_s16): Remove.
(vrmulhq_x_s32): Remove.
(vrmulhq_x_u8): Remove.
(vrmulhq_x_u16): Remove.
(vrmulhq_x_u32): Remove.
(__arm_vrmulhq_u8): Remove.
(__arm_vrmulhq_s8): Remove.
(__arm_vrmulhq_u16): Remove.
(__arm_vrmulhq_s16): Remove.
(__arm_vrmulhq_u32): Remove.
(__arm_vrmulhq_s32): Remove.
(__arm_vrmulhq_m_s8): Remove.
(__arm_vrmulhq_m_s32): Remove.
(__arm_vrmulhq_m_s16): Remove.
(__arm_vrmulhq_m_u8): Remove.
(__arm_vrmulhq_m_u32): Remove.
(__arm_vrmulhq_m_u16): Remove.
(__arm_vrmulhq_x_s8): Remove.
(__arm_vrmulhq_x_s16): Remove.
(__arm_vrmulhq_x_s32): Remove.
(__arm_vrmulhq_x_u8): Remove.
(__arm_vrmulhq_x_u16): Remove.
(__arm_vrmulhq_x_u32): Remove.
(__arm_vrmulhq): Remove.
(__arm_vrmulhq_m): Remove.
(__arm_vrmulhq_x): Remove.
---
Factorize vabdq, vhaddq, vhsubq, vmulhq, vqaddq_u, vqdmulhq,
vqrdmulhq, vqrshlq, vqshlq, vqsubq_u, vrhaddq, vrmulhq, vrshlq
so that they use the same pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_INT_SU_BINARY): New.
(mve_insn): Add vabdq, vhaddq, vhsubq, vmulhq, vqaddq, vqdmulhq,
vqrdmulhq, vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq.
(supf): Add VQDMULHQ_S, VQRDMULHQ_S.
* config/arm/mve.md (mve_vabdq_<supf><mode>)
(@mve_vhaddq_<supf><mode>, mve_vhsubq_<supf><mode>)
(mve_vmulhq_<supf><mode>, mve_vqaddq_<supf><mode>)
(mve_vqdmulhq_s<mode>, mve_vqrdmulhq_s<mode>)
(mve_vqrshlq_<supf><mode>, mve_vqshlq_<supf><mode>)
(mve_vqsubq_<supf><mode>, @mve_vrhaddq_<supf><mode>)
(mve_vrmulhq_<supf><mode>, mve_vrshlq_<supf><mode>): Merge into
...
(@mve_<mve_insn>q_<supf><mode>): ... this.
* config/arm/vec-common.md (avg<mode>3_floor, uavg<mode>3_floor)
(avg<mode>3_ceil, uavg<mode>3_ceil): Use gen_mve_q instead of
gen_mve_vhaddq / gen_mve_vrhaddq.
---
Factorize vhaddq_m_n, vhsubq_m_n, vmlaq_m_n, vmlasq_m_n, vqaddq_m_n,
vqdmlahq_m_n, vqdmlashq_m_n, vqdmulhq_m_n, vqrdmlahq_m_n,
vqrdmlashq_m_n, vqrdmulhq_m_n, vqsubq_m_n
so that they use the same pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_INT_SU_M_N_BINARY): New.
(mve_insn): Add vhaddq, vhsubq, vmlaq, vmlasq, vqaddq, vqdmlahq,
vqdmlashq, vqdmulhq, vqrdmlahq, vqrdmlashq, vqrdmulhq, vqsubq.
(supf): Add VQDMLAHQ_M_N_S, VQDMLASHQ_M_N_S, VQRDMLAHQ_M_N_S,
VQRDMLASHQ_M_N_S, VQDMULHQ_M_N_S, VQRDMULHQ_M_N_S.
* config/arm/mve.md (mve_vhaddq_m_n_<supf><mode>)
(mve_vhsubq_m_n_<supf><mode>, mve_vmlaq_m_n_<supf><mode>)
(mve_vmlasq_m_n_<supf><mode>, mve_vqaddq_m_n_<supf><mode>)
(mve_vqdmlahq_m_n_s<mode>, mve_vqdmlashq_m_n_s<mode>)
(mve_vqrdmlahq_m_n_s<mode>, mve_vqrdmlashq_m_n_s<mode>)
(mve_vqsubq_m_n_<supf><mode>, mve_vqdmulhq_m_n_s<mode>)
(mve_vqrdmulhq_m_n_s<mode>): Merge into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
---
Factorize
vhaddq_n, vhsubq_n, vqaddq_n, vqdmulhq_n, vqrdmulhq_n, vqsubq_n
so that they use the same pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_INT_SU_N_BINARY): New.
(mve_insn): Add vhaddq, vhsubq, vqaddq, vqdmulhq, vqrdmulhq,
vqsubq.
(supf): Add VQDMULHQ_N_S, VQRDMULHQ_N_S.
* config/arm/mve.md (mve_vhaddq_n_<supf><mode>)
(mve_vhsubq_n_<supf><mode>, mve_vqaddq_n_<supf><mode>)
(mve_vqdmulhq_n_s<mode>, mve_vqrdmulhq_n_s<mode>)
(mve_vqsubq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
---
Factorize m-predicated versions of vabdq, vhaddq, vhsubq, vmaxq,
vminq, vmulhq, vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq,
vqdmulhq, vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq
so that they use the same pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_INT_SU_M_BINARY): New.
(mve_insn): Add vabdq, vhaddq, vhsubq, vmaxq, vminq, vmulhq,
vqaddq, vqdmladhq, vqdmladhxq, vqdmlsdhq, vqdmlsdhxq, vqdmulhq,
vqrdmladhq, vqrdmladhxq, vqrdmlsdhq, vqrdmlsdhxq, vqrdmulhq,
vqrshlq, vqshlq, vqsubq, vrhaddq, vrmulhq, vrshlq, vshlq.
(supf): Add VQDMLADHQ_M_S, VQDMLADHXQ_M_S, VQDMLSDHQ_M_S,
VQDMLSDHXQ_M_S, VQDMULHQ_M_S, VQRDMLADHQ_M_S, VQRDMLADHXQ_M_S,
VQRDMLSDHQ_M_S, VQRDMLSDHXQ_M_S, VQRDMULHQ_M_S.
* config/arm/mve.md (@mve_<mve_insn>q_m_<supf><mode>): New.
(mve_vshlq_m_<supf><mode>): Merged into
@mve_<mve_insn>q_m_<supf><mode>.
(mve_vabdq_m_<supf><mode>): Likewise.
(mve_vhaddq_m_<supf><mode>): Likewise.
(mve_vhsubq_m_<supf><mode>): Likewise.
(mve_vmaxq_m_<supf><mode>): Likewise.
(mve_vminq_m_<supf><mode>): Likewise.
(mve_vmulhq_m_<supf><mode>): Likewise.
(mve_vqaddq_m_<supf><mode>): Likewise.
(mve_vqrshlq_m_<supf><mode>): Likewise.
(mve_vqshlq_m_<supf><mode>): Likewise.
(mve_vqsubq_m_<supf><mode>): Likewise.
(mve_vrhaddq_m_<supf><mode>): Likewise.
(mve_vrmulhq_m_<supf><mode>): Likewise.
(mve_vrshlq_m_<supf><mode>): Likewise.
(mve_vqdmladhq_m_s<mode>): Likewise.
(mve_vqdmladhxq_m_s<mode>): Likewise.
(mve_vqdmlsdhq_m_s<mode>): Likewise.
(mve_vqdmlsdhxq_m_s<mode>): Likewise.
(mve_vqdmulhq_m_s<mode>): Likewise.
(mve_vqrdmladhq_m_s<mode>): Likewise.
(mve_vqrdmladhxq_m_s<mode>): Likewise.
(mve_vqrdmlsdhq_m_s<mode>): Likewise.
(mve_vqrdmlsdhxq_m_s<mode>): Likewise.
(mve_vqrdmulhq_m_s<mode>): Likewise.
---
Implement vcreateq using the new MVE builtins framework.
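For reference, a small usage example (hypothetical code):

  #include <arm_mve.h>

  // vcreateq_* assembles a 128-bit vector from two 64-bit halves;
  // it has no predicated (_m) or scalar (_n) variants, which is what
  // the FUNCTION_WITHOUT_M_N helper below expresses.
  uint8x16_t make_vec (uint64_t lo, uint64_t hi)
  {
    return vcreateq_u8 (lo, hi);
  }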
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_M_N): New.
(vcreateq): New.
* config/arm/arm-mve-builtins-base.def (vcreateq): New.
* config/arm/arm-mve-builtins-base.h (vcreateq): New.
* config/arm/arm_mve.h (vcreateq_f16): Remove.
(vcreateq_f32): Remove.
(vcreateq_u8): Remove.
(vcreateq_u16): Remove.
(vcreateq_u32): Remove.
(vcreateq_u64): Remove.
(vcreateq_s8): Remove.
(vcreateq_s16): Remove.
(vcreateq_s32): Remove.
(vcreateq_s64): Remove.
(__arm_vcreateq_u8): Remove.
(__arm_vcreateq_u16): Remove.
(__arm_vcreateq_u32): Remove.
(__arm_vcreateq_u64): Remove.
(__arm_vcreateq_s8): Remove.
(__arm_vcreateq_s16): Remove.
(__arm_vcreateq_s32): Remove.
(__arm_vcreateq_s64): Remove.
(__arm_vcreateq_f16): Remove.
(__arm_vcreateq_f32): Remove.
---
We need a 'fake' iterator to be able to use mve_insn for vcreateq_f.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_FP_CREATE_ONLY): New.
(mve_insn): Add VCREATEQ_S, VCREATEQ_U, VCREATEQ_F.
* config/arm/mve.md (mve_vcreateq_f<mode>): Rename into ...
(@mve_<mve_insn>q_f<mode>): ... this.
(mve_vcreateq_<supf><mode>): Rename into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.
---
This patch adds the create shape description.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (create): New.
* config/arm/arm-mve-builtins-shapes.h (create): New.
---
Introduce a function that will be used to build intrinsics which use
UNSPECs for all versions.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn): New.
---
Implement vorrq using the new MVE builtins framework.
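For reference, a small usage example (hypothetical code; the
immediate must be a VORR-encodable constant):

  #include <arm_mve.h>

  // Predicated OR with an immediate.  There is no separate
  // "inactive" argument: inactive lanes keep the value of 'a',
  // which is why has_inactive_argument must handle vorrq specially.
  uint16x8_t set_high_byte (uint16x8_t a, mve_pred16_t p)
  {
    return vorrq_m_n_u16 (a, 0xff00, p);
  }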
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N_NO_N_F): New.
(vorrq): New.
* config/arm/arm-mve-builtins-base.def (vorrq): New.
* config/arm/arm-mve-builtins-base.h (vorrq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vorrq.
* config/arm/arm_mve.h (vorrq): Remove.
(vorrq_m_n): Remove.
(vorrq_m): Remove.
(vorrq_x): Remove.
(vorrq_u8): Remove.
(vorrq_s8): Remove.
(vorrq_u16): Remove.
(vorrq_s16): Remove.
(vorrq_u32): Remove.
(vorrq_s32): Remove.
(vorrq_n_u16): Remove.
(vorrq_f16): Remove.
(vorrq_n_s16): Remove.
(vorrq_n_u32): Remove.
(vorrq_f32): Remove.
(vorrq_n_s32): Remove.
(vorrq_m_n_s16): Remove.
(vorrq_m_n_u16): Remove.
(vorrq_m_n_s32): Remove.
(vorrq_m_n_u32): Remove.
(vorrq_m_s8): Remove.
(vorrq_m_s32): Remove.
(vorrq_m_s16): Remove.
(vorrq_m_u8): Remove.
(vorrq_m_u32): Remove.
(vorrq_m_u16): Remove.
(vorrq_m_f32): Remove.
(vorrq_m_f16): Remove.
(vorrq_x_s8): Remove.
(vorrq_x_s16): Remove.
(vorrq_x_s32): Remove.
(vorrq_x_u8): Remove.
(vorrq_x_u16): Remove.
(vorrq_x_u32): Remove.
(vorrq_x_f16): Remove.
(vorrq_x_f32): Remove.
(__arm_vorrq_u8): Remove.
(__arm_vorrq_s8): Remove.
(__arm_vorrq_u16): Remove.
(__arm_vorrq_s16): Remove.
(__arm_vorrq_u32): Remove.
(__arm_vorrq_s32): Remove.
(__arm_vorrq_n_u16): Remove.
(__arm_vorrq_n_s16): Remove.
(__arm_vorrq_n_u32): Remove.
(__arm_vorrq_n_s32): Remove.
(__arm_vorrq_m_n_s16): Remove.
(__arm_vorrq_m_n_u16): Remove.
(__arm_vorrq_m_n_s32): Remove.
(__arm_vorrq_m_n_u32): Remove.
(__arm_vorrq_m_s8): Remove.
(__arm_vorrq_m_s32): Remove.
(__arm_vorrq_m_s16): Remove.
(__arm_vorrq_m_u8): Remove.
(__arm_vorrq_m_u32): Remove.
(__arm_vorrq_m_u16): Remove.
(__arm_vorrq_x_s8): Remove.
(__arm_vorrq_x_s16): Remove.
(__arm_vorrq_x_s32): Remove.
(__arm_vorrq_x_u8): Remove.
(__arm_vorrq_x_u16): Remove.
(__arm_vorrq_x_u32): Remove.
(__arm_vorrq_f16): Remove.
(__arm_vorrq_f32): Remove.
(__arm_vorrq_m_f32): Remove.
(__arm_vorrq_m_f16): Remove.
(__arm_vorrq_x_f16): Remove.
(__arm_vorrq_x_f32): Remove.
(__arm_vorrq): Remove.
(__arm_vorrq_m_n): Remove.
(__arm_vorrq_m): Remove.
(__arm_vorrq_x): Remove.
---
This patch adds the binary_orrq shape description.
MODE_n intrinsics use a set of predicates (preds_m_or_none) different
from the MODE_none ones, so we explicitly reference preds_m_or_none
from the shape; we therefore need to make it a global array.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_orrq): New.
* config/arm/arm-mve-builtins-shapes.h (binary_orrq): New.
* config/arm/arm-mve-builtins.cc (preds_m_or_none): Remove static.
* config/arm/arm-mve-builtins.h (preds_m_or_none): Declare.
---
Implement vandq, veorq using the new MVE builtins framework.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M): New.
(vandq, veorq): New.
* config/arm/arm-mve-builtins-base.def (vandq, veorq): New.
* config/arm/arm-mve-builtins-base.h (vandq, veorq): New.
* config/arm/arm_mve.h (vandq): Remove.
(vandq_m): Remove.
(vandq_x): Remove.
(vandq_u8): Remove.
(vandq_s8): Remove.
(vandq_u16): Remove.
(vandq_s16): Remove.
(vandq_u32): Remove.
(vandq_s32): Remove.
(vandq_f16): Remove.
(vandq_f32): Remove.
(vandq_m_s8): Remove.
(vandq_m_s32): Remove.
(vandq_m_s16): Remove.
(vandq_m_u8): Remove.
(vandq_m_u32): Remove.
(vandq_m_u16): Remove.
(vandq_m_f32): Remove.
(vandq_m_f16): Remove.
(vandq_x_s8): Remove.
(vandq_x_s16): Remove.
(vandq_x_s32): Remove.
(vandq_x_u8): Remove.
(vandq_x_u16): Remove.
(vandq_x_u32): Remove.
(vandq_x_f16): Remove.
(vandq_x_f32): Remove.
(__arm_vandq_u8): Remove.
(__arm_vandq_s8): Remove.
(__arm_vandq_u16): Remove.
(__arm_vandq_s16): Remove.
(__arm_vandq_u32): Remove.
(__arm_vandq_s32): Remove.
(__arm_vandq_m_s8): Remove.
(__arm_vandq_m_s32): Remove.
(__arm_vandq_m_s16): Remove.
(__arm_vandq_m_u8): Remove.
(__arm_vandq_m_u32): Remove.
(__arm_vandq_m_u16): Remove.
(__arm_vandq_x_s8): Remove.
(__arm_vandq_x_s16): Remove.
(__arm_vandq_x_s32): Remove.
(__arm_vandq_x_u8): Remove.
(__arm_vandq_x_u16): Remove.
(__arm_vandq_x_u32): Remove.
(__arm_vandq_f16): Remove.
(__arm_vandq_f32): Remove.
(__arm_vandq_m_f32): Remove.
(__arm_vandq_m_f16): Remove.
(__arm_vandq_x_f16): Remove.
(__arm_vandq_x_f32): Remove.
(__arm_vandq): Remove.
(__arm_vandq_m): Remove.
(__arm_vandq_x): Remove.
(veorq_m): Remove.
(veorq_x): Remove.
(veorq_u8): Remove.
(veorq_s8): Remove.
(veorq_u16): Remove.
(veorq_s16): Remove.
(veorq_u32): Remove.
(veorq_s32): Remove.
(veorq_f16): Remove.
(veorq_f32): Remove.
(veorq_m_s8): Remove.
(veorq_m_s32): Remove.
(veorq_m_s16): Remove.
(veorq_m_u8): Remove.
(veorq_m_u32): Remove.
(veorq_m_u16): Remove.
(veorq_m_f32): Remove.
(veorq_m_f16): Remove.
(veorq_x_s8): Remove.
(veorq_x_s16): Remove.
(veorq_x_s32): Remove.
(veorq_x_u8): Remove.
(veorq_x_u16): Remove.
(veorq_x_u32): Remove.
(veorq_x_f16): Remove.
(veorq_x_f32): Remove.
(__arm_veorq_u8): Remove.
(__arm_veorq_s8): Remove.
(__arm_veorq_u16): Remove.
(__arm_veorq_s16): Remove.
(__arm_veorq_u32): Remove.
(__arm_veorq_s32): Remove.
(__arm_veorq_m_s8): Remove.
(__arm_veorq_m_s32): Remove.
(__arm_veorq_m_s16): Remove.
(__arm_veorq_m_u8): Remove.
(__arm_veorq_m_u32): Remove.
(__arm_veorq_m_u16): Remove.
(__arm_veorq_x_s8): Remove.
(__arm_veorq_x_s16): Remove.
(__arm_veorq_x_s32): Remove.
(__arm_veorq_x_u8): Remove.
(__arm_veorq_x_u16): Remove.
(__arm_veorq_x_u32): Remove.
(__arm_veorq_f16): Remove.
(__arm_veorq_f32): Remove.
(__arm_veorq_m_f32): Remove.
(__arm_veorq_m_f16): Remove.
(__arm_veorq_x_f16): Remove.
(__arm_veorq_x_f32): Remove.
(__arm_veorq): Remove.
(__arm_veorq_m): Remove.
(__arm_veorq_x): Remove.
---
Factorize vandq, veorq, vorrq, vbicq so that they use the same
parameterized names.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_INT_M_BINARY_LOGIC)
(MVE_FP_M_BINARY_LOGIC): New.
(MVE_INT_M_N_BINARY_LOGIC): New.
(MVE_INT_N_BINARY_LOGIC): New.
(mve_insn): Add vand, veor, vorr, vbic.
* config/arm/mve.md (mve_vandq_m_<supf><mode>)
(mve_veorq_m_<supf><mode>, mve_vorrq_m_<supf><mode>)
(mve_vbicq_m_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.
(mve_vandq_m_f<mode>, mve_veorq_m_f<mode>, mve_vorrq_m_f<mode>)
(mve_vbicq_m_f<mode>): Merge into ...
(@mve_<mve_insn>q_m_f<mode>): ... this.
(mve_vorrq_n_<supf><mode>)
(mve_vbicq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vorrq_m_n_<supf><mode>, mve_vbicq_m_n_<supf><mode>): Merge
into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
---
This patch adds the binary shape description.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary): New.
* config/arm/arm-mve-builtins-shapes.h (binary): New.
---
Implement vaddq, vmulq, vsubq using the new MVE builtins framework.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITH_RTX_M_N):
New.
(vaddq, vmulq, vsubq): New.
* config/arm/arm-mve-builtins-base.def (vaddq, vmulq, vsubq): New.
* config/arm/arm-mve-builtins-base.h (vaddq, vmulq, vsubq): New.
* config/arm/arm_mve.h (vaddq): Remove.
(vaddq_m): Remove.
(vaddq_x): Remove.
(vaddq_n_u8): Remove.
(vaddq_n_s8): Remove.
(vaddq_n_u16): Remove.
(vaddq_n_s16): Remove.
(vaddq_n_u32): Remove.
(vaddq_n_s32): Remove.
(vaddq_n_f16): Remove.
(vaddq_n_f32): Remove.
(vaddq_m_n_s8): Remove.
(vaddq_m_n_s32): Remove.
(vaddq_m_n_s16): Remove.
(vaddq_m_n_u8): Remove.
(vaddq_m_n_u32): Remove.
(vaddq_m_n_u16): Remove.
(vaddq_m_s8): Remove.
(vaddq_m_s32): Remove.
(vaddq_m_s16): Remove.
(vaddq_m_u8): Remove.
(vaddq_m_u32): Remove.
(vaddq_m_u16): Remove.
(vaddq_m_f32): Remove.
(vaddq_m_f16): Remove.
(vaddq_m_n_f32): Remove.
(vaddq_m_n_f16): Remove.
(vaddq_s8): Remove.
(vaddq_s16): Remove.
(vaddq_s32): Remove.
(vaddq_u8): Remove.
(vaddq_u16): Remove.
(vaddq_u32): Remove.
(vaddq_f16): Remove.
(vaddq_f32): Remove.
(vaddq_x_s8): Remove.
(vaddq_x_s16): Remove.
(vaddq_x_s32): Remove.
(vaddq_x_n_s8): Remove.
(vaddq_x_n_s16): Remove.
(vaddq_x_n_s32): Remove.
(vaddq_x_u8): Remove.
(vaddq_x_u16): Remove.
(vaddq_x_u32): Remove.
(vaddq_x_n_u8): Remove.
(vaddq_x_n_u16): Remove.
(vaddq_x_n_u32): Remove.
(vaddq_x_f16): Remove.
(vaddq_x_f32): Remove.
(vaddq_x_n_f16): Remove.
(vaddq_x_n_f32): Remove.
(__arm_vaddq_n_u8): Remove.
(__arm_vaddq_n_s8): Remove.
(__arm_vaddq_n_u16): Remove.
(__arm_vaddq_n_s16): Remove.
(__arm_vaddq_n_u32): Remove.
(__arm_vaddq_n_s32): Remove.
(__arm_vaddq_m_n_s8): Remove.
(__arm_vaddq_m_n_s32): Remove.
(__arm_vaddq_m_n_s16): Remove.
(__arm_vaddq_m_n_u8): Remove.
(__arm_vaddq_m_n_u32): Remove.
(__arm_vaddq_m_n_u16): Remove.
(__arm_vaddq_m_s8): Remove.
(__arm_vaddq_m_s32): Remove.
(__arm_vaddq_m_s16): Remove.
(__arm_vaddq_m_u8): Remove.
(__arm_vaddq_m_u32): Remove.
(__arm_vaddq_m_u16): Remove.
(__arm_vaddq_s8): Remove.
(__arm_vaddq_s16): Remove.
(__arm_vaddq_s32): Remove.
(__arm_vaddq_u8): Remove.
(__arm_vaddq_u16): Remove.
(__arm_vaddq_u32): Remove.
(__arm_vaddq_x_s8): Remove.
(__arm_vaddq_x_s16): Remove.
(__arm_vaddq_x_s32): Remove.
(__arm_vaddq_x_n_s8): Remove.
(__arm_vaddq_x_n_s16): Remove.
(__arm_vaddq_x_n_s32): Remove.
(__arm_vaddq_x_u8): Remove.
(__arm_vaddq_x_u16): Remove.
(__arm_vaddq_x_u32): Remove.
(__arm_vaddq_x_n_u8): Remove.
(__arm_vaddq_x_n_u16): Remove.
(__arm_vaddq_x_n_u32): Remove.
(__arm_vaddq_n_f16): Remove.
(__arm_vaddq_n_f32): Remove.
(__arm_vaddq_m_f32): Remove.
(__arm_vaddq_m_f16): Remove.
(__arm_vaddq_m_n_f32): Remove.
(__arm_vaddq_m_n_f16): Remove.
(__arm_vaddq_f16): Remove.
(__arm_vaddq_f32): Remove.
(__arm_vaddq_x_f16): Remove.
(__arm_vaddq_x_f32): Remove.
(__arm_vaddq_x_n_f16): Remove.
(__arm_vaddq_x_n_f32): Remove.
(__arm_vaddq): Remove.
(__arm_vaddq_m): Remove.
(__arm_vaddq_x): Remove.
(vmulq): Remove.
(vmulq_m): Remove.
(vmulq_x): Remove.
(vmulq_u8): Remove.
(vmulq_n_u8): Remove.
(vmulq_s8): Remove.
(vmulq_n_s8): Remove.
(vmulq_u16): Remove.
(vmulq_n_u16): Remove.
(vmulq_s16): Remove.
(vmulq_n_s16): Remove.
(vmulq_u32): Remove.
(vmulq_n_u32): Remove.
(vmulq_s32): Remove.
(vmulq_n_s32): Remove.
(vmulq_n_f16): Remove.
(vmulq_f16): Remove.
(vmulq_n_f32): Remove.
(vmulq_f32): Remove.
(vmulq_m_n_s8): Remove.
(vmulq_m_n_s32): Remove.
(vmulq_m_n_s16): Remove.
(vmulq_m_n_u8): Remove.
(vmulq_m_n_u32): Remove.
(vmulq_m_n_u16): Remove.
(vmulq_m_s8): Remove.
(vmulq_m_s32): Remove.
(vmulq_m_s16): Remove.
(vmulq_m_u8): Remove.
(vmulq_m_u32): Remove.
(vmulq_m_u16): Remove.
(vmulq_m_f32): Remove.
(vmulq_m_f16): Remove.
(vmulq_m_n_f32): Remove.
(vmulq_m_n_f16): Remove.
(vmulq_x_s8): Remove.
(vmulq_x_s16): Remove.
(vmulq_x_s32): Remove.
(vmulq_x_n_s8): Remove.
(vmulq_x_n_s16): Remove.
(vmulq_x_n_s32): Remove.
(vmulq_x_u8): Remove.
(vmulq_x_u16): Remove.
(vmulq_x_u32): Remove.
(vmulq_x_n_u8): Remove.
(vmulq_x_n_u16): Remove.
(vmulq_x_n_u32): Remove.
(vmulq_x_f16): Remove.
(vmulq_x_f32): Remove.
(vmulq_x_n_f16): Remove.
(vmulq_x_n_f32): Remove.
(__arm_vmulq_u8): Remove.
(__arm_vmulq_n_u8): Remove.
(__arm_vmulq_s8): Remove.
(__arm_vmulq_n_s8): Remove.
(__arm_vmulq_u16): Remove.
(__arm_vmulq_n_u16): Remove.
(__arm_vmulq_s16): Remove.
(__arm_vmulq_n_s16): Remove.
(__arm_vmulq_u32): Remove.
(__arm_vmulq_n_u32): Remove.
(__arm_vmulq_s32): Remove.
(__arm_vmulq_n_s32): Remove.
(__arm_vmulq_m_n_s8): Remove.
(__arm_vmulq_m_n_s32): Remove.
(__arm_vmulq_m_n_s16): Remove.
(__arm_vmulq_m_n_u8): Remove.
(__arm_vmulq_m_n_u32): Remove.
(__arm_vmulq_m_n_u16): Remove.
(__arm_vmulq_m_s8): Remove.
(__arm_vmulq_m_s32): Remove.
(__arm_vmulq_m_s16): Remove.
(__arm_vmulq_m_u8): Remove.
(__arm_vmulq_m_u32): Remove.
(__arm_vmulq_m_u16): Remove.
(__arm_vmulq_x_s8): Remove.
(__arm_vmulq_x_s16): Remove.
(__arm_vmulq_x_s32): Remove.
(__arm_vmulq_x_n_s8): Remove.
(__arm_vmulq_x_n_s16): Remove.
(__arm_vmulq_x_n_s32): Remove.
(__arm_vmulq_x_u8): Remove.
(__arm_vmulq_x_u16): Remove.
(__arm_vmulq_x_u32): Remove.
(__arm_vmulq_x_n_u8): Remove.
(__arm_vmulq_x_n_u16): Remove.
(__arm_vmulq_x_n_u32): Remove.
(__arm_vmulq_n_f16): Remove.
(__arm_vmulq_f16): Remove.
(__arm_vmulq_n_f32): Remove.
(__arm_vmulq_f32): Remove.
(__arm_vmulq_m_f32): Remove.
(__arm_vmulq_m_f16): Remove.
(__arm_vmulq_m_n_f32): Remove.
(__arm_vmulq_m_n_f16): Remove.
(__arm_vmulq_x_f16): Remove.
(__arm_vmulq_x_f32): Remove.
(__arm_vmulq_x_n_f16): Remove.
(__arm_vmulq_x_n_f32): Remove.
(__arm_vmulq): Remove.
(__arm_vmulq_m): Remove.
(__arm_vmulq_x): Remove.
(vsubq): Remove.
(vsubq_m): Remove.
(vsubq_x): Remove.
(vsubq_n_f16): Remove.
(vsubq_n_f32): Remove.
(vsubq_u8): Remove.
(vsubq_n_u8): Remove.
(vsubq_s8): Remove.
(vsubq_n_s8): Remove.
(vsubq_u16): Remove.
(vsubq_n_u16): Remove.
(vsubq_s16): Remove.
(vsubq_n_s16): Remove.
(vsubq_u32): Remove.
(vsubq_n_u32): Remove.
(vsubq_s32): Remove.
(vsubq_n_s32): Remove.
(vsubq_f16): Remove.
(vsubq_f32): Remove.
(vsubq_m_s8): Remove.
(vsubq_m_u8): Remove.
(vsubq_m_s16): Remove.
(vsubq_m_u16): Remove.
(vsubq_m_s32): Remove.
(vsubq_m_u32): Remove.
(vsubq_m_n_s8): Remove.
(vsubq_m_n_s32): Remove.
(vsubq_m_n_s16): Remove.
(vsubq_m_n_u8): Remove.
(vsubq_m_n_u32): Remove.
(vsubq_m_n_u16): Remove.
(vsubq_m_f32): Remove.
(vsubq_m_f16): Remove.
(vsubq_m_n_f32): Remove.
(vsubq_m_n_f16): Remove.
(vsubq_x_s8): Remove.
(vsubq_x_s16): Remove.
(vsubq_x_s32): Remove.
(vsubq_x_n_s8): Remove.
(vsubq_x_n_s16): Remove.
(vsubq_x_n_s32): Remove.
(vsubq_x_u8): Remove.
(vsubq_x_u16): Remove.
(vsubq_x_u32): Remove.
(vsubq_x_n_u8): Remove.
(vsubq_x_n_u16): Remove.
(vsubq_x_n_u32): Remove.
(vsubq_x_f16): Remove.
(vsubq_x_f32): Remove.
(vsubq_x_n_f16): Remove.
(vsubq_x_n_f32): Remove.
(__arm_vsubq_u8): Remove.
(__arm_vsubq_n_u8): Remove.
(__arm_vsubq_s8): Remove.
(__arm_vsubq_n_s8): Remove.
(__arm_vsubq_u16): Remove.
(__arm_vsubq_n_u16): Remove.
(__arm_vsubq_s16): Remove.
(__arm_vsubq_n_s16): Remove.
(__arm_vsubq_u32): Remove.
(__arm_vsubq_n_u32): Remove.
(__arm_vsubq_s32): Remove.
(__arm_vsubq_n_s32): Remove.
(__arm_vsubq_m_s8): Remove.
(__arm_vsubq_m_u8): Remove.
(__arm_vsubq_m_s16): Remove.
(__arm_vsubq_m_u16): Remove.
(__arm_vsubq_m_s32): Remove.
(__arm_vsubq_m_u32): Remove.
(__arm_vsubq_m_n_s8): Remove.
(__arm_vsubq_m_n_s32): Remove.
(__arm_vsubq_m_n_s16): Remove.
(__arm_vsubq_m_n_u8): Remove.
(__arm_vsubq_m_n_u32): Remove.
(__arm_vsubq_m_n_u16): Remove.
(__arm_vsubq_x_s8): Remove.
(__arm_vsubq_x_s16): Remove.
(__arm_vsubq_x_s32): Remove.
(__arm_vsubq_x_n_s8): Remove.
(__arm_vsubq_x_n_s16): Remove.
(__arm_vsubq_x_n_s32): Remove.
(__arm_vsubq_x_u8): Remove.
(__arm_vsubq_x_u16): Remove.
(__arm_vsubq_x_u32): Remove.
(__arm_vsubq_x_n_u8): Remove.
(__arm_vsubq_x_n_u16): Remove.
(__arm_vsubq_x_n_u32): Remove.
(__arm_vsubq_n_f16): Remove.
(__arm_vsubq_n_f32): Remove.
(__arm_vsubq_f16): Remove.
(__arm_vsubq_f32): Remove.
(__arm_vsubq_m_f32): Remove.
(__arm_vsubq_m_f16): Remove.
(__arm_vsubq_m_n_f32): Remove.
(__arm_vsubq_m_n_f16): Remove.
(__arm_vsubq_x_f16): Remove.
(__arm_vsubq_x_f32): Remove.
(__arm_vsubq_x_n_f16): Remove.
(__arm_vsubq_x_n_f32): Remove.
(__arm_vsubq): Remove.
(__arm_vsubq_m): Remove.
(__arm_vsubq_x): Remove.
* config/arm/arm_mve_builtins.def (vsubq_u, vsubq_s, vsubq_f):
Remove.
(vmulq_u, vmulq_s, vmulq_f): Remove.
* config/arm/mve.md (mve_vsubq_<supf><mode>): Remove.
(mve_vmulq_<supf><mode>): Remove.
---
In order to avoid using a huge switch when generating all the
intrinsics (e.g. mve_vaddq_n_sv4si, ...), we want to generate a single
function taking the builtin code as a parameter (e.g. mve_q_n
(VADDQ_S, ...)).
This is achieved by using the new mve_insn iterator.
Having done that, it becomes easier to share similar patterns, to
avoid useless/error-prone code duplication.
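As a rough sketch (emitter names follow GCC's parameterized-names
convention for '@'-prefixed patterns; the argument lists here are
illustrative, not verbatim):

  /* Before: one generated emitter per intrinsic flavour, selected
     by a large switch over builtin codes.  */
  switch (code)
    {
    case VADDQ_N_S:
      emit_insn (gen_mve_vaddq_n_sv4si (dst, src, scalar));
      break;
    /* ... one case per intrinsic, mode and signedness ... */
    }

  /* After: '@mve_<mve_insn>q_n_<supf><mode>' yields a single
     parameterized emitter keyed on unspec code and mode.  */
  emit_insn (gen_mve_q_n (code, V4SImode, dst, src, scalar));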
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/ChangeLog:
* config/arm/iterators.md (MVE_INT_BINARY_RTX, MVE_INT_M_BINARY)
(MVE_INT_M_N_BINARY, MVE_INT_N_BINARY, MVE_FP_M_BINARY)
(MVE_FP_M_N_BINARY, MVE_FP_N_BINARY, mve_addsubmul, mve_insn): New
iterators.
* config/arm/mve.md
(mve_vsubq_n_f<mode>, mve_vaddq_n_f<mode>, mve_vmulq_n_f<mode>):
Factorize into ...
(@mve_<mve_insn>q_n_f<mode>): ... this.
(mve_vaddq_n_<supf><mode>, mve_vmulq_n_<supf><mode>)
(mve_vsubq_n_<supf><mode>): Factorize into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vaddq<mode>, mve_vmulq<mode>, mve_vsubq<mode>): Factorize
into ...
(mve_<mve_addsubmul>q<mode>): ... this.
(mve_vaddq_f<mode>, mve_vmulq_f<mode>, mve_vsubq_f<mode>):
Factorize into ...
(mve_<mve_addsubmul>q_f<mode>): ... this.
(mve_vaddq_m_<supf><mode>, mve_vmulq_m_<supf><mode>)
(mve_vsubq_m_<supf><mode>): Factorize into ...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.
(mve_vaddq_m_n_<supf><mode>, mve_vmulq_m_n_<supf><mode>)
(mve_vsubq_m_n_<supf><mode>): Factorize into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
(mve_vaddq_m_f<mode>, mve_vmulq_m_f<mode>, mve_vsubq_m_f<mode>):
Factorize into ...
(@mve_<mve_insn>q_m_f<mode>): ... this.
(mve_vaddq_m_n_f<mode>, mve_vmulq_m_n_f<mode>)
(mve_vsubq_m_n_f<mode>): Factorize into ...
(@mve_<mve_insn>q_m_n_f<mode>): ... this.
---
Introduce a function that will be used to build intrinsics which use
RTX codes for the non-predicated, no-mode version, and UNSPECS
otherwise.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/ChangeLog:
* config/arm/arm-mve-builtins-functions.h (class
unspec_based_mve_function_base): New.
(class unspec_based_mve_function_exact_insn): New.
---
This patch adds the binary_opt_n shape description.
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_opt_n): New.
* config/arm/arm-mve-builtins-shapes.h (binary_opt_n): New.
---
Implement vuninitializedq using the new MVE builtins framework.
We need to keep the overloaded __arm_vuninitializedq definitions
because their resolution depends on the result type only, which is not
currently supported by the resolver.
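For reference, a small usage example (hypothetical code):

  #include <arm_mve.h>

  // Explicitly-suffixed form, now routed through the framework; the
  // overloaded __arm_vuninitializedq definitions stay hand-written
  // because they resolve on the result type alone.
  int8x16_t scratch (void)
  {
    return vuninitializedq_s8 ();
  }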
2022-09-08 Murray Steele <murray.steele@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
gcc/ChangeLog:
* config/arm/arm-mve-builtins-base.cc (class
vuninitializedq_impl): New.
* config/arm/arm-mve-builtins-base.def (vuninitializedq): New.
* config/arm/arm-mve-builtins-base.h (vuninitializedq): New
declaration.
* config/arm/arm-mve-builtins-shapes.cc (inherent): New.
* config/arm/arm-mve-builtins-shapes.h (inherent): New
declaration.
* config/arm/arm_mve_types.h (__arm_vuninitializedq): Move to ...
* config/arm/arm_mve.h (__arm_vuninitializedq): ... here.
(__arm_vuninitializedq_u8): Remove.
(__arm_vuninitializedq_u16): Remove.
(__arm_vuninitializedq_u32): Remove.
(__arm_vuninitializedq_u64): Remove.
(__arm_vuninitializedq_s8): Remove.
(__arm_vuninitializedq_s16): Remove.
(__arm_vuninitializedq_s32): Remove.
(__arm_vuninitializedq_s64): Remove.
(__arm_vuninitializedq_f16): Remove.
(__arm_vuninitializedq_f32): Remove.
---
This patch implements vreinterpretq using the new MVE intrinsics
framework.
The old definitions for vreinterpretq are removed as a consequence.
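For reference, a small usage example (hypothetical code; requires
MVE with the FP extension for the float16x8_t operand):

  #include <arm_mve.h>

  // Reinterpret the bits of a float16 vector as bytes; this emits
  // no instructions, it only changes the static type.
  uint8x16_t as_bytes (float16x8_t v)
  {
    return vreinterpretq_u8 (v);
  }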
2022-09-08 Murray Steele <murray.steele@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (vreinterpretq_impl): New class.
* config/arm/arm-mve-builtins-base.def: Define vreinterpretq.
* config/arm/arm-mve-builtins-base.h (vreinterpretq): New declaration.
* config/arm/arm-mve-builtins-shapes.cc (parse_element_type): New function.
(parse_type): Likewise.
(parse_signature): Likewise.
(build_one): Likewise.
(build_all): Likewise.
(overloaded_base): New struct.
(unary_convert_def): Likewise.
* config/arm/arm-mve-builtins-shapes.h (unary_convert): Declare.
* config/arm/arm-mve-builtins.cc (TYPES_reinterpret_signed1): New
macro.
(TYPES_reinterpret_unsigned1): Likewise.
(TYPES_reinterpret_integer): Likewise.
(TYPES_reinterpret_integer1): Likewise.
(TYPES_reinterpret_float1): Likewise.
(TYPES_reinterpret_float): Likewise.
(reinterpret_integer): New.
(reinterpret_float): New.
(handle_arm_mve_h): Register builtins.
* config/arm/arm_mve.h (vreinterpretq_s16): Remove.
(vreinterpretq_s32): Likewise.
(vreinterpretq_s64): Likewise.
(vreinterpretq_s8): Likewise.
(vreinterpretq_u16): Likewise.
(vreinterpretq_u32): Likewise.
(vreinterpretq_u64): Likewise.
(vreinterpretq_u8): Likewise.
(vreinterpretq_f16): Likewise.
(vreinterpretq_f32): Likewise.
(vreinterpretq_s16_s32): Likewise.
(vreinterpretq_s16_s64): Likewise.
(vreinterpretq_s16_s8): Likewise.
(vreinterpretq_s16_u16): Likewise.
(vreinterpretq_s16_u32): Likewise.
(vreinterpretq_s16_u64): Likewise.
(vreinterpretq_s16_u8): Likewise.
(vreinterpretq_s32_s16): Likewise.
(vreinterpretq_s32_s64): Likewise.
(vreinterpretq_s32_s8): Likewise.
(vreinterpretq_s32_u16): Likewise.
(vreinterpretq_s32_u32): Likewise.
(vreinterpretq_s32_u64): Likewise.
(vreinterpretq_s32_u8): Likewise.
(vreinterpretq_s64_s16): Likewise.
(vreinterpretq_s64_s32): Likewise.
(vreinterpretq_s64_s8): Likewise.
(vreinterpretq_s64_u16): Likewise.
(vreinterpretq_s64_u32): Likewise.
(vreinterpretq_s64_u64): Likewise.
(vreinterpretq_s64_u8): Likewise.
(vreinterpretq_s8_s16): Likewise.
(vreinterpretq_s8_s32): Likewise.
(vreinterpretq_s8_s64): Likewise.
(vreinterpretq_s8_u16): Likewise.
(vreinterpretq_s8_u32): Likewise.
(vreinterpretq_s8_u64): Likewise.
(vreinterpretq_s8_u8): Likewise.
(vreinterpretq_u16_s16): Likewise.
(vreinterpretq_u16_s32): Likewise.
(vreinterpretq_u16_s64): Likewise.
(vreinterpretq_u16_s8): Likewise.
(vreinterpretq_u16_u32): Likewise.
(vreinterpretq_u16_u64): Likewise.
(vreinterpretq_u16_u8): Likewise.
(vreinterpretq_u32_s16): Likewise.
(vreinterpretq_u32_s32): Likewise.
(vreinterpretq_u32_s64): Likewise.
(vreinterpretq_u32_s8): Likewise.
(vreinterpretq_u32_u16): Likewise.
(vreinterpretq_u32_u64): Likewise.
(vreinterpretq_u32_u8): Likewise.
(vreinterpretq_u64_s16): Likewise.
(vreinterpretq_u64_s32): Likewise.
(vreinterpretq_u64_s64): Likewise.
(vreinterpretq_u64_s8): Likewise.
(vreinterpretq_u64_u16): Likewise.
(vreinterpretq_u64_u32): Likewise.
(vreinterpretq_u64_u8): Likewise.
(vreinterpretq_u8_s16): Likewise.
(vreinterpretq_u8_s32): Likewise.
(vreinterpretq_u8_s64): Likewise.
(vreinterpretq_u8_s8): Likewise.
(vreinterpretq_u8_u16): Likewise.
(vreinterpretq_u8_u32): Likewise.
(vreinterpretq_u8_u64): Likewise.
(vreinterpretq_s32_f16): Likewise.
(vreinterpretq_s32_f32): Likewise.
(vreinterpretq_u16_f16): Likewise.
(vreinterpretq_u16_f32): Likewise.
(vreinterpretq_u32_f16): Likewise.
(vreinterpretq_u32_f32): Likewise.
(vreinterpretq_u64_f16): Likewise.
(vreinterpretq_u64_f32): Likewise.
(vreinterpretq_u8_f16): Likewise.
(vreinterpretq_u8_f32): Likewise.
(vreinterpretq_f16_f32): Likewise.
(vreinterpretq_f16_s16): Likewise.
(vreinterpretq_f16_s32): Likewise.
(vreinterpretq_f16_s64): Likewise.
(vreinterpretq_f16_s8): Likewise.
(vreinterpretq_f16_u16): Likewise.
(vreinterpretq_f16_u32): Likewise.
(vreinterpretq_f16_u64): Likewise.
(vreinterpretq_f16_u8): Likewise.
(vreinterpretq_f32_f16): Likewise.
(vreinterpretq_f32_s16): Likewise.
(vreinterpretq_f32_s32): Likewise.
(vreinterpretq_f32_s64): Likewise.
(vreinterpretq_f32_s8): Likewise.
(vreinterpretq_f32_u16): Likewise.
(vreinterpretq_f32_u32): Likewise.
(vreinterpretq_f32_u64): Likewise.
(vreinterpretq_f32_u8): Likewise.
(vreinterpretq_s16_f16): Likewise.
(vreinterpretq_s16_f32): Likewise.
(vreinterpretq_s64_f16): Likewise.
(vreinterpretq_s64_f32): Likewise.
(vreinterpretq_s8_f16): Likewise.
(vreinterpretq_s8_f32): Likewise.
(__arm_vreinterpretq_f16): Likewise.
(__arm_vreinterpretq_f32): Likewise.
(__arm_vreinterpretq_s16): Likewise.
(__arm_vreinterpretq_s32): Likewise.
(__arm_vreinterpretq_s64): Likewise.
(__arm_vreinterpretq_s8): Likewise.
(__arm_vreinterpretq_u16): Likewise.
(__arm_vreinterpretq_u32): Likewise.
(__arm_vreinterpretq_u64): Likewise.
(__arm_vreinterpretq_u8): Likewise.
* config/arm/arm_mve_types.h (__arm_vreinterpretq_s16_s32): Remove.
(__arm_vreinterpretq_s16_s64): Likewise.
(__arm_vreinterpretq_s16_s8): Likewise.
(__arm_vreinterpretq_s16_u16): Likewise.
(__arm_vreinterpretq_s16_u32): Likewise.
(__arm_vreinterpretq_s16_u64): Likewise.
(__arm_vreinterpretq_s16_u8): Likewise.
(__arm_vreinterpretq_s32_s16): Likewise.
(__arm_vreinterpretq_s32_s64): Likewise.
(__arm_vreinterpretq_s32_s8): Likewise.
(__arm_vreinterpretq_s32_u16): Likewise.
(__arm_vreinterpretq_s32_u32): Likewise.
(__arm_vreinterpretq_s32_u64): Likewise.
(__arm_vreinterpretq_s32_u8): Likewise.
(__arm_vreinterpretq_s64_s16): Likewise.
(__arm_vreinterpretq_s64_s32): Likewise.
(__arm_vreinterpretq_s64_s8): Likewise.
(__arm_vreinterpretq_s64_u16): Likewise.
(__arm_vreinterpretq_s64_u32): Likewise.
(__arm_vreinterpretq_s64_u64): Likewise.
(__arm_vreinterpretq_s64_u8): Likewise.
(__arm_vreinterpretq_s8_s16): Likewise.
(__arm_vreinterpretq_s8_s32): Likewise.
(__arm_vreinterpretq_s8_s64): Likewise.
(__arm_vreinterpretq_s8_u16): Likewise.
(__arm_vreinterpretq_s8_u32): Likewise.
(__arm_vreinterpretq_s8_u64): Likewise.
(__arm_vreinterpretq_s8_u8): Likewise.
(__arm_vreinterpretq_u16_s16): Likewise.
(__arm_vreinterpretq_u16_s32): Likewise.
(__arm_vreinterpretq_u16_s64): Likewise.
(__arm_vreinterpretq_u16_s8): Likewise.
(__arm_vreinterpretq_u16_u32): Likewise.
(__arm_vreinterpretq_u16_u64): Likewise.
(__arm_vreinterpretq_u16_u8): Likewise.
(__arm_vreinterpretq_u32_s16): Likewise.
(__arm_vreinterpretq_u32_s32): Likewise.
(__arm_vreinterpretq_u32_s64): Likewise.
(__arm_vreinterpretq_u32_s8): Likewise.
(__arm_vreinterpretq_u32_u16): Likewise.
(__arm_vreinterpretq_u32_u64): Likewise.
(__arm_vreinterpretq_u32_u8): Likewise.
(__arm_vreinterpretq_u64_s16): Likewise.
(__arm_vreinterpretq_u64_s32): Likewise.
(__arm_vreinterpretq_u64_s64): Likewise.
(__arm_vreinterpretq_u64_s8): Likewise.
(__arm_vreinterpretq_u64_u16): Likewise.
(__arm_vreinterpretq_u64_u32): Likewise.
(__arm_vreinterpretq_u64_u8): Likewise.
(__arm_vreinterpretq_u8_s16): Likewise.
(__arm_vreinterpretq_u8_s32): Likewise.
(__arm_vreinterpretq_u8_s64): Likewise.
(__arm_vreinterpretq_u8_s8): Likewise.
(__arm_vreinterpretq_u8_u16): Likewise.
(__arm_vreinterpretq_u8_u32): Likewise.
(__arm_vreinterpretq_u8_u64): Likewise.
(__arm_vreinterpretq_s32_f16): Likewise.
(__arm_vreinterpretq_s32_f32): Likewise.
(__arm_vreinterpretq_s16_f16): Likewise.
(__arm_vreinterpretq_s16_f32): Likewise.
(__arm_vreinterpretq_s64_f16): Likewise.
(__arm_vreinterpretq_s64_f32): Likewise.
(__arm_vreinterpretq_s8_f16): Likewise.
(__arm_vreinterpretq_s8_f32): Likewise.
(__arm_vreinterpretq_u16_f16): Likewise.
(__arm_vreinterpretq_u16_f32): Likewise.
(__arm_vreinterpretq_u32_f16): Likewise.
(__arm_vreinterpretq_u32_f32): Likewise.
(__arm_vreinterpretq_u64_f16): Likewise.
(__arm_vreinterpretq_u64_f32): Likewise.
(__arm_vreinterpretq_u8_f16): Likewise.
(__arm_vreinterpretq_u8_f32): Likewise.
(__arm_vreinterpretq_f16_f32): Likewise.
(__arm_vreinterpretq_f16_s16): Likewise.
(__arm_vreinterpretq_f16_s32): Likewise.
(__arm_vreinterpretq_f16_s64): Likewise.
(__arm_vreinterpretq_f16_s8): Likewise.
(__arm_vreinterpretq_f16_u16): Likewise.
(__arm_vreinterpretq_f16_u32): Likewise.
(__arm_vreinterpretq_f16_u64): Likewise.
(__arm_vreinterpretq_f16_u8): Likewise.
(__arm_vreinterpretq_f32_f16): Likewise.
(__arm_vreinterpretq_f32_s16): Likewise.
(__arm_vreinterpretq_f32_s32): Likewise.
(__arm_vreinterpretq_f32_s64): Likewise.
(__arm_vreinterpretq_f32_s8): Likewise.
(__arm_vreinterpretq_f32_u16): Likewise.
(__arm_vreinterpretq_f32_u32): Likewise.
(__arm_vreinterpretq_f32_u64): Likewise.
(__arm_vreinterpretq_f32_u8): Likewise.
(__arm_vreinterpretq_s16): Likewise.
(__arm_vreinterpretq_s32): Likewise.
(__arm_vreinterpretq_s64): Likewise.
(__arm_vreinterpretq_s8): Likewise.
(__arm_vreinterpretq_u16): Likewise.
(__arm_vreinterpretq_u32): Likewise.
(__arm_vreinterpretq_u64): Likewise.
(__arm_vreinterpretq_u8): Likewise.
(__arm_vreinterpretq_f16): Likewise.
(__arm_vreinterpretq_f32): Likewise.
* config/arm/mve.md (@arm_mve_reinterpret<mode>): New pattern.
* config/arm/unspecs.md (REINTERPRET): New unspec.
gcc/testsuite/
* g++.target/arm/mve.exp: Add general-c++ and general directories.
* g++.target/arm/mve/general-c++/nomve_fp_1.c: New test.
* g++.target/arm/mve/general-c++/vreinterpretq_1.C: New test.
* gcc.target/arm/mve/general-c/nomve_fp_1.c: New test.
* gcc.target/arm/mve/general-c/vreinterpretq_1.c: New test.
---
This patch introduces the new MVE intrinsics framework, heavily
inspired by the SVE one in the aarch64 port.
Like the MVE intrinsic types implementation, the intrinsics framework
defines functions via a new pragma in arm_mve.h. A boolean parameter
is used to pass true when __ARM_MVE_PRESERVE_USER_NAMESPACE is
defined, and false when it is not, allowing for non-prefixed intrinsic
functions to be conditionally defined.
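Sketched from the description above (the exact pragma spelling is
assumed from the patch):

  /* In arm_mve.h: hand the header over to the builtins framework;
     the boolean says whether to define only __arm_-prefixed
     names.  */
  #ifdef __ARM_MVE_PRESERVE_USER_NAMESPACE
  #pragma GCC arm "arm_mve.h" true
  #else
  #pragma GCC arm "arm_mve.h" false
  #endif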
Future patches will build on this framework by adding new intrinsic
functions and adding the features needed to support them.
Differences compared to the aarch64/SVE port include:
- when present, the predicate argument is the last one with MVE (the
first one with SVE)
- when using merging predicates ("_m" suffix), the "inactive" argument
(if any) is inserted in the first position
- when using merging predicates ("_m" suffix), some functions do not
have the "inactive" argument, so we maintain an exception list
- MVE intrinsics dealing with floating-point require the FP extension,
while SVE may support different extensions
- regarding global state, MVE does not have any prefetch intrinsic, so
we do not need a flag for this
- intrinsic names can be prefixed with "__arm", depending on whether
preserve_user_namespace is true or false
- parse_signature: the maximum number of arguments is now a parameter;
this helps detect an overflow with a new assert.
- suffixes and overloading can be controlled using
explicit_mode_suffix_p and skip_overload_p in addition to
explicit_type_suffix_p
At this implementation stage, there are some limitations compared
to aarch64/SVE, which are removed later in the series:
- "offset" mode is not supported yet
- gimple folding is not implemented
2022-09-08 Murray Steele <murray.steele@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
gcc/ChangeLog:
* config.gcc: Add arm-mve-builtins-base.o and
arm-mve-builtins-shapes.o to extra_objs.
* config/arm/arm-builtins.cc (arm_builtin_decl): Handle MVE builtin
numberspace.
(arm_expand_builtin): Likewise.
(arm_check_builtin_call): Likewise.
(arm_describe_resolver): Likewise.
* config/arm/arm-builtins.h (enum resolver_ident): Add
arm_mve_resolver.
* config/arm/arm-c.cc (arm_pragma_arm): Handle new pragma.
(arm_resolve_overloaded_builtin): Handle MVE builtins.
(arm_register_target_pragmas): Register arm_check_builtin_call.
* config/arm/arm-mve-builtins.cc (class registered_function): New
class.
(struct registered_function_hasher): New struct.
(pred_suffixes): New table.
(mode_suffixes): New table.
(type_suffix_info): New table.
(TYPES_float16): New.
(TYPES_all_float): New.
(TYPES_integer_8): New.
(TYPES_integer_8_16): New.
(TYPES_integer_16_32): New.
(TYPES_integer_32): New.
(TYPES_signed_16_32): New.
(TYPES_signed_32): New.
(TYPES_all_signed): New.
(TYPES_all_unsigned): New.
(TYPES_all_integer): New.
(TYPES_all_integer_with_64): New.
(DEF_VECTOR_TYPE): New.
(DEF_DOUBLE_TYPE): New.
(DEF_MVE_TYPES_ARRAY): New.
(all_integer): New.
(all_integer_with_64): New.
(float16): New.
(all_float): New.
(all_signed): New.
(all_unsigned): New.
(integer_8): New.
(integer_8_16): New.
(integer_16_32): New.
(integer_32): New.
(signed_16_32): New.
(signed_32): New.
(register_vector_type): Use void_type_node for mve.fp-only types when
mve.fp is not enabled.
(register_builtin_tuple_types): Likewise.
(handle_arm_mve_h): New function.
(matches_type_p): Likewise.
(report_out_of_range): Likewise.
(report_not_enum): Likewise.
(report_missing_float): Likewise.
(report_non_ice): Likewise.
(check_requires_float): Likewise.
(function_instance::hash): Likewise.
(function_instance::call_properties): Likewise.
(function_instance::reads_global_state_p): Likewise.
(function_instance::modifies_global_state_p): Likewise.
(function_instance::could_trap_p): Likewise.
(function_instance::has_inactive_argument): Likewise.
(registered_function_hasher::hash): Likewise.
(registered_function_hasher::equal): Likewise.
(function_builder::function_builder): Likewise.
(function_builder::~function_builder): Likewise.
(function_builder::append_name): Likewise.
(function_builder::finish_name): Likewise.
(function_builder::get_name): Likewise.
(add_attribute): Likewise.
(function_builder::get_attributes): Likewise.
(function_builder::add_function): Likewise.
(function_builder::add_unique_function): Likewise.
(function_builder::add_overloaded_function): Likewise.
(function_builder::add_overloaded_functions): Likewise.
(function_builder::register_function_group): Likewise.
(function_call_info::function_call_info): Likewise.
(function_resolver::function_resolver): Likewise.
(function_resolver::get_vector_type): Likewise.
(function_resolver::get_scalar_type_name): Likewise.
(function_resolver::get_argument_type): Likewise.
(function_resolver::scalar_argument_p): Likewise.
(function_resolver::report_no_such_form): Likewise.
(function_resolver::lookup_form): Likewise.
(function_resolver::resolve_to): Likewise.
(function_resolver::infer_vector_or_tuple_type): Likewise.
(function_resolver::infer_vector_type): Likewise.
(function_resolver::require_vector_or_scalar_type): Likewise.
(function_resolver::require_vector_type): Likewise.
(function_resolver::require_matching_vector_type): Likewise.
(function_resolver::require_derived_vector_type): Likewise.
(function_resolver::require_derived_scalar_type): Likewise.
(function_resolver::require_integer_immediate): Likewise.
(function_resolver::require_scalar_type): Likewise.
(function_resolver::check_num_arguments): Likewise.
(function_resolver::check_gp_argument): Likewise.
(function_resolver::finish_opt_n_resolution): Likewise.
(function_resolver::resolve_unary): Likewise.
(function_resolver::resolve_unary_n): Likewise.
(function_resolver::resolve_uniform): Likewise.
(function_resolver::resolve_uniform_opt_n): Likewise.
(function_resolver::resolve): Likewise.
(function_checker::function_checker): Likewise.
(function_checker::argument_exists_p): Likewise.
(function_checker::require_immediate): Likewise.
(function_checker::require_immediate_enum): Likewise.
(function_checker::require_immediate_range): Likewise.
(function_checker::check): Likewise.
(gimple_folder::gimple_folder): Likewise.
(gimple_folder::fold): Likewise.
(function_expander::function_expander): Likewise.
(function_expander::direct_optab_handler): Likewise.
(function_expander::get_fallback_value): Likewise.
(function_expander::get_reg_target): Likewise.
(function_expander::add_output_operand): Likewise.
(function_expander::add_input_operand): Likewise.
(function_expander::add_integer_operand): Likewise.
(function_expander::generate_insn): Likewise.
(function_expander::use_exact_insn): Likewise.
(function_expander::use_unpred_insn): Likewise.
(function_expander::use_pred_x_insn): Likewise.
(function_expander::use_cond_insn): Likewise.
(function_expander::map_to_rtx_codes): Likewise.
(function_expander::expand): Likewise.
(resolve_overloaded_builtin): Likewise.
(check_builtin_call): Likewise.
(gimple_fold_builtin): Likewise.
(expand_builtin): Likewise.
(gt_ggc_mx): Likewise.
(gt_pch_nx): Likewise.
(gt_pch_nx): Likewise.
* config/arm/arm-mve-builtins.def (s8): Define new type suffix.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
(f16): Likewise.
(f32): Likewise.
(n): New mode.
(offset): New mode.
* config/arm/arm-mve-builtins.h (MAX_TUPLE_SIZE): New constant.
(CP_READ_FPCR): Likewise.
(CP_RAISE_FP_EXCEPTIONS): Likewise.
(CP_READ_MEMORY): Likewise.
(CP_WRITE_MEMORY): Likewise.
(enum units_index): New enum.
(enum predication_index): New.
(enum type_class_index): New.
(enum mode_suffix_index): New enum.
(enum type_suffix_index): New.
(struct mode_suffix_info): New struct.
(struct type_suffix_info): New.
(struct function_group_info): Likewise.
(class function_instance): Likewise.
(class registered_function): Likewise.
(class function_builder): Likewise.
(class function_call_info): Likewise.
(class function_resolver): Likewise.
(class function_checker): Likewise.
(class gimple_folder): Likewise.
(class function_expander): Likewise.
(get_mve_pred16_t): Likewise.
(find_mode_suffix): New function.
(class function_base): Likewise.
(class function_shape): Likewise.
(function_instance::operator==): New function.
(function_instance::operator!=): Likewise.
(function_instance::vectors_per_tuple): Likewise.
(function_instance::mode_suffix): Likewise.
(function_instance::type_suffix): Likewise.
(function_instance::scalar_type): Likewise.
(function_instance::vector_type): Likewise.
(function_instance::tuple_type): Likewise.
(function_instance::vector_mode): Likewise.
(function_call_info::function_returns_void_p): Likewise.
(function_base::call_properties): Likewise.
* config/arm/arm-protos.h (enum arm_builtin_class): Add
ARM_BUILTIN_MVE.
(handle_arm_mve_h): New.
(resolve_overloaded_builtin): New.
(check_builtin_call): New.
(gimple_fold_builtin): New.
(expand_builtin): New.
* config/arm/arm.cc (TARGET_GIMPLE_FOLD_BUILTIN): Define as
arm_gimple_fold_builtin.
(arm_gimple_fold_builtin): New function.
* config/arm/arm_mve.h: Use new arm_mve.h pragma.
* config/arm/predicates.md (arm_any_register_operand): New predicate.
* config/arm/t-arm (arm-mve-builtins.o): Add includes.
(arm-mve-builtins-shapes.o): New target.
(arm-mve-builtins-base.o): New target.
* config/arm/arm-mve-builtins-base.cc: New file.
* config/arm/arm-mve-builtins-base.def: New file.
* config/arm/arm-mve-builtins-base.h: New file.
* config/arm/arm-mve-builtins-functions.h: New file.
* config/arm/arm-mve-builtins-shapes.cc: New file.
* config/arm/arm-mve-builtins-shapes.h: New file.
Co-authored-by: Christophe Lyon <christophe.lyon@arm.com>
|
|
This patch introduces a separate numberspace for general arm builtin
function codes. The intent of this patch is to separate the space of
function codes that may be assigned to general builtins and future
MVE intrinsic functions by using the first bit of each function code
to differentiate them. This is identical to how SVE intrinsic functions
are currently differentiated from general aarch64 builtins.
Future intrinsic implementations may also make use of numberspacing by
changing the values of ARM_BUILTIN_SHIFT and ARM_BUILTIN_CLASS, and
adding themselves to the arm_builtin_class enum.
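A minimal self-contained sketch of the scheme (the helper names are
hypothetical; the one-bit split follows the description above, with the
low bit selecting the class and the remaining bits holding the
per-class subcode):
enum arm_builtin_class
{
  ARM_BUILTIN_GENERAL,
  ARM_BUILTIN_MVE
};
const unsigned int ARM_BUILTIN_SHIFT = 1;
const unsigned int ARM_BUILTIN_CLASS = (1 << ARM_BUILTIN_SHIFT) - 1;
/* Combine a class and a subcode into one function code, and take a
   combined code apart again.  */
static inline unsigned int
combine_code (enum arm_builtin_class klass, unsigned int subcode)
{
  return (subcode << ARM_BUILTIN_SHIFT) | klass;
}
static inline enum arm_builtin_class
code_class (unsigned int code)
{
  return (enum arm_builtin_class) (code & ARM_BUILTIN_CLASS);
}
static inline unsigned int
code_subcode (unsigned int code)
{
  return code >> ARM_BUILTIN_SHIFT;
}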
2022-09-08 Murray Steele <murray.steele@arm.com>
Christophe Lyon <christophe.lyon@arm.com>
gcc/ChangeLog:
* config/arm/arm-builtins.cc (arm_general_add_builtin_function):
New function.
(arm_init_builtin): Use arm_general_add_builtin_function instead
of arm_add_builtin_function.
(arm_init_acle_builtins): Likewise.
(arm_init_mve_builtins): Likewise.
(arm_init_crypto_builtins): Likewise.
(arm_init_builtins): Likewise.
(arm_general_builtin_decl): New function.
(arm_builtin_decl): Defer to numberspace-specialized functions.
(arm_expand_builtin_args): Rename into arm_general_expand_builtin_args.
(arm_expand_builtin_1): Rename into arm_general_expand_builtin_1 and ...
(arm_general_expand_builtin_1): ... specialize for general builtins.
(arm_expand_acle_builtin): Use arm_general_expand_builtin
instead of arm_expand_builtin.
(arm_expand_mve_builtin): Likewise.
(arm_expand_neon_builtin): Likewise.
(arm_expand_vfp_builtin): Likewise.
(arm_general_expand_builtin): New function.
(arm_expand_builtin): Specialize for general builtins.
(arm_general_check_builtin_call): New function.
(arm_check_builtin_call): Specialize for general builtins.
(arm_describe_resolver): Validate numberspace.
(arm_cde_end_args): Likewise.
* config/arm/arm-protos.h (enum arm_builtin_class): New enum.
(ARM_BUILTIN_SHIFT, ARM_BUILTIN_CLASS): New constants.
Co-authored-by: Christophe Lyon <christophe.lyon@arm.com>
|
|
Fixes:
gcc/config/riscv/sync.md:66:1: error: control reaches end of non-void function [-Werror=return-type]
66 | [(set (attr "length") (const_int 4))])
| ^
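A generic illustration of the fix (hypothetical function, not the
sync.md attribute code itself): a value-returning function whose switch
handles every expected case still needs a trap in the default path,
otherwise -Werror=return-type complains that control can fall off the
end.
#include <stdlib.h>
static int
insn_length (int kind)
{
  switch (kind)
    {
    case 0:
      return 4;
    case 1:
      return 8;
    default:
      abort ();  /* stands in for gcc_unreachable (), which also traps */
    }
}
int
main (void)
{
  return insn_length (0) == 4 ? 0 : 1;
}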
PR target/109713
gcc/ChangeLog:
* config/riscv/sync.md: Add gcc_unreachable to a switch.
|
|
This is the last set of changes removing calls to last_stmt in favor of
*gsi_last_bb where this is obviously correct. As with the last changes
I tried to clean up the code as far as dependences are concerned.
* tree-ssa-loop-split.cc (split_at_bb_p): Avoid last_stmt.
(patch_loop_exit): Likewise.
(connect_loops): Likewise.
(split_loop): Likewise.
(control_dep_semi_invariant_p): Likewise.
(do_split_loop_on_cond): Likewise.
(split_loop_on_cond): Likewise.
* tree-ssa-loop-unswitch.cc (find_unswitching_predicates_for_bb):
Likewise.
(simplify_loop_version): Likewise.
(evaluate_bbs): Likewise.
(find_loop_guard): Likewise.
(clean_up_after_unswitching): Likewise.
* tree-ssa-math-opts.cc (maybe_optimize_guarding_check):
Likewise.
(optimize_spaceship): Take a gcond * argument, avoid
last_stmt.
(math_opts_dom_walker::after_dom_children): Adjust call to
optimize_spaceship.
* tree-vrp.cc (maybe_set_nonzero_bits): Avoid last_stmt.
* value-pointer-equiv.cc (pointer_equiv_analyzer::visit_edge):
Likewise.
|
|
This always sets _M_string_length in the constructor for ranges of input
iterators, such as stream iterators.
We copy from the source range to the local buffer, and then repeatedly
reallocate a larger one if necessary. When disposing the old buffer,
_M_is_local() is used to tell whether the buffer is the local one or not
(and so whether it must be deallocated). In addition to comparing the buffer address
with the local buffer, _M_is_local() has an optimization hint so that
the compiler knows that for a string using the local buffer, there is an
invariant that _M_string_length <= _S_local_capacity (added for PR109299
via r13-6915-gbf78b43873b0b7). But we failed to set _M_string_length in
the constructor taking a pair of iterators, so the invariant might not
hold, and __builtin_unreachable() is reached. This causes UBsan errors,
and potentially misoptimization.
To ensure the invariant holds, _M_string_length is initialized to zero
before doing anything else, so that _M_is_local() doesn't see an
uninitialized value.
This issue only surfaces when constructing a string from a range of
input iterators, and the uninitialized _M_string_length happens to be
greater than _S_local_capacity, i.e., 15 for the std::string
specialization.
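A minimal reproducer sketch (hypothetical, distilled from the
description above): constructing a string from stream iterators goes
through the input-iterator constructor that read the uninitialized
_M_string_length.
#include <iterator>
#include <sstream>
#include <string>
int
main ()
{
  std::istringstream in ("hello world");
  std::istreambuf_iterator<char> first (in), last;
  std::string s (first, last);  // range of input iterators
  return s == "hello world" ? 0 : 1;
}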
libstdc++-v3/ChangeLog:
PR libstdc++/109703
* include/bits/basic_string.h (basic_string(Iter, Iter, Alloc)):
Initialize _M_string_length.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
|
|
Now that we have support for inline subword atomic operations, it is no
longer necessary to link against libatomic. This also fixes testsuite
failures because the framework does not properly set up the linker flags
for finding libatomic.
The use of atomic operations is also independent of the use of libpthread.
gcc/
* config/riscv/linux.h (LIB_SPEC): Don't redefine.
|
|
Add segment load/store intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/198
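A hypothetical usage sketch (the intrinsic names assume the tuple-type
naming from the linked proposal): one vlseg2e32 load splits interleaved
{x,y} pairs into the two subparts of a vfloat32m1x2_t tuple, which vget
then extracts.
#include <riscv_vector.h>
void
deinterleave (const float *in, float *x, float *y, size_t n)
{
  for (size_t vl; n > 0; n -= vl, in += 2 * vl, x += vl, y += vl)
    {
      vl = __riscv_vsetvl_e32m1 (n);
      /* One segment load pulls vl {x,y} pairs apart into two vectors.  */
      vfloat32m1x2_t v = __riscv_vlseg2e32_v_f32m1x2 (in, vl);
      __riscv_vse32_v_f32m1 (x, __riscv_vget_v_f32m1x2_f32m1 (v, 0), vl);
      __riscv_vse32_v_f32m1 (y, __riscv_vget_v_f32m1x2_f32m1 (v, 1), vl);
    }
}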
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (fold_fault_load):
New function.
(class vlseg): New class.
(class vsseg): Ditto.
(class vlsseg): Ditto.
(class vssseg): Ditto.
(class seg_indexed_load): Ditto.
(class seg_indexed_store): Ditto.
(class vlsegff): Ditto.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vlseg):
Ditto.
(vsseg): Ditto.
(vlsseg): Ditto.
(vssseg): Ditto.
(vluxseg): Ditto.
(vloxseg): Ditto.
(vsuxseg): Ditto.
(vsoxseg): Ditto.
(vlsegff): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct
seg_loadstore_def): Ditto.
(struct seg_indexed_loadstore_def): Ditto.
(struct seg_fault_load_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc
(function_builder::append_nf): New function.
* config/riscv/riscv-vector-builtins.def (vfloat32m1x2_t):
Change ptr from double to float.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h: Add segment intrinsics.
* config/riscv/riscv-vsetvl.cc (fault_first_load_p): Adapt for
segment ff load.
* config/riscv/riscv.md: Add segment instructions.
* config/riscv/vector-iterators.md: Support segment intrinsics.
* config/riscv/vector.md (@pred_unit_strided_load<mode>): New
pattern.
(@pred_unit_strided_store<mode>): Ditto.
(@pred_strided_load<mode>): Ditto.
(@pred_strided_store<mode>): Ditto.
(@pred_fault_load<mode>): Ditto.
(@pred_indexed_<order>load<V1T:mode><V1I:mode>): Ditto.
(@pred_indexed_<order>load<V2T:mode><V2I:mode>): Ditto.
(@pred_indexed_<order>load<V4T:mode><V4I:mode>): Ditto.
(@pred_indexed_<order>load<V8T:mode><V8I:mode>): Ditto.
(@pred_indexed_<order>load<V16T:mode><V16I:mode>): Ditto.
(@pred_indexed_<order>load<V32T:mode><V32I:mode>): Ditto.
(@pred_indexed_<order>load<V64T:mode><V64I:mode>): Ditto.
(@pred_indexed_<order>store<V1T:mode><V1I:mode>): Ditto.
(@pred_indexed_<order>store<V2T:mode><V2I:mode>): Ditto.
(@pred_indexed_<order>store<V4T:mode><V4I:mode>): Ditto.
(@pred_indexed_<order>store<V8T:mode><V8I:mode>): Ditto.
(@pred_indexed_<order>store<V16T:mode><V16I:mode>): Ditto.
(@pred_indexed_<order>store<V32T:mode><V32I:mode>): Ditto.
(@pred_indexed_<order>store<V64T:mode><V64I:mode>): Ditto.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
|
|
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (valid_type): Adapt for
tuple type support.
(inttype): Ditto.
(floattype): Ditto.
(main): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vset): Add
tuple type vset.
(vget): Add tuple type vget.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_TUPLE_OPS): New macro.
(vint8mf8x2_t): Ditto.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_OPS):
Ditto.
(DEF_RVV_TYPE_INDEX): Ditto.
(rvv_arg_type_info::get_tuple_subpart_type): New function.
(DEF_RVV_TUPLE_TYPE): New macro.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX):
Adapt for tuple vget/vset support.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
(tuple_subpart): Add tuple subpart base type.
* config/riscv/riscv-vector-builtins.h (struct
rvv_arg_type_info): Ditto.
(tuple_type_field): New function.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
|
|
gcc/ChangeLog:
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_tuple_mode_p): New
function.
(get_nf): Ditto.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-v.cc (ENTRY): New macro.
(TUPLE_ENTRY): Ditto.
(get_nf): New function.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_TYPE):
New macro.
(register_tuple_type): New function.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TUPLE_TYPE):
New macro.
(vint8mf8x2_t): New macro.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h (DEF_RVV_TUPLE_TYPE):
Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): New
function.
(TUPLE_ENTRY): Ditto.
(riscv_v_ext_mode_p): New function.
(riscv_v_adjust_nunits): Add tuple mode adjustment.
(riscv_classify_address): Ditto.
(riscv_binary_cost): Ditto.
(riscv_rtx_costs): Ditto.
(riscv_secondary_memory_needed): Ditto.
(riscv_hard_regno_nregs): Ditto.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_vector_mode_supported_p): Ditto.
(riscv_regmode_natural_size): Ditto.
(riscv_array_mode): New function.
(TARGET_ARRAY_MODE): New target hook.
* config/riscv/riscv.md: Add tuple modes.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md (mov<mode>): Add tuple modes data
movement.
(*mov<VT:mode>_<P:mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-10.c: New test.
* gcc.target/riscv/rvv/base/abi-11.c: New test.
* gcc.target/riscv/rvv/base/abi-12.c: New test.
* gcc.target/riscv/rvv/base/abi-13.c: New test.
* gcc.target/riscv/rvv/base/abi-14.c: New test.
* gcc.target/riscv/rvv/base/abi-15.c: New test.
* gcc.target/riscv/rvv/base/abi-16.c: New test.
* gcc.target/riscv/rvv/base/abi-8.c: New test.
* gcc.target/riscv/rvv/base/abi-9.c: New test.
* gcc.target/riscv/rvv/base/tuple-1.c: New test.
* gcc.target/riscv/rvv/base/tuple-10.c: New test.
* gcc.target/riscv/rvv/base/tuple-11.c: New test.
* gcc.target/riscv/rvv/base/tuple-12.c: New test.
* gcc.target/riscv/rvv/base/tuple-13.c: New test.
* gcc.target/riscv/rvv/base/tuple-14.c: New test.
* gcc.target/riscv/rvv/base/tuple-15.c: New test.
* gcc.target/riscv/rvv/base/tuple-16.c: New test.
* gcc.target/riscv/rvv/base/tuple-17.c: New test.
* gcc.target/riscv/rvv/base/tuple-18.c: New test.
* gcc.target/riscv/rvv/base/tuple-19.c: New test.
* gcc.target/riscv/rvv/base/tuple-2.c: New test.
* gcc.target/riscv/rvv/base/tuple-20.c: New test.
* gcc.target/riscv/rvv/base/tuple-21.c: New test.
* gcc.target/riscv/rvv/base/tuple-22.c: New test.
* gcc.target/riscv/rvv/base/tuple-23.c: New test.
* gcc.target/riscv/rvv/base/tuple-24.c: New test.
* gcc.target/riscv/rvv/base/tuple-25.c: New test.
* gcc.target/riscv/rvv/base/tuple-26.c: New test.
* gcc.target/riscv/rvv/base/tuple-27.c: New test.
* gcc.target/riscv/rvv/base/tuple-3.c: New test.
* gcc.target/riscv/rvv/base/tuple-4.c: New test.
* gcc.target/riscv/rvv/base/tuple-5.c: New test.
* gcc.target/riscv/rvv/base/tuple-6.c: New test.
* gcc.target/riscv/rvv/base/tuple-7.c: New test.
* gcc.target/riscv/rvv/base/tuple-8.c: New test.
* gcc.target/riscv/rvv/base/tuple-9.c: New test.
* gcc.target/riscv/rvv/base/user-10.c: New test.
* gcc.target/riscv/rvv/base/user-11.c: New test.
* gcc.target/riscv/rvv/base/user-12.c: New test.
* gcc.target/riscv/rvv/base/user-13.c: New test.
* gcc.target/riscv/rvv/base/user-14.c: New test.
* gcc.target/riscv/rvv/base/user-15.c: New test.
* gcc.target/riscv/rvv/base/user-7.c: New test.
* gcc.target/riscv/rvv/base/user-8.c: New test.
* gcc.target/riscv/rvv/base/user-9.c: New test.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
|
|
When cse_insn prunes src{,_folded,_eqv_here,_related} with the
equivalence set in the *_same_value chain, it also searches for
an equivalence to the destination of the instruction with
/* This is the same as the destination of the insns, we want
to prefer it. Copy it to src_related. The code below will
then give it a negative cost. */
if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
src_related = p->exp;
this picks up the last such equivalence and in particular any
later duplicate will be pruned by the preceding
else if (src_related && GET_CODE (src_related) == code
&& rtx_equal_p (src_related, p->exp))
src_related = 0;
first. This wastes cycles doing extra rtx_equal_p checks. The
following instead searches for the first destination equivalence
separately in this loop and delays using src_related for it until
we are about to process that, avoiding another redundant rtx_equal_p
check.
I came here because of a testcase with very large equivalence
lists and the compile-time of cse_insn. The patch below doesn't speed
it up significantly since there's no equivalence on the destination.
In theory this opens the possibility to track dest_related
separately, avoiding the implicit pruning of any previous
value in src_related. As is, the change should be a no-op for
code generation.
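A minimal sketch of the restructuring with generic list code (not the
cse.cc implementation; strcmp stands in for rtx_equal_p): the element
equivalent to DEST is located once, up front, and only installed as
src_related when the main walk reaches it.
#include <string.h>
struct elt
{
  struct elt *next_same_value;
  const char *exp;
};
static void
walk_equivalences (struct elt *first, const char *dest,
                   const char **src_related)
{
  struct elt *dest_equiv = 0;
  /* Pre-pass: remember the first element equivalent to DEST.  */
  for (struct elt *p = first; p; p = p->next_same_value)
    if (strcmp (p->exp, dest) == 0)
      {
        dest_equiv = p;
        break;
      }
  /* Main walk: no per-iteration equality checks against DEST or a
     previously recorded src_related are needed anymore.  */
  for (struct elt *p = first; p; p = p->next_same_value)
    {
      if (p == dest_equiv)
        *src_related = p->exp;  /* delayed use of the dest equivalence */
      /* ... cost comparisons against the sources go here ... */
    }
}
int
main (void)
{
  struct elt c = { 0, "dest" };
  struct elt b = { &c, "y" };
  struct elt a = { &b, "x" };
  const char *src_related = 0;
  walk_equivalences (&a, "dest", &src_related);
  return src_related ? 0 : 1;
}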
* cse.cc (cse_insn): Track an equivalence to the destination
separately and delay using src_related for it.
|
|
The RTL CSE hash table has a fixed number of buckets (32), each
with a linked list of entries with the same hash value. The
actual hash values are computed using hash_rtx which uses adds
for mixing and adds the rtx CODE as CODE << 7 (apart from some
exceptions such as MEM). The unsigned int typed hash value
is then simply truncated for the actual lookup into the fixed
size table which means that usually CODE is simply lost.
The following improves this truncation by first mixing in more
bits using xor. It does not change the actual hash function
since that's used outside of CSE as well.
An alternative would be to bump the fixed number of buckets,
say to 256, which would retain the LSB of CODE, or to 8192, which
can capture all 6 bits required for the last CODE.
As the comment in CSE says, there's invalidate_memory and
flush_hash_table done possibly frequently and those at least
need to walk all slots, so when the hash table is mostly empty
enlarging it will be a loss. Still there should be more
regular lookups by hash, so fewer collisions should pay off
as well.
Without enlarging the table, a better hash function is unlikely
to make a big difference; simple statistics on the
number of collisions at insertion time shows a reduction of
around 10%. Bumping HASH_SHIFT by 1 improves that to 30%
at the expense of reducing the average table fill by 10%
(all of these stats are from looking just at fold-const.i at -O2).
Increasing HASH_SHIFT further leaves the table even more sparse,
likely showing that hash_rtx's use of add for mixing is
quite bad. Bumping HASH_SHIFT by 2 removes 90% of all
collisions.
Experimenting with using inchash instead of adds for the
mixing does not improve things when looking at the HASH_SHIFT
bumped by 2 numbers.
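A minimal sketch of the improved truncation (assumed shapes, not the
actual cse.cc definitions): xor the next HASH_SHIFT bits into the low
bits before masking, so information such as the rtx CODE (mixed in as
CODE << 7 by hash_rtx) is not simply cut off by the 32-bucket table.
#include <stdio.h>
#define HASH_SHIFT 5
#define HASH_SIZE ((unsigned) 1 << HASH_SHIFT)
#define HASH_MASK (HASH_SIZE - 1)
static inline unsigned
bucket_of (unsigned int full_hash)
{
  return (full_hash ^ (full_hash >> HASH_SHIFT)) & HASH_MASK;
}
int
main (void)
{
  /* Two hashes differing only above bit 4 used to collide in bucket 0;
     now they land in buckets 2 and 4.  */
  printf ("%u %u\n", bucket_of (0x40), bucket_of (0x80));
  return 0;
}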
* cse.cc (HASH): Turn into inline function and mix
in another HASH_SHIFT bits.
(SAFE_HASH): Likewise.
|
|
A further straightforward patch for the various halving intrinsics, with or without rounding, plus tests.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_<sur>h<addsub><mode>): Rename to...
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for halving and rounding
add/sub intrinsics.
|
|
vec-concat with zero
Continuing the almost mechanical series, this patch adds annotations for some of the simple
floating-point patterns we have, and adds testing to ensure that redundant zeroing instructions
are eliminated.
Bootstrapped and tested on aarch64-none-linux-gnu and also aarch64_be-none-elf.
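A sketch of the kind of code the new tests cover (hypothetical example,
not the testsuite file itself): with the vec-concat-with-zero
annotation, combine can see that the 64-bit add already clears the
upper half of the wider register, so no separate zeroing instruction
should remain.
#include <arm_neon.h>
float32x4_t
f (float32x2_t a, float32x2_t b)
{
  float32x2_t s = vadd_f32 (a, b);
  /* The concat with zero should be free: writing the 64-bit result
     already zeroes the upper half of the 128-bit register.  */
  return vcombine_f32 (s, vdup_n_f32 (0.0f));
}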
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (add<mode>3): Rename to...
(add<mode>3<vczle><vczbe>): ... This.
(sub<mode>3): Rename to...
(sub<mode>3<vczle><vczbe>): ... This.
(mul<mode>3): Rename to...
(mul<mode>3<vczle><vczbe>): ... This.
(*div<mode>3): Rename to...
(*div<mode>3<vczle><vczbe>): ... This.
(neg<mode>2): Rename to...
(neg<mode>2<vczle><vczbe>): ... This.
(abs<mode>2): Rename to...
(abs<mode>2<vczle><vczbe>): ... This.
(<frint_pattern><mode>2): Rename to...
(<frint_pattern><mode>2<vczle><vczbe>): ... This.
(<fmaxmin><mode>3): Rename to...
(<fmaxmin><mode>3<vczle><vczbe>): ... This.
(*sqrt<mode>2): Rename to...
(*sqrt<mode>2<vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for some unary
and binary floating-point ops.
* gcc.target/aarch64/simd/pr99195_2.c: New test.
|
|
`vr`, `vm` and `vd` are constraints for vector registers; these three
constraints are implemented in LLVM as well.
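A hypothetical inline-asm sketch of the three constraints ("vr": any
vector register; "vd": any vector register except v0; "vm": only v0,
the mask register); it assumes vl has already been configured:
#include <riscv_vector.h>
vint32m1_t
masked_add (vint32m1_t a, vint32m1_t b, vbool32_t mask)
{
  vint32m1_t out;
  /* A masked destination may not be v0, hence "vd"; the mask operand
     must be v0, hence "vm".  */
  __asm__ ("vadd.vv %0, %1, %2, %3.t"
           : "=&vd" (out)
           : "vr" (a), "vr" (b), "vm" (mask));
  return out;
}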
gcc/ChangeLog:
* doc/md.texi (RISC-V): Add vr, vm, vd constraints.
|
|
[-Wunused-private-field]
PR tree-optimization/109693
gcc/ChangeLog:
* value-range-storage.cc (vrange_allocator::vrange_allocator):
Remove unused field.
* value-range-storage.h: Likewise.
|
|
During patch backporting, I've noticed that while most cp_walk_tree calls
with the cp_fold_r callback were changed from passing &pset to passing
cp_fold_data &data, the VEC_INIT_EXPR gimplification was not, so it still
passes just the address of a hash_set<tree>, and if during the folding we
ever touch data->flags, we use uninitialized data there.
The following patch changes it to do the same thing as cp_fold_function,
because the VEC_INIT_EXPR gimplification will happen on function bodies
only.
2023-05-03 Jakub Jelinek <jakub@redhat.com>
* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false
constructed data rather than &pset to cp_walk_tree with cp_fold_r.
|
|
We try to cache the result of reduce_template_parm_level so that when we
reduce the same parm multiple times we get the same result, but this wasn't
working for template template parms because in that case TYPE is a
TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same
level mismatch that we're trying to adjust for. So in that case compare the
template parms of the template template parms instead.
The result can be seen in nontype12.C, where we previously gave three
duplicate errors on line 7 and now give only one because subsequent
substitutions use the cache.
gcc/cp/ChangeLog:
* pt.cc (reduce_template_parm_level): Fix comparison of
template template parm to cached version.
gcc/testsuite/ChangeLog:
* g++.dg/template/nontype12.C: Check for duplicate error.
|
|
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type. Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop. It's unlikely to make a difference on any real code, but the
simplification is also nice.
We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.
#define NC(X) \
template <class U> struct X##1; \
template <class U> struct X##2; \
template <class U> struct X##3; \
template <class U> struct X##4; \
template <class U> struct X##5; \
template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
sink(A<Is>()...);
}
template <int I> void f()
{
g<__integer_pack(I)...>();
}
int main()
{
f<1000>();
}
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
|
|
When I added the diamond-shaped bb form to match_simplify_replacement,
I copied the code that moves the statement rather than factoring it
out into a new function. This patch does that refactoring to a new
function to avoid the duplicated code. It will make it easier to add
support for moving two statements (the second statement will only be a
conversion).
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (move_stmt): New function.
(match_simplify_replacement): Use move_stmt instead
of the inlined version.
|
|
This ports the clrsb builtin part of builtin_zero_pattern
to match.pd. A simple pattern to port.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
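An illustrative input for the new pattern (hypothetical test, assuming
a 32-bit int, for which __builtin_clrsb (0) is 31):
int
f (int a)
{
  /* The constant matches __builtin_clrsb (0), so the conditional is
     redundant and folds to a plain __builtin_clrsb (a).  */
  return a != 0 ? __builtin_clrsb (a) : 31;
}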
gcc/ChangeLog:
* match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
pattern.
|
|
I accidentally messed up these patterns, so the comparison
against 0 and the arguments were not matching up when they
needed to be.
I committed this as obvious after a bootstrap/test on x86_64-linux-gnu
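One of the fixed shapes as an illustrative example (the actual
phi-opt-25b.c contents may differ): the fold only applies when the
comparison operand and the constant line up, here because
__builtin_popcount (0) is 0.
int
f (int a)
{
  /* Folds to a plain __builtin_popcount (a).  */
  return a != 0 ? __builtin_popcount (a) : 0;
}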
PR tree-optimization/109702
gcc/ChangeLog:
* match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-25b.c: New test.
|
|
There is no canonical form defined for this case, so the aarch64 backend
needs a pattern to match both of these forms.
The forms are:
(set (reg/i:SI 0 x0)
(if_then_else:SI (eq (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 97)
(const_int -1 [0xffffffffffffffff])))
and
(set (reg/i:SI 0 x0)
(ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
(const_int 0 [0])))
(reg:SI 102)))
Currently the aarch64 backend matches the first form, so this
patch adds an insn_and_split to match the second form and
convert it to the first form.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
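A hypothetical source-level example that produces the second (ior/neg)
form and should now also end up as a single csinv:
unsigned int
g (unsigned int a, unsigned int b)
{
  /* -(a != 0) is all-ones when a is nonzero, so this computes
     a ? -1U : b, i.e. a csinv against wzr.  */
  return b | -(a != 0);
}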
PR target/109657
gcc/ChangeLog:
* config/aarch64/aarch64.md (*cmov<mode>_insn_m1): New
insn_and_split pattern.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/csinv-2.c: New test.
|
|
In the testcase below, we push_to_top_level to instantiate f and g, and they
can both use the previous_class_level cache from instantiating A<int>.
Wiping the cache in pop_from_top_level is not helpful; we'll do that in
pushclass if needed.
template <class T> struct A
{
int i;
void f() { i = 42; }
void g() { i = 24; }
};
int main()
{
A<int> a;
a.f();
a.g();
}
gcc/cp/ChangeLog:
* name-lookup.cc (pop_from_top_level): Don't
invalidate_class_lookup_cache.
|
|
While looking at the empty base handling for 109678, it occurred to me that
we ought to be able to look for an empty base at a specific offset, not just
in general.
PR c++/109678
gcc/cp/ChangeLog:
* cp-tree.h (lookup_base): Add offset parm.
* constexpr.cc (cxx_fold_indirect_ref_1): Pass it.
* search.cc (struct lookup_base_data_s): Add offset.
(dfs_lookup_base): Handle it.
(lookup_base): Pass it.
|
|
Here, when dealing with a class with a complex subobject structure, we would
try and fail to find the relevant FIELD_DECL for an empty base before giving
up. And we would do this at each level, in a combinatorially problematic
way. Instead, we should check for an empty base first.
PR c++/109678
gcc/cp/ChangeLog:
* constexpr.cc (cxx_fold_indirect_ref_1): Handle empty base first.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/variant1.C: New test.
|