|
This patch adds the missing expanders for smax/smin for v*hf modes,
by using the VDQWH iterator instead of VALLW.
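As a rough usage sketch (not part of the patch): with the new expanders, a
simple __fp16 max loop like the one below can be vectorized, assuming for
example -O3 -ffast-math and an FP16-capable -march setting.
void
fp16_max (__fp16 *restrict r, const __fp16 *restrict a,
	  const __fp16 *restrict b, int n)
{
  /* The smax<mode>3 expander lets the vectorizer emit a vector float16
     max for this conditional.  */
  for (int i = 0; i < n; i++)
    r[i] = a[i] > b[i] ? a[i] : b[i];
}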
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/vec-common.md (smin<mode>3): Use VDQWH iterator.
(smax<mode>3): Likewise.
|
|
Implement vmaxvq, vminvq, vmaxavq, vminavq using the new MVE builtins
framework.
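For reference, a usage sketch of these intrinsics (not from the patch; it
assumes <arm_mve.h>, an MVE-enabled target and the ACLE generic overloads):
#include <arm_mve.h>

/* Across-vector max, seeded with a scalar: VMAXV.U8.  */
uint8_t
max_of (uint8_t seed, uint8x16_t v)
{
  return vmaxvq (seed, v);
}

/* Predicated across-vector max of absolute values: VMAXAVT.S16.  */
uint16_t
max_abs_of (uint16_t seed, int16x8_t v, mve_pred16_t p)
{
  return vmaxavq_p (seed, v, p);
}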
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_PRED_P_S_U)
(FUNCTION_PRED_P_S): New.
(vmaxavq, vminavq, vmaxvq, vminvq): New.
* config/arm/arm-mve-builtins-base.def (vmaxavq, vminavq, vmaxvq)
(vminvq): New.
* config/arm/arm-mve-builtins-base.h (vmaxavq, vminavq, vmaxvq)
(vminvq): New.
* config/arm/arm_mve.h (vminvq): Remove.
(vmaxvq): Remove.
(vminvq_p): Remove.
(vmaxvq_p): Remove.
(vminvq_u8): Remove.
(vmaxvq_u8): Remove.
(vminvq_s8): Remove.
(vmaxvq_s8): Remove.
(vminvq_u16): Remove.
(vmaxvq_u16): Remove.
(vminvq_s16): Remove.
(vmaxvq_s16): Remove.
(vminvq_u32): Remove.
(vmaxvq_u32): Remove.
(vminvq_s32): Remove.
(vmaxvq_s32): Remove.
(vminvq_p_u8): Remove.
(vmaxvq_p_u8): Remove.
(vminvq_p_s8): Remove.
(vmaxvq_p_s8): Remove.
(vminvq_p_u16): Remove.
(vmaxvq_p_u16): Remove.
(vminvq_p_s16): Remove.
(vmaxvq_p_s16): Remove.
(vminvq_p_u32): Remove.
(vmaxvq_p_u32): Remove.
(vminvq_p_s32): Remove.
(vmaxvq_p_s32): Remove.
(__arm_vminvq_u8): Remove.
(__arm_vmaxvq_u8): Remove.
(__arm_vminvq_s8): Remove.
(__arm_vmaxvq_s8): Remove.
(__arm_vminvq_u16): Remove.
(__arm_vmaxvq_u16): Remove.
(__arm_vminvq_s16): Remove.
(__arm_vmaxvq_s16): Remove.
(__arm_vminvq_u32): Remove.
(__arm_vmaxvq_u32): Remove.
(__arm_vminvq_s32): Remove.
(__arm_vmaxvq_s32): Remove.
(__arm_vminvq_p_u8): Remove.
(__arm_vmaxvq_p_u8): Remove.
(__arm_vminvq_p_s8): Remove.
(__arm_vmaxvq_p_s8): Remove.
(__arm_vminvq_p_u16): Remove.
(__arm_vmaxvq_p_u16): Remove.
(__arm_vminvq_p_s16): Remove.
(__arm_vmaxvq_p_s16): Remove.
(__arm_vminvq_p_u32): Remove.
(__arm_vmaxvq_p_u32): Remove.
(__arm_vminvq_p_s32): Remove.
(__arm_vmaxvq_p_s32): Remove.
(__arm_vminvq): Remove.
(__arm_vmaxvq): Remove.
(__arm_vminvq_p): Remove.
(__arm_vmaxvq_p): Remove.
(vminavq): Remove.
(vmaxavq): Remove.
(vminavq_p): Remove.
(vmaxavq_p): Remove.
(vminavq_s8): Remove.
(vmaxavq_s8): Remove.
(vminavq_s16): Remove.
(vmaxavq_s16): Remove.
(vminavq_s32): Remove.
(vmaxavq_s32): Remove.
(vminavq_p_s8): Remove.
(vmaxavq_p_s8): Remove.
(vminavq_p_s16): Remove.
(vmaxavq_p_s16): Remove.
(vminavq_p_s32): Remove.
(vmaxavq_p_s32): Remove.
(__arm_vminavq_s8): Remove.
(__arm_vmaxavq_s8): Remove.
(__arm_vminavq_s16): Remove.
(__arm_vmaxavq_s16): Remove.
(__arm_vminavq_s32): Remove.
(__arm_vmaxavq_s32): Remove.
(__arm_vminavq_p_s8): Remove.
(__arm_vmaxavq_p_s8): Remove.
(__arm_vminavq_p_s16): Remove.
(__arm_vmaxavq_p_s16): Remove.
(__arm_vminavq_p_s32): Remove.
(__arm_vmaxavq_p_s32): Remove.
(__arm_vminavq): Remove.
(__arm_vmaxavq): Remove.
(__arm_vminavq_p): Remove.
(__arm_vmaxavq_p): Remove.
|
|
Factorize vmaxvq vminvq vmaxavq vminavq so that they use the same
pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_VMAXVQ_VMINVQ, MVE_VMAXVQ_VMINVQ_P): New.
(mve_insn): Add vmaxav, vmaxv, vminav, vminv.
(supf): Add VMAXAVQ_S, VMAXAVQ_P_S, VMINAVQ_S, VMINAVQ_P_S.
* config/arm/mve.md (mve_vmaxavq_s<mode>, mve_vmaxvq_<supf><mode>)
(mve_vminavq_s<mode>, mve_vminvq_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.
(mve_vmaxavq_p_s<mode>, mve_vmaxvq_p_<supf><mode>)
(mve_vminavq_p_s<mode>, mve_vminvq_p_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_p_<supf><mode>): ... this.
|
|
Introduce a function that will be used to build intrinsics that use p
predication.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-functions.h (class
unspec_mve_function_exact_insn_pred_p): New.
|
|
This patch adds the binary_maxavminav shape description.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_maxavminav): New.
* config/arm/arm-mve-builtins-shapes.h (binary_maxavminav): New.
|
|
This patch adds the binary_maxvminv shape description.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_maxvminv): New.
* config/arm/arm-mve-builtins-shapes.h (binary_maxvminv): New.
|
|
REG_ALLOC_ORDER is much less important than it used to be, but it
is still used as a tie-breaker when multiple registers in a class
are equally good.
Previously aarch64 used the default approach of allocating in order
of increasing register number. But as the comment in the patch says,
it's better to allocate FP and predicate registers in the opposite
order, so that we don't eat into smaller register classes unnecessarily.
This fixes some existing FIXMEs and improves the register allocation
for some Arm ACLE code.
Doing this also showed that *vcond_mask_<mode><vpred> (predicated MOV/SEL)
unnecessarily required p0-p7 rather than p0-p15 for the unpredicated
movprfx alternatives. Only the predicated movprfx alternative requires
p0-p7 (due to the movprfx itself, rather than due to the main instruction).
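As an illustrative sketch only (the real hook is aarch64_adjust_reg_alloc_order
in aarch64.cc and is not reproduced here), the idea behind
ADJUST_REG_ALLOC_ORDER is to permute reg_alloc_order so that allocation tries
the high-numbered FP/predicate registers first and only falls back to the
smaller classes when needed:
/* Hypothetical helper: reverse the slice of the allocation order that
   holds the FP or predicate registers.  */
static void
reverse_alloc_order_slice (int *order, int first, int last)
{
  for (int lo = first, hi = last; lo < hi; lo++, hi--)
    {
      int tmp = order[lo];
      order[lo] = order[hi];
      order[hi] = tmp;
    }
}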
gcc/
* config/aarch64/aarch64-protos.h (aarch64_adjust_reg_alloc_order):
Declare.
* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
(ADJUST_REG_ALLOC_ORDER): Likewise.
* config/aarch64/aarch64.cc (aarch64_adjust_reg_alloc_order): New
function.
* config/aarch64/aarch64-sve.md (*vcond_mask_<mode><vpred>): Use
Upa rather than Upl for unpredicated movprfx alternatives.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/abd_f16.c: Remove XFAILs.
* gcc.target/aarch64/sve/acle/asm/abd_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/abd_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/add_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/and_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/divr_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dot_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/eor_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mad_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/max_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/min_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mla_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mls_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/msb_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f16_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f32_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_f64_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mul_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulh_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/mulx_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmad_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmla_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmls_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/nmsb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/orr_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/sub_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f16_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f32_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_f64_notrap.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/subr_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/bcax_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qadd_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalb_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qdmlalbt_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsub_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/qsubr_u8.c: Likewise.
|
|
aarch64-sve2-acle-asm.exp tried to prevent --with-cpu/tune
from affecting the results, but it used sve_flags rather than
sve2_flags. This was a silent failure when running the full
testsuite, but was a fatal error when running the harness
individually.
gcc/testsuite/
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Use
sve2_flags instead of sve_flags.
|
|
This is a patch for the m2iso library to prevent SkipLine from consuming
the next character on the next line.
gcc/m2/ChangeLog:
PR modula2/109779
* gm2-libs-iso/RTgen.mod (doLook): Remove old.
Remove re-assignment of result.
* gm2-libs-iso/TextIO.mod (CanRead): Rename into ...
(CharAvailable): ... this.
(DumpState): New procedure.
(SetResult): Rename as SetNul.
(WasGoodChar): Rename into ...
(EofOrEoln): ... this.
(SkipLine): Skip over the newline.
(ReadString): Flip THEN ELSE statements after testing for
EofOrEoln.
(ReadRestLine): Flip THEN ELSE statements after testing for
EofOrEoln.
gcc/testsuite/ChangeLog:
PR modula2/109779
* gm2/isolib/run/pass/skiplinetest.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
http://eel.is/c++draft/dcl.attr#grammar-4 says
"In an attribute-list, an ellipsis may appear only if that attribute's
specification permits it."
and doesn't explicitly permit it on any standard attribute.
The https://wg21.link/p1774r8 paper which introduced assume attribute says
"We could therefore hypothetically permit the assume attribute to directly
support pack expansion:
template <int... args>
void f() {
  [[assume(args >= 0)...]];
}
However, we do not propose this. It would require substantial additional work
for a very rare use case. Note that this can instead be expressed with a fold
expression, which is equivalent to the above and works out of the box without
any extra effort:
template <int... args>
void f() {
  [[assume(((args >= 0) && ...))]];
}
", but as the testcase shows, GCC 13+ ICEs on assume attribute followed by
... if it contains packs.
The following patch rejects those instead of ICE and for C++17 or later
suggests using fold expressions instead (it doesn't make sense to suggest
it for C++14 and earlier when we'd error on the fold expressions).
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR c++/109756
* cp-gimplify.cc (process_stmt_assume_attribute): Diagnose pack
expansion of assume attribute.
* g++.dg/cpp23/attr-assume11.C: New test.
|
|
This patch fixes a minor code quality issue I found while testing LRA on the
H8. Specifically we have a peephole which converts a comparison of a memory
location against zero into a load + comparison which is actually more
efficient. This triggers when there are registers available at the right
point during peephole2.
If the load is not a mode dependent address we can actually do better by
realizing the load itself sets the proper flags and eliminate the comparison.
I may have expected this to happen when I wrote the original peephole2,
but cmpelim runs before peephole2, so clearly if we want to eliminate the
comparison we have to do it manually.
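For illustration (not from the patch), this is the kind of source the
peephole2 targets; the load of *p already sets the condition codes on the
H8, so the separate compare against zero can be dropped when the address is
not mode-dependent:
int
mem_is_nonzero (const short *p)
{
  /* Ideally this compiles to a single flag-setting load plus the
     branch/set, with no explicit cmp #0.  */
  return *p != 0;
}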
gcc/
* config/h8300/testcompare.md: Add peephole2 which uses a memory
load to set flags, thus eliminating a compare against zero.
|
|
Implement vshllbq and vshlltq using the new MVE builtins framework.
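A usage sketch (not from the patch; assumes <arm_mve.h> and an MVE-enabled
target): vshllbq widens and shifts the bottom (even-numbered) elements,
vshlltq the top (odd-numbered) ones.
#include <arm_mve.h>

int16x8_t
widen_bottom (int8x16_t v)
{
  return vshllbq (v, 2);	/* VSHLLB.S8 */
}

uint32x4_t
widen_top (uint16x8_t v)
{
  return vshlltq (v, 1);	/* VSHLLT.U16 */
}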
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (vshllbq, vshlltq): New.
* config/arm/arm-mve-builtins-base.def (vshllbq, vshlltq): New.
* config/arm/arm-mve-builtins-base.h (vshllbq, vshlltq): New.
* config/arm/arm_mve.h (vshlltq): Remove.
(vshllbq): Remove.
(vshllbq_m): Remove.
(vshlltq_m): Remove.
(vshllbq_x): Remove.
(vshlltq_x): Remove.
(vshlltq_n_u8): Remove.
(vshllbq_n_u8): Remove.
(vshlltq_n_s8): Remove.
(vshllbq_n_s8): Remove.
(vshlltq_n_u16): Remove.
(vshllbq_n_u16): Remove.
(vshlltq_n_s16): Remove.
(vshllbq_n_s16): Remove.
(vshllbq_m_n_s8): Remove.
(vshllbq_m_n_s16): Remove.
(vshllbq_m_n_u8): Remove.
(vshllbq_m_n_u16): Remove.
(vshlltq_m_n_s8): Remove.
(vshlltq_m_n_s16): Remove.
(vshlltq_m_n_u8): Remove.
(vshlltq_m_n_u16): Remove.
(vshllbq_x_n_s8): Remove.
(vshllbq_x_n_s16): Remove.
(vshllbq_x_n_u8): Remove.
(vshllbq_x_n_u16): Remove.
(vshlltq_x_n_s8): Remove.
(vshlltq_x_n_s16): Remove.
(vshlltq_x_n_u8): Remove.
(vshlltq_x_n_u16): Remove.
(__arm_vshlltq_n_u8): Remove.
(__arm_vshllbq_n_u8): Remove.
(__arm_vshlltq_n_s8): Remove.
(__arm_vshllbq_n_s8): Remove.
(__arm_vshlltq_n_u16): Remove.
(__arm_vshllbq_n_u16): Remove.
(__arm_vshlltq_n_s16): Remove.
(__arm_vshllbq_n_s16): Remove.
(__arm_vshllbq_m_n_s8): Remove.
(__arm_vshllbq_m_n_s16): Remove.
(__arm_vshllbq_m_n_u8): Remove.
(__arm_vshllbq_m_n_u16): Remove.
(__arm_vshlltq_m_n_s8): Remove.
(__arm_vshlltq_m_n_s16): Remove.
(__arm_vshlltq_m_n_u8): Remove.
(__arm_vshlltq_m_n_u16): Remove.
(__arm_vshllbq_x_n_s8): Remove.
(__arm_vshllbq_x_n_s16): Remove.
(__arm_vshllbq_x_n_u8): Remove.
(__arm_vshllbq_x_n_u16): Remove.
(__arm_vshlltq_x_n_s8): Remove.
(__arm_vshlltq_x_n_s16): Remove.
(__arm_vshlltq_x_n_u8): Remove.
(__arm_vshlltq_x_n_u16): Remove.
(__arm_vshlltq): Remove.
(__arm_vshllbq): Remove.
(__arm_vshllbq_m): Remove.
(__arm_vshlltq_m): Remove.
(__arm_vshllbq_x): Remove.
(__arm_vshlltq_x): Remove.
|
|
Factorize vshllbq vshlltq so that they use the same pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (mve_insn): Add vshllb, vshllt.
(VSHLLBQ_N, VSHLLTQ_N): Remove.
(VSHLLxQ_N): New.
(VSHLLBQ_M_N, VSHLLTQ_M_N): Remove.
(VSHLLxQ_M_N): New.
* config/arm/mve.md (mve_vshllbq_n_<supf><mode>)
(mve_vshlltq_n_<supf><mode>): Merge into ...
(@mve_<mve_insn>q_n_<supf><mode>): ... this.
(mve_vshllbq_m_n_<supf><mode>, mve_vshlltq_m_n_<supf><mode>):
Merge into ...
(@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
|
|
This patch adds the binary_widen_n shape description.
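As a hedged illustration (these example functions are not from the patch), a
binary_widen_n call takes a vector and an immediate and returns a vector
whose elements are twice as wide:
#include <arm_mve.h>

int16x8_t example_b (int8x16_t a) { return vshllbq_n_s8 (a, 1); }
uint32x4_t example_t (uint16x8_t a) { return vshlltq_n_u16 (a, 8); }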
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_widen_n): New.
* config/arm/arm-mve-builtins-shapes.h (binary_widen_n): New.
|
|
Implement vmovnbq, vmovntq, vqmovnbq, vqmovntq, vqmovunbq, vqmovuntq
using the new MVE builtins framework.
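A usage sketch (not from the patch; assumes <arm_mve.h> and an MVE-enabled
target): the 'b' forms write the even-numbered destination elements and the
't' forms the odd-numbered ones, keeping the remaining lanes from the first
argument.
#include <arm_mve.h>

int8x16_t
narrow_pair (int8x16_t dst, int16x8_t lo, int16x8_t hi)
{
  dst = vmovnbq (dst, lo);	/* VMOVNB.I16 */
  dst = vmovntq (dst, hi);	/* VMOVNT.I16 */
  return dst;
}

uint8x16_t
narrow_sat_unsigned (uint8x16_t dst, int16x8_t v)
{
  return vqmovunbq (dst, v);	/* VQMOVUNB.S16 */
}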
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (vmovnbq, vmovntq, vqmovnbq)
(vqmovntq, vqmovunbq, vqmovuntq): New.
* config/arm/arm-mve-builtins-base.def (vmovnbq, vmovntq)
(vqmovnbq, vqmovntq, vqmovunbq, vqmovuntq): New.
* config/arm/arm-mve-builtins-base.h (vmovnbq, vmovntq, vqmovnbq)
(vqmovntq, vqmovunbq, vqmovuntq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vmovnbq,
vmovntq, vqmovnbq, vqmovntq, vqmovunbq, vqmovuntq.
* config/arm/arm_mve.h (vqmovntq): Remove.
(vqmovnbq): Remove.
(vqmovnbq_m): Remove.
(vqmovntq_m): Remove.
(vqmovntq_u16): Remove.
(vqmovnbq_u16): Remove.
(vqmovntq_s16): Remove.
(vqmovnbq_s16): Remove.
(vqmovntq_u32): Remove.
(vqmovnbq_u32): Remove.
(vqmovntq_s32): Remove.
(vqmovnbq_s32): Remove.
(vqmovnbq_m_s16): Remove.
(vqmovntq_m_s16): Remove.
(vqmovnbq_m_u16): Remove.
(vqmovntq_m_u16): Remove.
(vqmovnbq_m_s32): Remove.
(vqmovntq_m_s32): Remove.
(vqmovnbq_m_u32): Remove.
(vqmovntq_m_u32): Remove.
(__arm_vqmovntq_u16): Remove.
(__arm_vqmovnbq_u16): Remove.
(__arm_vqmovntq_s16): Remove.
(__arm_vqmovnbq_s16): Remove.
(__arm_vqmovntq_u32): Remove.
(__arm_vqmovnbq_u32): Remove.
(__arm_vqmovntq_s32): Remove.
(__arm_vqmovnbq_s32): Remove.
(__arm_vqmovnbq_m_s16): Remove.
(__arm_vqmovntq_m_s16): Remove.
(__arm_vqmovnbq_m_u16): Remove.
(__arm_vqmovntq_m_u16): Remove.
(__arm_vqmovnbq_m_s32): Remove.
(__arm_vqmovntq_m_s32): Remove.
(__arm_vqmovnbq_m_u32): Remove.
(__arm_vqmovntq_m_u32): Remove.
(__arm_vqmovntq): Remove.
(__arm_vqmovnbq): Remove.
(__arm_vqmovnbq_m): Remove.
(__arm_vqmovntq_m): Remove.
(vmovntq): Remove.
(vmovnbq): Remove.
(vmovnbq_m): Remove.
(vmovntq_m): Remove.
(vmovntq_u16): Remove.
(vmovnbq_u16): Remove.
(vmovntq_s16): Remove.
(vmovnbq_s16): Remove.
(vmovntq_u32): Remove.
(vmovnbq_u32): Remove.
(vmovntq_s32): Remove.
(vmovnbq_s32): Remove.
(vmovnbq_m_s16): Remove.
(vmovntq_m_s16): Remove.
(vmovnbq_m_u16): Remove.
(vmovntq_m_u16): Remove.
(vmovnbq_m_s32): Remove.
(vmovntq_m_s32): Remove.
(vmovnbq_m_u32): Remove.
(vmovntq_m_u32): Remove.
(__arm_vmovntq_u16): Remove.
(__arm_vmovnbq_u16): Remove.
(__arm_vmovntq_s16): Remove.
(__arm_vmovnbq_s16): Remove.
(__arm_vmovntq_u32): Remove.
(__arm_vmovnbq_u32): Remove.
(__arm_vmovntq_s32): Remove.
(__arm_vmovnbq_s32): Remove.
(__arm_vmovnbq_m_s16): Remove.
(__arm_vmovntq_m_s16): Remove.
(__arm_vmovnbq_m_u16): Remove.
(__arm_vmovntq_m_u16): Remove.
(__arm_vmovnbq_m_s32): Remove.
(__arm_vmovntq_m_s32): Remove.
(__arm_vmovnbq_m_u32): Remove.
(__arm_vmovntq_m_u32): Remove.
(__arm_vmovntq): Remove.
(__arm_vmovnbq): Remove.
(__arm_vmovnbq_m): Remove.
(__arm_vmovntq_m): Remove.
(vqmovuntq): Remove.
(vqmovunbq): Remove.
(vqmovunbq_m): Remove.
(vqmovuntq_m): Remove.
(vqmovuntq_s16): Remove.
(vqmovunbq_s16): Remove.
(vqmovuntq_s32): Remove.
(vqmovunbq_s32): Remove.
(vqmovunbq_m_s16): Remove.
(vqmovuntq_m_s16): Remove.
(vqmovunbq_m_s32): Remove.
(vqmovuntq_m_s32): Remove.
(__arm_vqmovuntq_s16): Remove.
(__arm_vqmovunbq_s16): Remove.
(__arm_vqmovuntq_s32): Remove.
(__arm_vqmovunbq_s32): Remove.
(__arm_vqmovunbq_m_s16): Remove.
(__arm_vqmovuntq_m_s16): Remove.
(__arm_vqmovunbq_m_s32): Remove.
(__arm_vqmovuntq_m_s32): Remove.
(__arm_vqmovuntq): Remove.
(__arm_vqmovunbq): Remove.
(__arm_vqmovunbq_m): Remove.
(__arm_vqmovuntq_m): Remove.
|
|
Factorize vmovnbq vmovntq vqmovnbq vqmovntq vqmovunbq vqmovuntq so
that they use the same pattern.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_MOVN, MVE_MOVN_M): New.
(mve_insn): Add vmovnb, vmovnt, vqmovnb, vqmovnt, vqmovunb,
vqmovunt.
(isu): Likewise.
(supf): Add VQMOVUNBQ_M_S, VQMOVUNBQ_S, VQMOVUNTQ_M_S,
VQMOVUNTQ_S.
* config/arm/mve.md (mve_vmovnbq_<supf><mode>)
(mve_vmovntq_<supf><mode>, mve_vqmovnbq_<supf><mode>)
(mve_vqmovntq_<supf><mode>, mve_vqmovunbq_s<mode>)
(mve_vqmovuntq_s<mode>): Merge into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.
(mve_vmovnbq_m_<supf><mode>, mve_vmovntq_m_<supf><mode>)
(mve_vqmovnbq_m_<supf><mode>, mve_vqmovntq_m_<supf><mode>)
(mve_vqmovunbq_m_s<mode>, mve_vqmovuntq_m_s<mode>): Merge into ...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.
|
|
This patch adds the binary_move_narrow and binary_move_narrow_unsigned
shapes descriptions.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (binary_move_narrow): New.
(binary_move_narrow_unsigned): New.
* config/arm/arm-mve-builtins-shapes.h (binary_move_narrow): New.
(binary_move_narrow_unsigned): New.
|
|
Implement vrndq, vrndaq, vrndmq, vrndnq, vrndpq, vrndxq using the new
MVE builtins framework.
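A usage sketch (not from the patch; assumes <arm_mve.h> and an MVE
floating-point target): each intrinsic corresponds to one VRINT rounding
variant, e.g. vrndq to VRINTZ (toward zero) and vrndpq to VRINTP (toward
+infinity).
#include <arm_mve.h>

float32x4_t
round_toward_zero (float32x4_t v)
{
  return vrndq (v);
}

float16x8_t
round_up (float16x8_t v)
{
  return vrndpq (v);
}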
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_ONLY_F): New.
(vrndaq, vrndmq, vrndnq, vrndpq, vrndq, vrndxq): New.
* config/arm/arm-mve-builtins-base.def (vrndaq, vrndmq, vrndnq)
(vrndpq, vrndq, vrndxq): New.
* config/arm/arm-mve-builtins-base.h (vrndaq, vrndmq, vrndnq)
(vrndpq, vrndq, vrndxq): New.
* config/arm/arm_mve.h (vrndxq): Remove.
(vrndq): Remove.
(vrndpq): Remove.
(vrndnq): Remove.
(vrndmq): Remove.
(vrndaq): Remove.
(vrndaq_m): Remove.
(vrndmq_m): Remove.
(vrndnq_m): Remove.
(vrndpq_m): Remove.
(vrndq_m): Remove.
(vrndxq_m): Remove.
(vrndq_x): Remove.
(vrndnq_x): Remove.
(vrndmq_x): Remove.
(vrndpq_x): Remove.
(vrndaq_x): Remove.
(vrndxq_x): Remove.
(vrndxq_f16): Remove.
(vrndxq_f32): Remove.
(vrndq_f16): Remove.
(vrndq_f32): Remove.
(vrndpq_f16): Remove.
(vrndpq_f32): Remove.
(vrndnq_f16): Remove.
(vrndnq_f32): Remove.
(vrndmq_f16): Remove.
(vrndmq_f32): Remove.
(vrndaq_f16): Remove.
(vrndaq_f32): Remove.
(vrndaq_m_f16): Remove.
(vrndmq_m_f16): Remove.
(vrndnq_m_f16): Remove.
(vrndpq_m_f16): Remove.
(vrndq_m_f16): Remove.
(vrndxq_m_f16): Remove.
(vrndaq_m_f32): Remove.
(vrndmq_m_f32): Remove.
(vrndnq_m_f32): Remove.
(vrndpq_m_f32): Remove.
(vrndq_m_f32): Remove.
(vrndxq_m_f32): Remove.
(vrndq_x_f16): Remove.
(vrndq_x_f32): Remove.
(vrndnq_x_f16): Remove.
(vrndnq_x_f32): Remove.
(vrndmq_x_f16): Remove.
(vrndmq_x_f32): Remove.
(vrndpq_x_f16): Remove.
(vrndpq_x_f32): Remove.
(vrndaq_x_f16): Remove.
(vrndaq_x_f32): Remove.
(vrndxq_x_f16): Remove.
(vrndxq_x_f32): Remove.
(__arm_vrndxq_f16): Remove.
(__arm_vrndxq_f32): Remove.
(__arm_vrndq_f16): Remove.
(__arm_vrndq_f32): Remove.
(__arm_vrndpq_f16): Remove.
(__arm_vrndpq_f32): Remove.
(__arm_vrndnq_f16): Remove.
(__arm_vrndnq_f32): Remove.
(__arm_vrndmq_f16): Remove.
(__arm_vrndmq_f32): Remove.
(__arm_vrndaq_f16): Remove.
(__arm_vrndaq_f32): Remove.
(__arm_vrndaq_m_f16): Remove.
(__arm_vrndmq_m_f16): Remove.
(__arm_vrndnq_m_f16): Remove.
(__arm_vrndpq_m_f16): Remove.
(__arm_vrndq_m_f16): Remove.
(__arm_vrndxq_m_f16): Remove.
(__arm_vrndaq_m_f32): Remove.
(__arm_vrndmq_m_f32): Remove.
(__arm_vrndnq_m_f32): Remove.
(__arm_vrndpq_m_f32): Remove.
(__arm_vrndq_m_f32): Remove.
(__arm_vrndxq_m_f32): Remove.
(__arm_vrndq_x_f16): Remove.
(__arm_vrndq_x_f32): Remove.
(__arm_vrndnq_x_f16): Remove.
(__arm_vrndnq_x_f32): Remove.
(__arm_vrndmq_x_f16): Remove.
(__arm_vrndmq_x_f32): Remove.
(__arm_vrndpq_x_f16): Remove.
(__arm_vrndpq_x_f32): Remove.
(__arm_vrndaq_x_f16): Remove.
(__arm_vrndaq_x_f32): Remove.
(__arm_vrndxq_x_f16): Remove.
(__arm_vrndxq_x_f32): Remove.
(__arm_vrndxq): Remove.
(__arm_vrndq): Remove.
(__arm_vrndpq): Remove.
(__arm_vrndnq): Remove.
(__arm_vrndmq): Remove.
(__arm_vrndaq): Remove.
(__arm_vrndaq_m): Remove.
(__arm_vrndmq_m): Remove.
(__arm_vrndnq_m): Remove.
(__arm_vrndpq_m): Remove.
(__arm_vrndq_m): Remove.
(__arm_vrndxq_m): Remove.
(__arm_vrndq_x): Remove.
(__arm_vrndnq_x): Remove.
(__arm_vrndmq_x): Remove.
(__arm_vrndpq_x): Remove.
(__arm_vrndaq_x): Remove.
(__arm_vrndxq_x): Remove.
|
|
Implement vabsq, vnegq, vclsq, vclzq, vqabsq, vqnegq using the new MVE
builtins framework.
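A usage sketch (not from the patch; assumes <arm_mve.h> and an MVE-enabled
target), including a predicated _m form whose first argument supplies the
inactive lanes:
#include <arm_mve.h>

int8x16_t
saturating_abs (int8x16_t v)
{
  return vqabsq (v);		/* VQABS.S8 */
}

uint32x4_t
count_leading_zeros (uint32x4_t v, uint32x4_t inactive, mve_pred16_t p)
{
  return vclzq_m (inactive, v, p);	/* VCLZT.I32 */
}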
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-base.cc (FUNCTION_WITHOUT_N_NO_U_F): New.
(vabsq, vnegq, vclsq, vclzq, vqabsq, vqnegq): New.
* config/arm/arm-mve-builtins-base.def (vabsq, vnegq, vclsq)
(vclzq, vqabsq, vqnegq): New.
* config/arm/arm-mve-builtins-base.h (vabsq, vnegq, vclsq, vclzq)
(vqabsq, vqnegq): New.
* config/arm/arm_mve.h (vabsq): Remove.
(vabsq_m): Remove.
(vabsq_x): Remove.
(vabsq_f16): Remove.
(vabsq_f32): Remove.
(vabsq_s8): Remove.
(vabsq_s16): Remove.
(vabsq_s32): Remove.
(vabsq_m_s8): Remove.
(vabsq_m_s16): Remove.
(vabsq_m_s32): Remove.
(vabsq_m_f16): Remove.
(vabsq_m_f32): Remove.
(vabsq_x_s8): Remove.
(vabsq_x_s16): Remove.
(vabsq_x_s32): Remove.
(vabsq_x_f16): Remove.
(vabsq_x_f32): Remove.
(__arm_vabsq_s8): Remove.
(__arm_vabsq_s16): Remove.
(__arm_vabsq_s32): Remove.
(__arm_vabsq_m_s8): Remove.
(__arm_vabsq_m_s16): Remove.
(__arm_vabsq_m_s32): Remove.
(__arm_vabsq_x_s8): Remove.
(__arm_vabsq_x_s16): Remove.
(__arm_vabsq_x_s32): Remove.
(__arm_vabsq_f16): Remove.
(__arm_vabsq_f32): Remove.
(__arm_vabsq_m_f16): Remove.
(__arm_vabsq_m_f32): Remove.
(__arm_vabsq_x_f16): Remove.
(__arm_vabsq_x_f32): Remove.
(__arm_vabsq): Remove.
(__arm_vabsq_m): Remove.
(__arm_vabsq_x): Remove.
(vnegq): Remove.
(vnegq_m): Remove.
(vnegq_x): Remove.
(vnegq_f16): Remove.
(vnegq_f32): Remove.
(vnegq_s8): Remove.
(vnegq_s16): Remove.
(vnegq_s32): Remove.
(vnegq_m_s8): Remove.
(vnegq_m_s16): Remove.
(vnegq_m_s32): Remove.
(vnegq_m_f16): Remove.
(vnegq_m_f32): Remove.
(vnegq_x_s8): Remove.
(vnegq_x_s16): Remove.
(vnegq_x_s32): Remove.
(vnegq_x_f16): Remove.
(vnegq_x_f32): Remove.
(__arm_vnegq_s8): Remove.
(__arm_vnegq_s16): Remove.
(__arm_vnegq_s32): Remove.
(__arm_vnegq_m_s8): Remove.
(__arm_vnegq_m_s16): Remove.
(__arm_vnegq_m_s32): Remove.
(__arm_vnegq_x_s8): Remove.
(__arm_vnegq_x_s16): Remove.
(__arm_vnegq_x_s32): Remove.
(__arm_vnegq_f16): Remove.
(__arm_vnegq_f32): Remove.
(__arm_vnegq_m_f16): Remove.
(__arm_vnegq_m_f32): Remove.
(__arm_vnegq_x_f16): Remove.
(__arm_vnegq_x_f32): Remove.
(__arm_vnegq): Remove.
(__arm_vnegq_m): Remove.
(__arm_vnegq_x): Remove.
(vclsq): Remove.
(vclsq_m): Remove.
(vclsq_x): Remove.
(vclsq_s8): Remove.
(vclsq_s16): Remove.
(vclsq_s32): Remove.
(vclsq_m_s8): Remove.
(vclsq_m_s16): Remove.
(vclsq_m_s32): Remove.
(vclsq_x_s8): Remove.
(vclsq_x_s16): Remove.
(vclsq_x_s32): Remove.
(__arm_vclsq_s8): Remove.
(__arm_vclsq_s16): Remove.
(__arm_vclsq_s32): Remove.
(__arm_vclsq_m_s8): Remove.
(__arm_vclsq_m_s16): Remove.
(__arm_vclsq_m_s32): Remove.
(__arm_vclsq_x_s8): Remove.
(__arm_vclsq_x_s16): Remove.
(__arm_vclsq_x_s32): Remove.
(__arm_vclsq): Remove.
(__arm_vclsq_m): Remove.
(__arm_vclsq_x): Remove.
(vclzq): Remove.
(vclzq_m): Remove.
(vclzq_x): Remove.
(vclzq_s8): Remove.
(vclzq_s16): Remove.
(vclzq_s32): Remove.
(vclzq_u8): Remove.
(vclzq_u16): Remove.
(vclzq_u32): Remove.
(vclzq_m_u8): Remove.
(vclzq_m_s8): Remove.
(vclzq_m_u16): Remove.
(vclzq_m_s16): Remove.
(vclzq_m_u32): Remove.
(vclzq_m_s32): Remove.
(vclzq_x_s8): Remove.
(vclzq_x_s16): Remove.
(vclzq_x_s32): Remove.
(vclzq_x_u8): Remove.
(vclzq_x_u16): Remove.
(vclzq_x_u32): Remove.
(__arm_vclzq_s8): Remove.
(__arm_vclzq_s16): Remove.
(__arm_vclzq_s32): Remove.
(__arm_vclzq_u8): Remove.
(__arm_vclzq_u16): Remove.
(__arm_vclzq_u32): Remove.
(__arm_vclzq_m_u8): Remove.
(__arm_vclzq_m_s8): Remove.
(__arm_vclzq_m_u16): Remove.
(__arm_vclzq_m_s16): Remove.
(__arm_vclzq_m_u32): Remove.
(__arm_vclzq_m_s32): Remove.
(__arm_vclzq_x_s8): Remove.
(__arm_vclzq_x_s16): Remove.
(__arm_vclzq_x_s32): Remove.
(__arm_vclzq_x_u8): Remove.
(__arm_vclzq_x_u16): Remove.
(__arm_vclzq_x_u32): Remove.
(__arm_vclzq): Remove.
(__arm_vclzq_m): Remove.
(__arm_vclzq_x): Remove.
(vqabsq): Remove.
(vqnegq): Remove.
(vqnegq_m): Remove.
(vqabsq_m): Remove.
(vqabsq_s8): Remove.
(vqabsq_s16): Remove.
(vqabsq_s32): Remove.
(vqnegq_s8): Remove.
(vqnegq_s16): Remove.
(vqnegq_s32): Remove.
(vqnegq_m_s8): Remove.
(vqabsq_m_s8): Remove.
(vqnegq_m_s16): Remove.
(vqabsq_m_s16): Remove.
(vqnegq_m_s32): Remove.
(vqabsq_m_s32): Remove.
(__arm_vqabsq_s8): Remove.
(__arm_vqabsq_s16): Remove.
(__arm_vqabsq_s32): Remove.
(__arm_vqnegq_s8): Remove.
(__arm_vqnegq_s16): Remove.
(__arm_vqnegq_s32): Remove.
(__arm_vqnegq_m_s8): Remove.
(__arm_vqabsq_m_s8): Remove.
(__arm_vqnegq_m_s16): Remove.
(__arm_vqabsq_m_s16): Remove.
(__arm_vqnegq_m_s32): Remove.
(__arm_vqabsq_m_s32): Remove.
(__arm_vqabsq): Remove.
(__arm_vqnegq): Remove.
(__arm_vqnegq_m): Remove.
(__arm_vqabsq_m): Remove.
|
|
Factorize vabs vcls vclz vneg vqabs vqneg vrnda vrndm vrndn vrndp vrnd
vrndx so that they use the same pattern.
This patch introduces the mve_mnemo iterator because some of the
involved intrinsics have a different name from their mnemonic: for
instance vrndq vs vrintz.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/iterators.md (MVE_INT_M_UNARY, MVE_INT_UNARY)
(MVE_FP_UNARY, MVE_FP_M_UNARY): New.
(mve_insn): Add vabs, vcls, vclz, vneg, vqabs, vqneg, vrnda,
vrndm, vrndn, vrndp, vrnd, vrndx.
(isu): Add VABSQ_M_S, VCLSQ_M_S, VCLZQ_M_S, VCLZQ_M_U, VNEGQ_M_S,
VQABSQ_M_S, VQNEGQ_M_S.
(mve_mnemo): New.
* config/arm/mve.md (mve_vrndq_m_f<mode>, mve_vrndxq_f<mode>)
(mve_vrndq_f<mode>, mve_vrndpq_f<mode>, mve_vrndnq_f<mode>)
(mve_vrndmq_f<mode>, mve_vrndaq_f<mode>): Merge into ...
(@mve_<mve_insn>q_f<mode>): ... this.
(mve_vnegq_f<mode>, mve_vabsq_f<mode>): Merge into ...
(mve_v<absneg_str>q_f<mode>): ... this.
(mve_vnegq_s<mode>, mve_vabsq_s<mode>): Merge into ...
(mve_v<absneg_str>q_s<mode>): ... this.
(mve_vclsq_s<mode>, mve_vqnegq_s<mode>, mve_vqabsq_s<mode>): Merge into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.
(mve_vabsq_m_s<mode>, mve_vclsq_m_s<mode>)
(mve_vclzq_m_<supf><mode>, mve_vnegq_m_s<mode>)
(mve_vqabsq_m_s<mode>, mve_vqnegq_m_s<mode>): Merge into ...
(@mve_<mve_insn>q_m_<supf><mode>): ... this.
(mve_vabsq_m_f<mode>, mve_vnegq_m_f<mode>, mve_vrndaq_m_f<mode>)
(mve_vrndmq_m_f<mode>, mve_vrndnq_m_f<mode>, mve_vrndpq_m_f<mode>)
(mve_vrndxq_m_f<mode>): Merge into ...
(@mve_<mve_insn>q_m_f<mode>): ... this.
|
|
This patch adds the unary shape description.
2022-09-08 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-mve-builtins-shapes.cc (unary): New.
* config/arm/arm-mve-builtins-shapes.h (unary): New.
|
|
Trivial comment typo...
2023-05-09 Jakub Jelinek <jakub@redhat.com>
* mux-utils.h: Fix comment typo, avoides -> avoids.
|
|
I came up with a testcase which reproduces all the way back to r10-7469.
It uses LTO to avoid early inlining, so that ccp sees rotates rather than
the shifts that would only later be turned into rotates.
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109778
* gcc.dg/lto/pr109778_0.c: New test.
* gcc.dg/lto/pr109778_1.c: New file.
|
|
The following testcase is miscompiled, because bitwise ccp2 handles
a rotate with a signed type incorrectly.
Seems tree-ssa-ccp.cc has the only callers of wi::[lr]rotate with 3
arguments, all other callers just rotate in the right precision and
I think work correctly. ccp works with widest_ints and so rotations
by the excessive precision certainly don't match what it wants
when it sees a rotate in some specific bitsize. Still, if it is
unsigned rotate and the widest_int is zero extended from width,
the functions perform left shift and logical right shift on the value
and then at the end zero extend the result of left shift and uselessly
also the result of logical right shift and return | of that.
On the testcase, the signed char rrotate-by-4 argument is
CONSTANT -75 i.e. 0xffffffff....fffffb5 with mask 2.
The mask is correctly rotated to 0x20, but because the 8-bit constant
is sign extended to 192-bit one, the logical right shift by 4 doesn't
yield expected 0xb, but gives 0xfffffffffff....ffffb, and then
return wi::zext (left, width) | wi::zext (right, width); where left is
0xfffffff....fb50, so we return 0xfb instead of the expected
0x5b.
The following patch fixes that by doing the zero extension in case of
the right variable before doing wi::lrshift rather than after it.
Also, wi::[lr]rotate with width < precision always zero extends
the result. I'm afraid it can't do better because it doesn't know
if it is done for an unsigned or signed type, but the caller in this
case knows that very well, so I've done the extension based on sgn
in the caller. E.g. 0x5b rotated right (or left) by 4 with width 8
previously gave 0xb5, but with sgn == SIGNED in widest_int it should be
0xffffffff....fffb5 instead.
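A worked check of the 8-bit rotate values discussed above (plain C; this is
not the PR testcase):
unsigned char
ror4 (unsigned char x)
{
  /* Rotate an 8-bit value right by 4: ror4 (0xb5) == 0x5b, the result the
     fixed wi::rrotate agrees with; the pre-patch widest_int computation
     corresponded to 0xfb.  */
  return (unsigned char) ((x >> 4) | (x << 4));
}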
2023-05-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109778
* wide-int.h (wi::lrotate, wi::rrotate): Call wi::lrshift on
wi::zext (x, width) rather than x if width != precision, rather
than using wi::zext (right, width) after the shift.
* tree-ssa-ccp.cc (bit_value_binop): Call wi::ext on the results
of wi::lrotate or wi::rrotate.
* gcc.c-torture/execute/pr109778.c: New test.
|
|
get_out_file did not follow the coding conventions (mixing three-space
and two-space indentation, missing linebreak before function name).
Take that as an excuse to reimplement it in a more terse manner and
rename as 'choose_output', which is hopefully more descriptive.
gcc/ChangeLog:
* genmatch.cc (get_out_file): Make static and rename to ...
(choose_output): ... this. Reimplement. Update all uses ...
(decision_tree::gen): ... here and ...
(main): ... here.
|
|
Display usage more consistently and get rid of camelCase.
gcc/ChangeLog:
* genmatch.cc (showUsage): Reimplement as ...
(usage): ...this. Adjust all uses.
(main): Print usage when no arguments. Add missing 'return 1'.
|
|
Eliminate boolean parameters of emit_func. The first ('open') just
prints 'extern' to generated header, which is unnecessary. Introduce a
separate function to use when finishing a declaration in place of the
second ('close').
Rename emit_func to 'fp_decl' (matching 'fprintf' in length) to unbreak
indentation in several places.
Reshuffle emitted line breaks in a few places to make generated
declarations less ugly.
gcc/ChangeLog:
* genmatch.cc (header_file): Make static.
(emit_func): Rename to...
(fp_decl): ... this. Adjust all uses.
(fp_decl_done): New function. Use it...
(decision_tree::gen): ... here and...
(write_predicate): ... here.
(main): Adjust.
|
|
Some tests hard-coded specific allocations for temporary registers,
whereas the RA should be free to pick anything that doesn't force
unnecessary moves or spills.
gcc/testsuite/
* gcc.target/aarch64/asimd-mul-to-shl-sub.c: Allow any register
allocation for temporary results, rather than requiring specific
registers.
* gcc.target/aarch64/auto-init-padding-1.c: Likewise.
* gcc.target/aarch64/auto-init-padding-2.c: Likewise.
* gcc.target/aarch64/auto-init-padding-3.c: Likewise.
* gcc.target/aarch64/auto-init-padding-4.c: Likewise.
* gcc.target/aarch64/auto-init-padding-9.c: Likewise.
* gcc.target/aarch64/memset-corner-cases.c: Likewise.
* gcc.target/aarch64/memset-q-reg.c: Likewise.
* gcc.target/aarch64/simd/vaddlv_1.c: Likewise.
* gcc.target/aarch64/sve-neon-modes_1.c: Likewise.
* gcc.target/aarch64/sve-neon-modes_3.c: Likewise.
* gcc.target/aarch64/sve/load_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_256.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_512.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_1024.c: Likewise.
* gcc.target/aarch64/sve/pcs/return_6_2048.c: Likewise.
* gcc.target/aarch64/sve/pr89007-1.c: Likewise.
* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
* gcc.target/aarch64/sve/store_scalar_offset_1.c: Likewise.
* gcc.target/aarch64/vadd_reduc-1.c: Likewise.
* gcc.target/aarch64/vadd_reduc-2.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_bf16.c: Allow the temporary
predicate register to be any of p4-p7, rather than requiring p4
specifically.
* gcc.target/aarch64/sve/pcs/args_5_be_f16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_f64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_s64.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u8.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u16.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u32.c: Likewise.
* gcc.target/aarch64/sve/pcs/args_5_be_u64.c: Likewise.
|
|
There were many tests that used [0-9] to match an FP or vector register,
but that should allow any of 0-31 instead.
asm-x-constraint-1.c required s0-s7, but that's the range for "y"
rather than "x". "x" allows s0-s15.
sve/pcs/return_9.c required z2-z7 (the initial set of available
call-clobbered registers), but z24-z31 are OK too.
gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/vshl-opt-6.c: Allow any
FP/vector register, not just register 0-9.
* gcc.target/aarch64/fmul_fcvt_2.c: Likewise.
* gcc.target/aarch64/ldp_stp_8.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_21.c: Likewise.
* gcc.target/aarch64/simd/vpaddd_f64.c: Likewise.
* gcc.target/aarch64/simd/vpaddd_s64.c: Likewise.
* gcc.target/aarch64/simd/vpaddd_u64.c: Likewise.
* gcc.target/aarch64/sve/adr_1.c: Likewise.
* gcc.target/aarch64/sve/adr_2.c: Likewise.
* gcc.target/aarch64/sve/adr_3.c: Likewise.
* gcc.target/aarch64/sve/adr_4.c: Likewise.
* gcc.target/aarch64/sve/adr_5.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.
* gcc.target/aarch64/sve/spill_3.c: Likewise.
* gcc.target/aarch64/vfp-1.c: Likewise.
* gcc.target/aarch64/asm-x-constraint-1.c: Allow s0-s15, not just
s0-s7.
* gcc.target/aarch64/sve/pcs/return_9.c: Allow z24-z31 as well as
z2-z7.
|
|
Most governing predicate operands require p0-p7, but some
instructions also allow p8-p15. Non-gp uses of predicates
often also allow all of p0-p15.
This patch fixes up cases where we required p0-p7 unnecessarily.
In some cases we match the definition (typically a comparison,
PFALSE or PTRUE), sometimes we match the use (like a logic
instruction, MOV or SEL), and sometimes we match both.
gcc/testsuite/
* g++.target/aarch64/sve/vcond_1.C: Allow any predicate
register for the temporary results, not just p0-p7.
* gcc.target/aarch64/sve/acle/asm/dupq_b8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_b16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_b32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dupq_b64.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_5.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_6.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_7.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_9.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilele_10.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilelt_1.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilelt_2.c: Likewise.
* gcc.target/aarch64/sve/acle/general/whilelt_3.c: Likewise.
* gcc.target/aarch64/sve/pcs/varargs_1.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_6.c: Likewise.
* gcc.target/aarch64/sve/vcond_2.c: Likewise.
* gcc.target/aarch64/sve/vcond_3.c: Likewise.
* gcc.target/aarch64/sve/vcond_7.c: Likewise.
* gcc.target/aarch64/sve/vcond_18.c: Likewise.
* gcc.target/aarch64/sve/vcond_19.c: Likewise.
* gcc.target/aarch64/sve/vcond_20.c: Likewise.
|
|
Some of the svdup tests expand to a SEL between two constant vectors.
This patch allows the constants to be formed in either order.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/dup_s16.c: When using SEL to select
between two constant vectors, allow the constant moves to appear in
either order.
* gcc.target/aarch64/sve/acle/asm/dup_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/dup_u64.c: Likewise.
|
|
Some ACLE intrinsics map to instructions that tie the output
operand to an input operand. If all the operands are allocated
to different registers, and if MOVPRFX can't be used, we will need
a move either before the instruction or after it. Many tests only
matched the "before" case; this patch makes them accept the "after"
case too.
gcc/testsuite/
* gcc.target/aarch64/advsimd-intrinsics/bfcvtnq2-untied.c: Allow
moves to occur after the intrinsic instruction, rather than requiring
them to happen before.
* gcc.target/aarch64/advsimd-intrinsics/bfdot-1.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vdot-3-1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/adda_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/brka_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/brkb_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/brkn_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clasta_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_bf16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_f16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/clastb_f64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pfirst_b.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/pnext_b8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sli_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_s8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u16.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u32.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u64.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sri_u8.c: Likewise.
|
|
Some of the SVE ACLE asm tests tried to be agnostic about the
instruction order, but only one of the alternatives was exercised
in practice. This patch fixes latent typos in the other versions.
gcc/testsuite/
* gcc.target/aarch64/sve2/acle/asm/aesd_u8.c: Fix expected register
allocation in the case where a move occurs after the intrinsic
instruction.
* gcc.target/aarch64/sve2/acle/asm/aese_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesimc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/aesmc_u8.c: Likewise.
* gcc.target/aarch64/sve2/acle/asm/sm4e_u32.c: Likewise.
|
|
This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch"). To start by quoting that:
If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn. Allocating the
same register to the input and the output then saves the cost of a move.
If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency. This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.
This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common. It is then impossible for the output
operand to reuse the dying input register.
This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output. (Matched earlyclobbers are fine.)
Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. A SPEC2017 run
on aarch64 showed no differences outside the noise. Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in LOC:
Target               Tests  Good  Bad  Delta   Best  Worst  Median
======               =====  ====  ===  =====   ====  =====  ======
amdgcn-amdhsa           14     7    7      3    -18     10      -1
arm-linux-gnueabihf     16    15    1    -22     -4      2      -1
csky-elf                 6     6    0    -21     -6     -2      -4
hppa64-hp-hpux11.23      5     5    0     -7     -2     -1      -1
ia64-linux-gnu          16    16    0    -70    -15     -1      -3
m32r-elf                53     1   52     64     -2      8       1
mcore-elf                2     2    0     -8     -6     -2      -6
microblaze-elf         285   283    2   -909    -68      4      -1
mmix                     7     7    0  -2101  -2091     -1      -1
msp430-elf               1     1    0     -4     -4     -4      -4
pru-elf                  8     6    2    -12     -6      2      -2
rx-elf                  22    18    4    -40     -5      6      -2
sparc-linux-gnu         15    14    1    -40     -8      1      -2
sparc-wrs-vxworks       15    14    1    -40     -8      1      -2
visium-elf               2     1    1      0     -2      2      -2
xstormy16-elf            1     1    0     -2     -2     -2      -2
with other targets showing no sensitivity to the patch. The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.
gcc/
* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
earlyclobbers.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs.
* gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
|
|
This was fixed by r13-1018, but the testcase seems needed.
PR c++/106740
gcc/testsuite/ChangeLog:
* g++.dg/template/friend78.C: New test.
|
|
|
|
This is a repost/respin of a patch that was conditionally approved:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609470.html
This patch adds a convenient post-reload splitter for setting/updating
the highpart of a TImode variable, using i386's previously added
split_double_concat infrastructure.
For the new test case below:
__int128 foo(__int128 x, unsigned long long y)
{
  __int128 t = (__int128)y << 64;
  __int128 r = (x & ~0ull) | t;
  return r;
}
mainline GCC with -O2 currently generates:
foo: movq %rdi, %rcx
xorl %eax, %eax
xorl %edi, %edi
orq %rcx, %rax
orq %rdi, %rdx
ret
with this patch, GCC instead now generates the much better:
foo: movq %rdi, %rax
ret
It turns out that the -m32 equivalent of this testcase already
avoids using explicit orl/xor instructions, as it gets optimized
(in combine) by a completely different path. Given that this idiom
isn't seen in 32-bit code (so this pattern doesn't match with -m32),
and also that the shorter 32-bit AND bitmask is represented as a
CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
is implemented for just TARGET_64BIT rather than contorting a "generic"
implementation using DWI mode iterators.
2023-05-08 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386.md (any_or_plus): Move definition earlier.
(*insvti_highpart_1): New define_insn_and_split to overwrite
(insv) the highpart of a TImode register/memory.
gcc/testsuite/ChangeLog
* gcc.target/i386/insvti_highpart-1.c: New test case.
|
|
Todo from early_inliner needs to be propagated so that
cleanup_tree_cfg () is called if necessary.
This bug was causing an assert in get_loop_body during
ipa-sra in an autoprofiledbootstrap build, since loops weren't
fixed up and one of the loops had num_nodes set to 0.
Tested on x86_64-pc-linux-gnu.
gcc/ChangeLog:
* auto-profile.cc (auto_profile): Check todo from early_inline
to see if cleanup_tree_cfg needs to be called.
(early_inline): Return todo from early_inliner.
|
|
I had missed, when converting this
testcase to Gimple, that there was a define
for the int/unsigned types specifically to get
an INT32 type. This means that when using a
literal integer constant you need to use
`_Literal (type)` to form the constants with
the correct type.
This fixes the issue and has been tested on both
xstormy16-elf and x86_64-linux-gnu.
Committed as obvious.
gcc/testsuite/ChangeLog:
PR testsuite/109776
* gcc.dg/pr81192.c: Fix integer constants for int16 targets.
|
|
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (pass_vsetvl::get_vector_info):
New.
(pass_vsetvl::get_block_info): New.
(pass_vsetvl::update_vector_info): New.
(pass_vsetvl::simple_vsetvl): Use get_vector_info.
(pass_vsetvl::compute_local_backward_infos): Ditto.
(pass_vsetvl::transfer_before): Ditto.
(pass_vsetvl::transfer_after): Ditto.
(pass_vsetvl::emit_local_forward_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_insns): Ditto.
(pass_vsetvl::compute_local_backward_infos): Use
update_vector_info.
|
|
stdint.h requires the corresponding multilib to exist, so use stdint-gcc.h
instead; also add a riscv_vector.h wrapper to
gcc.target/riscv/rvv/autovec/.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/partial/single_rgroup-1.h: Change
stdint.h to stdint-gcc.h.
* gcc.target/riscv/rvv/autovec/template-1.h: Ditto.
* gcc.target/riscv/rvv/autovec/riscv_vector.h: New.
|
|
Today's build of xstormy16-elf failed due to a branch to an out of range
target. Manual inspection of the assembly code for the affected function
(divdi3) showed that the zero-extension patterns were claiming a length
of 2, but clearly assembled into 4 bytes.
This patch adds an explicit length to the zero extension pattern and
appears to resolve the issue in my test builds.
gcc/
* config/stormy16/stormy16.md (zero_extendhisi2): Fix length.
|
|
Let each 'torture-init' determine the 'LTO_TORTURE_OPTIONS'
Otherwise, for example for 'RUNTESTFLAGS' of '--target_board=unix\{-m64,-m32\}'
vs. '--target_board=unix\{-m32,-m64\}', both variants always exercise testing
with the first flag variant's 'LTO_OPTIONS'/'LTO_TORTURE_OPTIONS', which
results in unequal test results between the two 'RUNTESTFLAGS' variants if one
of the flag variants has 'check_linker_plugin_available' but the other doesn't.
Fix-up for r180245 (commit c1a7cdbbcca90ad5260bfc543f8c10f3514e76c1)
"Update testsuite to run with slim LTO".
gcc/testsuite/
* g++.dg/guality/guality.exp: Move 'torture-init' earlier.
* gcc.dg/guality/guality.exp: Likewise.
* gfortran.dg/guality/guality.exp: Likewise.
* lib/c-torture.exp (LTO_TORTURE_OPTIONS): Don't set.
* lib/gcc-dg.exp (LTO_TORTURE_OPTIONS): Don't set.
* lib/lto.exp (lto_init, lto_finish): Let each 'lto_init'
determine the default 'LTO_OPTIONS'.
* lib/torture-options.exp (torture-init, torture-finish): Let each
'torture-init' determine the 'LTO_TORTURE_OPTIONS'.
|
|
This extends the PR93107 fix, which made us do resolve_nondeduced_context
on the elements of an initializer list during auto deduction, to happen for
CTAD as well.
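A hypothetical illustration (not the actual testcase) of the kind of code
this affects, assuming a list-constructor deduction guide:
```
#include <initializer_list>

// fn<int> names a function template specialization; its type only becomes
// deducible once resolve_nondeduced_context has resolved it, which this
// patch now also does during CTAD.
template <typename T> void fn (T) {}

template <typename T>
struct A
{
  A (std::initializer_list<T>) {}
};

A a { fn<int> };   // should deduce A<void (*)(int)>
```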
PR c++/106214
PR c++/93107
gcc/cp/ChangeLog:
* pt.cc (do_auto_deduction): Move up resolve_nondeduced_context
calls to happen before do_class_deduction. Add some
error_mark_node tests.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/class-deduction114.C: New test.
|
|
The new __dmr type that is being added for a possible future PowerPC instruction
set bumps into a structure field size issue. The size of the __dmr type is 1024 bits.
The precision field in tree_type_common is currently 10 bits, so if you store
1,024 into the field, you get 0 back. When you get 0 in the precision field, the
ccp pass passes this 0 to sext_hwi in hwint.h. That function in turn generates
a shift equal to the host wide int bit size, which is undefined behavior
(machine dependent) for shifts in C/C++:
int shift = HOST_BITS_PER_WIDE_INT - prec;
return ((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) src << shift)) >> shift;
It turns out that the x86_64 machine where I first did my tests returns the
original input unchanged after the two shifts, while the PowerPC always
returns 0. In the ccp pass, the original input is -1, so it happened to work
on x86_64; on the PowerPC the result was 0, which ultimately led to the failure.
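A standalone sketch (not GCC code; the struct name is made up) of why storing
1024 into a 10-bit field yields 0:
```
#include <stdio.h>

/* Mimics the old tree_type_common layout: an unsigned bit-field narrower
   than the stored value wraps modulo 2^width, so 1024 becomes 0.  */
struct type_common_like
{
  unsigned precision : 10;   /* the patch widens this to 16 bits */
};

int
main (void)
{
  struct type_common_like t;
  t.precision = 1024;               /* 1024 == 1 << 10, wraps to 0 */
  printf ("%u\n", t.precision);     /* prints 0 */
  return 0;
}
```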
2023-02-01 Richard Biener <rguenther@suse.de>
Michael Meissner <meissner@linux.ibm.com>
PR middle-end/108623
* tree-core.h (tree_type_common): Bump up precision field to 16 bits.
Align bit fields > 1 bit to at least an 8-bit boundary.
|
|
Fix coding-style errors introduced in ca2f64d5d08c1699ca4b7cb2bf6a76692e809e0f
gcc/fortran/ChangeLog:
* resolve.cc (resolve_select_type): Fix coding style.
libgfortran/ChangeLog:
* caf/single.c (_gfortran_caf_register): Fix coding style.
* io/async.c (update_pdt, async_io): Likewise.
* io/format.c (free_format_data): Likewise.
* io/transfer.c (st_read_done_worker, st_write_done_worker): Likewise.
* io/unix.c (mem_close): Likewise.
|
|
After extending factor_out_conditional_conversion to handle diamond bbs,
we should be able to use it for all normal unary gimple operations and not
just conversions. This allows us to optimize PR 59424, for example.
This is also a start towards optimizing PR 64700 and a few others.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
An example of this is:
```
static inline unsigned long long g(int t)
{
unsigned t1 = t;
return t1;
}
static int abs1(int a)
{
if (a < 0)
a = -a;
return a;
}
unsigned long long f(int c, int d, int e)
{
unsigned long long t;
if (d > e)
t = g(abs1(d));
else
t = g(abs1(e));
return t;
}
```
Which should be optimized to:
_9 = MAX_EXPR <d_5(D), e_6(D)>;
_4 = ABS_EXPR <_9>;
t_3 = (long long unsigned intD.16) _4;
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
(factor_out_conditional_operation): This and add support for all unary
operations.
(pass_phiopt::execute): Update call to factor_out_conditional_conversion
to call factor_out_conditional_operation instead.
PR tree-optimization/109424
PR tree-optimization/59424
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/abs-2.c: Update tree scan for
the change in wording.
* gcc.dg/tree-ssa/minmax-17.c: Likewise.
* gcc.dg/tree-ssa/pr103771.c: Likewise.
* gcc.dg/tree-ssa/minmax-18.c: New test.
* gcc.dg/tree-ssa/minmax-19.c: New test.
|
|
After adding diamond shaped bb support to factor_out_conditional_conversion,
we can get a case where we have two conversions that need to be factored out,
after which another phiopt optimization can happen.
An example is:
```
static inline unsigned long long g(int t)
{
unsigned t1 = t;
return t1;
}
unsigned long long f(int c, int d, int e)
{
unsigned long long t;
if (c > d)
t = g(c);
else
t = g(d);
return t;
}
```
In this case we should get a MAX_EXPR in phiopt1 with two casts.
Before this patch, we would just factor out the outer cast and then
wait till phiopt2 to factor out the inner cast.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (pass_phiopt::execute): Loop
over factor_out_conditional_conversion.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-17.c: New test.
|
|
The function factor_out_conditional_conversion already supports
diamond shaped bb forms; it just needs to be called for them.
harden-cond-comp.c needed to be changed because we now optimize away the
conversion, which meant the compare hardening no longer needed to split
the block the test was checking. So change the testcase such that there
is no chance of the optimization.
Also add two testcases that show the improvement. PR 103771 is already
handled in ifconvert for the vectorizer, but now it is solved in a
general sense.
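A minimal sketch (not one of the new testcases) of the diamond form that is
now handled:
```
/* Both arms of the if/else convert before the PHI; factoring the cast out
   of the diamond lets phiopt turn the PHI into a MAX_EXPR followed by a
   single cast.  */
unsigned long long f (int a, int b)
{
  unsigned long long t;
  if (a > b)
    t = (unsigned long long) a;
  else
    t = (unsigned long long) b;
  return t;
}
```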
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/49959
PR tree-optimization/103771
gcc/ChangeLog:
* tree-ssa-phiopt.cc (pass_phiopt::execute): Support
diamond shaped bb form for factor_out_conditional_conversion.
gcc/testsuite/ChangeLog:
* c-c++-common/torture/harden-cond-comp.c: Change testcase
slightly to avoid the new phiopt optimization.
* gcc.dg/tree-ssa/abs-2.c: New test.
* gcc.dg/tree-ssa/pr103771.c: New test.
|
|
1. Add a movmisalign pattern for the TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
target hook; RISC-V already implements this hook, but it can't actually be
supported without a movmisalign pattern.
2. Remove the global extern of get_mask_policy_no_pred and get_tail_policy_no_pred.
These two functions come from the intrinsic builtin framework;
we are sure we don't need them in the auto-vectorization implementation.
3. Refine the mask mode implementation.
4. We should not have a "riscv_vector_" prefix inside the riscv_vector namespace,
since it makes the code inconsistent and ugly.
For example:
Before this patch:
static opt_machine_mode
riscv_get_mask_mode (machine_mode mode)
{
machine_mode mask_mode = VOIDmode;
if (TARGET_VECTOR && riscv_vector::riscv_vector_get_mask_mode (mode).exists (&mask_mode))
return mask_mode;
..
After this patch:
static opt_machine_mode
riscv_get_mask_mode (machine_mode mode)
{
machine_mode mask_mode = VOIDmode;
if (TARGET_VECTOR && riscv_vector::get_mask_mode (mode).exists (&mask_mode))
return mask_mode;
..
5. Fix failing testcase fixed-vlmax-1.c.
gcc/ChangeLog:
* config/riscv/autovec.md (movmisalign<mode>): New pattern.
* config/riscv/riscv-protos.h (riscv_vector_mask_mode_p): Delete.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-v.cc (get_mask_policy_no_pred): Delete.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(get_mask_mode): New function.
* config/riscv/riscv-vector-builtins.cc (use_real_merge_p): Remove
global extern.
(get_tail_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred): Ditto.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv.cc (riscv_get_mask_mode): Refine codes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/fixed-vlmax-1.c: Fix typo.
|