path: root/gcc/config
Age | Commit message | Author | Files | Lines
2024-12-13 | arm: [MVE intrinsics] rework vldr gather_shifted_offset | Christophe Lyon | 9 | -653/+129
Implement vldr?q_gather_shifted_offset using the new MVE builtins framework. gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_ldrgu_qualifiers) (arm_ldrgs_qualifiers, arm_ldrgs_z_qualifiers) (arm_ldrgu_z_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (vldrq_gather_impl): Add support for shifted version. (vldrdq_gather_shifted, vldrhq_gather_shifted) (vldrwq_gather_shifted): New. * config/arm/arm-mve-builtins-base.def (vldrdq_gather_shifted) (vldrhq_gather_shifted, vldrwq_gather_shifted): New. * config/arm/arm-mve-builtins-base.h (vldrdq_gather_shifted) (vldrhq_gather_shifted, vldrwq_gather_shifted): New. * config/arm/arm_mve.h (vldrhq_gather_shifted_offset): Delete. (vldrhq_gather_shifted_offset_z): Delete. (vldrdq_gather_shifted_offset): Delete. (vldrdq_gather_shifted_offset_z): Delete. (vldrwq_gather_shifted_offset): Delete. (vldrwq_gather_shifted_offset_z): Delete. (vldrhq_gather_shifted_offset_s32): Delete. (vldrhq_gather_shifted_offset_s16): Delete. (vldrhq_gather_shifted_offset_u32): Delete. (vldrhq_gather_shifted_offset_u16): Delete. (vldrhq_gather_shifted_offset_z_s32): Delete. (vldrhq_gather_shifted_offset_z_s16): Delete. (vldrhq_gather_shifted_offset_z_u32): Delete. (vldrhq_gather_shifted_offset_z_u16): Delete. (vldrdq_gather_shifted_offset_s64): Delete. (vldrdq_gather_shifted_offset_u64): Delete. (vldrdq_gather_shifted_offset_z_s64): Delete. (vldrdq_gather_shifted_offset_z_u64): Delete. (vldrhq_gather_shifted_offset_f16): Delete. (vldrhq_gather_shifted_offset_z_f16): Delete. (vldrwq_gather_shifted_offset_f32): Delete. (vldrwq_gather_shifted_offset_s32): Delete. (vldrwq_gather_shifted_offset_u32): Delete. (vldrwq_gather_shifted_offset_z_f32): Delete. (vldrwq_gather_shifted_offset_z_s32): Delete. (vldrwq_gather_shifted_offset_z_u32): Delete. (__arm_vldrhq_gather_shifted_offset_s32): Delete. (__arm_vldrhq_gather_shifted_offset_s16): Delete. (__arm_vldrhq_gather_shifted_offset_u32): Delete. (__arm_vldrhq_gather_shifted_offset_u16): Delete. (__arm_vldrhq_gather_shifted_offset_z_s32): Delete. (__arm_vldrhq_gather_shifted_offset_z_s16): Delete. (__arm_vldrhq_gather_shifted_offset_z_u32): Delete. (__arm_vldrhq_gather_shifted_offset_z_u16): Delete. (__arm_vldrdq_gather_shifted_offset_s64): Delete. (__arm_vldrdq_gather_shifted_offset_u64): Delete. (__arm_vldrdq_gather_shifted_offset_z_s64): Delete. (__arm_vldrdq_gather_shifted_offset_z_u64): Delete. (__arm_vldrwq_gather_shifted_offset_s32): Delete. (__arm_vldrwq_gather_shifted_offset_u32): Delete. (__arm_vldrwq_gather_shifted_offset_z_s32): Delete. (__arm_vldrwq_gather_shifted_offset_z_u32): Delete. (__arm_vldrhq_gather_shifted_offset_f16): Delete. (__arm_vldrhq_gather_shifted_offset_z_f16): Delete. (__arm_vldrwq_gather_shifted_offset_f32): Delete. (__arm_vldrwq_gather_shifted_offset_z_f32): Delete. (__arm_vldrhq_gather_shifted_offset): Delete. (__arm_vldrhq_gather_shifted_offset_z): Delete. (__arm_vldrdq_gather_shifted_offset): Delete. (__arm_vldrdq_gather_shifted_offset_z): Delete. (__arm_vldrwq_gather_shifted_offset): Delete. (__arm_vldrwq_gather_shifted_offset_z): Delete. 
* config/arm/arm_mve_builtins.def (vldrhq_gather_shifted_offset_z_u, vldrhq_gather_shifted_offset_u) (vldrhq_gather_shifted_offset_z_s, vldrhq_gather_shifted_offset_s) (vldrdq_gather_shifted_offset_s, vldrhq_gather_shifted_offset_f) (vldrwq_gather_shifted_offset_f, vldrwq_gather_shifted_offset_s) (vldrdq_gather_shifted_offset_z_s) (vldrhq_gather_shifted_offset_z_f) (vldrwq_gather_shifted_offset_z_f) (vldrwq_gather_shifted_offset_z_s, vldrdq_gather_shifted_offset_u) (vldrwq_gather_shifted_offset_u, vldrdq_gather_shifted_offset_z_u) (vldrwq_gather_shifted_offset_z_u): Delete. * config/arm/iterators.md (supf): Remove VLDRHQGSO_S, VLDRHQGSO_U, VLDRDQGSO_S, VLDRDQGSO_U, VLDRWQGSO_S, VLDRWQGSO_U. (VLDRHGSOQ, VLDRDGSOQ, VLDRWGSOQ): Delete. * config/arm/mve.md (mve_vldrhq_gather_shifted_offset_<supf><mode>): Delete. (mve_vldrhq_gather_shifted_offset_z_<supf><mode>): Delete. (mve_vldrdq_gather_shifted_offset_<supf>v2di): Delete. (mve_vldrdq_gather_shifted_offset_z_<supf>v2di): Delete. (mve_vldrhq_gather_shifted_offset_fv8hf): Delete. (mve_vldrhq_gather_shifted_offset_z_fv8hf): Delete. (mve_vldrwq_gather_shifted_offset_fv4sf): Delete. (mve_vldrwq_gather_shifted_offset_<supf>v4si): Delete. (mve_vldrwq_gather_shifted_offset_z_fv4sf): Delete. (mve_vldrwq_gather_shifted_offset_z_<supf>v4si): Delete. (@mve_vldrq_gather_shifted_offset_<mode>): New. (@mve_vldrq_gather_shifted_offset_extend_v4si<US>): New. (@mve_vldrq_gather_shifted_offset_z_<mode>): New. (@mve_vldrq_gather_shifted_offset_z_extend_v4si<US>): New. * config/arm/unspecs.md (VLDRHQGSO_S, VLDRHQGSO_U, VLDRDQGSO_S) (VLDRDQGSO_U, VLDRHQGSO_F, VLDRWQGSO_F, VLDRWQGSO_S, VLDRWQGSO_U): Delete. (VLDRGSOQ, VLDRGSOQ_Z, VLDRGSOQ_EXT, VLDRGSOQ_EXT_Z): New.
2024-12-13 | arm: [MVE intrinsics] rework vldr gather_offset | Christophe Lyon | 8 | -884/+156
Implement vldr?q_gather_offset using the new MVE builtins framework. The patch introduces a new attribute iterator (MVE_u_elem) to accomodate the fact that ACLE's expected output description uses "uNN" for all modes, except V8HF where it expects ".f16". Using "V_sz_elem" would work, but would require to update several testcases. gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (class vldrq_gather_impl): New. (vldrbq_gather, vldrdq_gather, vldrhq_gather, vldrwq_gather): New. * config/arm/arm-mve-builtins-base.def (vldrbq_gather) (vldrdq_gather, vldrhq_gather, vldrwq_gather): New. * config/arm/arm-mve-builtins-base.h (vldrbq_gather) (vldrdq_gather, vldrhq_gather, vldrwq_gather): New. * config/arm/arm_mve.h (vldrbq_gather_offset): Delete. (vldrbq_gather_offset_z): Delete. (vldrhq_gather_offset): Delete. (vldrhq_gather_offset_z): Delete. (vldrdq_gather_offset): Delete. (vldrdq_gather_offset_z): Delete. (vldrwq_gather_offset): Delete. (vldrwq_gather_offset_z): Delete. (vldrbq_gather_offset_u8): Delete. (vldrbq_gather_offset_s8): Delete. (vldrbq_gather_offset_u16): Delete. (vldrbq_gather_offset_s16): Delete. (vldrbq_gather_offset_u32): Delete. (vldrbq_gather_offset_s32): Delete. (vldrbq_gather_offset_z_s16): Delete. (vldrbq_gather_offset_z_u8): Delete. (vldrbq_gather_offset_z_s32): Delete. (vldrbq_gather_offset_z_u16): Delete. (vldrbq_gather_offset_z_u32): Delete. (vldrbq_gather_offset_z_s8): Delete. (vldrhq_gather_offset_s32): Delete. (vldrhq_gather_offset_s16): Delete. (vldrhq_gather_offset_u32): Delete. (vldrhq_gather_offset_u16): Delete. (vldrhq_gather_offset_z_s32): Delete. (vldrhq_gather_offset_z_s16): Delete. (vldrhq_gather_offset_z_u32): Delete. (vldrhq_gather_offset_z_u16): Delete. (vldrdq_gather_offset_s64): Delete. (vldrdq_gather_offset_u64): Delete. (vldrdq_gather_offset_z_s64): Delete. (vldrdq_gather_offset_z_u64): Delete. (vldrhq_gather_offset_f16): Delete. (vldrhq_gather_offset_z_f16): Delete. (vldrwq_gather_offset_f32): Delete. (vldrwq_gather_offset_s32): Delete. (vldrwq_gather_offset_u32): Delete. (vldrwq_gather_offset_z_f32): Delete. (vldrwq_gather_offset_z_s32): Delete. (vldrwq_gather_offset_z_u32): Delete. (__arm_vldrbq_gather_offset_u8): Delete. (__arm_vldrbq_gather_offset_s8): Delete. (__arm_vldrbq_gather_offset_u16): Delete. (__arm_vldrbq_gather_offset_s16): Delete. (__arm_vldrbq_gather_offset_u32): Delete. (__arm_vldrbq_gather_offset_s32): Delete. (__arm_vldrbq_gather_offset_z_s8): Delete. (__arm_vldrbq_gather_offset_z_s32): Delete. (__arm_vldrbq_gather_offset_z_s16): Delete. (__arm_vldrbq_gather_offset_z_u8): Delete. (__arm_vldrbq_gather_offset_z_u32): Delete. (__arm_vldrbq_gather_offset_z_u16): Delete. (__arm_vldrhq_gather_offset_s32): Delete. (__arm_vldrhq_gather_offset_s16): Delete. (__arm_vldrhq_gather_offset_u32): Delete. (__arm_vldrhq_gather_offset_u16): Delete. (__arm_vldrhq_gather_offset_z_s32): Delete. (__arm_vldrhq_gather_offset_z_s16): Delete. (__arm_vldrhq_gather_offset_z_u32): Delete. (__arm_vldrhq_gather_offset_z_u16): Delete. (__arm_vldrdq_gather_offset_s64): Delete. (__arm_vldrdq_gather_offset_u64): Delete. (__arm_vldrdq_gather_offset_z_s64): Delete. (__arm_vldrdq_gather_offset_z_u64): Delete. (__arm_vldrwq_gather_offset_s32): Delete. (__arm_vldrwq_gather_offset_u32): Delete. (__arm_vldrwq_gather_offset_z_s32): Delete. (__arm_vldrwq_gather_offset_z_u32): Delete. (__arm_vldrhq_gather_offset_f16): Delete. (__arm_vldrhq_gather_offset_z_f16): Delete. (__arm_vldrwq_gather_offset_f32): Delete. (__arm_vldrwq_gather_offset_z_f32): Delete. 
(__arm_vldrbq_gather_offset): Delete. (__arm_vldrbq_gather_offset_z): Delete. (__arm_vldrhq_gather_offset): Delete. (__arm_vldrhq_gather_offset_z): Delete. (__arm_vldrdq_gather_offset): Delete. (__arm_vldrdq_gather_offset_z): Delete. (__arm_vldrwq_gather_offset): Delete. (__arm_vldrwq_gather_offset_z): Delete. * config/arm/arm_mve_builtins.def (vldrbq_gather_offset_u) (vldrbq_gather_offset_s, vldrbq_gather_offset_z_s) (vldrbq_gather_offset_z_u, vldrhq_gather_offset_z_u) (vldrhq_gather_offset_u, vldrhq_gather_offset_z_s) (vldrhq_gather_offset_s, vldrdq_gather_offset_s) (vldrhq_gather_offset_f, vldrwq_gather_offset_f) (vldrwq_gather_offset_s, vldrdq_gather_offset_z_s) (vldrhq_gather_offset_z_f, vldrwq_gather_offset_z_f) (vldrwq_gather_offset_z_s, vldrdq_gather_offset_u) (vldrwq_gather_offset_u, vldrdq_gather_offset_z_u) (vldrwq_gather_offset_z_u): Delete. * config/arm/iterators.md (MVE_u_elem): New. (supf): Remove VLDRBQGO_S, VLDRBQGO_U, VLDRHQGO_S, VLDRHQGO_U, VLDRDQGO_S, VLDRDQGO_U, VLDRWQGO_S, VLDRWQGO_U. (VLDRBGOQ, VLDRHGOQ, VLDRDGOQ, VLDRWGOQ): Delete. * config/arm/mve.md (mve_vldrbq_gather_offset_<supf><mode>): Delete. (mve_vldrbq_gather_offset_z_<supf><mode>): Delete. (mve_vldrhq_gather_offset_<supf><mode>): Delete. (mve_vldrhq_gather_offset_z_<supf><mode>): Delete. (mve_vldrdq_gather_offset_<supf>v2di): Delete. (mve_vldrdq_gather_offset_z_<supf>v2di): Delete. (mve_vldrhq_gather_offset_fv8hf): Delete. (mve_vldrhq_gather_offset_z_fv8hf): Delete. (mve_vldrwq_gather_offset_fv4sf): Delete. (mve_vldrwq_gather_offset_<supf>v4si): Delete. (mve_vldrwq_gather_offset_z_fv4sf): Delete. (mve_vldrwq_gather_offset_z_<supf>v4si): Delete. (@mve_vldrq_gather_offset_<mode>): New. (@mve_vldrq_gather_offset_extend_<mode><US>): New. (@mve_vldrq_gather_offset_z_<mode>): New. (@mve_vldrq_gather_offset_z_extend_<mode><US>): New. * config/arm/unspecs.md (VLDRBQGO_S, VLDRBQGO_U, VLDRHQGO_S) (VLDRHQGO_U, VLDRDQGO_S, VLDRDQGO_U, VLDRHQGO_F, VLDRWQGO_F) (VLDRWQGO_S, VLDRWQGO_U): Delete. (VLDRGOQ, VLDRGOQ_Z, VLDRGOQ_EXT, VLDRGOQ_EXT_Z): New.
2024-12-13 | arm: [MVE intrinsics] add load_ext_gather_offset shape | Christophe Lyon | 2 | -0/+59
This patch adds the load_ext_gather_offset shape description. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct load_ext_gather): New. (struct load_ext_gather_offset_def): New. * config/arm/arm-mve-builtins-shapes.h (load_ext_gather_offset): New.
2024-12-13 | arm: [MVE intrinsics] rework vstr scatter_base_wb | Christophe Lyon | 10 | -378/+128
Implement vstr?q_scatter_base_wb using the new MVE builtins framework. The patch introduces a new 'b' type for signatures, which represents the type of the 'base' argument of vstr?q_scatter_base_wb. gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_strsbwbs_qualifiers) (arm_strsbwbu_qualifiers, arm_strsbwbs_p_qualifiers) (arm_strsbwbu_p_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (vstrq_scatter_base_impl): Add support for MODE_wb. * config/arm/arm-mve-builtins-shapes.cc (parse_type): Add support for 'b' type. (store_scatter_base): Add support for MODE_wb. * config/arm/arm-mve-builtins.cc (function_resolver::require_pointer_to_type): New. * config/arm/arm-mve-builtins.h (function_resolver::require_pointer_to_type): New. * config/arm/arm_mve.h (vstrdq_scatter_base_wb): Delete. (vstrdq_scatter_base_wb_p): Delete. (vstrwq_scatter_base_wb_p): Delete. (vstrwq_scatter_base_wb): Delete. (vstrdq_scatter_base_wb_p_s64): Delete. (vstrdq_scatter_base_wb_p_u64): Delete. (vstrdq_scatter_base_wb_s64): Delete. (vstrdq_scatter_base_wb_u64): Delete. (vstrwq_scatter_base_wb_p_s32): Delete. (vstrwq_scatter_base_wb_p_f32): Delete. (vstrwq_scatter_base_wb_p_u32): Delete. (vstrwq_scatter_base_wb_s32): Delete. (vstrwq_scatter_base_wb_u32): Delete. (vstrwq_scatter_base_wb_f32): Delete. (__arm_vstrdq_scatter_base_wb_s64): Delete. (__arm_vstrdq_scatter_base_wb_u64): Delete. (__arm_vstrdq_scatter_base_wb_p_s64): Delete. (__arm_vstrdq_scatter_base_wb_p_u64): Delete. (__arm_vstrwq_scatter_base_wb_p_s32): Delete. (__arm_vstrwq_scatter_base_wb_p_u32): Delete. (__arm_vstrwq_scatter_base_wb_s32): Delete. (__arm_vstrwq_scatter_base_wb_u32): Delete. (__arm_vstrwq_scatter_base_wb_f32): Delete. (__arm_vstrwq_scatter_base_wb_p_f32): Delete. (__arm_vstrdq_scatter_base_wb): Delete. (__arm_vstrdq_scatter_base_wb_p): Delete. (__arm_vstrwq_scatter_base_wb_p): Delete. (__arm_vstrwq_scatter_base_wb): Delete. * config/arm/arm_mve_builtins.def (vstrwq_scatter_base_wb_u) (vstrdq_scatter_base_wb_u, vstrwq_scatter_base_wb_p_u) (vstrdq_scatter_base_wb_p_u, vstrwq_scatter_base_wb_s) (vstrwq_scatter_base_wb_f, vstrdq_scatter_base_wb_s) (vstrwq_scatter_base_wb_p_s, vstrwq_scatter_base_wb_p_f) (vstrdq_scatter_base_wb_p_s): Delete. * config/arm/iterators.md (supf): Remove VSTRWQSBWB_S, VSTRWQSBWB_U, VSTRDQSBWB_S, VSTRDQSBWB_U. (VSTRDSBQ, VSTRWSBWBQ, VSTRDSBWBQ): Delete. * config/arm/mve.md (mve_vstrwq_scatter_base_wb_<supf>v4si): Delete. (mve_vstrwq_scatter_base_wb_p_<supf>v4si): Delete. (mve_vstrwq_scatter_base_wb_fv4sf): Delete. (mve_vstrwq_scatter_base_wb_p_fv4sf): Delete. (mve_vstrdq_scatter_base_wb_<supf>v2di): Delete. (mve_vstrdq_scatter_base_wb_p_<supf>v2di): Delete. (@mve_vstrq_scatter_base_wb_<mode>): New. (@mve_vstrq_scatter_base_wb_p_<mode>): New. * config/arm/unspecs.md (VSTRWQSBWB_S, VSTRWQSBWB_U, VSTRWQSBWB_F) (VSTRDQSBWB_S, VSTRDQSBWB_U): Delete. (VSTRSBWBQ, VSTRSBWBQ_P): New.
2024-12-13 | arm: [MVE intrinsics] rework vstr scatter_base | Christophe Lyon | 9 | -360/+72
Implement vstr?q_scatter_base using the new MVE builtins framework. We need to introduce a new iterator (MVE_4) to support the set needed by vstr?q_scatter_base (V4SI V4SF V2DI). gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_strsbs_qualifiers) (arm_strsbu_qualifiers, arm_strsbs_p_qualifiers) (arm_strsbu_p_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_base_impl): New. (vstrwq_scatter_base, vstrdq_scatter_base): New. * config/arm/arm-mve-builtins-base.def (vstrwq_scatter_base) (vstrdq_scatter_base): New. * config/arm/arm-mve-builtins-base.h (vstrwq_scatter_base) (vstrdq_scatter_base): New. * config/arm/arm_mve.h (vstrwq_scatter_base): Delete. (vstrwq_scatter_base_p): Delete. (vstrdq_scatter_base_p): Delete. (vstrdq_scatter_base): Delete. (vstrwq_scatter_base_s32): Delete. (vstrwq_scatter_base_u32): Delete. (vstrwq_scatter_base_p_s32): Delete. (vstrwq_scatter_base_p_u32): Delete. (vstrdq_scatter_base_p_s64): Delete. (vstrdq_scatter_base_p_u64): Delete. (vstrdq_scatter_base_s64): Delete. (vstrdq_scatter_base_u64): Delete. (vstrwq_scatter_base_f32): Delete. (vstrwq_scatter_base_p_f32): Delete. (__arm_vstrwq_scatter_base_s32): Delete. (__arm_vstrwq_scatter_base_u32): Delete. (__arm_vstrwq_scatter_base_p_s32): Delete. (__arm_vstrwq_scatter_base_p_u32): Delete. (__arm_vstrdq_scatter_base_p_s64): Delete. (__arm_vstrdq_scatter_base_p_u64): Delete. (__arm_vstrdq_scatter_base_s64): Delete. (__arm_vstrdq_scatter_base_u64): Delete. (__arm_vstrwq_scatter_base_f32): Delete. (__arm_vstrwq_scatter_base_p_f32): Delete. (__arm_vstrwq_scatter_base): Delete. (__arm_vstrwq_scatter_base_p): Delete. (__arm_vstrdq_scatter_base_p): Delete. (__arm_vstrdq_scatter_base): Delete. * config/arm/arm_mve_builtins.def (vstrwq_scatter_base_s) (vstrwq_scatter_base_u, vstrwq_scatter_base_p_s) (vstrwq_scatter_base_p_u, vstrdq_scatter_base_s) (vstrwq_scatter_base_f, vstrdq_scatter_base_p_s) (vstrwq_scatter_base_p_f, vstrdq_scatter_base_u) (vstrdq_scatter_base_p_u): Delete. * config/arm/iterators.md (MVE_4): New. (supf): Remove VSTRWQSB_S, VSTRWQSB_U. (VSTRWSBQ): Delete. * config/arm/mve.md (mve_vstrwq_scatter_base_<supf>v4si): Delete. (mve_vstrwq_scatter_base_p_<supf>v4si): Delete. (mve_vstrdq_scatter_base_p_<supf>v2di): Delete. (mve_vstrdq_scatter_base_<supf>v2di): Delete. (mve_vstrwq_scatter_base_fv4sf): Delete. (mve_vstrwq_scatter_base_p_fv4sf): Delete. (@mve_vstrq_scatter_base_<mode>): New. (@mve_vstrq_scatter_base_p_<mode>): New. * config/arm/unspecs.md (VSTRWQSB_S, VSTRWQSB_U, VSTRWQSB_F): Delete. (VSTRSBQ, VSTRSBQ_P): New.
2024-12-13 | arm: [MVE intrinsics] Add store_scatter_base shape | Christophe Lyon | 2 | -0/+50
This patch adds the store_scatter_base shape description. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (store_scatter_base): New. * config/arm/arm-mve-builtins-shapes.h (store_scatter_base): New.
2024-12-13 | arm: [MVE intrinsics] Check immediate is a multiple in a range | Christophe Lyon | 2 | -0/+63
This patch adds support to check that an immediate is a multiple of a given value in a given range. This will be used for instance by scatter_base to check that offset is in +/-4*[0..127]. Unlike require_immediate_range, require_immediate_range_multiple accepts signed range bounds to handle the above case. gcc/ChangeLog: * config/arm/arm-mve-builtins.cc (report_out_of_range_multiple): New. (function_checker::require_signed_immediate): New. (function_checker::require_immediate_range_multiple): New. * config/arm/arm-mve-builtins.h (function_checker::require_immediate_range_multiple): New. (function_checker::require_signed_immediate): New.
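A minimal sketch of what such a check amounts to, using the scatter_base case quoted above (multiples of 4 in +/-4*[0..127]); the helper name and signature below are illustrative only, not the ones added by the patch:

    /* Accept IMM if it is a multiple of FACTOR inside the signed
       range [LO, HI].  */
    static bool
    multiple_in_range_p (long imm, long factor, long lo, long hi)
    {
      return imm % factor == 0 && lo <= imm && imm <= hi;
    }

    /* e.g. scatter_base offsets: multiple_in_range_p (off, 4, -4 * 127, 4 * 127).  */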
2024-12-13 | arm: [MVE intrinsics] rework vstr_scatter_shifted_offset | Christophe Lyon | 9 | -786/+103
Implement vstr?q_scatter_shifted_offset intrinsics using the MVE builtins framework. We use the same approach as the previous patch, and we now have four sets of patterns: - vector scatter stores with shifted offset (non-truncating) - predicated vector scatter stores with shifted offset (non-truncating) - truncating vector scatter stores with shifted offset - predicated truncating vector scatter stores with shifted offset Note that the truncating patterns do not use an iterator since there is only one such variant: V4SI to V4HI. We need to introduce new iterators: - MVE_VLD_ST_scatter_shifted, same as MVE_VLD_ST_scatter without V16QI - MVE_scatter_shift to map the mode to the shift amount gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_strss_qualifiers) (arm_strsu_qualifiers, arm_strsu_p_qualifiers) (arm_strss_p_qualifiers): Delete. * config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl): Add support for shifted version. (vstrdq_scatter_shifted, vstrhq_scatter_shifted) (vstrwq_scatter_shifted): New. * config/arm/arm-mve-builtins-base.def (vstrhq_scatter_shifted) (vstrwq_scatter_shifted, vstrdq_scatter_shifted): New. * config/arm/arm-mve-builtins-base.h (vstrhq_scatter_shifted) (vstrwq_scatter_shifted, vstrdq_scatter_shifted): New. * config/arm/arm_mve.h (vstrhq_scatter_shifted_offset): Delete. (vstrhq_scatter_shifted_offset_p): Delete. (vstrdq_scatter_shifted_offset_p): Delete. (vstrdq_scatter_shifted_offset): Delete. (vstrwq_scatter_shifted_offset_p): Delete. (vstrwq_scatter_shifted_offset): Delete. (vstrhq_scatter_shifted_offset_s32): Delete. (vstrhq_scatter_shifted_offset_s16): Delete. (vstrhq_scatter_shifted_offset_u32): Delete. (vstrhq_scatter_shifted_offset_u16): Delete. (vstrhq_scatter_shifted_offset_p_s32): Delete. (vstrhq_scatter_shifted_offset_p_s16): Delete. (vstrhq_scatter_shifted_offset_p_u32): Delete. (vstrhq_scatter_shifted_offset_p_u16): Delete. (vstrdq_scatter_shifted_offset_p_s64): Delete. (vstrdq_scatter_shifted_offset_p_u64): Delete. (vstrdq_scatter_shifted_offset_s64): Delete. (vstrdq_scatter_shifted_offset_u64): Delete. (vstrhq_scatter_shifted_offset_f16): Delete. (vstrhq_scatter_shifted_offset_p_f16): Delete. (vstrwq_scatter_shifted_offset_f32): Delete. (vstrwq_scatter_shifted_offset_p_f32): Delete. (vstrwq_scatter_shifted_offset_p_s32): Delete. (vstrwq_scatter_shifted_offset_p_u32): Delete. (vstrwq_scatter_shifted_offset_s32): Delete. (vstrwq_scatter_shifted_offset_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_s32): Delete. (__arm_vstrhq_scatter_shifted_offset_s16): Delete. (__arm_vstrhq_scatter_shifted_offset_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_u16): Delete. (__arm_vstrhq_scatter_shifted_offset_p_s32): Delete. (__arm_vstrhq_scatter_shifted_offset_p_s16): Delete. (__arm_vstrhq_scatter_shifted_offset_p_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_p_u16): Delete. (__arm_vstrdq_scatter_shifted_offset_p_s64): Delete. (__arm_vstrdq_scatter_shifted_offset_p_u64): Delete. (__arm_vstrdq_scatter_shifted_offset_s64): Delete. (__arm_vstrdq_scatter_shifted_offset_u64): Delete. (__arm_vstrwq_scatter_shifted_offset_p_s32): Delete. (__arm_vstrwq_scatter_shifted_offset_p_u32): Delete. (__arm_vstrwq_scatter_shifted_offset_s32): Delete. (__arm_vstrwq_scatter_shifted_offset_u32): Delete. (__arm_vstrhq_scatter_shifted_offset_f16): Delete. (__arm_vstrhq_scatter_shifted_offset_p_f16): Delete. (__arm_vstrwq_scatter_shifted_offset_f32): Delete. (__arm_vstrwq_scatter_shifted_offset_p_f32): Delete. (__arm_vstrhq_scatter_shifted_offset): Delete. 
(__arm_vstrhq_scatter_shifted_offset_p): Delete. (__arm_vstrdq_scatter_shifted_offset_p): Delete. (__arm_vstrdq_scatter_shifted_offset): Delete. (__arm_vstrwq_scatter_shifted_offset_p): Delete. (__arm_vstrwq_scatter_shifted_offset): Delete. * config/arm/arm_mve_builtins.def (vstrhq_scatter_shifted_offset_p_u) (vstrhq_scatter_shifted_offset_u) (vstrhq_scatter_shifted_offset_p_s) (vstrhq_scatter_shifted_offset_s, vstrdq_scatter_shifted_offset_s) (vstrhq_scatter_shifted_offset_f, vstrwq_scatter_shifted_offset_f) (vstrwq_scatter_shifted_offset_s) (vstrdq_scatter_shifted_offset_p_s) (vstrhq_scatter_shifted_offset_p_f) (vstrwq_scatter_shifted_offset_p_f) (vstrwq_scatter_shifted_offset_p_s) (vstrdq_scatter_shifted_offset_u, vstrwq_scatter_shifted_offset_u) (vstrdq_scatter_shifted_offset_p_u) (vstrwq_scatter_shifted_offset_p_u): Delete. * config/arm/iterators.md (MVE_VLD_ST_scatter_shifted): New. (MVE_scatter_shift): New. (supf): Remove VSTRHQSSO_S, VSTRHQSSO_U, VSTRDQSSO_S, VSTRDQSSO_U, VSTRWQSSO_U, VSTRWQSSO_S. (VSTRHSSOQ, VSTRDSSOQ, VSTRWSSOQ): Delete. * config/arm/mve.md (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>): Delete. (mve_vstrhq_scatter_shifted_offset_p_<supf><mode>_insn): Delete. (mve_vstrhq_scatter_shifted_offset_<supf><mode>): Delete. (mve_vstrhq_scatter_shifted_offset_<supf><mode>_insn): Delete. (mve_vstrdq_scatter_shifted_offset_p_<supf>v2di): Delete. (mve_vstrdq_scatter_shifted_offset_p_<supf>v2di_insn): Delete. (mve_vstrdq_scatter_shifted_offset_<supf>v2di): Delete. (mve_vstrdq_scatter_shifted_offset_<supf>v2di_insn): Delete. (mve_vstrhq_scatter_shifted_offset_fv8hf): Delete. (mve_vstrhq_scatter_shifted_offset_fv8hf_insn): Delete. (mve_vstrhq_scatter_shifted_offset_p_fv8hf): Delete. (mve_vstrhq_scatter_shifted_offset_p_fv8hf_insn): Delete. (mve_vstrwq_scatter_shifted_offset_fv4sf): Delete. (mve_vstrwq_scatter_shifted_offset_fv4sf_insn): Delete. (mve_vstrwq_scatter_shifted_offset_p_fv4sf): Delete. (mve_vstrwq_scatter_shifted_offset_p_fv4sf_insn): Delete. (mve_vstrwq_scatter_shifted_offset_p_<supf>v4si): Delete. (mve_vstrwq_scatter_shifted_offset_p_<supf>v4si_insn): Delete. (mve_vstrwq_scatter_shifted_offset_<supf>v4si): Delete. (mve_vstrwq_scatter_shifted_offset_<supf>v4si_insn): Delete. (@mve_vstrq_scatter_shifted_offset_<mode>): New. (@mve_vstrq_scatter_shifted_offset_p_<mode>): New. (mve_vstrq_truncate_scatter_shifted_offset_v4si): New. (mve_vstrq_truncate_scatter_shifted_offset_p_v4si): New. * config/arm/unspecs.md (VSTRDQSSO_S, VSTRDQSSO_U, VSTRWQSSO_S) (VSTRWQSSO_U, VSTRHQSSO_F, VSTRWQSSO_F, VSTRHQSSO_S, VSTRHQSSO_U): Delete. (VSTRSSOQ, VSTRSSOQ_P, VSTRSSOQ_TRUNC, VSTRSSOQ_TRUNC_P): New.
2024-12-13 | arm: [MVE intrinsics] rework vstr?q_scatter_offset | Christophe Lyon | 9 | -1017/+143
This patch implements vstr?q_scatter_offset using the new MVE builtins framework. It uses a similar approach to a previous patch which grouped truncating and non-truncating stores in two sets of patterns, rather than having groups of patterns depending on the destination size. We need to add the 'integer_64' types of suffixes in order to support vstrdq_scatter_offset. The patch introduces the MVE_VLD_ST_scatter iterator, similar to MVE_VLD_ST but which also includes V2DI (again, for vstrdq_scatter_offset). The new MVE_scatter_offset mode attribute is used to map the destination type to the offset type (both are usually equal, except when the destination is floating-point). We end up with four sets of patterns: - vector scatter stores with offset (non-truncating) - predicated vector scatter stores with offset (non-truncating) - truncating vector scatter stores with offset - predicated truncating vector scatter stores with offset gcc/ChangeLog: * config/arm/arm-mve-builtins-base.cc (class vstrq_scatter_impl): New. (vstrbq_scatter, vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New. * config/arm/arm-mve-builtins-base.def (vstrbq_scatter) (vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New. * config/arm/arm-mve-builtins-base.h (vstrbq_scatter) (vstrhq_scatter, vstrwq_scatter, vstrdq_scatter): New. * config/arm/arm-mve-builtins.cc (integer_64): New. * config/arm/arm_mve.h (vstrbq_scatter_offset): Delete. (vstrbq_scatter_offset_p): Delete. (vstrhq_scatter_offset): Delete. (vstrhq_scatter_offset_p): Delete. (vstrdq_scatter_offset_p): Delete. (vstrdq_scatter_offset): Delete. (vstrwq_scatter_offset_p): Delete. (vstrwq_scatter_offset): Delete. (vstrbq_scatter_offset_s8): Delete. (vstrbq_scatter_offset_u8): Delete. (vstrbq_scatter_offset_u16): Delete. (vstrbq_scatter_offset_s16): Delete. (vstrbq_scatter_offset_u32): Delete. (vstrbq_scatter_offset_s32): Delete. (vstrbq_scatter_offset_p_s8): Delete. (vstrbq_scatter_offset_p_s32): Delete. (vstrbq_scatter_offset_p_s16): Delete. (vstrbq_scatter_offset_p_u8): Delete. (vstrbq_scatter_offset_p_u32): Delete. (vstrbq_scatter_offset_p_u16): Delete. (vstrhq_scatter_offset_s32): Delete. (vstrhq_scatter_offset_s16): Delete. (vstrhq_scatter_offset_u32): Delete. (vstrhq_scatter_offset_u16): Delete. (vstrhq_scatter_offset_p_s32): Delete. (vstrhq_scatter_offset_p_s16): Delete. (vstrhq_scatter_offset_p_u32): Delete. (vstrhq_scatter_offset_p_u16): Delete. (vstrdq_scatter_offset_p_s64): Delete. (vstrdq_scatter_offset_p_u64): Delete. (vstrdq_scatter_offset_s64): Delete. (vstrdq_scatter_offset_u64): Delete. (vstrhq_scatter_offset_f16): Delete. (vstrhq_scatter_offset_p_f16): Delete. (vstrwq_scatter_offset_f32): Delete. (vstrwq_scatter_offset_p_f32): Delete. (vstrwq_scatter_offset_p_s32): Delete. (vstrwq_scatter_offset_p_u32): Delete. (vstrwq_scatter_offset_s32): Delete. (vstrwq_scatter_offset_u32): Delete. (__arm_vstrbq_scatter_offset_s8): Delete. (__arm_vstrbq_scatter_offset_s32): Delete. (__arm_vstrbq_scatter_offset_s16): Delete. (__arm_vstrbq_scatter_offset_u8): Delete. (__arm_vstrbq_scatter_offset_u32): Delete. (__arm_vstrbq_scatter_offset_u16): Delete. (__arm_vstrbq_scatter_offset_p_s8): Delete. (__arm_vstrbq_scatter_offset_p_s32): Delete. (__arm_vstrbq_scatter_offset_p_s16): Delete. (__arm_vstrbq_scatter_offset_p_u8): Delete. (__arm_vstrbq_scatter_offset_p_u32): Delete. (__arm_vstrbq_scatter_offset_p_u16): Delete. (__arm_vstrhq_scatter_offset_s32): Delete. (__arm_vstrhq_scatter_offset_s16): Delete. (__arm_vstrhq_scatter_offset_u32): Delete. 
(__arm_vstrhq_scatter_offset_u16): Delete. (__arm_vstrhq_scatter_offset_p_s32): Delete. (__arm_vstrhq_scatter_offset_p_s16): Delete. (__arm_vstrhq_scatter_offset_p_u32): Delete. (__arm_vstrhq_scatter_offset_p_u16): Delete. (__arm_vstrdq_scatter_offset_p_s64): Delete. (__arm_vstrdq_scatter_offset_p_u64): Delete. (__arm_vstrdq_scatter_offset_s64): Delete. (__arm_vstrdq_scatter_offset_u64): Delete. (__arm_vstrwq_scatter_offset_p_s32): Delete. (__arm_vstrwq_scatter_offset_p_u32): Delete. (__arm_vstrwq_scatter_offset_s32): Delete. (__arm_vstrwq_scatter_offset_u32): Delete. (__arm_vstrhq_scatter_offset_f16): Delete. (__arm_vstrhq_scatter_offset_p_f16): Delete. (__arm_vstrwq_scatter_offset_f32): Delete. (__arm_vstrwq_scatter_offset_p_f32): Delete. (__arm_vstrbq_scatter_offset): Delete. (__arm_vstrbq_scatter_offset_p): Delete. (__arm_vstrhq_scatter_offset): Delete. (__arm_vstrhq_scatter_offset_p): Delete. (__arm_vstrdq_scatter_offset_p): Delete. (__arm_vstrdq_scatter_offset): Delete. (__arm_vstrwq_scatter_offset_p): Delete. (__arm_vstrwq_scatter_offset): Delete. * config/arm/arm_mve_builtins.def (vstrbq_scatter_offset_s) (vstrbq_scatter_offset_u, vstrbq_scatter_offset_p_s) (vstrbq_scatter_offset_p_u, vstrhq_scatter_offset_p_u) (vstrhq_scatter_offset_u, vstrhq_scatter_offset_p_s) (vstrhq_scatter_offset_s, vstrdq_scatter_offset_s) (vstrhq_scatter_offset_f, vstrwq_scatter_offset_f) (vstrwq_scatter_offset_s, vstrdq_scatter_offset_p_s) (vstrhq_scatter_offset_p_f, vstrwq_scatter_offset_p_f) (vstrwq_scatter_offset_p_s, vstrdq_scatter_offset_u) (vstrwq_scatter_offset_u, vstrdq_scatter_offset_p_u) (vstrwq_scatter_offset_p_u) Delete. * config/arm/iterators.md (MVE_VLD_ST_scatter): New. (MVE_scatter_offset): New. (MVE_elem_ch): Add entry for V2DI. (supf): Remove VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S, VSTRHQSO_U, VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_U, VSTRWQSO_S. (VSTRBSOQ, VSTRHSOQ, VSTRDSOQ, VSTRWSOQ): Delete. * config/arm/mve.md (mve_vstrbq_scatter_offset_<supf><mode>): Delete. (mve_vstrbq_scatter_offset_<supf><mode>_insn): Delete. (mve_vstrbq_scatter_offset_p_<supf><mode>): Delete. (mve_vstrbq_scatter_offset_p_<supf><mode>_insn): Delete. (mve_vstrhq_scatter_offset_p_<supf><mode>): Delete. (mve_vstrhq_scatter_offset_p_<supf><mode>_insn): Delete. (mve_vstrhq_scatter_offset_<supf><mode>): Delete. (mve_vstrhq_scatter_offset_<supf><mode>_insn): Delete. (mve_vstrdq_scatter_offset_p_<supf>v2di): Delete. (mve_vstrdq_scatter_offset_p_<supf>v2di_insn): Delete. (mve_vstrdq_scatter_offset_<supf>v2di): Delete. (mve_vstrdq_scatter_offset_<supf>v2di_insn): Delete. (mve_vstrhq_scatter_offset_fv8hf): Delete. (mve_vstrhq_scatter_offset_fv8hf_insn): Delete. (mve_vstrhq_scatter_offset_p_fv8hf): Delete. (mve_vstrhq_scatter_offset_p_fv8hf_insn): Delete. (mve_vstrwq_scatter_offset_fv4sf): Delete. (mve_vstrwq_scatter_offset_fv4sf_insn): Delete. (mve_vstrwq_scatter_offset_p_fv4sf): Delete. (mve_vstrwq_scatter_offset_p_fv4sf_insn): Delete. (mve_vstrwq_scatter_offset_p_<supf>v4si): Delete. (mve_vstrwq_scatter_offset_p_<supf>v4si_insn): Delete. (mve_vstrwq_scatter_offset_<supf>v4si): Delete. (mve_vstrwq_scatter_offset_<supf>v4si_insn): Delete. (@mve_vstrq_scatter_offset_<mode>): New. (@mve_vstrq_scatter_offset_p_<mode>): New. (@mve_vstrq_truncate_scatter_offset_<mode>): New. (@mve_vstrq_truncate_scatter_offset_p_<mode>): New. * config/arm/unspecs.md (VSTRBQSO_S, VSTRBQSO_U, VSTRHQSO_S) (VSTRDQSO_S, VSTRDQSO_U, VSTRWQSO_S, VSTRWQSO_U, VSTRHQSO_F) (VSTRWQSO_F, VSTRHQSO_U): Delete. (VSTRQSO, VSTRQSO_P, VSTRQSO_TRUNC, VSTRQSO_TRUNC_P): New.
2024-12-13 | arm: [MVE intrinsics] add store_scatter_offset shape | Christophe Lyon | 2 | -0/+65
This patch adds the store_scatter_offset shape and uses a new helper class (store_scatter), which will also be used by later patches. gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct store_scatter): New. (struct store_scatter_offset_def): New. * config/arm/arm-mve-builtins-shapes.h (store_scatter_offset): New.
2024-12-13 | arm: [MVE intrinsics] add mode_after_pred helper in function_shape | Christophe Lyon | 3 | -1/+21
This new helper returns true if the mode suffix goes after the predicate suffix. This is true in most cases, so the base implementations in nonoverloaded_base and overloaded_base return true. For instance: vaddq_m_n_s32. This will be useful in later patches to implement vstr?q_scatter_offset_p (_p appears after _offset). gcc/ChangeLog: * config/arm/arm-mve-builtins-shapes.cc (struct nonoverloaded_base): Implement mode_after_pred. (struct overloaded_base): Likewise. * config/arm/arm-mve-builtins.cc (function_builder::get_name): Call mode_after_pred as needed. * config/arm/arm-mve-builtins.h (function_shape): Add mode_after_pred.
2024-12-13 | AArch64: Set L1 data cache size according to size on CPUs | Tamar Christina | 9 | -22/+9
This sets the L1 data cache size for some cores based on their size in their Technical Reference Manuals. Today the port minimum is 256 bytes as explained in commit g:9a99559a478111f7fbeec29bd78344df7651c707; however, like Neoverse V2, most cores actually define the L1 cache size as 64 bytes. The generic Armv9-A model was already changed in g:f000cb8cbc58b23a91c84d47d69481904981a1d9 and this change follows suit for a few other cores based on their TRMs. This results in less memory pressure when running on large core count machines. gcc/ChangeLog: * config/aarch64/tuning_models/cortexx925.h: Set L1 cache size to 64b. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen1.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. (neoversev2_prefetch_tune): Removed. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise.
2024-12-13 | AArch64: Add CMP+CSEL and CMP+CSET for cores that support it | Tamar Christina | 10 | -9/+17
GCC 15 added two new fusions, CMP+CSEL and CMP+CSET. This patch enables them for cores that support it, based on their Software Optimization Guides, and generically on Armv9-A. Even if a core does not support it there's no negative performance impact. gcc/ChangeLog: * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSE_NEOVERSE_BASE): New. * config/aarch64/tuning_models/neoverse512tvb.h: Use it. * config/aarch64/tuning_models/neoversen2.h: Use it. * config/aarch64/tuning_models/neoversen3.h: Use it. * config/aarch64/tuning_models/neoversev1.h: Use it. * config/aarch64/tuning_models/neoversev2.h: Use it. * config/aarch64/tuning_models/neoversev3.h: Use it. * config/aarch64/tuning_models/neoversev3ae.h: Use it. * config/aarch64/tuning_models/cortexx925.h: Add fusions. * config/aarch64/tuning_models/generic_armv9_a.h: Add fusions.
2024-12-13 | i386: Add vec_fm{addsub,subadd}v2sf4 patterns [PR116979] | Jakub Jelinek | 1 | -0/+48
As mentioned in the PR, the addition of vec_addsubv2sf3 expander caused the testcase to be vectorized and no longer to use fma. The following patch adds new expanders so that it can be vectorized again with the alternating add/sub fma instructions. There is some bug on the slp cost computation side which causes it not to count some scalar multiplication costs, but I think the patch is desirable anyway before that is fixed and the testcase for now just uses -fvect-cost-model=unlimited. 2024-12-13 Jakub Jelinek <jakub@redhat.com> PR target/116979 * config/i386/mmx.md (vec_fmaddsubv2sf4, vec_fmsubaddv2sf4): New define_expand patterns. * gcc.target/i386/pr116979.c: New test.
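A loop with the alternating add/sub shape these expanders target might look as follows (a sketch; the actual PR116979 testcase may differ, and as noted above vectorizing this form currently also needs -fvect-cost-model=unlimited):

    /* Even element: a*b - c, odd element: a*b + c -> fmaddsub.  */
    void
    foo (float *restrict r, const float *a, const float *b, const float *c)
    {
      r[0] = a[0] * b[0] - c[0];
      r[1] = a[1] * b[1] + c[1];
    }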
2024-12-13 | RISC-V: Improve slide1up pattern. | Robin Dapp | 3 | -15/+56
This patch adds a second variant to implement the extract/slide1up pattern. In order to do a permutation like <3, 4, 5, 6> from vectors <0, 1, 2, 3> and <4, 5, 6, 7> we currently extract <3> from the first vector and re-insert it into the second vector. Unless register-file crossing latency is essentially zero it should be preferable to first slide the second vector up by one, then slide down the first vector by (nunits - 1). gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_register_move_cost): Export. * config/riscv/riscv-v.cc (shuffle_extract_and_slide1up_patterns): Rename... (shuffle_off_by_one_patterns): ... to this and add slideup/slidedown variant. (expand_vec_perm_const_1): Call renamed function. * config/riscv/riscv.cc (riscv_secondary_memory_needed): Remove static. (riscv_register_move_cost): Add VR<->GR/FR handling. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr112599-2.c: Adjust test expectation.
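The permutation in question can be written with GCC's generic vector extensions as below (an illustrative sketch; such permutes can of course also arise from the vectorizer rather than explicit shuffles):

    typedef int v4si __attribute__ ((vector_size (16)));

    /* <3, 4, 5, 6>: last element of A followed by the first three of B.  */
    v4si
    shift_concat (v4si a, v4si b)
    {
      return __builtin_shufflevector (a, b, 3, 4, 5, 6);
    }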
2024-12-13 | RISC-V: Add even/odd vec_perm_const pattern. | Robin Dapp | 1 | -0/+66
This adds handling for even/odd patterns. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_even_odd_patterns): New function. (expand_vec_perm_const_1): Use new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-evenodd-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-evenodd.c: New test.
2024-12-13 | RISC-V: Add interleave pattern. | Robin Dapp | 1 | -0/+80
This patch adds efficient handling of interleaving patterns like [0 4 1 5] to vec_perm_const. It is implemented by a slideup and a gather. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_interleave_patterns): New function. (expand_vec_perm_const_1): Use new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-interleave-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-interleave.c: New test.
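An illustrative interleave (zip) permutation of the [0 4 1 5] kind, again written with generic vectors:

    typedef int v4si __attribute__ ((vector_size (16)));

    /* { a[0], b[0], a[1], b[1] }: low halves of A and B interleaved.  */
    v4si
    zip_lo (v4si a, v4si b)
    {
      return __builtin_shufflevector (a, b, 0, 4, 1, 5);
    }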
2024-12-13 | RISC-V: Add slide to perm_const strategies. | Robin Dapp | 1 | -0/+99
This patch adds a shuffle_slide_patterns to expand_vec_perm_const. It recognizes permutations like {0, 1, 4, 5} or {2, 3, 6, 7} which can be constructed by a slideup or slidedown of one of the vectors into the other one. gcc/ChangeLog: * config/riscv/riscv-v.cc (shuffle_slide_patterns): New. (expand_vec_perm_const_1): Call new function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-slide-run.c: New test. * gcc.target/riscv/rvv/autovec/vls-vlmax/shuffle-slide.c: New test.
2024-12-13 | RISC-V: Emit vector shift pattern for const_vector [PR117353]. | Robin Dapp | 1 | -3/+5
In PR117353 and PR117878 we expand a const vector during reload. For this we use an unpredicated left shift. Normally an insn like this is split but as we introduce it late and cannot create pseudos anymore it remains unpredicated and is not recognized by the vsetvl pass (where we expect all insns to be in predicated RVV format). This patch directly emits a predicated shift instead. We could distinguish between !lra_in_progress and lra_in_progress and emit an unpredicated shift in the former case but we're not very likely to optimize it anyway so it doesn't seem worth it. PR target/117353 PR target/117878 gcc/ChangeLog: * config/riscv/riscv-v.cc (expand_const_vector): Use predicated instead of simple shift. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr117353.c: New test.
2024-12-13 | RISC-V: Make vector strided load alias all other memories | Pan Li | 1 | -0/+1
The vector strided load doesn't include the (mem:BLK (scratch)) to alias all other memories. This makes the alias analysis only consider the base address of the strided load, so only the store to that base address is kept before the strided load. For example:

    #define STEP 10

    char d[225];
    int e[STEP];

    int main ()
    {
      // store 0, 10, 20, 30, 40, 50, 60, 70, 80, 90
      for (long h = 0; h < STEP; ++h)
        d[h * STEP] = 9;

      // load 30, 40, 50, 60, 70, 80, 90
      // store 3, 4, 5, 6, 7, 8, 9
      for (int h = 3; h < STEP; h += 1)
        e[h] = d[h * STEP];

      if (e[5] != 9)
        {
          __builtin_abort ();
        }

      return 0;
    }

The asm dump will be:

    main:
      lui      a5,%hi(.LANCHOR0)
      addi     a5,a5,%lo(.LANCHOR0)
      li       a4,9
      sb       a4,30(a5)
      addi     a3,a5,30
      vsetivli zero,7,e32,m1,ta,ma
      li       a2,10
      vlse8.v  v2,0(a3),a2   // depends on 30(a5), 40(a5), ... 90(a5) but
                             // only 30(a5) has been promoted before vlse.
                             // This is a store-after-load mistake.
      addi     a3,a5,252
      sb       a4,0(a5)
      sb       a4,10(a5)
      sb       a4,20(a5)
      sb       a4,40(a5)
      vzext.vf4 v1,v2
      sb       a4,50(a5)
      sb       a4,60(a5)
      vse32.v  v1,0(a3)
      li       a0,0
      sb       a4,70(a5)
      sb       a4,80(a5)
      sb       a4,90(a5)
      lw       a5,260(a5)
      beq      a5,a4,.L4
      li       a0,123

After this patch:

    main:
      vsetivli zero,4,e32,m1,ta,ma
      vmv.v.i  v1,9
      lui      a5,%hi(.LANCHOR0)
      addi     a5,a5,%lo(.LANCHOR0)
      addi     a4,a5,244
      vse32.v  v1,0(a4)
      li       a4,9
      sb       a4,0(a5)
      sb       a4,10(a5)
      sb       a4,20(a5)
      sb       a4,30(a5)
      sb       a4,40(a5)
      sb       a4,50(a5)
      sb       a4,60(a5)
      sb       a4,70(a5)
      sb       a4,80(a5)
      sb       a4,90(a5)
      vsetivli zero,3,e32,m1,ta,ma
      addi     a4,a5,70
      li       a3,10
      vlse8.v  v2,0(a4),a3
      addi     a5,a5,260
      li       a0,0
      vzext.vf4 v1,v2
      vse32.v  v1,0(a5)
      ret

The below test suites are passed for this patch:
* The rv64gcv full regression test.

PR target/117990

gcc/ChangeLog:

* config/riscv/vector.md: Add the (mem:BLK (scratch)) to the vector strided load.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr117990-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-12-12 | hppa: Remove extra clobber from divsi3, udivsi3, modsi3 and umodsi3 patterns | John David Anglin | 2 | -70/+16
The $$divI, $$divU, $$remI and $$remU millicode calls clobber r1, r26, r25 and the return link register (r31 or r2). We don't need to clobber any other registers. 2024-12-12 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.cc (pa_emit_hpdiv_const): Clobber r1, r26, r25 and return register. * config/pa/pa.md (divsi3): Revise clobbers and operands. Remove second clobber from div:SI insns. (udivsi3, modsi3, umodsi3): Likewise.
2024-12-12 | AVR: target/118000 - Fix copymem from address-spaces. | Georg-Johann Lay | 1 | -2/+15
* rampz_rtx et al. were missing MEM_VOLATILE_P. This is needed because avr_emit_cpymemhi is setting RAMPZ explicitly with its own insn. * avr_out_cpymem was missing a final RAMPZ = 0 on EBI devices. This only affects the __flash1 ... __flash5 spaces since the other ASes use different routines. gcc/ PR target/118000 * config/avr/avr.cc (avr_init_expanders) <sreg_rtx> <rampd_rtx, rampx_rtx, rampy_rtx, rampz_rtx>: Set MEM_VOLATILE_P. (avr_out_cpymem) [ELPM && EBI]: Restore RAMPZ to 0 after.
2024-12-12 | AVR: Assert minimal required bit width of section_common::flags. | Georg-Johann Lay | 1 | -0/+29
gcc/ * config/avr/avr.cc (avr_ctz): New constexpr function. (section_common::flags): Assert minimal bit width.
2024-12-12 | AVR: target/118001 - Add __flashx as 24-bit named address space. | Georg-Johann Lay | 6 | -114/+361
This patch adds __flashx as a new named address space that allocates objects in .progmemx.data. The handling is mostly the same or similar to that of 24-bit space __memx, except that the asm routines are simpler and more efficient. Loads are emit inline when ELPMX or LPMX is available. The address space uses a 24-bit addresses even on devices with a program memory size of 64 KiB or less. PR target/118001 gcc/ * doc/extend.texi (AVR Named Address Spaces): Document __flashx. * config/avr/avr.h (ADDR_SPACE_FLASHX): New enum value. * config/avr/avr-protos.h (avr_out_fload, avr_mem_flashx_p) (avr_fload_libgcc_p, avr_load_libgcc_mem_p) (avr_load_libgcc_insn_p): New. * config/avr/avr.cc (avr_addrspace): Add ADDR_SPACE_FLASHX. (avr_decl_flashx_p, avr_mem_flashx_p, avr_fload_libgcc_p) (avr_load_libgcc_mem_p, avr_load_libgcc_insn_p, avr_out_fload): New functions. (avr_adjust_insn_length) [ADJUST_LEN_FLOAD]: Handle case. (avr_progmem_p) [avr_decl_flashx_p]: return 2. (avr_addr_space_legitimate_address_p) [ADDR_SPACE_FLASHX]: Has same behavior like ADDR_SPACE_MEMX. (avr_addr_space_convert): Use pointer sizes rather then ASes. (avr_addr_space_contains): New function. (avr_convert_to_type): Use it. (avr_emit_cpymemhi): Handle ADDR_SPACE_FLASHX. * config/avr/avr.md (adjust_len) <fload>: New attr value. (gen_load<mode>_libgcc): Renamed from load<mode>_libgcc. (xload8<mode>_A): Iterate over MOVMODE rather than over ALL1. (fxmov<mode>_A): New from xloadv<mode>_A. (xmov<mode>_8): New from xload<mode>_A. (fmov<mode>): New insns. (fxload<mode>_A): New from xload<mode>_A. (fxload_<mode>_libgcc): New from xload_<mode>_libgcc. (*fxload_<mode>_libgcc): New from *xload_<mode>_libgcc. (mov<mode>) [avr_mem_flashx_p]: Hande ADDR_SPACE_FLASHX. (cpymemx_<mode>): Make sure the address space is not lost when splitting. (*cpymemx_<mode>) [ADDR_SPACE_FLASHX]: Use __movmemf_<mode> for asm. (*ashlqi.1.zextpsi_split): New combine pattern. * config/avr/predicates.md (nox_general_operand): Don't match when avr_mem_flashx_p is true. * config/avr/avr-passes.cc (AVR_LdSt_Props): ADDR_SPACE_FLASHX has no post_inc. gcc/testsuite/ * gcc.target/avr/torture/addr-space-1.h [AVR_HAVE_ELPM]: Use a function to bump .progmemx.data to a high address. * gcc.target/avr/torture/addr-space-2.h: Same. * gcc.target/avr/torture/addr-space-1-fx.c: New test. * gcc.target/avr/torture/addr-space-2-fx.c: New test. libgcc/ * config/avr/t-avr (LIB1ASMFUNCS): Add _fload_1, _fload_2, _fload_3, _fload_4, _movmemf. * config/avr/lib1funcs.S (.branch_plus): New .macro. (__xload_1, __xload_2, __xload_3, __xload_4): When the address is located in flash, then forward to... (__fload_1, __fload_2, __fload_3, __fload_4): ...these new functions, respectively. (__movmemx_hi): When the address is located in flash, forward to... (__movmemf_hi): ...this new function.
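Usage follows the existing AVR named address spaces such as __flash and __memx; a small sketch (not taken from the patch or its testcases):

    /* The object is placed in .progmemx.data and read through a
       24-bit program-memory address.  */
    const __flashx char table[] = { 1, 2, 3, 4 };

    char
    read_table (unsigned char i)
    {
      return table[i];
    }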
2024-12-12 | i386: regenerate i386.opt.urls | Sam James | 1 | -1/+2
r15-6128-gfa878dc8c45fa3 missed the regeneration of the URL doc map, so regenerate it here to make the buildbots happy. gcc/ChangeLog: * config/i386/i386.opt.urls: Regenerate.
2024-12-11 | middle-end: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE [PR96342] | Andre Vieira | 3 | -4/+5
This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure the target can reject a simd_clone based on the vector mode it is using. This is needed because for VLS SVE vectorization the vectorizer accepts Advanced SIMD simd clones when vectorizing using SVE types because the simdlens might match. This will cause type errors later on. Other targets do not currently need to use this argument. gcc/ChangeLog: PR target/96342 * target.def (TARGET_SIMD_CLONE_USABLE): Add argument. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to call TARGET_SIMD_CLONE_USABLE. * config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument and use it to reject the use of SVE simd clones with Advanced SIMD modes. * config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument. * config/i386/i386.cc (ix86_simd_clone_usable): Likewise. * doc/tm.texi: Regenerate Co-authored-by: Victor Do Nascimento <victor.donascimento@arm.com> Co-authored-by: Tamar Christina <tamar.christina@arm.com>
2024-12-11 | aarch64: Use SVE ASRD instruction with Neon modes. | Soumya AR | 3 | -12/+29
The ASRD instruction on SVE performs an arithmetic shift right by an immediate for divide. This patch enables the use of ASRD with Neon modes.

For example:

    int in[N], out[N];

    void
    foo (void)
    {
      for (int i = 0; i < N; i++)
        out[i] = in[i] / 4;
    }

compiles to:

    ldr     q31, [x1, x0]
    cmlt    v30.16b, v31.16b, #0
    and     z30.b, z30.b, 3
    add     v30.16b, v30.16b, v31.16b
    sshr    v30.16b, v30.16b, 2
    str     q30, [x0, x2]
    add     x0, x0, 16
    cmp     x0, 1024

but can just be:

    ldp     q30, q31, [x0], 32
    asrd    z31.b, p7/m, z31.b, #2
    asrd    z30.b, p7/m, z30.b, #2
    stp     q30, q31, [x1], 32
    cmp     x0, x2

This patch also adds the following overload:

    aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode)

Depending on the data mode, the function returns a predicate with the appropriate bits set.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_ptrue_reg): New overload.
* config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Likewise.
* config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3 and *sdiv_pow2<mode>3 to support Neon modes.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/sve-asrd.c: New test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
2024-12-11 | aarch64: Extend SVE2 bit-select instructions for Neon modes. | Soumya AR | 1 | -0/+66
NBSL, BSL1N, and BSL2N are bit-select instructions on SVE2 with certain operands inverted. These can be extended to work with Neon modes. Since these instructions are unpredicated, duplicate patterns were added with the predicate removed to generate these instructions for Neon modes. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. Signed-off-by: Soumya AR <soumyaa@nvidia.com> gcc/ChangeLog: * config/aarch64/aarch64-sve2.md (*aarch64_sve2_nbsl_unpred<mode>): New pattern to match unpredicated form. (*aarch64_sve2_bsl1n_unpred<mode>): Likewise. (*aarch64_sve2_bsl2n_unpred<mode>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/bitsel.c: New test.
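Written out with generic vectors, the operations involved are bit-selects with one operand inverted, roughly as below (a sketch of the intended semantics, with the operand mapping treated as an assumption; whether combine matches the new unpredicated patterns depends on the surrounding code):

    typedef unsigned int v4si __attribute__ ((vector_size (16)));

    /* NBSL: inverted bit-select.  */
    v4si nbsl  (v4si a, v4si b, v4si sel) { return ~((a & sel) | (b & ~sel)); }
    /* BSL1N: first operand inverted.  */
    v4si bsl1n (v4si a, v4si b, v4si sel) { return (~a & sel) | (b & ~sel); }
    /* BSL2N: second operand inverted.  */
    v4si bsl2n (v4si a, v4si b, v4si sel) { return (a & sel) | (~b & ~sel); }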
2024-12-10 | arm: Fix LDRD register overlap [PR117675] | Wilco Dijkstra | 5 | -2/+37
The register indexed variants of LDRD have complex register overlap constraints which make them hard to use without using output_move_double (which can't be used for atomics as it doesn't guarantee to emit atomic LDRD/STRD when required). Add a new predicate and constraint for plain LDRD/STRD with base or base+imm. This blocks register indexing and fixes PR117675. gcc: PR target/117675 * config/arm/arm.cc (arm_ldrd_legitimate_address): New function. * config/arm/arm-protos.h (arm_ldrd_legitimate_address): New prototype. * config/arm/constraints.md: Add new Uo constraint. * config/arm/predicates.md (arm_ldrd_memory_operand): Add new predicate. * config/arm/sync.md (arm_atomic_loaddi2_ldrd): Use arm_ldrd_memory_operand and Uo. gcc/testsuite: PR target/117675 * gcc.target/arm/pr117675.c: Add new test.
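A typical shape that hits this path is a 64-bit relaxed atomic load, which arm_atomic_loaddi2_ldrd expands to LDRD and which must therefore use a base(+imm) address rather than a register-indexed one (a sketch, assuming an Armv7-A configuration where GCC uses LDRD for atomic 64-bit loads):

    #include <stdatomic.h>

    unsigned long long
    load64 (_Atomic unsigned long long *p)
    {
      /* Must stay a single LDRD, so the new Uo constraint and
         arm_ldrd_memory_operand predicate rule out register indexing.  */
      return atomic_load_explicit (p, memory_order_relaxed);
    }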
2024-12-10 | AArch64: Add baseline tune | Wilco Dijkstra | 13 | -13/+16
Cleanup the extra tune defines by introducing AARCH64_EXTRA_TUNE_BASE as a common base supported by all modern cores. Initially set it to AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND. No change in generated code. gcc: * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNE_BASE): New define. * config/aarch64/tuning_models/ampere1b.h: Use AARCH64_EXTRA_TUNE_BASE. * config/aarch64/tuning_models/cortexx925.h: Likewise. * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise. * config/aarch64/tuning_models/generic_armv8_a.h: Likewise. * config/aarch64/tuning_models/generic_armv9_a.h: Likewise. * config/aarch64/tuning_models/neoversen1.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise.
2024-12-10 | AArch64: Cleanup alignment macros | Wilco Dijkstra | 3 | -18/+62
Change the AARCH64_EXPAND_ALIGNMENT macro into proper function calls to make future changes easier. Use the existing alignment settings, but avoid overaligning small arrays or structs to 64 bits when there is no benefit. The lower alignment gives a small reduction in data and stack size. Using 32-bit alignment for small char arrays still improves performance of string functions since they can be loaded in full by the first 8/16-byte load. gcc: * config/aarch64/aarch64.h (AARCH64_EXPAND_ALIGNMENT): Remove. (DATA_ALIGNMENT): Use aarch64_data_alignment. (LOCAL_ALIGNMENT): Use aarch64_stack_alignment. * config/aarch64/aarch64.cc (aarch64_data_alignment): New function. (aarch64_stack_alignment): Likewise. * config/aarch64/aarch64-protos.h (aarch64_data_alignment): New prototype. (aarch64_stack_alignment): Likewise.
2024-12-10 | AArch64: Use LDP/STP for large struct types | Wilco Dijkstra | 2 | -83/+21
Use LDP/STP for large struct types as they have useful immediate offsets and are typically faster. This removes differences between little and big endian and allows use of LDP/STP without UNSPEC. gcc: * config/aarch64/aarch64.cc (aarch64_classify_address): Treat SIMD structs identically in little and bigendian. * config/aarch64/aarch64-simd.md (aarch64_mov<mode>): Remove VSTRUCT instructions. (aarch64_be_mov<mode>): Allow little-endian, rename to aarch64_mov<mode>. (aarch64_be_movoi): Allow little-endian, rename to aarch64_movoi. (aarch64_be_movci): Allow little-endian, rename to aarch64_movci. (aarch64_be_movxi): Allow little-endian, rename to aarch64_movxi. Remove big-endian special case in define_split variants. gcc/testsuite: * gcc.target/aarch64/torture/simd-abi-8.c: Update to check for LDP/STP.
2024-12-10 | aarch64: Remove vcond{,u} optabs | Richard Sandiford | 5 | -231/+4
Prompted by Richard E's arm patch, this one removes the aarch64 support for the vcond{,u} optabs. gcc/ * config/aarch64/aarch64-protos.h (aarch64_expand_sve_vcond): Delete. * config/aarch64/aarch64-simd.md (<su><maxmin>v2di3): Expand into separate vec_cmp and vcond_mask instructions, instead of using vcond. (vcond<mode><mode>, vcond<v_cmp_mixed><mode>, vcondu<mode><mode>) (vcondu<mode><v_cmp_mixed>): Delete. * config/aarch64/aarch64-sve.md (vcond<SVE_ALL:mode><SVE_I:mode>) (vcondu<SVE_ALL:mode><SVE_I:mode>, vcond<mode><v_fp_equiv>): Likewise. * config/aarch64/aarch64.cc (aarch64_expand_sve_vcond): Likewise. * config/aarch64/iterators.md (V_FP_EQUIV, v_fp_equiv, V_cmp_mixed) (v_cmp_mixed): Likewise.
2024-12-10 | aarch64: Add support for fp8fma instructions | Saurabh Jha | 6 | -38/+149
The AArch64 FEAT_FP8FMA extension introduces instructions for multiply-add of vectors. This patch introduces the following instructions: 1. {vmlalbq|vmlaltq}_f16_mf8_fpm. 2. {vmlalbq|vmlaltq}_lane{q}_f16_mf8_fpm. 3. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_f32_mf8_fpm. 4. {vmlallbbq|vmlallbtq|vmlalltbq|vmlallttq}_lane{q}_f32_mf8_fpm. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (aarch64_pragma_builtins_checker::require_immediate_lane_index): New overload. (aarch64_pragma_builtins_checker::check): Add support for FP8FMA intrinsics. (aarch64_expand_pragma_builtins): Likewise. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Conditionally define TARGET_FP8FMA. * config/aarch64/aarch64-simd-pragma-builtins.def: Add the FP8FMA intrinsics. * config/aarch64/aarch64-simd.md: (@aarch64_<FMLAL_FP8_HF:insn><mode): New pattern. (@aarch64_<FMLAL_FP8_HF:insn>_lane<V8HF_ONLY:mode><VB:mode>): Likewise. (@aarch64_<FMLALL_FP8_SF:insn><mode): Likewise. (@aarch64_<FMLALL_FP8_SF:insn>_lane<V8HF_ONLY:mode><VB:mode>): Likewise. * config/aarch64/iterators.md (V8HF_ONLY): New mode iterator. (SVE2_FP8_TERNARY_VNX8HF): Rename to... (FMLAL_FP8_HF): ...this. (SVE2_FP8_TERNARY_LANE_VNX8HF): Delete in favor of FMLAL_FP8_HF. (SVE2_FP8_TERNARY_VNX4SF): Rename to... (FMLALL_FP8_SF): ...this. (SVE2_FP8_TERNARY_LANE_VNX4SF): Delete in favor of FMLALL_FP8_SF. (sve2_fp8_fma_op_vnx8hf, sve2_fp8_fma_op_vnx4sf): Fold into... (insn): ...here. * config/aarch64/aarch64-sve2.md: Update uses accordingly. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pragma_cpp_predefs_4.c: Test TARGET_FP8FMA. * gcc.target/aarch64/simd/vmla_fpm.c: New test. * gcc.target/aarch64/simd/vmla_lane_indices_1.c: Likewise. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2024-12-10 | aarch64: Add support for fp8dot2 and fp8dot4 | Saurabh Jha | 5 | -0/+113
The AArch64 FEAT_FP8DOT2 and FEAT_FP8DOT4 extension introduces instructions for dot product of vectors. This patch introduces the following intrinsics: 1. vdot{q}_{fp16|fp32}_mf8_fpm. 2. vdot{q}_lane{q}_{fp16|fp32}_mf8_fpm. We added a new aarch64_builtin_signature variant, ternary_lane, and added support for it in the functions aarch64_fntype and aarch64_expand_pragma_builtin. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (enum class): Add ternary_lane. (aarch64_fntype): Hnadle ternary_lane. (aarch64_pragma_builtins_checker::require_immediate_lane_index): New function. (aarch64_pragma_builtins_checker::check): Handle the new intrinsics. (aarch64_expand_pragma_builtin): Likewise. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): Define TARGET_FP8DOT2 and TARGET_FP8DOT4. * config/aarch64/aarch64-simd-pragma-builtins.def: Define vdot and vdot_lane intrinsics. * config/aarch64/aarch64-simd.md (@aarch64_<fpm_uns_op><mode>): New pattern. (@aarch64_<fpm_uns_op>_lane<VQ_HSF_VDOT:mode><VB:mode>): Likewise. * config/aarch64/iterators.md (VQ_HSF_VDOT): New mode iterator. (UNSPEC_VDOT, UNSPEC_VDOT_LANE): New unspecs. (fpm_uns_op): Handle them. (VNARROWB, Vnbtype): New mode attributes. (FPM_VDOT, FPM_VDOT_LANE): New int iterators. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pragma_cpp_predefs_4.c: Test fp8dot2 and fp8dot4. * gcc.target/aarch64/simd/vdot2_fpm.c: New test. * gcc.target/aarch64/simd/vdot4_fpm.c: New test. * gcc.target/aarch64/simd/vdot_lane_indices_1.c: New test. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2024-12-10 | aarch64: Add support for fp8 convert and scale | Saurabh Jha | 5 | -18/+269
The AArch64 FEAT_FP8 extension introduces instructions for conversion and scaling. This patch introduces the following intrinsics: 1. vcvt{1|2}_{bf16|high_bf16|low_bf16}_mf8_fpm. 2. vcvt{q}_mf8_f16_fpm. 3. vcvt_{high}_mf8_f32_fpm. 4. vscale{q}_{f16|f32|f64}. We introduced two aarch64_builtin_signatures enum variants, unary and ternary, and added support for these variants in the functions aarch64_fntype and aarch64_expand_pragma_builtin. We added new simd_types for integers (s32, s32q, and s64q) and for floating points (f8 and f8q). Because we added support for fp8 intrinsics here, we modified the check in acle/fp8.c that was checking that __ARM_FEATURE_FP8 macro is not defined. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (FLAG_USES_FPMR, FLAG_FP8): New flags. (ENTRY): Modified to support ternary operations. (enum class): New variants to support new signatures. (struct aarch64_pragma_builtins_data): Extend types to 4 elements. (aarch64_fntype): Handle new signatures. (aarch64_get_low_unspec): New function. (aarch64_convert_to_v64): New function, split out from... (aarch64_expand_pragma_builtin): ...here. Handle new signatures. * config/aarch64/aarch64-c.cc (aarch64_update_cpp_builtins): New flag for FP8. * config/aarch64/aarch64-simd-pragma-builtins.def: Define new fp8 intrinsics. (ENTRY_BINARY, ENTRY_BINARY_LANE): Update for new ENTRY interface. (ENTRY_UNARY, ENTRY_TERNARY, ENTRY_UNARY_FPM): New macros. (ENTRY_BINARY_VHSDF_SIGNED): Likewise. * config/aarch64/aarch64-simd.md (@aarch64_<fpm_uns_op><mode>): New pattern. (@aarch64_<fpm_uns_op><mode>_high): Likewise. (@aarch64_<fpm_uns_op><mode>_high_be): Likewise. (@aarch64_<fpm_uns_op><mode>_high_le): Likewise. * config/aarch64/iterators.md (V4SF_ONLY, VQ_BHF): New mode iterators. (UNSPEC_FCVTN_FP8, UNSPEC_FCVTN2_FP8, UNSPEC_F1CVTL_FP8) (UNSPEC_F1CVTL2_FP8, UNSPEC_F2CVTL_FP8, UNSPEC_F2CVTL2_FP8) (UNSPEC_FSCALE): New unspecs. (VPACKB, VPACKBtype): New mode attributes. (b): Add support for V[48][BH]F. (FPM_UNARY_UNS, FPM_BINARY_UNS, SCALE_UNS): New int iterators. (insn): New int attribute. gcc/testsuite/ChangeLog: * gcc.target/aarch64/acle/fp8.c: Remove check that fp8 feature macro doesn't exist and... * gcc.target/aarch64/pragma_cpp_predefs_4.c: ...test that it does here. * gcc.target/aarch64/simd/scale_fpm.c: New test. * gcc.target/aarch64/simd/vcvt_fpm.c: New test. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
2024-12-09aarch64: Fix ICE happening in SET_TYPE_VECTOR_SUBPARTS with libgccjitAntoni Boucher4-69/+95
The structure aarch64_simd_type_info was split in two because we do not want to reset the static members of aarch64_simd_type_info to their default values. We only want the tree types to be GC-ed. This is necessary for libgccjit, which can run multiple times in the same process. If the static values were GC-ed, the second run would ICE/segfault because of their invalid values. The following test suites passed for this patch: * The aarch64 tests. * The aarch64 regression tests. The number of failures of the jit tests on aarch64 dropped from more than 100 to about 7. gcc/ChangeLog: PR target/117923 * config/aarch64/aarch64-builtins.cc: Remove GTY marker on aarch64_simd_types, aarch64_simd_types_trees (new variable), rename aarch64_simd_types to aarch64_simd_types_trees. * config/aarch64/aarch64-builtins.h: Remove GTY marker on aarch64_simd_types, aarch64_simd_types_trees (new variable). * config/aarch64/aarch64-sve-builtins-shapes.cc: Rename aarch64_simd_types to aarch64_simd_types_trees. * config/aarch64/aarch64-sve-builtins.cc: Rename aarch64_simd_types to aarch64_simd_types_trees.
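A self-contained sketch of the shape of that split (names, types and the GTY machinery are stand-ins for illustration only): the static descriptions stay in a plain array that is never reset, and only the parallel array of trees is registered with the garbage collector so it can be rebuilt on each libgccjit run.

    // Static metadata: kept across runs, never visible to the garbage collector.
    struct simd_type_desc
    {
      const char *name;      // e.g. "int32x4_t"
      unsigned elt_bits;
    };

    static const simd_type_desc simd_type_descs[] = {
      { "int32x4_t", 32 },
      { "float16x8_t", 16 },
    };

    // Only this parallel table would carry the GTY(()) marker in GCC and be
    // re-created on every run; 'void *' stands in for 'tree' here.
    static void *simd_type_trees[sizeof simd_type_descs / sizeof simd_type_descs[0]];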
2024-12-09aarch64: Implement new expander for efficient CRC computation.Mariam Arutunian4-0/+195
This patch introduces two new expanders for the aarch64 backend, dedicated to generating optimized code for CRC computations. The new expanders are designed to leverage specific hardware capabilities to achieve faster CRC calculations, particularly using the crc32, crc32c and pmull instructions when supported by the target architecture. Expander 1: Bit-Forward CRC (crc<ALLI:mode><ALLX:mode>4) For targets that support the pmull instruction (TARGET_AES), the expander will generate code that uses the pmull (crypto_pmulldi) instruction for CRC computation. Expander 2: Bit-Reversed CRC (crc_rev<ALLI:mode><ALLX:mode>4) The expander first checks if the target supports the CRC32* instruction set (TARGET_CRC32) and the polynomial in use is 0x1EDC6F41 (iSCSI) or 0x04C11DB7 (HDLC). If the conditions are met, it emits calls to the corresponding crc32* instruction (depending on the data size and the polynomial). If the target does not support crc32* but supports pmull, it then uses the pmull (crypto_pmulldi) instruction for bit-reversed CRC computation. Otherwise, table-based CRC code is generated. gcc/ * config/aarch64/aarch64-protos.h (aarch64_expand_crc_using_pmull): New extern function declaration. (aarch64_expand_reversed_crc_using_pmull): Likewise. * config/aarch64/aarch64.cc (aarch64_expand_crc_using_pmull): New function. (aarch64_expand_reversed_crc_using_pmull): Likewise. * config/aarch64/aarch64.md (crc_rev<ALLI:mode><ALLX:mode>4): New expander for reversed CRC. (crc<ALLI:mode><ALLX:mode>4): New expander for bit-forward CRC. * config/aarch64/iterators.md (crc_data_type): New mode attribute. gcc/testsuite/ * gcc.target/aarch64/crc-1-pmul.c: New test. * gcc.target/aarch64/crc-10-pmul.c: Likewise. * gcc.target/aarch64/crc-12-pmul.c: Likewise. * gcc.target/aarch64/crc-13-pmul.c: Likewise. * gcc.target/aarch64/crc-14-pmul.c: Likewise. * gcc.target/aarch64/crc-17-pmul.c: Likewise. * gcc.target/aarch64/crc-18-pmul.c: Likewise. * gcc.target/aarch64/crc-21-pmul.c: Likewise. * gcc.target/aarch64/crc-22-pmul.c: Likewise. * gcc.target/aarch64/crc-23-pmul.c: Likewise. * gcc.target/aarch64/crc-4-pmul.c: Likewise. * gcc.target/aarch64/crc-5-pmul.c: Likewise. * gcc.target/aarch64/crc-6-pmul.c: Likewise. * gcc.target/aarch64/crc-7-pmul.c: Likewise. * gcc.target/aarch64/crc-8-pmul.c: Likewise. * gcc.target/aarch64/crc-9-pmul.c: Likewise. * gcc.target/aarch64/crc-CCIT-data16-pmul.c: Likewise. * gcc.target/aarch64/crc-CCIT-data8-pmul.c: Likewise. * gcc.target/aarch64/crc-coremark-16bitdata-pmul.c: Likewise. * gcc.target/aarch64/crc-crc32-data16.c: Likewise. * gcc.target/aarch64/crc-crc32-data32.c: Likewise. * gcc.target/aarch64/crc-crc32-data8.c: Likewise. * gcc.target/aarch64/crc-crc32c-data16.c: Likewise. * gcc.target/aarch64/crc-crc32c-data32.c: Likewise. * gcc.target/aarch64/crc-crc32c-data8.c: Likewise. Signed-off-by: Mariam Arutunian <mariamarutunian@gmail.com> Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
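For reference, a loop of the kind the bit-reversed path can ultimately replace looks like the following (an illustrative shape only; whether a particular loop is recognized is up to the middle-end CRC detection, and 0xEDB88320 is the reflected form of the 0x04C11DB7 polynomial mentioned above):

    #include <cstddef>
    #include <cstdint>

    // Plain reflected (bit-reversed) CRC-32 over a byte buffer.
    uint32_t crc32_le (uint32_t crc, const uint8_t *buf, size_t len)
    {
      for (size_t i = 0; i < len; i++)
        {
          crc ^= buf[i];
          for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
        }
      return crc;
    }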
2024-12-09aarch64: Add @ to aarch64_get_lane<mode>Richard Sandiford1-1/+1
This is a prerequisite for Mariam's CRC support. gcc/ * config/aarch64/aarch64-simd.md (aarch64_get_lane<mode>): Add "@" to the name.
2024-12-09s390: Fix UNSPEC_CC_TO_INT canonicalizationJuergen Christ1-1/+1
Canonicalization of comparisons for UNSPEC_CC_TO_INT missed one case causing unnecessarily complex code. This especially seems to hit the Linux kernel. gcc/ChangeLog: * config/s390/s390.cc (s390_canonicalize_comparison): Add missing UNSPEC_CC_TO_INT case. gcc/testsuite/ChangeLog: * gcc.target/s390/ccusage.c: New test. Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
2024-12-09c++: Allow overloaded builtins to be used in SFINAE contextMatthew Malcomson9-17/+16
This commit newly introduces the ability to use overloaded builtins in C++ SFINAE context. The goal behind this is in order to ensure there is a single mechanism that libstdc++ can use to determine whether a given type can be used in the atomic fetch_add (and similar) builtins. I am working on another patch that hopes to use this mechanism to identify whether fetch_add (and similar) work on floating point types. Current state of the world: GCC currently exposes resolved versions of these builtins to the user, so for GCC it's currently possible to use tests similar to the below to check for atomic loads on a 2 byte sized object. #if __has_builtin(__atomic_load_2) Clang does not expose resolved versions of the atomic builtins. clang currently allows SFINAE on builtins, so that C++ code can check whether a builtin is available on a given type. GCC does not (and that is what this patch aims to change). C libraries like libatomic can check whether a given atomic builtin can work on a given type by using autoconf to check for a miscompilation when attempting such a use. My goal: I would like to enable floating point fetch_add (and similar) in GCC, in order to use those overloads in libstdc++ implementation of atomic<float>::fetch_add. This should allow compilers targeting GPU's which have floating point fetch_add instructions to emit optimal code. In order to do that I need some consistent mechanism that libstdc++ can use to identify whether the fetch_add builtins have floating point overloads (and for which types these exist). I would hence like to enable SFINAE on builtins, so that libstdc++ can use that mechanism for the floating point fetch_add builtins. Implementation follows the existing mechanism for handling SFINAE contexts in c-common.cc. A boolean is passed into the c-common.cc function indicating whether these functions should emit errors or not. This boolean comes from `complain & tf_error` in the C++ frontend. (Similar to other functions like valid_array_size_p and c_build_vec_perm_expr). This is done both for resolve_overloaded_builtin and check_builtin_function_arguments, both of which can be used in SFINAE contexts. I attempted to trigger something using the `reject_gcc_builtin` function in an SFINAE context. Given the context where this function is called from the C++ frontend it looks like it may be possible, but I did not manage to trigger this in template context by attempting to do something similar to the testcases added around those calls. - I would appreciate any feedback on whether this is something that can happen in a template context, and if so some help writing a relevant testcase for it. Both of these functions have target hooks for target specific builtins that I have updated to take the extra boolean flag. I have not adjusted the functions implementing those target hooks (except to update the declarations) so target specific builtins will still error in SFINAE contexts. - I could imagine not updating the target hook definition since nothing would use that change. However I figure that allowing targets to decide this behaviour would be the right thing to do eventually, and since this is the target-independent part of the change to do that this patch should make that change. Could adjust if others disagree. 
Other relevant points that I'd appreciate reviewers check: - I did not pass this new flag through atomic_bitint_fetch_using_cas_loop since the _BitInt type is not available in the C++ frontend and I didn't want if conditions that can not be executed in the source. - I only test non-compile-time-constant types with SVE types, since I do not know of a way to get a VLA into a SFINAE context. - While writing tests I noticed a few differences with clang in this area. I don't think they are problematic but am mentioning them for completeness and to allow others to judge if these are a problem). - atomic_fetch_add on a boolean is allowed by clang. - When __atomic_load is passed an invalid memory model (i.e. too large), we give an SFINAE failure while clang does not. Bootstrap and regression tested on AArch64 and x86_64. Built first stage on targets whose target hook declaration needed updated (though did not regtest etc). Targets triplets I built in order to check the backend specific changes I made: - arm-none-linux-gnueabihf - avr-linux-gnu - riscv-linux-gnu - powerpc-linux-gnu - s390x-linux-gnu Ok for commit to trunk? gcc/c-family/ChangeLog: * c-common.cc (builtin_function_validate_nargs, check_builtin_function_arguments, speculation_safe_value_resolve_call, speculation_safe_value_resolve_params, sync_resolve_size, sync_resolve_params, get_atomic_generic_size, resolve_overloaded_atomic_exchange, resolve_overloaded_atomic_compare_exchange, resolve_overloaded_atomic_load, resolve_overloaded_atomic_store, resolve_overloaded_builtin): Add `complain` boolean parameter and determine whether to emit errors based on its value. * c-common.h (check_builtin_function_arguments, resolve_overloaded_builtin): Mention `complain` boolean parameter in declarations. Give it a default of `true`. gcc/ChangeLog: * config/aarch64/aarch64-c.cc (aarch64_resolve_overloaded_builtin,aarch64_check_builtin_call): Add new unused boolean parameter to match target hook definition. * config/arm/arm-builtins.cc (arm_check_builtin_call): Likewise. * config/arm/arm-c.cc (arm_resolve_overloaded_builtin): Likewise. * config/arm/arm-protos.h (arm_check_builtin_call): Likewise. * config/avr/avr-c.cc (avr_resolve_overloaded_builtin): Likewise. * config/riscv/riscv-c.cc (riscv_check_builtin_call, riscv_resolve_overloaded_builtin): Likewise. * config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin): Likewise. * config/rs6000/rs6000-protos.h (altivec_resolve_overloaded_builtin): Likewise. * config/s390/s390-c.cc (s390_resolve_overloaded_builtin): Likewise. * doc/tm.texi: Regenerate. * target.def (TARGET_RESOLVE_OVERLOADED_BUILTIN, TARGET_CHECK_BUILTIN_CALL): Update prototype to include a boolean parameter that indicates whether errors should be emitted. Update documentation to mention this fact. gcc/cp/ChangeLog: * call.cc (build_cxx_call): Pass `complain` parameter to check_builtin_function_arguments. Take its value from the `tsubst_flags_t` type `complain & tf_error`. * semantics.cc (finish_call_expr): Pass `complain` parameter to resolve_overloaded_builtin. Take its value from the `tsubst_flags_t` type `complain & tf_error`. gcc/testsuite/ChangeLog: * g++.dg/template/builtin-atomic-overloads.def: New test. * g++.dg/template/builtin-atomic-overloads1.C: New test. * g++.dg/template/builtin-atomic-overloads2.C: New test. * g++.dg/template/builtin-atomic-overloads3.C: New test. * g++.dg/template/builtin-atomic-overloads4.C: New test. * g++.dg/template/builtin-atomic-overloads5.C: New test. 
* g++.dg/template/builtin-atomic-overloads6.C: New test. * g++.dg/template/builtin-atomic-overloads7.C: New test. * g++.dg/template/builtin-atomic-overloads8.C: New test. * g++.dg/template/builtin-sfinae-check-function-arguments.C: New test. * g++.dg/template/builtin-speculation-overloads.def: New test. * g++.dg/template/builtin-speculation-overloads1.C: New test. * g++.dg/template/builtin-speculation-overloads2.C: New test. * g++.dg/template/builtin-speculation-overloads3.C: New test. * g++.dg/template/builtin-speculation-overloads4.C: New test. * g++.dg/template/builtin-speculation-overloads5.C: New test. * g++.dg/template/builtin-validate-nargs.C: New test. Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
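As a rough illustration of the kind of detection this enables (a sketch, not the libstdc++ code: it simply asks, in a SFINAE context, whether __atomic_fetch_add accepts a given type; requires C++17 for std::void_t):

    #include <type_traits>
    #include <utility>

    template <typename T, typename = void>
    struct has_atomic_fetch_add : std::false_type {};

    // Chosen only when __atomic_fetch_add resolves for T; otherwise this partial
    // specialization is silently discarded instead of producing a hard error.
    template <typename T>
    struct has_atomic_fetch_add<T,
        std::void_t<decltype (__atomic_fetch_add (std::declval<T *> (),
                                                  std::declval<T> (),
                                                  int ()))>>
      : std::true_type {};

    static_assert (has_atomic_fetch_add<int>::value, "int is supported");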
2024-12-09arm: remove obsolete vcond expandersRichard Earnshaw3-116/+0
The vcond{,u} expander patterns have been declared as obsolete. Remove them from the Arm backend. gcc/ChangeLog: PR target/114189 * config/arm/arm-protos.h (arm_expand_vcond): Delete prototype. * config/arm/arm.cc (arm_expand_vcond): Delete function. * config/arm/vec-common.md (vcond<mode><mode>): Delete pattern. (vcond<V_cvtto><mode>): Likewise. (vcond<VH_cvtto><mode>): Likewise. (vcondu<mode><v_cmp_result>): Likewise.
2024-12-09GCN: Fix 'real_from_integer' usageThomas Schwinge1-1/+1
The recent commit b3f1b9e2aa079f8ec73e3cb48143a16645c49566 "build: Remove INCLUDE_MEMORY [PR117737]" exposed an issue in code added in 2020 GCN back end commit 95607c12363712c39345e1d97f2c1aee8025e188 "Zero-initialise masked load destinations"; compilation now fails: [...] In file included from ../../source-gcc/gcc/coretypes.h:507:0, from ../../source-gcc/gcc/config/gcn/gcn.cc:24: ../../source-gcc/gcc/real.h: In instantiation of ‘format_helper::format_helper(const T&) [with T = std::nullptr_t]’: ../../source-gcc/gcc/config/gcn/gcn.cc:1178:46: required from here ../../source-gcc/gcc/real.h:233:17: error: no match for ‘operator==’ (operand types are ‘std::nullptr_t’ and ‘machine_mode’) : m_format (m == VOIDmode ? 0 : REAL_MODE_FORMAT (m)) ^ [...] That's with 'g++ (GCC) 5.5.0', and seen similarly with 'g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0', for example. gcc/ * config/gcn/gcn.cc (gcn_vec_constant): Fix 'real_from_integer' usage.
2024-12-09aarch64: Update cpuinfo strings for some arch featuresKyrylo Tkachov1-9/+9
The entries for some recently-added arch features were missing the cpuinfo string used in -march=native detection. Presumably the Linux kernel had not specified such a string at the time the GCC support was added. But I see that current versions of Linux do have strings for these features in the arch/arm64/kernel/cpuinfo.c file in the kernel tree. This patch adds them. This fixes the strings for the f32mm and f64mm features which I think were using the wrong string. The kernel exposes them with an "sve" prefix. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64-option-extensions.def (sve-b16b16, f32mm, f64mm, sve2p1, sme-f64f64, sme-i16i64, sme-b16b16, sme-f16f16, mops): Update FEATURE_STRING field.
2024-12-08pru: Implement c and n asm operand modifiersDimitar Dimitrov1-1/+11
Fix the c-c++-common/toplevel-asm-1.c failure for the PRU backend, caused by the missing implementation of the "c" asm operand modifier. gcc/ChangeLog: * config/pru/pru.cc (pru_print_operand): Implement c and n inline assembly operand modifiers. gcc/testsuite/ChangeLog: * gcc.target/pru/asm-op-modifier.c: New test. Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
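A small example of the kind of use this enables (generic rather than PRU-specific, and only valid on targets whose print_operand implements these modifiers; the asm body is an assembler comment, so nothing is actually executed): %c prints a constant operand without immediate-prefix punctuation and %n prints its negation.

    // %c0 emits "42" with no '#'/'$' prefix; %n1 also emits "42", because the
    // 'n' modifier negates the constant before printing it.
    void emit_constants (void)
    {
      asm volatile ("/* const %c0, negated %n1 */" :: "i" (42), "i" (-42));
    }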
2024-12-07SPARC: Add functional comments for VIS4B instructionsEric Botcazou1-1/+14
gcc/ * config/sparc/sparc.md (VIS4B instructions): Add comments.
2024-12-07AVR: Better location for late (during final) diagnostic.Georg-Johann Lay1-5/+11
gcc/ * config/avr/avr.cc (avr_print_operand_address): Use avr_insn_location as location for late (during final) diagnostic.
2024-12-07i386: x r<< (c - y) to x r>> y etc. optimization [PR117930]Jakub Jelinek1-0/+141
The following patch optimizes x r<< (c - y) to x r>> y, x r>> (c - y) to x r<< y, x r<< (c + y) to x r<< y and x r>> (c + y) to x r>> y if c is a multiple of x's bitsize. 2024-12-07 Jakub Jelinek <jakub@redhat.com> PR target/117930 * config/i386/i386.md (crotate): New define_code_attr. (*<insn><mode>3_add, *<insn><mode>3_add_1, *<insn><mode>3_sub, *<insn><mode>3_sub_1): New define_insn_and_split patterns plus following define_split for constant first input operand. * gcc.target/i386/pr117930.c: New test.
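As an illustration of the source-level shape this targets (a sketch; whether a particular function is caught depends on how the rotate count reaches RTL), rotating left by 32 - y is congruent to rotating right by y, so the subtraction can be dropped:

    #include <cstdint>

    static inline uint32_t rotl32 (uint32_t x, unsigned n)
    {
      // Canonical rotate idiom GCC recognizes; well defined for any n.
      return (x << (n & 31)) | (x >> (-n & 31));
    }

    uint32_t rotate_by_difference (uint32_t x, unsigned y)
    {
      // With the new patterns this can become a single rotate-right by y.
      return rotl32 (x, 32 - y);
    }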
2024-12-07Revert "RISC-V: Add const to function_shape::get_name [NFC]"Kito Cheng3-73/+73
This reverts commit 9bf4cad4e4e1ec92c320a619c9bad35535596ced.
2024-12-06Support for 64-bit location_t: RTL partsLewis Hyatt3-10/+17
Some RTL objects need to store a location_t. Currently, they store it in the rt_int field of union rtunion, but in a world where location_t could be 64-bit, they need to store it in a larger variable. Unfortunately, rtunion does not currently have a 64-bit int type for that purpose, so add one. In order to avoid increasing any overhead when 64-bit locations are not in use, the new field is dedicated for location_t storage only and has type "location_t" so it will only be 64-bit if necessary. This necessitates adding a new RTX format code 'L' for locations. There are very many switch statements in the codebase that inspect the RTX format code. I took the approach of finding all of them that handle code 'i' or 'n' and making sure they handle 'L' too. I am sure that some of these call sites can never see an 'L' code, but I thought it would be safer and more future-proof to handle as many as possible, given it's just a line or two to add in most cases. gcc/ChangeLog: * rtl.def (DEBUG_INSN): Use new format code 'L' for location_t fields. (INSN): Likewise. (JUMP_INSN): Likewise. (CALL_INSN): Likewise. (ASM_INPUT): Likewise. (ASM_OPERANDS): Likewise. * rtl.h (union rtunion): Add new location_t RT_LOC member for use by the 'L' format. (struct rtx_debug_insn): Adjust comment. (struct rtx_nonjump_insn): Adjust comment. (struct rtx_call_insn): Adjust comment. (XLOC): New accessor macro for rtunion::rt_loc. (X0LOC): Likewise. (XCLOC): Likewise. (INSN_LOCATION): Use XLOC instead of XUINT to retrieve a location_t. (NOTE_MARKER_LOCATION): Likewise for XCUINT -> XCLOC. (ASM_OPERANDS_SOURCE_LOCATION): Likewise. (ASM_INPUT_SOURCE_LOCATION):Likewise. (gen_rtx_ASM_INPUT): Adjust to use sL format instead of si. (gen_rtx_INSN): Adjust prototype to use location_r rather than int for the location. * cfgrtl.cc (force_nonfallthru_and_redirect): Change type of LOC local variable from int to location_t. * rtlhash.cc (add_rtx): Support 'L' format in the switch statement. * var-tracking.cc (loc_cmp): Likewise. * alias.cc (rtx_equal_for_memref_p): Likewise. * config/alpha/alpha.cc (summarize_insn): Likewise. * config/ia64/ia64.cc (rtx_needs_barrier): Likewise. * config/rs6000/rs6000.cc (rs6000_hash_constant): Likewise. * cse.cc (hash_rtx): Likewise. (exp_equiv_p): Likewise. * cselib.cc (rtx_equal_for_cselib_1): Likewise. (cselib_hash_rtx): Likewise. (cselib_expand_value_rtx_1): Likewise. * emit-rtl.cc (copy_insn_1): Likewise. (gen_rtx_INSN): Change the location argument from int to location_t, and call the corresponding gen_rtf_fmt_* function. * final.cc (leaf_renumber_regs_insn): Support 'L' format in the switch statement. * genattrtab.cc (attr_rtx_1): Likewise. * genemit.cc (gen_exp): Likewise. * gengenrtl.cc (type_from_format): Likewise. (accessor_from_format): Likewise. * gengtype.cc (adjust_field_rtx_def): Likewise. * genpeep.cc (match_rtx): Likewise; just mark gcc_unreachable() for now. * genrecog.cc (find_operand): Support 'L' format in the switch statement. (find_matching_operand): Likewise. (validate_pattern): Likewise. * gensupport.cc (subst_pattern_match): Likewise. (get_alternatives_number): Likewise. (collect_insn_data): Likewise. (alter_predicate_for_insn): Likewise. (alter_constraints): Likewise. (subst_dup): Likewise. * jump.cc (rtx_renumbered_equal_p): Likewise. * loop-invariant.cc (hash_invariant_expr_1): Likewise. * lra-constraints.cc (operands_match_p): Likewise. * lra.cc (lra_rtx_hash): Likewise. * print-rtl.cc (rtx_writer::print_rtx_operand_code_i): Refactor location_t-relevant code to... 
(rtx_writer::print_rtx_operand_code_L): ...new function here. (rtx_writer::print_rtx_operand): Support 'L' format in the switch statement. * print-rtl.h (rtx_writer::print_rtx_operand_code_L): Add prototype for new function. * read-rtl-function.cc (function_reader::read_rtx_operand): Support 'L' format in the switch statement. (function_reader::read_rtx_operand_i_or_n): Rename to... (function_reader::read_rtx_operand_inL): ...this, and support 'L' as well. * read-rtl.cc (apply_int_iterator): Support 'L' format in the switch statement. (rtx_reader::read_rtx_operand): Likewise. * reload.cc (operands_match_p): Likewise. * rtl.cc (rtx_format): Add new code 'L'. (rtx_equal_p): Support 'L' in the switch statement. Remove dead code in the handling for 'i' and 'n'.
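A self-contained sketch of the layout idea (types and names are stand-ins; the real union lives in rtl.h): the dedicated member has type location_t, so it only becomes 64-bit when locations themselves do.

    #include <cstdint>

    typedef uint64_t location_t;   // assume the 64-bit configuration here

    union rtunion_sketch
    {
      int rt_int;                  // operands with format code 'i' or 'n'
      location_t rt_loc;           // dedicated storage for the new 'L' code
      void *rt_ptr;                // stands in for the various pointer members
    };

    // On an LP64 host the union stays pointer-sized even with 64-bit locations.
    static_assert (sizeof (rtunion_sketch) == sizeof (void *), "no size growth");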