path: root/gcc
Age | Commit message | Author | Files | Lines
2023-05-03 | RISC-V: Support segment intrinsics | Ju-Zhe Zhong | 12 | -118/+1325
Add segment load/store intrinsics: https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/198 gcc/ChangeLog: * config/riscv/riscv-vector-builtins-bases.cc (fold_fault_load): New function. (class vlseg): New class. (class vsseg): Ditto. (class vlsseg): Ditto. (class vssseg): Ditto. (class seg_indexed_load): Ditto. (class seg_indexed_store): Ditto. (class vlsegff): Ditto. (BASE): Ditto. * config/riscv/riscv-vector-builtins-bases.h: Ditto. * config/riscv/riscv-vector-builtins-functions.def (vlseg): Ditto. (vsseg): Ditto. (vlsseg): Ditto. (vssseg): Ditto. (vluxseg): Ditto. (vloxseg): Ditto. (vsuxseg): Ditto. (vsoxseg): Ditto. (vlsegff): Ditto. * config/riscv/riscv-vector-builtins-shapes.cc (struct seg_loadstore_def): Ditto. (struct seg_indexed_loadstore_def): Ditto. (struct seg_fault_load_def): Ditto. (SHAPE): Ditto. * config/riscv/riscv-vector-builtins-shapes.h: Ditto. * config/riscv/riscv-vector-builtins.cc (function_builder::append_nf): New function. * config/riscv/riscv-vector-builtins.def (vfloat32m1x2_t): Change ptr from double into float. (vfloat32m1x3_t): Ditto. (vfloat32m1x4_t): Ditto. (vfloat32m1x5_t): Ditto. (vfloat32m1x6_t): Ditto. (vfloat32m1x7_t): Ditto. (vfloat32m1x8_t): Ditto. (vfloat32m2x2_t): Ditto. (vfloat32m2x3_t): Ditto. (vfloat32m2x4_t): Ditto. (vfloat32m4x2_t): Ditto. * config/riscv/riscv-vector-builtins.h: Add segment intrinsics. * config/riscv/riscv-vsetvl.cc (fault_first_load_p): Adapt for segment ff load. * config/riscv/riscv.md: Add segment instructions. * config/riscv/vector-iterators.md: Support segment intrinsics. * config/riscv/vector.md (@pred_unit_strided_load<mode>): New pattern. (@pred_unit_strided_store<mode>): Ditto. (@pred_strided_load<mode>): Ditto. (@pred_strided_store<mode>): Ditto. (@pred_fault_load<mode>): Ditto. (@pred_indexed_<order>load<V1T:mode><V1I:mode>): Ditto. (@pred_indexed_<order>load<V2T:mode><V2I:mode>): Ditto. (@pred_indexed_<order>load<V4T:mode><V4I:mode>): Ditto. 
(@pred_indexed_<order>load<V8T:mode><V8I:mode>): Ditto. (@pred_indexed_<order>load<V16T:mode><V16I:mode>): Ditto. (@pred_indexed_<order>load<V32T:mode><V32I:mode>): Ditto. (@pred_indexed_<order>load<V64T:mode><V64I:mode>): Ditto. (@pred_indexed_<order>store<V1T:mode><V1I:mode>): Ditto. (@pred_indexed_<order>store<V2T:mode><V2I:mode>): Ditto. (@pred_indexed_<order>store<V4T:mode><V4I:mode>): Ditto. (@pred_indexed_<order>store<V8T:mode><V8I:mode>): Ditto. (@pred_indexed_<order>store<V16T:mode><V16I:mode>): Ditto. (@pred_indexed_<order>store<V32T:mode><V32I:mode>): Ditto. (@pred_indexed_<order>store<V64T:mode><V64I:mode>): Ditto. Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
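As an illustration of what a unit-stride segment load and store do, here is a scalar model in portable C++. It is not the RVV intrinsic API and not code from this patch; `seg2_load` and `seg2_store` are hypothetical names, and the model assumes two fields per segment. Memory holds interleaved records, the load splits them into one vector per field, and the store interleaves them again.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of a two-field unit-stride segment load:
// memory holds vl records laid out as {field0, field1}, and the result
// holds one vector per field (the "tuple" of destination registers).
std::array<std::vector<float>, 2> seg2_load(const float *base, std::size_t vl) {
    std::array<std::vector<float>, 2> fields{std::vector<float>(vl),
                                             std::vector<float>(vl)};
    for (std::size_t i = 0; i < vl; ++i) {
        fields[0][i] = base[2 * i];      // field 0 of record i
        fields[1][i] = base[2 * i + 1];  // field 1 of record i
    }
    return fields;
}

// Hypothetical inverse: interleave the two field vectors back into memory.
void seg2_store(float *base, const std::array<std::vector<float>, 2> &fields,
                std::size_t vl) {
    for (std::size_t i = 0; i < vl; ++i) {
        base[2 * i] = fields[0][i];
        base[2 * i + 1] = fields[1][i];
    }
}
```

Loading and then storing the same tuple reproduces the original memory image, which is the property the segment load/store pairs are built around.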
2023-05-03 | RISC-V: Add tuple type vget/vset intrinsics | Ju-Zhe Zhong | 7 | -321/+688
gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc (valid_type): Adapt for tuple type support. (inttype): Ditto. (floattype): Ditto. (main): Ditto. * config/riscv/riscv-vector-builtins-bases.cc: Ditto. * config/riscv/riscv-vector-builtins-functions.def (vset): Add tuple type vset. (vget): Add tuple type vget. * config/riscv/riscv-vector-builtins-types.def (DEF_RVV_TUPLE_OPS): New macro. (vint8mf8x2_t): Ditto. (vuint8mf8x2_t): Ditto. (vint8mf8x3_t): Ditto. (vuint8mf8x3_t): Ditto. (vint8mf8x4_t): Ditto. (vuint8mf8x4_t): Ditto. (vint8mf8x5_t): Ditto. (vuint8mf8x5_t): Ditto. (vint8mf8x6_t): Ditto. (vuint8mf8x6_t): Ditto. (vint8mf8x7_t): Ditto. (vuint8mf8x7_t): Ditto. (vint8mf8x8_t): Ditto. (vuint8mf8x8_t): Ditto. (vint8mf4x2_t): Ditto. (vuint8mf4x2_t): Ditto. (vint8mf4x3_t): Ditto. (vuint8mf4x3_t): Ditto. (vint8mf4x4_t): Ditto. (vuint8mf4x4_t): Ditto. (vint8mf4x5_t): Ditto. (vuint8mf4x5_t): Ditto. (vint8mf4x6_t): Ditto. (vuint8mf4x6_t): Ditto. (vint8mf4x7_t): Ditto. (vuint8mf4x7_t): Ditto. (vint8mf4x8_t): Ditto. (vuint8mf4x8_t): Ditto. (vint8mf2x2_t): Ditto. (vuint8mf2x2_t): Ditto. (vint8mf2x3_t): Ditto. (vuint8mf2x3_t): Ditto. (vint8mf2x4_t): Ditto. (vuint8mf2x4_t): Ditto. (vint8mf2x5_t): Ditto. (vuint8mf2x5_t): Ditto. (vint8mf2x6_t): Ditto. (vuint8mf2x6_t): Ditto. (vint8mf2x7_t): Ditto. (vuint8mf2x7_t): Ditto. (vint8mf2x8_t): Ditto. (vuint8mf2x8_t): Ditto. (vint8m1x2_t): Ditto. (vuint8m1x2_t): Ditto. (vint8m1x3_t): Ditto. (vuint8m1x3_t): Ditto. (vint8m1x4_t): Ditto. (vuint8m1x4_t): Ditto. (vint8m1x5_t): Ditto. (vuint8m1x5_t): Ditto. (vint8m1x6_t): Ditto. (vuint8m1x6_t): Ditto. (vint8m1x7_t): Ditto. (vuint8m1x7_t): Ditto. (vint8m1x8_t): Ditto. (vuint8m1x8_t): Ditto. (vint8m2x2_t): Ditto. (vuint8m2x2_t): Ditto. (vint8m2x3_t): Ditto. (vuint8m2x3_t): Ditto. (vint8m2x4_t): Ditto. (vuint8m2x4_t): Ditto. (vint8m4x2_t): Ditto. (vuint8m4x2_t): Ditto. (vint16mf4x2_t): Ditto. (vuint16mf4x2_t): Ditto. (vint16mf4x3_t): Ditto. (vuint16mf4x3_t): Ditto. (vint16mf4x4_t): Ditto. 
(vuint16mf4x4_t): Ditto. (vint16mf4x5_t): Ditto. (vuint16mf4x5_t): Ditto. (vint16mf4x6_t): Ditto. (vuint16mf4x6_t): Ditto. (vint16mf4x7_t): Ditto. (vuint16mf4x7_t): Ditto. (vint16mf4x8_t): Ditto. (vuint16mf4x8_t): Ditto. (vint16mf2x2_t): Ditto. (vuint16mf2x2_t): Ditto. (vint16mf2x3_t): Ditto. (vuint16mf2x3_t): Ditto. (vint16mf2x4_t): Ditto. (vuint16mf2x4_t): Ditto. (vint16mf2x5_t): Ditto. (vuint16mf2x5_t): Ditto. (vint16mf2x6_t): Ditto. (vuint16mf2x6_t): Ditto. (vint16mf2x7_t): Ditto. (vuint16mf2x7_t): Ditto. (vint16mf2x8_t): Ditto. (vuint16mf2x8_t): Ditto. (vint16m1x2_t): Ditto. (vuint16m1x2_t): Ditto. (vint16m1x3_t): Ditto. (vuint16m1x3_t): Ditto. (vint16m1x4_t): Ditto. (vuint16m1x4_t): Ditto. (vint16m1x5_t): Ditto. (vuint16m1x5_t): Ditto. (vint16m1x6_t): Ditto. (vuint16m1x6_t): Ditto. (vint16m1x7_t): Ditto. (vuint16m1x7_t): Ditto. (vint16m1x8_t): Ditto. (vuint16m1x8_t): Ditto. (vint16m2x2_t): Ditto. (vuint16m2x2_t): Ditto. (vint16m2x3_t): Ditto. (vuint16m2x3_t): Ditto. (vint16m2x4_t): Ditto. (vuint16m2x4_t): Ditto. (vint16m4x2_t): Ditto. (vuint16m4x2_t): Ditto. (vint32mf2x2_t): Ditto. (vuint32mf2x2_t): Ditto. (vint32mf2x3_t): Ditto. (vuint32mf2x3_t): Ditto. (vint32mf2x4_t): Ditto. (vuint32mf2x4_t): Ditto. (vint32mf2x5_t): Ditto. (vuint32mf2x5_t): Ditto. (vint32mf2x6_t): Ditto. (vuint32mf2x6_t): Ditto. (vint32mf2x7_t): Ditto. (vuint32mf2x7_t): Ditto. (vint32mf2x8_t): Ditto. (vuint32mf2x8_t): Ditto. (vint32m1x2_t): Ditto. (vuint32m1x2_t): Ditto. (vint32m1x3_t): Ditto. (vuint32m1x3_t): Ditto. (vint32m1x4_t): Ditto. (vuint32m1x4_t): Ditto. (vint32m1x5_t): Ditto. (vuint32m1x5_t): Ditto. (vint32m1x6_t): Ditto. (vuint32m1x6_t): Ditto. (vint32m1x7_t): Ditto. (vuint32m1x7_t): Ditto. (vint32m1x8_t): Ditto. (vuint32m1x8_t): Ditto. (vint32m2x2_t): Ditto. (vuint32m2x2_t): Ditto. (vint32m2x3_t): Ditto. (vuint32m2x3_t): Ditto. (vint32m2x4_t): Ditto. (vuint32m2x4_t): Ditto. (vint32m4x2_t): Ditto. (vuint32m4x2_t): Ditto. (vint64m1x2_t): Ditto. (vuint64m1x2_t): Ditto. 
(vint64m1x3_t): Ditto. (vuint64m1x3_t): Ditto. (vint64m1x4_t): Ditto. (vuint64m1x4_t): Ditto. (vint64m1x5_t): Ditto. (vuint64m1x5_t): Ditto. (vint64m1x6_t): Ditto. (vuint64m1x6_t): Ditto. (vint64m1x7_t): Ditto. (vuint64m1x7_t): Ditto. (vint64m1x8_t): Ditto. (vuint64m1x8_t): Ditto. (vint64m2x2_t): Ditto. (vuint64m2x2_t): Ditto. (vint64m2x3_t): Ditto. (vuint64m2x3_t): Ditto. (vint64m2x4_t): Ditto. (vuint64m2x4_t): Ditto. (vint64m4x2_t): Ditto. (vuint64m4x2_t): Ditto. (vfloat32mf2x2_t): Ditto. (vfloat32mf2x3_t): Ditto. (vfloat32mf2x4_t): Ditto. (vfloat32mf2x5_t): Ditto. (vfloat32mf2x6_t): Ditto. (vfloat32mf2x7_t): Ditto. (vfloat32mf2x8_t): Ditto. (vfloat32m1x2_t): Ditto. (vfloat32m1x3_t): Ditto. (vfloat32m1x4_t): Ditto. (vfloat32m1x5_t): Ditto. (vfloat32m1x6_t): Ditto. (vfloat32m1x7_t): Ditto. (vfloat32m1x8_t): Ditto. (vfloat32m2x2_t): Ditto. (vfloat32m2x3_t): Ditto. (vfloat32m2x4_t): Ditto. (vfloat32m4x2_t): Ditto. (vfloat64m1x2_t): Ditto. (vfloat64m1x3_t): Ditto. (vfloat64m1x4_t): Ditto. (vfloat64m1x5_t): Ditto. (vfloat64m1x6_t): Ditto. (vfloat64m1x7_t): Ditto. (vfloat64m1x8_t): Ditto. (vfloat64m2x2_t): Ditto. (vfloat64m2x3_t): Ditto. (vfloat64m2x4_t): Ditto. (vfloat64m4x2_t): Ditto. * config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_OPS): Ditto. (DEF_RVV_TYPE_INDEX): Ditto. (rvv_arg_type_info::get_tuple_subpart_type): New function. (DEF_RVV_TUPLE_TYPE): New macro. * config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX): Adapt for tuple vget/vset support. (vint8mf4_t): Ditto. (vuint8mf4_t): Ditto. (vint8mf2_t): Ditto. (vuint8mf2_t): Ditto. (vint8m1_t): Ditto. (vuint8m1_t): Ditto. (vint8m2_t): Ditto. (vuint8m2_t): Ditto. (vint8m4_t): Ditto. (vuint8m4_t): Ditto. (vint8m8_t): Ditto. (vuint8m8_t): Ditto. (vint16mf4_t): Ditto. (vuint16mf4_t): Ditto. (vint16mf2_t): Ditto. (vuint16mf2_t): Ditto. (vint16m1_t): Ditto. (vuint16m1_t): Ditto. (vint16m2_t): Ditto. (vuint16m2_t): Ditto. (vint16m4_t): Ditto. (vuint16m4_t): Ditto. (vint16m8_t): Ditto. 
(vuint16m8_t): Ditto. (vint32mf2_t): Ditto. (vuint32mf2_t): Ditto. (vint32m1_t): Ditto. (vuint32m1_t): Ditto. (vint32m2_t): Ditto. (vuint32m2_t): Ditto. (vint32m4_t): Ditto. (vuint32m4_t): Ditto. (vint32m8_t): Ditto. (vuint32m8_t): Ditto. (vint64m1_t): Ditto. (vuint64m1_t): Ditto. (vint64m2_t): Ditto. (vuint64m2_t): Ditto. (vint64m4_t): Ditto. (vuint64m4_t): Ditto. (vint64m8_t): Ditto. (vuint64m8_t): Ditto. (vfloat32mf2_t): Ditto. (vfloat32m1_t): Ditto. (vfloat32m2_t): Ditto. (vfloat32m4_t): Ditto. (vfloat32m8_t): Ditto. (vfloat64m1_t): Ditto. (vfloat64m2_t): Ditto. (vfloat64m4_t): Ditto. (vfloat64m8_t): Ditto. (tuple_subpart): Add tuple subpart base type. * config/riscv/riscv-vector-builtins.h (struct rvv_arg_type_info): Ditto. (tuple_type_field): New function. Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
2023-05-03 | RISC-V: Add tuple types support | Ju-Zhe Zhong | 56 | -19/+6548
gcc/ChangeLog: * config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro. (RVV_TUPLE_PARTIAL_MODES): Ditto. * config/riscv/riscv-protos.h (riscv_v_ext_tuple_mode_p): New function. (get_nf): Ditto. (get_subpart_mode): Ditto. (get_tuple_mode): Ditto. (expand_tuple_move): Ditto. * config/riscv/riscv-v.cc (ENTRY): New macro. (TUPLE_ENTRY): Ditto. (get_nf): New function. (get_subpart_mode): Ditto. (get_tuple_mode): Ditto. (expand_tuple_move): Ditto. * config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_TYPE): New macro. (register_tuple_type): New function * config/riscv/riscv-vector-builtins.def (DEF_RVV_TUPLE_TYPE): New macro. (vint8mf8x2_t): New macro. (vuint8mf8x2_t): Ditto. (vint8mf8x3_t): Ditto. (vuint8mf8x3_t): Ditto. (vint8mf8x4_t): Ditto. (vuint8mf8x4_t): Ditto. (vint8mf8x5_t): Ditto. (vuint8mf8x5_t): Ditto. (vint8mf8x6_t): Ditto. (vuint8mf8x6_t): Ditto. (vint8mf8x7_t): Ditto. (vuint8mf8x7_t): Ditto. (vint8mf8x8_t): Ditto. (vuint8mf8x8_t): Ditto. (vint8mf4x2_t): Ditto. (vuint8mf4x2_t): Ditto. (vint8mf4x3_t): Ditto. (vuint8mf4x3_t): Ditto. (vint8mf4x4_t): Ditto. (vuint8mf4x4_t): Ditto. (vint8mf4x5_t): Ditto. (vuint8mf4x5_t): Ditto. (vint8mf4x6_t): Ditto. (vuint8mf4x6_t): Ditto. (vint8mf4x7_t): Ditto. (vuint8mf4x7_t): Ditto. (vint8mf4x8_t): Ditto. (vuint8mf4x8_t): Ditto. (vint8mf2x2_t): Ditto. (vuint8mf2x2_t): Ditto. (vint8mf2x3_t): Ditto. (vuint8mf2x3_t): Ditto. (vint8mf2x4_t): Ditto. (vuint8mf2x4_t): Ditto. (vint8mf2x5_t): Ditto. (vuint8mf2x5_t): Ditto. (vint8mf2x6_t): Ditto. (vuint8mf2x6_t): Ditto. (vint8mf2x7_t): Ditto. (vuint8mf2x7_t): Ditto. (vint8mf2x8_t): Ditto. (vuint8mf2x8_t): Ditto. (vint8m1x2_t): Ditto. (vuint8m1x2_t): Ditto. (vint8m1x3_t): Ditto. (vuint8m1x3_t): Ditto. (vint8m1x4_t): Ditto. (vuint8m1x4_t): Ditto. (vint8m1x5_t): Ditto. (vuint8m1x5_t): Ditto. (vint8m1x6_t): Ditto. (vuint8m1x6_t): Ditto. (vint8m1x7_t): Ditto. (vuint8m1x7_t): Ditto. (vint8m1x8_t): Ditto. (vuint8m1x8_t): Ditto. (vint8m2x2_t): Ditto. (vuint8m2x2_t): Ditto. 
(vint8m2x3_t): Ditto. (vuint8m2x3_t): Ditto. (vint8m2x4_t): Ditto. (vuint8m2x4_t): Ditto. (vint8m4x2_t): Ditto. (vuint8m4x2_t): Ditto. (vint16mf4x2_t): Ditto. (vuint16mf4x2_t): Ditto. (vint16mf4x3_t): Ditto. (vuint16mf4x3_t): Ditto. (vint16mf4x4_t): Ditto. (vuint16mf4x4_t): Ditto. (vint16mf4x5_t): Ditto. (vuint16mf4x5_t): Ditto. (vint16mf4x6_t): Ditto. (vuint16mf4x6_t): Ditto. (vint16mf4x7_t): Ditto. (vuint16mf4x7_t): Ditto. (vint16mf4x8_t): Ditto. (vuint16mf4x8_t): Ditto. (vint16mf2x2_t): Ditto. (vuint16mf2x2_t): Ditto. (vint16mf2x3_t): Ditto. (vuint16mf2x3_t): Ditto. (vint16mf2x4_t): Ditto. (vuint16mf2x4_t): Ditto. (vint16mf2x5_t): Ditto. (vuint16mf2x5_t): Ditto. (vint16mf2x6_t): Ditto. (vuint16mf2x6_t): Ditto. (vint16mf2x7_t): Ditto. (vuint16mf2x7_t): Ditto. (vint16mf2x8_t): Ditto. (vuint16mf2x8_t): Ditto. (vint16m1x2_t): Ditto. (vuint16m1x2_t): Ditto. (vint16m1x3_t): Ditto. (vuint16m1x3_t): Ditto. (vint16m1x4_t): Ditto. (vuint16m1x4_t): Ditto. (vint16m1x5_t): Ditto. (vuint16m1x5_t): Ditto. (vint16m1x6_t): Ditto. (vuint16m1x6_t): Ditto. (vint16m1x7_t): Ditto. (vuint16m1x7_t): Ditto. (vint16m1x8_t): Ditto. (vuint16m1x8_t): Ditto. (vint16m2x2_t): Ditto. (vuint16m2x2_t): Ditto. (vint16m2x3_t): Ditto. (vuint16m2x3_t): Ditto. (vint16m2x4_t): Ditto. (vuint16m2x4_t): Ditto. (vint16m4x2_t): Ditto. (vuint16m4x2_t): Ditto. (vint32mf2x2_t): Ditto. (vuint32mf2x2_t): Ditto. (vint32mf2x3_t): Ditto. (vuint32mf2x3_t): Ditto. (vint32mf2x4_t): Ditto. (vuint32mf2x4_t): Ditto. (vint32mf2x5_t): Ditto. (vuint32mf2x5_t): Ditto. (vint32mf2x6_t): Ditto. (vuint32mf2x6_t): Ditto. (vint32mf2x7_t): Ditto. (vuint32mf2x7_t): Ditto. (vint32mf2x8_t): Ditto. (vuint32mf2x8_t): Ditto. (vint32m1x2_t): Ditto. (vuint32m1x2_t): Ditto. (vint32m1x3_t): Ditto. (vuint32m1x3_t): Ditto. (vint32m1x4_t): Ditto. (vuint32m1x4_t): Ditto. (vint32m1x5_t): Ditto. (vuint32m1x5_t): Ditto. (vint32m1x6_t): Ditto. (vuint32m1x6_t): Ditto. (vint32m1x7_t): Ditto. (vuint32m1x7_t): Ditto. (vint32m1x8_t): Ditto. 
(vuint32m1x8_t): Ditto. (vint32m2x2_t): Ditto. (vuint32m2x2_t): Ditto. (vint32m2x3_t): Ditto. (vuint32m2x3_t): Ditto. (vint32m2x4_t): Ditto. (vuint32m2x4_t): Ditto. (vint32m4x2_t): Ditto. (vuint32m4x2_t): Ditto. (vint64m1x2_t): Ditto. (vuint64m1x2_t): Ditto. (vint64m1x3_t): Ditto. (vuint64m1x3_t): Ditto. (vint64m1x4_t): Ditto. (vuint64m1x4_t): Ditto. (vint64m1x5_t): Ditto. (vuint64m1x5_t): Ditto. (vint64m1x6_t): Ditto. (vuint64m1x6_t): Ditto. (vint64m1x7_t): Ditto. (vuint64m1x7_t): Ditto. (vint64m1x8_t): Ditto. (vuint64m1x8_t): Ditto. (vint64m2x2_t): Ditto. (vuint64m2x2_t): Ditto. (vint64m2x3_t): Ditto. (vuint64m2x3_t): Ditto. (vint64m2x4_t): Ditto. (vuint64m2x4_t): Ditto. (vint64m4x2_t): Ditto. (vuint64m4x2_t): Ditto. (vfloat32mf2x2_t): Ditto. (vfloat32mf2x3_t): Ditto. (vfloat32mf2x4_t): Ditto. (vfloat32mf2x5_t): Ditto. (vfloat32mf2x6_t): Ditto. (vfloat32mf2x7_t): Ditto. (vfloat32mf2x8_t): Ditto. (vfloat32m1x2_t): Ditto. (vfloat32m1x3_t): Ditto. (vfloat32m1x4_t): Ditto. (vfloat32m1x5_t): Ditto. (vfloat32m1x6_t): Ditto. (vfloat32m1x7_t): Ditto. (vfloat32m1x8_t): Ditto. (vfloat32m2x2_t): Ditto. (vfloat32m2x3_t): Ditto. (vfloat32m2x4_t): Ditto. (vfloat32m4x2_t): Ditto. (vfloat64m1x2_t): Ditto. (vfloat64m1x3_t): Ditto. (vfloat64m1x4_t): Ditto. (vfloat64m1x5_t): Ditto. (vfloat64m1x6_t): Ditto. (vfloat64m1x7_t): Ditto. (vfloat64m1x8_t): Ditto. (vfloat64m2x2_t): Ditto. (vfloat64m2x3_t): Ditto. (vfloat64m2x4_t): Ditto. (vfloat64m4x2_t): Ditto. * config/riscv/riscv-vector-builtins.h (DEF_RVV_TUPLE_TYPE): Ditto. * config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto. * config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): New function. (TUPLE_ENTRY): Ditto. (riscv_v_ext_mode_p): New function. (riscv_v_adjust_nunits): Add tuple mode adjustment. (riscv_classify_address): Ditto. (riscv_binary_cost): Ditto. (riscv_rtx_costs): Ditto. (riscv_secondary_memory_needed): Ditto. (riscv_hard_regno_nregs): Ditto. (riscv_hard_regno_mode_ok): Ditto. 
(riscv_vector_mode_supported_p): Ditto. (riscv_regmode_natural_size): Ditto. (riscv_array_mode): New function. (TARGET_ARRAY_MODE): New target hook. * config/riscv/riscv.md: Add tuple modes. * config/riscv/vector-iterators.md: Ditto. * config/riscv/vector.md (mov<mode>): Add tuple modes data movement. (*mov<VT:mode>_<P:mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/abi-10.c: New test. * gcc.target/riscv/rvv/base/abi-11.c: New test. * gcc.target/riscv/rvv/base/abi-12.c: New test. * gcc.target/riscv/rvv/base/abi-13.c: New test. * gcc.target/riscv/rvv/base/abi-14.c: New test. * gcc.target/riscv/rvv/base/abi-15.c: New test. * gcc.target/riscv/rvv/base/abi-16.c: New test. * gcc.target/riscv/rvv/base/abi-8.c: New test. * gcc.target/riscv/rvv/base/abi-9.c: New test. * gcc.target/riscv/rvv/base/tuple-1.c: New test. * gcc.target/riscv/rvv/base/tuple-10.c: New test. * gcc.target/riscv/rvv/base/tuple-11.c: New test. * gcc.target/riscv/rvv/base/tuple-12.c: New test. * gcc.target/riscv/rvv/base/tuple-13.c: New test. * gcc.target/riscv/rvv/base/tuple-14.c: New test. * gcc.target/riscv/rvv/base/tuple-15.c: New test. * gcc.target/riscv/rvv/base/tuple-16.c: New test. * gcc.target/riscv/rvv/base/tuple-17.c: New test. * gcc.target/riscv/rvv/base/tuple-18.c: New test. * gcc.target/riscv/rvv/base/tuple-19.c: New test. * gcc.target/riscv/rvv/base/tuple-2.c: New test. * gcc.target/riscv/rvv/base/tuple-20.c: New test. * gcc.target/riscv/rvv/base/tuple-21.c: New test. * gcc.target/riscv/rvv/base/tuple-22.c: New test. * gcc.target/riscv/rvv/base/tuple-23.c: New test. * gcc.target/riscv/rvv/base/tuple-24.c: New test. * gcc.target/riscv/rvv/base/tuple-25.c: New test. * gcc.target/riscv/rvv/base/tuple-26.c: New test. * gcc.target/riscv/rvv/base/tuple-27.c: New test. * gcc.target/riscv/rvv/base/tuple-3.c: New test. * gcc.target/riscv/rvv/base/tuple-4.c: New test. * gcc.target/riscv/rvv/base/tuple-5.c: New test. * gcc.target/riscv/rvv/base/tuple-6.c: New test. 
* gcc.target/riscv/rvv/base/tuple-7.c: New test. * gcc.target/riscv/rvv/base/tuple-8.c: New test. * gcc.target/riscv/rvv/base/tuple-9.c: New test. * gcc.target/riscv/rvv/base/user-10.c: New test. * gcc.target/riscv/rvv/base/user-11.c: New test. * gcc.target/riscv/rvv/base/user-12.c: New test. * gcc.target/riscv/rvv/base/user-13.c: New test. * gcc.target/riscv/rvv/base/user-14.c: New test. * gcc.target/riscv/rvv/base/user-15.c: New test. * gcc.target/riscv/rvv/base/user-7.c: New test. * gcc.target/riscv/rvv/base/user-8.c: New test. * gcc.target/riscv/rvv/base/user-9.c: New test. Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
2023-05-03 | Speedup cse_insn | Richard Biener | 1 | -24/+27
When cse_insn prunes src{,_folded,_eqv_here,_related} with the equivalence set in the *_same_value chain it also searches for an equivalence to the destination of the instruction with

    /* This is the same as the destination of the insns, we want
       to prefer it.  Copy it to src_related.  The code below will
       then give it a negative cost.  */
    if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
      src_related = p->exp;

This picks up the last such equivalence, and in particular any later duplicate will be pruned by the preceding

    else if (src_related && GET_CODE (src_related) == code
             && rtx_equal_p (src_related, p->exp))
      src_related = 0;

first. This wastes cycles doing extra rtx_equal_p checks.

The following instead searches for the first destination equivalence separately in this loop and delays using src_related for it until we are about to process that, avoiding another redundant rtx_equal_p check.

I came here because of a testcase with very large equivalence lists and a high compile time in cse_insn. The patch below doesn't speed it up significantly since there's no equivalence on the destination.

In theory this opens the possibility to track dest_related separately, avoiding the implicit pruning of any previous value in src_related. As is, the change should be a no-op for code generation.

* cse.cc (cse_insn): Track an equivalence to the destination separately and delay using src_related for it.
2023-05-03 | Improve RTL CSE hash table hash usage | Richard Biener | 1 | -14/+23
The RTL CSE hash table has a fixed number of buckets (32), each with a linked list of entries with the same hash value. The actual hash values are computed using hash_rtx, which uses adds for mixing and adds the rtx CODE as CODE << 7 (apart from some exceptions such as MEM). The unsigned int typed hash value is then simply truncated for the actual lookup into the fixed-size table, which means that usually CODE is simply lost.

The following improves this truncation by first mixing in more bits using xor. It does not change the actual hash function since that's used outside of CSE as well.

An alternative would be to bump the fixed number of buckets, say to 256, which would retain the LSB of CODE, or to 8192, which can capture all 6 bits required for the last CODE. As the comment in CSE says, invalidate_memory and flush_hash_table are possibly done frequently and those at least need to walk all slots, so when the hash table is mostly empty enlarging it will be a loss. Still, there should be more regular lookups by hash, so fewer collisions should pay off as well.

Without enlarging the table a better hash function is unlikely to make a big difference; simple statistics on the number of collisions at insertion time show a reduction of around 10%. Bumping HASH_SHIFT by 1 improves that to 30% at the expense of reducing the average table fill by 10% (all of these stats are from looking just at fold-const.i at -O2). Increasing HASH_SHIFT more leaves the table even more sparse, likely showing that hash_rtx uses add for mixing, which is quite bad. Bumping HASH_SHIFT by 2 removes 90% of all collisions.

Experimenting with using inchash instead of adds for the mixing does not improve things when looking at the numbers with HASH_SHIFT bumped by 2.

* cse.cc (HASH): Turn into inline function and mix in another HASH_SHIFT bits. (SAFE_HASH): Likewise.
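The effect of the truncation can be sketched in a few lines of standalone C++. This is an illustration under assumptions, not the code in cse.cc: `toy_hash`, `bucket_truncated` and `bucket_folded` are hypothetical names, and the fold shown (xor with the value shifted right by HASH_SHIFT) is one plausible reading of "mix in another HASH_SHIFT bits".

```cpp
// 32 buckets, as described in the commit message.
constexpr unsigned HASH_SHIFT = 5;
constexpr unsigned HASH_SIZE = 1u << HASH_SHIFT;
constexpr unsigned HASH_MASK = HASH_SIZE - 1;

// Toy stand-in for hash_rtx: the code is added in as code << 7.
constexpr unsigned toy_hash(unsigned code, unsigned payload) {
    return (code << 7) + payload;
}

// Plain truncation: every bit at or above HASH_SHIFT is dropped, so the
// code contribution (bit 7 and up) never reaches the bucket index.
constexpr unsigned bucket_truncated(unsigned h) { return h & HASH_MASK; }

// Folded variant (assumed form): xor in the next HASH_SHIFT bits before
// masking, so part of the code survives in the bucket index.
constexpr unsigned bucket_folded(unsigned h) {
    return (h ^ (h >> HASH_SHIFT)) & HASH_MASK;
}

// Two different codes with the same payload collide under truncation...
static_assert(bucket_truncated(toy_hash(3, 1)) == bucket_truncated(toy_hash(5, 1)),
              "truncation loses the code");
// ...but land in different buckets once the extra bits are folded in.
static_assert(bucket_folded(toy_hash(3, 1)) != bucket_folded(toy_hash(5, 1)),
              "folding keeps part of the code");
```

With a 5-bit shift only bits 5 to 9 are folded in, which covers the low three bits of a code placed at bit 7; that is consistent with the message's observation that a larger HASH_SHIFT removes more collisions.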
2023-05-03 | aarch64: PR target/99195 annotate HADDSUB patterns for vec-concat with zero | Kyrylo Tkachov | 2 | -7/+11
Further straightforward patch for the various halving intrinsics with or without rounding, plus tests. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (aarch64_<sur>h<addsub><mode>): Rename to... (aarch64_<sur>h<addsub><mode><vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_1.c: Add tests for halving and rounding add/sub intrinsics.
2023-05-03 | aarch64: PR target/99195 annotate simple floating-point patterns for vec-concat with zero | Kyrylo Tkachov | 3 | -9/+92
Continuing the, almost mechanical, series this patch adds annotation for some of the simple floating-point patterns we have, and adds testing to ensure that redundant zeroing instructions are eliminated. Bootstrapped and tested on aarch64-none-linux-gnu and also aarch64_be-none-elf. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (add<mode>3): Rename to... (add<mode>3<vczle><vczbe>): ... This. (sub<mode>3): Rename to... (sub<mode>3<vczle><vczbe>): ... This. (mul<mode>3): Rename to... (mul<mode>3<vczle><vczbe>): ... This. (*div<mode>3): Rename to... (*div<mode>3<vczle><vczbe>): ... This. (neg<mode>2): Rename to... (neg<mode>2<vczle><vczbe>): ... This. (abs<mode>2): Rename to... (abs<mode>2<vczle><vczbe>): ... This. (<frint_pattern><mode>2): Rename to... (<frint_pattern><mode>2<vczle><vczbe>): ... This. (<fmaxmin><mode>3): Rename to... (<fmaxmin><mode>3<vczle><vczbe>): ... This. (*sqrt<mode>2): Rename to... (*sqrt<mode>2<vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_1.c: Add testing for some unary and binary floating-point ops. * gcc.target/aarch64/simd/pr99195_2.c: New test.
2023-05-03 | Docs: Add vector register constraint for asm operands | Kito Cheng | 1 | -0/+9
Document the `vr`, `vm` and `vd` vector register constraints; these three constraints are implemented in LLVM as well. gcc/ChangeLog: * doc/md.texi (RISC-V): Add vr, vm, vd constraints.
2023-05-03 | clang warning: warning: private field 'm_gc' is not used [-Wunused-private-field] | Martin Liska | 2 | -2/+0
PR tree-optimization/109693 gcc/ChangeLog: * value-range-storage.cc (vrange_allocator::vrange_allocator): Remove unused field. * value-range-storage.h: Likewise.
2023-05-03 | c++: Fix up VEC_INIT_EXPR gimplification after r12-7069 | Jakub Jelinek | 1 | -9/+9
During patch backporting, I've noticed that while most callers of cp_walk_tree with the cp_fold_r callback were changed from &pset to cp_fold_data &data, the VEC_INIT_EXPR gimplification was not, so it still passes just the address of a hash_set<tree>, and so if during the folding we ever touch data->flags, we use uninitialized data there. The following patch changes it to do the same thing as cp_fold_function because the VEC_INIT_EXPR gimplifications will happen on function bodies only. 2023-05-03 Jakub Jelinek <jakub@redhat.com> * cp-gimplify.cc (cp_fold_data): Move definition earlier. (cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false constructed data rather than &pset to cp_walk_tree with cp_fold_r.
2023-05-03 | c++: fix TTP level reduction cache | Jason Merrill | 2 | -2/+8
We try to cache the result of reduce_template_parm_level so that when we reduce the same parm multiple times we get the same result, but this wasn't working for template template parms because in that case TYPE is a TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same level mismatch that we're trying to adjust for. So in that case compare the template parms of the template template parms instead. The result can be seen in nontype12.C, where we previously gave three duplicate errors on line 7 and now give only one because subsequent substitutions use the cache. gcc/cp/ChangeLog: * pt.cc (reduce_template_parm_level): Fix comparison of template template parm to cached version. gcc/testsuite/ChangeLog: * g++.dg/template/nontype12.C: Check for duplicate error.
2023-05-03 | Daily bump. | GCC Administrator | 4 | -1/+194
2023-05-02 | c++: simplify member template substitution | Jason Merrill | 1 | -28/+10
I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway.

    #define NC(X)                       \
      template <class U> struct X##1;   \
      template <class U> struct X##2;   \
      template <class U> struct X##3;   \
      template <class U> struct X##4;   \
      template <class U> struct X##5;   \
      template <class U> struct X##6;
    #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
    #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
    template <int I> struct A
    {
      NC3(am)
    };
    template <class...Ts> void sink(Ts...);
    template <int...Is> void g()
    {
      sink(A<Is>()...);
    }
    template <int I> void f()
    {
      g<__integer_pack(I)...>();
    }
    int main()
    {
      f<1000>();
    }

gcc/cp/ChangeLog: * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template. (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
2023-05-02 | PHIOPT: small refactoring of match_simplify_replacement. | Andrew Pinski | 1 | -33/+24
When I added diamond shaped form bb to match_simplify_replacement, I copied the code to move the statement rather than factoring it out to a new function. This does the refactoring to a new function to avoid the duplicated code. It will make adding support for having two statements to move easier (the second statement will only be a conversion). OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (move_stmt): New function. (match_simplify_replacement): Use move_stmt instead of the inlined version.
2023-05-02 | MATCH: Port CLRSB part of builtin_zero_pattern | Andrew Pinski | 1 | -0/+8
This ports the clrsb builtin part of builtin_zero_pattern to match.pd. A simple pattern to port. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New pattern.
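The identity behind the new pattern can be checked at the source level. The sketch below assumes that CST is the value CLRSB already returns for zero, namely the precision of the type minus one; the commit message does not spell the constant out. `clrsb_guarded` and `clrsb_plain` are hypothetical names, and `__builtin_clrsb` is the GCC/Clang builtin.

```cpp
#include <climits>

// Precision of int minus one: what __builtin_clrsb yields for 0, since
// every bit below the sign bit is then a redundant copy of the sign bit.
constexpr int kIntPrecisionMinusOne = static_cast<int>(sizeof(int) * CHAR_BIT) - 1;

// The guarded form the pattern matches: a != 0 ? CLRSB(a) : CST.
int clrsb_guarded(int a) {
    return a != 0 ? __builtin_clrsb(a) : kIntPrecisionMinusOne;
}

// The simplified form: CLRSB(a) with no test against zero.
int clrsb_plain(int a) {
    return __builtin_clrsb(a);
}
```

Because the builtin is well defined at zero, the two functions agree for every input, so the comparison and the select are redundant.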
2023-05-02 | tree-optimization: [PR109702] MATCH: Fix a ? func(a) : N patterns | Andrew Pinski | 2 | -8/+78
I accidentally messed up these patterns, so the operand compared against 0 and the argument of the function were not required to match up when they need to. I committed this as obvious after a bootstrap/test on x86_64-linux-gnu. PR tree-optimization/109702 gcc/ChangeLog: * match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-25b.c: New test.
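A source-level sketch shows why the operand tested against zero must be the same value that is passed to the function. The helper names are hypothetical; the builtins are the usual GCC/Clang ones. Only POPCOUNT, FFS and PARITY are shown, with CST assumed to be 0 (what each builtin returns for a zero argument); CLZ and CTZ are left out because their value at zero is target dependent.

```cpp
// Guarded forms: the tested operand and the builtin argument are the same
// variable, so the guard is redundant and can be folded away.
int popcount_guarded(unsigned a) { return a != 0 ? __builtin_popcount(a) : 0; }
int ffs_guarded(int a)           { return a != 0 ? __builtin_ffs(a) : 0; }
int parity_guarded(unsigned a)   { return a != 0 ? __builtin_parity(a) : 0; }

// A form that must NOT be folded: the guard tests b, not the argument a.
// Dropping the test here would change the result whenever b == 0 and a != 0.
int popcount_mismatched(unsigned a, unsigned b) {
    return b != 0 ? __builtin_popcount(a) : 0;
}
```

For example, `popcount_mismatched(7, 0)` is 0 while `__builtin_popcount(7)` is 3, which is the kind of mismatch the corrected patterns have to reject.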
2023-05-02 | target: [PR109657] (a ? -1 : 0) | b could be optimized better for aarch64 | Andrew Pinski | 2 | -0/+47
There is no canonical form defined for this case, so the aarch64 backend needs a pattern to match both of these forms. The forms are:

    (set (reg/i:SI 0 x0)
        (if_then_else:SI (eq (reg:CC 66 cc)
                (const_int 0 [0]))
            (reg:SI 97)
            (const_int -1 [0xffffffffffffffff])))

and

    (set (reg/i:SI 0 x0)
        (ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
                    (const_int 0 [0])))
            (reg:SI 102)))

Currently the aarch64 backend matches the first form, so this patch adds an insn_and_split to match the second form and convert it to the first form.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/109657 gcc/ChangeLog: * config/aarch64/aarch64.md (*cmov<mode>_insn_m1): New insn_and_split pattern. gcc/testsuite/ChangeLog: * gcc.target/aarch64/csinv-2.c: New test.
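The two RTL shapes correspond to source-level expressions that compute the same value, which is why a single instruction can serve both. The functions below are an illustrative sketch with hypothetical names, not the testcase from the patch.

```cpp
#include <cstdint>

// Select form: matches the if_then_else shape (condition ? -1 : b).
int32_t select_form(bool cond, int32_t b) {
    return cond ? -1 : b;
}

// Ior form as written in the PR title: (a ? -1 : 0) | b.
int32_t ior_form(bool cond, int32_t b) {
    return (cond ? -1 : 0) | b;
}

// Ior form spelled like the second RTL shape: negate the 0/1 comparison
// result to get 0 or all-ones, then OR it with b.
int32_t neg_ior_form(bool cond, int32_t b) {
    return -static_cast<int32_t>(cond) | b;
}
```

OR-ing with all-ones always gives -1 and OR-ing with zero leaves `b` unchanged, so all three functions agree for every input.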
2023-05-02 | c++: less invalidate_class_lookup_cache | Jason Merrill | 1 | -3/+0
In the testcase below, we push_to_top_level to instantiate f and g, and they can both use the previous_class_level cache from instantiating A<int>. Wiping the cache in pop_from_top_level is not helpful; we'll do that in pushclass if needed.

    template <class T> struct A
    {
      int i;
      void f() { i = 42; }
      void g() { i = 24; }
    };
    int main()
    {
      A<int> a;
      a.f();
      a.g();
    }

gcc/cp/ChangeLog: * name-lookup.cc (pop_from_top_level): Don't invalidate_class_lookup_cache.
2023-05-02 | c++: look for empty base at specific offset [PR109678] | Jason Merrill | 3 | -5/+25
While looking at the empty base handling for 109678, it occurred to me that we ought to be able to look for an empty base at a specific offset, not just in general. PR c++/109678 gcc/cp/ChangeLog: * cp-tree.h (lookup_base): Add offset parm. * constexpr.cc (cxx_fold_indirect_ref_1): Pass it. * search.cc (struct lookup_base_data_s): Add offset. (dfs_lookup_base): Handle it. (lookup_base): Pass it.
2023-05-02 | c++: std::variant slow to compile [PR109678] | Jason Merrill | 2 | -10/+60
Here, when dealing with a class with a complex subobject structure, we would try and fail to find the relevant FIELD_DECL for an empty base before giving up. And we would do this at each level, in a combinatorially problematic way. Instead, we should check for an empty base first. PR c++/109678 gcc/cp/ChangeLog: * constexpr.cc (cxx_fold_indirect_ref_1): Handle empty base first. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/variant1.C: New test.
2023-05-02 | RISC-V: Table A.6 conformance tests | Patrick O'Neill | 28 | -0/+360
These tests cover basic cases to ensure the atomic mappings follow the strengthened Table A.6 mappings that are compatible with Table A.7. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/testsuite/ChangeLog: * gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test. * gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test. * gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test. * gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test. * gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test. * gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test. * gcc.target/riscv/amo-table-a-6-fence-1.c: New test. * gcc.target/riscv/amo-table-a-6-fence-2.c: New test. * gcc.target/riscv/amo-table-a-6-fence-3.c: New test. * gcc.target/riscv/amo-table-a-6-fence-4.c: New test. * gcc.target/riscv/amo-table-a-6-fence-5.c: New test. * gcc.target/riscv/amo-table-a-6-load-1.c: New test. * gcc.target/riscv/amo-table-a-6-load-2.c: New test. * gcc.target/riscv/amo-table-a-6-load-3.c: New test. * gcc.target/riscv/amo-table-a-6-store-1.c: New test. * gcc.target/riscv/amo-table-a-6-store-2.c: New test. * gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test. * gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test. * gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test. * gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test. * gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test. * gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Weaken atomic loadsPatrick O'Neill1-2/+26
This change brings atomic loads in line with table A.6 of the ISA manual. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/sync.md (atomic_load<mode>): Implement atomic load mapping. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Weaken mem_thread_fencePatrick O'Neill1-3/+13
This change brings atomic fences in line with table A.6 of the ISA manual. Relax mem_thread_fence according to the memmodel given. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/sync.md (mem_thread_fence_1): Change fence depending on the given memory model. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
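A minimal sketch of what "change fence depending on the given memory model" can look like, written as standalone C++ rather than as the machine-description code in sync.md. The helper name `fence_for` is hypothetical, the fence strings follow my reading of Table A.6 of the ISA manual, and treating `memory_order_consume` like acquire is an assumption.

```cpp
#include <atomic>
#include <string>

// Hypothetical helper (not GCC's code): returns the fence instruction that
// Table A.6 suggests for a standalone thread fence of the given order.
// Assumption: consume is handled the same way as acquire.
std::string fence_for(std::memory_order order) {
  switch (order) {
    case std::memory_order_relaxed: return "";            // no fence emitted
    case std::memory_order_consume:
    case std::memory_order_acquire: return "fence r,rw";
    case std::memory_order_release: return "fence rw,w";
    case std::memory_order_acq_rel: return "fence.tso";
    case std::memory_order_seq_cst: return "fence rw,rw";
  }
  return "fence rw,rw";  // conservative fallback for unknown values
}
```

The point of the mapping is that only a sequentially consistent fence needs the full `fence rw,rw`; the weaker orders get correspondingly weaker fences.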
2023-05-02RISC-V: Weaken LR/SC pairsPatrick O'Neill3-47/+114
Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs as needed. Atomic compare and exchange ops provide success and failure memory models. C++17 and later place no restrictions on the relative strength of each model, so ensure we cover both by using a model that enforces the ordering of both given models. This change brings LR/SC ops in line with table A.6 of the ISA manual. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_union_memmodels): Expose riscv_union_memmodels function to sync.md. * config/riscv/riscv.cc (riscv_union_memmodels): Add function to get the union of two memmodels in sync.md. (riscv_print_operand): Add %I and %J flags that output the optimal LR/SC flag bits for a given memory model. * config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl bits on SC op and replace with optimized %I, %J flags. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
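The idea of "a model that enforces the ordering of both given models" can be sketched as follows. This is not the body of riscv_union_memmodels; `union_orders`, `has_acquire` and `has_release` are hypothetical names, the sketch uses `std::memory_order` instead of GCC's internal `memmodel` enum, and it assumes consume is treated as acquire.

```cpp
#include <atomic>

// Hypothetical helpers: does the order include acquire / release semantics?
// Assumption: memory_order_consume is treated like acquire.
inline bool has_acquire(std::memory_order m) {
  return m == std::memory_order_consume || m == std::memory_order_acquire ||
         m == std::memory_order_acq_rel || m == std::memory_order_seq_cst;
}

inline bool has_release(std::memory_order m) {
  return m == std::memory_order_release || m == std::memory_order_acq_rel ||
         m == std::memory_order_seq_cst;
}

// Returns an order at least as strong as both inputs.
std::memory_order union_orders(std::memory_order a, std::memory_order b) {
  if (a == std::memory_order_seq_cst || b == std::memory_order_seq_cst)
    return std::memory_order_seq_cst;
  const bool acq = has_acquire(a) || has_acquire(b);
  const bool rel = has_release(a) || has_release(b);
  if (acq && rel) return std::memory_order_acq_rel;
  if (rel) return std::memory_order_release;
  if (acq) return std::memory_order_acquire;
  return std::memory_order_relaxed;
}
```

For a compare-exchange with a release success order and an acquire failure order, the sketch yields acq_rel, so a single LR/SC sequence covers both outcomes.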
2023-05-02RISC-V: Eliminate AMO op fencesPatrick O'Neill2-17/+11
Atomic operations with the appropriate bits set already enforce release semantics. Remove unnecessary release fences from atomic ops. This change brings AMO ops in line with table A.6 of the ISA manual. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_memmodel_needs_amo_release): Change function name. (riscv_print_operand): Remove unneeded %F case. * config/riscv/sync.md: Remove unneeded fences. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Strengthen atomic storesPatrick O'Neill2-3/+27
This change makes atomic stores strictly stronger than table A.6 of the ISA manual. This mapping makes the overall patchset compatible with table A.7 as well. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> PR target/89835 gcc/ChangeLog: * config/riscv/sync.md (atomic_store<mode>): Use simple store instruction in combination with fence(s). gcc/testsuite/ChangeLog: * gcc.target/riscv/pr89835.c: New test. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Add AMO release bitsPatrick O'Neill1-1/+6
This patch sets the relevant .rl bits on amo operations. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_print_operand): Change behavior of %A to include release bits. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Enforce atomic compare_exchange SEQ_CSTPatrick O'Neill1-2/+9
This patch enforces SEQ_CST for atomic compare_exchange ops. Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs recommended by table A.6 of the ISA manual. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/sync.md (atomic_cas_value_strong<mode>): Change FENCE/LR.aq/SC.aq into sequentially consistent LR.aqrl/SC.rl pair. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Enforce subword atomic LR/SC SEQ_CSTPatrick O'Neill1-4/+4
Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs recommended by table A.6 of the ISA manual. 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/sync.md: Change LR.aq/SC.rl pairs into sequentially consistent LR.aqrl/SC.rl pairs. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: Eliminate SYNC memory modelsPatrick O'Neill1-8/+3
Remove references to MEMMODEL_SYNC_* models by converting via memmodel_base(). 2023-04-27 Patrick O'Neill <patrick@rivosinc.com> gcc/ChangeLog: * config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and sanitize memmodel input with memmodel_base. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
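As a rough illustration of why converting through a "base" function removes the need for separate SYNC cases, the sketch below models a memory-model value as a base model plus an optional sync flag bit. The names (`Model`, `kSyncBit`, `kBaseMask`, `base_of`) and the exact bit layout are assumptions made for this example, not GCC's definitions.

```cpp
// Assumed layout for illustration: one high flag bit marks the legacy
// __sync variants; the low bits hold the base memory model.
constexpr unsigned kSyncBit = 1u << 15;
constexpr unsigned kBaseMask = kSyncBit - 1;

enum Model : unsigned {
  Relaxed = 0,
  Consume = 1,
  Acquire = 2,
  Release = 3,
  AcqRel = 4,
  SeqCst = 5,
  SyncAcquire = Acquire | kSyncBit,
  SyncRelease = Release | kSyncBit,
  SyncSeqCst = SeqCst | kSyncBit
};

// Hypothetical analogue of memmodel_base(): drop the sync flag so callers
// only ever have to switch over the base models.
constexpr Model base_of(Model m) {
  return static_cast<Model>(m & kBaseMask);
}
```

After sanitizing the input this way, a switch over the model needs no dedicated SYNC cases.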
2023-05-02RISC-V: Name newly added flags in changelogPatrick O'Neill1-2/+4
This patch fixes the changelog to explicitly name the added command line flags introduced in this patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html 2023-05-01 Patrick O'Neill <patrick@rivosinc.com> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-05-02RISC-V: ICE for vlmul_ext_v intrinsic APIYanzhang Wang2-1/+16
PR target/109617 gcc/ChangeLog: * config/riscv/vector-iterators.md: Support VNx2HI and VNX4DI when MIN_VLEN >= 128. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/vlmul_ext-1.c: New test. Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com> Co-authored-by: Pan Li <pan2.li@intel.com> Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
2023-05-02RISC-V: fix build issue with gcc 4.9.xRomain Naour1-2/+2
GCC should still build with GCC 4.8.3 or newer [1] using C++03 by default. But a recent change in RISC-V port introduced a C++11 feature "std::log2" [2]. Use log2 from the C header, without the namespace [3]. [1] https://gcc.gnu.org/install/prerequisites.html [2] https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=7caa1ae5e451e780fbc4746a54e3f19d4f4304dc [3] https://stackoverflow.com/questions/26733413/error-log2-is-not-a-member-of-std Fixes: https://gitlab.com/buildroot.org/toolchains-builder/-/jobs/4202276589 gcc/ChangeLog: * config/riscv/genrvv-type-indexer.cc: Use log2 from the C header, without the namespace. Signed-off-by: Romain Naour <romain.naour@gmail.com>
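A small sketch of the portable spelling described above: include the C header and call `log2` unqualified, instead of relying on `std::log2`, which older C++ standard libraries do not declare in namespace `std`. The helper name `exact_log2_of` is hypothetical and only serves the example.

```cpp
#include <math.h>  // C header: declares log2 in the global namespace

// Hypothetical helper: log2 of a power of two, using the unqualified C
// function rather than std::log2.
int exact_log2_of(unsigned value) {
  return static_cast<int>(log2(static_cast<double>(value)));
}
```

The call resolves to the C library function, so it does not depend on the `<cmath>` overloads that C++11 added.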
2023-05-02c++: Add testcase for already fixed PR [PR109506]Patrick Palka1-0/+22
The PR109666 fix r14-386-g07c52d1eec967 incidentally also fixes this PR. PR c++/109506 gcc/testsuite/ChangeLog: * g++.dg/cpp0x/nsdmi-template26.C: New test.
2023-05-02docs: port documentation of VRP paramsMartin Liska1-6/+9
gcc/ChangeLog: * doc/invoke.texi: Update documentation based on param.opt file.
2023-05-02tree-optimization/109672 - properly check emulated plus during vectRichard Biener1-12/+9
The following refactors the check for emulated vector support for the cases of plus, minus and negate. In the PR we end up with a SImode plus, supported by the target but emulated and in this context fail to verify we are dealing with exactly word_mode. PR tree-optimization/109672 * tree-vect-stmts.cc (vectorizable_operation): For plus, minus and negate always check the vector mode is word mode.
2023-05-02[i386] Fix testcases for emulated scatterRichard Biener4-5/+8
The following adjusts testcases: the pr88531 tests fail with -m32 because we do not consider MMX size vectorization there, and the pr89618 test runs into load/store cost differences with -m32. * gcc.target/i386/pr88531-2a.c: Skip scanning for ia32. * gcc.target/i386/pr88531-2b.c: Likewise. * gcc.target/i386/pr88531-2c.c: Likewise. * gcc.target/i386/pr89618-2.c: Likewise. Disable AVX512.
2023-05-02Daily bump.GCC Administrator5-1/+427
2023-05-01ubsan: ubsan_maybe_instrument_array_ref tweakMarek Polacek1-6/+2
In <https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613687.html> we discussed that the copy_node in ubsan_maybe_instrument_array_ref is redundant, but also that it'd be best to postpone the optimization to GCC 14. So I'm making that change now. gcc/c-family/ChangeLog: * c-ubsan.cc (ubsan_maybe_instrument_array_ref): Don't copy_node.
2023-05-01c++: array DMI and member fn [PR109666]Jason Merrill5-41/+58
Here it turns out I also needed to adjust cfun when stepping out of the member function to instantiate the DMI. But instead of adding that tweak, let's unify with instantiate_body and just push_to_top_level instead of trying to do the minimum subset of it. There was no measurable change in compile time on stdc++.h. This should also resolve 109506 without yet another tweak. PR c++/109666 gcc/cp/ChangeLog: * name-lookup.cc (maybe_push_to_top_level) (maybe_pop_from_top_level): Split out... * pt.cc (instantiate_body): ...from here. * init.cc (maybe_instantiate_nsdmi_init): Use them. * name-lookup.h: Declare them. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/nsdmi-array2.C: New test.
2023-05-01PHIOPT: Update comment about what the pass now doesAndrew Pinski1-31/+36
I noticed I didn't update the comment about how the pass works after I initially added match_simplify_replacement. Anyway, this updates the comment to reflect the current state of the pass. OK? gcc/ChangeLog: * tree-ssa-phiopt.cc: Update comment about how the transformations are implemented.
2023-05-01Convert xstormy16 to LRAJeff Law1-3/+0
This patch converts the xstormy16 port to LRA. It introduces a code quality regression in the shiftsi testcase, but it also fixes numerous aborts/errors. IMHO it's a good tradeoff. gcc/ * config/stormy16/stormy16.cc (TARGET_LRA_P): Remove definition.
2023-05-01Enable LRA on several portsJeff Law6-17/+0
Spurred by Segher's RFC, I went ahead and tested several ports with LRA enabled. Not surprisingly, many failed, but a few built their full set of libraries successfully and of those a few even ran their testsuites with no regressions. In fact, enabling LRA fixes a small number of failures on the iq2000 port. This patch converts the ports which built their libraries and have test results that are as good as or better than without LRA. There may be minor code quality regressions or there may be minor code quality improvements -- I'm leaving that for the port maintainers to own going forward. gcc/ * config/cris/cris.cc (TARGET_LRA_P): Remove. * config/epiphany/epiphany.cc (TARGET_LRA_P): Remove. * config/iq2000/iq2000.cc (TARGET_LRA_P): Remove. * config/m32r/m32r.cc (TARGET_LRA_P): Remove. * config/microblaze/microblaze.cc (TARGET_LRA_P): Remove. * config/mmix/mmix.cc (TARGET_LRA_P): Remove.
2023-05-01apply debug-remap to file names in .su filesRasmus Villemoes3-2/+8
The .su files generated with -fstack-usage are arguably debug info. In order to make builds more reproducible, apply the same remapping logic to the recorded file names as for when producing the debug info embedded in the object files. To this end, teach print_decl_identifier() a new PRINT_DECL_REMAP_DEBUG flag and use that from output_stack_usage_1(). gcc/ChangeLog: * print-tree.h (PRINT_DECL_REMAP_DEBUG): New flag. * print-tree.cc (print_decl_identifier): Implement it. * toplev.cc (output_stack_usage_1): Use it.
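The remapping idea can be sketched as a plain prefix substitution, in the spirit of what options such as -fdebug-prefix-map=OLD=NEW request. This is not GCC's implementation: `remap_prefix` and the `PrefixMap` alias are hypothetical, and taking the first matching entry is an assumption of this sketch.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a list of OLD -> NEW prefix mappings.
typedef std::vector<std::pair<std::string, std::string> > PrefixMap;

// Replace the first matching OLD prefix of `path` with its NEW counterpart;
// return the path unchanged when no mapping applies.
std::string remap_prefix(const std::string& path, const PrefixMap& maps) {
  for (PrefixMap::const_iterator it = maps.begin(); it != maps.end(); ++it) {
    const std::string& old_prefix = it->first;
    if (path.compare(0, old_prefix.size(), old_prefix) == 0)
      return it->second + path.substr(old_prefix.size());
  }
  return path;
}
```

Applying the same substitution to the file names written into .su files means the build directory no longer leaks into them, which is what makes the output reproducible across checkouts.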
2023-05-01Remove unused friends in int_range<>.Aldy Hernandez1-5/+0
gcc/ChangeLog: * value-range.h (class int_range): Remove gt_ggc_mx and gt_pch_nx friends.
2023-05-01Inline irange::set_nonzero.Aldy Hernandez1-2/+18
irange::set_nonzero is used everywhere and benefits immensely from inlining. gcc/ChangeLog: * value-range.h (irange::set_nonzero): Inline.
2023-05-01Cleanup irange::set.Aldy Hernandez5-135/+59
Now that anti-ranges are no more and iranges contain wide_ints instead of trees, various cleanups are possible. This is one of a handful of patches improving the performance of irange::set() which is not on a hot path, but quite sensitive because it is so pervasive. gcc/ChangeLog: * gimple-range-op.cc (cfn_ffs::fold_range): Use the correct precision. * gimple-ssa-warn-alloca.cc (alloca_call_type): Use <2> for invalid_range, as it is an inverse range. * tree-vrp.cc (find_case_label_range): Avoid trees. * value-range.cc (irange::irange_set): Delete. (irange::irange_set_1bit_anti_range): Delete. (irange::irange_set_anti_range): Delete. (irange::set): Cleanup. * value-range.h (class irange): Remove irange_set, irange_set_anti_range, irange_set_1bit_anti_range. (irange::set_undefined): Remove set to m_type.
2023-05-01Convert internal representation of irange to wide_ints.Aldy Hernandez6-213/+153
gcc/ChangeLog: * range-op.cc (update_known_bitmask): Adjust for irange containing wide_ints internally. * tree-ssanames.cc (set_nonzero_bits): Same. * tree-ssanames.h (set_nonzero_bits): Same. * value-range-storage.cc (irange_storage::set_irange): Same. (irange_storage::get_irange): Same. * value-range.cc (irange::operator=): Same. (irange::irange_set): Same. (irange::irange_set_1bit_anti_range): Same. (irange::irange_set_anti_range): Same. (irange::set): Same. (irange::verify_range): Same. (irange::contains_p): Same. (irange::irange_single_pair_union): Same. (irange::union_): Same. (irange::irange_contains_p): Same. (irange::intersect): Same. (irange::invert): Same. (irange::set_range_from_nonzero_bits): Same. (irange::set_nonzero_bits): Same. (mask_to_wi): Same. (irange::intersect_nonzero_bits): Same. (irange::union_nonzero_bits): Same. (gt_ggc_mx): Same. (gt_pch_nx): Same. (tree_range): Same. (range_tests_strict_enum): Same. (range_tests_misc): Same. (range_tests_nonzero_bits): Same. * value-range.h (irange::type): Same. (irange::varying_compatible_p): Same. (irange::irange): Same. (int_range::int_range): Same. (irange::set_undefined): Same. (irange::set_varying): Same. (irange::lower_bound): Same. (irange::upper_bound): Same.
2023-05-01Rewrite bounds_of_var_in_loop() to use ranges.Aldy Hernandez3-249/+117
Little by little, bounds_of_var_in_loop() has grown into an unmaintainable mess. This patch rewrites the code to use the relevant APIs as well as refactor it to make it more readable. gcc/ChangeLog: * gimple-range-fold.cc (tree_lower_bound): Delete. (tree_upper_bound): Delete. (vrp_val_max): Delete. (vrp_val_min): Delete. (fold_using_range::range_of_ssa_name_with_loop_info): Call range_of_var_in_loop. * vr-values.cc (valid_value_p): Delete. (fix_overflow): Delete. (get_scev_info): New. (bounds_of_var_in_loop): Refactor into... (induction_variable_may_overflow_p): ...this, (range_from_loop_direction): ...and this, (range_of_var_in_loop): ...and this. * vr-values.h (bounds_of_var_in_loop): Delete. (range_of_var_in_loop): New.
2023-05-01Replace vrp_val* with wide_ints.Aldy Hernandez6-115/+78
This patch removes all uses of vrp_val_{min,max} in favor of irange_val_*, which are wide_int based. This will leave only one use of vrp_val_* which returns trees in range_of_ssa_name_with_loop_info() because it needs to work with non-integers (floats, etc). In a follow-up patch, this function will also be cleaned up such that vrp_val_* can be deleted. The functions min_limit and max_limit in range-op.cc are now useless as they're basically irange_val*. I didn't rename them yet to avoid churn. I'll do it in a later patch. gcc/ChangeLog: * gimple-range-fold.cc (adjust_pointer_diff_expr): Rewrite with irange_val*. (vrp_val_max): New. (vrp_val_min): New. * gimple-range-op.cc (cfn_strlen::fold_range): Use irange_val_*. * range-op.cc (max_limit): Same. (min_limit): Same. (plus_minus_ranges): Same. (operator_rshift::op1_range): Same. (operator_cast::inside_domain_p): Same. * value-range.cc (vrp_val_is_max): Delete. (vrp_val_is_min): Delete. (range_tests_misc): Use irange_val_*. * value-range.h (vrp_val_is_min): Delete. (vrp_val_is_max): Delete. (vrp_val_max): Delete. (irange_val_min): New. (vrp_val_min): Delete. (irange_val_max): New. * vr-values.cc (check_for_binary_op_overflow): Use irange_val_*.