|
Add segment load/store intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/198
gcc/ChangeLog:
* config/riscv/riscv-vector-builtins-bases.cc (fold_fault_load):
New function.
(class vlseg): New class.
(class vsseg): Ditto.
(class vlsseg): Ditto.
(class vssseg): Ditto.
(class seg_indexed_load): Ditto.
(class seg_indexed_store): Ditto.
(class vlsegff): Ditto.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vlseg):
Ditto.
(vsseg): Ditto.
(vlsseg): Ditto.
(vssseg): Ditto.
(vluxseg): Ditto.
(vloxseg): Ditto.
(vsuxseg): Ditto.
(vsoxseg): Ditto.
(vlsegff): Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct
seg_loadstore_def): Ditto.
(struct seg_indexed_loadstore_def): Ditto.
(struct seg_fault_load_def): Ditto.
(SHAPE): Ditto.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc
(function_builder::append_nf): New function.
* config/riscv/riscv-vector-builtins.def (vfloat32m1x2_t):
Change ptr from double into float.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h: Add segment intrinsics.
* config/riscv/riscv-vsetvl.cc (fault_first_load_p): Adapt for
segment ff load.
* config/riscv/riscv.md: Add segment instructions.
* config/riscv/vector-iterators.md: Support segment intrinsics.
* config/riscv/vector.md (@pred_unit_strided_load<mode>): New
pattern.
(@pred_unit_strided_store<mode>): Ditto.
(@pred_strided_load<mode>): Ditto.
(@pred_strided_store<mode>): Ditto.
(@pred_fault_load<mode>): Ditto.
(@pred_indexed_<order>load<V1T:mode><V1I:mode>): Ditto.
(@pred_indexed_<order>load<V2T:mode><V2I:mode>): Ditto.
(@pred_indexed_<order>load<V4T:mode><V4I:mode>): Ditto.
(@pred_indexed_<order>load<V8T:mode><V8I:mode>): Ditto.
(@pred_indexed_<order>load<V16T:mode><V16I:mode>): Ditto.
(@pred_indexed_<order>load<V32T:mode><V32I:mode>): Ditto.
(@pred_indexed_<order>load<V64T:mode><V64I:mode>): Ditto.
(@pred_indexed_<order>store<V1T:mode><V1I:mode>): Ditto.
(@pred_indexed_<order>store<V2T:mode><V2I:mode>): Ditto.
(@pred_indexed_<order>store<V4T:mode><V4I:mode>): Ditto.
(@pred_indexed_<order>store<V8T:mode><V8I:mode>): Ditto.
(@pred_indexed_<order>store<V16T:mode><V16I:mode>): Ditto.
(@pred_indexed_<order>store<V32T:mode><V32I:mode>): Ditto.
(@pred_indexed_<order>store<V64T:mode><V64I:mode>): Ditto.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
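A hedged usage sketch of the new segment intrinsics, assuming the intrinsic and
tuple-type spellings from the rvv-intrinsic-doc pull request linked above
(__riscv_vlseg2e32_v_f32m1x2 and friends); treat the exact names as assumptions:
#include <riscv_vector.h>
/* De-interleave an array of {x, y} float pairs into two separate arrays
   using a 2-field unit-strided segment load.  */
void
split_xy (const float *src, float *xs, float *ys, size_t n)
{
  for (size_t vl; n > 0; n -= vl, src += 2 * vl, xs += vl, ys += vl)
    {
      vl = __riscv_vsetvl_e32m1 (n);
      vfloat32m1x2_t seg = __riscv_vlseg2e32_v_f32m1x2 (src, vl);
      __riscv_vse32_v_f32m1 (xs, __riscv_vget_v_f32m1x2_f32m1 (seg, 0), vl);
      __riscv_vse32_v_f32m1 (ys, __riscv_vget_v_f32m1x2_f32m1 (seg, 1), vl);
    }
}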
|
|
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc (valid_type): Adapt for
tuple type support.
(inttype): Ditto.
(floattype): Ditto.
(main): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vset): Add
tuple type vset.
(vget): Add tuple type vget.
* config/riscv/riscv-vector-builtins-types.def
(DEF_RVV_TUPLE_OPS): New macro.
(vint8mf8x2_t): Ditto.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_OPS):
Ditto.
(DEF_RVV_TYPE_INDEX): Ditto.
(rvv_arg_type_info::get_tuple_subpart_type): New function.
(DEF_RVV_TUPLE_TYPE): New macro.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TYPE_INDEX):
Adapt for tuple vget/vset support.
(vint8mf4_t): Ditto.
(vuint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vuint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vuint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vuint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vuint8m4_t): Ditto.
(vint8m8_t): Ditto.
(vuint8m8_t): Ditto.
(vint16mf4_t): Ditto.
(vuint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vuint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vuint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vuint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vuint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vuint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vuint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vuint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vuint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vuint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vuint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vuint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
(tuple_subpart): Add tuple subpart base type.
* config/riscv/riscv-vector-builtins.h (struct
rvv_arg_type_info): Ditto.
(tuple_type_field): New function.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
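A hedged sketch of the tuple-type vget/vset usage this enables, assuming the
rvv-intrinsic-doc naming (__riscv_vget_v_<tuple>_<field>,
__riscv_vset_v_<field>_<tuple>); the exact spellings are assumptions:
#include <riscv_vector.h>
/* Swap the two fields of a 2-field LMUL=1 int32 tuple.  */
vint32m1x2_t
swap_fields (vint32m1x2_t t)
{
  vint32m1_t a = __riscv_vget_v_i32m1x2_i32m1 (t, 0);
  vint32m1_t b = __riscv_vget_v_i32m1x2_i32m1 (t, 1);
  t = __riscv_vset_v_i32m1_i32m1x2 (t, 0, b);
  return __riscv_vset_v_i32m1_i32m1x2 (t, 1, a);
}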
|
|
gcc/ChangeLog:
* config/riscv/riscv-modes.def (RVV_TUPLE_MODES): New macro.
(RVV_TUPLE_PARTIAL_MODES): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_tuple_mode_p): New
function.
(get_nf): Ditto.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-v.cc (ENTRY): New macro.
(TUPLE_ENTRY): Ditto.
(get_nf): New function.
(get_subpart_mode): Ditto.
(get_tuple_mode): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_TUPLE_TYPE):
New macro.
(register_tuple_type): New function.
* config/riscv/riscv-vector-builtins.def (DEF_RVV_TUPLE_TYPE):
New macro.
(vint8mf8x2_t): New macro.
(vuint8mf8x2_t): Ditto.
(vint8mf8x3_t): Ditto.
(vuint8mf8x3_t): Ditto.
(vint8mf8x4_t): Ditto.
(vuint8mf8x4_t): Ditto.
(vint8mf8x5_t): Ditto.
(vuint8mf8x5_t): Ditto.
(vint8mf8x6_t): Ditto.
(vuint8mf8x6_t): Ditto.
(vint8mf8x7_t): Ditto.
(vuint8mf8x7_t): Ditto.
(vint8mf8x8_t): Ditto.
(vuint8mf8x8_t): Ditto.
(vint8mf4x2_t): Ditto.
(vuint8mf4x2_t): Ditto.
(vint8mf4x3_t): Ditto.
(vuint8mf4x3_t): Ditto.
(vint8mf4x4_t): Ditto.
(vuint8mf4x4_t): Ditto.
(vint8mf4x5_t): Ditto.
(vuint8mf4x5_t): Ditto.
(vint8mf4x6_t): Ditto.
(vuint8mf4x6_t): Ditto.
(vint8mf4x7_t): Ditto.
(vuint8mf4x7_t): Ditto.
(vint8mf4x8_t): Ditto.
(vuint8mf4x8_t): Ditto.
(vint8mf2x2_t): Ditto.
(vuint8mf2x2_t): Ditto.
(vint8mf2x3_t): Ditto.
(vuint8mf2x3_t): Ditto.
(vint8mf2x4_t): Ditto.
(vuint8mf2x4_t): Ditto.
(vint8mf2x5_t): Ditto.
(vuint8mf2x5_t): Ditto.
(vint8mf2x6_t): Ditto.
(vuint8mf2x6_t): Ditto.
(vint8mf2x7_t): Ditto.
(vuint8mf2x7_t): Ditto.
(vint8mf2x8_t): Ditto.
(vuint8mf2x8_t): Ditto.
(vint8m1x2_t): Ditto.
(vuint8m1x2_t): Ditto.
(vint8m1x3_t): Ditto.
(vuint8m1x3_t): Ditto.
(vint8m1x4_t): Ditto.
(vuint8m1x4_t): Ditto.
(vint8m1x5_t): Ditto.
(vuint8m1x5_t): Ditto.
(vint8m1x6_t): Ditto.
(vuint8m1x6_t): Ditto.
(vint8m1x7_t): Ditto.
(vuint8m1x7_t): Ditto.
(vint8m1x8_t): Ditto.
(vuint8m1x8_t): Ditto.
(vint8m2x2_t): Ditto.
(vuint8m2x2_t): Ditto.
(vint8m2x3_t): Ditto.
(vuint8m2x3_t): Ditto.
(vint8m2x4_t): Ditto.
(vuint8m2x4_t): Ditto.
(vint8m4x2_t): Ditto.
(vuint8m4x2_t): Ditto.
(vint16mf4x2_t): Ditto.
(vuint16mf4x2_t): Ditto.
(vint16mf4x3_t): Ditto.
(vuint16mf4x3_t): Ditto.
(vint16mf4x4_t): Ditto.
(vuint16mf4x4_t): Ditto.
(vint16mf4x5_t): Ditto.
(vuint16mf4x5_t): Ditto.
(vint16mf4x6_t): Ditto.
(vuint16mf4x6_t): Ditto.
(vint16mf4x7_t): Ditto.
(vuint16mf4x7_t): Ditto.
(vint16mf4x8_t): Ditto.
(vuint16mf4x8_t): Ditto.
(vint16mf2x2_t): Ditto.
(vuint16mf2x2_t): Ditto.
(vint16mf2x3_t): Ditto.
(vuint16mf2x3_t): Ditto.
(vint16mf2x4_t): Ditto.
(vuint16mf2x4_t): Ditto.
(vint16mf2x5_t): Ditto.
(vuint16mf2x5_t): Ditto.
(vint16mf2x6_t): Ditto.
(vuint16mf2x6_t): Ditto.
(vint16mf2x7_t): Ditto.
(vuint16mf2x7_t): Ditto.
(vint16mf2x8_t): Ditto.
(vuint16mf2x8_t): Ditto.
(vint16m1x2_t): Ditto.
(vuint16m1x2_t): Ditto.
(vint16m1x3_t): Ditto.
(vuint16m1x3_t): Ditto.
(vint16m1x4_t): Ditto.
(vuint16m1x4_t): Ditto.
(vint16m1x5_t): Ditto.
(vuint16m1x5_t): Ditto.
(vint16m1x6_t): Ditto.
(vuint16m1x6_t): Ditto.
(vint16m1x7_t): Ditto.
(vuint16m1x7_t): Ditto.
(vint16m1x8_t): Ditto.
(vuint16m1x8_t): Ditto.
(vint16m2x2_t): Ditto.
(vuint16m2x2_t): Ditto.
(vint16m2x3_t): Ditto.
(vuint16m2x3_t): Ditto.
(vint16m2x4_t): Ditto.
(vuint16m2x4_t): Ditto.
(vint16m4x2_t): Ditto.
(vuint16m4x2_t): Ditto.
(vint32mf2x2_t): Ditto.
(vuint32mf2x2_t): Ditto.
(vint32mf2x3_t): Ditto.
(vuint32mf2x3_t): Ditto.
(vint32mf2x4_t): Ditto.
(vuint32mf2x4_t): Ditto.
(vint32mf2x5_t): Ditto.
(vuint32mf2x5_t): Ditto.
(vint32mf2x6_t): Ditto.
(vuint32mf2x6_t): Ditto.
(vint32mf2x7_t): Ditto.
(vuint32mf2x7_t): Ditto.
(vint32mf2x8_t): Ditto.
(vuint32mf2x8_t): Ditto.
(vint32m1x2_t): Ditto.
(vuint32m1x2_t): Ditto.
(vint32m1x3_t): Ditto.
(vuint32m1x3_t): Ditto.
(vint32m1x4_t): Ditto.
(vuint32m1x4_t): Ditto.
(vint32m1x5_t): Ditto.
(vuint32m1x5_t): Ditto.
(vint32m1x6_t): Ditto.
(vuint32m1x6_t): Ditto.
(vint32m1x7_t): Ditto.
(vuint32m1x7_t): Ditto.
(vint32m1x8_t): Ditto.
(vuint32m1x8_t): Ditto.
(vint32m2x2_t): Ditto.
(vuint32m2x2_t): Ditto.
(vint32m2x3_t): Ditto.
(vuint32m2x3_t): Ditto.
(vint32m2x4_t): Ditto.
(vuint32m2x4_t): Ditto.
(vint32m4x2_t): Ditto.
(vuint32m4x2_t): Ditto.
(vint64m1x2_t): Ditto.
(vuint64m1x2_t): Ditto.
(vint64m1x3_t): Ditto.
(vuint64m1x3_t): Ditto.
(vint64m1x4_t): Ditto.
(vuint64m1x4_t): Ditto.
(vint64m1x5_t): Ditto.
(vuint64m1x5_t): Ditto.
(vint64m1x6_t): Ditto.
(vuint64m1x6_t): Ditto.
(vint64m1x7_t): Ditto.
(vuint64m1x7_t): Ditto.
(vint64m1x8_t): Ditto.
(vuint64m1x8_t): Ditto.
(vint64m2x2_t): Ditto.
(vuint64m2x2_t): Ditto.
(vint64m2x3_t): Ditto.
(vuint64m2x3_t): Ditto.
(vint64m2x4_t): Ditto.
(vuint64m2x4_t): Ditto.
(vint64m4x2_t): Ditto.
(vuint64m4x2_t): Ditto.
(vfloat32mf2x2_t): Ditto.
(vfloat32mf2x3_t): Ditto.
(vfloat32mf2x4_t): Ditto.
(vfloat32mf2x5_t): Ditto.
(vfloat32mf2x6_t): Ditto.
(vfloat32mf2x7_t): Ditto.
(vfloat32mf2x8_t): Ditto.
(vfloat32m1x2_t): Ditto.
(vfloat32m1x3_t): Ditto.
(vfloat32m1x4_t): Ditto.
(vfloat32m1x5_t): Ditto.
(vfloat32m1x6_t): Ditto.
(vfloat32m1x7_t): Ditto.
(vfloat32m1x8_t): Ditto.
(vfloat32m2x2_t): Ditto.
(vfloat32m2x3_t): Ditto.
(vfloat32m2x4_t): Ditto.
(vfloat32m4x2_t): Ditto.
(vfloat64m1x2_t): Ditto.
(vfloat64m1x3_t): Ditto.
(vfloat64m1x4_t): Ditto.
(vfloat64m1x5_t): Ditto.
(vfloat64m1x6_t): Ditto.
(vfloat64m1x7_t): Ditto.
(vfloat64m1x8_t): Ditto.
(vfloat64m2x2_t): Ditto.
(vfloat64m2x3_t): Ditto.
(vfloat64m2x4_t): Ditto.
(vfloat64m4x2_t): Ditto.
* config/riscv/riscv-vector-builtins.h (DEF_RVV_TUPLE_TYPE):
Ditto.
* config/riscv/riscv-vector-switch.def (TUPLE_ENTRY): Ditto.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): New
function.
(TUPLE_ENTRY): Ditto.
(riscv_v_ext_mode_p): New function.
(riscv_v_adjust_nunits): Add tuple mode adjustment.
(riscv_classify_address): Ditto.
(riscv_binary_cost): Ditto.
(riscv_rtx_costs): Ditto.
(riscv_secondary_memory_needed): Ditto.
(riscv_hard_regno_nregs): Ditto.
(riscv_hard_regno_mode_ok): Ditto.
(riscv_vector_mode_supported_p): Ditto.
(riscv_regmode_natural_size): Ditto.
(riscv_array_mode): New function.
(TARGET_ARRAY_MODE): New target hook.
* config/riscv/riscv.md: Add tuple modes.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md (mov<mode>): Add tuple modes data
movement.
(*mov<VT:mode>_<P:mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/abi-10.c: New test.
* gcc.target/riscv/rvv/base/abi-11.c: New test.
* gcc.target/riscv/rvv/base/abi-12.c: New test.
* gcc.target/riscv/rvv/base/abi-13.c: New test.
* gcc.target/riscv/rvv/base/abi-14.c: New test.
* gcc.target/riscv/rvv/base/abi-15.c: New test.
* gcc.target/riscv/rvv/base/abi-16.c: New test.
* gcc.target/riscv/rvv/base/abi-8.c: New test.
* gcc.target/riscv/rvv/base/abi-9.c: New test.
* gcc.target/riscv/rvv/base/tuple-1.c: New test.
* gcc.target/riscv/rvv/base/tuple-10.c: New test.
* gcc.target/riscv/rvv/base/tuple-11.c: New test.
* gcc.target/riscv/rvv/base/tuple-12.c: New test.
* gcc.target/riscv/rvv/base/tuple-13.c: New test.
* gcc.target/riscv/rvv/base/tuple-14.c: New test.
* gcc.target/riscv/rvv/base/tuple-15.c: New test.
* gcc.target/riscv/rvv/base/tuple-16.c: New test.
* gcc.target/riscv/rvv/base/tuple-17.c: New test.
* gcc.target/riscv/rvv/base/tuple-18.c: New test.
* gcc.target/riscv/rvv/base/tuple-19.c: New test.
* gcc.target/riscv/rvv/base/tuple-2.c: New test.
* gcc.target/riscv/rvv/base/tuple-20.c: New test.
* gcc.target/riscv/rvv/base/tuple-21.c: New test.
* gcc.target/riscv/rvv/base/tuple-22.c: New test.
* gcc.target/riscv/rvv/base/tuple-23.c: New test.
* gcc.target/riscv/rvv/base/tuple-24.c: New test.
* gcc.target/riscv/rvv/base/tuple-25.c: New test.
* gcc.target/riscv/rvv/base/tuple-26.c: New test.
* gcc.target/riscv/rvv/base/tuple-27.c: New test.
* gcc.target/riscv/rvv/base/tuple-3.c: New test.
* gcc.target/riscv/rvv/base/tuple-4.c: New test.
* gcc.target/riscv/rvv/base/tuple-5.c: New test.
* gcc.target/riscv/rvv/base/tuple-6.c: New test.
* gcc.target/riscv/rvv/base/tuple-7.c: New test.
* gcc.target/riscv/rvv/base/tuple-8.c: New test.
* gcc.target/riscv/rvv/base/tuple-9.c: New test.
* gcc.target/riscv/rvv/base/user-10.c: New test.
* gcc.target/riscv/rvv/base/user-11.c: New test.
* gcc.target/riscv/rvv/base/user-12.c: New test.
* gcc.target/riscv/rvv/base/user-13.c: New test.
* gcc.target/riscv/rvv/base/user-14.c: New test.
* gcc.target/riscv/rvv/base/user-15.c: New test.
* gcc.target/riscv/rvv/base/user-7.c: New test.
* gcc.target/riscv/rvv/base/user-8.c: New test.
* gcc.target/riscv/rvv/base/user-9.c: New test.
Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
|
|
When cse_insn prunes src{,_folded,_eqv_here,_related} with the
equivalence set in the *_same_value chain, it also searches for
an equivalence to the destination of the instruction with
/* This is the same as the destination of the insns, we want
to prefer it. Copy it to src_related. The code below will
then give it a negative cost. */
if (GET_CODE (dest) == code && rtx_equal_p (p->exp, dest))
src_related = p->exp;
this picks up the last such equivalence and in particular any
later duplicate will be pruned by the preceding
else if (src_related && GET_CODE (src_related) == code
&& rtx_equal_p (src_related, p->exp))
src_related = 0;
first. This wastes cycles doing extra rtx_equal_p checks. The
following instead searches for the first destination equivalence
separately in this loop and delays using src_related for it until
we are about to process that, avoiding another redundant rtx_equal_p
check.
I came here because of a testcase with very large equivalence
lists and compile-time of cse_insn. The patch below doesn't speed
it up significantly since there's no equivalence on the destination.
In theory this opens the possibility to track dest_related
separately, avoiding the implicit pruning of any previous
value in src_related. As is, the change should be a no-op for
code generation.
* cse.cc (cse_insn): Track an equivalence to the destination
separately and delay using src_related for it.
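A standalone toy sketch of that restructuring (GCC's real cse.cc works on
table_elt same-value chains and rtx comparisons, not these placeholder types):
#include <string>
#include <vector>
struct elt { std::string exp; };
void
prune_sources (const std::vector<elt> &same_value_chain,
               const std::string &dest)
{
  /* Find the single element equivalent to the destination once...  */
  const elt *dest_elt = nullptr;
  for (const elt &e : same_value_chain)
    if (e.exp == dest)
      {
        dest_elt = &e;
        break;
      }
  /* ...and in the main pruning loop only a pointer compare is needed;
     src_related is set when we are about to process that element.  */
  std::string src_related;
  for (const elt &e : same_value_chain)
    {
      if (&e == dest_elt)
        src_related = e.exp;
      /* ... compare e.exp against src, src_folded, src_eqv_here, ...  */
    }
}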
|
|
The RTL CSE hash table has a fixed number of buckets (32), each
with a linked list of entries with the same hash value. The
actual hash values are computed using hash_rtx which uses adds
for mixing and adds the rtx CODE as CODE << 7 (apart from some
exceptions such as MEM). The unsigned int typed hash value
is then simply truncated for the actual lookup into the fixed-size
table, which means that usually CODE is lost.
The following improves this truncation by first mixing in more
bits using xor. It does not change the actual hash function
since that's used outside of CSE as well.
An alternative would be to bump the fixed number of buckets,
say to 256, which would retain the LSB of CODE, or to 8192, which
can capture all 6 bits required for the last CODE.
As the comment in CSE says, invalidate_memory and flush_hash_table
are done possibly frequently and those at least need to walk all
slots, so when the hash table is mostly empty, enlarging it will be
a loss. Still, there should be more regular lookups by hash, so
fewer collisions should pay off as well.
Without enlarging the table a better hash function is unlikely
to make a big difference; simple statistics on the number of
collisions at insertion time show a reduction of
around 10%. Bumping HASH_SHIFT by 1 improves that to 30%
at the expense of reducing the average table fill by 10%
(all of these stats are from looking just at fold-const.i at -O2).
Increasing HASH_SHIFT further leaves the table even more sparse,
likely showing that hash_rtx's add-based mixing is
quite bad. Bumping HASH_SHIFT by 2 removes 90% of all
collisions.
Experimenting with using inchash instead of adds for the
mixing does not improve things when looking at the numbers with
HASH_SHIFT bumped by 2.
* cse.cc (HASH): Turn into inline function and mix
in another HASH_SHIFT bits.
(SAFE_HASH): Likewise.
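A standalone sketch of the bucket-index mixing being described (the helper
name is made up and the details approximate the cse.cc change rather than
copy it); HASH_SHIFT of 5 corresponds to the 32 buckets mentioned above:
#define HASH_SHIFT 5
#define HASH_SIZE  (1u << HASH_SHIFT)
#define HASH_MASK  (HASH_SIZE - 1)
/* Map a full hash_rtx-style value to a bucket index, xor-folding another
   HASH_SHIFT bits into the low bits before truncating so that bits such
   as the CODE << 7 contribution are not simply discarded.  */
static inline unsigned
bucket_index (unsigned full_hash)
{
  return (full_hash ^ (full_hash >> HASH_SHIFT)) & HASH_MASK;
}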
|
|
A further straightforward patch for the various halving intrinsics, with or without rounding, plus tests.
Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (aarch64_<sur>h<addsub><mode>): Rename to...
(aarch64_<sur>h<addsub><mode><vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add tests for halving and rounding
add/sub intrinsics.
|
|
vec-concat with zero
Continuing the almost mechanical series, this patch adds annotation for some of the simple
floating-point patterns we have, and adds testing to ensure that redundant zeroing instructions
are eliminated.
Bootstrapped and tested on aarch64-none-linux-gnu and also aarch64_be-none-elf.
gcc/ChangeLog:
PR target/99195
* config/aarch64/aarch64-simd.md (add<mode>3): Rename to...
(add<mode>3<vczle><vczbe>): ... This.
(sub<mode>3): Rename to...
(sub<mode>3<vczle><vczbe>): ... This.
(mul<mode>3): Rename to...
(mul<mode>3<vczle><vczbe>): ... This.
(*div<mode>3): Rename to...
(*div<mode>3<vczle><vczbe>): ... This.
(neg<mode>2): Rename to...
(neg<mode>2<vczle><vczbe>): ... This.
(abs<mode>2): Rename to...
(abs<mode>2<vczle><vczbe>): ... This.
(<frint_pattern><mode>2): Rename to...
(<frint_pattern><mode>2<vczle><vczbe>): ... This.
(<fmaxmin><mode>3): Rename to...
(<fmaxmin><mode>3<vczle><vczbe>): ... This.
(*sqrt<mode>2): Rename to...
(*sqrt<mode>2<vczle><vczbe>): ... This.
gcc/testsuite/ChangeLog:
PR target/99195
* gcc.target/aarch64/simd/pr99195_1.c: Add testing for some unary
and binary floating-point ops.
* gcc.target/aarch64/simd/pr99195_2.c: New test.
|
|
`vr`, `vm` and `vd` are constraints for vector registers; those three
constraints have been implemented in LLVM as well.
gcc/ChangeLog:
* doc/md.texi (RISC-V): Add vr, vm, vd constraints.
|
|
[-Wunused-private-field]
PR tree-optimization/109693
gcc/ChangeLog:
* value-range-storage.cc (vrange_allocator::vrange_allocator):
Remove unused field.
* value-range-storage.h: Likewise.
|
|
During patch backporting, I've noticed that while most cp_walk_tree
callers using the cp_fold_r callback were changed from &pset to cp_fold_data
&data, the VEC_INIT_EXPR gimplifications have not, so they still pass just
the address of a hash_set<tree>, and if during the folding we ever touch
data->flags, we use uninitialized data there.
The following patch changes it to do the same thing as cp_fold_function
because the VEC_INIT_EXPR gimplifications will happen on function bodies
only.
2023-05-03 Jakub Jelinek <jakub@redhat.com>
* cp-gimplify.cc (cp_fold_data): Move definition earlier.
(cp_gimplify_expr): Pass address of ff_genericize | ff_mce_false
constructed data rather than &pset to cp_walk_tree with cp_fold_r.
|
|
We try to cache the result of reduce_template_parm_level so that when we
reduce the same parm multiple times we get the same result, but this wasn't
working for template template parms because in that case TYPE is a
TEMPLATE_TEMPLATE_PARM, and so same_type_p was false because of the same
level mismatch that we're trying to adjust for. So in that case compare the
template parms of the template template parms instead.
The result can be seen in nontype12.C, where we previously gave three
duplicate errors on line 7 and now give only one because subsequent
substitutions use the cache.
gcc/cp/ChangeLog:
* pt.cc (reduce_template_parm_level): Fix comparison of
template template parm to cached version.
gcc/testsuite/ChangeLog:
* g++.dg/template/nontype12.C: Check for duplicate error.
|
|
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type. Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop. It's unlikely to make a difference on any real code, but the
simplification is also nice.
We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.
#define NC(X) \
template <class U> struct X##1; \
template <class U> struct X##2; \
template <class U> struct X##3; \
template <class U> struct X##4; \
template <class U> struct X##5; \
template <class U> struct X##6;
#define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
#define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
template <int I> struct A
{
NC3(am)
};
template <class...Ts> void sink(Ts...);
template <int...Is> void g()
{
sink(A<Is>()...);
}
template <int I> void f()
{
g<__integer_pack(I)...>();
}
int main()
{
f<1000>();
}
gcc/cp/ChangeLog:
* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
of a class template.
(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
|
|
When I added the diamond-shaped bb form to match_simplify_replacement,
I copied the code to move the statement rather than factoring it
out to a new function. This does the refactoring to a new function
to avoid the duplicated code. It will make adding support for having
two statements to move easier (the second statement will only be a
conversion).
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (move_stmt): New function.
(match_simplify_replacement): Use move_stmt instead
of the inlined version.
|
|
This ports the clrsb builtin part of builtin_zero_pattern
to match.pd. A simple pattern to port.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* match.pd (a != 0 ? CLRSB(a) : CST -> CLRSB(a)): New
pattern.
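A source-level illustration of the new pattern: for 32-bit int,
__builtin_clrsb (0) is 31, so with CST equal to 31 the conditional is
redundant and the whole expression folds to __builtin_clrsb (a):
int
fold_clrsb (int a)
{
  return a != 0 ? __builtin_clrsb (a) : 31;
}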
|
|
I accidentally messed up these patterns so that the comparison
against 0 and the arguments were not matching up when they
needed to be.
I committed this as obvious after a bootstrap/test on x86_64-linux-gnu
PR tree-optimization/109702
gcc/ChangeLog:
* match.pd: Fix "a != 0 ? FUNC(a) : CST" patterns
for FUNC of POPCOUNT BSWAP FFS PARITY CLZ and CTZ.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-25b.c: New test.
|
|
There is no canonical form defined for this case, so the aarch64 backend
needs a pattern to match both of these forms.
The forms are:
(set (reg/i:SI 0 x0)
(if_then_else:SI (eq (reg:CC 66 cc)
(const_int 0 [0]))
(reg:SI 97)
(const_int -1 [0xffffffffffffffff])))
and
(set (reg/i:SI 0 x0)
(ior:SI (neg:SI (ne:SI (reg:CC 66 cc)
(const_int 0 [0])))
(reg:SI 102)))
Currently the aarch64 backend matches the first form so this
patch adds an insn_and_split to match the second form and
convert it to the first form.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
PR target/109657
gcc/ChangeLog:
* config/aarch64/aarch64.md (*cmov<mode>_insn_m1): New
insn_and_split pattern.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/csinv-2.c: New test.
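A hypothetical source-level example that can expand to the second (ior/neg)
form shown above: -(x != 0) is 0 or all-ones, so the result is y when x is
zero and -1 otherwise, which is exactly what a single CSINV against wzr can
compute:
int
select_or_minus_one (int x, int y)
{
  return (-(x != 0)) | y;
}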
|
|
In the testcase below, we push_to_top_level to instantiate f and g, and they
can both use the previous_class_level cache from instantiating A<int>.
Wiping the cache in pop_from_top_level is not helpful; we'll do that in
pushclass if needed.
template <class T> struct A
{
int i;
void f() { i = 42; }
void g() { i = 24; }
};
int main()
{
A<int> a;
a.f();
a.g();
}
gcc/cp/ChangeLog:
* name-lookup.cc (pop_from_top_level): Don't
invalidate_class_lookup_cache.
|
|
While looking at the empty base handling for 109678, it occurred to me that
we ought to be able to look for an empty base at a specific offset, not just
in general.
PR c++/109678
gcc/cp/ChangeLog:
* cp-tree.h (lookup_base): Add offset parm.
* constexpr.cc (cxx_fold_indirect_ref_1): Pass it.
* search.cc (struct lookup_base_data_s): Add offset.
(dfs_lookup_base): Handle it.
(lookup_base): Pass it.
|
|
Here, when dealing with a class with a complex subobject structure, we would
try and fail to find the relevant FIELD_DECL for an empty base before giving
up. And we would do this at each level, in a combinatorially problematic
way. Instead, we should check for an empty base first.
PR c++/109678
gcc/cp/ChangeLog:
* constexpr.cc (cxx_fold_indirect_ref_1): Handle empty base first.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/variant1.C: New test.
|
|
These tests cover basic cases to ensure the atomic mappings follow the
strengthened Table A.6 mappings that are compatible with Table A.7.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/testsuite/ChangeLog:
* gcc.target/riscv/amo-table-a-6-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-amo-add-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-1.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-2.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-3.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-4.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-5.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-6.c: New test.
* gcc.target/riscv/amo-table-a-6-compare-exchange-7.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-1.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-2.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-3.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-4.c: New test.
* gcc.target/riscv/amo-table-a-6-fence-5.c: New test.
* gcc.target/riscv/amo-table-a-6-load-1.c: New test.
* gcc.target/riscv/amo-table-a-6-load-2.c: New test.
* gcc.target/riscv/amo-table-a-6-load-3.c: New test.
* gcc.target/riscv/amo-table-a-6-store-1.c: New test.
* gcc.target/riscv/amo-table-a-6-store-2.c: New test.
* gcc.target/riscv/amo-table-a-6-store-compat-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-1.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-2.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-3.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-4.c: New test.
* gcc.target/riscv/amo-table-a-6-subword-amo-add-5.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
This change brings atomic loads in line with table A.6 of the ISA
manual.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/sync.md (atomic_load<mode>): Implement atomic
load mapping.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
This change brings atomic fences in line with table A.6 of the ISA
manual.
Relax mem_thread_fence according to the memmodel given.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/sync.md (mem_thread_fence_1): Change fence
depending on the given memory model.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Introduce the %I and %J flags for setting the .aqrl bits on LR/SC pairs
as needed.
Atomic compare and exchange ops provide success and failure memory
models. C++17 and later place no restrictions on the relative strength
of each model, so ensure we cover both by using a model that enforces
the ordering of both given models.
This change brings LR/SC ops in line with table A.6 of the ISA manual.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_union_memmodels): Expose
riscv_union_memmodels function to sync.md.
* config/riscv/riscv.cc (riscv_union_memmodels): Add function to
get the union of two memmodels in sync.md.
(riscv_print_operand): Add %I and %J flags that output the
optimal LR/SC flag bits for a given memory model.
* config/riscv/sync.md: Remove static .aqrl bits on LR op/.rl
bits on SC op and replace with optimized %I, %J flags.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
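A standalone sketch of the model union being described, i.e. picking a model
at least as strong as both the success and failure models of a
compare-and-exchange (the real riscv_union_memmodels operates on GCC's
memmodel enum and may differ in detail; consume ordering is ignored here):
enum class mm { relaxed, acquire, release, acq_rel, seq_cst };
static mm
union_memmodels (mm a, mm b)
{
  if (a == mm::seq_cst || b == mm::seq_cst)
    return mm::seq_cst;
  bool acq = (a == mm::acquire || a == mm::acq_rel
              || b == mm::acquire || b == mm::acq_rel);
  bool rel = (a == mm::release || a == mm::acq_rel
              || b == mm::release || b == mm::acq_rel);
  if (acq && rel)
    return mm::acq_rel;
  if (acq)
    return mm::acquire;
  if (rel)
    return mm::release;
  return mm::relaxed;
}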
|
|
Atomic operations with the appropriate bits set already enforce release
semantics. Remove unnecessary release fences from atomic ops.
This change brings AMO ops in line with table A.6 of the ISA manual.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv.cc
(riscv_memmodel_needs_amo_release): Change function name.
(riscv_print_operand): Remove unneeded %F case.
* config/riscv/sync.md: Remove unneeded fences.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
This change makes atomic stores strictly stronger than table A.6 of the
ISA manual. This mapping makes the overall patchset compatible with
table A.7 as well.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
PR target/89835
gcc/ChangeLog:
* config/riscv/sync.md (atomic_store<mode>): Use simple store
instruction in combination with fence(s).
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr89835.c: New test.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
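For concreteness, a user-level SEQ_CST store such as the following is the
kind of operation whose expansion changes here (per the ChangeLog, it is now
a plain store combined with fence(s); the generated assembly is not
reproduced):
#include <atomic>
void
store_seq_cst (std::atomic<int> &p, int v)
{
  p.store (v, std::memory_order_seq_cst);
}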
|
|
This patch sets the relevant .rl bits on amo operations.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_print_operand): Change behavior
of %A to include release bits.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
This patch enforces SEQ_CST for atomic compare_exchange ops.
Replace Fence/LR.aq/SC.aq pairs with SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/sync.md (atomic_cas_value_strong<mode>): Change
FENCE/LR.aq/SC.aq into sequentially consistent LR.aqrl/SC.rl
pair.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Replace LR.aq/SC.rl pairs with the SEQ_CST LR.aqrl/SC.rl pairs
recommended by table A.6 of the ISA manual.
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/sync.md: Change LR.aq/SC.rl pairs into
sequentially consistent LR.aqrl/SC.rl pairs.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
Remove references to MEMMODEL_SYNC_* models by converting via
memmodel_base().
2023-04-27 Patrick O'Neill <patrick@rivosinc.com>
gcc/ChangeLog:
* config/riscv/riscv.cc: Remove MEMMODEL_SYNC_* cases and
sanitize memmodel input with memmodel_base.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
This patch fixes the changelog to explicitly name the command line
flags introduced in this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616807.html
2023-05-01 Patrick O'Neill <patrick@rivosinc.com>
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
PR target/109617
gcc/ChangeLog:
* config/riscv/vector-iterators.md: Support VNx2HI and VNx4DI when MIN_VLEN >= 128.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/vlmul_ext-1.c: New test.
Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
Co-authored-by: Pan Li <pan2.li@intel.com>
|
|
GCC should still build with GCC 4.8.3 or newer [1]
using C++03 by default. But a recent change in the
RISC-V port introduced a C++11 feature, "std::log2" [2].
Use log2 from the C header, without the namespace [3].
[1] https://gcc.gnu.org/install/prerequisites.html
[2] https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=7caa1ae5e451e780fbc4746a54e3f19d4f4304dc
[3] https://stackoverflow.com/questions/26733413/error-log2-is-not-a-member-of-std
Fixes:
https://gitlab.com/buildroot.org/toolchains-builder/-/jobs/4202276589
gcc/ChangeLog:
* config/riscv/genrvv-type-indexer.cc: Use log2 from the C header, without
the namespace.
Signed-off-by: Romain Naour <romain.naour@gmail.com>
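A minimal illustration of the portable spelling described above; the helper
name is hypothetical, the point is calling log2 from <math.h> without the
std:: qualifier so the generator still builds with a C++03 host compiler:
#include <math.h>
unsigned
floor_log2_u (unsigned x)
{
  return (unsigned) log2 ((double) x);  /* ::log2 from the C header.  */
}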
|
|
The PR109666 fix r14-386-g07c52d1eec967 incidentally also fixes this PR.
PR c++/109506
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-template26.C: New test.
|
|
gcc/ChangeLog:
* doc/invoke.texi: Update documentation based on param.opt file.
|
|
The following refactors the check for emulated vector support for
the cases of plus, minus and negate. In the PR we end up with
an SImode plus that is supported by the target but emulated, and in this
context we fail to verify that we are dealing with exactly word_mode.
PR tree-optimization/109672
* tree-vect-stmts.cc (vectorizable_operation): For plus,
minus and negate always check the vector mode is word mode.
|
|
The following adjusts testcases: the pr88531 ones fail with -m32
because we do not consider MMX-size vectorization there, and the
pr89618 one runs into load/store cost differences with -m32.
* gcc.target/i386/pr88531-2a.c: Skip scanning for ia32.
* gcc.target/i386/pr88531-2b.c: Likewise.
* gcc.target/i386/pr88531-2c.c: Likewise.
* gcc.target/i386/pr89618-2.c: Likewise. Disable AVX512.
|
|
In <https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613687.html>
we discussed that the copy_node in ubsan_maybe_instrument_array_ref
is redundant, but also that it'd be best to postpone the optimization
to GCC 14. So I'm making that change now.
gcc/c-family/ChangeLog:
* c-ubsan.cc (ubsan_maybe_instrument_array_ref): Don't copy_node.
|
|
Here it turns out I also needed to adjust cfun when stepping out of the
member function to instantiate the DMI. But instead of adding that tweak,
let's unify with instantiate_body and just push_to_top_level instead of
trying to do the minimum subset of it. There was no measurable change in
compile time on stdc++.h.
This should also resolve 109506 without yet another tweak.
PR c++/109666
gcc/cp/ChangeLog:
* name-lookup.cc (maybe_push_to_top_level)
(maybe_pop_from_top_level): Split out...
* pt.cc (instantiate_body): ...from here.
* init.cc (maybe_instantiate_nsdmi_init): Use them.
* name-lookup.h: Declare them.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/nsdmi-array2.C: New test.
|
|
I noticed I didn't update the comment about how the pass
works after I initially added match_simplify_replacement.
Anyway, this updates the comment to reflect the current state
of the pass.
OK?
gcc/ChangeLog:
* tree-ssa-phiopt.cc: Update comment about
how the transformations are implemented.
|
|
This patch converts the xstormy16 port to LRA. It introduces a code
quality regression in the shiftsi testcase, but it also fixes numerous
aborts/errors. IMHO it's a good tradeoff.
gcc/
* config/stormy16/stormy16.cc (TARGET_LRA_P): Remove definition.
|
|
Spurred by Segher's RFC, I went ahead and tested several ports with LRA
enabled. Not surprisingly, many failed, but a few built their full set
of libraries successfully and of those a few even ran their testsuites
with no regressions. In fact, enabling LRA fixes a small number of
failures on the iq2000 port.
This patch converts the ports which built their libraries and have test
results that are as good as or better than without LRA. There may
be minor code quality regressions or there may be minor code quality
improvements -- I'm leaving that for the port maintainers to own going
forward.
gcc/
* config/cris/cris.cc (TARGET_LRA_P): Remove.
* config/epiphany/epiphany.cc (TARGET_LRA_P): Remove.
* config/iq2000/iq2000.cc (TARGET_LRA_P): Remove.
* config/m32r/m32r.cc (TARGET_LRA_P): Remove.
* config/microblaze/microblaze.cc (TARGET_LRA_P): Remove.
* config/mmix/mmix.cc (TARGET_LRA_P): Remove.
|
|
The .su files generated with -fstack-usage are arguably debug info. In
order to make builds more reproducible, apply the same remapping logic
to the recorded file names as is used when producing the debug info
embedded in the object files.
To this end, teach print_decl_identifier() a new
PRINT_DECL_REMAP_DEBUG flag and use that from output_stack_usage_1().
gcc/ChangeLog:
* print-tree.h (PRINT_DECL_REMAP_DEBUG): New flag.
* print-tree.cc (print_decl_identifier): Implement it.
* toplev.cc (output_stack_usage_1): Use it.
|
|
gcc/ChangeLog:
* value-range.h (class int_range): Remove gt_ggc_mx and gt_pch_nx
friends.
|
|
irange::set_nonzero is used everywhere and benefits immensely from
inlining.
gcc/ChangeLog:
* value-range.h (irange::set_nonzero): Inline.
|
|
Now that anti-ranges are no more and iranges contain wide_ints instead
of trees, various cleanups are possible. This is one of a handful of
patches improving the performance of irange::set() which is not on a
hot path, but quite sensitive because it is so pervasive.
gcc/ChangeLog:
* gimple-range-op.cc (cfn_ffs::fold_range): Use the correct
precision.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Use <2> for
invalid_range, as it is an inverse range.
* tree-vrp.cc (find_case_label_range): Avoid trees.
* value-range.cc (irange::irange_set): Delete.
(irange::irange_set_1bit_anti_range): Delete.
(irange::irange_set_anti_range): Delete.
(irange::set): Cleanup.
* value-range.h (class irange): Remove irange_set,
irange_set_anti_range, irange_set_1bit_anti_range.
(irange::set_undefined): Remove set to m_type.
|
|
gcc/ChangeLog:
* range-op.cc (update_known_bitmask): Adjust for irange containing
wide_ints internally.
* tree-ssanames.cc (set_nonzero_bits): Same.
* tree-ssanames.h (set_nonzero_bits): Same.
* value-range-storage.cc (irange_storage::set_irange): Same.
(irange_storage::get_irange): Same.
* value-range.cc (irange::operator=): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::verify_range): Same.
(irange::contains_p): Same.
(irange::irange_single_pair_union): Same.
(irange::union_): Same.
(irange::irange_contains_p): Same.
(irange::intersect): Same.
(irange::invert): Same.
(irange::set_range_from_nonzero_bits): Same.
(irange::set_nonzero_bits): Same.
(mask_to_wi): Same.
(irange::intersect_nonzero_bits): Same.
(irange::union_nonzero_bits): Same.
(gt_ggc_mx): Same.
(gt_pch_nx): Same.
(tree_range): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
* value-range.h (irange::type): Same.
(irange::varying_compatible_p): Same.
(irange::irange): Same.
(int_range::int_range): Same.
(irange::set_undefined): Same.
(irange::set_varying): Same.
(irange::lower_bound): Same.
(irange::upper_bound): Same.
|
|
Little by little, bounds_of_var_in_loop() has grown into an
unmaintainable mess. This patch rewrites the code to use the relevant
APIs as well as refactor it to make it more readable.
gcc/ChangeLog:
* gimple-range-fold.cc (tree_lower_bound): Delete.
(tree_upper_bound): Delete.
(vrp_val_max): Delete.
(vrp_val_min): Delete.
(fold_using_range::range_of_ssa_name_with_loop_info): Call
range_of_var_in_loop.
* vr-values.cc (valid_value_p): Delete.
(fix_overflow): Delete.
(get_scev_info): New.
(bounds_of_var_in_loop): Refactor into...
(induction_variable_may_overflow_p): ...this,
(range_from_loop_direction): ...and this,
(range_of_var_in_loop): ...and this.
* vr-values.h (bounds_of_var_in_loop): Delete.
(range_of_var_in_loop): New.
|
|
This patch removes all uses of vrp_val_{min,max} in favor of
irange_val_*, which are wide_int based. This will leave only one use
of vrp_val_* which returns trees in range_of_ssa_name_with_loop_info()
because it needs to work with non-integers (floats, etc). In a
follow-up patch, this function will also be cleaned up such that
vrp_val_* can be deleted.
The functions min_limit and max_limit in range-op.cc are now useless
as they're basically irange_val*. I didn't rename them yet to avoid
churn. I'll do it in a later patch.
gcc/ChangeLog:
* gimple-range-fold.cc (adjust_pointer_diff_expr): Rewrite with
irange_val*.
(vrp_val_max): New.
(vrp_val_min): New.
* gimple-range-op.cc (cfn_strlen::fold_range): Use irange_val_*.
* range-op.cc (max_limit): Same.
(min_limit): Same.
(plus_minus_ranges): Same.
(operator_rshift::op1_range): Same.
(operator_cast::inside_domain_p): Same.
* value-range.cc (vrp_val_is_max): Delete.
(vrp_val_is_min): Delete.
(range_tests_misc): Use irange_val_*.
* value-range.h (vrp_val_is_min): Delete.
(vrp_val_is_max): Delete.
(vrp_val_max): Delete.
(irange_val_min): New.
(vrp_val_min): Delete.
(irange_val_max): New.
* vr-values.cc (check_for_binary_op_overflow): Use irange_val_*.
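A hedged sketch of what the new wide_int-based limits likely look like; the
names follow the ChangeLog, but the bodies assume GCC's wide_int API
(wi::min_value / wi::max_value) and tree accessors, so this is a sketch
inside the GCC sources rather than standalone code:
inline wide_int
irange_val_min (const_tree type)
{
  return wi::min_value (TYPE_PRECISION (type), TYPE_SIGN (type));
}
inline wide_int
irange_val_max (const_tree type)
{
  return wi::max_value (TYPE_PRECISION (type), TYPE_SIGN (type));
}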
|