Age | Commit message (Collapse) | Author | Files | Lines |
|
gcc/ChangeLog:
* config.gcc: Add avx10_2satcvtintrin.h and
avx10_2-512satcvtintrin.h.
* config/i386/i386-builtin-types.def:
Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI),
(V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI),
(V16SI, V16SF, V16SI, UHI, INT), (V16HI, V16BF, V16HI, UHI, INT),
(V32HI, V32BF, V32HI, USI, INT).
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle
V32HI_FTYPE_V32BF_V32HI_USI, V16HI_FTYPE_V16BF_V16HI_UHI,
V8HI_FTYPE_V8BF_V8HI_UQI.
(ix86_expand_round_builtin): Handle V32HI_FTYPE_V32BF_V32HI_USI_INT,
V16SI_FTYPE_V16SF_V16SI_UHI_INT, V16HI_FTYPE_V16BF_V16HI_UHI_INT.
* config/i386/immintrin.h: Include avx10_2satcvtintrin.h and
avx10_2-512savcvtintrin.h.
* config/i386/sse.md:
(UNSPEC_CVTNE_BF16_IBS_ITER): New iterator.
(sat_cvt_sign_prefix): Ditto.
(sat_cvt_trunc_prefix): Ditto.
(UNSPEC_CVT_PH_IBS_ITER): Ditto.
(UNSPEC_CVTT_PH_IBS_ITER): Ditto.
(UNSPEC_CVT_PS_IBS_ITER): Ditto.
(UNSPEC_CVTT_PS_IBS_ITER): Ditto.
(avx10_2_cvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
New define_insn.
(avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Ditto.
(avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Ditto.
(avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
* config/i386/avx10_2-512satcvtintrin.h: New file.
* config/i386/avx10_2satcvtintrin.h: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add macros.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx512f-helper.h: Add new test macro.
* gcc.target/i386/m512-check.h: Add new type.
* gcc.target/i386/avx10_2-512-satcvt-1.c: New test.
* gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-satcvt-1.c: Ditto.
* gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto.
|
|
gcc/ChangeLog:
* config/i386/avx10_2-512bf16intrin.h: Add new intrinsics.
* config/i386/avx10_2bf16intrin.h: Diito.
* config/i386/i386-builtin-types.def : Add new DEF_FUNCTION_TYPE
for new type.
* config/i386/i386-builtin.def (BDESC): Add new buildin.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle new type.
* config/i386/sse.md (vecmemsuffix): Add vector BF mode.
(avx10_2_rsqrtpbf16_<mode><mask_name>): New define_insn.
(avx10_2_sqrtnepbf16_<mode><mask_name>): Ditto.
(avx10_2_rcppbf16_<mode><mask_name>): Ditto.
(avx10_2_getexppbf16_<mode><mask_name>): Ditto.
(BF16IMMOP): New iterator.
(bf16immop): Ditto.
(avx10_2_<bf16immop>pbf16_<mode><mask_name>): New define_insn.
(avx10_2_fpclasspbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_cmppbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_comsbf16_v8bf): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10-check.h: Add AVX10_SCALAR.
* gcc.target/i386/avx10-helper.h: Add helper functions.
* gcc.target/i386/avx10_2-512-bf16-1.c: Add new tests.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx-1.c: Add macros.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-vcmppbf16-2.c: New test.
* gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcmppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vcomsbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrcppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vreducenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Ditto.
Co-authored-by: Levy Hsu <admin@levyhsu.com>
|
|
gcc/ChangeLog:
* config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h.
* config/i386/i386-builtin-types.def : Add new
DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF,
V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF,
V8BF_FTYPE_V8BF_V8BF_UQI, V16BF_FTYPE_V16BF_V16BF_UHI,
V32BF_FTYPE_V32BF_V32BF_USI, V32BF_FTYPE_V32BF_V32BF_V32BF_USI,
V8BF_FTYPE_V8BF_V8BF_V8BF_UQI and V16BF_FTYPE_V16BF_V16BF_V16BF_UHI.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle new DEF_FUNCTION_TYPE.
* config/i386/immintrin.h: Include avx10_2-512bf16intrin.h and
avx10_2bf16intrin.h.
* config/i386/sse.md
(VBF_AVX10_2): New iterator.
(avx10_2_scalefpbf16_<mode><mask_name>): New define_insn.
(avx10_2_<code>nepbf16_<mode><mask_name>): Ditto.
(avx10_2_<insn>nepbf16_<mode><mask_name>): Ditto.
(avx10_2_fmaddnepbf16_<mode>_maskz): New expander.
(avx10_2_fnmaddnepbf16_<mode>_maskz): Ditto.
(avx10_2_fmsubnepbf16_<mode>_maskz): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_maskz): Ditto.
(avx10_2_fmaddnepbf16_<mode><sd_maskz_name>): New define_insn.
(avx10_2_fmaddnepbf16_<mode>_mask): Ditto.
(avx10_2_fmaddnepbf16_<mode>_mask3): Ditto.
(avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmaddnepbf16_<mode>_mask): Ditto.
(avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto.
(avx10_2_fmsubnepbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fmsubnepbf16_<mode>_mask): Ditto.
(avx10_2_fmsubnepbf16_<mode>_mask3): Ditto.
(avx10_2_fnmsubnepbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_mask): Ditto.
(avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto.
* config/i386/avx10_2-512bf16intrin.h: New file.
* config/i386/avx10_2bf16intrin.h: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-helper.h: Add MAKE_MASK_MERGE and MAKE_MASK_ZERO
for bf16_uw.
* gcc.target/i386/m512-check.h: Add union512bf16_uw, union256bf16_uw,
union128bf16_uw and CHECK_EXP for them.
* gcc.target/i386/avx10-helper.h: New file.
* gcc.target/i386/avx10_2-512-bf16-1.c: New test.
* gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-bf16-1.c: Ditto.
* gcc.target/i386/avx10_2-vaddnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vdivnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmaxpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vminpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vmulnepbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vscalefpbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vsubnepbf16-2.c: Ditto.
Co-authored-by: Levy Hsu <admin@levyhsu.com>
|
|
gcc/ChangeLog:
* config.gcc: Add avx10_2-512convertintrin.h and
avx10_2convertintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_POINTER_TYPE
and DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle AVX10.2.
(ix86_expand_round_builtin): Ditto.
* config/i386/immintrin.h: Include avx10_2-512convertintrin.h,
avx10_2convertintrin.h.
* config/i386/sse.md (VHF_AVX10_2): New iterator.
(bf16_ph): Add 512 bit mode.
(avx10_2_cvt2ps2phx_<mode><mask_name<round_name>): New define_insn.
(ssebvecmode): New iterator.
(UNSPEC_NECONVERTFP8_PACK): Ditto.
(neconvertfp8_pack): Ditto.
(vcvt<neconvertfp8_pack><mode><mask_name>): New define_insn.
(ssebvecmode_2): New iterator.
(UNSPEC_VCVTBIASPH2FP8_PACK): Ditto.
(biasph2fp8_pack): Ditto.
(vcvt<biasph2fp8_pack>v8hf): New expander.
(vcvt<biasph2fp8_pack>v8hf_mask): Ditto.
(*vcvt<biasph2bf8_pack>v8hf): New define_insn.
(*vcvt<biasph2fp8_pack>v8hf_mask): Ditto.
(VHF_AVX10_2_2): New iterator.
(vcvt<biasph2fp8_pack><mode><mask_name>): New define_insn.
(VHF_256_512): New iterator.
(ph2fp8suff): Ditto.
(UNSPEC_NECONVERTPH2FP8_PACK): Ditto.
(neconvertph2fp8): Ditto.
(vcvt<neconvertph2fp8>v8hf_mask): New expander.
(*vcvt<neconvertph2fp8>v8hf): New define_insn.
(*vcvt<neconvertph2fp8>v8hf_mask): Ditto.
(vcvt<neconvertph2fp8><mode><mask_name>): Ditto.
(vcvthf82ph<mode><mask_name>): Ditto.
* config/i386/avx10_2-512convertintrin.h: New file.
* config/i386/avx10_2convertintrin.h: Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add macros for const.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-convert-1.c: New test.
* gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-convert-1.c: Ditto.
* gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Ditto.
* gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Ditto.
* gcc.target/i386/fp8-helper.h: New helper file.
Co-authored-by: Levy Hsu <admin@levyhsu.com>
Co-authored-by: Kong Lingling <lingling.kong@intel.com>
|
|
gcc/ChangeLog:
* config/i386/avx10_2-512mediaintrin.h: Add new intrins.
* config/i386/avx10_2mediaintrin.h: Ditto.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.cc (def_builtin): Handle shared
builtins between AVXVNNIINT16 and AVX10.2.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
* config/i386/sse.md (unspec): Add UNSPEC_VDPPHPS.
(avx10_2_mpsadbw<mask_name>): New define_insn.
(<mask_codefor><sse4_1_avx2>_mpsadbw<mask_name>): Ditto.
(vpdp<vpdpwprodtype>_<mode>): Add AVX10_2_256.
(vpdp<vpdpwprodtype>_v16si): New defin_insn.
(vpdp<vpdpwprodtype>_<mode>_mask): Ditto.
(*vpdp<vpdpwprodtype>_<mode>_maskz): Ditto.
(vpdp<vpdpwprodtype>_<mode>_maskz): New expander.
(vdpphps_<mode>): New define_insn.
(vdpphps_<mode>_mask): Ditto.
(*vdpphps_<mode>_maskz): Ditto.
(vdpphps_<mode>_maskz): New expander.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avxvnniint16-1.c: Add new macro test.
* gcc.target/i386/avx-1.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-512-media-1.c: Add test.
* gcc.target/i386/avx10_2-media-1.c: Ditto.
* gcc.target/i386/avxvnniint16-builtin.c: New test.
* gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-builtin-2.c: Ditto.
* gcc.target/i386/avx10_2-vdpphps-2.c: Ditto.
* gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto.
Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
|
|
gcc/ChangeLog
* config.gcc: Add avx10_2mediaintrin.h and
avx10_2-512mediaintrin.h.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/i386-builtins.cc (def_builtin): Handle shared
builtins between AVXVNNIINT8 and AVX10.2.
* config/i386/i386-expand.cc (ix86_check_builtin_isa_match):
Ditto.
* config/i386/immintrin.h: Include avx10_2mediaintrin.h and
avx10_2-512mediaintrin.h
* config/i386/sse.md: (VI4_AVX10_2): New.
(vpdp<vpdotprodtype>_<mode>): Add AVX10_2_256.
(vpdp<vpdotprodtype>_v16si): New define_insn.
(vpdp<vpdotprodtype>_<mode>_mask): Ditto.
(*vpdp<vpdotprodtype>_<mode>_maskz): Ditto.
(vpdp<vpdotprodtype>_<mode>_maskz): New expander.
* config/i386/avx10_2-512mediaintrin.h: New file.
* config/i386/avx10_2mediaintrin.h: Ditto.
gcc/testsuite/ChangeLog
* gcc.target/i386/avx512f-helper.h: Reuse AVX512F macros
for AVX10.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* lib/target-supports.exp
(check_effective_target_avx10_2): New.
(check_effective_target_avx10_2_512): Ditto.
* gcc.target/i386/avx10-check.h: New test file.
* gcc.target/i386/avx10-helper.h: Ditto.
* gcc.target/i386/avx10_2-builtin-1.c: Ditto.
* gcc.target/i386/avx10_2-512-media-1.c: Ditto.
* gcc.target/i386/avx10_2-media-1.c: Ditto..
* gcc.target/i386/avxvnniint8-builtin.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vpdpbuuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
|
|
After AVX10 introduction, we still want to use AVX512 helper functions
to avoid duplicate code. In order to reuse them, we need to do some refactor
to make sure each function define happen under correct ISA to avoid ABI
warnings.
gcc/testsuite/ChangeLog:
* gcc.target/i386/m512-check.h: Wrap the function define with
correct vector size.
|
|
This patch would like to allow IMM for the operand 0 of ussub pattern.
Aka .SAT_SUB(1023, y) as the below example.
Form 1:
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
{ \
return (T)IMM >= y ? (T)IMM - y : 0; \
}
DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)
Before this patch:
10 │ sat_u_sub_imm82_uint64_t_fmt_1:
11 │ li a5,82
12 │ bgtu a0,a5,.L3
13 │ sub a0,a5,a0
14 │ ret
15 │ .L3:
16 │ li a0,0
17 │ ret
After this patch:
10 │ sat_u_sub_imm82_uint64_t_fmt_1:
11 │ li a5,82
12 │ sltu a4,a5,a0
13 │ addi a4,a4,-1
14 │ sub a0,a5,a0
15 │ and a0,a4,a0
16 │ ret
The below test suites are passed for this patch:
1. The rv64gcv fully regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
func impl to gen xmode rtx reg from operand rtx.
(riscv_expand_ussub): Gen xmode reg for operand 1.
* config/riscv/riscv.md: Allow const_int for operand 1.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned vector
.SAT_TRUNC form 4. Aka:
Form 4:
#define DEF_VEC_SAT_U_TRUNC_FMT_4(NT, WT) \
void __attribute__((noinline)) \
vec_sat_u_trunc_##NT##_##WT##_fmt_4 (NT *out, WT *in, unsigned limit) \
{ \
unsigned i; \
for (i = 0; i < limit; i++) \
{ \
bool not_overflow = in[i] <= (WT)(NT)(-1); \
out[i] = ((NT)in[i]) | (NT)((NT)not_overflow - 1); \
} \
}
DEF_VEC_SAT_U_TRUNC_FMT_4 (uint32_t, uint64_t)
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-19.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-20.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-21.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-22.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-23.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-24.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch would like to add test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 4. Aka:
Form 4:
#define DEF_SAT_U_TRUNC_FMT_4(NT, WT) \
NT __attribute__((noinline)) \
sat_u_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
{ \
bool not_overflow = x <= (WT)(NT)(-1); \
return ((NT)x) | (NT)((NT)not_overflow - 1); \
}
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-19.c: New test.
* gcc.target/riscv/sat_u_trunc-20.c: New test.
* gcc.target/riscv/sat_u_trunc-21.c: New test.
* gcc.target/riscv/sat_u_trunc-22.c: New test.
* gcc.target/riscv/sat_u_trunc-23.c: New test.
* gcc.target/riscv/sat_u_trunc-24.c: New test.
* gcc.target/riscv/sat_u_trunc-run-19.c: New test.
* gcc.target/riscv/sat_u_trunc-run-20.c: New test.
* gcc.target/riscv/sat_u_trunc-run-21.c: New test.
* gcc.target/riscv/sat_u_trunc-run-22.c: New test.
* gcc.target/riscv/sat_u_trunc-run-23.c: New test.
* gcc.target/riscv/sat_u_trunc-run-24.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
Currently, some binops of vector vs double scalar under RV32 can't
translated to vf but vfmv+vxx.vv.
The cause is that vec_duplicate is also expanded to broadcast for double mode
under RV32. last-combine can't process expanded broadcast.
gcc/ChangeLog:
* config/riscv/vector.md: Add !FLOAT_MODE_P constraint.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto.
|
|
The previous patch:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd
aimed to eliminate redundant MOV instructions by removing calling
emit_clobber in lower-subreg.cc's resolve_simple_move.
First, I found that another patch address this issue:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bdf2737cda53a83332db1a1a021653447b05a7e7
and even without removing calling emit_clobber,
the instruction generation is still as expected.
Second, removing the CLOBBER expression will have side effects.
When there is no CLOBBER expression and only SUBREG assignments exist,
according to the logic of the 'df_lr_bb_local_compute' function,
the register will be added to the basic block LR IN set.
This will cause the register's lifetime to span the entire function,
resulting in increased register pressure. Taking the newly added test case
'gcc/testsuite/gcc.target/riscv/pr43644.c' as an example,
removing the CLOBBER expression will lead to spill in some registers.
gcc/:
* lower-subreg.cc (resolve_simple_move): Re-add calling emit_clobber
immediately before moving a multi-word register by parts.
gcc/testsuite/:
* gcc.target/riscv/pr43644.c: New test case.
|
|
The test case uses "atomic<int>", which fails to link on
pru-unknown-elf target due to missing __atomic_load_4 symbol.
Fix by filtering for sync_int_long effective target. Ensured that the
test still passes for x86_64-pc-linux-gnu.
gcc/testsuite/ChangeLog:
* g++.dg/init/array54.C: Require sync_int_long effective target.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
|
|
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.
Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that only have a single non default block,
They are handled similar to COND in if conversion.
This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.
PR tree-optimization/115866
gcc/ChangeLog:
* tree-if-conv.cc (if_convertible_switch_p): New function.
(if_convertible_stmt_p): Check for switch.
(get_loop_body_in_if_conv_order): Handle switch.
(predicate_bbs): Likewise.
(predicate_statements): Likewise.
(remove_conditions_and_labels): Likewise.
(ifcvt_split_critical_edges): Likewise.
(ifcvt_local_dce): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
|
|
Write CodeView S_LDATA32 symbols for static locals in optimized code. We have
to handle these separately, as they come after the S_FRAMEPROC, plus you can't
have S_BLOCK32 symbols like you can in unoptimized code.
gcc/
* dwarf2codeview.cc (write_optimized_static_local_vars): New function.
(write_function): Call write_optimized_static_local_vars.
|
|
Write S_FRAMEPROC symbols, which aren't very useful but seem to be necessary
for Microsoft debuggers to function properly. These symbols come after S_LOCAL
symbols for optimized variables, but before S_REGISTER and S_REGREL32 for
unoptimized variables.
gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_FRAMEPROC.
(write_s_frameproc): New function.
(write_function): Call write_s_frameproc.
|
|
Outputs S_DEFRANGE_REGISTER_REL symbols for optimized local variables that are
on the stack, consisting of the stack register, the offset, and the code range
for which this applies.
gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_DEFRANGE_REGISTER_REL.
(write_defrange_register_rel): New function.
(write_optimized_local_variable_loc): Add fbloc param, and call
write_defrange_register_rel.
(write_optimized_local_variable): Add fbloc param.
(write_optimized_function_vars): Add fbloc param.
|
|
Enable variable tracking when outputting CodeView debug information, and make
it so that we issue debug symbols for optimized variables in registers. This
consists of S_LOCAL symbols, which give the name and the type of local
variables, followed by S_DEFRANGE_REGISTER symbols for the register and the
code for which this applies.
gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_LOCAL and
S_DEFRANGE_REGISTER.
(write_s_local): New function.
(write_defrange_register): New function.
(write_optimized_local_variable_loc): New function.
(write_optimized_local_variable): New function.
(write_optimized_function_vars): New function.
(write_function): Call write_optimized_function_vars if variable
tracking enabled.
* dwarf2out.cc (typedef var_loc_view): Move to dwarf2out.h.
(struct dw_loc_list_struct): Likewise.
* dwarf2out.h (typedef var_loc_view): Move from dwarf2out.h.
(struct dw_loc_list_struct): Likewise.
* opts.cc (finish_options): Enable variable tracking for CodeView.
|
|
This patch tweaks timode_scalar_chain::compute_convert_gain to better
reflect the expansion of V1TImode arithmetic right shifts by the i386
backend. The comment "see ix86_expand_v1ti_ashiftrt" appears after
"case ASHIFTRT" in compute_convert_gain, and the changes below attempt
to better match the logic used there.
The original motivating example is:
__int128 m1;
void foo()
{
m1 = (m1 << 8) >> 8;
}
which with -O2 -mavx2 we fail to convert to vector form due to the
inappropriate cost of the arithmetic right shift.
Instruction gain -16 for 7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
Total gain: -3
Chain #1 conversion is not profitable
This is reporting that the ASHIFTRT is four instructions worse using
vectors than in scalar form, which is incorrect as the AVX2 expansion
of this shift only requires three instructions (and the scalar form
requires two).
With more accurate costs in timode_scalar_chain::compute_convert_gain
we now see (with -O2 -mavx2):
Instruction gain -4 for 7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
Total gain: 9
Converting chain #1...
which results in:
foo: vmovdqa m1(%rip), %xmm0
vpslldq $1, %xmm0, %xmm0
vpsrad $8, %xmm0, %xmm1
vpsrldq $1, %xmm0, %xmm0
vpblendd $7, %xmm0, %xmm1, %xmm0
vmovdqa %xmm0, m1(%rip)
ret
2024-08-25 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/i386-features.cc (compute_convert_gain)
<case ASHIFTRT>: Update to match ix86_expand_v1ti_ashiftrt.
|
|
Another test where the output was slightly twiddled by late-combine in which
simply disabling late-combine seems to be the best option.
> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ...
> FAIL: gcc.target/riscv/cm_mv_rv32.c -Os check-function-bodies sum
Pushing to the trunk.
gcc/testsuite
* gcc.target/riscv/cm_mv_rv32.c: Disable late-combine.
|
|
Surya's IRA patch from June slightly improves the code we generate for the
vls/calling-conventions tests on RISC-V. Specifically it removes an
unnecessary move from the instruction stream. This (of course) broke those
tests:
> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp ...
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
> FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3
This patch does the natural adjustment of those tests by dropping the moves
from the scan.
gcc/testsuite
* gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Update
expected output.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: Likewise.
* gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: Likewise.
|
|
Just minor testsuite adjustments -- several of the shorten-memref tests are
slightly twiddled by the late-combine pass:
> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ...
> FAIL: gcc.target/riscv/shorten-memrefs-2.c -Os scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi
> XPASS: gcc.target/riscv/shorten-memrefs-3.c -Os scan-assembler-not load2a:\n.*addi[ \t]*[at][0-9],[at][0-9],[0-9]*
> FAIL: gcc.target/riscv/shorten-memrefs-5.c -Os scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi
> FAIL: gcc.target/riscv/shorten-memrefs-8.c -Os scan-assembler store:\n(\t?\\.[^\n]*\n)*\taddi\ta[0-7],a[0-7],1
This patch just turns off the late-combine pass for those tests. Locally I'd
adjusted all the shorten-memref patches, but a quick re-rest shows that only 4
tests seem affected right now.
Anyway, pushing to the trunk to slightly clean up our test results.
gcc/testsuite
* gcc.target/riscv/shorten-memrefs-2.c: Turn off late-combine.
* gcc.target/riscv/shorten-memrefs-3.c: Likewise.
* gcc.target/riscv/shorten-memrefs-5.c: Likewise.
* gcc.target/riscv/shorten-memrefs-8.c: Likewise.
|
|
This patch provides a simple unit test for snprintf and atof against
the libc definition module.
gcc/testsuite/ChangeLog:
* gm2/calling-c/libc/run/pass/calling-c-libc-run-pass.exp: New test.
* gm2/calling-c/libc/run/pass/testlibcstr.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
|
|
Export all string to integral and floating point number conversion functions
(atof, atoi, atol, atoll, strtod, strtof, strtold, strtol, strtoll, strtoul
and strtoull).
gcc/m2/ChangeLog:
* gm2-libs/libc.def (atof): Export unqualified.
(atoi): Ditto.
(atol): Ditto.
(atoll): Ditto.
(strtod): Ditto.
(strtof): Ditto.
(strtold): Ditto.
(strtol): Ditto.
(strtoll): Ditto.
(strtoul): Ditto.
(strtoull): Ditto.
Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
|
|
In the case that the initial awaiter returns an object, the initial await
can be a target expression and we need to look at its initializer to cast
the await_resume() to void and to wrap in a compound expression that sets
the initial_await_resume_called flag.
PR c++/110635
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Look through
initial await target expressions to find the actual co_await_expr
that we need to update.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr110635.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
In the fix for PR95822 (r11-7402) we set throwing_cleanup false in the top
level of the coroutine transform code. However, as the current PR shows,
that is not sufficient.
Any use of cxx_maybe_build_cleanup() can reset the flag, which causes the
check_return_expr () logic to try to add a guard variable and set it.
For the coroutine code, we need to handle the cleanups separately, since
the responsibility for them changes after the first resume point, which
we handle in the ramp exception processing.
Fix this by forcing the "throwing_cleanup" flag false right before the
processing of the return expression.
PR c++/102051
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Handle
"throwing_cleanup" here instead of ...
(cp_coroutine_transform::apply_transforms): ... here.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr102051.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
[dcl.fct.def.coroutine]/7 says:
The expression promise.get_return_object() is used to initialize the returned
reference or prvalue result object of a call to a coroutine. The call to
get_return_object is sequenced before the call to initial_suspend and is
invoked at most once.
The issue is about when any conversions are carried out if the type of
the g_r_o call is not the same as the ramp return. Currently, we have been
doing this by materialising the g_r_o return value and passing that to
finish_return_expr() which handles the necessary conversions and checks.
As the PR shows, this does not work as expected.
In the revised version we carry out the work of the conversions when
intialising the return slot (with the same facilities that are used by
finish_return_expr()). We do this before the call that initiates the
coroutine body, satisfying the requirements for one call before initial
suspend.
The return expression becomes a trivial 'return <retval>'.
This simplifies the ramp logic considerably, since we no longer need to
keep track of the temporarily-materialised g_r_o value.
PR c++/115908
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Rework the return
value initialisation to initialise the return slot always from
get_return_object, even if that implies carrying out conversions
to do so.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr115908.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
We have been requiring the get_return_on_allocation_fail() call to have the
same type as the ramp. This is not intended by the standard, so relax that
to allow anything convertible to the ramp return.
PR c++/109682
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Allow for cases where
get_return_on_allocation_fail has a type convertible to the ramp
return type.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr109682.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
[PR100476].
Require that the value returned by get_return_object is convertible to
the ramp return. This means that the only time we allow a void
get_return_object, is when the ramp is also a void function.
We diagnose this early to allow us to exit the ramp build if the return
values are incompatible.
PR c++/100476
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Remove special
handling of void get_return_object expressions.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Adjust expected diagnostic.
* g++.dg/coroutines/pr102489.C: Avoid void get_return_object.
* g++.dg/coroutines/pr103868.C: Likewise.
* g++.dg/coroutines/pr94879-folly-1.C: Likewise.
* g++.dg/coroutines/pr94883-folly-2.C: Likewise.
* g++.dg/coroutines/pr96749-2.C: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
The responsibility for destroying part of the frame content (promise,
arg copies and the frame itself) transitions from the ramp to the
body of the coroutine once we reach the await_resume () for the
initial suspend.
We added the variable that flags the transition, but failed to act on
it. This corrects that so that the ramp only tries to run DTORs for
objects when an exception occurs before the initial suspend await
resume has started.
PR c++/113773
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Only cleanup the
frame state on exceptions that occur before the initial await
resume has begun.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/torture/pr113773.C: New test.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
This splits out the building of the allocation and deallocation expressions
and runs them early in the ramp build, so that we can exit if they are not
usable, before we start building the ramp body.
Likewise move checks for other required resources to the begining of the
ramp builder.
This is preparation for work needed to update the allocation/destruction
in cases where we have excess alignment of the promise or other saved frame
state.
gcc/cp/ChangeLog:
* call.cc (build_op_delete_call_1): Renamed and added a param
to allow the caller to prioritize two argument usual deleters.
(build_op_delete_call): New.
(build_coroutine_op_delete_call): New.
* coroutines.cc (coro_get_frame_dtor): Rename...
(build_coroutine_frame_delete_expr):... to this; simplify to
use build_op_delete_call for all cases.
(build_actor_fn): Use revised frame delete function.
(build_coroutine_frame_alloc_expr): New.
(cp_coroutine_transform::complete_ramp_function): Rename...
(cp_coroutine_transform::build_ramp_function): ... to this.
Reorder code to carry out checks for prerequisites before the
codegen. Split out the allocation/delete code.
(cp_coroutine_transform::apply_transforms): Use revised name.
* coroutines.h: Rename function.
* cp-tree.h (build_coroutine_op_delete_call): New.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Use revised
diagnostics.
* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C:
Likewise.
* g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C:
Likewise.
* g++.dg/coroutines/coro-bad-grooaf-00-static.C: Likewise.
* g++.dg/coroutines/ramp-return-b.C: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
This change is preparation for fixes to the ramp and codegen to follow.
The primary motivation is that we have thee activities; analysis, ramp
synthesis and outlined coroutine body synthesis. These are currently
carried out in sequence in the 'morph_fn_to_coro' code, which means that
we are nesting the synthesis of the outlined coroutine body inside the
finish_function call for the original function (which becomes the ramp).
The revised code splits the three interests so that the analysis can be
used independently by the ramp and body synthesis. This avoids some issues
seen with global state that start/finish function use and allows us to
use more of the high-level APIs in fixing bugs.
The resultant implementation is more self-contained, and has less impact
on finish_function.
gcc/cp/ChangeLog:
* coroutines.cc (struct suspend_point_info, struct param_info,
struct local_var_info, struct susp_frame_data,
struct local_vars_frame_data): Move to coroutines.h.
(build_actor_fn): Use start/finish function APIs.
(build_destroy_fn): Likewise.
(coro_build_actor_or_destroy_function): No longer mark the
actor / destroyer as DECL_COROUTINE_P.
(coro_rewrite_function_body): Use class members.
(cp_coroutine_transform::wrap_original_function_body): Likewise.
(build_ramp_function): Replace by...
(cp_coroutine_transform::complete_ramp_function): ...this.
(cp_coroutine_transform::cp_coroutine_transform): New.
(cp_coroutine_transform::~cp_coroutine_transform): New
(morph_fn_to_coro): Replace by...
(cp_coroutine_transform::apply_transforms): ...this.
(cp_coroutine_transform::finish_transforms): New.
* cp-tree.h (morph_fn_to_coro): Remove.
* decl.cc (emit_coro_helper): Remove.
(finish_function): Revise handling of coroutine transforms.
* coroutines.h: New file.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Co-authored-by: Arsen Arsenović <arsen@aarsen.me>
|
|
This is primarily preparation to partition the functionality of the
coroutine transform into analysis, ramp generation and then (later)
synthesis of the coroutine body. The patch does fix one latent
issue in the ordering of DTORs for frame parameter copies (to ensure
that they are processed in reverse order to the copy creation).
gcc/cp/ChangeLog:
* coroutines.cc (build_actor_fn): Arrange to apply any
required parameter copy DTORs in reverse order to their
creation.
(coro_rewrite_function_body): Handle revised param uses.
(morph_fn_to_coro): Split the ramp function completion
into a separate function.
(build_ramp_function): New.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
When we build an await expression, we might need to materialise the awaiter
if it is a prvalue. This re-implements this using core APIs instead of local
code.
gcc/cp/ChangeLog:
* coroutines.cc (build_co_await): Simplify checks for the cases that
we need to materialise an awaiter.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
|
|
The case in PR113746 used to ICE until commit r15-123-gf04dc89a991ddc.
This patch simply adds the case to the testsuite.
PR c++/113746
gcc/testsuite/ChangeLog:
* g++.dg/parse/crash76.C: New test.
|
|
set -fschedule-insns.
gcc/testsuite/
* gcc.dg/torture/pr115929-2.c: Add dg-require-effective-target scheduling.
* gcc.dg/torture/pr116343.c: Same.
|
|
|
|
repeating_sequence_p operates directly on the encoded pattern and does
not derive elements using the .elt() accessor. Passing in the length of
the unencoded vector can cause an out-of-bounds read of the encoded
pattern.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p):
Use encoded_nelts when calling repeating_sequence_p.
(rvv_builder::is_repeating_sequence): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
|
|
PR116405]
Now that more operations are allowed for noce_convert_multiple_sets,
it is possible that the same register appears multiple times as target
in a basic block. After noce_convert_multiple_sets_1 is called we
potentially also emit register moves from temporaries back to the
original targets. In some cases where the target registers overlap
with the block's condition, these register moves may overwrite
intermediate variables because they're emitted after the if-converted
code. To address this issue we now iterate backwards and keep track
of seen registers when emitting these final register moves.
PR rtl-optimization/116372
PR rtl-optimization/116405
gcc/ChangeLog:
* ifcvt.cc (noce_convert_multiple_sets): Iterate backwards and track
target registers.
gcc/testsuite/ChangeLog:
* gcc.dg/pr116372.c: New test.
* gcc.dg/pr116405.c: New test.
|
|
Similar to not allowing jump instructions in the generated code, we
also shouldn't allow call instructions in noce_convert_multiple_sets.
In the case of PR116358 a libcall was generated from force_operand.
PR middle-end/116358
gcc/ChangeLog:
* ifcvt.cc (noce_convert_multiple_sets): Disallow call insns.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr116358.c: New test.
|
|
Our power8 swap optimization pass has some special handling for optimizing
swaps of TImode variables. The test case reported in bugzilla uses a call
to __atomic_compare_exchange, which introduces a variable of PTImode and
that does not get the same treatment as TImode leading to wrong code
generation. The simple fix is to treat PTImode identically to TImode.
2024-08-23 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/116415
* config/rs6000/rs6000.h (TI_OR_PTI_MODE): New define.
* config/rs6000/rs6000-p8swap.cc (rs6000_analyze_swaps): Use it to
handle PTImode identically to TImode.
gcc/testsuite/
PR target/116415
* gcc.target/powerpc/pr116415.c: New test.
|
|
Complex lowering generally replaces existing complex defs with
COMPLEX_EXPRs but those might be dead when it can always refer to
components from the lattice. This in turn can pessimize followup
transforms like forwprop and reassoc, the following makes sure to
get rid of dead COMPLEX_EXPRs generated by using
simple_dce_from_worklist.
PR tree-optimization/116463
* tree-complex.cc: Include tree-ssa-dce.h.
(dce_worklist): New global.
(update_complex_assignment): Add SSA def to the DCE worklist.
(tree_lower_complex): Perform DCE.
|
|
This reverts commit 4cb07a38233aadb4b389a6e5236c95f52241b6e0.
|
|
This patch would like to support the form 4 of the unsigned integer
.SAT_TRUNC. Aka below example:
Form 4:
#define DEF_SAT_U_TRUC_FMT_4(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_4 (WT x) \
{ \
bool not_overflow = x <= (WT)(NT)(-1); \
return ((NT)x) | (NT)((NT)not_overflow - 1); \
}
DEF_SAT_U_TRUC_FMT_4(uint32_t, uint64_t)
Before this patch:
4 │ __attribute__((noinline))
5 │ uint8_t sat_u_truc_uint32_t_to_uint8_t_fmt_4 (uint32_t x)
6 │ {
7 │ _Bool not_overflow;
8 │ unsigned char _1;
9 │ unsigned char _2;
10 │ unsigned char _3;
11 │ uint8_t _6;
12 │
13 │ ;; basic block 2, loop depth 0
14 │ ;; pred: ENTRY
15 │ not_overflow_5 = x_4(D) <= 255;
16 │ _1 = (unsigned char) x_4(D);
17 │ _2 = (unsigned char) not_overflow_5;
18 │ _3 = _2 + 255;
19 │ _6 = _1 | _3;
20 │ return _6;
21 │ ;; succ: EXIT
22 │
23 │ }
After this patch:
4 │ __attribute__((noinline))
5 │ uint8_t sat_u_truc_uint32_t_to_uint8_t_fmt_4 (uint32_t x)
6 │ {
7 │ uint8_t _6;
8 │
9 │ ;; basic block 2, loop depth 0
10 │ ;; pred: ENTRY
11 │ _6 = .SAT_TRUNC (x_4(D)); [tail call]
12 │ return _6;
13 │ ;; succ: EXIT
14 │
15 │ }
The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.
gcc/ChangeLog:
* match.pd: Add form 4 for unsigned .SAT_TRUNC matching.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
In get_best_extraction_insn we use smallest_int_mode_for_size with
struct_bits as size argument. PR115495 has struct_bits = 256 and we
don't have a mode for that. This patch makes smallest_mode_for_size
and smallest_int_mode_for_size return opt modes so we can just skip
over the loop when there is no mode.
PR middle-end/115495
gcc/ChangeLog:
* cfgexpand.cc (expand_debug_expr): Require mode.
* combine.cc (make_extraction): Ditto.
* config/aarch64/aarch64.cc (aarch64_expand_cpymem): Ditto.
(aarch64_expand_setmem): Ditto.
* config/arc/arc.cc (arc_expand_cpymem): Ditto.
* config/arm/arm.cc (arm_expand_divmod_libfunc): Ditto.
* config/i386/i386.cc (ix86_get_mask_mode): Ditto.
* config/rs6000/predicates.md: Ditto.
* config/rs6000/rs6000.cc (vspltis_constant): Ditto.
* config/s390/s390.cc (s390_expand_insv): Ditto.
* config/sparc/sparc.cc (assign_int_registers): Ditto.
* coverage.cc (get_gcov_type): Ditto.
(get_gcov_unsigned_t): Ditto.
* dse.cc (find_shift_sequence): Ditto.
* expmed.cc (store_integral_bit_field): Ditto.
* expr.cc (convert_mode_scalar): Ditto.
(op_by_pieces_d::smallest_fixed_size_mode_for_size): Ditto.
(emit_block_move_via_oriented_loop): Ditto.
(copy_blkmode_to_reg): Ditto.
(store_field): Ditto.
* internal-fn.cc (expand_arith_overflow): Ditto.
* machmode.h (HAVE_MACHINE_MODES): Ditto.
(smallest_mode_for_size): Use opt_machine_mode.
(smallest_int_mode_for_size): Use opt_scalar_int_mode.
* optabs-query.cc (get_best_extraction_insn): Require mode.
* optabs.cc (expand_twoval_binop_libfunc): Ditto.
* stor-layout.cc (smallest_mode_for_size): Return
opt_machine_mode.
(layout_type): Require mode.
(initialize_sizetypes): Ditto.
* tree-ssa-loop-manip.cc (canonicalize_loop_ivs): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr115495.c: New test.
gcc/ada/ChangeLog:
* gcc-interface/utils2.cc (fast_modulo_reduction): Require mode.
(nonbinary_modular_operation): Ditto.
|
|
Standard abs synthesis during expand is max (a, -a). This
expansion has the advantage of avoiding masking and is thus potentially
faster than the a < 0 ? -a : a synthesis.
gcc/ChangeLog:
* config/riscv/autovec.md (abs<mode>2): Expand via max (a, -a).
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Adjust test
expectation.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
|
|
Apparently due to slightly different optimization levels
not always both subroutines have multiple subranges,
but having at least one such, and no lexical blocks
is sufficient to prove that the fix worked. Q.E.D.
So reduce the test expectations to only at least one
inlined subroutine with multiple subranges.
gcc/testsuite/ChangeLog:
PR other/116462
* gcc.dg/debug/dwarf2/inline7.c: Reduce test expectations.
|
|
This comes from a loophole in gnat_get_array_descr_info for record types
containing a template, which represent an aliased array, when this array
type is bit-packed and implemented as a modular integer.
gcc/ada/
* gcc-interface/misc.cc (gnat_get_array_descr_info): Test the
BIT_PACKED_ARRAY_TYPE_P flag only once on the final debug type. In
the case of records containing a template, replay the entire
processing for the array type contained therein.
|