path: root/gcc
Age | Commit message | Author | Files | Lines
2024-08-26 | [PATCH 1/2] AVX10.2: Support saturating convert instructions | Hu, Lin1 | 40 | -1/+3327
gcc/ChangeLog: * config.gcc: Add avx10_2satcvtintrin.h and avx10_2-512satcvtintrin.h. * config/i386/i386-builtin-types.def: Add DEF_FUNCTION_TYPE (V8HI, V8BF, V8HI, UQI), (V16HI, V16BF, V16HI, UHI), (V32HI, V32BF, V32HI, USI), (V16SI, V16SF, V16SI, UHI, INT), (V16HI, V16BF, V16HI, UHI, INT), (V32HI, V32BF, V32HI, USI, INT). * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle V32HI_FTYPE_V32BF_V32HI_USI, V16HI_FTYPE_V16BF_V16HI_UHI, V8HI_FTYPE_V8BF_V8HI_UQI. (ix86_expand_round_builtin): Handle V32HI_FTYPE_V32BF_V32HI_USI_INT, V16SI_FTYPE_V16SF_V16SI_UHI_INT, V16HI_FTYPE_V16BF_V16HI_UHI_INT. * config/i386/immintrin.h: Include avx10_2satcvtintrin.h and avx10_2-512satcvtintrin.h. * config/i386/sse.md (UNSPEC_CVTNE_BF16_IBS_ITER): New iterator. (sat_cvt_sign_prefix): Ditto. (sat_cvt_trunc_prefix): Ditto. (UNSPEC_CVT_PH_IBS_ITER): Ditto. (UNSPEC_CVTT_PH_IBS_ITER): Ditto. (UNSPEC_CVT_PS_IBS_ITER): Ditto. (UNSPEC_CVTT_PS_IBS_ITER): Ditto. (avx10_2_cvt<sat_cvt_trunc_prefix>nebf162i<sat_cvt_sign_prefix>bs<mode><mask_name>): New define_insn. (avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>): Ditto. (avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>): Ditto. (avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>): Ditto. (avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>): Ditto. * config/i386/avx10_2-512satcvtintrin.h: New file. * config/i386/avx10_2satcvtintrin.h: Ditto.
gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add macros. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx512f-helper.h: Add new test macro. * gcc.target/i386/m512-check.h: Add new type. * gcc.target/i386/avx10_2-512-satcvt-1.c: New test. * gcc.target/i386/avx10_2-512-vcvtnebf162ibs-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtps2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvttps2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-satcvt-1.c: Ditto. * gcc.target/i386/avx10_2-vcvtnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttnebf162ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttnebf162iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttph2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttph2iubs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttps2ibs-2.c: Ditto. * gcc.target/i386/avx10_2-vcvttps2iubs-2.c: Ditto.
2024-08-26 | [PATCH 2/2] AVX10.2: Support BF16 instructions | konglin1 | 35 | -2/+2096
gcc/ChangeLog: * config/i386/avx10_2-512bf16intrin.h: Add new intrinsics. * config/i386/avx10_2bf16intrin.h: Ditto. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE for new type. * config/i386/i386-builtin.def (BDESC): Add new builtin. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle new type. * config/i386/sse.md (vecmemsuffix): Add vector BF mode. (avx10_2_rsqrtpbf16_<mode><mask_name>): New define_insn. (avx10_2_sqrtnepbf16_<mode><mask_name>): Ditto. (avx10_2_rcppbf16_<mode><mask_name>): Ditto. (avx10_2_getexppbf16_<mode><mask_name>): Ditto. (BF16IMMOP): New iterator. (bf16immop): Ditto. (avx10_2_<bf16immop>pbf16_<mode><mask_name>): New define_insn. (avx10_2_fpclasspbf16_<mode><mask_scalar_merge_name>): Ditto. (avx10_2_cmppbf16_<mode><mask_scalar_merge_name>): Ditto. (avx10_2_comsbf16_v8bf): Ditto.
gcc/testsuite/ChangeLog: * gcc.target/i386/avx10-check.h: Add AVX10_SCALAR. * gcc.target/i386/avx10-helper.h: Add helper functions. * gcc.target/i386/avx10_2-512-bf16-1.c: Add new tests. * gcc.target/i386/avx10_2-bf16-1.c: Ditto. * gcc.target/i386/avx-1.c: Add macros. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-512-vcmppbf16-2.c: New test. * gcc.target/i386/avx10_2-512-vfpclasspbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vgetexppbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vgetmantpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vrcppbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vreducenepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vrndscalenepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vrsqrtpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vsqrtnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vcmppbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vcomsbf16-1.c: Ditto. * gcc.target/i386/avx10_2-vcomsbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vfpclasspbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vgetexppbf16-2.c: Ditto.
* gcc.target/i386/avx10_2-vgetmantpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vrcppbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vreducenepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vrndscalenepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vrsqrtpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vsqrtnepbf16-2.c: Ditto. Co-authored-by: Levy Hsu <admin@levyhsu.com>
2024-08-26 | [PATCH 1/2] AVX10.2: Support BF16 instructions | konglin1 | 35 | -2/+2514
gcc/ChangeLog: * config.gcc: Add avx10_2-512bf16intrin.h and avx10_2bf16intrin.h. * config/i386/i386-builtin-types.def : Add new DEF_FUNCTION_TYPE for V32BF_FTYPE_V32BF_V32BF, V16BF_FTYPE_V16BF_V16BF, V8BF_FTYPE_V8BF_V8BF, V8BF_FTYPE_V8BF_V8BF_UQI, V16BF_FTYPE_V16BF_V16BF_UHI, V32BF_FTYPE_V32BF_V32BF_USI, V32BF_FTYPE_V32BF_V32BF_V32BF_USI, V8BF_FTYPE_V8BF_V8BF_V8BF_UQI and V16BF_FTYPE_V16BF_V16BF_V16BF_UHI. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle new DEF_FUNCTION_TYPE. * config/i386/immintrin.h: Include avx10_2-512bf16intrin.h and avx10_2bf16intrin.h. * config/i386/sse.md (VBF_AVX10_2): New iterator. (avx10_2_scalefpbf16_<mode><mask_name>): New define_insn. (avx10_2_<code>nepbf16_<mode><mask_name>): Ditto. (avx10_2_<insn>nepbf16_<mode><mask_name>): Ditto. (avx10_2_fmaddnepbf16_<mode>_maskz): New expander. (avx10_2_fnmaddnepbf16_<mode>_maskz): Ditto. (avx10_2_fmsubnepbf16_<mode>_maskz): Ditto. (avx10_2_fnmsubnepbf16_<mode>_maskz): Ditto. (avx10_2_fmaddnepbf16_<mode><sd_maskz_name>): New define_insn. (avx10_2_fmaddnepbf16_<mode>_mask): Ditto. (avx10_2_fmaddnepbf16_<mode>_mask3): Ditto. (avx10_2_fnmaddnepbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fnmaddnepbf16_<mode>_mask): Ditto. (avx10_2_fnmaddnepbf16_<mode>_mask3): Ditto. (avx10_2_fmsubnepbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fmsubnepbf16_<mode>_mask): Ditto. (avx10_2_fmsubnepbf16_<mode>_mask3): Ditto. (avx10_2_fnmsubnepbf16_<mode><sd_maskz_name>): Ditto. (avx10_2_fnmsubnepbf16_<mode>_mask): Ditto. (avx10_2_fnmsubnepbf16_<mode>_mask3): Ditto. * config/i386/avx10_2-512bf16intrin.h: New file. * config/i386/avx10_2bf16intrin.h: Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-helper.h: Add MAKE_MASK_MERGE and MAKE_MASK_ZERO for bf16_uw. * gcc.target/i386/m512-check.h: Add union512bf16_uw, union256bf16_uw, union128bf16_uw and CHECK_EXP for them. * gcc.target/i386/avx10-helper.h: New file. 
* gcc.target/i386/avx10_2-512-bf16-1.c: New test. * gcc.target/i386/avx10_2-512-vaddnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vdivnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vfmaddXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vfmsubXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vfnmaddXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vfnmsubXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vmaxpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vminpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vscalefpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-512-vsubnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-bf16-1.c: Ditto. * gcc.target/i386/avx10_2-vaddnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vdivnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vfmaddXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vfmsubXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vfnmaddXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vfnmsubXXXnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vmaxpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vminpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vmulnepbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vscalefpbf16-2.c: Ditto. * gcc.target/i386/avx10_2-vsubnepbf16-2.c: Ditto. Co-authored-by: Levy Hsu <admin@levyhsu.com>
2024-08-26 | AVX10.2: Support convert instructions | Levy Hsu | 45 | -5/+3511
gcc/ChangeLog: * config.gcc: Add avx10_2-512convertintrin.h and avx10_2convertintrin.h. * config/i386/i386-builtin-types.def: Add new DEF_POINTER_TYPE and DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_args_builtin): Handle AVX10.2. (ix86_expand_round_builtin): Ditto. * config/i386/immintrin.h: Include avx10_2-512convertintrin.h, avx10_2convertintrin.h. * config/i386/sse.md (VHF_AVX10_2): New iterator. (bf16_ph): Add 512 bit mode. (avx10_2_cvt2ps2phx_<mode><mask_name><round_name>): New define_insn. (ssebvecmode): New iterator. (UNSPEC_NECONVERTFP8_PACK): Ditto. (neconvertfp8_pack): Ditto. (vcvt<neconvertfp8_pack><mode><mask_name>): New define_insn. (ssebvecmode_2): New iterator. (UNSPEC_VCVTBIASPH2FP8_PACK): Ditto. (biasph2fp8_pack): Ditto. (vcvt<biasph2fp8_pack>v8hf): New expander. (vcvt<biasph2fp8_pack>v8hf_mask): Ditto. (*vcvt<biasph2bf8_pack>v8hf): New define_insn. (*vcvt<biasph2fp8_pack>v8hf_mask): Ditto. (VHF_AVX10_2_2): New iterator. (vcvt<biasph2fp8_pack><mode><mask_name>): New define_insn. (VHF_256_512): New iterator. (ph2fp8suff): Ditto. (UNSPEC_NECONVERTPH2FP8_PACK): Ditto. (neconvertph2fp8): Ditto. (vcvt<neconvertph2fp8>v8hf_mask): New expander. (*vcvt<neconvertph2fp8>v8hf): New define_insn. (*vcvt<neconvertph2fp8>v8hf_mask): Ditto. (vcvt<neconvertph2fp8><mode><mask_name>): Ditto. (vcvthf82ph<mode><mask_name>): Ditto. * config/i386/avx10_2-512convertintrin.h: New file. * config/i386/avx10_2convertintrin.h: Ditto.
gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add macros for const. * gcc.target/i386/avx-2.c: Ditto. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-512-convert-1.c: New test. * gcc.target/i386/avx10_2-512-vcvt2ps2phx-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtbiasph2bf8-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtbiasph2bf8s-2.c: Ditto.
* gcc.target/i386/avx10_2-512-vcvtbiasph2hf8-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtbiasph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvthf82ph-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtne2ph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtne2ph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtneph2bf8-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtneph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtneph2hf8-2.c: Ditto. * gcc.target/i386/avx10_2-512-vcvtneph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-convert-1.c: Ditto. * gcc.target/i386/avx10_2-vcvt2ps2phx-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtbiasph2bf8-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtbiasph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtbiasph2hf8-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtbiasph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-vcvthf82ph-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtne2ph2bf8-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtne2ph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtne2ph2hf8-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtne2ph2hf8s-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtneph2bf8-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtneph2bf8s-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtneph2hf8-2.c: Ditto. * gcc.target/i386/avx10_2-vcvtneph2hf8s-2.c: Ditto. * gcc.target/i386/fp8-helper.h: New helper file. Co-authored-by: Levy Hsu <admin@levyhsu.com> Co-authored-by: Kong Lingling <lingling.kong@intel.com>
2024-08-26 | [PATCH 2/2] AVX10.2: Support media instructions | Haochen Jiang | 32 | -35/+1953
gcc/ChangeLog: * config/i386/avx10_2-512mediaintrin.h: Add new intrinsics. * config/i386/avx10_2mediaintrin.h: Ditto. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-builtins.cc (def_builtin): Handle shared builtins between AVXVNNIINT16 and AVX10.2. * config/i386/i386-expand.cc (ix86_check_builtin_isa_match): Ditto. * config/i386/sse.md (unspec): Add UNSPEC_VDPPHPS. (avx10_2_mpsadbw<mask_name>): New define_insn. (<mask_codefor><sse4_1_avx2>_mpsadbw<mask_name>): Ditto. (vpdp<vpdpwprodtype>_<mode>): Add AVX10_2_256. (vpdp<vpdpwprodtype>_v16si): New define_insn. (vpdp<vpdpwprodtype>_<mode>_mask): Ditto. (*vpdp<vpdpwprodtype>_<mode>_maskz): Ditto. (vpdp<vpdpwprodtype>_<mode>_maskz): New expander. (vdpphps_<mode>): New define_insn. (vdpphps_<mode>_mask): Ditto. (*vdpphps_<mode>_maskz): Ditto. (vdpphps_<mode>_maskz): New expander.
gcc/testsuite/ChangeLog: * gcc.target/i386/avxvnniint16-1.c: Add new macro test. * gcc.target/i386/avx-1.c: Ditto. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-512-media-1.c: Add test. * gcc.target/i386/avx10_2-media-1.c: Ditto. * gcc.target/i386/avxvnniint16-builtin.c: New test. * gcc.target/i386/avx10_2-512-vdpphps-2.c: Ditto. * gcc.target/i386/avx10_2-512-vmpsadbw-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpwsud-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpwsuds-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpwusd-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpwusds-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpwuud-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpwuuds-2.c: Ditto. * gcc.target/i386/avx10_2-builtin-2.c: Ditto. * gcc.target/i386/avx10_2-vdpphps-2.c: Ditto. * gcc.target/i386/avx10_2-vmpsadbw-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpwsud-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpwsuds-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpwusd-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpwusds-2.c: Ditto.
* gcc.target/i386/avx10_2-vpdpwuud-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpwuuds-2.c: Ditto. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com>
2024-08-26 | [PATCH 1/2] AVX10.2: Support media instructions | Hongyu Wang | 30 | -24/+1577
gcc/ChangeLog: * config.gcc: Add avx10_2mediaintrin.h and avx10_2-512mediaintrin.h. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-builtins.cc (def_builtin): Handle shared builtins between AVXVNNIINT8 and AVX10.2. * config/i386/i386-expand.cc (ix86_check_builtin_isa_match): Ditto. * config/i386/immintrin.h: Include avx10_2mediaintrin.h and avx10_2-512mediaintrin.h. * config/i386/sse.md (VI4_AVX10_2): New. (vpdp<vpdotprodtype>_<mode>): Add AVX10_2_256. (vpdp<vpdotprodtype>_v16si): New define_insn. (vpdp<vpdotprodtype>_<mode>_mask): Ditto. (*vpdp<vpdotprodtype>_<mode>_maskz): Ditto. (vpdp<vpdotprodtype>_<mode>_maskz): New expander. * config/i386/avx10_2-512mediaintrin.h: New file. * config/i386/avx10_2mediaintrin.h: Ditto.
gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-helper.h: Reuse AVX512F macros for AVX10. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * lib/target-supports.exp (check_effective_target_avx10_2): New. (check_effective_target_avx10_2_512): Ditto. * gcc.target/i386/avx10-check.h: New test file. * gcc.target/i386/avx10-helper.h: Ditto. * gcc.target/i386/avx10_2-builtin-1.c: Ditto. * gcc.target/i386/avx10_2-512-media-1.c: Ditto. * gcc.target/i386/avx10_2-media-1.c: Ditto. * gcc.target/i386/avxvnniint8-builtin.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpbssd-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpbssds-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpbsud-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpbsuds-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpbuud-2.c: Ditto. * gcc.target/i386/avx10_2-512-vpdpbuuds-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbssd-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbssds-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbsud-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbsuds-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbuud-2.c: Ditto. * gcc.target/i386/avx10_2-vpdpbuuds-2.c: Ditto.
Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
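For readers unfamiliar with the vpdpb* instructions exercised by the tests listed above, here is a minimal scalar sketch of what one 32-bit lane of vpdpbssd and its saturating variant vpdpbssds computes: four signed-byte products summed into a doubleword accumulator. It is a reference model only, not the intrinsic implementation added by this patch, and the function names `dpbssd_lane` and `dpbssds_lane` are hypothetical. The wrapping variant assumes GCC's modulo behaviour when converting an out-of-range value to int32_t.

```c
#include <stdint.h>

/* Scalar model of one dword lane of vpdpbssd: acc plus the sum of four
   signed-byte x signed-byte products, wrapping modulo 2^32.
   Hypothetical helper name, for illustration only.  */
int32_t
dpbssd_lane (int32_t acc, const int8_t a[4], const int8_t b[4])
{
  uint32_t sum = (uint32_t) acc;
  for (int i = 0; i < 4; i++)
    sum += (uint32_t) ((int32_t) a[i] * (int32_t) b[i]);
  return (int32_t) sum;
}

/* Scalar model of one dword lane of vpdpbssds: the same sum, but with
   signed saturation applied to the final result.  */
int32_t
dpbssds_lane (int32_t acc, const int8_t a[4], const int8_t b[4])
{
  int64_t sum = acc;
  for (int i = 0; i < 4; i++)
    sum += (int32_t) a[i] * (int32_t) b[i];
  if (sum > INT32_MAX)
    return INT32_MAX;
  if (sum < INT32_MIN)
    return INT32_MIN;
  return (int32_t) sum;
}
```

The two models agree whenever the exact sum fits in 32 bits and differ only when it overflows, which is the distinction the separate vpdpbssd and vpdpbssds tests cover.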
2024-08-26 | i386: Refactor m512-check.h | Haochen Jiang | 1 | -31/+35
After the introduction of AVX10, we still want to use the AVX512 helper functions to avoid duplicating code. To reuse them, we need to refactor the header so that each function definition sits under the correct ISA, which avoids ABI warnings.
gcc/testsuite/ChangeLog: * gcc.target/i386/m512-check.h: Wrap the function definitions with the correct vector size.
2024-08-26 | RISC-V: Support IMM for operand 0 of ussub pattern | Pan Li | 17 | -2/+477
This patch allows an immediate (IMM) for operand 0 of the ussub pattern, i.e. .SAT_SUB(1023, y), as in the example below.

Form 1:
  #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
  T __attribute__((noinline)) \
  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
  { \
    return (T)IMM >= y ? (T)IMM - y : 0; \
  }

  DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)

Before this patch:
  sat_u_sub_imm82_uint64_t_fmt_1:
    li a5,82
    bgtu a0,a5,.L3
    sub a0,a5,a0
    ret
  .L3:
    li a0,0
    ret

After this patch:
  sat_u_sub_imm82_uint64_t_fmt_1:
    li a5,82
    sltu a4,a5,a0
    addi a4,a4,-1
    sub a0,a5,a0
    and a0,a4,a0
    ret

The following test suite passed for this patch:
1. The rv64gcv full regression test.

gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new func impl to gen xmode rtx reg from operand rtx. (riscv_expand_ussub): Gen xmode reg for operand 1. * config/riscv/riscv.md: Allow const_int for operand 1.
gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macro. * gcc.target/riscv/sat_u_sub_imm-1.c: New test. * gcc.target/riscv/sat_u_sub_imm-1_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-1_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-2.c: New test. * gcc.target/riscv/sat_u_sub_imm-2_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-2_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-3.c: New test. * gcc.target/riscv/sat_u_sub_imm-3_1.c: New test. * gcc.target/riscv/sat_u_sub_imm-3_2.c: New test. * gcc.target/riscv/sat_u_sub_imm-4.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-1.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-2.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-3.c: New test. * gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
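The branchless sequence in the "after" listing (sltu, addi -1, sub, and) can be sketched in plain C as follows. This is an illustration of the idea rather than the code GCC generates internally; the immediate is fixed to 82 to match the assembly listing, and both function names are hypothetical.

```c
#include <stdint.h>

/* Branchy form, as written in the test macro, with IMM fixed to 82.  */
uint64_t
sat_sub_imm82_branchy (uint64_t y)
{
  return (uint64_t) 82 >= y ? (uint64_t) 82 - y : 0;
}

/* Branchless form mirroring the sltu / addi -1 / sub / and sequence.  */
uint64_t
sat_sub_imm82_branchless (uint64_t y)
{
  uint64_t imm = 82;
  /* (imm < y) is 1 on underflow, so the mask is 0; otherwise the
     subtraction of 1 from 0 wraps to all ones.  */
  uint64_t mask = (uint64_t) (imm < y) - 1;
  return (imm - y) & mask;
}
```

Both functions return the same value for every y; the second form trades the conditional branch for a compare and a mask, which is the change visible between the two assembly listings.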
2024-08-26 | RISC-V: Add testcases for unsigned vector .SAT_TRUNC form 4 | Pan Li | 13 | -0/+236
This patch adds test cases for the unsigned vector .SAT_TRUNC form 4, i.e.:

Form 4:
  #define DEF_VEC_SAT_U_TRUNC_FMT_4(NT, WT) \
  void __attribute__((noinline)) \
  vec_sat_u_trunc_##NT##_##WT##_fmt_4 (NT *out, WT *in, unsigned limit) \
  { \
    unsigned i; \
    for (i = 0; i < limit; i++) \
      { \
        bool not_overflow = in[i] <= (WT)(NT)(-1); \
        out[i] = ((NT)in[i]) | (NT)((NT)not_overflow - 1); \
      } \
  }

  DEF_VEC_SAT_U_TRUNC_FMT_4 (uint32_t, uint64_t)

The following test passed for this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-19.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-20.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-21.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-22.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-23.c: New test. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-24.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
2024-08-26 | RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 4 | Pan Li | 13 | -0/+218
This patch adds test cases for the unsigned scalar quad and oct .SAT_TRUNC form 4, i.e.:

Form 4:
  #define DEF_SAT_U_TRUNC_FMT_4(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
  { \
    bool not_overflow = x <= (WT)(NT)(-1); \
    return ((NT)x) | (NT)((NT)not_overflow - 1); \
  }

The following test passed for this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat_u_trunc-19.c: New test. * gcc.target/riscv/sat_u_trunc-20.c: New test. * gcc.target/riscv/sat_u_trunc-21.c: New test. * gcc.target/riscv/sat_u_trunc-22.c: New test. * gcc.target/riscv/sat_u_trunc-23.c: New test. * gcc.target/riscv/sat_u_trunc-24.c: New test. * gcc.target/riscv/sat_u_trunc-run-19.c: New test. * gcc.target/riscv/sat_u_trunc-run-20.c: New test. * gcc.target/riscv/sat_u_trunc-run-21.c: New test. * gcc.target/riscv/sat_u_trunc-run-22.c: New test. * gcc.target/riscv/sat_u_trunc-run-23.c: New test. * gcc.target/riscv/sat_u_trunc-run-24.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
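To make the mask trick in form 4 concrete, the macro can be expanded by hand for one type pair. The sketch below picks NT = uint8_t and WT = uint64_t as an example; that choice and the function name are illustrative, not taken from the new tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Form 4 written out by hand for NT = uint8_t, WT = uint64_t.  */
uint8_t
sat_u_trunc_u64_to_u8 (uint64_t x)
{
  /* (uint64_t)(uint8_t)(-1) is 255, the largest value NT can hold.  */
  bool not_overflow = x <= (uint64_t) (uint8_t) (-1);
  /* not_overflow == 1 gives mask 0x00, so the truncated value passes
     through; not_overflow == 0 gives mask 0xff, so the result
     saturates to 255.  */
  return ((uint8_t) x) | (uint8_t) ((uint8_t) not_overflow - 1);
}
```

For example, an input of 7 returns 7, while 256 and 0x1234 both return 255.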
2024-08-26 | Daily bump. | GCC Administrator | 3 | -1/+143
2024-08-25 | RISC-V: Fix double mode under RV32 not utilize vf | demin.han | 33 | -68/+69
Currently, some binops of a vector with a double scalar under RV32 can't be translated to the vf form and instead become vfmv+vxx.vv. The cause is that vec_duplicate is also expanded to a broadcast for double mode under RV32, and last-combine can't process the expanded broadcast.
gcc/ChangeLog: * config/riscv/vector.md: Add !FLOAT_MODE_P constraint.
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto.
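The kind of source loop this commit is about can be sketched as below: a vector operand combined with a loop-invariant double scalar. This is only an illustration of the source shape, with a hypothetical function name; whether a particular compiler build emits the vf form (for example vfadd.vf) or a vfmv followed by a .vv instruction depends on the target flags and GCC version.

```c
#include <stddef.h>

/* Adds a loop-invariant double scalar to every element of src.
   When auto-vectorized for RVV, the scalar operand is the one that
   is either used directly (vf form) or broadcast first (vfmv + .vv).  */
void
vadd_double_scalar (double *restrict dst, const double *restrict src,
                    double scalar, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + scalar;
}
```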
2024-08-25 | [PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move. | Xianmiao Qu | 2 | -0/+19
The previous patch:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd
aimed to eliminate redundant MOV instructions by removing the call to emit_clobber in lower-subreg.cc's resolve_simple_move.

First, I found that another patch addresses this issue:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bdf2737cda53a83332db1a1a021653447b05a7e7
and that even without removing the call to emit_clobber, the generated instructions are still as expected.

Second, removing the CLOBBER expression has side effects. When there is no CLOBBER expression and only SUBREG assignments exist, the logic of the 'df_lr_bb_local_compute' function adds the register to the basic block's LR IN set. This makes the register's lifetime span the entire function, which increases register pressure. Taking the newly added test case 'gcc/testsuite/gcc.target/riscv/pr43644.c' as an example, removing the CLOBBER expression causes some registers to be spilled.

gcc/: * lower-subreg.cc (resolve_simple_move): Re-add the call to emit_clobber immediately before moving a multi-word register by parts.
gcc/testsuite/: * gcc.target/riscv/pr43644.c: New test case.
2024-08-25 | testsuite: Run array54.C only for sync_int_long targets | Dimitar Dimitrov | 1 | -0/+1
The test case uses "atomic<int>", which fails to link on the pru-unknown-elf target due to the missing __atomic_load_4 symbol. Fix this by filtering for the sync_int_long effective target. Ensured that the test still passes for x86_64-pc-linux-gnu.
gcc/testsuite/ChangeLog: * g++.dg/init/array54.C: Require sync_int_long effective target.
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
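As a rough C analogue of the situation (the real test is C++ and uses atomic<int>), the sketch below performs a 4-byte atomic load. On targets without inline support for such operations, the compiler falls back to a library call such as the __atomic_load_4 named in the commit message, which is why the test needs filtering. The directive comment shows the general DejaGnu form for requiring an effective target; its exact placement and spelling in array54.C are assumptions, and the function name is hypothetical.

```c
/* { dg-require-effective-target sync_int_long } */
#include <stdatomic.h>

static _Atomic int counter = 41;

/* Atomically increments the counter and reads it back.  The load is
   the operation that may become a call to __atomic_load_4 on targets
   lacking native 4-byte atomics.  */
int
bump_and_read (void)
{
  atomic_fetch_add (&counter, 1);
  return atomic_load (&counter);
}
```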
2024-08-25 | Support if conversion for switches | Andi Kleen | 5 | -6/+270
The gimple-if-to-switch pass converts if statements with multiple equality checks on the same value to a switch. This breaks vectorization, which cannot handle switches.

Teach the tree-if-conv pass used by the vectorizer to handle simple switch statements, like those created by if-to-switch earlier. These are switches that have only a single non-default block. They are handled similarly to COND in if conversion.

This makes the vect-bitfield-read-1-not test fail. The test checks for a bitfield analysis failing, but it actually relied on ifcvt erroring out early because the test uses a switch. If conversion still does not work because the switch is not in a form that this patch can handle, but it fails much later and the bitfield analysis succeeds, which makes the test fail. I marked it xfail because it doesn't seem to be testing what it wants to test.

PR tree-optimization/115866

gcc/ChangeLog: * tree-if-conv.cc (if_convertible_switch_p): New function. (if_convertible_stmt_p): Check for switch. (get_loop_body_in_if_conv_order): Handle switch. (predicate_bbs): Likewise. (predicate_statements): Likewise. (remove_conditions_and_labels): Likewise. (ifcvt_split_critical_edges): Likewise. (ifcvt_local_dce): Likewise.
gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-switch-ifcvt-1.c: New test. * gcc.dg/vect/vect-switch-ifcvt-2.c: New test. * gcc.dg/vect/vect-switch-search-line-fast.c: New test. * gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
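A source-level illustration of the pattern may help. The first function below has the shape the commit message describes: several equality tests on one value guarding a single block, which if-to-switch may turn into a switch. The second is a hand-written if-converted equivalent in which the control flow becomes a predicate. The real passes operate on GIMPLE, not on C source, and both function names are hypothetical.

```c
#include <stddef.h>

/* Chain of equality tests on one value guarding a single block.  */
size_t
count_delims_branchy (const unsigned char *s, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    if (s[i] == '\n' || s[i] == '\r' || s[i] == ',')
      hits++;
  return hits;
}

/* Hand-written if-converted form: the condition is computed as a 0/1
   predicate and added unconditionally, leaving a straight-line loop
   body.  */
size_t
count_delims_ifcvt (const unsigned char *s, size_t n)
{
  size_t hits = 0;
  for (size_t i = 0; i < n; i++)
    {
      int pred = (s[i] == '\n') | (s[i] == '\r') | (s[i] == ',');
      hits += (size_t) pred;
    }
  return hits;
}
```

Both functions return the same count for any input; the second form is the kind of branch-free loop body that if conversion aims to produce for the vectorizer.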
2024-08-25 | Write CodeView information about static locals in optimized code | Mark Harmstone | 1 | -0/+57
Write CodeView S_LDATA32 symbols for static locals in optimized code. We have to handle these separately, as they come after the S_FRAMEPROC, plus you can't have S_BLOCK32 symbols like you can in unoptimized code.
gcc/ * dwarf2codeview.cc (write_optimized_static_local_vars): New function. (write_function): Call write_optimized_static_local_vars.
2024-08-25 | Write CodeView S_FRAMEPROC symbols | Mark Harmstone | 1 | -2/+78
Write S_FRAMEPROC symbols, which aren't very useful but seem to be necessary for Microsoft debuggers to function properly. These symbols come after S_LOCAL symbols for optimized variables, but before S_REGISTER and S_REGREL32 for unoptimized variables.
gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_FRAMEPROC. (write_s_frameproc): New function. (write_function): Call write_s_frameproc.
2024-08-25 | Write CodeView information about optimized stack variables | Mark Harmstone | 1 | -9/+119
Output S_DEFRANGE_REGISTER_REL symbols for optimized local variables that are on the stack, consisting of the stack register, the offset, and the code range to which this applies.
gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_DEFRANGE_REGISTER_REL. (write_defrange_register_rel): New function. (write_optimized_local_variable_loc): Add fbloc param, and call write_defrange_register_rel. (write_optimized_local_variable): Add fbloc param. (write_optimized_function_vars): Add fbloc param.
2024-08-25 | Write CodeView information about enregistered optimized variables | Mark Harmstone | 4 | -39/+353
Enable variable tracking when outputting CodeView debug information, and make it so that we issue debug symbols for optimized variables in registers. This consists of S_LOCAL symbols, which give the name and the type of local variables, followed by S_DEFRANGE_REGISTER symbols for the register and the code range to which this applies.
gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_LOCAL and S_DEFRANGE_REGISTER. (write_s_local): New function. (write_defrange_register): New function. (write_optimized_local_variable_loc): New function. (write_optimized_local_variable): New function. (write_optimized_function_vars): New function. (write_function): Call write_optimized_function_vars if variable tracking enabled. * dwarf2out.cc (typedef var_loc_view): Move to dwarf2out.h. (struct dw_loc_list_struct): Likewise. * dwarf2out.h (typedef var_loc_view): Move from dwarf2out.cc. (struct dw_loc_list_struct): Likewise. * opts.cc (finish_options): Enable variable tracking for CodeView.
2024-08-25 | i386: Update STV's gains for TImode arithmetic right shifts on AVX2. | Roger Sayle | 1 | -8/+13
This patch tweaks timode_scalar_chain::compute_convert_gain to better reflect the expansion of V1TImode arithmetic right shifts by the i386 backend. The comment "see ix86_expand_v1ti_ashiftrt" appears after "case ASHIFTRT" in compute_convert_gain, and the changes below attempt to better match the logic used there.

The original motivating example is:

  __int128 m1;
  void foo()
  {
    m1 = (m1 << 8) >> 8;
  }

which with -O2 -mavx2 we fail to convert to vector form due to the inappropriate cost of the arithmetic right shift.

  Instruction gain -16 for 7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
  Total gain: -3
  Chain #1 conversion is not profitable

This is reporting that the ASHIFTRT is four instructions worse using vectors than in scalar form, which is incorrect as the AVX2 expansion of this shift only requires three instructions (and the scalar form requires two).

With more accurate costs in timode_scalar_chain::compute_convert_gain we now see (with -O2 -mavx2):

  Instruction gain -4 for 7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
  Total gain: 9
  Converting chain #1...

which results in:

  foo:
    vmovdqa m1(%rip), %xmm0
    vpslldq $1, %xmm0, %xmm0
    vpsrad $8, %xmm0, %xmm1
    vpsrldq $1, %xmm0, %xmm0
    vpblendd $7, %xmm0, %xmm1, %xmm0
    vmovdqa %xmm0, m1(%rip)
    ret

2024-08-25 Roger Sayle <roger@nextmovesoftware.com>
           Uros Bizjak <ubizjak@gmail.com>

gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain) <case ASHIFTRT>: Update to match ix86_expand_v1ti_ashiftrt.
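To see why the vector sequence in the listing computes the same value as the scalar shifts, the sketch below models it on four 32-bit lanes and compares it with a direct __int128 computation. It is an illustrative model, not code from the patch: the function names are hypothetical, and it relies on GCC-specific behaviour (the __int128 extension, arithmetic right shift of negative values, and modulo conversion to signed types).

```c
#include <stdint.h>

typedef __int128 i128;            /* GCC/Clang extension.  */
typedef unsigned __int128 u128;

/* Reference: m1 = (m1 << 8) >> 8.  The left shift is done on the
   unsigned type so the model avoids signed-overflow questions.  */
i128
shift_ref (i128 x)
{
  return (i128) ((u128) x << 8) >> 8;
}

/* Lane-by-lane model of the vector sequence in the listing.  */
i128
shift_lanes (i128 x)
{
  uint32_t d[4], t[4], r[4];
  u128 ux = (u128) x, out = 0;

  for (int i = 0; i < 4; i++)
    d[i] = (uint32_t) (ux >> (32 * i));

  /* vpslldq $1: shift the whole 128-bit value left by one byte.  */
  for (int i = 0; i < 4; i++)
    t[i] = (d[i] << 8) | (i > 0 ? d[i - 1] >> 24 : 0);

  /* vpsrldq $1: shift the whole value right by one byte, zero fill.  */
  for (int i = 0; i < 4; i++)
    r[i] = (t[i] >> 8) | (i < 3 ? t[i + 1] << 24 : 0);

  /* vpsrad $8 and vpblendd $7: only the top dword is taken from the
     per-lane arithmetic shift, which supplies the sign bits.  */
  r[3] = (uint32_t) ((int32_t) t[3] >> 8);

  for (int i = 0; i < 4; i++)
    out |= (u128) r[i] << (32 * i);
  return (i128) out;
}
```

The net effect of both functions is to sign-extend the low 120 bits of the input, so a value with bit 119 set comes back negative and a value whose only high bits lie in the top byte comes back with that byte cleared.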
2024-08-25 | Disable late-combine in another RISC-V test | Jeff Law | 1 | -1/+1
Another test whose output was slightly twiddled by late-combine; simply disabling late-combine seems to be the best option.

> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ...
> FAIL: gcc.target/riscv/cm_mv_rv32.c -Os check-function-bodies sum

Pushing to the trunk.

gcc/testsuite * gcc.target/riscv/cm_mv_rv32.c: Disable late-combine.
2024-08-25[committed] Fix assembly scan for RISC-V VLS testsJeff Law7-7/+7
Surya's IRA patch from June slightly improves the code we generate for the vls/calling-conventions tests on RISC-V. Specifically it removes an unnecessary move from the instruction stream. This (of course) broke those tests: > Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/rvv/rvv.exp ... > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 > FAIL: gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c -O3 -ftree-vectorize -mrvv-vector-bits=scalable scan-assembler-times mv\\s+s0,a0\\s+call\\s+memset\\s+mv\\s+a0,s0 3 This patch does the natural adjustment of those tests by dropping the moves from the scan. gcc/testsuite * gcc.target/riscv/rvv/autovec/vls/calling-convention-1.c: Update expected output. * gcc.target/riscv/rvv/autovec/vls/calling-convention-2.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-3.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-4.c: Likewise. 
* gcc.target/riscv/rvv/autovec/vls/calling-convention-5.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-6.c: Likewise. * gcc.target/riscv/rvv/autovec/vls/calling-convention-7.c: Likewise.
2024-08-25Turn off late-combine for a few risc-v specific testsJeff Law4-4/+4
Just minor testsuite adjustments -- several of the shorten-memref tests are slightly twiddled by the late-combine pass: > Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ... > FAIL: gcc.target/riscv/shorten-memrefs-2.c -Os scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi > XPASS: gcc.target/riscv/shorten-memrefs-3.c -Os scan-assembler-not load2a:\n.*addi[ \t]*[at][0-9],[at][0-9],[0-9]* > FAIL: gcc.target/riscv/shorten-memrefs-5.c -Os scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi > FAIL: gcc.target/riscv/shorten-memrefs-8.c -Os scan-assembler store:\n(\t?\\.[^\n]*\n)*\taddi\ta[0-7],a[0-7],1 This patch just turns off the late-combine pass for those tests. Locally I'd adjusted all the shorten-memref patches, but a quick re-test shows that only 4 tests seem affected right now. Anyway, pushing to the trunk to slightly clean up our test results. gcc/testsuite * gcc.target/riscv/shorten-memrefs-2.c: Turn off late-combine. * gcc.target/riscv/shorten-memrefs-3.c: Likewise. * gcc.target/riscv/shorten-memrefs-5.c: Likewise. * gcc.target/riscv/shorten-memrefs-8.c: Likewise.
2024-08-25modula2 testsuite: new libc unit testGaius Mulley2-0/+75
This patch provides a simple unit test for snprintf and atof against the libc definition module. gcc/testsuite/ChangeLog: * gm2/calling-c/libc/run/pass/calling-c-libc-run-pass.exp: New test. * gm2/calling-c/libc/run/pass/testlibcstr.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-08-25Daily bump.GCC Administrator4-1/+178
2024-08-24modula2: Export all string to integral and fp number conversion functionsGaius Mulley1-0/+84
Export all string to integral and floating point number conversion functions (atof, atoi, atol, atoll, strtod, strtof, strtold, strtol, strtoll, strtoul and strtoull). gcc/m2/ChangeLog: * gm2-libs/libc.def (atof): Export unqualified. (atoi): Ditto. (atol): Ditto. (atoll): Ditto. (strtod): Ditto. (strtof): Ditto. (strtold): Ditto. (strtol): Ditto. (strtoll): Ditto. (strtoul): Ditto. (strtoull): Ditto. Signed-off-by: Wilken Gottwalt <wilken.gottwalt@posteo.net>
2024-08-24c++, coroutines: Look through initial_await target exprs [PR110635].Iain Sandoe2-1/+79
In the case that the initial awaiter returns an object, the initial await can be a target expression and we need to look at its initializer to cast the await_resume() to void and to wrap in a compound expression that sets the initial_await_resume_called flag. PR c++/110635 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::wrap_original_function_body): Look through initial await target expressions to find the actual co_await_expr that we need to update. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr110635.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Rework handling of throwing_cleanups [PR102051].Iain Sandoe2-11/+21
In the fix for PR95822 (r11-7402) we set throwing_cleanup false in the top level of the coroutine transform code. However, as the current PR shows, that is not sufficient. Any use of cxx_maybe_build_cleanup() can reset the flag, which causes the check_return_expr () logic to try to add a guard variable and set it. For the coroutine code, we need to handle the cleanups separately, since the responsibility for them changes after the first resume point, which we handle in the ramp exception processing. Fix this by forcing the "throwing_cleanup" flag false right before the processing of the return expression. PR c++/102051 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::build_ramp_function): Handle "throwing_cleanup" here instead of ... (cp_coroutine_transform::apply_transforms): ... here. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr102051.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Fix ordering of return object conversions [PR115908].Iain Sandoe2-107/+129
[dcl.fct.def.coroutine]/7 says: The expression promise.get_return_object() is used to initialize the returned reference or prvalue result object of a call to a coroutine. The call to get_return_object is sequenced before the call to initial_suspend and is invoked at most once. The issue is about when any conversions are carried out if the type of the g_r_o call is not the same as the ramp return. Currently, we have been doing this by materialising the g_r_o return value and passing that to finish_return_expr() which handles the necessary conversions and checks. As the PR shows, this does not work as expected. In the revised version we carry out the work of the conversions when initialising the return slot (with the same facilities that are used by finish_return_expr()). We do this before the call that initiates the coroutine body, satisfying the requirements for one call before initial suspend. The return expression becomes a trivial 'return <retval>'. This simplifies the ramp logic considerably, since we no longer need to keep track of the temporarily-materialised g_r_o value. PR c++/115908 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::build_ramp_function): Rework the return value initialisation to initialise the return slot always from get_return_object, even if that implies carrying out conversions to do so. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr115908.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Allow convertible get_return_on_allocation_fail [PR109682].Iain Sandoe2-13/+34
We have been requiring the get_return_on_allocation_fail() call to have the same type as the ramp. This is not intended by the standard, so relax that to allow anything convertible to the ramp return. PR c++/109682 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::build_ramp_function): Allow for cases where get_return_on_allocation_fail has a type convertible to the ramp return type. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr109682.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].Iain Sandoe7-49/+48
Require that the value returned by get_return_object is convertible to the ramp return. This means that the only time we allow a void get_return_object, is when the ramp is also a void function. We diagnose this early to allow us to exit the ramp build if the return values are incompatible. PR c++/100476 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::build_ramp_function): Remove special handling of void get_return_object expressions. gcc/testsuite/ChangeLog: * g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C: Adjust expected diagnostic. * g++.dg/coroutines/pr102489.C: Avoid void get_return_object. * g++.dg/coroutines/pr103868.C: Likewise. * g++.dg/coroutines/pr94879-folly-1.C: Likewise. * g++.dg/coroutines/pr94883-folly-2.C: Likewise. * g++.dg/coroutines/pr96749-2.C: Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Fix handling of early exceptions [PR113773].Iain Sandoe2-13/+92
The responsibility for destroying part of the frame content (promise, arg copies and the frame itself) transitions from the ramp to the body of the coroutine once we reach the await_resume () for the initial suspend. We added the variable that flags the transition, but failed to act on it. This corrects that so that the ramp only tries to run DTORs for objects when an exception occurs before the initial suspend await resume has started. PR c++/113773 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::build_ramp_function): Only cleanup the frame state on exceptions that occur before the initial await resume has begun. gcc/testsuite/ChangeLog: * g++.dg/coroutines/torture/pr113773.C: New test. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Separate allocator work from the ramp body build.Iain Sandoe9-235/+280
This splits out the building of the allocation and deallocation expressions and runs them early in the ramp build, so that we can exit if they are not usable, before we start building the ramp body. Likewise move checks for other required resources to the beginning of the ramp builder. This is preparation for work needed to update the allocation/destruction in cases where we have excess alignment of the promise or other saved frame state. gcc/cp/ChangeLog: * call.cc (build_op_delete_call_1): Renamed and added a param to allow the caller to prioritize two argument usual deleters. (build_op_delete_call): New. (build_coroutine_op_delete_call): New. * coroutines.cc (coro_get_frame_dtor): Rename... (build_coroutine_frame_delete_expr):... to this; simplify to use build_op_delete_call for all cases. (build_actor_fn): Use revised frame delete function. (build_coroutine_frame_alloc_expr): New. (cp_coroutine_transform::complete_ramp_function): Rename... (cp_coroutine_transform::build_ramp_function): ... to this. Reorder code to carry out checks for prerequisites before the codegen. Split out the allocation/delete code. (cp_coroutine_transform::apply_transforms): Use revised name. * coroutines.h: Rename function. * cp-tree.h (build_coroutine_op_delete_call): New. gcc/testsuite/ChangeLog: * g++.dg/coroutines/coro-bad-alloc-01-bad-op-del.C: Use revised diagnostics. * g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C: Likewise. * g++.dg/coroutines/coro-bad-gro-01-void-gro-non-class-coro.C: Likewise. * g++.dg/coroutines/coro-bad-grooaf-00-static.C: Likewise. * g++.dg/coroutines/ramp-return-b.C: Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Separate the analysis, ramp and outlined function synthesis.Iain Sandoe4-414/+446
This change is preparation for fixes to the ramp and codegen to follow. The primary motivation is that we have three activities: analysis, ramp synthesis and outlined coroutine body synthesis. These are currently carried out in sequence in the 'morph_fn_to_coro' code, which means that we are nesting the synthesis of the outlined coroutine body inside the finish_function call for the original function (which becomes the ramp). The revised code splits the three interests so that the analysis can be used independently by the ramp and body synthesis. This avoids some issues seen with global state that start/finish function use and allows us to use more of the high-level APIs in fixing bugs. The resultant implementation is more self-contained, and has less impact on finish_function. gcc/cp/ChangeLog: * coroutines.cc (struct suspend_point_info, struct param_info, struct local_var_info, struct susp_frame_data, struct local_vars_frame_data): Move to coroutines.h. (build_actor_fn): Use start/finish function APIs. (build_destroy_fn): Likewise. (coro_build_actor_or_destroy_function): No longer mark the actor / destroyer as DECL_COROUTINE_P. (coro_rewrite_function_body): Use class members. (cp_coroutine_transform::wrap_original_function_body): Likewise. (build_ramp_function): Replace by... (cp_coroutine_transform::complete_ramp_function): ...this. (cp_coroutine_transform::cp_coroutine_transform): New. (cp_coroutine_transform::~cp_coroutine_transform): New. (morph_fn_to_coro): Replace by... (cp_coroutine_transform::apply_transforms): ...this. (cp_coroutine_transform::finish_transforms): New. * cp-tree.h (morph_fn_to_coro): Remove. * decl.cc (emit_coro_helper): Remove. (finish_function): Revise handling of coroutine transforms. * coroutines.h: New file. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> Co-authored-by: Arsen Arsenović <arsen@aarsen.me>
2024-08-24c++, coroutines: Split the ramp build into a separate function.Iain Sandoe1-183/+201
This is primarily preparation to partition the functionality of the coroutine transform into analysis, ramp generation and then (later) synthesis of the coroutine body. The patch does fix one latent issue in the ordering of DTORs for frame parameter copies (to ensure that they are processed in reverse order to the copy creation). gcc/cp/ChangeLog: * coroutines.cc (build_actor_fn): Arrange to apply any required parameter copy DTORs in reverse order to their creation. (coro_rewrite_function_body): Handle revised param uses. (morph_fn_to_coro): Split the ramp function completion into a separate function. (build_ramp_function): New. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++, coroutines: Tidy up awaiter variable checks.Iain Sandoe1-48/+11
When we build an await expression, we might need to materialise the awaiter if it is a prvalue. This re-implements this using core APIs instead of local code. gcc/cp/ChangeLog: * coroutines.cc (build_co_await): Simplify checks for the cases that we need to materialise an awaiter. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-24c++: Add testcase for (now fixed) regression [PR113746]Simon Martin1-0/+6
The case in PR113746 used to ICE until commit r15-123-gf04dc89a991ddc. This patch simply adds the case to the testsuite. PR c++/113746 gcc/testsuite/ChangeLog: * g++.dg/parse/crash76.C: New test.
2024-08-24testsuite: Add dg-require-effective-target scheduling for some tests that set -fschedule-insns.Georg-Johann Lay2-0/+2
gcc/testsuite/ * gcc.dg/torture/pr115929-2.c: Add dg-require-effective-target scheduling. * gcc.dg/torture/pr116343.c: Same.
2024-08-24Daily bump.GCC Administrator5-1/+367
2024-08-23RISC-V: Use encoded nelts when calling repeating_sequence_pPatrick O'Neill1-7/+3
repeating_sequence_p operates directly on the encoded pattern and does not derive elements using the .elt() accessor. Passing in the length of the unencoded vector can cause an out-of-bounds read of the encoded pattern. gcc/ChangeLog: * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Use encoded_nelts when calling repeating_sequence_p. (rvv_builder::is_repeating_sequence): Ditto. (rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2024-08-23ifcvt: Do not overwrite results in noce_convert_multiple_sets [PR116372, PR116405]Manolis Tsamis3-4/+48
Now that more operations are allowed for noce_convert_multiple_sets, it is possible that the same register appears multiple times as target in a basic block. After noce_convert_multiple_sets_1 is called we potentially also emit register moves from temporaries back to the original targets. In some cases where the target registers overlap with the block's condition, these register moves may overwrite intermediate variables because they're emitted after the if-converted code. To address this issue we now iterate backwards and keep track of seen registers when emitting these final register moves. PR rtl-optimization/116372 PR rtl-optimization/116405 gcc/ChangeLog: * ifcvt.cc (noce_convert_multiple_sets): Iterate backwards and track target registers. gcc/testsuite/ChangeLog: * gcc.dg/pr116372.c: New test. * gcc.dg/pr116405.c: New test.
2024-08-23ifcvt: disallow call instructions in noce_convert_multiple_sets [PR116358]Manolis Tsamis2-1/+16
Similar to not allowing jump instructions in the generated code, we also shouldn't allow call instructions in noce_convert_multiple_sets. In the case of PR116358 a libcall was generated from force_operand. PR middle-end/116358 gcc/ChangeLog: * ifcvt.cc (noce_convert_multiple_sets): Disallow call insns. gcc/testsuite/ChangeLog: * gcc.target/aarch64/pr116358.c: New test.
2024-08-23rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]Peter Bergner3-4/+48
Our power8 swap optimization pass has some special handling for optimizing swaps of TImode variables. The test case reported in bugzilla uses a call to __atomic_compare_exchange, which introduces a variable of PTImode and that does not get the same treatment as TImode leading to wrong code generation. The simple fix is to treat PTImode identically to TImode. 2024-08-23 Peter Bergner <bergner@linux.ibm.com> gcc/ PR target/116415 * config/rs6000/rs6000.h (TI_OR_PTI_MODE): New define. * config/rs6000/rs6000-p8swap.cc (rs6000_analyze_swaps): Use it to handle PTImode identically to TImode. gcc/testsuite/ PR target/116415 * gcc.target/powerpc/pr116415.c: New test.
2024-08-23tree-optimization/116463 - complex lowering leaves around dead stmtsRichard Biener1-0/+9
Complex lowering generally replaces existing complex defs with COMPLEX_EXPRs but those might be dead when it can always refer to components from the lattice. This in turn can pessimize followup transforms like forwprop and reassoc. The following makes sure to get rid of the dead COMPLEX_EXPRs that were generated, using simple_dce_from_worklist. PR tree-optimization/116463 * tree-complex.cc: Include tree-ssa-dce.h. (dce_worklist): New global. (update_complex_assignment): Add SSA def to the DCE worklist. (tree_lower_complex): Perform DCE.
2024-08-23Revert "Fortran: Fix class transformational intrinsic calls [PR102689]"Paul Thomas4-475/+35
This reverts commit 4cb07a38233aadb4b389a6e5236c95f52241b6e0.
2024-08-23Match: Support form 4 for unsigned integer .SAT_TRUNCPan Li1-0/+18
This patch would like to support the form 4 of the unsigned integer .SAT_TRUNC. Aka below example:

Form 4:
  #define DEF_SAT_U_TRUC_FMT_4(NT, WT)             \
  NT __attribute__((noinline))                     \
  sat_u_truc_##WT##_to_##NT##_fmt_4 (WT x)         \
  {                                                \
    bool not_overflow = x <= (WT)(NT)(-1);         \
    return ((NT)x) | (NT)((NT)not_overflow - 1);   \
  }

DEF_SAT_U_TRUC_FMT_4(uint32_t, uint64_t)

Before this patch:
  __attribute__((noinline))
  uint8_t sat_u_truc_uint32_t_to_uint8_t_fmt_4 (uint32_t x)
  {
    _Bool not_overflow;
    unsigned char _1;
    unsigned char _2;
    unsigned char _3;
    uint8_t _6;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    not_overflow_5 = x_4(D) <= 255;
    _1 = (unsigned char) x_4(D);
    _2 = (unsigned char) not_overflow_5;
    _3 = _2 + 255;
    _6 = _1 | _3;
    return _6;
  ;;    succ:       EXIT

  }

After this patch:
  __attribute__((noinline))
  uint8_t sat_u_truc_uint32_t_to_uint8_t_fmt_4 (uint32_t x)
  {
    uint8_t _6;

  ;;   basic block 2, loop depth 0
  ;;    pred:       ENTRY
    _6 = .SAT_TRUNC (x_4(D)); [tail call]
    return _6;
  ;;    succ:       EXIT

  }

The below test suites are passed for this patch.
* The rv64gcv fully regression test.
* The x86 bootstrap test.
* The x86 fully regression test.

gcc/ChangeLog:
	* match.pd: Add form 4 for unsigned .SAT_TRUNC matching.

Signed-off-by: Pan Li <pan2.li@intel.com>
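To see that form 4 really computes a saturating truncation, the macro from the commit message can be instantiated and compared against a plain clamp. The sketch below does that for two type pairs; the macro is copied from the message, while the reference function sat_trunc_ref_u32_to_u8 and the self-check are illustrative additions and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Form 4 as quoted in the commit message.  */
#define DEF_SAT_U_TRUC_FMT_4(NT, WT)             \
NT __attribute__((noinline))                     \
sat_u_truc_##WT##_to_##NT##_fmt_4 (WT x)         \
{                                                \
  bool not_overflow = x <= (WT)(NT)(-1);         \
  return ((NT)x) | (NT)((NT)not_overflow - 1);   \
}

DEF_SAT_U_TRUC_FMT_4(uint8_t, uint32_t)
DEF_SAT_U_TRUC_FMT_4(uint32_t, uint64_t)

/* Hypothetical reference: clamp to the narrow type's maximum.  */
uint8_t sat_trunc_ref_u32_to_u8(uint32_t x)
{
    return x > UINT8_MAX ? UINT8_MAX : (uint8_t)x;
}

/* Usage check: the two agree on both sides of the saturation point.  */
void sat_trunc_self_check(void)
{
    assert(sat_u_truc_uint32_t_to_uint8_t_fmt_4(255) == sat_trunc_ref_u32_to_u8(255));
    assert(sat_u_truc_uint32_t_to_uint8_t_fmt_4(256) == sat_trunc_ref_u32_to_u8(256));
}
```

When x fits, not_overflow is 1 and the mask (NT)(1 - 1) is zero, so the truncated value passes through. Otherwise the mask is (NT)(0 - 1), all ones, and the OR forces the result to the maximum of the narrow type.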
2024-08-23optabs-query: Use opt_machine_mode for smallest_int_mode_for_size [PR115495].Robin Dapp22-42/+62
In get_best_extraction_insn we use smallest_int_mode_for_size with struct_bits as size argument. PR115495 has struct_bits = 256 and we don't have a mode for that. This patch makes smallest_mode_for_size and smallest_int_mode_for_size return opt modes so we can just skip over the loop when there is no mode. PR middle-end/115495 gcc/ChangeLog: * cfgexpand.cc (expand_debug_expr): Require mode. * combine.cc (make_extraction): Ditto. * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Ditto. (aarch64_expand_setmem): Ditto. * config/arc/arc.cc (arc_expand_cpymem): Ditto. * config/arm/arm.cc (arm_expand_divmod_libfunc): Ditto. * config/i386/i386.cc (ix86_get_mask_mode): Ditto. * config/rs6000/predicates.md: Ditto. * config/rs6000/rs6000.cc (vspltis_constant): Ditto. * config/s390/s390.cc (s390_expand_insv): Ditto. * config/sparc/sparc.cc (assign_int_registers): Ditto. * coverage.cc (get_gcov_type): Ditto. (get_gcov_unsigned_t): Ditto. * dse.cc (find_shift_sequence): Ditto. * expmed.cc (store_integral_bit_field): Ditto. * expr.cc (convert_mode_scalar): Ditto. (op_by_pieces_d::smallest_fixed_size_mode_for_size): Ditto. (emit_block_move_via_oriented_loop): Ditto. (copy_blkmode_to_reg): Ditto. (store_field): Ditto. * internal-fn.cc (expand_arith_overflow): Ditto. * machmode.h (HAVE_MACHINE_MODES): Ditto. (smallest_mode_for_size): Use opt_machine_mode. (smallest_int_mode_for_size): Use opt_scalar_int_mode. * optabs-query.cc (get_best_extraction_insn): Require mode. * optabs.cc (expand_twoval_binop_libfunc): Ditto. * stor-layout.cc (smallest_mode_for_size): Return opt_machine_mode. (layout_type): Require mode. (initialize_sizetypes): Ditto. * tree-ssa-loop-manip.cc (canonicalize_loop_ivs): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr115495.c: New test. gcc/ada/ChangeLog: * gcc-interface/utils2.cc (fast_modulo_reduction): Require mode. (nonbinary_modular_operation): Ditto.
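The idea of returning an optional mode can be sketched in plain C with a success flag and an out-parameter. Everything in the block below is hypothetical: the width table, the function names, and the loop are stand-ins chosen for illustration, not GCC's mode tables or the opt_machine_mode API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical integer mode widths in bits; not GCC's real mode list.  */
static const unsigned mode_bits[] = { 8, 16, 32, 64, 128 };
#define N_MODES (sizeof mode_bits / sizeof mode_bits[0])

/* Returns true and stores the smallest width >= size, or returns false
   when no width is large enough (the "no mode" case).  */
bool smallest_int_width_for_size(unsigned size, unsigned *out)
{
    for (size_t i = 0; i < N_MODES; i++)
        if (mode_bits[i] >= size) {
            *out = mode_bits[i];
            return true;
        }
    return false;
}

/* Counts candidate widths starting at the smallest fitting one.
   When nothing fits, the loop is skipped instead of asserting.  */
unsigned count_candidate_widths(unsigned struct_bits)
{
    unsigned first;
    if (!smallest_int_width_for_size(struct_bits, &first))
        return 0;
    unsigned n = 0;
    for (size_t i = 0; i < N_MODES; i++)
        if (mode_bits[i] >= first)
            n++;
    return n;
}

/* Usage check mirroring the PR: a 256-bit request finds no width.  */
void opt_width_self_check(void)
{
    assert(count_candidate_widths(256) == 0);
}
```

Callers that know a mode must exist can still insist on one, which is roughly what the "Require mode" entries in the ChangeLog describe.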
2024-08-23RISC-V: Expand vec abs without masking.Robin Dapp12-41/+47
Standard abs synthesis during expand is max (a, -a). This expansion has the advantage of avoiding masking and is thus potentially faster than the a < 0 ? -a : a synthesis. gcc/ChangeLog: * config/riscv/autovec.md (abs<mode>2): Expand via max (a, -a). gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Adjust test expectation. * gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/vls/abs-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto.
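The two synthesis strategies mentioned above can be compared with a scalar C sketch. The function names are hypothetical, and the loops only model the per-element arithmetic; they do not show the RVV instructions or the autovec.md expander. Negation is done with unsigned wraparound so that INT32_MIN is handled without signed overflow (the conversion back to int32_t relies on GCC's wrapping behaviour).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping negation: INT32_MIN maps to itself.  */
static int32_t wrap_neg(int32_t a)
{
    return (int32_t)(0u - (uint32_t)a);
}

/* abs as max (a, -a): a negate and a max, no per-element select.  */
void abs_via_max(const int32_t *a, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t neg = wrap_neg(a[i]);
        out[i] = a[i] > neg ? a[i] : neg;
    }
}

/* abs as a < 0 ? -a : a: a compare producing a mask, then a masked negate.  */
void abs_via_select(const int32_t *a, int32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] < 0 ? wrap_neg(a[i]) : a[i];
}

/* Usage check: both forms agree, including at INT32_MIN.  */
void abs_self_check(void)
{
    const int32_t in[3] = { -5, 5, INT32_MIN };
    int32_t x[3], y[3];
    abs_via_max(in, x, 3);
    abs_via_select(in, y, 3);
    for (size_t i = 0; i < 3; i++)
        assert(x[i] == y[i]);
}
```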
2024-08-23Fix test failure on powerpc targetsBernd Edlinger1-3/+3
Apparently, due to slightly different optimization levels, both subroutines do not always have multiple subranges, but having at least one such subroutine, and no lexical blocks, is sufficient to prove that the fix worked. So reduce the test expectations to require only at least one inlined subroutine with multiple subranges. gcc/testsuite/ChangeLog: PR other/116462 * gcc.dg/debug/dwarf2/inline7.c: Reduce test expectations.
2024-08-23ada: Fix crash on aliased variable with packed array type and -g switchEric Botcazou1-10/+11
This comes from a loophole in gnat_get_array_descr_info for record types containing a template, which represent an aliased array, when this array type is bit-packed and implemented as a modular integer. gcc/ada/ * gcc-interface/misc.cc (gnat_get_array_descr_info): Test the BIT_PACKED_ARRAY_TYPE_P flag only once on the final debug type. In the case of records containing a template, replay the entire processing for the array type contained therein.