aboutsummaryrefslogtreecommitdiff
path: root/gcc/config
AgeCommit message (Collapse)AuthorFilesLines
5 hoursaarch64: Implement 16-byte vector mode const0 store by TImodeHaochen Gui1-1/+10
gcc/ * config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD): Expand 16-byte vector mode const0 store by TImode.
6 hoursAVX10.2 ymm rounding: Support vsqrtp{s,d,h} and vsubp{s,d,h} intrinsHu, Lin12-0/+345
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vscalefp{s,d,h} intrinsHu, Lin13-1/+186
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def: Add new builtins. * config/i386/sse.md: (<avx512>_scalef<mode><mask_name><round_name>): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vreducep{s,d,h} and vrndscalep{s,d,h} intrinsHu, Lin13-2/+375
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (<mask_codefor>reducep<mode><mask_name><round_saeonly_name>): Add condition check. (<avx512>_rndscale<mode><mask_name><round_saeonly_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vmulp{s,d,h} and vrangep{s,d} intrinsHu, Lin14-0/+322
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI_INT, V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI_INT. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support v{max,min}p{s,d,h} intrinsHu, Lin12-0/+366
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vgetexpp{s,d,h} and vgetmantp{s,d,h} intrinsHu, Lin15-2/+361
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V8SF_FTYPE_V8SF_V8SF_UQI_INT, V4DF_FTYPE_V4DF_V4DF_UQI_INT, V16HF_FTYPE_V16HF_V16HF_UHI_INT, V16HF_FTYPE_V16HF_INT_V16HF_UHI_INT, V4DF_FTYPE_V4DF_INT_V4DF_UQI_INT, V8SF_FTYPE_V8SF_INT_V8SF_UQI_INT. * config/i386/sse.md: (<avx512>_getexp<mode><mask_name><round_saeonly_name>): Add condition check. (<avx512>_getmant<mode><mask_name><round_saeonly_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vfnmsub{132,231,213}p{s,d,h} intrinsHu, Lin13-1/+191
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (<avx512>_fnmsub_<mode>_mask3<round_name>): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vfmulcph and vfnmadd{132,231,213}p{s,d,h} intrinsHu, Lin12-0/+252
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vfm{sub,subadd}{132,231,213}p{s,d,h} intrinsHu, Lin13-1/+369
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (<avx512>_fmsub_<mode>_mask<round_name>): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vfmaddcph and vfmaddsub{132,231,213}p{s,d,h} ↵Hu, Lin13-2/+253
intrins gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (<avx512>_fmaddsub_<mode>_mask<round_name>): Add condition check. (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vfmadd{132,231,213}p{s,d,h} intrinsHu, Lin13-1/+186
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md: (<avx512>_fmadd_<mode>_mask3<round_name>): Add condition check. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: New test.
6 hoursAVX10.2 ymm rounding: Support vfc{madd,mul}cph, vfixupimmp{s,d} intrinsHu, Lin15-2/+269
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V16HF_FTYPE_V16HF_V16HF_INT, V16HF_FTYPE_V16HF_V16HF_V16HF_INT, V16HF_FTYPE_V16HF_V16HF_V16HF_UQI_INT, V4DF_FTYPE_V4DF_V4DF_V4DI_INT_UQI_INT, V8SF_FTYPE_V8SF_V8SF_V8SI_INT_UQI_INT. * config/i386/sse.md: (<avx512>_fixupimm<mode><sd_maskz_name><round_saeonly_name>): Add condition check. (<avx512>_fixupimm<mode>_mask<round_saeonly_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: New test.
6 hoursAVX10.2 ymm rounding: Support vcvt{,u}w2ph and vdivp{s,d,h} intrinsHu, Lin14-0/+293
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V16HF_FTYPE_V16HI_V16HF_UHI_INT. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-3.c: New test.
6 hoursAVX10.2 ymm rounding: Support vcvttps2{,u}{dq,qq} and vcvtu{dq,qq}2p{s,d,h} ↵Hu, Lin13-13/+515
intrins gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (unspec_fix_truncv8sfv8si2<mask_name>): Extend rounding control. (<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>): Ditto. (<mask_codefor>floatuns<sseintvecmodelower><mode>2<mask_name><round_name>): Add condition check. (fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name><round_saeonly_name>): Remove round_saeonly_name. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-2.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vcvttph2{,u}{dq,qq,w} intrinsHu, Lin14-5/+347
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/sse.md (avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>): Extend round control for 256bit. (unspec_avx512fp16_fix<vcvtt_uns_suffix>_trunc<mode>2<mask_name>): Ditto. (avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>): Add condition check. * config/i386/subst.md (round_saeonly_mode_condition): Add V16HI check for 256bit. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-2.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vcvtqq2p{s,d,h} and vcvttpd2{,u}{dq,qq} intrinsHu, Lin16-14/+434
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V4DF_FTYPE_V4DI_V4DF_UQI_INT, V4SF_FTYPE_V4DI_V4SF_UQI_INT, V8HF_FTYPE_V4DI_V8HF_UQI_INT. * config/i386/sse.md: (avx512fp16_vcvt<floatsuffix>qq2ph_v4di_mask_round): New expand. (*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Extend round control and add "_1" suffix. (float<floatunssuffix><sseintvecmodelower><mode>2<mask_name><round_name>): Add condition check. (float<floatunssuffix><sselongvecmodelower><mode>2<mask_name><round_name>): Ditto. (float<floatunssuffix><mode><ssePSmode2lower>2<mask_name><round_name>): Limit suffix output. (unspec_fix_truncv4dfv4si2<mask_name>): Extend round control. (unspec_fixuns_truncv4dfv4si2<mask_name>): Ditto. * config/i386/subst.md (round_qq2pssuff): New iterator. (round_saeonly_suff): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-2.c: New test.
6 hoursAVX10.2 ymm rounding: Support vcvtps2{,u}{dq,qq} intrinsHu, Lin16-5/+240
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V8SI_FTYPE_V8SF_V8SI_UQI_INT, V4DI_FTYPE_V4SF_V4DI_UQI_INT. * config/i386/sse.md (<sse2_avx_avx512f>_fix_notrunc<sf2simodelower><mode><mask_name>): Extend to round. (<mask_codefor><avx512>_fixuns_notrunc<sf2simodelower><mode><mask_name><round_name>): Add round condition check. * config/i386/subst.md (round_constraint4): New. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-1.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vcvtph2{,u}w and vcvtps2p{d,hx} intrinsHu, Lin16-1/+232
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V16HI_FTYPE_V16HF_V16HI_UHI_INT, V4DF_FTYPE_V4SF_V4DF_UQI_INT V8HF_FTYPE_V8SF_V8HF_UQI_INT. * config/i386/sse.md (avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>): Add round condition check. * config/i386/subst.md (round_mode_condition): Add V16HI check for 256bit. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-1.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vcvtph2p{s,d,sx} and vcvtph2{,u}{dq,qq} intrinsHu, Lin16-9/+410
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: New intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V8SF_FTYPE_V8HF_V8SF_UQI_INT, V8SI_FTYPE_V8HF_V8SI_UQI_INT, V4DF_FTYPE_V8HF_V4DF_UQI_INT, V4DI_FTYPE_V8HF_V4DI_UQI_INT. * config/i386/sse.md: (avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>): Add condition check. (avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode> <mask_name><round_name>): Ditto. (avx512fp16_float_extend_ph<mode>2<mask_name>): Extend round saeonly. (vcvtph2ps256<mask_name>): Ditto. * config/i386/subst.md (round_saeonly_applied): New condition. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Add new macro test. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/avx10_2-rounding-1.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vcvtpd2{,u}{dq,qq} intrinsHu, Lin16-6/+234
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: Add new intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V4DI_FTYPE_V4DF_V4DI_UQI_INT, V4SI_FTYPE_V4DF_V4SI_UQI_INT. * config/i386/sse.md: (avx_cvtpd2dq256<mask_name>): Change name to avx_cvtpd2dq256<mask_name><round_name> and extend pattern to generate 256bit insns. (fixuns_notrunc<mode><si2dfmodelower>2<mask_name><round_name>): Add round_mode_condition. * config/i386/subst.md (round_pd2udqsuff): New iterator. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add new macro test. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/avx10_2-rounding-1.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vcvtdq2p{s,h} and vcvtpd2p{s,h} intrinsHu, Lin16-9/+249
gcc/ChangeLog: * config/i386/avx10_2roundingintrin.h: Add new intrins. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V8SF_FTYPE_V8SI_V8SF_UQI_INT, V4SF_FTYPE_V4DF_V4SF_UQI_INT, V8HF_FTYPE_V8SI_V8HF_UQI_INT, V8HF_FTYPE_V4DF_V8HF_UQI_INT. * config/i386/sse.md: (avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>): Add condition check. (avx512fp16_vcvtpd2ph_v4df_mask_round): New expand. (*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Change name to avx512fp16_vcvt<castmode>2ph_<mode>_mask<round_name>_1 and extend pattern to generate 256bit insns. (avx_cvtpd2ps256<mask_name>): Change name to avx_cvtpd2ps256<mask_name><round_name> and extend pattern to generate 256bit insns. * config/i386/subst.md (round_applied): New condition. (round_suff): New iterator. (round_mode_condition): Add V32HI check for 512bit. (round_saeonly_mode_condition): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add new builtin test. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add new macro test. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/avx10_2-rounding-1.c: Add test.
6 hoursAVX10.2 ymm rounding: Support vadd{s,d,h} and vcmp{s,d,h} intrinsHu, Lin17-60/+433
gcc/ChangeLog: * config.gcc: Add avx10_2roundingintrin.h. * config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE. * config/i386/i386-builtin.def (BDESC): Add new builtins. * config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT, V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT, V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT, UQI_FTYPE_V4DF_V4DF_INT_UQI_INT, UHI_FTYPE_V16HF_V16HF_INT_UHI_INT, UQI_FTYPE_V8SF_V8SF_INT_UQI_INT. * config/i386/immintrin.h: Include avx10_2roundingintrin.h. * config/i386/sse.md: Change subst_attr name due to renaming. * config/i386/subst.md: (<round_mode512bit_condition>): Add condition check for avx10.2 rounding control 256bit intrins and renamed to ... (<round_mode_condition>): ...this. (round_saeonly_mode512bit_condition): Add condition check for avx10.2 rounding control 256 bit intris and renamed to ... (round_saeonly_mode_condition): ...this. * config/i386/avx10_2roundingintrin.h: New file. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add -mavx10.2 and new builtin test. * gcc.target/i386/avx-2.c: Ditto. * gcc.target/i386/sse-13.c: Add new tests. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/avx10_2-rounding-1.c: New test.
16 hoursAVR: Tweak 16-bit addition with const that didn't get a LD_REGS register.Georg-Johann Lay1-2/+18
The 16-bit additions like addhi3 have two forms: One with a scratch:QI and one without, where the latter is required because reload cannot deal with a scratch when spill code pops a 16-bit addition. Passes like combine and fwprop1 may come up with the non-scratch version, which is sub-optimal in the case when the addition is performed in a NO_LD_REGS register because the operands will be spilled to LD_REGS. Having a scratch:QI at disposal can lead to better code with less spills. gcc/ * config/avr/avr.md (*add<mode>3_split) [!reload_completed]: Add a scratch:QI to 16-bit additions with constant.
17 hoursAVR: ad target/116407 - Fix linker error "relocation truncated to fit".Georg-Johann Lay1-1/+1
PR target/116407 gcc/ * config/avr/avr.md (*dec-and-branchhi!=-1.l.clobber): Increase the additional jump offset to 2 words.
19 hoursAVR: target/116407 - Fix linker error "relocation truncated to fit".Georg-Johann Lay3-9/+12
Some text peepholes output extra instructions prior to a branch instruction and that increase the jump offset of backward branches. PR target/116407 gcc/ * config/avr/avr-protos.h (avr_jump_mode): Add an int argument. * config/avr/avr.cc (avr_jump_mode): Add an int argument to increase the computed jump offset of backwards branches. * config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1): Increase the jump offset used by avr_jump_mode() as needed. gcc/testsuite/ * gcc.target/avr/torture/pr116407-2.c: New test. * gcc.target/avr/torture/pr116407-4.c: New test.
31 hoursRISC-V: Implement the quad and oct .SAT_TRUNC for scalarPan Li2-0/+40
This patch would like to implement the quad and oct .SAT_TRUNC pattern in the riscv backend. Aka: Form 1: #define DEF_SAT_U_TRUC_FMT_1(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \ { \ bool overflow = x > (WT)(NT)(-1); \ return ((NT)x) | (NT)-overflow; \ } DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t) Before this patch: 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ _Bool overflow; 8 │ short unsigned int _1; 9 │ short unsigned int _2; 10 │ short unsigned int _3; 11 │ uint16_t _6; 12 │ 13 │ ;; basic block 2, loop depth 0 14 │ ;; pred: ENTRY 15 │ overflow_5 = x_4(D) > 65535; 16 │ _1 = (short unsigned int) x_4(D); 17 │ _2 = (short unsigned int) overflow_5; 18 │ _3 = -_2; 19 │ _6 = _1 | _3; 20 │ return _6; 21 │ ;; succ: EXIT 22 │ 23 │ } After this patch: 3 │ 4 │ __attribute__((noinline)) 5 │ uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x) 6 │ { 7 │ uint16_t _6; 8 │ 9 │ ;; basic block 2, loop depth 0 10 │ ;; pred: ENTRY 11 │ _6 = .SAT_TRUNC (x_4(D)); [tail call] 12 │ return _6; 13 │ ;; succ: EXIT 14 │ 15 │ } The below tests suites are passed for this patch 1. The rv64gcv fully regression test. 2. The rv64gcv build with glibc gcc/ChangeLog: * config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for quad truncation. (ANYI_OCT_TRUNC): New iterator for oct truncation. (ANYI_QUAD_TRUNCATED): New attr for truncated quad modes. (ANYI_OCT_TRUNCATED): New attr for truncated oct modes. (anyi_quad_truncated): Ditto but for lower case. (anyi_oct_truncated): Ditto but for lower case. * config/riscv/riscv.md (ustrunc<mode><anyi_quad_truncated>2): Add new pattern for quad truncation. (ustrunc<mode><anyi_oct_truncated>2): Ditto but for oct. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust the expand dump check times. * gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto. * gcc.target/riscv/sat_arith_data.h: Add test helper macros. * gcc.target/riscv/sat_u_trunc-4.c: New test. * gcc.target/riscv/sat_u_trunc-5.c: New test. * gcc.target/riscv/sat_u_trunc-6.c: New test. * gcc.target/riscv/sat_u_trunc-run-4.c: New test. * gcc.target/riscv/sat_u_trunc-run-5.c: New test. * gcc.target/riscv/sat_u_trunc-run-6.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
31 hoursRISC-V: Make sure high bits of usadd operands is clean for non-Xmode [PR116278]Pan Li1-12/+22
For QI/HImode of .SAT_ADD, the operands may be sign-extended and the high bits of Xmode may be all 1 which is not expected. For example as below code. signed char b[1]; unsigned short c; signed char *d = b; int main() { b[0] = -40; c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9; __builtin_printf("%d\n", c); } After expanding we have: ;; _6 = .SAT_ADD (_3, 9); (insn 8 7 9 (set (reg:DI 143) (high:DI (symbol_ref:DI ("d") [flags 0x86] <var_decl d>))) (nil)) (insn 9 8 10 (set (reg/f:DI 142) (mem/f/c:DI (lo_sum:DI (reg:DI 143) (symbol_ref:DI ("d") [flags 0x86] <var_decl d>)) [1 d+0 S8 A64])) (nil)) (insn 10 9 11 (set (reg:HI 144 [ _3 ]) (sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) "test.c":7:10 -1 (nil)) The convert from signed char to unsigned short will have sign_extend rtl as above. And finally become the lb insn as below: lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8 lui a0,0x1a addi a5,a1,9 slli a5,a5,0x30 srli a5,a5,0x30 // a5 is 65505 sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE The sltu try to compare 65505 and 0xffffffffffffffd8 here, but we actually want to compare 65505 and 65496 (0xffd8). Thus we need to clean up the high bits to ensure this. The below test suites are passed for this patch: * The rv64gcv fully regression test. PR target/116278 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new func impl to zero extend rtx. (riscv_expand_usadd): Leverage above func to cleanup operands 0 and remove the special handing for SImode in RV64. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_add-11.c: Adjust asm check body. * gcc.target/riscv/sat_u_add-15.c: Ditto. * gcc.target/riscv/sat_u_add-19.c: Ditto. * gcc.target/riscv/sat_u_add-23.c: Ditto. * gcc.target/riscv/sat_u_add-3.c: Ditto. * gcc.target/riscv/sat_u_add-7.c: Ditto. * gcc.target/riscv/sat_u_add_imm-11.c: Ditto. * gcc.target/riscv/sat_u_add_imm-15.c: Ditto. * gcc.target/riscv/sat_u_add_imm-3.c: Ditto. * gcc.target/riscv/sat_u_add_imm-7.c: Ditto. * gcc.target/riscv/pr116278-run-1.c: New test. * gcc.target/riscv/pr116278-run-2.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
35 hourst-rtems: add rv32imf architecture to the RTEMS multilib for RISC-VKevin Kirspel1-2/+3
The attach patch is specific to the RTEMS RISC-V architecture multilib which is controlled by the t-rtems file in the gcc/config/riscv/ directory. The patch file was created from the gcc-13.3.0 branch. It was successfully tested within RTEMS Source Builder. gcc/ * config/riscv/t-rtems: Add ilp32f multilib.
40 hoursAdjust v850 rotate expander to allow more cases for V850E3V5Jeff Law1-1/+3
The recent if-conversion changes tripped a failure on the v850 port. The core underlying issue is that while the if-conversion code tries to do the right thing with noce_can_force_operand to determine if it can force an arbitrary operand into a register, it's not really a sufficient check. Essentially for arithmetic codes, it checks the operands. If the operands are force-able and there's a code_to_optab mapping, then it returns true. code_to_optab doesn't actually check anything other than the existence of a mapping in the target. If the target pattern has restrictions enforced by the condition or it's an expander that is allowed to FAIL, then noce_can_force_operand to be true, even though we may not be able to directly force the operand into a register. This came up on the v850 when we had an operand that was a rotate by a constant number of bits (I don't remember the count, all that's important about it was the count was not 8 or 16). The v850 port has this define_expand: > (define_expand "rotlsi3" > [(parallel [(set (match_operand:SI 0 "register_operand" "") > (rotate:SI (match_operand:SI 1 "register_operand" "") > (match_operand:SI 2 "const_int_operand" ""))) > (clobber (reg:CC CC_REGNUM))])] > "(TARGET_V850E_UP)" > { > if (INTVAL (operands[2]) != 16) > FAIL; > }) So the only rotate count allowed is 16 (there's a similar HI rotate with a count of 8). AFAICT the rotate patterns are allowed to FAIL. So naturally the expander fails and we get a testsuite regression: > Tests that now fail, but worked before (4 tests): > > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) This patch works around the problem by allowing the rotates in additional cases, particularly for the V850E3V5+ variants which have a general rotate capability. But let's be clear, this is just a workaround and I expect we're going to have to revisit the code to test if an operand can be forced into a register. gcc/ * config/v850/v850.md (rotlsi3): Allow more cases for V850E3V5+.
40 hoursRISC-V: Fix ICE for vector single-width integer multiply-add intrinsicsJin Ma1-40/+40
When rs1 is the immediate 0, the following ICE occurs: error: unrecognizable insn: (insn 8 5 12 2 (set (reg:RVVM1DI 134 [ <retval> ]) (if_then_else:RVVM1DI (unspec:RVVMF64BI [ (const_vector:RVVMF64BI repeat [ (const_int 1 [0x1]) ]) (reg/v:DI 137 [ vl ]) (const_int 2 [0x2]) repeated x2 (const_int 0 [0]) (reg:SI 66 vl) (reg:SI 67 vtype) ] UNSPEC_VPREDICATE) (plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 [0])) (reg/v:RVVM1DI 136 [ vs2 ])) (reg/v:RVVM1DI 135 [ vd ])) (reg/v:RVVM1DI 135 [ vd ]))) gcc/ChangeLog: * config/riscv/vector.md: Allow scalar operand to be 0. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-7.c: New test. * gcc.target/riscv/rvv/base/bug-8.c: New test.
40 hours[RISC-V][PR target/116282] Stabilize pattern conditionsJeff Law4-31/+55
So as expected the core problem with target/116282 is that the cost of certain constant synthesis cases varied depending on whether or not we're allowed to generate new pseudos or not. That in turn meant that in obscure cases an insn might change from recognizable to unrecognizable and triggers the observed failure. So we need to keep the cost stable, at least when called from a pattern's condition. So we pass another boolean down when necessary. I've tried to keep API fallout minimized. Built and tested on rv32 in my tester. Let's see what pre-commit testing has to say though 🙂 Note this will also require a minor change to the in-flight constant synthesis work. PR target/116282 gcc/ * config/riscv/riscv-protos.h (riscv_const_insns): Add new argument. * config/riscv/riscv.cc (riscv_build_integer): Add new argument ALLOW_NEW_PSEUDOS. Pass it down to recursive calls and check it before using synthesis which allows new registers to be created. (riscv_split_integer_cost): Pass new argument to riscv_build_integer. (riscv_integer_cost): Add ALLOW_NEW_PSEUDOS argument, pass it down to riscv_build_integer. (riscv_legitimate_constant_p): Pass new argument to riscv_const_insns. (riscv_const_insns): New argment ALLOW_NEW_PSEUDOS. Pass it down to riscv_integer_cost and riscv_const_insns. (riscv_split_const_insns): Pass new argument to riscv_const_insns. (riscv_move_integer, riscv_rtx_costs): Similarly. * config/riscv/riscv.md (shadd with costly constant): Pass new argument to riscv_const_insns. * config/riscv/bitmanip.md (and with costly constant): Pass new argument to riscv_const_insns. gcc/testsuite/ * gcc.target/riscv/pr116282.c: New test.
41 hoursRISC-V: Bugfix for RVV rounding intrinsic ICE in function checkerJin Ma3-3/+7
When compiling an interface for rounding of type 'vfloat16*' without using zvfh or zvfhmin, it is not enough to use FLOAT_MODE_P because the type does not support it. Although the subsequent riscv_validate_vector_type checks will still fail and throw exceptions, I don't think we should have ICE here. internal compiler error: in check, at config/riscv/riscv-vector-builtins-shapes.cc:444 10 | return __riscv_vfadd_vv_f16m1_rm (vs2, vs1, 0, vl); | ^~~~~~ 0x4191794 internal_error(char const*, ...) /iothome/jin.ma/code/master/gcc/gcc/diagnostic-global-context.cc:491 0x416ebf5 fancy_abort(char const*, int, char const*) /iothome/jin.ma/code/master/gcc/gcc/diagnostic.cc:1772 0x220aae6 riscv_vector::build_frm_base::check(riscv_vector::function_checker&) const /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins-shapes.cc:444 0x2205323 riscv_vector::function_checker::check() /iothome/jin.ma/code/master/gcc/gcc/config/riscv/riscv-vector-builtins.cc:4414 gcc/ChangeLog: * config/riscv/riscv-protos.h (riscv_vector_float_type_p): New. * config/riscv/riscv-vector-builtins.cc (function_instance::any_type_float_p): Use riscv_vector_float_type_p instead of FLOAT_MODE_P for judgment. * config/riscv/riscv.cc (riscv_vector_int_type_p): Change static to extern. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/bug-9.c: New test.
41 hoursRISC-V: Bugfix incorrect operand for vwsll auto-vectPan Li1-0/+4
This patch would like to fix one ICE when rv64gcv_zvbb for vwsll. Consider below example. void vwsll_vv_test (short *restrict dst, char *restrict a, int *restrict b, int n) { for (int i = 0; i < n; i++) dst[i] = a[i] << b[i]; } It will hit the vwsll pattern with following operands. operand 0 -> (reg:RVVMF2HI 146 [ vect__7.13 ]) operand 1 -> (reg:RVVMF4QI 165 [ vect_cst__33 ]) operand 2 -> (reg:RVVM1SI 171 [ vect_cst__36 ]) According to the ISA, operand 2 should be the same as operand 1. Aka operand 2 should have RVVMF4QI mode as above. Thus, add quad truncation for operand 2 before emit vwsll. The below test suites are passed for this patch. * The rv64gcv fully regression test. PR target/116280 gcc/ChangeLog: * config/riscv/autovec-opt.md: Add quad truncation to align the mode requirement for vwsll. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr116280-1.c: New test. * gcc.target/riscv/rvv/base/pr116280-2.c: New test.
41 hoursRISC-V: Add auto-vect pattern for vector rotate shiftFeng Wang1-0/+16
This patch add the vector rotate shift pattern for auto-vect. With this patch, the scalar rotate shift can be automatically vectorized into vector rotate shift. gcc/ChangeLog: * config/riscv/autovec.md (v<bitmanip_optab><mode>3): Add new define_expand pattern for vector rotate shift. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrolr-1.c: New test. * gcc.target/riscv/rvv/autovec/binop/vrolr-run.c: New test. * gcc.target/riscv/rvv/autovec/binop/vrolr-template.h: New test.
45 hoursAVR: target/116390 - Fix an avrtiny asm out template.Georg-Johann Lay1-15/+15
PR target/116390 gcc/ * config/avr/avr.cc (avr_out_movsi_mr_r_reg_disp_tiny): Fix output templates for the reg_base == reg_src and reg_src == reg_base - 2 cases. gcc/testsuite/ * gcc.target/avr/torture/pr116390.c: New test.
2 daysRISC-V: Fix factor in dwarf_poly_indeterminate_value [PR116305]曾治金1-2/+2
This patch is to fix the bug (BugId:116305) introduced by the commit bd93ef for risc-v target. The commit bd93ef changes the chunk_num from 1 to TARGET_MIN_VLEN/128 if TARGET_MIN_VLEN is larger than 128 in riscv_convert_vector_bits. So it changes the value of BYTES_PER_RISCV_VECTOR. For example, before merging the commit bd93ef and if TARGET_MIN_VLEN is 256, the value of BYTES_PER_RISCV_VECTOR should be [8, 8], but now [16, 16]. The value of riscv_bytes_per_vector_chunk and BYTES_PER_RISCV_VECTOR are no longer equal. Prologue will use BYTES_PER_RISCV_VECTOR.coeffs[1] to estimate the vlenb register value in riscv_legitimize_poly_move, and dwarf2cfi will also get the estimated vlenb register value in riscv_dwarf_poly_indeterminate_value to calculate the number of times to multiply the vlenb register value. So need to change the factor from riscv_bytes_per_vector_chunk to BYTES_PER_RISCV_VECTOR, otherwise we will get the incorrect dwarf information. The incorrect example as follow: ``` csrr    t0,vlenb slli    t1,t0,1 sub     sp,sp,t1 .cfi_escape 0xf,0xb,0x72,0,0x92,0xa2,0x38,0,0x34,0x1e,0x23,0x50,0x22 ``` The sequence '0x92,0xa2,0x38,0' means the vlenb register, '0x34' means the literal 4, '0x1e' means the multiply operation. But in fact, the vlenb register value just need to multiply the literal 2. PR target/116305 gcc/ChangeLog: * config/riscv/riscv.cc (riscv_dwarf_poly_indeterminate_value): Take BYTES_PER_RISCV_VECTOR for *factor instead of riscv_bytes_per_vector_chunk. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalable_vector_cfi.c: New test. Signed-off-by: Zhijin Zeng <zhijin.zeng@spacemit.com>
3 daysAVR: target/85624 - Use HImode for clrmemqi alignment.Georg-Johann Lay1-4/+2
gcc/ PR target/85624 * config/avr/avr.md (*clrmemqi*): Use HImode for alignment operand. (cherry picked from commit 507b4e147588c0fafe952b7226dd764ebeebb103)
3 daysi386: Fix some vex insns that prohibit egprLingling Kong1-17/+32
Although these vex insn have evex counterpart, but when it uses the displayed vex prefix should not support APX EGPR. Like TARGET_AVXVNNI, TARGET_IFMA and TARGET_AVXNECONVERT. TARGET_AVXVNNIINT8 and TARGET_AVXVNNITINT16 also are vex insn should not support egpr. gcc/ChangeLog: * config/i386/sse.md (vpmadd52<vpmadd52type><mode>): Prohibit egpr for vex version. (vpdpbusd_<mode>): Ditto. (vpdpbusds_<mode>): Ditto. (vpdpwssd_<mode>): Ditto. (vpdpwssds_<mode>): Ditto. (*vcvtneps2bf16_v4sf): Ditto. (*vcvtneps2bf16_v8sf): Ditto. (vpdp<vpdotprodtype>_<mode>): Ditto. (vbcstnebf162ps_<mode>): Ditto. (vbcstnesh2ps_<mode>): Ditto. (vcvtnee<bf16_ph>2ps_<mode>): Ditto. (vcvtneo<bf16_ph>2ps_<mode>): Ditto. (vpdp<vpdpwprodtype>_<mode>): Ditto.
3 daysaarch64: Improve popcount for bytes [PR113042]Andrew Pinski1-13/+24
For popcount for bytes, we don't need the reduction addition after the vector cnt instruction as we are only counting one byte's popcount. This changes the popcount extend to cover all ALLI rather than GPI. Changes since v1: * v2 - Use ALLI iterator and combine all into one pattern. Add new testcases popcnt[6-8].c. * v3 - Simplify TARGET_CSSC path. Use convert_to_mode instead of gen_zero_extend* directly. Some other small cleanups. Bootstrapped and tested on aarch64-linux-gnu with no regressions. PR target/113042 gcc/ChangeLog: * config/aarch64/aarch64.md (popcount<mode>2): Update pattern to support ALLI modes. gcc/testsuite/ChangeLog: * gcc.target/aarch64/popcnt5.c: New test. * gcc.target/aarch64/popcnt6.c: New test. * gcc.target/aarch64/popcnt7.c: New test. * gcc.target/aarch64/popcnt8.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
3 daysRISC-V: use fclass insns to implement isfinite,isnormal and isinf builtinsVineet Gupta1-0/+63
Currently these builtins use float compare instructions which require FP flags to be saved/restored which could be costly in uarch. RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP values w/o disturbing FP exception flags. Now that upstream supports the corresponding optabs, wire them up in the backend. gcc/ChangeLog: * config/riscv/riscv.md: define_insn for fclass insn. define_expand for isfinite, isnormal, isinf. gcc/testsuite/ChangeLog: * gcc.target/riscv/fclass.c: New tests. Tested-by: Edwin Lu <ewlu@rivosinc.com> # pre-commit-CI #2060 Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
3 daysi386: Improve split of *extendv2di2_highpart_stv_noavx512vl.Roger Sayle1-2/+30
This patch follows up on the previous patch to fix PR target/116275 by improving the code STV (ultimately) generates for highpart sign extensions like (x<<8)>>8. The arithmetic right shift is able to take advantage of the available common subexpressions from the preceding left shift. Hence previously with -O2 -m32 -mavx -mno-avx512vl we'd generate: vpsllq $8, %xmm0, %xmm0 vpsrad $8, %xmm0, %xmm1 vpsrlq $8, %xmm0, %xmm0 vpblendw $51, %xmm0, %xmm1, %xmm0 But with improved splitting, we now generate three instructions: vpslld $8, %xmm1, %xmm0 vpsrad $8, %xmm0, %xmm0 vpblendw $51, %xmm1, %xmm0, %xmm0 This patch also implements Uros' suggestion that the pre-reload splitter could introduced a new pseudo to hold the intermediate to potentially help reload with register allocation, which applies when not performing the above optimization, i.e. on TARGET_XOP. 2024-08-15 Roger Sayle <roger@nextmovesoftware.com> Uros Bizjak <ubizjak@gmail.com> gcc/ChangeLog * config/i386/i386.md (*extendv2di2_highpart_stv_noavx512vl): Split to an improved implementation on !TARGET_XOP. On TARGET_XOP, use a new pseudo for the intermediate to simplify register allocation. gcc/testsuite/ChangeLog * g++.target/i386/pr116275-2.C: New test case.
4 daysLoongArch: Implement scalar isinf, isnormal, and isfinite via fclassXi Ruoyao1-7/+46
Doing so can avoid loading FP constants from the memory. It also partially fixes PR 66262 as fclass does not signal on sNaN. gcc/ChangeLog: * config/loongarch/loongarch.md (extendsidi2): Add ("=r", "f") alternative and use movfr2gr.s for it. The spec clearly states movfr2gr.s sign extends the value to GRLEN. (fclass_<fmt>): Make the result SImode instead of a floating mode. The fclass results are really not FP values. (FCLASS_MASK): New define_int_iterator. (fclass_optab): New define_int_attr. (<FCLASS_MASK:fclass_optab><ANYF:mode>): New define_expand template. gcc/testsuite/ChangeLog: * gcc.target/loongarch/fclass-compile.c: New test. * gcc.target/loongarch/fclass-run.c: New test.
4 daysMovement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary ↵liuhongt3-7/+32
reload. It results in 2 failures for x86_64-pc-linux-gnu{\ -march=cascadelake}; gcc: gcc.target/i386/extendditi3-1.c scan-assembler cqt?o gcc: gcc.target/i386/pr113560.c scan-assembler-times \tmulq 1 For pr113560.c, now GCC generates mulx instead of mulq with -march=cascadelake, which should be optimal, so adjust testcase for that. For gcc.target/i386/extendditi2-1.c, RA happens to choose another register instead of rax and result in movq %rdi, %rbp movq %rdi, %rax sarq $63, %rbp movq %rbp, %rdx The patch adds a new define_peephole2 for that. gcc/ChangeLog: PR target/116274 * config/i386/i386-expand.cc (ix86_expand_vector_move): Restrict special case TImode to 128-bit vector conversions via V2DI under ix86_pre_reload_split (). * config/i386/i386.cc (inline_secondary_memory_needed): Movement between GENERAL_REGS and SSE_REGS for TImode doesn't need secondary reload. * config/i386/i386.md (*extendsidi2_rex64): Add a define_peephole2 after it. gcc/testsuite/ChangeLog: * gcc.target/i386/pr116274.c: New test. * gcc.target/i386/pr113560.c: Scan either mulq or mulx.
4 daysaarch64: Rename svpext to svpext_lane [PR116371]Richard Sandiford3-4/+4
When implementing the SME2 ACLE, I somehow missed off the _lane suffix on svpext. gcc/ PR target/116371 * config/aarch64/aarch64-sve-builtins-sve2.h (svpext): Rename to... (svpext_lane): ...this. * config/aarch64/aarch64-sve-builtins-sve2.cc (svpext_impl): Rename to... (svpext_lane_impl): ...this and update instantiation accordingly. * config/aarch64/aarch64-sve-builtins-sve2.def (svpext): Rename to... (svpext_lane): ...this. gcc/testsuite/ PR target/116371 * gcc.target/aarch64/sme2/acle-asm/pext_c16.c, gcc.target/aarch64/sme2/acle-asm/pext_c16_x2.c, gcc.target/aarch64/sme2/acle-asm/pext_c32.c, gcc.target/aarch64/sme2/acle-asm/pext_c32_x2.c, gcc.target/aarch64/sme2/acle-asm/pext_c64.c, gcc.target/aarch64/sme2/acle-asm/pext_c64_x2.c, gcc.target/aarch64/sme2/acle-asm/pext_c8.c, gcc.target/aarch64/sme2/acle-asm/pext_c8_x2.c: Replace with... * gcc.target/aarch64/sme2/acle-asm/pext_lane_c16.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c16_x2.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c32.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c32_x2.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c64.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c64_x2.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c8.c, gcc.target/aarch64/sme2/acle-asm/pext_lane_c8_x2.c: ...these new tests, testing for svpext_lane instead of svpext.
4 daysrs6000: Add TARGET_FLOAT128_HW guard for quad-precision insnsHaochen Gui2-13/+16
gcc/ * config/rs6000/rs6000.md (floatti<mode>2, floatunsti<mode>2, fix_trunc<mode>ti2): Add guard TARGET_FLOAT128_HW. * config/rs6000/vsx.md (xsxexpqp_<IEEE128:mode>_<V2DI_DI:mode>, xsxsigqp_<IEEE128:mode>_<VEC_TI:mode>, xsiexpqpf_<mode>, xsiexpqp_<IEEE128:mode>_<V2DI_DI:mode>, xscmpexpqp_<code>_<mode>, *xscmpexpqp, xststdcnegqp_<mode>): Replace guard TARGET_P9_VECTOR with TARGET_FLOAT128_HW. (xststdc_<mode>, *xststdc_<mode>, isinf<mode>2): Add guard TARGET_FLOAT128_HW for the IEEE128 modes. gcc/testsuite/ * gcc.target/powerpc/float128-cmp2-runnable.c: Replace ppc_float128_sw with ppc_float128_hw and remove p9vector_hw.
4 daysrs6000: Implement optab_isnormal for SFDF and IEEE128Haochen Gui1-0/+18
gcc/ PR target/97786 * config/rs6000/vsx.md (isnormal<mode>2): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-7.c: New test. * gcc.target/powerpc/pr97786-8.c: New test.
4 daysrs6000: Implement optab_isfinite for SFDF and IEEE128Haochen Gui1-0/+15
gcc/ PR target/97786 * config/rs6000/vsx.md (isfinite<mode>2): New expand. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-4.c: New test. * gcc.target/powerpc/pr97786-5.c: New test.
4 daysrs6000: Implement optab_isinf for SFDF and IEEE128Haochen Gui4-45/+58
gcc/ PR target/97786 * config/rs6000/rs6000.md (constant VSX_TEST_DATA_CLASS_NAN, VSX_TEST_DATA_CLASS_POS_INF, VSX_TEST_DATA_CLASS_NEG_INF, VSX_TEST_DATA_CLASS_POS_ZERO, VSX_TEST_DATA_CLASS_NEG_ZERO, VSX_TEST_DATA_CLASS_POS_DENORMAL, VSX_TEST_DATA_CLASS_NEG_DENORMAL): Define. (mode_attr sdq, vsx_altivec, wa_v, x): Define. (mode_iterator IEEE_FP): Define. * config/rs6000/vsx.md (isinf<mode>2): New expand. (expand xststdcqp_<mode>, xststdc<sd>p): Combine into... (expand xststdc_<mode>): ...this. (insn *xststdcqp_<mode>, *xststdc<sd>p): Combine into... (insn *xststdc_<mode>): ...this. * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin): Rename CODE_FOR_xststdcqp_kf as CODE_FOR_xststdc_kf, CODE_FOR_xststdcqp_tf as CODE_FOR_xststdc_tf. * config/rs6000/rs6000-builtins.def: Rename xststdcdp as xststdc_df, xststdcsp as xststdc_sf, xststdcqp_kf as xststdc_kf. gcc/testsuite/ PR target/97786 * gcc.target/powerpc/pr97786-1.c: New test. * gcc.target/powerpc/pr97786-2.c: New test.
5 dayss390: Remove vector intrinsicsStefan Schulze Frielinghaus1-14/+0
The following intrinsics are not implemented. Thus, remove them. gcc/ChangeLog: * config/s390/vecintrin.h (vec_vstbrh): Remove. (vec_vstbrf): Remove. (vec_vstbrg): Remove. (vec_vstbrq): Remove. (vec_vstbrf_flt): Remove. (vec_vstbrg_dbl): Remove. (vec_vsterb): Remove. (vec_vsterh): Remove. (vec_vsterf): Remove. (vec_vsterg): Remove. (vec_vsterf_flt): Remove. (vec_vsterg_dbl): Remove.