path: root/gcc/config
Age | Commit message | Author | Files | Lines
2021-09-22 | AVX512FP16: Add permutation and mask blend intrinsics. | dianhong xu | 2 | -0/+93
gcc/ChangeLog: * config/i386/avx512fp16intrin.h: (_mm512_mask_blend_ph): New intrinsic. (_mm512_permutex2var_ph): Ditto. (_mm512_permutexvar_ph): Ditto. * config/i386/avx512fp16vlintrin.h: (_mm256_mask_blend_ph): New intrinsic. (_mm256_permutex2var_ph): Ditto. (_mm256_permutexvar_ph): Ditto. (_mm_mask_blend_ph): Ditto. (_mm_permutex2var_ph): Ditto. (_mm_permutexvar_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-14.c: New test.
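As a rough illustration of what the blend and single-source permute intrinsics in this entry compute, here is a scalar model in portable C. It is a sketch under assumptions, not GCC's implementation: the names `model_mask_blend16` and `model_permutexvar16` are made up for this example, lanes are treated as opaque 16-bit patterns, and the index handling assumes the usual AVX-512 convention that only the low bits of each index select a lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a masked blend over n 16-bit lanes:
   lane i is taken from b when mask bit i is set, otherwise from a. */
static void model_mask_blend16(uint32_t mask, const uint16_t *a,
                               const uint16_t *b, uint16_t *dst, size_t n)
{
    assert(n <= 32);                    /* one mask bit per lane */
    for (size_t i = 0; i < n; i++)
        dst[i] = ((mask >> i) & 1u) ? b[i] : a[i];
}

/* Hypothetical scalar model of a single-source permute: lane i of dst is
   the lane of src selected by the low bits of idx[i].  n must be a power
   of two, and dst must not alias src. */
static void model_permutexvar16(const uint16_t *idx, const uint16_t *src,
                                uint16_t *dst, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i] & (n - 1)];
}
```

With four lanes, a mask of 0x5 picks lanes 0 and 2 from `b` and lanes 1 and 3 from `a`; an out-of-range index such as 4 wraps to lane 0 in this model.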
2021-09-22 | AVX512FP16: Add complex conjugation intrinsic instructions. | dianhong xu | 2 | -0/+80
gcc/ChangeLog: * config/i386/avx512fp16intrin.h: Add new intrinsics. (_mm512_conj_pch): New intrinsic. (_mm512_mask_conj_pch): Ditto. (_mm512_maskz_conj_pch): Ditto. * config/i386/avx512fp16vlintrin.h: Add new intrinsics. (_mm256_conj_pch): New intrinsic. (_mm256_mask_conj_pch): Ditto. (_mm256_maskz_conj_pch): Ditto. (_mm_conj_pch): Ditto. (_mm_mask_conj_pch): Ditto. (_mm_maskz_conj_pch): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-conjugation-1.c: New test. * gcc.target/i386/avx512fp16vl-conjugation-1.c: New test.
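The `conj_pch` intrinsics treat a vector as packed complex numbers, each made of two adjacent half-precision values. A minimal sketch of the operation on raw binary16 bit patterns follows. It assumes the real part sits in the even lane and the imaginary part in the odd lane; `model_conj_pch` is a name invented for this example, and the log does not show how the header actually implements the intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: v holds n_halves binary16 bit patterns, where
   v[2*i] is the real part and v[2*i + 1] the imaginary part of complex
   element i.  Conjugation flips only the sign bit of the imaginary part. */
static void model_conj_pch(uint16_t *v, size_t n_halves)
{
    assert(n_halves % 2 == 0);          /* whole complex pairs only */
    for (size_t i = 1; i < n_halves; i += 2)
        v[i] = (uint16_t)(v[i] ^ 0x8000u);
}
```

Because only a sign bit changes, the operation is exact and never raises floating-point exceptions in this model.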
2021-09-22 | AVX512FP16: Add reduce operators(add/mul/min/max). | dianhong xu | 2 | -0/+203
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_MM512_REDUCE_OP): New macro (_mm512_reduce_add_ph): New intrinsic. (_mm512_reduce_mul_ph): Ditto. (_mm512_reduce_min_ph): Ditto. (_mm512_reduce_max_ph): Ditto. * config/i386/avx512fp16vlintrin.h (_MM256_REDUCE_OP/_MM_REDUCE_OP): New macro. (_mm256_reduce_add_ph): New intrinsic. (_mm256_reduce_mul_ph): Ditto. (_mm256_reduce_min_ph): Ditto. (_mm256_reduce_max_ph): Ditto. (_mm_reduce_add_ph): Ditto. (_mm_reduce_mul_ph): Ditto. (_mm_reduce_min_ph): Ditto. (_mm_reduce_max_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-reduce-op-1.c: New test. * gcc.target/i386/avx512fp16vl-reduce-op-1.c: Ditto.
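The reduce intrinsics above collapse a whole vector into one scalar. The log does not show how the `_MM512_REDUCE_OP` macro orders the lane combinations, so the sketch below simply assumes a pairwise halving scheme (fold the upper half onto the lower half until one lane is left). `model_reduce` and the `op_*` helpers are hypothetical names, and `float` stands in for the half-precision type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef float (*binop_fn)(float, float);

static float op_add(float a, float b) { return a + b; }
static float op_mul(float a, float b) { return a * b; }
static float op_min(float a, float b) { return a < b ? a : b; }
static float op_max(float a, float b) { return a > b ? a : b; }

/* Hypothetical model of a horizontal reduction over n lanes (n a power of
   two, at most 32): repeatedly combine lane i with lane i + half. */
static float model_reduce(const float *v, size_t n, binop_fn op)
{
    float tmp[32];
    assert(n >= 1 && n <= 32 && (n & (n - 1)) == 0);
    memcpy(tmp, v, n * sizeof tmp[0]);
    for (size_t half = n / 2; half >= 1; half /= 2)
        for (size_t i = 0; i < half; i++)
            tmp[i] = op(tmp[i], tmp[i + half]);
    return tmp[0];
}
```

Since floating-point addition and multiplication are not associative, a different combination order can give a slightly different result for add and mul; min and max are unaffected by the order for ordinary (non-NaN) inputs.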
2021-09-22 | AVX512FP16: Support load/store/abs intrinsics. | dianhong xu | 2 | -0/+116
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (__m512h_u, __m256h_u, __m128h_u): New typedef. (_mm512_load_ph): New intrinsic. (_mm256_load_ph): Ditto. (_mm_load_ph): Ditto. (_mm512_loadu_ph): Ditto. (_mm256_loadu_ph): Ditto. (_mm_loadu_ph): Ditto. (_mm512_store_ph): Ditto. (_mm256_store_ph): Ditto. (_mm_store_ph): Ditto. (_mm512_storeu_ph): Ditto. (_mm256_storeu_ph): Ditto. (_mm_storeu_ph): Ditto. (_mm512_abs_ph): Ditto. * config/i386/avx512fp16vlintrin.h (_mm_abs_ph): Ditto. (_mm256_abs_ph): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-13.c: New test.
2021-09-22 | IBM Z: TPF: Add cc clobber to profiling expanders | Andreas Krebbel | 1 | -2/+4
The code sequence emitted uses CC internally. gcc/ChangeLog: * config/s390/tpf.md (prologue_tpf, epilogue_tpf): Add cc clobber.
2021-09-22 | IBM Z: Fix PR102222 | Andreas Krebbel | 1 | -0/+10
Avoid emitting a strict low part move if the insv target actually affects the whole target reg. gcc/ChangeLog: PR target/102222 * config/s390/s390.c (s390_expand_insv): Emit a normal move if it is actually a full copy of the source operand into the target. Don't emit a strict low part move if source and target mode match. gcc/testsuite/ChangeLog: * gcc.target/s390/pr102222.c: New test.
2021-09-22 | Support 64bit fma/fms/fnma/fnms under avx512vl. | liuhongt | 2 | -9/+15
gcc/ChangeLog: * config/i386/i386.md (define_attr "isa"): Add fma_or_avx512vl. (define_attr "enabled"): Correspond fma_or_avx512vl to TARGET_FMA || TARGET_AVX512VL. * config/i386/mmx.md (fmav2sf4): Extend to AVX512 fma. (fmsv2sf4): Ditto. (fnmav2sf4): Ditto. (fnmsv2sf4): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512vl-pr95046.c: New test.
2021-09-22 | AVX512FP16: Add expander for cstorehf4. | liuhongt | 1 | -0/+15
gcc/ChangeLog: * config/i386/i386.md (cstorehf3): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-builtin-fpcompare-1.c: New test. * gcc.target/i386/avx512fp16-builtin-fpcompare-2.c: New test.
2021-09-22 | AVX512FP16: Add expander for ceil/floor/trunc/roundeven. | liuhongt | 2 | -7/+19
gcc/ChangeLog: * config/i386/i386.md (<rounding_insn>hf2): New expander. (sse4_1_round<mode>2): Extend from MODEF to MODEFH. * config/i386/sse.md (*sse4_1_round<ssescalarmodesuffix>): Extend from VF_128 to VFH_128. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-builtin-round-1.c: New test.
2021-09-22 | AVX512FP16: Add expander for sqrthf2. | liuhongt | 3 | -8/+28
gcc/ChangeLog: * config/i386/i386-features.c (i386-features.c): Handle E_HFmode. * config/i386/i386.md (sqrthf2): New expander. (*sqrthf2): New define_insn. * config/i386/sse.md (*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>): Extend to VFH_128. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-builtin-sqrt-1.c: New test. * gcc.target/i386/avx512fp16vl-builtin-sqrt-1.c: New test.
2021-09-22 | AVX512FP16: Add vfcmaddcsh/vfmaddcsh/vfcmulcsh/vfmulcsh. | liuhongt | 4 | -0/+624
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_mask_fcmadd_sch): New intrinsic. (_mm_mask3_fcmadd_sch): Likewise. (_mm_maskz_fcmadd_sch): Likewise. (_mm_fcmadd_sch): Likewise. (_mm_mask_fmadd_sch): Likewise. (_mm_mask3_fmadd_sch): Likewise. (_mm_maskz_fmadd_sch): Likewise. (_mm_fmadd_sch): Likewise. (_mm_mask_fcmadd_round_sch): Likewise. (_mm_mask3_fcmadd_round_sch): Likewise. (_mm_maskz_fcmadd_round_sch): Likewise. (_mm_fcmadd_round_sch): Likewise. (_mm_mask_fmadd_round_sch): Likewise. (_mm_mask3_fmadd_round_sch): Likewise. (_mm_maskz_fmadd_round_sch): Likewise. (_mm_fmadd_round_sch): Likewise. (_mm_fcmul_sch): Likewise. (_mm_mask_fcmul_sch): Likewise. (_mm_maskz_fcmul_sch): Likewise. (_mm_fmul_sch): Likewise. (_mm_mask_fmul_sch): Likewise. (_mm_maskz_fmul_sch): Likewise. (_mm_fcmul_round_sch): Likewise. (_mm_mask_fcmul_round_sch): Likewise. (_mm_maskz_fcmul_round_sch): Likewise. (_mm_fmul_round_sch): Likewise. (_mm_mask_fmul_round_sch): Likewise. (_mm_maskz_fmul_round_sch): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (avx512fp16_fmaddcsh_v8hf_maskz<round_expand_name>): New expander. (avx512fp16_fcmaddcsh_v8hf_maskz<round_expand_name>): Ditto. (avx512fp16_fma_<complexopname>sh_v8hf<mask_scalarcz_name><round_scalarcz_name>): New define insn. (avx512fp16_<complexopname>sh_v8hf_mask<round_name>): Ditto. (avx512fp16_<complexopname>sh_v8hf<mask_scalarc_name><round_scalarcz_name>): Ditto. * config/i386/subst.md (mask_scalarcz_name): New. (mask_scalarc_name): Ditto. (mask_scalarc_operand3): Ditto. (mask_scalarcz_operand4): Ditto. (round_scalarcz_name): Ditto. (round_scalarc_mask_operand3): Ditto. (round_scalarcz_mask_operand4): Ditto. (round_scalarc_mask_op3): Ditto. (round_scalarcz_mask_op4): Ditto. (round_scalarcz_constraint): Ditto. (round_scalarcz_nimm_predicate): Ditto. (mask_scalarcz): Ditto. (mask_scalarc): Ditto. (round_scalarcz): Ditto. 
gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-22 | AVX512FP16: Add vfcmaddcph/vfmaddcph/vfcmulcph/vfmulcph | liuhongt | 7 | -0/+837
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_fcmadd_pch): New intrinsic. (_mm512_mask_fcmadd_pch): Likewise. (_mm512_mask3_fcmadd_pch): Likewise. (_mm512_maskz_fcmadd_pch): Likewise. (_mm512_fmadd_pch): Likewise. (_mm512_mask_fmadd_pch): Likewise. (_mm512_mask3_fmadd_pch): Likewise. (_mm512_maskz_fmadd_pch): Likewise. (_mm512_fcmadd_round_pch): Likewise. (_mm512_mask_fcmadd_round_pch): Likewise. (_mm512_mask3_fcmadd_round_pch): Likewise. (_mm512_maskz_fcmadd_round_pch): Likewise. (_mm512_fmadd_round_pch): Likewise. (_mm512_mask_fmadd_round_pch): Likewise. (_mm512_mask3_fmadd_round_pch): Likewise. (_mm512_maskz_fmadd_round_pch): Likewise. (_mm512_fcmul_pch): Likewise. (_mm512_mask_fcmul_pch): Likewise. (_mm512_maskz_fcmul_pch): Likewise. (_mm512_fmul_pch): Likewise. (_mm512_mask_fmul_pch): Likewise. (_mm512_maskz_fmul_pch): Likewise. (_mm512_fcmul_round_pch): Likewise. (_mm512_mask_fcmul_round_pch): Likewise. (_mm512_maskz_fcmul_round_pch): Likewise. (_mm512_fmul_round_pch): Likewise. (_mm512_mask_fmul_round_pch): Likewise. (_mm512_maskz_fmul_round_pch): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_fmadd_pch): New intrinsic. (_mm_mask_fmadd_pch): Likewise. (_mm_mask3_fmadd_pch): Likewise. (_mm_maskz_fmadd_pch): Likewise. (_mm256_fmadd_pch): Likewise. (_mm256_mask_fmadd_pch): Likewise. (_mm256_mask3_fmadd_pch): Likewise. (_mm256_maskz_fmadd_pch): Likewise. (_mm_fcmadd_pch): Likewise. (_mm_mask_fcmadd_pch): Likewise. (_mm_mask3_fcmadd_pch): Likewise. (_mm_maskz_fcmadd_pch): Likewise. (_mm256_fcmadd_pch): Likewise. (_mm256_mask_fcmadd_pch): Likewise. (_mm256_mask3_fcmadd_pch): Likewise. (_mm256_maskz_fcmadd_pch): Likewise. (_mm_fmul_pch): Likewise. (_mm_mask_fmul_pch): Likewise. (_mm_maskz_fmul_pch): Likewise. (_mm256_fmul_pch): Likewise. (_mm256_mask_fmul_pch): Likewise. (_mm256_maskz_fmul_pch): Likewise. (_mm_fcmul_pch): Likewise. (_mm_mask_fcmul_pch): Likewise. (_mm_maskz_fcmul_pch): Likewise. (_mm256_fcmul_pch): Likewise. 
(_mm256_mask_fcmul_pch): Likewise. (_mm256_maskz_fcmul_pch): Likewise. * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF, V8HF_FTYPE_V16HF_V16HF_V16HF, V16HF_FTYPE_V16HF_V16HF_V16HF_UQI, V32HF_FTYPE_V32HF_V32HF_V32HF_INT, V32HF_FTYPE_V32HF_V32HF_V32HF_UHI_INT): Add new builtin types. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-expand.c: Handle new builtin types. * config/i386/subst.md (SUBST_CV): New. (maskc_name): Ditto. (maskc_operand3): Ditto. (maskc): Ditto. (sdc_maskz_name): Ditto. (sdc_mask_op4): Ditto. (sdc_mask_op5): Ditto. (sdc_mask_mode512bit_condition): Ditto. (sdc): Ditto. (round_maskc_operand3): Ditto. (round_sdc_mask_operand4): Ditto. (round_maskc_op3): Ditto. (round_sdc_mask_op4): Ditto. (round_saeonly_sdc_mask_operand5): Ditto. * config/i386/sse.md (unspec): Add complex fma unspecs. (avx512fmaskcmode): New. (UNSPEC_COMPLEX_F_C_MA): Ditto. (UNSPEC_COMPLEX_F_C_MUL): Ditto. (complexopname): Ditto. (<avx512>_fmaddc_<mode>_maskz<round_expand_name>): New expander. (<avx512>_fcmaddc_<mode>_maskz<round_expand_name>): Ditto. (fma_<complexopname>_<mode><sdc_maskz_name><round_name>): New define insn. (<avx512>_<complexopname>_<mode>_mask<round_name>): Ditto. (<avx512>_<complexopname>_<mode><maskc_name><round_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
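The `fmul_pch`, `fcmul_pch` and `fmadd_pch` families listed in this entry operate on packed complex numbers. The following scalar model is a sketch of the arithmetic only. It assumes interleaved real/imaginary lanes and assumes that the "c" variants conjugate the second operand, which is my reading of Intel's instruction descriptions and should be checked against the Intrinsics Guide. The `model_*` names are invented, `float` stands in for the half-precision type, and rounding differences from a fused hardware operation are ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Lanes are interleaved: x[2*i] is the real part and x[2*i + 1] the
   imaginary part of complex element i.  All names are hypothetical. */

/* dst = a * b */
static void model_fmul_pch(const float *a, const float *b,
                           float *dst, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        float re = a[i] * b[i] - a[i + 1] * b[i + 1];
        float im = a[i] * b[i + 1] + a[i + 1] * b[i];
        dst[i] = re;
        dst[i + 1] = im;
    }
}

/* dst = a * conj(b)  (assumption: the second operand is conjugated) */
static void model_fcmul_pch(const float *a, const float *b,
                            float *dst, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        float re = a[i] * b[i] + a[i + 1] * b[i + 1];
        float im = a[i + 1] * b[i] - a[i] * b[i + 1];
        dst[i] = re;
        dst[i + 1] = im;
    }
}

/* dst = a * b + c */
static void model_fmadd_pch(const float *a, const float *b, const float *c,
                            float *dst, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        float re = a[i] * b[i] - a[i + 1] * b[i + 1] + c[i];
        float im = a[i] * b[i + 1] + a[i + 1] * b[i] + c[i + 1];
        dst[i] = re;
        dst[i + 1] = im;
    }
}
```

For example, (1 + 2i)(3 + 4i) is -5 + 10i, while (1 + 2i) times the conjugate of (3 + 4i) is 11 + 2i.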
2021-09-21 | rs6000: Parameterize some const values for density test | Kewen Lin | 2 | -15/+45
This patch follows the discussion here [1], where Segher suggested parameterizing the exact magic constants used by the density heuristics, to make them easier to tweak if needed. The change is intended to be "No Functional Change". I verified it with SPEC2017 at option sets O2-vect and Ofast-unroll on Power8, and the result is neutral as expected. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html gcc/ChangeLog: * config/rs6000/rs6000.opt (rs6000-density-pct-threshold, rs6000-density-size-threshold, rs6000-density-penalty, rs6000-density-load-pct-threshold, rs6000-density-load-num-threshold): New parameter. * config/rs6000/rs6000.c (rs6000_density_test): Adjust with corresponding parameters.
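To make the idea of "parameterizing magic constants" concrete, here is a small sketch of a density-style cost adjustment whose thresholds come from a parameter struct instead of literals. It is not a copy of `rs6000_density_test`: the struct, the function name, and the exact formula are assumptions for illustration, and the log gives no parameter values (the numbers used in the usage note are arbitrary).

```c
#include <assert.h>

/* Hypothetical tunables, loosely mirroring three of the new parameters. */
struct density_params {
    int pct_threshold;   /* penalize when vector work exceeds this percentage */
    int size_threshold;  /* ... and the loop body cost exceeds this size */
    int penalty;         /* percentage added to the vector body cost */
};

/* Returns the (possibly penalized) vector body cost. */
static int model_density_adjust(int vec_cost, int not_vec_cost,
                                const struct density_params *p)
{
    assert(vec_cost >= 0 && not_vec_cost >= 0);
    int total = vec_cost + not_vec_cost;
    if (total == 0)
        return vec_cost;
    int density_pct = vec_cost * 100 / total;
    if (density_pct > p->pct_threshold && total > p->size_threshold)
        return vec_cost * (100 + p->penalty) / 100;
    return vec_cost;
}
```

With arbitrary settings of 85 / 70 / 10, a body with vector cost 90 and non-vector cost 10 is 90% dense and large enough, so its cost is raised to 99; a 50/50 split is left unchanged. Changing the behaviour then means changing a parameter, not editing the function.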
2021-09-19 | Darwin, crts: Build Darwin10 unwinder shim as a library. | Iain Sandoe | 1 | -1/+1
We have a small unwinder shim that is only used for Darwin10 (and only then in quite specific cases). To avoid linking this code for every executable or DSO, we can present the crt as a convenience library (rather than a .o file). Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/ChangeLog: * config/darwin.h (LINK_COMMAND_SPEC_A): Use Darwin10 unwinder shim as a convenience library. libgcc/ChangeLog: * config.host: Use convenience library for Darwin10 unwinder shim. * config/t-darwin: Build Darwin10 unwinder shim as a convenience library.
2021-09-19 | [PATCH] avr: Add atmega324pb MCU | Matwey V. Kornilov | 1 | -0/+1
gcc/ * config/avr/avr-mcus.def: Add atmega324pb. * doc/avr-mmcu.texi: Corresponding changes.
2021-09-18 | Fix ICE in pass_rpad. | liuhongt | 1 | -5/+22
Besides conversion instructions, pass_rpad also handles scalar sqrt/rsqrt/rcp/round instructions, whereas r12-3614 was only meant to affect conversion instructions, so restrict it accordingly. gcc/ChangeLog: * config/i386/i386-features.c (remove_partial_avx_dependency): Restrict TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS to conversion instructions only.
2021-09-18 | AVX512FP16: Add scalar fma instructions. | liuhongt | 5 | -163/+598
Add vfmadd[132,213,231]sh/vfnmadd[132,213,231]sh/ vfmsub[132,213,231]sh/vfnmsub[132,213,231]sh. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_fmadd_sh): New intrinsic. (_mm_mask_fmadd_sh): Likewise. (_mm_mask3_fmadd_sh): Likewise. (_mm_maskz_fmadd_sh): Likewise. (_mm_fmadd_round_sh): Likewise. (_mm_mask_fmadd_round_sh): Likewise. (_mm_mask3_fmadd_round_sh): Likewise. (_mm_maskz_fmadd_round_sh): Likewise. (_mm_fnmadd_sh): Likewise. (_mm_mask_fnmadd_sh): Likewise. (_mm_mask3_fnmadd_sh): Likewise. (_mm_maskz_fnmadd_sh): Likewise. (_mm_fnmadd_round_sh): Likewise. (_mm_mask_fnmadd_round_sh): Likewise. (_mm_mask3_fnmadd_round_sh): Likewise. (_mm_maskz_fnmadd_round_sh): Likewise. (_mm_fmsub_sh): Likewise. (_mm_mask_fmsub_sh): Likewise. (_mm_mask3_fmsub_sh): Likewise. (_mm_maskz_fmsub_sh): Likewise. (_mm_fmsub_round_sh): Likewise. (_mm_mask_fmsub_round_sh): Likewise. (_mm_mask3_fmsub_round_sh): Likewise. (_mm_maskz_fmsub_round_sh): Likewise. (_mm_fnmsub_sh): Likewise. (_mm_mask_fnmsub_sh): Likewise. (_mm_mask3_fnmsub_sh): Likewise. (_mm_maskz_fnmsub_sh): Likewise. (_mm_fnmsub_round_sh): Likewise. (_mm_mask_fnmsub_round_sh): Likewise. (_mm_mask3_fnmsub_round_sh): Likewise. (_mm_maskz_fnmsub_round_sh): Likewise. * config/i386/i386-builtin-types.def (V8HF_FTYPE_V8HF_V8HF_V8HF_UQI_INT): New builtin type. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-expand.c: Handle new builtin type. * config/i386/sse.md (fmai_vmfmadd_<mode><round_name>): Adjust to support FP16. (fmai_vmfmsub_<mode><round_name>): Ditto. (fmai_vmfnmadd_<mode><round_name>): Ditto. (fmai_vmfnmsub_<mode><round_name>): Ditto. (*fmai_fmadd_<mode>): Ditto. (*fmai_fmsub_<mode>): Ditto. (*fmai_fnmadd_<mode><round_name>): Ditto. (*fmai_fnmsub_<mode><round_name>): Ditto. (avx512f_vmfmadd_<mode>_mask<round_name>): Ditto. (avx512f_vmfmadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfmadd_<mode>_maskz<round_expand_name>): Ditto. (avx512f_vmfmadd_<mode>_maskz_1<round_name>): Ditto. 
(*avx512f_vmfmsub_<mode>_mask<round_name>): Ditto. (avx512f_vmfmsub_<mode>_mask3<round_name>): Ditto. (*avx512f_vmfmsub_<mode>_maskz_1<round_name>): Ditto. (*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto. (*avx512f_vmfnmsub_<mode>_mask3<round_name>): Ditto. (*avx512f_vmfnmsub_<mode>_mask<round_name>): Ditto. (*avx512f_vmfnmadd_<mode>_mask<round_name>): Renamed to ... (avx512f_vmfnmadd_<mode>_mask<round_name>) ... this, and adjust to support FP16. (avx512f_vmfnmadd_<mode>_mask3<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_maskz_1<round_name>): Ditto. (avx512f_vmfnmadd_<mode>_maskz<round_expand_name>): New expander. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
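The digit triples in the mnemonics of this entry (132, 213, 231) describe which instruction operands are multiplied and which one is added; in every form the result replaces the first operand. A scalar sketch of that convention follows. The `model_fmadd*` names are invented for this example, `float` stands in for the half-precision type, and the separate multiply and add here do not reproduce the single rounding of a real fused operation.

```c
#include <assert.h>

/* Form XYZ computes operand X times operand Y plus operand Z.
   Names are hypothetical; op1 is also the destination in hardware. */
static float model_fmadd132(float op1, float op2, float op3)
{
    return op1 * op3 + op2;
}

static float model_fmadd213(float op1, float op2, float op3)
{
    return op2 * op1 + op3;
}

static float model_fmadd231(float op1, float op2, float op3)
{
    return op2 * op3 + op1;
}
```

As far as I understand, code written with the intrinsics does not pick a form: the three variants exist so that the compiler can choose whichever one leaves the operands in the most convenient registers.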
2021-09-18 | AVX512FP16: Enable FP16 mask load/store. | H.J. Lu | 1 | -6/+6
gcc/ChangeLog: * config/i386/sse.md (avx512fmaskmodelower): Extend to support HF modes. (maskload<mode><avx512fmaskmodelower>): Ditto. (maskstore<mode><avx512fmaskmodelower>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-xorsign-1.c: New test.
2021-09-18 | AVX512FP16: Add scalar/vector bitwise operations, including | H.J. Lu | 4 | -62/+130
1. FP16 vector xor/ior/and/andnot/abs/neg 2. FP16 scalar abs/neg/copysign/xorsign gcc/ChangeLog: * config/i386/i386-expand.c (ix86_expand_fp_absneg_operator): Handle HFmode. (ix86_expand_copysign): Ditto. (ix86_expand_xorsign): Ditto. * config/i386/i386.c (ix86_build_const_vector): Handle HF vector modes. (ix86_build_signbit_mask): Ditto. (ix86_can_change_mode_class): Ditto. * config/i386/i386.md (SSEMODEF): Add HFmode. (ssevecmodef): Ditto. (<code>hf2): New define_expand. (*<code>hf2_1): New define_insn_and_split. (copysign<mode>): Extend to support HFmode under AVX512FP16. (xorsign<mode>): Ditto. * config/i386/sse.md (VFB): New mode iterator. (VFB_128_256): Ditto. (VFB_512): Ditto. (sseintvecmode2): Support HF vector mode. (<code><mode>2): Use new mode iterator. (*<code><mode>2): Ditto. (copysign<mode>3): Ditto. (xorsign<mode>3): Ditto. (<code><mode>3<mask_name>): Ditto. (<code><mode>3<mask_name>): Ditto. (<sse>_andnot<mode>3<mask_name>): Adjust for HF vector mode. (<sse>_andnot<mode>3<mask_name>): Ditto. (*<code><mode>3<mask_name>): Ditto. (*<code><mode>3<mask_name>): Ditto.
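The scalar operations named in this entry (abs, neg, copysign, xorsign) are all sign-bit manipulations, which is why they are grouped with the bitwise operations. A minimal sketch on raw IEEE binary16 bit patterns is shown below; the `hf_*` names are invented for this example and say nothing about how the GCC expanders are written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HF_SIGN_BIT 0x8000u   /* bit 15 of an IEEE binary16 value */

/* All helpers work on raw binary16 bit patterns; names are hypothetical. */
static uint16_t hf_abs(uint16_t x)
{
    return (uint16_t)(x & ~HF_SIGN_BIT);          /* clear the sign */
}

static uint16_t hf_neg(uint16_t x)
{
    return (uint16_t)(x ^ HF_SIGN_BIT);           /* flip the sign */
}

static uint16_t hf_copysign(uint16_t mag, uint16_t sgn)
{
    return (uint16_t)((mag & ~HF_SIGN_BIT) | (sgn & HF_SIGN_BIT));
}

static uint16_t hf_xorsign(uint16_t x, uint16_t y)
{
    return (uint16_t)(x ^ (y & HF_SIGN_BIT));     /* x * copysign(1, y) */
}

/* Vector form of abs: the same mask applied to every lane. */
static void hf_abs_lanes(uint16_t *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] = hf_abs(v[i]);
}
```

In binary16, 0x3C00 is 1.0 and 0xBC00 is -1.0, so `hf_neg(0x3C00)` yields 0xBC00 and `hf_abs(0xBC00)` yields 0x3C00.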
2021-09-18 | AVX512FP16: Add FP16 fma instructions. | liuhongt | 4 | -96/+928
Add vfmadd[132,213,231]ph/vfnmadd[132,213,231]ph/vfmsub[132,213,231]ph/ vfnmsub[132,213,231]ph. gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_mask_fmadd_ph): New intrinsic. (_mm512_mask3_fmadd_ph): Likewise. (_mm512_maskz_fmadd_ph): Likewise. (_mm512_fmadd_round_ph): Likewise. (_mm512_mask_fmadd_round_ph): Likewise. (_mm512_mask3_fmadd_round_ph): Likewise. (_mm512_maskz_fmadd_round_ph): Likewise. (_mm512_fnmadd_ph): Likewise. (_mm512_mask_fnmadd_ph): Likewise. (_mm512_mask3_fnmadd_ph): Likewise. (_mm512_maskz_fnmadd_ph): Likewise. (_mm512_fnmadd_round_ph): Likewise. (_mm512_mask_fnmadd_round_ph): Likewise. (_mm512_mask3_fnmadd_round_ph): Likewise. (_mm512_maskz_fnmadd_round_ph): Likewise. (_mm512_fmsub_ph): Likewise. (_mm512_mask_fmsub_ph): Likewise. (_mm512_mask3_fmsub_ph): Likewise. (_mm512_maskz_fmsub_ph): Likewise. (_mm512_fmsub_round_ph): Likewise. (_mm512_mask_fmsub_round_ph): Likewise. (_mm512_mask3_fmsub_round_ph): Likewise. (_mm512_maskz_fmsub_round_ph): Likewise. (_mm512_fnmsub_ph): Likewise. (_mm512_mask_fnmsub_ph): Likewise. (_mm512_mask3_fnmsub_ph): Likewise. (_mm512_maskz_fnmsub_ph): Likewise. (_mm512_fnmsub_round_ph): Likewise. (_mm512_mask_fnmsub_round_ph): Likewise. (_mm512_mask3_fnmsub_round_ph): Likewise. (_mm512_maskz_fnmsub_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm256_fmadd_ph): New intrinsic. (_mm256_mask_fmadd_ph): Likewise. (_mm256_mask3_fmadd_ph): Likewise. (_mm256_maskz_fmadd_ph): Likewise. (_mm_fmadd_ph): Likewise. (_mm_mask_fmadd_ph): Likewise. (_mm_mask3_fmadd_ph): Likewise. (_mm_maskz_fmadd_ph): Likewise. (_mm256_fnmadd_ph): Likewise. (_mm256_mask_fnmadd_ph): Likewise. (_mm256_mask3_fnmadd_ph): Likewise. (_mm256_maskz_fnmadd_ph): Likewise. (_mm_fnmadd_ph): Likewise. (_mm_mask_fnmadd_ph): Likewise. (_mm_mask3_fnmadd_ph): Likewise. (_mm_maskz_fnmadd_ph): Likewise. (_mm256_fmsub_ph): Likewise. (_mm256_mask_fmsub_ph): Likewise. (_mm256_mask3_fmsub_ph): Likewise. (_mm256_maskz_fmsub_ph): Likewise. 
(_mm_fmsub_ph): Likewise. (_mm_mask_fmsub_ph): Likewise. (_mm_mask3_fmsub_ph): Likewise. (_mm_maskz_fmsub_ph): Likewise. (_mm256_fnmsub_ph): Likewise. (_mm256_mask_fnmsub_ph): Likewise. (_mm256_mask3_fnmsub_ph): Likewise. (_mm256_maskz_fnmsub_ph): Likewise. (_mm_fnmsub_ph): Likewise. (_mm_mask_fnmsub_ph): Likewise. (_mm_mask3_fnmsub_ph): Likewise. (_mm_maskz_fnmsub_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (<avx512>_fmadd_<mode>_maskz<round_expand_name>): Adjust to support HF vector modes. (<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name><round_name>): Ditto. (*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_1): Ditto. (*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_2): Ditto. (*<sd_mask_codefor>fma_fmadd_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fmadd_<mode>_mask<round_name>): Ditto. (<avx512>_fmadd_<mode>_mask3<round_name>): Ditto. (<avx512>_fmsub_<mode>_maskz<round_expand_name>): Ditto. (<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name><round_name>): Ditto. (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_1): Ditto. (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_2): Ditto. (*<sd_mask_codefor>fma_fmsub_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fmsub_<mode>_mask<round_name>): Ditto. (<avx512>_fmsub_<mode>_mask3<round_name>): Ditto. (<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name><round_name>): Ditto. (*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_1): Ditto. (*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_2): Ditto. (*<sd_mask_codefor>fma_fnmadd_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fnmadd_<mode>_mask<round_name>): Ditto. (<avx512>_fnmadd_<mode>_mask3<round_name>): Ditto. (<avx512>_fnmsub_<mode>_maskz<round_expand_name>): Ditto. (<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name><round_name>): Ditto. (*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_1): Ditto. (*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_2): Ditto. 
(*<sd_mask_codefor>fma_fnmsub_<mode><sd_maskz_name>_bcst_3): Ditto. (<avx512>_fnmsub_<mode>_mask<round_name>): Ditto. (<avx512>_fnmsub_<mode>_mask3<round_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
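The four intrinsic families in this entry differ only in the signs applied to the product and the addend. A per-lane sketch of those sign conventions is shown below. The `model_*` names are invented for this example, `float` stands in for the half-precision type, and a separate multiply and add is used, so the single rounding of a true fused operation is not modelled.

```c
#include <assert.h>

/* Per-lane sign conventions; names are hypothetical. */
static float model_fmadd(float a, float b, float c)  { return  (a * b) + c; }
static float model_fmsub(float a, float b, float c)  { return  (a * b) - c; }
static float model_fnmadd(float a, float b, float c) { return -(a * b) + c; }
static float model_fnmsub(float a, float b, float c) { return -(a * b) - c; }
```

With a = 2, b = 3 and c = 4 the four variants give 10, 2, -2 and -10 respectively.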
2021-09-18 | AVX512FP16: Add vfmaddsub[132,213,231]ph/vfmsubadd[132,213,231]ph. | liuhongt | 4 | -41/+490
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_fmaddsub_ph): New intrinsic. (_mm512_mask_fmaddsub_ph): Likewise. (_mm512_mask3_fmaddsub_ph): Likewise. (_mm512_maskz_fmaddsub_ph): Likewise. (_mm512_fmaddsub_round_ph): Likewise. (_mm512_mask_fmaddsub_round_ph): Likewise. (_mm512_mask3_fmaddsub_round_ph): Likewise. (_mm512_maskz_fmaddsub_round_ph): Likewise. (_mm512_mask_fmsubadd_ph): Likewise. (_mm512_mask3_fmsubadd_ph): Likewise. (_mm512_maskz_fmsubadd_ph): Likewise. (_mm512_fmsubadd_round_ph): Likewise. (_mm512_mask_fmsubadd_round_ph): Likewise. (_mm512_mask3_fmsubadd_round_ph): Likewise. (_mm512_maskz_fmsubadd_round_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm256_fmaddsub_ph): New intrinsic. (_mm256_mask_fmaddsub_ph): Likewise. (_mm256_mask3_fmaddsub_ph): Likewise. (_mm256_maskz_fmaddsub_ph): Likewise. (_mm_fmaddsub_ph): Likewise. (_mm_mask_fmaddsub_ph): Likewise. (_mm_mask3_fmaddsub_ph): Likewise. (_mm_maskz_fmaddsub_ph): Likewise. (_mm256_fmsubadd_ph): Likewise. (_mm256_mask_fmsubadd_ph): Likewise. (_mm256_mask3_fmsubadd_ph): Likewise. (_mm256_maskz_fmsubadd_ph): Likewise. (_mm_fmsubadd_ph): Likewise. (_mm_mask_fmsubadd_ph): Likewise. (_mm_mask3_fmsubadd_ph): Likewise. (_mm_maskz_fmsubadd_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (VFH_SF_AVX512VL): New mode iterator. * (<avx512>_fmsubadd_<mode>_maskz<round_expand_name>): New expander. * (<avx512>_fmaddsub_<mode>_maskz<round_expand_name>): Use VFH_SF_AVX512VL. * (<sd_mask_codefor>fma_fmaddsub_<mode><sd_maskz_name><round_name>): Ditto. * (<avx512>_fmaddsub_<mode>_mask<round_name>): Ditto. * (<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto. * (<sd_mask_codefor>fma_fmsubadd_<mode><sd_maskz_name><round_name>): Ditto. * (<avx512>_fmsubadd_<mode>_mask<round_name>): Ditto. * (<avx512>_fmsubadd_<mode>_mask3<round_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. 
* gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
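The `fmaddsub` and `fmsubadd` intrinsics from this entry alternate between adding and subtracting the third operand from lane to lane. The sketch below models that alternation in scalar C, following the convention of the existing x86 FMA instructions (fmaddsub adds in odd lanes and subtracts in even lanes; fmsubadd does the opposite). The `model_*` names are invented, `float` stands in for the half-precision type, and fused rounding is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: odd lanes get a*b + c, even lanes get a*b - c. */
static void model_fmaddsub(const float *a, const float *b, const float *c,
                           float *dst, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        float p = a[i] * b[i];
        dst[i] = (i & 1) ? p + c[i] : p - c[i];
    }
}

/* Hypothetical model: odd lanes get a*b - c, even lanes get a*b + c. */
static void model_fmsubadd(const float *a, const float *b, const float *c,
                           float *dst, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        float p = a[i] * b[i];
        dst[i] = (i & 1) ? p - c[i] : p + c[i];
    }
}
```

This alternating pattern is the building block for interleaved complex arithmetic, where real and imaginary lanes need opposite signs.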
2021-09-18 | Support embedded broadcast for AVX512FP16 instructions. | liuhongt | 3 | -9/+7
gcc/ChangeLog: PR target/87767 * config/i386/i386.c (ix86_print_operand): Handle V8HF/V16HF/V32HFmode. * config/i386/i386.h (VALID_BCST_MODE_P): Add HFmode. * config/i386/sse.md (avx512bcst): Remove. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-broadcast-1.c: New test. * gcc.target/i386/avx512fp16-broadcast-2.c: New test.
2021-09-17 | openacc: Remove unnecessary barriers (gimple worker partitioning/broadcast) | Julian Brown | 1 | -3/+8
This is an optimisation for middle-end worker-partitioning support (used to support multiple workers on AMD GCN). At present, barriers may be emitted in cases where they aren't needed and cannot be optimised away. This patch stops the extraneous barriers from being emitted in the first place. One exception to the above (where the barrier is still needed) is for predicated blocks of code that perform a write to gang-private shared memory from one worker. We must execute a barrier before other workers read that shared memory location. gcc/ * config/gcn/gcn.c (gimple.h): Include. (gcn_fork_join): Emit barrier for worker-level joins. * omp-oacc-neuter-broadcast.cc (find_local_vars_to_propagate): Add writes_gang_private bitmap parameter. Set bit for blocks containing gang-private variable writes. (worker_single_simple): Don't emit barrier after predicated block. (worker_single_copy): Don't emit barrier if we're not broadcasting anything and the block contains no gang-private writes. (neuter_worker_single): Don't predicate blocks that only contain NOPs or internal marker functions. Pass has_gang_private_write argument to worker_single_copy. (oacc_do_neutering): Add writes_gang_private bitmap handling.
2021-09-17 | openacc: Shared memory layout optimisation | Julian Brown | 5 | -67/+92
This patch implements an algorithm to lay out local data-share (LDS) space. It currently works for AMD GCN. At the moment, LDS is used for three things:

1. Gang-private variables
2. Reduction temporaries (accumulators)
3. Broadcasting for worker partitioning

After the patch is applied, (2) and (3) are placed at preallocated locations in LDS, and (1) continues to be handled by the backend (as it is at present prior to this patch being applied). LDS now looks like this:

  +--------------+ (gang-private size + 1024, = 1536)
  | free space   |
  |    ...       |
  | - - - - - - -|
  | worker bcast |
  +--------------+
  | reductions   |
  +--------------+ <<< -mgang-private-size=<number> (def. 512)
  | gang-private |
  |     vars     |
  +--------------+ (32)
  | low LDS vars |
  +--------------+ LDS base

So, gang-private space is fixed at a constant amount at compile time (which can be increased with a command-line switch if necessary for some given code). The layout algorithm takes out a slice of the remainder of usable space for reduction vars, and uses the rest for worker partitioning.

The partitioning algorithm works as follows.

1. An "adjacency" set is built up for each basic block that might do a broadcast. This is calculated by starting at each such block, and doing a recursive DFS walk over successors to find the next block (or blocks) that *also* does a broadcast (dfs_broadcast_reachable_1).
2. The adjacency set is inverted to get adjacent predecessor blocks also.
3. Blocks that will perform a broadcast are sorted by size of that broadcast: the biggest blocks are handled first.
4. A splay tree structure is used to calculate the spans of LDS memory that are already allocated by the blocks adjacent to this one (merge_ranges{,_1}).
5. The current block's broadcast space is allocated from the first free span not allocated in the splay tree structure calculated above (first_fit_range). This seems to work quite nicely and efficiently with the splay tree structure.
6. Continue with the next-biggest broadcast block until we're done.

In this way, "adjacent" broadcasts will not use the same piece of LDS memory.

PR96334 "openacc: Unshare reduction temporaries for GCN" got merged in: The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>) for reduction temporaries. Unlike e.g. var decls and SSA names, these nodes cannot be shared during gimplification, but are so in some circumstances. This is detected when appropriate --enable-checking options are used. This patch unshares such nodes when they are reused more than once.

gcc/ * config/gcn/gcn-protos.h (gcn_goacc_create_worker_broadcast_record): Update prototype. * config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use preallocated block of LDS memory. Do not cache/share decls for reduction temporaries between invocations. (gcn_goacc_reduction_teardown): Unshare VAR on second use. (gcn_goacc_create_worker_broadcast_record): Add OFFSET parameter and return temporary LDS space at that offset. Return pointer in "sender" case. * config/gcn/gcn.c (acc_lds_size, gang_private_hwm, lds_allocs): New global vars. (ACC_LDS_SIZE): Define as acc_lds_size. (gcn_init_machine_status): Don't initialise lds_allocated, lds_allocs, reduc_decls fields of machine function struct. (gcn_option_override): Handle default size for gang-private variables and -mgang-private-size option. (gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when initialising M0_REG. (gcn_shared_mem_layout): New function. (gcn_print_lds_decl): Update comment. Use global lds_allocs map and gang_private_hwm variable. (TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook. * config/gcn/gcn.h (machine_function): Remove lds_allocated, lds_allocs, reduc_decls. Add reduction_base, reduction_limit. * config/gcn/gcn.opt (gang_private_size_opt): New global. (mgang-private-size=): New option. * doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place documentation hook. * doc/tm.texi: Regenerate. 
* omp-oacc-neuter-broadcast.cc (targhooks.h, diagnostic-core.h): Add includes. (build_sender_ref): Handle sender_decl being pointer. (worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS parameters. Pass placement argument to create_worker_broadcast_record hook invocations. Handle sender_decl being pointer and isolate_broadcasts inserting extra barriers. (blk_offset_map_t): Add typedef. (neuter_worker_single): Add BLK_OFFSET_MAP parameter. Pass preallocated range to worker_single_copy call. (dfs_broadcast_reachable_1): New function. (idx_decl_pair_t, used_range_vec_t): New typedefs. (sort_size_descending): New function. (addr_range): New class. (splay_tree_compare_addr_range, splay_tree_free_key) (first_fit_range, merge_ranges_1, merge_ranges): New functions. (execute_omp_oacc_neuter_broadcast): Rename to... (oacc_do_neutering): ... this. Add BOUNDS_LO, BOUNDS_HI parameters. Arrange layout of shared memory for broadcast operations. (execute_omp_oacc_neuter_broadcast): New function. (pass_omp_oacc_neuter_broadcast::gate): Remove num_workers==1 handling from here. Enable pass for all OpenACC routines in order to call shared memory-layout hook. * target.def (create_worker_broadcast_record): Add OFFSET parameter. (shared_mem_layout): New hook. libgomp/ * testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: Update.
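Steps 3 to 6 of the partitioning algorithm described in this commit message can be sketched in a few dozen lines of C. This is a simplified illustration, not the code from omp-oacc-neuter-broadcast.cc: it uses a sorted array where the patch uses a splay tree, it takes the (already symmetric) adjacency relation from steps 1 and 2 as an input matrix, and every identifier (`range`, `first_fit`, `layout_broadcasts`, `MAX_BLOCKS`, `NO_FIT`) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS 16
#define NO_FIT ((size_t)-1)

typedef struct { size_t lo, hi; } range;     /* half-open span [lo, hi) */

static int cmp_range_lo(const void *pa, const void *pb)
{
    const range *a = pa, *b = pb;
    return (a->lo > b->lo) - (a->lo < b->lo);
}

/* Step 5: lowest offset in [lo, hi) where size bytes fit without touching
   any span in used[] (sorted in place), or NO_FIT. */
static size_t first_fit(range *used, size_t n_used, size_t size,
                        size_t lo, size_t hi)
{
    assert(size > 0 && lo <= hi);
    qsort(used, n_used, sizeof used[0], cmp_range_lo);
    size_t cur = lo;
    for (size_t i = 0; i < n_used; i++) {
        if (used[i].lo > cur && used[i].lo - cur >= size)
            break;                           /* the gap before this span fits */
        if (used[i].hi > cur)
            cur = used[i].hi;                /* skip past the occupied span */
    }
    return (cur <= hi && hi - cur >= size) ? cur : NO_FIT;
}

/* Steps 3 to 6: place the biggest broadcasts first, each one avoiding the
   spans of already-placed adjacent blocks.  Returns 0 on success. */
static int layout_broadcasts(size_t n, const size_t *size,
                             unsigned char adj[MAX_BLOCKS][MAX_BLOCKS],
                             size_t lo, size_t hi, size_t *offset)
{
    size_t order[MAX_BLOCKS];
    assert(n <= MAX_BLOCKS);
    for (size_t i = 0; i < n; i++) {
        order[i] = i;
        offset[i] = NO_FIT;
    }
    for (size_t i = 1; i < n; i++) {         /* insertion sort, descending */
        size_t k = order[i], j = i;
        while (j > 0 && size[order[j - 1]] < size[k]) {
            order[j] = order[j - 1];
            j--;
        }
        order[j] = k;
    }
    for (size_t o = 0; o < n; o++) {
        size_t b = order[o], n_used = 0;
        range used[MAX_BLOCKS];
        for (size_t j = 0; j < n; j++)       /* step 4: neighbours' spans */
            if (adj[b][j] && offset[j] != NO_FIT) {
                used[n_used].lo = offset[j];
                used[n_used].hi = offset[j] + size[j];
                n_used++;
            }
        offset[b] = first_fit(used, n_used, size[b], lo, hi);
        if (offset[b] == NO_FIT)
            return -1;
    }
    return 0;
}
```

In a chain of three blocks where only neighbours are adjacent, the two end blocks may be given the same offset, since they never broadcast back to back; only adjacent blocks are kept apart.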
2021-09-17 | rs6000: Support for vectorizing built-in functions | Bill Schmidt | 1 | -0/+257
This patch just duplicates a couple of functions and adjusts them to use the new builtin names. There's no logical change otherwise. 2021-09-17 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000.c (rs6000-builtins.h): New include. (rs6000_new_builtin_vectorized_function): New function. (rs6000_new_builtin_md_vectorized_function): Likewise. (rs6000_builtin_vectorized_function): Call rs6000_new_builtin_vectorized_function. (rs6000_builtin_md_vectorized_function): Call rs6000_new_builtin_md_vectorized_function.
2021-09-17 | rs6000: Handle some recent MMA builtin changes | Bill Schmidt | 3 | -86/+138
Peter Bergner recently added two new builtins __builtin_vsx_lxvp and __builtin_vsx_stxvp. These happened to break a pattern in MMA builtins that I had been using to automate gimple folding of MMA builtins. Previously, every MMA function that could be folded had an associated internal function that it was folded into. The LXVP/STXVP builtins are just folded directly into memory operations. Instead of relying on this pattern, this patch adds a new attribute to builtins called "mmaint," which is set for all MMA builtins that have an associated internal builtin. The naming convention that adds _INTERNAL to the builtin index name remains. The rest of the patch is just duplicating Peter's patch, using the new builtin infrastructure. 2021-09-17 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag. (ASSEMBLE_PAIR): Likewise. (BUILD_ACC): Likewise. (DISASSEMBLE_ACC): Likewise. (DISASSEMBLE_PAIR): Likewise. (PMXVBF16GER2): Likewise. (PMXVBF16GER2NN): Likewise. (PMXVBF16GER2NP): Likewise. (PMXVBF16GER2PN): Likewise. (PMXVBF16GER2PP): Likewise. (PMXVF16GER2): Likewise. (PMXVF16GER2NN): Likewise. (PMXVF16GER2NP): Likewise. (PMXVF16GER2PN): Likewise. (PMXVF16GER2PP): Likewise. (PMXVF32GER): Likewise. (PMXVF32GERNN): Likewise. (PMXVF32GERNP): Likewise. (PMXVF32GERPN): Likewise. (PMXVF32GERPP): Likewise. (PMXVF64GER): Likewise. (PMXVF64GERNN): Likewise. (PMXVF64GERNP): Likewise. (PMXVF64GERPN): Likewise. (PMXVF64GERPP): Likewise. (PMXVI16GER2): Likewise. (PMXVI16GER2PP): Likewise. (PMXVI16GER2S): Likewise. (PMXVI16GER2SPP): Likewise. (PMXVI4GER8): Likewise. (PMXVI4GER8PP): Likewise. (PMXVI8GER4): Likewise. (PMXVI8GER4PP): Likewise. (PMXVI8GER4SPP): Likewise. (XVBF16GER2): Likewise. (XVBF16GER2NN): Likewise. (XVBF16GER2NP): Likewise. (XVBF16GER2PN): Likewise. (XVBF16GER2PP): Likewise. (XVF16GER2): Likewise. (XVF16GER2NN): Likewise. (XVF16GER2NP): Likewise. (XVF16GER2PN): Likewise. (XVF16GER2PP): Likewise. 
(XVF32GER): Likewise. (XVF32GERNN): Likewise. (XVF32GERNP): Likewise. (XVF32GERPN): Likewise. (XVF32GERPP): Likewise. (XVF64GER): Likewise. (XVF64GERNN): Likewise. (XVF64GERNP): Likewise. (XVF64GERPN): Likewise. (XVF64GERPP): Likewise. (XVI16GER2): Likewise. (XVI16GER2PP): Likewise. (XVI16GER2S): Likewise. (XVI16GER2SPP): Likewise. (XVI4GER8): Likewise. (XVI4GER8PP): Likewise. (XVI8GER4): Likewise. (XVI8GER4PP): Likewise. (XVI8GER4SPP): Likewise. (XXMFACC): Likewise. (XXMTACC): Likewise. (XXSETACCZ): Likewise. (ASSEMBLE_PAIR_V): Likewise. (BUILD_PAIR): Likewise. (DISASSEMBLE_PAIR_V): Likewise. (LXVP): New. (STXVP): New. * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and RS6000_BIF_STXVP. * config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint. (parse_bif_attrs): Handle ismmaint. (write_decls): Add bif_mmaint_bit and bif_is_mmaint. (write_bif_static_init): Handle ismmaint.
2021-09-17rs6000: Handle gimple folding of target built-insBill Schmidt1-0/+1165
This is another patch that looks bigger than it really is. Because we have a new namespace for the builtins, allowing us to have both the old and new builtin infrastructure supported at once, we need versions of these functions that use the new builtin namespace. Otherwise the code is unchanged. 2021-09-17 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin): New forward decl. (rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin. (rs6000_new_builtin_valid_without_lhs): New function. (rs6000_gimple_fold_new_mma_builtin): Likewise. (rs6000_gimple_fold_new_builtin): Likewise.
2021-09-17rs6000: Move __builtin_mffsl to the [always] stanzaBill Schmidt1-3/+6
I over-restricted use of __builtin_mffsl, since I was unaware that it automatically uses mffs when mffsl is not available. Paul Clarke pointed this out in discussion of his SSE 4.1 compatibility patches. 2021-08-31 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000-builtin-new.def (__builtin_mffsl): Move from [power9] to [always].
2021-09-17x86: Add TARGET_SSE_PARTIAL_REG_[FP_]CONVERTS_DEPENDENCYH.J. Lu4-5/+29
1. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY in SSE FP to FP splitters. 2. Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY in SSE INT to FP splitters. 3. Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and TARGET_SSE_PARTIAL_REG_DEPENDENCY when handling avx_partial_xmm_update attribute. Don't convert AVX partial XMM register update if there is no partial SSE register dependency for SSE conversion. gcc/ * config/i386/i386-features.c (remove_partial_avx_dependency): Also check TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY and TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY before generating vxorps. * config/i386/i386.h (TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. (TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. * config/i386/i386.md (SSE FP to FP splitters): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY. (SSE INT to FP splitter): Replace TARGET_SSE_PARTIAL_REG_DEPENDENCY with TARGET_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY. * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_FP_CONVERTS_DEPENDENCY): New. (X86_TUNE_SSE_PARTIAL_REG_CONVERTS_DEPENDENCY): Likewise. gcc/testsuite/ * gcc.target/i386/avx-covert-1.c: New file. * gcc.target/i386/avx-fp-covert-1.c: Likewise. * gcc.target/i386/avx-int-covert-1.c: Likewise. * gcc.target/i386/sse-covert-1.c: Likewise. * gcc.target/i386/sse-fp-covert-1.c: Likewise. * gcc.target/i386/sse-int-covert-1.c: Likewise.
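For context, the two kinds of conversion that the new tuning flags distinguish can be seen in ordinary scalar C. The sketch below is illustrative only: the function names are hypothetical, and the instructions named in the comments are what x86 compilers typically emit for SSE scalar code, not a guarantee for any particular GCC version or -mtune setting. Both conversion instructions write only the low element of the destination XMM register, which is why a compiler may first zero the register to break the dependency on its previous contents.

```c
/* Hypothetical example functions, not taken from the GCC testsuite.  */

/* INT to FP conversion: typically a cvtsi2ss instruction, the case
   covered by the "CONVERTS" dependency flag in the commit above.  */
float int_to_float(int i)
{
    return (float)i;
}

/* FP to FP conversion: typically a cvtss2sd instruction, the case
   covered by the "FP_CONVERTS" dependency flag in the commit above.  */
double float_to_double(float f)
{
    return (double)f;
}
```

Comparing the assembly produced with different -mtune values (for example with -O2 -S) is one way to see whether a zeroing instruction is placed before the conversion.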
2021-09-17x86: Properly handle USE_VECTOR_FP_CONVERTS/USE_VECTOR_CONVERTSH.J. Lu1-3/+20
Check TARGET_USE_VECTOR_FP_CONVERTS or TARGET_USE_VECTOR_CONVERTS when handling avx_partial_xmm_update attribute. Don't convert AVX partial XMM register update if vector packed SSE conversion should be used. gcc/ PR target/101900 * config/i386/i386-features.c (remove_partial_avx_dependency): Check TARGET_USE_VECTOR_FP_CONVERTS and TARGET_USE_VECTOR_CONVERTS before generating vxorps. gcc/testsuite PR target/101900 * gcc.target/i386/pr101900-1.c: New test. * gcc.target/i386/pr101900-2.c: Likewise. * gcc.target/i386/pr101900-3.c: Likewise.
2021-09-17x86: Update memcpy/memset inline strategies for -mtune=tremontH.J. Lu3-2/+126
Simplify memcpy and memset inline strategies to avoid branches for -mtune=tremont: 1. Create Tremont cost model from generic cost model. 2. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector load and store for up to 16 * 16 (256) bytes when the data size is fixed and known. 3. Inline only if data size is known to be <= 256. a. Use "rep movsb/stosb" with simple code sequence if the data size is a constant. b. Use loop if data size is not a constant. 4. Use memcpy/memset library function if data size is unknown or > 256. * config/i386/i386-options.c (processor_cost_table): Use tremont_cost for Tremont. * config/i386/x86-tune-costs.h (tremont_memcpy): New. (tremont_memset): Likewise. (tremont_cost): Likewise. * config/i386/x86-tune.def (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB): Enable for Tremont.
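Items 3 and 4 of the strategy above can be read as a small decision function. The following is a minimal sketch under stated assumptions, not GCC's implementation: the enum, the TREMONT_INLINE_LIMIT macro, and choose_expand_strategy are hypothetical names, and the by-pieces load/store expansion from item 2 is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these exist in GCC.  */
enum expand_strategy {
    EXPAND_REP_MOVSB_STOSB, /* item 3a: "rep movsb/stosb" sequence */
    EXPAND_LOOP,            /* item 3b: inline loop */
    EXPAND_LIBCALL          /* item 4: call the memcpy/memset library function */
};

/* The 256-byte limit quoted in the commit message.  */
#define TREMONT_INLINE_LIMIT 256

/* Pick a strategy from what is known about the size at compile time.
   bound_is_known says whether max_size is a valid upper bound.  */
enum expand_strategy
choose_expand_strategy(bool size_is_constant, bool bound_is_known,
                       size_t max_size)
{
    /* Unknown size, or possibly larger than the limit: library call.  */
    if (!bound_is_known || max_size > TREMONT_INLINE_LIMIT)
        return EXPAND_LIBCALL;
    /* Known to fit: constant sizes use rep movsb/stosb, others a loop.  */
    return size_is_constant ? EXPAND_REP_MOVSB_STOSB : EXPAND_LOOP;
}
```

In GCC itself this choice is driven by the per-size tables in the cost model (tremont_memcpy and tremont_memset in the ChangeLog above) rather than by a function of this shape.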
2021-09-17x86: Update -mtune=tremontH.J. Lu3-18/+22
Initial -mtune=tremont update 1. Use Haswell scheduling model. 2. Assume that the stack engine allows push and pop instructions to execute in parallel. 3. Prepare for scheduling pass as -mtune=generic. 4. Use the same issue rate as -mtune=generic. 5. Enable partial_reg_dependency. 6. Disable accumulate_outgoing_args 7. Enable use_leave 8. Enable push_memory 9. Disable four_jump_limit 10. Disable opt_agu 11. Disable avoid_lea_for_addr 12. Disable avoid_mem_opnd_for_cmove 13. Enable misaligned_move_string_pro_epilogues 14. Enable use_cltd 16. Enable avoid_false_dep_for_bmi 17. Enable avoid_mfence 18. Disable expand_abs 19. Enable sse_typeless_stores 20. Enable sse_load0_by_pxor 21. Disable split_mem_opnd_for_fp_converts 22. Disable slow_pshufb 23. Enable partial_reg_dependency This is the first patch to tune for Tremont. With all patches applied, performance impacts on SPEC CPU 2017 are: 500.perlbench_r 1.81% 502.gcc_r 0.57% 505.mcf_r 1.16% 520.omnetpp_r 0.00% 523.xalancbmk_r 0.00% 525.x264_r 4.55% 531.deepsjeng_r 0.00% 541.leela_r 0.39% 548.exchange2_r 1.13% 557.xz_r 0.00% geomean for intrate 0.95% 503.bwaves_r 0.00% 507.cactuBSSN_r 6.94% 508.namd_r 12.37% 510.parest_r 1.01% 511.povray_r 3.70% 519.lbm_r 36.61% 521.wrf_r 8.79% 526.blender_r 2.91% 527.cam4_r 6.23% 538.imagick_r 0.28% 544.nab_r 21.99% 549.fotonik3d_r 3.63% 554.roms_r -1.20% geomean for fprate 7.50% gcc/ChangeLog * common/config/i386/i386-common.c: Use Haswell scheduling model for Tremont. * config/i386/i386.c (ix86_sched_init_global): Prepare for Tremont scheduling pass. * config/i386/x86-tune-sched.c (ix86_issue_rate): Change Tremont issue rate to 4. (ix86_adjust_cost): Handle Tremont. * config/i386/x86-tune.def (X86_TUNE_SSE_PARTIAL_REG_DEPENDENCY): Enable for Tremont. (X86_TUNE_USE_LEAVE): Likewise. (X86_TUNE_PUSH_MEMORY): Likewise. (X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES): Likewise. (X86_TUNE_USE_CLTD): Likewise. (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI): Likewise. (X86_TUNE_AVOID_MFENCE): Likewise. 
(X86_TUNE_SSE_TYPELESS_STORES): Likewise. (X86_TUNE_SSE_LOAD0_BY_PXOR): Likewise. (X86_TUNE_ACCUMULATE_OUTGOING_ARGS): Disable for Tremont. (X86_TUNE_FOUR_JUMP_LIMIT): Likewise. (X86_TUNE_OPT_AGU): Likewise. (X86_TUNE_AVOID_LEA_FOR_ADDR): Likewise. (X86_TUNE_AVOID_MEM_OPND_FOR_CMOVE): Likewise. (X86_TUNE_EXPAND_ABS): Likewise. (X86_TUNE_SPLIT_MEM_OPND_FOR_FP_CONVERTS): Likewise. (X86_TUNE_SLOW_PSHUFB): Likewise.
2021-09-17AVX512FP16: Add intrinsics for casting between vector float16 and vector float32/float64/integer.liuhongt2-0/+270
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_undefined_ph): New intrinsic. (_mm256_undefined_ph): Likewise. (_mm512_undefined_ph): Likewise. (_mm_cvtsh_h): Likewise. (_mm256_cvtsh_h): Likewise. (_mm512_cvtsh_h): Likewise. (_mm512_castph_ps): Likewise. (_mm512_castph_pd): Likewise. (_mm512_castph_si512): Likewise. (_mm512_castph512_ph128): Likewise. (_mm512_castph512_ph256): Likewise. (_mm512_castph128_ph512): Likewise. (_mm512_castph256_ph512): Likewise. (_mm512_zextph128_ph512): Likewise. (_mm512_zextph256_ph512): Likewise. (_mm512_castps_ph): Likewise. (_mm512_castpd_ph): Likewise. (_mm512_castsi512_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_castph_ps): New intrinsic. (_mm256_castph_ps): Likewise. (_mm_castph_pd): Likewise. (_mm256_castph_pd): Likewise. (_mm_castph_si128): Likewise. (_mm256_castph_si256): Likewise. (_mm_castps_ph): Likewise. (_mm256_castps_ph): Likewise. (_mm_castpd_ph): Likewise. (_mm256_castpd_ph): Likewise. (_mm_castsi128_ph): Likewise. (_mm256_castsi256_ph): Likewise. (_mm256_castph256_ph128): Likewise. (_mm256_castph128_ph256): Likewise. (_mm256_zextph128_ph256): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-typecast-1.c: New test. * gcc.target/i386/avx512fp16-typecast-2.c: Ditto. * gcc.target/i386/avx512fp16vl-typecast-1.c: Ditto. * gcc.target/i386/avx512fp16vl-typecast-2.c: Ditto.
2021-09-17AVX512FP16: Add vcvtsh2ss/vcvtsh2sd/vcvtss2sh/vcvtsd2sh.liuhongt5-1/+356
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_cvtsh_ss): New intrinsic. (_mm_mask_cvtsh_ss): Likewise. (_mm_maskz_cvtsh_ss): Likewise. (_mm_cvtsh_sd): Likewise. (_mm_mask_cvtsh_sd): Likewise. (_mm_maskz_cvtsh_sd): Likewise. (_mm_cvt_roundsh_ss): Likewise. (_mm_mask_cvt_roundsh_ss): Likewise. (_mm_maskz_cvt_roundsh_ss): Likewise. (_mm_cvt_roundsh_sd): Likewise. (_mm_mask_cvt_roundsh_sd): Likewise. (_mm_maskz_cvt_roundsh_sd): Likewise. (_mm_cvtss_sh): Likewise. (_mm_mask_cvtss_sh): Likewise. (_mm_maskz_cvtss_sh): Likewise. (_mm_cvtsd_sh): Likewise. (_mm_mask_cvtsd_sh): Likewise. (_mm_maskz_cvtsd_sh): Likewise. (_mm_cvt_roundss_sh): Likewise. (_mm_mask_cvt_roundss_sh): Likewise. (_mm_maskz_cvt_roundss_sh): Likewise. (_mm_cvt_roundsd_sh): Likewise. (_mm_mask_cvt_roundsd_sh): Likewise. (_mm_maskz_cvt_roundsd_sh): Likewise. * config/i386/i386-builtin-types.def (V8HF_FTYPE_V2DF_V8HF_V8HF_UQI_INT, V8HF_FTYPE_V4SF_V8HF_V8HF_UQI_INT, V2DF_FTYPE_V8HF_V2DF_V2DF_UQI_INT, V4SF_FTYPE_V8HF_V4SF_V4SF_UQI_INT): Add new builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c: Handle new builtin types. * config/i386/sse.md (VF48_128): New mode iterator. (avx512fp16_vcvtsh2<ssescalarmodesuffix><mask_scalar_name><round_saeonly_scalar_name>): New. (avx512fp16_vcvt<ssescalarmodesuffix>2sh<mask_scalar_name><round_scalar_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-17AVX512FP16: Add vcvtph2pd/vcvtph2psx/vcvtpd2ph/vcvtps2phx.liuhongt6-3/+749
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cvtph_pd): New intrinsic. (_mm512_mask_cvtph_pd): Likewise. (_mm512_maskz_cvtph_pd): Likewise. (_mm512_cvt_roundph_pd): Likewise. (_mm512_mask_cvt_roundph_pd): Likewise. (_mm512_maskz_cvt_roundph_pd): Likewise. (_mm512_cvtxph_ps): Likewise. (_mm512_mask_cvtxph_ps): Likewise. (_mm512_maskz_cvtxph_ps): Likewise. (_mm512_cvtx_roundph_ps): Likewise. (_mm512_mask_cvtx_roundph_ps): Likewise. (_mm512_maskz_cvtx_roundph_ps): Likewise. (_mm512_cvtxps_ph): Likewise. (_mm512_mask_cvtxps_ph): Likewise. (_mm512_maskz_cvtxps_ph): Likewise. (_mm512_cvtx_roundps_ph): Likewise. (_mm512_mask_cvtx_roundps_ph): Likewise. (_mm512_maskz_cvtx_roundps_ph): Likewise. (_mm512_cvtpd_ph): Likewise. (_mm512_mask_cvtpd_ph): Likewise. (_mm512_maskz_cvtpd_ph): Likewise. (_mm512_cvt_roundpd_ph): Likewise. (_mm512_mask_cvt_roundpd_ph): Likewise. (_mm512_maskz_cvt_roundpd_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cvtph_pd): New intrinsic. (_mm_mask_cvtph_pd): Likewise. (_mm_maskz_cvtph_pd): Likewise. (_mm256_cvtph_pd): Likewise. (_mm256_mask_cvtph_pd): Likewise. (_mm256_maskz_cvtph_pd): Likewise. (_mm_cvtxph_ps): Likewise. (_mm_mask_cvtxph_ps): Likewise. (_mm_maskz_cvtxph_ps): Likewise. (_mm256_cvtxph_ps): Likewise. (_mm256_mask_cvtxph_ps): Likewise. (_mm256_maskz_cvtxph_ps): Likewise. (_mm_cvtxps_ph): Likewise. (_mm_mask_cvtxps_ph): Likewise. (_mm_maskz_cvtxps_ph): Likewise. (_mm256_cvtxps_ph): Likewise. (_mm256_mask_cvtxps_ph): Likewise. (_mm256_maskz_cvtxps_ph): Likewise. (_mm_cvtpd_ph): Likewise. (_mm_mask_cvtpd_ph): Likewise. (_mm_maskz_cvtpd_ph): Likewise. (_mm256_cvtpd_ph): Likewise. (_mm256_mask_cvtpd_ph): Likewise. (_mm256_maskz_cvtpd_ph): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-expand.c: Handle new builtin types. * config/i386/sse.md (VF4_128_8_256): New. (VF48H_AVX512VL): Ditto. 
(ssePHmode): Add HF vector modes. (castmode): Add new convertible modes. (qq2phsuff): Ditto. (ph2pssuffix): New. (avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>): Ditto. (avx512fp16_vcvt<castmode>2ph_<mode>): Ditto. (*avx512fp16_vcvt<castmode>2ph_<mode>): Ditto. (avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto. (*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Ditto. (*avx512fp16_vcvt<castmode>2ph_<mode>_mask_1): Ditto. (avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>): Ditto. (avx512fp16_float_extend_ph<mode>2<mask_name>): Ditto. (*avx512fp16_float_extend_ph<mode>2_load<mask_name>): Ditto. (avx512fp16_float_extend_phv2df2<mask_name>): Ditto. (*avx512fp16_float_extend_phv2df2_load<mask_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-17AVX512FP16: Add vcvttsh2si/vcvttsh2usi.liuhongt3-0/+107
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_cvttsh_i32): New intrinsic. (_mm_cvttsh_u32): Likewise. (_mm_cvtt_roundsh_i32): Likewise. (_mm_cvtt_roundsh_u32): Likewise. (_mm_cvttsh_i64): Likewise. (_mm_cvttsh_u64): Likewise. (_mm_cvtt_roundsh_i64): Likewise. (_mm_cvtt_roundsh_u64): Likewise. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/sse.md (avx512fp16_fix<fixunssuffix>_trunc<mode>2<round_saeonly_name>): New. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512fp16-vcvttsh2si-1a.c: New test. * gcc.target/i386/avx512fp16-vcvttsh2si-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2si64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2si64-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi-1b.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi64-1a.c: Ditto. * gcc.target/i386/avx512fp16-vcvttsh2usi64-1b.c: Ditto. * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-17AVX512FP16: Add vcvttph2w/vcvttph2uw/vcvttph2dq/vcvttph2qq/vcvttph2udq/vcvttph2uqqliuhongt4-0/+982
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cvttph_epi32): New intrinsic. (_mm512_mask_cvttph_epi32): Likewise. (_mm512_maskz_cvttph_epi32): Likewise. (_mm512_cvtt_roundph_epi32): Likewise. (_mm512_mask_cvtt_roundph_epi32): Likewise. (_mm512_maskz_cvtt_roundph_epi32): Likewise. (_mm512_cvttph_epu32): Likewise. (_mm512_mask_cvttph_epu32): Likewise. (_mm512_maskz_cvttph_epu32): Likewise. (_mm512_cvtt_roundph_epu32): Likewise. (_mm512_mask_cvtt_roundph_epu32): Likewise. (_mm512_maskz_cvtt_roundph_epu32): Likewise. (_mm512_cvttph_epi64): Likewise. (_mm512_mask_cvttph_epi64): Likewise. (_mm512_maskz_cvttph_epi64): Likewise. (_mm512_cvtt_roundph_epi64): Likewise. (_mm512_mask_cvtt_roundph_epi64): Likewise. (_mm512_maskz_cvtt_roundph_epi64): Likewise. (_mm512_cvttph_epu64): Likewise. (_mm512_mask_cvttph_epu64): Likewise. (_mm512_maskz_cvttph_epu64): Likewise. (_mm512_cvtt_roundph_epu64): Likewise. (_mm512_mask_cvtt_roundph_epu64): Likewise. (_mm512_maskz_cvtt_roundph_epu64): Likewise. (_mm512_cvttph_epi16): Likewise. (_mm512_mask_cvttph_epi16): Likewise. (_mm512_maskz_cvttph_epi16): Likewise. (_mm512_cvtt_roundph_epi16): Likewise. (_mm512_mask_cvtt_roundph_epi16): Likewise. (_mm512_maskz_cvtt_roundph_epi16): Likewise. (_mm512_cvttph_epu16): Likewise. (_mm512_mask_cvttph_epu16): Likewise. (_mm512_maskz_cvttph_epu16): Likewise. (_mm512_cvtt_roundph_epu16): Likewise. (_mm512_mask_cvtt_roundph_epu16): Likewise. (_mm512_maskz_cvtt_roundph_epu16): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cvttph_epi32): New intrinsic. (_mm_mask_cvttph_epi32): Likewise. (_mm_maskz_cvttph_epi32): Likewise. (_mm256_cvttph_epi32): Likewise. (_mm256_mask_cvttph_epi32): Likewise. (_mm256_maskz_cvttph_epi32): Likewise. (_mm_cvttph_epu32): Likewise. (_mm_mask_cvttph_epu32): Likewise. (_mm_maskz_cvttph_epu32): Likewise. (_mm256_cvttph_epu32): Likewise. (_mm256_mask_cvttph_epu32): Likewise. 
(_mm256_maskz_cvttph_epu32): Likewise. (_mm_cvttph_epi64): Likewise. (_mm_mask_cvttph_epi64): Likewise. (_mm_maskz_cvttph_epi64): Likewise. (_mm256_cvttph_epi64): Likewise. (_mm256_mask_cvttph_epi64): Likewise. (_mm256_maskz_cvttph_epi64): Likewise. (_mm_cvttph_epu64): Likewise. (_mm_mask_cvttph_epu64): Likewise. (_mm_maskz_cvttph_epu64): Likewise. (_mm256_cvttph_epu64): Likewise. (_mm256_mask_cvttph_epu64): Likewise. (_mm256_maskz_cvttph_epu64): Likewise. (_mm_cvttph_epi16): Likewise. (_mm_mask_cvttph_epi16): Likewise. (_mm_maskz_cvttph_epi16): Likewise. (_mm256_cvttph_epi16): Likewise. (_mm256_mask_cvttph_epi16): Likewise. (_mm256_maskz_cvttph_epi16): Likewise. (_mm_cvttph_epu16): Likewise. (_mm_mask_cvttph_epu16): Likewise. (_mm_maskz_cvttph_epu16): Likewise. (_mm256_cvttph_epu16): Likewise. (_mm256_mask_cvttph_epu16): Likewise. (_mm256_maskz_cvttph_epu16): Likewise. * config/i386/i386-builtin.def: Add new builtins. * config/i386/sse.md (avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>): New. (avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>): Ditto. (*avx512fp16_fix<fixunssuffix>_trunc<mode>2_load<mask_name>): Ditto. (avx512fp16_fix<fixunssuffix>_truncv2di2<mask_name>): Ditto. (avx512fp16_fix<fixunssuffix>_truncv2di2_load<mask_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-17AVX512FP16: Add vcvtsh2si/vcvtsh2usi/vcvtsi2sh/vcvtusi2sh.liuhongt5-0/+221
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm_cvtsh_i32): New intrinsic. (_mm_cvtsh_u32): Likewise. (_mm_cvt_roundsh_i32): Likewise. (_mm_cvt_roundsh_u32): Likewise. (_mm_cvtsh_i64): Likewise. (_mm_cvtsh_u64): Likewise. (_mm_cvt_roundsh_i64): Likewise. (_mm_cvt_roundsh_u64): Likewise. (_mm_cvti32_sh): Likewise. (_mm_cvtu32_sh): Likewise. (_mm_cvt_roundi32_sh): Likewise. (_mm_cvt_roundu32_sh): Likewise. (_mm_cvti64_sh): Likewise. (_mm_cvtu64_sh): Likewise. (_mm_cvt_roundi64_sh): Likewise. (_mm_cvt_roundu64_sh): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_round_builtin): Handle new builtin types. * config/i386/sse.md (avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix><round_name>): New define_insn. (avx512fp16_vcvtsh2<sseintconvertsignprefix>si<rex64namesuffix>_2): Likewise. (avx512fp16_vcvt<floatsuffix>si2sh<rex64namesuffix><round_name>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-16rs6000: Handle overloads during program parsingBill Schmidt3-1/+1132
Although this patch looks quite large, the changes are fairly minimal. Most of it is duplicating the large function that does the overload resolution using the automatically generated data structures instead of the old hand-generated ones. This doesn't make the patch terribly easy to review, unfortunately. Just be aware that generally we aren't changing the logic and functionality of overload handling. 2021-09-16 Bill Schmidt <wschmidt@linux.ibm.com> gcc/ * config/rs6000/rs6000-c.c (rs6000-builtins.h): New include. (altivec_resolve_new_overloaded_builtin): New forward decl. (rs6000_new_builtin_type_compatible): New function. (altivec_resolve_overloaded_builtin): Call altivec_resolve_new_overloaded_builtin. (altivec_build_new_resolved_builtin): New function. (altivec_resolve_new_overloaded_builtin): Likewise. * config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported): Likewise. * config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from name of rs6000_new_builtin_is_supported.
2021-09-16[i386] Change ix86_decompose_address return type to bool.Uros Bizjak2-25/+25
After a recent change only a boolean value is returned. 2021-09-16 Uroš Bizjak <ubizjak@gmail.com> gcc/ * config/i386/i386-protos.h (ix86_decompose_address): Change return type to bool. * config/i386/i386.c (ix86_decompose_address): Ditto.
2021-09-16PowerPC: Fix rs6000-gen-builtins with build != host [PR102353]Tobias Burnus1-10/+7
This mimics what the main Makefile.in does: compile the generator files under build (with Makefile.in's 'build/%.o' rule for compilation). It also adds $(RUN_GEN) to optionally run it with valgrind and the $(build_exeext) suffix. Before, the .o files were compiled with $(COMPILE), causing a link error with $(LINKER_FOR_BUILD) for build != host. gcc/ PR target/102353 * config/rs6000/t-rs6000 (build/rs6000-gen-builtins.o, build/rbtree.o): Added 'build/' to target, use build/%.o rule. (build/rs6000-gen-builtins$(build_exeext)): Add 'build/' and '$(build_exeext)' to target and 'build/' for the *.o files. (rs6000-builtins.c): Update for those changes; run rs6000-gen-builtins with $(RUN_GEN).
2021-09-16sparc: Add scheduling information for LEON5Daniel Cederman6-16/+213
The LEON5 can often dual issue instructions from the same 64-bit aligned double word if there are no data dependencies. Add scheduling information to avoid scheduling unpairable instructions back-to-back. gcc/ChangeLog: * config/sparc/sparc-opts.h (enum sparc_processor_type): Add LEON5 * config/sparc/sparc.c (struct processor_costs): Add LEON5 costs (leon5_adjust_cost): Increase cost of store with data dependency on ALU instruction and FPU anti-dependencies. (sparc_option_override): Add LEON5 costs (sparc_adjust_cost): Add LEON5 cost adjustments * config/sparc/sparc.h: Add LEON5 * config/sparc/sparc.md: Include LEON5 scheduling information * config/sparc/sparc.opt: Add LEON5 * doc/invoke.texi: Add LEON5 * config/sparc/leon5.md: New file.
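The pairing condition described above depends on where two instructions sit in memory. As a rough illustration, assuming only that SPARC instructions are 4 bytes wide, the sketch below tests whether two instruction addresses fall in the same 64-bit aligned double word. The function name is hypothetical, and this is not how the scheduler models the constraint: GCC expresses it through the pipeline description in leon5.md and the cost adjustments listed in the ChangeLog.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when both byte addresses lie in the same
   8-byte (64-bit) aligned block.  With 4-byte instructions, an
   instruction at an 8-byte aligned address shares its double word
   with the instruction that immediately follows it.  */
bool same_aligned_doubleword(uint32_t addr_a, uint32_t addr_b)
{
    /* Dropping the low three bits yields the index of the 8-byte block.  */
    return (addr_a >> 3) == (addr_b >> 3);
}
```

For example, instructions at 0x1000 and 0x1004 share a double word, while those at 0x1004 and 0x1008 do not, even though both pairs are adjacent.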
2021-09-16sparc: Add NOP in stack_protect_set32 if sparc_fix_b2bst enabledDaniel Cederman1-2/+8
This is needed to prevent the Store -> (Non-store or load) -> Store sequence. gcc/ChangeLog: * config/sparc/sparc.md (stack_protect_set32): Add NOP to prevent sensitive sequence for B2BST errata workaround.
2021-09-16sparc: Prevent atomic instructions in beginning of functions for UT700Daniel Cederman1-0/+11
A call to the function might have a load instruction in the delay slot, and a load followed by an atomic instruction could cause a deadlock. gcc/ChangeLog: * config/sparc/sparc.c (sparc_do_work_around_errata): Do not begin functions with atomic instruction in the UT700 errata workaround.
2021-09-16sparc: Skip all empty assembly statementsDaniel Cederman1-14/+21
This version detects multiple empty assembly statements in a row and also detects non-memory barrier empty assembly statements (__asm__("")). It can be used instead of next_active_insn(). gcc/ChangeLog: * config/sparc/sparc.c (next_active_non_empty_insn): New function that returns next active non empty assembly instruction. (sparc_do_work_around_errata): Use new function.
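The two kinds of empty assembly statement mentioned above look like this in source code. This is a small sketch using GNU C extended asm, so it assumes GCC or a compatible compiler; the function name is hypothetical. Neither statement emits a machine instruction, which is why a pass looking for the next real instruction has to skip over them.

```c
/* Hypothetical example function; only the asm statements matter here.  */
int store_then_reload(int *p, int value)
{
    *p = value;

    /* Empty asm with a "memory" clobber: a compiler-level memory barrier.  */
    __asm__ __volatile__("" : : : "memory");

    /* Empty asm without a clobber: not a memory barrier.  */
    __asm__("");

    /* Several empty statements in a row, as the commit describes.  */
    __asm__("");
    __asm__("");

    return *p;
}
```

Compiling a function like this with -S shows that the empty statements contribute no instructions to the output.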
2021-09-16sparc: Treat more instructions as load or store in errata workaroundsDaniel Cederman1-8/+41
Check the attribute of the instruction to determine if it performs a store or load operation. This more generic approach sees the last instruction in the GOTdata_op model as a potential load and treats the memory barrier as a potential store instruction. gcc/ChangeLog: * config/sparc/sparc.c (store_insn_p): Add predicate for store attributes. (load_insn_p): Add predicate for load attributes. (sparc_do_work_around_errata): Use new predicates.
2021-09-16sparc: Print out bit names for LEON and LEON3 with -mdebugAndreas Larsson1-0/+4
gcc/ChangeLog: * config/sparc/sparc.c (dump_target_flag_bits): Print bit names for LEON and LEON3.
2021-09-16mips: Fix macro typoMartin Liska1-1/+1
gcc/ChangeLog: * config/mips/netbsd.h: Fix typo in name of a macro.
2021-09-16AVX512FP16: Add vcvtuw2ph/vcvtw2ph/vcvtdq2ph/vcvtudq2ph/vcvtqq2ph/vcvtuqq2phliuhongt8-3/+993
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cvtepi32_ph): New intrinsic. (_mm512_mask_cvtepi32_ph): Likewise. (_mm512_maskz_cvtepi32_ph): Likewise. (_mm512_cvt_roundepi32_ph): Likewise. (_mm512_mask_cvt_roundepi32_ph): Likewise. (_mm512_maskz_cvt_roundepi32_ph): Likewise. (_mm512_cvtepu32_ph): Likewise. (_mm512_mask_cvtepu32_ph): Likewise. (_mm512_maskz_cvtepu32_ph): Likewise. (_mm512_cvt_roundepu32_ph): Likewise. (_mm512_mask_cvt_roundepu32_ph): Likewise. (_mm512_maskz_cvt_roundepu32_ph): Likewise. (_mm512_cvtepi64_ph): Likewise. (_mm512_mask_cvtepi64_ph): Likewise. (_mm512_maskz_cvtepi64_ph): Likewise. (_mm512_cvt_roundepi64_ph): Likewise. (_mm512_mask_cvt_roundepi64_ph): Likewise. (_mm512_maskz_cvt_roundepi64_ph): Likewise. (_mm512_cvtepu64_ph): Likewise. (_mm512_mask_cvtepu64_ph): Likewise. (_mm512_maskz_cvtepu64_ph): Likewise. (_mm512_cvt_roundepu64_ph): Likewise. (_mm512_mask_cvt_roundepu64_ph): Likewise. (_mm512_maskz_cvt_roundepu64_ph): Likewise. (_mm512_cvtepi16_ph): Likewise. (_mm512_mask_cvtepi16_ph): Likewise. (_mm512_maskz_cvtepi16_ph): Likewise. (_mm512_cvt_roundepi16_ph): Likewise. (_mm512_mask_cvt_roundepi16_ph): Likewise. (_mm512_maskz_cvt_roundepi16_ph): Likewise. (_mm512_cvtepu16_ph): Likewise. (_mm512_mask_cvtepu16_ph): Likewise. (_mm512_maskz_cvtepu16_ph): Likewise. (_mm512_cvt_roundepu16_ph): Likewise. (_mm512_mask_cvt_roundepu16_ph): Likewise. (_mm512_maskz_cvt_roundepu16_ph): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cvtepi32_ph): New intrinsic. (_mm_mask_cvtepi32_ph): Likewise. (_mm_maskz_cvtepi32_ph): Likewise. (_mm256_cvtepi32_ph): Likewise. (_mm256_mask_cvtepi32_ph): Likewise. (_mm256_maskz_cvtepi32_ph): Likewise. (_mm_cvtepu32_ph): Likewise. (_mm_mask_cvtepu32_ph): Likewise. (_mm_maskz_cvtepu32_ph): Likewise. (_mm256_cvtepu32_ph): Likewise. (_mm256_mask_cvtepu32_ph): Likewise. (_mm256_maskz_cvtepu32_ph): Likewise. (_mm_cvtepi64_ph): Likewise. (_mm_mask_cvtepi64_ph): Likewise. (_mm_maskz_cvtepi64_ph): Likewise. 
(_mm256_cvtepi64_ph): Likewise. (_mm256_mask_cvtepi64_ph): Likewise. (_mm256_maskz_cvtepi64_ph): Likewise. (_mm_cvtepu64_ph): Likewise. (_mm_mask_cvtepu64_ph): Likewise. (_mm_maskz_cvtepu64_ph): Likewise. (_mm256_cvtepu64_ph): Likewise. (_mm256_mask_cvtepu64_ph): Likewise. (_mm256_maskz_cvtepu64_ph): Likewise. (_mm_cvtepi16_ph): Likewise. (_mm_mask_cvtepi16_ph): Likewise. (_mm_maskz_cvtepi16_ph): Likewise. (_mm256_cvtepi16_ph): Likewise. (_mm256_mask_cvtepi16_ph): Likewise. (_mm256_maskz_cvtepi16_ph): Likewise. (_mm_cvtepu16_ph): Likewise. (_mm_mask_cvtepu16_ph): Likewise. (_mm_maskz_cvtepu16_ph): Likewise. (_mm256_cvtepu16_ph): Likewise. (_mm256_mask_cvtepu16_ph): Likewise. (_mm256_maskz_cvtepu16_ph): Likewise. * config/i386/i386-builtin-types.def: Add corresponding builtin types. * config/i386/i386-builtin.def: Add corresponding new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Ditto. * config/i386/i386-modes.def: Declare V2HF and V6HF. * config/i386/sse.md (VI2H_AVX512VL): New. (qq2phsuff): Ditto. (sseintvecmode): Add HF vector modes. (avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>): New. (avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>): Ditto. (*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>): Ditto. (avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Ditto. (*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask): Ditto. (*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask_1): Ditto. (avx512fp16_vcvt<floatsuffix>qq2ph_v2di): Ditto. (*avx512fp16_vcvt<floatsuffix>qq2ph_v2di): Ditto. (avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask): Ditto. (*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask): Ditto. (*avx512fp16_vcvt<floatsuffix>qq2ph_v2di_mask_1): Ditto. * config/i386/subst.md (round_qq2phsuff): New subst_attr. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. 
* gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.
2021-09-16AVX512FP16: Add vcvtph2dq/vcvtph2qq/vcvtph2w/vcvtph2uw/vcvtph2uqq/vcvtph2udqliuhongt6-0/+941
gcc/ChangeLog: * config/i386/avx512fp16intrin.h (_mm512_cvtph_epi32): New intrinsic. (_mm512_mask_cvtph_epi32): Likewise. (_mm512_maskz_cvtph_epi32): Likewise. (_mm512_cvt_roundph_epi32): Likewise. (_mm512_mask_cvt_roundph_epi32): Likewise. (_mm512_maskz_cvt_roundph_epi32): Likewise. (_mm512_cvtph_epu32): Likewise. (_mm512_mask_cvtph_epu32): Likewise. (_mm512_maskz_cvtph_epu32): Likewise. (_mm512_cvt_roundph_epu32): Likewise. (_mm512_mask_cvt_roundph_epu32): Likewise. (_mm512_maskz_cvt_roundph_epu32): Likewise. (_mm512_cvtph_epi64): Likewise. (_mm512_mask_cvtph_epi64): Likewise. (_mm512_maskz_cvtph_epi64): Likewise. (_mm512_cvt_roundph_epi64): Likewise. (_mm512_mask_cvt_roundph_epi64): Likewise. (_mm512_maskz_cvt_roundph_epi64): Likewise. (_mm512_cvtph_epu64): Likewise. (_mm512_mask_cvtph_epu64): Likewise. (_mm512_maskz_cvtph_epu64): Likewise. (_mm512_cvt_roundph_epu64): Likewise. (_mm512_mask_cvt_roundph_epu64): Likewise. (_mm512_maskz_cvt_roundph_epu64): Likewise. (_mm512_cvtph_epi16): Likewise. (_mm512_mask_cvtph_epi16): Likewise. (_mm512_maskz_cvtph_epi16): Likewise. (_mm512_cvt_roundph_epi16): Likewise. (_mm512_mask_cvt_roundph_epi16): Likewise. (_mm512_maskz_cvt_roundph_epi16): Likewise. (_mm512_cvtph_epu16): Likewise. (_mm512_mask_cvtph_epu16): Likewise. (_mm512_maskz_cvtph_epu16): Likewise. (_mm512_cvt_roundph_epu16): Likewise. (_mm512_mask_cvt_roundph_epu16): Likewise. (_mm512_maskz_cvt_roundph_epu16): Likewise. * config/i386/avx512fp16vlintrin.h (_mm_cvtph_epi32): New intrinsic. (_mm_mask_cvtph_epi32): Likewise. (_mm_maskz_cvtph_epi32): Likewise. (_mm256_cvtph_epi32): Likewise. (_mm256_mask_cvtph_epi32): Likewise. (_mm256_maskz_cvtph_epi32): Likewise. (_mm_cvtph_epu32): Likewise. (_mm_mask_cvtph_epu32): Likewise. (_mm_maskz_cvtph_epu32): Likewise. (_mm256_cvtph_epu32): Likewise. (_mm256_mask_cvtph_epu32): Likewise. (_mm256_maskz_cvtph_epu32): Likewise. (_mm_cvtph_epi64): Likewise. (_mm_mask_cvtph_epi64): Likewise. (_mm_maskz_cvtph_epi64): Likewise. 
(_mm256_cvtph_epi64): Likewise. (_mm256_mask_cvtph_epi64): Likewise. (_mm256_maskz_cvtph_epi64): Likewise. (_mm_cvtph_epu64): Likewise. (_mm_mask_cvtph_epu64): Likewise. (_mm_maskz_cvtph_epu64): Likewise. (_mm256_cvtph_epu64): Likewise. (_mm256_mask_cvtph_epu64): Likewise. (_mm256_maskz_cvtph_epu64): Likewise. (_mm_cvtph_epi16): Likewise. (_mm_mask_cvtph_epi16): Likewise. (_mm_maskz_cvtph_epi16): Likewise. (_mm256_cvtph_epi16): Likewise. (_mm256_mask_cvtph_epi16): Likewise. (_mm256_maskz_cvtph_epi16): Likewise. (_mm_cvtph_epu16): Likewise. (_mm_mask_cvtph_epu16): Likewise. (_mm_maskz_cvtph_epu16): Likewise. (_mm256_cvtph_epu16): Likewise. (_mm256_mask_cvtph_epu16): Likewise. (_mm256_maskz_cvtph_epu16): Likewise. * config/i386/i386-builtin-types.def: Add new builtin types. * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-expand.c (ix86_expand_args_builtin): Handle new builtin types. (ix86_expand_round_builtin): Ditto. * config/i386/sse.md (sseintconvert): New. (ssePHmode): Ditto. (UNSPEC_US_FIX_NOTRUNC): Ditto. (sseintconvertsignprefix): Ditto. (avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-1.c: Add test for new builtins. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * gcc.target/i386/sse-14.c: Add test for new intrinsics. * gcc.target/i386/sse-22.c: Ditto.