Move SVE extension checking functionality to aarch64-builtins.cc, so
that it can be shared by non-SVE intrinsics.
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins.cc (check_builtin_call)
(expand_builtin): Update calls to the below.
(report_missing_extension, report_missing_registers)
(check_required_extensions): Move out of aarch64_sve namespace,
rename, and move into...
* config/aarch64/aarch64-builtins.cc (aarch64_report_missing_extension)
(aarch64_report_missing_registers)
(aarch64_check_required_extensions) ...here.
* config/aarch64/aarch64-protos.h (aarch64_check_required_extensions):
Add prototype.
|
|
Replace TARGET_GENERAL_REGS_ONLY check with an explicit check that
aarch64_isa_flags enables all required extensions. This will be more
flexible when repurposing this function for non-SVE intrinsics.
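A minimal sketch of the subset test described above, assuming the ISA flags are a plain bitmask. The flag names, the `missing_extensions` helper and the error text are hypothetical and do not reflect GCC's actual `aarch64_isa_flags` encoding or diagnostics.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical extension bits; GCC's real flag layout differs.  */
typedef uint64_t isa_flags_t;
#define EXT_FP   ((isa_flags_t) 1 << 0)
#define EXT_SIMD ((isa_flags_t) 1 << 1)
#define EXT_SVE  ((isa_flags_t) 1 << 2)
#define EXT_SVE2 ((isa_flags_t) 1 << 3)

/* Return the subset of REQUIRED that is not enabled in ENABLED.  */
static isa_flags_t
missing_extensions (isa_flags_t enabled, isa_flags_t required)
{
  return required & ~enabled;
}

/* Return 1 if every REQUIRED extension is enabled; otherwise report the
   missing ones for intrinsic NAME and return 0.  */
static int
check_required_extensions (const char *name, isa_flags_t enabled,
                           isa_flags_t required)
{
  isa_flags_t missing = missing_extensions (enabled, required);
  if (missing == 0)
    return 1;
  fprintf (stderr, "'%s' requires missing extensions (mask 0x%llx)\n",
           name, (unsigned long long) missing);
  return 0;
}
```

Checking the whole required set against the enabled flags, rather than a single target macro, is what lets the same helper serve intrinsics with different requirements.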
gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins.cc
(check_required_registers): Remove target check and rename to...
(report_missing_registers): ...this.
(check_required_extensions): Refactor.
|
|
Fix an ICE when scalar coarrays are used in a select type. Prevent
coindexing in associate/select type/select rank selector expressions.
gcc/fortran/ChangeLog:
PR fortran/46371
PR fortran/56496
* expr.cc (gfc_is_coindexed): Also detect coindexing when the
expression has been rewritten to caf_get.
* trans-stmt.cc (trans_associate_var): Always accept a
descriptor for coarrays.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/select_type_1.f90: New test.
* gfortran.dg/coarray/select_type_2.f90: New test.
* gfortran.dg/coarray/select_type_3.f90: New test.
|
|
[PR115917]
gcc/ada/ChangeLog:
PR ada/115917
* gnatvsn.ads: Add note about the duplication of this value in
version.c.
* version.c (VER_LEN_MAX): Define to the same value as
Gnatvsn.Ver_Len_Max.
(gnat_version_string): Use VER_LEN_MAX as bound.
|
|
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA
The fp reassociation width for Neoverse V2 has been set to 6 since its
introduction, and I guess it was empirically tuned. But since
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA was added, the tree reassociation
pass seems to be more deliberate in forming FMAs, and when that flag is
used it seems to evaluate the FMA vs non-FMA reassociation widths more
properly.
According to the Neoverse V2 SWOG, the core has a throughput of 4 for
most FP operations, so the value 6 is not accurate anyway.
Also, the SWOG does state that FMADD operations are pipelined and the
results can be forwarded from FP multiplies to the accumulation operands
of FMADD instructions, which seems to be what
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA expresses.
This patch sets the fp_reassoc_width field to 4 and enables
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA for -mcpu=neoverse-v2.
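To make "reassociation width" concrete, here is a hand-written C sketch (not GCC output) of a floating-point sum split into four independent dependency chains. This is roughly the shape a width of 4 allows when the core can keep four FP additions in flight. The function names are illustrative only.

```c
#include <stddef.h>

/* Sum N doubles as one serial dependency chain: every addition waits
   on the previous one.  */
static double
sum_serial (const double *x, size_t n)
{
  double s = 0.0;
  for (size_t i = 0; i < n; i++)
    s += x[i];
  return s;
}

/* Sum N doubles using four independent partial sums.  Only valid where
   reassociating FP additions is allowed (e.g. -Ofast), because rounding
   can differ from sum_serial for general inputs.  */
static double
sum_width4 (const double *x, size_t n)
{
  double acc[4] = { 0.0, 0.0, 0.0, 0.0 };
  size_t i = 0;
  for (; i + 4 <= n; i += 4)
    {
      acc[0] += x[i];
      acc[1] += x[i + 1];
      acc[2] += x[i + 2];
      acc[3] += x[i + 3];
    }
  for (; i < n; i++)   /* leftover elements */
    acc[i % 4] += x[i];
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}
```

A wider split only helps if the hardware can actually overlap that many additions, which is why matching the width to the documented throughput matters.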
On SPEC2017 fprate I see the following changes on a Grace system:
503.bwaves_r 0.16%
507.cactuBSSN_r -0.32%
508.namd_r 3.04%
510.parest_r 0.00%
511.povray_r 0.78%
519.lbm_r 0.35%
521.wrf_r 0.69%
526.blender_r -0.53%
527.cam4_r 0.84%
538.imagick_r 0.00%
544.nab_r -0.97%
549.fotonik3d_r -0.45%
554.roms_r 0.97%
Geomean 0.35%
with -Ofast -mcpu=grace -flto.
So there is a slight overall improvement, with a meaningful improvement in
508.namd_r.
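As a sanity check on the Geomean row, the following C sketch recomputes it from the per-benchmark deltas. It assumes the geomean is taken over the ratios (1 + delta/100); with the table's rounded inputs it lands at roughly 0.35%. The helper names are illustrative, and the n-th root is found by bisection only to avoid a libm dependency.

```c
#include <stddef.h>

/* X raised to the power N by repeated multiplication.  */
static double
ipow (double x, size_t n)
{
  double r = 1.0;
  while (n-- > 0)
    r *= x;
  return r;
}

/* Geometric mean of percentage changes PCT[0..N-1]: the N-th root of the
   product of the ratios (1 + pct/100), converted back to a percentage.
   The root lies between the smallest and largest ratio, so bisection on
   that bracket converges.  Assumes N > 0 and every pct > -100.  */
static double
geomean_pct (const double *pct, size_t n)
{
  double product = 1.0, lo = 1.0 + pct[0] / 100.0, hi = lo;
  for (size_t i = 0; i < n; i++)
    {
      double ratio = 1.0 + pct[i] / 100.0;
      product *= ratio;
      if (ratio < lo)
        lo = ratio;
      if (ratio > hi)
        hi = ratio;
    }
  for (int iter = 0; iter < 100; iter++)
    {
      double mid = (lo + hi) / 2.0;
      if (ipow (mid, n) < product)
        lo = mid;
      else
        hi = mid;
    }
  return ((lo + hi) / 2.0 - 1.0) * 100.0;
}

/* Per-benchmark SPEC2017 fprate changes from the table, in percent.  */
static const double fprate_pct[] = {
  0.16, -0.32, 3.04, 0.00, 0.78, 0.35, 0.69,   /* bwaves .. wrf */
  -0.53, 0.84, 0.00, -0.97, -0.45, 0.97        /* blender .. roms */
};
```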
I think other tunings in aarch64 should look into
AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA as well, but I'll leave the
benchmarking to someone else.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/ChangeLog:
* config/aarch64/tuning_models/neoversev2.h (fp_reassoc_width):
Set to 4.
(tune_flags): Add AARCH64_EXTRA_TUNE_FULLY_PIPELINED_FMA.
|
|
This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1315.
gcc/testsuite/ChangeLog:
* g++.dg/warn/pr33738-2.C: dg-prune arm linker messages about
size of enums.
Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
|
|
The 'code' part of a 'define_code_attr' refers to the type of the key; in other
words, it uses a code_iterator to pick the 'value' from its (key "value") pair
list.
However, rtx_alloc_for_name requires a code attribute to be used when the
'value' needs to be an rtx type. In other words, before this patch no other
type of attribute could be used to produce an rtx-typed 'value'.
This patch removes that restriction and allows the backend to use any kind of
attribute as long as that attribute always produces a valid code-typed 'value'.
gcc/ChangeLog:
* read-rtl.cc (rtx_reader::rtx_alloc_for_name): Allow all attribute
types to produce code 'values'.
(check_code_attribute): Rename ...
(check_attribute_codes): ... to this. And change comments to refer to
* doc/md.texi: Add paragraph to document that you can use int and mode
attributes to produce codes.
|
|
scanltranstree.exp defines some LTO wrappers around standard
non-LTO scanners. Four of them are cut-&-paste variants of
one another, so this patch generates them from a single template.
It also does the same for scan-ltrans-tree-dump-times, so that
other *-times scanners can be added easily in future.
The scanners seem to be lightly used. gcc.dg/ipa/ipa-icf-38.c uses
scan-ltrans-tree-dump{,-not} and libgomp.c/declare-variant-1.c
uses scan-ltrans-tree-dump-{not,times}. Nothing currently seems
to use scan-ltrans-tree-dump-dem*.
gcc/testsuite/
* lib/scanltranstree.exp: Redefine the routines using two
templates.
|
|
Declaring an unused function with a derived type that has a pointer
component, and using that derived type as a coarray, led the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.
PR fortran/84244
gcc/fortran/ChangeLog:
* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
generated for a component, link it to the component it is
generated for (the previous one).
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/ptr_comp_5.f08: New test.
|
|
gcc/
* config/aarch64/aarch64-simd.md (mov<mode> for VSTRUCT_QD):
Expand 16-byte vector mode const0 store by TImode.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def: Add new builtins.
* config/i386/sse.md:
(<avx512>_scalef<mode><mask_name><round_name>): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<mask_codefor>reducep<mode><mask_name><round_saeonly_name>):
Add condition check.
(<avx512>_rndscale<mode><mask_name><round_saeonly_name>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin):
Handle V8SF_FTYPE_V8SF_V8SF_INT_V8SF_UQI_INT,
V4DF_FTYPE_V4DF_V4DF_INT_V4DF_UQI_INT.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SF_V8SF_UQI_INT, V4DF_FTYPE_V4DF_V4DF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_UHI_INT, V16HF_FTYPE_V16HF_INT_V16HF_UHI_INT,
V4DF_FTYPE_V4DF_INT_V4DF_UQI_INT, V8SF_FTYPE_V8SF_INT_V8SF_UQI_INT.
* config/i386/sse.md:
(<avx512>_getexp<mode><mask_name><round_saeonly_name>):
Add condition check.
(<avx512>_getmant<mode><mask_name><round_saeonly_name>):
Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fnmsub_<mode>_mask3<round_name>): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fmsub_<mode>_mask<round_name>): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
intrins
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fmaddsub_<mode>_mask<round_name>): Add condition check.
(<avx512>_fmaddsub_<mode>_mask3<round_name>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md:
(<avx512>_fmadd_<mode>_mask3<round_name>): Add condition check.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HF_FTYPE_V16HF_V16HF_INT, V16HF_FTYPE_V16HF_V16HF_V16HF_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UQI_INT,
V4DF_FTYPE_V4DF_V4DF_V4DI_INT_UQI_INT,
V8SF_FTYPE_V8SF_V8SF_V8SI_INT_UQI_INT.
* config/i386/sse.md:
(<avx512>_fixupimm<mode><sd_maskz_name><round_saeonly_name>):
Add condition check.
(<avx512>_fixupimm<mode>_mask<round_saeonly_name>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HF_FTYPE_V16HI_V16HF_UHI_INT.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-3.c: New test.
|
|
intrins
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md
(unspec_fix_truncv8sfv8si2<mask_name>): Extend rounding control.
(<mask_codefor>fixuns_trunc<mode><sseintvecmodelower>2<mask_name>):
Ditto.
(<mask_codefor>floatuns<sseintvecmodelower><mode>2<mask_name><round_name>):
Add condition check.
(fix<fixunssuffix>_trunc<mode><sselongvecmodelower>2<mask_name><round_saeonly_name>):
Remove round_saeonly_name.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/sse.md (avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name>):
Extend round control for 256bit.
(unspec_avx512fp16_fix<vcvtt_uns_suffix>_trunc<mode>2<mask_name>):
Ditto.
(avx512fp16_fix<fixunssuffix>_trunc<mode>2<mask_name><round_saeonly_name>):
Add condition check.
* config/i386/subst.md
(round_saeonly_mode_condition): Add V16HI check for 256bit.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DI_V4DF_UQI_INT, V4SF_FTYPE_V4DI_V4SF_UQI_INT,
V8HF_FTYPE_V4DI_V8HF_UQI_INT.
* config/i386/sse.md:
(avx512fp16_vcvt<floatsuffix>qq2ph_v4di_mask_round): New expand.
(*avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode>_mask):
Extend round control and add "_1" suffix.
(float<floatunssuffix><sseintvecmodelower><mode>2<mask_name><round_name>):
Add condition check.
(float<floatunssuffix><sselongvecmodelower><mode>2<mask_name><round_name>):
Ditto.
(float<floatunssuffix><mode><ssePSmode2lower>2<mask_name><round_name>):
Limit suffix output.
(unspec_fix_truncv4dfv4si2<mask_name>): Extend round control.
(unspec_fixuns_truncv4dfv4si2<mask_name>): Ditto.
* config/i386/subst.md (round_qq2pssuff): New iterator.
(round_saeonly_suff): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-2.c: New test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SI_FTYPE_V8SF_V8SI_UQI_INT, V4DI_FTYPE_V4SF_V4DI_UQI_INT.
* config/i386/sse.md
(<sse2_avx_avx512f>_fix_notrunc<sf2simodelower><mode><mask_name>):
Extend to round.
(<mask_codefor><avx512>_fixuns_notrunc<sf2simodelower><mode><mask_name><round_name>):
Add round condition check.
* config/i386/subst.md (round_constraint4): New.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V16HI_FTYPE_V16HF_V16HI_UHI_INT, V4DF_FTYPE_V4SF_V4DF_UQI_INT,
V8HF_FTYPE_V8SF_V8HF_UQI_INT.
* config/i386/sse.md
(avx512fp16_vcvt<castmode>2ph_<mode><mask_name><round_name>):
Add round condition check.
* config/i386/subst.md (round_mode_condition): Add V16HI check for
256bit.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: New intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8HF_V8SF_UQI_INT, V8SI_FTYPE_V8HF_V8SI_UQI_INT,
V4DF_FTYPE_V8HF_V4DF_UQI_INT, V4DI_FTYPE_V8HF_V4DI_UQI_INT.
* config/i386/sse.md:
(avx512fp16_float_extend_ph<mode>2<mask_name><round_saeonly_name>):
Add condition check.
(avx512fp16_vcvtph2<sseintconvertsignprefix><sseintconvert>_<mode><mask_name><round_name>):
Ditto.
(avx512fp16_float_extend_ph<mode>2<mask_name>): Extend round saeonly.
(vcvtph2ps256<mask_name>): Ditto.
* config/i386/subst.md
(round_saeonly_applied): New condition.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Add new macro test.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DI_FTYPE_V4DF_V4DI_UQI_INT, V4SI_FTYPE_V4DF_V4SI_UQI_INT.
* config/i386/sse.md:
(avx_cvtpd2dq256<mask_name>): Change name to
avx_cvtpd2dq256<mask_name><round_name> and extend pattern to
generate 256bit insns.
(fixuns_notrunc<mode><si2dfmodelower>2<mask_name><round_name>):
Add round_mode_condition.
* config/i386/subst.md (round_pd2udqsuff): New iterator.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add new macro test.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
|
|
gcc/ChangeLog:
* config/i386/avx10_2roundingintrin.h: Add new intrins.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V8SF_FTYPE_V8SI_V8SF_UQI_INT, V4SF_FTYPE_V4DF_V4SF_UQI_INT,
V8HF_FTYPE_V8SI_V8HF_UQI_INT, V8HF_FTYPE_V4DF_V8HF_UQI_INT.
* config/i386/sse.md:
(avx512fp16_vcvt<floatsuffix><sseintconvert>2ph_<mode><mask_name><round_name>):
Add condition check.
(avx512fp16_vcvtpd2ph_v4df_mask_round): New expand.
(*avx512fp16_vcvt<castmode>2ph_<mode>_mask): Change name to
avx512fp16_vcvt<castmode>2ph_<mode>_mask<round_name>_1
and extend pattern to generate 256bit insns.
(avx_cvtpd2ps256<mask_name>): Change name to
avx_cvtpd2ps256<mask_name><round_name> and extend pattern to
generate 256bit insns.
* config/i386/subst.md (round_applied): New condition.
(round_suff): New iterator.
(round_mode_condition): Add V32HI check for 512bit.
(round_saeonly_mode_condition): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add new builtin test.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add new macro test.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: Add test.
|
|
gcc/ChangeLog:
* config.gcc: Add avx10_2roundingintrin.h.
* config/i386/i386-builtin-types.def: Add new DEF_FUNCTION_TYPE.
* config/i386/i386-builtin.def (BDESC): Add new builtins.
* config/i386/i386-expand.cc (ix86_expand_round_builtin): Handle
V4DF_FTYPE_V4DF_V4DF_V4DF_UQI_INT, V8SF_FTYPE_V8SF_V8SF_V8SF_UQI_INT,
V16HF_FTYPE_V16HF_V16HF_V16HF_UHI_INT, UQI_FTYPE_V4DF_V4DF_INT_UQI_INT,
UHI_FTYPE_V16HF_V16HF_INT_UHI_INT, UQI_FTYPE_V8SF_V8SF_INT_UQI_INT.
* config/i386/immintrin.h: Include avx10_2roundingintrin.h.
* config/i386/sse.md: Change subst_attr name due to renaming.
* config/i386/subst.md:
(<round_mode512bit_condition>): Add condition check for avx10.2
rounding control 256bit intrins and renamed to ...
(<round_mode_condition>): ...this.
(round_saeonly_mode512bit_condition): Add condition check for
avx10.2 rounding control 256bit intrins and renamed to ...
(round_saeonly_mode_condition): ...this.
* config/i386/avx10_2roundingintrin.h: New file.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx-1.c: Add -mavx10.2 and new builtin test.
* gcc.target/i386/avx-2.c: Ditto.
* gcc.target/i386/sse-13.c: Add new tests.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/avx10_2-rounding-1.c: New test.
|
|
|
|
This fixes two general ubsan issues in ext-dce, both related to use-side
processing of modes > DImode.
In ext_dce_process_uses we can be presented with something like this as a use
(subreg:SI (reg:TF) 12)
That will result in an out-of-range shift for a HOST_WIDE_INT object. When
this happens, it is safe to just break out of the SET context and process the
subexpressions. This will ultimately result in seeing (reg:TF), and we'll mark
all bit groups as live.
In carry_backpropagate we can be presented with a TImode shift (for example),
and the shift count can be > 63 for such a shift. This naturally trips ubsan
as well, since we're operating on 64-bit objects.
We can just return mmask in this case, noting that every bit group is live.
The combination of these two fixes eliminates all the reported ubsan issues in
ext-dce seen in a bootstrap and regression test on x86.
While I was in there I went ahead and fixed the various hardcoded 63/64 values
to be HOST_BITS_PER_WIDE_INT based.
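The two guards described above can be sketched in plain C. This is a simplified model, not the ext-dce code: the helper names are hypothetical, `HOST_BITS_PER_WIDE_INT` is redefined locally as a stand-in for a 64-bit host, and subreg byte offsets are assumed to map to bit offsets in little-endian order.

```c
#include <stdint.h>

/* Stand-in for GCC's HOST_BITS_PER_WIDE_INT on a 64-bit host.  */
#define HOST_BITS_PER_WIDE_INT 64

/* Mask with the low BITS bits set.  Shifting a 64-bit value by 64 or
   more is undefined behavior in C, so full-width masks are returned
   before any shift happens.  */
static uint64_t
low_bits_mask (unsigned int bits)
{
  if (bits >= HOST_BITS_PER_WIDE_INT)
    return UINT64_MAX;
  return ((uint64_t) 1 << bits) - 1;
}

/* Bits used by a SIZE-bit subreg at byte OFFSET of a wider register,
   e.g. (subreg:SI (reg:TF) 12) has OFFSET 12 and SIZE 32.  If those bits
   lie beyond the host word, give up and treat every bit as live instead
   of shifting out of range.  */
static uint64_t
subreg_live_bits (unsigned int offset, unsigned int size)
{
  unsigned int start = offset * 8;
  if (start + size > HOST_BITS_PER_WIDE_INT)
    return UINT64_MAX;
  return low_bits_mask (size) << start;
}

/* For a left shift by COUNT, input bit I feeds output bit I + COUNT, so
   the live input bits are the live output bits MASK shifted right.  A
   COUNT the host word cannot model (a TImode shift by 64 or more) is
   handled conservatively by reporting every bit as live.  */
static uint64_t
live_input_bits_for_shift (uint64_t mask, unsigned int count)
{
  if (count >= HOST_BITS_PER_WIDE_INT)
    return UINT64_MAX;
  return mask >> count;
}
```

Returning an all-ones mask is always safe for this kind of analysis: it can only make the pass keep more extensions, never remove a needed one.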
Bootstrapped and regression tested on x86 with no regressions. Also built with
ubsan enabled and verified the build logs and testsuite logs don't call out any
issues in ext-dce anymore.
Pushing to the trunk.
PR rtl-optimization/115876
gcc
* ext-dce.cc (ext_dce_process_sets): Replace hardcoded 63/64 instances
with HOST_BITS_PER_WIDE_INT based values.
(carry_backpropagate): Handle modes with more bits than
HOST_BITS_PER_WIDE_INT gracefully, avoiding undefined behavior.
(ext_dce_process_uses): Handle subreg offsets which would result
in ubsan shifts gracefully, avoiding undefined behavior.
|
|
gcc:
* doc/gm2.texi (Contributing): Tweak gm2 mailing list address.
|
|
To start working on expressions with more than one operand,
factor_out_conditional_operation needs to be converted over to use
gimple_match_op.
A side effect is that factor_out_conditional_operation can now support
builtins/internal calls that have one operand without any extra code.
Note on the changed testcases:
* pr87007-5.c: the test was testing for avoiding partial register stalls
for the sqrt and making sure there is only one zeroing of the register before
the branch; phiopt would now merge the sqrt calls, so disable phiopt.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* gimple-match-exports.cc (gimple_match_op::operands_occurs_in_abnormal_phi):
New function.
* gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi.
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Use gimple_match_op
instead of manually extracting from/creating the gimple.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr87007-5.c: Disable phi-opt.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The 16-bit additions like addhi3 have two forms: One with a scratch:QI
and one without, where the latter is required because reload cannot
deal with a scratch when spill code pops a 16-bit addition.
Passes like combine and fwprop1 may come up with the non-scratch version,
which is sub-optimal when the addition is performed in a NO_LD_REGS
register, because the operands will be spilled to LD_REGS.
Having a scratch:QI available can lead to better code with fewer spills.
gcc/
* config/avr/avr.md (*add<mode>3_split) [!reload_completed]:
Add a scratch:QI to 16-bit additions with constant.
|
|
PR target/116407
gcc/
* config/avr/avr.md (*dec-and-branchhi!=-1.l.clobber):
Increase the additional jump offset to 2 words.
|
|
Some text peepholes output extra instructions prior to a branch
instruction, and that increases the jump offset of backward branches.
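A hypothetical sketch of why the extra words matter when choosing a branch form. The function name and the `extra_words` parameter are illustrative. The word ranges are assumed to follow the comment on `avr_jump_mode` in GCC's avr.cc (1 = short relative branch, 2 = RJMP, 3 = absolute JMP).

```c
/* DISTANCE is target minus branch in words (negative for backward
   jumps).  EXTRA_WORDS counts instructions a text peephole prints before
   the branch; for a backward branch they sit between the target and the
   branch itself, so they lengthen the jump.  */
static int
jump_mode_sketch (int distance, int extra_words)
{
  if (distance < 0)
    distance -= extra_words;

  if (distance >= -63 && distance <= 62)
    return 1;   /* short relative branch */
  if (distance >= -2046 && distance <= 2045)
    return 2;   /* RJMP */
  return 3;     /* absolute JMP */
}
```

A backward branch computed at -62 words fits the short form, but two extra words in front of it push it to -64, which needs the longer form.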
PR target/116407
gcc/
* config/avr/avr-protos.h (avr_jump_mode): Add an int argument.
* config/avr/avr.cc (avr_jump_mode): Add an int argument to increase
the computed jump offset of backwards branches.
* config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1):
Increase the jump offset used by avr_jump_mode() as needed.
gcc/testsuite/
* gcc.target/avr/torture/pr116407-2.c: New test.
* gcc.target/avr/torture/pr116407-4.c: New test.
|
|
This extends r14-3982-g9ea74d235c7e78 to also include the newly added statements,
since some of them might be dead too (due to the way match and simplify works).
This was noticed while working on a new match and simplify pattern where a
newly added statement was not being used.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* gimple-fold.cc (mark_lhs_in_seq_for_dce): New function.
(replace_stmt_with_simplification): Call mark_lhs_in_seq_for_dce
right before inserting the sequence.
(fold_stmt_1): Add dce_worklist argument, update call to
replace_stmt_with_simplification.
(fold_stmt): Add dce_worklist argument, update call to fold_stmt_1.
(fold_stmt_inplace): Update call to fold_stmt_1.
* gimple-fold.h (fold_stmt): Add bitmap argument.
* tree-ssa-forwprop.cc (pass_forwprop::execute): Update call to fold_stmt.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This patch implements the quad and oct .SAT_TRUNC patterns
in the riscv backend. For example:
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
{ \
bool overflow = x > (WT)(NT)(-1); \
return ((NT)x) | (NT)-overflow; \
}
DEF_SAT_U_TRUC_FMT_1(uint16_t, uint64_t)
Before this patch:
__attribute__((noinline))
uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
{
  _Bool overflow;
  short unsigned int _1;
  short unsigned int _2;
  short unsigned int _3;
  uint16_t _6;

  ;; basic block 2, loop depth 0
  ;; pred: ENTRY
  overflow_5 = x_4(D) > 65535;
  _1 = (short unsigned int) x_4(D);
  _2 = (short unsigned int) overflow_5;
  _3 = -_2;
  _6 = _1 | _3;
  return _6;
  ;; succ: EXIT

}
After this patch:
__attribute__((noinline))
uint16_t sat_u_truc_uint64_t_to_uint16_t_fmt_1 (uint64_t x)
{
  uint16_t _6;

  ;; basic block 2, loop depth 0
  ;; pred: ENTRY
  _6 = .SAT_TRUNC (x_4(D)); [tail call]
  return _6;
  ;; succ: EXIT

}
The following test suites pass with this patch:
1. The rv64gcv fully regression test.
2. The rv64gcv build with glibc
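For illustration, Form 1 can be instantiated for quad and oct truncation in a standalone file, assuming "quad" means the result is a quarter of the input width and "oct" an eighth. The instantiation choices are illustrative; the macro body is the one shown above, with the headers it needs.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEF_SAT_U_TRUC_FMT_1(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_1 (WT x) \
{ \
  bool overflow = x > (WT)(NT)(-1); \
  return ((NT)x) | (NT)-overflow; \
}

/* Quad truncation: the result is a quarter of the input width.  */
DEF_SAT_U_TRUC_FMT_1 (uint16_t, uint64_t)   /* 64 -> 16 */
DEF_SAT_U_TRUC_FMT_1 (uint8_t, uint32_t)    /* 32 -> 8 */

/* Oct truncation: the result is an eighth of the input width.  */
DEF_SAT_U_TRUC_FMT_1 (uint8_t, uint64_t)    /* 64 -> 8 */
```

Values that fit pass through unchanged; anything larger ORs in an all-ones mask and saturates to the maximum of the narrow type.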
gcc/ChangeLog:
* config/riscv/iterators.md (ANYI_QUAD_TRUNC): New iterator for
quad truncation.
(ANYI_OCT_TRUNC): New iterator for oct truncation.
(ANYI_QUAD_TRUNCATED): New attr for truncated quad modes.
(ANYI_OCT_TRUNCATED): New attr for truncated oct modes.
(anyi_quad_truncated): Ditto but for lower case.
(anyi_oct_truncated): Ditto but for lower case.
* config/riscv/riscv.md (ustrunc<mode><anyi_quad_truncated>2):
Add new pattern for quad truncation.
(ustrunc<mode><anyi_oct_truncated>2): Ditto but for oct.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-2.c: Adjust
the expand dump check times.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-3.c: Ditto.
* gcc.target/riscv/sat_arith_data.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-4.c: New test.
* gcc.target/riscv/sat_u_trunc-5.c: New test.
* gcc.target/riscv/sat_u_trunc-6.c: New test.
* gcc.target/riscv/sat_u_trunc-run-4.c: New test.
* gcc.target/riscv/sat_u_trunc-run-5.c: New test.
* gcc.target/riscv/sat_u_trunc-run-6.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
For QI/HImode .SAT_ADD, the operands may be sign-extended, so the
high bits of the Xmode register may be all 1s, which is not expected. For
example, consider the code below.
signed char b[1];
unsigned short c;
signed char *d = b;
int main() {
b[0] = -40;
c = ({ (unsigned short)d[0] < 0xFFF6 ? (unsigned short)d[0] : 0xFFF6; }) + 9;
__builtin_printf("%d\n", c);
}
After expanding we have:
;; _6 = .SAT_ADD (_3, 9);
(insn 8 7 9 (set (reg:DI 143)
(high:DI (symbol_ref:DI ("d") [flags 0x86] <var_decl d>)))
(nil))
(insn 9 8 10 (set (reg/f:DI 142)
(mem/f/c:DI (lo_sum:DI (reg:DI 143)
(symbol_ref:DI ("d") [flags 0x86] <var_decl d>)) [1 d+0 S8 A64]))
(nil))
(insn 10 9 11 (set (reg:HI 144 [ _3 ])
(sign_extend:HI (mem:QI (reg/f:DI 142) [0 *d.0_1+0 S1 A8]))) "test.c":7:10 -1
(nil))
The conversion from signed char to unsigned short produces the sign_extend
RTL shown above, which finally becomes the lb insn below:
lb a1,0(a5) // a1 is -40, aka 0xffffffffffffffd8
lui a0,0x1a
addi a5,a1,9
slli a5,a5,0x30
srli a5,a5,0x30 // a5 is 65505
sltu a1,a5,a1 // compare 65505 and 0xffffffffffffffd8 => TRUE
The sltu tries to compare 65505 and 0xffffffffffffffd8 here, but we
actually want to compare 65505 and 65496 (0xffd8). Thus we need to
clean up the high bits to ensure this.
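The failure can be reproduced with a small C model of the instruction sequence above, using 64-bit values as registers. This is an illustration of the arithmetic, not `riscv_gen_zero_extend_rtx` itself; the helper names are hypothetical.

```c
#include <stdint.h>

/* Sign-extending byte load, as lb does.  */
static uint64_t
load_lb (int8_t v)
{
  return (uint64_t) (int64_t) v;
}

/* The overflow check from the sequence: truncate the sum to 16 bits
   (slli/srli by 48) and flag overflow if sum < operand (sltu).  */
static int
overflow_detected (uint64_t operand, uint64_t imm)
{
  uint64_t sum = (operand + imm) & 0xffff;
  return sum < operand;
}

/* Zero-extend the low 16 bits: the clean-up needed for HImode operands.  */
static uint64_t
zero_extend_hi (uint64_t reg)
{
  return reg & 0xffff;
}
```

With the sign-extended operand 0xffffffffffffffd8, the check wrongly reports overflow. After zero extension, it compares 65505 against 65496, reports no overflow, and the program prints 65505 as expected.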
The following test suites pass with this patch:
* The rv64gcv fully regression test.
PR target/116278
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Add new
func impl to zero extend rtx.
(riscv_expand_usadd): Leverage the above func to clean up operands 0
and remove the special handling for SImode in RV64.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_u_add-11.c: Adjust asm check body.
* gcc.target/riscv/sat_u_add-15.c: Ditto.
* gcc.target/riscv/sat_u_add-19.c: Ditto.
* gcc.target/riscv/sat_u_add-23.c: Ditto.
* gcc.target/riscv/sat_u_add-3.c: Ditto.
* gcc.target/riscv/sat_u_add-7.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-11.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-3.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-7.c: Ditto.
* gcc.target/riscv/pr116278-run-1.c: New test.
* gcc.target/riscv/pr116278-run-2.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch adds test cases for the unsigned scalar
.SAT_TRUNC form 3, i.e.:
Form 3:
#define DEF_SAT_U_TRUC_FMT_3(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
{ \
WT max = (WT)(NT)-1; \
return x <= max ? (NT)x : (NT) max; \
}
DEF_SAT_U_TRUC_FMT_3 (uint32_t, uint64_t)
The following test passes with this patch:
* The rv64gcv regression test.
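A standalone instantiation of Form 3 makes the boundary behavior easy to check: inputs up to UINT32_MAX pass through, and larger inputs clamp to UINT32_MAX. The `stdint.h` include is the only addition to the macro shown above.

```c
#include <stdint.h>

#define DEF_SAT_U_TRUC_FMT_3(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \
{ \
  WT max = (WT)(NT)-1; \
  return x <= max ? (NT)x : (NT) max; \
}

/* Defines sat_u_truc_uint64_t_to_uint32_t_fmt_3.  */
DEF_SAT_U_TRUC_FMT_3 (uint32_t, uint64_t)
```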
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-13.c: New test.
* gcc.target/riscv/sat_u_trunc-14.c: New test.
* gcc.target/riscv/sat_u_trunc-15.c: New test.
* gcc.target/riscv/sat_u_trunc-run-13.c: New test.
* gcc.target/riscv/sat_u_trunc-run-14.c: New test.
* gcc.target/riscv/sat_u_trunc-run-15.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
This patch adds test cases for the unsigned scalar .SAT_TRUNC form 2,
i.e.:
Form 2:
#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
{ \
WT max = (WT)(NT)-1; \
return x > max ? (NT) max : (NT)x; \
}
DEF_SAT_U_TRUC_FMT_2 (uint32_t, uint64_t)
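A minimal sketch exercising form 2 exhaustively. As an assumption for illustration, the macro is instantiated for (uint8_t, uint16_t) rather than the message's (uint32_t, uint64_t), so that every possible input can be checked; the checker name is hypothetical and not part of the GCC testsuite.

```c
#include <assert.h>
#include <stdint.h>

/* Form 2, as given in the commit message. */
#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
NT __attribute__((noinline)) \
sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
{ \
  WT max = (WT)(NT)-1; \
  return x > max ? (NT) max : (NT)x; \
}

/* Expands to: uint8_t sat_u_truc_uint16_t_to_uint8_t_fmt_2 (uint16_t x) */
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint16_t)

/* Hypothetical exhaustive check over all 65536 inputs. */
static void
check_sat_u_truc_fmt_2 (void)
{
  for (uint32_t v = 0; v <= UINT16_MAX; v++)
    {
      uint16_t x = (uint16_t) v;
      uint8_t expect = x > UINT8_MAX ? UINT8_MAX : (uint8_t) x;
      assert (sat_u_truc_uint16_t_to_uint8_t_fmt_2 (x) == expect);
    }
}
```

Forms 2 and 3 compute the same function with the comparison written in opposite directions.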
The following test passed with this patch:
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-7.c: New test.
* gcc.target/riscv/sat_u_trunc-8.c: New test.
* gcc.target/riscv/sat_u_trunc-9.c: New test.
* gcc.target/riscv/sat_u_trunc-run-7.c: New test.
* gcc.target/riscv/sat_u_trunc-run-8.c: New test.
* gcc.target/riscv/sat_u_trunc-run-9.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
This is analogous to a prior patch to ext-dce which fixed propagation of sign
bits, but this time for the saturating variants. I'd held off on fixing those
because I wanted time to look at that code (as far as I know, we don't have a
testcase for it).
Not surprisingly, when I put an abort on that path and ran an x86 bootstrap
and testsuite, it never triggered. Of course, not a lot of code tries to do
saturating shifts.
Anyway, bootstrapped and regression tested on x86_64. Pushing to the trunk.
Thanks for everyone's patience.
gcc/
* ext-dce.cc (carry_backpropagate): Cast mask to HOST_WIDE_INT before
shifting.
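A minimal sketch of the general C idiom such a cast relies on; this is not the ext-dce.cc code, and the exact shifts performed in carry_backpropagate are not shown. The typedefs hwi and uhwi are stand-ins assumed to match a 64-bit HOST_WIDE_INT. Right-shifting an unsigned mask brings in zeros, while casting to the signed type first lets GCC's arithmetic right shift (implementation-defined in ISO C) replicate a live sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for HOST_WIDE_INT / unsigned HOST_WIDE_INT, assumed 64-bit. */
typedef int64_t hwi;
typedef uint64_t uhwi;

/* Logical shift: zeros enter at the top, so the sign bit is not propagated. */
static uhwi
shift_mask_logical (uhwi mask, int n)
{
  return mask >> n;
}

/* Signed shift: with GCC, >> on a negative value is arithmetic, so the
   sign bit is copied into the vacated high positions. */
static uhwi
shift_mask_arithmetic (uhwi mask, int n)
{
  return (uhwi) ((hwi) mask >> n);
}

static void
check_mask_shifts (void)
{
  uhwi sign_bit = (uhwi) 1 << 63;
  assert (shift_mask_logical (sign_bit, 4) == 0x0800000000000000ULL);
  assert (shift_mask_arithmetic (sign_bit, 4) == 0xf800000000000000ULL);
}
```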
|
|
The attached patch is specific to the RTEMS RISC-V architecture multilib, which
is controlled by the t-rtems file in the gcc/config/riscv/ directory. The patch
was created against the gcc-13.3.0 branch and was successfully tested within
the RTEMS Source Builder.
gcc/
* config/riscv/t-rtems: Add ilp32f multilib.
|
|
The recent if-conversion changes tripped a failure on the v850 port.
The core underlying issue is that while the if-conversion code tries to do the
right thing with noce_can_force_operand to determine if it can force an
arbitrary operand into a register, it's not really a sufficient check.
Essentially, for arithmetic codes it checks the operands: if the operands are
force-able and there's a code_to_optab mapping, it returns true. But
code_to_optab doesn't check anything other than the existence of a mapping in
the target. If the target pattern has restrictions enforced by its condition,
or it's an expander that is allowed to FAIL, then noce_can_force_operand can
still return true even though we may not be able to force the operand into a
register.
This came up on the v850 when we had an operand that was a rotate by a constant
number of bits (I don't remember the count; all that matters is that it was
neither 8 nor 16).
The v850 port has this define_expand:
> (define_expand "rotlsi3"
> [(parallel [(set (match_operand:SI 0 "register_operand" "")
> (rotate:SI (match_operand:SI 1 "register_operand" "")
> (match_operand:SI 2 "const_int_operand" "")))
> (clobber (reg:CC CC_REGNUM))])]
> "(TARGET_V850E_UP)"
> {
> if (INTVAL (operands[2]) != 16)
> FAIL;
> })
So the only rotate count allowed is 16 (there's a similar HI rotate with a count of 8). AFAICT the rotate patterns are allowed to FAIL. So naturally the expander fails and we get a testsuite regression:
> Tests that now fail, but worked before (4 tests):
>
> v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> v850-sim/-mgcc-abi/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> v850-sim/-msoft-float/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
> v850-sim/-mv850e3v5: gcc: gcc.c-torture/execute/20100805-1.c -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
This patch works around the problem by allowing the rotates in additional
cases, particularly for the V850E3V5+ variants which have a general rotate
capability. But let's be clear, this is just a workaround and I expect we're
going to have to revisit the code to test if an operand can be forced into a
register.
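A hypothetical C model (not GCC code) of the mismatch described above: the "can force" answer only reflects that a rotate optab mapping exists, while the quoted rotlsi3 expander FAILs for any count other than 16. The TARGET_V850E_UP condition and the new V850E3V5+ cases from this patch are deliberately left out; all function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the optab-existence check: true regardless of the rotate count. */
static bool
model_can_force_rotate (void)
{
  return true;
}

/* Models the quoted rotlsi3 expander: FAIL (return false) unless the
   count is 16; on success, store the rotated value in *OUT. */
static bool
model_expand_rotlsi3 (uint32_t x, int count, uint32_t *out)
{
  if (count != 16)
    return false;                         /* FAIL */
  *out = (x << 16) | (x >> 16);
  return true;
}

static void
check_rotate_model (void)
{
  uint32_t r = 0;
  assert (model_can_force_rotate ());
  assert (!model_expand_rotlsi3 (0x12345678u, 5, &r));
  assert (model_expand_rotlsi3 (0x12345678u, 16, &r) && r == 0x56781234u);
}
```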
gcc/
* config/v850/v850.md (rotlsi3): Allow more cases for V850E3V5+.
|
|
When rs1 is the immediate 0, the following ICE occurs:
error: unrecognizable insn:
(insn 8 5 12 2 (set (reg:RVVM1DI 134 [ <retval> ])
(if_then_else:RVVM1DI (unspec:RVVMF64BI [
(const_vector:RVVMF64BI repeat [
(const_int 1 [0x1])
])
(reg/v:DI 137 [ vl ])
(const_int 2 [0x2]) repeated x2
(const_int 0 [0])
(reg:SI 66 vl)
(reg:SI 67 vtype)
] UNSPEC_VPREDICATE)
(plus:RVVM1DI (mult:RVVM1DI (vec_duplicate:RVVM1DI (const_int 0 [0]))
(reg/v:RVVM1DI 136 [ vs2 ]))
(reg/v:RVVM1DI 135 [ vd ]))
(reg/v:RVVM1DI 135 [ vd ])))
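The RTL above computes vd + vec_duplicate (rs1) * vs2 and merges the result into vd, which appears to correspond to vmacc.vx. A minimal scalar reference model of that arithmetic for the first vl elements is sketched below; masking and tail/mask policies are ignored, and the function names are hypothetical. With rs1 = 0, the result is simply vd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* vd[i] = rs1 * vs2[i] + vd[i] for i < vl, with wrap-around arithmetic
   done in uint64_t to avoid signed-overflow UB in C. */
static void
model_vmacc_vx (int64_t *vd, int64_t rs1, const int64_t *vs2, size_t vl)
{
  for (size_t i = 0; i < vl; i++)
    vd[i] = (int64_t) ((uint64_t) rs1 * (uint64_t) vs2[i] + (uint64_t) vd[i]);
}

/* The case from this ICE: a scalar operand of 0 leaves vd unchanged. */
static void
check_vmacc_zero_scalar (void)
{
  int64_t vd[4] = { 1, -2, 3, -4 };
  const int64_t vs2[4] = { 10, 20, 30, 40 };
  model_vmacc_vx (vd, 0, vs2, 4);
  assert (vd[0] == 1 && vd[1] == -2 && vd[2] == 3 && vd[3] == -4);
}
```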
gcc/ChangeLog:
* config/riscv/vector.md: Allow scalar operand to be 0.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/bug-7.c: New test.
* gcc.target/riscv/rvv/base/bug-8.c: New test.
|
|
So, as expected, the core problem with target/116282 is that the cost of certain
constant synthesis cases varied depending on whether or not we're allowed to
generate new pseudos.
That in turn meant that in obscure cases an insn might change from recognizable
to unrecognizable, triggering the observed failure.
We therefore need to keep the cost stable, at least when called from a pattern's
condition. To do so, we pass another boolean down when necessary. I've tried to
minimize the API fallout.
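A hypothetical model of the failure mode, with invented insn counts: if a pattern condition's cost check depends on whether new pseudos may currently be created, the same insn can be recognizable early and unrecognizable later. Passing an explicit, fixed argument makes the answer independent of pass state. Which value the real riscv.md and bitmanip.md conditions pass is not stated here; false is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented costs: 2 insns if a temporary pseudo may be created, else 4. */
static int
model_const_insns (bool allow_new_pseudos)
{
  return allow_new_pseudos ? 2 : 4;
}

/* A pattern condition accepting the insn only if the constant is cheap.
   STABLE selects a fixed argument instead of the current pass state. */
static bool
model_pattern_condition (bool can_create_pseudos_now, bool stable)
{
  bool allow = stable ? false : can_create_pseudos_now;
  return model_const_insns (allow) <= 2;
}

static void
check_cost_stability (void)
{
  /* Unstable: recognizable early, unrecognizable late. */
  assert (model_pattern_condition (true, false)
          != model_pattern_condition (false, false));
  /* Stable: same answer regardless of pass state. */
  assert (model_pattern_condition (true, true)
          == model_pattern_condition (false, true));
}
```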
Built and tested on rv32 in my tester. Let's see what pre-commit testing has
to say though 🙂
Note this will also require a minor change to the in-flight constant synthesis
work.
PR target/116282
gcc/
* config/riscv/riscv-protos.h (riscv_const_insns): Add new argument.
* config/riscv/riscv.cc (riscv_build_integer): Add new argument
ALLOW_NEW_PSEUDOS. Pass it down to recursive calls and check it
before using synthesis which allows new registers to be created.
(riscv_split_integer_cost): Pass new argument to riscv_build_integer.
(riscv_integer_cost): Add ALLOW_NEW_PSEUDOS argument, pass it down to
riscv_build_integer.
(riscv_legitimate_constant_p): Pass new argument to riscv_const_insns.
(riscv_const_insns): New argument ALLOW_NEW_PSEUDOS. Pass it down to
riscv_integer_cost and riscv_const_insns.
(riscv_split_const_insns): Pass new argument to riscv_const_insns.
(riscv_move_integer, riscv_rtx_costs): Similarly.
* config/riscv/riscv.md (shadd with costly constant): Pass new argument
to riscv_const_insns.
* config/riscv/bitmanip.md (and with costly constant): Pass new argument
to riscv_const_insns.
gcc/testsuite/
* gcc.target/riscv/pr116282.c: New test.
|