Age | Commit message | Author | Files | Lines
2023-10-08 | Support smin/smax for V2HF/V4HF | liuhongt | 3 | -35/+83
gcc/ChangeLog: * config/i386/mmx.md (VHF_32_64): New mode iterator. (<insn><mode>3): New define_expand, merged from .. (<insn>v4hf3): .. this and (<insn>v2hf3): .. this. (movd_v2hf_to_sse_reg): New define_expand, split from .. (movd_v2hf_to_sse): .. this. (<code><mode>3): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-vminmaxph-1.c: New test. * gcc.target/i386/avx512fp16-64-32-vecop-1.c: Scan-assembler only for { target { ! ia32 } }.
2023-10-08 | Fortran/OpenMP: Fix handling of strictly structured blocks | Tobias Burnus | 5 | -5/+127
For strictly structured blocks, a BLOCK was created, but the code was placed after that block in the outer structured block. Additionally, labelled blocks were mishandled. As the code is now properly inside the BLOCK, this also solves additional issues. gcc/fortran/ChangeLog: * parse.cc (parse_omp_structured_block): Make the user code end up inside of BLOCK construct for strictly structured blocks; fix fallout for 'section' and 'teams'. * openmp.cc (resolve_omp_target): Fix changed BLOCK handling for teams in target checking. libgomp/ChangeLog: * testsuite/libgomp.fortran/strictly-structured-block-1.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/block_17.f90: New test. * gfortran.dg/gomp/strictly-structured-block-5.f90: New test.
2023-10-08 | rs6000: build constant via li/lis;rldic | Jiufu Guo | 2 | -2/+89
This patch checks whether a constant can be built by "li;rldic". Only a "negative li" needs to be handled; the other forms do not need to be checked. For example, a "negative lis" is just a "negative li" with an additional shift. gcc/ChangeLog: * config/rs6000/rs6000.cc (can_be_built_by_li_and_rldic): New function. (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rldic. gcc/testsuite/ChangeLog: * gcc.target/powerpc/const-build.c: Add more tests.
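To make the "li;rldic" check concrete, here is a minimal Python model of the search. It is a sketch under stated assumptions, not the C++ code of can_be_built_by_li_and_rldic: the helper names are hypothetical, only negative li immediates are tried (as the message says), and it assumes the Power ISA semantics of `li` (sign-extended 16-bit immediate) and `rldic ra,rs,sh,mb` (rotate left by sh, then keep only the bits from sh up to 63-mb, counting from the least significant bit). Bits that the mask clears are "don't care" in the seed, so the sketch simply sets them to one, as a negative li would.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit pattern left by n bits."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x


def rldic_mask(sh: int, mb: int) -> int:
    """Bits kept by 'rldic ra,rs,sh,mb': clear mb high bits and sh low bits."""
    return (MASK64 >> mb) & ~((1 << sh) - 1)


def li_rldic(c: int):
    """Hypothetical helper: (imm, sh, mb) with imm < 0 such that
    'li r,imm; rldic r,r,sh,mb' yields c, or None."""
    c &= MASK64
    ones = MASK64 ^ 0x7FFF                 # a negative li sets bits 15..63
    for sh in range(64):
        for mb in range(64 - sh):          # mb <= 63 - sh: no wrap-around mask
            keep = rldic_mask(sh, mb)
            if c & ~keep:
                continue                   # c has bits that rldic would clear
            known = rotl64(keep, 64 - sh)  # seed bits that survive the mask
            seed = rotl64(c, 64 - sh)      # ... and the values they must have
            if (seed & ones & known) == (ones & known):
                # Cleared bits are free: fill them like a negative li would.
                return to_signed64(seed | (ones & ~known)), sh, mb
    return None


print(li_rldic(0x00FFFFFFFFFFFF00))  # (-256, 0, 8): li -256; rldic 0,8
```

Returning the first match keeps the sketch short; a real implementation would also weigh instruction count against other sequences.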
2023-10-08 | rs6000: build constant via li/lis;rldicl/rldicr | Jiufu Guo | 2 | -2/+105
If a constant can be obtained by clearing the left or right bits of a rotated negative "li/lis" value, then use "li/lis; rldicl/rldicr" to build the constant. gcc/ChangeLog: * config/rs6000/rs6000.cc (can_be_built_by_li_lis_and_rldicl): New function. (can_be_built_by_li_lis_and_rldicr): New function. (rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rldicr and can_be_built_by_li_lis_and_rldicl. gcc/testsuite/ChangeLog: * gcc.target/powerpc/const-build.c: Add more tests.
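The "rotate a negative li/lis value, then clear bits on the left or right" idea can be modeled as a brute-force search. This Python sketch is an illustration under assumptions, not the rs6000.cc implementation: all names are hypothetical, and it assumes `rldicl ra,rs,sh,mb` keeps the low 64-mb bits and `rldicr ra,rs,sh,me` keeps the high me+1 bits of the rotated value, while a negative `li` sets bits 15..63 and a negative `lis` sets bits 31..63 with bits 0..15 clear.

```python
MASK64 = (1 << 64) - 1

# (must-be-one, must-be-zero) bits of the value loaded by a negative immediate.
NEG_LI = (MASK64 ^ 0x7FFF, 0)             # li:  bits 15..63 set
NEG_LIS = (MASK64 ^ 0x7FFFFFFF, 0xFFFF)   # lis: bits 31..63 set, 0..15 clear


def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit pattern left by n bits."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x


def find_seed(c, sh, keep, pattern):
    """Find v matching pattern with rotl64(v, sh) & keep == c, else None."""
    ones, zeros = pattern
    if c & ~keep:
        return None                       # c has bits that the mask clears
    known = rotl64(keep, 64 - sh)         # seed bits that survive the mask
    kv = rotl64(c, 64 - sh)               # ... and the values they must have
    if (kv & ones & known) != (ones & known) or (kv & zeros & known):
        return None
    return kv | (ones & ~known)           # cleared bits follow the pattern


def li_lis_rldicl_rldicr(c: int):
    """Hypothetical helper: (base, imm, insn, sh, mb_or_me) or None."""
    c &= MASK64
    for base, pattern in (("li", NEG_LI), ("lis", NEG_LIS)):
        for sh in range(64):
            for k in range(1, 64):        # number of bits the mask clears
                candidates = (
                    ("rldicl", MASK64 >> k, k),                   # mb = k
                    ("rldicr", MASK64 ^ ((1 << k) - 1), 63 - k),  # me = 63 - k
                )
                for insn, keep, operand in candidates:
                    v = find_seed(c, sh, keep, pattern)
                    if v is not None:
                        imm = to_signed64(v)
                        if base == "lis":
                            imm >>= 16    # lis takes the upper halfword
                        return base, imm, insn, sh, operand
    return None


def apply_sequence(base, imm, insn, sh, operand):
    """Model 'base r,imm; insn r,r,sh,operand' to double-check a result."""
    v = (imm if base == "li" else imm << 16) & MASK64
    r = rotl64(v, sh)
    if insn == "rldicl":
        return r & (MASK64 >> operand)                 # operand is mb
    return r & (MASK64 ^ ((1 << (63 - operand)) - 1))  # operand is me


print(li_lis_rldicl_rldicr(0x00FFFFFFFFFFFFFF))  # ('li', -1, 'rldicl', 0, 8)
```

Trying "li" before "lis" mirrors the idea that the simpler immediate should be preferred, but the search order here is a choice of the sketch, not taken from GCC.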
2023-10-08 | rs6000: build constant via lis;rotldi | Jiufu Guo | 2 | -7/+53
If a constant can be rotated to/from a negative "lis" value, then use "lis;rotldi" to build the constant. Positive "lis" values do not need to be analyzed: if a constant can be rotated from a positive "lis" value, it can also be rotated from a positive "li" value. gcc/ChangeLog: * config/rs6000/rs6000.cc (can_be_rotated_to_negative_lis): New function. (can_be_built_by_li_and_rotldi): Rename to ... (can_be_built_by_li_lis_and_rotldi): ... this function. (rs6000_emit_set_long_const): Call can_be_built_by_li_lis_and_rotldi. gcc/testsuite/ChangeLog: * gcc.target/powerpc/const-build.c: Add more tests.
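A minimal Python model of the negative-"lis" case follows. It assumes `lis r,imm` loads `imm << 16` sign-extended to 64 bits and that `rotldi r,r,n` rotates left by n; the function names are hypothetical and this is not GCC's can_be_rotated_to_negative_lis. It also illustrates why positive lis can be skipped: a positive lis value is `imm << 16`, which is just a positive li value rotated left by 16.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit pattern left by n bits (the effect of rotldi)."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x


def lis_rotldi(c: int):
    """Hypothetical helper: (imm, n) with imm < 0 such that
    'lis r,imm; rotldi r,r,n' yields c, or None."""
    c &= MASK64
    for n in range(64):
        v = to_signed64(rotl64(c, 64 - n))  # rotate c right by n
        # A negative lis value is imm << 16 with imm in [-32768, -1]:
        # its low 16 bits are zero and it lies in [-2**31, -65536].
        if (v & 0xFFFF) == 0 and -(1 << 31) <= v < 0:
            return v >> 16, n
    return None


print(lis_rotldi(0xFFFFFFFFFF0000FF))  # (-1, 8): lis -1; rotldi 8
```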
2023-10-08 | rs6000: build constant via li;rotldi | Jiufu Guo | 2 | -6/+98
If a constant can be rotated to/from a positive or negative value that "li" can generate, then "li;rotldi" can be used to build the constant. gcc/ChangeLog: * config/rs6000/rs6000.cc (can_be_built_by_li_and_rotldi): New function. (rs6000_emit_set_long_const): Call can_be_built_by_li_and_rotldi. gcc/testsuite/ChangeLog: * gcc.target/powerpc/const-build.c: New test.
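The "li;rotldi" check amounts to trying all 64 rotation amounts. The Python sketch below is a hypothetical model, not GCC's can_be_built_by_li_and_rotldi; it assumes `li` loads a sign-extended 16-bit immediate and `rotldi r,r,n` rotates the 64-bit register left by n.

```python
MASK64 = (1 << 64) - 1


def rotl64(x: int, n: int) -> int:
    """Rotate a 64-bit pattern left by n bits (the effect of rotldi)."""
    n %= 64
    x &= MASK64
    return ((x << n) | (x >> (64 - n))) & MASK64


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x


def li_rotldi(c: int):
    """Hypothetical helper: (imm, n) such that 'li r,imm; rotldi r,r,n'
    yields c, or None if no such pair exists."""
    c &= MASK64
    for n in range(64):
        # Undo the rotation: rotating left by 64 - n is rotating right by n.
        imm = to_signed64(rotl64(c, 64 - n))
        # li sign-extends a 16-bit immediate, so imm must be in [-32768, 32767].
        if -0x8000 <= imm <= 0x7FFF:
            return imm, n
    return None


print(li_rotldi(0x8000000000000000))  # (16384, 49): li 0x4000; rotldi 49
```

Both positive and negative immediates are accepted, matching the message; the first rotation that works is returned.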
2023-10-08 | [i386] Fix apx test fails on 32bit target | Hongyu Wang | 5 | -5/+5
Since -mapxf, like -muintr, emits an error on 32-bit targets, add an !ia32 target guard to the APX-related tests. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-egprs-names.c: Compile for non-ia32. * gcc.target/i386/apx-inline-gpr-norex2.c: Likewise. * gcc.target/i386/apx-interrupt-1.c: Likewise. * gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: Likewise. * gcc.target/i386/apx-legacy-insn-check-norex2.c: Likewise.
2023-10-08 | RISC-V: add static-pie support | Yanzhang Wang | 1 | -3/+4
We only need to pass the options to the linker when -static-pie is given. A separate patch enables static-pie in glibc, and GCC needs to support it first. gcc/ChangeLog: * config/riscv/linux.h: Pass the static-pie specific options to the linker. Signed-off-by: Yanzhang Wang <yanzhang.wang@intel.com>
2023-10-08 | TEST: Fix XPASS of TSVC testsuites for RVV | Juzhe-Zhong | 23 | -23/+23
Fix the following XPASS FAILs of TSVC for RVV:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"
gcc/testsuite/ChangeLog:
* gcc.dg/vect/tsvc/vect-tsvc-s1115.c: Fix TSVC XPASS.
* gcc.dg/vect/tsvc/vect-tsvc-s114.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1232.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s124.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s1279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s161.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s253.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s257.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s271.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s2711.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s2712.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s272.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s273.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s274.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s276.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s278.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s279.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s3111.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s353.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s441.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-s443.c: Ditto.
* gcc.dg/vect/tsvc/vect-tsvc-vif.c: Ditto.
2023-10-08 | RISC-V: Enable more tests of "vect" for RVV | Juzhe-Zhong | 1 | -37/+101
This patch enables almost full coverage of the vectorization tests for RVV, except for the following tests (not enabled yet):
1. Will enable soon:
check_effective_target_vect_call_lrint
check_effective_target_vect_call_btrunc
check_effective_target_vect_call_btruncf
check_effective_target_vect_call_ceil
check_effective_target_vect_call_ceilf
check_effective_target_vect_call_floor
check_effective_target_vect_call_floorf
check_effective_target_vect_call_lceil
check_effective_target_vect_call_lfloor
check_effective_target_vect_call_nearbyint
check_effective_target_vect_call_nearbyintf
check_effective_target_vect_call_round
check_effective_target_vect_call_roundf
2. Not sure whether we need to enable them or not:
check_effective_target_vect_complex_*
check_effective_target_vect_simd_clones
check_effective_target_vect_bswap
check_effective_target_vect_widen_shift
check_effective_target_vect_widen_mult_*
check_effective_target_vect_widen_sum_*
check_effective_target_vect_unpack
check_effective_target_vect_interleave
check_effective_target_vect_extract_even_odd
check_effective_target_vect_pack_trunc
check_effective_target_vect_check_ptrs
check_effective_target_vect_sdiv_pow2_si
check_effective_target_vect_usad_*
check_effective_target_vect_udot_*
check_effective_target_vect_sdot_*
check_effective_target_vect_gather_load_ifn
After this patch, we will have the following additional FAILs:
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1115.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s114.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1232.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s124.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s1279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s161.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s253.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s257.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s271.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2711.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s2712.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s272.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s273.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s274.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s276.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s278.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s279.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s3111.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s353.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s441.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s443.c scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c -flto -ffat-lto-objects scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-vif.c scan-tree-dump vect "vectorized 1 loops"
FAIL: gcc.dg/vect/vect-114.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/vect-114.c scan-tree-dump-times vect "vectorized 0 loops" 1
FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 1
FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 1
FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_ADD"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_SUB"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_MUL"
FAIL: gcc.dg/vect/vect-cond-arith-5.c -flto -ffat-lto-objects scan-tree-dump optimized " = \\.COND_RDIV"
They are all dump FAILs (no ICEs or execution FAILs). Fixing them will be done in a separate patch, but I think we should commit this patch first. Ok for trunk?
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Enable more vect tests for RVV.
2023-10-08 | Daily bump. | GCC Administrator | 3 | -1/+502
2023-10-07 | aarch64: Enable Cortex-X4 CPU | Saurabh Jha | 3 | -4/+6
This patch adds support for the Cortex-X4 CPU to GCC. gcc/ChangeLog: * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add support for cortex-x4 core. * config/aarch64/aarch64-tune.md: Regenerated. * doc/invoke.texi: Add command-line option for cortex-x4 core.
2023-10-07 | Revert "RISC-V: Add more run test for FP rounding autovec" | Lehua Ding | 10 | -371/+2
Revert, since it introduced other failures. This reverts commit 7866984ba427dc56a12ee1b8d99feb4927b834b1.
2023-10-07 | [APX EGPR] Handle vex insns that only support GPR16 (5/5) | Kong Lingling | 4 | -142/+242
These vex insns may have a legacy counterpart that can support EGPR, but they do not have an evex counterpart. Split out their vex part from the patterns and make the vex part non-EGPR by adjusting constraints and attr_gpr32. insn list: 1. vmovmskpd/vmovmskps 2. vpmovmskb 3. vrsqrtss/vrsqrtps 4. vrcpss/vrcpps 5. vhaddpd/vhaddps, vhsubpd/vhsubps 6. vldmxcsr/vstmxcsr 7. vaddsubpd/vaddsubps 8. vlddqu 9. vtestps/vtestpd 10. vmaskmovps/vmaskmovpd, vpmaskmovd/vpmaskmovq 11. vperm2f128/vperm2i128 12. vinserti128/vinsertf128 13. vbroadcasti128/vbroadcastf128 14. vcmppd/vcmpps, vcmpss/vcmpsd 15. vgatherdps/vgatherqps, vgatherdpd/vgatherqpd gcc/ChangeLog: * config/i386/constraints.md (jb): New constraint for vsib memory that does not allow gpr32. * config/i386/i386.md (setcc_<mode>_sse): Replace m with jm for avx alternative and set attr_gpr32 to 0. (movmsk_df): Split avx/noavx alternatives and replace "r" with "jr" for avx alternative. (<sse>_rcp<mode>2): Split avx/noavx alternatives and replace "m/Bm" with "jm/ja" for avx alternative, set its gpr32 attr to 0. (*rsqrtsf2_sse): Likewise. * config/i386/mmx.md (mmx_pmovmskb): Split alternative 1 into avx/noavx and assign jr/r constraint to dest. * config/i386/sse.md (<sse>_movmsk<ssemodesuffix><avxsizesuffix>): Split avx/noavx alternatives and replace "r" with "jr" for avx alternative. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext): Likewise. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt): Likewise. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt): Likewise. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_shift): Likewise. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_shift): Likewise. (<sse2_avx2>_pmovmskb): Likewise. (*<sse2_avx2>_pmovmskb_zext): Likewise. (*sse2_pmovmskb_ext): Likewise. (*<sse2_avx2>_pmovmskb_lt): Likewise. (*<sse2_avx2>_pmovmskb_zext_lt): Likewise. (*sse2_pmovmskb_ext_lt): Likewise.
(<sse>_rcp<mode>2): Split avx/noavx alternatives and replace "m/Bm" with "jm/ja" for avx alternative, set its attr_gpr32 to 0. (sse_vmrcpv4sf2): Likewise. (*sse_vmrcpv4sf2): Likewise. (rsqrt<mode>2): Likewise. (sse_vmrsqrtv4sf2): Likewise. (*sse_vmrsqrtv4sf2): Likewise. (avx_h<insn>v4df3): Likewise. (sse3_hsubv2df3): Likewise. (avx_h<insn>v8sf3): Likewise. (sse3_h<insn>v4sf3): Likewise. (<sse3>_lddqu<avxsizesuffix>): Likewise. (avx_cmp<mode>3): Likewise. (avx_vmcmp<mode>3): Likewise. (*sse2_gt<mode>3): Likewise. (sse_ldmxcsr): Likewise. (sse_stmxcsr): Likewise. (avx_vtest<ssemodesuffix><avxsizesuffix>): Replace m with jm for avx alternative and set attr_gpr32 to 0. (avx2_permv2ti): Likewise. (*avx_vperm2f128<mode>_full): Likewise. (*avx_vperm2f128<mode>_nozero): Likewise. (vec_set_lo_v32qi): Likewise. (<avx_avx2>_maskload<ssemodesuffix><avxsizesuffix>): Likewise. (<avx_avx2>_maskstore<ssemodesuffix><avxsizesuffix>): Likewise. (avx_cmp<mode>3): Likewise. (avx_vmcmp<mode>3): Likewise. (*<sse>_maskcmp<mode>3_comm): Likewise. (*avx2_gathersi<VEC_GATHER_MODE:mode>): Replace Tv with jb and set attr_gpr32 to 0. (*avx2_gathersi<VEC_GATHER_MODE:mode>_2): Likewise. (*avx2_gatherdi<VEC_GATHER_MODE:mode>): Likewise. (*avx2_gatherdi<VEC_GATHER_MODE:mode>_2): Likewise. (*avx2_gatherdi<VI4F_256:mode>_3): Likewise. (*avx2_gatherdi<VI4F_256:mode>_4): Likewise. (avx_vbroadcastf128_<mode>): Restrict non-egpr alternative to noavx512vl, set its constraint to jm and set attr_gpr32 to 0. (vec_set_lo_<mode><mask_name>): Likewise. (vec_set_lo_<mode><mask_name>): Likewise for SF/SI modes. (vec_set_hi_<mode><mask_name>): Likewise. (vec_set_hi_<mode><mask_name>): Likewise for SF/SI modes. (vec_set_hi_<mode>): Likewise. (vec_set_lo_<mode>): Likewise. (avx2_set_hi_v32qi): Likewise. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07 | [APX_EGPR] Handle legacy insns that only support GPR16 (4/5) | Kong Lingling | 3 | -170/+289
APX-enabled hardware should also be AVX10-enabled, so for map2/3 insns with an evex counterpart, we assume auto-promotion to EGPR under APX_F if the insn uses GPR32. So for the insns below, we disable EGPR usage for their sse mnemonics, while allowing EGPR generation for their v-prefixed mnemonics. insn list: 1. pabsb/pabsw/pabsd 2. pextrb/pextrw/pextrd/pextrq 3. pinsrb/pinsrd/pinsrq 4. pshufb 5. extractps/insertps 6. pmaddubsw 7. pmulhrsw 8. packusdw 9. palignr 10. movntdqa 11. mpsadbw 12. pmuldq/pmulld 13. pmaxsb/pmaxsd, pminsb/pminsd pmaxud/pmaxuw, pminud/pminuw 14. (pmovsxbw/pmovsxbd/pmovsxbq, pmovsxwd/pmovsxwq, pmovsxdq pmovzxbw/pmovzxbd/pmovzxbq, pmovzxwd/pmovzxwq, pmovzxdq) 15. aesdec/aesdeclast, aesenc/aesenclast 16. pclmulqdq 17. gf2p8affineqb/gf2p8affineinvqb/gf2p8mulb gcc/ChangeLog: * config/i386/i386.md (*movhi_internal): Split out non-gpr supported pextrw with mem constraint to avx/noavx alternatives, set jm and attr gpr32 0 for the noavx alternative. (*mov<mode>_internal): Likewise. * config/i386/mmx.md (mmx_pshufbv8qi3): Change "r/m/Bm" to "jr/jm/ja" and set_attr gpr32 0 for noavx alternative. (mmx_pshufbv4qi3): Likewise. (*mmx_pinsrd): Likewise. (*mmx_pinsrb): Likewise. (*pinsrb): Likewise. (mmx_pshufbv8qi3): Likewise. (mmx_pshufbv4qi3): Likewise. (@sse4_1_insertps_<mode>): Likewise. (*mmx_pextrw): Split alternatives and map non-EGPR constraints, attr_gpr32 and attr_isa to noavx mnemonics. (*movv2qi_internal): Likewise. (*pextrw): Likewise. (*mmx_pextrb): Likewise. (*mmx_pextrb_zext): Likewise. (*pextrb): Likewise. (*pextrb_zext): Likewise. (vec_extractv2si_1): Likewise. (vec_extractv2si_1_zext): Likewise. * config/i386/sse.md (vi128_h_r): New mode attr for pinsr{bw}/pextr{bw} with reg operand. (*abs<mode>2): Split alternatives and %v in mnemonics, map non-EGPR constraints, gpr32 and isa attrs to noavx mnemonics. (*vec_extract<mode>): Likewise. (*vec_extract<mode>): Likewise for HFBF pattern. (*vec_extract<PEXTR_MODE12:mode>_zext): Likewise.
(*vec_extractv4si_1): Likewise. (*vec_extractv4si_zext): Likewise. (*vec_extractv2di_1): Likewise. (*vec_concatv2si_sse4_1): Likewise. (<sse2p4_1>_pinsr<ssemodesuffix>): Likewise. (vec_concatv2di): Likewise. (*sse4_1_<code>v2qiv2di2<mask_name>_1): Likewise. (<ssse3_avx2>_pshufb<mode>3<mask_name>): Change "r/m/Bm" to "jr/jm/ja" and set_attr gpr32 0 for noavx alternative, split %v for avx/noavx alternatives if necessary. (*vec_concatv2sf_sse4_1): Likewise. (*sse4_1_extractps): Likewise. (vec_set<mode>_0): Likewise for VI4F_128. (*vec_setv4sf_sse4_1): Likewise. (@sse4_1_insertps<mode>): Likewise. (ssse3_pmaddubsw128): Likewise. (*<ssse3_avx2>_pmulhrsw<mode>3<mask_name>): Likewise. (<sse4_1_avx2>_packusdw<mask_name>): Likewise. (<ssse3_avx2>_palignr<mode>): Likewise. (<vi8_sse4_1_avx2_avx512>_movntdqa): Likewise. (<sse4_1_avx2>_mpsadbw): Likewise. (*sse4_1_mulv2siv2di3<mask_name>): Likewise. (*<sse4_1_avx2>_mul<mode>3<mask_name>): Likewise. (*sse4_1_<code><mode>3<mask_name>): Likewise. (*<code>v8hi3): Likewise. (*<code>v16qi3): Likewise. (*sse4_1_<code>v8qiv8hi2<mask_name>_1): Likewise. (*sse4_1_zero_extendv8qiv8hi2_3): Likewise. (*sse4_1_zero_extendv8qiv8hi2_4): Likewise. (*sse4_1_<code>v4qiv4si2<mask_name>_1): Likewise. (*sse4_1_<code>v4hiv4si2<mask_name>_1): Likewise. (*sse4_1_zero_extendv4hiv4si2_3): Likewise. (*sse4_1_zero_extendv4hiv4si2_4): Likewise. (*sse4_1_<code>v2hiv2di2<mask_name>_1): Likewise. (*sse4_1_<code>v2siv2di2<mask_name>_1): Likewise. (*sse4_1_zero_extendv2siv2di2_3): Likewise. (*sse4_1_zero_extendv2siv2di2_4): Likewise. (aesdec): Likewise. (aesdeclast): Likewise. (aesenc): Likewise. (aesenclast): Likewise. (pclmulqdq): Likewise. (vgf2p8affineinvqb_<mode><mask_name>): Likewise. (vgf2p8affineqb_<mode><mask_name>): Likewise. (vgf2p8mulb_<mode><mask_name>): Likewise. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07 | [APX EGPR] Handle legacy insns that only support GPR16 (3/5) | Kong Lingling | 5 | -33/+132
Disable EGPR usage for the legacy insns below, which are in opcode map2/3 and have a vex but no evex counterpart. insn list: 1. phminposuw/vphminposuw 2. ptest/vptest 3. roundps/vroundps, roundpd/vroundpd, roundss/vroundss, roundsd/vroundsd 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist gcc/ChangeLog: * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New prototype. * config/i386/i386.cc (x86_evex_reg_mentioned_p): New function. * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0 and constraint jm to all non-evex alternatives, adjust alternative outputs if evex reg is mentioned. * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0 and constraint jm/ja to all non-evex alternatives. (ptesttf2): Likewise. (<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise. (sse4_1_round<ssescalarmodesuffix>): Likewise. (sse4_2_pcmpestri): Likewise. (sse4_2_pcmpestrm): Likewise. (sse4_2_pcmpestr_cconly): Likewise. (sse4_2_pcmpistr): Likewise. (sse4_2_pcmpistri): Likewise. (sse4_2_pcmpistrm): Likewise. (sse4_2_pcmpistr_cconly): Likewise. (aesimc): Likewise. (aeskeygenassist): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic tests. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07 | [APX EGPR] Handle legacy insns that only support GPR16 (2/5) | Kong Lingling | 2 | -24/+155
These legacy insns in opcode map2/3 have a vex but no evex counterpart; disable EGPR for them by adjusting alternatives and attr_gpr32. insn list: 1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw 2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw 3. psignb/vpsignb, psignw/vpsignw, psignd/vpsignd 4. blendps/vblendps, blendpd/vblendpd 5. blendvps/vblendvps, blendvpd/vblendvpd 6. pblendvb/vpblendvb, pblendw/vpblendw 7. mpsadbw/vmpsadbw 8. dpps/vdpps, dppd/vdppd 9. pcmpeqq/vpcmpeqq, pcmpgtq/vpcmpgtq gcc/ChangeLog: * config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3): Set attr gpr32 0 and constraint jm/ja to all mem alternatives. (ssse3_ph<plusminus_mnemonic>wv8hi3): Likewise. (ssse3_ph<plusminus_mnemonic>wv4hi3): Likewise. (avx2_ph<plusminus_mnemonic>dv8si3): Likewise. (ssse3_ph<plusminus_mnemonic>dv4si3): Likewise. (ssse3_ph<plusminus_mnemonic>dv2si3): Likewise. (<ssse3_avx2>_psign<mode>3): Likewise. (ssse3_psign<mode>3): Likewise. (<sse4_1>_blend<ssemodesuffix><avxsizesuffix>): Likewise. (<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>): Likewise. (*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_lt): Likewise. (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_ltint): Likewise. (<sse4_1>_dp<ssemodesuffix><avxsizesuffix>): Likewise. (<sse4_1_avx2>_mpsadbw): Likewise. (<sse4_1_avx2>_pblendvb): Likewise. (*<sse4_1_avx2>_pblendvb_lt): Likewise. (sse4_1_pblend<ssemodesuffix>): Likewise. (*avx2_pblend<ssemodesuffix>): Likewise. (avx2_permv2ti): Likewise. (*avx_vperm2f128<mode>_nozero): Likewise. (*avx2_eq<mode>3): Likewise. (*sse4_1_eqv2di3): Likewise. (sse4_2_gtv2di3): Likewise. (avx2_gt<mode>3): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add sse/vex intrinsic tests. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07 | [APX EGPR] Handle legacy insn that only support GPR16 (1/5) | Kong Lingling | 4 | -6/+57
These legacy insns in opcode map0/1 only support GPR16 and have no vex/evex counterpart, so directly adjust the constraints and add the gpr32 attr to the patterns. insn list: 1. xsave/xsave64, xrstor/xrstor64 2. xsaves/xsaves64, xrstors/xrstors64 3. xsavec/xsavec64 4. xsaveopt/xsaveopt64 5. fxsave64/fxrstor64 gcc/ChangeLog: * config/i386/i386.md (<xsave>): Set attr gpr32 0 and constraint jm. (<xsave>_rex64): Likewise. (<xrstor>_rex64): Likewise. (<xrstor>64): Likewise. (fxsave64): Likewise. (fxrstor64): Likewise. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Add apxf check. * gcc.target/i386/apx-legacy-insn-check-norex2.c: New test. * gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07 | [APX EGPR] Handle GPR16 only vector move insns | Hongyu Wang | 2 | -16/+60
For vector move insns like vmovdqa/vmovdqu, their evex counterparts require an explicit suffix 64/32/16/8. The usage of these instructions is prohibited under AVX10_1 or AVX512F, so we select vmovaps/vmovups for vector load/store insns that contain EGPR if there is no AVX512VL, and keep the original move insn selection otherwise. gcc/ChangeLog: * config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa. * config/i386/sse.md (*<extract_type>_vinsert<shuffletype><extract_suf>_0): Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa. (avx_vec_concat<mode>): Likewise, and separate alternative 0 to avx_noavx512f. Co-authored-by: Kong Lingling <lingling.kong@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
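The selection rule above can be summarized as a small decision function. This Python sketch is hypothetical and deliberately simplified: the real logic lives in ix86_get_ssemov (C++) and also deals with the suffixed evex forms, which are ignored here, and pairing vmovdqa with vmovaps and vmovdqu with vmovups (aligned with aligned) is an assumption of the sketch.

```python
def select_vector_move(original: str, uses_egpr: bool, has_avx512vl: bool) -> str:
    """Hypothetical model: without AVX512VL, replace the integer vector moves
    by their FP counterparts when the memory operand uses an extended GPR;
    otherwise keep the originally selected mnemonic."""
    if uses_egpr and not has_avx512vl:
        # Assumed pairing: aligned -> aligned, unaligned -> unaligned.
        return {"vmovdqa": "vmovaps", "vmovdqu": "vmovups"}.get(original, original)
    return original


print(select_vector_move("vmovdqu", uses_egpr=True, has_avx512vl=False))  # vmovups
```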
2023-10-07 | [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint. | Kong Lingling | 3 | -0/+122
In inline asm, we do not know whether the insn can use EGPR, so disable EGPR usage by default by mapping the common reg/mem constraints to non-EGPR constraints. The full mapping is: "g" -> "jrjmi", "r" -> "jr", "m" -> "jm", "<" -> "j<", ">" -> "j>", "o" -> "jo", "V" -> "jV", "p" -> "jp", "Bm" -> "ja". For memory constraints, we add an option -mapx-inline-asm-use-gpr32 to allow/disallow gpr32 usage in any memory-related constraint, as base_reg_class/index_reg_class cannot tell whether the asm insn supports gpr32 or not. gcc/ChangeLog: * config/i386/i386.cc (map_egpr_constraints): New function to map common constraints to EGPR prohibited constraints. (ix86_md_asm_adjust): Call map_egpr_constraints. * config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-inline-gpr-norex2.c: New test. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
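The constraint mapping listed above is essentially a token-by-token rewrite of the constraint string. The Python sketch below is a hypothetical illustration, not GCC's map_egpr_constraints: it assumes that only constraints starting with "B" or "j" are two characters long (GCC itself uses the target's constraint lengths), passes modifiers such as "=", "&" and "," through unchanged, and does not model -mapx-inline-asm-use-gpr32.

```python
# Mapping taken from the commit message; anything else is left unchanged.
EGPR_MAP = {"g": "jrjmi", "r": "jr", "m": "jm", "<": "j<", ">": "j>",
            "o": "jo", "V": "jV", "p": "jp", "Bm": "ja"}


def map_egpr_constraints_sketch(constraint: str) -> str:
    """Rewrite an inline-asm constraint string to its non-EGPR form."""
    out = []
    i = 0
    while i < len(constraint):
        # Simplification: treat only B- and j-prefixed constraints as two letters,
        # so existing "j..." constraints (already non-EGPR) are kept as-is.
        n = 2 if constraint[i] in "Bj" else 1
        tok = constraint[i:i + n]
        out.append(EGPR_MAP.get(tok, tok))
        i += n
    return "".join(out)


print(map_egpr_constraints_sketch("=r,g"))  # =jr,jrjmi
```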
2023-10-07 | [APX EGPR] Add backend hook for base_reg_class/index_reg_class. | Kong Lingling | 4 | -1/+111
Add backend helper functions to verify whether an rtx_insn can use EGPR for the base/index register of its memory operand. The verification rules are:
    1. For an asm insn, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32.
    2. Disable EGPR for unrecognized insns.
    3. If which_alternative is not decided, loop through the enabled alternatives and check their attr_gpr32. Only enable EGPR when all enabled alternatives have attr_gpr32 = 1.
    4. If which_alternative is decided, enable/disable EGPR by its corresponding attr_gpr32.
gcc/ChangeLog: * config/i386/i386-protos.h (ix86_insn_base_reg_class): New prototype. (ix86_regno_ok_for_insn_base_p): Likewise. (ix86_insn_index_reg_class): Likewise. * config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p): New helper function to scan the insn. (ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS. (ix86_regno_ok_for_insn_base_p): Likewise for base regno. (ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS. * config/i386/i386.h (INSN_BASE_REG_CLASS): Define. (REGNO_OK_FOR_INSN_BASE_P): Likewise. (INSN_INDEX_REG_CLASS): Likewise. (enum reg_class): Add INDEX_GPR16. (GENERAL_GPR16_REGNO_P): Define. * config/i386/i386.md (gpr32): New attribute. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
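Rules 1-4 can be modeled as a small decision function. This is a hypothetical sketch, not ix86_memory_address_use_extended_reg_class_p. The names struct insn_model and insn_can_use_egpr, and all their fields, are invented stand-ins for GCC's recog data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one insn as seen by the hook.  These fields
   stand in for GCC's recog data and are NOT GCC's real API.  */
enum insn_kind { INSN_ASM, INSN_UNRECOGNIZED, INSN_RECOGNIZED };

struct insn_model
{
  enum insn_kind kind;
  int n_alternatives;
  const bool *enabled;      /* enabled[i]: alternative i is enabled */
  const bool *gpr32;        /* gpr32[i]: attr_gpr32 of alternative i */
  int which_alternative;    /* -1 if not yet decided */
};

/* Return true if EGPR may be used as the base/index register of the
   memory operand.  ASM_USE_GPR32 stands in for
   ix86_apx_inline_asm_use_gpr32.  */
static bool
insn_can_use_egpr (const struct insn_model *insn, bool asm_use_gpr32)
{
  if (insn->kind == INSN_ASM)                     /* Rule 1.  */
    return asm_use_gpr32;
  if (insn->kind == INSN_UNRECOGNIZED)            /* Rule 2.  */
    return false;
  if (insn->which_alternative >= 0)               /* Rule 4.  */
    {
      assert (insn->which_alternative < insn->n_alternatives);
      return insn->gpr32[insn->which_alternative];
    }
  for (int i = 0; i < insn->n_alternatives; i++)  /* Rule 3.  */
    if (insn->enabled[i] && !insn->gpr32[i])
      return false;
  return true;
}
```

Checking rule 4 before rule 3 is only a code-layout choice: the two cases are mutually exclusive, so the order does not change the result.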
2023-10-07[APX EGPR] Add register and memory constraints that disallow EGPRKong Lingling2-1/+62
For APX, as we extended GENERAL_REG_CLASS, new constraints are needed to restrict insns that cannot use EGPR in either their reg or memory operands. We added a series of constraints for the general/backend ones that relate to GPR usage. All of them are prefixed with "j" to indicate that the constraint does not allow EGPR. gcc/ChangeLog: * config/i386/constraints.md (jr): New register constraint that prohibits EGPR. (jR): Constraint that forces usage of EGPR. (jm): New memory constraint that prohibits EGPR. (ja): Likewise for Bm constraint. (jb): Likewise for Tv constraint. (j<): New auto-dec memory constraint that prohibits EGPR. (j>): Likewise for ">" constraint. (jo): Likewise for "o" constraint. (jV): Likewise for "V" constraint. (jp): Likewise for "p" constraint. * config/i386/i386.h (enum reg_class): Add new reg class GENERAL_GPR16. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07[APX EGPR] Add 16 new integer general purpose registersKong Lingling7-24/+252
Extend GENERAL_REGS with 16 extra registers, r16-r31, named REX2 registers by analogy with the REX registers. They are only enabled under TARGET_APX_EGPR. gcc/ChangeLog: * config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p): New function prototype. * config/i386/i386.cc (regclass_map): Add mapping for 16 new general registers. (debugger64_register_map): Likewise. (ix86_conditional_register_usage): Clear REX2 registers when APX is disabled. (ix86_code_end): Add handling for REX2 reg. (print_reg): Likewise. (ix86_output_jmp_thunk_or_indirect): Likewise. (ix86_output_indirect_branch_via_reg): Likewise. (ix86_attr_length_vex_default): Likewise. (ix86_emit_save_regs): Adjust to allow saving r31. (ix86_register_priority): Set REX2 reg priority same as REX. (x86_extended_reg_mentioned_p): Add check for REX2 regs. (x86_extended_rex2reg_mentioned_p): New function. * config/i386/i386.h (CALL_USED_REGISTERS): Add new extended registers. (REG_ALLOC_ORDER): Likewise. (FIRST_REX2_INT_REG): Define. (LAST_REX2_INT_REG): Ditto. (GENERAL_REGS): Add 16 new registers. (INT_SSE_REGS): Likewise. (FLOAT_INT_REGS): Likewise. (FLOAT_INT_SSE_REGS): Likewise. (INT_MASK_REGS): Likewise. (ALL_REGS): Likewise. (REX2_INT_REG_P): Define. (REX2_INT_REGNO_P): Ditto. (GENERAL_REGNO_P): Add REX2_INT_REGNO_P. (REGNO_OK_FOR_INDEX_P): Ditto. (REG_OK_FOR_INDEX_NONSTRICT_P): Add new extended registers. * config/i386/i386.md: Add 16 new integer general registers. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-egprs-names.c: New test. * gcc.target/i386/apx-spill_to_egprs-1.c: Likewise. * gcc.target/i386/apx-interrupt-1.c: Likewise. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07[APX_EGPR] Initial support for APX_FKong Lingling12-5/+102
Add a -mapx-features= enumeration option to separate the subfeatures of APX_F. -mapxf is treated like the existing ISA flags, and it also sets -mapx-features=apx_all, which enables all subfeatures. gcc/ChangeLog: * common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro. (XCR_APX_F_ENABLED_MASK): Likewise. (get_available_features): Detect APX_F under * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New. (OPTION_MASK_ISA2_APX_F_UNSET): Likewise. (ix86_handle_option): Handle -mapxf. * common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New. * common/config/i386/i386-isas.h: Add entry for APX_F. * config/i386/cpuid.h (bit_APX_F): New. * config/i386/i386.h (bit_APX_F): (TARGET_APX_EGPR, TARGET_APX_PUSH2POP2, TARGET_APX_NDD): New define. * config/i386/i386-opts.h (enum apx_features): New enum. * config/i386/i386-isa.def (APX_F): New DEF_PTA. * config/i386/i386-options.cc (ix86_function_specific_save): Save ix86_apx_features. (ix86_function_specific_restore): Restore it. (ix86_valid_target_attribute_inner_p): Add mapxf. (ix86_option_override_internal): Set ix86_apx_features for PTA and TARGET_APX_F. Also report an error when APX_F is set without TARGET_64BIT. * config/i386/i386.opt: (-mapxf): New ISA flag option. (-mapx=): New enumeration option. (apx_features): New enum type. (apx_none): New enum value. (apx_egpr): Likewise. (apx_push2pop2): Likewise. (apx_ndd): Likewise. (apx_all): Likewise. * doc/invoke.texi: Document mapxf. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-1.c: New test. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
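The subfeature enumeration behaves like a bitmask in which apx_all covers every subfeature. Below is a sketch under that assumption. The bit values, handle_mapxf and the stand-in global are illustrative, and the real definitions in config/i386/i386-opts.h and i386.h may differ.

```c
#include <assert.h>

/* Sketch, assuming one bit per APX subfeature.  */
enum apx_features
{
  apx_none = 0,
  apx_egpr = 1 << 0,
  apx_push2pop2 = 1 << 1,
  apx_ndd = 1 << 2,
  apx_all = apx_egpr | apx_push2pop2 | apx_ndd
};

static_assert (apx_all == 7, "apx_all must cover every subfeature bit");

/* Stand-in for the variable set by -mapx-features=.  */
static int ix86_apx_features = apx_none;

#define TARGET_APX_EGPR      ((ix86_apx_features & apx_egpr) != 0)
#define TARGET_APX_PUSH2POP2 ((ix86_apx_features & apx_push2pop2) != 0)
#define TARGET_APX_NDD       ((ix86_apx_features & apx_ndd) != 0)

/* Hypothetical model of -mapxf: enable every subfeature, but only for
   64-bit targets (GCC reports an error instead of returning -1).  */
static int
handle_mapxf (int target_64bit)
{
  if (!target_64bit)
    return -1;
  ix86_apx_features = apx_all;
  return 0;
}
```

With a bitmask, each TARGET_APX_* test is a single AND, and a future subfeature only needs a new bit plus an update to apx_all.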
2023-10-07[APX EGPR] middle-end: Add index_reg_class with insn argument.Hongyu Wang5-10/+37
As with base_reg_class, INDEX_REG_CLASS does not take the backend insn into account. Add index_reg_class with an insn argument for lra/reload usage. gcc/ChangeLog: * addresses.h (index_reg_class): New wrapper function like base_reg_class. * doc/tm.texi: Document INSN_INDEX_REG_CLASS. * doc/tm.texi.in: Ditto. * lra-constraints.cc (index_part_to_reg): Pass index_class. (process_address_1): Call index_reg_class with curr_insn and replace INDEX_REG_CLASS with its return value index_cl. * reload.cc (find_reloads_address): Likewise. (find_reloads_address_1): Likewise. Co-authored-by: Kong Lingling <lingling.kong@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07[APX EGPR] middle-end: Add insn argument to base_reg_classKong Lingling6-23/+79
The current reload infrastructure does not support choosing base_reg_class per backend insn. Add new macros with insn parameters to base_reg_class for lra/reload usage. gcc/ChangeLog: * addresses.h (base_reg_class): Add insn argument and new macro INSN_BASE_REG_CLASS. (regno_ok_for_base_p_1): Add insn argument and new macro REGNO_OK_FOR_INSN_BASE_P. (regno_ok_for_base_p): Add insn argument and pass it to ok_for_base_p_1. * doc/tm.texi: Document INSN_BASE_REG_CLASS and REGNO_OK_FOR_INSN_BASE_P. * doc/tm.texi.in: Ditto. * lra-constraints.cc (process_address_1): Pass insn to base_reg_class. (curr_insn_transform): Ditto. * reload.cc (find_reloads): Ditto. (find_reloads_address): Ditto. (find_reloads_address_1): Ditto. (find_reloads_subreg_address): Ditto. * reload1.cc (maybe_fix_stack_asms): Ditto. Co-authored-by: Hongyu Wang <hongyu.wang@intel.com> Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
2023-10-07RISC-V: Add more run test for FP rounding autovecPan Li10-2/+371
For _Float16 types, add run tests for:
* ceil
* floor
* nearbyint
* rint
* round
* roundeven
* trunc
For float and double, add run tests for:
* roundeven
The zfa extension is required for these run tests; for rv64 the simulation target_board may look like:
    target_board="riscv-sim/-march=rv64gcv_zfa_zfh/-mabi=lp64d/-mcmodel=medlow"
gcc/testsuite/ChangeLog:
	* gcc.target/riscv/rvv/rvv.exp: Add zfa for building.
	* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-nearbyint-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-rint-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-round-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-roundeven-run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/math-trunc-run-0.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
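A run test like these executes the vectorized loop and compares each result against a scalar reference. The sketch below is a hypothetical reference for roundeven (round to nearest integer, ties to even) together with such a check. It uses double to stay portable and avoids libm. It is not taken from the committed tests, which may compute their expected values differently.

```c
#include <assert.h>

/* Hypothetical scalar reference for roundeven, written without libm.
   Assumes IEEE double and a 64-bit long long.  */
static double
ref_roundeven (double x)
{
  /* Zeros keep their sign; |x| >= 2^52, infinities and NaN are
     already integral (or NaN) and are returned unchanged.  */
  if (x == 0.0 || !(x > -4503599627370496.0 && x < 4503599627370496.0))
    return x;

  long long t = (long long) x;     /* truncate toward zero */
  double frac = x - (double) t;    /* exact, in (-1, 1) */

  if (frac > 0.5 || (frac == 0.5 && (t & 1)))
    t += 1;
  else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
    t -= 1;

  /* Keep the sign of a negative input that rounds to zero.  */
  return (t == 0 && x < 0) ? -0.0 : (double) t;
}

/* Run-test style check: OUT holds the results of the (possibly
   vectorized) loop under test and IN its inputs.  Aborts on the first
   mismatch.  */
static void
check_roundeven (const double *in, const double *out, int n)
{
  for (int i = 0; i < n; i++)
    assert (out[i] == ref_roundeven (in[i]));
}
```

The ties are where roundeven differs from round: 2.5 goes to 2.0 and 3.5 goes to 4.0, whereas round sends both away from zero.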
2023-10-07rs6000: use mtvsrws to move sf from si p9Jiufu Guo2-9/+37
As mentioned in PR108338, on P9 we could use mtvsrws to implement the bitcast from SI to SF (or lowpart DI to SF). For example, for:
    *(long long*)buff = di;
    float f = *(float*)(buff);
"sldi 9,3,32 ; mtvsrd 1,9 ; xscvspdpn 1,1" is generated. A better sequence would be "mtvsrws 1,3 ; xscvspdpn 1,1".
	PR target/108338
gcc/ChangeLog:
	* config/rs6000/rs6000.md (movsf_from_si): Update to generate mtvsrws for P9.
gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/pr108338.c: Updated to check mtvsrws for p9.
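On a little-endian target, the buff snippet reads the low 32 bits of di. The same bitcast can be written portably with memcpy, which avoids the strict-aliasing problem of the pointer casts. The function name below is hypothetical, and the sketch does not check which instructions a given GCC emits for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t), "expects 32-bit float");

/* Reinterpret the low 32 bits of DI as a float: the SI->SF (lowpart
   DI->SF) bitcast discussed above.  Compilers can usually reduce the
   memcpy to a register move.  */
static float
lowpart_di_to_sf (long long di)
{
  uint32_t lo = (uint32_t) di;   /* keep bits 0..31 */
  float f;
  memcpy (&f, &lo, sizeof f);
  return f;
}
```

For instance, 0x3f800000 reinterpreted this way is 1.0f. Because only the low half is used, any bits above bit 31 are ignored.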
2023-10-07rs6000: optimize moving to sf from highpart diJiufu Guo3-5/+49
Currently, we have the pattern "movsf_from_si2", which tries to support moving the high part of a DI to SF. However, the current pattern only accepts "ashiftrt":
    XX:SF=bitcast:SF(subreg(YY:DI>>32),0)
but "lshiftrt" should also be OK. In addition, the current pattern only supports BE. This patch updates the pattern to support LE as well as "lshiftrt".
	PR target/108338
gcc/ChangeLog:
	* config/rs6000/predicates.md (lowpart_subreg_operator): New define_predicate.
	* config/rs6000/rs6000.md (any_rshift): New code_iterator.
	(movsf_from_si2): Rename to ...
	(movsf_from_si2_<code>): ... this.
gcc/testsuite/ChangeLog:
	* gcc.target/powerpc/pr108338.c: New test.
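Why "lshiftrt" is equally valid: the pattern only uses the low 32 bits of (YY >> 32). An arithmetic shift fills bits 32..63 with copies of the sign bit and a logical shift fills them with zeros, but the lowpart subreg discards exactly those bits. The check below demonstrates this in C. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of (x >> 32), i.e. the lowpart subreg of the shifted DI
   value, computed with both kinds of right shift.  */
static uint32_t
highpart_bits (int64_t x)
{
  /* Right-shifting a negative signed value is implementation-defined
     in C; GCC implements it as an arithmetic shift (ashiftrt).  */
  uint32_t via_ashiftrt = (uint32_t) (x >> 32);
  uint32_t via_lshiftrt = (uint32_t) ((uint64_t) x >> 32);

  assert (via_ashiftrt == via_lshiftrt);   /* the dropped bits differ */
  return via_lshiftrt;
}
```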
2023-10-07RISC-V: Bugfix for legitimize address PR/111634Pan Li1-1/+1
Given the following RTL:
    (plus:DI (mult:DI (reg:DI 138 [ g.4_6 ])
                      (const_int 8 [0x8]))
             (lo_sum:DI (reg:DI 167)
                        (symbol_ref:DI ("f") [flags 0x86] <var_decl 0x7fa96ea1cc60 f>)))
When handling the (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C)) case, fp is the lo_sum operand above. We assumed that fp is a REG, but here it is not, which causes an ICE when GCC is built with --enable-checking=rtl. This patch fixes it by adding a REG_P check to ensure the operand is a register. The test case gcc/testsuite/gcc.dg/pr109417.c covers this fix when built with --enable-checking=rtl.
	PR target/111634
gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_legitimize_address): Ensure object is a REG before extracting its REGNO.
Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-07RISC-V: Fix scan-assembler-times of RVV test casexuli2-10/+10
gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler times. * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
2023-10-07Daily bump.GCC Administrator5-1/+153
2023-10-06i386: Implement doubleword shift left by 1 bit using add+adc.Roger Sayle4-1/+33
This patch tweaks the i386 back-end's ix86_split_ashl to implement doubleword left shifts by 1 bit using an add followed by an add-with-carry (i.e. a doubleword x+x) instead of the x86 shld instruction. The replacement sequence both requires fewer bytes and is faster on both Intel and AMD architectures (from Agner Fog's latency tables and confirmed by my own micro-benchmarking).

For the test case:
    __int128 foo(__int128 x) { return x << 1; }

with -O2 we previously generated:
foo:    movq    %rdi, %rax
        movq    %rsi, %rdx
        shldq   $1, %rdi, %rdx
        addq    %rdi, %rax
        ret

with this patch we now generate:
foo:    movq    %rdi, %rax
        movq    %rsi, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. Ok for mainline?

2023-10-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.cc (ix86_split_ashl): Split shifts by one into add3_cc_overflow_1 followed by add3_carry.
	* config/i386/i386.md (@add<mode>3_cc_overflow_1): Renamed from "*add<mode>3_cc_overflow_1" to provide generator function.

gcc/testsuite/ChangeLog
	* gcc.target/i386/ashldi3-2.c: New 32-bit test case.
	* gcc.target/i386/ashlti3-3.c: New 64-bit test case.
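The add/adc identity can be checked in portable C by treating a 128-bit value as two 64-bit halves. The struct and function names are hypothetical. The sketch shows only the arithmetic behind the split, not GCC's splitter.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit value as two 64-bit halves, as held in %rax/%rdx.  */
struct u128 { uint64_t lo, hi; };

/* x << 1 computed as x + x: the add on the low half produces the carry
   that the add-with-carry on the high half consumes, mirroring
   "addq %rdi, %rax; adcq %rsi, %rdx".  */
static struct u128
shl1_add_adc (struct u128 x)
{
  struct u128 r;
  r.lo = x.lo + x.lo;                /* addq: may wrap around */
  uint64_t carry = r.lo < x.lo;      /* carry out of the low add */
  assert (carry == x.lo >> 63);      /* = the bit shifted out of lo */
  r.hi = x.hi + x.hi + carry;        /* adcq */
  return r;
}
```

The carry out of lo + lo is exactly the top bit of lo, which is the bit that shld would move into the high half.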
2023-10-06Makefile.tpl: disable -Werror for feedback stage [PR111663]Sergei Trofimovich2-0/+8
Without the change, profiled bootstrap fails on the master branch with various warnings, for example:
    $ ../gcc/configure
    $ make profiledbootstrap
    ...
    gcc/genmodes.cc: In function ‘int main(int, char**)’:
    gcc/genmodes.cc:2152:1: error: ‘gcc/build/genmodes.gcda’ profile count data file not found [-Werror=missing-profile]
    ...
    gcc/gengtype-parse.cc: In function ‘void parse_error(const char*, ...)’:
    gcc/gengtype-parse.cc:142:21: error: ‘%s’ directive argument is null [-Werror=format-overflow=]
The change removes -Werror just like autofeedback does today.
	PR bootstrap/111663
	* Makefile.tpl (STAGEfeedback_CONFIGURE_FLAGS): Disable -Werror.
	* Makefile.in: Regenerate.
2023-10-06i386: Split lea into shorter left shift by 2 or 3 bits with -Oz.Roger Sayle2-0/+14
This patch avoids long lea instructions for performing x<<2 and x<<3 by splitting them into shorter sal and mov (or xchg) instructions. Because this increases the number of instructions but reduces the total size, it's suitable for -Oz (but not -Os). The impact can be seen in the new test case:
    int foo(int x) { return x<<2; }
    int bar(int x) { return x<<3; }
    long long fool(long long x) { return x<<2; }
    long long barl(long long x) { return x<<3; }

where with -O2 we generate:
foo:    lea     0x0(,%rdi,4),%eax       // 7 bytes
        retq
bar:    lea     0x0(,%rdi,8),%eax       // 7 bytes
        retq
fool:   lea     0x0(,%rdi,4),%rax       // 8 bytes
        retq
barl:   lea     0x0(,%rdi,8),%rax       // 8 bytes
        retq

and with -Oz we now generate:
foo:    xchg    %eax,%edi               // 1 byte
        shl     $0x2,%eax               // 3 bytes
        retq
bar:    xchg    %eax,%edi               // 1 byte
        shl     $0x3,%eax               // 3 bytes
        retq
fool:   xchg    %rax,%rdi               // 2 bytes
        shl     $0x2,%rax               // 4 bytes
        retq
barl:   xchg    %rax,%rdi               // 2 bytes
        shl     $0x3,%rax               // 4 bytes
        retq

Over the entirety of the CSiBE code size benchmark this saves 1347 bytes (0.037%) for x86_64, and 1312 bytes (0.036%) with -m32. Conveniently, there's already a backend function in i386.cc for deciding whether to split an lea into its component instructions, ix86_avoid_lea_for_addr; all that's required is an additional clause checking for -Oz (i.e. optimize_size > 1).

2023-10-06  Roger Sayle  <roger@nextmovesoftware.com>
	    Uros Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog
	* config/i386/i386.cc (ix86_avoid_lea_for_addr): Split LEAs used to perform left shifts into shorter instructions with -Oz.

gcc/testsuite/ChangeLog
	* gcc.target/i386/lea-2.c: New test case.
2023-10-06RISC-V: const: hide mvconst splitter from IRAVineet Gupta1-3/+6
Vlad recently introduced a new gate @ira_in_progress, similar to its counterparts @{reload,lra}_in_progress. Use this to hide the constant synthesis splitter from being matched by recog*() in the IRA register equivalence logic, which is eager to undo the splits, generating worse code for constants (and sometimes no code at all). See PR/109279 (large constant), PR/110748 (const -0.0) ... Granted, the IRA logic is subdued with -fsched-pressure, which is now enabled for the RISC-V backend, but the gate makes this future-proof in addition to helping with -O1 etc. This fixes 1 additional test.
    ========= Summary of gcc testsuite =========
                               | # of unexpected case / # of unique unexpected case
                               |       gcc |       g++ |  gfortran |
    rv32imac/   ilp32/  medlow | 416 / 103 |  13 /   6 |  67 /  12 |
    rv32imafdc/ ilp32d/ medlow | 416 / 103 |  13 /   6 |  24 /   4 |
    rv64imac/   lp64/   medlow | 417 / 104 |   9 /   3 |  67 /  12 |
    rv64imafdc/ lp64d/  medlow | 416 / 103 |   5 /   2 |   6 /   1 |
Also similar to v1, this doesn't move RISC-V SPEC scores at all.
gcc/ChangeLog:
	* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.
Suggested-by: Jeff Law <jeffreyalaw@gmail.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-10-06Docs: Minimally document standard C/C++ attribute syntax.Sandra Loosemore1-22/+52
gcc/ChangeLog: * doc/extend.texi (Function Attributes): Mention standard attribute syntax. (Variable Attributes): Likewise. (Type Attributes): Likewise. (Attribute Syntax): Likewise.
2023-10-06amdgcn: switch mov insns to compact syntaxAndrew Stubbs2-130/+108
The move instructions typically have many alternatives (and I'm about to add more), so they are good candidates for the new syntax. This patch only converts the patterns where there are no significant changes to the generated files. The other patterns can be converted another time. gcc/ChangeLog: * config/gcn/gcn-valu.md (*mov<mode>): Convert to compact syntax. (mov<mode>_exec): Likewise. (mov<mode>_sgprbase): Likewise. * config/gcn/gcn.md (*mov<mode>_insn): Likewise. (*movti_insn): Likewise.
2023-10-06amdgcn: silence warningAndrew Stubbs1-1/+1
gcc/ChangeLog: * config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning.
2023-10-06libgomp.texi: Document some of the device-memory routinesTobias Burnus1-15/+288
libgomp/ChangeLog: * libgomp.texi (Device Memory Routines): New.
2023-10-06MATCH: Fix infinite loop between `vec_cond(vec_cond(a,b,0), c, d)` and `a & b`Andrew Pinski2-0/+12
Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)` into `vec_cond(a & b, c, d)`, but since in this case a is a comparison, fold will change `a & b` back into `vec_cond(a,b,0)`, which causes an infinite loop. The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*) only for GIMPLE, so that fold no longer loops. Note this is a latent bug that has existed since these patterns were added in r11-2577-g229752afe3156a; it was exposed by r14-3350-g47b833a9abe1, which made it possible to remove a VIEW_CONVERT_EXPR. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/111699 gcc/ChangeLog: * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e), (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr111699-1.c: New test.
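A per-lane model shows why the two forms keep converting into each other. When a and b are comparison masks, each lane is either 0 or all ones, so vec_cond(a, b, 0) and a & b are the same value. Each rewrite is therefore valid in both directions, and applying both in fold never terminates. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of vec_cond (MASK, X, Y) when MASK comes from a vector
   comparison, i.e. is 0 (false) or all ones (true).  */
static int32_t
vec_cond_lane (int32_t mask, int32_t x, int32_t y)
{
  assert (mask == 0 || mask == -1);
  return (mask & x) | (~mask & y);
}
```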
2023-10-06ipa: Remove ipa_bitsJakub Jelinek4-415/+206
The following patch removes the ipa_bits struct pointer/vector from ipa jump functions and ipa cp transformations. The reason is that the struct uses widest_int to represent the mask/value pair, which, in the RFC patches to allow larger precisions for wide_int/widest_int, is GC-unfriendly because those types become non-trivially default constructible/copyable/destructible. One option would be to use trailing_wide_int instead, but as pointed out by Aldy, irange_storage, which we already use under the hood for ipa_vr when the type of the parameter is integral or pointer, already stores the mask/value pair because VRP now does the bit CP as well. So this patch just uses m_vr to store both the value range and the bitmask. There is still separate propagation of the ipcp_bits_lattice from propagation of the ipcp_vr_lattice, but when storing we merge the two into the same container. 2023-10-06 Jakub Jelinek <jakub@redhat.com> * ipa-prop.h (ipa_bits): Remove. (struct ipa_jump_func): Remove bits member. (struct ipcp_transformation): Remove bits member, adjust ctor and dtor. (ipa_get_ipa_bits_for_value): Remove. * ipa-prop.cc (struct ipa_bit_ggc_hash_traits): Remove. (ipa_bits_hash_table): Remove. (ipa_print_node_jump_functions_for_edge): Don't print bits. (ipa_get_ipa_bits_for_value): Remove. (ipa_set_jfunc_bits): Remove. (ipa_compute_jump_functions_for_edge): For pointers query pointer alignment before ipa_set_jfunc_vr and update_bitmask in there. For integral types, just rely on bitmask already being handled in value ranges. (ipa_check_create_edge_args): Don't create ipa_bits_hash_table. (ipcp_transformation_initialize): Neither here. (ipcp_transformation_t::duplicate): Don't copy bits vector. (ipa_write_jump_function): Don't stream bits here. (ipa_read_jump_function): Neither here. (useful_ipcp_transformation_info_p): Don't test bits vec. (write_ipcp_transformation_info): Don't stream bits here. (read_ipcp_transformation_info): Neither here. 
(ipcp_get_parm_bits): Get mask and value from m_vr rather than bits. (ipcp_update_bits): Remove. (ipcp_update_vr): For pointers, set_ptr_info_alignment from bitmask stored in value range. (ipcp_transform_function): Don't test bits vector, don't call ipcp_update_bits. * ipa-cp.cc (propagate_bits_across_jump_function): Don't use jfunc->bits, instead get mask and value from jfunc->m_vr. (ipcp_store_bits_results): Remove. (ipcp_store_vr_results): Incorporate parts of ipcp_store_bits_results here, merge the bitmasks with value range if both are supplied. (ipcp_driver): Don't call ipcp_store_bits_results. * ipa-sra.cc (zap_useless_ipcp_results): Remove *ts->bits clearing.
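For pointers, ipcp_update_vr now derives the alignment from the bitmask stored in the value range. The arithmetic can be sketched as follows, assuming the irange bitmask convention that a 1 bit in the mask means "unknown". The function name is hypothetical, and GCC's code differs in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Bits that are 0 in MASK are known and equal to the corresponding
   bits of VALUE.  The lowest unknown bit bounds the provable
   alignment, and the known bits below it give the misalignment.  */
static void
align_from_bitmask (uint64_t mask, uint64_t value,
                    uint64_t *align, uint64_t *misalign)
{
  /* A fully known pointer is a constant; not handled in this sketch.  */
  assert (mask != 0);
  *align = mask & -mask;              /* lowest set (unknown) bit */
  *misalign = value & (*align - 1);
}
```

For example, if the low 4 bits are known to be 0100, the pointer is 16-byte aligned plus 4 (align 16, misalign 4).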
2023-10-05RISC-V: Use stdint-gcc.h in rvv testsuitePatrick O'Neill28-28/+28
stdint.h can be replaced with stdint-gcc.h to resolve some missing system headers in non-multilib installations. Tested using glibc rv32gcv and rv64gcv on r14-4381-g7eb5ce7f58e. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-1.h: Replace stdint.h with stdint-gcc.h. * gcc.target/riscv/rvv/autovec/cond/cond_convert_float2float-2.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-4.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-5.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-6.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-7.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_unary-8.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-8.c: Ditto. * gcc.target/riscv/rvv/autovec/partial/slp-9.c: Ditto. * gcc.target/riscv/rvv/autovec/pr111232.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/cvt-0.c: Ditto. * gcc.target/riscv/rvv/autovec/unop/cvt-1.c: Ditto. * gcc.target/riscv/rvv/autovec/vls-vlmax/perm.h: Ditto. * gcc.target/riscv/rvv/base/abi-call-args-4-run.c: Ditto. * gcc.target/riscv/rvv/base/pr110119-2.c: Ditto. * gcc.target/riscv/rvv/vsetvl/pr111255.c: Ditto. * gcc.target/riscv/rvv/vsetvl/wredsum_vlmax.c: Ditto. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-10-06RISC-V: Update comments for FP rounding related autovecPan Li1-1/+5
Some comments are out of date; this patch fixes them. gcc/ChangeLog: * config/riscv/autovec.md: Update comments. Signed-off-by: Pan Li <pan2.li@intel.com>
2023-10-06Daily bump.GCC Administrator7-1/+267
2023-10-05RISC-V: Test memcpy inlined on riscv_vPatrick O'Neill2-0/+8
Since r14-4358-g9464e72bcc9, riscv_v targets use vector instructions to perform a memcpy, so we no longer expect a memcpy call on riscv_v targets. gcc/testsuite/ChangeLog: * gcc.dg/pr90263.c: Skip riscv_v targets. * gcc.target/riscv/rvv/base/pr90263.c: New test. Signed-off-by: Patrick O'Neill <patrick@rivosinc.com> Co-authored-by: Joern Rennecke <joern.rennecke@embecosm.com>
2023-10-05Delete MALLOC_ABI_ALIGNMENT define from pa32-linux.hJohn David Anglin1-5/+0
2023-10-05 John David Anglin <danglin@gcc.gnu.org> * config/pa/pa32-linux.h (MALLOC_ABI_ALIGNMENT): Delete.
2023-10-05libstdc++: [_GLIBCXX_INLINE_VERSION] Add missing symbolsFrançois Dumont1-0/+9
libstdc++-v3/ChangeLog: * config/abi/pre/gnu-versioned-namespace.ver: Add missing symbols for _Float{16,32,64,128,32x,64x,128x}.
2023-10-05Create a fast VRP passAndrew MacLeod3-0/+126
* timevar.def (TV_TREE_FAST_VRP): New. * tree-pass.h (make_pass_fast_vrp): New prototype. * tree-vrp.cc (class fvrp_folder): New. (fvrp_folder::fvrp_folder): New. (fvrp_folder::~fvrp_folder): New. (fvrp_folder::value_of_expr): New. (fvrp_folder::value_on_edge): New. (fvrp_folder::value_of_stmt): New. (fvrp_folder::pre_fold_bb): New. (fvrp_folder::post_fold_bb): New. (fvrp_folder::pre_fold_stmt): New. (fvrp_folder::fold_stmt): New. (execute_fast_vrp): New. (pass_data_fast_vrp): New. (pass_vrp::execute): Check for fast VRP pass. (make_pass_fast_vrp): New.
2023-10-05Add a dom based ranger for fast VRP.Andrew MacLeod2-0/+328
Provide a dominator based implementation of a range query. * gimple-range.cc (dom_ranger::dom_ranger): New. (dom_ranger::~dom_ranger): New. (dom_ranger::range_of_expr): New. (dom_ranger::edge_range): New. (dom_ranger::range_on_edge): New. (dom_ranger::range_in_bb): New. (dom_ranger::range_of_stmt): New. (dom_ranger::maybe_push_edge): New. (dom_ranger::pre_bb): New. (dom_ranger::post_bb): New. * gimple-range.h (class dom_ranger): New.
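A dominator-based range query refines a variable's range on each outgoing edge of a condition. It then makes that refined range visible while visiting the blocks that edge dominates; presumably this is what the pre_bb/post_bb hooks manage. The sketch below shows only the edge-refinement step, for a condition x < c. The range type and function names are hypothetical and far simpler than GCC's irange and dom_ranger.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, much simplified range [lo, hi]; lo > hi means empty.  */
struct range { long lo, hi; };

/* Range of X on the true edge of "if (x < c)": x <= c - 1.  */
static struct range
range_on_true_edge_lt (struct range r, long c)
{
  assert (r.lo <= r.hi);
  if (c == LONG_MIN)
    return (struct range) { 1, 0 };   /* x < LONG_MIN is impossible */
  if (r.hi > c - 1)
    r.hi = c - 1;
  return r;
}

/* Range of X on the false edge of "if (x < c)": x >= c.  */
static struct range
range_on_false_edge_lt (struct range r, long c)
{
  assert (r.lo <= r.hi);
  if (r.lo < c)
    r.lo = c;
  return r;
}

static int
range_empty_p (struct range r)
{
  return r.lo > r.hi;
}
```

An empty range on an edge means that edge can never be taken, which is the kind of fact a VRP pass can use to fold the condition.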