path: root/gcc/config
Age | Commit message | Author | Files | Lines
2018-05-25 | Add IFN_COND_{MUL,DIV,MOD,RDIV} | Richard Sandiford | 2 | -2/+54
This patch adds support for conditional multiplication and division. It's mostly mechanical, but a few notes:

* The *_optab name and the .md names are the same as the unconditional forms, just with "cond_" added to the front. This means we still have the awkward difference between sdiv and div, etc.
* It was easier to retain the difference between integer and FP division in the function names, given that they map to different tree codes (TRUNC_DIV_EXPR and RDIV_EXPR).
* SVE has no direct support for IFN_COND_MOD, but it seemed more consistent to add it anyway.
* Adding IFN_COND_MUL enables an extra fully-masked reduction in gcc.dg/vect/pr53773.c.
* In practice we don't actually use the integer division forms without if-conversion support (added by a later patch).

2018-05-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* doc/sourcebuild.texi (vect_double_cond_arith): Include multiplication and division.
* doc/md.texi (cond_mul@var{m}, cond_div@var{m}, cond_mod@var{m}, cond_udiv@var{m}, cond_umod@var{m}): Document.
* optabs.def (cond_smul_optab, cond_sdiv_optab, cond_smod_optab, cond_udiv_optab, cond_umod_optab): New optabs.
* internal-fn.def (IFN_COND_MUL, IFN_COND_DIV, IFN_COND_MOD, IFN_COND_RDIV): New internal functions.
* internal-fn.c (get_conditional_internal_fn): Handle TRUNC_DIV_EXPR, TRUNC_MOD_EXPR and RDIV_EXPR.
* match.pd (UNCOND_BINARY, COND_BINARY): Handle them.
* config/aarch64/iterators.md (UNSPEC_COND_MUL, UNSPEC_COND_DIV): New unspecs.
(SVE_INT_BINARY): Include mult.
(SVE_COND_FP_BINARY): Include UNSPEC_MUL and UNSPEC_DIV.
(optab, sve_int_op): Handle mult.
(optab, sve_fp_op, commutative): Handle UNSPEC_COND_MUL and UNSPEC_COND_DIV.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): New pattern for SVE_INT_BINARY_SD.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_double_cond_arith): Include multiplication and division.
* gcc.dg/vect/pr53773.c: Do not expect a scalar tail when using fully-masked loops with a fixed vector length.
* gcc.dg/vect/vect-cond-arith-1.c: Add multiplication and division tests.
* gcc.target/aarch64/sve/vcond_8.c: Likewise.
* gcc.target/aarch64/sve/vcond_9.c: Likewise.
* gcc.target/aarch64/sve/vcond_12.c: Add multiplication tests.

From-SVN: r260713
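As a hedged illustration (our own sketch, not taken from the patch; the actual vect-cond-arith-1.c contents may differ), a loop of the following shape is the kind these functions enable. After if-conversion and SVE auto-vectorization the division can become IFN_COND_RDIV, so lanes where the condition is false are never divided:

/* Hypothetical example; function and variable names are ours.  */
void
cond_rdiv (double *restrict r, double *restrict a,
           double *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = b[i] != 0 ? a[i] / b[i] : a[i];
}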
2018-05-25 | [AArch64] Add SVE support for integer division | Richard Sandiford | 2 | -0/+36
After the previous patch to prevent pessimisation of divisions by constants, this patch adds support for the SVE integer division instructions.

2018-05-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* config/aarch64/iterators.md (SVE_INT_BINARY_SD): New code iterator.
(optab, sve_int_op): Handle div and udiv.
* config/aarch64/aarch64-sve.md (<optab><mode>3): New expander for SVE_INT_BINARY_SD.
(*<optab><mode>3): New insn for the same.

gcc/testsuite/
* gcc.target/aarch64/sve/div_1.c: New test.
* gcc.target/aarch64/sve/div_1_run.c: Likewise.
* gcc.target/aarch64/sve/mul_highpart_2.c: Likewise.
* gcc.target/aarch64/sve/mul_highpart_2_run.c: Likewise.

From-SVN: r260712
2018-05-25 | Fold VEC_COND_EXPRs to IFN_COND_* where possible | Richard Sandiford | 4 | -5/+101
This patch adds the folds:

  (vec_cond COND (foo A B) C) -> (IFN_COND_FOO COND A B C)
  (vec_cond COND C (foo A B)) -> (IFN_COND_FOO (!COND) A B C)

with the usual implicit restriction that the target must support the produced IFN_COND_FOO. The results of these folds don't have identical semantics, since the reverse transform would be invalid if (FOO A[i] B[i]) faults when COND[i] is false. But this direction is OK since we're simply dropping faults for operations whose results aren't needed.

The new gimple_resimplify4 doesn't try to do any constant folding on the IFN_COND_*s. This is because a later patch will handle it by folding the associated unconditional operation.

Doing this in gimple is better than doing it in .md patterns, since the second form (with the inverted condition) is much more common than the first, and it's better to fold away the inversion in gimple and optimise the result before entering expand.

2018-05-24  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* doc/sourcebuild.texi (vect_double_cond_arith): Document.
* gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 4.
(gimple_match_op::gimple_match_op): Add an overload for 4 operands.
(gimple_match_op::set_op): Likewise.
(gimple_resimplify4): Declare.
* genmatch.c (get_operand_type): Handle CFN_COND_* functions.
(expr::gen_transform): Likewise.
(decision_tree::gen): Generate a simplification routine for 4 operands.
* gimple-match-head.c (gimple_simplify): Add an overload for 4 operands. In the top-level function, handle up to 4 call arguments and call gimple_resimplify4.
(gimple_resimplify4): New function.
(build_call_internal): Pass a fourth operand.
(maybe_push_to_seq): Likewise.
* match.pd (UNCOND_BINARY, COND_BINARY): New operator lists. Fold VEC_COND_EXPRs of an operation and a default value into an IFN_COND_* function if possible.
* config/aarch64/iterators.md (UNSPEC_COND_MAX, UNSPEC_COND_MIN): New unspecs.
(SVE_COND_FP_BINARY): Include them.
(optab, sve_fp_op): Handle them.
(SVE_INT_BINARY_REV): New code iterator.
(SVE_COND_FP_BINARY_REV): New int iterator.
(commutative): New int attribute.
* config/aarch64/aarch64-protos.h (aarch64_sve_prepare_conditional_op): Declare.
* config/aarch64/aarch64.c (aarch64_sve_prepare_conditional_op): New function.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): Use it.
(*cond_<optab><mode>): New patterns for reversed operands.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_double_cond_arith): New proc.
* gcc.dg/vect/vect-cond-arith-1.c: New test.
* gcc.target/aarch64/sve/vcond_8.c: Likewise.
* gcc.target/aarch64/sve/vcond_8_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_9.c: Likewise.
* gcc.target/aarch64/sve/vcond_9_run.c: Likewise.
* gcc.target/aarch64/sve/vcond_12.c: Likewise.
* gcc.target/aarch64/sve/vcond_12_run.c: Likewise.

From-SVN: r260710
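For illustration, a hedged sketch (our own example, not from the patch): after if-conversion the loop body below contains a VEC_COND_EXPR selecting between (a + b) and c, which the new fold turns into IFN_COND_ADD when the target provides the conditional optab:

/* Hypothetical example; names are ours.  */
void
cond_add (int *restrict r, int *restrict a, int *restrict b,
          int *restrict c, int *restrict pred, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = pred[i] > 0 ? a[i] + b[i] : c[i];
}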
2018-05-25 | Support SHF_EXCLUDE on non-x86 and with Solaris as | Rainer Orth | 2 | -0/+13
* configure.ac (gcc_cv_as_section_has_e): Move to common section. Rename to...
(gcc_cv_as_section_exclude): ... this. Try Solaris as #exclude syntax.
* configure: Regenerate.
* config.in: Regenerate.
* config/i386/i386.c (i386_solaris_elf_named_section): Handle SECTION_EXCLUDE.
* config/sparc/sparc.c (sparc_solaris_elf_asm_named_section) [HAVE_GAS_SECTION_EXCLUDE]: Handle SECTION_EXCLUDE.
* varasm.c (default_elf_asm_named_section): Don't check if HAVE_GAS_SECTION_EXCLUDE is defined.

From-SVN: r260708
2018-05-25 | Add an "else" argument to IFN_COND_* functions | Richard Sandiford | 2 | -41/+52
As suggested by Richard B, this patch changes the IFN_COND_* functions so that they take the else value of the ?: operation as a final argument, rather than always using argument 1.

All current callers will still use the equivalent of argument 1, so this patch makes the SVE code assert that for now. Later patches add the general case.

2018-05-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* doc/md.texi: Update the documentation of the cond_* optabs to mention the new final operand. Fix GET_MODE_NUNITS call. Describe the scalar case too.
* internal-fn.def (IFN_EXTRACT_LAST): Change type to fold_left.
* internal-fn.c (expand_cond_unary_optab_fn): Expect 3 operands instead of 2.
(expand_cond_binary_optab_fn): Expect 4 operands instead of 3.
(get_conditional_internal_fn): Update comment.
* tree-vect-loop.c (vectorizable_reduction): Pass the original accumulator value as a final argument to conditional functions.
* config/aarch64/aarch64-sve.md (cond_<optab><mode>): Turn into a define_expand and add an "else" operand. Assert for now that the else operand is equal to operand 2. Use SVE_INT_BINARY and SVE_COND_FP_BINARY instead of SVE_COND_INT_OP and SVE_COND_FP_OP.
(*cond_<optab><mode>): New patterns.
* config/aarch64/iterators.md (UNSPEC_COND_SMAX, UNSPEC_COND_UMAX, UNSPEC_COND_SMIN, UNSPEC_COND_UMIN, UNSPEC_COND_AND, UNSPEC_COND_ORR, UNSPEC_COND_EOR): Delete.
(optab): Remove associated mappings.
(SVE_INT_BINARY): New code iterator.
(sve_int_op): Remove int attribute and add "minus" to the code attribute.
(SVE_COND_INT_OP): Delete.
(SVE_COND_FP_OP): Rename to...
(SVE_COND_FP_BINARY): ...this.

From-SVN: r260707
2018-05-24 | sse.md (cvtusi2<ssescalarmodesuffix>64<round_name>): Add {q} suffix to insn mnemonic. | Uros Bizjak | 1 | -1/+1

* config/i386/sse.md (cvtusi2<ssescalarmodesuffix>64<round_name>): Add {q} suffix to insn mnemonic.

testsuite/ChangeLog:
* gcc.target/i386/avx512f-vcvtusi2sd64-1.c: Update scan string.
* gcc.target/i386/avx512f-vcvtusi2ss64-1.c: Ditto.

From-SVN: r260691
2018-05-24 | msp430.c (TARGET_WARN_FUNC_RETURN): Define. | Jozef Lawrynowicz | 1 | -0/+11
* config/msp430/msp430.c (TARGET_WARN_FUNC_RETURN): Define.
(msp430_warn_func_return): New.

From-SVN: r260690
2018-05-24 | re PR target/85903 (FAIL: gcc.target/i386/avx512dq-vcvtuqq2pd-2.c) | Uros Bizjak | 1 | -5/+2
PR target/85903
* config/i386/sse.md (movdi_to_sse): Do not generate pseudo when memory input operand is handled.

From-SVN: r260681
2018-05-24 | [AArch64, Falkor] Falkor address costs tuning | Luis Machado | 1 | -1/+17
Switch from using generic address costs to using Falkor-specific ones, which give Falkor better results overall.

gcc/ChangeLog:

2018-05-24  Luis Machado  <luis.machado@linaro.org>

* config/aarch64/aarch64.c (qdf24xx_addrcost_table): New static global.
(qdf24xx_tunings) <addr_costs>: Set to qdf24xx_addrcost_table.

From-SVN: r260675
2018-05-24 | PR target/83009: Relax strict address checking for store pair lanes | Andre Vieira | 1 | -1/+1
The operand constraint for the memory address of store/load pair lanes strictly enforced that only hardware registers be allowed as memory addresses. We want to relax that so these patterns can be used by combine. During register allocation the register constraint will enforce that the correct register is chosen.

gcc/
2018-05-24  Andre Vieira  <andre.simoesdiasvieira@arm.com>

PR target/83009
* config/aarch64/predicates.md (aarch64_mem_pair_lanes_operand): Make address check not strict.

gcc/testsuite/
2018-05-24  Andre Vieira  <andre.simoesdiasvieira@arm.com>

PR target/83009
* gcc.target/aarch64/store_v2vec_lanes.c: Add extra tests.

From-SVN: r260635
2018-05-23 | [Patch 02/02] Introduce prefetch-dynamic-strides option | Luis Machado | 2 | -0/+14
The following patch adds an option to control software prefetching of memory references with non-constant/unknown strides.

Currently we prefetch these references if the pass thinks there is benefit to doing so. But, since this is all based on heuristics, it's not always the case that we end up with better performance.

For Falkor there is also the problem of conflicts with the hardware prefetcher, so we need to be more conservative in terms of what we issue software prefetch hints for. This also aligns GCC with what LLVM does for Falkor.

Similarly to the previous patch, the defaults guarantee no change in behavior for other targets and architectures.

gcc/ChangeLog:

2018-05-23  Luis Machado  <luis.machado@linaro.org>

* config/aarch64/aarch64-protos.h (cpu_prefetch_tune) <prefetch_dynamic_strides>: New const bool field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include prefetch_dynamic_strides.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to false.
(aarch64_override_options_internal): Update to set PARAM_PREFETCH_DYNAMIC_STRIDES.
* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for prefetch-dynamic-strides setting.

From-SVN: r260618
2018-05-23 | [Patch 01/02] Introduce prefetch-minimum stride option | Luis Machado | 2 | -1/+15
This patch adds a new option to control the minimum stride of a memory reference for which the loop prefetch pass may issue software prefetch hints. There are two motivations:

* Make the pass less aggressive, only issuing prefetch hints for bigger strides that are more likely to benefit from prefetching. I've noticed a case in cpu2017 where we were issuing thousands of hints, for example.

* For processors that have a hardware prefetcher, like Falkor, it allows the loop prefetch pass to defer prefetching of smaller (less than the threshold) strides to the hardware prefetcher instead. This prevents conflicts between the software prefetcher and the hardware prefetcher.

I've noticed considerable reduction in the number of prefetch hints and slightly positive performance numbers. This aligns GCC and LLVM in terms of prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets. Those are free to tweak the settings as necessary.

gcc/ChangeLog:

2018-05-23  Luis Machado  <luis.machado@linaro.org>

* config/aarch64/aarch64-protos.h (cpu_prefetch_tune) <minimum_stride>: New const int field.
* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include minimum_stride field defaulting to -1.
(exynosm1_prefetch_tune): Likewise.
(thunderxt88_prefetch_tune): Likewise.
(thunderx_prefetch_tune): Likewise.
(thunderx2t99_prefetch_tune): Likewise.
(qdf24xx_prefetch_tune) <minimum_stride>: Set to 2048.
<default_opt_level>: Set to 3.
(aarch64_override_options_internal): Update to set PARAM_PREFETCH_MINIMUM_STRIDE.
* doc/invoke.texi (prefetch-minimum-stride): Document new option.
* params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
* params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if stride is constant and is below the minimum stride threshold.

From-SVN: r260617
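A hedged usage sketch (the file and loop are ours; the --param name follows the ChangeLog's PARAM_PREFETCH_MINIMUM_STRIDE and its invoke.texi entry):

/* stride.c - compile with something like:
     gcc -O3 -fprefetch-loop-arrays --param prefetch-minimum-stride=2048 stride.c
   Constant strides below the threshold no longer get software prefetch
   hints and are left to the hardware prefetcher.  */
void
scale (double *x, double a, int n)
{
  for (int i = 0; i < n; i += 64)  /* constant stride of 64 doubles */
    x[i] *= a;
}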
2018-05-23 | [arm] Remove mode26 feature bit | Kyrylo Tkachov | 2 | -16/+2
* config/arm/arm-cpus.in (mode26): Delete.
(armv4): Delete mode26 reference.
* config/arm/arm.c (arm_configure_build_target): Delete use of isa_bit_mode26.

From-SVN: r260615
2018-05-23 | i386.md (*floatuns<SWI48:mode><MODEF:mode>2_avx512): New insn pattern. | Uros Bizjak | 1 | -28/+93
* config/i386/i386.md (*floatuns<SWI48:mode><MODEF:mode>2_avx512): New insn pattern.
(floatunssi<mode>2): Also enable for AVX512F and TARGET_SSE_MATH. Rewrite expander pattern. Emit gen_floatunssi<mode>2_i387_with_xmm for non-SSE modes.
(floatunsdisf2): Rewrite expander pattern. Handle TARGET_AVX512F.
(floatunsdidf2): Ditto.
* config/i386/i386.md (fixuns_trunc<mode>di2): New insn pattern.
(fixuns_trunc<mode>si2_avx512f): Ditto.
(*fixuns_trunc<mode>si2_avx512f_zext): Ditto.
(fixuns_trunc<mode>si2): Also enable for AVX512F and TARGET_SSE_MATH. Emit fixuns_trunc<mode>si2_avx512f for AVX512F targets.

testsuite/ChangeLog:
* gcc.target/i386/cvt-2.c: New test.
* gcc.target/i386/cvt-3.c: New test.

From-SVN: r260614
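As a hedged sketch of the conversions these patterns serve (the new cvt-2.c/cvt-3.c tests are not reproduced here; their contents are an assumption), code like the following can go through the new AVX512F patterns when compiled with -mavx512f and SSE math:

/* Hypothetical example; names are ours.  */
float  u32_to_float  (unsigned int x)       { return x; }
double u64_to_double (unsigned long long x) { return x; }
unsigned int trunc_to_u32 (double x)        { return x; }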
2018-05-23 | [AArch64] Simplify frame pointer logic | Wilco Dijkstra | 1 | -17/+29
Simplify the frame pointer logic. Add aarch64_needs_frame_chain to decide, with clearer logic, when to emit the frame chain. Introduce aarch64_use_frame_pointer, which contains the value of -fno-omit-frame-pointer (flag_omit_frame_pointer is set to a magic value so that the mid-end won't force the frame pointer in all cases, and leaf frame pointer emission can't be supported).

gcc/
* config/aarch64/aarch64.c (aarch64_use_frame_pointer): Add new boolean.
(aarch64_needs_frame_chain): New function.
(aarch64_parse_override_string): Set aarch64_use_frame_pointer.

From-SVN: r260606
2018-05-23 | [AArch64][PR target/84882] Add mno-strict-align | Sudakshina Das | 2 | -8/+5
*** gcc/ChangeLog ***

2018-05-23  Sudakshina Das  <sudi.das@arm.com>

PR target/84882
* common/config/aarch64/aarch64-common.c (aarch64_handle_option): Check val before adding MASK_STRICT_ALIGN to opts->x_target_flags.
* config/aarch64/aarch64.opt (mstrict-align): Remove RejectNegative.
* config/aarch64/aarch64.c (aarch64_attributes): Mark allow_neg as true for strict-align.
(aarch64_can_inline_p): Perform checks even when callee has no attributes to check for strict alignment.
* doc/extend.texi (AArch64 Function Attributes): Document no-strict-align.
* doc/invoke.texi (AArch64 Options): Likewise.

*** gcc/testsuite/ChangeLog ***

2018-05-23  Sudakshina Das  <sudi.das@arm.com>

PR target/84882
* gcc.target/aarch64/pr84882.c: New test.
* gcc.target/aarch64/target_attr_18.c: Likewise.

From-SVN: r260604
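A hedged example of what the allow_neg change enables (our own sketch; the no-strict-align attribute string is the one the patch documents in extend.texi):

/* Hypothetical example: opt a single function out of strict-alignment
   code generation, e.g. when building with -mstrict-align globally.  */
__attribute__ ((target ("no-strict-align")))
int
load_unaligned (const char *p)
{
  int x;
  __builtin_memcpy (&x, p + 1, sizeof x);  /* possibly unaligned access */
  return x;
}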
2018-05-22 | [AArch64] Recognize a missed usage of a sbfiz instruction | Luis Machado | 1 | -0/+14
A customer reported the following missed opportunities to combine a couple of instructions into a sbfiz.

int sbfiz32 (int x)
{
  return x << 29 >> 10;
}

long sbfiz64 (long x)
{
  return x << 58 >> 20;
}

This gets converted to the following pattern:

(set (reg:SI 98)
     (ashift:SI (sign_extend:SI (reg:HI 0 x0 [ xD.3334 ]))
                (const_int 6 [0x6])))

Currently, gcc generates the following:

sbfiz32:
	lsl x0, x0, 29
	asr x0, x0, 10
	ret

sbfiz64:
	lsl x0, x0, 58
	asr x0, x0, 20
	ret

It could generate this instead:

sbfiz32:
	sbfiz w0, w0, 19, 3
	ret

sbfiz64:
	sbfiz x0, x0, 38, 6
	ret

The unsigned versions already generate ubfiz for the same code, so the lack of a sbfiz pattern may have been an oversight. This particular sbfiz pattern shows up in both CPU2006 (~80 hits) and CPU2017 (~280 hits). It's not a lot, but seems beneficial in any case. No significant performance differences, probably due to the small number of occurrences or cases outside hot areas.

gcc/ChangeLog:

2018-05-22  Luis Machado  <luis.machado@linaro.org>

* config/aarch64/aarch64.md (*ashift<mode>_extv_bfiz): New pattern.

gcc/testsuite/ChangeLog:

2018-05-22  Luis Machado  <luis.machado@linaro.org>

* gcc.target/aarch64/lsl_asr_sbfiz.c: New test.

From-SVN: r260546
2018-05-22 | [AArch64, patch] Refactor of aarch64-ldpstp | Jackson Woodruff | 3 | -167/+90
This patch removes a lot of duplicated code in aarch64-ldpstp.md. The patterns that did not previously generate a base register now do not check for aarch64_mem_pair_operand in the pattern. This has been extracted to a check in aarch64_operands_ok_for_ldpstp.

All patterns in the file used to have explicit switching code to swap loads and stores that were in the wrong order. This has been extracted into aarch64_operands_ok_for_ldpstp as a final operation after all the checks have been performed.

2018-05-22  Jackson Woodruff  <jackson.woodruff@arm.com>
	    Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

* config/aarch64/aarch64-ldpstp.md: Replace uses of aarch64_mem_pair_operand with memory_operand and delete operand swapping code.
* config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp): Add check for legitimate_address.
(aarch64_gen_adjusted_ldpstp): Swap operands where appropriate.
(aarch64_swap_ldrstr_operands): New.
* config/aarch64/aarch64-protos.h (aarch64_swap_ldrstr_operands): Define prototype.

Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com>

From-SVN: r260539
2018-05-22 | [AArch64] Merge stores of D-register values with different modes | Jackson Woodruff | 6 | -133/+124
This patch merges loads and stores from D-registers that are of different modes. Code like this:

typedef int __attribute__((vector_size(8))) vec;
struct pair
{
  vec v;
  double d;
};

now generates a store pair instruction:

void
assign (struct pair *p, vec v)
{
  p->v = v;
  p->d = 1.0;
}

whereas previously it generated two `str` instructions.

This patch also merges storing of double zero values with long integer values:

struct pair
{
  long long l;
  double d;
};

void
foo (struct pair *p)
{
  p->l = 10;
  p->d = 0.0;
}

now generates a single store pair instruction rather than two `str` instructions.

The patch basically generalises the mode iterators on the patterns in aarch64.md and the peepholes in aarch64-ldpstp.md to take all combinations of pairs of modes so, while it may be a large-ish patch, it does fairly mechanical stuff.

2018-05-22  Jackson Woodruff  <jackson.woodruff@arm.com>
	    Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

* config/aarch64/aarch64.md: New patterns to generate stp and ldp.
(store_pair_sw, store_pair_dw): New patterns to generate stp for single words and double words.
(load_pair_sw, load_pair_dw): Likewise.
(store_pair_sf, store_pair_df, store_pair_si, store_pair_di): Delete.
(load_pair_sf, load_pair_df, load_pair_si, load_pair_di): Delete.
* config/aarch64/aarch64-ldpstp.md: Modify peephole for different mode ldpstp and add peephole for merged zero stores. Likewise for loads.
* config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp): Add size check.
(aarch64_gen_store_pair): Rename calls to match new patterns.
(aarch64_gen_load_pair): Rename calls to match new patterns.
* config/aarch64/aarch64-simd.md (load_pair<mode>): Rename to...
(load_pair<DREG:mode><DREG2:mode>): ... This.
(store_pair<mode>): Rename to...
(vec_store_pair<DREG:mode><DREG2:mode>): ... This.
* config/aarch64/iterators.md (DREG, DREG2, DX2, SX, SX2, DSX): New mode iterators.
(V_INT_EQUIV): Handle SImode.
* config/aarch64/predicates.md (aarch64_reg_zero_or_fp_zero): New predicate.
* gcc.target/aarch64/ldp_stp_6.c: New.
* gcc.target/aarch64/ldp_stp_7.c: New.
* gcc.target/aarch64/ldp_stp_8.c: New.

Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com>

From-SVN: r260538
2018-05-21 | re PR target/85657 (Make __ibm128 a separate type, even if long double uses the IBM double-double format) | Michael Meissner | 3 | -26/+46

[gcc]
2018-05-21  Michael Meissner  <meissner@linux.ibm.com>

PR target/85657
* config/rs6000/rs6000-c.c (rs6000_cpu_cpp_builtins): Do not define __ibm128 as long double.
* config/rs6000/rs6000.c (rs6000_init_builtins): Always create __ibm128 as a distinct type.
(init_float128_ieee): Fix up conversions between IFmode and IEEE 128-bit types to use the correct functions.
(rs6000_expand_float128_convert): Use explicit FLOAT_EXTEND to convert between 128-bit floating point types that have different modes but the same representation, instead of using gen_lowpart to make an alias.
* config/rs6000/rs6000.md (IFKF): New iterator for IFmode and KFmode.
(IFKF_reg): New attributes to give the register constraints for IFmode and KFmode.
(extend<mode>tf2_internal): New insns to mark an explicit conversion between 128-bit floating point types that have a different mode but share the same representation.

[gcc/testsuite]
2018-05-21  Michael Meissner  <meissner@linux.ibm.com>

PR target/85657
* gcc.target/powerpc/pr85657-1.c: New test for converting between __float128, __ibm128, and long double.
* gcc.target/powerpc/pr85657-2.c: Likewise.
* gcc.target/powerpc/pr85657-3.c: Likewise.
* g++.dg/pr85667.C: New test to make sure __ibm128 is implemented as a separate type internally, and is not just an alias for long double.

From-SVN: r260489
2018-05-21 | [AArch64] Implement usadv16qi and ssadv16qi standard names | Kyrylo Tkachov | 3 | -0/+80
This patch implements the usadv16qi and ssadv16qi standard names. See the thread on gcc@gcc.gnu.org [1] for background.

The V16QImode variant is important to get right as it is the most commonly used pattern: reducing vectors of bytes into an int. The midend expects the optab to compute the absolute differences of operands 1 and 2 and reduce them while widening along the way up to SImode. So the inputs are V16QImode and the output is V4SImode.

I've tried out a few different strategies for that; the one I settled on is to emit:

UABDL2 tmp.8h, op1.16b, op2.16b
UABAL tmp.8h, op1.16b, op2.16b
UADALP op3.4s, tmp.8h

To work through the semantics let's say operands 1 and 2 are:

op1 { a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 }
op2 { b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15 }
op3 { c0, c1, c2, c3 }

The UABDL2 takes the upper V8QI elements, computes their absolute differences, widens them and stores them into the V8HImode tmp:

tmp { ABS(a[8]-b[8]), ABS(a[9]-b[9]), ABS(a[10]-b[10]), ABS(a[11]-b[11]), ABS(a[12]-b[12]), ABS(a[13]-b[13]), ABS(a[14]-b[14]), ABS(a[15]-b[15]) }

The UABAL after that takes the lower V8QI elements, computes their absolute differences, widens them and accumulates them into the V8HImode tmp from the previous step:

tmp { ABS(a[8]-b[8])+ABS(a[0]-b[0]), ABS(a[9]-b[9])+ABS(a[1]-b[1]), ABS(a[10]-b[10])+ABS(a[2]-b[2]), ABS(a[11]-b[11])+ABS(a[3]-b[3]), ABS(a[12]-b[12])+ABS(a[4]-b[4]), ABS(a[13]-b[13])+ABS(a[5]-b[5]), ABS(a[14]-b[14])+ABS(a[6]-b[6]), ABS(a[15]-b[15])+ABS(a[7]-b[7]) }

Finally the UADALP does a pairwise widening reduction and accumulation into the V4SImode op3:

op3 { c0+ABS(a[8]-b[8])+ABS(a[0]-b[0])+ABS(a[9]-b[9])+ABS(a[1]-b[1]),
      c1+ABS(a[10]-b[10])+ABS(a[2]-b[2])+ABS(a[11]-b[11])+ABS(a[3]-b[3]),
      c2+ABS(a[12]-b[12])+ABS(a[4]-b[4])+ABS(a[13]-b[13])+ABS(a[5]-b[5]),
      c3+ABS(a[14]-b[14])+ABS(a[6]-b[6])+ABS(a[15]-b[15])+ABS(a[7]-b[7]) }

(sorry for the text dump)

Remember, according to [1] the exact reduction sequence doesn't matter (for integer arithmetic at least). I've considered other sequences as well (thanks Wilco), for example:

* UABD + UADDLP + UADALP
* UABDL2 + UABDL + UADALP + UADALP

I ended up settling on the sequence in this patch as it's short (3 instructions) and in the future we can potentially look to optimise multiple occurrences of these into something even faster (for example accumulating into H registers for longer before doing a single UADALP in the end to accumulate into the final S register). If your microarchitecture has some strong preferences for a particular sequence, please let me know or, even better, propose a patch to parametrise the generation sequence by code (or the appropriate RTX cost).

This expansion allows the vectoriser to avoid unpacking the bytes in two steps and performing V4SI arithmetic on them. So, for the code:

unsigned char pix1[N], pix2[N];

int
foo (void)
{
  int i_sum = 0;
  int i;
  for (i = 0; i < 16; i++)
    i_sum += __builtin_abs (pix1[i] - pix2[i]);
  return i_sum;
}

we now generate on aarch64:

foo:
	adrp x1, pix1
	add x1, x1, :lo12:pix1
	movi v0.4s, 0
	adrp x0, pix2
	add x0, x0, :lo12:pix2
	ldr q2, [x1]
	ldr q3, [x0]
	uabdl2 v1.8h, v2.16b, v3.16b
	uabal v1.8h, v2.8b, v3.8b
	uadalp v0.4s, v1.8h
	addv s0, v0.4s
	umov w0, v0.s[0]
	ret

instead of:

foo:
	adrp x1, pix1
	adrp x0, pix2
	add x1, x1, :lo12:pix1
	add x0, x0, :lo12:pix2
	ldr q0, [x1]
	ldr q4, [x0]
	ushll v1.8h, v0.8b, 0
	ushll2 v0.8h, v0.16b, 0
	ushll v2.8h, v4.8b, 0
	ushll2 v4.8h, v4.16b, 0
	usubl v3.4s, v1.4h, v2.4h
	usubl2 v1.4s, v1.8h, v2.8h
	usubl v2.4s, v0.4h, v4.4h
	usubl2 v0.4s, v0.8h, v4.8h
	abs v3.4s, v3.4s
	abs v1.4s, v1.4s
	abs v2.4s, v2.4s
	abs v0.4s, v0.4s
	add v1.4s, v3.4s, v1.4s
	add v1.4s, v2.4s, v1.4s
	add v0.4s, v0.4s, v1.4s
	addv s0, v0.4s
	umov w0, v0.s[0]
	ret

So I expect this new expansion to be better than the status quo in any case. Bootstrapped and tested on aarch64-none-linux-gnu. This gives about 8% on 525.x264_r from SPEC2017 on a Cortex-A72.

* config/aarch64/aarch64.md ("unspec"): Define UNSPEC_SABAL, UNSPEC_SABDL2, UNSPEC_SADALP, UNSPEC_UABAL, UNSPEC_UABDL2, UNSPEC_UADALP values.
* config/aarch64/iterators.md (ABAL): New int iterator.
(ABDL2): Likewise.
(ADALP): Likewise.
(sur): Add mappings for the above.
* config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>_3): New define_insn.
(aarch64_<sur>abal<mode>_4): Likewise.
(aarch64_<sur>adalp<mode>_3): Likewise.
(<sur>sadv16qi): New define_expand.

* gcc.c-torture/execute/ssad-run.c: New test.
* gcc.c-torture/execute/usad-run.c: Likewise.
* gcc.target/aarch64/ssadv16qi.c: Likewise.
* gcc.target/aarch64/usadv16qi.c: Likewise.

From-SVN: r260437
2018-05-21 | i386.md (*movsf_internal): AVX falsedep fix. | Alexander Nesterovskiy | 1 | -11/+17
2018-05-21  Alexander Nesterovskiy  <alexander.nesterovskiy@intel.com>

gcc/
* config/i386/i386.md (*movsf_internal): AVX falsedep fix.
(*movdf_internal): Ditto.
(*rcpsf2_sse): Ditto.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.

From-SVN: r260436
2018-05-21 | Add missing AArch64 NEON intrinsics for Armv8.2-a to Armv8.4-a | Tamar Christina | 4 | -18/+124
This patch adds the missing NEON intrinsics for all 128-bit vector integer modes for the three-way XOR (EOR3) and bit-clear-and-XOR (BCAX) instructions for Armv8.2-a to Armv8.4-a.

gcc/
2018-05-21  Tamar Christina  <tamar.christina@arm.com>

* config/aarch64/aarch64-simd.md (aarch64_eor3qv8hi): Change to eor3q<mode>4.
(aarch64_bcaxqv8hi): Change to bcaxq<mode>4.
* config/aarch64/aarch64-simd-builtins.def (veor3q_u8, veor3q_u32, veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8, vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32, vbcaxq_s64): New.
* config/aarch64/arm_neon.h: Likewise.
* config/aarch64/iterators.md (VQ_I): New.

gcc/testsuite/
2018-05-21  Tamar Christina  <tamar.christina@arm.com>

* gcc.target/aarch64/sha3.h (veor3q_u8, veor3q_u32, veor3q_u64, veor3q_s8, veor3q_s16, veor3q_s32, veor3q_s64, vbcaxq_u8, vbcaxq_u32, vbcaxq_u64, vbcaxq_s8, vbcaxq_s16, vbcaxq_s32, vbcaxq_s64): New.
* gcc.target/aarch64/sha3_1.c: Likewise.

From-SVN: r260435
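A hedged usage sketch of two of the new intrinsics (the names come from the ChangeLog above; the exact -march string required is our assumption):

/* Compile with e.g. -march=armv8.2-a+sha3.  */
#include <arm_neon.h>

uint32x4_t
three_way_xor (uint32x4_t a, uint32x4_t b, uint32x4_t c)
{
  return veor3q_u32 (a, b, c);  /* Maps to a single EOR3.  */
}

int16x8_t
bit_clear_xor (int16x8_t a, int16x8_t b, int16x8_t c)
{
  return vbcaxq_s16 (a, b, c);  /* Maps to a single BCAX.  */
}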
2018-05-21 | [ARC] Add multilib support for linux targets | Alexey Brodkin | 1 | -0/+25
We have built baremetal (AKA elf32) multilibbed toolchains for years now, but never did so for Linux targets since there were problems with uClibc in a multilib setup. Now, with the help of Crosstool-NG, it is finally possible to create uClibc-based multilibbed toolchains, so we add the relevant CPUs for multilib when configuring for "arc*-*-linux*". This will be especially useful for glibc-based multilibbed toolchains in the future.

gcc/
2018-05-16  Alexey Brodkin  <abrodkin@synopsys.com>

* config.gcc: Add arc/t-multilib-linux to tmake_file for arc*-*-linux*.
* config/arc/t-multilib-linux: Specify MULTILIB_OPTIONS and MULTILIB_DIRNAMES.

From-SVN: r260434
2018-05-20 | [NDS32] Set call address constraint. | Chung-Ju Wu | 2 | -4/+9
gcc/
* config/nds32/constraints.md (S): New constraint.
* config/nds32/nds32.md (call_internal): Use constraint S.
(call_value_internal): Likewise.
(sibcall_internal): Likewise.
(sibcall_value_internal): Likewise.

From-SVN: r260422
2018-05-20 | [NDS32] Adjust register move cost for graywolf cpu. | Kito Cheng | 1 | -2/+22
gcc/
* config/nds32/nds32.c (nds32_register_move_cost): Take graywolf cpu into consideration.

Co-Authored-By: Chung-Ju Wu <jasonwucj@gmail.com>

From-SVN: r260412
2018-05-20 | [NDS32] Rewrite cost model. | Kito Cheng | 3 | -67/+543
gcc/
* config/nds32/nds32-cost.c (rtx_cost_model_t): New structure.
(insn_size_16bit, insn_size_32bit): New variables for cost evaluation.
(nds32_rtx_costs_impl): Simplify.
(nds32_address_cost_impl): Simplify.
(nds32_init_rtx_costs): New function.
(nds32_rtx_costs_speed_prefer): Likewise.
(nds32_rtx_costs_size_prefer): Likewise.
(nds32_address_cost_speed_prefer): Likewise.
(nds32_address_cost_speed_fwprop): Likewise.
(nds32_address_cost_size_prefer): Likewise.
* config/nds32/nds32-protos.h (nds32_init_rtx_costs): Declare.
* config/nds32/nds32.c (nds32_option_override): Use nds32_init_rtx_costs function.

Co-Authored-By: Chung-Ju Wu <jasonwucj@gmail.com>

From-SVN: r260411
2018-05-20 | [NDS32] Print pipeline model in asm header. | Chung-Ju Wu | 2 | -0/+54
gcc/
* config/nds32/nds32.c (nds32_asm_file_start): Output pipeline model.
* config/nds32/nds32.h (TARGET_PIPELINE_N7): Define.
(TARGET_PIPELINE_N8): Likewise.
(TARGET_PIPELINE_N10): Likewise.
(TARGET_PIPELINE_N13): Likewise.
(TARGET_PIPELINE_GRAYWOLF): Likewise.

From-SVN: r260409
2018-05-19 | [NDS32] Update copyright year in nds32-fpu.md. | Monk Chiang | 1 | -1/+1
gcc/
* config/nds32/nds32-fpu.md: Update copyright year.

From-SVN: r260402
2018-05-19 | [NDS32] Adjust ASM spec. | Chung-Ju Wu | 1 | -1/+6
gcc/
* config/nds32/nds32.h (ASM_SPEC): Adjust spec rule.

From-SVN: r260401
2018-05-19 | [NDS32] New option -minline-asm-r15. | Chung-Ju Wu | 2 | -2/+9
gcc/
* config/nds32/nds32.c (nds32_md_asm_adjust): Consider flag_inline_asm_r15 variable.
* config/nds32/nds32.opt (minline-asm-r15): New option.

From-SVN: r260400
2018-05-19 | [NDS32] Add abssi2 pattern. | Chung-Ju Wu | 1 | -0/+14
gcc/
* common/config/nds32/nds32-common.c (TARGET_DEFAULT_TARGET_FLAGS): Add MASK_HW_ABS.
* config/nds32/nds32.md (abssi2): New pattern.

From-SVN: r260398
2018-05-19 | i386.md (rex64namesuffix): New mode attribute. | Uros Bizjak | 2 | -219/+67
* config/i386/i386.md (rex64namesuffix): New mode attribute.
* config/i386/sse.md (sse_cvtsi2ss<rex64namesuffix><round_name>): Merge insn pattern from sse_cvtsi2ss<round_name> and sse_cvtsi2ssq<round_name> using SWI48 mode iterator.
(sse_cvtss2si<rex64namesuffix><round_name>): Merge insn pattern from sse_cvtss2si<round_name> and sse_cvtss2siq<round_name> using SWI48 mode iterator.
(sse_cvtss2si<rex64namesuffix>_2): Merge insn pattern from sse_cvtss2si_2 and sse_cvtss2siq_2 using SWI48 mode iterator.
(sse_cvttss2si<rex64namesuffix><round_saeonly_name>): Merge insn pattern from sse_cvttss2si<round_saeonly_name> and sse_cvttss2siq<round_saeonly_name> using SWI48 mode iterator.
(avx512f_vcvtss2usi<rex64namesuffix><round_name>): Merge insn pattern from avx512f_vcvtss2usi<round_name> and avx512f_vcvtss2usiq<round_name> using SWI48 mode iterator.
(avx512f_vcvttss2usi<rex64namesuffix><round_saeonly_name>): Merge insn pattern from avx512f_vcvttss2usi<round_saeonly_name> and avx512f_vcvttss2usiq<round_saeonly_name> using SWI48 mode iterator.
(avx512f_vcvtsd2usi<rex64namesuffix><round_name>): Merge insn pattern from avx512f_vcvtsd2usi<round_name> and avx512f_vcvtsd2usiq<round_name> using SWI48 mode iterator.
(avx512f_vcvttsd2usi<rex64namesuffix><round_saeonly_name>): Merge insn pattern from avx512f_vcvttsd2usi<round_saeonly_name> and avx512f_vcvttsd2usiq<round_saeonly_name> using SWI48 mode iterator.
(sse2_cvtsd2si<rex64namesuffix><round_name>): Merge insn pattern from sse2_cvtsd2si<round_name> and sse2_cvtsd2siq<round_name> using SWI48 mode iterator.
(sse2_cvtsd2si<rex64namesuffix>_2): Merge insn pattern from sse2_cvtsd2si_2 and sse2_cvtsd2siq_2 using SWI48 mode iterator.
(sse_cvttsd2si<rex64namesuffix><round_saeonly_name>): Merge insn pattern from sse_cvttsd2si<round_saeonly_name> and sse_cvttsd2siq<round_saeonly_name> using SWI48 mode iterator.

From-SVN: r260397
2018-05-19 | [NDS32] Refine functions that deal with lwm and smw operations. | Chung-Ju Wu | 2 | -23/+72
gcc/
* config/nds32/nds32-md-auxiliary.c (nds32_valid_smw_lwm_base_p): Refine.
(nds32_output_smw_single_word): Refine.
(nds32_output_smw_double_word): New.
* config/nds32/nds32-protos.h (nds32_output_smw_double_word): New.

From-SVN: r260396
2018-05-19 | [NDS32] Refine nds32-md-auxiliary.c. | Chung-Ju Wu | 1 | -18/+8
gcc/
* config/nds32/nds32-md-auxiliary.c (nds32_output_stack_push): Refine.
(nds32_output_stack_pop): Refine.
(nds32_expand_unaligned_load): Refine.
(nds32_expand_unaligned_store): Refine.

From-SVN: r260394
2018-05-19 | [NDS32] Support PIC and TLS. | Kuan-Lin Chen | 11 | -137/+1036
gcc/
* config/nds32/constants.md: Add TP_REGNUM constant.
(unspec_element): Add UNSPEC_GOTINIT, UNSPEC_GOT, UNSPEC_GOTOFF, UNSPEC_PLT, UNSPEC_TLSGD, UNSPEC_TLSLD, UNSPEC_TLSIE, UNSPEC_TLSLE and UNSPEC_ADD32.
* config/nds32/nds32-doubleword.md: Consider flag_pic.
* config/nds32/nds32-dspext.md (mov<mode>): Expand TLS and PIC cases.
* config/nds32/nds32-predicates.c (nds32_const_unspec_p): New.
* config/nds32/nds32-md-auxiliary.c: Implementation that supports TLS and PIC code generation.
* config/nds32/nds32-protos.h: Declarations that support TLS and PIC code generation.
* config/nds32/nds32-relax-opt.c: Consider TLS and PIC for relax optimization.
* config/nds32/nds32.md: Support TLS and PIC.
* config/nds32/nds32.c: Support TLS and PIC.
* config/nds32/nds32.h (nds32_relax_insn_type): New enum type.
* config/nds32/predicates.md (nds32_nonunspec_symbolic_operand): New predicate.

Co-Authored-By: Chung-Ju Wu <jasonwucj@gmail.com>

From-SVN: r260393
2018-05-19 | [NDS32] Use machine mode with E_ prefix. | Chung-Ju Wu | 1 | -2/+2
gcc/
* config/nds32/nds32-predicates.c (const_vector_to_hwint): Use machine mode with E_ prefix.

From-SVN: r260391
2018-05-19 | [NDS32] Implement indirect function call attribute. | Kuan-Lin Chen | 10 | -25/+461
* config/nds32/constants.md (unspec_element): Add UNSPEC_ICT.
* config/nds32/nds32-md-auxiliary.c (symbolic_reference_mentioned_p): New.
(nds32_legitimize_ict_address): New.
(nds32_expand_ict_move): New.
(nds32_indirect_call_referenced_p): New.
(nds32_symbol_binds_local_p): Delete.
(nds32_long_call_p): Modify.
* config/nds32/nds32-opts.h (nds32_ict_model_type): New enum type.
* config/nds32/nds32-protos.h (symbolic_reference_mentioned_p): Declare.
(nds32_legitimize_ict_address): Declare.
(nds32_expand_ict_move): Declare.
(nds32_indirect_call_referenced_p): Declare.
* config/nds32/nds32-relax-opt.c (nds32_ict_const_p): New.
(nds32_relax_group): Use nds32_ict_const_p as condition.
* config/nds32/nds32.c (nds32_attribute_table): Add "indirect_call".
(nds32_asm_file_start): Output ict_model directive in asm code.
(nds32_legitimate_address_p): Consider indirect call.
(nds32_print_operand): Consider indirect call.
(nds32_print_operand_address): Consider indirect call.
(nds32_insert_attributes): Handle "indirect_call" attribute.
(TARGET_LEGITIMATE_ADDRESS_P): Define.
(TARGET_LEGITIMATE_CONSTANT_P): Define.
(TARGET_CANNOT_FORCE_CONST_MEM): Define.
(TARGET_DELEGITIMIZE_ADDRESS): Define.
(TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA): Define.
* config/nds32/nds32.h (SYMBOLIC_CONST_P): Define.
(TARGET_ICT_MODEL_SMALL): Define.
(TARGET_ICT_MODEL_LARGE): Define.
* config/nds32/nds32.md (movsi): Consider ict model.
(call, call_value): Consider ict model.
(sibcall, sibcall_value): Consider ict model.
* config/nds32/nds32.opt (mict-model): New option.
* config/nds32/predicates.md (nds32_symbolic_operand): Consider ict model.

Co-Authored-By: Chung-Ju Wu <jasonwucj@gmail.com>

From-SVN: r260390
2018-05-18 | RISC-V: Add RV32E support. | Kito Cheng | 5 | -8/+44
Kito Cheng  <kito.cheng@gmail.com>
Monk Chiang  <sh.chiang04@gmail.com>

gcc/
* common/config/riscv/riscv-common.c (riscv_parse_arch_string): Add support to parse rv32e*. Clear MASK_RVE for rv32i and rv64i.
* config.gcc (riscv*-*-*): Add support for rv32e* and ilp32e.
* config/riscv/riscv-c.c (riscv_cpu_cpp_builtins): Define __riscv_32e when TARGET_RVE. Handle ABI_ILP32E as soft-float ABI.
* config/riscv/riscv-opts.h (riscv_abi_type): Add ABI_ILP32E.
* config/riscv/riscv.c (riscv_compute_frame_info): When TARGET_RVE, compute save_libcall_adjustment properly.
(riscv_option_override): Call error if TARGET_RVE and not ABI_ILP32E.
(riscv_conditional_register_usage): Handle TARGET_RVE and ABI_ILP32E.
* config/riscv/riscv.h (UNITS_PER_FP_ARG): Handle ABI_ILP32E.
(STACK_BOUNDARY, ABI_STACK_BOUNDARY): Handle TARGET_RVE.
(GP_REG_LAST, MAX_ARGS_IN_REGISTERS): Likewise.
(ABI_SPEC): Handle mabi=ilp32e.
* config/riscv/riscv.opt (abi_type): Add ABI_ILP32E.
(RVE): Add RVE mask.
* doc/invoke.texi (RISC-V options) <-mabi>: Add ilp32e info.
<-march>: Add rv32e as an example.

gcc/testsuite/
* gcc.dg/stack-usage-1.c: Add support for rv32e.

libgcc/
* config/riscv/save-restore.S: Add support for rv32e.

Co-Authored-By: Jim Wilson <jimw@sifive.com>
Co-Authored-By: Monk Chiang <sh.chiang04@gmail.com>

From-SVN: r260384
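A hedged build sketch (the target triple is an assumption; -march=rv32e, -mabi=ilp32e and the __riscv_32e macro come from the patch itself):

/* embedded.c - build with something like:
     riscv32-unknown-elf-gcc -Os -march=rv32e -mabi=ilp32e embedded.c
   __riscv_32e is defined when targeting RV32E, which only has
   general-purpose registers x0-x15.  */
#ifdef __riscv_32e
#define NUM_GPRS 16
#else
#define NUM_GPRS 32
#endif

int
add (int a, int b)
{
  return a + b;
}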
2018-05-18 | re PR bootstrap/85838 (-Wmaybe-uninitialized warning in sparc.c (sparc_expand_builtin) breaks SPARC bootstrap) | Eric Botcazou | 1 | -0/+2

PR bootstrap/85838
* config/sparc/sparc.c (sparc_expand_builtin): Always initialize op[0].

From-SVN: r260374
2018-05-18 | [arm][2/2] Remove support for -march=armv3 and older | Kyrylo Tkachov | 9 | -371/+85
We deprecated architecture versions earlier than Armv4T in GCC 6 [1]. This patch removes support for architectures lower than Armv4. That is, the -march values armv2, armv2a, armv3, armv3m are removed with this patch. I did not remove armv4 because it's a bit more involved code-wise and there has been some pushback on the implications for -mcpu=strongarm support.

Removing armv3m and earlier though is pretty straightforward. This allows us to get rid of the armv3m and mode32 feature bits in arm-cpus.in as they can be assumed to be universally available. Consequently the mcpu values arm2, arm250, arm3, arm6, arm60, arm600, arm610, arm620, arm7, arm7d, arm7di, arm70, arm700, arm700i, arm710, arm720, arm710c, arm7100, arm7500, arm7500fe, arm7m, arm7dm, arm7dmi are now also removed.

Bootstrapped and tested on arm-none-linux-gnueabihf and on arm-none-eabi with an aprofile multilib configuration (which builds quite a lot of library configurations).

[1] https://gcc.gnu.org/gcc-6/changes.html#arm

* config/arm/arm-cpus.in (armv3m, mode32): Delete features.
(ARMv4): Update.
(ARMv2, ARMv3, ARMv3m): Delete fgroups.
(ARMv6m): Update.
(armv2, armv2a, armv3, armv3m): Delete architectures.
(arm2, arm250, arm3, arm6, arm60, arm600, arm610, arm620, arm7, arm7d, arm7di, arm70, arm700, arm700i, arm710, arm720, arm710c, arm7100, arm7500, arm7500fe, arm7m, arm7dm, arm7dmi): Delete cpus.
* config/arm/arm.md (maddsidi4): Remove check for arm_arch3m.
(*mulsidi3adddi): Likewise.
(mulsidi3): Likewise.
(*mulsidi3_nov6): Likewise.
(umulsidi3): Likewise.
(umulsidi3_nov6): Likewise.
(umaddsidi4): Likewise.
(*umulsidi3adddi): Likewise.
(smulsi3_highpart): Likewise.
(*smulsi3_highpart_nov6): Likewise.
(umulsi3_highpart): Likewise.
(*umulsi3_highpart_nov6): Likewise.
* config/arm/arm.h (arm_arch3m): Delete.
* config/arm/arm.c (arm_arch3m): Delete.
(arm_option_override_internal): Update armv3-related comment.
(arm_configure_build_target): Delete use of isa_bit_mode32.
(arm_option_reconfigure_globals): Delete set of arm_arch3m.
(arm_rtx_costs_internal): Delete check of arm_arch3m.
* config/arm/arm-fixed.md (mulsq3): Delete check for arm_arch3m.
(mulsa3): Likewise.
(mulusa3): Likewise.
* config/arm/arm-protos.h (arm_arch3m): Delete.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Likewise.
* config/arm/t-arm-elf (all_early_nofp): Delete mentions of deleted architectures.

* gcc.target/arm/pr62554.c: Delete.
* gcc.target/arm/pr69610-1.c: Likewise.
* gcc.target/arm/pr69610-2.c: Likewise.

From-SVN: r260363
2018-05-18 | [arm][1/2] Remove support for deprecated -march=armv5 and armv5e | Kyrylo Tkachov | 11 | -122/+85
The -march=armv5 and armv5e options have been deprecated in GCC 7 [1]. This patch removes support for them. It's mostly mechanical stuff. The functionality that was previously gated on arm_arch5 is now gated on arm_arch5t and the functionality that was gated on arm_arch5e is now gated on arm_arch5te.

A path in TARGET_OS_CPP_BUILTINS for VxWorks is now unreachable and therefore is deleted. References to armv5 and armv5e are deleted/updated throughout the source tree and testsuite.

Bootstrapped and tested on arm-none-linux-gnueabihf. Also built a cc1 for arm-wrs-vxworks as a sanity check.

* config/arm/arm-cpus.in (armv5, armv5e): Delete features.
(armv5t, armv5te): New features.
(ARMv5, ARMv5e): Delete fgroups.
(ARMv5t, ARMv5te): Adjust for above changes.
(ARMv6m): Likewise.
(armv5, armv5e): Delete arches.
* config/arm/arm.md (*call_reg_armv5): Use arm_arch5t instead of arm_arch5.
(*call_reg_arm): Likewise.
(*call_value_reg_armv5): Likewise.
(*call_value_reg_arm): Likewise.
(*call_symbol): Likewise.
(*call_value_symbol): Likewise.
(*sibcall_insn): Likewise.
(*sibcall_value_insn): Likewise.
(clzsi2): Likewise.
(prefetch): Likewise.
(define_split and define_peephole2 dependent on arm_arch5): Likewise.
* config/arm/arm.h (TARGET_LDRD): Use arm_arch5te instead of arm_arch5e.
(TARGET_ARM_QBIT): Likewise.
(TARGET_DSP_MULTIPLY): Likewise.
(enum base_architecture): Delete BASE_ARCH_5, BASE_ARCH_5E.
(arm_arch5, arm_arch5e): Delete.
(arm_arch5t, arm_arch5te): Declare.
* config/arm/arm.c (arm_arch5, arm_arch5e): Delete.
(arm_arch5t): Declare.
(arm_option_reconfigure_globals): Update for the above.
(arm_options_perform_arch_sanity_checks): Update comment, replace use of arm_arch5 with arm_arch5t.
(use_return_insn): Likewise.
(arm_emit_call_insn): Likewise.
(output_return_instruction): Likewise.
(arm_final_prescan_insn): Likewise.
(arm_coproc_builtin_available): Likewise.
* config/arm/arm-c.c (arm_cpu_builtins): Replace arm_arch5 and arm_arch5e with arm_arch5t and arm_arch5te.
* config/arm/arm-protos.h (arm_arch5, arm_arch5e): Delete.
(arm_arch5t, arm_arch5te): Declare.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/t-arm-elf: Remove references to armv5, armv5e.
* config/arm/t-multilib: Likewise.
* config/arm/thumb1.md (*call_reg_thumb1_v5): Check arm_arch5t instead of arm_arch5.
(*call_reg_thumb1): Likewise.
(*call_value_reg_thumb1_v5): Likewise.
(*call_value_reg_thumb1): Likewise.
* config/arm/vxworks.h (TARGET_OS_CPP_BUILTINS): Remove now unreachable path.
* doc/invoke.texi (ARM Options): Remove references to armv5, armv5e.

* gcc.target/arm/pr40887.c: Update comment.
* lib/target-supports.exp: Don't generate effective target checks and related helpers for armv5. Update comment.
* gcc.target/arm/armv5_thumb_isa.c: Delete.
* gcc.target/arm/di-longlong64-sync-withhelpers.c: Update effective target check and options.
* config/arm/libunwind.S: Update comment relating to armv5.

From-SVN: r260362
2018-05-18 | [AArch64] Unify vec_set patterns, support floating-point vector modes properly | Kyrylo Tkachov | 1 | -76/+9
We have a deficiency in our vec_set family of patterns. We don't support directly loading a vector lane using LD1 for V2DImode and all the vector floating-point modes. We do handle the other integer vector modes (V4SI, V8HI etc.) correctly though.

The alternatives on the relevant floating-point patterns only allow a register-to-register INS instruction. That means if we want to load a value into a vector lane we must first load it into a scalar register and then perform an INS, which is wasteful.

There is also an explicit V2DI vec_set expander dangling around for no reason that I can see. It seems to do the exact same things as the other vec_set expanders. This patch removes that. It now unifies all vec_set expansions into a single "vec_set<mode>" define_expand using the catch-all VALL_F16 iterator.

With this patch we avoid loading values into scalar registers and then doing an explicit INS on them to move them into the desired vector lanes. For example, for:

typedef float v4sf __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

v2di
foo_v2di (long long *a, long long *b)
{
  v2di res = { *a, *b };
  return res;
}

v4sf
foo_v4sf (float *a, float *b, float *c, float *d)
{
  v4sf res = { *a, *b, *c, *d };
  return res;
}

we currently generate:

foo_v2di:
	ldr d0, [x0]
	ldr x0, [x1]
	ins v0.d[1], x0
	ret

foo_v4sf:
	ldr s0, [x0]
	ldr s3, [x1]
	ldr s2, [x2]
	ldr s1, [x3]
	ins v0.s[1], v3.s[0]
	ins v0.s[2], v2.s[0]
	ins v0.s[3], v1.s[0]
	ret

but with this patch we generate the much cleaner:

foo_v2di:
	ldr d0, [x0]
	ld1 {v0.d}[1], [x1]
	ret

foo_v4sf:
	ldr s0, [x0]
	ld1 {v0.s}[1], [x1]
	ld1 {v0.s}[2], [x2]
	ld1 {v0.s}[3], [x3]
	ret

* config/aarch64/aarch64-simd.md (vec_set<mode>): Use VALL_F16 mode iterator. Delete separate integer-mode vec_set<mode> expander.
(aarch64_simd_vec_setv2di): Delete.
(vec_setv2di): Delete.
(aarch64_simd_vec_set<mode>): Delete all other patterns with that name. Use VALL_F16 mode iterator. Add LD1 alternative and use vwcore for the "w, r" alternative.

* gcc.target/aarch64/vect-init-ld1.c: New test.

From-SVN: r260351
2018-05-18 | Replace FMA_EXPR with one internal fn per optab | Richard Sandiford | 2 | -15/+30
There are four optabs for various forms of fused multiply-add: fma, fms, fnma and fnms. Of these, only fma had a direct gimple representation. For the other three we relied on special pattern-matching during expand, although tree-ssa-math-opts.c did have some code to try to second-guess what expand would do.

This patch removes the old FMA_EXPR representation of fma and introduces four new internal functions, one for each optab. IFN_FMA is tied to BUILT_IN_FMA* while the other three are independent directly-mapped internal functions. It's then possible to do the pattern-matching in match.pd and tree-ssa-math-opts.c (via folding) can select the exact FMA-based operation.

The BRIG & HSA parts are a best guess, but seem relatively simple.

2018-05-18  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* doc/sourcebuild.texi (scalar_all_fma): Document.
* tree.def (FMA_EXPR): Delete.
* internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions.
* internal-fn.c (ternary_direct): New macro.
(expand_ternary_optab_fn): Likewise.
(direct_ternary_optab_supported_p): Likewise.
* Makefile.in (build/genmatch.o): Depend on case-fn-macros.h.
* builtins.c (fold_builtin_fma): Delete.
(fold_builtin_3): Don't call it.
* cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling.
* expr.c (expand_expr_real_2): Likewise.
* fold-const.c (operand_equal_p): Likewise.
(fold_ternary_loc): Likewise.
* gimple-pretty-print.c (dump_ternary_rhs): Likewise.
* gimple.c (DEFTREECODE): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* optabs-tree.c (optab_for_tree_code): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-eh.c (operation_could_trap_p): Likewise.
(stmt_could_throw_1_p): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
(op_code_prio): Likewise.
* tree-ssa-loop-im.c (stmt_cost): Likewise.
* tree-ssa-operands.c (get_expr_operands): Likewise.
* tree.c (commutative_ternary_tree_code, add_expr): Likewise.
* fold-const-call.h (fold_fma): Delete.
* fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, CFN_FNMA and CFN_FNMS.
(fold_fma): Delete.
* genmatch.c (combined_fn): New enum.
(commutative_ternary_tree_code): Remove FMA_EXPR handling.
(commutative_op): New function.
(commutate): Use it. Handle more than 2 operands.
(dt_operand::gen_gimple_expr): Use commutative_op.
(parser::parse_expr): Allow :c to be used with non-binary operators if the commutative operand is known.
* gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle CFN_FMS, CFN_FNMA and CFN_FNMS.
(backprop::process_assign_use): Remove FMA_EXPR handling.
* hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise.
(gen_hsa_fma): New function.
(gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS.
* match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS.
* gimple-fold.h (follow_all_ssa_edges): Declare.
* gimple-fold.c (follow_all_ssa_edges): New function.
* tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the gimple_build interface and use follow_all_ssa_edges to fold the result.
(convert_mult_to_fma): Use direct_internal_fn_supported_p instead of checking for optabs directly.
* config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls rather than FMA_EXPRs.
* config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a call to IFN_FMA instead of an FMA_EXPR.

gcc/brig/
* brigfrontend/brig-function.cc (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA for BRIG_OPCODE_FMA.
(brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA as a call.

gcc/c/
* gimple-parser.c (c_parser_gimple_postfix_expression): Remove __FMA_EXPR handling.

gcc/cp/
* constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling.
(potential_constant_expression_1): Likewise.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_scalar_all_fma): New proc.
* gcc.dg/fma-1.c: New test.
* gcc.dg/fma-2.c: Likewise.
* gcc.dg/fma-3.c: Likewise.
* gcc.dg/fma-4.c: Likewise.
* gcc.dg/fma-5.c: Likewise.
* gcc.dg/fma-6.c: Likewise.
* gcc.dg/fma-7.c: Likewise.
* gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require scalar_all_fma.
* gfortran.dg/reassoc_7.f: Pass -ffp-contract=off.
* gfortran.dg/reassoc_8.f: Likewise.
* gfortran.dg/reassoc_9.f: Likewise.
* gfortran.dg/reassoc_10.f: Likewise.

From-SVN: r260348
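As a hedged illustration of the four forms (our own example; with -ffp-contract=fast on a target with FMA instructions, folding should select the matching internal function):

/* Hypothetical example; names are ours.  */
double do_fma  (double a, double b, double c) { return  a * b + c; }  /* IFN_FMA  */
double do_fms  (double a, double b, double c) { return  a * b - c; }  /* IFN_FMS  */
double do_fnma (double a, double b, double c) { return -a * b + c; }  /* IFN_FNMA */
double do_fnms (double a, double b, double c) { return -a * b - c; }  /* IFN_FNMS */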
2018-05-17 | RISC-V: Optimize switch with sign-extended index. | Jim Wilson | 1 | -2/+12
gcc/
* expr.c (do_tablejump): When converting index to Pmode, if we have a sign-extended promoted subreg, and the range does not have the sign bit set, then do a sign extend.
* config/riscv/riscv.c (riscv_extend_comparands): In unsigned QImode test, check for sign-extended subreg and/or constant operands, and do a sign extend in that case.

gcc/testsuite/
* gcc.target/riscv/switch-qi.c: New.
* gcc.target/riscv/switch-si.c: New.

From-SVN: r260340
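A hedged reconstruction of the pattern being optimized (the actual switch-qi.c test contents are an assumption): a table jump indexed by a promoted QImode value, where the range check already proves the sign bit clear, so the index can be sign- rather than zero-extended and a redundant extension instruction dropped:

/* Hypothetical example; names are ours.  */
int
dispatch (signed char c)
{
  switch (c)
    {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
    case 4: return 14;
    default: return -1;
    }
}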
2018-05-17 | thunderx2t99.md (thunderx2t99_ls_both): Delete. | Steve Ellcey | 1 | -50/+60
2018-05-17  Steve Ellcey  <sellcey@cavium.com>

* config/aarch64/thunderx2t99.md (thunderx2t99_ls_both): Delete.
(thunderx2t99_multiple): Delete pseudo-units from used cpus. Add untyped.
(thunderx2t99_alu_shift): Remove alu_shift_reg, alus_shift_reg. Change logics_shift_reg to logics_shift_imm.
(thunderx2t99_fp_loadpair_basic): Delete.
(thunderx2t99_fp_storepair_basic): Delete.
(thunderx2t99_asimd_int): Add neon_sub and neon_sub_q types.
(thunderx2t99_asimd_polynomial): Delete.
(thunderx2t99_asimd_fp_simple): Add neon_fp_mul_s_scalar_q and neon_fp_mul_d_scalar_q.
(thunderx2t99_asimd_fp_conv): Add *int_to_fp* types.
(thunderx2t99_asimd_misc): Delete neon_dup and neon_dup_q.
(thunderx2t99_asimd_recip_step): Add missing *sqrt* types.
(thunderx2t99_asimd_lut): Add missing tbl types.
(thunderx2t99_asimd_ext): Delete.
(thunderx2t99_asimd_load1_1_mult): Delete.
(thunderx2t99_asimd_load1_2_mult): Delete.
(thunderx2t99_asimd_load1_ldp): New.
(thunderx2t99_asimd_load1): New.
(thunderx2t99_asimd_load2): Add missing *load2* types.
(thunderx2t99_asimd_load3): New.
(thunderx2t99_asimd_load4): New.
(thunderx2t99_asimd_store1_1_mult): Delete.
(thunderx2t99_asimd_store1_2_mult): Delete.
(thunderx2t99_asimd_store2_mult): Delete.
(thunderx2t99_asimd_store2_onelane): Delete.
(thunderx2t99_asimd_store_stp): New.
(thunderx2t99_asimd_store1): New.
(thunderx2t99_asimd_store2): New.
(thunderx2t99_asimd_store3): New.
(thunderx2t99_asimd_store4): New.

From-SVN: r260335
2018-05-17 | arm_cmse.h (cmse_nsfptr_create, [...]): Remove #include <stdint.h>. | Jerome Lambourg | 1 | -3/+2
2018-05-17  Jerome Lambourg  <lambourg@adacore.com>

gcc/
* config/arm/arm_cmse.h (cmse_nsfptr_create, cmse_is_nsfptr): Remove #include <stdint.h>. Replace intptr_t with __INTPTR_TYPE__.

libgcc/
* config/arm/cmse.c (cmse_check_address_range): Replace UINTPTR_MAX with __UINTPTR_MAX__ and uintptr_t with __UINTPTR_TYPE__.

From-SVN: r260330
2018-05-17 | re PR tree-optimization/85698 (CPU2017 525.x264_r fails starting with r257581) | Pat Haugen | 1 | -1/+1
PR target/85698
* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest operand.

* gcc.target/powerpc/pr85698.c: New test.

Co-Authored-By: Segher Boessenkool <segher@kernel.crashing.org>

From-SVN: r260329
2018-05-17 | re PR target/85323 (SSE/AVX/AVX512 shift by 0 not optimized away) | Jakub Jelinek | 1 | -11/+31
PR target/85323
* config/i386/i386.c (ix86_fold_builtin): Handle masked shifts even if the mask is not all ones.

* gcc.target/i386/pr85323-7.c: New test.
* gcc.target/i386/pr85323-8.c: New test.
* gcc.target/i386/pr85323-9.c: New test.

From-SVN: r260313
2018-05-17 | re PR target/85323 (SSE/AVX/AVX512 shift by 0 not optimized away) | Jakub Jelinek | 1 | -8/+155
PR target/85323
* config/i386/i386.c (ix86_fold_builtin): Fold shift builtins by vector.
(ix86_gimple_fold_builtin): Likewise.

* gcc.target/i386/pr85323-4.c: New test.
* gcc.target/i386/pr85323-5.c: New test.
* gcc.target/i386/pr85323-6.c: New test.

From-SVN: r260312
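A hedged illustration of the missed optimization in the PR (our own example): a whole-vector shift by a zero count is a no-op, and with this change the builtin can be folded away so the function reduces to returning its argument:

#include <immintrin.h>

__m128i
shift_by_zero (__m128i x)
{
  return _mm_slli_epi32 (x, 0);  /* Now folded to just x.  */
}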