aboutsummaryrefslogtreecommitdiff
path: root/gcc/simplify-rtx.c
AgeCommit message (Collapse)AuthorFilesLines
2019-02-24re PR rtl-optimization/89445 (_mm512_maskz_loadu_pd "forgets" to use the mask)Jakub Jelinek1-2/+4
PR rtl-optimization/89445 * simplify-rtx.c (simplify_ternary_operation): Don't use simplify_merge_mask on operands that may trap. * rtlanal.c (may_trap_p_1): Use FLOAT_MODE_P instead of SCALAR_FLOAT_MODE_P checks. For integral division by zero, if second operand is CONST_VECTOR, check if any element could be zero. Don't expect traps for VEC_{MERGE,SELECT,CONCAT,DUPLICATE} unless their operands can trap. * gcc.target/i386/avx512f-pr89445.c: New test. From-SVN: r269176
2019-01-09PR other/16615 [1/5]Sandra Loosemore1-1/+1
2019-01-09 Sandra Loosemore <sandra@codesourcery.com> PR other/16615 [1/5] contrib/ * mklog: Mechanically replace "can not" with "cannot". gcc/ * Makefile.in: Mechanically replace "can not" with "cannot". * alias.c: Likewise. * builtins.c: Likewise. * calls.c: Likewise. * cgraph.c: Likewise. * cgraph.h: Likewise. * cgraphclones.c: Likewise. * cgraphunit.c: Likewise. * combine-stack-adj.c: Likewise. * combine.c: Likewise. * common/config/i386/i386-common.c: Likewise. * config/aarch64/aarch64.c: Likewise. * config/alpha/sync.md: Likewise. * config/arc/arc.c: Likewise. * config/arc/predicates.md: Likewise. * config/arm/arm-c.c: Likewise. * config/arm/arm.c: Likewise. * config/arm/arm.h: Likewise. * config/arm/arm.md: Likewise. * config/arm/cortex-r4f.md: Likewise. * config/csky/csky.c: Likewise. * config/csky/csky.h: Likewise. * config/darwin-f.c: Likewise. * config/epiphany/epiphany.md: Likewise. * config/i386/i386.c: Likewise. * config/i386/sol2.h: Likewise. * config/m68k/m68k.c: Likewise. * config/mcore/mcore.h: Likewise. * config/microblaze/microblaze.md: Likewise. * config/mips/20kc.md: Likewise. * config/mips/sb1.md: Likewise. * config/nds32/nds32.c: Likewise. * config/nds32/predicates.md: Likewise. * config/pa/pa.c: Likewise. * config/rs6000/e300c2c3.md: Likewise. * config/rs6000/rs6000.c: Likewise. * config/s390/s390.h: Likewise. * config/sh/sh.c: Likewise. * config/sh/sh.md: Likewise. * config/spu/vmx2spu.h: Likewise. * cprop.c: Likewise. * dbxout.c: Likewise. * df-scan.c: Likewise. * doc/cfg.texi: Likewise. * doc/extend.texi: Likewise. * doc/fragments.texi: Likewise. * doc/gty.texi: Likewise. * doc/invoke.texi: Likewise. * doc/lto.texi: Likewise. * doc/md.texi: Likewise. * doc/objc.texi: Likewise. * doc/rtl.texi: Likewise. * doc/tm.texi: Likewise. * dse.c: Likewise. * emit-rtl.c: Likewise. * emit-rtl.h: Likewise. * except.c: Likewise. * expmed.c: Likewise. * expr.c: Likewise. * fold-const.c: Likewise. * genautomata.c: Likewise. * gimple-fold.c: Likewise. * hard-reg-set.h: Likewise. * ifcvt.c: Likewise. * ipa-comdats.c: Likewise. * ipa-cp.c: Likewise. * ipa-devirt.c: Likewise. * ipa-fnsummary.c: Likewise. * ipa-icf.c: Likewise. * ipa-inline-transform.c: Likewise. * ipa-inline.c: Likewise. * ipa-polymorphic-call.c: Likewise. * ipa-profile.c: Likewise. * ipa-prop.c: Likewise. * ipa-pure-const.c: Likewise. * ipa-reference.c: Likewise. * ipa-split.c: Likewise. * ipa-visibility.c: Likewise. * ipa.c: Likewise. * ira-build.c: Likewise. * ira-color.c: Likewise. * ira-conflicts.c: Likewise. * ira-costs.c: Likewise. * ira-int.h: Likewise. * ira-lives.c: Likewise. * ira.c: Likewise. * ira.h: Likewise. * loop-invariant.c: Likewise. * loop-unroll.c: Likewise. * lower-subreg.c: Likewise. * lra-assigns.c: Likewise. * lra-constraints.c: Likewise. * lra-eliminations.c: Likewise. * lra-lives.c: Likewise. * lra-remat.c: Likewise. * lra-spills.c: Likewise. * lra.c: Likewise. * lto-cgraph.c: Likewise. * lto-streamer-out.c: Likewise. * postreload-gcse.c: Likewise. * predict.c: Likewise. * profile-count.h: Likewise. * profile.c: Likewise. * recog.c: Likewise. * ree.c: Likewise. * reload.c: Likewise. * reload1.c: Likewise. * reorg.c: Likewise. * resource.c: Likewise. * rtl.def: Likewise. * rtl.h: Likewise. * rtlanal.c: Likewise. * sched-deps.c: Likewise. * sched-ebb.c: Likewise. * sched-rgn.c: Likewise. * sel-sched-ir.c: Likewise. * sel-sched.c: Likewise. * shrink-wrap.c: Likewise. * simplify-rtx.c: Likewise. * symtab.c: Likewise. * target.def: Likewise. * toplev.c: Likewise. * tree-call-cdce.c: Likewise. * tree-cfg.c: Likewise. * tree-complex.c: Likewise. * tree-core.h: Likewise. * tree-eh.c: Likewise. * tree-inline.c: Likewise. * tree-loop-distribution.c: Likewise. * tree-nrv.c: Likewise. * tree-profile.c: Likewise. * tree-sra.c: Likewise. * tree-ssa-alias.c: Likewise. * tree-ssa-dce.c: Likewise. * tree-ssa-dom.c: Likewise. * tree-ssa-forwprop.c: Likewise. * tree-ssa-loop-im.c: Likewise. * tree-ssa-loop-ivcanon.c: Likewise. * tree-ssa-loop-ivopts.c: Likewise. * tree-ssa-loop-niter.c: Likewise. * tree-ssa-phionlycprop.c: Likewise. * tree-ssa-phiopt.c: Likewise. * tree-ssa-propagate.c: Likewise. * tree-ssa-threadedge.c: Likewise. * tree-ssa-threadupdate.c: Likewise. * tree-ssa-uninit.c: Likewise. * tree-ssanames.c: Likewise. * tree-streamer-out.c: Likewise. * tree.c: Likewise. * tree.h: Likewise. * vr-values.c: Likewise. gcc/ada/ * exp_ch9.adb: Mechanically replace "can not" with "cannot". * libgnat/s-regpat.ads: Likewise. * par-ch4.adb: Likewise. * set_targ.adb: Likewise. * types.ads: Likewise. gcc/cp/ * cp-tree.h: Mechanically replace "can not" with "cannot". * parser.c: Likewise. * pt.c: Likewise. gcc/fortran/ * class.c: Mechanically replace "can not" with "cannot". * decl.c: Likewise. * expr.c: Likewise. * gfc-internals.texi: Likewise. * intrinsic.texi: Likewise. * invoke.texi: Likewise. * io.c: Likewise. * match.c: Likewise. * parse.c: Likewise. * primary.c: Likewise. * resolve.c: Likewise. * symbol.c: Likewise. * trans-array.c: Likewise. * trans-decl.c: Likewise. * trans-intrinsic.c: Likewise. * trans-stmt.c: Likewise. gcc/go/ * go-backend.c: Mechanically replace "can not" with "cannot". * go-gcc.cc: Likewise. gcc/lto/ * lto-partition.c: Mechanically replace "can not" with "cannot". * lto-symtab.c: Likewise. * lto.c: Likewise. gcc/objc/ * objc-act.c: Mechanically replace "can not" with "cannot". libbacktrace/ * backtrace.h: Mechanically replace "can not" with "cannot". libgcc/ * config/c6x/libunwind.S: Mechanically replace "can not" with "cannot". * config/tilepro/atomic.h: Likewise. * config/vxlib-tls.c: Likewise. * generic-morestack-thread.c: Likewise. * generic-morestack.c: Likewise. * mkmap-symver.awk: Likewise. libgfortran/ * caf/single.c: Mechanically replace "can not" with "cannot". * io/unit.c: Likewise. libobjc/ * class.c: Mechanically replace "can not" with "cannot". * objc/runtime.h: Likewise. * sendmsg.c: Likewise. liboffloadmic/ * include/coi/common/COIResult_common.h: Mechanically replace "can not" with "cannot". * include/coi/source/COIBuffer_source.h: Likewise. libstdc++-v3/ * include/ext/bitmap_allocator.h: Mechanically replace "can not" with "cannot". From-SVN: r267783
2019-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r267494
2018-11-13re PR rtl-optimization/87918 (ICE in simplify_binary_operation, at ↵Jakub Jelinek1-3/+13
simplify-rtx.c:2153 since r264688) PR rtl-optimization/87918 * simplify-rtx.c (simplify_merge_mask): For COMPARISON_P, use simplify_gen_relational rather than simplify_gen_binary. * gcc.target/i386/pr87918.c: New test. From-SVN: r266062
2018-11-06re PR middle-end/18041 (OR of two single-bit bitfields is inefficient)Richard Biener1-0/+32
2018-11-06 Richard Biener <rguenther@suse.de> PR middle-end/18041 * simplify-rtx.c (simplify_binary_operation_1): Add pattern matching bitfield insertion. * gcc.target/i386/pr18041-1.c: New testcase. * gcc.target/i386/pr18041-2.c: Likewise. From-SVN: r265829
2018-10-18Limit mask of vec_merge to HOST_BITS_PER_WIDE_INTH.J. Lu1-0/+3
Since mask of vec_merge is in HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is the maximum number of vector elements. * simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to HOST_BITS_PER_WIDE_INT. (test_vector_ops_duplicate): Likewise. From-SVN: r265290
2018-10-18Call simplify_gen_subreg to simplify subreg of vec_mergeH.J. Lu1-6/+7
Simplify (subreg (vec_merge (X) (vector) (const_int ((1 << N) | M))) (N * sizeof (outermode))) to (subreg (X) (N * sizeof (outermode))) * simplify-rtx.c (simplify_subreg): Call simplify_gen_subreg to simplify subreg of vec_merge. From-SVN: r265267
2018-10-18Simplify subreg of vec_merge of vec_duplicateH.J. Lu1-1/+28
We can simplify (subreg (vec_merge (vec_duplicate X) (vector) (const_int ((1 << N) | M))) (N * sizeof (X))) to X when mode of X is the same as of mode of subreg. gcc/ PR target/87537 * simplify-rtx.c (simplify_subreg): Simplify subreg of vec_merge of vec_duplicate. (test_vector_ops_duplicate): Add test for a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE. gcc/testsuite/ PR target/87537 * gcc.target/i386/pr87537-1.c: New test. From-SVN: r265260
2018-09-28Simplify vec_merge according to the mask.Andrew Stubbs1-0/+136
This patch was part of the original patch we acquired from Honza and Martin. It simplifies nested vec_merge operations using the same mask. Self-tests are included. 2018-09-28 Andrew Stubbs <ams@codesourcery.com> Jan Hubicka <jh@suse.cz> Martin Jambor <mjambor@suse.cz> * simplify-rtx.c (simplify_merge_mask): New function. (simplify_ternary_operation): Use it, also see if VEC_MERGEs with the same masks are used in op1 or op2. (test_vec_merge): New function. (test_vector_ops): Call test_vec_merge. Co-Authored-By: Jan Hubicka <jh@suse.cz> Co-Authored-By: Martin Jambor <mjambor@suse.cz> From-SVN: r264688
2018-09-19Remove constant vec_select restriction.Andrew Stubbs1-2/+7
The vec_select operator is documented to require a const_int for the lane selector operand, but GCN has an instruction that can select the lane at runtime, so it seems reasonable to remove this restriction. This patch simply replaces assertions that the operand is constant with early exits from the optimizers. I think it's reasonable that vec_select with a non-constant operand cannot be optimized, yet. Also included is the necessary documentation tweak. 2018-09-19 Andrew Stubbs <ams@codesourcery.com> gcc/ * doc/rtl.texi: Adjust vec_select description. * simplify-rtx.c (simplify_binary_operation_1): Allow VEC_SELECT to use non-constant selectors. From-SVN: r264423
2018-07-07tree-vrp.c (vrp_int_const_binop): Change overflow type to overflow_type.Aldy Hernandez1-1/+1
* tree-vrp.c (vrp_int_const_binop): Change overflow type to overflow_type. (combine_bound): Use wide-int overflow calculation instead of rolling our own. * calls.c (maybe_warn_alloc_args_overflow): Change overflow type to overflow_type. * fold-const.c (int_const_binop_2): Same. (extract_muldiv_1): Same. (fold_div_compare): Same. (fold_abs_const): Same. * match.pd: Same. * poly-int.h (add): Same. (sub): Same. (neg): Same. (mul): Same. * predict.c (predict_iv_comparison): Same. * profile-count.c (slow_safe_scale_64bit): Same. * simplify-rtx.c (simplify_const_binary_operation): Same. * tree-chrec.c (tree_fold_binomial): Same. * tree-data-ref.c (split_constant_offset_1): Same. * tree-if-conv.c (idx_within_array_bound): Same. * tree-scalar-evolution.c (iv_can_overflow_p): Same. * tree-ssa-phiopt.c (minmax_replacement): Same. * tree-vect-loop.c (is_nonwrapping_integer_induction): Same. * tree-vect-stmts.c (vect_truncate_gather_scatter_offset): Same. * vr-values.c (vr_values::adjust_range_with_scev): Same. * wide-int.cc (wi::add_large): Same. (wi::mul_internal): Same. (wi::sub_large): Same. (wi::divmod_internal): Same. * wide-int.h: Change overflow type to overflow_type for neg, add, mul, smul, umul, div_trunc, div_floor, div_ceil, div_round, mod_trunc, mod_ceil, mod_round, add_large, sub_large, mul_internal, divmod_internal. (overflow_type): New enum. (accumulate_overflow): New. cp/ * decl.c (build_enumerator): Change overflow type to overflow_type. * init.c (build_new_1): Same. From-SVN: r262494
2018-06-12Use poly_int rtx accessors instead of hwi accessorsRichard Sandiford1-13/+7
This patch generalises various places that used hwi rtx accessors so that they can handle poly_ints instead. In many cases these changes are by inspection rather than because something had shown them to be necessary. 2018-06-12 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * poly-int.h (can_div_trunc_p): Add new overload in which all values are poly_ints. * alias.c (get_addr): Extend CONST_INT handling to poly_int_rtx_p. (memrefs_conflict_p): Likewise. (init_alias_analysis): Likewise. * cfgexpand.c (expand_debug_expr): Likewise. * combine.c (combine_simplify_rtx, force_int_to_mode): Likewise. * cse.c (fold_rtx): Likewise. * explow.c (adjust_stack, anti_adjust_stack): Likewise. * expr.c (emit_block_move_hints): Likewise. (clear_storage_hints, push_block, emit_push_insn): Likewise. (store_expr_with_bounds, reduce_to_bit_field_precision): Likewise. (emit_group_load_1): Use rtx_to_poly_int64 for group offsets. (emit_group_store): Likewise. (find_args_size_adjust): Use strip_offset. Use rtx_to_poly_int64 to read the PRE/POST_MODIFY increment. * calls.c (store_one_arg): Use strip_offset. * rtlanal.c (rtx_addr_can_trap_p_1): Extend CONST_INT handling to poly_int_rtx_p. (set_noop_p): Use rtx_to_poly_int64 for the elements selected by a VEC_SELECT. * simplify-rtx.c (avoid_constant_pool_reference): Use strip_offset. (simplify_binary_operation_1): Extend CONST_INT handling to poly_int_rtx_p. * var-tracking.c (compute_cfa_pointer): Take a poly_int64 rather than a HOST_WIDE_INT. (hard_frame_pointer_adjustment): Change from HOST_WIDE_INT to poly_int64. (adjust_mems, add_stores): Update accodingly. (vt_canonicalize_addr): Track polynomial offsets. (emit_note_insn_var_location): Likewise. (vt_add_function_parameter): Likewise. (vt_initialize): Likewise. From-SVN: r261530
2018-05-17[patch AArch64] Do not perform a vector splat for vector initialisation if ↵James Greenhalgh1-0/+54
it is not useful In the testcase in this patch we create an SLP vector with only two elements. Our current vector initialisation code will first duplicate the first element to both lanes, then overwrite the top lane with a new value. This duplication can be clunky and wasteful. Better would be to simply use the fact that we will always be overwriting the remaining bits, and simply move the first element to the corrcet place (implicitly zeroing all other bits). This reduces the code generation for this case, and can allow more efficient addressing modes, and other second order benefits for AArch64 code which has been vectorized to V2DI mode. Note that the change is generic enough to catch the case for any vector mode, but is expected to be most useful for 2x64-bit vectorization. Unfortunately, on its own, this would cause failures in gcc.target/aarch64/load_v2vec_lanes_1.c and gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more vec_merge and vec_duplicate for their simplifications to apply. To fix this, add a special case to the AArch64 code if we are loading from two memory addresses, and use the load_pair_lanes patterns directly. We also need a new pattern in simplify-rtx.c:simplify_ternary_operation to catch: (vec_merge:OUTER (vec_duplicate:OUTER x:INNER) (subreg:OUTER y:INNER 0) (const_int N)) And simplify it to: (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x) This is similar to the existing patterns which are tested in this function, without requiring the second operand to also be a vec_duplicate. * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code generation for cases where splatting a value is not useful. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge across a vec_duplicate and a paradoxical subreg forming a vector mode to a vec_concat. * gcc.target/aarch64/vect-slp-dup.c: New. Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com> From-SVN: r260309
2018-04-25re PR middle-end/85414 (ICE: in ix86_expand_prologue, at ↵Jakub Jelinek1-2/+2
config/i386/i386.c:13810 with -Og -fgcse) PR middle-end/85414 * simplify-rtx.c (simplify_unary_operation_1) <case SIGN_EXTEND, case ZERO_EXTEND>: Pass SUBREG_REG (op) rather than op to gen_lowpart_no_emit. From-SVN: r259649
2018-04-13re PR rtl-optimization/85376 (wrong code with -Og -fno-dce -fgcse ↵Jakub Jelinek1-2/+2
-fno-tree-ccp -fno-tree-copy-prop) PR rtl-optimization/85376 * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX instead of a specific value. * gcc.dg/pr85376.c: New test. From-SVN: r259377
2018-03-21re PR rtl-optimization/84989 (_mm512_broadcast_f32x4 triggers ICE in ↵Jakub Jelinek1-1/+3
simplify_const_unary_operation, at simplify-rtx.c:1731) PR rtl-optimization/84989 * simplify-rtx.c (simplify_unary_operation_1): Don't try to simplify VEC_DUPLICATE with scalar result mode. * gcc.target/i386/pr84989.c: New test. From-SVN: r258709
2018-01-20re PR target/83930 (ICE: RTL check: expected code 'const_int', have 'mem' in ↵Jakub Jelinek1-1/+2
simplify_binary_operation_1, at simplify-rtx.c:3302) PR target/83930 * simplify-rtx.c (simplify_binary_operation_1) <case UMOD>: Use UINTVAL (trueop1) instead of INTVAL (op1). * gcc.dg/pr83930.c: New test. From-SVN: r256915
2018-01-13Extra subreg fold for variable-length CONST_VECTORsRichard Sandiford1-11/+24
The SVE support for the new CONST_VECTOR encoding needs to be able to extract the first N bits of the vector and duplicate it. This patch adds a simplify_subreg rule for that. The code is covered by the gcc.target/aarch64/sve_slp_*.c tests. 2018-01-13 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes parameter and use it instead of GET_MODE_SIZE (innermode). Use inner_bytes * BITS_PER_UNIT instead of GET_MODE_BITSIZE (innermode). Use CEIL (inner_bytes, GET_MODE_UNIT_SIZE (innermode)) instead of GET_MODE_NUNITS (innermode). Also add a first_elem parameter. Change innermode from fixed_mode_size to machine_mode. (simplify_subreg): Update call accordingly. Handle a constant-sized subreg of a variable-length CONST_VECTOR. From-SVN: r256610
2018-01-08PR target/83663 - Revert r255946Vidya Praveen1-51/+0
gcc/ 2018-01-08 Vidya Praveen <vidyapraveen@arm.com> PR target/83663 - Revert r255946 * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code generation for cases where splatting a value is not useful. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge across a vec_duplicate and a paradoxical subreg forming a vector mode to a vec_concat. gcc/testsuite/ 2018-01-08 Vidya Praveen <vidyapraveen@arm.com> PR target/83663 - Revert r255946 * gcc.target/aarch64/vect-slp-dup.c: New. From-SVN: r256346
2018-01-05[PATCH PR82439][simplify-rtx] Simplify (x | y) == x -> (y & ~x) == 0Sudakshina Das1-21/+25
This patch add support for the missing transformation of (x | y) == x -> (y & ~x) == 0. The transformation for (x & y) == x case already exists in simplify-rtx.c since 2014 as of r218503 and this patch only adds a couple of extra patterns for the IOR case. This benefits targets that have the BICS instruction to generate better code. For targets that do not have the BICS instructions, it still results in no worse code generation and gives out 2 instructions. ChangeLog Entries: *** gcc/ChangeLog *** 2018-01-05 Sudakshina Das <sudi.das@arm.com> PR target/82439 * simplify-rtx.c (simplify_relational_operation_1): Add simplifications of (x|y) == x for BICS pattern. *** gcc/testsuite/ChangeLog *** 2018-01-05 Sudakshina Das <sudi.das@arm.com> PR target/82439 * gcc.target/aarch64/bics_5.c: New test. * gcc.target/arm/bics_5.c: Likewise. From-SVN: r256275
2018-01-03poly_int: GET_MODE_SIZERichard Sandiford1-21/+23
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16. The non-mechanical parts were handled by previous patches. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_size): Change from unsigned short to poly_uint16_pod. (mode_to_bytes): Return a poly_uint16 rather than an unsigned short. (GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. (fixed_size_mode::includes_p): Check for constant-sized modes. * genmodes.c (emit_mode_size_inline): Make mode_size_inline return a poly_uint16 rather than an unsigned short. (emit_mode_size): Change the type of mode_size from unsigned short to poly_uint16_pod. Use ZERO_COEFFS for the initializer. (emit_mode_adjustments): Cope with polynomial vector sizes. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_SIZE. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_SIZE. * auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial. * builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise. * caller-save.c (setup_save_areas): Likewise. (replace_reg_with_saved_mem): Likewise. * calls.c (emit_library_call_value_1): Likewise. * combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise. * combine.c (simplify_set, make_extraction, simplify_shift_const_1) (gen_lowpart_for_combine): Likewise. * convert.c (convert_to_integer_1): Likewise. * cse.c (equiv_constant, cse_insn): Likewise. * cselib.c (autoinc_split, cselib_hash_rtx): Likewise. (cselib_subst_to_values): Likewise. * dce.c (word_dce_process_block): Likewise. * df-problems.c (df_word_lr_mark_ref): Likewise. * dwarf2cfi.c (init_one_dwarf_reg_size): Likewise. * dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor) (concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor) (rtl_for_decl_location): Likewise. * emit-rtl.c (gen_highpart, widen_memory_access): Likewise. * expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise. * expr.c (emit_group_load_1, clear_storage_hints): Likewise. (emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise. (expand_expr_real_1): Likewise. * function.c (assign_parm_setup_block_p, assign_parm_setup_block) (pad_below): Likewise. * gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise. * gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise. * ira.c (get_subreg_tracking_sizes): Likewise. * ira-build.c (ira_create_allocno_objects): Likewise. * ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise. (ira_sort_regnos_for_alter_reg): Likewise. * ira-costs.c (record_operand_costs): Likewise. * lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn) (resolve_simple_move): Likewise. * lra-constraints.c (get_reload_reg, operands_match_p): Likewise. (process_addr_reg, simplify_operand_subreg, curr_insn_transform) (lra_constraints): Likewise. (CONST_POOL_OK_P): Reject variable-sized modes. * lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare) (add_pseudo_to_slot, lra_spill): Likewise. * omp-low.c (omp_clause_aligned_alignment): Likewise. * optabs-query.c (get_best_extraction_insn): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise. (expand_mult_highpart, valid_multiword_target_p): Likewise. * recog.c (offsettable_address_addr_space_p): Likewise. * regcprop.c (maybe_mode_change): Likewise. * reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise. * regrename.c (build_def_use): Likewise. * regstat.c (dump_reg_info): Likewise. * reload.c (complex_word_subreg_p, push_reload, find_dummy_reload) (find_reloads, find_reloads_subreg_address): Likewise. * reload1.c (eliminate_regs_1): Likewise. * rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise. * simplify-rtx.c (avoid_constant_pool_reference): Likewise. (simplify_binary_operation_1, simplify_subreg): Likewise. * targhooks.c (default_function_arg_padding): Likewise. (default_hard_regno_nregs, default_class_max_nregs): Likewise. * tree-cfg.c (verify_gimple_assign_binary): Likewise. (verify_gimple_assign_ternary): Likewise. * tree-inline.c (estimate_move_cost): Likewise. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise. (get_address_cost_ainc): Likewise. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise. (vect_supportable_dr_alignment): Likewise. * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise. (vectorizable_reduction): Likewise. * tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift) (vectorizable_operation, vectorizable_load): Likewise. * tree.c (build_same_sized_truth_vector_type): Likewise. * valtrack.c (cleanup_auto_inc_dec): Likewise. * var-tracking.c (emit_note_insn_var_location): Likewise. * config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>. (ADDR_VEC_ALIGN): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256201
2018-01-03poly_int: GET_MODE_NUNITSRichard Sandiford1-57/+82
This patch changes GET_MODE_NUNITS from unsigned char to poly_uint16, although it remains a macro when compiling target code with NUM_POLY_INT_COEFFS == 1. We can handle permuted loads and stores for variable nunits if the number of statements is a power of 2, but not otherwise. The to_constant call in make_vector_type goes away in a later patch. 2018-01-03 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (mode_nunits): Change from unsigned char to poly_uint16_pod. (ONLY_FIXED_SIZE_MODES): New macro. (pod_mode::measurement_type, scalar_int_mode::measurement_type) (scalar_float_mode::measurement_type, scalar_mode::measurement_type) (complex_mode::measurement_type, fixed_size_mode::measurement_type): New typedefs. (mode_to_nunits): Return a poly_uint16 rather than an unsigned short. (GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES, or if measurement_type is not polynomial. * genmodes.c (ZERO_COEFFS): New macro. (emit_mode_nunits_inline): Make mode_nunits_inline return a poly_uint16. (emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod. Use ZERO_COEFFS when emitting initializers. * data-streamer.h (bp_pack_poly_value): New function. (bp_unpack_poly_value): Likewise. * lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value for GET_MODE_NUNITS. * lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value for GET_MODE_NUNITS. * tree.c (make_vector_type): Remove temporary shim and make the real function take the number of units as a poly_uint64 rather than an int. (build_vector_type_for_mode): Handle polynomial nunits. * dwarf2out.c (loc_descriptor, add_const_value_attribute): Likewise. * emit-rtl.c (const_vec_series_p_1): Likewise. (gen_rtx_CONST_VECTOR): Likewise. * fold-const.c (test_vec_duplicate_folding): Likewise. * genrecog.c (validate_pattern): Likewise. * optabs-query.c (can_vec_perm_var_p, can_mult_highpart_p): Likewise. * optabs-tree.c (expand_vec_cond_expr_p): Likewise. * optabs.c (expand_vector_broadcast, expand_binop_directly): Likewise. (shift_amt_for_vec_perm_mask, expand_vec_perm_var): Likewise. (expand_vec_cond_expr, expand_mult_highpart): Likewise. * rtlanal.c (subreg_get_info): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_grouped_load_supported): Likewise. * tree-vect-generic.c (type_for_widest_vector_mode): Likewise. * tree-vect-loop.c (have_whole_vector_shift): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. (simplify_const_unary_operation, simplify_binary_operation_1) (simplify_const_binary_operation, simplify_ternary_operation) (test_vector_ops_duplicate, test_vector_ops): Likewise. (simplify_immed_subreg): Use GET_MODE_NUNITS on a fixed_size_mode instead of CONST_VECTOR_NUNITS. * varasm.c (output_constant_pool_2): Likewise. * rtx-vector-builder.c (rtx_vector_builder::build): Only include the explicit-encoded elements in the XVEC for variable-length vectors. gcc/ada/ * gcc-interface/misc.c (enumerate_modes): Handle polynomial GET_MODE_NUNITS. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r256195
2018-01-03Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r256169
2018-01-02Make CONST_VECTOR_ELT handle implicitly-encoded elementsRichard Sandiford1-3/+3
This patch makes CONST_VECTOR_ELT handle implicitly-encoded elements, in a similar way to VECTOR_CST_ELT. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * rtl.h (CONST_VECTOR_ELT): Redefine to const_vector_elt. (const_vector_encoded_nelts): New function. (CONST_VECTOR_NUNITS): Redefine to use GET_MODE_NUNITS. (const_vector_int_elt, const_vector_elt): Declare. * emit-rtl.c (const_vector_int_elt_1): New function. (const_vector_elt): Likewise. * simplify-rtx.c (simplify_immed_subreg): Avoid taking the address of CONST_VECTOR_ELT. From-SVN: r256104
2018-01-02Use CONST_VECTOR_ELT instead of XVECEXPRichard Sandiford1-2/+2
This patch replaces target-independent uses of XVECEXP with uses of CONST_VECTOR_ELT. This kind of replacement isn't necessary for code specific to targets other than AArch64. 2018-01-02 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * simplify-rtx.c (simplify_const_binary_operation): Use CONST_VECTOR_ELT instead of XVECEXP. From-SVN: r256101
2017-12-28Use valid_for_const_vector_p instead of CONSTANT_PRichard Sandiford1-1/+2
This patch makes the VEC_SERIES code use valid_for_const_vector_p instead of CONSTANT_P, to match what we already do for VEC_DUPLICATE. This showed up as a failure in gcc.c-torture/execute/pr28982b.c for -m32 on x86_64-linux-gnu after later patches. 2017-12-28 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * emit-rtl.c (gen_const_vec_series): Use valid_for_const_vector_p instead of CONSTANT_P. (gen_vec_series): Likewise. * simplify-rtx.c (simplify_binary_operation_1): Likewise. From-SVN: r256023
2017-12-21[patch AArch64] Do not perform a vector splat for vector initialisation if ↵James Greenhalgh1-0/+51
it is not useful Our current vector initialisation code will first duplicate the first element to both lanes, then overwrite the top lane with a new value. This duplication can be clunky and wasteful. Better would be to simply use the fact that we will always be overwriting the remaining bits, and simply move the first element to the corrcet place (implicitly zeroing all other bits). We also need a new pattern in simplify-rtx.c:simplify_ternary_operation , to ensure we can still simplify: (vec_merge:OUTER (vec_duplicate:OUTER x:INNER) (subreg:OUTER y:INNER 0) (const_int N)) To: (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x) --- gcc/ * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code generation for cases where splatting a value is not useful. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge across a vec_duplicate and a paradoxical subreg forming a vector mode to a vec_concat. gcc/testsuite/ * gcc.target/aarch64/vect-slp-dup.c: New. From-SVN: r255946
2017-12-21re PR rtl-optimization/82973 (ICE in output_constant_pool_2, at ↵Jakub Jelinek1-5/+5
varasm.c:3896 on aarch64) PR rtl-optimization/82973 * emit-rtl.h (valid_for_const_vec_duplicate_p): Rename to ... (valid_for_const_vector_p): ... this. * emit-rtl.c (valid_for_const_vec_duplicate_p): Rename to ... (valid_for_const_vector_p): ... this. Adjust function comment. (gen_vec_duplicate): Adjust caller. * optabs.c (expand_vector_broadcast): Likewise. * simplify-rtx.c (simplify_const_unary_operation): Don't optimize into CONST_VECTOR if some element isn't simplified valid_for_const_vector_p constant. (simplify_const_binary_operation): Likewise. Use CONST_FIXED_P macro instead of GET_CODE == CONST_FIXED. (simplify_subreg): Use CONST_FIXED_P macro instead of GET_CODE == CONST_FIXED. * gfortran.dg/pr82973.f90: New test. From-SVN: r255938
2017-12-21poly_int: get_inner_reference & co.Richard Sandiford1-9/+5
This patch makes get_inner_reference and ptr_difference_const return the bit size and bit position as poly_int64s rather than HOST_WIDE_INTS. The non-mechanical changes were handled by previous patches. 2017-12-21 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * tree.h (get_inner_reference): Return the bitsize and bitpos as poly_int64_pods rather than HOST_WIDE_INT. * fold-const.h (ptr_difference_const): Return the pointer difference as a poly_int64_pod rather than a HOST_WIDE_INT. * expr.c (get_inner_reference): Return the bitsize and bitpos as poly_int64_pods rather than HOST_WIDE_INT. (expand_expr_addr_expr_1, expand_expr_real_1): Track polynomial offsets and sizes. * fold-const.c (make_bit_field_ref): Take the bitpos as a poly_int64 rather than a HOST_WIDE_INT. Update call to get_inner_reference. (optimize_bit_field_compare): Update call to get_inner_reference. (decode_field_reference): Likewise. (fold_unary_loc): Track polynomial offsets and sizes. (split_address_to_core_and_offset): Return the bitpos as a poly_int64_pod rather than a HOST_WIDE_INT. (ptr_difference_const): Likewise for the pointer difference. * asan.c (instrument_derefs): Track polynomial offsets and sizes. * config/mips/mips.c (r10k_safe_mem_expr_p): Likewise. * dbxout.c (dbxout_expand_expr): Likewise. * dwarf2out.c (loc_list_for_address_of_addr_expr_of_indirect_ref) (loc_list_from_tree_1, fortran_common): Likewise. * gimple-laddress.c (pass_laddress::execute): Likewise. * gimple-ssa-store-merging.c (find_bswap_or_nop_load): Likewise. * gimplify.c (gimplify_scan_omp_clauses): Likewise. * simplify-rtx.c (delegitimize_mem_from_attrs): Likewise. * tree-affine.c (tree_to_aff_combination): Likewise. (get_inner_reference_aff): Likewise. * tree-data-ref.c (split_constant_offset_1): Likewise. (dr_analyze_innermost): Likewise. * tree-scalar-evolution.c (interpret_rhs_expr): Likewise. * tree-sra.c (ipa_sra_check_caller): Likewise. * tree-vect-data-refs.c (vect_check_gather_scatter): Likewise. * ubsan.c (maybe_instrument_pointer_overflow): Likewise. (instrument_bool_enum_load, instrument_object_size): Likewise. * gimple-ssa-strength-reduction.c (slsr_process_ref): Update call to get_inner_reference. * hsa-gen.c (gen_hsa_addr): Likewise. * sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise. * tsan.c (instrument_expr): Likewise. * match.pd: Update call to ptr_difference_const. gcc/ada/ * gcc-interface/trans.c (Attribute_to_gnu): Track polynomial offsets and sizes. * gcc-interface/utils2.c (build_unary_op): Likewise. gcc/cp/ * constexpr.c (check_automatic_or_tls): Track polynomial offsets and sizes. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255914
2017-12-20poly_int: SUBREG_BYTERichard Sandiford1-35/+42
This patch changes SUBREG_BYTE from an int to a poly_int. Since valid SUBREG_BYTEs must be contained within the mode of the SUBREG_REG, the required range is the same as for GET_MODE_SIZE, i.e. unsigned short. The patch therefore uses poly_uint16(_pod) for the SUBREG_BYTE. Using poly_uint16_pod rtx fields requires a new field code ('p'). Since there are no other uses of 'p' besides SUBREG_BYTE, the patch doesn't add an XPOLY or whatever; all uses should go via SUBREG_BYTE instead. The patch doesn't bother implementing 'p' support for legacy define_peepholes, since none of the remaining ones have subregs in their patterns. As it happened, the rtl documentation used SUBREG as an example of a code with mixed field types, accessed via XEXP (x, 0) and XINT (x, 1). Since there's no direct replacement for XINT, and since people should never use it even if there were, the patch changes the example to use INT_LIST instead. The patch also changes subreg-related helper functions so that they too take and return polynomial offsets. This makes the patch quite big, but it's mostly mechanical. The patch generally sticks to existing choices wrt signedness. 2017-12-20 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/rtl.texi: Update documentation of SUBREG_BYTE. Document the 'p' format code. Use INT_LIST rather than SUBREG as the example of a code with an XINT and an XEXP. Remove the implication that accessing an rtx field using XINT is expected to work. * rtl.def (SUBREG): Change format from "ei" to "ep". * rtl.h (rtunion::rt_subreg): New field. (XCSUBREG): New macro. (SUBREG_BYTE): Use it. (subreg_shape): Change offset from an unsigned int to a poly_uint16. Update constructor accordingly. (subreg_shape::operator ==): Update accordingly. (subreg_shape::unique_id): Return an unsigned HOST_WIDE_INT rather than an unsigned int. (subreg_lsb, subreg_lowpart_offset, subreg_highpart_offset): Return a poly_uint64 rather than an unsigned int. (subreg_lsb_1): Likewise. Take the offset as a poly_uint64 rather than an unsigned int. (subreg_size_offset_from_lsb, subreg_size_lowpart_offset) (subreg_size_highpart_offset): Return a poly_uint64 rather than an unsigned int. Take the sizes as poly_uint64s. (subreg_offset_from_lsb): Return a poly_uint64 rather than an unsigned int. Take the shift as a poly_uint64 rather than an unsigned int. (subreg_regno_offset, subreg_offset_representable_p): Take the offset as a poly_uint64 rather than an unsigned int. (simplify_subreg_regno): Likewise. (byte_lowpart_offset): Return the memory offset as a poly_int64 rather than an int. (subreg_memory_offset): Likewise. Take the subreg offset as a poly_uint64 rather than an unsigned int. (simplify_subreg, simplify_gen_subreg, subreg_get_info) (gen_rtx_SUBREG, validate_subreg): Take the subreg offset as a poly_uint64 rather than an unsigned int. * rtl.c (rtx_format): Describe 'p' in comment. (copy_rtx, rtx_equal_p_cb, rtx_equal_p): Handle 'p'. * emit-rtl.c (validate_subreg, gen_rtx_SUBREG): Take the subreg offset as a poly_uint64 rather than an unsigned int. (byte_lowpart_offset): Return the memory offset as a poly_int64 rather than an int. (subreg_memory_offset): Likewise. Take the subreg offset as a poly_uint64 rather than an unsigned int. (subreg_size_lowpart_offset, subreg_size_highpart_offset): Take the mode sizes as poly_uint64s rather than unsigned ints. Return a poly_uint64 rather than an unsigned int. (subreg_lowpart_p): Treat subreg offsets as poly_ints. (copy_insn_1): Handle 'p'. * rtlanal.c (set_noop_p): Treat subregs offsets as poly_uint64s. (subreg_lsb_1): Take the subreg offset as a poly_uint64 rather than an unsigned int. Return the shift in the same way. (subreg_lsb): Return the shift as a poly_uint64 rather than an unsigned int. (subreg_size_offset_from_lsb): Take the sizes and shift as poly_uint64s rather than unsigned ints. Return the offset as a poly_uint64. (subreg_get_info, subreg_regno_offset, subreg_offset_representable_p) (simplify_subreg_regno): Take the offset as a poly_uint64 rather than an unsigned int. * rtlhash.c (add_rtx): Handle 'p'. * genemit.c (gen_exp): Likewise. * gengenrtl.c (type_from_format, gendef): Likewise. * gensupport.c (subst_pattern_match, get_alternatives_number) (collect_insn_data, alter_predicate_for_insn, alter_constraints) (subst_dup): Likewise. * gengtype.c (adjust_field_rtx_def): Likewise. * genrecog.c (find_operand, find_matching_operand, validate_pattern) (match_pattern_2): Likewise. (rtx_test::SUBREG_FIELD): New rtx_test::kind_enum. (rtx_test::subreg_field): New function. (operator ==, safe_to_hoist_p, transition_parameter_type) (print_nonbool_test, print_test): Handle SUBREG_FIELD. * genattrtab.c (attr_rtx_1): Say that 'p' is deliberately not handled. * genpeep.c (match_rtx): Likewise. * print-rtl.c (print_poly_int): Include if GENERATOR_FILE too. (rtx_writer::print_rtx_operand): Handle 'p'. (print_value): Handle SUBREG. * read-rtl.c (apply_int_iterator): Likewise. (rtx_reader::read_rtx_operand): Handle 'p'. * alias.c (rtx_equal_for_memref_p): Likewise. * cselib.c (rtx_equal_for_cselib_1, cselib_hash_rtx): Likewise. * caller-save.c (replace_reg_with_saved_mem): Treat subreg offsets as poly_ints. * calls.c (expand_call): Likewise. * combine.c (combine_simplify_rtx, expand_field_assignment): Likewise. (make_extraction, gen_lowpart_for_combine): Likewise. * loop-invariant.c (hash_invariant_expr_1, invariant_expr_equal_p): Likewise. * cse.c (remove_invalid_subreg_refs): Take the offset as a poly_uint64 rather than an unsigned int. Treat subreg offsets as poly_ints. (exp_equiv_p): Handle 'p'. (hash_rtx_cb): Likewise. Treat subreg offsets as poly_ints. (equiv_constant, cse_insn): Treat subreg offsets as poly_ints. * dse.c (find_shift_sequence): Likewise. * dwarf2out.c (rtl_for_decl_location): Likewise. * expmed.c (extract_low_bits): Likewise. * expr.c (emit_group_store, undefined_operand_subword_p): Likewise. (expand_expr_real_2): Likewise. * final.c (alter_subreg): Likewise. (leaf_renumber_regs_insn): Handle 'p'. * function.c (assign_parm_find_stack_rtl, assign_parm_setup_stack): Treat subreg offsets as poly_ints. * fwprop.c (forward_propagate_and_simplify): Likewise. * ifcvt.c (noce_emit_move_insn, noce_emit_cmove): Likewise. * ira.c (get_subreg_tracking_sizes): Likewise. * ira-conflicts.c (go_through_subreg): Likewise. * ira-lives.c (process_single_reg_class_operands): Likewise. * jump.c (rtx_renumbered_equal_p): Likewise. Handle 'p'. * lower-subreg.c (simplify_subreg_concatn): Take the subreg offset as a poly_uint64 rather than an unsigned int. (simplify_gen_subreg_concatn, resolve_simple_move): Treat subreg offsets as poly_ints. * lra-constraints.c (operands_match_p): Handle 'p'. (match_reload, curr_insn_transform): Treat subreg offsets as poly_ints. * lra-spills.c (assign_mem_slot): Likewise. * postreload.c (move2add_valid_value_p): Likewise. * recog.c (general_operand, indirect_operand): Likewise. * regcprop.c (copy_value, maybe_mode_change): Likewise. (copyprop_hardreg_forward_1): Likewise. * reginfo.c (simplifiable_subregs_hasher::hash, simplifiable_subregs) (record_subregs_of_mode): Likewise. * rtlhooks.c (gen_lowpart_general, gen_lowpart_if_possible): Likewise. * reload.c (operands_match_p): Handle 'p'. (find_reloads_subreg_address): Treat subreg offsets as poly_ints. * reload1.c (alter_reg, choose_reload_regs): Likewise. (compute_reload_subreg_offset): Likewise, and return an poly_int64. * simplify-rtx.c (simplify_truncation, simplify_binary_operation_1): (test_vector_ops_duplicate): Treat subreg offsets as poly_ints. (simplify_const_poly_int_tests<N>::run): Likewise. (simplify_subreg, simplify_gen_subreg): Take the subreg offset as a poly_uint64 rather than an unsigned int. * valtrack.c (debug_lowpart_subreg): Likewise. * var-tracking.c (var_lowpart): Likewise. (loc_cmp): Handle 'p'. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255882
2017-12-20poly_int: MEM_OFFSET and MEM_SIZERichard Sandiford1-12/+6
This patch changes the MEM_OFFSET and MEM_SIZE memory attributes from HOST_WIDE_INT to poly_int64. Most of it is mechanical, but there is one nonbovious change in widen_memory_access. Previously the main while loop broke with: /* Similarly for the decl. */ else if (DECL_P (attrs.expr) && DECL_SIZE_UNIT (attrs.expr) && TREE_CODE (DECL_SIZE_UNIT (attrs.expr)) == INTEGER_CST && compare_tree_int (DECL_SIZE_UNIT (attrs.expr), size) >= 0 && (! attrs.offset_known_p || attrs.offset >= 0)) break; but it seemed wrong to optimistically assume the best case when the offset isn't known (and thus might be negative). As it happens, the "! attrs.offset_known_p" condition was always false, because we'd already nullified attrs.expr in that case: /* If we don't know what offset we were at within the expression, then we can't know if we've overstepped the bounds. */ if (! attrs.offset_known_p) attrs.expr = NULL_TREE; The patch therefore drops "! attrs.offset_known_p ||" when converting the offset check to the may/must interface. 2017-12-20 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * rtl.h (mem_attrs): Add a default constructor. Change size and offset from HOST_WIDE_INT to poly_int64. * emit-rtl.h (set_mem_offset, set_mem_size, adjust_address_1) (adjust_automodify_address_1, set_mem_attributes_minus_bitpos) (widen_memory_access): Take the sizes and offsets as poly_int64s rather than HOST_WIDE_INTs. * alias.c (ao_ref_from_mem): Handle the new form of MEM_OFFSET. (offset_overlap_p): Take poly_int64s rather than HOST_WIDE_INTs and ints. (adjust_offset_for_component_ref): Change the offset from a HOST_WIDE_INT to a poly_int64. (nonoverlapping_memrefs_p): Track polynomial offsets and sizes. * cfgcleanup.c (merge_memattrs): Update after mem_attrs changes. * dce.c (find_call_stack_args): Likewise. * dse.c (record_store): Likewise. * dwarf2out.c (tls_mem_loc_descriptor, dw_sra_loc_expr): Likewise. * print-rtl.c (rtx_writer::print_rtx): Likewise. * read-rtl-function.c (test_loading_mem): Likewise. * rtlanal.c (may_trap_p_1): Likewise. * simplify-rtx.c (delegitimize_mem_from_attrs): Likewise. * var-tracking.c (int_mem_offset, track_expr_p): Likewise. * emit-rtl.c (mem_attrs_eq_p, get_mem_align_offset): Likewise. (mem_attrs::mem_attrs): New function. (set_mem_attributes_minus_bitpos): Change bitpos from a HOST_WIDE_INT to poly_int64. (set_mem_alias_set, set_mem_addr_space, set_mem_align, set_mem_expr) (clear_mem_offset, clear_mem_size, change_address) (get_spill_slot_decl, set_mem_attrs_for_spill): Directly initialize mem_attrs. (set_mem_offset, set_mem_size, adjust_address_1) (adjust_automodify_address_1, offset_address, widen_memory_access): Likewise. Take poly_int64s rather than HOST_WIDE_INT. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255875
2017-12-20poly_int: rtx constantsRichard Sandiford1-5/+145
This patch adds an rtl representation of poly_int values. There were three possible ways of doing this: (1) Add a new rtl code for the poly_ints themselves and store the coefficients as trailing wide_ints. This would give constants like: (const_poly_int [c0 c1 ... cn]) The runtime value would be: c0 + c1 * x1 + ... + cn * xn (2) Like (1), but use rtxes for the coefficients. This would give constants like: (const_poly_int [(const_int c0) (const_int c1) ... (const_int cn)]) although the coefficients could be const_wide_ints instead of const_ints where appropriate. (3) Add a new rtl code for the polynomial indeterminates, then use them in const wrappers. A constant like c0 + c1 * x1 would then look like: (const:M (plus:M (mult:M (const_param:M x1) (const_int c1)) (const_int c0))) There didn't seem to be that much to choose between them. The main advantage of (1) is that it's a more efficient representation and that we can refer to the cofficients directly as wide_int_storage. 2017-12-20 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/rtl.texi (const_poly_int): Document. Also document the rtl sharing behavior. * gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT. * rtl.h (const_poly_int_def): New struct. (rtx_def::u): Add a cpi field. (CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT. (CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros. (wi::rtx_to_poly_wide_ref): New typedef (const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64) (poly_int_rtx_p): New functions. (trunc_int_for_mode): Declare a poly_int64 version. (plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT. (immed_wide_int_const): Take a poly_wide_int_ref rather than a wide_int_ref. (strip_offset): Declare. (strip_offset_and_add): New function. * rtl.def (CONST_POLY_INT): New rtx code. * rtl.c (rtx_size): Handle CONST_POLY_INT. (shared_const_p): Use poly_int_rtx_p. * emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT. (gen_int_shift_amount): Likewise. * emit-rtl.c (const_poly_int_hasher): New class. (const_poly_int_htab): New variable. (init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1. (const_poly_int_hasher::hash): New function. (const_poly_int_hasher::equal): Likewise. (gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT. (immed_wide_int_const): Rename to... (immed_wide_int_const_1): ...this and make static. (immed_wide_int_const): New function, taking a poly_wide_int_ref instead of a wide_int_ref. (gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT. (gen_lowpart_common): Handle CONST_POLY_INT. * cse.c (hash_rtx_cb, equiv_constant): Likewise. * cselib.c (cselib_hash_rtx): Likewise. * dwarf2out.c (const_ok_for_output_1): Likewise. * expr.c (convert_modes): Likewise. * print-rtl.c (rtx_writer::print_rtx, print_value): Likewise. * rtlhash.c (add_rtx): Likewise. * explow.c (trunc_int_for_mode): Add a poly_int64 version. (plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT. Handle existing CONST_POLY_INT rtxes. * expmed.h (expand_shift): Take a poly_int64 instead of a HOST_WIDE_INT. * expmed.c (expand_shift): Likewise. * rtlanal.c (strip_offset): New function. (commutative_operand_precedence): Give CONST_POLY_INT the same precedence as CONST_DOUBLE and put CONST_WIDE_INT between that and CONST_INT. * rtl-tests.c (const_poly_int_tests): New struct. (rtl_tests_c_tests): Use it. * simplify-rtx.c (simplify_const_unary_operation): Handle CONST_POLY_INT. (simplify_const_binary_operation): Likewise. (simplify_binary_operation_1): Fold additions of symbolic constants and CONST_POLY_INTs. (simplify_subreg): Handle extensions and truncations of CONST_POLY_INTs. (simplify_const_poly_int_tests): New struct. (simplify_rtx_c_tests): Use it. * wide-int.h (storage_ref): Add default constructor. (wide_int_ref_storage): Likewise. (trailing_wide_ints): Use GTY((user)). (trailing_wide_ints::operator[]): Add a const version. (trailing_wide_ints::get_precision): New function. (trailing_wide_ints::extra_size): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255862
2017-12-20Add a gen_int_shift_amount helper functionRichard Sandiford1-11/+18
This patch adds a helper routine that constructs rtxes for constant shift amounts, given the mode of the value being shifted. As well as helping with the SVE patches, this is one step towards allowing CONST_INTs to have a real mode. One long-standing problem has been to decide what the mode of a shift count should be for arbitrary rtxes (as opposed to those directly tied to a target pattern). Realistic choices would be the mode of the shifted elements, word_mode, QImode, a 64-bit mode, or the same mode as the shift optabs (in which case what should the mode be when the target doesn't have a pattern?) For now the patch picks a 64-bit mode, but with a ??? comment. 2017-12-20 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * emit-rtl.h (gen_int_shift_amount): Declare. * emit-rtl.c (gen_int_shift_amount): New function. * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount instead of GEN_INT. * calls.c (shift_return_value): Likewise. * cse.c (fold_rtx): Likewise. * dse.c (find_shift_sequence): Likewise. * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1) (expand_shift, expand_smod_pow2): Likewise. * lower-subreg.c (shift_cost): Likewise. * optabs.c (expand_superword_shift, expand_doubleword_mult) (expand_unop, expand_binop, shift_amt_for_vec_perm_mask) (expand_vec_perm_var): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. (simplify_binary_operation_1): Likewise. * combine.c (try_combine, find_split_point, force_int_to_mode) (simplify_shift_const_1, simplify_shift_const): Likewise. (change_zero_ext): Likewise. Use simplify_gen_binary. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255861
2017-12-19read-rtl.c (parse_reg_note_name): Replace Yoda conditions with typical order ↵Jakub Jelinek1-5/+5
conditions. * read-rtl.c (parse_reg_note_name): Replace Yoda conditions with typical order conditions. * sel-sched.c (extract_new_fences_from): Likewise. * config/visium/constraints.md (J, K, L): Likewise. * config/visium/predicates.md (const_shift_operand): Likewise. * config/visium/visium.c (visium_legitimize_address, visium_legitimize_reload_address): Likewise. * config/m68k/m68k.c (output_reg_adjust, emit_reg_adjust): Likewise. * config/arm/arm.c (arm_block_move_unaligned_straight): Likewise. * config/avr/constraints.md (Y01, Ym1, Y02, Ym2): Likewise. * config/avr/avr-log.c (avr_vdump, avr_log_set_avr_log, SET_DUMP_DETAIL): Likewise. * config/avr/predicates.md (const_8_16_24_operand): Likewise. * config/avr/avr.c (STR_PREFIX_P, avr_popcount_each_byte, avr_is_casesi_sequence, avr_casei_sequence_check_operands, avr_set_core_architecture, avr_set_current_function, avr_legitimize_reload_address, avr_asm_len, avr_print_operand, output_movqi, output_movsisf, avr_out_plus, avr_out_bitop, avr_out_fract, avr_adjust_insn_length, avr_encode_section_info, avr_2word_insn_p, output_reload_in_const, avr_has_nibble_0xf, avr_map_decompose, avr_fold_builtin): Likewise. * config/avr/driver-avr.c (avr_devicespecs_file): Likewise. * config/avr/gen-avr-mmcu-specs.c (str_prefix_p, print_mcu): Likewise. * config/i386/i386.c (ix86_parse_stringop_strategy_string): Likewise. * config/m32c/m32c-pragma.c (m32c_pragma_memregs): Likewise. * config/m32c/m32c.c (m32c_conditional_register_usage, m32c_address_cost): Likewise. * config/m32c/predicates.md (shiftcount_operand, longshiftcount_operand): Likewise. * config/iq2000/iq2000.c (iq2000_expand_prologue): Likewise. * config/nios2/nios2.c (nios2_handle_custom_fpu_insn_option, can_use_cdx_ldstw): Likewise. * config/nios2/nios2.h (CDX_REG_P): Likewise. * config/cr16/cr16.h (RETURN_ADDR_RTX, REGNO_MODE_OK_FOR_BASE_P): Likewise. * config/cr16/cr16.md (*mov<mode>_double): Likewise. * config/cr16/cr16.c (cr16_create_dwarf_for_multi_push): Likewise. * config/h8300/h8300.c (h8300_rtx_costs, get_shift_alg): Likewise. * config/vax/constraints.md (U06, U08, U16, CN6, S08, S16): Likewise. * config/vax/vax.c (adjacent_operands_p): Likewise. * config/ft32/constraints.md (L, b, KA): Likewise. * config/ft32/ft32.c (ft32_load_immediate, ft32_expand_prologue): Likewise. * cfgexpand.c (expand_stack_alignment): Likewise. * gcse.c (insert_expr_in_table): Likewise. * print-rtl.c (rtx_writer::print_rtx_operand_codes_E_and_V): Likewise. * cgraphunit.c (cgraph_node::expand): Likewise. * ira-build.c (setup_min_max_allocno_live_range_point): Likewise. * emit-rtl.c (add_insn): Likewise. * input.c (dump_location_info): Likewise. * passes.c (NEXT_PASS): Likewise. * read-rtl-function.c (parse_note_insn_name, function_reader::read_rtx_operand_r, function_reader::parse_mem_expr): Likewise. * sched-rgn.c (sched_rgn_init): Likewise. * diagnostic-show-locus.c (layout::show_ruler): Likewise. * combine.c (find_split_point, simplify_if_then_else, force_to_mode, if_then_else_cond, simplify_shift_const_1, simplify_comparison): Likewise. * explow.c (eliminate_constant_term): Likewise. * final.c (leaf_renumber_regs_insn): Likewise. * cfgrtl.c (print_rtl_with_bb): Likewise. * genhooks.c (emit_init_macros): Likewise. * poly-int.h (maybe_ne, maybe_le, maybe_lt): Likewise. * tree-data-ref.c (conflict_fn): Likewise. * selftest.c (assert_streq): Likewise. * expr.c (store_constructor_field, expand_expr_real_1): Likewise. * fold-const.c (fold_range_test, extract_muldiv_1, fold_truth_andor, fold_binary_loc, multiple_of_p): Likewise. * reload.c (push_reload, find_equiv_reg): Likewise. * et-forest.c (et_nca, et_below): Likewise. * dbxout.c (dbxout_symbol_location): Likewise. * reorg.c (relax_delay_slots): Likewise. * dojump.c (do_compare_rtx_and_jump): Likewise. * gengtype-parse.c (type): Likewise. * simplify-rtx.c (simplify_gen_ternary, simplify_gen_relational, simplify_const_relational_operation): Likewise. * reload1.c (do_output_reload): Likewise. * dumpfile.c (get_dump_file_info_by_switch): Likewise. * gengtype.c (type_for_name): Likewise. * gimple-ssa-sprintf.c (format_directive): Likewise. ada/ * gcc-interface/trans.c (Loop_Statement_to_gnu): Replace Yoda conditions with typical order conditions. * gcc-interface/misc.c (gnat_get_array_descr_info, default_pass_by_ref): Likewise. * gcc-interface/decl.c (gnat_to_gnu_entity): Likewise. * adaint.c (__gnat_tmp_name): Likewise. c-family/ * known-headers.cc (get_stdlib_header_for_name): Replace Yoda conditions with typical order conditions. c/ * c-typeck.c (comptypes_internal, function_types_compatible_p, perform_integral_promotions, digest_init): Replace Yoda conditions with typical order conditions. * c-decl.c (check_bitfield_type_and_width): Likewise. cp/ * name-lookup.c (get_std_name_hint): Replace Yoda conditions with typical order conditions. * class.c (check_bitfield_decl): Likewise. * pt.c (convert_template_argument): Likewise. * decl.c (duplicate_decls): Likewise. * typeck.c (commonparms): Likewise. fortran/ * scanner.c (preprocessor_line): Replace Yoda conditions with typical order conditions. * dependency.c (check_section_vs_section): Likewise. * trans-array.c (gfc_conv_expr_descriptor): Likewise. jit/ * jit-playback.c (get_type, playback::compile_to_file::copy_file, playback::context::acquire_mutex): Replace Yoda conditions with typical order conditions. * libgccjit.c (gcc_jit_context_new_struct_type, gcc_jit_struct_set_fields, gcc_jit_context_new_union_type, gcc_jit_context_new_function, gcc_jit_timer_pop): Likewise. * jit-builtins.c (matches_builtin): Likewise. * jit-recording.c (recording::compound_type::set_fields, recording::fields::write_reproducer, recording::rvalue::set_scope, recording::function::validate): Likewise. * jit-logging.c (logger::decref): Likewise. From-SVN: r255831
2017-12-16Revert accidental commitRichard Sandiford1-18/+11
From-SVN: r255746
2017-12-16Add a gen_int_shift_amount helper functionRichard Sandiford1-11/+18
This patch adds a helper routine that constructs rtxes for constant shift amounts, given the mode of the value being shifted. As well as helping with the SVE patches, this is one step towards allowing CONST_INTs to have a real mode. One long-standing problem has been to decide what the mode of a shift count should be for arbitrary rtxes (as opposed to those directly tied to a target pattern). Realistic choices would be the mode of the shifted elements, word_mode, QImode, or the same mode as the shift optabs (in which case what should the mode be when the target doesn't have a pattern?) For now the patch picks the mode of the shifted elements, but with a ??? comment. 2017-11-06 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * emit-rtl.h (gen_int_shift_amount): Declare. * emit-rtl.c (gen_int_shift_amount): New function. * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount instead of GEN_INT. * calls.c (shift_return_value): Likewise. * cse.c (fold_rtx): Likewise. * dse.c (find_shift_sequence): Likewise. * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1) (expand_shift, expand_smod_pow2): Likewise. * lower-subreg.c (shift_cost): Likewise. * optabs.c (expand_superword_shift, expand_doubleword_mult) (expand_unop, expand_binop, shift_amt_for_vec_perm_mask) (expand_vec_perm_var): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. (simplify_binary_operation_1): Likewise. * combine.c (try_combine, find_split_point, force_int_to_mode) (simplify_shift_const_1, simplify_shift_const): Likewise. (change_zero_ext): Likewise. Use simplify_gen_binary. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r255745
2017-11-22simplify-rtx.c (simplify_binary_operation_1): Handle the case where both ↵Jakub Jelinek1-0/+5
arguments are using gen_const_vec_series. * simplify-rtx.c (simplify_binary_operation_1) <case VEC_SERIES>: Handle the case where both arguments are using gen_const_vec_series. From-SVN: r255079
2017-11-20Fix comparison mode in simplify_ternary_operationTom de Vries1-2/+0
2017-11-20 Tom de Vries <tom@codesourcery.com> PR rtl-optimization/82020 * simplify-rtx.c (simplify_ternary_operation): Fix comparison mode of IF_THEN_ELSE condition. From-SVN: r254944
2017-11-08[simplify-rtx] Simplify vec_merge of vec_duplicates into vec_concatKyrylo Tkachov1-0/+18
Another vec_merge simplification that's missing from simplify-rtx.c is transforming a vec_merge of two vec_duplicates. For example: (set (reg:V2DF 80) (vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 84)) (vec_duplicate:V2DF (reg:DF 81)) (const_int 2))) Can be transformed into the simpler: (set (reg:V2DF 80) (vec_concat:V2DF (reg:DF 81) (reg:DF 84))) I believe this should always be beneficial. I'm still looking into finding a small testcase demonstrating this, but on aarch64 SPEC I've seen this eliminate some really bizzare codegen where GCC was generating nonsense like: ldr q18, [sp, 448] ins v18.d[0], v23.d[0] ins v18.d[1], v22.d[0] With q18 being pushed and popped off the stack in the prologue and epilogue of the function! These are large files from SPEC that I haven't been able to analyse yet as to why GCC even attempts to do that, but with this patch it doesn't try to load a register and overwrite all its lanes. This patch shaves off about 5k of code size from zeusmp on aarch64 at -O3, so I believe it's a good thing to do. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge of two vec_duplicates into a vec_concat. From-SVN: r254550
2017-11-08vec_merge + vec_duplicate + vec_concat simplificationKyrylo Tkachov1-0/+19
Another vec_merge simplification that's missing is transforming: (vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N)) into (vec_concat x z) if N == 1 (0b01) or (vec_concat y x) if N == 2 (0b10) For the testcase in this patch on aarch64 this allows us to try matching during combine the pattern: (set (reg:V2DI 78 [ x ]) (vec_concat:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]) (mem:DI (plus:DI (reg/v/f:DI 76 [ y ]) (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))) rather than the more complex: (set (reg:V2DI 78 [ x ]) (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ]) (const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])) (vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])) (const_int 2 [0x2]))) We don't actually have an aarch64 pattern for the simplified version above, but it's a simple enough form to add, so this patch adds such a pattern that performs a concatenated load of two 64-bit vectors in adjacent memory locations as a single Q-register LDR. The new aarch64 pattern is needed to demonstrate the effectiveness of the simplify-rtx change, so I've kept them together as one patch. Now for the testcase in the patch we can generate: construct_lanedi: ldr q0, [x0] ret construct_lanedf: ldr q0, [x0] ret instead of: construct_lanedi: ld1r {v0.2d}, [x0] ldr x0, [x0, 8] ins v0.d[1], x0 ret construct_lanedf: ld1r {v0.2d}, [x0] ldr d1, [x0, 8] ins v0.d[1], v1.d[0] ret The new memory constraint Utq is needed because we need to allow only the Q-register addressing modes but the MEM expressions in the RTL pattern have 64-bit vector modes, and if we don't constrain them they will allow the D-register addressing modes during register allocation/address mode selection, which will produce invalid assembly. Bootstrapped and tested on aarch64-none-linux-gnu. * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE): Simplify vec_merge of vec_duplicate and vec_concat. * config/aarch64/constraints.md (Utq): New constraint. * config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New define_insn. * gcc.target/aarch64/load_v2vec_lanes_1.c: New test. From-SVN: r254549
2017-11-08Simplify vec_merge of vec_duplicate with const_vectorKyrylo Tkachov1-0/+16
I'm trying to improve some of the RTL-level handling of vector lane operations on aarch64 and that involves dealing with a lot of vec_merge operations. One simplification that I noticed missing from simplify-rtx are combinations of vec_merge with vec_duplicate. In this particular case: (vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N)) which can be replaced with (vec_concat (X) (B)) if N == 1 (0b01) or (vec_concat (A) (X)) if N == 2 (0b10). For the aarch64 testcase in this patch this simplifications allows us to try to combine: (set (reg:V2DI 77 [ x ]) (vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]) (const_int 0 [0]))) instead of the more complex: (set (reg:V2DI 77 [ x ]) (vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])) (const_vector:V2DI [ (const_int 0 [0]) (const_int 0 [0]) ]) (const_int 1 [0x1]))) For the simplified form above we already have an aarch64 pattern: *aarch64_combinez<mode> which is missing a DI/DFmode version due to an oversight, so this patch extends that pattern as well to use the VDC mode iterator that includes DI and DFmode (as well as V2HF which VD_BHSI was missing). The aarch64 hunk is needed to see the benefit of the simplify-rtx.c hunk, so I didn't split them into separate patches. Before this for the testcase we'd generate: construct_lanedi: movi v0.4s, 0 ldr x0, [x0] ins v0.d[0], x0 ret construct_lanedf: movi v0.2d, 0 ldr d1, [x0] ins v0.d[0], v1.d[0] ret but now we can generate: construct_lanedi: ldr d0, [x0] ret construct_lanedf: ldr d0, [x0] ret Bootstrapped and tested on aarch64-none-linux-gnu. * simplify-rtx.c (simplify_ternary_operation, VEC_MERGE): Simplify vec_merge of vec_duplicate and const_vector. * config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero): New predicate. * config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC mode iterator. Update predicate on operand 1 to handle non-const_vec constants. Delete constraints. (*aarch64_combinez_be<mode>): Likewise for operand 2. * gcc.target/aarch64/construct_lane_zero_1.c: New test. From-SVN: r254548
2017-11-01Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.cRichard Sandiford1-46/+15
This patch avoids some calculations of the form: GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode) in simplify-rtx.c. If we're dealing with CONST_VECTORs, it's better to use CONST_VECTOR_NUNITS, since that remains constant even after the SVE patches. In other cases we can get the number from GET_MODE_NUNITS. 2017-11-01 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS and CONST_VECTOR_NUNITS instead of computing the number of units from the byte sizes of the vector and element. (simplify_binary_operation_1): Likewise. (simplify_const_binary_operation): Likewise. (simplify_ternary_operation): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r254311
2017-11-01Add a fixed_size_mode classRichard Sandiford1-5/+14
This patch adds a fixed_size_mode machine_mode wrapper for modes that are known to have a fixed size. That applies to all current modes, but future patches will add support for variable-sized modes. The use of this class should be pretty restricted. One important use case is to hold the mode of static data, which can never be variable-sized with current file formats. Another is to hold the modes of registers involved in __builtin_apply and __builtin_result, since those interfaces don't cope well with variable-sized data. The class can also be useful when reinterpreting the contents of a fixed-length bit string as a different kind of value. 2017-11-01 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * machmode.h (fixed_size_mode): New class. * rtl.h (get_pool_mode): Return fixed_size_mode. * gengtype.c (main): Add fixed_size_mode. * target.def (get_raw_result_mode): Return a fixed_size_mode. (get_raw_arg_mode): Likewise. * doc/tm.texi: Regenerate. * targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode. * targhooks.c (default_get_reg_raw_mode): Likewise. * config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise. * config/mips/mips.c (mips_get_reg_raw_mode): Likewise. * config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise. (msp430_get_raw_result_mode): Likewise. * config/avr/avr-protos.h (regmask): Use as_a <fixed_side_mode> * dbxout.c (dbxout_parms): Require fixed-size modes. * expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise. * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise. * omp-low.c (lower_oacc_reductions): Likewise. * simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes. (simplify_subreg): Update accordingly. * varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode. (force_const_mem): Update accordingly. Return NULL_RTX for modes that aren't fixed-size. (get_pool_mode): Return a fixed_size_mode. (output_constant_pool_2): Take a fixed_size_mode. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r254300
2017-11-01Add a VEC_SERIES rtl codeRichard Sandiford1-1/+135
This patch adds an rtl representation of a vector linear series of the form: a[I] = BASE + I * STEP Like vec_duplicate; - the new rtx can be used for both constant and non-constant vectors - when used for constant vectors it is wrapped in a (const ...) - the constant form is only used for variable-length vectors; fixed-length vectors still use CONST_VECTOR At the moment the code is restricted to integer elements, to avoid concerns over floating-point rounding. 2017-11-01 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * doc/rtl.texi (vec_series): Document. (const): Say that the operand can be a vec_series. * rtl.def (VEC_SERIES): New rtx code. * rtl.h (const_vec_series_p_1): Declare. (const_vec_series_p): New function. * emit-rtl.h (gen_const_vec_series): Declare. (gen_vec_series): Likewise. * emit-rtl.c (const_vec_series_p_1, gen_const_vec_series) (gen_vec_series): Likewise. * optabs.c (expand_mult_highpart): Use gen_const_vec_series. * simplify-rtx.c (simplify_unary_operation): Handle negations of vector series. (simplify_binary_operation_series): New function. (simplify_binary_operation_1): Use it. Handle VEC_SERIES. (test_vector_ops_series): New function. (test_vector_ops): Call it. * config/powerpcspe/altivec.md (altivec_lvsl): Use gen_const_vec_series. (altivec_lvsr): Likewise. * config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r254297
2017-11-01Add more vec_duplicate simplificationsRichard Sandiford1-5/+196
This patch adds a vec_duplicate_p helper that tests for constant or non-constant vector duplicates. Together with the existing const_vec_duplicate_p, this complements the gen_vec_duplicate and gen_const_vec_duplicate added by a previous patch. The patch uses the new routines to add more rtx simplifications involving vector duplicates. These mirror simplifications that we already do for CONST_VECTOR broadcasts and are needed for variable-length SVE, which uses: (const:M (vec_duplicate:M X)) to represent constant broadcasts instead. The simplifications do trigger on the testsuite for variable duplicates too, and in each case I saw the change was an improvement. The best way of testing the new simplifications seemed to be via selftests. The patch cribs part of David's patch here: https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html . 2017-11-01 Richard Sandiford <richard.sandiford@linaro.org> David Malcolm <dmalcolm@redhat.com> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * rtl.h (vec_duplicate_p): New function. * selftest-rtl.c (assert_rtx_eq_at): New function. * selftest-rtl.h (ASSERT_RTX_EQ): New macro. (assert_rtx_eq_at): Declare. * selftest.h (selftest::simplify_rtx_c_tests): Declare. * selftest-run-tests.c (selftest::run_tests): Call it. * simplify-rtx.c: Include selftest.h and selftest-rtl.h. (simplify_unary_operation_1): Recursively handle vector duplicates. (simplify_binary_operation_1): Likewise. Handle VEC_SELECTs of vector duplicates. (simplify_subreg): Handle subregs of vector duplicates. (make_test_reg, test_vector_ops_duplicate, test_vector_ops) (selftest::simplify_rtx_c_tests): New functions. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Malcolm <dmalcolm@redhat.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r254294
2017-11-01Add gen_(const_)vec_duplicate helpersRichard Sandiford1-23/+11
This patch adds helper functions for generating constant and non-constant vector duplicates. These routines help with SVE because it is then easier to use: (const:M (vec_duplicate:M X)) for a broadcast of X, even if the number of elements in M isn't known at compile time. It also makes it easier for general rtx code to treat constant and non-constant duplicates in the same way. In the target code, the patch uses gen_vec_duplicate instead of gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially useful. It might be that some or all of the call sites only handle non-constants in practice, in which case the change is a harmless no-op (and a saving of a few characters). Otherwise, the target changes use gen_const_vec_duplicate instead of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate. They also include some changes to use CONSTxx_RTX for easy global constants. 2017-11-01 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * emit-rtl.h (gen_const_vec_duplicate): Declare. (gen_vec_duplicate): Likewise. * emit-rtl.c (gen_const_vec_duplicate_1): New function, split out from... (gen_const_vector): ...here. (gen_const_vec_duplicate, gen_vec_duplicate): New functions. (gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants whose elements are all equal. * optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate. * simplify-rtx.c (simplify_const_unary_operation): Likewise. (simplify_relational_operation): Likewise. * config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup): Likewise. (aarch64_simd_dup_constant): Use gen_vec_duplicate. (aarch64_expand_vector_init): Likewise. * config/arm/arm.c (neon_vdup_constant): Likewise. (neon_expand_vector_init): Likewise. (arm_expand_vec_perm): Use gen_const_vec_duplicate. (arm_block_set_unaligned_vect): Likewise. (arm_block_set_aligned_vect): Likewise. * config/arm/neon.md (neon_copysignf<mode>): Likewise. * config/i386/i386.c (ix86_expand_vec_perm): Likewise. (expand_vec_perm_even_odd_pack): Likewise. (ix86_vector_duplicate_value): Use gen_vec_duplicate. * config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX. * config/ia64/ia64.c (ia64_expand_vecint_compare): Use gen_const_vec_duplicate. * config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX. * config/mips/mips.c (mips_gen_const_int_vector): Use gen_const_vec_duplicate. (mips_expand_vector_init): Use CONST0_RTX. * config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise. (define_split): Use gen_const_vec_duplicate. * config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX. (define_split): Use gen_const_vec_duplicate. * config/s390/vx-builtins.md (vec_genmask<mode>): Likewise. (vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise. * config/spu/spu.c (spu_const): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r254292
2017-10-22Make more use of GET_MODE_UNIT_PRECISIONRichard Sandiford1-22/+26
This patch is like the earlier GET_MODE_UNIT_SIZE one, but for precisions rather than sizes. There is one behavioural change in expand_debug_expr: we shouldn't use lowpart subregs for non-scalar truncations, since that would just reinterpret some of the scalars and drop the rest. (This probably doesn't trigger in practice.) Using TRUNCATE is fine for scalars, since simplify_gen_unary knows when a subreg can be used. 2017-10-22 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * cfgexpand.c (expand_debug_expr): Use GET_MODE_UNIT_PRECISION. (expand_debug_source_expr): Likewise. * combine.c (combine_simplify_rtx): Likewise. * cse.c (fold_rtx): Likewise. * optabs.c (expand_float): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. (simplify_binary_operation_1): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r253991
2017-10-22Make more use of HWI_COMPUTABLE_MODE_PRichard Sandiford1-4/+5
This patch uses HWI_COMPUTABLE_MODE_P (X) instead of GET_MODE_PRECISION (X) <= HOST_BITS_PER_WIDE_INT in cases where X also needs to be a scalar integer. 2017-10-22 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * combine.c (simplify_comparison): Use HWI_COMPUTABLE_MODE_P. (record_promoted_value): Likewise. * expr.c (expand_expr_real_2): Likewise. * ree.c (update_reg_equal_equiv_notes): Likewise. (combine_set_extension): Likewise. * rtlanal.c (low_bitmask_len): Likewise. * simplify-rtx.c (neg_const_int): Likewise. (simplify_binary_operation_1): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r253990
2017-10-13Make more use of GET_MODE_UNIT_SIZERichard Sandiford1-4/+3
This patch uses GET_MODE_UNIT_SIZE instead of GET_MODE_SIZE in cases where, for compound modes, the mode of the scalar elements is what matters. E.g. the choice between truncation and extension is really based on the modes of the consistuent scalars rather than the mode as a whole. None of the existing code was wrong. The patch simply makes things easier when converting to variable-sized modes. 2017-10-13 Richard Sandiford <richard.sandiford@linaro.org> Alan Hayward <alan.hayward@arm.com> David Sherwood <david.sherwood@arm.com> gcc/ * optabs.c (add_equal_note): Use GET_MODE_UNIT_SIZE. (widened_mode): Likewise. (expand_unop): Likewise. * ree.c (transform_ifelse): Likewise. (merge_def_and_ext): Likewise. (combine_reaching_defs): Likewise. * simplify-rtx.c (simplify_unary_operation_1): Likewise. Co-Authored-By: Alan Hayward <alan.hayward@arm.com> Co-Authored-By: David Sherwood <david.sherwood@arm.com> From-SVN: r253715
2017-10-03simplify-rtx: Remove non-simplifying simplification (PR77729)Segher Boessenkool1-25/+0
If we have (X&C1)|C2 simplify_binary_operation_1 makes C1 as small as possible. This makes worse code in common cases like when the AND with C1 is from a zero-extension. This patch fixes it by removing this transformation (twice). PR rtl-optimization/77729 * simplify-rtx.c (simplify_binary_operation_1): Delete the (X&C1)|C2 to (X&(C1&~C2))|C2 transformations. From-SVN: r253384