Age | Commit message (Collapse) | Author | Files | Lines |
|
PR rtl-optimization/89445
* simplify-rtx.c (simplify_ternary_operation): Don't use
simplify_merge_mask on operands that may trap.
* rtlanal.c (may_trap_p_1): Use FLOAT_MODE_P instead of
SCALAR_FLOAT_MODE_P checks. For integral division by zero, if
second operand is CONST_VECTOR, check if any element could be zero.
Don't expect traps for VEC_{MERGE,SELECT,CONCAT,DUPLICATE} unless
their operands can trap.
* gcc.target/i386/avx512f-pr89445.c: New test.
From-SVN: r269176
|
|
2019-01-09 Sandra Loosemore <sandra@codesourcery.com>
PR other/16615 [1/5]
contrib/
* mklog: Mechanically replace "can not" with "cannot".
gcc/
* Makefile.in: Mechanically replace "can not" with "cannot".
* alias.c: Likewise.
* builtins.c: Likewise.
* calls.c: Likewise.
* cgraph.c: Likewise.
* cgraph.h: Likewise.
* cgraphclones.c: Likewise.
* cgraphunit.c: Likewise.
* combine-stack-adj.c: Likewise.
* combine.c: Likewise.
* common/config/i386/i386-common.c: Likewise.
* config/aarch64/aarch64.c: Likewise.
* config/alpha/sync.md: Likewise.
* config/arc/arc.c: Likewise.
* config/arc/predicates.md: Likewise.
* config/arm/arm-c.c: Likewise.
* config/arm/arm.c: Likewise.
* config/arm/arm.h: Likewise.
* config/arm/arm.md: Likewise.
* config/arm/cortex-r4f.md: Likewise.
* config/csky/csky.c: Likewise.
* config/csky/csky.h: Likewise.
* config/darwin-f.c: Likewise.
* config/epiphany/epiphany.md: Likewise.
* config/i386/i386.c: Likewise.
* config/i386/sol2.h: Likewise.
* config/m68k/m68k.c: Likewise.
* config/mcore/mcore.h: Likewise.
* config/microblaze/microblaze.md: Likewise.
* config/mips/20kc.md: Likewise.
* config/mips/sb1.md: Likewise.
* config/nds32/nds32.c: Likewise.
* config/nds32/predicates.md: Likewise.
* config/pa/pa.c: Likewise.
* config/rs6000/e300c2c3.md: Likewise.
* config/rs6000/rs6000.c: Likewise.
* config/s390/s390.h: Likewise.
* config/sh/sh.c: Likewise.
* config/sh/sh.md: Likewise.
* config/spu/vmx2spu.h: Likewise.
* cprop.c: Likewise.
* dbxout.c: Likewise.
* df-scan.c: Likewise.
* doc/cfg.texi: Likewise.
* doc/extend.texi: Likewise.
* doc/fragments.texi: Likewise.
* doc/gty.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/lto.texi: Likewise.
* doc/md.texi: Likewise.
* doc/objc.texi: Likewise.
* doc/rtl.texi: Likewise.
* doc/tm.texi: Likewise.
* dse.c: Likewise.
* emit-rtl.c: Likewise.
* emit-rtl.h: Likewise.
* except.c: Likewise.
* expmed.c: Likewise.
* expr.c: Likewise.
* fold-const.c: Likewise.
* genautomata.c: Likewise.
* gimple-fold.c: Likewise.
* hard-reg-set.h: Likewise.
* ifcvt.c: Likewise.
* ipa-comdats.c: Likewise.
* ipa-cp.c: Likewise.
* ipa-devirt.c: Likewise.
* ipa-fnsummary.c: Likewise.
* ipa-icf.c: Likewise.
* ipa-inline-transform.c: Likewise.
* ipa-inline.c: Likewise.
* ipa-polymorphic-call.c: Likewise.
* ipa-profile.c: Likewise.
* ipa-prop.c: Likewise.
* ipa-pure-const.c: Likewise.
* ipa-reference.c: Likewise.
* ipa-split.c: Likewise.
* ipa-visibility.c: Likewise.
* ipa.c: Likewise.
* ira-build.c: Likewise.
* ira-color.c: Likewise.
* ira-conflicts.c: Likewise.
* ira-costs.c: Likewise.
* ira-int.h: Likewise.
* ira-lives.c: Likewise.
* ira.c: Likewise.
* ira.h: Likewise.
* loop-invariant.c: Likewise.
* loop-unroll.c: Likewise.
* lower-subreg.c: Likewise.
* lra-assigns.c: Likewise.
* lra-constraints.c: Likewise.
* lra-eliminations.c: Likewise.
* lra-lives.c: Likewise.
* lra-remat.c: Likewise.
* lra-spills.c: Likewise.
* lra.c: Likewise.
* lto-cgraph.c: Likewise.
* lto-streamer-out.c: Likewise.
* postreload-gcse.c: Likewise.
* predict.c: Likewise.
* profile-count.h: Likewise.
* profile.c: Likewise.
* recog.c: Likewise.
* ree.c: Likewise.
* reload.c: Likewise.
* reload1.c: Likewise.
* reorg.c: Likewise.
* resource.c: Likewise.
* rtl.def: Likewise.
* rtl.h: Likewise.
* rtlanal.c: Likewise.
* sched-deps.c: Likewise.
* sched-ebb.c: Likewise.
* sched-rgn.c: Likewise.
* sel-sched-ir.c: Likewise.
* sel-sched.c: Likewise.
* shrink-wrap.c: Likewise.
* simplify-rtx.c: Likewise.
* symtab.c: Likewise.
* target.def: Likewise.
* toplev.c: Likewise.
* tree-call-cdce.c: Likewise.
* tree-cfg.c: Likewise.
* tree-complex.c: Likewise.
* tree-core.h: Likewise.
* tree-eh.c: Likewise.
* tree-inline.c: Likewise.
* tree-loop-distribution.c: Likewise.
* tree-nrv.c: Likewise.
* tree-profile.c: Likewise.
* tree-sra.c: Likewise.
* tree-ssa-alias.c: Likewise.
* tree-ssa-dce.c: Likewise.
* tree-ssa-dom.c: Likewise.
* tree-ssa-forwprop.c: Likewise.
* tree-ssa-loop-im.c: Likewise.
* tree-ssa-loop-ivcanon.c: Likewise.
* tree-ssa-loop-ivopts.c: Likewise.
* tree-ssa-loop-niter.c: Likewise.
* tree-ssa-phionlycprop.c: Likewise.
* tree-ssa-phiopt.c: Likewise.
* tree-ssa-propagate.c: Likewise.
* tree-ssa-threadedge.c: Likewise.
* tree-ssa-threadupdate.c: Likewise.
* tree-ssa-uninit.c: Likewise.
* tree-ssanames.c: Likewise.
* tree-streamer-out.c: Likewise.
* tree.c: Likewise.
* tree.h: Likewise.
* vr-values.c: Likewise.
gcc/ada/
* exp_ch9.adb: Mechanically replace "can not" with "cannot".
* libgnat/s-regpat.ads: Likewise.
* par-ch4.adb: Likewise.
* set_targ.adb: Likewise.
* types.ads: Likewise.
gcc/cp/
* cp-tree.h: Mechanically replace "can not" with "cannot".
* parser.c: Likewise.
* pt.c: Likewise.
gcc/fortran/
* class.c: Mechanically replace "can not" with "cannot".
* decl.c: Likewise.
* expr.c: Likewise.
* gfc-internals.texi: Likewise.
* intrinsic.texi: Likewise.
* invoke.texi: Likewise.
* io.c: Likewise.
* match.c: Likewise.
* parse.c: Likewise.
* primary.c: Likewise.
* resolve.c: Likewise.
* symbol.c: Likewise.
* trans-array.c: Likewise.
* trans-decl.c: Likewise.
* trans-intrinsic.c: Likewise.
* trans-stmt.c: Likewise.
gcc/go/
* go-backend.c: Mechanically replace "can not" with "cannot".
* go-gcc.cc: Likewise.
gcc/lto/
* lto-partition.c: Mechanically replace "can not" with "cannot".
* lto-symtab.c: Likewise.
* lto.c: Likewise.
gcc/objc/
* objc-act.c: Mechanically replace "can not" with "cannot".
libbacktrace/
* backtrace.h: Mechanically replace "can not" with "cannot".
libgcc/
* config/c6x/libunwind.S: Mechanically replace "can not" with
"cannot".
* config/tilepro/atomic.h: Likewise.
* config/vxlib-tls.c: Likewise.
* generic-morestack-thread.c: Likewise.
* generic-morestack.c: Likewise.
* mkmap-symver.awk: Likewise.
libgfortran/
* caf/single.c: Mechanically replace "can not" with "cannot".
* io/unit.c: Likewise.
libobjc/
* class.c: Mechanically replace "can not" with "cannot".
* objc/runtime.h: Likewise.
* sendmsg.c: Likewise.
liboffloadmic/
* include/coi/common/COIResult_common.h: Mechanically replace
"can not" with "cannot".
* include/coi/source/COIBuffer_source.h: Likewise.
libstdc++-v3/
* include/ext/bitmap_allocator.h: Mechanically replace "can not"
with "cannot".
From-SVN: r267783
|
|
From-SVN: r267494
|
|
simplify-rtx.c:2153 since r264688)
PR rtl-optimization/87918
* simplify-rtx.c (simplify_merge_mask): For COMPARISON_P, use
simplify_gen_relational rather than simplify_gen_binary.
* gcc.target/i386/pr87918.c: New test.
From-SVN: r266062
|
|
2018-11-06 Richard Biener <rguenther@suse.de>
PR middle-end/18041
* simplify-rtx.c (simplify_binary_operation_1): Add pattern
matching bitfield insertion.
* gcc.target/i386/pr18041-1.c: New testcase.
* gcc.target/i386/pr18041-2.c: Likewise.
From-SVN: r265829
|
|
Since mask of vec_merge is in HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is
the maximum number of vector elements.
* simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to
HOST_BITS_PER_WIDE_INT.
(test_vector_ops_duplicate): Likewise.
From-SVN: r265290
|
|
Simplify
(subreg (vec_merge (X)
(vector)
(const_int ((1 << N) | M)))
(N * sizeof (outermode)))
to
(subreg (X) (N * sizeof (outermode)))
* simplify-rtx.c (simplify_subreg): Call simplify_gen_subreg
to simplify subreg of vec_merge.
From-SVN: r265267
|
|
We can simplify
(subreg (vec_merge (vec_duplicate X)
(vector)
(const_int ((1 << N) | M)))
(N * sizeof (X)))
to X when mode of X is the same as of mode of subreg.
gcc/
PR target/87537
* simplify-rtx.c (simplify_subreg): Simplify subreg of vec_merge
of vec_duplicate.
(test_vector_ops_duplicate): Add test for a scalar subreg of a
VEC_MERGE of a VEC_DUPLICATE.
gcc/testsuite/
PR target/87537
* gcc.target/i386/pr87537-1.c: New test.
From-SVN: r265260
|
|
This patch was part of the original patch we acquired from Honza and Martin.
It simplifies nested vec_merge operations using the same mask.
Self-tests are included.
2018-09-28 Andrew Stubbs <ams@codesourcery.com>
Jan Hubicka <jh@suse.cz>
Martin Jambor <mjambor@suse.cz>
* simplify-rtx.c (simplify_merge_mask): New function.
(simplify_ternary_operation): Use it, also see if VEC_MERGEs with the
same masks are used in op1 or op2.
(test_vec_merge): New function.
(test_vector_ops): Call test_vec_merge.
Co-Authored-By: Jan Hubicka <jh@suse.cz>
Co-Authored-By: Martin Jambor <mjambor@suse.cz>
From-SVN: r264688
|
|
The vec_select operator is documented to require a const_int for the lane
selector operand, but GCN has an instruction that can select the lane at
runtime, so it seems reasonable to remove this restriction.
This patch simply replaces assertions that the operand is constant with early
exits from the optimizers. I think it's reasonable that vec_select with a
non-constant operand cannot be optimized, yet.
Also included is the necessary documentation tweak.
2018-09-19 Andrew Stubbs <ams@codesourcery.com>
gcc/
* doc/rtl.texi: Adjust vec_select description.
* simplify-rtx.c (simplify_binary_operation_1): Allow VEC_SELECT to use
non-constant selectors.
From-SVN: r264423
|
|
* tree-vrp.c (vrp_int_const_binop): Change overflow type to
overflow_type.
(combine_bound): Use wide-int overflow calculation instead of
rolling our own.
* calls.c (maybe_warn_alloc_args_overflow): Change overflow type to
overflow_type.
* fold-const.c (int_const_binop_2): Same.
(extract_muldiv_1): Same.
(fold_div_compare): Same.
(fold_abs_const): Same.
* match.pd: Same.
* poly-int.h (add): Same.
(sub): Same.
(neg): Same.
(mul): Same.
* predict.c (predict_iv_comparison): Same.
* profile-count.c (slow_safe_scale_64bit): Same.
* simplify-rtx.c (simplify_const_binary_operation): Same.
* tree-chrec.c (tree_fold_binomial): Same.
* tree-data-ref.c (split_constant_offset_1): Same.
* tree-if-conv.c (idx_within_array_bound): Same.
* tree-scalar-evolution.c (iv_can_overflow_p): Same.
* tree-ssa-phiopt.c (minmax_replacement): Same.
* tree-vect-loop.c (is_nonwrapping_integer_induction): Same.
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): Same.
* vr-values.c (vr_values::adjust_range_with_scev): Same.
* wide-int.cc (wi::add_large): Same.
(wi::mul_internal): Same.
(wi::sub_large): Same.
(wi::divmod_internal): Same.
* wide-int.h: Change overflow type to overflow_type for neg, add,
mul, smul, umul, div_trunc, div_floor, div_ceil, div_round,
mod_trunc, mod_ceil, mod_round, add_large, sub_large,
mul_internal, divmod_internal.
(overflow_type): New enum.
(accumulate_overflow): New.
cp/
* decl.c (build_enumerator): Change overflow type to overflow_type.
* init.c (build_new_1): Same.
From-SVN: r262494
|
|
This patch generalises various places that used hwi rtx accessors so
that they can handle poly_ints instead. In many cases these changes
are by inspection rather than because something had shown them to be
necessary.
2018-06-12 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* poly-int.h (can_div_trunc_p): Add new overload in which all values
are poly_ints.
* alias.c (get_addr): Extend CONST_INT handling to poly_int_rtx_p.
(memrefs_conflict_p): Likewise.
(init_alias_analysis): Likewise.
* cfgexpand.c (expand_debug_expr): Likewise.
* combine.c (combine_simplify_rtx, force_int_to_mode): Likewise.
* cse.c (fold_rtx): Likewise.
* explow.c (adjust_stack, anti_adjust_stack): Likewise.
* expr.c (emit_block_move_hints): Likewise.
(clear_storage_hints, push_block, emit_push_insn): Likewise.
(store_expr_with_bounds, reduce_to_bit_field_precision): Likewise.
(emit_group_load_1): Use rtx_to_poly_int64 for group offsets.
(emit_group_store): Likewise.
(find_args_size_adjust): Use strip_offset. Use rtx_to_poly_int64
to read the PRE/POST_MODIFY increment.
* calls.c (store_one_arg): Use strip_offset.
* rtlanal.c (rtx_addr_can_trap_p_1): Extend CONST_INT handling to
poly_int_rtx_p.
(set_noop_p): Use rtx_to_poly_int64 for the elements selected
by a VEC_SELECT.
* simplify-rtx.c (avoid_constant_pool_reference): Use strip_offset.
(simplify_binary_operation_1): Extend CONST_INT handling to
poly_int_rtx_p.
* var-tracking.c (compute_cfa_pointer): Take a poly_int64 rather
than a HOST_WIDE_INT.
(hard_frame_pointer_adjustment): Change from HOST_WIDE_INT to
poly_int64.
(adjust_mems, add_stores): Update accodingly.
(vt_canonicalize_addr): Track polynomial offsets.
(emit_note_insn_var_location): Likewise.
(vt_add_function_parameter): Likewise.
(vt_initialize): Likewise.
From-SVN: r261530
|
|
it is not useful
In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to simply use the fact that we will always be
overwriting the remaining bits, and simply move the first element to the corrcet
place (implicitly zeroing all other bits).
This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.
Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix
this,
add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
to catch:
(vec_merge:OUTER
(vec_duplicate:OUTER x:INNER)
(subreg:OUTER y:INNER 0)
(const_int N))
And simplify it to:
(vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
This is similar to the existing patterns which are tested in this
function, without requiring the second operand to also be a vec_duplicate.
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
code generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify
vec_merge across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.
* gcc.target/aarch64/vect-slp-dup.c: New.
Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
From-SVN: r260309
|
|
config/i386/i386.c:13810 with -Og -fgcse)
PR middle-end/85414
* simplify-rtx.c (simplify_unary_operation_1) <case SIGN_EXTEND,
case ZERO_EXTEND>: Pass SUBREG_REG (op) rather than op to
gen_lowpart_no_emit.
From-SVN: r259649
|
|
-fno-tree-ccp -fno-tree-copy-prop)
PR rtl-optimization/85376
* simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and
zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX
instead of a specific value.
* gcc.dg/pr85376.c: New test.
From-SVN: r259377
|
|
simplify_const_unary_operation, at simplify-rtx.c:1731)
PR rtl-optimization/84989
* simplify-rtx.c (simplify_unary_operation_1): Don't try to simplify
VEC_DUPLICATE with scalar result mode.
* gcc.target/i386/pr84989.c: New test.
From-SVN: r258709
|
|
simplify_binary_operation_1, at simplify-rtx.c:3302)
PR target/83930
* simplify-rtx.c (simplify_binary_operation_1) <case UMOD>: Use
UINTVAL (trueop1) instead of INTVAL (op1).
* gcc.dg/pr83930.c: New test.
From-SVN: r256915
|
|
The SVE support for the new CONST_VECTOR encoding needs to be able
to extract the first N bits of the vector and duplicate it. This patch
adds a simplify_subreg rule for that.
The code is covered by the gcc.target/aarch64/sve_slp_*.c tests.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* simplify-rtx.c (simplify_immed_subreg): Add an inner_bytes
parameter and use it instead of GET_MODE_SIZE (innermode). Use
inner_bytes * BITS_PER_UNIT instead of GET_MODE_BITSIZE (innermode).
Use CEIL (inner_bytes, GET_MODE_UNIT_SIZE (innermode)) instead of
GET_MODE_NUNITS (innermode). Also add a first_elem parameter.
Change innermode from fixed_mode_size to machine_mode.
(simplify_subreg): Update call accordingly. Handle a constant-sized
subreg of a variable-length CONST_VECTOR.
From-SVN: r256610
|
|
gcc/
2018-01-08 Vidya Praveen <vidyapraveen@arm.com>
PR target/83663 - Revert r255946
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.
gcc/testsuite/
2018-01-08 Vidya Praveen <vidyapraveen@arm.com>
PR target/83663 - Revert r255946
* gcc.target/aarch64/vect-slp-dup.c: New.
From-SVN: r256346
|
|
This patch add support for the missing transformation of
(x | y) == x -> (y & ~x) == 0. The transformation for (x & y) == x case
already exists in simplify-rtx.c since 2014 as of r218503 and this patch
only adds a couple of extra patterns for the IOR case. This benefits
targets that have the BICS instruction to generate better code. For
targets that do not have the BICS instructions, it still results in
no worse code generation and gives out 2 instructions.
ChangeLog Entries:
*** gcc/ChangeLog ***
2018-01-05 Sudakshina Das <sudi.das@arm.com>
PR target/82439
* simplify-rtx.c (simplify_relational_operation_1): Add simplifications
of (x|y) == x for BICS pattern.
*** gcc/testsuite/ChangeLog ***
2018-01-05 Sudakshina Das <sudi.das@arm.com>
PR target/82439
* gcc.target/aarch64/bics_5.c: New test.
* gcc.target/arm/bics_5.c: Likewise.
From-SVN: r256275
|
|
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_size): Change from unsigned short to
poly_uint16_pod.
(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(fixed_size_mode::includes_p): Check for constant-sized modes.
* genmodes.c (emit_mode_size_inline): Make mode_size_inline
return a poly_uint16 rather than an unsigned short.
(emit_mode_size): Change the type of mode_size from unsigned short
to poly_uint16_pod. Use ZERO_COEFFS for the initializer.
(emit_mode_adjustments): Cope with polynomial vector sizes.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_SIZE.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_SIZE.
* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
* caller-save.c (setup_save_areas): Likewise.
(replace_reg_with_saved_mem): Likewise.
* calls.c (emit_library_call_value_1): Likewise.
* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
(gen_lowpart_for_combine): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (equiv_constant, cse_insn): Likewise.
* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
(cselib_subst_to_values): Likewise.
* dce.c (word_dce_process_block): Likewise.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
(rtl_for_decl_location): Likewise.
* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
(expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
(pad_below): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-build.c (ira_create_allocno_objects): Likewise.
* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
(ira_sort_regnos_for_alter_reg): Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
(resolve_simple_move): Likewise.
* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
(process_addr_reg, simplify_operand_subreg, curr_insn_transform)
(lra_constraints): Likewise.
(CONST_POOL_OK_P): Reject variable-sized modes.
* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
(add_pseudo_to_slot, lra_spill): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (get_best_extraction_insn): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise.
(expand_mult_highpart, valid_multiword_target_p): Likewise.
* recog.c (offsettable_address_addr_space_p): Likewise.
* regcprop.c (maybe_mode_change): Likewise.
* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
* regrename.c (build_def_use): Likewise.
* regstat.c (dump_reg_info): Likewise.
* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
(find_reloads, find_reloads_subreg_address): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
(simplify_binary_operation_1, simplify_subreg): Likewise.
* targhooks.c (default_function_arg_padding): Likewise.
(default_hard_regno_nregs, default_class_max_nregs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
* tree-inline.c (estimate_move_cost): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
(get_address_cost_ainc): Likewise.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(vectorizable_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
(vectorizable_operation, vectorizable_load): Likewise.
* tree.c (build_same_sized_truth_vector_type): Likewise.
* valtrack.c (cleanup_auto_inc_dec): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
(ADDR_VEC_ALIGN): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256201
|
|
This patch changes GET_MODE_NUNITS from unsigned char
to poly_uint16, although it remains a macro when compiling
target code with NUM_POLY_INT_COEFFS == 1.
We can handle permuted loads and stores for variable nunits if
the number of statements is a power of 2, but not otherwise.
The to_constant call in make_vector_type goes away in a later patch.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_nunits): Change from unsigned char to
poly_uint16_pod.
(ONLY_FIXED_SIZE_MODES): New macro.
(pod_mode::measurement_type, scalar_int_mode::measurement_type)
(scalar_float_mode::measurement_type, scalar_mode::measurement_type)
(complex_mode::measurement_type, fixed_size_mode::measurement_type):
New typedefs.
(mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
* genmodes.c (ZERO_COEFFS): New macro.
(emit_mode_nunits_inline): Make mode_nunits_inline return a
poly_uint16.
(emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
Use ZERO_COEFFS when emitting initializers.
* data-streamer.h (bp_pack_poly_value): New function.
(bp_unpack_poly_value): Likewise.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_NUNITS.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_NUNITS.
* tree.c (make_vector_type): Remove temporary shim and make
the real function take the number of units as a poly_uint64
rather than an int.
(build_vector_type_for_mode): Handle polynomial nunits.
* dwarf2out.c (loc_descriptor, add_const_value_attribute): Likewise.
* emit-rtl.c (const_vec_series_p_1): Likewise.
(gen_rtx_CONST_VECTOR): Likewise.
* fold-const.c (test_vec_duplicate_folding): Likewise.
* genrecog.c (validate_pattern): Likewise.
* optabs-query.c (can_vec_perm_var_p, can_mult_highpart_p): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vector_broadcast, expand_binop_directly): Likewise.
(shift_amt_for_vec_perm_mask, expand_vec_perm_var): Likewise.
(expand_vec_cond_expr, expand_mult_highpart): Likewise.
* rtlanal.c (subreg_get_info): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_grouped_load_supported): Likewise.
* tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
* tree-vect-loop.c (have_whole_vector_shift): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_const_unary_operation, simplify_binary_operation_1)
(simplify_const_binary_operation, simplify_ternary_operation)
(test_vector_ops_duplicate, test_vector_ops): Likewise.
(simplify_immed_subreg): Use GET_MODE_NUNITS on a fixed_size_mode
instead of CONST_VECTOR_NUNITS.
* varasm.c (output_constant_pool_2): Likewise.
* rtx-vector-builder.c (rtx_vector_builder::build): Only include the
explicit-encoded elements in the XVEC for variable-length vectors.
gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Handle polynomial
GET_MODE_NUNITS.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256195
|
|
From-SVN: r256169
|
|
This patch makes CONST_VECTOR_ELT handle implicitly-encoded elements,
in a similar way to VECTOR_CST_ELT.
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* rtl.h (CONST_VECTOR_ELT): Redefine to const_vector_elt.
(const_vector_encoded_nelts): New function.
(CONST_VECTOR_NUNITS): Redefine to use GET_MODE_NUNITS.
(const_vector_int_elt, const_vector_elt): Declare.
* emit-rtl.c (const_vector_int_elt_1): New function.
(const_vector_elt): Likewise.
* simplify-rtx.c (simplify_immed_subreg): Avoid taking the address
of CONST_VECTOR_ELT.
From-SVN: r256104
|
|
This patch replaces target-independent uses of XVECEXP with uses
of CONST_VECTOR_ELT. This kind of replacement isn't necessary
for code specific to targets other than AArch64.
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* simplify-rtx.c (simplify_const_binary_operation): Use
CONST_VECTOR_ELT instead of XVECEXP.
From-SVN: r256101
|
|
This patch makes the VEC_SERIES code use valid_for_const_vector_p
instead of CONSTANT_P, to match what we already do for VEC_DUPLICATE.
This showed up as a failure in gcc.c-torture/execute/pr28982b.c for -m32
on x86_64-linux-gnu after later patches.
2017-12-28 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* emit-rtl.c (gen_const_vec_series): Use valid_for_const_vector_p
instead of CONSTANT_P.
(gen_vec_series): Likewise.
* simplify-rtx.c (simplify_binary_operation_1): Likewise.
From-SVN: r256023
|
|
it is not useful
Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to simply use the fact that we will always be overwriting
the remaining bits, and simply move the first element to the corrcet place
(implicitly zeroing all other bits).
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation ,
to ensure we can still simplify:
(vec_merge:OUTER
(vec_duplicate:OUTER x:INNER)
(subreg:OUTER y:INNER 0)
(const_int N))
To:
(vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
---
gcc/
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.
gcc/testsuite/
* gcc.target/aarch64/vect-slp-dup.c: New.
From-SVN: r255946
|
|
varasm.c:3896 on aarch64)
PR rtl-optimization/82973
* emit-rtl.h (valid_for_const_vec_duplicate_p): Rename to ...
(valid_for_const_vector_p): ... this.
* emit-rtl.c (valid_for_const_vec_duplicate_p): Rename to ...
(valid_for_const_vector_p): ... this. Adjust function comment.
(gen_vec_duplicate): Adjust caller.
* optabs.c (expand_vector_broadcast): Likewise.
* simplify-rtx.c (simplify_const_unary_operation): Don't optimize into
CONST_VECTOR if some element isn't simplified valid_for_const_vector_p
constant.
(simplify_const_binary_operation): Likewise. Use CONST_FIXED_P macro
instead of GET_CODE == CONST_FIXED.
(simplify_subreg): Use CONST_FIXED_P macro instead of
GET_CODE == CONST_FIXED.
* gfortran.dg/pr82973.f90: New test.
From-SVN: r255938
|
|
This patch makes get_inner_reference and ptr_difference_const return the
bit size and bit position as poly_int64s rather than HOST_WIDE_INTS.
The non-mechanical changes were handled by previous patches.
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree.h (get_inner_reference): Return the bitsize and bitpos
as poly_int64_pods rather than HOST_WIDE_INT.
* fold-const.h (ptr_difference_const): Return the pointer difference
as a poly_int64_pod rather than a HOST_WIDE_INT.
* expr.c (get_inner_reference): Return the bitsize and bitpos
as poly_int64_pods rather than HOST_WIDE_INT.
(expand_expr_addr_expr_1, expand_expr_real_1): Track polynomial
offsets and sizes.
* fold-const.c (make_bit_field_ref): Take the bitpos as a poly_int64
rather than a HOST_WIDE_INT. Update call to get_inner_reference.
(optimize_bit_field_compare): Update call to get_inner_reference.
(decode_field_reference): Likewise.
(fold_unary_loc): Track polynomial offsets and sizes.
(split_address_to_core_and_offset): Return the bitpos as a
poly_int64_pod rather than a HOST_WIDE_INT.
(ptr_difference_const): Likewise for the pointer difference.
* asan.c (instrument_derefs): Track polynomial offsets and sizes.
* config/mips/mips.c (r10k_safe_mem_expr_p): Likewise.
* dbxout.c (dbxout_expand_expr): Likewise.
* dwarf2out.c (loc_list_for_address_of_addr_expr_of_indirect_ref)
(loc_list_from_tree_1, fortran_common): Likewise.
* gimple-laddress.c (pass_laddress::execute): Likewise.
* gimple-ssa-store-merging.c (find_bswap_or_nop_load): Likewise.
* gimplify.c (gimplify_scan_omp_clauses): Likewise.
* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
* tree-affine.c (tree_to_aff_combination): Likewise.
(get_inner_reference_aff): Likewise.
* tree-data-ref.c (split_constant_offset_1): Likewise.
(dr_analyze_innermost): Likewise.
* tree-scalar-evolution.c (interpret_rhs_expr): Likewise.
* tree-sra.c (ipa_sra_check_caller): Likewise.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
* ubsan.c (maybe_instrument_pointer_overflow): Likewise.
(instrument_bool_enum_load, instrument_object_size): Likewise.
* gimple-ssa-strength-reduction.c (slsr_process_ref): Update call
to get_inner_reference.
* hsa-gen.c (gen_hsa_addr): Likewise.
* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.
* tsan.c (instrument_expr): Likewise.
* match.pd: Update call to ptr_difference_const.
gcc/ada/
* gcc-interface/trans.c (Attribute_to_gnu): Track polynomial
offsets and sizes.
* gcc-interface/utils2.c (build_unary_op): Likewise.
gcc/cp/
* constexpr.c (check_automatic_or_tls): Track polynomial
offsets and sizes.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255914
|
|
This patch changes SUBREG_BYTE from an int to a poly_int.
Since valid SUBREG_BYTEs must be contained within the mode of the
SUBREG_REG, the required range is the same as for GET_MODE_SIZE,
i.e. unsigned short. The patch therefore uses poly_uint16(_pod)
for the SUBREG_BYTE.
Using poly_uint16_pod rtx fields requires a new field code ('p').
Since there are no other uses of 'p' besides SUBREG_BYTE, the patch
doesn't add an XPOLY or whatever; all uses should go via SUBREG_BYTE
instead.
The patch doesn't bother implementing 'p' support for legacy
define_peepholes, since none of the remaining ones have subregs
in their patterns.
As it happened, the rtl documentation used SUBREG as an example of a
code with mixed field types, accessed via XEXP (x, 0) and XINT (x, 1).
Since there's no direct replacement for XINT, and since people should
never use it even if there were, the patch changes the example to use
INT_LIST instead.
The patch also changes subreg-related helper functions so that they too
take and return polynomial offsets. This makes the patch quite big, but
it's mostly mechanical. The patch generally sticks to existing choices
wrt signedness.
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi: Update documentation of SUBREG_BYTE. Document the
'p' format code. Use INT_LIST rather than SUBREG as the example of
a code with an XINT and an XEXP. Remove the implication that
accessing an rtx field using XINT is expected to work.
* rtl.def (SUBREG): Change format from "ei" to "ep".
* rtl.h (rtunion::rt_subreg): New field.
(XCSUBREG): New macro.
(SUBREG_BYTE): Use it.
(subreg_shape): Change offset from an unsigned int to a poly_uint16.
Update constructor accordingly.
(subreg_shape::operator ==): Update accordingly.
(subreg_shape::unique_id): Return an unsigned HOST_WIDE_INT rather
than an unsigned int.
(subreg_lsb, subreg_lowpart_offset, subreg_highpart_offset): Return
a poly_uint64 rather than an unsigned int.
(subreg_lsb_1): Likewise. Take the offset as a poly_uint64 rather
than an unsigned int.
(subreg_size_offset_from_lsb, subreg_size_lowpart_offset)
(subreg_size_highpart_offset): Return a poly_uint64 rather than
an unsigned int. Take the sizes as poly_uint64s.
(subreg_offset_from_lsb): Return a poly_uint64 rather than
an unsigned int. Take the shift as a poly_uint64 rather than
an unsigned int.
(subreg_regno_offset, subreg_offset_representable_p): Take the offset
as a poly_uint64 rather than an unsigned int.
(simplify_subreg_regno): Likewise.
(byte_lowpart_offset): Return the memory offset as a poly_int64
rather than an int.
(subreg_memory_offset): Likewise. Take the subreg offset as a
poly_uint64 rather than an unsigned int.
(simplify_subreg, simplify_gen_subreg, subreg_get_info)
(gen_rtx_SUBREG, validate_subreg): Take the subreg offset as a
poly_uint64 rather than an unsigned int.
* rtl.c (rtx_format): Describe 'p' in comment.
(copy_rtx, rtx_equal_p_cb, rtx_equal_p): Handle 'p'.
* emit-rtl.c (validate_subreg, gen_rtx_SUBREG): Take the subreg
offset as a poly_uint64 rather than an unsigned int.
(byte_lowpart_offset): Return the memory offset as a poly_int64
rather than an int.
(subreg_memory_offset): Likewise. Take the subreg offset as a
poly_uint64 rather than an unsigned int.
(subreg_size_lowpart_offset, subreg_size_highpart_offset): Take the
mode sizes as poly_uint64s rather than unsigned ints. Return a
poly_uint64 rather than an unsigned int.
(subreg_lowpart_p): Treat subreg offsets as poly_ints.
(copy_insn_1): Handle 'p'.
* rtlanal.c (set_noop_p): Treat subregs offsets as poly_uint64s.
(subreg_lsb_1): Take the subreg offset as a poly_uint64 rather than
an unsigned int. Return the shift in the same way.
(subreg_lsb): Return the shift as a poly_uint64 rather than an
unsigned int.
(subreg_size_offset_from_lsb): Take the sizes and shift as
poly_uint64s rather than unsigned ints. Return the offset as
a poly_uint64.
(subreg_get_info, subreg_regno_offset, subreg_offset_representable_p)
(simplify_subreg_regno): Take the offset as a poly_uint64 rather than
an unsigned int.
* rtlhash.c (add_rtx): Handle 'p'.
* genemit.c (gen_exp): Likewise.
* gengenrtl.c (type_from_format, gendef): Likewise.
* gensupport.c (subst_pattern_match, get_alternatives_number)
(collect_insn_data, alter_predicate_for_insn, alter_constraints)
(subst_dup): Likewise.
* gengtype.c (adjust_field_rtx_def): Likewise.
* genrecog.c (find_operand, find_matching_operand, validate_pattern)
(match_pattern_2): Likewise.
(rtx_test::SUBREG_FIELD): New rtx_test::kind_enum.
(rtx_test::subreg_field): New function.
(operator ==, safe_to_hoist_p, transition_parameter_type)
(print_nonbool_test, print_test): Handle SUBREG_FIELD.
* genattrtab.c (attr_rtx_1): Say that 'p' is deliberately not handled.
* genpeep.c (match_rtx): Likewise.
* print-rtl.c (print_poly_int): Include if GENERATOR_FILE too.
(rtx_writer::print_rtx_operand): Handle 'p'.
(print_value): Handle SUBREG.
* read-rtl.c (apply_int_iterator): Likewise.
(rtx_reader::read_rtx_operand): Handle 'p'.
* alias.c (rtx_equal_for_memref_p): Likewise.
* cselib.c (rtx_equal_for_cselib_1, cselib_hash_rtx): Likewise.
* caller-save.c (replace_reg_with_saved_mem): Treat subreg offsets
as poly_ints.
* calls.c (expand_call): Likewise.
* combine.c (combine_simplify_rtx, expand_field_assignment): Likewise.
(make_extraction, gen_lowpart_for_combine): Likewise.
* loop-invariant.c (hash_invariant_expr_1, invariant_expr_equal_p):
Likewise.
* cse.c (remove_invalid_subreg_refs): Take the offset as a poly_uint64
rather than an unsigned int. Treat subreg offsets as poly_ints.
(exp_equiv_p): Handle 'p'.
(hash_rtx_cb): Likewise. Treat subreg offsets as poly_ints.
(equiv_constant, cse_insn): Treat subreg offsets as poly_ints.
* dse.c (find_shift_sequence): Likewise.
* dwarf2out.c (rtl_for_decl_location): Likewise.
* expmed.c (extract_low_bits): Likewise.
* expr.c (emit_group_store, undefined_operand_subword_p): Likewise.
(expand_expr_real_2): Likewise.
* final.c (alter_subreg): Likewise.
(leaf_renumber_regs_insn): Handle 'p'.
* function.c (assign_parm_find_stack_rtl, assign_parm_setup_stack):
Treat subreg offsets as poly_ints.
* fwprop.c (forward_propagate_and_simplify): Likewise.
* ifcvt.c (noce_emit_move_insn, noce_emit_cmove): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-conflicts.c (go_through_subreg): Likewise.
* ira-lives.c (process_single_reg_class_operands): Likewise.
* jump.c (rtx_renumbered_equal_p): Likewise. Handle 'p'.
* lower-subreg.c (simplify_subreg_concatn): Take the subreg offset
as a poly_uint64 rather than an unsigned int.
(simplify_gen_subreg_concatn, resolve_simple_move): Treat
subreg offsets as poly_ints.
* lra-constraints.c (operands_match_p): Handle 'p'.
(match_reload, curr_insn_transform): Treat subreg offsets as poly_ints.
* lra-spills.c (assign_mem_slot): Likewise.
* postreload.c (move2add_valid_value_p): Likewise.
* recog.c (general_operand, indirect_operand): Likewise.
* regcprop.c (copy_value, maybe_mode_change): Likewise.
(copyprop_hardreg_forward_1): Likewise.
* reginfo.c (simplifiable_subregs_hasher::hash, simplifiable_subregs)
(record_subregs_of_mode): Likewise.
* rtlhooks.c (gen_lowpart_general, gen_lowpart_if_possible): Likewise.
* reload.c (operands_match_p): Handle 'p'.
(find_reloads_subreg_address): Treat subreg offsets as poly_ints.
* reload1.c (alter_reg, choose_reload_regs): Likewise.
(compute_reload_subreg_offset): Likewise, and return an poly_int64.
* simplify-rtx.c (simplify_truncation, simplify_binary_operation_1):
(test_vector_ops_duplicate): Treat subreg offsets as poly_ints.
(simplify_const_poly_int_tests<N>::run): Likewise.
(simplify_subreg, simplify_gen_subreg): Take the subreg offset as
a poly_uint64 rather than an unsigned int.
* valtrack.c (debug_lowpart_subreg): Likewise.
* var-tracking.c (var_lowpart): Likewise.
(loc_cmp): Handle 'p'.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255882
|
|
This patch changes the MEM_OFFSET and MEM_SIZE memory attributes
from HOST_WIDE_INT to poly_int64. Most of it is mechanical,
but there is one nonbovious change in widen_memory_access.
Previously the main while loop broke with:
/* Similarly for the decl. */
else if (DECL_P (attrs.expr)
&& DECL_SIZE_UNIT (attrs.expr)
&& TREE_CODE (DECL_SIZE_UNIT (attrs.expr)) == INTEGER_CST
&& compare_tree_int (DECL_SIZE_UNIT (attrs.expr), size) >= 0
&& (! attrs.offset_known_p || attrs.offset >= 0))
break;
but it seemed wrong to optimistically assume the best case
when the offset isn't known (and thus might be negative).
As it happens, the "! attrs.offset_known_p" condition was
always false, because we'd already nullified attrs.expr in
that case:
/* If we don't know what offset we were at within the expression, then
we can't know if we've overstepped the bounds. */
if (! attrs.offset_known_p)
attrs.expr = NULL_TREE;
The patch therefore drops "! attrs.offset_known_p ||" when
converting the offset check to the may/must interface.
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtl.h (mem_attrs): Add a default constructor. Change size and
offset from HOST_WIDE_INT to poly_int64.
* emit-rtl.h (set_mem_offset, set_mem_size, adjust_address_1)
(adjust_automodify_address_1, set_mem_attributes_minus_bitpos)
(widen_memory_access): Take the sizes and offsets as poly_int64s
rather than HOST_WIDE_INTs.
* alias.c (ao_ref_from_mem): Handle the new form of MEM_OFFSET.
(offset_overlap_p): Take poly_int64s rather than HOST_WIDE_INTs
and ints.
(adjust_offset_for_component_ref): Change the offset from a
HOST_WIDE_INT to a poly_int64.
(nonoverlapping_memrefs_p): Track polynomial offsets and sizes.
* cfgcleanup.c (merge_memattrs): Update after mem_attrs changes.
* dce.c (find_call_stack_args): Likewise.
* dse.c (record_store): Likewise.
* dwarf2out.c (tls_mem_loc_descriptor, dw_sra_loc_expr): Likewise.
* print-rtl.c (rtx_writer::print_rtx): Likewise.
* read-rtl-function.c (test_loading_mem): Likewise.
* rtlanal.c (may_trap_p_1): Likewise.
* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
* var-tracking.c (int_mem_offset, track_expr_p): Likewise.
* emit-rtl.c (mem_attrs_eq_p, get_mem_align_offset): Likewise.
(mem_attrs::mem_attrs): New function.
(set_mem_attributes_minus_bitpos): Change bitpos from a
HOST_WIDE_INT to poly_int64.
(set_mem_alias_set, set_mem_addr_space, set_mem_align, set_mem_expr)
(clear_mem_offset, clear_mem_size, change_address)
(get_spill_slot_decl, set_mem_attrs_for_spill): Directly
initialize mem_attrs.
(set_mem_offset, set_mem_size, adjust_address_1)
(adjust_automodify_address_1, offset_address, widen_memory_access):
Likewise. Take poly_int64s rather than HOST_WIDE_INT.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255875
|
|
This patch adds an rtl representation of poly_int values.
There were three possible ways of doing this:
(1) Add a new rtl code for the poly_ints themselves and store the
coefficients as trailing wide_ints. This would give constants like:
(const_poly_int [c0 c1 ... cn])
The runtime value would be:
c0 + c1 * x1 + ... + cn * xn
(2) Like (1), but use rtxes for the coefficients. This would give
constants like:
(const_poly_int [(const_int c0)
(const_int c1)
...
(const_int cn)])
although the coefficients could be const_wide_ints instead
of const_ints where appropriate.
(3) Add a new rtl code for the polynomial indeterminates,
then use them in const wrappers. A constant like c0 + c1 * x1
would then look like:
(const:M (plus:M (mult:M (const_param:M x1)
(const_int c1))
(const_int c0)))
There didn't seem to be that much to choose between them. The main
advantage of (1) is that it's a more efficient representation and
that we can refer to the cofficients directly as wide_int_storage.
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi (const_poly_int): Document. Also document the
rtl sharing behavior.
* gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
* rtl.h (const_poly_int_def): New struct.
(rtx_def::u): Add a cpi field.
(CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
(CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
(wi::rtx_to_poly_wide_ref): New typedef
(const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
(poly_int_rtx_p): New functions.
(trunc_int_for_mode): Declare a poly_int64 version.
(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
(immed_wide_int_const): Take a poly_wide_int_ref rather than
a wide_int_ref.
(strip_offset): Declare.
(strip_offset_and_add): New function.
* rtl.def (CONST_POLY_INT): New rtx code.
* rtl.c (rtx_size): Handle CONST_POLY_INT.
(shared_const_p): Use poly_int_rtx_p.
* emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
HOST_WIDE_INT.
(gen_int_shift_amount): Likewise.
* emit-rtl.c (const_poly_int_hasher): New class.
(const_poly_int_htab): New variable.
(init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
(const_poly_int_hasher::hash): New function.
(const_poly_int_hasher::equal): Likewise.
(gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
(immed_wide_int_const): Rename to...
(immed_wide_int_const_1): ...this and make static.
(immed_wide_int_const): New function, taking a poly_wide_int_ref
instead of a wide_int_ref.
(gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
(gen_lowpart_common): Handle CONST_POLY_INT.
* cse.c (hash_rtx_cb, equiv_constant): Likewise.
* cselib.c (cselib_hash_rtx): Likewise.
* dwarf2out.c (const_ok_for_output_1): Likewise.
* expr.c (convert_modes): Likewise.
* print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
* rtlhash.c (add_rtx): Likewise.
* explow.c (trunc_int_for_mode): Add a poly_int64 version.
(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
Handle existing CONST_POLY_INT rtxes.
* expmed.h (expand_shift): Take a poly_int64 instead of a
HOST_WIDE_INT.
* expmed.c (expand_shift): Likewise.
* rtlanal.c (strip_offset): New function.
(commutative_operand_precedence): Give CONST_POLY_INT the same
precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
and CONST_INT.
* rtl-tests.c (const_poly_int_tests): New struct.
(rtl_tests_c_tests): Use it.
* simplify-rtx.c (simplify_const_unary_operation): Handle
CONST_POLY_INT.
(simplify_const_binary_operation): Likewise.
(simplify_binary_operation_1): Fold additions of symbolic constants
and CONST_POLY_INTs.
(simplify_subreg): Handle extensions and truncations of
CONST_POLY_INTs.
(simplify_const_poly_int_tests): New struct.
(simplify_rtx_c_tests): Use it.
* wide-int.h (storage_ref): Add default constructor.
(wide_int_ref_storage): Likewise.
(trailing_wide_ints): Use GTY((user)).
(trailing_wide_ints::operator[]): Add a const version.
(trailing_wide_ints::get_precision): New function.
(trailing_wide_ints::extra_size): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255862
|
|
This patch adds a helper routine that constructs rtxes
for constant shift amounts, given the mode of the value
being shifted. As well as helping with the SVE patches, this
is one step towards allowing CONST_INTs to have a real mode.
One long-standing problem has been to decide what the mode
of a shift count should be for arbitrary rtxes (as opposed to those
directly tied to a target pattern). Realistic choices would be
the mode of the shifted elements, word_mode, QImode, a 64-bit mode,
or the same mode as the shift optabs (in which case what should the
mode be when the target doesn't have a pattern?)
For now the patch picks a 64-bit mode, but with a ??? comment.
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN_INT.
* calls.c (shift_return_value): Likewise.
* cse.c (fold_rtx): Likewise.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
(expand_shift, expand_smod_pow2): Likewise.
* lower-subreg.c (shift_cost): Likewise.
* optabs.c (expand_superword_shift, expand_doubleword_mult)
(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
(expand_vec_perm_var): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
* combine.c (try_combine, find_split_point, force_int_to_mode)
(simplify_shift_const_1, simplify_shift_const): Likewise.
(change_zero_ext): Likewise. Use simplify_gen_binary.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255861
|
|
conditions.
* read-rtl.c (parse_reg_note_name): Replace Yoda conditions with
typical order conditions.
* sel-sched.c (extract_new_fences_from): Likewise.
* config/visium/constraints.md (J, K, L): Likewise.
* config/visium/predicates.md (const_shift_operand): Likewise.
* config/visium/visium.c (visium_legitimize_address,
visium_legitimize_reload_address): Likewise.
* config/m68k/m68k.c (output_reg_adjust, emit_reg_adjust): Likewise.
* config/arm/arm.c (arm_block_move_unaligned_straight): Likewise.
* config/avr/constraints.md (Y01, Ym1, Y02, Ym2): Likewise.
* config/avr/avr-log.c (avr_vdump, avr_log_set_avr_log,
SET_DUMP_DETAIL): Likewise.
* config/avr/predicates.md (const_8_16_24_operand): Likewise.
* config/avr/avr.c (STR_PREFIX_P, avr_popcount_each_byte,
avr_is_casesi_sequence, avr_casei_sequence_check_operands,
avr_set_core_architecture, avr_set_current_function,
avr_legitimize_reload_address, avr_asm_len, avr_print_operand,
output_movqi, output_movsisf, avr_out_plus, avr_out_bitop,
avr_out_fract, avr_adjust_insn_length, avr_encode_section_info,
avr_2word_insn_p, output_reload_in_const, avr_has_nibble_0xf,
avr_map_decompose, avr_fold_builtin): Likewise.
* config/avr/driver-avr.c (avr_devicespecs_file): Likewise.
* config/avr/gen-avr-mmcu-specs.c (str_prefix_p, print_mcu): Likewise.
* config/i386/i386.c (ix86_parse_stringop_strategy_string): Likewise.
* config/m32c/m32c-pragma.c (m32c_pragma_memregs): Likewise.
* config/m32c/m32c.c (m32c_conditional_register_usage,
m32c_address_cost): Likewise.
* config/m32c/predicates.md (shiftcount_operand,
longshiftcount_operand): Likewise.
* config/iq2000/iq2000.c (iq2000_expand_prologue): Likewise.
* config/nios2/nios2.c (nios2_handle_custom_fpu_insn_option,
can_use_cdx_ldstw): Likewise.
* config/nios2/nios2.h (CDX_REG_P): Likewise.
* config/cr16/cr16.h (RETURN_ADDR_RTX, REGNO_MODE_OK_FOR_BASE_P):
Likewise.
* config/cr16/cr16.md (*mov<mode>_double): Likewise.
* config/cr16/cr16.c (cr16_create_dwarf_for_multi_push): Likewise.
* config/h8300/h8300.c (h8300_rtx_costs, get_shift_alg): Likewise.
* config/vax/constraints.md (U06, U08, U16, CN6, S08, S16): Likewise.
* config/vax/vax.c (adjacent_operands_p): Likewise.
* config/ft32/constraints.md (L, b, KA): Likewise.
* config/ft32/ft32.c (ft32_load_immediate, ft32_expand_prologue):
Likewise.
* cfgexpand.c (expand_stack_alignment): Likewise.
* gcse.c (insert_expr_in_table): Likewise.
* print-rtl.c (rtx_writer::print_rtx_operand_codes_E_and_V): Likewise.
* cgraphunit.c (cgraph_node::expand): Likewise.
* ira-build.c (setup_min_max_allocno_live_range_point): Likewise.
* emit-rtl.c (add_insn): Likewise.
* input.c (dump_location_info): Likewise.
* passes.c (NEXT_PASS): Likewise.
* read-rtl-function.c (parse_note_insn_name,
function_reader::read_rtx_operand_r, function_reader::parse_mem_expr):
Likewise.
* sched-rgn.c (sched_rgn_init): Likewise.
* diagnostic-show-locus.c (layout::show_ruler): Likewise.
* combine.c (find_split_point, simplify_if_then_else, force_to_mode,
if_then_else_cond, simplify_shift_const_1, simplify_comparison): Likewise.
* explow.c (eliminate_constant_term): Likewise.
* final.c (leaf_renumber_regs_insn): Likewise.
* cfgrtl.c (print_rtl_with_bb): Likewise.
* genhooks.c (emit_init_macros): Likewise.
* poly-int.h (maybe_ne, maybe_le, maybe_lt): Likewise.
* tree-data-ref.c (conflict_fn): Likewise.
* selftest.c (assert_streq): Likewise.
* expr.c (store_constructor_field, expand_expr_real_1): Likewise.
* fold-const.c (fold_range_test, extract_muldiv_1, fold_truth_andor,
fold_binary_loc, multiple_of_p): Likewise.
* reload.c (push_reload, find_equiv_reg): Likewise.
* et-forest.c (et_nca, et_below): Likewise.
* dbxout.c (dbxout_symbol_location): Likewise.
* reorg.c (relax_delay_slots): Likewise.
* dojump.c (do_compare_rtx_and_jump): Likewise.
* gengtype-parse.c (type): Likewise.
* simplify-rtx.c (simplify_gen_ternary, simplify_gen_relational,
simplify_const_relational_operation): Likewise.
* reload1.c (do_output_reload): Likewise.
* dumpfile.c (get_dump_file_info_by_switch): Likewise.
* gengtype.c (type_for_name): Likewise.
* gimple-ssa-sprintf.c (format_directive): Likewise.
ada/
* gcc-interface/trans.c (Loop_Statement_to_gnu): Replace Yoda
conditions with typical order conditions.
* gcc-interface/misc.c (gnat_get_array_descr_info,
default_pass_by_ref): Likewise.
* gcc-interface/decl.c (gnat_to_gnu_entity): Likewise.
* adaint.c (__gnat_tmp_name): Likewise.
c-family/
* known-headers.cc (get_stdlib_header_for_name): Replace Yoda
conditions with typical order conditions.
c/
* c-typeck.c (comptypes_internal, function_types_compatible_p,
perform_integral_promotions, digest_init): Replace Yoda conditions
with typical order conditions.
* c-decl.c (check_bitfield_type_and_width): Likewise.
cp/
* name-lookup.c (get_std_name_hint): Replace Yoda conditions with
typical order conditions.
* class.c (check_bitfield_decl): Likewise.
* pt.c (convert_template_argument): Likewise.
* decl.c (duplicate_decls): Likewise.
* typeck.c (commonparms): Likewise.
fortran/
* scanner.c (preprocessor_line): Replace Yoda conditions with typical
order conditions.
* dependency.c (check_section_vs_section): Likewise.
* trans-array.c (gfc_conv_expr_descriptor): Likewise.
jit/
* jit-playback.c (get_type, playback::compile_to_file::copy_file,
playback::context::acquire_mutex): Replace Yoda conditions with
typical order conditions.
* libgccjit.c (gcc_jit_context_new_struct_type,
gcc_jit_struct_set_fields, gcc_jit_context_new_union_type,
gcc_jit_context_new_function, gcc_jit_timer_pop): Likewise.
* jit-builtins.c (matches_builtin): Likewise.
* jit-recording.c (recording::compound_type::set_fields,
recording::fields::write_reproducer, recording::rvalue::set_scope,
recording::function::validate): Likewise.
* jit-logging.c (logger::decref): Likewise.
From-SVN: r255831
|
|
From-SVN: r255746
|
|
This patch adds a helper routine that constructs rtxes
for constant shift amounts, given the mode of the value
being shifted. As well as helping with the SVE patches, this
is one step towards allowing CONST_INTs to have a real mode.
One long-standing problem has been to decide what the mode
of a shift count should be for arbitrary rtxes (as opposed to those
directly tied to a target pattern). Realistic choices would be
the mode of the shifted elements, word_mode, QImode, or the same
mode as the shift optabs (in which case what should the mode
be when the target doesn't have a pattern?)
For now the patch picks the mode of the shifted elements,
but with a ??? comment.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN_INT.
* calls.c (shift_return_value): Likewise.
* cse.c (fold_rtx): Likewise.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
(expand_shift, expand_smod_pow2): Likewise.
* lower-subreg.c (shift_cost): Likewise.
* optabs.c (expand_superword_shift, expand_doubleword_mult)
(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
(expand_vec_perm_var): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
* combine.c (try_combine, find_split_point, force_int_to_mode)
(simplify_shift_const_1, simplify_shift_const): Likewise.
(change_zero_ext): Likewise. Use simplify_gen_binary.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255745
|
|
arguments are using gen_const_vec_series.
* simplify-rtx.c (simplify_binary_operation_1) <case VEC_SERIES>:
Handle the case where both arguments are using gen_const_vec_series.
From-SVN: r255079
|
|
2017-11-20 Tom de Vries <tom@codesourcery.com>
PR rtl-optimization/82020
* simplify-rtx.c (simplify_ternary_operation): Fix comparison mode of
IF_THEN_ELSE condition.
From-SVN: r254944
|
|
Another vec_merge simplification that's missing from simplify-rtx.c is transforming
a vec_merge of two vec_duplicates. For example:
(set (reg:V2DF 80)
(vec_merge:V2DF (vec_duplicate:V2DF (reg:DF 84))
(vec_duplicate:V2DF (reg:DF 81))
(const_int 2)))
Can be transformed into the simpler:
(set (reg:V2DF 80)
(vec_concat:V2DF (reg:DF 81)
(reg:DF 84)))
I believe this should always be beneficial.
I'm still looking into finding a small testcase demonstrating this, but on aarch64 SPEC
I've seen this eliminate some really bizzare codegen where GCC was generating nonsense like:
ldr q18, [sp, 448]
ins v18.d[0], v23.d[0]
ins v18.d[1], v22.d[0]
With q18 being pushed and popped off the stack in the prologue and epilogue of the function!
These are large files from SPEC that I haven't been able to analyse yet as to why GCC even attempts
to do that, but with this patch it doesn't try to load a register and overwrite all its lanes.
This patch shaves off about 5k of code size from zeusmp on aarch64 at -O3, so I believe it's a good
thing to do.
* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
of two vec_duplicates into a vec_concat.
From-SVN: r254550
|
|
Another vec_merge simplification that's missing is transforming:
(vec_merge (vec_duplicate x) (vec_concat (y) (z)) (const_int N))
into
(vec_concat x z) if N == 1 (0b01) or
(vec_concat y x) if N == 2 (0b10)
For the testcase in this patch on aarch64 this allows us to try matching during combine the pattern:
(set (reg:V2DI 78 [ x ])
(vec_concat:V2DI
(mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64])
(mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
(const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64])))
rather than the more complex:
(set (reg:V2DI 78 [ x ])
(vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (plus:DI (reg/v/f:DI 76 [ y ])
(const_int 8 [0x8])) [1 MEM[(long long int *)y_4(D) + 8B]+0 S8 A64]))
(vec_duplicate:V2DI (mem:DI (reg/v/f:DI 76 [ y ]) [1 *y_4(D)+0 S8 A64]))
(const_int 2 [0x2])))
We don't actually have an aarch64 pattern for the simplified version above, but it's a simple enough
form to add, so this patch adds such a pattern that performs a concatenated load of two 64-bit vectors
in adjacent memory locations as a single Q-register LDR. The new aarch64 pattern is needed to demonstrate
the effectiveness of the simplify-rtx change, so I've kept them together as one patch.
Now for the testcase in the patch we can generate:
construct_lanedi:
ldr q0, [x0]
ret
construct_lanedf:
ldr q0, [x0]
ret
instead of:
construct_lanedi:
ld1r {v0.2d}, [x0]
ldr x0, [x0, 8]
ins v0.d[1], x0
ret
construct_lanedf:
ld1r {v0.2d}, [x0]
ldr d1, [x0, 8]
ins v0.d[1], v1.d[0]
ret
The new memory constraint Utq is needed because we need to allow only the Q-register addressing modes but
the MEM expressions in the RTL pattern have 64-bit vector modes, and if we don't constrain them they will
allow the D-register addressing modes during register allocation/address mode selection, which will produce
invalid assembly.
Bootstrapped and tested on aarch64-none-linux-gnu.
* simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
Simplify vec_merge of vec_duplicate and vec_concat.
* config/aarch64/constraints.md (Utq): New constraint.
* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): New
define_insn.
* gcc.target/aarch64/load_v2vec_lanes_1.c: New test.
From-SVN: r254549
|
|
I'm trying to improve some of the RTL-level handling of vector lane operations on aarch64 and that
involves dealing with a lot of vec_merge operations. One simplification that I noticed missing
from simplify-rtx are combinations of vec_merge with vec_duplicate.
In this particular case:
(vec_merge (vec_duplicate (X)) (const_vector [A, B]) (const_int N))
which can be replaced with
(vec_concat (X) (B)) if N == 1 (0b01) or
(vec_concat (A) (X)) if N == 2 (0b10).
For the aarch64 testcase in this patch this simplifications allows us to try to combine:
(set (reg:V2DI 77 [ x ])
(vec_concat:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64])
(const_int 0 [0])))
instead of the more complex:
(set (reg:V2DI 77 [ x ])
(vec_merge:V2DI (vec_duplicate:V2DI (mem:DI (reg:DI 0 x0 [ y ]) [1 *y_3(D)+0 S8 A64]))
(const_vector:V2DI [
(const_int 0 [0])
(const_int 0 [0])
])
(const_int 1 [0x1])))
For the simplified form above we already have an aarch64 pattern: *aarch64_combinez<mode> which
is missing a DI/DFmode version due to an oversight, so this patch extends that pattern as well to
use the VDC mode iterator that includes DI and DFmode (as well as V2HF which VD_BHSI was missing).
The aarch64 hunk is needed to see the benefit of the simplify-rtx.c hunk, so I didn't split them
into separate patches.
Before this for the testcase we'd generate:
construct_lanedi:
movi v0.4s, 0
ldr x0, [x0]
ins v0.d[0], x0
ret
construct_lanedf:
movi v0.2d, 0
ldr d1, [x0]
ins v0.d[0], v1.d[0]
ret
but now we can generate:
construct_lanedi:
ldr d0, [x0]
ret
construct_lanedf:
ldr d0, [x0]
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
* simplify-rtx.c (simplify_ternary_operation, VEC_MERGE):
Simplify vec_merge of vec_duplicate and const_vector.
* config/aarch64/predicates.md (aarch64_simd_or_scalar_imm_zero):
New predicate.
* config/aarch64/aarch64-simd.md (*aarch64_combinez<mode>): Use VDC
mode iterator. Update predicate on operand 1 to
handle non-const_vec constants. Delete constraints.
(*aarch64_combinez_be<mode>): Likewise for operand 2.
* gcc.target/aarch64/construct_lane_zero_1.c: New test.
From-SVN: r254548
|
|
This patch avoids some calculations of the form:
GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)
in simplify-rtx.c. If we're dealing with CONST_VECTORs, it's better
to use CONST_VECTOR_NUNITS, since that remains constant even after the
SVE patches. In other cases we can get the number from GET_MODE_NUNITS.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
and CONST_VECTOR_NUNITS instead of computing the number of units from
the byte sizes of the vector and element.
(simplify_binary_operation_1): Likewise.
(simplify_const_binary_operation): Likewise.
(simplify_ternary_operation): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254311
|
|
This patch adds a fixed_size_mode machine_mode wrapper
for modes that are known to have a fixed size. That applies
to all current modes, but future patches will add support for
variable-sized modes.
The use of this class should be pretty restricted. One important
use case is to hold the mode of static data, which can never be
variable-sized with current file formats. Another is to hold
the modes of registers involved in __builtin_apply and
__builtin_result, since those interfaces don't cope well with
variable-sized data.
The class can also be useful when reinterpreting the contents of
a fixed-length bit string as a different kind of value.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (fixed_size_mode): New class.
* rtl.h (get_pool_mode): Return fixed_size_mode.
* gengtype.c (main): Add fixed_size_mode.
* target.def (get_raw_result_mode): Return a fixed_size_mode.
(get_raw_arg_mode): Likewise.
* doc/tm.texi: Regenerate.
* targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode.
* targhooks.c (default_get_reg_raw_mode): Likewise.
* config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise.
* config/mips/mips.c (mips_get_reg_raw_mode): Likewise.
* config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise.
(msp430_get_raw_result_mode): Likewise.
* config/avr/avr-protos.h (regmask): Use as_a <fixed_side_mode>
* dbxout.c (dbxout_parms): Require fixed-size modes.
* expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise.
* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
* omp-low.c (lower_oacc_reductions): Likewise.
* simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes.
(simplify_subreg): Update accordingly.
* varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode.
(force_const_mem): Update accordingly. Return NULL_RTX for modes
that aren't fixed-size.
(get_pool_mode): Return a fixed_size_mode.
(output_constant_pool_2): Take a fixed_size_mode.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254300
|
|
This patch adds an rtl representation of a vector linear series
of the form:
a[I] = BASE + I * STEP
Like vec_duplicate;
- the new rtx can be used for both constant and non-constant vectors
- when used for constant vectors it is wrapped in a (const ...)
- the constant form is only used for variable-length vectors;
fixed-length vectors still use CONST_VECTOR
At the moment the code is restricted to integer elements, to avoid
concerns over floating-point rounding.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi (vec_series): Document.
(const): Say that the operand can be a vec_series.
* rtl.def (VEC_SERIES): New rtx code.
* rtl.h (const_vec_series_p_1): Declare.
(const_vec_series_p): New function.
* emit-rtl.h (gen_const_vec_series): Declare.
(gen_vec_series): Likewise.
* emit-rtl.c (const_vec_series_p_1, gen_const_vec_series)
(gen_vec_series): Likewise.
* optabs.c (expand_mult_highpart): Use gen_const_vec_series.
* simplify-rtx.c (simplify_unary_operation): Handle negations
of vector series.
(simplify_binary_operation_series): New function.
(simplify_binary_operation_1): Use it. Handle VEC_SERIES.
(test_vector_ops_series): New function.
(test_vector_ops): Call it.
* config/powerpcspe/altivec.md (altivec_lvsl): Use
gen_const_vec_series.
(altivec_lvsr): Likewise.
* config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254297
|
|
This patch adds a vec_duplicate_p helper that tests for constant
or non-constant vector duplicates. Together with the existing
const_vec_duplicate_p, this complements the gen_vec_duplicate
and gen_const_vec_duplicate added by a previous patch.
The patch uses the new routines to add more rtx simplifications
involving vector duplicates. These mirror simplifications that
we already do for CONST_VECTOR broadcasts and are needed for
variable-length SVE, which uses:
(const:M (vec_duplicate:M X))
to represent constant broadcasts instead. The simplifications do
trigger on the testsuite for variable duplicates too, and in each
case I saw the change was an improvement.
The best way of testing the new simplifications seemed to be
via selftests. The patch cribs part of David's patch here:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
David Malcolm <dmalcolm@redhat.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtl.h (vec_duplicate_p): New function.
* selftest-rtl.c (assert_rtx_eq_at): New function.
* selftest-rtl.h (ASSERT_RTX_EQ): New macro.
(assert_rtx_eq_at): Declare.
* selftest.h (selftest::simplify_rtx_c_tests): Declare.
* selftest-run-tests.c (selftest::run_tests): Call it.
* simplify-rtx.c: Include selftest.h and selftest-rtl.h.
(simplify_unary_operation_1): Recursively handle vector duplicates.
(simplify_binary_operation_1): Likewise. Handle VEC_SELECTs of
vector duplicates.
(simplify_subreg): Handle subregs of vector duplicates.
(make_test_reg, test_vector_ops_duplicate, test_vector_ops)
(selftest::simplify_rtx_c_tests): New functions.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Malcolm <dmalcolm@redhat.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254294
|
|
This patch adds helper functions for generating constant and
non-constant vector duplicates. These routines help with SVE because
it is then easier to use:
(const:M (vec_duplicate:M X))
for a broadcast of X, even if the number of elements in M isn't known
at compile time. It also makes it easier for general rtx code to treat
constant and non-constant duplicates in the same way.
In the target code, the patch uses gen_vec_duplicate instead of
gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
useful. It might be that some or all of the call sites only handle
non-constants in practice, in which case the change is a harmless
no-op (and a saving of a few characters).
Otherwise, the target changes use gen_const_vec_duplicate instead
of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
They also include some changes to use CONSTxx_RTX for easy global
constants.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_const_vec_duplicate): Declare.
(gen_vec_duplicate): Likewise.
* emit-rtl.c (gen_const_vec_duplicate_1): New function, split
out from...
(gen_const_vector): ...here.
(gen_const_vec_duplicate, gen_vec_duplicate): New functions.
(gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
whose elements are all equal.
* optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
* simplify-rtx.c (simplify_const_unary_operation): Likewise.
(simplify_relational_operation): Likewise.
* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Likewise.
(aarch64_simd_dup_constant): Use gen_vec_duplicate.
(aarch64_expand_vector_init): Likewise.
* config/arm/arm.c (neon_vdup_constant): Likewise.
(neon_expand_vector_init): Likewise.
(arm_expand_vec_perm): Use gen_const_vec_duplicate.
(arm_block_set_unaligned_vect): Likewise.
(arm_block_set_aligned_vect): Likewise.
* config/arm/neon.md (neon_copysignf<mode>): Likewise.
* config/i386/i386.c (ix86_expand_vec_perm): Likewise.
(expand_vec_perm_even_odd_pack): Likewise.
(ix86_vector_duplicate_value): Use gen_vec_duplicate.
* config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
* config/ia64/ia64.c (ia64_expand_vecint_compare): Use
gen_const_vec_duplicate.
* config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
* config/mips/mips.c (mips_gen_const_int_vector): Use
gen_const_vec_duplicate.
(mips_expand_vector_init): Use CONST0_RTX.
* config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
(define_split): Use gen_const_vec_duplicate.
* config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
(define_split): Use gen_const_vec_duplicate.
* config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
(vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
* config/spu/spu.c (spu_const): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r254292
|
|
This patch is like the earlier GET_MODE_UNIT_SIZE one,
but for precisions rather than sizes. There is one behavioural
change in expand_debug_expr: we shouldn't use lowpart subregs
for non-scalar truncations, since that would just reinterpret
some of the scalars and drop the rest. (This probably doesn't
trigger in practice.) Using TRUNCATE is fine for scalars,
since simplify_gen_unary knows when a subreg can be used.
2017-10-22 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* cfgexpand.c (expand_debug_expr): Use GET_MODE_UNIT_PRECISION.
(expand_debug_source_expr): Likewise.
* combine.c (combine_simplify_rtx): Likewise.
* cse.c (fold_rtx): Likewise.
* optabs.c (expand_float): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r253991
|
|
This patch uses HWI_COMPUTABLE_MODE_P (X) instead of
GET_MODE_PRECISION (X) <= HOST_BITS_PER_WIDE_INT in cases
where X also needs to be a scalar integer.
2017-10-22 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* combine.c (simplify_comparison): Use HWI_COMPUTABLE_MODE_P.
(record_promoted_value): Likewise.
* expr.c (expand_expr_real_2): Likewise.
* ree.c (update_reg_equal_equiv_notes): Likewise.
(combine_set_extension): Likewise.
* rtlanal.c (low_bitmask_len): Likewise.
* simplify-rtx.c (neg_const_int): Likewise.
(simplify_binary_operation_1): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r253990
|
|
This patch uses GET_MODE_UNIT_SIZE instead of GET_MODE_SIZE in
cases where, for compound modes, the mode of the scalar elements
is what matters. E.g. the choice between truncation and extension
is really based on the modes of the consistuent scalars rather
than the mode as a whole.
None of the existing code was wrong. The patch simply makes
things easier when converting to variable-sized modes.
2017-10-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs.c (add_equal_note): Use GET_MODE_UNIT_SIZE.
(widened_mode): Likewise.
(expand_unop): Likewise.
* ree.c (transform_ifelse): Likewise.
(merge_def_and_ext): Likewise.
(combine_reaching_defs): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r253715
|
|
If we have (X&C1)|C2 simplify_binary_operation_1 makes C1 as small as
possible. This makes worse code in common cases like when the AND with
C1 is from a zero-extension. This patch fixes it by removing this
transformation (twice).
PR rtl-optimization/77729
* simplify-rtx.c (simplify_binary_operation_1): Delete the (X&C1)|C2
to (X&(C1&~C2))|C2 transformations.
From-SVN: r253384
|