Age | Commit message | Author | Files | Lines |
|
combine at -O2)
PR rtl-optimization/85925
* rtl.h (word_register_operation_p): New predicate.
* combine.c (record_dead_and_set_regs_1): Only apply specific handling
for WORD_REGISTER_OPERATIONS targets to word_register_operation_p RTX.
* rtlanal.c (nonzero_bits1): Likewise. Adjust a couple of comments.
(num_sign_bit_copies1): Likewise.
From-SVN: r266302
|
|
This makes make_more_copies do what its documentation says, that is,
only make an intermediate pseudo if copying to a pseudo.
This used to regress generated code quality when we didn't keep the original
notes that were on the copy, but since r265582 we do keep them, and only
allowing pseudos is now a win. It also simplifies the code.
* combine.c (make_more_copies): Only make an intermediate copy if the
dest of a move is a pseudo.
From-SVN: r266004
|
|
The code with an intermediate register is perfectly fine, but LRA
apparently cannot handle the resulting code, or perhaps something else
is wrong. In either case, making an extra temporary is not likely to
help here, so let's just skip it.
PR rtl-optimization/87871
* combine.c (make_more_copies): Skip if dest is frame_pointer_rtx.
From-SVN: r265821
|
|
This rewrites most of make_more_copies, in the process fixing a few PRs
and some other bugs, and working around a few target problems. Certain
notes turn out to actually change the meaning of the RTL, so we cannot
drop them; and i386 takes subregs of hard regs.
PR rtl-optimization/87701
PR rtl-optimization/87780
* combine.c (make_more_copies): Rewrite.
From-SVN: r265582
|
|
Jumps are written in RTL as moves to PC. But the latter has no mode,
so we shouldn't try to use it. Since the optimization this routine
performs does not really help for jumps at all, let's just skip it.
PR rtl-optimization/87720
* combine.c (make_more_copies): Skip if the dest is pc_rtx.
From-SVN: r265474
|
|
On most targets every function starts with moves from the parameter
passing (hard) registers into pseudos. Similarly, after every call
there is a move from the return register into a pseudo. These moves
usually combine with later instructions (leaving pretty much the same
instruction, just with a hard reg instead of a pseudo).
This isn't a good idea. Register allocation can get rid of unnecessary
moves just fine, and moving the parameter passing registers into many
later instructions tends to prevent good register allocation. This
patch disallows combining moves from a hard (non-fixed) register.
This also avoids the problem mentioned in PR87600 #c3 (combining hard
registers into inline assembler is problematic).
Because the register move can often be combined with other instructions
*itself*, for example for setting some condition code, this patch adds
extra copies via new pseudos after every copy-from-hard-reg.
On some targets this reduces average code size. On others it increases
it a bit, 0.1% or 0.2% or so. (I tested this on all *-linux targets).
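As a rough illustration of that last step, a minimal sketch of the per-insn
transformation might look like this (illustrative only; the helper name is
made up and the committed make_more_copies differs in detail):
  static void
  add_copy_after_hard_reg_move (rtx_insn *insn)
  {
    /* Only look at simple moves from a non-fixed hard register into
       a pseudo.  */
    rtx set = single_set (insn);
    if (!set)
      return;
    rtx dest = SET_DEST (set);
    rtx src = SET_SRC (set);
    if (!REG_P (dest) || HARD_REGISTER_P (dest)
        || !REG_P (src) || !HARD_REGISTER_P (src)
        || fixed_regs[REGNO (src)])
      return;
    /* Funnel the value through a fresh pseudo, so the copy from the hard
       register stays a plain move on its own.  */
    rtx new_reg = gen_reg_rtx (GET_MODE (dest));
    emit_insn_before (gen_move_insn (new_reg, src), insn);
    SET_SRC (set) = new_reg;
  }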
PR rtl-optimization/87600
* combine.c: Add include of expr.h.
(cant_combine_insn_p): Do not combine moves from any hard non-fixed
register to a pseudo.
(make_more_copies): New function, add a copy to a new pseudo after
the moves from hard registers into pseudos.
(rest_of_handle_combine): Declare rebuild_jump_labels_after_combine
later. Call make_more_copies.
From-SVN: r265398
|
|
This code in try_combine uses the wrong mode. This fails (with RTL
checking) in trunk, but not in any released branches.
PR rtl-optimization/86902
* combine.c (try_combine): When changing the CC mode used, don't change
an unrelated mode in other_insn to that new CC mode.
From-SVN: r264426
|
|
PR rtl-optimization/87065
* combine.c (simplify_if_then_else): Formatting fix.
(if_then_else_cond): Guard MULT optimization with SCALAR_INT_MODE_P
check.
(known_cond): Don't return const_true_rtx for vector modes. Use
CONST0_RTX instead of const0_rtx. Formatting fixes.
* gcc.target/i386/pr87065.c: New test.
From-SVN: r263872
|
|
When combine splits a resulting parallel into its two SETs, it has to
place one at i2, and the other stays at i3. This does not work if the
destination of the SET that will be placed at i2 is modified between
i2 and i3. This patch fixes it.
* combine.c (try_combine): Do not allow splitting a resulting PARALLEL
of two SETs into those two SETs, one to be placed at i2, if that SET's
destination is modified between i2 and i3.
From-SVN: r263776
|
|
gcc/
* alias.c (record_set): Check for clobber high.
* cfgexpand.c (expand_gimple_stmt): Likewise.
* combine-stack-adj.c (single_set_for_csa): Likewise.
* combine.c (find_single_use_1): Likewise.
(set_nonzero_bits_and_sign_copies): Likewise.
(get_combine_src_dest): Likewise.
(is_parallel_of_n_reg_sets): Likewise.
(try_combine): Likewise.
(record_dead_and_set_regs_1): Likewise.
(reg_dead_at_p_1): Likewise.
(reg_dead_at_p): Likewise.
* dce.c (deletable_insn_p): Likewise.
(mark_nonreg_stores_1): Likewise.
(mark_nonreg_stores_2): Likewise.
* df-scan.c (df_find_hard_reg_defs): Likewise.
(df_uses_record): Likewise.
(df_get_call_refs): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* haifa-sched.c (haifa_classify_rtx): Likewise.
* ira-build.c (create_insn_allocnos): Likewise.
* ira-costs.c (scan_one_insn): Likewise.
* ira.c (equiv_init_movable_p): Likewise.
(rtx_moveable_p): Likewise.
(interesting_dest_for_shprep): Likewise.
* jump.c (mark_jump_label_1): Likewise.
* postreload-gcse.c (record_opr_changes): Likewise.
* postreload.c (reload_cse_simplify): Likewise.
(struct reg_use): Add source expr.
(reload_combine): Check for clobber high.
(reload_combine_note_use): Likewise.
(reload_cse_move2add): Likewise.
(move2add_note_store): Likewise.
* print-rtl.c (print_pattern): Likewise.
* recog.c (decode_asm_operands): Likewise.
(store_data_bypass_p): Likewise.
(if_test_bypass_p): Likewise.
* regcprop.c (kill_clobbered_value): Likewise.
(kill_set_value): Likewise.
* reginfo.c (reg_scan_mark_refs): Likewise.
* reload1.c (maybe_fix_stack_asms): Likewise.
(eliminate_regs_1): Likewise.
(elimination_effects): Likewise.
(mark_not_eliminable): Likewise.
(scan_paradoxical_subregs): Likewise.
(forget_old_reloads_1): Likewise.
* reorg.c (find_end_label): Likewise.
(try_merge_delay_insns): Likewise.
(redundant_insn): Likewise.
(own_thread_p): Likewise.
(fill_simple_delay_slots): Likewise.
(fill_slots_from_thread): Likewise.
(dbr_schedule): Likewise.
* resource.c (update_live_status): Likewise.
(mark_referenced_resources): Likewise.
(mark_set_resources): Likewise.
* rtl.c (copy_rtx): Likewise.
* rtlanal.c (reg_referenced_p): Likewise.
(single_set_2): Likewise.
(noop_move_p): Likewise.
(note_stores): Likewise.
* sched-deps.c (sched_analyze_reg): Likewise.
(sched_analyze_insn): Likewise.
From-SVN: r263331
|
|
This patch allows combine to combine two insns into two. This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen. PR85160 is a typical example of code
that it can improve.
This patch does not allow such combinations if either of the original
instructions was a simple move instruction. In those cases combining
the two instructions increases register pressure without improving the
code. With this move test, register pressure no longer increases
noticeably, as far as I can tell.
(At first I also didn't allow either of the resulting insns to be a
move instruction. But that is actually a very good thing to have, as
should have been obvious).
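For reference, the "simple move" test can be as small as checking for a
single SET whose source is already a general operand; a sketch along those
lines (the committed predicate may differ in detail):
  /* An insn is "just a move" if its pattern is one SET whose source
     needs no computation, i.e. is already a valid general operand.  */
  static bool
  is_just_move (rtx_insn *insn)
  {
    rtx set = single_set (insn);
    return set && general_operand (SET_SRC (set), VOIDmode);
  }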
PR rtl-optimization/85160
* combine.c (is_just_move): New function.
(try_combine): Allow combining two instructions into two if neither of
the original instructions was a move.
From-SVN: r263067
|
|
The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode. This is
completely wrong for non-pseudos. For example, as in the PR, a value
in a DImode hard register is set by eight writes to its constituent
QImode parts. The value written to the DImode is not the same as that
written to the lowest-numbered QImode!
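In condition form, the tightened test amounts to something like the sketch
below (illustrative, written against combine's reg_stat bookkeeping; the
helper name is invented):
  /* A cached last_set_value may be reused in a different integer mode
     only for pseudos; a hard register must have been set in MODE itself.  */
  static bool
  last_set_value_usable_p (reg_stat_type *rsp, rtx x, machine_mode mode)
  {
    return (rsp->last_set_value != 0
            && (rsp->last_set_mode == mode
                || (REGNO (x) >= FIRST_PSEUDO_REGISTER
                    && GET_MODE_CLASS (rsp->last_set_mode) == MODE_INT
                    && GET_MODE_CLASS (mode) == MODE_INT)));
  }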
PR rtl-optimization/85805
* combine.c (reg_nonzero_bits_for_combine): Only use the last set
value for hard registers if that was written in the same mode.
From-SVN: r262994
|
|
This patch generalises various places that used hwi rtx accessors so
that they can handle poly_ints instead. In many cases these changes
are by inspection rather than because something had shown them to be
necessary.
2018-06-12 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* poly-int.h (can_div_trunc_p): Add new overload in which all values
are poly_ints.
* alias.c (get_addr): Extend CONST_INT handling to poly_int_rtx_p.
(memrefs_conflict_p): Likewise.
(init_alias_analysis): Likewise.
* cfgexpand.c (expand_debug_expr): Likewise.
* combine.c (combine_simplify_rtx, force_int_to_mode): Likewise.
* cse.c (fold_rtx): Likewise.
* explow.c (adjust_stack, anti_adjust_stack): Likewise.
* expr.c (emit_block_move_hints): Likewise.
(clear_storage_hints, push_block, emit_push_insn): Likewise.
(store_expr_with_bounds, reduce_to_bit_field_precision): Likewise.
(emit_group_load_1): Use rtx_to_poly_int64 for group offsets.
(emit_group_store): Likewise.
(find_args_size_adjust): Use strip_offset. Use rtx_to_poly_int64
to read the PRE/POST_MODIFY increment.
* calls.c (store_one_arg): Use strip_offset.
* rtlanal.c (rtx_addr_can_trap_p_1): Extend CONST_INT handling to
poly_int_rtx_p.
(set_noop_p): Use rtx_to_poly_int64 for the elements selected
by a VEC_SELECT.
* simplify-rtx.c (avoid_constant_pool_reference): Use strip_offset.
(simplify_binary_operation_1): Extend CONST_INT handling to
poly_int_rtx_p.
* var-tracking.c (compute_cfa_pointer): Take a poly_int64 rather
than a HOST_WIDE_INT.
(hard_frame_pointer_adjustment): Change from HOST_WIDE_INT to
poly_int64.
(adjust_mems, add_stores): Update accordingly.
(vt_canonicalize_addr): Track polynomial offsets.
(emit_note_insn_var_location): Likewise.
(vt_add_function_parameter): Likewise.
(vt_initialize): Likewise.
From-SVN: r261530
|
|
simplify-rtx.c:895)
PR rtl-optimization/85300
* combine.c (subst): Handle subst of CONST_SCALAR_INT_P new_rtx also
into FLOAT and UNSIGNED_FLOAT like ZERO_EXTEND, return a CLOBBER if
simplify_unary_operation fails.
* gcc.dg/pr85300.c: New test.
From-SVN: r259285
|
|
distribute_links tries to place a log_link for whatever the destination
of the modified instruction is. It shouldn't do that when that dest
is pc_rtx, which isn't actually a register.
* combine.c (distribute_links): Don't make a link based on pc_rtx.
From-SVN: r258523
|
|
There still are situations where we have stale LOG_LINKS. This causes
combine to try two-insn combinations I2->I3 where the register set by
I2 is used before I3 as well. Not good.
This patch fixes it by checking for this situation in can_combine_p
(similar to what we already do for three and four insn combinations).
From-SVN: r258452
|
|
rhs_regno, at rtl.h:1896 with -O -fno-forward-propagate)
PR target/84710
* combine.c (try_combine): Use reg_or_subregno instead of handling
just paradoxical SUBREGs and REGs.
* gcc.dg/pr84710.c: New test.
From-SVN: r258301
|
|
PR target/84700
* combine.c (combine_simplify_rtx): Don't try to simplify if
if_then_else_cond returned non-NULL, but either true_rtx or false_rtx
is equal to x.
* gcc.target/powerpc/pr84700.c: New test.
From-SVN: r258263
|
|
-fno-split-wide-types -g3 --param=max-combine-insns=3)
PR target/84614
* rtl.h (prev_real_nondebug_insn, next_real_nondebug_insn): New
prototypes.
* emit-rtl.c (next_real_insn, prev_real_insn): Fix up function
comments.
(next_real_nondebug_insn, prev_real_nondebug_insn): New functions.
* cfgcleanup.c (try_head_merge_bb): Use prev_real_nondebug_insn
instead of a loop around prev_real_insn.
* combine.c (move_deaths): Use prev_real_nondebug_insn instead of
prev_real_insn.
* gcc.dg/pr84614.c: New test.
From-SVN: r258129
|
|
As Jakub found, after my recent combine patch problems show up, at least
on x86, with RTL checking enabled. This is because the I2 generated
by a successful instruction combination can write not only a register
but also a paradoxical subreg of one.
This fixes it.
* combine.c (try_combine): When adjusting LOG_LINKS for the destination
that moved to I2, also allow destinations that are a paradoxical
subreg (instead of a normal reg).
From-SVN: r257736
|
|
If there is a LOG_LINK between two insns, this means those two insns
can be combined, as far as dataflow is concerned. There never should
be a LOG_LINK between two unrelated insns. If there is one, combine
will try to combine the insns without doing all the needed checks
(whether the earlier destination is used before the later insn, etc.).
Unfortunately we do not update the LOG_LINKs correctly in some cases.
This patch fixes at least some of those cases.
PR rtl-optimization/84169
* combine.c (try_combine): New variable split_i2i3. Set it to true if
we generated a parallel as new i3 and we split that to new i2 and i3
instructions. Handle split_i2i3 similar to swap_i2i3: scan the
LOG_LINKs of i3 to see which of those need to link to i2 now. Link
those to i2, not i1. Partially rewrite this scan code.
gcc/testsuite/
PR rtl-optimization/84169
* gcc.c-torture/execute/pr84169.c: New.
From-SVN: r257644
|
|
have 'lshiftrt')
PR rtl-optimization/84157
* combine.c (change_zero_ext): Use REG_P predicate in
front of HARD_REGISTER_P predicate.
From-SVN: r257302
|
|
emit-rtl.c:908, alpha linux.)
PR rtl-optimization/84123
* combine.c (change_zero_ext): Check if hard register satisfies
can_change_dest_mode before calling gen_lowpart_SUBREG.
From-SVN: r257270
|
|
sign-extended load)
PR rtl-optimization/84071
* combine.c (record_dead_and_set_regs_1): Record the source unmodified
for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.
From-SVN: r257224
|
|
on alpha)
PR target/83628
* combine.c (force_int_to_mode) <case ASHIFT>: Use mode instead of
op_mode in the force_to_mode call.
From-SVN: r256387
|
|
This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_size): Change from unsigned short to
poly_uint16_pod.
(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(fixed_size_mode::includes_p): Check for constant-sized modes.
* genmodes.c (emit_mode_size_inline): Make mode_size_inline
return a poly_uint16 rather than an unsigned short.
(emit_mode_size): Change the type of mode_size from unsigned short
to poly_uint16_pod. Use ZERO_COEFFS for the initializer.
(emit_mode_adjustments): Cope with polynomial vector sizes.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_SIZE.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_SIZE.
* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
* caller-save.c (setup_save_areas): Likewise.
(replace_reg_with_saved_mem): Likewise.
* calls.c (emit_library_call_value_1): Likewise.
* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
(gen_lowpart_for_combine): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (equiv_constant, cse_insn): Likewise.
* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
(cselib_subst_to_values): Likewise.
* dce.c (word_dce_process_block): Likewise.
* df-problems.c (df_word_lr_mark_ref): Likewise.
* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
(rtl_for_decl_location): Likewise.
* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
(expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
(pad_below): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-build.c (ira_create_allocno_objects): Likewise.
* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
(ira_sort_regnos_for_alter_reg): Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
(resolve_simple_move): Likewise.
* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
(process_addr_reg, simplify_operand_subreg, curr_insn_transform)
(lra_constraints): Likewise.
(CONST_POOL_OK_P): Reject variable-sized modes.
* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
(add_pseudo_to_slot, lra_spill): Likewise.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* optabs-query.c (get_best_extraction_insn): Likewise.
* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
* optabs.c (expand_vec_perm_var, expand_vec_cond_expr): Likewise.
(expand_mult_highpart, valid_multiword_target_p): Likewise.
* recog.c (offsettable_address_addr_space_p): Likewise.
* regcprop.c (maybe_mode_change): Likewise.
* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
* regrename.c (build_def_use): Likewise.
* regstat.c (dump_reg_info): Likewise.
* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
(find_reloads, find_reloads_subreg_address): Likewise.
* reload1.c (eliminate_regs_1): Likewise.
* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
(simplify_binary_operation_1, simplify_subreg): Likewise.
* targhooks.c (default_function_arg_padding): Likewise.
(default_hard_regno_nregs, default_class_max_nregs): Likewise.
* tree-cfg.c (verify_gimple_assign_binary): Likewise.
(verify_gimple_assign_ternary): Likewise.
* tree-inline.c (estimate_move_cost): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
(get_address_cost_ainc): Likewise.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(vectorizable_reduction): Likewise.
* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
(vectorizable_operation, vectorizable_load): Likewise.
* tree.c (build_same_sized_truth_vector_type): Likewise.
* valtrack.c (cleanup_auto_inc_dec): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
(ADDR_VEC_ALIGN): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256201
|
|
This patch changes GET_MODE_BITSIZE from an unsigned short
to a poly_uint16.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_to_bits): Return a poly_uint16 rather than an
unsigned short.
(GET_MODE_BITSIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is polynomial.
* calls.c (shift_return_value): Treat GET_MODE_BITSIZE as polynomial.
* combine.c (make_extraction): Likewise.
* dse.c (find_shift_sequence): Likewise.
* dwarf2out.c (mem_loc_descriptor): Likewise.
* expmed.c (store_integral_bit_field, extract_bit_field_1): Likewise.
(extract_bit_field, extract_low_bits): Likewise.
* expr.c (convert_move, convert_modes, emit_move_insn_1): Likewise.
(optimize_bitfield_assignment_op, expand_assignment): Likewise.
(store_expr_with_bounds, store_field, expand_expr_real_1): Likewise.
* fold-const.c (optimize_bit_field_compare, merge_ranges): Likewise.
* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
* reload.c (find_reloads): Likewise.
* reload1.c (alter_reg): Likewise.
* stor-layout.c (bitwise_mode_for_mode, compute_record_mode): Likewise.
* targhooks.c (default_secondary_memory_needed_mode): Likewise.
* tree-if-conv.c (predicate_mem_writes): Likewise.
* tree-ssa-strlen.c (handle_builtin_memcmp): Likewise.
* tree-vect-patterns.c (adjust_bool_pattern): Likewise.
* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
* valtrack.c (dead_debug_insert_temp): Likewise.
* varasm.c (mergeable_constant_section): Likewise.
* config/sh/sh.h (LOCAL_ALIGNMENT): Use as_a <fixed_size_mode>.
gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_BITSIZE
as polynomial.
gcc/c-family/
* c-ubsan.c (ubsan_instrument_shift): Treat GET_MODE_BITSIZE
as polynomial.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256200
|
|
This patch changes GET_MODE_PRECISION from an unsigned short
to a poly_uint16.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (mode_precision): Change from unsigned short to
poly_uint16_pod.
(mode_to_precision): Return a poly_uint16 rather than an unsigned
short.
(GET_MODE_PRECISION): Return a constant if ONLY_FIXED_SIZE_MODES,
or if measurement_type is not polynomial.
(HWI_COMPUTABLE_MODE_P): Turn into a function. Optimize the case
in which the mode is already known to be a scalar_int_mode.
* genmodes.c (emit_mode_precision): Change the type of mode_precision
from unsigned short to poly_uint16_pod. Use ZERO_COEFFS for the
initializer.
* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
for GET_MODE_PRECISION.
* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
for GET_MODE_PRECISION.
* combine.c (update_rsp_from_reg_equal): Treat GET_MODE_PRECISION
as polynomial.
(try_combine, find_split_point, combine_simplify_rtx): Likewise.
(expand_field_assignment, make_extraction): Likewise.
(make_compound_operation_int, record_dead_and_set_regs_1): Likewise.
(get_last_value): Likewise.
* convert.c (convert_to_integer_1): Likewise.
* cse.c (cse_insn): Likewise.
* expr.c (expand_expr_real_1): Likewise.
* lra-constraints.c (simplify_operand_subreg): Likewise.
* optabs-query.c (can_atomic_load_p): Likewise.
* optabs.c (expand_atomic_load): Likewise.
(expand_atomic_store): Likewise.
* ree.c (combine_reaching_defs): Likewise.
* rtl.h (partial_subreg_p, paradoxical_subreg_p): Likewise.
* rtlanal.c (nonzero_bits1, lsb_bitfield_op_p): Likewise.
* tree.h (type_has_mode_precision_p): Likewise.
* ubsan.c (instrument_si_overflow): Likewise.
gcc/ada/
* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_PRECISION
as polynomial.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256198
|
|
From-SVN: r256169
|
|
This patch makes target-independent code that uses REGMODE_NATURAL_SIZE
treat it as a poly_int rather than a constant.
2018-01-03 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* combine.c (can_change_dest_mode): Handle polynomial
REGMODE_NATURAL_SIZE.
* expmed.c (store_bit_field_1): Likewise.
* expr.c (store_constructor): Likewise.
* emit-rtl.c (validate_subreg): Operate on polynomial mode sizes
and polynomial REGMODE_NATURAL_SIZE.
(gen_lowpart_common): Likewise.
* reginfo.c (record_subregs_of_mode): Likewise.
* rtlanal.c (read_modify_subreg_p): Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256149
|
|
scalar int mode
gcc/
* combine.c (simplify_set): Do not transform subregs to zero_extends
if the destination is not a scalar int mode.
From-SVN: r255945
|
|
This patch adds new utility functions for manipulating REG_ARGS_SIZE
notes and allows the notes to carry polynomial as well as constant sizes.
The code was inconsistent about whether INT_MIN or HOST_WIDE_INT_MIN
should be used to represent an unknown size. The patch uses
HOST_WIDE_INT_MIN throughout.
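Sketches of the two new helpers as one might write them from this
description (assuming a REG_ARGS_SIZE note is present; the committed bodies
may add checking asserts):
  poly_int64
  get_args_size (const_rtx x)
  {
    /* Read the argument-size adjustment recorded on insn X.  */
    rtx note = find_reg_note (x, REG_ARGS_SIZE, NULL_RTX);
    return rtx_to_poly_int64 (XEXP (note, 0));
  }

  void
  add_args_size_note (rtx_insn *insn, poly_int64 size)
  {
    /* Record SIZE as the REG_ARGS_SIZE adjustment of INSN.  */
    add_reg_note (insn, REG_ARGS_SIZE, gen_int_mode (size, Pmode));
  }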
2017-12-21 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtl.h (get_args_size, add_args_size_note): New functions.
(find_args_size_adjust): Return a poly_int64 rather than a
HOST_WIDE_INT.
(fixup_args_size_notes): Likewise. Make the same change to the
end_args_size parameter.
* rtlanal.c (get_args_size, add_args_size_note): New functions.
* builtins.c (expand_builtin_trap): Use add_args_size_note.
* calls.c (emit_call_1): Likewise.
* explow.c (adjust_stack_1): Likewise.
* cfgcleanup.c (old_insns_match_p): Update use of
find_args_size_adjust.
* combine.c (distribute_notes): Track polynomial arg sizes.
* dwarf2cfi.c (dw_trace_info): Change beg_true_args_size,
end_true_args_size, beg_delay_args_size and end_delay_args_size
from HOST_WIDE_INT to poly_int64.
(add_cfi_args_size): Take the args_size as a poly_int64 rather
than a HOST_WIDE_INT.
(notice_args_size, notice_eh_throw, maybe_record_trace_start)
(maybe_record_trace_start_abnormal, scan_trace, connect_traces): Track
polynomial arg sizes.
* emit-rtl.c (try_split): Use get_args_size.
* recog.c (peep2_attempt): Likewise.
* reload1.c (reload_as_needed): Likewise.
* expr.c (find_args_size_adjust): Return the adjustment as a
poly_int64 rather than a HOST_WIDE_INT.
(fixup_args_size_notes): Change end_args_size from a HOST_WIDE_INT
to a poly_int64 and change the return type in the same way.
(emit_single_push_insn): Track polynomial arg sizes.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255919
|
|
This patch changes SUBREG_BYTE from an int to a poly_int.
Since valid SUBREG_BYTEs must be contained within the mode of the
SUBREG_REG, the required range is the same as for GET_MODE_SIZE,
i.e. unsigned short. The patch therefore uses poly_uint16(_pod)
for the SUBREG_BYTE.
Using poly_uint16_pod rtx fields requires a new field code ('p').
Since there are no other uses of 'p' besides SUBREG_BYTE, the patch
doesn't add an XPOLY or whatever; all uses should go via SUBREG_BYTE
instead.
The patch doesn't bother implementing 'p' support for legacy
define_peepholes, since none of the remaining ones have subregs
in their patterns.
As it happened, the rtl documentation used SUBREG as an example of a
code with mixed field types, accessed via XEXP (x, 0) and XINT (x, 1).
Since there's no direct replacement for XINT, and since people should
never use it even if there were, the patch changes the example to use
INT_LIST instead.
The patch also changes subreg-related helper functions so that they too
take and return polynomial offsets. This makes the patch quite big, but
it's mostly mechanical. The patch generally sticks to existing choices
wrt signedness.
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi: Update documentation of SUBREG_BYTE. Document the
'p' format code. Use INT_LIST rather than SUBREG as the example of
a code with an XINT and an XEXP. Remove the implication that
accessing an rtx field using XINT is expected to work.
* rtl.def (SUBREG): Change format from "ei" to "ep".
* rtl.h (rtunion::rt_subreg): New field.
(XCSUBREG): New macro.
(SUBREG_BYTE): Use it.
(subreg_shape): Change offset from an unsigned int to a poly_uint16.
Update constructor accordingly.
(subreg_shape::operator ==): Update accordingly.
(subreg_shape::unique_id): Return an unsigned HOST_WIDE_INT rather
than an unsigned int.
(subreg_lsb, subreg_lowpart_offset, subreg_highpart_offset): Return
a poly_uint64 rather than an unsigned int.
(subreg_lsb_1): Likewise. Take the offset as a poly_uint64 rather
than an unsigned int.
(subreg_size_offset_from_lsb, subreg_size_lowpart_offset)
(subreg_size_highpart_offset): Return a poly_uint64 rather than
an unsigned int. Take the sizes as poly_uint64s.
(subreg_offset_from_lsb): Return a poly_uint64 rather than
an unsigned int. Take the shift as a poly_uint64 rather than
an unsigned int.
(subreg_regno_offset, subreg_offset_representable_p): Take the offset
as a poly_uint64 rather than an unsigned int.
(simplify_subreg_regno): Likewise.
(byte_lowpart_offset): Return the memory offset as a poly_int64
rather than an int.
(subreg_memory_offset): Likewise. Take the subreg offset as a
poly_uint64 rather than an unsigned int.
(simplify_subreg, simplify_gen_subreg, subreg_get_info)
(gen_rtx_SUBREG, validate_subreg): Take the subreg offset as a
poly_uint64 rather than an unsigned int.
* rtl.c (rtx_format): Describe 'p' in comment.
(copy_rtx, rtx_equal_p_cb, rtx_equal_p): Handle 'p'.
* emit-rtl.c (validate_subreg, gen_rtx_SUBREG): Take the subreg
offset as a poly_uint64 rather than an unsigned int.
(byte_lowpart_offset): Return the memory offset as a poly_int64
rather than an int.
(subreg_memory_offset): Likewise. Take the subreg offset as a
poly_uint64 rather than an unsigned int.
(subreg_size_lowpart_offset, subreg_size_highpart_offset): Take the
mode sizes as poly_uint64s rather than unsigned ints. Return a
poly_uint64 rather than an unsigned int.
(subreg_lowpart_p): Treat subreg offsets as poly_ints.
(copy_insn_1): Handle 'p'.
* rtlanal.c (set_noop_p): Treat subreg offsets as poly_uint64s.
(subreg_lsb_1): Take the subreg offset as a poly_uint64 rather than
an unsigned int. Return the shift in the same way.
(subreg_lsb): Return the shift as a poly_uint64 rather than an
unsigned int.
(subreg_size_offset_from_lsb): Take the sizes and shift as
poly_uint64s rather than unsigned ints. Return the offset as
a poly_uint64.
(subreg_get_info, subreg_regno_offset, subreg_offset_representable_p)
(simplify_subreg_regno): Take the offset as a poly_uint64 rather than
an unsigned int.
* rtlhash.c (add_rtx): Handle 'p'.
* genemit.c (gen_exp): Likewise.
* gengenrtl.c (type_from_format, gendef): Likewise.
* gensupport.c (subst_pattern_match, get_alternatives_number)
(collect_insn_data, alter_predicate_for_insn, alter_constraints)
(subst_dup): Likewise.
* gengtype.c (adjust_field_rtx_def): Likewise.
* genrecog.c (find_operand, find_matching_operand, validate_pattern)
(match_pattern_2): Likewise.
(rtx_test::SUBREG_FIELD): New rtx_test::kind_enum.
(rtx_test::subreg_field): New function.
(operator ==, safe_to_hoist_p, transition_parameter_type)
(print_nonbool_test, print_test): Handle SUBREG_FIELD.
* genattrtab.c (attr_rtx_1): Say that 'p' is deliberately not handled.
* genpeep.c (match_rtx): Likewise.
* print-rtl.c (print_poly_int): Include if GENERATOR_FILE too.
(rtx_writer::print_rtx_operand): Handle 'p'.
(print_value): Handle SUBREG.
* read-rtl.c (apply_int_iterator): Likewise.
(rtx_reader::read_rtx_operand): Handle 'p'.
* alias.c (rtx_equal_for_memref_p): Likewise.
* cselib.c (rtx_equal_for_cselib_1, cselib_hash_rtx): Likewise.
* caller-save.c (replace_reg_with_saved_mem): Treat subreg offsets
as poly_ints.
* calls.c (expand_call): Likewise.
* combine.c (combine_simplify_rtx, expand_field_assignment): Likewise.
(make_extraction, gen_lowpart_for_combine): Likewise.
* loop-invariant.c (hash_invariant_expr_1, invariant_expr_equal_p):
Likewise.
* cse.c (remove_invalid_subreg_refs): Take the offset as a poly_uint64
rather than an unsigned int. Treat subreg offsets as poly_ints.
(exp_equiv_p): Handle 'p'.
(hash_rtx_cb): Likewise. Treat subreg offsets as poly_ints.
(equiv_constant, cse_insn): Treat subreg offsets as poly_ints.
* dse.c (find_shift_sequence): Likewise.
* dwarf2out.c (rtl_for_decl_location): Likewise.
* expmed.c (extract_low_bits): Likewise.
* expr.c (emit_group_store, undefined_operand_subword_p): Likewise.
(expand_expr_real_2): Likewise.
* final.c (alter_subreg): Likewise.
(leaf_renumber_regs_insn): Handle 'p'.
* function.c (assign_parm_find_stack_rtl, assign_parm_setup_stack):
Treat subreg offsets as poly_ints.
* fwprop.c (forward_propagate_and_simplify): Likewise.
* ifcvt.c (noce_emit_move_insn, noce_emit_cmove): Likewise.
* ira.c (get_subreg_tracking_sizes): Likewise.
* ira-conflicts.c (go_through_subreg): Likewise.
* ira-lives.c (process_single_reg_class_operands): Likewise.
* jump.c (rtx_renumbered_equal_p): Likewise. Handle 'p'.
* lower-subreg.c (simplify_subreg_concatn): Take the subreg offset
as a poly_uint64 rather than an unsigned int.
(simplify_gen_subreg_concatn, resolve_simple_move): Treat
subreg offsets as poly_ints.
* lra-constraints.c (operands_match_p): Handle 'p'.
(match_reload, curr_insn_transform): Treat subreg offsets as poly_ints.
* lra-spills.c (assign_mem_slot): Likewise.
* postreload.c (move2add_valid_value_p): Likewise.
* recog.c (general_operand, indirect_operand): Likewise.
* regcprop.c (copy_value, maybe_mode_change): Likewise.
(copyprop_hardreg_forward_1): Likewise.
* reginfo.c (simplifiable_subregs_hasher::hash, simplifiable_subregs)
(record_subregs_of_mode): Likewise.
* rtlhooks.c (gen_lowpart_general, gen_lowpart_if_possible): Likewise.
* reload.c (operands_match_p): Handle 'p'.
(find_reloads_subreg_address): Treat subreg offsets as poly_ints.
* reload1.c (alter_reg, choose_reload_regs): Likewise.
(compute_reload_subreg_offset): Likewise, and return a poly_int64.
* simplify-rtx.c (simplify_truncation, simplify_binary_operation_1)
(test_vector_ops_duplicate): Treat subreg offsets as poly_ints.
(simplify_const_poly_int_tests<N>::run): Likewise.
(simplify_subreg, simplify_gen_subreg): Take the subreg offset as
a poly_uint64 rather than an unsigned int.
* valtrack.c (debug_lowpart_subreg): Likewise.
* var-tracking.c (var_lowpart): Likewise.
(loc_cmp): Handle 'p'.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255882
|
|
This patch adds a helper routine that constructs rtxes
for constant shift amounts, given the mode of the value
being shifted. As well as helping with the SVE patches, this
is one step towards allowing CONST_INTs to have a real mode.
One long-standing problem has been to decide what the mode
of a shift count should be for arbitrary rtxes (as opposed to those
directly tied to a target pattern). Realistic choices would be
the mode of the shifted elements, word_mode, QImode, a 64-bit mode,
or the same mode as the shift optabs (in which case what should the
mode be when the target doesn't have a pattern?)
For now the patch picks a 64-bit mode, but with a ??? comment.
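A sketch of such a helper under the 64-bit choice described above
(illustrative; the committed version may differ in detail):
  rtx
  gen_int_shift_amount (machine_mode, poly_int64 value)
  {
    /* Express the shift count in a wide scalar integer mode so the
       amount is never truncated; which mode is "right" stays open.  */
    scalar_int_mode shift_mode = int_mode_for_size (64, 0).require ();
    return gen_int_mode (value, shift_mode);
  }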
2017-12-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN_INT.
* calls.c (shift_return_value): Likewise.
* cse.c (fold_rtx): Likewise.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
(expand_shift, expand_smod_pow2): Likewise.
* lower-subreg.c (shift_cost): Likewise.
* optabs.c (expand_superword_shift, expand_doubleword_mult)
(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
(expand_vec_perm_var): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
* combine.c (try_combine, find_split_point, force_int_to_mode)
(simplify_shift_const_1, simplify_shift_const): Likewise.
(change_zero_ext): Likewise. Use simplify_gen_binary.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255861
|
|
conditions.
* read-rtl.c (parse_reg_note_name): Replace Yoda conditions with
typical order conditions.
* sel-sched.c (extract_new_fences_from): Likewise.
* config/visium/constraints.md (J, K, L): Likewise.
* config/visium/predicates.md (const_shift_operand): Likewise.
* config/visium/visium.c (visium_legitimize_address,
visium_legitimize_reload_address): Likewise.
* config/m68k/m68k.c (output_reg_adjust, emit_reg_adjust): Likewise.
* config/arm/arm.c (arm_block_move_unaligned_straight): Likewise.
* config/avr/constraints.md (Y01, Ym1, Y02, Ym2): Likewise.
* config/avr/avr-log.c (avr_vdump, avr_log_set_avr_log,
SET_DUMP_DETAIL): Likewise.
* config/avr/predicates.md (const_8_16_24_operand): Likewise.
* config/avr/avr.c (STR_PREFIX_P, avr_popcount_each_byte,
avr_is_casesi_sequence, avr_casei_sequence_check_operands,
avr_set_core_architecture, avr_set_current_function,
avr_legitimize_reload_address, avr_asm_len, avr_print_operand,
output_movqi, output_movsisf, avr_out_plus, avr_out_bitop,
avr_out_fract, avr_adjust_insn_length, avr_encode_section_info,
avr_2word_insn_p, output_reload_in_const, avr_has_nibble_0xf,
avr_map_decompose, avr_fold_builtin): Likewise.
* config/avr/driver-avr.c (avr_devicespecs_file): Likewise.
* config/avr/gen-avr-mmcu-specs.c (str_prefix_p, print_mcu): Likewise.
* config/i386/i386.c (ix86_parse_stringop_strategy_string): Likewise.
* config/m32c/m32c-pragma.c (m32c_pragma_memregs): Likewise.
* config/m32c/m32c.c (m32c_conditional_register_usage,
m32c_address_cost): Likewise.
* config/m32c/predicates.md (shiftcount_operand,
longshiftcount_operand): Likewise.
* config/iq2000/iq2000.c (iq2000_expand_prologue): Likewise.
* config/nios2/nios2.c (nios2_handle_custom_fpu_insn_option,
can_use_cdx_ldstw): Likewise.
* config/nios2/nios2.h (CDX_REG_P): Likewise.
* config/cr16/cr16.h (RETURN_ADDR_RTX, REGNO_MODE_OK_FOR_BASE_P):
Likewise.
* config/cr16/cr16.md (*mov<mode>_double): Likewise.
* config/cr16/cr16.c (cr16_create_dwarf_for_multi_push): Likewise.
* config/h8300/h8300.c (h8300_rtx_costs, get_shift_alg): Likewise.
* config/vax/constraints.md (U06, U08, U16, CN6, S08, S16): Likewise.
* config/vax/vax.c (adjacent_operands_p): Likewise.
* config/ft32/constraints.md (L, b, KA): Likewise.
* config/ft32/ft32.c (ft32_load_immediate, ft32_expand_prologue):
Likewise.
* cfgexpand.c (expand_stack_alignment): Likewise.
* gcse.c (insert_expr_in_table): Likewise.
* print-rtl.c (rtx_writer::print_rtx_operand_codes_E_and_V): Likewise.
* cgraphunit.c (cgraph_node::expand): Likewise.
* ira-build.c (setup_min_max_allocno_live_range_point): Likewise.
* emit-rtl.c (add_insn): Likewise.
* input.c (dump_location_info): Likewise.
* passes.c (NEXT_PASS): Likewise.
* read-rtl-function.c (parse_note_insn_name,
function_reader::read_rtx_operand_r, function_reader::parse_mem_expr):
Likewise.
* sched-rgn.c (sched_rgn_init): Likewise.
* diagnostic-show-locus.c (layout::show_ruler): Likewise.
* combine.c (find_split_point, simplify_if_then_else, force_to_mode,
if_then_else_cond, simplify_shift_const_1, simplify_comparison): Likewise.
* explow.c (eliminate_constant_term): Likewise.
* final.c (leaf_renumber_regs_insn): Likewise.
* cfgrtl.c (print_rtl_with_bb): Likewise.
* genhooks.c (emit_init_macros): Likewise.
* poly-int.h (maybe_ne, maybe_le, maybe_lt): Likewise.
* tree-data-ref.c (conflict_fn): Likewise.
* selftest.c (assert_streq): Likewise.
* expr.c (store_constructor_field, expand_expr_real_1): Likewise.
* fold-const.c (fold_range_test, extract_muldiv_1, fold_truth_andor,
fold_binary_loc, multiple_of_p): Likewise.
* reload.c (push_reload, find_equiv_reg): Likewise.
* et-forest.c (et_nca, et_below): Likewise.
* dbxout.c (dbxout_symbol_location): Likewise.
* reorg.c (relax_delay_slots): Likewise.
* dojump.c (do_compare_rtx_and_jump): Likewise.
* gengtype-parse.c (type): Likewise.
* simplify-rtx.c (simplify_gen_ternary, simplify_gen_relational,
simplify_const_relational_operation): Likewise.
* reload1.c (do_output_reload): Likewise.
* dumpfile.c (get_dump_file_info_by_switch): Likewise.
* gengtype.c (type_for_name): Likewise.
* gimple-ssa-sprintf.c (format_directive): Likewise.
ada/
* gcc-interface/trans.c (Loop_Statement_to_gnu): Replace Yoda
conditions with typical order conditions.
* gcc-interface/misc.c (gnat_get_array_descr_info,
default_pass_by_ref): Likewise.
* gcc-interface/decl.c (gnat_to_gnu_entity): Likewise.
* adaint.c (__gnat_tmp_name): Likewise.
c-family/
* known-headers.cc (get_stdlib_header_for_name): Replace Yoda
conditions with typical order conditions.
c/
* c-typeck.c (comptypes_internal, function_types_compatible_p,
perform_integral_promotions, digest_init): Replace Yoda conditions
with typical order conditions.
* c-decl.c (check_bitfield_type_and_width): Likewise.
cp/
* name-lookup.c (get_std_name_hint): Replace Yoda conditions with
typical order conditions.
* class.c (check_bitfield_decl): Likewise.
* pt.c (convert_template_argument): Likewise.
* decl.c (duplicate_decls): Likewise.
* typeck.c (commonparms): Likewise.
fortran/
* scanner.c (preprocessor_line): Replace Yoda conditions with typical
order conditions.
* dependency.c (check_section_vs_section): Likewise.
* trans-array.c (gfc_conv_expr_descriptor): Likewise.
jit/
* jit-playback.c (get_type, playback::compile_to_file::copy_file,
playback::context::acquire_mutex): Replace Yoda conditions with
typical order conditions.
* libgccjit.c (gcc_jit_context_new_struct_type,
gcc_jit_struct_set_fields, gcc_jit_context_new_union_type,
gcc_jit_context_new_function, gcc_jit_timer_pop): Likewise.
* jit-builtins.c (matches_builtin): Likewise.
* jit-recording.c (recording::compound_type::set_fields,
recording::fields::write_reproducer, recording::rvalue::set_scope,
recording::function::validate): Likewise.
* jit-logging.c (logger::decref): Likewise.
From-SVN: r255831
|
|
From-SVN: r255746
|
|
This patch adds a helper routine that constructs rtxes
for constant shift amounts, given the mode of the value
being shifted. As well as helping with the SVE patches, this
is one step towards allowing CONST_INTs to have a real mode.
One long-standing problem has been to decide what the mode
of a shift count should be for arbitrary rtxes (as opposed to those
directly tied to a target pattern). Realistic choices would be
the mode of the shifted elements, word_mode, QImode, or the same
mode as the shift optabs (in which case what should the mode
be when the target doesn't have a pattern?)
For now the patch picks the mode of the shifted elements,
but with a ??? comment.
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN_INT.
* calls.c (shift_return_value): Likewise.
* cse.c (fold_rtx): Likewise.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
(expand_shift, expand_smod_pow2): Likewise.
* lower-subreg.c (shift_cost): Likewise.
* optabs.c (expand_superword_shift, expand_doubleword_mult)
(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
(expand_vec_perm_var): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
* combine.c (try_combine, find_split_point, force_int_to_mode)
(simplify_shift_const_1, simplify_shift_const): Likewise.
(change_zero_ext): Likewise. Use simplify_gen_binary.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r255745
|
|
In move_deaths we move a REG_DEAD note if the instruction combination
has extended the lifetime of a register so that the existing note is
no longer valid. We find that note using reg_stat, but what that finds
can refer to a later insn. If so, we cannot use the cached value; this
patch makes us search for the proper insn instead.
PR rtl-optimization/83393
* combine.c (move_deaths): If reg_stat points to a too new insn in
last_death, do not use it: find the proper insn instead.
gcc/testsuite/
PR rtl-optimization/83393
* gcc.dg/pr83393.c: New testcase.
From-SVN: r255606
|
|
The code in simplify_set that handles transforming the paradoxical subreg
expression:
(set FOO (subreg:M (mem:N BAR) 0))
into:
(set FOO (zero_extend:M (mem:N BAR)))
does not consider the case where M is a vector mode, allowing it to
construct (for example):
(zero_extend:V4SI (mem:SI))
For one, this has the wrong semantics - but fortunately we fail long
before then, in expand_compound_operation.
We need to explicitly reject vector modes from this transformation.
gcc/
* combine.c (simplify_set): Do not transform subregs to zero_extends
if the destination mode is a vector mode.
From-SVN: r255578
|
|
This patch introduces a number of new macros and functions that will
be used to distinguish between different kinds of debug stmts, insns
and notes, namely, preexisting debug bind ones and to-be-introduced
nonbind markers.
In a seemingly mechanical way, it adjusts several uses of the macros
and functions, so that they refer to narrower categories when
appropriate.
These changes, by themselves, should not have any visible effect in
the compiler behavior, since the upcoming debug markers are never
created with this patch alone.
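As a concrete hint of the split, the bind/marker predicates amount to
checking whether a debug insn's pattern is a VAR_LOCATION; a sketch (the
committed macros may differ in detail):
  /* A debug insn either binds a user variable to a location (its pattern
     is a VAR_LOCATION) or is a nonbind marker such as a begin-stmt
     marker.  */
  #define DEBUG_BIND_INSN_P(X) \
    (DEBUG_INSN_P (X) && GET_CODE (PATTERN (X)) == VAR_LOCATION)
  #define DEBUG_MARKER_INSN_P(X) \
    (DEBUG_INSN_P (X) && GET_CODE (PATTERN (X)) != VAR_LOCATION)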
for gcc/ChangeLog
* gimple.h (enum gimple_debug_subcode): Add
GIMPLE_DEBUG_BEGIN_STMT.
(gimple_debug_begin_stmt_p): New.
(gimple_debug_nonbind_marker_p): New.
* tree.h (MAY_HAVE_DEBUG_MARKER_STMTS): New.
(MAY_HAVE_DEBUG_BIND_STMTS): Renamed from....
(MAY_HAVE_DEBUG_STMTS): ... this. Check both.
* insn-notes.def (BEGIN_STMT): New.
* rtl.h (MAY_HAVE_DEBUG_MARKER_INSNS): New.
(MAY_HAVE_DEBUG_BIND_INSNS): Renamed from....
(MAY_HAVE_DEBUG_INSNS): ... this. Check both.
(NOTE_MARKER_LOCATION, NOTE_MARKER_P): New.
(DEBUG_BIND_INSN_P, DEBUG_MARKER_INSN_P): New.
(INSN_DEBUG_MARKER_KIND): New.
(GEN_RTX_DEBUG_MARKER_BEGIN_STMT_PAT): New.
(INSN_VAR_LOCATION): Check for VAR_LOCATION.
(INSN_VAR_LOCATION_PTR): New.
* cfgexpand.c (expand_debug_locations): Handle debug bind insns
only.
(expand_gimple_basic_block): Likewise. Emit debug temps for TER
deps only if debug bind insns are enabled.
(pass_expand::execute): Avoid deep TER and expand
debug locations for debug bind insns only.
* cgraph.c (cgraph_edge::redirect_call_stmt_to_callee): Narrow
debug stmts special handling down to debug bind stmts.
* combine.c (try_combine): Narrow debug insns special handling
down to debug bind insns.
* cse.c (delete_trivially_dead_insns): Handle debug bindings.
Narrow debug insns preexisting special handling down to debug
bind insns.
* dce.c (rest_of_handle_ud_dce): Narrow debug insns special
handling down to debug bind insns.
* function.c (instantiate_virtual_regs): Skip debug markers,
adjust handling of debug binds.
* gimple-ssa-backprop.c (backprop::prepare_change): Try debug
temp insertion iff MAY_HAVE_DEBUG_BIND_STMTS.
* haifa-sched.c (schedule_insn): Narrow special handling of debug
insns to debug bind insns.
* ipa-param-manipulation.c (ipa_modify_call_arguments): Narrow
special handling of debug stmts to debug bind stmts.
* ipa-split.c (split_function): Likewise.
* ira.c (combine_and_move_insns): Adjust debug bind insns only.
* loop-unroll.c (apply_opt_in_copies): Adjust tests on bind
debug insns.
* reg-stack.c (convert_regs_1): Use DEBUG_BIND_INSN_P.
* regrename.c (build_def_use): Likewise.
* regcprop.c (copyprop_hardreg_forward_1): Likewise.
(pass_cprop_hardreg): Narrow special casing of debug insns to
debug bind insns.
* regstat.c (regstat_init_n_sets_and_refs): Likewise.
* reload1.c (reload): Likewise.
* sese.c (sese_insert_phis_for_liveouts): Narrow special
casing of debug stmts to debug bind stmts.
* shrink-wrap.c (move_insn_for_shrink_wrap): Likewise.
* ssa-iterators.h (num_imm_uses): Likewise.
* tree-cfg.c (gimple_merge_blocks): Narrow special casing of
debug stmts to debug bind stmts.
* tree-inline.c (tree_function_versioning): Narrow special casing
of debug stmts to debug bind stmts.
* tree-loop-distribution.c (generate_loops_for_partition):
Narrow special casing of debug stmts to debug bind stmts.
* tree-sra.c (analyze_access_subtree): Narrow special casing
of debug stmts to debug bind stmts.
* tree-ssa-dce.c (remove_dead_stmt): Narrow special casing of debug
stmts to debug bind stmts.
* tree-ssa-loop-ivopts.c (remove_unused_ivs): Narrow special
casing of debug stmts to debug bind stmts.
* tree-ssa-reassoc.c (reassoc_remove_stmt): Likewise.
* tree-ssa-tail-merge.c (tail_merge_optimize): Narrow special
casing of debug stmts to debug bind stmts.
* tree-ssa-threadedge.c (propagate_threaded_block_debug_info):
Likewise.
* tree-ssa.c (flush_pending_stmts): Narrow special casing of
debug stmts to debug bind stmts.
(gimple_replace_ssa_lhs): Likewise.
(insert_debug_temp_for_var_def): Likewise.
(insert_debug_temps_for_defs): Likewise.
(reset_debug_uses): Likewise.
* tree-ssanames.c (release_ssa_name_fn): Likewise.
* tree-vect-loop-manip.c (adjust_debug_stmts_now): Likewise.
(adjust_debug_stmts): Likewise.
(adjust_phi_and_debug_stmts): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (vect_transform_loop): Likewise.
* valtrack.c (propagate_for_debug): Use DEBUG_BIND_INSN_P.
* var-tracking.c (adjust_mems): Narrow special casing of debug
insns to debug bind insns.
(dv_onepart_p, dataflow_set_clear_at_call, use_type): Likewise.
(compute_bb_dataflow, vt_find_locations): Likewise.
(vt_expand_loc, emit_notes_for_changes): Likewise.
(vt_init_cfa_base): Likewise.
(vt_emit_notes): Likewise.
(vt_initialize): Likewise.
(vt_finalize): Likewise.
From-SVN: r255565
|
|
When combine drops a REG_UNUSED SET in a parallel, we have to clear
cached values, so that, even if the REGs remain used (e.g. because
they were referenced in the used SET_SRC), we will not use properties
of the dropped modified value as if they applied to the preserved
original one.
We fail to adjust REG_N_SETS.
for gcc/ChangeLog
PR rtl-optimization/80693
PR rtl-optimization/81019
PR rtl-optimization/81020
* combine.c (distribute_notes): Reset any REG_UNUSED REGs that
are not mentioned in i3. Place the REG_UNUSED note on i2,
possibly modified to REG_DEAD, if it did not originate in i3.
for gcc/testsuite/ChangeLog
PR rtl-optimization/80693
PR rtl-optimization/81019
PR rtl-optimization/81020
* gcc.dg/pr80693.c: New.
* gcc.dg/pr81019.c: New.
From-SVN: r255554
|
|
In PR83304 two insns are combined, where the I2 uses a register that
has a REG_DEAD note on an insn after I2 but before I3. In such a case
move_deaths should move that death note. But move_deaths only looks
at the reg_stat[regno].last_death insn, and that field can be zeroed
out (previously, use_crosses_set_p would prevent the combination in
this case).
If the last_death field is zero it means "unknown", not "no death", so
we have to find if there is a REG_DEAD note.
PR rtl-optimization/83304
* combine.c (move_deaths): If we do not know where a register died,
search for it.
From-SVN: r255506
|
|
This removes use_crosses_set_p, and uses modified_between_p instead
everywhere it was used. This improves optimisation.
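Roughly, instead of asking whether a use "crosses" a later set, the code
now asks directly whether anything used in the source is written between
the two insns; a sketch of the check (illustrative):
  /* Refuse the combination if any register or memory used in SRC is
     modified strictly between INSN and I3.  */
  if (modified_between_p (src, insn, i3))
    return false;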
* combine.c: Adjust comment.
(use_crosses_set_p): Delete.
(can_combine_p): Use modified_between_p instead of use_crosses_set_p.
(try_combine): Ditto.
From-SVN: r255384
|
|
Eventually we should print the reason that any combination fails.
This is a good start (these happen often).
* combine.c (try_combine): Print a message to dump file whenever
I0, I1, or I2 cannot be combined into I3.
From-SVN: r255261
|
|
The fix for PR82621 makes us not split an I2 that is a PARALLEL of two
SETs if one of the results of those SETs is unused, since combine does
not handle that properly. But this results in degradation for i386 (or,
more generally, for any target that does not have parallel patterns that
express the unused result as a CLOBBER instead of a SET).
This patch instead makes us not split only if one of the results is set
again before I3. That fixes PR83156 and also fixes PR82621.
Unfortunately it undoes the nice optimisations that the previous patch
did on powerpc.
PR rtl-optimization/83156
PR rtl-optimization/82621
* combine.c (try_combine): Don't split an I2 if one of the dests is
set again before I3. Allow unused dests.
From-SVN: r255260
|
|
PR rtl-optimization/81553
* combine.c (simplify_if_then_else): In (if_then_else COND (OP Z C1) Z)
to (OP Z (mult COND (C1 * STORE_FLAG_VALUE))) optimization, if OP
is a shift where C1 has different mode than the whole shift, use C1's
mode for MULT rather than the shift's mode.
* gcc.c-torture/compile/pr81553.c: New test.
From-SVN: r255150
|
|
This patch makes combine reconsider insns it added notes to. This
matters, for example, if the note is a REG_DEAD: without the note the
setter of the register has to be kept around in the result of
combinations, so a 2->1 combination is not possible and the cost of
the result is higher than without that extra set; try_combine may
therefore refuse the combination with the set but allow it without.
This fixes a regression for powerpc: pr69946.c has started to fail
after the bitfield expansion changes. GCC used to generate
lwz 3,0(9)
rlwinm 3,3,12,20,23
ori 3,3,0x11
rotldi 3,3,52
bl bar
but now it does
lwz 3,0(9)
rldicr 3,3,32,3
srdi 3,3,48
ori 3,3,0x110
sldi 3,3,48
bl bar
(an instruction too many). After this patch it is
lwz 3,0(9)
rlwinm 3,3,16,16,19
ori 3,3,0x110
sldi 3,3,48
bl bar
(the testcase still does not pass, it looks for very specific insns).
* combine.c (added_notes_insn): New.
(try_combine): Handle added_notes_insn like added_links_insn.
Rewrite return value code.
(distribute_notes): Set added_notes_insn to the earliest insn we added
a note to.
From-SVN: r254875
|
|
If we have a PARALLEL of two SETs, and one half is unused, we currently
happily split that into two instructions (although the unused one is
useless). Worse, as PR82621 shows, combine will happily merge this
insn into I3 even if some intervening insn sets the same register
again, which is wrong.
This fixes it by not splitting PARALLELs with REG_UNUSED notes. It
all is handled fine by combine in that case: just the "single set
that is unused" case isn't handled properly.
This also results in better code: combine will now actually throw
away the unused SET. (It still won't do that in an I3).
PR rtl-optimization/82621
* combine.c (try_combine): Do not split PARALLELs of two SETs if the
dest of one of those SETs is unused.
From-SVN: r254874
|
|
This adds some extra debug info to the dump file for combine: print
the insns that are input to try_combine. I was worried that printing
more would only make the dump file harder to read, but the info from
the REG_DEAD notes in particular is invaluable.
* combine.c (try_combine): Print the insns input to try_combine to the
dump file.
From-SVN: r254365
|
|
REGMODE_NATURAL_SIZE.
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
Revert accidental duplicate:
* combine.c (can_change_dest_mode): Reject changes in
REGMODE_NATURAL_SIZE.
From-SVN: r254316
|