This patch fixes the fallout that Kewen reported on Power after
the recent change to avoid unnecessary use of partial vectors.
As Kewen said, the problem is that vect_analyze_loop_2 doesn't
know how many epilogue iterations there will be, and so it
cannot make a final decision about whether the number of
iterations forces an epilogue loop to use partial vectors.
This is similar to the current situation for peeling: we don't know
during initial analysis whether an epilogue loop will itself require
peeling. Instead we decide that during vect_do_peeling, where the
final number of epilogue loop iterations is known.
The patch takes a similar approach for the decision about whether
to use partial vectors. As the comments in the patch say, the
idea is that vect_analyze_loop_2 should make peeling and partial-
vector decisions based on the assumption that the loop_vinfo will
be used as the main loop, while vect_do_peeling should make them
in the knowledge that the loop_vinfo will be used as an epilogue loop.
This allows the same analysis to be used for both cases, which we
rely on for implementing VECT_COMPARE_COSTS; see the big comment
in vect_analyze_loop for details.
I hope the patch makes the (mostly preexisting) structure a bit
more obvious. It isn't what anyone would design from scratch,
but that's the nature of working with a mature vector framework.
Arranging things this way means that vect_verify_full_masking
and vect_verify_loop_lens now become part of the “can” rather
than “will” test for partial vectors.
Also, while splitting out the logic that handles epilogues with
constant iterations, I added a check to make sure that we don't
try to use partial vectors to vectorise a single-scalar loop.
This required some changes to the Power tests.
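As a worked illustration of that single-scalar case, here is a minimal sketch in plain C with made-up numbers (the function and values are hypothetical, not from the patch): with a vectorization factor of 4 and 9 iterations, the epilogue gets exactly one iteration, and a partial vector with one active lane cannot beat the scalar statement it would replace.
/* Hypothetical sketch: VF = 4, niters = 9.  The main loop covers
   i = 0..7; the epilogue is a single scalar iteration, so using
   partial vectors for it gains nothing.  */
void
copy9 (int *restrict a, int *restrict b)
{
  int i;
  for (i = 0; i + 4 <= 9; i += 4)   /* vectorized main loop */
    for (int j = 0; j < 4; j++)
      a[i + j] = b[i + j];
  for (; i < 9; i++)                /* one-iteration epilogue: keep it scalar */
    a[i] = b[i];
}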
gcc/
* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function.
* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function.
Reject using vector epilogue loops for single iterations. Install
the constant number of epilogue loop iterations in the associated
loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling
to do the main part of the test.
(vect_do_peeling): Use vect_update_epilogue_niters to handle
epilogue loops with a known number of iterations. Skip recomputing
the number of iterations later in that case. Otherwise, use
vect_determine_partial_vectors_and_peeling to decide whether the
epilogue loop needs to use partial vectors or peeling.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the
default can_use_partial_vectors_p to false if partial-vector-usage=0.
(determine_peel_for_niter): Remove in favor of...
(vect_determine_partial_vectors_and_peeling): ...this new function,
split out from...
(vect_analyze_loop_2): ...here. Reflect the vect_verify_full_masking
and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P
rather than USING_PARTIAL_VECTORS_P.
gcc/testsuite/
* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the
single-iteration epilogues of the 64-bit loops to be vectorized.
* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
|
|
The condition we're expecting to eventually run into isn't fully
captured by checking for CTORs; instead we can also run into the
CTOR element conversion.
2020-09-23 Richard Biener <rguenther@suse.de>
PR tree-optimization/97173
* tree-vect-loop.c (vectorizable_live_operation): Extend
assert to also cover element conversions.
* gcc.dg/vect/pr97173.c: New testcase.
|
|
This fixes a typo introduced with the last change and not noticed
because those vectorizer access macros are not type safe ...
2020-09-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/97095
* tree-vect-loop.c (vectorizable_live_operation): Get
the SLP vector type from the correct object.
* gfortran.dg/pr97095.f: New testcase.
|
|
gcc/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_need_peeling_or_partial_vectors_p): New
function.
(vect_analyze_loop_2): Make use of it to avoid selecting partial
vectors if no peeling is required.
(determine_peel_for_niter): Move out some logic into
'vect_need_peeling_or_partial_vectors_p'.
gcc/testsuite/ChangeLog
2020-09-09 Andrea Corallo <andrea.corallo@arm.com>
* gcc.target/aarch64/sve/cost_model_10.c: New test.
* gcc.target/aarch64/sve/clastb_8.c: Update test for new
vectorization strategy.
* gcc.target/aarch64/sve/cost_model_5.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
|
|
This removes STMT_VINFO_NUM_SLP_USES by pushing the setting of
the shared stmt_vec_info vector type to where we actually need it
which is alignment analysis and vectorizable_* analysis (where
we could eventually elide it for non-load/store operations).
In particular "uses" in the cache and in disqualified SLP
subgraphs should no longer provide conflicting vector types
this way.
2020-09-16 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::num_slp_uses): Remove.
(STMT_VINFO_NUM_SLP_USES): Likewise.
(vect_free_slp_instance): Adjust.
(vect_update_shared_vectype): Declare.
* tree-vectorizer.c (vec_info::~vec_info): Adjust.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vectorizable_live_operation): Use vector type from
SLP_TREE_REPRESENTATIVE.
(vect_transform_loop): Adjust.
* tree-vect-data-refs.c (vect_slp_analyze_node_alignment):
Set the shared vector type.
* tree-vect-slp.c (vect_free_slp_tree): Remove final_p
parameter, remove STMT_VINFO_NUM_SLP_USES updating.
(vect_free_slp_instance): Adjust.
(vect_create_new_slp_node): Remove STMT_VINFO_NUM_SLP_USES
updating.
(vect_update_shared_vectype): Always compare with the
present vector type, update if NULL.
(vect_build_slp_tree_1): Do not update the shared vector
type here.
(vect_build_slp_tree_2): Adjust.
(slp_copy_subtree): Likewise.
(vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Likewise.
(vect_slp_analyze_node_operations_1): Update the shared
vector type.
(vect_slp_analyze_operations): Adjust.
(vect_slp_analyze_bb_1): Likewise.
|
|
This tries to improve BB vectorization dumps by providing more
precise locations. Currently the vect_location is simply the
very last stmt in a basic-block that has a location. So for
double a[4], b[4];
int x[4], y[4];
void foo()
{
  a[0] = b[0]; // line 5
  a[1] = b[1];
  a[2] = b[2];
  a[3] = b[3];
  x[0] = y[0]; // line 9
  x[1] = y[1];
  x[2] = y[2];
  x[3] = y[3];
} // line 13
we show the user with -O3 -fopt-info-vec
t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
while with the patch we point to both independently vectorized
opportunities:
t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
There's the possibility that the location regresses in case the
root stmt in the SLP instance has no location. For a SLP subgraph
with multiple entries the location also chooses one entry at random;
I'm not sure in which case we want to dump both.
Still as the plan is to extend the basic-block vectorization
scope from single basic-block to multiple ones this is a first
step to preserve something sensible.
Implementation-wise this makes both costing and code-generation
happen on the subgraphs as analyzed.
2020-09-11 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_instance::location): New method.
(vect_schedule_slp): Adjust prototype.
* tree-vectorizer.c (vec_info::remove_stmt): Adjust
the BB region begin if we removed the stmt it points to.
* tree-vect-loop.c (vect_transform_loop): Adjust.
* tree-vect-slp.c (_slp_instance::location): Implement.
(vect_analyze_slp_instance): For BB vectorization set
vect_location to that of the instance.
(vect_slp_analyze_operations): Likewise.
(vect_bb_vectorization_profitable_p): Remove wrapper.
(vect_slp_analyze_bb_1): Remove cost check here.
(vect_slp_region): Cost check and code generate subgraphs separately,
report optimized locations and missed optimizations due to
profitability for each of them.
(vect_schedule_slp): Get the vector of SLP graph entries to
vectorize as argument.
|
|
gcc/ChangeLog
2020-09-07 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Revert
dead-code removal introduced by 09fa6acd8d9 and add a comment to
clarify.
|
|
The following adds the capability to code-generate live lanes in
basic-block vectorization using lane extracts from vector stmts
rather than keeping the original scalar code around for those.
This eventually makes previously unprofitable vectorizations
profitable (the live scalar code was appropriately costed, as
the lane extracts are now); cost model aside, this
patch doesn't add or remove any basic-block vectorization
capabilities.
The patch re/ab-uses STMT_VINFO_LIVE_P in basic-block vectorization
mode to tell whether a live lane is vectorized or whether it is
provided by means of keeping the scalar code live.
The patch is a first step towards vectorizing sequences of
stmts that do not end up in stores or vector constructors though.
Bootstrapped and tested on x86_64-unknown-linux-gnu.
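A minimal sketch of the idea, using GCC's generic vector extensions (the function and names are hypothetical, not the patch's code): the live scalar use is served by a lane extract from the vectorized group instead of keeping a scalar statement alive.
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical sketch: the live use of b[2] is extracted as a lane
   of the vectorized load instead of keeping a scalar load alive
   next to the vector code.  */
int
store_and_use (int *restrict a, int *restrict b)
{
  v4si vb = *(v4si *) b;   /* vectorized load of b[0..3] */
  *(v4si *) a = vb;        /* vectorized store group a[0..3] */
  return vb[2];            /* live lane extracted from the vector */
}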
2020-09-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vectorizable_live_operation): Adjust.
* tree-vect-loop.c (vectorizable_live_operation): Vectorize
live lanes out of basic-block vectorization nodes.
* tree-vect-slp.c (vect_bb_slp_mark_live_stmts): New function.
(vect_slp_analyze_operations): Analyze live lanes and their
vectorization possibility after the whole SLP graph is final.
(vect_bb_slp_scalar_cost): Adjust for vectorized live lanes.
* tree-vect-stmts.c (can_vectorize_live_stmts): Adjust.
(vect_transform_stmt): Call can_vectorize_live_stmts also for
basic-block vectorization.
* gcc.dg/vect/bb-slp-46.c: New testcase.
* gcc.dg/vect/bb-slp-47.c: Likewise.
* gcc.dg/vect/bb-slp-32.c: Adjust.
|
|
This refines the previous fix for PR96698 by re-doing how and where
we arrange for setting vectorized cycle PHI backedge values.
2020-09-04 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
PR tree-optimization/96920
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): Remove.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Remove vectorized
cycle PHI latch code.
* tree-vect-loop.c (maybe_set_vectorized_backedge_value): New
helper to set vectorized cycle PHI latch values.
(vect_transform_loop): Walk over all PHIs again after
vectorizing them, calling maybe_set_vectorized_backedge_value.
Call maybe_set_vectorized_backedge_value for each vectorized
stmt. Remove delayed update code.
* tree-vect-slp.c (vect_analyze_slp_instance): Initialize
SLP instance reduc_phis member.
(vect_schedule_slp): Set vectorized cycle PHI latch values.
* gfortran.dg/vect/pr96920.f90: New testcase.
* gcc.dg/vect/pr96920.c: Likewise.
|
|
gcc/ChangeLog
2020-09-04 Andrea Corallo <andrea.corallo@arm.com>
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Remove
dead code as LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) is
always verified.
|
|
gcc/ada/ChangeLog:
* gcc-interface/trans.c (gigi): Set exact argument of a vector
growth function to true.
(Attribute_to_gnu): Likewise.
gcc/ChangeLog:
* alias.c (init_alias_analysis): Set exact argument of a vector
growth function to true.
* calls.c (internal_arg_pointer_based_exp_scan): Likewise.
* cfgbuild.c (find_many_sub_basic_blocks): Likewise.
* cfgexpand.c (expand_asm_stmt): Likewise.
* cfgrtl.c (rtl_create_basic_block): Likewise.
* combine.c (combine_split_insns): Likewise.
(combine_instructions): Likewise.
* config/aarch64/aarch64-sve-builtins.cc (function_expander::add_output_operand): Likewise.
(function_expander::add_input_operand): Likewise.
(function_expander::add_integer_operand): Likewise.
(function_expander::add_address_operand): Likewise.
(function_expander::add_fixed_operand): Likewise.
* df-core.c (df_worklist_dataflow_doublequeue): Likewise.
* dwarf2cfi.c (update_row_reg_save): Likewise.
* early-remat.c (early_remat::init_block_info): Likewise.
(early_remat::finalize_candidate_indices): Likewise.
* except.c (sjlj_build_landing_pads): Likewise.
* final.c (compute_alignments): Likewise.
(grow_label_align): Likewise.
* function.c (temp_slots_at_level): Likewise.
* fwprop.c (build_single_def_use_links): Likewise.
(update_uses): Likewise.
* gcc.c (insert_wrapper): Likewise.
* genautomata.c (create_state_ainsn_table): Likewise.
(add_vect): Likewise.
(output_dead_lock_vect): Likewise.
* genmatch.c (capture_info::capture_info): Likewise.
(parser::finish_match_operand): Likewise.
* genrecog.c (optimize_subroutine_group): Likewise.
(merge_pattern_info::merge_pattern_info): Likewise.
(merge_into_decision): Likewise.
(print_subroutine_start): Likewise.
(main): Likewise.
* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Likewise.
* gimple.c (gimple_set_bb): Likewise.
* graphite-isl-ast-to-gimple.c (translate_isl_ast_node_user): Likewise.
* haifa-sched.c (sched_extend_luids): Likewise.
(extend_h_i_d): Likewise.
* insn-addr.h (insn_addresses_new): Likewise.
* ipa-cp.c (gather_context_independent_values): Likewise.
(find_more_contexts_for_caller_subset): Likewise.
* ipa-devirt.c (final_warning_record::grow_type_warnings): Likewise.
(ipa_odr_read_section): Likewise.
* ipa-fnsummary.c (evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(analyze_function_body): Likewise.
(ipa_merge_fn_summary_after_inlining): Likewise.
(read_ipa_call_summary): Likewise.
* ipa-icf.c (sem_function::bb_dict_test): Likewise.
* ipa-prop.c (ipa_alloc_node_params): Likewise.
(parm_bb_aa_status_for_bb): Likewise.
(ipa_compute_jump_functions_for_edge): Likewise.
(ipa_analyze_node): Likewise.
(update_jump_functions_after_inlining): Likewise.
(ipa_read_edge_info): Likewise.
(read_ipcp_transformation_info): Likewise.
(ipcp_transform_function): Likewise.
* ipa-reference.c (ipa_reference_write_optimization_summary): Likewise.
* ipa-split.c (execute_split_functions): Likewise.
* ira.c (find_moveable_pseudos): Likewise.
* lower-subreg.c (decompose_multiword_subregs): Likewise.
* lto-streamer-in.c (input_eh_regions): Likewise.
(input_cfg): Likewise.
(input_struct_function_base): Likewise.
(input_function): Likewise.
* modulo-sched.c (set_node_sched_params): Likewise.
(extend_node_sched_params): Likewise.
(schedule_reg_moves): Likewise.
* omp-general.c (omp_construct_simd_compare): Likewise.
* passes.c (pass_manager::create_pass_tab): Likewise.
(enable_disable_pass): Likewise.
* predict.c (determine_unlikely_bbs): Likewise.
* profile.c (compute_branch_probabilities): Likewise.
* read-rtl-function.c (function_reader::parse_block): Likewise.
* read-rtl.c (rtx_reader::read_rtx_code): Likewise.
* reg-stack.c (stack_regs_mentioned): Likewise.
* regrename.c (regrename_init): Likewise.
* rtlanal.c (T>::add_single_to_queue): Likewise.
* sched-deps.c (init_deps_data_vector): Likewise.
* sel-sched-ir.c (sel_extend_global_bb_info): Likewise.
(extend_region_bb_info): Likewise.
(extend_insn_data): Likewise.
* symtab.c (symtab_node::create_reference): Likewise.
* tracer.c (tail_duplicate): Likewise.
* trans-mem.c (tm_region_init): Likewise.
(get_bb_regions_instrumented): Likewise.
* tree-cfg.c (init_empty_tree_cfg_for_function): Likewise.
(build_gimple_cfg): Likewise.
(create_bb): Likewise.
(move_block_to_fn): Likewise.
* tree-complex.c (tree_lower_complex): Likewise.
* tree-if-conv.c (predicate_rhs_code): Likewise.
* tree-inline.c (copy_bb): Likewise.
* tree-into-ssa.c (get_ssa_name_ann): Likewise.
(mark_phi_for_rewrite): Likewise.
* tree-object-size.c (compute_builtin_object_size): Likewise.
(init_object_sizes): Likewise.
* tree-predcom.c (initialize_root_vars_store_elim_1): Likewise.
(initialize_root_vars_store_elim_2): Likewise.
(prepare_initializers_chain_store_elim): Likewise.
* tree-ssa-address.c (addr_for_mem_ref): Likewise.
(multiplier_allowed_in_address_p): Likewise.
* tree-ssa-coalesce.c (ssa_conflicts_new): Likewise.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-ssa-loop-ivopts.c (addr_offset_valid_p): Likewise.
(get_address_cost_ainc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
* tree-ssa-pre.c (add_to_value): Likewise.
(phi_translate_1): Likewise.
(do_pre_regular_insertion): Likewise.
(do_pre_partial_partial_insertion): Likewise.
(init_pre): Likewise.
* tree-ssa-propagate.c (ssa_prop_init): Likewise.
(update_call_from_tree): Likewise.
* tree-ssa-reassoc.c (optimize_range_tests_cmp_bitwise): Likewise.
* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
(vn_reference_lookup_pieces): Likewise.
(eliminate_dom_walker::eliminate_push_avail): Likewise.
* tree-ssa-strlen.c (set_strinfo): Likewise.
(get_stridx_plus_constant): Likewise.
(zero_length_string): Likewise.
(find_equal_ptrs): Likewise.
(printf_strlen_execute): Likewise.
* tree-ssa-threadedge.c (set_ssa_name_value): Likewise.
* tree-ssanames.c (make_ssa_name_fn): Likewise.
* tree-streamer-in.c (streamer_read_tree_bitfields): Likewise.
* tree-vect-loop.c (vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
(vect_record_loop_len): Likewise.
(vect_get_loop_len): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-slp.c (vect_slp_convert_to_external): Likewise.
(vect_bb_slp_scalar_cost): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
(vectorizable_slp_permutation): Likewise.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(scan_store_can_perm_p): Likewise.
(vectorizable_store): Likewise.
* expr.c: Likewise.
* vec.c (test_safe_grow_cleared): Likewise.
* vec.h (vec_safe_grow): Likewise.
(vec_safe_grow_cleared): Likewise.
(vl_ptr>::safe_grow): Likewise.
(vl_ptr>::safe_grow_cleared): Likewise.
* config/c6x/c6x.c (insn_set_clock): Likewise.
gcc/c/ChangeLog:
* gimple-parser.c (c_parser_gimple_compound_statement): Set exact argument of a vector
growth function to true.
gcc/cp/ChangeLog:
* class.c (build_vtbl_initializer): Set exact argument of a vector
growth function to true.
* constraint.cc (get_mapped_args): Likewise.
* decl.c (cp_maybe_mangle_decomp): Likewise.
(cp_finish_decomp): Likewise.
* parser.c (cp_parser_omp_for_loop): Likewise.
* pt.c (canonical_type_parameter): Likewise.
* rtti.c (get_pseudo_ti_init): Likewise.
gcc/fortran/ChangeLog:
* trans-openmp.c (gfc_trans_omp_do): Set exact argument of a vector
growth function to true.
gcc/lto/ChangeLog:
* lto-common.c (lto_file_finalize): Set exact argument of a vector
growth function to true.
|
|
This fixes vectorized PHI latch edge updating and delays it until
all of the loop is code generated, to deal with the case where the
latch def is a PHI in the same block.
2020-08-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/96698
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Only record
stmts to update PHI latches from, perform the update ...
* tree-vect-loop.c (vect_transform_loop): ... here after
vectorizing those PHIs.
(info_for_reduction): Properly handle non-reduction PHIs.
* gcc.dg/vect/pr96698.c: New testcase.
|
|
gcc/ChangeLog:
* tree-vect-data-refs.c (dr_group_sort_cmp): Work on
data_ref_pair.
(vect_analyze_data_ref_accesses): Work on groups.
(vect_find_stmt_data_reference): Add group_id argument and fill
up dataref_groups vector.
* tree-vect-loop.c (vect_get_datarefs_in_loop): Pass new
arguments.
(vect_analyze_loop_2): Likewise.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Pass argument.
(vect_slp_bb_region): Likewise.
(vect_slp_region): Likewise.
(vect_slp_bb): Work on the entire BB.
* tree-vectorizer.h (vect_analyze_data_ref_accesses): Add new
argument.
(vect_find_stmt_data_reference): Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/bb-slp-38.c: Adjust pattern as we now process not a
single vectorization but 2 partial ones.
* gcc.dg/vect/bb-slp-45.c: New test.
|
|
This patch adds the cost modeling for vector with length;
it mainly follows what we generate for vector with length in
the functions vect_set_loop_controls_directly and vect_gen_len
in the worst case.
For Power, the length is expected to be in bits 0-7 (the high bits),
so we have to model the cost of shifting the length into place,
which is implemented in adjust_vect_cost_per_loop.
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) with explicit
param vect-partial-vector-usage=1.
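As a hedged aside on where the shift comes from (this reflects the ISA definition of lxvl/stxvl, not code from the patch): the instructions read the byte count from bits 0-7, i.e. the most significant byte, of the length register, so a plain byte count has to be shifted left by 56 first; that extra per-loop work is what the cost adjustment models.
/* Hypothetical sketch: place a byte count in the high byte of the
   length operand, as lxvl/stxvl expect on 64-bit Power.  */
static inline unsigned long long
len_for_lxvl (unsigned long long nbytes)
{
  return nbytes << 56;   /* length lives in bits 0-7 (high byte) */
}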
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_adjust_vect_cost_per_loop): New
function.
(rs6000_finish_cost): Call rs6000_adjust_vect_cost_per_loop.
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Add cost
modeling for vector with length.
(vect_rgroup_iv_might_wrap_p): New function, factored out from...
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): ...this.
Update function comment.
* tree-vect-stmts.c (vect_gen_len): Update function comment.
* tree-vectorizer.h (vect_rgroup_iv_might_wrap_p): New declare.
|
|
Currently the vectorizer cost modeling counts branch-taken costs for
the prologue and epilogue if the number of iterations is unknown.
But that isn't sensible if there are no peeled iterations.
This patch guards those costs under peel_iters_prologue > 0 or
peel_iters_epilogue > 0.
Bootstrapped/regtested on powerpc64le-linux-gnu and aarch64-linux-gnu.
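A minimal sketch of the guard in plain C (names and unit costs are illustrative, not the actual GCC code):
/* Hypothetical sketch: only charge a branch-taken cost when the
   corresponding peel loop actually runs.  */
static int
peel_branch_cost (int peel_iters_prologue, int peel_iters_epilogue,
                  int branch_cost)
{
  int cost = 0;
  if (peel_iters_prologue > 0)
    cost += branch_cost;   /* branch around/into the prologue */
  if (peel_iters_epilogue > 0)
    cost += branch_cost;   /* branch around/into the epilogue */
  return cost;
}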
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Don't consider branch
taken costs for prologue and epilogue if they don't exist.
(vect_estimate_min_profitable_iters): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/cost_model_2.c: Adjust due to cost model
change.
|
|
This patch refactors the existing peel_iters_prologue and
peel_iters_epilogue cost model handling, following the structure
below suggested by Richard Sandiford:
- calculate peel_iters_prologue
- calculate peel_iters_epilogue
- add costs associated with peel_iters_prologue
- add costs associated with peel_iters_epilogue
- add costs related to branch taken/not_taken (see the sketch below).
Bootstrapped/regtested on aarch64-linux-gnu.
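A compressed sketch of that ordering with made-up numbers (hypothetical names, not the actual GCC code):
#include <stdio.h>

/* Hypothetical sketch of the restructured flow: peel counts first,
   then their costs, then the dependent branch costs.  */
int
main (void)
{
  int vf = 4, niters = 10;
  int peel_iters_prologue = 1;                                   /* step 1 */
  int peel_iters_epilogue = (niters - peel_iters_prologue) % vf; /* step 2 */
  int cost = peel_iters_prologue * 2                             /* step 3 */
             + peel_iters_epilogue * 2;                          /* step 4 */
  if (peel_iters_prologue > 0 || peel_iters_epilogue > 0)        /* step 5 */
    cost += 1;                       /* branch taken/not_taken costs */
  printf ("peeling cost: %d\n", cost);
  return 0;
}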
gcc/ChangeLog:
* tree-vect-loop.c (vect_get_known_peeling_cost): Factor out some code
to determine peel_iters_epilogue to...
(vect_get_peel_iters_epilogue): ...this new function.
(vect_estimate_min_profitable_iters): Refactor cost calculation on
peel_iters_prologue and peel_iters_epilogue.
|
|
Power9 supports the vector load/store instructions lxvl/stxvl, which
allow us to operate on partial vectors with one specific length. This
patch extends some of the current mask-based partial vectors support
code for the length-based approach and also adds some length-specific
support code.
So far it assumes that we can only have one partial vectors approach
at a time; it disables the use of partial vectors if both approaches
co-exist.
Like the description of the optabs len_load/len_store says, the
length-based approach can have two flavors: one is length in bytes,
the other is length in lanes. This patch is mainly implemented and
tested for length in bytes, but as Richard S. suggested, most of the
code considers both flavors.
This also introduces one parameter, vect-partial-vector-usage, to allow
users to control when the loop vectorizer considers using partial
vectors as an alternative to falling back to scalar code.
gcc/ChangeLog:
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Set param_vect_partial_vector_usage to 0 explicitly.
* doc/invoke.texi (vect-partial-vector-usage): Document new option.
* optabs-query.c (get_len_load_store_mode): New function.
* optabs-query.h (get_len_load_store_mode): New declare.
* params.opt (vect-partial-vector-usage): New.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Add the
handlings for vectorization using length-based partial vectors, call
vect_gen_len for length generation, and rename some variables with
items instead of scalars.
(vect_set_loop_condition_partial_vectors): Add the handlings for
vectorization using length-based partial vectors.
(vect_do_peeling): Allow remaining eiters less than epilogue vf for
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Init
epil_using_partial_vectors_p.
(_loop_vec_info::~_loop_vec_info): Call release_vec_loop_controls
for lengths destruction.
(vect_verify_loop_lens): New function.
(vect_analyze_loop): Add handlings for epilogue of loop when it's
marked to use vectorization using partial vectors.
(vect_analyze_loop_2): Add the check to allow only one vectorization
approach using partial vectorization at the same time. Check param
vect-partial-vector-usage for partial vectors decision. Mark
LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P if the epilogue is
considerable to use partial vectors. Call release_vec_loop_controls
for lengths destruction.
(vect_estimate_min_profitable_iters): Adjust for loop vectorization
using length-based partial vectors.
(vect_record_loop_mask): Init factor to 1 for vectorization using
mask-based partial vectors.
(vect_record_loop_len): New function.
(vect_get_loop_len): Likewise.
* tree-vect-stmts.c (check_load_store_for_partial_vectors): Add
checks for vectorization using length-based partial vectors. Factor
some code to lambda function get_valid_nvectors.
(vectorizable_store): Add handlings when using length-based partial
vectors.
(vectorizable_load): Likewise.
(vect_gen_len): New function.
* tree-vectorizer.h (struct rgroup_controls): Add field factor
mainly for length-based partial vectors.
(vec_loop_lens): New typedef.
(_loop_vec_info): Add lens and epil_using_partial_vectors_p.
(LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P): New macro.
(LOOP_VINFO_LENS): Likewise.
(LOOP_VINFO_FULLY_WITH_LENGTH_P): Likewise.
(vect_record_loop_len): New declare.
(vect_get_loop_len): Likewise.
(vect_gen_len): Likewise.
|
|
This patch is derived from the review of the vector with length patch
series. I relaxed the guard on LOOP_VINFO_PEELING_FOR_ALIGNMENT for
vector with length per Richard S.'s suggestion, then encountered one
failure in gcc.dg/vect/vect-ifcvt-11.c when running with param
vect-partial-vector-usage=2 enabled. The root cause is that we still
used the original niters for the loop body vectorization, which lets
the accesses go out of bounds; instead we should use
LOOP_VINFO_NITERS, which has been adjusted in vect_do_peeling by
considering the peeling count for the prologue.
Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu.
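A worked example with made-up numbers (a sketch, not the testcase): if the prologue peels 3 of 100 iterations for alignment, the vectorized body must run on the adjusted 97 iterations; iterating it over the original 100 walks 3 elements past the end.
/* Hypothetical sketch: the body must use the adjusted count.  */
static int
body_niters (int niters_orig, int peel_prologue)
{
  return niters_orig - peel_prologue;   /* e.g. 100 - 3 = 97 */
}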
gcc/ChangeLog:
* tree-vect-loop.c (vect_transform_loop): Use LOOP_VINFO_NITERS which
is adjusted by considering peeled prologue for non
vect_use_loop_mask_for_alignment_p cases.
|
|
This followup removes vect_verify_datarefs_alignment and its
premature cancellation of vectorization, leaving the actual
decision whether alignment is supported to the functions
deciding whether we can vectorize a load or store.
2020-07-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove.
(vect_slp_analyze_and_verify_instance_alignment): Rename to ...
(vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Do not call
vect_verify_datarefs_alignment.
(vect_slp_analyze_node_alignment): Rename from
vect_slp_analyze_and_verify_node_alignment and do not
call verify_data_ref_alignment.
(vect_slp_analyze_instance_alignment): Rename from
vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when
we vectorize an unaligned access.
(vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call
vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.
* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.
|
|
As Richard S. suggested in the review of the vector with length patch
series, we can use one message about "partial vectors" instead of
"fully with masking". This patch updates the dumping string
and related test cases.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (vect_analyze_loop_2): Update dumping string
for fully masking to be more common.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/clastb_1.c: Update dumping string.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.
|
|
This fixes computation of the insertion place for fold-left SLP
reductions where the PHIs do not have vectorized stmts. The
SLP representation isn't perfect here thus the following.
2020-06-26 Richard Biener <rguenther@suse.de>
PR tree-optimization/95897
* tree-vectorizer.h (vectorizable_induction): Remove
unused gimple_stmt_iterator * parameter.
* tree-vect-loop.c (vectorizable_induction): Likewise.
(vect_analyze_loop_operations): Adjust.
* tree-vect-stmts.c (vect_analyze_stmt): Likewise.
(vect_transform_stmt): Likewise.
* tree-vect-slp.c (vect_schedule_slp_instance): Adjust
for fold-left reductions, clarify existing reduction case.
* gcc.dg/vect/pr95897.c: New testcase.
|
|
Minor code refactorings in tree-vect-data-refs.c and tree-vect-loop.c.
Use LOOP_VINFO_DATAREFS and LOOP_VINFO_DDRS when possible and rename
several parameters to make code more consistent.
2020-06-13 Felix Yang <felix.yang@huawei.com>
gcc/
* tree-vect-data-refs.c (vect_verify_datarefs_alignment): Rename
parameter to loop_vinfo and update uses. Use LOOP_VINFO_DATAREFS
when possible.
(vect_analyze_data_refs_alignment): Likewise, and use LOOP_VINFO_DDRS
when possible.
* tree-vect-loop.c (vect_dissolve_slp_only_groups): Use
LOOP_VINFO_DATAREFS when possible.
(update_epilogue_loop_vinfo): Likewise.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford suggested, we should share the codes in approaches
with partial vectors if possible. This patch is to:
1) factor out two functions:
- vect_min_prec_for_max_niters
- vect_known_niters_smaller_than_vf.
2) rename four functions:
- vect_iv_limit_for_full_masking
- check_load_store_masking
- vect_set_loop_condition_masked
- vect_set_loop_condition_unmasked
3) rename macros LOOP_VINFO_MASK_COMPARE_TYPE and LOOP_VINFO_MASK_IV_TYPE.
Bootstrapped/regtested on aarch64-linux-gnu.
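A small self-contained sketch of the factored-out idea (a hypothetical implementation, not GCC's wide-int version): the compare/IV type needs just enough precision to represent the maximum number of iterations.
/* Hypothetical sketch of what vect_min_prec_for_max_niters computes:
   the smallest number of bits that can hold max_niters.  */
static unsigned
min_prec_for_max_niters (unsigned long long max_niters)
{
  unsigned prec = 1;
  while (max_niters >> prec)
    prec++;
  return prec;   /* e.g. 1000 iterations need 10 bits */
}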
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vect_set_loop_condition_masked): Renamed to ...
(vect_set_loop_condition_partial_vectors): ... this. Rename
LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
(vect_set_loop_condition_unmasked): Renamed to ...
(vect_set_loop_condition_normal): ... this.
(vect_set_loop_condition): Rename vect_set_loop_condition_unmasked to
vect_set_loop_condition_normal. Rename vect_set_loop_condition_masked
to vect_set_loop_condition_partial_vectors.
(vect_prepare_for_masked_peels): Rename LOOP_VINFO_MASK_COMPARE_TYPE
to LOOP_VINFO_RGROUP_COMPARE_TYPE.
* tree-vect-loop.c (vect_known_niters_smaller_than_vf): New, factored
out from ...
(vect_analyze_loop_costing): ... this.
(_loop_vec_info::_loop_vec_info): Rename mask_compare_type to
compare_type.
(vect_min_prec_for_max_niters): New, factored out from ...
(vect_verify_full_masking): ... this. Rename
vect_iv_limit_for_full_masking to vect_iv_limit_for_partial_vectors.
Rename LOOP_VINFO_MASK_COMPARE_TYPE to LOOP_VINFO_RGROUP_COMPARE_TYPE.
Rename LOOP_VINFO_MASK_IV_TYPE to LOOP_VINFO_RGROUP_IV_TYPE.
(vectorizable_reduction): Update some dumpings with partial
vectors instead of fully-masked.
(vectorizable_live_operation): Likewise.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
* tree-vect-stmts.c (check_load_store_masking): Renamed to ...
(check_load_store_for_partial_vectors): ... this. Update some
dumpings with partial vectors instead of fully-masked.
(vectorizable_store): Rename check_load_store_masking to
check_load_store_for_partial_vectors.
(vectorizable_load): Likewise.
* tree-vectorizer.h (LOOP_VINFO_MASK_COMPARE_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_COMPARE_TYPE): ... this.
(LOOP_VINFO_MASK_IV_TYPE): Renamed to ...
(LOOP_VINFO_RGROUP_IV_TYPE): ... this.
(vect_iv_limit_for_full_masking): Renamed to ...
(vect_iv_limit_for_partial_vectors): ... this.
(_loop_vec_info): Rename mask_compare_type to rgroup_compare_type.
Rename iv_type to rgroup_iv_type.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford pointed out, we can rename the rgroup struct
rgroup_masks to rgroup_controls and rename its members mask_type to
type and masks to controls to be more generic.
Besides, this patch also renames some functions like vect_set_loop_mask
to vect_set_loop_control, release_vec_loop_masks to
release_vec_loop_controls, vect_set_loop_masks_directly to
vect_set_loop_controls_directly.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_mask): Renamed to ...
(vect_set_loop_control): ... this.
(vect_maybe_permute_loop_masks): Rename rgroup_masks related things.
(vect_set_loop_masks_directly): Renamed to ...
(vect_set_loop_controls_directly): ... this. Also rename some
variables with ctrl instead of mask. Rename vect_set_loop_mask to
vect_set_loop_control.
(vect_set_loop_condition_masked): Rename rgroup_masks related things.
Also rename some variables with ctrl instead of mask.
* tree-vect-loop.c (release_vec_loop_masks): Renamed to ...
(release_vec_loop_controls): ... this. Rename rgroup_masks related
things.
(_loop_vec_info::~_loop_vec_info): Rename release_vec_loop_masks to
release_vec_loop_controls.
(can_produce_all_loop_masks_p): Rename rgroup_masks related things.
(vect_get_max_nscalars_per_iter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_record_loop_mask): Likewise.
(vect_get_loop_mask): Likewise.
* tree-vectorizer.h (struct rgroup_masks): Renamed to ...
(struct rgroup_controls): ... this. Also rename mask_type
to type and rename masks to controls.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford suggested, this patch updates the existing
fully_masked_p field to using_partial_vectors_p. It introduces the
macro LOOP_VINFO_USING_PARTIAL_VECTORS_P for partial-vectorization
checks, updates LOOP_VINFO_FULLY_MASKED_P to
LOOP_VINFO_USING_PARTIAL_VECTORS_P && !masks.is_empty () and still
uses it for checks specific to the mask-based partial vectors
approach.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop-manip.c (vect_set_loop_condition): Rename
LOOP_VINFO_FULLY_MASKED_P to LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(vect_gen_vector_loop_niters): Likewise.
(vect_do_peeling): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
fully_masked_p to using_partial_vectors_p.
(vect_analyze_loop_costing): Rename LOOP_VINFO_FULLY_MASKED_P to
LOOP_VINFO_USING_PARTIAL_VECTORS_P.
(determine_peel_for_niter): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_transform_loop): Likewise.
* tree-vectorizer.h (LOOP_VINFO_FULLY_MASKED_P): Updated.
(LOOP_VINFO_USING_PARTIAL_VECTORS_P): New macro.
|
|
Power supports vector memory access with length (in bytes) instructions.
Like the existing full masking for SVE, it is another approach to
vectorizing the loop using partially-populated vectors.
As Richard Sandiford pointed out, we should extend the existing flag
can_fully_mask_p to be more generic, to indicate whether we have
any chance of using partial vectors for this loop. So this patch
renames the flag to can_use_partial_vectors_p to be more
meaningful and also renames the macro LOOP_VINFO_CAN_FULLY_MASK_P
to LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
Bootstrapped/regtested on aarch64-linux-gnu.
gcc/ChangeLog:
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Rename
can_fully_mask_p to can_use_partial_vectors_p.
(vect_analyze_loop_2): Rename LOOP_VINFO_CAN_FULLY_MASK_P to
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P. Rename saved_can_fully_mask_p
to saved_can_use_partial_vectors_p.
(vectorizable_reduction): Rename LOOP_VINFO_CAN_FULLY_MASK_P to
LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P.
(vectorizable_live_operation): Likewise.
* tree-vect-stmts.c (permute_vec_elements): Likewise.
(check_load_store_masking): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
* tree-vectorizer.h (LOOP_VINFO_CAN_FULLY_MASK_P): Renamed to ...
(LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P): ... this.
(_loop_vec_info): Rename can_fully_mask_p to can_use_partial_vectors_p.
|
|
The following avoids allocating stmt info structs for debug stmts.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_determine_vectorization_factor):
Skip debug stmts.
(_loop_vec_info::_loop_vec_info): Likewise.
(vect_update_vf_for_slp): Likewise.
(vect_analyze_loop_operations): Likewise.
(update_epilogue_loop_vinfo): Likewise.
* tree-vect-patterns.c (vect_determine_precisions): Likewise.
(vect_pattern_recog): Likewise.
* tree-vect-slp.c (vect_detect_hybrid_slp): Likewise.
(_bb_vec_info::_bb_vec_info): Likewise.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized):
Likewise.
|
|
This makes {SLP_TREE,STMT_VINFO}_VEC_STMTS a vector of gimple * and
no longer allocates a stmt_vec_info for vectorizer-generated stmts,
since this is now possible after removing the only use, which was the
chaining of vector stmts via STMT_VINFO_RELATED_STMT.
This also removes all stmt_vec_info allocations done for vector
stmts; the remaining ones are for stmts in the scalar IL and for
patterns which are not part of the IL. Thus after this the stmt
UIDs inside a basic-block are suitable for dominance checking
if you ignore (or lazily fill) the zero UIDs of the vector stmts
inserted during transform. This property is ensured by a new
flag set when pattern analysis is complete.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::vec_stmts): Make it a vector
of gimple * stmts.
(_stmt_vec_info::vec_stmts): Likewise.
(vec_info::stmt_vec_info_ro): New flag.
(vect_finish_replace_stmt): Adjust declaration.
(vect_finish_stmt_generation): Likewise.
(vectorizable_induction): Likewise.
(vect_transform_reduction): Likewise.
(vectorizable_lc_phi): Likewise.
* tree-vect-data-refs.c (vect_create_data_ref_ptr): Do not
allocate stmt infos for increments.
(vect_record_grouped_load_vectors): Adjust.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-patterns.c (vect_pattern_recog): Set stmt_vec_info_ro.
* tree-vect-slp.c (vect_get_slp_vect_def): Adjust.
(vect_get_slp_defs): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
(vectorize_slp_instance_root_stmt): Likewise.
* tree-vect-stmts.c (vect_get_vec_defs_for_operand): Likewise.
(vect_finish_stmt_generation_1): Do not allocate a stmt info.
(vect_finish_replace_stmt): Do not return anything.
(vect_finish_stmt_generation): Likewise.
(vect_build_gather_load_calls): Adjust.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_create_vectorized_demotion_stmts): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_scan_store): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_transform_stmt): Likewise.
* tree-vectorizer.c (vec_info::vec_info): Initialize
stmt_vec_info_ro.
(vec_info::replace_stmt): Copy over stmt UID rather than
unsetting/setting a stmt info allocating a new UID.
(vec_info::set_vinfo_for_stmt): Assert !stmt_vec_info_ro.
|
|
This gets rid of the linked list of STMT_VINFO_VEC_STMT and
STMT_VINFO_RELATED_STMT in preparation for vectorized stmts no
longer needing a stmt_vec_info (just for this chaining). This
has ripple-down effects in all places we gather vectorized
defs. For this, new interfaces are introduced and used
throughout vectorization, simplifying code in a lot of places
and merging it with the SLP way of gathering vectorized
operands. There is vect_get_vec_defs as the new recommended
unified interface and vect_get_vec_defs_for_operand as one
for non-SLP operation. I've resorted to keeping the structure
of the code the same where using vect_get_vec_defs would have
been too disruptive for this already large patch.
2020-06-10 Richard Biener <rguenther@suse.de>
* tree-vect-data-refs.c (vect_vfa_access_size): Adjust.
(vect_record_grouped_load_vectors): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vectorize_fold_left_reduction): Likewise.
(vect_transform_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_lc_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
(vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_get_slp_defs): New function, split out
from overload.
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Remove.
(vect_get_vec_def_for_operand): Likewise.
(vect_get_vec_def_for_stmt_copy): Likewise.
(vect_get_vec_defs_for_stmt_copy): Likewise.
(vect_get_vec_defs_for_operand): New function.
(vect_get_vec_defs): Likewise.
(vect_build_gather_load_calls): Adjust.
(vect_get_gather_scatter_ops): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vect_get_loop_based_defs): Remove.
(vect_create_vectorized_demotion_stmts): Adjust.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_scan_store): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_comparison): Likewise.
(vect_transform_stmt): Adjust and remove no longer applicable
sanity checks.
* tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize
STMT_VINFO_VEC_STMTS.
(vec_info::free_stmt_vec_info): Release it.
* tree-vectorizer.h (_stmt_vec_info::vectorized_stmt): Remove.
(_stmt_vec_info::vec_stmts): Add.
(STMT_VINFO_VEC_STMT): Remove.
(STMT_VINFO_VEC_STMTS): New.
(vect_get_vec_def_for_operand_1): Remove.
(vect_get_vec_def_for_operand): Likewise.
(vect_get_vec_defs_for_stmt_copy): Likewise.
(vect_get_vec_def_for_stmt_copy): Likewise.
(vect_get_vec_defs): New overloads.
(vect_get_vec_defs_for_operand): New.
(vect_get_slp_defs): Declare.
|
|
This removes dead code left over from the reduction vectorization
refactoring last year.
2020-06-09 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vectorizable_induction): Remove dead code.
|
|
This adds vect_get_slp_vect_def to get at a SLP node's vectorized def,
abstracting away the details. It also fixes one stray failure to
use SLP_TREE_REPRESENTATIVE.
2020-05-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_get_slp_vect_def): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
* tree-vect-stmts.c (vect_transform_stmt): Likewise.
(vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE.
* tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single
use ...
(vect_get_slp_defs): ... here.
(vect_get_slp_vect_def): New function.
|
|
This adds an explicit number of scalar lanes to the SLP node,
avoiding the need to dispatch between stmts/ops and eventually not
requiring those vectors at all.
2020-05-27 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::lanes): New.
(SLP_TREE_LANES): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_optimize_slp): Use it.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
* tree-vect-stmts.c (vectorizable_load): Likewise.
(get_vectype_for_scalar_type): Likewise.
|
|
The following removes the ugly pushing of SLP_TREE_DEF_TYPE to
stmt_infos and instead makes sure to handle invariants fully
in vect_is_simple_use, plus adjusts a few places I refrained
from touching when enforcing vector types for them.
It also simplifies building SLP nodes with all external operands
from scalars by not doing that in the parent but instead not
building those from the start. That also gets rid of
vect_update_all_shared_vectypes.
2020-06-04 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_update_all_shared_vectypes): Remove.
(vect_build_slp_tree_2): Simplify building all external op
nodes from scalars.
(vect_slp_analyze_node_operations): Remove push/pop of
STMT_VINFO_DEF_TYPE.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (vect_check_store_rhs): Pass in the
stmt_info, use the vect_is_simple_use overload combining
SLP and stmt_info analysis.
(vect_is_simple_cond): Likewise.
(vectorizable_store): Adjust.
(vectorizable_condition): Likewise.
(vect_is_simple_use): Fully handle invariant SLP nodes
here. Amend stmt_info operand extraction with COND_EXPR
and masked stores.
* tree-vect-loop.c (vectorizable_reduction): Deal with
COND_EXPR representation ugliness.
|
|
This adds SLP_TREE_REPRESENTATIVE - a representative stmt-info that
is used by SLP analysis and code generation. This avoids the need
for the hack in vect_slp_rearrange_stmts which previously avoided
re-arranging stmts that might not have been isomorphic because
of operand swapping. It also plays nice with future directions of SLP
and for the foreseeable future is easier than replicating more and
more info in the SLP node as long as non-SLP is in-tree.
2020-05-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/95272
* tree-vectorizer.h (_slp_tree::representative): Add.
(SLP_TREE_REPRESENTATIVE): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Adjust SLP
node gathering.
(vectorizable_live_operation): Use the representative to
attach the reduction info to.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
SLP_TREE_REPRESENTATIVE.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_slp_rearrange_stmts): Re-arrange even COND_EXPR stmts.
(vect_slp_analyze_node_operations_1): Pass the representative
to vect_analyze_stmt.
(vect_schedule_slp_instance): Pass the representative to
vect_transform_stmt.
* gcc.dg/vect/pr95272.c: New testcase.
|
|
This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors
and provides some infrastructure for setting it in the vectorizable_*
functions, amending those.
2020-05-22 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_is_simple_use): New overload.
(vect_maybe_update_slp_op_vectype): New.
* tree-vect-stmts.c (vect_is_simple_use): New overload
accessing operands of SLP vs. non-SLP operation transparently.
(vect_maybe_update_slp_op_vectype): New function updating
the possibly shared SLP operands vector type.
(vectorizable_operation): Be a bit more SLP vs non-SLP agnostic
using the new vect_is_simple_use overload; update SLP invariant
operand nodes vector type.
(vectorizable_comparison): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_store): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_assignment): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Enforce
present SLP_TREE_VECTYPE and check it matches previous
behavior.
|
|
This improves code generation with SSE2 for the testcase by
making sure to only generate a single IV when the group size
is a multiple of the vector size. It also adjusts the testcase
which was passing before.
2020-05-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/95219
* tree-vect-loop.c (vectorizable_induction): Reduce
group_size before computing the number of required IVs.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: Adjust.
|
|
This adds a vectype parameter to add_stmt_cost which avoids the need
to pass down a (wrong) stmt_info just to carry this information.
Useful for invariants which do not have a stmt_info associated.
2020-05-13 Richard Biener <rguenther@suse.de>
* target.def (add_stmt_cost): Add new vectype parameter.
* targhooks.c (default_add_stmt_cost): Adjust.
* targhooks.h (default_add_stmt_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new
vectype parameter.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
* tree-vectorizer.h (stmt_info_for_cost::vectype): Add.
(dump_stmt_cost): Add new vectype parameter.
(add_stmt_cost): Likewise.
(record_stmt_cost): Likewise.
(record_stmt_cost): Add overload with old signature.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Adjust.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter.
* tree-vect-stmts.c (record_stmt_cost): Likewise.
(vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter
and pass down correct vectype and NULL stmt_info.
(vect_model_simple_cost): Adjust.
(vect_model_store_cost): Likewise.
|
|
This removes the SLP_INSTANCE_GROUP_SIZE member since the number of
lanes throughout a SLP subgraph is not necessarily constant.
2020-05-13 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (SLP_INSTANCE_GROUP_SIZE): Remove.
(_slp_instance::group_size): Likewise.
* tree-vect-loop.c (vectorizable_reduction): The group size
is the number of lanes in the node.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Do not set SLP_INSTANCE_GROUP_SIZE,
verify it matches the instance trees number of lanes.
(vect_slp_analyze_node_operations_1): Use the number of lanes
in the node as group size.
(vect_bb_vectorization_profitable_p): Use the instance root
number of lanes for the size of life.
(vect_schedule_slp_instance): Use the number of lanes as
group_size.
* tree-vect-stmts.c (vectorizable_load): Remove SLP instance
parameter. Use the number of lanes of the load for the group
size in the gap adjustment code.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
|
|
A lot of code that wants to know the number of bits in a vector
element gets that information from the element's TYPE_SIZE,
which is always equal to TYPE_SIZE_UNIT * BITS_PER_UNIT.
This doesn't work for SVE and AVX512-style packed boolean vectors,
where several elements can occupy a single byte.
This patch introduces a new pair of helpers for getting the true
(possibly sub-byte) size. I made a token attempt to convert obvious
element size calculations, but I'm sure I missed some.
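A worked example of the discrepancy (the vector layout is illustrative): a packed boolean vector with 16 elements in 2 bytes has a total TYPE_SIZE of 16 bits, so the true element size is 16 / 16 = 1 bit, while reasoning via TYPE_SIZE_UNIT * BITS_PER_UNIT would claim 8 bits per element.
/* Hypothetical sketch: true (possibly sub-byte) element size.  */
static unsigned
element_bits (unsigned vector_bits, unsigned nelts)
{
  return vector_bits / nelts;   /* 16 bits / 16 elts = 1 bit, not 8 */
}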
2020-05-12 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/94980
* tree.h (vector_element_bits, vector_element_bits_tree): Declare.
* tree.c (vector_element_bits, vector_element_bits_tree): New.
* match.pd: Use the new functions instead of determining the
vector element size directly from TYPE_SIZE(_UNIT).
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-stmts.c (vect_is_simple_cond): Likewise.
* tree-vect-generic.c (expand_vector_piecewise): Likewise.
(expand_vector_conversion): Likewise.
(expand_vector_addition): Likewise for a TYPE_SIZE_UNIT used as
a divisor. Convert the dividend to bits to compensate.
* tree-vect-loop.c (vectorizable_live_operation): Call
vector_element_bits instead of open-coding it.
|
|
This delays the SLP permutation check to vectorizable_load and optimizes
permutations only after all SLP instances have been generated and the
vectorization factor is determined.
2020-05-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vec_info::slp_loads): New.
(vect_optimize_slp): Declare.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when there are no loads.
(vect_gather_slp_loads): Gather loads into a vector.
(vect_supported_load_permutation_p): Remove.
(vect_analyze_slp_instance): Do not verify permutation
validity here.
(vect_analyze_slp): Optimize permutations of reductions
after all SLP instances have been gathered and gather
all loads.
(vect_optimize_slp): New function split out from
vect_supported_load_permutation_p. Elide some permutations.
(vect_slp_analyze_bb_1): Call vect_optimize_slp.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-stmts.c (vectorizable_load): Check whether
the load can be permuted. When generating code assert we can.
* gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported
SLP permutations becoming builds from scalars.
* gcc.dg/vect/bb-slp-pr78205.c: Likewise.
* gcc.dg/vect/bb-slp-34.c: Likewise.
|
|
Soonish we'll get SLP nodes which have no corresponding scalar
stmt and thus no stmt_vec_info and thus no way to get back to
the associated vec_info. This patch makes the vec_info available
as part of the APIs instead of putting that back-pointer into
the leaf data structures.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::vinfo): Remove.
(STMT_VINFO_LOOP_VINFO): Likewise.
(STMT_VINFO_BB_VINFO): Likewise.
* tree-vect-data-refs.c: Adjust for the above, adding vec_info *
parameters and adjusting calls.
* tree-vect-loop-manip.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-patterns.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.c: Likewise.
* target.def (add_stmt_cost): Add vec_info * parameter.
* target.h (stmt_in_inner_loop_p): Likewise.
* targhooks.c (default_add_stmt_cost): Adjust.
* doc/tm.texi: Re-generate.
* config/aarch64/aarch64.c (aarch64_extending_load_p): Add
vec_info * parameter and adjust.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Likewise.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
|
|
This patch fixes a large lmbench performance regression with
128-bit SVE, compiled in length-agnostic mode.
vect_better_loop_vinfo_p (new in GCC 10) tries to estimate whether
a new loop_vinfo is cheaper than a previous one, with an in-built
preference for the old one. For variable VF it prefers the old
loop_vinfo if it is cheaper for at least one VF. However, we have
no idea how likely that VF is in practice.
Another extreme would be to do what most of the rest of the
vectoriser does, and rely solely on the constant estimated VF.
But as noted in the comment, this means that a one-unit cost
difference would be enough to pick the new loop_vinfo,
despite the target generally preferring the old loop_vinfo
where possible. The cost model just isn't accurate enough
for that to produce good results as things stand: there might
not be any practical benefit to the new loop_vinfo at the
estimated VF, and it would be significantly worse for higher VFs.
The patch instead goes for a hacky compromise: make sure that the new
loop_vinfo is also no worse than the old loop_vinfo at double the
estimated VF. For all but trivial loops, this ensures that the
new loop_vinfo is only chosen if it is better than the old one
by a non-trivial amount at the estimated VF. It also avoids
putting too much faith in the VF estimate.
I realise this isn't great, but it's supposed to be a conservative fix
suitable for stage 4. The only affected testcases are the ones for
pr89007-*.c, where Advanced SIMD is indeed preferred for 128-bit SVE
and is no worse for 256-bit SVE.
Part of the problem here is that if the new loop_vinfo is better,
we discard the old one and never consider using it even as an
epilogue loop. This means that if we choose Advanced SIMD over SVE,
we're much more likely to have left-over scalar elements.
Another is that the estimate provided by estimated_poly_value might have
different probabilities attached. E.g. when tuning for a particular core,
the estimate is probably accurate, but when tuning for generic code,
the estimate is more of a guess. Relying solely on the estimate is
probably correct for the former but not for the latter.
Hopefully those are things that we could tackle in GCC 11.
2020-04-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_better_loop_vinfo_p): If old_loop_vinfo
has a variable VF, prefer new_loop_vinfo if it is cheaper for the
estimated VF and is no worse at double the estimated VF.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_8.c: New test.
* gcc.target/aarch64/sve/cost_model_9.c: Likewise.
* gcc.target/aarch64/sve/pr89007-1.c: Add -msve-vector-bits=512.
* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
|
|
This patch fixes the stupid mistake of using gsi_insert_before
where gsi_insert_seq_before was needed.
BTW, regression testing on one x86_64 machine from the CFarm was
unable to reveal it (I guess due to the native arch being Sandy
Bridge?), so I specified the additional option -march=znver2 and
verified the coverage.
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) and
x86_64-pc-linux-gnu; also verified the failing cases in the
related PRs.
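For reference, a minimal sketch of the distinction (gsi, expr and
stmts are illustrative placeholders for the insertion point):

  /* gsi_insert_before links in a single gimple statement, so a
     multi-statement gimple_seq must go through the _seq variant,
     or part of the sequence never makes it into the IL.  */
  gimple_seq stmts = NULL;
  tree val = force_gimple_operand (expr, &stmts, true, NULL_TREE);
  gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);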
2020-04-03 Kewen Lin <linkw@gcc.gnu.org>
gcc/
PR tree-optimization/94443
* tree-vect-loop.c (vectorizable_live_operation): Use
gsi_insert_seq_before to replace gsi_insert_before.
gcc/testsuite/
PR tree-optimization/94443
* gcc.dg/vect/pr94443.c: New test.
|
|
As PR94043 shows, my commit r10-4524 exposed one issue in
vectorizable_live_operation, which inserts one extra BB
before the single exit, leading to unexpected operand expansion
and an unexpected loop-depth assertion. As Richi suggested,
this patch teaches vectorizable_live_operation to generate a
loop-closed PHI for vec_lhs; the transformation looks like:
loop;
# lhs' = PHI <lhs>
=>
loop;
# vec_lhs' = PHI <vec_lhs>
new_tree = BIT_FIELD_REF <vec_lhs', ...>;
lhs' = new_tree;
I noticed that there are some SLP cases that have the same lhs
and vec_lhs but different offsets, which can leave us with
multiple PHIs for the same vec_lhs. But I think that is fine:
only one of them is actually live, and the others should be
eliminated by the following DCE. So the patch doesn't check
whether a PHI for vec_lhs already exists; it just creates one
directly.
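A rough sketch of the new lowering, assuming exit_e is the single
exit edge (variable names are illustrative, not the exact ones in
the patch):

  /* Create the loop-closed PHI for the vector def on the exit.  */
  tree phi_res = copy_ssa_name (vec_lhs);
  gphi *phi = create_phi_node (phi_res, exit_e->dest);
  add_phi_arg (phi, vec_lhs, exit_e, UNKNOWN_LOCATION);

  /* Extract the live lane from the PHI result rather than from
     vec_lhs itself.  */
  tree new_tree = build3 (BIT_FIELD_REF, TREE_TYPE (vectype),
                          phi_res, bitsize, bitstart);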
Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8.
2020-04-01 Kewen Lin <linkw@gcc.gnu.org>
gcc/ChangeLog
PR tree-optimization/94043
* tree-vect-loop.c (vectorizable_live_operation): Generate loop-closed
phi for vec_lhs and use it for lane extraction.
gcc/testsuite/ChangeLog
PR tree-optimization/94043
* gfortran.dg/graphite/vect-pr94043.f90: New test.
|
|
In the r10-7197-gbae7b38cf8a21e068ad5c0bab089dedb78af3346 commit I
noticed a duplicated word in a message, which led me to grep for
those, and we have tons of them.
I've used:
grep -v 'long long\|optab optab\|template template\|double double' *.[chS] */*.[chS] *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 '
Note that the command will not detect doubled words at the start or
end of a line, or when one word ends a line and the next word starts
the following line.
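One way to also catch pairs split across a line boundary would be
GNU grep's -z option, which treats the whole file as a single line
so that \s can match a newline (an untested sketch only):
grep -Pzo '\b([a-zA-Z]+)\s+\1\b' *.[chS] */*.[chS]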
Some of the hits are fairly obvious, e.g. all the "the the" cases,
which is something I've posted and committed patches for before,
e.g. in 2016; other cases are often valid, e.g. "that that" seems
to look mostly OK to me.
Some cases are quite hard to figure out, and I've left some of them
out of the patch (e.g. "and and" in some cases isn't talking about
bitwise/logical AND and so looks incorrect, but in other cases it
is talking about those operations).
In most cases the right solution seems to be to remove one of the duplicated
words, but not always.
I think the most important ones are those with user-visible messages
(3 of the first 4 hunks in the patch); the rest are just comments
(and internal documentation; for that see the doc/tm.texi changes).
2020-03-17 Jakub Jelinek <jakub@redhat.com>
* lra-spills.c (remove_pseudos): Fix up duplicated word issue in
a dump message.
* tree-sra.c (create_access_replacement): Fix up duplicated word issue
in a comment.
* read-rtl-function.c (find_param_by_name,
function_reader::parse_enum_value, function_reader::get_insn_by_uid):
Likewise.
* spellcheck.c (get_edit_distance_cutoff): Likewise.
* tree-data-ref.c (create_ifn_alias_checks): Likewise.
* tree.def (SWITCH_EXPR): Likewise.
* selftest.c (assert_str_contains): Likewise.
* ipa-param-manipulation.h (class ipa_param_body_adjustments):
Likewise.
* tree-ssa-math-opts.c (convert_expand_mult_copysign): Likewise.
* tree-ssa-loop-split.c (find_vdef_in_loop): Likewise.
* langhooks.h (struct lang_hooks_for_decls): Likewise.
* ipa-prop.h (struct ipa_param_descriptor): Likewise.
* tree-ssa-strlen.c (handle_builtin_string_cmp, handle_store):
Likewise.
* tree-ssa-dom.c (simplify_stmt_for_jump_threading): Likewise.
* tree-ssa-reassoc.c (reassociate_bb): Likewise.
* tree.c (component_ref_size): Likewise.
* hsa-common.c (hsa_init_compilation_unit_data): Likewise.
* gimple-ssa-sprintf.c (get_string_length, format_string,
format_directive): Likewise.
* omp-grid.c (grid_process_kernel_body_copy): Likewise.
* input.c (string_concat_db::get_string_concatenation,
test_lexer_string_locations_ucn4): Likewise.
* cfgexpand.c (pass_expand::execute): Likewise.
* gimple-ssa-warn-restrict.c (builtin_memref::offset_out_of_bounds,
maybe_diag_overlap): Likewise.
* rtl.c (RTX_CODE_HWINT_P_1): Likewise.
* shrink-wrap.c (spread_components): Likewise.
* tree-ssa-dse.c (initialize_ao_ref_for_dse, valid_ao_ref_for_dse):
Likewise.
* tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds):
Likewise.
* dwarf2out.c (dwarf2out_early_finish): Likewise.
* gimple-ssa-store-merging.c: Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* target.def (dispatch): Likewise.
(validate_dims, gen_ccmp_first): Fix up duplicated word issue
in documentation text.
* doc/tm.texi: Regenerated.
* config/i386/x86-tune.def (X86_TUNE_PARTIAL_FLAG_REG_STALL): Fix up
duplicated word issue in a comment.
* config/i386/i386.c (ix86_test_loading_unspec): Likewise.
* config/i386/i386-features.c (remove_partial_avx_dependency):
Likewise.
* config/msp430/msp430.c (msp430_select_section): Likewise.
* config/gcn/gcn-run.c (load_image): Likewise.
* config/aarch64/aarch64-sve.md (sve_ld1r<mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(single_dest_per_chain): Likewise.
* config/nvptx/nvptx.c (nvptx_record_fndecl): Likewise.
* config/fr30/fr30.c (fr30_arg_partial_bytes): Likewise.
* config/rs6000/rs6000-string.c (expand_cmp_vec_sequence): Likewise.
* config/rs6000/rs6000-p8swap.c (replace_swapped_load_constant):
Likewise.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
* config/rs6000/rs6000-logue.c
(rs6000_emit_probe_stack_range_stack_clash): Likewise.
* config/nds32/nds32-md-auxiliary.c (nds32_split_ashiftdi3): Likewise.
Fix various other issues in the comment.
c-family/
* c-common.c (resolve_overloaded_builtin): Fix up duplicated word
issue in a diagnostic message.
cp/
* pt.c (tsubst): Fix up duplicated word issue in a diagnostic message.
(lookup_template_class_1, tsubst_expr): Fix up duplicated word issue
in a comment.
* parser.c (cp_parser_statement, cp_parser_linkage_specification,
cp_parser_placeholder_type_specifier,
cp_parser_constraint_requires_parens): Likewise.
* name-lookup.c (suggest_alternative_in_explicit_scope): Likewise.
fortran/
* array.c (gfc_check_iter_variable): Fix up duplicated word issue
in a comment.
* arith.c (gfc_arith_concat): Likewise.
* resolve.c (gfc_resolve_ref): Likewise.
* frontend-passes.c (matmul_lhs_realloc): Likewise.
* module.c (gfc_match_submodule, load_needed): Likewise.
* trans-expr.c (gfc_init_se): Likewise.
|
|
gcc.dg/pr56350.c started ICEing for SVE in GCC 10 because we
pattern-matched a division reduction:
a /= 8;
into a signed shift with division semantics:
... = IFN_SDIV_POW2 (..., 3);
whereas the reduction code expected it still to be a gassign.
One fix would be to check for a reduction in the pattern matcher
(but current patterns don't generally do that). Another would be
to fail gracefully for reductions involving calls. Since we can't
vectorise the reduction either way, and probably have a better shot
with the shift form, this patch goes for the "fail gracefully" approach.
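A sketch of the graceful failure, assuming the check boils down to
the reduction stmt no longer being a gassign (the dump text is
illustrative):

  /* The pattern may have turned the division into an internal-fn
     call such as IFN_SDIV_POW2, which the reduction-chain handling
     cannot vectorize; bail out instead of ICEing.  */
  gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
  if (!assign)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction chain includes calls.\n");
      return false;
    }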
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Fail gracefully
for reduction chains that (now) include a call.
|
|
When versioning is run, the IL is already mangled and finding
the IFN_LOOP_VECTORIZED call can fail.
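A sketch of the adjusted interfaces implied by the ChangeLog
(signatures paraphrased): the call discovered in try_vectorize_loop_1
is threaded through instead of being re-found in the mangled IL:

  /* Both now receive the earlier-discovered IFN_LOOP_VECTORIZED
     call, or NULL when there is none.  */
  class loop *vect_transform_loop (loop_vec_info, gimple *);
  class loop *vect_loop_versioning (loop_vec_info, gimple *);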
2020-01-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/93094
* tree-vectorizer.h (vect_loop_versioning): Adjust.
(vect_transform_loop): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Pass down
loop_vectorized_call to vect_transform_loop.
* tree-vect-loop.c (vect_transform_loop): Pass down
loop_vectorized_call to vect_loop_versioning.
* tree-vect-loop-manip.c (vect_loop_versioning): Use
the earlier discovered loop_vectorized_call.
* gcc.dg/vect/pr93094.c: New testcase.
|
|
This patch addresses the problem reported in PR92429. When creating
an epilogue for vectorization we have to replace the SSA_NAMEs in the
PATTERN_DEF_SEQs and RELATED_STMTs of the epilogue's loop_vec_infos.
When doing this we were using simplify_replace_tree, which always
folds the replacement. This may lead to a different tree node than
the one that was analyzed in vect_analyze_loop. In turn the new tree
node may require a different vectorization than the one we had
prepared for, which caused the ICE in question.
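A sketch of the adjusted call (treating do_fold as an illustrative
name for the new folding-control parameter):

  /* Request a purely textual replacement: do not refold, so the
     epilogue keeps exactly the tree shape that was analyzed.  */
  op = simplify_replace_tree (op, old_ssa, new_ssa, NULL, NULL,
                              /*do_fold=*/false);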
gcc/ChangeLog:
2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR tree-optimization/92429
* tree-ssa-loop-niter.h (simplify_replace_tree): Add parameter.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add parameter to
control folding.
* tree-vect-loop.c (update_epilogue_vinfo): Do not fold when replacing
tree.
gcc/testsuite/ChangeLog:
2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR tree-optimization/92429
* gcc.dg/vect/pr92429.c: New test.
|
|
My earlier update_epilogue_loop_vinfo patch introduced an ICE on these
tests for AVX512. If we use pattern stmts, STMT_VINFO_GATHER_SCATTER_P
is valid for both the original stmt and the pattern stmt, but
STMT_VINFO_MEMORY_ACCESS_TYPE is valid only for the latter.
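A sketch of the check (vect_stmt_to_vectorize resolves to the
pattern stmt when one exists, which is where the access type is
valid; surrounding details elided):

  stmt_vec_info vstmt = vect_stmt_to_vectorize (stmt_vinfo);
  if (STMT_VINFO_MEMORY_ACCESS_TYPE (vstmt) == VMAT_GATHER_SCATTER)
    /* ... rebuild the gather/scatter info for the epilogue ...  */;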
2020-01-15 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/93247
* tree-vect-loop.c (update_epilogue_loop_vinfo): Check the access
type of the stmt that we're going to vectorize.
gcc/testsuite/
PR tree-optimization/93247
* gcc.dg/vect/pr93247-1.c: New test.
* gcc.dg/vect/pr93247-2.c: Likewise.
|
|
The related_vector_mode series missed this case in
vect_create_epilog_for_reduction, where we want to create the
unsigned integer equivalent of another vector. Without it we
could mix SVE and Advanced SIMD vectors in the same operation.
This showed up on existing tests when testing with fixed-length
-msve-vector-bits=128.
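A sketch of the fix as described (names follow the surrounding
reduction code; treat them as illustrative):

  /* Build the index type in the same mode family as VECTYPE instead
     of whatever mode build_vector_type would pick, so SVE and
     Advanced SIMD vectors cannot end up mixed in one operation.  */
  tree cr_index_vector_type
    = get_related_vectype_for_scalar_type (TYPE_MODE (vectype),
                                           cr_index_scalar_type,
                                           TYPE_VECTOR_SUBPARTS (vectype));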
2020-01-10 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type rather than build_vector_type
to create the index type for a conditional reduction.
From-SVN: r280112
|