Age | Commit message (Collapse) | Author | Files | Lines |
|
This adds vect_get_slp_vect_def to get at a SLP nodes vectorized def,
abstracting away the details. It also fixes one stray failure to
use SLP_TREE_REPRESENTATIVE.
2020-05-04 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_get_slp_vect_def): Declare.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
* tree-vect-stmts.c (vect_transform_stmt): Likewise.
(vect_is_simple_use): Use SLP_TREE_REPRESENTATIVE.
* tree-vect-slp.c (vect_get_slp_vect_defs): Fold into single
use ...
(vect_get_slp_defs): ... here.
(vect_get_slp_vect_def): New function.
|
|
This adds an explicit number of scalar lanes to the SLP node
avoiding to dispatch between stmts/ops and eventually not require
those vectors at all.
2020-05-27 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_tree::lanes): New.
(SLP_TREE_LANES): Likewise.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use it.
(vectorizable_reduction): Likewise.
(vect_transform_cycle_phi): Likewise.
(vectorizable_induction): Likewise.
(vectorizable_live_operation): Likewise.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize lanes.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_optimize_slp): Use it.
(vect_slp_analyze_node_operations_1): Likewise.
(vect_slp_convert_to_external): Likewise.
(vect_bb_vectorization_profitable_p): Likewise.
* tree-vect-stmts.c (vectorizable_load): Likewise.
(get_vectype_for_scalar_type): Likewise.
|
|
The following removes the ugly pushing of SLP_TREE_DEF_TYPE to
stmt_infos and instead makes sure to handle invariants fully
in vect_is_simple_use plus adjusting a few places I refrained
from touching when enforcing vector types for them.
It also simplifies building SLP nodes with all external operands
from scalars by not doing that in the parent but instead not
building those from the start. That also gets rid of
vect_update_all_shared_vectypes.
2020-06-04 Richard Biener <rguenther@suse.de>
* tree-vect-slp.c (vect_update_all_shared_vectypes): Remove.
(vect_build_slp_tree_2): Simplify building all external op
nodes from scalars.
(vect_slp_analyze_node_operations): Remove push/pop of
STMT_VINFO_DEF_TYPE.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (ect_check_store_rhs): Pass in the
stmt_info, use the vect_is_simple_use overload combining
SLP and stmt_info analysis.
(vect_is_simple_cond): Likewise.
(vectorizable_store): Adjust.
(vectorizable_condition): Likewise.
(vect_is_simple_use): Fully handle invariant SLP nodes
here. Amend stmt_info operand extraction with COND_EXPR
and masked stores.
* tree-vect-loop.c (vectorizable_reduction): Deal with
COND_EXPR representation ugliness.
|
|
This adds SLP_TREE_REPRESENTATIVE - a representative stmt-info that
is used by SLP analysis and code generation. This avoids the need
for the hack in vect_slp_rearrange_stmts which previously avoided
to re-arrange stmts that might not have been isomorphic because
of operand swapping. It also plays nice with future directions of SLP
and for the forseeable future is easier than replicating more and
more info in the SLP node as long as non-SLP is in-tree.
2020-05-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/95272
* tree-vectorizer.h (_slp_tree::representative): Add.
(SLP_TREE_REPRESENTATIVE): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Adjust SLP
node gathering.
(vectorizable_live_operation): Use the representative to
attach the reduction info to.
* tree-vect-slp.c (_slp_tree::_slp_tree): Initialize
SLP_TREE_REPRESENTATIVE.
(vect_create_new_slp_node): Likewise.
(slp_copy_subtree): Copy it.
(vect_slp_rearrange_stmts): Re-arrange even COND_EXPR stmts.
(vect_slp_analyze_node_operations_1): Pass the representative
to vect_analyze_stmt.
(vect_schedule_slp_instance): Pass the representative to
vect_transform_stmt.
* gcc.dg/vect/pr95272.c: New testcase.
|
|
This tries to enforce a set SLP_TREE_VECTYPE in vect_get_constant_vectors
and provides some infrastructure for setting it in the vectorizable_*
functions, amending those.
2020-05-22 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vect_is_simple_use): New overload.
(vect_maybe_update_slp_op_vectype): New.
* tree-vect-stmts.c (vect_is_simple_use): New overload
accessing operands of SLP vs. non-SLP operation transparently.
(vect_maybe_update_slp_op_vectype): New function updating
the possibly shared SLP operands vector type.
(vectorizable_operation): Be a bit more SLP vs non-SLP agnostic
using the new vect_is_simple_use overload; update SLP invariant
operand nodes vector type.
(vectorizable_comparison): Likewise.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_store): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_assignment): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* tree-vect-slp.c (vect_get_constant_vectors): Enforce
present SLP_TREE_VECTYPE and check it matches previous
behavior.
|
|
This improves code generation with SSE2 for the testcase by
making sure to only generate a single IV when the group size
is a multiple of the vector size. It also adjusts the testcase
which was passing before.
2020-05-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/95219
* tree-vect-loop.c (vectorizable_induction): Reduce
group_size before computing the number of required IVs.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c: Adjust.
|
|
This adds a vectype parameter to add_stmt_cost which avoids the need
to pass down a (wrong) stmt_info just to carry this information.
Useful for invariants which do not have a stmt_info associated.
2020-05-13 Richard Biener <rguenther@suse.de>
* target.def (add_stmt_cost): Add new vectype parameter.
* targhooks.c (default_add_stmt_cost): Adjust.
* targhooks.h (default_add_stmt_cost): Likewise.
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Take new
vectype parameter.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
* tree-vectorizer.h (stmt_info_for_cost::vectype): Add.
(dump_stmt_cost): Add new vectype parameter.
(add_stmt_cost): Likewise.
(record_stmt_cost): Likewise.
(record_stmt_cost): Add overload with old signature.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Adjust.
(vect_get_known_peeling_cost): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
* tree-vectorizer.c (dump_stmt_cost): Add new vectype parameter.
* tree-vect-stmts.c (record_stmt_cost): Likewise.
(vect_prologue_cost_for_slp_op): Remove stmt_vec_info parameter
and pass down correct vectype and NULL stmt_info.
(vect_model_simple_cost): Adjust.
(vect_model_store_cost): Likewise.
|
|
This removes the SLP_INSTANCE_GROUP_SIZE member since the number of
lanes throughout a SLP subgraph is not necessarily constant.
2020-05-13 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (SLP_INSTANCE_GROUP_SIZE): Remove.
(_slp_instance::group_size): Likewise.
* tree-vect-loop.c (vectorizable_reduction): The group size
is the number of lanes in the node.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Likewise.
(vect_analyze_slp_instance): Do not set SLP_INSTANCE_GROUP_SIZE,
verify it matches the instance trees number of lanes.
(vect_slp_analyze_node_operations_1): Use the numer of lanes
in the node as group size.
(vect_bb_vectorization_profitable_p): Use the instance root
number of lanes for the size of life.
(vect_schedule_slp_instance): Use the number of lanes as
group_size.
* tree-vect-stmts.c (vectorizable_load): Remove SLP instance
parameter. Use the number of lanes of the load for the group
size in the gap adjustment code.
(vect_analyze_stmt): Adjust.
(vect_transform_stmt): Likewise.
|
|
A lot of code that wants to know the number of bits in a vector
element gets that information from the element's TYPE_SIZE,
which is always equal to TYPE_SIZE_UNIT * BITS_PER_UNIT.
This doesn't work for SVE and AVX512-style packed boolean vectors,
where several elements can occupy a single byte.
This patch introduces a new pair of helpers for getting the true
(possibly sub-byte) size. I made a token attempt to convert obvious
element size calculations, but I'm sure I missed some.
2020-05-12 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/94980
* tree.h (vector_element_bits, vector_element_bits_tree): Declare.
* tree.c (vector_element_bits, vector_element_bits_tree): New.
* match.pd: Use the new functions instead of determining the
vector element size directly from TYPE_SIZE(_UNIT).
* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Likewise.
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Likewise.
* tree-vect-stmts.c (vect_is_simple_cond): Likewise.
* tree-vect-generic.c (expand_vector_piecewise): Likewise.
(expand_vector_conversion): Likewise.
(expand_vector_addition): Likewise for a TYPE_SIZE_UNIT used as
a divisor. Convert the dividend to bits to compensate.
* tree-vect-loop.c (vectorizable_live_operation): Call
vector_element_bits instead of open-coding it.
|
|
This delays the SLP permutation check to vectorizable_load and optimizes
permutations only after all SLP instances have been generated and the
vectorization factor is determined.
2020-05-08 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (vec_info::slp_loads): New.
(vect_optimize_slp): Declare.
* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Do
nothing when there are no loads.
(vect_gather_slp_loads): Gather loads into a vector.
(vect_supported_load_permutation_p): Remove.
(vect_analyze_slp_instance): Do not verify permutation
validity here.
(vect_analyze_slp): Optimize permutations of reductions
after all SLP instances have been gathered and gather
all loads.
(vect_optimize_slp): New function split out from
vect_supported_load_permutation_p. Elide some permutations.
(vect_slp_analyze_bb_1): Call vect_optimize_slp.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
* tree-vect-stmts.c (vectorizable_load): Check whether
the load can be permuted. When generating code assert we can.
* gcc.dg/vect/bb-slp-pr68892.c: Adjust for not supported
SLP permutations becoming builds from scalars.
* gcc.dg/vect/bb-slp-pr78205.c: Likewise.
* gcc.dg/vect/bb-slp-34.c: Likewise.
|
|
Soonish we'll get SLP nodes which have no corresponding scalar
stmt and thus not stmt_vec_info and thus no way to get back to
the associated vec_info. This patch makes the vec_info available
as part of the APIs instead of putting in that back-pointer into
the leaf data structures.
2020-05-05 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_stmt_vec_info::vinfo): Remove.
(STMT_VINFO_LOOP_VINFO): Likewise.
(STMT_VINFO_BB_VINFO): Likewise.
* tree-vect-data-refs.c: Adjust for the above, adding vec_info *
parameters and adjusting calls.
* tree-vect-loop-manip.c: Likewise.
* tree-vect-loop.c: Likewise.
* tree-vect-patterns.c: Likewise.
* tree-vect-slp.c: Likewise.
* tree-vect-stmts.c: Likewise.
* tree-vectorizer.c: Likewise.
* target.def (add_stmt_cost): Add vec_info * parameter.
* target.h (stmt_in_inner_loop_p): Likewise.
* targhooks.c (default_add_stmt_cost): Adjust.
* doc/tm.texi: Re-generate.
* config/aarch64/aarch64.c (aarch64_extending_load_p): Add
vec_info * parameter and adjust.
(aarch64_sve_adjust_stmt_cost): Likewise.
(aarch64_add_stmt_cost): Likewise.
* config/arm/arm.c (arm_add_stmt_cost): Likewise.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
|
|
This patch fixes a large lmbench performance regression with
128-bit SVE, compiled in length-agnostic mode.
vect_better_loop_vinfo_p (new in GCC 10) tries to estimate whether
a new loop_vinfo is cheaper than a previous one, with an in-built
preference for the old one. For variable VF it prefers the old
loop_vinfo if it is cheaper for at least one VF. However, we have
no idea how likely that VF is in practice.
Another extreme would be to do what most of the rest of the
vectoriser does, and rely solely on the constant estimated VF.
But as noted in the comment, this means that a one-unit cost
difference would be enough to pick the new loop_vinfo,
despite the target generally preferring the old loop_vinfo
where possible. The cost model just isn't accurate enough
for that to produce good results as things stand: there might
not be any practical benefit to the new loop_vinfo at the
estimated VF, and it would be significantly worse for higher VFs.
The patch instead goes for a hacky compromise: make sure that the new
loop_vinfo is also no worse than the old loop_vinfo at double the
estimated VF. For all but trivial loops, this ensures that the
new loop_vinfo is only chosen if it is better than the old one
by a non-trivial amount at the estimated VF. It also avoids
putting too much faith in the VF estimate.
I realise this isn't great, but it's supposed to be a conservative fix
suitable for stage 4. The only affected testcases are the ones for
pr89007-*.c, where Advanced SIMD is indeed preferred for 128-bit SVE
and is no worse for 256-bit SVE.
Part of the problem here is that if the new loop_vinfo is better,
we discard the old one and never consider using it even as an
epilogue loop. This means that if we choose Advanced SIMD over SVE,
we're much more likely to have left-over scalar elements.
Another is that the estimate provided by estimated_poly_value might have
different probabilities attached. E.g. when tuning for a particular core,
the estimate is probably accurate, but when tuning for generic code,
the estimate is more of a guess. Relying solely on the estimate is
probably correct for the former but not for the latter.
Hopefully those are things that we could tackle in GCC 11.
2020-04-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_better_loop_vinfo_p): If old_loop_vinfo
has a variable VF, prefer new_loop_vinfo if it is cheaper for the
estimated VF and is no worse at double the estimated VF.
gcc/testsuite/
* gcc.target/aarch64/sve/cost_model_8.c: New test.
* gcc.target/aarch64/sve/cost_model_9.c: Likewise.
* gcc.target/aarch64/sve/pr89007-1.c: Add -msve-vector-bits=512.
* gcc.target/aarch64/sve/pr89007-2.c: Likewise.
|
|
This patch is to fix the stupid mistake by using
gsi_insert_seq_before instead of gsi_insert_before.
BTW, the regression testing on one x86_64 machine from CFarm is
unable to reveal it (I guess due to native arch sandybridge?), so I
specified additional option -march=znver2 and verified the coverage.
Bootstrapped/regtested on powerpc64le-linux-gnu (P9) and
x86_64-pc-linux-gnu, also verified the fail cases in related PRs.
2020-04-03 Kewen Lin <linkw@gcc.gnu.org>
gcc/
PR tree-optimization/94443
* tree-vect-loop.c (vectorizable_live_operation): Use
gsi_insert_seq_before to replace gsi_insert_before.
gcc/testsuite/
PR tree-optimization/94443
* gcc.dg/vect/pr94443.c: New test.
|
|
As PR94043 shows, my commit r10-4524 exposed one issue in
vectorizable_live_operation, which inserts one extra BB
before the single exit, leading unexpected operand expansion
and unexpected loop depth assertion. As Richi suggested,
this patch is to teach vectorizable_live_operation to
generate loop closed phi for vec_lhs, it looks like:
loop;
# lhs' = PHI <lhs>
=>
loop;
# vec_lhs' = PHI <vec_lhs>
new_tree = BIT_FIELD_REF <vec_lhs', ...>;
lhs' = new_tree;
I noticed that there are some SLP cases that have same lhs
and vec_lhs but different offsets, which can make us have
more PHIs for the same vec_lhs there. But I think it would
be fine since only one of them is actually live, the others
should be eliminated by the following dce. So the patch
doesn't check whether there is one phi for vec_lhs, just
create one directly instead.
Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8.
2020-04-01 Kewen Lin <linkw@gcc.gnu.org>
gcc/ChangeLog
PR tree-optimization/94043
* tree-vect-loop.c (vectorizable_live_operation): Generate loop-closed
phi for vec_lhs and use it for lane extraction.
gcc/testsuite/ChangeLog
PR tree-optimization/94043
* gfortran.dg/graphite/vect-pr94043.f90: New test.
|
|
In the r10-7197-gbae7b38cf8a21e068ad5c0bab089dedb78af3346 commit I've
noticed duplicated word in a message, which lead me to grep for those and
we have a tons of them.
I've used
grep -v 'long long\|optab optab\|template template\|double double' *.[chS] */*.[chS] *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 '
Note, the command will not detect the doubled words at the start or end of
line or when one of the words is at the end of line and the next one at the
start of another one.
Some of it is fairly obvious, e.g. all the "the the" cases which is
something I've posted and committed patch for already e.g. in 2016,
other cases are often valid, e.g. "that that" seems to look mostly ok to me.
Some cases are quite hard to figure out, I've left out some of them from the
patch (e.g. "and and" in some cases isn't talking about bitwise/logical and
and so looks incorrect, but in other cases it is talking about those
operations).
In most cases the right solution seems to be to remove one of the duplicated
words, but not always.
I think most important are the ones with user visible messages (in the patch
3 of the first 4 hunks), the rest is just comments (and internal
documentation; for that see the doc/tm.texi changes).
2020-03-17 Jakub Jelinek <jakub@redhat.com>
* lra-spills.c (remove_pseudos): Fix up duplicated word issue in
a dump message.
* tree-sra.c (create_access_replacement): Fix up duplicated word issue
in a comment.
* read-rtl-function.c (find_param_by_name,
function_reader::parse_enum_value, function_reader::get_insn_by_uid):
Likewise.
* spellcheck.c (get_edit_distance_cutoff): Likewise.
* tree-data-ref.c (create_ifn_alias_checks): Likewise.
* tree.def (SWITCH_EXPR): Likewise.
* selftest.c (assert_str_contains): Likewise.
* ipa-param-manipulation.h (class ipa_param_body_adjustments):
Likewise.
* tree-ssa-math-opts.c (convert_expand_mult_copysign): Likewise.
* tree-ssa-loop-split.c (find_vdef_in_loop): Likewise.
* langhooks.h (struct lang_hooks_for_decls): Likewise.
* ipa-prop.h (struct ipa_param_descriptor): Likewise.
* tree-ssa-strlen.c (handle_builtin_string_cmp, handle_store):
Likewise.
* tree-ssa-dom.c (simplify_stmt_for_jump_threading): Likewise.
* tree-ssa-reassoc.c (reassociate_bb): Likewise.
* tree.c (component_ref_size): Likewise.
* hsa-common.c (hsa_init_compilation_unit_data): Likewise.
* gimple-ssa-sprintf.c (get_string_length, format_string,
format_directive): Likewise.
* omp-grid.c (grid_process_kernel_body_copy): Likewise.
* input.c (string_concat_db::get_string_concatenation,
test_lexer_string_locations_ucn4): Likewise.
* cfgexpand.c (pass_expand::execute): Likewise.
* gimple-ssa-warn-restrict.c (builtin_memref::offset_out_of_bounds,
maybe_diag_overlap): Likewise.
* rtl.c (RTX_CODE_HWINT_P_1): Likewise.
* shrink-wrap.c (spread_components): Likewise.
* tree-ssa-dse.c (initialize_ao_ref_for_dse, valid_ao_ref_for_dse):
Likewise.
* tree-call-cdce.c (shrink_wrap_one_built_in_call_with_conds):
Likewise.
* dwarf2out.c (dwarf2out_early_finish): Likewise.
* gimple-ssa-store-merging.c: Likewise.
* ira-costs.c (record_operand_costs): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
* target.def (dispatch): Likewise.
(validate_dims, gen_ccmp_first): Fix up duplicated word issue
in documentation text.
* doc/tm.texi: Regenerated.
* config/i386/x86-tune.def (X86_TUNE_PARTIAL_FLAG_REG_STALL): Fix up
duplicated word issue in a comment.
* config/i386/i386.c (ix86_test_loading_unspec): Likewise.
* config/i386/i386-features.c (remove_partial_avx_dependency):
Likewise.
* config/msp430/msp430.c (msp430_select_section): Likewise.
* config/gcn/gcn-run.c (load_image): Likewise.
* config/aarch64/aarch64-sve.md (sve_ld1r<mode>): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_adjusted_ldpstp): Likewise.
* config/aarch64/falkor-tag-collision-avoidance.c
(single_dest_per_chain): Likewise.
* config/nvptx/nvptx.c (nvptx_record_fndecl): Likewise.
* config/fr30/fr30.c (fr30_arg_partial_bytes): Likewise.
* config/rs6000/rs6000-string.c (expand_cmp_vec_sequence): Likewise.
* config/rs6000/rs6000-p8swap.c (replace_swapped_load_constant):
Likewise.
* config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Likewise.
* config/rs6000/rs6000-logue.c
(rs6000_emit_probe_stack_range_stack_clash): Likewise.
* config/nds32/nds32-md-auxiliary.c (nds32_split_ashiftdi3): Likewise.
Fix various other issues in the comment.
c-family/
* c-common.c (resolve_overloaded_builtin): Fix up duplicated word
issue in a diagnostic message.
cp/
* pt.c (tsubst): Fix up duplicated word issue in a diagnostic message.
(lookup_template_class_1, tsubst_expr): Fix up duplicated word issue
in a comment.
* parser.c (cp_parser_statement, cp_parser_linkage_specification,
cp_parser_placeholder_type_specifier,
cp_parser_constraint_requires_parens): Likewise.
* name-lookup.c (suggest_alternative_in_explicit_scope): Likewise.
fortran/
* array.c (gfc_check_iter_variable): Fix up duplicated word issue
in a comment.
* arith.c (gfc_arith_concat): Likewise.
* resolve.c (gfc_resolve_ref): Likewise.
* frontend-passes.c (matmul_lhs_realloc): Likewise.
* module.c (gfc_match_submodule, load_needed): Likewise.
* trans-expr.c (gfc_init_se): Likewise.
|
|
gcc.dg/pr56350.c started ICEing for SVE in GCC 10 because we
pattern-matched a division reduction:
a /= 8;
into a signed shift with division semantics:
... = IFN_SDIV_POW2 (..., 3);
whereas the reduction code expected it still to be a gassign.
One fix would be to check for a reduction in the pattern matcher
(but current patterns don't generally do that). Another would be
to fail gracefully for reductions involving calls. Since we can't
vectorise the reduction either way, and probably have a better shot
with the shift form, this patch goes for the "fail gracefully" approach.
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Fail gracefully
for reduction chains that (now) include a call.
|
|
When versioning is run the IL is already mangled and finding
a VECTORIZED_CALL IFN can fail.
2020-01-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/93094
* tree-vectorizer.h (vect_loop_versioning): Adjust.
(vect_transform_loop): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Pass down
loop_vectorized_call to vect_transform_loop.
* tree-vect-loop.c (vect_transform_loop): Pass down
loop_vectorized_call to vect_loop_versioning.
* tree-vect-loop-manip.c (vect_loop_versioning): Use
the earlier discovered loop_vectorized_call.
* gcc.dg/vect/pr93094.c: New testcase.
|
|
This patch addresses the problem reported in PR92429. When creating an
epilogue for vectorization we have to replace the SSA_NAMEs in the
PATTERN_DEF_SEQs and RELATED_STMTs of the epilogue's loop_vec_infos. When doing
this we were using simplify_replace_tree which always folds the replacement.
This may lead to a different tree-node than the one which was analyzed in
vect_loop_analyze. In turn the new tree-node may require a different
vectorization than the one we had prepared for which caused the ICE in
question.
gcc/ChangeLog:
2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR tree-optimization/92429
* tree-ssa-loop-niter.h (simplify_replace_tree): Add parameter.
* tree-ssa-loop-niter.c (simplify_replace_tree): Add parameter to
control folding.
* tree-vect-loop.c (update_epilogue_vinfo): Do not fold when replacing
tree.
gcc/testsuite/ChangeLog:
2020-01-16 Andre Vieira <andre.simoesdiasvieira@arm.com>
PR tree-optimization/92429
* gcc.dg/vect/pr92429.c: New test.
|
|
My earlier update_epilogue_loop_vinfo patch introduced an ICE on these
tests for AVX512. If we use pattern stmts, STMT_VINFO_GATHER_SCATTER_P
is valid for both the original stmt and the pattern stmt, but
STMT_VINFO_MEMORY_ACCESS_TYPE is valid only for the latter.
2020-01-15 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/93247
* tree-vect-loop.c (update_epilogue_loop_vinfo): Check the access
type of the stmt that we're going to vectorize.
gcc/testsuite/
PR tree-optimization/93247
* gcc.dg/vect/pr93247-1.c: New test.
* gcc.dg/vect/pr93247-2.c: Likewise.
|
|
The related_vector_mode series missed this case in
vect_create_epilog_for_reduction, where we want to create the
unsigned integer equivalent of another vector. Without it we
could mix SVE and Advanced SIMD vectors in the same operation.
This showed up on existing tests when testing with fixed-length
-msve-vector-bits=128.
2020-01-10 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type rather than build_vector_type
to create the index type for a conditional reduction.
From-SVN: r280112
|
|
update_epilogue_loop_vinfo applies SSA renmaing to the DR_REF of a
gather or scatter, so that vect_check_gather_scatter continues to work.
However, we sometimes also rely on vect_check_gather_scatter when
using gathers and scatters to implement strided accesses.
This showed up on existing tests when testing with fixed-length
-msve-vector-bits=128.
2020-01-10 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (update_epilogue_loop_vinfo): Update DR_REF
for any type of gather or scatter, including strided accesses.
From-SVN: r280111
|
|
gcc/ChangeLog:
2020-01-10 Andre Vieira <andre.simoesdiasvieira@arm.com>
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref): Use
get_dr_vinfo_offset
* tree-vect-loop.c (update_epilogue_loop_vinfo): Remove orig_drs_init
parameter and its use to reset DR_OFFSET's.
(vect_transform_loop): Remove orig_drs_init argument.
* tree-vect-loop-manip.c (vect_update_init_of_dr): Update the offset
member of dr_vec_info rather than the offset of the associated
data_reference's innermost_loop_behavior.
(vect_update_init_of_dr): Pass dr_vec_info instead of data_reference.
(vect_do_peeling): Remove orig_drs_init parameter and its construction.
* tree-vect-stmts.c (check_scan_store): Replace use of DR_OFFSET with
get_dr_vinfo_offset.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
From-SVN: r280107
|
|
From-SVN: r279813
|
|
The fold-left reduction code has a (rarely-used) fallback that handles
cases in which the loop is fully-masked and the target has no native
support for the reduction. The fallback includea a VEC_COND_EXPR
between the reduction vector and a safe value, so we should check
whether that VEC_COND_EXPR is supported.
2019-12-27 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Check whether the
target supports the required VEC_COND_EXPR operation before
allowing the fallback handling of masked fold-left reductions.
gcc/testsuite/
* gcc.target/aarch64/sve/mixed_size_10.c: New test.
From-SVN: r279742
|
|
2019-12-17 Andrew Stubbs <ams@codesourcery.com>
* tree-vect-loop.c (vect_create_epilog_for_reduction): Mention pr92772
in the comments.
From-SVN: r279460
|
|
The direct_slp_reduc code in vect_create_epilog_for_reduction was
still assuming that all types involved in a reduction are the same
(up to types_compatible_p), whereas we now support differences in
sign. This was causing an ICE in gcc.dg/vect/pr92324-4.c for SVE.
2019-12-10 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): When
handling direct_slp_reduc, allow the PHI arguments to have
a different type from the vector elements.
From-SVN: r279164
|
|
gcc.dg/vect/vect-cond-reduc-5.c was ICEing for SVE because we
tried to use an extract-last reduction for a chain of COND_EXPRs.
Adding support for the chained case would be too invasive for stage 3
so this patch explicitly forbids it instead. I've filed PR92884 for
the possible future work.
2019-12-10 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vectorizable_reduction): Don't use
EXTRACT_LAST_REDUCTION for chained reductions.
From-SVN: r279161
|
|
gcc/ChangeLog
2019-12-05 Sudakshina Das <sudi.das@arm.com>
* tree-vect-loop.c (vect_model_reduction_cost): Remove reduction_type
check from if condition.
From-SVN: r279012
|
|
tree-vect-loop.c:4367)
2019-12-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/92742
* tree-vect-loop.c (vect_fixup_reduc_chain): Do not
touch the def-type but verify it is consistent with the
original stmts.
* gcc.dg/torture/pr92742.c: New testcase.
From-SVN: r278896
|
|
When dissolving an SLP-only group of accesses, we should only set
the gap to group_size - 1 for normal non-strided groups.
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/92677
* tree-vect-loop.c (vect_dissolve_slp_only_groups): Set the gap
to zero when dissolving a group of strided accesses.
gcc/testsuite/
PR tree-optimization/92677
* gcc.dg/vect/pr92677.c: New test.
From-SVN: r278852
|
|
Now that stmt_vec_info records the choice between vector mask
types and normal nonmask types, we can use that information in
vect_get_vector_types_for_stmt instead of deferring the choice
of vector type till later.
vect_get_mask_type_for_stmt used to check whether the boolean inputs
to an operation:
(a) consistently used mask types or consistently used nonmask types; and
(b) agreed on the number of elements.
(b) shouldn't be a problem when (a) is met. If the operation
consistently uses mask types, tree-vect-patterns.c will have corrected
any mismatches in mask precision. (This is because we only use mask
types for a small well-known set of operations and tree-vect-patterns.c
knows how to handle any that could have different mask precisions.)
And if the operation consistently uses normal nonmask types, there's
no reason why booleans should need extra vector compatibility checks
compared to ordinary integers.
So the potential difficulties all seem to come from (a). Now that
we've chosen the result type ahead of time, we also have to consider
whether the outputs and inputs consistently use mask types.
Taking each vectorizable_* routine in turn:
- vectorizable_call
vect_get_vector_types_for_stmt only handled booleans specially
for gassigns, so vect_get_mask_type_for_stmt never had chance to
handle calls. I'm not sure we support any calls that operate on
booleans, but as things stand, a boolean result would always have
a nonmask type. Presumably any vector argument would also need to
use nonmask types, unless it corresponds to internal_fn_mask_index
(which is already a special case).
For safety, I've added a check for mask/nonmask combinations here
even though we didn't check this previously.
- vectorizable_simd_clone_call
Again, vect_get_mask_type_for_stmt never had chance to handle calls.
The result of the call will always be a nonmask type and the patch
for PR 92710 rejects mask arguments. So all booleans should
consistently use nonmask types here.
- vectorizable_conversion
The function already rejects any conversion between booleans in which
one type isn't a mask type.
- vectorizable_operation
This function definitely needs a consistency check, e.g. to handle
& and | in which one operand is loaded from memory and the other is
a comparison result. Ideally we'd handle this via pattern stmts
instead (like we do for the all-mask case), but that's future work.
- vectorizable_assignment
VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
current code already rejects problematic cases.
- vectorizable_load
Loads always produce nonmask types and there are no relevant inputs
to check against.
- vectorizable_store
vect_check_store_rhs already rejects mask/nonmask combinations
via useless_type_conversion_p.
- vectorizable_reduction
- vectorizable_lc_phi
PHIs always have nonmask types. After the change above, attempts
to combine the PHI result with a mask type would be rejected by
vectorizable_operation. (Again, it would be better to handle
this using pattern stmts.)
- vectorizable_induction
We don't generate inductions for booleans.
- vectorizable_shift
The function already rejects boolean shifts via type_has_mode_precision_p.
- vectorizable_condition
The function already rejects mismatches via useless_type_conversion_p.
- vectorizable_comparison
The function already rejects comparisons between mask and nonmask types.
The result is always a mask type.
2019-11-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/92596
* tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
operations.
(vectorizable_operation): Likewise, instead of relying on
vect_get_mask_type_for_stmt to do this.
(vect_get_vector_types_for_stmt): Always return a vector type
immediately, rather than deferring the choice for boolean results.
Use a vector mask type instead of a normal vector if
vect_use_mask_type_p.
(vect_get_mask_type_for_stmt): Delete.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
mask_producers argument and special boolean_type_node handling.
(vect_determine_vf_for_stmt): Remove mask_producers argument and
update calls to vect_determine_vf_for_stmt_1. Remove doubled call.
(vect_determine_vectorization_factor): Update call accordingly.
* tree-vect-slp.c (vect_build_slp_tree_1): Remove special
boolean_type_node handling.
(vect_slp_analyze_node_operations_1): Likewise.
gcc/testsuite/
PR tree-optimization/92596
* gcc.dg/vect/bb-slp-pr92596.c: New test.
* gcc.dg/vect/bb-slp-43.c: Likewise.
From-SVN: r278851
|
|
gcc.target/aarch64/sve/clastb_[57].c started failing after the increase
in the cost of vec_to_scalar (r278452). The problem is that we were
double-counting the cost of the CLASTB: once in vect_model_reduction_cost
as a vec_to_scalar and once in vectorizable_condition as a plain
vector_stmt.
Based on the TODO above vect_model_reduction_cost, I think the
preferred long-term direction is for vectorizable_* to cost these
things itself, so that's what the patch does (for this one case only).
2019-11-22 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vect_model_simple_cost): Take an optional
vect_cost_for_stmt.
(vectorizable_condition): Calculate the cost of EXTRACT_LAST_REDUCTION
here rather than...
* tree-vect-loop.c (vect_model_reduction_cost): ...here.
From-SVN: r278611
|
|
2019-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92581
* tree-vect-loop.c (vect_create_epilog_for_reduction): For
condition reduction chains gather all conditions involved
for computing the index reduction vector.
* gcc.dg/vect/vect-cond-reduc-5.c: New testcase.
From-SVN: r278445
|
|
tree-vect-loop.c:4325)
2019-11-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92554
* tree-vect-loop.c (vect_create_epilog_for_reduction): Look
for the actual condition stmt and deal with sign-changes.
* gcc.dg/vect/pr92554.c: New testcase.
From-SVN: r278431
|
|
2019-09-19 Richard Biener <rguenther@suse.de>
PR tree-optimization/92555
* tree-vect-loop.c (vect_update_vf_for_slp): Also scan PHIs
for non-SLP stmts.
* gcc.dg/vect/pr92555.c: New testcase.
From-SVN: r278430
|
|
-march=znver2 -flto since r278289)
2019-11-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/92558
* tree-vect-loop.c (vect_create_epilog_for_reduction): When
reducting the width of a reduction vector def update new_phis.
* gcc.dg/vect/pr92558.c: New testcase.
From-SVN: r278400
|
|
This patch adds a mode in which the vectoriser tries each available
base vector mode and picks the one with the lowest cost. The new
behaviour is selected by autovectorize_vector_modes.
The patch keeps the current behaviour of preferring a VF of
loop->simdlen over any larger or smaller VF, regardless of costs
or target preferences.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (VECT_COMPARE_COSTS): New constant.
* target.def (autovectorize_vector_modes): Return a bitmask of flags.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_modes): Update accordingly.
* targhooks.c (default_autovectorize_vector_modes): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_modes):
Likewise.
* config/arc/arc.c (arc_autovectorize_vector_modes): Likewise.
* config/arm/arm.c (arm_autovectorize_vector_modes): Likewise.
* config/i386/i386.c (ix86_autovectorize_vector_modes): Likewise.
* config/mips/mips.c (mips_autovectorize_vector_modes): Likewise.
* tree-vectorizer.h (_loop_vec_info::vec_outside_cost)
(_loop_vec_info::vec_inside_cost): New member variables.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_better_loop_vinfo_p, vect_joust_loop_vinfos): New functions.
(vect_analyze_loop): When autovectorize_vector_modes returns
VECT_COMPARE_COSTS, try vectorizing the loop with each available
vector mode and picking the one with the lowest cost.
(vect_estimate_min_profitable_iters): Record the computed costs
in the loop_vec_info.
From-SVN: r278336
|
|
This patch makes can_duplicate_and_interleave_p cope with mixtures of
vector sizes, by using queries based on get_vectype_for_scalar_type
instead of directly querying GET_MODE_SIZE (vinfo->vector_mode).
int_mode_for_size is now the first check we do for a candidate mode,
so it seemed better to restrict it to MAX_FIXED_MODE_SIZE. This avoids
unnecessary work and avoids trying to create scalar types that the
target might not support.
2019-11-16 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (can_duplicate_and_interleave_p): Take an
element type rather than an element mode.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
Use get_vectype_for_scalar_type to query the natural types
for a given element type rather than basing everything on
GET_MODE_SIZE (vinfo->vector_mode). Limit int_mode_for_size
query to MAX_FIXED_MODE_SIZE.
(duplicate_and_interleave): Update call accordingly.
* tree-vect-loop.c (vectorizable_reduction): Likewise.
From-SVN: r278335
|
|
2019-11-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/92512
* tree-vect-loop.c (check_reduction_path): Fix operand index
computability check. Add check for second use in COND_EXPRs.
* gcc.dg/torture/pr92512.c: New testcase.
From-SVN: r278293
|
|
internal-fn.c:2890)
2019-11-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/92324
* tree-vect-loop.c (vect_create_epilog_for_reduction): Fix
singedness of SLP reduction epilouge operations. Also reduce
the vector width for SLP reductions before doing elementwise
operations if possible.
* gcc.dg/vect/pr92324-4.c: New testcase.
From-SVN: r278289
|
|
A later patch makes the AArch64 port add four entries to
autovectorize_vector_modes. Each entry describes a different
vector mode assignment for vector code that mixes 8-bit, 16-bit,
32-bit and 64-bit elements. But if (as usual) the vector code has
fewer element sizes than that, we could end up trying the same
combination of vector modes multiple times. This patch adds a
check to prevent that.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vec_info::mode_set): New typedef.
(vec_info::used_vector_mode): New member variable.
(vect_chooses_same_modes_p): Declare.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Record each
chosen vector mode in vec_info::used_vector_mode.
(vect_chooses_same_modes_p): New function.
* tree-vect-loop.c (vect_analyze_loop): Use it to avoid trying
the same vector statements multiple times.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
From-SVN: r278242
|
|
After previous patches, it's now possible to make the vectoriser
support multiple vector sizes in the same vector region, using
related_vector_mode to pick the right vector mode for a given
element mode. No port yet takes advantage of this, but I have
a follow-on patch for AArch64.
This patch also seemed like a good opportunity to add some more dump
messages: one to make it clear which vector size/mode was being used
when analysis passed or failed, and another to say when we've decided
to skip a redundant vector size/mode.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* machmode.h (opt_machine_mode::operator==): New function.
(opt_machine_mode::operator!=): Likewise.
* tree-vectorizer.h (vec_info::vector_mode): Update comment.
(get_related_vectype_for_scalar_type): Delete.
(get_vectype_for_scalar_type_and_size): Declare.
* tree-vect-slp.c (vect_slp_bb_region): Print dump messages to say
whether analysis passed or failed, and with what vector modes.
Use related_vector_mode to check whether trying a particular
vector mode would be redundant with the autodetected mode,
and print a dump message if we decide to skip it.
* tree-vect-loop.c (vect_analyze_loop): Likewise.
(vect_create_epilog_for_reduction): Use
get_related_vectype_for_scalar_type instead of
get_vectype_for_scalar_type_and_size.
* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Replace
with...
(get_related_vectype_for_scalar_type): ...this new function.
Take a starting/"prevailing" vector mode rather than a vector size.
Take an optional nunits argument, with the same meaning as for
related_vector_mode. Use related_vector_mode when not
auto-detecting a mode, falling back to mode_for_vector if no
target mode exists.
(get_vectype_for_scalar_type): Update accordingly.
(get_same_sized_vectype): Likewise.
* tree-vectorizer.c (get_vec_alignment_for_array_type): Likewise.
From-SVN: r278240
|
|
This patch replaces vec_info::vector_size with vec_info::vector_mode,
but for now continues to use it as a way of specifying a single
vector size. This makes it easier for later patches to use
related_vector_mode instead.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vec_info::vector_size): Replace with...
(vec_info::vector_mode): ...this new field.
* tree-vect-loop.c (vect_update_vf_for_slp): Update accordingly.
(vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-loop-manip.c (vect_do_peeling): Likewise.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Likewise.
(vect_make_slp_decision, vect_slp_bb_region): Likewise.
* tree-vect-stmts.c (get_vectype_for_scalar_type): Likewise.
* tree-vectorizer.c (try_vectorize_loop_1): Likewise.
gcc/testsuite/
* gcc.dg/vect/vect-tail-nomask-1.c: Update expected epilogue
vectorization message.
From-SVN: r278237
|
|
This is another patch in the series to remove the assumption that
all modes involved in vectorisation have to be the same size.
Rather than have the target provide a list of vector sizes,
it makes the target provide a list of vector "approaches",
with each approach represented by a mode.
A later patch will pass this mode to targetm.vectorize.related_mode
to get the vector mode for a given element mode. Until then, the modes
simply act as an alternative way of specifying the vector size.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* target.h (vector_sizes, auto_vector_sizes): Delete.
(vector_modes, auto_vector_modes): New typedefs.
* target.def (autovectorize_vector_sizes): Replace with...
(autovectorize_vector_modes): ...this new hook.
* doc/tm.texi.in (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES):
Replace with...
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): ...this new hook.
* doc/tm.texi: Regenerate.
* targhooks.h (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* targhooks.c (default_autovectorize_vector_sizes): Delete.
(default_autovectorize_vector_modes): New function.
* omp-general.c (omp_max_vf): Use autovectorize_vector_modes instead
of autovectorize_vector_sizes. Use the number of units in the mode
to calculate the maximum VF.
* omp-low.c (omp_clause_aligned_alignment): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
Use a loop based on related_mode to iterate through all supported
vector modes for a given scalar mode.
* optabs-query.c (can_vec_mask_load_store_p): Use
autovectorize_vector_modes instead of autovectorize_vector_sizes.
* tree-vect-loop.c (vect_analyze_loop, vect_transform_loop): Likewise.
* tree-vect-slp.c (vect_slp_bb_region): Likewise.
* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
Replace with...
(aarch64_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arc/arc.c (arc_autovectorize_vector_sizes): Replace with...
(arc_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/arm/arm.c (arm_autovectorize_vector_sizes): Replace with...
(arm_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/i386/i386.c (ix86_autovectorize_vector_sizes): Replace with...
(ix86_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
* config/mips/mips.c (mips_autovectorize_vector_sizes): Replace with...
(mips_autovectorize_vector_modes): ...this new function.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES): Delete.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
From-SVN: r278236
|
|
build_same_sized_truth_vector_type was confusingly named, since for
SVE and AVX512 the returned vector isn't the same byte size (although
it does have the same number of elements). What it really returns
is the "truth" vector type for a given data vector type.
The more general truth_type_for provides the same thing when passed
a vector and IMO has a more descriptive name, so this patch replaces
all uses of build_same_sized_truth_vector_type with that. It does
the same for a call to build_truth_vector_type, leaving truth_type_for
itself as the only remaining caller.
It's then more natural to pass build_truth_vector_type the original
vector type rather than its size and nunits, especially since the
given size isn't the size of the returned vector. This in turn allows
a future patch to simplify the interface of get_mask_mode. Doing this
also fixes a bug in which truth_type_for would pass a size of zero for
BLKmode vector types.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree.h (build_truth_vector_type): Delete.
(build_same_sized_truth_vector_type): Likewise.
* tree.c (build_truth_vector_type): Rename to...
(build_truth_vector_type_for): ...this. Make static and take
a vector type as argument.
(truth_type_for): Update accordingly.
(build_same_sized_truth_vector_type): Delete.
* tree-vect-generic.c (expand_vector_divmod): Use truth_type_for
instead of build_same_sized_truth_vector_type.
* tree-vect-loop.c (vect_create_epilog_for_reduction): Likewise.
(vect_record_loop_mask, vect_get_loop_mask): Likewise.
* tree-vect-patterns.c (build_mask_conversion): Likeise.
* tree-vect-slp.c (vect_get_constant_vectors): Likewise.
* tree-vect-stmts.c (vect_get_vec_def_for_operand): Likewise.
(vect_build_gather_load_calls, vectorizable_call): Likewise.
(scan_store_can_perm_p, vectorizable_scan_store): Likewise.
(vectorizable_store, vectorizable_condition): Likewise.
(get_mask_type_for_scalar_type, get_same_sized_vectype): Likewise.
(vect_get_mask_type_for_stmt): Use truth_type_for instead of
build_truth_vector_type.
* config/aarch64/aarch64-sve-builtins.cc (gimple_folder::convert_pred):
Use truth_type_for instead of build_same_sized_truth_vector_type.
* config/rs6000/rs6000-call.c (fold_build_vec_cmp): Likewise.
gcc/c/
* c-typeck.c (build_conditional_expr): Use truth_type_for instead
of build_same_sized_truth_vector_type.
(build_vec_cmp): Likewise.
gcc/cp/
* call.c (build_conditional_expr_1): Use truth_type_for instead
of build_same_sized_truth_vector_type.
* typeck.c (build_vec_cmp): Likewise.
gcc/d/
* d-codegen.cc (build_boolop): Use truth_type_for instead of
build_same_sized_truth_vector_type.
From-SVN: r278232
|
|
Callers of vect_halve_mask_nunits and vect_double_mask_nunits
already know what mode the resulting vector type should have,
so we might as well create the vector type directly with that mode,
just like build_vector_type_for_mode lets us build normal vectors
with a known mode. This avoids the current awkwardness of having
to recompute the mode starting from vec_info::vector_size, which
hard-codes the assumption that all vectors have to be the same size.
A later patch gets rid of build_truth_vector_type and
build_same_sized_truth_vector_type, so the net effect of the
series is to reduce the number of type functions by one.
2019-11-14 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree.h (build_truth_vector_type_for_mode): Declare.
* tree.c (build_truth_vector_type_for_mode): New function,
split out from...
(build_truth_vector_type): ...here.
(build_opaque_vector_type): Fix head comment.
* tree-vectorizer.h (supportable_narrowing_operation): Remove
vec_info parameter.
(vect_halve_mask_nunits): Replace vec_info parameter with the
mode of the new vector.
(vect_double_mask_nunits): Likewise.
* tree-vect-loop.c (vect_halve_mask_nunits): Likewise.
(vect_double_mask_nunits): Likewise.
* tree-vect-loop-manip.c: Include insn-config.h, rtl.h and recog.h.
(vect_maybe_permute_loop_masks): Remove vinfo parameter. Update call
to vect_halve_mask_nunits, getting the required mode from the unpack
patterns.
(vect_set_loop_condition_masked): Update call accordingly.
* tree-vect-stmts.c (supportable_narrowing_operation): Remove vec_info
parameter and update call to vect_double_mask_nunits.
(vectorizable_conversion): Update call accordingly.
(simple_integer_narrowing): Likewise. Remove vec_info parameter.
(vectorizable_call): Update call accordingly.
(supportable_widening_operation): Update call to
vect_halve_mask_nunits.
* config/aarch64/aarch64-sve-builtins.cc (register_builtin_types):
Use build_truth_vector_type_mode instead of build_truth_vector_type.
From-SVN: r278231
|
|
We didn't take the cost of generating loop masks into account, and so
tended to underestimate the cost of loops that need multiple masks.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-loop.c (vect_estimate_min_profitable_iters): Include
the cost of generating loop masks.
gcc/testsuite/
* gcc.target/aarch64/sve/mask_struct_store_3.c: Add
-fno-vect-cost-model.
* gcc.target/aarch64/sve/mask_struct_store_3_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_2_run.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3.c: Likewise.
* gcc.target/aarch64/sve/peel_ind_3_run.c: Likewise.
From-SVN: r278125
|
|
vect_analyze_loop_costing uses two profitability thresholds: a runtime
one and a static compile-time one. The runtime one is simply the point
at which the vector loop is cheaper than the scalar loop, while the
static one also takes into account the cost of choosing between the
scalar and vector loops at runtime. We compare this static cost against
the expected execution frequency to decide whether it's worth generating
any vector code at all.
However, we never reclaimed the cost of applying the runtime threshold
if it turned out that the vector code can always be used. And we only
know whether that's true once we've calculated what the runtime
threshold would be.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_apply_runtime_profitability_check_p):
New function.
* tree-vect-loop-manip.c (vect_loop_versioning): Use it.
* tree-vect-loop.c (vect_analyze_loop_2): Likewise.
(vect_transform_loop): Likewise.
(vect_analyze_loop_costing): Don't take the cost of versioning
into account for the static profitability threshold if it turns
out that no versioning is needed.
From-SVN: r278124
|
|
vectorizable_assignment handles true SSA-to-SSA copies (which hopefully
we don't see in practice) and no-op conversions that are required
to maintain correct gimple, such as changes between signed and
unsigned types. These cases shouldn't generate any code and so
shouldn't count against either the scalar or vector costs.
Later patches test this, but it seemed worth splitting out.
2019-11-13 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (vect_nop_conversion_p): Declare.
* tree-vect-stmts.c (vect_nop_conversion_p): New function.
(vectorizable_assignment): Don't add a cost for nop conversions.
* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
Likewise.
* tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise.
From-SVN: r278122
|
|
2019-11-13 Richard Biener <rguenther@suse.de>
PR tree-optimization/92473
* tree-vect-loop.c (vect_create_epilog_for_reduction): Perform
direct optab reduction in the correct type.
From-SVN: r278113
|