Age | Commit message (Collapse) | Author | Files | Lines |
|
w/ -mavx512bw -ftree-loop-vectorize -O1)
PR tree-optimization/78938
* tree-vect-stmts.c (vectorizable_condition): For non-masked COND_EXPR
where comp_vectype is VECTOR_BOOLEAN_TYPE_P, use
BIT_{NOT,XOR,AND,IOR}_EXPR on the comparison operands instead of
{EQ,NE,GE,GT,LE,LT}_EXPR directly inside of VEC_COND_EXPR. Formatting
fixes.
* gcc.dg/vect/pr78938.c: New test.
From-SVN: r244223
|
|
From-SVN: r243994
|
|
2016-11-09 Richard Biener <rguenther@suse.de>
PR tree-optimization/78007
* tree-vect-stmts.c (vectorizable_bswap): New function.
(vectorizable_call): Call vectorizable_bswap for
BUILT_IN_BSWAP{16,32,64} if arguments are not promoted.
* gcc.dg/vect/vect-bswap32.c: Adjust.
* gcc.dg/vect/vect-bswap64.c: Likewise.
From-SVN: r241992
|
|
not trigger peeling for gaps.
2016-11-08 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (get_group_load_store_type): If the
access is aligned do not trigger peeling for gaps.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Do not
force alignment of vars with DECL_USER_ALIGN.
* gcc.dg/vect/vect-nb-iter-ub-2.c: Adjust.
From-SVN: r241959
|
|
groups)
2016-11-08 Richard Biener <rguenther@suse.de>
PR tree-optimization/78205
* tree-vect-stmts.c (vectorizable_load): Move check whether
we may run into gaps when BB vectorizing SLP permutations ...
* tree-vect-slp.c (vect_supported_load_permutation_p): ...
here where we can do a more precise check.
* gcc.dg/vect/bb-slp-pr78205.c: New testcase.
From-SVN: r241956
|
|
2016-11-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/37150
* tree-vectorizer.h (vect_transform_slp_perm_load): Add n_perms
parameter.
* tree-vect-slp.c (vect_supported_load_permutation_p): Adjust.
(vect_analyze_slp_cost_1): Account for the real number of
permutations emitted and for dead loads.
(vect_transform_slp_perm_load): Add n_perms parameter counting
the number of emitted permutations.
* tree-vect-stmts.c (vectorizable_load): Adjust.
From-SVN: r241893
|
|
PR target/78102
* optabs.def (vcondeq_optab, vec_cmpeq_optab): New optabs.
* optabs.c (expand_vec_cond_expr): For comparison codes
EQ_EXPR and NE_EXPR, attempt vcondeq_optab as fallback.
(expand_vec_cmp_expr): For comparison codes
EQ_EXPR and NE_EXPR, attempt vec_cmpeq_optab as fallback.
* optabs-tree.h (expand_vec_cmp_expr_p, expand_vec_cond_expr_p):
Add enum tree_code argument.
* optabs-query.h (get_vec_cmp_eq_icode, get_vcond_eq_icode): New
inline functions.
* optabs-tree.c (expand_vec_cmp_expr_p): Add CODE argument. For
CODE EQ_EXPR or NE_EXPR, attempt to use vec_cmpeq_optab as
fallback.
(expand_vec_cond_expr_p): Add CODE argument. For CODE EQ_EXPR or
NE_EXPR, attempt to use vcondeq_optab as fallback.
* tree-vect-generic.c (expand_vector_comparison,
expand_vector_divmod, expand_vector_condition): Adjust
expand_vec_cmp_expr_p and expand_vec_cond_expr_p callers.
* tree-vect-stmts.c (vectorizable_condition,
vectorizable_comparison): Likewise.
* tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern,
check_bool_pattern, search_type_for_mask_1): Likewise.
* expr.c (do_store_flag): Likewise.
* doc/md.texi (@code{vec_cmpeq@var{m}@var{n}},
@code{vcondeq@var{m}@var{n}}): Document.
* config/i386/sse.md (vec_cmpeqv2div2di, vcondeq<VI8F_128:mode>v2di):
New expanders.
testsuite/
* gcc.target/i386/pr78102.c: New test.
From-SVN: r241525
|
|
with vector elements.
2016-09-29 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (vectorizable_load): Avoid emitting vector
constructors with vector elements.
From-SVN: r240611
|
|
gimple_call_internal_p.
* ipa-inline-analysis.c (find_foldable_builtin_expect): Use
gimple_call_internal_p.
* ipa-split.c (find_return_bb): Likewise.
(execute_split_functions): Likewise.
* omp-low.c (dump_oacc_loop_part): Likewise.
(oacc_loop_xform_head_tail): Likewise.
* predict.c (predict_loops): Likewise.
* sanopt.c (pass_sanopt::execute): Likewise.
* tree-cfg.c (get_abnormal_succ_dispatcher): Likewise.
* tree-parloops.c (oacc_entry_exit_ok_1): Likewise.
* tree-stdarg.c (gimple_call_ifn_va_arg_p): Remove function.
(expand_ifn_va_arg_1): Use gimple_call_internal_p.
(expand_ifn_va_arg): Likewise.
* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vect_simd_lane_linear): Likewise.
(vect_transform_stmt): Likewise.
* tree-vectorizer.c (vect_loop_vectorized_call): Likewise.
* tsan.c (instrument_memory_accesses): Likewise.
From-SVN: r240498
|
|
gcc:
2016-09-21 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR tree-optimization/77550
* tree-vect-stmts.c (create_array_ref): Change parameters.
(get_group_alias_ptr_type): New function.
(vectorizable_store, vectorizable_load): Use get_group_alias_ptr_type.
testsuite:
2016-09-21 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR tree-optimization/77550
* g++.dg/pr77550.C: New test.
From-SVN: r240313
|
|
* hwint.h (least_bit_hwi, pow2_or_zerop, pow2p_hwi, ctz_or_zero):
New.
* hwint.c (exact_log2): Use pow2p_hwi.
(ctz_hwi, ffs_hwi): Use least_bit_hwi.
* alias.c (memrefs_conflict_p): Use pow2_or_zerop.
* builtins.c (get_object_alignment_2, get_object_alignment)
(get_pointer_alignment, fold_builtin_atomic_always_lock_free): Use
least_bit_hwi.
* calls.c (compute_argument_addresses, store_one_arg): Use
least_bit_hwi.
* cfgexpand.c (expand_one_stack_var_at): Use least_bit_hwi.
* combine.c (force_to_mode): Use least_bit_hwi.
* emit-rtl.c (set_mem_attributes_minus_bitpos, adjust_address_1):
Use least_bit_hwi.
* expmed.c (synth_mult, expand_divmod): Use ctz_or_zero, ctz_hwi.
(init_expmed_one_conv): Use pow2p_hwi.
* fold-const.c (round_up_loc, round_down_loc): Use pow2_or_zerop.
(fold_binary_loc): Use pow2p_hwi.
* function.c (assign_parm_find_stack_rtl): Use least_bit_hwi.
* gimple-fold.c (gimple_fold_builtin_memory_op): Use pow2p_hwi.
* gimple-ssa-strength-reduction.c (replace_ref): Use least_bit_hwi.
* hsa-gen.c (gen_hsa_addr_with_align, hsa_bitmemref_alignment):
Use least_bit_hwi.
* ipa-cp.c (ipcp_alignment_lattice::meet_with_1): Use least_bit_hwi.
* ipa-prop.c (ipa_modify_call_arguments): Use least_bit_hwi.
* omp-low.c (oacc_loop_fixed_partitions)
(oacc_loop_auto_partitions): Use least_bit_hwi.
* rtlanal.c (nonzero_bits1): Use ctz_or_zero.
* stor-layout.c (place_field): Use least_bit_hwi.
* tree-pretty-print.c (dump_generic_node): Use pow2p_hwi.
* tree-sra.c (build_ref_for_offset): Use least_bit_hwi.
* tree-ssa-ccp.c (ccp_finalize): Use least_bit_hwi.
* tree-ssa-math-opts.c (bswap_replace): Use least_bit_hwi.
* tree-ssa-strlen.c (handle_builtin_memcmp): Use pow2p_hwi.
* tree-vect-data-refs.c (vect_analyze_group_access_1)
(vect_grouped_store_supported, vect_grouped_load_supported)
(vect_permute_load_chain, vect_shift_permute_load_chain)
(vect_transform_grouped_load): Use pow2p_hwi.
* tree-vect-generic.c (expand_vector_divmod): Use ctz_or_zero.
* tree-vect-patterns.c (vect_recog_divmod_pattern): Use ctz_or_zero.
* tree-vect-stmts.c (vectorizable_mask_load_store): Use
least_bit_hwi.
* tsan.c (instrument_expr): Use least_bit_hwi.
* var-tracking.c (negative_power_of_two_p): Use pow2_or_zerop.
From-SVN: r240194
|
|
PR tree-optimization/77503
* tree-vect-loop.c (vectorizable_reduction): Record reduction
code for CONST_COND_REDUCTION at analysis stage and use it at
transform stage.
* tree-vectorizer.h (struct _stmt_vec_info): New field.
(STMT_VINFO_VEC_CONST_COND_REDUC_CODE): New macro.
* tree-vect-stmts.c (new_stmt_vec_info): Initialize above new
field.
gcc/testsuite
* gcc.dg/vect/pr77503.c: New test.
From-SVN: r240166
|
|
There's no point in calling release () on an auto_vec just before it goes
out of scope.
gcc/ChangeLog:
2016-07-12 Trevor Saunders <tbsaunde+gcc@tbsaunde.org>
* tree-data-ref.c (find_data_references_in_stmt): Remove
unnecessary call to vec::release.
(graphite_find_data_references_in_stmt): Likewise.
* tree-ssa-alias.c (nonoverlapping_component_refs_of_decl_p): Likewise.
* tree-vect-stmts.c (vectorizable_condition): Likewise.
From-SVN: r238286
|
|
gcc/c/ChangeLog:
2016-07-12 Trevor Saunders <tbsaunde+gcc@tbsaunde.org>
* c-parser.c (c_parser_generic_selection): Make type of variable
auto_vec.
(c_parser_omp_declare_simd): Likewise.
gcc/ChangeLog:
2016-07-12 Trevor Saunders <tbsaunde+gcc@tbsaunde.org>
* cfgexpand.c (expand_used_vars): Make the type of a local variable auto_vec.
* genmatch.c (lower_for): Likewise.
* haifa-sched.c (haifa_sched_init): Likewise.
(add_to_speculative_block): Likewise.
(create_check_block_twin): Likewise.
* predict.c (handle_missing_profiles): Likewise.
* tree-data-ref.c (loop_nest_has_data_refs): Likewise.
* tree-diagnostic.c (maybe_unwind_expanded_macro_loc): Likewise.
* tree-ssa-loop-niter.c (discover_iteration_bound_by_body_walk): Likewise.
(maybe_lower_iteration_bound): Likewise.
* tree-ssa-sccvn.c (DFS): Likewise.
* tree-stdarg.c (reachable_at_most_once): Likewise.
* tree-vect-stmts.c (vectorizable_conversion): Likewise.
(vectorizable_store): Likewise.
From-SVN: r238285
|
|
-ftree-slp-vectorize)
PR tree-optimization/71823
* tree-vect-stmts.c (vectorizable_operation): Use vect_get_vec_defs
to get vec_oprnds2 from op2.
* gcc.dg/vect/pr71823.c: New test.
From-SVN: r238229
|
|
This patch uses the vect_memory_access_type from patch 6 to represent
the effect of a negative contiguous stride or a zero stride. The latter
is valid only for loads.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vect_memory_access_type): Add
VMAT_INVARIANT, VMAT_CONTIGUOUS_DOWN and VMAT_CONTIGUOUS_REVERSED.
* tree-vect-stmts.c (compare_step_with_zero): New function.
(perm_mask_for_reverse): Move further up file.
(get_group_load_store_type): Stick to VMAT_ELEMENTWISE if the
step is negative.
(get_negative_load_store_type): New function.
(get_load_store_type): Call it. Add an ncopies argument.
(vectorizable_mask_load_store): Update call accordingly and
remove tests for negative steps.
(vectorizable_store, vectorizable_load): Likewise. Handle new
memory_access_types.
From-SVN: r238039
|
|
This is the main patch in the series. It adds a new enum and routines
for classifying a vector load or store implementation.
Originally there were three motivations:
(1) Reduce cut-&-paste
(2) Make the chosen vectorisation strategy more obvious. At the
moment this is derived implicitly from various other bits of
state (GROUPED, STRIDED, SLP, etc.)
(3) Decouple the vectorisation strategy from those other bits of state,
so that there can be a choice of implementation for a given scalar
statement. The specific problem here is that we class:
for (...)
{
... = a[i * x];
... = a[i * x + 1];
}
as "strided and grouped" but:
for (...)
{
... = a[i * 7];
... = a[i * 7 + 1];
}
as "non-strided and grouped". Before the patch, "strided and
grouped" loads would always try to use separate scalar loads
while "non-strided and grouped" loads would always try to use
load-and-permute. But load-and-permute is never supported for
a group size of 7, so the effect was that the first loop was
vectorisable and the second wasn't. It seemed odd that not
knowing x (but accepting it could be 7) would allow more
optimisation opportunities than knowing x is 7.
Unfortunately, it looks like we underestimate the cost of separate
scalar accesses on at least aarch64, so I've disabled (3) for now;
see the "if" statement at the end of get_load_store_type. I think
the patch still does (1) and (2), so that's the justification for
it in its current form. It also means that (3) is now simply a
case of removing the FIXME code, once the cost model problems have
been sorted out. (I did wonder about adding a --param, but that
seems overkill. I hope to get back to this during GCC 7 stage 1.)
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vect_memory_access_type): New enum.
(_stmt_vec_info): Add a memory_access_type field.
(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
(vect_model_store_cost): Take an access type instead of a boolean.
(vect_model_load_cost): Likewise.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to
vect_model_store_cost and vect_model_load_cost.
* tree-vect-stmts.c (vec_load_store_type): New enum.
(vect_model_store_cost): Take an access type instead of a
store_lanes_p boolean. Simplify tests.
(vect_model_load_cost): Likewise, but for load_lanes_p.
(get_group_load_store_type, get_load_store_type): New functions.
(vectorizable_store): Use get_load_store_type. Record the access
type in STMT_VINFO_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise.
(vectorizable_mask_load_store): Likewise. Replace is_store
variable with vls_type.
From-SVN: r238038
|
|
This patch moves the fix for PR65518 to the code that checks whether
load-and-permute operations are supported. If the group size is
greater than the vectorisation factor, it would still be possible
to fall back to elementwise loads (as for strided groups) rather
than fail vectorisation entirely.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vect_grouped_load_supported): Add a
single_element_p parameter.
* tree-vect-data-refs.c (vect_grouped_load_supported): Likewise.
Check the PR65518 case here rather than in vectorizable_load.
* tree-vect-loop.c (vect_analyze_loop_2): Update call accordignly.
* tree-vect-stmts.c (vectorizable_load): Likewise.
From-SVN: r238037
|
|
This patch just refactors the gather/scatter support so that all
information is in a single structure, rather than separate variables.
This reduces the number of arguments to a function added in patch 6.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (gather_scatter_info): New structure.
(vect_check_gather_scatter): Return a bool rather than a decl.
Replace return-by-pointer arguments with a single
gather_scatter_info *.
* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
(vect_analyze_data_refs): Update call accordingly.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
(vectorizable_mask_load_store): Likewise. Also record the
offset dt and vectype in the gather_scatter_info.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
From-SVN: r238036
|
|
vect_model_store_cost had:
/* Costs of the stores. */
if (STMT_VINFO_STRIDED_P (stmt_info)
&& !STMT_VINFO_GROUPED_ACCESS (stmt_info))
{
/* N scalar stores plus extracting the elements. */
inside_cost += record_stmt_cost (body_cost_vec,
ncopies * TYPE_VECTOR_SUBPARTS (vectype),
scalar_store, stmt_info, 0, vect_body);
But non-SLP strided groups also use individual scalar stores rather than
vector stores, so I think we should skip this only for SLP groups.
The same applies to vect_model_load_cost.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-stmts.c (vect_model_store_cost): For non-SLP
strided groups, use the cost of N scalar accesses instead
of ncopies vector accesses.
(vect_model_load_cost): Likewise.
From-SVN: r238035
|
|
Add a bit more commentary and try to make the structure more obvious.
The horrendous:
if (grouped_access_p
&& represents_group_p
&& !store_lanes_p
&& !STMT_VINFO_STRIDED_P (stmt_info)
&& !slp_node)
checks go away in patch 6.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-stmts.c (vect_cost_group_size): Delete.
(vect_model_store_cost): Avoid calling it. Use first_stmt_p
variable to indicate when once-per-group costs are being used.
(vect_model_load_cost): Likewise. Fix comment and misindented code.
From-SVN: r238034
|
|
I recently relaxed the peeling-for-gaps conditions for LD3 but
kept them as-is for load-and-permute. I don't think the conditions
are needed for load-and-permute either though. No current load-and-
permute should load outside the group, so if there is no gap at the end,
the final vector element loaded will correspond to an element loaded
by the original scalar loop.
The patch for PR68559 (a missed optimisation PR) increased the peeled
cases from "exact_log2 (groupsize) == -1" to "vf % group_size == 0", so
before that fix, we didn't peel for gaps if there was no gap at the end
of the group and if the group size was a power of 2.
The only current non-power-of-2 load-and-permute size is 3, which
doesn't require loading more than 3 vectors.
The testcase is based on gcc.dg/vect/pr49038.c.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-stmts.c (vectorizable_load): Remove unnecessary
peeling-for-gaps condition.
gcc/testsuite/
* gcc.dg/vect/group-no-gaps-1.c: New test.
From-SVN: r238033
|
|
gcc/
PR tree-optimization/71655
* tree-vect-stmts.c (vectorizable_comparison): Swap definition
types when swapping operands.
gcc/testsuite/
PR tree-optimization/71655
* g++.dg/pr71655.C: New test.
From-SVN: r237846
|
|
ivybridge and westmere targets)
gcc/
PR middle-end/71488
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Support
comparison of boolean vectors.
* tree-vect-stmts.c (vectorizable_comparison): Vectorize comparison
of boolean vectors using bitwise operations.
gcc/testsuite/
PR middle-end/71488
* g++.dg/pr71488.C: New test.
* gcc.dg/vect/vect-bool-cmp.c: New test.
From-SVN: r237706
|
|
restrictions.
2016-06-15 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (vectorizable_store): Remove strided grouped
store restrictions.
* gcc.dg/vect/slp-45.c: New testcase.
From-SVN: r237474
|
|
loads and fall back to scalar loads in case...
2016-06-08 Richard Biener <rguenther@suse.de>
* tree-vect-stmts.c (vectorizable_load): Remove restrictions
on strided SLP loads and fall back to scalar loads in case
we can't chunk them.
* gcc.dg/vect/slp-43.c: New testcase.
From-SVN: r237215
|
|
2016-06-03 Alan Hayward <alan.hayward@arm.com>
[3/3] No need to vectorize simple only-live stmts
gcc/
* tree-vect-stmts.c (vect_stmt_relevant_p): Do not vectorize non live
relevant stmts which are simple and invariant.
* tree-vect-loop.c (vectorizable_live_operation): Check relevance
instead of simple and invariant
testsuite/
* gcc.dg/vect/vect-live-slp-5.c: Remove dg check.
From-SVN: r237065
|
|
2016-06-03 Alan Hayward <alan.hayward@arm.com>
[2/3] Vectorize inductions that are live after the loop
gcc/
* tree-vect-loop.c (vect_analyze_loop_operations): Allow live stmts.
(vectorizable_reduction): Check for new relevant state.
(vectorizable_live_operation): vectorize live stmts using
BIT_FIELD_REF. Remove special case for gimple assigns stmts.
* tree-vect-stmts.c (is_simple_and_all_uses_invariant): New function.
(vect_stmt_relevant_p): Check for stmts which are only used live.
(process_use): Use of a stmt does not inherit it's live value.
(vect_mark_stmts_to_be_vectorized): Simplify relevance inheritance.
(vect_analyze_stmt): Check for new relevant state.
* tree-vectorizer.h (vect_relevant): New entry for a stmt which is used
outside the loop, but not inside it.
testsuite/
* gcc.dg/tree-ssa/pr64183.c: Ensure test does not vectorize.
* testsuite/gcc.dg/vect/no-scevccp-vect-iv-2.c: Remove xfail.
* gcc.dg/vect/vect-live-1.c: New test.
* gcc.dg/vect/vect-live-2.c: New test.
* gcc.dg/vect/vect-live-3.c: New test.
* gcc.dg/vect/vect-live-4.c: New test.
* gcc.dg/vect/vect-live-5.c: New test.
* gcc.dg/vect/vect-live-slp-1.c: New test.
* gcc.dg/vect/vect-live-slp-2.c: New test.
* gcc.dg/vect/vect-live-slp-3.c: New test.
From-SVN: r237064
|
|
2016-06-03 Alan Hayward <alan.hayward@arm.com>
[1/3] Split vect_get_vec_def_for_operand into two
gcc/
* tree-vectorizer.h (vect_get_vec_def_for_operand_1): New
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): New
(vect_get_vec_def_for_operand): Split out code.
From-SVN: r237063
|
|
gcc/
* tree-vect-stmts.c (vectorizable_call) Remove GOMP_SIMD_LANE code.
From-SVN: r237061
|
|
2016-05-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/71264
* tree-vect-stmts.c (vect_init_vector): Properly deal with
vector type val.
* gcc.dg/vect/pr71264.c: New testcase.
From-SVN: r236699
|
|
references to non-integer/non-pointer)
PR c++/71257
* tree-vect-stmts.c (vectorizable_simd_clone_call): Handle
SIMD_CLONE_ARG_TYPE_LINEAR_REF_CONSTANT_STEP like
SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP. Add
SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP and
SIMD_CLONE_ARG_TYPE_LINEAR_UVAL_CONSTANT_STEP cases explicitly.
* semantics.c (finish_omp_clauses) <case OMP_CLAUSE_LINEAR>:
For OMP_CLAUSE_LINEAR_REF don't require type to be
integral or pointer.
* g++.dg/vect/simd-clone-6.cc: New test.
* g++.dg/gomp/declare-simd-6.C: New test.
From-SVN: r236648
|
|
The vectorizable_* routines had many instances of:
slp_node || PURE_SLP_STMT (stmt_info)
which gives the misleading impression that we can have
!slp_node && PURE_SLP_STMT (stmt_info). In this context
it's really enough to test slp_node on its own.
There are three cases:
loop vectorisation only:
vectorizable_foo called only with !slp_node
pure SLP:
vectorizable_foo called only with slp_node
hybrid SLP:
(e.g. a vector that's used in SLP statements and also in a reduction)
- vectorizable_foo called once with slp_node for the SLP uses.
- vectorizable_foo called once with !slp_node for the non-SLP uses.
Hybrid SLP isn't possible for stores, so I added an explicit assert
for that.
I also made vectorizable_comparison static, to make it obvious that
no other callers outside tree-vect-stmts.c could use it with the
!slp && PURE_SLP_STMT combination.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vectorizer.h (vectorizable_comparison): Delete.
* tree-vect-loop.c (vectorizable_reduction): Remove redundant
PURE_SLP_STMT check.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_store): Likewise. Assert that we don't have
hybrid SLP.
(vectorizable_comparison): Make static. Remove redundant
PURE_SLP_STMT check.
(vect_transform_stmt): Assert that we always have an slp_node
if PURE_SLP_STMT.
From-SVN: r236642
|
|
vectorizable_load forces peeling for gaps if the vectorisation factor
is not a multiple of the group size, since in that case we'd normally load
beyond the original scalar accesses but drop the excess elements as part
of a following permute:
if (loop_vinfo
&& ! STMT_VINFO_STRIDED_P (stmt_info)
&& (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
|| (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
This isn't necessary for LOAD_LANES though, since it loads only the
data needed and does the permute itself.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-stmts.c (vectorizable_load): Reorder checks so that
load_lanes/grouped_load classification comes first. Don't check
whether the vectorization factor is a multiple of the group size
for load_lanes.
gcc/testsuite/
* gcc.dg/vect/vect-load-lanes-peeling-1.c: New test.
From-SVN: r236632
|
|
vectorizable_load had a curious "force_peeling" variable, with no
comment explaining why we need it for single-element interleaving
but not for other cases. I think it's simply because we weren't
initialising the GROUP_GAP correctly for single loads.
Tested on aarch64-linux-gnu and x86_64-linux-gnu.
gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Set
GROUP_GAP for single-element interleaving.
* tree-vect-stmts.c (vectorizable_load): Remove force_peeling
variable.
From-SVN: r236631
|
|
* tree-vect-stmts.c (vectorizable_simd_clone_call): Utilize
auto_vec instead of vec.
From-SVN: r236472
|
|
2016-04-20 Richard Biener <rguenther@suse.de>
PR tree-optimization/70726
* tree-vect-stmts.c (vectorizable_shift): Do not use scalar
shift amounts from a pattern stmt operand.
* g++.dg/vect/pr70726.cc: New testcase.
From-SVN: r235236
|
|
* tree.h (TYPE_ALIGN, DECL_ALIGN): Return shifted amount.
(SET_TYPE_ALIGN, SET_DECL_ALIGN): New.
* tree-core.h (tree_type_common.align): Use bit-field.
(tree_type_common.spare): New.
(tree_decl_common.off_align): Make smaller.
(tree_decl_common.align): Use bit-field.
* expr.c (expand_expr_addr_expr_1): Use SET_TYPE_ALIGN.
* omp-low.c (install_var_field): Use SET_DECL_ALIGN.
(scan_sharing_clauses): Ditto.
(finish_taskreg_scan): Use SET_DECL_ALIGN and SET_TYPE_ALIGN.
(omp_finish_file): Ditto.
* stor-layout.c (do_type_align): Use SET_DECL_ALIGN.
(layout_decl): Ditto.
(relayout_decl): Ditto.
(finalize_record_size): Use SET_TYPE_ALIGN.
(finalize_type_size): Ditto.
(finish_builtin_struct): Ditto.
(layout_type): Ditto.
(initialize_sizetypes): Ditto.
* targhooks.c (std_gimplify_va_arg_expr): Use SET_TYPE_ALIGN.
* tree-nested.c (insert_field_into_struct): Use SET_TYPE_ALIGN.
(lookup_field_for_decl): Use SET_DECL_ALIGN.
(get_chain_field): Ditto.
(get_trampoline_type): Ditto.
(get_nl_goto_field): Ditto.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
SET_DECL_ALIGN.
(unpack_ts_type_common_value_fields): Use SET_TYPE_ALIGN.
* gimple-expr.c (copy_var_decl): Use SET_DECL_ALIGN.
* tree.c (make_node_stat): Use SET_DECL_ALIGN and SET_TYPE_ALIGN.
(build_qualified_type): Use SET_TYPE_ALIGN.
(build_aligned_type, build_range_type_1): Ditto.
(build_atomic_base): Ditto.
(build_common_tree_nodes): Ditto.
* cfgexpand.c (align_local_variable): Use SET_DECL_ALIGN.
(expand_one_stack_var_at): Ditto.
* coverage.c (build_var): Use SET_DECL_ALIGN.
* except.c (init_eh): Ditto.
* function.c (assign_parm_setup_block): Ditto.
* symtab.c (increase_alignment_1): Ditto.
* tree-ssa-ccp.c (fold_builtin_alloca_with_align): Ditto.
* tree-vect-stmts.c (ensure_base_align): Ditto.
* varasm.c (align_variable): Ditto.
(assemble_variable): Ditto.
(build_constant_desc): Ditto.
(output_constant_def_contents): Ditto.
* config/arm/arm.c (arm_relayout_function): Use SET_DECL_ALIGN.
* config/avr/avr.c (avr_adjust_type_node): Use SET_TYPE_ALIGN.
* config/mips/mips.c (mips_std_gimplify_va_arg_expr): Ditto.
* config/msp430/msp430.c (msp430_gimplify_va_arg_expr): Ditto.
* config/spu/spu.c (spu_build_builtin_va_list): Use SET_DECL_ALIGN.
ada/
* gcc-interface/decl.c (gnat_to_gnu_entity): Use SET_TYPE_ALIGN.
(gnat_to_gnu_field): Ditto.
(components_to_record): Ditto.
(create_variant_part_from): Ditto.
(copy_and_substitute_in_size): Ditto.
(substitute_in_type): Ditto.
* gcc-interface/utils.c (make_aligning_type): Use SET_TYPE_ALIGN.
(make_packable_type): Ditto.
(maybe_pad_type): Ditto.
(finish_fat_pointer_type): Ditto.
(finish_record_type): Ditto and use SET_DECL_ALIGN.
(rest_of_record_type_compilation): Use SET_TYPE_ALIGN.
(create_field_decl): Use SET_DECL_ALIGN.
c-family/
* c-common.c (handle_aligned_attribute): Use SET_TYPE_ALIGN
and SET_DECL_ALIGN.
c/
* c-decl.c (merge_decls): Use SET_DECL_ALIGN and SET_TYPE_ALIGN.
(grokdeclarator, parser_xref_tag, finish_enum): Use SET_TYPE_ALIGN.
cp/
* class.c (build_vtable): Use SET_DECL_ALIGN and SET_TYPE_ALIGN.
(layout_class_type): Ditto.
(build_base_field): Use SET_DECL_ALIGN.
(fixup_attribute_variants): Use SET_TYPE_ALIGN.
* decl.c (duplicate_decls): Use SET_DECL_ALIGN.
(record_unknown_type): Use SET_TYPE_ALIGN.
(cxx_init_decl_processing): Ditto.
(copy_type_enum): Ditto.
(grokfndecl): Use SET_DECL_ALIGN.
(copy_type_enum): Use SET_TYPE_ALIGN.
* pt.c (instantiate_class_template_1): Use SET_TYPE_ALIGN.
(tsubst): Ditto.
* tree.c (cp_build_qualified_type_real): Use SET_TYPE_ALIGN.
* lambda.c (maybe_add_lambda_conv_op): Use SET_DECL_ALIGN.
* method.c (implicitly_declare_fn): Use SET_DECL_ALIGN.
* rtti.c (emit_tinfo_decl): Ditto.
fortran/
* trans-io.c (gfc_build_io_library_fndecls): Use SET_TYPE_ALIGN.
* trans-common.c (build_common_decl): Use SET_DECL_ALIGN.
* trans-types.c (gfc_add_field_to_struct): Use SET_DECL_ALIGN.
go/
* go-gcc.cc (Gcc_backend::implicit_variable): Use SET_DECL_ALIGN.
java/
* class.c (add_method_1): Use SET_DECL_ALIGN.
(make_class_data): Ditto.
(emit_register_classes_in_jcr_section): Ditto.
* typeck.c (build_java_array_type): Ditto.
objc/
* objc-act.c (objc_build_struct): Use SET_DECL_ALIGN.
libcc1/
* plugin.cc (plugin_finish_record_or_union): Use SET_TYPE_ALIGN.
From-SVN: r235172
|
|
modes on x86_64-linux-gnu (in immed_wide_int_const, at emit-rtl.c:606))
2016-03-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/70396
* tree-vect-stmts.c (vectorizable_comparison): Use
get_vectype_for_scalar_type.
* gcc.dg/torture/pr70396.c: New testcase.
From-SVN: r234455
|
|
-O3 -march=skylake-avx512.)
gcc/
PR tree-optimization/70252
* tree-vect-stmts.c (supportable_widening_operation): Check resulting
boolean vector has a proper number of elements.
(supportable_narrowing_operation): Likewise.
gcc/testsuite/
PR tree-optimization/70252
* gcc.dg/pr70252.c: New test.
From-SVN: r234323
|
|
number of elements.
gcc/
* tree-vect-stmts.c (vectorizable_mask_load_store): Check mask
has a proper number of elements.
From-SVN: r234104
|
|
PR target/70021
* tree-vect-stmts.c (vect_mark_relevant): Remove USED_IN_PATTERN
argument, if STMT_VINFO_IN_PATTERN_P (stmt_info), always mark
the pattern no matter if it is used just by non-pattern, pattern
or mix thereof.
(process_use, vect_mark_stmts_to_be_vectorized): Adjust callers.
* tree-vect-patterns.c (vect_recog_vector_vector_shift_pattern): If
oprnd1 def_stmt is in pattern, don't look through it.
* gcc.dg/vect/pr70021.c: New test.
* gcc.target/i386/pr70021.c: New test.
From-SVN: r233940
|
|
gcc/
PR tree-optimization/69956
* tree-vect-stmts.c (supportable_widening_operation): Support
multi-step conversion of boolean vectors.
(supportable_narrowing_operation): Likewise.
gcc/testsuite/
PR tree-optimization/69956
* gcc.dg/pr69956.c: New test.
From-SVN: r233850
|
|
2016-02-24 Richard Biener <rguenther@suse.de>
PR tree-optimization/69907
* tree-vect-stmts.c (vectorizable_load): Check for gaps at the
end of permutations for BB vectorization.
* gcc.dg/vect/bb-slp-pr69907.c: New testcase.
* gcc.dg/vect/bb-slp-34.c: XFAIL.
* gcc.dg/vect/bb-slp-pr68892.c: Likewise.
From-SVN: r233655
|
|
gcc/
2016-02-02 Yuri Rumyantsev <ysrumyan@gmail.com>
PR middle-end/68542
* config/i386/i386.c (ix86_expand_branch): Add support for conditional
branch with vector comparison.
* config/i386/sse.md (VI48_AVX): New mode iterator.
(define_expand "cbranch<mode>4): Add support for conditional branch
with vector comparison.
* tree-vect-loop.c (optimize_mask_stores): New function.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimaze_mask_stores for
vectorized loops having masked stores after vec_info destroy.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
correspondent macros.
(optimize_mask_stores): Add prototype.
gcc/testsuite
2016-02-02 Yuri Rumyantsev <ysrumyan@gmail.com>
PR middle-end/68542
* gcc.dg/vect/vect-mask-store-move-1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
From-SVN: r233068
|
|
gcc/
* tree-vect-stmts.c (vectorizable_comparison): Add
NULL check for vectype.
gcc/testsuite/
* gcc.dg/declare-simd.c: New test.
From-SVN: r232917
|
|
gcc/
PR target/69421
* tree-vect-stmts.c (vectorizable_condition): Check vectype
of operands is compatible with a statement vectype.
gcc/testsuite/
PR target/69421
* gcc.dg/pr69421.c: New test.
From-SVN: r232792
|
|
instead of building a temporary tree.
* tree-vect-stmts.c (vectorizable_condition): Build a VEC_COND_EXPR
directly instead of building a temporary tree.
From-SVN: r232764
|
|
tree-vect-stmts.c:1379 with -O3)
gcc/
PR tree-optimization/69328
* tree-vect-stmts.c (vect_is_simple_cond): Check compared
vectors have same number of elements.
(vectorizable_condition): Fix masked version recognition.
gcc/testsuite/
PR tree-optimization/69328
* gcc.dg/pr69328.c: New test.
Co-Authored-By: Richard Biener <rguenther@suse.de>
From-SVN: r232608
|
|
gcc/
* tree-vect-stmts.c (vectorizable_store): Check
rhs vectype.
From-SVN: r232568
|