path: root/gcc/tree-vect-loop.c
Age | Commit message | Author | Files | Lines
2021-08-10 | tree-optimization/101801 - rework generic vector vectorization more | Richard Biener | 1 | -0/+18
This builds on top of the vect_worthwhile_without_simd_p refactoring done earlier. That refactoring was wrong in dropping the apparent double checks for operation support, since the optab check can happen with an integer vector emulation mode and thus succeed, but vector lowering might not actually support the operation on word_mode. The following patch adds a vect_emulated_vector_p helper and reinstates the check where it was previously. It also adds appropriate costing of the scalar stmts emitted by vector lowering to vectorizable_operation, which should be the only place such operations are synthesized. I've also cared for the case where the vector mode is supported but the operation is not (though I think this will be unlikely given we're talking about plus, minus and negate). This fixes the observed FAIL of gcc.dg/tree-ssa/gen-vect-11b.c with -m32, where we end up vectorizing a multiplication that ends up being torn down to scalars again by vector lowering. I'm not super happy about all the other places where we're now and previously feeding scalar modes to optab checks where we want to know whether we can vectorize something, but well. 2021-09-08 Richard Biener <rguenther@suse.de> PR tree-optimization/101801 PR tree-optimization/101819 * tree-vectorizer.h (vect_emulated_vector_p): Declare. * tree-vect-loop.c (vect_emulated_vector_p): New function. (vectorizable_reduction): Re-instantiate a check for emulated operations. * tree-vect-stmts.c (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. Cost emulated vector operations according to the scalar sequence synthesized by vector lowering.
2021-08-06 | tree-optimization/101801 - remove vect_worthwhile_without_simd_p | Richard Biener | 1 | -36/+7
This removes the cost part of vect_worthwhile_without_simd_p, retaining only the correctness bits. The reason is that the cost heuristics do not properly account for SLP, and the check whether "without simd" applies misfires for AVX512 mask vectors at the moment, leading to missed vectorizations there. Any costing decision should take place in the cost modeling; no single stmt is to disable all vectorization on its own. 2021-08-06 Richard Biener <rguenther@suse.de> PR tree-optimization/101801 * tree-vectorizer.h (vect_worthwhile_without_simd_p): Rename... (vect_can_vectorize_without_simd_p): ... to this. * tree-vect-loop.c (vect_worthwhile_without_simd_p): Rename... (vect_can_vectorize_without_simd_p): ... to this and fold in vect_min_worthwhile_factor. (vect_min_worthwhile_factor): Remove. (vectorizable_reduction): Adjust and remove the cost part. * tree-vect-stmts.c (vectorizable_shift): Likewise. (vectorizable_operation): Likewise.
2021-08-04 | vect: Tweak comparisons with existing epilogue loops | Richard Sandiford | 1 | -1/+9
This patch uses a more accurate scalar iteration estimate when comparing the epilogue of a constant-iteration loop with a candidate replacement epilogue. In the testcase, the patch prevents a 1-to-3-element SVE epilogue from seeming better than a 64-bit Advanced SIMD epilogue. gcc/ * tree-vect-loop.c (vect_better_loop_vinfo_p): Detect cases in which old_loop_vinfo is an epilogue loop that handles a constant number of iterations. gcc/testsuite/ * gcc.target/aarch64/sve/cost_model_12.c: New test.
2021-08-04 | vect: Tweak dump messages for vector mode choice | Richard Sandiford | 1 | -1/+10
After vect_analyze_loop has successfully analysed a loop for one base vector mode B1, it considers using following base vector modes to vectorise an epilogue. However, for VECT_COMPARE_COSTS, a later mode B2 might turn out to be better than B1 was. Initially this comparison will be between an epilogue loop (for B2) and a main loop (for B1). However, in r11-6458 I'd added code to reanalyse the B2 epilogue loop as a main loop, partly for correctness and partly for better costing. This can lead to a situation in which we think that the B2 epilogue loop was better than the B1 main loop, but that the B2 main loop is not better than the B1 main loop. There was no dump message to say that this had happened, which made it look like B2 had still won. gcc/ * tree-vect-loop.c (vect_analyze_loop): Print a dump message when a reanalyzed loop fails to be cheaper than the current main loop.
2021-07-16 | tree-optimization/101462 - fix signedness of reused reduction vector | Richard Biener | 1 | -11/+25
This fixes the partial reduction of the reused reduction vector so that it is carried out in the correct sign and the correctly signed vector is recorded for the skip edge use. 2021-07-16 Richard Biener <rguenther@suse.de> * tree-vect-loop.c (vect_transform_cycle_phi): Correct sign conversion issues with the partial reduction of the reused vector accumulator.
2021-07-14 | Vect: Add support for dot-product where the sign for the multiplicant changes. | Tamar Christina | 1 | -1/+7
This patch adds support for a dot product where the signs of the multiplication arguments differ, i.e. one is signed and one is unsigned but the precisions are the same.

    #define N 480
    #define SIGNEDNESS_1 unsigned
    #define SIGNEDNESS_2 signed
    #define SIGNEDNESS_3 signed
    #define SIGNEDNESS_4 unsigned

    SIGNEDNESS_1 int __attribute__ ((noipa))
    f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
       SIGNEDNESS_4 char *restrict b)
    {
      for (__INTPTR_TYPE__ i = 0; i < N; ++i)
        {
          int av = a[i];
          int bv = b[i];
          SIGNEDNESS_2 short mult = av * bv;
          res += mult;
        }
      return res;
    }

The operations are performed as if the operands were extended to a 32-bit value. As such this operation isn't valid if there is an intermediate conversion to an unsigned value, i.e. if SIGNEDNESS_2 is unsigned.

Moreover, if the signs of SIGNEDNESS_3 and SIGNEDNESS_4 are flipped, the same optab is used but the operands are flipped in the optab expansion.

To support this the patch extends the dot-product detection to optionally ignore operands with different signs and stores this information in the optab subtype, which is now made a bitfield. The subtype can now additionally control which optab an EXPR can expand to.

gcc/ChangeLog:
* optabs.def (usdot_prod_optab): New.
* doc/md.texi: Document it and clarify other dot prod optabs.
* optabs-tree.h (enum optab_subtype): Add optab_vector_mixed_sign.
* optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
* optabs.c (expand_widen_pattern_expr): Likewise.
* tree-cfg.c (verify_gimple_assign_ternary): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): Take optional optab subtype.
(vect_widened_op_tree): Optionally ignore mismatch types.
(vect_recog_dot_prod_pattern): Support usdot_prod_optab.
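The semantics described in this message can be sketched as a scalar reference routine. This is only an illustration: the name `usdot_ref` and the fixed-width types are assumptions made for the sketch, not GCC code, and the accumulator is kept signed for simplicity where the commit's example uses an unsigned one. The intermediate `short` is omitted because the product of two 8-bit values always fits in a signed 16-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference, not GCC code: accumulate products of
   signed 8-bit and unsigned 8-bit elements, each widened to 32 bits
   before the multiplication.  */
static int32_t
usdot_ref (int32_t res, const int8_t *a, const uint8_t *b, size_t n)
{
  assert (a != NULL && b != NULL);
  for (size_t i = 0; i < n; ++i)
    {
      int32_t av = a[i];   /* sign-extended */
      int32_t bv = b[i];   /* zero-extended */
      res += av * bv;
    }
  return res;
}
```

Treating both inputs as signed (or both as unsigned) would misread one of the two operands, which is why a separate mixed-sign operation is needed.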
2021-07-14 | Support reduction def re-use for epilogue with different vector size | Richard Biener | 1 | -88/+139
The following adds support for re-using the vector reduction def from the main loop in vectorized epilogue loops on architectures which use different vector sizes for the epilogue. That's only x86 as far as I am aware. 2021-07-13 Richard Biener <rguenther@suse.de> * tree-vect-loop.c (vect_find_reusable_accumulator): Handle vector types where the old vector type has a multiple of the new vector type elements. (vect_create_partial_epilog): New function, split out from... (vect_create_epilog_for_reduction): ... here. (vect_transform_cycle_phi): Reduce the re-used accumulator to the new vector type. * gcc.target/i386/vect-reduc-1.c: New testcase.
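The idea of reducing a wide accumulator to the epilogue's narrower vector type can be sketched with plain arrays standing in for vectors. The lane counts (8 and 4) and both function names are assumptions chosen for illustration; the real transformation operates on GIMPLE vector values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: fold an accumulator of 2*HALF lanes into HALF
   lanes by adding the upper half onto the lower half.  */
static void
fold_accumulator (const int *wide, int *narrow, size_t half)
{
  assert (wide != NULL && narrow != NULL);
  for (size_t i = 0; i < half; ++i)
    narrow[i] = wide[i] + wide[i + half];
}

/* Sum X[0..N-1] with an 8-lane main loop and a 4-lane epilogue that
   continues from the folded main-loop accumulator.  */
static int
sum_with_narrow_epilogue (const int *x, size_t n)
{
  int acc8[8] = { 0 };
  int acc4[4];
  size_t i = 0;

  for (; i + 8 <= n; i += 8)            /* main loop */
    for (size_t lane = 0; lane < 8; ++lane)
      acc8[lane] += x[i + lane];

  fold_accumulator (acc8, acc4, 4);     /* one reduction stage */

  for (; i + 4 <= n; i += 4)            /* vector epilogue */
    for (size_t lane = 0; lane < 4; ++lane)
      acc4[lane] += x[i + lane];

  /* A single vector->scalar reduction serves both loops.  */
  int result = acc4[0] + acc4[1] + acc4[2] + acc4[3];

  for (; i < n; ++i)                    /* scalar tail */
    result += x[i];
  return result;
}
```

Only one final vector-to-scalar reduction is emitted, rather than one per loop.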
2021-07-13 | vect: Reuse reduction accumulators between loops | Richard Sandiford | 1 | -58/+249
This patch adds support for reusing a main loop's reduction accumulator in an epilogue loop. This in turn lets the loops share a single piece of vector->scalar reduction code. The patch has the following restrictions: (1) The epilogue reduction can only operate on a single vector (e.g. ncopies must be 1 for non-SLP reductions, and the group size must be <= the element count for SLP reductions). (2) Both loops must use the same vector mode for their accumulators. This means that the patch is restricted to targets that support --param vect-partial-vector-usage=1. (3) The reduction must be a standard “tree code” reduction. However, these restrictions could be lifted in future. For example, if the main loop operates on 128-bit vectors and the epilogue loop operates on 64-bit vectors, we could in future reduce the 128-bit vector by one stage and use the 64-bit result as the starting point for the epilogue result. The patch tries to handle chained SLP reductions, unchained SLP reductions and non-SLP reductions. It also handles cases in which the epilogue loop is entered directly (rather than via the main loop) and cases in which the epilogue loop can be skipped. vect_get_main_loop_result is a bit more general than the current patch needs. gcc/ * tree-vectorizer.h (vect_reusable_accumulator): New structure. (_loop_vec_info::main_loop_edge): New field. (_loop_vec_info::skip_main_loop_edge): Likewise. (_loop_vec_info::skip_this_loop_edge): Likewise. (_loop_vec_info::reusable_accumulators): Likewise. (_stmt_vec_info::reduc_scalar_results): Likewise. (_stmt_vec_info::reused_accumulator): Likewise. (vect_get_main_loop_result): Declare. * tree-vectorizer.c (vec_info::new_stmt_vec_info): Initialize reduc_scalar_inputs. (vec_info::free_stmt_vec_info): Free reduc_scalar_inputs. * tree-vect-loop-manip.c (vect_get_main_loop_result): New function. (vect_do_peeling): Fill an epilogue loop's main_loop_edge, skip_main_loop_edge and skip_this_loop_edge fields. 
* tree-vect-loop.c (INCLUDE_ALGORITHM): Define. (vect_emit_reduction_init_stmts): New function. (get_initial_def_for_reduction): Use it. (get_initial_defs_for_reduction): Likewise. Change the vinfo parameter to a loop_vec_info. (vect_create_epilog_for_reduction): Store the scalar results in the reduc_info. If an epilogue loop is reusing an accumulator from the main loop, and if the epilogue loop can also be skipped, try to place the reduction code in the join block. Record accumulators that could potentially be reused by epilogue loops. (vect_transform_cycle_phi): When vectorizing epilogue loops, try to reuse accumulators from the main loop. Record the initial value in reduc_info for non-SLP reductions too. gcc/testsuite/ * gcc.target/aarch64/sve/reduc_9.c: New test. * gcc.target/aarch64/sve/reduc_9_run.c: Likewise. * gcc.target/aarch64/sve/reduc_10.c: Likewise. * gcc.target/aarch64/sve/reduc_10_run.c: Likewise. * gcc.target/aarch64/sve/reduc_11.c: Likewise. * gcc.target/aarch64/sve/reduc_11_run.c: Likewise. * gcc.target/aarch64/sve/reduc_12.c: Likewise. * gcc.target/aarch64/sve/reduc_12_run.c: Likewise. * gcc.target/aarch64/sve/reduc_13.c: Likewise. * gcc.target/aarch64/sve/reduc_13_run.c: Likewise. * gcc.target/aarch64/sve/reduc_14.c: Likewise. * gcc.target/aarch64/sve/reduc_14_run.c: Likewise. * gcc.target/aarch64/sve/reduc_15.c: Likewise. * gcc.target/aarch64/sve/reduc_15_run.c: Likewise.
2021-07-13 | vect: Simplify get_initial_def_for_reduction | Richard Sandiford | 1 | -118/+59
After previous patches, we can now easily provide the neutral op as an argument to get_initial_def_for_reduction. This in turn allows the adjustment calculation to be moved outside of get_initial_def_for_reduction, which is the main motivation of the patch. gcc/ * tree-vect-loop.c (get_initial_def_for_reduction): Remove adjustment handling. Take the neutral value as an argument, in place of the code argument. (vect_transform_cycle_phi): Update accordingly. Handle the initial values of cond reductions separately from code reductions. Choose the adjustment here rather than in get_initial_def_for_reduction. Sink the splat of vec_initial_def.
2021-07-13 | vect: Generalise neutral_op_for_slp_reduction | Richard Sandiford | 1 | -33/+26
This patch generalises the interface to neutral_op_for_slp_reduction so that it can be used for non-SLP reductions too. This isn't much of a win on its own, but it helps later patches. gcc/ * tree-vect-loop.c (neutral_op_for_slp_reduction): Replace with... (neutral_op_for_reduction): ...this, providing a more general interface. (vect_create_epilog_for_reduction): Update accordingly. (vectorizable_reduction): Likewise. (vect_transform_cycle_phi): Likewise.
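As background on what a neutral value is, the following sketch uses the usual algebraic identities. The enum and both functions are hypothetical stand-ins invented for this example; GCC's own function works on tree codes and tree values, and its exact interface is not reproduced here.

```c
#include <assert.h>

/* Hypothetical operation codes for this sketch.  */
enum reduc_op { OP_PLUS, OP_MULT, OP_AND, OP_IOR, OP_XOR, OP_MIN, OP_MAX };

/* Return a value that can pad unused accumulator lanes without changing
   the reduction result.  MIN and MAX have no fixed neutral value, so the
   reduction's own initial value is reused for them.  */
static int
neutral_value (enum reduc_op op, int initial_value)
{
  switch (op)
    {
    case OP_PLUS:
    case OP_IOR:
    case OP_XOR:
      return 0;
    case OP_MULT:
      return 1;
    case OP_AND:
      return ~0;   /* all bits set */
    case OP_MIN:
    case OP_MAX:
      return initial_value;
    }
  assert (!"unknown reduction op");
  return 0;
}

/* Apply OP to A and B.  */
static int
apply_op (enum reduc_op op, int a, int b)
{
  switch (op)
    {
    case OP_PLUS: return a + b;
    case OP_MULT: return a * b;
    case OP_AND:  return a & b;
    case OP_IOR:  return a | b;
    case OP_XOR:  return a ^ b;
    case OP_MIN:  return a < b ? a : b;
    case OP_MAX:  return a > b ? a : b;
    }
  assert (!"unknown reduction op");
  return 0;
}
```

Combining the initial value with the neutral value leaves it unchanged for every operation listed, which is the property the vectorizer relies on when it fills extra lanes.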
2021-07-13 | vect: Pass reduc_info to get_initial_def_for_reduction | Richard Sandiford | 1 | -5/+5
Similarly to the previous patch, this one passes the reduc_info to get_initial_def_for_reduction, rather than a stmt_vec_info that lacks the metadata. This again becomes useful later. gcc/ * tree-vect-loop.c (get_initial_def_for_reduction): Take the reduc_info instead of the original stmt_vec_info. (vect_transform_cycle_phi): Update accordingly.
2021-07-13 | vect: Pass reduc_info to get_initial_defs_for_reduction | Richard Sandiford | 1 | -13/+10
This patch passes the reduc_info to get_initial_defs_for_reduction, so that the function can get general information from there rather than from the first SLP statement. This isn't a win on its own, but it becomes important with later patches. gcc/ * tree-vect-loop.c (get_initial_defs_for_reduction): Take the reduc_info as an additional parameter. (vect_transform_cycle_phi): Update accordingly.
2021-07-13 | vect: Add a vect_phi_initial_value helper function | Richard Sandiford | 1 | -20/+9
This patch adds a helper function called vect_phi_initial_value for returning the incoming value of a given loop phi. The main reason for adding it is to ensure that the right preheader edge is used when vectorising nested loops. (PHI_ARG_DEF_FROM_EDGE itself doesn't assert that the given edge is for the right block, although I guess that would be good to add separately.) gcc/ * tree-vectorizer.h: Include tree-ssa-operands.h. (vect_phi_initial_value): New function. * tree-vect-loop.c (neutral_op_for_slp_reduction): Use it. (get_initial_defs_for_reduction, info_for_reduction): Likewise. (vect_create_epilog_for_reduction, vectorizable_reduction): Likewise. (vect_transform_cycle_phi, vectorizable_induction): Likewise.
2021-07-13 | vect: Ensure reduc_inputs always have vectype | Richard Sandiford | 1 | -17/+11
Vector reduction accumulators can differ in signedness from the final scalar result. The conversions to handle that case were distributed through vect_create_epilog_for_reduction; this patch does the conversion up-front instead. gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Convert the phi results to vectype after creating them. Remove later conversion code that thus becomes redundant.
2021-07-13 | vect: Remove new_phis from vect_create_epilog_for_reduction | Richard Sandiford | 1 | -72/+41
vect_create_epilog_for_reduction had a variable called new_phis. It collected the statements that produce the exit block definitions of the vector reduction accumulators. Although those statements are indeed phis initially, they are often replaced with normal statements later, leading to puzzling code like:

    FOR_EACH_VEC_ELT (new_phis, i, new_phi)
      {
        int bit_offset;
        if (gimple_code (new_phi) == GIMPLE_PHI)
          vec_temp = PHI_RESULT (new_phi);
        else
          vec_temp = gimple_assign_lhs (new_phi);

Also, although the array collects statements, in practice all users want the lhs instead.

This patch therefore replaces new_phis with a vector of gimple values called “reduc_inputs”.

Also, reduction chains and ncopies>1 were handled with identical code (and there was a comment saying so). The patch unites them into a single “if”.

gcc/
* tree-vect-loop.c (vect_create_epilog_for_reduction): Replace the new_phis vector with a reduc_inputs vector. Combine handling of reduction chains and ncopies > 1.
2021-07-13 | vect: Create array_slice of live-out stmts | Richard Sandiford | 1 | -40/+21
This patch constructs an array_slice of the scalar statements that produce live-out reduction results in the original unvectorised loop. There are three cases:
- SLP reduction chains: the final SLP stmt is live-out
- full SLP reductions: all SLP stmts are live-out
- non-SLP reductions: the single scalar stmt is live-out
This is a slight simplification on its own, mostly because it means “group_size” has a consistent meaning throughout the function. The main justification though is that it helps with later patches. gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Truncate scalar_results to group_size elements after reducing down from N*group_size elements. Construct an array_slice of the live-out stmts and assert that there is one stmt per scalar result.
2021-07-13 | vect: Simplify epilogue reduction code | Richard Sandiford | 1 | -26/+4
vect_create_epilog_for_reduction only handles two cases: single-loop reductions and double reductions. “nested cycles” (i.e. reductions in the inner loop when vectorising an outer loop) are handled elsewhere and don't need a vector->scalar reduction. The function had variables called nested_in_vect_loop and double_reduc and asserted that nested_in_vect_loop implied double_reduc, but it still had code to handle nested_in_vect_loop && !double_reduc. This patch removes that and uses double_reduc everywhere. gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Remove nested_in_vect_loop and use double_reduc everywhere. Remove dead assignment to "loop".
2021-07-08 | vect: Remove always-true condition | Richard Sandiford | 1 | -26/+24
vectorizable_reduction had code guarded by:

    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
        || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)

But that's always true after:

    if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def
        && STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def
        && STMT_VINFO_DEF_TYPE (stmt_info) != vect_nested_cycle)
      return false;

    if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
      {
        …
        return true;
      }

(I wasn't sure at first how the empty “else” for the first “if” above was supposed to work.)

gcc/
* tree-vect-loop.c (vectorizable_reduction): Remove always-true if condition.
2021-06-17 | Vectorization of BB reductions | Richard Biener | 1 | -1/+1
This adds a simple reduction vectorization capability to the non-loop vectorizer. Simple meaning it lacks any of the fancy ways to generate the reduction epilogue but only supports those we can handle via a direct internal function reducing a vector to a scalar. One of the main reasons is to avoid massive refactoring at this point but also that more complex epilogue operations are hardly profitable. Mixed sign reductions are for now fended off and I'm not finally settled on whether we want an explicit SLP node for the reduction epilogue operation. Handling mixed signs could be done by multiplying with a { 1, -1, .. } vector. Also fended off are reductions with non-internal operands (constants or register parameters for example). Costing is done by accounting the original scalar participating stmts for the scalar cost and log2 permutes and operations for the vectorized epilogue. -- SPEC CPU 2017 FP with rate workload measurements show (picked fastest runs of three) regressions for 507.cactuBSSN_r (1.5%), 508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5) and 527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and 538.imagick_r (1.5%). This is with -Ofast -march=znver2 on a Zen2. Statistics on CPU 2017 show that the overwhelming number of seeds we find are reductions of two lanes (well - that's basically every associative operation). That means we put quite high pressure on the SLP discovery process this way. In total we find 583218 seeds we put to SLP discovery out of which 66205 pass that and only 6185 of those make it through code generation checks. 796 of those are discarded because the reduction is part of a larger SLP instance. 4195 of the remaining are deemed not profitable to vectorize and 1194 are finally vectorized. That's a poor 0.2% rate. Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%), 28181 four (5%), 4808 five, 909 six and there are instances up to 120 lanes.
There's a set of 54086 candidate seeds we reject because they contain a constant or invariant (not implemented yet) but still have two or more lanes that could be put to SLP discovery. 2021-06-16 Richard Biener <rguenther@suse.de> PR tree-optimization/54400 * tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_bb_reduc. (reduction_fn_for_scalar_code): Declare. * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence): Check SLP_INSTANCE_KIND instead of looking at the representative. (vect_slp_analyze_instance_alignment): Likewise. * tree-vect-loop.c (reduction_fn_for_scalar_code): Export. * tree-vect-slp.c (vect_slp_linearize_chain): Split out chain linearization from vect_build_slp_tree_2 and generalize for the use of BB reduction vectorization. (vect_build_slp_tree_2): Adjust accordingly. (vect_optimize_slp): Elide permutes at the root of BB reduction instances. (vectorizable_bb_reduc_epilogue): New function. (vect_slp_prune_covered_roots): Likewise. (vect_slp_analyze_operations): Use them. (vect_slp_check_for_constructors): Recognize associatable chains for BB reduction vectorization. (vectorize_slp_instance_root_stmt): Generate code for the BB reduction epilogue. * gcc.dg/vect/bb-slp-pr54400.c: New testcase.
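The "log2 permutes and operations" costing mentioned in this message corresponds to a halving reduction. The sketch below uses a plain array in place of a vector and counts the steps; the function name and the power-of-two restriction are assumptions of the example, not properties of the GCC implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: reduce NLANES values (a power of two) by
   repeatedly adding the upper half onto the lower half.  *STEPS receives
   the number of halving steps, which is log2 (NLANES).  LANES is
   clobbered.  */
static int
tree_reduce_add (int *lanes, size_t nlanes, unsigned *steps)
{
  assert (lanes != NULL && steps != NULL);
  assert (nlanes != 0 && (nlanes & (nlanes - 1)) == 0);
  *steps = 0;
  for (size_t width = nlanes / 2; width >= 1; width /= 2)
    {
      for (size_t i = 0; i < width; ++i)
        lanes[i] += lanes[i + width];   /* one permute + one add */
      ++*steps;
    }
  return lanes[0];
}
```

For eight lanes this takes three steps, compared with seven scalar additions for the original chain.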
2021-06-09 | tree-optimization/100981 - fix SLP patterns involving reductions | Richard Biener | 1 | -1/+1
The following fixes the SLP FMA patterns to preserve reduction info and the reduction vectorization to consider internal function call defs for the reduction stmt. 2021-06-09 Richard Biener <rguenther@suse.de> PR tree-optimization/100981 gcc/ * tree-vect-loop.c (vect_create_epilog_for_reduction): Use gimple_get_lhs to also handle calls. * tree-vect-slp-patterns.c (complex_pattern::build): Transfer reduction info. gcc/testsuite/ * gfortran.dg/vect/pr100981-1.f90: New testcase. libgomp/ * testsuite/libgomp.fortran/pr100981-2.f90: New testcase.
2021-06-03 | vect: Use main loop's thresholds and VF to narrow upper_bound of epilogue | Andre Vieira | 1 | -6/+25
This patch uses knowledge of the conditions for entering an epilogue loop to help come up with a potentially more restrictive upper bound. gcc/ChangeLog: * tree-vect-loop.c (vect_transform_loop): Use the main loop's various thresholds to narrow the upper bound on epilogue iterations. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
2021-05-20 | vect: Replace hardcoded inner loop cost factor | Kewen Lin | 1 | -1/+2
This patch is to replace the current hardcoded weight factor 50, which is applied by the loop vectorizer to the cost of statements in an inner loop relative to the loop being vectorized, with one newly added member inner_loop_cost_factor in loop vinfo. It also introduces one parameter vect-inner-loop-cost-factor whose default value is 50, and is used to initialize the inner_loop_cost_factor member. The motivation here is that: if targets want to have one unique function to gather some information in each add_stmt_cost call, no matter that it's put before or after the cost tweaking part for inner loop, it may have the need to adjust (expand or shrink) the gathered data as the factor. Now the factor is hardcoded, it's not easily maintained. Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu. gcc/ChangeLog: * doc/invoke.texi (vect-inner-loop-cost-factor): Document new parameter. * params.opt (vect-inner-loop-cost-factor): New. * targhooks.c (default_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR, include head file tree-vectorizer.h and its required ones. * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR. * config/arm/arm.c (arm_add_stmt_cost): Likewise. * config/i386/i386.c (ix86_add_stmt_cost): Likewise. * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise. * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Likewise. (_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor. * tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor. (LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.
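The effect of the factor can be sketched as follows. The function and its parameters are hypothetical and only model the weighting described in this message; the real hooks have different signatures and take more inputs.

```c
#include <assert.h>

/* Hypothetical model of inner-loop cost weighting: statements in a loop
   nested inside the loop being vectorized are scaled by a configurable
   factor (default 50 according to the message) instead of a hardcoded
   constant.  */
static unsigned
weighted_stmt_cost (unsigned count, unsigned stmt_cost, int in_inner_loop,
                    unsigned inner_loop_cost_factor)
{
  assert (inner_loop_cost_factor > 0);
  unsigned cost = count * stmt_cost;
  if (in_inner_loop)
    cost *= inner_loop_cost_factor;
  return cost;
}
```

Because the factor is passed in, a target hook that gathers its own statistics can scale them by the same value.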
2021-05-11 | vect: Add costing_for_scalar parameter to init_cost hook | Kewen Lin | 1 | -3/+3
The rs6000 port function rs6000_density_test wants to differentiate whether the current cost model is for the scalar version of a loop or block, or for the vector version. As Richi suggested, this patch introduces one new parameter costing_for_scalar to the init_cost hook to pass down this information explicitly. gcc/ChangeLog: * doc/tm.texi: Regenerated. * target.def (init_cost): Add new parameter costing_for_scalar. * targhooks.c (default_init_cost): Adjust for new parameter. * targhooks.h (default_init_cost): Likewise. * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise. (vect_compute_single_scalar_iteration_cost): Likewise. (vect_analyze_loop_2): Likewise. * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise. (vect_bb_vectorization_profitable_p): Likewise. * tree-vectorizer.h (init_cost): Likewise. * config/aarch64/aarch64.c (aarch64_init_cost): Likewise. * config/i386/i386.c (ix86_init_cost): Likewise. * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
2021-04-16 | vectorizer: Remove dead scalar .COND_* calls from vectorized loops [PR99767] | Jakub Jelinek | 1 | -1/+15
The following testcase ICEs because disabling of DCE means there are dead stmts in the loop (though, in theory they could become dead only shortly before if-conv through some optimization), ifcvt which goes through all stmts in the loop if-converts them into .COND_DIV etc. internal fn calls in the copy of the loop meant for vectorization only, the loop is successfully vectorized but the particular .COND_* call is not because it isn't a live statement and the scalar .COND_* remains in the IL until expansion where it ICEs because these ifns only support vectors and not scalars. These ifns are similar to .MASK_{LOAD,STORE} in this behavior. One possible fix could be to expand scalar versions of them during expansion, basically undoing what if-conv did to create them, i.e. expand them as the lhs = else; if (mask) { lhs = statement; } or so. For .MASK_LOAD we have code to replace them in vect_transform_loop already though (not needed for .MASK_STORE, as stores should be always live and thus always vectorized), so this patch instead replaces .COND_* similarly to .MASK_LOAD in that loop, with the small difference that lhs = .MASK_LOAD (...); is replaced by lhs = 0; while lhs = .COND_* (..., else_arg); is replaced by lhs = else_arg. The statement must be dead, otherwise it would be vectorized, so I think it is not a big deal we don't turn it back into multiple basic blocks etc. (and it might be not possible to do that at that point). 2021-04-16 Jakub Jelinek <jakub@redhat.com> PR target/99767 * tree-vect-loop.c (vect_transform_loop): Don't remove just dead scalar .MASK_LOAD calls, but also dead .COND_* calls - replace them by their last argument. * gcc.target/aarch64/pr99767.c: New test.
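The scalar meaning of a conditional internal function, as this message describes it ("lhs = else; if (mask) { lhs = statement; }"), can be modelled in a few lines. The function name is hypothetical and division stands in for any .COND_* operation.

```c
#include <assert.h>

/* Hypothetical scalar model of a conditional division: the operation is
   performed only when MASK is set, otherwise the result is ELSE_ARG.  */
static int
cond_div_model (int mask, int a, int b, int else_arg)
{
  assert (!mask || b != 0);
  return mask ? a / b : else_arg;
}
```

A dead call of this kind can be replaced by its last argument because nothing reads the result, which is the substitution the patch makes.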
2021-04-07 | tree-optimization/99947 - avoid v.safe_push (v[0]) | Richard Biener | 1 | -1/+2
This avoids (again) the C++ pitfall of pushing a reference to sth being reallocated. 2021-04-07 Richard Biener <rguenther@suse.de> PR tree-optimization/99947 * tree-vect-loop.c (vectorizable_induction): Pre-allocate steps vector to avoid pushing elements from the reallocated vector. * gcc.dg/torture/pr99947.c: New testcase.
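The same hazard exists in C whenever a growable array is handed a pointer to one of its own elements. The sketch below is an analogy with a hypothetical `int_vec` type, not GCC's `vec`; it shows the pre-allocation pattern the fix uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array for this sketch.  */
struct int_vec
{
  int *data;
  size_t len;
  size_t cap;
};

/* Ensure room for at least CAP elements.  */
static void
int_vec_reserve (struct int_vec *v, size_t cap)
{
  if (cap <= v->cap)
    return;
  int *p = realloc (v->data, cap * sizeof *p);
  assert (p != NULL);
  v->data = p;
  v->cap = cap;
}

/* Append *VALUE.  If the array is full this reallocates first, so VALUE
   must not point into V->data unless capacity was reserved beforehand.  */
static void
int_vec_push (struct int_vec *v, const int *value)
{
  if (v->len == v->cap)
    int_vec_reserve (v, v->cap ? 2 * v->cap : 4);
  v->data[v->len++] = *value;
}

/* Append N copies of FIRST, pushing the array's own element each time.
   Reserving up front guarantees that no push reallocates.  */
static void
fill_with_first (struct int_vec *v, int first, size_t n)
{
  assert (n > 0);
  int_vec_reserve (v, v->len + n);
  size_t base = v->len;
  int_vec_push (v, &first);
  for (size_t i = 1; i < n; ++i)
    int_vec_push (v, &v->data[base]);
}
```

Without the `int_vec_reserve` call in `fill_with_first`, a push that triggers growth would read through a pointer into freed storage.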
2021-04-06 | tree-optimization/99880 - avoid vectorizing irrelevant PHI backedge defs | Richard Biener | 1 | -0/+1
This adds a relevancy check before trying to set the vector def of a backedge in an unvectorized PHI. 2021-04-06 Richard Biener <rguenther@suse.de> PR tree-optimization/99880 * tree-vect-loop.c (maybe_set_vectorized_backedge_value): Only set vectorized defs of relevant PHIs. * gcc.dg/torture/pr99880.c: New testcase.
2021-03-25 | vect: Init inside_cost in vect_model_reduction_cost | Kewen Lin | 1 | -1/+1
This patch initializes inside_cost to zero, which avoids using an uninitialized value when some path doesn't assign it. gcc/ChangeLog: * tree-vect-loop.c (vect_model_reduction_cost): Init inside_cost.
2021-02-25 | tree-optimization/99253 - fix reduction path check | Richard Biener | 1 | -28/+28
This fixes an ordering problem with verifying that no intermediate computations in a reduction path are used outside of the chain. The check was disabled for value-preserving conversions at the tail but whether a stmt was a conversion or not was only computed after the first use. The following fixes this by re-ordering things accordingly. 2021-02-25 Richard Biener <rguenther@suse.de> PR tree-optimization/99253 * tree-vect-loop.c (check_reduction_path): First compute code, then verify out-of-loop uses. * gcc.dg/vect/pr99253.c: New testcase.
2021-02-10 | tree-optimization/99024 - fix leak in loop vect analysis | Richard Biener | 1 | -1/+5
When we analyzed a loop as an epilogue but later, in peeling, decide we're not going to use it, the DTOR clears the original loop's ->aux, which causes us to leak the main loop vinfo. Fixed by only clearing aux if it is associated with the vinfo we're destroying. 2021-02-10 Richard Biener <rguenther@suse.de> PR tree-optimization/99024 * tree-vect-loop.c (_loop_vec_info::~_loop_vec_info): Only clear loop->aux if it is associated with the destroyed loop_vinfo.
2021-02-04 | tree-optimization/98855 - fix some vectorizer cost issues | Richard Biener | 1 | -2/+6
This fixes us not costing vectorized bswap for SLP as well as avoiding biasing to the vectorized side when costing single-argument PHIs. Instead we assume coalescing here and cost them with zero cost for both the scalar and vectorized code. This doesn't fix the PR on its own. 2021-02-04 Richard Biener <rguenther@suse.de> PR tree-optimization/98855 * tree-vect-loop.c (vectorizable_phi): Do not cost single-argument PHIs. * tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise. * tree-vect-stmts.c (vectorizable_bswap): Also perform costing for SLP operation.
2021-02-03 | slp: Split out patterns away from using SLP_ONLY into their own flag | Tamar Christina | 1 | -1/+1
Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way to dissolve the SLP only patterns during SLP cancellation. However it seems like the semantics for STMT_VINFO_SLP_VECT_ONLY are slightly different than what I expected. Namely that the non-SLP path can still use a statement marked STMT_VINFO_SLP_VECT_ONLY. One such example is masked loads which are used both in the SLP and non-SLP path. To fix this I now introduce a new flag STMT_VINFO_SLP_VECT_ONLY_PATTERN which is used only by the pattern matcher. gcc/ChangeLog: PR tree-optimization/98928 * tree-vect-loop.c (vect_analyze_loop_2): Change STMT_VINFO_SLP_VECT_ONLY to STMT_VINFO_SLP_VECT_ONLY_PATTERN. * tree-vect-slp-patterns.c (complex_pattern::build): Likewise. * tree-vectorizer.h (STMT_VINFO_SLP_VECT_ONLY_PATTERN): New. (class _stmt_vec_info): Add slp_vect_pattern_only_p. gcc/testsuite/ChangeLog: PR tree-optimization/98928 * gcc.target/i386/pr98928.c: New test.
2021-01-11 | tree-optimization/98526 - fix vectorizer reduction cost | Richard Biener | 1 | -6/+11
This fixes a double-counting in the reduction cost when vectorizing the reduction through the regular vectorizable_* functions. 2021-01-11 Richard Biener <rguenther@suse.de> PR tree-optimization/98526 * tree-vect-loop.c (vect_model_reduction_cost): Remove costing of the actual reduction op for the regular case. (vectorizable_reduction): Cost the stmts vect_transform_reduction produces here.
2021-01-05 | tree-optimization/98381 - fix live bool vector extract | Richard Biener | 1 | -3/+2
This fixes extraction of live bool vector results for the case of integer mode vectors. 2021-01-05 Richard Biener <rguenther@suse.de> PR tree-optimization/98381 * tree.c (vector_element_bits): Properly compute bool vector element size. * tree-vect-loop.c (vectorizable_live_operation): Properly compute the last lane bit offset.
2021-01-05 | vect: Fix missing alias checks for 128-bit SVE [PR98371] | Richard Sandiford | 1 | -3/+58
On AArch64, the vectoriser tries various ways of vectorising with both SVE and Advanced SIMD and picks the best one. All other things being equal, it prefers earlier attempts over later attempts. The way this works currently is that, once it has a successful vectorisation attempt A, it analyses all other attempts as epilogue loops of A: /* When pick_lowest_cost_p is true, we should in principle iterate over all the loop_vec_infos that LOOP_VINFO could replace and try to vectorize LOOP_VINFO under the same conditions. E.g. when trying to replace an epilogue loop, we should vectorize LOOP_VINFO as an epilogue loop with the same VF limit. When trying to replace the main loop, we should vectorize LOOP_VINFO as a main loop too. However, autovectorize_vector_modes is usually sorted as follows: - Modes that naturally produce lower VFs usually follow modes that naturally produce higher VFs. - When modes naturally produce the same VF, maskable modes usually follow unmaskable ones, so that the maskable mode can be used to vectorize the epilogue of the unmaskable mode. This order is preferred because it leads to the maximum epilogue vectorization opportunities. Targets should only use a different order if they want to make wide modes available while disparaging them relative to earlier, smaller modes. The assumption in that case is that the wider modes are more expensive in some way that isn't reflected directly in the costs. There should therefore be few interesting cases in which LOOP_VINFO fails when treated as an epilogue loop, succeeds when treated as a standalone loop, and ends up being genuinely cheaper than FIRST_LOOP_VINFO. */ However, the vectoriser can normally elide alias checks for epilogue loops, on the basis that the main loop should do them instead. Converting an epilogue loop to a main loop can therefore cause the alias checks to be skipped. 
(It probably also unfairly penalises the original loop in the cost comparison, given that one loop will have alias checks and the other won't.) As the comment says, we should in principle analyse each vector mode twice: once as a main loop and once as an epilogue. However, doing that up-front would be quite expensive. This patch instead goes for a compromise: if an epilogue loop for mode M2 seems better than a main loop for mode M1, re-analyse with M2 as the main loop. The patch fixes dg.torture.exp=pr69719.c when testing with -msve-vector-bits=128. gcc/ PR tree-optimization/98371 * tree-vect-loop.c (vect_reanalyze_as_main_loop): New function. (vect_analyze_loop): If an epilogue loop appears to be cheaper than the main loop, re-analyze it as a main loop before adopting it as a main loop.
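To make the alias-check point above concrete, here is a minimal sketch in plain C of a loop that is versioned on a runtime overlap check. It is not GCC code: `ranges_overlap` and `add_one` are hypothetical names, and the four-element "main loop" only stands in for a real vector loop. The trailing scalar loop plays the role of the epilogue and has no check of its own, which is why it would be unsafe to promote it to a main loop without re-analysis.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the n-element ranges starting at
   a and b share any bytes.  */
static int ranges_overlap(const int *a, const int *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return pa < pb + bytes && pb < pa + bytes;
}

/* Computes dst[i] = src[i] + 1 for i in [0, n), giving the same result
   as a plain scalar loop even when dst and src overlap.  */
void add_one(int *dst, const int *src, size_t n)
{
    size_t i = 0;

    if (!ranges_overlap(dst, src, n)) {
        /* "Main loop": four elements per iteration, all loads before
           all stores, standing in for one vector operation.  */
        for (; i + 4 <= n; i += 4) {
            int t0 = src[i] + 1, t1 = src[i + 1] + 1;
            int t2 = src[i + 2] + 1, t3 = src[i + 3] + 1;
            dst[i] = t0;
            dst[i + 1] = t1;
            dst[i + 2] = t2;
            dst[i + 3] = t3;
        }
    }

    /* Scalar loop: the epilogue of the main loop, or the whole loop
       when the alias check failed.  It performs no check itself.  */
    for (; i < n; i++)
        dst[i] = src[i] + 1;
}
```

Calling `add_one(buf + 1, buf, 8)` on an overlapping buffer takes the scalar path, so each store is visible to the next load exactly as in the original loop.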
2021-01-04tree-optimization/98291 - allow SLP more vectorization of reductionsRichard Biener1-2/+8
When the VF is one, an SLP reduction is in-order and thus we can vectorize even when the reduction op is not associative. 2021-01-04 Richard Biener <rguenther@suse.de> PR tree-optimization/98291 * tree-vect-loop.c (vectorizable_reduction): Bypass associativity check for SLP reductions with VF 1. * gcc.dg/vect/slp-reduc-11.c: New testcase. * gcc.dg/vect/vect-reduc-in-order-4.c: Adjust.
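As background for why associativity matters here, the following sketch (plain C, not GCC code; both function names are hypothetical) contrasts an in-order floating-point sum with a two-lane reassociated sum of the same data. Under IEEE double arithmetic without fast-math style options the two can differ, which is why a reduction that is not known to be associative may only be vectorized when the original order is preserved.

```c
#include <stddef.h>

/* In-order reduction: elements are added strictly left to right.  */
double sum_in_order(const double *v, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Reassociated reduction: lane k accumulates the elements whose index
   is congruent to k modulo 2, and the lanes are combined at the end.
   This mimics what a two-lane vector accumulator would do.  */
double sum_two_lanes(const double *v, size_t n)
{
    double lane[2] = { 0.0, 0.0 };
    for (size_t i = 0; i < n; i++)
        lane[i % 2] += v[i];
    return lane[0] + lane[1];
}
```

For the input `{1e300, 1.0, -1e300, 1.0}` the in-order sum absorbs the first `1.0` into `1e300` and loses it, while the two-lane version cancels the large values in one lane and keeps both small values in the other.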
2021-01-04Update copyright years.Jakub Jelinek1-1/+1
2020-12-18aarch64: SVE: ICE in expand_direct_optab_fn [PR98177]Przemyslaw Wirkus1-4/+4
The problem comes from using the wrong interface to get the index type for a COND_REDUCTION. For fixed-length SVE we get a V2SI (a 64-bit Advanced SIMD vector) instead of a VNx2SI (an SVE vector that stores SI elements in DI containers). Credits to Richard Sandiford for pointing out the issue's root cause. The snippet originally proposed in the PR to reproduce the issue only caused an ICE with the C++ compiler (see the pr98177-1 test cases). I've slightly modified the original snippet so that it reproduces the issue with both the C and C++ compilers; these are the pr98177-2 test cases. gcc/ChangeLog: PR target/98177 * tree-vect-loop.c (vect_create_epilog_for_reduction): Use get_same_sized_vectype to obtain index type. (vectorizable_reduction): Likewise. gcc/testsuite/ChangeLog: PR target/98177 * g++.target/aarch64/sve/pr98177-1.C: New test. * g++.target/aarch64/sve/pr98177-2.C: New test. * gcc.target/aarch64/sve/pr98177-1.c: New test. * gcc.target/aarch64/sve/pr98177-2.c: New test.
2020-12-17vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_pKyrylo Tkachov1-36/+49
While experimenting with some backend costs for Advanced SIMD and SVE I hit many cases where GCC would pick SVE for VLA auto-vectorisation even when the backend very clearly presented cheaper costs for Advanced SIMD. For a simple float addition loop the SVE costs were: vec.c:9:21: note: Cost model analysis: Vector inside of loop cost: 28 Vector prologue cost: 2 Vector epilogue cost: 0 Scalar iteration cost: 10 Scalar outside cost: 0 Vector outside cost: 2 prologue iterations: 0 epilogue iterations: 0 Minimum number of vector iterations: 1 Calculated minimum iters for profitability: 4 and for Advanced SIMD (Neon) they're: vec.c:9:21: note: Cost model analysis: Vector inside of loop cost: 11 Vector prologue cost: 0 Vector epilogue cost: 0 Scalar iteration cost: 10 Scalar outside cost: 0 Vector outside cost: 0 prologue iterations: 0 epilogue iterations: 0 Calculated minimum iters for profitability: 0 vec.c:9:21: note: Runtime profitability threshold = 4 yet the SVE one was always picked. With guidance from Richard this seems to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular the part with the big comment explaining the estimated_rel_new * 2 <= estimated_rel_old heuristic. This patch extends the comparisons by introducing a three-way estimate kind for poly_int values that the backend can distinguish. This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper. gcc/ * target.h (enum poly_value_estimate_kind): Define. (estimated_poly_value): Take an estimate kind argument. * target.def (estimated_poly_value): Update definition for the above. * doc/tm.texi: Regenerate. * targhooks.c (estimated_poly_value): Update prototype. * tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely estimates of VF to pick between vinfos. 
* config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use estimated_poly_value instead of aarch64_estimated_poly_value. (aarch64_estimated_poly_value): Take a kind argument and handle it.
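The idea of asking for minimum, likely and maximum estimates of a runtime-variable VF can be sketched as follows. This is an illustration only, not the GCC implementation: `estimate_kind`, `vla_vf`, `estimated_vf` and `fixed_cheaper_per_iter` are hypothetical names, the VF values are assumptions (4 float lanes for a 128-bit vector, up to 64 lanes for 2048 bits), and only the inside-of-loop costs 28 and 11 are taken from the dumps quoted in the commit message. The sketch does not reproduce how vect_better_loop_vinfo_p combines the three answers.

```c
#include <assert.h>

/* Hypothetical stand-in for a three-way estimate kind.  */
enum estimate_kind { ESTIMATE_MIN, ESTIMATE_LIKELY, ESTIMATE_MAX };

/* A runtime-variable vectorization factor of the form base + step * x,
   where x ranges from 0 to max_x and likely_x is a tuning guess.  */
struct vla_vf {
    unsigned base;
    unsigned step;
    unsigned likely_x;
    unsigned max_x;
};

/* Collapse the variable VF to a single number for the given kind.  */
unsigned estimated_vf(struct vla_vf vf, enum estimate_kind kind)
{
    switch (kind) {
    case ESTIMATE_MIN:
        return vf.base;
    case ESTIMATE_MAX:
        return vf.base + vf.step * vf.max_x;
    default:
        return vf.base + vf.step * vf.likely_x;
    }
}

/* Nonzero if the fixed-length loop has a lower inside-of-loop cost per
   scalar iteration than the variable-length loop under the chosen
   estimate.  Compares fixed_cost / fixed_vf < vla_cost / vla_vf by
   cross-multiplication to stay in integers.  */
int fixed_cheaper_per_iter(unsigned fixed_cost, unsigned fixed_vf,
                           unsigned vla_cost, struct vla_vf vf,
                           enum estimate_kind kind)
{
    unsigned v = estimated_vf(vf, kind);
    assert(fixed_vf > 0 && v > 0);
    return fixed_cost * v < vla_cost * fixed_vf;
}
```

With the assumed values, the fixed-length loop wins under the minimum and likely estimates but not under the maximum one, which shows why a single estimate can give a misleading picture.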
2020-12-13middle-end: Support complex AdditionTamar Christina1-2/+6
This patch adds support for complex addition with rotation of 90 and 270, that is, addition with rotation of the second argument around the Argand plane. Supported rotations are 90 and 270: c = a + (b * I) and c = a + (b * I * I * I) gcc/ChangeLog: * tree-vect-slp-patterns.c: New file. * Makefile.in: Add it. * doc/passes.texi: Document it. * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New. * optabs.def (cadd90_optab, cadd270_optab): New. * doc/md.texi: Document them. * tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code. * tree-vect-slp.c (vect_free_slp_instance, vect_create_new_slp_node): Export. (vect_match_slp_patterns_2, vect_match_slp_patterns): New. (vect_analyze_slp): Use it. * tree-vectorizer.h (vect_free_slp_tree): Export. (enum _complex_operation): Forward declare. (class vect_pattern): New. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it. (check_effective_target_vect_complex_add_byte, check_effective_target_vect_complex_add_int, check_effective_target_vect_complex_add_short, check_effective_target_vect_complex_add_long, check_effective_target_vect_complex_add_half, check_effective_target_vect_complex_add_float, check_effective_target_vect_complex_add_double): New. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test. * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test. * gcc.dg/vect/complex/complex-add-pattern-template.c: New test. * gcc.dg/vect/complex/complex-add-template.c: New test. 
* gcc.dg/vect/complex/complex-operations-run.c: New test. * gcc.dg/vect/complex/complex-operations.c: New test. * gcc.dg/vect/complex/complex.exp: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test. * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-double.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test. * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test. * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
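The two formulas in the commit message, c = a + (b * I) and c = a + (b * I * I * I), reduce to simple element-wise operations on the real and imaginary parts. The sketch below shows that reduction in plain C. The `struct cplx` type and all function names are hypothetical and chosen for illustration; they are not the internal functions or optabs the patch adds.

```c
/* Hypothetical complex number type with explicit parts.  */
struct cplx {
    double re;
    double im;
};

/* Multiplying by I rotates by 90 degrees: (re, im) -> (-im, re).  */
static struct cplx mul_i(struct cplx z)
{
    struct cplx r = { -z.im, z.re };
    return r;
}

static struct cplx add(struct cplx a, struct cplx b)
{
    struct cplx r = { a.re + b.re, a.im + b.im };
    return r;
}

/* Reference forms, written exactly as in the commit message.  */
struct cplx ref_rot90(struct cplx a, struct cplx b)
{
    return add(a, mul_i(b));
}

struct cplx ref_rot270(struct cplx a, struct cplx b)
{
    return add(a, mul_i(mul_i(mul_i(b))));
}

/* Fused forms: one subtraction and one addition per complex element,
   which is the shape a single "add with rotation" operation has.  */
struct cplx cadd_rot90(struct cplx a, struct cplx b)
{
    struct cplx r = { a.re - b.im, a.im + b.re };
    return r;
}

struct cplx cadd_rot270(struct cplx a, struct cplx b)
{
    struct cplx r = { a.re + b.im, a.im - b.re };
    return r;
}
```

For a = 1 + 2i and b = 3 + 4i, the 90 degree form gives -3 + 5i and the 270 degree form gives 5 - 1i, and the fused and reference forms agree.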
2020-12-02guard maybe_set_vectorized_backedge_value callsRichard Biener1-11/+13
This makes sure to not call maybe_set_vectorized_backedge_value when we did not vectorize the latch def candidate. 2020-12-02 Richard Biener <rguenther@suse.de> * tree-vect-loop.c (vect_transform_loop_stmt): Return whether we vectorized a stmt. (vect_transform_loop): Only call maybe_set_vectorized_backedge_value when we vectorized the stmt.
2020-11-30tree-optimization/98064 - fix BB SLP live lane extract wrt LC SSARichard Biener1-0/+18
This avoids breaking LC SSA when SLP codegen pulled an out-of-loop def into a loop when merging with in-loop defs for an external def. 2020-11-30 Richard Biener <rguenther@suse.de> PR tree-optimization/98064 * tree-vect-loop.c (vectorizable_live_operation): Avoid breaking LC SSA for BB vectorization. * g++.dg/vect/pr98064.cc: New testcase.
2020-11-19vect: Add a “very cheap” cost modelRichard Sandiford1-0/+27
Currently we have three vector cost models: cheap, dynamic and unlimited. -O2 -ftree-vectorize uses “cheap” by default, but that's still relatively aggressive about peeling and aliasing checks, and can lead to significant code size growth. This patch adds an even more conservative choice, which for lack of imagination I've called “very cheap”. It only allows vectorisation if the vector code entirely replaces the scalar code. It also requires one iteration of the vector loop to pay for itself, regardless of how often the loop iterates. (If the vector loop needs multiple iterations to be beneficial then things are probably too close to call, and the conservative thing would be to stick with the scalar code.) The idea is that this should be suitable for -O2, although the patch doesn't change any defaults itself. I tested this by building and running a bunch of workloads for SVE, with three options: (1) -O2 (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap] All three builds used the default -msve-vector-bits=scalable and ran with the minimum vector length of 128 bits, which should give a worst-case bound for the performance impact. The workloads included a mixture of microbenchmarks and full applications. Because it's quite an eclectic mix, there's not much point giving exact figures. The aim was more to get a general impression. Code size growth with (2) was much lower than with (3). Only a handful of tests increased by more than 5%, and all of them were microbenchmarks. In terms of performance, (2) was significantly faster than (1) on microbenchmarks (as expected) but also on some full apps. Again, performance only regressed on a handful of tests. As expected, the performance of (3) vs. (1) and (3) vs. (2) is more of a mixed bag. There are several significant improvements with (3) over (2), but also some (smaller) regressions. That seems to be in line with -O2 -ftree-vectorize being a kind of -O2.5. 
The patch reorders vect_cost_model so that values are in order of increasing aggressiveness, which makes it possible to use range checks. The value 0 still represents “unlimited”, so “if (flag_vect_cost_model)” is still a meaningful check. gcc/ * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model. * common.opt (fvect-cost-model=): Add very-cheap as a possible option. (fsimd-cost-model=): Likewise. (vect_cost_model): Add very-cheap. * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP. Put the values in order of increasing aggressiveness. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use range checks when comparing against VECT_COST_MODEL_CHEAP. (vect_prune_runtime_alias_test_list): Do not allow any alias checks for the very-cheap cost model. * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow any peeling for the very-cheap cost model. Also require one iteration of the vector loop to pay for itself. gcc/testsuite/ * gcc.dg/vect/vect-cost-model-1.c: New test. * gcc.dg/vect/vect-cost-model-2.c: Likewise. * gcc.dg/vect/vect-cost-model-3.c: Likewise. * gcc.dg/vect/vect-cost-model-4.c: Likewise. * gcc.dg/vect/vect-cost-model-5.c: Likewise. * gcc.dg/vect/vect-cost-model-6.c: Likewise.
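The ordering described above, with values in increasing aggressiveness and 0 still meaning "unlimited", can be sketched like this. The enumerator names and numeric values are assumptions made for illustration and are not copied from flag-types.h. The acceptance test is a deliberately simplified reading of the two rules in the commit message (no peeling or scalar epilogue, and one vector iteration must pay for itself); it ignores prologue and epilogue costs that the real cost model considers.

```c
#include <assert.h>

/* Hypothetical cost model enum: more negative means more conservative,
   and 0 is the unlimited model, so "if (model)" still distinguishes
   limited from unlimited.  */
enum cost_model {
    COST_VERY_CHEAP = -3,
    COST_CHEAP = -2,
    COST_DYNAMIC = -1,
    COST_UNLIMITED = 0
};

/* Range check made possible by the ordering: true for every model
   that is at least as conservative as "cheap".  */
int at_most_cheap(enum cost_model m)
{
    return m <= COST_CHEAP;
}

/* Simplified "very cheap" acceptance rule: the vector code must fully
   replace the scalar code, and one vector iteration must cost less
   than the vf scalar iterations it replaces.  */
int very_cheap_accepts(unsigned vector_iter_cost,
                       unsigned scalar_iter_cost, unsigned vf,
                       int needs_peeling, int needs_scalar_epilogue)
{
    assert(vf > 0);
    if (needs_peeling || needs_scalar_epilogue)
        return 0;
    return vector_iter_cost < scalar_iter_cost * vf;
}
```

A single comparison such as `m <= COST_CHEAP` then covers both the cheap and very-cheap models without listing each enumerator.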
2020-11-18tree-optimization/97886 - deal with strange LC PHI nodesRichard Biener1-0/+11
This makes vectorization properly assign vector types to PHI nodes that copy from externals on loop exit edges. 2020-11-18 Richard Biener <rguenther@suse.de> PR tree-optimization/97886 * tree-vect-loop.c (vectorizable_lc_phi): Properly assign vector types to invariants for SLP.
2020-11-16Delay SLP instance loads gatheringRichard Biener1-0/+3
This delays filling SLP_INSTANCE_LOADS. 2020-11-16 Richard Biener <rguenther@suse.de> * tree-vectorizer.h (vect_gather_slp_loads): Declare. * tree-vect-loop.c (vect_analyze_loop_2): Call vect_gather_slp_loads. * tree-vect-slp.c (vect_build_slp_instance): Do not gather SLP loads here. (vect_gather_slp_loads): Remove wrapper, new function. (vect_slp_analyze_bb_1): Call it.
2020-11-16tree-optimization/97835 - fix step vector construction for SLP inductionRichard Biener1-1/+1
We're stripping conversions off access functions of inductions and thus the step can be of different sign. Fix bogus step CTORs by converting the elements rather than the whole vector. 2020-11-16 Richard Biener <rguenther@suse.de> PR tree-optimization/97835 * tree-vect-loop.c (vectorizable_induction): Convert step scalars rather than step vector. * gcc.dg/vect/pr97835.c: New testcase.
2020-11-10tree-optimization/97760 - reduction paths with unhandled live stmtRichard Biener1-3/+6
This makes sure we reject reduction paths with a live stmt that is not the last one altering the value. This is because we do not handle this in the epilogue unless there's a scalar epilogue loop. 2020-11-09 Richard Biener <rguenther@suse.de> PR tree-optimization/97760 * tree-vect-loop.c (check_reduction_path): Reject reduction paths we do not handle in epilogue generation. * gcc.dg/vect/pr97760.c: New testcase.
2020-11-09tree-optimization/97753 - fix SLP induction vectRichard Biener1-2/+5
This fixes updating of the step vectors when filling up to group_size. 2020-11-09 Richard Biener <rguenther@suse.de> PR tree-optimization/97753 * tree-vect-loop.c (vectorizable_induction): Fill vec_steps when CSEing inside the group. * gcc.dg/vect/pr97753.c: New testcase.
2020-11-06tree-optimization/97732 - fix init of SLP induction vectorizationRichard Biener1-0/+4
This PR exposes two issues - one that the vector builder treats &x as eligible for VECTOR_CST elements and one that SLP induction vectorization forgets to convert init elements to the vector component type which makes a difference for pointer vs. integer. 2020-11-06 Richard Biener <rguenther@suse.de> PR tree-optimization/97732 * tree-vect-loop.c (vectorizable_induction): Convert the init elements to the vector component type. * gimple-fold.c (gimple_build_vector): Use CONSTANT_CLASS_P rather than TREE_CONSTANT to determine if elements are eligible for VECTOR_CSTs. * gcc.dg/vect/bb-slp-pr97732.c: New testcase.
2020-11-05middle-end: Store and use the SLP instance kind when aborting load/store lanesTamar Christina1-0/+1
This patch stores the SLP instance kind in the SLP instance so that we can use it later when detecting load/store lanes support. This also changes the load/store lane support check to only check if the SLP kind is a store. This means that in order for the load/lanes to work all instances must be of kind store. gcc/ChangeLog: * tree-vect-loop.c (vect_analyze_loop_2): Check kind. * tree-vect-slp.c (vect_build_slp_instance): New. (enum slp_instance_kind): Move to... * tree-vectorizer.h (enum slp_instance_kind): .. Here (SLP_INSTANCE_KIND): New.
2020-11-04middle-end: Move load/store-lanes check till late.Tamar Christina1-0/+72
This moves the code that checks for load/store lanes further down the pipeline and places it after slp_optimize. This allows us to perform optimizations on the SLP tree and only bail out if we really have a permute. The change lets us handle permutes such as {1,1,1,1}, which should be handled by a load and replicate. However, it also makes the check all or nothing: either all instances can be handled or none at all. This is why some of the test cases have been adjusted. gcc/ChangeLog: * tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes check to ... * tree-vect-loop.c (vect_analyze_loop_2): ..Here gcc/testsuite/ChangeLog: * gcc.dg/vect/slp-11b.c: Update output scan. * gcc.dg/vect/slp-perm-6.c: Likewise.
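A permute such as {1,1,1,1} selects the same element in every lane, so it needs no real shuffle: one scalar load followed by a broadcast is enough. The sketch below shows that observation in plain C. Both function names are hypothetical and the code is unrelated to how the SLP optimizer represents permutations internally.

```c
#include <stddef.h>

/* Nonzero if every lane of the permutation selects the same element,
   i.e. the permute is a splat.  An empty permutation is not a splat.  */
int is_splat_permute(const unsigned *perm, size_t lanes)
{
    if (lanes == 0)
        return 0;
    for (size_t i = 1; i < lanes; i++)
        if (perm[i] != perm[0])
            return 0;
    return 1;
}

/* "Load and replicate": read the single selected element once and
   copy it into every output lane.  Only valid for splat permutes.  */
void load_and_replicate(int *out, const int *group,
                        const unsigned *perm, size_t lanes)
{
    int x = group[perm[0]];
    for (size_t i = 0; i < lanes; i++)
        out[i] = x;
}
```

A caller would test `is_splat_permute` first and fall back to a general permute for anything else, such as {1,0,3,2}.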