path: root/gcc/tree-vect-loop.c
Age | Commit message | Author | Files | Lines
2021-07-08 | vect: Remove always-true condition | Richard Sandiford | 1 | -26/+24
vectorizable_reduction had code guarded by:

  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
      || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)

But that's always true after:

  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def
      && STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def
      && STMT_VINFO_DEF_TYPE (stmt_info) != vect_nested_cycle)
    return false;

  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_nested_cycle)
    {
      …
      return true;
    }

(I wasn't sure at first how the empty “else” for the first “if” above was supposed to work.)

gcc/
  * tree-vect-loop.c (vectorizable_reduction): Remove always-true if condition.
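A simplified C model of the control flow described above shows why the removed guard could never be false once the early returns have run. The enum and `classify` are hypothetical stand-ins, not GCC's types or functions:

```c
#include <assert.h>

/* Hypothetical stand-in for the def types named in the commit message.  */
enum def_type { reduction_def, double_reduction_def, nested_cycle, other_def };

/* Simplified model of vectorizable_reduction's control flow.
   Returns 0 when the stmt is rejected, 1 for the nested-cycle early exit,
   and 2 when control reaches the code that used to be guarded.  */
int classify (enum def_type t)
{
  if (t != reduction_def && t != double_reduction_def && t != nested_cycle)
    return 0;                       /* "return false" */
  if (t == nested_cycle)
    return 1;                       /* "return true" */
  /* Only reduction_def and double_reduction_def can get here, so the
     removed guard (t == reduction_def || t == double_reduction_def)
     always holds at this point.  */
  assert (t == reduction_def || t == double_reduction_def);
  return 2;
}
```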
2021-06-17 | Vectorization of BB reductions | Richard Biener | 1 | -1/+1
This adds a simple reduction vectorization capability to the non-loop vectorizer. Simple meaning it lacks any of the fancy ways to generate the reduction epilogue and only supports those we can handle via a direct internal function reducing a vector to a scalar. One of the main reasons is to avoid massive refactoring at this point, but also that more complex epilogue operations are hardly profitable.

Mixed-sign reductions are fended off for now, and I have not yet settled on whether we want an explicit SLP node for the reduction epilogue operation. Handling mixed signs could be done by multiplying with a { 1, -1, .. } vector. Also fended off are reductions with non-internal operands (constants or register parameters, for example).

Costing is done by accounting the original participating scalar stmts for the scalar cost, and log2 permutes and operations for the vectorized epilogue.

--

SPEC CPU 2017 FP with rate workload measurements show (picked fastest runs of three) regressions for 507.cactuBSSN_r (1.5%), 508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5%) and 527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and 538.imagick_r (1.5%). This is with -Ofast -march=znver2 on a Zen2.

Statistics on CPU 2017 show that the overwhelming majority of seeds we find are reductions of two lanes (well, that's basically every associative operation). That means we put quite high pressure on the SLP discovery process this way.

In total we find 583218 seeds we put to SLP discovery, of which 66205 pass that and only 6185 of those make it through code generation checks. 796 of those are discarded because the reduction is part of a larger SLP instance. 4195 of the remaining are deemed not profitable to vectorize and 1194 are finally vectorized. That's a poor 0.2% rate.

Of the 583218 seeds, 486826 (83%) have two lanes, 60912 have three (10%), 28181 four (5%), 4808 five, 909 six, and there are instances of up to 120 lanes.

There's a set of 54086 candidate seeds we reject because they contain a constant or invariant (not implemented yet) but still have two or more lanes that could be put to SLP discovery.

2021-06-16  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/54400
  * tree-vectorizer.h (enum slp_instance_kind): Add slp_inst_kind_bb_reduc.
  (reduction_fn_for_scalar_code): Declare.
  * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence): Check SLP_INSTANCE_KIND instead of looking at the representative.
  (vect_slp_analyze_instance_alignment): Likewise.
  * tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
  * tree-vect-slp.c (vect_slp_linearize_chain): Split out chain linearization from vect_build_slp_tree_2 and generalize for the use of BB reduction vectorization.
  (vect_build_slp_tree_2): Adjust accordingly.
  (vect_optimize_slp): Elide permutes at the root of BB reduction instances.
  (vectorizable_bb_reduc_epilogue): New function.
  (vect_slp_prune_covered_roots): Likewise.
  (vect_slp_analyze_operations): Use them.
  (vect_slp_check_for_constructors): Recognize associatable chains for BB reduction vectorization.
  (vectorize_slp_instance_root_stmt): Generate code for the BB reduction epilogue.

  * gcc.dg/vect/bb-slp-pr54400.c: New testcase.
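The vectorized epilogue above is costed as log2 permutes plus operations. A minimal sketch of that shape, assuming a power-of-two lane count and plain `int` lanes; `reduce_log2` and `reduce_mixed` are hypothetical helpers, not GCC functions. The mixed-sign variant applies the { 1, -1, .. } multiplication the message mentions as a possibility, which this patch does not implement:

```c
#include <assert.h>
#include <stddef.h>

/* Reduce N lanes (N a power of two) to one value in log2(N) steps.  Each
   step models one permute (upper half moved down) plus one vector add.
   LANES is overwritten; *STEPS receives the number of steps taken.  */
int reduce_log2 (int *lanes, size_t n, unsigned *steps)
{
  assert (n > 0 && (n & (n - 1)) == 0);
  unsigned s = 0;
  for (size_t half = n / 2; half >= 1; half /= 2)
    {
      for (size_t i = 0; i < half; i++)
        lanes[i] += lanes[i + half];
      s++;
    }
  *steps = s;
  return lanes[0];
}

/* Mixed-sign chain such as a - b + c - d: multiply by a sign vector first,
   then reduce as usual.  */
int reduce_mixed (int *lanes, const int *signs, size_t n, unsigned *steps)
{
  for (size_t i = 0; i < n; i++)
    lanes[i] *= signs[i];
  return reduce_log2 (lanes, n, steps);
}
```

For eight lanes this takes three permute-and-add steps, which matches the log2 cost model described in the message.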
2021-06-09 | tree-optimization/100981 - fix SLP patterns involving reductions | Richard Biener | 1 | -1/+1
The following fixes the SLP FMA patterns to preserve reduction info and the reduction vectorization to consider internal function call defs for the reduction stmt.

2021-06-09  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/100981
gcc/
  * tree-vect-loop.c (vect_create_epilog_for_reduction): Use gimple_get_lhs to also handle calls.
  * tree-vect-slp-patterns.c (complex_pattern::build): Transfer reduction info.
gcc/testsuite/
  * gfortran.dg/vect/pr100981-1.f90: New testcase.
libgomp/
  * testsuite/libgomp.fortran/pr100981-2.f90: New testcase.
2021-06-03 | vect: Use main loop's thresholds and VF to narrow upper_bound of epilogue | Andre Vieira | 1 | -6/+25
This patch uses knowledge of the conditions for entering an epilogue loop to derive a potentially more restrictive upper bound on its iterations.

gcc/ChangeLog:
  * tree-vect-loop.c (vect_transform_loop): Use the main loop's various thresholds to narrow the upper bound on epilogue iterations.

gcc/testsuite/ChangeLog:
  * gcc.target/aarch64/sve/part_vect_single_iter_epilog.c: New test.
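A simplified model of why the main loop's threshold and VF bound the epilogue: the epilogue runs either because the main loop was skipped (niters below the threshold TH) or to handle the niters % VF leftover iterations. This sketch assumes no peeling for alignment or gaps (both would change the bound) and is not GCC's exact computation; the function names are hypothetical:

```c
#include <assert.h>

/* Epilogue iterations for a given trip count under the simplified model.  */
unsigned epilogue_iters (unsigned niters, unsigned th, unsigned vf)
{
  assert (vf > 0);
  if (niters < th)
    return niters;          /* main loop skipped: epilogue does it all */
  return niters % vf;       /* leftover after the main vector loop */
}

/* Upper bound implied by the same model: max (th - 1, vf - 1).  */
unsigned epilogue_upper_bound (unsigned th, unsigned vf)
{
  assert (vf > 0);
  unsigned skipped_main = th > 0 ? th - 1 : 0;
  unsigned remainder = vf - 1;
  return skipped_main > remainder ? skipped_main : remainder;
}
```

With TH = 12 and VF = 4, for example, the epilogue never runs more than 11 iterations, which is tighter than a bound derived from the loop's overall trip count.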
2021-05-20 | vect: Replace hardcoded inner loop cost factor | Kewen Lin | 1 | -1/+2
This patch replaces the current hardcoded weight factor of 50, which the loop vectorizer applies to the cost of statements in an inner loop relative to the loop being vectorized, with a new member inner_loop_cost_factor in the loop vinfo. It also introduces a parameter vect-inner-loop-cost-factor, whose default value is 50 and which is used to initialize the inner_loop_cost_factor member.

The motivation is that a target may want a single function that gathers information in each add_stmt_cost call. Whether that function runs before or after the inner-loop cost adjustment, it may need to scale (expand or shrink) the gathered data by the same factor. With the factor hardcoded, that is hard to maintain.

Bootstrapped/regtested on powerpc64le-linux-gnu P9, x86_64-redhat-linux and aarch64-linux-gnu.

gcc/ChangeLog:
  * doc/invoke.texi (vect-inner-loop-cost-factor): Document new parameter.
  * params.opt (vect-inner-loop-cost-factor): New.
  * targhooks.c (default_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR; include the header file tree-vectorizer.h and the headers it requires.
  * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Replace hardcoded factor 50 with LOOP_VINFO_INNER_LOOP_COST_FACTOR.
  * config/arm/arm.c (arm_add_stmt_cost): Likewise.
  * config/i386/i386.c (ix86_add_stmt_cost): Likewise.
  * config/rs6000/rs6000.c (rs6000_add_stmt_cost): Likewise.
  * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Likewise.
  (_loop_vec_info::_loop_vec_info): Init inner_loop_cost_factor.
  * tree-vectorizer.h (_loop_vec_info): Add inner_loop_cost_factor.
  (LOOP_VINFO_INNER_LOOP_COST_FACTOR): New macro.
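A hypothetical sketch of the weighting described above: statement costs inside an inner loop are multiplied by a per-loop factor instead of a literal 50. The struct and function are illustrative only and do not mirror GCC's actual `add_stmt_cost` signature:

```c
#include <assert.h>

/* Hypothetical per-loop cost state; the real member lives in _loop_vec_info.  */
struct loop_costs { unsigned inner_loop_cost_factor; };

/* Cost of COUNT copies of a stmt costing STMT_COST, weighted when the stmt
   sits in an inner loop of the loop being vectorized.  */
unsigned weighted_stmt_cost (const struct loop_costs *lc, unsigned count,
                             unsigned stmt_cost, int in_inner_loop)
{
  assert (lc->inner_loop_cost_factor > 0);
  unsigned cost = count * stmt_cost;
  if (in_inner_loop)
    cost *= lc->inner_loop_cost_factor;   /* previously always 50 */
  return cost;
}
```

Because the factor now comes from the loop vinfo, a target hook that needs to undo or rescale the weighting can read the same value rather than duplicating the constant. The default can be changed with `--param vect-inner-loop-cost-factor=N`.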
2021-05-11 | vect: Add costing_for_scalar parameter to init_cost hook | Kewen Lin | 1 | -3/+3
The rs6000 port function rs6000_density_test needs to tell whether the current cost model is for the scalar version of a loop or block, or for the vector version. As Richi suggested, this patch introduces a new parameter costing_for_scalar to the init_cost hook to pass this information down explicitly.

gcc/ChangeLog:
  * doc/tm.texi: Regenerated.
  * target.def (init_cost): Add new parameter costing_for_scalar.
  * targhooks.c (default_init_cost): Adjust for new parameter.
  * targhooks.h (default_init_cost): Likewise.
  * tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Likewise.
  (vect_compute_single_scalar_iteration_cost): Likewise.
  (vect_analyze_loop_2): Likewise.
  * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
  (vect_bb_vectorization_profitable_p): Likewise.
  * tree-vectorizer.h (init_cost): Likewise.
  * config/aarch64/aarch64.c (aarch64_init_cost): Likewise.
  * config/i386/i386.c (ix86_init_cost): Likewise.
  * config/rs6000/rs6000.c (rs6000_init_cost): Likewise.
2021-04-16 | vectorizer: Remove dead scalar .COND_* calls from vectorized loops [PR99767] | Jakub Jelinek | 1 | -1/+15
The following testcase ICEs. Disabling DCE means there are dead stmts in the loop (though in theory they could also become dead only shortly before if-conversion through some optimization). ifcvt, which goes through all stmts in the loop, if-converts them into .COND_DIV etc. internal fn calls in the copy of the loop meant only for vectorization. The loop is successfully vectorized, but the particular .COND_* call is not, because it isn't a live statement, so the scalar .COND_* remains in the IL until expansion, where it ICEs because these ifns only support vectors and not scalars. These ifns behave like .MASK_{LOAD,STORE} in this respect.

One possible fix could be to expand scalar versions of them during expansion, basically undoing what if-conv did to create them, i.e. expanding them as lhs = else; if (mask) { lhs = statement; } or similar.

For .MASK_LOAD we already have code to replace them in vect_transform_loop, though (it is not needed for .MASK_STORE, as stores should always be live and thus always vectorized). So this patch instead replaces .COND_* similarly to .MASK_LOAD in that loop, with the small difference that lhs = .MASK_LOAD (...); is replaced by lhs = 0; while lhs = .COND_* (..., else_arg); is replaced by lhs = else_arg. The statement must be dead, otherwise it would be vectorized, so I think it is not a big deal that we don't turn it back into multiple basic blocks etc. (and it might not be possible to do that at that point).

2021-04-16  Jakub Jelinek  <jakub@redhat.com>

  PR target/99767
  * tree-vect-loop.c (vect_transform_loop): Don't remove just dead scalar .MASK_LOAD calls, but also dead .COND_* calls - replace them by their last argument.

  * gcc.target/aarch64/pr99767.c: New test.
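A scalar model of the semantics involved, for illustration only: the real .COND_* ifns operate on vectors, and `cond_div` and `cond_div_expanded` are hypothetical names. Lanes whose mask is clear take the last ("else") argument, and the branchy form is the expansion the message considers and rejects:

```c
#include <assert.h>

/* Model of lhs = .COND_DIV (mask, a, b, else_arg) for one lane.  */
int cond_div (int mask, int a, int b, int else_arg)
{
  return mask ? a / b : else_arg;   /* division only when the mask is set */
}

/* The branchy expansion: lhs = else; if (mask) { lhs = statement; }  */
int cond_div_expanded (int mask, int a, int b, int else_arg)
{
  int lhs = else_arg;
  if (mask)
    lhs = a / b;
  return lhs;
}
```

Since the statement being replaced is dead, nothing reads its result, so rewriting it as `lhs = else_arg` (as the patch does) is safe and avoids recreating control flow.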
2021-04-07 | tree-optimization/99947 - avoid v.safe_push (v[0]) | Richard Biener | 1 | -1/+2
This avoids (again) the C++ pitfall of pushing a reference to something that is being reallocated.

2021-04-07  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/99947
  * tree-vect-loop.c (vectorizable_induction): Pre-allocate steps vector to avoid pushing elements from the reallocated vector.

  * gcc.dg/torture/pr99947.c: New testcase.
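The pitfall can be reproduced with any growable array whose push takes its element by reference: if the element points into the array's own storage, growing the array may free that storage before the element is read. A minimal C sketch with a hypothetical `ivec` type (not GCC's `vec<>`). `ivec_reserve` mirrors the commit's fix of pre-allocating; `ivec_push` additionally copies the value before growing, an alternative safeguard rather than what the patch does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable int vector.  */
struct ivec { int *data; size_t len, cap; };

/* Push *ELT.  A naive version that reads *ELT after realloc would read
   freed memory when ELT points into V->data; copying first avoids that.  */
int ivec_push (struct ivec *v, const int *elt)
{
  int val = *elt;                      /* ELT may alias v->data */
  if (v->len == v->cap)
    {
      size_t ncap = v->cap ? 2 * v->cap : 4;
      int *p = realloc (v->data, ncap * sizeof *p);
      if (!p)
        return -1;
      v->data = p;
      v->cap = ncap;
    }
  v->data[v->len++] = val;
  return 0;
}

/* Ensure room for N more elements so later pushes never reallocate.  */
int ivec_reserve (struct ivec *v, size_t n)
{
  if (v->cap - v->len >= n)
    return 0;
  size_t ncap = v->len + n;
  int *p = realloc (v->data, ncap * sizeof *p);
  if (!p)
    return -1;
  v->data = p;
  v->cap = ncap;
  return 0;
}
```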
2021-04-06 | tree-optimization/99880 - avoid vectorizing irrelevant PHI backedge defs | Richard Biener | 1 | -0/+1
This adds a relevancy check before trying to set the vector def of a backedge in an unvectorized PHI.

2021-04-06  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/99880
  * tree-vect-loop.c (maybe_set_vectorized_backedge_value): Only set vectorized defs of relevant PHIs.

  * gcc.dg/torture/pr99880.c: New testcase.
2021-03-25 | vect: Init inside_cost in vect_model_reduction_cost | Kewen Lin | 1 | -1/+1
This patch initializes inside_cost to zero, which avoids using an uninitialized value on paths that do not assign it.

gcc/ChangeLog:
  * tree-vect-loop.c (vect_model_reduction_cost): Init inside_cost.
2021-02-25 | tree-optimization/99253 - fix reduction path check | Richard Biener | 1 | -28/+28
This fixes an ordering problem with verifying that no intermediate computations in a reduction path are used outside of the chain. The check was disabled for value-preserving conversions at the tail, but whether a stmt was a conversion or not was only computed after the first use. The following fixes this by re-ordering things accordingly.

2021-02-25  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/99253
  * tree-vect-loop.c (check_reduction_path): First compute code, then verify out-of-loop uses.

  * gcc.dg/vect/pr99253.c: New testcase.
2021-02-10 | tree-optimization/99024 - fix leak in loop vect analysis | Richard Biener | 1 | -1/+5
When we analyze a loop as an epilogue but later, during peeling, decide not to use it, the DTOR clears the original loop's ->aux, which leaks the main loop vinfo. Fixed by only clearing aux if it is associated with the vinfo we're destroying.

2021-02-10  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/99024
  * tree-vect-loop.c (_loop_vec_info::~_loop_vec_info): Only clear loop->aux if it is associated with the destroyed loop_vinfo.
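A hypothetical sketch of the ownership check described above. The structs are simplified stand-ins for GCC's `loop` and `_loop_vec_info`; the point is that the destructor only clears `aux` when it still points at the object being destroyed:

```c
#include <assert.h>
#include <stdlib.h>

struct loop { void *aux; };                 /* simplified stand-in */
struct loop_vinfo { struct loop *loop; };   /* simplified stand-in */

/* Destroy VINFO, clearing loop->aux only if VINFO owns it.  An unused
   epilogue vinfo shares the loop whose aux points at the main vinfo;
   clearing it unconditionally would lose (leak) the main vinfo.  */
void destroy_loop_vinfo (struct loop_vinfo *vinfo)
{
  if (vinfo->loop->aux == vinfo)
    vinfo->loop->aux = NULL;
  free (vinfo);
}
```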
2021-02-04 | tree-optimization/98855 - fix some vectorizer cost issues | Richard Biener | 1 | -2/+6
This fixes the missing costing of vectorized bswap for SLP and avoids biasing towards the vectorized side when costing single-argument PHIs. Instead we assume coalescing here and cost them with zero cost for both the scalar and vectorized code.

This doesn't fix the PR on its own.

2021-02-04  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/98855
  * tree-vect-loop.c (vectorizable_phi): Do not cost single-argument PHIs.
  * tree-vect-slp.c (vect_bb_slp_scalar_cost): Likewise.
  * tree-vect-stmts.c (vectorizable_bswap): Also perform costing for SLP operation.
2021-02-03 | slp: Split out patterns away from using SLP_ONLY into their own flag | Tamar Christina | 1 | -1/+1
Previously the SLP pattern matcher was using STMT_VINFO_SLP_VECT_ONLY as a way to dissolve the SLP-only patterns during SLP cancellation. However, the semantics of STMT_VINFO_SLP_VECT_ONLY seem to be slightly different from what I expected: the non-SLP path can still use a statement marked STMT_VINFO_SLP_VECT_ONLY. One such example is masked loads, which are used in both the SLP and non-SLP paths.

To fix this I now introduce a new flag, STMT_VINFO_SLP_VECT_ONLY_PATTERN, which is used only by the pattern matcher.

gcc/ChangeLog:
  PR tree-optimization/98928
  * tree-vect-loop.c (vect_analyze_loop_2): Change STMT_VINFO_SLP_VECT_ONLY to STMT_VINFO_SLP_VECT_ONLY_PATTERN.
  * tree-vect-slp-patterns.c (complex_pattern::build): Likewise.
  * tree-vectorizer.h (STMT_VINFO_SLP_VECT_ONLY_PATTERN): New.
  (class _stmt_vec_info): Add slp_vect_pattern_only_p.

gcc/testsuite/ChangeLog:
  PR tree-optimization/98928
  * gcc.target/i386/pr98928.c: New test.
2021-01-11 | tree-optimization/98526 - fix vectorizer reduction cost | Richard Biener | 1 | -6/+11
This fixes a double-counting in the reduction cost when vectorizing the reduction through the regular vectorizable_* functions.

2021-01-11  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/98526
  * tree-vect-loop.c (vect_model_reduction_cost): Remove costing of the actual reduction op for the regular case.
  (vectorizable_reduction): Cost the stmts vect_transform_reduction produces here.
2021-01-05 | tree-optimization/98381 - fix live bool vector extract | Richard Biener | 1 | -3/+2
This fixes extraction of live bool vector results for the case of integer mode vectors.

2021-01-05  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/98381
  * tree.c (vector_element_bits): Properly compute bool vector element size.
  * tree-vect-loop.c (vectorizable_live_operation): Properly compute the last lane bit offset.
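To illustrate why the element size matters, here is a sketch assuming a bool vector packed into an integer mode with lane i at bit i * elem_bits (one bit per lane for mask registers). The helpers are hypothetical and not GCC code; a wrong element size, e.g. 8 instead of 1, shifts the extraction to the wrong bit:

```c
#include <assert.h>

/* Bit offset of the last lane of an NUNITS-lane vector.  */
unsigned last_lane_bit_offset (unsigned nunits, unsigned elem_bits)
{
  assert (nunits > 0);
  return (nunits - 1) * elem_bits;
}

/* Extract the last lane of a packed bool vector held in MASK.  */
int extract_last_bool (unsigned long long mask, unsigned nunits,
                       unsigned elem_bits)
{
  unsigned off = last_lane_bit_offset (nunits, elem_bits);
  assert (off < 64);
  return (int) ((mask >> off) & 1u);
}
```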
2021-01-05 | vect: Fix missing alias checks for 128-bit SVE [PR98371] | Richard Sandiford | 1 | -3/+58
On AArch64, the vectoriser tries various ways of vectorising with both SVE and Advanced SIMD and picks the best one. All other things being equal, it prefers earlier attempts over later attempts.

The way this works currently is that, once it has a successful vectorisation attempt A, it analyses all other attempts as epilogue loops of A:

  /* When pick_lowest_cost_p is true, we should in principle iterate
     over all the loop_vec_infos that LOOP_VINFO could replace and
     try to vectorize LOOP_VINFO under the same conditions.
     E.g. when trying to replace an epilogue loop, we should vectorize
     LOOP_VINFO as an epilogue loop with the same VF limit.  When trying
     to replace the main loop, we should vectorize LOOP_VINFO as a main
     loop too.

     However, autovectorize_vector_modes is usually sorted as follows:

     - Modes that naturally produce lower VFs usually follow modes that
       naturally produce higher VFs.

     - When modes naturally produce the same VF, maskable modes
       usually follow unmaskable ones, so that the maskable mode
       can be used to vectorize the epilogue of the unmaskable mode.

     This order is preferred because it leads to the maximum
     epilogue vectorization opportunities.  Targets should only use
     a different order if they want to make wide modes available while
     disparaging them relative to earlier, smaller modes.  The assumption
     in that case is that the wider modes are more expensive in some
     way that isn't reflected directly in the costs.

     There should therefore be few interesting cases in which
     LOOP_VINFO fails when treated as an epilogue loop, succeeds when
     treated as a standalone loop, and ends up being genuinely cheaper
     than FIRST_LOOP_VINFO.  */

However, the vectoriser can normally elide alias checks for epilogue loops, on the basis that the main loop should do them instead. Converting an epilogue loop to a main loop can therefore cause the alias checks to be skipped. (It probably also unfairly penalises the original loop in the cost comparison, given that one loop will have alias checks and the other won't.)

As the comment says, we should in principle analyse each vector mode twice: once as a main loop and once as an epilogue. However, doing that up-front would be quite expensive. This patch instead goes for a compromise: if an epilogue loop for mode M2 seems better than a main loop for mode M1, re-analyse with M2 as the main loop.

The patch fixes dg.torture.exp=pr69719.c when testing with -msve-vector-bits=128.

gcc/
  PR tree-optimization/98371
  * tree-vect-loop.c (vect_reanalyze_as_main_loop): New function.
  (vect_analyze_loop): If an epilogue loop appears to be cheaper than the main loop, re-analyze it as a main loop before adopting it as a main loop.
2021-01-04 | tree-optimization/98291 - allow SLP more vectorization of reductions | Richard Biener | 1 | -2/+8
When the VF is one, an SLP reduction is in-order, and thus we can vectorize it even when the reduction op is not associative.

2021-01-04  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/98291
  * tree-vect-loop.c (vectorizable_reduction): Bypass associativity check for SLP reductions with VF 1.

  * gcc.dg/vect/slp-reduc-11.c: New testcase.
  * gcc.dg/vect/vect-reduc-in-order-4.c: Adjust.
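Why associativity matters for reductions: with floating point, changing the summation order can change the result. A small demonstration, assuming IEEE double arithmetic without excess precision (e.g. SSE2 on x86-64) and no `-ffast-math`; the function names are illustrative. `sum_pairwise` mimics a two-lane vectorized reduction, while `sum_in_order` keeps the source order that a VF 1 SLP reduction preserves:

```c
#include <assert.h>

/* Left-to-right sum, the order the scalar source specifies.  */
double sum_in_order (const double *x, int n)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += x[i];
  return s;
}

/* Two interleaved partial sums combined at the end, as a two-lane
   vectorized reduction would compute it.  N must be even.  */
double sum_pairwise (const double *x, int n)
{
  assert (n % 2 == 0);
  double even = 0.0, odd = 0.0;
  for (int i = 0; i < n; i += 2)
    {
      even += x[i];
      odd += x[i + 1];
    }
  return even + odd;
}
```

For { 1e16, 1.0, -1e16, 1.0 } the in-order sum is 1.0 (the first 1.0 is lost to rounding next to 1e16) while the pairwise sum is 2.0.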
2021-01-04 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
2020-12-18 | aarch64: SVE: ICE in expand_direct_optab_fn [PR98177] | Przemyslaw Wirkus | 1 | -4/+4
The problem comes from using the wrong interface to get the index type for a COND_REDUCTION. For fixed-length SVE we get a V2SI (a 64-bit Advanced SIMD vector) instead of a VNx2SI (an SVE vector that stores SI elements in DI containers).

Credits to Richard Sandiford for pointing out the issue's root cause.

The snippet originally proposed in the PR to reproduce the issue only caused an ICE for the C++ compiler (see the pr98177-1 test cases). I've slightly modified the original snippet to reproduce the issue with both the C and C++ compilers; these are the pr98177-2 test cases.

gcc/ChangeLog:
  PR target/98177
  * tree-vect-loop.c (vect_create_epilog_for_reduction): Use get_same_sized_vectype to obtain index type.
  (vectorizable_reduction): Likewise.

gcc/testsuite/ChangeLog:
  PR target/98177
  * g++.target/aarch64/sve/pr98177-1.C: New test.
  * g++.target/aarch64/sve/pr98177-2.C: New test.
  * gcc.target/aarch64/sve/pr98177-1.c: New test.
  * gcc.target/aarch64/sve/pr98177-2.c: New test.
2020-12-17 | vect, aarch64: Extend SVE vs Advanced SIMD costing decisions in vect_better_loop_vinfo_p | Kyrylo Tkachov | 1 | -36/+49
While experimenting with some backend costs for Advanced SIMD and SVE I hit many cases where GCC would pick SVE for VLA auto-vectorisation even when the backend very clearly presented cheaper costs for Advanced SIMD. For a simple float addition loop the SVE costs were:

  vec.c:9:21: note:  Cost model analysis:
    Vector inside of loop cost: 28
    Vector prologue cost: 2
    Vector epilogue cost: 0
    Scalar iteration cost: 10
    Scalar outside cost: 0
    Vector outside cost: 2
    prologue iterations: 0
    epilogue iterations: 0
    Minimum number of vector iterations: 1
    Calculated minimum iters for profitability: 4

and for Advanced SIMD (Neon) they're:

  vec.c:9:21: note:  Cost model analysis:
    Vector inside of loop cost: 11
    Vector prologue cost: 0
    Vector epilogue cost: 0
    Scalar iteration cost: 10
    Scalar outside cost: 0
    Vector outside cost: 0
    prologue iterations: 0
    epilogue iterations: 0
    Calculated minimum iters for profitability: 0
  vec.c:9:21: note:    Runtime profitability threshold = 4

yet the SVE one was always picked. With guidance from Richard, this seems to be due to the vinfo comparisons in vect_better_loop_vinfo_p, in particular the part with the big comment explaining the estimated_rel_new * 2 <= estimated_rel_old heuristic.

This patch extends the comparisons by introducing a three-way estimate kind for poly_int values that the backend can distinguish. This allows vect_better_loop_vinfo_p to ask for minimum, maximum and likely estimates and pick Advanced SIMD over SVE when it is clearly cheaper.

gcc/
  * target.h (enum poly_value_estimate_kind): Define.
  (estimated_poly_value): Take an estimate kind argument.
  * target.def (estimated_poly_value): Update definition for the above.
  * doc/tm.texi: Regenerate.
  * targhooks.c (estimated_poly_value): Update prototype.
  * tree-vect-loop.c (vect_better_loop_vinfo_p): Use min, max and likely estimates of VF to pick between vinfos.
  * config/aarch64/aarch64.c (aarch64_cmp_autovec_modes): Use estimated_poly_value instead of aarch64_estimated_poly_value.
  (aarch64_estimated_poly_value): Take a kind argument and handle it.
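A hypothetical sketch of why a single VF estimate is not enough when comparing a variable-length SVE loop against a fixed-length Advanced SIMD loop. It assumes four float lanes per 128 bits, models the SVE VF as 4 + 4x for x extra 128-bit chunks (up to 2048 bits, so x ≤ 15), and takes the inside-of-loop costs 11 and 28 from the dumps above. The struct, enum and functions are illustrative, not GCC's `poly_int` or target hook API:

```c
#include <assert.h>

/* Hypothetical poly_int-style VF: c0 + c1 * x.  */
struct poly_vf { unsigned c0, c1; };

/* Mirrors the idea of a three-way estimate kind; not GCC's enum.  */
enum estimate_kind { EST_MIN, EST_MAX, EST_LIKELY };

/* x ranges over [0, max_x]; likely_x is a target-supplied guess.  */
unsigned estimate_vf (struct poly_vf vf, enum estimate_kind kind,
                      unsigned max_x, unsigned likely_x)
{
  unsigned x = kind == EST_MIN ? 0 : kind == EST_MAX ? max_x : likely_x;
  assert (x <= max_x);
  return vf.c0 + vf.c1 * x;
}

/* Nonzero if loop A costs less per scalar iteration than loop B, i.e.
   cost_a / vf_a < cost_b / vf_b, cross-multiplied to avoid division.  */
int cheaper_p (unsigned cost_a, unsigned vf_a, unsigned cost_b, unsigned vf_b)
{
  return (unsigned long long) cost_a * vf_b
         < (unsigned long long) cost_b * vf_a;
}
```

Under these assumptions Advanced SIMD wins at the minimum vector length (11 × 4 < 28 × 4), while SVE wins at 2048 bits (28 × 4 < 11 × 64). The comparison therefore depends on which estimate is used, which is the situation the min/max/likely estimates are meant to handle.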
2020-12-13 | middle-end: Support complex Addition | Tamar Christina | 1 | -2/+6
This patch adds support for:

  * Complex Addition with rotation of 90 and 270.

Addition with rotation of the second argument around the Argand plane. Supported rotations are 90 and 270.

  c = a + (b * I) and c = a + (b * I * I * I)

gcc/ChangeLog:
  * tree-vect-slp-patterns.c: New file.
  * Makefile.in: Add it.
  * doc/passes.texi: Document it.
  * internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
  * optabs.def (cadd90_optab, cadd270_optab): New.
  * doc/md.texi: Document them.
  * tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
  * tree-vect-slp.c (vect_free_slp_instance, vect_create_new_slp_node): Export.
  (vect_match_slp_patterns_2, vect_match_slp_patterns): New.
  (vect_analyze_slp): Use it.
  * tree-vectorizer.h (vect_free_slp_tree): Export.
  (enum _complex_operation): Forward declare.
  (class vect_pattern): New.

gcc/testsuite/ChangeLog:
  * lib/target-supports.exp (check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
  (check_effective_target_vect_complex_add_byte, check_effective_target_vect_complex_add_int, check_effective_target_vect_complex_add_short, check_effective_target_vect_complex_add_long, check_effective_target_vect_complex_add_half, check_effective_target_vect_complex_add_float, check_effective_target_vect_complex_add_double): New.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
  * gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
  * gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
  * gcc.dg/vect/complex/complex-add-template.c: New test.
  * gcc.dg/vect/complex/complex-operations-run.c: New test.
  * gcc.dg/vect/complex/complex-operations.c: New test.
  * gcc.dg/vect/complex/complex.exp: New test.
  * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
  * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
  * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
  * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
  * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
  * gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
  * gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
  * gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
  * gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
  * gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
  * gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
  * gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
  * gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.
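A scalar reference for the two rotations, written over interleaved { re, im } float arrays, which is one common layout for such loops. The functions are illustrative, not GCC's implementation. Since b * I = -b.im + b.re * I and b * I * I * I = b.im - b.re * I, the rotations only swap the parts of b and flip one sign:

```c
#include <assert.h>

/* c = a + b * I for N complex values stored as re, im pairs.  */
void complex_add_rot90 (const float *a, const float *b, float *c, int n)
{
  assert (n >= 0);
  for (int i = 0; i < 2 * n; i += 2)
    {
      c[i] = a[i] - b[i + 1];       /* re = a.re - b.im */
      c[i + 1] = a[i + 1] + b[i];   /* im = a.im + b.re */
    }
}

/* c = a + b * I * I * I, i.e. a - b * I.  */
void complex_add_rot270 (const float *a, const float *b, float *c, int n)
{
  assert (n >= 0);
  for (int i = 0; i < 2 * n; i += 2)
    {
      c[i] = a[i] + b[i + 1];       /* re = a.re + b.im */
      c[i + 1] = a[i + 1] - b[i];   /* im = a.im - b.re */
    }
}
```

This swap-and-negate shape is what the COMPLEX_ADD_ROT90 and COMPLEX_ADD_ROT270 internal functions let a target implement directly.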
2020-12-02 | guard maybe_set_vectorized_backedge_value calls | Richard Biener | 1 | -11/+13
This makes sure we do not call maybe_set_vectorized_backedge_value when we did not vectorize the latch def candidate.

2020-12-02  Richard Biener  <rguenther@suse.de>

  * tree-vect-loop.c (vect_transform_loop_stmt): Return whether we vectorized a stmt.
  (vect_transform_loop): Only call maybe_set_vectorized_backedge_value when we vectorized the stmt.
2020-11-30 | tree-optimization/98064 - fix BB SLP live lane extract wrt LC SSA | Richard Biener | 1 | -0/+18
This avoids breaking LC SSA when SLP codegen pulled an out-of-loop def into a loop when merging with in-loop defs for an external def.

2020-11-30  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/98064
  * tree-vect-loop.c (vectorizable_live_operation): Avoid breaking LC SSA for BB vectorization.

  * g++.dg/vect/pr98064.cc: New testcase.
2020-11-19 | vect: Add a “very cheap” cost model | Richard Sandiford | 1 | -0/+27
Currently we have three vector cost models: cheap, dynamic and unlimited. -O2 -ftree-vectorize uses “cheap” by default, but that's still relatively aggressive about peeling and aliasing checks, and can lead to significant code size growth.

This patch adds an even more conservative choice, which for lack of imagination I've called “very cheap”. It only allows vectorisation if the vector code entirely replaces the scalar code. It also requires one iteration of the vector loop to pay for itself, regardless of how often the loop iterates. (If the vector loop needs multiple iterations to be beneficial then things are probably too close to call, and the conservative thing would be to stick with the scalar code.)

The idea is that this should be suitable for -O2, although the patch doesn't change any defaults itself.

I tested this by building and running a bunch of workloads for SVE, with three options:

  (1) -O2
  (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
  (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]

All three builds used the default -msve-vector-bits=scalable and ran with the minimum vector length of 128 bits, which should give a worst-case bound for the performance impact.

The workloads included a mixture of microbenchmarks and full applications. Because it's quite an eclectic mix, there's not much point giving exact figures. The aim was more to get a general impression.

Code size growth with (2) was much lower than with (3). Only a handful of tests increased by more than 5%, and all of them were microbenchmarks.

In terms of performance, (2) was significantly faster than (1) on microbenchmarks (as expected) but also on some full apps. Again, performance only regressed on a handful of tests.

As expected, the performance of (3) vs. (1) and (3) vs. (2) is more of a mixed bag. There are several significant improvements with (3) over (2), but also some (smaller) regressions. That seems to be in line with -O2 -ftree-vectorize being a kind of -O2.5.

The patch reorders vect_cost_model so that values are in order of increasing aggressiveness, which makes it possible to use range checks. The value 0 still represents “unlimited”, so “if (flag_vect_cost_model)” is still a meaningful check.

gcc/
  * doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
  * common.opt (fvect-cost-model=): Add very-cheap as a possible option.
  (fsimd-cost-model=): Likewise.
  (vect_cost_model): Add very-cheap.
  * flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP. Put the values in order of increasing aggressiveness.
  * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use range checks when comparing against VECT_COST_MODEL_CHEAP.
  (vect_prune_runtime_alias_test_list): Do not allow any alias checks for the very-cheap cost model.
  * tree-vect-loop.c (vect_analyze_loop_costing): Do not allow any peeling for the very-cheap cost model. Also require one iteration of the vector loop to pay for itself.

gcc/testsuite/
  * gcc.dg/vect/vect-cost-model-1.c: New test.
  * gcc.dg/vect/vect-cost-model-2.c: Likewise.
  * gcc.dg/vect/vect-cost-model-3.c: Likewise.
  * gcc.dg/vect/vect-cost-model-4.c: Likewise.
  * gcc.dg/vect/vect-cost-model-5.c: Likewise.
  * gcc.dg/vect/vect-cost-model-6.c: Likewise.
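A sketch of the two ideas in the message: an enum ordered by increasing aggressiveness (with 0 kept as “unlimited”), so a single range check can mean “cheap or more conservative”, and the very-cheap acceptance rule. The enum values, names and the strict comparison are assumptions for illustration, not copied from flag-types.h or tree-vect-loop.c:

```c
#include <assert.h>

/* Hypothetical ordering: more negative = more conservative; 0 = unlimited,
   so "if (model)" still distinguishes unlimited from the rest.  */
enum cost_model {
  COST_VERY_CHEAP = -3,
  COST_CHEAP = -2,
  COST_DYNAMIC = -1,
  COST_UNLIMITED = 0
};

/* Range check: the cheap model or anything more conservative.  */
int at_most_cheap (enum cost_model m)
{
  return m <= COST_CHEAP;
}

/* Very-cheap rule as described: no peeling, no alias checks, and a single
   vector iteration must cost less than the VF scalar iterations it replaces
   (the strict "<" tie-break is an assumption).  */
int very_cheap_ok (int needs_peeling, int needs_alias_checks,
                   unsigned vec_iter_cost, unsigned scalar_iter_cost,
                   unsigned vf)
{
  assert (vf > 0);
  if (needs_peeling || needs_alias_checks)
    return 0;
  return vec_iter_cost < scalar_iter_cost * vf;
}
```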
2020-11-18 | tree-optimization/97886 - deal with strange LC PHI nodes | Richard Biener | 1 | -0/+11
This makes vectorization properly assign vector types to PHI nodes that copy from externals on loop exit edges.

2020-11-18  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/97886
  * tree-vect-loop.c (vectorizable_lc_phi): Properly assign vector types to invariants for SLP.
2020-11-16 | Delay SLP instance loads gathering | Richard Biener | 1 | -0/+3
This delays filling SLP_INSTANCE_LOADS.

2020-11-16  Richard Biener  <rguenther@suse.de>

  * tree-vectorizer.h (vect_gather_slp_loads): Declare.
  * tree-vect-loop.c (vect_analyze_loop_2): Call vect_gather_slp_loads.
  * tree-vect-slp.c (vect_build_slp_instance): Do not gather SLP loads here.
  (vect_gather_slp_loads): Remove wrapper, new function.
  (vect_slp_analyze_bb_1): Call it.
2020-11-16 | tree-optimization/97835 - fix step vector construction for SLP induction | Richard Biener | 1 | -1/+1
We're stripping conversions off access functions of inductions, and thus the step can be of different sign. Fix bogus step CTORs by converting the elements rather than the whole vector.

2020-11-16  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/97835
  * tree-vect-loop.c (vectorizable_induction): Convert step scalars rather than step vector.

  * gcc.dg/vect/pr97835.c: New testcase.
2020-11-10 | tree-optimization/97760 - reduction paths with unhandled live stmt | Richard Biener | 1 | -3/+6
This makes sure we reject reduction paths with a live stmt that is not the last one altering the value. This is because we do not handle this in the epilogue unless there's a scalar epilogue loop.

2020-11-09  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/97760
  * tree-vect-loop.c (check_reduction_path): Reject reduction paths we do not handle in epilogue generation.

  * gcc.dg/vect/pr97760.c: New testcase.
2020-11-09 | tree-optimization/97753 - fix SLP induction vect | Richard Biener | 1 | -2/+5
This fixes updating of the step vectors when filling up to group_size.

2020-11-09  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/97753
  * tree-vect-loop.c (vectorizable_induction): Fill vec_steps when CSEing inside the group.

  * gcc.dg/vect/pr97753.c: New testcase.
2020-11-06 | tree-optimization/97732 - fix init of SLP induction vectorization | Richard Biener | 1 | -0/+4
This PR exposes two issues: the vector builder treats &x as eligible for VECTOR_CST elements, and SLP induction vectorization forgets to convert init elements to the vector component type, which makes a difference for pointer vs. integer.

2020-11-06  Richard Biener  <rguenther@suse.de>

  PR tree-optimization/97732
  * tree-vect-loop.c (vectorizable_induction): Convert the init elements to the vector component type.
  * gimple-fold.c (gimple_build_vector): Use CONSTANT_CLASS_P rather than TREE_CONSTANT to determine if elements are eligible for VECTOR_CSTs.

  * gcc.dg/vect/bb-slp-pr97732.c: New testcase.
2020-11-05middle-end: Store and use the SLP instance kind when aborting load/store lanesTamar Christina1-0/+1
This patch stores the SLP instance kind in the SLP instance so that we can use it later when detecting load/store lanes support.

This also changes the load/store lanes support check to only check whether the SLP kind is a store. This means that for load/store lanes to work, all instances must be of kind store.

gcc/ChangeLog:

	* tree-vect-loop.c (vect_analyze_loop_2): Check kind.
	* tree-vect-slp.c (vect_build_slp_instance): New.
	(enum slp_instance_kind): Move to...
	* tree-vectorizer.h (enum slp_instance_kind): ...here.
	(SLP_INSTANCE_KIND): New.
2020-11-04 | middle-end: Move load/store-lanes check till late. | Tamar Christina | 1 | -0/+72
This moves the code that checks for load/store lanes later in the pipeline, placing it after slp_optimize. This allows us to perform optimizations on the SLP tree and only bail out if we really have a permute.

With this change we can handle permutes such as {1,1,1,1}, which should be handled by a load and replicate.

However, this change makes it all or nothing: either all instances can be handled or none at all. This is why some of the test cases have been adjusted.

gcc/ChangeLog:

	* tree-vect-slp.c (vect_analyze_slp_instance): Moved load/store lanes check to ...
	* tree-vect-loop.c (vect_analyze_loop_2): ... here.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-11b.c: Update output scan.
	* gcc.dg/vect/slp-perm-6.c: Likewise.
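As a rough illustration of what a {1,1,1,1} load permutation means, the sketch below (hypothetical helper names, plain C arrays standing in for vector lanes) applies a lane permutation to loaded elements and detects the case where every lane selects the same element. Such a permute needs only one scalar load followed by a broadcast, which is the "load and replicate" mentioned above:

```c
#include <stddef.h>

/* Hypothetical helper: lane K of DST takes SRC[PERM[K]].  */
void
permute_load (const int *src, const unsigned *perm, int *dst, size_t nlanes)
{
  for (size_t k = 0; k < nlanes; ++k)
    dst[k] = src[perm[k]];
}

/* Hypothetical helper: nonzero if all lanes select the same element,
   i.e. the permute can be done as one scalar load plus a broadcast.  */
int
is_splat_perm (const unsigned *perm, size_t nlanes)
{
  for (size_t k = 1; k < nlanes; ++k)
    if (perm[k] != perm[0])
      return 0;
  return nlanes > 0;
}
```

A general permute such as {3,2,1,0} still needs a real shuffle, which is why the check only bails out once the SLP tree has been optimized and a genuine permute remains.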
2020-11-04 | add costing to SLP vectorized PHIs | Richard Biener | 1 | -1/+3
I forgot to cost vectorized PHIs. Scalar PHIs are costed as scalar_stmt, so the following costs vector PHIs as vector_stmt.

2020-11-04  Richard Biener  <rguenther@suse.de>

	* tree-vectorizer.h (vectorizable_phi): Adjust prototype.
	* tree-vect-stmts.c (vect_transform_stmt): Adjust.
	(vect_analyze_stmt): Pass cost_vec to vectorizable_phi.
	* tree-vect-loop.c (vectorizable_phi): Do costing.
2020-11-04 | tree-optimization/97709 - set abnormal flag when vectorizing live lanes | Richard Biener | 1 | -0/+3
This properly sets the abnormal flag when vectorizing live lanes when the original scalar was live across an abnormal edge.

2020-11-04  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97709
	* tree-vect-loop.c (vectorizable_live_operation): Set SSA_NAME_OCCURS_IN_ABNORMAL_PHI when necessary.
	* gcc.dg/vect/bb-slp-pr97709.c: New testcase.
2020-11-04 | Re-instantiate SLP induction IV CSE | Richard Biener | 1 | -2/+19
This re-instantiates the previously removed CSE, fixing the FAIL of gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c. It turns out the previous approach still works.

2020-11-04  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vectorizable_induction): Re-instantiate previously removed CSE of SLP IVs.
2020-11-03 | tree-optimization/80928 - SLP vectorize nested loop induction | Richard Biener | 1 | -65/+51
This adds SLP vectorization of nested inductions.

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/80928
	* tree-vect-loop.c (vectorizable_induction): SLP vectorize nested inductions.
	* gcc.dg/vect/vect-outer-slp-2.c: New testcase.
	* gcc.dg/vect/vect-outer-slp-3.c: Likewise.
2020-11-03 | tree-optimization/97678 - fix SLP induction epilogue vectorization | Richard Biener | 1 | -5/+44
This restores not tracking SLP nodes for the initial values of inductions in a non-nested context, because tracking them interferes with peeling and epilogue vectorization.

2020-11-03  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97678
	* tree-vect-slp.c (vect_build_slp_tree_2): Do not track the initial values of inductions when not nested.
	* tree-vect-loop.c (vectorizable_induction): Look at PHI node initial values again for SLP and not nested inductions. Handle LOOP_VINFO_MASK_SKIP_NITERS and cost invariants.
	* gcc.dg/vect/pr97678.c: New testcase.
2020-11-02 | Rewrite SLP induction vectorization | Richard Biener | 1 | -135/+161
This rewrites SLP induction vectorization to handle different inductions in the different SLP lanes. It also changes SLP build to represent the initial value (but not the cycle) so it can be enhanced to handle outer loop vectorization later.

Note this FAILs gcc.dg/vect/costmodel/x86_64/costmodel-pr30843.c because it removes one CSE optimization that no longer works with non-uniform initial value and step. I'll try to recover from this after outer loop vectorization of inductions works.

It might be a bit friendlier to variable-size vectors now, but then we're now building the step vector from scalars...

2020-11-02  Richard Biener  <rguenther@suse.de>

	* tree.h (build_real_from_wide): Declare.
	* tree.c (build_real_from_wide): New function.
	* tree-vect-slp.c (vect_build_slp_tree_2): Remove restriction on induction vectorization, represent the initial value.
	* tree-vect-loop.c (vect_model_induction_cost): Inline ...
	(vectorizable_induction): ... here. Rewrite SLP code generation.
	* gcc.dg/vect/slp-49.c: New testcase.
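The following is a minimal sketch, in plain C with arrays standing in for vectors, of the idea behind handling different inductions in different SLP lanes. It assumes a hypothetical 4-lane vector and an SLP group of two interleaved inductions `x` and `y` with independent initial values and steps; it does not model GCC's internal representation. Each vector holds two scalar iterations of the group, so the initial vector is {x0, y0, x0+sx, y0+sy} and every lane advances by twice its own scalar step:

```c
#include <stddef.h>

#define NLANES 4   /* hypothetical vector width in elements */
#define GROUP 2    /* SLP group size: two interleaved inductions */

/* Scalar reference: two inductions with different init and step,
   stored interleaved.  */
void
scalar_iv (int *out, size_t n, int x0, int sx, int y0, int sy)
{
  int x = x0, y = y0;
  for (size_t i = 0; i < n; ++i)
    {
      out[2 * i] = x;
      out[2 * i + 1] = y;
      x += sx;
      y += sy;
    }
}

/* Vectorized form: one 4-lane vector covers two scalar iterations.
   N must be a multiple of NLANES / GROUP (no epilogue here).  */
void
vector_iv (int *out, size_t n, int x0, int sx, int y0, int sy)
{
  const int init[GROUP] = { x0, y0 };
  const int step[GROUP] = { sx, sy };
  const int vf = NLANES / GROUP;   /* scalar iterations per vector */
  int v[NLANES], vstep[NLANES];

  /* Initial vector {x0, y0, x0+sx, y0+sy}; step {2sx, 2sy, 2sx, 2sy},
     both built lane by lane from scalars.  */
  for (int k = 0; k < NLANES; ++k)
    {
      v[k] = init[k % GROUP] + (k / GROUP) * step[k % GROUP];
      vstep[k] = vf * step[k % GROUP];
    }

  for (size_t i = 0; i < n; i += vf)
    for (int k = 0; k < NLANES; ++k)
      {
        out[GROUP * i + k] = v[k];
        v[k] += vstep[k];
      }
}
```

Because each lane carries its own initial value and step, nothing requires the inductions in the group to be uniform.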
2020-11-02 | tree-optimization/97558 - compute vectype for SLP nested cycles | Richard Biener | 1 | -3/+22
This makes sure to compute the vector type for invariant SLP children of nested cycles.

2020-11-02  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97558
	* tree-vect-loop.c (vectorizable_reduction): For nested SLP cycles compute invariant operands vector type.
	* gcc.dg/vect/pr97558-2.c: New testcase.
2020-11-02 | tree-optimization/97558 - avoid SLP analyzing irrelevant stmts | Richard Biener | 1 | -21/+44
This avoids analyzing reductions that are not relevant (and thus dead), which would eventually lead to crashes because the metadata of the participating stmts is not analyzed. For this to work, the patch also properly removes reduction groups that are not uniformly recognized as patterns.

2020-11-02  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97558
	* tree-vect-loop.c (vect_fixup_scalar_cycles_with_patterns): Check for any mismatch in pattern vs. non-pattern and dissolve the group if there is one.
	* tree-vect-slp.c (vect_analyze_slp_instance): Avoid analyzing not relevant reductions.
	(vect_analyze_slp): Avoid analyzing not relevant reduction groups.
	* gcc.dg/vect/pr97558.c: New testcase.
2020-10-29 | Fix some memleaks | Richard Biener | 1 | -1/+1
This fixes some memleaks, one older, one recently introduced.

2020-10-29  Richard Biener  <rguenther@suse.de>

	* tree-ssa-pre.c (compute_avail): Free operands consistently.
	* tree-vect-loop.c (vectorizable_phi): Make sure all operand defs vectors are released.
2020-10-27 | SLP vectorize across PHI nodes | Richard Biener | 1 | -11/+101
This makes SLP discovery detect backedges by seeding the bst_map with the node to be analyzed so it can be picked up from recursive calls. This removes the need to discover backedges in a separate walk.

This enables SLP build to handle PHI nodes in full, continuing the SLP build to non-backedges. For loop vectorization this enables outer loop vectorization of nested SLP cycles, and for BB vectorization this enables vectorization of PHIs at CFG merges.

It also turns code generation into an SCC discovery walk to handle irreducible regions and nodes only reachable via backedges, where we now also fill in vectorized backedge defs.

This requires sanitizing the SLP tree for SLP reduction chains even more, manually filling the backedge SLP def.

This also exposes the fact that CFG copying (and edge splitting, until I fixed that) ends up with a different edge order in the copy, which doesn't play well with the desired 1:1 mapping of SLP PHI node children and edges for epilogue vectorization. I've tried to fix up CFG copying here, but this really looks like a dead (or expensive) end there, so I've done the fixup in slpeel_tree_duplicate_loop_to_edge_cfg instead for the cases we can run into.

There are still NULLs in the SLP_TREE_CHILDREN vectors and I'm not sure it's possible to eliminate them all in this stage1, so the patch has quite a few checks for this case all over the place.

Bootstrapped and tested on x86_64-unknown-linux-gnu. SPEC CPU 2017 and SPEC CPU 2006 successfully built and tested.

2020-10-27  Richard Biener  <rguenther@suse.de>

	* gimple.h (gimple_expr_type): For PHIs return the type of the result.
	* tree-vect-loop-manip.c (slpeel_tree_duplicate_loop_to_edge_cfg): Make sure edge order into copied loop headers line up with the originals.
	* tree-vect-loop.c (vect_transform_cycle_phi): Handle nested loops with SLP.
	(vectorizable_phi): New function.
	(vectorizable_live_operation): For BB vectorization compute insert location here.
	* tree-vect-slp.c (vect_free_slp_tree): Deal with NULL SLP_TREE_CHILDREN entries.
	(vect_create_new_slp_node): Add overloads with pre-existing node argument.
	(vect_print_slp_graph): Likewise.
	(vect_mark_slp_stmts): Likewise.
	(vect_mark_slp_stmts_relevant): Likewise.
	(vect_gather_slp_loads): Likewise.
	(vect_optimize_slp): Likewise.
	(vect_slp_analyze_node_operations): Likewise.
	(vect_bb_slp_scalar_cost): Likewise.
	(vect_remove_slp_scalar_calls): Likewise.
	(vect_get_and_check_slp_defs): Handle PHIs.
	(vect_build_slp_tree_1): Handle PHIs.
	(vect_build_slp_tree_2): Continue SLP build, following PHI arguments. Fix memory leak.
	(vect_build_slp_tree): Put stub node into the hash-map so we can discover cycles directly.
	(vect_build_slp_instance): Set the backedge SLP def for reduction chains.
	(vect_analyze_slp_backedges): Remove.
	(vect_analyze_slp): Do not call it.
	(vect_slp_convert_to_external): Release SLP_TREE_LOAD_PERMUTATION.
	(vect_slp_analyze_node_operations): Handle stray failed backedge defs by failing.
	(vect_slp_build_vertices): Adjust leaf condition.
	(vect_bb_slp_mark_live_stmts): Handle PHIs, use visited hash-set to handle cycles.
	(vect_slp_analyze_operations): Adjust.
	(vect_bb_partition_graph_r): Likewise.
	(vect_slp_function): Adjust split condition to allow CFG merges.
	(vect_schedule_slp_instance): Rename to ...
	(vect_schedule_slp_node): ... this. Move DFS walk to ...
	(vect_schedule_scc): ... this new function.
	(vect_schedule_slp): Call it. Remove ad-hoc vectorized backedge fill code.
	* tree-vect-stmts.c (vect_analyze_stmt): Call vectorizable_phi.
	(vect_transform_stmt): Likewise.
	(vect_is_simple_use): Handle vect_backedge_def.
	* tree-vectorizer.c (vec_info::new_stmt_vec_info): Only set loop header PHIs to vect_unknown_def_type for loop vectorization.
	* tree-vectorizer.h (enum vect_def_type): Add vect_backedge_def.
	(enum stmt_vec_info_type): Add phi_info_type.
	(vectorizable_phi): Declare.
	* gcc.dg/vect/bb-slp-54.c: New test.
	* gcc.dg/vect/bb-slp-55.c: Likewise.
	* gcc.dg/vect/bb-slp-56.c: Likewise.
	* gcc.dg/vect/bb-slp-57.c: Likewise.
	* gcc.dg/vect/bb-slp-58.c: Likewise.
	* gcc.dg/vect/bb-slp-59.c: Likewise.
	* gcc.dg/vect/bb-slp-60.c: Likewise.
	* gcc.dg/vect/bb-slp-61.c: Likewise.
	* gcc.dg/vect/bb-slp-62.c: Likewise.
	* gcc.dg/vect/bb-slp-63.c: Likewise.
	* gcc.dg/vect/bb-slp-64.c: Likewise.
	* gcc.dg/vect/bb-slp-65.c: Likewise.
	* gcc.dg/vect/bb-slp-66.c: Likewise.
	* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
	* gfortran.dg/vect/O3-bb-slp-1.f: Likewise.
	* gfortran.dg/vect/O3-bb-slp-2.f: Likewise.
	* g++.dg/vect/simd-11.cc: Likewise.
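The commit above turns SLP code generation into an SCC discovery walk. For readers unfamiliar with the technique, the sketch below is a generic Tarjan strongly-connected-components pass over a small adjacency matrix; it is not GCC's vect_schedule_scc, and the struct and function names are hypothetical. Edges point from a user to its operand, so a PHI and the statement feeding its backedge end up in one SCC, and every SCC is numbered before any SCC that uses it:

```c
#define MAXN 8

/* Hypothetical graph: adj[u][v] != 0 means node U uses node V.  */
struct tarjan
{
  int n;
  int adj[MAXN][MAXN];
  int index[MAXN], low[MAXN], on_stack[MAXN];
  int stack[MAXN], sp, next_index;
  int comp[MAXN];   /* SCC id of each node, in completion order */
  int ncomp;
};

static void
strongconnect (struct tarjan *t, int u)
{
  t->index[u] = t->low[u] = t->next_index++;
  t->stack[t->sp++] = u;
  t->on_stack[u] = 1;
  for (int v = 0; v < t->n; ++v)
    {
      if (!t->adj[u][v])
        continue;
      if (t->index[v] < 0)
        {
          /* Tree edge: recurse, then propagate the low-link.  */
          strongconnect (t, v);
          if (t->low[v] < t->low[u])
            t->low[u] = t->low[v];
        }
      else if (t->on_stack[v] && t->index[v] < t->low[u])
        /* Edge back into the current DFS stack, e.g. a backedge.  */
        t->low[u] = t->index[v];
    }
  if (t->low[u] == t->index[u])
    {
      /* U is the root of an SCC: pop all of its members.  */
      int w;
      do
        {
          w = t->stack[--t->sp];
          t->on_stack[w] = 0;
          t->comp[w] = t->ncomp;
        }
      while (w != u);
      t->ncomp++;
    }
}

/* Number SCCs; an SCC gets a lower id than every SCC that uses it.  */
void
find_sccs (struct tarjan *t)
{
  t->sp = t->next_index = t->ncomp = 0;
  for (int u = 0; u < t->n; ++u)
    {
      t->index[u] = -1;
      t->on_stack[u] = 0;
    }
  for (int u = 0; u < t->n; ++u)
    if (t->index[u] < 0)
      strongconnect (t, u);
}
```

Processing SCCs in increasing id order visits operands before their users, while the members of a cycle are handled together, which is the property a walk over SLP graphs with backedges needs.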
2020-10-22 | vect: Remove redundant LOOP_VINFO_FULLY_MASKED_P | Kewen Lin | 1 | -2/+1
Remove one redundant LOOP_VINFO_FULLY_MASKED_P condition check, which is already performed in vect_use_loop_mask_for_alignment_p.

gcc/ChangeLog:

	* tree-vect-loop.c (vect_transform_loop): Remove the redundant LOOP_VINFO_FULLY_MASKED_P check.
2020-10-20 | Fix latch PHI arg lookup in vectorizable_reduction for double-reduction | Richard Biener | 1 | -2/+4
We were using the wrong loop to find the latch argument of a double-reduction PHI. This isn't a problem when the ->dest_idx values match up with the outer loop edges, but that is of course not guaranteed.

2020-10-20  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vectorizable_reduction): Use the correct loop's latch edge for the PHI arg lookup.
2020-10-15 | Fix ICE in vectorizable_live_operation | Richard Biener | 1 | -2/+5
This fixes the case where the insertion iterator for the live stmt is the end of a BB by adjusting the dominance query to the definition of the def we're substituting.

2020-10-15  Richard Biener  <rguenther@suse.de>

	* tree-vect-loop.c (vectorizable_live_operation): Adjust dominance query.
	* gcc.dg/vect/bb-slp-52.c: New testcase.
2020-10-09 | random memory leak fixes | Richard Biener | 1 | -0/+1
This fixes leaks discovered while checking whether I introduced new ones with the last vectorizer changes.

2020-10-09  Richard Biener  <rguenther@suse.de>

	* cgraphunit.c (expand_all_functions): Free tp_first_run_order.
	* ipa-modref.c (pass_ipa_modref::execute): Free order.
	* tree-ssa-loop-niter.c (estimate_numbers_of_iterations): Free loop body.
	* tree-vect-data-refs.c (vect_find_stmt_data_reference): Free data references upon failure.
	* tree-vect-loop.c (update_epilogue_loop_vinfo): Free BBs array of the original loop.
	* tree-vect-slp.c (vect_slp_bbs): Use an auto_vec for dataref_groups to release its memory.
2020-09-29 | tree-optimization/97241 - fix ICE in reduction vectorization | Richard Biener | 1 | -12/+5
The following moves an ad-hoc attempt at discovering the SLP node for a stmt to the place where we can find it in lock-step when we find the stmt itself.

2020-09-29  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97241
	* tree-vect-loop.c (vectorizable_reduction): Move finding the SLP node for the reduction stmt to a better place.
	* gcc.dg/vect/pr97241.c: New testcase.
2020-09-23 | vect: Fix epilogue loop handling of partial vectors | Richard Sandiford | 1 | -68/+128
This patch fixes the fallout that Kewen reported on Power after the recent change to avoid unnecessary use of partial vectors.

As Kewen said, the problem is that vect_analyze_loop_2 doesn't know how many epilogue iterations there will be, and so it cannot make a final decision about whether the number of iterations forces an epilogue loop to use partial vectors.

This is similar to the current situation for peeling: we don't know during initial analysis whether an epilogue loop will itself require peeling. Instead we decide that during vect_do_peeling, where the final number of epilogue loop iterations is known.

The patch takes a similar approach for the decision about whether to use partial vectors. As the comments in the patch say, the idea is that vect_analyze_loop_2 should make peeling and partial-vector decisions based on the assumption that the loop_vinfo will be used as the main loop, while vect_do_peeling should make them in the knowledge that the loop_vinfo will be used as an epilogue loop. This allows the same analysis to be used for both cases, which we rely on for implementing VECT_COMPARE_COSTS; see the big comment in vect_analyze_loop for details.

I hope the patch makes the (mostly preexisting) structure a bit more obvious. It isn't what anyone would design from scratch, but that's the nature of working with a mature vector framework.

Arranging things this way means that vect_verify_full_masking and vect_verify_loop_lens now become part of the “can” rather than “will” test for partial vectors.

Also, while splitting out the logic that handles epilogues with constant iterations, I added a check to make sure that we don't try to use partial vectors to vectorise a single-scalar loop. This required some changes to the Power tests.

gcc/
	* tree-vectorizer.h (determine_peel_for_niter): Delete in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function.
	* tree-vect-loop-manip.c (vect_update_epilogue_niters): New function. Reject using vector epilogue loops for single iterations. Install the constant number of epilogue loop iterations in the associated loop_vinfo. Rely on vect_determine_partial_vectors_and_peeling to do the main part of the test.
	(vect_do_peeling): Use vect_update_epilogue_niters to handle epilogue loops with a known number of iterations. Skip recomputing the number of iterations later in that case. Otherwise, use vect_determine_partial_vectors_and_peeling to decide whether the epilogue loop needs to use partial vectors or peeling.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Set the default can_use_partial_vectors_p to false if partial-vector-usage=0.
	(determine_peel_for_niter): Remove in favor of...
	(vect_determine_partial_vectors_and_peeling): ...this new function, split out from...
	(vect_analyze_loop_2): ...here. Reflect the vect_verify_full_masking and vect_verify_loop_lens results in CAN_USE_PARTIAL_VECTORS_P rather than USING_PARTIAL_VECTORS_P.

gcc/testsuite/
	* gcc.target/powerpc/p9-vec-length-epil-1.c: Do not expect the single-iteration epilogues of the 64-bit loops to be vectorized.
	* gcc.target/powerpc/p9-vec-length-epil-7.c: Likewise.
	* gcc.target/powerpc/p9-vec-length-epil-8.c: Likewise.
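As a simplified, hypothetical model of the constant-iteration case handled by vect_update_epilogue_niters: if the total iteration count is known, the epilogue's iteration count is what remains after peeling for alignment and running the main loop, and a vector epilogue is not worth using when only a single scalar iteration is left. The function names are made up, and the sketch ignores masking of the main loop, peeling for gaps, and everything else the real analysis considers:

```c
#include <assert.h>

/* Hypothetical model: scalar iterations left for the epilogue when a
   main vector loop with vectorization factor VF (no partial vectors)
   runs after PEEL iterations were peeled for alignment.  */
unsigned
epilogue_niters (unsigned niters, unsigned peel, unsigned vf)
{
  assert (vf > 0 && peel <= niters);
  return (niters - peel) % vf;
}

/* Zero or one remaining scalar iterations do not justify a vector
   epilogue; only larger remainders are candidates.  */
int
want_vector_epilogue (unsigned epil_niters)
{
  return epil_niters > 1;
}
```

Because the epilogue count is a compile-time constant in this case, the decision can be made during peeling rather than during the initial analysis, which matches the split described in the commit message.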
2020-09-23 | tree-optimization/97173 - extend assert in vectorizable_live_operation | Richard Biener | 1 | -2/+4
The condition we're expecting to eventually run into isn't fully captured by checking for CTORs; we can also run into the CTOR element conversion.

2020-09-23  Richard Biener  <rguenther@suse.de>

	PR tree-optimization/97173
	* tree-vect-loop.c (vectorizable_live_operation): Extend assert to also cover element conversions.
	* gcc.dg/vect/pr97173.c: New testcase.