path: root/gcc/simplify-rtx.c
Age | Commit message | Author | Files | Lines
2021-07-27 | simplify-rtx: Push sign/zero-extension inside vec_duplicate | Jonathan Wright | 1 | -11/+24
As a general principle, vec_duplicate should be as close to the root of an expression as possible. Where unary operations have vec_duplicate as an argument, these operations should be pushed inside the vec_duplicate. This patch modifies unary operation simplification to push sign/zero-extension of a scalar inside vec_duplicate. This patch also updates all RTL patterns in aarch64-simd.md to use the new canonical form. gcc/ChangeLog: 2021-07-19 Jonathan Wright <jonathan.wright@arm.com> * config/aarch64/aarch64-simd.md: Push sign/zero-extension inside vec_duplicate for all patterns. * simplify-rtx.c (simplify_context::simplify_unary_operation_1): Push sign/zero-extension inside vec_duplicate.
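The canonical form rests on a simple equivalence: extending every lane of a duplicated scalar gives the same vector as duplicating the extended scalar. A minimal Python sketch, assuming vectors are modelled as lists; `vec_duplicate`, `sign_extend_8_to_16` and the two wrapper functions are illustrative helpers, not GCC functions:

```python
def sign_extend_8_to_16(x: int) -> int:
    """Sign-extend an 8-bit pattern to a 16-bit pattern."""
    x &= 0xFF
    return (x | 0xFF00) if x & 0x80 else x


def vec_duplicate(x: int, nunits: int) -> list:
    """Model a vector whose lanes all hold the same scalar."""
    return [x] * nunits


def extension_outside(x: int, nunits: int) -> list:
    """(sign_extend (vec_duplicate x)): extend each lane after duplicating."""
    return [sign_extend_8_to_16(lane) for lane in vec_duplicate(x, nunits)]


def extension_inside(x: int, nunits: int) -> list:
    """(vec_duplicate (sign_extend x)): extend once, then duplicate."""
    return vec_duplicate(sign_extend_8_to_16(x), nunits)
```

Both forms produce the same lanes for every 8-bit input, which is why choosing one of them as canonical loses nothing.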
2021-07-20 | Adjust by-value function vec arguments to by-reference. | Martin Sebor | 1 | -2/+2
gcc/c-family/ChangeLog: * c-common.c (c_build_shufflevector): Adjust by-value argument to by-const-reference. * c-common.h (c_build_shufflevector): Same. gcc/c/ChangeLog: * c-tree.h (c_build_function_call_vec): Adjust by-value argument to by-const-reference. * c-typeck.c (c_build_function_call_vec): Same. gcc/ChangeLog: * cfgloop.h (single_likely_exit): Adjust by-value argument to by-const-reference. * cfgloopanal.c (single_likely_exit): Same. * cgraph.h (struct cgraph_node): Same. * cgraphclones.c (cgraph_node::create_virtual_clone): Same. * genautomata.c (merge_states): Same. * genextract.c (VEC_char_to_string): Same. * genmatch.c (dt_node::gen_kids_1): Same. (walk_captures): Adjust by-value argument to by-reference. * gimple-ssa-store-merging.c (check_no_overlap): Adjust by-value argument to by-const-reference. * gimple.c (gimple_build_call_vec): Same. (gimple_build_call_internal_vec): Same. (gimple_build_switch): Same. (sort_case_labels): Same. (preprocess_case_label_vec_for_gimple): Adjust by-value argument to by-reference. * gimple.h (gimple_build_call_vec): Adjust by-value argument to by-const-reference. (gimple_build_call_internal_vec): Same. (gimple_build_switch): Same. (sort_case_labels): Same. (preprocess_case_label_vec_for_gimple): Adjust by-value argument to by-reference. * haifa-sched.c (calc_priorities): Adjust by-value argument to by-const-reference. (sched_init_luids): Same. (haifa_init_h_i_d): Same. * ipa-cp.c (ipa_get_indirect_edge_target_1): Same. (adjust_callers_for_value_intersection): Adjust by-value argument to by-reference. (find_more_scalar_values_for_callers_subset): Adjust by-value argument to by-const-reference. (find_more_contexts_for_caller_subset): Same. (find_aggregate_values_for_callers_subset): Same. (copy_useful_known_contexts): Same. * ipa-fnsummary.c (remap_edge_summaries): Same. (remap_freqcounting_predicate): Same. * ipa-inline.c (add_new_edges_to_heap): Adjust by-value argument to by-reference. 
* ipa-predicate.c (predicate::remap_after_inlining): Adjust by-value argument to by-const-reference. * ipa-predicate.h (predicate::remap_after_inlining): Same. * ipa-prop.c (ipa_find_agg_cst_for_param): Same. * ipa-prop.h (ipa_find_agg_cst_for_param): Same. * ira-build.c (ira_loop_tree_body_rev_postorder): Same. * read-rtl.c (add_overload_instance): Same. * rtl.h (native_decode_rtx): Same. (native_decode_vector_rtx): Same. * sched-int.h (sched_init_luids): Same. (haifa_init_h_i_d): Same. * simplify-rtx.c (native_decode_vector_rtx): Same. (native_decode_rtx): Same. * tree-call-cdce.c (gen_shrink_wrap_conditions): Same. (shrink_wrap_one_built_in_call_with_conds): Same. (shrink_wrap_conditional_dead_built_in_calls): Same. * tree-data-ref.c (create_runtime_alias_checks): Same. (compute_all_dependences): Same. * tree-data-ref.h (compute_all_dependences): Same. (create_runtime_alias_checks): Same. (index_in_loop_nest): Same. * tree-if-conv.c (mask_exists): Same. * tree-loop-distribution.c (class loop_distribution): Same. (loop_distribution::create_rdg_vertices): Same. (dump_rdg_partitions): Same. (debug_rdg_partitions): Same. (partition_contains_all_rw): Same. (loop_distribution::distribute_loop): Same. * tree-parloops.c (oacc_entry_exit_ok_1): Same. (oacc_entry_exit_single_gang): Same. * tree-ssa-loop-im.c (hoist_memory_references): Same. (loop_suitable_for_sm): Same. * tree-ssa-loop-niter.c (bound_index): Same. * tree-ssa-reassoc.c (update_ops): Same. (swap_ops_for_binary_stmt): Same. (rewrite_expr_tree): Same. (rewrite_expr_tree_parallel): Same. * tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Same. * tree-ssa-sccvn.h (ao_ref_init_from_vn_reference): Same. * tree-ssa-structalias.c (process_all_all_constraints): Same. (make_constraints_to): Same. (handle_lhs_call): Same. (find_func_aliases_for_builtin_call): Same. (sort_fieldstack): Same. (check_for_overlaps): Same. * tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Same. 
(vect_create_cond_for_unequal_addrs): Same. (vect_create_cond_for_lower_bounds): Same. (vect_create_cond_for_alias_checks): Same. * tree-vect-slp-patterns.c (vect_validate_multiplication): Same. * tree-vect-slp.c (vect_analyze_slp_instance): Same. (vect_make_slp_decision): Same. (vect_slp_bbs): Same. (duplicate_and_interleave): Same. (vect_transform_slp_perm_load): Same. (vect_schedule_slp): Same. * tree-vectorizer.h (vect_transform_slp_perm_load): Same. (vect_schedule_slp): Same. (duplicate_and_interleave): Same. * tree.c (build_vector_from_ctor): Same. (build_vector): Same. (check_vector_cst): Same. (check_vector_cst_duplicate): Same. (check_vector_cst_fill): Same. (check_vector_cst_stepped): Same. * tree.h (build_vector_from_ctor): Same.
2021-07-13 | gcc: Add vec_select -> subreg RTL simplification | Jonathan Wright | 1 | -0/+10
Add a new RTL simplification for the case of a VEC_SELECT selecting the low part of a vector. The simplification returns a SUBREG. The primary goal of this patch is to enable better combinations of Neon RTL patterns - specifically allowing generation of 'write-to-high-half' narrowing instructions. Adding this RTL simplification means that the expected results for a number of tests need to be updated: * aarch64 Neon: Update the scan-assembler regex for intrinsics tests to expect a scalar register instead of lane 0 of a vector. * aarch64 SVE: Likewise. * arm MVE: Use lane 1 instead of lane 0 for lane-extraction intrinsics tests (as the move instructions get optimized away for lane 0.) This patch also adds new code generation tests to narrow_high_combine.c to verify the benefit of this RTL simplification. gcc/ChangeLog: 2021-06-08 Jonathan Wright <jonathan.wright@arm.com> * combine.c (combine_simplify_rtx): Add vec_select -> subreg simplification. * config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64): Add Neon to general purpose register case for zero-extend pattern. * config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r case to prevent some cases opting to go through memory. * cse.c (fold_rtx): Add vec_select -> subreg simplification. * rtl.c (rtvec_series_p): Define predicate to determine whether a vector contains a linear series of integers. * rtl.h (rtvec_series_p): Define. * rtlanal.c (vec_series_lowpart_p): Define predicate to determine if a vector selection is equivalent to the low part of the vector. * rtlanal.h (vec_series_lowpart_p): Define. * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Add vec_select -> subreg simplification. gcc/testsuite/ChangeLog: * gcc.target/aarch64/extract_zero_extend.c: Remove dump scan for RTL pattern match. * gcc.target/aarch64/narrow_high_combine.c: Add new tests. 
* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update scan-assembler regex to look for a scalar register instead of lane 0 of a vector. * gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise. * gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise. * gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise. * gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise. * gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise. * gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise. * gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise. * gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise. * gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise. * gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise. * gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise. * gcc.target/aarch64/sve/dup_lane_1.c: Likewise. * gcc.target/aarch64/sve/extract_1.c: Likewise. * gcc.target/aarch64/sve/extract_2.c: Likewise. * gcc.target/aarch64/sve/extract_3.c: Likewise. * gcc.target/aarch64/sve/extract_4.c: Likewise. * gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex cases to look for 'b' and 'h' registers instead of 'w'. * gcc.target/arm/crypto-vsha1cq_u32.c: Update scan-assembler regex to reflect lane 0 vector extractions being simplified to scalar register moves. * gcc.target/arm/crypto-vsha1h_u32.c: Likewise. * gcc.target/arm/crypto-vsha1mq_u32.c: Likewise. * gcc.target/arm/crypto-vsha1pq_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract lane 1 as the moves for lane 0 now get optimized away. * gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise. * gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
2021-06-11 | simplify-rtx: Fix up simplify_logical_relational_operation for vector IOR [PR101008] | Jakub Jelinek | 1 | -46/+49
simplify_relational_operation callees typically return just const0_rtx or const_true_rtx and then simplify_relational_operation attempts to fix that up if the comparison result has vector mode, or floating mode, or punt if it has scalar mode and vector mode operands (it doesn't know how exactly to deal with the scalar masks). But, simplify_logical_relational_operation has a special case, where it attempts to fold (x < y) | (x >= y) etc. and if it determines it is always true, it just returns const_true_rtx, without doing the dances that simplify_relational_operation does. That results in an ICE on the following testcase, where such folding happens during expansion (of debug stmts into DEBUG_INSNs) and we ICE because all of a sudden a VOIDmode rtx appears where it expects a vector (V4SImode) rtx. The following patch fixes that by moving the adjustment into a separate helper routine and using it from both simplify_relational_operation and simplify_logical_relational_operation. 2021-06-11 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/101008 * simplify-rtx.c (relational_result): New function. (simplify_logical_relational_operation, simplify_relational_operation): Use it. * gcc.dg/pr101008.c: New test.
2021-05-04 | Remove CC0 | Segher Boessenkool | 1 | -12/+8
This removes CC0 and all directly related infrastructure. CC_STATUS, CC_STATUS_MDEP, CC_STATUS_MDEP_INIT, and NOTICE_UPDATE_CC are deleted and poisoned. CC0 is only deleted (some targets use that name for something else). HAVE_cc0 is automatically generated, and we no longer will do that after this patch. CC_STATUS_INIT is suggested in final.c to also be useful for ports that are not CC0, and at least arm seems to use it for something. So I am leaving that alone, but most targets that have it could remove it. 2021-05-04 Segher Boessenkool <segher@kernel.crashing.org> * caller-save.c: Remove CC0. * cfgcleanup.c: Remove CC0. * cfgrtl.c: Remove CC0. * combine.c: Remove CC0. * compare-elim.c: Remove CC0. * conditions.h: Remove CC0. * config/h8300/h8300.h: Remove CC0. * config/h8300/h8300-protos.h: Remove CC0. * config/h8300/peepholes.md: Remove CC0. * config/i386/x86-tune-sched.c: Remove CC0. * config/m68k/m68k.c: Remove CC0. * config/rl78/rl78.c: Remove CC0. * config/sparc/sparc.c: Remove CC0. * config/xtensa/xtensa.c: Remove CC0. (gen_conditional_move): Use pc_rtx instead of cc0_rtx in a piece of RTL where that is used as a placeholder only. * cprop.c: Remove CC0. * cse.c: Remove CC0. * cselib.c: Remove CC0. * df-problems.c: Remove CC0. * df-scan.c: Remove CC0. * doc/md.texi: Remove CC0. Adjust an example. * doc/rtl.texi: Remove CC0. Adjust an example. * doc/tm.texi: Regenerate. * doc/tm.texi.in: Remove CC0. * emit-rtl.c: Remove CC0. * final.c: Remove CC0. * fwprop.c: Remove CC0. * gcse-common.c: Remove CC0. * gcse.c: Remove CC0. * genattrtab.c: Remove CC0. * genconfig.c: Remove CC0. * genemit.c: Remove CC0. * genextract.c: Remove CC0. * gengenrtl.c: Remove CC0. * genrecog.c: Remove CC0. * haifa-sched.c: Remove CC0. * ifcvt.c: Remove CC0. * ira-costs.c: Remove CC0. * ira.c: Remove CC0. * jump.c: Remove CC0. * loop-invariant.c: Remove CC0. * lra-constraints.c: Remove CC0. * lra-eliminations.c: Remove CC0. * optabs.c: Remove CC0. * postreload-gcse.c: Remove CC0. 
* postreload.c: Remove CC0. * print-rtl.c: Remove CC0. * read-rtl-function.c: Remove CC0. * reg-notes.def: Remove CC0. * reg-stack.c: Remove CC0. * reginfo.c: Remove CC0. * regrename.c: Remove CC0. * reload.c: Remove CC0. * reload1.c: Remove CC0. * reorg.c: Remove CC0. * resource.c: Remove CC0. * rtl.c: Remove CC0. * rtl.def: Remove CC0. * rtl.h: Remove CC0. * rtlanal.c: Remove CC0. * sched-deps.c: Remove CC0. * sched-rgn.c: Remove CC0. * shrink-wrap.c: Remove CC0. * simplify-rtx.c: Remove CC0. * system.h: Remove CC0. Poison NOTICE_UPDATE_CC, CC_STATUS_MDEP_INIT, CC_STATUS_MDEP, and CC_STATUS. * target.def: Remove CC0. * valtrack.c: Remove CC0. * var-tracking.c: Remove CC0.
2021-04-27 | Fix target/100106 ICE in gen_movdi | Bernd Edlinger | 1 | -0/+1
As the test case shows, the outer mode may have a higher alignment requirement than the inner mode here. 2021-04-27 Bernd Edlinger <bernd.edlinger@hotmail.de> PR target/100106 * simplify-rtx.c (simplify_context::simplify_subreg): Check the memory alignment for the outer mode. * gcc.c-torture/compile/pr100106.c: New testcase.
2021-04-13 | simplify-rtx: Punt on simplify_{,gen_}subreg to IBM double double if bits are lost [PR99648] | Jakub Jelinek | 1 | -4/+18
Similarly to PR95450 done on GIMPLE, this patch punts if we try to simplify_{gen_,}subreg from some constant into the IBM double double IFmode (or sometimes TFmode) if the double double format wouldn't preserve the bits. Not all values are valid in IBM double double, e.g. the format requires that the upper double is the whole value rounded to double, and if in some cases such as in the pr71522.c testcase with -m32 -Os -mcpu=power7 some non-floating data is copied through long double variable, we can simplify a subreg into something that has different value. Fixed by punting if the planned simplify_immed_subreg result doesn't encode to bitwise identical values compared to what we were decoding. As for the simplify_gen_subreg change, I think it would be desirable to just avoid creating SUBREGs of constants on all targets and for all constants, if simplify_immed_subreg simplified, fine, otherwise punt, but as we are late in GCC11 development, the patch instead guards this behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to powerpc{,64,64le} double double long double - and only for the cases where simplify_immed_subreg was called. 2021-04-13 Jakub Jelinek <jakub@redhat.com> PR target/99648 * simplify-rtx.c (simplify_immed_subreg): For MODE_COMPOSITE_P outermode, return NULL if the result doesn't encode back to the original byte sequence. (simplify_gen_subreg): Don't create SUBREGs from constants to MODE_COMPOSITE_P outermode.
2021-01-05 | simplify-rtx: Optimize (x - 1) * y + y [PR98334] | Jakub Jelinek | 1 | -0/+56
We don't try to optimize (int) (x - 1U) * y + y into x * y for signed x and y: we can't do that with signed x * y, because the former is well defined for INT_MIN and -1, while the latter is not. We could perhaps optimize it during isel or some very late optimization where we'd magically turn on flag_wrapv, but we don't do that yet. This patch optimizes it in simplify-rtx.c, such that we can optimize it during combine. 2021-01-05 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/98334 * simplify-rtx.c (simplify_context::simplify_binary_operation_1): Optimize (X - 1) * Y + Y to X * Y or (X + 1) * Y - Y to X * Y. * gcc.target/i386/pr98334.c: New test.
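The identity behind this fold holds in wrap-around (modular) arithmetic, which is how RTL integer operations behave. A minimal Python sketch, assuming a 32-bit mode modelled by masking; the function names are made up for the example:

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register with wrap-around arithmetic


def x_minus_1_times_y_plus_y(x: int, y: int) -> int:
    """(x - 1) * y + y, evaluated modulo 2**32."""
    return (((x - 1) & MASK32) * y + y) & MASK32


def x_plus_1_times_y_minus_y(x: int, y: int) -> int:
    """(x + 1) * y - y, evaluated modulo 2**32."""
    return (((x + 1) & MASK32) * y - y) & MASK32


def x_times_y(x: int, y: int) -> int:
    """x * y, evaluated modulo 2**32."""
    return (x * y) & MASK32


INT_MIN_BITS = 0x80000000    # bit pattern of INT_MIN
MINUS_ONE_BITS = 0xFFFFFFFF  # bit pattern of -1
```

With wrapping semantics both longer forms agree with `x_times_y` even for the INT_MIN and -1 bit patterns, the case that makes the signed C-level multiplication undefined.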
2021-01-04 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
2020-12-17 | simplify-rtx: Put simplify routines into a class | Richard Sandiford | 1 | -63/+89
One of the recurring warts of RTL is that multiplication by a power of 2 is represented as a MULT inside a MEM but as an ASHIFT outside a MEM. It would obviously be better if we didn't have this kind of context sensitivity, but it would be difficult to remove. Currently the simplify-rtx.c routines are hard-coded for the ASHIFT form. This means that some callers have to convert the ASHIFTs “back” into MULTs after calling the simplify-rtx.c routines; see fwprop.c:canonicalize_address for an example. I think we can relieve some of the pain by wrapping the simplify-rtx.c routines in a simple class that tracks whether the expression occurs in a MEM or not, so that no post-processing is needed. An obvious concern is whether passing the “this” pointer around will slow things down or bloat the code. I can't measure any increase in compile time after applying the patch. Sizewise, simplify-rtx.o text increases by 2.3% in default-checking builds and 4.1% in release-checking builds. I realise the MULT/ASHIFT thing isn't the most palatable reason for doing this, but I think it might be useful for other things in future, such as using local nonzero_bits hooks/virtual functions instead of the global hooks. The obvious alternative would be to add a static variable and hope that it is always updated correctly. Later patches make use of this. gcc/ * rtl.h (simplify_context): New class. (simplify_unary_operation, simplify_binary_operation): Use it. (simplify_ternary_operation, simplify_relational_operation): Likewise. (simplify_subreg, simplify_gen_unary, simplify_gen_binary): Likewise. (simplify_gen_ternary, simplify_gen_relational): Likewise. (simplify_gen_subreg, lowpart_subreg): Likewise. * simplify-rtx.c (simplify_gen_binary): Turn into a member function of simplify_context. (simplify_gen_unary, simplify_gen_ternary, simplify_gen_relational) (simplify_truncation, simplify_unary_operation): Likewise. 
(simplify_unary_operation_1, simplify_byte_swapping_operation) (simplify_associative_operation, simplify_logical_relational_operation) (simplify_binary_operation, simplify_binary_operation_series) (simplify_distributive_operation, simplify_plus_minus): Likewise. (simplify_relational_operation, simplify_relational_operation_1) (simplify_cond_clz_ctz, simplify_merge_mask): Likewise. (simplify_ternary_operation, simplify_subreg, simplify_gen_subreg) (lowpart_subreg): Likewise. (simplify_binary_operation_1): Likewise. Test mem_depth when deciding whether the ASHIFT or MULT form is canonical. (simplify_merge_mask): Use simplify_context.
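The idea of carrying the MEM context in an object rather than in a static variable can be sketched in Python. This is only an illustration: `SimplifyContext`, `canonical_mult_pow2` and the tuple representation of expressions are assumptions made for the example; only the `mem_depth` name and the MULT/ASHIFT rule come from the commit message.

```python
class SimplifyContext:
    """Hypothetical stand-in for a context object passed to simplifications."""

    def __init__(self, mem_depth: int = 0):
        # Number of enclosing MEMs; nonzero means "inside an address".
        self.mem_depth = mem_depth

    def canonical_mult_pow2(self, operand: str, log2_factor: int) -> tuple:
        """Return the canonical form of operand * 2**log2_factor."""
        if self.mem_depth > 0:
            # Inside a MEM the multiplication form is canonical.
            return ("mult", operand, 1 << log2_factor)
        # Outside a MEM the shift form is canonical.
        return ("ashift", operand, log2_factor)
```

A caller simplifying an address would construct the context with a nonzero `mem_depth` and get the MULT form directly, so no conversion back from ASHIFT is needed afterwards.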
2020-10-22 | Simplify vec_select of a subreg of X to just a vec_select of X. | liuhongt | 1 | -0/+41
gcc/ChangeLog PR rtl-optimization/97249 * simplify-rtx.c (simplify_binary_operation_1): Simplify vec_select of a subreg of X to a vec_select of X. gcc/testsuite/ChangeLog * gcc.target/i386/pr97249-1.c: New test.
2020-08-16 | middle-end: Simplify (sign_extend:HI (truncate:QI (ashiftrt:HI X 8))) | Roger Sayle | 1 | -0/+32
The combination of several of my recent nvptx patches has revealed an interesting RTL optimization opportunity. This patch to simplify-rtx.c simplifies (sign_extend:HI (truncate:QI (?shiftrt:HI x 8))) to just (ashiftrt:HI x 8), as the inner shift already sets the high bits appropriately. The equivalent zero_extend variant appears to already be implemented in simplify_unary_operation_1. These result from RTL expansion generating a reasonable arithmetic right shift and truncation to char, only to then discover the backend doesn't support QImode comparisons, so the next optab widens this result/operand back to HImode. In this sequence the truncate and sign extend are redundant as the original arithmetic shift has already set the high bits appropriately. The one oddity of the patch is that it tests for LSHIFTRT as inner shift, as simplify/combine has already canonicalized this to a logical shift, assuming that the distinction is unimportant following the truncation. 2020-08-16 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]: Simplify (sign_extend:M (truncate:N (lshiftrt:M x C))) to (ashiftrt:M x C) when the shift sets the high bits appropriately.
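The equivalence can be checked exhaustively for the HImode/QImode case. A minimal Python sketch, assuming 16-bit and 8-bit values are modelled as unsigned bit patterns; all helper names are illustrative:

```python
def lshiftrt16(x: int, n: int) -> int:
    """Logical right shift of a 16-bit pattern."""
    return (x & 0xFFFF) >> n


def ashiftrt16(x: int, n: int) -> int:
    """Arithmetic right shift of a 16-bit pattern."""
    x &= 0xFFFF
    signed = x - 0x10000 if x & 0x8000 else x
    return (signed >> n) & 0xFFFF  # Python's >> on negative ints rounds down


def truncate_to_8(x: int) -> int:
    """Keep only the low 8 bits."""
    return x & 0xFF


def sign_extend_8_to_16(x: int) -> int:
    """Sign-extend an 8-bit pattern to a 16-bit pattern."""
    x &= 0xFF
    return (x | 0xFF00) if x & 0x80 else x


def original_sequence(x: int) -> int:
    """(sign_extend:HI (truncate:QI (lshiftrt:HI x 8)))"""
    return sign_extend_8_to_16(truncate_to_8(lshiftrt16(x, 8)))
```

For a shift count of 8 the truncation keeps exactly the bits that the shift moved down, so re-extending them reproduces the arithmetic shift, whether the inner shift was logical or arithmetic.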
2020-08-03 | PR rtl-optimization 61494: Preserve x-0.0 with HONOR_SNANS. | Roger Sayle | 1 | -3/+4
The following patch avoids simplifying x-0.0 to x when -fsignaling-nans is specified, which resolves PR rtl-optimization 61494. Indeed, running the test program attached to that PR now reports no failures. 2020-08-02 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR rtl-optimization/61494 * simplify-rtx.c (simplify_binary_operation_1) [MINUS]: Don't simplify x - 0.0 with -fsignaling-nans.
2020-07-23 | Resolve regression rtl-optimization/96298. Sorry for the breakage. | Roger Sayle | 1 | -1/+0
2020-07-23 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR rtl-optimization/96298 * simplify-rtx.c (simplify_binary_operation_1) [XOR]: Xor doesn't distribute over xor, so (a^b)^(c^b) is not the same as (a^c)^b.
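A small counterexample makes the ChangeLog remark concrete. A minimal Python sketch; the function names are made up for the example:

```python
def original(a: int, b: int, c: int) -> int:
    """(a ^ b) ^ (c ^ b)"""
    return (a ^ b) ^ (c ^ b)


def wrongly_undistributed(a: int, b: int, c: int) -> int:
    """(a ^ c) ^ b, the result of treating XOR as distributing over XOR."""
    return (a ^ c) ^ b


def correct(a: int, c: int) -> int:
    """The two copies of b cancel, leaving a ^ c."""
    return a ^ c
```

With a = 1, b = 2, c = 4 the original expression evaluates to 5 while the wrongly un-distributed form gives 7; the two agree only when b is zero.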
2020-06-29 | middle-end: Optimize (A&C)^(B&C) to (A^B)&C in simplify_rtx (take 3). | Roger Sayle | 1 | -0/+169
2020-06-29 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog: * simplify-rtx.c (simplify_distributive_operation): New function to un-distribute a binary operation of two binary operations. (X & C) ^ (Y & C) to (X ^ Y) & C, when C is simple (i.e. a constant). (simplify_binary_operation_1) <IOR, XOR, AND>: Call it from here when appropriate. (test_scalar_int_ops): New function for unit self-testing scalar integer transformations in simplify-rtx.c. (test_scalar_ops): Call test_scalar_int_ops for each integer mode. (simplify_rtx_c_tests): Call test_scalar_ops.
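The un-distribution is valid because AND distributes over both XOR and IOR. A minimal Python sketch that checks the identities exhaustively on 4-bit values; the function names are illustrative, not taken from simplify-rtx.c:

```python
def xor_of_ands(x: int, y: int, c: int) -> int:
    """(x & c) ^ (y & c)"""
    return (x & c) ^ (y & c)


def and_of_xor(x: int, y: int, c: int) -> int:
    """(x ^ y) & c: one AND instead of two."""
    return (x ^ y) & c


def ior_of_ands(x: int, y: int, c: int) -> int:
    """(x & c) | (y & c)"""
    return (x & c) | (y & c)


def and_of_ior(x: int, y: int, c: int) -> int:
    """(x | y) & c"""
    return (x | y) & c


FOUR_BIT = range(16)
```

Each rewritten form uses one fewer operation than the original, which is the point of performing the fold.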
2020-06-24 | simplify-rtx: Simplify rotates by zero | Roger Sayle | 1 | -0/+2
2020-06-24 Roger Sayle <roger@nextmovesoftware.com> Segher Boessenkool <segher@kernel.crashing.org> * simplify-rtx.c (simplify_unary_operation_1): Simplify rotates by 0.
2020-06-24 | simplify-rtx: Parity of parity is parity | Roger Sayle | 1 | -0/+4
2020-06-24 Roger Sayle <roger@nextmovesoftware.com> * simplify-rtx.c (simplify_unary_operation_1): Simplify (parity (parity x)) as (parity x), i.e. PARITY is idempotent.
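Idempotence follows from the fact that parity yields 0 or 1, and the parity of 0 or 1 is the value itself. A minimal Python sketch, assuming non-negative inputs:

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0 (x >= 0)."""
    return bin(x).count("1") & 1
```

Applying `parity` twice therefore gives the same result as applying it once.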
2020-03-23 | Verify the code used for the optimized comparison is valid for the comparison's mode. | Jeff Law | 1 | -0/+51
PR rtl-optimization/90275 PR target/94238 PR target/94144 * simplify-rtx.c (comparison_code_valid_for_mode): New function. (simplify_logical_relational_operation): Use it. PR target/94144 PR target/94238 * gcc.c-torture/compile/pr94144.c: New test. * gcc.c-torture/compile/pr94238.c: New test.
2020-01-31 | middle-end: Fix logical shift truncation (PR rtl-optimization/91838) | Tamar Christina | 1 | -3/+15
This fixes a fall-out from a patch I had submitted two years ago which started allowing simplify-rtx to fold logical right shifts by offsets a followed by b into >> (a + b). However this can generate inefficient code when the resulting shift count ends up being the same as the size of the shift mode. This will create some undefined behavior on most platforms. This patch changes the code to truncate to 0 if the shift amount goes out of range. Before my older patch this used to happen in combine when it saw the two shifts. However since we combine them here combine never gets a chance to truncate them. The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal with this shift constant but it's better to do the right thing in simplify-rtx. Note that this doesn't take care of the arithmetic shift where you could replace the constant with MODE_BITS (mode) - 1, but that's not a regression so punting it. gcc/ChangeLog: PR rtl-optimization/91838 * simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case to truncate if allowed or reject combination. gcc/testsuite/ChangeLog: PR rtl-optimization/91838 * g++.dg/pr91838.C: New test.
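The arithmetic behind the fix can be sketched in Python. This models only the value computed, assuming a 32-bit mode; it does not reproduce the conditions under which simplify-rtx.c truncates or rejects the combination, and the names are made up for the example:

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def two_shifts(x: int, a: int, b: int) -> int:
    """(x >> a) >> b as two separate in-range logical shifts."""
    return ((x & MASK) >> a) >> b


def folded_shift(x: int, a: int, b: int) -> int:
    """Single-shift equivalent that never uses an out-of-range count."""
    total = a + b
    if total >= WIDTH:
        # Every bit has been shifted out, so the result is the constant 0.
        return 0
    return (x & MASK) >> total
```

For example, shifting a 32-bit value right by 16 twice is in range at each step, but the naive combined count of 32 is not; the folded result is simply zero.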
2020-01-29 | Revert g-465c7c89e92a6d6d582173e505cb16dcb9873034 | Richard Sandiford | 1 | -3/+1
The patch caused regressions in gcc.target/sh/pr64345-1.c on sh3-linux-gnu and gcc.target/m68k/pr39726.c on m68k-linux-gnu. It didn't look like they would be fixable in an acceptably non-invasive and unhacky way, so punting till future releases. 2020-01-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ Revert: 2020-01-28 Richard Sandiford <richard.sandiford@arm.com> PR rtl-optimization/87763 * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract simplification to handle subregs as well as bare regs. * config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
2020-01-28 | simplify-rtx: Extend (truncate (*extract ...)) fold [PR87763] | Richard Sandiford | 1 | -1/+3
In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have: Failed to match this instruction: (set (reg:SI 95) (ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0) (const_int 3 [0x3]) (const_int 0 [0])) 0) (const_int 19 [0x13]))) If we perform the natural simplification to: (set (reg:SI 95) (ashift:SI (sign_extract:SI (reg:SI 97) (const_int 3 [0x3]) (const_int 0 [0])) (const_int 19 [0x13]))) then the pattern matches. And it turns out that we do have a simplification like that already, but it would only kick in for extractions from a reg, not a subreg. E.g.: (set (reg:SI 95) (ashift:SI (subreg:SI (sign_extract:DI (reg:DI X) (const_int 3 [0x3]) (const_int 0 [0])) 0) (const_int 19 [0x13]))) would simplify to: (set (reg:SI 95) (ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0) (const_int 3 [0x3]) (const_int 0 [0])) (const_int 19 [0x13]))) IMO the subreg case is even more obviously a simplification than the bare reg case, since the net effect is to remove either one or two subregs, rather than simply change the position of a subreg/truncation. However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c for -m32 on x86_64-linux-gnu, because we could then simplify a :HI zero_extract to a :QI one. The associated *testqi_ext_3 pattern did already seem to want to handle QImode extractions: "ix86_match_ccmode (insn, CCNOmode) && ((TARGET_64BIT && GET_MODE (operands[2]) == DImode) || GET_MODE (operands[2]) == SImode || GET_MODE (operands[2]) == HImode || GET_MODE (operands[2]) == QImode) but I'm not sure how often the QI case would trigger in practice, since the zero_extract mode was restricted to HI and above. I checked the other x86 patterns and couldn't see any other instances of this. 2020-01-28 Richard Sandiford <richard.sandiford@arm.com> gcc/ PR rtl-optimization/87763 * simplify-rtx.c (simplify_truncation): Extend sign/zero_extract simplification to handle subregs as well as bare regs. 
* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
2020-01-24 | simplify-rtx: Punt for modes with precision above MAX_BITSIZE_MODE_ANY_INT [PR93376] | Jakub Jelinek | 1 | -1/+8
The following patch makes sure we punt in the 3 spots if precision is above MAX_BITSIZE_MODE_ANY_INT. 2020-01-24 Jakub Jelinek <jakub@redhat.com> PR target/93376 * simplify-rtx.c (simplify_const_unary_operation, simplify_const_binary_operation): Punt for mode precision above MAX_BITSIZE_MODE_ANY_INT.
2020-01-01 | Update copyright years. | Jakub Jelinek | 1 | -1/+1
From-SVN: r279813
2019-12-13 | re PR target/92908 (wrong code with -Og -fno-tree-fre -mavx512bw and vector compare) | Jakub Jelinek | 1 | -1/+10
PR target/92908 * simplify-rtx.c (simplify_relational_operation): Punt for vector cmp_mode and scalar mode, if simplify_relational_operation returned const_true_rtx. (simplify_const_relational_operation): Change VOID_mode in function comment to VOIDmode. * gcc.target/i386/avx512bw-pr92908.c: New test. From-SVN: r279369
2019-11-19 | Revert r278441 | Richard Sandiford | 1 | -42/+14
To restore powerpc bootstrap. 2019-11-19 Richard Sandiford <richard.sandiford@arm.com> gcc/ Revert: 2019-11-18 Richard Sandiford <richard.sandiford@arm.com> * cse.c (cse_insn): Delete no-op register moves too. * simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons. Take a second comparison to control the value for NE. (mask_to_comparison): Handle unsigned comparisons. (simplify_logical_relational_operation): Likewise. Update call to comparison_to_mask. Handle AND if !HONOR_NANs. (simplify_binary_operation_1): Call the above for AND too. gcc/testsuite/ Revert: 2019-11-18 Richard Sandiford <richard.sandiford@arm.com> * gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test. From-SVN: r278455
2019-11-18 | Two RTL CC tweaks for SVE pmore/plast conditions | Richard Sandiford | 1 | -14/+42
SVE has two composite conditions: pmore == at least one bit set && last bit clear plast == no bits set || last bit set So in general we generate them from: A: CC = test bits B: reg1 = first condition C: CC = test bits D: reg2 = second condition E: result = (reg1 op reg2) where op is || or && To fold all this into a single test, we need to be able to remove the redundant C (the cse.c patch) and then fold B, D and E down to a single condition (the simplify-rtx.c patch). The underlying conditions are unsigned, so the simplify-rtx.c part needs to support both unsigned comparisons and AND. However, to avoid opening the can of worms that is ANDing FP comparisons for unordered inputs, I've restricted the new AND handling to cases in which NaNs can be ignored. I think this is still a strict extension of what we have now, it just doesn't go as far as it could. Going further would need an entirely different set of testcases so I think would make more sense as separate work. 2019-11-18 Richard Sandiford <richard.sandiford@arm.com> gcc/ * cse.c (cse_insn): Delete no-op register moves too. * simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons. Take a second comparison to control the value for NE. (mask_to_comparison): Handle unsigned comparisons. (simplify_logical_relational_operation): Likewise. Update call to comparison_to_mask. Handle AND if !HONOR_NANs. (simplify_binary_operation_1): Call the above for AND too. gcc/testsuite/ * gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test. From-SVN: r278411
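The two composite conditions can be modelled directly from their definitions. A minimal Python sketch, assuming a predicate result is a non-empty tuple of booleans whose final element stands for the "last bit"; this is a simplification for illustration and ignores how SVE actually derives its flags:

```python
from itertools import product


def pmore(bits: tuple) -> bool:
    """At least one bit set AND last bit clear."""
    return any(bits) and not bits[-1]


def plast(bits: tuple) -> bool:
    """No bits set OR last bit set."""
    return (not any(bits)) or bool(bits[-1])


ALL_4_BIT_PREDICATES = list(product([False, True], repeat=4))
```

Each condition combines two simpler tests with AND or OR, which is why folding them into one comparison needs support for AND as well as IOR; in this model the two conditions are also exact complements of each other, by De Morgan's law.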
2019-11-07 | simplify-rtx: simplify_logical_relational_operation | Segher Boessenkool | 1 | -0/+130
This introduces simplify_logical_relational_operation. Currently the only thing it can simplify is the IOR of two CONDs of the same arguments. * simplify-rtx.c (comparison_to_mask): New function. (mask_to_comparison): New function. (simplify_logical_relational_operation): New function. (simplify_binary_operation_1): Call simplify_logical_relational_operation. From-SVN: r277931
2019-09-21 | Extend neg_const_int simplifications to other const rtxes | Richard Sandiford | 1 | -18/+13
This patch generalises some neg_const_int-based rtx simplifications so that they handle all CONST_SCALAR_INTs and also CONST_POLY_INT. This actually simplifies things a bit, since we no longer have to treat HOST_WIDE_INT_MIN specially. This is tested by later SVE patches. 2019-09-21 Richard Sandiford <richard.sandiford@arm.com> gcc/ * simplify-rtx.c (neg_const_int): Replace with... (neg_poly_int_rtx): ...this new function. (simplify_binary_operation_1): Extend (minus x C) -> (plus X -C) to all CONST_SCALAR_INTs and to CONST_POLY_INT. (simplify_plus_minus): Likewise for constant terms here. From-SVN: r276017
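The (minus x C) -> (plus x -C) rewrite is safe in wrap-around arithmetic for every constant, including the most negative one, whose negation wraps back to itself. A minimal Python sketch on an 8-bit mode; the helper names are illustrative and unrelated to neg_poly_int_rtx:

```python
BITS = 8
MASK = (1 << BITS) - 1


def neg(c: int) -> int:
    """Two's-complement negation modulo 2**BITS."""
    return (-c) & MASK


def minus(x: int, c: int) -> int:
    """x - c modulo 2**BITS."""
    return (x - c) & MASK


def plus(x: int, c: int) -> int:
    """x + c modulo 2**BITS."""
    return (x + c) & MASK
```

Because the identity holds for all 256 constants, the minimum value needs no special case when negation is done at the precision of the mode.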
2019-09-19 | Rework constant subreg folds and handle more variable-length cases | Richard Sandiford | 1 | -311/+591
This patch rewrites the way simplify_subreg handles constants. It uses similar native_encode/native_decode routines to the tree-level handling of VIEW_CONVERT_EXPR, meaning that we can move between rtx constants and the target memory image of them. The main point of this patch is to support subregs of constant-length vectors for VLA vectors, beyond the very simple cases that were already handled. Many of the new tests failed before the patch for variable- length vectors. The boolean side is tested more by the upcoming SVE ACLE work. 2019-09-19 Richard Sandiford <richard.sandiford@arm.com> gcc/ * defaults.h (TARGET_UNIT): New macro. (target_unit): New type. * rtl.h (native_encode_rtx, native_decode_rtx) (native_decode_vector_rtx, subreg_size_lsb): Declare. (subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb. * rtlanal.c (subreg_lsb_1): Delete. (subreg_size_lsb): New function. * simplify-rtx.c: Include rtx-vector-builder.h (simplify_immed_subreg): Delete. (native_encode_rtx, native_decode_vector_rtx, native_decode_rtx) (simplify_const_vector_byte_offset, simplify_const_vector_subreg): New functions. (simplify_subreg): Use them. (test_vector_subregs_modes, test_vector_subregs_repeating) (test_vector_subregs_fore_back, test_vector_subregs_stepped) (test_vector_subregs): New functions. (test_vector_ops): Call test_vector_subregs for integer vector modes with at least 2 elements. From-SVN: r275959
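The encode/decode approach can be illustrated by round-tripping a constant vector through its memory image. A minimal Python sketch, assuming a little-endian target and fixed-length vectors only; `encode_elems`, `decode_elems` and `const_subreg` are Python stand-ins invented for the example, not the GCC routines named in the ChangeLog:

```python
def encode_elems(elems: list, elem_bytes: int) -> bytes:
    """Lay out integer elements as a little-endian memory image."""
    mask = (1 << (8 * elem_bytes)) - 1
    return b"".join((e & mask).to_bytes(elem_bytes, "little") for e in elems)


def decode_elems(image: bytes, elem_bytes: int) -> list:
    """Read integer elements of the given size back out of a memory image."""
    return [
        int.from_bytes(image[i:i + elem_bytes], "little")
        for i in range(0, len(image), elem_bytes)
    ]


def const_subreg(elems: list, inner_bytes: int, outer_bytes: int,
                 byte_offset: int, outer_count: int) -> list:
    """Reinterpret part of a constant vector with a different element size."""
    image = encode_elems(elems, inner_bytes)
    piece = image[byte_offset:byte_offset + outer_bytes * outer_count]
    return decode_elems(piece, outer_bytes)
```

Viewing four 16-bit elements as two 32-bit elements, or picking two bytes out of the middle, then needs no per-mode special cases: `const_subreg([0x1122, 0x3344, 0x5566, 0x7788], 2, 4, 0, 2)` yields `[0x33441122, 0x77885566]` under the little-endian assumption.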
2019-07-29Generalise VEC_DUPLICATE folding for variable-length vectorsRichard Sandiford1-13/+26
This patch uses the constant vector encoding scheme to handle more cases of a VEC_DUPLICATE of another vector. Duplicating any fixed-length vector is fine, and duplicating a variable-length vector is OK as long as that vector is also a duplicate of a fixed-length sequence. Other cases fell through to: if (VECTOR_MODE_P (mode) && GET_CODE (op) == CONST_VECTOR) which was only expecting to deal with elementwise operations. 2019-07-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ * simplify-rtx.c (simplify_const_unary_operation): Fold a VEC_DUPLICATE of a fixed-length vector even if the result is variable-length. Likewise fold a duplicate of a variable-length vector if the variable-length vector is itself a duplicate of a fixed-length sequence. (test_vector_ops_duplicate): Test more cases. From-SVN: r273868
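The reason a duplicate of a fixed-length vector can be folded even when the result length is not known at compile time can be sketched as follows: each result element depends only on its index modulo the length of the repeated sequence. The function names are hypothetical and std::vector stands in for an rtx constant vector; this is a model of the idea, not the rtx_vector_builder code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: element i of a vector formed by repeating the
// fixed-length sequence `seq` as many times as needed.
int duplicated_element(const std::vector<int> &seq, std::size_t i) {
  assert(!seq.empty());
  return seq[i % seq.size()];
}

// Materialise the first `nelts` elements of the duplicate.
std::vector<int> duplicate_prefix(const std::vector<int> &seq, std::size_t nelts) {
  std::vector<int> out;
  out.reserve(nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    out.push_back(duplicated_element(seq, i));
  return out;
}
```

Because duplicated_element never needs the total element count, the same description covers a result of any length, which is the property the fold relies on in this model.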
2019-07-29Implement more rtx vector folds on variable-length vectorsRichard Sandiford1-25/+114
This patch extends the tree-level folding of variable-length vectors so that it can also be used on rtxes. The first step is to move the tree_vector_builder new_unary/binary_operator routines to the parent vector_builder class (which in turn means adding a new template parameter). The second step is to make simplify-rtx.c use a direct rtx analogue of the VECTOR_CST handling in fold-const.c. 2019-07-29 Richard Sandiford <richard.sandiford@arm.com> gcc/ * vector-builder.h (vector_builder): Add a shape template parameter. (vector_builder::new_unary_operation): New function, generalizing the old tree_vector_builder function. (vector_builder::new_binary_operation): Likewise. (vector_builder::binary_encoded_nelts): Likewise. * int-vector-builder.h (int_vector_builder): Update template parameters to vector_builder. (int_vector_builder::shape_nelts): New function. * rtx-vector-builder.h (rtx_vector_builder): Update template parameters to vector_builder. (rtx_vector_builder::shape_nelts): New function. (rtx_vector_builder::nelts_of): Likewise. (rtx_vector_builder::npatterns_of): Likewise. (rtx_vector_builder::nelts_per_pattern_of): Likewise. * tree-vector-builder.h (tree_vector_builder): Update template parameters to vector_builder. (tree_vector_builder::shape_nelts): New function. (tree_vector_builder::nelts_of): Likewise. (tree_vector_builder::npatterns_of): Likewise. (tree_vector_builder::nelts_per_pattern_of): Likewise. * tree-vector-builder.c (tree_vector_builder::new_unary_operation) (tree_vector_builder::new_binary_operation): Delete. (tree_vector_builder::binary_encoded_nelts): Likewise. * simplify-rtx.c: Include rtx-vector-builder.h. (distributes_over_addition_p): New function. (simplify_const_unary_operation) (simplify_const_binary_operation): Generalize handling of vector constants to include variable-length vectors. (test_vector_ops_series): Add more tests. From-SVN: r273867
2019-07-09simplify-rtx.c (simplify_unary_operation_1): Use GET_MODE_PRECISION rather ↵John Darrington1-4/+4
than GET_MODE_BITSIZE to better handle partial... 2019-07-09 John Darrington <john@darrington.wattle.id.au> * simplify-rtx.c (simplify_unary_operation_1): Use GET_MODE_PRECISION rather than GET_MODE_BITSIZE to better handle partial integer modes. From-SVN: r273312
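A small sketch of why the distinction matters for partial integer modes: the storage size of a mode can exceed the number of bits that carry value, so masks and sign-bit positions should be derived from the precision. The ModeInfo struct, the helper names and the 24-bits-in-32 example mode are assumptions made for illustration; they are not GCC data structures.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of an integer mode: storage size in bits and
// the number of bits that actually carry value.
struct ModeInfo {
  unsigned bitsize;
  unsigned precision;
};

// Mask of the sign bit, derived from the precision.
uint64_t sign_bit_mask(const ModeInfo &m) {
  assert(m.precision >= 1 && m.precision <= 64);
  return uint64_t(1) << (m.precision - 1);
}

// Mask of all value bits, derived from the precision.
uint64_t value_mask(const ModeInfo &m) {
  assert(m.precision >= 1 && m.precision <= 64);
  return m.precision == 64 ? ~uint64_t(0) : (uint64_t(1) << m.precision) - 1;
}
```

For a mode with bitsize 32 and precision 24, sign_bit_mask selects bit 23; a computation based on the bitsize would select bit 31, which lies outside the value bits.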
2019-07-04re PR target/88833 ([SVE] Redundant moves for WHILELO-based loops)Prathamesh Kulkarni1-0/+11
2019-07-04 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> PR target/88833 * fwprop.c (reg_single_def_p): New function. (propagate_rtx_1): Add unconditional else inside RTX_EXTRA case. (forward_propagate_into): New parameter reg_prop_only with default value false. Propagate def's src into loop only if SET_SRC and SET_DEST of def_set have single definitions. Likewise if reg_prop_only is set to true. (fwprop): New param fwprop_addr_p. Integrate fwprop_addr into fwprop. (fwprop_addr): Remove. (pass_rtl_fwprop_addr::execute): Call fwprop with arg set to true. (pass_rtl_fwprop::execute): Call fwprop with arg set to false. * simplify-rtx.c (simplify_subreg): Add case for vector comparison. * config/i386/sse.md (UNSPEC_BLENDV): Adjust pattern. testsuite/ * gfortran.dg/pr88833.f90: New test. From-SVN: r273040
2019-02-24re PR rtl-optimization/89445 (_mm512_maskz_loadu_pd "forgets" to use the mask)Jakub Jelinek1-2/+4
PR rtl-optimization/89445 * simplify-rtx.c (simplify_ternary_operation): Don't use simplify_merge_mask on operands that may trap. * rtlanal.c (may_trap_p_1): Use FLOAT_MODE_P instead of SCALAR_FLOAT_MODE_P checks. For integral division by zero, if second operand is CONST_VECTOR, check if any element could be zero. Don't expect traps for VEC_{MERGE,SELECT,CONCAT,DUPLICATE} unless their operands can trap. * gcc.target/i386/avx512f-pr89445.c: New test. From-SVN: r269176
2019-01-09PR other/16615 [1/5]Sandra Loosemore1-1/+1
2019-01-09 Sandra Loosemore <sandra@codesourcery.com> PR other/16615 [1/5] contrib/ * mklog: Mechanically replace "can not" with "cannot". gcc/ * Makefile.in: Mechanically replace "can not" with "cannot". * alias.c: Likewise. * builtins.c: Likewise. * calls.c: Likewise. * cgraph.c: Likewise. * cgraph.h: Likewise. * cgraphclones.c: Likewise. * cgraphunit.c: Likewise. * combine-stack-adj.c: Likewise. * combine.c: Likewise. * common/config/i386/i386-common.c: Likewise. * config/aarch64/aarch64.c: Likewise. * config/alpha/sync.md: Likewise. * config/arc/arc.c: Likewise. * config/arc/predicates.md: Likewise. * config/arm/arm-c.c: Likewise. * config/arm/arm.c: Likewise. * config/arm/arm.h: Likewise. * config/arm/arm.md: Likewise. * config/arm/cortex-r4f.md: Likewise. * config/csky/csky.c: Likewise. * config/csky/csky.h: Likewise. * config/darwin-f.c: Likewise. * config/epiphany/epiphany.md: Likewise. * config/i386/i386.c: Likewise. * config/i386/sol2.h: Likewise. * config/m68k/m68k.c: Likewise. * config/mcore/mcore.h: Likewise. * config/microblaze/microblaze.md: Likewise. * config/mips/20kc.md: Likewise. * config/mips/sb1.md: Likewise. * config/nds32/nds32.c: Likewise. * config/nds32/predicates.md: Likewise. * config/pa/pa.c: Likewise. * config/rs6000/e300c2c3.md: Likewise. * config/rs6000/rs6000.c: Likewise. * config/s390/s390.h: Likewise. * config/sh/sh.c: Likewise. * config/sh/sh.md: Likewise. * config/spu/vmx2spu.h: Likewise. * cprop.c: Likewise. * dbxout.c: Likewise. * df-scan.c: Likewise. * doc/cfg.texi: Likewise. * doc/extend.texi: Likewise. * doc/fragments.texi: Likewise. * doc/gty.texi: Likewise. * doc/invoke.texi: Likewise. * doc/lto.texi: Likewise. * doc/md.texi: Likewise. * doc/objc.texi: Likewise. * doc/rtl.texi: Likewise. * doc/tm.texi: Likewise. * dse.c: Likewise. * emit-rtl.c: Likewise. * emit-rtl.h: Likewise. * except.c: Likewise. * expmed.c: Likewise. * expr.c: Likewise. * fold-const.c: Likewise. * genautomata.c: Likewise. * gimple-fold.c: Likewise. 
* hard-reg-set.h: Likewise. * ifcvt.c: Likewise. * ipa-comdats.c: Likewise. * ipa-cp.c: Likewise. * ipa-devirt.c: Likewise. * ipa-fnsummary.c: Likewise. * ipa-icf.c: Likewise. * ipa-inline-transform.c: Likewise. * ipa-inline.c: Likewise. * ipa-polymorphic-call.c: Likewise. * ipa-profile.c: Likewise. * ipa-prop.c: Likewise. * ipa-pure-const.c: Likewise. * ipa-reference.c: Likewise. * ipa-split.c: Likewise. * ipa-visibility.c: Likewise. * ipa.c: Likewise. * ira-build.c: Likewise. * ira-color.c: Likewise. * ira-conflicts.c: Likewise. * ira-costs.c: Likewise. * ira-int.h: Likewise. * ira-lives.c: Likewise. * ira.c: Likewise. * ira.h: Likewise. * loop-invariant.c: Likewise. * loop-unroll.c: Likewise. * lower-subreg.c: Likewise. * lra-assigns.c: Likewise. * lra-constraints.c: Likewise. * lra-eliminations.c: Likewise. * lra-lives.c: Likewise. * lra-remat.c: Likewise. * lra-spills.c: Likewise. * lra.c: Likewise. * lto-cgraph.c: Likewise. * lto-streamer-out.c: Likewise. * postreload-gcse.c: Likewise. * predict.c: Likewise. * profile-count.h: Likewise. * profile.c: Likewise. * recog.c: Likewise. * ree.c: Likewise. * reload.c: Likewise. * reload1.c: Likewise. * reorg.c: Likewise. * resource.c: Likewise. * rtl.def: Likewise. * rtl.h: Likewise. * rtlanal.c: Likewise. * sched-deps.c: Likewise. * sched-ebb.c: Likewise. * sched-rgn.c: Likewise. * sel-sched-ir.c: Likewise. * sel-sched.c: Likewise. * shrink-wrap.c: Likewise. * simplify-rtx.c: Likewise. * symtab.c: Likewise. * target.def: Likewise. * toplev.c: Likewise. * tree-call-cdce.c: Likewise. * tree-cfg.c: Likewise. * tree-complex.c: Likewise. * tree-core.h: Likewise. * tree-eh.c: Likewise. * tree-inline.c: Likewise. * tree-loop-distribution.c: Likewise. * tree-nrv.c: Likewise. * tree-profile.c: Likewise. * tree-sra.c: Likewise. * tree-ssa-alias.c: Likewise. * tree-ssa-dce.c: Likewise. * tree-ssa-dom.c: Likewise. * tree-ssa-forwprop.c: Likewise. * tree-ssa-loop-im.c: Likewise. * tree-ssa-loop-ivcanon.c: Likewise. 
* tree-ssa-loop-ivopts.c: Likewise. * tree-ssa-loop-niter.c: Likewise. * tree-ssa-phionlycprop.c: Likewise. * tree-ssa-phiopt.c: Likewise. * tree-ssa-propagate.c: Likewise. * tree-ssa-threadedge.c: Likewise. * tree-ssa-threadupdate.c: Likewise. * tree-ssa-uninit.c: Likewise. * tree-ssanames.c: Likewise. * tree-streamer-out.c: Likewise. * tree.c: Likewise. * tree.h: Likewise. * vr-values.c: Likewise. gcc/ada/ * exp_ch9.adb: Mechanically replace "can not" with "cannot". * libgnat/s-regpat.ads: Likewise. * par-ch4.adb: Likewise. * set_targ.adb: Likewise. * types.ads: Likewise. gcc/cp/ * cp-tree.h: Mechanically replace "can not" with "cannot". * parser.c: Likewise. * pt.c: Likewise. gcc/fortran/ * class.c: Mechanically replace "can not" with "cannot". * decl.c: Likewise. * expr.c: Likewise. * gfc-internals.texi: Likewise. * intrinsic.texi: Likewise. * invoke.texi: Likewise. * io.c: Likewise. * match.c: Likewise. * parse.c: Likewise. * primary.c: Likewise. * resolve.c: Likewise. * symbol.c: Likewise. * trans-array.c: Likewise. * trans-decl.c: Likewise. * trans-intrinsic.c: Likewise. * trans-stmt.c: Likewise. gcc/go/ * go-backend.c: Mechanically replace "can not" with "cannot". * go-gcc.cc: Likewise. gcc/lto/ * lto-partition.c: Mechanically replace "can not" with "cannot". * lto-symtab.c: Likewise. * lto.c: Likewise. gcc/objc/ * objc-act.c: Mechanically replace "can not" with "cannot". libbacktrace/ * backtrace.h: Mechanically replace "can not" with "cannot". libgcc/ * config/c6x/libunwind.S: Mechanically replace "can not" with "cannot". * config/tilepro/atomic.h: Likewise. * config/vxlib-tls.c: Likewise. * generic-morestack-thread.c: Likewise. * generic-morestack.c: Likewise. * mkmap-symver.awk: Likewise. libgfortran/ * caf/single.c: Mechanically replace "can not" with "cannot". * io/unit.c: Likewise. libobjc/ * class.c: Mechanically replace "can not" with "cannot". * objc/runtime.h: Likewise. * sendmsg.c: Likewise. 
liboffloadmic/ * include/coi/common/COIResult_common.h: Mechanically replace "can not" with "cannot". * include/coi/source/COIBuffer_source.h: Likewise. libstdc++-v3/ * include/ext/bitmap_allocator.h: Mechanically replace "can not" with "cannot". From-SVN: r267783
2019-01-01Update copyright years.Jakub Jelinek1-1/+1
From-SVN: r267494
2018-11-13re PR rtl-optimization/87918 (ICE in simplify_binary_operation, at ↵Jakub Jelinek1-3/+13
simplify-rtx.c:2153 since r264688) PR rtl-optimization/87918 * simplify-rtx.c (simplify_merge_mask): For COMPARISON_P, use simplify_gen_relational rather than simplify_gen_binary. * gcc.target/i386/pr87918.c: New test. From-SVN: r266062
2018-11-06re PR middle-end/18041 (OR of two single-bit bitfields is inefficient)Richard Biener1-0/+32
2018-11-06 Richard Biener <rguenther@suse.de> PR middle-end/18041 * simplify-rtx.c (simplify_binary_operation_1): Add pattern matching bitfield insertion. * gcc.target/i386/pr18041-1.c: New testcase. * gcc.target/i386/pr18041-2.c: Likewise. From-SVN: r265829
2018-10-18Limit mask of vec_merge to HOST_BITS_PER_WIDE_INTH.J. Lu1-0/+3
Since the mask of vec_merge is held in a HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is the maximum number of vector elements. * simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to HOST_BITS_PER_WIDE_INT. (test_vector_ops_duplicate): Likewise. From-SVN: r265290
2018-10-18Call simplify_gen_subreg to simplify subreg of vec_mergeH.J. Lu1-6/+7
Simplify (subreg (vec_merge (X) (vector) (const_int ((1 << N) | M))) (N * sizeof (outermode))) to (subreg (X) (N * sizeof (outermode))) * simplify-rtx.c (simplify_subreg): Call simplify_gen_subreg to simplify subreg of vec_merge. From-SVN: r265267
2018-10-18Simplify subreg of vec_merge of vec_duplicateH.J. Lu1-1/+28
We can simplify (subreg (vec_merge (vec_duplicate X) (vector) (const_int ((1 << N) | M))) (N * sizeof (X))) to X when the mode of X is the same as the mode of the subreg. gcc/ PR target/87537 * simplify-rtx.c (simplify_subreg): Simplify subreg of vec_merge of vec_duplicate. (test_vector_ops_duplicate): Add test for a scalar subreg of a VEC_MERGE of a VEC_DUPLICATE. gcc/testsuite/ PR target/87537 * gcc.target/i386/pr87537-1.c: New test. From-SVN: r265260
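The rule above can be checked on a toy model with four integer lanes. The sketch assumes that byte offset N * sizeof (X) addresses lane N, and it uses std::array in place of rtx vectors; Vec4, lane and the function bodies are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int, 4>;

// Model of vec_duplicate: every lane holds x.
Vec4 vec_duplicate(int x) { return Vec4{{x, x, x, x}}; }

// Model of vec_merge: a set bit i in mask takes lane i from a,
// a clear bit takes it from b.
Vec4 vec_merge(const Vec4 &a, const Vec4 &b, uint64_t mask) {
  Vec4 r{};
  for (unsigned i = 0; i < r.size(); ++i)
    r[i] = ((mask >> i) & 1) ? a[i] : b[i];
  return r;
}

// Model of the scalar subreg: lane n stands in for byte offset n * sizeof(X).
int lane(const Vec4 &v, unsigned n) { return v[n]; }
```

With mask (1 << 2) | 1, lane 2 of the merge comes from the duplicate and therefore equals X, while a lane whose mask bit is clear still comes from the other vector.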
2018-09-28Simplify vec_merge according to the mask.Andrew Stubbs1-0/+136
This patch was part of the original patch we acquired from Honza and Martin. It simplifies nested vec_merge operations using the same mask. Self-tests are included. 2018-09-28 Andrew Stubbs <ams@codesourcery.com> Jan Hubicka <jh@suse.cz> Martin Jambor <mjambor@suse.cz> * simplify-rtx.c (simplify_merge_mask): New function. (simplify_ternary_operation): Use it, also see if VEC_MERGEs with the same masks are used in op1 or op2. (test_vec_merge): New function. (test_vector_ops): Call test_vec_merge. Co-Authored-By: Jan Hubicka <jh@suse.cz> Co-Authored-By: Martin Jambor <mjambor@suse.cz> From-SVN: r264688
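The identities behind simplifying nested vec_merge operations that share a mask can be sketched on a four-lane model. Here a set mask bit selects the lane from the first operand and a clear bit from the second; Vec4 and the function body are assumptions for illustration and do not reproduce simplify_merge_mask.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<int, 4>;

// Model of vec_merge: a set bit i in mask takes lane i from a,
// a clear bit takes it from b.
Vec4 vec_merge(const Vec4 &a, const Vec4 &b, uint64_t mask) {
  Vec4 r{};
  for (unsigned i = 0; i < r.size(); ++i)
    r[i] = ((mask >> i) & 1) ? a[i] : b[i];
  return r;
}
```

In this model, (vec_merge (vec_merge a b m) c m) equals (vec_merge a c m), because the lanes where the inner merge would have taken b are exactly the lanes the outer merge takes from c. By the same argument, (vec_merge a (vec_merge b c m) m) also equals (vec_merge a c m).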
2018-09-19Remove constant vec_select restriction.Andrew Stubbs1-2/+7
The vec_select operator is documented to require a const_int for the lane selector operand, but GCN has an instruction that can select the lane at runtime, so it seems reasonable to remove this restriction. This patch simply replaces assertions that the operand is constant with early exits from the optimizers. I think it's reasonable that vec_select with a non-constant operand cannot be optimized, yet. Also included is the necessary documentation tweak. 2018-09-19 Andrew Stubbs <ams@codesourcery.com> gcc/ * doc/rtl.texi: Adjust vec_select description. * simplify-rtx.c (simplify_binary_operation_1): Allow VEC_SELECT to use non-constant selectors. From-SVN: r264423
2018-07-07tree-vrp.c (vrp_int_const_binop): Change overflow type to overflow_type.Aldy Hernandez1-1/+1
* tree-vrp.c (vrp_int_const_binop): Change overflow type to overflow_type. (combine_bound): Use wide-int overflow calculation instead of rolling our own. * calls.c (maybe_warn_alloc_args_overflow): Change overflow type to overflow_type. * fold-const.c (int_const_binop_2): Same. (extract_muldiv_1): Same. (fold_div_compare): Same. (fold_abs_const): Same. * match.pd: Same. * poly-int.h (add): Same. (sub): Same. (neg): Same. (mul): Same. * predict.c (predict_iv_comparison): Same. * profile-count.c (slow_safe_scale_64bit): Same. * simplify-rtx.c (simplify_const_binary_operation): Same. * tree-chrec.c (tree_fold_binomial): Same. * tree-data-ref.c (split_constant_offset_1): Same. * tree-if-conv.c (idx_within_array_bound): Same. * tree-scalar-evolution.c (iv_can_overflow_p): Same. * tree-ssa-phiopt.c (minmax_replacement): Same. * tree-vect-loop.c (is_nonwrapping_integer_induction): Same. * tree-vect-stmts.c (vect_truncate_gather_scatter_offset): Same. * vr-values.c (vr_values::adjust_range_with_scev): Same. * wide-int.cc (wi::add_large): Same. (wi::mul_internal): Same. (wi::sub_large): Same. (wi::divmod_internal): Same. * wide-int.h: Change overflow type to overflow_type for neg, add, mul, smul, umul, div_trunc, div_floor, div_ceil, div_round, mod_trunc, mod_ceil, mod_round, add_large, sub_large, mul_internal, divmod_internal. (overflow_type): New enum. (accumulate_overflow): New. cp/ * decl.c (build_enumerator): Change overflow type to overflow_type. * init.c (build_new_1): Same. From-SVN: r262494
2018-06-12Use poly_int rtx accessors instead of hwi accessorsRichard Sandiford1-13/+7
This patch generalises various places that used hwi rtx accessors so that they can handle poly_ints instead. In many cases these changes are by inspection rather than because something had shown them to be necessary. 2018-06-12 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * poly-int.h (can_div_trunc_p): Add new overload in which all values are poly_ints. * alias.c (get_addr): Extend CONST_INT handling to poly_int_rtx_p. (memrefs_conflict_p): Likewise. (init_alias_analysis): Likewise. * cfgexpand.c (expand_debug_expr): Likewise. * combine.c (combine_simplify_rtx, force_int_to_mode): Likewise. * cse.c (fold_rtx): Likewise. * explow.c (adjust_stack, anti_adjust_stack): Likewise. * expr.c (emit_block_move_hints): Likewise. (clear_storage_hints, push_block, emit_push_insn): Likewise. (store_expr_with_bounds, reduce_to_bit_field_precision): Likewise. (emit_group_load_1): Use rtx_to_poly_int64 for group offsets. (emit_group_store): Likewise. (find_args_size_adjust): Use strip_offset. Use rtx_to_poly_int64 to read the PRE/POST_MODIFY increment. * calls.c (store_one_arg): Use strip_offset. * rtlanal.c (rtx_addr_can_trap_p_1): Extend CONST_INT handling to poly_int_rtx_p. (set_noop_p): Use rtx_to_poly_int64 for the elements selected by a VEC_SELECT. * simplify-rtx.c (avoid_constant_pool_reference): Use strip_offset. (simplify_binary_operation_1): Extend CONST_INT handling to poly_int_rtx_p. * var-tracking.c (compute_cfa_pointer): Take a poly_int64 rather than a HOST_WIDE_INT. (hard_frame_pointer_adjustment): Change from HOST_WIDE_INT to poly_int64. (adjust_mems, add_stores): Update accordingly. (vt_canonicalize_addr): Track polynomial offsets. (emit_note_insn_var_location): Likewise. (vt_add_function_parameter): Likewise. (vt_initialize): Likewise. From-SVN: r261530
2018-05-17[patch AArch64] Do not perform a vector splat for vector initialisation if ↵James Greenhalgh1-0/+54
it is not useful In the testcase in this patch we create an SLP vector with only two elements. Our current vector initialisation code will first duplicate the first element to both lanes, then overwrite the top lane with a new value. This duplication can be clunky and wasteful. Better would be to simply use the fact that we will always be overwriting the remaining bits, and simply move the first element to the correct place (implicitly zeroing all other bits). This reduces the code generation for this case, and can allow more efficient addressing modes, and other second order benefits for AArch64 code which has been vectorized to V2DI mode. Note that the change is generic enough to catch the case for any vector mode, but is expected to be most useful for 2x64-bit vectorization. Unfortunately, on its own, this would cause failures in gcc.target/aarch64/load_v2vec_lanes_1.c and gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more vec_merge and vec_duplicate for their simplifications to apply. To fix this, add a special case to the AArch64 code if we are loading from two memory addresses, and use the load_pair_lanes patterns directly. We also need a new pattern in simplify-rtx.c:simplify_ternary_operation to catch: (vec_merge:OUTER (vec_duplicate:OUTER x:INNER) (subreg:OUTER y:INNER 0) (const_int N)) And simplify it to: (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x) This is similar to the existing patterns which are tested in this function, without requiring the second operand to also be a vec_duplicate. * config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code generation for cases where splatting a value is not useful. * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge across a vec_duplicate and a paradoxical subreg forming a vector mode to a vec_concat. * gcc.target/aarch64/vect-slp-dup.c: New. Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com> From-SVN: r260309
2018-04-25re PR middle-end/85414 (ICE: in ix86_expand_prologue, at ↵Jakub Jelinek1-2/+2
config/i386/i386.c:13810 with -Og -fgcse) PR middle-end/85414 * simplify-rtx.c (simplify_unary_operation_1) <case SIGN_EXTEND, case ZERO_EXTEND>: Pass SUBREG_REG (op) rather than op to gen_lowpart_no_emit. From-SVN: r259649
2018-04-13re PR rtl-optimization/85376 (wrong code with -Og -fno-dce -fgcse ↵Jakub Jelinek1-2/+2
-fno-tree-ccp -fno-tree-copy-prop) PR rtl-optimization/85376 * simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX instead of a specific value. * gcc.dg/pr85376.c: New test. From-SVN: r259377
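The behaviour described here, declining to fold rather than choosing a value when the target leaves the result at zero undefined, can be sketched with a small constant folder for a 32-bit count-leading-zeros. The function name and its parameters are hypothetical stand-ins; defined_at_zero and value_at_zero play the role that C?Z_DEFINED_VALUE_AT_ZERO plays in the commit message, and a false return corresponds to returning NULL_RTX.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of folding a constant 32-bit CLZ.
// Returns false ("no fold") when the operand is zero and the target gives
// no defined value there; otherwise stores the folded value in *result.
bool fold_clz32(uint32_t op, bool defined_at_zero, int value_at_zero, int *result) {
  if (op == 0) {
    if (!defined_at_zero)
      return false;  // leave the expression alone rather than pick a value
    *result = value_at_zero;
    return true;
  }
  int n = 0;
  // op is nonzero here, so the loop stops at the highest set bit.
  for (uint32_t bit = UINT32_C(1) << 31; (op & bit) == 0; bit >>= 1)
    ++n;
  *result = n;
  return true;
}
```

A caller in this model treats a false return as "keep the original expression", so no arbitrary value is ever substituted for an undefined CLZ of zero.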
2018-03-21re PR rtl-optimization/84989 (_mm512_broadcast_f32x4 triggers ICE in ↵Jakub Jelinek1-1/+3
simplify_const_unary_operation, at simplify-rtx.c:1731) PR rtl-optimization/84989 * simplify-rtx.c (simplify_unary_operation_1): Don't try to simplify VEC_DUPLICATE with scalar result mode. * gcc.target/i386/pr84989.c: New test. From-SVN: r258709
2018-01-20re PR target/83930 (ICE: RTL check: expected code 'const_int', have 'mem' in ↵Jakub Jelinek1-1/+2
simplify_binary_operation_1, at simplify-rtx.c:3302) PR target/83930 * simplify-rtx.c (simplify_binary_operation_1) <case UMOD>: Use UINTVAL (trueop1) instead of INTVAL (op1). * gcc.dg/pr83930.c: New test. From-SVN: r256915