As a general principle, vec_duplicate should be as close to the root
of an expression as possible. Where unary operations have
vec_duplicate as an argument, these operations should be pushed
inside the vec_duplicate.
This patch modifies unary operation simplification to push
sign/zero-extension of a scalar inside vec_duplicate.
This patch also updates all RTL patterns in aarch64-simd.md to use
the new canonical form.
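The canonicalization described above can be illustrated with a small scalar model. This is only a sketch: the four-lane vector and the names `extend_of_dup`, `dup_of_extend` and `check_dup_extend_forms` are assumptions made for illustration and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 4 /* illustrative lane count, an assumption of this sketch */

/* Old form: (sign_extend:V4SI (vec_duplicate:V4HI x)).
   Broadcast the narrow scalar, then widen every lane.  */
void extend_of_dup(int16_t x, int32_t out[NLANES])
{
    int16_t narrow[NLANES];
    for (int i = 0; i < NLANES; i++)
        narrow[i] = x;
    for (int i = 0; i < NLANES; i++)
        out[i] = (int32_t) narrow[i];
}

/* Canonical form: (vec_duplicate:V4SI (sign_extend:SI x)).
   Widen the scalar once, then broadcast it.  */
void dup_of_extend(int16_t x, int32_t out[NLANES])
{
    int32_t wide = (int32_t) x;
    for (int i = 0; i < NLANES; i++)
        out[i] = wide;
}

/* Assert that both forms agree lane by lane for every 16-bit input.  */
void check_dup_extend_forms(void)
{
    for (int v = INT16_MIN; v <= INT16_MAX; v++) {
        int32_t a[NLANES], b[NLANES];
        extend_of_dup((int16_t) v, a);
        dup_of_extend((int16_t) v, b);
        for (int i = 0; i < NLANES; i++)
            assert(a[i] == b[i]);
    }
}
```

Both forms compute the same lanes; the canonical one simply performs the extension once on the scalar, which keeps vec_duplicate nearer the root of the expression.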
gcc/ChangeLog:
2021-07-19 Jonathan Wright <jonathan.wright@arm.com>
* config/aarch64/aarch64-simd.md: Push sign/zero-extension
inside vec_duplicate for all patterns.
* simplify-rtx.c (simplify_context::simplify_unary_operation_1):
Push sign/zero-extension inside vec_duplicate.
|
|
gcc/c-family/ChangeLog:
* c-common.c (c_build_shufflevector): Adjust by-value argument to
by-const-reference.
* c-common.h (c_build_shufflevector): Same.
gcc/c/ChangeLog:
* c-tree.h (c_build_function_call_vec): Adjust by-value argument to
by-const-reference.
* c-typeck.c (c_build_function_call_vec): Same.
gcc/ChangeLog:
* cfgloop.h (single_likely_exit): Adjust by-value argument to
by-const-reference.
* cfgloopanal.c (single_likely_exit): Same.
* cgraph.h (struct cgraph_node): Same.
* cgraphclones.c (cgraph_node::create_virtual_clone): Same.
* genautomata.c (merge_states): Same.
* genextract.c (VEC_char_to_string): Same.
* genmatch.c (dt_node::gen_kids_1): Same.
(walk_captures): Adjust by-value argument to by-reference.
* gimple-ssa-store-merging.c (check_no_overlap): Adjust by-value argument
to by-const-reference.
* gimple.c (gimple_build_call_vec): Same.
(gimple_build_call_internal_vec): Same.
(gimple_build_switch): Same.
(sort_case_labels): Same.
(preprocess_case_label_vec_for_gimple): Adjust by-value argument to
by-reference.
* gimple.h (gimple_build_call_vec): Adjust by-value argument to
by-const-reference.
(gimple_build_call_internal_vec): Same.
(gimple_build_switch): Same.
(sort_case_labels): Same.
(preprocess_case_label_vec_for_gimple): Adjust by-value argument to
by-reference.
* haifa-sched.c (calc_priorities): Adjust by-value argument to
by-const-reference.
(sched_init_luids): Same.
(haifa_init_h_i_d): Same.
* ipa-cp.c (ipa_get_indirect_edge_target_1): Same.
(adjust_callers_for_value_intersection): Adjust by-value argument to
by-reference.
(find_more_scalar_values_for_callers_subset): Adjust by-value argument to
by-const-reference.
(find_more_contexts_for_caller_subset): Same.
(find_aggregate_values_for_callers_subset): Same.
(copy_useful_known_contexts): Same.
* ipa-fnsummary.c (remap_edge_summaries): Same.
(remap_freqcounting_predicate): Same.
* ipa-inline.c (add_new_edges_to_heap): Adjust by-value argument to
by-reference.
* ipa-predicate.c (predicate::remap_after_inlining): Adjust by-value argument
to by-const-reference.
* ipa-predicate.h (predicate::remap_after_inlining): Same.
* ipa-prop.c (ipa_find_agg_cst_for_param): Same.
* ipa-prop.h (ipa_find_agg_cst_for_param): Same.
* ira-build.c (ira_loop_tree_body_rev_postorder): Same.
* read-rtl.c (add_overload_instance): Same.
* rtl.h (native_decode_rtx): Same.
(native_decode_vector_rtx): Same.
* sched-int.h (sched_init_luids): Same.
(haifa_init_h_i_d): Same.
* simplify-rtx.c (native_decode_vector_rtx): Same.
(native_decode_rtx): Same.
* tree-call-cdce.c (gen_shrink_wrap_conditions): Same.
(shrink_wrap_one_built_in_call_with_conds): Same.
(shrink_wrap_conditional_dead_built_in_calls): Same.
* tree-data-ref.c (create_runtime_alias_checks): Same.
(compute_all_dependences): Same.
* tree-data-ref.h (compute_all_dependences): Same.
(create_runtime_alias_checks): Same.
(index_in_loop_nest): Same.
* tree-if-conv.c (mask_exists): Same.
* tree-loop-distribution.c (class loop_distribution): Same.
(loop_distribution::create_rdg_vertices): Same.
(dump_rdg_partitions): Same.
(debug_rdg_partitions): Same.
(partition_contains_all_rw): Same.
(loop_distribution::distribute_loop): Same.
* tree-parloops.c (oacc_entry_exit_ok_1): Same.
(oacc_entry_exit_single_gang): Same.
* tree-ssa-loop-im.c (hoist_memory_references): Same.
(loop_suitable_for_sm): Same.
* tree-ssa-loop-niter.c (bound_index): Same.
* tree-ssa-reassoc.c (update_ops): Same.
(swap_ops_for_binary_stmt): Same.
(rewrite_expr_tree): Same.
(rewrite_expr_tree_parallel): Same.
* tree-ssa-sccvn.c (ao_ref_init_from_vn_reference): Same.
* tree-ssa-sccvn.h (ao_ref_init_from_vn_reference): Same.
* tree-ssa-structalias.c (process_all_all_constraints): Same.
(make_constraints_to): Same.
(handle_lhs_call): Same.
(find_func_aliases_for_builtin_call): Same.
(sort_fieldstack): Same.
(check_for_overlaps): Same.
* tree-vect-loop-manip.c (vect_create_cond_for_align_checks): Same.
(vect_create_cond_for_unequal_addrs): Same.
(vect_create_cond_for_lower_bounds): Same.
(vect_create_cond_for_alias_checks): Same.
* tree-vect-slp-patterns.c (vect_validate_multiplication): Same.
* tree-vect-slp.c (vect_analyze_slp_instance): Same.
(vect_make_slp_decision): Same.
(vect_slp_bbs): Same.
(duplicate_and_interleave): Same.
(vect_transform_slp_perm_load): Same.
(vect_schedule_slp): Same.
* tree-vectorizer.h (vect_transform_slp_perm_load): Same.
(vect_schedule_slp): Same.
(duplicate_and_interleave): Same.
* tree.c (build_vector_from_ctor): Same.
(build_vector): Same.
(check_vector_cst): Same.
(check_vector_cst_duplicate): Same.
(check_vector_cst_fill): Same.
(check_vector_cst_stepped): Same.
* tree.h (build_vector_from_ctor): Same.
|
|
Add a new RTL simplification for the case of a VEC_SELECT selecting
the low part of a vector. The simplification returns a SUBREG.
The primary goal of this patch is to enable better combinations of
Neon RTL patterns - specifically allowing generation of 'write-to-
high-half' narrowing instructions.
Adding this RTL simplification means that the expected results for a
number of tests need to be updated:
* aarch64 Neon: Update the scan-assembler regex for intrinsics tests
to expect a scalar register instead of lane 0 of a vector.
* aarch64 SVE: Likewise.
* arm MVE: Use lane 1 instead of lane 0 for lane-extraction
intrinsics tests (as the move instructions get optimized away for
lane 0).
This patch also adds new code generation tests to
narrow_high_combine.c to verify the benefit of this RTL
simplification.
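The kind of check involved can be sketched as follows. The functions `index_series_p` and `selects_lowpart_p` are hypothetical stand-ins written for this illustration, not the GCC predicates themselves, and the sketch assumes little-endian lane numbering so that the low part is lanes 0 .. N-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if sel[0..n-1] is the linear series start, start + 1, ...  */
bool index_series_p(const int *sel, size_t n, int start)
{
    for (size_t i = 0; i < n; i++)
        if (sel[i] != start + (int) i)
            return false;
    return true;
}

/* True if selecting the N lanes in SEL from a SRC_LANES-lane vector
   picks exactly its low part.  Assumes little-endian lane numbering,
   so the low part is lanes 0 .. N-1.  */
bool selects_lowpart_p(const int *sel, size_t n, size_t src_lanes)
{
    return n > 0 && n < src_lanes && index_series_p(sel, n, 0);
}

/* A few selections from a 4-lane vector.  */
void check_selects_lowpart(void)
{
    const int low[] = { 0, 1 };
    const int high[] = { 2, 3 };
    const int strided[] = { 0, 2 };
    assert(selects_lowpart_p(low, 2, 4));
    assert(!selects_lowpart_p(high, 2, 4));
    assert(!selects_lowpart_p(strided, 2, 4));
}
```

Only the first selection qualifies: it reads the low half of the vector unchanged, which is the situation in which a VEC_SELECT can be expressed as a SUBREG instead.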
gcc/ChangeLog:
2021-06-08 Jonathan Wright <jonathan.wright@arm.com>
* combine.c (combine_simplify_rtx): Add vec_select -> subreg
simplification.
* config/aarch64/aarch64.md (*zero_extend<SHORT:mode><GPI:mode>2_aarch64):
Add Neon to general purpose register case for zero-extend
pattern.
* config/arm/vfp.md (*arm_movsi_vfp): Remove "*" from *t -> r
case to prevent some cases opting to go through memory.
* cse.c (fold_rtx): Add vec_select -> subreg simplification.
* rtl.c (rtvec_series_p): Define predicate to determine
whether a vector contains a linear series of integers.
* rtl.h (rtvec_series_p): Define.
* rtlanal.c (vec_series_lowpart_p): Define predicate to
determine if a vector selection is equivalent to the low part
of the vector.
* rtlanal.h (vec_series_lowpart_p): Define.
* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
Add vec_select -> subreg simplification.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/extract_zero_extend.c: Remove dump scan
for RTL pattern match.
* gcc.target/aarch64/narrow_high_combine.c: Add new tests.
* gcc.target/aarch64/simd/vmulx_laneq_f64_1.c: Update
scan-assembler regex to look for a scalar register instead of
lane 0 of a vector.
* gcc.target/aarch64/simd/vmulxd_laneq_f64_1.c: Likewise.
* gcc.target/aarch64/simd/vmulxs_lane_f32_1.c: Likewise.
* gcc.target/aarch64/simd/vmulxs_laneq_f32_1.c: Likewise.
* gcc.target/aarch64/simd/vqdmlalh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlals_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmlslh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmlsls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_lane_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmullh_laneq_s16.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_lane_s32.c: Likewise.
* gcc.target/aarch64/simd/vqdmulls_laneq_s32.c: Likewise.
* gcc.target/aarch64/sve/dup_lane_1.c: Likewise.
* gcc.target/aarch64/sve/extract_1.c: Likewise.
* gcc.target/aarch64/sve/extract_2.c: Likewise.
* gcc.target/aarch64/sve/extract_3.c: Likewise.
* gcc.target/aarch64/sve/extract_4.c: Likewise.
* gcc.target/aarch64/sve/live_1.c: Update scan-assembler regex
cases to look for 'b' and 'h' registers instead of 'w'.
* gcc.target/arm/crypto-vsha1cq_u32.c: Update scan-assembler
regex to reflect lane 0 vector extractions being simplified
to scalar register moves.
* gcc.target/arm/crypto-vsha1h_u32.c: Likewise.
* gcc.target/arm/crypto-vsha1mq_u32.c: Likewise.
* gcc.target/arm/crypto-vsha1pq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f16.c: Extract
lane 1 as the moves for lane 0 now get optimized away.
* gcc.target/arm/mve/intrinsics/vgetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vgetq_lane_u8.c: Likewise.
|
|
[PR101008]
simplify_relational_operation callees typically return just const0_rtx
or const_true_rtx and then simplify_relational_operation attempts to fix
that up if the comparison result has vector mode, or floating mode,
or punt if it has scalar mode and vector mode operands (it doesn't know how
exactly to deal with the scalar masks).
But, simplify_logical_relational_operation has a special case, where
it attempts to fold (x < y) | (x >= y) etc. and if it determines it is
always true, it just returns const_true_rtx, without doing the dances that
simplify_relational_operation does.
That results in an ICE on the following testcase, where such folding happens
during expansion (of debug stmts into DEBUG_INSNs) and we ICE because
all of a sudden a VOIDmode rtx appears where it expects a vector (V4SImode)
rtx.
The following patch fixes that by moving the adjustment into a separate
helper routine and using it from both simplify_relational_operation and
simplify_logical_relational_operation.
2021-06-11 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/101008
* simplify-rtx.c (relational_result): New function.
(simplify_logical_relational_operation,
simplify_relational_operation): Use it.
* gcc.dg/pr101008.c: New test.
|
|
This removes CC0 and all directly related infrastructure.
CC_STATUS, CC_STATUS_MDEP, CC_STATUS_MDEP_INIT, and NOTICE_UPDATE_CC
are deleted and poisoned. CC0 is only deleted (some targets use that
name for something else). HAVE_cc0 is automatically generated, and we
no longer will do that after this patch.
CC_STATUS_INIT is suggested in final.c to also be useful for ports that
are not CC0, and at least arm seems to use it for something. So I am
leaving that alone, but most targets that have it could remove it.
2021-05-04 Segher Boessenkool <segher@kernel.crashing.org>
* caller-save.c: Remove CC0.
* cfgcleanup.c: Remove CC0.
* cfgrtl.c: Remove CC0.
* combine.c: Remove CC0.
* compare-elim.c: Remove CC0.
* conditions.h: Remove CC0.
* config/h8300/h8300.h: Remove CC0.
* config/h8300/h8300-protos.h: Remove CC0.
* config/h8300/peepholes.md: Remove CC0.
* config/i386/x86-tune-sched.c: Remove CC0.
* config/m68k/m68k.c: Remove CC0.
* config/rl78/rl78.c: Remove CC0.
* config/sparc/sparc.c: Remove CC0.
* config/xtensa/xtensa.c: Remove CC0.
(gen_conditional_move): Use pc_rtx instead of cc0_rtx in a piece of
RTL where that is used as a placeholder only.
* cprop.c: Remove CC0.
* cse.c: Remove CC0.
* cselib.c: Remove CC0.
* df-problems.c: Remove CC0.
* df-scan.c: Remove CC0.
* doc/md.texi: Remove CC0. Adjust an example.
* doc/rtl.texi: Remove CC0. Adjust an example.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Remove CC0.
* emit-rtl.c: Remove CC0.
* final.c: Remove CC0.
* fwprop.c: Remove CC0.
* gcse-common.c: Remove CC0.
* gcse.c: Remove CC0.
* genattrtab.c: Remove CC0.
* genconfig.c: Remove CC0.
* genemit.c: Remove CC0.
* genextract.c: Remove CC0.
* gengenrtl.c: Remove CC0.
* genrecog.c: Remove CC0.
* haifa-sched.c: Remove CC0.
* ifcvt.c: Remove CC0.
* ira-costs.c: Remove CC0.
* ira.c: Remove CC0.
* jump.c: Remove CC0.
* loop-invariant.c: Remove CC0.
* lra-constraints.c: Remove CC0.
* lra-eliminations.c: Remove CC0.
* optabs.c: Remove CC0.
* postreload-gcse.c: Remove CC0.
* postreload.c: Remove CC0.
* print-rtl.c: Remove CC0.
* read-rtl-function.c: Remove CC0.
* reg-notes.def: Remove CC0.
* reg-stack.c: Remove CC0.
* reginfo.c: Remove CC0.
* regrename.c: Remove CC0.
* reload.c: Remove CC0.
* reload1.c: Remove CC0.
* reorg.c: Remove CC0.
* resource.c: Remove CC0.
* rtl.c: Remove CC0.
* rtl.def: Remove CC0.
* rtl.h: Remove CC0.
* rtlanal.c: Remove CC0.
* sched-deps.c: Remove CC0.
* sched-rgn.c: Remove CC0.
* shrink-wrap.c: Remove CC0.
* simplify-rtx.c: Remove CC0.
* system.h: Remove CC0. Poison NOTICE_UPDATE_CC, CC_STATUS_MDEP_INIT,
CC_STATUS_MDEP, and CC_STATUS.
* target.def: Remove CC0.
* valtrack.c: Remove CC0.
* var-tracking.c: Remove CC0.
|
|
As the test case shows, the outer mode may have a higher alignment
requirement than the inner mode here.
2021-04-27 Bernd Edlinger <bernd.edlinger@hotmail.de>
PR target/100106
* simplify-rtx.c (simplify_context::simplify_subreg): Check the
memory alignment for the outer mode.
* gcc.c-torture/compile/pr100106.c: New testcase.
|
|
are lost [PR99648]
Similarly to PR95450 done on GIMPLE, this patch punts if we try to
simplify_{gen_,}subreg from some constant into the IBM double double
IFmode (or sometimes TFmode) if the double double format wouldn't preserve
the bits. Not all values are valid in IBM double double, e.g. the format
requires that the upper double is the whole value rounded to double, and
if in some cases such as in the pr71522.c testcase with -m32 -Os -mcpu=power7
some non-floating data is copied through long double variable, we can
simplify a subreg into something that has different value.
Fixed by punting if the planned simplify_immed_subreg result doesn't
encode to bitwise identical values compared to what we were decoding.
As for the simplify_gen_subreg change, I think it would be desirable
to just avoid creating SUBREGs of constants on all targets and for all
constants, if simplify_immed_subreg simplified, fine, otherwise punt,
but as we are late in GCC11 development, the patch instead guards this
behavior on MODE_COMPOSITE_P (outermode) - i.e. only conversions to
powerpc{,64,64le} double double long double - and only for the cases where
simplify_immed_subreg was called.
2021-04-13 Jakub Jelinek <jakub@redhat.com>
PR target/99648
* simplify-rtx.c (simplify_immed_subreg): For MODE_COMPOSITE_P
outermode, return NULL if the result doesn't encode back to the
original byte sequence.
(simplify_gen_subreg): Don't create SUBREGs from constants to
MODE_COMPOSITE_P outermode.
|
|
For signed x and y, we don't try to optimize (int) (x - 1U) * y + y
into x * y; we can't do that with signed x * y, because the former
is well defined for INT_MIN and -1, while the latter is not.
We could perhaps optimize it during isel or some very late optimization
where we'd magically turn on flag_wrapv, but we don't do that yet.
This patch optimizes it in simplify-rtx.c, such that we can optimize
it during combine.
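The identity holds whenever the arithmetic wraps, which is the situation at the RTL level. A minimal check in C, using unsigned arithmetic to get wrapping behaviour (the function names are made up for this sketch):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* (X - 1) * Y + Y, evaluated with wrapping (unsigned) arithmetic.  */
unsigned int mul_minus_one_form(unsigned int x, unsigned int y)
{
    return (x - 1u) * y + y;
}

/* (X + 1) * Y - Y, evaluated the same way.  */
unsigned int mul_plus_one_form(unsigned int x, unsigned int y)
{
    return (x + 1u) * y - y;
}

/* The simplified form X * Y.  */
unsigned int plain_mul(unsigned int x, unsigned int y)
{
    return x * y;
}

/* Compare the forms on a few values, including the bit patterns of
   INT_MIN and -1 that make the signed version problematic.  */
void check_mul_identity(void)
{
    const unsigned int samples[] = {
        0u, 1u, 2u, 7u, (unsigned int) INT_MAX,
        (unsigned int) INT_MIN, UINT_MAX
    };
    const size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            unsigned int x = samples[i], y = samples[j];
            assert(mul_minus_one_form(x, y) == plain_mul(x, y));
            assert(mul_plus_one_form(x, y) == plain_mul(x, y));
        }
}
```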
2021-01-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/98334
* simplify-rtx.c (simplify_context::simplify_binary_operation_1):
Optimize (X - 1) * Y + Y to X * Y or (X + 1) * Y - Y to X * Y.
* gcc.target/i386/pr98334.c: New test.
|
|
One of the recurring warts of RTL is that multiplication by a power
of 2 is represented as a MULT inside a MEM but as an ASHIFT outside
a MEM. It would obviously be better if we didn't have this kind of
context sensitivity, but it would be difficult to remove.
Currently the simplify-rtx.c routines are hard-coded for the
ASHIFT form. This means that some callers have to convert the
ASHIFTs “back” into MULTs after calling the simplify-rtx.c
routines; see fwprop.c:canonicalize_address for an example.
I think we can relieve some of the pain by wrapping the simplify-rtx.c
routines in a simple class that tracks whether the expression occurs
in a MEM or not, so that no post-processing is needed.
An obvious concern is whether passing the “this” pointer around
will slow things down or bloat the code. I can't measure any
increase in compile time after applying the patch. Sizewise,
simplify-rtx.o text increases by 2.3% in default-checking builds
and 4.1% in release-checking builds.
I realise the MULT/ASHIFT thing isn't the most palatable
reason for doing this, but I think it might be useful for
other things in future, such as using local nonzero_bits
hooks/virtual functions instead of the global hooks.
The obvious alternative would be to add a static variable
and hope that it is always updated correctly.
Later patches make use of this.
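The idea of carrying the MEM context along with the simplifier can be sketched in plain C. Everything here is illustrative: `simplify_ctx_sketch`, `scaled_form` and `canonicalize_pow2_scale` are hypothetical names, and only the notion of a mem_depth counter that selects between the MULT and ASHIFT forms is taken from the description above.

```c
#include <assert.h>

/* The two ways of writing "x times a power of two".  */
enum scale_code { SCALE_ASHIFT, SCALE_MULT };

/* Hypothetical context object: a nonzero mem_depth means the
   expression being simplified sits inside a MEM address.  */
struct simplify_ctx_sketch {
    unsigned int mem_depth;
};

/* The chosen representation: (ashift x operand) or (mult x operand).  */
struct scaled_form {
    enum scale_code code;
    unsigned long operand;
};

/* Pick the canonical form of x * 2^log2 for the given context:
   MULT by the constant inside a MEM, ASHIFT by the exponent outside.  */
struct scaled_form
canonicalize_pow2_scale(const struct simplify_ctx_sketch *ctx,
                        unsigned int log2)
{
    struct scaled_form f;
    if (ctx->mem_depth > 0) {
        f.code = SCALE_MULT;
        f.operand = 1ul << log2;
    } else {
        f.code = SCALE_ASHIFT;
        f.operand = log2;
    }
    return f;
}

void check_canonical_scale(void)
{
    struct simplify_ctx_sketch outside = { 0 };
    struct simplify_ctx_sketch inside = { 1 };
    struct scaled_form a = canonicalize_pow2_scale(&outside, 3);
    struct scaled_form m = canonicalize_pow2_scale(&inside, 3);
    assert(a.code == SCALE_ASHIFT && a.operand == 3);
    assert(m.code == SCALE_MULT && m.operand == 8);
}
```

Because the choice is made from state held in the context, a caller working on an address would not need a separate pass to turn ASHIFTs back into MULTs.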
gcc/
* rtl.h (simplify_context): New class.
(simplify_unary_operation, simplify_binary_operation): Use it.
(simplify_ternary_operation, simplify_relational_operation): Likewise.
(simplify_subreg, simplify_gen_unary, simplify_gen_binary): Likewise.
(simplify_gen_ternary, simplify_gen_relational): Likewise.
(simplify_gen_subreg, lowpart_subreg): Likewise.
* simplify-rtx.c (simplify_gen_binary): Turn into a member function
of simplify_context.
(simplify_gen_unary, simplify_gen_ternary, simplify_gen_relational)
(simplify_truncation, simplify_unary_operation): Likewise.
(simplify_unary_operation_1, simplify_byte_swapping_operation)
(simplify_associative_operation, simplify_logical_relational_operation)
(simplify_binary_operation, simplify_binary_operation_series)
(simplify_distributive_operation, simplify_plus_minus): Likewise.
(simplify_relational_operation, simplify_relational_operation_1)
(simplify_cond_clz_ctz, simplify_merge_mask): Likewise.
(simplify_ternary_operation, simplify_subreg, simplify_gen_subreg)
(lowpart_subreg): Likewise.
(simplify_binary_operation_1): Likewise. Test mem_depth when
deciding whether the ASHIFT or MULT form is canonical.
(simplify_merge_mask): Use simplify_context.
|
|
gcc/ChangeLog
PR rtl-optimization/97249
* simplify-rtx.c (simplify_binary_operation_1): Simplify
vec_select of a subreg of X to a vec_select of X.
gcc/testsuite/ChangeLog
* gcc.target/i386/pr97249-1.c: New test.
|
|
The combination of several of my recent nvptx patches has revealed an
interesting RTL optimization opportunity. This patch to simplify-rtx.c
simplifies (sign_extend:HI (truncate:QI (?shiftrt:HI x 8))) to just
(ashiftrt:HI x 8), as the inner shift already sets the high bits
appropriately. The equivalent zero_extend variant appears to already
be implemented in simplify_unary_operation_1.
These result from RTL expansion generating a reasonable arithmetic right
shift and truncation to char, only to then discover the backend doesn't
support QImode comparisons, so the next optab widens this result/operand
back to HImode. In this sequence the truncate and sign extend are
redundant as the original arithmetic shift has already set the high
bits appropriately. The one oddity of the patch is that it tests for
LSHIFTRT as inner shift, as simplify/combine has already canonicalized
this to a logical shift, assuming that the distinction is unimportant
following the truncation.
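The equivalence can be checked exhaustively for the HImode/QImode case with a bit-level model. The helper names are invented for this sketch, and the sign handling is spelled out with masks so that it does not depend on implementation-defined conversions or shifts in C.

```c
#include <assert.h>
#include <stdint.h>

/* (ashiftrt:HI x 8): shift right by 8, replicating the sign bit.  */
uint16_t ashiftrt16_by8(uint16_t x)
{
    uint16_t r = (uint16_t) (x >> 8);
    if (x & 0x8000u)
        r = (uint16_t) (r | 0xFF00u);
    return r;
}

/* (sign_extend:HI (truncate:QI (lshiftrt:HI x 8))).  */
uint16_t sext_trunc_lshiftrt16_by8(uint16_t x)
{
    uint8_t t = (uint8_t) (x >> 8);          /* truncate the logical shift */
    if (t & 0x80u)                           /* sign-extend back to 16 bits */
        return (uint16_t) (0xFF00u | t);
    return t;
}

/* Compare the two forms for every 16-bit value.  */
void check_sext_trunc_shift(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        assert(sext_trunc_lshiftrt16_by8((uint16_t) v)
               == ashiftrt16_by8((uint16_t) v));
}
```

The truncation keeps exactly the byte that the shift moved into the low position, and that byte's top bit is the original sign bit, so sign-extending it reproduces the arithmetic shift.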
2020-08-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* simplify-rtx.c (simplify_unary_operation_1) [SIGN_EXTEND]:
Simplify (sign_extend:M (truncate:N (lshiftrt:M x C))) to
(ashiftrt:M x C) when the shift sets the high bits appropriately.
|
|
The following patch avoids simplifying x-0.0 to x when -fsignaling-nans
is specified, which resolves PR rtl-optimization 61494. Indeed, running
the test program attached to that PR now reports no failures.
2020-08-02 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/61494
* simplify-rtx.c (simplify_binary_operation_1) [MINUS]: Don't
simplify x - 0.0 with -fsignaling-nans.
|
|
2020-07-23 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/96298
* simplify-rtx.c (simplify_binary_operation_1) [XOR]: Xor doesn't
distribute over xor, so (a^b)^(c^b) is not the same as (a^c)^b.
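A quick way to see why the two expressions differ: the two copies of b cancel, so (a^b)^(c^b) reduces to a^c. The sketch below checks this over small values and records one concrete counterexample to the (a^c)^b form; the function names are made up.

```c
#include <assert.h>

/* The expression as written in the source: (a ^ b) ^ (c ^ b).  */
unsigned int xor_of_xors(unsigned int a, unsigned int b, unsigned int c)
{
    return (a ^ b) ^ (c ^ b);
}

/* Check that b always cancels, and that treating XOR as distributing
   over XOR would give a different answer.  */
void check_xor_cancels(void)
{
    for (unsigned int a = 0; a < 16; a++)
        for (unsigned int b = 0; b < 16; b++)
            for (unsigned int c = 0; c < 16; c++)
                assert(xor_of_xors(a, b, c) == (a ^ c));

    /* a = 1, b = 2, c = 4: the real value is 5, the wrong form gives 7.  */
    assert(xor_of_xors(1u, 2u, 4u) == 5u);
    assert(((1u ^ 4u) ^ 2u) == 7u);
}
```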
|
|
2020-06-29 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog:
* simplify-rtx.c (simplify_distributive_operation): New function
to un-distribute a binary operation of two binary operations.
(X & C) ^ (Y & C) to (X ^ Y) & C, when C is simple (i.e. a constant).
(simplify_binary_operation_1) <IOR, XOR, AND>: Call it from here
when appropriate.
(test_scalar_int_ops): New function for unit self-testing
scalar integer transformations in simplify-rtx.c.
(test_scalar_ops): Call test_scalar_int_ops for each integer mode.
(simplify_rtx_c_tests): Call test_scalar_ops.
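The un-distribution mentioned in the entries above can be checked directly on small integers. This sketch covers only the XOR and IOR cases with an inner AND, and the function names are invented for illustration.

```c
#include <assert.h>

/* Distributed form: (X & C) ^ (Y & C).  */
unsigned int xor_distributed(unsigned int x, unsigned int y, unsigned int c)
{
    return (x & c) ^ (y & c);
}

/* Un-distributed form: (X ^ Y) & C, one operation fewer.  */
unsigned int xor_undistributed(unsigned int x, unsigned int y, unsigned int c)
{
    return (x ^ y) & c;
}

/* Exhaustive check over 6-bit operands; the same shape of identity
   is checked for IOR as well.  */
void check_undistribute(void)
{
    for (unsigned int x = 0; x < 64; x++)
        for (unsigned int y = 0; y < 64; y++)
            for (unsigned int c = 0; c < 64; c++) {
                assert(xor_distributed(x, y, c) == xor_undistributed(x, y, c));
                assert(((x & c) | (y & c)) == ((x | y) & c));
            }
}
```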
|
|
2020-06-24 Roger Sayle <roger@nextmovesoftware.com>
Segher Boessenkool <segher@kernel.crashing.org>
* simplify-rtx.c (simplify_unary_operation_1): Simplify rotates by 0.
|
|
2020-06-24 Roger Sayle <roger@nextmovesoftware.com>
* simplify-rtx.c (simplify_unary_operation_1): Simplify
(parity (parity x)) as (parity x), i.e. PARITY is idempotent.
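Idempotence follows because a parity result is always 0 or 1, and each of those values is its own parity. A small model, where `parity32` is a hypothetical helper written for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Parity of a 32-bit value: 1 if an odd number of bits are set.  */
unsigned int parity32(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return (unsigned int) (x & 1u);
}

/* Applying parity twice gives the same result as applying it once.  */
void check_parity_idempotent(void)
{
    for (uint32_t v = 0; v <= 0xFFFFu; v++)
        assert(parity32(parity32(v)) == parity32(v));
    assert(parity32(parity32(0xFFFFFFFFu)) == parity32(0xFFFFFFFFu));
}
```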
|
|
comparison's mode.
PR rtl-optimization/90275
PR target/94238
PR target/94144
* simplify-rtx.c (comparison_code_valid_for_mode): New function.
(simplify_logical_relational_operation): Use it.
PR target/94144
PR target/94238
* gcc.c-torture/compile/pr94144.c: New test.
* gcc.c-torture/compile/pr94238.c: New test.
|
|
This fixes a fall-out from a patch I had submitted two years ago which started
allowing simplify-rtx to fold logical right shifts by offsets a followed by b
into >> (a + b).
However this can generate inefficient code when the resulting shift count ends
up being the same as the size of the shift mode. This will create some
undefined behavior on most platforms.
This patch changes the code to truncate to 0 if the shift amount goes out of
range. Before my older patch this used to happen in combine when it saw the
two shifts. However since we combine them here combine never gets a chance to
truncate them.
The issue mostly affects GCC 8 and 9 since on 10 the back-end knows how to deal
with this shift constant but it's better to do the right thing in simplify-rtx.
Note that this doesn't take care of the Arithmetic shift where you could replace
the constant with MODE_BITS (mode) - 1, but that's not a regression so punting it.
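The arithmetic behind this can be modelled in C, where a shift by the full width is likewise undefined. The sketch below assumes a 32-bit mode and only models the mathematical result of combining two logical right shifts; it is not the patch's code, and the names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define MODE_WIDTH 32u /* assumed width of the shift mode */

/* Combine (x >> a) >> b into a single logical right shift.  When the
   combined count reaches the mode width every bit has been shifted
   out, so the result is 0; emitting x >> (a + b) directly would be
   an out-of-range shift.  */
uint32_t combined_lshiftrt(uint32_t x, unsigned int a, unsigned int b)
{
    unsigned int total = a + b;
    if (total >= MODE_WIDTH)
        return 0;
    return x >> total;
}

/* Compare against the two separate shifts for every in-range pair.  */
void check_combined_lshiftrt(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x80000000u, 0xDEADBEEFu, 0xFFFFFFFFu };
    for (unsigned int s = 0; s < sizeof samples / sizeof samples[0]; s++)
        for (unsigned int a = 0; a < MODE_WIDTH; a++)
            for (unsigned int b = 0; b < MODE_WIDTH; b++)
                assert(combined_lshiftrt(samples[s], a, b)
                       == ((samples[s] >> a) >> b));
}
```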
gcc/ChangeLog:
PR rtl-optimization/91838
* simplify-rtx.c (simplify_binary_operation_1): Update LSHIFTRT case
to truncate if allowed or reject combination.
gcc/testsuite/ChangeLog:
PR rtl-optimization/91838
* g++.dg/pr91838.C: New test.
|
|
The patch caused regressions in gcc.target/sh/pr64345-1.c on
sh3-linux-gnu and gcc.target/m68k/pr39726.c on m68k-linux-gnu.
It didn't look like they would be fixable in an acceptably
non-invasive and unhacky way, so punting till future releases.
2020-01-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Revert:
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
PR rtl-optimization/87763
* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
simplification to handle subregs as well as bare regs.
* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
|
|
In the gcc.target/aarch64/lsl_asr_sbfiz.c part of this PR, we have:
Failed to match this instruction:
(set (reg:SI 95)
(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 97) 0)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
If we perform the natural simplification to:
(set (reg:SI 95)
(ashift:SI (sign_extract:SI (reg:SI 97)
(const_int 3 [0x3])
(const_int 0 [0]))
(const_int 19 [0x13])))
then the pattern matches. And it turns out that we do have a
simplification like that already, but it would only kick in for
extractions from a reg, not a subreg. E.g.:
(set (reg:SI 95)
(ashift:SI (subreg:SI (sign_extract:DI (reg:DI X)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))
would simplify to:
(set (reg:SI 95)
(ashift:SI (sign_extract:SI (subreg:SI (reg:DI X) 0)
(const_int 3 [0x3])
(const_int 0 [0]))
(const_int 19 [0x13])))
IMO the subreg case is even more obviously a simplification
than the bare reg case, since the net effect is to remove
either one or two subregs, rather than simply change the
position of a subreg/truncation.
However, doing that regressed gcc.dg/tree-ssa/pr64910-2.c
for -m32 on x86_64-linux-gnu, because we could then simplify
a :HI zero_extract to a :QI one. The associated *testqi_ext_3
pattern did already seem to want to handle QImode extractions:
"ix86_match_ccmode (insn, CCNOmode)
&& ((TARGET_64BIT && GET_MODE (operands[2]) == DImode)
|| GET_MODE (operands[2]) == SImode
|| GET_MODE (operands[2]) == HImode
|| GET_MODE (operands[2]) == QImode)
but I'm not sure how often the QI case would trigger in practice,
since the zero_extract mode was restricted to HI and above. I checked
the other x86 patterns and couldn't see any other instances of this.
2020-01-28 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR rtl-optimization/87763
* simplify-rtx.c (simplify_truncation): Extend sign/zero_extract
simplification to handle subregs as well as bare regs.
* config/i386/i386.md (*testqi_ext_3): Match QI extracts too.
|
|
[PR93376]
The following patch makes sure we punt in the 3 spots if precision is above
MAX_BITSIZE_MODE_ANY_INT.
2020-01-24 Jakub Jelinek <jakub@redhat.com>
PR target/93376
* simplify-rtx.c (simplify_const_unary_operation,
simplify_const_binary_operation): Punt for mode precision above
MAX_BITSIZE_MODE_ANY_INT.
|
|
From-SVN: r279813
|
|
compare)
PR target/92908
* simplify-rtx.c (simplify_relational_operation): Punt for vector
cmp_mode and scalar mode, if simplify_relational_operation returned
const_true_rtx.
(simplify_const_relational_operation): Change VOID_mode in function
comment to VOIDmode.
* gcc.target/i386/avx512bw-pr92908.c: New test.
From-SVN: r279369
|
|
To restore powerpc bootstrap.
2019-11-19 Richard Sandiford <richard.sandiford@arm.com>
gcc/
Revert:
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
* cse.c (cse_insn): Delete no-op register moves too.
* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
Take a second comparison to control the value for NE.
(mask_to_comparison): Handle unsigned comparisons.
(simplify_logical_relational_operation): Likewise. Update call
to comparison_to_mask. Handle AND if !HONOR_NANs.
(simplify_binary_operation_1): Call the above for AND too.
gcc/testsuite/
Revert:
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
From-SVN: r278455
|
|
SVE has two composite conditions:
pmore == at least one bit set && last bit clear
plast == no bits set || last bit set
So in general we generate them from:
A: CC = test bits
B: reg1 = first condition
C: CC = test bits
D: reg2 = second condition
E: result = (reg1 op reg2) where op is || or &&
To fold all this into a single test, we need to be able to remove
the redundant C (the cse.c patch) and then fold B, D and E down to
a single condition (the simplify-rtx.c patch).
The underlying conditions are unsigned, so the simplify-rtx.c part needs
to support both unsigned comparisons and AND. However, to avoid opening
the can of worms that is ANDing FP comparisons for unordered inputs,
I've restricted the new AND handling to cases in which NaNs can be
ignored. I think this is still a strict extension of what we have now,
it just doesn't go as far as it could. Going further would need an
entirely different set of testcases so I think would make more sense
as separate work.
2019-11-18 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* cse.c (cse_insn): Delete no-op register moves too.
* simplify-rtx.c (comparison_to_mask): Handle unsigned comparisons.
Take a second comparison to control the value for NE.
(mask_to_comparison): Handle unsigned comparisons.
(simplify_logical_relational_operation): Likewise. Update call
to comparison_to_mask. Handle AND if !HONOR_NANs.
(simplify_binary_operation_1): Call the above for AND too.
gcc/testsuite/
* gcc.target/aarch64/sve/acle/asm/ptest_pmore.c: New test.
From-SVN: r278411
|
|
This introduces simplify_logical_relational_operation. Currently the
only thing it can simplify is the IOR of two CONDs of the
same arguments.
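One way to picture this kind of folding is to give each comparison a bitmask of the outcomes it accepts, so that the IOR of two comparisons is the OR of their masks. The encoding below is hypothetical and restricted to integer comparisons (no unordered outcome); it is a sketch of the technique, not the code added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome bits for comparing two integers.  */
enum { M_LT = 4, M_EQ = 2, M_GT = 1, M_ALL = 7 };

enum cmp_code { CMP_LT, CMP_LE, CMP_EQ, CMP_NE, CMP_GE, CMP_GT };

/* Map a comparison to the set of outcomes for which it is true.  */
int cmp_to_mask(enum cmp_code c)
{
    switch (c) {
    case CMP_LT: return M_LT;
    case CMP_LE: return M_LT | M_EQ;
    case CMP_EQ: return M_EQ;
    case CMP_NE: return M_LT | M_GT;
    case CMP_GE: return M_GT | M_EQ;
    case CMP_GT: return M_GT;
    }
    return 0;
}

/* Mask for (x A y) | (x B y) on the same operands.  */
int ior_masks(enum cmp_code a, enum cmp_code b)
{
    return cmp_to_mask(a) | cmp_to_mask(b);
}

/* Evaluate a mask on concrete operands.  */
bool mask_holds(int mask, int x, int y)
{
    int outcome = x < y ? M_LT : (x == y ? M_EQ : M_GT);
    return (mask & outcome) != 0;
}

void check_ior_masks(void)
{
    assert(ior_masks(CMP_LT, CMP_GE) == M_ALL);   /* always true */
    assert(ior_masks(CMP_LT, CMP_EQ) == cmp_to_mask(CMP_LE));
    for (int x = -2; x <= 2; x++)
        for (int y = -2; y <= 2; y++)
            assert(mask_holds(ior_masks(CMP_LT, CMP_GT), x, y) == (x != y));
}
```

A combined mask that covers every outcome corresponds to a result that is always true, and a mask that matches another single comparison means the pair can be replaced by that comparison.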
* simplify-rtx.c (comparison_to_mask): New function.
(mask_to_comparison): New function.
(simplify_logical_relational_operation): New function.
(simplify_binary_operation_1): Call
simplify_logical_relational_operation.
From-SVN: r277931
|
|
This patch generalises some neg_const_int-based rtx simplifications
so that they handle all CONST_SCALAR_INTs and also CONST_POLY_INT.
This actually simplifies things a bit, since we no longer have
to treat HOST_WIDE_INT_MIN specially.
This is tested by later SVE patches.
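With wrapping arithmetic, subtracting a constant and adding its negation agree for every constant, including the minimum value whose negation wraps back to itself. The sketch below models this with 64-bit unsigned arithmetic; the names are invented and it does not reproduce the poly_int handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two's-complement negation with wrap-around.  */
uint64_t neg_wrap(uint64_t c)
{
    return (uint64_t) 0 - c;
}

/* (minus x C) and (plus x -C) under wrapping arithmetic.  */
uint64_t minus_form(uint64_t x, uint64_t c) { return x - c; }
uint64_t plus_neg_form(uint64_t x, uint64_t c) { return x + neg_wrap(c); }

void check_minus_to_plus(void)
{
    const uint64_t min_bits = (uint64_t) 1 << 63; /* bit pattern of INT64_MIN */
    const uint64_t consts[] = { 0u, 1u, 42u, min_bits, UINT64_MAX };
    const uint64_t xs[] = { 0u, 5u, min_bits, UINT64_MAX };
    for (size_t i = 0; i < sizeof xs / sizeof xs[0]; i++)
        for (size_t j = 0; j < sizeof consts / sizeof consts[0]; j++)
            assert(minus_form(xs[i], consts[j])
                   == plus_neg_form(xs[i], consts[j]));

    /* The minimum value negates to itself, yet the identity still holds.  */
    assert(neg_wrap(min_bits) == min_bits);
}
```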
2019-09-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* simplify-rtx.c (neg_const_int): Replace with...
(neg_poly_int_rtx): ...this new function.
(simplify_binary_operation_1): Extend (minus x C) -> (plus X -C)
to all CONST_SCALAR_INTs and to CONST_POLY_INT.
(simplify_plus_minus): Likewise for constant terms here.
From-SVN: r276017
|
|
This patch rewrites the way simplify_subreg handles constants.
It uses similar native_encode/native_decode routines to the
tree-level handling of VIEW_CONVERT_EXPR, meaning that we can
move between rtx constants and the target memory image of them.
The main point of this patch is to support subregs of constant-length
vectors for VLA vectors, beyond the very simple cases that were already
handled. Many of the new tests failed before the patch for variable-
length vectors.
The boolean side is tested more by the upcoming SVE ACLE work.
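The encode-then-decode approach can be illustrated for a scalar integer constant. This is a simplified sketch assuming a little-endian memory image and constants of at most 8 bytes; `encode_le`, `decode_le` and `const_subreg_le` are hypothetical names, not the routines added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the low SIZE bytes of VALUE as a little-endian memory image.  */
void encode_le(uint64_t value, size_t size, unsigned char *bytes)
{
    assert(size <= 8);
    for (size_t i = 0; i < size; i++)
        bytes[i] = (unsigned char) ((value >> (8 * i)) & 0xFFu);
}

/* Read SIZE bytes of a little-endian memory image back into a value.  */
uint64_t decode_le(const unsigned char *bytes, size_t size)
{
    assert(size <= 8);
    uint64_t value = 0;
    for (size_t i = 0; i < size; i++)
        value |= (uint64_t) bytes[i] << (8 * i);
    return value;
}

/* Model of a subreg of a constant: encode the inner constant, then
   decode OUTER_SIZE bytes starting at BYTE_OFFSET.  */
uint64_t const_subreg_le(uint64_t inner, size_t inner_size,
                         size_t outer_size, size_t byte_offset)
{
    unsigned char image[8];
    assert(byte_offset + outer_size <= inner_size);
    encode_le(inner, inner_size, image);
    return decode_le(image + byte_offset, outer_size);
}
```

For instance, `const_subreg_le(0x12345678, 4, 2, 2)` reads the upper half of the 4-byte image and yields 0x1234, while offset 0 yields 0x5678.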
2019-09-19 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* defaults.h (TARGET_UNIT): New macro.
(target_unit): New type.
* rtl.h (native_encode_rtx, native_decode_rtx)
(native_decode_vector_rtx, subreg_size_lsb): Declare.
(subreg_lsb_1): Turn into an inline wrapper around subreg_size_lsb.
* rtlanal.c (subreg_lsb_1): Delete.
(subreg_size_lsb): New function.
* simplify-rtx.c: Include rtx-vector-builder.h
(simplify_immed_subreg): Delete.
(native_encode_rtx, native_decode_vector_rtx, native_decode_rtx)
(simplify_const_vector_byte_offset, simplify_const_vector_subreg): New
functions.
(simplify_subreg): Use them.
(test_vector_subregs_modes, test_vector_subregs_repeating)
(test_vector_subregs_fore_back, test_vector_subregs_stepped)
(test_vector_subregs): New functions.
(test_vector_ops): Call test_vector_subregs for integer vector
modes with at least 2 elements.
From-SVN: r275959
|
|
This patch uses the constant vector encoding scheme to handle
more cases of a VEC_DUPLICATE of another vector. Duplicating
any fixed-length vector is fine, and duplicating a variable-length
vector is OK as long as that vector is also a duplicate of a
fixed-length sequence.
Other cases fell through to:
if (VECTOR_MODE_P (mode) && GET_CODE (op) == CONST_VECTOR)
which was only expecting to deal with elementwise operations.
2019-07-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* simplify-rtx.c (simplify_const_unary_operation): Fold a
VEC_DUPLICATE of a fixed-length vector even if the result
is variable-length. Likewise fold a duplicate of a
variable-length vector if the variable-length vector is
itself a duplicate of a fixed-length sequence.
(test_vector_ops_duplicate): Test more cases.
From-SVN: r273868
|
|
This patch extends the tree-level folding of variable-length vectors
so that it can also be used on rtxes. The first step is to move
the tree_vector_builder new_unary/binary_operator routines to the
parent vector_builder class (which in turn means adding a new
template parameter). The second step is to make simplify-rtx.c
use a direct rtx analogue of the VECTOR_CST handling in fold-const.c.
2019-07-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* vector-builder.h (vector_builder): Add a shape template parameter.
(vector_builder::new_unary_operation): New function, generalizing
the old tree_vector_builder function.
(vector_builder::new_binary_operation): Likewise.
(vector_builder::binary_encoded_nelts): Likewise.
* int-vector-builder.h (int_vector_builder): Update template
parameters to vector_builder.
(int_vector_builder::shape_nelts): New function.
* rtx-vector-builder.h (rtx_vector_builder): Update template
parameters to vector_builder.
(rtx_vector_builder::shape_nelts): New function.
(rtx_vector_builder::nelts_of): Likewise.
(rtx_vector_builder::npatterns_of): Likewise.
(rtx_vector_builder::nelts_per_pattern_of): Likewise.
* tree-vector-builder.h (tree_vector_builder): Update template
parameters to vector_builder.
(tree_vector_builder::shape_nelts): New function.
(tree_vector_builder::nelts_of): Likewise.
(tree_vector_builder::npatterns_of): Likewise.
(tree_vector_builder::nelts_per_pattern_of): Likewise.
* tree-vector-builder.c (tree_vector_builder::new_unary_operation)
(tree_vector_builder::new_binary_operation): Delete.
(tree_vector_builder::binary_encoded_nelts): Likewise.
* simplify-rtx.c: Include rtx-vector-builder.h.
(distributes_over_addition_p): New function.
(simplify_const_unary_operation)
(simplify_const_binary_operation): Generalize handling of vector
constants to include variable-length vectors.
(test_vector_ops_series): Add more tests.
From-SVN: r273867
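The npatterns_of and nelts_per_pattern_of accessors mentioned above refer to the constant vector encoding, in which a vector is described by a number of interleaved patterns and up to three leading elements per pattern. A minimal sketch of how an arbitrary element could be recovered from such an encoding is given below; it models integer elements only and the function name is hypothetical.

```python
def encoded_vector_elt(encoded, npatterns, nelts_per_pattern, i):
    """Return element i of a vector described by its encoded elements.

    encoded holds npatterns * nelts_per_pattern integers, interleaved by
    pattern. One element per pattern means a duplicate, two means
    "first element, then a repeated value", three means "first element,
    then a linear series".
    """
    assert 1 <= nelts_per_pattern <= 3
    assert len(encoded) == npatterns * nelts_per_pattern
    if i < len(encoded):
        return encoded[i]
    pattern = i % npatterns          # which interleaved pattern
    count = i // npatterns           # position within that pattern
    final_i = len(encoded) - npatterns + pattern
    final = encoded[final_i]
    if nelts_per_pattern <= 2:
        return final                 # the last encoded value repeats
    step = final - encoded[final_i - npatterns]
    return final + (count - 2) * step
```

For example, the series 0, 1, 2, ... needs only the three encoded elements [0, 1, 2] with one pattern, whatever the runtime vector length turns out to be.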
|
|
than GET_MODE_BITSIZE to better handle partial...
2019-07-09 John Darrington <john@darrington.wattle.id.au>
* simplify-rtx.c (simplify_unary_operation_1): Use GET_MODE_PRECISION
rather than GET_MODE_BITSIZE to better handle partial integer modes.
From-SVN: r273312
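To illustrate why the distinction matters (this is not the code that was changed), consider a hypothetical partial integer mode with 20 bits of precision stored in 32 bits. Sign-related arithmetic has to use the precision; using the storage size looks at the wrong bit. The helper names are assumptions of this sketch.

```python
def sign_bit_set(value, nbits):
    """Test the sign bit of `value`, treating it as an nbits-wide integer."""
    return ((value >> (nbits - 1)) & 1) == 1


def sign_extend(value, nbits):
    """Interpret the low nbits of `value` as a signed integer."""
    value &= (1 << nbits) - 1
    sign = 1 << (nbits - 1)
    return (value ^ sign) - sign


PRECISION = 20   # hypothetical partial mode: 20 significant bits
BITSIZE = 32     # ... held in 32 bits of storage
```

With the value 0x80000, sign_bit_set is true for PRECISION but false for BITSIZE, so a simplification keyed on the storage size would treat a negative value as positive.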
|
|
2019-07-04 Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
PR target/88833
* fwprop.c (reg_single_def_p): New function.
(propagate_rtx_1): Add unconditional else inside RTX_EXTRA case.
(forward_propagate_into): New parameter reg_prop_only
with default value false.
Propagate def's src into loop only if SET_SRC and SET_DEST
of def_set have single definitions.
Likewise if reg_prop_only is set to true.
(fwprop): New param fwprop_addr_p.
Integrate fwprop_addr into fwprop.
(fwprop_addr): Remove.
(pass_rtl_fwprop_addr::execute): Call fwprop with arg set
to true.
(pass_rtl_fwprop::execute): Call fwprop with arg set to false.
* simplify-rtx.c (simplify_subreg): Add case for vector comparison.
* config/i386/sse.md (UNSPEC_BLENDV): Adjust pattern.
testsuite/
* gfortran.dg/pr88833.f90: New test.
From-SVN: r273040
|
|
PR rtl-optimization/89445
* simplify-rtx.c (simplify_ternary_operation): Don't use
simplify_merge_mask on operands that may trap.
* rtlanal.c (may_trap_p_1): Use FLOAT_MODE_P instead of
SCALAR_FLOAT_MODE_P checks. For integral division by zero, if
second operand is CONST_VECTOR, check if any element could be zero.
Don't expect traps for VEC_{MERGE,SELECT,CONCAT,DUPLICATE} unless
their operands can trap.
* gcc.target/i386/avx512f-pr89445.c: New test.
From-SVN: r269176
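The integer-division part of the may_trap_p_1 change can be sketched as below. This is a simplified model with a hypothetical function name: the divisor is None when it is not a constant, an int for a scalar constant, and a list of ints for a CONST_VECTOR.

```python
def int_division_may_trap(divisor):
    """Return True if an integer division by `divisor` could trap."""
    if divisor is None:
        # Unknown divisor: it could be zero at run time.
        return True
    if isinstance(divisor, list):
        # Constant vector: a single zero lane is enough to trap.
        return any(elt == 0 for elt in divisor)
    return divisor == 0
```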
|
|
2019-01-09 Sandra Loosemore <sandra@codesourcery.com>
PR other/16615 [1/5]
contrib/
* mklog: Mechanically replace "can not" with "cannot".
gcc/
* Makefile.in: Mechanically replace "can not" with "cannot".
* alias.c: Likewise.
* builtins.c: Likewise.
* calls.c: Likewise.
* cgraph.c: Likewise.
* cgraph.h: Likewise.
* cgraphclones.c: Likewise.
* cgraphunit.c: Likewise.
* combine-stack-adj.c: Likewise.
* combine.c: Likewise.
* common/config/i386/i386-common.c: Likewise.
* config/aarch64/aarch64.c: Likewise.
* config/alpha/sync.md: Likewise.
* config/arc/arc.c: Likewise.
* config/arc/predicates.md: Likewise.
* config/arm/arm-c.c: Likewise.
* config/arm/arm.c: Likewise.
* config/arm/arm.h: Likewise.
* config/arm/arm.md: Likewise.
* config/arm/cortex-r4f.md: Likewise.
* config/csky/csky.c: Likewise.
* config/csky/csky.h: Likewise.
* config/darwin-f.c: Likewise.
* config/epiphany/epiphany.md: Likewise.
* config/i386/i386.c: Likewise.
* config/i386/sol2.h: Likewise.
* config/m68k/m68k.c: Likewise.
* config/mcore/mcore.h: Likewise.
* config/microblaze/microblaze.md: Likewise.
* config/mips/20kc.md: Likewise.
* config/mips/sb1.md: Likewise.
* config/nds32/nds32.c: Likewise.
* config/nds32/predicates.md: Likewise.
* config/pa/pa.c: Likewise.
* config/rs6000/e300c2c3.md: Likewise.
* config/rs6000/rs6000.c: Likewise.
* config/s390/s390.h: Likewise.
* config/sh/sh.c: Likewise.
* config/sh/sh.md: Likewise.
* config/spu/vmx2spu.h: Likewise.
* cprop.c: Likewise.
* dbxout.c: Likewise.
* df-scan.c: Likewise.
* doc/cfg.texi: Likewise.
* doc/extend.texi: Likewise.
* doc/fragments.texi: Likewise.
* doc/gty.texi: Likewise.
* doc/invoke.texi: Likewise.
* doc/lto.texi: Likewise.
* doc/md.texi: Likewise.
* doc/objc.texi: Likewise.
* doc/rtl.texi: Likewise.
* doc/tm.texi: Likewise.
* dse.c: Likewise.
* emit-rtl.c: Likewise.
* emit-rtl.h: Likewise.
* except.c: Likewise.
* expmed.c: Likewise.
* expr.c: Likewise.
* fold-const.c: Likewise.
* genautomata.c: Likewise.
* gimple-fold.c: Likewise.
* hard-reg-set.h: Likewise.
* ifcvt.c: Likewise.
* ipa-comdats.c: Likewise.
* ipa-cp.c: Likewise.
* ipa-devirt.c: Likewise.
* ipa-fnsummary.c: Likewise.
* ipa-icf.c: Likewise.
* ipa-inline-transform.c: Likewise.
* ipa-inline.c: Likewise.
* ipa-polymorphic-call.c: Likewise.
* ipa-profile.c: Likewise.
* ipa-prop.c: Likewise.
* ipa-pure-const.c: Likewise.
* ipa-reference.c: Likewise.
* ipa-split.c: Likewise.
* ipa-visibility.c: Likewise.
* ipa.c: Likewise.
* ira-build.c: Likewise.
* ira-color.c: Likewise.
* ira-conflicts.c: Likewise.
* ira-costs.c: Likewise.
* ira-int.h: Likewise.
* ira-lives.c: Likewise.
* ira.c: Likewise.
* ira.h: Likewise.
* loop-invariant.c: Likewise.
* loop-unroll.c: Likewise.
* lower-subreg.c: Likewise.
* lra-assigns.c: Likewise.
* lra-constraints.c: Likewise.
* lra-eliminations.c: Likewise.
* lra-lives.c: Likewise.
* lra-remat.c: Likewise.
* lra-spills.c: Likewise.
* lra.c: Likewise.
* lto-cgraph.c: Likewise.
* lto-streamer-out.c: Likewise.
* postreload-gcse.c: Likewise.
* predict.c: Likewise.
* profile-count.h: Likewise.
* profile.c: Likewise.
* recog.c: Likewise.
* ree.c: Likewise.
* reload.c: Likewise.
* reload1.c: Likewise.
* reorg.c: Likewise.
* resource.c: Likewise.
* rtl.def: Likewise.
* rtl.h: Likewise.
* rtlanal.c: Likewise.
* sched-deps.c: Likewise.
* sched-ebb.c: Likewise.
* sched-rgn.c: Likewise.
* sel-sched-ir.c: Likewise.
* sel-sched.c: Likewise.
* shrink-wrap.c: Likewise.
* simplify-rtx.c: Likewise.
* symtab.c: Likewise.
* target.def: Likewise.
* toplev.c: Likewise.
* tree-call-cdce.c: Likewise.
* tree-cfg.c: Likewise.
* tree-complex.c: Likewise.
* tree-core.h: Likewise.
* tree-eh.c: Likewise.
* tree-inline.c: Likewise.
* tree-loop-distribution.c: Likewise.
* tree-nrv.c: Likewise.
* tree-profile.c: Likewise.
* tree-sra.c: Likewise.
* tree-ssa-alias.c: Likewise.
* tree-ssa-dce.c: Likewise.
* tree-ssa-dom.c: Likewise.
* tree-ssa-forwprop.c: Likewise.
* tree-ssa-loop-im.c: Likewise.
* tree-ssa-loop-ivcanon.c: Likewise.
* tree-ssa-loop-ivopts.c: Likewise.
* tree-ssa-loop-niter.c: Likewise.
* tree-ssa-phionlycprop.c: Likewise.
* tree-ssa-phiopt.c: Likewise.
* tree-ssa-propagate.c: Likewise.
* tree-ssa-threadedge.c: Likewise.
* tree-ssa-threadupdate.c: Likewise.
* tree-ssa-uninit.c: Likewise.
* tree-ssanames.c: Likewise.
* tree-streamer-out.c: Likewise.
* tree.c: Likewise.
* tree.h: Likewise.
* vr-values.c: Likewise.
gcc/ada/
* exp_ch9.adb: Mechanically replace "can not" with "cannot".
* libgnat/s-regpat.ads: Likewise.
* par-ch4.adb: Likewise.
* set_targ.adb: Likewise.
* types.ads: Likewise.
gcc/cp/
* cp-tree.h: Mechanically replace "can not" with "cannot".
* parser.c: Likewise.
* pt.c: Likewise.
gcc/fortran/
* class.c: Mechanically replace "can not" with "cannot".
* decl.c: Likewise.
* expr.c: Likewise.
* gfc-internals.texi: Likewise.
* intrinsic.texi: Likewise.
* invoke.texi: Likewise.
* io.c: Likewise.
* match.c: Likewise.
* parse.c: Likewise.
* primary.c: Likewise.
* resolve.c: Likewise.
* symbol.c: Likewise.
* trans-array.c: Likewise.
* trans-decl.c: Likewise.
* trans-intrinsic.c: Likewise.
* trans-stmt.c: Likewise.
gcc/go/
* go-backend.c: Mechanically replace "can not" with "cannot".
* go-gcc.cc: Likewise.
gcc/lto/
* lto-partition.c: Mechanically replace "can not" with "cannot".
* lto-symtab.c: Likewise.
* lto.c: Likewise.
gcc/objc/
* objc-act.c: Mechanically replace "can not" with "cannot".
libbacktrace/
* backtrace.h: Mechanically replace "can not" with "cannot".
libgcc/
* config/c6x/libunwind.S: Mechanically replace "can not" with
"cannot".
* config/tilepro/atomic.h: Likewise.
* config/vxlib-tls.c: Likewise.
* generic-morestack-thread.c: Likewise.
* generic-morestack.c: Likewise.
* mkmap-symver.awk: Likewise.
libgfortran/
* caf/single.c: Mechanically replace "can not" with "cannot".
* io/unit.c: Likewise.
libobjc/
* class.c: Mechanically replace "can not" with "cannot".
* objc/runtime.h: Likewise.
* sendmsg.c: Likewise.
liboffloadmic/
* include/coi/common/COIResult_common.h: Mechanically replace
"can not" with "cannot".
* include/coi/source/COIBuffer_source.h: Likewise.
libstdc++-v3/
* include/ext/bitmap_allocator.h: Mechanically replace "can not"
with "cannot".
From-SVN: r267783
|
|
From-SVN: r267494
|
|
simplify-rtx.c:2153 since r264688)
PR rtl-optimization/87918
* simplify-rtx.c (simplify_merge_mask): For COMPARISON_P, use
simplify_gen_relational rather than simplify_gen_binary.
* gcc.target/i386/pr87918.c: New test.
From-SVN: r266062
|
|
2018-11-06 Richard Biener <rguenther@suse.de>
PR middle-end/18041
* simplify-rtx.c (simplify_binary_operation_1): Add pattern
matching bitfield insertion.
* gcc.target/i386/pr18041-1.c: New testcase.
* gcc.target/i386/pr18041-2.c: Likewise.
From-SVN: r265829
|
|
Since mask of vec_merge is in HOST_WIDE_INT, HOST_BITS_PER_WIDE_INT is
the maximum number of vector elements.
* simplify-rtx.c (simplify_subreg): Limit mask of vec_merge to
HOST_BITS_PER_WIDE_INT.
(test_vector_ops_duplicate): Likewise.
From-SVN: r265290
|
|
Simplify
(subreg (vec_merge (X)
(vector)
(const_int ((1 << N) | M)))
(N * sizeof (outermode)))
to
(subreg (X) (N * sizeof (outermode)))
* simplify-rtx.c (simplify_subreg): Call simplify_gen_subreg
to simplify subreg of vec_merge.
From-SVN: r265267
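The simplification above rests on the vec_merge semantics: element i comes from the first operand when bit i of the mask is set. A small model, assuming the subreg selects exactly one element (outermode equal to the element mode) and using hypothetical function names:

```python
def vec_merge(a, b, mask):
    """Element i comes from `a` when bit i of `mask` is set, else from `b`."""
    return [a[i] if (mask >> i) & 1 else b[i] for i in range(len(a))]


def simplify_element_of_vec_merge(a, b, mask, n):
    """Return the operand element n can be read from directly, or None.

    Only the case named in the log entry is modelled: bit n of the mask
    is set, so element n of the merge is element n of the first operand.
    """
    if (mask >> n) & 1:
        return a
    return None
```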
|
|
We can simplify
(subreg (vec_merge (vec_duplicate X)
(vector)
(const_int ((1 << N) | M)))
(N * sizeof (X)))
to X when the mode of X is the same as the mode of the subreg.
gcc/
PR target/87537
* simplify-rtx.c (simplify_subreg): Simplify subreg of vec_merge
of vec_duplicate.
(test_vector_ops_duplicate): Add test for a scalar subreg of a
VEC_MERGE of a VEC_DUPLICATE.
gcc/testsuite/
PR target/87537
* gcc.target/i386/pr87537-1.c: New test.
From-SVN: r265260
|
|
This patch was part of the original patch we acquired from Honza and Martin.
It simplifies nested vec_merge operations using the same mask.
Self-tests are included.
2018-09-28 Andrew Stubbs <ams@codesourcery.com>
Jan Hubicka <jh@suse.cz>
Martin Jambor <mjambor@suse.cz>
* simplify-rtx.c (simplify_merge_mask): New function.
(simplify_ternary_operation): Use it, also see if VEC_MERGEs with the
same masks are used in op1 or op2.
(test_vec_merge): New function.
(test_vector_ops): Call test_vec_merge.
Co-Authored-By: Jan Hubicka <jh@suse.cz>
Co-Authored-By: Martin Jambor <mjambor@suse.cz>
From-SVN: r264688
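The nested vec_merge simplification relies on two identities that hold for any mask: (vec_merge (vec_merge a b m) c m) equals (vec_merge a c m), and (vec_merge a (vec_merge b c m) m) equals (vec_merge a c m). The sketch below checks them exhaustively on a small model; it is an illustration, not the self-test added by the patch, and the function names are hypothetical.

```python
def vec_merge(a, b, mask):
    """Element i comes from `a` when bit i of `mask` is set, else from `b`."""
    return [a[i] if (mask >> i) & 1 else b[i] for i in range(len(a))]


def nested_merge_identities_hold(nelts=4):
    """Check both identities for every mask over `nelts` elements."""
    a = [("a", i) for i in range(nelts)]
    b = [("b", i) for i in range(nelts)]
    c = [("c", i) for i in range(nelts)]
    for mask in range(1 << nelts):
        direct = vec_merge(a, c, mask)
        # Inner merge in the first operand: lanes with the bit set take a.
        if vec_merge(vec_merge(a, b, mask), c, mask) != direct:
            return False
        # Inner merge in the second operand: lanes with the bit clear take c.
        if vec_merge(a, vec_merge(b, c, mask), mask) != direct:
            return False
    return True
```

In both cases the operand b can never reach the result, which is why the inner merge can be dropped.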
|
|
The vec_select operator is documented to require a const_int for the lane
selector operand, but GCN has an instruction that can select the lane at
runtime, so it seems reasonable to remove this restriction.
This patch simply replaces assertions that the operand is constant with early
exits from the optimizers. I think it's reasonable that vec_select with a
non-constant operand cannot be optimized, yet.
Also included is the necessary documentation tweak.
2018-09-19 Andrew Stubbs <ams@codesourcery.com>
gcc/
* doc/rtl.texi: Adjust vec_select description.
* simplify-rtx.c (simplify_binary_operation_1): Allow VEC_SELECT to use
non-constant selectors.
From-SVN: r264423
|
|
* tree-vrp.c (vrp_int_const_binop): Change overflow type to
overflow_type.
(combine_bound): Use wide-int overflow calculation instead of
rolling our own.
* calls.c (maybe_warn_alloc_args_overflow): Change overflow type to
overflow_type.
* fold-const.c (int_const_binop_2): Same.
(extract_muldiv_1): Same.
(fold_div_compare): Same.
(fold_abs_const): Same.
* match.pd: Same.
* poly-int.h (add): Same.
(sub): Same.
(neg): Same.
(mul): Same.
* predict.c (predict_iv_comparison): Same.
* profile-count.c (slow_safe_scale_64bit): Same.
* simplify-rtx.c (simplify_const_binary_operation): Same.
* tree-chrec.c (tree_fold_binomial): Same.
* tree-data-ref.c (split_constant_offset_1): Same.
* tree-if-conv.c (idx_within_array_bound): Same.
* tree-scalar-evolution.c (iv_can_overflow_p): Same.
* tree-ssa-phiopt.c (minmax_replacement): Same.
* tree-vect-loop.c (is_nonwrapping_integer_induction): Same.
* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): Same.
* vr-values.c (vr_values::adjust_range_with_scev): Same.
* wide-int.cc (wi::add_large): Same.
(wi::mul_internal): Same.
(wi::sub_large): Same.
(wi::divmod_internal): Same.
* wide-int.h: Change overflow type to overflow_type for neg, add,
mul, smul, umul, div_trunc, div_floor, div_ceil, div_round,
mod_trunc, mod_ceil, mod_round, add_large, sub_large,
mul_internal, divmod_internal.
(overflow_type): New enum.
(accumulate_overflow): New.
cp/
* decl.c (build_enumerator): Change overflow type to overflow_type.
* init.c (build_new_1): Same.
From-SVN: r262494
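The move from a boolean overflow flag to overflow_type lets callers tell overflow from underflow. The sketch below is a Python model of that idea; the member names and numeric values are modelled on the log entry and are assumptions, and accumulate_overflow returns the combined state instead of updating an argument in place.

```python
from enum import IntEnum


class OverflowType(IntEnum):
    OVF_NONE = 0        # assumed values, for illustration only
    OVF_UNDERFLOW = -1
    OVF_OVERFLOW = 1
    OVF_UNKNOWN = 2


def accumulate_overflow(overall, sub):
    """Combine the overflow state of a sub-computation into `overall`."""
    if sub == OverflowType.OVF_NONE:
        return overall
    if overall == OverflowType.OVF_NONE:
        return sub
    if overall != sub:
        # Conflicting directions: the net direction is not known.
        return OverflowType.OVF_UNKNOWN
    return overall


def signed_add(a, b, precision):
    """Wrapping signed addition that also reports the overflow direction."""
    lo, hi = -(1 << (precision - 1)), (1 << (precision - 1)) - 1
    result = a + b
    if result > hi:
        return result - (1 << precision), OverflowType.OVF_OVERFLOW
    if result < lo:
        return result + (1 << precision), OverflowType.OVF_UNDERFLOW
    return result, OverflowType.OVF_NONE
```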
|
|
This patch generalises various places that used hwi rtx accessors so
that they can handle poly_ints instead. In many cases these changes
are by inspection rather than because something had shown them to be
necessary.
2018-06-12 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* poly-int.h (can_div_trunc_p): Add new overload in which all values
are poly_ints.
* alias.c (get_addr): Extend CONST_INT handling to poly_int_rtx_p.
(memrefs_conflict_p): Likewise.
(init_alias_analysis): Likewise.
* cfgexpand.c (expand_debug_expr): Likewise.
* combine.c (combine_simplify_rtx, force_int_to_mode): Likewise.
* cse.c (fold_rtx): Likewise.
* explow.c (adjust_stack, anti_adjust_stack): Likewise.
* expr.c (emit_block_move_hints): Likewise.
(clear_storage_hints, push_block, emit_push_insn): Likewise.
(store_expr_with_bounds, reduce_to_bit_field_precision): Likewise.
(emit_group_load_1): Use rtx_to_poly_int64 for group offsets.
(emit_group_store): Likewise.
(find_args_size_adjust): Use strip_offset. Use rtx_to_poly_int64
to read the PRE/POST_MODIFY increment.
* calls.c (store_one_arg): Use strip_offset.
* rtlanal.c (rtx_addr_can_trap_p_1): Extend CONST_INT handling to
poly_int_rtx_p.
(set_noop_p): Use rtx_to_poly_int64 for the elements selected
by a VEC_SELECT.
* simplify-rtx.c (avoid_constant_pool_reference): Use strip_offset.
(simplify_binary_operation_1): Extend CONST_INT handling to
poly_int_rtx_p.
* var-tracking.c (compute_cfa_pointer): Take a poly_int64 rather
than a HOST_WIDE_INT.
(hard_frame_pointer_adjustment): Change from HOST_WIDE_INT to
poly_int64.
(adjust_mems, add_stores): Update accordingly.
(vt_canonicalize_addr): Track polynomial offsets.
(emit_note_insn_var_location): Likewise.
(vt_add_function_parameter): Likewise.
(vt_initialize): Likewise.
From-SVN: r261530
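For readers unfamiliar with poly_ints: they stand for values of the form c0 + c1 * x, where x is a non-negative quantity only known at run time (for example, a multiple of the vector length). The sketch below is a two-coefficient Python model, not GCC's poly_int template; the class and helper names are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poly:
    """Models c0 + c1 * x for an unknown integer x >= 0."""
    c0: int
    c1: int = 0

    def __add__(self, other):
        return Poly(self.c0 + other.c0, self.c1 + other.c1)

    def is_constant(self):
        return self.c1 == 0

    def at(self, x):
        return self.c0 + self.c1 * x


def known_le(a, b):
    """True if a <= b for every x >= 0."""
    return a.c0 <= b.c0 and a.c1 <= b.c1


def maybe_lt(a, b):
    """True if a < b for at least one x >= 0."""
    return not known_le(b, a)
```

Code that used to read a CONST_INT offset has to decide which of the two questions it is asking, "always" (known_le) or "possibly" (maybe_lt), since the two answers differ once c1 is non-zero.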
|
|
it is not useful
In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.
This duplication can be clunky and wasteful.
Better would be to simply use the fact that we will always be
overwriting the remaining bits, and simply move the first element to the correct
place (implicitly zeroing all other bits).
This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.
Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.
Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c , which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix
this, add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.
We also need a new pattern in simplify-rtx.c:simplify_ternary_operation
to catch:
(vec_merge:OUTER
(vec_duplicate:OUTER x:INNER)
(subreg:OUTER y:INNER 0)
(const_int N))
And simplify it to:
(vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)
This is similar to the existing patterns which are tested in this
function, without requiring the second operand to also be a vec_duplicate.
* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify
code generation for cases where splatting a value is not useful.
* simplify-rtx.c (simplify_ternary_operation): Simplify
vec_merge across a vec_duplicate and a paradoxical subreg forming a vector
mode to a vec_concat.
* gcc.target/aarch64/vect-slp-dup.c: New.
Co-Authored-By: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
From-SVN: r260309
|
|
config/i386/i386.c:13810 with -Og -fgcse)
PR middle-end/85414
* simplify-rtx.c (simplify_unary_operation_1) <case SIGN_EXTEND,
case ZERO_EXTEND>: Pass SUBREG_REG (op) rather than op to
gen_lowpart_no_emit.
From-SVN: r259649
|
|
-fno-tree-ccp -fno-tree-copy-prop)
PR rtl-optimization/85376
* simplify-rtx.c (simplify_const_unary_operation): For CLZ and CTZ and
zero op0, if C?Z_DEFINED_VALUE_AT_ZERO is false, return NULL_RTX
instead of a specific value.
* gcc.dg/pr85376.c: New test.
From-SVN: r259377
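The behaviour described above can be modelled as a constant folder that declines to fold when the operand is zero and the target does not define a value for that case. In this sketch None plays the role of NULL_RTX, and the defined_value_at_zero parameter is a hypothetical stand-in for the target macros.

```python
def fold_clz(value, precision, defined_value_at_zero=None):
    """Count leading zeros of `value` in `precision` bits, or None if undefined."""
    value &= (1 << precision) - 1
    if value == 0:
        # No target-defined result: refuse to fold rather than invent one.
        return defined_value_at_zero
    return precision - value.bit_length()


def fold_ctz(value, precision, defined_value_at_zero=None):
    """Count trailing zeros of `value` in `precision` bits, or None if undefined."""
    value &= (1 << precision) - 1
    if value == 0:
        return defined_value_at_zero
    return (value & -value).bit_length() - 1
```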
|
|
simplify_const_unary_operation, at simplify-rtx.c:1731)
PR rtl-optimization/84989
* simplify-rtx.c (simplify_unary_operation_1): Don't try to simplify
VEC_DUPLICATE with scalar result mode.
* gcc.target/i386/pr84989.c: New test.
From-SVN: r258709
|
|
simplify_binary_operation_1, at simplify-rtx.c:3302)
PR target/83930
* simplify-rtx.c (simplify_binary_operation_1) <case UMOD>: Use
UINTVAL (trueop1) instead of INTVAL (op1).
* gcc.dg/pr83930.c: New test.
From-SVN: r256915
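The UMOD case touched above concerns an unsigned modulus by a power of two, which can be expressed as an AND with the divisor minus one. The sketch below shows only that arithmetic identity, with the mask computed from the unsigned view of the constant; it does not reproduce the fixed code or the failure, and the function names are hypothetical.

```python
def exact_log2(value):
    """Return log2 of `value` if it is a power of two, else -1."""
    if value > 0 and (value & (value - 1)) == 0:
        return value.bit_length() - 1
    return -1


def umod_as_and_mask(divisor, precision):
    """Return the AND mask equivalent to `x umod divisor`, or None.

    The divisor is first reduced to its unsigned value in `precision`
    bits, so a constant whose top bit is set is still recognised as a
    power of two.
    """
    unsigned = divisor & ((1 << precision) - 1)
    if exact_log2(unsigned) > 0:
        return unsigned - 1
    return None
```

For example, umod_as_and_mask(8, 32) yields 7, and 1234 % 8 equals 1234 & 7.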
|