Age | Commit message (Collapse) | Author | Files | Lines |
|
This C++ify cond_if_else_store_replacement by using range fors
and changing using a std::pair instead of 2 vecs.
I had a hard time understanding the code when there was 2 vecs
so having a vec of a pair makes it easier to understand the relationship
between the 2.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_if_else_store_replacement): Use
range fors and use one vec for then/else stores instead of 2.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
While trying to debug PR 116747, I noticed there was no dump
saying what was done. So this adds the debug dump and it helps
debug what is going on in PR 116747 too.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_if_else_store_replacement_1): Add debug dump.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The heuristics for factoring out with a constant checks that the assignment statement
is the last statement of the basic block but sometimes there is a predicate or a nop statement
after the assignment. Rejecting this case does not make sense since both predicates and nop
statements are removed and don't contribute any instructions. So we should skip over them
when checking if the assignment statement was the last statement in the basic block.
phi-opt-factor-1.c's f0 is such an example where it should catch it at phiopt1 (before predicates are removed)
and should happen in a similar way as f1 (which uses a temporary variable rather than return).
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/116699
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Skip over nop/predicates
for seeing the assignment is the last statement.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-factor-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
While working on a different patch, I noticed the heuristics were not
doing the right thing if there was statements before the NOP/PREDICTs.
(LABELS don't have other statements before them).
This fixes that oversight which was added in r15-3334-gceda727dafba6e.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Instead
of just ignorning a NOP/PREDICT, skip over them before checking
the heuristics.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
seperate function
When r14-303-gb9fedabe381cce was done, it was missed that some of the common parts could
be done in a template and a lambda could be used. This patch implements that. This new
function can be used later on to implement a simple ifcvt pass.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (execute_over_cond_phis): New template function,
moved the common parts from pass_phiopt::execute/pass_cselim::execute.
(pass_phiopt::execute): Move the functon specific parts of the loop
into an lamdba.
(pass_cselim::execute): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This converts the uses of PHI_RESULT in phiopt to be gimple_phi_result
instead. Since there was already a mismatch of uses here, it
would be good to use prefered one (gimple_phi_result) instead.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/116643
gcc/ChangeLog:
* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): s/PHI_RESULT/gimple_phi_result/.
(factor_out_conditional_operation): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_store_replacement): Likewise.
(cond_if_else_store_replacement_1): Likewise.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
factor_out_conditional_operation
This small cleanup removes a redundant check for gimple_assign_cast_p and reformats
based on that. Also changes the if statement that checks if the integral type and the
check to see if the constant fits into the new type such that it returns null
and reformats based on that.
Also moves the check for has_single_use earlier so it is less complex still a cheaper
check than some of the others (like the check on the integer side).
This was noticed when adding a few new things to factor_out_conditional_operation
but those are not ready to submit yet.
Note there are no functional difference with this change.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move the has_single_use
checks much earlier. Remove redundant check for gimple_assign_cast_p.
Change around the check if the integral consts fits into the new type.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
The heurstics that was added for PR71016, try to search to see
if the conversion was being moved away from its definition. The problem
is the heurstics would stop if there was a non GIMPLE_ASSIGN (and already ignores
debug statements) and in this case we would have a GIMPLE_LABEL that was not
being ignored. So we should need to ignore GIMPLE_NOP, GIMPLE_LABEL and GIMPLE_PREDICT.
Note this is now similar to how gimple_empty_block_p behaves.
Note this fixes the wrong code that was reported by moving the VCE (conversion) out before
the phiopt/match could convert it into an bit_ior and move the VCE out with the VCE being
conditionally valid.
Bootstrapped and tested on x86_64-linux-gnu.
Also built and tested for aarch64-linux-gnu.
PR tree-optimization/116098
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Ignore
nops, labels and predicts for heuristic for conversion with a constant.
gcc/testsuite/ChangeLog:
* c-c++-common/torture/pr116098-1.c: New test.
* gcc.target/aarch64/csel-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
As preliminary work towards an overhaul of how optinfo_items
interact with dump_pretty_printer, replace uses of optinfo_item * with
std::unique_ptr<optinfo_item> to make ownership clearer.
No functional change intended.
gcc/ChangeLog:
* config/aarch64/aarch64.cc: Define INCLUDE_MEMORY.
* config/arm/arm.cc: Likewise.
* config/i386/i386.cc: Likewise.
* config/loongarch/loongarch.cc: Likewise.
* config/riscv/riscv-vector-costs.cc: Likewise.
* config/riscv/riscv.cc: Likewise.
* config/rs6000/rs6000.cc: Likewise.
* dump-context.h (dump_context::emit_item): Convert "item" param
from * to const &.
(dump_pretty_printer::stash_item): Convert "item" param from
optinfo_ * to std::unique_ptr<optinfo_item>.
(dump_pretty_printer::emit_item): Likewise.
* dumpfile.cc: Include "make-unique.h".
(make_item_for_dump_gimple_stmt): Replace uses of optinfo_item *
with std::unique_ptr<optinfo_item>.
(dump_context::dump_gimple_stmt): Likewise.
(make_item_for_dump_gimple_expr): Likewise.
(dump_context::dump_gimple_expr): Likewise.
(make_item_for_dump_generic_expr): Likewise.
(dump_context::dump_generic_expr): Likewise.
(make_item_for_dump_symtab_node): Likewise.
(dump_pretty_printer::emit_items): Likewise.
(dump_pretty_printer::emit_any_pending_textual_chunks): Likewise.
(dump_pretty_printer::emit_item): Likewise.
(dump_pretty_printer::stash_item): Likewise.
(dump_pretty_printer::decode_format): Likewise.
(dump_context::dump_printf_va): Fix overlong line.
(make_item_for_dump_dec): Replace uses of optinfo_item * with
std::unique_ptr<optinfo_item>.
(dump_context::dump_dec): Likewise.
(dump_context::dump_symtab_node): Likewise.
(dump_context::begin_scope): Likewise.
(dump_context::emit_item): Likewise.
* gimple-loop-interchange.cc: Define INCLUDE_MEMORY.
* gimple-loop-jam.cc: Likewise.
* gimple-loop-versioning.cc: Likewise.
* graphite-dependences.cc: Likewise.
* graphite-isl-ast-to-gimple.cc: Likewise.
* graphite-optimize-isl.cc: Likewise.
* graphite-poly.cc: Likewise.
* graphite-scop-detection.cc: Likewise.
* graphite-sese-to-poly.cc: Likewise.
* graphite.cc: Likewise.
* opt-problem.cc: Likewise.
* optinfo.cc (optinfo::add_item): Convert "item" param from
optinfo_ * to std::unique_ptr<optinfo_item>.
(optinfo::emit_for_opt_problem): Update for change to
dump_context::emit_item.
* optinfo.h: Add #error to fail immediately if INCLUDE_MEMORY
wasn't defined, rather than fail to find std::unique_ptr.
(optinfo::add_item): Convert "item" param from optinfo_ * to
std::unique_ptr<optinfo_item>.
* sese.cc: Define INCLUDE_MEMORY.
* targhooks.cc: Likewise.
* tree-data-ref.cc: Likewise.
* tree-if-conv.cc: Likewise.
* tree-loop-distribution.cc: Likewise.
* tree-parloops.cc: Likewise.
* tree-predcom.cc: Likewise.
* tree-ssa-live.cc: Likewise.
* tree-ssa-loop-ivcanon.cc: Likewise.
* tree-ssa-loop-ivopts.cc: Likewise.
* tree-ssa-loop-prefetch.cc: Likewise.
* tree-ssa-loop-unswitch.cc: Likewise.
* tree-ssa-phiopt.cc: Likewise.
* tree-ssa-threadbackward.cc: Likewise.
* tree-ssa-threadupdate.cc: Likewise.
* tree-vect-data-refs.cc: Likewise.
* tree-vect-generic.cc: Likewise.
* tree-vect-loop-manip.cc: Likewise.
* tree-vect-loop.cc: Likewise.
* tree-vect-patterns.cc: Likewise.
* tree-vect-slp-patterns.cc: Likewise.
* tree-vect-slp.cc: Likewise.
* tree-vect-stmts.cc: Likewise.
* tree-vectorizer.cc: Likewise.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/dump_plugin.c: Define INCLUDE_MEMORY.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
|
|
factor_out_conditional_operation [PR 116409]
The code was assuming that maybe_push_res_to_seq would not fail if the gimple_extract_op returned true.
But for some cases when the function is pure rather than const, then it can fail.
This change moves around the code to check the result of maybe_push_res_to_seq instead of assuming it will
always work.
Changes since v1:
* v2: Instead of directly testing non-pure builtin functions change to test if maybe_push_res_to_seq fails.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/116409
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Move
maybe_push_res_to_seq before creating the phi node and the debug dump.
Return false if maybe_push_res_to_seq fails.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr116409-1.c: New test.
* gcc.dg/torture/pr116409-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
To start working on more with expressions with more than one operand, converting
over to use gimple_match_op is needed.
The added side-effect here is factor_out_conditional_operation can now support
builtins/internal calls that has one operand without any extra code added.
Note on the changed testcases:
* pr87007-5.c: the test was testing testing for avoiding partial register stalls
for the sqrt and making sure there is only one zero of the register before the
branch, the phiopt would now merge the sqrt's so disable phiopt.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* gimple-match-exports.cc (gimple_match_op::operands_occurs_in_abnormal_phi):
New function.
* gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi.
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Use gimple_match_op
instead of manually extracting from/creating the gimple.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr87007-5.c: Disable phi-opt.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
I didn't update the comment before factor_out_conditional_operation
correctly. this updates it to be correct and mentions unary operations
rather than just conversions.
Pushed as obvious.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Update
comment.
|
|
Automatic arrays that are not address-taken should not be subject to
store data races. This applies to OMP SIMD in-branch lowered
functions result array which for the testcase otherwise prevents
vectorization with SSE and for AVX and AVX512 ends up with spurious
.MASK_STORE to the stack surviving.
This inefficiency was noted in PR111793.
I've introduced ref_can_have_store_data_races, commonizing uses
of flag_store_data_races in if-conversion, cselim and store motion.
PR tree-optimization/111793
* tree-ssa-alias.h (ref_can_have_store_data_races): Declare.
* tree-ssa-alias.cc (ref_can_have_store_data_races): New
function.
* tree-if-conv.cc (ifcvt_memrefs_wont_trap): Use
ref_can_have_store_data_races to allow more unconditional
stores.
* tree-ssa-loop-im.cc (execute_sm): Likewise.
* tree-ssa-phiopt.cc (cond_store_replacement): Likewise.
* gcc.dg/vect/vect-simd-clone-21.c: New testcase.
|
|
Now that all remaining users of value_range have been renamed to
int_range<>, we can reclaim value_range as a temporary, thus removing
the annoying CamelCase.
gcc/ChangeLog:
* data-streamer-in.cc (streamer_read_value_range): Rename
Value_Range to value_range.
* data-streamer.h (streamer_read_value_range): Same.
* gimple-pretty-print.cc (dump_ssaname_info): Same.
* gimple-range-cache.cc (ssa_block_ranges::dump): Same.
(ssa_lazy_cache::merge): Same.
(block_range_cache::dump): Same.
(ssa_cache::merge_range): Same.
(ssa_cache::dump): Same.
(ranger_cache::edge_range): Same.
(ranger_cache::propagate_cache): Same.
(ranger_cache::fill_block_cache): Same.
(ranger_cache::resolve_dom): Same.
(ranger_cache::range_from_dom): Same.
(ranger_cache::register_inferred_value): Same.
* gimple-range-fold.cc (op1_range): Same.
(op2_range): Same.
(fold_relations): Same.
(fold_using_range::range_of_range_op): Same.
(fold_using_range::range_of_phi): Same.
(fold_using_range::range_of_call): Same.
(fold_using_range::condexpr_adjust): Same.
(fold_using_range::range_of_cond_expr): Same.
(fur_source::register_outgoing_edges): Same.
* gimple-range-fold.h (gimple_range_type): Same.
(gimple_range_ssa_p): Same.
* gimple-range-gori.cc (gori_compute::compute_operand_range): Same.
(gori_compute::logical_combine): Same.
(gori_compute::refine_using_relation): Same.
(gori_compute::compute_operand1_range): Same.
(gori_compute::compute_operand2_range): Same.
(gori_compute::compute_operand1_and_operand2_range): Same.
(gori_calc_operands): Same.
(gori_name_helper): Same.
* gimple-range-infer.cc (gimple_infer_range::check_assume_func): Same.
(gimple_infer_range::gimple_infer_range): Same.
(infer_range_manager::maybe_adjust_range): Same.
(infer_range_manager::add_range): Same.
* gimple-range-infer.h: Same.
* gimple-range-op.cc
(gimple_range_op_handler::gimple_range_op_handler): Same.
(gimple_range_op_handler::calc_op1): Same.
(gimple_range_op_handler::calc_op2): Same.
(gimple_range_op_handler::maybe_builtin_call): Same.
* gimple-range-path.cc (path_range_query::internal_range_of_expr): Same.
(path_range_query::ssa_range_in_phi): Same.
(path_range_query::compute_ranges_in_phis): Same.
(path_range_query::compute_ranges_in_block): Same.
(path_range_query::add_to_exit_dependencies): Same.
* gimple-range-trace.cc (debug_seed_ranger): Same.
* gimple-range.cc (gimple_ranger::range_of_expr): Same.
(gimple_ranger::range_on_entry): Same.
(gimple_ranger::range_on_edge): Same.
(gimple_ranger::range_of_stmt): Same.
(gimple_ranger::prefill_stmt_dependencies): Same.
(gimple_ranger::register_inferred_ranges): Same.
(gimple_ranger::register_transitive_inferred_ranges): Same.
(gimple_ranger::export_global_ranges): Same.
(gimple_ranger::dump_bb): Same.
(assume_query::calculate_op): Same.
(assume_query::calculate_phi): Same.
(assume_query::dump): Same.
(dom_ranger::range_of_stmt): Same.
* ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Same.
(ipa_vr_operation_and_type_effects): Same.
(ipa_value_range_from_jfunc): Same.
(propagate_bits_across_jump_function): Same.
(propagate_vr_across_jump_function): Same.
(ipcp_store_vr_results): Same.
* ipa-cp.h: Same.
* ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same.
(evaluate_properties_for_edge): Same.
* ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same.
(ipa_vr::get_vrange): Same.
(ipa_vr::streamer_read): Same.
(ipa_vr::streamer_write): Same.
(ipa_vr::dump): Same.
(ipa_set_jfunc_vr): Same.
(ipa_compute_jump_functions_for_edge): Same.
(ipcp_get_parm_bits): Same.
(ipcp_update_vr): Same.
(ipa_record_return_value_range): Same.
(ipa_return_value_range): Same.
* ipa-prop.h (ipa_return_value_range): Same.
(ipa_record_return_value_range): Same.
* range-op.h (range_cast): Same.
* tree-ssa-dom.cc
(dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): Same.
(cprop_operand): Same.
* tree-ssa-loop-ch.cc (loop_static_stmt_p): Same.
* tree-ssa-loop-niter.cc (record_nonwrapping_iv): Same.
* tree-ssa-loop-split.cc (split_at_bb_p): Same.
* tree-ssa-phiopt.cc (value_replacement): Same.
* tree-ssa-strlen.cc (get_range): Same.
* tree-ssa-threadedge.cc (hybrid_jt_simplifier::simplify): Same.
(hybrid_jt_simplifier::compute_exit_dependencies): Same.
* tree-ssanames.cc (set_range_info): Same.
(duplicate_ssa_name_range_info): Same.
* tree-vrp.cc (remove_unreachable::handle_early): Same.
(remove_unreachable::remove_and_update_globals): Same.
(execute_ranger_vrp): Same.
* value-query.cc (range_query::value_of_expr): Same.
(range_query::value_on_edge): Same.
(range_query::value_of_stmt): Same.
(range_query::value_on_entry): Same.
(range_query::value_on_exit): Same.
(range_query::get_tree_range): Same.
* value-range-storage.cc (vrange_storage::set_vrange): Same.
* value-range.cc (Value_Range::dump): Same.
(value_range::dump): Same.
(debug): Same.
* value-range.h (enum value_range_discriminator): Same.
(class vrange): Same.
(class Value_Range): Same.
(class value_range): Same.
(Value_Range::Value_Range): Same.
(value_range::value_range): Same.
(Value_Range::~Value_Range): Same.
(value_range::~value_range): Same.
(Value_Range::set_type): Same.
(value_range::set_type): Same.
(Value_Range::init): Same.
(value_range::init): Same.
(Value_Range::operator=): Same.
(value_range::operator=): Same.
(Value_Range::operator==): Same.
(value_range::operator==): Same.
(Value_Range::operator!=): Same.
(value_range::operator!=): Same.
(Value_Range::supports_type_p): Same.
(value_range::supports_type_p): Same.
* vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Same.
(simplify_using_ranges::legacy_fold_cond): Same.
|
|
Fix a use of int_range_max in phiopt that should be a type agnostic
range, because it could be either a pointer or an int.
PR tree-optimization/115191
gcc/ChangeLog:
* tree-ssa-phiopt.cc (value_replacement): Use Value_Range instead
of int_range_max.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr115191.c: New test.
|
|
The problem here is even if last_and_only_stmt returns a statement,
the bb might still contain a phi node which defines a ssa name
which is used in that statement so we need to add a check to make sure
that the phi nodes are empty for the middle bbs in both the
`CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases.
Bootstrapped and tested on x86_64_linux-gnu with no regressions.
PR tree-optimization/115143
gcc/ChangeLog:
* tree-ssa-phiopt.cc (minmax_replacement): Check for empty
phi nodes for middle bbs for the case where middle bb is not empty.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/pr115143-1.c: New test.
* gcc.c-torture/compile/pr115143-2.c: New test.
* gcc.c-torture/compile/pr115143-3.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
While moving value replacement part of PHIOPT over
to use match-and-simplify, I ran into the case where
we would have an undef use that was conditional become
unconditional. This prevents that. I can't remember at this
point what the testcase was though.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (value_replacement): Reject undef variables
so they don't become unconditional used.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
This adds a few early outs to value_replacement that I noticed
while rewriting this to use match-and-simplify but could be committed
seperately.
* virtual operands won't change so return early for them
* special case `A ? B : B` as that is already just `B`
Also moves the check for NE/EQ earlier as calculating empty_or_with_defined_p
is an IR walk for a BB and that might be big.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (value_replacement): Move check for
NE/EQ earlier.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
I noticed that single_non_singleton_phi_for_edges could
return a phi whos entry are all the same for the edge.
This happens only if there was a single phis in the first place.
Also gimple_seq_singleton_p walks the sequence to see if it the one
element in the sequence so there is removing that check actually
reduces the number of pointer walks needed.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges):
Remove the special case of gimple_seq_singleton_p.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
|
|
Another patch from eyeballing
git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 '
output, this time in gcc/ subdirectory.
2024-04-09 Jakub Jelinek <jakub@redhat.com>
gcc/
* expr.cc (convert_mode_scalar): Fix duplicated words in comment;
into into -> it into.
* function.h (function::cond_uids): Fix duplicated words in comment;
same same -> same.
* config/riscv/riscv-vector-costs.cc
(costs::adjust_vect_cost_per_loop): Fix duplicated words in comment;
model model -> model.
* config/riscv/riscv-vector-builtins-shapes.cc (build_base): Fix
duplicated words in comment; for for -> for.
* config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Fix
duplicated words in comment; more more -> more.
* config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Fix
duplicated words in comment; be be -> be.
* tree-profile.cc (masking_vectors): Fix duplicated words in comment;
has has -> has, the the -> the.
* value-range.cc (irange::set_range_from_bitmask): Fix duplicated
words in comment; the the -> the.
* gcov.cc (add_condition_counts): Fix duplicated words in comment;
to to -> to.
* vr-values.cc (get_scev_info): Fix duplicated words in comment;
the the -> to the.
* tree-vrp.cc (fully_replaceable): Fix duplicated words in comment;
by by -> by.
* mode-switching.cc (single_succ_confluence_n): Fix duplicated words
in comment; the the -> the.
* tree-ssa-phiopt.cc (value_replacement): Fix duplicated words in
comment; can can -> we can.
* gimple-range-phi.cc (phi_analyzer::process_phi): Fix duplicated words
in comment; it it -> it is.
* tree-ssa-sccvn.cc (visit_phi): Fix duplicated words in comment;
to to -> to.
* rtl-ssa/accesses.h (use_info::next_debug_insn_use): Fix duplicated
words in comment; if if -> if.
* doc/options.texi (InverseMask): Fix duplicated words; and and -> and.
Change take to takes.
* doc/invoke.texi (fanalyzer-undo-inlining): Fix duplicated words;
be be -> be.
(-minline-memops-threshold): Likewise.
gcc/analyzer/
* analyzer.opt (Wanalyzer-undefined-behavior-strtok): Fix duplicated
words; in in -> in.
* program-state.cc (sm_state_map::replay_call_summary): Fix duplicated
words in comment; to to -> to.
(program_state::replay_call_summary): Likewise.
* region-model.cc (region_model::replay_call_summary): Likewise.
gcc/c/
* c-decl.cc (previous_tag): Fix duplicated words in comment; the the
-> the.
(diagnose_mismatched_decls): Fix duplicated words in comment;
about about -> about.
gcc/cp/
* constexpr.cc (build_new_constexpr_heap_type): Fix duplicated words
in comment; is is -> is.
* cp-tree.def (CO_RETURN_EXPR): Fix duplicated words in comment;
for for -> for.
* parser.cc (fixup_blocks_walker): Fix duplicated words in comment;
is is -> is.
* semantics.cc (fixup_template_type): Fix duplicated words in comment;
for for -> for.
(finish_omp_for): Fix duplicated words in comment; the the -> the.
* pt.cc (more_specialized_fn): Fix duplicated words in comment;
think think -> think.
(type_targs_deducible_from): Fix duplicated words in comment; the the
-> the.
gcc/jit/
* docs/topics/expressions.rst (Constructor expressions): Fix
duplicated words; have have -> have.
|
|
|
|
This function is never called when param_l1_cache_line_size is 0,
but it uses int and unsigned int variables to hold alignment in
bits, so for large param_l1_cache_line_size it is zero and e.g.
DECL_ALIGN () % param_align_bits can divide by zero.
Looking at the code, the function uses tree_fits_uhwi_p on the trees
before converting them using tree_to_uhwi to int variables, which
looks just wrong, either it would need to punt if it doesn't fit
into those and also check for overflows during the computation,
or use unsigned HOST_WIDE_INT for all of this. That also fixes
the division by zero, as param_l1_cache_line_size maximum is INT_MAX,
that multiplied by 8 will always fit.
2023-12-09 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/112887
* tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of
param_align, param_align_bits, offset1, offset2, size2 and align1
variables from int or unsigned int to unsigned HOST_WIDE_INT.
* gcc.dg/pr112887.c: New test.
|
|
The following patch adds 6 new type-generic builtins,
__builtin_clzg
__builtin_ctzg
__builtin_clrsbg
__builtin_ffsg
__builtin_parityg
__builtin_popcountg
The g at the end stands for generic because the unsuffixed variant
of the builtins already have unsigned int or int arguments.
The main reason to add these is to support arbitrary unsigned (for
clrsb/ffs signed) bit-precise integer types and also __int128 which
wasn't supported by the existing builtins, so that e.g. <stdbit.h>
type-generic functions could then support not just bit-precise unsigned
integer type whose width matches a standard or extended integer type,
but others too.
None of these new builtins promote their first argument, so the argument
can be e.g. unsigned char or unsigned short or unsigned __int20 etc.
The first 2 support either 1 or 2 arguments, if only 1 argument is supplied,
the behavior is undefined for argument 0 like for other __builtin_c[lt]z*
builtins, if 2 arguments are supplied, the second argument should be int
that will be returned if the argument is 0. All other builtins have
just one argument. For __builtin_clrsbg and __builtin_ffsg the argument
shall be any signed standard/extended or bit-precise integer, for the others
any unsigned standard/extended or bit-precise integer (bool not allowed).
One possibility would be to also allow signed integer types for
the clz/ctz/parity/popcount ones (and just cast the argument to
unsigned_type_for during folding) and similarly unsigned integer types
for the clrsb/ffs ones, dunno what is better; for stdbit.h the current
version is sufficient and diagnoses use of the inappropriate sign,
though on the other side I wonder if users won't be confused by
__builtin_clzg (1) being an error and having to write __builtin_clzg (1U).
The new builtins are lowered to corresponding builtins with other suffixes
or internal calls (plus casts and adjustments where needed) during FE
folding or during gimplification at latest, the non-suffixed builtins
handling precisions up to precision of int, l up to precision of long,
ll up to precision of long long, up to __int128 precision lowered to
double-word expansion early and the rest (which must be _BitInt) lowered
to internal fn calls - those are then lowered during bitint lowering pass.
The patch also changes representation of IFN_CLZ and IFN_CTZ calls,
previously they were in the IL only if they are directly supported optab
and depending on C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 they had or didn't
have defined behavior at 0, now they are in the IL either if directly
supported optab, or for the large/huge BITINT_TYPEs and they have either
1 or 2 arguments. If one, the behavior is undefined at zero, if 2, the
second argument is an int constant that should be returned for 0.
As there is no extra support during expansion, for directly supported optab
the second argument if present should still match the
C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 value, but for BITINT_TYPE arguments
it can be arbitrary int INTEGER_CST.
The indended uses in stdbit.h are e.g.
#ifdef __has_builtin
#if __has_builtin(__builtin_clzg) && __has_builtin(__builtin_ctzg) && __has_builtin(__builtin_popcountg)
#define stdc_leading_zeros(value) \
((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#define stdc_leading_ones(value) \
((unsigned int) __builtin_clzg ((__typeof (value)) ~(value), __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#define stdc_first_trailing_one(value) \
((unsigned int) (__builtin_ctzg (value, -1) + 1))
#define stdc_trailing_zeros(value) \
((unsigned int) __builtin_ctzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0)))
#endif
#endif
where __builtin_popcountg ((__typeof (x)) -1) computes the bit precision
of x's type (kind of _Bitwidthof (x) alternative).
They also allow casting of arbitrary unsigned _BitInt other than
unsigned _BitInt(1) to corresponding signed _BitInt by using
signed _BitInt(__builtin_popcountg ((__typeof (a)) -1))
and of arbitrary signed _BitInt to corresponding unsigned _BitInt
using unsigned _BitInt(__builtin_clrsbg ((__typeof (a)) -1) + 1).
2023-11-14 Jakub Jelinek <jakub@redhat.com>
PR c/111309
gcc/
* builtins.def (BUILT_IN_CLZG, BUILT_IN_CTZG, BUILT_IN_CLRSBG,
BUILT_IN_FFSG, BUILT_IN_PARITYG, BUILT_IN_POPCOUNTG): New
builtins.
* builtins.cc (fold_builtin_bit_query): New function.
(fold_builtin_1): Use it for
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(fold_builtin_2): Use it for BUILT_IN_{CLZ,CTZ}G.
* fold-const-call.cc: Fix comment typo on tm.h inclusion.
(fold_const_call_ss): Handle
CFN_BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(fold_const_call_sss): New function.
(fold_const_call_1): Call it for 2 argument functions returning
scalar when passed 2 INTEGER_CSTs.
* genmatch.cc (cmp_operand): For function calls also compare
number of arguments.
(fns_cmp): New function.
(dt_node::gen_kids): Sort fns and generic_fns.
(dt_node::gen_kids_1): Handle fns with the same id but different
number of arguments.
* match.pd (CLZ simplifications): Drop checks for defined behavior
at zero. Add variant of simplifications for IFN_CLZ with 2 arguments.
(CTZ simplifications): Drop checks for defined behavior at zero,
don't optimize precisions above MAX_FIXED_MODE_SIZE. Add variant of
simplifications for IFN_CTZ with 2 arguments.
(a != 0 ? CLZ(a) : CST -> .CLZ(a)): Use TREE_TYPE (@3) instead of
type, add BITINT_TYPE handling, create 2 argument IFN_CLZ rather than
one argument. Add variant for matching CLZ with 2 arguments.
(a != 0 ? CTZ(a) : CST -> .CTZ(a)): Similarly.
* gimple-lower-bitint.cc (bitint_large_huge::lower_bit_query): New
method.
(bitint_large_huge::lower_call): Use it for IFN_{CLZ,CTZ,CLRSB,FFS}
and IFN_{PARITY,POPCOUNT} calls.
* gimple-range-op.cc (cfn_clz::fold_range): Don't check
CLZ_DEFINED_VALUE_AT_ZERO for m_gimple_call_internal_p, instead
assume defined value at zero if the call has 2 arguments and use
second argument value for that case.
(cfn_ctz::fold_range): Similarly.
(gimple_range_op_handler::maybe_builtin_call): Use op_cfn_clz_internal
or op_cfn_ctz_internal only if internal fn call has 2 arguments and
set m_op2 in that case.
* tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern,
vect_recog_popcount_clz_ctz_ffs_pattern): For value defined at zero
use second argument of calls if present, otherwise assume UB at zero,
create 2 argument .CLZ/.CTZ calls if needed.
* tree-vect-stmts.cc (vectorizable_call): Handle 2 argument .CLZ/.CTZ
calls.
* tree-ssa-loop-niter.cc (build_cltz_expr): Create 2 argument
.CLZ/.CTZ calls if needed.
* tree-ssa-forwprop.cc (simplify_count_trailing_zeroes): Create 2
argument .CTZ calls if needed.
* tree-ssa-phiopt.cc (cond_removal_in_builtin_zero_pattern): Handle
2 argument .CLZ/.CTZ calls, handle BITINT_TYPE, create 2 argument
.CLZ/.CTZ calls.
* doc/extend.texi (__builtin_clzg, __builtin_ctzg, __builtin_clrsbg,
__builtin_ffsg, __builtin_parityg, __builtin_popcountg): Document.
gcc/c-family/
* c-common.cc (check_builtin_function_arguments): Handle
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
* c-gimplify.cc (c_gimplify_expr): If __builtin_c[lt]zg second
argument hasn't been folded into constant yet, transform it to one
argument call inside of a COND_EXPR which for first argument 0
returns the second argument.
gcc/c/
* c-typeck.cc (convert_arguments): Don't promote first argument
of BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
gcc/cp/
* call.cc (magic_varargs_p): Return 4 for
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
(build_over_call): Don't promote first argument of
BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G.
* cp-gimplify.cc (cp_gimplify_expr): For BUILT_IN_C{L,T}ZG use
c_gimplify_expr.
gcc/testsuite/
* c-c++-common/pr111309-1.c: New test.
* c-c++-common/pr111309-2.c: New test.
* gcc.dg/torture/bitint-43.c: New test.
* gcc.dg/torture/bitint-44.c: New test.
|
|
In the case of a NOP conversion (precisions of the 2 types are equal),
factoring out the conversion can be done even if int_fits_type_p returns
false and even when the conversion is defined by a statement inside the
conditional. Since it is a NOP conversion there is no zero/sign extending
happening which is why it is ok to be done here; we were trying to prevent
an extra sign/zero extend from being moved away from definition which no-op
conversions are not.
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/104376
PR tree-optimization/101541
* tree-ssa-phiopt.cc (factor_out_conditional_operation):
Allow nop conversions even if it is defined by a statement
inside the conditional.
gcc/testsuite/ChangeLog:
PR tree-optimization/101541
* gcc.dg/tree-ssa/phi-opt-39.c: New test.
|
|
So when diamond bb support was added to minmax_replacement in r13-1950-g9bb19e143cfe,
the code was not expecting the alt_middle_bb not to exist if it was empty (for threeway_p).
So when factor_out_conditional_conversion was used to factor out conversions, it turns out
the assumption for alt_middle_bb to be wrong and we ended up with threeway_p being true but
having middle_bb being empty but alt_middle_bb not being empty which causes wrong code in
many cases.
This patch fixes the issue by adding a test for the 2 cases where the assumption on
threeway_p case having the other bb being empty.
Changes made:
v2: Fix test for `(a <= u) b = MAX(a, d) else b = u`.
Note my plan for GCC 15 is remove minmax_replacement as match.pd will catch all cases
at that point.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/111469
gcc/ChangeLog:
* tree-ssa-phiopt.cc (minmax_replacement): Fix
the assumption for the `non-diamond` handling cases
of diamond code.
gcc/testsuite/ChangeLog:
* gcc.c-torture/execute/pr111469-1.c: New test.
|
|
The problem here is after r6-7425-ga9fee7cdc3c62d0e51730,
the comparison to see if the transformation could be done was using the
wrong value. Instead of see if the inner was LE (for MIN and GE for MAX)
the outer value, it was comparing the inner to the value used in the comparison
which was wrong.
The match pattern copied the same logic mistake when they were added in
r14-1411-g17cca3c43e2f49 .
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
PR tree-optimization/111331
* match.pd (`(a CMP CST1) ? max<a,CST2> : a`):
Fix the LE/GE comparison to the correct value.
* tree-ssa-phiopt.cc (minmax_replacement):
Fix the LE/GE comparison for the
`(a CMP CST1) ? max<a,CST2> : a` optimization.
gcc/testsuite/ChangeLog:
PR tree-optimization/111331
* gcc.c-torture/execute/pr111331-1.c: New test.
* gcc.c-torture/execute/pr111331-2.c: New test.
* gcc.c-torture/execute/pr111331-3.c: New test.
|
|
This adds dump on the full result of the match-and-simplify
for phiopt and specifically to know if we are rejecting something
due to being in early phi-opt.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Add dump information
when resimplify returns true.
(match_simplify_replacement): Print only if accepted the match-and-simplify
result rather than the full sequence.
|
|
I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os
when enabling expensive tests, and unfortunately I can't reproduce without
_BitInt. The IL before phiopt3 has:
<bb 87> [local count: 203190070]:
# .MEM_428 = VDEF <.MEM_367>
bitint.159 = VIEW_CONVERT_EXPR<unsigned long[8]>(*.LC3);
goto <bb 89>; [100.00%]
<bb 88> [local count: 203190070]:
# .MEM_427 = VDEF <.MEM_367>
bitint.159 = VIEW_CONVERT_EXPR<unsigned long[8]>(*.LC4);
<bb 89> [local count: 406380139]:
# .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)>
# VUSE <.MEM_368>
_123 = VIEW_CONVERT_EXPR<unsigned long[8]>(r495[i_107].D.2780)[0];
and factor_out_conditional_operation is called on the vop PHI, it
sees it has exactly two operands and defining statements of both
PHI arguments are converts (VCEs in this case), so it thinks it is
a good idea to try to optimize that and while doing that it constructs
void type SSA_NAMEs and the like.
2023-08-10 Jakub Jelinek <jakub@redhat.com>
PR c/102989
* tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges): Never
return virtual phis and return NULL if there is a virtual phi
where the arguments from E0 and E1 edges aren't equal.
|
|
In some cases (usually dealing with bools only), there could be some statements
left behind which are considered trivial dead.
An example is:
```
bool f(bool a, bool b)
{
if (!a && !b)
return 0;
if (!a && b)
return 0;
if (a && !b)
return 0;
return 1;
}
```
Where during phiopt2, the IR had:
```
_3 = ~b_7(D);
_4 = _3 & a_6(D);
_4 != 0 ? 0 : 1
```
match-and-simplify would transform that into:
```
_11 = ~a_6(D);
_12 = b_7(D) | _11;
```
But phiopt would leave around the statements defining _4 and _3.
This helps by marking the conditional's lhs and rhs to see if they are
trivial dead.
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement): Mark's cond
statement's lhs and rhs to check if trivial dead.
Rename inserted_exprs to exprs_maybe_dce; also move it so
bitmap is not allocated if not needed.
|
|
The following makes sure that FP x > y ? x : y style max/min operations
are if-converted at the GIMPLE level. While we can neither match
it to MAX_EXPR nor .FMAX as both have different semantics with IEEE
than the ternary ?: operation we can make sure to maintain this form
as a COND_EXPR so backends have the chance to match this to instructions
their ISA offers.
The patch does this in phiopt where we recognize min/max and instead
of giving up when we have to honor NaNs we alter the generated code
to a COND_EXPR.
This resolves PR88540 and we can then SLP vectorize the min operation
for its testcase. It also resolves part of the regressions observed
with the change matching bit-inserts of bit-field-refs to vec_perm.
Expansion from a COND_EXPR rather than from compare-and-branch
gcc.target/i386/pr54855-9.c by producing extra moves while the
corresponding min/max operations are now already synthesized by
RTL expansion, register selection isn't optimal. This can be also
provoked without this change by altering the operand order in the source.
I have XFAILed that part of the test.
PR tree-optimization/88540
* tree-ssa-phiopt.cc (minmax_replacement): Do not give up
with NaNs but handle the simple case by if-converting to a
COND_EXPR.
* gcc.target/i386/pr88540.c: New testcase.
* gcc.target/i386/pr54855-9.c: XFAIL check for redundant moves.
* gcc.target/i386/pr54855-12.c: Adjust.
* gcc.target/i386/pr54855-13.c: Likewise.
* gcc.target/i386/pr110170.c: Likewise.
* gcc.dg/tree-ssa/split-path-12.c: Likewise.
|
|
info during match
Match will query ranger via tree_nonzero_bits/get_nonzero_bits for 2 and 3rd
operand of the COND_EXPR and phiopt tries to do create the COND_EXPR even if we moving
one statement. That one statement could have some flow sensitive information on it
based on the condition that is for the COND_EXPR but that might create wrong code
if the statement was moved out.
This is similar to the previous version of the patch except now we use
flow_sensitive_info_storage instead of manually doing the save/restore
and also handle all defs on a gimple statement rather than just for lhs
of the gimple statement. Oh and a few more testcases were added that
was failing before.
OK? Bootsrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/110252
gcc/ChangeLog:
* tree-ssa-phiopt.cc (class auto_flow_sensitive): New class.
(auto_flow_sensitive::auto_flow_sensitive): New constructor.
(auto_flow_sensitive::~auto_flow_sensitive): New deconstructor.
(match_simplify_replacement): Temporarily
remove the flow sensitive info on the two statements that might
be moved.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-25b.c: Updated as
__builtin_parity loses the nonzerobits info.
* gcc.c-torture/execute/pr110252-1.c: New test.
* gcc.c-torture/execute/pr110252-2.c: New test.
* gcc.c-torture/execute/pr110252-3.c: New test.
* gcc.c-torture/execute/pr110252-4.c: New test.
|
|
The following makes sure to not make conditional undefs in PHI arguments
unconditional by folding cond ? arg1 : arg2.
PR tree-optimization/110491
* tree-ssa-phiopt.cc (match_simplify_replacement): Check
whether the PHI args are possibly undefined before folding
the COND_EXPR.
* gcc.dg/torture/pr110491.c: New testcase.
|
|
The following removes gimple_uses_undefined_value_p and instead
uses the conservative mark_ssa_maybe_undefs in PHI-OPT, the last
user of the other API.
* tree-ssa-phiopt.cc (pass_phiopt::execute): Mark SSA undefs.
(empty_bb_or_one_feeding_into_p): Check for them.
* tree-ssa.h (gimple_uses_undefined_value_p): Remove.
* tree-ssa.cc (gimple_uses_undefined_value_p): Likewise.
|
|
After using factor_out_conditional_conversion with diamond bb,
we should be able do use it also for all normal unary gimple and not
just conversions. This allows to optimize PR 59424 for an example.
This is also a start to optimize PR 64700 and a few others.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
An example of this is:
```
static inline unsigned long long g(int t)
{
unsigned t1 = t;
return t1;
}
static int abs1(int a)
{
if (a < 0)
a = -a;
return a;
}
unsigned long long f(int c, int d, int e)
{
unsigned long long t;
if (d > e)
t = g(abs1(d));
else
t = g(abs1(e));
return t;
}
```
Which should be optimized to:
_9 = MAX_EXPR <d_5(D), e_6(D)>;
_4 = ABS_EXPR <_9>;
t_3 = (long long unsigned intD.16) _4;
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ...
(factor_out_conditional_operation): This and add support for all unary
operations.
(pass_phiopt::execute): Update call to factor_out_conditional_conversion
to call factor_out_conditional_operation instead.
PR tree-optimization/109424
PR tree-optimization/59424
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/abs-2.c: Update tree scan for
details change in wording.
* gcc.dg/tree-ssa/minmax-17.c: Likewise.
* gcc.dg/tree-ssa/pr103771.c: Likewise.
* gcc.dg/tree-ssa/minmax-18.c: New test.
* gcc.dg/tree-ssa/minmax-19.c: New test.
|
|
After adding diamond shaped bb support to factor_out_conditional_conversion,
we can get a case where we have two conversions that needs factored out
and then would have another phiopt happen.
An example is:
```
static inline unsigned long long g(int t)
{
unsigned t1 = t;
return t1;
}
unsigned long long f(int c, int d, int e)
{
unsigned long long t;
if (c > d)
t = g(c);
else
t = g(d);
return t;
}
```
In this case we should get a MAX_EXPR in phiopt1 with two casts.
Before this patch, we would just factor out the outer cast and then
wait till phiopt2 to factor out the inner cast.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (pass_phiopt::execute): Loop
over factor_out_conditional_conversion.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-17.c: New test.
|
|
So the function factor_out_conditional_conversion already supports
diamond shaped bb forms, just need to be called for such a thing.
harden-cond-comp.c needed to be changed as we would optimize out the
conversion now and that causes the compare hardening not needing to
split the block which it was testing. So change it such that there
would be no chance of optimization.
Also add two testcases that showed the improvement. PR 103771 is
solved in ifconvert also for the vectorizer but now it is solved
in a general sense.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/49959
PR tree-optimization/103771
gcc/ChangeLog:
* tree-ssa-phiopt.cc (pass_phiopt::execute): Support
Diamond shapped bb form for factor_out_conditional_conversion.
gcc/testsuite/ChangeLog:
* c-c++-common/torture/harden-cond-comp.c: Change testcase
slightly to avoid the new phiopt optimization.
* gcc.dg/tree-ssa/abs-2.c: New test.
* gcc.dg/tree-ssa/pr103771.c: New test.
|
|
So it turns out I messed checking which edge was true/false for the diamond
form. The edges, e0 and e1 here are edges from the merge block but the
true/false edges are from the conditional block and with diamond/threeway,
there is a bb inbetween on both edges.
Most of the time, the check that was in match_simplify_replacement would
happen to be correct for diamond form as most of the time the first edge in
the conditional is the edge for the true side of the conditional.
This is why I didn't see the issue during bootstrap/testing.
I added a fragile gimple testcase which exposed the issue. Since there is
no way to specify the order of the edges in the gimple fe, we have to
have forwprop to swap the false/true edges (not order of them, just swapping
true/false flags) and hope not to do cleanupcfg inbetween forwprop and the
first phiopt pass. This is the fragile part really, it is not that we will
produce wrong code, just we won't hit what was the failing case.
OK? Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/109732
gcc/ChangeLog:
* tree-ssa-phiopt.cc (match_simplify_replacement): Fix the selection
of the argtrue/argfalse.
gcc/testsuite/ChangeLog:
* gcc.dg/pr109732.c: New test.
* gcc.dg/pr109732-1.c: New test.
|
|
While looking at differences between what minmax_replacement
and match_simplify_replacement does. I noticed that they sometimes
chose different edges to remove. I decided we should be able to do
better and be able to remove both empty basic blocks in the
case of match_simplify_replacement as that moves the statements.
This also updates the testcases as now match_simplify_replacement
will remove the unused MIN/MAX_EXPR and they were checking for
those.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Handle
diamond form bb with forwarder only empty blocks better.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/minmax-15.c: Update test.
* gcc.dg/tree-ssa/minmax-16.c: Update test.
* gcc.dg/tree-ssa/minmax-3.c: Update test.
* gcc.dg/tree-ssa/minmax-4.c: Update test.
* gcc.dg/tree-ssa/minmax-5.c: Update test.
* gcc.dg/tree-ssa/minmax-8.c: Update test.
|
|
When I added the dce_ssa_names argument, I didn't realize bitmap was a
pointer so I used the default argument value as auto_bitmap(). But
instead we could just use nullptr and check if it was a nullptr
before calling simple_dce_from_worklist.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Change
the default argument value for dce_ssa_names to nullptr.
Check to make sure dce_ssa_names is a non-nullptr before
calling simple_dce_from_worklist.
|
|
The following renames last_stmt to last_nondebug_stmt which is
what it really does.
* tree-cfg.h (last_stmt): Rename to ...
(last_nondebug_stmt): ... this.
* tree-cfg.cc (last_stmt): Rename to ...
(last_nondebug_stmt): ... this.
(assign_discriminators): Adjust.
(group_case_labels_stmt): Likewise.
(gimple_can_duplicate_bb_p): Likewise.
(execute_fixup_cfg): Likewise.
* auto-profile.cc (afdo_propagate_circuit): Likewise.
* gimple-range.cc (gimple_ranger::range_on_exit): Likewise.
* omp-expand.cc (workshare_safe_to_combine_p): Likewise.
(determine_parallel_type): Likewise.
(adjust_context_and_scope): Likewise.
(expand_task_call): Likewise.
(remove_exit_barrier): Likewise.
(expand_omp_taskreg): Likewise.
(expand_omp_for_init_counts): Likewise.
(expand_omp_for_init_vars): Likewise.
(expand_omp_for_static_chunk): Likewise.
(expand_omp_simd): Likewise.
(expand_oacc_for): Likewise.
(expand_omp_for): Likewise.
(expand_omp_sections): Likewise.
(expand_omp_atomic_fetch_op): Likewise.
(expand_omp_atomic_cas): Likewise.
(expand_omp_atomic): Likewise.
(expand_omp_target): Likewise.
(expand_omp): Likewise.
(omp_make_gimple_edges): Likewise.
* trans-mem.cc (tm_region_init): Likewise.
* tree-inline.cc (redirect_all_calls): Likewise.
* tree-parloops.cc (gen_parallel_loop): Likewise.
* tree-ssa-loop-ch.cc (do_while_loop_p): Likewise.
* tree-ssa-loop-ivcanon.cc (canonicalize_loop_induction_variables):
Likewise.
* tree-ssa-loop-ivopts.cc (stmt_after_ip_normal_pos): Likewise.
(may_eliminate_iv): Likewise.
* tree-ssa-loop-manip.cc (standard_iv_increment_position): Likewise.
* tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations):
Likewise.
(estimate_numbers_of_iterations): Likewise.
* tree-ssa-loop-split.cc (compute_added_num_insns): Likewise.
* tree-ssa-loop-unswitch.cc (get_predicates_for_bb): Likewise.
(set_predicates_for_bb): Likewise.
(init_loop_unswitch_info): Likewise.
(hoist_guard): Likewise.
* tree-ssa-phiopt.cc (match_simplify_replacement): Likewise.
(minmax_replacement): Likewise.
* tree-ssa-reassoc.cc (update_range_test): Likewise.
(optimize_range_tests_to_bit_test): Likewise.
(optimize_range_tests_var_bound): Likewise.
(optimize_range_tests): Likewise.
(no_side_effect_bb): Likewise.
(suitable_cond_bb): Likewise.
(maybe_optimize_range_tests): Likewise.
(reassociate_bb): Likewise.
* tree-vrp.cc (rvrp_folder::pre_fold_bb): Likewise.
|
|
When I added diamond shaped form bb to match_simplify_replacement,
I copied the code to move the statement rather than factoring it
out to a new function. This does the refactoring to a new function
to avoid the duplicated code. It will make adding support for having
two statements to move easier (the second statement will only be a
conversion).
OK? Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (move_stmt): New function.
(match_simplify_replacement): Use move_stmt instead
of the inlined version.
|
|
I noticed I didn't update the comment about how the pass
works after I initially added match_simplify_replacement.
Anyways this updates the comment to be the current state
of the pass.
OK?
gcc/ChangeLog:
* tree-ssa-phiopt.cc: Update comment about
how the transformation are implemented.
|
|
This converts the irange API to use wide_ints exclusively, along with
its users.
This patch will slow down VRP, as there will be more useless
wide_int to tree conversions. However, this slowdown is only
temporary, as a follow-up patch will convert the internal
representation of iranges to wide_ints for a net overall gain
in performance.
gcc/ChangeLog:
* fold-const.cc (expr_not_equal_to): Convert to irange wide_int API.
* gimple-fold.cc (size_must_be_zero_p): Same.
* gimple-loop-versioning.cc
(loop_versioning::prune_loop_conditions): Same.
* gimple-range-edge.cc (gcond_edge_range): Same.
(gimple_outgoing_range::calc_switch_ranges): Same.
* gimple-range-fold.cc (adjust_imagpart_expr): Same.
(adjust_realpart_expr): Same.
(fold_using_range::range_of_address): Same.
(fold_using_range::relation_fold_and_or): Same.
* gimple-range-gori.cc (gori_compute::gori_compute): Same.
(range_is_either_true_or_false): Same.
* gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same.
(cfn_clz::fold_range): Same.
(cfn_ctz::fold_range): Same.
* gimple-range-tests.cc (class test_expr_eval): Same.
* gimple-ssa-warn-alloca.cc (alloca_call_type): Same.
* ipa-cp.cc (ipa_value_range_from_jfunc): Same.
(propagate_vr_across_jump_function): Same.
(decide_whether_version_node): Same.
* ipa-prop.cc (ipa_get_value_range): Same.
* ipa-prop.h (ipa_range_set_and_normalize): Same.
* range-op.cc (get_shift_range): Same.
(value_range_from_overflowed_bounds): Same.
(value_range_with_overflow): Same.
(create_possibly_reversed_range): Same.
(equal_op1_op2_relation): Same.
(not_equal_op1_op2_relation): Same.
(lt_op1_op2_relation): Same.
(le_op1_op2_relation): Same.
(gt_op1_op2_relation): Same.
(ge_op1_op2_relation): Same.
(operator_mult::op1_range): Same.
(operator_exact_divide::op1_range): Same.
(operator_lshift::op1_range): Same.
(operator_rshift::op1_range): Same.
(operator_cast::op1_range): Same.
(operator_logical_and::fold_range): Same.
(set_nonzero_range_from_mask): Same.
(operator_bitwise_or::op1_range): Same.
(operator_bitwise_xor::op1_range): Same.
(operator_addr_expr::fold_range): Same.
(pointer_plus_operator::wi_fold): Same.
(pointer_or_operator::op1_range): Same.
(INT): Same.
(UINT): Same.
(INT16): Same.
(UINT16): Same.
(SCHAR): Same.
(UCHAR): Same.
(range_op_cast_tests): Same.
(range_op_lshift_tests): Same.
(range_op_rshift_tests): Same.
(range_op_bitwise_and_tests): Same.
(range_relational_tests): Same.
* range.cc (range_zero): Same.
(range_nonzero): Same.
* range.h (range_true): Same.
(range_false): Same.
(range_true_and_false): Same.
* tree-data-ref.cc (split_constant_offset_1): Same.
* tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same.
* tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same.
(find_unswitching_predicates_for_bb): Same.
* tree-ssa-phiopt.cc (value_replacement): Same.
* tree-ssa-threadbackward.cc
(back_threader::find_taken_edge_cond): Same.
* tree-ssanames.cc (ssa_name_has_boolean_range): Same.
* tree-vrp.cc (find_case_label_range): Same.
* value-query.cc (range_query::get_tree_range): Same.
* value-range.cc (irange::set_nonnegative): Same.
(frange::contains_p): Same.
(frange::singleton_p): Same.
(frange::internal_singleton_p): Same.
(irange::irange_set): Same.
(irange::irange_set_1bit_anti_range): Same.
(irange::irange_set_anti_range): Same.
(irange::set): Same.
(irange::operator==): Same.
(irange::singleton_p): Same.
(irange::contains_p): Same.
(irange::set_range_from_nonzero_bits): Same.
(DEFINE_INT_RANGE_INSTANCE): Same.
(INT): Same.
(UINT): Same.
(SCHAR): Same.
(UINT128): Same.
(UCHAR): Same.
(range): New.
(tree_range): New.
(range_int): New.
(range_uint): New.
(range_uint128): New.
(range_uchar): New.
(range_char): New.
(build_range3): Convert to irange wide_int API.
(range_tests_irange3): Same.
(range_tests_int_range_max): Same.
(range_tests_strict_enum): Same.
(range_tests_misc): Same.
(range_tests_nonzero_bits): Same.
(range_tests_nan): Same.
(range_tests_signed_zeros): Same.
* value-range.h (Value_Range::Value_Range): Same.
(irange::set): Same.
(irange::nonzero_p): Same.
(irange::contains_p): Same.
(range_includes_zero_p): Same.
(irange::set_nonzero): Same.
(irange::set_zero): Same.
(contains_zero_p): Same.
(frange::contains_p): Same.
* vr-values.cc
(simplify_using_ranges::op_with_boolean_value_range_p): Same.
(bounds_of_var_in_loop): Same.
(simplify_using_ranges::legacy_fold_cond_overflow): Same.
|
|
While moving working on moving
cond_removal_in_builtin_zero_pattern to match, I noticed
that functions were not allowed to move as we reject all
non-assignments.
This changes to allowing a few calls which are known not
to throw/trap. Right now it is restricted to ones
which cond_removal_in_builtin_zero_pattern handles but
adding more is just adding it to the switch statement.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p):
Allow some builtin/internal function calls which
are known not to trap/throw.
(phiopt_worker::match_simplify_replacement):
Use name instead of getting the lhs again.
|
|
This patch converts two_value_replacement function
into a match.pd pattern.
It is a direct translation with only one minor change,
does not check for the {0,+-1} case as that is handled
before in match.pd so there is no reason to do the extra
check for it.
OK? Bootstrapped and tested on x86_64-linux-gnu with
no regressions.
gcc/ChangeLog:
PR tree-optimization/100958
* tree-ssa-phiopt.cc (two_value_replacement): Remove.
(pass_phiopt::execute): Don't call two_value_replacement.
* match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to
handle what two_value_replacement did.
|
|
In the early PHIOPT mode, the original minmax_replacement, would
replace a PHI node with up to 2 min/max expressions in some cases,
this allows for that too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (phiopt_early_allow): Allow for
up to 2 min/max expressions in the sequence/match code.
|
|
This simple patch moves the body of store_elim_worker
direclty into pass_cselim::execute.
Also removes unneeded prototypes too.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (cond_store_replacement): Remove
prototype.
(cond_if_else_store_replacement): Likewise.
(get_non_trapping): Likewise.
(store_elim_worker): Move into ...
(pass_cselim::execute): This.
|
|
Now that store elimination and phiopt does not
share outer code, we can move tree_ssa_phiopt_worker
directly into pass_phiopt::execute and remove
many declarations (prototypes) from the file.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (two_value_replacement): Remove
prototype.
(match_simplify_replacement): Likewise.
(factor_out_conditional_conversion): Likewise.
(value_replacement): Likewise.
(minmax_replacement): Likewise.
(spaceship_replacement): Likewise.
(cond_removal_in_builtin_zero_pattern): Likewise.
(hoist_adjacent_loads): Likewise.
(tree_ssa_phiopt_worker): Move into ...
(pass_phiopt::execute): this.
|
|
Since the last cleanups, it made easier to see
that we should split out the store elimination
worker from tree_ssa_phiopt_worker function.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
do_store_elim argument and split that part out to ...
(store_elim_worker): This new function.
(pass_cselim::execute): Call store_elim_worker.
(pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.
|
|
gcc/ChangeLog:
* builtins.cc (expand_builtin_strnlen): Rewrite deprecated irange
API uses to new API.
* gimple-predicate-analysis.cc (find_var_cmp_const): Same.
* internal-fn.cc (get_min_precision): Same.
* match.pd: Same.
* tree-affine.cc (expr_to_aff_combination): Same.
* tree-data-ref.cc (dr_step_indicator): Same.
* tree-dfa.cc (get_ref_base_and_extent): Same.
* tree-scalar-evolution.cc (iv_can_overflow_p): Same.
* tree-ssa-phiopt.cc (two_value_replacement): Same.
* tree-ssa-pre.cc (insert_into_preds_of_block): Same.
* tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same.
* tree-ssa-strlen.cc (compare_nonzero_chars): Same.
* tree-switch-conversion.cc (bit_test_cluster::emit): Same.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Same.
* tree.cc (get_range_pos_neg): Same.
|