aboutsummaryrefslogtreecommitdiff
path: root/gcc/tree-ssa-phiopt.cc
AgeCommit message (Collapse)AuthorFilesLines
2024-09-17phiopt: C++ify cond_if_else_store_replacementAndrew Pinski1-14/+11
This C++ify cond_if_else_store_replacement by using range fors and changing using a std::pair instead of 2 vecs. I had a hard time understanding the code when there was 2 vecs so having a vec of a pair makes it easier to understand the relationship between the 2. gcc/ChangeLog: * tree-ssa-phiopt.cc (cond_if_else_store_replacement): Use range fors and use one vec for then/else stores instead of 2. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-17phiopt: Add some details dump to cselimAndrew Pinski1-0/+21
While trying to debug PR 116747, I noticed there was no dump saying what was done. So this adds the debug dump and it helps debug what is going on in PR 116747 too. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (cond_if_else_store_replacement_1): Add debug dump. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-14phi-opt: Improve heuristics for factoring out with constant (again) [PR116699]Andrew Pinski1-0/+6
The heuristics for factoring out with a constant checks that the assignment statement is the last statement of the basic block but sometimes there is a predicate or a nop statement after the assignment. Rejecting this case does not make sense since both predicates and nop statements are removed and don't contribute any instructions. So we should skip over them when checking if the assignment statement was the last statement in the basic block. phi-opt-factor-1.c's f0 is such an example where it should catch it at phiopt1 (before predicates are removed) and should happen in a similar way as f1 (which uses a temporary variable rather than return). Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/116699 gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Skip over nop/predicates for seeing the assignment is the last statement. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-factor-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-13Fix factor_out_conditional_operation heuristics for constantsAndrew Pinski1-6/+8
While working on a different patch, I noticed the heuristics were not doing the right thing if there was statements before the NOP/PREDICTs. (LABELS don't have other statements before them). This fixes that oversight which was added in r15-3334-gceda727dafba6e. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Instead of just ignorning a NOP/PREDICT, skip over them before checking the heuristics. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-09phiopt: Move the common code between pass_phiopt and pass_cselim into a ↵Andrew Pinski1-153/+100
seperate function When r14-303-gb9fedabe381cce was done, it was missed that some of the common parts could be done in a template and a lambda could be used. This patch implements that. This new function can be used later on to implement a simple ifcvt pass. gcc/ChangeLog: * tree-ssa-phiopt.cc (execute_over_cond_phis): New template function, moved the common parts from pass_phiopt::execute/pass_cselim::execute. (pass_phiopt::execute): Move the functon specific parts of the loop into an lamdba. (pass_cselim::execute): Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-09phiopt: Use gimple_phi_result rather than PHI_RESULT [PR116643]Andrew Pinski1-8/+8
This converts the uses of PHI_RESULT in phiopt to be gimple_phi_result instead. Since there was already a mismatch of uses here, it would be good to use prefered one (gimple_phi_result) instead. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/116643 gcc/ChangeLog: * tree-ssa-phiopt.cc (replace_phi_edge_with_variable): s/PHI_RESULT/gimple_phi_result/. (factor_out_conditional_operation): Likewise. (minmax_replacement): Likewise. (spaceship_replacement): Likewise. (cond_store_replacement): Likewise. (cond_if_else_store_replacement_1): Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-09phiopt: Small refactoring/cleanup of non-ssa name case of ↵Andrew Pinski1-62/+60
factor_out_conditional_operation This small cleanup removes a redundant check for gimple_assign_cast_p and reformats based on that. Also changes the if statement that checks if the integral type and the check to see if the constant fits into the new type such that it returns null and reformats based on that. Also moves the check for has_single_use earlier so it is less complex still a cheaper check than some of the others (like the check on the integer side). This was noticed when adding a few new things to factor_out_conditional_operation but those are not ready to submit yet. Note there are no functional difference with this change. Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Move the has_single_use checks much earlier. Remove redundant check for gimple_assign_cast_p. Change around the check if the integral consts fits into the new type. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-31phiopt: Ignore some nop statements in heursics [PR116098]Andrew Pinski1-2/+7
The heurstics that was added for PR71016, try to search to see if the conversion was being moved away from its definition. The problem is the heurstics would stop if there was a non GIMPLE_ASSIGN (and already ignores debug statements) and in this case we would have a GIMPLE_LABEL that was not being ignored. So we should need to ignore GIMPLE_NOP, GIMPLE_LABEL and GIMPLE_PREDICT. Note this is now similar to how gimple_empty_block_p behaves. Note this fixes the wrong code that was reported by moving the VCE (conversion) out before the phiopt/match could convert it into an bit_ior and move the VCE out with the VCE being conditionally valid. Bootstrapped and tested on x86_64-linux-gnu. Also built and tested for aarch64-linux-gnu. PR tree-optimization/116098 gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Ignore nops, labels and predicts for heuristic for conversion with a constant. gcc/testsuite/ChangeLog: * c-c++-common/torture/pr116098-1.c: New test. * gcc.target/aarch64/csel-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-29Use std::unique_ptr for optinfo_itemDavid Malcolm1-0/+1
As preliminary work towards an overhaul of how optinfo_items interact with dump_pretty_printer, replace uses of optinfo_item * with std::unique_ptr<optinfo_item> to make ownership clearer. No functional change intended. gcc/ChangeLog: * config/aarch64/aarch64.cc: Define INCLUDE_MEMORY. * config/arm/arm.cc: Likewise. * config/i386/i386.cc: Likewise. * config/loongarch/loongarch.cc: Likewise. * config/riscv/riscv-vector-costs.cc: Likewise. * config/riscv/riscv.cc: Likewise. * config/rs6000/rs6000.cc: Likewise. * dump-context.h (dump_context::emit_item): Convert "item" param from * to const &. (dump_pretty_printer::stash_item): Convert "item" param from optinfo_ * to std::unique_ptr<optinfo_item>. (dump_pretty_printer::emit_item): Likewise. * dumpfile.cc: Include "make-unique.h". (make_item_for_dump_gimple_stmt): Replace uses of optinfo_item * with std::unique_ptr<optinfo_item>. (dump_context::dump_gimple_stmt): Likewise. (make_item_for_dump_gimple_expr): Likewise. (dump_context::dump_gimple_expr): Likewise. (make_item_for_dump_generic_expr): Likewise. (dump_context::dump_generic_expr): Likewise. (make_item_for_dump_symtab_node): Likewise. (dump_pretty_printer::emit_items): Likewise. (dump_pretty_printer::emit_any_pending_textual_chunks): Likewise. (dump_pretty_printer::emit_item): Likewise. (dump_pretty_printer::stash_item): Likewise. (dump_pretty_printer::decode_format): Likewise. (dump_context::dump_printf_va): Fix overlong line. (make_item_for_dump_dec): Replace uses of optinfo_item * with std::unique_ptr<optinfo_item>. (dump_context::dump_dec): Likewise. (dump_context::dump_symtab_node): Likewise. (dump_context::begin_scope): Likewise. (dump_context::emit_item): Likewise. * gimple-loop-interchange.cc: Define INCLUDE_MEMORY. * gimple-loop-jam.cc: Likewise. * gimple-loop-versioning.cc: Likewise. * graphite-dependences.cc: Likewise. * graphite-isl-ast-to-gimple.cc: Likewise. * graphite-optimize-isl.cc: Likewise. * graphite-poly.cc: Likewise. * graphite-scop-detection.cc: Likewise. * graphite-sese-to-poly.cc: Likewise. * graphite.cc: Likewise. * opt-problem.cc: Likewise. * optinfo.cc (optinfo::add_item): Convert "item" param from optinfo_ * to std::unique_ptr<optinfo_item>. (optinfo::emit_for_opt_problem): Update for change to dump_context::emit_item. * optinfo.h: Add #error to fail immediately if INCLUDE_MEMORY wasn't defined, rather than fail to find std::unique_ptr. (optinfo::add_item): Convert "item" param from optinfo_ * to std::unique_ptr<optinfo_item>. * sese.cc: Define INCLUDE_MEMORY. * targhooks.cc: Likewise. * tree-data-ref.cc: Likewise. * tree-if-conv.cc: Likewise. * tree-loop-distribution.cc: Likewise. * tree-parloops.cc: Likewise. * tree-predcom.cc: Likewise. * tree-ssa-live.cc: Likewise. * tree-ssa-loop-ivcanon.cc: Likewise. * tree-ssa-loop-ivopts.cc: Likewise. * tree-ssa-loop-prefetch.cc: Likewise. * tree-ssa-loop-unswitch.cc: Likewise. * tree-ssa-phiopt.cc: Likewise. * tree-ssa-threadbackward.cc: Likewise. * tree-ssa-threadupdate.cc: Likewise. * tree-vect-data-refs.cc: Likewise. * tree-vect-generic.cc: Likewise. * tree-vect-loop-manip.cc: Likewise. * tree-vect-loop.cc: Likewise. * tree-vect-patterns.cc: Likewise. * tree-vect-slp-patterns.cc: Likewise. * tree-vect-slp.cc: Likewise. * tree-vect-stmts.cc: Likewise. * tree-vectorizer.cc: Likewise. gcc/testsuite/ChangeLog: * gcc.dg/plugin/dump_plugin.c: Define INCLUDE_MEMORY. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2024-08-20phi-opt: Fix for failing maybe_push_res_to_seq in ↵Andrew Pinski1-10/+20
factor_out_conditional_operation [PR 116409] The code was assuming that maybe_push_res_to_seq would not fail if the gimple_extract_op returned true. But for some cases when the function is pure rather than const, then it can fail. This change moves around the code to check the result of maybe_push_res_to_seq instead of assuming it will always work. Changes since v1: * v2: Instead of directly testing non-pure builtin functions change to test if maybe_push_res_to_seq fails. Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/116409 gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Move maybe_push_res_to_seq before creating the phi node and the debug dump. Return false if maybe_push_res_to_seq fails. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr116409-1.c: New test. * gcc.dg/torture/pr116409-2.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-18PHIOPT: move factor_out_conditional_operation over to use gimple_match_opAndrew Pinski1-37/+29
To start working on more with expressions with more than one operand, converting over to use gimple_match_op is needed. The added side-effect here is factor_out_conditional_operation can now support builtins/internal calls that has one operand without any extra code added. Note on the changed testcases: * pr87007-5.c: the test was testing testing for avoiding partial register stalls for the sqrt and making sure there is only one zero of the register before the branch, the phiopt would now merge the sqrt's so disable phiopt. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * gimple-match-exports.cc (gimple_match_op::operands_occurs_in_abnormal_phi): New function. * gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi. * tree-ssa-phiopt.cc (factor_out_conditional_operation): Use gimple_match_op instead of manually extracting from/creating the gimple. gcc/testsuite/ChangeLog: * gcc.target/i386/pr87007-5.c: Disable phi-opt. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-15PHIOPT: Fix comment before factor_out_conditional_operationAndrew Pinski1-1/+1
I didn't update the comment before factor_out_conditional_operation correctly. this updates it to be correct and mentions unary operations rather than just conversions. Pushed as obvious. gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Update comment.
2024-06-18Enhance if-conversion for automatic arraysRichard Biener1-3/+1
Automatic arrays that are not address-taken should not be subject to store data races. This applies to OMP SIMD in-branch lowered functions result array which for the testcase otherwise prevents vectorization with SSE and for AVX and AVX512 ends up with spurious .MASK_STORE to the stack surviving. This inefficiency was noted in PR111793. I've introduced ref_can_have_store_data_races, commonizing uses of flag_store_data_races in if-conversion, cselim and store motion. PR tree-optimization/111793 * tree-ssa-alias.h (ref_can_have_store_data_races): Declare. * tree-ssa-alias.cc (ref_can_have_store_data_races): New function. * tree-if-conv.cc (ifcvt_memrefs_wont_trap): Use ref_can_have_store_data_races to allow more unconditional stores. * tree-ssa-loop-im.cc (execute_sm): Likewise. * tree-ssa-phiopt.cc (cond_store_replacement): Likewise. * gcc.dg/vect/vect-simd-clone-21.c: New testcase.
2024-06-17Rename Value_Range to value_range.Aldy Hernandez1-2/+2
Now that all remaining users of value_range have been renamed to int_range<>, we can reclaim value_range as a temporary, thus removing the annoying CamelCase. gcc/ChangeLog: * data-streamer-in.cc (streamer_read_value_range): Rename Value_Range to value_range. * data-streamer.h (streamer_read_value_range): Same. * gimple-pretty-print.cc (dump_ssaname_info): Same. * gimple-range-cache.cc (ssa_block_ranges::dump): Same. (ssa_lazy_cache::merge): Same. (block_range_cache::dump): Same. (ssa_cache::merge_range): Same. (ssa_cache::dump): Same. (ranger_cache::edge_range): Same. (ranger_cache::propagate_cache): Same. (ranger_cache::fill_block_cache): Same. (ranger_cache::resolve_dom): Same. (ranger_cache::range_from_dom): Same. (ranger_cache::register_inferred_value): Same. * gimple-range-fold.cc (op1_range): Same. (op2_range): Same. (fold_relations): Same. (fold_using_range::range_of_range_op): Same. (fold_using_range::range_of_phi): Same. (fold_using_range::range_of_call): Same. (fold_using_range::condexpr_adjust): Same. (fold_using_range::range_of_cond_expr): Same. (fur_source::register_outgoing_edges): Same. * gimple-range-fold.h (gimple_range_type): Same. (gimple_range_ssa_p): Same. * gimple-range-gori.cc (gori_compute::compute_operand_range): Same. (gori_compute::logical_combine): Same. (gori_compute::refine_using_relation): Same. (gori_compute::compute_operand1_range): Same. (gori_compute::compute_operand2_range): Same. (gori_compute::compute_operand1_and_operand2_range): Same. (gori_calc_operands): Same. (gori_name_helper): Same. * gimple-range-infer.cc (gimple_infer_range::check_assume_func): Same. (gimple_infer_range::gimple_infer_range): Same. (infer_range_manager::maybe_adjust_range): Same. (infer_range_manager::add_range): Same. * gimple-range-infer.h: Same. * gimple-range-op.cc (gimple_range_op_handler::gimple_range_op_handler): Same. (gimple_range_op_handler::calc_op1): Same. (gimple_range_op_handler::calc_op2): Same. (gimple_range_op_handler::maybe_builtin_call): Same. * gimple-range-path.cc (path_range_query::internal_range_of_expr): Same. (path_range_query::ssa_range_in_phi): Same. (path_range_query::compute_ranges_in_phis): Same. (path_range_query::compute_ranges_in_block): Same. (path_range_query::add_to_exit_dependencies): Same. * gimple-range-trace.cc (debug_seed_ranger): Same. * gimple-range.cc (gimple_ranger::range_of_expr): Same. (gimple_ranger::range_on_entry): Same. (gimple_ranger::range_on_edge): Same. (gimple_ranger::range_of_stmt): Same. (gimple_ranger::prefill_stmt_dependencies): Same. (gimple_ranger::register_inferred_ranges): Same. (gimple_ranger::register_transitive_inferred_ranges): Same. (gimple_ranger::export_global_ranges): Same. (gimple_ranger::dump_bb): Same. (assume_query::calculate_op): Same. (assume_query::calculate_phi): Same. (assume_query::dump): Same. (dom_ranger::range_of_stmt): Same. * ipa-cp.cc (ipcp_vr_lattice::meet_with_1): Same. (ipa_vr_operation_and_type_effects): Same. (ipa_value_range_from_jfunc): Same. (propagate_bits_across_jump_function): Same. (propagate_vr_across_jump_function): Same. (ipcp_store_vr_results): Same. * ipa-cp.h: Same. * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Same. (evaluate_properties_for_edge): Same. * ipa-prop.cc (struct ipa_vr_ggc_hash_traits): Same. (ipa_vr::get_vrange): Same. (ipa_vr::streamer_read): Same. (ipa_vr::streamer_write): Same. (ipa_vr::dump): Same. (ipa_set_jfunc_vr): Same. (ipa_compute_jump_functions_for_edge): Same. (ipcp_get_parm_bits): Same. (ipcp_update_vr): Same. (ipa_record_return_value_range): Same. (ipa_return_value_range): Same. * ipa-prop.h (ipa_return_value_range): Same. (ipa_record_return_value_range): Same. * range-op.h (range_cast): Same. * tree-ssa-dom.cc (dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): Same. (cprop_operand): Same. * tree-ssa-loop-ch.cc (loop_static_stmt_p): Same. * tree-ssa-loop-niter.cc (record_nonwrapping_iv): Same. * tree-ssa-loop-split.cc (split_at_bb_p): Same. * tree-ssa-phiopt.cc (value_replacement): Same. * tree-ssa-strlen.cc (get_range): Same. * tree-ssa-threadedge.cc (hybrid_jt_simplifier::simplify): Same. (hybrid_jt_simplifier::compute_exit_dependencies): Same. * tree-ssanames.cc (set_range_info): Same. (duplicate_ssa_name_range_info): Same. * tree-vrp.cc (remove_unreachable::handle_early): Same. (remove_unreachable::remove_and_update_globals): Same. (execute_ranger_vrp): Same. * value-query.cc (range_query::value_of_expr): Same. (range_query::value_on_edge): Same. (range_query::value_of_stmt): Same. (range_query::value_on_entry): Same. (range_query::value_on_exit): Same. (range_query::get_tree_range): Same. * value-range-storage.cc (vrange_storage::set_vrange): Same. * value-range.cc (Value_Range::dump): Same. (value_range::dump): Same. (debug): Same. * value-range.h (enum value_range_discriminator): Same. (class vrange): Same. (class Value_Range): Same. (class value_range): Same. (Value_Range::Value_Range): Same. (value_range::value_range): Same. (Value_Range::~Value_Range): Same. (value_range::~value_range): Same. (Value_Range::set_type): Same. (value_range::set_type): Same. (Value_Range::init): Same. (value_range::init): Same. (Value_Range::operator=): Same. (value_range::operator=): Same. (Value_Range::operator==): Same. (value_range::operator==): Same. (Value_Range::operator!=): Same. (value_range::operator!=): Same. (Value_Range::supports_type_p): Same. (value_range::supports_type_p): Same. * vr-values.cc (simplify_using_ranges::fold_cond_with_ops): Same. (simplify_using_ranges::legacy_fold_cond): Same.
2024-05-23[prange] Use type agnostic range in phiopt [PR115191]Aldy Hernandez1-3/+2
Fix a use of int_range_max in phiopt that should be a type agnostic range, because it could be either a pointer or an int. PR tree-optimization/115191 gcc/ChangeLog: * tree-ssa-phiopt.cc (value_replacement): Use Value_Range instead of int_range_max. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr115191.c: New test.
2024-05-20PHIOPT: Don't transform minmax if middle bb contains a phi [PR115143]Andrew Pinski1-0/+12
The problem here is even if last_and_only_stmt returns a statement, the bb might still contain a phi node which defines a ssa name which is used in that statement so we need to add a check to make sure that the phi nodes are empty for the middle bbs in both the `CMP?MINMAX:MINMAX` case and the `CMP?MINMAX:B` cases. Bootstrapped and tested on x86_64_linux-gnu with no regressions. PR tree-optimization/115143 gcc/ChangeLog: * tree-ssa-phiopt.cc (minmax_replacement): Check for empty phi nodes for middle bbs for the case where middle bb is not empty. gcc/testsuite/ChangeLog: * gcc.c-torture/compile/pr115143-1.c: New test. * gcc.c-torture/compile/pr115143-2.c: New test. * gcc.c-torture/compile/pr115143-3.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-04-30PHIOPT: Value-replacement check undefAndrew Pinski1-0/+7
While moving value replacement part of PHIOPT over to use match-and-simplify, I ran into the case where we would have an undef use that was conditional become unconditional. This prevents that. I can't remember at this point what the testcase was though. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (value_replacement): Reject undef variables so they don't become unconditional used. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-04-30PHI-OPT: speed up value_replacement slightlyAndrew Pinski1-7/+15
This adds a few early outs to value_replacement that I noticed while rewriting this to use match-and-simplify but could be committed seperately. * virtual operands won't change so return early for them * special case `A ? B : B` as that is already just `B` Also moves the check for NE/EQ earlier as calculating empty_or_with_defined_p is an IR walk for a BB and that might be big. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (value_replacement): Move check for NE/EQ earlier. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-04-30MATCH: change single_non_singleton_phi_for_edges for singleton phisAndrew Pinski1-8/+0
I noticed that single_non_singleton_phi_for_edges could return a phi whos entry are all the same for the edge. This happens only if there was a single phis in the first place. Also gimple_seq_singleton_p walks the sequence to see if it the one element in the sequence so there is removing that check actually reduces the number of pointer walks needed. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges): Remove the special case of gimple_seq_singleton_p. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-04-09Fix up duplicated words mostly in comments, part 2Jakub Jelinek1-1/+1
Another patch from eyeballing git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 ' output, this time in gcc/ subdirectory. 2024-04-09 Jakub Jelinek <jakub@redhat.com> gcc/ * expr.cc (convert_mode_scalar): Fix duplicated words in comment; into into -> it into. * function.h (function::cond_uids): Fix duplicated words in comment; same same -> same. * config/riscv/riscv-vector-costs.cc (costs::adjust_vect_cost_per_loop): Fix duplicated words in comment; model model -> model. * config/riscv/riscv-vector-builtins-shapes.cc (build_base): Fix duplicated words in comment; for for -> for. * config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Fix duplicated words in comment; more more -> more. * config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Fix duplicated words in comment; be be -> be. * tree-profile.cc (masking_vectors): Fix duplicated words in comment; has has -> has, the the -> the. * value-range.cc (irange::set_range_from_bitmask): Fix duplicated words in comment; the the -> the. * gcov.cc (add_condition_counts): Fix duplicated words in comment; to to -> to. * vr-values.cc (get_scev_info): Fix duplicated words in comment; the the -> to the. * tree-vrp.cc (fully_replaceable): Fix duplicated words in comment; by by -> by. * mode-switching.cc (single_succ_confluence_n): Fix duplicated words in comment; the the -> the. * tree-ssa-phiopt.cc (value_replacement): Fix duplicated words in comment; can can -> we can. * gimple-range-phi.cc (phi_analyzer::process_phi): Fix duplicated words in comment; it it -> it is. * tree-ssa-sccvn.cc (visit_phi): Fix duplicated words in comment; to to -> to. * rtl-ssa/accesses.h (use_info::next_debug_insn_use): Fix duplicated words in comment; if if -> if. * doc/options.texi (InverseMask): Fix duplicated words; and and -> and. Change take to takes. * doc/invoke.texi (fanalyzer-undo-inlining): Fix duplicated words; be be -> be. (-minline-memops-threshold): Likewise. gcc/analyzer/ * analyzer.opt (Wanalyzer-undefined-behavior-strtok): Fix duplicated words; in in -> in. * program-state.cc (sm_state_map::replay_call_summary): Fix duplicated words in comment; to to -> to. (program_state::replay_call_summary): Likewise. * region-model.cc (region_model::replay_call_summary): Likewise. gcc/c/ * c-decl.cc (previous_tag): Fix duplicated words in comment; the the -> the. (diagnose_mismatched_decls): Fix duplicated words in comment; about about -> about. gcc/cp/ * constexpr.cc (build_new_constexpr_heap_type): Fix duplicated words in comment; is is -> is. * cp-tree.def (CO_RETURN_EXPR): Fix duplicated words in comment; for for -> for. * parser.cc (fixup_blocks_walker): Fix duplicated words in comment; is is -> is. * semantics.cc (fixup_template_type): Fix duplicated words in comment; for for -> for. (finish_omp_for): Fix duplicated words in comment; the the -> the. * pt.cc (more_specialized_fn): Fix duplicated words in comment; think think -> think. (type_targs_deducible_from): Fix duplicated words in comment; the the -> the. gcc/jit/ * docs/topics/expressions.rst (Constructor expressions): Fix duplicated words; have have -> have.
2024-01-03Update copyright years.Jakub Jelinek1-1/+1
2023-12-09phiopt: Fix ICE with large --param l1-cache-line-size= [PR112887]Jakub Jelinek1-4/+3
This function is never called when param_l1_cache_line_size is 0, but it uses int and unsigned int variables to hold alignment in bits, so for large param_l1_cache_line_size it is zero and e.g. DECL_ALIGN () % param_align_bits can divide by zero. Looking at the code, the function uses tree_fits_uhwi_p on the trees before converting them using tree_to_uhwi to int variables, which looks just wrong, either it would need to punt if it doesn't fit into those and also check for overflows during the computation, or use unsigned HOST_WIDE_INT for all of this. That also fixes the division by zero, as param_l1_cache_line_size maximum is INT_MAX, that multiplied by 8 will always fit. 2023-12-09 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/112887 * tree-ssa-phiopt.cc (hoist_adjacent_loads): Change type of param_align, param_align_bits, offset1, offset2, size2 and align1 variables from int or unsigned int to unsigned HOST_WIDE_INT. * gcc.dg/pr112887.c: New test.
2023-11-14Add type-generic clz/ctz/clrsb/ffs/parity/popcount builtins [PR111309]Jakub Jelinek1-6/+60
The following patch adds 6 new type-generic builtins, __builtin_clzg __builtin_ctzg __builtin_clrsbg __builtin_ffsg __builtin_parityg __builtin_popcountg The g at the end stands for generic because the unsuffixed variant of the builtins already have unsigned int or int arguments. The main reason to add these is to support arbitrary unsigned (for clrsb/ffs signed) bit-precise integer types and also __int128 which wasn't supported by the existing builtins, so that e.g. <stdbit.h> type-generic functions could then support not just bit-precise unsigned integer type whose width matches a standard or extended integer type, but others too. None of these new builtins promote their first argument, so the argument can be e.g. unsigned char or unsigned short or unsigned __int20 etc. The first 2 support either 1 or 2 arguments, if only 1 argument is supplied, the behavior is undefined for argument 0 like for other __builtin_c[lt]z* builtins, if 2 arguments are supplied, the second argument should be int that will be returned if the argument is 0. All other builtins have just one argument. For __builtin_clrsbg and __builtin_ffsg the argument shall be any signed standard/extended or bit-precise integer, for the others any unsigned standard/extended or bit-precise integer (bool not allowed). One possibility would be to also allow signed integer types for the clz/ctz/parity/popcount ones (and just cast the argument to unsigned_type_for during folding) and similarly unsigned integer types for the clrsb/ffs ones, dunno what is better; for stdbit.h the current version is sufficient and diagnoses use of the inappropriate sign, though on the other side I wonder if users won't be confused by __builtin_clzg (1) being an error and having to write __builtin_clzg (1U). The new builtins are lowered to corresponding builtins with other suffixes or internal calls (plus casts and adjustments where needed) during FE folding or during gimplification at latest, the non-suffixed builtins handling precisions up to precision of int, l up to precision of long, ll up to precision of long long, up to __int128 precision lowered to double-word expansion early and the rest (which must be _BitInt) lowered to internal fn calls - those are then lowered during bitint lowering pass. The patch also changes representation of IFN_CLZ and IFN_CTZ calls, previously they were in the IL only if they are directly supported optab and depending on C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 they had or didn't have defined behavior at 0, now they are in the IL either if directly supported optab, or for the large/huge BITINT_TYPEs and they have either 1 or 2 arguments. If one, the behavior is undefined at zero, if 2, the second argument is an int constant that should be returned for 0. As there is no extra support during expansion, for directly supported optab the second argument if present should still match the C[LT]Z_DEFINED_VALUE_AT_ZERO (...) == 2 value, but for BITINT_TYPE arguments it can be arbitrary int INTEGER_CST. The indended uses in stdbit.h are e.g. #ifdef __has_builtin #if __has_builtin(__builtin_clzg) && __has_builtin(__builtin_ctzg) && __has_builtin(__builtin_popcountg) #define stdc_leading_zeros(value) \ ((unsigned int) __builtin_clzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0))) #define stdc_leading_ones(value) \ ((unsigned int) __builtin_clzg ((__typeof (value)) ~(value), __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0))) #define stdc_first_trailing_one(value) \ ((unsigned int) (__builtin_ctzg (value, -1) + 1)) #define stdc_trailing_zeros(value) \ ((unsigned int) __builtin_ctzg (value, __builtin_popcountg ((__typeof (value)) ~(__typeof (value)) 0))) #endif #endif where __builtin_popcountg ((__typeof (x)) -1) computes the bit precision of x's type (kind of _Bitwidthof (x) alternative). They also allow casting of arbitrary unsigned _BitInt other than unsigned _BitInt(1) to corresponding signed _BitInt by using signed _BitInt(__builtin_popcountg ((__typeof (a)) -1)) and of arbitrary signed _BitInt to corresponding unsigned _BitInt using unsigned _BitInt(__builtin_clrsbg ((__typeof (a)) -1) + 1). 2023-11-14 Jakub Jelinek <jakub@redhat.com> PR c/111309 gcc/ * builtins.def (BUILT_IN_CLZG, BUILT_IN_CTZG, BUILT_IN_CLRSBG, BUILT_IN_FFSG, BUILT_IN_PARITYG, BUILT_IN_POPCOUNTG): New builtins. * builtins.cc (fold_builtin_bit_query): New function. (fold_builtin_1): Use it for BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G. (fold_builtin_2): Use it for BUILT_IN_{CLZ,CTZ}G. * fold-const-call.cc: Fix comment typo on tm.h inclusion. (fold_const_call_ss): Handle CFN_BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G. (fold_const_call_sss): New function. (fold_const_call_1): Call it for 2 argument functions returning scalar when passed 2 INTEGER_CSTs. * genmatch.cc (cmp_operand): For function calls also compare number of arguments. (fns_cmp): New function. (dt_node::gen_kids): Sort fns and generic_fns. (dt_node::gen_kids_1): Handle fns with the same id but different number of arguments. * match.pd (CLZ simplifications): Drop checks for defined behavior at zero. Add variant of simplifications for IFN_CLZ with 2 arguments. (CTZ simplifications): Drop checks for defined behavior at zero, don't optimize precisions above MAX_FIXED_MODE_SIZE. Add variant of simplifications for IFN_CTZ with 2 arguments. (a != 0 ? CLZ(a) : CST -> .CLZ(a)): Use TREE_TYPE (@3) instead of type, add BITINT_TYPE handling, create 2 argument IFN_CLZ rather than one argument. Add variant for matching CLZ with 2 arguments. (a != 0 ? CTZ(a) : CST -> .CTZ(a)): Similarly. * gimple-lower-bitint.cc (bitint_large_huge::lower_bit_query): New method. (bitint_large_huge::lower_call): Use it for IFN_{CLZ,CTZ,CLRSB,FFS} and IFN_{PARITY,POPCOUNT} calls. * gimple-range-op.cc (cfn_clz::fold_range): Don't check CLZ_DEFINED_VALUE_AT_ZERO for m_gimple_call_internal_p, instead assume defined value at zero if the call has 2 arguments and use second argument value for that case. (cfn_ctz::fold_range): Similarly. (gimple_range_op_handler::maybe_builtin_call): Use op_cfn_clz_internal or op_cfn_ctz_internal only if internal fn call has 2 arguments and set m_op2 in that case. * tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern, vect_recog_popcount_clz_ctz_ffs_pattern): For value defined at zero use second argument of calls if present, otherwise assume UB at zero, create 2 argument .CLZ/.CTZ calls if needed. * tree-vect-stmts.cc (vectorizable_call): Handle 2 argument .CLZ/.CTZ calls. * tree-ssa-loop-niter.cc (build_cltz_expr): Create 2 argument .CLZ/.CTZ calls if needed. * tree-ssa-forwprop.cc (simplify_count_trailing_zeroes): Create 2 argument .CTZ calls if needed. * tree-ssa-phiopt.cc (cond_removal_in_builtin_zero_pattern): Handle 2 argument .CLZ/.CTZ calls, handle BITINT_TYPE, create 2 argument .CLZ/.CTZ calls. * doc/extend.texi (__builtin_clzg, __builtin_ctzg, __builtin_clrsbg, __builtin_ffsg, __builtin_parityg, __builtin_popcountg): Document. gcc/c-family/ * c-common.cc (check_builtin_function_arguments): Handle BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G. * c-gimplify.cc (c_gimplify_expr): If __builtin_c[lt]zg second argument hasn't been folded into constant yet, transform it to one argument call inside of a COND_EXPR which for first argument 0 returns the second argument. gcc/c/ * c-typeck.cc (convert_arguments): Don't promote first argument of BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G. gcc/cp/ * call.cc (magic_varargs_p): Return 4 for BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G. (build_over_call): Don't promote first argument of BUILT_IN_{CLZ,CTZ,CLRSB,FFS,PARITY,POPCOUNT}G. * cp-gimplify.cc (cp_gimplify_expr): For BUILT_IN_C{L,T}ZG use c_gimplify_expr. gcc/testsuite/ * c-c++-common/pr111309-1.c: New test. * c-c++-common/pr111309-2.c: New test. * gcc.dg/torture/bitint-43.c: New test. * gcc.dg/torture/bitint-44.c: New test.
2023-10-24Improve factor_out_conditional_operation for conversions and constantsAndrew Pinski1-3/+13
In the case of a NOP conversion (precisions of the 2 types are equal), factoring out the conversion can be done even if int_fits_type_p returns false and even when the conversion is defined by a statement inside the conditional. Since it is a NOP conversion there is no zero/sign extending happening which is why it is ok to be done here; we were trying to prevent an extra sign/zero extend from being moved away from definition which no-op conversions are not. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/104376 PR tree-optimization/101541 * tree-ssa-phiopt.cc (factor_out_conditional_operation): Allow nop conversions even if it is defined by a statement inside the conditional. gcc/testsuite/ChangeLog: PR tree-optimization/101541 * gcc.dg/tree-ssa/phi-opt-39.c: New test.
2023-09-26PHIOPT: Fix minmax_replacement for three wayAndrew Pinski1-2/+7
So when diamond bb support was added to minmax_replacement in r13-1950-g9bb19e143cfe, the code was not expecting the alt_middle_bb not to exist if it was empty (for threeway_p). So when factor_out_conditional_conversion was used to factor out conversions, it turns out the assumption for alt_middle_bb to be wrong and we ended up with threeway_p being true but having middle_bb being empty but alt_middle_bb not being empty which causes wrong code in many cases. This patch fixes the issue by adding a test for the 2 cases where the assumption on threeway_p case having the other bb being empty. Changes made: v2: Fix test for `(a <= u) b = MAX(a, d) else b = u`. Note my plan for GCC 15 is remove minmax_replacement as match.pd will catch all cases at that point. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/111469 gcc/ChangeLog: * tree-ssa-phiopt.cc (minmax_replacement): Fix the assumption for the `non-diamond` handling cases of diamond code. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr111469-1.c: New test.
2023-09-10Fix PR 111331: wrong code for `a > 28 ? MIN<a, 28> : 29`Andrew Pinski1-4/+4
The problem here is after r6-7425-ga9fee7cdc3c62d0e51730, the comparison to see if the transformation could be done was using the wrong value. Instead of see if the inner was LE (for MIN and GE for MAX) the outer value, it was comparing the inner to the value used in the comparison which was wrong. The match pattern copied the same logic mistake when they were added in r14-1411-g17cca3c43e2f49 . OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/111331 * match.pd (`(a CMP CST1) ? max<a,CST2> : a`): Fix the LE/GE comparison to the correct value. * tree-ssa-phiopt.cc (minmax_replacement): Fix the LE/GE comparison for the `(a CMP CST1) ? max<a,CST2> : a` optimization. gcc/testsuite/ChangeLog: PR tree-optimization/111331 * gcc.c-torture/execute/pr111331-1.c: New test. * gcc.c-torture/execute/pr111331-2.c: New test. * gcc.c-torture/execute/pr111331-3.c: New test.
2023-08-28PHIOPT: Add dump for match and simplify and early phioptAndrew Pinski1-26/+44
This adds dump on the full result of the match-and-simplify for phiopt and specifically to know if we are rejecting something due to being in early phi-opt. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (gimple_simplify_phiopt): Add dump information when resimplify returns true. (match_simplify_replacement): Print only if accepted the match-and-simplify result rather than the full sequence.
2023-08-10phiopt: Fix phiopt ICE on vops [PR102989]Jakub Jelinek1-1/+11
I've ran into ICE on gcc.dg/torture/bitint-42.c with -O1 or -Os when enabling expensive tests, and unfortunately I can't reproduce without _BitInt. The IL before phiopt3 has: <bb 87> [local count: 203190070]: # .MEM_428 = VDEF <.MEM_367> bitint.159 = VIEW_CONVERT_EXPR<unsigned long[8]>(*.LC3); goto <bb 89>; [100.00%] <bb 88> [local count: 203190070]: # .MEM_427 = VDEF <.MEM_367> bitint.159 = VIEW_CONVERT_EXPR<unsigned long[8]>(*.LC4); <bb 89> [local count: 406380139]: # .MEM_368 = PHI <.MEM_428(87), .MEM_427(88)> # VUSE <.MEM_368> _123 = VIEW_CONVERT_EXPR<unsigned long[8]>(r495[i_107].D.2780)[0]; and factor_out_conditional_operation is called on the vop PHI, it sees it has exactly two operands and defining statements of both PHI arguments are converts (VCEs in this case), so it thinks it is a good idea to try to optimize that and while doing that it constructs void type SSA_NAMEs and the like. 2023-08-10 Jakub Jelinek <jakub@redhat.com> PR c/102989 * tree-ssa-phiopt.cc (single_non_singleton_phi_for_edges): Never return virtual phis and return NULL if there is a virtual phi where the arguments from E0 and E1 edges aren't equal.
2023-08-02PHIOPT: Mark the conditional lhs and rhs as to look at to see if DCEableAndrew Pinski1-5/+16
In some cases (usually dealing with bools only), there could be some statements left behind which are considered trivial dead. An example is: ``` bool f(bool a, bool b) { if (!a && !b) return 0; if (!a && b) return 0; if (a && !b) return 0; return 1; } ``` Where during phiopt2, the IR had: ``` _3 = ~b_7(D); _4 = _3 & a_6(D); _4 != 0 ? 0 : 1 ``` match-and-simplify would transform that into: ``` _11 = ~a_6(D); _12 = b_7(D) | _11; ``` But phiopt would leave around the statements defining _4 and _3. This helps by marking the conditional's lhs and rhs to see if they are trivial dead. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (match_simplify_replacement): Mark's cond statement's lhs and rhs to check if trivial dead. Rename inserted_exprs to exprs_maybe_dce; also move it so bitmap is not allocated if not needed.
2023-07-21tree-optimization/88540 - FP x > y ? x : y if-conversion without -ffast-mathRichard Biener1-5/+16
The following makes sure that FP x > y ? x : y style max/min operations are if-converted at the GIMPLE level. While we can neither match it to MAX_EXPR nor .FMAX as both have different semantics with IEEE than the ternary ?: operation we can make sure to maintain this form as a COND_EXPR so backends have the chance to match this to instructions their ISA offers. The patch does this in phiopt where we recognize min/max and instead of giving up when we have to honor NaNs we alter the generated code to a COND_EXPR. This resolves PR88540 and we can then SLP vectorize the min operation for its testcase. It also resolves part of the regressions observed with the change matching bit-inserts of bit-field-refs to vec_perm. Expansion from a COND_EXPR rather than from compare-and-branch gcc.target/i386/pr54855-9.c by producing extra moves while the corresponding min/max operations are now already synthesized by RTL expansion, register selection isn't optimal. This can be also provoked without this change by altering the operand order in the source. I have XFAILed that part of the test. PR tree-optimization/88540 * tree-ssa-phiopt.cc (minmax_replacement): Do not give up with NaNs but handle the simple case by if-converting to a COND_EXPR. * gcc.target/i386/pr88540.c: New testcase. * gcc.target/i386/pr54855-9.c: XFAIL check for redundant moves. * gcc.target/i386/pr54855-12.c: Adjust. * gcc.target/i386/pr54855-13.c: Likewise. * gcc.target/i386/pr110170.c: Likewise. * gcc.dg/tree-ssa/split-path-12.c: Likewise.
2023-07-19[PATCH] Fix tree-opt/110252: wrong code due to phiopt using flow sensitive ↵Andrew Pinski1-3/+48
info during match Match will query ranger via tree_nonzero_bits/get_nonzero_bits for 2 and 3rd operand of the COND_EXPR and phiopt tries to do create the COND_EXPR even if we moving one statement. That one statement could have some flow sensitive information on it based on the condition that is for the COND_EXPR but that might create wrong code if the statement was moved out. This is similar to the previous version of the patch except now we use flow_sensitive_info_storage instead of manually doing the save/restore and also handle all defs on a gimple statement rather than just for lhs of the gimple statement. Oh and a few more testcases were added that was failing before. OK? Bootsrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/110252 gcc/ChangeLog: * tree-ssa-phiopt.cc (class auto_flow_sensitive): New class. (auto_flow_sensitive::auto_flow_sensitive): New constructor. (auto_flow_sensitive::~auto_flow_sensitive): New deconstructor. (match_simplify_replacement): Temporarily remove the flow sensitive info on the two statements that might be moved. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-25b.c: Updated as __builtin_parity loses the nonzerobits info. * gcc.c-torture/execute/pr110252-1.c: New test. * gcc.c-torture/execute/pr110252-2.c: New test. * gcc.c-torture/execute/pr110252-3.c: New test. * gcc.c-torture/execute/pr110252-4.c: New test.
2023-07-04tree-optimization/110491 - PHI-OPT and undefsRichard Biener1-0/+7
The following makes sure to not make conditional undefs in PHI arguments unconditional by folding cond ? arg1 : arg2. PR tree-optimization/110491 * tree-ssa-phiopt.cc (match_simplify_replacement): Check whether the PHI args are possibly undefined before folding the COND_EXPR. * gcc.dg/torture/pr110491.c: New testcase.
2023-07-04Use mark_ssa_maybe_undefs in PHI-OPTRichard Biener1-2/+6
The following removes gimple_uses_undefined_value_p and instead uses the conservative mark_ssa_maybe_undefs in PHI-OPT, the last user of the other API. * tree-ssa-phiopt.cc (pass_phiopt::execute): Mark SSA undefs. (empty_bb_or_one_feeding_into_p): Check for them. * tree-ssa.h (gimple_uses_undefined_value_p): Remove. * tree-ssa.cc (gimple_uses_undefined_value_p): Likewise.
2023-05-08PHIOPT: factor out unary operations instead of just conversionsAndrew Pinski1-25/+31
After using factor_out_conditional_conversion with diamond bb, we should be able do use it also for all normal unary gimple and not just conversions. This allows to optimize PR 59424 for an example. This is also a start to optimize PR 64700 and a few others. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. An example of this is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } static int abs1(int a) { if (a < 0) a = -a; return a; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (d > e) t = g(abs1(d)); else t = g(abs1(e)); return t; } ``` Which should be optimized to: _9 = MAX_EXPR <d_5(D), e_6(D)>; _4 = ABS_EXPR <_9>; t_3 = (long long unsigned intD.16) _4; gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_conversion): Rename to ... (factor_out_conditional_operation): This and add support for all unary operations. (pass_phiopt::execute): Update call to factor_out_conditional_conversion to call factor_out_conditional_operation instead. PR tree-optimization/109424 PR tree-optimization/59424 gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-2.c: Update tree scan for details change in wording. * gcc.dg/tree-ssa/minmax-17.c: Likewise. * gcc.dg/tree-ssa/pr103771.c: Likewise. * gcc.dg/tree-ssa/minmax-18.c: New test. * gcc.dg/tree-ssa/minmax-19.c: New test.
2023-05-08PHIOPT: Loop over calling factor_out_conditional_conversionAndrew Pinski1-12/+15
After adding diamond shaped bb support to factor_out_conditional_conversion, we can get a case where we have two conversions that needs factored out and then would have another phiopt happen. An example is: ``` static inline unsigned long long g(int t) { unsigned t1 = t; return t1; } unsigned long long f(int c, int d, int e) { unsigned long long t; if (c > d) t = g(c); else t = g(d); return t; } ``` In this case we should get a MAX_EXPR in phiopt1 with two casts. Before this patch, we would just factor out the outer cast and then wait till phiopt2 to factor out the inner cast. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (pass_phiopt::execute): Loop over factor_out_conditional_conversion. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-17.c: New test.
2023-05-08PHIOPT: Add diamond bb form to factor_out_conditional_conversionAndrew Pinski1-1/+1
So the function factor_out_conditional_conversion already supports diamond shaped bb forms, just need to be called for such a thing. harden-cond-comp.c needed to be changed as we would optimize out the conversion now and that causes the compare hardening not needing to split the block which it was testing. So change it such that there would be no chance of optimization. Also add two testcases that showed the improvement. PR 103771 is solved in ifconvert also for the vectorizer but now it is solved in a general sense. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/49959 PR tree-optimization/103771 gcc/ChangeLog: * tree-ssa-phiopt.cc (pass_phiopt::execute): Support Diamond shapped bb form for factor_out_conditional_conversion. gcc/testsuite/ChangeLog: * c-c++-common/torture/harden-cond-comp.c: Change testcase slightly to avoid the new phiopt optimization. * gcc.dg/tree-ssa/abs-2.c: New test. * gcc.dg/tree-ssa/pr103771.c: New test.
2023-05-04PHIOPT: Fix diamond case of match_simplify_replacementAndrew Pinski1-3/+26
So it turns out I messed checking which edge was true/false for the diamond form. The edges, e0 and e1 here are edges from the merge block but the true/false edges are from the conditional block and with diamond/threeway, there is a bb inbetween on both edges. Most of the time, the check that was in match_simplify_replacement would happen to be correct for diamond form as most of the time the first edge in the conditional is the edge for the true side of the conditional. This is why I didn't see the issue during bootstrap/testing. I added a fragile gimple testcase which exposed the issue. Since there is no way to specify the order of the edges in the gimple fe, we have to have forwprop to swap the false/true edges (not order of them, just swapping true/false flags) and hope not to do cleanupcfg inbetween forwprop and the first phiopt pass. This is the fragile part really, it is not that we will produce wrong code, just we won't hit what was the failing case. OK? Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/109732 gcc/ChangeLog: * tree-ssa-phiopt.cc (match_simplify_replacement): Fix the selection of the argtrue/argfalse. gcc/testsuite/ChangeLog: * gcc.dg/pr109732.c: New test. * gcc.dg/pr109732-1.c: New test.
2023-05-04PHIOPT: Improve replace_phi_edge_with_variable for diamond shapped bbAndrew Pinski1-1/+34
While looking at differences between what minmax_replacement and match_simplify_replacement does. I noticed that they sometimes chose different edges to remove. I decided we should be able to do better and be able to remove both empty basic blocks in the case of match_simplify_replacement as that moves the statements. This also updates the testcases as now match_simplify_replacement will remove the unused MIN/MAX_EXPR and they were checking for those. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Handle diamond form bb with forwarder only empty blocks better. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-15.c: Update test. * gcc.dg/tree-ssa/minmax-16.c: Update test. * gcc.dg/tree-ssa/minmax-3.c: Update test. * gcc.dg/tree-ssa/minmax-4.c: Update test. * gcc.dg/tree-ssa/minmax-5.c: Update test. * gcc.dg/tree-ssa/minmax-8.c: Update test.
2023-05-04PHIOPT: Improve replace_phi_edge_with_variable's dce_ssa_names slightlyAndrew Pinski1-2/+3
When I added the dce_ssa_names argument, I didn't realize bitmap was a pointer so I used the default argument value as auto_bitmap(). But instead we could just use nullptr and check if it was a nullptr before calling simple_dce_from_worklist. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Change the default argument value for dce_ssa_names to nullptr. Check to make sure dce_ssa_names is a non-nullptr before calling simple_dce_from_worklist.
2023-05-04Rename last_stmt to last_nondebug_stmtRichard Biener1-2/+2
The following renames last_stmt to last_nondebug_stmt which is what it really does. * tree-cfg.h (last_stmt): Rename to ... (last_nondebug_stmt): ... this. * tree-cfg.cc (last_stmt): Rename to ... (last_nondebug_stmt): ... this. (assign_discriminators): Adjust. (group_case_labels_stmt): Likewise. (gimple_can_duplicate_bb_p): Likewise. (execute_fixup_cfg): Likewise. * auto-profile.cc (afdo_propagate_circuit): Likewise. * gimple-range.cc (gimple_ranger::range_on_exit): Likewise. * omp-expand.cc (workshare_safe_to_combine_p): Likewise. (determine_parallel_type): Likewise. (adjust_context_and_scope): Likewise. (expand_task_call): Likewise. (remove_exit_barrier): Likewise. (expand_omp_taskreg): Likewise. (expand_omp_for_init_counts): Likewise. (expand_omp_for_init_vars): Likewise. (expand_omp_for_static_chunk): Likewise. (expand_omp_simd): Likewise. (expand_oacc_for): Likewise. (expand_omp_for): Likewise. (expand_omp_sections): Likewise. (expand_omp_atomic_fetch_op): Likewise. (expand_omp_atomic_cas): Likewise. (expand_omp_atomic): Likewise. (expand_omp_target): Likewise. (expand_omp): Likewise. (omp_make_gimple_edges): Likewise. * trans-mem.cc (tm_region_init): Likewise. * tree-inline.cc (redirect_all_calls): Likewise. * tree-parloops.cc (gen_parallel_loop): Likewise. * tree-ssa-loop-ch.cc (do_while_loop_p): Likewise. * tree-ssa-loop-ivcanon.cc (canonicalize_loop_induction_variables): Likewise. * tree-ssa-loop-ivopts.cc (stmt_after_ip_normal_pos): Likewise. (may_eliminate_iv): Likewise. * tree-ssa-loop-manip.cc (standard_iv_increment_position): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Likewise. (estimate_numbers_of_iterations): Likewise. * tree-ssa-loop-split.cc (compute_added_num_insns): Likewise. * tree-ssa-loop-unswitch.cc (get_predicates_for_bb): Likewise. (set_predicates_for_bb): Likewise. (init_loop_unswitch_info): Likewise. (hoist_guard): Likewise. * tree-ssa-phiopt.cc (match_simplify_replacement): Likewise. (minmax_replacement): Likewise. * tree-ssa-reassoc.cc (update_range_test): Likewise. (optimize_range_tests_to_bit_test): Likewise. (optimize_range_tests_var_bound): Likewise. (optimize_range_tests): Likewise. (no_side_effect_bb): Likewise. (suitable_cond_bb): Likewise. (maybe_optimize_range_tests): Likewise. (reassociate_bb): Likewise. * tree-vrp.cc (rvrp_folder::pre_fold_bb): Likewise.
2023-05-02PHIOPT: small refactoring of match_simplify_replacement.Andrew Pinski1-33/+24
When I added diamond shaped form bb to match_simplify_replacement, I copied the code to move the statement rather than factoring it out to a new function. This does the refactoring to a new function to avoid the duplicated code. It will make adding support for having two statements to move easier (the second statement will only be a conversion). OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-phiopt.cc (move_stmt): New function. (match_simplify_replacement): Use move_stmt instead of the inlined version.
2023-05-01PHIOPT: Update comment about what the pass now doesAndrew Pinski1-31/+36
I noticed I didn't update the comment about how the pass works after I initially added match_simplify_replacement. Anyways this updates the comment to be the current state of the pass. OK? gcc/ChangeLog: * tree-ssa-phiopt.cc: Update comment about how the transformation are implemented.
2023-05-01Conversion to irange wide_int API.Aldy Hernandez1-1/+2
This converts the irange API to use wide_ints exclusively, along with its users. This patch will slow down VRP, as there will be more useless wide_int to tree conversions. However, this slowdown is only temporary, as a follow-up patch will convert the internal representation of iranges to wide_ints for a net overall gain in performance. gcc/ChangeLog: * fold-const.cc (expr_not_equal_to): Convert to irange wide_int API. * gimple-fold.cc (size_must_be_zero_p): Same. * gimple-loop-versioning.cc (loop_versioning::prune_loop_conditions): Same. * gimple-range-edge.cc (gcond_edge_range): Same. (gimple_outgoing_range::calc_switch_ranges): Same. * gimple-range-fold.cc (adjust_imagpart_expr): Same. (adjust_realpart_expr): Same. (fold_using_range::range_of_address): Same. (fold_using_range::relation_fold_and_or): Same. * gimple-range-gori.cc (gori_compute::gori_compute): Same. (range_is_either_true_or_false): Same. * gimple-range-op.cc (cfn_toupper_tolower::get_letter_range): Same. (cfn_clz::fold_range): Same. (cfn_ctz::fold_range): Same. * gimple-range-tests.cc (class test_expr_eval): Same. * gimple-ssa-warn-alloca.cc (alloca_call_type): Same. * ipa-cp.cc (ipa_value_range_from_jfunc): Same. (propagate_vr_across_jump_function): Same. (decide_whether_version_node): Same. * ipa-prop.cc (ipa_get_value_range): Same. * ipa-prop.h (ipa_range_set_and_normalize): Same. * range-op.cc (get_shift_range): Same. (value_range_from_overflowed_bounds): Same. (value_range_with_overflow): Same. (create_possibly_reversed_range): Same. (equal_op1_op2_relation): Same. (not_equal_op1_op2_relation): Same. (lt_op1_op2_relation): Same. (le_op1_op2_relation): Same. (gt_op1_op2_relation): Same. (ge_op1_op2_relation): Same. (operator_mult::op1_range): Same. (operator_exact_divide::op1_range): Same. (operator_lshift::op1_range): Same. (operator_rshift::op1_range): Same. (operator_cast::op1_range): Same. (operator_logical_and::fold_range): Same. (set_nonzero_range_from_mask): Same. (operator_bitwise_or::op1_range): Same. (operator_bitwise_xor::op1_range): Same. (operator_addr_expr::fold_range): Same. (pointer_plus_operator::wi_fold): Same. (pointer_or_operator::op1_range): Same. (INT): Same. (UINT): Same. (INT16): Same. (UINT16): Same. (SCHAR): Same. (UCHAR): Same. (range_op_cast_tests): Same. (range_op_lshift_tests): Same. (range_op_rshift_tests): Same. (range_op_bitwise_and_tests): Same. (range_relational_tests): Same. * range.cc (range_zero): Same. (range_nonzero): Same. * range.h (range_true): Same. (range_false): Same. (range_true_and_false): Same. * tree-data-ref.cc (split_constant_offset_1): Same. * tree-ssa-loop-ch.cc (entry_loop_condition_is_static): Same. * tree-ssa-loop-unswitch.cc (struct unswitch_predicate): Same. (find_unswitching_predicates_for_bb): Same. * tree-ssa-phiopt.cc (value_replacement): Same. * tree-ssa-threadbackward.cc (back_threader::find_taken_edge_cond): Same. * tree-ssanames.cc (ssa_name_has_boolean_range): Same. * tree-vrp.cc (find_case_label_range): Same. * value-query.cc (range_query::get_tree_range): Same. * value-range.cc (irange::set_nonnegative): Same. (frange::contains_p): Same. (frange::singleton_p): Same. (frange::internal_singleton_p): Same. (irange::irange_set): Same. (irange::irange_set_1bit_anti_range): Same. (irange::irange_set_anti_range): Same. (irange::set): Same. (irange::operator==): Same. (irange::singleton_p): Same. (irange::contains_p): Same. (irange::set_range_from_nonzero_bits): Same. (DEFINE_INT_RANGE_INSTANCE): Same. (INT): Same. (UINT): Same. (SCHAR): Same. (UINT128): Same. (UCHAR): Same. (range): New. (tree_range): New. (range_int): New. (range_uint): New. (range_uint128): New. (range_uchar): New. (range_char): New. (build_range3): Convert to irange wide_int API. (range_tests_irange3): Same. (range_tests_int_range_max): Same. (range_tests_strict_enum): Same. (range_tests_misc): Same. (range_tests_nonzero_bits): Same. (range_tests_nan): Same. (range_tests_signed_zeros): Same. * value-range.h (Value_Range::Value_Range): Same. (irange::set): Same. (irange::nonzero_p): Same. (irange::contains_p): Same. (range_includes_zero_p): Same. (irange::set_nonzero): Same. (irange::set_zero): Same. (contains_zero_p): Same. (frange::contains_p): Same. * vr-values.cc (simplify_using_ranges::op_with_boolean_value_range_p): Same. (bounds_of_var_in_loop): Same. (simplify_using_ranges::legacy_fold_cond_overflow): Same.
2023-04-30PHIOPT: Allow moving of some builtin callsAndrew Pinski1-4/+31
While moving working on moving cond_removal_in_builtin_zero_pattern to match, I noticed that functions were not allowed to move as we reject all non-assignments. This changes to allowing a few calls which are known not to throw/trap. Right now it is restricted to ones which cond_removal_in_builtin_zero_pattern handles but adding more is just adding it to the switch statement. gcc/ChangeLog: * tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Allow some builtin/internal function calls which are known not to trap/throw. (phiopt_worker::match_simplify_replacement): Use name instead of getting the lhs again.
2023-04-28PHIOPT: Move two_value_replacement to match.pdAndrew Pinski1-155/+2
This patch converts two_value_replacement function into a match.pd pattern. It is a direct translation with only one minor change, does not check for the {0,+-1} case as that is handled before in match.pd so there is no reason to do the extra check for it. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: PR tree-optimization/100958 * tree-ssa-phiopt.cc (two_value_replacement): Remove. (pass_phiopt::execute): Don't call two_value_replacement. * match.pd (a !=/== CST1 ? CST2 : CST3): Add pattern to handle what two_value_replacement did.
2023-04-27PHIOPT: Allow MIN/MAX to have up to 2 MIN/MAX expressions for early phioptAndrew Pinski1-1/+15
In the early PHIOPT mode, the original minmax_replacement, would replace a PHI node with up to 2 min/max expressions in some cases, this allows for that too. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (phiopt_early_allow): Allow for up to 2 min/max expressions in the sequence/match code.
2023-04-27PHIOPT: Move store_elim_worker into pass_cselim::executeAndrew Pinski1-131/+119
This simple patch moves the body of store_elim_worker direclty into pass_cselim::execute. Also removes unneeded prototypes too. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (cond_store_replacement): Remove prototype. (cond_if_else_store_replacement): Likewise. (get_non_trapping): Likewise. (store_elim_worker): Move into ... (pass_cselim::execute): This.
2023-04-27PHIOPT: Rename tree_ssa_phiopt_worker to pass_phiopt::executeAndrew Pinski1-204/+181
Now that store elimination and phiopt does not share outer code, we can move tree_ssa_phiopt_worker directly into pass_phiopt::execute and remove many declarations (prototypes) from the file. gcc/ChangeLog: * tree-ssa-phiopt.cc (two_value_replacement): Remove prototype. (match_simplify_replacement): Likewise. (factor_out_conditional_conversion): Likewise. (value_replacement): Likewise. (minmax_replacement): Likewise. (spaceship_replacement): Likewise. (cond_removal_in_builtin_zero_pattern): Likewise. (hoist_adjacent_loads): Likewise. (tree_ssa_phiopt_worker): Move into ... (pass_phiopt::execute): this.
2023-04-27PHIOPT: Split out store elimination from phioptAndrew Pinski1-54/+126
Since the last cleanups, it made easier to see that we should split out the store elimination worker from tree_ssa_phiopt_worker function. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove do_store_elim argument and split that part out to ... (store_elim_worker): This new function. (pass_cselim::execute): Call store_elim_worker. (pass_phiopt::execute): Update call to tree_ssa_phiopt_worker.
2023-04-26Remove some uses of deprecated irange API.Aldy Hernandez1-13/+4
gcc/ChangeLog: * builtins.cc (expand_builtin_strnlen): Rewrite deprecated irange API uses to new API. * gimple-predicate-analysis.cc (find_var_cmp_const): Same. * internal-fn.cc (get_min_precision): Same. * match.pd: Same. * tree-affine.cc (expr_to_aff_combination): Same. * tree-data-ref.cc (dr_step_indicator): Same. * tree-dfa.cc (get_ref_base_and_extent): Same. * tree-scalar-evolution.cc (iv_can_overflow_p): Same. * tree-ssa-phiopt.cc (two_value_replacement): Same. * tree-ssa-pre.cc (insert_into_preds_of_block): Same. * tree-ssa-reassoc.cc (optimize_range_tests_to_bit_test): Same. * tree-ssa-strlen.cc (compare_nonzero_chars): Same. * tree-switch-conversion.cc (bit_test_cluster::emit): Same. * tree-vect-patterns.cc (vect_recog_divmod_pattern): Same. * tree.cc (get_range_pos_neg): Same.