path: root/gcc/rtl-ssa
Age  Commit message  (Author)  Files, -deleted/+added lines
2024-07-28  rtl-ssa: Define INCLUDE_ARRAY  (Richard Sandiford)  6 files, -0/+6
g:72fbd3b2b2a497dbbe6599239bd61c5624203ed0 added a use of std::array
without explicitly forcing <array> to be included.  That didn't cause
problems in my local builds but understandably did for some people.

gcc/
        * doc/rtl.texi: Document the need to define INCLUDE_ARRAY before
        including rtl-ssa.h.
        * rtl-ssa.h: Likewise (in comment).
        * config/aarch64/aarch64-cc-fusion.cc: Add INCLUDE_ARRAY.
        * config/aarch64/aarch64-early-ra.cc: Likewise.
        * config/riscv/riscv-avlprop.cc: Likewise.
        * config/riscv/riscv-vsetvl.cc: Likewise.
        * fwprop.cc: Likewise.
        * late-combine.cc: Likewise.
        * pair-fusion.cc: Likewise.
        * rtl-ssa/accesses.cc: Likewise.
        * rtl-ssa/blocks.cc: Likewise.
        * rtl-ssa/changes.cc: Likewise.
        * rtl-ssa/functions.cc: Likewise.
        * rtl-ssa/insns.cc: Likewise.
        * rtl-ssa/movement.cc: Likewise.
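For reference, a minimal sketch of the include order the commit documents.
Only the INCLUDE_ARRAY define is taken from the commit message; the
surrounding headers are assumptions about rtl-ssa.h's usual prerequisites
and are not an exhaustive list.

    /* Sketch only: INCLUDE_ARRAY must be defined before "system.h" so that
       system.h pulls <array> in for rtl-ssa.h.  */
    #define INCLUDE_ARRAY
    #include "config.h"
    #include "system.h"
    #include "coretypes.h"
    /* ... the other usual rtl-ssa prerequisites ... */
    #include "rtl-ssa.h"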
2024-07-28  rtl-ssa: Fix split_clobber_group tree insertion [PR116044]  (Richard Sandiford)  2 files, -30/+39
PR116044 is a regression in the testsuite on AMD GCN caused (again) by the split_clobber_group code. The first patch in this area (g:71b31690a7c52413496e91bcc5ee4c68af2f366f) fixed a bug caused by carrying the old group over as one of the split ones. That patch instead: - created two new groups - inserted them in the splay tree as neighbours of the old group - removed the old group, and - invalidated the old group (to force lazy recomputation when a clobber's parent group is queried) However, this left add_def trying to insert the new definition relative to a stale splay tree root. The second patch (g:34f33ea801563e2eabb348e8d3e9344a91abfd48) attempted to fix that by inserting it relative to the new root. But that's not always correct either. We specifically want to insert it after the first of the two new groups, whether that group is the root or not. This patch does that, and tries to refactor the code to make it a bit less brittle. gcc/ PR rtl-optimization/116044 * rtl-ssa/functions.h (function_info::split_clobber_group): Return an array of two clobber_groups. * rtl-ssa/accesses.cc (function_info::split_clobber_group): Return the new clobber groups. Don't modify the splay tree here. (function_info::add_def): Update call accordingly. Generalize the splay tree insertion code so that the new definition can be inserted as a child of any existing node, not just the root. Fix the insertion used after calling split_clobber_group.
2024-07-28  rtl-ssa: Avoid using a stale splay tree root [PR116009]  (Richard Sandiford)  1 file, -1/+2
In the fix for PR115928, I'd failed to notice that "root" was used later in the function, so needed to be updated. gcc/ PR rtl-optimization/116009 * rtl-ssa/accesses.cc (function_info::add_def): Set the root local variable after removing the old clobber group. gcc/testsuite/ PR rtl-optimization/116009 * gcc.c-torture/compile/pr116009.c: New test.
2024-07-28  rtl-ssa: Add debug routines for def_splay_tree  (Richard Sandiford)  2 files, -0/+18
This patch adds debug routines for def_splay_tree, which I found useful while debugging PR116009. gcc/ * rtl-ssa/accesses.h (rtl_ssa::pp_def_splay_tree): Declare. (dump, debug): Add overloads for def_splay_tree. * rtl-ssa/accesses.cc (rtl_ssa::pp_def_splay_tree): New function. (dump, debug): Add overloads for def_splay_tree.
2024-07-17  rtl-ssa: Fix move range canonicalisation [PR115929]  (Richard Sandiford)  1 file, -2/+18
In this PR, canonicalize_move_range walked off the end of a list and triggered a null dereference. There are multiple ways of fixing that, but I think the approach taken in the patch should be relatively efficient. gcc/ PR rtl-optimization/115929 * rtl-ssa/movement.h (canonicalize_move_range): Check for null prev and next insns and create an invalid move range for them. gcc/testsuite/ PR rtl-optimization/115929 * gcc.dg/torture/pr115929-2.c: New test.
2024-07-17  rtl-ssa: Fix split_clobber_group [PR115928]  (Richard Sandiford)  3 files, -22/+32
One of the goals of the rtl-ssa representation was to allow a group of consecutive clobbers to be skipped in constant time, with amortised sublinear insertion and deletion. This involves putting consecutive clobbers in groups. Splitting or joining groups would be linear if we had to update every clobber on each update, so the operation to query a clobber's group is lazy and (again) amortised sublinear. This means that, when splitting a group into two, we cannot reuse the old group for one side. We have to invalidate it, so that the lazy clobber_info::group query can tell that something has changed. The ICE in the PR came from failing to do that. gcc/ PR rtl-optimization/115928 * rtl-ssa/accesses.h (clobber_group): Add a new constructor that takes the first, last and root clobbers. * rtl-ssa/internals.inl (clobber_group::clobber_group): Define it. * rtl-ssa/accesses.cc (function_info::split_clobber_group): Use it. Allocate a new group for both sides and invalidate the previous group. (function_info::add_def): After calling split_clobber_group, remove the old group from the splay tree. gcc/testsuite/ PR rtl-optimization/115928 * gcc.dg/torture/pr115928.c: New test.
2024-07-16  rtl-ssa: Fix removal of order_nodes [PR115929]  (Richard Sandiford)  1 file, -1/+4
order_nodes are used to implement ordered comparisons between two insns with the same program point number. remove_insn would remove an order_node from its splay tree, but didn't remove it from the insn. This caused confusion if the insn was later reinserted somewhere else that also needed an order_node. gcc/ PR rtl-optimization/115929 * rtl-ssa/insns.cc (function_info::remove_insn): Remove an order_node from the instruction as well as from the splay tree. gcc/testsuite/ PR rtl-optimization/115929 * gcc.dg/torture/pr115929-1.c: New test.
2024-07-16  rtl-ssa: Enforce earlyclobbers on hard-coded clobbers [PR115891]  (Richard Sandiford)  1 file, -1/+59
The asm in the testcase has a memory operand and also clobbers ax. The clobber means that ax cannot be used to hold inputs, which extends to the address of the memory. I think I had an implicit assumption that constrain_operands would enforce this, but in hindsight, that clearly wasn't going to be true. constrain_operands only looks at constraints, and these clobbers are by definition outside the constraint system. (And that's why they have to be handled conservatively, since there's no way to distinguish the earlyclobber and non-earlyclobber cases.) The semantics of hard-coded clobbers are generic enough that I think they should be handled directly by rtl-ssa, rather than by consumers. And in the context of rtl-ssa, the easiest way to check for a clash is to walk the list of input registers, which we already have to hand. It therefore seemed better not to push this down to a more generic rtl helper. The patch detects hard-coded clobbers in the same way as regrename: by temporarily stubbing out the operands with pc_rtx. gcc/ PR rtl-optimization/115891 * rtl-ssa/changes.cc (find_clobbered_access): New function. (recog_level2): Use it to check for overlap between input registers and hard-coded clobbers. Conditionally reset recog_data.insn after changing the insn code. gcc/testsuite/ PR rtl-optimization/115891 * gcc.target/i386/pr115891.c: New test.
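For illustration, a hedged sketch of the regrename-style idiom mentioned
above.  The recog_data fields and pc_rtx are standard, but the surrounding
logic is simplified and is not the committed code.

    /* Temporarily replace each matched operand with (pc) so that any hard
       register still visible in the pattern must come from a hard-coded
       clobber, then restore the real operands afterwards.  */
    for (int i = 0; i < recog_data.n_operands; ++i)
      *recog_data.operand_loc[i] = pc_rtx;
    /* ... scan the remaining pattern for clobbered hard registers ... */
    for (int i = 0; i < recog_data.n_operands; ++i)
      *recog_data.operand_loc[i] = recog_data.operand[i];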
2024-07-12  rtl-ssa: Fix prev_any_insn [PR115785]  (Richard Sandiford)  3 files, -41/+51
Bit of a brown paper bag issue, but: due to the representation of the insn chain, insn_info::prev_any_insn would sometimes skip over instructions. This led to an invalid update in the PR when adding and removing instructions. I think one of the reasons I failed to spot this when checking the code is that m_prev_insn_or_last_debug_insn is misnamed: it's the previous instruction *of the same type* or the last debug instruction in a group. The patch therefore renames it to m_prev_sametype_or_last_debug_insn (with the term prev_sametype already being used in some accessors). The reason this didn't show up earlier is that (a) prev_any_insn is rarely used directly, (b) no instructions were lost from the def-use chains, and (c) only consecutive debug instructions were skipped when walking the insn chain. The chaining scheme makes prev_any_insn more complicated than next_any_insn, prev_nondebug_insn and next_nondebug_insn, but the object code produced is still relatively simple. gcc/ PR rtl-optimization/115785 * rtl-ssa/insns.h (insn_info::prev_insn_or_last_debug_insn) (insn_info::next_nondebug_or_debug_insn): Remove typedefs. (insn_info::m_prev_insn_or_last_debug_insn): Rename to... (insn_info::m_prev_sametype_or_last_debug_insn): ...this. * rtl-ssa/internals.inl (insn_info::insn_info): Update after above renaming. (insn_info::copy_prev_from): Likewise. (insn_info::set_prev_sametype_insn): Likewise. (insn_info::set_last_debug_insn): Likewise. (insn_info::clear_insn_links): Likewise. (insn_info::has_insn_links): Likewise. * rtl-ssa/member-fns.inl (insn_info::prev_nondebug_insn): Likewise. (insn_info::prev_any_insn): Fix moves from non-debug to debug insns. gcc/testsuite/ PR rtl-optimization/115785 * g++.dg/torture/pr115785.C: New test.
2024-07-10  rtl-ssa: Add replace_nondebug_insn [PR115785]  (Richard Sandiford)  4 files, -4/+48
change_insns is used to change multiple instructions at once, so that the IR on return is valid & self-consistent. These changes can involve moving instructions, and the new position for one instruction might be expressed in terms of the old position of another instruction that is changing at the same time. change_insns therefore adds placeholder instructions to mark each new instruction position, then replaces each placeholder with the corresponding real instruction. This replacement was done in two steps: removing the old placeholder instruction and inserting the new real instruction. But it's more convenient for the upcoming fix for PR115785 if we do the operation as a single step. That should also be slightly more efficient, since e.g. no splay tree operations are needed. This operation happens purely on the rtl-ssa instruction chain. The placeholders are never represented in rtl. gcc/ PR rtl-optimization/115785 * rtl-ssa/functions.h (function_info::replace_nondebug_insn): Declare. * rtl-ssa/insns.h (insn_info::order_node::set_uid): New function. (insn_info::remove_note): Declare. * rtl-ssa/insns.cc (insn_info::remove_note): New function. (function_info::replace_nondebug_insn): Likewise. * rtl-ssa/changes.cc (function_info::change_insns): Use replace_nondebug_insn instead of remove_insn + add_insn.
2024-06-24  rtl-ssa: Rework _ignoring interfaces  (Richard Sandiford)  9 files, -221/+251
rtl-ssa has routines for scanning forwards or backwards for something under the control of an exclusion set. These searches are currently used for two main things: - to work out where an instruction can be moved within its EBB - to work out whether recog can add a new hard register clobber The exclusion set was originally a callback function that returned true for insns that should be ignored. However, for the late-combine work, I'd also like to be able to skip an entire definition, along with all its uses. This patch prepares for that by turning the exclusion set into an object that provides predicate member functions. Currently the only two member functions are: - should_ignore_insn: what the old callback did - should_ignore_def: the new functionality but more could be added later. Doing this also makes it easy to remove some asymmetry that I think in hindsight was a mistake: in forward scans, ignoring an insn meant ignoring all definitions in that insn (ok) and all uses of those definitions (non-obvious). The new interface makes it possible to select the required behaviour, with that behaviour being applied consistently in both directions. Now that the exclusion set is a dedicated object, rather than just a "random" function, I think it makes sense to remove the _ignoring suffix from the function names. The suffix was originally there to describe the callback, and in particular to emphasise that a true return meant "ignore" rather than "heed". gcc/ * rtl-ssa.h: Include predicates.h. * rtl-ssa/predicates.h: New file. * rtl-ssa/access-utils.h (prev_call_clobbers_ignoring): Rename to... (prev_call_clobbers): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (next_call_clobbers_ignoring): Rename to... (next_call_clobbers): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (first_nondebug_insn_use_ignoring): Rename to... (first_nondebug_insn_use): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (last_nondebug_insn_use_ignoring): Rename to... (last_nondebug_insn_use): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (last_access_ignoring): Rename to... (last_access): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. Conditionally skip definitions. (prev_access_ignoring): Rename to... (prev_access): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (first_def_ignoring): Replace with... (first_access): ...this new function. (next_access_ignoring): Rename to... (next_access): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. Conditionally skip definitions. * rtl-ssa/change-utils.h (insn_is_changing): Delete. (restrict_movement_ignoring): Rename to... (restrict_movement): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (recog_ignoring): Rename to... (recog): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. * rtl-ssa/changes.h (insn_is_changing_closure): Delete. * rtl-ssa/functions.h (function_info::add_regno_clobber): Treat the ignore parameter as an object with the same interface as ignore_nothing. * rtl-ssa/insn-utils.h (insn_is): Delete. * rtl-ssa/insns.h (insn_is_closure): Delete. * rtl-ssa/member-fns.inl (insn_is_changing_closure::insn_is_changing_closure): Delete. 
(insn_is_changing_closure::operator()): Likewise. (function_info::add_regno_clobber): Treat the ignore parameter as an object with the same interface as ignore_nothing. (ignore_changing_insns::ignore_changing_insns): New function. (ignore_changing_insns::should_ignore_insn): Likewise. * rtl-ssa/movement.h (restrict_movement_for_dead_range): Treat the ignore parameter as an object with the same interface as ignore_nothing. (restrict_movement_for_defs_ignoring): Rename to... (restrict_movement_for_defs): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. (restrict_movement_for_uses_ignoring): Rename to... (restrict_movement_for_uses): ...this and treat the ignore parameter as an object with the same interface as ignore_nothing. Conditionally skip definitions. * doc/rtl.texi: Update for above name changes. Use ignore_changing_insns instead of insn_is_changing. * config/aarch64/aarch64-cc-fusion.cc (cc_fusion::parallelize_insns): Likewise. * pair-fusion.cc (no_ignore): Delete. (latest_hazard_before, first_hazard_after): Update for above name changes. Use ignore_nothing instead of no_ignore. (pair_fusion_bb_info::fuse_pair): Update for above name changes. Use ignore_changing_insns instead of insn_is_changing. (pair_fusion::try_promote_writeback): Likewise.
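In outline, the predicate-object interface described above looks something
like the sketch below.  The member-function names come from the commit
message, but the exact signatures are an assumption.

    // An "ignore" object answers two questions during a scan.  ignore_nothing
    // ignores nothing; ignore_changing_insns (also added here) would return
    // true from should_ignore_insn for insns in the current change group.
    struct ignore_nothing
    {
      bool should_ignore_insn (const insn_info *) { return false; }
      bool should_ignore_def (const def_info *) { return false; }
    };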
2024-06-24  fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile  (Haochen Gui)  1 file, -0/+8
gcc/ * fwprop.cc (try_fwprop_subst_pattern): Invoke change_is_worthwhile to judge if a replacement is worthwhile. Remove single_set check and add is_debug_insn check. * recog.cc (swap_change): Invalidate recog_data when the cached INSN is swapped out. * rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Check if the insn cost of new rtl is unknown and fail the replacement.
2024-06-21  rtl-ssa: Don't cost no-op moves  (Richard Sandiford)  2 files, -2/+11
No-op moves are given the code NOOP_MOVE_INSN_CODE if we plan to delete them later. Such insns shouldn't be costed, partly because they're going to disappear, and partly because targets won't recognise the insn code. gcc/ * rtl-ssa/changes.cc (rtl_ssa::changes_are_worthwhile): Don't cost no-op moves. * rtl-ssa/insns.cc (insn_info::calculate_cost): Likewise.
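To illustrate the guard described above, a simplified sketch (not the
committed code); INSN_CODE, NOOP_MOVE_INSN_CODE and insn_cost are the
standard rtl facilities.

    /* Never ask the target to cost a no-op move: it will be deleted and
       its insn code is not a real pattern.  */
    int cost = 0;
    if (INSN_CODE (rtl) != NOOP_MOVE_INSN_CODE)
      cost = insn_cost (rtl, for_speed);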
2024-04-09  Fix up duplicated words mostly in comments, part 2  (Jakub Jelinek)  1 file, -1/+1
Another patch from eyeballing git grep -v 'long long\|optab optab\|template template\|double double' | grep ' \([a-zA-Z]\+\) \1 ' output, this time in gcc/ subdirectory. 2024-04-09 Jakub Jelinek <jakub@redhat.com> gcc/ * expr.cc (convert_mode_scalar): Fix duplicated words in comment; into into -> it into. * function.h (function::cond_uids): Fix duplicated words in comment; same same -> same. * config/riscv/riscv-vector-costs.cc (costs::adjust_vect_cost_per_loop): Fix duplicated words in comment; model model -> model. * config/riscv/riscv-vector-builtins-shapes.cc (build_base): Fix duplicated words in comment; for for -> for. * config/riscv/riscv-avlprop.cc (pass_avlprop::execute): Fix duplicated words in comment; more more -> more. * config/aarch64/driver-aarch64.cc (host_detect_local_cpu): Fix duplicated words in comment; be be -> be. * tree-profile.cc (masking_vectors): Fix duplicated words in comment; has has -> has, the the -> the. * value-range.cc (irange::set_range_from_bitmask): Fix duplicated words in comment; the the -> the. * gcov.cc (add_condition_counts): Fix duplicated words in comment; to to -> to. * vr-values.cc (get_scev_info): Fix duplicated words in comment; the the -> to the. * tree-vrp.cc (fully_replaceable): Fix duplicated words in comment; by by -> by. * mode-switching.cc (single_succ_confluence_n): Fix duplicated words in comment; the the -> the. * tree-ssa-phiopt.cc (value_replacement): Fix duplicated words in comment; can can -> we can. * gimple-range-phi.cc (phi_analyzer::process_phi): Fix duplicated words in comment; it it -> it is. * tree-ssa-sccvn.cc (visit_phi): Fix duplicated words in comment; to to -> to. * rtl-ssa/accesses.h (use_info::next_debug_insn_use): Fix duplicated words in comment; if if -> if. * doc/options.texi (InverseMask): Fix duplicated words; and and -> and. Change take to takes. * doc/invoke.texi (fanalyzer-undo-inlining): Fix duplicated words; be be -> be. (-minline-memops-threshold): Likewise. gcc/analyzer/ * analyzer.opt (Wanalyzer-undefined-behavior-strtok): Fix duplicated words; in in -> in. * program-state.cc (sm_state_map::replay_call_summary): Fix duplicated words in comment; to to -> to. (program_state::replay_call_summary): Likewise. * region-model.cc (region_model::replay_call_summary): Likewise. gcc/c/ * c-decl.cc (previous_tag): Fix duplicated words in comment; the the -> the. (diagnose_mismatched_decls): Fix duplicated words in comment; about about -> about. gcc/cp/ * constexpr.cc (build_new_constexpr_heap_type): Fix duplicated words in comment; is is -> is. * cp-tree.def (CO_RETURN_EXPR): Fix duplicated words in comment; for for -> for. * parser.cc (fixup_blocks_walker): Fix duplicated words in comment; is is -> is. * semantics.cc (fixup_template_type): Fix duplicated words in comment; for for -> for. (finish_omp_for): Fix duplicated words in comment; the the -> the. * pt.cc (more_specialized_fn): Fix duplicated words in comment; think think -> think. (type_targs_deducible_from): Fix duplicated words in comment; the the -> the. gcc/jit/ * docs/topics/expressions.rst (Constructor expressions): Fix duplicated words; have have -> have.
2024-02-19  rtl-optimization/54052 - RTL SSA PHI insertion compile-time hog  (Richard Biener)  1 file, -1/+6
The following tries to address the PHI insertion compile-time hog in RTL fwprop observed with the PR54052 testcase where the loop computing the "unfiltered" set of variables possibly needing PHI nodes for each block exhibits quadratic compile-time and memory-use. It does so by pruning the local DEFs with LR_OUT of the block, removing regs that can never be LR_IN (defined by this block) in the dominance frontier. PR rtl-optimization/54052 * rtl-ssa/blocks.cc (function_info::place_phis): Filter local defs by LR_OUT.
2024-01-23  rtl-ssa: Provide easier access to debug uses [PR113089]  (Alex Coplan)  2 files, -0/+42
This patch adds some accessors to set_info and use_info to make it easier to get at and iterate through uses in debug insns. It is used by the aarch64 load/store pair fusion pass in a subsequent patch to fix PR113089, i.e. to update debug uses in the pass. gcc/ChangeLog: PR target/113089 * rtl-ssa/accesses.h (use_info::next_debug_insn_use): New. (debug_insn_use_iterator): New. (set_info::first_debug_insn_use): New. (set_info::debug_insn_uses): New. * rtl-ssa/member-fns.inl (use_info::next_debug_insn_use): New. (set_info::first_debug_insn_use): New. (set_info::debug_insn_uses): New.
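A hedged example of how a pass might walk the new iterator; the loop body
is hypothetical.

    // Visit every use of SET that occurs in a debug instruction, e.g. so
    // that the debug insns can be retargeted or reset when SET changes.
    for (use_info *use : set->debug_insn_uses ())
      {
        insn_info *debug_insn = use->insn ();
        /* ... update or reset DEBUG_INSN's expression here ... */
      }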
2024-01-23  rtl-ssa: Ensure new defs get inserted [PR113070]  (Alex Coplan)  2 files, -11/+30
In r14-5820-ga49befbd2c783e751dc2110b544fe540eb7e33eb I added support to RTL-SSA for inserting new insns, which included support for users creating new defs. However, I missed that apply_changes_to_insn needed updating to ensure that the new defs actually got inserted into the main def chain. This meant that when the aarch64 ldp/stp pass inserted a new stp insn, the stp would just get skipped over during subsequent alias analysis, as its def never got inserted into the memory def chain. This (unsurprisingly) led to wrong code. This patch fixes the issue by ensuring new user-created defs get inserted. I would have preferred to have used a flag internal to the defs instead of a separate data structure to keep track of them, but since machine_mode increased to 16 bits we're already at 64 bits in access_info, and we can't really reuse m_is_temp as the logic in finalize_new_accesses requires it to get cleared. gcc/ChangeLog: PR target/113070 * rtl-ssa.h: Include hash-set.h. * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new_sets parameter and use it to keep track of new user-created sets. (function_info::apply_changes_to_insn): Also call add_def on new sets. (function_info::change_insns): Add hash_set to keep track of new user-created defs. Plumb it through. * rtl-ssa/functions.h: Add hash_set parameter to finalize_new_accesses and apply_changes_to_insn.
2024-01-23  rtl-ssa: Support for creating new uses [PR113070]  (Alex Coplan)  3 files, -4/+33
This exposes an interface for users to create new uses in RTL-SSA. This is needed for updating uses after inserting a new store pair insn in the aarch64 load/store pair fusion pass. gcc/ChangeLog: PR target/113070 * rtl-ssa/accesses.cc (function_info::create_use): New. * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Ensure new uses end up referring to permanent defs. * rtl-ssa/functions.h (function_info::create_use): Declare.
2024-01-23  rtl-ssa: Run finalize_new_accesses forwards [PR113070]  (Alex Coplan)  1 file, -5/+16
The next patch in this series exposes an interface for creating new uses in RTL-SSA. The intent is that new user-created uses can consume new user-created defs in the same change group. This is so that we can correctly update uses of memory when inserting a new store pair insn in the aarch64 load/store pair fusion pass (the affected uses need to consume the new store pair insn). As it stands, finalize_new_accesses is called as part of the backwards insn placement loop within change_insns, but if we want new uses to be able to depend on new defs in the same change group, we need finalize_new_accesses to be called on earlier insns first. This is so that when we process temporary uses and turn them into permanent uses, we can follow the last_def link on the temporary def to ensure we end up with a permanent use consuming a permanent def. gcc/ChangeLog: PR target/113070 * rtl-ssa/changes.cc (function_info::change_insns): Split out the call to finalize_new_accesses from the backwards placement loop, run it forwards in a separate loop.
2024-01-03  Update copyright years.  (Jakub Jelinek)  19 files, -19/+19
2023-12-11  Treat "p" in asms as addressing VOIDmode  (Richard Sandiford)  1 file, -1/+3
check_asm_operands was inconsistent about how it handled "p" after RA
compared to before RA.  Before RA it tested the address with a void
(unknown) memory mode:

    case CT_ADDRESS:
      /* Every address operand can be reloaded to fit.  */
      result = result || address_operand (op, VOIDmode);
      break;

After RA it deferred to constrain_operands, which used the mode of the
operand:

    if ((GET_MODE (op) == VOIDmode
         || SCALAR_INT_MODE_P (GET_MODE (op)))
        && (strict <= 0
            || (strict_memory_address_p
                 (recog_data.operand_mode[opno], op))))
      win = true;

Using the mode of the operand is necessary for special predicates, where
it is used to give the memory mode.  But for asms, the operand mode is
simply the mode of the address itself (so DImode on 64-bit targets),
which doesn't say anything about the addressed memory.

This patch uses VOIDmode for asms but continues to use the operand mode
for .md insns.  It's needed to avoid a regression in the testcase with
the late-combine pass.

Fixing this made me realise that recog_level2 was doing duplicate work
for asms after RA.

gcc/
        * recog.cc (constrain_operands): Pass VOIDmode to
        strict_memory_address_p for 'p' constraints in asms.
        * rtl-ssa/changes.cc (recog_level2): Skip redundant
        constrain_operands for asms.

gcc/testsuite/
        * gcc.target/aarch64/prfm_imm_offset_2.c: New test.
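For context, a hypothetical aarch64-style asm of the affected kind (this
is not the committed testcase): a "p" constraint supplies an address
operand and "%a0" prints it as an address.

    void
    prefetch_line (const char *ptr)
    {
      /* Sketch only: PTR is used purely as an address here, which is the
         case whose post-RA handling the patch changes.  */
      asm volatile ("prfm\tpldl1keep, %a0" :: "p" (ptr));
    }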
2023-12-11  RTL-SSA: Fix ICE on record_use of RTL_SSA for RISC-V VSETVL PASS  (Juzhe-Zhong)  1 file, -3/+8
This patch fixes an ICE in record_use during RTL_SSA initialization in the
RISC-V backend VSETVL PASS.

This is the ICE:

    0x11a8603 partial_subreg_p(machine_mode, machine_mode)
        ../../../../gcc/gcc/rtl.h:3187
    0x3b695eb rtl_ssa::function_info::record_use(rtl_ssa::function_info::build_info&, rtl_ssa::insn_info*, rtx_obj_reference)
        ../../../../gcc/gcc/rtl-ssa/insns.cc:524

In record_use:

    if (HARD_REGISTER_NUM_P (regno)
        && partial_subreg_p (use->mode (), mode))

the assertion failed in partial_subreg_p, which is:

    inline bool
    partial_subreg_p (machine_mode outermode, machine_mode innermode)
    {
      /* Modes involved in a subreg must be ordered.  In particular, we must
         always know at compile time whether the subreg is paradoxical.  */
      poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
      poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
      gcc_checking_assert (ordered_p (outer_prec, inner_prec));  -----> causes the ICE
      return maybe_lt (outer_prec, inner_prec);
    }

The RISC-V VSETVL PASS is an advanced lazy vsetvl insertion pass that runs
after RA (register allocation).  The root cause is that we have a pattern
(a reduction instruction) that includes both VLA (length-agnostic) and VLS
(fixed-length) modes.

    (insn 168 173 170 31 (set (reg:RVVM1SI 101 v5 [311])
            (unspec:RVVM1SI [
                    (unspec:V32BI [
                            (const_vector:V32BI [
                                    (const_int 1 [0x1]) repeated x32
                                ])
                            (reg:DI 30 t5 [312])
                            (const_int 2 [0x2]) repeated x2
                            (reg:SI 66 vl)
                            (reg:SI 67 vtype)
                        ] UNSPEC_VPREDICATE)
                    (unspec:RVVM1SI [
                            (reg:V32SI 96 v0 [orig:185 vect__96.40 ] [185])  -----> VLS mode, NUNITS = 32 elements
                            (reg:RVVM1SI 113 v17 [439])                      -----> VLA mode, NUNITS = [8, 8] elements
                        ] UNSPEC_REDUC_XOR)
                    (unspec:RVVM1SI [
                            (reg:SI 0 zero)
                        ] UNSPEC_VUNDEF)
                ] UNSPEC_REDUC)) 15948 {pred_redxorv32si}

In this case, record_use is trying to check partial_subreg_p (use->mode (),
mode) for RTX = (reg:V32SI 96 v0 [orig:185 vect__96.40 ] [185]).
use->mode () is V32SImode, whereas mode is RVVM1SImode.  It then ICEs
because the two modes are !ordered_p.

Set the use mode to the biggest of the modes, which is the natural
fallback mode.

gcc/ChangeLog:

        * rtl-ssa/insns.cc (function_info::record_use): Add !ordered_p case.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/vsetvl_bug-2.c: New test.
2023-11-24  rtl-ssa: Add some helpers for removing accesses  (Alex Coplan)  2 files, -8/+46
This adds some helpers to access-utils.h for removing accesses from an access_array. This is needed by the upcoming aarch64 load/store pair fusion pass. gcc/ChangeLog: * rtl-ssa/access-utils.h (filter_accesses): New. (remove_regno_access): New. (check_remove_regno_access): New. * rtl-ssa/accesses.cc (rtl_ssa::remove_note_accesses_base): Use new filter_accesses helper.
2023-11-24  rtl-ssa: Support for inserting new insns  (Alex Coplan)  10 files, -13/+136
The upcoming aarch64 load pair pass needs to form store pairs, and can re-order stores over loads when alias analysis determines this is safe. In the case that both mem defs have uses in the RTL-SSA IR, and both stores require re-ordering over their uses, we represent that as (tentative) deletion of the original store insns and creation of a new insn, to prevent requiring repeated re-parenting of uses during the pass. We then update all mem uses that require re-parenting in one go at the end of the pass. To support this, RTL-SSA needs to handle inserting new insns (rather than just changing existing ones), so this patch adds support for that. New insns (and new accesses) are temporaries, allocated above a temporary obstack_watermark, such that the user can easily back out of a change without awkward bookkeeping. gcc/ChangeLog: * rtl-ssa/accesses.cc (function_info::create_set): New. * rtl-ssa/accesses.h (access_info::is_temporary): New. * rtl-ssa/changes.cc (move_insn): Handle new (temporary) insns. (function_info::finalize_new_accesses): Handle new/temporary user-created accesses. (function_info::apply_changes_to_insn): Ensure m_is_temp flag on new insns gets cleared. (function_info::change_insns): Handle new/temporary insns. (function_info::create_insn): New. * rtl-ssa/changes.h (class insn_change): Make function_info a friend class. * rtl-ssa/functions.h (function_info): Declare new entry points: create_set, create_insn. Declare new change_alloc helper. * rtl-ssa/insns.cc (insn_info::print_full): Identify temporary insns in dump. * rtl-ssa/insns.h (insn_info): Add new m_is_temp flag and accompanying is_temporary accessor. * rtl-ssa/internals.inl (insn_info::insn_info): Initialize m_is_temp to false. * rtl-ssa/member-fns.inl (function_info::change_alloc): New. * rtl-ssa/movement.h (restrict_movement_for_defs_ignoring): Add handling for temporary defs.
2023-10-25  rtl-ssa: Add new helper functions  (Richard Sandiford)  4 files, -0/+148
This patch adds some RTL-SSA helper functions. They will be used by the upcoming late-combine pass. The patch contains the first non-template out-of-line function declared in movement.h, so it adds a movement.cc. I realise it seems a bit over-the-top to have a file with just one function, but it might grow in future. :) gcc/ * Makefile.in (OBJS): Add rtl-ssa/movement.o. * rtl-ssa/access-utils.h (accesses_include_nonfixed_hard_registers) (single_set_info): New functions. (remove_uses_of_def, accesses_reference_same_resource): Declare. (insn_clobbers_resources): Likewise. * rtl-ssa/accesses.cc (rtl_ssa::remove_uses_of_def): New function. (rtl_ssa::accesses_reference_same_resource): Likewise. (rtl_ssa::insn_clobbers_resources): Likewise. * rtl-ssa/movement.h (can_move_insn_p): Declare. * rtl-ssa/movement.cc: New file.
2023-10-25  rtl-ssa: Extend make_uses_available  (Richard Sandiford)  2 files, -2/+39
The first in-tree use of RTL-SSA was fwprop, and one of the goals was to
make the fwprop rewrite preserve the old behaviour as far as possible.
The switch to RTL-SSA was supposed to be a pure infrastructure change.
So RTL-SSA has various FIXMEs for things that were artificially limited
to facilitate the old-fwprop vs. new-fwprop comparison.

One of the things that fwprop wants to do is extend live ranges, and
function_info::make_use_available tried to keep within the cases that
old fwprop could handle.

Since the information is built in extended basic blocks, it's easy to
handle intra-EBB queries directly.  This patch does that, and removes
the associated FIXME.

To get a flavour for how much difference this makes, I tried compiling
the testsuite at -Os for at least one target per supported CPU and OS.
For most targets, only a handful of tests changed, but the vast majority
of changes were positive.  The only target that seemed to benefit
significantly was i686-apple-darwin.

The main point of the patch is to remove the FIXME and to enable the
upcoming post-RA late-combine pass to handle more cases.

gcc/
        * rtl-ssa/functions.h (function_info::remains_available_at_insn):
        New member function.
        * rtl-ssa/accesses.cc (function_info::remains_available_at_insn):
        Likewise.
        (function_info::make_use_available): Avoid false negatives for
        queries within an EBB.
2023-10-25  rtl-ssa: Use frequency-weighted insn costs  (Richard Sandiford)  1 file, -4/+24
rtl_ssa::changes_are_worthwhile used the standard approach of summing up the individual costs of the old and new sequences to see which one is better overall. But when optimising for speed and changing instructions in multiple blocks, it seems better to weight the cost of each instruction by its execution frequency. (We already do something similar for SLP layouts.) gcc/ * rtl-ssa/changes.cc: Include sreal.h. (rtl_ssa::changes_are_worthwhile): When optimizing for speed, scale the cost of each instruction by its execution frequency.
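A standalone illustration of the weighting idea (not GCC code): weight
each instruction's cost by its block's execution frequency before
comparing the two sequences.

    #include <vector>

    struct weighted_insn { int cost; double freq; };

    static double
    weighted_cost (const std::vector<weighted_insn> &seq)
    {
      double total = 0;
      for (const weighted_insn &insn : seq)
        total += insn.cost * insn.freq;   // scale cost by execution frequency
      return total;
    }

    /* A change is then worthwhile for speed if
       weighted_cost (new_seq) < weighted_cost (old_seq).  */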
2023-10-25  rtl-ssa: Handle call clobbers in more places  (Richard Sandiford)  6 files, -19/+60
In order to save (a lot of) memory, RTL-SSA avoids creating individual clobber records for every call-clobbered register. It instead maintains a list & splay tree of calls in an EBB, grouped by ABI. This patch takes these call clobbers into account in a couple more routines. I don't think this will have any effect on existing users, since it's only necessary for hard registers. gcc/ * rtl-ssa/access-utils.h (next_call_clobbers): New function. (is_single_dominating_def, remains_available_on_exit): Replace with... * rtl-ssa/functions.h (function_info::is_single_dominating_def) (function_info::remains_available_on_exit): ...these new member functions. (function_info::m_clobbered_by_calls): New member variable. * rtl-ssa/functions.cc (function_info::function_info): Explicitly initialize m_clobbered_by_calls. * rtl-ssa/insns.cc (function_info::record_call_clobbers): Update m_clobbered_by_calls for each call-clobber note. * rtl-ssa/member-fns.inl (function_info::is_single_dominating_def): New function. Check for call clobbers. * rtl-ssa/accesses.cc (function_info::remains_available_on_exit): Likewise.
2023-10-25  rtl-ssa: Calculate dominance frontiers for the exit block  (Richard Sandiford)  2 files, -15/+30
The exit block can have multiple predecessors, for example if the function
calls __builtin_eh_return.  We might then need PHI nodes for values that
are live on exit.

RTL-SSA uses the normal dominance frontiers approach for calculating where
PHI nodes are needed.  However, dominance.cc only calculates dominators
for normal blocks, not the exit block.  calculate_dominance_frontiers
likewise only calculates dominance frontiers for normal blocks.

This patch fills in the “missing” frontiers manually.

gcc/
        * rtl-ssa/internals.h (build_info::exit_block_dominator): New
        member variable.
        * rtl-ssa/blocks.cc (build_info::build_info): Initialize it.
        (bb_walker::bb_walker): Use it, moving the computation of the
        dominator to...
        (function_info::process_all_blocks): ...here.
        (function_info::place_phis): Add dominance frontiers for the
        exit block.
2023-10-25  rtl-ssa: Handle artificial uses of deleted defs  (Richard Sandiford)  2 files, -2/+34
If an optimisation removes the last real use of a definition, there can still be artificial uses left. This patch removes those uses too. These artificial uses exist because RTL-SSA is only an SSA-like view of the existing RTL IL, rather than a native SSA representation. It effectively treats RTL registers like gimple vops, but with the addition of an RPO view of the register's lifetime(s). Things are structured to allow most operations to update this RPO view in amortised sublinear time. gcc/ * rtl-ssa/functions.h (function_info::process_uses_of_deleted_def): New member function. * rtl-ssa/changes.cc (function_info::process_uses_of_deleted_def): Likewise. (function_info::change_insns): Use it.
2023-10-25  rtl-ssa: Fix ICE when deleting memory clobbers  (Richard Sandiford)  1 file, -2/+12
Sometimes an optimisation can remove a clobber of scratch registers or scratch memory. We then need to update the DU chains to reflect the removed clobber. For registers this isn't a problem. Clobbers of registers are just momentary blips in the register's lifetime. They act as a barrier for moving uses later or defs earlier, but otherwise they have no effect on the semantics of other instructions. Removing a clobber is therefore a cheap, local operation. In contrast, clobbers of memory are modelled as full sets. This is because (a) a clobber of memory does not invalidate *all* memory and (b) it's a common idiom to use (clobber (mem ...)) in stack barriers. But removing a set and redirecting all uses to a different set is a linear operation. Doing it for potentially every optimisation could lead to quadratic behaviour. This patch therefore refrains from removing sets of memory that appear to be redundant. There's an opportunity to clean this up in linear time at the end of the pass, but as things stand, nothing would benefit from that. This is also a very rare event. Usually we should try to optimise the insn before the scratch memory has been allocated. gcc/ * rtl-ssa/changes.cc (function_info::finalize_new_accesses): If a change describes a set of memory, ensure that that set is kept, regardless of the insn pattern.
2023-10-25  rtl-ssa: Create REG_UNUSED notes after all pending changes  (Richard Sandiford)  1 file, -3/+6
Unlike REG_DEAD notes, REG_UNUSED notes need to be kept free of false positives by all passes. function_info::change_insns does this by removing all REG_UNUSED notes, and then using add_reg_unused_notes to add notes back (or create new ones) where appropriate. The problem was that it called add_reg_unused_notes on the fly while updating each instruction, which meant that the information for later instructions in the change set wasn't up to date. This patch does it in a separate loop instead. gcc/ * rtl-ssa/changes.cc (function_info::apply_changes_to_insn): Remove call to add_reg_unused_notes and instead... (function_info::change_insns): ...use a separate loop here.
2023-10-25  rtl-ssa: Ensure global registers are live on exit  (Richard Sandiford)  1 file, -3/+16
RTL-SSA mostly relies on DF for block-level register liveness information, including artificial uses and defs at the beginning and end of blocks. But one case was missing. DF does not add artificial uses of global registers to the beginning or end of a block. Instead it marks them as used within every block when computing LR and LIVE problems. For RTL-SSA, global registers behave like memory, which in turn behaves like gimple vops. We need to ensure that they are live on exit so that final definitions do not appear to be unused. Also, the previous live-on-exit handling only considered the exit block itself. It needs to consider non-local gotos as well, since they jump directly to some code in a parent function and so do not have a path to the exit block. gcc/ * rtl-ssa/blocks.cc (function_info::add_artificial_accesses): Force global registers to be live on exit. Handle any block with zero successors like an exit block.
2023-10-24  rtl-ssa: Avoid creating duplicated phis  (Richard Sandiford)  1 file, -0/+5
If make_uses_available was called twice for the same use, we could end up trying to create duplicate definitions for the same extended live range. gcc/ * rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Check whether the requested phi already exists.
2023-10-24  rtl-ssa: Don't insert after insns that can throw  (Richard Sandiford)  1 file, -1/+2
rtl_ssa::can_insert_after didn't handle insns that can throw. Fixing that avoids a regression with a later patch. gcc/ * rtl-ssa.h: Include cfgbuild.h. * rtl-ssa/movement.h (can_insert_after): Replace is_jump with the more comprehensive control_flow_insn_p.
2023-10-24  rtl-ssa: Fix handling of deleted insns  (Richard Sandiford)  1 file, -1/+4
RTL-SSA queues up some invasive changes for later. But sometimes the insns involved in those changes can be deleted by later optimisations, making the queued change unnecessary. This patch checks for that case. gcc/ * rtl-ssa/changes.cc (function_info::perform_pending_updates): Check whether an insn has been replaced by a note.
2023-10-24  rtl-ssa: Fix null deref in first_any_insn_use  (Richard Sandiford)  1 file, -1/+1
first_any_insn_use implicitly (but contrary to its documentation) assumed that there was at least one use. gcc/ * rtl-ssa/member-fns.inl (first_any_insn_use): Handle null m_first_use.
2023-10-20  rtl-ssa: Don't leave NOTE_INSN_DELETED around  (Alex Coplan)  1 file, -1/+5
This patch tweaks change_insns to also call ::remove_insn to ensure the underlying RTL insn gets removed from the insn chain in the case of a deletion. This avoids leaving NOTE_INSN_DELETED around after deleting insns. For movement, the RTL insn chain is updated earlier in change_insns with the call to move_insn. For deletion, it seems reasonable to do it here. gcc/ChangeLog: * rtl-ssa/changes.cc (function_info::change_insns): Ensure we call ::remove_insn on deleted insns.
2023-10-19  rtl-ssa: Support inferring uses of mem in change_insns  (Alex Coplan)  2 files, -4/+29
Currently, rtl_ssa::change_insns requires all new uses and defs to be specified explicitly. This turns out to be rather inconvenient for forming load pairs in the new aarch64 load pair pass, as the pass has to determine which mem def the final load pair consumes, and then obtain or create a suitable use (i.e. significant bookkeeping, just to keep the RTL-SSA IR consistent). It turns out to be much more convenient to allow change_insns to infer which def is consumed and create a suitable use of mem itself. This patch does that. gcc/ChangeLog: * rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new parameter to give final insn position, infer use of mem if it isn't specified explicitly. (function_info::change_insns): Pass down final insn position to finalize_new_accesses. * rtl-ssa/functions.h: Add parameter to finalize_new_accesses.
2023-10-19  rtl-ssa: Add entry point to allow re-parenting uses  (Alex Coplan)  2 files, -0/+11
This is needed by the upcoming aarch64 load pair pass, as it can re-order stores (when alias analysis determines this is safe) and thus change which mem def a given use consumes (in the RTL-SSA view, there is no alias disambiguation of memory). gcc/ChangeLog: * rtl-ssa/accesses.cc (function_info::reparent_use): New. * rtl-ssa/functions.h (function_info): Declare new member function reparent_use.
2023-10-19  rtl-ssa: Add drop_memory_access helper  (Alex Coplan)  1 file, -0/+13
Add a helper routine to access-utils.h which removes the memory access from an access_array, if it has one. gcc/ChangeLog: * rtl-ssa/access-utils.h (drop_memory_access): New.
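A hedged usage sketch; the exact return type of drop_memory_access is an
assumption based on the description above.

    // Build a copy of an insn's definitions with the memory definition
    // (if any) removed, e.g. when the changed insn no longer writes memory.
    def_array defs_without_mem = drop_memory_access (insn->defs ());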
2023-10-19  rtl-ssa: Fix bug in function_info::add_insn_after  (Alex Coplan)  1 file, -3/+11
In the case that !insn->is_debug_insn () && next->is_debug_insn (), this function was missing an update of the prev pointer on the first nondebug insn following the sequence of debug insns starting at next. This can lead to corruption of the insn chain, in that we end up with: insn->next_any_insn ()->prev_any_insn () != insn in this case. This patch fixes that. gcc/ChangeLog: * rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we update the prev pointer on the following nondebug insn in the case that !insn->is_debug_insn () && next->is_debug_insn ().
2023-09-29  use *_grow_cleared rather than *_grow on vec<bitmap_head>  (Jakub Jelinek)  1 file, -3/+3
The assert checking which is commented out in the vec.h grow method
requires trivially default constructible types to be used with this
method, but bitmap_head has, since the PR88317 r9-4642 workaround,
a non-trivial default constructor to catch bugs, and we pay the minimum
price of initializing everything in bitmap_head twice on the common

    bitmap_head var;
    bitmap_initialize (&var, obstack);

sequence.  This patch makes us pay the same price times the number of
elements on

    vec<bitmap_head> v;
    v.create (n);
    v.safe_grow_cleared (n); // previous v.safe_grow (n);
    for (int i = 0; i < n; ++i)
      bitmap_initialize (&v[i], obstack);

2023-09-29  Jakub Jelinek  <jakub@redhat.com>

        * tree-ssa-loop-im.cc (tree_ssa_lim_initialize): Use
        quick_grow_cleared instead of quick_grow on vec<bitmap_head> members.
        * cfganal.cc (control_dependences::control_dependences): Likewise.
        * rtl-ssa/blocks.cc (function_info::build_info::build_info):
        Likewise.
        (function_info::place_phis): Use safe_grow_cleared instead of
        safe_grow on auto_vec<bitmap_head> vars.
        * tree-ssa-live.cc (compute_live_vars): Use quick_grow_cleared
        instead of quick_grow on vec<bitmap_head> var.
2023-07-18  RTL_SSA: Relax PHI_MODE in phi_setup  (Ju-Zhe Zhong)  1 file, -0/+3
Hi, Richard.

The RISC-V port needs to add a bunch of VLS modes (V16QI, V32QI, V64QI,
...etc).  They share the same REG_CLASS as the VLA modes (VNx16QI,
VNx32QI, ...etc).

When I am adding those VLS modes, the RTL_SSA initialization in the VSETVL
PASS (inserted after RA) ICEs:

    rvv.c:13:1: internal compiler error: in partial_subreg_p, at rtl.h:3186
       13 | }
          | ^
    0xf7a5b1 partial_subreg_p(machine_mode, machine_mode)
        ../../../riscv-gcc/gcc/rtl.h:3186
    0x1407616 wider_subreg_mode(machine_mode, machine_mode)
        ../../../riscv-gcc/gcc/rtl.h:3252
    0x2a2c6ff rtl_ssa::combine_modes(machine_mode, machine_mode)
        ../../../riscv-gcc/gcc/rtl-ssa/internals.inl:677
    0x2a2b9a4 rtl_ssa::function_info::simplify_phi_setup(rtl_ssa::phi_info*, rtl_ssa::set_info**, bitmap_head*)
        ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:146
    0x2a2c142 rtl_ssa::function_info::simplify_phis()
        ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:258
    0x2a2b3f0 rtl_ssa::function_info::function_info(function*)
        ../../../riscv-gcc/gcc/rtl-ssa/functions.cc:51
    0x1cebab9 pass_vsetvl::init()
        ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4578
    0x1cec150 pass_vsetvl::execute(function*)
        ../../../riscv-gcc/gcc/config/riscv/riscv-vsetvl.cc:4716

The reason is that we have V32QImode (size = [32,0]), which is the mode
set as regno_reg_rtx[97].  When the PHI input def comes from the ENTRY
BLOCK (index = 0), def->mode () is V32QImode.  But the phi_mode is, for
example, VNx2QI (I use VLA-mode intrinsics to write the code).  Then
combine_modes reports the ICE.

gcc/ChangeLog:

        * rtl-ssa/internals.inl: Fix when mode1 and mode2 are not ordered.
2023-05-18  Machine_Mode: Extend machine_mode from 8 to 16 bits  (Pan Li)  2 files, -10/+4
We are running out of machine_mode (8 bits) in the RISC-V backend, so we
would like to extend the machine_mode bit size from 8 to 16 bits.
However, it is sensitive to extend the memory size of common structures
like tree or rtx.  This patch extends machine_mode to 16 bits by
shrinking elsewhere, namely:

* Swap the bit sizes of the code and the machine mode in rtx_def.
* Adjust the machine_mode location and spare bits in tree.

The memory impact of this patch on the related structures looks like this:

    +-------------------+----------+---------+------+
    | struct/bytes      | upstream | patched | diff |
    +-------------------+----------+---------+------+
    | rtx_obj_reference |        8 |      12 |   +4 |
    | ext_modified      |        2 |       4 |   +2 |
    | ira_allocno       |      192 |     184 |   -8 |
    | qty_table_elem    |       40 |      40 |    0 |
    | reg_stat_type     |       64 |      64 |    0 |
    | rtx_def           |       40 |      40 |    0 |
    | table_elt         |       80 |      80 |    0 |
    | tree_decl_common  |      112 |     112 |    0 |
    | tree_type_common  |      128 |     128 |    0 |
    | access_info       |        8 |       8 |    0 |
    +-------------------+----------+---------+------+

The tree- and rtx-related structs have no memory changes after this patch,
and machine_mode is now 16 bits.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
Co-Authored-By: Richard Biener <rguenther@suse.de>
Co-Authored-By: Richard Sandiford <richard.sandiford@arm.com>

gcc/ChangeLog:

        * combine.cc (struct reg_stat_type): Extend machine_mode to 16 bits.
        * cse.cc (struct qty_table_elem): Extend machine_mode to 16 bits.
        (struct table_elt): Extend machine_mode to 16 bits.
        (struct set): Ditto.
        * genmodes.cc (emit_mode_wider): Extend type from char to short.
        (emit_mode_complex): Ditto.
        (emit_mode_inner): Ditto.
        (emit_class_narrowest_mode): Ditto.
        * genopinit.cc (main): Extend the machine_mode limit.
        * ira-int.h (struct ira_allocno): Extend machine_mode to 16 bits
        and re-order the struct fields for padding.
        * machmode.h (MACHINE_MODE_BITSIZE): New macro.
        (GET_MODE_2XWIDER_MODE): Extend type from char to short.
        (get_mode_alignment): Extend type from char to short.
        * ree.cc (struct ext_modified): Extend machine_mode to 16 bits
        and remove the ATTRIBUTE_PACKED.
        * rtl-ssa/accesses.h: Extend machine_mode to 16 bits, narrow
        m_kind to 2 bits and remove m_spare.
        * rtl-ssa/internals.inl (rtl_ssa::access_info): Adjust the
        assignment.
        * rtl.h (RTX_CODE_BITSIZE): New macro.
        (struct rtx_def): Swap both the bit size and location between
        the rtx_code and the machine_mode.
        (subreg_shape::unique_id): Extend the machine_mode limit.
        * rtlanal.h: Extend machine_mode to 16 bits.
        * tree-core.h (struct tree_type_common): Extend machine_mode to
        16 bits and re-order the struct fields for padding.
        (struct tree_decl_common): Extend machine_mode to 16 bits.
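A self-contained illustration of the bit-width swap described above, with
simplified field names (this is not the real rtl layout).

    /* Before: a 16-bit code field and an 8-bit mode field.
       After:  an 8-bit code field and a 16-bit mode field.
       The total size of the pair is unchanged, which is why rtx_def stays
       at 40 bytes in the table above.  */
    struct rtx_head_before { unsigned int code : 16; unsigned int mode : 8; };
    struct rtx_head_after  { unsigned int code : 8;  unsigned int mode : 16; };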
2023-02-02  rtl-ssa: Extend m_num_defs to a full unsigned int [PR108086]  (Richard Sandiford)  1 file, -5/+9
insn_info tried to save space by storing the number of definitions in a 16-bit bitfield. The justification was: // ... FIRST_PSEUDO_REGISTER + 1 // is the maximum number of accesses to hard registers and memory, and // MAX_RECOG_OPERANDS is the maximum number of pseudos that can be // defined by an instruction, so the number of definitions should fit // easily in 16 bits. But while that reasoning holds (I think) for real instructions, it doesn't hold for artificial instructions. I don't think there's any sensible higher limit we can use, so this patch goes for a full unsigned int. gcc/ PR rtl-optimization/108086 * rtl-ssa/insns.h (insn_info): Make m_num_defs a full unsigned int. Adjust size-related commentary accordingly.
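In outline, the change is simply a widening of the field; the old 16-bit
width is inferred from the quoted comment rather than copied from insns.h.

    /* Old: a 16-bit bitfield, enough for real insns but not for
       artificial ones.  */
    // unsigned int m_num_defs : 16;
    /* New: a full unsigned int.  */
    unsigned int m_num_defs;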
2023-02-02  rtl-ssa: Fix splitting of clobber groups [PR108508]  (Richard Sandiford)  1 file, -4/+10
Since rtl-ssa isn't a real/native SSA representation, it has to honour the constraints of the underlying rtl representation. Part of this involves maintaining an rpo list of definitions for each rtl register, backed by a splay tree where necessary for quick lookup/insertion. However, clobbers of a register don't act as barriers to other clobbers of a register. E.g. it's possible to move one flag-clobbering instruction across an arbitrary number of other flag-clobbering instructions. In order to allow passes to do that without quadratic complexity, the splay tree groups all consecutive clobbers into groups, with only the group being entered into the splay tree. These groups in turn have an internal splay tree of clobbers where necessary. This means that, if we insert a new definition and use into the middle of a sea of clobbers, we need to split the clobber group into two groups. This was quite a difficult condition to trigger during development, and the PR shows that the code to handle it had (at least) two bugs. First, the process involves searching the clobber tree for the split point. This search can give either the previous clobber (which will belong to the first of the split groups) or the next clobber (which will belong to the second of the split groups). The code for the former case handled the split correctly but the code for the latter case didn't. Second, I'd forgotten to add the second clobber group to the main splay tree. :-( gcc/ PR rtl-optimization/108508 * rtl-ssa/accesses.cc (function_info::split_clobber_group): When the splay tree search gives the first clobber in the second group, make sure that the root of the first clobber group is updated correctly. Enter the new clobber group into the definition splay tree. gcc/testsuite/ PR rtl-optimization/108508 * gcc.target/aarch64/pr108508.c: New test.
2023-01-16  Update copyright years.  (Jakub Jelinek)  18 files, -18/+18
2022-06-27  Add 'final' and 'override' on dom_walker vfunc impls  (David Malcolm)  1 file, -2/+2
gcc/ChangeLog: * compare-elim.cc: Add "final" and "override" to dom_walker vfunc implementations, removing redundant "virtual" as appropriate. * gimple-ssa-strength-reduction.cc: Likewise. * ipa-prop.cc: Likewise. * rtl-ssa/blocks.cc: Likewise. * tree-into-ssa.cc: Likewise. * tree-ssa-dom.cc: Likewise. * tree-ssa-math-opts.cc: Likewise. * tree-ssa-phiopt.cc: Likewise. * tree-ssa-propagate.cc: Likewise. * tree-ssa-sccvn.cc: Likewise. * tree-ssa-strlen.cc: Likewise. * tree-ssa-uncprop.cc: Likewise. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2022-05-09  Come up with {,UN}LIKELY macros.  (Martin Liska)  2 files, -3/+3
gcc/c/ChangeLog: * c-parser.cc (c_parser_conditional_expression): Use {,UN}LIKELY macros. (c_parser_binary_expression): Likewise. gcc/cp/ChangeLog: * cp-gimplify.cc (cp_genericize_r): Use {,UN}LIKELY macros. * parser.cc (cp_finalize_omp_declare_simd): Likewise. (cp_finalize_oacc_routine): Likewise. gcc/ChangeLog: * system.h (LIKELY): Define. (UNLIKELY): Likewise. * domwalk.cc (sort_bbs_postorder): Use {,UN}LIKELY macros. * dse.cc (set_position_unneeded): Likewise. (set_all_positions_unneeded): Likewise. (any_positions_needed_p): Likewise. (all_positions_needed_p): Likewise. * expmed.cc (flip_storage_order): Likewise. * genmatch.cc (dt_simplify::gen_1): Likewise. * ggc-common.cc (gt_pch_save): Likewise. * print-rtl.cc: Likewise. * rtl-iter.h (T>::array_type::~array_type): Likewise. (T>::next): Likewise. * rtl-ssa/internals.inl: Likewise. * rtl-ssa/member-fns.inl: Likewise. * rtlanal.cc (T>::add_subrtxes_to_queue): Likewise. (rtx_properties::try_to_add_dest): Likewise. * rtlanal.h (growing_rtx_properties::repeat): Likewise. (vec_rtx_properties_base::~vec_rtx_properties_base): Likewise. * simplify-rtx.cc (simplify_replace_fn_rtx): Likewise. * sort.cc (likely): Likewise. (mergesort): Likewise. * wide-int.h (wi::eq_p): Likewise. (wi::ltu_p): Likewise. (wi::cmpu): Likewise. (wi::bit_and): Likewise. (wi::bit_and_not): Likewise. (wi::bit_or): Likewise. (wi::bit_or_not): Likewise. (wi::bit_xor): Likewise. (wi::add): Likewise. (wi::sub): Likewise.