author: Richard Sandiford <richard.sandiford@arm.com> 2023-12-14 13:46:16 +0000
committer: Richard Sandiford <richard.sandiford@arm.com> 2023-12-14 13:46:16 +0000
commit: 2f46e3578d45ff060a0a329cb39d4f52878f9d5a
tree: 80f5a1c75e3d495e7c73d7de38824628df55e1ea
parent: e1e71b4e0681974b3db41afa7fc18720a30d6848
aarch64: Improve handling of accumulators in early-ra

Being very simplistic, early-ra just models an allocno's live range
as a single interval. This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in
SSA form, each intermediate result will be a separate SSA name and
will remain separate from the accumulator even after out-of-ssa.
This means that in something like:

  for (;;)
    {
      x = x + ...;
      x = x + ...;
    }

the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.

A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context. This patch
instead looks for sharability at a more local level, as described
in the comments. It's a bit hackish, but hopefully not too much.

The patch also contains some small tweaks that are needed to make
the new and existing tests pass:

- fix a case where a pseudo that was only moved was wrongly treated
  as not an FPR candidate

- fix some bookkeeping related to is_strong_copy_src

- use the number of FPR preferences as a tiebreaker when sorting colors

I fully expect that we'll need to be more aggressive at skipping
the early-ra allocation. For example, it probably makes sense to
refuse any allocation that involves an FPR move. But I'd like to
keep collecting examples of where things go wrong first, so that
hopefully we can improve the cases with strided registers or
structures.

gcc/
	* config/aarch64/aarch64-early-ra.cc
	(allocno_info::is_equiv): New member variable.
	(allocno_info::equiv_allocno): Replace with...
	(allocno_info::related_allocno): ...this member variable.
	(allocno_info::chain_prev): Put into an enum with...
	(allocno_info::last_use_point): ...this new member variable.
	(color_info::num_fpr_preferences): New member variable.
	(early_ra::m_shared_allocnos): Likewise.
	(allocno_info::is_shared): New member function.
	(allocno_info::is_equiv_to): Likewise.
	(early_ra::dump_allocnos): Dump sharing information. Tweak column
	widths.
	(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
	(early_ra::start_new_region): Handle m_shared_allocnos.
	(early_ra::create_allocno_group): Set related_allocno rather than
	equiv_allocno.
	(early_ra::record_allocno_use): Likewise. Detect multiple calls
	for the same program point. Update last_use_point and is_equiv.
	Clear is_strong_copy_src rather than is_strong_copy_dest.
	(early_ra::record_allocno_def): Use related_allocno rather than
	equiv_allocno. Update last_use_point.
	(early_ra::valid_equivalence_p): Replace with...
	(early_ra::find_related_start): ...this new function.
	(early_ra::record_copy): Look for cases where a destination copy chain
	can be shared with the source allocno.
	(early_ra::find_strided_accesses): Update for
	equiv_allocno->related_allocno change. Only call
	consider_strong_copy_src_chain at the head of a copy chain.
	(early_ra::is_chain_candidate): Skip shared allocnos. Update for
	new representation of equivalent allocnos.
	(early_ra::chain_allocnos): Update for new representation of
	equivalent allocnos.
	(early_ra::try_to_chain_allocnos): Likewise.
	(early_ra::merge_fpr_info): New function, split out from...
	(early_ra::set_single_color_rep): ...here.
	(early_ra::form_chains): Handle shared allocnos.
	(early_ra::process_copies): Count the number of FPR preferences.
	(early_ra::cmp_decreasing_size): Rename to...
	(early_ra::cmp_allocation_order): ...this. Sort equal-sized groups
	by the number of FPR preferences.
	(early_ra::finalize_allocation): Handle shared allocnos.
	(early_ra::process_region): Reset chain_prev as well as chain_next.

gcc/testsuite/
	* gcc.target/aarch64/sve/accumulators_1.c: New test.
	* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
	be in any order.
	* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.
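
As an illustration of the live-range problem described in the message above, here is a minimal standalone sketch. It is not code from the patch or from aarch64-early-ra.cc: the names Segment, LiveRange, hull, conflict_single_interval and conflict_segmented are hypothetical, and the program-point numbering is an assumption made for the example. It models the loop-carried x and one intermediate result t, and shows that a single covering interval reports a conflict where segmented ranges would not. The patch itself deliberately avoids general segmented ranges and looks for sharing more locally.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Half-open range of program points [start, end). Hypothetical type,
// used only for this illustration.
struct Segment {
  int start;
  int end;
};

using LiveRange = std::vector<Segment>;

bool overlaps(const Segment &a, const Segment &b) {
  return a.start < b.end && b.start < a.end;
}

// Collapse a segmented live range into the single interval covering it,
// which is the simplistic model the commit message describes.
Segment hull(const LiveRange &r) {
  assert(!r.empty());
  Segment h = r.front();
  for (const Segment &s : r) {
    h.start = std::min(h.start, s.start);
    h.end = std::max(h.end, s.end);
  }
  return h;
}

// Conflict test when each live range is modelled as one interval.
bool conflict_single_interval(const LiveRange &a, const LiveRange &b) {
  return overlaps(hull(a), hull(b));
}

// Conflict test when the individual segments are kept.
bool conflict_segmented(const LiveRange &a, const LiveRange &b) {
  for (const Segment &sa : a)
    for (const Segment &sb : b)
      if (overlaps(sa, sb))
        return true;
  return false;
}

// Assumed loop body, program points 0..3:
//   1: t = x + ...   (last use of x in this iteration, defines t)
//   2: x = t + ...   (last use of t, redefines the loop-carried x)
// x is live on entry up to point 1, and again from point 2 to the back edge.
LiveRange loop_carried_x() { return {{0, 1}, {2, 3}}; }
LiveRange intermediate_t() { return {{1, 2}}; }
```

Under these assumptions, conflict_single_interval reports that x and t conflict, so they could not share a register, while conflict_segmented reports no conflict because t lives exactly in the gap between the two segments of x.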