path: root/gcc/expr.cc
author: Richard Sandiford <richard.sandiford@arm.com> 2024-11-18 19:32:51 +0000
committer: Richard Sandiford <richard.sandiford@arm.com> 2024-11-18 19:32:51 +0000
commit: 8633bdb346be748bb4dcd774ab63a01378e6af48
tree: 5c6afa8faa0a9674a23d0d917d9b645c380a2804
parent: 279475fd7236a9e4ed1ecb634f82a2bc7c895cc8
aarch64: Improve early-ra handling of reductions

At the moment, early-ra ducks out of allocating any region that
contains a register with both a strong FPR affinity and a strong
GPR affinity.  The proper allocators are much better at handling
that situation.  But this means that early-ra tends not to allocate
a region of vector code that ends in a reduction to a scalar integer
if any later arithmetic is done on the scalar integer result.

Currently, if a block acts as an isolated allocation region, the pass
will try to split the block into subregions *between* instructions if
there are no live FPRs or FPR allocnos.  In the reduction case described
above, it's convenient to try the same thing *within* instructions.
If a block of vector code ends in a reduction, all FPRs and FPR allocnos
will be dead between the "use phase" and the "def phase" of the
reduction: the vector input will then have died, but the scalar result
will not yet have been born.

If we split the block that way, the problematic reduction result will be
part of the second region, which we can skip allocating, but the vector
work will be part of a separate region, which we might be able to
allocate.

This avoids a MOV in the testcase and also helps a small amount with
x264.

gcc/
	* config/aarch64/aarch64-early-ra.cc (early_ra::IGNORE_REG): New flag.
	(early_ra::fpr_preference): Handle it.
	(early_ra::record_constraints): Fail the allocation if an
	IGNORE_REG output operand is not independent of the inputs.
	(defines_multi_def_pseudo): New function.
	(early_ra::could_split_region_here): New member function, split
	out from...
	(early_ra::process_block): ...here.  Try splitting a block into
	multiple regions between the definition and use phases of an
	instruction.  Set IGNORE_REG on the output registers if we do so.

gcc/testsuite/
	* gcc.target/aarch64/early_ra_1.c: New test.
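
The splitting idea described above can be illustrated with a small standalone toy model. This is only a sketch under stated assumptions, not the code in aarch64-early-ra.cc: the `Insn`, `SplitPoint`, `find_split_points` and `example_block` names are hypothetical, FPR values are plain strings, the block is assumed to have no FPRs live on exit, and the extra conditions the real pass applies (such as the IGNORE_REG independence check) are not modelled. The walk is a backward liveness scan that reports points where no FPR value is live, either between two instructions or between the use phase and the def phase of a single instruction.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: the FPR values it reads and writes.
struct Insn {
  std::vector<std::string> fpr_uses;
  std::vector<std::string> fpr_defs;
};

enum class SplitKind {
  BetweenInsns,  // immediately before instruction `insn`
  WithinInsn     // between the use phase and def phase of `insn`
};

struct SplitPoint {
  std::size_t insn;
  SplitKind kind;
};

// Walk the block backwards, tracking which FPR values are live, and
// record every point at which the live set is empty.  Assumes no FPR
// value is live on exit from the block.
std::vector<SplitPoint> find_split_points(const std::vector<Insn>& block) {
  std::vector<SplitPoint> points;
  std::set<std::string> live;
  for (std::size_t i = block.size(); i-- > 0;) {
    const Insn& insn = block[i];
    const bool live_after = !live.empty();
    // Def phase: above this point the results have not yet been born.
    for (const auto& d : insn.fpr_defs) live.erase(d);
    const bool live_mid = !live.empty();
    // Use phase: above this point the inputs are live.
    for (const auto& u : insn.fpr_uses) live.insert(u);
    const bool live_before = !live.empty();
    // Nothing is live in the middle of the instruction, although FPR
    // values are live on both sides: inputs died, result not yet born.
    if (!live_mid && live_after && live_before)
      points.push_back({i, SplitKind::WithinInsn});
    // Nothing is live at the boundary before this instruction.
    if (!live_before && i > 0)
      points.push_back({i, SplitKind::BetweenInsns});
  }
  std::reverse(points.begin(), points.end());
  return points;
}

// Hypothetical block: vector work feeding a reduction-like operation
// whose scalar result "s" is consumed by later code.
std::vector<Insn> example_block() {
  return {
      {{}, {"v0"}},           // 0: produce vector v0
      {{}, {"v1"}},           // 1: produce vector v1
      {{"v0"}, {"v2"}},       // 2: v1 stays live across this instruction
      {{"v1", "v2"}, {"s"}},  // 3: reduction-like op, last use of vectors
      {{"s"}, {}},            // 4: consumer of the scalar result
      {{}, {}},               // 5: code that touches no FPR values
  };
}
```

For `example_block()`, the only within-instruction point is at instruction 3, where the vector inputs have died and `s` has not yet been born; instruction 2 does not qualify because `v1` is live across it. The boundary before instruction 5 is an ordinary between-instruction point.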
Diffstat (limited to 'gcc/expr.cc')
0 files changed, 0 insertions, 0 deletions