author | Jeff Law <jlaw@ventanamicro.com> | 2024-10-30 07:43:22 -0600 |
---|---|---|
committer | Jeff Law <jlaw@ventanamicro.com> | 2024-10-30 07:45:31 -0600 |
commit | a65e1487cda969e4763ae84577bf3e0d9e2b34aa (patch) | |
tree | a9b4338a85c195bf26489dc8445710490cb32dd6 /gcc/tree-vectorizer.h | |
parent | 673d6b2cbf610508d315526f4963793a343a2070 (diff) | |
[RISC-V] Aggressively hoist VXRM assignments
So a while back I was looking at pixel_avg for RISC-V where we try to
use vaaddu for the halfword-ceiling-average step. The problem with
vaaddu is that you must set VXRM to a suitable rounding mode as it has
an undetermined state at function entry or after a function call.
It turns out some designs will fully flush their pipelines on a write to
VXRM, which, as you can imagine, is incredibly expensive.
VXRM assignments are handled by an LCM-based (lazy code motion) algorithm to find "optimal"
placement points based on what insns in the stream need VXRM assignments
and the particular mode they need.
Unfortunately in pixel_avg an LCM algorithm only allows hoisting out of
the innermost loop, but not the outer loop. The core issue is that LCM
does not allow any speculation and there are paths which would bypass
the inner loop (which don't actually trigger at runtime IIRC).
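For illustration only, the loop shape in question looks roughly like the sketch below. The function name, signature, and strides are assumptions made for this example and are not taken from the benchmark; the point is just the nested loops and the rounded average that maps onto vaaddu when VXRM selects round-to-nearest-up.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a pixel_avg-style kernel; not the benchmark's code.
// Each output pixel is the rounded-up average of two input pixels.
void pixel_avg_sketch(std::uint8_t *dst, const std::uint8_t *a,
                      const std::uint8_t *b, std::size_t width,
                      std::size_t height, std::size_t stride)
{
    for (std::size_t y = 0; y < height; ++y) {      // outer loop over rows
        for (std::size_t x = 0; x < width; ++x) {   // inner loop over pixels
            // (a + b + 1) >> 1 is what vaaddu computes under the
            // round-to-nearest-up setting of VXRM.
            dst[x] = static_cast<std::uint8_t>((a[x] + b[x] + 1) >> 1);
        }
        dst += stride;
        a += stride;
        b += stride;
    }
}
```

If width is zero, the inner loop body never runs, so there is a path through the outer loop that needs no VXRM value at all. A placement that refuses to speculate cannot put the assignment ahead of the outer loop in that situation.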
The expectation is that VXRM assignments should be exceedingly rare and
needing more than one mode even rarer. So hoisting more aggressively
seems like a reasonable thing to do, but we don't want to burn too much
time trying to do something fancy.
So what this patch does is scan the IL once collecting any VXRM needs.
If the current function has precisely one VXRM mode needed, then we
pretend (for the sake of LCM) that the first instruction in the function
also has that need.
By doing so the VXRM assignment is essentially anticipated everywhere in
the function. The standard LCM algorithm is run and has enough
information to hoist the VXRM assignment more aggressively, most often
to the prologue.
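The idea can be sketched with a toy model. This is not the code in riscv.cc: the types and the names ToyInsn, singleton_need and reported_need are hypothetical, and the sketch ignores details such as VXRM becoming undetermined after a call. It assumes C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy model only: these types are hypothetical, not GCC's RTL or target hooks.
enum class Vxrm { RNU, RNE, RDN, ROD };

struct ToyInsn {
    std::optional<Vxrm> need;   // rounding mode this insn requires, if any
};

// Scan the function once. Return the mode if every insn that needs VXRM
// needs the same one; return nullopt if no mode or more than one is needed.
std::optional<Vxrm> singleton_need(const std::vector<ToyInsn> &insns)
{
    std::optional<Vxrm> seen;
    for (const ToyInsn &insn : insns) {
        if (!insn.need)
            continue;
        if (seen && *seen != *insn.need)
            return std::nullopt;   // two different modes: no singleton
        seen = insn.need;
    }
    return seen;
}

// Need reported to the placement algorithm for insn i. The first insn is
// treated as needing the singleton mode, if there is one, so the need is
// anticipated from the start of the function.
std::optional<Vxrm> reported_need(const std::vector<ToyInsn> &insns,
                                  std::size_t i)
{
    if (i == 0)
        if (auto mode = singleton_need(insns))
            return mode;
    return insns[i].need;
}
```

When two different modes appear, the sketch reports only each insn's own need, which corresponds to leaving the standard LCM placement untouched.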
This helps the BPI in a measurable way (IIRC it was 2-3%). It probably
helps some of the SiFive designs, but I've been told they still benefit
from the longer sequence of shifts & adds; hoisting just isn't enough
for those designs. The Ventana design basically doesn't care where the
VXRM assignment is. Point is we may want to have a tuning knob for the
patterns which need VXRM (vaadd[u], vasub[u]) at some point in the near
future.
Bootstrapped and regression tested on riscv64 and regression tested on
riscv32-elf and riscv64-elf. We've been using this internally for a
while on spec as well. Obviously I'll wait for the pre-commit
tester to do its thing.
gcc/
* config/riscv/riscv.cc (singleton_vxrm_need): New function.
(riscv_mode_needed): See if there is a singleton need and if so,
claim it happens on the first insn in the chain.