author: Jeff Law <jlaw@ventanamicro.com> 2024-10-30 07:43:22 -0600
committer: Jeff Law <jlaw@ventanamicro.com> 2024-10-30 07:45:31 -0600
commit: a65e1487cda969e4763ae84577bf3e0d9e2b34aa
tree: a9b4338a85c195bf26489dc8445710490cb32dd6
parent: 673d6b2cbf610508d315526f4963793a343a2070
[RISC-V] Aggressively hoist VXRM assignments
So a while back I was looking at pixel_avg for RISC-V where we try to use vaaddu for the halfword-ceiling-average step. The problem with vaaddu is that you must set VXRM to a suitable rounding mode as it has an undetermined state at function entry or after a function call. It turns out some designs will fully flush their pipelines on a write to VXRM which you can imagine is incredibly expensive.

VXRM assignments are handled by an LCM based algorithm to find "optimal" placement points based on what insns in the stream need VXRM assignments and the particular mode they need. Unfortunately in pixel_avg an LCM algorithm only allows hoisting out of the innermost loop, but not the outer loop. The core issue is that LCM does not allow any speculation and there are paths which would bypass the inner loop (which don't actually trigger at runtime IIRC).

The expectation is that VXRM assignments should be exceedingly rare and needing more than one mode even rarer. So hoisting more aggressively seems like a reasonable thing to do, but we don't want to burn too much time trying to do something fancy.

So what this patch does is scan the IL once collecting any VXRM needs. If the current function has precisely one VXRM mode needed, then we pretend (for the sake of LCM) that the first instruction in the function also has that need. By doing so the VXRM assignment is essentially anticipated everywhere in the function. The standard LCM algorithm is run and has enough information to hoist the VXRM assignment more aggressively, most often to the prologue.

This helps the BPI in a measurable way (IIRC it was 2-3%). It probably helps some of the SiFive designs, but I've been told they still benefit from the longer sequence of shifts & adds, hoisting just isn't enough for those designs. The Ventana design basically doesn't care where the VXRM assignment is. Point is we may want to have a tuning knob for the patterns which need VXRM (vaadd[u], vasub[u]) at some point in the near future.
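
The scan-then-pretend step described above can be sketched on a toy instruction list. This is only an illustration under stated assumptions: the `Vxrm` enum, the `Insn` struct, and the functions `singleton_need` and `mode_needed` are hypothetical stand-ins, not the code or the signatures in config/riscv/riscv.cc, and the real LCM placement pass is not modelled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of the four VXRM rounding modes (rnu, rne, rdn, rod).
enum class Vxrm { RNU, RNE, RDN, ROD };

// Hypothetical instruction: either needs a specific VXRM mode or does not care.
struct Insn {
  std::optional<Vxrm> vxrm_need;  // std::nullopt means "no VXRM requirement"
};

// Scan the instruction list once. Return the mode if every instruction that
// needs VXRM needs the same one; return std::nullopt if there is no need at
// all or if two different modes are needed.
std::optional<Vxrm> singleton_need(const std::vector<Insn>& insns) {
  std::optional<Vxrm> found;
  for (const Insn& insn : insns) {
    if (!insn.vxrm_need) continue;
    if (found && *found != *insn.vxrm_need) return std::nullopt;
    found = insn.vxrm_need;
  }
  return found;
}

// The need reported to the placement pass for instruction i. When a singleton
// need exists, the first instruction is claimed to have it too, so the
// assignment looks anticipated from function entry onward.
std::optional<Vxrm> mode_needed(const std::vector<Insn>& insns, std::size_t i,
                                const std::optional<Vxrm>& singleton) {
  assert(i < insns.size());
  if (i == 0 && singleton) return singleton;
  return insns[i].vxrm_need;
}
```

In this sketch a function whose only VXRM users all want the same mode reports that mode on its first instruction, while a function with two different modes falls back to the per-instruction needs, so the placement pass behaves as it did before.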
Bootstrapped and regression tested on riscv64 and regression tested on riscv32-elf and riscv64-elf. We've been using this internally for a while on spec as well. Obviously I'll wait for the pre-commit tester to do its thing.

gcc/
	* config/riscv/riscv.cc (singleton_vxrm_need): New function.
	(riscv_mode_needed): See if there is a singleton need and if so,
	claim it happens on the first insn in the chain.
Diffstat (limited to 'gcc/tree-vectorizer.h')
0 files changed, 0 insertions, 0 deletions