path: root/lldb/source/Plugins/ScriptInterpreter/Python/SWIGPythonBridge.h
author: Philip Reames <preames@rivosinc.com> 2025-02-11 14:40:36 -0800
committer: GitHub <noreply@github.com> 2025-02-11 14:40:36 -0800
commit: 8374d421861cd3d47e21ae7889ba0b4c498e8d85
tree: 74b3aa4f47354e8188b67ad05f64225f8b076cf9 /lldb/source/Plugins/ScriptInterpreter/Python/SWIGPythonBridge.h
parent: a760e7faacb79e7ff0ae23d3ae370d1aa6e52666
[RISCV] Decompose single source shuffles (without exact VLEN) (#126108)
This is a continuation of the work started in #125735 to lower selected VLA shuffles in linear m1 components instead of generating O(LMUL^2) or O(LMUL*Log2(LMUL)) high LMUL shuffles. This pattern focuses on shuffles where all the elements being used across the entire destination register group come from a single register in the source register group. Such cases come up fairly frequently via e.g. spread(N) and repeat(N) idioms.

One subtlety to this patch is the handling of the index vector for vrgatherei16.vv. Because the index and source registers can have different EEW, the index vector for the Nth chunk of the destination is not guaranteed to be register aligned. In fact, it is common for e.g. an EEW=64 shuffle to have EEW=16 indices, which pack four chunks' worth of indices into each index register. Given this, we have to pay a cost for extracting these chunks into the low position before performing each shuffle.

I'd initially expressed this as a naive extract sub-vector for each data parallel piece. However, at high LMUL, this quickly caused register pressure problems since we could at worst need 4x the temporary registers for the index. Instead, this patch uses a repeating slidedown chained from previous iterations. This increases critical path by at worst 3 slides (SEW=64 is the worst case), but reduces register pressure to at worst 2x - and only if the original index vector is reused elsewhere. I view this as arguably a bit of a workaround (since our scheduling should have done better with the plain extract variant), but probably a necessary one.
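
The chained-slidedown scheme described above can be illustrated with a small Python model. This is only a sketch of the idea, not the code in the patch: the helper names (`slidedown`, `gather_m1`, `decompose_single_source_shuffle`) are hypothetical, registers are modeled as plain lists, and undefined mask elements are not handled. It assumes the mask length is a multiple of the number of elements per m1 register and that indices are stored at EEW=16, so one index register holds `SEW/16` chunks of indices.

```python
def slidedown(reg, offset):
    """Model a slidedown: result[i] = reg[i + offset]; vacated tail is zero-filled."""
    return reg[offset:] + [0] * offset


def gather_m1(src_reg, idx_reg, n):
    """Model one m1 gather: the low n indices of idx_reg select from src_reg."""
    return [src_reg[idx_reg[i]] for i in range(n)]


def decompose_single_source_shuffle(src_group, mask, sew, index_eew=16):
    """Lower a shuffle over a register group into one m1 gather per destination
    register, provided every mask element reads the same source register.
    Returns the destination register group, or None if the pattern does not apply."""
    n = len(src_group[0])      # data elements per m1 register
    ratio = sew // index_eew   # destination chunks served by one index register

    # The pattern only applies when a single source register is referenced.
    src_regs = {m // n for m in mask}
    if len(src_regs) != 1:
        return None
    src = src_group[src_regs.pop()]
    local = [m % n for m in mask]  # indices relative to that one register

    # Pack the narrow indices: each index register holds n * ratio of them.
    per_idx_reg = n * ratio
    idx_regs = []
    for i in range(0, len(local), per_idx_reg):
        reg = local[i:i + per_idx_reg]
        idx_regs.append(reg + [0] * (per_idx_reg - len(reg)))

    dest = []
    cur = None
    for chunk in range(len(mask) // n):
        if chunk % ratio == 0:
            cur = idx_regs[chunk // ratio]  # chunk starts register aligned
        else:
            cur = slidedown(cur, n)         # chain from the previous iteration
        dest.append(gather_m1(src, cur, n))
    return dest


# Toy configuration: 2 elements per register (e.g. VLEN=128, SEW=64), LMUL=4.
group = [[10, 11], [20, 21], [30, 31], [40, 41]]
result = decompose_single_source_shuffle(group, [2, 2, 2, 2, 3, 3, 3, 3], sew=64)
```

In this model only one live temporary (`cur`) is carried between iterations, at the cost of up to `ratio - 1` dependent slides per index register, which is three when SEW=64.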
Diffstat (limited to 'lldb/source/Plugins/ScriptInterpreter/Python/SWIGPythonBridge.h')
0 files changed, 0 insertions, 0 deletions