author: Christopher Bate <cbate@nvidia.com> 2022-06-07 10:51:27 -0600
committer: Christopher Bate <cbate@nvidia.com> 2022-06-17 09:31:05 -0600
commit: 51b925df941a66349deff2467203acc200de5e78
tree: 8cf2a9efd259a78cba465a138b66d2f970b577bc /lldb/source/Plugins/ScriptInterpreter/Python/SWIGPythonBridge.cpp
parent: 90f96ec7a52e840cdc65035cb3beca620032be69
[mlir][nvgpu] shared memory access optimization pass

This change adds a transformation and pass to the NvGPU dialect that attempts to optimize reads/writes from a memref representing GPU shared memory in order to avoid bank conflicts. Given a value representing a shared memory memref, it traverses all reads/writes within the parent op and, subject to suitable conditions, rewrites all last dimension index values such that element locations in the final (col) dimension are given by

`newColIdx = col % vecSize + perm[row](col/vecSize,row)`

where `perm` is a permutation function indexed by `row` and `vecSize` is the vector access size in elements (currently assumes 128bit vectorized accesses, but this can be made a parameter). This specific transformation can help optimize typical distributed & vectorized accesses common to loading matrix multiplication operands to/from shared memory.

Differential Revision: https://reviews.llvm.org/D127457
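
As an illustration of the index rewrite described in the commit message, here is a minimal Python sketch. The commit message does not say which permutation the pass uses, so the XOR-based `perm` below is an assumption chosen for illustration, as are the names `perm`, `permuted_col`, and `row_width`. The sketch also assumes that `perm` returns an element offset (a permuted vector index scaled by `vecSize`) and that the number of vectors per row is a power of two.

```python
def perm(vec_idx, row, vec_size, num_vecs):
    """Hypothetical permutation: XOR the vector index with the row.

    Assumes num_vecs is a power of two, so the XOR result stays in
    [0, num_vecs). Returns an element offset, not a vector index.
    """
    return (vec_idx ^ (row % num_vecs)) * vec_size


def permuted_col(row, col, vec_size, row_width):
    """newColIdx = col % vecSize + perm[row](col / vecSize, row)."""
    num_vecs = row_width // vec_size
    return col % vec_size + perm(col // vec_size, row, vec_size, num_vecs)


# 128-bit accesses of 32-bit elements: vec_size = 4 elements.
# A row of 32 elements spans 128 bytes of shared memory.
VEC_SIZE, ROW_WIDTH = 4, 32

# Eight rows all reading the vector that starts at column 0 would
# otherwise land on the same column offset; after the rewrite each
# row is directed to a different vector slot.
starts = [permuted_col(r, 0, VEC_SIZE, ROW_WIDTH) for r in range(8)]
print(starts)
```

Within a row the rewrite only reorders whole vectors, so every element still has a unique location and the offset inside a vector (`col % vecSize`) is unchanged. Whether this removes bank conflicts in practice depends on the element type, row width, and access pattern.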
Diffstat (limited to 'lldb/source/Plugins/ScriptInterpreter/Python/SWIGPythonBridge.cpp')
0 files changed, 0 insertions, 0 deletions