author | Guray Ozen <guray.ozen@gmail.com> | 2023-09-22 17:09:43 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-09-22 17:09:43 +0200 |
commit | 17649a7726d3ce1ddba2bbf3ef73af03ea204753 (patch) | |
tree | 5388987a2424614d30a521f925c3f10182054a75 /clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp | |
parent | 7ff83ed6cda068d99ec2926216d9868754da6e79 (diff) | |
[MLIR][NVGPU] Introduce `nvgpu.mbarrier.group` for multiple mbarrier use (#65951)
It is common to create multiple `mbarrier` objects; see the example
below. This is particularly valuable in scenarios like software
pipelining for GEMM, where we need to create multiple barriers and
dynamically use and wait on them in a loop.
This PR generalizes the `nvgpu.mbarrier.barrier` type into
`nvgpu.mbarrier.group`. All `mbarrier`-related Ops now use this type,
so they can manage multiple barriers.
Having a static `num_barriers = 4` lets us place the mbarrier object(s)
in static shared memory. We could make the value dynamic, but that would
require dynamic shared memory and would complicate the codegen.
```
%barriers = nvgpu.mbarrier.create -> !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c0], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c1], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c2], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c3], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
...
scf.for %i = %c0 to %n step %c1 {
  nvgpu.mbarrier.try_wait %barriers[ (i % 4) ] ...
  // ... Do work once mbarrier is ready
  nvgpu.mbarrier.arrive.expect_tx %barriers[ ((i + 3) % 4) ] ...
}
```
This results in mbarrier usage like the following:
```
expect_tx[0]
expect_tx[1]
expect_tx[2]
Loop:
try_wait mbarrier[0], expect_tx[3]
try_wait mbarrier[1], expect_tx[0]
try_wait mbarrier[2], expect_tx[1]
try_wait mbarrier[3], expect_tx[2]
...
```
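The rotation above can be checked with a small simulation. This is only a sketch in Python, not part of the PR: the helper name `mbarrier_schedule` and its parameters are hypothetical, and it assumes the loop waits on barrier `i % num_barriers` while arming barrier `(i + num_barriers - 1) % num_barriers`, as in the example above.

```python
def mbarrier_schedule(num_barriers, num_iterations):
    """Return (prologue, loop) barrier indices for the pipelined loop.

    Hypothetical helper for illustration only; it models index arithmetic,
    not the actual nvgpu ops.
    """
    # Assumption: num_barriers - 1 stages are armed before the loop starts.
    stages_in_flight = num_barriers - 1
    prologue = list(range(stages_in_flight))
    # Each iteration waits on one barrier and arms the one that was freed
    # by the previous iteration.
    loop = [
        (i % num_barriers, (i + stages_in_flight) % num_barriers)
        for i in range(num_iterations)
    ]
    return prologue, loop


prologue, loop = mbarrier_schedule(num_barriers=4, num_iterations=4)
for slot in prologue:
    print(f"expect_tx[{slot}]")
print("Loop:")
for wait_slot, arm_slot in loop:
    print(f"try_wait mbarrier[{wait_slot}], expect_tx[{arm_slot}]")
```

Note that the addition has to happen before the modulo, i.e. `(i + 3) % 4`; writing `i + 3 % 4` would evaluate to `i + 3` and index past the four barriers.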