path: root/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
author: Guray Ozen <guray.ozen@gmail.com> 2023-09-22 17:09:43 +0200
committer: GitHub <noreply@github.com> 2023-09-22 17:09:43 +0200
commit: 17649a7726d3ce1ddba2bbf3ef73af03ea204753 (patch)
tree: 5388987a2424614d30a521f925c3f10182054a75 /clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
parent: 7ff83ed6cda068d99ec2926216d9868754da6e79 (diff)
[MLIR][NVGPU] Introduce `nvgpu.mbarrier.group` for multiple mbarrier use (#65951)
A common practice is to create multiple `mbarrier` objects, as in the example below. This is particularly valuable in scenarios like software pipelining for GEMM, where we need to create multiple barriers and then use and wait on them dynamically in a loop.

This PR generalizes the `nvgpu.mbarrier.barrier` type into `nvgpu.mbarrier.group`. All `mbarrier`-related ops now use this type, so they can manage multiple barriers.

Having a static `num_barriers = 4` lets us place the mbarrier object(s) in static shared memory. We could make the value dynamic, but that would require dynamic shared memory and complicate the codegen.

```
%barriers = nvgpu.mbarrier.create -> !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c0], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c1], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c2], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
nvgpu.mbarrier.init %barriers[%c3], %num_threads : !nvgpu.mbarrier.group<3, num_barriers = 4>
...
scf.for %i = %c0 to %n step %c1 {
  nvgpu.mbarrier.try_wait %barriers[ (i % 4) ] ...

  // ... Do work once mbarrier is ready

  nvgpu.mbarrier.arrive.expect_tx %barriers[ ((i + 3) % 4) ] ...
}
```

The resulting mbarrier usage looks like this:

```
expect_tx[0]
expect_tx[1]
expect_tx[2]
Loop:
  try_wait mbarrier[0], expect_tx[3]
  try_wait mbarrier[1], expect_tx[0]
  try_wait mbarrier[2], expect_tx[1]
  try_wait mbarrier[3], expect_tx[2]
  ...
```
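The slot rotation in the usage pattern above can be checked with a small host-side model. This is only an illustrative sketch in Python, not part of the patch: the helper names `wait_slot`, `refill_slot`, and `schedule` are hypothetical, and the model assumes the prologue issues `expect_tx` on the first `num_barriers - 1` slots and that the in-loop `expect_tx` index is meant to be `(i + 3) % 4`, which is what the listed pattern implies.

```python
# Hypothetical host-side model of the barrier slot rotation; it does not
# touch MLIR or the GPU, it only reproduces the index arithmetic.
NUM_BARRIERS = 4  # mirrors `num_barriers = 4` in the example type


def wait_slot(i, n=NUM_BARRIERS):
    """Slot that iteration i waits on (try_wait)."""
    return i % n


def refill_slot(i, n=NUM_BARRIERS):
    """Slot that iteration i re-arms (arrive.expect_tx): n - 1 slots ahead."""
    return (i + n - 1) % n


def schedule(iterations, n=NUM_BARRIERS):
    """Return the prologue and the per-iteration (wait, refill) slot pairs."""
    # Assumed prologue: arm the first n - 1 slots before entering the loop.
    prologue = [f"expect_tx[{s}]" for s in range(n - 1)]
    loop = [(wait_slot(i, n), refill_slot(i, n)) for i in range(iterations)]
    return prologue, loop


prologue, loop = schedule(4)
print("\n".join(prologue))
print("Loop:")
for w, r in loop:
    print(f"  try_wait mbarrier[{w}], expect_tx[{r}]")
```

Note the parentheses in `refill_slot`: because `%` binds tighter than `+`, an unparenthesized `i + 3 % 4` evaluates to `i + 3`, which leaves the valid range `0..3` from the second iteration on.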
Diffstat (limited to 'clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp')
0 files changed, 0 insertions, 0 deletions