diff options
author | Anchu Rajendran S <asudhaku@amd.com> | 2025-08-07 14:58:11 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-08-07 14:58:11 -0700 |
commit | 49ccf46adc455b64c2be0006092651182b1cb2c4 (patch) | |
tree | ff0aa116f5c8cef43a014e5eb057cb9e3787f7be /llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | |
parent | 9faac938e1b03ea23e7212550860f8b8001757e1 (diff) | |
download | llvm-49ccf46adc455b64c2be0006092651182b1cb2c4.zip llvm-49ccf46adc455b64c2be0006092651182b1cb2c4.tar.gz llvm-49ccf46adc455b64c2be0006092651182b1cb2c4.tar.bz2 |
[OpenMP] [IR Builder] Changes to Support Scan Operation (#136035)
Scan reductions are supported in OpenMP with the help of scan directive.
Reduction clause of the for loop/simd directive can take an `inscan`
modifier along with the body of the directive specifying a `scan`
directive. This PR implements the lowering logic for scan reductions in
workshare loops of OpenMP.
The body of the for loop is split into two loops (Input phase loop and
Scan Phase loop) and a scan reduction loop is added in the middle. The
Input phase loop populates a temporary buffer with initial values that
are to be reduced. The buffer is used by the reduction loop to perform
scan reduction. Scan phase loop copies the values of the buffer to the
reduction variable before executing the scan phase. Below is a high
level view of the code generated.
```
<declare pointer to buffer> ptr
omp parallel {
size num_iters = <num_iters>
// temp buffer allocation
omp masked {
buff = malloc(num_iters*scanvarstype)
*ptr = buff
}
barrier;
// input phase loop
for (i: 0..<num_iters>) {
<input phase>;
buffer = *ptr;
buffer[i] = red;
}
// scan reduction
omp masked
{
for (int k = 0; k != ceil(log2(num_iters)); ++k) {
i=pow(2,k)
for (size cnt = last_iter; cnt >= i; --cnt) {
buffer = *ptr;
buffer[cnt] op= buffer[cnt-i];
}
}
}
barrier;
// scan phase loop
for (0..<num_iters>) {
buffer = *ptr;
red = buffer[i] ;
<scan phase>;
}
// temp buffer deletion
omp masked {
free(*ptr)
}
barrier;
}
```
The temporary buffer needs to be shared between all threads performing
reduction since it is read/written in Input and Scan workshare Loops.
This is achieved by declaring a pointer to the buffer in the shared
region and dynamically allocating the buffer by the master thread.
This is the reason why allocation, deallocation and scan reduction are
performed within `masked`. The code is verified to produce correct
results for Fortran programs with the code changes in the PR
https://github.com/llvm/llvm-project/pull/133149
Diffstat (limited to 'llvm/lib/Bitcode/Writer/BitcodeWriter.cpp')
0 files changed, 0 insertions, 0 deletions