author Anchu Rajendran S <asudhaku@amd.com> 2025-08-07 14:58:11 -0700
committer GitHub <noreply@github.com> 2025-08-07 14:58:11 -0700
commit 49ccf46adc455b64c2be0006092651182b1cb2c4
tree ff0aa116f5c8cef43a014e5eb057cb9e3787f7be /llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
parent 9faac938e1b03ea23e7212550860f8b8001757e1
[OpenMP] [IR Builder] Changes to Support Scan Operation (#136035)

OpenMP supports scan reductions through the `scan` directive. The reduction clause of a for loop or simd directive can take an `inscan` modifier, in which case the body of the directive contains a `scan` directive. This PR implements the lowering logic for scan reductions in OpenMP workshare loops.

The body of the for loop is split into two loops (an input phase loop and a scan phase loop), and a scan reduction loop is added between them. The input phase loop populates a temporary buffer with the initial values to be reduced. The reduction loop then performs the scan reduction on that buffer. The scan phase loop copies each buffer value to the reduction variable before executing the scan phase.

Below is a high-level view of the generated code.

```
<declare pointer to buffer> ptr
omp parallel {
  size num_iters = <num_iters>
  // temp buffer allocation
  omp masked {
    buff = malloc(num_iters*scanvarstype)
    *ptr = buff
  }
  barrier;
  // input phase loop
  for (i: 0..<num_iters>) {
    <input phase>;
    buffer = *ptr;
    buffer[i] = red;
  }
  // scan reduction
  omp masked {
    for (int k = 0; k != ceil(log2(num_iters)); ++k) {
      i=pow(2,k)
      for (size cnt = last_iter; cnt >= i; --cnt) {
        buffer = *ptr;
        buffer[cnt] op= buffer[cnt-i];
      }
    }
  }
  barrier;
  // scan phase loop
  for (i: 0..<num_iters>) {
    buffer = *ptr;
    red = buffer[i];
    <scan phase>;
  }
  // temp buffer deletion
  omp masked {
    free(*ptr)
  }
  barrier;
}
```

The temporary buffer must be shared between all threads performing the reduction, since it is read and written in both the input and scan workshare loops. This is achieved by declaring a pointer to the buffer in the shared region and having the master thread allocate the buffer dynamically. This is why allocation, deallocation and the scan reduction are performed within `masked`.

The code is verified to produce correct results for Fortran programs with the code changes in the PR https://github.com/llvm/llvm-project/pull/133149
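
To make the three phases concrete, here is a minimal serial sketch in C++ of the same data flow for an inclusive `+` scan. It is a model of the pseudocode above, not the code the IR builder emits: the function name `scan_phases`, the `std::vector` buffer, and the choice of `+` as the reduction operator are assumptions made for illustration, and the parallel region, `masked` blocks and barriers are omitted. The outer loop condition `stride < num_iters` is an equivalent way of writing `k != ceil(log2(num_iters))` with `stride = pow(2,k)`.

```cpp
#include <cstddef>
#include <vector>

// Serial model of the input phase, scan reduction and scan phase.
// All names here are illustrative and do not come from the patch.
std::vector<int> scan_phases(const std::vector<int> &input) {
  const std::size_t num_iters = input.size();
  std::vector<int> buffer(num_iters); // stands in for the shared temp buffer
  std::vector<int> out(num_iters);

  // Input phase loop: iteration i stores its contribution in buffer[i].
  for (std::size_t i = 0; i < num_iters; ++i) {
    int red = input[i]; // <input phase> would compute red
    buffer[i] = red;
  }

  // Scan reduction: passes with stride 1, 2, 4, ... while stride < num_iters.
  // cnt walks downward so that buffer[cnt - stride] still holds the value
  // from the previous pass when it is read (the update is done in place).
  for (std::size_t stride = 1; stride < num_iters; stride *= 2) {
    for (std::size_t cnt = num_iters - 1; cnt >= stride; --cnt) {
      buffer[cnt] += buffer[cnt - stride];
    }
  }

  // Scan phase loop: iteration i reads back the scanned value.
  for (std::size_t i = 0; i < num_iters; ++i) {
    int red = buffer[i];
    out[i] = red; // <scan phase> would consume red
  }
  return out;
}
```

For the input `{1, 2, 3, 4, 5}` the buffer becomes `{1, 3, 5, 7, 9}` after the stride-1 pass, `{1, 3, 6, 10, 14}` after the stride-2 pass, and `{1, 3, 6, 10, 15}` after the stride-4 pass, which is the inclusive prefix sum.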