path: root/flang/lib/Lower/OpenACC.cpp
Age  Commit message  Author  Files  Lines
9 days[flang][acc] Fix the indexing of the reduction combiner for multidimensional ↵khaki31-6/+8
static arrays (#155536)
In the following example of reducing a static 2D array, we have incorrect coordinates for array access in the reduction combiner. This PR reverses the order of the induction variables used for such array indexing. For other cases of static arrays, we reverse the loop order as well so that the innermost loop can handle the innermost dimension.
```Fortran
program main
  implicit none
  integer, parameter :: m = 2
  integer, parameter :: n = 10
  integer :: r(n,m), i
  r = 0
  !$acc parallel loop reduction(+:r(:n,:m))
  do i = 1, n
    r(i, 1) = i
  enddo
  print *, r
end program main
```
Currently, we have:
```mlir
fir.do_loop %arg2 = %c0 to %c1 step %c1 {
  fir.do_loop %arg3 = %c0 to %c9 step %c1 {
    %0 = fir.coordinate_of %arg0, %arg2, %arg3 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>
    %1 = fir.coordinate_of %arg1, %arg2, %arg3 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>
```
We'll obtain:
```mlir
fir.do_loop %arg2 = %c0 to %c1 step %c1 {
  fir.do_loop %arg3 = %c0 to %c9 step %c1 {
    %0 = fir.coordinate_of %arg0, %arg3, %arg2 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>
    %1 = fir.coordinate_of %arg1, %arg3, %arg2 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>
```
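The effect of swapping the coordinates can be checked with a small model. The sketch below is only an illustration, not flang code: it assumes Fortran column-major storage and coordinates listed in the same order as the extents in `!fir.array<10x2xi32>`. The helper names `column_major_offset` and `combiner_offsets` are hypothetical.

```python
def column_major_offset(indices, extents):
    """Linear element offset for column-major storage.

    indices and extents are listed first dimension first.
    Raises IndexError if an index falls outside its extent.
    """
    offset, stride = 0, 1
    for idx, ext in zip(indices, extents):
        if not 0 <= idx < ext:
            raise IndexError(f"index {idx} out of range for extent {ext}")
        offset += idx * stride
        stride *= ext
    return offset


def combiner_offsets(extents, swap_coordinates):
    """Walk a 2D array with the outer loop over the last dimension."""
    n, m = extents
    offsets = []
    for outer in range(m):        # plays the role of %arg2 (0 .. m-1)
        for inner in range(n):    # plays the role of %arg3 (0 .. n-1)
            # swap_coordinates=True mimics the old (%arg2, %arg3) order.
            coords = (outer, inner) if swap_coordinates else (inner, outer)
            offsets.append(column_major_offset(coords, extents))
    return offsets


# With the corrected order, the inner loop walks contiguous elements.
fixed = combiner_offsets((10, 2), swap_coordinates=False)
```

In this model the corrected order visits offsets 0 through 19 in sequence, while the old order pairs the 0..9 induction variable with the extent-2 dimension and fails the bounds check.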
2025-08-03[NFC] Fix `assignment` typo. (#151864)Connector Switch1-1/+1
2025-07-29[flang][acc] Lower do and do concurrent loops specially in acc regions (#149614)Razvan Lupusoru1-116/+279
When OpenACC is enabled and Fortran loops are annotated with `acc loop`, they are lowered to the `acc.loop` operation, and the rest of the contained loops use the normal FIR lowering path. However, the OpenACC specification has special provisions related to contained loops and their induction variables. To adhere to these, we convert all valid contained loops to `acc.loop` so that this information is stored appropriately.
The provisions in the spec that motivated this change (line numbers are from OpenACC 3.4):
- 1353 Loop variables in Fortran do statements within a compute construct are predetermined to be private to the thread that executes the loop.
- 3783 When do concurrent appears without a loop construct in a kernels construct it is treated as if it is annotated with loop auto. If it appears in a parallel construct or an accelerator routine then it is treated as if it is annotated with loop independent.
By valid loops, we mean do loops and do concurrent loops that have an induction variable. Unstructured loops are not handled.
2025-07-21[mlir][NFC] update `flang/Lower` create APIs (8/n) (#149912)Maksim Levental1-195/+203
See https://github.com/llvm/llvm-project/pull/147168 for more info.
2025-07-17[openacc][flang] Support two type bindName representation in acc routine ↵delaram-talaashrafi1-31/+71
(#149147) Based on the OpenACC specification — which states that if the bind name is given as an identifier it should be resolved according to the compiled language, and if given as a string it should be used unmodified — we introduce two distinct `bindName` representations for `acc routine` to handle each case appropriately: one as an array of `SymbolRefAttr` for identifiers and another as an array of `StringAttr` for strings. To ensure correct correspondence between bind names and devices, this patch also introduces two separate sets of device attributes. The routine operation is extended accordingly, along with the necessary updates to the OpenACC dialect and its lowering.
2025-07-17[flang][acc] Create UseDeviceOp for both results of hlfir.declare (#148017)nvptm1-1/+19
A sample such as
```
program test
  integer :: N = 100
  real*8 :: b(-1:N)
  !$acc data copy(b)
  !$acc host_data use_device(b)
  call vadd(b)
  !$acc end host_data
  !$acc end data
end
```
is lowered to
```
%13:2 = hlfir.declare %11(%12) {uniq_name = "_QFEb"} : (!fir.ref<!fir.array<?xf64>>, !fir.shapeshift<1>) -> (!fir.box<!fir.array<?xf64>>, !fir.ref<!fir.array<?xf64>>)
%14 = acc.copyin var(%13#0 : !fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>> {dataClause = #acc<data_clause acc_copy>, name = "b"}
acc.data dataOperands(%14 : !fir.box<!fir.array<?xf64>>) {
  %15 = acc.use_device var(%13#0 : !fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>> {name = "b"}
  acc.host_data dataOperands(%15 : !fir.box<!fir.array<?xf64>>) {
    fir.call @_QPvadd(%13#1) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>) -> ()
    acc.terminator
  }
  acc.terminator
}
acc.copyout accVar(%14 : !fir.box<!fir.array<?xf64>>) to var(%13#0 : !fir.box<!fir.array<?xf64>>) {dataClause = #acc<data_clause acc_copy>, name = "b"}
```
Note that while the use_device clause is applied to %13#0, the argument passed to vadd is %13#1.
To avoid problems later in lowering, this change additionally applies the use_device clause to %13#1, so that the resulting MLIR is
```
%13:2 = hlfir.declare %11(%12) {uniq_name = "_QFEb"} : (!fir.ref<!fir.array<?xf64>>, !fir.shapeshift<1>) -> (!fir.box<!fir.array<?xf64>>, !fir.ref<!fir.array<?xf64>>)
%14 = acc.copyin var(%13#0 : !fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>> {dataClause = #acc<data_clause acc_copy>, name = "b"}
acc.data dataOperands(%14 : !fir.box<!fir.array<?xf64>>) {
  %15 = acc.use_device var(%13#0 : !fir.box<!fir.array<?xf64>>) -> !fir.box<!fir.array<?xf64>> {name = "b"}
  %16 = acc.use_device varPtr(%13#1 : !fir.ref<!fir.array<?xf64>>) -> !fir.ref<!fir.array<?xf64>> {name = "b"}
  acc.host_data dataOperands(%15, %16 : !fir.box<!fir.array<?xf64>>, !fir.ref<!fir.array<?xf64>>) {
    fir.call @_QPvadd(%13#1) fastmath<contract> : (!fir.ref<!fir.array<?xf64>>) -> ()
    acc.terminator
  }
  acc.terminator
}
acc.copyout accVar(%14 : !fir.box<!fir.array<?xf64>>) to var(%13#0 : !fir.box<!fir.array<?xf64>>) {dataClause = #acc<data_clause acc_copy>, name = "b"}
```
2025-07-14[flang][acc] Implement MappableType's generatePrivateInit (#148302)Razvan Lupusoru1-116/+23
The recipe body generation was moved from lowering into FIR's implementation of the MappableType API. Since all Fortran variable types now implement this interface, OpenACC lowering was updated to use the API directly. No test changes were needed: all of the private, firstprivate, and recipe tests get the same body as before.
2025-07-10[flang][acc] Update FIR ref, heap, and pointer to be MappableType (#147834)Razvan Lupusoru1-7/+6
The MappableType OpenACC type interface is a richer interface that lets the OpenACC dialect interact better with a source dialect, FIR in this case. The fir.box and fir.class types already implemented this interface. Now the same is being done for the other FIR types that represent variables. One additional notable change is that fir.array no longer implements this interface. This is because MappableType is primarily intended for variables, and FIR variables of this type have associated storage, so there is a pointer-like type (fir.ref/heap/pointer) that holds the array type. The end goal of promoting these FIR types to MappableType is that we will soon implement the ability to generate recipes outside of the frontend via this interface.
2025-07-09[mlir][acc][flang] Lower nested ACC loops with tile clause as collapsed ↵Vijay Kandiah1-6/+17
loops (#147801) In the case of nested loops, `acc.loop` is meant to subsume all of the loops that it applies to (when explicitly described as doing so in the OpenACC specification). So when an `acc loop tile(...)` is present on nested Fortran DO loops, `acc.loop` should apply to the `n` loops that `tile` applies to. This change lowers such nested Fortran loops with a tile clause into a collapsed `acc.loop` with `n` IVs, loop bounds, and steps, in a similar fashion to the current lowering for acc loops with a `collapse` clause.
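As a rough picture of what "collapsed" means here, the sketch below gathers the control information of the `n` nested loops named by a tile clause into one record. It is a simplified model, not the lowering code: `DoLoop`, `collapse_for_tile`, and the dictionary layout are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DoLoop:
    iv: str
    lower: int
    upper: int
    step: int
    body: object  # a nested DoLoop, or any other payload for the loop body


def collapse_for_tile(outer, tile_sizes):
    """Collect IVs, bounds, and steps of the n = len(tile_sizes) nested loops."""
    ivs, lowers, uppers, steps = [], [], [], []
    loop = outer
    for _ in tile_sizes:
        if not isinstance(loop, DoLoop):
            raise ValueError("tile clause names more loops than are nested")
        ivs.append(loop.iv)
        lowers.append(loop.lower)
        uppers.append(loop.upper)
        steps.append(loop.step)
        loop = loop.body  # descend into the next nested loop
    # What remains after descending n levels is the body of the collapsed loop.
    return {"ivs": ivs, "lower": lowers, "upper": uppers, "step": steps,
            "tile": list(tile_sizes), "body": loop}


nest = DoLoop("i", 1, 100, 1, DoLoop("j", 1, 50, 1, "a(i, j) = 0"))
collapsed = collapse_for_tile(nest, (2, 2))
```

The single record plays the role of one `acc.loop` carrying two IVs, and the innermost statement becomes its body.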
2025-07-03[mlir][acc][flang] Use SymbolRefAttr for func_name in ACC routine (#146951)delaram-talaashrafi1-2/+5
Changed the type of the `func_name` attribute from SymbolNameAttr to SymbolRefAttr. SymbolNameAttr is typically used when defining a symbol (e.g., `sym_name`), while SymbolRefAttr is appropriate for referencing existing operations. This change ensures that MLIR can correctly track the link to the referenced `func.func` operation.
2025-06-19[flang][NFC] Move new code to right place (#144551)Peter Klausler1-1/+1
Some new code was added to flang/Semantics that only depends on facilities in flang/Evaluate. Move it into Evaluate and clean up some minor stylistic problems.
2025-06-11[flang][acc] Ensure all acc.loop get a default parallelism determination ↵Razvan Lupusoru1-0/+67
mode (#143623) This PR updates the flang lowering to explicitly implement the OpenACC rules:
- As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop construct with no auto or seq clause is treated as if it has the independent clause when it is an orphaned loop construct or its parent compute construct is a parallel construct.
- As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent compute construct is a kernels construct, a loop construct with no independent or seq clause is treated as if it has the auto clause.
- Loops in serial regions are `seq` if they have no other parallelism marking such as gang, worker, vector.
For now the `acc.loop` verifier has not yet been updated to enforce this.
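The three rules above can be written as a small decision function. This is a sketch of the rules as stated in the commit message, not the actual lowering code: the function name `default_loop_mode` and the string encoding of constructs and clauses are assumptions made for illustration.

```python
def default_loop_mode(parent, clauses):
    """Pick the parallelism determination mode for a loop construct.

    parent:  "parallel", "kernels", "serial", or None for an orphaned loop.
    clauses: iterable of clause names present on the loop.
    """
    clauses = set(clauses)
    # An explicit mode always wins.
    for explicit in ("seq", "auto", "independent"):
        if explicit in clauses:
            return explicit
    if parent is None or parent == "parallel":
        return "independent"      # OpenACC 3.3, section 2.9.6
    if parent == "kernels":
        return "auto"             # OpenACC 3.3, section 2.9.7
    if parent == "serial":
        if clauses & {"gang", "worker", "vector"}:
            # The commit message does not say what happens here,
            # so this sketch leaves the mode undetermined.
            return None
        return "seq"
    raise ValueError(f"unknown parent construct: {parent!r}")
```

For example, `default_loop_mode("kernels", [])` yields `"auto"`, while an orphaned loop with only a `gang` clause yields `"independent"`.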
2025-06-10[flang][NFC] Clean up code in two new functions (#142037)Peter Klausler1-2/+2
Two recently-added functions in Semantics/tools.h need some cleaning up to conform to the coding style of the project. One of them should actually be in Parser/tools.{h,cpp}, the other doesn't need to be defined in the header.
2025-05-23[flang][openacc] use location of end directive for exit operations (#140763)Andre Kuhlenschmidt1-16/+21
Make sure to preserve the location of the end statement on data declarations for use in debugging OpenACC runtime.
2025-05-23[Flang][OpenMP] fix crash on sematic error in atomic capture clause (#140710)Yang Zaizhou1-1/+3
Fix a crash caused by an invalid expression in the atomic capture clause, due to the `checkForSymbolMatch` function not accounting for `GetExpr` potentially returning null. Fix https://github.com/llvm/llvm-project/issues/139884
2025-05-21[OpenACC] rename private/firstprivate recipe attributes (#140719)Scott Manley1-42/+43
Make private and firstprivate recipe attribute names consistent with reductionRecipes attribute
2025-05-20[OpenACC] unify reduction and private-like init region recipes (#140652)Scott Manley1-251/+207
Between firstprivate, private and reduction init regions, the difference is largely whether or not the temp that is created is initialized. Some recent fixes were made to privatization (#135698, #137869) but did not get propagated to reductions, even though they need to yield the same things from their init regions. To mitigate this discrepancy in the future, refactor the init region recipes so they can be shared between the three recipe ops. Also add "none" to the OpenACC_ReductionOperator enum for better error checking.
2025-05-12[flang] Postpone hlfir.end_associate generation for calls. (#138786)Slava Zakharin1-6/+18
If we generate hlfir.end_associate at the end of the statement, we get HLFIR that is easier to optimize, because there are no compiler-generated operations with side effects between the call and the consumers. This allows more hlfir.eval_in_mem operations to reuse the LHS instead of allocating a temporary buffer. I do not think the same can always be done for hlfir.copy_out, e.g.:
```
subroutine test2(x)
  interface
    function array_func2(x,y)
      real:: x(*), array_func2(10), y
    end function array_func2
  end interface
  real :: x(:)
  x = array_func2(x, 1.0)
end subroutine test2
```
If we postpone the copy-out until after the assignment, then the result may be wrong.
2025-05-09[flang][openacc] Allow open acc routines from other modules. (#136012)Andre Kuhlenschmidt1-163/+129
OpenACC routine annotations in separate compilation units currently get ignored, which leads to errors in compilation. There are two reasons why OpenACC routine information is currently ignored, and this PR addresses both.
- The module file reader doesn't read back in openacc directives from module files.
  - Simple fix in `flang/lib/Semantics/mod-file.cpp`
- The lowering to HLFIR doesn't generate routine directives for symbols imported from other modules that are openacc routines.
  - This is the majority of this diff, and is addressed by the changes that start in `flang/lib/Lower/CallInterface.cpp`.
2025-04-29[flang][acc] Fix issue with privatization recipe for box ref (#137869)Razvan Lupusoru1-8/+27
When privatizing allocatable/pointer arrays, the code was creating a temporary but this was a box type. This led to inconsistency between the input and output of recipe. The updated logic now creates storage when a box ref is requested.
2025-04-28[flang][acc] Remove an unused variable (#137731)khaki31-3/+2
Fixes what https://github.com/llvm/llvm-project/pull/137691 introduced.
2025-04-28[flang][acc] Generate constructors and destructors for common blocks (#137691)khaki31-67/+71
2025-04-28[flang][OpenACC][OpenMP] Separate implementations of ATOMIC constructs (#137517)Krzysztof Parzyszek1-12/+308
The OpenMP implementation of the ATOMIC construct will change in the near future to accommodate atomic conditional-update and conditional- update-capture operations. This patch separates the shared implemen- tations to avoid interfering with OpenACC.
2025-04-21[flang][acc] Update stride calculation to include inner-dimensions (#136613)Razvan Lupusoru1-4/+14
The acc.bounds operation allows specifying stride - but it did not clarify what it meant. The dialect was updated to specifically note that stride must capture inner dimension sizes when specified for outer dimensions. Flang lowering was also updated for OpenACC to adhere to this. This was already the case for descriptor-based arrays - but now this is also being done for all arrays.
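One way to read "stride must capture inner dimension sizes" is the usual column-major stride computation, sketched below. This is an interpretation for illustration only: it assumes extents are listed in Fortran order (innermost dimension first) and counts strides in elements rather than bytes; `strides_in_elements` is a hypothetical helper.

```python
def strides_in_elements(extents):
    """Stride of each dimension, as the product of all inner extents."""
    strides = []
    stride = 1
    for ext in extents:
        strides.append(stride)  # stride of this dimension
        stride *= ext           # every outer dimension also skips this extent
    return strides


# For a Fortran array declared as a(10, 2, 5):
example = strides_in_elements((10, 2, 5))
```

Under these assumptions the innermost dimension has stride 1, the second dimension strides over the 10 elements of the first, and the third strides over 10 * 2 = 20 elements.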
2025-04-17[flang][acc] Avoid implicitly privatizing IVs already privatized (#136181)Razvan Lupusoru1-101/+111
When generating `acc.loop`, the IV was always implicitly privatized. However, if the user explicitly privatized it, the IR generated wasn't quite right. For example:
```
!$acc loop private(i)
do i = 1, n
  a(i) = b(i)
end do
```
The IR generated looked like:
```
%65 = acc.private varPtr(%19#0 : !fir.ref<i32>) -> !fir.ref<i32> {implicit = true, name = "i"}
%66:2 = hlfir.declare %65 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
%67 = acc.private varPtr(%66#0 : !fir.ref<i32>) -> !fir.ref<i32> {name = "i"}
acc.loop private(@privatization_ref_i32 -> %65 : !fir.ref<i32>, @privatization_ref_i32 -> %67 : !fir.ref<i32>) control(%arg0 : i32) = (%c1_i32_46 : i32) to (%c10_i32_47 : i32) step (%c1_i32_48 : i32) {
  fir.store %arg0 to %66#0 : !fir.ref<i32>
```
In order to fix this, we first process all of the clauses. Then when attempting to generate implicit private IV, we look for an already existing data clause operation. The result is the following IR:
```
%65 = acc.private varPtr(%19#0 : !fir.ref<i32>) -> !fir.ref<i32> {name = "i"}
%66:2 = hlfir.declare %65 {uniq_name = "_QFEi"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
acc.loop private(@privatization_ref_i32 -> %65 : !fir.ref<i32>) control(%arg0 : i32) = (%c1_i32_46 : i32) to (%c10_i32_47 : i32) step (%c1_i32_48 : i32) {
  fir.store %arg0 to %66#0 : !fir.ref<i32>
```
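The ordering described above (clauses first, then a lookup before adding the implicit entry) can be sketched as follows. This is a toy model rather than the lowering code: `privatize_loop_iv` and the dictionary representation of a private entry are hypothetical.

```python
def privatize_loop_iv(explicit_private, iv):
    """Return the private entries for a loop whose induction variable is iv."""
    # Step 1: process the user's private clauses first.
    entries = [{"name": name, "implicit": False} for name in explicit_private]
    # Step 2: add an implicit entry only if the IV is not already privatized.
    if not any(entry["name"] == iv for entry in entries):
        entries.append({"name": iv, "implicit": True})
    return entries


with_clause = privatize_loop_iv(["i"], "i")    # user wrote private(i)
without_clause = privatize_loop_iv([], "i")    # no private clause at all
```

With `private(i)` present, the model yields a single explicit entry for `i`. Without it, the IV gets exactly one implicit entry.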
2025-04-15[flang][OpenACC] use correct type when create private box init recipe (#135698)Scott Manley1-8/+16
The recipe for initializing private box types was incorrect because hlfir::createTempFromMold() is not a suitable utility function when the box element type is a trivial type.
2025-02-11[flang][acc] Fill-in name for privatized loop iv (#126601)Razvan Lupusoru1-0/+1
When the loop induction variable implicit private clause was being generated, the name was left empty. The intent is that the data clause operation holds the source language variable name. Thus, add the missing name now.
2025-02-10[flang][acc] Ensure data exit action is generated for present & nocreate ↵Razvan Lupusoru1-7/+34
(#126560) The acc.delete operation has the semantics of decrementing the present counter and deleting the data when the counter reaches zero. Since acc.present and acc.nocreate are both intended to increment the present counter, a matching exit action must be inserted. This is also what is specified in the OpenACC dialect documentation: https://mlir.llvm.org/docs/Dialects/OpenACCDialect/#operation-categories
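The need for a matching exit action follows from simple reference counting, which the sketch below models. It is a toy illustration of the counter semantics described in the commit message, not the OpenACC runtime: the `PresentTable` class and its method names are hypothetical.

```python
class PresentTable:
    """Toy model of per-variable present counters on the device."""

    def __init__(self):
        self.counters = {}

    def enter(self, name):
        # Entry actions such as copyin, or present on data that is already
        # on the device, increment the counter.
        self.counters[name] = self.counters.get(name, 0) + 1

    def delete(self, name):
        # The exit action decrements and frees the data at zero.
        count = self.counters.get(name, 0)
        if count == 0:
            return
        if count == 1:
            del self.counters[name]
        else:
            self.counters[name] = count - 1

    def is_present(self, name):
        return name in self.counters


# Balanced: every entry action has a matching delete.
balanced = PresentTable()
balanced.enter("x")    # data copyin(x)
balanced.enter("x")    # nested present(x)
balanced.delete("x")   # exit action for present(x)
balanced.delete("x")   # exit action for copyin(x)

# Unbalanced: the exit action for present(x) is missing.
leaky = PresentTable()
leaky.enter("x")
leaky.enter("x")
leaky.delete("x")
```

In the unbalanced case the counter never returns to zero, so the data stays allocated, which is the situation the added exit action avoids.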
2025-02-04[flang][acc] Improve acc lowering around fir.box and arrays (#125600)Razvan Lupusoru1-170/+281
The current implementation of OpenACC lowering includes explicit expansion of the following cases:
- Creation of `acc.bounds` operations for all arrays, including those whose dimensions are captured in the type (e.g. `!fir.array<100xf32>`)
- Expansion of box types by only putting the box's address in the data clause. The address was extracted with a `fir.box_addr` operation and the bounds were filled with a `fir.box_dims` operation.
However, with the creation of the new type interface `MappableType`, the idea is that specific type-based semantics can now be used. This also really simplifies representation in the IR. Consider the following example:
```
subroutine sub(arr)
  real :: arr(:)
  !$acc enter data copyin(arr)
end subroutine
```
Before the current PR, the relevant acc dialect IR looked like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name = "arr"}) {
  ...
  %1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} : (!fir.box<!fir.array<?xf32>>, !fir.dscope) -> (!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %2:3 = fir.box_dims %1#0, %c0 : (!fir.box<!fir.array<?xf32>>, index) -> (index, index, index)
  %c0_0 = arith.constant 0 : index
  %3 = arith.subi %2#1, %c1 : index
  %4 = acc.bounds lowerbound(%c0_0 : index) upperbound(%3 : index) extent(%2#1 : index) stride(%2#2 : index) startIdx(%c1 : index) {strideInBytes = true}
  %5 = fir.box_addr %1#0 : (!fir.box<!fir.array<?xf32>>) -> !fir.ref<!fir.array<?xf32>>
  %6 = acc.copyin varPtr(%5 : !fir.ref<!fir.array<?xf32>>) bounds(%4) -> !fir.ref<!fir.array<?xf32>> {name = "arr", structured = false}
  acc.enter_data dataOperands(%6 : !fir.ref<!fir.array<?xf32>>)
```
After the current change, it looks like:
```
func.func @_QPsub(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name = "arr"}) {
  ...
  %1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFsubEarr"} : (!fir.box<!fir.array<?xf32>>, !fir.dscope) -> (!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
  %2 = acc.copyin var(%1#0 : !fir.box<!fir.array<?xf32>>) -> !fir.box<!fir.array<?xf32>> {name = "arr", structured = false}
  acc.enter_data dataOperands(%2 : !fir.box<!fir.array<?xf32>>)
```
Restoring the old behavior can be done with the following command line options: `--openacc-unwrap-fir-box=true --openacc-generate-default-bounds=true`
2025-01-21[flang][OpenMP][OpenACC] remove libEvaluate dependency in passes (#123784)jeanPerier1-4/+4
Move OpenACC/OpenMP helpers from Lower/DirectivesCommon.h that are also used in OpenACC and OpenMP mlir passes into a new Optimizer/Builder/DirectivesCommon.h so that parser and evaluate headers are not included in Optimizer libraries (including them introduces pointless compile-time and link-time overhead). This should fix https://github.com/llvm/llvm-project/issues/123377
2025-01-10[flang][acc] Add a missing acc.delete generation for the copyin clause (#122539)khaki31-8/+27
We are missing the deletion part of the copyin clause after a region or in a destructor. This PR completes its implementation for data regions, compute regions, and global declarations.
Example:
```f90
subroutine sub()
  real :: x(1:10)
  !$acc data copyin(x)
  !$acc end data
end subroutine sub
```
We are getting the following:
```mlir
%5 = acc.copyin varPtr(%2#0 : !fir.ref<!fir.array<10xf32>>) bounds(%4) -> !fir.ref<!fir.array<10xf32>> {name = "x"}
acc.data dataOperands(%5 : !fir.ref<!fir.array<10xf32>>) {
  acc.terminator
}
return
```
With this PR, we'll get:
```mlir
%5 = acc.copyin varPtr(%2#0 : !fir.ref<!fir.array<10xf32>>) bounds(%4) -> !fir.ref<!fir.array<10xf32>> {name = "x"}
acc.data dataOperands(%5 : !fir.ref<!fir.array<10xf32>>) {
  acc.terminator
}
acc.delete accPtr(%5 : !fir.ref<!fir.array<10xf32>>) bounds(%4) {dataClause = #acc<data_clause acc_copyin>, name = "x"}
return
```
2024-12-25[flang] Fix some memory leaks (#121050)Matthias Springer1-6/+8
This commit fixes some but not all memory leaks in Flang. There are still 91 tests that fail with ASAN.
- Use `mlir::OwningOpRef` instead of `std::unique_ptr`. The latter does not free allocations of nested blocks.
- Pass `ModuleOp` as value instead of reference.
- Add a few missing deallocations in test cases and other places.
2024-12-18Re-apply (#117867): [flang][OpenMP] Implicitly map allocatable record fields ↵Kareem Ergawy1-1/+2
(#120374) This re-applies #117867 with a small fix that hopefully prevents build bot failures. The fix is avoiding `dyn_cast` for the result of `getOperation()`. Instead we can assign the result to `mlir::ModuleOp` directly since the type of the operation is known statically (`OpT` in `OperationPass`).
2024-12-18Revert "[flang][OpenMP] Implicitly map allocatable record fields (#117867)" ↵Kareem Ergawy1-2/+1
(#120360)
2024-12-18[flang][OpenMP] Implicitly map allocatable record fields (#117867)Kareem Ergawy1-1/+2
This is a starting PR to implicitly map allocatable record fields. This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that these utils work on the `mlir::Value` level rather than the `semantics::Symbol` level. This takes one step towards enabling MLIR passes to more easily do some lowering themselves (e.g. creating `omp.map.bounds` ops for implicitly captured data like this PR does).
2. Adds support for implicitly capturing and mapping allocatable fields in record types.
There is still quite some distance to cover to have full support for this. I added a number of todos to guide further development.
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
2024-12-09[MLIR][acc] Introduce varType to acc data clause operations (#119007)Razvan Lupusoru1-3/+5
The acc data clause operations hold an operand named `varPtr`. This was intended to hold a pointer to a variable - where the element type of that pointer specifies the type of the variable. However, for both memref and llvm dialects, this assumption is not true. This is because memref element type for cases like memref<10xf32> is simply f32 and for LLVM, after opaque pointers, the variable type is no longer recoverable. Thus, introduce varType to ensure that appropriate semantics are kept. Both the parser and printer for this new type attribute allow it to not be specified in cases where a dialect's getElementType() applied to `varPtr`'s type has a recoverable type. And more specifically, for FIR, no changes are needed in the MLIR unit tests.
2024-10-03[flang] replace fir.complex usages with mlir complex (#110850)jeanPerier1-8/+7
Core patch of https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292. After that, the last step is to remove fir.complex from FIR types.
2024-08-30[flang][openacc] Attach post allocate action on the correct operation (#106805)Valentin Clement (バレンタイン クレメン)1-14/+21
In some cases (when using stat), the action was attached to the invisible fir.result op. Apply same fix as in #89662.
2024-08-30[flang][acc] allow and ignore DIR between ACC and loops (#106522)jeanPerier1-0/+7
The current pattern was failing OpenACC semantics in acc parse tree canonicalization:
```
!acc loop
!dir vector aligned
do i=1,n
...
```
Fix it by moving the directive before the OpenACC construct node. Note that I think it could make sense to propagate the $dir info to the acc.loop, at least with classic flang, the $dir seems to make a difference. This is not done here since few directives are supported anyway.
2024-08-07[flang][acc] Improve lowering of Fortran optional in data clause (#102224)Razvan Lupusoru1-15/+34
Fortran optional arguments are effectively null references. To deal with this possibility, flang lowering of OpenACC data clauses creates three if-else regions when preparing the data pointer for the data clause: 1) Load box value from box reference 2) Load box addr from box value 3) Load box dims from box value However, this pattern makes it more complicated to find the original box reference. Effectively, the first if-else region to get the box value is not needed - since the value can be loaded before the corresponding `fir.box_addr` and `fir.box_dims` operations. Thus, reduce the number of if-else regions by deferring the box load to the use sites. For non-optional cases, the old functionality is left alone - which preloads the box value.
2024-07-03[mlir][acc] Added async to data clause operations. (#97307)Slava Zakharin1-135/+276
As long as the data clause operations are not tightly "associated" with the compute/data operations (e.g. they can be optimized as SSA producers and made block arguments), the information about the original async() clause should be attached to the data clause operations to make it easier to generate proper runtime actions for them. This change propagates the async() information from the OpenACC data/compute constructs to the data clause operations. This change also adds the CurrentDeviceIdResource to guarantee proper ordering of the operations that read and write the current device identifier.
2024-06-17[Flang] Switch to common::visit more call sites (#90018)Alexander Shaposhnikov1-20/+21
Switch to common::visit more call sites. Test plan: ninja check-all
2024-05-30[flang] Add parsing of DO CONCURRENT REDUCE clause (#92518)khaki31-14/+13
Derived from #92480. This PR supports parsing of the DO CONCURRENT REDUCE clause in Fortran 2023. Following the style of the OpenMP parser in MLIR, the front end accepts both arbitrary operations and procedures for the REDUCE clause. But later Semantics can notify type errors and resolve procedure names.
2024-05-08[flang] Lowering changes for assigning dummy_scope to hlfir.declare. (#90989)Slava Zakharin1-8/+14
The lowering produces fir.dummy_scope operation if the current function has dummy arguments. Each hlfir.declare generated for a dummy argument is then using the result of fir.dummy_scope as its dummy_scope operand. This is only done for HLFIR. I was not able to find a reliable way to identify dummy symbols in `genDeclareSymbol`, so I added a set of registered dummy symbols that is alive during the variables instantiation for the current function. The set is initialized during the mapping of the dummy argument symbols to their MLIR values. It is reset right after all variables are instantiated - this is done to avoid generating hlfir.declare operations with dummy_scope for the clones of the dummy symbols (e.g. this happens with OpenMP privatization). If this can be done in a cleaner way, please advise.
2024-04-28Reapply "[mlir] Mark `isa/dyn_cast/cast/...` member functions depreca… ↵Christian Sigg1-2/+2
(#90406) …ted. (#89998)" (#90250) This partially reverts commit 7aedd7dc754c74a49fe84ed2640e269c25414087. This change removes calls to the deprecated member functions. It does not mark the functions deprecated yet and does not disable the deprecation warning in TypeSwitch. This seems to cause problems with MSVC.
2024-04-26Revert "[mlir] Mark `isa/dyn_cast/cast/...` member functions deprecated. ↵dyung1-2/+2
(#89998)" (#90250) This reverts commit 950b7ce0b88318f9099e9a7c9817d224ebdc6337. This change is causing build failures on a bot https://lab.llvm.org/buildbot/#/builders/216/builds/38157
2024-04-26[mlir] Mark `isa/dyn_cast/cast/...` member functions deprecated. (#89998)Christian Sigg1-2/+2
See https://mlir.llvm.org/deprecation and https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-24[flang][cuda] Use fir.cuda_deallocate for automatic deallocation (#89662)Valentin Clement (バレンタイン クレメン)1-13/+19
Automatic deallocation of allocatables that are CUDA device variables must use the fir.cuda_deallocate operation. This patch updates the automatic deallocation code generation to use this operation when the variable is a CUDA variable. This patch also has the side effect of correctly calling `attachDeclarePostDeallocAction` for OpenACC declare variables on automatic deallocation. Update the code in `attachDeclarePostDeallocAction` so that we do not attach on fir.result but on the correct last op.
2024-04-02[flang][NFC] use mlir::SymbolTable in lowering (#86673)jeanPerier1-2/+4
Whenever lowering is checking if a function or global already exists in the mlir::Module, it was doing module->lookup. On big programs (~5000 globals and functions), this causes important slowdowns because these lookups are linear. Use mlir::SymbolTable to speed-up these lookups. The SymbolTable has to be created from the ModuleOp and maintained in sync. It is therefore placed in the converter, and FirOPBuilders can take a pointer to it to speed-up the lookups. This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but some passes creating a lot of runtime calls could benefit from it too. More analysis will be needed. As an example of the speed-ups, this patch speeds-up compilation of Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine (there is still room for speed-ups).
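The cost difference between a linear module scan and a symbol table kept in sync can be seen in a small model. The sketch below only illustrates the idea and does not use the `mlir::SymbolTable` API: `IndexedModule` and its methods are hypothetical names.

```python
class IndexedModule:
    """Toy module holding symbol names plus a hash index kept in sync."""

    def __init__(self, names=()):
        self.ops = []     # stands in for the operations in the module
        self.table = {}   # name -> position in self.ops
        for name in names:
            self.insert(name)

    def insert(self, name):
        # Every insertion must go through here so the index stays in sync.
        if name in self.table:
            raise ValueError(f"symbol {name!r} already defined")
        self.table[name] = len(self.ops)
        self.ops.append(name)

    def lookup_linear(self, name):
        # O(number of symbols): scans every operation.
        for position, op in enumerate(self.ops):
            if op == name:
                return position
        return None

    def lookup(self, name):
        # O(1) on average: a single hash lookup.
        return self.table.get(name)


module = IndexedModule(f"_QPfunc{i}" for i in range(5000))
module.insert("_QPsub")
```

Both lookups return the same answer. Only the indexed one avoids walking all 5000 entries on every query, which is where repeated lookups during lowering add up.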
2024-03-26[flang][acc] Add support for lowering combined constructs (#86696)Razvan Lupusoru1-30/+42
PR#80319 added support to record combined construct semantics via an attribute. Add lowering support for this.