path: root/flang-rt/lib/runtime/assign.cpp
Age | Commit message | Author | Files | Lines
2025-07-18 | [flang][runtime] Further work on speeding up work queue operations (#149189) | Peter Klausler | 1 | -3/+9
This patch avoids a trip through the work queue engine for cases on a CPU where finalization and destruction actions during assignment were handled without enqueueing another task.
2025-07-14 | [flang][runtime] Speed up initialization & destruction (#148087) | Peter Klausler | 1 | -16/+19
Rework derived type initialization in the runtime to just initialize the first element of any array, and then memcpy it to the others, rather than exercising the per-component paths for each element. Rework derived type destruction in the runtime to detect and exploit a fast path for allocatable components whose types themselves don't need nested destruction. Small tweaks were made in hot paths exposed by profiling in descriptor operations and derived type assignment.
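The "initialize one element, then memcpy it to the others" idea can be sketched as follows. This is only an illustration under stated assumptions: `Point`, `InitializeOne`, and `InitializeArray` are hypothetical names and not part of flang-rt, and the element type is assumed to be trivially copyable so that byte-wise replication is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a derived type with default component values.
struct Point {
  int x;
  double weight;
};
static_assert(std::is_trivially_copyable<Point>::value,
    "byte-wise replication requires a trivially copyable element");

// Stand-in for the comparatively expensive per-component initialization path.
inline void InitializeOne(Point &p) {
  p.x = 42;
  p.weight = 1.5;
}

// Run the per-component path once for element 0, then replicate its bytes
// into every remaining element instead of repeating that path.
inline void InitializeArray(Point *elements, std::size_t count) {
  assert(elements != nullptr || count == 0);
  if (count == 0) {
    return;
  }
  InitializeOne(elements[0]);
  for (std::size_t j{1}; j < count; ++j) {
    std::memcpy(&elements[j], &elements[0], sizeof(Point));
  }
}
```

Calling `InitializeArray(points, n)` leaves all `n` elements holding the same bytes as element 0, so the per-component work is paid once rather than `n` times.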
2025-07-08 | Fix the type of offset that broke 32-bit flang-rt build to use `uint64_t` consistently (#147359) | Daniel Chen | 1 | -2/+4
A recent change to `flang-rt` introduced code like `std::size_t offset{offset_};`. It broke the 32-bit `flang-rt` build because `Component::offset_` is of type `uint64_t`, whereas the width of `size_t` varies by target. Clang complains:
```
error: non-constant-expression cannot be narrowed from type 'std::uint64_t' (aka 'unsigned long long') to 'std::size_t' (aka 'unsigned long') in initializer list [-Wc++11-narrowing]
  143 |     std::size_t offset{offset_};
      |                        ^~~~~~~
```
This patch uses `uint64_t` consistently for the offset.
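A reduced sketch of the narrowing problem and the fix is shown below. The `ComponentSketch` class is hypothetical and only mirrors the `offset_` member named in the commit message; it is not the real flang-rt `Component` class.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical reduction of the situation; not the flang-rt Component class.
class ComponentSketch {
public:
  explicit ComponentSketch(std::uint64_t offset) : offset_{offset} {}

  // On a target with a 32-bit size_t, writing
  //   std::size_t offset{offset_};
  // is ill-formed, because brace initialization rejects the narrowing
  // conversion from a 64-bit to a 32-bit unsigned type.
  std::uint64_t Offset() const {
    std::uint64_t offset{offset_}; // same type on every target, no narrowing
    return offset;
  }

  // If a size_t is really required, the conversion has to be explicit.
  std::size_t OffsetAsSize() const {
    return static_cast<std::size_t>(offset_);
  }

private:
  std::uint64_t offset_;
};
```

Keeping the local variable at `std::uint64_t` avoids the diagnostic without silently truncating the value, which an explicit cast could still do on a 32-bit target.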
2025-06-16 | [flang] Restructure runtime to avoid recursion (relanding) (#143993) | Peter Klausler | 1 | -232/+431
Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy. Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds. Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues. (Relanding this patch after reverting initial attempt due to some test failures that needed some time to analyze and fix.) Fixes https://github.com/llvm/llvm-project/issues/142481.
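The general technique of replacing recursion with a work queue of suspendable and resumable tickets can be sketched as below. This is a minimal illustration, not the runtime's implementation: `Node`, `Ticket`, and `SumValues` are hypothetical names, and the sketch uses a heap-backed `std::vector` as the queue, which is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nested object: each node owns child components.
struct Node {
  int value{0};
  std::vector<Node> children;
};

// A ticket records how far work on one node has progressed, so it can be
// suspended while a child is processed and resumed afterwards.
struct Ticket {
  const Node *node;
  std::size_t nextChild{0};
  bool visited{false};
};

// Sums all values without recursion; call-stack usage is constant no matter
// how deeply the nodes are nested.
inline long SumValues(const Node &root) {
  long total{0};
  std::vector<Ticket> queue;
  queue.push_back(Ticket{&root});
  while (!queue.empty()) {
    Ticket &ticket{queue.back()};
    if (!ticket.visited) {
      total += ticket.node->value;
      ticket.visited = true;
    }
    if (ticket.nextChild < ticket.node->children.size()) {
      const Node *child{&ticket.node->children[ticket.nextChild++]};
      // Suspends the current ticket. The reference `ticket` may dangle after
      // this push_back, so it is not used again in this iteration.
      queue.push_back(Ticket{child});
    } else {
      queue.pop_back(); // this ticket is complete
    }
  }
  return total;
}
```

Because the only state that grows with nesting depth lives in the explicit queue, the function's own stack frame has a fixed size, which is the property the commit message says is needed for link-time stack size calculation.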
2025-06-11 | Revert runtime work queue patch, it breaks some tests that need investigation (#143713) | Peter Klausler | 1 | -399/+224
Revert "[flang][runtime] Another try to fix build failure"
This reverts commit 13869cac2b5051e453aa96ad71220d9d33404620.
Revert "[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors (#143650)"
This reverts commit d75e28477af0baa063a4d4cc7b3cf657cfadd758.
Revert "[flang][runtime] Replace recursion with iterative work queue (#137727)"
This reverts commit 163c67ad3d1bf7af6590930d8f18700d65ad4564.
2025-06-10 | [flang][runtime] Replace recursion with iterative work queue (#137727) | Peter Klausler | 1 | -224/+399
Recursion, both direct and indirect, prevents accurate stack size calculation at link time for GPU device code. Restructure these recursive (often mutually so) routines in the Fortran runtime with new implementations based on an iterative work queue with suspendable/resumable work tickets: Assign, Initialize, initializeClone, Finalize, and Destroy. Default derived type I/O is also recursive, but already disabled. It can be added to this new framework later if the overall approach succeeds. Note that derived type FINAL subroutine calls, defined assignments, and defined I/O procedures all perform callbacks into user code, which may well reenter the runtime library. This kind of recursion is not handled by this change, although it may be possible to do so in the future using thread-local work queues. The effects of this restructuring on CPU performance are yet to be measured.
2025-06-04 | [flang][runtime] Accommodate change of type in assignment to allocatable (#141988) | Peter Klausler | 1 | -0/+1
When an assignment to a derived type allocatable requires (re)allocation, its type may change to that of the right-hand side. The code didn't update its derived type pointer, leading to the wrong type being put into the descriptors created for elemental defined assignment subroutine calls. Fixes https://github.com/llvm/llvm-project/issues/141835.
2025-05-22 | [flang-rt] Fix usage of kNoAsyncId in assign.cpp (#141077) | Kajetan Puchalski | 1 | -1/+1
Fix a leftover old variable name causing build bot errors. Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
2025-05-22 | [flang-rt] Optimise ShallowCopy and use it in CopyInAssign (#140569) | Kajetan Puchalski | 1 | -5/+7
Using Descriptor.Element<>() when iterating through a rank-1 array is currently inefficient, because the generic implementation suitable for arrays of any rank makes the compiler unable to perform optimisations that would make the rank-1 case considerably faster. This is currently done inside ShallowCopy, as well as by CopyInAssign, where the implementation of elemental copies (inside Assign) is equivalent to ShallowCopyDiscontiguousToDiscontiguous.
To address that, add a DescriptorIterator abstraction specialised for arrays of various ranks, and use that throughout ShallowCopy to iterate over the arrays.
Furthermore, depending on the pointer type passed to memcpy, the optimiser can remove the memcpy calls from ShallowCopy altogether which can result in substantial performance improvements on its own. Specialise ShallowCopy for various element pointer types to make these optimisations possible.
Finally, replace the call to Assign inside CopyInAssign with a call to newly optimised ShallowCopy.
For the thornado-mini application, this reduces the runtime by 27.7%.
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
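Why a rank-specialised path helps can be illustrated with a simplified strided copy. Everything here is a hypothetical sketch: `StridedView`, `CopyAnyRank`, `CopyRank1`, and `ShallowCopySketch` are invented names, and the real `Descriptor`, `DescriptorIterator`, and `ShallowCopy` in flang-rt are not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr int kMaxRank{3}; // assumption for this sketch only

// Hypothetical, simplified strided array view; not the flang-rt Descriptor.
struct StridedView {
  char *base;
  int rank;
  std::size_t extent[kMaxRank];
  std::ptrdiff_t byteStride[kMaxRank];
  std::size_t elementBytes;
};

// Generic path: keeps a full subscript vector and recomputes both offsets
// for every element, in column-major order.
inline void CopyAnyRank(const StridedView &to, const StridedView &from) {
  assert(to.rank == from.rank && from.rank >= 1 && from.rank <= kMaxRank);
  std::size_t subscript[kMaxRank]{};
  std::size_t total{1};
  for (int d{0}; d < from.rank; ++d) {
    total *= from.extent[d];
  }
  for (std::size_t n{0}; n < total; ++n) {
    std::ptrdiff_t toOffset{0}, fromOffset{0};
    for (int d{0}; d < from.rank; ++d) {
      const std::ptrdiff_t s{static_cast<std::ptrdiff_t>(subscript[d])};
      toOffset += s * to.byteStride[d];
      fromOffset += s * from.byteStride[d];
    }
    std::memcpy(to.base + toOffset, from.base + fromOffset, from.elementBytes);
    for (int d{0}; d < from.rank; ++d) { // advance subscripts with carry
      if (++subscript[d] < from.extent[d]) {
        break;
      }
      subscript[d] = 0;
    }
  }
}

// Rank-1 path: a single counted loop with one multiply per side, which is
// much easier for a compiler to optimise than the nested generic loops.
inline void CopyRank1(const StridedView &to, const StridedView &from) {
  for (std::size_t n{0}; n < from.extent[0]; ++n) {
    const std::ptrdiff_t i{static_cast<std::ptrdiff_t>(n)};
    std::memcpy(to.base + i * to.byteStride[0],
        from.base + i * from.byteStride[0], from.elementBytes);
  }
}

// Dispatch once on rank, outside the element loop.
inline void ShallowCopySketch(const StridedView &to, const StridedView &from) {
  if (from.rank == 1 && to.rank == 1) {
    CopyRank1(to, from);
  } else {
    CopyAnyRank(to, from);
  }
}
```

The sketch only covers the rank dispatch. The commit's second optimisation, specialising on the element pointer type so that fixed-size `memcpy` calls can be removed by the optimiser, is not shown.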
2025-05-19 | [flang][cuda] Use a reference for asyncObject (#140614) | Valentin Clement (バレンタイン クレメン) | 1 | -2/+2
Switch from `int64_t` to `int64_t*` to fit with the rest of the implementation. New attempt with some fixes. The previous attempt was reverted some time ago. Reviewed in #138010
2025-05-15 | [flang] Clear obsolete type from reallocated allocatable (#139788) | Peter Klausler | 1 | -1/+4
When an assignment to a polymorphic allocatable changes its type to an intrinsic type, be sure to reset its descriptor's derived type pointer to null. Fixes https://github.com/llvm/llvm-project/issues/136522.
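The bookkeeping described here, where a reallocated left-hand side must take on the right-hand side's type information (including "no derived type" for intrinsic types), can be sketched as follows. `TypeInfo`, `BoxSketch`, and `AssignWithRealloc` are hypothetical names for illustration; the real descriptor layout and reallocation conditions in flang-rt differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical type description; a null pointer means "intrinsic type".
struct TypeInfo {
  const char *name;
};

// Hypothetical, greatly simplified descriptor of a scalar allocatable.
struct BoxSketch {
  void *data{nullptr};
  std::size_t bytes{0};
  const TypeInfo *derivedType{nullptr};
};

// Assigns `from` to `to`, reallocating when size or dynamic type differs.
// Assumes from.bytes > 0 and that to.data, if set, came from std::malloc.
inline void AssignWithRealloc(BoxSketch &to, const BoxSketch &from) {
  assert(from.data != nullptr && from.bytes > 0);
  if (to.data == nullptr || to.bytes != from.bytes ||
      to.derivedType != from.derivedType) {
    std::free(to.data);
    to.data = std::malloc(from.bytes);
    assert(to.data != nullptr);
    to.bytes = from.bytes;
    // The essential step: adopt the right-hand side's type unconditionally,
    // so a stale derived type is cleared (set to null) when the new value
    // has an intrinsic type.
    to.derivedType = from.derivedType;
  }
  std::memcpy(to.data, from.data, from.bytes);
}
```

Leaving `derivedType` untouched after reallocation would make later code treat the new value as an instance of the old derived type, which is the kind of stale state the fix removes.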
2025-05-01 | Revert "[flang][cuda] Use a reference for asyncObject" (#138221) | Valentin Clement (バレンタイン クレメン) | 1 | -2/+2
Reverts llvm/llvm-project#138186
2025-05-01 | [flang][cuda] Use a reference for asyncObject (#138186) | Valentin Clement (バレンタイン クレメン) | 1 | -2/+2
Switch from `int64_t` to `int64_t*` to fit with the rest of the implementation. New attempt with some fixes. The previous attempt was reverted yesterday.
2025-04-30 | Revert "[flang][cuda] Use a reference for asyncObject (#138010)" (#138082) | Valentin Clement (バレンタイン クレメン) | 1 | -2/+2
This reverts commit 9b0eaf71e674a28ee55be3afa11b5f7d4da732c0.
2025-04-30 | [flang][cuda] Use a reference for asyncObject (#138010) | Valentin Clement (バレンタイン クレメン) | 1 | -2/+2
Switch from `int64_t` to `int64_t*` to fit with the rest of the implementation.
2025-04-09 | [flang][cuda] Add asyncId to allocate entry point (#134947) | Valentin Clement (バレンタイン クレメン) | 1 | -2/+2
2025-03-06 | [flang] explicitly cast the pointer to void* in std::memcpy calls (NFC) (#129946) | Kelvin Li | 1 | -2/+3
This patch adds an explicit cast to `void*` on the first argument of `std::memcpy`.
2025-02-16 | [Flang][NFC] Move runtime library files to flang-rt (#110298) | Michael Kruse | 1 | -0/+622
Mostly mechanical changes in preparation of extracting the Flang-RT "subproject" in #110217. This PR intends to only move pre-existing files to the new folder structure, with no behavioral change. Common files (headers, testing, cmake) shared by Flang-RT and Flang remain in `flang/`.
Some cosmetic changes and file path changes were necessary:
 * Relative paths to the new path for the source files and `add_subdirectory`.
 * Add the new location's include directory to `include_directories`
 * The unittest/Evaluate directory has unit tests for flang-rt and Flang. A new `CMakeLists.txt` was introduced for the flang-rt tests.
 * Change the `#include` paths relative to the include directive
 * clang-format on the `#include` directives
 * Since the paths are part of the copyright header and include guards, a script was used to canonicalize those
 * `test/Runtime` and runtime tests in `test/Driver` are moved, but the lit.cfg.py mechanism to execute them will only be added in #110217.