Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch avoids a trip through the work queue engine for cases on a
CPU where finalization and destruction actions during assignment were
handled without enqueueing another task.
|
|
Rework derived type initialization in the runtime to just initialize the
first element of any array, and then memcpy it to the others, rather
than exercising the per-component paths for each element.
Reword derived type destruction in the runtime to detect and exploit a
fast path for allocatable components whose types themselves don't need
nested destruction.
Small tweaks were made in hot paths exposed by profiling in descriptor
operations and derived type assignment.
|
|
consistently (#147359)
The recent change of `flang-rt` has code like `std::size_t
offset{offset_};`.
It broke the 32-bit `flang-rt` build because `Component::offset_` is of
type `uint64_t` but `size_t` varies.
Clang complains
```
error: non-constant-expression cannot be narrowed from type 'std::uint64_t' (aka 'unsigned long long') to 'std::size_t' (aka 'unsigned long') in initializer list [-Wc++11-narrowing]
143 | std::size_t offset{offset_};
| ^~~~~~~
```
This patch is to use the consistent `uint64_t` for offset.
|
|
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.
Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.
Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.
(Relanding this patch after reverting initial attempt due to some test
failures that needed some time to analyze and fix.)
Fixes https://github.com/llvm/llvm-project/issues/142481.
|
|
investigation (#143713)
Revert "[flang][runtime] Another try to fix build failure"
This reverts commit 13869cac2b5051e453aa96ad71220d9d33404620.
Revert "[flang][runtime] Fix build bot flang-runtime-cuda-gcc errors
(#143650)"
This reverts commit d75e28477af0baa063a4d4cc7b3cf657cfadd758.
Revert "[flang][runtime] Replace recursion with iterative work queue
(#137727)"
This reverts commit 163c67ad3d1bf7af6590930d8f18700d65ad4564.
|
|
Recursion, both direct and indirect, prevents accurate stack size
calculation at link time for GPU device code. Restructure these
recursive (often mutually so) routines in the Fortran runtime with new
implementations based on an iterative work queue with
suspendable/resumable work tickets: Assign, Initialize, initializeClone,
Finalize, and Destroy.
Default derived type I/O is also recursive, but already disabled. It can
be added to this new framework later if the overall approach succeeds.
Note that derived type FINAL subroutine calls, defined assignments, and
defined I/O procedures all perform callbacks into user code, which may
well reenter the runtime library. This kind of recursion is not handled
by this change, although it may be possible to do so in the future using
thread-local work queues.
The effects of this restructuring on CPU performance are yet to be
measured.
|
|
(#141988)
When an assignment to a derived type allocatable requires
(re)allocation, its type may change to that of the right-hand side. The
code didn't update its derived type pointer, leading to the wrong type
being put into the descriptors created for elemental defined assignment
subroutine calls.
Fixes https://github.com/llvm/llvm-project/issues/141835.
|
|
Fix a leftover old variable name causing build bot errors.
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
|
|
Using Descriptor.Element<>() when iterating through a rank-1 array is
currently inefficient, because the generic implementation suitable for
arrays of any rank makes the compiler unable to perform optimisations
that would make the rank-1 case considerably faster.
This is currently done inside ShallowCopy, as well as by CopyInAssign,
where the implementation of elemental copies (inside Assign) is
equivalent to ShallowCopyDiscontiguousToDiscontiguous.
To address that, add a DescriptorIterator abstraction specialised for
arrays of various ranks, and use that throughout ShallowCopy to iterate
over the arrays.
Furthermore, depending on the pointer type passed to memcpy, the
optimiser can remove the memcpy calls from ShallowCopy altogether which
can result in substantial performance improvements on its own.
Specialise ShallowCopy for various element pointer types to make these
optimisations possible.
Finally, replace the call to Assign inside CopyInAssign with a call to
newly optimised ShallowCopy.
For the thornado-mini application, this reduces the runtime by 27.7%.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
|
|
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
New tentative with some fix. The previous was reverted some time ago.
Reviewed in #138010
|
|
When an assignment to a polymorphic allocatable changes its type to an
intrinsic type, be sure to reset its descriptor's derived type pointer
to null.
Fixes https://github.com/llvm/llvm-project/issues/136522.
|
|
Reverts llvm/llvm-project#138186
|
|
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
New tentative with some fix. The previous was reverted yesterday.
|
|
This reverts commit 9b0eaf71e674a28ee55be3afa11b5f7d4da732c0.
|
|
Switch from `int64_t` to `int64_t*` to fit with the rest of the
implementation.
|
|
|
|
(#129946)
This patch is to add the explicit cast to the first argument of std::memcpy.
|
|
Mostly mechanical changes in preparation of extracting the Flang-RT
"subproject" in #110217. This PR intends to only move pre-existing files
to the new folder structure, with no behavioral change. Common files
(headers, testing, cmake) shared by Flang-RT and Flang remain in
`flang/`.
Some cosmetic changes and files paths were necessary:
* Relative paths to the new path for the source files and
`add_subdirectory`.
* Add the new location's include directory to `include_directories`
* The unittest/Evaluate directory has unitests for flang-rt and Flang. A
new `CMakeLists.txt` was introduced for the flang-rt tests.
* Change the `#include` paths relative to the include directive
* clang-format on the `#include` directives
* Since the paths are part if the copyright header and include guards, a
script was used to canonicalize those
* `test/Runtime` and runtime tests in `test/Driver` are moved, but the
lit.cfg.py mechanism to execute the will only be added in #110217.
|