Age | Commit message (Collapse) | Author | Files | Lines |
|
NFC.
|
|
structs (#148947)
The handling of overlapped structure mapping in CGOpenMPRuntime.cpp can
lead to redundant zero-sized mapping nodes at runtime. This patch fixes
it using a combination of approaches: trivially adjacent struct members
won't have a mapping node created between them, and for more complicated
cases (inheritance) the physical layout of the struct/class is used to
make sure that elements aren't missed.
I've introduced a new class to track the state whilst iterating over the
struct. This reduces a bit of redundancy in the code (accumulating
CombinedInfo both during and after the loop), which I think is a bit
neater.
Before:
omptarget --> Entry 0: Base=0x00007fff8d483830, Begin=0x00007fff8d483830, Size=48, Type=0x20, Name=unknown
omptarget --> Entry 1: Base=0x00007fff8d483830, Begin=0x00007fff8d483830, Size=0, Type=0x1000000000003, Name=unknown
omptarget --> Entry 2: Base=0x00007fff8d483830, Begin=0x00007fff8d483834, Size=0, Type=0x1000000000003, Name=unknown
omptarget --> Entry 3: Base=0x00007fff8d483830, Begin=0x00007fff8d483838, Size=0, Type=0x1000000000003, Name=unknown
omptarget --> Entry 4: Base=0x00007fff8d483830, Begin=0x00007fff8d48383c, Size=20, Type=0x1000000000003, Name=unknown
omptarget --> Entry 5: Base=0x00007fff8d483830, Begin=0x00007fff8d483854, Size=0, Type=0x1000000000003, Name=unknown
omptarget --> Entry 6: Base=0x00007fff8d483830, Begin=0x00007fff8d483858, Size=0, Type=0x1000000000003, Name=unknown
omptarget --> Entry 7: Base=0x00007fff8d483830, Begin=0x00007fff8d48385c, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 8: Base=0x00007fff8d483830, Begin=0x00007fff8d483830, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 9: Base=0x00007fff8d483830, Begin=0x00007fff8d483834, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 10: Base=0x00007fff8d483830, Begin=0x00007fff8d483838, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 11: Base=0x00007fff8d483840, Begin=0x00005e7665275130, Size=32, Type=0x1000000000013, Name=unknown
omptarget --> Entry 12: Base=0x00007fff8d483830, Begin=0x00007fff8d483850, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 13: Base=0x00007fff8d483830, Begin=0x00007fff8d483854, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 14: Base=0x00007fff8d483830, Begin=0x00007fff8d483858, Size=4, Type=0x1000000000003, Name=unknown
After:
omptarget --> Entry 0: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f562e0, Size=48, Type=0x20, Name=unknown
omptarget --> Entry 1: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f562ec, Size=20, Type=0x1000000000003, Name=unknown
omptarget --> Entry 2: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f5630c, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 3: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f562e0, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 4: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f562e4, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 5: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f562e8, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 6: Base=0x00007fffd0f562f0, Begin=0x000058b6013fb130, Size=32, Type=0x1000000000013, Name=unknown
omptarget --> Entry 7: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f56300, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 8: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f56304, Size=4, Type=0x1000000000003, Name=unknown
omptarget --> Entry 9: Base=0x00007fffd0f562e0, Begin=0x00007fffd0f56308, Size=4, Type=0x1000000000003, Name=unknown
For code:
#include <cstdlib>
#include <cstdio>
struct S {
int x;
int y;
int z;
int *p1;
int *p2;
};
struct T : public S {
int a;
int b;
int c;
};
int main() {
T v;
v.p1 = (int*) calloc(8, sizeof(int));
v.p2 = (int*) calloc(8, sizeof(int));
#pragma omp target map(tofrom: v, v.x, v.y, v.z, v.p1[:8], v.a, v.b, v.c)
{
v.x++;
v.y += 2;
v.z += 3;
v.p1[0] += 4;
v.a += 7;
v.b += 5;
v.c += 6;
}
return 0;
}
|
|
Some `LangOptions` duplicate their `CodeGenOptions` counterparts. My
understanding is that this was done solely because some infrastructure
(like preprocessor initialization, serialization, module compatibility
checks, etc.) were only possible/convenient for `LangOptions`. This PR
implements the missing support for `CodeGenOptions`, which makes it
possible to remove some duplicate `LangOptions` fields and simplify the
logic. Motivated by https://github.com/llvm/llvm-project/pull/146342.
|
|
The refactored code would allow creating multiple member-of maps for the
same captured var, which would be useful for changes like
https://github.com/llvm/llvm-project/pull/145454.
|
|
|
|
clause (#134709)
Codegen support for reduction over private variable with reduction
clause. Section 7.6.10 in in OpenMP 6.0 spec.
- An internal shared copy is initialized with an initializer value.
- The shared copy is updated by combining its value with the values from
the private copies created by the clause.
- Once an encountering thread verifies that all updates are complete,
its original list item is updated by merging its value with that of the
shared copy and then broadcast to all threads.
Sample Test Case from OpenMP 6.0 Example
```
#include <assert.h>
#include <omp.h>
#define N 10
void do_red(int n, int *v, int &sum_v)
{
sum_v = 0; // sum_v is private
#pragma omp for reduction(original(private),+: sum_v)
for (int i = 0; i < n; i++)
{
sum_v += v[i];
}
}
int main(void)
{
int v[N];
for (int i = 0; i < N; i++)
v[i] = i;
#pragma omp parallel num_threads(4)
{
int s_v; // s_v is private
do_red(N, v, s_v);
assert(s_v == 45);
}
return 0;
}
```
Expected Codegen:
```
// A shared global/static variable is introduced for the reduction result.
// This variable is initialized (e.g., using memset or a UDR initializer)
// e.g., .omp.reduction.internal_private_var
// Barrier before any thread performs combination
call void @__kmpc_barrier(...)
// Initialization block (executed by thread 0)
// e.g., call void @llvm.memset.p0.i64(...) or call @udr_initializer(...)
call void @__kmpc_critical(...)
// Inside critical section:
// Load the current value from the shared variable
// Load the thread-local private variable's value
// Perform the reduction operation
// Store the result back to the shared variable
call void @__kmpc_end_critical(...)
// Barrier after all threads complete their combinations
call void @__kmpc_barrier(...)
// Broadcast phase:
// Load the final result from the shared variable)
// Store the final result to the original private variable in each thread
// Final barrier after broadcast
call void @__kmpc_barrier(...)
```
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
|
|
Apply the debug prefix mapping to the OpenMP location strings.
Fixes https://github.com/llvm/llvm-project/issues/82541
|
|
We can simplify the code with *Map::try_emplace where we need
default-constructed values while avoding calling constructors when
keys are already present.
|
|
|
|
(#135511)
|
|
This is an expensive header, only include it where needed. Move some
functions out of line to achieve that.
This reduces time to build clang by ~0.5% in terms of instructions
retired.
|
|
function to Triple (#126956)
I'm adding support for SPIR-V, so let's consolidate these checks.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
|
|
|
|
(NFC) (#128711)
pointer version (NFC)
Follow-up to #123569
|
|
(#124746)
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.
Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.
Depends on #121005
|
|
Add null checking.
|
|
|
|
As part of the "RemoveDIs" project, BasicBlock::iterator now carries a
debug-info bit that's needed when getFirstNonPHI and similar feed into
instruction insertion positions. Call-sites where that's necessary were
updated a year ago; but to ensure some type safety however, we'd like to
have all calls to moveBefore use iterators.
This patch adds a (guaranteed dereferenceable) iterator-taking
moveBefore, and changes a bunch of call-sites where it's obviously safe
to change to use it by just calling getIterator() on an instruction
pointer. A follow-up patch will contain less-obviously-safe changes.
We'll eventually deprecate and remove the instruction-pointer
insertBefore, but not before adding concise documentation of what
considerations are needed (very few).
|
|
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Pos to be nonnull.
|
|
Note that PointerUnion::dyn_cast has been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
Literal migration would result in dyn_cast_if_present (see the
definition of PointerUnion::dyn_cast), but this patch uses dyn_cast
because we expect Data to be nonnull.
|
|
This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs`
structure used to simplify passing default and constant values for
number of teams and threads, and possibly other target kernel-related
information in the future.
This is used to forward values passed to `createTarget` to
`createTargetInit`, which previously used a default unrelated set of
values.
|
|
The preprocessor definition used to enable asserts and the one that
`llvm::Error` and `llvm::Expected` use to ensure all created instances are
checked are not the same. By making these checks inside of an `assert` in cases
where errors are not expected, certain build configurations would trigger
runtime failures (e.g. `-DLLVM_ENABLE_ASSERTIONS=OFF
-DLLVM_UNREACHABLE_OPTIMIZE=ON`).
The `llvm::cantFail()` function, which was intended for this use case, is used
by this patch in place of `assert` to prevent these runtime failures. In tests,
new preprocessor definitions based on `ASSERT_THAT_EXPECTED` and
`EXPECT_THAT_EXPECTED` are used instead, to avoid silent failures in release
builds.
|
|
(#110001)
This patch migrates the OpenMP UserDefinedMapper codegen from Clang to
the OpenMPIRBuilder. I will be adding further patches in the near future
so that OpenMP dialect in MLIR can make use of these.
|
|
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
|
|
Note that PointerUnion::{is,get} have been soft deprecated in
PointerUnion.h:
// FIXME: Replace the uses of is(), get() and dyn_cast() with
// isa<T>, cast<T> and the llvm::dyn_cast<T>
I'm not touching PointerUnion::dyn_cast for now because it's a bit
complicated; we could blindly migrate it to dyn_cast_if_present, but
we should probably use dyn_cast when the operand is known to be
non-null.
|
|
construct (#117196)
Initial parsing/sema for 'strict' modifier with 'num_tasks' and
‘grainsize’ clause is present in these commits
[grainsize_parsing](https://github.com/llvm/llvm-project/commit/ab9eac762c35068e77f57795e660d06f578c9614)
and
[num_tasks_parsing](https://github.com/llvm/llvm-project/commit/56c166017055595a9f26933e85bfd89e30c528d0#diff-4184486638e85284c3a2c961a81e7752231022daf97e411007c13a6732b50db9R6545)
. However, this implementation appears incomplete as it lacks code
generation support. A runtime patch was introduced in this runtime
commit
[runtime_patch](https://github.com/llvm/llvm-project/commit/540007b42701b5ac9adba076824bfd648a265413#diff-5e95f9319910d6965d09c301359dbe6b23f3eef5ce4d262ef2c2d2137875b5c4R374)
, which adds a new API, _kmpc_taskloop_5, to accommodate the strict
modifier.
In this patch I have added codegen support. When the strict modifier is
present alongside the grainsize or num_tasks clauses of taskloop
construct, the code now emits a call to _kmpc_taskloop_5, which includes
an additional parameter of type i32 with the value 1 to indicate the
strict modifier. If the strict modifier is not present, it falls back to
the existing _kmpc_taskloop API call.
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
|
|
Identified with misc-include-cleaner.
|
|
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.
This prevents a failed callback to leave the IR in an invalid state and
still continue the codegen process, triggering unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue always returning 'success'.
|
|
Follow up to #109133.
|
|
Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
|
|
`llvm::Type::getPointerTo()` is to be deprecated & removed soon.
|
|
(#109431)
Detect by clang-tidy misc-use-internal-linkage
|
|
|
|
Try to avoid excess layer of indirection when possible.
p.s. Remove a call to raw_string_ostream::flush() which is a no-op.
|
|
|
|
I'm planning to deprecate DenseMap::FindAndConstruct in favor of
DenseMap::operator[].
|
|
I'm planning to deprecate DenseMap::FindAndConstruct in favor of
DenseMap::operator[].
|
|
(#102717)
|
|
(#102715)
|
|
This patch adds the code generation support for multi-dim `num_teams`
clause when it is used with `target teams ompx_bare` construct.
|
|
By the OpenMP standard, `num_teams` clause can only accept one
expression (for now). In this patch, we extend it to allow to accept
multiple expressions when it is used with `target teams ompx_bare`
construct. This will allow to launch a multi-dim grid, same as CUDA/HIP.
|
|
This is for mapping structure has data members, which have 'default'
mappers, where needs to map these members individually using
their 'default' mappers.
example map(tofrom: spp[0][0]), look at test case.
currently create 6 maps:
1>&spp, &spp[0], size 8, maptype TARGET_PARAM | FROM | TO
2>&spp[0], &spp[0][0], size(D)with maptype OMP_MAP_NONE, nullptr
3>&spp[0], &spp[0][0].e, size(e) with maptype MEMBER_OF | FROM | TO
4>&spp[0], &spp[0][0].h, size(h) with maptype MEMBER_OF | FROM | TO
5>&spp, &spp[0],size(8), maptype MEMBER_OF | IMPLICIT | FROM | TO
6>&spp[0], &spp[0][0].f size(D) with maptype MEMBER_OF |IMPLICIT
|PTR_AND_OBJ, @.omp_mapper._ZTS1C.default
maptype with/without OMP_MAP_PTR_AND_OBJ
For "2" and "5", since it is mapping pointer and pointee pair,
PTR_AND_OBJ should be set
But for "6" the PTR_AND_OBJ should not set.
However, "5" is duplicate with "1" can be skip.
To fix "2", during the call to emitCombinEntry with false with
NotTargetParams
instead !PartialStruct.PreliminaryMapData.BasePointers.empty(), since
all captures need to be TARGET_PARAM
And inside emitCombineEntry: check
!PartialStruct.PreliminaryMapData.BasePointers.empty() to set
PTR_AND_OBJ
For "5" and "6": the fix in generateInfoForComponentList:
Add new variable IsPartialMapped set with
!PartialStruct.PreliminaryMapData.BasePointers.empty();
When that is true, skip generate "5" and don"t set IsExpressionFirstInfo
to false, so that PTR_AND_OBJ would be set.
After fix: will have 5 maps instead 6
1>&spp, &spp[0], size 8, maptype TARGET_PARAM | FROM | TO
2>&spp[0], &spp[0][0], size(D), maptype PTR_AND_OBJ, nullptr
3>&spp[0], &spp[0][0].e, size(e), maptype MEMBER_OF_2 | FROM | TO
4>&spp[0], &spp[0][0].h, size(h), maptype MEMBER_OF_2 | FROM | TO
5>&spp[0], &spp[0][0].f size(32), maptype MEMBER_OF_2 | IMPLICIT,
@.omp_mapper._ZTS1C.default
For map(sppp[0][0][0]):
after fix: will have 6 maps instead 8.
https://github.com/llvm/llvm-project/pull/101903
|
|
It returns a range of variables (via Expr*), not a range of lists.
|
|
`emitOffloadingArraysArgument` in OpenMPIRBuilder (#97088)
This patch introduces a new interface in `OpenMPIRBuilder` that combines
the creation of the so-called offloading pointer arrays and their
subsequent preparation as arguments to the OpenMP runtime library. We
then use this in Clang.
This is intended to be used in the near future
by other frontends such as Flang when lowering MLIR to LLVMIR.
|
|
This was reported in https://pvs-studio.com/en/blog/posts/cpp/1126/,
fragment N9.
V523 The 'then' statement is equivalent to the subsequent code fragment.
CGOpenMPRuntime.cpp:6040, 6036
---------
Co-authored-by: Shivam Gupta <shivma98.tkg@gmail.com>
|
|
space (#99347)
The expectation for multiple iterators used in a single depend clause
(`depend(iterator(i=0:5,j=0:5), in:x[i][j])`) is that the iterator space
is the product of the iteration vectors (25 in that case). The current
codeGen only works correctly, if `numIterators() = 1`. For more
iterators, the execution results in runtime assertions or segfaults.
The modified codeGen first calculates the iteration space, then
multiplies to the number of dependencies in the depend clause and
finally adds to the total number of iterator dependencies.
|
|
This is a follow-up from the conversation starting at
https://github.com/llvm/llvm-project/pull/93809#issuecomment-2173729801
The root problem that motivated the change are external AST sources that
compute `ASTRecordLayout`s themselves instead of letting Clang compute
them from the AST. One such example is LLDB using DWARF to get the
definitive offsets and sizes of C++ structures. Such layouts should be
considered correct (modulo buggy DWARF), but various assertions and
lowering logic around the `CGRecordLayoutBuilder` relies on the AST
having `[[no_unique_address]]` attached to them. This is a
layout-altering attribute which is not encoded in DWARF. This causes us
LLDB to trip over the various LLVM<->Clang layout consistency checks.
There has been precedent for avoiding such layout-altering attributes
from affecting lowering with externally-provided layouts (e.g., packed
structs).
This patch proposes to replace the `isZeroSize` checks in
`CGRecordLayoutBuilder` (which roughly means "empty field with
[[no_unique_address]]") with checks for
`CodeGen::isEmptyField`/`CodeGen::isEmptyRecord`.
**Details**
The main strategy here was to change the `isZeroSize` check in
`CGRecordLowering::accumulateFields` and
`CGRecordLowering::accumulateBases` to use the `isEmptyXXX` APIs
instead, preventing empty fields from being added to the `Members` and
`Bases` structures. The rest of the changes fall out from here, to
prevent lookups into these structures (for field numbers or base
indices) from failing.
Added `isEmptyRecordForLayout` and `isEmptyFieldForLayout` (open to
better naming suggestions). The main difference to the existing
`isEmptyRecord`/`isEmptyField` APIs, is that the `isEmptyXXXForLayout`
counterparts don't have special treatment for `unnamed bitfields`/arrays
and also treat fields of empty types as if they had
`[[no_unique_address]]` (i.e., just like the `AsIfNoUniqueAddr` in
`isEmptyField` does).
|
|
While lowering (#pragma omp target update from), clang's generated
.omp_task_entry. is setting up 9 arguments while calling
__tgt_target_data_update_nowait_mapper.
At the same time, in __tgt_target_data_update_nowait_mapper, call to
targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming
it has the argument count listed in the signature.
AARCH64 asm sequence for this is as follows (removed unrelated insns):
`
.omp_task_entry..108:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8. // stack canary
str xzr, [sp]
bl __tgt_target_data_update_nowait_mapper
__tgt_target_data_update_nowait_mapper:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8 // stack canary
// Sibcall argument setup
adrp x8,
:got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8,
:got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
stp x9, x8, x29, #16
adrp x8, .L.str.8
add x8, x8, :lo12:.L.str.8
str x8, x29, #32. <==. This is the insn that erases $fp
ldp x29, x30, sp, #16 // 16-byte Folded Reload
add sp, sp, #32
// Sibcall
b
ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`
On AArch64, call to __tgt_target_data_update_nowait_mapper in
.omp_task_entry. sets up only single space on stack and this results in
ovewriting $fp and subsequent stack corruption. This issue can be
credited to discrepancy of __tgt_target_data_update_nowait_mapper
signature in openmp/libomptarget/include/omptarget.h taking 13 arguments
while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.
This patch modifies __tgt_target_data_update_nowait_mapper signature to
match .omp_task_entry usage(and other 2 files mentioned above).
Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
|
|
Fix another runtime problem when explicit map both pointer and pointee
in target data region.
In #92210, problem is only addressed in target region, but missing for
target data region.
The change just passing AreBothBasePtrAndPteeMapped in
generateInfoForComponentList when processing target data.
---------
Co-authored-by: Alexey Bataev <a.bataev@gmx.com>
|
|
This patch fixes the dynamic schedule tracking.
|