|
The atomic construct is a particularly complicated one. The directive
itself is pretty simple: it has 5 options for the 'atomic-clause'.
However, the associated statement is fairly complicated.
'read' accepts:
v = x;
'write' accepts:
x = expr;
'update' (or no clause) accepts:
x++;
x--;
++x;
--x;
x binop= expr;
x = x binop expr;
x = expr binop x;
'capture' accepts either a compound statement, or:
v = x++;
v = x--;
v = ++x;
v = --x;
v = x binop= expr;
v = x = x binop expr;
v = x = expr binop x;
If 'capture' has a compound statement, it accepts:
{v = x; x binop= expr; }
{x binop= expr; v = x; }
{v = x; x = x binop expr; }
{v = x; x = expr binop x; }
{x = x binop expr; v = x; }
{x = expr binop x; v = x; }
{v = x; x = expr; }
{v = x; x++; }
{v = x; ++x; }
{x++; v = x; }
{++x; v = x; }
{v = x; x--; }
{v = x; --x; }
{x--; v = x; }
{--x; v = x; }
While these are all quite complicated, there is a significant amount
of similarity between the 'capture' and 'update' lists, so this patch
reuses a lot of the same functions.
This patch implements the entirety of 'atomic', creating a new Sema file
for the sema for it, as it is fairly sizable.
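For reference, here is a minimal sketch showing one accepted form per
clause ('x', 'v', and 'expr' are placeholders, not from the patch):
```c
/* Minimal sketch: one accepted form per clause. */
void atomic_forms(void) {
  int x = 0, v = 0, expr = 5;
#pragma acc atomic read
  v = x;
#pragma acc atomic write
  x = expr;
#pragma acc atomic update
  x += expr; /* x binop= expr */
#pragma acc atomic capture
  { v = x; x += expr; } /* compound statement form */
  (void)v;
}
```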
|
|
Refactoring of how __builtin_dynamic_object_size() is calculated for
flexible array members (in preparation for adding support for the
'counted_by' attribute on pointers in structs).
The only functionality change is that we use the already-emitted Expr
code as the basis for our calculations rather than re-emitting the Expr.
That allows the 'StructFieldAccess' visitor to sift through all casts
and ArraySubscriptExprs to find the first MemberExpr. We build our GEPs
and calculate offsets based on relative distances from that MemberExpr.
The testcase passes execution tests.
Calculate the flexible array member's object size using these formulae
(note: if the calculation is negative, we return 0):
struct p;
struct s {
/* ... */
int count;
struct p *array[] __attribute__((counted_by(count)));
};
1) 'ptr->array':
count = ptr->count;
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
if (flexible_array_member_size < 0)
return 0;
return flexible_array_member_size;
2) '&ptr->array[idx]':
count = ptr->count;
index = idx;
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
index_size = index * flexible_array_member_base_size;
if (flexible_array_member_size < 0 || index < 0)
return 0;
return flexible_array_member_size - index_size;
3) '&ptr->field':
count = ptr->count;
sizeof_struct = sizeof (struct s);
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
field_offset = offsetof (struct s, field);
offset_diff = sizeof_struct - field_offset;
if (flexible_array_member_size < 0)
return 0;
return offset_diff + flexible_array_member_size;
4) '&ptr->field_array[idx]':
count = ptr->count;
index = idx;
sizeof_struct = sizeof (struct s);
flexible_array_member_base_size = sizeof (*ptr->array);
flexible_array_member_size =
count * flexible_array_member_base_size;
field_base_size = sizeof (*ptr->field_array);
field_offset = offsetof (struct s, field);
field_offset += index * field_base_size;
offset_diff = sizeof_struct - field_offset;
if (flexible_array_member_size < 0 || index < 0)
return 0;
return offset_diff + flexible_array_member_size;
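As a usage sketch (simplified to an int flexible array member; case 2
above), a caller might query the remaining space like so:
```c
/* Hypothetical usage sketch for case 2 above, simplified to an int
   flexible array member. */
#include <stddef.h>

struct s {
  int count;
  int array[] __attribute__((counted_by(count)));
};

size_t remaining_bytes(struct s *ptr, int idx) {
  /* Evaluates to count * sizeof(int) - idx * sizeof(int), or 0 if
     either value is negative. */
  return __builtin_dynamic_object_size(&ptr->array[idx], 1);
}
```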
---------
Signed-off-by: Bill Wendling <morbo@google.com>
|
|
(#124767)
This patch contains a number of changes relating to the above flag;
primarily, it updates comment references to the old flag names,
"-fextend-lifetimes" and "-fextend-this-ptr", to refer to the new names,
"-fextend-variable-liveness[={all,this}]". These changes are all NFC.
This patch also removes the explicit -fextend-this-ptr-liveness flag
alias, and shortens the help-text for the main flag; these are both
changes that were meant to be applied in the initial PR (#110000), but
due to some user-error on my part they were not included in the merged
commit.
|
|
Following the previous patch which adds the "extend lifetimes" flag
without (almost) any functionality, this patch adds the real feature by
allowing Clang to emit fake uses. These are emitted as a new form of cleanup,
set for variable addresses, each of which simply emits a fake use intrinsic
when its variable falls out of scope. The code for achieving this is simple,
with most of the logic centered on determining whether to emit a fake use
for a given address, and on ensuring that fake uses are ignored in a few
cases.
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
|
|
This patch adds the following intrinsics:
* Fused multiply-add non-indexed
float16x8_t vmlalbq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float16x8_t vmlaltq_f16_mf8_fpm(float16x8_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallbbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallbtq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlalltbq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
float32x4_t vmlallttq_f32_mf8_fpm(float32x4_t, mfloat8x16_t, mfloat8x16_t, fpm_t)
* Floating-point multiply-add long to half-precision (vector, by element)
float16x8_t vmlalbq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlalbq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vmlaltq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
* Floating-point multiply-add long-long to single-precision (vector, by element)
float32x4_t vmlallbbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallbtq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlalltbq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vmlallttq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
|
|
This patch adds the following intrinsics:
float16x4_t vdot_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)
float16x8_t vdotq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)
float16x4_t vdot_lane_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x4_t vdot_laneq_f16_mf8_fpm(float16x4_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_lane_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float16x8_t vdotq_laneq_f16_mf8_fpm(float16x8_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x2_t vdot_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, fpm_t fpm)
float32x4_t vdotq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, fpm_t fpm)
float32x2_t vdot_lane_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x2_t vdot_laneq_f32_mf8_fpm(float32x2_t vd, mfloat8x8_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_lane_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x8_t vm, __builtin_constant_p(lane), fpm_t fpm)
float32x4_t vdotq_laneq_f32_mf8_fpm(float32x4_t vd, mfloat8x16_t vn, mfloat8x16_t vm, __builtin_constant_p(lane), fpm_t fpm)
|
|
The patch adds the following intrinsics:
bfloat16x8_t vcvt1_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
bfloat16x8_t vcvt1_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
bfloat16x8_t vcvt2_bf16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
bfloat16x8_t vcvt2_low_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
bfloat16x8_t vcvt1_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
bfloat16x8_t vcvt2_high_bf16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
float16x8_t vcvt1_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
float16x8_t vcvt1_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
float16x8_t vcvt2_f16_mf8_fpm(mfloat8x8_t vn, fpm_t fpm)
float16x8_t vcvt2_low_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
float16x8_t vcvt1_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
float16x8_t vcvt2_high_f16_mf8_fpm(mfloat8x16_t vn, fpm_t fpm)
mfloat8x8_t vcvt_mf8_f32_fpm(float32x4_t vn, float32x4_t vm, fpm_t fpm)
mfloat8x16_t vcvt_high_mf8_f32_fpm(mfloat8x8_t vd, float32x4_t vn, float32x4_t vm, fpm_t fpm)
mfloat8x8_t vcvt_mf8_f16_fpm(float16x4_t vn, float16x4_t vm, fpm_t fpm)
mfloat8x16_t vcvtq_mf8_f16_fpm(float16x8_t vn, float16x8_t vm, fpm_t fpm)
Co-Authored-By: Caroline Concatto <caroline.concatto@arm.com>
|
|
(#124291)
As part of the "RemoveDIs" work to eliminate debug intrinsics, we're
replacing methods that use Instruction*'s as positions with iterators.
This patch changes some more complex call-sites, those crossing file
boundaries and where I've had to perform some minor rewrites.
|
|
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.
The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.
For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
kernel();
}
```
and the following call:
```
struct Kernel {
int dm1;
int dm2;
void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```
the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
Kernel kernel{dm1, dm2};
kernel();
}
```
Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.
These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
TemplateArgument type 'kernel_name'
TemplateArgument type 'Kernel'
ParmVarDecl kernel 'Kernel'
SYCLKernelCallStmt
CompoundStmt
<original statements>
OutlinedFunctionDecl
ImplicitParamDecl 'dm1' 'int'
ImplicitParamDecl 'dm2' 'int'
CompoundStmt
VarDecl 'kernel' 'Kernel'
<initialization of 'kernel' with 'dm1' and 'dm2'>
<transformed statements with redirected references of 'kernel'>
```
Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.
Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
|
|
taskloop simd. (#121746)
Added codegen support for the combined masked construct `parallel masked
taskloop simd`.
Added implementation for `EmitOMPParallelMaskedTaskLoopSimdDirective`.
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
|
|
- Adding the changes from PRs:
- #116331
- #121852
- Fixes test `tools/dxil-dis/debug-info.ll`
- Address some missed comments in the previous PR
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
|
|
Added codegen support for the combined masked construct `masked taskloop`.
Added implementation for `EmitOMPMaskedTaskLoopDirective`.
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
|
|
(#121916)
Added codegen support for the combined masked construct `masked taskloop
simd`.
Added implementation for `EmitOMPMaskedTaskLoopSimdDirective`.
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
|
|
(exactly one sanitizer is required) (#122511)
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
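As a toy illustration of the design point (these are not Clang's actual
declarations), an ordinal makes "exactly one sanitizer" a property of
the parameter type, where a mask cannot:
```cpp
// Toy illustration only -- not Clang's real API. A mask may have zero or
// more bits set, so "exactly one" can only be checked at run time:
#include <cstdint>
using SanitizerMask = std::uint64_t; // bitset: 0..N sanitizers

// An ordinal names exactly one sanitizer by construction, so passing
// "several sanitizers" is no longer representable at this parameter.
enum class SanitizerOrdinal : std::uint8_t { Bool, Enum, SignedOverflow };

void emitCheck(SanitizerOrdinal o); // exactly one, enforced by the type
```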
|
|
taskloop (#121741)
Added codegen support for the combined masked construct `parallel masked
taskloop`.
Added implementation for `EmitOMPParallelMaskedTaskLoopDirective`.
---------
Co-authored-by: Chandra Ghale <ghale@pe31.hpc.amslabs.hpecorp.net>
|
|
`CounterPair` can hold `<uint32_t, uint32_t>` instead of the current
`unsigned`, so it can also hold the counter number of the skip path. For
now, this change provides the skeleton, and only `CounterPair::Executed`
is used.
Each counter number can be `None` to suppress emitting the counter
increment. The 2nd element, `Skipped`, is initialized to `None` by
default, since most `Stmt*` don't have a pair of counters.
This change also provides stubs for the verifier. I'll provide the
implementation of the verifier for `+Asserts` later.
`markStmtAsUsed(bool, Stmt*)` may be used to inform that the counter on
the other side may not be emitted.
`markStmtMaybeUsed(S)` may be used for a `Stmt` whose inner statements
will be excluded from emission when it is skipped due to constant
folding. I have added it in the places where I found it applicable.
`verifyCounterMap()` will check the coverage map against the counter
map, and can be used to report inconsistencies.
These verifier methods are compiled out in `-Asserts` builds.
https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
|
|
This executable construct has a larger list of clauses than some of the
others, and has some additional restrictions. This patch implements
the AST node, plus the 'cannot be the body of an if, while, do, switch,
or label statement' restriction. Future patches will handle the
rest of the restrictions, which are based on clauses.
|
|
The 'set' construct is another fairly simple one: it has no associated
statement and only a handful of allowed clauses. This patch implements
it and all the rules for it, enabling 3 of its 4 clauses.
The only exception is default_async, which will be implemented in a
future patch, because it isn't just being enabled; it needs a completely
new implementation.
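An illustrative use of the three enabled clauses (clause arguments are
placeholders):
```c
/* Illustrative only: the three 'set' clauses enabled by this patch;
   clause arguments are placeholders. */
void configure(int ndev) {
#pragma acc set device_type(host) device_num(0) if(ndev > 0)
}
```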
|
|
- Adding Flatten and Branch to if stmt.
- Adding dxil control flow hint metadata generation.
- Modifying spirv OpSelectMerge to account for the specific attributes.
Closes #70112
---------
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
|
|
- Update pr labeler so new SPIRV files get properly labeled.
- Add distance target builtin to BuiltinsSPIRV.td.
- Update TargetBuiltins.h to account for spirv builtins.
- Update clang basic CMakeLists.txt to build spirv builtin tablegen.
- Hook up sema for SPIRV in Sema.h|cpp, SemaSPIRV.h|cpp, and
SemaChecking.cpp.
- Hook up spirv target builtins to the SPIR.h|SPIR.cpp target.
- Update CGBuiltin.cpp to emit spirv intrinsics when we get the expected
spirv target builtin.
Consensus was reached in this RFC to add both target builtins and pattern
matching:
https://discourse.llvm.org/t/rfc-add-targetbuiltins-for-spirv-to-support-hlsl/83329.
Pattern matching will come in a separate PR; this one just sets up the
groundwork to do target builtins for spirv.
Partially resolves
[#99107](https://github.com/llvm/llvm-project/issues/99107)
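A hypothetical call site for the new distance builtin; the builtin name
below is assumed from the BuiltinsSPIRV.td addition and may differ:
```c
// Assumed name from the BuiltinsSPIRV.td change; illustrative only.
typedef float float4 __attribute__((ext_vector_type(4)));

float dist(float4 a, float4 b) {
  return __builtin_spirv_distance(a, b);
}
```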
|
|
These two constructs are very simple and similar, and only support 3
different clauses, two of which are already implemented. This patch
adds AST nodes for both constructs, and leaves the device_num clause
unimplemented, but enables the other two.
|
|
(#120…464)" (#120511)
This reverts commit 2691b964150c77a9e6967423383ad14a7693095e. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).
This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.
----
Original commit message:
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option of selectively
applying non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
|
|
This reverts commit 7eaf4708098c216bf432fc7e0bc79c3771e793a4.
Reason: buildbot breakage (e.g.,
https://lab.llvm.org/buildbot/#/builders/144/builds/14299/steps/6/logs/FAIL__Clang__ubsan-trap-debugloc_c)
|
|
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option of selectively
applying non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
|
|
The arguments to this are the same as for the 'wait' clause, so this
reuses all of that infrastructure. All it has to do beyond that is
support a pair of clauses that are already implemented (if and async)
and create an AST node. This patch does so and adds proper testing.
|
|
These constructs are all very similar and closely related, so this patch
creates the AST nodes for them, along with serialization, printing, etc.
Additionally, the restrictions are all recorded as TODOs in the tests,
as they will have to be enforced once the relevant clauses are
implemented.
|
|
Currently we have code with target hooks in CodeGenModule shared between
X86 and AArch64 for sorting MultiVersionResolverOptions. Those are used
when generating IFunc resolvers for FMV. The RISCV target has different
criteria for sorting, therefore it repeats sorting after calling
CodeGenFunction::EmitMultiVersionResolver.
I am moving the FMV priority logic into TargetInfo, so that it can be
implemented by the TargetParser, which then makes it possible to query
it from llvm. Here is an example of why this is handy:
https://github.com/llvm/llvm-project/pull/87939
|
|
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.
We can support `__builtin_cpu_is` by comparing the values in the
compiler's CPU definitions with `__riscv_cpu_model`.
This depends on #116202.
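A hypothetical use (the processor name is illustrative; it must match
one of the compiler's RISC-V CPU definitions):
```c
/* Illustrative only: dispatch on a known RISC-V CPU model. */
void pick_path(void) {
  if (__builtin_cpu_is("sifive-u74")) {
    /* tuned code path */
  } else {
    /* generic code path */
  }
}
```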
Reviewers: lenary, BeMg, kito-cheng, preames, lukel97
Reviewed By: lenary
Pull Request: https://github.com/llvm/llvm-project/pull/116231
|
|
This reverts commit b36fcf4f493ad9d30455e178076d91be99f3a7d8.
This reverts commit c11b6b1b8af7454b35eef342162dc2cddf54b4de.
This reverts commit 775148f2367600f90d28684549865ee9ea2f11be.
Reason: multiple bot build breakages, e.g.
https://lab.llvm.org/buildbot/#/builders/3/builds/8076
|
|
We have defined `__riscv_cpu_model` variable in #101449. It contains
`mvendorid`, `marchid` and `mimpid` fields which are read via system
call `sys_riscv_hwprobe`.
We can support `__builtin_cpu_is` by comparing the values in the
compiler's CPU definitions with `__riscv_cpu_model`.
This depends on #116202.
Reviewers: lenary, BeMg, kito-cheng, preames, lukel97
Reviewed By: lenary
Pull Request: https://github.com/llvm/llvm-project/pull/116231
|
|
Combined constructs (OpenACC 3.3, section 2.11) are a shortcut for
writing a `loop` construct immediately inside of a `compute` construct.
However, this interaction requires that we do additional work to ensure
that we get the semantics between the two correct, as well as the
diagnostics.
This patch adds the semantic analysis for the constructs (but no
clauses), as well as the AST nodes.
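For example, the combined `parallel loop` construct in this sketch is
shorthand for a `loop` immediately inside a `parallel` construct:
```c
/* 'parallel loop' is shorthand for a 'loop' immediately inside a
   'parallel' compute construct. */
void saxpy(int n, float a, const float *x, float *y) {
#pragma acc parallel loop
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}
```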
|
|
The __builtin_counted_by_ref builtin is used on a flexible array
pointer and returns a pointer to the "counted_by" attribute's COUNT
argument, which is a field in the same non-anonymous struct as the
flexible array member. This is useful for automatically setting the
count field without needing the programmer's intervention. Otherwise
it's possible to get this anti-pattern:
ptr = alloc(<ty>, ..., COUNT);
ptr->FAM[9] = 42; /* <<< Sanitizer will complain */
ptr->count = COUNT;
To prevent this anti-pattern, the user can create an allocator that
automatically performs the assignment:
#define alloc(TY, FAM, COUNT) ({ \
TY __p = alloc(get_size(TY, COUNT)); \
if (__builtin_counted_by_ref(__p->FAM)) \
*__builtin_counted_by_ref(__p->FAM) = COUNT; \
__p; \
})
The builtin's behavior is heavily dependent upon the "counted_by"
attribute existing. Its main utility is during allocation, to avoid
the above anti-pattern. If the flexible array member doesn't have that
attribute, the builtin becomes a no-op. Therefore, if the flexible
array member has a "count" field not referenced by "counted_by", it
must be set explicitly after the allocation, as this builtin will
return a "nullptr" and the assignment will most likely be elided.
---------
Co-authored-by: Bill Wendling <isanbard@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
|
|
- Adding hlsl `splitdouble` intrinsics
- Adding DXIL lowering
- Adding SPIRV lowering
- Adding test
Fixes: #108901
---------
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
|
|
Follow up to #109133.
|
|
Make Constant Arrays in HLSL assignable.
Closes #109043
|
|
I missed a codepath during PR108008, so SME2/SVE2p1 builtins are
converting their struct return type into a large vector, which is
causing unnecessary casting via memory.
|
|
(#108853)
On some targets, an FP libcall with argument types such as long double
will be lowered to pass arguments indirectly via pointers. When this is
the case we should not mark the libcall with "int" TBAA as it may lead
to incorrect optimizations.
Currently, this can be seen for long doubles on x86_64-w64-mingw32. The
`load x86_fp80` after the call is (incorrectly) marked with "int" TBAA
(overwriting the previous metadata for "long double").
Nothing seems to break due to this currently as the metadata is being
incorrectly placed on the load and not the call. But if the metadata
is moved to the call (which this patch ensures), LLVM will optimize out
the setup for the arguments.
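A sketch of the kind of code affected, assuming a target such as
x86_64-w64-mingw32 where the long double result is returned indirectly:
```c
/* Sketch of an affected pattern: this lowers to an FP libcall whose
   long double result is returned indirectly; the subsequent x86_fp80
   load must keep its "long double" TBAA rather than being tagged
   "int". */
long double rem(long double x, long double y) {
  return __builtin_fmodl(x, y);
}
```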
|
|
This patch extends [D34590](https://reviews.llvm.org/D34590) to check
assumption violations.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
|
|
Added codegen for scope directive, enabled allocate and firstprivate
clauses, and added scope directive LIT test.
Testing
- LIT tests (including new scope test).
- OpenMP scope example test from 5.2 OpenMP API examples document.
- Three executable scope tests from OpenMP_VV/sollve_vv suite.
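A minimal sketch of the directive with the newly enabled clauses:
```c
/* Sketch of 'scope' with the newly enabled clauses. */
void work(void) {
  int x = 1;
#pragma omp scope firstprivate(x) allocate(x)
  {
    x += 1; /* updates this thread's firstprivate copy of 'x' */
  }
}
```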
|
|
based. (#108008)
This implements our original design now that LLVM is comfortable with
structs and arrays of scalable vector types. All SVE ACLE intrinsics
already use struct types so the effect of this change is purely the
types used for alloca and function parameters.
There should be no C/C++ user visible change with this patch.
|
|
This patch enables function multiversioning (FMV) and the
`target_clones` attribute for the RISC-V target.
The proposal for the `target_clones` syntax can be found at
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/48 (which has
landed), as modified by the proposed
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/85 (which adds the
priority syntax).
An ifunc resolver is generated for each function declared with the
target_clones attribute. The resolver checks which versions are
supported via the runtime object `__riscv_feature_bits`.
For example:
```
__attribute__((target_clones("default", "arch=+ver1", "arch=+ver2"))) int bar() {
return 1;
}
```
the corresponding resolver will be like:
```
bar.resolver() {
__init_riscv_feature_bits();
// Check arch=+ver1
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION1) == BITMASK_OF_VERSION1) {
return bar.arch=+ver1;
} else {
// Check arch=+ver2
if ((__riscv_feature_bits.features[0] & BITMASK_OF_VERSION2) == BITMASK_OF_VERSION2) {
return bar.arch=+ver2;
} else {
// Default
return bar.default;
}
}
}
```
|
|
Implement support for the HLSL intrinsic `select`.
This closes issue #75377.
|
|
[[clang::coro_await_elidable]] (#99282)
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of an await-elidable task
gives developers and library authors certainty that coroutine heap
elision happens in a predictable way.
Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` are visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.
`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible; however,
it doesn't give us predictability about when we can expect elision to
happen.
The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types have control over the structured concurrency guarantees we demand
for elision to happen. That is, the lifetime of the callee's frame must
be shorter than that of the caller.
``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.
When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.
From the C++ perspective, this makes sense because we can ensure that
the lifetime of the elided callee cannot exceed that of the caller if we
can guarantee that the caller coroutine is never destroyed earlier than
the callee coroutine. This is not generally true for arbitrary C++
programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.
After this patch, when compiling coroutines that return a type with such
an attribute, the frontend checks the type of the operand of `co_await`
expressions (not `operator co_await`). If it's also attributed with
`[[clang::coro_await_elidable]]`, the FE emits metadata on the call or
invoke instruction as a hint for a later middle-end pass to perform the
elision.
The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.
The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
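For reference, a minimal sketch of where the attribute is written (a
real `Task` additionally needs the usual coroutine promise machinery):
```cpp
// Sketch only: the attribute is applied to the coroutine return type.
struct [[clang::coro_await_elidable]] Task {
  // promise_type, awaiters, etc. omitted for brevity
};
```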
|
|
HLSL output parameters are denoted with the `inout` and `out` keywords
in the function declaration. When an argument to an output parameter is
constructed, a temporary value is constructed for the argument.
For `inout` parameters the argument is initialized via copy-initialization
from the argument lvalue expression to the parameter type. For `out`
parameters the argument is not initialized before the call.
In both cases on return of the function the temporary value is written
back to the argument lvalue expression through an implicit assignment
binary operator with casting as required.
This change introduces a new HLSLOutArgExpr ast node which represents
the output argument behavior. The OutArgExpr has three defined children:
- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the first with the value of the second.
Fixes #87526
---------
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: John McCall <rjmccall@gmail.com>
|
|
As per [1] the indices for a matrix element access operator shall have
integral or unscoped enumeration types and be non-negative. At the
moment, the index expression is converted to SizeType irrespective of
the signedness of the index expression. This causes implicit sign
conversion warnings if any of the indices is signed.
As per the spec, using signed types as indices is allowed and should not
cause any warnings. If the index expression is signed, extend to
SignedSizeType to avoid the warning.
[1]
https://clang.llvm.org/docs/MatrixTypes.html#matrix-type-element-access-operator
PR: https://github.com/llvm/llvm-project/pull/103044
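A sketch of the case this fixes (compile with -fenable-matrix):
```c
/* Before this change the signed 'int' indices were converted to
   SizeType, triggering -Wsign-conversion; now they are extended to
   SignedSizeType instead. */
typedef float m4x4_t __attribute__((matrix_type(4, 4)));

float get(m4x4_t m, int i, int j) {
  return m[i][j];
}
```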
|
|
This patch creates an additional EmitRISCVCpuSupports function to handle
situations with multiple features. It also modifies the original
EmitRISCVCpuSupports function to invoke the new one.
|