Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
`TargetLibraryInfoImpl::addVectorizableFunctionsFromVecLib` has a lot of
data in local stack arrays, which MSVC keeps on the stack even in
release builds. To reduce stack usage, the data arrays (which are
const), are moved outside the function as statics. This drops the method
stack usage to be negligible.
|
|
|
|
Indirect call handling missed setting an `EntryDiscriminator` while it's
set for direct calls and tail calls.
Improve YAML profile accuracy by unifying the destination setting
between direct and indirect calls into `setCSIDestination` method.
Depends on: https://github.com/llvm/llvm-project/pull/86848
Test Plan: Updated bolt/test/X86/yaml-secondary-entry-discriminator.s
Reviewers: ayermolo, maksfb, rafaelauler
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/82128
|
|
Avoid deadlocks in the Alarm class by releasing the lock before invoking
callbacks. This deadlock manifested itself in the ProgressManager:
1. On the main thread, the ProgressManager acquires its lock in
ProgressManager::Decrement and calls Alarm::Create.
2. On the main thread, the Alarm acquires its lock in Alarm::Create.
3. On the alarm thread, the Alarm acquires its lock after waiting on the
condition variable and calls ProgressManager::Expire.
4. On the alarm thread, the ProgressManager acquires its lock in
ProgressManager::Expire.
Note how the two threads are acquiring the locks in different orders.
Deadlocks can be avoided by always acquiring locks in the same order,
but since the two mutexes here are private implementation details,
belong to different classes, that's not straightforward. Luckily, we
don't need to have the Alarm mutex locked when invoking the callbacks.
That exactly how this patch solves the issue.
|
|
GNU and MSVC have extensions where flexible array members (or their
equivalent) can be in unions or alone in structs. This is already fully
supported in Clang through the 0-sized array ("fake flexible array")
extension or when C99 flexible array members have been syntactically
obfuscated.
Clang needs to explicitly allow these extensions directly for C99
flexible arrays, since they are common code patterns in active use by
the
Linux kernel (and other projects). Such projects have been using either
0-sized arrays (which is considered deprecated in favor of C99 flexible
array members) or via obfuscated syntax, both of which complicate their
code bases.
For example, these do not error by default:
```
union one {
int a;
int b[0];
};
union two {
int a;
struct {
struct { } __empty;
int b[];
};
};
```
But this does:
```
union three {
int a;
int b[];
};
```
Remove the default error diagnostics for this but continue to provide
warnings under Microsoft or GNU extensions checks. This will allow for
a seamless transition for code bases away from 0-sized arrays without
losing existing code patterns. Add explicit checking for the warnings
under various constructions.
Additionally fixes a CodeGen bug with flexible array members in unions
in C++, which was found when adding a testcase for:
```
union { char x[]; } z = {0};
```
which only had Sema tests originally.
Fixes #84565
|
|
This fixes all shellcheck warnings we have in `monolithic-linux.sh` and
`monolithic-windows.sh`.
All of them have to do with
[SC2086](https://www.shellcheck.net/wiki/SC2086) - Double quote to
prevent globbing and word splitting.
|
|
Make them start with 1 instead of 0 (reserved for primary entry point).
Test Plan:
```
bin/llvm-lit -a tools/bolt/test/X86/yaml-secondary-entry-discriminator.s
```
Reviewers: rafaelauler, ayermolo, maksfb, dcci
Reviewed By: maksfb
Pull Request: https://github.com/llvm/llvm-project/pull/86848
|
|
In getRVVTypeSize(clang::ASTContext &, clang::BuiltinType const *)
potential integer overflow occurs on expression VScale->first * MinElts
with type unsigned int (32 bits, unsigned) is evaluated using 32-bit
arithmetic, and then used in a context that expects an expression of
type uint64_t (64 bits, unsigned).
To avoid integer overflow, this patch changes the types of variables
MinElts and EltSize to uint64_t instead of the cast.
The change matches what was originally done in https://github.com/llvm/llvm-project/commit/7372c0d46d2185017c509eb30910b102b4f9cdaa. Looks like the revert happened in https://github.com/llvm/llvm-project/commit/c92ad411f2f94d8521cd18abcb37285f9a390ecb
|
|
This patch replaces getAs<> with castAs<> to resolve potential static
analyzer bugs for
1. Dereferencing Proto1->param_type_begin(), which is known to be
nullptr
2. Dereferencing Proto2->param_type_begin(), which is known to be
nullptr
3. Dereferencing a pointer issue with nullptr Proto1 when calling
param_type_end()
4. Dereferencing a pointer issue with nullptr Proto2 when calling
param_type_end()
in clang::Sema::getMoreSpecializedTemplate().
|
|
These values vary by system, this flag allows you to toggle their value.
|
|
Basically, testing for interaction of shNadd matching with one step
of reassociation in the add.
|
|
limit.
Used RecursionMaxDepth to limit number of lookups in BoUpSLP::getVectorElementSize and limited reduction width for bool reduced values.
|
|
Reported by Static Analyzer Tool:
In clang::installapi::InstallAPIVisitor::VisitFunctionDecl(clang::FunctionDecl
const *): Using the auto keyword without an & causes the copy of an
object of type DynTypedNode.
|
|
|
|
(#86712)
|
|
Need to adjust ReductionBitWIdth after minbitwidth analysis, if the
demanded bits analysis sjows tht its size is less than the size of the
vectorized value. It prevents incorrect sign-zero extension
transformation after.
|
|
The MachO format supports relative offsets for ObjC method lists. This
support is present already in ld64. With this change we implement this
support in lld also.
Relative method lists can be identified by a specific flag (0x80000000)
in the method list header. When this flag is present, the method list
will contain 32-bit relative offsets to the current Program Counter
(PC), instead of absolute pointers.
Additionally, when relative method lists are used, the offset to the
selector name will now be relative and point to the selector reference
(selref) instead of the name itself.
|
|
NFC.
|
|
This reverts commit df75183d70e029352a49c93f275db703c81a65c1.
Revert for now as this appears to cause failures on some buildbots,
e.g.:
https://lab.llvm.org/buildbot/#/builders/93/builds/19428/steps/10/logs/stdio
|
|
so that --noinhibit-exec downgrades the error to a warning, which
matches GNU ld. Most recoverable errors should use errorOrWarn.
|
|
This is fixing a report from ubsan which I don't think is super high
value, but our testsuite hits it on
TestDataFormatterObjCNSContainer.py so I'd like to work around it. We
are getting
```
runtime error: left shift of negative value -8827055269646171913
3159 int64_t data_payload_signed =
3160 ((int64_t)((int64_t)unobfuscated
-> 3161 << m_objc_debug_taggedpointer_ext_payload_lshift) >>
3162 m_objc_debug_taggedpointer_ext_payload_rshift);
```
At this point `unobfuscated` is 0x85800000000000f7 and
`m_objc_debug_taggedpointer_ext_payload_lshift` is 9, so
`(int64_t)0x85800000000000f7<<9` shifts off the "sign" bit and then some
zeroes etc, and that's how we get this error.
We're only trying to extract some bits in the middle of the doubleword,
so the fact that we're "losing" the sign is not a bug. Change the inner
cast to (uint64_t).
|
|
|
|
|
|
|
|
{,u}fromfp{,x}* (#86692)"
This reverts commit cd17082b24079a31eff0057abe407da5cfb7b0fc because the newly
added tests fail on 32b ARM.
Link: #86692
Link: https://lab.llvm.org/buildbot/#/builders/229/builds/24458
|
|
This patch adds error handling for shm_open failures in one case where
they were not handled before and also makes an error handler in another
case report the value of errno for diagnosis.
|
|
|
|
|
|
load/store on stack.
We try clamp the index to be within the bounds of the stack object
we create, but if we don't freeze it, poison can propagate into the
clamp code. This can cause the access to leave the bounds of the
stack object.
We have other instances of this issue in type legalization and extract_elt/subvector,
but posting this patch first for direction check.
Fixes #86717
|
|
|
|
This finishes the revert started in
aeb8628c218f8224e08dddcdd3199a445d8607a8 which didn't completely back
out the original patch.
|
|
|
|
Summary:
Previously we had an interface that checked these functions pointers to
see if they are implemented by the plugin. This was removed as currently
every single function is implemented as a part of the common interface.
These checks are now always true and do nothing.
|
|
|
|
to authenticate signed pointers (#86721)
To authenticate pointers, CodeGen needs access to the key and
discriminators that were used to sign the pointer. That information is
sometimes known from the context, but not always, which is why `Address`
needs to hold that information.
This patch adds methods and data members to `Address`, which will be
needed in subsequent patches to authenticate signed pointers, and uses
the newly added methods throughout CodeGen. Although this patch isn't
strictly NFC as it causes CodeGen to use different code paths in some
cases (e.g., `mergeAddressesInConditionalExpr`), it doesn't cause any
changes in functionality as it doesn't add any information needed for
authentication.
In addition to the changes mentioned above, this patch introduces class
`RawAddress`, which contains a pointer that we know is unsigned, and
adds several new functions for creating `Address` and `LValue` objects.
This reapplies 8bd1f9116aab879183f34707e6d21c7051d083b6. The commit
broke msan bots because LValue::IsKnownNonNull was uninitialized.
|
|
Some Zvk instructions use vd as a source regardless of tail policy.
Model this in the MC layer. We already do this for FMA for example.
|
|
This is needed to provide proper size and offset for the GPRPair subreg
indices on RISC-V. The size of a GPR already uses HwMode. Previously we
said the subreg indices have unknown size and offset, but this stops
DwarfExpression::addMachineReg from being able to find the registers
that make up the pair.
I believe this fixes https://github.com/llvm/llvm-project/issues/85864
but need to verify.
|
|
This reverts commit 009f88fc0e3a036be97ef7b222b90af342bae0b7.
Reverting PR due to test failure.
|
|
|
|
(#83821)"
Recommit with a fix for the use-after-free causing the revert.
This reverts the revert commit f872043e055f4163c3c4b1b86ca0354490174987.
Original commit message:
Dropping disjoint from an OR may yield incorrect results, as some
analysis may have converted it to an Add implicitly (e.g. SCEV used for
dependence analysis). Instead, replace it with an equivalent Add.
This is possible as all users of the disjoint OR only access lanes where
the operands are disjoint or poison otherwise.
Note that replacing all disjoint ORs with ADDs instead of dropping the
flags is not strictly necessary. It is only needed for disjoint ORs that
SCEV treated as ADDs, but those are not tracked.
There are other places that may drop poison-generating flags; those
likely need similar treatment.
Fixes https://github.com/llvm/llvm-project/issues/81872
PR: https://github.com/llvm/llvm-project/pull/83821
|
|
Summary:
The plan is to remove the entire plugin interface and simply use the
`GenericPluginTy` inside of `libomptarget` by statically linking against
it. This means that inside of `libomptarget` we will simply do
`Plugin.data_alloc` without the dynamically loaded interface. To reduce
the amount of code required, this patch simply moves all of the RTL
implementation functions inside of the Generic device. Now the
`__tgt_rtl_` interface is simply a shallow wrapper that will soon go
away. There is some redundancy here, this will be improved later. For
now what is important is minimizing the changes to the API.
|
|
(#86830)
Summary:
The 'new driver' sets up the lifetime of a registered liftime using
global constructors and destructors. Currently, this is put at priority
1 which isn't strictly conformant as it will conflict with system
utilities. We now use 101 as this is the loweest suggested for
non-system constructors and will still run before user constructors.
Secondly, there were issues with the CUDA runtime when destructed with a
global destructor. Because the global ones are in any order and
potentially run before other things we were hitting an edge case where
the OpenMP runtime was uninitialized *after* `_dl_fini` was called. This
would result in us erroring when we call into a destroyed `libcuda.so`
instance. using `atexit` is what CUDA / HIP use and it prevents this
from happening. Most everything uses `atexit` except system utilities
and because of the constructor priority it will be unregistered *after*
everything else but not after `_fl_fini`.
|
|
When enabling alignment of consecutive declarations and reference right
alignment, the needed space between `& ` and ` = ` is removed in the
following use case.
Problem (does not compile)
```
int a(const Test &= Test());
double b();
```
Expected:
```
int a(const Test & = Test());
double b();
```
Test command:
```
echo "int a(const Test& = Test()); double b();" | clang-format -style="{AlignConsecutiveDeclarations: true, ReferenceAlignment: Right}"
```
|
|
|
|
- Put the helper function in `ProfDataUtil.h/cpp`, which is already a
dependency of `Instructions.cpp`
- The helper function could be re-used to update profiles of
`InvokeInst` (in a follow-up pull request)
|
|
fixes #86719
- `SemaChecking.cpp` - Adds unsigned semaChecks to
`__builtin_elementwise_bitreverse`
- `hlsl_intrinsics.h` - remove signed `reversebits` apis
|
|
The uses of OldDef/NewDef may not be killed in the same place they
previously were after they are replaced, and so need to be cleared.
|
|
This reverts commit 36ca9a29025a2f678096e9545fa2ec44e8432592.
We were using the `time` as the seed while choosing a new TSD. To make
the access of TSDs evenly distributed, we require a higher precision in
`time`. Otherwise, many threads may result in having the same random
access pattern on TSDs because they share the same `time` in certain
period. On Linux, CLOCK_MONOTONIC_COARSE usually adopts 4 ms precision.
This is way higher than the average accessing time of TSD (which is
usually less than 1 us). As a result, when multiple threads try to
select a new TSD in a 4 ms interval, they share the same `time` seed and
end up choosing and congesting on the same TSD.
|
|
Error out the build if the checksum mismatch is extremely high, it's
better to drop the profile rather than apply the bad profile.
Note that the check is on a module level, the user could make big
changes to functions in one single module but those changes might not be
performance significant to the whole binary, so we want to be
conservative, only expect to catch big perf regression. To do this, we
select a set of the "hot" functions for the check. We use two
parameter(`hot-func-cutoff-for-staleness-error` and
`min-functions-for-staleness-error`) to control the function selection
to make sure the selected are hot enough and the num of function is not
small.
Tuned the parameters on our internal services, it works to catch big
perf regression due to the high mismatch .
|