Age | Commit message (Collapse) | Author | Files | Lines |
|
This PR starts using the correct VFS in `GCOVProfilerPass` instead of
using the real FS directly. This matches compiler's behavior for other
input files.
|
|
This patch is based on https://github.com/llvm/llvm-project/pull/159713
This patch extends AddressSanitizer to support indexed/segment
instructions in RVV. It enables proper instrumentation for these memory
operations.
A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to
describe the offset between the base pointer and the actual memory
reference address.
Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
|
|
Uses the updated handleAVX512VectorGenericMaskedFP() from
https://github.com/llvm/llvm-project/pull/159966
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder` that's able to pass it down to all passes
that need it.
|
|
(#160640)
Reapplies #159686
This reverts commit 4f33d7b7a9f39d733b7572f9afbf178bca8da127.
The original landing of this patch had an issue where it would try and
hoist allocas into the entry block that were in the entry block. This
would end up actually moving them lower in the block potentially after
users, resulting in invalid IR.
This update fixes this by ensuring that we are only hoisting static
allocas that have been sunk into a split basic block. A regression test
has been added.
Integration tested using a three stage build of clang with IRPGO
enabled.
|
|
This generalizes handleAVX512VectorGenericMaskedFP() (introduced in
#158397), to potentially handle intrinsics that have A/WriteThru/Mask in
an operand order that is different to AVX512/AVX10 rcp and rsqrt. Any
operands other than A and WriteThru must be fully initialized.
For example, the generalized handler could be applied in follow-up work
to many of the AVX512 rndscale intrinsics:
```
<32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512(<32 x half>, i32, <32 x half>, i32, i32)
<16 x float> @llvm.x86.avx512.mask.rndscale.ps.512(<16 x float>, i32, <16 x float>, i16, i32)
<8 x double> @llvm.x86.avx512.mask.rndscale.pd.512(<8 x double>, i32, <8 x double>, i8, i32)
A Imm WriteThru Mask Rounding
<8 x float> @llvm.x86.avx512.mask.rndscale.ps.256(<8 x float>, i32, <8 x float>, i8)
<4 x float> @llvm.x86.avx512.mask.rndscale.ps.128(<4 x float>, i32, <4 x float>, i8)
<4 x double> @llvm.x86.avx512.mask.rndscale.pd.256(<4 x double>, i32, <4 x double>, i8)
<2 x double> @llvm.x86.avx512.mask.rndscale.pd.128(<2 x double>, i32, <2 x double>, i8)
A Imm WriteThru Mask
```
|
|
(#159686)"
This reverts commit a00450944d2a91aba302954556c1c23ae049dfc7.
Looks like this one is actually breaking the buildbots. Reverting the switch back
to IRPGO did not fix things.
|
|
embed in MemIntrinsicInfo #157863 (#159713)
[Previously reverted due to failures on asan-rvv-intrinsics.ll, the test
case is riscv only and it is triggered by other target]
Reland [#157863](https://github.com/llvm/llvm-project/pull/157863), and
add `; REQUIRES: riscv-registered-target` in test case to skip the
configuration that doesn't register riscv target.
Previously asan considers target intrinsics as black boxes, so asan
could not instrument accurate check. This patch make
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can make targets describe their intrinsic informations to asan.
Note,
1. This patch move InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
|
|
ControlHeightReduction will duplicate some blocks and insert phi nodes
in exit blocks of regions that it operates on for any live values. This
includes allocas. Having a lifetime annotation refer to a phi node was
made illegal in 92c55a315eab455d5fed2625fe0f61f88cb25499, which causes
the verifier to fail after CHR.
There are some cases where we might not need to drop lifetime
annotations (usually because we do not need the phi to begin with), but
drop all annotations for now to be conservative.
Fixes #159621.
|
|
The `NDEBUG` macro is tested for defined-ness everywhere else. The
instance here triggers a warning when compiling with `-Wundef`.
|
|
embed in MemIntrinsicInfo" (#159700)
Reverts llvm/llvm-project#157863
|
|
MemIntrinsicInfo (#157863)
Previously asan considers target intrinsics as black boxes, so asan
could not instrument accurate check. This patch make
SmallVector<InterestingMemoryOperand> a member of MemIntrinsicInfo so
that TTI can make targets describe their intrinsic informations to asan.
Note,
1. This patch move InterestingMemoryOperand from Transforms to Analysis.
2. Extend MemIntrinsicInfo by adding a
SmallVector<InterestingMemoryOperand> member.
3. This patch does not support RVV indexed/segment load/store.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161)
This is a reland of https://github.com/llvm/llvm-project/pull/158460
Test failures are gone once I undo the changes in codegenprepare.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159)
Reverts llvm/llvm-project#158460 due to buildbot failures
|
|
existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460)
Before this change, `setSectionPrefix` overwrites existing section
prefix with new one unconditionally.
After this change, `setSectionPrefix` checks for equivalences, updates
conditionally and returns whether an update happens.
Update the existing callers to make use of the return value. [PR
155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9)
is a motivating use case whether the 'update' semantic is needed.
|
|
Adds a new handler, handleAVX512VectorGenericMaskedFP(), and applies it
to AVX512/AVX10 rcp and rsqrt
|
|
Approximately handle avx512_{packssdw/packsswb/packusdw/packuswb} with
the existing handleVectorPackIntrinsic(), instead of relying on the
default (strict) handler.
|
|
|
|
|
|
Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA.
Fixes part of #97271
|
|
https://github.com/llvm/llvm-project/pull/153927 incorrectly cast using
a hardcoded reduction factor of two, rather than using the parameter.
This caused false negatives but not false positives. (The only incorrect
case was a reduction factor of four; if four values {A,B,C,D} are being
reduced, the result is fully zero iff {A,B} and {C,D} are both zero
after pairwise reduction. If only one of those reduced pairs is zero,
then the quadwise reduction is non-zero.)
|
|
partition data (#151238)
https://github.com/llvm/llvm-project/commit/f3f28323adbb9d01372d81b4c78ed94683e58757
introduces the data access profile format as a payload inside
[memprof](https://llvm.org/docs/InstrProfileFormat.html#memprof-profile-data),
and the MemProfUse pass reads the memprof payload.
This change extends the MemProfUse pass to read the data access profiles
to annotate global variables' section prefix.
1. If there are samples for a global variable, it's annotated as hot.
2. If a global variable is seen in the profiled binary file but doesn't
have access samples, it's annotated as unlikely.
Introduce an option `annotate-static-data-prefix` to flag-gate the
global-variable annotation path, and make it false by default.
https://github.com/llvm/llvm-project/pull/155337 is the (WIP) draft
change to "reconcile" two sources of hotness.
|
|
This has recently started causing 'Invalid size request on a scalable
vector.'
|
|
Discarding the `.note.hwasan.globals` section in ldscript causes a
linker error, since `hwasan_globals` refers to the discarded section.
The issue comes from `hwasan.dummy.global` being associated via metadata
with `.note.hwasan.globals`.
Add a new `-hwasan-static-linking` option to skip inserting
`.note.hwasan.globals` for static binaries, as it is only needed for
instrumenting globals from dynamic libraries. In static binaries, the
global variables section can be accessed directly via the
`__start_hwasan_globals` and `__stop_hwasan_globals` symbols inserted by
the linker.
|
|
Currently visitIntrinsicInst() is a long, partly unsorted list. This patch groups them into cross-platform, X86 SIMD, and Arm SIMD families, making the overall intent of visitIntrinsicInst() clearer:
```
void visitIntrinsicInst(IntrinsicInst &I) {
if (maybeHandleCrossPlatformIntrinsic(I))
return;
if (maybeHandleX86SIMDIntrinsic(I))
return;
if (maybeHandleArmSIMDIntrinsic(I))
return;
if (maybeHandleUnknownIntrinsic(I))
return;
visitInstruction(I);
}
```
There is one disadvantage: the compiler will not tell us if the switch statements in the handlers have overlapping coverage.
|
|
This extends handleAVX512VectorConvertFPToInt() from
556c8467d15a131552e3c84478d768bafd95d4e6
(https://github.com/llvm/llvm-project/pull/147377) to handle AVX512
VCVTPS2PH.
|
|
(#99415)" (#154803)
Originally suggested by rnk@
(this is the simplified function-level skip version, to unblock builds
ASAP)
|
|
Instructions (VNNI) (#153927)
This extends the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to three-operand intrinsics (multiply-add-accumulate), and applies it to the AVX Vector Neural Network Instructions.
Updates the tests from https://github.com/llvm/llvm-project/pull/153135
|
|
This applies the pmadd handler (recently improved in https://github.com/llvm/llvm-project/pull/153353) to the Avx512
equivalent of the pmaddw and pmaddubs intrinsics:
<16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
<32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
|
|
llvm.x86.sse.pshuf.w(<1 x i64>, i8) and llvm.x86.avx512.pshuf.b.512(<64
x i8>, <64 x i8>) are currently handled strictly, which is suboptimal.
llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>) and
llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>) are currently heuristically
handled using maybeHandleSimpleNomemIntrinsic, which is incorrect.
Since the second argument is the shuffle order, we instrument all these
intrinsics using `handleIntrinsicByApplyingToShadow(...,
/*trailingVerbatimArgs=*/1)`
(https://github.com/llvm/llvm-project/pull/114490).
|
|
instrumentation (#153353)
This reverts commit cf002847a464c004a57ca4777251b1aafc33d958 i.e.,
relands ba603b5e4d44f1a25207a2a00196471d2ba93424. It was reverted
because it was subtly wrong: multiplying an uninitialized zero should
not result in an initialized zero.
This reland fixes the issue by using instrumentation analogous to
visitAnd (bitwise AND of an initialized zero and an uninitialized value
results in an initialized value). Additionally, this reland expands a
test case; fixes the commit message; and optimizes the change to avoid
the need for horizontalReduce.
The current instrumentation has false positives: it does not take into
account that multiplying an initialized zero value with an uninitialized
value results in an initialized zero value This change fixes the issue
during the multiplication step. The horizontal add step is modeled using
bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent
intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX
VNNI intrinsics.
|
|
into `ProfileData` (#153735)
The logic isn’t instrumentation-specific, and the refactoring allows users avoid a dependency on `Instrumentation` and just take one on `ProfileData` (which a fairly low-level dependency)
|
|
|
|
Reverts llvm/llvm-project#152941
Buildbot breakage:
https://lab.llvm.org/buildbot/#/builders/66/builds/17843
|
|
The current instrumentation has false positives: if there is a single uninitialized bit in any of the operands, the entire output is poisoned. This does not take into account that multiplying an uninitialized value with zero results in an initialized zero value.
This step allows elements that are zero to clear the corresponding shadow during the multiplication step. The horizontal add step and accumulation step (if any) are modeled using bitwise OR.
Future work can apply this improved handler to the AVX512 equivalent intrinsics (x86_avx512_pmaddw_d_512, x86_avx512_pmaddubs_w_512.) and AVX VNNI intrinsics.
|
|
/llvm-project/llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:2752:22:
error: unused variable 'ParamType' [-Werror,-Wunused-variable]
FixedVectorType *ParamType =
^
1 error generated.
|
|
The functionality is used by two helper functions, and will be used even more in the future (e.g.,
https://github.com/llvm/llvm-project/pull/152941).
|
|
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
|
|
Split out from https://github.com/llvm/llvm-project/pull/150248:
Use the size of the alloca instead of the size passed to the lifetime
intrinsic.
As a bonus, this handles dynamic allocas correctly (see the added test)
instead of doing a memset with size -1...
|
|
While not needed for most applications, some tools such as
[MUST](https://www.i12.rwth-aachen.de/cms/i12/forschung/forschungsschwerpunkte/lehrstuhl-fuer-hochleistungsrechnen/~nrbe/must/)
depend on the instrumentation being present.
MUST uses the ThreadSanitizer annotation interface to detect data races
in MPI programs, where the capture tracking is detrimental as it has no
bearing on MPI data races, leading to missed races.
|
|
(#151739)
|
|
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
|
|
Reverts llvm/llvm-project#148758
[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
|
|
hwasan-globals does not instrument globals with custom sections, because
existing code may use `__start_`/`__stop_` symbols to iterate over
globals in such a way which will cause hwasan assertions.
Introduce new hwasan-all-globals option, which instruments all
user-defined globals (but not those globals which are generated by the
hwasan instrumentation itself), including those with custom sections.
fixes #142442
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
|
|
e.g.,
<16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
<32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
<64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
Out A x b
where A and x are packed matrices, b is a vector, Out = A * x + b in
GF(2)
Multiplication in GF(2) is equivalent to bitwise AND. However, the
matrix computation also includes a parity calculation.
For the bitwise AND of bits V1 and V2, the exact shadow is:
Out_Shadow = (V1_Shadow & V2_Shadow) | (V1 & V2_Shadow) | (V1_Shadow &
V2)
We approximate the shadow of gf2p8affine using:
Out_Shadow = _mm512_gf2p8affine_epi64_epi8(x_Shadow, A_shadow, 0)
| _mm512_gf2p8affine_epi64_epi8(x, A_shadow, 0)
| _mm512_gf2p8affine_epi64_epi8(x_Shadow, A, 0)
| _mm512_set1_epi8(b_Shadow)
This approximation has false negatives: if an intermediate dot-product
contains an even number of 1's, the parity is 0.
It has no false positives.
Updates the test from https://github.com/llvm/llvm-project/pull/149258
|
|
|
|
The first argument to a lifetime intrinsic now has to be an alloca
|
|
|
|
|