Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
Implement the TypeHashPointerSplit mode: This mode assigns a token ID
based on the hash of the allocated type's name, where the top half
ID-space is reserved for types that contain pointers and the bottom half
for types that do not contain pointers.
This mode with max tokens of 2 (`-falloc-token-max=2`) may also
be valuable for heap hardening strategies that simply separate pointer
types from non-pointer types.
Make it the new default mode.
Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434
---
This change is part of the following series:
1. https://github.com/llvm/llvm-project/pull/160131
2. https://github.com/llvm/llvm-project/pull/156838
3. https://github.com/llvm/llvm-project/pull/162098
4. https://github.com/llvm/llvm-project/pull/162099
5. https://github.com/llvm/llvm-project/pull/156839
6. https://github.com/llvm/llvm-project/pull/156840
7. https://github.com/llvm/llvm-project/pull/156841
8. https://github.com/llvm/llvm-project/pull/156842
|
|
metadata (#160131)
In preparation of adding the "AllocToken" pass, add the pre-requisite
`sanitize_alloc_token` function attribute and `alloc_token` metadata.
---
This change is part of the following series:
1. https://github.com/llvm/llvm-project/pull/160131
2. https://github.com/llvm/llvm-project/pull/156838
3. https://github.com/llvm/llvm-project/pull/162098
4. https://github.com/llvm/llvm-project/pull/162099
5. https://github.com/llvm/llvm-project/pull/156839
6. https://github.com/llvm/llvm-project/pull/156840
7. https://github.com/llvm/llvm-project/pull/156841
8. https://github.com/llvm/llvm-project/pull/156842
|
|
This introduces `!captures` metadata on stores, which looks like this:
```
store ptr %x, ptr %y, !captures !{!"address", !"read_provenance"}
```
The semantics are the same as replacing the store with a call like this:
```
call void @llvm.store(ptr captures(address, read_provenance) %x, ptr %y)
```
This metadata is intended for annotation by frontends -- it's not
something we can feasibly infer at this point, as it would require
analyzing uses of the pointer stored in memory.
The motivating use case for this is Rust's `println!()` machinery, which
involves storing a reference to the value inside a structure. This means
that printing code (including conditional debugging code), can inhibit
optimizations because the pointer escapes. With the new metadata we can
annotate this as a read-only capture, which has less impact on
optimizations.
|
|
Mark the attributes as immarg to indicate that they require a constant
integer. This was previously enforced with a manual verifier check.
|
|
sanitizer-aarch64-linux-bootstrap-ubsan buildbot was previously failing.
Resolves: https://lab.llvm.org/buildbot/#/builders/169/builds/15232.
|
|
Add a new named module-level frontend-annotated metadata that
specifies the TBAA node for an integer access, for which, C/C++
`errno` accesses are guaranteed to use (under strict aliasing).
This should allow LLVM to prove the involved memory location/
accesses may not alias `errno`; thus, to perform optimizations
around errno-writing libcalls (store-to-load forwarding amongst
others).
Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
|
|
Assumes either have a boolean condition, or a number of attribute based
operand bundles. Currently, we also allow mixing both forms, though we
don't make use of this in practice. This adds additional complexity for
code dealing with assumes.
Forbid mixing both forms, by requiring that assumes with operand bundles
have an i1 true condition.
|
|
|
|
Teach the IR parser and writer to support metadata on ifuncs, and update
documentation.
In PR #153049, we have a use case of attaching the `!associated`
metadata to an ifunc.
Since an ifunc is similar to a function declaration, it seems natural to
allow metadata on ifuncs.
Currently, the metadata API allows adding Metadata to
llvm::GlobalObject, so the in-memory IR allows for metadata on ifuncs,
but the IR reader/writer is not aware of that.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
|
|
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
|
|
Rather than passes using `!prof = !{!”unknown”}`for cases where don’t have enough information to emit profile values, this patch captures the pass (or some other information) that can help diagnostics - i.e. `!{!”unknown”, !”some-pass-name”}`.
For example, suppose we emitted a `select` with the unknown metadata, and, later, end up needing to lower that to a conditional branch. If we observe (via sample profiling, for example) that the branch is biased and would have benefitted from a valid profile, the extra information can help speed up debugging.
We can also (in a subsequent pass) generate optimization remarks about such lowered selects, with a similar aim - identify patterns lowering to `select` that may be worth some extra investment in extracting a more precise profile.
|
|
Fixes #157448
|
|
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.
After PR, linking `clang.exe`:
<img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021"
src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910"
/>
Linking a custom (Unreal Engine game) binary gives a completly
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).
<img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048"
src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0"
/>
|
|
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
We have warned these options in LLVM21 through #132542. This patch
removes underlying implementations in LLVM22.
|
|
Some passes synthesize functions, e.g. WPD, so we may need to indicate “this synthesized function’s entry count cannot be estimated at compile time” - akin to `branch_weights`.
Issue #147390
|
|
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
|
|
Our GPU compiler usually construct pointers through inttoptr. The memory
was pre-allocated before the shader function execution and remains valid
through the execution of the shader function. This brings back the
expected behavior of instruction hoisting for the test
`hoist-speculatable-load.ll`, which was broken by #126117.
|
|
fixes #152348
SimplifyCFG collapses raw buffer store from a if\else load into a
select.
This change prevents the TargetExtType dx.Rawbuffer from being replace
thus preserving the if\else blocks.
A further change was needed to eliminate the phi node before we process
Intrinsic::dx_resource_getpointer in DXILResourceAccess.cpp
|
|
subrange debug (#154665)
Set debug was failing validation where the base type was subrange. This
has
happened with the addition of the new functionality for proper subrange
debugging. This fix just allows set types to be based on subranges.
|
|
|
|
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h. With that, we no longer need to
include SmallSet.h in many files.
|
|
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note
that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer
element types:
template <typename PointeeType, unsigned N>
class SmallSet<PointeeType*, N> : public SmallPtrSet<PointeeType*, N>
{};
We only have 140 instances that rely on this "redirection", with the
vast majority of them under llvm/. Since relying on the redirection
doesn't improve readability, this patch replaces SmallSet with
SmallPtrSet for pointer element types.
|
|
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The
buildbot failure seems to have been a cmake issue which has been
discussed in more detail in this Discourse post:
https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901
If any buildbots fail to select arbitrary intrinsics with this patch,
it's worth considering using clean builds with ccache instead of
incremental builds, as recommended here:
https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds
The original commit message for this patch:
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Tail calls are handled in a future patch.
|
|
This introduces a new `ptrtoaddr` instruction which is similar to
`ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance
2) `ptrtoaddr` only extracts (and then extends/truncates) the low
index-width bits of the pointer
For most architectures, difference 2) does not matter since index (address)
width and pointer representation width are the same, but this does make a
difference for architectures that have pointers that aren't just plain
integer addresses such as AMDGPU fat pointers or CHERI capabilities.
This commit introduces textual and bitcode IR support as well as basic code
generation, but optimization passes do not handle the new instruction yet
so it may result in worse code than using ptrtoint. Follow-up changes will
update capture tracking, etc. for the new instruction.
RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
|
|
Now that #149310 has restricted lifetime intrinsics to only work on
allocas, we can also drop the explicit size argument. Instead, the size
is implied by the alloca.
This removes the ability to only mark a prefix of an alloca alive/dead.
We never used that capability, so we should remove the need to handle
that possibility everywhere (though many key places, including stack
coloring, did not actually respect this).
|
|
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
|
|
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave
functions. This will take as its first argument the callee with the
amdgpu_gfx_whole_wave calling convention, followed by the call
parameters which must match the signature of the callee except for the
first function argument (the i1 original EXEC mask, which doesn't need
to be passed in). Indirect calls are not allowed.
Make direct calls to amdgpu_gfx_whole_wave functions a verifier error.
Unspeakable horrors happen around calls from whole wave functions, the
plan is to improve the handling of caller/callee-saved registers in
a future patch.
Tail calls are also handled in a future patch.
|
|
This slightly relaxes the invariant established in #149310, by also
allowing the lifetime argument to be poison. This is to support the
typical pattern of RAUWing with poison when removing an instruction.
It's worth noting that this does not require any conservative
assumptions, lifetimes with poison arguments can simply be skipped.
Fixes https://github.com/llvm/llvm-project/issues/151119.
|
|
Reverts llvm/llvm-project#148758
[As
requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
|
|
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
|
|
lifetime.start and lifetime.end are primarily intended for use on
allocas, to enable stack coloring and other liveness optimizations. This
is necessary because all (static) allocas are hoisted into the entry
block, so lifetime markers are the only way to convey the actual
lifetimes.
However, lifetime.start and lifetime.end are currently *allowed* to be
used on non-alloca pointers. We don't actually do this in practice, but
just the mere fact that this is possible breaks the core purpose of the
lifetime markers, which is stack coloring of allocas. Stack coloring can
only work correctly if all lifetime markers for an alloca are
analyzable.
* If a lifetime marker may operate on multiple allocas via a select/phi,
we don't know which lifetime actually starts/ends and handle it
incorrectly (https://github.com/llvm/llvm-project/issues/104776).
* Stack coloring operates on the assumption that all lifetime markers
are visible, and not, for example, hidden behind a function call or
escaped pointer. It's not possible to change this, as part of the
purpose of lifetime markers is that they work even in the presence of
escaped pointers, where simple use analysis is insufficient.
I don't think there is any way to have coherent semantics for lifetime
markers on allocas, while also permitting them on arbitrary pointer
values.
This PR restricts lifetimes to operate on allocas only. As a followup, I
will also drop the size argument, which is superfluous if we always
operate on an alloca. (This change also renders various code handling
lifetime markers on non-alloca dead. I plan to clean up that kind of
code after dropping the size argument as well.)
In practice, I've only found a few places that currently produce
lifetimes on non-allocas:
* CoroEarly replaces the promise alloca with the result of an intrinsic,
which will later be replaced back with an alloca. I think this is the
only place where there is some legitimate loss of functionality, but I
don't think this is particularly important (I don't think we'd expect
the promise in a coroutine to admit useful lifetime optimization.)
* SafeStack moves unsafe allocas onto a separate frame. We can safely
drop lifetimes here, as SafeStack performs its own stack coloring.
* Similar for AddressSanitizer, it also moves allocas into separate
memory.
* LSR sometimes replaces the lifetime argument with a GEP chain of the
alloca (where the offsets ultimately cancel out). This is just
unnecessary. (Fixed separately in
https://github.com/llvm/llvm-project/pull/149492.)
* InferAddrSpaces sometimes makes lifetimes operate on an addrspacecast
of an alloca. I don't think this is necessary.
|
|
Whole wave functions are functions that will run with a full EXEC mask.
They will not be invoked directly, but instead will be launched by way
of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in
a future patch). These functions are meant as an alternative to the
`llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics.
Whole wave functions will set EXEC to -1 in the prologue and restore the
original value of EXEC in the epilogue. They must have a special first
argument, `i1 %active`, that is going to be mapped to EXEC. They may
have either the default calling convention or amdgpu_gfx. The inactive
lanes need to be preserved for all registers used, active lanes only for
the CSRs.
At the IR level, arguments to a whole wave function (other than
`%active`) contain poison in their inactive lanes. Likewise, the return
value for the inactive lanes is poison.
This patch contains the following work:
* 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN
used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return
a SReg_1 representing `%active`, which needs to be passed into
SI_WHOLE_WAVE_FUNC_RETURN.
* SelectionDAG support for generating these 2 new pseudos and the
special handling of %active. Since the return may be in a different
basic block, it's difficult to add the virtual reg for %active to
SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF
which is later replaced via a custom inserter.
* Expansion of the 2 pseudos during prolog/epilog insertion. PEI also
marks any used VGPRs as WWM registers, which are then spilled and
restored with the usual logic.
Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic
and a lot of optimization work (especially in order to reduce spills
around function calls).
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
Co-authored-by: Shilei Tian <i@tianshilei.me>
|
|
Introduce `callee_type` metadata which will be attached to the indirect
call instructions.
The `callee_type` metadata will be used to generate `.callgraph` section
described in this RFC:
https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html
Reviewers: morehouse, petrhosek, nikic, ilovepi
Reviewed By: nikic, ilovepi
Pull Request: https://github.com/llvm/llvm-project/pull/87573
|
|
We no longer produce debug-intrinsics, and whenever they're spotted in
bitcode or textual IR they get autoupgraded. We could quite reasonably
reject them out of hand as a construct that shouldn't be present.
However, the DXIL folks are likely to be converting records back to
intrinsics for years to come, and there's no need to make that an error.
There's no value in verifying them IMO.
|
|
The verifier check was in the wrong place, meaning it wasn't actually
checking many instructions.
Fixing that causes a test failure (coro-dwarf-key-instrs.cpp) because
coros turn off the feature but still annotate instructions with the
metadata (which is a supported situation, but the verifier doesn't like
it, and it's hard to teach the verifier to like it).
Fix that by avoiding emitting any key instruction metadata if the
DISubprogram has opted out of key instructions.
|
|
There are no longer debug-info instructions, thus we don't need this
skipping. Horray!
|
|
Update isDereferenceableAndAlignedPointer to make use of dereferenceable
assumptions with variable sizes via SCEV.
To do so, factor out the logic to check via an assumption to a helper,
and use SE to check if the access size is less than the dereferenceable
size.
PR: https://github.com/llvm/llvm-project/pull/128436
|
|
immediate. (#147458)
This applies the same check as llvm.vector.splice which checks that the immediate is in the range [-VL, VL-1] where VL is the minimum vector length. If vscale_range is available, the lower bound is used to increase the known minimum vector length for this check. This ensures the immediate is in range for any possible value of vscale that satisfies the vscale_range.
|
|
This PR is part of https://discourse.llvm.org/t/rfc-profile-information-propagation-unittesting/73595
In a slight departure from the RFC, instead of a brand-new `MD_prof_unknown` kind, this adds a first operand to `MD_prof` metadata. This makes it easy to replace with valid metadata (only one `MD_prof`), otherwise sites inserting valid `MD_prof` would also have to check to remove the `unknown` one.
The patch just introduces the notion and fixes the verifier accordingly. Existing APIs working (esp. reading) `MD_prof` will be updated subsequently.
|
|
|
|
Patch 1/4 adding bitcode support.
Store whether or not a function is using Key Instructions in its DISubprogram so
that we don't need to rely on the -mllvm flag -dwarf-use-key-instructions to
determine whether or not to interpret Key Instructions metadata to decide
is_stmt placement at DWARF emission time. This makes bitcode support simple and
enables well defined mixing of non-key-instructions and key-instructions
functions in an LTO context.
This patch adds the bit (using DISubprogram::SubclassData1).
PR 144104 and 144103 use it during DWARF emission.
PR 44102 adds bitcode
support.
See pull request for overview of alternative attempts.
|
|
|
|
This PR places the validation of `MD_prof` instruction metadata in the Verifier.
|
|
In Ada, a record type can have a non-constant size, and a field can
appear at a non-constant bit offset in a record.
To support this, this patch changes DIType to record the size and offset
using metadata, rather than plain integers. In addition to a constant
offset, both DIVariable and DIExpression are now supported here.
One thing of note in this patch is the choice of how exactly to
represent a non-constant bit offset, with the difficulty being that
DWARF 5 does not support this. DWARF 3 did have a way to support a
non-constant byte offset, combined with a constant bit offset within the
byte, but this was deprecated in DWARF 4 and removed from DWARF 5.
This patch takes a simple approach: a DWARF extension allowing the use
of an expression with DW_AT_data_bit_offset. There is a corresponding
DWARF issue, see https://dwarfstd.org/issues/250501.1.html. The main
reason for this approach is that it keeps API simplicity: just a single
value is needed, rather than having separate data describing the byte
offset and the bit within the byte.
|
|
For some reason, some of the checks for specific assumbe bundle elements
exit early if the check pass, meaning we don't verify other entries.
Replace the early returns with early continues.
This also requires removing some tests that are currently rejected. They will
be added back as part of https://github.com/llvm/llvm-project/pull/128436.
PR: https://github.com/llvm/llvm-project/pull/145586
|
|
Also fix the LangRef to match the implementation. This was checking
against the alloca address space size rather than the default address
space.
The check was also more permissive than the LangRef. The error
check permitted any size less than the pointer size; follow the
stricter wording of the LangRef.
|
|
This patch attaches the range attribute to the setmaxnreg
and fence.proxy.tensormap.* intrinsics. The range checks
are now handled generically in the Verifier. So, this patch
removes the per-intrinsic error-handling for range-checks
from the Verifier.
This patch also adds more coverage tests for these cases.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
|