Age | Commit message (Collapse) | Author | Files | Lines |
|
values
- Currently, Intrinsic can only have up to 9 return values. In case new
intrinsics require more than 9 return values, additional ITT_STRUCTxxx
values need to be added to support > 9 return values. Instead, this
patch unifies them into a single IIT_STRUCT followed by a BYTE
specifying the minimal 2 (encoded as 0) and maximal 257 (encoded as
255) return values.
|
|
updateLoopMetadataDebugLocations (#157557)
Inliner/IROutliner/CodeExtractor all uses the
updateLoopMetadataDebugLocations helper in order to modify debug
location related to loop metadata. However, the helper has only
been updating DILocation nodes found as operands to the first level
of the MD_loop metadata. There could however be more DILocations
as part of the various kinds of followup metadata. A typical example
would be llvm.loop metadata like this
!6 = distinct !{!6, !7, !8, !9, !10, !11}
!7 = !DILocation(line: 6, column: 3, scope: !3)
!8 = !DILocation(line: 7, column: 22, scope: !3)
!11 = !{!"llvm.loop.distribute.followup_all", !7, !8, ..., !14}
!14 = !{!"llvm.loop.vectorize.followup_all", !7, !8, ...}
Instead of just updating !7 and !8 in !6, this patch make sure that
we now recursively update the DILocations in !11 and !14 as well.
Fixes #141568
|
|
Assume operand bundles are emitted in a few more places now, including
used in various places in libc++. Add a dedicated ID for them.
PR: https://github.com/llvm/llvm-project/pull/158078
|
|
sanitizer-aarch64-linux-bootstrap-ubsan buildbot was previously failing.
Resolves: https://lab.llvm.org/buildbot/#/builders/169/builds/15232.
|
|
Add a new named module-level frontend-annotated metadata that
specifies the TBAA node for an integer access, for which, C/C++
`errno` accesses are guaranteed to use (under strict aliasing).
This should allow LLVM to prove the involved memory location/
accesses may not alias `errno`; thus, to perform optimizations
around errno-writing libcalls (store-to-load forwarding amongst
others).
Previous discussion: https://discourse.llvm.org/t/rfc-modelling-errno-memory-effects/82972.
|
|
Assumes either have a boolean condition, or a number of attribute based
operand bundles. Currently, we also allow mixing both forms, though we
don't make use of this in practice. This adds additional complexity for
code dealing with assumes.
Forbid mixing both forms, by requiring that assumes with operand bundles
have an i1 true condition.
|
|
This commit adds finer-grained versions of isNonIntegralAddressSpace() and
isNonIntegralPointerType() where the current semantics prohibit
introduction of both ptrtoint and inttoptr instructions. The current
semantics are too strict for some targets (e.g. AMDGPU/CHERI) where
ptrtoint has a stable value, but the pointer has additional metadata.
Currently, marking a pointer address space as non-integral also marks it
as having an unstable bitwise representation (e.g. when pointers can be
changed by a copying GC). This property inhibits a lot of
optimizations that are perfectly legal for other non-integral pointers
such as fat pointers or CHERI capabilities that have a well-defined
bitwise representation but can't be created with only an address.
This change splits the properties of non-integral pointers and allows
for address spaces to be marked as unstable or non-integral (or both)
independently using the 'p' part of the DataLayout string.
A 'u' following the p marks the address space as unstable and specifying
a index width != representation width marks it as non-integral.
Finally, we also add an 'e' flag to mark pointers with external state
(such as the CHERI capability validity) state. These pointers require
special handling of loads and stores in addition to being non-integral.
This does not change the checks in any of the passes yet - we
currently keep the existing non-integral behaviour. In the future I plan
to audit calls to DL.isNonIntegral[PointerType]() and replace them with
the DL.mustNotIntroduce{IntToPtr,PtrToInt}() checks that allow for more
optimizations.
RFC: https://discourse.llvm.org/t/rfc-finer-grained-non-integral-pointer-properties/83176
Reviewed By: nikic, krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/105735
|
|
I noticed that `hasSameSpecialState()` checks alignment for
`load`/`store` instructions, but not for `cmpxchg` or `atomicrmw`, which
I assume is a bug. It looks like alignment for these instructions were
added in
https://github.com/llvm/llvm-project/commit/74c723757e69fbe7d85e42527d07b728113699ae.
|
|
Currently there are two serialization modes for bitstream Remarks:
standalone and separate. The separate mode splits remark metadata (e.g.
the string table) from actual remark data. The metadata is written into
the object file by the AsmPrinter, while the remark data is stored in a
separate remarks file. This means we can't use bitstream remarks with
tools like opt that don't generate an object file. Also, it is confusing
to post-process bitstream remarks files, because only the standalone
files can be read by llvm-remarkutil. We always need to use dsymutil
to convert the separate files to standalone files, which only works for
MachO. It is not possible for clang/opt to directly emit bitstream
remark files in standalone mode, because the string table can only be
serialized after all remarks were emitted.
Therefore, this change completely removes the separate serialization
mode. Instead, the remark string table is now always written to the end
of the remarks file. This requires us to tell the serializer when to
finalize remark serialization. This automatically happens when the
serializer goes out of scope. However, often the remark file goes out of
scope before the serializer is destroyed. To diagnose this, I have added
an assert to alert users that they need to explicitly call
finalizeLLVMOptimizationRemarks.
This change paves the way for further improvements to the remark
infrastructure, including more tooling (e.g. #159784), size optimizations
for bitstream remarks, and more.
Pull Request: https://github.com/llvm/llvm-project/pull/156715
|
|
conventions. (#156328)
This is the set of the calling conventions supported by the CHERIoT downstream of LLVM.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
|
|
This patch simplifies dispatchRecalculateHash and dispatchResetHash
with "constexpr if".
This patch does not inline dispatchRecalculateHash and
dispatchResetHash into their respective call sites. Using "constexpr
if" in a non-template context like MDNode::uniquify would still
require the discarded branch to be syntactically valid, causing a
compilation error for node types that do not have
recalculateHash/setHash. Using template functions ensures that the
"constexpr if" is evaluated in a proper template context, allowing the
compiler to fully discard the inactive branch.
|
|
This patch modernizes HasCachedHash.
- "struct SFINAE" is replaced with identically defined SameType.
- The return types Yes and No are replaced with std::true_type and
std::false_type.
My previous attempt (#159510) to clean up HasCachedHash failed on
clang++-18, but this version works with clang++-18.
|
|
Teach the IR parser and writer to support metadata on ifuncs, and update
documentation.
In PR #153049, we have a use case of attaching the `!associated`
metadata to an ifunc.
Since an ifunc is similar to a function declaration, it seems natural to
allow metadata on ifuncs.
Currently, the metadata API allows adding Metadata to
llvm::GlobalObject, so the in-memory IR allows for metadata on ifuncs,
but the IR reader/writer is not aware of that.
---------
Co-authored-by: Wael Yehia <wyehia@ca.ibm.com>
|
|
This reverts commit d6b7ac830ab4c1b26a1b2eecd15306eccf9cea90. Build
breakages reported on the PR hint at not working with certain versions
of the host compiler.
|
|
With is_detected, we don't need to implement a SFINAE trick on our own.
|
|
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
|
|
Something like `call void @llvm.assume(i1 true) ["align"(ptr %p, i64 8)]`
is equivalent to placing an `align 8` attribute on the parameter
and should not be considered as capturing.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159161)
This is a reland of https://github.com/llvm/llvm-project/pull/158460
Test failures are gone once I undo the changes in codegenprepare.
|
|
update if existing prefix is not equivalent to the new one. Returns whether prefix changed." (#159159)
Reverts llvm/llvm-project#158460 due to buildbot failures
|
|
existing prefix is not equivalent to the new one. Returns whether prefix changed. (#158460)
Before this change, `setSectionPrefix` overwrites existing section
prefix with new one unconditionally.
After this change, `setSectionPrefix` checks for equivalences, updates
conditionally and returns whether an update happens.
Update the existing callers to make use of the return value. [PR
155337](https://github.com/llvm/llvm-project/pull/155337/files#diff-cc0c67ac89807f4453f0cfea9164944a4650cd6873a468a0f907e7158818eae9)
is a motivating use case whether the 'update' semantic is needed.
|
|
This patch adds a class that uses SSA construction, with debug values as
definitions, to determine whether and which debug values for a
particular variable are live at each point in an IR function. This will
be used by the IR reader of llvm-debuginfo-analyzer to compute variable
ranges and coverage, although it may be applicable to other debug info
IR analyses.
|
|
This patch replaces, std::integral_constant<bool, ...> with
std::bool_constant for brevity. Note that std::bool_constant was
introduced as part of C++17.
There are cases where we could replace EXPECT_EQ(false, ...) with
EXPECT_FALSE(...), but I'm not doing that in this patch to avoid doing
multiple things in one patch.
|
|
(#154635)
`MD_prof` is safe to keep when e.g. hoisting instructions.
Issue #147390
|
|
Make IntrinsicsToAttributesMap's func. and arg. fields be able to have
adaptive sizes based on input other than hardcoded 8bits/8bits.
This will ease the pressure for adding new intrinsics in private
downstreams.
func. attr bitsize will become 7(127/128) vs 8(255/256)
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
|
|
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32" "n64" etc is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
The __truncdfhf2 handling is kind of convoluted, but reproduces
the existing, likely wrong, handling.
|
|
Rather than passes using `!prof = !{!”unknown”}`for cases where don’t have enough information to emit profile values, this patch captures the pass (or some other information) that can help diagnostics - i.e. `!{!”unknown”, !”some-pass-name”}`.
For example, suppose we emitted a `select` with the unknown metadata, and, later, end up needing to lower that to a conditional branch. If we observe (via sample profiling, for example) that the branch is biased and would have benefitted from a valid profile, the extra information can help speed up debugging.
We can also (in a subsequent pass) generate optimization remarks about such lowered selects, with a similar aim - identify patterns lowering to `select` that may be worth some extra investment in extracting a more precise profile.
|
|
When merging DILocations, we prefer to use DebugLoc::getMergedLocation
when possible to better preserve DebugLoc coverage tracking information
through transformations (as conversion to DILocations drops all coverage
tracking data). Currently, DebugLoc::getMergedLocation checks to see if
either DebugLoc is empty and returns it directly if so, to propagate
that DebugLoc's coverage tracking data to the merged location; however,
it only checks whether either location is valid, not whether they are
annotated.
This is significant because an annotated location is not a bug, while an
empty unannotated location may be one; therefore, we check to see if
either location is unannotated, and prefer to return that location if it
exists rather than an annotated one.
This change is NFC outside of DebugLoc coverage tracking builds.
|
|
Fixed intrinsic VPDPBUSD[,S]_128/256/512's argument types to match with the ISA.
Fixes part of #97271
|
|
Fixes #157448
|
|
I don't think we need to explicitly specify i1 alignment, as this is
going to fall back to i8 alignment.
This may change behavior if a data layout explicitly sets i8 alignment
without also setting i1 layout, but I'd expect this to be a bug fix in
that case.
|
|
In order to better see what's going on during ThinLTO linking, this PR
adds more profile tags when using `--time-trace` on a `lld-link.exe`
invocation.
After PR, linking `clang.exe`:
<img width="3839" height="2026" alt="Capture d’écran 2025-09-02 082021"
src="https://github.com/user-attachments/assets/bf0c85ba-2f85-4bbf-a5c1-800039b56910"
/>
Linking a custom (Unreal Engine game) binary gives a completly
different picture, probably because of using Unity files, and the sheer
amount of input files (here, providing over 60 GB of .OBJs/.LIBs).
<img width="1940" height="1008" alt="Capture d’écran 2025-09-02 102048"
src="https://github.com/user-attachments/assets/60b28630-7995-45ce-9e8c-13f3cb5312e0"
/>
|
|
The 256-bit maximum vector register size control was removed from AVX10
whitepaper, ref: https://cdrdv2.intel.com/v1/dl/getContent/784343
We have warned these options in LLVM21 through #132542. This patch
removes underlying implementations in LLVM22.
|
|
Some passes synthesize functions, e.g. WPD, so we may need to indicate “this synthesized function’s entry count cannot be estimated at compile time” - akin to `branch_weights`.
Issue #147390
|
|
- Add clang built-ins + sema/codegen
- Add IR Intrinsic + verifier
- Add DAG/GlobalISel codegen for the intrinsics
- Add lowering in SIMemoryLegalizer using a MMO flag.
|
|
getTypeAllocSize() currently works by taking the type store size and
aligning it to the ABI alignment. However, this ends up doing redundant
work in various cases, for example arrays will unnecessarily repeat the
alignment step, and structs will fetch the StructLayout multiple times.
As this code is rather hot (it is called every time we need to calculate
GEP offsets for example), specialize the implementation. This repeats a
small amount of logic from getAlignment(), but I think that's
worthwhile.
|
|
Reviewers: fmayer, nikic
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/156128
|
|
The number of alignment entries is usually very small (5-7), so
it is more efficient to use a linear scan than a binary search.
|
|
As noted in #153256, TableGen is generating reserved names for
RuntimeLibcalls, which resulted in a build failure for Arm64EC since
`vcruntime.h` defines `__security_check_cookie` as a macro.
To avoid using reserved names, all impl names will now be prefixed with
`Impl_`.
`NumLibcallImpls` was lifted out as a `constexpr size_t` instead of
being an enum field.
While I was churning the dependent code, I also removed the TODO to move
the impl enum into its own namespace and use an `enum class`: I
experimented with using an `enum class` and adding a namespace, but we
decided it was too verbose so it was dropped.
|
|
Our GPU compiler usually construct pointers through inttoptr. The memory
was pre-allocated before the shader function execution and remains valid
through the execution of the shader function. This brings back the
expected behavior of instruction hoisting for the test
`hoist-speculatable-load.ll`, which was broken by #126117.
|
|
Instead of relying on the implicit cast. The scalable case has
been explicitly checked beforehand.
|
|
Adds a `target_specific_attrs` optional array attribute to
`mlir.global`, as well as conversions to and from LLVM attributes on
`llvm::GlobalVariable` objects. This is necessary to preserve unknown
attributes on global variables when converting to and from the LLVM
Dialect. Previously, any attributes on an `llvm::GlobalVariable` not
explicitly modeled by `mlir.global` were dropped during conversion.
|
|
Correct few typos: 'seperate' -> 'separate' .
|
|
fixes #152348
SimplifyCFG collapses raw buffer store from a if\else load into a
select.
This change prevents the TargetExtType dx.Rawbuffer from being replace
thus preserving the if\else blocks.
A further change was needed to eliminate the phi node before we process
Intrinsic::dx_resource_getpointer in DXILResourceAccess.cpp
|
|
https://github.com/llvm/llvm-project/issues/154528
# Brief
Indirect linking of llvm as a shared library is causing a "free()
invalid size abortion". In my case, my project depends on google/clspv
which in turn pulls `llvm`. Note that the issue does not occur when
`clspv` and `llvm` is all statically linked.
# Structure of a project which might be causing an error
[google/clspv](https://github.com/google/clspv) has been depending on
this project (llvm-project), as a static library.
My personal project has been depending on
[google/clspv](https://github.com/google/clspv) as a shared library.
So `MyProject` was linked to shared object `clspv_core.so` which is
containing `llvm-project` as its component.
# Problem
Linking `llvm-project` indirectly to `MyProject` via `clspv_core` was
causing the `free() invalid size` abortion.
> When library is all statically linked, this problem did not occur.
[This issue](https://github.com/llvm/llvm-project/issues/154528) has a
full log of the programme running with valgrind.
# Reason in my expectation
`KnownAssumptionStrings` from
[clang/lib/Sema/SemaOpenMP.cpp](https://github.com/llvm/llvm-project/pull/154541/files#diff-032b46da5a8b94f6d8266072e296726c361066e32139024c86dcba5bf64960fc),
[llvm/include/llvm/IR/Assumptions.h](https://github.com/llvm/llvm-project/pull/154541/files#diff-ebb09639e5957c2e4d27be9dcb1b1475da67d88db829d24ed8039f351a63ccff),
[llvm/lib/IR/Assumptions.cpp](https://github.com/llvm/llvm-project/pull/154541/files#diff-1b490dd29304c875364871e35e1cc8e47bf71898affe3a4dbde6eb91c4016d06)
and `FeatureMap` from
[llvm/lib/Analysis/MLInlineAdvisor.cpp](https://github.com/llvm/llvm-project/pull/154541/files#diff-26c738eb291410ed83595a4162de617e8cbebddb46331f56d39d193868e29857),
[llvm/include/llvm/Analysis/InlineModelFeatureMaps.h](https://github.com/llvm/llvm-project/pull/154541/files#diff-3b5a3359b2a0784186fb3f90dfabf905e8640b6adfd7d2c75259a6835751a6a7)
which have been placed on global scope, causing static initialisation
order ficasso when indirectly linked by `Myproject`.
# Fix trial
Changing those global instances I've mentioned ~
`KnownAssumptionStrings` and `FeatureMap` ~ to functions which return a
static variable's left value ~ `getKnownAssumptionStrings()`,
`getFeatureMap()` ~ has solved my personal problem, so I am pulling a
request of it.
|
|
subrange debug (#154665)
Set debug was failing validation where the base type was subrange. This
has
happened with the addition of the new functionality for proper subrange
debugging. This fix just allows set types to be based on subranges.
|
|
Upgrade the !"grid_constant" !nvvm.annotation to a "nvvm.grid_constant"
attribute. This attribute is much simpler for front-ends to apply and
faster and simpler to query.
|
|
|