|
While we do not want to form actual lookup tables early, we do want to
perform the switch-to-arithmetic conversion early, as it may enable
inlining of the much simpler form.
Builds on https://github.com/llvm/llvm-project/pull/156477, which
originally included this change as well. This PR makes two changes on
top of it:
* Do not perform the optimization early if it requires adding a mask
check, as such checks make the resulting IR less analyzable.
* Add a new SimplifyCFG option that controls switch-to-arithmetic
conversion separately from switch-to-lookup conversion. Enable the new
flag at the end of the function simplification pipeline. This means that
we attempt the arithmetic conversion before inlining, but avoid it in
the early pipeline, where it may lose information.
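As a hedged source-level illustration (not the SimplifyCFG implementation itself), this is the kind of switch the conversion targets:

```cpp
// Illustrative only: the case results form a linear function of the case value.
int scale(unsigned x) {
  switch (x) {
  case 0: return 1;
  case 1: return 3;
  case 2: return 5;
  case 3: return 7;
  default: return 0;
  }
}
// Instead of materializing a lookup table, SimplifyCFG can emit arithmetic of
// the form `2 * x + 1` guarded by a range check, which is much easier for the
// inliner and later analyses to reason about.
```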
|
|
Introduce `AllocToken`, an instrumentation pass designed to provide
tokens to memory allocators enabling various heap organization
strategies, such as heap partitioning.
Initially, the pass instruments functions marked with a new attribute
`sanitize_alloc_token` by rewriting allocation calls to include a token
ID, appended as a function argument with the default ABI.
The design aims to provide a flexible framework for implementing
different token generation schemes. It currently supports the following
token modes:
- TypeHash (default): token IDs based on a hash of the allocated type
- Random: statically-assigned pseudo-random token IDs
- Increment: incrementing token IDs per TU
For the `TypeHash` mode, introduce support for `!alloc_token` metadata:
the metadata can be attached to allocation calls to provide richer
semantic
information to be consumed by the AllocToken pass. Optimization remarks
can be enabled to show where no metadata was available.
An alternative "fast ABI" is provided, where instead of passing the
token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the
token ID is directly encoded into the name of the called function (e.g.,
`__alloc_token_0_malloc(size)`). Where the maximum number of tokens is small, this
offers more efficient instrumentation by avoiding the overhead of
passing an additional argument at each allocation site.
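A rough sketch of what the rewritten call sites look like at the source level. The function names come from the examples above; the exact parameter types are an assumption, and the real runtime provides the definitions:

```cpp
#include <cstddef>

// Illustrative declarations only; the actual runtime signatures may differ.
extern "C" void *__alloc_token_malloc(std::size_t size, std::size_t token_id); // default ABI
extern "C" void *__alloc_token_0_malloc(std::size_t size); // "fast ABI": token 0 encoded in the name

void *make_node(std::size_t size) {
  // Original call in a function carrying the sanitize_alloc_token attribute:
  //   return malloc(size);
  // Default ABI after instrumentation: the token ID (e.g. a TypeHash-derived
  // value, shown here as 0) is appended as an extra argument.
  return __alloc_token_malloc(size, /*token_id=*/0);
}
```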
Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1]
---
This change is part of the following series:
1. https://github.com/llvm/llvm-project/pull/160131
2. https://github.com/llvm/llvm-project/pull/156838
3. https://github.com/llvm/llvm-project/pull/162098
4. https://github.com/llvm/llvm-project/pull/162099
5. https://github.com/llvm/llvm-project/pull/156839
6. https://github.com/llvm/llvm-project/pull/156840
7. https://github.com/llvm/llvm-project/pull/156841
8. https://github.com/llvm/llvm-project/pull/156842
|
|
(#158608)
This PR, which supersedes
https://github.com/llvm/llvm-project/pull/139943, extends the scenarios
where the 'norecurse' attribute can be inferred.
Currently, the 'norecurse' attribute is only inferred if all called
functions also have this attribute. This change introduces a new pass in
the LTO pipeline, run after Whole Program Devirtualization, to broaden
the inference criteria. The new pass inspects all functions in the
module and sets a flag if any functions are external or have their
addresses taken (while ignoring those already marked norecurse). This
flag is then used with the existing conditions to enable inference in
more cases.
This enhancement allows 'norecurse' to be applied in situations where a
function calls a recursive function, but is not part of the same
recursion chain.
For example, foo can now be marked 'norecurse' in the following
scenarios:
`foo -> callee1 -> callee2 -> callee2`
In this case, foo and callee1 can both be marked 'norecurse' because
they're not part of the callee2 recursion.
Similarly, foo can be marked 'norecurse' here:
`foo -> callee1 -> callee2 -> callee1`
Here, foo is not part of the callee1 -> callee2 -> callee1 recursion
chain, so it can be marked 'norecurse'.
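A hedged source-level sketch of the second scenario (the C++ names mirror the call chains above; the actual inference runs on IR and also depends on the module-level conditions described earlier):

```cpp
// Hedged sketch of the second scenario.
int callee2(int n); // forward declaration for the mutual recursion

int callee1(int n) { return n <= 0 ? 0 : callee2(n - 1); } // part of the cycle
int callee2(int n) { return n <= 0 ? 1 : callee1(n - 1); } // part of the cycle

// foo calls into the callee1 <-> callee2 recursion but is never re-entered as
// part of it, so with the broadened criteria it can be inferred 'norecurse'.
int foo(int n) { return callee1(n) + 1; }
```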
|
|
It's unnecessary to build the whole symbol table, and suboptimal to do so
for every function. All we really need is the instrumented PGO name
(which also accounts for whether LTO is enabled), and from that we can
compute the function name.
|
|
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
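A minimal sketch of the pattern being changed, with a hypothetical option name:

```cpp
#include "llvm/Support/CommandLine.h"

// Before: the exported option object lived in the global namespace.
// llvm::cl::opt<bool> EnableFooOpt("enable-foo-opt", llvm::cl::init(false));

// After: the definition is wrapped in the llvm namespace, so the exported
// symbol is llvm::EnableFooOpt rather than ::EnableFooOpt.
namespace llvm {
cl::opt<bool> EnableFooOpt("enable-foo-opt", cl::init(false));
} // namespace llvm
```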
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder`, which can then pass it down to all passes
that need it.
|
|
Initially this was needed to replace the fixed-step canonical IV with
the variable-step EVL IV, but this was eventually superseded by the loop
vectorizer doing this transform itself in #147222. The pass was then
removed from the RISC-V pipeline in #151483 and the loop vectorizer
stopped emitting the metadata used by the pass in #155760, so it now
has no users.
|
|
(#159497)
Add missing coroutine passes so that coro code can be correctly
compiled.
Fix #155558
|
|
This is a very rough state of what this can look like, but I didn't want
to spend too much time on what could be a dead end.
Currently the only way to invoke callbacks is by using the default
pipelines. This is an issue if you want to define your own pipeline
using the C string API (we do that in LLVM.jl in Julia), so I extended
the API to allow invoking those callbacks just as one would call a
pass of that kind.
There are some open questions about the parameters these callbacks take,
and I'm also missing some of them (some are also invoked by the
backend, so we may not want to expose them).
Code written with AI help; bugs are mine. (Not sure what LLVM's policy
on this is.)
|
|
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added printer pass.
(3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
|
|
This adds a new pass for dropping assumes that are unlikely to be useful
for further optimization.
It works by discarding any assumes whose affected values are one-use
(which implies that they are only used by the assume, i.e. ephemeral).
This pass currently runs at the start of the module optimization
pipeline, that is post-inline and post-link. Before that point, it is
more likely for previously "useless" assumes to become useful again,
e.g. because an additional user of the value is introduced after
inlining + CSE.
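A minimal sketch of the core check, using standard LLVM APIs. This is an assumption-laden simplification: it looks only at the assume's condition, whereas the pass reasons about the assume's affected values.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"

// Sketch only, not the actual pass: erase llvm.assume calls whose condition is
// an instruction with a single use, i.e. the condition exists solely to feed
// the assume and is therefore ephemeral.
static bool dropLikelyUselessAssumes(llvm::Function &F) {
  llvm::SmallVector<llvm::AssumeInst *, 8> ToErase;
  for (llvm::Instruction &I : llvm::instructions(F))
    if (auto *Assume = llvm::dyn_cast<llvm::AssumeInst>(&I)) {
      auto *Cond = llvm::dyn_cast<llvm::Instruction>(Assume->getArgOperand(0));
      if (Cond && Cond->hasOneUse()) // the only user is the assume itself
        ToErase.push_back(Assume);
    }
  for (llvm::AssumeInst *Assume : ToErase)
    Assume->eraseFromParent(); // the dead condition is left for later DCE
  return !ToErase.empty();
}
```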
|
|
(#159516)
No loop pass seems to use it now, after LoopPredication stopped using it
in https://reviews.llvm.org/D111668.
|
|
This PR is a reapplication of
https://github.com/llvm/llvm-project/pull/142686
|
|
(#158764)
This reverts commit 895cda70a95529fd22aac05eee7c34f7624996af
and the fix attempt 06f671e57a574ba1c5127038eff8e8773273790e.
Reason: performance regressions and broken sanitizers; see #142686.
|
|
This patch adds the flag -fexperimental-loop-fuse to the clang and flang
drivers. This is primarily useful for experiments, as we envision
enabling the pass one day.
The option follows the same principles and reasoning as
`-floop-interchange`.
---------
Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
|
|
Align the syntax used for the optimization level argument of the
expand-fp pass in textual descriptions of pass pipelines with the syntax
used by other passes taking a similar argument. That is, use e.g.
`expand-fp<O1>` instead of `expand-fp<opt-level=1>`.
|
|
|
|
This patch implements a correctly rounded expansion of the frem
instruction in LLVM IR. This is useful for target architectures for
which such an expansion is too involved to be implemented in ISel
Lowering. The expansion is based on the code from the AMD device libs
and has been tested successfully against the OpenCL conformance tests on
amdgpu. The expansion is implemented in the preexisting "expand-fp"
pass. It replaces the expansion of "frem" in ISel for the amdgpu target;
it is enabled for targets which do not directly support "frem" and for
which no matching "fmod" LibCall is available.
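A hedged, standalone illustration (not the algorithm the pass uses) of why a dedicated, correctly rounded expansion matters: the naive rewrite rounds at every intermediate step, while `fmod` computes the remainder exactly.

```cpp
#include <cmath>
#include <cstdio>

// Naive frem-style expansion: both the division and the multiplication round,
// so the result can be far from the true remainder.
static float naiveFrem(float x, float y) {
  return x - std::trunc(x / y) * y;
}

int main() {
  float x = 1.0e8f, y = 0.3f;
  // fmod is computed exactly for these operands, so it serves as the reference.
  std::printf("naive: %g\n", naiveFrem(x, y));
  std::printf("fmod : %g\n", std::fmod(x, y));
  return 0;
}
```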
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
|
|
This patch introduces `SCEVDivisionPrinterPass` and registers it under
the name `print<scev-division>`, primarily for testing purposes. This
pass invokes `SCEVDivision::divide` upon encountering `sdiv`, and prints
the numerator, denominator, quotient, and remainder. It also adds
several test cases, some of which are currently incorrect and require
fixing.
Along with that, this patch adds some comments to clarify the behavior
of `SCEVDivision::divide`, as follows:
- This function does NOT actually perform the division
- Given the `Numerator` and `Denominator`, find a pair
`(Quotient, Remainder)` s.t.
`Numerator = Quotient * Denominator + Remainder`
- The common condition `Remainder < Denominator` is NOT necessarily
required
- There may be multiple solutions for `(Quotient, Remainder)`, and this
function finds one of them
- In particular, there is always a trivial solution `(0, Numerator)`
- The following computations may wrap
- The multiplication of `Quotient` and `Denominator`
- The addition of `Quotient * Denominator` and `Remainder`
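As a plain-integer illustration of the contract above (not SCEV-specific): for `Numerator = 7` and `Denominator = 3`, both `(Quotient, Remainder) = (2, 1)` and the trivial `(0, 7)` satisfy `Numerator = Quotient * Denominator + Remainder`; `divide` returns one such pair, and `Remainder < Denominator` is not guaranteed.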
Related discussion: #154745
|
|
-print-before-pass-number/-print-after-pass-number options allow multiple pass numbers to be specified (#155228)
`-print-before` and `-print-after` support multiple passes as a list of
strings, so it makes sense that we also support
`-print-before-pass-number` and `-print-after-pass-number` taking a list
of pass numbers as input. This is useful if you want to print out the
IR before/after specific passes, using the pass numbers reported by
`-print-pass-numbers`, in a single run.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
Reverts llvm/llvm-project#148758
[As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
|
|
When compiling in `--hipstdpar` mode, the builtins corresponding to the
standard library might end up in code that is expected to execute on the
accelerator (e.g. by using the `std::` prefixed functions from
`<cmath>`). We do not have uniform handling for this in AMDGPU, and the
errors that result are quite arcane. Furthermore, the user-space changes
required to work around this tend to be rather intrusive.
This patch adds an additional `--hipstdpar`-specific pass which forwards
the intrinsics / libcalls that result from the use of the math builtins,
and which are not properly handled, to the runtime component of
HIPSTDPAR. In the long run we will want to stop relying on this and
handle things in the compiler, but that is going to be a rather lengthy
journey, which makes this medium-term escape hatch necessary.
The paired change in the runtime component is here:
<https://github.com/ROCm/rocThrust/pull/551>.
|
|
ObjCARCAPElimPass has been made obsolete now that we remove unused
autorelease pools.
|
|
Add two passes: one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this to `llc`).
Tracking issue: #147390
|
|
This allows for unit testing of finalizeBundle with standard MIR tests
using update_mir_test_checks.py.
|
|
Enabling one of MemorySSA or MD implies the other is off.
Already approved in https://github.com/llvm/llvm-project/pull/149473 but
I had to revert as I missed updating one test.
|
|
This reverts commit 60d2d94db253a9fdc7bd111120c803f808564b30.
|
|
Enabling one of MemorySSA or MD implies the other is off.
|
|
same as https://github.com/llvm/llvm-project/pull/139517
This replaces the InvalidateAnalysisPass<MachineFunctionAnalysis> pass.
There are no cross-function analysis requirements right now, so clearing
all analyses works for the last pass in the pipeline.
Having the InvalidateAnalysisPass<MachineFunctionAnalysis>() is causing
a problem with ModuleToCGSCCPassAdaptor by deleting machine functions
for other functions and ending up with exactly one correctly compiled
MF, while the rest vanish.
This is because ModuleToCGSCCPassAdaptor propagates PassPA (received from
the CGSCCToFunctionPassAdaptor that runs the actual codegen pipeline on
MFs) to the next SCC. That causes MFA invalidation on functions in the
next SCC.
For us, PassPA happens to be returned from
invalidate<machine-function-analysis> which abandons the
MachineFunctionAnalysis. So while the first function runs through the
pipeline normally, invalidate also deletes the functions in the next SCC
before its pipeline is run. (This seems to be the intended mechanism of
the CG adaptor to allow cross-SCC invalidations.)
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
This commit adds a new pass gate that allows selective disabling
of one or more passes via the clang command line using the
`-opt-disable` option. Passes to be disabled should be specified as a
comma-separated list of their names.
The implementation resides in the same file as the bisection tool. The
`getGlobalPassGate()` function returns the currently enabled gate.
Example: `-opt-disable="PassA,PassB"`
Pass names are matched using case-insensitive comparisons. However, note
that special characters, including spaces, must be included exactly as
they appear in the pass names.
Additionally, a `-opt-disable-enable-verbosity` flag has been introduced to
enable verbose output when this functionality is in use. When enabled,
it prints the status of all passes (either running or NOT running),
similar to the default behavior of `-opt-bisect-limit`. This flag is
disabled by default, which is the opposite of the `-opt-bisect-verbose`
flag (which defaults to enabled).
To validate this functionality, a test file has also been provided. It reuses
the same infrastructure as the opt-bisect test, but disables three
specific passes and checks the output to ensure the expected behavior.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
same as https://github.com/llvm/llvm-project/pull/138829
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
same as https://github.com/llvm/llvm-project/pull/138828.
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
|
|
|
|
This will be useful for testing the set of calls for different systems,
and eventually the result of applying context-specific modifiers. In
the future we should also know the type signatures, and be able to
emit the correct one.
|
|
This allows `machine-function(p1,machine-function(...))` instead of
erroring.
Effectively it is flattened to a single MFPM.
|
|
As mentioned in https://github.com/llvm/llvm-project/pull/145071,
LoopInterchange should be part of the optimization pipeline rather than
the simplification pipeline. This patch moves LoopInterchange into the
optimization pipeline.
More context:
- By default, LoopInterchange attempts to improve data locality,
however, it also takes increasing vectorization opportunities into
account. Given that, it is reasonable to run it as close to
vectorization as possible.
- I looked into previous changes related to the placement of
LoopInterchange, but couldn’t find any strong motivation suggesting that
it benefits other simplifications.
- As far as I have tested (including llvm-test-suite), removing
LoopInterchange from the simplification pipeline does not affect other
simplifications. Therefore, there doesn't seem to be much value in
keeping it there.
- The new position reduces compile-time for ThinLTO, probably because it
only runs once per function in post-link optimization, rather than both
in pre-link and post-link optimization.
I haven't encountered any cases where the positional difference affects
optimization results, so please feel free to revert if you run into any issues.
|
|
There are a handful of passes in PassRegistry.def with outdated or
missing pass options. These strings describing pass options are used for
the printPassNames() function only, which is likely why they have gotten
out of date without being caught. This PR simply changes the few passes
where the option string is out of date, fixing the output of
`-print-passes`. This does not affect the functionality of the pipeline
parser, and is hard to verify in a unit test, so no tests were added.
|
|
|
|
scaling (#143986)
Scale opcodes, types, and args once in `IR2VecVocabAnalysis` so that we can avoid scaling each time while computing embeddings. This PR refactors the vocabulary to explicitly define three sections (Opcodes, Types, and Arguments) used for computing embeddings.
(Tracking issue - #141817 ; partly fixes - #141832)
|
|
Change `ModulePass::skipModule` to take const Module reference.
Additionally, make `OptPassGate::shouldRunPass` const as well, since for
most implementations it's a const query. For `OptBisect`, make
`LastBisectNum` mutable so it can be updated in `shouldRunPass`.
Additional minor cleanup: Change all StringRef arguments to simple
StringRef (no const or reference), change `OptBisect::Disabled` to
constexpr.
|
|
Expose the FatLTO pipeline via `-passes="fatlto-pre-link<Ox>"`, similar
to all the other optimization pipelines. This is to allow reproducing it
outside clang. (Possibly also useful for C API users.)
|
|
|
|
Pipelines like `-passes="default<O3>"` are currently parsed in a special
way. Switch them to work like normal, parameterized module passes.
|
|
The entry count of a function needs to be updated after a callsite is elided by TRE: before elision, the entry count accounted for the recursive call at that callsite. After TRE, we need to remove that callsite's contribution.
This patch enables this for instrumented profiling cases because, there, we know the function entry count captured entries before TRE. We cannot currently address this for sample-based profiling (because we don't know whether this function was TRE-ed in the binary that donated the samples).
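As a hedged numeric illustration (the numbers are made up): if the instrumented entry count is `1000` and the elided recursive callsite contributed `600` entries, the post-TRE entry count should become `1000 - 600 = 400`, since those entries no longer occur through a call.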
|
|
This is a fairly exotic pass; I don't think it makes a lot of sense to
run it at O1, especially as vectorization wouldn't run at O1 anyway.
|
|
Make HashRecognize a non-PassManager analysis that can be called to get
the result on-demand, creating a new getResult() entry-point. The issue
was discovered when attempting to use the analysis to perform a
transform in LoopIdiomRecognize.
|
|
This pass figures out whether inlining has exposed a constant address to
a lowered type test, and removes the test if the address is known
to pass it. Unfortunately this pass ends up needing to reverse
engineer what LowerTypeTests did; this is currently inherent to the design
of ThinLTO importing where LowerTypeTests needs to run at the start.
Reviewers: teresajohnson
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/141327
|