Age | Commit message (Collapse) | Author | Files | Lines |
|
Utility functions have been moved out to Utils. Minor opportunity to
drop the header where not needed.
|
|
coroutines to the `noalloc` variant (#99285)
This patch is episode three of the middle end implementation for the
coroutine HALO improvement project published on discourse:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
After we attribute the calls to some coroutines as "coro_elide_safe" in
the C++ FE and creating a `noalloc` ramp function, we use a new middle
end pass to move the call to coroutines to the noalloc variant.
This pass should be run after CoroSplit. For each node we process in
CoroSplit, we look for its callers and replace the attributed ones in
presplit coroutines to the noalloc one. The transformed `noalloc` ramp
function will also require a frame pointer to a block of memory it can
use as an activation frame. We allocate this on the caller's frame with
an alloca.
Please note that we cannot safely transform such attributed calls in
post-split coroutines due to memory lifetime reasons. The CoroSplit pass
is responsible for creating the coroutine frame spills for all the
allocas in the coroutine. Therefore it will be unsafe to create new
allocas like this one in post-split coroutines. This happens relatively
rarely because CGSCC performs the passes on the callees before the
caller. However, if multiple coroutines coexist in one SCC, this
situation does happen (and prevents us from having potentially unbound
frame size due to recursion.)
You can find episode 1: Clang FE of this patch series at
https://github.com/llvm/llvm-project/pull/99282
Episode 2: CoroSplit at https://github.com/llvm/llvm-project/pull/99283
|
|
synthetic count passes. (#107471)
The primary motivation is to remove `EntryCount` from `FunctionSummary`.
This frees 8 bytes out of `sizeof(FunctionSummary)` (136 bytes as of
https://github.com/llvm/llvm-project/commit/64498c54831bed9cf069e0923b9b73678c6451d8).
While I'm at it, this PR clean up {SummaryBasedOptimizations,
SyntheticCountsPropagation} since they were not used and there are no
plans to further invest on them.
With this patch, bitcode writer writes a placeholder 0 at the byte
offset of `EntryCount` and bitcode reader can parse the function entry
count at the correct byte offset. Added a TODO to stop writing
`EntryCount` and bump bitcode version
|
|
Pass to flatten and lower the contextual profile to profile (i.e. `MD_prof`) metadata. This is expected to be used after all IPO transformations have happened.
Prior to lowering, the instrumentation is maintained during IPO and the contextual profile is kept in sync (see PRs #105469, #106154). Flattening (#104539) sums up all the counters belonging to all a function's context nodes.
We first propagate counter values (from the flattened profile) using the same propagation algorithm as `PGOUseFunc::populateCounters`, then map the edge values to `branch_weights`. Functions. in the module that don't have an entry in the flattened profile are deemed cold, and any `MD_prof` metadata they may have is reset. The profile summary is also reset at this point.
Issue [#89287](https://github.com/llvm/llvm-project/issues/89287)
|
|
This patch implements the new pass and registers it with the pass
manager. For context, this is a vectorizer that operates on Sandbox IR,
which is a transactional IR on top of LLVM IR.
|
|
|
|
https://github.com/llvm/llvm-project/pull/106075 has removed the
last dependency on LoopInfo in InstCombine, so don't fetch the
analysis anymore and remove the use-loop-info pass option.
|
|
(Part I) (#96878)
This is simplifycfg part of
https://github.com/llvm/llvm-project/pull/95515
In this PR, we support hoisting load/store with conditional faulting in
`SimplifyCFGOpt::speculativelyExecuteBB` to eliminate conditional
branches.
This is for cases like
```
void test (int a, int *b) {
if (a)
*b = a;
}
```
In the following patches, we will support the hoist in
`SimplifyCFGOpt::hoistCommonCodeFromSuccessors`.
That is for cases like
```
void test (int a, int *c, int *d) {
if (a)
*c = a;
else
*d = a;
}
```
|
|
This transformation doesn't actually use any of the internal state of
LSR and recomputes all information from SCEV. Splitting it out makes
it easier to test.
Note that long term I would like to write a version of this transform
which *is* integrated with LSR's solver, but if that happens, we'll
just delete the extra pass.
Integration wise, I switched from using TTI to using a pass configuration
variable. This seems slightly more idiomatic, and means we don't run
the extra logic on any target other than RISCV.
|
|
`UseCtxProfile` (#104492)
|
|
Broke this out into its own commit to make the next one easier to
review.
Pull Request: https://github.com/llvm/llvm-project/pull/100700
|
|
DXIL Metadata Analysis passes (one for legacy PM and one for new PM)
that collect following DXIL module metadata information in a structure
are added.
1. Shader Model version
2. DXIL version
3. Shader Stage
Information collected using the legacy pass is verified by adding
additional test commands to existing metadata test sources.
|
|
(#102812)
Keep respecting the old cl::opt for now.
|
|
Split from #100596.
Introduce the RealtimeSanitizer transform, which inserts the
rtsan_enter/exit functions at the appropriate places in an instrumented
function.
|
|
This is an immutable analysis that loads and makes the contextual profile available to other passes. This patch introduces the analysis and an analysis printer pass. Subsequent patches will introduce the APIs that IPO passes will call to modify the profile as result of their changes.
|
|
unpredictable. (#98495)
The `!unpredictable` metadata has been present for a long time, but
it's usage in optimizations is still limited. This patch teaches
`FoldTwoEntryPHINode()` to be more aggressive with an unpredictable
branch to reduce mispredictions.
A TTI interface `getBranchMispredictPenalty()` is added to distinguish
between different hardwares to ensure we don't go too far for simpler
cores. For simplicity, only a naive x86 implementation is included for
the time being.
|
|
[CodeGen] Change the prototype of regalloc filter function
Change the prototype of the filter function so that we can
filter not just by RegClass. We need to implement more
complicated filter based upon some other info associated
with each register.
Patch provided by: Gang Chen (gangc@amd.com)
|
|
- Add `PHIEliminationPass `.
- Support new pass manager in `MachineBasicBlock:: SplitCriticalEdge `
|
|
Add `TwoAddressInstructionPass`.
|
|
- Add `MachineVerifierPass`.
- Use complete `MachineVerifierPass` in `VerifyInstrumentation` if
possible.
`LiveStacksAnalysis` will be added in future, all other analyses are
done.
|
|
Add `MachineOptimizationRemarkEmitterAnalysis` the legacy version
`MachineOptimizationRemarkEmitterPass` is already a wrapper.
|
|
- Add `MachineBlockFrequencyAnalysis`.
- Add `MachineBlockFrequencyPrinterPass`.
- Use `MachineBlockFrequencyInfoWrapperPass` in legacy pass manager.
- `LazyMachineBlockFrequencyInfo::print` is empty, drop it due to new
pass manager migration.
|
|
- Add `LiveIntervalsAnalysis`.
- Add `LiveIntervalsPrinterPass`.
- Use `LiveIntervalsWrapperPass` in legacy pass manager.
- Use `std::unique_ptr` instead of raw pointer for `LICalc`, so
destructor and default move constructor can handle it correctly.
This would be the last analysis required by `PHIElimination`.
|
|
- Add `SlotIndexesAnalysis`.
- Add `SlotIndexesPrinterPass`.
- Use `SlotIndexesWrapperPass` in legacy pass.
|
|
- Port `LiveVariables` to new pass manager.
- Convert to `LiveVariablesWrapperPass` in legacy pass manager.
|
|
- Add `MachineLoopAnalysis`.
- Add `MachineLoopPrinterPass`.
- Convert to `MachineLoopInfoWrapperPass` in legacy pass manager.
|
|
This reverts commit 493c384a7d94cce1d18824a6b0e1f9ee20cdc681
and includes a fix for the build breakage.
|
|
This reverts commit 15ad7919f6dd18b5d7f5a22daad6a5c25ecb8793.
The commit broke the build bot
https://lab.llvm.org/buildbot/#/builders/11/builds/822.
|
|
This PR introduces the numerical sanitizer originally proposed by
Clement Courbet on https://reviews.llvm.org/D97854
(https://arxiv.org/abs/2102.12782).
The main additions include:
- Migration to LLVM opaque pointers
- Migration to various updated APIs
- Extended coverage for LLVM instructions/intrinsics
- Code refactoring
The tool is still very experimental, the coverage (e.g. for intrinsics /
library functions) is incomplete.
Link: https://discourse.llvm.org/t/rfc-revival-of-numerical-sanitizer/79601
---------
Co-authored-by: Fangrui Song <i@maskray.me>
|
|
(#96858) (#96869)
This reverts commit ab58b6d58edf6a7c8881044fc716ca435d7a0156.
In `CodeGen/Generic/MachineBranchProb.ll`, `llc` crashed with dumped MIR
when targeting PowerPC. Move test to `llc/new-pm`, which is X86
specific.
|
|
In NewPM pass managers, add a "pretty stack frame" that tells you which
pass crashed while running which function.
For example `opt -O3 -passes-ep-peephole=trigger-crash-function test.ll`
will print something like this:
```
Stack dump:
0. Program arguments: build/bin/opt -S -O3 -passes-ep-peephole=trigger-crash-function test.ll
1. Running pass "function<eager-inv>(mem2reg,instcombine<max-iterations=1;no-use-loop-info;no-verify-fixpoint>,trigger-crash-function,simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;switch-range-to-icmp;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts;speculate-blocks;simplify-cond-branch>)" on module "test.ll"
2. Running pass "trigger-crash-function" on function "fshl_concat_i8_i8"
```
While the crashing pass is usually evident from the stack trace, this
also shows which function triggered the crash, as well as the pipeline
string for the pass (including options).
Similar functionality existed in the LegacyPM.
|
|
Reverts llvm/llvm-project#96389
Some ppc bots failed.
|
|
Like IR version `print<branch-prob>`, there is also a
`print<machine-branch-prob>`.
|
|
Now we have several machine function analyses but forgot to support them
in `parseMachinePass`.
|
|
manager (#96378)
Follows #95879.
|
|
(#96462)
On MSVC the `this` uses inside `decltype` require a lambda capture. On
clang they result in an unused capture warning instead. Add the capture
and suppress the warning with `(void)this`.
-----
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.
This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.
This way there is no compile-time impact if the mapping is not used.
|
|
My attempt to fix the Windows build made things worse,
revert entirely for now.
This reverts commit e7137f2fed5cfee822ae3c4c6d39188adb59a16c.
This reverts commit 6eaf204dbb0a6a81cddfd02f625c130f7bb1aae5.
This reverts commit 957dc4366dd2ce9d5d2991c3ad76bbf438e9954e.
|
|
Initializing this map is somewhat expensive (especially for O0), so we
currently only do it if certain flags are used. I would like to make use
of it for crash dumps (#96078), where we don't know in advance whether
it will be needed or not.
This patch changes the initialization to a lazy approach, where a
callback is registered that does the actual initialization. The
callbacks will be run the first time the pass name is requested.
This way there is no compile-time impact if the mapping is not used.
|
|
RAII class (#94854)
Modify MachineFunctionProperties in PassModel makes `PassT P;
P.run(...);` not work properly. This is a necessary compromise.
|
|
(#95879)
- Add `MachineDominatorTreeAnalysis`
- Add `MachineDominatorTreePrinterPass` There is no test for this
analysis in codebase.
Also, the pass name is renamed to `machine-dom-tree` instead of
`machinedomtree`.
|
|
Previously, there was at least one virtual function call for every
allocated register. The only users of this feature are AMDGPU and RISC-V
(RVV), other targets don't use this. To easily identify these cases,
change the default functor to nullptr and don't call it for every
allocated register.
|
|
This pass is not used in any pipeline, barely used in tests and not
really useful, so drop it. The only place where we "repeat" passes is
devirt repetition, and that is done using a separate pass.
|
|
LoopIdiomVectorize (#94081)
To facilitate sharing LoopIdiomTransform between AArch64 and RISC-V,
this first patch moves AArch64LoopIdiomTransform from lib/Target/AArch64
to lib/Transforms/Vectorize and renames it to LoopIdiomVectorize. The
following patch (#94082) will teach LoopIdiomVectorize how to generate VP
intrinsics (in addition to the current masked vector style) in favor of
RVV.
|
|
This pull request port `regallocfast` to new pass manager. It exposes
the parameter `filter` to handle different register classes for AMDGPU.
IIUC AMDGPU need to allocate different register classes separately so it
need implement its own `--<reg-class>-regalloc`. Now users can use e.g.
`-passe=regallocfast<filter=sgpr>` to allocate specific register class.
The command line option `--regalloc-npm` is still in work progress, plan
to reuse the syntax of passes, e.g. use
`--regalloc-npm=regallocfast<filter=sgpr>,greedy<filter=vgpr>` to
replace `--sgpr-regalloc` and `--vgpr-regalloc`.
|
|
This is a mostly-target-independent variadic function optimisation and
lowering pass. It is only enabled for AMDGPU in this initial commit.
The purpose is to make C style variadic functions a zero cost
abstraction. They are lowered to equivalent IR which is then amenable to
other optimisations. This is inherently slightly target specific but
much less so than one might expect - the C varargs interface heavily
constrains the ABI design divergence.
The pass is primarily tested from webassembly. This is because wasm has
a straightforward variadic lowering strategy which coincides exactly
with what this pass transforms code into and a struct passing convention
with few cases to check. Adding further targets conventions is
straightforward and elided from this patch primarily to simplify the
review. Implemented in other branches are Linux X86, AMD64, AArch64 and
NVPTX.
Testing for targets that have existing lowering for va_arg from clang is
most efficiently done by checking that clang | opt completely elides the
variadic syntax from test cases. The lowering produces a struct for each
call site which can be inspected to check the various alignment and
indirections are correct.
AMDGPU presently has no variadic support other than some ad hoc printf
handling. Combined with the pass being inactive on all other targets
landing this represents strict increase in capability with zero risk.
Testing and refining will continue post commit.
In addition to the compiler tests included here, a self contained x64
clang/musl toolchain was constructed using the "lowering" instead of the
systemv ABI and used to build various C programs like lua and libxml2.
|
|
There are two AArch64 tests use `-start-before` and `-print-after`. Rest tests uses `--passes` to test this pass.
|
|
It should preserve more analysis results, but it happens immediately
after instruction selection.
|
|
Reland of https://github.com/llvm/llvm-project/pull/91334, which broke
the gcc7 buildbot and was reverted in
https://github.com/llvm/llvm-project/pull/92321.
Work around the failure by being explicit about returning an `Expected`.
cc @joker-eph
|
|
(#92321)
Reverts llvm/llvm-project#91334
This broke the gcc7 build.
I suspect the issue is a mismatch on user-defined move constructor on
the return: `return PreservedGVs;` does not match the return type of the
function.
|
|
This PR adds a string interface to `InternalizePass`' `MustPreserveGV`
option, which is a callback function to indicate if a GV is not to be
internalized. This is for use in LLVM.jl, the Julia wrapper for LLVM,
which uses the C API and is thus required to use the PassBuilder string
API for building NewPM pipelines.
|