Age | Commit message | Author | Files | Lines |
|
This optimization is performed as a separate pass over newly inserted PHI nodes to simplify and deduplicate them. By processing PHIs separately, we avoid the complexity of the reference-tracking bookkeeping needed to update BBValueInfo structures during insertion.
|
|
|
|
alternate DWARF SourceLanguage encoding (#162255)
This patch sets up `DICompileUnit` to support the DWARFv6
`DW_AT_language_name` and `DW_AT_language_version` attributes (which are
set to replace `DW_AT_language`). This patch changes the
`DICompileUnit::SourceLanguage` field type to a `DISourceLanguageName`
that encapsulates the notion of "versioned vs. unversioned name". A
"versioned" name is one that has an associated version stored separately
in `DISourceLanguageName::Version`.
This patch just changes all the clients of the `getSourceLanguage` API
to expect a `DISourceLanguageName`. Currently they all just `assert`
(via `DISourceLanguageName::getUnversionedName`) that we're dealing with
"unversioned names" (i.e., the pre-DWARFv6 language codes). In follow-up
patches (e.g., draft is at
https://github.com/llvm/llvm-project/pull/162261), when we start
emitting versioned language codes, the `getUnversionedName` calls can
then be adjusted to `getName`.
**Implementation considerations**
* We could have added a new member to `DICompileUnit` alongside the
existing `SourceLanguage` field. I don't think this would have made the
transition any simpler (clients would still need to be aware of
"versioned" vs. "unversioned" language names). I felt that encapsulating
this inside a `DISourceLanguageName` was easier to reason about for
maintainers.
* Currently `DISourceLanguageName` is a 12-byte structure. We could
probably pack all the info inside a `uint64_t` (16 bits for the name,
32 bits for the version, and 1 bit for answering `hasVersionedName`).
Just to keep the prototype simple I used a `std::optional`. But since
the guts of the structure are hidden, we can always change the layout to
a more compact representation instead.
**How to review**
* The new `DISourceLanguageName` structure is defined in
`DebugInfoMetadata.h`. All the other changes fall out from changing the
`DICompileUnit::SourceLanguage` from `unsigned` to
`DISourceLanguageName`.
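As a rough illustration of the encapsulation described above, here is a minimal standalone sketch (hypothetical class name, not the actual definition in `DebugInfoMetadata.h`) of a versioned-vs-unversioned source language name built on `std::optional`:

```cpp
// Minimal standalone sketch (hypothetical class name, not the actual LLVM
// definition) of the versioned-vs-unversioned encapsulation described above.
#include <cassert>
#include <cstdint>
#include <optional>

class SourceLanguageNameSketch {
  uint16_t Name = 0;               // DW_LNAME_* or legacy DW_LANG_* code.
  std::optional<uint32_t> Version; // Present only for versioned names.

public:
  // Pre-DWARFv6 style: a bare language code, no version.
  explicit SourceLanguageNameSketch(uint16_t LegacyCode) : Name(LegacyCode) {}
  // DWARFv6 style: a DW_LNAME_* name plus an associated version.
  SourceLanguageNameSketch(uint16_t Name, uint32_t Version)
      : Name(Name), Version(Version) {}

  bool hasVersionedName() const { return Version.has_value(); }
  uint16_t getName() const { return Name; }
  // Mirrors the assertion behaviour described for getUnversionedName():
  // callers that are not DWARFv6-aware insist on a legacy code.
  uint16_t getUnversionedName() const {
    assert(!hasVersionedName() && "expected a pre-DWARFv6 language code");
    return Name;
  }
  uint32_t getVersion() const {
    assert(hasVersionedName() && "unversioned names carry no version");
    return *Version;
  }
};
```

The compact `uint64_t` layout mentioned above could later replace the `std::optional` without changing this interface.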
|
|
For uniformity with other transforms.
|
|
llvm.coro.end (#155339)" (#159278)
As mentioned in #151067, the current design of llvm.coro.end mixes two functionalities: querying where we are and lowering to some code. This patch separates these functionalities into independent intrinsics by introducing a new intrinsic llvm.coro.is_in_ramp.
Update a test in inline/ML, Reapply #155339
|
|
Splitting out just the recipe finding code from #148626 into a utility
function (along with the extra pattern matchers). Hopefully this makes
reviewing a bit easier.
Added a gtest, since this isn't actually used anywhere yet.
|
|
of llvm.coro.end #153404"" (#159236)
Reverts llvm/llvm-project#155339 because of a CI failure
|
|
llvm.coro.end #153404" (#155339)
As mentioned in #151067, the current design of llvm.coro.end mixes two
functionalities: querying where we are and lowering to some code. This
patch separates these functionalities into independent intrinsics by
introducing a new intrinsic llvm.coro.is_in_ramp.
|
|
This patch adds a class that uses SSA construction, with debug values as
definitions, to determine whether and which debug values for a
particular variable are live at each point in an IR function. This will
be used by the IR reader of llvm-debuginfo-analyzer to compute variable
ranges and coverage, although it may be applicable to other debug info
IR analyses.
|
|
The m_GetElementPtr matcher is incorrect and incomplete. Fix it to match
all possible GEPs to avoid misleading users. It currently just has one
use, and the change is non-functional for that use.
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As the RFC explains, that metadata enables future patches, such as PR
#128785, to fix block frequency issues without losing estimated trip
counts.
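For illustration, here is a hedged sketch of how a transform might create a loop ID node carrying such an entry, following the generic `!llvm.loop` metadata conventions; the exact operand shape of `llvm.loop.estimated_trip_count` is defined by the RFC and follow-up patches, so the `i32` payload here is an assumption:

```cpp
// Hedged sketch: building a loop ID node that carries an estimated trip
// count, following the generic !llvm.loop metadata conventions. The exact
// operand shape of llvm.loop.estimated_trip_count is defined by the RFC and
// follow-up patches; the i32 payload here is an assumption for illustration.
#include "llvm/IR/Constants.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"
#include <cstdint>

using namespace llvm;

MDNode *makeLoopIDWithEstimatedTripCount(LLVMContext &Ctx, uint32_t Count) {
  Metadata *EstimatedTC[] = {
      MDString::get(Ctx, "llvm.loop.estimated_trip_count"),
      ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(Ctx), Count))};
  // The first operand of a loop ID node is a self-reference; it can only be
  // filled in after the (distinct) node has been created.
  Metadata *Ops[] = {nullptr, MDNode::get(Ctx, EstimatedTC)};
  MDNode *LoopID = MDNode::getDistinct(Ctx, Ops);
  LoopID->replaceOperandWith(0, LoopID);
  return LoopID;
}
```

The resulting node would then be attached to the loop latch's terminator as `!llvm.loop`, like other loop attributes.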
|
|
`buildBitSet` had a loop through the entire GlobalLayout to pick up matching
offsets.
The patch maps all offsets to the corresponding
`TypeId`, so we pass a prepared list of offsets into
`buildBitSet`.
On one large internal binary, `LowerTypeTests`
took 58% of ThinLTO link time before the patch.
After the patch it takes just 7% (an absolute saving of 200s).
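For reference, a generic sketch (simplified types, not the actual `LowerTypeTests` code) of the shape of the change: group offsets by type id in a single pass, so each bitset is built from a prepared list instead of re-scanning the whole layout:

```cpp
// Generic sketch (simplified types, not the actual LowerTypeTests code) of
// the shape of the change: group offsets by type id in a single pass, so
// each bitset is built from a prepared list instead of re-scanning the
// entire global layout for every type id.
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct LayoutEntry {
  std::string TypeId;
  uint64_t Offset;
};

std::unordered_map<std::string, std::vector<uint64_t>>
groupOffsetsByTypeId(const std::vector<LayoutEntry> &GlobalLayout) {
  std::unordered_map<std::string, std::vector<uint64_t>> OffsetsByTypeId;
  for (const LayoutEntry &E : GlobalLayout)
    OffsetsByTypeId[E.TypeId].push_back(E.Offset);
  // One O(N) pass overall, instead of an O(N) scan per type id inside
  // buildBitSet.
  return OffsetsByTypeId;
}
```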
|
|
Fixes #152348
SimplifyCFG collapses a raw buffer store from an if/else load into a
select.
This change prevents the TargetExtType dx.Rawbuffer from being replaced,
thus preserving the if/else blocks.
A further change was needed to eliminate the phi node before we process
Intrinsic::dx_resource_getpointer in DXILResourceAccess.cpp
|
|
Extend ExtractLane and FirstActiveLane to support scalable VFs. This
allows correct handling when interleaving with VF = 1.
Alive2 proofs:
- Fixed codegen with this patch: https://alive2.llvm.org/ce/z/8Y5_Vc
(verifies as correct)
- Original codegen: https://alive2.llvm.org/ce/z/twdg3X (doesn't
verify)
Fixes https://github.com/llvm/llvm-project/issues/154967.
|
|
llvm.coro.end (#153404)"
This reverts commit 19a4f520952c2b87de43e7176f34be9906384a33.
See test failure in https://github.com/llvm/llvm-project/pull/153404
|
|
(#153404)
As mentioned in #151067, the current design of `llvm.coro.end` mixes two
functionalities: querying where we are and lowering to some code. This
patch separates these functionalities into independent intrinsics by
introducing a new intrinsic `llvm.coro.is_in_ramp`.
|
|
types (#146171)
Up until now the seed collector could only collect seeds with the same
element type, for example `i32` and `<2 x i32>`.
This patch implements the collection of seeds with different types, like
`i32` and `i8`.
|
|
Move the logic to expand SCEVs directly to a late VPlan transform that
expands SCEVs in the entry block. This turns VPExpandSCEVRecipe into an
abstract recipe without execute, which clarifies how the recipe is
handled, i.e. it is not executed like regular recipes.
It also helps to simplify construction, as now scalar evolution isn't
required to be passed to the recipe.
|
|
Use VPIRMetadata for VPInterleaveRecipe to preserve noalias metadata
added by versioning.
This still uses InterleaveGroup's logic to preserve existing metadata
from IR. This can be migrated separately.
Fixes https://github.com/llvm/llvm-project/issues/153006.
PR: https://github.com/llvm/llvm-project/pull/153084
|
|
(#138472)
Add 3 new iterator ranges to VPPhiAccessors
* incoming_values(): returns a range over the incoming
values of a phi
* incoming_blocks(): returns a range over the incoming
blocks of a phi
* incoming_values_and_blocks(): returns a range over pairs of
incoming values and blocks.
Depends on https://github.com/llvm/llvm-project/pull/124838.
PR: https://github.com/llvm/llvm-project/pull/138472
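To illustrate the shape of the API, here is a standalone sketch (the range names come from the commit; the types are simplified stand-ins, not the real VPlan classes):

```cpp
// Standalone sketch of the API shape (the range names come from the commit;
// the types are simplified stand-ins, not the real VPlan classes).
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Value;
struct Block;

class PhiSketch {
  std::vector<Value *> Values;
  std::vector<Block *> Blocks;

public:
  void addIncoming(Value *V, Block *B) {
    Values.push_back(V);
    Blocks.push_back(B);
  }
  const std::vector<Value *> &incoming_values() const { return Values; }
  const std::vector<Block *> &incoming_blocks() const { return Blocks; }
  // Pairs of (incoming value, incoming block), kept in sync by construction.
  std::vector<std::pair<Value *, Block *>> incoming_values_and_blocks() const {
    assert(Values.size() == Blocks.size());
    std::vector<std::pair<Value *, Block *>> Pairs;
    for (std::size_t I = 0; I < Values.size(); ++I)
      Pairs.emplace_back(Values[I], Blocks[I]);
    return Pairs;
  }
};
```

A consumer could then write, e.g., `for (auto [V, B] : Phi.incoming_values_and_blocks()) ...` instead of indexing values and blocks separately.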
|
|
PredicateInfo needs some no-op to which the predicate can be attached.
Currently this is an ssa.copy intrinsic. This PR replaces it with a
no-op bitcast.
Using a bitcast is more efficient because we don't have the overhead of
an overloaded intrinsic. It also makes things slightly simpler overall.
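As a hedged sketch using the public IR API (the exact construction inside PredicateInfo may differ), a same-type bitcast acting as such a no-op anchor could be created like this:

```cpp
// Hedged sketch using the public IR API (the exact construction inside
// PredicateInfo may differ): a same-type bitcast acting as the no-op anchor
// that predicate information can be attached to.
#include "llvm/ADT/Twine.h"
#include "llvm/IR/Instructions.h"

llvm::Instruction *createPredicateAnchor(llvm::Value *Op,
                                         llvm::Instruction *InsertBefore) {
  // Source and destination types are identical, so this is semantically a
  // copy of Op, but it is a distinct instruction that can carry the
  // predicate bookkeeping.
  return new llvm::BitCastInst(Op, Op->getType(), Op->getName() + ".copy",
                               InsertBefore);
}
```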
|
|
Split up the unclearly named prepareForVectorization transform into
buildVPlan0, which adds the vector preheader, middle and scalar
preheader blocks, as well as the canonical induction recipes and sets
the trip count. The new transform is run directly after building the
plain CFG VPlan initially.
The remaining code handling early exits and adding the branch in the
middle block is renamed to handleEarlyExitsAndAddMiddleCheck and still
runs at the original position.
With the code movement, we only have to add the skeleton once to the
initial VPlan, and cloning will take care of the rest. It will also
enable moving other construction steps to work directly on VPlan0, like
adding resume phis.
PR: https://github.com/llvm/llvm-project/pull/150848
|
|
The CMake flag has been on by default for a month without any issues.
This makes the feature support in LLVM unconditional (but does not
enable the feature by default).
|
|
(#150847)
The initial VPlan closely reflects the original scalar loop, so using
VPWidenPHIRecipe here is premature. Widened phi recipes should only be
introduced together with other widened recipes.
PR: https://github.com/llvm/llvm-project/pull/150847
|
|
This patch extends the logic added in
https://github.com/llvm/llvm-project/pull/128061 to support
dereferenceability information from assumptions as well.
Unfortunately both assumption cache and the dominator tree need to be
threaded through multiple layers to make them available where needed.
PR: https://github.com/llvm/llvm-project/pull/147047
|
|
This fixes a LeakSanitizer failure on the sanitizer buildbots:
https://lab.llvm.org/buildbot/#/builders/52/builds/10088
|
|
Without dumping, the faulty recipe isn't printed, so account for that
as in the other tests. Fixes the buildbot failure at
https://lab.llvm.org/buildbot/#/builders/2/builds/30229
|
|
Noticed this when checking the invariant that all phis in the header
block must be header phis. I think there's a missing set of parentheses
here, since otherwise it only applies cast<VPInstruction> when RecipeI isn't a
VPInstruction.
|
|
This enables creating VPBlendRecipes without an underlying PHINode.
|
|
This is one of the final remaining debug-intrinsic-specific codepaths
out there, along with pieces of cross-LLVM infrastructure to do with
debug intrinsics.
|
|
At this stage I'm just opportunistically deleting any code using
debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll
get to deleting that in probably one or two more commits.
|
|
There are no longer debug-info instructions, thus we don't need this
skipping. Hooray!
|
|
Patch 1/4 adding bitcode support.
Store whether or not a function is using Key Instructions in its DISubprogram so
that we don't need to rely on the -mllvm flag -dwarf-use-key-instructions to
determine whether or not to interpret Key Instructions metadata to decide
is_stmt placement at DWARF emission time. This makes bitcode support simple and
enables well-defined mixing of non-key-instructions and key-instructions
functions in an LTO context.
This patch adds the bit (using DISubprogram::SubclassData1).
PR 144104 and 144103 use it during DWARF emission.
PR 44102 adds bitcode
support.
See pull request for overview of alternative attempts.
|
|
(#143206)
The `SeedCollector` class gets two new arguments: `CollectStores` and
`CollectLoads`. These replace the `sbvec-collect-seeds` cl::opt flag.
This is done to help with reusing the SeedCollector class in a future
pass. The cl::opt flag is moved to the seed collection pass:
Passes/SeedCollection.cpp
|
|
It should not be possible to construct one without a triple. It would
also be nice to delete TargetLibraryInfoWrapperPass, but that is more
difficult.
|
|
(#142284)
Add a new getNumOperandsForOpcode helper to determine the number of
operands from the opcode. For now, it is used to verify the number of
operands at VPInstruction construction.
It returns -1 for a few opcodes where the number of operands cannot be
determined (GEP, Switch, PHI, Call).
This can also be used in a follow-up to determine if a VPInstruction is
masked based on the number of arguments.
PR: https://github.com/llvm/llvm-project/pull/142284
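For illustration, a generic sketch of the idea (plain C++ stand-ins, not the actual VPlan opcode table): a switch mapping each opcode to its fixed operand count, returning -1 for the variadic cases:

```cpp
// Generic sketch of the idea (plain C++ stand-ins, not the actual VPlan
// opcode table): map each opcode to its fixed operand count, returning -1
// when the count cannot be derived from the opcode alone.
#include <cstdio>

enum class Opcode { Not, Add, ICmp, Select, GEP, Switch, PHI, Call };

int getNumOperandsForOpcode(Opcode Op) {
  switch (Op) {
  case Opcode::Not:
    return 1;
  case Opcode::Add:
  case Opcode::ICmp:
    return 2;
  case Opcode::Select:
    return 3;
  case Opcode::GEP:
  case Opcode::Switch:
  case Opcode::PHI:
  case Opcode::Call:
    return -1; // Variadic: operand count is not fixed by the opcode.
  }
  return -1;
}

int main() {
  std::printf("Select expects %d operands\n",
              getNumOperandsForOpcode(Opcode::Select));
  // A verifier-style check, and the planned "is this recipe masked?" query,
  // would compare the actual operand count against this value.
}
```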
|
|
We currently generate code like this on x86 for a jump table with 5 elements,
assuming the call target is in rbx:
lea global_addr(%rip), %rax # initialize temporary rax with base address
mov %rbx, %rcx # initialize another temporary rcx for index (rbx will be used for the call, so it is still live)
sub %rax, %rcx # compute `address - base`
ror $0x3, %rcx # compute `(address - base) ror 3` i.e. index
cmp $0x4, %rcx # check index <= 4
ja .Ltrap
[...]
.Ltrap:
ud1
A more efficient instruction sequence, which needs only one temporary
register and one fewer instruction, is possible by subtracting the
address we are testing from the fixed address instead of vice versa:
lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element
sub %rbx, %rax # compute `last element - address`
ror $0x3, %rax # compute `(last element - address) ror 3` i.e. 4 - index
cmp $0x4, %rax # check 4 - index <= 4 (same as above)
ja .Ltrap
[...]
.Ltrap:
ud1
Change LowerTypeTests to generate that sequence. As a consequence, the
order of bits in the bitsets is reversed. Because it doesn't matter how we
do the subtraction on other architectures (to the best of my knowledge),
do so unconditionally.
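As a sanity check of the equivalence, here is a small standalone program (plain C++, hypothetical base address and stride, not the LowerTypeTests code) verifying that both sequences accept exactly the same set of addresses for a 5-entry, 8-byte-stride table:

```cpp
// Small standalone check (plain C++, hypothetical base address and stride,
// not the LowerTypeTests code) that the old and new sequences accept exactly
// the same set of addresses for a 5-entry jump table with 8-byte stride.
#include <cassert>
#include <cstdint>

static uint64_t rotr64(uint64_t V, unsigned N) {
  return (V >> N) | (V << (64 - N));
}

int main() {
  const uint64_t Base = 0x1000;  // stand-in for global_addr
  const uint64_t Stride = 8;     // 8-byte entries => ror by 3
  const uint64_t NumEntries = 5; // valid indices 0..4
  const uint64_t Last = Base + (NumEntries - 1) * Stride;

  for (uint64_t Addr = Base - 64; Addr <= Base + 128; ++Addr) {
    // Original sequence: index = (Addr - Base) ror 3; pass if index <= 4.
    bool Old = rotr64(Addr - Base, 3) <= NumEntries - 1;
    // New sequence: (Last - Addr) ror 3 == 4 - index; pass if it is <= 4.
    bool New = rotr64(Last - Addr, 3) <= NumEntries - 1;
    assert(Old == New && "both checks accept exactly the same addresses");
  }
  return 0;
}
```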
Reviewers: fmayer, vitalybuka
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/142887
|
|
This reverts commit 31abf0774232735ad7a7d45e531497305bf99fae.
|
|
Most of the recent development on the MemProfiler has been on the Use part. The instrumentation has been quite stable for a while. As the complexity of the use grows (with undrifting, diagnostics, etc.) I figured it would be good to separate these two implementations.
|
|
This reverts commit 1268352656f81ea173860a8002aadb88844137e7.
|
|
This patch implements a simple pass that tries to de-duplicate packs. If
there are two packing patterns inserting the exact same values in the
exact same order, then we will keep the top-most one of them. Even
though such patterns may be optimized away by subsequent passes, it is
still useful to do this within the vectorizer because otherwise the cost
estimation may be off, making the vectorizer overly conservative.
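For illustration, a generic sketch of the de-duplication idea (stand-in types, not the sandbox vectorizer's actual data structures): walk the packs top-down and map each later duplicate to the first pack that inserts the same values in the same order:

```cpp
// Generic sketch of the de-duplication idea (stand-in types, not the sandbox
// vectorizer's actual data structures): walking packs top-down, map each
// later duplicate to the first pack that inserts the same values in the same
// order.
#include <cstddef>
#include <vector>

struct Value;                      // stand-in for a packed scalar value
using Pack = std::vector<Value *>; // a pack is an ordered list of inserts

// For each pack, return the index of the pack it should be replaced with
// (its own index if it is the first occurrence of that value sequence).
std::vector<std::size_t>
deduplicatePacks(const std::vector<Pack> &PacksTopDown) {
  std::vector<std::size_t> ReplaceWith(PacksTopDown.size());
  for (std::size_t I = 0; I < PacksTopDown.size(); ++I) {
    ReplaceWith[I] = I;
    for (std::size_t J = 0; J < I; ++J)
      if (PacksTopDown[J] == PacksTopDown[I]) { // same values, same order
        ReplaceWith[I] = J; // reuse the top-most identical pack
        break;
      }
  }
  return ReplaceWith;
}
```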
|
|
Use regular VPPhi instead of a separate opcode for resume phis. This
removes an unneeded specialized opcode and unifies the code
(verification, printing, updating when CFG is changed).
Depends on https://github.com/llvm/llvm-project/pull/140132.
PR: https://github.com/llvm/llvm-project/pull/140405
|
|
Follow-up to https://github.com/llvm/llvm-project/pull/141428, to also
use EMIT-SCALAR for VPPhis that are single scalars.
|
|
Update initial construction to connect the Plan's entry to the scalar
preheader during initial construction. This moves a small part of the
skeleton creation out of ILV and will also enable replacing
VPInstruction::ResumePhi with regular VPPhi recipes.
Resume phis need 2 incoming values to start with, the second being the
bypass value from the scalar ph (and used to replicate the incoming
value for other bypass blocks). Adding the extra edge ensures the
incoming values for resume phis match the incoming blocks.
PR: https://github.com/llvm/llvm-project/pull/140132
|
|
This patch moves the logic to manage IR flags to a separate VPIRFlags
class. For now, VPRecipeWithIRFlags is the only class that inherits
VPIRFlags. The new class allows for simpler passing of flags when
constructing recipes, simplifying the constructors for various recipes
(VPInstruction in particular, which now just has 2 constructors, one
taking an extra VPIRFlags argument).
This mirrors the approach taken for VPIRMetadata and makes it easier to
extend in the future. The patch also adds a unified flagsValidForOpcode
to check if the flags in a VPIRFlags match the provided opcode.
PR: https://github.com/llvm/llvm-project/pull/140621
|
|
This reverts commit 204252e2df80876702616518a5154dccacf3ebac.
Recommit with a fix for the leak in a unit test.
|
|
This reverts commit 793bb6b257fa4d9f4af169a4366cab3da01f2e1f.
The recommitted version contains a fix to make sure only the original
phis are processed in convertPhisToBlends by collecting them in a vector
first. This fixes a crash when no mask is needed, because there is only
a single incoming value.
Original message:
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
|
|
This reverts commit b263c08e1a0b54a871915930aa9a1a6ba205b099.
Looks like this triggers a crash in one of the Fortran tests. Reverting
while I investigate
https://lab.llvm.org/buildbot/#/builders/41/builds/6825
|
|
This patch moves the logic to predicate and linearize a VPlan to a
dedicated VPlan transform. It mostly ports the existing logic directly.
There are a number of follow-ups planned in the near future to
further improve on the implementation:
* Edge and block masks are cached in VPPredicator, but the block masks
are still made available to VPRecipeBuilder, so they can be accessed
during recipe construction. As a follow-up, this should be replaced by
adding mask operands to all VPInstructions that need them and use that
during recipe construction.
* The mask caching in a map also means that this map needs updating each
time a new recipe replaces a VPInstruction; this would also be handled
by adding mask operands.
PR: https://github.com/llvm/llvm-project/pull/128420
|
|
Of the 128 bits of a buffer descriptor only 48 bits are address bits, so
following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54,
the logical conclusion is to set the index width to 48 bits instead of
the current value of 128.
Most of the test changes are mechanical datalayout updates, but there
is one actual change: the ptrmask test now uses .i48 instead of .i128
and I had to update SelectionDAGBuilder to correctly extend the mask.
Reviewed By: krzysz00
Pull Request: https://github.com/llvm/llvm-project/pull/139419
|