Age | Commit message (Collapse) | Author | Files | Lines |
|
SubtargetFeature (#161888)
Introduce a new SchedPredicate, `FeatureSchedPredicate`, that holds true
when a certain SubtargetFeature is enabled. This could be useful when we
want to configure a scheduling model with subtarget features.
I add this as a separate SchedPredicate rather than piggy-back on the
existing `SchedPredicate<[{....}]>` because first and foremost,
`SchedPredicate` is expected to only operate on MachineInstr, so it does
_not_ appear in `MCGenSubtargetInfo::resolveVariantSchedClass` but only
show up in `TargetGenSubtargetInfo::resolveSchedClass`. Yet I think
`FeatureSchedPredicate` will be useful for both MCInst and MachineInstr.
There is another subtle difference between `resolveVariantSchedClass`
and `resolveSchedClass` regarding how we access the MCSubtargetInfo
instance, if we really want to express `FeatureSchedPredicate` using
`SchedPredicate<[{.....}]>`.
So I thought it'll be easier to add another new SchedPredicate for
SubtargetFeature.
|
|
resolveVariantSchedClassImpl (#161886)
`Target_MC::resolveVariantSchedClassImpl` is the implementation function
for `TargetGenMCSubtargetInfo::resolveVariantSchedClass`. Despite being
only called by `resolveVariantSchedClass`,
`resolveVariantSchedClassImpl` is still a standalone function that
cannot access a MCSubtargetInfo through `this` (i.e.
`TargetGenMCSubtargetInfo`). And having access to a `MCSubtargetInfo`
could be useful for some (future) SchedPredicate.
This patch modifies TableGen to generate `resolveVariantSchedClassImpl`
with an additional `MCSubtargetInfo` argument passing in. Note that this
does not change any public interface in either `TargetGenMCSubtargetInfo
` or `MCSubtargetInfo`, as `resolveVariantSchedClassImpl` is basically
an internal function.
|
|
This is a generalization of the LookupPtrRegClass mechanism.
AMDGPU has several use cases for swapping the register class of
instruction operands based on the subtarget, but none of them
really fit into the box of being pointer-like.
The current system requires manual management of an arbitrary integer
ID. For the AMDGPU use case, this would end up being around 40 new
entries to manage.
This just introduces the base infrastructure. I have ports of all
the target specific usage of PointerLikeRegClass ready.
|
|
`Predicates` and `Features` fields serve the same purpose. They should
be kept in sync, but not all predicates are based on features. This
resulted in introducing dummy features for that only reason.
This patch removes `Features` field and changes TableGen emitters to use
`Predicates` instead.
Historically, predicates were written with the assumption that the
checking code will be used in `SelectionDAGISel` subclasses, meaning
they will have access to the subclass variables, such as `Subtarget`.
There are no such variables in the generated
`GenSubtargetInfo::getHwModeSet()`, so we need to provide them. This can
be achieved by subclassing `HwModePredicateProlog`, see an example in
`Hexagon.td`.
|
|
Dynamic relocations are expensive on ELF/Linux platforms because they
are applied in userspace on process startup. Therefore, it is worth
optimizing them to make PIE and PIC dylib builds faster. In +asserts
builds (non-NDEBUG), nikic identified these schedule class name string
pointers as the leading source of dynamic relocations. [1]
This change uses llvm::StringTable and the StringToOffsetTable TableGen
helper to turn the string pointers into 32-bit offsets into a separate
character array.
The number of dynamic relocations is reduced by ~60%:
❯ llvm-readelf --dyn-relocations lib/libLLVM.so | wc -l
381376 # before
155156 # after
The test suite time is modestly affected, but I'm running on a shared
noisy workstation VM with a ton of cores:
https://gist.github.com/rnk/f38882c2fe2e63d0eb58b8fffeab69de
Testing Time: 100.88s # before
Testing Time: 78.50s. # after
Testing Time: 96.25s. # before again
I haven't used any fancy hyperfine/denoising tools, but I think the
result is clearly visible and we should ship it.
[1] https://gist.github.com/nikic/554f0a544ca15d5219788f1030f78c5a
|
|
(#144594)
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
|
|
(#137670)"
This reverts commit 9208b343e962b9f1140ee345c0050a3920bdcbf2.
TargetParser shouldn't re-run the PPC subtarget tablegen target, it
should define its own `-gen-ppc-target-def` rule like all the other
targets do in llvm/include/llvm/TargetParser/CMakeLists.txt .
One user reported that there are incorrect CMake dependencies after this
change, so I will roll this back in the meantime.
|
|
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
- Also eliminate unneeded std::string() around some literal strings.
|
|
|
|
TuneFeatures (#137864)
I don't think a processor should normally contains duplicate features. This patch prints a warning whenever that happens in either the features or tune features.
|
|
Acquire/ReleaseAtCycle (#131908)
I was looking at the value range of AcquireAtCycle / ReleaseAtCycle, and
I noticed that while the TableGen error messages said AcquireAtCycle has
to be less than ReleaseAtCycle, in reality they are actually allowed to
be the same. This patch fixes it and add more test cases.
|
|
- Eliminate use of `ConstRecIter` from `expandProcResources` by using `all_of()` to simplify the code.
|
|
- Use range for loops for processor models and schedule classes.
- Cleanup duplicated or unused iterators in CodeGenSchedule.h
|
|
This patch adds support for describing per-write resource cycle counts
for ReadAdvance records via a new optional field called `tunables`.
This makes it possible to declare ReadAdvance records such as:
def : ReadAdvance<Read_C, 1, [Write_A, Write_B], [2]>;
The above will effectively declare two entries in the ReadAdvance
table for Read_C, one for Write_A with a cycle count of 1+2, and one for
Write_B with a cycle count of 1+0 (omitted values are assumed 0).
The field `tunables` provides a list of deltas relative to the base
`cycle` count of the ReadAdvance. Since the field is optional and
defaults to a list of 0's, this change doesn't affect current targets.
|
|
(#123752)
This patch adds a new option `-aarch64-enable-zpr-predicate-spills`
(which is disabled by default), this option replaces predicate spills
with vector spills in streaming[-compatible] functions.
For example:
```
str p8, [sp, #7, mul vl] // 2-byte Folded Spill
// ...
ldr p8, [sp, #7, mul vl] // 2-byte Folded Reload
```
Becomes:
```
mov z0.b, p8/z, #1
str z0, [sp] // 16-byte Folded Spill
// ...
ldr z0, [sp] // 16-byte Folded Reload
ptrue p4.b
cmpne p8.b, p4/z, z0.b, #0
```
This is done to avoid streaming memory hazards between FPR/vector and
predicate spills, which currently occupy the same stack area even when
the `-aarch64-stack-hazard-size` flag is set.
This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO
and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos
handles scavenging the required registers (z0 in the above example) and,
in the worst case spilling a register to an emergency stack slot in the
expansion. The condition flags are also preserved around the `cmpne` in
case they are live at the expansion point.
|
|
NFC (#123876)
Use this to improve performance of SubtargetEmitter::findWriteResources
and SubtargetEmitter::findReadAdvance. Now we can do a map lookup
instead of a linear search through all WriteRes/ReadAdvance records.
This reduces the build time of RISCVGenSubtargetInfo.inc on my
machine from 43 seconds to 10 seconds.
|
|
a range. NFC (#123453)
|
|
They were accidentally dropped in
https://github.com/llvm/llvm-project/pull/96249
rdar://140853882
|
|
(#113318)
Code cleanups for TableGen files, changes includes function names,
variable names and unused imports.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
|
|
(#110713)
Change `getValueAsListOfDefs` to return a vector of const Record
pointer, and remove `getValueAsListOfConstDefs` that was added as a
transition aid.
This is a part of effort to have better const correctness in TableGen
backends:
https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
|
|
Adopt scaled indent in PredicateExpander.
Added pre/post inc/dec operators to `indent` and related unit tests.
Verified by comparing *.inc files generated by LLVM build with/without
the change.
|
|
It is almost always simpler to use {} instead of std::nullopt to
initialize an empty ArrayRef. This patch changes all occurrences I could
find in LLVM itself. In future the ArrayRef(std::nullopt_t) constructor
could be deprecated or removed.
|
|
Change CodeGenSchedule to use const Record pointers.
This is a part of effort to have better const correctness in TableGen
backends:
https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
|
|
Change SubtargetEmitter to use const RecordKeeper.
This is a part of effort to have better const correctness in TableGen
backends:
https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
|
|
Change CodeGenSchedule to use const RecordKeeper.
This is a part of effort to have better const correctness in TableGen
backends:
https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
|
|
Split RecordKeeper `getAllDerivedDefinitions` family of functions into
two variants:
(a) non-const ones that return vectors of `Record *` and
(b) const ones, that return vector/ArrayRef of `const Record *`.
This will help gradual migration of TableGen backends to use
`const RecordKeeper` and by implication change code to work
with const pointers and better const correctness.
Existing backends are not yet compatible with the const family of
functions, so change them to use a non-constant `RecordKeeper`
reference, till they are migrated.
|
|
- Extract code for sorting and checking duplicate Records into a helper
function and update `collectProcModels` to use the helper.
- Update `FeatureKeyValues` to:
(a) Remove code for duplicate checks and use the helper.
(b) Trim features with empty name explicitly to be able to use
the helper.
- Make the sorting deterministic by using record name as a secondary
key for sorting, and re-enable SubtargetFeatureUniqueNames.td test
that was disabled due to the non-determinism of the error messages.
- Change wording of error message when duplicate records are found to
be source code position agnostic, since `First` may not be before
`Second` lexically.
|
|
- Keep track of last definition of a feature in a `DenseMap` and use
it to report a better error message when a duplicate feature is found.
- Use StringMap instead of a std::map in `EmitStageAndOperandCycleData`
- Add a unit test to check if duplicate names are flagged.
|
|
... and use it to organize all of the AArch64 CPU aliases.
|
|
We hit this downstream and the only evidence of the mistake was that the
results of `Find` on `SubtargetFeatureKV` were corrupted.
|
|
FindReadAdvance (#92202)
This gives us a 30% speed improvement in our downstream.
|
|
(#92032)
SubtargetEmitter::GenSchedClassTables takes a CodeGenProcModel, but
calls hasReadOfWrite which loops over all ProcModels. We move
hasReadOfWrite to CodeGenProcModel and remove the loop over all
ProcModels. This leads to a 144% speedup on the RISC-V backend of our
downstream.
|
|
1. Bitwise operations are used to access HwMode, allowing for the
coexistence of HwMode IDs for different features (such as RegInfo and
EncodingInfo). This will provide better scalability for HwMode.
Currently, most users utilize HwMode primarily for configuring
Register-related information, and few use it for configuring Encoding.
The limited scalability of HwMode has been a significant factor in this
usage pattern.
2. Sink the HwMode Encodings selection logic down to per instruction
level, this makes the logic for choosing encodings clearer and provides
better error messages.
3. Add some HwMode ID conflict detection to the getHwMode() interface.
|
|
Refactor of the llvm-tblgen source into:
- a "Basic" library, which contains the bare minimum utilities to build
`llvm-min-tablegen`
- a "Common" library which contains all of the helpers for TableGen
backends. Such helpers can be shared by more than one backend, and even
unit tested (e.g. CodeExpander is, maybe we can add more over time)
Fixes #80647
|
|
Forming a `ReadAdvance` with an entry in the `ValidWrites` list that is
not used by any instruction results in the entire `ReadAdvance` to be
ignored by the scheduler due to an invalid entry.
The `SchedRW` collection code only picks up `SchedWrites` that are
reachable from `Instructions`, `InstRW`, `ItinRW` and `SchedAlias`,
leaving the unreachable ones with an invalid entry (0) in
`SubtargetEmitter::GenSchedClassTables` when going through the list of
`ReadAdvances`
|
|
|
|
Historically TableGen has used `A.swap(B)` to move containers without
the expense of copying them. Perhaps this predated rvalue references. In
any case `A = std::move(B)` seems like a more direct way to implement
this when only A is required after the operation.
|
|
```
find llvm/utils/TableGen -iname "*.h" -o -iname "*.cpp" | xargs clang-format-16 -i
```
Split from #80847
|
|
`Fusion` is inherited from `SubtargetFeature` now. Each definition
of `Fusion` will define a `SubtargetFeature` accordingly.
Method `getMacroFusions` is added to `TargetSubtargetInfo`, which
returns a list of `MacroFusionPredTy` that will be evaluated by
MacroFusionMution.
`getMacroFusions` will be auto-generated if the target has `Fusion`
definitions.
|
|
Use of llvm::Optional was migrated to std::optional. This included a
change in the constructor of ArrayRef.
However, there are still 2 places in the SubtargetEmitter which uses
llvm::None, causing a compile error when emitted.
|
|
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
|
|
This reverts commit 5b854f2c23ea1b000cb4cac4c0fea77326c03d43.
Build still failing.
|
|
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
This commit as previously reverted since it missed renaming that came
down after rebasing. This version of the commit fixes those problems.
Differential Revision: https://reviews.llvm.org/D158568
|
|
This reverts commit 030d33409568b2f0ea61116e83fd40ca27ba33ac.
This commit is causing build failures
|
|
D150312 added a TODO:
TODO: consider renaming the field `StartAtCycle` and `Cycles` to
`AcquireAtCycle` and `ReleaseAtCycle` respectively, to stress the
fact that resource allocation is now represented as an interval,
relatively to the issue cycle of the instruction.
This patch implements that TODO. This naming clarifies how to use these
fields in the scheduler. In addition it was confusing that `StartAtCycle` was
singular but `Cycles` was plural. This renaming fixes this inconsistency.
Differential Revision: https://reviews.llvm.org/D158568
|
|
Add missed change requested in D153371.
|
|
The sort of the elements in the GET_SUBTARGETINFO_MACRO block is done on
the "Name" field of each record. This field is not guaranteed to be unique,
is not guaranteed to even have a value at all, and is not used in the
output anyway. Change to sort on the "FieldName" field which should be
unique.
Problem spotted when lib/Target/PowerPC/PPCGenSubtargetInfo.inc changed
unexpectedly.
Differential Revision: https://reviews.llvm.org/D153371
|
|
SubtargetFeature.h is currently part of MC while it doesn't depend on
anything in MC. Since some LLVM components might have the need to work
with target features without necessarily needing MC, it might be
worthwhile to move SubtargetFeature.h to a different location. This will
reduce the dependencies of said components.
Note that I choose TargetParser as the destination because that's where
Triple lives and SubtargetFeatures feels related to that.
This issues came up during a JITLink review (D149522). JITLink would
like to avoid a dependency on MC while still needing to store target
features.
Reviewed By: MaskRay, arsenm
Differential Revision: https://reviews.llvm.org/D150549
|