Age | Commit message | Author | Files | Lines |
|
This is a low-level utility to parse the MCInstrInfo and should
not depend on the state of the function.
|
|
(#156400)
In future changes we will have more AV_ virtual registers, which currently block the formation of write2. Most of the time these registers can simply be constrained to VGPR, so do that.
Also relax the constraint in the flat merging case. We already have the necessary code to insert copies to the original result registers, so there's no point in avoiding it.
Addresses the easy half of #155769
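A minimal sketch of the constraining step; getEquivalentVGPRClass and constrainRegClass are real APIs, but the surrounding helper and its name are assumptions:
```
static bool constrainToVGPR(Register Reg, MachineRegisterInfo &MRI,
                            const SIRegisterInfo &TRI) {
  // Map the AV_ class to its VGPR-only equivalent and try to constrain.
  const TargetRegisterClass *RC = MRI.getRegClass(Reg);
  const TargetRegisterClass *VGPRRC = TRI.getEquivalentVGPRClass(RC);
  // constrainRegClass returns null if the constraint cannot be applied.
  return MRI.constrainRegClass(Reg, VGPRRC) != nullptr;
}
```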
|
|
|
|
(#145078)
SILoadStoreOptimizer can now recognise consecutive 16-bit and 8-bit
`TBUFFER_LOAD`/`TBUFFER_STORE` instructions that each write
* a single component (`X`), or
* two components (`XY`),
and fold them into the wider native variants:
```
X + X --> XY
X + X + X + X --> XYZW
XY + XY --> XYZW
X + X + X --> XYZ
XY + X --> XYZ
```
The optimisation cuts the number of TBUFFER instructions, shrinking code
size and improving memory throughput.
|
|
|
|
Scaling is done on the operation size; after merging instructions we
would need to generate code to scale the offset and reset the
auto-scale bit. It is unclear whether that would be beneficial, so just
disable such merges for now.
|
|
getTargetLowering() already returns const SITargetLowering *.
|
|
|
|
- Change InstrInfoEmitter to emit OpName as an enum class
instead of an anonymous enum in the OpName namespace.
- This will help clearly distinguish between values that are
OpNames vs just operand indices and should help avoid
bugs due to confusion between the two.
- Rename OpName::OPERAND_LAST to NUM_OPERAND_NAMES.
- Emit declaration of getOperandIdx() along with the OpName
enum so it doesn't have to be repeated in various headers.
- Also updated AMDGPU, RISCV, and WebAssembly backends
to conform to the new definition of OpName (mostly
mechanical changes).
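A sketch of what a call site looks like under the new definition (vdata is just an example operand name, and the helper is illustrative):
```
static bool hasVData(const MachineInstr &MI) {
  // OpName is a scoped enum now, so the name argument cannot be
  // silently mixed up with the int operand index it maps to.
  int Idx = AMDGPU::getNamedOperandIdx(MI.getOpcode(),
                                       AMDGPU::OpName::vdata);
  return Idx != -1;
}
```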
|
|
|
|
Generated automatically with:
$ clang-tidy -fix -checks=-*,llvm-qualified-auto $(find lib/Target/AMDGPU/ -type f)
|
|
|
|
raw_ostream::operator<< (#106877)
These would implicitly cast the register to `unsigned`. Switching most
of them to use printReg gives more readable output. Change some others
to use Register::id() so we can eventually remove the implicit cast to
`unsigned`.
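For example (a sketch; dumpReg is hypothetical, printReg is the real helper):
```
static void dumpReg(raw_ostream &OS, Register Reg,
                    const TargetRegisterInfo *TRI) {
  // printReg yields readable output such as "$vgpr0" or "%123"
  // instead of the register's raw unsigned value.
  OS << printReg(Reg, TRI);
}
```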
|
|
|
|
Fixes https://github.com/llvm/llvm-project/issues/102231 by inserting
the missing checks.
|
|
(#101619)
Use the constrained buffer load opcodes while combining under-aligned
loads for XNACK-enabled subtargets.
|
|
Consider the constrained multi-dword loads while merging
individual loads to a single multi-dword load.
|
|
|
|
Currently it is only supported for FLAT Global.
|
|
It does not properly handle the situation when the address calculation
uses V_ADDC_U32 0, 0, carry-in (i.e. with both src0 and src1 being
immediates).
|
|
Factor out copyToDestRegs and copyFromSrcRegs for merging store sources
and unmerging load results. NFC.
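A sketch of the unmerging half; the names, parameters, and shape below are illustrative assumptions, not the patch's exact signatures:
```
static void copyToDestRegs(MachineBasicBlock &MBB,
                           MachineBasicBlock::iterator InsertBefore,
                           const DebugLoc &DL, const SIInstrInfo *TII,
                           Register WideReg, Register Dest0, Register Dest1,
                           unsigned SubIdx0, unsigned SubIdx1) {
  // Split the wide merged load result back out to the two original
  // destination registers via sub-register COPYs.
  BuildMI(MBB, InsertBefore, DL, TII->get(TargetOpcode::COPY), Dest0)
      .addReg(WideReg, 0, SubIdx0);
  BuildMI(MBB, InsertBefore, DL, TII->get(TargetOpcode::COPY), Dest1)
      .addReg(WideReg, 0, SubIdx1);
}
```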
|
|
Use structured bindings and similar.
|
|
|
|
SILoadStoreOptimizer (#86285)
Added more buffer instruction merging support
|
|
This is part of #70452, which changes the type used for the external
interface of MMO to LocationSize as opposed to uint64_t. This means the
constructors take LocationSize, and convert ~UINT64_C(0) to
LocationSize::beforeOrAfter(). The getSize methods return a
LocationSize.
This allows us to be more precise with unknown sizes, not accidentally
treating them as unsigned values, and in the future should allow us to
add proper scalable vector support, but none of that is included in
this patch. It should mostly be an NFC.
Global ISel is still expected to use the underlying LLT as it needs,
and is not expected to see unknown sizes for generic operations. Most
of the changes are hopefully fairly mechanical, adding a lot of
getValue() calls and protecting them with hasValue() where needed.
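The getValue()/hasValue() pattern looks roughly like this (an illustrative sketch, not code from the patch):
```
static uint64_t knownSizeOrZero(const MachineMemOperand &MMO) {
  LocationSize Size = MMO.getSize(); // was uint64_t, now LocationSize
  if (!Size.hasValue())
    return 0; // unknown size, i.e. LocationSize::beforeOrAfter()
  return Size.getValue(); // the known byte size
}
```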
|
|
|
|
|
|
|
|
There is a problem with the
SILoadStoreOptimizer::dmasksCanBeCombined() function that can lead to
UB.
This boolean function decides if two masks can be combined into 1. The
idea here is that the bits which are "on" in one mask, don't overlap
with the "on" bits of the other. Consider an example (10 bits for
simplicity):
Mask 1: 0101101000
Mask 2: 0000000110
Those can be combined into a single mask: 0101101110.
To check if such an operation is possible, the code takes the mask
which is greater and counts how many 0s there are, starting from the
LSB and stopping at the first 1. Then, it shifts 1u by this number and
compares it with the smaller mask. The problem is that when both masks
are 0, the counter will find 32 zeroes in the first mask and will try
to do a shift by 32 positions, which is UB.
The fix is a simple sanity check of whether the bigger mask is 0.
https://reviews.llvm.org/D155051
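A sketch of the guarded check; the names and exact structure are assumptions, not the committed code:
```
static bool dmasksCanBeCombined(unsigned DMask0, unsigned DMask1) {
  unsigned MaxMask = std::max(DMask0, DMask1);
  unsigned MinMask = std::min(DMask0, DMask1);
  // Sanity check: countr_zero(0) is 32, and 1u << 32 is UB.
  if (MaxMask == 0)
    return false;
  unsigned AllowedBitsForMin = llvm::countr_zero(MaxMask);
  // The smaller mask must fit entirely below the larger mask's
  // lowest set bit for the two to be combinable.
  return (1u << AllowedBitsForMin) > MinMask;
}
```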
|
|
|
|
After D147334 we never select _SGPR forms of SMEM instructions on
subtargets that also support the _SGPR_IMM form, so there is no need to
handle them here.
Differential Revision: https://reviews.llvm.org/D149139
|
|
Add support for merging _IDXEN and _BOTHEN variants of
TBUFFER_LOAD_FORMAT instruction.
|
|
|
|
|
|
|
|
C++17 allows us to call the constructors of pair and tuple directly
instead of the helper functions make_pair and make_tuple.
Differential Revision: https://reviews.llvm.org/D139828
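For example:
```
#include <tuple>
#include <utility>
// Class template argument deduction supplies the template arguments:
auto P = std::pair(1, 2.5);       // instead of std::make_pair(1, 2.5)
auto T = std::tuple(1, 2.5, 'x'); // instead of std::make_tuple(1, 2.5, 'x')
```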
|
|
|
|
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
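The mechanical shape of the replacement (illustrative):
```
#include <optional>
// Before: llvm::Optional<unsigned> Idx = None;
// After:
std::optional<unsigned> Idx = std::nullopt;
```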
|
|
Reviewed By: dp, arsenm
Differential Revision: https://reviews.llvm.org/D137783
|
|
In a lot of places, we were just calling `getNamedOperandIdx` to check if the result was != or == to -1.
This is fine in itself, but it's verbose and doesn't make the intention clear, IMHO. I added a `hasNamedOperand` and replaced all the cases I could find, using regexes plus manual review.
Reviewed By: arsenm, foad
Differential Revision: https://reviews.llvm.org/D137540
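A sketch of a typical call-site change (vaddr is just an example operand, and the wrapper function is hypothetical):
```
static bool usesVAddr(unsigned Opc) {
  // Before: the intent hides behind an index comparison.
  //   return AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vaddr) != -1;
  // After: the predicate states it directly.
  return AMDGPU::hasNamedOperand(Opc, AMDGPU::OpName::vaddr);
}
```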
|
|
Reviewed By: foad, rampitec
Differential Revision: https://reviews.llvm.org/D133787
|
|
With C++17 there is no Clang pedantic warning or MSVC C5051.
|
|
Apply merging to s_load as is done for s_buffer_load.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D130742
|
|
Flat can be merged with flat global since the address cast is a no-op.
A combined memory operation needs to be promoted to flat.
Differential Revision: https://reviews.llvm.org/D120431
|
|
TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120351
|
|
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120346
|
|
Loads and stores can be out of order in the SILoadStoreOptimizer.
When combining the MachineMemOperands of two instructions, the
operands are passed into combineKnownAdjacentMMOs in IR order. At the
moment it picks the first operand and just replaces its offset and
size. This essentially loses alignment information and may generally
result in an incorrect base pointer being used.
Use the base pointer that comes first in memory address order instead,
and only adjust the size.
Differential Revision: https://reviews.llvm.org/D120370
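A minimal sketch of the corrected choice, assuming both operands share a base pointer; the helper and its name are assumptions:
```
static MachineMemOperand *combineMMOs(MachineFunction &MF,
                                      MachineMemOperand *A,
                                      MachineMemOperand *B,
                                      uint64_t CombinedSize) {
  // Use the operand whose address comes first in memory as the base so
  // its pointer and alignment stay valid, and only widen the size.
  MachineMemOperand *Base = A->getOffset() <= B->getOffset() ? A : B;
  return MF.getMachineMemOperand(Base, /*Offset=*/0, CombinedSize);
}
```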
|
|
This adds handling of the _SADDR forms to the GLOBAL_LOAD combining.
TODO: merge global stores.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120285
|
|
There can be situations where global and flat loads and stores are not
combined by the vectorizer, in particular if their address spaces
differ in the IR but they end up as the same class of instructions
after selection. For example, a divergent load from the constant
address space ends up being the same global_load as a load from the
global address space.
TODO: merge global stores.
TODO: handle SADDR forms.
TODO: merge flat load/stores.
TODO: merge flat with global promoting to flat.
Differential Revision: https://reviews.llvm.org/D120279
|
|
Differential Revision: https://reviews.llvm.org/D120268
|