Age | Commit message (Collapse) | Author | Files | Lines |
|
fixes https://github.com/llvm/llvm-project/issues/47156
fixes https://github.com/llvm/llvm-project/issues/47155
|
|
Vectors are always bit-packed and don't respect the elements' alignment
requirements. This is different from arrays. This means offsets of
vector GEPs need to be computed differently than offsets of array GEPs.
This PR fixes many places that rely on an incorrect pattern
that always relies on `DL.getTypeAllocSize(GTI.getIndexedType())`.
We replace these by usages of `GTI.getSequentialElementStride(DL)`,
which is a new helper function added in this PR.
This changes behavior for GEPs into vectors with element types for which
the (bit) size and alloc size is different. This includes two cases:
* Types with a bit size that is not a multiple of a byte, e.g. i1.
GEPs into such vectors are questionable to begin with, as some elements
are not even addressable.
* Overaligned types, e.g. i16 with 32-bit alignment.
Existing tests are unaffected, but a miscompilation of a new test is fixed.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
This helper function shortens examples like
`cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();` to
`Node->getConstantOperandVal(1);`.
Implemented with:
`git grep -l
"cast<ConstantSDNode>\(.*->getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)->getOperand\((.*)\)\)->getZExtValue\(\)/\1->getConstantOperandVal(\2)/`
and `git grep -l
"cast<ConstantSDNode>\(.*\.getOperand\(.*\)\)->getZExtValue\(\)" | xargs
sed -E -i
's/cast<ConstantSDNode>\((.*)\.getOperand\((.*)\)\)->getZExtValue\(\)/\1.getConstantOperandVal(\2)/'`.
With a couple of simple manual fixes needed. Result then processed by
`git clang-format`.
|
|
|
|
platform (#75905)
`cmp; bc; isync` is more performant than `lwsync` theoretically.
64-bit platform already features it, now implement it for 32-bit
platform.
|
|
(#75772)
If MI is not PPC specific instructions, let base implementation decide
if MI is rematerizable.
This can fix failure in #75570 after #75271 .
|
|
|
|
Branch-absolute instructions are currently printed in decimal, and
negative addresses are printed as positive numbers.
With this change, addresses are printed in hex and negative addresses
are converted to an unsigned 32- or 64-bit address.
|
|
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
|
|
|
|
evaluateTargetFixup. (#73721)
Instead of using the STI stored in RISCVAsmBackend, try to get it from
the MCFragment.
This addresses the issue raised here
https://discourse.llvm.org/t/possible-problem-related-to-subtarget-usage/75283
|
|
(#73003)
This patch adds the majority of the missing extended mnemonics that were
introduced in Power 10.
The only extended mnemonics that were not added are related to the plq
and pstq instructions. These will be added in a separate patch as the
instructions themselves would also have to be added.
|
|
12 bit is not enough for PPC's target specific flags. If 8 bit for the
bitmask flags, 4 bit for the direct mask, PPC can total have 16 direct
mask and 8 bitmask. Not enough for PPC, see this issue in
https://github.com/llvm/llvm-project/pull/66316
Redesign how PPC target set the target specific flags. With this patch,
all ppc target flags are direct flags. No bitmask flag in PPC anymore.
This patch aligns with some targets like X86 which also has many target
specific flags.
The patch also fixes a bug related to flag `MO_TLSGDM_FLAG` and `MO_LO`.
They are the same value and the test case changes in this PR shows the
bug.
|
|
shouldClusterMemOps (#73778)
These are picked up from getMemOperandsWithOffsetWidth but weren't then
being passed through to shouldClusterMemOps, which forces backends to
collect the information again if they want to use the kind of heuristics
typically used for the similar shouldScheduleLoadsNear function (e.g.
checking the offset is within 1 cache line).
This patch just adds the parameters, but doesn't attempt to use them.
There is potential to use them in the current PPC and AArch64
shouldClusterMemOps implementation, and I intend to use the offset in
the heuristic for RISC-V. I've left these for future patches in the
interest of being as incremental as possible.
As noted in the review and in an inline FIXME, an ElementCount-style abstraction may later be used to condense these two parameters to one argument. ElementCount isn't quite suitable as it doesn't support negative offsets.
|
|
Identified with clangd.
|
|
The register class for the PADDI definition is incorrect as register
zero for RA is treated as an actual zero.
|
|
getOperandLatency has the following behavior: it returns -1 as a special
value, negative numbers other than -1 on some target-specific overrides,
or a valid non-negative latency. This behavior can be surprising, as
some callers do arithmetic on these negative values. Change the
interface of getOperandLatency to return a std::optional<unsigned> to
prevent surprises in callers. While at it, change the interface of
getInstrLatency to return unsigned instead of int.
This change was inspired by a refactoring in
TargetSchedModel::computeOperandLatency.
|
|
EXPENSIVE_CHECKS is enabled (#73940)
This was modifying a container as it iterated it, which tripped a check
in libstdc++'s debug checks.
Instead, just assign to the item via the reference we already have.
This fixes the following expensive checks failures on my machine:
LLVM :: CodeGen/PowerPC/fp-strict.ll
LLVM :: CodeGen/PowerPC/pr55463.ll
LLVM :: CodeGen/PowerPC/register-pressure.ll
LLVM :: CodeGen/PowerPC/spe.ll
Which are some of the tests noted by #68594.
|
|
ClusterSize (#73757)
As the same hook is called for both load and store clustering, NumLoads
is a misleading name. Use ClusterSize instead.
|
|
The string pooling pass was incorrectly pooling global varables that
were part of llvm.used or llvm.compiler.used. This patch fixes the pass
to prevent that by checking each candidate to make sure that it is not
in either of those lists.
|
|
|
|
reassociation as X86/PowerPC/RISCV. (#72820)
Don't blindly copy the original flags from the pre-reassociated
instrutions.
This copied the integer poison flags which are not safe to preserve
after reassociation.
For the FP flags, I think we should only keep the intersection of
the flags. Override setSpecialOperandAttr to do this.
Fixes #72777.
|
|
It seems TypeSize is currently broken in the sense that:
TypeSize::Fixed(4) + TypeSize::Scalable(4) => TypeSize::Fixed(8)
without failing its assert that explicitly tests for this case:
assert(LHS.Scalable == RHS.Scalable && ...);
The reason this fails is that `Scalable` is a static method of class
TypeSize,
and LHS and RHS are both objects of class TypeSize. So this is
evaluating
if the pointer to the function Scalable == the pointer to the function
Scalable,
which is always true because LHS and RHS have the same class.
This patch fixes the issue by renaming `TypeSize::Scalable` ->
`TypeSize::getScalable`, as well as `TypeSize::Fixed` to
`TypeSize::getFixed`,
so that it no longer clashes with the variable in
FixedOrScalableQuantity.
The new methods now also better match the coding standard, which
specifies that:
* Variable names should be nouns (as they represent state)
* Function names should be verb phrases (as they represent actions)
|
|
Reduces diff in #72227
|
|
support `isel` (#72211)
Some subtargets of PPC don't support `isel` instruction, early-ifcvt
should not insert this instruction.
|
|
|
|
Identified with clangd.
|
|
Identified with clangd.
|
|
Ientified with clangd.
|
|
Identified with clangd.
|
|
Identified with clangd.
|
|
arrays const. (#71079)
AMDGPU arrays were already const.
|
|
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
|
|
frexpl is for ppc_fp128. The correct symbol name for f128 is frexpf128.
|
|
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
|
|
assembly. (#70255)
This option already exists on GCC and so it is being added to LLVM so
that we use the same option as them.
|
|
Opaque ptr cleanup effort.
|
|
TypeSize::{Fixed,Scalable}. NFC
|
|
This was previously passed by value. It used to be passed by non-const
reference, but it was changed to value in D110610. I'm not sure why.
|
|
Previously they were passed by non-const reference. No in tree target
modifies the values.
This makes it possible to call assignValueToAddress from
assignCustomValue without a const_cast. For example in this patch
https://github.com/llvm/llvm-project/pull/69138.
|
|
Power10 does not support Hardware Transactional Memory instructions.
Remove to keep consistency.
|
|
Matches other code like the code model checking.
|
|
Note that llvm::support::endianness has been renamed to
llvm::endianness while becoming an enum class as opposed to an
enum. This patch replaces support::{big,little,native} with
llvm::endianness::{big,little,native}.
|
|
This PR is following what https://reviews.llvm.org/D134783 does for
quardword CAS.
|
|
This custom combine currently converts `and(anyext(x),c)` into
`anyext(and(x,c))`. This is not correct, because the original expression
guaranteed that the high bits are zero, while the new one sets them to
undef.
Emit `zext(and(x,c))` instead.
Fixes https://github.com/llvm/llvm-project/issues/68783.
|
|
Most fragments of these patterns are the same, we can simplify them by
defining a common pattern.
|
|
Now that llvm::support::endianness has been renamed to
llvm::endianness, we can use the shorter form. This patch replaces
support::endianness with llvm::endianness.
|
|
Add transformed register to kill flag work list for XVCVDPSP tranformations.
Ref: reviews.llvm.org/D133103
|
|
The `BlockFrequency` class abstracts `uint64_t` frequency values. Use it
more consistently in various APIs and disable implicit conversion to
make usage more consistent and explicit.
- Use `BlockFrequency Freq` parameter for `setBlockFreq`,
`getProfileCountFromFreq` and `setBlockFreqAndScale` functions.
- Return `BlockFrequency` in `getEntryFreq()` functions.
- While on it change some `const BlockFrequency& Freq` parameters to
plain `BlockFreqency Freq`.
- Mark `BlockFrequency(uint64_t)` constructor as explicit.
- Add missing `BlockFrequency::operator!=`.
- Remove `uint64_t BlockFreqency::getMaxFrequency()`.
- Add `BlockFrequency BlockFrequency::max()` function.
|
|
The SCV instruciton was added on PowerPC on Power 9. This patch adds the
SCV so that it may be used as part of inline asm but does not provide
patterns for it or scheduling information.
Co-authored-by: Stefan Pintilie <stefanp@ca.ibm.com>
|