|
Inspired by #160920#issuecomment-3341816198
|
|
Add a `loongarch-enable-merge-offset` option to allow disabling the
`MergeBaseOffset` pass when optimizations are enabled.
Reviewers: SixWeining, heiher
Reviewed By: SixWeining, heiher
Pull Request: https://github.com/llvm/llvm-project/pull/161063
|
|
- Make use of the OpenCL extended instruction frexp.
- Create a variable and pass it to the OpExtInst instruction.
|
|
This patch renames comparators
- from `std::equal_to<llvm::rdf::RegisterRef>` to
`llvm::rdf::RegisterRefEqualTo`, and
- from `std::less<llvm::rdf::RegisterRef>` to
`llvm::rdf::RegisterRefLess`.
The original specializations don't satisfy the requirements of the
original `std` templates, because they are stateful and
non-default-constructible, so the program has undefined behavior per
C++17 [namespace.std]/2 and C++20/23 [namespace.std]/5:
> A program may explicitly instantiate a class template defined in the
standard library only if the declaration
> - depends on the name of at least one program-defined type, and
> - the instantiation meets the standard library requirements for the
original template.
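A minimal sketch of the distinction, with simplified stand-in fields (the real `RegisterRef` and `PhysicalRegisterInfo` live in LLVM's RDF code and differ in detail):
```cpp
// Sketch: a stateful, non-default-constructible comparator may not be
// provided by specializing std::equal_to, but it is fine as a
// program-defined functor such as RegisterRefEqualTo.
namespace llvm::rdf {

struct RegisterRef { unsigned Reg = 0; unsigned Mask = 0; };
struct PhysicalRegisterInfo; // comparison state the functor must carry

// UB under [namespace.std]: a specialization like
//   template <> struct std::equal_to<llvm::rdf::RegisterRef> { ... };
// would be stateful and non-default-constructible.

// OK: a program-defined comparator carrying the same state.
struct RegisterRefEqualTo {
  const PhysicalRegisterInfo *PRI;
  explicit RegisterRefEqualTo(const PhysicalRegisterInfo &pri) : PRI(&pri) {}
  bool operator()(RegisterRef A, RegisterRef B) const {
    return A.Reg == B.Reg && A.Mask == B.Mask; // placeholder logic
  }
};

} // namespace llvm::rdf
```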
|
|
- Added OpenCL pipe builtins.
- Pipe instructions were added in TableGen and lowered in
SPIRVBuiltins.cpp.
---------
Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
Co-authored-by: Dmitry Sidorov <dmitry.sidorov@intel.com>
|
|
Added constraints related to the addressing model, as specified in the
specification.
This conforms with the implementation in the translator.
Same as PR #160089, with all previously raised issues resolved.
|
|
SPV_EXT_relaxed_printf_string_address_space (#160245)
Added support for the extension, enabling more storage classes for
printf strings.
|
|
Mapped the llvm.debugtrap intrinsic to OpNop in the SPIR-V backend,
since SPIR-V has no direct equivalent. Includes tests.
|
|
Added SPIR-V support for constrained floating-point comparison
intrinsics (fcmp, fcmps) with lowering and tests.
|
|
The code was incorrectly converting all `undef` arguments to `i32`,
while the `spv_insertv` intrinsic only expects that for the first
operand, which represents the aggregate type.
Fixes https://github.com/llvm/llvm-project/issues/127977
---------
Co-authored-by: Michal Paszkowski <michal@michalpaszkowski.com>
|
|
This commit adds patterns for LSX and LASX to support generating
`[x]vadda.{b/h/w/d}` instructions.
Note: for convenience, this commit also sets `ISD::ABS` as legal. As
shown in the tests, this changes no results; they are the same as
those previously obtained by expanding it. But setting it as legal
gives IR transformations more vectorization opportunities, which may
create further vector optimization chances for later stages and the
backend.
|
|
Try to remove `UnsafeFPMath` uses in the Arm backend. These global
flags block improvements such as
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797.
Remove them incrementally.
|
|
This patch aims to make the combination of single-float and the
N32/N64 ABIs work properly.
Right now, when both options are enabled, the compiler chooses an
incorrect ABI and in some cases even generates wrong instructions.
Floating-point behavior on MIPS is controlled through 3 flags:
soft-float, single-float, fp64. This makes things complicated because
fp64 indicates the presence of 64-bit floating-point registers but
cannot be easily disabled (the mips3 feature requires it, yet mips3
CPUs with only 32-bit floating point exist). Also, a missing fp64
doesn't actually disable 64-bit floating-point operations, because
certain MIPS1/2 CPUs support 64-bit floating point with 32-bit
registers; hence the single-float option.
I'm guessing that originally single-float was only intended for the
latter case, and that's the reason why it doesn't work properly on
64-bit targets.
So this patch does the following:
- Make single-float a "master disable": even if fp64 is enabled, this
completely disables generation of 64-bit floating-point operations,
making single-float usable on targets that hard-require fp64.
- Add proper calling conventions for N32/N64 single-float combinations.
- Fix up codegen to not generate certain 64-bit floating-point
operations; apparently, not assigning a register class to f64 values is
not enough to prevent them from showing up.
- Add tests for the new calling conventions and codegen.
|
|
If we have an AVX512 target capable of AVXIFMA but not AVX512IFMA, then we must split 512-bit (or larger) types to 256 bits.
Fixes #160928
|
|
declarations. (#160749)
This code doesn't work very well, but this makes it work when intrinsic
definitions are present. It now discounts function declarations from
the set of attributes it looks at.
The code would have worked better before
0ab5b5b8581d9f2951575f7245824e6e4fc57dec, when module-level attributes
could provide the information used to construct build attributes.
|
|
This reverts commit aa08b1a9963f33ded658d3ee655429e1121b5212.
|
|
This is primarily to avoid folding a frame index materialized
into an SGPR into the pseudo; this would end up looking like:
```
%sreg = s_mov_b32 %stack.0
%av_32 = av_mov_b32_imm_pseudo %sreg
```
which is not useful.
Match the check used for the b64 case. This is limited to the
pseudo to avoid regressions due to gfx908's special case: it
is expected to pass here with v_accvgpr_write_b32 for illegal
cases and stay in the intermediate state with an SGPR input.
This avoids regressions in a future patch.
|
|
Because v2f32 is legal but v2i32 is not, some sequences of
operations, such as bitcast(build_vector), are lowered inefficiently.
|
|
This patch is based on https://github.com/llvm/llvm-project/pull/159713.
It extends AddressSanitizer to support indexed/segment instructions in
RVV, enabling proper instrumentation for these memory operations.
A new member, `MaybeOffset`, is added to `InterestingMemoryOperand` to
describe the offset between the base pointer and the actual memory
reference address.
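A minimal sketch of the idea, with stand-in types (the field names are assumed for illustration and differ from LLVM's actual `InterestingMemoryOperand` declaration):
```cpp
// Sketch: for an indexed/segment access, the instrumented address is
// Base + Offset when an offset value is present; a plain access leaves
// MaybeOffset empty.
#include <optional>

struct Value; // stand-in for llvm::Value

struct InterestingMemoryOperandSketch {
  Value *Ptr = nullptr;               // base pointer of the access
  bool IsWrite = false;               // load vs. store
  std::optional<Value *> MaybeOffset; // base-to-address offset, if any
};
```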
Co-authored-by: Yeting Kuo <yeting.kuo@sifive.com>
|
|
This is a temporary fix for a regression from #154875.
The new pattern sets the hi half of the V_BFI result, and that confuses
si-fix-sgpr-copies, which is where the proper fix likely belongs.
During si-fix-sgpr-copies, an incorrect fold happens:
```
%86:vgpr_32 = V_BFI_B32_e64
%87:sreg_32 = COPY %86.hi16:vgpr_32
%95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, killed %87:sreg_32, 0, %63:vgpr_16, 0, 0
```
into
```
%86:vgpr_32 = V_BFI_B32_e64
%95:vgpr_32 = nofpexcept V_PACK_B32_F16_t16_e64 0, %86.lo16:vgpr_32, 0, %63:vgpr_16, 0, 0
```
Fixes: Vulkan CTS dEQP-VK.glsl.builtin.precision_fp16_storage32b.*.
|
|
Fix M68k build failures caused by
https://github.com/llvm/llvm-project/pull/160797
|
|
permutation instructions (#160763)
In newer SiFive7 cores like the X390, permutation instructions such as
vrgather.vv that operate on an LMUL smaller than a single DLEN can
complete in a constant number of cycles. For slightly larger data that
fits within the constraint `log2(SEW/8) + log2(LMUL) <= log2(DLEN/32)`,
these instructions can also take a number of cycles proportional to the
square of LMUL, rather than proportional to VL.
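A worked check of the stated constraint (a sketch only; SEW and DLEN are in bits, and the helper names are mine):
```cpp
// Evaluates log2(SEW/8) + log2(LMUL) <= log2(DLEN/32) for power-of-two inputs.
#include <cstdio>

static unsigned log2u(unsigned V) {
  unsigned L = 0;
  while (V >>= 1)
    ++L;
  return L;
}

static bool fitsConstraint(unsigned SEW, unsigned LMUL, unsigned DLEN) {
  return log2u(SEW / 8) + log2u(LMUL) <= log2u(DLEN / 32);
}

int main() {
  // With DLEN = 256: SEW = 32, LMUL = 2 fits (2 + 1 <= 3);
  // SEW = 64, LMUL = 4 does not (3 + 2 > 3).
  std::printf("%d\n", fitsConstraint(32, 2, 256)); // 1
  std::printf("%d\n", fitsConstraint(64, 4, 256)); // 0
}
```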
Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
|
|
Selecting vgpr for the uniform version of this pattern may lead to
unnecessary vgpr and waterfall loops.
|
|
Root Signature Flags allow blocking compilation of certain shader
stages. This PR implements a validation that notifies the user if they
compile a root signature that denies such a shader stage.
Closes: https://github.com/llvm/llvm-project/issues/153062
Previously approved: https://github.com/llvm/llvm-project/pull/153287
---------
Co-authored-by: joaosaffran <joao.saffran@microsoft.com>
Co-authored-by: Joao Saffran <{ID}+{username}@users.noreply.github.com>
Co-authored-by: Joao Saffran <jderezende@microsoft.com>
|
|
Consolidate predicate definitions into the top-level entry point for
the PowerPC target, `PPC.td`, and remove duplicate definitions for
32/64-bit subtarget checks.
|
|
SMLoc itself encapsulates just a pointer, so there is no need to pass or
return it by reference.
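A minimal sketch of the reasoning (a stand-in class; LLVM's real `SMLoc` lives in `llvm/Support/SMLoc.h`):
```cpp
// SMLoc-style wrapper: a single pointer, so pass-by-value is as cheap as a
// reference and avoids an extra indirection.
class SMLocSketch {
  const char *Ptr = nullptr;

public:
  static SMLocSketch getFromPointer(const char *P) {
    SMLocSketch L;
    L.Ptr = P;
    return L;
  }
  const char *getPointer() const { return Ptr; }
};

// Preferred signature: by value, not `const SMLocSketch &`.
inline const char *peek(SMLocSketch Loc) { return Loc.getPointer(); }
```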
|
|
|
|
canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::VPERMILPV handling (#160849)
X86ISD::VPERMILPV shuffles can't create undef/poison themselves, allowing us to fold freeze(vpermilps(x,y)) -> vpermilps(freeze(x),freeze(y)).
|
|
canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::VPERMV handling (#160845)
X86ISD::VPERMV shuffles can't create undef/poison themselves, allowing us to fold freeze(vpermps(x,y)) -> vpermps(freeze(x),freeze(y)).
|
|
canCreateUndefOrPoisonForTargetNode/isGuaranteedNotToBeUndefOrPoisonForTargetNode - add X86ISD::PSHUFB handling (#160842)
X86ISD::PSHUFB shuffles can't create undef/poison themselves, allowing us to fold freeze(pshufb(x,y)) -> pshufb(freeze(x),freeze(y)).
|
|
Split out from #151300 to isolate TargetTransformInfo cost modelling for
fault-only-first loads from VPlan implementation details. This change
adds costing support for vp.load.ff independently of the VPlan work.
For now, model a vp.load.ff as cost-equivalent to a vp.load.
|
|
- Fix
https://github.com/llvm/llvm-project/pull/156591#issuecomment-3335218842
- As per https://cdrdv2.intel.com/v1/dl/getContent/671200, the default
rounding mode is **round to nearest**.
|
|
|
|
An inline asm constraint "Jr", in AArch32, means that if the input value
is a compile-time constant in the range -4095 to +4095, then it can be
inserted into the assembly language as an immediate operand, and
otherwise it will be placed in a register.
The comment in the Arm backend said "It is not clear what this
constraint is intended for". I believe the answer is that this is the
range of immediate values you can use in an LDR or STR instruction.
So it's suitable for cases like this:
asm("str %0,[%1,%2]" : : "r"(data), "r"(base), "Jr"(offset) : "memory");
in the same way that the "Ir" constraint is suitable for the immediate
in a data-processing instruction such as ADD or EOR.
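A compilable illustration of the two constraint pairs (GCC/Clang extended asm, 32-bit Arm targets only; the function names are mine):
```cpp
// "Jr": a constant offset in -4095..+4095 may be emitted as an LDR/STR
// immediate; anything else falls back to a register.
static inline void store_at(int data, char *base, int offset) {
  asm volatile("str %0, [%1, %2]"
               : /* no outputs */
               : "r"(data), "r"(base), "Jr"(offset)
               : "memory");
}

// "Ir": the same idea for data-processing immediates, e.g. in ADD or EOR.
static inline int add_val(int x, int y) {
  int r;
  asm("add %0, %1, %2" : "=r"(r) : "r"(x), "Ir"(y));
  return r;
}
```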
|
|
Factored out from #151275.
Remove all UnsafeFPMath uses except the ABI-tags-related part.
|
|
|
|
|
|
|
|
The RC of the folded operand does not need to be constrained based on
the RC of the current operand we are folding into.
The purpose of this change is to facilitate
https://github.com/llvm/llvm-project/pull/151033.
|
|
mips2 (#159717)
Convert the movn/movz instructions into a mixture of beq, move, and sc.
Because the atomic-min-max.ll test broke on the expensive builder, I
reverted https://github.com/llvm/llvm-project/pull/149983 and am
resubmitting this PR.
The reason for the breakage:
In the i16/i8 function expandAtomicBinOpSubword, we add two successors
after loop2MBB; one does not specify the second parameter, while the
other uses BranchProbability::getOne(), which means 100% probability.
This is contradictory. The second successor is also specified
incorrectly.
The changes:
* llvm/lib/Target/Mips/MipsExpandPseudo.cpp:
Change loop2MBB's second successor to the correct one and delete the
second parameter BranchProbability::getOne().
* llvm/test/CodeGen/Mips/atomic-min-max.ll:
Add the -verify-machineinstrs option to the RUN command;
modify the i16 and i8 tests according to the changes.
Fixes #145411.
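A stand-in sketch of the invariant behind the fix (simplified, not LLVM's real `MachineBasicBlock` API; the block names are illustrative):
```cpp
#include <vector>

struct MBBSketch {
  struct Edge {
    MBBSketch *Succ;
    double Prob; // -1 means "unspecified": split the remainder evenly
  };
  std::vector<Edge> Successors;

  void addSuccessor(MBBSketch *Succ, double Prob = -1.0) {
    Successors.push_back({Succ, Prob});
  }
};

void buildLoop2Successors(MBBSketch &loop2MBB, MBBSketch &sinkMBB,
                          MBBSketch &exitMBB) {
  // Contradictory (pre-fix shape): with two successors, pinning one edge to
  // probability 1 leaves nothing for the other, since a block's outgoing
  // edge probabilities must sum to 1:
  //   loop2MBB.addSuccessor(&sinkMBB);
  //   loop2MBB.addSuccessor(&exitMBB, /*Prob=*/1.0);
  // Fixed shape: the correct successor, with no forced probability.
  loop2MBB.addSuccessor(&sinkMBB);
  loop2MBB.addSuccessor(&exitMBB);
}
```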
|
|
(#155829)
Pad argument registers to preserve overaligned structs in LLVM IR.
Additionally, since i128 values will be lowered as split i64 pairs in
the backend, correctly set the alignment of such arguments to 16 bytes.
This should make clang compliant with the ABI specification and fix
https://github.com/llvm/llvm-project/issues/144709.
|
|
`extract(binop)->binop(extract)` combination (#159726)
|
|
(#159056)
After this commit, DAGCombiner will have more opportunities to optimize
vector-typed `and+...+not` into `andn`.
Many combines in DAGCombiner will be enabled, but the tests in this
commit only show changes from combining `and(add(not))` into
`and(not(sub))`.
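A quick check of the arithmetic identity behind that combine (my own verification, not code from the patch): in two's complement, `~x == -x - 1`, so `~x + y == ~(x - y)`.
```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // add(not(x), y) == not(sub(x, y)) for any 32-bit x and y.
  uint32_t x = 0x12345678u, y = 0x9abcdef0u;
  std::printf("%d\n", (~x + y) == ~(x - y)); // 1
}
```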
|
|
`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.
---
This is a reapplication of #160228, which broke the Wasm waterfall. This
PR additionally prevents `FAKE_USE`s' uses from being stackified.
Previously, a 'def' whose first use was a `FAKE_USE` was able to be
stackified as `TEE`:
- Before
```
Reg = INST ... // Def
FAKE_USE ..., Reg, ... // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```
- After RegStackify
```
DefReg = INST ... // Def
TeeReg, Reg = TEE ... DefReg
FAKE_USE ..., TeeReg, ... // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```
And this assumes `DefReg` and `TeeReg` are stackified.
But this PR removes `FAKE_USE`s in the beginning of ExplicitLocals. And
later in ExplicitLocals we have a routine to unstackify registers that
have no uses left:
https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L257-L269
(This was added in #149626. At the time, it didn't seem it would trigger
the same assertions for `TEE`s, because it was fixing the bug where a
terminator was removed in CFGSort (#149097).
Details here:
https://github.com/llvm/llvm-project/pull/149432#issuecomment-3091444141)
- After `FAKE_USE` removal and unstackification
```
DefReg = INST ...
TeeReg, Reg = TEE ... DefReg
INST ..., Reg, ...
INST ..., Reg, ...
```
And now `TeeReg` is unstackified. This triggered the assertion here,
that `TeeReg` should be stackified:
https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L316
This PR prevents `FAKE_USE`s' uses from being stackified altogether,
including the `TEE` transformation. Even when it is not a `TEE`
transformation but just a single-use stackification, it does not trigger
the assertion; still, there is no point stackifying it given that it
will be deleted.
---
Fixes https://github.com/emscripten-core/emscripten/issues/25301.
|
|
|
|
Fix two bugs. The first bug hid the second bug.
1. Calculate IsVALU correctly during UADDO/USUBO selection. IsVALU
should be false if the carryout users are UADDO_CARRY/USUBO_CARRY.
However instruction selection visits uses before defs, so the
UADDO_CARRY/USUBO_CARRY nodes are normally (probably always) already
converted to S_ADD_CO_PSEUDO/S_SUB_CO_PSEUDO. Fix to check for these
machine opcodes.
2. Without this fix, UADDO/USUBO selection will always select the VALU
instructions V_ADD_CO_U32_e64/V_SUB_CO_U32_e64.
S_UADDO_PSEUDO/S_USUBO_PSEUDO were never selected in the CodeGen/AMDGPU
tests. Thus, S_UADDO_PSEUDO/S_USUBO_PSEUDO cases were never hit in
EmitInstrWithCustomInserter. The code generation for
S_UADDO_PSEUDO/S_USUBO_PSEUDO had a bug where it could not handle code
generation for 32-bit $scc_out.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
I was trying to understand the cases where `emitSPUpdate` may use a
`push` to adjust the stack pointer and realized that the code was more
complex than it needed to be.
* `Chunk` is redundant, as it is set to `MaxSPChunk` and never modified.
* The only time we use the `push` optimization is when `Offset` equals
`SlotSize` (`SlotSize` is never as large as `MaxSPChunk`, so `Offset`
can never equal `SlotSize` if `std::min` returned `MaxSPChunk`).
* If we use the `push` optimization, then we've finished adjusting the
stack and can return early instead of continuing the loop.
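A simplified sketch of the resulting shape (illustrative only; the names are assumed and this is not the actual `X86FrameLowering` code):
```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Emit instructions to adjust SP by Offset: one push when exactly SlotSize
// remains, otherwise chunked adjustments of at most MaxSPChunk.
static void emitSPUpdateSketch(int64_t Offset, int64_t SlotSize,
                               int64_t MaxSPChunk) {
  while (Offset > 0) {
    if (Offset == SlotSize) {
      std::printf("push reg   ; adjusts SP by %lld\n", (long long)SlotSize);
      return; // done adjusting: return early instead of continuing the loop
    }
    int64_t ThisVal = std::min(Offset, MaxSPChunk);
    std::printf("sub  sp, %lld\n", (long long)ThisVal);
    Offset -= ThisVal;
  }
}
```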
|
|
|
|
Users reported regressions to important matmul kernels as a result of
#155024. Although #155024 was a revert, this PR should allow them to
recover some of the lost performance.
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar resources. In some places, this is achieved by
grabbing the VFS from `PGOOptions`, but some passes don't have access to
it and resort to just calling `vfs::getRealFileSystem()`. This PR allows
setting the VFS directly on `PassBuilder`, which can then pass it down
to all passes that need it.
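A minimal sketch of the pattern with stand-in types (not LLVM's actual classes; the PR's real plumbing may differ):
```cpp
#include <memory>
#include <string>
#include <utility>

struct FileSystemSketch {
  virtual ~FileSystemSketch() = default;
  virtual std::string readConfig(const std::string &Path) = 0;
};

struct PassBuilderSketch {
  std::shared_ptr<FileSystemSketch> VFS; // injected once, shared by passes

  void setFileSystem(std::shared_ptr<FileSystemSketch> FS) {
    VFS = std::move(FS);
  }

  // A config-driven pass reads through the injected VFS instead of
  // reaching for the real filesystem itself.
  void runConfigDrivenPass(const std::string &ConfigPath) {
    if (VFS)
      (void)VFS->readConfig(ConfigPath);
  }
};
```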
|