Age | Commit message (Collapse) | Author | Files | Lines |
|
Addresssing issue #160847.
Move away from combining the intrinsic call and instead lower the ISD
nodes, using tablegen for pattern matching.
|
|
`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.
---
This is reapplication of #160228, which broke Wasm waterfall. This PR
additionally prevents `FAKE_USE`s uses from being stackified.
Previously, a 'def' whose first use was a `FAKE_USE` was able to be
stackified as `TEE`:
- Before
```
Reg = INST ... // Def
FAKE_USE ..., Reg, ... // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```
- After RegStackify
```
DefReg = INST ... // Def
TeeReg, Reg = TEE ... DefReg
FAKE_USE ..., TeeReg, ... // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```
And this assumes `DefReg` and `TeeReg` are stackified.
But this PR removes `FAKE_USE`s in the beginning of ExplicitLocals. And
later in ExplicitLocals we have a routine to unstackify registers that
have no uses left:
https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L257-L269
(This was added in #149626. Then it didn't seem it would trigger the
same assertions for `TEE`s because it was fixing the bug where a
terminator was removed in CFGSort (#149097).
Details here:
https://github.com/llvm/llvm-project/pull/149432#issuecomment-3091444141)
- After `FAKE_USE` removal and unstackification
```
DefReg = INST ...
TeeReg, Reg = TEE ... DefReg
INST ..., Reg, ...
INST ..., Reg, ...
```
And now `TeeReg` is unstackified. This triggered the assertion here,
that `TeeReg` should be stackified:
https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L316
This prevents `FAKE_USE`s' uses from being stackified altogether,
including `TEE` transformation. Even when it is not a `TEE`
transformation and just a single use stackification, it does not trigger
the assertion but there's no point stackifying it given that it will be
deleted.
---
Fixes https://github.com/emscripten-core/emscripten/issues/25301.
|
|
This change builds on https://github.com/llvm/llvm-project/pull/160319
which tries to clarify which *callers* (not backends) assume that the
result is actually trivial.
This change itself should be NFC. Essentially, I'm just renaming the
existing isTrivialRematerializable to the non-trivial version and then
adding a new trivial version (with the same name as the prior function)
and simplifying a few callers which want that semantic.
This change does *not* enable non-trivial remat any more broadly than
was already done for our targets which were lying through the old APIs;
that will come separately. The goal here is simply to make the code
easier to follow in terms of what assumptions are being made where.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
|
|
Reverts llvm/llvm-project#160228
See
https://github.com/llvm/llvm-project/pull/160228#issuecomment-3329752471
|
|
`FAKE_USE`s are essentially no-ops, so they have to be removed before
running ExplicitLocals so that `drop`s will be correctly inserted to
drop those values used by the `FAKE_USE`s.
Fixes https://github.com/emscripten-core/emscripten/issues/25301.
|
|
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ehead and remove both pieces in a single rename.
Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
|
|
This is a preparatory change for an upcoming reorganization of our
rematerialization APIs. Despite the interface being documented as
"trivial" (meaning no virtual register uses on the instruction being
considered for remat), our actual implementation inconsistently supports
non-trivial remat, and certain backends (AMDGPU and RISC-V mostly) lie
about instructions being trivial to abuse that. We want to allow
non-triial remat more broadly, but first we need to do some cleanup to
make it understandable what's going on.
These three call sites are ones which appear to actually want the
trivial definition, and appear fairly low risk to change.
p.s. I'm deliberately *not* updating any APIs in this change, I'm going
to do that as a followup once it's clear which category each callsite
fits in.
|
|
externally (#159143)
Rather then defining these tags in each object file that requires them
we can can declare them as undefined and require that they defined
externally in, for example, compiler-rt or libcxxabi.
|
|
The partial reduction intrinsics are no longer experimental, because
they've been used in production for a while and are unlikely to change.
|
|
Because `C_LONGJMP` is defined as 1 this assertion was never false.
|
|
getPointerRegClass is a layering violation. Its primary purpose
is to determine how to interpret an MCInstrDesc's operands RegClass
fields. This should be context free, and only depend on the subtarget.
The model of this is also wrong, since this should be an
instruction / operand specific property, not a global pointer class.
Remove the the function argument to help stage removal of this hook
and avoid introducing any new obstacles to replacing it.
The remaining uses of the function were to get the subtarget, which
TargetRegisterInfo already belongs to. A few targets needed new
subtarget derived properties copied there.
|
|
We currently only support partial.reduce.add in the case where we are
performing a multiply-accumulate. Now add support for any partial
reduction where the input is being extended, where we can take advantage
of extadd_pairwise.
|
|
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32" "n64" etc is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
Avoid using extends, and adding the high and low half and use
extadd_pairwise instead.
|
|
This will make it possible for tablegen to make subtarget
dependent decisions without adding new arguments to every
target.
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
WebAssemblyRegStackfy checks for writes to the stack pointer to avoid
stackifying across them, but it wasn't prepared for other global_set
instructions (such as writes in addrspace 1).
Fixes #156055
Thanks to @QuantumSegfault for reporting and identifying the offending
code.
|
|
First pass where we calculate the cost of the memory operation, as well
as the shuffles required. Interleaving by a factor of two should be
relatively cheap, as many ISAs have dedicated instructions to perform
the (de)interleaving. Several of these permutations can be combined for
an interleave stride of 4 and this is the highest stride we allow.
I've costed larger vectors, and more lanes, as more expensive because
not only is more work is needed but the risk of codegen going 'wrong'
rises dramatically. I also filled in a bit of cost modelling for vector
stores.
It appears the main vector plan to avoid is an interleave factor of 4
with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on
AArch64, and observe geomean improvement of ~3% with some kernels
improving 40-60%.
I know there is still significant performance being left on the table,
so this will need more development along with the rest of the cost
model.
|
|
During DAG combine, promote the operands to v8i16 by concanting with an
undef vector and then use extmul_low to perform the mul at i16. Finally,
shuffle the low bytes out of the i16 elements into the result vector.
|
|
The implementation follows what is done for ELF on other targets.
Fixes #100733.
|
|
Fixes https://github.com/llvm/llvm-project/issues/150550.
With the test case
```
void f(unsigned char *x, unsigned char *y, int n) {
// should have been vectorized into avgr_u instead of seperated vectorized add and logical right shift
for (int i = 0; i < n; i++)
x[i] = (x[i] + y[i] + 1) / 2;
}
```
the backend failed to recognize that this can be reduced to avgr_u since
the loop vectorizer doesn't transform into the existing pattern in
tablegen.
This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to
avgr_u in the tablegen file.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
The name is misleading, as setting Fragment to nullptr does not
necessarily make it undefined - common and equated symbols have
a nullptr fragment as well.
|
|
(#153864)
This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2.
Check if llvm-nm exists before building the benchmark.
|
|
(#153864)
…210)"
This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3.
Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)"
This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de.
Revert "TableGen: Emit statically generated hash table for runtime
libcalls (#150192)"
This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05.
Reverted three changes because of a CMake error while building llvm-nm
as reported in the following PR:
https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
|
|
(#153703)
This PR reapplies https://github.com/llvm/llvm-project/pull/149461
In the original `combineVectorSizedSetCCEquality`, the result of setcc
is being negated by returning setcc with the same cond code, leading to
wrong logic.
For example, with
```llvm
%cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
%res = icmp eq i32 %cmp_16, 0
```
the original PR producese all_true and then also compares the result
equal to 0 (using the same SETEQ in the returning setcc), meaning that
semantically, it effectively is calling icmp ne.
Instead, the PR should have use SETNE in the returning setcc, this way,
all true return 1, then it is compared again ne 0, which is equivalent
to icmp eq.
|
|
Does not yet fully propagate this down into the TargetLowering
uses, many of which are relying on null checks on the returned
value.
|
|
These make it easy to forget to initialize some members, like the newly
added OrigTy. Force these to always go through the ctor instead.
|
|
byte loads with simd128" (#153360)
Reverts llvm/llvm-project#149461
The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the
Emscripten test suite has failed. This PR applies a revert so I can take
a closer look at it
Test case link:
https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp
Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128
-o something.js`
Original comment report:
https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
|
|
fastSelectInstruction (#153302)
Recently my change to avoid duplicate `dontcall` attribute errors
(#152810) caused the Clang `Frontend/backend-attribute-error-warning.c`
test to fail on Arm32:
<https://lab.llvm.org/buildbot/#/builders/154/builds/20134>
The root cause is that, if the default `IFastSel` path bails, then
targets are given the opportunity to lower instructions via
`fastSelectInstruction`. That's the path taken by Arm32 and since its
implementation of `selectCall` didn't call `diagnoseDontCall` no error
was emitted.
I've checked the other implementations of `fastSelectInstruction` and
the only other one that lowers call instructions in WebAssembly, so I've
fixed that too.
|
|
loads with simd128 (#149461)
Fixes https://github.com/llvm/llvm-project/issues/149230
Previously, even with simd enabled via `-mattr=+simd128`, the compiler
cannot utilize v128 to optimize loads and setcc of i128, instead
legalizing it to consecutive i64s.
This PR then adds support for setcc of i128 by converting them to
v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16
bytes or more (when simd128 is present).
The check for enabling this optimization is if the comparison operand is
either a load or an integer in i128, with the comparison code being
either `EQ | NE`, without `NoImplicitFloat` function flag.
Inspiration taken from RISCV's isel lowering.
|
|
test_function_pointer_signature (#150921)
I fixed support for varargs functions
(previously it didn't crash but the codegen was incorrect).
I added tests for structs and unions which already work. With the
multivalue abi they crash in the backend, so I added a sema check that
rejects structs and unions for that abi.
It will also crash in the backend if passed an int128 or float128 type.
|
|
The information whether a specific argument is vararg or fixed is
currently stored separately from all the other argument information in
ArgFlags. This means that it is not accessible from CCAssign, and
backends have developed all kinds of workarounds for how they can access
it after all.
Move this information to ArgFlags to make it directly available in all
relevant places.
I've opted to invert this and store it as IsVarArg, as I think that both
makes the meaning more obvious and provides for a better default (which
is IsVarArg=false).
|
|
getImm() already returns int64_t.
|
|
The object file format specific derived classes are used in context
where the type is statically known. We don't use isa/dyn_cast and we
want to eliminate MCSymbol::Kind in the base class.
|
|
`Data` now references the first byte of the fixup offset within the current fragment.
MCAssembler::layout asserts that the fixup offset is within either the
fixed-size content or the optional variable-size tail, as this is the
most the generic code can validate without knowing the target-specific
fixup size.
Many backends applyFixup assert
```
assert(Offset + Size <= F.getSize() && "Invalid fixup offset!");
```
This refactoring allows a subsequent change to move the fixed-size
content outside of MCSection::ContentStorage, fixing the
-fsanitize=pointer-overflow issue of #150846
Pull Request: https://github.com/llvm/llvm-project/pull/151724
|
|
to facilitate replacing `MutableArrayRef<char> Data` (fragment content)
with the relocated location. This is necessary to fix the
pointer-overflow sanitizer issue and reland #150846
|
|
Also alphebetize feature list, add `-mgc` and `-mno-gc` flags, and add
some missing feature tests.
Reland of #151107.
https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
|
|
(#151268)
Reverts llvm/llvm-project#151107
|
|
See suggestion here:
https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
|
|
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle
for v16i8 mul.
On my AArch64 machine, using V8, I observe a 3.14% geomean improvement
across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv
and 1.8% for ncnn.
|
|
After #149310 lifetime intrinsics require an alloca argument, an
invariant that this pass can break.
I've fixed this in two ways:
* First, move static allocas into the entry block. Currently, the way
the pass splits the entry block makes all allocas dynamic, which I
assume was not actually intended. This will avoid unnecessary SSA
reconstruction for allocas as well, and thus avoid the problem.
* If this fails (for dynamic allocas) drop all lifetime intrinsics if
any one of them would require a rewrite during SSA reconstruction.
Fixes https://github.com/llvm/llvm-project/issues/150498.
|
|
Following footstep of https://github.com/llvm/llvm-project/pull/147487
(support for madd), this PR adds support for nmadd.
https://github.com/llvm/llvm-project/issues/55932 tracks this
|
|
The object file format specific derived classes are used in context like
MCStreamer and MCObjectTargetWriter where the type is statically known.
We don't use isa/dyn_cast and we want to eliminate
MCSection::SectionVariant in the base class.
|
|
Tests if the runtime type of the function pointer matches the static
type. If this returns false, calling the function pointer will trap.
Uses `@llvm.wasm.ref.test.func` added in #147486.
Also adds a "gc" wasm feature to gate the use of the ref.test
instruction.
|
|
Fixes https://github.com/llvm/llvm-project/issues/117200.
The default behavior in TargetLoweringBase is only scalar floats on fexp
are supported by default, not the vectorized version. This PR adds
`ISD::FEXP10` to the supported list.
|
|
Replacing an alloca with a call result in a lifetime intrinsic
will cause a verifier error.
Fixes https://github.com/llvm/llvm-project/issues/150498.
|
|
PR #147486 broke the sanitizer and expensive-checks buildbot.
These captures were needed when toWasmValType emitted a diagnostic but
are no longer needed since we changed it to an assertion failure. This
removes the unneeded captures and should fix the sanitizer-buildbot.
I also fixed the codegen in the wasm64 target: table.get requires an i32
but in wasm64 the function pointer is an i64. We need an additional
`i32.wrap_i64` to convert it. I also added `-verify-machineinstrs` to
the tests so that the test suite validates this fix.
Finally, I noticed that #150201 uses a feature of the intrinsic that is
not covered by the tests, namely `ptr` arguments. So I added one
additional test case to ensure that it works properly.
cc @dschuff
|
|
There are cases we end up removing some intructions that use stackified
registers after RegStackify. For example,
```wasm
bb.0:
%0 = ... ;; %0 is stackified
br_if %bb.1, %0
bb.1:
```
In this code, br_if will be removed in CFGSort, so we should unstackify
%0 so that it can be correctly dropped in ExplicitLocals.
Rather than handling this in case-by-case basis, this PR just
unstackifies all stackifies register with no uses in the beginning of
ExplicitLocals, so that they can be correctly dropped.
Fixes #149097.
|
|
This patch fixes:
llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp:126:26:
error: lambda capture 'DAG' is not used
[-Werror,-Wunused-lambda-capture]
llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp:239:28:
error: unused variable 'Info' [-Werror,-Wunused-variable]
|
|
This adds an llvm intrinsic for WebAssembly to test the type of a
function. It is intended for adding a future clang builtin
` __builtin_wasm_test_function_pointer_signature` so we can test whether
calling a function pointer will fail with function signature mismatch.
Since the type of a function pointer is just `ptr` we can't figure out
the expected type from that.
The way I figured out to encode the type was by passing 0's of the
appropriate type to the intrinsic.
The first argument gives the expected type of the return type and the
later values give the expected
type of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (i32)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (i32)
```
To indicate the function should be void, I somewhat arbitrarily picked
`token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```
To lower this intrinsic, we need some place to put the type information.
With `encodeFunctionSignature()` we encode the signature information
into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in
`WebAssemblyMCInstLower.cpp`.
|