path: root/llvm/lib/Target/WebAssembly
AgeCommit message (Collapse)AuthorFilesLines
6 hours[WebAssembly] Use partial_reduce_mla ISD nodes (#161184)Sam Parker3-145/+54
Addressing issue #160847. Move away from combining the intrinsic call and instead lower the ISD nodes, using tablegen for pattern matching.
5 days[WebAssembly] Remove FAKE_USEs before ExplicitLocals (#160768)Heejin Ahn2-0/+18
`FAKE_USE`s are essentially no-ops, so they have to be removed before running ExplicitLocals so that `drop`s will be correctly inserted to drop those values used by the `FAKE_USE`s.

---

This is a reapplication of #160228, which broke the Wasm waterfall. This PR additionally prevents `FAKE_USE`s' uses from being stackified.

Previously, a 'def' whose first use was a `FAKE_USE` was able to be stackified as `TEE`:

- Before
```
Reg = INST ...           // Def
FAKE_USE ..., Reg, ...   // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```

- After RegStackify
```
DefReg = INST ...           // Def
TeeReg, Reg = TEE ... DefReg
FAKE_USE ..., TeeReg, ...   // Insert
INST ..., Reg, ...
INST ..., Reg, ...
```
And this assumes `DefReg` and `TeeReg` are stackified.

But this PR removes `FAKE_USE`s in the beginning of ExplicitLocals. And later in ExplicitLocals we have a routine to unstackify registers that have no uses left: https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L257-L269 (This was added in #149626. Then it didn't seem it would trigger the same assertions for `TEE`s because it was fixing the bug where a terminator was removed in CFGSort (#149097). Details here: https://github.com/llvm/llvm-project/pull/149432#issuecomment-3091444141)

- After `FAKE_USE` removal and unstackification
```
DefReg = INST ...
TeeReg, Reg = TEE ... DefReg
INST ..., Reg, ...
INST ..., Reg, ...
```
And now `TeeReg` is unstackified. This triggered the assertion here, that `TeeReg` should be stackified: https://github.com/llvm/llvm-project/blob/7b28fcd2b182ba2c9d2d71c386be92fc0ee3cc9d/llvm/lib/Target/WebAssembly/WebAssemblyExplicitLocals.cpp#L316

This prevents `FAKE_USE`s' uses from being stackified altogether, including the `TEE` transformation. Even when it is not a `TEE` transformation and just a single-use stackification, it does not trigger the assertion, but there's no point stackifying it given that it will be deleted.

---

Fixes https://github.com/emscripten-core/emscripten/issues/25301.
5 days[TII] Split isTrivialReMaterializable into two versions [nfc] (#160377)Philip Reames1-4/+1
This change builds on https://github.com/llvm/llvm-project/pull/160319 which tries to clarify which *callers* (not backends) assume that the result is actually trivial. This change itself should be NFC. Essentially, I'm just renaming the existing isTrivialRematerializable to the non-trivial version and then adding a new trivial version (with the same name as the prior function) and simplifying a few callers which want that semantic. This change does *not* enable non-trivial remat any more broadly than was already done for our targets which were lying through the old APIs; that will come separately. The goal here is simply to make the code easier to follow in terms of what assumptions are being made where. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>
6 daysRevert "[WebAssembly] Remove FAKE_USEs before ExplicitLocals" (#160553)Derek Schuff1-14/+0
Reverts llvm/llvm-project#160228 See https://github.com/llvm/llvm-project/pull/160228#issuecomment-3329752471
7 days[WebAssembly] Remove FAKE_USEs before ExplicitLocals (#160228)Heejin Ahn1-0/+14
`FAKE_USE`s are essentially no-ops, so they have to be removed before running ExplicitLocals so that `drop`s will be correctly inserted to drop those values used by the `FAKE_USE`s. Fixes https://github.com/emscripten-core/emscripten/issues/25301.
7 days[CodeGen] Rename isReallyTriviallyReMaterializable [nfc]Philip Reames2-4/+4
.. to isReMaterializableImpl. The "Really" naming has always been awkward, and we're working towards removing the "Trivial" part now, so go ahead and remove both pieces in a single rename. Note that this doesn't change any aspect of the current implementation; we still "mostly" only return instructions which are trivial (meaning no virtual register uses), but some targets do lie about that today.
7 daysUpdate callers of isTriviallyReMaterializable to check trivialness (#160319)Philip Reames1-1/+4
This is a preparatory change for an upcoming reorganization of our rematerialization APIs. Despite the interface being documented as "trivial" (meaning no virtual register uses on the instruction being considered for remat), our actual implementation inconsistently supports non-trivial remat, and certain backends (AMDGPU and RISC-V mostly) lie about instructions being trivial to abuse that. We want to allow non-trivial remat more broadly, but first we need to do some cleanup to make it understandable what's going on. These three call sites are ones which appear to actually want the trivial definition, and appear fairly low risk to change. p.s. I'm deliberately *not* updating any APIs in this change, I'm going to do that as a followup once it's clear which category each callsite fits in.
11 days[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined ↵Sam Clegg1-7/+0
externally (#159143) Rather than defining these tags in each object file that requires them, we can declare them as undefined and require that they be defined externally in, for example, compiler-rt or libcxxabi.
13 days[IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637)Sander de Smalen1-3/+2
The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.
2025-09-15[WebAssembly] Fix typo in Tag value assertion. NFC (#158752)Sam Clegg1-1/+1
Because `C_LONGJMP` is defined as 1 this assertion was never false.
2025-09-12CodeGen: Remove MachineFunction argument from getPointerRegClass (#158185)Matt Arsenault3-10/+7
getPointerRegClass is a layering violation. Its primary purpose is to determine how to interpret an MCInstrDesc's operands RegClass fields. This should be context free, and only depend on the subtarget. The model of this is also wrong, since this should be an instruction / operand specific property, not a global pointer class. Remove the function argument to help stage removal of this hook and avoid introducing any new obstacles to replacing it. The remaining uses of the function were to get the subtarget, which TargetRegisterInfo already belongs to. A few targets needed new subtarget derived properties copied there.
2025-09-12[WebAssembly] Support partial-reduce accumulator (#158060)Sam Parker4-90/+127
We currently only support partial.reduce.add in the case where we are performing a multiply-accumulate. Now add support for any partial reduction where the input is being extended, where we can take advantage of extadd_pairwise.
2025-09-11[llvm] Move data layout string computation to TargetParser (#157612)Reid Kleckner1-13/+3
Clang and other frontends generally need the LLVM data layout string in order to generate LLVM IR modules for LLVM. MLIR clients often need it as well, since MLIR users often lower to LLVM IR. Before this change, the LLVM datalayout string was computed in the LLVM${TGT}CodeGen library in the relevant TargetMachine subclass. However, none of the logic for computing the data layout string requires any details of code generation. Clients who want to avoid duplicating this information were forced to link in LLVMCodeGen and all registered targets, leading to bloated binaries. This happened in PR #145899, which measurably increased binary size for some of our users. By moving this information to the TargetParser library, we can delete the duplicate datalayout strings in Clang, and retain the ability to generate IR for unregistered targets. This is intended to be a very mechanical LLVM-only change, but there is an immediately obvious follow-up to clang, which will be prepared separately. The vast majority of data layouts are computable with two inputs: the triple and the "ABI name". There is only one exception, NVPTX, which has a cl::opt to enable short device pointers. I invented a "shortptr" ABI name to pass this option through the target independent interface. Everything else fits. Mips is a bit awkward because it uses a special MipsABIInfo abstraction, which includes members with codegen-like concepts like ABI physical registers that can't live in TargetParser. I think the string logic of looking for "n32" "n64" etc is reasonable to duplicate. We have plenty of other minor duplication to preserve layering. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-10[WebAssembly] extadd_pairwise for PartialReduce (#157669)Sam Parker3-9/+14
Avoid extending and then adding the high and low halves; use extadd_pairwise instead.
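As a rough scalar illustration of why the pairwise form is a valid partial reduction, here is a sketch in plain C++ (not LLVM code; all function and type names are made up for this example). Both strategies narrow sixteen i8 lanes to eight i16 lanes. The lanes end up holding different pairs, but a partial reduction only requires that the final total be the same.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

using V16i8 = std::array<int8_t, 16>;
using V8i16 = std::array<int16_t, 8>;

// Scalar model of i16x8.extadd_pairwise_i8x16_s:
// sign-extend each pair of adjacent lanes and add them.
V8i16 extaddPairwise(const V16i8 &In) {
  V8i16 Out{};
  for (int I = 0; I < 8; ++I)
    Out[I] = static_cast<int16_t>(In[2 * I] + In[2 * I + 1]);
  return Out;
}

// Model of the alternative: extend the low and high halves
// separately, then add them lane-wise.
V8i16 extendLowHighAdd(const V16i8 &In) {
  V8i16 Out{};
  for (int I = 0; I < 8; ++I)
    Out[I] = static_cast<int16_t>(In[I] + In[I + 8]);
  return Out;
}

// Sum of all lanes, standing in for the final reduction.
int total(const V8i16 &V) { return std::accumulate(V.begin(), V.end(), 0); }
```

For any input, `total(extaddPairwise(In))` and `total(extendLowHighAdd(In))` agree, since each input lane contributes to exactly one output lane in both forms.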
2025-09-08CodeGen: Pass SubtargetInfo to TargetGenInstrInfo constructors (#157337)Matt Arsenault1-1/+1
This will make it possible for tablegen to make subtarget dependent decisions without adding new arguments to every target. --------- Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
2025-09-02[WebAssembly] Guard use of getSymbolName with isSymbol (#156105)Derek Schuff1-1/+2
WebAssemblyRegStackify checks for writes to the stack pointer to avoid stackifying across them, but it wasn't prepared for other global_set instructions (such as writes in addrspace 1). Fixes #156055 Thanks to @QuantumSegfault for reporting and identifying the offending code.
2025-08-27[WebAssembly] Implement getInterleavedMemoryOpCost (#146864)Sam Parker2-16/+113
First pass where we calculate the cost of the memory operation, as well as the shuffles required. Interleaving by a factor of two should be relatively cheap, as many ISAs have dedicated instructions to perform the (de)interleaving. Several of these permutations can be combined for an interleave stride of 4 and this is the highest stride we allow. I've costed larger vectors, and more lanes, as more expensive because not only is more work needed but the risk of codegen going 'wrong' rises dramatically. I also filled in a bit of cost modelling for vector stores. It appears the main vector plan to avoid is an interleave factor of 4 with v16i8. I've used libyuv and ncnn for benchmarking, using V8 on AArch64, and observe geomean improvement of ~3% with some kernels improving 40-60%. I know there is still significant performance being left on the table, so this will need more development along with the rest of the cost model.
2025-08-27[WebAssembly] v8i8 mul support (#151145)Sam Parker1-22/+41
During DAG combine, promote the operands to v8i16 by concatenating with an undef vector and then use extmul_low to perform the mul at i16. Finally, shuffle the low bytes out of the i16 elements into the result vector.
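The reason the widened multiply is safe can be checked per lane with a small scalar model. This is a sketch in plain C++ with made-up helper names, not the backend's code. The message does not say whether the signed or unsigned extmul_low variant is used, so the model covers both; the low byte of the 16-bit product is the same either way.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Reference: an i8 multiply keeps only the low 8 bits of the product.
uint8_t mulI8(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>(A * B);
}

// Model of the widened form: extend both lanes to i16, multiply at i16,
// then keep the low byte of the result (the role of the final shuffle).
uint8_t mulViaI16(uint8_t A, uint8_t B, bool SignExtend) {
  uint16_t WideA = SignExtend ? static_cast<uint16_t>(static_cast<int8_t>(A)) : A;
  uint16_t WideB = SignExtend ? static_cast<uint16_t>(static_cast<int8_t>(B)) : B;
  // Multiply in 32-bit unsigned arithmetic to avoid signed overflow,
  // then truncate to the 16-bit lane width.
  uint16_t Product = static_cast<uint16_t>(static_cast<uint32_t>(WideA) * WideB);
  return static_cast<uint8_t>(Product & 0xFF);
}

// Exhaustively compare the two forms over all 8-bit operand pairs.
bool lowByteMatchesEverywhere() {
  for (int A = 0; A < 256; ++A)
    for (int B = 0; B < 256; ++B)
      for (bool Signed : {false, true})
        if (mulViaI16(static_cast<uint8_t>(A), static_cast<uint8_t>(B), Signed) !=
            mulI8(static_cast<uint8_t>(A), static_cast<uint8_t>(B)))
          return false;
  return true;
}
```

`lowByteMatchesEverywhere()` walks all 65536 operand pairs, so it doubles as a quick sanity check of the identity the combine relies on.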
2025-08-24[WebAssembly] Implement the `.reloc` directive for WASM (#146952)SingleAccretion1-0/+19
The implementation follows what is done for ELF on other targets. Fixes #100733.
2025-08-22[WebAssembly] Add support for avgr_u in loops (#153252)Jasmine Tang2-0/+5
Fixes https://github.com/llvm/llvm-project/issues/150550.

With the test case
```
void f(unsigned char *x, unsigned char *y, int n) {
  // should have been vectorized into avgr_u instead of a separate vectorized add and logical right shift
  for (int i = 0; i < n; i++)
    x[i] = (x[i] + y[i] + 1) / 2;
}
```
the backend failed to recognize that this can be reduced to avgr_u since the loop vectorizer doesn't transform into the existing pattern in tablegen. This PR sets AVGCEIL_U as legal for v8i16 and v16i8 and selects it to avgr_u in the tablegen file.
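For reference, the per-lane behaviour being matched can be modelled in scalar C++. This is only a sketch with hypothetical function names; it shows that the loop body, evaluated with the usual integer promotions, is the same rounding average that one lane of avgr_u computes.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one lane of i8x16.avgr_u: a rounding average,
// computed in a wider type so the intermediate sum cannot overflow.
uint8_t avgrU8(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>((static_cast<unsigned>(A) + B + 1) >> 1);
}

// The loop body from the test case. The operands are promoted to int,
// so the addition does not wrap at 8 bits.
uint8_t sourceExpr(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>((X + Y + 1) / 2);
}
```

Because the sum is never negative, dividing by 2 and shifting right by 1 give the same result, which is why the add, add-one, and shift sequence can collapse into a single averaging instruction.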
2025-08-17[llvm] Remove unused includes (NFC) (#154051)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-08-17MCSymbol: Remove setUndefinedFangrui Song2-2/+0
The name is misleading, as setting Fragment to nullptr does not necessarily make it undefined - common and equated symbols have a nullptr fragment as well.
2025-08-16Reapply "RuntimeLibcalls: Generate table of libcall name lengths (#153… ↵Matt Arsenault1-2/+2
(#153864) This reverts commit 334e9bf2dd01fbbfe785624c0de477b725cde6f2. Check if llvm-nm exists before building the benchmark.
2025-08-15Revert "RuntimeLibcalls: Generate table of libcall name lengths (#153… ↵gulfemsavrun1-2/+2
(#153864) …210)" This reverts commit 9a14b1d254a43dc0d4445c3ffa3d393bca007ba3. Revert "RuntimeLibcalls: Return StringRef for libcall names (#153209)" This reverts commit cb1228fbd535b8f9fe78505a15292b0ba23b17de. Revert "TableGen: Emit statically generated hash table for runtime libcalls (#150192)" This reverts commit 769a9058c8d04fc920994f6a5bbb03c8a4fbcd05. Reverted three changes because of a CMake error while building llvm-nm as reported in the following PR: https://github.com/llvm/llvm-project/pull/150192#issuecomment-3192223073
2025-08-15[WebAssembly] Reapply #149461 with correct CondCode in combine of SETCC ↵Jasmine Tang2-3/+55
(#153703) This PR reapplies https://github.com/llvm/llvm-project/pull/149461

In the original `combineVectorSizedSetCCEquality`, the result of setcc is being negated by returning setcc with the same cond code, leading to wrong logic. For example, with
```llvm
%cmp_16 = call i32 @memcmp(ptr %a, ptr %b, i32 16)
%res = icmp eq i32 %cmp_16, 0
```
the original PR produces all_true and then also compares the result equal to 0 (using the same SETEQ in the returning setcc), meaning that semantically, it effectively is calling icmp ne. Instead, the PR should have used SETNE in the returning setcc; this way, all_true returns 1, which is then compared ne 0, and that is equivalent to icmp eq.
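The condition-code mix-up is easy to see in a scalar model. The following is a sketch in plain C++ with hypothetical names, not the actual combine; `allLanesEqual` stands in for a lane-wise byte compare followed by all_true.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes16 = std::array<uint8_t, 16>;

// Model of a lane-wise byte compare followed by all_true:
// 1 if every lane matches, 0 otherwise.
int allLanesEqual(const Bytes16 &A, const Bytes16 &B) {
  for (int I = 0; I < 16; ++I)
    if (A[I] != B[I])
      return 0;
  return 1;
}

// Original combine: reuses SETEQ, i.e. tests all_true == 0.
// This is true only when the blocks differ, so the result is inverted.
bool memcmpIsZeroBuggy(const Bytes16 &A, const Bytes16 &B) {
  return allLanesEqual(A, B) == 0;
}

// Reapplied combine: uses SETNE, i.e. tests all_true != 0,
// which matches `icmp eq (memcmp(a, b, 16)), 0`.
bool memcmpIsZeroFixed(const Bytes16 &A, const Bytes16 &B) {
  return allLanesEqual(A, B) != 0;
}
```

With two identical blocks, `memcmpIsZeroFixed` returns true while `memcmpIsZeroBuggy` returns false, which is the inversion described in the message.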
2025-08-15RuntimeLibcalls: Return StringRef for libcall names (#153209)Matt Arsenault1-2/+2
Does not yet fully propagate this down into the TargetLowering uses, many of which are relying on null checks on the returned value.
2025-08-13[CodeGen] Remove default ctors for InputArg and OutputArg (#153205)Nikita Popov1-4/+7
These make it easy to forget to initialize some members, like the newly added OrigTy. Force these to always go through the ctor instead.
2025-08-13Revert "[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 ↵Jasmine Tang2-54/+3
byte loads with simd128" (#153360) Reverts llvm/llvm-project#149461 The first test w/ memcmp in `test/neon/test_neon_wasm_simd.cpp` in the Emscripten test suite has failed. This PR applies a revert so I can take a closer look at it Test case link: https://github.com/emscripten-core/emscripten/blob/main/test/neon/test_neon_wasm_simd.cpp Compile option: `em++ test_neon_wasm_simd.cpp -O2 -mfpu=neon -msimd128 -o something.js` Original comment report: https://github.com/llvm/llvm-project/pull/149461#issuecomment-3181652746
2025-08-12Fix handling of dontcall attributes for arches that lower calls via ↵Daniel Paoliello1-0/+2
fastSelectInstruction (#153302) Recently my change to avoid duplicate `dontcall` attribute errors (#152810) caused the Clang `Frontend/backend-attribute-error-warning.c` test to fail on Arm32: <https://lab.llvm.org/buildbot/#/builders/154/builds/20134> The root cause is that, if the default `FastISel` path bails, then targets are given the opportunity to lower instructions via `fastSelectInstruction`. That's the path taken by Arm32 and since its implementation of `selectCall` didn't call `diagnoseDontCall` no error was emitted. I've checked the other implementations of `fastSelectInstruction` and the only other one that lowers call instructions is WebAssembly, so I've fixed that too.
2025-08-12[WebAssembly] Combine i128 to v16i8 for setcc & expand memcmp for 16 byte ↵Jasmine Tang2-3/+54
loads with simd128 (#149461) Fixes https://github.com/llvm/llvm-project/issues/149230 Previously, even with simd enabled via `-mattr=+simd128`, the compiler could not utilize v128 to optimize loads and setcc of i128, instead legalizing them to consecutive i64s. This PR adds support for setcc of i128 by converting them to v16i8's anytrue and alltrue; consequently, this benefits memcmp of 16 bytes or more (when simd128 is present). The optimization is enabled when the comparison operand is either a load or an integer in i128, the comparison code is either `EQ` or `NE`, and the function does not have the `NoImplicitFloat` flag. Inspiration taken from RISCV's isel lowering.
2025-08-07[clang][WebAssembly] Support reftypes & varargs in ↵Hood Chatham1-0/+9
test_function_pointer_signature (#150921) I fixed support for varargs functions (previously it didn't crash but the codegen was incorrect). I added tests for structs and unions which already work. With the multivalue abi they crash in the backend, so I added a sema check that rejects structs and unions for that abi. It will also crash in the backend if passed an int128 or float128 type.
2025-08-07[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)Nikita Popov1-2/+2
The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).
2025-08-04[Target] Remove unnecessary casts (NFC) (#151902)Kazu Hirata1-1/+1
getImm() already returns int64_t.
2025-08-03MCSymbolWasm: Remove classofFangrui Song8-42/+46
The object file format specific derived classes are used in contexts where the type is statically known. We don't use isa/dyn_cast and we want to eliminate MCSymbol::Kind in the base class.
2025-08-02MCAsmBackend::applyFixup: Change `Data` to indicate the relocated locationFangrui Song1-6/+5
`Data` now references the first byte of the fixup offset within the current fragment. MCAssembler::layout asserts that the fixup offset is within either the fixed-size content or the optional variable-size tail, as this is the most the generic code can validate without knowing the target-specific fixup size.

Many backends' applyFixup implementations assert
```
assert(Offset + Size <= F.getSize() && "Invalid fixup offset!");
```
This refactoring allows a subsequent change to move the fixed-size content outside of MCSection::ContentStorage, fixing the -fsanitize=pointer-overflow issue of #150846

Pull Request: https://github.com/llvm/llvm-project/pull/151724
2025-08-01MCAsmBackend::applyFixup: Replace Data.getSize() with F.getSize()Fangrui Song1-1/+1
to facilitate replacing `MutableArrayRef<char> Data` (fragment content) with the relocated location. This is necessary to fix the pointer-overflow sanitizer issue and reland #150846
2025-07-30[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151294)Hood Chatham3-8/+9
Also alphabetize the feature list, add `-mgc` and `-mno-gc` flags, and add some missing feature tests. Reland of #151107. https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-29Revert "[WebAssembly] Add gc target feature to addBleedingEdgeFeatures" ↵ronlieb3-9/+8
(#151268) Reverts llvm/llvm-project#151107
2025-07-29[WebAssembly] Add gc target feature to addBleedingEdgeFeatures (#151107)Hood Chatham3-8/+9
See suggestion here: https://github.com/llvm/llvm-project/pull/150201#discussion_r2237982637
2025-07-29[WebAssembly] v16i8 mul support (#150209)Sam Parker1-3/+42
During target DAG combine, use two i16x8.extmul_low_i8x16 and a shuffle for v16i8 mul. On my AArch64 machine, using V8, I observe a 3.14% geomean improvement across 65 benchmarks, including: 9.2% for spec2017.x264, 6% for libyuv and 1.8% for ncnn.
2025-07-29[WebAssemblyLowerEmscriptenEHSjLj] Avoid lifetime of phi (#150932)Nikita Popov1-0/+28
After #149310 lifetime intrinsics require an alloca argument, an invariant that this pass can break. I've fixed this in two ways:
* First, move static allocas into the entry block. Currently, the way the pass splits the entry block makes all allocas dynamic, which I assume was not actually intended. This will avoid unnecessary SSA reconstruction for allocas as well, and thus avoid the problem.
* If this fails (for dynamic allocas) drop all lifetime intrinsics if any one of them would require a rewrite during SSA reconstruction.

Fixes https://github.com/llvm/llvm-project/issues/150498.
2025-07-28[WebAssembly] Add pattern for relaxed nmadd (#150684)Jasmine Tang1-0/+2
Following in the footsteps of https://github.com/llvm/llvm-project/pull/147487 (support for madd), this PR adds support for nmadd. https://github.com/llvm/llvm-project/issues/55932 tracks this.
2025-07-26MCSectionWasm: Remove classofFangrui Song1-2/+4
The object file format specific derived classes are used in contexts like MCStreamer and MCObjectTargetWriter where the type is statically known. We don't use isa/dyn_cast and we want to eliminate MCSection::SectionVariant in the base class.
2025-07-25[WebAssembly,clang] Add __builtin_wasm_test_function_pointer_signature (#150201)Hood Chatham5-14/+22
Tests if the runtime type of the function pointer matches the static type. If this returns false, calling the function pointer will trap. Uses `@llvm.wasm.ref.test.func` added in #147486. Also adds a "gc" wasm feature to gate the use of the ref.test instruction.
2025-07-25[WebAssembly] Added vectorized version of fexp10 to the supported list (#150564)Jasmine Tang1-1/+1
Fixes https://github.com/llvm/llvm-project/issues/117200. By default, TargetLoweringBase supports fexp only on scalar floats, not the vectorized version. This PR adds `ISD::FEXP10` to the supported list.
2025-07-25[WebAssemblyOptimizeReturned] Skip lifetime intrinsic usesNikita Popov1-2/+4
Replacing an alloca with a call result in a lifetime intrinsic will cause a verifier error. Fixes https://github.com/llvm/llvm-project/issues/150498.
2025-07-23[WebAssembly,llvm] Fix buildbot problems with llvm.wasm.ref.test.func (#150116)Hood Chatham1-5/+13
PR #147486 broke the sanitizer and expensive-checks buildbot. These captures were needed when toWasmValType emitted a diagnostic but are no longer needed since we changed it to an assertion failure. This removes the unneeded captures and should fix the sanitizer-buildbot. I also fixed the codegen in the wasm64 target: table.get requires an i32 but in wasm64 the function pointer is an i64. We need an additional `i32.wrap_i64` to convert it. I also added `-verify-machineinstrs` to the tests so that the test suite validates this fix. Finally, I noticed that #150201 uses a feature of the intrinsic that is not covered by the tests, namely `ptr` arguments. So I added one additional test case to ensure that it works properly. cc @dschuff
2025-07-22[WebAssembly] Unstackify registers with no uses in ExplicitLocals (#149626)Heejin Ahn1-2/+10
There are cases where we end up removing some instructions that use stackified registers after RegStackify. For example,
```wasm
bb.0:
  %0 = ... ;; %0 is stackified
  br_if %bb.1, %0
bb.1:
```
In this code, br_if will be removed in CFGSort, so we should unstackify %0 so that it can be correctly dropped in ExplicitLocals.

Rather than handling this on a case-by-case basis, this PR just unstackifies all stackified registers with no uses in the beginning of ExplicitLocals, so that they can be correctly dropped.

Fixes #149097.
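The blanket approach can be pictured with a toy model. This is a sketch only: `ToyReg` and `unstackifyDeadRegs` are hypothetical stand-ins written for this illustration and do not correspond to LLVM types or to the code in ExplicitLocals.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a virtual register; not an LLVM type.
struct ToyReg {
  int NumUses;     // uses remaining after earlier passes deleted instructions
  bool Stackified; // set by a RegStackify-like step
};

// Clear the stackified flag on every register that has no uses left,
// so that a later step knows the value must be dropped explicitly.
// Returns how many registers were changed.
int unstackifyDeadRegs(std::vector<ToyReg> &Regs) {
  int Changed = 0;
  for (ToyReg &R : Regs) {
    if (R.Stackified && R.NumUses == 0) {
      R.Stackified = false;
      ++Changed;
    }
  }
  return Changed;
}
```

The point of the design is that the sweep does not need to know which earlier pass removed the user; any stackified register that ends up with zero uses is handled the same way.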
2025-07-22[WebAssembly] Fix warningsKazu Hirata2-1/+2
This patch fixes:

llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp:126:26: error: lambda capture 'DAG' is not used [-Werror,-Wunused-lambda-capture]

llvm/lib/Target/WebAssembly/WebAssemblyMCInstLower.cpp:239:28: error: unused variable 'Info' [-Werror,-Wunused-variable]
2025-07-22[WebAssembly,llvm] Add llvm.wasm.ref.test.func intrinsic (#147486)Hood Chatham4-1/+146
This adds an llvm intrinsic for WebAssembly to test the type of a function. It is intended for adding a future clang builtin `__builtin_wasm_test_function_pointer_signature` so we can test whether calling a function pointer will fail with function signature mismatch.

Since the type of a function pointer is just `ptr` we can't figure out the expected type from that. The way I figured out to encode the type was by passing 0's of the appropriate type to the intrinsic. The first value after the function pointer gives the expected return type and the later values give the expected types of the arguments. So
```llvm
@llvm.wasm.ref.test.func(ptr %func, float 0.000000e+00, double 0.000000e+00, i32 0)
```
tests if `%func` is of type `(double, i32) -> (float)`. It will lower to:
```wat
local.get $func
table.get $__indirect_function_table
ref.test (double, i32) -> (float)
```
To indicate the function should be void, I somewhat arbitrarily picked `token poison`, so the following tests for `(i32) -> ()`:
```llvm
@llvm.wasm.ref.test.func(ptr %func, token poison, i32 0)
```
To lower this intrinsic, we need some place to put the type information. With `encodeFunctionSignature()` we encode the signature information into an `APInt`. We decode it in `lowerEncodedFunctionSignature` in `WebAssemblyMCInstLower.cpp`.
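The operand convention can be sketched as a small decoder. This is an illustration only: `describeSignature` is a hypothetical helper, not part of LLVM, and it works on type names spelled as plain strings rather than on real IR values. The string `"token"` stands for the `token poison` marker.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not LLVM API. Takes the types of the operands that
// follow the function pointer and renders the signature they describe:
// the first entry is the return type, the remaining entries are parameters,
// and "token" in the first position means "no return value".
std::string describeSignature(const std::vector<std::string> &TypeOperands) {
  assert(!TypeOperands.empty() && "expected at least the return-type operand");
  const std::string &Ret = TypeOperands.front();
  std::string Params;
  for (std::size_t I = 1; I < TypeOperands.size(); ++I) {
    if (I > 1)
      Params += ", ";
    Params += TypeOperands[I];
  }
  std::string Result = (Ret == "token") ? "" : Ret;
  return "(" + Params + ") -> (" + Result + ")";
}
```

For example, `describeSignature({"token", "i32"})` yields `(i32) -> ()`, matching the void case described in the message.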