| Age | Commit message (Collapse) | Author | Files | Lines |
|
We already had a few MSVC STL formatters at the last release, but we
postponed the release note entry until they support PDB. The formatters
now support PDB.
There are still some left (see
https://github.com/llvm/llvm-project/issues/24834#issuecomment-3049291996),
but the most common types are formatted.
|
|
Add support for `__builtin_stack_address` builtin. The semantics match
those of GCC's builtin with the same name.
`__builtin_stack_address` returns the starting address of the stack
region that may be used by called functions. It may or may not include
the space used for on-stack arguments passed to a callee (See [GCC
Bug/121013](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121013)).
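A minimal sketch of calling the builtin, assuming a GCC- or Clang-style compiler. The helper name `current_stack_address` is hypothetical, and the `__builtin_frame_address(0)` fallback is only there so the snippet compiles on older compilers; it returns a different address.

```cpp
#include <cassert>
#include <cstdint>

#ifndef __has_builtin
#define __has_builtin(x) 0  // Older compilers: assume the builtin is absent.
#endif

// Hypothetical helper: returns an address inside the current stack region.
inline std::uintptr_t current_stack_address() {
#if __has_builtin(__builtin_stack_address)
  // Start of the stack region that may be used by called functions.
  return reinterpret_cast<std::uintptr_t>(__builtin_stack_address());
#else
  // Illustration-only fallback: the frame address is NOT the same value.
  return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
#endif
}
```

The exact value is target- and ABI-dependent, so code should treat it as an opaque marker rather than compare it against fixed offsets.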
Fixes #82632.
|
|
This patch adds initial support for the ARMv9.2+ Ampere1C core.
|
|
|
|
This adds code to recognize "wasm32-wasip1", "wasm32-wasip2", and
"wasm32-wasip3" as explicit targets, and adds a deprecation warning when
the "wasm32-wasi" target is used, pointing users to the "wasm32-wasip1"
target.
Fixes #165344.
I'm filing this as a draft PR for now, as I've only just now proposed to
make this change in #165344.
|
|
Add a new metadata node `!implicit.ref` to represent an implicit
dependency between two symbols. The metadata is unique to AIX and is
lowered to a relocation that explicitly links the section containing the
global that carries the metadata to the associated symbol. This
relocation keeps the associated symbol live as long as that section is
not garbage collected. This is used mainly
for compiler features where there is some hidden runtime dependency
between the symbols that isn't otherwise obvious to the linker.
|
|
|
|
They have been deprecated for more than five years in favor of !getdagop
and !setdagop. See https://reviews.llvm.org/D89814.
|
|
Note that bitcode does not attempt to guarantee performance
parity with upgraded bitcode.
|
|
Introduce a new flag `--call-graph-info` which outputs callgraph ELF
section information to the console as a text output or as JSON output.
|
|
This new function is the same as LLVMParseIRInContext except it doesn't
take ownership of the memory buffer. This fixes a wart that has been in
place since 5ebb7b311223bcd21d2b3d25413d1edacefcc63d changed the
underlying internal API to avoid taking ownership.
Reduce nesting in the implementation of LLVMParseIRInContext (now
LLVMParseIRInContext2) as well.
Update examples, OCaml bindings, and tests including plugging some
pre-existing memory leaks. OCaml bindings have renamed `parse_ir` to
`parse_ir_bitcode_or_assembly` to provoke compilation failures in
downstream code; this is intentional as this function now requires the
memory buffer to be disposed by the caller.
|
|
(#152632)
On PowerPC targets, `half` uses the default legalization of promoting to
a `f32`. However, this has some fundamental issues related to inability
to round trip. Resolve this by switching to the soft legalization, which
passes `f16` as an `i16`.
The PowerPC ABI Specification does not define a `_Float16` type, so the
calling convention changes are acceptable.
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97975
Fixes the PowerPC part of
https://github.com/llvm/llvm-project/issues/97981
|
|
`PromoteFloat` (#152833)
The default `half` legalization, which Wasm currently uses, does not
respect IEEE conventions: for example, casting to bits may invoke a lossy
libcall, meaning soft float operations cannot be correctly implemented.
Change to the soft promotion legalization which passes `f16` as an `i16`
and treats each `half` operation as an individual
f16->f32->libcall->f32->f16 sequence.
Of note in the test updates are that `from_bits` and `to_bits` are now
libcall-free, and that chained operations now round back to `f16` after
each step.
Fixes the wasm portion of
https://github.com/llvm/llvm-project/issues/97981
Fixes the wasm portion of
https://github.com/llvm/llvm-project/issues/97975
Fixes: https://github.com/llvm/llvm-project/issues/96437
Fixes: https://github.com/llvm/llvm-project/issues/96438
|
|
(#170796)
This PR implements the first change outlined in
https://discourse.llvm.org/t/rfc-allow-non-constant-offsets-in-llvm-vector-splice/88974?u=lukel
In order to allow non-immediate offsets in the llvm.vector.splice
intrinsic, we need to separate out the "shift left" and "shift right"
modes into two separate intrinsics, which were previously determined by
whether or not the offset is positive or negative.
The description in the LangRef has also been reworded in terms of
sliding elements left or right and extracting either the upper or lower
half as opposed to extracting from a certain index, which brings it
in line with the definition of `llvm.fshr.*`/`llvm.fshl.*`.
This patch teaches AutoUpgrade.cpp to upgrade the old intrinsics into
their new equivalent one based on their offset, so existing uses of
vector.splice should still work.
Uses of llvm.vector.splice in `llvm/test/CodeGen` haven't been replaced
in this PR to keep the diff small and kick the tyres on the AutoUpgrader
a bit. I planned to do this in a follow-up NFC but can include it in
this PR if reviewers prefer.
Similarly the shuffle costing kind `SK_Splice` has just been kept the
same for now, to be split into `SK_SpliceLeft` and `SK_SpliceRight`
later.
|
|
In line with a std proposal, introduce the llvm.clmul family of
intrinsics corresponding to carry-less multiply operations. This work
builds upon 727ee7e ([APInt] Introduce carry-less multiply primitives),
and follow-up patches will introduce custom-lowering on supported
targets, replacing target-specific clmul intrinsics.
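For readers unfamiliar with the operation, here is a reference model of a carry-less multiply on 32-bit values, keeping the low 32 bits. The function name `clmul32` is hypothetical, and this is a sketch of the arithmetic only, not the APInt primitive or any target lowering.

```cpp
#include <cstdint>

// Reference model: multiply as polynomials over GF(2), keep the low 32 bits.
constexpr std::uint32_t clmul32(std::uint32_t a, std::uint32_t b) {
  std::uint32_t result = 0;
  for (unsigned i = 0; i < 32; ++i)
    if ((b >> i) & 1u)
      result ^= a << i;  // XOR instead of add, so no carries propagate.
  return result;
}
```

For instance, `clmul32(3, 3)` is 5 rather than 9, because the two middle partial products cancel under XOR.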
Testing is done on the RISC-V target, which should be sufficient to
prove that the intrinsics work, since no RISC-V specific lowering has
been added.
Ref: https://isocpp.org/files/papers/P3642R3.html
Co-authored-by: Craig Topper <craig.topper@sifive.com>
|
|
Generated with the help of Gemini CLI, commands validated with local builds of LLVM and tcmalloc.
|
|
Start documenting the ABI conventions for dependency counters on
function call and return.
Stop pretending that SIInsertWaitcnts can handle anything other than the
default documented behavior.
|
|
(#171069)" (#174303)
This reverts commit 2c376ffeca490a5732e4fd6e98e5351fcf6d692a because it
breaks the assembler.
```
$ llvm-mc -triple=amdgcn -mcpu=gfx1250 -show-encoding <<< "v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] matrix_b_reuse"
v_wmma_i32_16x16x64_iu8 v[16:23], v[0:7], v[8:15], v[16:23] clamp ; encoding: [0x10,0x80,0x72,0xcc,0x00,0x11,0x42,0x1c]
```
We have a fundamental issue in the clamp support in VOP3P instructions,
which will need more changes.
|
|
vectorization (#172809)
The docs for reduction vectorization currently say that
> We support floating point reduction operations when -ffast-math is
used.
This is outdated, as there are now cases where floating-point reductions
are vectorized even without -ffast-math, through ordered reduction.
This PR updates the documentation for reduction vectorization, noting
that AArch64 and RISC-V default to ordered FP reductions being
permitted. Furthermore, an explanation of why the vectorization of FP
reduction is such a special case is added to the docs.
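A small sketch of why FP reductions are a special case: floating-point addition is not associative, so splitting a sum across lanes can change the result. Both function names are hypothetical, and the two-lane loop is only a model of reassociation, not what the vectorizer emits.

```cpp
#include <cassert>

// Ordered reduction: adds strictly left to right, as the scalar loop does.
inline double sumOrdered(const double *v, int n) {
  double acc = 0.0;
  for (int i = 0; i < n; ++i)
    acc += v[i];
  return acc;
}

// Reassociated reduction: two interleaved partial sums combined at the end.
inline double sumTwoLanes(const double *v, int n) {
  double lane0 = 0.0, lane1 = 0.0;
  for (int i = 0; i + 1 < n; i += 2) {
    lane0 += v[i];
    lane1 += v[i + 1];
  }
  if (n % 2)
    lane0 += v[n - 1];  // Odd tail element.
  return lane0 + lane1;
}
```

With the input `{1e20, 1.0, -1e20, 1.0}` and IEEE double arithmetic (no `-ffast-math`), the ordered sum is 1.0 while the two-lane sum is 2.0.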
---------
Co-authored-by: GYT <tiborgyri@gmail.com>
Co-authored-by: Florian Hahn <flo@fhahn.com>
|
|
This change adds NVVM intrinsics and NVPTX codegen for the
`tensormap.replace` PTX instructions.
Tests are added in `tensormap_replace.ll`,
`tensormap_replace_sm_100a.ll`,
and `tensormap_replace_sm_103a.ll` and tested through `ptxas-13.0`.
PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-tensormap-replace
|
|
This PR introduces a new BranchOnTwoConds VPInstruction, that takes 2
boolean operands and must be placed in a block with 3 successors.
If condition I is true, branches to successor I, otherwise falls through
to check the next condition. If both conditions are false, branch to the
third successor.
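The successor choice described above can be modelled as follows. The function name is hypothetical and this is only a sketch of the semantics, not the VPlan recipe itself.

```cpp
// Returns the index (0, 1, or 2) of the successor that is taken.
constexpr unsigned branchOnTwoConds(bool cond0, bool cond1) {
  if (cond0)
    return 0;  // First condition wins even if the second is also true.
  if (cond1)
    return 1;
  return 2;    // Both conditions false: third successor.
}
```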
This new branch recipe is used for early-exit loops, to simplify the
representation in VPlan initially, by avoiding the need for splitting the
middle block early on, in a way that preserves the single-exit block
property of regions. All exits still go through the latch block, but
they can go to more than 2 successors.
This idea was part of one of the original proposals for how to model
early exits in VPlan, but at that point in time, there was no good way
to handle this during code-gen, and we went with the early split-middle
block approach initially.
Now that we dissolve regions before ::execute, the new recipe can be
lowered nicely after regions have been removed, to a set of VPBBs and
BranchOnCond recipes. The initial lowering preserves the original
structure with the split middle blocks. Follow-ups will improve the
lowering to avoid this splitting, providing performance gains.
PR: https://github.com/llvm/llvm-project/pull/172750
|
|
Fixes #166989
- Adds a clamp immediate operand to the AMDGPU WMMA iu8 intrinsic and
threads it through LLVM IR, MIR lowering, Clang builtins/tests, and MLIR
ROCDL dialect so all layers agree on the new operand
- Updates AMDGPUWmmaIntrinsicModsAB so the clamp attribute is emitted,
teaches VOP3P encoding to accept the immediate, and adjusts Clang
codegen/builtin headers plus MLIR op definitions and tests to match
- Documents what the WMMA clamp operand does
- Implements bitcode AutoUpgrade for source compatibility on WMMA IU8
Intrinsic op
Possible future enhancements:
- infer clamping as an optimization fold based on the use context
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
(#173494)
The LangRef currently defines out-of-range stepvector values as poison.
This property is at odds with both the expansion used for fixed-length
vectors and the equivalent ISD node, both of which implicitly truncate
out-of-range values.
|
|
Version 0.13 of the Xqci Qualcomm uC Vendor Extension has been marked as
frozen. We've had assembler support for this since LLVM20 and code
generation support since LLVM21. I think we have enough coverage in the
code base to mark the extension as non-experimental.
|
|
Raw (as in ByteAddress) buffer accesses in DXIL must specify
ElementIndex as undef, and Structured buffer accesses must specify a
value. Ensure that we do this correctly in DXILResourceAccess, and
enforce that the operations are valid in DXILOpLowering.
Fixes #173316
|
|
This is the LLVM piece of this work. There is also a clang piece, which
adds this metadata to AllocaInst when the source does
`__attribute__((no_stack_protector))` on a variable.
We already have `__attribute__((no_stack_protector))` on functions, but
opting out the whole function might be too heavy a hammer. Instead this
allows us to opt out of stack protectors on specific allocations we
might have audited and know to be safe, but still allow the function to
generate a stack protector if other allocations necessitate it.
|
|
This patch adds three intrinsics and their corresponding Ops
representing the PTX special-register read instructions
that report various configurations of shared-memory sizes.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
|
|
This change adds full support for the ptx `barrier.cta.red` instruction,
following the same conventions as are already used for
`barrier.cta.sync` and `barrier.cta.arrive`.
In addition this MR removes the following intrinsics which are no longer
needed:
* llvm.nvvm.barrier0.popc -->
llvm.nvvm.barrier.cta.red.popc.aligned.all(0, c)
* llvm.nvvm.barrier0.and -->
llvm.nvvm.barrier.cta.red.and.aligned.all(0, z)
* llvm.nvvm.barrier0.or -->
llvm.nvvm.barrier.cta.red.or.aligned.all(0, z)
|
|
The pass now contains a non-fp expansion and should
be used for any similar expansions regardless of the
types involved. Hence a generic name seems apt.
Rename the source files, pass, and adjust the pass
description. Move all tests for the expansions
that have previously been merged into the pass
to a single directory.
|
|
Both passes expand instructions at the IR level.
They use the same kind of instruction visitation
logic and contain significant code duplication e.g.
for scalarization.
|
|
Those issues are supposed to help newcomers get familiar with the code
base and find some low-hanging fruit to work on. Using AI to fix them
makes no sense.
|
|
(#163252)
This PR implements the emitting of the post-link CFG information in PGO
analysis map, as explained in the
[RFC](https://discourse.llvm.org/t/rfc-extending-the-pgo-analysis-map-with-propeller-cfg-frequencies/88617).
This is enabled by a flag `pgo-analysis-map-emit-bb-sections-cfg`.
This PR bumps the SHT_LLVM_BB_ADDR_MAP version to 5.
Also includes some refactoring changes related to storing the CFG in the
Basic block sections profile reader.
|
|
Reland #165661 with fix for memory leak.
The call to `DummyTarget->createMCSubtargetInfo` within `mcpuHelp()`
returns a pointer that is not subsequently freed, leading to a memory
leak. Use `std::unique_ptr` to ensure the memory is released
automatically.
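A minimal sketch of the ownership pattern, using hypothetical stand-ins (`SubtargetInfoModel`, `createInfo`, `useInfo`) rather than the real `createMCSubtargetInfo` API:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an object returned by a raw-pointer factory.
struct SubtargetInfoModel {
  static int liveCount;  // Number of live instances, to make leaks visible.
  SubtargetInfoModel() { ++liveCount; }
  ~SubtargetInfoModel() { --liveCount; }
};
int SubtargetInfoModel::liveCount = 0;

// Hypothetical factory that transfers ownership to the caller.
inline SubtargetInfoModel *createInfo() { return new SubtargetInfoModel(); }

// Wrapping the result immediately frees it on every exit path.
inline int useInfo() {
  std::unique_ptr<SubtargetInfoModel> info(createInfo());
  return SubtargetInfoModel::liveCount;  // 1 while `info` is alive.
}
```

After `useInfo()` returns, `liveCount` is back to 0; holding the raw pointer without a matching `delete` would leave it at 1.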
Original description:
---
Currently --mcpu=help and --mattr=help only produce help output when
disassembling. This patch specialises these cases to always print the
requested help.
If --triple is specified, the help text will be derived from the
specified target. Otherwise, it will be derived from the target of the
first input file.
Fixes: https://github.com/llvm/llvm-project/issues/150567
|
|
Reverts llvm/llvm-project#165661
Break
https://lab.llvm.org/buildbot/#/builders/24/builds/15720
https://lab.llvm.org/buildbot/#/builders/55/builds/21517
|
|
This PR is the first step towards introducing LFI into LLVM as a new
sub-architecture backend of AArch64. For details, please see the
[RFC](https://discourse.llvm.org/t/rfc-lightweight-fault-isolation-lfi-efficient-native-code-sandboxing-upstream-lfi-target-and-compiler-changes/88380),
which has been approved for AArch64.
This patch creates the `aarch64_lfi` architecture, and marks the
appropriate registers as reserved when it is targeted (`x25`, `x26`,
`x27`, `x28`). It also adds a Clang driver toolchain for targeting LFI,
and updates the compiler-rt CMake to allow builds for the `aarch64_lfi`
target. The patch also includes documentation for LFI and the rewrites
that will be implemented in future patches.
I am planning to split the relevant modifications for LFI into a series
of patches, organized as described below (after this one). Please let me
know if you'd like me to split the changes in a different way, or
provide one big patch.
1. The next patch will introduce the `MCLFIExpander` mechanism for
applying the MC-level rewrites needed by LFI, along with the
`.lfi_expand` and `.lfi_no_expand` assembly directives when targeting
LFI. A preview can be seen on the `lfi-project`
[fork](https://github.com/llvm/llvm-project/compare/main...lfi-project:llvm-project:lfi-patchset/aarch64-pr-2).
2. The following patch will create an `MCLFIExpander` for the AArch64
backend that performs LFI expansions. This patch will contain the
majority of the LFI-specific logic.
3. The final patch will add an optimization to the rewriter that can
eliminate redundant guard instructions that occur within the same basic
block.
We plan to introduce x86-64 support after further discussion and once
the `MCLFIExpander` infrastructure is in place.
Please let me know your feedback, and thank you very much for your help
and guidance in the review process.
|
|
Currently `--mcpu=help` and `--mattr=help` only produce help output when
disassembling. This patch specialises these cases to always print the
requested help.
If `--triple` is specified, the help text will be derived from the
specified target. Otherwise, it will be derived from the target of the
first input file.
Fixes: #150567
---------
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
|
|
This patch adds initial support for the Arm v9.3 C1 processors:
* C1-Nano
* C1-Pro
* C1-Premium
* C1-Ultra
For more information on each, see:
https://developer.arm.com/Processors/C1-Nano
https://developer.arm.com/Processors/C1-Pro
https://developer.arm.com/Processors/C1-Premium
https://developer.arm.com/Processors/C1-Ultra
Technical Reference Manual for C1-Nano:
https://developer.arm.com/documentation/107753/latest/
Technical Reference Manual for C1-Pro:
https://developer.arm.com/documentation/107771/latest/
Technical Reference Manual for C1-Premium:
https://developer.arm.com/documentation/109416/latest/
Technical Reference Manual for C1-Ultra:
https://developer.arm.com/documentation/108014/latest/
|
|
(#170861)
For both the war/raw mask, `>=` was used where it should have been `>`.
This change matches the current implementation.
The examples added in this patch should help clarify why this change is
needed.
|
|
Added support for SPV_EXT_image_raw10_raw12 extension.
|
|
We have a practice of making workflows run whenever their workflow
definition is changed to make them easy to test. Document this as a best
practice to have something to point to during code review.
|
|
This PR adds support for selecting specific archive members in
llvm-symbolizer using the `archive.a(member.o)` syntax, with
architecture-aware member selection.
**Key features:**
1. **Archive member selection syntax**: Specify archive members using
`archive.a(member.o)` format
2. **Architecture selection via `--default-arch` flag**: Select the
appropriate member when multiple members have the same name but
different architectures
3. **Architecture selection via `:arch` suffix**: Alternative syntax
`archive.a(member.o):arch` for specifying architecture
This functionality is primarily designed for AIX big archives, which can
contain multiple members with the same name but different architectures
(32-bit and 64-bit). However, the implementation works with all archive
formats (GNU, BSD, Darwin, big archive) and handles same-named members
created with llvm-ar q.
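To illustrate the syntax, here is a sketch of how a string of the form `archive.a(member.o)` with an optional `:arch` suffix could be split. The struct and function names are hypothetical, this is not llvm-symbolizer's actual parser, and edge cases such as parentheses inside paths are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type for the illustration.
struct ArchiveMemberSpec {
  std::string archive, member, arch;
  bool valid = false;
};

// Splits "archive.a(member.o)" or "archive.a(member.o):arch".
inline ArchiveMemberSpec parseArchiveMemberSpec(const std::string &input) {
  ArchiveMemberSpec spec;
  const std::size_t open = input.find('(');
  if (open == std::string::npos || open == 0)
    return spec;  // No member part, or empty archive name.
  const std::size_t close = input.find(')', open + 1);
  if (close == std::string::npos || close == open + 1)
    return spec;  // Unterminated or empty member name.
  const std::string rest = input.substr(close + 1);
  if (!rest.empty() && (rest.size() < 2 || rest[0] != ':'))
    return spec;  // Trailing text must be ":arch" with a non-empty arch.
  spec.archive = input.substr(0, open);
  spec.member = input.substr(open + 1, close - open - 1);
  spec.arch = rest.empty() ? std::string() : rest.substr(1);
  spec.valid = true;
  return spec;
}
```

An empty `arch` here would correspond to falling back on `--default-arch` or the first matching member; that fallback is not modelled.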
---------
Co-authored-by: Midhunesh <midhuensh.p@ibm.com>
|
|
Build examples and example plug-ins by default when running tests. If
examples are unwanted, they can still be disabled completely using
LLVM_INCLUDE_EXAMPLES=OFF. Plugin tests depend on examples and it is
beneficial to test them by default. By default, Examples will still not
be included in the default target or be installed, this remains
controlled by LLVM_BUILD_EXAMPLES (which defaults to OFF).
The additional cost for building examples for tests is 17 compilation
units (12 C++, 5 C), which should be tolerable.
I don't know how broken the examples currently are in the various build
configurations, but if we find breakage, it would be good to fix it.
Pull Request: https://github.com/llvm/llvm-project/pull/171998
|
|
Introduced in Git 2.29 (Oct 2020).
|
|
This reverts commit 3847648e84d2ff5194f605a8a9a5c0a5e5174939.
Relands https://github.com/llvm/llvm-project/pull/158043 which got
auto-merged on a revision which wasn't approved.
The only addition to the approved version was that we adjust how we set
the time for failed tests. We used to just assign it the negative value
of the elapsed time. But if the test failed with `0` seconds (which some
of the new tests do), we would mark it `-0`. But the check for whether
something failed checks for `time < 0`. That messed with the new
`--filter-failed` option of this PR. This was only an issue on Windows
CI, but presumably can happen on any platform. Happy to do this in a
separate PR.
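The `-0` pitfall can be reproduced with any IEEE-754 double. The sketch below models it in C++ (lit itself is Python, whose floats behave the same way); the function names are hypothetical and only the old encoding is shown, not the adjusted one.

```cpp
#include <cassert>
#include <cmath>

// Old scheme from the description: store a failure as the negated elapsed time.
inline double encodeFailedTime(double elapsedSeconds) { return -elapsedSeconds; }

// The failure check from the description.
inline bool looksFailed(double storedTime) { return storedTime < 0; }
```

`encodeFailedTime(0.0)` yields `-0.0`, which has its sign bit set but compares equal to zero, so `looksFailed` reports false for a test that failed in zero seconds.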
---- Original PR
This patch adds a new --filter-failed option to llvm-lit, which when
set, will only run the tests that have previously failed.
|
|
This PR adds a platform for WebAssembly. Heavily inspired by Pavel's
QemuUser, the platform lets you configure a WebAssembly runtime to run a
Wasm binary.
For example, the following configuration can be used to launch binaries
under the WebAssembly Micro Runtime (WAMR):
```
settings set -- platform.plugin.wasm.runtime-args --heap-size=1048576
settings set -- platform.plugin.wasm.port-arg -g=127.0.0.1:
settings set -- platform.plugin.wasm.runtime-path /path/to/iwasm-2.4.0
```
With the settings above, you can now launch a binary directly under
WAMR:
```
❯ lldb simple.wasm
(lldb) target create "/Users/jonas/wasm-micro-runtime/product-mini/platforms/darwin/build/simple.wasm"
Current executable set to '/Users/jonas/wasm-micro-runtime/product-mini/platforms/darwin/build/simple.wasm' (wasm32).
(lldb) b main
Breakpoint 1: 2 locations.
(lldb) r
Process 1 launched: '/Users/jonas/wasm-micro-runtime/product-mini/platforms/darwin/build/simple.wasm' (wasm32)
2 locations added to breakpoint 1
[22:28:05:124 - 16FE27000]: control thread of debug object 0x1005e9020 start
[22:28:05:124 - 16FE27000]: Debug server listening on 127.0.0.1:49170
the module name is /Users/jonas/wasm-micro-runtime/product-mini/platforms/darwin/build/simple.wasm
Process 1 stopped
* thread #1, name = 'nobody', stop reason = breakpoint 1.3
frame #0: 0x40000000000001d3 simple.wasm`main at simple.c:8:7
5 }
6
7 int main() {
-> 8 int i = 1;
9 int j = 2;
10 return add(i, j);
11 }
(lldb)
```
|
|
SwitchInst case values must be ConstantInt, which have no use list.
Therefore it is not necessary to store these as Use, instead store them
more efficiently as a simple array of pointers after the uses, similar
to how PHINode stores basic blocks.
After this change, the successors of all terminators are stored
consecutively in the operand list. This is preparatory work for
improving the performance of successor access.
Add new C API functions so that switch case values remain accessible
from bindings for other languages.
While this could also be achieved by merely changing the order of
operands (i.e., first all successors, then all constants), doing so
would increase the asymptotic runtime of addCase from O(1) to O(n)
(i.e., adding n cases would be O(n^2)), because it would need to shift
all constants by one slot. Having null/invalid operands is also a bad
idea and would cause much more breakage.
Pull Request: https://github.com/llvm/llvm-project/pull/170984
|
|
Add documentation for variadic `isa<>` in the LLVM Programmer's Manual.
|
|
Add support for specifying the names of address spaces when specifying
pointer properties for an address space. Update LLVM's AsmPrinter and
LLParser to print and read these symbolic address space names.
|
|
The NaN section says
> Floating-point math operations are allowed to treat all NaNs as if
they were quiet NaNs. For example, “pow(1.0, SNaN)” may be simplified to
1.0.
This seems worth also spelling out in the section for `pow` (and `powi`
which has a similar situation), to have everything concerning those
operations in a single place.
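For comparison, the C library's `pow` already returns 1.0 for a base of 1.0 and a quiet NaN exponent on hosts that follow C Annex F. The wrapper name below is hypothetical, and the sketch says nothing about signaling NaNs, which is the case the LangRef wording is about.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Thin wrapper so the call is easy to exercise with different exponents.
inline double powOfOne(double exponent) { return std::pow(1.0, exponent); }
```

Passing `std::numeric_limits<double>::quiet_NaN()` gives 1.0 on such hosts, even though most operations propagate NaN.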
|
|
Reverts llvm/llvm-project#158043
This was approved for earlier revisions but the tests were failing on
Windows. I pushed a speculative fix and that fixed the CI, which caused
auto-merge to merge the PR. But I'd like to have approval for the latest
revision. So reverting for now and resubmitting a new PR.
|