This patch enables support for debug entry values, improving the quality
of debug info for RISC-V.
|
|
So that it runs before `MachineCSE` and other passes.
Fixes https://github.com/llvm/llvm-project/issues/158063.
|
|
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32", "n64", etc. is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
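As a rough illustration of the approach, here is a minimal sketch of what a TargetParser-side helper could look like, computing the layout from only the triple and ABI name. The function name and placement are hypothetical; the RISC-V strings are the standard ones:
```
#include "llvm/ADT/StringRef.h"
#include "llvm/TargetParser/Triple.h"
#include <string>

// Hypothetical helper: derives the data layout string from the triple and
// ABI name alone, with no dependency on CodeGen.
std::string computeDataLayout(const llvm::Triple &TT, llvm::StringRef ABIName) {
  switch (TT.getArch()) {
  case llvm::Triple::riscv32:
    return "e-m:e-p:32:32-i64:64-n32-S128";
  case llvm::Triple::riscv64:
    return "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128";
  default:
    // Other targets consult ABIName here, e.g. the invented "shortptr"
    // ABI name that threads the NVPTX cl::opt through this interface.
    return "";
  }
}
```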
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
(#156798)"
With a dependency on the Passes library added this time.
|
|
(#156798)"
This reverts commit c51db9f6f3653859a05a073dab53274510edacff.
Getting build bot failures about undefined symbol: llvm::OptimizationLevel::O0.
|
|
As noted in [156787](https://github.com/llvm/llvm-project/issues/156787)
|
|
This patch adds basic assembler and MC layer infrastructure for
RISC-V big-endian targets (riscv32be/riscv64be):
- Register big-endian targets in RISCVTargetMachine
- Add big-endian data layout strings
- Implement endianness-aware fixup application in the assembler backend
  (see the sketch below)
- Add byte swapping for data fixups on BE cores
- Update MC layer components (AsmInfo, MCTargetDesc, Disassembler, AsmParser)
This provides the foundation for BE support but does not yet include:
- Codegen patterns for BE
- Load/store instruction handling
- BE-specific subtarget features
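As a sketch of the endianness-aware fixup application mentioned above (the OR-in-the-value scheme is the usual one; the names are illustrative, not the exact backend code):
```
#include "llvm/Support/Endian.h"
#include <cstdint>

// Illustrative sketch: OR the resolved fixup value into the fragment bytes
// in target byte order, so big-endian cores see the bytes swapped.
void applyFixupBytes(uint8_t *Data, unsigned NumBytes, uint64_t Value,
                     llvm::endianness Endian) {
  for (unsigned I = 0; I != NumBytes; ++I) {
    unsigned B = (Endian == llvm::endianness::little) ? I : NumBytes - 1 - I;
    Data[I] |= uint8_t((Value >> (B * 8)) & 0xff);
  }
}
```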
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
Some processors benefit more from store clustering than load clustering,
and vice-versa, depending on factors that are exclusive to each one
(e.g. macrofusions implemented).
Likewise, certain optimizations benefit more from misched clustering
than postRA clustering. Macrofusions are again an example: in a
processor with store pair macrofusions, like the veyron-v1, it is
observed that misched clustering increases the amount of macrofusions
more than postRA clustering. This of course isn't necessarily true for
other processors, but it shows that processors can benefit from a more
fine grained control of clustering mutations, and each one is able to do
it differently.
Add 4 new subtarget features that deprecate the existing
riscv-misched-load-store-clustering and
riscv-postmisched-load-store-clustering options:
- disable-misched-load-clustering and disable-misched-store-clustering:
disable load/store clustering during misched;
- disable-postmisched-load-clustering and
disable-postmisched-store-clustering:
disable load/store clustering during PostRA.
Note that the new subtarget features disable specific stages of the
default clustering settings. The default per se (load and store
clustering for both misched and PostRA) is left untouched.
Disable all clustering but misched-store-clustering for the veyron-v1
processor using the new features.
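A hedged sketch of how the features might gate the mutations in the misched hook; the feature getters are hypothetical stand-ins for the new subtarget features, while the mutation factories are the standard ones:
```
ScheduleDAGInstrs *
RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
  ScheduleDAGMILive *DAG = createSchedLive(C);
  const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
  // Hypothetical getters mirroring the new subtarget features.
  if (!ST.hasDisableMISchedLoadClustering())
    DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
  if (!ST.hasDisableMISchedStoreClustering())
    DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
  return DAG;
}
```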
|
|
With the VPlan-based canonical induction variable replacement landed in
#147222, it appears to cover the functionality previously provided by
the EVLIndVarSimplify pass introduced in #131005.
This patch suggests removing EVLIndVarSimplify from the RISC-V pipeline
as a follow-up step. Feedback is very welcome!
|
|
The RISCVVLOptimizer has been enabled by default for a while now and I'm
not aware of any outstanding issues that might need it to be disabled.
This removes the -riscv-enable-vl-optimizer flag to reduce the number of
configurations we have to support.
|
|
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A subset of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementations is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for these functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.
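For illustration, the pattern looks roughly like this (a sketch; `LLVM_ABI` comes from `llvm/Support/Compiler.h` and the body is elided):
```
#include "llvm/Support/Compiler.h"

// Annotate the definition directly, since this file does not include
// TargetSelect.h, where the annotated declaration lives.
extern "C" LLVM_ABI void LLVMInitializeRISCVTarget() {
  // ... register the riscv32/riscv64 targets ...
}
```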
In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
|
|
The `RISCVIndirectBranchTracking` pass inserts the `lpad` instruction
and can change basic block alignment, so it must not run after branch
relaxation, as the adjusted offsets could then exceed the branch range.
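A sketch of the intended ordering in the RISC-V pass config (the exact hook and insertion point are illustrative):
```
void RISCVPassConfig::addPreEmitPass() {
  // Inserts lpad and may change basic block alignment, so it has to run
  // before branch relaxation computes the final offsets.
  addPass(createRISCVIndirectBranchTrackingPass());
  addPass(&BranchRelaxationPassID);
}
```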
|
|
If `createMachineScheduler`/`createPostMachineScheduler` return a
`nullptr`, then we will call `createSchedLive`/`createSchedPostRA`
anyway.
We can always create the scheduler first and simplify the following
conditions.
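In other words, the nullptr fallback can be folded away; a sketch of the shape of the change (signatures abbreviated):
```
// Before (sketched): every caller duplicated the fallback.
//   ScheduleDAGInstrs *DAG = createMachineScheduler(C);
//   if (!DAG)
//     DAG = createSchedLive(C);
//
// After (sketched): the default scheduler is created unconditionally first,
// and the conditions that followed become simpler.
ScheduleDAGInstrs *DAG = createSchedLive(C);
```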
|
|
We rename `createGenericSchedLive` and `createGenericSchedPostRA`
to `createSchedLive` and `createSchedPostRA`, and add a template
parameter `Strategy` which is the generic implementation by default.
This can simplify some code for targets that have a custom scheduler
strategy.
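A minimal sketch of the renamed helper's shape (simplified; the real factory does more setup):
```
#include <memory>

template <typename Strategy = GenericScheduler>
ScheduleDAGMILive *createSchedLive(MachineSchedContext *C) {
  // The strategy is a template parameter defaulting to the generic
  // implementation; a target with a custom strategy just writes
  // createSchedLive<MyTargetStrategy>(C).
  return new ScheduleDAGMILive(C, std::make_unique<Strategy>(C));
}
```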
|
|
Fix the RISC-V Indirect Branch Tracking pass not showing up in the pass
debug output because it was not initialized properly.
|
|
(#131005)
When we enable EVL-based loop vectorization w/ predicated tail-folding,
each vectorized loop has effectively two induction variables: one
calculates the step using (VF x vscale) and the other one increases the
IV by values returned from experimental.get.vector.length. The former,
also known as canonical IV, is more favorable for analyses as it's
"countable" in the sense of SCEV; the latter (EVL-based IV), however, is
more favorable to codegen, at least for those that support scalable
vectors like AArch64 SVE and RISC-V.
The idea is that we use the canonical IV all the way until the end of
all vectorizers, where we replace it with the EVL-based IV using the
EVLIVSimplify pass introduced here, so that we can have the best of both
worlds. This pass is enabled by default for RISC-V. However, since we
don't vectorize loops with predicated tail-folding by default yet, this
pass is currently a no-op.
|
|
Register assembly printer passes in the pass registry.
This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.
Adds a `char &ID` parameter to the `AsmPrinter` constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.
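A sketch of the resulting pattern in a target, using the RISC-V printer for concreteness (details simplified):
```
class RISCVAsmPrinter : public AsmPrinter {
public:
  static char ID; // enables INITIALIZE_PASS registration

  RISCVAsmPrinter(TargetMachine &TM, std::unique_ptr<MCStreamer> Streamer)
      : AsmPrinter(TM, std::move(Streamer), ID) {} // forward the pass ID
};

char RISCVAsmPrinter::ID = 0;

INITIALIZE_PASS(RISCVAsmPrinter, "riscv-asm-printer",
                "RISC-V Assembly Printer", false, false)
```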
|
|
Replace "concept based polymorphism" with simpler PImpl idiom.
This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it shows two fewer :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.
The first commit also makes `TargetTransformInfoImplBase` polymorphic,
which means all derived classes should `override` its methods. This is
done in the second commit to keep the first one smaller. It appeared
infeasible to extract this into a separate PR because the first commit,
if landed separately, would result in tons of `-Woverloaded-virtual`
warnings (and break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformInfoImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
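Roughly, the hierarchy change looks like this; a heavily abbreviated sketch, not the exact classes or signatures:
```
// Before: "concept based polymorphism" - Model<T> wrapped each target, and
// target methods hid (rather than overrode) the base implementations.
//   struct Concept { virtual InstructionCost getInstructionCost(...) = 0; };
//   template <typename T> struct Model : Concept { T Impl; /* forwards */ };

// After: one virtual base; overrides are checked by the compiler.
class TargetTransformInfoImplBase {
public:
  virtual ~TargetTransformInfoImplBase() = default;
  virtual InstructionCost getInstructionCost(const User *U) const;
};

// Intermediate CRTP layers elided; the point is that a mismatched signature
// now fails to compile instead of silently hiding the base method.
class RISCVTTIImpl : public TargetTransformInfoImplBase {
public:
  InstructionCost getInstructionCost(const User *U) const override;
};
```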
|
|
- Move calls to pass initialization functions to RISCV target
initialization and remove them from pass constructors.
|
|
This patch is an alternative to PRs #117060, #131684, #131728.
The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.
Adding Michael as a co-author as most of the code that evaluates the
condition comes from #131684.
Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
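The core evaluation is small; a hedged sketch using the RISCVCC condition codes (the helper's shape is illustrative):
```
#include <cstdint>

// Evaluate a conditional branch whose operands are known constants.
bool evaluateCondBranch(unsigned CC, int64_t LHS, int64_t RHS) {
  switch (CC) {
  case RISCVCC::COND_EQ:  return LHS == RHS;
  case RISCVCC::COND_NE:  return LHS != RHS;
  case RISCVCC::COND_LT:  return LHS < RHS;
  case RISCVCC::COND_GE:  return LHS >= RHS;
  case RISCVCC::COND_LTU: return uint64_t(LHS) < uint64_t(RHS);
  case RISCVCC::COND_GEU: return uint64_t(LHS) >= uint64_t(RHS);
  default:
    llvm_unreachable("unexpected condition code");
  }
}
// If it returns true, the branch is rewritten to an unconditional one;
// otherwise the branch can be deleted and the fallthrough kept.
```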
|
|
Introduce the RISCVLoadStoreOptimizer MIR pass that will do the
optimization. The load/store pairing pass identifies adjacent load/store
instructions operating on consecutive memory locations and merges them
into a single paired instruction.
This is part of the MIPS extensions for the p8700 CPU.
Production of ldp/sdp instructions is OFF by default, since it is only
beneficial for -Os in the case of the p8700 CPU.
|
|
This is the follow up to #125026 that keeps mask operands in virtual
register form for as long as possible throughout the backend.
The diffs in this patch are from MachineCSE/MachineSink/RISCVVLOptimizer
kicking in.
The invariant that the mask COPY never has a subreg no longer holds
after MachineCSE (it coalesces some copies), so it needed to be relaxed.
|
|
load/store address. (#127151)"
Tests have been re-generated with recent scheduler changes.
Original message:
SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.
This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
|
|
load/store address. (#127151)"
This reverts commit c3ebbfd7368ec3e4737427eef602296a868a4ecd.
Seeing some test failures on the build bot.
|
|
address. (#127151)
SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.
This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
|
|
(#125026)
This is another attempt at #88496 to keep mask operands in SSA after
instruction selection.
Previously we selected the mask operands into vmv0, a singleton register
class with exactly one register, V0.
But the register allocator doesn't really support singleton register
classes and we ran into errors like "ran out of registers during
register allocation in function".
We avoid this by introducing a pass just before register allocation
that converts any use of vmv0 to a copy to $v0, i.e. what isel
does today.
That way the register allocator doesn't need to deal with the singleton
register class, but we get the benefits of having the mask registers in
SSA throughout the backend:
- This allows RISCVVLOptimizer to reduce the VLs of instructions that
define mask registers
- It enables CSE and code sinking in more places
- It removes the need to peek through mask copies in RISCVISelDAGToDAG
and keep track of V0 defs in RISCVVectorPeephole
This patch initially eliminates uses of vmv0s after RISCVVectorPeephole
to keep the diff to a minimum, and a follow up patch will move it past
the other MachineInstr SSA passes.
Note that it doesn't try to remove any defs of vmv0 as we shouldn't have
any instructions that have any vmv0 outputs.
As a further follow up, we can move the elimination pass to after phi
elimination and outside of SSA, which would unblock the pre-RA scheduler
around masked pseudos. This might also help the issue that
RISCVVectorMaskDAGMutation tries to solve.
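A sketch of the elimination step itself (simplified; the real pass also deals with subreg and placement details):
```
for (MachineBasicBlock &MBB : MF) {
  for (MachineInstr &MI : MBB) {
    for (MachineOperand &MO : MI.uses()) {
      if (!MO.isReg() || !MO.getReg().isVirtual() ||
          MRI.getRegClass(MO.getReg()) != &RISCV::VMV0RegClass)
        continue;
      // Reintroduce the copy to the physical $v0 that isel used to emit.
      BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(TargetOpcode::COPY),
              RISCV::V0)
          .addReg(MO.getReg());
      MO.setReg(RISCV::V0);
    }
  }
}
```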
|
|
Found using https://github.com/codespell-project/codespell
```
codespell RISCV --write-changes \
--ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru
```
|
|
The createMachineScheduler & createPostMachineScheduler
target hooks are currently placed in the PassConfig interface.
Moving them out to TargetMachine so that both the legacy and
the new pass manager can effectively use them.
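A sketch of the shape after the move (abbreviated; default implementations return nullptr so targets that don't customize scheduling need not care):
```
class TargetMachine {
public:
  // Both pass managers can query these without going through PassConfig.
  virtual ScheduleDAGInstrs *
  createMachineScheduler(MachineSchedContext *C) const { return nullptr; }
  virtual ScheduleDAGInstrs *
  createPostMachineScheduler(MachineSchedContext *C) const { return nullptr; }
};
```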
|
|
Adding two extensions for the MIPS p8700 CPU:
1. cmove (conditional move)
2. lsp (load/store pair)
The official product page is here:
https://mips.com/products/hardware/p8700
|
|
This patch adds basic support for `MachinePipeliner` and disables
it by default.
The functionality should be OK and all llvm-test-suite tests have
passed.
|
|
Now that we have testing of all instructions in the isSupportedInstr
switch, and better coverage of getOperandInfo, I think it is a good time
to enable this by default.
|
|
This follows up #115495 by enabling merging of external globals by
default, which had been left as a next step in order to make the
previous change more incremental and so we can more easily narrow down
on any identified regressions.
Enabling merging of external globals matches what Arm does (for
non-Mach-O targets), though AArch64 doesn't as there were [some
concerns](https://reviews.llvm.org/D61947) it might cause regressions in
some cases.
See https://github.com/llvm/llvm-project/pull/117880 for benchmark figures and discussion.
|
|
Enable `-fstack-clash-protection` for RISC-V and stack probing for
function prologues.
We probe the stack by creating a loop that allocates and probes the
stack in ProbeSize chunks.
We emit an unrolled probe loop for small allocations and a
variable-length probe loop for bigger ones.
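A hedged sketch of the emission strategy; all helper names here are hypothetical:
```
#include <cstdint>

void allocateAndProbeStack(uint64_t Size, uint64_t ProbeSize) {
  const uint64_t UnrollThreshold = 3; // illustrative cutoff
  if (Size / ProbeSize <= UnrollThreshold) {
    // Small allocation: emit the chunks as a straight-line (unrolled) loop.
    for (uint64_t Done = 0; Done + ProbeSize <= Size; Done += ProbeSize) {
      emitSPAdjustment(-int64_t(ProbeSize)); // addi sp, sp, -ProbeSize
      emitStackProbe();                      // e.g. sd zero, 0(sp)
    }
  } else {
    // Bigger allocation: emit a variable-length probe loop instead.
    emitProbeLoop(Size, ProbeSize);
  }
  // The residue (Size % ProbeSize) is allocated without a probe afterwards.
}
```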
|
|
Here we add a scheduling mutation in pre-RA scheduling, which adds
an artificial dependency edge between a mask producer and its
nearest previous instruction that uses the V0 register.
This prevents the live intervals of mask registers from overlapping
and, as a consequence, we can reduce some spills/moves.
From the test changes, we can see some improvements and also some
regressions (more vtype toggles).
Partially fixes #113489.
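A rough sketch of the mutation (the two predicate helpers are hypothetical):
```
struct VectorMaskDAGMutation : ScheduleDAGMutation {
  void apply(ScheduleDAGInstrs *DAG) override {
    SUnit *NearestV0User = nullptr;
    for (SUnit &SU : DAG->SUnits) {
      if (instrUsesV0(SU)) // hypothetical helper
        NearestV0User = &SU;
      else if (NearestV0User && instrDefinesMask(SU)) // hypothetical helper
        // Keep the mask producer close to the previous V0 user so the live
        // intervals of mask registers do not overlap.
        DAG->addEdge(&SU, SDep(NearestV0User, SDep::Artificial));
    }
  }
};
```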
|
|
From the discussion at the round-table at the RISC-V Summit it was clear
people see cases where global merging would help. So the direction of
enabling it by default and iteratively working to enable it in more
cases or to improve the heuristics seems sensible. This patch tries to
make a minimal step in that direction.
|
|
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasize its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
|
|
Identified with misc-include-cleaner.
|
|
by default (#115484)
AArch64 left this disabled after seeing some cases of slightly worse
codegen that weren't tracked down, so I suggest as a path to
incrementally moving towards enable globals merging we follow suit, and
evaluate turning on later.
This patch disables merging of external globals, but also adds a flag to
override that. This reduces churn in test cases, simplifies benchmarking
runs, and this flag can be removed later.
A follow-on PR enables the globals merging pass by default (and as it's
based on this commit, merging of external globals is disabled just as
they are for AArch64).
|
|
#73789 added load clustering and #73796 tried to add store clustering.
If the post machine scheduler is used, load/store clusters formed during
machine scheduling may be broken. To solve this, add load/store
clustering to the post machine scheduler.
|
|
This patch adds CFI instructions in the function epilogue.
Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret
After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret
This functionality is already present in `riscv-gcc`, but it’s not in
`clang` and this slightly impairs the `lldb` debugging experience, e.g.
backtrace.
|
|
Now that LLVM 19.1.0 has been out for a while with post-vector-RA
vsetvli insertion enabled by default, this proposes to remove the flag
that restores the old pre-RA behaviour so we only have one configuration
going forward.
That flag was mainly meant as a fallback in case users ran into issues,
but I haven't seen anything reported so far.
|
|
If a pseudo has a passthru, I believe the first source operand will be
operand number 2, not 1.
|
|
Builds on #73789, enabling store clustering by default using the same
heuristic.
|
|
The purpose of this optimization is to make the VL argument, for
instructions that have a VL argument, as small as possible. This is
implemented by visiting each instruction in reverse order and checking
that if it has a VL argument, whether the VL can be reduced.
By putting this pass before VSETVLI insertion, we see three kinds of
changes to generated code:
1. Eliminate VSETVLI instructions
2. Reduce the VL toggle on VSETVLI instructions that also change vtype
3. Reduce the VL set by a VSETVLI instruction
The list of supported instructions is currently whitelisted for safety.
In the future, we could add more instructions to `isSupportedInstr` to
support even more VL optimization.
We originally wrote this pass because vector GEP instructions do not
take a VL, which leads us to emit code that uses VL=VLMAX to implement
GEP in the RISC-V backend. As a result, some of the vector instructions
will write to lanes, specifically between the intended VL and VLMAX,
that will never be read. As an alternative to this pass, we considered
adding a vector predicated GEP instruction, but this would not fit well
into the intrinsic type system since GEP has a variable number of
arguments, each with arbitrary types. The second approach we considered
was to put this pass after VSETVLI insertion, but we found that it was
more difficult to recognize optimization opportunities, especially
across basic block boundaries -- the data flow analysis was also a bit
more expensive and complex.
While this pass solves the GEP problem, we have expanded it to handle
more cases of VL optimization, and there is opportunity for the analysis
to be improved to enable even more optimization. We have a few follow up
patches to post, but figured this would be a good start.
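A sketch of the traversal at the heart of the pass (helper names illustrative):
```
for (MachineBasicBlock &MBB : MF) {
  // Visit in reverse so a user's reduced VL is known before its producers.
  for (MachineInstr &MI : llvm::reverse(MBB)) {
    if (!isSupportedInstr(MI))
      continue;
    // Hypothetical helper: the largest VL any user actually reads.
    if (std::optional<uint64_t> DemandedVL = computeDemandedVL(MI))
      reduceVLOperand(MI, *DemandedVL); // may let a vsetvli fold away later
  }
}
```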
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
|
|
(#110755)
|
|
We believe this is neutral or slightly better in the majority of cases.
|
|
This reverts commit 64972834c193632cbc47e54c0f0c721636b077e6.
Based on the discussions in #108991 that happened post merge, we have decided
to remove this pass in favor of generating `RISCV::G_*` opcodes in the legalizer.
We may reconsider moving that code elsewhere in the future so that we can do
a better job during generic combines. We don't feel that doing it in instruction
selection is the right decision today. Firstly, it requires us to manually
do regbankselect on the newly introduced instructions. Secondly, it is more
difficult to test since the test output will contain whatever the `RISCV::G_*`
instructions select to (instead of the `RISCV::G_*` instructions themselves).
My personal opinion is that the legalizer pass can be split into an early
legalizer and a late legalizer, both before regbankselect. The first legalizer
would not introduce target specific generic opcodes and the generic combiner
would run after it. The second legalizer would introduce the target specific
generic opcodes. I think this approach is better than the lowerer because the
legalizer guarantees that whatever we lower to is legal, and apparently because
it is more performant compared to the lowerer (although, I'm not sure how
true this is).
|
|
(#101023)
A recent atomics ABI change / fix requires that for the "A6C" and "A6S"
atomics ABIs (i.e. both of those supported by LLVM currently), an
additional fence is inserted for an atomic_compare_exchange with seq_cst
failure ordering.
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445>
This isn't trivial to support through the hooks used by AtomicExpandPass
because that pass assumes that when fences are inserted, the original
atomics ordering information can be removed from the instruction. Rather
than try to change and complicate that API, this patch implements the
needed fence insertion through a small special purpose pass.
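A sketch of the pass's core check (hedged: the fence is shown leading here, while the precise placement follows the psABI rule):
```
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"

bool insertCmpXchgFences(llvm::Function &F) {
  using namespace llvm;
  bool Changed = false;
  for (Instruction &I : instructions(F)) {
    auto *CXI = dyn_cast<AtomicCmpXchgInst>(&I);
    if (!CXI ||
        CXI->getFailureOrdering() != AtomicOrdering::SequentiallyConsistent)
      continue;
    // Additional fence required by the A6C/A6S ABI fix; the original
    // ordering on the cmpxchg itself is left untouched.
    new FenceInst(F.getContext(), AtomicOrdering::SequentiallyConsistent,
                  SyncScope::System, /*InsertBefore=*/CXI);
    Changed = true;
  }
  return Changed;
}
```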
|
|
This is mostly a copy of the AArch64PostLegalizerLoweringPass, except it
removes all of the AArch64 combines.
This pass allows us to lower instructions after the generic
post-legalization combiner has had a chance to run.
We will be adding combines to this pass in future patches.
|