Age | Commit message | Author | Files | Lines |
|
This patch fixes:
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp:579:11: error: unused
variable 'Subtarget' [-Werror,-Wunused-variable]
|
|
Reverted the https://github.com/llvm/llvm-project/pull/148779 changes
and
- handled the uimm9 offset in eliminateFrameIndex ()
- updated the testcase.
|
|
This adds the simplest implementation of `PreserveMost` calling
convention and we preserve `x5-x31` (except x6/x7/x28) registers.
Fixes #148147.
|
|
As of 20b5728b7b1ccc4509a316efb270d46cc9526d69, C always enables Zca, so
the check `C || Zca` is equivalent to just checking for `Zca`.
This replaces any uses of `HasStdExtCOrZca` with a new `HasStdExtZca`
(with the same assembler description, to avoid changes in error
messages), and simplifies everywhere where C++ needed to check for
either C or Zca.
The Subtarget function is just deprecated for the moment.
|
|
Corresponding gcc bug report
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110665
The generated code is pretty awful.
|
|
The QC_E_ADDI instruction from the Xqcilia extension takes a signed
26-bit immediate and can be used instead of splitting the offset across
two ADDIs while eliminating the frame index.
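The choice described above can be sketched as a small stand-alone decision function. This is only an illustration under simplifying assumptions: `AddStrategy`, `pickAddStrategy`, and `splitForTwoAddis` are hypothetical names, not LLVM APIs, and the real frame index elimination applies further constraints (such as stack alignment) that are ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// True if Value fits in a signed N-bit two's complement immediate.
template <unsigned N> constexpr bool fitsSigned(int64_t Value) {
  static_assert(N > 0 && N < 64, "unsupported immediate width");
  return Value >= -(int64_t(1) << (N - 1)) && Value < (int64_t(1) << (N - 1));
}

// Hypothetical classification of how an offset could be added to a register.
enum class AddStrategy { SingleAddi, SingleQcEAddi, TwoAddis, Materialize };

AddStrategy pickAddStrategy(int64_t Offset, bool HasXqcilia) {
  if (fitsSigned<12>(Offset))
    return AddStrategy::SingleAddi;     // plain ADDI, simm12
  if (HasXqcilia && fitsSigned<26>(Offset))
    return AddStrategy::SingleQcEAddi;  // one QC_E_ADDI, simm26
  // Two simm12 immediates can sum to anything in [-4096, 4094].
  if (Offset >= -4096 && Offset <= 4094)
    return AddStrategy::TwoAddis;
  return AddStrategy::Materialize;      // build the constant in a register
}

// Splits an offset into two simm12 parts for the TwoAddis case.
std::pair<int64_t, int64_t> splitForTwoAddis(int64_t Offset) {
  assert(Offset >= -4096 && Offset <= 4094 && "not reachable with two ADDIs");
  int64_t First = Offset < 0 ? -2048 : 2047;
  return {First, Offset - First};
}
```

With Xqcilia available, an offset such as 3000 needs one instruction instead of the 2047 + 953 pair.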
|
|
Closes #130217.
https://github.com/riscv/riscv-isa-manual/blob/main/src/q-st-ext.adoc
|
|
(#139213)
|
|
This is a follow up to #133171. I realized we could assume the structure
of the previous MMO, and thus the split is much simpler than I'd
initially pictured.
|
|
The primary effect of this is that we get proper scalable sizes printed
by the assembler, but this may also enable proper aliasing analysis. I
don't see any test changes resulting from the latter.
Getting the size is slightly tricky as we store the scalable size as a
non-scalable quantity in the object size field for the frame index. We
really should remove that hack at some point...
For the synthetic tuple spills and fills, I dropped the size from the
split loads and stores to avoid incorrect (overly large) sizes. We could
also divide by the NF factor if we felt like writing the code to do so.
|
|
|
|
Currently, we use `csrr` with `vlenb` to obtain `VLENB` (the VLEN in
bytes), but this is not the only option. We can also use `vsetvli` with
`e8`/`m1` to get `VLMAX`, which is equal to `VLENB`. This method is
preferable on some microarchitectures and makes it easier to obtain
values like `VLENB * 2`, `VLENB * 4`, or `VLENB * 8`, reducing the number
of instructions needed to calculate VLENB multiples.
However, this approach is *NOT* always interchangeable, as it changes
the state of `VTYPE` and `VL`, which can alter the behavior of vector
instructions, potentially causing incorrect code generation if applied
after a vsetvli insertion. Therefore, we limit its use to the
prologue/epilogue for now, as there are no vector operations within the
prologue/epilogue sequence.
With further analysis, we may extend this approach beyond the
prologue/epilogue in the future, but starting here should be a good
first step.
This feature is guarded by the `+prefer-vsetvli-over-read-vlenb` feature,
which is disabled by default for now.
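The arithmetic behind this can be sketched as follows. In the RVV specification, VLMAX = LMUL * VLEN / SEW, so with SEW = 8 an integer LMUL of 1, 2, 4, or 8 yields VLENB times that LMUL. The function names below are illustrative only, and fractional LMUL is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>

// VLENB as read by `csrr rd, vlenb`: the vector register length in bytes.
inline uint64_t vlenb(uint64_t VLenBits) { return VLenBits / 8; }

// VLMAX as returned by a `vsetvli rd, zero, e<SEW>, m<LMul>` style request,
// restricted to integer LMUL for this sketch.
inline uint64_t vlmax(uint64_t VLenBits, unsigned SEW, unsigned LMul) {
  assert((SEW == 8 || SEW == 16 || SEW == 32 || SEW == 64) && "invalid SEW");
  assert((LMul == 1 || LMul == 2 || LMul == 4 || LMul == 8) && "integer LMUL only");
  return uint64_t(LMul) * VLenBits / SEW;
}
```

For VLEN = 128, `vlmax(128, 8, 1)` equals `vlenb(128)`, and `vlmax(128, 8, 4)` equals four times that, which is why a single `vsetvli` can stand in for a `csrr` followed by a shift.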
|
|
Currently, the spill weight is only determined by isDef/isUse and
block frequency. However, for registers with different register
classes, the costs of spilling them are different.
For example, for `LMUL>1` registers (in which several physical
registers make up one larger logical register), the costs are larger
than in the `LMUL=1` case (in which there is only one physical register).
To solve this problem, a new target hook `getSpillWeightScaleFactor`
is added. Targets can override the default factor (which is `1.0`)
according to the register class.
For RISC-V, the factors are set to the `RegClassWeight` which is
used to track register pressure. The values of `RegClassWeight`
happen to be the number of register units.
I believe all of the targets with compounded registers can benefit
from this change, but only RISC-V is customized in this patch since
it has widely been agreed to do so. The other targets need more
performance data to go further.
Partially fixes #113489.
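A rough model of the scaling, assuming the base weight has the shape (isDef + isUse) * block frequency as described above. `RegClassInfo` and the free functions are hypothetical stand-ins for this sketch, not the actual hook signature.

```cpp
#include <cassert>

// Hypothetical stand-in for a register class; NumRegUnits models the
// RegClassWeight value mentioned in the commit message.
struct RegClassInfo {
  unsigned NumRegUnits;
};

// Default behaviour: every register class scales by 1.0.
inline float defaultScaleFactor(const RegClassInfo &) { return 1.0f; }

// RISC-V-like behaviour: scale by the number of register units.
inline float regUnitScaleFactor(const RegClassInfo &RC) {
  return static_cast<float>(RC.NumRegUnits);
}

// Base weight from def/use and block frequency, times the class factor.
inline float spillWeight(bool IsDef, bool IsUse, float BlockFreq,
                         float ScaleFactor) {
  assert(BlockFreq >= 0.0f && "block frequency must be non-negative");
  return static_cast<float>(IsDef + IsUse) * BlockFreq * ScaleFactor;
}
```

Under this model, a use in a block with frequency 2.0 weighs 2.0 for an LMUL=1 class and 16.0 for an LMUL=8 class with eight register units, so the allocator becomes more reluctant to spill the larger register group.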
|
|
Found using https://github.com/codespell-project/codespell
```
codespell RISCV --write-changes \
--ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru
```
|
|
Fixes #124932.
This patch implements the getIPRACSRegs hook for RISC-V, similar to its introduction for x86 in commit 14b567d. This hook is necessary for correct code generation when Interprocedural Register Allocation (IPRA) is enabled, ensuring that the return address register (ra / x1) is correctly saved and restored when needed.
Unlike the x86 implementation, this patch only saves ra and does not yet include the frame pointer (fp). Further investigation is required to determine whether fp should also be preserved in all cases.
The test case is representative of a miscompile observed in the GCC torture suite (20090113-3.c), though similar failures occur in SPEC’s xz benchmark.
|
|
Spotted the auipc case while looking at code for P550. I'm not sure this
is the right long term fix. We're still missing rematerialization
opportunities for these pairs so a pseudo might be better. That would
interfere with folding auipc+add into load/store addressing though.
Fixes #76779.
|
|
(#115756)
This fixes
https://discourse.llvm.org/t/fixed-register-being-spill-and-restored-in-clang/83058.
We need to do it in `MachineRegisterInfo::getCalleeSavedRegs` instead of
`RISCVRegisterInfo::getCalleeSavedRegs` since the MF argument of
`TargetRegisterInfo::getCalleeSavedRegs` is `const`, so we can't call
`MF->getRegInfo().disableCalleeSavedRegister` there.
So to put it in `MachineRegisterInfo::getCalleeSavedRegs`, we move
`isRegisterReservedByUser` into `TargetSubtargetInfo`.
|
|
This patch fixes:
llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp:476:25: error: unused
variable 'ST' [-Werror,-Wunused-variable]
|
|
We can move the logic from adjustStackForRVV into adjustReg, which
results in the remaining logic being trivially inlined to the two
callers and allows a duplicate copy of the same logic in
eliminateFrameIndex to be pruned.
|
|
Identified with misc-include-cleaner.
|
|
https://github.com/riscv-non-isa/riscv-toolchain-conventions/pull/56
Resolved https://github.com/llvm/llvm-project/issues/106700.
This allows inline asm to list vcix_state as a clobbered register,
thus disabling reordering between VCIX intrinsics and inline asm.
|
|
This patch adds a 16-bit register class for use with Zhinx
instructions. This makes them more similar to Zfh instructions and
allows us to only spill 16 bits.
I've added CodeGenOnly instructions for load/store using GPRF16 as that
gave better results than insert_subreg/extract_subreg. I'm using FSGNJ
for GPRF16 copy with Zhinx as that gave better results. Zhinxmin will
use ADDI+subreg operations.
Function arguments use this new GPRF16 register class for f16 arguments
with Zhinxmin, eliminating the need to use RISCVISD::FMV* nodes.
I plan to extend this idea to Zfinx next.
|
|
NFC (#109848)
I think the 8 here represents RVVBitsPerBlock / 8.
|
|
Since it's SiFive VCIX specific register, it's better to have a prefix
so that it's more understandable.
|
|
This makes use of the information from TableGen instead of duplicating
it in the code.
|
|
is not enabled.
We can't save vector registers without V/Zve.
|
|
RISCVRegisterInfo::needsFrameBaseReg
The vector callee saved registers shouldn't affect the frame pointer
offset so we don't want to consider them.
I've listed the GPR, FPR32, and FPR64 register classes explicitly
because getMinimalPhysRegClass is slow and this function is called
frequently. So explicitly listing the interesting classes should be
a compile time improvement.
|
|
RISCVRegisterInfo::needsFrameBaseReg.
Instead of using getReservedRegs, just check the subtarget reserved
list. getReservedRegs considers the frame pointer to be reserved when
it is being used, but we do need to save/restore it so it should be
counted as a callee saved register. AArch64 hardcodes their callee
saved size, but the comment mentions the Frame Pointer being counted.
|
|
RISCVRegisterInfo::needsFrameBaseReg.
It's already added in isFrameOffsetLegal so adding it in needsFrameBaseReg
causes it to be double counted.
|
|
Planning to declare all extensions in tablegen so we can generate the
tables for RISCVISAInfo.cpp. This requires making "e" consistent with
other extensions.
|
|
isn't used. NFC (#89163)
The callee saved size is only used if there is a frame pointer. Sink the
code onto the frame pointer only path.
|
|
As pointed out by Fraser, KillSrcReg is always false at this point in
the code, and having the inconsistency on whether we check the flag
between the if and else blocks is confusing.
|
|
If we need to multiply VLENB by 2, 4, or 8 and add it to the stack
pointer, we can do so with a shNadd instead of separate shift and add
instructions.
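As a small model of why this works: the Zba instructions sh1add, sh2add, and sh3add compute (rs1 << N) + rs2 for N = 1, 2, 3, which matches multiplying VLENB by 2, 4, or 8 and adding the stack pointer. The helper names here are made up for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Semantics of the Zba shNadd family: rd = (rs1 << N) + rs2, N in {1, 2, 3}.
inline uint64_t shNadd(unsigned N, uint64_t Rs1, uint64_t Rs2) {
  assert(N >= 1 && N <= 3 && "only sh1add, sh2add, sh3add exist");
  return (Rs1 << N) + Rs2;
}

// Returns the shift amount for a multiplier of 2, 4, or 8, or 0 if the
// multiplier cannot be folded into a single shNadd.
inline unsigned shNaddShiftFor(uint64_t Multiplier) {
  switch (Multiplier) {
  case 2: return 1;
  case 4: return 2;
  case 8: return 3;
  default: return 0;
  }
}
```

For example, with VLENB = 16 and a multiplier of 4, `shNadd(2, 16, SP)` produces SP + 64 in one instruction rather than a shift followed by an add.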
|
|
(#87950)
If we're falling back to generic constant formation in a register +
add/sub, we can check if we have a constant which is 12-bits but left
shifted by 2 or 3. If so, we can use a sh2add or sh3add to perform the
shift and add in a single instruction.
This is profitable when the unshifted constant would require two
instructions (LUI/ADDI) to form, but is never harmful since we're going
to need at least two instructions regardless of the constant value.
Since stacks are aligned to 16 bytes by default, sh3add allows addressing
(aligned) data out to 2^14 (i.e. 16 KiB) in at most two instructions
w/zba.
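The check can be sketched like this. It is a stand-alone approximation: `ShiftedImm`, `matchShiftedAdd`, and `applyShNadd` are hypothetical names, and skipping constants that fit a single ADDI or whose low 12 bits are zero (formable by LUI alone) is an assumption about when the trick is worthwhile.

```cpp
#include <cassert>
#include <cstdint>

// True if Value fits in a signed 12-bit immediate.
inline bool fitsSimm12(int64_t Value) { return Value >= -2048 && Value <= 2047; }

// Result of matching Val == Imm * 2^Shift; Shift == 0 means "no match".
struct ShiftedImm {
  unsigned Shift;
  int64_t Imm;
};

// Looks for a simm12 constant shifted left by 3 (sh3add) or 2 (sh2add).
inline ShiftedImm matchShiftedAdd(int64_t Val) {
  // A single ADDI or a lone LUI already handles these cases.
  if (fitsSimm12(Val) || (static_cast<uint64_t>(Val) & 0xFFF) == 0)
    return {0, Val};
  if (Val % 8 == 0 && fitsSimm12(Val / 8))
    return {3, Val / 8};
  if (Val % 4 == 0 && fitsSimm12(Val / 4))
    return {2, Val / 4};
  return {0, Val};
}

// Models `addi tmp, zero, Imm` followed by `shNadd rd, tmp, Base`.
inline int64_t applyShNadd(ShiftedImm M, int64_t Base) {
  assert((M.Shift == 2 || M.Shift == 3) && "expected a sh2add/sh3add match");
  return M.Imm * (int64_t(1) << M.Shift) + Base;
}
```

The largest sh3add match is 2047 * 8 = 16376, which is consistent with the reach of roughly 2^14 bytes mentioned above.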
|
|
This restructures the code to make the fact that most of
getVLENFactoredAmount is just a generic multiply w/immediate more
obvious and prepare for a couple of upcoming enhancements to this code.
Note that I plan to switch mulImm to early return, but decided I'd do
that as a separate commit to keep this diff readable.
---------
Co-authored-by: Luke Lau <luke_lau@icloud.com>
|
|
[RISCV] RISCV vector calling convention (1/2)
This is the vector calling convention based on
https://github.com/riscv-non-isa/riscv-elf-psabi-doc,
the idea is to split between "scalar" callee-saved registers
and "vector" callee-saved registers. "scalar" ones remain the
original strategy, however, "vector" ones are handled together
with RVV objects.
The stack layout would be:
|--------------------------| <-- FP
| callee-allocated save |
| area for register varargs|
|--------------------------|
| callee-saved registers | <-- scalar callee-saved
| (scalar) |
|--------------------------|
| RVV alignment padding |
|--------------------------|
| callee-saved registers | <-- vector callee-saved
| (vector) |
|--------------------------|
| RVV objects |
|--------------------------|
| padding before RVV |
|--------------------------|
| scalar local variables |
|--------------------------| <-- BP
| variable size objects |
|--------------------------| <-- SP
Note: This patch doesn't contain "tuple" type, e.g. vint32m1x2.
It will be handled in https://github.com/riscv-non-isa/riscv-elf-psabi-doc (2/2).
Differential Revision: https://reviews.llvm.org/D154576
|
|
RV32. (#85871)
I believe we can use XLen alignment as long as eliminateFrameIndex
limits the maximum folded offset to 2043. This way when we split
the load/store into two instructions we'll be able to add 4
without overflowing simm12.
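The 2043 limit follows directly from the simm12 range of [-2048, 2047]: the second half of a split access uses the folded offset plus 4, so the folded offset must leave that much headroom. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// True if Value fits in a signed 12-bit immediate.
inline bool fitsSimm12(int64_t Value) { return Value >= -2048 && Value <= 2047; }

// Largest offset that still leaves room to add Step for the second access.
inline int64_t maxFoldableOffset(int64_t Step) {
  assert(Step >= 0 && Step <= 2047 && "step must itself be a small constant");
  return 2047 - Step;
}

// True if both Offset and Offset + Step can be encoded as simm12.
inline bool canFoldBothHalves(int64_t Offset, int64_t Step = 4) {
  return fitsSimm12(Offset) && fitsSimm12(Offset + Step);
}
```

With a step of 4, an offset of 2043 gives 2047 for the second access, while 2044 would give 2048 and overflow.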
|
|
(#85998)
With GPR pairs from Zdinx, we can't guarantee there are no subregisters
on integer instruction operands. I've been able to get these assertions
to fire after some other recent PRs.
I've added a FIXME to support this properly. I just wanted to prevent
the assertion failure for now.
No test case because my other patch #85982 that allowed me to fail the assert
hasn't been approved yet, and I don't know for sure that that patch is
required to hit this assert. It's just what exposed it for me. So I
think this patch is a good precaution regardless.
|
|
getOpcode(). NFC (#85847)
|
|
registers. (#83320)
I've seen cases where the cost per use increases the number of spills.
Disabling improves the codegen for #79918.
I propose adding this option to allow easier experimentation.
|
|
We've now got enough of these in tree that we can see which patterns
appear to be idiomatic. As such, extract a helper for checking
if we know the exact VLEN.
|
|
-msave-restore/Zcmp (#81392)
PEI previously used fake frame indices for these callee saved registers.
These fake frame indices are not registered with MachineFrameInfo. This
required them to be deleted from CalleeSavedInfo after PEI to avoid
breaking later passes. See #79535.
Unfortunately, removing the registers from CalleeSavedInfo pessimizes
Interprocedural Register Allocation. The RegUsageInfoCollector pass runs
after PEI and uses CalleeSavedInfo.
This patch replaces #79535 by properly creating fixed stack objects
through MachineFrameInfo. This changes the stack size and offsets
returned by MachineFrameInfo which requires changes to how
RISCVFrameLowering uses that information.
In addition to the individual object for each register, I've also created
a single large fixed object that covers the entire stack area covered by
cm.push or the libcalls. cm.push must always push a multiple of 16 bytes
and the save restore libcall pushes a multiple of stack align. I think
this leaves holes in the stack where we could spill other registers, but
it matches what we did previously. Maybe we can optimize this in the
future.
The only test changes are due to stack alignment handling after the
callee save registers. Since we now have the fixed objects on the stack,
the offset is non-zero when an aligned object is processed, so the offset
gets rounded up, increasing the stack size.
I suspect we might need some more updates for RVV related code. There is
very little or maybe even no testing of RVV mixed with Zcmp and
save-restore.
|
|
This patch allows VCIX instructions that have side effects to be
reordered with memory and other side-effecting instructions. However, we
don't want VCIX instructions to be reordered with each other, so we
propose a dummy register called VCIX_STATE and make these instructions
implicitly define and use it.
|
|
This hints the register allocator to use the same register for source
and destination to enable more compression.
|
|
This commit includes the necessary changes to clang and LLVM to support
codegen of `RVE` and the `ilp32e`/`lp64e` ABIs.
The differences between `RVE` and `RVI` are:
* `RVE` reduces the integer register count to 16 (x0-x15).
* The ABI should be `ilp32e` for 32 bits and `lp64e` for 64 bits.
`RVE` can be combined with all current standard extensions.
The central changes in ilp32e/lp64e ABI, compared to ilp32/lp64 are:
* Only 6 integer argument registers (rather than 8).
* Only 2 callee-saved registers (rather than 12).
* A Stack Alignment of 32bits (rather than 128bits).
* ilp32e isn't compatible with D ISA extension.
If `ilp32e` or `lp64e` is used with an ISA that has any of the registers
x16-x31 and f0-f31, then these registers are considered temporaries.
To be compatible with the implementation of ilp32e in GCC, we don't use
aligned registers to pass variadic arguments and set stack alignment
to 4-bytes for types with length of 2*XLEN.
FastCC is also supported on RVE, while GHC isn't since there is only one
available register.
Differential Revision: https://reviews.llvm.org/D70401
|
|
|
|
The patch adds the instructions in the Zicfiss extension, which
supports shadow stacks for control flow integrity. This patch is
based on version [0.3.1].
[0.3.1]: https://github.com/riscv/riscv-cfi/releases/tag/v0.3.1
|
|
Instead of using VLENB and a shift, load (VLEN/8)*LMUL directly into a
register. We could go further and use ADDI, but that would be more
intrusive to the code structure.
My primary goal is to remove the read of VLENB which might be expensive
if it's not optimized in hardware.
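The idea can be sketched as follows, assuming the exact VLEN is known when the subtarget's minimum and maximum VLEN agree. The function name and the convention of returning 0 for "unknown, fall back to reading VLENB" are choices made for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Returns (VLEN / 8) * LMul as a compile-time constant when the exact VLEN
// is known, or 0 to signal that the caller must read VLENB at run time.
inline uint64_t exactVLenScaledBytes(unsigned MinVLenBits, unsigned MaxVLenBits,
                                     unsigned LMul) {
  assert(MinVLenBits <= MaxVLenBits && "inconsistent VLEN bounds");
  if (MinVLenBits == 0 || MinVLenBits != MaxVLenBits)
    return 0; // VLEN not known exactly
  return uint64_t(MinVLenBits / 8) * LMul;
}
```

With VLEN fixed at 128 and LMUL = 4 this yields 64, which can be loaded as an immediate instead of reading VLENB and shifting.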
|
|
Adds GraalVM calling conventions. The only difference with the default calling conventions is that GraalVM reserves two registers for the heap base and the thread. Since the registers are then accessed by name, getRegisterByName has to be updated accordingly.
This patch implements the calling conventions only for X86, AArch64 and RISC-V.
For X86, the reserved registers are R14 and R15. For AArch64, they are X27 and X28. For RISC-V, they are X23 and X27.
This patch has been used by the LLVM backend of GraalVM's Native Image project in production for around 4 months with no major issues.
Differential Revision: https://reviews.llvm.org/D151107
|
|
|