This patch enables support for debug entry values, improving the quality
of debug info for RISC-V.
|
|
So that it runs before `MachineCSE` and other passes.
Fixes https://github.com/llvm/llvm-project/issues/158063.
|
|
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32", "n64", etc. is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
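As a rough illustration of the approach, here is a minimal sketch of what a TargetParser-side helper could look like, computing the layout from only the triple and ABI name. The function name and placement are hypothetical; the RISC-V strings are the standard ones:
```
#include "llvm/ADT/StringRef.h"
#include "llvm/TargetParser/Triple.h"
#include <string>

// Hypothetical helper: derives the data layout string from the triple and
// ABI name alone, with no dependency on CodeGen.
std::string computeDataLayout(const llvm::Triple &TT, llvm::StringRef ABIName) {
  switch (TT.getArch()) {
  case llvm::Triple::riscv32:
    return "e-m:e-p:32:32-i64:64-n32-S128";
  case llvm::Triple::riscv64:
    return "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128";
  default:
    // Other targets consult ABIName here, e.g. the invented "shortptr"
    // ABI name that threads the NVPTX cl::opt through this interface.
    return "";
  }
}
```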
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
(#156798)"
With a dependency on the Passes library added this time.
|
|
(#156798)"
This reverts commit c51db9f6f3653859a05a073dab53274510edacff.
Getting build bot failures about undefined symbol: llvm::OptimizationLevel::O0.
|
|
As noted in [156787](https://github.com/llvm/llvm-project/issues/156787)
|
|
This patch adds basic assembler and MC layer infrastructure for
RISC-V big-endian targets (riscv32be/riscv64be):
- Register big-endian targets in RISCVTargetMachine
- Add big-endian data layout strings
- Implement endianness-aware fixup application in the assembler backend
  (see the sketch below)
- Add byte swapping for data fixups on BE cores
- Update MC layer components (AsmInfo, MCTargetDesc, Disassembler, AsmParser)
This provides the foundation for BE support but does not yet include:
- Codegen patterns for BE
- Load/store instruction handling
- BE-specific subtarget features
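As a sketch of the endianness-aware fixup application mentioned above (the OR-in-the-value scheme is the usual one; the names are illustrative, not the exact backend code):
```
#include "llvm/Support/Endian.h"
#include <cstdint>

// Illustrative sketch: OR the resolved fixup value into the fragment bytes
// in target byte order, so big-endian cores see the bytes swapped.
void applyFixupBytes(uint8_t *Data, unsigned NumBytes, uint64_t Value,
                     llvm::endianness Endian) {
  for (unsigned I = 0; I != NumBytes; ++I) {
    unsigned B = (Endian == llvm::endianness::little) ? I : NumBytes - 1 - I;
    Data[I] |= uint8_t((Value >> (B * 8)) & 0xff);
  }
}
```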
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
Some processors benefit more from store clustering than load clustering,
and vice-versa, depending on factors that are exclusive to each one
(e.g. macrofusions implemented).
Likewise, certain optimizations benefit more from misched clustering
than postRA clustering. Macrofusions are again an example: in a
processor with store pair macrofusions, like the veyron-v1, it is
observed that misched clustering increases the amount of macrofusions
more than postRA clustering. This of course isn't necessarily true for
other processors, but it shows that processors can benefit from a more
fine grained control of clustering mutations, and each one is able to do
it differently.
Add 4 new subtarget features that deprecate the existing
riscv-misched-load-store-clustering and
riscv-postmisched-load-store-clustering options:
- disable-misched-load-clustering and disable-misched-store-clustering:
disable load/store clustering during misched;
- disable-postmisched-load-clustering and
disable-postmisched-store-clustering:
disable load/store clustering during PostRA.
Note that the new subtarget features disable specific stages of the
default clustering settings. The default per se (load and store
clustering for both misched and PostRA) is left untouched.
Disable all clustering but misched-store-clustering for the veyron-v1
processor using the new features.
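A hedged sketch of how the features might gate the mutations in the misched hook; the feature getters are hypothetical stand-ins for the new subtarget features, while the mutation factories are the standard ones:
```
ScheduleDAGInstrs *
RISCVTargetMachine::createMachineScheduler(MachineSchedContext *C) const {
  ScheduleDAGMILive *DAG = createSchedLive(C);
  const RISCVSubtarget &ST = C->MF->getSubtarget<RISCVSubtarget>();
  // Hypothetical getters mirroring the new subtarget features.
  if (!ST.hasDisableMISchedLoadClustering())
    DAG->addMutation(createLoadClusterDAGMutation(DAG->TII, DAG->TRI));
  if (!ST.hasDisableMISchedStoreClustering())
    DAG->addMutation(createStoreClusterDAGMutation(DAG->TII, DAG->TRI));
  return DAG;
}
```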
|
|
With the VPlan-based canonical induction variable replacement landed in
#147222, it appears to cover the functionality previously provided by
the EVLIndVarSimplify pass introduced in #131005.
This patch suggests removing EVLIndVarSimplify from the RISC-V pipeline
as a follow-up step. Feedback is very welcome!
|
|
The RISCVVLOptimizer has been enabled by default for a while now and I'm
not aware of any outstanding issues that might need it to be disabled.
This removes the -riscv-enable-vl-optimizer flag to reduce the number of
configurations we have to support.
|
|
## Purpose
This patch is one in a series of code-mods that annotate LLVM’s public
interface for export. This patch annotates the `llvm/Target` library.
These annotations currently have no meaningful impact on the LLVM build;
however, they are a prerequisite to support an LLVM Windows DLL (shared
library) build.
## Background
This effort is tracked in #109483. Additional context is provided in
[this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307),
and documentation for `LLVM_ABI` and related annotations is found in the
LLVM repo
[here](https://github.com/llvm/llvm-project/blob/main/llvm/docs/InterfaceExportAnnotations.rst).
A subset of these changes were generated automatically using the
[Interface Definition Scanner (IDS)](https://github.com/compnerd/ids)
tool, followed by formatting with `git clang-format`.
The bulk of this change is manual additions of `LLVM_ABI` to
`LLVMInitializeX` functions defined in .cpp files under llvm/lib/Target.
Adding `LLVM_ABI` to the function implementations is required here
because they do not `#include "llvm/Support/TargetSelect.h"`, which
contains the declarations for these functions and was already updated
with `LLVM_ABI` in a previous patch. I considered patching these files
with `#include "llvm/Support/TargetSelect.h"` instead, but since
TargetSelect.h is a large file with a bunch of preprocessor x-macro
stuff in it I was concerned it would unnecessarily impact compile times.
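For illustration, the pattern looks roughly like this (a sketch; `LLVM_ABI` comes from `llvm/Support/Compiler.h` and the body is elided):
```
#include "llvm/Support/Compiler.h"

// Annotate the definition directly, since this file does not include
// TargetSelect.h, where the annotated declaration lives.
extern "C" LLVM_ABI void LLVMInitializeRISCVTarget() {
  // ... register the riscv32/riscv64 targets ...
}
```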
In addition, a number of unit tests under llvm/unittests/Target required
additional dependencies to make them build correctly against the LLVM
DLL on Windows using MSVC.
## Validation
Local builds and tests to validate cross-platform compatibility. This
included llvm, clang, and lldb on the following configurations:
- Windows with MSVC
- Windows with Clang
- Linux with GCC
- Linux with Clang
- Darwin with Clang
|
|
The `RISCVIndirectBranchTracking` pass inserts the `lpad` instruction
and can change basic block alignment, so it must not run after branch
relaxation, as the adjusted offsets could then exceed the branch range.
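A sketch of the intended ordering in the RISC-V pass config (the exact hook and insertion point are illustrative):
```
void RISCVPassConfig::addPreEmitPass() {
  // Inserts lpad and may change basic block alignment, so it has to run
  // before branch relaxation computes the final offsets.
  addPass(createRISCVIndirectBranchTrackingPass());
  addPass(&BranchRelaxationPassID);
}
```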
|
|
If `createMachineScheduler`/`createPostMachineScheduler` return a
`nullptr`, then we will call `createSchedLive`/`createSchedPostRA`
anyway.
We can always create the scheduler first and simplify the following
conditions.
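In other words, the nullptr fallback can be folded away; a sketch of the shape of the change (signatures abbreviated):
```
// Before (sketched): every caller duplicated the fallback.
//   ScheduleDAGInstrs *DAG = createMachineScheduler(C);
//   if (!DAG)
//     DAG = createSchedLive(C);
//
// After (sketched): the default scheduler is created unconditionally first,
// and the conditions that followed become simpler.
ScheduleDAGInstrs *DAG = createSchedLive(C);
```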
|
|
We rename `createGenericSchedLive` and `createGenericSchedPostRA`
to `createSchedLive` and `createSchedPostRA`, and add a template
parameter `Strategy` which is the generic implementation by default.
This can simplify some code for targets that have a custom scheduler
strategy.
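A minimal sketch of the renamed helper's shape (simplified; the real factory does more setup):
```
#include <memory>

template <typename Strategy = GenericScheduler>
ScheduleDAGMILive *createSchedLive(MachineSchedContext *C) {
  // The strategy is a template parameter defaulting to the generic
  // implementation; a target with a custom strategy just writes
  // createSchedLive<MyTargetStrategy>(C).
  return new ScheduleDAGMILive(C, std::make_unique<Strategy>(C));
}
```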
|
|
Fix the RISC-V Indirect Branch Tracking pass not showing up in the pass
debug output because it was not initialized properly.
|
|
(#131005)
When we enable EVL-based loop vectorization w/ predicated tail-folding,
each vectorized loop has effectively two induction variables: one
calculates the step using (VF x vscale) and the other one increases the
IV by values returned from experimental.get.vector.length. The former,
also known as canonical IV, is more favorable for analyses as it's
"countable" in the sense of SCEV; the latter (EVL-based IV), however, is
more favorable to codegen, at least for those that support scalable
vectors like AArch64 SVE and RISC-V.
The idea is that we use the canonical IV all the way until the end of
all vectorizers, where we replace it with the EVL-based IV using the
EVLIVSimplify pass introduced here, so that we can have the best of both
worlds. This pass is enabled by default for RISC-V. However, since we
don't vectorize loops with predicated tail-folding by default yet, this
pass is currently a no-op.
|
|
Register assembly printer passes in the pass registry.
This makes it possible to use `llc -start-before=<target>-asm-printer ...` in tests.
Adds a `char &ID` parameter to the `AsmPrinter` constructor to allow
targets to use the `INITIALIZE_PASS` macros and register the pass in the
pass registry. This currently has a default parameter so it won't break
any targets that have not been updated.
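A sketch of the resulting pattern in a target, using the RISC-V printer for concreteness (details simplified):
```
class RISCVAsmPrinter : public AsmPrinter {
public:
  static char ID; // enables INITIALIZE_PASS registration

  RISCVAsmPrinter(TargetMachine &TM, std::unique_ptr<MCStreamer> Streamer)
      : AsmPrinter(TM, std::move(Streamer), ID) {} // forward the pass ID
};

char RISCVAsmPrinter::ID = 0;

INITIALIZE_PASS(RISCVAsmPrinter, "riscv-asm-printer",
                "RISC-V Assembly Printer", false, false)
```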
|
|
Replace "concept based polymorphism" with simpler PImpl idiom.
This pursues two goals:
* Enforce static type checking. Previously, target implementations hid
base class methods and type checking was impossible. Now that they
override the methods, the compiler will complain on mismatched
signatures.
* Make the code easier to navigate. Previously, if you asked your
favorite LSP server to show a method (e.g. `getInstructionCost()`), it
would show you methods from `TTI`, `TTI::Concept`, `TTI::Model`,
`TTIImplBase`, and target overrides. Now it shows two fewer :)
There are three commits to hopefully simplify the review.
The first commit removes `TTI::Model`. This is done by deriving
`TargetTransformInfoImplBase` from `TTI::Concept`. This is possible
because they implement the same set of interfaces with identical
signatures.
The first commit also makes `TargetTransformInfoImplBase` polymorphic,
which means all derived classes should `override` its methods. This is
done in the second commit to keep the first one smaller. It appeared
infeasible to extract this into a separate PR because the first commit,
if landed separately, would result in tons of `-Woverloaded-virtual`
warnings (and break `-Werror` builds).
The third commit eliminates `TTI::Concept` by merging it with the only
derived class `TargetTransformInfoImplBase`. This commit could be extracted
into a separate PR, but it touches the same lines in
`TargetTransformInfoImpl.h` (removes `override` added by the second
commit and adds `virtual`), so I thought it may make sense to land these
two commits together.
Pull Request: https://github.com/llvm/llvm-project/pull/136674
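Roughly, the hierarchy change looks like this; a heavily abbreviated sketch, not the exact classes or signatures:
```
// Before: "concept based polymorphism" - Model<T> wrapped each target, and
// target methods hid (rather than overrode) the base implementations.
//   struct Concept { virtual InstructionCost getInstructionCost(...) = 0; };
//   template <typename T> struct Model : Concept { T Impl; /* forwards */ };

// After: one virtual base; overrides are checked by the compiler.
class TargetTransformInfoImplBase {
public:
  virtual ~TargetTransformInfoImplBase() = default;
  virtual InstructionCost getInstructionCost(const User *U) const;
};

// Intermediate CRTP layers elided; the point is that a mismatched signature
// now fails to compile instead of silently hiding the base method.
class RISCVTTIImpl : public TargetTransformInfoImplBase {
public:
  InstructionCost getInstructionCost(const User *U) const override;
};
```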
|
|
- Move calls to pass initialization functions to RISCV target
initialization and remove them from pass constructors.
|
|
This patch is an alternative to PRs #117060, #131684, #131728.
The patch adds a late optimization pass that replaces conditional
branches that can be statically evaluated with an unconditional branch.
Adding Michael as a co-author as most of the code that evaluates the
condition comes from #131684.
Co-authored-by: Michael Maitland <michaeltmaitland@gmail.com>
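The core evaluation is small; a hedged sketch using the RISCVCC condition codes (the helper's shape is illustrative):
```
#include <cstdint>

// Evaluate a conditional branch whose operands are known constants.
bool evaluateCondBranch(unsigned CC, int64_t LHS, int64_t RHS) {
  switch (CC) {
  case RISCVCC::COND_EQ:  return LHS == RHS;
  case RISCVCC::COND_NE:  return LHS != RHS;
  case RISCVCC::COND_LT:  return LHS < RHS;
  case RISCVCC::COND_GE:  return LHS >= RHS;
  case RISCVCC::COND_LTU: return uint64_t(LHS) < uint64_t(RHS);
  case RISCVCC::COND_GEU: return uint64_t(LHS) >= uint64_t(RHS);
  default:
    llvm_unreachable("unexpected condition code");
  }
}
// If it returns true, the branch is rewritten to an unconditional one;
// otherwise the branch can be deleted and the fallthrough kept.
```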
|
|
Introduce the RISCVLoadStoreOptimizer MIR pass that will do the
optimization. The load/store pairing pass identifies adjacent load/store
instructions operating on consecutive memory locations and merges them
into a single paired instruction.
This is part of the MIPS extensions for the p8700 CPU.
Production of ldp/sdp instructions is OFF by default, since it is only
beneficial for -Os in the case of the p8700 CPU.
|
|
This is the follow up to #125026 that keeps mask operands in virtual
register form for as long as possible throughout the backend.
The diffs in this patch are from MachineCSE/MachineSink/RISCVVLOptimizer
kicking in.
The invariant that the mask COPY never has a subreg no longer holds
after MachineCSE (it coalesces some copies), so it needed to be relaxed.
|
|
load/store address. (#127151)"
Tests have been re-generated with recent scheduler changes.
Original message:
SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.
This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
|
|
load/store address. (#127151)"
This reverts commit c3ebbfd7368ec3e4737427eef602296a868a4ecd.
Seeing some test failures on the build bot.
|
|
address. (#127151)
SelectionDAG will not reassociate adds to the end of a chain if
there are multiple users of later additions. This prevents isel
from folding the immediate into a load/store address.
One easy way to see this is accessing an array in a struct with
two different indices. An ADDI will be used to get to the start
of the array then 2 different SHXADD instructions will be used to
add the scaled indices. Finally the SHXADD will be used by different
load instructions. We can remove the ADDI by folding the offset into
each load.
This patch adds a new pass that analyzes how an ADDI constant
propagates through address arithmetic. If the arithmetic is only
used by a load/store and the offset is small enough, we can adjust
the load/store offset and remove the ADDI.
This pass is placed before MachineCSE to allow cleanups if some
instructions become common after removing offsets from their inputs.
This pass gives ~3% improvement on dynamic instruction count on
541.leela_r and 544.nab_r from SPEC2017 for the train data set. There's
a ~1% improvement on 557.xz_r.
|
|
(#125026)
This is another attempt at #88496 to keep mask operands in SSA after
instruction selection.
Previously we selected the mask operands into vmv0, a singleton register
class with exactly one register, V0.
But the register allocator doesn't really support singleton register
classes and we ran into errors like "ran out of registers during
register allocation in function".
We avoid this by introducing a pass just before register allocation
that converts any use of vmv0 to a copy to $v0, i.e. what isel
does today.
That way the register allocator doesn't need to deal with the singleton
register class, but we get the benefits of having the mask registers in
SSA throughout the backend:
- This allows RISCVVLOptimizer to reduce the VLs of instructions that
define mask registers
- It enables CSE and code sinking in more places
- It removes the need to peek through mask copies in RISCVISelDAGToDAG
and keep track of V0 defs in RISCVVectorPeephole
This patch initially eliminates uses of vmv0s after RISCVVectorPeephole
to keep the diff to a minimum, and a follow up patch will move it past
the other MachineInstr SSA passes.
Note that it doesn't try to remove any defs of vmv0 as we shouldn't have
any instructions that have any vmv0 outputs.
As a further follow up, we can move the elimination pass to after phi
elimination and outside of SSA, which would unblock the pre-RA scheduler
around masked pseudos. This might also help the issue that
RISCVVectorMaskDAGMutation tries to solve.
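A sketch of the elimination step itself (simplified; the real pass also deals with subreg and placement details):
```
for (MachineBasicBlock &MBB : MF) {
  for (MachineInstr &MI : MBB) {
    for (MachineOperand &MO : MI.uses()) {
      if (!MO.isReg() || !MO.getReg().isVirtual() ||
          MRI.getRegClass(MO.getReg()) != &RISCV::VMV0RegClass)
        continue;
      // Reintroduce the copy to the physical $v0 that isel used to emit.
      BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(TargetOpcode::COPY),
              RISCV::V0)
          .addReg(MO.getReg());
      MO.setReg(RISCV::V0);
    }
  }
}
```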
|
|
Found using https://github.com/codespell-project/codespell
```
codespell RISCV --write-changes \
--ignore-words-list=FPR,fpr,VAs,ORE,WorstCase,hart,sie,MIs,FLE,fle,CarryIn,vor,OLT,VILL,vill,bu,pass-thru
```
|
|
The createMachineScheduler & createPostMachineScheduler
target hooks are currently placed in the PassConfig interface.
Moving them out to TargetMachine so that both the legacy and
the new pass manager can effectively use them.
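A sketch of the shape after the move (abbreviated; default implementations return nullptr so targets that don't customize scheduling need not care):
```
class TargetMachine {
public:
  // Both pass managers can query these without going through PassConfig.
  virtual ScheduleDAGInstrs *
  createMachineScheduler(MachineSchedContext *C) const { return nullptr; }
  virtual ScheduleDAGInstrs *
  createPostMachineScheduler(MachineSchedContext *C) const { return nullptr; }
};
```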
|
|
Adding two extensions for the MIPS p8700 CPU:
1. cmove (conditional move)
2. lsp (load/store pair)
The official product page is here:
https://mips.com/products/hardware/p8700
|
|
This patch adds basic support for `MachinePipeliner` and disables
it by default.
The functionality should be OK and all llvm-test-suite tests have
passed.
|
|
Now that we have testing of all instructions in the isSupportedInstr
switch, and better coverage of getOperandInfo, I think it is a good time
to enable this by default.
|
|
This follows up #115495 by enabling merging of external globals by
default, which had been left as a next step in order to make the
previous change more incremental and so we can more easily narrow down
on any identified regressions.
Enabling merging of external globals matches what Arm does (for
non-Mach-O targets), though AArch64 doesn't as there were [some
concerns](https://reviews.llvm.org/D61947) it might cause regressions in
some cases.
See https://github.com/llvm/llvm-project/pull/117880 for benchmark figures and discussion.
|
|
Enable `-fstack-clash-protection` for RISC-V and stack probing for
function prologues.
We probe the stack by creating a loop that allocates and probes the
stack in ProbeSize chunks.
We emit an unrolled probe loop for small allocations and a
variable-length probe loop for bigger ones.
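A hedged sketch of the emission strategy; all helper names here are hypothetical:
```
#include <cstdint>

void allocateAndProbeStack(uint64_t Size, uint64_t ProbeSize) {
  const uint64_t UnrollThreshold = 3; // illustrative cutoff
  if (Size / ProbeSize <= UnrollThreshold) {
    // Small allocation: emit the chunks as a straight-line (unrolled) loop.
    for (uint64_t Done = 0; Done + ProbeSize <= Size; Done += ProbeSize) {
      emitSPAdjustment(-int64_t(ProbeSize)); // addi sp, sp, -ProbeSize
      emitStackProbe();                      // e.g. sd zero, 0(sp)
    }
  } else {
    // Bigger allocation: emit a variable-length probe loop instead.
    emitProbeLoop(Size, ProbeSize);
  }
  // The residue (Size % ProbeSize) is allocated without a probe afterwards.
}
```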
|
|
Here we add a scheduling mutation in pre-RA scheduling, which adds
an artificial dependency edge between a mask producer and its
nearest previous instruction that uses the V0 register.
This prevents the live intervals of mask registers from overlapping
and, as a consequence, we can reduce some spills/moves.
From the test changes, we can see some improvements and also some
regressions (more vtype toggles).
Partially fixes #113489.
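A rough sketch of the mutation (the two predicate helpers are hypothetical):
```
struct VectorMaskDAGMutation : ScheduleDAGMutation {
  void apply(ScheduleDAGInstrs *DAG) override {
    SUnit *NearestV0User = nullptr;
    for (SUnit &SU : DAG->SUnits) {
      if (instrUsesV0(SU)) // hypothetical helper
        NearestV0User = &SU;
      else if (NearestV0User && instrDefinesMask(SU)) // hypothetical helper
        // Keep the mask producer close to the previous V0 user so the live
        // intervals of mask registers do not overlap.
        DAG->addEdge(&SU, SDep(NearestV0User, SDep::Artificial));
    }
  }
};
```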
|
|
From the discussion at the round-table at the RISC-V Summit it was clear
people see cases where global merging would help. So the direction of
enabling it by default and iteratively working to enable it in more
cases or to improve the heuristics seems sensible. This patch tries to
make a minimal step in that direction.
|
|
Following discussions in #110443, and the following earlier discussions
in https://lists.llvm.org/pipermail/llvm-dev/2017-October/117907.html,
https://reviews.llvm.org/D38482, https://reviews.llvm.org/D38489, this
PR attempts to overhaul the `TargetMachine` and `LLVMTargetMachine`
interface classes. More specifically:
1. Makes `TargetMachine` the only class implemented under
`TargetMachine.h` in the `Target` library.
2. `TargetMachine` contains target-specific interface functions that
relate to IR/CodeGen/MC constructs, whereas before (at least on paper)
it was supposed to have only IR/MC constructs. Any Target that doesn't
want to use the independent code generator simply does not implement
them, and returns either `false` or `nullptr`.
3. Renames `LLVMTargetMachine` to `CodeGenCommonTMImpl`. This renaming
aims to make the purpose of `LLVMTargetMachine` clearer. Its interface
was moved under the CodeGen library, to further emphasize its usage in
Targets that use CodeGen directly.
4. Makes `TargetMachine` the only interface used across LLVM and its
projects. With these changes, `CodeGenCommonTMImpl` is simply a set of
shared function implementations of `TargetMachine`, and CodeGen users
don't need to static cast to `LLVMTargetMachine` every time they need a
CodeGen-specific feature of the `TargetMachine`.
5. More importantly, does not change any requirements regarding library
linking.
cc @arsenm @aeubanks
|
|
Identified with misc-include-cleaner.
|
|
by default (#115484)
AArch64 left this disabled after seeing some cases of slightly worse
codegen that weren't tracked down, so I suggest as a path to
incrementally moving towards enable globals merging we follow suit, and
evaluate turning on later.
This patch disables merging of external globals, but also adds a flag to
override that. This reduces churn in test cases, simplifies benchmarking
runs, and this flag can be removed later.
A follow-on PR enables the globals merging pass by default (and as it's
based on this commit, merging of external globals is disabled just as
they are for AArch64).
|
|
#73789 added load clustering and #73796 tried to add store clustering.
If the post machine scheduler is used, load/store clusters formed during
machine scheduling may be broken. To solve this, add load/store
clustering to the post machine scheduler.
|
|
This patch adds CFI instructions in the function epilogue.
Before patch:
addi sp, s0, -32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
addi sp, sp, 32
ret
After patch:
addi sp, s0, -32
.cfi_def_cfa sp, 32
ld ra, 24(sp) # 8-byte Folded Reload
ld s0, 16(sp) # 8-byte Folded Reload
ld s1, 8(sp) # 8-byte Folded Reload
.cfi_restore ra
.cfi_restore s0
.cfi_restore s1
addi sp, sp, 32
.cfi_def_cfa_offset 0
ret
This functionality is already present in `riscv-gcc`, but it’s not in
`clang` and this slightly impairs the `lldb` debugging experience, e.g.
backtrace.
|
|
Now that LLVM 19.1.0 has been out for a while with post-vector-RA
vsetvli insertion enabled by default, this proposes to remove the flag
that restores the old pre-RA behaviour so we only have one configuration
going forward.
That flag was mainly meant as a fallback in case users ran into issues,
but I haven't seen anything reported so far.
|
|
If a pseudo has a passthru, I believe the first source operand will be
operand number 2, not 1.
|
|
Builds on #73789, enabling store clustering by default using the same
heuristic.
|
|
The purpose of this optimization is to make the VL argument, for
instructions that have a VL argument, as small as possible. This is
implemented by visiting each instruction in reverse order and checking
that if it has a VL argument, whether the VL can be reduced.
By putting this pass before VSETVLI insertion, we see three kinds of
changes to generated code:
1. Eliminate VSETVLI instructions
2. Reduce the VL toggle on VSETVLI instructions that also change vtype
3. Reduce the VL set by a VSETVLI instruction
The list of supported instructions is currently whitelisted for safety.
In the future, we could add more instructions to `isSupportedInstr` to
support even more VL optimization.
We originally wrote this pass because vector GEP instructions do not
take a VL, which leads us to emit code that uses VL=VLMAX to implement
GEP in the RISC-V backend. As a result, some of the vector instructions
will write to lanes, specifically between the intended VL and VLMAX,
that will never be read. As an alternative to this pass, we considered
adding a vector predicated GEP instruction, but this would not fit well
into the intrinsic type system since GEP has a variable number of
arguments, each with arbitrary types. The second approach we considered
was to put this pass after VSETVLI insertion, but we found that it was
more difficult to recognize optimization opportunities, especially
across basic block boundaries -- the data flow analysis was also a bit
more expensive and complex.
While this pass solves the GEP problem, we have expanded it to handle
more cases of VL optimization, and there is opportunity for the analysis
to be improved to enable even more optimization. We have a few follow up
patches to post, but figured this would be a good start.
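A sketch of the traversal at the heart of the pass (helper names illustrative):
```
for (MachineBasicBlock &MBB : MF) {
  // Visit in reverse so a user's reduced VL is known before its producers.
  for (MachineInstr &MI : llvm::reverse(MBB)) {
    if (!isSupportedInstr(MI))
      continue;
    // Hypothetical helper: the largest VL any user actually reads.
    if (std::optional<uint64_t> DemandedVL = computeDemandedVL(MI))
      reduceVLOperand(MI, *DemandedVL); // may let a vsetvli fold away later
  }
}
```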
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
Co-authored-by: Kito Cheng <kito.cheng@sifive.com>
|
|
(#110755)
|
|
We believe this is neutral or slightly better in the majority of cases.
|
|
This reverts commit 64972834c193632cbc47e54c0f0c721636b077e6.
Based on the discussions in #108991 that happened post merge, we have decided
to remove this pass in favor of generating `RISCV::G_*` opcodes in the legalizer.
We may reconsider moving that code elsewhere in the future so that we can do
a better job during generic combines. We don't feel that doing it in instruction
selection is the right decision today. Firstly, it requires us to manually
do regbankselect on the newly introduced instructions. Secondly, it is more
difficult to test since the test output will contain whatever the `RISCV::G_*`
instructions select to (instead of the `RISCV::G_*` instructions themselves).
My personal opinion is that the legalizer pass can be split into an early
legalizer and a late legalizer, both before regbankselect. The first legalizer
would not introduce target specific generic opcodes and the generic combiner
would run after it. The second legalizer would introduce the target specific
generic opcodes. I think this approach is better than the lowerer because the
legalizer guarantees that whatever we lower to is legal, and apparently because
it is more performant compared to the lowerer (although, I'm not sure how
true this is).
|
|
(#101023)
A recent atomics ABI change / fix requires that for the "A6C" and "A6S"
atomics ABIs (i.e. both of those supported by LLVM currently), an
additional fence is inserted for an atomic_compare_exchange with seq_cst
failure ordering.
<https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/445>
This isn't trivial to support through the hooks used by AtomicExpandPass
because that pass assumes that when fences are inserted, the original
atomics ordering information can be removed from the instruction. Rather
than try to change and complicate that API, this patch implements the
needed fence insertion through a small special purpose pass.
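A sketch of the pass's core check (hedged: the fence is shown leading here, while the precise placement follows the psABI rule):
```
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"

bool insertCmpXchgFences(llvm::Function &F) {
  using namespace llvm;
  bool Changed = false;
  for (Instruction &I : instructions(F)) {
    auto *CXI = dyn_cast<AtomicCmpXchgInst>(&I);
    if (!CXI ||
        CXI->getFailureOrdering() != AtomicOrdering::SequentiallyConsistent)
      continue;
    // Additional fence required by the A6C/A6S ABI fix; the original
    // ordering on the cmpxchg itself is left untouched.
    new FenceInst(F.getContext(), AtomicOrdering::SequentiallyConsistent,
                  SyncScope::System, /*InsertBefore=*/CXI);
    Changed = true;
  }
  return Changed;
}
```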
|
|
This is mostly a copy of the AArch64PostLegalizerLoweringPass, except it
removes all of the AArch64 combines.
This pass allows us to lower instructions after the generic
post-legalization combiner has had a chance to run.
We will be adding combines to this pass in future patches.
|