|
While we do not want to form actual lookup tables early, we do want to
perform the switch-to-arithmetic conversion early, as it may enable
inlining of the much simpler form.
Builds on https://github.com/llvm/llvm-project/pull/156477, which
originally included this change as well. This PR makes two changes on
top of it:
* Do not perform the optimization early if it requires adding a mask
check, as such checks make the resulting IR less analyzable.
* Add a new SimplifyCFG option that controls switch-to-arithmetic
conversion separately from switch-to-lookup conversion. Enable the new
flag at the end of the function simplification pipeline. This means that
we attempt the arithmetic conversion before inlining, but avoid it in
the early pipeline, where it may lose information.
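As a hedged source-level illustration (not the SimplifyCFG implementation itself), this is the kind of switch the conversion targets:

```cpp
// Illustrative only: the case results form a linear function of the case value.
int scale(unsigned x) {
  switch (x) {
  case 0: return 1;
  case 1: return 3;
  case 2: return 5;
  case 3: return 7;
  default: return 0;
  }
}
// Instead of materializing a lookup table, SimplifyCFG can emit arithmetic of
// the form `2 * x + 1` guarded by a range check, which is much easier for the
// inliner and later analyses to reason about.
```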
|
|
Introduce `AllocToken`, an instrumentation pass designed to provide
tokens to memory allocators enabling various heap organization
strategies, such as heap partitioning.
Initially, the pass instruments functions marked with a new attribute
`sanitize_alloc_token` by rewriting allocation calls to include a token
ID, appended as a function argument with the default ABI.
The design aims to provide a flexible framework for implementing
different token generation schemes. It currently supports the following
token modes:
- TypeHash (default): token IDs based on a hash of the allocated type
- Random: statically-assigned pseudo-random token IDs
- Increment: incrementing token IDs per TU
For the `TypeHash` mode, introduce support for `!alloc_token` metadata:
the metadata can be attached to allocation calls to provide richer
semantic
information to be consumed by the AllocToken pass. Optimization remarks
can be enabled to show where no metadata was available.
An alternative "fast ABI" is provided, where instead of passing the
token ID as an argument (e.g., `__alloc_token_malloc(size, id)`), the
token ID is directly encoded into the name of the called function (e.g.,
`__alloc_token_0_malloc(size)`). Where the maximum number of tokens is small, this
offers more efficient instrumentation by avoiding the overhead of
passing an additional argument at each allocation site.
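A rough sketch of what the rewritten call sites look like at the source level. The function names come from the examples above; the exact parameter types are an assumption, and the real runtime provides the definitions:

```cpp
#include <cstddef>

// Illustrative declarations only; the actual runtime signatures may differ.
extern "C" void *__alloc_token_malloc(std::size_t size, std::size_t token_id); // default ABI
extern "C" void *__alloc_token_0_malloc(std::size_t size); // "fast ABI": token 0 encoded in the name

void *make_node(std::size_t size) {
  // Original call in a function carrying the sanitize_alloc_token attribute:
  //   return malloc(size);
  // Default ABI after instrumentation: the token ID (e.g. a TypeHash-derived
  // value, shown here as 0) is appended as an extra argument.
  return __alloc_token_malloc(size, /*token_id=*/0);
}
```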
Link: https://discourse.llvm.org/t/rfc-a-framework-for-allocator-partitioning-hints/87434 [1]
---
This change is part of the following series:
1. https://github.com/llvm/llvm-project/pull/160131
2. https://github.com/llvm/llvm-project/pull/156838
3. https://github.com/llvm/llvm-project/pull/162098
4. https://github.com/llvm/llvm-project/pull/162099
5. https://github.com/llvm/llvm-project/pull/156839
6. https://github.com/llvm/llvm-project/pull/156840
7. https://github.com/llvm/llvm-project/pull/156841
8. https://github.com/llvm/llvm-project/pull/156842
|
|
(#158608)
This PR, which supersedes
https://github.com/llvm/llvm-project/pull/139943, extends the scenarios
where the 'norecurse' attribute can be inferred.
Currently, the 'norecurse' attribute is only inferred if all called
functions also have this attribute. This change introduces a new pass in
the LTO pipeline, run after Whole Program Devirtualization, to broaden
the inference criteria. The new pass inspects all functions in the
module and sets a flag if any functions are external or have their
addresses taken (while ignoring those already marked norecurse). This
flag is then used with the existing conditions to enable inference in
more cases.
This enhancement allows 'norecurse' to be applied in situations where a
function calls a recursive function, but is not part of the same
recursion chain.
For example, foo can now be marked 'norecurse' in the following
scenarios:
`foo -> callee1 -> callee2 -> callee2`
In this case, foo and callee1 can both be marked 'norecurse' because
they're not part of the callee2 recursion.
Similarly, foo can be marked 'norecurse' here:
`foo -> callee1 -> callee2 -> callee1`
Here, foo is not part of the callee1 -> callee2 -> callee1 recursion
chain, so it can be marked 'norecurse'.
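A hedged source-level sketch of the second scenario (the C++ names mirror the call chains above; the actual inference runs on IR and also depends on the module-level conditions described earlier):

```cpp
// Hedged sketch of the second scenario.
int callee2(int n); // forward declaration for the mutual recursion

int callee1(int n) { return n <= 0 ? 0 : callee2(n - 1); } // part of the cycle
int callee2(int n) { return n <= 0 ? 1 : callee1(n - 1); } // part of the cycle

// foo calls into the callee1 <-> callee2 recursion but is never re-entered as
// part of it, so with the broadened criteria it can be inferred 'norecurse'.
int foo(int n) { return callee1(n) + 1; }
```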
|
|
It's unnecessary to build the whole symbol table, and suboptimal to do so
for every function. All we really need is the instrumented PGO name
(which also accounts for whether LTO is enabled), and from that we can
compute the function name.
|
|
There's a pattern throughout LLVM of cl::opts being exported. That in
itself is probably a bit unfortunate, but what's especially bad about it
is that a lot of those symbols are in the global namespace. Move them
into the llvm namespace.
While doing this, I noticed some other variables in the global namespace
and moved them as well.
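A minimal sketch of the pattern being changed, with a hypothetical option name:

```cpp
#include "llvm/Support/CommandLine.h"

// Before: the exported option object lived in the global namespace.
// llvm::cl::opt<bool> EnableFooOpt("enable-foo-opt", llvm::cl::init(false));

// After: the definition is wrapped in the llvm namespace, so the exported
// symbol is llvm::EnableFooOpt rather than ::EnableFooOpt.
namespace llvm {
cl::opt<bool> EnableFooOpt("enable-foo-opt", cl::init(false));
} // namespace llvm
```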
|
|
Some LLVM passes need access to the filesystem to read configuration
files and similar. In some places, this is achieved by grabbing the VFS
from `PGOOptions`, but some passes don't have access to these and resort
to just calling `vfs::getRealFileSystem()`. This PR allows setting the
VFS directly on `PassBuilder`, which can then pass it down to all passes
that need it.
|
|
Initially this was needed to replace the fixed-step canonical IV with
the variable-step EVL IV, but this was eventually superseded by the loop
vectorizer doing this transform itself in #147222. The pass was then
removed from the RISC-V pipeline in #151483 and the loop vectorizer
stopped emitting the metadata used by the pass in #155760, so it now
has no users.
|
|
(#159497)
Add missing coroutine passes so that coro code can be correctly
compiled.
Fix #155558
|
|
This is a very rough state of what this can look like, but I didn't want
to spend too much time on what could be a dead end.
Currently the only way to invoke callbacks is by using the default
pipelines. This is an issue if you want to define your own pipeline
using the C string API (we do that in LLVM.jl in Julia), so I extended
the API to allow invoking those callbacks just as one would call a
pass of that kind.
There are some open questions about the parameters these callbacks take,
and I'm also missing some of them (some are also invoked by the
backend, so we may not want to expose them).
Code written with AI help; bugs are mine. (Not sure what LLVM's policy
on this is.)
|
|
In this commit:
(1) Added new pass manager support for `ReachingDefAnalysis`.
(2) Added printer pass.
(3) Made the old pass manager use `ReachingDefInfoWrapperPass`.
|
|
This adds a new pass for dropping assumes that are unlikely to be useful
for further optimization.
It works by discarding any assumes whose affected values are one-use
(which implies that they are only used by the assume, i.e. ephemeral).
This pass currently runs at the start of the module optimization
pipeline, that is post-inline and post-link. Before that point, it is
more likely for previously "useless" assumes to become useful again,
e.g. because an additional user of the value is introduced after
inlining + CSE.
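A minimal sketch of the core check, using standard LLVM APIs. This is an assumption-laden simplification: it looks only at the assume's condition, whereas the pass reasons about the assume's affected values.

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"

// Sketch only, not the actual pass: erase llvm.assume calls whose condition is
// an instruction with a single use, i.e. the condition exists solely to feed
// the assume and is therefore ephemeral.
static bool dropLikelyUselessAssumes(llvm::Function &F) {
  llvm::SmallVector<llvm::AssumeInst *, 8> ToErase;
  for (llvm::Instruction &I : llvm::instructions(F))
    if (auto *Assume = llvm::dyn_cast<llvm::AssumeInst>(&I)) {
      auto *Cond = llvm::dyn_cast<llvm::Instruction>(Assume->getArgOperand(0));
      if (Cond && Cond->hasOneUse()) // the only user is the assume itself
        ToErase.push_back(Assume);
    }
  for (llvm::AssumeInst *Assume : ToErase)
    Assume->eraseFromParent(); // the dead condition is left for later DCE
  return !ToErase.empty();
}
```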
|
|
(#159516)
No loop pass seems to use it now, after LoopPredication stopped using it
in https://reviews.llvm.org/D111668.
|
|
This PR is a reapplication of
https://github.com/llvm/llvm-project/pull/142686
|
|
(#158764)
This reverts commit 895cda70a95529fd22aac05eee7c34f7624996af
and the fix attempt 06f671e57a574ba1c5127038eff8e8773273790e.
Reason: performance regressions and broken sanitizers; see #142686.
|
|
This patch adds the flag -fexperimental-loop-fuse to the clang and flang
drivers. This is primarily useful for experiments, as we envision
enabling the pass one day.
The option follows the same principles and reasoning as
`-floop-interchange`.
---------
Co-authored-by: Madhur Amilkanthwar <madhura@nvidia.com>
|
|
Align the syntax used for the optimization level argument of the
expand-fp pass in textual descriptions of pass pipelines with the syntax
used by other passes taking a similar argument. That is, use e.g.
`expand-fp<O1>` instead of `expand-fp<opt-level=1>`.
|
|
|
|
This patch implements a correctly rounded expansion of the frem
instruction in LLVM IR. This is useful for target architectures for
which such an expansion is too involved to be implemented in ISel
Lowering. The expansion is based on the code from the AMD device libs
and has been tested successfully against the OpenCL conformance tests on
amdgpu. The expansion is implemented in the preexisting "expand-fp"
pass. It replaces the expansion of "frem" in ISel for the amdgpu target;
it is enabled for targets which do not directly support "frem" and for
which no matching "fmod" LibCall is available.
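A hedged, standalone illustration (not the algorithm the pass uses) of why a dedicated, correctly rounded expansion matters: the naive rewrite rounds at every intermediate step, while `fmod` computes the remainder exactly.

```cpp
#include <cmath>
#include <cstdio>

// Naive frem-style expansion: both the division and the multiplication round,
// so the result can be far from the true remainder.
static float naiveFrem(float x, float y) {
  return x - std::trunc(x / y) * y;
}

int main() {
  float x = 1.0e8f, y = 0.3f;
  // fmod is computed exactly for these operands, so it serves as the reference.
  std::printf("naive: %g\n", naiveFrem(x, y));
  std::printf("fmod : %g\n", std::fmod(x, y));
  return 0;
}
```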
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
|
|
This patch introduces `SCEVDivisionPrinterPass` and registers it under
the name `print<scev-division>`, primarily for testing purposes. This
pass invokes `SCEVDivision::divide` upon encountering `sdiv`, and prints
the numerator, denominator, quotient, and remainder. It also adds
several test cases, some of which are currently incorrect and require
fixing.
Along with that, this patch adds some comments to clarify the behavior
of `SCEVDivision::divide`, as follows:
- This function does NOT actually perform the division
- Given the `Numerator` and `Denominator`, find a pair
`(Quotient, Remainder)` s.t.
`Numerator = Quotient * Denominator + Remainder`
- The common condition `Remainder < Denominator` is NOT necessarily
required
- There may be multiple solutions for `(Quotient, Remainder)`, and this
function finds one of them
- In particular, there is always a trivial solution `(0, Numerator)`
- The following computations may wrap
- The multiplication of `Quotient` and `Denominator`
- The addition of `Quotient * Denominator` and `Remainder`
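As a plain-integer illustration of the contract above (not SCEV-specific): for `Numerator = 7` and `Denominator = 3`, both `(Quotient, Remainder) = (2, 1)` and the trivial `(0, 7)` satisfy `Numerator = Quotient * Denominator + Remainder`; `divide` returns one such pair, and `Remainder < Denominator` is not guaranteed.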
Related discussion: #154745
|
|
-print-before-pass-number/-print-after-pass-number options allow multiple pass numbers to be specified (#155228)
`-print-before` and `-print-after` support multiple passes as a list of
strings, so it makes sense that we also support
`-print-before-pass-number` and `-print-after-pass-number` taking a list
of pass numbers as input. This is useful if you want to print out the
IR before/after specific passes, using the pass numbers reported by
`-print-pass-numbers`, in a single run.
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
Reverts llvm/llvm-project#148758
[As requested.](https://github.com/llvm/llvm-project/pull/148758#pullrequestreview-3076627201)
|
|
This patch implements the `llvm.loop.estimated_trip_count` metadata
discussed in [[RFC] Fix Loop Transformations to Preserve Block
Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785).
As [suggested in the RFC
comments](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785/4),
it adds the new metadata to all loops at the time of profile ingestion
and estimates each trip count from the loop's `branch_weights` metadata.
As [suggested in the PR #128785
review](https://github.com/llvm/llvm-project/pull/128785#discussion_r2151091036),
it does so via a new `PGOEstimateTripCountsPass` pass, which creates the
new metadata for each loop but omits the value if it cannot estimate a
trip count due to the loop's form.
An important observation not previously discussed is that
`PGOEstimateTripCountsPass` *often* cannot estimate a loop's trip count,
but later passes can sometimes transform the loop in a way that makes it
possible. Currently, such passes do not necessarily update the metadata,
but eventually that should be fixed. Until then, if the new metadata has
no value, `llvm::getLoopEstimatedTripCount` disregards it and tries
again to estimate the trip count from the loop's current
`branch_weights` metadata.
|
|
When compiling in `--hipstdpar` mode, the builtins corresponding to the
standard library might end up in code that is expected to execute on the
accelerator (e.g. by using the `std::` prefixed functions from
`<cmath>`). We do not have uniform handling for this in AMDGPU, and the
errors that result are quite arcane. Furthermore, the user-space changes
required to work around this tend to be rather intrusive.
This patch adds an additional `--hipstdpar`-specific pass which forwards
the intrinsics / libcalls that result from the use of the math builtins,
and which are not properly handled, to the runtime component of
HIPSTDPAR. In the long run we will want to stop relying on this and
handle things in the compiler, but that is going to be a rather lengthy
journey, which makes this medium-term escape hatch necessary.
The paired change in the runtime component is here:
<https://github.com/ROCm/rocThrust/pull/551>.
|
|
ObjCARCAPElimPass has been made obsolete now that we remove unused
autorelease pools.
|
|
Add two passes: one to inject `MD_prof` and one to check its presence. A subsequent patch will add these (similar to debugify) to `opt` (and, eventually, a variant of this to `llc`).
Tracking issue: #147390
|
|
This allows for unit testing of finalizeBundle with standard MIR tests
using update_mir_test_checks.py.
|
|
Enabling one of MemorySSA or MD implies the other is off.
Already approved in https://github.com/llvm/llvm-project/pull/149473 but
I had to revert as I missed updating one test.
|
|
This reverts commit 60d2d94db253a9fdc7bd111120c803f808564b30.
|
|
Enabling one of MemorySSA or MD implies the other is off.
|
|
same as https://github.com/llvm/llvm-project/pull/139517
This replaces the InvalidateAnalysisPass<MachineFunctionAnalysis> pass.
There are no cross-function analysis requirements right now, so clearing
all analyses works for the last pass in the pipeline.
Having the InvalidateAnalysisPass<MachineFunctionAnalysis>() is causing
a problem with ModuleToCGSCCPassAdaptor by deleting machine functions
for other functions and ending up with exactly one correctly compiled
MF, while the rest vanish.
This is because ModuleToCGSCCPassAdaptor propagates PassPA (received from
the CGSCCToFunctionPassAdaptor that runs the actual codegen pipeline on
MFs) to the next SCC. That causes MFA invalidation on functions in the
next SCC.
For us, PassPA happens to be returned from
invalidate<machine-function-analysis> which abandons the
MachineFunctionAnalysis. So while the first function runs through the
pipeline normally, invalidate also deletes the functions in the next SCC
before its pipeline is run. (This seems to be the intended mechanism of
the CG adaptor to allow cross-SCC invalidations.)
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
This commit adds a new pass gate that allows selective disabling
of one or more passes via the clang command line using the
`-opt-disable` option. Passes to be disabled should be specified as a
comma-separated list of their names.
The implementation resides in the same file as the bisection tool. The
`getGlobalPassGate()` function returns the currently enabled gate.
Example: `-opt-disable="PassA,PassB"`
Pass names are matched using case-insensitive comparisons. However, note
that special characters, including spaces, must be included exactly as
they appear in the pass names.
Additionally, a `-opt-disable-enable-verbosity` flag has been introduced to
enable verbose output when this functionality is in use. When enabled,
it prints the status of all passes (either running or NOT running),
similar to the default behavior of `-opt-bisect-limit`. This flag is
disabled by default, which is the opposite of the `-opt-bisect-verbose`
flag (which defaults to enabled).
To validate this functionality, a test file has also been provided. It reuses
the same infrastructure as the opt-bisect test, but disables three
specific passes and checks the output to ensure the expected behavior.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
same as https://github.com/llvm/llvm-project/pull/138829
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
same as https://github.com/llvm/llvm-project/pull/138828.
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
|
|
|
|
This will be useful for testing the set of calls for different systems,
and eventually the result of applying context-specific modifiers. In
the future we should also know the type signatures, and be able to
emit the correct one.
|
|
This allows `machine-function(p1,machine-function(...))` instead of
erroring.
Effectively it is flattened to a single MFPM.
|
|
As mentioned in https://github.com/llvm/llvm-project/pull/145071,
LoopInterchange should be part of the optimization pipeline rather than
the simplification pipeline. This patch moves LoopInterchange into the
optimization pipeline.
More context:
- By default, LoopInterchange attempts to improve data locality,
however, it also takes increasing vectorization opportunities into
account. Given that, it is reasonable to run it as close to
vectorization as possible.
- I looked into previous changes related to the placement of
LoopInterchange, but couldn’t find any strong motivation suggesting that
it benefits other simplifications.
- As far as I have tested (including llvm-test-suite), removing
LoopInterchange from the simplification pipeline does not affect other
simplifications. Therefore, there doesn't seem to be much value in
keeping it there.
- The new position reduces compile-time for ThinLTO, probably because it
only runs once per function in post-link optimization, rather than both
in pre-link and post-link optimization.
I haven't encountered any cases where the positional difference affects
optimization results, so please feel free to revert if you run into any issues.
|
|
There are a handful of passes in PassRegistry.def with outdated or
missing pass options. These strings describing pass options are used for
the printPassNames() function only, which is likely why they have gotten
out of date without being caught. This PR simply changes the few passes
where the option string is out of date, fixing the output of
`-print-passes`. This does not affect the functionality of the pipeline
parser, and is hard to verify in a unit test, so no tests were added.
|
|
|
|
scaling (#143986)
Scale opcodes, types, and args once in `IR2VecVocabAnalysis` so that we can avoid scaling each time while computing embeddings. This PR refactors the vocabulary to explicitly define three sections (Opcodes, Types, and Arguments) used for computing embeddings.
(Tracking issue - #141817 ; partly fixes - #141832)
|
|
Change `ModulePass::skipModule` to take const Module reference.
Additionally, make `OptPassGate::shouldRunPass` const as well, since for
most implementations it's a const query. For `OptBisect`, make
`LastBisectNum` mutable so it can be updated in `shouldRunPass`.
Additional minor cleanup: Change all StringRef arguments to simple
StringRef (no const or reference), change `OptBisect::Disabled` to
constexpr.
|
|
Expose the FatLTO pipeline via `-passes="fatlto-pre-link<Ox>"`, similar
to all the other optimization pipelines. This is to allow reproducing it
outside clang. (Possibly also useful for C API users.)
|
|
|
|
Pipelines like `-passes="default<O3>"` are currently parsed in a special
way. Switch them to work like normal, parameterized module passes.
|
|
The entry count of a function needs to be updated after a callsite is elided by TRE: before elision, the entry count accounted for the recursive call at that callsite. After TRE, we need to remove that callsite's contribution.
This patch enables this for instrumented profiling cases because, there, we know the function entry count captured entries before TRE. We cannot currently address this for sample-based profiling (because we don't know whether this function was TRE-ed in the binary that donated the samples).
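As a hedged numeric illustration (the numbers are made up): if the instrumented entry count is `1000` and the elided recursive callsite contributed `600` entries, the post-TRE entry count should become `1000 - 600 = 400`, since those entries no longer occur through a call.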
|
|
This is a fairly exotic pass; I don't think it makes a lot of sense to
run it at O1, especially as vectorization wouldn't run at O1 anyway.
|
|
Make HashRecognize a non-PassManager analysis that can be called to get
the result on-demand, creating a new getResult() entry-point. The issue
was discovered when attempting to use the analysis to perform a
transform in LoopIdiomRecognize.
|
|
This pass figures out whether inlining has exposed a constant address to
a lowered type test, and removes the test if the address is known
to pass it. Unfortunately this pass ends up needing to reverse
engineer what LowerTypeTests did; this is currently inherent to the design
of ThinLTO importing where LowerTypeTests needs to run at the start.
Reviewers: teresajohnson
Reviewed By: teresajohnson
Pull Request: https://github.com/llvm/llvm-project/pull/141327
|