riscv-gnu-toolchain/llvm.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
8 days	[SSAUpdaterBulk] Add PHI simplification pass. (#150936)	Valery Pykhtin	1	-0/+220
	This optimization is performed as a separate pass over newly inserted PHI nodes to simplify and deduplicate them. By processing PHIs separately, we avoid the complexity of tracking reference bookkeeping needed to update BBValueInfo structures during insertion.
13 days	[llvm][DebugInfo][NFC] Abstract DICompileUnit::SourceLanguage to allow ↵	Michael Buch	1	-11/+13
	alternate DWARF SourceLanguage encoding (#162255) This patch sets up `DICompileUnit` to support the DWARFv6 `DW_AT_language_name` and `DW_AT_language_version` attributes (which are set to replace `DW_AT_language`). This patch changes the `DICompileUnit::SourceLanguage` field type to a `DISourceLanguageName` that encapsulates the notion of "versioned vs. unversioned name". A "versioned" name is one that has an associated version stored separately in `DISourceLanguageName::Version`. This patch just changes all the clients of the `getSourceLanguage` API to the expect a `DISourceLanguageName`. Currently they all just `assert` (via `DISourceLanguageName::getUnversionedName`) that we're dealing with "unversioned names" (i.e., the pre-DWARFv6 language codes). In follow-up patches (e.g., draft is at https://github.com/llvm/llvm-project/pull/162261), when we start emitting versioned language codes, the `getUnversionedName` calls can then be adjusted to `getName`. Implementation considerations * We could have added a new member to `DICompileUnit` alongside the existing `SourceLanguage` field. I don't think this would have made the transition any simpler (clients would still need to be aware of "versioned" vs. "unversioned" language names). I felt that encapsulating this inside a `DISourceLanguageName` was easier to reason about for maintainers. * Currently DISourceLanguageName is a `12` byte structure. We could probably pack all the info inside a `uint64_t` (16-bits for the name, 32-bits for the version, 1-bit for answering the `hasVersionedName`). Just to keep the prototype simple I used a `std::optional`. But since the guts of the structure are hidden, we can always change the layout to a more compact representation instead. How to review * The new `DISourceLanguageName` structure is defined in `DebugInfoMetadata.h`. All the other changes fall out from changing the `DICompileUnit::SourceLanguage` from `unsigned` to `DISourceLanguageName`.
2025-09-24	Reapply "[Coroutines] Add llvm.coro.is_in_ramp and drop return value of ↵	Weibo He	1	-3/+3
	llvm.coro.end (#155339)" (#159278) As mentioned in #151067, current design of llvm.coro.end mixes two functionalities: querying where we are and lowering to some code. This patch separate these functionalities into independent intrinsics by introducing a new intrinsic llvm.coro.is_in_ramp. Update a test in inline/ML, Reapply #155339
2025-09-16	Add DebugSSAUpdater class to track debug value liveness (#135349)	Stephen Tozer	2	-0/+220
	This patch adds a class that uses SSA construction, with debug values as definitions, to determine whether and which debug values for a particular variable are live at each point in an IR function. This will be used by the IR reader of llvm-debuginfo-analyzer to compute variable ranges and coverage, although it may be applicable to other debug info IR analyses.
2025-09-11	[PGO] Add llvm.loop.estimated_trip_count metadata (#152775)	Joel E. Denny	1	-0/+53
	This patch implements the `llvm.loop.estimated_trip_count` metadata discussed in [[RFC] Fix Loop Transformations to Preserve Block Frequencies](https://discourse.llvm.org/t/rfc-fix-loop-transformations-to-preserve-block-frequencies/85785). As the RFC explains, that metadata enables future patches, such as PR #128785, to fix block frequency issues without losing estimated trip counts.
2025-08-29	[DirectX] Make dx.RawBuffer an op that can't be replaced (#154620)	Farzon Lotfi	1	-0/+8
	fixes #152348 SimplifyCFG collapses raw buffer store from a if\else load into a select. This change prevents the TargetExtType dx.Rawbuffer from being replace thus preserving the if\else blocks. A further change was needed to eliminate the phi node before we process Intrinsic::dx_resource_getpointer in DXILResourceAccess.cpp
2025-08-08	[KeyInstr] Remove LLVM_EXPERIMENTAL_KEY_INSTRUCTIONS CMake flag (#152735)	Orlando Cazalet-Hyams	1	-4/+0
	The CMake flag has been on by default for a month without any issues. This makes the feature support in LLVM unconditional (but does not enable the feature by default).
2025-07-21	[DebugInfo] Remove intrinsic-flavours of findDbgUsers (#149816)	Jeremy Morse	1	-17/+6
	This is one of the final remaining debug-intrinsic specific codepaths out there, and pieces of cross-LLVM infrastructure to do with debug intrinsics.
2025-07-18	[DebugInfo] Shave even more users of DbgVariableIntrinsic from LLVM (#149136)	Jeremy Morse	2	-64/+1
	At this stage I'm just opportunistically deleting any code using debug-intrinsic types, largely adjacent to calls to findDbgUsers. I'll get to deleting that in probably one or more two commits.
2025-07-15	[DebugInfo][RemoveDIs] Suppress getNextNonDebugInfoInstruction (#144383)	Jeremy Morse	1	-7/+7
	There are no longer debug-info instructions, thus we don't need this skipping. Horray!
2025-06-30	[KeyInstr] Add DISubprogram::keyInstructions bit (#144107)	Orlando Cazalet-Hyams	1	-2/+2
	Patch 1/4 adding bitcode support. Store whether or not a function is using Key Instructions in its DISubprogram so that we don't need to rely on the -mllvm flag -dwarf-use-key-instructions to determine whether or not to interpret Key Instructions metadata to decide is_stmt placement at DWARF emission time. This makes bitcode support simple and enables well defined mixing of non-key-instructions and key-instructions functions in an LTO context. This patch adds the bit (using DISubprogram::SubclassData1). PR 144104 and 144103 use it during DWARF emission. PR 44102 adds bitcode support. See pull request for overview of alternative attempts.
2025-06-26	TargetLibraryInfo: Delete default TargetLibraryInfoImpl constructor (#145826)	Matt Arsenault	6	-11/+12
	It should not be possible to construct one without a triple. It would also be nice to delete TargetLibraryInfoWrapperPass, but that is more difficult.
2025-05-19	[AMDGPU] Set AS8 address width to 48 bits	Alexander Richardson	1	-1/+1
	Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logic conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
2025-05-12	[LLVM][Transforms] Add unit test for resolved cloning bug (#139223)	Christian Ulmann	1	-0/+19
	This commit introduces a new unit test that covers a block address cloning bug. Specifically, this test covers the bug tracked in http://github.com/llvm/llvm-project/issues/47769 which has been resolved in the meantime.
2025-05-06	[IR] Remove the AtomicMem*Inst helper classes (#138710)	Philip Reames	1	-4/+4
	Migrate their usage to the `AnyMemInst` family, and add a isAtomic() query on the base class for that hierarchy. This matches the idioms we use for e.g. isAtomic on load, store, etc.. instructions, the existing isVolatile idioms on mem routines, and allows us to more easily share code between atomic and non-atomic variants. As with #138568, the goal here is to simplify the class hierarchy and make it easier to reason about. I'm moving from easiest to hardest, and will stop at some point when I hit "good enough". Longer term, I'd sorta like to merge or reverse the naming on the plain MemInst and the AnyMemInst, but that's a much larger and more risky change. Not sure I'm going to actually do that.
2025-05-06	[NFC][KeyInstr] Add Atom Group (re)mapping (#133479)	Orlando Cazalet-Hyams	1	-0/+104
	Add: mapAtomInstance - map the atom group number to a new group. RemapSourceAtom - apply the mapped atom group number to this instruction. Modify: CloneBasicBlock - Call mapAtomInstance on cloned instruction's DebugLocs if MapAtoms is true (default). Setting to false could lead to a degraded debugging experience. See code comment. Optimisations like loop unroll that duplicate instructions need to remap source atom groups so that each duplicated source construct instance is considered distinct when determining is_stmt locations. This commit adds the remapping functionality and a unittest. RFC: https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
2025-04-30	[BasicBlockUtils] Remove broken support for eh pads in SplitEdge() (#137816)	Nikita Popov	1	-16/+2
	d81d9e8b8604c85709de0a22bb8dd672a28f0401 changed SplitEdge() to make use of ehAwareSplitEdge() for critical edges where the target is an eh pad. However, the implementation is incorrect at least for landing pads. What is currently produced for the code in the modified unit test is something like this: continue: invoke void @sink() to label %normal unwind label %new_bb new_bb: %cp = cleanuppad within %exception [] cleanupret from %cp unwind label %exception exception: %cleanup = landingpad i8 cleanup br label %trivial-eh-handler This mixes different exception handling mechanisms in a nonsensical way, and is not well-formed IR. To actually "split" the landingpad edge (for a rather loose definition of "split"), I think we'd have to generate something along these lines: exception.split: %cleanup.split = landingpad i8 cleanup br label %exception.cont exception: %cleanup.orig = landingpad i8 cleanup br label %exception.cont exception.cont: %cleanup = phi i8 [ %cleanup.split, %exception_split ], [ %cleanup.orig, %exception ] I didn't bother actually implementing that, seeing as how nobody noticed the existing codegen being broken in the last four years, so clearly nobody actually needs this function to work with EH edges. Just return nullptr instead.
2025-04-26	Remove a debugging message left in a unit test. (#137451)	Abid Qadeer	1	-1/+0
	This should fix the regression reported in https://github.com/llvm/llvm-project/pull/136016#issuecomment-2831987288
2025-04-26	[CodeExtractor] Improve debug info for input values. (#136016)	Abid Qadeer	1	-0/+84
	If we use `CodeExtractor` to extract the block1 into a new function, ``` define void @foo() !dbg !2 { entry: %1 = alloca i32, i64 1, align 4 %2 = alloca i32, i64 1, align 4 #dbg_declare(ptr %1, !8, !DIExpression(), !1) br label %block1 block1: store i32 1, ptr %1, align 4 store i32 2, ptr %2, align 4 #dbg_declare(ptr %2, !10, !DIExpression(), !1) ret void } ``` it will look like the extracted function shown below (with some irrelevent details removed). ``` define internal void @extracted(ptr %arg0, ptr %arg1) { newFuncRoot: br label %block1 block1: store i32 1, ptr %arg0, align 4 store i32 2, ptr %arg1, align 4 ret void } ``` You will notice that it has replaced the usage of values that were in the parent function (%1 and %2) with the arguments to the new function. But it did not do the same thing with `#dbg_declare` which was simply dropped because its location pointed to a value outside of the new function. Similarly arg0 is without any debug record, although the value that it replaced had one and we could materialize one for it based on that. This is not just a theoretical limitations. `CodeExtractor` is used to create functions that implement many of the `OpenMP` constructs in `OMPIRBuilder`. As a result of these limitations, the debug information is missing from the created functions. This PR tries to address this problem. It iterates over the input to the extracted function and looks at their debug uses. If they were present in the new function, it updates their location. Otherwise it materialize a similar usage in the new function. Most of these changes are localized in `fixupDebugInfoPostExtraction`. Only other change is to propagate function inputs and the replacement values to it. --------- Co-authored-by: Tim Gymnich <tim@gymni.ch> Co-authored-by: Michael Kruse <llvm-project@meinersbur.de>
2025-04-09	[DebugInfo][RemoveDIs] Eliminate another debug-info variation flag (#133917)	Jeremy Morse	1	-78/+6
	The "preserve input debug-info format" flag allowed some tooling to opt into not seeing the new debug records yet, and to not autoupgrade. This was good at the time, but un-necessary now that we'll be ditching intrinsics shortly. It also hides errors now: verify-uselistorder was hardcoding this flag to on, and as a result it hasn't seen debug records before. Thus, we missed a uselistorder variation: constant-expressions such as GEPs can be contained within debug records and completely isolated from the value hierachy, see the metadata-use-uselistorder.ll test. These Values didn't get ordered, but were legitimate uses of constants like "i64 0", and we now run into difficulty handling that. The patch to AsmWriter seeks Values to order even through debug-info now. Finally there are a few intrinsics-tests relying on this flag that we can just delete, such as one in llvm-reduce and another few in the LocalTest unit tests. For the fast-isel test, it was added in https://reviews.llvm.org/D67703 explicitly for checking the size of blocks without debug-info and in 1525abb9c94 the codepath it tests moved towards being sunsetted. It'll be totally redundant once RemoveDIs is on permanently. Note that there's now no explicit test for the textual-IR autoupgrade path. I submit that we can rely on the thousands of .ll files where we've only been bothered to update the outputs, not the inputs, to debug records.
2025-04-03	[Utils] Fix incorrect LCSSA PHI nodes when splitting critical edges with ↵	Camsyn	1	-0/+57
	MergeIdenticalEdges (#131744) This PR fixes incorrect LCSSA PHI node generation when splitting critical edges with both `PreserveLCSSA` and `MergeIdenticalEdges` enabled. The bug caused PHI nodes in the split block to miss predecessors when multiple identical edges were merged.
2025-04-01	[DebugInfo][RemoveDIs] Remove debug-intrinsic printing cmdline options (#131855)	Jeremy Morse	1	-7/+1
	During the transition from debug intrinsics to debug records, we used several different command line options to customise handling: the printing of debug records to bitcode and textual could be independent of how the debug-info was represented inside a module, whether the autoupgrader ran could be customised. This was all valuable during development, but now that totally removing debug intrinsics is coming up, this patch removes those options in favour of a single flag (experimental-debuginfo-iterators), which enables autoupgrade, in-memory debug records, and debug record printing to bitcode and textual IR. We need to do this ahead of removing the experimental-debuginfo-iterators flag, to reduce the amount of test-juggling that happens at that time. There are quite a number of weird test behaviours related to this -- some of which I simply delete in this commit. Things like print-non-instruction-debug-info.ll , the test suite now checks for debug records in all tests, and we don't want to check we can print as intrinsics. Or the update_test_checks tests -- these are duplicated with write-experimental-debuginfo=false to ensure file writing for intrinsics is correct, but that's something we're imminently going to delete. A short survey of curious test changes: * free-intrinsics.ll: we don't need to test that debug-info is a zero cost intrinsic, because we won't be using intrinsics in the future. * undef-dbg-val.ll: apparently we pinned this to non-RemoveDIs in-memory mode while we sorted something out; it works now either way. * salvage-cast-debug-info.ll: was testing intrinsics-in-memory get salvaged, isn't necessary now * localize-constexpr-debuginfo.ll: was producing "dead metadata" intrinsics for optimised-out variable values, dbg-records takes the (correct) representation of poison/undef as an operand. Looks like we didn't update this in the past to avoid spurious test differences. * Transforms/Scalarizer/dbginfo.ll: this test was explicitly testing that debug-info affected codegen, and we deferred updating the tests until now. This is just one of those silent gnochange issues that get fixed by RemoveDIs. Finally: I've added a bitcode test, dbg-intrinsics-autoupgrade.ll.bc, that checks we can autoupgrade debug intrinsics that are in bitcode into the new debug records.
2025-03-31	[IRBuilder] Add new overload for CreateIntrinsic (#131942)	Rahul Joshi	1	-2/+2
	Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.
2025-03-21	[SSAUpdaterBulk] Fix incorrect live-in values for a block. (#131762)	Valery Pykhtin	1	-7/+4
	The previous implementation incorrectly calculated incoming values from loop backedges, as demonstrated by the tests. The issue was that it did not distinguish between live-in and live-out values for blocks. This patch addresses the problem and fixes https://github.com/llvm/llvm-project/pull/131761. To avoid bloating storage in `R.Defines`, processing data has been moved to a temporary map `BBInfos`. This change helps manage heap allocation more efficiently and likely improves caching.
2025-03-18	[SSAUpdaterBulk] Add expectedly failing loop tests. (#131761)	Valery Pykhtin	1	-0/+119
	These tests demonstrate the issue in SSAUpdaterBulk when it calculates incoming values from loop back edges. The failures are marked with `EXPECT_NONFATAL_FAILURE`, which is the way to designate an "expected fail" in the Google Test suite.
2025-03-14	[ctxprof] Capture sampling info for context roots (#131201)	Mircea Trofin	1	-0/+4
	When we collect a contextual profile, we sample the threads entering its root and only collect on one at a time (see `ContextRoot::Taken`). If we want to compare profiles between contextual profiles, and/or flat profiles, we have a problem: we don't know how to compare the counter values relative to each other. To that end, we add `ContextRoot::TotalEntries`, which is incremented every time a root is entered and serves as multiplier for the counter values collected under that root. We expose this in the profile and leave the normalization to the user of the profile, for a few reasons: * it's only needed if reasoning about all profiles in aggregate. * the goal, in compiler_rt, is to flush out the profile as quickly as possible, and performing multiplications adds an overhead that may not even be necessary if the consumer of the profile doesn't care about combining profiles * the information itself may be interesting as an indication of relative sampling of various contexts.
2025-03-05	[ctxprof] Prepare profile format for flat profiles (#129626)	Mircea Trofin	1	-28/+30
	The profile format has now a separate section called "Contexts" - there will be a corresponding one for flat profiles. The root has a separate tag because, in addition to not having a callsite ID as all the other context nodes have under it, it will have additional fields in subsequent patches. The rest of this patch amounts to a bit of refactorings in the reader/writer (for better reuse later) and tests fixups.
2025-02-28	[PAC] Make ValueMapper handle ConstantPtrAuth values (#129088)	Anatoly Trosinenko	1	-0/+30
	Fix assertion failure when building PAuth-hardened code with LTO. W/o assertions we end with invalid codegen.
2025-02-13	[reland][DebugInfo] Update DIBuilder insertion to take InsertPosition (#126967)	Harald van Dijk	1	-2/+3
	After #124287 updated several functions to return iterators rather than Instruction , it was no longer straightforward to pass their result to DIBuilder. This commit updates DIBuilder methods to accept an InsertPosition instead, so that they can be called with an iterator (preferred), or with a deprecation warning an Instruction , or a BasicBlock . This commit also updates the existing calls to the DIBuilder methods to pass in iterators. As a special exception, DIBuilder::insertDeclare() keeps a separate overload accepting a BasicBlock InsertAtEnd. This is because despite the name, this method does not insert at the end of the block, therefore this cannot be handled implicitly by using InsertPosition.
2025-02-12	Revert "[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)"	Harald van Dijk	1	-3/+2
	This reverts commit 3ec9f7494b31f2fe51d5ed0e07adcf4b7199def6.
2025-02-12	[DebugInfo] Update DIBuilder insertion to take InsertPosition (#126059)	Harald van Dijk	1	-2/+3
	After #124287 updated several functions to return iterators rather than Instruction , it was no longer straightforward to pass their result to DIBuilder. This commit updates DIBuilder methods to accept an InsertPosition instead, so that they can be called with an iterator (preferred), or with a deprecation warning an Instruction , or a BasicBlock *. This commit also updates the existing calls to the DIBuilder methods to pass in iterators.
2025-02-06	[llvm][DebugInfo] Add new DW_AT_APPLE_enum_kind to encode enum_extensibility ↵	Michael Buch	1	-1/+2
	(#124752) When creating `EnumDecl`s from DWARF for Objective-C `NS_ENUM`s, the Swift compiler tries to figure out if it should perform "swiftification" of that enum (which involves renaming the enumerator cases, etc.). The heuristics by which it determines whether we want to swiftify an enum is by checking the `enum_extensibility` attribute (because that's what `NS_ENUM` pretty much are). Currently LLDB fails to attach the `EnumExtensibilityAttr` to `EnumDecl`s it creates (because there's not enough info in DWARF to derive it), which means we have to fall back to re-building Swift modules on-the-fly, slowing down expression evaluation substantially. This happens around https://github.com/swiftlang/swift/blob/4b3931c8ce437b3f13f245e6423f95c94a5876ac/lib/ClangImporter/ImportEnumInfo.cpp#L37-L59 To speed up Swift exression evaluation, this patch proposes encoding the C/C++/Objective-C `enum_extensibility` attribute in DWARF via a new `DW_AT_APPLE_ENUM_KIND`. This would currently be only used from the LLDB Swift plugin. But may be of interest to other language plugins as well (though I haven't come up with a concrete use-case for it outside of Swift). I'm open to naming suggestions of the various new attributes/attribute constants proposed here. I tried to be as generic as possible if we wanted to extend it to other kinds of enum properties (e.g., flag enums). The new attribute would look as follows: ``` DW_TAG_enumeration_type DW_AT_type (0x0000003a "unsigned int") DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Closed) DW_AT_name ("ClosedEnum") DW_AT_byte_size (0x04) DW_AT_decl_file ("enum.c") DW_AT_decl_line (23) DW_TAG_enumeration_type DW_AT_type (0x0000003a "unsigned int") DW_AT_APPLE_enum_kind (DW_APPLE_ENUM_KIND_Open) DW_AT_name ("OpenEnum") DW_AT_byte_size (0x04) DW_AT_decl_file ("enum.c") DW_AT_decl_line (27) ``` Absence of the attribute means the extensibility of the enum is unknown and abides by whatever the language rules of that CU dictate. This does feel like a big hammer for quite a specific use-case, so I'm happy to discuss alternatives. Alternatives considered: * Re-using an existing DWARF attribute to express extensibility. E.g., a `DW_TAG_enumeration_type` could have a `DW_AT_count` or `DW_AT_upper_bound` indicating the number of enumerators, which could imply closed-ness. I felt like a dedicated attribute (which could be generalized further) seemed more applicable. But I'm open to re-using existing attributes. * Encoding the entire attribute string (i.e., `DW_TAG_LLVM_annotation ("enum_extensibility((open))")`) on the `DW_TAG_enumeration_type`. Then in LLDB somehow parse that out into a `EnumExtensibilityAttr`. I haven't found a great API in Clang to parse arbitrary strings into AST nodes (the ones I've found required fully formed C++ constructs). Though if someone knows of a good way to do this, happy to consider that too.
2025-01-29	[IR] Convert from nocapture to captures(none) (#123181)	Nikita Popov	1	-1/+2
	This PR removes the old `nocapture` attribute, replacing it with the new `captures` attribute introduced in #116990. This change is intended to be essentially NFC, replacing existing uses of `nocapture` with `captures(none)` without adding any new analysis capabilities. Making use of non-`none` values is left for a followup. Some notes: * `nocapture` will be upgraded to `captures(none)` by the bitcode reader. * `nocapture` will also be upgraded by the textual IR reader. This is to make it easier to use old IR files and somewhat reduce the test churn in this PR. * Helper APIs like `doesNotCapture()` will check for `captures(none)`. * MLIR import will convert `captures(none)` into an `llvm.nocapture` attribute. The representation in the LLVM IR dialect should be updated separately.
2025-01-27	[LoopUnroll] Add RuntimeUnrollMultiExit to loop unroll options (NFC) (#124462)	Florian Hahn	1	-1/+1
	Add an extra knob to RuntimeUnrollMultiExit to let backends control whether to allow multi-exit unrolling on a per-loop basis. This gives backends more fine-grained control on deciding if multi-exit unrolling is profitable for a given loop and uarch. Similar to 4226e0a0c75. PR: https://github.com/llvm/llvm-project/pull/124462
2025-01-15	[ctxprof] dump profiles using yaml (for testing) (#123108)	Mircea Trofin	1	-40/+30
	This is a follow-up from PR #122545, which enabled converting yaml to contextual profiles. This change uses the lower level yaml APIs because: - the mapping APIs `llvm::yaml` offers don't work with `const` values, because they (the APIs) want to enable both serialization and deserialization - building a helper data structure would be an alternative, but it'd be either memory-consuming or overly-complex design, given the recursive nature of the contextual profiles.
2025-01-13	[DebugInfo] Map VAM args to `poison` instead of `undef` [NFC] (#122756)	Pedro Lobo	1	-2/+2
	If an argument cannot be mapped in `Mapper::mapValue`, it can be mapped to `poison` instead of `undef`.
2025-01-10	[ctxprof] Move test serialization to yaml (#122545)	Mircea Trofin	1	-1/+1
	We have a textual representation of contextual profiles for test scenarios, mainly. This patch moves that to YAML instead of JSON. YAML is more succinct and readable (some of the .ll tests should be illustrative). In addition, JSON is parse-able by the YAML reader. A subsequent patch will address deserialization. (thanks, @kazutakahirata, for showing me how to use the llvm YAML reader/writer APIs, which I incorrectly thought to be more low-level than the JSON ones!)
2024-12-02	[TTI] Add SCEVExpansionBudget to loop unrolling options. (#118316)	Florian Hahn	1	-1/+1
	Add an extra know to UnrollingPreferences to let backends control the maximum budget for SCEV expansions. This gives backends more fine-grained control on the cost of the runtime checks for runtime unrolling. PR: https://github.com/llvm/llvm-project/pull/118316
2024-11-28	[UnitTests] Move MergeFunction test from Utils to IPO	Nikita Popov	2	-247/+0
	So we depend on the correct library.
2024-11-28	[MergeFunctions] Add support to run the pass over a set of function pointers ↵	Rafael Eckstein	2	-0/+247
	(#111045) This modification will enable the usage of `MergeFunctions` as a standalone library. Currently, `MergeFunctions` can only be applied to an entire module. By adopting this change, developers will gain the flexibility to reuse the `MergeFunctions` code within their own projects, choosing which functions to merge; hence, promoting code reusability. Notice that this modification will not break backward compatibility, because `MergeFunctions` will still work as a pass after the modification.
2024-11-16	[llvm] Replace `UndefValue` placeholders with `PoisonValue` in unit tests ↵	Lee Wei	1	-2/+2
	[NFC] (#116453) This PR replaces all `UndefValue` act as placeholders with `PoisonValue` in `llvm/unittests`.
2024-11-12	[CodeExtractor] Refactor extractCodeRegion, fix alloca emission. (#114419)	Michael Kruse	1	-11/+80
	Reorganize the code into phases: * Analyze/normalize * Create extracted function prototype * Generate the new function's implementation * Generate call to new function * Connect call to original function's CFG The motivation is #114669 to optionally clone the selected code region into the new function instead of moving it. The current structure made it difficult to add such functionality since there was no obvious place to do so, not made easier by some functions doing more than their name suggests. For instance, constructFunction modifies code outside the constructed function, but also function properties such as setPersonalityFn are derived somewhere else. Another example is emitCallAndSwitchStatement, which despite its name also inserts stores for output parameters. Many operations also implicitly depend on the order they are applied which this patch tries to reduce. For instance, ExtractedFuncRetVals becomes the list exit blocks which also defines the return value when leaving via that block. It is computed early such that the new function's return instructions and the switch can be generated independently. Also, ExtractedFuncRetVals is combining the lists ExitBlocks and OldTargets which were not always kept consistent with each other or NumExitBlocks. The method recomputeExitBlocks() will update it when necessary. The coding style partially contradict the current coding standard. For instance some local variable start with lower case letters. I updated some, but not all occurrences to make the diff match at least some lines as unchanged. The patch [D96854](https://reviews.llvm.org/D96854) introduced some confusion of function argument indexes this is fixed here as well, hence the patch is not NFC anymore. Tested in modified CodeExtractorTest.cpp. Patch [D121061](https://reviews.llvm.org/D121061) introduced AllocationBlock, but not all allocas were inserted there. Efectively includes the following fixes: 1. https://github.com/llvm/llvm-project/commit/ce73b1672a6053d5974dc2342881aac02efe2dbb 2. https://github.com/llvm/llvm-project/commit/4aaa92578686176243a294eeb2ca5697a99edcaa 3. Missing allocas, still unfixed Originally submitted as https://reviews.llvm.org/D115218
2024-11-04	[llvm][CodeExtractor] fix bug in parameter naming (#114237)	Tom Eccles	1	-0/+47
	The code extractor tries to apply the names of source input and output values to function arguments. Not all input and output values get added as arguments: some are instead placed inside of a struct passed to the function. The existing renaming code skipped trying to set these struct-packed arguments names (as there is no corresponding function argument to rename), but it still incremented the iterator over the function arguments. This could result in dereferencing an end iterator if struct-packed inputs/outputs preceded non-struct-packed inputs/outputs. This patch rewrites this loop to avoid the end iterator dereference.
2024-10-01	[SCEVExpander] Clear flags when reusing GEP (#109293)	Nikita Popov	1	-0/+32
	As pointed out in the review of #102133, SCEVExpander currently incorrectly reuses GEP instructions that have poison-generating flags set. Fix this by clearing the flags on the reused instruction.
2024-09-21	[unittests] Use {} instead of std::nullopt to initialize empty ArrayRef ↵	Jay Foad	2	-13/+13
	(#109388) Follow up to #109133.
2024-09-13	[llvm][unittests] Strip unneeded use of raw_string_ostream::str() (NFC)	JOE1994	1	-1/+1
	Avoid excess layer of indirection.
2024-09-03	[ctx_prof] Add Inlining support (#106154)	Mircea Trofin	1	-2/+1
	Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant. Post-inlining, the update mainly consists of: - making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions) - in the contextual profile: - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual. - the contexts of the callee (at the inlined callsite) are moved to the caller. - the callee context at the inlined callsite is deleted.
2024-08-27	Fix dynamic linking bots after PR #105469	Mircea Trofin	1	-0/+1

2024-08-27	[ctx_prof] Add support for ICP (#105469)	Mircea Trofin	1	-0/+157
	An overload of `llvm::promoteCallWithIfThenElse` that updates the contextual profile. High-level, this is very simple: after creating the `if... then (direct call) else (indirect call)` structure, we instrument the new callsites and BBs (the instrumentation will help with tracking for other IPO transformations, and, ultimately, to match counter values before flattening to `MD_prof`). In more detail: - move the callsite instrumentation of the indirect call to the `else` BB, before the indirect call - create a new callsite instrumentation for the direct call - create instrumentation for both the `then` and `else` BBs - we could instrument just one (MST-style) but we're not running the binary with this instrumentation, and at most this would save some space (less counters tracked). For simplicity instrumenting both at this point - update each context belonging to the caller by updating the counters, and moving the indirect callee to the new, direct callsite ID Issue #89287
2024-08-12	Clean up pointer casts etc after opaque pointers transition. NFC (#102631)	Bjorn Pettersson	1	-19/+12