diff options
author | Arthur Eubanks <aeubanks@google.com> | 2024-06-11 10:50:13 -0600 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-06-11 09:50:13 -0700 |
commit | 71497cc7a4695d22fc5dfd64958744816c15a19e (patch) | |
tree | d11ee580a32384decfe7fb2845b829128ff06deb /llvm/lib/Analysis/CGSCCPassManager.cpp | |
parent | a03e93e1b2172791085f3f8c293b8e5d6ed8d841 (diff) | |
download | llvm-71497cc7a4695d22fc5dfd64958744816c15a19e.zip llvm-71497cc7a4695d22fc5dfd64958744816c15a19e.tar.gz llvm-71497cc7a4695d22fc5dfd64958744816c15a19e.tar.bz2 |
[CGSCC] Fix compile time blowup with large RefSCCs (#94815)
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.
There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().
1) Since LazyCallGraph can deal with spurious (edges that exist in the
graph but not in the IR) ref edges, we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().
2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.
Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.
Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
Diffstat (limited to 'llvm/lib/Analysis/CGSCCPassManager.cpp')
-rw-r--r-- | llvm/lib/Analysis/CGSCCPassManager.cpp | 42 |
1 files changed, 9 insertions, 33 deletions
diff --git a/llvm/lib/Analysis/CGSCCPassManager.cpp b/llvm/lib/Analysis/CGSCCPassManager.cpp index 8ae5c3d..2ed1d98 100644 --- a/llvm/lib/Analysis/CGSCCPassManager.cpp +++ b/llvm/lib/Analysis/CGSCCPassManager.cpp @@ -158,10 +158,12 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) { SmallDenseSet<std::pair<LazyCallGraph::Node *, LazyCallGraph::SCC *>, 4> InlinedInternalEdges; + SmallVector<Function *, 4> DeadFunctions; + CGSCCUpdateResult UR = { - RCWorklist, CWorklist, InvalidRefSCCSet, - InvalidSCCSet, nullptr, PreservedAnalyses::all(), - InlinedInternalEdges, {}}; + RCWorklist, CWorklist, InvalidRefSCCSet, + InvalidSCCSet, nullptr, PreservedAnalyses::all(), + InlinedInternalEdges, DeadFunctions, {}}; // Request PassInstrumentation from analysis manager, will use it to run // instrumenting callbacks for the passes later. @@ -340,6 +342,10 @@ ModuleToPostOrderCGSCCPassAdaptor::run(Module &M, ModuleAnalysisManager &AM) { } while (!RCWorklist.empty()); } + CG.removeDeadFunctions(DeadFunctions); + for (Function *DeadF : DeadFunctions) + DeadF->eraseFromParent(); + #if defined(EXPENSIVE_CHECKS) // Verify that the call graph is still valid. CG.verify(); @@ -1030,36 +1036,6 @@ static LazyCallGraph::SCC &updateCGAndAnalysisManagerForPass( return true; }); - // Now do a batch removal of the internal ref edges left. - auto NewRefSCCs = RC->removeInternalRefEdge(N, DeadTargets); - if (!NewRefSCCs.empty()) { - // The old RefSCC is dead, mark it as such. - UR.InvalidatedRefSCCs.insert(RC); - - // Note that we don't bother to invalidate analyses as ref-edge - // connectivity is not really observable in any way and is intended - // exclusively to be used for ordering of transforms rather than for - // analysis conclusions. - - // Update RC to the "bottom". - assert(G.lookupSCC(N) == C && "Changed the SCC when splitting RefSCCs!"); - RC = &C->getOuterRefSCC(); - assert(G.lookupRefSCC(N) == RC && "Failed to update current RefSCC!"); - - // The RC worklist is in reverse postorder, so we enqueue the new ones in - // RPO except for the one which contains the source node as that is the - // "bottom" we will continue processing in the bottom-up walk. - assert(NewRefSCCs.front() == RC && - "New current RefSCC not first in the returned list!"); - for (RefSCC *NewRC : llvm::reverse(llvm::drop_begin(NewRefSCCs))) { - assert(NewRC != RC && "Should not encounter the current RefSCC further " - "in the postorder list of new RefSCCs."); - UR.RCWorklist.insert(NewRC); - LLVM_DEBUG(dbgs() << "Enqueuing a new RefSCC in the update worklist: " - << *NewRC << "\n"); - } - } - // Next demote all the call edges that are now ref edges. This helps make // the SCCs small which should minimize the work below as we don't want to // form cycles that this would break. |