[MemProf] Control availability of hot/cold operator new from LTO link

Adds an LTO option to indicate that whether we are linking with an allocator that supports hot/cold operator new interfaces. If not, at the start of the LTO backends any existing memprof hot/cold attributes are removed from the IR, and we also remove memprof metadata so that post-LTO inlining doesn't add any new attributes. This is done via setting a new flag in the module summary index. It is important to communicate via the index to the LTO backends so that distributed ThinLTO handles this correctly, as they are invoked by separate clang processes and the combined index is how we communicate information from the LTO link. Specifically, for distributed ThinLTO the LTO related processes look like: ``` # Thin link: $ lld --thinlto-index-only obj1.o ... objN.o -llib ... # ThinLTO backends: $ clang -x ir obj1.o -fthinlto-index=obj1.o.thinlto.bc -c -O2 ... $ clang -x ir objN.o -fthinlto-index=objN.o.thinlto.bc -c -O2 ``` It is during the thin link (lld --thinlto-index-only) that we have visibility into linker dependences and want to be able to pass the new option via -Wl,-supports-hot-cold-new. This will be recorded in the summary indexes created for the distributed backend processes (*.thinlto.bc) and queried from there, so that we don't need to know during those individual clang backends what allocation library was linked. Since in-process ThinLTO and regular LTO also use a combined index, for consistency we query the flag out of the index in all LTO backends. Additionally, when the LTO option is disabled, exit early from the MemProfContextDisambiguation handling performed during LTO, as this is unnecessary. Depends on D149117 and D149192. Differential Revision: https://reviews.llvm.org/D149215
author: Teresa Johnson <tejohnson@google.com> 2023-05-06 07:14:56 -0700
committer: Teresa Johnson <tejohnson@google.com> 2023-05-08 08:02:21 -0700
commit: 176889868024d98db032842bc47b416997d9e349 (patch)
tree: 7468cc29a11423938b46cc00c87bf38a460baaa3 /llvm/lib
parent: efd0b42171e9b3536d02becbbf3738ded2adbcb3 (diff)
download: llvm-176889868024d98db032842bc47b416997d9e349.zip
llvm-176889868024d98db032842bc47b416997d9e349.tar.gz
llvm-176889868024d98db032842bc47b416997d9e349.tar.bz2
5 files changed, 80 insertions, 2 deletions
diff --git a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
index 6461624..a1d7044 100644
--- a/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
+++ b/llvm/lib/Bitcode/Reader/BitcodeReader.cpp
@@ -8067,7 +8067,7 @@ static Expected<bool> getEnableSplitLTOUnitFlag(BitstreamCursor &Stream,
     case bitc::FS_FLAGS: { // [flags]
       uint64_t Flags = Record[0];
       // Scan flags.
-      assert(Flags <= 0xff && "Unexpected bits in flag");
+      assert(Flags <= 0x1ff && "Unexpected bits in flag");
 
       return Flags & 0x8;
     }
diff --git a/llvm/lib/IR/ModuleSummaryIndex.cpp b/llvm/lib/IR/ModuleSummaryIndex.cpp
index 99e5044..7cba03e 100644
--- a/llvm/lib/IR/ModuleSummaryIndex.cpp
+++ b/llvm/lib/IR/ModuleSummaryIndex.cpp
@@ -107,11 +107,13 @@ uint64_t ModuleSummaryIndex::getFlags() const {
     Flags |= 0x40;
   if (withWholeProgramVisibility())
     Flags |= 0x80;
+  if (withSupportsHotColdNew())
+    Flags |= 0x100;
   return Flags;
 }
 
 void ModuleSummaryIndex::setFlags(uint64_t Flags) {
-  assert(Flags <= 0xff && "Unexpected bits in flag");
+  assert(Flags <= 0x1ff && "Unexpected bits in flag");
   // 1 bit: WithGlobalValueDeadStripping flag.
   // Set on combined index only.
   if (Flags & 0x1)
@@ -145,6 +147,10 @@ void ModuleSummaryIndex::setFlags(uint64_t Flags) {
   // Set on combined index only.
   if (Flags & 0x80)
     setWithWholeProgramVisibility();
+  // 1 bit: WithSupportsHotColdNew flag.
+  // Set on combined index only.
+  if (Flags & 0x100)
+    setWithSupportsHotColdNew();
 }
 
 // Collect for the given module the list of function it defines
diff --git a/llvm/lib/LTO/LTO.cpp b/llvm/lib/LTO/LTO.cpp
index fee09fc..de9b8f1 100644
--- a/llvm/lib/LTO/LTO.cpp
+++ b/llvm/lib/LTO/LTO.cpp
@@ -76,6 +76,10 @@ cl::opt<bool> EnableLTOInternalization(
     cl::desc("Enable global value internalization in LTO"));
 }
 
+/// Indicate we are linking with an allocator that supports hot/cold operator
+/// new interfaces.
+extern cl::opt<bool> SupportsHotColdNew;
+
 /// Enable MemProf context disambiguation for thin link.
 extern cl::opt<bool> EnableMemProfContextDisambiguation;
 
@@ -1079,6 +1083,14 @@ Error LTO::run(AddStreamFn AddStream, FileCache Cache) {
     return StatsFileOrErr.takeError();
   std::unique_ptr<ToolOutputFile> StatsFile = std::move(StatsFileOrErr.get());
 
+  // TODO: Ideally this would be controlled automatically by detecting that we
+  // are linking with an allocator that supports these interfaces, rather than
+  // an internal option (which would still be needed for tests, however). For
+  // example, if the library exported a symbol like __malloc_hot_cold the linker
+  // could recognize that and set a flag in the lto::Config.
+  if (SupportsHotColdNew)
+    ThinLTO.CombinedIndex.setWithSupportsHotColdNew();
+
   Error Result = runRegularLTO(AddStream);
   if (!Result)
     Result = runThinLTO(AddStream, Cache, GUIDPreservedSymbols);
@@ -1089,6 +1101,37 @@ Error LTO::run(AddStreamFn AddStream, FileCache Cache) {
   return Result;
 }
 
+void lto::updateMemProfAttributes(Module &Mod,
+                                  const ModuleSummaryIndex &Index) {
+  if (Index.withSupportsHotColdNew())
+    return;
+
+  // The profile matcher applies hotness attributes directly for allocations,
+  // and those will cause us to generate calls to the hot/cold interfaces
+  // unconditionally. If supports-hot-cold-new was not enabled in the LTO
+  // link then assume we don't want these calls (e.g. not linking with
+  // the appropriate library, or otherwise trying to disable this behavior).
+  for (auto &F : Mod) {
+    for (auto &BB : F) {
+      for (auto &I : BB) {
+        auto *CI = dyn_cast<CallBase>(&I);
+        if (!CI)
+          continue;
+        if (CI->hasFnAttr("memprof"))
+          CI->removeFnAttr("memprof");
+        // Strip off all memprof metadata as it is no longer needed.
+        // Importantly, this avoids the addition of new memprof attributes
+        // after inlining propagation.
+        // TODO: If we support additional types of MemProf metadata beyond hot
+        // and cold, we will need to update the metadata based on the allocator
+        // APIs supported instead of completely stripping all.
+        CI->setMetadata(LLVMContext::MD_memprof, nullptr);
+        CI->setMetadata(LLVMContext::MD_callsite, nullptr);
+      }
+    }
+  }
+}
+
 Error LTO::runRegularLTO(AddStreamFn AddStream) {
   // Setup optimization remarks.
   auto DiagFileOrErr = lto::setupLLVMOptimizationRemarks(
@@ -1142,6 +1185,8 @@ Error LTO::runRegularLTO(AddStreamFn AddStream) {
     }
   }
 
+  updateMemProfAttributes(*RegularLTO.CombinedModule, ThinLTO.CombinedIndex);
+
   // If allowed, upgrade public vcall visibility metadata to linkage unit
   // visibility before whole program devirtualization in the optimizer.
   updateVCallVisibilityInModule(*RegularLTO.CombinedModule,
diff --git a/llvm/lib/LTO/LTOBackend.cpp b/llvm/lib/LTO/LTOBackend.cpp
index d574ae5..a18963f 100644
--- a/llvm/lib/LTO/LTOBackend.cpp
+++ b/llvm/lib/LTO/LTOBackend.cpp
@@ -565,6 +565,8 @@ Error lto::thinBackend(const Config &Conf, unsigned Task, AddStreamFn AddStream,
   // the module, if applicable.
   Mod.setPartialSampleProfileRatio(CombinedIndex);
 
+  updateMemProfAttributes(Mod, CombinedIndex);
+
   updatePublicTypeTestCalls(Mod, CombinedIndex.withWholeProgramVisibility());
 
   if (Conf.CodeGenOnly) {
diff --git a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
index e65b9da..a3ca910 100644
--- a/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
+++ b/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp
@@ -104,6 +104,12 @@ static cl::opt<std::string> MemProfImportSummary(
     cl::desc("Import summary to use for testing the ThinLTO backend via opt"),
     cl::Hidden);
 
+// Indicate we are linking with an allocator that supports hot/cold operator
+// new interfaces.
+cl::opt<bool> SupportsHotColdNew(
+    "supports-hot-cold-new", cl::init(false), cl::Hidden,
+    cl::desc("Linking with hot/cold operator new interfaces"));
+
 /// CRTP base for graphs built from either IR or ThinLTO summary index.
 ///
 /// The graph represents the call contexts in all memprof metadata on allocation
@@ -3190,6 +3196,17 @@ bool MemProfContextDisambiguation::processModule(
   if (ImportSummary)
     return applyImport(M);
 
+  // TODO: If/when other types of memprof cloning are enabled beyond just for
+  // hot and cold, we will need to change this to individually control the
+  // AllocationType passed to addStackNodesForMIB during CCG construction.
+  // Note that we specifically check this after applying imports above, so that
+  // the option isn't needed to be passed to distributed ThinLTO backend
+  // clang processes, which won't necessarily have visibility into the linker
+  // dependences. Instead the information is communicated from the LTO link to
+  // the backends via the combined summary index.
+  if (!SupportsHotColdNew)
+    return false;
+
   ModuleCallsiteContextGraph CCG(M, OREGetter);
   return CCG.process();
 }
@@ -3241,6 +3258,14 @@ void MemProfContextDisambiguation::run(
     ModuleSummaryIndex &Index,
     function_ref<bool(GlobalValue::GUID, const GlobalValueSummary *)>
         isPrevailing) {
+  // TODO: If/when other types of memprof cloning are enabled beyond just for
+  // hot and cold, we will need to change this to individually control the
+  // AllocationType passed to addStackNodesForMIB during CCG construction.
+  // The index was set from the option, so these should be in sync.
+  assert(Index.withSupportsHotColdNew() == SupportsHotColdNew);
+  if (!SupportsHotColdNew)
+    return;
+
   IndexCallsiteContextGraph CCG(Index, isPrevailing);
   CCG.process();
 }
author	Teresa Johnson <tejohnson@google.com>	2023-05-06 07:14:56 -0700
committer	Teresa Johnson <tejohnson@google.com>	2023-05-08 08:02:21 -0700
commit	176889868024d98db032842bc47b416997d9e349 (patch)
tree	7468cc29a11423938b46cc00c87bf38a460baaa3 /llvm/lib
parent	efd0b42171e9b3536d02becbbf3738ded2adbcb3 (diff)
download	llvm-176889868024d98db032842bc47b416997d9e349.zip llvm-176889868024d98db032842bc47b416997d9e349.tar.gz llvm-176889868024d98db032842bc47b416997d9e349.tar.bz2