| author | Kazu Hirata <kazu@google.com> | 2024-06-07 17:25:57 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-06-07 17:25:57 -0700 |
| commit | dc3f8c2f587e9647d1ce86426c2cf317c98f453c (patch) | |
| tree | 23b30b42ffc526c74b34740615231aadaf96aa21 /llvm/unittests/ProfileData/MemProfTest.cpp | |
| parent | 435dd9746107e13c2ad019be3bd34815f7d2360d (diff) | |
[memprof] Improve deserialization performance in V3 (#94787)
We call llvm::sort in a couple of places in the V3 encoding:
- We sort Frames by FrameIds for stability of the output.
- We sort call stacks in the dictionary order to maximize the length
of the common prefix between adjacent call stacks.
It turns out that we can improve the deserialization performance by
modifying the comparison functions -- without changing the format at
all. Both places take advantage of the histogram of Frames -- how
many times each Frame occurs in the call stacks.
- Frames: We serialize popular Frames in the descending order of
popularity for improved cache locality. For two equally popular
Frames, we break a tie by serializing one that tends to appear
earlier in call stacks. Here, "earlier" means a smaller index
within llvm::SmallVector<FrameId>.
- Call Stacks: We sort the call stacks to reduce the number of times
we follow pointers to parents during deserialization. Specifically,
instead of comparing two call stacks in the strcmp style (integer
comparisons of FrameIds), we compare two FrameIds F1 and F2 at
corresponding indexes with Histogram[F1] < Histogram[F2]. Since we
encode from the end of the sorted list of call stacks, we tend to
encode popular call stacks first.
Since both places use the same histogram, we compute it once and
share it between them.
Sorting the call stacks reduces the number of "jumps" by 74% when we
deserialize all MemProfRecords. The cycle and instruction counts go
down by 10% and 1.5%, respectively.
If we sort the Frames in addition to the call stacks, then the cycle
and instruction counts go down by 14% and 1.6%, respectively, relative
to the same baseline (that is, without this patch).
Diffstat (limited to 'llvm/unittests/ProfileData/MemProfTest.cpp')
-rw-r--r-- | llvm/unittests/ProfileData/MemProfTest.cpp | 24 |
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/llvm/unittests/ProfileData/MemProfTest.cpp b/llvm/unittests/ProfileData/MemProfTest.cpp
index 2642120..15eb59e 100644
--- a/llvm/unittests/ProfileData/MemProfTest.cpp
+++ b/llvm/unittests/ProfileData/MemProfTest.cpp
@@ -667,8 +667,12 @@ TEST(MemProf, MissingFrameId) {
 TEST(MemProf, RadixTreeBuilderEmpty) {
   llvm::DenseMap<FrameId, llvm::memprof::LinearFrameId> MemProfFrameIndexes;
   llvm::MapVector<CallStackId, llvm::SmallVector<FrameId>> MemProfCallStackData;
+  llvm::DenseMap<llvm::memprof::FrameId, llvm::memprof::FrameStat>
+      FrameHistogram =
+          llvm::memprof::computeFrameHistogram(MemProfCallStackData);
   llvm::memprof::CallStackRadixTreeBuilder Builder;
-  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes);
+  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes,
+                FrameHistogram);
   ASSERT_THAT(Builder.getRadixArray(), testing::IsEmpty());
   const auto Mappings = Builder.takeCallStackPos();
   ASSERT_THAT(Mappings, testing::IsEmpty());
@@ -681,8 +685,12 @@ TEST(MemProf, RadixTreeBuilderOne) {
   llvm::SmallVector<llvm::memprof::FrameId> CS1 = {13, 12, 11};
   llvm::MapVector<CallStackId, llvm::SmallVector<FrameId>> MemProfCallStackData;
   MemProfCallStackData.insert({llvm::memprof::hashCallStack(CS1), CS1});
+  llvm::DenseMap<llvm::memprof::FrameId, llvm::memprof::FrameStat>
+      FrameHistogram =
+          llvm::memprof::computeFrameHistogram(MemProfCallStackData);
   llvm::memprof::CallStackRadixTreeBuilder Builder;
-  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes);
+  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes,
+                FrameHistogram);
   EXPECT_THAT(Builder.getRadixArray(), testing::ElementsAreArray({
                   3U, // Size of CS1,
                   3U, // MemProfFrameIndexes[13]
@@ -704,8 +712,12 @@ TEST(MemProf, RadixTreeBuilderTwo) {
   llvm::MapVector<CallStackId, llvm::SmallVector<FrameId>> MemProfCallStackData;
   MemProfCallStackData.insert({llvm::memprof::hashCallStack(CS1), CS1});
   MemProfCallStackData.insert({llvm::memprof::hashCallStack(CS2), CS2});
+  llvm::DenseMap<llvm::memprof::FrameId, llvm::memprof::FrameStat>
+      FrameHistogram =
+          llvm::memprof::computeFrameHistogram(MemProfCallStackData);
   llvm::memprof::CallStackRadixTreeBuilder Builder;
-  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes);
+  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes,
+                FrameHistogram);
   EXPECT_THAT(Builder.getRadixArray(), testing::ElementsAreArray({
                   2U, // Size of CS1
@@ -738,8 +750,12 @@ TEST(MemProf, RadixTreeBuilderSuccessiveJumps) {
   MemProfCallStackData.insert({llvm::memprof::hashCallStack(CS2), CS2});
   MemProfCallStackData.insert({llvm::memprof::hashCallStack(CS3), CS3});
   MemProfCallStackData.insert({llvm::memprof::hashCallStack(CS4), CS4});
+  llvm::DenseMap<llvm::memprof::FrameId, llvm::memprof::FrameStat>
+      FrameHistogram =
+          llvm::memprof::computeFrameHistogram(MemProfCallStackData);
   llvm::memprof::CallStackRadixTreeBuilder Builder;
-  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes);
+  Builder.build(std::move(MemProfCallStackData), MemProfFrameIndexes,
+                FrameHistogram);
   EXPECT_THAT(Builder.getRadixArray(), testing::ElementsAreArray({
                   4U, // Size of CS1