diff options
author | Sam McCall <sam.mccall@gmail.com> | 2022-09-27 01:15:06 +0200 |
---|---|---|
committer | Sam McCall <sam.mccall@gmail.com> | 2022-10-05 18:29:01 +0200 |
commit | 41b51007e6376cba72b00fb655a63b06c554d4e1 (patch) | |
tree | 9348e99cdc4bfb627cd6fe4d037fbe43094157fc /clang/lib/Basic/SourceManager.cpp | |
parent | 15aa64301ab146dec7c6ffcd0418ed834bf099e2 (diff) | |
download | llvm-41b51007e6376cba72b00fb655a63b06c554d4e1.zip llvm-41b51007e6376cba72b00fb655a63b06c554d4e1.tar.gz llvm-41b51007e6376cba72b00fb655a63b06c554d4e1.tar.bz2 |
Fix SourceManager::isBeforeInTranslationUnit bug with token-pasting
isBeforeInTranslationUnit compares SourceLocations across FileIDs by
mapping them onto a common ancestor file, following include/expansion edges.
It is possible to get a tie in the common ancestor, because multiple
"chunks" of a macro arg will expand to the same macro param token in the body:
#define ID(X) X
#define TWO 2
ID(1 TWO)
Here two FileIDs both expand into `X` in ID's expansion:
- one containing `1` and spelled on line 3
- one containing `2` and spelled by the macro expansion of TWO
isBeforeInTranslationUnit breaks this tie by comparing the two FileIDs:
the one "on the left" is always created first and is numerically smaller.
This seems correct so far.
Prior to this patch it also takes a shortcut (unclear if intentionally).
Instead of comparing the two FileIDs that directly expand to the same location,
it compares the original FileIDs being compared. These may not be the
same if there are multiple macro expansions in between.
This *almost* always yields the right answer, because macro expansion
yields "trees" of FileIDs allocated in a contiguous range: when comparing tree A
to tree B, it doesn't matter what representative you pick.
However, the splitting of >> tokens is modeled as macro expansion (as if
the first '>' was a macro that expands to a '>' spelled a scratch buffer).
This splitting occurs retroactively when parsing, so the FileID allocated is
larger than expected if it were a real macro expansion performed during lexing.
As a result, macro tree A can be on the left of tree B, and yet contain
a token-split FileID whose numeric value is *greator* than those in B.
In this case the tiebreak gives the wrong answer.
Concretely:
#define ID(X) X
template <typename> class S{};
ID(
ID(S<S<int>> x);
int y;
)
Given Greater = (typeloc of S<int>).getEndLoc();
Y = (decl of y).getLocation();
isBeforeInTranslationUnit(Greater, Y) should return true, but returns false.
Here the common FileID of (Greater, Y) is the body of the outer ID
expansion, and they both expand to X within it.
With the current tiebreak rules, we compare the FileID of Greater (a split)
to the FileID of Y (a macro arg expansion into X of the outer ID).
The former is larger because the token split occurred relatively late.
This patch fixes the issue by removing the shortcut. It tracks the immediate
FileIDs used to reach the common file, and uses these IDs to break ties.
In the example, we now compare the macro arg expansion of the inner ID()
to the macro arg expansion of Y, and find that it is smaller.
This requires some changes to the InBeforeInTUCacheEntry (sic).
We store a little more data so it's probably slightly slower.
It was difficult to resist more invasive changes:
- performance: the sizing is very suspicious, and once the cache "fills up"
we're thrashing a single entry
- API: the class seems to be needlessly complicated
However I tried to avoid mixing these with subtle behavior changes, and
will send a followup instead.
Differential Revision: https://reviews.llvm.org/D134685
Diffstat (limited to 'clang/lib/Basic/SourceManager.cpp')
-rw-r--r-- | clang/lib/Basic/SourceManager.cpp | 77 |
1 files changed, 49 insertions, 28 deletions
diff --git a/clang/lib/Basic/SourceManager.cpp b/clang/lib/Basic/SourceManager.cpp index 24e4aef..7131b50c 100644 --- a/clang/lib/Basic/SourceManager.cpp +++ b/clang/lib/Basic/SourceManager.cpp @@ -1992,6 +1992,7 @@ InBeforeInTUCacheEntry &SourceManager::getInBeforeInTUCache(FileID LFID, // This is a magic number for limiting the cache size. It was experimentally // derived from a small Objective-C project (where the cache filled // out to ~250 items). We can make it larger if necessary. + // FIXME: this is almost certainly full these days. Use an LRU cache? enum { MagicCacheSize = 300 }; IsBeforeInTUCacheKey Key(LFID, RFID); @@ -2000,7 +2001,7 @@ InBeforeInTUCacheEntry &SourceManager::getInBeforeInTUCache(FileID LFID, // use. When they update the value, the cache will get automatically // updated as well. if (IBTUCache.size() < MagicCacheSize) - return IBTUCache[Key]; + return IBTUCache.try_emplace(Key, LFID, RFID).first->second; // Otherwise, do a lookup that will not construct a new value. InBeforeInTUCache::iterator I = IBTUCache.find(Key); @@ -2008,6 +2009,7 @@ InBeforeInTUCacheEntry &SourceManager::getInBeforeInTUCache(FileID LFID, return I->second; // Fall back to the overflow value. + IBTUCacheOverflow.setQueryFIDs(LFID, RFID); return IBTUCacheOverflow; } @@ -2082,43 +2084,62 @@ std::pair<bool, bool> SourceManager::isInTheSameTranslationUnit( // If we are comparing a source location with multiple locations in the same // file, we get a big win by caching the result. - if (IsBeforeInTUCache.isCacheValid(LOffs.first, ROffs.first)) + if (IsBeforeInTUCache.isCacheValid()) return std::make_pair( true, IsBeforeInTUCache.getCachedResult(LOffs.second, ROffs.second)); - // Okay, we missed in the cache, start updating the cache for this query. - IsBeforeInTUCache.setQueryFIDs(LOffs.first, ROffs.first, - /*isLFIDBeforeRFID=*/LOffs.first.ID < ROffs.first.ID); - + // Okay, we missed in the cache, we'll compute the answer and populate it. // We need to find the common ancestor. The only way of doing this is to // build the complete include chain for one and then walking up the chain // of the other looking for a match. - // We use a map from FileID to Offset to store the chain. Easier than writing - // a custom set hash info that only depends on the first part of a pair. - using LocSet = llvm::SmallDenseMap<FileID, unsigned, 16>; - LocSet LChain; + + // A location within a FileID on the path up from LOffs to the main file. + struct Entry { + unsigned Offset; + FileID ParentFID; // Used for breaking ties. + }; + llvm::SmallDenseMap<FileID, Entry, 16> LChain; + + FileID Parent; do { - LChain.insert(LOffs); + LChain.try_emplace(LOffs.first, Entry{LOffs.second, Parent}); // We catch the case where LOffs is in a file included by ROffs and // quit early. The other way round unfortunately remains suboptimal. - } while (LOffs.first != ROffs.first && !MoveUpIncludeHierarchy(LOffs, *this)); - LocSet::iterator I; - while((I = LChain.find(ROffs.first)) == LChain.end()) { - if (MoveUpIncludeHierarchy(ROffs, *this)) - break; // Met at topmost file. - } - if (I != LChain.end()) - LOffs = *I; + if (LOffs.first == ROffs.first) + break; + Parent = LOffs.first; + } while (!MoveUpIncludeHierarchy(LOffs, *this)); - // If we exited because we found a nearest common ancestor, compare the - // locations within the common file and cache them. - if (LOffs.first == ROffs.first) { - IsBeforeInTUCache.setCommonLoc(LOffs.first, LOffs.second, ROffs.second); - return std::make_pair( - true, IsBeforeInTUCache.getCachedResult(LOffs.second, ROffs.second)); - } - // Clear the lookup cache, it depends on a common location. - IsBeforeInTUCache.clear(); + Parent = FileID(); + do { + auto I = LChain.find(ROffs.first); + if (I != LChain.end()) { + // Compare the locations within the common file and cache them. + LOffs.first = I->first; + LOffs.second = I->second.Offset; + // The relative order of LParent and RParent is a tiebreaker when + // - locs expand to the same location (occurs in macro arg expansion) + // - one loc is a parent of the other (we consider the parent as "first") + // For the parent to be first, the invalid file ID must compare smaller. + // However loaded FileIDs are <0, so we perform *unsigned* comparison! + // This changes the relative order of local vs loaded FileIDs, but it + // doesn't matter as these are never mixed in macro expansion. + unsigned LParent = I->second.ParentFID.ID; + unsigned RParent = Parent.ID; + assert((LOffs.second != ROffs.second) || (LParent == 0 || RParent == 0) || + isInSameSLocAddrSpace(getComposedLoc(I->second.ParentFID, 0), + getComposedLoc(Parent, 0), nullptr) && + "Mixed local/loaded FileIDs with same include location?"); + IsBeforeInTUCache.setCommonLoc(LOffs.first, LOffs.second, ROffs.second, + LParent < RParent); + return std::make_pair( + true, IsBeforeInTUCache.getCachedResult(LOffs.second, ROffs.second)); + } + Parent = ROffs.first; + } while (!MoveUpIncludeHierarchy(ROffs, *this)); + + // If we found no match, we're not in the same TU. + // We don't cache this, but it is rare. return std::make_pair(false, false); } |