path: root/clang/unittests/Lex/LexerTest.cpp
Age  Commit message  Author  Files  Lines
2025-06-26[clang] NFC: Add alias for std::pair<FileID, unsigned> used in ↵Haojian Wu1-4/+3
SourceLocation (#145711) Introduce a type alias for the commonly used `std::pair<FileID, unsigned>` to improve code readability, and make it easier for future updates (64-bit source locations).
2025-06-26[clang][Preprocessor] Handle the first pp-token in EnterMainSourceFile (#145244)yronglin1-15/+19
Depends on [[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects](https://github.com/llvm/llvm-project/pull/143898). This PR fixes the performance regression that was introduced in https://github.com/llvm/llvm-project/pull/144233. The original PR (https://github.com/llvm/llvm-project/pull/144233) handled the first pp-token in the main source file in the macro definition/expansion and in `Lexer::Lex`, but because the lexer is almost always on the hot path, this can cause a performance regression. In this PR, we handle the first pp-token in `Preprocessor::EnterMainSourceFile` instead. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
2025-06-21[C++][Modules] A module directive may only appear as the first preprocessing ↵yronglin1-1/+46
tokens in a file (#144233) This PR is the 2nd part of the [P1857R3](https://github.com/llvm/llvm-project/pull/107168) implementation, and mainly implements the restriction `A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)`: [cpp.pre](https://eel.is/c++draft/cpp.pre):
```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
We also refine the tests to use `split-file` instead of conditional macros. Signed-off-by: yronglin <yronglin777@gmail.com>
2025-05-22Reapply "[clang] Remove intrusive reference count from `DiagnosticOptions` ↵Jan Svoboda1-6/+4
(#139584)" This reverts commit e2a885537f11f8d9ced1c80c2c90069ab5adeb1d. Build failures were fixed right away and reverting the original commit without the fixes breaks the build again.
2025-05-22Revert "[clang] Remove intrusive reference count from `DiagnosticOptions` ↵Kazu Hirata1-4/+6
(#139584)" This reverts commit 9e306ad4600c4d3392c194a8be88919ee758425c. Multiple buildbot failures have been reported: https://github.com/llvm/llvm-project/pull/139584
2025-05-22[clang] Remove intrusive reference count from `DiagnosticOptions` (#139584)Jan Svoboda1-6/+4
The `DiagnosticOptions` class is currently intrusively reference-counted, which makes reasoning about its lifetime very difficult in some cases. For example, `CompilerInvocation` owns the `DiagnosticOptions` instance (wrapped in `llvm::IntrusiveRefCntPtr`) and only exposes an accessor returning `DiagnosticOptions &`. One would think this gives `CompilerInvocation` exclusive ownership of the object, but that's not the case:
```c++
void shareOwnership(CompilerInvocation &CI) {
  llvm::IntrusiveRefCntPtr<DiagnosticOptions> CoOwner = &CI.getDiagnosticOptions();
  // ...
}
```
This is a perfectly valid pattern that is being actually used in the codebase. I would like to ensure the ownership of `DiagnosticOptions` by `CompilerInvocation` is guaranteed to be exclusive. This can be leveraged for a copy-on-write optimization later on. This PR changes usages of `DiagnosticOptions` across `clang`, `clang-tools-extra` and `lldb` to not be intrusively reference-counted.
2025-04-28[clang] Hide the `TargetOptions` pointer from `CompilerInvocation` (#106271)Jan Svoboda1-1/+1
This PR hides the reference-counted pointer that holds `TargetOptions` from the public API of `CompilerInvocation`. This gives `CompilerInvocation` an exclusive control over the lifetime of this member, which will eventually be leveraged to implement a copy-on-write behavior. There are two clients that currently share ownership of that pointer: * `TargetInfo` - This was refactored to hold a non-owning reference to `TargetOptions`. The options object is typically owned by the `CompilerInvocation` or by the new `CompilerInstance::AuxTargetOpts` for the auxiliary target. This needed a bit of care in `ASTUnit::Parse()` to keep the `CompilerInvocation` alive. * `clangd::PreambleData` - This was refactored to exclusively own the `TargetOptions` that get moved out of the `CompilerInvocation`.
2025-04-04[clang] Do not share ownership of `PreprocessorOptions` (#133467)Jan Svoboda1-2/+2
This PR makes it so that `CompilerInvocation` is the sole owner of the `PreprocessorOptions` instance.
2025-03-25[clang][lex] Store non-owning options ref in `HeaderSearch` (#132780)Jan Svoboda1-2/+2
This makes it so that `CompilerInvocation` can be the only entity that manages ownership of `HeaderSearchOptions`, making it possible to implement copy-on-write semantics.
2025-01-22[clang-reorder-fields] Reorder leading comments (#123740)Clement Courbet1-0/+35
Similarly to https://github.com/llvm/llvm-project/pull/122918, leading comments are currently not being moved.
```
struct Foo {
  // This one is the cool field.
  int a;
  int b;
};
```
becomes:
```
struct Foo {
  // This one is the cool field.
  int b;
  int a;
};
```
but should be:
```
struct Foo {
  int b;
  // This one is the cool field.
  int a;
};
```
2025-01-16[clang][refactor] Refactor `findNextTokenIncludingComments` (#123060)Clement Courbet1-0/+21
We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremely similar to `Lexer::findNextToken`, so just add an extra argument to the latter. --------- Co-authored-by: cor3ntin <corentinjabot@gmail.com>
2024-12-05Skip escaped newlines before checking for whitespace in Lexer::getRawToken. ↵Samira Bazuzi1-0/+32
(#117548) The Lexer used in getRawToken is not told to keep whitespace, so when it skips over escaped newlines, it also ignores whitespace, regardless of getRawToken's IgnoreWhiteSpace parameter. Instead of letting this case fall through to lexing, check for whitespace after skipping over any escaped newlines.
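The ordering described above (skip escaped newlines first, then test for whitespace) can be sketched in isolation. This is a minimal illustration on a plain character buffer, not the code in `Lexer::getRawToken`; the helper names `skipEscapedNewlines` and `isWhitespaceAfterEscapedNewlines` are hypothetical, and the sketch assumes an escaped newline is a backslash immediately followed by `\n` or `\r\n`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a Clang API): advance past any run of escaped
// newlines starting at Pos. Assumes an escaped newline is a backslash
// immediately followed by "\n" or "\r\n".
std::size_t skipEscapedNewlines(std::string_view Buf, std::size_t Pos) {
  assert(Pos <= Buf.size() && "position must lie within the buffer");
  while (Pos + 1 < Buf.size() && Buf[Pos] == '\\') {
    if (Buf[Pos + 1] == '\n')
      Pos += 2;
    else if (Buf[Pos + 1] == '\r' && Pos + 2 < Buf.size() &&
             Buf[Pos + 2] == '\n')
      Pos += 3;
    else
      break; // A backslash that does not start an escaped newline.
  }
  return Pos;
}

// Hypothetical helper: report whether the first character after any escaped
// newlines at Pos is whitespace, so the caller can decide before lexing.
bool isWhitespaceAfterEscapedNewlines(std::string_view Buf, std::size_t Pos) {
  Pos = skipEscapedNewlines(Buf, Pos);
  if (Pos >= Buf.size())
    return false;
  char C = Buf[Pos];
  return C == ' ' || C == '\t' || C == '\n' || C == '\r' || C == '\f' ||
         C == '\v';
}
```

With this shape, a caller that was asked not to ignore whitespace can reject the position up front instead of relying on a lexer that silently skips whitespace.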
2023-10-05[Lex] Introduce Preprocessor::LexTokensUntilEOF()Jonas Hahnfeld1-13/+2
This new method repeatedly calls Lex() until end of file is reached and optionally fills a std::vector of Tokens. Use it in Clang's unit tests to avoid quite some code duplication. Differential Revision: https://reviews.llvm.org/D158413
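The loop this message describes can be sketched with toy types. `ToyToken` and `ToyLexer` below are hypothetical stand-ins, not Clang's `Token` or `Preprocessor`, and the sketch assumes the EOF token itself is not appended to the output vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for illustration only; this is not Clang's Token.
struct ToyToken {
  std::string Text;
  bool IsEOF = false;
};

// Toy stand-in for a lexer that hands out one pre-split word per call.
class ToyLexer {
public:
  explicit ToyLexer(std::vector<std::string> Words)
      : Words(std::move(Words)) {}

  // Produce the next token, or an EOF token once the input is exhausted.
  void Lex(ToyToken &Result) {
    assert(Next <= Words.size() && "cursor ran past the input");
    if (Next < Words.size()) {
      Result.Text = Words[Next++];
      Result.IsEOF = false;
    } else {
      Result.Text.clear();
      Result.IsEOF = true;
    }
  }

  // Call Lex() until EOF. If Tokens is non-null, append every non-EOF token
  // (assumption: the EOF token is not stored).
  void LexTokensUntilEOF(std::vector<ToyToken> *Tokens = nullptr) {
    while (true) {
      ToyToken Tok;
      Lex(Tok);
      if (Tok.IsEOF)
        return;
      if (Tokens)
        Tokens->push_back(Tok);
    }
  }

private:
  std::vector<std::string> Words;
  std::size_t Next = 0;
};
```

Passing a null pointer drains the input without collecting anything, which is the "optionally fills" behaviour the message mentions.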
2023-08-22[Lex] Preambles should contain the global module fragment.Sam McCall1-0/+34
For applications like clangd, the preamble remains an important optimization when editing a module definition. The global module fragment is a good fit for it as it by definition contains only preprocessor directives. Before this patch, we would terminate the preamble immediately at the "module" keyword. Differential Revision: https://reviews.llvm.org/D158439
2023-01-09Move from llvm::makeArrayRef to ArrayRef deduction guides - clang/ partserge-sans-paille1-1/+1
This is a follow-up to https://reviews.llvm.org/D140896, split into several parts as it touches a lot of files. Differential Revision: https://reviews.llvm.org/D141139
2022-10-06Revert "Revert "[clang][Lex] Fix a crash on malformed string literals""Kadir Cetinkaya1-0/+1
This reverts commit feea7ef23cb1bef92d363cc613052f8f3a878fc2. Drops the test case, see https://reviews.llvm.org/D135161#3839510
2022-10-05Revert "[clang][Lex] Fix a crash on malformed string literals"Kadir Cetinkaya1-8/+0
This reverts commit 36a200208facf58d454c9b7253c956c2f2a8b946.
2022-10-05[clang][Lex] Fix a crash on malformed string literalsKadir Cetinkaya1-0/+8
Differential Revision: https://reviews.llvm.org/D135161
2022-06-25[clang, clang-tools-extra] Don't use Optional::{hasValue,getValue} (NFC)Kazu Hirata1-1/+1
2022-02-07[clang][Lexer] Fix tests after ff77071a4d67Kadir Cetinkaya1-0/+2
2022-01-31[clang][Lexer] Make raw and normal lexer behave the same for line commentsKadir Cetinkaya1-0/+25
Normally there are heuristics in the lexer to treat `//*` specially in language modes that don't have line comments (to emit `/`). Unfortunately this only applied to the first occurrence of a line comment inside the file, as the subsequent line comments were treated as if the language had support for them. This unfortunately only holds in normal lexing mode, as in raw mode all occurrences of line comments received this treatment, which created discrepancies when comparing expanded and spelled tokens. The proper fix would be to just make sure we treat all the line comments with a subsequent `*` the same way, but it would imply breaking some code that's accepted by clang today. So instead we introduce the same bug into raw lexing mode. Fixes https://github.com/clangd/clangd/issues/1003. Differential Revision: https://reviews.llvm.org/D118471
2021-07-14[Lexer] Fix bug in `makeFileCharRange` called on split tokens.Yitzhak Mandelbaum1-1/+64
When the end loc of the specified range is a split token, `makeFileCharRange` does not process it correctly. This patch adds proper support for split tokens. Differential Revision: https://reviews.llvm.org/D105365
2020-04-22[clang] Make sure argument expansion locations are correct in presence of ↵Kadir Cetinkaya1-0/+13
predefined buffer Summary: Macro argument expansion logic relies on skipping file IDs that are created as a result of an include. Unfortunately it fails to do that for the predefined buffer since it doesn't have a valid insertion location. As a result, any file ID created for an include inside the predefined buffer breaks the traversal logic in SourceManager::computeMacroArgsCache. To fix this issue we first record the number of FIDs created for the predefined buffer, and then skip them explicitly in the source manager. Another solution would be to just give predefined buffers a valid source location, but it is unclear where that should be. Reviewers: sammccall Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D78649
2020-01-28Make llvm::StringRef to std::string conversions explicit.Benjamin Kramer1-1/+1
This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2019-10-07[clang] Add test for FindNextToken in Lexer.Utkarsh Saxena1-2/+23
Reviewers: ilya-biryukov Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D68565 llvm-svn: 373910
2019-08-14[Clang] Migrate llvm::make_unique to std::make_uniqueJonas Devlieghere1-1/+1
Now that we've moved to C++14, we no longer need the llvm::make_unique implementation from STLExtras.h. This patch is a mechanical replacement of (hopefully) all the llvm::make_unique instances across the monorepo. Differential revision: https://reviews.llvm.org/D66259 llvm-svn: 368942
2019-07-30Remove cache for macro arg stringizationReid Kleckner1-5/+8
Summary: The cache recorded the wrong expansion location for all but the first stringization. It seems uncommon to stringize the same macro argument multiple times, so this cache doesn't seem that important. Fixes PR39942 Reviewers: vsk, rsmith Subscribers: cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D65428 llvm-svn: 367337
2019-04-23Re-apply r357823 "[Lexer] NFC: Fix an off-by-one bug in getAsCharRange()."Artem Dergachev1-0/+19
It now comes with a follow-up fix for the clients of this API in clangd and clang-tidy. Differential Revision: https://reviews.llvm.org/D59977 llvm-svn: 359035
2019-04-05Revert "[Lexer] NFC: Fix an off-by-one bug in getAsCharRange()."Artem Dergachev1-19/+0
This reverts commit r357823. Was breaking clang-tidy! Differential Revision: https://reviews.llvm.org/D59977 llvm-svn: 357827
2019-04-05[Lexer] NFC: Fix an off-by-one bug in getAsCharRange().Artem Dergachev1-0/+19
As the unit test demonstrates, subtracting 1 from the offset was unnecessary. The only user of this function was the plist file emitter (in Static Analyzer and ARCMigrator). It means that a lot of Static Analyzer's plist arrows are in fact off by one character. The patch carefully preserves this completely incorrect behavior and causes no functional change, i.e. no plist format breakage. Differential Revision: https://reviews.llvm.org/D59977 llvm-svn: 357823
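The off-by-one is easier to see with plain offsets. The sketch below uses hypothetical offset-based structs rather than Clang's `SourceRange` and `CharSourceRange`, and it assumes a character range is half-open, so its end is the offset just past the last token.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical offset-based ranges for illustration; not Clang's types.
// A token range stores where the first and the last token begin.
struct TokenRange {
  std::size_t Begin;
  std::size_t LastTokBegin;
};

// Assumption: a character range is half-open, [Begin, End).
struct CharRange {
  std::size_t Begin;
  std::size_t End;
};

// Convert a token range into a character range. The end is the start of the
// last token plus its length; no further "- 1" adjustment is applied.
CharRange asCharRange(TokenRange R, std::size_t LastTokLen) {
  assert(R.Begin <= R.LastTokBegin && "range must not be reversed");
  return CharRange{R.Begin, R.LastTokBegin + LastTokLen};
}
```

For the text `int foo;`, a token range from `int` (offset 0) to `foo` (offset 4, length 3) maps to the character range [0, 7), which covers exactly `int foo`. Subtracting 1 from the end would drop the final `o`.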
2019-03-09Modules: Rename MemoryBufferCache to InMemoryModuleCacheDuncan P. N. Exon Smith1-3/+1
Change MemoryBufferCache to InMemoryModuleCache, moving it from Basic to Serialization. Another patch will start using it to manage module build more explicitly, but this is split out because it's mostly mechanical. Because of the move to Serialization we can no longer abuse the Preprocessor to forward it to the ASTReader. Besides the rename and file move, that means Preprocessor::Preprocessor has one fewer parameter and ASTReader::ASTReader has one more. llvm-svn: 355777
2019-01-19Update the file headers across all of the LLVM projects in the monorepoChandler Carruth1-4/+3
to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
2018-04-30PR37189 Fix incorrect end source location and spelling for a split '>>' token.Richard Smith1-5/+4
When a '>>' token is split into two '>' tokens (in C++11 onwards), or (as an extension) when we do the same for other tokens starting with a '>', we can't just use a location pointing to the first '>' as the location of the split token, because that would result in our miscomputing the length and spelling for the token. As a consequence, for example, a refactoring replacing 'A<X>' with something else would sometimes replace one character too many, and similarly diagnostics highlighting a template-id source range would highlight one character too many. Fix this by creating an expansion range covering the first character of the '>>' token, whose spelling is '>'. For this to work, we generalize the expansion range of a macro FileID to be either a token range (the common case) or a character range (used in this new case). llvm-svn: 331155
2018-01-12[Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.Volodymyr Sapsai1-0/+2
Fix makes the loop in LexAngledStringLiteral more like the loops in LexStringLiteral, LexCharConstant. When we skip a character after backslash, we need to check if we reached the end of the file instead of reading the next character unconditionally. Discovered by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3832 rdar://problem/35572754 Reviewers: arphaman, kcc, rsmith, dexonsmith Reviewed By: rsmith, dexonsmith Subscribers: cfe-commits, rsmith, dexonsmith Differential Revision: https://reviews.llvm.org/D41423 llvm-svn: 322390
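The pattern behind this fix, checking for the end of the buffer after skipping a backslash, can be sketched on a plain buffer. `findClosingAngle` and `NotFound` are hypothetical names, and the scanning rules here are simplified compared with the real `LexAngledStringLiteral`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical sentinel for "no closing '>' on this line".
constexpr std::size_t NotFound = static_cast<std::size_t>(-1);

// Hypothetical scanner (not Clang's implementation): starting just after an
// opening '<', return the index of the closing '>', or NotFound if a line
// ends or the buffer runs out first.
std::size_t findClosingAngle(std::string_view Buf, std::size_t Pos) {
  assert(Pos <= Buf.size() && "position must lie within the buffer");
  while (Pos < Buf.size()) {
    char C = Buf[Pos];
    if (C == '>')
      return Pos;
    if (C == '\n' || C == '\r')
      return NotFound;
    if (C == '\\') {
      ++Pos; // Step over the backslash.
      // The important check: do not consume the escaped character
      // unconditionally, since the backslash may be the last byte.
      if (Pos >= Buf.size())
        return NotFound;
    }
    ++Pos; // Consume the current (or escaped) character.
  }
  return NotFound;
}
```

Without the inner bounds check, an input ending in a lone backslash would make the loop read one element past the end.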
2018-01-10[Lex] Inline a variable in test in preparation for more similar tests. NFC.Volodymyr Sapsai1-2/+1
llvm-svn: 322240
2017-12-06Stringizing raw string literals containing newlineTaewook Oh1-1/+39
Summary: This patch implements 4.3 of http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4220.pdf. If a raw string contains a newline character, replace each newline character with the \n escape code. Without this patch, the included test case (macro_raw_string.cpp) results in a compilation failure. Reviewers: rsmith, doug.gregor, jkorous-apple Reviewed By: jkorous-apple Subscribers: jkorous-apple, vsapsai, cfe-commits Differential Revision: https://reviews.llvm.org/D39279 llvm-svn: 319904
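The newline replacement itself is a small transformation. The sketch below shows only that step on a plain string; `escapeNewlinesForStringize` is a hypothetical name, and the other escaping that stringization performs (quotes, backslashes) is deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not Clang's stringization code): replace every newline
// character with the two-character escape sequence backslash + 'n'.
std::string escapeNewlinesForStringize(std::string_view Raw) {
  std::string Out;
  Out.reserve(Raw.size());
  for (char C : Raw) {
    if (C == '\n')
      Out += "\\n"; // Two characters: '\\' and 'n'.
    else
      Out += C;
  }
  assert(Out.size() >= Raw.size() && "escaping never shrinks the text");
  return Out;
}
```

The result contains no raw line break, so it can be placed inside an ordinary string literal on a single line.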
2017-10-14[Lex] Avoid out-of-bounds dereference in SkipLineCommentAlex Lorenz1-0/+5
Credit to OSS-Fuzz for discovery: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3145 rdar://34526482 llvm-svn: 315785
2017-08-10[Lexer] Finding beginning of token with escaped new lineAlexander Kornienko1-0/+53
Summary: Lexer::GetBeginningOfToken produced invalid location when backtracking across escaped new lines. This fixes PR26228 Reviewers: akyrtzi, alexfh, rsmith, doug.gregor Reviewed By: alexfh Subscribers: alexfh, cfe-commits Patch by Paweł Żukowski! Differential Revision: https://reviews.llvm.org/D30748 llvm-svn: 310576
2017-07-17[NFC] Refactor the Preprocessor function that handles Macro definitions and ↵Faisal Vali1-3/+3
rename Arguments to Parameters in Macro Definitions. - Extracted the reading of the tokens out into a separate function. - Replace 'Argument' with 'Parameter' when referring to the identifiers of the macro definition (as opposed to the supplied arguments - MacroArgs - during the macro invocation). This is in preparation for submitting patches for review to implement __VA_OPT__ which will otherwise just keep lengthening the HandleDefineDirective function and making it less comprehensible. I will also directly update some extra clang tooling that is broken by the change from Argument to Parameter. Hopefully the bots will stay appeased. Thanks! llvm-svn: 308190
2017-07-17Revert changes from my previous refactoring - will need to fix dependencies ↵Faisal Vali1-3/+3
in clang's extra tooling (such as clang-tidy etc.). Sorry about that. llvm-svn: 308158
2017-07-17[NFC] Refactor the Preprocessor function that handles Macro definitions and ↵Faisal Vali1-3/+3
rename Arguments to Parameters in Macro Definitions. - Extracted the reading of the tokens out into a separate function. - Replace 'Argument' with 'Parameter' when referring to the identifiers of the macro definition (as opposed to the supplied arguments - MacroArgs - during the macro invocation). This is in preparation for submitting patches for review to implement __VA_OPT__ which will otherwise just keep lengthening the HandleDefineDirective function and making it less comprehensible. Thanks! llvm-svn: 308157
2017-06-15LexerTest memory leak fix-Erich Keane1-1/+3
A new LexerTest unittest introduced a memory leak. This patch uses a unique_ptr with a custom deleter to ensure it is properly deleted. llvm-svn: 305491
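A `unique_ptr` with a custom deleter is useful when an object has to be released through a dedicated function rather than plain `delete`. The sketch below shows the general pattern only; `ToyArgs`, `createToyArgs`, `destroyToyArgs` and the live-object counter are hypothetical and do not reflect the types used in the actual test.

```cpp
#include <cassert>
#include <memory>

// Hypothetical resource that must be released through destroyToyArgs().
struct ToyArgs {
  int NumArgs;
};

// Counts live objects so that a leak would be observable in a test.
int LiveToyArgs = 0;

ToyArgs *createToyArgs(int N) {
  ++LiveToyArgs;
  return new ToyArgs{N};
}

void destroyToyArgs(ToyArgs *A) {
  assert(A && "destroy called on a null pointer");
  --LiveToyArgs;
  delete A;
}

// Custom deleter: routes unique_ptr cleanup through destroyToyArgs().
struct ToyArgsDeleter {
  void operator()(ToyArgs *A) const { destroyToyArgs(A); }
};

using ToyArgsPtr = std::unique_ptr<ToyArgs, ToyArgsDeleter>;

// Wrap the raw pointer immediately so every exit path releases it.
ToyArgsPtr makeToyArgs(int N) { return ToyArgsPtr(createToyArgs(N)); }
```

Because the deleter is part of the pointer's type, early returns and failed assertions in the owning scope still run the cleanup function.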
2017-06-15Fix LexerTest signed/unsigned comparison.Erich Keane1-1/+1
Werror was catching a signed/unsigned compare in an assert, correct the signed 'expected' value to be unsigned. llvm-svn: 305435
2017-06-14[Preprocessor]Correct Macro-Arg allocation of StringifiedArguments, Erich Keane1-9/+62
correct getNumArguments StringifiedArguments is allocated (resized) based on the size returned by the getNumArguments function. However, this function ACTUALLY currently returns the total number of UnexpArgTokens, which is at minimum the same as the new implementation of getNumMacroArguments, since empty/omitted arguments result in 1 UnexpArgToken, and included ones at minimum include 2 (1 for the arg itself, 1 for eof). This patch renames the otherwise unused getNumArguments to be more clear that it is the number of arguments that the Macro expects, and thus the maximum number that can be stringified. This patch also replaces the explicit memset (which results in value instantiation of the new tokens, PLUS clearing the memory) with brace initialization. Differential Revision: https://reviews.llvm.org/D32046 llvm-svn: 305425
2017-06-09Add #pragma clang module build/endbuild pragmas for performing a module buildRichard Smith1-19/+1
as part of a compilation. This is intended for two purposes: 1) Writing self-contained test cases for modules: we can now write a single source file test that builds some number of module files on the side and imports them. 2) Debugging / test case reduction. A single-source testcase is much more amenable to reduction, compared to a VFS tarball or .pcm files. llvm-svn: 305101
2017-03-20Reapply "Modules: Cache PCMs in memory and avoid a use-after-free"Duncan P. N. Exon Smith1-1/+4
This reverts commit r298185, effectively reapplying r298165, after fixing the new unit tests (PR32338). The memory buffer generator doesn't null-terminate the MemoryBuffer it creates; this version of the commit informs getMemBuffer about that to avoid the assert. Original commit message follows: ---- Clang's internal build system for implicit modules uses lock files to ensure that after a process writes a PCM it will read the same one back in (without contention from other -cc1 commands). Since PCMs are read from disk repeatedly while invalidating, building, and importing, the lock is not released quickly. Furthermore, the LockFileManager is not robust in every environment. Other -cc1 commands can stall until timeout (after about eight minutes). This commit changes the lock file from being necessary for correctness to a (possibly dubious) performance hack. The remaining benefit is to reduce duplicate work in competing -cc1 commands which depend on the same module. Follow-up commits will change the internal build system to continue after a timeout, and reduce the timeout. Perhaps we should reconsider blocking at all. This also fixes a use-after-free, when one part of a compilation validates a PCM and starts using it, and another tries to swap out the PCM for something new. The PCMCache is a new type called MemoryBufferCache, which saves memory buffers based on their filename. Its ownership is shared by the CompilerInstance and ModuleManager. - The ModuleManager stores PCMs there that it loads from disk, never touching the disk if the cache is hot. - When modules fail to validate, they're removed from the cache. - When a CompilerInstance is spawned to build a new module, each already-loaded PCM is assumed to be valid, and is frozen to avoid the use-after-free. - Any newly-built module is written directly to the cache to avoid the round-trip to the filesystem, making lock files unnecessary for correctness. Original patch by Manman Ren; most testcases by Adrian Prantl! 
llvm-svn: 298278
2017-03-18Revert "Modules: Cache PCMs in memory and avoid a use-after-free"Renato Golin1-4/+1
This reverts commit r298165, as it broke the ARM builds. llvm-svn: 298185
2017-03-17Modules: Cache PCMs in memory and avoid a use-after-freeDuncan P. N. Exon Smith1-1/+4
Clang's internal build system for implicit modules uses lock files to ensure that after a process writes a PCM it will read the same one back in (without contention from other -cc1 commands). Since PCMs are read from disk repeatedly while invalidating, building, and importing, the lock is not released quickly. Furthermore, the LockFileManager is not robust in every environment. Other -cc1 commands can stall until timeout (after about eight minutes). This commit changes the lock file from being necessary for correctness to a (possibly dubious) performance hack. The remaining benefit is to reduce duplicate work in competing -cc1 commands which depend on the same module. Follow-up commits will change the internal build system to continue after a timeout, and reduce the timeout. Perhaps we should reconsider blocking at all. This also fixes a use-after-free, when one part of a compilation validates a PCM and starts using it, and another tries to swap out the PCM for something new. The PCMCache is a new type called MemoryBufferCache, which saves memory buffers based on their filename. Its ownership is shared by the CompilerInstance and ModuleManager. - The ModuleManager stores PCMs there that it loads from disk, never touching the disk if the cache is hot. - When modules fail to validate, they're removed from the cache. - When a CompilerInstance is spawned to build a new module, each already-loaded PCM is assumed to be valid, and is frozen to avoid the use-after-free. - Any newly-built module is written directly to the cache to avoid the round-trip to the filesystem, making lock files unnecessary for correctness. Original patch by Manman Ren; most testcases by Adrian Prantl! llvm-svn: 298165
2017-01-06shared_ptrify (from IntrusiveRefCntPtr) HeaderSearchOptionsDavid Blaikie1-2/+2
llvm-svn: 291202
2017-01-05Move PreprocessorOptions to std::shared_ptr from IntrusiveRefCntPtrDavid Blaikie1-2/+2
llvm-svn: 291160