aboutsummaryrefslogtreecommitdiff
path: root/clang/lib/Lex/Lexer.cpp
AgeCommit message (Collapse)AuthorFilesLines
2020-10-19Lexer: Update the Lexer to use MemoryBufferRef, NFCDuncan P. N. Exon Smith1-12/+12
Update `Lexer` / `Lexer::Lexer` to use `MemoryBufferRef` instead of `MemoryBuffer*`. Callers that were acquiring a `MemoryBuffer*` via `SourceManager::getBuffer` were updated, such that if they checked `Invalid` they use `getBufferOrNone` and otherwise `getBufferOrFake`. Differential Revision: https://reviews.llvm.org/D89398
2020-09-21[Coverage] Add empty line regions to SkippedRegionsZequan Wu1-1/+24
Differential Revision: https://reviews.llvm.org/D84988
2020-06-20[AST/Lex/Parse/Sema] As part of using inclusive language withinEric Christopher1-1/+1
the llvm project, migrate away from the use of blacklist and whitelist.
2020-04-22Rename warning identifiers from cxx2a to cxx20; NFC.Aaron Ballman1-1/+1
2020-04-21C++2a -> C++20 in some identifiers; NFC.Aaron Ballman1-1/+1
2020-03-26[CodeComplete] Don't replace the rest of line in #include completion.Haojian Wu1-2/+8
Summary: The previous behavior was aggressive, #include "abc/f^/abc.h" foo/ -> candidate "f/abc.h" is replaced with "foo/", this patch will preserve the "abc.h". Reviewers: sammccall Subscribers: jkorous, arphaman, kadircet, usaxena95, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D76770
2020-02-10Prefer __vector over vector keyword for altivecserge-sans-paille1-2/+2
`vector' uses the keyword-and-predefine mode from gcc, while __vector is reliably supported. As a side effect, it also makes the code consistent in its usage of __vector. Differential Revision: https://reviews.llvm.org/D74129
2020-01-28Make llvm::StringRef to std::string conversions explicit.Benjamin Kramer1-1/+1
This is how it should've been and brings it more in line with std::string_view. There should be no functional change here. This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies. This doesn't actually modify StringRef yet, I'll do that in a follow-up.
2020-01-15[Lexer] Allow UCN for dollar symbol '\u0024' in identifiers when using ↵Scott Egerton1-0/+2
-fdollars-in-identifiers flag. Summary: Previously, the -fdollars-in-identifiers flag allows the '$' symbol to be used in an identifier but the universal character name equivalent '\u0024' is not allowed. This patch changes this, so that \u0024 is valid in identifiers. Reviewers: rsmith, jordan_rose Reviewed By: rsmith Subscribers: dexonsmith, simoncook, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D71758
2019-10-28Lexer::ReadToEndOfLine - fix Token uninitialised value warnings. NFCI.Simon Pilgrim1-0/+1
Use Token::startToken to initialize Token.
2019-09-11[clang-scan-deps] add skip excluded conditional preprocessor block ↵Alex Lorenz1-0/+9
preprocessing optimization This commit adds an optimization to clang-scan-deps and clang's preprocessor that skips excluded preprocessor blocks by bumping the lexer pointer, and not lexing the tokens until reaching appropriate #else/#endif directive. The skip positions and lexer offsets are computed when the file is minimized, directly from the minimized tokens. On an 18-core iMacPro with macOS Catalina Beta I got 10-15% speed-up from this optimization when running clang-scan-deps on the compilation database for a recent LLVM and Clang (3511 files). Differential Revision: https://reviews.llvm.org/D67127 llvm-svn: 371656
2019-07-12Delete dead storesFangrui Song1-2/+1
llvm-svn: 365901
2019-03-19Replace tok::angle_string_literal with new tok::header_name.Richard Smith1-2/+4
Use the new kind for both angled header-name tokens and for double-quoted header-name tokens. This is in preparation for C++20's context-sensitive header-name token formation rules. llvm-svn: 356530
2019-01-19Update the file headers across all of the LLVM projects in the monorepoChandler Carruth1-4/+3
to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
2018-12-10Misc typos fixes in ./lib folderRaphael Isemann1-1/+1
Summary: Found via `codespell -q 3 -I ../clang-whitelist.txt -L uint,importd,crasher,gonna,cant,ue,ons,orign,ned` Reviewers: teemperor Reviewed By: teemperor Subscribers: teemperor, jholewinski, jvesely, nhaehnle, whisperity, jfb, cfe-commits Differential Revision: https://reviews.llvm.org/D55475 llvm-svn: 348755
2018-10-30NFC: Remove the ObjC1/ObjC2 distinction from clang (and related projects)Erik Pilkington1-1/+1
We haven't supported compiling ObjC1 for a long time (and never will again), so there isn't any reason to keep these separate. This patch replaces LangOpts::ObjC1 and LangOpts::ObjC2 with LangOpts::ObjC. Differential revision: https://reviews.llvm.org/D53547 llvm-svn: 345637
2018-09-25Don't emit "will be treated as an identifier character" warning forRichard Smith1-4/+2
UTF-8 characters that aren't identifier characters in the current language mode. llvm-svn: 343040
2018-09-18[CodeComplete] Add completions for filenames in #include directives.Sam McCall1-5/+42
Summary: The dir component ("somedir" in #include <somedir/fo...>) is considered fixed. We append "foo" to each directory on the include path, and then list its files. Completions are of the forms: #include <somedir/fo^ foo.h> fox/ The filter is set to the filename part ("fo"), so fuzzy matching can be applied to the filename only. No fancy scoring/priorities are set, and no information is added to CodeCompleteResult to make smart scoring possible. Could be in future. Reviewers: ilya-biryukov Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D52076 llvm-svn: 342449
2018-09-07PR38870: Add warning for zero-width unicode characters appearing inRichard Smith1-3/+18
identifiers. llvm-svn: 341700
2018-05-09Remove \brief commands from doxygen comments.Adrian Prantl1-8/+8
This is similar to the LLVM change https://reviews.llvm.org/D46290. We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\@brief'); do perl -pi -e 's/\@brief //g' $i & done for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46320 llvm-svn: 331834
2018-04-30PR37189 Fix incorrect end source location and spelling for a split '>>' token.Richard Smith1-15/+11
When a '>>' token is split into two '>' tokens (in C++11 onwards), or (as an extension) when we do the same for other tokens starting with a '>', we can't just use a location pointing to the first '>' as the location of the split token, because that would result in our miscomputing the length and spelling for the token. As a consequence, for example, a refactoring replacing 'A<X>' with something else would sometimes replace one character too many, and similarly diagnostics highlighting a template-id source range would highlight one character too many. Fix this by creating an expansion range covering the first character of the '>>' token, whose spelling is '>'. For this to work, we generalize the expansion range of a macro FileID to be either a token range (the common case) or a character range (used in this new case). llvm-svn: 331155
2018-04-25[CodeComplete] Fix completion in the middle of ident in ctor lists.Ilya Biryukov1-1/+15
Summary: The example that was broken before (^ designates completion points): class Foo { Foo() : fie^ld^() {} // no completions were provided here. int field; }; To fix it we don't cut off lexing after an identifier followed by code completion token is lexed. Instead we skip the rest of identifier and continue lexing. This is consistent with behavior of completion when completion token is right before the identifier. Reviewers: sammccall, aaron.ballman, bkramer, sepavloff, arphaman, rsmith Reviewed By: aaron.ballman Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D44932 llvm-svn: 330833
2018-04-24[CodeComplete] Fix completion at the end of keywordsIlya Biryukov1-8/+12
Summary: Make completion behave consistently no matter if it is run at the start, in the middle or at the end of an identifier that happens to be a keyword or a macro name. Since completion is often ran on incomplete identifiers, they may turn into keywords by accident. For example, we should produce same results for all of these completion points: // ^ is completion point. ^class cla^ss class^ Previously clang produced different results for the last case (as if the completion point was after a space: `class ^`). This change also updates some offsets in tests that (unintentionally?) relied on the old behavior. Reviewers: sammccall, bkramer, arphaman, aaron.ballman Reviewed By: sammccall Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D45887 llvm-svn: 330717
2018-04-06Fix typos in clangAlexander Kornienko1-4/+4
Found via codespell -q 3 -I ../clang-whitelist.txt Where whitelist consists of: archtype cas classs checkk compres definit frome iff inteval ith lod methode nd optin ot pres statics te thru Patch by luzpaz! (This is a subset of D44188 that applies cleanly with a few files that have dubious fixes reverted.) Differential revision: https://reviews.llvm.org/D44188 llvm-svn: 329399
2018-01-12[Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.Volodymyr Sapsai1-8/+11
Fix makes the loop in LexAngledStringLiteral more like the loops in LexStringLiteral, LexCharConstant. When we skip a character after backslash, we need to check if we reached the end of the file instead of reading the next character unconditionally. Discovered by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3832 rdar://problem/35572754 Reviewers: arphaman, kcc, rsmith, dexonsmith Reviewed By: rsmith, dexonsmith Subscribers: cfe-commits, rsmith, dexonsmith Differential Revision: https://reviews.llvm.org/D41423 llvm-svn: 322390
2017-12-14Warn if we find a Unicode homoglyph for a symbol in an identifier.Richard Smith1-1/+78
Specifically, warn if: * we find a character that the language standard says we must treat as an identifier, and * that character is not reasonably an identifier character (it's a punctuation character or similar), and * it renders identically to a valid non-identifier character in common fixed-width fonts. Some tools "helpfully" substitute the surprising characters for the expected characters, and replacing semicolons with Greek question marks is a common "prank". llvm-svn: 320697
2017-12-08[Lex] Fix some Clang-tidy modernize and Include What You Use warnings; other ↵Eugene Zelenko1-49/+54
minor fixes (NFC). llvm-svn: 320207
2017-12-06Stringizing raw string literals containing newlineTaewook Oh1-56/+65
Summary: This patch implements 4.3 of http://open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4220.pdf. If a raw string contains a newline character, replace each newline character with the \n escape code. Without this patch, included test case (macro_raw_string.cpp) results compilation failure. Reviewers: rsmith, doug.gregor, jkorous-apple Reviewed By: jkorous-apple Subscribers: jkorous-apple, vsapsai, cfe-commits Differential Revision: https://reviews.llvm.org/D39279 llvm-svn: 319904
2017-12-04Now that C++17 is official (https://www.iso.org/standard/68564.html), start ↵Aaron Ballman1-2/+2
changing the C++1z terminology over to C++17. NFC intended, these are all mechanical changes. llvm-svn: 319688
2017-12-01[c++2a] P0515R3: lexer support for new <=> token.Richard Smith1-0/+18
llvm-svn: 319509
2017-11-03[refactor][extract] insert semicolons into extracted/inserted codeAlex Lorenz1-16/+20
when needed This commit implements the semicolon insertion logic into the extract refactoring. The following rules are used: - extracting expression: add terminating ';' to the extracted function. - extracting statements that don't require terminating ';' (e.g. switch): add terminating ';' to the callee. - extracting statements with ';': move (if possible) the original ';' from the callee and add terminating ';'. - otherwise, add ';' to both places. Differential Revision: https://reviews.llvm.org/D39441 llvm-svn: 317343
2017-10-15Add -f[no-]double-square-bracket-attributes as new driver options to control ↵Aaron Ballman1-1/+3
use of [[]] attributes in all language modes. This is the initial implementation of WG14 N2165, which is a proposal to add [[]] attributes to C2x, but also allows you to enable these attributes in C++98, or disable them in C++11 or later. llvm-svn: 315856
2017-10-14[Lex] Avoid out-of-bounds dereference in SkipLineCommentAlex Lorenz1-1/+2
Credit to OSS-Fuzz for discovery: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3145 rdar://34526482 llvm-svn: 315785
2017-10-11A '<' with a trigraph '#' is not a valid editor placeholderAlex Lorenz1-1/+2
Credit to OSS-Fuzz for discovery: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3137#c5 rdar://34923985 llvm-svn: 315398
2017-09-20Fixed unused variable warning introduced in r313796 causing build failureCameron Desrochers1-3/+0
llvm-svn: 313802
2017-09-20[PCH] Fixed preamble breaking with BOM presence (and particularly, ↵Cameron Desrochers1-7/+7
fluctuating BOM presence) This patch fixes broken preamble-skipping when the preamble region includes a byte order mark (BOM). Previously, parsing would fail if preamble PCH generation was enabled and a BOM was present. This also fixes preamble invalidation when a BOM appears or disappears. This may seem to be an obscure edge case, but it happens regularly with IDEs that pass buffer overrides that never (or always) have a BOM, yet the underlying file from the initial parse that generated a PCH might (or might not) have a BOM. I've included a test case for these scenarios. Differential Revision: https://reviews.llvm.org/D37491 llvm-svn: 313796
2017-09-05[Preprocessor] Correct internal token parsing of newline characters in CRLFErich Keane1-2/+3
Correct implementation: Apparently I managed in r311683 to submit the wrong version of the patch for this, so I'm correcting it now. Differential Revision: https://reviews.llvm.org/D37079 llvm-svn: 312542
2017-08-24[Preprocessor] Correct internal token parsing of newline characters in CRLFErich Keane1-0/+2
Discovered due to a goofy git setup, the test system-headerline-directive.c (and a few others) failed because the token-consumption will consume only the '\r' in CRLF, making the preprocessor's printed value give the wrong line number when returning from an include. For example: (line 1):#include <noline.h>\r\n The "file exit" code causes the printer to try to print the 'returned to the main file' line. It looks up what the current line number is. However, since the current 'token' is the '\n' (since only the \r was consumed), it will give the line number as '1", not '2'. This results in a few failed tests, but more importantly, results in error messages being incorrect when compiling a previously preprocessed file. Differential Revision: https://reviews.llvm.org/D37079 llvm-svn: 311683
2017-08-10[Lexer] Finding beginning of token with escaped new lineAlexander Kornienko1-28/+44
Summary: Lexer::GetBeginningOfToken produced invalid location when backtracking across escaped new lines. This fixes PR26228 Reviewers: akyrtzi, alexfh, rsmith, doug.gregor Reviewed By: alexfh Subscribers: alexfh, cfe-commits Patch by Paweł Żukowski! Differential Revision: https://reviews.llvm.org/D30748 llvm-svn: 310576
2017-07-05Fix invalid warnings for header guards in preamblesErik Verbruggen1-1/+1
Fixes https://bugs.llvm.org/show_bug.cgi?id=33574 Differential Revision: https://reviews.llvm.org/D34882 llvm-svn: 307134
2017-06-16[PR33394] Avoid lexing editor placeholders when Clang is used onlyAlex Lorenz1-1/+2
for preprocessing r300667 added support for editor placeholder to Clang. That commit didn’t take into account that users who use Clang for preprocessing only (-E) will get the "editor placeholder in source file" error when preprocessing their source (PR33394). This commit ensures that Clang doesn't lex editor placeholders when running a preprocessor only action. rdar://32718000 Differential Revision: https://reviews.llvm.org/D34256 llvm-svn: 305576
2017-06-03Added LLVM_FALLTHROUGH to address warning: this statement may fall through. NFC.Galina Kistanova1-0/+2
llvm-svn: 304643
2017-05-30Allow for unfinished #if blocks in preamblesErik Verbruggen1-28/+11
Previously, a preamble only included #if blocks (and friends like ifdef) if there was a corresponding #endif before any declaration or definition. The problem is that any header file that uses include guards will not have a preamble generated, which can make code-completion very slow. To prevent errors about unbalanced preprocessor conditionals in the preamble, and unbalanced preprocessor conditionals after a preamble containing unfinished conditionals, the conditional stack is stored in the pch file. This fixes PR26045. Differential Revision: http://reviews.llvm.org/D15994 llvm-svn: 304207
2017-05-17[Lexer] Ensure that the token is not an annotation token whenAlex Lorenz1-0/+4
retrieving the identifer info for an Objective-C keyword This commit fixes an assertion that's triggered in getIdentifier when the token is an annotation token. rdar://32225463 llvm-svn: 303246
2017-05-05Add a fix-it for -Wunguarded-availabilityAlex Lorenz1-17/+49
This patch adds a fix-it for the -Wunguarded-availability warning. This fix-it is similar to the Swift one: it suggests that you wrap the statement in an `if (@available)` check. The produced fixits are indented (just like the Swift ones) to make them look nice in Xcode's fix-it preview. rdar://31680358 Differential Revision: https://reviews.llvm.org/D32424 llvm-svn: 302253
2017-04-19Add support for editor placeholders to ClangAlex Lorenz1-0/+33
This commit teaches Clang to recognize editor placeholders that are produced when an IDE like Xcode inserts a code-completion result that includes a placeholder. Now when the lexer sees a placeholder token, it emits an 'editor placeholder in source file' error and creates an identifier token that represents the placeholder. The parser/sema can now recognize the placeholders and can suppress the diagnostics related to the placeholders. This ensures that live issues in an IDE like Xcode won't get spurious diagnostics related to placeholders. This commit also adds a new compiler option named '-fallow-editor-placeholders' that silences the 'editor placeholder in source file' error. This is useful for an IDE like Xcode as we don't want to display those errors in live issues. rdar://31581400 Differential Revision: https://reviews.llvm.org/D32081 llvm-svn: 300667
2017-04-18Do not warn about whitespace between ??/ trigraph and newline in line ↵Richard Smith1-4/+6
comments if trigraphs are disabled in the current language. llvm-svn: 300609
2017-04-17Fix mishandling of escaped newlines followed by newlines or nuls.Richard Smith1-18/+10
Previously, if an escaped newline was followed by a newline or a nul, we'd lex the escaped newline as a bogus space character. This led to a bunch of different broken corner cases: For the pattern "\\\n\0#", we would then have a (horizontal) space whose spelling ends in a newline, and would decide that the '#' is at the start of a line, and incorrectly start preprocessing a directive in the middle of a logical source line. If we were already in the middle of a directive, this would result in our attempting to process multiple directives at the same time! This resulted in crashes, asserts, and hangs on invalid input, as discovered by fuzz-testing. For the pattern "\\\n" at EOF (with an implicit following nul byte), we would produce a bogus trailing space character with spelling "\\\n". This was mostly harmless, but would lead to clang-format getting confused and misformatting in rare cases. We now produce a trailing EOF token with spelling "\\\n", consistent with our handling for other similar cases -- an escaped newline is always part of the token containing the next character, if any. For the pattern "\\\n\n", this was somewhat more benign, but would produce an extraneous whitespace token to clients who care about preserving whitespace. However, it turns out that our lexing for line comments was relying on this bug due to an off-by-one error in its computation of the end of the comment, on the slow path where the comment might contain escaped newlines. llvm-svn: 300515
2017-04-07Skip Unicode character expansion in assembly filesSanne Wouda1-9/+11
Summary: When using the C preprocessor with assembly files, either with a capital `S` file extension, or with `-xassembler-with-cpp`, the Unicode escape sequence `\u` is ignored. The `\u` pattern can be used for expanding a macro argument that starts with `u`. Author: Salman Arif <salman.arif@arm.com> Reviewers: rengolin, olista01 Reviewed By: olista01 Subscribers: cfe-commits Differential Revision: https://reviews.llvm.org/D31765 llvm-svn: 299754
2016-12-30Allow lexer to handle string_view literals. Patch from Anton Bikineev.Eric Fiselier1-3/+3
This implements the compiler side of p0403r0. This patch was reviewed as https://reviews.llvm.org/D26829. llvm-svn: 290744