path: root/clang/lib/Lex/Lexer.cpp
Age  Commit message  Author  Files  Lines
2025-08-18[clang] Allow trivial pp-directives before C++ module directive (#153641)yronglin1-9/+0
Consider the following code:

```cpp
# 1 __FILE__ 1 3
export module a;
```

According to the wording in [P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):

```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```

and the wording in [[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file):

```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```

`#` is the first pp-token in the translation unit, so clang rejected this code, but such directives really should be exempted from this rule. The goal is to not allow any preprocessor conditionals or most state changes, but these don't fit that. A state change here means a change to most semantically observable preprocessor state, particularly anything that is order dependent. Global flags like being a system header/module shouldn't matter. We should exempt a bunch of directives, even though doing so violates the current standard wording.

In this patch, we introduce a `TrivialDirectiveTracer` to trace the **state change** described above and propose to exempt the following kinds of directive: `#line`, GNU line marker, `#ident`, `#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC error`, `#pragma gcc diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma warning`, `#pragma execution_character_set`, `#pragma clang assume_nonnull` and builtin macro expansion.

Fixes https://github.com/llvm/llvm-project/issues/145274

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
2025-07-09Address a handful of C4146 compiler warnings where literals can be replaced with std::numeric_limits (#147623)Alex Sepkowski1-1/+2
This PR addresses instances of compiler warning C4146 that can be replaced with std::numeric_limits. Specifically, these are cases where a literal such as '-1ULL' was used to assign a value to a uint64_t variable. The intent is much cleaner if we use the appropriate std::numeric_limits value<Type>::max() for these cases. Addresses #147439
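The replacement described above can be illustrated with a minimal sketch. The helper name `maxSentinel` is hypothetical and is not taken from Lexer.cpp; it only shows the before/after spelling of an all-ones `uint64_t` value.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from Lexer.cpp): an "all bits set" sentinel.
constexpr std::uint64_t maxSentinel() {
  // Old style: `return -1ULL;` negates an unsigned literal, which is what
  // MSVC warning C4146 flags. The spelling below states the intent directly.
  return std::numeric_limits<std::uint64_t>::max();
}

// Both spellings denote the same value; only the readability differs.
static_assert(maxSentinel() == static_cast<std::uint64_t>(-1),
              "max() equals the all-ones bit pattern");
```

The `static_assert` documents that the change is purely cosmetic with respect to the resulting value.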
2025-07-07[clang][deps] Stop lexing if hit a failure while loading a PCH/module in a submodule. (#146976)Volodymyr Sapsai1-0/+3
Otherwise we are continuing in an invalid state and can easily crash. It is a follow-up to cde90e68f8123e7abef3f9e18d79980aa19f460a but an important difference is when a failure happens in a submodule. In this case in `Preprocessor::HandleEndOfFile` `tok::eof` is replaced by `tok::annot_module_end`. And after exiting a file with bad `#include/#import` we work with a new buffer, so `BufferPtr < BufferEnd`. As there are no signs to stop lexing we just keep doing it. The fix is the same as in dc9fdaf2171cc480300d5572606a8ede1678d18b in `Lexer::LexTokenInternal` but this time in `Lexer::LexDependencyDirectiveToken` as well. rdar://152499276
2025-07-07NFC, use structured binding to simplify the code.Haojian Wu1-3/+1
2025-06-28[clang] Remove unused includes (NFC) (#146254)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-06-26[clang] NFC: Add alias for std::pair<FileID, unsigned> used in SourceLocation (#145711)Haojian Wu1-11/+10
Introduce a type alias for the commonly used `std::pair<FileID, unsigned>` to improve code readability, and make it easier for future updates (64-bit source locations).
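A rough sketch of what such an alias buys, combined with a structured binding at the use site. Everything here is a stand-in: the `FileID` struct, the alias name `FileIDAndOffset`, and the 16-bit split in `decompose` are assumptions made for illustration and do not reproduce clang's actual definitions.

```cpp
#include <utility>

// Stand-in for clang's FileID; the real class is opaque.
struct FileID {
  int ID = 0;
};

// Alias name chosen for illustration only.
using FileIDAndOffset = std::pair<FileID, unsigned>;

// Hypothetical decomposition: high 16 bits select the file, low 16 bits
// give the offset. Real source locations are encoded differently.
FileIDAndOffset decompose(unsigned RawLoc) {
  return {FileID{static_cast<int>(RawLoc >> 16)}, RawLoc & 0xFFFFu};
}

unsigned offsetOf(unsigned RawLoc) {
  // A structured binding names both halves instead of .first/.second.
  auto [FID, Offset] = decompose(RawLoc);
  (void)FID; // unused in this sketch
  return Offset;
}
```

With the alias in place, widening the offset type later would only require touching the `using` line rather than every signature that spells out the pair.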
2025-06-26[clang][Preprocessor] Handle the first pp-token in EnterMainSourceFile (#145244)yronglin1-7/+2
Depends on [[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects](https://github.com/llvm/llvm-project/pull/143898). This PR fixes the performance regression that was introduced in https://github.com/llvm/llvm-project/pull/144233. The original PR (https://github.com/llvm/llvm-project/pull/144233) handled the first pp-token in the main source file in the macro definition/expansion and `Lexer::Lex`, but the lexer is almost always on the hot path, so this could cause a performance regression. In this PR, we handle the first pp-token in `Preprocessor::EnterMainSourceFile`. --------- Signed-off-by: yronglin <yronglin777@gmail.com>
2025-06-24[clang][Preprocessor] Add peekNextPPToken, makes look ahead next token without side-effects (#143898)yronglin1-10/+11
This PR introduces a new function `peekNextPPToken`. It is an extension of `isNextPPTokenLParen` and can look ahead one token in the preprocessor without side-effects. It is also the 1st part of https://github.com/llvm/llvm-project/pull/107168, where it is used to look ahead to the next token and then determine whether the pp directive currently being lexed is a pp-import or pp-module directive.

At the start of phase 4 an import or module token is treated as starting a directive and is converted to its respective keyword iff:
- After skipping horizontal whitespace, it is
  - at the start of a logical line, or
  - preceded by an export at the start of the logical line.
- It is followed by an identifier pp token (before macro expansion), or
  - <, ", or : (but not ::) pp tokens for import, or
  - ; for module

Otherwise the token is treated as an identifier.

---------

Signed-off-by: yronglin <yronglin777@gmail.com>
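The import/module rules listed in this message can be sketched as a small classifier over the raw tokens of one logical line. This is an illustration of the wording in the message only, not clang's implementation: `DirectiveKind`, `isIdentifier`, and `classifyLine` are hypothetical names, tokens are modelled as plain strings, and identifiers are assumed to be ASCII.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

enum class DirectiveKind { None, Import, Module };

// True if Tok looks like an identifier (ASCII only in this sketch).
static bool isIdentifier(const std::string &Tok) {
  if (Tok.empty() || std::isdigit(static_cast<unsigned char>(Tok[0])))
    return false;
  for (unsigned char C : Tok)
    if (!(std::isalnum(C) || C == '_'))
      return false;
  return true;
}

// Toks holds the raw pp-tokens of one logical line, with horizontal
// whitespace already skipped and no macro expansion applied.
DirectiveKind classifyLine(const std::vector<std::string> &Toks) {
  std::size_t I = 0;
  if (!Toks.empty() && Toks[0] == "export")
    ++I; // an `export` at the start of the line may precede the keyword
  if (I + 1 >= Toks.size())
    return DirectiveKind::None; // need the keyword plus a following token
  const std::string &Keyword = Toks[I];
  const std::string &Next = Toks[I + 1];
  if (Keyword == "import" &&
      (isIdentifier(Next) || Next == "<" || Next == ":" ||
       (!Next.empty() && Next[0] == '"')))
    return DirectiveKind::Import; // "::" is a distinct token, so it fails here
  if (Keyword == "module" && (isIdentifier(Next) || Next == ";"))
    return DirectiveKind::Module;
  return DirectiveKind::None;
}
```

For example, the tokens of `export module a;` classify as `Module`, while `import ::x` falls through to `None` and the `import` token would be treated as an ordinary identifier.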
2025-06-21[C++][Modules] A module directive may only appear as the first preprocessing tokens in a file (#144233)yronglin1-0/+13
This PR is the 2nd part of the [P1857R3](https://github.com/llvm/llvm-project/pull/107168) implementation, and mainly implements the restriction `A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)`: [cpp.pre](https://eel.is/c++draft/cpp.pre):

```
module-file:
    pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```

We also refine the tests to use `split-file` instead of conditional macros.

Signed-off-by: yronglin <yronglin777@gmail.com>
2025-04-04[clang][deps] Respect `Lexer::cutOffLexing()` (#134404)Jan Svoboda1-0/+3
This is crucial when recovering from fatal loader errors. Without it, the `Lexer` keeps yielding more tokens and the compiler may access invalid `ASTReader` state. rdar://133388373
2025-03-19Suppress pedantic diagnostic for a file not ending in EOL (#131794)Aaron Ballman1-22/+6
WG14 added N3411 to the list of papers which apply to older versions of C in C2y, and WG21 adopted CWG787 as a Defect Report in C++11. So we no longer should be issuing a pedantic diagnostic about a file which does not end with a newline character. We do, however, continue to support -Wnewline-eof as an opt-in diagnostic.
2025-03-18[C2y] Add octal prefixes, deprecate unprefixed octals (#131626)Aaron Ballman1-10/+31
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It also deprecates use of octal literals without a prefix, except for the literal 0. This feature is being exposed as an extension in older C language modes as well as in all C++ language modes.
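A minimal sketch of the classification this message describes: accept `0o`/`0O` prefixed octal literals, and mark unprefixed octal literals other than `0` as deprecated. `OctalLiteral` and `parseOctal` are hypothetical names, the function ignores integer suffixes, digit separators and overflow, and it is not how clang's literal parser is structured.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

struct OctalLiteral {
  std::uint64_t Value;
  bool DeprecatedUnprefixed; // true for spellings like 017
};

// Returns std::nullopt if Lit is not an octal spelling.
// Overflow is not checked in this sketch.
std::optional<OctalLiteral> parseOctal(std::string_view Lit) {
  if (Lit.empty() || Lit[0] != '0')
    return std::nullopt; // octal spellings always start with 0
  bool Prefixed = Lit.size() >= 2 && (Lit[1] == 'o' || Lit[1] == 'O');
  std::string_view Digits = Lit.substr(Prefixed ? 2 : 1);
  if (Prefixed && Digits.empty())
    return std::nullopt; // "0o" needs at least one digit
  std::uint64_t Value = 0;
  for (char C : Digits) {
    if (C < '0' || C > '7')
      return std::nullopt;
    Value = Value * 8 + static_cast<std::uint64_t>(C - '0');
  }
  // The bare literal 0 is exempt from the deprecation.
  return OctalLiteral{Value, !Prefixed && !Digits.empty()};
}
```

Under these assumptions, `0o17` and `017` both yield 15, but only the second is flagged as deprecated.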
2025-03-07[C2y] Implement WG14 N3411 (#130180)Aaron Ballman1-8/+7
This paper (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3411.pdf) allows a source file to end without a newline. Clang has supported this as a conforming extension for a long time, so this suppresses the diagnostic in C2y mode but continues to diagnose as an extension in earlier language modes. It also continues to diagnose if the user passes -Wnewline-eof explicitly.
2025-01-22[clang-reorder-fields] Reorder leading comments (#123740)Clement Courbet1-0/+21
Similarly to https://github.com/llvm/llvm-project/pull/122918, leading comments are currently not being moved.

```
struct Foo {
  // This one is the cool field.
  int a;
  int b;
};
```

becomes:

```
struct Foo {
  // This one is the cool field.
  int b;
  int a;
};
```

but should be:

```
struct Foo {
  int b;
  // This one is the cool field.
  int a;
};
```
2025-01-16[clang][refactor] Refactor `findNextTokenIncludingComments` (#123060)Clement Courbet1-1/+3
We have two copies of the same code in clang-tidy and clang-reorder-fields, and those are extremely similar to `Lexer::findNextToken`, so just add an extra argument to the latter. --------- Co-authored-by: cor3ntin <corentinjabot@gmail.com>
2024-12-05Skip escaped newlines before checking for whitespace in Lexer::getRawToken. (#117548)Samira Bazuzi1-1/+1
The Lexer used in getRawToken is not told to keep whitespace, so when it skips over escaped newlines, it also ignores whitespace, regardless of getRawToken's IgnoreWhiteSpace parameter. Instead of letting this case fall through to lexing, check for whitespace after skipping over any escaped newlines.
2024-11-16[Lex] Remove unused includes (NFC) (#116460)Kazu Hirata1-1/+0
Identified with misc-include-cleaner.
2024-09-16Remove ^^ as a token in OpenCL (#108224)Aaron Ballman1-3/+2
OpenCL has a reserved operator (^^), the use of which was diagnosed as an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL also encourages working with the blocks language extension. This token has a parsing ambiguity as a result. Consider: unsigned x=0; unsigned y=x^^{return 0;}(); This should result in y holding the value zero (0^0) through an immediately invoked block call as the right-hand side of the xor operator. However, it causes errors instead because of this reserved token: https://godbolt.org/z/navf7jTv1 This token is still reserved in OpenCL 3.0, so we still wish to issue a diagnostic for its use. However, we do not need to create a token for an extension point that's been unused for about a decade. So this patch moves the diagnostic from a parsing diagnostic to a lexing diagnostic and no longer forms a single token. The diagnostic behavior is slightly worse as a result, but still seems acceptable. Part of the reason this is coming up is because WG21 is considering using ^^ as a token for reflection, so this token may come back in the future.
2024-09-05[Clang] Warn with -Wpre-c23-compat instead of -Wpre-c++17-compat for u8 character literals in C23 (#97210)Mital Ashok1-1/+3
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
2024-07-10[Clang] Allow raw string literals in C as an extension (#88265)Sirraide1-5/+5
This enables raw R"" string literals in C in some language modes and adds an option to disable or enable them explicitly as an extension. Background: GCC supports raw string literals in C in `-gnuXY` modes starting with gnu99. This pr both enables raw string literals in gnu99 mode and later in C and adds an `-f[no-]raw-string-literals` flag to override this behaviour. The decision not to enable raw string literals in gnu89 mode, according to the GCC devs, is intentional as that mode is supposed to be used for ‘old code’ that they don’t want to break; we’ve decided to match GCC’s behaviour here as well. The `-fraw-string-literals` flag can additionally be used to enable raw string literals in modes where they aren’t enabled by default (such as c99—as opposed to gnu99—or even e.g. C++03); conversely, the negated flag can be used to disable them in any gnuXY modes that *do* provide them by default, or to override a previous flag. However, we do *not* support disabling raw string literals (or indeed either of these two options) in C++11 mode and later, because we don’t want to just start supporting disabling features that are actually part of the language in the general case. This fixes #85703.
2024-06-13Fix off-by-one issue found by post-commit reviewAaron Ballman1-1/+1
2024-05-28[Clang] allow `` `@$ `` in raw string delimiters in C++26 (#93216)cor3ntin1-1/+10
And as an extension in older language modes. Per https://eel.is/c++draft/lex.string#nt:d-char Fixes #93130
2024-02-15bad error message on incorrect string literal #18079 (#81670)akshaykumars6141-0/+2
(bad error message on incorrect string literal) Fixed the error message for an incorrect string literal.

before:

```
test.cpp:1:19: error: invalid character ' ' character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
char const* a = R"
                  ^
```

now:

```
test.cpp:1:19: error: invalid newline character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
    1 | char const* a = R"
      |                   ^
```

---------

Co-authored-by: Jon Roelofs <jroelofs@gmail.com>
2024-01-31[clang][NFC] Move isSimpleTypeSpecifier() from Sema to Token (#80101)Owen Pan1-0/+45
So that it can be used by clang-format.
2023-12-13[clang] Use StringRef::{starts,ends}_with (NFC) (#75149)Kazu Hirata1-2/+2
This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.
2023-11-29[HLSL] Support vector swizzles on scalars (#67700)Chris B1-0/+4
HLSL supports vector swizzles on scalars by implicitly converting the scalar to a single-element vector. This syntax is a convenient way to initialize vectors based on filling a scalar value. There are two parts of this change. The first part in the Lexer splits numeric constant tokens when a `.x` or `.r` suffix is encountered. This splitting is a bit hacky but allows the numeric constant to be parsed separately from the vector element expression. There is an ambiguity here with the `r` suffix used by fixed point types, however fixed point types aren't supported in HLSL so this should not cause any exposable problems (a separate issue has been filed to track validating language options for HLSL: #67689). The second part of this change is in Sema::LookupMemberExpr. For HLSL, if the base type is a scalar, we implicitly cast the scalar to a one-element vector then call back to perform the vector lookup. Fixes #56658 and #67511
2023-10-31[clang] Change GetCharAndSizeSlow interface to by-value styleserge-sans-paille1-33/+40
Instead of passing the Size by reference, assuming it is initialized, return it alongside the expected char result as a POD. This makes the interface less error prone: the previous interface expected the Size reference to be initialized, and it was often forgotten, leading to uninitialized variable usage. This patch fixes the issue. This also generates faster code, as the returned POD (a char and an unsigned) fits in 64 bits. The speedup according to the compile time tracker reaches -0.7%, with a good number of -0.4%. Details are available on https://llvm-compile-time-tracker.com/compare.php?from=3fe63f81fcb999681daa11b2890c82fda3aaeef5&to=fc76a9202f737472ecad4d6e0b0bf87a013866f3&stat=instructions:u And icing on the cake, on my setup it also shaves 2kB out of libclang-cpp :-) This is a recommit of d8f5a18b6e587aeaa8b99707e87b652f49b160cd for
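The by-value interface can be sketched as follows. This is a simplified stand-in rather than the real `GetCharAndSizeSlow`: the names `CharAndSize` and `getCharAndSize` are assumptions, the function only handles backslash-newline splices (no trigraphs, no whitespace between the backslash and the newline), and it assumes a NUL-terminated buffer.

```cpp
// Result bundle returned by value: the decoded character plus the number
// of source bytes it occupied. Nothing needs to be pre-initialized by the
// caller, which removes the failure mode of the by-reference interface.
struct CharAndSize {
  char Char;
  unsigned Size;
};

// Simplified stand-in: skips any number of "\\\n" splices, then returns
// the next character. Assumes Ptr points into a NUL-terminated buffer.
CharAndSize getCharAndSize(const char *Ptr) {
  unsigned Size = 0;
  while (Ptr[0] == '\\' && Ptr[1] == '\n') {
    Ptr += 2;  // step over the backslash and the newline
    Size += 2; // and account for them in the consumed size
  }
  return {*Ptr, Size + 1};
}
```

On common ABIs a `char` plus an `unsigned` occupies 8 bytes, so the struct can travel in a single register, which is consistent with the "fits in 64 bits" remark in the message.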
2023-10-29Revert "Perf/lexer faster slow get char and size (#70543)"Nico Weber1-40/+33
This reverts commit d8f5a18b6e587aeaa8b99707e87b652f49b160cd. Breaks build, see: https://github.com/llvm/llvm-project/pull/70543#issuecomment-1784227421
2023-10-29Perf/lexer faster slow get char and size (#70543)serge-sans-paille1-33/+40
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-10-19[clang] Provide an SSE4.2 implementation of identifier token lexer (#68962)serge-sans-paille1-7/+39
The _mm_cmpistri instruction can be used to quickly parse identifiers. With this patch activated, clang pre-processes <iostream> 1.8% faster, and sqlite3.c amalgamation 1.5% faster, based on time measurements and number of executed instructions as measured by valgrind. The introduction of an extra helper function in the regular case has no impact on performance, see https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u --------- Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
2023-10-07[clang][Lex][NFC] Make some local variables constTimm Bäder1-4/+4
2023-09-06[Clang] Handle non-ASCII after line splicingCorentin Jabot1-17/+28
    int a\
    ス;

Failed to be parsed as a valid identifier. Fixes #65156 Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D159345
2023-09-04[clang][NFC] Remove stray slashTimm Bäder1-1/+1
2023-08-22Revert "[Clang] CWG1473: do not err on the lack of space after operator"""Reid Kleckner1-4/+46
This reverts commit f2583f3acf596cc545c8c0e3cb28e712f4ebf21b. There is a large body of non-conforming C-like code using format strings like this:

    #define PRIuS "zu"
    void h(size_t foo, size_t bar) {
      printf("foo is %"PRIuS", bar is %"PRIuS, foo, bar);
    }

Rejecting this code would be very disruptive. We could decide to do that, but it's sufficiently disruptive that I think it requires gathering more community consensus with an RFC, and Aaron indicated [1] it's OK to revert for now so continuous testing systems can see past this issue while we decide what to do.

[1] https://reviews.llvm.org/D153156#4607717
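For contrast, the conforming spelling of such a format string puts whitespace between each string literal and the macro, so that the macro name is never lexed as a literal suffix. The sketch below reuses the `PRIuS` macro from the message; the function `formatSizes` and the fixed buffer size are hypothetical and exist only to make the example self-contained.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#define PRIuS "zu"

// Hypothetical helper: formats two sizes using the macro-based format.
std::string formatSizes(std::size_t Foo, std::size_t Bar) {
  char Buf[64];
  // Spaces around PRIuS keep it a separate token from the adjacent string
  // literals; after macro expansion the literals are concatenated as usual.
  std::snprintf(Buf, sizeof Buf, "foo is %" PRIuS ", bar is %" PRIuS, Foo, Bar);
  return Buf;
}
```

Code written this way is accepted regardless of how the lack-of-space case is diagnosed.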
2023-08-22[Lex] Preambles should contain the global module fragment.Sam McCall1-0/+16
For applications like clangd, the preamble remains an important optimization when editing a module definition. The global module fragment is a good fit for it as it by definition contains only preprocessor directives. Before this patch, we would terminate the preamble immediately at the "module" keyword. Differential Revision: https://reviews.llvm.org/D158439
2023-08-17[Clang] CWG1473: do not err on the lack of space after operator""Po-yao Chang1-46/+4
In addition: 1. Fix tests for CWG2521 deprecation warning. 2. Enable -Wdeprecated-literal-operator by default. Differential Revision: https://reviews.llvm.org/D153156
2023-08-11[C23] Rename C2x->C23 in diagnosticsAaron Ballman1-1/+1
This renames C2x to C23 in diagnostic identifiers and messages. The changes were made mechanically.
2023-08-11[C23] Rename C2x -> C23; NFCAaron Ballman1-6/+6
This does the rename for most internal uses of C2x, but does not rename or reword diagnostics (those will be done in a follow-up). I also updated standards references and citations to the final wording in the standard.
2023-07-22[clang] Enable C++11-style attributes in all language modesNikolas Klauser1-5/+2
This also ignores and deprecates the `-fdouble-square-bracket-attributes` command line flag, which seems to not be used anywhere. At least a code search exclusively found mentions of it in documentation: https://sourcegraph.com/search?q=context:global+-fdouble-square-bracket-attributes+-file:clang/*+-file:test/Sema/*+-file:test/Parser/*+-file:test/AST/*+-file:test/Preprocessor/*+-file:test/Misc/*+archived:yes&patternType=standard&sm=0&groupBy=repo RFC: https://discourse.llvm.org/t/rfc-enable-c-11-c2x-attributes-in-all-standard-modes-as-an-extension-and-remove-fdouble-square-bracket-attributes This enables `[[]]` attributes in all C and C++ language modes without warning by default. `-Wc++-extensions` does warn. GCC has enabled this extension in all C modes since GCC 10. Reviewed By: aaron.ballman, MaskRay Spies: #clang-vendors, beanz, JDevlieghere, Michael137, MaskRay, sstefan1, jplehr, cfe-commits, lldb-commits, dmgreen, jdoerfert, wenlei, wlei Differential Revision: https://reviews.llvm.org/D151683
2023-07-12[Clang] Correctly handle $, @, and ` when represented as UCNCorentin Jabot1-6/+8
This covers
* P2558R2 (C++, wg21.link/P2558)
* N2701 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm)
* N3124 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3124.pdf)

This patch
* Disallows representing $ as a UCN in all language modes, which did not properly work (see GH62133), and which is made ill-formed in C++ and C by P2558 and N3124 respectively
* Allows a UCN for any character in C2X, in string and character literals

Fixes #62133

Reviewed By: #clang-language-wg, tahonermann Differential Revision: https://reviews.llvm.org/D153621
2023-05-04[clang] Use -std=c++23 instead of -std=c++2bMark de Wever1-4/+4
During the ISO C++ Committee meeting plenary session the C++23 Standard has been voted as technically complete. This updates the reference to c++2b to c++23 and updates the __cplusplus macro. Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen. Reviewed By: aaron.ballman Differential Revision: https://reviews.llvm.org/D149553
2023-05-03[clang][deps] Teach dep directive scanner about #pragma clang system_headerBen Langmuir1-0/+1
This ensures we get the correct FileCharacteristic during scanning. In a yet-to-be-upstreamed branch this fixes observable failures, but it's also good to handle this on principle: the FileCharacteristic is a property of the file that is observable in the scanner, so there is nothing preventing us from depending on it. rdar://108627403 Differential Revision: https://reviews.llvm.org/D149777
2023-04-27[C++20] [Modules] Avoid crash if the inconsistency the size of lang options exceeds 1Chuanqi Xu1-3/+1
Close https://github.com/llvm/llvm-project/issues/62359 The root cause of the crash is that we didn't test the case where the number of bits of a language option exceeds 1.
2023-03-15Use *{Map,Set}::contains (NFC)Kazu Hirata1-1/+1
Differential Revision: https://reviews.llvm.org/D146104
2023-01-28Use llvm::count{lr}_{zero,one} (NFC)Kazu Hirata1-1/+1
2023-01-19[Lex] For dependency directive lexing, angled includes in `__has_include` should be lexed as string literalsArgyrios Kyrtzidis1-0/+16
rdar://104386604 Differential Revision: https://reviews.llvm.org/D142143
2023-01-14[clang] Remove remaining uses of llvm::Optional (NFC)Kazu Hirata1-1/+0
This patch removes several "using" declarations and #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14[clang] Use std::optional instead of llvm::Optional (NFC)Kazu Hirata1-12/+12
This patch replaces (llvm::|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14[clang] Add #include <optional> (NFC)Kazu Hirata1-0/+1
This patch adds #include <optional> to those files containing llvm::Optional<...> or Optional<...>. I'll post a separate patch to actually replace llvm::Optional with std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-03[Clang] Fix a crash when encountering an ill-formed delimited UCN.Corentin Jabot1-1/+1
\u<DIGIT>{...} was incorrectly parsed as a valid UCN instead of emitting a diagnostic, causing an assertion failure. Reviewed By: tahonermann Differential Revision: https://reviews.llvm.org/D139889