path: root/clang/lib/Format/FormatTokenLexer.cpp
Age | Commit message | Author | Files | Lines
2025-07-21 | [clang-format] Remove code related to trigraphs (#148640) | sstwcw | 1 | -24/+11
When reviewing #147156, the reviewers pointed out that we didn't need to support the trigraph. The code never handled it right. In the debug build, this kind of input caused the assertion in the function `countLeadingWhitespace` to fail. The release build without assertions outputted `?` `?` `/` separated by spaces.

```C
#define A ??/
int i;
```

This is because the code in `countLeadingWhitespace` assumed that the underlying lexer recognized the entire `??/` sequence as a single token. In fact, the lexer recognized it as 3 separate tokens. The flag to make the lexer recognize trigraphs was never enabled.

This patch enables the flag in the underlying lexer. This way, the program now either turns the trigraph into a single `\` or removes it altogether if the line is short enough. There are operators like the `??=` in C#. So the flag is not enabled for all input languages. Instead the check for the token size is moved from the assert line into the if line.

The problem was introduced by my own patch 370bee480139 from about 3 years ago. I added code to count the number of characters in the escape sequence probably just because the block of code used to have a comment saying someone should add the feature. Maybe I forgot to enable assertions when I ran the code. I found the problem because reviewing pull request 145243 made me look at the code again.
2025-07-13 | [clang-format] Add MacrosSkippedByRemoveParentheses option (#148345) | Owen Pan | 1 | -0/+4
This allows RemoveParentheses to skip the invocations of function-like macros. Fixes #68354. Fixes #147780.
2025-07-10 | [clang-format] Split line comments separated by backslashes (#147648) | Owen Pan | 1 | -9/+12
Fixes #147341
2025-07-06 | [clang-format][NFC] Use `empty()` instead of comparing size() to 0 or 1 | Owen Pan | 1 | -2/+2
2025-07-06 | [clang-format][NFC] Replace size() with empty() (#147164) | Owen Pan | 1 | -6/+5
2025-06-25 | [clang-format] Handle Trailing Whitespace After Line Continuation (P2223R2) (#145243) | Naveen Seth Hanig | 1 | -9/+21
Fixes #145226. Implement [P2223R2](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p2223r2.pdf) in clang-format to correctly handle cases where a backslash '\\' is followed by trailing whitespace before the newline. Previously, `clang-format` failed to properly detect and handle such cases, leading to misformatted code. With this, `clang-format` matches the behavior already implemented in Clang's lexer and `DependencyDirectivesScanner.cpp`, which allow trailing whitespace after a line continuation in any C++ standard.
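The rule described above can be illustrated with a small standalone sketch. This is not the code from the patch: `lineContinuationLength` is a hypothetical helper, and treating space, tab, form feed and vertical tab as the permitted trailing whitespace is an assumption made for the example.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the clang-format implementation.
// Returns the length of a line continuation that starts at Text[Pos]:
// a backslash, optional horizontal whitespace, then a newline
// ("\n", "\r\n" or "\r"). Returns 0 if no continuation starts there.
std::size_t lineContinuationLength(std::string_view Text, std::size_t Pos) {
  if (Pos >= Text.size() || Text[Pos] != '\\')
    return 0;
  std::size_t I = Pos + 1;
  // P2223R2: whitespace between the backslash and the newline is ignored.
  // Assumption: these four characters count as horizontal whitespace.
  while (I < Text.size() && (Text[I] == ' ' || Text[I] == '\t' ||
                             Text[I] == '\f' || Text[I] == '\v'))
    ++I;
  if (I == Text.size())
    return 0; // Backslash at end of input: no newline follows.
  if (Text[I] == '\n')
    return I + 1 - Pos;
  if (Text[I] == '\r') {
    ++I;
    if (I < Text.size() && Text[I] == '\n')
      ++I; // Treat "\r\n" as a single newline.
    return I - Pos;
  }
  return 0; // Some other character follows the backslash.
}
```

A caller scanning leading whitespace could skip the returned number of characters whenever the result is nonzero, so `\` followed by spaces and a newline is handled the same way as `\` followed directly by a newline.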
2025-05-25 | [clang-format] Handle Java text blocks (#141334) | Owen Pan | 1 | -0/+32
Fix #61954
2025-05-23 | [clang-format][NFC] FormatTokenLexer.cpp cleanup (#141202) | Owen Pan | 1 | -39/+26
2025-04-30 | Reland [clang-format] Add OneLineFormatOffRegex option (#137577) | Owen Pan | 1 | -1/+36
2025-04-30 | Revert "[clang-format] Add OneLineFormatOffRegex option (#137577)" | Owen Pan | 1 | -35/+0
This reverts commit b8bb1ccb4f9126d1bc9817be24e17f186a75a08b which triggered an assertion failure in CodeGenTest.TestNonAlterTest.
2025-04-29 | [clang-format] Add OneLineFormatOffRegex option (#137577) | Owen Pan | 1 | -0/+35
Close #54334
2025-04-22 | [clang-format] Fix a bug in lexing C++ UDL ending in $ (#136476) | Owen Pan | 1 | -0/+29
Fix #61612
2025-04-12 | [clang-format][NFC] Add isJava() and isTextProto() in FormatStyle (#135466) | Owen Pan | 1 | -5/+5
Also remove redundant name qualifiers format::, FormatStyle::, and LanguageKind::.
2025-04-10 | [clang-format] Handle C++ keywords in other languages better (#132941) | sstwcw | 1 | -3/+0
There is some code to make sure that C++ keywords that are identifiers in the other languages are not treated as keywords. Right now, the kind is set to identifier, and the identifier info is cleared. The latter is probably so that the code for identifying C++ structures does not recognize those structures by mistake when formatting a language that does not have those structures. But we did not find an instance where the language can have the sequence of tokens, the code tries to parse the structure as if it is C++ using the identifier info instead of the token kind, but without checking for the language setting. However, there are places where the code checks whether the identifier info field is null or not. They are places where an identifier and a keyword are treated the same way. For example, the name of a function in JavaScript. This patch removes the lines that clear the identifier info. This way, a C++ keyword gets treated in the same way as an identifier in those places. JavaScript New ```JavaScript async function union( myparamnameiswaytooloooong) { } ``` Old ```JavaScript async function union( myparamnameiswaytooloooong) { } ``` Java New ```Java enum union { ABC, CDE } ``` Old ```Java enum union { ABC, CDE } ``` This reverts commit 97dcbdef6089175c45e14fcbcf5c88b10233a79a.
2025-04-01 | Revert "[clang-format] Handle C++ keywords in other languages better (#132941)" | Owen Pan | 1 | -0/+3
This reverts commit ab7cee8a0ecf29fdb47c64c8d431a694d63390d2 which had formatting errors.
2025-03-31 | [clang-format] Handle C++ keywords in other languages better (#132941) | sstwcw | 1 | -3/+0
There is some code to make sure that C++ keywords that are identifiers in the other languages are not treated as keywords. Right now, the kind is set to identifier, and the identifier info is cleared. The latter is probably so that the code for identifying C++ structures does not recognize those structures by mistake when formatting a language that does not have those structures. But we did not find an instance where the language can have the sequence of tokens, the code tries to parse the structure as if it is C++ using the identifier info instead of the token kind, but without checking for the language setting. However, there are places where the code checks whether the identifier info field is null or not. They are places where an identifier and a keyword are treated the same way. For example, the name of a function in JavaScript. This patch removes the lines that clear the identifier info. This way, a C++ keyword gets treated in the same way as an identifier in those places. JavaScript New ```JavaScript async function union( myparamnameiswaytooloooong) { } ``` Old ```JavaScript async function union( myparamnameiswaytooloooong) { } ``` Java New ```Java enum union { ABC, CDE } ``` Old ```Java enum union { ABC, CDE } ```
2025-03-16 | [clang-format] Correctly annotate user-defined conversion functions (#131434) | Owen Pan | 1 | -3/+3
Also fix/delete existing invalid/redundant test cases. Fix #130894
2025-01-14 | [clang-format][NFC] Make formatting Verilog faster (#121139) | sstwcw | 1 | -20/+37
A regular expression was used in the lexing process. It made the program take more than linear time with regard to the length of the input. It looked like the entire buffer could be scanned for every token lexed. Now the regular expression is replaced with code. Previously it took 20 minutes for the program to format 125 000 lines of code on my computer. Now it takes 315 milliseconds.
2025-01-04 | [clang-format][NFC] Replace SmallVectorImpl with ArrayRef (#121621) | Owen Pan | 1 | -3/+2
2025-01-01 | [clang-format] Add `VariableTemplates` option (#121318) | Owen Pan | 1 | -0/+4
Closes #120148.
2024-10-23 | [clang-format] Add KeepFormFeed option (#113268) | Owen Pan | 1 | -0/+8
Closes #113170.
2024-10-02 | [clang-format] Add TemplateNames option to help parse C++ angles (#109916) | Owen Pan | 1 | -0/+4
Closes #109912.
2024-09-17 | [clang-format] Reimplement InsertNewlineAtEOF (#108513) | Owen Pan | 1 | -0/+7
Fixes #108333.
2024-05-13 | Reland "[clang-format] Fix FormatToken::isSimpleTypeSpecifier() (#91712)" | Owen Pan | 1 | -1/+0
Remove FormatToken::isSimpleTypeSpecifier() and call Token::isSimpleTypeSpecifier(LangOpts) instead.
2024-05-12 | Revert "[clang-format] Fix FormatToken::isSimpleTypeSpecifier() (#91712)" | Owen Pan | 1 | -0/+1
This reverts commits e62ce1f8842c, 5cd280433e8e, and de641e289269 due to buildbot failures.
2024-05-10 | [clang-format] Fix FormatToken::isSimpleTypeSpecifier() (#91712) | Owen Pan | 1 | -1/+0
Remove FormatToken::isSimpleTypeSpecifier() and call Token::isSimpleTypeSpecifier(LangOpts) instead.
2024-04-06 | [clang-format][NFC] Use `is` instead of `getType() ==` | Owen Pan | 1 | -1/+1
2024-03-19 | Revert "[clang-format][NFC] Delete 100+ redundant #include lines in .cpp files" | Owen Pan | 1 | -0/+4
This reverts commit b92d6dd704d789240685a336ad8b25a9f381b4cc. See github.com/llvm/llvm-project/commit/b92d6dd704d7#commitcomment-139992444 We should use a tool like Visual Studio to clean up the headers.
2024-03-19 | Revert "[clang-format][NFC] Eliminate the IsCpp parameter in all functions (#84599)" | Owen Pan | 1 | -3/+2
This reverts c3a1eb6207d8 (and the related commit f3c5278efa3b) which makes cleanupAroundReplacements() no longer thread-safe.
2024-03-16 | [clang-format][NFC] Delete 100+ redundant #include lines in .cpp files | Owen Pan | 1 | -4/+0
2024-03-14 | Reland [clang-format][NFC] Eliminate the IsCpp parameter in all functions (#84599) | Owen Pan | 1 | -2/+3
Initialize IsCpp in LeftRightQualifierAlignmentFixer ctor.
2024-03-14 | Revert "[clang-format][NFC] Eliminate the IsCpp parameter in all functions" (#85353) | Mehdi Amini | 1 | -3/+2
Reverts llvm/llvm-project#84599. This broke the presubmit bot.
2024-03-14 | [clang-format][NFC] Eliminate the IsCpp parameter in all functions (#84599) | Owen Pan | 1 | -2/+3
2024-02-13 | Revert "[clang-format][NFC] Make LangOpts global in namespace Format" | Owen Pan | 1 | -4/+8
This reverts commit 32e65b0b8a743678974c7ca7913c1d6c41bb0772. It seems to break some PowerPC bots. See https://github.com/llvm/llvm-project/pull/81390#issuecomment-1941964803.
2024-02-12 | [clang-format] Support of TableGen value annotations. (#80299) | Hirofumi Nakamura | 1 | -1/+1
This implements the annotation of the values in TableGen. The main changes are:
- parseTableGenValue(), the simplified parser method for the syntax of values.
- modified consumeToken() to call parseTableGenValue in 'if', 'assert' and after '='.
- modified parseParens() to call parseTableGenValue inside.
- modified parseSquare() to call parseTableGenValue inside, skipping separator tokens.
- modified parseAngle() to call parseTableGenValue inside, skipping separator tokens.
2024-02-11 | Reland "[clang-format][NFC] Make LangOpts global in namespace Format (#81390)" | Owen Pan | 1 | -8/+4
Restore getFormattingLangOpts().
2024-02-11 | Revert "[clang-format][NFC] Make LangOpts global in namespace Format (#81390)" | Owen Pan | 1 | -4/+8
This reverts commit 03f571995b4f0c260254955afd16ec44d0764794. We can't hide getFormattingLangOpts() as it's used by other tools.
2024-02-11 | [clang-format][NFC] Make LangOpts global in namespace Format (#81390) | Owen Pan | 1 | -8/+4
2024-02-09 | Revert "[clang-format] Update FormatToken::isSimpleTypeSpecifier() (#80241)" | Owen Pan | 1 | -4/+3
This reverts commit 763139afc19ddf2e0f0265dc828ce8e5fbe92530. It seems that LangOpts is not initialized before use.
2024-02-08 | [clang-format] Update FormatToken::isSimpleTypeSpecifier() (#80241) | Owen Pan | 1 | -3/+4
Now with a8279a8bc541, we can make the update.
2024-01-31 | [clang] Use StringRef::starts_with (NFC) | Kazu Hirata | 1 | -1/+1
2024-01-31 | [clang-format] Support of TableGen tokens with unary operator like form, bang operators and numeric literals. (#78996) | Hirofumi Nakamura | 1 | -7/+38
Adds support for tokens that have forms like unary operators.
- bang operators: `!name`
- cond operator: `!cond`
- numeric literals: `+1`, `-1`
The cond operator is one of the bang operators but is distinguished because it has a very specific syntax.
2024-01-20 | [clang-format] Support of TableGen identifiers beginning with a number. (#78571) | Hirofumi Nakamura | 1 | -1/+41
TableGen allows identifiers beginning with a number. This patch adds support for recognizing such identifiers.
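To make the identifier rule concrete, here is a minimal sketch of such a check. It assumes the identifier grammar is zero or more digits, then a letter or underscore, then any mix of letters, digits and underscores; that grammar is an assumption of this example rather than something stated in the commit, and `looksLikeTableGenIdentifier` is a hypothetical name, not a function from the patch.

```cpp
#include <cstddef>
#include <string_view>

// Letter or underscore (ASCII only).
static bool isUAlpha(char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z') || C == '_';
}

static bool isDigit(char C) { return C >= '0' && C <= '9'; }

// Hypothetical check, assuming the grammar:
//   identifier ::= digit* ualpha (ualpha | digit)*
// so "1abc" is an identifier but "123" is a plain number.
bool looksLikeTableGenIdentifier(std::string_view S) {
  std::size_t I = 0;
  while (I < S.size() && isDigit(S[I]))
    ++I; // Skip the optional leading digits.
  if (I == S.size() || !isUAlpha(S[I]))
    return false; // Only digits, empty, or an invalid character.
  for (++I; I < S.size(); ++I)
    if (!isUAlpha(S[I]) && !isDigit(S[I]))
      return false;
  return true;
}
```

The point of the sketch is that a lexer cannot decide between a number and an identifier from the first character alone; it has to look past the leading digits.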
2024-01-17 | [clang-format] TableGen multi line string support. (#78032) | Hirofumi Nakamura | 1 | -0/+41
Support the handling of TableGen's multiline string (code) literal. That has the form, [{ this is the string possibly with multi line... }]
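As an illustration of the literal form above, the following sketch finds the extent of a `[{ ... }]` literal in a buffer. `codeLiteralLength` is a hypothetical helper written for this example; it assumes the literal ends at the first `}]` after the opener and does not reflect how the patch itself tokenizes the input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not the clang-format implementation.
// If a TableGen code literal "[{ ... }]" starts at Text[Pos], returns its
// total length including both delimiters; otherwise returns 0.
// Assumption: the literal ends at the first "}]" after the opening "[{".
std::size_t codeLiteralLength(std::string_view Text, std::size_t Pos) {
  if (Pos > Text.size() || Text.substr(Pos, 2) != "[{")
    return 0;
  const std::size_t End = Text.find("}]", Pos + 2);
  if (End == std::string_view::npos)
    return 0; // Unterminated literal.
  return End + 2 - Pos; // May span several lines.
}
```

Because the search does not stop at newlines, the whole multi-line literal can be handed on as one token.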
2024-01-11 | [clang-format] TableGen keywords support. (#77477) | Hirofumi Nakamura | 1 | -0/+3
Add TableGen keywords to the additional keyword list of the formatter. This pull request was split out from https://github.com/llvm/llvm-project/pull/76059.
2023-12-13 | [clang] Use StringRef::{starts,ends}_with (NFC) (#75149) | Kazu Hirata | 1 | -4/+4
This patch replaces uses of StringRef::{starts,ends}with with StringRef::{starts,ends}_with for consistency with std::{string,string_view}::{starts,ends}_with in C++20. I'm planning to deprecate and eventually remove StringRef::{starts,ends}with.
2023-11-29 | [clang-format][NFC] Extend isProto() to also cover LK_TextProto (#73582) | Owen Pan | 1 | -4/+1
2023-11-29 | [clang-format] Add spaces around the Verilog implication operator (#71352) | sstwcw | 1 | -2/+4
The Verilog implication operator `->` is a binary operator meaning either the left hand side is false or the right hand side is true. Previously it was treated as the C++ struct member operator. I didn't even know it existed when I added the operator formatting part. And I didn't check all the tests for all the operators I added. That is how the bad test got in.
2023-08-24 | [clang-format][NFC] Replace !is() with isNot() | Owen Pan | 1 | -11/+11
Differential Revision: https://reviews.llvm.org/D158571
2023-07-18 | [clang-format] Add TypeNames option to disambiguate types/objects | Owen Pan | 1 | -1/+7
If a non-keyword identifier is found in TypeNames, then a *, &, or && that follows it is annotated as TT_PointerOrReference. Differential Revision: https://reviews.llvm.org/D155273
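The rule in this message can be sketched in isolation as follows. Everything here is illustrative: the `TokenType` enum, `annotateAfterIdentifier`, and the use of a `std::set` for the configured names are assumptions of the example and are not the data structures used in clang-format.

```cpp
#include <set>
#include <string>
#include <string_view>

// Stand-in for the annotator's token types (hypothetical, reduced to two).
enum class TokenType { Unknown, PointerOrReference };

// Hypothetical sketch of the rule: if Identifier is listed in TypeNames and
// the next token is *, & or &&, that token is a pointer or reference.
TokenType annotateAfterIdentifier(const std::set<std::string> &TypeNames,
                                  std::string_view Identifier,
                                  std::string_view Next) {
  const bool IsStarOrAmp = Next == "*" || Next == "&" || Next == "&&";
  if (IsStarOrAmp && TypeNames.count(std::string(Identifier)) > 0)
    return TokenType::PointerOrReference;
  return TokenType::Unknown; // Left for other heuristics to decide.
}
```

With `TypeNames` containing `MyType`, the `*` in `MyType * x` is classified as a pointer, while the same token after an unlisted identifier stays undecided.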