Age | Commit message (Collapse) | Author | Files | Lines |
|
Consider the following code:
```cpp
# 1 __FILE__ 1 3
export module a;
```
According to the wording in
[P1857R3](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1857r3.html):
```
A module directive may only appear as the first preprocessing tokens in a file (excluding the global module fragment.)
```
and the wording in
[[cpp.pre]](https://eel.is/c++draft/cpp.pre#nt:module-file)
```
module-file:
pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
`#` is the first pp-token in the translation unit, and it was rejected
by clang, but they really should be exempted from this rule. The goal is
to not allow any preprocessor conditionals or most state changes, but
these don't fit that.
State change would mean most semantically observable preprocessor state,
particularly anything that is order dependent. Global flags like being a
system header/module shouldn't matter.
We should exempt a brunch of directives, even though it violates the
current standard wording.
In this patch, we introduce a `TrivialDirectiveTracer` to trace the
**State change** that described above and propose to exempt the
following kind of directive: `#line`, GNU line marker, `#ident`,
`#pragma comment`, `#pragma mark`, `#pragma detect_mismatch`, `#pragma
clang __debug`, `#pragma message`, `#pragma GCC warning`, `#pragma GCC
error`, `#pragma gcc diagnostic`, `#pragma OPENCL EXTENSION`, `#pragma
warning`, `#pragma execution_character_set`, `#pragma clang
assume_nonnull` and builtin macro expansion.
Fixes https://github.com/llvm/llvm-project/issues/145274
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
|
|
with std::numeric_limits (#147623)
This PR addresses instances of compiler warning C4146 that can be
replaced with std::numeric_limits. Specifically, these are cases where a
literal such as '-1ULL' was used to assign a value to a uint64_t
variable. The intent is much cleaner if we use the appropriate
std::numeric_limits value<Type>::max() for these cases.
Addresses #147439
|
|
submodule. (#146976)
Otherwise we are continuing in an invalid state and can easily crash.
It is a follow-up to cde90e68f8123e7abef3f9e18d79980aa19f460a but an
important difference is when a failure happens in a submodule. In this
case in `Preprocessor::HandleEndOfFile` `tok::eof` is replaced by
`tok::annot_module_end`. And after exiting a file with bad
`#include/#import` we work with a new buffer, so `BufferPtr < BufferEnd`.
As there are no signs to stop lexing we just keep doing it.
The fix is the same as in dc9fdaf2171cc480300d5572606a8ede1678d18b in
`Lexer::LexTokenInternal` but this time in
`Lexer::LexDependencyDirectiveToken` as well.
rdar://152499276
|
|
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
SourceLocation (#145711)
Introduce a type alias for the commonly used `std::pair<FileID,
unsigned>` to improve code readability, and make it easier for future
updates (64-bit source locations).
|
|
Depends on [[clang][Preprocessor] Add peekNextPPToken, makes look ahead
next token without
side-effects](https://github.com/llvm/llvm-project/pull/143898).
This PR fix the performance regression that introduced in
https://github.com/llvm/llvm-project/pull/144233.
The original PR(https://github.com/llvm/llvm-project/pull/144233) handle
the first pp-token in the main source file in the macro
definition/expansion and `Lexer::Lex`, but the lexer is almost always on
the hot path, we may hit a performance regression. In this PR, we handle
the first pp-token in `Preprocessor::EnterMainSourceFile`.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
|
|
without side-effects (#143898)
This PR introduce a new function `peekNextPPToken`. It's an extension of
`isNextPPTokenLParen` and can makes look ahead one token in preprocessor
without side-effects.
It's also the 1st part of
https://github.com/llvm/llvm-project/pull/107168 and it was used to look
ahead next token then determine whether current lexing pp directive is
one of pp-import or pp-module directive.
At the start of phase 4 an import or module token is treated as starting
a directive and are converted to their respective keywords iff:
- After skipping horizontal whitespace are
- at the start of a logical line, or
- preceded by an export at the start of the logical line.
- Are followed by an identifier pp token (before macro expansion), or
- <, ", or : (but not ::) pp tokens for import, or
- ; for module
Otherwise the token is treated as an identifier.
---------
Signed-off-by: yronglin <yronglin777@gmail.com>
|
|
tokens in a file (#144233)
This PR is 2nd part of
[P1857R3](https://github.com/llvm/llvm-project/pull/107168)
implementation, and mainly implement the restriction `A module directive
may only appear as the first preprocessing tokens in a file (excluding
the global module fragment.)`:
[cpp.pre](https://eel.is/c++draft/cpp.pre):
```
module-file:
pp-global-module-fragment[opt] pp-module group[opt] pp-private-module-fragment[opt]
```
We also refine tests use `split-file` instead of conditional macro.
Signed-off-by: yronglin <yronglin777@gmail.com>
|
|
This is crucial when recovering from fatal loader errors. Without it,
the `Lexer` keeps yielding more tokens and the compiler may access
invalid `ASTReader` state.
rdar://133388373
|
|
WG14 added N3411 to the list of papers which apply to older versions of
C in C2y, and WG21 adopted CWG787 as a Defect Report in C++11. So we no
longer should be issuing a pedantic diagnostic about a file which does
not end with a newline character.
We do, however, continue to support -Wnewline-eof as an opt-in
diagnostic.
|
|
WG14 N3353 added support for 0o and 0O as octal literal prefixes. It
also deprecates use of octal literals without a prefix, except for the
literal 0.
This feature is being exposed as an extension in older C language modes
as well as in all C++ language modes.
|
|
This paper (https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3411.pdf)
allows a source file to end without a newline. Clang has supported this
as a conforming extension for a long time, so this suppresses the
diagnotic in C2y mode but continues to diagnose as an extension in
earlier language modes. It also continues to diagnose if the user passes
-Wnewline-eof explicitly.
|
|
Similarly to https://github.com/llvm/llvm-project/pull/122918, leading
comments are currently not being moved.
```
struct Foo {
// This one is the cool field.
int a;
int b;
};
```
becomes:
```
struct Foo {
// This one is the cool field.
int b;
int a;
};
```
but should be:
```
struct Foo {
int b;
// This one is the cool field.
int a;
};
```
|
|
We have two copies of the same code in clang-tidy and
clang-reorder-fields, and those are extremenly similar to
`Lexer::findNextToken`, so just add an extra agument to the latter.
---------
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
|
|
(#117548)
The Lexer used in getRawToken is not told to keep whitespace, so when it
skips over escaped newlines, it also ignores whitespace, regardless of
getRawToken's IgnoreWhiteSpace parameter.
Instead of letting this case fall through to lexing, check
for whitespace after skipping over any escaped newlines.
|
|
Identified with misc-include-cleaner.
|
|
OpenCL has a reserved operator (^^), the use of which was diagnosed as
an error (735c6cdebdcd4292928079cb18a90f0dd5cd65fb). However, OpenCL
also encourages working with the blocks language extension. This token
has a parsing ambiguity as a result. Consider:
unsigned x=0;
unsigned y=x^^{return 0;}();
This should result in y holding the value zero (0^0) through an
immediately invoked block call as the right-hand side of the xor
operator. However, it causes errors instead because of this reserved
token: https://godbolt.org/z/navf7jTv1
This token is still reserved in OpenCL 3.0, so we still wish to issue a
diagnostic for its use. However, we do not need to create a token for an
extension point that's been unused for about a decade. So this patch
moves the diagnostic from a parsing diagnostic to a lexing diagnostic
and no longer forms a single token. The diagnostic behavior is slightly
worse as a result, but still seems acceptable.
Part of the reason this is coming up is because WG21 is considering
using ^^ as a token for reflection, so this token may come back in the
future.
|
|
character literals in C23 (#97210)
Co-authored-by: cor3ntin <corentinjabot@gmail.com>
|
|
This enables raw R"" string literals in C in some language modes
and adds an option to disable or enable them explicitly as an
extension.
Background: GCC supports raw string literals in C in `-gnuXY` modes
starting with gnu99. This pr both enables raw string literals in gnu99
mode and later in C and adds an `-f[no-]raw-string-literals` flag to override
this behaviour. The decision not to enable raw string literals in gnu89
mode, according to the GCC devs, is intentional as that mode is supposed
to be used for ‘old code’ that they don’t want to break; we’ve decided to
match GCC’s behaviour here as well.
The `-fraw-string-literals` flag can additionally be used to enable raw string
literals in modes where they aren’t enabled by default (such as c99—as
opposed to gnu99—or even e.g. C++03); conversely, the negated flag can
be used to disable them in any gnuXY modes that *do* provide them by
default, or to override a previous flag. However, we do *not* support
disabling raw string literals (or indeed either of these two options) in
C++11 mode and later, because we don’t want to just start supporting
disabling features that are actually part of the language in the general case.
This fixes #85703.
|
|
|
|
And as an extension in older language modes.
Per https://eel.is/c++draft/lex.string#nt:d-char
Fixes #93130
|
|
(bad error message on incorrect string literal)
Fixed the error message for incorrect string literal
before:
```
test.cpp:1:19: error: invalid character '
' character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
char const* a = R"
^
```
now:
```
test.cpp:1:19: error: invalid newline character in raw string delimiter; use PREFIX( )PREFIX to delimit raw string
1 | char const* a = R"
| ^
```
---------
Co-authored-by: Jon Roelofs <jroelofs@gmail.com>
|
|
So that it can be used by clang-format.
|
|
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
|
|
HLSL supports vector swizzles on scalars by implicitly converting the
scalar to a single-element vector. This syntax is a convienent way to
initialize vectors based on filling a scalar value.
There are two parts of this change. The first part in the Lexer splits
numeric constant tokens when a `.x` or `.r` suffix is encountered. This
splitting is a bit hacky but allows the numeric constant to be parsed
separately from the vector element expression. There is an ambiguity
here with the `r` suffix used by fixed point types, however fixed point
types aren't supported in HLSL so this should not cause any exposable
problems (a separate issue has been filed to track validating language
options for HLSL: #67689).
The second part of this change is in Sema::LookupMemberExpr. For HLSL,
if the base type is a scalar, we implicit cast the scalar to a
one-element vector then call back to perform the vector lookup.
Fixes #56658 and #67511
|
|
Instead of passing the Size by reference, assuming it is initialized,
return it alongside the expected char result as a POD.
This makes the interface less error prone: previous interface expected
the Size reference to be initialized, and it was often forgotten,
leading to uninitialized variable usage. This patch fixes the issue.
This also generates faster code, as the returned POD (a char and an
unsigned) fits in 64 bits. The speedup according to compile time tracker
reach -O.7%, with a good number of -0.4%. Details are available on
https://llvm-compile-time-tracker.com/compare.php?from=3fe63f81fcb999681daa11b2890c82fda3aaeef5&to=fc76a9202f737472ecad4d6e0b0bf87a013866f3&stat=instructions:u
And icing on the cake, on my setup it also shaves 2kB out of
libclang-cpp :-)
This is a recommit of d8f5a18b6e587aeaa8b99707e87b652f49b160cd for
|
|
This reverts commit d8f5a18b6e587aeaa8b99707e87b652f49b160cd.
Breaks build, see:
https://github.com/llvm/llvm-project/pull/70543#issuecomment-1784227421
|
|
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
|
|
The _mm_cmpistri instruction can be used to quickly parse identifiers.
With this patch activated, clang pre-processes <iostream> 1.8% faster,
and sqlite3.c amalgametion 1.5% faster, based on time measurements and
number of executed instructions as measured by valgrind.
The introduction of an extra helper function in the regular case has no
impact on performance, see
https://llvm-compile-time-tracker.com/compare.php?from=30240e428f0ec7d4a6d1b84f9f807ce12b46cfd1&to=12bcb016cde4579ca7b75397762098c03eb4f264&stat=instructions:u
---------
Co-authored-by: serge-sans-paille <sguelton@mozilla.com>
|
|
|
|
int a\
ス;
Failed to be parsed as a valid identifier.
Fixes #65156
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D159345
|
|
|
|
This reverts commit f2583f3acf596cc545c8c0e3cb28e712f4ebf21b.
There is a large body of non-conforming C-like code using format strings
like this:
#define PRIuS "zu"
void h(size_t foo, size_t bar) {
printf("foo is %"PRIuS", bar is %"PRIuS, foo, bar);
}
Rejecting this code would be very disruptive. We could decide to do
that, but it's sufficiently disruptive that I think it requires
gathering more community consensus with an RFC, and Aaron indicated [1]
it's OK to revert for now so continuous testing systems can see past
this issue while we decide what to do.
[1] https://reviews.llvm.org/D153156#4607717
|
|
For applications like clangd, the preamble remains an important optimization
when editing a module definition. The global module fragment is a good fit for
it as it by definition contains only preprocessor directives.
Before this patch, we would terminate the preamble immediately at the "module"
keyword.
Differential Revision: https://reviews.llvm.org/D158439
|
|
In addition:
1. Fix tests for CWG2521 deprecation warning.
2. Enable -Wdeprecated-literal-operator by default.
Differential Revision: https://reviews.llvm.org/D153156
|
|
This renames C2x to C23 in diagnostic identifiers and messages. The
changes were made mechanically.
|
|
This does the rename for most internal uses of C2x, but does not rename
or reword diagnostics (those will be done in a follow-up).
I also updated standards references and citations to the final wording
in the standard.
|
|
This also ignores and deprecates the `-fdouble-square-bracket-attributes` command line flag, which seems to not be used anywhere. At least a code search exclusively found mentions of it in documentation: https://sourcegraph.com/search?q=context:global+-fdouble-square-bracket-attributes+-file:clang/*+-file:test/Sema/*+-file:test/Parser/*+-file:test/AST/*+-file:test/Preprocessor/*+-file:test/Misc/*+archived:yes&patternType=standard&sm=0&groupBy=repo
RFC: https://discourse.llvm.org/t/rfc-enable-c-11-c2x-attributes-in-all-standard-modes-as-an-extension-and-remove-fdouble-square-bracket-attributes
This enables `[[]]` attributes in all C and C++ language modes without warning by default. `-Wc++-extensions` does warn. GCC has enabled this extension in all C modes since GCC 10.
Reviewed By: aaron.ballman, MaskRay
Spies: #clang-vendors, beanz, JDevlieghere, Michael137, MaskRay, sstefan1, jplehr, cfe-commits, lldb-commits, dmgreen, jdoerfert, wenlei, wlei
Differential Revision: https://reviews.llvm.org/D151683
|
|
This covers
* P2558R2 (C++, wg21.link/P2558)
* N2701 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm)
* N3124 (C, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3124.pdf)
This patch
* Disallow representing $ as a UCN in all language mode, which did not
properly work (see GH62133), and which in made ill-formed in
C++ and C by P2558 and N3124 respectively
* Allow a UCN for any character in C2X, in string and character
literals
Fixes #62133
Reviewed By: #clang-language-wg, tahonermann
Differential Revision: https://reviews.llvm.org/D153621
|
|
During the ISO C++ Committee meeting plenary session the C++23 Standard
has been voted as technical complete.
This updates the reference to c++2b to c++23 and updates the __cplusplus
macro.
Drive-by fixes c++1z -> c++17 and c++2a -> c++20 when seen.
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D149553
|
|
This ensures we get the correct FileCharacteristic during scanning. In a
yet-to-be-upstreamed branch this fixes observable failures, but it's
also good to handle this on principle: the FileCharacteristic is a
property of the file that is observable in the scanner, so there is
nothing preventing us from depending on it.
rdar://108627403
Differential Revision: https://reviews.llvm.org/D149777
|
|
exceeds 1
Close https://github.com/llvm/llvm-project/issues/62359
The root reason for the crash is that we didn't test the case that
the bits number of a language option exceeds 1.
|
|
Differential Revision: https://reviews.llvm.org/D146104
|
|
|
|
should be lexed as string literals
rdar://104386604
Differential Revision: https://reviews.llvm.org/D142143
|
|
This patch removes several "using" declarations and #include
"llvm/ADT/Optional.h".
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
|
This patch replaces (llvm::|)Optional< with std::optional<. I'll post
a separate patch to remove #include "llvm/ADT/Optional.h".
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
|
This patch adds #include <optional> to those files containing
llvm::Optional<...> or Optional<...>.
I'll post a separate patch to actually replace llvm::Optional with
std::optional.
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
|
|
\u<DIGIT>{...} was incorrectly parsed as a valid UCN instead
of emitting a diagnostic, causing an assertion failure.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D139889
|