path: root/clang/lib/Lex/Lexer.cpp
Age | Commit message | Author | Files | Lines
2013-05-10 | Typo and misc comment fix. | Richard Smith | 1 | -2/+4
llvm-svn: 181583
2013-04-19 | [libclang] Make sure the preamble does not truncate comments. | Argyrios Kyrtzidis | 1 | -2/+15
rdar://13647445 llvm-svn: 179907
2013-03-11 | Add -Wc99-compat warning for C11 unicode string and character literals. | Richard Smith | 1 | -6/+8
llvm-svn: 176817
2013-03-09 | When lexing in C11 mode, accept unicode character and string literals, per C11 | Richard Smith | 1 | -9/+13
6.4.4.4/1 and 6.4.5/1. llvm-svn: 176780
2013-03-05 | Preprocessor: don't consider // to be a line comment in -E -std=c89 mode. | Jordan Rose | 1 | -4/+7
It's beneficial when compiling to treat // as the start of a line comment even in -std=c89 mode, since it's not valid C code (with a few rare exceptions) and is usually intended as such. We emit a pedantic warning and then continue on as if line comments were enabled. This has been our behavior for quite some time. However, people use the preprocessor for things besides C source files. In today's prompting example, the input contains (unquoted) URLs, which contain // but should still be preserved. This change instructs the lexer to treat // as a plain token if Clang is in C90 mode and generating preprocessed output rather than actually compiling. <rdar://problem/13338743> llvm-svn: 176526
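The decision described in this commit can be sketched roughly as follows. This is an illustrative toy, not Clang's code: `ToyLexOptions`, `treatSlashSlashAsComment`, and `stripLineComment` are hypothetical names, and the sketch ignores string literals and block comments entirely.

```cpp
#include <cstddef>
#include <string>

// Hypothetical option bundle; these are NOT Clang's real option fields.
struct ToyLexOptions {
  bool StandardHasLineComments; // false for C89/C90
  bool PreprocessedOutputOnly;  // true when only preprocessing (like -E)
};

// Returns true if "//" should start a line comment under these options.
bool treatSlashSlashAsComment(const ToyLexOptions &Opts) {
  if (Opts.StandardHasLineComments)
    return true;
  // C89: accepted as an extension when compiling, but left as plain
  // tokens when the output is preprocessed text.
  return !Opts.PreprocessedOutputOnly;
}

// Drops everything from the first "//" onward when it counts as a comment.
std::string stripLineComment(const std::string &Line,
                             const ToyLexOptions &Opts) {
  if (!treatSlashSlashAsComment(Opts))
    return Line;
  std::size_t Pos = Line.find("//");
  return Pos == std::string::npos ? Line : Line.substr(0, Pos);
}
```

With this sketch, an unquoted URL survives only in the C89 preprocess-only configuration, which is the case the commit message calls out.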
2013-02-21 | Preprocessor: preserve whitespace in -traditional-cpp mode. | Jordan Rose | 1 | -17/+28
Note that unlike GNU cpp we currently do not preserve whitespace in macros (even in -traditional-cpp mode). <rdar://problem/12897179> llvm-svn: 175778
2013-02-09 | Properly validate UCNs for C99 and C++03 (both more restrictive than C(++)11). | Jordan Rose | 1 | -89/+86
Add warnings under -Wc++11-compat, -Wc++98-compat, and -Wc99-compat when a particular UCN is incompatible with a different standard, and -Wunicode when a UCN refers to a surrogate character in C++03. llvm-svn: 174788
2013-02-08 | Pull Lexer's CharInfo table out for general use throughout Clang. | Jordan Rose | 1 | -170/+5
Rewriting the same predicates over and over again is bad for code size and code maintenance. Using the functions in <ctype.h> is generally unsafe unless they are specified to be locale-independent (i.e. only isdigit and isxdigit). The next commit will try to clean up uses of <ctype.h> functions within Clang. llvm-svn: 174765
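To illustrate what locale-independent predicates look like, here is a small sketch. It uses plain range comparisons rather than the lookup table the commit refers to, it assumes an ASCII-compatible execution character set, and everything in the `toy` namespace is a hypothetical stand-in, not Clang's actual header.

```cpp
#include <string>

namespace toy {
// Each predicate depends only on the byte value, never on the C locale.
constexpr bool isDigit(unsigned char C) { return C >= '0' && C <= '9'; }

constexpr bool isLetter(unsigned char C) {
  return (C >= 'a' && C <= 'z') || (C >= 'A' && C <= 'Z');
}

constexpr bool isIdentifierHead(unsigned char C) {
  return isLetter(C) || C == '_';
}

constexpr bool isIdentifierBody(unsigned char C) {
  return isIdentifierHead(C) || isDigit(C);
}

// True if S is a non-empty, ASCII-only C-style identifier.
inline bool isValidIdentifier(const std::string &S) {
  if (S.empty() || !isIdentifierHead(static_cast<unsigned char>(S[0])))
    return false;
  for (char C : S)
    if (!isIdentifierBody(static_cast<unsigned char>(C)))
      return false;
  return true;
}
} // namespace toy
```

Casting to `unsigned char` before classification avoids the negative-value pitfall that `<ctype.h>` functions have with plain `char`.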
2013-01-31 | Lexer: Don't warn about Unicode in preprocessor directives. | Jordan Rose | 1 | -2/+4
This allows people to use Unicode in their #pragma mark and in macros that exist only to be string-ized. <rdar://problem/13107323&13121362> llvm-svn: 174081
2013-01-30 | Fix r173881 to properly skip invalid UTF-8 characters in raw lexing and -E. | Jordan Rose | 1 | -0/+1
This caused hangs as we processed the same invalid byte over and over. <rdar://problem/13115651> llvm-svn: 173959
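The class of bug mentioned here (reprocessing the same invalid byte forever) can be illustrated with a scanning loop that always makes forward progress. This is a simplified sketch with hypothetical function names: it checks only lead and continuation byte patterns, not overlong encodings or surrogates, and it is not the code from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length (1-4) of the UTF-8 sequence starting at S[I],
// or 0 if the bytes there do not form a plausible sequence.
std::size_t utf8SequenceLength(const std::string &S, std::size_t I) {
  assert(I < S.size() && "index must be inside the buffer");
  unsigned char Lead = static_cast<unsigned char>(S[I]);
  std::size_t Len;
  if (Lead < 0x80)
    Len = 1;
  else if ((Lead & 0xE0) == 0xC0)
    Len = 2;
  else if ((Lead & 0xF0) == 0xE0)
    Len = 3;
  else if ((Lead & 0xF8) == 0xF0)
    Len = 4;
  else
    return 0; // stray continuation byte or invalid lead byte
  if (I + Len > S.size())
    return 0; // truncated sequence
  for (std::size_t K = 1; K < Len; ++K)
    if ((static_cast<unsigned char>(S[I + K]) & 0xC0) != 0x80)
      return 0;
  return Len;
}

// Counts bytes that had to be skipped as invalid.
std::size_t countInvalidBytes(const std::string &S) {
  std::size_t Invalid = 0;
  for (std::size_t I = 0; I < S.size();) {
    std::size_t Len = utf8SequenceLength(S, I);
    if (Len == 0) {
      ++Invalid;
      ++I; // crucial: advance past the bad byte so the loop terminates
    } else {
      I += Len;
    }
  }
  return Invalid;
}
```

Omitting the `++I` in the invalid branch reproduces the hang: the loop would examine the same byte on every iteration.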
2013-01-30 | Move UTF conversion routines from clang/lib/Basic to llvm/lib/Support | Dmitri Gribenko | 1 | -9/+11
This is required to use them in TableGen. llvm-svn: 173924
2013-01-30 | Don't warn about Unicode characters in -E mode. | Jordan Rose | 1 | -18/+20
People use the C preprocessor for things other than C files. Some of them have Unicode characters. We shouldn't warn about Unicode characters appearing outside of identifiers in this case. There's not currently a way for the preprocessor to tell if it's in -E mode, so I added a new flag, derived from the PreprocessorOutputOptions. This is only used by the Unicode warnings for now, but could conceivably be used by other warnings or even behavioral differences later. <rdar://problem/13107323> llvm-svn: 173881
2013-01-28 | PR15067 (again): Don't warn about UCNs in C90 if we're raw-lexing. | Jordan Rose | 1 | -1/+2
Fixes a crash. Thanks, Richard. llvm-svn: 173701
2013-01-27 | PR15067: Don't assert when a UCN appears in a C90 file. | Jordan Rose | 1 | -3/+6
Unfortunately, we can't accept the UCN as an extension because we're required to treat it as two tokens for preprocessing purposes. llvm-svn: 173622
2013-01-25 | Lexer.cpp: Fix a warning with ptrdiff_t on i686. [-Wsign-compare] | NAKAMURA Takumi | 1 | -1/+1
llvm-svn: 173447
2013-01-25 | Clarify comment: "diagnose" is better than "warn" when emitting an error. | Jordan Rose | 1 | -1/+1
Thanks, Dmitri. llvm-svn: 173400
2013-01-24 | Add a fixit for \U1234 -> \u1234. | Jordan Rose | 1 | -1/+9
llvm-svn: 173371
2013-01-24 | As an extension, treat Unicode whitespace characters as whitespace. | Jordan Rose | 1 | -0/+23
llvm-svn: 173370
2013-01-24 | Handle universal character names and Unicode characters outside of literals. | Jordan Rose | 1 | -13/+275
This is a missing piece for C99 conformance. This patch handles UCNs by adding a '\\' case to LexTokenInternal and LexIdentifier -- if we see a backslash, we tentatively try to read in a UCN. If the UCN is not syntactically well-formed, we fall back to the old treatment: a backslash followed by an identifier beginning with 'u' (or 'U'). Because the spelling of an identifier with UCNs still has the UCN in it, we need to convert that to UTF-8 in Preprocessor::LookUpIdentifierInfo. Of course, valid code that does *not* use UCNs will see only a very minimal performance hit (checks after each identifier for non-ASCII characters, checks when converting raw_identifiers to identifiers that they do not contain UCNs, and checks when getting the spelling of an identifier that it does not contain a UCN). This patch also adds basic support for actual UTF-8 in the source. This is treated almost exactly the same as UCNs except that we consider stray Unicode characters to be mistakes and offer a fixit to remove them. llvm-svn: 173369
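The "tentatively try to read in a UCN, otherwise fall back" behavior can be sketched as a function that either consumes a whole well-formed UCN or consumes nothing. The names `hexValue` and `tryReadUCN` are hypothetical, and this sketch checks syntax only (backslash, `u` or `U`, then four or eight hex digits); it does not validate which code points a given standard permits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Value of a hex digit, or -1 if C is not one.
int hexValue(char C) {
  if (C >= '0' && C <= '9') return C - '0';
  if (C >= 'a' && C <= 'f') return C - 'a' + 10;
  if (C >= 'A' && C <= 'F') return C - 'A' + 10;
  return -1;
}

// Tries to read a UCN starting at Text[Pos]. On success, stores the code
// point and returns the number of characters consumed (6 or 10). Returns 0
// if the text is not a syntactically well-formed UCN, so the caller can fall
// back to treating the backslash as an ordinary token.
std::size_t tryReadUCN(const std::string &Text, std::size_t Pos,
                       std::uint32_t &CodePoint) {
  if (Pos + 1 >= Text.size() || Text[Pos] != '\\')
    return 0;
  std::size_t NumDigits;
  if (Text[Pos + 1] == 'u')
    NumDigits = 4;
  else if (Text[Pos + 1] == 'U')
    NumDigits = 8;
  else
    return 0;
  if (Pos + 2 + NumDigits > Text.size())
    return 0; // not enough characters left
  std::uint32_t Value = 0;
  for (std::size_t I = 0; I < NumDigits; ++I) {
    int Digit = hexValue(Text[Pos + 2 + I]);
    if (Digit < 0)
      return 0; // malformed: leave CodePoint untouched
    Value = (Value << 4) | static_cast<std::uint32_t>(Digit);
  }
  CodePoint = Value;
  return 2 + NumDigits;
}
```

Returning 0 without side effects is what makes the read tentative: the caller's position and output are unchanged unless the whole UCN parsed.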
2013-01-12 | Remove useless 'llvm::' qualifier from names like StringRef and others that are | Dmitri Gribenko | 1 | -1/+1
brought into 'clang' namespace by clang/Basic/LLVM.h llvm-svn: 172323
2013-01-07 | Pull the bulk of Lexer::MeasureTokenLength() out into a new function, | Argyrios Kyrtzidis | 1 | -5/+15
Lexer::getRawToken(). No functionality change. llvm-svn: 171771
2013-01-02 | s/CPlusPlus0x/CPlusPlus11/g | Richard Smith | 1 | -7/+7
llvm-svn: 171367
2012-12-04 | Sort all of Clang's files under 'lib', and fix up the broken headers | Chandler Carruth | 1 | -4/+4
uncovered. This required manually correcting all of the incorrect main-module headers I could find, and running the new llvm/utils/sort_includes.py script over the files. I also manually added quite a few missing headers that were uncovered by shuffling the order or moving headers up to be main-module-headers. llvm-svn: 169237
2012-11-28 | Teach Lexer::getSpelling about raw string literals. Specifically, if a raw | Richard Smith | 1 | -42/+67
string literal needs cleaning (because it contains line-splicing in the encoding prefix or in the ud-suffix), do not clean the section between the double-quotes -- that's the "raw" bit! llvm-svn: 168776
2012-11-17 | Fix crash on end-of-file after \ in a char literal, fixes PR14369. | Nico Weber | 1 | -6/+8
This makes LexCharConstant() look more like LexStringLiteral(), which doesn't have this bug. Add tests for eof after \ for several other cases. llvm-svn: 168269
2012-11-14 | Fix an assertion failure printing the unused-label fixit in files using CRLF line endings. <rdar://problem/12639047>. | Eli Friedman | 1 | -1/+8
llvm-svn: 167900
2012-11-13 | Revert r167801, "[preprocessor] When #including something that contributes no | Daniel Dunbar | 1 | -22/+0
tokens at all,". This change broke External/Nurbs in LLVM test-suite. llvm-svn: 167858
2012-11-13 | UCNs in char literals are done (in LiteralSupport), remove FIXME. Expand UCN FIXME in LexNumericConstant. | Nico Weber | 1 | -2/+1
llvm-svn: 167818
2012-11-13 | [preprocessor] When #including something that contributes no tokens at all, | Argyrios Kyrtzidis | 1 | -0/+22
don't recursively continue lexing. This avoids a stack overflow with a sequence of many empty #includes. rdar://11988695 llvm-svn: 167801
2012-11-13 | In Lexer::LexTokenInternal, avoid code duplication; no functionality change. | Argyrios Kyrtzidis | 1 | -39/+26
llvm-svn: 167800
2012-11-11 | s/BCPLComment/LineComment/ | Nico Weber | 1 | -22/+22
llvm-svn: 167690
2012-10-25 | Take into account that there may be a BOM at the beginning of the file, | Argyrios Kyrtzidis | 1 | -3/+6
when computing the size of the precompiled preamble. llvm-svn: 166659
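As a rough illustration of why a byte-order mark matters for a size computation like this one, the sketch below skips a UTF-8 BOM before scanning. The "preamble" rule used here (leading lines that start with `#`) is a deliberately simplified assumption, and `utf8BOMLength` and `toyPreambleSize` are hypothetical names, not Clang's API.

```cpp
#include <cstddef>
#include <string>

// Returns 3 if Buffer starts with the UTF-8 byte-order mark, else 0.
std::size_t utf8BOMLength(const std::string &Buffer) {
  return Buffer.compare(0, 3, "\xEF\xBB\xBF") == 0 ? 3 : 0;
}

// Toy rule: the preamble is the BOM (if any) plus every leading line that
// begins with '#'. Returns its size in bytes.
std::size_t toyPreambleSize(const std::string &Buffer) {
  std::size_t Pos = utf8BOMLength(Buffer);
  while (Pos < Buffer.size() && Buffer[Pos] == '#') {
    std::size_t Newline = Buffer.find('\n', Pos);
    if (Newline == std::string::npos)
      return Buffer.size(); // directive runs to end of file
    Pos = Newline + 1;
  }
  return Pos;
}
```

Without the initial BOM skip, the first byte of a BOM-prefixed file would not be `#`, and this toy scan would report an empty preamble.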
2012-09-24 | StringRef'ize Preprocessor::CreateString(). | Dmitri Gribenko | 1 | -1/+1
llvm-svn: 164555
2012-09-06 | Don't cast away const needlessly. Found by gcc48 -Wcast-qual. | Roman Divacky | 1 | -1/+2
llvm-svn: 163325
2012-08-31 | Make a bunch of methods on Lexer private. | Eli Friedman | 1 | -1/+1
llvm-svn: 162970
2012-07-30 | Lexer: remove dead stores. Found by Clang static analyzer! | Dmitri Gribenko | 1 | -5/+2
llvm-svn: 160973
2012-06-28 | Add warning flag -Winvalid-pp-token for preprocessing-tokens which have | Richard Smith | 1 | -3/+3
undefined behaviour, and move the diagnostic for '' from an Error into an ExtWarn in this group. This is important for some users of the preprocessor, and is necessary for gcc compatibility. llvm-svn: 159335
2012-06-17 | Documentation cleanup: | James Dennett | 1 | -11/+7
* Removed docs for Lexer::makeFileCharRange from Lexer.cpp, as they're in the header file;
* Reworked the documentation for SkipBlockComment so that it doesn't confuse Doxygen's comment parsing;
* Added another summary with \brief markup.
llvm-svn: 158618
2012-06-15 | [-E] Emit a rewritten _Pragma on its own line. | Jordan Rose | 1 | -1/+1
1. Teach Lexer that pragma lexers are like macro expansions at EOF.
2. Treat pragmas like #define/#undef when printing.
3. If we just printed a directive, add a newline before any more tokens.
(4. Miscellaneous cleanup in PrintPreprocessedOutput.cpp)
PR10594 and <rdar://problem/11562490> (two separate related problems) llvm-svn: 158571
2012-06-15 | Documentation cleanup: escape backslashes in Doxygen comments. | James Dennett | 1 | -4/+5
llvm-svn: 158552
2012-06-15 | PR12717: Clang supports hexadecimal floating-point literals in all language | Richard Smith | 1 | -2/+14
modes. For languages other than C99/C11, this isn't quite a conforming extension, and for C++11, it breaks some reasonable code containing user-defined literals. In languages which don't officially have hexfloats, pare back this extension to only apply in cases where the token starts 0x and does not contain an underscore. The extension is still not quite conforming, but it's a lot closer now. llvm-svn: 158487
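The pared-back rule described above can be written as a small predicate. This is a sketch of the rule as stated in the commit message, not the actual lexer code; `allowHexFloatExtension` and its parameters are hypothetical, and accepting an uppercase `0X` prefix is an assumption on top of the message's "starts 0x".

```cpp
#include <string>

// Decides whether a numeric token may be treated as a hexadecimal
// floating-point literal.
//   LangHasHexFloats: true for languages that officially have hexfloats.
bool allowHexFloatExtension(const std::string &Token, bool LangHasHexFloats) {
  if (LangHasHexFloats)
    return true;
  // Assumption: "0X" is accepted alongside "0x".
  bool StartsWithHexPrefix = Token.size() >= 2 && Token[0] == '0' &&
                             (Token[1] == 'x' || Token[1] == 'X');
  // An underscore suggests a user-defined literal suffix, so back off.
  bool HasUnderscore = Token.find('_') != std::string::npos;
  return StartsWithHexPrefix && !HasUnderscore;
}
```

The underscore check is what keeps tokens that look like user-defined literals out of the extension in languages without official hexfloat support.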
2012-06-15 | Fix PR13065. | David Blaikie | 1 | -1/+1
This condition (added in r158093) was overly conservative. llvm-svn: 158483
2012-06-08 | Correct method name in comment: from LexRawToken to LexFromRawLexer, according | Dmitri Gribenko | 1 | -2/+2
to a change done long ago in r57393. llvm-svn: 158243
2012-06-07 | Insert a space if necessary when suggesting CFBridgingRetain/Release. | Jordan Rose | 1 | -0/+5
This was a problem for people who write 'return(result);' Also fix ARCMT's corresponding code, though there's no test case for this because implicit casts like this are rejected by the migrator for being ambiguous, and explicit casts have no problem. <rdar://problem/11577346> llvm-svn: 158130
2012-06-06 | Add a -rewrite-includes option, which is similar to -rewrite-macros, but only expands #include directives. | David Blaikie | 1 | -2/+3
Patch contributed by Lubos Lunak (l.lunax@suse.cz). Review by Matt Beaumont-Gay (matthewbg@google.com). llvm-svn: 158093
2012-06-06 | Escape \n and \r in doxycomment. | David Blaikie | 1 | -2/+2
llvm-svn: 158091
2012-05-18 | Lexer::ReadToEndOfLine: Only build the string if it's actually used and do so in a less malloc-intensive way. | Benjamin Kramer | 1 | -7/+8
llvm-svn: 157064
2012-04-13 | Support -Wc++98-compat-pedantic as requested: | Seth Cantrell | 1 | -4/+4
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120409/056126.html llvm-svn: 154655
2012-04-13 | C++11 no longer requires files to end with a newline | Seth Cantrell | 1 | -1/+2
llvm-svn: 154643
2012-04-07 | ext_reserved_user_defined_literal must not default to Error in MicrosoftMode. | Francois Pichet | 1 | -1/+3
Hence create ext_ms_reserved_user_defined_literal that doesn't default to Error; otherwise MSVC headers won't parse. Fixes PR12383. llvm-svn: 154273