aboutsummaryrefslogtreecommitdiff
path: root/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
AgeCommit message (Collapse)AuthorFilesLines
2025-06-06[analyzer][NFCI] Remove ad-hoc program point tagging (#142980)Donát Nagy1-2/+1
Previously some checkers attached explicitly created program point tags to some of the exploded graph nodes that they created. In most of the checkers this ad-hoc tagging only affected the debug dump of the exploded graph (and they weren't too relevant for debugging) so this commit removes them. There were two checkers where the tagging _did_ have a functional role: - In `RetainCountChecker` the presence of tags were checked by `RefCountReportVisitor`. - In `DynamicTypePropagation` the checker sometimes wanted to create two identical nodes and had to apply an explicit tag on the second one to avoid "caching out". In these two situations I preserved the tags but switched to using `SimpleProgramPointTag` instead of `CheckerProgramPointTag` because `CheckerProgramPointTag` didn't provide enough benefits to justify its existence. Note that this commit depends on the earlier commit "[analyzer] Fix tagging of PostAllocatorCall" ec96c0c072ef3f78813c378949c00e1c07aa44e5 and would introduce crashes when cherry-picked onto a branch that doesn't contain that commit. For more details about the background see the discourse thread https://discourse.llvm.org/t/role-of-programpointtag-in-the-static-analyzer/ As a tangentially related changes, this commit also adds some comments to document the surprising behavior of `CheckerContext::addTransition` and an assertion in the constructor of `PathSensitiveBugReport` to get a more readable crash dump in the case when the report is constructed with `nullptr` as the `ErrorNode`. (This can happen due to "caching out".)
2025-05-26[StaticAnalyzer] Remove unused includes (NFC) (#141525)Kazu Hirata1-1/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-05-04[clang] Remove unused local variables (NFC) (#138453)Kazu Hirata1-1/+0
2025-03-28[analyzer] Fix format attribute handling in GenericTaintChecker (#132765)Donát Nagy1-0/+16
Previously `optin.taint.GenericTaint` misinterpreted the parameter indices and produced false positives in situations when a [format attribute](https://clang.llvm.org/docs/AttributeReference.html#format) is applied on a non-static method. This commit fixes this bug
2024-07-10[analyzer] Split TaintPropagation checker into reporting and modeling ↵Daniel Krupp1-6/+22
checkers (#98157) Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker.
2024-05-16[analyzer] Clean up list of taint propagation functions (#91635)Donát Nagy1-166/+204
This commit refactors GenericTaintChecker and performs various improvements in the list of taint propagation functions: 1. The matching mode (usually `CDM::CLibrary` or `CDM::CLibraryMaybeHardened`) was specified to avoid matching e.g. C++ methods or functions from a user-defined namespace that happen to share the name of a well-known library function. 2. With these matching modes, a `CallDescription` can automatically match builtin variants of the functions, so entries that explicitly specified a builtin function were removed. This eliminated inconsistencies where the "normal" and the builtin variant of the same function was handled differently (e.g. `__builtin_strlcat` was covered, while plain `strlcat` wasn't; while `__builtin_memcpy` and `memcpy` were both on the list with different propagation rules). 3. The modeling of the functions `strlcat` and `strncat` was updated to propagate taint from the first argument (index 0), because a tainted string should remain tainted even if we append something else to it. Note that this was already applied to `strcat` and `wcsncat` by commit 6ceb1c0ef9f544be0eed65e46cc7d99941a001bf. 4. Some functions were updated to propagate taint from a size/length argument to the result: e.g. `memcmp(p, q, get_tainted_int())` will now return a tainted value (because the attacker can manipulate it). This principle was already present in some propagation rules (e.g. `__builtin_memcpy` was handled this way), and even after this commit there are still some functions where it isn't applied. (I only aimed for consistency within the same function family.) 5. Functions that have hardened `__FOO_chk()` variants are matched in `CDM:CLibraryMaybeHardened` to ensure consistent handling of the "normal" and the hardened variant. I added special handling for the hardened variants of "sprintf" and "snprintf" because there the extra parameters are inserted into the middle of the parameter list. 6. Modeling of `sscanf_s` was added, to complete the group of `fscanf`, `fscanf_s` and `sscanf`. 7. The `Source()` specifications for `gets`, `gets_s` and `wgetch` were ill-formed: they were specifying variadic arguments starting at argument index `ReturnValueIndex`. (That is, in addition to the return value they were propagating taint to all arguments.) 8. Functions that were related to each other were grouped together. (I know that this makes the diff harder to read, but I felt that the full list is unreadable without some reorganization.) 9. I spotted and removed some redundant curly braces. Perhaps would be good to switch to a cleaner layout with less nested braces... 10. I updated some obsolete comments and added two TODOs for issues that should be fixed in followup commits.
2024-05-11[clang] Use StringRef::operator== instead of StringRef::equals (NFC) (#91844)Kazu Hirata1-4/+3
I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 24 under clang/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-02[analyzer] Remove untrusted buffer size warning in the TaintPropagation ↵Daniel Krupp1-39/+18
checker (#68607) Before this commit the the checker alpha.security.taint.TaintPropagation always reported warnings when the size argument of a memcpy-like or malloc-like function was tainted. However, this produced false positive reports in situations where the size was tainted, but correctly performed bound checks guaranteed the safety of the call. This commit removes the rough "always warn if the size argument is tainted" heuristic; but it would be good to add a more refined "warns if the size argument is tainted and can be too large" heuristic in follow-up commits. That logic would belong to CStringChecker and MallocChecker, because those are the checkers responsible for the more detailed modeling of memcpy-like and malloc-like functions. To mark this plan, TODO comments are added in those two checkers. There were several test cases that used these sinks to test generic properties of taint tracking; those were adapted to use different logic. As a minor unrelated change, this commit ensures that strcat (and its wide variant, wcsncat) propagates taint from the first argument to the first argument, i.e. a tainted string remains tainted if we concatenate it with another string. This change was required because the adapted variant of multipleTaintedArgs is relying on strncat to compose a value that combines taint from two different sources.
2024-04-05[analyzer] Make recognition of hardened __FOO_chk functions explicit (#86536)NagyDonat1-12/+15
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard functions are implemented as macros that expand to calls of hardened functions that take one additional argument compared to the "usual" variant and perform additional input validation. For example, a `memcpy` call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`. Before this commit, `CallDescription`s created with the matching mode `CDM::CLibrary` automatically matched these hardened variants (in a addition to the "usual" function) with a fairly lenient heuristic. Unfortunately this heuristic meant that the `CLibrary` matching mode was only usable by checkers that were prepared to handle matches with an unusual number of arguments. This commit limits the recognition of the hardened functions to a separate matching mode `CDM::CLibraryMaybeHardened` and applies this mode for functions that have hardened variants and were previously recognized with `CDM::CLibrary`. This way checkers that are prepared to handle the hardened variants will be able to detect them easily; while other checkers can simply use `CDM::CLibrary` for matching C library functions (and they won't encounter surprising argument counts). The initial motivation for refactoring this area was that previously `CDM::CLibrary` accepted calls with more arguments/parameters than the expected number, so I wasn't able to use it for `malloc` without accidentally matching calls to the 3-argument BSD kernel malloc. After this commit this "may have more args/params" logic will only activate when we're actually matching a hardened variant function (in `CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and "snprintf()" in CStringChecker was refactored, because previously it was abusing the behavior that extra arguments are accepted even if the matched function is not a hardened variant. This commit also fixes the oversight that the old code would've recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`. After this commit I'm planning to create several follow-up commits that ensure that checkers looking for C library functions use `CDM::CLibrary` as a "sane default" matching mode. This commit is not truly NFC (it eliminates some buggy corner cases), but it does not intentionally modify the behavior of CSA on real-world non-crazy code. As a minor unrelated change I'm eliminating the argument/variable "IsBuiltin" from the evalSprintf function family in CStringChecker, because it was completely unused. --------- Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-03-04[analyzer] Refactor CallDescription match mode (NFC) (#83432)NagyDonat1-28/+22
The class `CallDescription` is used to define patterns that are used for matching `CallEvent`s. For example, a `CallDescription{{"std", "find_if"}, 3}` matches a call to `std::find_if` with 3 arguments. However, these patterns are somewhat fuzzy, so this pattern could also match something like `std::__1::find_if` (with an additional namespace layer), or, unfortunately, a `CallDescription` for the well-known function `free()` can match a C++ method named `free()`: https://github.com/llvm/llvm-project/issues/81597 To prevent this kind of ambiguity this commit introduces the enum `CallDescription::Mode` which can limit the pattern matching to non-method function calls (or method calls etc.). After this NFC change, one or more follow-up commits will apply the right pattern matching modes in the ~30 checkers that use `CallDescription`s. Note that `CallDescription` previously had a `Flags` field which had only two supported values: - `CDF_None` was the default "match anything" mode, - `CDF_MaybeBuiltin` was a "match only C library functions and accept some inexact matches" mode. This commit preserves `CDF_MaybeBuiltin` under the more descriptive name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`). Instead of this "Flags" model I'm switching to a plain enumeration becasue I don't think that there is a natural usecase to combine the different matching modes. (Except for the default "match anything" mode, which is currently kept for compatibility, but will be phased out in the follow-up commits.)
2023-09-22[analyzer] Fix taint sink rules for exec-like functions (#66358)Balazs Benics1-6/+8
Variadic arguments were not considered as taint sink arguments. I also decided to extend the list of exec-like functions. (Juliet CWE78_OS_Command_Injection__char_connect_socket_execl)
2023-09-19[analyzer] TaintPropagation checker strlen() should not propagate (#66086)Daniel Krupp1-3/+4
strlen(..) call should not propagate taintedness, because it brings in many false positive findings. It is a common pattern to copy user provided input to another buffer. In these cases we always get warnings about tainted data used as the malloc parameter: buf = malloc(strlen(tainted_txt) + 1); // false warning This pattern can lead to a denial of service attack only, when the attacker can directly specify the size of the allocated area as an arbitrary large number (e.g. the value is converted from a user provided string). Later, we could reintroduce strlen() as a taint propagating function with the consideration not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block). The change has been evaluated on the following open source projects: - memcached: [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - tmux: 0 lost reports - twin [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - vim [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - openssl 0 lost reports - sqliste [2 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline&newcheck=sqlite_version-3.33.0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - ffmpeg 0 lost repots - postgresql [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - tinyxml 0 lost reports - libwebm 0 lost reports - xerces 0 lost reports In all cases the lost reports are originating from copying untrusted environment variables into another buffer. There are 2 types of lost false positive reports: 1) [Where the warning is emitted at the malloc call by the TaintPropagation Checker ](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648506&report-hash=2079221954026f17e1ecb614f5f054db&report-filepath=%2amemcached.c) ` len = strlen(portnumber_filename)+4+1; temp_portnumber_filename = malloc(len); ` 2) When pointers are set based on the length of the tainted string by the ArrayOutofBoundsv2 checker. For example [this ](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c)case.
2023-09-14[analyzer] Propagate taint for wchar variants of some APIs (#66074)Balazs Benics1-0/+4
Functions like `fgets`, `strlen`, `strcat` propagate taint. However, their `wchar_t` variants don't. This patch fixes that. Notice, that there could be many more APIs missing. This patch intends to fix those that so far surfaced, instead of exhaustively fixing this issue. https://github.com/llvm/llvm-project/pull/66074
2023-09-14[analyzer] Make socket `accept()` propagate taint (#66074)Balazs Benics1-0/+1
This allows to track taint on real code from `socket()` to reading into a buffer using `recv()`. https://github.com/llvm/llvm-project/pull/66074
2023-09-14[analyzer] Fix stdin declaration in C++ tests (#66074)Balazs Benics1-2/+1
The `stdin` declaration should be within `extern "C" {...}`, in C++ mode. In addition, it should be also marked `extern` in both C and C++ modes. I tightened the check to ensure we only accept `stdin` if both of these match. However, from the Juliet test suite's perspective, this commit should not matter. https://github.com/llvm/llvm-project/pull/66074
2023-07-21[clang][analyzer]Fix non-effective taint sanitationDaniel Krupp1-2/+9
There was a bug in alpha.security.taint.TaintPropagation checker in Clang Static Analyzer. Taint filtering could only sanitize const arguments. After this patch, taint filtering is effective also on non-const parameters. Differential Revision: https://reviews.llvm.org/D155848
2023-04-26[NFC] Wrap entire debug logging loop in LLVM_DEBUGJordan Rupprecht1-4/+4
2023-04-26[analyzer] Show taint origin and propagation correctlyDaniel Krupp1-35/+142
This patch improves the diagnostics of the alpha.security.taint.TaintPropagation checker and taint related checkers by showing the "Taint originated here" note at the correct place, where the attacker may inject it. This greatly improves the understandability of the taint reports. In the baseline the taint source was pointing to an invalid location, typically somewhere between the real taint source and sink. After the fix, the "Taint originated here" tag is correctly shown at the taint source. This is the function call where the attacker can inject a malicious data (e.g. reading from environment variable, reading from file, reading from standard input etc.). This patch removes the BugVisitor from the implementation and replaces it with 2 new NoteTags. One, in the taintOriginTrackerTag() prints the "taint originated here" Note and the other in taintPropagationExplainerTag() explaining how the taintedness is propagating from argument to argument or to the return value ("Taint propagated to the Xth argument"). This implementation uses the interestingess BugReport utility to track back the tainted symbols through propagating function calls to the point where the taintedness was introduced by a source function call. The checker which wishes to emit a Taint related diagnostic must use the categories::TaintedData BugType category and must mark the tainted symbols as interesting. Then the TaintPropagationChecker will automatically generate the "Taint originated here" and the "Taint propagated to..." diagnostic notes.
2023-01-14[clang] Use std::optional instead of llvm::Optional (NFC)Kazu Hirata1-13/+14
This patch replaces (llvm::|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-27[clang] Use a StringRef instead of a raw char pointer to store builtin and ↵serge-sans-paille1-137/+135
call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This a recommit of e953ae5bbc313fd0cc980ce021d487e5b5199ea4 and the subsequent fixes caa713559bd38f337d7d35de35686775e8fb5175 and 06b90e2e9c991e211fecc97948e533320a825470. The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab0017d9732e82b8682c9848ab25ff9e. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881
2022-12-25Revert "[clang] Use a StringRef instead of a raw char pointer to store ↵Vitaly Buka1-135/+137
builtin and call information" Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4 (part 2)" Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4" GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104 compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0. This reverts commit caa713559bd38f337d7d35de35686775e8fb5175. This reverts commit 06b90e2e9c991e211fecc97948e533320a825470. This reverts commit e953ae5bbc313fd0cc980ce021d487e5b5199ea4.
2022-12-24[clang] Use a StringRef instead of a raw char pointer to store builtin and ↵serge-sans-paille1-137/+135
call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This is a recommit of 719d98dfa841c522d8d452f0685e503538415a53 that into account a GGC issue (probably https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92181) when dealing with intiailizer_list and constant expressions. Workaround this by avoiding initializer list, at the expense of a temporary plain old array. Differential Revision: https://reviews.llvm.org/D139881
2022-12-23Revert "[clang] Use a StringRef instead of a raw char pointer to store ↵serge-sans-paille1-135/+137
builtin and call information" There are still remaining issues with GCC 12, see for instance https://lab.llvm.org/buildbot/#/builders/93/builds/12669 This reverts commit 5ce4e92264102de21760c94db9166afe8f71fcf6.
2022-12-23[clang] Use a StringRef instead of a raw char pointer to store builtin and ↵serge-sans-paille1-137/+135
call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This is a recommit of 719d98dfa841c522d8d452f0685e503538415a53 with a change to llvm/utils/TableGen/OptParserEmitter.cpp to cope with GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158 Differential Revision: https://reviews.llvm.org/D139881
2022-12-23Revert "[clang] Use a StringRef instead of a raw char pointer to store ↵serge-sans-paille1-135/+137
builtin and call information" Failing builds: https://lab.llvm.org/buildbot#builders/9/builds/19030 This is GCC specific and has been reported upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158 This reverts commit 719d98dfa841c522d8d452f0685e503538415a53.
2022-12-23[clang] Use a StringRef instead of a raw char pointer to store builtin and ↵serge-sans-paille1-137/+135
call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u Differential Revision: https://reviews.llvm.org/D139881
2022-12-10[Checkers] Use std::optional in GenericTaintChecker.cpp (NFC)Kazu Hirata1-2/+3
This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03[StaticAnalyzer] Use std::nullopt instead of None (NFC)Kazu Hirata1-8/+11
This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-06-20[clang] Don't use Optional::getValue (NFC)Kazu Hirata1-1/+1
2022-06-20Don't use Optional::hasValue (NFC)Kazu Hirata1-1/+1
2022-06-18[clang] Use value_or instead of getValueOr (NFC)Kazu Hirata1-1/+1
2022-06-15[analyzer] Treat system globals as mutable if they are not constBalazs Benics1-4/+2
Previously, system globals were treated as immutable regions, unless it was the `errno` which is known to be frequently modified. D124244 wants to add a check for stores to immutable regions. It would basically turn all stores to system globals into an error even though we have no reason to believe that those mutable sys globals should be treated as if they were immutable. And this leads to false-positives if we apply D124244. In this patch, I'm proposing to treat mutable sys globals actually mutable, hence allocate them into the `GlobalSystemSpaceRegion`, UNLESS they were declared as `const` (and a primitive arithmetic type), in which case, we should use `GlobalImmutableSpaceRegion`. In any other cases, I'm using the `GlobalInternalSpaceRegion`, which is no different than the previous behavior. --- In the tests I added, only the last `expected-warning` was different, compared to the baseline. Which is this: ```lang=C++ void test_my_mutable_system_global_constraint() { assert(my_mutable_system_global > 2); clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{TRUE}} invalidate_globals(); clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{UNKNOWN}} It was previously TRUE. } void test_my_mutable_system_global_assign(int x) { my_mutable_system_global = x; clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{TRUE}} invalidate_globals(); clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{UNKNOWN}} It was previously TRUE. } ``` --- Unfortunately, the taint checker will be also affected. The `stdin` global variable is a pointer, which is assumed to be a taint source, and the rest of the taint propagation rules will propagate from it. However, since mutable variables are no longer treated immutable, they also get invalidated, when an opaque function call happens, such as the first `scanf(stdin, ...)`. This would effectively remove taint from the pointer, consequently disable all the rest of the taint propagations down the line from the `stdin` variable. All that said, I decided to look through `DerivedSymbol`s as well, to acquire the memregion in that case as well. This should preserve the previously existing taint reports. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D127306
2022-06-15[analyzer][NFC] Prefer using isa<> instead getAs<> in conditionsBalazs Benics1-1/+1
Depends on D125709 Reviewed By: martong Differential Revision: https://reviews.llvm.org/D127742
2022-06-15[analyzer][NFC] Remove dead code and modernize surroundingsBalazs Benics1-14/+0
Thanks @kazu for helping me clean these parts in D127799. I'm leaving the dump methods, along with the unused visitor handlers and the forwarding methods. The dead parts actually helped to uncover two bugs, to which I'm going to post separate patches. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D127836
2022-04-19[analyzer] Expose Taint.h to pluginsTom Ritter1-1/+1
Reviewed By: NoQ, xazax.hun, steakhal Differential Revision: https://reviews.llvm.org/D123155
2022-03-07[analyzer] Add more propagations to Taint analysisEndre Fülöp1-1/+74
Add more functions as taint propators to GenericTaintChecker. Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D120369
2022-02-28[analyzer] Add more sources to Taint analysisEndre Fülöp1-0/+19
Add more functions as taint sources to GenericTaintChecker. Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D120236
2022-02-24[analyzer] Just use default capture after ↵Fangrui Song1-3/+1
7fd60ee6e0a87957a718297a4a42d9881fc561e3
2022-02-24[analyzer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off buildsFangrui Song1-0/+1
2022-02-23Revert "Revert "[analyzer] Fix taint rule of fgets and setproctitle_init""Balazs Benics1-2/+2
This reverts commit 2acead35c1289d2b3593a992b0639ca6427e481f. Let's try `REQUIRES: asserts`.
2022-02-23Revert "Revert "[analyzer] Fix taint propagation by remembering to the ↵Balazs Benics1-11/+28
location context"" This reverts commit d16c5f4192c30d53468a472c6820163a81192825. Let's try `REQUIRES: asserts`.
2022-02-23Revert "Revert "[analyzer] Add failing test case demonstrating buggy taint ↵Balazs Benics1-3/+22
propagation"" This reverts commit b8ae323cca61dc1edcd36e9ae18c7e4c3d76d52e. Let's try `REQUIRES: asserts`.
2022-02-14Revert "[analyzer] Add failing test case demonstrating buggy taint propagation"Balazs Benics1-22/+3
This reverts commit 744745ae195f0997e5bfd5aa2de47b9ea156b6a6. I'm reverting this since this patch caused a build breakage. https://lab.llvm.org/buildbot/#/builders/91/builds/3818
2022-02-14Revert "[analyzer] Fix taint propagation by remembering to the location context"Balazs Benics1-28/+11
This reverts commit b099e1e562555fbc67e2e0abbc15074e14a85ff1. I'm reverting this since the head of the patch stack caused a build breakage. https://lab.llvm.org/buildbot/#/builders/91/builds/3818
2022-02-14Revert "[analyzer] Fix taint rule of fgets and setproctitle_init"Balazs Benics1-2/+2
This reverts commit bf5963bf19670ea58facdf57492e147c13bb650f. I'm reverting this since the head of the patch stack caused a build breakage. https://lab.llvm.org/buildbot/#/builders/91/builds/3818
2022-02-14[analyzer] Fix taint rule of fgets and setproctitle_initBalazs Benics1-2/+2
There was a typo in the rule. `{{0}, ReturnValueIndex}` meant that the discrete index is `0` and the variadic index is `-1`. What we wanted instead is that both `0` and `-1` are in the discrete index list. Instead of this, we wanted to express that both `0` and the `ReturnValueIndex` is in the discrete arg list. The manual inspection revealed that `setproctitle_init` also suffered a probably incomplete propagation rule. Reviewed By: Szelethus, gamesh411 Differential Revision: https://reviews.llvm.org/D119129
2022-02-14[analyzer] Fix taint propagation by remembering to the location contextBalazs Benics1-11/+28
Fixes the issue D118987 by mapping the propagation to the callsite's LocationContext. This way we can keep track of the in-flight propagations. Note that empty propagation sets won't be inserted. Reviewed By: NoQ, Szelethus Differential Revision: https://reviews.llvm.org/D119128
2022-02-14[analyzer] Add failing test case demonstrating buggy taint propagationBalazs Benics1-3/+22
Recently we uncovered a serious bug in the `GenericTaintChecker`. It was already flawed before D116025, but that was the patch that turned this silent bug into a crash. It happens if the `GenericTaintChecker` has a rule for a function, which also has a definition. char *fgets(char *s, int n, FILE *fp) { nested_call(); // no parameters! return (char *)0; } // Within some function: fgets(..., tainted_fd); When the engine inlines the definition and finds a function call within that, the `PostCall` event for the call will get triggered sooner than the `PostCall` for the original function. This mismatch violates the assumption of the `GenericTaintChecker` which wants to propagate taint information from the `PreCall` event to the `PostCall` event, where it can actually bind taint to the return value **of the same call**. Let's get back to the example and go through step-by-step. The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>` event, so it would 'remember' that it needs to taint the return value and the buffer, from the `PostCall` handler, where it has access to the return value symbol. However, the engine will inline fgets and the `nested_call()` gets evaluated subsequently, which produces an unimportant `PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is observed by the `GenericTaintChecker`, which will unconditionally mark tainted the 'remembered' arg indexes, trying to access a non-existing argument, resulting in a crash. If it doesn't crash, it will behave completely unintuitively, by marking completely unrelated memory regions tainted, which is even worse. The resulting assertion is something like this: Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion `Arg < getNumArgs() && "Arg access out of range!"' failed. The gist of the backtrace: CallExpr::getArg(unsigned int) const SimpleFunctionCall::getArgExpr(unsigned int) CallEvent::getArgSVal(unsigned int) const GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const Prior to D116025, there was a check for the argument count before it applied taint, however, it still suffered from the same underlying issue/bug regarding propagation. This path does not intend to fix the bug, rather start a discussion on how to fix this. --- Let me elaborate on how I see this problem. This pre-call, post-call juggling is just a workaround. The engine should by itself propagate taint where necessary right where it invalidates regions. For the tracked values, which potentially escape, we need to erase the information we know about them; and this is exactly what is done by invalidation. However, in the case of taint, we basically want to approximate from the opposite side of the spectrum. We want to preserve taint in most cases, rather than cleansing them. Now, we basically sanitize all escaping tainted regions implicitly, since invalidation binds a fresh conjured symbol for the given region, and that has not been associated with taint. IMO this is a bad default behavior, we should be more aggressive about preserving taint if not further spreading taint to the reachable regions. We have a couple of options for dealing with it (let's call it //tainting policy//): 1) Taint only the parameters which were tainted prior to the call. 2) Taint the return value of the call, since it likely depends on the tainted input - if any arguments were tainted. 3) Taint all escaped regions - (maybe transitively using the cluster algorithm) - if any arguments were tainted. 4) Not taint anything - this is what we do right now :D The `ExprEngine` should not deal with taint on its own. It should be done by a checker, such as the `GenericTaintChecker`. However, the `Pre`-`PostCall` checker callbacks are not designed for this. `RegionChanges` would be a much better fit for modeling taint propagation. What we would need in the `RegionChanges` callback is the `State` prior invalidation, the `State` after the invalidation, and a `CheckerContext` in which the checker can create transitions, where it would place `NoteTags` for the modeled taint propagations and report errors if a taint sink rule gets violated. In this callback, we could query from the prior State, if the given value was tainted; then act and taint if necessary according to the checker's tainting policy. By using RegionChanges for this, we would 'fix' the mentioned propagation bug 'by-design'. Reviewed By: Szelethus Differential Revision: https://reviews.llvm.org/D118987
2022-01-18Fix pair construction with an implicit constructor inside.Tres Popp1-1/+1
2022-01-18[analyzer][NFC] Refactor GenericTaintChecker to use CallDescriptionMapEndre Fülöp1-748/+664
GenericTaintChecker now uses CallDescriptionMap to describe the possible operation in code which trigger the introduction (sources), the removal (filters), the passing along (propagations) and detection (sinks) of tainted values. Reviewed By: steakhal, NoQ Differential Revision: https://reviews.llvm.org/D116025