aboutsummaryrefslogtreecommitdiff
path: root/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp
AgeCommit message (Collapse)AuthorFilesLines
2025-05-31[Analysis] Remove unused includes (NFC) (#142255)Kazu Hirata1-2/+0
These are identified by misc-include-cleaner. I've filtered out those that break builds. Also, I'm staying away from llvm-config.h, config.h, and Compiler.h, which likely cause platform- or compiler-specific build failures.
2025-04-27[clang] Use range constructors of *Set (NFC) (#137574)Kazu Hirata1-1/+1
2025-02-01[clang][dataflow][NFC] Remove double lookup (#125282)Balazs Benics1-2/+3
2024-05-08[clang][dataflow] Allow `DataflowAnalysisContext` to use a non-owned ↵martinboehme1-5/+5
`Solver`. (#91316) For some callers (see change in DataflowAnalysis.h), this is more convenient.
2024-04-16[clang][dataflow] Expose getReferencedDecls and relocate free functions. ↵Samira Bazuzi1-52/+1
(#88754) Moves free functions from DataflowEnvironment.h/cc and DataflowAnalysisContext.h/cc to RecordOps and a new ASTOps and exposes them as needed for current use and to expose getReferencedDecls for out-of-tree use. Minimal change in functionality, only to modify the return type of getReferenceDecls to return the collected decls instead of using output params. Tested with `ninja check-clang-tooling`.
2024-03-19[clang][dataflow][NFC] Rename `ControlFlowContext` to `AdornedCFG`. (#85640)martinboehme1-5/+5
This expresses better what the class actually does, and it reduces the number of `Context`s that we have in the codebase. A deprecated alias `ControlFlowContext` is available from the old header.
2024-01-16[clang][dataflow] Tighten checking for existence of a function body. (#78163)martinboehme1-1/+1
In various places, we would previously call `FunctionDecl::hasBody()` (which checks whether any redeclaration of the function has a body, not necessarily the one on which `hasBody()` is being called). This is bug-prone, as a recent bug in Crubit's nullability checker has shown ([fix](https://github.com/google/crubit/commit/4b01ed0f14d953cda20f92d62256e7365d206b2e), [fix for the fix](https://github.com/google/crubit/commit/e0c5d8ddd7d647da483c2ae198ff91d131c12055)). Instead, we now use `FunctionDecl::doesThisDeclarationHaveABody()` which, as the name implies, checks whether the specific redeclaration it is being called on has a body. Alternatively, I considered being more lenient and "canonicalizing" to the `FunctionDecl` that has the body if the `FunctionDecl` being passed is a different redeclaration. However, this also risks hiding bugs: A caller might inadverently perform the analysis for all redeclarations of a function and end up duplicating work without realizing it. By accepting only the redeclaration that contains the body, we prevent this. I've checked, and all clients that I'm aware of do currently pass in the redeclaration that contains the function body. Typically this is because they use the `ast_matchers::hasBody()` matcher which, unlike `FunctionDecl::hasBody()`, only matches for the redeclaration containing the body.
2024-01-16[clang][dataflow] Add an early-out to `flowConditionImplies()` / ↵martinboehme1-0/+6
`flowConditionAllows()`. (#78172) This saves having to assemble the set of constraints and run the SAT solver in the trivial case of `flowConditionImplies(true)` or `flowConditionAllows(false)`. This is an update / reland of my previous reverted [#77453](https://github.com/llvm/llvm-project/pull/77453). That PR contained a logic bug -- the early-out for `flowConditionAllows()` was wrong because my intuition about the logic was wrong. (In particular, note that `flowConditionImplies(F)` does not imply `flowConditionAllows(F)`, even though this may run counter to intuition.) I've now done what I should have done on the first iteration and added more tests. These pass both with and without my early-outs. This patch is a performance win on the benchmarks for the Crubit nullability checker, except for one slight regression on a relatively short benchmark: ``` name old cpu/op new cpu/op delta BM_PointerAnalysisCopyPointer 68.5µs ± 7% 67.6µs ± 4% ~ (p=0.159 n=18+19) BM_PointerAnalysisIntLoop 173µs ± 3% 162µs ± 4% -6.40% (p=0.000 n=19+20) BM_PointerAnalysisPointerLoop 307µs ± 2% 312µs ± 4% +1.56% (p=0.013 n=18+20) BM_PointerAnalysisBranch 199µs ± 4% 181µs ± 4% -8.81% (p=0.000 n=20+20) BM_PointerAnalysisLoopAndBranch 503µs ± 3% 508µs ± 2% ~ (p=0.081 n=18+19) BM_PointerAnalysisTwoLoops 304µs ± 4% 286µs ± 2% -6.04% (p=0.000 n=19+20) BM_PointerAnalysisJoinFilePath 4.78ms ± 3% 4.54ms ± 4% -4.97% (p=0.000 n=20+20) BM_PointerAnalysisCallInLoop 3.05ms ± 3% 2.90ms ± 4% -5.05% (p=0.000 n=19+20) ``` When running clang-tidy on real-world code, the results are less clear. In three runs, averaged, on an arbitrarily chosen input file, I get 11.60 s of user time without this patch and 11.40 s with it, though with considerable measurement noise (I'm seeing up to 0.2 s of variation between runs). Still, this is a very simple change, and it is a clear win in benchmarks, so I think it is worth making.
2024-01-10Revert "[clang][dataflow] Add an early-out to `flowConditionImplies()` / ↵martinboehme1-6/+0
`flowConditionAllows()`." (#77570) Reverts llvm/llvm-project#77453
2024-01-09[clang][dataflow] Add an early-out to `flowConditionImplies()` / ↵martinboehme1-0/+6
`flowConditionAllows()`. (#77453) This saves having to assemble the set of constraints and run the SAT solver in the trivial case where `F` is true. This is a performance win on the benchmarks for the Crubit nullability checker: ``` name old cpu/op new cpu/op delta BM_PointerAnalysisCopyPointer 64.1µs ± 5% 63.1µs ± 0% -1.56% (p=0.000 n=20+17) BM_PointerAnalysisIntLoop 172µs ± 2% 171µs ± 0% ~ (p=0.752 n=20+17) BM_PointerAnalysisPointerLoop 408µs ± 3% 355µs ± 0% -12.99% (p=0.000 n=20+17) BM_PointerAnalysisBranch 201µs ± 2% 184µs ± 0% -8.28% (p=0.000 n=20+19) BM_PointerAnalysisLoopAndBranch 684µs ± 2% 613µs ± 2% -10.38% (p=0.000 n=20+19) BM_PointerAnalysisTwoLoops 309µs ± 2% 308µs ± 2% ~ (p=0.728 n=20+19) BM_PointerAnalysisJoinFilePath 37.9ms ± 2% 37.9ms ± 2% +0.06% (p=0.041 n=20+19) BM_PointerAnalysisCallInLoop 26.5ms ± 2% 26.4ms ± 4% -0.59% (p=0.024 n=20+20) ``` When running clang-tidy on real-world code, the results are less clear. In three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s of user time without this patch and 11.81 s with it, though with considerable measurement noise (I'm seeing up to 0.2 s of variation between runs). Still, this is a very simple change, and it is a clear win in benchmarks, so I think it is worth making.
2023-12-04[clang][dataflow] Add synthetic fields to `RecordStorageLocation` (#73860)martinboehme1-1/+39
Synthetic fields are intended to model the internal state of a class (e.g. the value stored in a `std::optional`) without having to depend on that class's implementation details. Today, this is typically done with properties on `RecordValue`s, but these have several drawbacks: * Care must be taken to call `refreshRecordValue()` before modifying a property so that the modified property values aren’t seen by other environments that may have access to the same `RecordValue`. * Properties aren’t associated with a storage location. If an analysis needs to associate a location with the value stored in a property (e.g. to model the reference returned by `std::optional::value()`), it needs to manually add an indirection using a `PointerValue`. (See for example the way this is done in UncheckedOptionalAccessModel.cpp, specifically in `maybeInitializeOptionalValueMember()`.) * Properties don’t participate in the builtin compare, join, and widen operations. If an analysis needs to apply these operations to properties, it needs to override the corresponding methods of `ValueModel`. * Longer-term, we plan to eliminate `RecordValue`, as by-value operations on records aren’t really “a thing” in C++ (see https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This would obviously eliminate the ability to set properties on `RecordValue`s. To demonstrate the advantages of synthetic fields, this patch converts UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly simplifies the implementation of the check. This PR is pretty big; to make it easier to review, I have broken it down into a stack of three commits, each of which contains a set of logically related changes. I considered submitting each of these as a separate PR, but the commits only really make sense when taken together. To review, I suggest first looking at the changes in UncheckedOptionalAccessModel.cpp. This gives a flavor for how the various API changes work together in the context of an analysis. Then, review the rest of the changes.
2023-11-07[clang][dataflow] Fix -Wrange-loop-construct in DataflowAnalysisContext.cpp ↵Jie Fu1-1/+1
(NFC) /llvm-project/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp:247:10: note: use reference type 'const llvm::Sma llVector<Atom> &' to prevent copying for (const llvm::SmallVector<Atom> Class : Info.EquivalentAtoms) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ & 1 error generated.
2023-11-07[clang][dataflow] Simplify flow conditions displayed in HTMLLogger. (#70848)martinboehme1-1/+39
This can make the flow condition significantly easier to interpret; see below for an example. I had hoped that adding the simplification as a preprocessing step before the SAT solver (in `DataflowAnalysisContext::querySolver()`) would also speed up SAT solving and maybe even eliminate SAT solver timeouts, but in my testing, this actually turns out to be a pessimization. It appears that these simplifications are easy enough for the SAT solver to perform itself. Nevertheless, the improvement in debugging alone makes this a worthwhile change. Example of flow condition output with these changes: ``` Flow condition token: V37 Constraints: (V16 = (((V15 & (V19 = V12)) & V22) & V25)) (V15 = ((V12 & ((V14 = V9) | (V14 = V4))) & (V13 = V14))) True atoms: (V0, V1, V2, V5, V6, V7, V29, V30, V32, V34, V35, V37) False atoms: (V3, V8, V17) Equivalent atoms: (V11, V15) Flow condition constraints before simplification: V37 ((!V3 & !V8) & !V17) (V37 = V34) (V34 = (V29 & (V35 = V30))) (V29 = (((V16 | V2) & V32) & (V30 = V32))) (V16 = (((V15 & (V19 = V12)) & V22) & V25)) (V15 = V11) (V11 = ((((V7 | V2) & V12) & ((V7 & (V14 = V9)) | (V2 & (V14 = V4)))) & (V13 = V14))) (V2 = V1) (V1 = V0) V0 (V7 = V6) (V6 = V5) (V5 = V2) ```
2023-10-25[clang][dataflow] Add `Environment::allows()`. (#70046)martinboehme1-4/+13
This allows querying whether, given the flow condition, a certain formula still has a solution (though it is not necessarily implied by the flow condition, as `flowConditionImplies()` would check). This can be checked today, but only with a double negation, i.e. to check whether, given the flow condition, a formula F has a solution, you can check `!Env.flowConditionImplies(Arena.makeNot(F))`. The double negation makes this hard to reason about, and it would be nicer to have a way of directly checking this. For consistency, this patch also renames `flowConditionImplies()` to `proves()`; the old name is kept around for compatibility but deprecated.
2023-10-23[clang][dataflow] Remove ↵martinboehme1-9/+0
`DataflowAnalysisContext::flowConditionIsTautology()`. (#69601) It's only used in its own unit tests.
2023-10-19Reapply "[dataflow] use true/false literals in formulas, rather than variables"Sam McCall1-8/+1
This reverts commit 3353f7dd3d91c9b2b6a15ba9229bee53e0cb8196. Fixed test bug (unspecified order of arg evaluation)
2023-10-11[clang][dataflow] Add support for lambda captures (#68558)Stanislav Gatev1-1/+1
This adds support for copy, ref, and this lambda captures to the core framework and also adds relevant tests in UncheckedOptionalAccessTest.
2023-09-22Revert "[dataflow] use true/false literals in formulas, rather than variables"Douglas Yung1-1/+8
This reverts commit 36bd5bd888f193b70abf43a09bb4fc04cd2a2ff1. This change is causing a test failure on several build bots: - https://lab.llvm.org/buildbot/#/builders/139/builds/50255 - https://lab.llvm.org/buildbot/#/builders/216/builds/27735 - https://lab.llvm.org/buildbot/#/builders/247/builds/9334
2023-09-22[dataflow] use true/false literals in formulas, rather than variablesSam McCall1-8/+1
And simplify formulas containing true/false It's unclear to me how useful this is, it does make formulas more conveniently self-contained now (we can usefully print them without carrying around the "true/false" labels) (while here, simplify !!X to X, too) Differential Revision: https://reviews.llvm.org/D153485
2023-09-14[dataflow] Add global invariant condition to DataflowAnalysisContext (#65949)Sam McCall1-27/+36
This records facts that are not sensitive to the current flow condition, and should apply to all environments. The motivating case is recording information about where a Value originated, such as nullability: - we may see the same Value for multiple expressions (e.g. reads of the same field) in multiple environments (multiple blocks or iterations) - we want to record information only when we first see the Value (e.g. Nullability annotations on fields only add information if we don't know where the value came from) - this information should be expressible as a SAT condition - we must add this SAT condition to every environment where the Value may appear We solve this by recording the information in the global condition. This doesn't seem particularly elegant, but solves the problem and is a fairly small and natural extension of the Environment. Alternatives considered: - store the constraint directly as a property on the Value. But it's more composable for such properties to always be variables (AtomicBoolValue), and constrain them with SAT conditions. - add a hook whenever values are created, giving the analysis the chance to populate them. However the framework relies on/provides the ability to construct values in arbitrary places without providing the context such a hook would need, this would be a very invasive change.
2023-08-29[clang][dataflow][NFC] Eliminate `getStorageLocation()` / ↵Martin Braenne1-5/+7
`setStorageLocation()` in `DataflowAnalysisContext`. Instead, inline them into the `getStableStorageLocation()` overloads, which is the only place they were called from (and should be called from). `getStorageLocation()` / `setStorageLocation()` were confusing because neither their name nor their documentation made reference to the fact that the storage location is stable. It didn't make sense to keep these as private member functions either. The code for the two `getStableStorageLocation()` overloads has become only marginally more complex by inlining these functions, and the `Expr` version is actually more efficient because we only call `ignoreCFGOmittedNodes()` once instead of twice. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D158981
2023-08-01[clang][dataflow] Rename `AggregateStorageLocation` to ↵Martin Braenne1-1/+1
`RecordStorageLocation` and `StructValue` to `RecordValue`. - Both of these constructs are used to represent structs, classes, and unions; Clang uses the collective term "record" for these. - The term "aggregate" in `AggregateStorageLocation` implies that, at some point, the intention may have been to use it also for arrays, but it don't think it's possible to use it for arrays. Records and arrays are very different and therefore need to be modeled differently. Records have a fixed set of named fields, which can have different type; arrays have a variable number of elements, but they all have the same type. - Futhermore, "aggregate" has a very specific meaning in C++ (https://en.cppreference.com/w/cpp/language/aggregate_initialization). Aggregates of class type may not have any user-declared or inherited constructors, no private or protected non-static data members, no virtual member functions, and so on, but we use `AggregateStorageLocations` to model all objects of class type. In addition, for consistency, we also rename the following: - `getAggregateLoc()` (in `RecordValue`, formerly known as `StructValue`) to simply `getLoc()`. - `refreshStructValue()` to `refreshRecordValue()` We keep the old names around as deprecated synonyms to enable clients to be migrated to the new names. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D156788
2023-07-24[clang][dataflow] Eliminate duplication between `AggregateStorageLocation` ↵Martin Braenne1-1/+5
and `StructValue`. After this change, `StructValue` is just a wrapper for an `AggregateStorageLocation`. For the wider context, see https://discourse.llvm.org/t/70086. ## How to review - Start by looking at the comments added / changed in Value.h, StorageLocation.h, and DataflowEnvironment.h. This will give you a good overview of the semantic changes. - Look at the corresponding .cpp files that implement the semantic changes. - Transfer.cpp, TypeErasedDataflowAnalysis.cpp, and RecordOps.cpp show how the core of the framework is affected by the semantic changes. - UncheckedOptionalAccessModel.cpp shows how this complex model is affected by the changes. - Many of the changes in the rest of the patch are mechanical in nature. Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D155446
2023-07-17[clang][dataflow] Add `DataflowEnvironment::createObject()`.Martin Braenne1-1/+1
This consolidates the code used in various places to initialize objects (usually for variables) into one central location. It will also help reduce the number of changes needed when we make the upcoming API changes to `AggregateStorageLocation` and `StructValue`. Depends On D155074 Reviewed By: ymandel, xazax.hun Differential Revision: https://reviews.llvm.org/D155075
2023-07-12[clang] Create a buildkite-pipeline.yml file for clangNikolas Klauser1-1/+1
This moves the formatting job to a shell script, which should also fix the clang pre-commit CI. Differential Revision: https://reviews.llvm.org/D153920
2023-07-10[clang][dataflow] Various changes to handling of modeled fields.Martin Braenne1-24/+19
- Rename `getReferencedFields()` to `getModeledFields()`. Move the logic that returns all object fields when doing a context-sensitive analysis to here from `DataflowAnalysisContext::createStorageLocation()`. I think all callers of the previous `getReferencedFields()` should use this logic; the fact that they were not doing so looks like a bug. - Make `getModeledFields()` public. I have an upcoming patch that will need to use this function from Transfer.cpp, and there doesn't seem to be any reason why this function should not be public. - Use a `SmallSetVector` to get deterministic iteration order. I have a pending patch where I'm getting flaky tests because `Environment::createValueUnlessSelfReferential()` is non-deterministically populating different fields depending on the iteration order. This change fixes those flaky tests. Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D154586
2023-07-07Reland "[dataflow] Add dedicated representation of boolean formulas"Sam McCall1-46/+56
This reverts commit 7a72ce98224be76d9328e65eee472381f7c8e7fe. Test problems were due to unspecified order of function arg evaluation. Reland "[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)" This properly frees the Value hierarchy from managing boolean formulas. We still distinguish AtomicBoolValue; this type is used in client code. However we expect to convert such uses to BoolValue (where the distinction is not needed) or Atom (where atomic identity is intended), and then fold AtomicBoolValue into FormulaBoolValue. We also distinguish TopBoolValue; this has distinct rules for widen/join/equivalence, and top-ness is not represented in Formula. It'd be nice to find a cleaner representation (e.g. the absence of a formula), but no immediate plans. For now, BoolValues with the same Formula are deduplicated. This doesn't seem desirable, as Values are mutable by their creators (properties). We can probably drop this for FormulaBoolValue immediately (not in this patch, to isolate changes). For AtomicBoolValue we first need to update clients to stop using value pointers for atom identity. The data structures around flow conditions are updated: - flow condition tokens are Atom, rather than AtomicBoolValue* - conditions are Formula, rather than BoolValue Most APIs were changed directly, some with many clients had a new version added and the existing one deprecated. The factories for BoolValues in Environment keep their existing signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue) and are not deprecated. These have very many clients and finding the most ergonomic API & migration path still needs some thought. Differential Revision: https://reviews.llvm.org/D153469
2023-07-05Revert "Reland "[dataflow] Add dedicated representation of boolean formulas" ↵Sam McCall1-51/+48
and followups These changes are OK, but they break downstream stuff that needs more time to adapt :-( This reverts commit 71579569f4399d3cf6bc618dcd449b5388d749cc. This reverts commit 5e4ad816bf100b0325ed45ab1cfea18deb3ab3d1. This reverts commit 1c3ac8dfa16c42a631968aadd0396cfe7f7778e0.
2023-07-05[dataflow] use true/false literals in formulas, rather than variablesSam McCall1-8/+1
And simplify formulas containing true/false It's unclear to me how useful this is, it does make formulas more conveniently self-contained now (we can usefully print them without carrying around the "true/false" labels) (while here, simplify !!X to X, too) Differential Revision: https://reviews.llvm.org/D153485
2023-07-05[dataflow] Replace most BoolValue subclasses with references to Formula (and ↵Sam McCall1-50/+47
AtomicBoolValue => Atom and BoolValue => Formula where appropriate) This properly frees the Value hierarchy from managing boolean formulas. We still distinguish AtomicBoolValue; this type is used in client code. However we expect to convert such uses to BoolValue (where the distinction is not needed) or Atom (where atomic identity is intended), and then fold AtomicBoolValue into FormulaBoolValue. We also distinguish TopBoolValue; this has distinct rules for widen/join/equivalence, and top-ness is not represented in Formula. It'd be nice to find a cleaner representation (e.g. the absence of a formula), but no immediate plans. For now, BoolValues with the same Formula are deduplicated. This doesn't seem desirable, as Values are mutable by their creators (properties). We can probably drop this for FormulaBoolValue immediately (not in this patch, to isolate changes). For AtomicBoolValue we first need to update clients to stop using value pointers for atom identity. The data structures around flow conditions are updated: - flow condition tokens are Atom, rather than AtomicBoolValue* - conditions are Formula, rather than BoolValue Most APIs were changed directly, some with many clients had a new version added and the existing one deprecated. The factories for BoolValues in Environment keep their existing signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue) and are not deprecated. These have very many clients and finding the most ergonomic API & migration path still needs some thought. Differential Revision: https://reviews.llvm.org/D153469
2023-07-05Reland "[dataflow] Add dedicated representation of boolean formulas"Sam McCall1-5/+18
This reverts commit 7a72ce98224be76d9328e65eee472381f7c8e7fe. Test problems were due to unspecified order of function arg evaluation.
2023-07-04Revert "[dataflow] Add dedicated representation of boolean formulas"Tom Weaver1-18/+5
This reverts commit 2fd614efc1bb9c27f1bc6c3096c60a7fe121e274. Commit caused failures on the following two build bots: http://45.33.8.238/win/80815/step_7.txt https://lab.llvm.org/buildbot/#/builders/139/builds/44269
2023-07-04[dataflow] Add dedicated representation of boolean formulasSam McCall1-5/+18
This is the first step in untangling the two current jobs of BoolValue. === Desired end-state: === - BoolValue will model C++ booleans e.g. held in StorageLocations. this includes describing uncertainty (e.g. "top" is a Value concern) - Formula describes analysis-level assertions in terms of SAT atoms. These can still be linked together: a BoolValue may have a corresponding SAT atom which is constrained by formulas. === Done in this patch: === BoolValue is left intact, Formula is just the input type to the SAT solver, and we build formulas as needed to invoke the solver. === Incidental changes to debug string printing: === - variables renamed from B0 etc to V0 etc B0 collides with the names of basic blocks, which is confusing when debugging flow conditions. - debug printing of formulas (Formula and Atom) uses operator<< rather than debugString(), so works with gtest. Therefore moved out of DebugSupport.h - Did the same to Solver::Result, and some helper changes to SolverTest, so that we get useful messages on unit test failures - formulas are now printed as infix expressions on one line, rather than wrapped/indented S-exprs. My experience is that this is easier to scan FCs for small examples, and large ones are unreadable either way. - most of the several debugString() functions for constraints/results are unused, so removed them rather than updating tests. Inlined the one that was actually used into its callsite. Differential Revision: https://reviews.llvm.org/D153366
2023-06-26[dataflow] Make SAT solver deterministicSam McCall1-14/+16
The SAT solver imported its constraints by iterating over an unordered DenseSet. The path taken, and ultimately the runtime, the specific solution found, and whether it would time out or complete could depend on the iteration order. Instead, have the caller specify an ordered collection of constraints. If this is built in a deterministic way, the system can have deterministic behavior. (The main alternative is to sort the constraints by value, but this option is simpler today). A lot of nondeterminism still appears to be remain in the framework, so today the solver's inputs themselves are not deterministic yet. Differential Revision: https://reviews.llvm.org/D153584
2023-05-30[clang][dataflow] Remove unnecessary `ASTContext` parameter from ↵Martin Braenne1-1/+1
`ControlFlowContext::build` overload. When introducing this new overload in https://reviews.llvm.org/D151183, I didn't consider that the `ASTContext` parameter was unnecessary because it could also be obtained from the `FunctionDecl`. Reviewed By: gribozavr2, xazax.hun Differential Revision: https://reviews.llvm.org/D151549
2023-05-25[clang][dataflow] Add a `ControlFlowContext::build()` overload taking a ↵Martin Braenne1-2/+2
`FunctionDecl`. This is the most common use case, so it makes sense to have a specific overload for it. Reviewed By: xazax.hun Differential Revision: https://reviews.llvm.org/D151183
2023-05-15[clang][dataflow] Don't analyze templated declarations.Martin Braenne1-1/+1
Attempting to analyze templated code doesn't have a good cost-benefit ratio. We have so far done a best-effort attempt at this, but maintaining this support has an ongoing high maintenance cost because the AST for templates can violate a lot of the invariants that otherwise hold for the AST of concrete code. As just one example, in concrete code the operand of a UnaryOperator '*' is always a prvalue (https://godbolt.org/z/s3e5xxMd1), but in templates this isn't true (https://godbolt.org/z/6W9xxGvoM). Further rationale for not analyzing templates: * The semantics of a template itself are weakly defined; semantics can depend strongly on the concrete template arguments. Analyzing the template itself (as opposed to an instantiation) therefore has limited value. * Analyzing templates requires a lot of special-case code that isn't necessary for concrete code because dependent types are hard to deal with and the AST violates invariants that otherwise hold for concrete code (see above). * There's precedent in that neither Clang Static Analyzer nor the flow-sensitive warnings in Clang (such as uninitialized variables) support analyzing templates. Reviewed By: gribozavr2, xazax.hun Differential Revision: https://reviews.llvm.org/D150352
2023-04-19[dataflow] add HTML logger: browse code/cfg/analysis timeline/stateSam McCall1-5/+36
With -dataflow-log=/dir we will write /dir/0.html etc for each function analyzed. These files show the function's code and CFG, and the path through the CFG taken by the analysis. At each analysis point we can see the lattice state. Currently the lattice state dump is not terribly useful but we can improve this: showing values associated with the current Expr, simplifying flow condition, highlighting changes etc. (Trying not to let this patch scope-creep too much, so I ripped out the half-finished features) Demo: https://htmlpreview.github.io/?https://gist.githubusercontent.com/sam-mccall/1746985bf13406bd19181af281aea9ff/raw/9718fdd48406dabccb3092acd983b4bd55da9dfa/analysis.html Differential Revision: https://reviews.llvm.org/D146591
2023-04-19[dataflow] Extract arena for Value/StorageLocation out of ↵Sam McCall1-179/+21
DataflowAnalysisContext DataflowAnalysisContext has a few too many responsibilities, this narrows them. It also allows the Arena to be shared with analysis steps, which need to create Values, without exposing the whole DACtx API (flow conditions etc). This means Environment no longer needs to proxy all these methods. (For now it still does, because there are many callsites to update, and maybe if we separate bool formulas from values we can avoid churning them twice) In future, if we untangle the concepts of Values from boolean formulas/atoms, Arena would also be responsible for creating formulas and managing atom IDs. Differential Revision: https://reviews.llvm.org/D148554
2023-04-06[clang][dataflow] Use `isRecordType()` where appropriate.Martin Braenne1-2/+1
This is less verbose than checking for class, struct, and union individually, and I believe it's also more efficient (not that that should be the overriding concern). Reviewed By: sammccall, xazax.hun Differential Revision: https://reviews.llvm.org/D147603
2023-04-04[clang][dataflow] Fix -Wdeprecated-declarations after D147302 (NFC)Jie Fu1-3/+3
Replace: 1. createAtomicBoolValue() --> create<AtomicBoolValue>() 2. createTopBoolValue() --> create<TopBoolValue>() /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:386:19: error: 'createAtomicBoolValue' is deprecated: use create<AtomicBoolValue> instead [-Werror,-Wdeprecated-declarations] return DACtx->createAtomicBoolValue(); ^~~~~~~~~~~~~~~~~~~~~ create<AtomicBoolValue> /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:215:3: note: 'createAtomicBoolValue' has been explicitly marked deprecated here LLVM_DEPRECATED("use create<AtomicBoolValue> instead", ^ /Users/jiefu/llvm-project/llvm/include/llvm/Support/Compiler.h:143:50: note: expanded from macro 'LLVM_DEPRECATED' ^ In file included from /Users/jiefu/llvm-project/clang/lib/Analysis/FlowSensitive/Transfer.cpp:14: In file included from /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/Transfer.h:19: /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:391:19: error: 'createTopBoolValue' is deprecated: use create<TopBoolValue> instead [-Werror,-Wdeprecated-declarations] return DACtx->createTopBoolValue(); ^~~~~~~~~~~~~~~~~~ create<TopBoolValue> /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:227:3: note: 'createTopBoolValue' has been explicitly marked deprecated here LLVM_DEPRECATED("use create<TopBoolValue> instead", "create<TopBoolValue>") ^ /Users/jiefu/llvm-project/llvm/include/llvm/Support/Compiler.h:143:50: note: expanded from macro 'LLVM_DEPRECATED' ^ 2 errors generated.
2023-04-04[clang][dataflow] Add `create<T>()` methods to `Environment` and ↵Martin Braenne1-14/+8
`DataflowAnalysisContext`. These methods provide a less verbose way of allocating `StorageLocation`s and `Value`s than the existing `takeOwnership(make_unique(...))` pattern. In addition, because allocation of `StorageLocation`s and `Value`s now happens within the `DataflowAnalysisContext`, the `create<T>()` open up the possibility of using `BumpPtrAllocator` to allocate these objects if it turns out this helps performance. Reviewed By: ymandel, xazax.hun, gribozavr2 Differential Revision: https://reviews.llvm.org/D147302
2023-03-23[FlowSensitive] Log analysis progress for debugging purposesSam McCall1-0/+28
The goal is to be able to understand how the analysis executes, and what its incremental and final findings are, by enabling logging and reading the logs. This should include both framework and analysis-specific information. Ad-hoc printf-debugging doesn't seem sufficient for my understanding, at least. Being able to check in logging, turn it on in a production binary, and quickly find particular analysis steps within complex functions seem important. This can be enabled programmatically through DataflowAnalysisOptions, or via the flag -dataflow-log. (Works in unittests, clang-tidy, standalone tools...) Important missing pieces here: - a logger implementation that produces an interactive report (HTML file) which can be navigated via timeline/code/CFG. (I think the Logger interface is sufficient for this, but need to prototype). - display of the application-specific lattice - more useful display for the built-in environment (e.g. meaningful & consistent names for values, hiding redundant variables in the flow condition, hiding unreachable expressions) Differential Revision: https://reviews.llvm.org/D144730
2023-03-22[dataflow] Log flow condition to the correct stream.Sam McCall1-2/+3
Differential Revision: https://reviews.llvm.org/D146527
2023-03-15[clang] Use *{Map,Set}::contains (NFC)Kazu Hirata1-4/+3
2023-01-10[clang][dataflow][NFC] Refine names and comments for field filtering.Yitzhak Mandelbaum1-5/+5
Tweaks elements of the new API for filtering the set of modeled fields. Differential Revision: https://reviews.llvm.org/D141319
2023-01-10[clang][dataflow] Unify `TransferOptions` and ↵Yitzhak Mandelbaum1-3/+2
`DataflowAnalysisContext::Options`. Merges `TransferOptions` into the newly-introduced `DataflowAnalysisContext::Options` and removes explicit parameter for `TransferOptions`, relying instead on the common options carried by the analysis context. Given that there was no intent to allow different options between calls to `transfer`, a common value for the options is preferable. Differential Revision: https://reviews.llvm.org/D140703
2023-01-09Revert "Revert "[clang][dataflow] Only model struct fields that are used in ↵Yitzhak Mandelbaum1-3/+24
the function being analyzed."" This reverts commit 2b1a517a92bfdfa3b692a660e19a2bb22513a567. It's a fix forward with two memory errors fixed, one of which was the cause of the build breakage in the buildbots. Original message: Previously, the model for structs modeled all fields in a struct when `createValue` was called for that type. This patch adds a prepass on the function under analysis to discover the fields referenced in the scope and then limits modeling to only those fields. This reduces wasted memory usage (modeling unused fields) which can be important for programs that use large structs. Note: This patch obviates the need for https://reviews.llvm.org/D123032.
2023-01-06Revert "[clang][dataflow] Only model struct fields that are used in the ↵Yitzhak Mandelbaum1-24/+3
function being analyzed." This reverts commit 5e8f597c2fedc740b71f07dfdb1ef3c2d348b193. It caused msan and ubsan breakages.
2023-01-05[clang][dataflow] Only model struct fields that are used in the function ↵Yitzhak Mandelbaum1-3/+24
being analyzed. Previously, the model for structs modeled all fields in a struct when `createValue` was called for that type. This patch adds a prepass on the function under analysis to discover the fields referenced in the scope and then limits modeling to only those fields. This reduces wasted memory usage (modeling unused fields) which can be important for programss that use large structs. Note: This patch obviates the need for https://reviews.llvm.org/D123032. Differential Revision: https://reviews.llvm.org/D140694