diff options
author | martinboehme <mboehme@google.com> | 2024-03-07 13:31:23 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-03-07 13:31:23 +0100 |
commit | d5aecf0c19fc8850d7d34ac8c339bcc7e133b5fb (patch) | |
tree | 579adeb87a4fddbbda942d754ffd3b35b84a2d43 /clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | |
parent | a11ab139e4de9cdad41c299f198515c09be6f05d (diff) | |
download | llvm-d5aecf0c19fc8850d7d34ac8c339bcc7e133b5fb.zip llvm-d5aecf0c19fc8850d7d34ac8c339bcc7e133b5fb.tar.gz llvm-d5aecf0c19fc8850d7d34ac8c339bcc7e133b5fb.tar.bz2 |
[clang][nullability] Don't discard expression state before end of full-expression. (#82611)
In https://github.com/llvm/llvm-project/pull/72985, I made a change to
discard
expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each
basic
block. I did so with the claim that "we never need to access entries
from these
maps outside of the current basic block", noting that there are
exceptions to
this claim when control flow happens inside a full-expression (the
operands of
`&&`, `||`, and the conditional operator live in different basic blocks
than the
operator itself) but that we already have a mechanism for retrieving the
values
of these operands from the environment for the block they are computed
in.
It turns out, however, that the operands of these operators aren't the
only
expressions whose values can be accessed from a different basic block;
when
control flow happens within a full-expression, that control flow can be
"interposed" between an expression and its parent. Here is an example:
```cxx
void f(int*, int);
bool cond();
void target() {
int i = 0;
f(&i, cond() ? 1 : 0);
}
```
([godbolt](https://godbolt.org/z/hrbj1Mj3o))
In the CFG[^1] , note how the expression for `&i` is computed in block
B4,
but the parent of this expression (the `CallExpr`) is located in block
B1.
The the argument expression `&i` and the `CallExpr` are essentially
"torn apart"
into different basic blocks by the conditional operator in the second
argument.
In other words, the edge between the `CallExpr` and its argument `&i`
straddles
the boundary between two blocks.
I used to think that this scenario -- where an edge between an
expression and
one of its children straddles a block boundary -- could only happen
between the
expression that triggers the control flow (`&&`, `||`, or the
conditional
operator) and its children, but the example above shows that other
expressions
can be affected as well; the control flow is still triggered by `&&`,
`||` or
the conditional operator, but the expressions affected lie outside these
operators.
Discarding expression state too soon is harmful. For example, an
analysis that
checks the arguments of the `CallExpr` above would not be able to
retrieve a
value for the `&i` argument.
This patch therefore ensures that we don't discard expression state
before the
end of a full-expression. In other cases -- when the evaluation of a
full-expression is complete -- we still want to discard expression state
for the
reasons explained in https://github.com/llvm/llvm-project/pull/72985
(avoid
performing joins on boolean values that are no longer needed, which
unnecessarily extends the flow condition; improve debuggability by
removing
clutter from the expression state).
The impact on performance from this change is about a 1% slowdown in the
Crubit nullability check benchmarks:
```
name old cpu/op new cpu/op delta
BM_PointerAnalysisCopyPointer 71.9µs ± 1% 71.9µs ± 2% ~ (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop 190µs ± 1% 192µs ± 2% +1.06% (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop 325µs ± 5% 324µs ± 4% ~ (p=0.496 n=18+20)
BM_PointerAnalysisBranch 193µs ± 0% 192µs ± 4% ~ (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch 521µs ± 1% 525µs ± 3% +0.94% (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops 337µs ± 1% 341µs ± 3% +1.19% (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath 1.62ms ± 2% 1.64ms ± 3% +0.92% (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop 1.14ms ± 1% 1.15ms ± 4% ~ (p=0.135 n=16+18)
```
[^1]:
```
[B5 (ENTRY)]
Succs (1): B4
[B1]
1: [B4.9] ? [B2.1] : [B3.1]
2: [B4.4]([B4.6], [B1.1])
Preds (2): B2 B3
Succs (1): B0
[B2]
1: 1
Preds (1): B4
Succs (1): B1
[B3]
1: 0
Preds (1): B4
Succs (1): B1
[B4]
1: 0
2: int i = 0;
3: f
4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
5: i
6: &[B4.5]
7: cond
8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
9: [B4.8]()
T: [B4.9] ? ... : ...
Preds (1): B5
Succs (2): B2 B3
[B0 (EXIT)]
Preds (1): B1
```
Diffstat (limited to 'clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp')
-rw-r--r-- | clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp | 28 |
1 files changed, 24 insertions, 4 deletions
diff --git a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp index fd7b06e..62332a1 100644 --- a/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp +++ b/clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp @@ -48,6 +48,24 @@ static llvm::DenseMap<const ValueDecl *, StorageLocation *> intersectDeclToLoc( return Result; } +// Performs a join on either `ExprToLoc` or `ExprToVal`. +// The maps must be consistent in the sense that any entries for the same +// expression must map to the same location / value. This is the case if we are +// performing a join for control flow within a full-expression (which is the +// only case when this function should be used). +template <typename MapT> MapT joinExprMaps(const MapT &Map1, const MapT &Map2) { + MapT Result = Map1; + + for (const auto &Entry : Map2) { + [[maybe_unused]] auto [It, Inserted] = Result.insert(Entry); + // If there was an existing entry, its value should be the same as for the + // entry we were trying to insert. + assert(It->second == Entry.second); + } + + return Result; +} + // Whether to consider equivalent two values with an unknown relation. // // FIXME: this function is a hack enabling unsoundness to support @@ -627,7 +645,8 @@ LatticeJoinEffect Environment::widen(const Environment &PrevEnv, } Environment Environment::join(const Environment &EnvA, const Environment &EnvB, - Environment::ValueModel &Model) { + Environment::ValueModel &Model, + ExprJoinBehavior ExprBehavior) { assert(EnvA.DACtx == EnvB.DACtx); assert(EnvA.ThisPointeeLoc == EnvB.ThisPointeeLoc); assert(EnvA.CallStack == EnvB.CallStack); @@ -675,9 +694,10 @@ Environment Environment::join(const Environment &EnvA, const Environment &EnvB, JoinedEnv.LocToVal = joinLocToVal(EnvA.LocToVal, EnvB.LocToVal, EnvA, EnvB, JoinedEnv, Model); - // We intentionally leave `JoinedEnv.ExprToLoc` and `JoinedEnv.ExprToVal` - // empty, as we never need to access entries in these maps outside of the - // basic block that sets them. + if (ExprBehavior == KeepExprState) { + JoinedEnv.ExprToVal = joinExprMaps(EnvA.ExprToVal, EnvB.ExprToVal); + JoinedEnv.ExprToLoc = joinExprMaps(EnvA.ExprToLoc, EnvB.ExprToLoc); + } return JoinedEnv; } |