author | David Malcolm <dmalcolm@redhat.com> | 2019-09-27 09:23:16 -0400 |
---|---|---|
committer | David Malcolm <dmalcolm@redhat.com> | 2020-01-14 15:34:24 -0500 |
commit | 757bf1dff5e8cee34c0a75d06140ca972bfecfa7 (patch) | |
tree | cca8a96a39f87c90df46a389d1777854f97017d3 /gcc/doc/analyzer.texi | |
parent | 08c8c973c082457a7d6192673e87475f1fdfdbef (diff) | |
Initial commit of analyzer
This patch adds a static analysis pass to the middle-end, focusing
for this release on C code, and malloc/free issues in particular.
See:
https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer
gcc/ChangeLog:
* Makefile.in (lang_opt_files): Add analyzer.opt.
(ANALYZER_OBJS): New.
(OBJS): Add digraph.o, graphviz.o, ordered-hash-map-tests.o,
tristate.o and ANALYZER_OBJS.
(TEXI_GCCINT_FILES): Add analyzer.texi.
* common.opt (-fanalyzer): New driver option.
* config.in: Regenerate.
* configure: Regenerate.
* configure.ac (--disable-analyzer, ENABLE_ANALYZER): New option.
(gccdepdir): Also create depdir for "analyzer" subdir.
* digraph.cc: New file.
* digraph.h: New file.
* doc/analyzer.texi: New file.
* doc/gccint.texi ("Static Analyzer"): New menu item.
(analyzer.texi): Include it.
* doc/invoke.texi ("Static Analyzer Options"): New list and new section.
("Warning Options"): Add static analysis warnings to the list.
(-Wno-analyzer-double-fclose): New option.
(-Wno-analyzer-double-free): New option.
(-Wno-analyzer-exposure-through-output-file): New option.
(-Wno-analyzer-file-leak): New option.
(-Wno-analyzer-free-of-non-heap): New option.
(-Wno-analyzer-malloc-leak): New option.
(-Wno-analyzer-possible-null-argument): New option.
(-Wno-analyzer-possible-null-dereference): New option.
(-Wno-analyzer-null-argument): New option.
(-Wno-analyzer-null-dereference): New option.
(-Wno-analyzer-stale-setjmp-buffer): New option.
(-Wno-analyzer-tainted-array-index): New option.
(-Wno-analyzer-use-after-free): New option.
(-Wno-analyzer-use-of-pointer-in-stale-stack-frame): New option.
(-Wno-analyzer-use-of-uninitialized-value): New option.
(-Wanalyzer-too-complex): New option.
(-fanalyzer-call-summaries): New option.
(-fanalyzer-checker=): New option.
(-fanalyzer-fine-grained): New option.
(-fno-analyzer-state-merge): New option.
(-fno-analyzer-state-purge): New option.
(-fanalyzer-transitivity): New option.
(-fanalyzer-verbose-edges): New option.
(-fanalyzer-verbose-state-changes): New option.
(-fanalyzer-verbosity=): New option.
(-fdump-analyzer): New option.
(-fdump-analyzer-callgraph): New option.
(-fdump-analyzer-exploded-graph): New option.
(-fdump-analyzer-exploded-nodes): New option.
(-fdump-analyzer-exploded-nodes-2): New option.
(-fdump-analyzer-exploded-nodes-3): New option.
(-fdump-analyzer-supergraph): New option.
* doc/sourcebuild.texi (dg-require-dot): New.
(dg-check-dot): New.
* gdbinit.in (break-on-saved-diagnostic): New command.
* graphviz.cc: New file.
* graphviz.h: New file.
* ordered-hash-map-tests.cc: New file.
* ordered-hash-map.h: New file.
* passes.def (pass_analyzer): Add before
pass_ipa_whole_program_visibility.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::ordered_hash_map_tests_cc_tests.
* selftest.h (selftest::ordered_hash_map_tests_cc_tests): New
decl.
* shortest-paths.h: New file.
* timevar.def (TV_ANALYZER): New timevar.
(TV_ANALYZER_SUPERGRAPH): Likewise.
(TV_ANALYZER_STATE_PURGE): Likewise.
(TV_ANALYZER_PLAN): Likewise.
(TV_ANALYZER_SCC): Likewise.
(TV_ANALYZER_WORKLIST): Likewise.
(TV_ANALYZER_DUMP): Likewise.
(TV_ANALYZER_DIAGNOSTICS): Likewise.
(TV_ANALYZER_SHORTEST_PATHS): Likewise.
* tree-pass.h (make_pass_analyzer): New decl.
* tristate.cc: New file.
* tristate.h: New file.
gcc/analyzer/ChangeLog:
* ChangeLog: New file.
* analyzer-selftests.cc: New file.
* analyzer-selftests.h: New file.
* analyzer.opt: New file.
* analysis-plan.cc: New file.
* analysis-plan.h: New file.
* analyzer-logging.cc: New file.
* analyzer-logging.h: New file.
* analyzer-pass.cc: New file.
* analyzer.cc: New file.
* analyzer.h: New file.
* call-string.cc: New file.
* call-string.h: New file.
* checker-path.cc: New file.
* checker-path.h: New file.
* constraint-manager.cc: New file.
* constraint-manager.h: New file.
* diagnostic-manager.cc: New file.
* diagnostic-manager.h: New file.
* engine.cc: New file.
* engine.h: New file.
* exploded-graph.h: New file.
* pending-diagnostic.cc: New file.
* pending-diagnostic.h: New file.
* program-point.cc: New file.
* program-point.h: New file.
* program-state.cc: New file.
* program-state.h: New file.
* region-model.cc: New file.
* region-model.h: New file.
* sm-file.cc: New file.
* sm-malloc.cc: New file.
* sm-malloc.dot: New file.
* sm-pattern-test.cc: New file.
* sm-sensitive.cc: New file.
* sm-signal.cc: New file.
* sm-taint.cc: New file.
* sm.cc: New file.
* sm.h: New file.
* state-purge.cc: New file.
* state-purge.h: New file.
* supergraph.cc: New file.
* supergraph.h: New file.
gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/CVE-2005-1689-minimal.c: New test.
* gcc.dg/analyzer/abort.c: New test.
* gcc.dg/analyzer/alloca-leak.c: New test.
* gcc.dg/analyzer/analyzer-decls.h: New header.
* gcc.dg/analyzer/analyzer-verbosity-0.c: New test.
* gcc.dg/analyzer/analyzer-verbosity-1.c: New test.
* gcc.dg/analyzer/analyzer-verbosity-2.c: New test.
* gcc.dg/analyzer/analyzer.exp: New suite.
* gcc.dg/analyzer/attribute-nonnull.c: New test.
* gcc.dg/analyzer/call-summaries-1.c: New test.
* gcc.dg/analyzer/conditionals-2.c: New test.
* gcc.dg/analyzer/conditionals-3.c: New test.
* gcc.dg/analyzer/conditionals-notrans.c: New test.
* gcc.dg/analyzer/conditionals-trans.c: New test.
* gcc.dg/analyzer/data-model-1.c: New test.
* gcc.dg/analyzer/data-model-2.c: New test.
* gcc.dg/analyzer/data-model-3.c: New test.
* gcc.dg/analyzer/data-model-4.c: New test.
* gcc.dg/analyzer/data-model-5.c: New test.
* gcc.dg/analyzer/data-model-5b.c: New test.
* gcc.dg/analyzer/data-model-5c.c: New test.
* gcc.dg/analyzer/data-model-5d.c: New test.
* gcc.dg/analyzer/data-model-6.c: New test.
* gcc.dg/analyzer/data-model-7.c: New test.
* gcc.dg/analyzer/data-model-8.c: New test.
* gcc.dg/analyzer/data-model-9.c: New test.
* gcc.dg/analyzer/data-model-11.c: New test.
* gcc.dg/analyzer/data-model-12.c: New test.
* gcc.dg/analyzer/data-model-13.c: New test.
* gcc.dg/analyzer/data-model-14.c: New test.
* gcc.dg/analyzer/data-model-15.c: New test.
* gcc.dg/analyzer/data-model-16.c: New test.
* gcc.dg/analyzer/data-model-17.c: New test.
* gcc.dg/analyzer/data-model-18.c: New test.
* gcc.dg/analyzer/data-model-19.c: New test.
* gcc.dg/analyzer/data-model-path-1.c: New test.
* gcc.dg/analyzer/disabling.c: New test.
* gcc.dg/analyzer/dot-output.c: New test.
* gcc.dg/analyzer/double-free-lto-1-a.c: New test.
* gcc.dg/analyzer/double-free-lto-1-b.c: New test.
* gcc.dg/analyzer/double-free-lto-1.h: New header.
* gcc.dg/analyzer/equivalence.c: New test.
* gcc.dg/analyzer/explode-1.c: New test.
* gcc.dg/analyzer/explode-2.c: New test.
* gcc.dg/analyzer/factorial.c: New test.
* gcc.dg/analyzer/fibonacci.c: New test.
* gcc.dg/analyzer/fields.c: New test.
* gcc.dg/analyzer/file-1.c: New test.
* gcc.dg/analyzer/file-2.c: New test.
* gcc.dg/analyzer/function-ptr-1.c: New test.
* gcc.dg/analyzer/function-ptr-2.c: New test.
* gcc.dg/analyzer/function-ptr-3.c: New test.
* gcc.dg/analyzer/gzio-2.c: New test.
* gcc.dg/analyzer/gzio-3.c: New test.
* gcc.dg/analyzer/gzio-3a.c: New test.
* gcc.dg/analyzer/gzio.c: New test.
* gcc.dg/analyzer/infinite-recursion.c: New test.
* gcc.dg/analyzer/loop-2.c: New test.
* gcc.dg/analyzer/loop-2a.c: New test.
* gcc.dg/analyzer/loop-3.c: New test.
* gcc.dg/analyzer/loop-4.c: New test.
* gcc.dg/analyzer/loop.c: New test.
* gcc.dg/analyzer/malloc-1.c: New test.
* gcc.dg/analyzer/malloc-2.c: New test.
* gcc.dg/analyzer/malloc-3.c: New test.
* gcc.dg/analyzer/malloc-callbacks.c: New test.
* gcc.dg/analyzer/malloc-dce.c: New test.
* gcc.dg/analyzer/malloc-dedupe-1.c: New test.
* gcc.dg/analyzer/malloc-ipa-1.c: New test.
* gcc.dg/analyzer/malloc-ipa-10.c: New test.
* gcc.dg/analyzer/malloc-ipa-11.c: New test.
* gcc.dg/analyzer/malloc-ipa-12.c: New test.
* gcc.dg/analyzer/malloc-ipa-13.c: New test.
* gcc.dg/analyzer/malloc-ipa-2.c: New test.
* gcc.dg/analyzer/malloc-ipa-3.c: New test.
* gcc.dg/analyzer/malloc-ipa-4.c: New test.
* gcc.dg/analyzer/malloc-ipa-5.c: New test.
* gcc.dg/analyzer/malloc-ipa-6.c: New test.
* gcc.dg/analyzer/malloc-ipa-7.c: New test.
* gcc.dg/analyzer/malloc-ipa-8-double-free.c: New test.
* gcc.dg/analyzer/malloc-ipa-8-lto-a.c: New test.
* gcc.dg/analyzer/malloc-ipa-8-lto-b.c: New test.
* gcc.dg/analyzer/malloc-ipa-8-lto-c.c: New test.
* gcc.dg/analyzer/malloc-ipa-8-lto.h: New header.
* gcc.dg/analyzer/malloc-ipa-8-unchecked.c: New test.
* gcc.dg/analyzer/malloc-ipa-9.c: New test.
* gcc.dg/analyzer/malloc-macro-inline-events.c: New test.
* gcc.dg/analyzer/malloc-macro-separate-events.c: New test.
* gcc.dg/analyzer/malloc-macro.h: New header.
* gcc.dg/analyzer/malloc-many-paths-1.c: New test.
* gcc.dg/analyzer/malloc-many-paths-2.c: New test.
* gcc.dg/analyzer/malloc-many-paths-3.c: New test.
* gcc.dg/analyzer/malloc-paths-1.c: New test.
* gcc.dg/analyzer/malloc-paths-10.c: New test.
* gcc.dg/analyzer/malloc-paths-2.c: New test.
* gcc.dg/analyzer/malloc-paths-3.c: New test.
* gcc.dg/analyzer/malloc-paths-4.c: New test.
* gcc.dg/analyzer/malloc-paths-5.c: New test.
* gcc.dg/analyzer/malloc-paths-6.c: New test.
* gcc.dg/analyzer/malloc-paths-7.c: New test.
* gcc.dg/analyzer/malloc-paths-8.c: New test.
* gcc.dg/analyzer/malloc-paths-9.c: New test.
* gcc.dg/analyzer/malloc-vs-local-1a.c: New test.
* gcc.dg/analyzer/malloc-vs-local-1b.c: New test.
* gcc.dg/analyzer/malloc-vs-local-2.c: New test.
* gcc.dg/analyzer/malloc-vs-local-3.c: New test.
* gcc.dg/analyzer/malloc-vs-local-4.c: New test.
* gcc.dg/analyzer/operations.c: New test.
* gcc.dg/analyzer/params-2.c: New test.
* gcc.dg/analyzer/params.c: New test.
* gcc.dg/analyzer/paths-1.c: New test.
* gcc.dg/analyzer/paths-1a.c: New test.
* gcc.dg/analyzer/paths-2.c: New test.
* gcc.dg/analyzer/paths-3.c: New test.
* gcc.dg/analyzer/paths-4.c: New test.
* gcc.dg/analyzer/paths-5.c: New test.
* gcc.dg/analyzer/paths-6.c: New test.
* gcc.dg/analyzer/paths-7.c: New test.
* gcc.dg/analyzer/pattern-test-1.c: New test.
* gcc.dg/analyzer/pattern-test-2.c: New test.
* gcc.dg/analyzer/pointer-merging.c: New test.
* gcc.dg/analyzer/pr61861.c: New test.
* gcc.dg/analyzer/pragma-1.c: New test.
* gcc.dg/analyzer/scope-1.c: New test.
* gcc.dg/analyzer/sensitive-1.c: New test.
* gcc.dg/analyzer/setjmp-1.c: New test.
* gcc.dg/analyzer/setjmp-2.c: New test.
* gcc.dg/analyzer/setjmp-3.c: New test.
* gcc.dg/analyzer/setjmp-4.c: New test.
* gcc.dg/analyzer/setjmp-5.c: New test.
* gcc.dg/analyzer/setjmp-6.c: New test.
* gcc.dg/analyzer/setjmp-7.c: New test.
* gcc.dg/analyzer/setjmp-7a.c: New test.
* gcc.dg/analyzer/setjmp-8.c: New test.
* gcc.dg/analyzer/setjmp-9.c: New test.
* gcc.dg/analyzer/signal-1.c: New test.
* gcc.dg/analyzer/signal-2.c: New test.
* gcc.dg/analyzer/signal-3.c: New test.
* gcc.dg/analyzer/signal-4a.c: New test.
* gcc.dg/analyzer/signal-4b.c: New test.
* gcc.dg/analyzer/strcmp-1.c: New test.
* gcc.dg/analyzer/switch.c: New test.
* gcc.dg/analyzer/taint-1.c: New test.
* gcc.dg/analyzer/zlib-1.c: New test.
* gcc.dg/analyzer/zlib-2.c: New test.
* gcc.dg/analyzer/zlib-3.c: New test.
* gcc.dg/analyzer/zlib-4.c: New test.
* gcc.dg/analyzer/zlib-5.c: New test.
* gcc.dg/analyzer/zlib-6.c: New test.
* lib/gcc-defs.exp (dg-check-dot): New procedure.
* lib/target-supports.exp (check_dot_available): New procedure.
(check_effective_target_analyzer): New.
* lib/target-supports-dg.exp (dg-require-dot): New procedure.
Diffstat (limited to 'gcc/doc/analyzer.texi')
-rw-r--r-- | gcc/doc/analyzer.texi | 513 |
1 files changed, 513 insertions, 0 deletions
diff --git a/gcc/doc/analyzer.texi b/gcc/doc/analyzer.texi
new file mode 100644
index 0000000..67efa52
--- /dev/null
+++ b/gcc/doc/analyzer.texi
@@ -0,0 +1,513 @@
+@c Copyright (C) 2019 Free Software Foundation, Inc.
+@c This is part of the GCC manual.
+@c For copying conditions, see the file gcc.texi.
+@c Contributed by David Malcolm <dmalcolm@redhat.com>.
+
+@node Static Analyzer
+@chapter Static Analyzer
+@cindex analyzer
+@cindex static analysis
+@cindex static analyzer
+
+@menu
+* Analyzer Internals::       Analyzer Internals
+* Debugging the Analyzer::   Useful debugging tips
+@end menu
+
+@node Analyzer Internals
+@section Analyzer Internals
+@cindex analyzer, internals
+@cindex static analyzer, internals
+
+@subsection Overview
+
+The analyzer implementation works on the gimple-SSA representation.
+(I chose this in the hopes of making it easy to work with LTO to
+do whole-program analysis).
+
+The implementation is read-only: it doesn't attempt to change anything,
+just emit warnings.
+
+First, we build a @code{supergraph} which combines the callgraph and all
+of the CFGs into a single directed graph, with both interprocedural and
+intraprocedural edges. The nodes and edges in the supergraph are called
+``supernodes'' and ``superedges'', and often referred to in code as
+@code{snodes} and @code{sedges}. Basic blocks in the CFGs are split at
+interprocedural calls, so there can be more than one supernode per
+basic block. Most statements will be in just one supernode, but a call
+statement can appear in two supernodes: at the end of one for the call,
+and again at the start of another for the return.
+
+The supergraph can be seen using @option{-fdump-analyzer-supergraph}.
+
+We then build an @code{analysis_plan} which walks the callgraph to
+determine which calls might be suitable for being summarized (rather
+than fully explored) and thus in what order to explore the functions.
+
+Next is the heart of the analyzer: we use a worklist to explore state
+within the supergraph, building an "exploded graph".
+Nodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
+ "Precise Interprocedural Dataflow Analysis via Graph Reachability"
+ (Thomas Reps, Susan Horwitz and Mooly Sagiv).
+
+We reuse nodes for <point, state> pairs we've already seen, and avoid
+tracking state too closely, so that (hopefully) we rapidly converge
+on a final exploded graph, and terminate the analysis. We also bail
+out if the number of exploded <end-of-basic-block, state> nodes gets
+larger than a particular multiple of the total number of basic blocks
+(to ensure termination in the face of pathological state-explosion
+cases, or bugs). We also stop exploring a point once we hit a limit
+of states for that point.
+
+We can identify problems directly when processing a <point,@w{ }state>
+instance. For example, if we're finding the successors of
+
+@smallexample
+   <point: before-stmt: "free (ptr);",
+    state: @{"ptr": freed@}>
+@end smallexample
+
+then we can detect a double-free of "ptr". We can then emit a path
+to reach the problem by finding the simplest route through the graph.
+
+Program points in the analysis are much more fine-grained than in the
+CFG and supergraph, with points (and thus potentially exploded nodes)
+for various events, including before individual statements.
+By default the exploded graph merges multiple consecutive statements
+in a supernode into one exploded edge to minimize the size of the
+exploded graph. This can be suppressed via
+@option{-fanalyzer-fine-grained}.
+The fine-grained approach seems to make things simpler and more debuggable
+than other approaches I tried, in that each point is responsible for one
+thing.
+
+Program points in the analysis also have a "call string" identifying the
+stack of callsites below them, so that paths in the exploded graph
+correspond to interprocedurally valid paths: we always return to the
+correct call site, propagating state information accordingly.
+We avoid infinite recursion by stopping the analysis if a callsite
+appears more than @code{analyzer-max-recursion-depth} times in a call string
+(defaulting to 2).
+
+@subsection Graphs
+
+Nodes and edges in the exploded graph are called ``exploded nodes'' and
+``exploded edges'' and often referred to in the code as
+@code{enodes} and @code{eedges} (especially when distinguishing them
+from the @code{snodes} and @code{sedges} in the supergraph).
+
+Each graph numbers its nodes, giving unique identifiers - supernodes
+are referred to throughout dumps in the form @samp{SN: @var{index}} and
+exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
+@samp{EN: 29}).
+
+The supergraph can be seen using @option{-fdump-analyzer-supergraph}.
+
+The exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
+and other dump options. Exploded nodes are color-coded in the .dot output
+based on state-machine states to make it easier to see state changes at
+a glance.
+
+@subsection State Tracking
+
+There's a tension between:
+@itemize @bullet
+@item
+precision of analysis in the straight-line case, vs
+@item
+exponential blow-up in the face of control flow.
+@end itemize
+
+For example, in general, given this CFG:
+
+@smallexample
+     A
+    / \
+   B   C
+    \ /
+     D
+    / \
+   E   F
+    \ /
+     G
+@end smallexample
+
+we want to avoid differences in state-tracking in B and C from
+leading to blow-up. If we don't prevent state blowup, we end up
+with exponential growth of the exploded graph like this:
+
+@smallexample
+
+           1:A
+           / \
+          /   \
+         /     \
+      2:B       3:C
+       |         |
+      4:D       5:D        (2 exploded nodes for D)
+      / \       / \
+   6:E  7:F  8:E  9:F
+    |    |    |    |
+  10:G 11:G 12:G 13:G      (4 exploded nodes for G)
+
+@end smallexample
+
+Similar issues arise with loops.
+
+To prevent this, we follow various approaches:
+
+@enumerate a
+@item
+state pruning: which tries to discard state that won't be relevant
+later on within the function.
+This can be disabled via @option{-fno-analyzer-state-purge}.
+
+@item
+state merging. We can try to find the commonality between two
+program_state instances to make a third, simpler program_state.
+We have two strategies here:
+
+  @enumerate
+  @item
+  the worklist keeps new nodes for the same program_point together,
+  and tries to merge them before processing, and thus before they have
+  successors. Hence, in the above, the two nodes for D (4 and 5) reach
+  the front of the worklist together, and we create a node for D with
+  the merger of the incoming states.
+
+  @item
+  try merging with the state of existing enodes for the program_point
+  (which may have already been explored). There will be duplication,
+  but only one set of duplication; subsequent duplicates are more likely
+  to hit the cache. In particular, (hopefully) all merger chains are
+  finite, and so we guarantee termination.
+  This is intended to help with loops: we ought to explore the first
+  iteration, and then have a "subsequent iterations" exploration,
+  which uses a state merged from that of the first, to be more abstract.
+  @end enumerate
+
+We avoid merging pairs of states that have state-machine differences,
+as these are the kinds of differences that are likely to be most
+interesting. So, for example, given:
+
+@smallexample
+  if (condition)
+    ptr = malloc (size);
+  else
+    ptr = local_buf;
+
+  .... do things with 'ptr'
+
+  if (condition)
+    free (ptr);
+
+  ...etc
+@end smallexample
+
+then we end up with an exploded graph that looks like this:
+
+@smallexample
+
+                   if (condition)
+                     / T      \ F
+            ---------          ----------
+           /                             \
+      ptr = malloc (size)             ptr = local_buf
+          |                               |
+      copy of                         copy of
+        "do things with 'ptr'"          "do things with 'ptr'"
+      with ptr: heap-allocated        with ptr: stack-allocated
+          |                               |
+      if (condition)                  if (condition)
+          | known to be T                 | known to be F
+      free (ptr);                         |
+           \                             /
+            -----------------------------
+                         | ('ptr' is pruned, so states can be merged)
+                        etc
+
+@end smallexample
+
+where some duplication has occurred, but only for the places where
+the different paths are worth exploring separately.
+
+Merging can be disabled via @option{-fno-analyzer-state-merge}.
+@end enumerate
+
+@subsection Region Model
+
+Part of the state stored at an @code{exploded_node} is a @code{region_model}.
+This is an implementation of the region-based ternary model described in
+@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf,
+"A Memory Model for Static Analysis of C Programs"}
+(Zhongxing Xu, Ted Kremenek, and Jian Zhang).
+
+A @code{region_model} encapsulates a representation of the state of
+memory, with a tree of @code{region} instances, along with their associated
+values. The representation is graph-like because values can be pointers
+to regions. It also stores a constraint_manager, capturing relationships
+between the values.
+
+Because each node in the @code{exploded_graph} has a @code{region_model},
+and each of the latter is graph-like, the @code{exploded_graph} is in some
+ways a graph of graphs.
+
+Here's an example of printing a @code{region_model}, showing the ASCII-art
+used to visualize the region hierarchy (colorized when printing to stderr):
+
+@smallexample
+(gdb) call debug (*this)
+r0: @{kind: 'root', parent: null, sval: null@}
+|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@}
+|  |: sval: sv1: @{poisoned: uninit@}
+|  |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@}
+|  |  `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@}
+|  |    |: sval: sv3: @{type: 'void *', unknown@}
+|  |    |: type: 'void *'
+|  `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@}
+|    |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
+|    |  |: sval: sv4: @{type: 'void *', &r6@}
+|    |  |: type: 'void *'
+|    |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
+|    |  |: sval: sv4: @{type: 'void *', &r6@}
+|    |  |: type: 'void *'
+|    `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
+|      |: sval: sv4: @{type: 'void *', &r6@}
+|      |: type: 'void *'
+`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@}
+  |: sval: sv2: @{poisoned: uninit@}
+  `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@}
+svalues:
+  sv0: @{type: 'size_t', '1024'@}
+  sv1: @{poisoned: uninit@}
+  sv2: @{poisoned: uninit@}
+  sv3: @{type: 'void *', unknown@}
+  sv4: @{type: 'void *', &r6@}
+constraint manager:
+  equiv classes:
+    ec0: @{sv0 == '1024'@}
+    ec1: @{sv4@}
+  constraints:
+@end smallexample
+
+This is the state at the point of returning from @code{calls_malloc} back
+to @code{test} in the following:
+
+@smallexample
+void *
+calls_malloc (void)
+@{
+  void *result = malloc (1024);
+  return result;
+@}
+
+void test (void)
+@{
+  void *ptr = calls_malloc ();
+  /* etc. */
+@}
+@end smallexample
+
+The ``root'' region (``r0'') has a ``stack'' child (``r1''), with two
+children: a frame for @code{test} (``r2''), and a frame for
+@code{calls_malloc} (``r4''). These frame regions have child regions for
+storing their local variables. For example, the return region
+and that of various other regions within the ``calls_malloc'' frame all have
+value ``sv4'', a pointer to a heap-allocated region ``r6''. Within the parent
+frame, @code{ptr_3} has value ``sv3'', an unknown @code{void *}.
+
+@subsection Analyzer Paths
+
+We need to explain to the user what the problem is, and to persuade them
+that there really is a problem. Hence having a @code{diagnostic_path}
+isn't just an incidental detail of the analyzer; it's required.
+
+Paths ought to be:
+@itemize @bullet
+@item
+interprocedurally-valid
+@item
+feasible
+@end itemize
+
+Without state-merging, all paths in the exploded graph are feasible
+(in terms of constraints being satisfied).
+With state-merging, paths in the exploded graph can be infeasible.
+
+We collate warnings and only emit them for the simplest path
+e.g. for a bug in a utility function, with lots of routes to calling it,
+we only emit the simplest path (which could be intraprocedural, if
+it can be reproduced without a caller). We apply a check that
+each duplicate warning's shortest path is feasible, rejecting any
+warnings for which the shortest path is infeasible (which could lead to
+false negatives).
+
+We use the shortest feasible @code{exploded_path} through the
+@code{exploded_graph} (a list of @code{exploded_edge *}) to build a
+@code{diagnostic_path} (a list of events for the diagnostic subsystem) -
+specifically a @code{checker_path}.
+
+Having built the @code{checker_path}, we prune it to try to eliminate
+events that aren't relevant, to minimize how much the user has to read.
+
+After pruning, we notify each event in the path of its ID and record the
+IDs of interesting events, allowing for events to refer to other events
+in their descriptions. The @code{pending_diagnostic} class has various
+vfuncs to support emitting more precise descriptions, so that e.g.
+
+@itemize @bullet
+@item
+a deref-of-unchecked-malloc diagnostic might use:
+@smallexample
+  returning possibly-NULL pointer to 'make_obj' from 'allocator'
+@end smallexample
+for a @code{return_event} to make it clearer how the unchecked value moves
+from callee back to caller
+@item
+a double-free diagnostic might use:
+@smallexample
+  second 'free' here; first 'free' was at (3)
+@end smallexample
+and a use-after-free might use
+@smallexample
+  use after 'free' here; memory was freed at (2)
+@end smallexample
+@end itemize
+
+At this point we can emit the diagnostic.
+
+@subsection Limitations
+
+@itemize @bullet
+@item
+Only for C so far
+@item
+The implementation of call summaries is currently very simplistic.
+@item
+Lack of function pointer analysis
+@item
+The region model code creates lots of little mutable objects at each
+@code{region_model} (and thus per @code{exploded_node}) rather than
+sharing immutable objects and having the mutable state in the
+@code{program_state} or @code{region_model}. The latter approach might be
+more efficient, and might avoid dealing with IDs rather than pointers
+(which requires us to impose an ordering to get meaningful equality).
+@item
+The region model code doesn't yet support @code{memcpy}. At the
+gimple-ssa level these have been optimized to statements like this:
+@smallexample
+_10 = MEM <long unsigned int> [(char * @{ref-all@})&c]
+MEM <long unsigned int> [(char * @{ref-all@})&d] = _10;
+@end smallexample
+Perhaps they could be supported via a new @code{compound_svalue} type.
+@item
+There are various other limitations in the region model (grep for TODO/xfail
+in the testsuite).
+@item
+The constraint_manager's implementation of transitivity is currently too
+expensive to enable by default and so must be manually enabled via
+@option{-fanalyzer-transitivity}.
+@item
+The checkers are currently hardcoded and don't allow for user extensibility
+(e.g. adding allocate/release pairs).
+@item
+Although the analyzer's test suite has a proof-of-concept test case for
+LTO, LTO support hasn't had extensive testing. There are various
+lang-specific things in the analyzer that assume C rather than LTO.
+For example, SSA names are printed to the user in ``raw'' form, rather
+than printing the underlying variable name.
+@end itemize
+
+Some ideas for other checkers
+@itemize @bullet
+@item
+File-descriptor-based APIs
+@item
+Linux kernel internal APIs
+@item
+Signal handling
+@end itemize
+
+@node Debugging the Analyzer
+@section Debugging the Analyzer
+@cindex analyzer, debugging
+@cindex static analyzer, debugging
+
+@subsection Special Functions for Debugging the Analyzer
+
+The analyzer recognizes various special functions by name, for use
+in debugging the analyzer. Declarations can be seen in the testsuite
+in @file{analyzer-decls.h}. None of these functions are actually
+implemented.
+
+Add:
+@smallexample
+  __analyzer_break ();
+@end smallexample
+to the source being analyzed to trigger a breakpoint in the analyzer when
+that source is reached. By putting a series of these in the source, it's
+much easier to effectively step through the program state as it's analyzed.
+
+@smallexample
+__analyzer_dump ();
+@end smallexample
+
+will dump copious information about the analyzer's state each time it
+reaches the call in its traversal of the source.
+
+@smallexample
+__analyzer_dump_path ();
+@end smallexample
+
+will emit a placeholder ``note'' diagnostic with a path to that call site,
+if the analyzer finds a feasible path to it.
+
+The builtin @code{__analyzer_dump_exploded_nodes} will dump information
+after analysis on all of the exploded nodes at that program point:
+
+@smallexample
+  __analyzer_dump_exploded_nodes (0);
+@end smallexample
+
+will dump just the number of nodes, and their IDs.
+
+@smallexample
+  __analyzer_dump_exploded_nodes (1);
+@end smallexample
+
+will also dump all of the states within those nodes.
+
+@smallexample
+  __analyzer_dump_region_model ();
+@end smallexample
+will dump the region_model's state to stderr.
+
+@smallexample
+__analyzer_eval (expr);
+@end smallexample
+will emit a warning with text "TRUE", "FALSE" or "UNKNOWN" based on the
+truthfulness of the argument. This is useful for writing DejaGnu tests.
+
+
+@subsection Other Debugging Techniques
+
+One approach when tracking down where a particular bogus state is
+introduced into the @code{exploded_graph} is to add custom code to
+@code{region_model::validate}.
+
+For example, this custom code (added to @code{region_model::validate})
+breaks with an assertion failure when a variable called @code{ptr}
+acquires a value that's unknown, using
+@code{region_model::get_value_by_name} to locate the variable
+
+@smallexample
+    /* Find a variable matching "ptr". */
+    svalue_id sid = get_value_by_name ("ptr");
+    if (!sid.null_p ())
+      @{
+        svalue *sval = get_svalue (sid);
+        gcc_assert (sval->get_kind () != SK_UNKNOWN);
+      @}
+@end smallexample
+
+making it easier to investigate further in a debugger when this occurs.
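
As a rough illustration of how the special debugging functions documented in the new file might appear in a source file under analysis, here is a minimal C sketch. The prototypes are assumptions inferred from the calls shown in the documentation (the authoritative declarations are in the testsuite's analyzer-decls.h), the function clamp_to_byte is hypothetical, and the no-op stub definitions are an addition made only so that the file also links when built as an ordinary program; the documentation states that none of these functions are actually implemented.

```c
/* Assumed prototypes, inferred from the documented calls; the real
   declarations live in the testsuite's analyzer-decls.h.  */
void __analyzer_eval (int expr);
void __analyzer_dump_path (void);

/* Hypothetical no-op stubs so this sketch links without the analyzer.
   In an actual analyzer test these would be left undefined.  */
void __analyzer_eval (int expr) { (void) expr; }
void __analyzer_dump_path (void) { }

/* Hypothetical function under analysis: clamp X into [0, 255].  */
int
clamp_to_byte (int x)
{
  if (x < 0)
    {
      /* Request a "note" with a path to this call site, if feasible.  */
      __analyzer_dump_path ();
      return 0;
    }
  if (x > 255)
    return 255;

  /* Ask what is known about X on the path that reaches this point.  */
  __analyzer_eval (x >= 0);
  __analyzer_eval (x <= 255);
  return x;
}
```

When such a file is compiled with -fanalyzer, each __analyzer_eval call would be expected to produce a warning with one of the three documented texts, which a DejaGnu test can then match; the exact result for each call depends on what constraints the analyzer tracks along that path.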