aboutsummaryrefslogtreecommitdiff
path: root/gcc/doc/analyzer.texi
diff options
context:
space:
mode:
authorDavid Malcolm <dmalcolm@redhat.com>2019-09-27 09:23:16 -0400
committerDavid Malcolm <dmalcolm@redhat.com>2020-01-14 15:34:24 -0500
commit757bf1dff5e8cee34c0a75d06140ca972bfecfa7 (patch)
treecca8a96a39f87c90df46a389d1777854f97017d3 /gcc/doc/analyzer.texi
parent08c8c973c082457a7d6192673e87475f1fdfdbef (diff)
downloadgcc-757bf1dff5e8cee34c0a75d06140ca972bfecfa7.zip
gcc-757bf1dff5e8cee34c0a75d06140ca972bfecfa7.tar.gz
gcc-757bf1dff5e8cee34c0a75d06140ca972bfecfa7.tar.bz2
Initial commit of analyzer
This patch adds a static analysis pass to the middle-end, focusing for this release on C code, and malloc/free issues in particular. See: https://gcc.gnu.org/wiki/DavidMalcolm/StaticAnalyzer gcc/ChangeLog: * Makefile.in (lang_opt_files): Add analyzer.opt. (ANALYZER_OBJS): New. (OBJS): Add digraph.o, graphviz.o, ordered-hash-map-tests.o, tristate.o and ANALYZER_OBJS. (TEXI_GCCINT_FILES): Add analyzer.texi. * common.opt (-fanalyzer): New driver option. * config.in: Regenerate. * configure: Regenerate. * configure.ac (--disable-analyzer, ENABLE_ANALYZER): New option. (gccdepdir): Also create depdir for "analyzer" subdir. * digraph.cc: New file. * digraph.h: New file. * doc/analyzer.texi: New file. * doc/gccint.texi ("Static Analyzer") New menu item. (analyzer.texi): Include it. * doc/invoke.texi ("Static Analyzer Options"): New list and new section. ("Warning Options"): Add static analysis warnings to the list. (-Wno-analyzer-double-fclose): New option. (-Wno-analyzer-double-free): New option. (-Wno-analyzer-exposure-through-output-file): New option. (-Wno-analyzer-file-leak): New option. (-Wno-analyzer-free-of-non-heap): New option. (-Wno-analyzer-malloc-leak): New option. (-Wno-analyzer-possible-null-argument): New option. (-Wno-analyzer-possible-null-dereference): New option. (-Wno-analyzer-null-argument): New option. (-Wno-analyzer-null-dereference): New option. (-Wno-analyzer-stale-setjmp-buffer): New option. (-Wno-analyzer-tainted-array-index): New option. (-Wno-analyzer-use-after-free): New option. (-Wno-analyzer-use-of-pointer-in-stale-stack-frame): New option. (-Wno-analyzer-use-of-uninitialized-value): New option. (-Wanalyzer-too-complex): New option. (-fanalyzer-call-summaries): New warning. (-fanalyzer-checker=): New warning. (-fanalyzer-fine-grained): New warning. (-fno-analyzer-state-merge): New warning. (-fno-analyzer-state-purge): New warning. (-fanalyzer-transitivity): New warning. (-fanalyzer-verbose-edges): New warning. (-fanalyzer-verbose-state-changes): New warning. (-fanalyzer-verbosity=): New warning. (-fdump-analyzer): New warning. (-fdump-analyzer-callgraph): New warning. (-fdump-analyzer-exploded-graph): New warning. (-fdump-analyzer-exploded-nodes): New warning. (-fdump-analyzer-exploded-nodes-2): New warning. (-fdump-analyzer-exploded-nodes-3): New warning. (-fdump-analyzer-supergraph): New warning. * doc/sourcebuild.texi (dg-require-dot): New. (dg-check-dot): New. * gdbinit.in (break-on-saved-diagnostic): New command. * graphviz.cc: New file. * graphviz.h: New file. * ordered-hash-map-tests.cc: New file. * ordered-hash-map.h: New file. * passes.def (pass_analyzer): Add before pass_ipa_whole_program_visibility. * selftest-run-tests.c (selftest::run_tests): Call selftest::ordered_hash_map_tests_cc_tests. * selftest.h (selftest::ordered_hash_map_tests_cc_tests): New decl. * shortest-paths.h: New file. * timevar.def (TV_ANALYZER): New timevar. (TV_ANALYZER_SUPERGRAPH): Likewise. (TV_ANALYZER_STATE_PURGE): Likewise. (TV_ANALYZER_PLAN): Likewise. (TV_ANALYZER_SCC): Likewise. (TV_ANALYZER_WORKLIST): Likewise. (TV_ANALYZER_DUMP): Likewise. (TV_ANALYZER_DIAGNOSTICS): Likewise. (TV_ANALYZER_SHORTEST_PATHS): Likewise. * tree-pass.h (make_pass_analyzer): New decl. * tristate.cc: New file. * tristate.h: New file. gcc/analyzer/ChangeLog: * ChangeLog: New file. * analyzer-selftests.cc: New file. * analyzer-selftests.h: New file. * analyzer.opt: New file. * analysis-plan.cc: New file. * analysis-plan.h: New file. * analyzer-logging.cc: New file. * analyzer-logging.h: New file. * analyzer-pass.cc: New file. * analyzer.cc: New file. * analyzer.h: New file. * call-string.cc: New file. * call-string.h: New file. * checker-path.cc: New file. * checker-path.h: New file. * constraint-manager.cc: New file. * constraint-manager.h: New file. * diagnostic-manager.cc: New file. * diagnostic-manager.h: New file. * engine.cc: New file. * engine.h: New file. * exploded-graph.h: New file. * pending-diagnostic.cc: New file. * pending-diagnostic.h: New file. * program-point.cc: New file. * program-point.h: New file. * program-state.cc: New file. * program-state.h: New file. * region-model.cc: New file. * region-model.h: New file. * sm-file.cc: New file. * sm-malloc.cc: New file. * sm-malloc.dot: New file. * sm-pattern-test.cc: New file. * sm-sensitive.cc: New file. * sm-signal.cc: New file. * sm-taint.cc: New file. * sm.cc: New file. * sm.h: New file. * state-purge.cc: New file. * state-purge.h: New file. * supergraph.cc: New file. * supergraph.h: New file. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/CVE-2005-1689-minimal.c: New test. * gcc.dg/analyzer/abort.c: New test. * gcc.dg/analyzer/alloca-leak.c: New test. * gcc.dg/analyzer/analyzer-decls.h: New header. * gcc.dg/analyzer/analyzer-verbosity-0.c: New test. * gcc.dg/analyzer/analyzer-verbosity-1.c: New test. * gcc.dg/analyzer/analyzer-verbosity-2.c: New test. * gcc.dg/analyzer/analyzer.exp: New suite. * gcc.dg/analyzer/attribute-nonnull.c: New test. * gcc.dg/analyzer/call-summaries-1.c: New test. * gcc.dg/analyzer/conditionals-2.c: New test. * gcc.dg/analyzer/conditionals-3.c: New test. * gcc.dg/analyzer/conditionals-notrans.c: New test. * gcc.dg/analyzer/conditionals-trans.c: New test. * gcc.dg/analyzer/data-model-1.c: New test. * gcc.dg/analyzer/data-model-2.c: New test. * gcc.dg/analyzer/data-model-3.c: New test. * gcc.dg/analyzer/data-model-4.c: New test. * gcc.dg/analyzer/data-model-5.c: New test. * gcc.dg/analyzer/data-model-5b.c: New test. * gcc.dg/analyzer/data-model-5c.c: New test. * gcc.dg/analyzer/data-model-5d.c: New test. * gcc.dg/analyzer/data-model-6.c: New test. * gcc.dg/analyzer/data-model-7.c: New test. * gcc.dg/analyzer/data-model-8.c: New test. * gcc.dg/analyzer/data-model-9.c: New test. * gcc.dg/analyzer/data-model-11.c: New test. * gcc.dg/analyzer/data-model-12.c: New test. * gcc.dg/analyzer/data-model-13.c: New test. * gcc.dg/analyzer/data-model-14.c: New test. * gcc.dg/analyzer/data-model-15.c: New test. * gcc.dg/analyzer/data-model-16.c: New test. * gcc.dg/analyzer/data-model-17.c: New test. * gcc.dg/analyzer/data-model-18.c: New test. * gcc.dg/analyzer/data-model-19.c: New test. * gcc.dg/analyzer/data-model-path-1.c: New test. * gcc.dg/analyzer/disabling.c: New test. * gcc.dg/analyzer/dot-output.c: New test. * gcc.dg/analyzer/double-free-lto-1-a.c: New test. * gcc.dg/analyzer/double-free-lto-1-b.c: New test. * gcc.dg/analyzer/double-free-lto-1.h: New header. * gcc.dg/analyzer/equivalence.c: New test. * gcc.dg/analyzer/explode-1.c: New test. * gcc.dg/analyzer/explode-2.c: New test. * gcc.dg/analyzer/factorial.c: New test. * gcc.dg/analyzer/fibonacci.c: New test. * gcc.dg/analyzer/fields.c: New test. * gcc.dg/analyzer/file-1.c: New test. * gcc.dg/analyzer/file-2.c: New test. * gcc.dg/analyzer/function-ptr-1.c: New test. * gcc.dg/analyzer/function-ptr-2.c: New test. * gcc.dg/analyzer/function-ptr-3.c: New test. * gcc.dg/analyzer/gzio-2.c: New test. * gcc.dg/analyzer/gzio-3.c: New test. * gcc.dg/analyzer/gzio-3a.c: New test. * gcc.dg/analyzer/gzio.c: New test. * gcc.dg/analyzer/infinite-recursion.c: New test. * gcc.dg/analyzer/loop-2.c: New test. * gcc.dg/analyzer/loop-2a.c: New test. * gcc.dg/analyzer/loop-3.c: New test. * gcc.dg/analyzer/loop-4.c: New test. * gcc.dg/analyzer/loop.c: New test. * gcc.dg/analyzer/malloc-1.c: New test. * gcc.dg/analyzer/malloc-2.c: New test. * gcc.dg/analyzer/malloc-3.c: New test. * gcc.dg/analyzer/malloc-callbacks.c: New test. * gcc.dg/analyzer/malloc-dce.c: New test. * gcc.dg/analyzer/malloc-dedupe-1.c: New test. * gcc.dg/analyzer/malloc-ipa-1.c: New test. * gcc.dg/analyzer/malloc-ipa-10.c: New test. * gcc.dg/analyzer/malloc-ipa-11.c: New test. * gcc.dg/analyzer/malloc-ipa-12.c: New test. * gcc.dg/analyzer/malloc-ipa-13.c: New test. * gcc.dg/analyzer/malloc-ipa-2.c: New test. * gcc.dg/analyzer/malloc-ipa-3.c: New test. * gcc.dg/analyzer/malloc-ipa-4.c: New test. * gcc.dg/analyzer/malloc-ipa-5.c: New test. * gcc.dg/analyzer/malloc-ipa-6.c: New test. * gcc.dg/analyzer/malloc-ipa-7.c: New test. * gcc.dg/analyzer/malloc-ipa-8-double-free.c: New test. * gcc.dg/analyzer/malloc-ipa-8-lto-a.c: New test. * gcc.dg/analyzer/malloc-ipa-8-lto-b.c: New test. * gcc.dg/analyzer/malloc-ipa-8-lto-c.c: New test. * gcc.dg/analyzer/malloc-ipa-8-lto.h: New test. * gcc.dg/analyzer/malloc-ipa-8-unchecked.c: New test. * gcc.dg/analyzer/malloc-ipa-9.c: New test. * gcc.dg/analyzer/malloc-macro-inline-events.c: New test. * gcc.dg/analyzer/malloc-macro-separate-events.c: New test. * gcc.dg/analyzer/malloc-macro.h: New header. * gcc.dg/analyzer/malloc-many-paths-1.c: New test. * gcc.dg/analyzer/malloc-many-paths-2.c: New test. * gcc.dg/analyzer/malloc-many-paths-3.c: New test. * gcc.dg/analyzer/malloc-paths-1.c: New test. * gcc.dg/analyzer/malloc-paths-10.c: New test. * gcc.dg/analyzer/malloc-paths-2.c: New test. * gcc.dg/analyzer/malloc-paths-3.c: New test. * gcc.dg/analyzer/malloc-paths-4.c: New test. * gcc.dg/analyzer/malloc-paths-5.c: New test. * gcc.dg/analyzer/malloc-paths-6.c: New test. * gcc.dg/analyzer/malloc-paths-7.c: New test. * gcc.dg/analyzer/malloc-paths-8.c: New test. * gcc.dg/analyzer/malloc-paths-9.c: New test. * gcc.dg/analyzer/malloc-vs-local-1a.c: New test. * gcc.dg/analyzer/malloc-vs-local-1b.c: New test. * gcc.dg/analyzer/malloc-vs-local-2.c: New test. * gcc.dg/analyzer/malloc-vs-local-3.c: New test. * gcc.dg/analyzer/malloc-vs-local-4.c: New test. * gcc.dg/analyzer/operations.c: New test. * gcc.dg/analyzer/params-2.c: New test. * gcc.dg/analyzer/params.c: New test. * gcc.dg/analyzer/paths-1.c: New test. * gcc.dg/analyzer/paths-1a.c: New test. * gcc.dg/analyzer/paths-2.c: New test. * gcc.dg/analyzer/paths-3.c: New test. * gcc.dg/analyzer/paths-4.c: New test. * gcc.dg/analyzer/paths-5.c: New test. * gcc.dg/analyzer/paths-6.c: New test. * gcc.dg/analyzer/paths-7.c: New test. * gcc.dg/analyzer/pattern-test-1.c: New test. * gcc.dg/analyzer/pattern-test-2.c: New test. * gcc.dg/analyzer/pointer-merging.c: New test. * gcc.dg/analyzer/pr61861.c: New test. * gcc.dg/analyzer/pragma-1.c: New test. * gcc.dg/analyzer/scope-1.c: New test. * gcc.dg/analyzer/sensitive-1.c: New test. * gcc.dg/analyzer/setjmp-1.c: New test. * gcc.dg/analyzer/setjmp-2.c: New test. * gcc.dg/analyzer/setjmp-3.c: New test. * gcc.dg/analyzer/setjmp-4.c: New test. * gcc.dg/analyzer/setjmp-5.c: New test. * gcc.dg/analyzer/setjmp-6.c: New test. * gcc.dg/analyzer/setjmp-7.c: New test. * gcc.dg/analyzer/setjmp-7a.c: New test. * gcc.dg/analyzer/setjmp-8.c: New test. * gcc.dg/analyzer/setjmp-9.c: New test. * gcc.dg/analyzer/signal-1.c: New test. * gcc.dg/analyzer/signal-2.c: New test. * gcc.dg/analyzer/signal-3.c: New test. * gcc.dg/analyzer/signal-4a.c: New test. * gcc.dg/analyzer/signal-4b.c: New test. * gcc.dg/analyzer/strcmp-1.c: New test. * gcc.dg/analyzer/switch.c: New test. * gcc.dg/analyzer/taint-1.c: New test. * gcc.dg/analyzer/zlib-1.c: New test. * gcc.dg/analyzer/zlib-2.c: New test. * gcc.dg/analyzer/zlib-3.c: New test. * gcc.dg/analyzer/zlib-4.c: New test. * gcc.dg/analyzer/zlib-5.c: New test. * gcc.dg/analyzer/zlib-6.c: New test. * lib/gcc-defs.exp (dg-check-dot): New procedure. * lib/target-supports.exp (check_dot_available): New procedure. (check_effective_target_analyzer): New. * lib/target-supports-dg.exp (dg-require-dot): New procedure.
Diffstat (limited to 'gcc/doc/analyzer.texi')
-rw-r--r--gcc/doc/analyzer.texi513
1 files changed, 513 insertions, 0 deletions
diff --git a/gcc/doc/analyzer.texi b/gcc/doc/analyzer.texi
new file mode 100644
index 0000000..67efa52
--- /dev/null
+++ b/gcc/doc/analyzer.texi
@@ -0,0 +1,513 @@
+@c Copyright (C) 2019 Free Software Foundation, Inc.
+@c This is part of the GCC manual.
+@c For copying conditions, see the file gcc.texi.
+@c Contributed by David Malcolm <dmalcolm@redhat.com>.
+
+@node Static Analyzer
+@chapter Static Analyzer
+@cindex analyzer
+@cindex static analysis
+@cindex static analyzer
+
+@menu
+* Analyzer Internals:: Analyzer Internals
+* Debugging the Analyzer:: Useful debugging tips
+@end menu
+
+@node Analyzer Internals
+@section Analyzer Internals
+@cindex analyzer, internals
+@cindex static analyzer, internals
+
+@subsection Overview
+
+The analyzer implementation works on the gimple-SSA representation.
+(I chose this in the hopes of making it easy to work with LTO to
+do whole-program analysis).
+
+The implementation is read-only: it doesn't attempt to change anything,
+just emit warnings.
+
+First, we build a @code{supergraph} which combines the callgraph and all
+of the CFGs into a single directed graph, with both interprocedural and
+intraprocedural edges. The nodes and edges in the supergraph are called
+``supernodes'' and ``superedges'', and often referred to in code as
+@code{snodes} and @code{sedges}. Basic blocks in the CFGs are split at
+interprocedural calls, so there can be more than one supernode per
+basic block. Most statements will be in just one supernode, but a call
+statement can appear in two supernodes: at the end of one for the call,
+and again at the start of another for the return.
+
+The supergraph can be seen using @option{-fdump-analyzer-supergraph}.
+
+We then build an @code{analysis_plan} which walks the callgraph to
+determine which calls might be suitable for being summarized (rather
+than fully explored) and thus in what order to explore the functions.
+
+Next is the heart of the analyzer: we use a worklist to explore state
+within the supergraph, building an "exploded graph".
+Nodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
+ "Precise Interprocedural Dataflow Analysis via Graph Reachability"
+ (Thomas Reps, Susan Horwitz and Mooly Sagiv).
+
+We reuse nodes for <point, state> pairs we've already seen, and avoid
+tracking state too closely, so that (hopefully) we rapidly converge
+on a final exploded graph, and terminate the analysis. We also bail
+out if the number of exploded <end-of-basic-block, state> nodes gets
+larger than a particular multiple of the total number of basic blocks
+(to ensure termination in the face of pathological state-explosion
+cases, or bugs). We also stop exploring a point once we hit a limit
+of states for that point.
+
+We can identify problems directly when processing a <point,@w{ }state>
+instance. For example, if we're finding the successors of
+
+@smallexample
+ <point: before-stmt: "free (ptr);",
+ state: @{"ptr": freed@}>
+@end smallexample
+
+then we can detect a double-free of "ptr". We can then emit a path
+to reach the problem by finding the simplest route through the graph.
+
+Program points in the analysis are much more fine-grained than in the
+CFG and supergraph, with points (and thus potentially exploded nodes)
+for various events, including before individual statements.
+By default the exploded graph merges multiple consecutive statements
+in a supernode into one exploded edge to minimize the size of the
+exploded graph. This can be suppressed via
+@option{-fanalyzer-fine-grained}.
+The fine-grained approach seems to make things simpler and more debuggable
+that other approaches I tried, in that each point is responsible for one
+thing.
+
+Program points in the analysis also have a "call string" identifying the
+stack of callsites below them, so that paths in the exploded graph
+correspond to interprocedurally valid paths: we always return to the
+correct call site, propagating state information accordingly.
+We avoid infinite recursion by stopping the analysis if a callsite
+appears more than @code{analyzer-max-recursion-depth} in a callstring
+(defaulting to 2).
+
+@subsection Graphs
+
+Nodes and edges in the exploded graph are called ``exploded nodes'' and
+``exploded edges'' and often referred to in the code as
+@code{enodes} and @code{eedges} (especially when distinguishing them
+from the @code{snodes} and @code{sedges} in the supergraph).
+
+Each graph numbers its nodes, giving unique identifiers - supernodes
+are referred to throughout dumps in the form @samp{SN': @var{index}} and
+exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
+@samp{EN:29}).
+
+The supergraph can be seen using @option{-fdump-analyzer-supergraph-graph}.
+
+The exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
+and other dump options. Exploded nodes are color-coded in the .dot output
+based on state-machine states to make it easier to see state changes at
+a glance.
+
+@subsection State Tracking
+
+There's a tension between:
+@itemize @bullet
+@item
+precision of analysis in the straight-line case, vs
+@item
+exponential blow-up in the face of control flow.
+@end itemize
+
+For example, in general, given this CFG:
+
+@smallexample
+ A
+ / \
+ B C
+ \ /
+ D
+ / \
+ E F
+ \ /
+ G
+@end smallexample
+
+we want to avoid differences in state-tracking in B and C from
+leading to blow-up. If we don't prevent state blowup, we end up
+with exponential growth of the exploded graph like this:
+
+@smallexample
+
+ 1:A
+ / \
+ / \
+ / \
+ 2:B 3:C
+ | |
+ 4:D 5:D (2 exploded nodes for D)
+ / \ / \
+ 6:E 7:F 8:E 9:F
+ | | | |
+ 10:G 11:G 12:G 13:G (4 exploded nodes for G)
+
+@end smallexample
+
+Similar issues arise with loops.
+
+To prevent this, we follow various approaches:
+
+@enumerate a
+@item
+state pruning: which tries to discard state that won't be relevant
+later on withing the function.
+This can be disabled via @option{-fno-analyzer-state-purge}.
+
+@item
+state merging. We can try to find the commonality between two
+program_state instances to make a third, simpler program_state.
+We have two strategies here:
+
+ @enumerate
+ @item
+ the worklist keeps new nodes for the same program_point together,
+ and tries to merge them before processing, and thus before they have
+ successors. Hence, in the above, the two nodes for D (4 and 5) reach
+ the front of the worklist together, and we create a node for D with
+ the merger of the incoming states.
+
+ @item
+ try merging with the state of existing enodes for the program_point
+ (which may have already been explored). There will be duplication,
+ but only one set of duplication; subsequent duplicates are more likely
+ to hit the cache. In particular, (hopefully) all merger chains are
+ finite, and so we guarantee termination.
+ This is intended to help with loops: we ought to explore the first
+ iteration, and then have a "subsequent iterations" exploration,
+ which uses a state merged from that of the first, to be more abstract.
+ @end enumerate
+
+We avoid merging pairs of states that have state-machine differences,
+as these are the kinds of differences that are likely to be most
+interesting. So, for example, given:
+
+@smallexample
+ if (condition)
+ ptr = malloc (size);
+ else
+ ptr = local_buf;
+
+ .... do things with 'ptr'
+
+ if (condition)
+ free (ptr);
+
+ ...etc
+@end smallexample
+
+then we end up with an exploded graph that looks like this:
+
+@smallexample
+
+ if (condition)
+ / T \ F
+ --------- ----------
+ / \
+ ptr = malloc (size) ptr = local_buf
+ | |
+ copy of copy of
+ "do things with 'ptr'" "do things with 'ptr'"
+ with ptr: heap-allocated with ptr: stack-allocated
+ | |
+ if (condition) if (condition)
+ | known to be T | known to be F
+ free (ptr); |
+ \ /
+ -----------------------------
+ | ('ptr' is pruned, so states can be merged)
+ etc
+
+@end smallexample
+
+where some duplication has occurred, but only for the places where the
+the different paths are worth exploringly separately.
+
+Merging can be disabled via @option{-fno-analyzer-state-merge}.
+@end enumerate
+
+@subsection Region Model
+
+Part of the state stored at a @code{exploded_node} is a @code{region_model}.
+This is an implementation of the region-based ternary model described in
+@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf,
+"A Memory Model for Static Analysis of C Programs"}
+(Zhongxing Xu, Ted Kremenek, and Jian Zhang).
+
+A @code{region_model} encapsulates a representation of the state of
+memory, with a tree of @code{region} instances, along with their associated
+values. The representation is graph-like because values can be pointers
+to regions. It also stores a constraint_manager, capturing relationships
+between the values.
+
+Because each node in the @code{exploded_graph} has a @code{region_model},
+and each of the latter is graph-like, the @code{exploded_graph} is in some
+ways a graph of graphs.
+
+Here's an example of printing a @code{region_model}, showing the ASCII-art
+used to visualize the region hierarchy (colorized when printing to stderr):
+
+@smallexample
+(gdb) call debug (*this)
+r0: @{kind: 'root', parent: null, sval: null@}
+|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@}
+| |: sval: sv1: @{poisoned: uninit@}
+| |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@}
+| | `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@}
+| | |: sval: sv3: @{type: 'void *', unknown@}
+| | |: type: 'void *'
+| `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@}
+| |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
+| | |: sval: sv4: @{type: 'void *', &r6@}
+| | |: type: 'void *'
+| |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
+| | |: sval: sv4: @{type: 'void *', &r6@}
+| | |: type: 'void *'
+| `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
+| |: sval: sv4: @{type: 'void *', &r6@}
+| |: type: 'void *'
+`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@}
+ |: sval: sv2: @{poisoned: uninit@}
+ `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@}
+svalues:
+ sv0: @{type: 'size_t', '1024'@}
+ sv1: @{poisoned: uninit@}
+ sv2: @{poisoned: uninit@}
+ sv3: @{type: 'void *', unknown@}
+ sv4: @{type: 'void *', &r6@}
+constraint manager:
+ equiv classes:
+ ec0: @{sv0 == '1024'@}
+ ec1: @{sv4@}
+ constraints:
+@end smallexample
+
+This is the state at the point of returning from @code{calls_malloc} back
+to @code{test} in the following:
+
+@smallexample
+void *
+calls_malloc (void)
+@{
+ void *result = malloc (1024);
+ return result;
+@}
+
+void test (void)
+@{
+ void *ptr = calls_malloc ();
+ /* etc. */
+@}
+@end smallexample
+
+The ``root'' region (``r0'') has a ``stack'' child (``r1''), with two
+children: a frame for @code{test} (``r2''), and a frame for
+@code{calls_malloc} (``r4''). These frame regions have child regions for
+storing their local variables. For example, the return region
+and that of various other regions within the ``calls_malloc'' frame all have
+value ``sv4'', a pointer to a heap-allocated region ``r6''. Within the parent
+frame, @code{ptr_3} has value ``sv3'', an unknown @code{void *}.
+
+@subsection Analyzer Paths
+
+We need to explain to the user what the problem is, and to persuade them
+that there really is a problem. Hence having a @code{diagnostic_path}
+isn't just an incidental detail of the analyzer; it's required.
+
+Paths ought to be:
+@itemize @bullet
+@item
+interprocedurally-valid
+@item
+feasible
+@end itemize
+
+Without state-merging, all paths in the exploded graph are feasible
+(in terms of constraints being satisified).
+With state-merging, paths in the exploded graph can be infeasible.
+
+We collate warnings and only emit them for the simplest path
+e.g. for a bug in a utility function, with lots of routes to calling it,
+we only emit the simplest path (which could be intraprocedural, if
+it can be reproduced without a caller). We apply a check that
+each duplicate warning's shortest path is feasible, rejecting any
+warnings for which the shortest path is infeasible (which could lead to
+false negatives).
+
+We use the shortest feasible @code{exploded_path} through the
+@code{exploded_graph} (a list of @code{exploded_edge *}) to build a
+@code{diagnostic_path} (a list of events for the diagnostic subsystem) -
+specifically a @code{checker_path}.
+
+Having built the @code{checker_path}, we prune it to try to eliminate
+events that aren't relevant, to minimize how much the user has to read.
+
+After pruning, we notify each event in the path of its ID and record the
+IDs of interesting events, allowing for events to refer to other events
+in their descriptions. The @code{pending_diagnostic} class has various
+vfuncs to support emitting more precise descriptions, so that e.g.
+
+@itemize @bullet
+@item
+a deref-of-unchecked-malloc diagnostic might use:
+@smallexample
+ returning possibly-NULL pointer to 'make_obj' from 'allocator'
+@end smallexample
+for a @code{return_event} to make it clearer how the unchecked value moves
+from callee back to caller
+@item
+a double-free diagnostic might use:
+@smallexample
+ second 'free' here; first 'free' was at (3)
+@end smallexample
+and a use-after-free might use
+@smallexample
+ use after 'free' here; memory was freed at (2)
+@end smallexample
+@end itemize
+
+At this point we can emit the diagnostic.
+
+@subsection Limitations
+
+@itemize @bullet
+@item
+Only for C so far
+@item
+The implementation of call summaries is currently very simplistic.
+@item
+Lack of function pointer analysis
+@item
+The region model code creates lots of little mutable objects at each
+@code{region_model} (and thus per @code{exploded_node}) rather than
+sharing immutable objects and having the mutable state in the
+@code{program_state} or @code{region_model}. The latter approach might be
+more efficient, and might avoid dealing with IDs rather than pointers
+(which requires us to impose an ordering to get meaningful equality).
+@item
+The region model code doesn't yet support @code{memcpy}. At the
+gimple-ssa level these have been optimized to statements like this:
+@smallexample
+_10 = MEM <long unsigned int> [(char * @{ref-all@})&c]
+MEM <long unsigned int> [(char * @{ref-all@})&d] = _10;
+@end smallexample
+Perhaps they could be supported via a new @code{compound_svalue} type.
+@item
+There are various other limitations in the region model (grep for TODO/xfail
+in the testsuite).
+@item
+The constraint_manager's implementation of transitivity is currently too
+expensive to enable by default and so must be manually enabled via
+@option{-fanalyzer-transitivity}).
+@item
+The checkers are currently hardcoded and don't allow for user extensibility
+(e.g. adding allocate/release pairs).
+@item
+Although the analyzer's test suite has a proof-of-concept test case for
+LTO, LTO support hasn't had extensive testing. There are various
+lang-specific things in the analyzer that assume C rather than LTO.
+For example, SSA names are printed to the user in ``raw'' form, rather
+than printing the underlying variable name.
+@end itemize
+
+Some ideas for other checkers
+@itemize @bullet
+@item
+File-descriptor-based APIs
+@item
+Linux kernel internal APIs
+@item
+Signal handling
+@end itemize
+
+@node Debugging the Analyzer
+@section Debugging the Analyzer
+@cindex analyzer, debugging
+@cindex static analyzer, debugging
+
+@subsection Special Functions for Debugging the Analyzer
+
+The analyzer recognizes various special functions by name, for use
+in debugging the analyzer. Declarations can be seen in the testsuite
+in @file{analyzer-decls.h}. None of these functions are actually
+implemented.
+
+Add:
+@smallexample
+ __analyzer_break ();
+@end smallexample
+to the source being analyzed to trigger a breakpoint in the analyzer when
+that source is reached. By putting a series of these in the source, it's
+much easier to effectively step through the program state as it's analyzed.
+
+@smallexample
+__analyzer_dump ();
+@end smallexample
+
+will dump the copious information about the analyzer's state each time it
+reaches the call in its traversal of the source.
+
+@smallexample
+__analyzer_dump_path ();
+@end smallexample
+
+will emit a placeholder ``note'' diagnostic with a path to that call site,
+if the analyzer finds a feasible path to it.
+
+The builtin @code{__analyzer_dump_exploded_nodes} will dump information
+after analysis on all of the exploded nodes at that program point:
+
+@smallexample
+ __analyzer_dump_exploded_nodes (0);
+@end smallexample
+
+will dump just the number of nodes, and their IDs.
+
+@smallexample
+ __analyzer_dump_exploded_nodes (1);
+@end smallexample
+
+will also dump all of the states within those nodes.
+
+@smallexample
+ __analyzer_dump_region_model ();
+@end smallexample
+will dump the region_model's state to stderr.
+
+@smallexample
+__analyzer_eval (expr);
+@end smallexample
+will emit a warning with text "TRUE", FALSE" or "UNKNOWN" based on the
+truthfulness of the argument. This is useful for writing DejaGnu tests.
+
+
+@subsection Other Debugging Techniques
+
+One approach when tracking down where a particular bogus state is
+introduced into the @code{exploded_graph} is to add custom code to
+@code{region_model::validate}.
+
+For example, this custom code (added to @code{region_model::validate})
+breaks with an assertion failure when a variable called @code{ptr}
+acquires a value that's unknown, using
+@code{region_model::get_value_by_name} to locate the variable
+
+@smallexample
+ /* Find a variable matching "ptr". */
+ svalue_id sid = get_value_by_name ("ptr");
+ if (!sid.null_p ())
+ @{
+ svalue *sval = get_svalue (sid);
+ gcc_assert (sval->get_kind () != SK_UNKNOWN);
+ @}
+@end smallexample
+
+making it easier to investigate further in a debugger when this occurs.