Diffstat (limited to 'boehm-gc/doc/gcdescr.html')
-rw-r--r-- | boehm-gc/doc/gcdescr.html | 560 |
1 files changed, 0 insertions, 560 deletions
diff --git a/boehm-gc/doc/gcdescr.html b/boehm-gc/doc/gcdescr.html
deleted file mode 100644
index cab6bde..0000000
--- a/boehm-gc/doc/gcdescr.html
+++ /dev/null
@@ -1,560 +0,0 @@
-<HTML>
-<HEAD>
-    <TITLE> Conservative GC Algorithmic Overview </TITLE>
-    <AUTHOR> Hans-J. Boehm, HP Labs (Much of this was written at SGI)</author>
-</HEAD>
-<BODY>
-<H1> <I>This is under construction, and may always be.</i> </h1>
-<H1> Conservative GC Algorithmic Overview </h1>
-<P>
-This is a description of the algorithms and data structures used in our
-conservative garbage collector. I expect the level of detail to increase
-with time. For a survey of GC algorithms, see for example
-<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
-excellent paper</a>. For an overview of the collector interface,
-see <A HREF="gcinterface.html">here</a>.
-<P>
-This description is targeted primarily at someone trying to understand the
-source code. It specifically refers to variable and function names.
-It may also be useful for understanding the algorithms at a higher level.
-<P>
-The description here assumes that the collector is used in default mode.
-In particular, we assume that it is used as a garbage collector, and not just
-a leak detector. We initially assume that it is used in stop-the-world,
-non-incremental mode, though the presence of the incremental collector
-will be apparent in the design.
-We assume the default finalization model, but the code affected by that
-is very localized.
-<H2> Introduction </h2>
-The garbage collector uses a modified mark-sweep algorithm. Conceptually
-it operates roughly in four phases, which are performed occasionally
-as part of a memory allocation:
-
-<OL>
-
-<LI>
-<I>Preparation</i> Each object has an associated mark bit.
-Clear all mark bits, indicating that all objects
-are potentially unreachable.
-
-<LI>
-<I>Mark phase</i> Marks all objects that can be reached via chains of
-pointers from variables.
Often the collector has no real information
-about the location of pointer variables in the heap, so it
-views all static data areas, stacks and registers as potentially containing
-pointers. Any bit patterns that represent addresses inside
-heap objects managed by the collector are viewed as pointers.
-Unless the client program has made heap object layout information
-available to the collector, any heap objects found to be reachable from
-variables are again scanned similarly.
-
-<LI>
-<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
-objects, and returns them to an appropriate free list for reuse. This is
-not really a separate phase; even in non-incremental mode this operation
-is usually performed on demand during an allocation that discovers an empty
-free list. Thus the sweep phase is very unlikely to touch a page that
-would not have been touched shortly thereafter anyway.
-
-<LI>
-<I>Finalization phase</i> Unreachable objects which had been registered
-for finalization are enqueued for finalization outside the collector.
-
-</ol>
-
-<P>
-The remaining sections describe the memory allocation data structures,
-and then the last 3 collection phases in more detail. We conclude by
-outlining some of the additional features implemented in the collector.
-
-<H2>Allocation</h2>
-The collector includes its own memory allocator. The allocator obtains
-memory from the system in a platform-dependent way. Under UNIX, it
-uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>.
-<P>
-Most static data used by the allocator, as well as that needed by the
-rest of the garbage collector is stored inside the
-<TT>_GC_arrays</tt> structure.
-This allows the garbage collector to easily ignore the collector's own
-data structures when it searches for root pointers. Other allocator
-and collector internal data structures are allocated dynamically
-with <TT>GC_scratch_alloc</tt>.
<TT>GC_scratch_alloc</tt> does not -allow for deallocation, and is therefore used only for permanent data -structures. -<P> -The allocator allocates objects of different <I>kinds</i>. -Different kinds are handled somewhat differently by certain parts -of the garbage collector. Certain kinds are scanned for pointers, -others are not. Some may have per-object type descriptors that -determine pointer locations. Or a specific kind may correspond -to one specific object layout. Two built-in kinds are uncollectable. -One (<TT>STUBBORN</tt>) is immutable without special precautions. -In spite of that, it is very likely that most C clients of the -collector currently -use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects. -The <A HREF="http://gcc.gnu.org/java">gcj</a> runtime also makes -heavy use of a kind (allocated with GC_gcj_malloc) that stores -type information at a known offset in method tables. -<P> -The collector uses a two level allocator. A large block is defined to -be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2, -typically on the order of the page size. -<P> -Large block sizes are rounded up to -the next multiple of <TT>HBLKSIZE</tt> and then allocated by -<TT>GC_allochblk</tt>. Recent versions of the collector -use an approximate best fit algorithm by keeping free lists for -several large block sizes. -The actual -implementation of <TT>GC_allochblk</tt> -is significantly complicated by black-listing issues -(see below). -<P> -Small blocks are allocated in chunks of size <TT>HBLKSIZE</tt>. -Each chunk is -dedicated to only one object size and kind. The allocator maintains -separate free lists for each size and kind of object. -<P> -Once a large block is split for use in smaller objects, it can only -be used for objects of that size, unless the collector discovers a completely -empty chunk. Completely empty chunks are restored to the appropriate -large block free list. 
-<P>
-In order to avoid allocating blocks for too many distinct object sizes,
-the collector normally does not directly allocate objects of every possible
-request size. Instead requests are rounded up to one of a smaller number
-of allocated sizes, for which free lists are maintained. The exact
-allocated sizes are computed on demand, but subject to the constraint
-that they increase roughly in geometric progression. Thus objects
-requested early in the execution are likely to be allocated with exactly
-the requested size, subject to alignment constraints.
-See <TT>GC_init_size_map</tt> for details.
-<P>
-The actual size rounding operation during small object allocation is
-implemented as a table lookup in <TT>GC_size_map</tt>.
-<P>
-Both collector initialization and computation of allocated sizes are
-handled carefully so that they do not slow down the small object fast
-allocation path. An attempt to allocate before the collector is initialized,
-or before the appropriate <TT>GC_size_map</tt> entry is computed,
-will take the same path as an allocation attempt with an empty free list.
-This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
-which performs the appropriate initialization checks.
-<P>
-In non-incremental mode, we make a decision about whether to garbage collect
-whenever an allocation would otherwise have failed with the current heap size.
-If the total amount of allocation since the last collection is less than
-the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
-expand the heap. Otherwise, we initiate a garbage collection. This ensures
-that the amount of garbage collection work per allocated byte remains
-constant.
-<P>
-The above is in fact an oversimplification of the real heap expansion
-and GC triggering heuristic, which adjusts slightly for root size
-and certain kinds of
-fragmentation.
In particular:
-<UL>
-<LI> Programs with a large root set size and
-little live heap memory will expand the heap to amortize the cost of
-scanning the roots.
-<LI> Versions 5.x of the collector actually collect more frequently in
-nonincremental mode. The large block allocator usually refuses to split
-large heap blocks once the garbage collection threshold is
-reached. This often has the effect of collecting well before the
-heap fills up, thus reducing fragmentation and working set size at the
-expense of GC time. Versions 6.x choose an intermediate strategy depending
-on how much large object allocation has taken place in the past.
-(If the collector is configured to unmap unused pages, versions 6.x
-use the 5.x strategy.)
-<LI> In calculating the amount of allocation since the last collection we
-give partial credit for objects we expect to be explicitly deallocated.
-Even if all objects are explicitly managed, it is often desirable to collect
-on rare occasion, since that is our only mechanism for coalescing completely
-empty chunks.
-</ul>
-<P>
-It has been suggested that this should be adjusted so that we favor
-expansion if the resulting heap still fits into physical memory.
-In many cases, that would no doubt help. But it is tricky to do this
-in a way that remains robust if multiple applications are contending
-for a single pool of physical memory.
-
-<H2>Mark phase</h2>
-
-At each collection, the collector marks all objects that are
-possibly reachable from pointer variables. Since it cannot generally
-tell where pointer variables are located, it scans the following
-<I>root segments</i> for pointers:
-<UL>
-<LI>The registers. Depending on the architecture, this may be done using
-assembly code, or by calling a <TT>setjmp</tt>-like function which saves
-register contents on the stack.
-<LI>The stack(s).
In the case of a single-threaded application,
-on most platforms this
-is done by scanning the memory between (an approximation of) the current
-stack pointer and <TT>GC_stackbottom</tt>. (For Itanium, the register stack
-is scanned separately.) The <TT>GC_stackbottom</tt> variable is set in
-a highly platform-specific way depending on the appropriate configuration
-information in <TT>gcconfig.h</tt>. Note that the currently active
-stack needs to be scanned carefully, since callee-save registers of
-client code may appear inside collector stack frames, which may
-change during the mark process. This is addressed by scanning
-some sections of the stack "eagerly", effectively capturing a snapshot
-at one point in time.
-<LI>Static data region(s). In the simplest case, this is the region
-between <TT>DATASTART</tt> and <TT>DATAEND</tt>, as defined in
-<TT>gcconfig.h</tt>. However, in most cases, this will also involve
-static data regions associated with dynamic libraries. These are
-identified by the mostly platform-specific code in <TT>dyn_load.c</tt>.
-</ul>
-The marker maintains an explicit stack of memory regions that are known
-to be accessible, but that have not yet been searched for contained pointers.
-Each stack entry contains the starting address of the block to be scanned,
-as well as a descriptor of the block. If no layout information is
-available for the block, then the descriptor is simply a length.
-(For other possibilities, see <TT>gc_mark.h</tt>.)
-<P>
-At the beginning of the mark phase, all root segments
-(as described above) are pushed on the
-stack by <TT>GC_push_roots</tt>. (Registers and eagerly processed
-stack sections are processed by pushing the referenced objects instead
-of the stack section itself.) If <TT>ALL_INTERIOR_PTRS</tt> is not
-defined, then stack roots require special treatment.
In this case, the
-normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
-explicitly checks for interior pointers and pushes descriptors for target
-objects.
-<P>
-The marker is structured to allow incremental marking.
-Each call to <TT>GC_mark_some</tt> performs a small amount of
-work towards marking the heap.
-It maintains
-explicit state in the form of <TT>GC_mark_state</tt>, which
-identifies a particular sub-phase. Some other pieces of state, most
-notably the mark stack, identify how much work remains to be done
-in each sub-phase. The normal progression of mark states for
-a stop-the-world collection is:
-<OL>
-<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
-objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
-be false, so the mark state is advanced to
-<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
-uncollectable objects, roots, and then mark everything reachable from them.
-<TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
-objects are pushed, and objects reachable from them are marked.
-At that point, the next call to <TT>GC_mark_some</tt> calls
-<TT>GC_push_roots</tt> to push the roots. It then advances the
-mark state to
-<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
-empty, all reachable objects are marked. Once in this state, we work
-only on emptying the mark stack. Once this is completed, the state
-changes to
-<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
-</ol>
-
-The core mark routine <TT>GC_mark_from</tt> is called
-repeatedly by several of the sub-phases when the mark stack starts to fill
-up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
-to empty the mark stack.
-The routine is designed to only perform a limited amount of marking at
-each call, so that it can also be used by the incremental collector.
-It is fairly carefully tuned, since it usually consumes a large majority
-of the garbage collection time.
-<P>
-The fact that it performs only a small amount of work per call also
-allows it to be used as the core routine of the parallel marker. In that
-case it is normally invoked on thread-private mark stacks instead of the
-global mark stack. More details can be found in
-<A HREF="scale.html">scale.html</a>.
-<P>
-The marker correctly handles mark stack overflows. Whenever the mark stack
-overflows, the mark state is reset to <TT>MS_INVALID</tt>.
-Since there are already marked objects in the heap,
-this eventually forces a complete
-scan of the heap, searching for pointers, during which any unmarked objects
-referenced by marked objects are again pushed on the mark stack. This
-process is repeated until the mark phase completes without a stack overflow.
-Each time the stack overflows, an attempt is made to grow the mark stack.
-All pieces of the collector that push regions onto the mark stack have to be
-careful to ensure forward progress, even in case of repeated mark stack
-overflows. Every mark attempt results in additional marked objects.
-<P>
-Each mark stack entry is processed by examining all candidate pointers
-in the range described by the entry. If the region has no associated
-type information, then this typically requires that each 4-byte aligned
-quantity (8-byte aligned with 64-bit pointers) be considered a candidate
-pointer.
-<P>
-We determine whether a candidate pointer is actually the address of
-a heap block. This is done in the following steps:
-<NL>
-<LI> The candidate pointer is checked against rough heap bounds.
-These heap bounds are maintained such that all actual heap objects
-fall between them. In order to facilitate black-listing (see below)
-we also include address regions that the heap is likely to expand into.
-Most non-pointers fail this initial test.
-<LI> The candidate pointer is divided into two pieces; the most significant
-bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
-the least significant bits specify an offset within that page.
-(A hardware page may actually consist of multiple such pages.
-HBLKSIZE is usually the page size divided by a small power of two.)
-<LI>
-The page address part of the candidate pointer is looked up in a
-<A HREF="tree.html">table</a>.
-Each table entry contains either 0, indicating that the page is not part
-of the garbage collected heap, a small integer <I>n</i>, indicating
-that the page is part of a large object, starting at least <I>n</i> pages
-back, or a pointer to a descriptor for the page. In the first case,
-the candidate pointer is not a true pointer and can be safely ignored.
-In the last two cases, we can obtain a descriptor for the page containing
-the beginning of the object.
-<LI>
-The starting address of the referenced object is computed.
-The page descriptor contains the size of the object(s)
-in that page, the object kind, and the necessary mark bits for those
-objects. The size information can be used to map the candidate pointer
-to the object starting address. To accelerate this process, the page header
-also contains a pointer to a precomputed map of page offsets to displacements
-from the beginning of an object. The use of this map avoids a
-potentially slow integer remainder operation in computing the object
-start address.
-<LI>
-The mark bit for the target object is checked and set. If the object
-was previously unmarked, the object is pushed on the mark stack.
-The descriptor is read from the page descriptor. (This is computed
-from information in <TT>GC_obj_kinds</tt> when the page is first allocated.)
-</nl>
-<P>
-At the end of the mark phase, mark bits for left-over free lists are cleared,
-in case a free list was accidentally marked due to a stray pointer.
- -<H2>Sweep phase</h2> - -At the end of the mark phase, all blocks in the heap are examined. -Unmarked large objects are immediately returned to the large object free list. -Each small object page is checked to see if all mark bits are clear. -If so, the entire page is returned to the large object free list. -Small object pages containing some reachable object are queued for later -sweeping, unless we determine that the page contains very little free -space, in which case it is not examined further. -<P> -This initial sweep pass touches only block headers, not -the blocks themselves. Thus it does not require significant paging, even -if large sections of the heap are not in physical memory. -<P> -Nonempty small object pages are swept when an allocation attempt -encounters an empty free list for that object size and kind. -Pages for the correct size and kind are repeatedly swept until at -least one empty block is found. Sweeping such a page involves -scanning the mark bit array in the page header, and building a free -list linked through the first words in the objects themselves. -This does involve touching the appropriate data page, but in most cases -it will be touched only just before it is used for allocation. -Hence any paging is essentially unavoidable. -<P> -Except in the case of pointer-free objects, we maintain the invariant -that any object in a small object free list is cleared (except possibly -for the link field). Thus it becomes the burden of the small object -sweep routine to clear objects. This has the advantage that we can -easily recover from accidentally marking a free list, though that could -also be handled by other means. The collector currently spends a fair -amount of time clearing objects, and this approach should probably be -revisited. -<P> -In most configurations, we use specialized sweep routines to handle common -small object sizes. 
Since we allocate one mark bit per word, it becomes -easier to examine the relevant mark bits if the object size divides -the word length evenly. We also suitably unroll the inner sweep loop -in each case. (It is conceivable that profile-based procedure cloning -in the compiler could make this unnecessary and counterproductive. I -know of no existing compiler to which this applies.) -<P> -The sweeping of small object pages could be avoided completely at the expense -of examining mark bits directly in the allocator. This would probably -be more expensive, since each allocation call would have to reload -a large amount of state (e.g. next object address to be swept, position -in mark bit table) before it could do its work. The current scheme -keeps the allocator simple and allows useful optimizations in the sweeper. - -<H2>Finalization</h2> -Both <TT>GC_register_disappearing_link</tt> and -<TT>GC_register_finalizer</tt> add the request to a corresponding hash -table. The hash table is allocated out of collected memory, but -the reference to the finalizable object is hidden from the collector. -Currently finalization requests are processed non-incrementally at the -end of a mark cycle. -<P> -The collector makes an initial pass over the table of finalizable objects, -pushing the contents of unmarked objects onto the mark stack. -After pushing each object, the marker is invoked to mark all objects -reachable from it. The object itself is not explicitly marked. -This assures that objects on which a finalizer depends are neither -collected nor finalized. -<P> -If in the process of marking from an object the -object itself becomes marked, we have uncovered -a cycle involving the object. This usually results in a warning from the -collector. Such objects are not finalized, since it may be -unsafe to do so. See the more detailed -<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/finalization.html"> discussion of finalization semantics</a>. 
-<P> -Any objects remaining unmarked at the end of this process are added to -a queue of objects whose finalizers can be run. Depending on collector -configuration, finalizers are dequeued and run either implicitly during -allocation calls, or explicitly in response to a user request. -(Note that the former is unfortunately both the default and not generally safe. -If finalizers perform synchronization, it may result in deadlocks. -Nontrivial finalizers generally need to perform synchronization, and -thus require a different collector configuration.) -<P> -The collector provides a mechanism for replacing the procedure that is -used to mark through objects. This is used both to provide support for -Java-style unordered finalization, and to ignore certain kinds of cycles, -<I>e.g.</i> those arising from C++ implementations of virtual inheritance. - -<H2>Generational Collection and Dirty Bits</h2> -We basically use the concurrent and generational GC algorithm described in -<A HREF="http://www.hpl.hp.com/personal/Hans_Boehm/gc/papers/pldi91.ps.Z">"Mostly Parallel Garbage Collection"</a>, -by Boehm, Demers, and Shenker. -<P> -The most significant modification is that -the collector always starts running in the allocating thread. -There is no separate garbage collector thread. (If parallel GC is -enabled, helper threads may also be woken up.) -If an allocation attempt either requests a large object, or encounters -an empty small object free list, and notices that there is a collection -in progress, it immediately performs a small amount of marking work -as described above. -<P> -This change was made both because we wanted to easily accommodate -single-threaded environments, and because a separate GC thread requires -very careful control over the scheduler to prevent the mutator from -out-running the collector, and hence provoking unneeded heap growth. -<P> -In incremental mode, the heap is always expanded when we encounter -insufficient space for an allocation. 
Garbage collection is triggered
-whenever we notice that more than
-<TT>GC_heap_size</tt>/(2 * <TT>GC_free_space_divisor</tt>)
-bytes of allocation have taken place.
-After <TT>GC_full_freq</tt> minor collections a major collection
-is started.
-<P>
-All collections initially run uninterrupted until a predetermined
-amount of time (50 msecs by default) has expired. If this allows
-the collection to complete entirely, we can avoid correcting
-for data structure modifications during the collection. If it does
-not complete, we return control to the mutator, and perform small
-amounts of additional GC work during those later allocations that
-cannot be satisfied from small object free lists. When marking completes,
-the set of modified pages is retrieved, and we mark once again from
-marked objects on those pages, this time with the mutator stopped.
-<P>
-We keep track of modified pages using one of several distinct mechanisms:
-<OL>
-<LI>
-Through explicit mutator cooperation. Currently this requires
-the use of <TT>GC_malloc_stubborn</tt>, and is rarely used.
-<LI>
-(<TT>MPROTECT_VDB</tt>) By write-protecting physical pages and
-catching write faults. This is
-implemented for many Unix-like systems and for win32. It is not possible
-in a few environments.
-<LI>
-(<TT>PROC_VDB</tt>) By retrieving dirty bit information from /proc.
-(Currently only Sun's
-Solaris supports this. Though this is considerably cleaner, performance
-may actually be better with mprotect and signals.)
-<LI>
-(<TT>PCR_VDB</tt>) By relying on an external dirty bit implementation, in this
-case the one in Xerox PCR.
-<LI>
-(<TT>DEFAULT_VDB</tt>) By treating all pages as dirty. This is the default if
-none of the other techniques is known to be usable, and
-<TT>GC_malloc_stubborn</tt> is not used. Practical only for testing, or if
-the vast majority of objects use <TT>GC_malloc_stubborn</tt>.
-</ol> - -<H2>Black-listing</h2> - -The collector implements <I>black-listing</i> of pages, as described -in -<A HREF="http://www.acm.org/pubs/citations/proceedings/pldi/155090/p197-boehm/"> -Boehm, ``Space Efficient Conservative Collection'', PLDI '93</a>, also available -<A HREF="papers/pldi93.ps.Z">here</a>. -<P> -During the mark phase, the collector tracks ``near misses'', i.e. attempts -to follow a ``pointer'' to just outside the garbage-collected heap, or -to a currently unallocated page inside the heap. Pages that have been -the targets of such near misses are likely to be the targets of -misidentified ``pointers'' in the future. To minimize the future -damage caused by such misidentifications they will be allocated only to -small pointerfree objects. -<P> -The collector understands two different kinds of black-listing. A -page may be black listed for interior pointer references -(<TT>GC_add_to_black_list_stack</tt>), if it was the target of a near -miss from a location that requires interior pointer recognition, -<I>e.g.</i> the stack, or the heap if <TT>GC_all_interior_pointers</tt> -is set. In this case, we also avoid allocating large blocks that include -this page. -<P> -If the near miss came from a source that did not require interior -pointer recognition, it is black-listed with -<TT>GC_add_to_black_list_normal</tt>. -A page black-listed in this way may appear inside a large object, -so long as it is not the first page of a large object. -<P> -The <TT>GC_allochblk</tt> routine respects black-listing when assigning -a block to a particular object kind and size. It occasionally -drops (i.e. allocates and forgets) blocks that are completely black-listed -in order to avoid excessively long large block free lists containing -only unusable blocks. This would otherwise become an issue -if there is low demand for small pointerfree objects. - -<H2>Thread support</h2> -We support several different threading models. 
Unfortunately Pthreads,
-the only reasonably well standardized thread model, supports too narrow
-an interface for conservative garbage collection. There appears to be
-no completely portable way to allow the collector
-to coexist with various Pthreads
-implementations. Hence we currently support only the more
-common Pthreads implementations.
-<P>
-In particular, it is very difficult for the collector to stop all other
-threads in the system and examine the register contents. This is currently
-accomplished with very different mechanisms for some Pthreads
-implementations. The Solaris implementation temporarily disables much
-of the user-level threads implementation by stopping kernel-level threads
-("lwp"s). The Linux/HPUX/OSF1 and Irix implementations send signals to
-individual Pthreads and have them wait in the signal handler.
-<P>
-The Linux and Irix implementations use
-only documented Pthreads calls, but rely on extensions to their semantics.
-The Linux implementation <TT>linux_threads.c</tt> relies on only very
-mild extensions to the pthreads semantics, and already supports a large number
-of other Unix-like pthreads implementations. Our goal is to make this the
-only pthread support in the collector.
-<P>
-(The Irix implementation is separate only for historical reasons and should
-clearly be merged. The current Solaris implementation probably performs
-better in the uniprocessor case, but does not support thread operations in the
-collector. Hence it cannot support the parallel marker.)
-<P>
-All implementations must
-intercept thread creation and a few other thread-specific calls to allow
-enumeration of threads and location of thread stacks. This is currently
-accomplished with <TT># define</tt>'s in <TT>gc.h</tt>
-(really <TT>gc_pthread_redirects.h</tt>), or optionally
-by using ld's function call wrapping mechanism under Linux.
-<P>
-Recent versions of the collector support several facilities to enhance
-the processor-scalability and thread performance of the collector.
-These are discussed in more detail <A HREF="scale.html">here</a>.
-<P>
-Comments are appreciated. Please send mail to
-<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> or
-<A HREF="mailto:Hans.Boehm@hp.com"><TT>Hans.Boehm@hp.com</tt></a>
-<P>
-This is a modified copy of a page written while the author was at SGI.
-The original was <A HREF="http://reality.sgi.com/boehm/gcdescr.html">here</a>.
-</body>
-</html>