author | Tom Tromey <tromey@gcc.gnu.org> | 2001-08-17 18:39:15 +0000 |
---|---|---|
committer | Tom Tromey <tromey@gcc.gnu.org> | 2001-08-17 18:39:15 +0000 |
commit | 61f38a77a0c4f467c59707ce434d34eae52d6692 (patch) | |
tree | b8e0afefff8ad4f413b4d19a28a69f1e15de225b /boehm-gc/doc | |
parent | b38a75e57d32c479904ae185ad716f66ab44c591 (diff) | |
Initial revision
From-SVN: r44969
Diffstat (limited to 'boehm-gc/doc')
-rw-r--r-- | boehm-gc/doc/README.autoconf | 59
-rw-r--r-- | boehm-gc/doc/README.macros | 78
-rw-r--r-- | boehm-gc/doc/debugging.html | 289
-rw-r--r-- | boehm-gc/doc/gcdescr.html | 438
-rw-r--r-- | boehm-gc/doc/tree.html | 198
5 files changed, 1062 insertions, 0 deletions
diff --git a/boehm-gc/doc/README.autoconf b/boehm-gc/doc/README.autoconf new file mode 100644 index 0000000..53fcf5a --- /dev/null +++ b/boehm-gc/doc/README.autoconf @@ -0,0 +1,59 @@ +As of GC6.0alpha8, we attempt to support GNU-style builds based on automake, +autoconf and libtool. This is based almost entirely on Tom Tromey's work +with gcj. + +To build and install libraries use + +configure; make; make install + +The advantages of this process are: + +1) It should eventually do a better job of automatically determining the +right compiler to use, etc. It probably already does in some cases. + +2) It tries to automatically set a good set of default GC parameters for +the platform (e.g. thread support). It provides an easier way to configure +some of the others. + +3) It integrates better with other projects using a GNU-style build process. + +4) It builds both dynamic and static libraries. + +The known disadvantages are: + +1) The build scripts are much more complex and harder to debug (though largely +standard). I don't understand them all, and there's probably lots of redundant +stuff. + +2) It probably doesn't work on all Un*x-like platforms yet. It probably will +never work on the rest. + +3) The scripts are not yet complete. Some of the standard GNU targets don't +yet work. (Corrections/additions are very welcome.) + +The distribution should contain all files needed to run "configure" and "make", +as well as the sources needed to regenerate the derived files. (If I missed +some, please let me know.) + +Note that the distribution comes with a "Makefile" which will be overwritten +by "configure" with one that is not at all equiavelent to the original. The +distribution contains a copy of the original "Makefile" in "Makefile.direct". + +Important options to configure: + + --prefix=PREFIX install architecture-independent files in PREFIX + [/usr/local] + --exec-prefix=EPREFIX install architecture-dependent files in EPREFIX + [same as prefix] + --enable-threads=TYPE choose threading package + --enable-parallel-mark parallelize marking and free list construction + --enable-full-debug include full support for pointer backtracing etc. + +Unless --prefix is set (or --exec-prefix or one of the more obscure options), +make install will install libgc.a and libgc.so in /usr/local/bin, which +would typically require the "make install" to be run as root. + +Most commonly --enable-threads=posix or will be needed. --enable-parallel-mark +is recommended for multiprocessors if it is supported on the platform. + + diff --git a/boehm-gc/doc/README.macros b/boehm-gc/doc/README.macros new file mode 100644 index 0000000..d9df8dd --- /dev/null +++ b/boehm-gc/doc/README.macros @@ -0,0 +1,78 @@ +The collector uses a large amount of conditional compilation in order to +deal with platform dependencies. This violates a number of known coding +standards. On the other hand, it seems to be the only practical way to +support this many platforms without excessive code duplication. + +A few guidelines have mostly been followed in order to keep this manageable: + +1) #if and #ifdef directives are properly indented whenever easily possible. +All known C compilers allow whitespace between the "#" and the "if" to make +this possible. ANSI C also allows white space before the "#", though we +avoid that. It has the known disadvantages that it differs from the normal +GNU conventions, and that it makes patches larger than otherwise necessary. 
+In my opinion, it's still well worth it, for the same reason that we indent
+ordinary "if" statements.
+
+2) Whenever possible, tests are performed on the macros defined in gcconfig.h
+instead of directly testing platform-specific predefined macros. This makes it
+relatively easy to adapt to new compilers with a different set of predefined
+macros. Currently these macros generally identify platforms instead of
+features. In many cases, this is a mistake.
+
+3) The code currently avoids #elif, even though that would make it more
+readable. This was done since #elif would need to be understood by ALL
+compilers used to build the collector, and that hasn't always been the case.
+It makes sense to reconsider this decision at some point, since #elif has been
+standardized at least since 1989.
+
+Many of the tested configuration macros are at least somewhat defined in
+either include/private/gcconfig.h or in Makefile.direct. Here is an attempt
+at defining some of the remainder: (Thanks to Walter Bright for suggesting
+this. This is a work in progress)
+
+MACRO           EXPLANATION
+-----           -----------
+
+__DMC__         Always #define'd by the Digital Mars compiler. Expands
+                to the compiler version number in hex, i.e. 0x810 is
+                version 8.1b0
+
+_ENABLE_ARRAYNEW
+                #define'd by the Digital Mars C++ compiler when
+                operator new[] and delete[] are separately
+                overloadable. Used in gc_cpp.h.
+
+_MSC_VER        Expands to the Visual C++ compiler version. Assumed to
+                not be defined for other compilers (at least if they behave
+                appreciably differently).
+
+_DLL            Defined by Visual C++ if dynamic libraries are being built
+                or used. Used to test whether __declspec(dllimport) or
+                __declspec(dllexport) needs to be added to declarations
+                to support the case in which the collector is in a dll.
+
+GC_DLL          User-settable macro that forces the effect of _DLL.
+
+GC_NOT_DLL      User-settable macro that overrides _DLL, e.g. if dynamic
+                libraries are used, but the collector is in a static library.
+
+__STDC__        Assumed to be defined only by compilers that understand
+                prototypes and other C89 features. Its value is generally
+                not used, since we are fine with most nonconforming extensions.
+
+SUNOS5SIGS      Solaris-like signal handling. This is probably misnamed,
+                since it really doesn't guarantee much more than Posix.
+                Currently set only for Solaris2.X, HPUX, and DRSNX. Should
+                probably be set for some other platforms.
+
+PCR             Set if the collector is being built as part of the Xerox
+                Portable Common Runtime.
+
+SRC_M3          Set if the collector is being built as a replacement for the
+                one in the DEC/Compaq SRC Modula-3 runtime. I suspect this
+                was last used around 1994, and no doubt broke a long time ago.
+                It's there primarily in case someone wants to port to a similar
+                system.
+
+
+
diff --git a/boehm-gc/doc/debugging.html b/boehm-gc/doc/debugging.html
new file mode 100644
index 0000000..a186ff5
--- /dev/null
+++ b/boehm-gc/doc/debugging.html
@@ -0,0 +1,289 @@
+<HTML>
+<HEAD>
+<TITLE>Debugging Garbage Collector Related Problems</title>
+</head>
+<BODY>
+<H1>Debugging Garbage Collector Related Problems</h1>
+This page contains some hints on
+debugging issues specific to
+the Boehm-Demers-Weiser conservative garbage collector.
+It applies both to debugging issues in client code that manifest themselves
+as collector misbehavior, and to debugging the collector itself.
+<P>
+If you suspect a bug in the collector itself, it is strongly recommended
+that you try the latest collector release, even if it is labelled as "alpha",
+before proceeding.
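+<P>
+Several of the techniques below refer to debugging allocation. As a minimal
+sketch of what that looks like in client code, using only the documented
+<TT>GC_DEBUG</tt>/<TT>GC_MALLOC</tt> interface from <TT>gc.h</tt> (the
+<TT>node</tt> type and <TT>make_node</tt> are purely illustrative):
+<PRE>
+/* Define GC_DEBUG before including gc.h so that GC_MALLOC records the  */
+/* allocation site and adds debugging information to each object.       */
+#define GC_DEBUG
+#include "gc.h"
+
+struct node { struct node *next; int value; };
+
+struct node * make_node(int value)
+{
+    /* GC_MALLOC is used for pointer-containing objects;                */
+    /* GC_MALLOC_ATOMIC would be appropriate for pointer-free data.     */
+    struct node *result = (struct node *)GC_MALLOC(sizeof(struct node));
+
+    if (result == 0) return 0;   /* allocation failure */
+    result->value = value;
+    result->next = 0;
+    return result;
+}
+</pre>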
+<H2>Bus Errors and Segmentation Violations</h2> +<P> +If the fault occurred in GC_find_limit, or with incremental collection enabled, +this is probably normal. The collector installs handlers to take care of +these. You will not see these unless you are using a debugger. +Your debugger <I>should</i> allow you to continue. +It's often preferable to tell the debugger to ignore SIGBUS and SIGSEGV +("<TT>handle SIGSEGV SIGBUS nostop noprint</tt>" in gdb, +"<TT>ignore SIGSEGV SIGBUS</tt>" in most versions of dbx) +and set a breakpoint in <TT>abort</tt>. +The collector will call abort if the signal had another cause, +and there was not other handler previously installed. +<P> +We recommend debugging without incremental collection if possible. +(This applies directly to UNIX systems. +Debugging with incremental collection under win32 is worse. See README.win32.) +<P> +If the application generates an unhandled SIGSEGV or equivalent, it may +often be easiest to set the environment variable GC_LOOP_ON_ABORT. On many +platforms, this will cause the collector to loop in a handler when the +SIGSEGV is encountered (or when the collector aborts for some other reason), +and a debugger can then be attached to the looping +process. This sidesteps common operating system problems related +to incomplete core files for multithreaded applications, etc. +<H2>Other Signals</h2> +On most platforms, the multithreaded version of the collector needs one or +two other signals for internal use by the collector in stopping threads. +It is normally wise to tell the debugger to ignore these. On Linux, +the collector currently uses SIGPWR and SIGXCPU by default. +<H2>Warning Messages About Needing to Allocate Blacklisted Blocks</h2> +The garbage collector generates warning messages of the form +<PRE> +Needed to allocate blacklisted block at 0x... +</pre> +when it needs to allocate a block at a location that it knows to be +referenced by a false pointer. These false pointers can be either permanent +(<I>e.g.</i> a static integer variable that never changes) or temporary. +In the latter case, the warning is largely spurious, and the block will +eventually be reclaimed normally. +In the former case, the program will still run correctly, but the block +will never be reclaimed. Unless the block is intended to be +permanent, the warning indicates a memory leak. +<OL> +<LI>Ignore these warnings while you are using GC_DEBUG. Some of the routines +mentioned below don't have debugging equivalents. (Alternatively, write +the missing routines and send them to me.) +<LI>Replace allocator calls that request large blocks with calls to +<TT>GC_malloc_ignore_off_page</tt> or +<TT>GC_malloc_atomic_ignore_off_page</tt>. You may want to set a +breakpoint in <TT>GC_default_warn_proc</tt> to help you identify such calls. +Make sure that a pointer to somewhere near the beginning of the resulting block +is maintained in a (preferably volatile) variable as long as +the block is needed. +<LI> +If the large blocks are allocated with realloc, we suggest instead allocating +them with something like the following. Note that the realloc size increment +should be fairly large (e.g. a factor of 3/2) for this to exhibit reasonable +performance. But we all know we should do that anyway. 
+<PRE> +void * big_realloc(void *p, size_t new_size) +{ + size_t old_size = GC_size(p); + void * result; + + if (new_size <= 10000) return(GC_realloc(p, new_size)); + if (new_size <= old_size) return(p); + result = GC_malloc_ignore_off_page(new_size); + if (result == 0) return(0); + memcpy(result,p,old_size); + GC_free(p); + return(result); +} +</pre> + +<LI> In the unlikely case that even relatively small object +(<20KB) allocations are triggering these warnings, then your address +space contains lots of "bogus pointers", i.e. values that appear to +be pointers but aren't. Usually this can be solved by using GC_malloc_atomic +or the routines in gc_typed.h to allocate large pointer-free regions of bitmaps, etc. Sometimes the problem can be solved with trivial changes of encoding +in certain values. It is possible, to identify the source of the bogus +pointers by building the collector with <TT>-DPRINT_BLACK_LIST</tt>, +which will cause it to print the "bogus pointers", along with their location. + +<LI> If you get only a fixed number of these warnings, you are probably only +introducing a bounded leak by ignoring them. If the data structures being +allocated are intended to be permanent, then it is also safe to ignore them. +The warnings can be turned off by calling GC_set_warn_proc with a procedure +that ignores these warnings (e.g. by doing absolutely nothing). +</ol> + +<H2>The Collector References a Bad Address in <TT>GC_malloc</tt></h2> + +This typically happens while the collector is trying to remove an entry from +its free list, and the free list pointer is bad because the free list link +in the last allocated object was bad. +<P> +With > 99% probability, you wrote past the end of an allocated object. +Try setting <TT>GC_DEBUG</tt> before including <TT>gc.h</tt> and +allocating with <TT>GC_MALLOC</tt>. This will try to detect such +overwrite errors. + +<H2>Unexpectedly Large Heap</h2> + +Unexpected heap growth can be due to one of the following: +<OL> +<LI> Data structures that are being unintentionally retained. This +is commonly caused by data structures that are no longer being used, +but were not cleared, or by caches growing without bounds. +<LI> Pointer misidentification. The garbage collector is interpreting +integers or other data as pointers and retaining the "referenced" +objects. +<LI> Heap fragmentation. This should never result in unbounded growth, +but it may account for larger heaps. This is most commonly caused +by allocation of large objects. On some platforms it can be reduced +by building with -DUSE_MUNMAP, which will cause the collector to unmap +memory corresponding to pages that have not been recently used. +<LI> Per object overhead. This is usually a relatively minor effect, but +it may be worth considering. If the collector recognizes interior +pointers, object sizes are increased, so that one-past-the-end pointers +are correctly recognized. The collector can be configured not to do this +(<TT>-DDONT_ADD_BYTE_AT_END</tt>). +<P> +The collector rounds up object sizes so the result fits well into the +chunk size (<TT>HBLKSIZE</tt>, normally 4K on 32 bit machines, 8K +on 64 bit machines) used by the collector. Thus it may be worth avoiding +objects of size 2K + 1 (or 2K if a byte is being added at the end.) +</ol> +The last two cases can often be identified by looking at the output +of a call to <TT>GC_dump()</tt>. 
Among other things, it will print the
+list of free heap blocks, and a very brief description of all chunks in
+the heap, the object sizes they correspond to, and how many live objects
+were found in the chunk at the last collection.
+<P>
+Growing data structures can usually be identified by
+<OL>
+<LI> Building the collector with <TT>-DKEEP_BACK_PTRS</tt>,
+<LI> Preferably using debugging allocation (defining <TT>GC_DEBUG</tt>
+before including <TT>gc.h</tt> and allocating with <TT>GC_MALLOC</tt>),
+so that objects will be identified by their allocation site,
+<LI> Running the application long enough so
+that most of the heap is composed of "leaked" memory, and
+<LI> Then calling <TT>GC_generate_random_backtrace()</tt> from backptr.h
+a few times to determine why some randomly sampled objects in the heap are
+being retained.
+</ol>
+<P>
+The same technique can often be used to identify problems with false
+pointers, by noting whether the reference chains printed by
+<TT>GC_generate_random_backtrace()</tt> involve any misidentified pointers.
+An alternate technique is to build the collector with
+<TT>-DPRINT_BLACK_LIST</tt>, which will cause it to report values that
+almost, but not quite, look like heap pointers. It is very likely that
+actual false pointers will come from similar sources.
+<P>
+In the unlikely case that false pointers are an issue, it can usually
+be resolved using one or more of the following techniques:
+<OL>
+<LI> Use <TT>GC_malloc_atomic</tt> for objects containing no pointers.
+This is especially important for large arrays containing compressed data,
+pseudo-random numbers, and the like. It is also likely to improve GC
+performance, perhaps drastically so if the application is paging.
+<LI> If you allocate large objects containing only
+one or two pointers at the beginning, either try the typed allocation
+primitives in <TT>gc_typed.h</tt>, or separate out the pointer-free component.
+<LI> Consider using <TT>GC_malloc_ignore_off_page()</tt>
+to allocate large objects. (See <TT>gc.h</tt> and above for details.
+Large means > 100K in most environments.)
+</ol>
+<H2>Prematurely Reclaimed Objects</h2>
+The usual symptom of this is a segmentation fault, or an obviously overwritten
+value in a heap object. This should, of course, be impossible. In practice,
+it may happen for reasons like the following:
+<OL>
+<LI> The collector did not intercept the creation of threads correctly in
+a multithreaded application, <I>e.g.</i> because the client called
+<TT>pthread_create</tt> without including <TT>gc.h</tt>, which redefines it.
+<LI> The last pointer to an object in the garbage collected heap was stored
+somewhere where the collector couldn't see it, <I>e.g.</i> in an
+object allocated with system <TT>malloc</tt>, in certain types of
+<TT>mmap</tt>ed files,
+or in some data structure visible only to the OS. (On some platforms,
+thread-local storage is one of these.)
+<LI> The last pointer to an object was somehow disguised, <I>e.g.</i> by
+XORing it with another pointer.
+<LI> Incorrect use of <TT>GC_malloc_atomic</tt> or typed allocation.
+<LI> An incorrect <TT>GC_free</tt> call.
+<LI> The client program overwrote an internal garbage collector data structure.
+<LI> A garbage collector bug.
+<LI> (Empirically less likely than any of the above.) A compiler optimization
+that disguised the last pointer.
+</ol> +The following relatively simple techniques should be tried first to narrow +down the problem: +<OL> +<LI> If you are using the incremental collector try turning it off for +debugging. +<LI> Try to reproduce the problem with fully debuggable unoptimized code. +This will eliminate the last possibility, as well as making debugging easier. +<LI> Try replacing any suspect typed allocation and <TT>GC_malloc_atomic</tt> +calls with calls to <TT>GC_malloc</tt>. +<LI> Try removing any GC_free calls (<I>e.g.</i> with a suitable +<TT>#define</tt>). +<LI> Rebuild the collector with <TT>-DGC_ASSERTIONS</tt>. +<LI> If the following works on your platform (i.e. if gctest still works +if you do this), try building the collector with +<TT>-DREDIRECT_MALLOC=GC_malloc_uncollectable</tt>. This will cause +the collector to scan memory allocated with malloc. +</ol> +If all else fails, you will have to attack this with a debugger. +Suggested steps: +<OL> +<LI> Call <TT>GC_dump()</tt> from the debugger around the time of the failure. Verify +that the collectors idea of the root set (i.e. static data regions which +it should scan for pointers) looks plausible. If not, i.e. if it doesn't +include some static variables, report this as +a collector bug. Be sure to describe your platform precisely, since this sort +of problem is nearly always very platform dependent. +<LI> Especially if the failure is not deterministic, try to isolate it to +a relatively small test case. +<LI> Set a break point in <TT>GC_finish_collection</tt>. This is a good +point to examine what has been marked, i.e. found reachable, by the +collector. +<LI> If the failure is deterministic, run the process +up to the last collection before the failure. +Note that the variable <TT>GC_gc_no</tt> counts collections and can be used +to set a conditional breakpoint in the right one. It is incremented just +before the call to GC_finish_collection. +If object <TT>p</tt> was prematurely recycled, it may be helpful to +look at <TT>*GC_find_header(p)</tt> at the failure point. +The <TT>hb_last_reclaimed</tt> field will identify the collection number +during which its block was last swept. +<LI> Verify that the offending object still has its correct contents at +this point. +The call <TT>GC_is_marked(p)</tt> from the debugger to verify that the +object has not been marked, and is about to be reclaimed. +<LI> Determine a path from a root, i.e. static variable, stack, or +register variable, +to the reclaimed object. Call <TT>GC_is_marked(q)</tt> for each object +<TT>q</tt> along the path, trying to locate the first unmarked object, say +<TT>r</tt>. +<LI> If <TT>r</tt> is pointed to by a static root, +verify that the location +pointing to it is part of the root set printed by <TT>GC_dump()</tt>. If it +is on the stack in the main (or only) thread, verify that +<TT>GC_stackbottom</tt> is set correctly to the base of the stack. If it is +in another thread stack, check the collector's thread data structure +(<TT>GC_thread[]</tt> on several platforms) to make sure that stack bounds +are set correctly. +<LI> If <TT>r</tt> is pointed to by heap object <TT>s</tt>, check that the +collector's layout description for <TT>s</tt> is such that the pointer field +will be scanned. Call <TT>*GC_find_header(s)</tt> to look at the descriptor +for the heap chunk. The <TT>hb_descr</tt> field specifies the layout +of objects in that chunk. See gc_mark.h for the meaning of the descriptor. 
+(If its low-order 2 bits are zero, then it is just the length of the object
+prefix to be scanned. This form is always used for objects allocated
+with <TT>GC_malloc</tt> or <TT>GC_malloc_atomic</tt>.)
+<LI> If the failure is not deterministic, you may still be able to apply some
+of the above techniques at the point of failure. But remember that objects
+allocated since the last collection will not have been marked, even if the
+collector is functioning properly. On some platforms, the collector
+can be configured to save call chains in objects for debugging.
+Enabling this feature will also cause it to save the call stack at the
+point of the last GC in GC_arrays._last_stack.
+<LI> When looking at GC internal data structures remember that a number
+of <TT>GC_</tt><I>xxx</i> variables are really macro defined to
+<TT>GC_arrays._</tt><I>xxx</i>, so that
+the collector can avoid scanning them.
+</ol>
+</body>
+</html>
+
+
+
+
diff --git a/boehm-gc/doc/gcdescr.html b/boehm-gc/doc/gcdescr.html
new file mode 100644
index 0000000..65e8a8f
--- /dev/null
+++ b/boehm-gc/doc/gcdescr.html
@@ -0,0 +1,438 @@
+<HTML>
+<HEAD>
+    <TITLE> Conservative GC Algorithmic Overview </TITLE>
+    <AUTHOR> Hans-J. Boehm, Silicon Graphics</author>
+</HEAD>
+<BODY>
+<H1> <I>This is under construction</i> </h1>
+<H1> Conservative GC Algorithmic Overview </h1>
+<P>
+This is a description of the algorithms and data structures used in our
+conservative garbage collector. I expect the level of detail to increase
+with time. For a survey of GC algorithms, see for example
+<A HREF="ftp://ftp.cs.utexas.edu/pub/garbage/gcsurvey.ps"> Paul Wilson's
+excellent paper</a>. For an overview of the collector interface,
+see <A HREF="gcinterface.html">here</a>.
+<P>
+This description is targeted primarily at someone trying to understand the
+source code. It specifically refers to variable and function names.
+It may also be useful for understanding the algorithms at a higher level.
+<P>
+The description here assumes that the collector is used in default mode.
+In particular, we assume that it is used as a garbage collector, and not just
+a leak detector. We initially assume that it is used in stop-the-world,
+non-incremental mode, though the presence of the incremental collector
+will be apparent in the design.
+We assume the default finalization model, but the code affected by that
+is very localized.
+<H2> Introduction </h2>
+The garbage collector uses a modified mark-sweep algorithm. Conceptually
+it operates roughly in four phases:
+
+<OL>
+
+<LI>
+<I>Preparation</i> Clear all mark bits, indicating that all objects
+are potentially unreachable.
+
+<LI>
+<I>Mark phase</i> Marks all objects that are reachable via chains of
+pointers from variables. Normally the collector has no real information
+about the location of pointer variables in the heap, so it
+views all static data areas, stacks and registers as potentially
+containing pointers. Any bit patterns that represent addresses inside
+heap objects managed by the collector are viewed as pointers.
+Unless the client program has made heap object layout information
+available to the collector, any heap objects found to be reachable from
+variables are again scanned similarly.
+
+<LI>
+<I>Sweep phase</i> Scans the heap for inaccessible, and hence unmarked,
+objects, and returns them to an appropriate free list for reuse.
This is +not really a separate phase; even in non incremental mode this is operation +is usually performed on demand during an allocation that discovers an empty +free list. Thus the sweep phase is very unlikely to touch a page that +would not have been touched shortly thereafter anyway. + +<LI> +<I>Finalization phase</i> Unreachable objects which had been registered +for finalization are enqueued for finalization outside the collector. + +</ol> + +<P> +The remaining sections describe the memory allocation data structures, +and then the last 3 collection phases in more detail. We conclude by +outlining some of the additional features implemented in the collector. + +<H2>Allocation</h2> +The collector includes its own memory allocator. The allocator obtains +memory from the system in a platform-dependent way. Under UNIX, it +uses either <TT>malloc</tt>, <TT>sbrk</tt>, or <TT>mmap</tt>. +<P> +Most static data used by the allocator, as well as that needed by the +rest of the garbage collector is stored inside the +<TT>_GC_arrays</tt> structure. +This allows the garbage collector to easily ignore the collectors own +data structures when it searches for root pointers. Other allocator +and collector internal data structures are allocated dynamically +with <TT>GC_scratch_alloc</tt>. <TT>GC_scratch_alloc</tt> does not +allow for deallocation, and is therefore used only for permanent data +structures. +<P> +The allocator allocates objects of different <I>kinds</i>. +Different kinds are handled somewhat differently by certain parts +of the garbage collector. Certain kinds are scanned for pointers, +others are not. Some may have per-object type descriptors that +determine pointer locations. Or a specific kind may correspond +to one specific object layout. Two built-in kinds are uncollectable. +One (<TT>STUBBORN</tt>) is immutable without special precautions. +In spite of that, it is very likely that most applications currently +use at most two kinds: <TT>NORMAL</tt> and <TT>PTRFREE</tt> objects. +<P> +The collector uses a two level allocator. A large block is defined to +be one larger than half of <TT>HBLKSIZE</tt>, which is a power of 2, +typically on the order of the page size. +<P> +Large block sizes are rounded up to +the next multiple of <TT>HBLKSIZE</tt> and then allocated by +<TT>GC_allochblk</tt>. This uses roughly what Paul Wilson has termed +a "next fit" algorithm, i.e. first-fit with a rotating pointer. +The implementation does check for a better fitting immediately +adjacent block, which gives it somewhat better fragmentation characteristics. +I'm now convinced it should use a best fit algorithm. The actual +implementation of <TT>GC_allochblk</tt> +is significantly complicated by black-listing issues +(see below). +<P> +Small blocks are allocated in blocks of size <TT>HBLKSIZE</tt>. +Each block is +dedicated to only one object size and kind. The allocator maintains +separate free lists for each size and kind of object. +<P> +In order to avoid allocating blocks for too many distinct object sizes, +the collector normally does not directly allocate objects of every possible +request size. Instead request are rounded up to one of a smaller number +of allocated sizes, for which free lists are maintained. The exact +allocated sizes are computed on demand, but subject to the constraint +that they increase roughly in geometric progression. Thus objects +requested early in the execution are likely to be allocated with exactly +the requested size, subject to alignment constraints. 
+See <TT>GC_init_size_map</tt> for details.
+<P>
+The actual size rounding operation during small object allocation is
+implemented as a table lookup in <TT>GC_size_map</tt>.
+<P>
+Both collector initialization and computation of allocated sizes are
+handled carefully so that they do not slow down the small object fast
+allocation path. An attempt to allocate before the collector is initialized,
+or before the appropriate <TT>GC_size_map</tt> entry is computed,
+will take the same path as an allocation attempt with an empty free list.
+This results in a call to the slow path code (<TT>GC_generic_malloc_inner</tt>)
+which performs the appropriate initialization checks.
+<P>
+In non-incremental mode, we make a decision about whether to garbage collect
+whenever an allocation would otherwise have failed with the current heap size.
+If the total amount of allocation since the last collection is less than
+the heap size divided by <TT>GC_free_space_divisor</tt>, we try to
+expand the heap. Otherwise, we initiate a garbage collection. This ensures
+that the amount of garbage collection work per allocated byte remains
+constant.
+<P>
+The above is in fact an oversimplification of the real heap expansion
+heuristic, which adjusts slightly for root size and certain kinds of
+fragmentation. In particular, programs with a large root set size and
+little live heap memory will expand the heap to amortize the cost of
+scanning the roots.
+<P>
+Versions 5.x of the collector actually collect more frequently in
+non-incremental mode. The large block allocator usually refuses to split
+large heap blocks once the garbage collection threshold is
+reached. This often has the effect of collecting well before the
+heap fills up, thus reducing fragmentation and working set size at the
+expense of GC time. 6.x will choose an intermediate strategy depending
+on how much large object allocation has taken place in the past.
+(If the collector is configured to unmap unused pages, versions 6.x
+will use the 5.x strategy.)
+<P>
+(It has been suggested that this should be adjusted so that we favor
+expansion if the resulting heap still fits into physical memory.
+In many cases, that would no doubt help. But it is tricky to do this
+in a way that remains robust if multiple applications are contending
+for a single pool of physical memory.)
+
+<H2>Mark phase</h2>
+
+The marker maintains an explicit stack of memory regions that are known
+to be accessible, but that have not yet been searched for contained pointers.
+Each stack entry contains the starting address of the block to be scanned,
+as well as a descriptor of the block. If no layout information is
+available for the block, then the descriptor is simply a length.
+(For other possibilities, see <TT>gc_mark.h</tt>.)
+<P>
+At the beginning of the mark phase, all root segments are pushed on the
+stack by <TT>GC_push_roots</tt>. If <TT>ALL_INTERIOR_PTRS</tt> is not
+defined, then stack roots require special treatment. In this case, the
+normal marking code ignores interior pointers, but <TT>GC_push_all_stack</tt>
+explicitly checks for interior pointers and pushes descriptors for target
+objects.
+<P>
+The marker is structured to allow incremental marking.
+Each call to <TT>GC_mark_some</tt> performs a small amount of
+work towards marking the heap.
+It maintains
+explicit state in the form of <TT>GC_mark_state</tt>, which
+identifies a particular sub-phase. Some other pieces of state, most
+notably the mark stack, identify how much work remains to be done
+in each sub-phase.
The normal progression of mark states for
+a stop-the-world collection is:
+<OL>
+<LI> <TT>MS_INVALID</tt> indicating that there may be accessible unmarked
+objects. In this case <TT>GC_objects_are_marked</tt> will simultaneously
+be false, so the mark state is advanced to
+<LI> <TT>MS_PUSH_UNCOLLECTABLE</tt> indicating that it suffices to push
+uncollectable objects, roots, and then mark everything reachable from them.
+<TT>Scan_ptr</tt> is advanced through the heap until all uncollectable
+objects are pushed, and objects reachable from them are marked.
+At that point, the next call to <TT>GC_mark_some</tt> calls
+<TT>GC_push_roots</tt> to push the roots. It then advances the
+mark state to
+<LI> <TT>MS_ROOTS_PUSHED</tt> asserting that once the mark stack is
+empty, all reachable objects are marked. Once in this state, we work
+only on emptying the mark stack. Once this is completed, the state
+changes to
+<LI> <TT>MS_NONE</tt> indicating that reachable objects are marked.
+</ol>
+
+The core mark routine, <TT>GC_mark_from_mark_stack</tt>, is called
+repeatedly by several of the sub-phases when the mark stack starts to fill
+up. It is also called repeatedly in <TT>MS_ROOTS_PUSHED</tt> state
+to empty the mark stack.
+The routine is designed to only perform a limited amount of marking at
+each call, so that it can also be used by the incremental collector.
+It is fairly carefully tuned, since it usually consumes a large majority
+of the garbage collection time.
+<P>
+The marker correctly handles mark stack overflows. Whenever the mark stack
+overflows, the mark state is reset to <TT>MS_INVALID</tt>.
+Since there are already marked objects in the heap,
+this eventually forces a complete
+scan of the heap, searching for pointers, during which any unmarked objects
+referenced by marked objects are again pushed on the mark stack. This
+process is repeated until the mark phase completes without a stack overflow.
+Each time the stack overflows, an attempt is made to grow the mark stack.
+All pieces of the collector that push regions onto the mark stack have to be
+careful to ensure forward progress, even in case of repeated mark stack
+overflows. Every mark attempt results in additional marked objects.
+<P>
+Each mark stack entry is processed by examining all candidate pointers
+in the range described by the entry. If the region has no associated
+type information, then this typically requires that each 4-byte aligned
+quantity (8-byte aligned with 64-bit pointers) be considered a candidate
+pointer.
+<P>
+We determine whether a candidate pointer is actually the address of
+a heap block. This is done in the following steps:
+<NL>
+<LI> The candidate pointer is checked against rough heap bounds.
+These heap bounds are maintained such that all actual heap objects
+fall between them. In order to facilitate black-listing (see below)
+we also include address regions that the heap is likely to expand into.
+Most non-pointers fail this initial test.
+<LI> The candidate pointer is divided into two pieces; the most significant
+bits identify a <TT>HBLKSIZE</tt>-sized page in the address space, and
+the least significant bits specify an offset within that page.
+(A hardware page may actually consist of multiple such pages.
+HBLKSIZE is usually the page size divided by a small power of two.)
+<LI>
+The page address part of the candidate pointer is looked up in a
+<A HREF="tree.html">table</a>.
+Each table entry contains either 0, indicating that the page is not part +of the garbage collected heap, a small integer <I>n</i>, indicating +that the page is part of large object, starting at least <I>n</i> pages +back, or a pointer to a descriptor for the page. In the first case, +the candidate pointer i not a true pointer and can be safely ignored. +In the last two cases, we can obtain a descriptor for the page containing +the beginning of the object. +<LI> +The starting address of the referenced object is computed. +The page descriptor contains the size of the object(s) +in that page, the object kind, and the necessary mark bits for those +objects. The size information can be used to map the candidate pointer +to the object starting address. To accelerate this process, the page header +also contains a pointer to a precomputed map of page offsets to displacements +from the beginning of an object. The use of this map avoids a +potentially slow integer remainder operation in computing the object +start address. +<LI> +The mark bit for the target object is checked and set. If the object +was previously unmarked, the object is pushed on the mark stack. +The descriptor is read from the page descriptor. (This is computed +from information <TT>GC_obj_kinds</tt> when the page is first allocated.) +</nl> +<P> +At the end of the mark phase, mark bits for left-over free lists are cleared, +in case a free list was accidentally marked due to a stray pointer. + +<H2>Sweep phase</h2> + +At the end of the mark phase, all blocks in the heap are examined. +Unmarked large objects are immediately returned to the large object free list. +Each small object page is checked to see if all mark bits are clear. +If so, the entire page is returned to the large object free list. +Small object pages containing some reachable object are queued for later +sweeping. +<P> +This initial sweep pass touches only block headers, not +the blocks themselves. Thus it does not require significant paging, even +if large sections of the heap are not in physical memory. +<P> +Nonempty small object pages are swept when an allocation attempt +encounters an empty free list for that object size and kind. +Pages for the correct size and kind are repeatedly swept until at +least one empty block is found. Sweeping such a page involves +scanning the mark bit array in the page header, and building a free +list linked through the first words in the objects themselves. +This does involve touching the appropriate data page, but in most cases +it will be touched only just before it is used for allocation. +Hence any paging is essentially unavoidable. +<P> +Except in the case of pointer-free objects, we maintain the invariant +that any object in a small object free list is cleared (except possibly +for the link field). Thus it becomes the burden of the small object +sweep routine to clear objects. This has the advantage that we can +easily recover from accidentally marking a free list, though that could +also be handled by other means. The collector currently spends a fair +amount of time clearing objects, and this approach should probably be +revisited. +<P> +In most configurations, we use specialized sweep routines to handle common +small object sizes. Since we allocate one mark bit per word, it becomes +easier to examine the relevant mark bits if the object size divides +the word length evenly. We also suitably unroll the inner sweep loop +in each case. 
(It is conceivable that profile-based procedure cloning +in the compiler could make this unnecessary and counterproductive. I +know of no existing compiler to which this applies.) +<P> +The sweeping of small object pages could be avoided completely at the expense +of examining mark bits directly in the allocator. This would probably +be more expensive, since each allocation call would have to reload +a large amount of state (e.g. next object address to be swept, position +in mark bit table) before it could do its work. The current scheme +keeps the allocator simple and allows useful optimizations in the sweeper. + +<H2>Finalization</h2> +Both <TT>GC_register_disappearing_link</tt> and +<TT>GC_register_finalizer</tt> add the request to a corresponding hash +table. The hash table is allocated out of collected memory, but +the reference to the finalizable object is hidden from the collector. +Currently finalization requests are processed non-incrementally at the +end of a mark cycle. +<P> +The collector makes an initial pass over the table of finalizable objects, +pushing the contents of unmarked objects onto the mark stack. +After pushing each object, the marker is invoked to mark all objects +reachable from it. The object itself is not explicitly marked. +This assures that objects on which a finalizer depends are neither +collected nor finalized. +<P> +If in the process of marking from an object the +object itself becomes marked, we have uncovered +a cycle involving the object. This usually results in a warning from the +collector. Such objects are not finalized, since it may be +unsafe to do so. See the more detailed +<A HREF="finalization.html"> discussion of finalization semantics</a>. +<P> +Any objects remaining unmarked at the end of this process are added to +a queue of objects whose finalizers can be run. Depending on collector +configuration, finalizers are dequeued and run either implicitly during +allocation calls, or explicitly in response to a user request. +<P> +The collector provides a mechanism for replacing the procedure that is +used to mark through objects. This is used both to provide support for +Java-style unordered finalization, and to ignore certain kinds of cycles, +<I>e.g.</i> those arising from C++ implementations of virtual inheritance. + +<H2>Generational Collection and Dirty Bits</h2> +We basically use the parallel and generational GC algorithm described in +<A HREF="papers/pldi91.ps.gz">"Mostly Parallel Garbage Collection"</a>, +by Boehm, Demers, and Shenker. +<P> +The most significant modification is that +the collector always runs in the allocating thread. +There is no separate garbage collector thread. +If an allocation attempt either requests a large object, or encounters +an empty small object free list, and notices that there is a collection +in progress, it immediately performs a small amount of marking work +as described above. +<P> +This change was made both because we wanted to easily accommodate +single-threaded environments, and because a separate GC thread requires +very careful control over the scheduler to prevent the mutator from +out-running the collector, and hence provoking unneeded heap growth. +<P> +In incremental mode, the heap is always expanded when we encounter +insufficient space for an allocation. Garbage collection is triggered +whenever we notice that more than +<TT>GC_heap_size</tt>/2 * <TT>GC_free_space_divisor</tt> +bytes of allocation have taken place. 
+After <TT>GC_full_freq</tt> minor collections a major collection +is started. +<P> +All collections initially run interrupted until a predetermined +amount of time (50 msecs by default) has expired. If this allows +the collection to complete entirely, we can avoid correcting +for data structure modifications during the collection. If it does +not complete, we return control to the mutator, and perform small +amounts of additional GC work during those later allocations that +cannot be satisfied from small object free lists. When marking completes, +the set of modified pages is retrieved, and we mark once again from +marked objects on those pages, this time with the mutator stopped. +<P> +We keep track of modified pages using one of three distinct mechanisms: +<OL> +<LI> +Through explicit mutator cooperation. Currently this requires +the use of <TT>GC_malloc_stubborn</tt>. +<LI> +By write-protecting physical pages and catching write faults. This is +implemented for many Unix-like systems and for win32. It is not possible +in a few environments. +<LI> +By retrieving dirty bit information from /proc. (Currently only Sun's +Solaris supports this. Though this is considerably cleaner, performance +may actually be better with mprotect and signals.) +</ol> + +<H2>Thread support</h2> +We support several different threading models. Unfortunately Pthreads, +the only reasonably well standardized thread model, supports too narrow +an interface for conservative garbage collection. There appears to be +no portable way to allow the collector to coexist with various Pthreads +implementations. Hence we currently support only a few of the more +common Pthreads implementations. +<P> +In particular, it is very difficult for the collector to stop all other +threads in the system and examine the register contents. This is currently +accomplished with very different mechanisms for different Pthreads +implementations. The Solaris implementation temporarily disables much +of the user-level threads implementation by stopping kernel-level threads +("lwp"s). The Irix implementation sends signals to individual Pthreads +and has them wait in the signal handler. The Linux implementation +is similar in spirit to the Irix one. +<P> +The Irix implementation uses +only documented Pthreads calls, but relies on extensions to their semantics, +notably the use of mutexes and condition variables from signal +handlers. The Linux implementation should be far closer to +portable, though impirically it is not completely portable. +<P> +All implementations must +intercept thread creation and a few other thread-specific calls to allow +enumeration of threads and location of thread stacks. This is current +accomplished with <TT># define</tt>'s in <TT>gc.h</tt>, or optionally +by using ld's function call wrapping mechanism under Linux. +<P> +Comments are appreciated. Please send mail to +<A HREF="mailto:boehm@acm.org"><TT>boehm@acm.org</tt></a> +</body> diff --git a/boehm-gc/doc/tree.html b/boehm-gc/doc/tree.html new file mode 100644 index 0000000..89c515d --- /dev/null +++ b/boehm-gc/doc/tree.html @@ -0,0 +1,198 @@ +<HTML> +<HEAD> + <TITLE> Two-Level Tree Structure for Fast Pointer Lookup</TITLE> + <AUTHOR> Hans-J. Boehm, Silicon Graphics</author> +</HEAD> +<BODY> +<H1>Two-Level Tree Structure for Fast Pointer Lookup</h1> +<P> +The conservative garbage collector described +<A HREF="gc.html">here</a> uses a 2-level tree +data structure to aid in fast pointer identification. 
+This data structure is described in a bit more detail here, since
+<OL>
+<LI> Variations of the data structure are more generally useful.
+<LI> It appears to be hard to understand by reading the code.
+<LI> Some other collectors appear to use inferior data structures to
+solve the same problem.
+<LI> It is central to fast collector operation.
+</ol>
+A candidate pointer is divided into three sections, the <I>high</i>,
+<I>middle</i>, and <I>low</i> bits. The exact division between these
+three groups of bits is dependent on the detailed collector configuration.
+<P>
+The high and middle bits are used to look up an entry in the table described
+here. The resulting table entry consists of either a block descriptor
+(<TT>struct hblkhdr *</tt> or <TT>hdr *</tt>)
+identifying the layout of objects in the block, or an indication that this
+address range corresponds to the middle of a large block, together with a
+hint for locating the actual block descriptor. Such a hint consists
+of a displacement that can be subtracted from the middle bits of the candidate
+pointer without leaving the object.
+<P>
+In either case, the block descriptor (<TT>struct hblkhdr</tt>)
+refers to a table of object starting addresses (the <TT>hb_map</tt> field).
+The starting address table is indexed by the low bits of the candidate pointer.
+The resulting entry contains a displacement to the beginning of the object,
+or an indication that this cannot be a valid object pointer.
+(If all interior pointers are recognized, pointers into large objects
+are handled specially, as appropriate.)
+
+<H2>The Tree</h2>
+<P>
+The rest of this discussion focuses on the two-level data structure
+used to map the high and middle bits to the block descriptor.
+<P>
+The high bits are used as an index into the <TT>GC_top_index</tt> (really
+<TT>GC_arrays._top_index</tt>) array. Each entry points to a
+<TT>bottom_index</tt> data structure. This structure in turn consists
+mostly of an array <TT>index</tt> indexed by the middle bits of
+the candidate pointer. The <TT>index</tt> array contains the actual
+<TT>hdr</tt> pointers.
+<P>
+Thus a pointer lookup consists primarily of a handful of memory references,
+and can be quite fast:
+<OL>
+<LI> The appropriate <TT>bottom_index</tt> pointer is looked up in
+<TT>GC_top_index</tt>, based on the high bits of the candidate pointer.
+<LI> The appropriate <TT>hdr</tt> pointer is looked up in the
+<TT>bottom_index</tt> structure, based on the middle bits.
+<LI> The block layout map pointer is retrieved from the <TT>hdr</tt>
+structure. (This memory reference is necessary since we try to share
+block layout maps.)
+<LI> The displacement to the beginning of the object is retrieved from the
+above map.
+</ol>
+<P>
+In order to conserve space, not all <TT>GC_top_index</tt> entries in fact
+point to distinct <TT>bottom_index</tt> structures. If no address with
+the corresponding high bits is part of the heap, then the entry points
+to <TT>GC_all_nils</tt>, a single <TT>bottom_index</tt> structure consisting
+only of NULL <TT>hdr</tt> pointers.
+<P>
+<TT>Bottom_index</tt> structures contain slightly more information than
+just <TT>hdr</tt> pointers. The <TT>asc_link</tt> field is used to link
+all <TT>bottom_index</tt> structures in ascending order for fast traversal.
+This list is pointed to by <TT>GC_all_bottom_indices</tt>.
+It is maintained with the aid of a <TT>key</tt> field that contains the
+high bits corresponding to the <TT>bottom_index</tt>.
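+<P>
+The following is an illustrative C sketch of that lookup, not the actual
+macros from the collector's private headers; the function name
+<TT>lookup_hdr</tt>, the constants, and the simplified types are hypothetical
+stand-ins for the real configuration-dependent definitions:
+<PRE>
+#define LOG_HBLKSIZE  13      /* example value; low bits select a byte     */
+                              /* within a heap block                       */
+#define LOG_BOTTOM_SZ 10      /* example value; middle bits index into a   */
+                              /* bottom_index                              */
+#define LOG_TOP_SZ    11      /* example value; high bits index into the   */
+                              /* top level table                           */
+
+struct hdr;                   /* block descriptor, as described above      */
+struct bottom_index {
+    struct hdr *index[1 << LOG_BOTTOM_SZ];   /* indexed by the middle bits */
+};
+
+/* Sketch of a pointer lookup: the high bits select a bottom_index (unused */
+/* address ranges all share a single all-NULL structure, GC_all_nils), and */
+/* the middle bits then select the hdr pointer.  A NULL result means the   */
+/* page is not part of the garbage collected heap.  The real code also     */
+/* encodes "middle of a large block" hints, which are omitted here, and on */
+/* 64 bit machines the top level is really a hash table (see below); the   */
+/* mask on the high bits stands in for that.                               */
+struct hdr * lookup_hdr(void *p, struct bottom_index *top[1 << LOG_TOP_SZ])
+{
+    unsigned long addr = (unsigned long)p;
+    unsigned long hi  = (addr >> (LOG_HBLKSIZE + LOG_BOTTOM_SZ))
+                        & ((1UL << LOG_TOP_SZ) - 1);
+    unsigned long mid = (addr >> LOG_HBLKSIZE)
+                        & ((1UL << LOG_BOTTOM_SZ) - 1);
+
+    return top[hi]->index[mid];
+}
+</pre>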
+ +<H2>64 bit addresses</h2> +<P> +In the case of 64 bit addresses, this picture is complicated slightly +by the fact that one of the index structures would have to be huge to +cover the entire address space with a two level tree. We deal with this +by turning <TT>GC_top_index</tt> into a chained hash table, instead of +a simple array. This adds a <TT>hash_link</tt> field to the +<TT>bottom_index</tt> structure. +<P> +The "hash function" consists of dropping the high bits. This is cheap to +compute, and guarantees that there will be no collisions if the heap +is contiguous and not excessively large. + +<H2>A picture</h2> +<P> +The following is an ASCII diagram of the data structure. +This was contributed by Dave Barrett several years ago. +<PRE> + + Data Structure used by GC_base in gc3.7: + 21-Apr-94 + + + + + 63 LOG_TOP_SZ[11] LOG_BOTTOM_SZ[10] LOG_HBLKSIZE[13] + +------------------+----------------+------------------+------------------+ + p:| | TL_HASH(hi) | | HBLKDISPL(p) | + +------------------+----------------+------------------+------------------+ + \-----------------------HBLKPTR(p)-------------------/ + \------------hi-------------------/ + \______ ________/ \________ _______/ \________ _______/ + V V V + | | | + GC_top_index[] | | | + --- +--------------+ | | | + ^ | | | | | + | | | | | | + TOP +--------------+<--+ | | + _SZ +-<| [] | * | | +(items)| +--------------+ if 0 < bi< HBLKSIZE | | + | | | | then large object | | + | | | | starts at the bi'th | | + v | | | HBLK before p. | i | + --- | +--------------+ | (word- | + v | aligned) | + bi= |GET_BI(p){->hash_link}->key==hi | | + v | | + | (bottom_index) \ scratch_alloc'd | | + | ( struct bi ) / by get_index() | | + --- +->+--------------+ | | + ^ | | | | + ^ | | | | + BOTTOM | | ha=GET_HDR_ADDR(p) | | +_SZ(items)+--------------+<----------------------+ +-------+ + | +--<| index[] | | + | | +--------------+ GC_obj_map: v + | | | | from / +-+-+-----+-+-+-+-+ --- + v | | | GC_add < 0| | | | | | | | ^ + --- | +--------------+ _map_entry \ +-+-+-----+-+-+-+-+ | + | | asc_link | +-+-+-----+-+-+-+-+ MAXOBJSZ + | +--------------+ +-->| | | j | | | | | +1 + | | key | | +-+-+-----+-+-+-+-+ | + | +--------------+ | +-+-+-----+-+-+-+-+ | + | | hash_link | | | | | | | | | | v + | +--------------+ | +-+-+-----+-+-+-+-+ --- + | | |<--MAX_OFFSET--->| + | | (bytes) +HDR(p)| GC_find_header(p) | |<--MAP_ENTRIES-->| + | \ from | =HBLKSIZE/WORDSZ + | (hdr) (struct hblkhdr) / alloc_hdr() | (1024 on Alpha) + +-->+----------------------+ | (8/16 bits each) +GET_HDR(p)| word hb_sz (words) | | + +----------------------+ | + | struct hblk *hb_next | | + +----------------------+ | + |mark_proc hb_mark_proc| | + +----------------------+ | + | char * hb_map |>-------------+ + +----------------------+ + | ushort hb_obj_kind | + +----------------------+ + | hb_last_reclaimed | + --- +----------------------+ + ^ | | + MARK_BITS| hb_marks[] | *if hdr is free, hb_sz + DISCARD_WORDS +_SZ(words)| | is the size of a heap chunk (struct hblk) + v | | of at least MININCR*HBLKSIZE bytes (below), + --- +----------------------+ otherwise, size of each object in chunk. + +Dynamic data structures above are interleaved throughout the heap in blocks of +size MININCR * HBLKSIZE bytes as done by gc_scratch_alloc which cannot be +freed; free lists are used (e.g. alloc_hdr). HBLK's below are collected. 
+ + (struct hblk) + --- +----------------------+ < HBLKSIZE --- --- DISCARD_ + ^ |garbage[DISCARD_WORDS]| aligned ^ ^ HDR_BYTES WORDS + | | | | v (bytes) (words) + | +-----hb_body----------+ < WORDSZ | --- --- + | | | aligned | ^ ^ + | | Object 0 | | hb_sz | + | | | i |(word- (words)| + | | | (bytes)|aligned) v | + | + - - - - - - - - - - -+ --- | --- | + | | | ^ | ^ | + n * | | j (words) | hb_sz BODY_SZ + HBLKSIZE | Object 1 | v v | (words) + (bytes) | |--------------- v MAX_OFFSET + | + - - - - - - - - - - -+ --- (bytes) + | | | !All_INTERIOR_PTRS ^ | + | | | sets j only for hb_sz | + | | Object N | valid object offsets. | | + v | | All objects WORDSZ v v + --- +----------------------+ aligned. --- --- + +DISCARD_WORDS is normally zero. Indeed the collector has not been tested +with another value in ages. +</pre> +</body> |
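+<P>
+As an illustrative companion to the diagram above, and not the actual
+<TT>GC_base</tt> code, the final step of mapping a candidate pointer to an
+object base via <TT>hb_map</tt> might be sketched as follows; the names
+<TT>object_base</tt> and <TT>NOT_AN_OBJECT</tt>, the exact map encoding, and
+the constants are assumptions for illustration only:
+<PRE>
+#define LOG_HBLKSIZE 13                       /* example value              */
+#define HBLKSIZE     (1 << LOG_HBLKSIZE)
+#define WORDSZ       (sizeof(void *))
+#define NOT_AN_OBJECT 0x7f                    /* hypothetical invalid marker */
+
+struct hblkhdr {
+    unsigned long hb_sz;   /* size of each object in this chunk (words)     */
+    signed char  *hb_map;  /* word offset in block -> displacement (words)  */
+                           /* from the start of the enclosing object        */
+};
+
+/* Given a candidate pointer p and the header of its (small object) block,  */
+/* recover the object start, or return 0 if p is not a recognized offset.   */
+/* The real code also handles large (multi-block) objects and blocks        */
+/* without a map.                                                           */
+void * object_base(void *p, const struct hblkhdr *hhdr)
+{
+    unsigned long displ = (unsigned long)p & (HBLKSIZE - 1);   /* low bits  */
+    signed char back;                        /* words back to object start  */
+
+    back = hhdr->hb_map[displ / WORDSZ];
+    if (back == NOT_AN_OBJECT) return 0;
+    return (char *)p - (displ % WORDSZ) - (unsigned long)back * WORDSZ;
+}
+</pre>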