Diffstat (limited to 'gcc/dwarfout.c')
-rw-r--r-- | gcc/dwarfout.c | 542 |
1 files changed, 542 insertions, 0 deletions
diff --git a/gcc/dwarfout.c b/gcc/dwarfout.c
index 7db0def..f319187 100644
--- a/gcc/dwarfout.c
+++ b/gcc/dwarfout.c
@@ -20,6 +20,548 @@ along with GNU CC; see the file COPYING.  If not, write to
 the Free Software Foundation, 59 Temple Place - Suite 330,
 Boston, MA 02111-1307, USA.  */
 
+/*
+
+   Notes on the GNU Implementation of DWARF Debugging Information
+   --------------------------------------------------------------
+   Last Major Update: Sun Jul 17 08:17:42 PDT 1994 by rfg@segfault.us.com
+   ------------------------------------------------------------
+
+   This file describes special and unique aspects of the GNU implementation of
+   the DWARF Version 1 debugging information language, as provided in the GNU
+   version 2.x compiler(s).
+
+   For general information about the DWARF debugging information language,
+   you should obtain the DWARF version 1.1 specification document (and perhaps
+   also the DWARF version 2 draft specification document) developed by the
+   (now defunct) UNIX International Programming Languages Special Interest Group.
+
+   To obtain a copy of the DWARF Version 1 and/or DWARF Version 2
+   specification, visit the web page for the DWARF Version 2 committee, at
+
+       http://www.eagercon.com/dwarf/dwarf2std.htm
+
+   The generation of DWARF debugging information by the GNU version 2.x C
+   compiler has now been tested rather extensively for m88k, i386, i860, and
+   Sparc targets.  The DWARF output of the GNU C compiler appears to
+   interoperate well with the standard SVR4 SDB debugger on these kinds of
+   target systems (but of course, there are no guarantees).
+
+   DWARF 1 generation for the GNU g++ compiler is implemented, but limited.
+   C++ users should definitely use DWARF 2 instead.
+
+   Future plans for the dwarfout.c module of the GNU compiler(s) include the
+   addition of full support for GNU FORTRAN.  (This should, in theory, be a
+   lot simpler to add than adding support for g++... but we'll see.)
+
+   Many features of the DWARF version 2 specification have been adapted to
+   (and used in) the GNU implementation of DWARF (version 1).  In most of
+   these cases, a DWARF version 2 approach is used in place of (or in addition
+   to) DWARF version 1 stuff simply because it is apparent that DWARF version
+   1 is not sufficiently expressive to provide the kinds of information which
+   may be necessary to support really robust debugging.  In all of these cases,
+   however, the use of DWARF version 2 features should not interfere in any
+   way with the interoperability (of GNU compilers) with generally available
+   "classic" (pre version 1) DWARF consumer tools (e.g. SVR4 SDB).
+
+   The DWARF generation enhancement for the GNU compiler(s) was initially
+   donated to the Free Software Foundation by Network Computing Devices.
+   (Thanks NCD!)  Additional development and maintenance of dwarfout.c has
+   been largely supported (i.e. funded) by Intel Corporation.  (Thanks Intel!)
+
+   If you have questions or comments about the DWARF generation feature, please
+   send mail to me <rfg@netcom.com>.  I will be happy to investigate any bugs
+   reported and I may even provide fixes (but of course, I can make no promises).
+
+   The DWARF debugging information produced by GCC may deviate in a few minor
+   (but perhaps significant) respects from the DWARF debugging information
+   currently produced by other C compilers.  A serious attempt has been made,
+   however, to conform to the published specifications, to existing practice,
+   and to generally accepted norms in the GNU implementation of DWARF.
+
+       ** IMPORTANT NOTE **     ** IMPORTANT NOTE **     ** IMPORTANT NOTE **
+
+   Under normal circumstances, the DWARF information generated by the GNU
+   compilers (in an assembly language file) is essentially impossible for
+   a human being to read.  This fact can make it very difficult to debug
+   certain DWARF-related problems.
+   In order to overcome this difficulty, a feature has been added to
+   dwarfout.c (enabled by the -dA option) which causes additional comments
+   to be placed into the assembly language output file, out to the right-hand
+   side of most bits of DWARF material.  The comments indicate (far more
+   clearly than the obscure DWARF hex codes do) what is actually being encoded
+   in DWARF.  Thus, the -dA option can be highly useful for those who must
+   study the DWARF output from the GNU compilers in detail.
+
+   ---------
+
+   (Footnote: Within this file, the term `Debugging Information Entry' will
+   be abbreviated as `DIE'.)
+
+
+   Release Notes (aka known bugs)
+   -------------------------------
+
+   In one very obscure case involving dynamically sized arrays, the DWARF
+   "location information" for such an array may make it appear that the
+   array has been totally optimized out of existence, when in fact it
+   *must* actually exist.  (This only happens when you are using *both* -g
+   *and* -O.)  This is due to aggressive dead store elimination in the
+   compiler, and to the fact that the DECL_RTL expressions associated with
+   variables are not always updated to correctly reflect the effects of
+   GCC's aggressive dead store elimination.
+
+   -------------------------------
+
+   When attempting to set a breakpoint at the "start" of a function compiled
+   with -g1, the debugger currently has no way of knowing exactly where the
+   end of the prologue code for the function is.  Thus, for most targets,
+   all the debugger can do is to set the breakpoint at the AT_low_pc address
+   for the function.  But if you stop there and then try to look at one or
+   more of the formal parameter values, they may not have been "homed" yet,
+   so you may get inaccurate answers (or perhaps even addressing errors).
+
+   Some people may consider this simply a non-feature, but I consider it a
+   bug, and I hope to provide some GNU-specific attributes (on function
+   DIEs) which will specify the address of the end of the prologue and the
+   address of the beginning of the epilogue in a future release.
+
+   -------------------------------
+
+   It is believed at this time that old bugs relating to the AT_bit_offset
+   values for bit-fields have been fixed.
+
+   There may still be some very obscure bugs relating to the DWARF description
+   of type `long long' bit-fields for target machines (e.g. 80x86 machines)
+   where the alignment of type `long long' data objects is different from
+   (and less than) the size of a type `long long' data object.
+
+   Please report any problems with the DWARF description of bit-fields as you
+   would any other GCC bug.  (Procedures for bug reporting are given in the
+   GNU C compiler manual.)
+
+   --------------------------------
+
+   At this time, GCC does not know how to handle the GNU C "nested functions"
+   extension.  (See the GCC manual for more info on this extension to ANSI C.)
+
+   --------------------------------
+
+   The GNU compilers now represent inline functions (and inlined instances
+   thereof) in exactly the manner described by the current DWARF version 2
+   (draft) specification.  The version 1 specification for handling inline
+   functions (and inlined instances) was known to be brain-damaged (by the
+   PLSIG) when the version 1 spec was finalized, but it was simply too late
+   in the cycle to get it removed before the version 1 spec was formally
+   released to the public (by UI).
+
+   --------------------------------
+
+   At this time, GCC does not generate the kind of really precise information
+   about the exact declared types of entities with signed integral types which
+   is required by the current DWARF draft specification.
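As a minimal sketch of the three spellings at issue ("plain", "signed" and "unsigned"), the fragment below declares one bit-field of each kind. ANSI C fixes the signedness of the `signed` and `unsigned` fields, but leaves that of the plain `int` field to the implementation, which is why the draft specification wants the spelling itself recorded. The names `struct flags` and `all_ones` are hypothetical and exist only for this illustration.

```c
// Hypothetical example: three 3-bit fields that differ only in the
// keywords used to declare their types.
struct flags {
    int plain : 3;          // "plain": ANSI C leaves its signedness open
    signed int sgn : 3;     // "signed": always holds -4 .. 3
    unsigned int uns : 3;   // "unsigned": always holds 0 .. 7
};

// Store an all-ones bit pattern in each field and return the result,
// so a caller (or a debugger) can see how each field reads back.
static struct flags all_ones(void)
{
    struct flags f;
    f.plain = -1;  // reads back as -1 if plain fields are signed, 7 if not
    f.sgn = -1;
    f.uns = 7;
    return f;
}
```

GCC normally treats plain bit-fields as signed, and its -funsigned-bitfields option changes that, so `all_ones().plain` may legitimately read back as either -1 or 7. A consumer that only sees "signed" or "unsigned" in the DWARF output cannot tell which spelling the programmer actually used.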
+
+   Specifically, the current DWARF draft specification seems to require that
+   the type of a non-unsigned integral bit-field member of a struct or union
+   type be represented as either a "signed" type or as a "plain" type,
+   depending upon the exact set of keywords that were used in the
+   type specification for the given bit-field member.  It was felt (by the
+   UI/PLSIG) that this distinction between "plain" and "signed" integral types
+   could have some significance (in the case of bit-fields) because ANSI C
+   does not constrain the signedness of a plain bit-field, whereas it does
+   constrain the signedness of an explicitly "signed" bit-field.  For this
+   reason, the current DWARF specification calls for compilers to produce
+   type information (for *all* integral typed entities... not just bit-fields)
+   which explicitly indicates the signedness of the relevant type to be
+   "signed" or "plain" or "unsigned".
+
+   Unfortunately, the GNU DWARF implementation is currently incapable of making
+   such distinctions.
+
+   --------------------------------
+
+
+   Known Interoperability Problems
+   -------------------------------
+
+   Although the GNU implementation of DWARF conforms (for the most part) with
+   the current UI/PLSIG DWARF version 1 specification (with many compatible
+   version 2 features added in as "vendor specific extensions" just for good
+   measure), there are a few known cases where GCC's DWARF output can cause
+   some confusion for "classic" (pre version 1) DWARF consumers such as the
+   System V Release 4 SDB debugger.  These cases are described in this section.
+
+   --------------------------------
+
+   The DWARF version 1 specification includes the fundamental type codes
+   FT_ext_prec_float, FT_complex, FT_dbl_prec_complex, and FT_ext_prec_complex.
+   Since GNU C is only a C compiler (and since C doesn't provide any "complex"
+   data types), the only one of these fundamental type codes which GCC ever
+   generates is FT_ext_prec_float.
+   This fundamental type code is generated by GCC for the `long double'
+   data type.  Unfortunately, due to an apparent bug in the SVR4 SDB debugger,
+   SDB can become very confused wherever any attempt is made to print a
+   variable, parameter, or field whose type was given in terms of
+   FT_ext_prec_float.
+
+   (Actually, SVR4 SDB fails to understand *any* of the four fundamental type
+   codes mentioned here.  This fact will cause additional problems when
+   there is a GNU FORTRAN front-end.)
+
+   --------------------------------
+
+   In general, it appears that SVR4 SDB is not able to effectively ignore
+   fundamental type codes in the "implementation defined" range.  This can
+   cause problems when a program being debugged uses the `long long' data
+   type (or the signed or unsigned varieties thereof) because these types
+   are not defined by ANSI C, and thus, GCC must use its own private fundamental
+   type codes (from the implementation-defined range) to represent these types.
+
+   --------------------------------
+
+
+   General GNU DWARF extensions
+   ----------------------------
+
+   In the current DWARF version 1 specification, no mechanism is specified by
+   which accurate information about executable code from include files can be
+   properly (and fully) described.  (The DWARF version 2 specification *does*
+   specify such a mechanism, but it is about 10 times more complicated than
+   it needs to be, so I'm not terribly anxious to try to implement it right
+   away.)
+
+   In the GNU implementation of DWARF version 1, a fully downward-compatible
+   extension has been implemented which permits the GNU compilers to specify
+   which executable lines come from which files.
+   This extension places additional
+   information (about source file names) in GNU-specific sections (which
+   should be totally ignored by all non-GNU DWARF consumers) so that this
+   extended information can be provided (to GNU DWARF consumers) in a way
+   which is totally transparent (and invisible) to non-GNU DWARF consumers
+   (e.g. the SVR4 SDB debugger).  The additional information is placed *only*
+   in specialized GNU-specific sections, where it should never even be seen
+   by non-GNU DWARF consumers.
+
+   To understand this GNU DWARF extension, imagine that the sequence of entries
+   in the .line section is broken up into several subsections.  Each contiguous
+   sequence of .line entries which relates to a sequence of lines (or statements)
+   from one particular file (either a `base' file or an `include' file) could
+   be called a `line entries chunk' (LEC).
+
+   For each LEC there is one entry in the .debug_srcinfo section.
+
+   Each normal entry in the .debug_srcinfo section consists of two 4-byte
+   words of data as follows:
+
+       (1) The starting address (relative to the entire .line section)
+           of the first .line entry in the relevant LEC.
+
+       (2) The starting address (relative to the entire .debug_sfnames
+           section) of a NUL terminated string representing the
+           relevant filename.  (This filename may be either a
+           relative or an absolute filename, depending upon how the
+           given source file was located during compilation.)
+
+   Obviously, each .debug_srcinfo entry allows you to find the relevant filename,
+   and it also points you to the first .line entry that was generated as a result
+   of having compiled a given source line from the given source file.
+
+   Each subsequent .line entry should also be assumed to have been produced
+   as a result of compiling yet more lines from the same file.  The end of
+   any given LEC is easily found by looking at the first 4-byte pointer in
+   the *next* .debug_srcinfo entry.
+   That next .debug_srcinfo entry points to a
+   new and different LEC, so the preceding LEC (implicitly) must have ended
+   with the last .line section entry, which begins 2 1/2 words (10 bytes)
+   before the address given in the first pointer of the new .debug_srcinfo
+   entry.
+
+   The following picture may help to clarify this feature.  Let's assume that
+   `LE' stands for `.line entry'.  Also, assume that `*' stands for a pointer.
+
+
+   .line section   .debug_srcinfo section   .debug_sfnames section
+   ----------------------------------------------------------------
+
+   LE <------------ *
+   LE               * ----------> "foobar.c" <---
+   LE                                           |
+   LE                                           |
+   LE <------------ *                           |
+   LE               * ----------> "foobar.h" <| |
+   LE                                         | |
+   LE                                         | |
+   LE <------------ *                         | |
+   LE               * ----------> "inner.h"   | |
+   LE                                         | |
+   LE <------------ *                         | |
+   LE               * ------------------------- |
+   LE                                           |
+   LE                                           |
+   LE                                           |
+   LE                                           |
+   LE <------------ *                           |
+   LE               * ---------------------------
+   LE
+   LE
+   LE
+
+   In effect, each entry in the .debug_srcinfo section points to *both* a
+   filename (in the .debug_sfnames section) and to the start of a block of
+   consecutive LEs (in the .line section).
+
+   Note that just like in the .line section, there are specialized first and
+   last entries in the .debug_srcinfo section for each object file.  These
+   special first and last entries for the .debug_srcinfo section are very
+   different from the normal .debug_srcinfo section entries.  They provide
+   additional information which may be helpful to a debugger when it is
+   interpreting the data in the .debug_srcinfo, .debug_sfnames, and .line
+   sections.
+
+   The first entry in the .debug_srcinfo section for each compilation unit
+   consists of five 4-byte words of data.  The contents of these five words
+   should be interpreted (by debuggers) as follows:
+
+       (1) The starting address (relative to the entire .line section)
+           of the .line section for this compilation unit.
+
+       (2) The starting address (relative to the entire .debug_sfnames
+           section) of the .debug_sfnames section for this compilation
+           unit.
+
+       (3) The starting address (in the execution virtual address space)
+           of the .text section for this compilation unit.
+
+       (4) The ending address plus one (in the execution virtual address
+           space) of the .text section for this compilation unit.
+
+       (5) The date/time (in seconds since midnight 1/1/70) at which the
+           compilation of this compilation unit occurred.  This value
+           should be interpreted as an unsigned quantity because gcc
+           might be configured to generate a default value of 0xffffffff
+           in this field (in cases where it is desired to have object
+           files created at different times from identical source files
+           be byte-for-byte identical).  By default, these timestamps
+           are *not* generated by dwarfout.c (so that object files
+           compiled at different times will be byte-for-byte identical).
+           If you wish to enable this "timestamp" feature, however, you
+           can simply place a #define for the symbol `DWARF_TIMESTAMPS'
+           in your target configuration file and then rebuild the GNU
+           compiler(s).
+
+   Note that the first string placed into the .debug_sfnames section for each
+   compilation unit is the name of the directory in which compilation occurred.
+   This string ends with a `/' (to help indicate that it is the pathname of a
+   directory).  Thus, the second word of each specialized initial .debug_srcinfo
+   entry for each compilation unit may be used as a pointer to the (string)
+   name of the compilation directory, and that string may in turn be used to
+   "absolutize" any relative pathnames which may appear later on in the
+   .debug_sfnames section entries for the same compilation unit.
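A minimal sketch of how a GNU DWARF consumer might walk one compilation unit's .debug_srcinfo entries follows. It assumes the unit's .debug_srcinfo bytes (starting at its five-word header) and the whole .debug_sfnames section are already in memory, and that 4-byte words are little-endian; real object files use the target's byte order. Each LEC is taken to run from its own .line offset up to the .line offset in the next entry, with .line entries 10 bytes (2 1/2 words) long, and the walk stops at the terminating entry whose second word is 0xffffffff. The names `read_word`, `absolutize`, `walk_srcinfo` and `struct lec` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SRCINFO_HEADER_SIZE 20u   // five 4-byte words
#define SRCINFO_ENTRY_SIZE  8u    // two 4-byte words
#define LINE_ENTRY_SIZE     10u   // one .line entry: 2 1/2 words
#define SRCINFO_END 0xffffffffu   // second word of the terminating entry

// One line entries chunk (LEC) recovered from .debug_srcinfo.
struct lec {
    uint32_t line_off;   // offset of the LEC's first .line entry
    uint32_t n_entries;  // number of .line entries in the LEC
    const char *name;    // file name string from .debug_sfnames
};

// ASSUMPTION: little-endian words; a real reader would use the byte
// order of the target the object file was compiled for.
static uint32_t read_word(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Prefix a relative file name with the compilation directory (which
// already ends in '/'); absolute names are copied unchanged.
// Returns 0 on success, -1 if 'out' is too small.
static int absolutize(const char *dir, const char *name,
                      char *out, size_t outlen)
{
    size_t dlen = (name[0] == '/') ? 0 : strlen(dir);
    size_t nlen = strlen(name);
    if (dlen + nlen + 1 > outlen)
        return -1;
    memcpy(out, dir, dlen);
    memcpy(out + dlen, name, nlen + 1);   // copies the trailing NUL too
    return 0;
}

// Walk one unit's entries.  Stores the compilation directory in *dir and
// up to 'max' LECs in 'out'.  Returns the number of LECs, or -1 if the
// data is truncated, 'out' is too small, or no terminator is found.
static int walk_srcinfo(const unsigned char *si, size_t len,
                        const char *sfnames, const char **dir,
                        struct lec *out, int max)
{
    if (len < SRCINFO_HEADER_SIZE)
        return -1;
    // Header word (2) locates this unit's first .debug_sfnames string,
    // which names the compilation directory.
    *dir = sfnames + read_word(si + 4);

    int n = 0;
    for (size_t off = SRCINFO_HEADER_SIZE; off + SRCINFO_ENTRY_SIZE <= len;
         off += SRCINFO_ENTRY_SIZE) {
        uint32_t line_off = read_word(si + off);
        uint32_t name_off = read_word(si + off + 4);
        if (name_off == SRCINFO_END)
            return n;                     // terminating entry reached
        if (off + 2 * SRCINFO_ENTRY_SIZE > len || n == max)
            return -1;
        // The LEC ends where the next entry's first word points: either
        // the next LEC or the terminating .line entry.
        uint32_t next_off = read_word(si + off + SRCINFO_ENTRY_SIZE);
        out[n].line_off = line_off;
        out[n].n_entries = (next_off - line_off) / LINE_ENTRY_SIZE;
        out[n].name = sfnames + name_off;
        n++;
    }
    return -1;
}
```

Deriving each LEC's length from the next entry's first word, instead of scanning the .line section, is the shortcut the text above describes; `absolutize` then turns a relative name such as "foobar.h" into a path under the compilation directory.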
+
+   The fifth and last word of each specialized starting entry for a compilation
+   unit in the .debug_srcinfo section may (depending upon your configuration)
+   indicate the date/time of compilation, and this may be used (by a debugger)
+   to determine if any of the source files which contributed code to this
+   compilation unit are newer than the object code for the compilation unit
+   itself.  If so, the debugger may wish to print an "out-of-date" warning
+   about the compilation unit.
+
+   The .debug_srcinfo section associated with each compilation unit will also
+   have a specialized terminating entry.  This terminating .debug_srcinfo
+   section entry will consist of the following two 4-byte words of data:
+
+       (1) The offset, measured from the start of the .line section, of
+           the beginning of the terminating entry for the .line section.
+
+       (2) A word containing the value 0xffffffff.
+
+   --------------------------------
+
+   In the current DWARF version 1 specification, no mechanism is specified by
+   which information about macro definitions and un-definitions may be provided
+   to the DWARF consumer.
+
+   The DWARF version 2 (draft) specification does specify such a mechanism.
+   That specification was based on the GNU "vendor specific extension"
+   which provided some support for macro definitions and un-definitions,
+   but the "official" DWARF version 2 (draft) specification mechanism for
+   handling macros and the GNU implementation have diverged somewhat.  I
+   plan to update the GNU implementation to conform to the "official"
+   DWARF version 2 (draft) specification as soon as I get time to do that.
+
+   Note that in the GNU implementation, additional information about macro
+   definitions and un-definitions is *only* provided when the -g3 level of
+   debug-info production is selected.  (The default level is -g2 and the
+   plain old -g option is considered to be identical to -g2.)
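The .debug_macinfo entries that -g3 adds each consist of a type byte, a 3-byte field and a NUL-terminated string, and each compilation unit's set ends with a 4-byte zero value followed by a single NUL byte. The sketch below decodes that stream one entry at a time. It assumes the type byte comes first and the 3-byte field is stored most significant byte first, and it assumes the character values 's', 'r', 'd' and 'u' for the MACINFO_... codes; both assumptions should be checked against the dwarf.h and target actually in use. The names `next_macinfo`, `macro_name_len` and `struct macinfo_entry` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

// ASSUMED code values; verify against the GNU dwarf.h actually in use.
enum {
    MACINFO_start  = 's',
    MACINFO_resume = 'r',
    MACINFO_define = 'd',
    MACINFO_undef  = 'u'
};

struct macinfo_entry {
    unsigned char type;    // one of the MACINFO_... codes (0 at the end)
    unsigned long field;   // line number or .debug_sfnames offset
    const char *string;    // NUL-terminated text, possibly empty
};

// Decode the entry at buf[*pos] and advance *pos past it.
// Returns 1 for a normal entry, 0 for the terminating entry
// (4-byte zero value + NUL byte), and -1 for truncated input.
static int next_macinfo(const unsigned char *buf, size_t len,
                        size_t *pos, struct macinfo_entry *e)
{
    size_t p = *pos;
    if (p + 5 > len)                   // 4 header bytes + at least a NUL
        return -1;
    const unsigned char *s = buf + p + 4;
    const unsigned char *nul = memchr(s, '\0', len - (p + 4));
    if (nul == NULL)
        return -1;                     // string runs off the end
    e->type = buf[p];
    // ASSUMPTION: 3-byte field stored most significant byte first.
    e->field = ((unsigned long)buf[p + 1] << 16) |
               ((unsigned long)buf[p + 2] << 8) |
               (unsigned long)buf[p + 3];
    e->string = (const char *)s;
    *pos = (size_t)(nul - buf) + 1;
    return (e->type == 0 && e->field == 0 && e->string[0] == '\0') ? 0 : 1;
}

// In a MACINFO_define string the macro name ends at the first space or
// tab; in a MACINFO_undef string it is the whole string.
static size_t macro_name_len(const char *s)
{
    return strcspn(s, " \t");
}
```

A consumer would call `next_macinfo` in a loop until it returns 0, keeping a stack of open files that grows on MACINFO_start entries and shrinks on MACINFO_resume entries.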
+
+   GCC records information about macro definitions and undefinitions primarily
+   in a section called the .debug_macinfo section.  Normal entries in the
+   .debug_macinfo section consist of the following three parts:
+
+       (1) A special "type" byte.
+
+       (2) A 3-byte line-number/filename-offset field.
+
+       (3) A NUL terminated string.
+
+   The interpretation of the second and third parts is dependent upon the
+   value of the leading (type) byte.
+
+   The type byte may have one of four values depending upon the type of the
+   .debug_macinfo entry which follows.  The 1-byte MACINFO type codes presently
+   used, and their meanings, are as follows:
+
+       MACINFO_start     A base file or an include file starts here.
+       MACINFO_resume    The current base or include file ends here.
+       MACINFO_define    A #define directive occurs here.
+       MACINFO_undef     A #undef directive occurs here.
+
+   (Note that the MACINFO_... codes mentioned here are simply symbolic names
+   for constants which are defined in the GNU dwarf.h file.)
+
+   For MACINFO_define and MACINFO_undef entries, the second (3-byte) field
+   contains the number of the source line (relative to the start of the current
+   base source file or the current include file) at which the #define or #undef
+   directive appears.  For a MACINFO_define entry, the following string field
+   contains the name of the macro which is defined, followed by its definition.
+   Note that the definition is always separated from the name of the macro
+   by at least one whitespace character.  For a MACINFO_undef entry, the
+   string which follows the 3-byte line number field contains just the name
+   of the macro which is being undef'ed.
+
+   For a MACINFO_start entry, the 3-byte field following the type byte contains
+   the offset, relative to the start of the .debug_sfnames section for the
+   current compilation unit, of a string which names the new source file which
+   is beginning its inclusion at this point.
+   Following that 3-byte field, each
+   MACINFO_start entry always contains a zero length NUL terminated string.
+
+   For a MACINFO_resume entry, the 3-byte field following the type byte contains
+   the line number WITHIN THE INCLUDING FILE at which the inclusion of the
+   current file (whose inclusion ends here) was initiated.  Following that
+   3-byte field, each MACINFO_resume entry always contains a zero length NUL
+   terminated string.
+
+   Each set of .debug_macinfo entries for each compilation unit is terminated
+   by a special .debug_macinfo entry consisting of a 4-byte zero value followed
+   by a single NUL byte.
+
+   --------------------------------
+
+   In the current DWARF draft specification, no provision is made for providing
+   a separate level of (limited) debugging information necessary to support
+   tracebacks (only) through fully-debugged code (e.g. code in system libraries).
+
+   A proposal to define such a level was submitted (by me) to the UI/PLSIG.
+   This proposal was rejected by the UI/PLSIG for inclusion into the DWARF
+   version 1 specification for two reasons.  First, it was felt (by the PLSIG)
+   that the issues involved in supporting a "traceback only" subset of DWARF
+   were not well understood.  Second, and perhaps more importantly, the PLSIG
+   is already having enough trouble agreeing on what it means to be "conforming"
+   to the DWARF specification, and it was felt that trying to specify multiple
+   different *levels* of conformance would only complicate our discussions of
+   this already divisive issue.  Nonetheless, the GNU implementation of DWARF
+   provides an abbreviated "traceback only" level of debug-info production for
+   use with fully-debugged "system library" code.  This level should only be
+   used for fully debugged system library code, and even then, it should only
+   be used where there is a very strong need to conserve disk space.
+   This abbreviated level of debug-info production can be used by specifying
+   the -g1 option on the compilation command line.
+
+   --------------------------------
+
+   As mentioned above, the GNU implementation of DWARF currently uses the DWARF
+   version 2 (draft) approach for inline functions (and inlined instances
+   thereof).  This is used in preference to the version 1 approach because
+   (quite simply) the version 1 approach is highly brain-damaged and probably
+   unworkable.
+
+   --------------------------------
+
+
+   GNU DWARF Representation of GNU C Extensions to ANSI C
+   ------------------------------------------------------
+
+   The file dwarfout.c has been designed and implemented so as to provide
+   some reasonable DWARF representation for each and every declarative
+   construct which is accepted by the GNU C compiler.  Since the GNU C
+   compiler accepts a superset of ANSI C, this means that there are some
+   cases in which the DWARF information produced by GCC must take some
+   liberties in improvising DWARF representations for declarations which
+   are only valid in (extended) GNU C.
+
+   In particular, GNU C provides at least three significant extensions to
+   ANSI C when it comes to declarations.  These are (1) inline functions,
+   (2) dynamic arrays, and (3) incomplete enum types.  (See the GCC
+   manual for more information on these GNU extensions to ANSI C.)  When
+   used, these GNU C extensions are represented (in the generated DWARF
+   output of GCC) in the most natural and intuitively obvious ways.
+
+   In the case of inline functions, the DWARF representation is exactly as
+   called for in the DWARF version 2 (draft) specification for an identical
+   function written in C++; i.e. we "reuse" the representation of inline
+   functions which has been defined for C++ to support this GNU C extension.
+
+   In the case of dynamic arrays, we use the most obvious representational
+   mechanism available; i.e.
+   an array type in which the upper bound of some dimension (usually the
+   first and only dimension) is a variable rather than a constant.  (See the
+   DWARF version 1 specification for more details.)
+
+   In the case of incomplete enum types, such types are represented simply
+   as TAG_enumeration_type DIEs which DO NOT contain either AT_byte_size
+   attributes or AT_element_list attributes.
+
+   --------------------------------
+
+
+   Future Directions
+   -----------------
+
+   The codes, formats, and other paraphernalia necessary to provide proper
+   support for symbolic debugging for the C++ language are still being worked
+   on by the UI/PLSIG.  The vast majority of the additions to DWARF which will
+   be needed to completely support C++ have already been hashed out and agreed
+   upon, but a few small issues (e.g. anonymous unions, access declarations)
+   are still being discussed.  Also, we in the PLSIG are still discussing
+   whether or not we need to do anything special for C++ templates.  (At this
+   time it is not yet clear whether we even need to do anything special for
+   these.)
+
+   With regard to FORTRAN, the UI/PLSIG has defined what is believed to be a
+   complete and sufficient set of codes and rules for adequately representing
+   all of FORTRAN 77, and most of Fortran 90, in DWARF.  While some support for
+   this has been implemented in dwarfout.c, further implementation and testing
+   are needed.
+
+   GNU DWARF support for other languages (e.g. Pascal and Modula) is a moot
+   issue until there are GNU front-ends for these other languages.
+
+   As currently defined, DWARF only describes a (binary) language which can
+   be used to communicate symbolic debugging information from a compiler
+   through an assembler and a linker, to a debugger.  There is no clear
+   specification of what processing should be (or must be) done by the
+   assembler and/or the linker.
+   Fortunately, the role of the assembler is
+   easily inferred (by anyone knowledgeable about assemblers) just by
+   looking at examples of assembly-level DWARF code.  Sadly though, the
+   allowable (or required) processing steps performed by a linker are
+   harder to infer and (perhaps) even harder to agree upon.  There are
+   several forms of very useful `post-processing' steps which intelligent
+   linkers *could* (in theory) perform on object files containing DWARF,
+   but any and all such link-time transformations are currently both
+   disallowed and unspecified.
+
+   In particular, possible link-time transformations of DWARF code which could
+   provide significant benefits include (but are not limited to):
+
+       Commonization of duplicate DIEs obtained from multiple input
+       (object) files.
+
+       Cross-compilation type checking based upon DWARF type information
+       for objects and functions.
+
+       Other possible `compacting' transformations designed to save disk
+       space and to reduce linker & debugger I/O activity.
+
+*/
+
 #include "config.h"
 
 #ifdef DWARF_DEBUGGING_INFO