Diffstat (limited to 'ld/ld.texinfo')
-rw-r--r-- | ld/ld.texinfo | 1014 |
1 files changed, 1014 insertions, 0 deletions
diff --git a/ld/ld.texinfo b/ld/ld.texinfo new file mode 100644 index 0000000..1764ad5 --- /dev/null +++ b/ld/ld.texinfo @@ -0,0 +1,1014 @@ +\input texinfo +@parindent=0pt +@setfilename gld +@c @@setchapternewpage odd +@settitle GLD, The GNU linker +@titlepage +@title{gld} +@subtitle{The gnu loader} +@sp 1 +@subtitle Second Edition---gld version 2.0 +@subtitle January 1991 +@vskip 0pt plus 1filll +Copyright @copyright{} 1991 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided also that +the entire resulting derived work is distributed under the terms of a +permission notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. + +@author {Steve Chamberlain} +@author {Cygnus Support} +@author {steve@@cygnus.com} +@end titlepage + +@node Top,,, +@comment node-name, next, previous, up +@ifinfo +This file documents the GNU linker gld. +@end ifinfo + +@c chapter What does a linker do ? +@c chapter Command Language +@noindent +@chapter Overview + + +The @code{gld} command combines a number of object and archive files, +relocates their data and ties up symbol references. Often the last +step in building a new compiled program to run is a call to @code{gld}. + +The @code{gld} command accepts Linker Command Language files in +a superset of AT+T's Link Editor Command Language syntax, +to provide explicit and total control over the linking process. + +This version of @code{gld} uses the general purpose @code{bfd} libraries +to operate on object files. 
This allows @code{gld} to read and +write any of the formats supported by @code{bfd}; different +formats may be linked together to produce an object file in any available format. + +Supported formats: +@itemize @bullet +@item +Sun3 68k a.out +@item +IEEE-695 68k Object Module Format +@item +Oasys 68k Binary Relocatable Object File Format +@item +Sun4 sparc a.out +@item +88k bcs coff +@item +i960 coff little endian +@item +i960 coff big endian +@item +i960 b.out little endian +@item +i960 b.out big endian +@item +s-records +@end itemize + +When linking similar formats, @code{gld} maintains all debugging +information. + +@chapter Command line options + +@example + gld [ -Bstatic ] [ -D @var{datasize} ] + [ -c @var{commandfile} ] + [ -d ] | [ -dc ] | [ -dp ] + [ -i ] + [ -e @var{entry} ] [ -l @var{arch} ] [ -L @var{searchdir} ] [ -M ] + [ -N | -n | -z ] [ -noinhibit-exec ] [ -r ] [ -S ] [ -s ] + [ -f @var{fill} ] + [ -T @var{textorg} ] [ -Tdata @var{dataorg} ] [ -t ] [ -u @var{sym} ] + [ -X ] [ -x ] + [ -o @var{output} ] @var{objfiles}@dots{} +@end example + +Command-line options to GNU @code{gld} may be specified in any order, and +may be repeated at will. For the most part, repeating an option with a +different argument will either have no further effect, or override prior +occurrences (those further to the left on the command line) of an +option. + +The exceptions which may meaningfully be present several times +are @code{-L}, @code{-l}, and @code{-u}. + +@var{objfiles} may follow, precede, or be mixed in with +command-line options, except that an @var{objfiles} argument may not be +placed between an option flag and its argument. + +Option arguments must follow the option letter without intervening +whitespace, or be given as separate arguments immediately following the +option that requires them. + +@table @code +@item @var{objfiles}@dots{} +The object files @var{objfiles} to be linked; at least one must be specified. 
+ +@item -Bstatic +This flag is accepted for command-line compatibility with the SunOS linker, +but has no effect on @code{gld}. + +@item -c @var{commandfile} +Directs @code{gld} to read linkage commands from the file @var{commandfile}. + +@item -D @var{datasize} +Use this option to specify a target size for the @code{data} segment of +your linked program. The option is only obeyed if @var{datasize} is +larger than the natural size of the program's @code{data} segment. + +@var{datasize} must be an integer specified in hexadecimal. + +@code{ld} will simply increase the size of the @code{data} segment, +padding the created gap with zeros, and reduce the size of the +@code{bss} segment to match. + +@item -d +Force @code{ld} to assign space to common symbols +even if a relocatable output file is specified (@code{-r}). + +@item -dc | -dp +These flags are accepted for command-line compatibility with the SunOS linker, +but have no effect on @code{gld}. + +@item -e @var{entry} +Use @var{entry} as the explicit symbol for beginning execution of your +program, rather than the default entry point. If this option is +not specified, the symbol @code{start} is used as the entry address. +If there is no symbol called @code{start}, then the entry address +is set to the first address in the first output section +(usually the @samp{text} section). + +@item -f @var{fill} +Sets the default fill pattern for ``holes'' in the output file to +the lowest two bytes of the expression specified. + +@item -i +Produce an incremental link (same as option @code{-r}). + +@item -l @var{arch} +Add an archive file @var{arch} to the list of files to link. This +option may be used any number of times. @code{ld} will search its +path-list for occurrences of @code{lib@var{arch}.a} for every @var{arch} +specified. + +@c This also has a side effect of using the "c++ demangler" if we happen +@c to specify -llibg++. Document? 
pesch@@cygnus.com, 24jan91 + +@item -L @var{searchdir} +This option adds path @var{searchdir} to the +list of paths that @code{gld} will search for archive libraries. You +may use this option any number of times. + +@c Should we make any attempt to list the standard paths searched +@c without listing? When hacking on a new system I often want to know +@c this, but this may not be the place... it's not constant across +@c systems, of course, which is what makes it interesting. +@c pesch@@cygnus.com, 24jan91. + +@item -M +@itemx -m +Print (to the standard output file) a link map---diagnostic information +about where symbols are mapped by @code{ld}, and information on global +common storage allocation. + +@item -N +Specifies readable and writable @code{text} and @code{data} sections. If +the output format supports Unix style magic numbers, then @code{OMAGIC} is set. + +@item -n +Sets the text segment to be read only, and @code{NMAGIC} is written +if possible. + +@item -o @var{output} +@var{output} is a name for the program produced by @code{ld}; if this +option is not specified, the name @samp{a.out} is used by default. + +@item -r +Generates relocatable output---i.e., generates an output file that can in +turn serve as input to @code{gld}. As a side effect, this option also +sets the output file's magic number to @code{OMAGIC}; see @samp{-N}. If this +option is not specified, an absolute file is produced. + +@item -S +Omits debugger symbol information (but not all symbols) from the output file. + +@item -s +Omits all symbol information from the output file. + +@item -T @var{textorg} +@itemx -Ttext @var{textorg} +Use @var{textorg} as the starting address for the @code{text} segment of the +output file. Both forms of this option are equivalent. The option +argument must be a hexadecimal integer. + +@item -Tdata @var{dataorg} +Use @var{dataorg} as the starting address for the @code{data} segment of +the output file. The option argument must be a hexadecimal integer. 
+ +@item -t +Prints names of input files as @code{ld} processes them. + +@item -u @var{sym} +Forces @var{sym} to be entered in the output file as an undefined symbol. +This may, for example, trigger linking of additional modules from +standard libraries. @code{-u} may be repeated with different option +arguments to enter additional undefined symbols. This option is equivalent +to the @code{EXTERN} linker command. + +@item -X +If @code{-s} or @code{-S} is also specified, delete only local symbols +beginning with @samp{L}. + +@item -z +@code{-z} sets @code{ZMAGIC}, the default: the @code{text} segment is +read-only, demand pageable, and shared. + +Specifying a relocatable output file (@code{-r}) will also set the magic +number to @code{OMAGIC}. + +See the description of @samp{-N}. + + +@end table +@chapter Command Language + + +The command language allows explicit control over the linkage process, including +specification of: +@table @bullet +@item input files +@item file formats +@item output file format +@item addresses of sections +@item placement of common blocks +@item and more +@end table + +A command file may be supplied to the linker, either explicitly through the +@code{-c} option, or implicitly as an ordinary file. If the linker opens +a file which does not have a reasonable object or archive format, it tries +to read the file as if it were a command file. +@section Structure +To be added + +@section Expressions +The syntax for expressions in the command language is identical to that of +C expressions, with the following features: +@table @bullet +@item All expressions are evaluated as integers and +are of ``long'' or ``unsigned long'' type. +@item All constants are integers. +@item All of the C arithmetic operators are provided. +@item Global variables may be referenced, defined and created. +@item Built in functions may be called. 
+@end table + +@section Expressions + +The linker has a practice of ``lazy evaluation'' for expressions; it only +calculates an expression when absolutely necessary. For instance, +when the linker reads in the command file it has to know the values +of the start address and the length of the memory regions for linkage to continue, so these +values are worked out, but other values (such as symbol values) are not +known or needed until after storage allocation. +They are evaluated later, when other +information, such as the sizes of output sections, is available for use in +the symbol assignment expression. + +When a linker expression is evaluated and assigned to a variable it is given +either an absolute or a relocatable type. An absolute expression type +is one in which the symbol contains the value that it will have in the +output file; a relocatable expression type is one in which the value +is expressed as a fixed offset from the base of a section. + +The type of the expression is controlled by its position in the script +file. A symbol assigned within a @code{SECTIONS} specification is +created relative to the base of the section; a symbol assigned in any +other place is created as an absolute symbol. Since a symbol created +within a @code{SECTIONS} specification is relative to the base of the +section, it will remain relocatable if relocatable output is requested. +A symbol may be created with an absolute value even when assigned to +within a @code{SECTIONS} specification by using the absolute assignment +function @code{ABSOLUTE}. For example, to create an absolute symbol +whose address is the end of the output section @code{.data}: +@example +.data : + @{ + *(.data) + _edata = ABSOLUTE(.) ; + @} +@end example + +Unless quoted, symbol names start with a letter, underscore, point or +minus sign and may include any letters, underscores, digits, points, +and minus signs. Unquoted symbol names must not conflict with any +keywords. 
To specify a symbol which contains odd characters or has +the same name as a keyword, surround it in double quotes: +@example + "SECTION" = 9; + "with a space" = "also with a space" + 10; +@end example + +@subsection Integers +An octal integer is @samp{0} followed by zero or more of the octal +digits (@samp{01234567}). + +A decimal integer starts with a non-zero digit followed by zero or +more digits (@samp{0123456789}). + +A hexadecimal integer is @samp{0x} or @samp{0X} followed by one or +more hexadecimal digits chosen from @samp{0123456789abcdefABCDEF}. + +Integers have the usual values. To denote a negative integer, use +the unary operator @samp{-} discussed under expressions. + +Additionally, the suffixes @code{K} and @code{M} may be used to multiply the +previous constant by 1024 or +@tex +$1024^2$ +@end tex +respectively. + +@example + _as_decimal = 57005; + _as_hex = 0xdead; + _as_octal = 0157255; + + _4k_1 = 4K; + _4k_2 = 4096; + _4k_3 = 0x1000; +@end example +@subsection Operators +The linker provides the standard C set of arithmetic operators, with +the standard bindings and precedence levels: +@example + +@end example +@tex + +\vbox{\offinterlineskip +\hrule +\halign +{\vrule#&\hfil#\hfil&\vrule#&\hfil#\hfil&\vrule#&\hfil#\hfil&\vrule#\cr +height2pt&&&&&\cr +&Level&& associativity &&Operators&\cr +height2pt&&&&&\cr +\noalign{\hrule} +height2pt&&&&&\cr +&highest&&&&&&\cr +&1&&left&&$ ! - ~$&\cr +height2pt&&&&&\cr +&2&&left&&* / \%&\cr +height2pt&&&&&\cr +&3&&left&&+ -&\cr +height2pt&&&&&\cr +&4&&left&&$>> <<$&\cr +height2pt&&&&&\cr +&5&&left&&$== != > < <= >=$&\cr +height2pt&&&&&\cr +&6&&left&&\&&\cr +height2pt&&&&&\cr +&7&&left&&|&\cr +height2pt&&&&&\cr +&8&&left&&{\&\&}&\cr +height2pt&&&&&\cr +&9&&left&&||&\cr +height2pt&&&&&\cr +&10&&right&&? 
:&\cr +height2pt&&&&&\cr +&11&&right&&$${\&= += -= *= /=}&\cr +&lowest&&&&&&\cr +height2pt&&&&&\cr} +\hrule} +@end tex + +@section Built in Functions +The command language provides built in functions for use in +expressions in linkage scripts. +@table @bullet +@item @code{ALIGN(@var{exp})} +returns the value of the current location counter (@code{dot}) +aligned to the next @var{exp} boundary, where @var{exp} must be a power of +two. This is equivalent to @code{(. + @var{exp} - 1) & ~(@var{exp} - 1)}. +As an example, to align the output @code{.data} section to the +next 0x2000 byte boundary after the preceding section and to set a +variable within the section to the next 0x8000 boundary after the +input sections: +@example + .data ALIGN(0x2000) :@{ + *(.data) + variable = ALIGN(0x8000); + @} +@end example + +@item @code{ADDR(@var{section name})} +returns the absolute address of the named section if the section has +already been bound. In the following example, @code{symbol_1} and +@code{symbol_2} are assigned identical values: +@example + .output1: + @{ + start_of_output_1 = .; + ... + @} + .output: + @{ + symbol_1 = ADDR(.output1); + symbol_2 = start_of_output_1; + @} +@end example + +@item @code{SIZEOF(@var{section name})} +returns the size in bytes of the named section, if the section has +been allocated. In the following example, @code{symbol_1} and +@code{symbol_2} are assigned identical values: +@example + .output : @{ + .start = . ; + ... + .end = .; + @} + symbol_1 = .end - .start; + symbol_2 = SIZEOF(.output); +@end example + +@item @code{DEFINED(@var{symbol name})} +Returns 1 if the symbol is in the linker global symbol table and is +defined; otherwise it returns 0. This example shows the setting of a +global symbol @code{begin} to the first location in the @code{.text} +section, only if there is no other symbol +called @code{begin} already: +@example + .text: @{ + begin = DEFINED(begin) ? begin : . ; + ... 
+ @} +@end example +@end table +@page +@section MEMORY Directive +The linker's default configuration is for all memory to be +allocatable. This state may be overridden by using the @code{MEMORY} +directive. The @code{MEMORY} directive describes the location and +size of blocks of memory in the target. With careful use, it can describe +which memory regions may or may not be used by the linker. The linker +does not shuffle sections to fit into the available regions, but does +move the requested sections into the correct regions and issue errors +when the regions become too full. The syntax is: + +@example + MEMORY + @{ +@tex + $\bigl\lbrace {\it name_1} ({\it attr_1}):$ ORIGIN = ${\it origin_1},$ LENGTH $= {\it len_1} \bigr\rbrace $ +@end tex + + @} +@end example +@table @code +@item @var{name} +is a name used internally by the linker to refer to the region. Any +symbol name may be used. The region names are stored in a separate +name space, and will not conflict with symbols, filenames or section +names. +@item @var{attr} +is an optional list of attributes, parsed for compatibility with the +AT+T linker +but ignored by both the AT+T and the gnu linker. +@item @var{origin} +is the start address of the region in physical memory, expressed as a +standard linker expression which must evaluate to a constant before +memory allocation is performed. The keyword @code{ORIGIN} may be +abbreviated to @code{org} or @code{o}. +@item @var{len} +is the size in bytes of the region as a standard linker expression. 
+The keyword @code{LENGTH} may be abbreviated to @code{len} or @code{l}. +@end table + +For example, to specify that memory has two regions available for +allocation, one starting at 0 for 256k, and the other starting at +0x40000000 for four megabytes: + +@example + MEMORY + @{ + rom : ORIGIN= 0, LENGTH = 256K + ram : ORIGIN= 0x40000000, LENGTH = 4M + @} + +@end example + +If the combined output sections directed to a region are too big for +the region the linker will emit an error message. +@page +@section SECTIONS Directive +The @code{SECTIONS} directive +controls exactly where input sections are placed into output sections, their +order and to which output sections they are allocated. + +When no @code{SECTIONS} directives are specified, the default action +of the linker is to place each input section into an identically named +output section in the order that the sections appear in the first +file, and then the order of the files. + +The syntax of the @code{SECTIONS} directive is: + +@example + SECTIONS + @{ +@tex + $\bigl\lbrace {\it name_n}\bigl[options\bigr]\colon$ $\bigl\lbrace {\it statements_n} \bigr\rbrace \bigl[ = {\it fill expression } \bigr] \bigl[ > mem spec \bigr] \bigr\rbrace $ +@end tex + @} +@end example + +@table @code +@item @var{name} +controls the name of the output section. In formats which only support +a limited number of sections, such as @code{a.out}, the name must be +one of the names supported by the format (in the case of a.out, +@code{.text}, @code{.data} or @code{.bss}). If the output format +supports any number of sections, but with numbers and not names (as in +the case of IEEE), the name should be supplied as a quoted numeric +string. A section name may consist of any sequence of characters, but +any name which does not conform to the standard @code{gld} symbol name +syntax must be quoted. 
To copy sections 1 through 4 from an Oasys file +into the @code{.text} section of an @code{a.out} file, and sections 13 +and 14 into the @code{.data} section: +@example + + SECTIONS @{ + .text :@{ + *("1" "2" "3" "4") + @} + + .data :@{ + *("13" "14") + @} + @} +@end example + +@item @var{fill expression} +If present, this +expression sets the fill value. Any unallocated holes in the current output +section when written to the output file will +be filled with the two least significant bytes of the value, repeated as +necessary. +@page +@item @var{options} +the @var{options} parameter is a list of optional arguments specifying +attributes of the output section; they may be taken from the following +list: +@table @bullet{} +@item @var{addr expression} +forces the output section to be loaded at a specified address. The +address is specified as a standard linker expression. The following +example generates section @var{output} at location +@code{0x40000000}: +@example + SECTIONS @{ + output 0x40000000: @{ + ... + @} + @} +@end example +Since the built in function @code{ALIGN} references the location +counter implicitly, a section may be located on a certain boundary by +using the @code{ALIGN} function in the expression. For example, to +locate the @code{.data} section on the next 4k boundary after the end +of the @code{.text} section: +@example + SECTIONS @{ + .text : @{ + ... + @} + .data ALIGN(4K) : @{ + ... + @} + @} +@end example +@end table +@item @var{statements} +is a list of file names, input sections and assignments. These statements control what is placed into the +output section. +The syntax of a single @var{statement} is one of: +@table @bullet + +@item @var{symbol} [ = | += | -= | *= | /= ] @var{expression} @code{;} + +Global symbols may be created and have their values (addresses) +altered using the assignment statement. 
The linker tries to put off +the evaluation of an assignment until all the terms in the source +expression are known; for instance the sizes of sections cannot be +known until after allocation, so assignments dependent upon these are +not performed until after allocation. Some expressions, such as those +depending upon the location counter @code{dot} (@samp{.}), must be +evaluated during allocation. If the result of an expression is +required, but the value is not available, then an error results, e.g.: +@example + SECTIONS @{ + text 9+this_isnt_constant: + @{ + @} + @} + testscript:21: Non constant expression for initial address +@end example + +@item @code{CREATE_OBJECT_SYMBOLS} +causes the linker to create a symbol for each input file and place it +into the specified section, set to the address of the first byte of +data written from the input file. For instance, with @code{a.out} +files it is conventional to have a symbol for each input file. +@example + SECTIONS @{ + .text 0x2020 : + @{ + CREATE_OBJECT_SYMBOLS + *(.text) + _etext = ALIGN(0x2000); + @} + @} +@end example +Supplied with four object files, @code{a.o}, @code{b.o}, @code{c.o}, +and @code{d.o}, a run of +@code{gld} could create a map: +@example +From functions like : +a.c: + afunction() @{ @} + int adata=1; + int abss; + +00000000 A __DYNAMIC +00004020 B _abss +00004000 D _adata +00002020 T _afunction +00004024 B _bbss +00004008 D _bdata +00002038 T _bfunction +00004028 B _cbss +00004010 D _cdata +00002050 T _cfunction +0000402c B _dbss +00004018 D _ddata +00002068 T _dfunction +00004020 D _edata +00004030 B _end +00004000 T _etext +00002020 t a.o +00002038 t b.o +00002050 t c.o +00002068 t d.o + +@end example + +@item @var{filename} @code{(} @var{section name list} @code{)} +This command allocates all the named sections from the input object +file supplied into the output section at the current point. 
Sections +are written in the order they appear in the list, so: +@example + SECTIONS @{ + .text 0x2020 : + @{ + a.o(.data) + b.o(.data) + *(.text) + @} + .data : + @{ + *(.data) + @} + .bss : + @{ + *(.bss) + COMMON + @} + @} +@end example +will produce a map: +@example + + insert here +@end example +@item @code{* (} @var{section name list} @code{)} +This command causes all the named sections from all input files which have not +yet been assigned output sections to be assigned the current output +section. + +@item @var{filename} @code{[COMMON]} +This allocates all the common symbols from the specified file and places +them into the current output section. + +@item @code{* [COMMON]} +This allocates all the common symbols from the files which have not +yet had their common symbols allocated and places them into the current +output section. + +@item @var{filename} +A filename alone within a @code{SECTIONS} statement will cause all the +input sections from the file to be placed into the current output +section at the current location. If the file name has been mentioned +before with a section name list, then only those +sections which have not yet been allocated are noted. + +The following example reads all of the sections from file all.o and +places them at the start of output section @code{outputa}, which starts +at location @code{0x10000}. All of the data from section @code{.input1} from +file foo.o is placed next into the same output section. All of +section @code{.input2} is read from foo.o and placed into output +section @code{outputb}. Next all of section @code{.input1} is read +from foo1.o. All of the remaining @code{.input1} and @code{.input2} +sections from any files are written to output section @code{outputc}. 
+ +@example + SECTIONS + @{ + outputa 0x10000 : + @{ + all.o + foo.o (.input1) + @} + outputb : + @{ + foo.o (.input2) + foo1.o (.input1) + @} + outputc : + @{ + *(.input1) + *(.input2) + @} + @} + +@end example +@end table +@end table +@section Using the Location Counter +The special linker variable @code{dot}, @samp{.} always contains the +current output location counter. Since the @code{dot} always refers to +a location in an output section, it must always appear in an +expression within a @code{SECTIONS} directive. The @code{dot} symbol +may appear anywhere that an ordinary symbol may appear in an +expression, but its assignments have a side effect. Assigning a value +to the @code{dot} symbol will cause the location counter to be moved. +This may be used to create holes in the output section. The location +counter may never be moved backwards. +@example + SECTIONS + @{ + output : + @{ + file1(.text) + . = . + 1000; + file2(.text) + . += 1000; + file3(.text) + . -= 32; + file4(.text) + @} = 0x1234; + @} +@end example +In the previous example, @code{file1} is located at the beginning of +the output section, then there is a 1000 byte gap, filled with 0x1234. +Then @code{file2} appears, also with a 1000 byte gap following before +@code{file3} is loaded. Then the first 32 bytes of @code{file4} are +placed over the last 32 bytes of @code{file3}. +@section Command Language Syntax +@section The Entry Point +The linker chooses the first executable instruction in an output file from a list +of possibilities, in order: +@itemize @bullet +@item +The value of the symbol provided to the command line with the @code{-e} option, when +present. +@item +The value of the symbol provided in the @code{ENTRY} directive, +if present. +@item +The value of the symbol @code{start}, if present. +@item +The value of the symbol @code{_main}, if present. +@item +The address of the first byte of the @code{.text} section, if present. +@item +The value 0. 
+@end itemize +If the symbol @code{start} is not defined within the set of input +files to a link, it may be generated by a simple assignment +expression, e.g.: +@example + start = 0x2020; +@end example +@section Section Attributes +@section Allocation of Sections into Memory +@section Defining Symbols +@chapter Examples of operation +The simplest case is linking standard Unix object files on a standard +Unix system supported by the linker. To link a file hello.o: +@example +$ gld -o output /lib/crt0.o hello.o -lc +@end example +This tells @code{gld} to produce a file called @code{output} after linking +the file @code{/lib/crt0.o} with @code{hello.o} and the library +@code{libc.a}, which will come from the standard search directories. +@chapter Partial Linking +Specifying @code{-r} on the command line causes @code{gld} to +perform a partial link. + + +@chapter BFD + +The linker accesses object and archive files using the @code{bfd} +libraries. These libraries allow the linker to use the same routines +to operate on object files whatever the object file format. + +A different object file format can be supported simply by creating a +new @code{bfd} back end and adding it to the library. + +Formats currently supported: +@itemize @bullet +@item +Sun3 68k a.out +@item +IEEE-695 68k Object Module Format +@item +Oasys 68k Binary Relocatable Object File Format +@item +Sun4 sparc a.out +@item +88k bcs coff +@item +i960 coff little endian +@item +i960 coff big endian +@item +i960 b.out little endian +@item +i960 b.out big endian +@end itemize + +As with most implementations, @code{bfd} is a compromise between +several conflicting requirements. The major factor influencing +@code{bfd} design was efficiency: any time used converting between +formats is time which would not have been spent had @code{bfd} not +been involved. 
This is partly offset by abstraction payback; since +@code{bfd} simplifies applications and back ends, more time and care +may be spent optimizing algorithms for greater speed. + +One minor artifact of the @code{bfd} solution which the +user should be aware of is information loss. +There are two places where useful information can be lost using the +@code{bfd} mechanism: during conversion and during output. + +@section How it works +When an object file is opened, @code{bfd} +tries to automatically determine the format of the input object file; a +descriptor is then built in memory with pointers to routines to access +elements of the object file's data structures. + +As different information from the object files is required, +@code{bfd} reads from different sections of the file and processes +them. For example, a very common operation for the linker is processing +symbol tables. Each @code{bfd} back end provides a routine for +converting between the object file's representation of symbols and an +internal canonical format. When the linker asks for the symbol table +of an object file, it calls through the memory pointer to the relevant +@code{bfd} back end routine, which reads and converts the table into +the canonical form. The linker then operates upon the common form. When +the link is finished and the linker writes the symbol table of the +output file, another @code{bfd} back end routine is called which takes +the newly created symbol table and converts it into the output format. + +@section Information Leaks +@table @bullet{} +@item Information lost during output. +The output formats supported by @code{bfd} do not provide identical +facilities, and information which may be described in one form +has nowhere to go in another format. One example of this would be +alignment information in @code{b.out}. 
There is nowhere in an @code{a.out} +format file to store alignment information on the contained data, so when +a file is linked from @code{b.out} and an @code{a.out} image is produced, +alignment information is lost. (Note that in this case the linker has the +alignment information internally, so the link is performed correctly.) + +Another example is COFF section names. COFF files may contain an +unlimited number of sections, each one with a textual section name. If +the target of the link is a format which does not have many sections +(e.g. @code{a.out}) or has sections without names (e.g. the Oasys format) +the link cannot be done simply. It is possible to circumvent this +problem by describing the desired input section to output section +mapping with the command language. + +@item Information lost during canonicalization. +The @code{bfd} +internal canonical form of the external formats is not exhaustive; +there are structures in input formats for which there is no direct +representation internally. This means that the @code{bfd} back ends +cannot maintain all the data richness through the transformation +from external to internal and back to external formats. + +This limitation is only a problem when using the linker to read one +format and write another. Each @code{bfd} back end is responsible for +maintaining as much data as possible, and the internal @code{bfd} +canonical form has structures which are opaque to the @code{bfd} core, +and exported only to the back ends. When a file is read in one format, +the canonical form is generated for @code{bfd} and the linker. At the +same time, the back end saves away any information which may otherwise +be lost. If the data is then written back to the same back end, the +back end routine will be able to use the canonical form provided by +the @code{bfd} core as well as the information it prepared earlier. +Since there is a great deal of commonality between back ends, this +mechanism is very useful. 
There is no information lost when linking +big endian COFF to little endian COFF, or from a.out to b.out. When a +mixture of formats is linked, the information is only lost from the +files with a different format to the destination. +@end table +@section Mechanism +The smallest amount of information is preserved when there +is a small overlap between the information provided by the source +format, that stored by the canonical format and the information needed +by the destination format. A brief description of the canonical form +will help the user appreciate what can be maintained +between conversions. + +@table @bullet +@item file level +Information on target machine architecture, particular implementation +and format type is stored on a per file basis. Other information includes a demand pageable bit and +a write protected bit. Note that information like Unix magic numbers +is not stored here, only the magic number's meaning, so a ZMAGIC file +would have both the demand pageable bit and the write protected text +bit set. + +The byte order of the target is stored on a per file basis, so that +both big and little endian object files may be linked together at the +same time. +@item section level +Each section in the input file contains the name of the section, the +original address in the object file, various flags, size and alignment +information and pointers into other @code{bfd} data structures. +@item symbol level +Each symbol contains a pointer to the object file which originally +defined it, its name, value and various flag bits. When a symbol +table is read in, all symbols are relocated to make them relative to +the base of the section they were defined in, so each symbol points to +the containing section. Each symbol also has a varying amount of +hidden data to contain private data for the back end. Since the symbol +points to the original file, the symbol private data format is +accessible. 
Operations may be performed on a list of symbols of wildly +different formats without problems. + +Normal global and simple local symbols are maintained on output, so an +output file, no matter the format, will retain symbols pointing to +functions, globals, statics and commons. Some symbol information is +not worth retaining; in @code{a.out}, type information is stored in the +symbol table as long symbol names. This information would be useless +to most COFF debuggers and may be thrown away with appropriate command +line switches. (Note that @code{gdb} does support stabs in COFF.) + +There is one word of type information within the symbol, so if the +format supports symbol type information within symbols (e.g., COFF, +IEEE, Oasys) and the type is simple enough to fit within one word +(nearly everything but aggregates), the information will be preserved. + +@item relocation level +Each canonical relocation record contains a pointer to the symbol to +relocate to, the offset of the data to relocate, the section the data +is in, and a pointer to a relocation type descriptor. Relocation is, +in effect, performed by message passing through the relocation type +descriptor and symbol pointer. This allows relocations to be performed +on output data using a relocation method only available in one of the +input formats. For instance, Oasys provides a byte relocation format. +A relocation record requesting this relocation type would point +indirectly to a routine to perform this, so the relocation may be +performed on a byte being written to a COFF file, even though 68k COFF +has no such relocation type. + +@item line numbers +Line numbers have to be relocated along with the symbol information. +Each symbol with an associated list of line number records points to +the first record of the list. The head of a line number list consists +of a pointer to the symbol, which allows the address of the function +whose line numbers are being described to be determined.
The rest of the +list is made up of pairs of offsets into the section and line indexes. Any format +which can simply derive this information can pass it between formats +without loss (COFF, IEEE and Oasys). +@end table + + +@bye + +