aboutsummaryrefslogtreecommitdiff
path: root/gas/doc/internals.texi
diff options
context:
space:
mode:
authorRichard Henderson <rth@redhat.com>1999-05-03 07:29:11 +0000
committerRichard Henderson <rth@redhat.com>1999-05-03 07:29:11 +0000
commit252b5132c753830d5fd56823373aed85f2a0db63 (patch)
tree1af963bfd8d3e55167b81def4207f175eaff3a56 /gas/doc/internals.texi
downloadgdb-252b5132c753830d5fd56823373aed85f2a0db63.zip
gdb-252b5132c753830d5fd56823373aed85f2a0db63.tar.gz
gdb-252b5132c753830d5fd56823373aed85f2a0db63.tar.bz2
19990502 sourceware importbinu_ss_19990502
Diffstat (limited to 'gas/doc/internals.texi')
-rw-r--r--gas/doc/internals.texi1557
1 files changed, 1557 insertions, 0 deletions
diff --git a/gas/doc/internals.texi b/gas/doc/internals.texi
new file mode 100644
index 0000000..dd3b4ab
--- /dev/null
+++ b/gas/doc/internals.texi
@@ -0,0 +1,1557 @@
+\input texinfo
+@setfilename internals.info
+@node Top
+@top Assembler Internals
+@raisesections
+@cindex internals
+
+This chapter describes the internals of the assembler. It is incomplete, but
+it may help a bit.
+
+This chapter was last modified on $Date$. It is not updated regularly, and it
+may be out of date.
+
+@menu
+* GAS versions:: GAS versions
+* Data types:: Data types
+* GAS processing:: What GAS does when it runs
+* Porting GAS:: Porting GAS
+* Relaxation:: Relaxation
+* Broken words:: Broken words
+* Internal functions:: Internal functions
+* Test suite:: Test suite
+@end menu
+
+@node GAS versions
+@section GAS versions
+
+GAS has acquired layers of code over time. The original GAS only supported the
+a.out object file format, with three sections. Support for multiple sections
+has been added in two different ways.
+
+The preferred approach is to use the version of GAS created when the symbol
+@code{BFD_ASSEMBLER} is defined. The other versions of GAS are documented for
+historical purposes, and to help anybody who has to debug code written for
+them.
+
+The type @code{segT} is used to represent a section in code which must work
+with all versions of GAS.
+
+@menu
+* Original GAS:: Original GAS version
+* MANY_SEGMENTS:: MANY_SEGMENTS gas version
+* BFD_ASSEMBLER:: BFD_ASSEMBLER gas version
+@end menu
+
+@node Original GAS
+@subsection Original GAS
+
+The original GAS only supported the a.out object file format with three
+sections: @samp{.text}, @samp{.data}, and @samp{.bss}. This is the version of
+GAS that is compiled if neither @code{BFD_ASSEMBLER} nor @code{MANY_SEGMENTS}
+is defined. This version of GAS is still used for the m68k-aout target, and
+perhaps others.
+
+This version of GAS should not be used for any new development.
+
+There is still code that is specific to this version of GAS, notably in
+@file{write.c}. There is no way for this code to loop through all the
+sections; it simply looks at global variables like @code{text_frag_root} and
+@code{data_frag_root}.
+
+The type @code{segT} is an enum.
+
+@node MANY_SEGMENTS
+@subsection MANY_SEGMENTS gas version
+@cindex MANY_SEGMENTS
+
+The @code{MANY_SEGMENTS} version of gas is only used for COFF. It uses the BFD
+library, but it writes out all the data itself using @code{bfd_write}. This
+version of gas supports up to 40 normal sections. The section names are stored
+in the @code{seg_name} array. Other information is stored in the
+@code{segment_info} array.
+
+The type @code{segT} is an enum. Code that wants to examine all the sections
+can use a @code{segT} variable as loop index from @code{SEG_E0} up to but not
+including @code{SEG_UNKNOWN}.
+
+Most of the code specific to this version of GAS is in the file
+@file{config/obj-coff.c}, in the portion of that file that is compiled when
+@code{BFD_ASSEMBLER} is not defined.
+
+This version of GAS is still used for several COFF targets.
+
+@node BFD_ASSEMBLER
+@subsection BFD_ASSEMBLER gas version
+@cindex BFD_ASSEMBLER
+
+The preferred version of GAS is the @code{BFD_ASSEMBLER} version. In this
+version of GAS, the output file is a normal BFD, and the BFD routines are used
+to generate the output.
+
+@code{BFD_ASSEMBLER} will automatically be used for certain targets, including
+those that use the ELF, ECOFF, and SOM object file formats, and also all Alpha,
+MIPS, PowerPC, and SPARC targets. You can force the use of
+@code{BFD_ASSEMBLER} for other targets with the configure option
+@samp{--enable-bfd-assembler}; however, it has not been tested for many
+targets, and can not be assumed to work.
+
+@node Data types
+@section Data types
+@cindex internals, data types
+
+This section describes some fundamental GAS data types.
+
+@menu
+* Symbols:: The symbolS structure
+* Expressions:: The expressionS structure
+* Fixups:: The fixS structure
+* Frags:: The fragS structure
+@end menu
+
+@node Symbols
+@subsection Symbols
+@cindex internals, symbols
+@cindex symbols, internal
+@cindex symbolS structure
+
+The definition for @code{struct symbol}, also known as @code{symbolS}, is
+located in @file{struc-symbol.h}. Symbol structures contain the following
+fields:
+
+@table @code
+@item sy_value
+This is an @code{expressionS} that describes the value of the symbol. It might
+refer to one or more other symbols; if so, its true value may not be known
+until @code{resolve_symbol_value} is called in @code{write_object_file}.
+
+The expression is often simply a constant. Before @code{resolve_symbol_value}
+is called, the value is the offset from the frag (@pxref{Frags}). Afterward,
+the frag address has been added in.
+
+@item sy_resolved
+This field is non-zero if the symbol's value has been completely resolved. It
+is used during the final pass over the symbol table.
+
+@item sy_resolving
+This field is used to detect loops while resolving the symbol's value.
+
+@item sy_used_in_reloc
+This field is non-zero if the symbol is used by a relocation entry. If a local
+symbol is used in a relocation entry, it must be possible to redirect those
+relocations to other symbols, or this symbol cannot be removed from the final
+symbol list.
+
+@item sy_next
+@itemx sy_previous
+These pointers to other @code{symbolS} structures describe a singly or doubly
+linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not defined, the
+@code{sy_previous} field will be omitted; @code{SYMBOLS_NEED_BACKPOINTERS} is
+always defined if @code{BFD_ASSEMBLER}.) These fields should be accessed with
+the @code{symbol_next} and @code{symbol_previous} macros.
+
+@item sy_frag
+This points to the frag (@pxref{Frags}) that this symbol is attached to.
+
+@item sy_used
+Whether the symbol is used as an operand or in an expression. Note: Not all of
+the backends keep this information accurate; backends which use this bit are
+responsible for setting it when a symbol is used in backend routines.
+
+@item sy_mri_common
+Whether the symbol is an MRI common symbol created by the @code{COMMON}
+pseudo-op when assembling in MRI mode.
+
+@item bsym
+If @code{BFD_ASSEMBLER} is defined, this points to the BFD @code{asymbol} that
+will be used in writing the object file.
+
+@item sy_name_offset
+(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the position of
+the symbol's name in the string table of the object file. On some formats,
+this will start at position 4, with position 0 reserved for unnamed symbols.
+This field is not used until @code{write_object_file} is called.
+
+@item sy_symbol
+(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the
+format-specific symbol structure, as it would be written into the object file.
+
+@item sy_number
+(Only used if @code{BFD_ASSEMBLER} is not defined.) This is a 24-bit symbol
+number, for use in constructing relocation table entries.
+
+@item sy_obj
+This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no macro by
+that name is defined in @file{obj-format.h}, this field is not defined.
+
+@item sy_tc
+This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no macro
+by that name is defined in @file{targ-cpu.h}, this field is not defined.
+
+@item TARGET_SYMBOL_FIELDS
+If this macro is defined, it defines additional fields in the symbol structure.
+This macro is obsolete, and should be replaced when possible by uses of
+@code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}.
+@end table
+
+There are a number of access routines used to extract the fields of a
+@code{symbolS} structure. When possible, these routines should be used rather
+than referring to the fields directly. These routines will work for any GAS
+version.
+
+@table @code
+@item S_SET_VALUE
+@cindex S_SET_VALUE
+Set the symbol's value.
+
+@item S_GET_VALUE
+@cindex S_GET_VALUE
+Get the symbol's value. This will cause @code{resolve_symbol_value} to be
+called if necessary, so @code{S_GET_VALUE} should only be called when it is
+safe to resolve symbols (i.e., after the entire input file has been read and
+all symbols have been defined).
+
+@item S_SET_SEGMENT
+@cindex S_SET_SEGMENT
+Set the section of the symbol.
+
+@item S_GET_SEGMENT
+@cindex S_GET_SEGMENT
+Get the symbol's section.
+
+@item S_GET_NAME
+@cindex S_GET_NAME
+Get the name of the symbol.
+
+@item S_SET_NAME
+@cindex S_SET_NAME
+Set the name of the symbol.
+
+@item S_IS_EXTERNAL
+@cindex S_IS_EXTERNAL
+Return non-zero if the symbol is externally visible.
+
+@item S_IS_EXTERN
+@cindex S_IS_EXTERN
+A synonym for @code{S_IS_EXTERNAL}. Don't use it.
+
+@item S_IS_WEAK
+@cindex S_IS_WEAK
+Return non-zero if the symbol is weak.
+
+@item S_IS_COMMON
+@cindex S_IS_COMMON
+Return non-zero if this is a common symbol. Common symbols are sometimes
+represented as undefined symbols with a value, in which case this function will
+not be reliable.
+
+@item S_IS_DEFINED
+@cindex S_IS_DEFINED
+Return non-zero if this symbol is defined. This function is not reliable when
+called on a common symbol.
+
+@item S_IS_DEBUG
+@cindex S_IS_DEBUG
+Return non-zero if this is a debugging symbol.
+
+@item S_IS_LOCAL
+@cindex S_IS_LOCAL
+Return non-zero if this is a local assembler symbol which should not be
+included in the final symbol table. Note that this is not the opposite of
+@code{S_IS_EXTERNAL}. The @samp{-L} assembler option affects the return value
+of this function.
+
+@item S_SET_EXTERNAL
+@cindex S_SET_EXTERNAL
+Mark the symbol as externally visible.
+
+@item S_CLEAR_EXTERNAL
+@cindex S_CLEAR_EXTERNAL
+Mark the symbol as not externally visible.
+
+@item S_SET_WEAK
+@cindex S_SET_WEAK
+Mark the symbol as weak.
+
+@item S_GET_TYPE
+@item S_GET_DESC
+@item S_GET_OTHER
+@cindex S_GET_TYPE
+@cindex S_GET_DESC
+@cindex S_GET_OTHER
+Get the @code{type}, @code{desc}, and @code{other} fields of the symbol. These
+are only defined for object file formats for which they make sense (primarily
+a.out).
+
+@item S_SET_TYPE
+@item S_SET_DESC
+@item S_SET_OTHER
+@cindex S_SET_TYPE
+@cindex S_SET_DESC
+@cindex S_SET_OTHER
+Set the @code{type}, @code{desc}, and @code{other} fields of the symbol. These
+are only defined for object file formats for which they make sense (primarily
+a.out).
+
+@item S_GET_SIZE
+@cindex S_GET_SIZE
+Get the size of a symbol. This is only defined for object file formats for
+which it makes sense (primarily ELF).
+
+@item S_SET_SIZE
+@cindex S_SET_SIZE
+Set the size of a symbol. This is only defined for object file formats for
+which it makes sense (primarily ELF).
+@end table
+
+@node Expressions
+@subsection Expressions
+@cindex internals, expressions
+@cindex expressions, internal
+@cindex expressionS structure
+
+Expressions are stored in an @code{expressionS} structure. The structure is
+defined in @file{expr.h}.
+
+@cindex expression
+The macro @code{expression} will create an @code{expressionS} structure based
+on the text found at the global variable @code{input_line_pointer}.
+
+@cindex make_expr_symbol
+@cindex expr_symbol_where
+A single @code{expressionS} structure can represent a single operation.
+Complex expressions are formed by creating @dfn{expression symbols} and
+combining them in @code{expressionS} structures. An expression symbol is
+created by calling @code{make_expr_symbol}. An expression symbol should
+naturally never appear in a symbol table, and the implementation of
+@code{S_IS_LOCAL} (@pxref{Symbols}) reflects that. The function
+@code{expr_symbol_where} returns non-zero if a symbol is an expression symbol,
+and also returns the file and line for the expression which caused it to be
+created.
+
+The @code{expressionS} structure has two symbol fields, a number field, an
+operator field, and a field indicating whether the number is unsigned.
+
+The operator field is of type @code{operatorT}, and describes how to interpret
+the other fields; see the definition in @file{expr.h} for the possibilities.
+
+An @code{operatorT} value of @code{O_big} indicates either a floating point
+number, stored in the global variable @code{generic_floating_point_number}, or
+an integer to large to store in an @code{offsetT} type, stored in the global
+array @code{generic_bignum}. This rather inflexible approach makes it
+impossible to use floating point numbers or large expressions in complex
+expressions.
+
+@node Fixups
+@subsection Fixups
+@cindex internals, fixups
+@cindex fixups
+@cindex fixS structure
+
+A @dfn{fixup} is basically anything which can not be resolved in the first
+pass. Sometimes a fixup can be resolved by the end of the assembly; if not,
+the fixup becomes a relocation entry in the object file.
+
+@cindex fix_new
+@cindex fix_new_exp
+A fixup is created by a call to @code{fix_new} or @code{fix_new_exp}. Both
+take a frag (@pxref{Frags}), a position within the frag, a size, an indication
+of whether the fixup is PC relative, and a type. In a @code{BFD_ASSEMBLER}
+GAS, the type is nominally a @code{bfd_reloc_code_real_type}, but several
+targets use other type codes to represent fixups that can not be described as
+relocations.
+
+The @code{fixS} structure has a number of fields, several of which are obsolete
+or are only used by a particular target. The important fields are:
+
+@table @code
+@item fx_frag
+The frag (@pxref{Frags}) this fixup is in.
+
+@item fx_where
+The location within the frag where the fixup occurs.
+
+@item fx_addsy
+The symbol this fixup is against. Typically, the value of this symbol is added
+into the object contents. This may be NULL.
+
+@item fx_subsy
+The value of this symbol is subtracted from the object contents. This is
+normally NULL.
+
+@item fx_offset
+A number which is added into the fixup.
+
+@item fx_addnumber
+Some CPU backends use this field to convey information between
+@code{md_apply_fix} and @code{tc_gen_reloc}. The machine independent code does
+not use it.
+
+@item fx_next
+The next fixup in the section.
+
+@item fx_r_type
+The type of the fixup. This field is only defined if @code{BFD_ASSEMBLER}, or
+if the target defines @code{NEED_FX_R_TYPE}.
+
+@item fx_size
+The size of the fixup. This is mostly used for error checking.
+
+@item fx_pcrel
+Whether the fixup is PC relative.
+
+@item fx_done
+Non-zero if the fixup has been applied, and no relocation entry needs to be
+generated.
+
+@item fx_file
+@itemx fx_line
+The file and line where the fixup was created.
+
+@item tc_fix_data
+This has the type @code{TC_FIX_TYPE}, and is only defined if the target defines
+that macro.
+@end table
+
+@node Frags
+@subsection Frags
+@cindex internals, frags
+@cindex frags
+@cindex fragS structure.
+
+The @code{fragS} structure is defined in @file{as.h}. Each frag represents a
+portion of the final object file. As GAS reads the source file, it creates
+frags to hold the data that it reads. At the end of the assembly the frags and
+fixups are processed to produce the final contents.
+
+@table @code
+@item fr_address
+The address of the frag. This is not set until the assembler rescans the list
+of all frags after the entire input file is parsed. The function
+@code{relax_segment} fills in this field.
+
+@item fr_next
+Pointer to the next frag in this (sub)section.
+
+@item fr_fix
+Fixed number of characters we know we're going to emit to the output file. May
+be zero.
+
+@item fr_var
+Variable number of characters we may output, after the initial @code{fr_fix}
+characters. May be zero.
+
+@item fr_offset
+The interpretation of this field is controlled by @code{fr_type}. Generally,
+if @code{fr_var} is non-zero, this is a repeat count: the @code{fr_var}
+characters are output @code{fr_offset} times.
+
+@item line
+Holds line number info when an assembler listing was requested.
+
+@item fr_type
+Relaxation state. This field indicates the interpretation of @code{fr_offset},
+@code{fr_symbol} and the variable-length tail of the frag, as well as the
+treatment it gets in various phases of processing. It does not affect the
+initial @code{fr_fix} characters; they are always supposed to be output
+verbatim (fixups aside). See below for specific values this field can have.
+
+@item fr_subtype
+Relaxation substate. If the macro @code{md_relax_frag} isn't defined, this is
+assumed to be an index into @code{TC_GENERIC_RELAX_TABLE} for the generic
+relaxation code to process (@pxref{Relaxation}). If @code{md_relax_frag} is
+defined, this field is available for any use by the CPU-specific code.
+
+@item fr_symbol
+This normally indicates the symbol to use when relaxing the frag according to
+@code{fr_type}.
+
+@item fr_opcode
+Points to the lowest-addressed byte of the opcode, for use in relaxation.
+
+@item tc_frag_data
+Target specific fragment data of type TC_FRAG_TYPE.
+Only present if @code{TC_FRAG_TYPE} is defined.
+
+@item fr_file
+@itemx fr_line
+The file and line where this frag was last modified.
+
+@item fr_literal
+Declared as a one-character array, this last field grows arbitrarily large to
+hold the actual contents of the frag.
+@end table
+
+These are the possible relaxation states, provided in the enumeration type
+@code{relax_stateT}, and the interpretations they represent for the other
+fields:
+
+@table @code
+@item rs_align
+@itemx rs_align_code
+The start of the following frag should be aligned on some boundary. In this
+frag, @code{fr_offset} is the logarithm (base 2) of the alignment in bytes.
+(For example, if alignment on an 8-byte boundary were desired, @code{fr_offset}
+would have a value of 3.) The variable characters indicate the fill pattern to
+be used. The @code{fr_subtype} field holds the maximum number of bytes to skip
+when doing this alignment. If more bytes are needed, the alignment is not
+done. An @code{fr_subtype} value of 0 means no maximum, which is the normal
+case. Target backends can use @code{rs_align_code} to handle certain types of
+alignment differently.
+
+@item rs_broken_word
+This indicates that ``broken word'' processing should be done (@pxref{Broken
+words}). If broken word processing is not necessary on the target machine,
+this enumerator value will not be defined.
+
+@item rs_cfa
+This state is used to implement exception frame optimizations. The
+@code{fr_symbol} is an expression symbol for the subtraction which may be
+relaxed. The @code{fr_opcode} field holds the frag for the preceding command
+byte. The @code{fr_offset} field holds the offset within that frag. The
+@code{fr_subtype} field is used during relaxation to hold the current size of
+the frag.
+
+@item rs_fill
+The variable characters are to be repeated @code{fr_offset} times. If
+@code{fr_offset} is 0, this frag has a length of @code{fr_fix}. Most frags
+have this type.
+
+@item rs_leb128
+This state is used to implement the DWARF ``little endian base 128''
+variable length number format. The @code{fr_symbol} is always an expression
+symbol, as constant expressions are emitted directly. The @code{fr_offset}
+field is used during relaxation to hold the previous size of the number so
+that we can determine if the fragment changed size.
+
+@item rs_machine_dependent
+Displacement relaxation is to be done on this frag. The target is indicated by
+@code{fr_symbol} and @code{fr_offset}, and @code{fr_subtype} indicates the
+particular machine-specific addressing mode desired. @xref{Relaxation}.
+
+@item rs_org
+The start of the following frag should be pushed back to some specific offset
+within the section. (Some assemblers use the value as an absolute address; GAS
+does not handle final absolute addresses, but rather requires that the linker
+set them.) The offset is given by @code{fr_symbol} and @code{fr_offset}; one
+character from the variable-length tail is used as the fill character.
+@end table
+
+@cindex frchainS structure
+A chain of frags is built up for each subsection. The data structure
+describing a chain is called a @code{frchainS}, and contains the following
+fields:
+
+@table @code
+@item frch_root
+Points to the first frag in the chain. May be NULL if there are no frags in
+this chain.
+@item frch_last
+Points to the last frag in the chain, or NULL if there are none.
+@item frch_next
+Next in the list of @code{frchainS} structures.
+@item frch_seg
+Indicates the section this frag chain belongs to.
+@item frch_subseg
+Subsection (subsegment) number of this frag chain.
+@item fix_root, fix_tail
+(Defined only if @code{BFD_ASSEMBLER} is defined). Point to first and last
+@code{fixS} structures associated with this subsection.
+@item frch_obstack
+Not currently used. Intended to be used for frag allocation for this
+subsection. This should reduce frag generation caused by switching sections.
+@item frch_frag_now
+The current frag for this subsegment.
+@end table
+
+A @code{frchainS} corresponds to a subsection; each section has a list of
+@code{frchainS} records associated with it. In most cases, only one subsection
+of each section is used, so the list will only be one element long, but any
+processing of frag chains should be prepared to deal with multiple chains per
+section.
+
+After the input files have been completely processed, and no more frags are to
+be generated, the frag chains are joined into one per section for further
+processing. After this point, it is safe to operate on one chain per section.
+
+The assembler always has a current frag, named @code{frag_now}. More space is
+allocated for the current frag using the @code{frag_more} function; this
+returns a pointer to the amount of requested space. Relaxing is done using
+variant frags allocated by @code{frag_var} or @code{frag_variant}
+(@pxref{Relaxation}).
+
+@node GAS processing
+@section What GAS does when it runs
+@cindex internals, overview
+
+This is a quick look at what an assembler run looks like.
+
+@itemize @bullet
+@item
+The assembler initializes itself by calling various init routines.
+
+@item
+For each source file, the @code{read_a_source_file} function reads in the file
+and parses it. The global variable @code{input_line_pointer} points to the
+current text; it is guaranteed to be correct up to the end of the line, but not
+farther.
+
+@item
+For each line, the assembler passes labels to the @code{colon} function, and
+isolates the first word. If it looks like a pseudo-op, the word is looked up
+in the pseudo-op hash table @code{po_hash} and dispatched to a pseudo-op
+routine. Otherwise, the target dependent @code{md_assemble} routine is called
+to parse the instruction.
+
+@item
+When pseudo-ops or instructions output data, they add it to a frag, calling
+@code{frag_more} to get space to store it in.
+
+@item
+Pseudo-ops and instructions can also output fixups created by @code{fix_new} or
+@code{fix_new_exp}.
+
+@item
+For certain targets, instructions can create variant frags which are used to
+store relaxation information (@pxref{Relaxation}).
+
+@item
+When the input file is finished, the @code{write_object_file} routine is
+called. It assigns addresses to all the frags (@code{relax_segment}), resolves
+all the fixups (@code{fixup_segment}), resolves all the symbol values (using
+@code{resolve_symbol_value}), and finally writes out the file (in the
+@code{BFD_ASSEMBLER} case, this is done by simply calling @code{bfd_close}).
+@end itemize
+
+@node Porting GAS
+@section Porting GAS
+@cindex porting
+
+Each GAS target specifies two main things: the CPU file and the object format
+file. Two main switches in the @file{configure.in} file handle this. The
+first switches on CPU type to set the shell variable @code{cpu_type}. The
+second switches on the entire target to set the shell variable @code{fmt}.
+
+The configure script uses the value of @code{cpu_type} to select two files in
+the @file{config} directory: @file{tc-@var{CPU}.c} and @file{tc-@var{CPU}.h}.
+The configuration process will create a file named @file{targ-cpu.h} in the
+build directory which includes @file{tc-@var{CPU}.h}.
+
+The configure script also uses the value of @code{fmt} to select two files:
+@file{obj-@var{fmt}.c} and @file{obj-@var{fmt}.h}. The configuration process
+will create a file named @file{obj-format.h} in the build directory which
+includes @file{obj-@var{fmt}.h}.
+
+You can also set the emulation in the configure script by setting the @code{em}
+variable. Normally the default value of @samp{generic} is fine. The
+configuration process will create a file named @file{targ-env.h} in the build
+directory which includes @file{te-@var{em}.h}.
+
+Porting GAS to a new CPU requires writing the @file{tc-@var{CPU}} files.
+Porting GAS to a new object file format requires writing the
+@file{obj-@var{fmt}} files. There is sometimes some interaction between these
+two files, but it is normally minimal.
+
+The best approach is, of course, to copy existing files. The documentation
+below assumes that you are looking at existing files to see usage details.
+
+These interfaces have grown over time, and have never been carefully thought
+out or designed. Nothing about the interfaces described here is cast in stone.
+It is possible that they will change from one version of the assembler to the
+next. Also, new macros are added all the time as they are needed.
+
+@menu
+* CPU backend:: Writing a CPU backend
+* Object format backend:: Writing an object format backend
+* Emulations:: Writing emulation files
+@end menu
+
+@node CPU backend
+@subsection Writing a CPU backend
+@cindex CPU backend
+@cindex @file{tc-@var{CPU}}
+
+The CPU backend files are the heart of the assembler. They are the only parts
+of the assembler which actually know anything about the instruction set of the
+processor.
+
+You must define a reasonably small list of macros and functions in the CPU
+backend files. You may define a large number of additional macros in the CPU
+backend files, not all of which are documented here. You must, of course,
+define macros in the @file{.h} file, which is included by every assembler
+source file. You may define the functions as macros in the @file{.h} file, or
+as functions in the @file{.c} file.
+
+@table @code
+@item TC_@var{CPU}
+@cindex TC_@var{CPU}
+By convention, you should define this macro in the @file{.h} file. For
+example, @file{tc-m68k.h} defines @code{TC_M68K}. You might have to use this
+if it is necessary to add CPU specific code to the object format file.
+
+@item TARGET_FORMAT
+This macro is the BFD target name to use when creating the output file. This
+will normally depend upon the @code{OBJ_@var{FMT}} macro.
+
+@item TARGET_ARCH
+This macro is the BFD architecture to pass to @code{bfd_set_arch_mach}.
+
+@item TARGET_MACH
+This macro is the BFD machine number to pass to @code{bfd_set_arch_mach}. If
+it is not defined, GAS will use 0.
+
+@item TARGET_BYTES_BIG_ENDIAN
+You should define this macro to be non-zero if the target is big endian, and
+zero if the target is little endian.
+
+@item md_shortopts
+@itemx md_longopts
+@itemx md_longopts_size
+@itemx md_parse_option
+@itemx md_show_usage
+@cindex md_shortopts
+@cindex md_longopts
+@cindex md_longopts_size
+@cindex md_parse_option
+@cindex md_show_usage
+GAS uses these variables and functions during option processing.
+@code{md_shortopts} is a @code{const char *} which GAS adds to the machine
+independent string passed to @code{getopt}. @code{md_longopts} is a
+@code{struct option []} which GAS adds to the machine independent long options
+passed to @code{getopt}; you may use @code{OPTION_MD_BASE}, defined in
+@file{as.h}, as the start of a set of long option indices, if necessary.
+@code{md_longopts_size} is a @code{size_t} holding the size @code{md_longopts}.
+GAS will call @code{md_parse_option} whenever @code{getopt} returns an
+unrecognized code, presumably indicating a special code value which appears in
+@code{md_longopts}. GAS will call @code{md_show_usage} when a usage message is
+printed; it should print a description of the machine specific options.
+
+@item md_begin
+@cindex md_begin
+GAS will call this function at the start of the assembly, after the command
+line arguments have been parsed and all the machine independent initializations
+have been completed.
+
+@item md_cleanup
+@cindex md_cleanup
+If you define this macro, GAS will call it at the end of each input file.
+
+@item md_assemble
+@cindex md_assemble
+GAS will call this function for each input line which does not contain a
+pseudo-op. The argument is a null terminated string. The function should
+assemble the string as an instruction with operands. Normally
+@code{md_assemble} will do this by calling @code{frag_more} and writing out
+some bytes (@pxref{Frags}). @code{md_assemble} will call @code{fix_new} to
+create fixups as needed (@pxref{Fixups}). Targets which need to do special
+purpose relaxation will call @code{frag_var}.
+
+@item md_pseudo_table
+@cindex md_pseudo_table
+This is a const array of type @code{pseudo_typeS}. It is a mapping from
+pseudo-op names to functions. You should use this table to implement
+pseudo-ops which are specific to the CPU.
+
+@item tc_conditional_pseudoop
+@cindex tc_conditional_pseudoop
+If this macro is defined, GAS will call it with a @code{pseudo_typeS} argument.
+It should return non-zero if the pseudo-op is a conditional which controls
+whether code is assembled, such as @samp{.if}. GAS knows about the normal
+conditional pseudo-ops,and you should normally not have to define this macro.
+
+@item comment_chars
+@cindex comment_chars
+This is a null terminated @code{const char} array of characters which start a
+comment.
+
+@item tc_comment_chars
+@cindex tc_comment_chars
+If this macro is defined, GAS will use it instead of @code{comment_chars}.
+
+@item tc_symbol_chars
+@cindex tc_symbol_chars
+If this macro is defined, it is a pointer to a null terminated list of
+characters which may appear in an operand. GAS already assumes that all
+alphanumberic characters, and @samp{$}, @samp{.}, and @samp{_} may appear in an
+operand (see @samp{symbol_chars} in @file{app.c}). This macro may be defined
+to treat additional characters as appearing in an operand. This affects the
+way in which GAS removes whitespace before passing the string to
+@samp{md_assemble}.
+
+@item line_comment_chars
+@cindex line_comment_chars
+This is a null terminated @code{const char} array of characters which start a
+comment when they appear at the start of a line.
+
+@item line_separator_chars
+@cindex line_separator_chars
+This is a null terminated @code{const char} array of characters which separate
+lines (semicolon and newline are such characters by default, and need not be
+listed in this array).
+
+@item EXP_CHARS
+@cindex EXP_CHARS
+This is a null terminated @code{const char} array of characters which may be
+used as the exponent character in a floating point number. This is normally
+@code{"eE"}.
+
+@item FLT_CHARS
+@cindex FLT_CHARS
+This is a null terminated @code{const char} array of characters which may be
+used to indicate a floating point constant. A zero followed by one of these
+characters is assumed to be followed by a floating point number; thus they
+operate the way that @code{0x} is used to indicate a hexadecimal constant.
+Usually this includes @samp{r} and @samp{f}.
+
+@item LEX_AT
+@cindex LEX_AT
+You may define this macro to the lexical type of the @kbd{@}} character. The
+default is zero.
+
+Lexical types are a combination of @code{LEX_NAME} and @code{LEX_BEGIN_NAME},
+both defined in @file{read.h}. @code{LEX_NAME} indicates that the character
+may appear in a name. @code{LEX_BEGIN_NAME} indicates that the character may
+appear at the beginning of a nem.
+
+@item LEX_BR
+@cindex LEX_BR
+You may define this macro to the lexical type of the brace characters @kbd{@{},
+@kbd{@}}, @kbd{[}, and @kbd{]}. The default value is zero.
+
+@item LEX_PCT
+@cindex LEX_PCT
+You may define this macro to the lexical type of the @kbd{%} character. The
+default value is zero.
+
+@item LEX_QM
+@cindex LEX_QM
+You may define this macro to the lexical type of the @kbd{?} character. The
+default value it zero.
+
+@item LEX_DOLLAR
+@cindex LEX_DOLLAR
+You may define this macro to the lexical type of the @kbd{$} character. The
+default value is @code{LEX_NAME | LEX_BEGIN_NAME}.
+
+@item SINGLE_QUOTE_STRINGS
+@cindex SINGLE_QUOTE_STRINGS
+If you define this macro, GAS will treat single quotes as string delimiters.
+Normally only double quotes are accepted as string delimiters.
+
+@item NO_STRING_ESCAPES
+@cindex NO_STRING_ESCAPES
+If you define this macro, GAS will not permit escape sequences in a string.
+
+@item ONLY_STANDARD_ESCAPES
+@cindex ONLY_STANDARD_ESCAPES
+If you define this macro, GAS will warn about the use of nonstandard escape
+sequences in a string.
+
+@item md_start_line_hook
+@cindex md_start_line_hook
+If you define this macro, GAS will call it at the start of each line.
+
+@item LABELS_WITHOUT_COLONS
+@cindex LABELS_WITHOUT_COLONS
+If you define this macro, GAS will assume that any text at the start of a line
+is a label, even if it does not have a colon.
+
+@item TC_START_LABEL
+@cindex TC_START_LABEL
+You may define this macro to control what GAS considers to be a label. The
+default definition is to accept any name followed by a colon character.
+
+@item NO_PSEUDO_DOT
+@cindex NO_PSEUDO_DOT
+If you define this macro, GAS will not require pseudo-ops to start with a
+@kbd{.} character.
+
+@item TC_EQUAL_IN_INSN
+@cindex TC_EQUAL_IN_INSN
+If you define this macro, it should return nonzero if the instruction is
+permitted to contain an @kbd{=} character. GAS will use this to decide if a
+@kbd{=} is an assignment or an instruction.
+
+@item TC_EOL_IN_INSN
+@cindex TC_EOL_IN_INSN
+If you define this macro, it should return nonzero if the current input line
+pointer should be treated as the end of a line.
+
+@item md_parse_name
+@cindex md_parse_name
+If this macro is defined, GAS will call it for any symbol found in an
+expression. You can define this to handle special symbols in a special way.
+If a symbol always has a certain value, you should normally enter it in the
+symbol table, perhaps using @code{reg_section}.
+
+@item md_undefined_symbol
+@cindex md_undefined_symbol
+GAS will call this function when a symbol table lookup fails, before it
+creates a new symbol. Typically this would be used to supply symbols whose
+name or value changes dynamically, possibly in a context sensitive way.
+Predefined symbols with fixed values, such as register names or condition
+codes, are typically entered directly into the symbol table when @code{md_begin}
+is called.
+
+@item md_operand
+@cindex md_operand
+GAS will call this function for any expression that can not be recognized.
+When the function is called, @code{input_line_pointer} will point to the start
+of the expression.
+
+@item tc_unrecognized_line
+@cindex tc_unrecognized_line
+If you define this macro, GAS will call it when it finds a line that it can not
+parse.
+
+@item md_do_align
+@cindex md_do_align
+You may define this macro to handle an alignment directive. GAS will call it
+when the directive is seen in the input file. For example, the i386 backend
+uses this to generate efficient nop instructions of varying lengths, depending
+upon the number of bytes that the alignment will skip.
+
+@item HANDLE_ALIGN
+@cindex HANDLE_ALIGN
+You may define this macro to do special handling for an alignment directive.
+GAS will call it at the end of the assembly.
+
+@item md_flush_pending_output
+@cindex md_flush_pending_output
+If you define this macro, GAS will call it each time it skips any space because of a
+space filling or alignment or data allocation pseudo-op.
+
+@item TC_PARSE_CONS_EXPRESSION
+@cindex TC_PARSE_CONS_EXPRESSION
+You may define this macro to parse an expression used in a data allocation
+pseudo-op such as @code{.word}. You can use this to recognize relocation
+directives that may appear in such directives.
+
+@item BITFIELD_CONS_EXPRESSION
+@cindex BITFIELD_CONS_EXPRESSION
+If you define this macro, GAS will recognize bitfield instructions in data
+allocation pseudo-ops, as used on the i960.
+
+@item REPEAT_CONS_EXPRESSION
+@cindex REPEAT_CONS_EXPRESSION
+If you define this macro, GAS will recognize repeat counts in data allocation
+pseudo-ops, as used on the MIPS.
+
+@item md_cons_align
+@cindex md_cons_align
+You may define this macro to do any special alignment before a data allocation
+pseudo-op.
+
+@item TC_CONS_FIX_NEW
+@cindex TC_CONS_FIX_NEW
+You may define this macro to generate a fixup for a data allocation pseudo-op.
+
+@item TC_INIT_FIX_DATA (@var{fixp})
+@cindex TC_INIT_FIX_DATA
+A C statement to initialize the target specific fields of fixup @var{fixp}.
+These fields are defined with the @code{TC_FIX_TYPE} macro.
+
+@item TC_FIX_DATA_PRINT (@var{stream}, @var{fixp})
+@cindex TC_FIX_DATA_PRINT
+A C statement to output target specific debugging information for
+fixup @var{fixp} to @var{stream}. This macro is called by @code{print_fixup}.
+
+@item TC_FRAG_INIT (@var{fragp})
+@cindex TC_FRAG_INIT
+A C statement to initialize the target specific fields of frag @var{fragp}.
+These fields are defined with the @code{TC_FRAG_TYPE} macro.
+
+@item md_number_to_chars
+@cindex md_number_to_chars
+This should just call either @code{number_to_chars_bigendian} or
+@code{number_to_chars_littleendian}, whichever is appropriate. On targets like
+the MIPS which support options to change the endianness, which function to call
+is a runtime decision. On other targets, @code{md_number_to_chars} can be a
+simple macro.
+
+@item md_reloc_size
+@cindex md_reloc_size
+This variable is only used in the original version of gas (not
+@code{BFD_ASSEMBLER} and not @code{MANY_SEGMENTS}). It holds the size of a
+relocation entry.
+
+@item WORKING_DOT_WORD
+@itemx md_short_jump_size
+@itemx md_long_jump_size
+@itemx md_create_short_jump
+@itemx md_create_long_jump
+@cindex WORKING_DOT_WORD
+@cindex md_short_jump_size
+@cindex md_long_jump_size
+@cindex md_create_short_jump
+@cindex md_create_long_jump
+If @code{WORKING_DOT_WORD} is defined, GAS will not do broken word processing
+(@pxref{Broken words}). Otherwise, you should set @code{md_short_jump_size} to
+the size of a short jump (a jump that is just long enough to jump around a long
+jmp) and @code{md_long_jump_size} to the size of a long jump (a jump that can
+go anywhere in the function), You should define @code{md_create_short_jump} to
+create a short jump around a long jump, and define @code{md_create_long_jump}
+to create a long jump.
+
+@item md_estimate_size_before_relax
+@cindex md_estimate_size_before_relax
+This function returns an estimate of the size of a @code{rs_machine_dependent}
+frag before any relaxing is done. It may also create any necessary
+relocations.
+
+@item md_relax_frag
+@cindex md_relax_frag
+This macro may be defined to relax a frag. GAS will call this with the frag
+and the change in size of all previous frags; @code{md_relax_frag} should
+return the change in size of the frag. @xref{Relaxation}.
+
+@item TC_GENERIC_RELAX_TABLE
+@cindex TC_GENERIC_RELAX_TABLE
+If you do not define @code{md_relax_frag}, you may define
+@code{TC_GENERIC_RELAX_TABLE} as a table of @code{relax_typeS} structures. The
+machine independent code knows how to use such a table to relax PC relative
+references. See @file{tc-m68k.c} for an example. @xref{Relaxation}.
+
+@item md_prepare_relax_scan
+@cindex md_prepare_relax_scan
+If defined, it is a C statement that is invoked prior to scanning
+the relax table.
+
+@item LINKER_RELAXING_SHRINKS_ONLY
+@cindex LINKER_RELAXING_SHRINKS_ONLY
+If you define this macro, and the global variable @samp{linkrelax} is set
+(because of a command line option, or unconditionally in @code{md_begin}), a
+@samp{.align} directive will cause extra space to be allocated. The linker can
+then discard this space when relaxing the section.
+
+@item md_convert_frag
+@cindex md_convert_frag
+GAS will call this for each rs_machine_dependent fragment.
+The instruction is completed using the data from the relaxation pass.
+It may also create any necessary relocations.
+@xref{Relaxation}.
+
+@item md_apply_fix
+@cindex md_apply_fix
+GAS will call this for each fixup. It should store the correct value in the
+object file.
+
+@item TC_HANDLES_FX_DONE
+@cindex TC_HANDLES_FX_DONE
+If this macro is defined, it means that @code{md_apply_fix} correctly sets the
+@code{fx_done} field in the fixup.
+
+@item tc_gen_reloc
+@cindex tc_gen_reloc
+A @code{BFD_ASSEMBLER} GAS will call this to generate a reloc. GAS will pass
+the resulting reloc to @code{bfd_install_relocation}. This currently works
+poorly, as @code{bfd_install_relocation} often does the wrong thing, and
+instances of @code{tc_gen_reloc} have been written to work around the problems,
+which in turns makes it difficult to fix @code{bfd_install_relocation}.
+
+@item RELOC_EXPANSION_POSSIBLE
+@cindex RELOC_EXPANSION_POSSIBLE
+If you define this macro, it means that @code{tc_gen_reloc} may return multiple
+relocation entries for a single fixup. In this case, the return value of
+@code{tc_gen_reloc} is a pointer to a null terminated array.
+
+@item MAX_RELOC_EXPANSION
+@cindex MAX_RELOC_EXPANSION
+You must define this if @code{RELOC_EXPANSION_POSSIBLE} is defined; it
+indicates the largest number of relocs which @code{tc_gen_reloc} may return for
+a single fixup.
+
+@item tc_fix_adjustable
+@cindex tc_fix_adjustable
+You may define this macro to indicate whether a fixup against a locally defined
+symbol should be adjusted to be against the section symbol. It should return a
+non-zero value if the adjustment is acceptable.
+
+@item MD_PCREL_FROM_SECTION
+@cindex MD_PCREL_FROM_SECTION
+If you define this macro, it should return the offset between the address of a
+PC relative fixup and the position from which the PC relative adjustment should
+be made. On many processors, the base of a PC relative instruction is the next
+instruction, so this macro would return the length of an instruction.
+
+@item md_pcrel_from
+@cindex md_pcrel_from
+This is the default value of @code{MD_PCREL_FROM_SECTION}. The difference is
+that @code{md_pcrel_from} does not take a section argument.
+
+@item tc_frob_label
+@cindex tc_frob_label
+If you define this macro, GAS will call it each time a label is defined.
+
+@item md_section_align
+@cindex md_section_align
+GAS will call this function for each section at the end of the assembly, to
+permit the CPU backend to adjust the alignment of a section.
+
+@item tc_frob_section
+@cindex tc_frob_section
+If you define this macro, a @code{BFD_ASSEMBLER} GAS will call it for each
+section at the end of the assembly.
+
+@item tc_frob_file_before_adjust
+@cindex tc_frob_file_before_adjust
+If you define this macro, GAS will call it after the symbol values are
+resolved, but before the fixups have been changed from local symbols to section
+symbols.
+
+@item tc_frob_symbol
+@cindex tc_frob_symbol
+If you define this macro, GAS will call it for each symbol. You can indicate
+that the symbol should not be included in the object file by definining this
+macro to set its second argument to a non-zero value.
+
+@item tc_frob_file
+@cindex tc_frob_file
+If you define this macro, GAS will call it after the symbol table has been
+completed, but before the relocations have been generated.
+
+@item tc_frob_file_after_relocs
+If you define this macro, GAS will call it after the relocs have been
+generated.
+
+@item LISTING_HEADER
+A string to use on the header line of a listing. The default value is simply
+@code{"GAS LISTING"}.
+
+@item LISTING_WORD_SIZE
+The number of bytes to put into a word in a listing. This affects the way the
+bytes are clumped together in the listing. For example, a value of 2 might
+print @samp{1234 5678} where a value of 1 would print @samp{12 34 56 78}. The
+default value is 4.
+
+@item LISTING_LHS_WIDTH
+The number of words of data to print on the first line of a listing for a
+particular source line, where each word is @code{LISTING_WORD_SIZE} bytes. The
+default value is 1.
+
+@item LISTING_LHS_WIDTH_SECOND
+Like @code{LISTING_LHS_WIDTH}, but applying to the second and subsequent line
+of the data printed for a particular source line. The default value is 1.
+
+@item LISTING_LHS_CONT_LINES
+The maximum number of continuation lines to print in a listing for a particular
+source line. The default value is 4.
+
+@item LISTING_RHS_WIDTH
+The maximum number of characters to print from one line of the input file. The
+default value is 100.
+@end table
+
+@node Object format backend
+@subsection Writing an object format backend
+@cindex object format backend
+@cindex @file{obj-@var{fmt}}
+
+As with the CPU backend, the object format backend must define a few things,
+and may define some other things. The interface to the object format backend
+is generally simpler; most of the support for an object file format consists of
+defining a number of pseudo-ops.
+
+The object format @file{.h} file must include @file{targ-cpu.h}.
+
+This section will only define the @code{BFD_ASSEMBLER} version of GAS. It is
+impossible to support a new object file format using any other version anyhow,
+as the original GAS version only supports a.out, and the @code{MANY_SEGMENTS}
+GAS version only supports COFF.
+
+@table @code
+@item OBJ_@var{format}
+@cindex OBJ_@var{format}
+By convention, you should define this macro in the @file{.h} file. For
+example, @file{obj-elf.h} defines @code{OBJ_ELF}. You might have to use this
+if it is necessary to add object file format specific code to the CPU file.
+
+@item obj_begin
+If you define this macro, GAS will call it at the start of the assembly, after
+the command line arguments have been parsed and all the machine independent
+initializations have been completed.
+
+@item obj_app_file
+@cindex obj_app_file
+If you define this macro, GAS will invoke it when it sees a @code{.file}
+pseudo-op or a @samp{#} line as used by the C preprocessor.
+
+@item OBJ_COPY_SYMBOL_ATTRIBUTES
+@cindex OBJ_COPY_SYMBOL_ATTRIBUTES
+You should define this macro to copy object format specific information from
+one symbol to another. GAS will call it when one symbol is equated to
+another.
+
+@item obj_fix_adjustable
+@cindex obj_fix_adjustable
+You may define this macro to indicate whether a fixup against a locally defined
+symbol should be adjusted to be against the section symbol. It should return a
+non-zero value if the adjustment is acceptable.
+
+@item obj_sec_sym_ok_for_reloc
+@cindex obj_sec_sym_ok_for_reloc
+You may define this macro to indicate that it is OK to use a section symbol in
+a relocateion entry. If it is not, GAS will define a new symbol at the start
+of a section.
+
+@item EMIT_SECTION_SYMBOLS
+@cindex EMIT_SECTION_SYMBOLS
+You should define this macro with a zero value if you do not want to include
+section symbols in the output symbol table. The default value for this macro
+is one.
+
+@item obj_adjust_symtab
+@cindex obj_adjust_symtab
+If you define this macro, GAS will invoke it just before setting the symbol
+table of the output BFD. For example, the COFF support uses this macro to
+generate a @code{.file} symbol if none was generated previously.
+
+@item SEPARATE_STAB_SECTIONS
+@cindex SEPARATE_STAB_SECTIONS
+You may define this macro to indicate that stabs should be placed in separate
+sections, as in ELF.
+
+@item INIT_STAB_SECTION
+@cindex INIT_STAB_SECTION
+You may define this macro to initialize the stabs section in the output file.
+
+@item OBJ_PROCESS_STAB
+@cindex OBJ_PROCESS_STAB
+You may define this macro to do specific processing on a stabs entry.
+
+@item obj_frob_section
+@cindex obj_frob_section
+If you define this macro, GAS will call it for each section at the end of the
+assembly.
+
+@item obj_frob_file_before_adjust
+@cindex obj_frob_file_before_adjust
+If you define this macro, GAS will call it after the symbol values are
+resolved, but before the fixups have been changed from local symbols to section
+symbols.
+
+@item obj_frob_symbol
+@cindex obj_frob_symbol
+If you define this macro, GAS will call it for each symbol. You can indicate
+that the symbol should not be included in the object file by definining this
+macro to set its second argument to a non-zero value.
+
+@item obj_frob_file
+@cindex obj_frob_file
+If you define this macro, GAS will call it after the symbol table has been
+completed, but before the relocations have been generated.
+
+@item obj_frob_file_after_relocs
+If you define this macro, GAS will call it after the relocs have been
+generated.
+@end table
+
+@node Emulations
+@subsection Writing emulation files
+
+Normally you do not have to write an emulation file. You can just use
+@file{te-generic.h}.
+
+If you do write your own emulation file, it must include @file{obj-format.h}.
+
+An emulation file will often define @code{TE_@var{EM}}; this may then be used
+in other files to change the output.
+
+@node Relaxation
+@section Relaxation
+@cindex relaxation
+
+@dfn{Relaxation} is a generic term used when the size of some instruction or
+data depends upon the value of some symbol or other data.
+
+GAS knows to relax a particular type of PC relative relocation using a table.
+You can also define arbitrarily complex forms of relaxation yourself.
+
+@menu
+* Relaxing with a table:: Relaxing with a table
+* General relaxing:: General relaxing
+@end menu
+
+@node Relaxing with a table
+@subsection Relaxing with a table
+
+If you do not define @code{md_relax_frag}, and you do define
+@code{TC_GENERIC_RELAX_TABLE}, GAS will relax @code{rs_machine_dependent} frags
+based on the frag subtype and the displacement to some specified target
+address. The basic idea is that several machines have different addressing
+modes for instructions that can specify different ranges of values, with
+successive modes able to access wider ranges, including the entirety of the
+previous range. Smaller ranges are assumed to be more desirable (perhaps the
+instruction requires one word instead of two or three); if this is not the
+case, don't describe the smaller-range, inferior mode.
+
+The @code{fr_subtype} field of a frag is an index into a CPU-specific
+relaxation table. That table entry indicates the range of values that can be
+stored, the number of bytes that will have to be added to the frag to
+accomodate the addressing mode, and the index of the next entry to examine if
+the value to be stored is outside the range accessible by the current
+addressing mode. The @code{fr_symbol} field of the frag indicates what symbol
+is to be accessed; the @code{fr_offset} field is added in.
+
+If the @code{TC_PCREL_ADJUST} macro is defined, which currently should only happen
+for the NS32k family, the @code{TC_PCREL_ADJUST} macro is called on the frag to
+compute an adjustment to be made to the displacement.
+
+The value fitted by the relaxation code is always assumed to be a displacement
+from the current frag. (More specifically, from @code{fr_fix} bytes into the
+frag.)
+@ignore
+This seems kinda silly. What about fitting small absolute values? I suppose
+@code{md_assemble} is supposed to take care of that, but if the operand is a
+difference between symbols, it might not be able to, if the difference was not
+computable yet.
+@end ignore
+
+The end of the relaxation sequence is indicated by a ``next'' value of 0. This
+means that the first entry in the table can't be used.
+
+For some configurations, the linker can do relaxing within a section of an
+object file. If call instructions of various sizes exist, the linker can
+determine which should be used in each instance, when a symbol's value is
+resolved. In order for the linker to avoid wasting space and having to insert
+no-op instructions, it must be able to expand or shrink the section contents
+while still preserving intra-section references and meeting alignment
+requirements.
+
+For the i960 using b.out format, no expansion is done; instead, each
+@samp{.align} directive causes extra space to be allocated, enough that when
+the linker is relaxing a section and removing unneeded space, it can discard
+some or all of this extra padding and cause the following data to be correctly
+aligned.
+
+For the H8/300, I think the linker expands calls that can't reach, and doesn't
+worry about alignment issues; the cpu probably never needs any significant
+alignment beyond the instruction size.
+
+The relaxation table type contains these fields:
+
+@table @code
+@item long rlx_forward
+Forward reach, must be non-negative.
+@item long rlx_backward
+Backward reach, must be zero or negative.
+@item rlx_length
+Length in bytes of this addressing mode.
+@item rlx_more
+Index of the next-longer relax state, or zero if there is no next relax state.
+@end table
+
+The relaxation is done in @code{relax_segment} in @file{write.c}. The
+difference in the length fields between the original mode and the one finally
+chosen by the relaxing code is taken as the size by which the current frag will
+be increased in size. For example, if the initial relaxing mode has a length
+of 2 bytes, and because of the size of the displacement, it gets upgraded to a
+mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes.
+(The initial two bytes should have been part of the fixed portion of the frag,
+since it is already known that they will be output.) This growth must be
+effected by @code{md_convert_frag}; it should increase the @code{fr_fix} field
+by the appropriate size, and fill in the appropriate bytes of the frag.
+(Enough space for the maximum growth should have been allocated in the call to
+frag_var as the second argument.)
+
+If relocation records are needed, they should be emitted by
+@code{md_estimate_size_before_relax}. This function should examine the target
+symbol of the supplied frag and correct the @code{fr_subtype} of the frag if
+needed. When this function is called, if the symbol has not yet been defined,
+it will not become defined later; however, its value may still change if the
+section it is in gets relaxed.
+
+Usually, if the symbol is in the same section as the frag (given by the
+@var{sec} argument), the narrowest likely relaxation mode is stored in
+@code{fr_subtype}, and that's that.
+
+If the symbol is undefined, or in a different section (and therefore moveable
+to an arbitrarily large distance), the largest available relaxation mode is
+specified, @code{fix_new} is called to produce the relocation record,
+@code{fr_fix} is increased to include the relocated field (remember, this
+storage was allocated when @code{frag_var} was called), and @code{frag_wane} is
+called to convert the frag to an @code{rs_fill} frag with no variant part.
+Sometimes changing addressing modes may also require rewriting the instruction.
+It can be accessed via @code{fr_opcode} or @code{fr_fix}.
+
+Sometimes @code{fr_var} is increased instead, and @code{frag_wane} is not
+called. I'm not sure, but I think this is to keep @code{fr_fix} referring to
+an earlier byte, and @code{fr_subtype} set to @code{rs_machine_dependent} so
+that @code{md_convert_frag} will get called.
+
+@node General relaxing
+@subsection General relaxing
+
+If using a simple table is not suitable, you may implement arbitrarily complex
+relaxation semantics yourself. For example, the MIPS backend uses this to emit
+different instruction sequences depending upon the size of the symbol being
+accessed.
+
+When you assemble an instruction that may need relaxation, you should allocate
+a frag using @code{frag_var} or @code{frag_variant} with a type of
+@code{rs_machine_dependent}. You should store some sort of information in the
+@code{fr_subtype} field so that you can figure out what to do with the frag
+later.
+
+When GAS reaches the end of the input file, it will look through the frags and
+work out their final sizes.
+
+GAS will first call @code{md_estimate_size_before_relax} on each
+@code{rs_machine_dependent} frag. This function must return an estimated size
+for the frag.
+
+GAS will then loop over the frags, calling @code{md_relax_frag} on each
+@code{rs_machine_dependent} frag. This function should return the change in
+size of the frag. GAS will keep looping over the frags until none of the frags
+changes size.
+
+@node Broken words
+@section Broken words
+@cindex internals, broken words
+@cindex broken words
+
+Some compilers, including GCC, will sometimes emit switch tables specifying
+16-bit @code{.word} displacements to branch targets, and branch instructions
+that load entries from that table to compute the target address. If this is
+done on a 32-bit machine, there is a chance (at least with really large
+functions) that the displacement will not fit in 16 bits. The assembler
+handles this using a concept called @dfn{broken words}. This idea is well
+named, since there is an implied promise that the 16-bit field will in fact
+hold the specified displacement.
+
+If broken word processing is enabled, and a situation like this is encountered,
+the assembler will insert a jump instruction into the instruction stream, close
+enough to be reached with the 16-bit displacement. This jump instruction will
+transfer to the real desired target address. Thus, as long as the @code{.word}
+value really is used as a displacement to compute an address to jump to, the
+net effect will be correct (minus a very small efficiency cost). If
+@code{.word} directives with label differences for values are used for other
+purposes, however, things may not work properly. For targets which use broken
+words, the @samp{-K} option will warn when a broken word is discovered.
+
+The broken word code is turned off by the @code{WORKING_DOT_WORD} macro. It
+isn't needed if @code{.word} emits a value large enough to contain an address
+(or, more correctly, any possible difference between two addresses).
+
+@node Internal functions
+@section Internal functions
+
+This section describes basic internal functions used by GAS.
+
+@menu
+* Warning and error messages:: Warning and error messages
+* Hash tables:: Hash tables
+@end menu
+
+@node Warning and error messages
+@subsection Warning and error messages
+
+@deftypefun @{@} int had_warnings (void)
+@deftypefunx @{@} int had_errors (void)
+Returns non-zero if any warnings or errors, respectively, have been printed
+during this invocation.
+@end deftypefun
+
+@deftypefun @{@} void as_perror (const char *@var{gripe}, const char *@var{filename})
+Displays a BFD or system error, then clears the error status.
+@end deftypefun
+
+@deftypefun @{@} void as_tsktsk (const char *@var{format}, ...)
+@deftypefunx @{@} void as_warn (const char *@var{format}, ...)
+@deftypefunx @{@} void as_bad (const char *@var{format}, ...)
+@deftypefunx @{@} void as_fatal (const char *@var{format}, ...)
+These functions display messages about something amiss with the input file, or
+internal problems in the assembler itself. The current file name and line
+number are printed, followed by the supplied message, formatted using
+@code{vfprintf}, and a final newline.
+
+An error indicated by @code{as_bad} will result in a non-zero exit status when
+the assembler has finished. Calling @code{as_fatal} will result in immediate
+termination of the assembler process.
+@end deftypefun
+
+@deftypefun @{@} void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
+@deftypefunx @{@} void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
+These variants permit specification of the file name and line number, and are
+used when problems are detected when reprocessing information saved away when
+processing some earlier part of the file. For example, fixups are processed
+after all input has been read, but messages about fixups should refer to the
+original filename and line number that they are applicable to.
+@end deftypefun
+
+@deftypefun @{@} void fprint_value (FILE *@var{file}, valueT @var{val})
+@deftypefunx @{@} void sprint_value (char *@var{buf}, valueT @var{val})
+These functions are helpful for converting a @code{valueT} value into printable
+format, in case it's wider than modes that @code{*printf} can handle. If the
+type is narrow enough, a decimal number will be produced; otherwise, it will be
+in hexadecimal. The value itself is not examined to make this determination.
+@end deftypefun
+
+@node Hash tables
+@subsection Hash tables
+@cindex hash tables
+
+@deftypefun @{@} @{struct hash_control *@} hash_new (void)
+Creates the hash table control structure.
+@end deftypefun
+
+@deftypefun @{@} void hash_die (struct hash_control *)
+Destroy a hash table.
+@end deftypefun
+
+@deftypefun @{@} PTR hash_delete (struct hash_control *, const char *)
+Deletes entry from the hash table, returns the value it had.
+@end deftypefun
+
+@deftypefun @{@} PTR hash_replace (struct hash_control *, const char *, PTR)
+Updates the value for an entry already in the table, returning the old value.
+If no entry was found, just returns NULL.
+@end deftypefun
+
+@deftypefun @{@} @{const char *@} hash_insert (struct hash_control *, const char *, PTR)
+Inserting a value already in the table is an error.
+Returns an error message or NULL.
+@end deftypefun
+
+@deftypefun @{@} @{const char *@} hash_jam (struct hash_control *, const char *, PTR)
+Inserts if the value isn't already present, updates it if it is.
+@end deftypefun
+
+@node Test suite
+@section Test suite
+@cindex test suite
+
+The test suite is kind of lame for most processors. Often it only checks to
+see if a couple of files can be assembled without the assembler reporting any
+errors. For more complete testing, write a test which either examines the
+assembler listing, or runs @code{objdump} and examines its output. For the
+latter, the TCL procedure @code{run_dump_test} may come in handy. It takes the
+base name of a file, and looks for @file{@var{file}.d}. This file should
+contain as its initial lines a set of variable settings in @samp{#} comments,
+in the form:
+
+@example
+ #@var{varname}: @var{value}
+@end example
+
+The @var{varname} may be @code{objdump}, @code{nm}, or @code{as}, in which case
+it specifies the options to be passed to the specified programs. Exactly one
+of @code{objdump} or @code{nm} must be specified, as that also specifies which
+program to run after the assembler has finished. If @var{varname} is
+@code{source}, it specifies the name of the source file; otherwise,
+@file{@var{file}.s} is used. If @var{varname} is @code{name}, it specifies the
+name of the test to be used in the @code{pass} or @code{fail} messages.
+
+The non-commented parts of the file are interpreted as regular expressions, one
+per line. Blank lines in the @code{objdump} or @code{nm} output are skipped,
+as are blank lines in the @code{.d} file; the other lines are tested to see if
+the regular expression matches the program output. If it does not, the test
+fails.
+
+Note that this means the tests must be modified if the @code{objdump} output
+style is changed.
+
+@bye
+@c Local Variables:
+@c fill-column: 79
+@c End: