diff options
Diffstat (limited to 'gprof/gprof.texi')
-rw-r--r-- | gprof/gprof.texi | 1038 |
1 files changed, 1038 insertions, 0 deletions
diff --git a/gprof/gprof.texi b/gprof/gprof.texi new file mode 100644 index 0000000..81d5f0f --- /dev/null +++ b/gprof/gprof.texi @@ -0,0 +1,1038 @@ +\input texinfo @c -*-texinfo-*- +@setfilename gprof.info +@settitle GNU gprof +@setchapternewpage odd +@ifinfo +This file documents the gprof profiler of the GNU system. + +Copyright (C) 1988, 1992 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through Tex and print the +results, provided the printed document carries copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). + +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the above conditions for modified versions. +@end ifinfo + +@finalout +@smallbook + +@titlepage +@title GNU gprof +@subtitle The @sc{gnu} Profiler +@author Jay Fenlason and Richard Stallman + +@page + +This manual describes the @sc{gnu} profiler, @code{gprof}, and how you +can use it to determine which parts of a program are taking most of the +execution time. We assume that you know how to write, compile, and +execute programs. @sc{gnu} @code{gprof} was written by Jay Fenlason. + +This manual was edited January 1993 by Jeffrey Osier. + +@vskip 0pt plus 1filll +Copyright @copyright{} 1988, 1992 Free Software Foundation, Inc. + +Permission is granted to make and distribute verbatim copies of +this manual provided the copyright notice and this permission notice +are preserved on all copies. + +@ignore +Permission is granted to process this file through TeX and print the +results, provided the printed document carries copying permission +notice identical to this one except for the removal of this paragraph +(this paragraph not being relevant to the printed manual). + +@end ignore +Permission is granted to copy and distribute modified versions of this +manual under the conditions for verbatim copying, provided that the entire +resulting derived work is distributed under the terms of a permission +notice identical to this one. + +Permission is granted to copy and distribute translations of this manual +into another language, under the same conditions as for modified versions. + +@end titlepage + +@ifinfo +@node Top +@top Profiling a Program: Where Does It Spend Its Time? + +This manual describes the @sc{gnu} profiler, @code{gprof}, and how you +can use it to determine which parts of a program are taking most of the +execution time. We assume that you know how to write, compile, and +execute programs. @sc{gnu} @code{gprof} was written by Jay Fenlason. + +@menu +* Why:: What profiling means, and why it is useful. +* Compiling:: How to compile your program for profiling. +* Executing:: How to execute your program to generate the + profile data file @file{gmon.out}. +* Invoking:: How to run @code{gprof}, and how to specify + options for it. + +* Flat Profile:: The flat profile shows how much time was spent + executing directly in each function. +* Call Graph:: The call graph shows which functions called which + others, and how much time each function used + when its subroutine calls are included. + +* Implementation:: How the profile data is recorded and written. +* Sampling Error:: Statistical margins of error. + How to accumulate data from several runs + to make it more accurate. + +* Assumptions:: Some of @code{gprof}'s measurements are based + on assumptions about your program + that could be very wrong. + +* Incompatibilities:: (between GNU @code{gprof} and Unix @code{gprof}.) +@end menu +@end ifinfo + +@node Why +@chapter Why Profile + +Profiling allows you to learn where your program spent its time and which +functions called which other functions while it was executing. This +information can show you which pieces of your program are slower than you +expected, and might be candidates for rewriting to make your program +execute faster. It can also tell you which functions are being called more +or less often than you expected. This may help you spot bugs that had +otherwise been unnoticed. + +Since the profiler uses information collected during the actual execution +of your program, it can be used on programs that are too large or too +complex to analyze by reading the source. However, how your program is run +will affect the information that shows up in the profile data. If you +don't use some feature of your program while it is being profiled, no +profile information will be generated for that feature. + +Profiling has several steps: + +@itemize @bullet +@item +You must compile and link your program with profiling enabled. +@xref{Compiling}. + +@item +You must execute your program to generate a profile data file. +@xref{Executing}. + +@item +You must run @code{gprof} to analyze the profile data. +@xref{Invoking}. +@end itemize + +The next three chapters explain these steps in greater detail. + +The result of the analysis is a file containing two tables, the +@dfn{flat profile} and the @dfn{call graph} (plus blurbs which briefly +explain the contents of these tables). + +The flat profile shows how much time your program spent in each function, +and how many times that function was called. If you simply want to know +which functions burn most of the cycles, it is stated concisely here. +@xref{Flat Profile}. + +The call graph shows, for each function, which functions called it, which +other functions it called, and how many times. There is also an estimate +of how much time was spent in the subroutines of each function. This can +suggest places where you might try to eliminate function calls that use a +lot of time. @xref{Call Graph}. + +@node Compiling +@chapter Compiling a Program for Profiling + +The first step in generating profile information for your program is +to compile and link it with profiling enabled. + +To compile a source file for profiling, specify the @samp{-pg} option when +you run the compiler. (This is in addition to the options you normally +use.) + +To link the program for profiling, if you use a compiler such as @code{cc} +to do the linking, simply specify @samp{-pg} in addition to your usual +options. The same option, @samp{-pg}, alters either compilation or linking +to do what is necessary for profiling. Here are examples: + +@example +cc -g -c myprog.c utils.c -pg +cc -o myprog myprog.o utils.o -pg +@end example + +The @samp{-pg} option also works with a command that both compiles and links: + +@example +cc -o myprog myprog.c utils.c -g -pg +@end example + +If you run the linker @code{ld} directly instead of through a compiler +such as @code{cc}, you must specify the profiling startup file +@file{/lib/gcrt0.o} as the first input file instead of the usual startup +file @file{/lib/crt0.o}. In addition, you would probably want to +specify the profiling C library, @file{/usr/lib/libc_p.a}, by writing +@samp{-lc_p} instead of the usual @samp{-lc}. This is not absolutely +necessary, but doing this gives you number-of-calls information for +standard library functions such as @code{read} and @code{open}. For +example: + +@example +ld -o myprog /lib/gcrt0.o myprog.o utils.o -lc_p +@end example + +If you compile only some of the modules of the program with @samp{-pg}, you +can still profile the program, but you won't get complete information about +the modules that were compiled without @samp{-pg}. The only information +you get for the functions in those modules is the total time spent in them; +there is no record of how many times they were called, or from where. This +will not affect the flat profile (except that the @code{calls} field for +the functions will be blank), but will greatly reduce the usefulness of the +call graph. + +@node Executing +@chapter Executing the Program to Generate Profile Data + +Once the program is compiled for profiling, you must run it in order to +generate the information that @code{gprof} needs. Simply run the program +as usual, using the normal arguments, file names, etc. The program should +run normally, producing the same output as usual. It will, however, run +somewhat slower than normal because of the time spent collecting and the +writing the profile data. + +The way you run the program---the arguments and input that you give +it---may have a dramatic effect on what the profile information shows. The +profile data will describe the parts of the program that were activated for +the particular input you use. For example, if the first command you give +to your program is to quit, the profile data will show the time used in +initialization and in cleanup, but not much else. + +You program will write the profile data into a file called @file{gmon.out} +just before exiting. If there is already a file called @file{gmon.out}, +its contents are overwritten. There is currently no way to tell the +program to write the profile data under a different name, but you can rename +the file afterward if you are concerned that it may be overwritten. + +In order to write the @file{gmon.out} file properly, your program must exit +normally: by returning from @code{main} or by calling @code{exit}. Calling +the low-level function @code{_exit} does not write the profile data, and +neither does abnormal termination due to an unhandled signal. + +The @file{gmon.out} file is written in the program's @emph{current working +directory} at the time it exits. This means that if your program calls +@code{chdir}, the @file{gmon.out} file will be left in the last directory +your program @code{chdir}'d to. If you don't have permission to write in +this directory, the file is not written. You may get a confusing error +message if this happens. (We have not yet replaced the part of Unix +responsible for this; when we do, we will make the error message +comprehensible.) + +@node Invoking +@chapter @code{gprof} Command Summary + +After you have a profile data file @file{gmon.out}, you can run @code{gprof} +to interpret the information in it. The @code{gprof} program prints a +flat profile and a call graph on standard output. Typically you would +redirect the output of @code{gprof} into a file with @samp{>}. + +You run @code{gprof} like this: + +@smallexample +gprof @var{options} [@var{executable-file} [@var{profile-data-files}@dots{}]] [> @var{outfile}] +@end smallexample + +@noindent +Here square-brackets indicate optional arguments. + +If you omit the executable file name, the file @file{a.out} is used. If +you give no profile data file name, the file @file{gmon.out} is used. If +any file is not in the proper format, or if the profile data file does not +appear to belong to the executable file, an error message is printed. + +You can give more than one profile data file by entering all their names +after the executable file name; then the statistics in all the data files +are summed together. + +The following options may be used to selectively include or exclude +functions in the output: + +@table @code +@item -a +The @samp{-a} option causes @code{gprof} to suppress the printing of +statically declared (private) functions. (These are functions whose +names are not listed as global, and which are not visible outside the +file/function/block where they were defined.) Time spent in these +functions, calls to/from them, etc, will all be attributed to the +function that was loaded directly before it in the executable file. +@c This is compatible with Unix @code{gprof}, but a bad idea. +This option affects both the flat profile and the call graph. + +@item -e @var{function_name} +The @samp{-e @var{function}} option tells @code{gprof} to not print +information about the function @var{function_name} (and its +children@dots{}) in the call graph. The function will still be listed +as a child of any functions that call it, but its index number will be +shown as @samp{[not printed]}. More than one @samp{-e} option may be +given; only one @var{function_name} may be indicated with each @samp{-e} +option. + +@item -E @var{function_name} +The @code{-E @var{function}} option works like the @code{-e} option, but +time spent in the function (and children who were not called from +anywhere else), will not be used to compute the percentages-of-time for +the call graph. More than one @samp{-E} option may be given; only one +@var{function_name} may be indicated with each @samp{-E} option. + +@item -f @var{function_name} +The @samp{-f @var{function}} option causes @code{gprof} to limit the +call graph to the function @var{function_name} and its children (and +their children@dots{}). More than one @samp{-f} option may be given; +only one @var{function_name} may be indicated with each @samp{-f} +option. + +@item -F @var{function_name} +The @samp{-F @var{function}} option works like the @code{-f} option, but +only time spent in the function and its children (and their +children@dots{}) will be used to determine total-time and +percentages-of-time for the call graph. More than one @samp{-F} option +may be given; only one @var{function_name} may be indicated with each +@samp{-F} option. The @samp{-F} option overrides the @samp{-E} option. + +@item -k @var{from@dots{}} @var{to@dots{}} +The @samp{-k} option allows you to delete from the profile any arcs from +routine @var{from} to routine @var{to}. + +@item -z +If you give the @samp{-z} option, @code{gprof} will mention all +functions in the flat profile, even those that were never called, and +that had no time spent in them. This is useful in conjunction with the +@samp{-c} option for discovering which routines were never called. +@end table + +The order of these options does not matter. + +Note that only one function can be specified with each @code{-e}, +@code{-E}, @code{-f} or @code{-F} option. To specify more than one +function, use multiple options. For example, this command: + +@example +gprof -e boring -f foo -f bar myprogram > gprof.output +@end example + +@noindent +lists in the call graph all functions that were reached from either +@code{foo} or @code{bar} and were not reachable from @code{boring}. + +There are a few other useful @code{gprof} options: + +@table @code +@item -b +If the @samp{-b} option is given, @code{gprof} doesn't print the +verbose blurbs that try to explain the meaning of all of the fields in +the tables. This is useful if you intend to print out the output, or +are tired of seeing the blurbs. + +@item -c +The @samp{-c} option causes the static call-graph of the program to be +discovered by a heuristic which examines the text space of the object +file. Static-only parents or children are indicated with call counts of +@samp{0}. + +@item -d @var{num} +The @samp{-d @var{num}} option specifies debugging options. +@c @xref{debugging}. + +@item -s +The @samp{-s} option causes @code{gprof} to summarize the information +in the profile data files it read in, and write out a profile data +file called @file{gmon.sum}, which contains all the information from +the profile data files that @code{gprof} read in. The file @file{gmon.sum} +may be one of the specified input files; the effect of this is to +merge the data in the other input files into @file{gmon.sum}. +@xref{Sampling Error}. + +Eventually you can run @code{gprof} again without @samp{-s} to analyze the +cumulative data in the file @file{gmon.sum}. + +@item -T +The @samp{-T} option causes @code{gprof} to print its output in +``traditional'' BSD style. +@end table + +@node Flat Profile +@chapter How to Understand the Flat Profile +@cindex flat profile + +The @dfn{flat profile} shows the total amount of time your program +spent executing each function. Unless the @samp{-z} option is given, +functions with no apparent time spent in them, and no apparent calls +to them, are not mentioned. Note that if a function was not compiled +for profiling, and didn't run long enough to show up on the program +counter histogram, it will be indistinguishable from a function that +was never called. + +This is part of a flat profile for a small program: + +@smallexample +@group +Flat profile: + +Each sample counts as 0.01 seconds. + % cumulative self self total + time seconds seconds calls ms/call ms/call name + 33.34 0.02 0.02 7208 0.00 0.00 open + 16.67 0.03 0.01 244 0.04 0.12 offtime + 16.67 0.04 0.01 8 1.25 1.25 memccpy + 16.67 0.05 0.01 7 1.43 1.43 write + 16.67 0.06 0.01 mcount + 0.00 0.06 0.00 236 0.00 0.00 tzset + 0.00 0.06 0.00 192 0.00 0.00 tolower + 0.00 0.06 0.00 47 0.00 0.00 strlen + 0.00 0.06 0.00 45 0.00 0.00 strchr + 0.00 0.06 0.00 1 0.00 50.00 main + 0.00 0.06 0.00 1 0.00 0.00 memcpy + 0.00 0.06 0.00 1 0.00 10.11 print + 0.00 0.06 0.00 1 0.00 0.00 profil + 0.00 0.06 0.00 1 0.00 50.00 report +@dots{} +@end group +@end smallexample + +@noindent +The functions are sorted by decreasing run-time spent in them. The +functions @samp{mcount} and @samp{profil} are part of the profiling +aparatus and appear in every flat profile; their time gives a measure of +the amount of overhead due to profiling. + +The sampling period estimates the margin of error in each of the time +figures. A time figure that is not much larger than this is not +reliable. In this example, the @samp{self seconds} field for +@samp{mcount} might well be @samp{0} or @samp{0.04} in another run. +@xref{Sampling Error}, for a complete discussion. + +Here is what the fields in each line mean: + +@table @code +@item % time +This is the percentage of the total execution time your program spent +in this function. These should all add up to 100%. + +@item cumulative seconds +This is the cumulative total number of seconds the computer spent +executing this functions, plus the time spent in all the functions +above this one in this table. + +@item self seconds +This is the number of seconds accounted for by this function alone. +The flat profile listing is sorted first by this number. + +@item calls +This is the total number of times the function was called. If the +function was never called, or the number of times it was called cannot +be determined (probably because the function was not compiled with +profiling enabled), the @dfn{calls} field is blank. + +@item self ms/call +This represents the average number of milliseconds spent in this +function per call, if this function is profiled. Otherwise, this field +is blank for this function. + +@item total ms/call +This represents the average number of milliseconds spent in this +function and its descendants per call, if this function is profiled. +Otherwise, this field is blank for this function. + +@item name +This is the name of the function. The flat profile is sorted by this +field alphabetically after the @dfn{self seconds} field is sorted. +@end table + +@node Call Graph +@chapter How to Read the Call Graph +@cindex call graph + +The @dfn{call graph} shows how much time was spent in each function +and its children. From this information, you can find functions that, +while they themselves may not have used much time, called other +functions that did use unusual amounts of time. + +Here is a sample call from a small program. This call came from the +same @code{gprof} run as the flat profile example in the previous +chapter. + +@smallexample +@group +granularity: each sample hit covers 2 byte(s) for 20.00% of 0.05 seconds + +index % time self children called name + <spontaneous> +[1] 100.0 0.00 0.05 start [1] + 0.00 0.05 1/1 main [2] + 0.00 0.00 1/2 on_exit [28] + 0.00 0.00 1/1 exit [59] +----------------------------------------------- + 0.00 0.05 1/1 start [1] +[2] 100.0 0.00 0.05 1 main [2] + 0.00 0.05 1/1 report [3] +----------------------------------------------- + 0.00 0.05 1/1 main [2] +[3] 100.0 0.00 0.05 1 report [3] + 0.00 0.03 8/8 timelocal [6] + 0.00 0.01 1/1 print [9] + 0.00 0.01 9/9 fgets [12] + 0.00 0.00 12/34 strncmp <cycle 1> [40] + 0.00 0.00 8/8 lookup [20] + 0.00 0.00 1/1 fopen [21] + 0.00 0.00 8/8 chewtime [24] + 0.00 0.00 8/16 skipspace [44] +----------------------------------------------- +[4] 59.8 0.01 0.02 8+472 <cycle 2 as a whole> [4] + 0.01 0.02 244+260 offtime <cycle 2> [7] + 0.00 0.00 236+1 tzset <cycle 2> [26] +----------------------------------------------- +@end group +@end smallexample + +The lines full of dashes divide this table into @dfn{entries}, one for each +function. Each entry has one or more lines. + +In each entry, the primary line is the one that starts with an index number +in square brackets. The end of this line says which function the entry is +for. The preceding lines in the entry describe the callers of this +function and the following lines describe its subroutines (also called +@dfn{children} when we speak of the call graph). + +The entries are sorted by time spent in the function and its subroutines. + +The internal profiling function @code{mcount} (@pxref{Flat Profile}) +is never mentioned in the call graph. + +@menu +* Primary:: Details of the primary line's contents. +* Callers:: Details of caller-lines' contents. +* Subroutines:: Details of subroutine-lines' contents. +* Cycles:: When there are cycles of recursion, + such as @code{a} calls @code{b} calls @code{a}@dots{} +@end menu + +@node Primary +@section The Primary Line + +The @dfn{primary line} in a call graph entry is the line that +describes the function which the entry is about and gives the overall +statistics for this function. + +For reference, we repeat the primary line from the entry for function +@code{report} in our main example, together with the heading line that +shows the names of the fields: + +@smallexample +@group +index % time self children called name +@dots{} +[3] 100.0 0.00 0.05 1 report [3] +@end group +@end smallexample + +Here is what the fields in the primary line mean: + +@table @code +@item index +Entries are numbered with consecutive integers. Each function +therefore has an index number, which appears at the beginning of its +primary line. + +Each cross-reference to a function, as a caller or subroutine of +another, gives its index number as well as its name. The index number +guides you if you wish to look for the entry for that function. + +@item % time +This is the percentage of the total time that was spent in this +function, including time spent in subroutines called from this +function. + +The time spent in this function is counted again for the callers of +this function. Therefore, adding up these percentages is meaningless. + +@item self +This is the total amount of time spent in this function. This +should be identical to the number printed in the @code{seconds} field +for this function in the flat profile. + +@item children +This is the total amount of time spent in the subroutine calls made by +this function. This should be equal to the sum of all the @code{self} +and @code{children} entries of the children listed directly below this +function. + +@item called +This is the number of times the function was called. + +If the function called itself recursively, there are two numbers, +separated by a @samp{+}. The first number counts non-recursive calls, +and the second counts recursive calls. + +In the example above, the function @code{report} was called once from +@code{main}. + +@item name +This is the name of the current function. The index number is +repeated after it. + +If the function is part of a cycle of recursion, the cycle number is +printed between the function's name and the index number +(@pxref{Cycles}). For example, if function @code{gnurr} is part of +cycle number one, and has index number twelve, its primary line would +be end like this: + +@example +gnurr <cycle 1> [12] +@end example +@end table + +@node Callers, Subroutines, Primary, Call Graph +@section Lines for a Function's Callers + +A function's entry has a line for each function it was called by. +These lines' fields correspond to the fields of the primary line, but +their meanings are different because of the difference in context. + +For reference, we repeat two lines from the entry for the function +@code{report}, the primary line and one caller-line preceding it, together +with the heading line that shows the names of the fields: + +@smallexample +index % time self children called name +@dots{} + 0.00 0.05 1/1 main [2] +[3] 100.0 0.00 0.05 1 report [3] +@end smallexample + +Here are the meanings of the fields in the caller-line for @code{report} +called from @code{main}: + +@table @code +@item self +An estimate of the amount of time spent in @code{report} itself when it was +called from @code{main}. + +@item children +An estimate of the amount of time spent in subroutines of @code{report} +when @code{report} was called from @code{main}. + +The sum of the @code{self} and @code{children} fields is an estimate +of the amount of time spent within calls to @code{report} from @code{main}. + +@item called +Two numbers: the number of times @code{report} was called from @code{main}, +followed by the total number of nonrecursive calls to @code{report} from +all its callers. + +@item name and index number +The name of the caller of @code{report} to which this line applies, +followed by the caller's index number. + +Not all functions have entries in the call graph; some +options to @code{gprof} request the omission of certain functions. +When a caller has no entry of its own, it still has caller-lines +in the entries of the functions it calls. + +If the caller is part of a recursion cycle, the cycle number is +printed between the name and the index number. +@end table + +If the identity of the callers of a function cannot be determined, a +dummy caller-line is printed which has @samp{<spontaneous>} as the +``caller's name'' and all other fields blank. This can happen for +signal handlers. +@c What if some calls have determinable callers' names but not all? +@c FIXME - still relevant? + +@node Subroutines, Cycles, Callers, Call Graph +@section Lines for a Function's Subroutines + +A function's entry has a line for each of its subroutines---in other +words, a line for each other function that it called. These lines' +fields correspond to the fields of the primary line, but their meanings +are different because of the difference in context. + +For reference, we repeat two lines from the entry for the function +@code{main}, the primary line and a line for a subroutine, together +with the heading line that shows the names of the fields: + +@smallexample +index % time self children called name +@dots{} +[2] 100.0 0.00 0.05 1 main [2] + 0.00 0.05 1/1 report [3] +@end smallexample + +Here are the meanings of the fields in the subroutine-line for @code{main} +calling @code{report}: + +@table @code +@item self +An estimate of the amount of time spent directly within @code{report} +when @code{report} was called from @code{main}. + +@item children +An estimate of the amount of time spent in subroutines of @code{report} +when @code{report} was called from @code{main}. + +The sum of the @code{self} and @code{children} fields is an estimate +of the total time spent in calls to @code{report} from @code{main}. + +@item called +Two numbers, the number of calls to @code{report} from @code{main} +followed by the total number of nonrecursive calls to @code{report}. + +@item name +The name of the subroutine of @code{main} to which this line applies, +followed by the subroutine's index number. + +If the caller is part of a recursion cycle, the cycle number is +printed between the name and the index number. +@end table + +@node Cycles,, Subroutines, Call Graph +@section How Mutually Recursive Functions Are Described +@cindex cycle +@cindex recursion cycle + +The graph may be complicated by the presence of @dfn{cycles of +recursion} in the call graph. A cycle exists if a function calls +another function that (directly or indirectly) calls (or appears to +call) the original function. For example: if @code{a} calls @code{b}, +and @code{b} calls @code{a}, then @code{a} and @code{b} form a cycle. + +Whenever there are call-paths both ways between a pair of functions, they +belong to the same cycle. If @code{a} and @code{b} call each other and +@code{b} and @code{c} call each other, all three make one cycle. Note that +even if @code{b} only calls @code{a} if it was not called from @code{a}, +@code{gprof} cannot determine this, so @code{a} and @code{b} are still +considered a cycle. + +The cycles are numbered with consecutive integers. When a function +belongs to a cycle, each time the function name appears in the call graph +it is followed by @samp{<cycle @var{number}>}. + +The reason cycles matter is that they make the time values in the call +graph paradoxical. The ``time spent in children'' of @code{a} should +include the time spent in its subroutine @code{b} and in @code{b}'s +subroutines---but one of @code{b}'s subroutines is @code{a}! How much of +@code{a}'s time should be included in the children of @code{a}, when +@code{a} is indirectly recursive? + +The way @code{gprof} resolves this paradox is by creating a single entry +for the cycle as a whole. The primary line of this entry describes the +total time spent directly in the functions of the cycle. The +``subroutines'' of the cycle are the individual functions of the cycle, and +all other functions that were called directly by them. The ``callers'' of +the cycle are the functions, outside the cycle, that called functions in +the cycle. + +Here is an example portion of a call graph which shows a cycle containing +functions @code{a} and @code{b}. The cycle was entered by a call to +@code{a} from @code{main}; both @code{a} and @code{b} called @code{c}. + +@smallexample +index % time self children called name +---------------------------------------- + 1.77 0 1/1 main [2] +[3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3] + 1.02 0 3 b <cycle 1> [4] + 0.75 0 2 a <cycle 1> [5] +---------------------------------------- + 3 a <cycle 1> [5] +[4] 52.85 1.02 0 0 b <cycle 1> [4] + 2 a <cycle 1> [5] + 0 0 3/6 c [6] +---------------------------------------- + 1.77 0 1/1 main [2] + 2 b <cycle 1> [4] +[5] 38.86 0.75 0 1 a <cycle 1> [5] + 3 b <cycle 1> [4] + 0 0 3/6 c [6] +---------------------------------------- +@end smallexample + +@noindent +(The entire call graph for this program contains in addition an entry for +@code{main}, which calls @code{a}, and an entry for @code{c}, with callers +@code{a} and @code{b}.) + +@smallexample +index % time self children called name + <spontaneous> +[1] 100.00 0 1.93 0 start [1] + 0.16 1.77 1/1 main [2] +---------------------------------------- + 0.16 1.77 1/1 start [1] +[2] 100.00 0.16 1.77 1 main [2] + 1.77 0 1/1 a <cycle 1> [5] +---------------------------------------- + 1.77 0 1/1 main [2] +[3] 91.71 1.77 0 1+5 <cycle 1 as a whole> [3] + 1.02 0 3 b <cycle 1> [4] + 0.75 0 2 a <cycle 1> [5] + 0 0 6/6 c [6] +---------------------------------------- + 3 a <cycle 1> [5] +[4] 52.85 1.02 0 0 b <cycle 1> [4] + 2 a <cycle 1> [5] + 0 0 3/6 c [6] +---------------------------------------- + 1.77 0 1/1 main [2] + 2 b <cycle 1> [4] +[5] 38.86 0.75 0 1 a <cycle 1> [5] + 3 b <cycle 1> [4] + 0 0 3/6 c [6] +---------------------------------------- + 0 0 3/6 b <cycle 1> [4] + 0 0 3/6 a <cycle 1> [5] +[6] 0.00 0 0 6 c [6] +---------------------------------------- +@end smallexample + +The @code{self} field of the cycle's primary line is the total time +spent in all the functions of the cycle. It equals the sum of the +@code{self} fields for the individual functions in the cycle, found +in the entry in the subroutine lines for these functions. + +The @code{children} fields of the cycle's primary line and subroutine lines +count only subroutines outside the cycle. Even though @code{a} calls +@code{b}, the time spent in those calls to @code{b} is not counted in +@code{a}'s @code{children} time. Thus, we do not encounter the problem of +what to do when the time in those calls to @code{b} includes indirect +recursive calls back to @code{a}. + +The @code{children} field of a caller-line in the cycle's entry estimates +the amount of time spent @emph{in the whole cycle}, and its other +subroutines, on the times when that caller called a function in the cycle. + +The @code{calls} field in the primary line for the cycle has two numbers: +first, the number of times functions in the cycle were called by functions +outside the cycle; second, the number of times they were called by +functions in the cycle (including times when a function in the cycle calls +itself). This is a generalization of the usual split into nonrecursive and +recursive calls. + +The @code{calls} field of a subroutine-line for a cycle member in the +cycle's entry says how many time that function was called from functions in +the cycle. The total of all these is the second number in the primary line's +@code{calls} field. + +In the individual entry for a function in a cycle, the other functions in +the same cycle can appear as subroutines and as callers. These lines show +how many times each function in the cycle called or was called from each other +function in the cycle. The @code{self} and @code{children} fields in these +lines are blank because of the difficulty of defining meanings for them +when recursion is going on. + +@node Implementation, Sampling Error, Call Graph, Top +@chapter Implementation of Profiling + +Profiling works by changing how every function in your program is compiled +so that when it is called, it will stash away some information about where +it was called from. From this, the profiler can figure out what function +called it, and can count how many times it was called. This change is made +by the compiler when your program is compiled with the @samp{-pg} option. + +Profiling also involves watching your program as it runs, and keeping a +histogram of where the program counter happens to be every now and then. +Typically the program counter is looked at around 100 times per second of +run time, but the exact frequency may vary from system to system. + +A special startup routine allocates memory for the histogram and sets up +a clock signal handler to make entries in it. Use of this special +startup routine is one of the effects of using @samp{gcc @dots{} -pg} to +link. The startup file also includes an @samp{exit} function which is +responsible for writing the file @file{gmon.out}. + +Number-of-calls information for library routines is collected by using a +special version of the C library. The programs in it are the same as in +the usual C library, but they were compiled with @samp{-pg}. If you +link your program with @samp{gcc @dots{} -pg}, it automatically uses the +profiling version of the library. + +The output from @code{gprof} gives no indication of parts of your program that +are limited by I/O or swapping bandwidth. This is because samples of the +program counter are taken at fixed intervals of run time. Therefore, the +time measurements in @code{gprof} output say nothing about time that your +program was not running. For example, a part of the program that creates +so much data that it cannot all fit in physical memory at once may run very +slowly due to thrashing, but @code{gprof} will say it uses little time. On +the other hand, sampling by run time has the advantage that the amount of +load due to other users won't directly affect the output you get. + +@node Sampling Error, Assumptions, Implementation, Top +@chapter Statistical Inaccuracy of @code{gprof} Output + +The run-time figures that @code{gprof} gives you are based on a sampling +process, so they are subject to statistical inaccuracy. If a function runs +only a small amount of time, so that on the average the sampling process +ought to catch that function in the act only once, there is a pretty good +chance it will actually find that function zero times, or twice. + +By contrast, the number-of-calls figures are derived by counting, not +sampling. They are completely accurate and will not vary from run to run +if your program is deterministic. + +The @dfn{sampling period} that is printed at the beginning of the flat +profile says how often samples are taken. The rule of thumb is that a +run-time figure is accurate if it is considerably bigger than the sampling +period. + +The actual amount of error is usually more than one sampling period. In +fact, if a value is @var{n} times the sampling period, the @emph{expected} +error in it is the square-root of @var{n} sampling periods. If the +sampling period is 0.01 seconds and @code{foo}'s run-time is 1 second, the +expected error in @code{foo}'s run-time is 0.1 seconds. It is likely to +vary this much @emph{on the average} from one profiling run to the next. +(@emph{Sometimes} it will vary more.) + +This does not mean that a small run-time figure is devoid of information. +If the program's @emph{total} run-time is large, a small run-time for one +function does tell you that that function used an insignificant fraction of +the whole program's time. Usually this means it is not worth optimizing. + +One way to get more accuracy is to give your program more (but similar) +input data so it will take longer. Another way is to combine the data from +several runs, using the @samp{-s} option of @code{gprof}. Here is how: + +@enumerate +@item +Run your program once. + +@item +Issue the command @samp{mv gmon.out gmon.sum}. + +@item +Run your program again, the same as before. + +@item +Merge the new data in @file{gmon.out} into @file{gmon.sum} with this command: + +@example +gprof -s @var{executable-file} gmon.out gmon.sum +@end example + +@item +Repeat the last two steps as often as you wish. + +@item +Analyze the cumulative data using this command: + +@example +gprof @var{executable-file} gmon.sum > @var{output-file} +@end example +@end enumerate + +@node Assumptions, Incompatibilities, Sampling Error, Top +@chapter Estimating @code{children} Times Uses an Assumption + +Some of the figures in the call graph are estimates---for example, the +@code{children} time values and all the the time figures in caller and +subroutine lines. + +There is no direct information about these measurements in the profile +data itself. Instead, @code{gprof} estimates them by making an assumption +about your program that might or might not be true. + +The assumption made is that the average time spent in each call to any +function @code{foo} is not correlated with who called @code{foo}. If +@code{foo} used 5 seconds in all, and 2/5 of the calls to @code{foo} came +from @code{a}, then @code{foo} contributes 2 seconds to @code{a}'s +@code{children} time, by assumption. + +This assumption is usually true enough, but for some programs it is far +from true. Suppose that @code{foo} returns very quickly when its argument +is zero; suppose that @code{a} always passes zero as an argument, while +other callers of @code{foo} pass other arguments. In this program, all the +time spent in @code{foo} is in the calls from callers other than @code{a}. +But @code{gprof} has no way of knowing this; it will blindly and +incorrectly charge 2 seconds of time in @code{foo} to the children of +@code{a}. + +@c FIXME - has this been fixed? +We hope some day to put more complete data into @file{gmon.out}, so that +this assumption is no longer needed, if we can figure out how. For the +nonce, the estimated figures are usually more useful than misleading. + +@node Incompatibilities, , Assumptions, Top +@chapter Incompatibilities with Unix @code{gprof} + +@sc{gnu} @code{gprof} and Berkeley Unix @code{gprof} use the same data +file @file{gmon.out}, and provide essentially the same information. But +there are a few differences. + +@itemize @bullet +@item +For a recursive function, Unix @code{gprof} lists the function as a +parent and as a child, with a @code{calls} field that lists the number +of recursive calls. @sc{gnu} @code{gprof} omits these lines and puts +the number of recursive calls in the primary line. + +@item +When a function is suppressed from the call graph with @samp{-e}, @sc{gnu} +@code{gprof} still lists it as a subroutine of functions that call it. + +@ignore - it does this now +@item +The function names printed in @sc{gnu} @code{gprof} output do not include +the leading underscores that are added internally to the front of all +C identifiers on many operating systems. +@end ignore + +@item +The blurbs, field widths, and output formats are different. @sc{gnu} +@code{gprof} prints blurbs after the tables, so that you can see the +tables without skipping the blurbs. + +@contents +@bye + +NEEDS AN INDEX + +Still relevant? + The @file{gmon.out} file is written in the program's @emph{current working + directory} at the time it exits. This means that if your program calls + @code{chdir}, the @file{gmon.out} file will be left in the last directory + your program @code{chdir}'d to. If you don't have permission to write in + this directory, the file is not written. You may get a confusing error + message if this happens. (We have not yet replaced the part of Unix + responsible for this; when we do, we will make the error message + comprehensible.) + +-k from to...? + +-d debugging...? should this be documented? + +-T - "traditional BSD style": How is it different? Should the +differences be documented? + +what is this about? (and to think, I *wrote* it...) + @item -c + The @samp{-c} option causes the static call-graph of the program to be + discovered by a heuristic which examines the text space of the object + file. Static-only parents or children are indicated with call counts of + @samp{0}. + +example flat file adds up to 100.01%... + +note: time estimates now only go out to one decimal place (0.0), where +they used to extend two (78.67). |