1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
|
SSP - The Single Step Profiler
Original Author: DJ Delorie <dj@redhat.com>
The SSP is a program that uses the Win32 debug API to run a program
one ASM instruction at a time. It records the location of each
instruction used, how many times that instruction is used, and all
function calls. The results are saved in a format that is usable by
the profiling program "gprof", although gprof will claim the values
are seconds, they really are instruction counts. More on that later.
Because the SSP was originally designed to profile the cygwin DLL, it
does not automatically select a block of code to report statistics on.
You must specify the range of memory addresses to keep track of
manually, but it's not hard to figure out what to specify. Use the
"objdump" program to determine the bounds of the target's ".text"
section. Let's say we're profiling cygwin1.dll. Make sure you've
built it with debug symbols (else gprof won't run) and run objdump
like this:
objdump -h cygwin1.dll
It will print a report like this:
cygwin1.dll: file format pei-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 0007ea00 61001000 61001000 00000400 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE, DATA
1 .data 00008000 61080000 61080000 0007ee00 2**2
CONTENTS, ALLOC, LOAD, DATA
. . .
The only information we're concerned with are the VMA of the .text
section and the VMA of the section after it (sections are usually
contiguous; you can also add the Size to the VMA to get the end
address). In this case, the VMA is 0x61001000 and the ending address
is either 0x61080000 (start of .data method) or 0x0x6107fa00 (VMA+Size
method).
There are two basic ways to use SSP - either profiling a whole
program, or selectively profiling parts of the program.
To profile a whole program, just run ssp without options. By default,
it will step the whole program. Here's a simple example, using the
numbers above:
ssp 0x61001000 0x61080000 hello.exe
This will step the whole program. It will take at least 8 minutes on
a PII/300 (yes, really). When it's done, it will create a file called
"gmon.out". You can turn this data file into a readable report with
gprof:
gprof -b cygwin1.dll
The "-b" means "skip the help pages". You can omit this until you're
familiar with the report layout. The gprof documentation explains
a lot about this report, but ssp changes a few things. For example,
the first part of the report reports the amount of time spent in each
function, like this:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
10.02 231.22 72.43 46 1574.57 1574.57 strcspn
7.95 288.70 57.48 130 442.15 442.15 strncasematch
The "seconds" columns are really CPU opcodes, 1/100 second per opcode.
So, "231.22" above means 23,122 opcodes. The ms/call values are 10x
too big; 1574.57 means 157.457 opcodes per call. Similar adjustments
need to be made for the "self" and "children" columns in the second
part of the report.
OK, so now we've got a huge report that took a long time to generate,
and we've identified a spot we want to work on optimizing. Let's say
it's the time() function. We can use SSP to selectively profile this
function by using OutputDebugString() to control SSP from within the
program. Here's a sample program:
#include <windows.h>
main()
{
time_t t;
OutputDebugString("ssp on");
time(&t);
OutputDebugString("ssp off");
}
Then, add the "-d" option to ssp to default to *disabling* profiling.
The program will run at full speed until the first OutputDebugString,
then step until the second.
ssp -d 0x61001000 0x61080000 hello.exe
You can then use gprof (as usual) to see the performance profile for
just that portion of the program's execution.
OK, now for the other ssp options, and when to use them:
"-v" - verbose. This prints messages about threads starting and
stopping, OutputDebugString calls, DLLs loading, etc.
"-t" and "-tc" - tracing. With -t, *every* step's address is written
to the file "trace.ssp". This can be used to help debug functions,
since it can trace multiple threads. Clever use of scripts can match
addresses with disassembled opcodes if needed. Warning: creates
*huge* files, very quickly. "-tc" prints each address to the console,
useful for debugging key chunks of assembler.
"-s" - subthreads. Usually, you only need to trace the main thread,
but sometimes you need to trace all threads, so this enables that.
It's also needed when you want to profile a function that only a
subthread calls. However, using OutputDebugString automatically
enables profiling on the thread that called it, not the main thread.
"-dll" - dll profiling. Generates a pretty table of how much time was
spent in each dll the program used. No sense optimizing a function in
your program if most of the time is spent in the DLL.
I usually use the -v, -s, and -dll options:
ssp -v -s -dll -d 0x61001000 0x61080000 hello.exe
|