@c \input texinfo
@c %**start of header
@c @setfilename agentexpr.info
@c @settitle GDB Agent Expressions
@c @setchapternewpage off
@c %**end of header
@c This file is part of the GDB manual.
@c
@c Copyright (C) 2003-2017 Free Software Foundation, Inc.
@c
@c See the file gdb.texinfo for copying conditions.
@node Agent Expressions
@appendix The GDB Agent Expression Mechanism
In some applications, it is not feasible for the debugger to interrupt
the program's execution long enough for the developer to learn anything
helpful about its behavior. If the program's correctness depends on its
real-time behavior, delays introduced by a debugger might cause the
program to fail, even when the code itself is correct. It is useful to
be able to observe the program's behavior without interrupting it.

Using GDB's @code{trace} and @code{collect} commands, the user can
specify locations in the program, and arbitrary expressions to evaluate
when those locations are reached. Later, using the @code{tfind}
command, she can examine the values those expressions had when the
program hit the tracepoints. The expressions may also denote objects
in memory (structures or arrays, for example) whose values GDB
should record; while visiting a particular tracepoint, the user may
inspect those objects as if they were in memory at that moment.
However, because GDB records these values without interacting with the
user, it can do so quickly and unobtrusively, hopefully not disturbing
the program's behavior.

When GDB is debugging a remote target, the GDB @dfn{agent} code running
on the target computes the values of the expressions itself. To avoid
having a full symbolic expression evaluator on the agent, GDB translates
expressions in the source language into a simpler bytecode language, and
then sends the bytecode to the agent; the agent then executes the
bytecode, and records the values for GDB to retrieve later.

The bytecode language is simple; there are forty-odd opcodes, the bulk
of which are the usual vocabulary of C operators (addition, subtraction,
shifts, and so on) and various sizes of literals and memory reference
operations. The bytecode interpreter operates strictly on machine-level
values (various sizes of integers and floating point numbers) and
requires no information about types or symbols; thus, the interpreter's
internal data structures are simple, and each bytecode requires only a
few native machine instructions to implement it. The interpreter is
small, and strict limits on the memory and time required to evaluate an
expression are easy to determine, making it suitable for use by the
debugging agent in real-time applications.

@menu
* General Bytecode Design:: Overview of the interpreter.
* Bytecode Descriptions:: What each one does.
* Using Agent Expressions:: How agent expressions fit into the big picture.
* Varying Target Capabilities:: How to discover what the target can do.
* Rationale:: Why we did it this way.
@end menu
@c @node Rationale
@c @section Rationale
@node General Bytecode Design
@section General Bytecode Design
The agent represents bytecode expressions as an array of bytes. Each
instruction is one byte long (thus the term @dfn{bytecode}). Some
instructions are followed by operand bytes; for example, the @code{goto}
instruction is followed by a destination for the jump.

The bytecode interpreter is a stack-based machine; most instructions pop
their operands off the stack, perform some operation, and push the
result back on the stack for the next instruction to consume. Each
element of the stack may contain either an integer or a floating point
value; these values are as many bits wide as the largest integer that
can be directly manipulated in the source language. Stack elements
carry no record of their type; bytecode could push a value as an
integer, then pop it as a floating point value. However, GDB will not
generate code which does this. In C, one might define the type of a
stack element as follows:
@example
union agent_val @{
LONGEST l;
DOUBLEST d;
@};
@end example
@noindent
where @code{LONGEST} and @code{DOUBLEST} are @code{typedef} names for
the largest integer and floating point types on the machine.
By the time the bytecode interpreter reaches the end of the expression,
the value of the expression should be the only value left on the stack.
For tracing applications, @code{trace} bytecodes in the expression will
have recorded the necessary data, and the value on the stack may be
discarded. For other applications, like conditional breakpoints, the
value may be useful.
Separate from the stack, the interpreter has two registers:
@table @code
@item pc
The address of the next bytecode to execute.
@item start
The address of the start of the bytecode expression, necessary for
interpreting the @code{goto} and @code{if_goto} instructions.
@end table
@noindent
Neither of these registers is directly visible to the bytecode language
itself, but they are useful for defining the meanings of the bytecode
operations.
There are no instructions to perform side effects on the running
program, or call the program's functions; we assume that these
expressions are only used for unobtrusive debugging, not for patching
the running code.
Most bytecode instructions do not distinguish between the various sizes
of values, and operate on full-width values; the upper bits of the
values are simply ignored, since they do not usually make a difference
to the value computed. The exceptions to this rule are:
@table @asis
@item memory reference instructions (@code{ref}@var{n})
There are distinct instructions to fetch different word sizes from
memory. Once on the stack, however, the values are treated as full-size
integers. They may need to be sign-extended; the @code{ext} instruction
exists for this purpose.
@item the sign-extension instruction (@code{ext} @var{n})
These clearly need to know which portion of their operand is to be
extended to occupy the full length of the word.
@end table
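As a rough illustration of the sign extension that @code{ext} performs,
the following C sketch assumes 64-bit stack elements. The function name
@code{agent_ext} is hypothetical and does not come from the GDB sources;
treating @var{n} = 0 as a no-op is likewise an assumption of the sketch.

```c
#include <stdint.h>

/* Hypothetical helper: treat the low N bits of V as a twos-complement
   number and extend it to the full 64-bit stack width.  When N is at
   least the stack width there is nothing to extend, so V is returned
   unchanged; N == 0 is handled the same way (an assumption).  */
static int64_t
agent_ext (uint64_t v, unsigned n)
{
  if (n == 0 || n >= 64)
    return (int64_t) v;

  uint64_t sign = (uint64_t) 1 << (n - 1);  /* bit n-1 */
  uint64_t mask = (sign << 1) - 1;          /* low n bits set */

  v &= mask;                                /* upper bits are ignored */
  /* Flipping bit n-1 and subtracting it copies that bit into all
     higher positions (arithmetic is modulo 2^64).  */
  return (int64_t) ((v ^ sign) - sign);
}
```

For example, @code{agent_ext (0xff, 8)} yields -1, while
@code{agent_ext (0x7f, 8)} yields 127.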
If the interpreter is unable to evaluate an expression completely for
some reason (a memory location is inaccessible, or a divisor is zero,
for example), we say that interpretation ``terminates with an error''.
This means that the problem is reported back to the interpreter's caller
in some helpful way. In general, code using agent expressions should
assume that they may attempt to divide by zero, fetch arbitrary memory
locations, and misbehave in other ways.
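
One way an agent might report such failures is to return a status code
from each operation rather than letting the processor trap. The C
sketch below shows this for @code{div_signed}. The status names, the
function name @code{agent_div_signed}, and the decision to leave the
stack untouched on error are all assumptions made for illustration; the
overflow case is not discussed in this document and is included only
because the corresponding C division would be undefined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the text only requires that the problem
   be reported back to the interpreter's caller.  */
enum agent_status
{
  AGENT_OK,
  AGENT_ERR_STACK_UNDERFLOW,
  AGENT_ERR_DIV_BY_ZERO,
  AGENT_ERR_OVERFLOW
};

/* Sketch of div_signed: the top value is the divisor, the next-to-top
   value is the dividend.  STACK[*SP - 1] is the top of the stack.  */
static enum agent_status
agent_div_signed (int64_t *stack, size_t *sp)
{
  if (*sp < 2)
    return AGENT_ERR_STACK_UNDERFLOW;

  int64_t b = stack[*sp - 1];
  int64_t a = stack[*sp - 2];

  if (b == 0)
    return AGENT_ERR_DIV_BY_ZERO;
  if (a == INT64_MIN && b == -1)
    return AGENT_ERR_OVERFLOW;   /* assumption: treated as an error */

  stack[*sp - 2] = a / b;        /* replace both operands with a/b */
  *sp -= 1;
  return AGENT_OK;
}
```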
Even complicated C expressions compile to a few bytecode instructions;
for example, the expression @code{x + y * z} would typically produce
code like the following, assuming that @code{x} and @code{y} live in
registers, and @code{z} is a global variable holding a 32-bit
@code{int}:
@example
reg 1
reg 2
const32 @i{address of z}
ref32
ext 32
mul
add
end
@end example
In detail, these mean:
@table @code
@item reg 1
Push the value of register 1 (presumably holding @code{x}) onto the
stack.
@item reg 2
Push the value of register 2 (holding @code{y}).
@item const32 @i{address of z}
Push the address of @code{z} onto the stack.
@item ref32
Fetch a 32-bit word from the address at the top of the stack; replace
the address on the stack with the value. Thus, we replace the address
of @code{z} with @code{z}'s value.
@item ext 32
Sign-extend the value on the top of the stack from 32 bits to full
length. This is necessary because @code{z} is a signed integer.
@item mul
Pop the top two numbers on the stack, multiply them, and push their
product. Now the top of the stack contains the value of the expression
@code{y * z}.
@item add
Pop the top two numbers, add them, and push the sum. Now the top of the
stack contains the value of @code{x + y * z}.
@item end
Stop executing; the value left on the stack top is the value to be
recorded.
@end table
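The walk-through above can be sketched as a toy interpreter in C. This
is not the GDB agent's implementation: @code{toy_eval} and
@code{struct toy_target} are hypothetical, only the seven opcodes used
in the example are handled, the stack is fixed at 16 elements of 64
bits, and an ``address'' is modelled as an offset into a byte array.
The opcode values are the ones listed under Bytecode Descriptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { OP_ADD = 0x02, OP_MUL = 0x04, OP_EXT = 0x16, OP_REF32 = 0x19,
       OP_CONST32 = 0x24, OP_REG = 0x26, OP_END = 0x27 };

/* Hypothetical target model: a register file plus a flat memory image
   in which an address is simply an offset.  */
struct toy_target
{
  const uint64_t *regs;
  size_t nregs;
  const uint8_t *mem;
  size_t mem_size;
};

/* Evaluate CODE (LEN bytes) and store the final stack top in *RESULT.
   Return 0 on success, -1 if evaluation terminates with an error.  */
static int
toy_eval (const uint8_t *code, size_t len, const struct toy_target *t,
          int64_t *result)
{
  uint64_t stack[16];
  size_t sp = 0, pc = 0;

  while (pc < len)
    {
      uint8_t op = code[pc++];
      switch (op)
        {
        case OP_REG:            /* 16-bit register number, MSB first */
          {
            if (len - pc < 2 || sp >= 16)
              return -1;
            unsigned n = (unsigned) code[pc] << 8 | code[pc + 1];
            pc += 2;
            if (n >= t->nregs)
              return -1;
            stack[sp++] = t->regs[n];
            break;
          }
        case OP_CONST32:        /* 32-bit constant, MSB first */
          {
            if (len - pc < 4 || sp >= 16)
              return -1;
            uint64_t v = 0;
            for (int i = 0; i < 4; i++)
              v = v << 8 | code[pc++];
            stack[sp++] = v;
            break;
          }
        case OP_REF32:          /* replace address with the word there */
          {
            if (sp < 1)
              return -1;
            uint64_t addr = stack[sp - 1];
            uint32_t w;
            if (addr > t->mem_size || t->mem_size - addr < sizeof w)
              return -1;        /* inaccessible memory */
            memcpy (&w, t->mem + addr, sizeof w);  /* any alignment */
            stack[sp - 1] = w;  /* pushed as an unsigned integer */
            break;
          }
        case OP_EXT:            /* sign-extend from N bits */
          {
            if (len - pc < 1 || sp < 1)
              return -1;
            unsigned n = code[pc++];
            if (n > 0 && n < 64)
              {
                uint64_t sign = (uint64_t) 1 << (n - 1);
                uint64_t v = stack[sp - 1] & ((sign << 1) - 1);
                stack[sp - 1] = (v ^ sign) - sign;
              }
            break;
          }
        case OP_MUL:
        case OP_ADD:
          if (sp < 2)
            return -1;
          if (op == OP_MUL)
            stack[sp - 2] *= stack[sp - 1];
          else
            stack[sp - 2] += stack[sp - 1];
          sp--;
          break;
        case OP_END:
          if (sp < 1)
            return -1;
          *result = (int64_t) stack[sp - 1];
          return 0;
        default:
          return -1;            /* opcode not handled by this sketch */
        }
    }
  return -1;                    /* ran off the end without "end" */
}
```

With register 1 holding 5, register 2 holding 3, and the 32-bit value
-4 stored at offset 8, the byte sequence for the example (@code{reg 1},
@code{reg 2}, @code{const32 8}, @code{ref32}, @code{ext 32}, @code{mul},
@code{add}, @code{end}) leaves -7 on the stack.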
@node Bytecode Descriptions
@section Bytecode Descriptions
Each bytecode description has the following form:
@table @asis
@item @code{add} (0x02): @var{a} @var{b} @result{} @var{a+b}
Pop the top two stack items, @var{a} and @var{b}, as integers; push
their sum, as an integer.
@end table
In this example, @code{add} is the name of the bytecode, and
@code{(0x02)} is the one-byte value used to encode the bytecode, in
hexadecimal. The phrase ``@var{a} @var{b} @result{} @var{a+b}'' shows
the stack before and after the bytecode executes. Beforehand, the stack
must contain at least two values, @var{a} and @var{b}; since the top of
the stack is to the right, @var{b} is on the top of the stack, and
@var{a} is underneath it. After execution, the bytecode will have
popped @var{a} and @var{b} from the stack, and replaced them with a
single value, @var{a+b}. There may be other values on the stack below
those shown, but the bytecode affects only those shown.
Here is another example:
@table @asis
@item @code{const8} (0x22) @var{n}: @result{} @var{n}
Push the 8-bit integer constant @var{n} on the stack, without sign
extension.
@end table
In this example, the bytecode @code{const8} takes an operand @var{n}
directly from the bytecode stream; the operand follows the @code{const8}
bytecode itself. We write any such operands immediately after the name
of the bytecode, before the colon, and describe the exact encoding of
the operand in the bytecode stream in the body of the bytecode
description.
For the @code{const8} bytecode, there are no stack items given before
the @result{}; this simply means that the bytecode consumes no values
from the stack. If a bytecode consumes no values, or produces no
values, the list on either side of the @result{} may be empty.
If a value is written as @var{a}, @var{b}, or @var{n}, then the bytecode
treats it as an integer. If a value is written as @var{addr}, then the
bytecode treats it as an address.
We do not fully describe the floating point operations here; although
this design can be extended in a clean way to handle floating point
values, they are not of immediate interest to the customer, so we avoid
describing them, to save time.
@table @asis
@item @code{float} (0x01): @result{}
Prefix for floating-point bytecodes. Not implemented yet.
@item @code{add} (0x02): @var{a} @var{b} @result{} @var{a+b}
Pop two integers from the stack, and push their sum, as an integer.
@item @code{sub} (0x03): @var{a} @var{b} @result{} @var{a-b}
Pop two integers from the stack, subtract the top value from the
next-to-top value, and push the difference.
@item @code{mul} (0x04): @var{a} @var{b} @result{} @var{a*b}
Pop two integers from the stack, multiply them, and push the product on
the stack. Note that, when one multiplies two @var{n}-bit numbers
yielding another @var{n}-bit number, it is irrelevant whether the
numbers are signed or not; the results are the same.
@item @code{div_signed} (0x05): @var{a} @var{b} @result{} @var{a/b}
Pop two signed integers from the stack; divide the next-to-top value by
the top value, and push the quotient. If the divisor is zero, terminate
with an error.
@item @code{div_unsigned} (0x06): @var{a} @var{b} @result{} @var{a/b}
Pop two unsigned integers from the stack; divide the next-to-top value
by the top value, and push the quotient. If the divisor is zero,
terminate with an error.
@item @code{rem_signed} (0x07): @var{a} @var{b} @result{} @var{a modulo b}
Pop two signed integers from the stack; divide the next-to-top value by
the top value, and push the remainder. If the divisor is zero,
terminate with an error.
@item @code{rem_unsigned} (0x08): @var{a} @var{b} @result{} @var{a modulo b}
Pop two unsigned integers from the stack; divide the next-to-top value
by the top value, and push the remainder. If the divisor is zero,
terminate with an error.
@item @code{lsh} (0x09): @var{a} @var{b} @result{} @var{a<<b}
Pop two integers from the stack; let @var{a} be the next-to-top value,
and @var{b} be the top value. Shift @var{a} left by @var{b} bits, and
push the result.
@item @code{rsh_signed} (0x0a): @var{a} @var{b} @result{} @code{(signed)}@var{a>>b}
Pop two integers from the stack; let @var{a} be the next-to-top value,
and @var{b} be the top value. Shift @var{a} right by @var{b} bits,
inserting copies of the top bit at the high end, and push the result.
@item @code{rsh_unsigned} (0x0b): @var{a} @var{b} @result{} @var{a>>b}
Pop two integers from the stack; let @var{a} be the next-to-top value,
and @var{b} be the top value. Shift @var{a} right by @var{b} bits,
inserting zero bits at the high end, and push the result.
@item @code{log_not} (0x0e): @var{a} @result{} @var{!a}
Pop an integer from the stack; if it is zero, push the value one;
otherwise, push the value zero.
@item @code{bit_and} (0x0f): @var{a} @var{b} @result{} @var{a&b}
Pop two integers from the stack, and push their bitwise @code{and}.
@item @code{bit_or} (0x10): @var{a} @var{b} @result{} @var{a|b}
Pop two integers from the stack, and push their bitwise @code{or}.
@item @code{bit_xor} (0x11): @var{a} @var{b} @result{} @var{a^b}
Pop two integers from the stack, and push their bitwise
exclusive-@code{or}.
@item @code{bit_not} (0x12): @var{a} @result{} @var{~a}
Pop an integer from the stack, and push its bitwise complement.
@item @code{equal} (0x13): @var{a} @var{b} @result{} @var{a=b}
Pop two integers from the stack; if they are equal, push the value one;
otherwise, push the value zero.
@item @code{less_signed} (0x14): @var{a} @var{b} @result{} @var{a<b}
Pop two signed integers from the stack; if the next-to-top value is less
than the top value, push the value one; otherwise, push the value zero.
@item @code{less_unsigned} (0x15): @var{a} @var{b} @result{} @var{a<b}
Pop two unsigned integers from the stack; if the next-to-top value is less
than the top value, push the value one; otherwise, push the value zero.
@item @code{ext} (0x16) @var{n}: @var{a} @result{} @var{a}, sign-extended from @var{n} bits
Pop an unsigned value from the stack; treating it as an @var{n}-bit
twos-complement value, extend it to full length. This means that all
bits to the left of bit @var{n-1} (where the least significant bit is bit
0) are set to the value of bit @var{n-1}. Note that @var{n} may be
larger than or equal to the width of the stack elements of the bytecode
engine; in this case, the bytecode should have no effect.
The number of source bits to preserve, @var{n}, is encoded as a single
byte unsigned integer following the @code{ext} bytecode.
@item @code{zero_ext} (0x2a) @var{n}: @var{a} @result{} @var{a}, zero-extended from @var{n} bits
Pop an unsigned value from the stack; zero all but the bottom @var{n}
bits.
The number of source bits to preserve, @var{n}, is encoded as a single
byte unsigned integer following the @code{zero_ext} bytecode.
@item @code{ref8} (0x17): @var{addr} @result{} @var{a}
@itemx @code{ref16} (0x18): @var{addr} @result{} @var{a}
@itemx @code{ref32} (0x19): @var{addr} @result{} @var{a}
@itemx @code{ref64} (0x1a): @var{addr} @result{} @var{a}
Pop an address @var{addr} from the stack. For bytecode
@code{ref}@var{n}, fetch an @var{n}-bit value from @var{addr}, using the
natural target endianness. Push the fetched value as an unsigned
integer.
Note that @var{addr} may not be aligned in any particular way; the
@code{ref@var{n}} bytecodes should operate correctly for any address.
If attempting to access memory at @var{addr} would cause a processor
exception of some sort, terminate with an error.
@item @code{ref_float} (0x1b): @var{addr} @result{} @var{d}
@itemx @code{ref_double} (0x1c): @var{addr} @result{} @var{d}
@itemx @code{ref_long_double} (0x1d): @var{addr} @result{} @var{d}
@itemx @code{l_to_d} (0x1e): @var{a} @result{} @var{d}
@itemx @code{d_to_l} (0x1f): @var{d} @result{} @var{a}
Not implemented yet.
@item @code{dup} (0x28): @var{a} @result{} @var{a} @var{a}
Push another copy of the stack's top element.
@item @code{swap} (0x2b): @var{a} @var{b} @result{} @var{b} @var{a}
Exchange the top two items on the stack.
@item @code{pop} (0x29): @var{a} @result{}
Discard the top value on the stack.
@item @code{pick} (0x32) @var{n}: @var{a} @dots{} @var{b} @result{} @var{a} @dots{} @var{b} @var{a}
Duplicate an item from the stack and push it on the top of the stack.
@var{n}, a single byte, indicates the stack item to copy. If @var{n}
is zero, this is the same as @code{dup}; if @var{n} is one, it copies
the item under the top item, etc. If @var{n} exceeds the number of
items on the stack, terminate with an error.
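@item @code{rot} (0x33): @var{a} @var{b} @var{c} @result{} @var{c} @var{a} @var{b}
Rotate the top three items on the stack. The top item (@var{c}) becomes
the third item, the next-to-top item (@var{b}) becomes the top item, and
the third item (@var{a}) becomes the next-to-top item.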
@item @code{if_goto} (0x20) @var{offset}: @var{a} @result{}
Pop an integer off the stack; if it is non-zero, branch to the given
offset in the bytecode string. Otherwise, continue to the next
instruction in the bytecode stream. In other words, if @var{a} is
non-zero, set the @code{pc} register to @code{start} + @var{offset}.
Thus, an offset of zero denotes the beginning of the expression.
The @var{offset} is stored as a sixteen-bit unsigned value, stored
immediately following the @code{if_goto} bytecode. It is always stored
most significant byte first, regardless of the target's normal
endianness. The offset is not guaranteed to fall at any particular
alignment within the bytecode stream; thus, on machines where fetching a
16-bit value from an unaligned address raises an exception, you should fetch the
offset one byte at a time.
@item @code{goto} (0x21) @var{offset}: @result{}
Branch unconditionally to @var{offset}; in other words, set the
@code{pc} register to @code{start} + @var{offset}.
The offset is stored in the same way as for the @code{if_goto} bytecode.
@item @code{const8} (0x22) @var{n}: @result{} @var{n}
@itemx @code{const16} (0x23) @var{n}: @result{} @var{n}
@itemx @code{const32} (0x24) @var{n}: @result{} @var{n}
@itemx @code{const64} (0x25) @var{n}: @result{} @var{n}
Push the integer constant @var{n} on the stack, without sign extension.
To produce a small negative value, push a small twos-complement value,
and then sign-extend it using the @code{ext} bytecode.
The constant @var{n} is stored in the appropriate number of bytes
following the @code{const}@var{b} bytecode. The constant @var{n} is
always stored most significant byte first, regardless of the target's
normal endianness. The constant is not guaranteed to fall at any
particular alignment within the bytecode stream; thus, on machines where
fetching a multi-byte value from an unaligned address raises an exception, you
should fetch @var{n} one byte at a time.
@item @code{reg} (0x26) @var{n}: @result{} @var{a}
Push the value of register number @var{n}, without sign extension. The
registers are numbered following GDB's conventions.
The register number @var{n} is encoded as a 16-bit unsigned integer
immediately following the @code{reg} bytecode. It is always stored most
significant byte first, regardless of the target's normal endianness.
The register number is not guaranteed to fall at any particular
alignment within the bytecode stream; thus, on machines where fetching a
16-bit value from an unaligned address raises an exception, you should fetch the
register number one byte at a time.
@item @code{getv} (0x2c) @var{n}: @result{} @var{v}
Push the value of trace state variable number @var{n}, without sign
extension.
The variable number @var{n} is encoded as a 16-bit unsigned integer
immediately following the @code{getv} bytecode. It is always stored most
significant byte first, regardless of the target's normal endianness.
The variable number is not guaranteed to fall at any particular
alignment within the bytecode stream; thus, on machines where fetching a
16-bit value from an unaligned address raises an exception, you should fetch the
variable number one byte at a time.
@item @code{setv} (0x2d) @var{n}: @var{v} @result{} @var{v}
Set trace state variable number @var{n} to the value found on the top
of the stack. The stack is unchanged, so that the value is readily
available if the assignment is part of a larger expression. The
handling of @var{n} is as described for @code{getv}.
@item @code{trace} (0x0c): @var{addr} @var{size} @result{}
Record the contents of the @var{size} bytes at @var{addr} in a trace
buffer, for later retrieval by GDB.
@item @code{trace_quick} (0x0d) @var{size}: @var{addr} @result{} @var{addr}
Record the contents of the @var{size} bytes at @var{addr} in a trace
buffer, for later retrieval by GDB. @var{size} is a single byte
unsigned integer following the @code{trace_quick} opcode.
This bytecode is equivalent to the sequence @code{dup const8 @var{size}
trace}, but we provide it anyway to save space in bytecode strings.
@item @code{trace16} (0x30) @var{size}: @var{addr} @result{} @var{addr}
Identical to @code{trace_quick}, except that @var{size} is a 16-bit big-endian
unsigned integer, not a single byte. This should probably have been
named @code{trace_quick16}, for consistency.
@item @code{tracev} (0x2e) @var{n}: @result{} @var{a}
Record the value of trace state variable number @var{n} in the trace
buffer. The handling of @var{n} is as described for @code{getv}.
@item @code{tracenz} (0x2f): @var{addr} @var{size} @result{}
Record the bytes at @var{addr} in a trace buffer, for later retrieval
by GDB. Stop at either the first zero byte, or when @var{size} bytes
have been recorded, whichever occurs first.
@item @code{printf} (0x34) @var{numargs} @var{string}: @result{}
Do a formatted print, in the style of the C function @code{printf}.
The value of @var{numargs} is the number of arguments to expect on the
stack, while @var{string} is the format string, prefixed with a
two-byte length. The last byte of the string must be zero, and is
included in the length. The format string includes escaped sequences
just as it appears in C source, so for instance the format string
@code{"\t%d\n"} is six characters long, and the output will consist of
a tab character, a decimal number, and a newline. At the top of the
stack, above the values to be printed, this bytecode will pop a
``function'' and ``channel''. If the function is nonzero, then the
target may treat it as a function and call it, passing the channel as
a first argument, as with the C function @code{fprintf}. If the
function is zero, then the target may simply call a standard formatted
print function of its choice. In all, this bytecode pops 2 +
@var{numargs} stack elements, and pushes nothing.
@item @code{end} (0x27): @result{}
Stop executing bytecode; the result should be the top element of the
stack. If the purpose of the expression was to compute an lvalue or a
range of memory, then the next-to-top of the stack is the lvalue's
address, and the top of the stack is the lvalue's size, in bytes.
@end table
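The operand encodings described above (16-bit register numbers, variable
numbers and branch offsets, and the @code{const}@var{n} constants) are
all stored most significant byte first, at no particular alignment. A
minimal C sketch of fetching them one byte at a time follows. The names
@code{fetch_be16}, @code{fetch_be32} and @code{branch_target} are
hypothetical, and the bounds check in @code{branch_target} is an
assumption about how an agent might validate a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit operand stored most significant byte first.  Only
   byte accesses are used, so P may have any alignment.  */
static uint16_t
fetch_be16 (const uint8_t *p)
{
  return (uint16_t) ((unsigned) p[0] << 8 | p[1]);
}

/* Read a 32-bit operand the same way.  */
static uint32_t
fetch_be32 (const uint8_t *p)
{
  return (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16
         | (uint32_t) p[2] << 8 | (uint32_t) p[3];
}

/* Hypothetical helper for goto and if_goto: OPERAND points at the two
   offset bytes that follow the opcode.  The new pc is START + offset;
   return NULL if that would fall outside the LEN-byte expression.  */
static const uint8_t *
branch_target (const uint8_t *start, size_t len, const uint8_t *operand)
{
  uint16_t offset = fetch_be16 (operand);

  return offset < len ? start + offset : NULL;
}
```

Because the offset is added to @code{start} rather than to the current
@code{pc}, an offset of zero always denotes the first byte of the
expression.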
@node Using Agent Expressions
@section Using Agent Expressions
Agent expressions can be used in several different ways by @value{GDBN},
and the debugger can generate different bytecode sequences as appropriate.
One possibility is to do expression evaluation on the target rather
than the host, such as for the conditional of a conditional
tracepoint. In such a case, @value{GDBN} compiles the source
expression into a bytecode sequence that simply gets values from
registers or memory, does arithmetic, and returns a result.
Another way to use agent expressions is for tracepoint data
collection. @value{GDBN} generates a different bytecode sequence for
collection; in addition to bytecodes that do the calculation,
@value{GDBN} adds @code{trace} bytecodes to save the pieces of
memory that were used.
@itemize @bullet
@item
The user selects trace points in the program's code at which GDB should
collect data.
@item
The user specifies expressions to evaluate at each trace point. These
expressions may denote objects in memory, in which case those objects'
contents are recorded as the program runs, or computed values, in which
case the values themselves are recorded.
@item
GDB transmits the tracepoints and their associated expressions to the
GDB agent, running on the debugging target.
@item
The agent arranges to be notified when a trace point is hit.
@item
When execution on the target reaches a trace point, the agent evaluates
the expressions associated with that trace point, and records the
resulting values and memory ranges.
@item
Later, when the user selects a given trace event and inspects the
objects and expression values recorded, GDB talks to the agent to
retrieve recorded data as necessary to meet the user's requests. If the
user asks to see an object whose contents have not been recorded, GDB
reports an error.
@end itemize
@node Varying Target Capabilities
@section Varying Target Capabilities
Some targets don't support floating-point, and some would rather not
have to deal with @code{long long} operations. Also, different targets
will have different stack sizes, and different bytecode buffer lengths.
Thus, GDB needs a way to ask the target about itself. We haven't worked
out the details yet, but in general, GDB should be able to send the
target a packet asking it to describe itself. The reply should be a
packet whose length is explicit, so we can add new information to the
packet in future revisions of the agent, without confusing old versions
of GDB, and it should contain a version number. It should contain at
least the following information:
@itemize @bullet
@item
whether floating point is supported
@item
whether @code{long long} is supported
@item
maximum acceptable size of bytecode stack
@item
maximum acceptable length of bytecode expressions
@item
which registers are actually available for collection
@item
whether the target supports disabled tracepoints
@end itemize
@node Rationale
@section Rationale
Some of the design decisions apparent above are arguable.
@table @b
@item What about stack overflow/underflow?
GDB should be able to query the target to discover its stack size.
Given that information, GDB can determine at translation time whether a
given expression will overflow the stack. But this spec isn't about
what kinds of error-checking GDB ought to do.
@item Why are you doing everything in LONGEST?
Speed isn't important, but agent code size is; using LONGEST brings in a
bunch of support code to do things like division, etc. So this is a
serious concern.
First, note that you don't need different bytecodes for different
operand sizes. You can generate code without @emph{knowing} how big the
stack elements actually are on the target. If the target only supports
32-bit ints, and you don't send any 64-bit bytecodes, everything just
works. The observation here is that the MIPS and the Alpha have only
fixed-size registers, and you can still get C's semantics even though
most instructions only operate on full-sized words. You just need to
make sure everything is properly sign-extended at the right times. So
there is no need for 32- and 64-bit variants of the bytecodes. Just
implement everything using the largest size you support.
GDB should certainly check to see what sizes the target supports, so the
user can get an error earlier, rather than later. But this information
is not necessary for correctness.
@item Why don't you have @code{>} or @code{<=} operators?
I want to keep the interpreter small, and we don't need them. We can
combine the @code{less_} opcodes with @code{log_not}, and swap the order
of the operands, yielding all four asymmetrical comparison operators.
For example, @code{(x <= y)} is @code{! (x > y)}, which is @code{! (y <
x)}.
@item Why do you have @code{log_not}?
@itemx Why do you have @code{ext}?
@itemx Why do you have @code{zero_ext}?
These are all easily synthesized from other instructions, but I expect
them to be used frequently, and they're simple, so I include them to
keep bytecode strings short.
@code{log_not} is equivalent to @code{const8 0 equal}; it's used in half
the relational operators.
@code{ext @var{n}} is equivalent to @code{const8 @var{s-n} lsh const8
@var{s-n} rsh_signed}, where @var{s} is the size of the stack elements;
it follows @code{ref@var{m}} and @code{reg} bytecodes when the value
should be signed. See the next bulleted item.
@code{zero_ext @var{n}} is equivalent to @code{const@var{m} @var{mask}
bit_and}; it's used whenever we push the value of a register, because we
can't assume the upper bits of the register aren't garbage.
@item Why not have sign-extending variants of the @code{ref} operators?
Because that would double the number of @code{ref} operators, and we
need the @code{ext} bytecode anyway for accessing bitfields.
@item Why not have constant-address variants of the @code{ref} operators?
Because that would double the number of @code{ref} operators again, and
@code{const32 @var{address} ref32} is only one byte longer.
@item Why do the @code{ref@var{n}} operators have to support unaligned fetches?
GDB will generate bytecode that fetches multi-byte values at unaligned
addresses whenever the executable's debugging information tells it to.
Furthermore, GDB does not know the value the pointer will have when GDB
generates the bytecode, so it cannot determine whether a particular
fetch will be aligned or not.
In particular, structure bitfields may be several bytes long, but follow
no alignment rules; members of packed structures are not necessarily
aligned either.
In general, there are many cases where unaligned references occur in
correct C code, either at the programmer's explicit request, or at the
compiler's discretion. Thus, it is simpler to make the GDB agent
bytecodes work correctly in all circumstances than to make GDB guess in
each case whether the compiler did the usual thing.
@item Why are there no side-effecting operators?
Because our current client doesn't want them? That's a cheap answer. I
think the real answer is that I'm afraid of implementing function
calls. We should re-visit this issue after the present contract is
delivered.
@item Why aren't the @code{goto} ops PC-relative?
The interpreter has the base address around anyway for PC bounds
checking, and it seemed simpler.
@item Why is there only one offset size for the @code{goto} ops?
Offsets are currently sixteen bits. I'm not happy with this situation
either:
Suppose we have multiple branch ops with different offset sizes. As I
generate code left-to-right, all my jumps are forward jumps (there are
no loops in expressions), so I never know the target when I emit the
jump opcode. Thus, I have to either always assume the largest offset
size, or do jump relaxation on the code after I generate it, which seems
like a big waste of time.
I can imagine a reasonable expression being longer than 256 bytes. I
can't imagine one being longer than 64k. Thus, we need 16-bit offsets.
This kind of reasoning is so bogus, but relaxation is pathetic.
The other approach would be to generate code right-to-left. Then I'd
always know my offset size. That might be fun.
@item Where is the function call bytecode?
When we add side-effects, we should add this.
@item Why does the @code{reg} bytecode take a 16-bit register number?
Intel's IA-64 architecture has 128 general-purpose registers,
and 128 floating-point registers, and I'm sure it has some random
control registers.
@item Why do we need @code{trace} and @code{trace_quick}?
Because GDB needs to record all the memory contents and registers an
expression touches. If the user wants to evaluate an expression
@code{x->y->z}, the agent must record the values of @code{x} and
@code{x->y} as well as the value of @code{x->y->z}.
@item Don't the @code{trace} bytecodes make the interpreter less general?
They do mean that the interpreter contains special-purpose code, but
that doesn't mean the interpreter can only be used for that purpose. If
an expression doesn't use the @code{trace} bytecodes, they don't get in
its way.
@item Why doesn't @code{trace_quick} consume its arguments the way everything else does?
In general, you do want your operators to consume their arguments; it's
consistent, and generally reduces the amount of stack rearrangement
necessary. However, @code{trace_quick} is a kludge to save space; it
only exists so we needn't write @code{dup const8 @var{SIZE} trace}
before every memory reference. Therefore, it's okay for it not to
consume its arguments; it's meant for a specific context in which we
know exactly what it should do with the stack. If we're going to have a
kludge, it should be an effective kludge.
@item Why does @code{trace16} exist?
That opcode was added by the customer that contracted Cygnus for the
data tracing work. I personally think it is unnecessary; objects that
large will be quite rare, so it is okay to use @code{dup const16
@var{size} trace} in those cases.
Whatever we decide to do with @code{trace16}, we should at least leave
opcode 0x30 reserved, to remain compatible with the customer who added
it.
@end table
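The claim above that @code{less_signed} and @code{log_not} suffice for
all four asymmetrical comparisons can be checked with a small C model.
The functions below are hypothetical stand-ins for the opcodes, not
agent code, and the bytecode sequences named in the comments are one
possible encoding, assuming @var{x} was pushed before @var{y}.

```c
#include <stdint.h>

/* Models of the two primitive opcodes.  */
static int64_t op_less_signed (int64_t a, int64_t b) { return a < b; }
static int64_t op_log_not (int64_t a) { return a == 0; }

/* x < y: less_signed  */
static int64_t
cmp_less (int64_t x, int64_t y)
{
  return op_less_signed (x, y);
}

/* x > y is y < x: swap less_signed  */
static int64_t
cmp_greater (int64_t x, int64_t y)
{
  return op_less_signed (y, x);
}

/* x <= y is !(y < x): swap less_signed log_not  */
static int64_t
cmp_less_equal (int64_t x, int64_t y)
{
  return op_log_not (op_less_signed (y, x));
}

/* x >= y is !(x < y): less_signed log_not  */
static int64_t
cmp_greater_equal (int64_t x, int64_t y)
{
  return op_log_not (op_less_signed (x, y));
}
```

The unsigned comparisons follow the same pattern with
@code{less_unsigned} in place of @code{less_signed}.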