3.2.3. The 'trace events' Subsystem

One especially important directory contained within the /sys/kernel/debug/tracing directory is the 'events' subdirectory, which contains representations of every tracepoint in the system. Listing out the contents of the 'events' subdirectory, we see mainly another set of subdirectories:

     root@sugarbay:/sys/kernel/debug/tracing# cd events
     root@sugarbay:/sys/kernel/debug/tracing/events# ls -al
     drwxr-xr-x   38 root     root             0 Nov 14 23:19 .
     drwxr-xr-x    5 root     root             0 Nov 14 23:19 ..
     drwxr-xr-x   19 root     root             0 Nov 14 23:19 block
     drwxr-xr-x   32 root     root             0 Nov 14 23:19 btrfs
     drwxr-xr-x    5 root     root             0 Nov 14 23:19 drm
     -rw-r--r--    1 root     root             0 Nov 14 23:19 enable
     drwxr-xr-x   40 root     root             0 Nov 14 23:19 ext3
     drwxr-xr-x   79 root     root             0 Nov 14 23:19 ext4
     drwxr-xr-x   14 root     root             0 Nov 14 23:19 ftrace
     drwxr-xr-x    8 root     root             0 Nov 14 23:19 hda
     -r--r--r--    1 root     root             0 Nov 14 23:19 header_event
     -r--r--r--    1 root     root             0 Nov 14 23:19 header_page
     drwxr-xr-x   25 root     root             0 Nov 14 23:19 i915
     drwxr-xr-x    7 root     root             0 Nov 14 23:19 irq
     drwxr-xr-x   12 root     root             0 Nov 14 23:19 jbd
     drwxr-xr-x   14 root     root             0 Nov 14 23:19 jbd2
     drwxr-xr-x   14 root     root             0 Nov 14 23:19 kmem
     drwxr-xr-x    7 root     root             0 Nov 14 23:19 module
     drwxr-xr-x    3 root     root             0 Nov 14 23:19 napi
     drwxr-xr-x    6 root     root             0 Nov 14 23:19 net
     drwxr-xr-x    3 root     root             0 Nov 14 23:19 oom
     drwxr-xr-x   12 root     root             0 Nov 14 23:19 power
     drwxr-xr-x    3 root     root             0 Nov 14 23:19 printk
     drwxr-xr-x    8 root     root             0 Nov 14 23:19 random
     drwxr-xr-x    4 root     root             0 Nov 14 23:19 raw_syscalls
     drwxr-xr-x    3 root     root             0 Nov 14 23:19 rcu
     drwxr-xr-x    6 root     root             0 Nov 14 23:19 rpm
     drwxr-xr-x   20 root     root             0 Nov 14 23:19 sched
     drwxr-xr-x    7 root     root             0 Nov 14 23:19 scsi
     drwxr-xr-x    4 root     root             0 Nov 14 23:19 signal
     drwxr-xr-x    5 root     root             0 Nov 14 23:19 skb
     drwxr-xr-x    4 root     root             0 Nov 14 23:19 sock
     drwxr-xr-x   10 root     root             0 Nov 14 23:19 sunrpc
     drwxr-xr-x  538 root     root             0 Nov 14 23:19 syscalls
     drwxr-xr-x    4 root     root             0 Nov 14 23:19 task
     drwxr-xr-x   14 root     root             0 Nov 14 23:19 timer
     drwxr-xr-x    3 root     root             0 Nov 14 23:19 udp
     drwxr-xr-x   21 root     root             0 Nov 14 23:19 vmscan
     drwxr-xr-x    3 root     root             0 Nov 14 23:19 vsyscall
     drwxr-xr-x    6 root     root             0 Nov 14 23:19 workqueue
     drwxr-xr-x   26 root     root             0 Nov 14 23:19 writeback
            

Each one of these subdirectories corresponds to a 'subsystem' and contains yet again more subdirectories, each one of those finally corresponding to a tracepoint. For example, here are the contents of the 'kmem' subsystem:

     root@sugarbay:/sys/kernel/debug/tracing/events# cd kmem
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem# ls -al
     drwxr-xr-x   14 root     root             0 Nov 14 23:19 .
     drwxr-xr-x   38 root     root             0 Nov 14 23:19 ..
     -rw-r--r--    1 root     root             0 Nov 14 23:19 enable
     -rw-r--r--    1 root     root             0 Nov 14 23:19 filter
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 kfree
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmalloc
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmalloc_node
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmem_cache_alloc
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmem_cache_alloc_node
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 kmem_cache_free
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_alloc
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_alloc_extfrag
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_alloc_zone_locked
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_free
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_free_batched
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 mm_page_pcpu_drain
            

Let's see what's inside the subdirectory for a specific tracepoint, in this case the one for kmalloc:

     root@sugarbay:/sys/kernel/debug/tracing/events/kmem# cd kmalloc
     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# ls -al
     drwxr-xr-x    2 root     root             0 Nov 14 23:19 .
     drwxr-xr-x   14 root     root             0 Nov 14 23:19 ..
     -rw-r--r--    1 root     root             0 Nov 14 23:19 enable
     -rw-r--r--    1 root     root             0 Nov 14 23:19 filter
     -r--r--r--    1 root     root             0 Nov 14 23:19 format
     -r--r--r--    1 root     root             0 Nov 14 23:19 id
            

The 'format' file for the tracepoint describes the event in memory, which is used by the various tracing tools that now make use of these tracepoint to parse the event and make sense of it, along with a 'print fmt' field that allows tools like ftrace to display the event as text. Here's what the format of the kmalloc event looks like:

     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# cat format
     name: kmalloc
     ID: 313
     format:
	     field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	     field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	     field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	     field:int common_pid;	offset:4;	size:4;	signed:1;
	     field:int common_padding;	offset:8;	size:4;	signed:1;

	     field:unsigned long call_site;	offset:16;	size:8;	signed:0;
	     field:const void * ptr;	offset:24;	size:8;	signed:0;
	     field:size_t bytes_req;	offset:32;	size:8;	signed:0;
	     field:size_t bytes_alloc;	offset:40;	size:8;	signed:0;
	     field:gfp_t gfp_flags;	offset:48;	size:4;	signed:0;

     print fmt: "call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s", REC->call_site, REC->ptr, REC->bytes_req, REC->bytes_alloc,
     (REC->gfp_flags) ? __print_flags(REC->gfp_flags, "|", {(unsigned long)(((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
     gfp_t)0x20000u) | (( gfp_t)0x02u) | (( gfp_t)0x08u)) | (( gfp_t)0x4000u) | (( gfp_t)0x10000u) | (( gfp_t)0x1000u) | (( gfp_t)0x200u) | ((
     gfp_t)0x400000u)), "GFP_TRANSHUGE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x20000u) | ((
     gfp_t)0x02u) | (( gfp_t)0x08u)), "GFP_HIGHUSER_MOVABLE"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
     gfp_t)0x20000u) | (( gfp_t)0x02u)), "GFP_HIGHUSER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | ((
     gfp_t)0x20000u)), "GFP_USER"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u) | (( gfp_t)0x80000u)), GFP_TEMPORARY"},
     {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u) | (( gfp_t)0x80u)), "GFP_KERNEL"}, {(unsigned long)((( gfp_t)0x10u) | (( gfp_t)0x40u)),
     "GFP_NOFS"}, {(unsigned long)((( gfp_t)0x20u)), "GFP_ATOMIC"}, {(unsigned long)((( gfp_t)0x10u)), "GFP_NOIO"}, {(unsigned long)((
     gfp_t)0x20u), "GFP_HIGH"}, {(unsigned long)(( gfp_t)0x10u), "GFP_WAIT"}, {(unsigned long)(( gfp_t)0x40u), "GFP_IO"}, {(unsigned long)((
     gfp_t)0x100u), "GFP_COLD"}, {(unsigned long)(( gfp_t)0x200u), "GFP_NOWARN"}, {(unsigned long)(( gfp_t)0x400u), "GFP_REPEAT"}, {(unsigned
     long)(( gfp_t)0x800u), "GFP_NOFAIL"}, {(unsigned long)(( gfp_t)0x1000u), "GFP_NORETRY"},      {(unsigned long)(( gfp_t)0x4000u), "GFP_COMP"},
     {(unsigned long)(( gfp_t)0x8000u), "GFP_ZERO"}, {(unsigned long)(( gfp_t)0x10000u), "GFP_NOMEMALLOC"}, {(unsigned long)(( gfp_t)0x20000u),
     "GFP_HARDWALL"}, {(unsigned long)(( gfp_t)0x40000u), "GFP_THISNODE"}, {(unsigned long)(( gfp_t)0x80000u), "GFP_RECLAIMABLE"}, {(unsigned
     long)(( gfp_t)0x08u), "GFP_MOVABLE"}, {(unsigned long)(( gfp_t)0), "GFP_NOTRACK"}, {(unsigned long)(( gfp_t)0x400000u), "GFP_NO_KSWAPD"},
     {(unsigned long)(( gfp_t)0x800000u), "GFP_OTHER_NODE"} ) : "GFP_NOWAIT"
            

The 'enable' file in the tracepoint directory is what allows the user (or tools such as trace-cmd) to actually turn the tracepoint on and off. When enabled, the corresponding tracepoint will start appearing in the ftrace 'trace' file described previously. For example, this turns on the kmalloc tracepoint:

     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 1 > enable
            

At the moment, we're not interested in the function tracer or some other tracer that might be in effect, so we first turn it off, but if we do that, we still need to turn tracing on in order to see the events in the output buffer:

     root@sugarbay:/sys/kernel/debug/tracing# echo nop > current_tracer
     root@sugarbay:/sys/kernel/debug/tracing# echo 1 > tracing_on
            

Now, if we look at the the 'trace' file, we see nothing but the kmalloc events we just turned on:

     root@sugarbay:/sys/kernel/debug/tracing# cat trace | less
     # tracer: nop
     #
     # entries-in-buffer/entries-written: 1897/1897   #P:8
     #
     #                              _-----=> irqs-off
     #                             / _----=> need-resched
     #                            | / _---=> hardirq/softirq
     #                            || / _--=> preempt-depth
     #                            ||| /     delay
     #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
     #              | |       |   ||||       |         |
            dropbear-1465  [000] ...1 18154.620753: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
              <idle>-0     [000] ..s3 18154.621640: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
              <idle>-0     [000] ..s3 18154.621656: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
     matchbox-termin-1361  [001] ...1 18154.755472: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f0e00 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
                Xorg-1264  [002] ...1 18154.755581: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
                Xorg-1264  [002] ...1 18154.755583: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
                Xorg-1264  [002] ...1 18154.755589: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
     matchbox-termin-1361  [001] ...1 18155.354594: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db35400 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
                Xorg-1264  [002] ...1 18155.354703: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
                Xorg-1264  [002] ...1 18155.354705: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
                Xorg-1264  [002] ...1 18155.354711: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
              <idle>-0     [000] ..s3 18155.673319: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
            dropbear-1465  [000] ...1 18155.673525: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
              <idle>-0     [000] ..s3 18155.674821: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
              <idle>-0     [000] ..s3 18155.793014: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
            dropbear-1465  [000] ...1 18155.793219: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
              <idle>-0     [000] ..s3 18155.794147: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
              <idle>-0     [000] ..s3 18155.936705: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
            dropbear-1465  [000] ...1 18155.936910: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
              <idle>-0     [000] ..s3 18155.937869: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
     matchbox-termin-1361  [001] ...1 18155.953667: kmalloc: call_site=ffffffff81614050 ptr=ffff88006d5f2000 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_KERNEL|GFP_REPEAT
                Xorg-1264  [002] ...1 18155.953775: kmalloc: call_site=ffffffff8141abe8 ptr=ffff8800734f4cc0 bytes_req=168 bytes_alloc=192 gfp_flags=GFP_KERNEL|GFP_NOWARN|GFP_NORETRY
                Xorg-1264  [002] ...1 18155.953777: kmalloc: call_site=ffffffff814192a3 ptr=ffff88001f822520 bytes_req=24 bytes_alloc=32 gfp_flags=GFP_KERNEL|GFP_ZERO
                Xorg-1264  [002] ...1 18155.953783: kmalloc: call_site=ffffffff81419edb ptr=ffff8800721a2f00 bytes_req=64 bytes_alloc=64 gfp_flags=GFP_KERNEL|GFP_ZERO
              <idle>-0     [000] ..s3 18156.176053: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
            dropbear-1465  [000] ...1 18156.176257: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
              <idle>-0     [000] ..s3 18156.177717: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
              <idle>-0     [000] ..s3 18156.399229: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d555800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
            dropbear-1465  [000] ...1 18156.399434: kmalloc: call_site=ffffffff816650d4 ptr=ffff8800729c3000 bytes_http://rostedt.homelinux.com/kernelshark/req=2048 bytes_alloc=2048 gfp_flags=GFP_KERNEL
              <idle>-0     [000] ..s3 18156.400660: kmalloc: call_site=ffffffff81619b36 ptr=ffff88006d554800 bytes_req=512 bytes_alloc=512 gfp_flags=GFP_ATOMIC
     matchbox-termin-1361  [001] ...1 18156.552800: kmalloc: call_site=ffffffff81614050 ptr=ffff88006db34800 bytes_req=576 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_REPEAT
            

To again disable the kmalloc event, we need to send 0 to the enable file:

     root@sugarbay:/sys/kernel/debug/tracing/events/kmem/kmalloc# echo 0 > enable
            

You can enable any number of events or complete subsystems (by using the 'enable' file in the subsystem directory) and get am arbitrarily fine-grained idea of what's going on in the system by enabling as many of the appropriate tracepoints as applicable.

A number of the tools described in this HOWTO do just that, including trace-cmd and kernelshark in the next section.

Tying it Together: These tracepoints and their representation are used not only by ftrace, but by many of the other tools covered in this document and they form a central point of integration for the various tracers available in Linux. They form a central part of the instrumentation for the following tools: perf, lttng, ftrace, blktrace and SystemTap
Tying it Together: Eventually all the special-purpose tracers currently available in /sys/kernel/debug/tracing will be removed and replaced with equivalent tracers based on the 'trace events' subsystem.