Oprofile as configured in Yocto is a system-wide profiler (i.e. the version in Yocto doesn't yet make use of the perf_events interface which would allow it to profile specific processes and workloads). It's relies on hardware counter support in the hardware (but can fall back to a timer-based mode), which means that it doesn't take advantage of tracepoints or other event sources for example.
It consists of a kernel module that collects samples and a userspace daemon that writes the sample data to disk.
The 'opcontrol' shell script is used for transparently managing these components and starting and stopping profiles, and the 'opreport' command is used to display the results.
The oprofile daemon should already be running, but before you start profiling, you may need to change some settings and some of these settings may require the daemon not be running. One of these settings is the path the the vmlinux file, which you'll want to set using the --vmlinux option if you want the kernel profiled:
root@crownbay:~# opcontrol --vmlinux=/boot/vmlinux-`uname -r` The profiling daemon is currently active, so changes to the configuration will be used the next time you restart oprofile after a --shutdown or --deinit.
You can check if vmlinux file: is set using opcontrol --status:
root@crownbay:~# opcontrol --status Daemon paused: pid 1334 Separate options: library vmlinux file: none Image filter: none Call-graph depth: 6
If it's not, you need to shutdown the daemon, add the setting and restart the daemon:
root@crownbay:~# opcontrol --shutdown Killing daemon. root@crownbay:~# opcontrol --vmlinux=/boot/vmlinux-`uname -r` root@crownbay:~# opcontrol --start-daemon Using default event: CPU_CLK_UNHALTED:100000:0:1:1 Using 2.6+ OProfile kernel interface. Reading module info. Using log file /var/lib/oprofile/samples/oprofiled.log Daemon started.
If we get the status again we now see our updated settings:
root@crownbay:~# opcontrol --status Daemon paused: pid 1649 Separate options: library vmlinux file: /boot/vmlinux-3.4.11-yocto-standard Image filter: none Call-graph depth: 6
We're now in a position to run a profile. For that we used 'opcontrol --start':
root@crownbay:~# opcontrol --start Profiler running.
In another window, run our wget workload:
root@crownbay:~# rm linux-2.6.19.2.tar.bz2; wget http://downloads.yoctoproject.org/mirror/sources/linux-2.6.19.2.tar.bz2; sync Connecting to downloads.yoctoproject.org (140.211.169.59:80) linux-2.6.19.2.tar.b 100% |*******************************| 41727k 0:00:00 ETA
To stop the profile we use 'opcontrol --shudown', which not only stops the profile but shuts down the daemon as well:
root@crownbay:~# opcontrol --start Stopping profiling. Killing daemon.
Oprofile writes sample data to /var/lib/oprofile/samples, which you can look at if you're interested in seeing how the samples are structured. This is also interesting because it's related to how you dive down to get further details about specific executables in OProfile.
To see the default display output for a profile, simply type 'opreport', which will show the results using the data in /var/lib/oprofile/samples:
root@crownbay:~# opreport WARNING! The OProfile kernel driver reports sample buffer overflows. Such overflows can result in incorrect sample attribution, invalid sample files and other symptoms. See the oprofiled.log for details. You should adjust your sampling frequency to eliminate (or at least minimize) these overflows. CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 CPU_CLK_UNHALT...| samples| %| ------------------ 464365 79.8156 vmlinux-3.4.11-yocto-standard 65108 11.1908 oprofiled CPU_CLK_UNHALT...| samples| %| ------------------ 64416 98.9372 oprofiled 692 1.0628 libc-2.16.so 36959 6.3526 no-vmlinux 4378 0.7525 busybox CPU_CLK_UNHALT...| samples| %| ------------------ 2844 64.9612 libc-2.16.so 1337 30.5391 busybox 193 4.4084 ld-2.16.so 2 0.0457 libnss_compat-2.16.so 1 0.0228 libnsl-2.16.so 1 0.0228 libnss_files-2.16.so 4344 0.7467 bash CPU_CLK_UNHALT...| samples| %| ------------------ 2657 61.1648 bash 1665 38.3287 libc-2.16.so 18 0.4144 ld-2.16.so 3 0.0691 libtinfo.so.5.9 1 0.0230 libdl-2.16.so 3118 0.5359 nf_conntrack 686 0.1179 matchbox-terminal CPU_CLK_UNHALT...| samples| %| ------------------ 214 31.1953 libglib-2.0.so.0.3200.4 114 16.6181 libc-2.16.so 79 11.5160 libcairo.so.2.11200.2 78 11.3703 libgdk-x11-2.0.so.0.2400.8 51 7.4344 libpthread-2.16.so 45 6.5598 libgobject-2.0.so.0.3200.4 29 4.2274 libvte.so.9.2800.2 25 3.6443 libX11.so.6.3.0 19 2.7697 libxcb.so.1.1.0 17 2.4781 libgtk-x11-2.0.so.0.2400.8 12 1.7493 librt-2.16.so 3 0.4373 libXrender.so.1.3.0 671 0.1153 emgd 411 0.0706 nf_conntrack_ipv4 391 0.0672 iptable_nat 378 0.0650 nf_nat 263 0.0452 Xorg CPU_CLK_UNHALT...| samples| %| ------------------ 106 40.3042 Xorg 53 20.1521 libc-2.16.so 31 11.7871 libpixman-1.so.0.27.2 26 9.8859 emgd_drv.so 16 6.0837 libemgdsrv_um.so.1.5.15.3226 11 4.1825 libEMGD2d.so.1.5.15.3226 9 3.4221 libfb.so 7 2.6616 libpthread-2.16.so 1 0.3802 libudev.so.0.9.3 1 0.3802 libdrm.so.2.4.0 1 0.3802 libextmod.so 1 0.3802 mouse_drv.so . . . 9 0.0015 connmand CPU_CLK_UNHALT...| samples| %| ------------------ 4 44.4444 libglib-2.0.so.0.3200.4 2 22.2222 libpthread-2.16.so 1 11.1111 connmand 1 11.1111 libc-2.16.so 1 11.1111 librt-2.16.so 6 0.0010 oprofile-server CPU_CLK_UNHALT...| samples| %| ------------------ 3 50.0000 libc-2.16.so 1 16.6667 oprofile-server 1 16.6667 libpthread-2.16.so 1 16.6667 libglib-2.0.so.0.3200.4 5 8.6e-04 gconfd-2 CPU_CLK_UNHALT...| samples| %| ------------------ 2 40.0000 libdbus-1.so.3.7.2 2 40.0000 libglib-2.0.so.0.3200.4 1 20.0000 libc-2.16.so
The output above shows the breakdown or samples by both number of samples and percentage for each executable. Within an executable, the sample counts are broken down further into executable and shared libraries (DSOs) used by the executable.
To get even more detailed breakdowns by function, we need to have the full paths to the DSOs, which we can get by using -f with opreport:
root@crownbay:~# opreport -f CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 CPU_CLK_UNHALT...| samples| %| 464365 79.8156 /boot/vmlinux-3.4.11-yocto-standard 65108 11.1908 /usr/bin/oprofiled CPU_CLK_UNHALT...| samples| %| ------------------ 64416 98.9372 /usr/bin/oprofiled 692 1.0628 /lib/libc-2.16.so 36959 6.3526 /no-vmlinux 4378 0.7525 /bin/busybox CPU_CLK_UNHALT...| samples| %| ------------------ 2844 64.9612 /lib/libc-2.16.so 1337 30.5391 /bin/busybox 193 4.4084 /lib/ld-2.16.so 2 0.0457 /lib/libnss_compat-2.16.so 1 0.0228 /lib/libnsl-2.16.so 1 0.0228 /lib/libnss_files-2.16.so 4344 0.7467 /bin/bash CPU_CLK_UNHALT...| samples| %| ------------------ 2657 61.1648 /bin/bash 1665 38.3287 /lib/libc-2.16.so 18 0.4144 /lib/ld-2.16.so 3 0.0691 /lib/libtinfo.so.5.9 1 0.0230 /lib/libdl-2.16.so . . .
Using the paths shown in the above output and the -l option to opreport, we can see all the functions that have hits in the profile and their sample counts and percentages. Here's a portion of what we get for the kernel:
root@crownbay:~# opreport -l /boot/vmlinux-3.4.11-yocto-standard CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % symbol name 233981 50.3873 intel_idle 15437 3.3243 rb_get_reader_page 14503 3.1232 ring_buffer_consume 14092 3.0347 mutex_spin_on_owner 13024 2.8047 read_hpet 8039 1.7312 sub_preempt_count 7096 1.5281 ioread32 6997 1.5068 add_preempt_count 3985 0.8582 rb_advance_reader 3488 0.7511 add_event_entry 3303 0.7113 get_parent_ip 3104 0.6684 rb_buffer_peek 2960 0.6374 op_cpu_buffer_read_entry 2614 0.5629 sync_buffer 2545 0.5481 debug_smp_processor_id 2456 0.5289 ohci_irq 2397 0.5162 memset 2349 0.5059 __copy_to_user_ll 2185 0.4705 ring_buffer_event_length 1918 0.4130 in_lock_functions 1850 0.3984 __schedule 1767 0.3805 __copy_from_user_ll_nozero 1575 0.3392 rb_event_data_length 1256 0.2705 memcpy 1233 0.2655 system_call 1213 0.2612 menu_select
Notice that above we see an entry for the __copy_to_user_ll() function that we've looked at with other profilers as well.
Here's what we get when we do the same thing for the busybox executable:
CPU: Intel Architectural Perfmon, speed 1.3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000 samples % image name symbol name 349 8.4198 busybox retrieve_file_data 308 7.4306 libc-2.16.so _IO_file_xsgetn 283 6.8275 libc-2.16.so __read_nocancel 235 5.6695 libc-2.16.so syscall 233 5.6212 libc-2.16.so clearerr 215 5.1870 libc-2.16.so fread 181 4.3667 libc-2.16.so __write_nocancel 158 3.8118 libc-2.16.so __underflow 151 3.6429 libc-2.16.so _dl_addr 150 3.6188 busybox progress_meter 150 3.6188 libc-2.16.so __poll_nocancel 148 3.5706 libc-2.16.so _IO_file_underflow@@GLIBC_2.1 137 3.3052 busybox safe_poll 125 3.0157 busybox bb_progress_update 122 2.9433 libc-2.16.so __x86.get_pc_thunk.bx 95 2.2919 busybox full_write 81 1.9542 busybox safe_write 77 1.8577 busybox xwrite 72 1.7370 libc-2.16.so _IO_file_read 71 1.7129 libc-2.16.so _IO_sgetn 67 1.6164 libc-2.16.so poll 52 1.2545 libc-2.16.so _IO_switch_to_get_mode 45 1.0856 libc-2.16.so read 34 0.8203 libc-2.16.so write 32 0.7720 busybox monotonic_sec 25 0.6031 libc-2.16.so vfprintf 22 0.5308 busybox get_mono 14 0.3378 ld-2.16.so strcmp 14 0.3378 libc-2.16.so __x86.get_pc_thunk.cx . . .
Since we recorded the profile with a callchain depth of 6, we should be able to see our __copy_to_user_ll() callchains in the output, and indeed we can if we search around a bit in the 'opreport --callgraph' output:
root@crownbay:~# opreport --callgraph /boot/vmlinux-3.4.11-yocto-standard 392 6.9639 vmlinux-3.4.11-yocto-standard sock_aio_read 736 13.0751 vmlinux-3.4.11-yocto-standard __generic_file_aio_write 3255 57.8255 vmlinux-3.4.11-yocto-standard inet_recvmsg 785 0.1690 vmlinux-3.4.11-yocto-standard tcp_recvmsg 1790 31.7940 vmlinux-3.4.11-yocto-standard local_bh_enable 1238 21.9893 vmlinux-3.4.11-yocto-standard __kfree_skb 992 17.6199 vmlinux-3.4.11-yocto-standard lock_sock_nested 785 13.9432 vmlinux-3.4.11-yocto-standard tcp_recvmsg [self] 525 9.3250 vmlinux-3.4.11-yocto-standard release_sock 112 1.9893 vmlinux-3.4.11-yocto-standard tcp_cleanup_rbuf 72 1.2789 vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec 170 0.0366 vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec 1491 73.3038 vmlinux-3.4.11-yocto-standard memcpy_toiovec 327 16.0767 vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec 170 8.3579 vmlinux-3.4.11-yocto-standard skb_copy_datagram_iovec [self] 20 0.9833 vmlinux-3.4.11-yocto-standard copy_to_user 2588 98.2909 vmlinux-3.4.11-yocto-standard copy_to_user 2349 0.5059 vmlinux-3.4.11-yocto-standard __copy_to_user_ll 2349 89.2138 vmlinux-3.4.11-yocto-standard __copy_to_user_ll [self] 166 6.3046 vmlinux-3.4.11-yocto-standard do_page_fault
Remember that by default OProfile sessions are cumulative i.e. if you start and stop a profiling session, then start a new one, the new one will not erase the previous run(s) but will build on it. If you want to restart a profile from scratch, you need to reset:
root@crownbay:~# opcontrol --reset