
October 2020

perf setup

Linux-perf aka perf is versatile, like a batmobile. It has all the tools and functionalities you need. And you'll feel like a superhero once you master it.

By default perf comes with many tools that rely on debug and trace symbols exported via procfs. But to add custom probes, and probes on specific source lines, you need the kernel debug symbols and the kernel source. In this post I'll walk you through the necessary setup. I'm using an Ubuntu 20.04 VM running on VirtualBox. I'm not going to rebuild and install the kernel. The steps are:

  • Install Linux-kernel debug symbols
  • Fetch Linux-kernel source
  • Install perf
  • First run

Enable non-common repositories

Enable debug repositories in apt source list.

bala@ubuntu-vm-1:~$ echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | sudo tee -a /etc/apt/sources.list.d/ddebs.list
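
Because the command above appends with tee -a, running it a second time leaves duplicate entries in ddebs.list. Here is a minimal sketch of a repeatable variant. The helper name ddebs_lines is made up for this post; it only prints the three entries for a given release codename, so the file can be overwritten instead of appended to.

```shell
# ddebs_lines is a hypothetical helper (not part of apt or Ubuntu).
# It prints the three ddebs entries for the release codename given as $1.
ddebs_lines() {
    codename="$1"
    for suite in "$codename" "$codename-updates" "$codename-proposed"; do
        echo "deb http://ddebs.ubuntu.com $suite main restricted universe multiverse"
    done
}
```

It could be used as ddebs_lines "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/ddebs.list, which replaces the file content on every run.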

Install debug keyring

bala@ubuntu-vm-1:~$ sudo apt install ubuntu-dbgsym-keyring

Enable source repositories in the apt source list. At least one deb-src line should be uncommented in /etc/apt/sources.list.

bala@ubuntu-vm-1:~$ grep deb-src /etc/apt/sources.list
deb-src http://in.archive.ubuntu.com/ubuntu/ focal main restricted

Do apt update

bala@ubuntu-vm-1:~$ sudo apt update

1. Install Linux-kernel debug symbols

Install the Linux debug symbols corresponding to the kernel installed on your machine.

bala@ubuntu-vm-1:~$ sudo apt install -y linux-image-`uname -r`-dbgsym

The kernel image with debug symbols is installed in the directory /usr/lib/debug/boot/.

bala@ubuntu-vm-1:~$ ls -lh /usr/lib/debug/boot/vmlinux-5.4.0-52-generic
-rw-r--r-- 2 root root 742M Oct 15 15:58 /usr/lib/debug/boot/vmlinux-5.4.0-52-generic
bala@ubuntu-vm-1:~$
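
The perf commands later in this post hardcode the 5.4.0-52-generic path. As a small sketch, assuming the same /usr/lib/debug/boot/ layout shown above, a helper (dbg_vmlinux is a made-up name) can build the path from any kernel release string:

```shell
# dbg_vmlinux is a hypothetical helper; the path layout is the one shown above.
# $1 is a kernel release string, for example the output of `uname -r`.
dbg_vmlinux() {
    echo "/usr/lib/debug/boot/vmlinux-$1"
}
```

For example, ls -lh "$(dbg_vmlinux "$(uname -r)")" checks that the symbols for the running kernel are really there.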

2. Fetch Linux kernel source

Fetch the source package corresponding to the installed kernel.

bala@ubuntu-vm-1:~$ sudo apt install linux-source

The kernel source, along with the Debian packaging files, is installed in /usr/src/linux-source-5.4.0. The source itself is in a tarball inside this directory. Copy it to your desired location and extract it.

bala@ubuntu-vm-1:~$ ls -lh /usr/src/linux-source-5.4.0/linux-source-5.4.0.tar.bz2
-rw-r--r-- 1 root root 129M Oct 15 15:58 /usr/src/linux-source-5.4.0/linux-source-5.4.0.tar.bz2
bala@ubuntu-vm-1:~$ mkdir -p ~/source
bala@ubuntu-vm-1:~$ cp -f /usr/src/linux-source-5.4.0/linux-source-5.4.0.tar.bz2 ~/source/
bala@ubuntu-vm-1:~$ cd ~/source/
bala@ubuntu-vm-1:~/source$ tar -xvf linux-source-5.4.0.tar.bz2
bala@ubuntu-vm-1:~/source$ ls ~/source/linux-source-5.4.0/
arch   certs    CREDITS  Documentation  dropped.txt  include  ipc     Kconfig  lib       MAINTAINERS  mm   README   scripts   snapcraft.yaml  tools   update-version-dkms  virt
block  COPYING  crypto   drivers        fs           init     Kbuild  kernel   LICENSES  Makefile     net  samples  security  sound           ubuntu  usr
bala@ubuntu-vm-1:~/source$

3. Install Linux perf

It comes with the linux-tools-generic package on Ubuntu 20.04.

bala@ubuntu-vm-1:~$ sudo apt install linux-tools-generic

4. Run your first perf command

I want to count the number of IPIs (inter-processor interrupts) sent by resched_curr. It sends an IPI when the target CPU is not the current CPU (the one executing the function itself). Here is the source code of that function.

void resched_curr(struct rq *rq)
{
    struct task_struct *curr = rq->curr;
    int cpu;

    lockdep_assert_held(&rq->lock);

    if (test_tsk_need_resched(curr))
        return;

    cpu = cpu_of(rq);

    if (cpu == smp_processor_id()) {
        set_tsk_need_resched(curr);
        set_preempt_need_resched();
        return;
    }

    if (set_nr_and_not_polling(curr))
        smp_send_reschedule(cpu);
    else
        trace_sched_wake_idle_without_ipi(cpu);
}

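So if the target CPU is the current CPU, line 14 gets executed. Otherwise execution continues from line 18. These line numbers are relative to the start of the function, with the signature as line 0, which is how perf numbers them. I also want to record the target CPU in both cases.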

Get the line numbers where you can insert probes from perf itself.

bala@ubuntu-vm-1:~/source$ sudo perf probe -k /usr/lib/debug/boot/vmlinux-5.4.0-52-generic -s ~/source/linux-source-5.4.0 -L resched_curr
<resched_curr@/home/bala/source/linux-source-5.4.0//kernel/sched/core.c:0>
      0  void resched_curr(struct rq *rq)
      1  {
      2         struct task_struct *curr = rq->curr;
                int cpu;

                lockdep_assert_held(&rq->lock);

      7         if (test_tsk_need_resched(curr))
                        return;

     10         cpu = cpu_of(rq);

     12         if (cpu == smp_processor_id()) {
     13                 set_tsk_need_resched(curr);
     14                 set_preempt_need_resched();
     15                 return;
                }

     18         if (set_nr_and_not_polling(curr))
     19                 smp_send_reschedule(cpu);
                else
     21                 trace_sched_wake_idle_without_ipi(cpu);
         }

         void resched_cpu(int cpu)

bala@ubuntu-vm-1:~/source$

Here is the probe for the non-IPI case. I named it resched_curr_same_cpu.

bala@ubuntu-vm-1:~$ sudo perf probe -k /usr/lib/debug/boot/vmlinux-5.4.0-52-generic -s source/linux-source-5.4.0 resched_curr_same_cpu='resched_curr:14 rq->cpu'

Here is the probe for the IPI case. I named it resched_curr_send_ipi.

bala@ubuntu-vm-1:~$ sudo perf probe -k /usr/lib/debug/boot/vmlinux-5.4.0-52-generic -s source/linux-source-5.4.0 resched_curr_send_ipi='resched_curr:19 rq->cpu'
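
Both probe definitions follow the same shape: EVENT=FUNCTION:RELATIVE_LINE ARGUMENT. The sketch below only formats that string and does not call perf. The helper name probe_def is made up for illustration.

```shell
# probe_def is a hypothetical helper that only formats the definition string.
# Arguments: $1 event name, $2 function, $3 line relative to the function
# start (as printed by `perf probe -L`), $4 argument to record.
probe_def() {
    printf '%s=%s:%s %s\n' "$1" "$2" "$3" "$4"
}
```

The result can be passed as the last argument of the perf probe commands above, for example "$(probe_def resched_curr_send_ipi resched_curr 19 'rq->cpu')". Once added, perf probe -l lists the probes and perf probe -d removes one.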

Note: To probe the function resched_curr and its argument rq, we need the Linux debug symbols. And to probe on line numbers we need the Linux source. That is why we installed both of them earlier.

Now let's capture the execution of a stress-ng test.

bala@ubuntu-vm-1:~$ sudo perf record -e probe:resched_curr_same_cpu,probe:resched_curr_send_ipi stress-ng --mq 8 -t 5 --metrics-brief
stress-ng: info:  [22439] dispatching hogs: 8 mq
stress-ng: info:  [22439] successful run completed in 5.01s
stress-ng: info:  [22439] stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
stress-ng: info:  [22439]                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
stress-ng: info:  [22439] mq              2225397      5.00      3.57     16.14    445062.30    112907.00
[ perf record: Woken up 421 times to write data ]
[ perf record: Captured and wrote 105.404 MB perf.data (1380709 samples) ]
bala@ubuntu-vm-1:~$

And the report is,

bala@ubuntu-vm-1:~$ sudo perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of event 'probe:resched_curr_same_cpu'
# Event count (approx.): 1380698
#
# Overhead  Trace output
# ........  ........................
#
    29.13%  (ffffffff83ad740d) cpu=1
    27.77%  (ffffffff83ad740d) cpu=2
    24.74%  (ffffffff83ad740d) cpu=0
    18.36%  (ffffffff83ad740d) cpu=3


# Samples: 11  of event 'probe:resched_curr_send_ipi'
# Event count (approx.): 11
#
# Overhead  Trace output
# ........  ........................
#
    45.45%  (ffffffff83ad73af) cpu=1
    36.36%  (ffffffff83ad73af) cpu=3
     9.09%  (ffffffff83ad73af) cpu=0
     9.09%  (ffffffff83ad73af) cpu=2


#
# (Cannot load tips.txt file, please install perf!)
#
bala@ubuntu-vm-1:~$

As you can see, an IPI was sent only 11 times out of about 1.38 million probe hits. More on this in later posts. Until then... "Perhaps you should read the instructions first?".
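
To put a number on it, here is a small sketch that turns the two event counts from the report into a percentage. The helper name ipi_share is made up; the inputs are the counts printed by perf report above.

```shell
# ipi_share is a hypothetical helper.
# $1 = hits on the IPI probe, $2 = hits on the same-CPU probe.
# Prints the percentage of probe hits that sent an IPI, to four decimals.
ipi_share() {
    LC_ALL=C awk -v ipi="$1" -v same="$2" \
        'BEGIN { printf "%.4f\n", 100 * ipi / (ipi + same) }'
}
```

With the counts from this run, ipi_share 11 1380698 prints 0.0008, so roughly 0.0008% of the recorded calls ended in an IPI.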

References

  • http://www.brendangregg.com/perf.html
  • https://wiki.ubuntu.com/Kernel/Reference/stress-ng
  • https://man7.org/linux/man-pages/man1/perf-probe.1.html
  • https://wiki.ubuntu.com/Debug%20Symbol%20Packages
  • https://askubuntu.com/questions/50145/how-to-install-perf-monitoring-tool

Quick kernel upgrade with kexec

One of the major issues we face is keeping up to date with security patches. Keeping the kernel up to date is especially hard because it requires a reboot. A reboot takes minutes to complete, which means significant service downtime, and migrating services to avoid that downtime comes with its own complexity.

kexec helps in these situations. It boots the new kernel directly from the running one, skipping the firmware and boot loader stages of a complete reboot. Though not zero, the downtime is much lower than with a full reboot. In this post, I'll demo upgrading the kernel of a virtual machine running Debian 9.

This VM is running Debian Linux-4.9.0-12. Let me update to the latest kernel available now - Linux-4.9.0-13.

Install kexec-tools

root@debian:~# apt install kexec-tools -qq

Install the latest linux-image package. This will not overwrite the existing kernel or initrd image in your /boot/ directory, so you can safely roll back if required.

root@debian:~# ls -lh /boot/
total 25M
-rw-r--r-- 1 root root 3.1M Jan 21  2020 System.map-4.9.0-12-amd64
-rw-r--r-- 1 root root 183K Jan 21  2020 config-4.9.0-12-amd64
drwxr-xr-x 5 root root 4.0K Apr 24 12:40 grub
-rw-r--r-- 1 root root  18M Apr 24 12:23 initrd.img-4.9.0-12-amd64
-rw-r--r-- 1 root root 4.1M Jan 21  2020 vmlinuz-4.9.0-12-amd64

root@debian:~# sudo apt update -qq
43 packages can be upgraded. Run 'apt list --upgradable' to see them.

root@debian:~# sudo apt install linux-image-amd64 -qq
The following additional packages will be installed:
  linux-image-4.9.0-13-amd64
Suggested packages:
  linux-doc-4.9 debian-kernel-handbook
The following NEW packages will be installed:
  linux-image-4.9.0-13-amd64
The following packages will be upgraded:
  linux-image-amd64
1 upgraded, 1 newly installed, 0 to remove and 42 not upgraded.
Need to get 39.3 MB of archives.
After this operation, 193 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Selecting previously unselected package linux-image-4.9.0-13-amd64.
(Reading database ... 26429 files and directories currently installed.)
Preparing to unpack .../linux-image-4.9.0-13-amd64_4.9.228-1_amd64.deb ...
Unpacking linux-image-4.9.0-13-amd64 (4.9.228-1) ...
Preparing to unpack .../linux-image-amd64_4.9+80+deb9u11_amd64.deb ...
Unpacking linux-image-amd64 (4.9+80+deb9u11) over (4.9+80+deb9u10) ...
Setting up linux-image-4.9.0-13-amd64 (4.9.228-1) ...
I: /vmlinuz is now a symlink to boot/vmlinuz-4.9.0-13-amd64
I: /initrd.img is now a symlink to boot/initrd.img-4.9.0-13-amd64
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.9.0-13-amd64
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.9.0-13-amd64
Found initrd image: /boot/initrd.img-4.9.0-13-amd64
Found linux image: /boot/vmlinuz-4.9.0-12-amd64
Found initrd image: /boot/initrd.img-4.9.0-12-amd64
done
Setting up linux-image-amd64 (4.9+80+deb9u11) ...

root@debian:~# ls -lh /boot/
total 50M
-rw-r--r-- 1 root root 3.1M Jan 21  2020 System.map-4.9.0-12-amd64
-rw-r--r-- 1 root root 3.1M Jul  6 02:59 System.map-4.9.0-13-amd64   <---
-rw-r--r-- 1 root root 183K Jan 21  2020 config-4.9.0-12-amd64
-rw-r--r-- 1 root root 183K Jul  6 02:59 config-4.9.0-13-amd64       <---
drwxr-xr-x 5 root root 4.0K Oct  1 17:25 grub
-rw-r--r-- 1 root root  18M Apr 24 12:23 initrd.img-4.9.0-12-amd64
-rw-r--r-- 1 root root  18M Oct  1 17:25 initrd.img-4.9.0-13-amd64   <---
-rw-r--r-- 1 root root 4.1M Jan 21  2020 vmlinuz-4.9.0-12-amd64
-rw-r--r-- 1 root root 4.1M Jul  6 02:59 vmlinuz-4.9.0-13-amd64      <---

Now copy the kernel command line from /proc/cmdline. We should pass this to kexec.

root@debian:~# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-12-amd64 root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro net.ifnames=0 biosdevname=0 cgroup_enable=memory console=tty0 console=ttyS0,115200 notsc scsi_mod.use_blk_mq=Y quiet

Load the new kernel using kexec -l.

root@debian:~# kexec -l /boot/vmlinuz-4.9.0-13-amd64 --initrd=/boot/initrd.img-4.9.0-13-amd64 --command-line="BOOT_IMAGE=/boot/vmlinuz-4.9.0-13-amd64 root=UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ro net.ifnames=0 biosdevname=0 cgroup_enable=memory console=tty0 console=ttyS0,115200 notsc scsi_mod.use_blk_mq=Y quiet"
root@debian:~#
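
Copying the command line by hand is easy to get wrong. As a sketch, assuming kernels live at /boot/vmlinuz-<release> as on this VM, the new command line can be derived from the current one by rewriting only BOOT_IMAGE. The helper name kexec_cmdline is made up for illustration.

```shell
# kexec_cmdline is a hypothetical helper.
# $1 = current kernel command line (for example the content of /proc/cmdline)
# $2 = release of the kernel to load (for example 4.9.0-13-amd64)
# Only the BOOT_IMAGE value is rewritten; every other option is kept as-is.
kexec_cmdline() {
    printf '%s\n' "$1" | sed "s#BOOT_IMAGE=[^ ]*#BOOT_IMAGE=/boot/vmlinuz-$2#"
}
```

The output can then be passed to kexec as --command-line="$(kexec_cmdline "$(cat /proc/cmdline)" 4.9.0-13-amd64)".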

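Now switch to the new kernel. Starting kexec.target through systemd shuts the system down cleanly before jumping into the loaded kernel.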

root@debian:~# uname -a
Linux debian 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64 GNU/Linux

root@debian:~# systemctl start kexec.target
[268181.341191] kexec_core: Starting new kernel
/dev/sda1: clean, 35704/655360 files, 366185/2621179 blocks
GROWROOT: NOCHANGE: partition 1 is size 20969439. it cannot be grown

Debian GNU/Linux 9 debian ttyS0

debian login: root
Password:
Last login: Mon Sep 28 14:59:30 IST 2020 on ttyS0
Linux debian 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

root@debian:~# uname -a
Linux debian 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux
root@debian:~#

Time to upgrade

This took very little time. I was pinging the VM from its host while the upgrade was in progress. The replies for icmp_seq 178 to 180 are missing, and the first reply after the switch shows a higher latency (8.32 ms), so the VM was unreachable for only about three ping intervals. I didn't run any service on the VM to test how it recovers after the kexec, because that varies from service to service.

64 bytes from 192.168.122.91: icmp_seq=176 ttl=64 time=0.465 ms
64 bytes from 192.168.122.91: icmp_seq=177 ttl=64 time=0.408 ms
64 bytes from 192.168.122.91: icmp_seq=181 ttl=64 time=8.32 ms   <---
64 bytes from 192.168.122.91: icmp_seq=182 ttl=64 time=0.452 ms
64 bytes from 192.168.122.91: icmp_seq=183 ttl=64 time=0.198 ms
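
The gap is easier to spot with a small filter. This is a sketch that reads ping output on stdin and prints every icmp_seq number that never got a reply. The helper name missing_seqs is made up, and it assumes the icmp_seq=N format shown above.

```shell
# missing_seqs is a hypothetical helper.
# Reads ping output on stdin and prints each sequence number that is
# skipped between two consecutive replies.
missing_seqs() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the digits follow it
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen)
                for (i = prev + 1; i < seq; i++)
                    print i
            prev = seq
            seen = 1
        }'
}
```

Fed with the five lines above, it prints 178, 179 and 180, one per line.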