Demonstrations of funclatency, the Linux eBPF/bcc version.


Timing the do_sys_open() kernel function until Ctrl-C:

# ./funclatency do_sys_open
Tracing do_sys_open... Hit Ctrl-C to end.
^C
     nsecs           : count     distribution
       0 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 0        |                                      |
     128 -> 255      : 0        |                                      |
     256 -> 511      : 0        |                                      |
     512 -> 1023     : 0        |                                      |
    1024 -> 2047     : 0        |                                      |
    2048 -> 4095     : 124      |****************                      |
    4096 -> 8191     : 291      |**************************************|
    8192 -> 16383    : 36       |****                                  |
   16384 -> 32767    : 16       |**                                    |
   32768 -> 65535    : 8        |*                                     |
   65536 -> 131071   : 0        |                                      |
  131072 -> 262143   : 0        |                                      |
  262144 -> 524287   : 0        |                                      |
  524288 -> 1048575  : 0        |                                      |
 1048576 -> 2097151  : 0        |                                      |
 2097152 -> 4194303  : 1        |                                      |
Detaching...

The output shows a histogram of function latency (call time), measured from
when the function began executing (was called) to when it finished (returned).

This example output shows that most of the time, do_sys_open() took between
2048 and 65536 nanoseconds (2 to 65 microseconds). The peak of this
distribution shows 291 calls of between 4096 and 8191 nanoseconds. There was
also one occurrence, an outlier, in the 2 to 4 millisecond range.

How this works: the function entry and return are traced using the kernel
kprobe and kretprobe tracers. Timestamps are collected and the delta time
calculated, which is then bucketized and stored as an in-kernel histogram for
efficiency. The histogram is visible in the output: it's the "count" column;
everything else is decoration. Only the count column is copied to user-level
on output. This is an efficient way to time kernel functions and examine
their latency distribution.
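
As a rough illustration of that mechanism, here is a minimal bcc sketch (not
the actual funclatency source; names are simplified) that times a single
kernel function using a kprobe/kretprobe pair and an in-kernel log2
histogram:

    #!/usr/bin/env python
    # minimal sketch: time do_sys_open() via kprobe + kretprobe
    from bcc import BPF
    from time import sleep

    b = BPF(text="""
    #include <uapi/linux/ptrace.h>

    BPF_HASH(start, u32);       // thread id -> entry timestamp
    BPF_HISTOGRAM(dist);        // log2 buckets of nanosecond latency

    int trace_entry(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        u64 ts = bpf_ktime_get_ns();
        start.update(&tid, &ts);                // record entry time
        return 0;
    }

    int trace_return(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        u64 *tsp = start.lookup(&tid);
        if (tsp == 0)
            return 0;                           // missed the entry probe
        u64 delta = bpf_ktime_get_ns() - *tsp;  // latency in nanoseconds
        start.delete(&tid);
        dist.increment(bpf_log2l(delta));       // bucketize in-kernel
        return 0;
    }
    """)
    b.attach_kprobe(event="do_sys_open", fn_name="trace_entry")
    b.attach_kretprobe(event="do_sys_open", fn_name="trace_return")

    print("Tracing do_sys_open... Hit Ctrl-C to end.")
    try:
        sleep(99999999)
    except KeyboardInterrupt:
        pass
    b["dist"].print_log2_hist("nsecs")   # only counts cross to user-level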


Now trace a user function, pthread_mutex_lock in libpthread, to determine if
there is considerable lock contention:

# ./funclatency pthread:pthread_mutex_lock -p $(pidof contentions)
Tracing 1 function for "pthread:pthread_mutex_lock"... Hit Ctrl-C to end.
     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 508967   |****************************************|
      4096 -> 8191       : 70072    |*****                                   |
      8192 -> 16383      : 27686    |**                                      |
     16384 -> 32767      : 5075     |                                        |
     32768 -> 65535      : 2318     |                                        |
     65536 -> 131071     : 581      |                                        |
    131072 -> 262143     : 38       |                                        |
    262144 -> 524287     : 5        |                                        |
    524288 -> 1048575    : 1        |                                        |
   1048576 -> 2097151    : 9        |                                        |
Detaching...

It seems that most calls to pthread_mutex_lock completed rather quickly (in
under 4us), but there were some cases of considerable contention, sometimes
over a full millisecond.
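
User-level functions are instrumented the same way, except with uprobes and
uretprobes instead of kprobes. As a hedged sketch, reusing the trace_entry
and trace_return programs from the earlier sketch (the PID is a placeholder;
"pthread" is resolved by bcc to the pthread library):

    target_pid = 12345   # placeholder PID, e.g. taken from pidof
    b.attach_uprobe(name="pthread", sym="pthread_mutex_lock",
                    fn_name="trace_entry", pid=target_pid)
    b.attach_uretprobe(name="pthread", sym="pthread_mutex_lock",
                       fn_name="trace_return", pid=target_pid)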


Run a quick-and-dirty profiler over all the functions in an executable:

# ./funclatency /home/user/primes:* -p $(pidof primes) -F
Tracing 15 functions for "/home/user/primes:*"... Hit Ctrl-C to end.
^C

Function = is_prime [6556]
     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1495322  |****************************************|
      4096 -> 8191       : 95744    |**                                      |
      8192 -> 16383      : 9926     |                                        |
     16384 -> 32767      : 3070     |                                        |
     32768 -> 65535      : 1415     |                                        |
     65536 -> 131071     : 112      |                                        |
    131072 -> 262143     : 9        |                                        |
    262144 -> 524287     : 3        |                                        |
    524288 -> 1048575    : 0        |                                        |
   1048576 -> 2097151    : 8        |                                        |

Function = insert_result [6556]
     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 111047   |****************************************|
      4096 -> 8191       : 3998     |*                                       |
      8192 -> 16383      : 720      |                                        |
     16384 -> 32767      : 238      |                                        |
     32768 -> 65535      : 106      |                                        |
     65536 -> 131071     : 5        |                                        |
    131072 -> 262143     : 4        |                                        |
Detaching...

From the results, we can see that the is_prime function has something
resembling an exponential distribution -- most numbers are verified as prime
or composite in under 4us, while a few take much longer to test. The
insert_result function exhibits a similar phenomenon, likely due to
contention over a shared results container.
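
For per-function output like this, bcc histograms can be keyed by a struct
rather than just the log2 slot, so one map holds a separate distribution per
function. A minimal sketch of that pattern (the key layout here is
illustrative, not necessarily funclatency's own):

    # BPF program fragment: key the histogram by (function name, log2 slot)
    bpf_fragment = """
    typedef struct {
        char name[64];
        u64 slot;
    } hist_key_t;
    BPF_HISTOGRAM(dist, hist_key_t);

    // in the return probe, after computing delta:
    //   hist_key_t key = {.slot = bpf_log2l(delta)};
    //   __builtin_memcpy(&key.name, "is_prime", sizeof("is_prime"));
    //   dist.increment(key);
    """

    # user side: print one log2 section per distinct key.name
    # b["dist"].print_log2_hist("nsecs", "Function")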


Now vfs_read() is traced, and a microseconds histogram printed:

# ./funclatency -u vfs_read
Tracing vfs_read... Hit Ctrl-C to end.
^C
     usecs           : count     distribution
       0 -> 1        : 1143     |**************************************|
       2 -> 3        : 420      |*************                         |
       4 -> 7        : 159      |*****                                 |
       8 -> 15       : 295      |*********                             |
      16 -> 31       : 25       |                                      |
      32 -> 63       : 5        |                                      |
      64 -> 127      : 1        |                                      |
     128 -> 255      : 0        |                                      |
     256 -> 511      : 0        |                                      |
     512 -> 1023     : 0        |                                      |
    1024 -> 2047     : 1        |                                      |
    2048 -> 4095     : 0        |                                      |
    4096 -> 8191     : 5        |                                      |
    8192 -> 16383    : 0        |                                      |
   16384 -> 32767    : 0        |                                      |
   32768 -> 65535    : 0        |                                      |
   65536 -> 131071   : 7        |                                      |
  131072 -> 262143   : 7        |                                      |
  262144 -> 524287   : 3        |                                      |
  524288 -> 1048575  : 7        |                                      |
Detaching...

This shows a bimodal distribution. Many vfs_read() calls were faster than 15
microseconds; however, there was also a small handful between 65 milliseconds
and 1 second, seen at the bottom of the table. These are likely network reads
from SSH, waiting on interactive keystrokes.
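
Internally the latency is always measured in nanoseconds; options like -u
and -m only change the divisor applied to the delta before it is bucketized,
and the unit label used when printing. A sketch of the idea (option and
variable names are illustrative):

    # pick a divisor and label from the command-line options
    if args.microseconds:
        factor, label = 1000, "usecs"
    elif args.milliseconds:
        factor, label = 1000000, "msecs"
    else:
        factor, label = 1, "nsecs"

    # substitute into the BPF program text before compiling, so the
    # division also happens in-kernel
    bpf_text = bpf_text.replace("FACTOR", "delta /= %d;" % factor)
    # ... compile, attach, and trace as before, then:
    b["dist"].print_log2_hist(label)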


Tracing do_nanosleep() in milliseconds:

# ./funclatency -m do_nanosleep
Tracing do_nanosleep... Hit Ctrl-C to end.
^C
     msecs           : count     distribution
       0 -> 1        : 0        |                                      |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 0        |                                      |
     128 -> 255      : 0        |                                      |
     256 -> 511      : 0        |                                      |
     512 -> 1023     : 328      |**************************************|
    1024 -> 2047     : 0        |                                      |
    2048 -> 4095     : 0        |                                      |
    4096 -> 8191     : 32       |***                                   |
    8192 -> 16383    : 0        |                                      |
   16384 -> 32767    : 0        |                                      |
   32768 -> 65535    : 2        |                                      |
Detaching...

This looks like it has found threads that are sleeping every 1, 5, and 60
seconds.


An interval can be provided using -i, and timestamps added using -T. For
example, tracing vfs_read() latency in milliseconds and printing output
every 5 seconds:

# ./funclatency -mTi 5 vfs_read
Tracing vfs_read... Hit Ctrl-C to end.

20:10:08
     msecs           : count     distribution
       0 -> 1        : 1500     |*************************************+|
       2 -> 3        : 3        |                                      |
       4 -> 7        : 1        |                                      |
       8 -> 15       : 2        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 4        |                                      |
     128 -> 255      : 3        |                                      |
     256 -> 511      : 1        |                                      |
     512 -> 1023     : 7        |                                      |

20:10:13
     msecs           : count     distribution
       0 -> 1        : 1251     |*************************************+|
       2 -> 3        : 3        |                                      |
       4 -> 7        : 2        |                                      |
       8 -> 15       : 0        |                                      |
      16 -> 31       : 2        |                                      |
      32 -> 63       : 3        |                                      |
      64 -> 127      : 5        |                                      |
     128 -> 255      : 5        |                                      |
     256 -> 511      : 3        |                                      |
     512 -> 1023     : 6        |                                      |
    1024 -> 2047     : 2        |                                      |

20:10:18
     msecs           : count     distribution
       0 -> 1        : 1265     |*************************************+|
       2 -> 3        : 0        |                                      |
       4 -> 7        : 5        |                                      |
       8 -> 15       : 9        |                                      |
      16 -> 31       : 7        |                                      |
      32 -> 63       : 1        |                                      |
      64 -> 127      : 2        |                                      |
     128 -> 255      : 3        |                                      |
     256 -> 511      : 5        |                                      |
     512 -> 1023     : 5        |                                      |
    1024 -> 2047     : 0        |                                      |
    2048 -> 4095     : 1        |                                      |
^C
20:10:20
     msecs           : count     distribution
       0 -> 1        : 249      |*************************************+|
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 1        |                                      |
      16 -> 31       : 0        |                                      |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 0        |                                      |
     128 -> 255      : 0        |                                      |
     256 -> 511      : 0        |                                      |
     512 -> 1023     : 1        |                                      |
Detaching...
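
With an interval, the user-level side of the tool just sleeps, prints, and
clears the in-kernel histogram each cycle; the kernel keeps collecting while
user space is asleep. A sketch of that loop (argument and variable names are
illustrative):

    from time import sleep, strftime

    exiting = False
    while not exiting:
        try:
            sleep(args.interval)        # e.g. 5 seconds
        except KeyboardInterrupt:
            exiting = True              # print one final summary below
        if args.timestamp:
            print(strftime("%H:%M:%S"))
        b["dist"].print_log2_hist(label)
        b["dist"].clear()               # reset counts for the next interval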


A single process can be traced, which filters in-kernel for efficiency. Here,
the vfs_read() function is timed in milliseconds for PID 17064, which is a
bash shell:

# ./funclatency -mp 17064 vfs_read
Tracing vfs_read... Hit Ctrl-C to end.
^C
     msecs           : count     distribution
       0 -> 1        : 1        |**                                    |
       2 -> 3        : 0        |                                      |
       4 -> 7        : 0        |                                      |
       8 -> 15       : 1        |**                                    |
      16 -> 31       : 2        |*****                                 |
      32 -> 63       : 0        |                                      |
      64 -> 127      : 13       |**************************************|
     128 -> 255      : 10       |*****************************         |
     256 -> 511      : 4        |***********                           |
Detaching...

The distribution between 64 and 511 milliseconds shows keystroke latency.
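
The PID filter itself is compiled into the BPF program, so events from other
processes are discarded in kernel context before any timestamp is stored. A
fragment of the idea, as it might appear at the top of the entry probe
(TARGET_PID is a placeholder substituted into the program text before
compilation):

    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper 32 bits: TGID
    if (pid != TARGET_PID)
        return 0;    // dropped in-kernel; nothing copied to user-level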


The -F option can be used to print a histogram per function. Eg:

# ./funclatency -uF 'vfs_r*'
Tracing 5 functions for "vfs_r*"... Hit Ctrl-C to end.
^C

Function = vfs_read
     usecs               : count     distribution
         0 -> 1          : 1044     |****************************************|
         2 -> 3          : 383      |**************                          |
         4 -> 7          : 76       |**                                      |
         8 -> 15         : 41       |*                                       |
        16 -> 31         : 26       |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 4        |                                        |
      4096 -> 8191       : 2        |                                        |
      8192 -> 16383      : 0        |                                        |
     16384 -> 32767      : 0        |                                        |
     32768 -> 65535      : 2        |                                        |
     65536 -> 131071     : 5        |                                        |
    131072 -> 262143     : 5        |                                        |
    262144 -> 524287     : 3        |                                        |
    524288 -> 1048575    : 7        |                                        |

Function = vfs_rename
     usecs               : count     distribution
         0 -> 1          : 2        |****                                    |
         2 -> 3          : 2        |****                                    |
         4 -> 7          : 2        |****                                    |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 6        |*************                           |
        32 -> 63         : 18       |****************************************|
Detaching...


USAGE message:

# ./funclatency -h
usage: funclatency [-h] [-p PID] [-i INTERVAL] [-d DURATION] [-T] [-u] [-m]
                   [-F] [-r] [-v]
                   pattern

Time functions and print latency as a histogram

positional arguments:
  pattern               search expression for functions

optional arguments:
  -h, --help            show this help message and exit
  -p PID, --pid PID     trace this PID only
  -i INTERVAL, --interval INTERVAL
                        summary interval, in seconds
  -d DURATION, --duration DURATION
                        total duration of trace, in seconds
  -T, --timestamp       include timestamp on output
  -u, --microseconds    microsecond histogram
  -m, --milliseconds    millisecond histogram
  -F, --function        show a separate histogram per function
  -r, --regexp          use regular expressions. Default is "*" wildcards
                        only.
  -v, --verbose         print the BPF program (for debugging purposes)

examples:
    ./funclatency do_sys_open       # time the do_sys_open() kernel function
    ./funclatency c:read            # time the read() C library function
    ./funclatency -u vfs_read       # time vfs_read(), in microseconds
    ./funclatency -m do_nanosleep   # time do_nanosleep(), in milliseconds
    ./funclatency -i 2 -d 10 c:open # output every 2 seconds, for duration 10s
    ./funclatency -mTi 5 vfs_read   # output every 5 seconds, with timestamps
    ./funclatency -p 181 vfs_read   # time process 181 only
    ./funclatency 'vfs_fstat*'      # time both vfs_fstat() and vfs_fstatat()
    ./funclatency 'c:*printf'       # time the *printf family of functions
    ./funclatency -F 'vfs_r*'       # show one histogram per matched function