You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
158 lines
7.4 KiB
158 lines
7.4 KiB
Demonstrations of zfsslower, the Linux eBPF/bcc version.
|
|
|
|
|
|
zfsslower shows ZFS reads, writes, opens, and fsyncs, slower than a threshold.
|
|
It has been written to work on ZFS on Linux (http://zfsonlinux.org). For
|
|
example:
|
|
|
|
# ./zfsslower
|
|
Tracing ZFS operations slower than 10 ms
|
|
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
|
|
06:31:28 dd 25570 W 131072 38784 303.92 data1
|
|
06:31:34 dd 25686 W 131072 38784 388.28 data1
|
|
06:31:35 dd 25686 W 131072 78720 519.66 data1
|
|
06:31:35 dd 25686 W 131072 116992 405.94 data1
|
|
06:31:35 dd 25686 W 131072 153600 433.52 data1
|
|
06:31:36 dd 25686 W 131072 188672 314.37 data1
|
|
06:31:36 dd 25686 W 131072 222336 372.33 data1
|
|
06:31:36 dd 25686 W 131072 254592 309.59 data1
|
|
06:31:37 dd 25686 W 131072 285440 304.52 data1
|
|
06:31:37 dd 25686 W 131072 315008 236.45 data1
|
|
06:31:37 dd 25686 W 131072 343424 193.54 data1
|
|
06:31:38 dd 25686 W 131072 370560 286.07 data1
|
|
06:31:38 dd 25686 W 131072 396672 251.92 data1
|
|
[...]
|
|
|
|
This shows writes to a "data1" file, each taking well over the 10 ms threshold.
|
|
the slowest, on the 3rd line of output, reached 519.66 ms for a 128 Kbyte
|
|
write by the "dd" command.
|
|
|
|
This "latency" is measured from when the operation was issued from the VFS
|
|
interface to the file system (via the ZFS POSIX layer), to when it completed.
|
|
This spans everything: block device I/O (disk I/O), file system CPU cycles,
|
|
file system locks, run queue latency, etc. This is a better measure of the
|
|
latency suffered by applications reading from the file system than measuring
|
|
this down at the block device interface.
|
|
|
|
Note that this only traces the common file system operations previously
|
|
listed: other file system operations (eg, inode operations including
|
|
getattr()) are not traced.
|
|
|
|
|
|
A threshold of 0 will trace all operations. Warning: the output will be
|
|
verbose, as it will include all file system cache hits.
|
|
|
|
# ./zfsslower 0
|
|
Tracing ZFS operations
|
|
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
|
|
06:36:07 dd 32242 O 0 0 0.01 data1
|
|
06:36:07 dd 32242 W 131072 0 0.25 data1
|
|
06:36:07 dd 32242 W 131072 128 0.03 data1
|
|
06:36:07 dd 32242 W 131072 256 0.04 data1
|
|
06:36:07 dd 32242 W 131072 384 0.04 data1
|
|
06:36:07 dd 32242 W 131072 512 0.04 data1
|
|
06:36:07 dd 32242 W 131072 640 0.03 data1
|
|
06:36:07 dd 32242 W 131072 768 0.03 data1
|
|
06:36:07 dd 32242 W 131072 896 0.04 data1
|
|
06:36:07 dd 32242 W 131072 1024 0.28 data1
|
|
06:36:07 dd 32242 W 131072 1152 0.04 data1
|
|
06:36:07 dd 32242 W 131072 1280 0.03 data1
|
|
[...]
|
|
06:36:07 dd 32242 W 131072 13824 0.04 data1
|
|
06:36:07 dd 32242 W 131072 13952 0.04 data1
|
|
06:36:07 dd 32242 W 131072 14080 0.04 data1
|
|
06:36:07 dd 32242 W 131072 14208 398.92 data1
|
|
06:36:07 dd 32242 W 131072 14336 0.04 data1
|
|
06:36:07 dd 32242 W 131072 14464 0.04 data1
|
|
06:36:07 dd 32242 W 131072 15104 0.03 data1
|
|
[...]
|
|
|
|
The output now includes the open operation for this file ("O"), and then the
|
|
writes. Most of the writes are very fast, with only an occasional outlier that
|
|
is in the hundreds of milliseconds.
|
|
|
|
Fortunately this is not a real world environment: I setup a zpool on top of a
|
|
XFS file system for testing purposes. More debugging using other tools will
|
|
explain these outliers: possibly XFS flushing.
|
|
|
|
|
|
Here's a random read workload, and showing operations slower than 1 ms:
|
|
|
|
# ./zfsslower 1
|
|
Tracing ZFS operations slower than 1 ms
|
|
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
|
|
06:47:30 randread.pl 15431 R 8192 97840 1.03 data1
|
|
06:47:30 randread.pl 15431 R 8192 416744 1.12 data1
|
|
06:47:31 randread.pl 15431 R 8192 228856 1.96 data1
|
|
06:47:31 randread.pl 15431 R 8192 452248 1.02 data1
|
|
06:47:31 randread.pl 15431 R 8192 315288 5.90 data1
|
|
06:47:31 randread.pl 15431 R 8192 752696 1.20 data1
|
|
06:47:31 randread.pl 15431 R 8192 481832 1.39 data1
|
|
06:47:31 randread.pl 15431 R 8192 673752 1.39 data1
|
|
06:47:31 randread.pl 15431 R 8192 691736 1.01 data1
|
|
06:47:31 randread.pl 15431 R 8192 694776 1.78 data1
|
|
06:47:31 randread.pl 15431 R 8192 403328 3.75 data1
|
|
06:47:31 randread.pl 15431 R 8192 567688 1.08 data1
|
|
06:47:31 randread.pl 15431 R 8192 694280 1.31 data1
|
|
06:47:31 randread.pl 15431 R 8192 669280 1.06 data1
|
|
06:47:31 randread.pl 15431 R 8192 426608 1.56 data1
|
|
06:47:31 randread.pl 15431 R 8192 42512 1.01 data1
|
|
06:47:31 randread.pl 15431 R 8192 22944 1.33 data1
|
|
06:47:31 randread.pl 15431 R 8192 427432 1.48 data1
|
|
06:47:31 randread.pl 15431 R 8192 261320 1.28 data1
|
|
06:47:31 randread.pl 15431 R 8192 132248 1.23 data1
|
|
06:47:31 randread.pl 15431 R 8192 96936 1.04 data1
|
|
06:47:31 randread.pl 15431 R 8192 482800 2.63 data1
|
|
[...]
|
|
|
|
|
|
A -j option will print just the fields (parsable output, csv):
|
|
|
|
# ./zfsslower -j 1
|
|
ENDTIME_us,TASK,PID,TYPE,BYTES,OFFSET_b,LATENCY_us,FILE
|
|
252305490911,randread.pl,17922,R,8192,163446784,1156,data1
|
|
252305493852,randread.pl,17922,R,8192,321437696,1129,data1
|
|
252305498839,randread.pl,17922,R,8192,475152384,1154,data1
|
|
252305505515,randread.pl,17922,R,8192,49094656,1082,data1
|
|
252305506774,randread.pl,17922,R,8192,470401024,1245,data1
|
|
252305509265,randread.pl,17922,R,8192,553246720,2412,data1
|
|
252305512365,randread.pl,17922,R,8192,20963328,1093,data1
|
|
252305513755,randread.pl,17922,R,8192,304111616,1350,data1
|
|
252305583330,randread.pl,17922,R,8192,166174720,1154,data1
|
|
252305593913,randread.pl,17922,R,8192,175079424,1241,data1
|
|
252305602833,randread.pl,17922,R,8192,305340416,3307,data1
|
|
252305608663,randread.pl,17922,R,8192,655958016,2704,data1
|
|
252305611212,randread.pl,17922,R,8192,40951808,1033,data1
|
|
252305614609,randread.pl,17922,R,8192,318922752,2687,data1
|
|
252305623800,randread.pl,17922,R,8192,246734848,2983,data1
|
|
252305711125,randread.pl,17922,R,8192,581795840,1091,data1
|
|
252305728694,randread.pl,17922,R,8192,710483968,1034,data1
|
|
252305762046,randread.pl,17922,R,8192,329367552,1405,data1
|
|
252305798215,randread.pl,17922,R,8192,44482560,1030,data1
|
|
252305806748,randread.pl,17922,R,8192,660602880,1069,data1
|
|
252305826360,randread.pl,17922,R,8192,616144896,2327,data1
|
|
[...]
|
|
|
|
|
|
USAGE message:
|
|
|
|
# ./zfsslower -h
|
|
usage: zfsslower [-h] [-j] [-p PID] [min_ms]
|
|
|
|
Trace common ZFS file operations slower than a threshold
|
|
|
|
positional arguments:
|
|
min_ms minimum I/O duration to trace, in ms (default 10)
|
|
|
|
optional arguments:
|
|
-h, --help show this help message and exit
|
|
-j, --csv just print fields: comma-separated values
|
|
-p PID, --pid PID trace this PID only
|
|
|
|
examples:
|
|
./zfsslower # trace operations slower than 10 ms (default)
|
|
./zfsslower 1 # trace operations slower than 1 ms
|
|
./zfsslower -j 1 # ... 1 ms, parsable output (csv)
|
|
./zfsslower 0 # trace all operations (warning: verbose)
|
|
./zfsslower -p 185 # trace PID 185 only
|