You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
124 lines
5.6 KiB
124 lines
5.6 KiB
4 months ago
|
Demonstrations of fileslower, the Linux eBPF/bcc version.
|
||
|
|
||
|
|
||
|
fileslower shows file-based synchronous reads and writes slower than a
|
||
|
threshold. For example:
|
||
|
|
||
|
# ./fileslower
|
||
|
Tracing sync read/writes slower than 10 ms
|
||
|
TIME(s) COMM PID D BYTES LAT(ms) FILENAME
|
||
|
0.000 randread.pl 4762 R 8192 12.70 data1
|
||
|
8.850 randread.pl 4762 R 8192 11.26 data1
|
||
|
12.852 randread.pl 4762 R 8192 10.43 data1
|
||
|
|
||
|
This showed a few reads from a "randread.pl" program, each 8 Kbytes in size,
|
||
|
and from a "data1" file. These all had over 10 ms latency.
|
||
|
|
||
|
This "latency" is measured from when the read or write was issued at the VFS
|
||
|
interface, to when it completed. This spans everything: block device I/O (disk
|
||
|
I/O), file system CPU cycles, file system locks, run queue latency, etc. This
|
||
|
is a better measure of the latency suffered by applications reading from the
|
||
|
file system than measuring this down at the block device interface.
|
||
|
|
||
|
Note that this only traces file reads and writes: other file system operations
|
||
|
(eg, directory operations, open(), fflush()) are not traced.
|
||
|
|
||
|
|
||
|
The threshold can be provided as an argument. Eg, I/O slower than 1 ms:
|
||
|
|
||
|
# ./fileslower 1
|
||
|
Tracing sync read/writes slower than 1 ms
|
||
|
TIME(s) COMM PID D BYTES LAT(ms) FILENAME
|
||
|
0.000 randread.pl 6925 R 8192 1.06 data1
|
||
|
0.082 randread.pl 6925 R 8192 2.42 data1
|
||
|
0.116 randread.pl 6925 R 8192 1.78 data1
|
||
|
0.153 randread.pl 6925 R 8192 2.31 data1
|
||
|
0.330 randread.pl 6925 R 8192 1.14 data1
|
||
|
0.345 randread.pl 6925 R 8192 1.52 data1
|
||
|
0.359 randread.pl 6925 R 8192 1.04 data1
|
||
|
0.532 randread.pl 6925 R 8192 2.56 data1
|
||
|
0.609 supervise 1892 W 18 3.65 status.new
|
||
|
0.610 randread.pl 6925 R 8192 1.37 data1
|
||
|
0.614 randread.pl 6925 R 8192 3.04 data1
|
||
|
0.729 randread.pl 6925 R 8192 2.90 data1
|
||
|
0.755 randread.pl 6925 R 8192 1.12 data1
|
||
|
0.762 randread.pl 6925 R 8192 2.62 data1
|
||
|
0.771 randread.pl 6925 R 8192 1.07 data1
|
||
|
0.816 randread.pl 6925 R 8192 10.50 data1
|
||
|
0.983 randread.pl 6925 R 8192 1.73 data1
|
||
|
0.989 randread.pl 6925 R 8192 2.12 data1
|
||
|
0.992 randread.pl 6925 R 8192 2.17 data1
|
||
|
1.001 randread.pl 6925 R 8192 1.93 data1
|
||
|
1.007 randread.pl 6925 R 8192 2.03 data1
|
||
|
1.210 randread.pl 6925 R 8192 1.82 data1
|
||
|
1.213 randread.pl 6925 R 8192 2.58 data1
|
||
|
1.219 randread.pl 6925 R 8192 2.20 data1
|
||
|
1.430 randread.pl 6925 R 8192 1.01 data1
|
||
|
1.448 randread.pl 6925 R 8192 2.22 data1
|
||
|
[...]
|
||
|
|
||
|
There's now much more output (this spans only 1.4 seconds, the previous output
|
||
|
spanned 12 seconds), and the lower threshold is catching more I/O.
|
||
|
|
||
|
|
||
|
In the following example, the file system caches were dropped before running
|
||
|
fileslower, and then in another session a "man ls" was executed. The command
|
||
|
and files read from disk can be seen:
|
||
|
|
||
|
# echo 3 > /proc/sys/vm/drop_caches; ./fileslower 1
|
||
|
Tracing sync read/writes slower than 1 ms
|
||
|
TIME(s) COMM PID D BYTES LAT(ms) FILENAME
|
||
|
0.000 bash 9647 R 128 5.83 man
|
||
|
0.050 man 9647 R 832 19.52 libmandb-2.6.7.1.so
|
||
|
0.066 man 9647 R 832 15.79 libman-2.6.7.1.so
|
||
|
0.123 man 9647 R 832 56.36 libpipeline.so.1.3.0
|
||
|
0.135 man 9647 R 832 9.79 libgdbm.so.3.0.0
|
||
|
0.323 man 9647 R 4096 59.52 locale.alias
|
||
|
0.540 man 9648 R 8192 11.11 ls.1.gz
|
||
|
0.558 man 9647 R 72 6.97 index.db
|
||
|
0.563 man 9647 R 4096 5.12 index.db
|
||
|
0.723 man 9658 R 128 12.06 less
|
||
|
0.725 man 9656 R 128 14.52 nroff
|
||
|
0.779 man 9655 R 128 68.86 tbl
|
||
|
0.814 nroff 9660 R 128 14.55 locale
|
||
|
0.830 pager 9658 R 4096 28.27 .lesshst
|
||
|
0.866 man 9654 R 128 163.12 preconv
|
||
|
0.980 nroff 9684 R 128 13.80 groff
|
||
|
0.999 groff 9684 R 4096 14.29 DESC
|
||
|
1.036 groff 9685 R 128 5.94 troff
|
||
|
1.038 groff 9686 R 128 7.76 grotty
|
||
|
1.065 troff 9685 R 4096 6.33 R
|
||
|
1.082 troff 9685 R 4096 10.52 BI
|
||
|
1.096 troff 9685 R 4096 8.70 troffrc
|
||
|
1.176 troff 9685 R 4096 80.12 composite.tmac
|
||
|
1.195 troff 9685 R 4096 19.20 fallbacks.tmac
|
||
|
1.202 troff 9685 R 4096 6.79 tty.tmac
|
||
|
1.221 troff 9685 R 4096 7.87 man.local
|
||
|
2.977 supervise 1876 W 18 4.23 status.new
|
||
|
|
||
|
This caught an individual I/O reaching 163.12 ms, for the "preconv" file. While
|
||
|
the file system cache was flushed, causing these to need to be read from disk,
|
||
|
the duration here may not be entirely disk I/O: it can include file system
|
||
|
locks, run queue latency, etc. These can be explored using other commands.
|
||
|
|
||
|
|
||
|
USAGE message:
|
||
|
|
||
|
# ./fileslower -h
|
||
|
usage: fileslower.py [-h] [-p PID] [-a] [min_ms]
|
||
|
|
||
|
Trace slow synchronous file reads and writes
|
||
|
|
||
|
positional arguments:
|
||
|
min_ms minimum I/O duration to trace, in ms (default 10)
|
||
|
|
||
|
optional arguments:
|
||
|
-h, --help show this help message and exit
|
||
|
-p PID, --pid PID trace this PID only
|
||
|
-a, --all-files include non-regular file types
|
||
|
|
||
|
examples:
|
||
|
./fileslower # trace sync file I/O slower than 10 ms (default)
|
||
|
./fileslower 1 # trace sync file I/O slower than 1 ms
|
||
|
./fileslower -p 185 # trace PID 185 only
|