Demonstrations of cpudist.
This program summarizes task on-CPU time as a histogram, showing how long tasks
spent on the CPU before being descheduled. This provides valuable information
that can indicate oversubscription (too many tasks for too few processors),
overhead due to excessive context switching (e.g. a common shared lock for
multiple threads), uneven workload distribution, too-granular tasks, and more.

Alternatively, the same options are available for summarizing task off-CPU
time, which helps understand how often threads are being descheduled and how
long they spend waiting for I/O, locks, timers, and other causes of suspension.
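
Under the hood, tools of this kind time scheduler context switches: they stamp
the moment a task is switched onto a CPU, and add the elapsed interval to a
log2 histogram when it is switched off (off-CPU mode measures the opposite
interval, from switch-out back to switch-in). Here is a minimal, hypothetical
BCC sketch of that idea -- it is not the actual cpudist source, and the map
and variable names are illustrative:

#!/usr/bin/env python
# Illustrative sketch only -- not the real cpudist implementation.
from bcc import BPF
from time import sleep

bpf_text = """
BPF_HASH(start, u32, u64);   // tid -> timestamp (ns) when switched in
BPF_HISTOGRAM(dist);         // log2 buckets of on-CPU time, in usecs

TRACEPOINT_PROBE(sched, sched_switch) {
    u64 ts = bpf_ktime_get_ns();
    u32 prev = args->prev_pid;   // task leaving the CPU
    u32 next = args->next_pid;   // task getting the CPU

    // account the departing task's on-CPU interval, if we saw it start
    u64 *tsp = start.lookup(&prev);
    if (tsp) {
        dist.increment(bpf_log2l((ts - *tsp) / 1000));
        start.delete(&prev);
    }

    // stamp the incoming task; for off-CPU time you would stamp prev
    // here and take the delta for next above instead
    start.update(&next, &ts);
    return 0;
}
"""

b = BPF(text=bpf_text)
print("Tracing on-CPU time... Hit Ctrl-C to end.")
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass
b["dist"].print_log2_hist("usecs")

The real tool layers PID/TID filtering, per-process keying, and interval
output on top of this basic pattern, as the examples below demonstrate.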

# ./cpudist.py
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |                                        |
         4 -> 7          : 1        |                                        |
         8 -> 15         : 13       |**                                      |
        16 -> 31         : 187      |****************************************|
        32 -> 63         : 89       |*******************                     |
        64 -> 127        : 26       |*****                                   |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |

This is from a mostly idle system. Tasks wake up occasionally, run for only
a few dozen microseconds, and are then descheduled.

Here's some output from a system that is heavily loaded by threads that perform
computation but also compete for a lock:

# ./cpudist.py
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 51       |*                                       |
         2 -> 3          : 395      |***********                             |
         4 -> 7          : 259      |*******                                 |
         8 -> 15         : 61       |*                                       |
        16 -> 31         : 75       |**                                      |
        32 -> 63         : 31       |                                        |
        64 -> 127        : 7        |                                        |
       128 -> 255        : 5        |                                        |
       256 -> 511        : 3        |                                        |
       512 -> 1023       : 5        |                                        |
      1024 -> 2047       : 6        |                                        |
      2048 -> 4095       : 4        |                                        |
      4096 -> 8191       : 1361     |****************************************|
      8192 -> 16383      : 523      |***************                         |
     16384 -> 32767      : 3        |                                        |

A bimodal distribution is now clearly visible. Most of the time, tasks were
able to run for 4-16ms before being descheduled (this is likely the quantum
length). Occasionally, tasks had to be descheduled a lot earlier -- possibly
because they competed for a shared lock.
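
The quantum is not a fixed constant under CFS; it derives from scheduler
tunables such as the minimum scheduling granularity. On kernels that still
expose this tunable in procfs, you can inspect it directly (newer kernels
moved it under /sys/kernel/debug/sched/; the value is in nanoseconds and
varies across systems):

# cat /proc/sys/kernel/sched_min_granularity_ns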

If necessary, you can restrict the output to include only threads from a
particular process -- this helps reduce noise:

# ./cpudist.py -p $(pidof parprimes)
Tracing on-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 17       |                                        |
         4 -> 7          : 39       |                                        |
         8 -> 15         : 52       |*                                       |
        16 -> 31         : 43       |                                        |
        32 -> 63         : 12       |                                        |
        64 -> 127        : 13       |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 1        |                                        |
       512 -> 1023       : 11       |                                        |
      1024 -> 2047       : 15       |                                        |
      2048 -> 4095       : 41       |                                        |
      4096 -> 8191       : 1134     |************************                |
      8192 -> 16383      : 1883     |****************************************|
     16384 -> 32767      : 65       |*                                       |

You can also ask for output at predefined intervals, and include timestamps for
easier interpretation. While we're at it, the -P switch will print a histogram
separately for each process:

# ./cpudist.py -TP 5 3
Tracing on-CPU time... Hit Ctrl-C to end.

03:46:51

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |**                                      |
         4 -> 7          : 17       |**********************************      |
         8 -> 15         : 11       |**********************                  |
        16 -> 31         : 20       |****************************************|
        32 -> 63         : 15       |******************************          |
        64 -> 127        : 9        |******************                      |
       128 -> 255        : 6        |************                            |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |*************                           |
         4 -> 7          : 3        |****************************************|
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 1        |*************                           |

03:46:56

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |**                                      |
         4 -> 7          : 19       |****************************************|
         8 -> 15         : 11       |***********************                 |
        16 -> 31         : 9        |******************                      |
        32 -> 63         : 3        |******                                  |
        64 -> 127        : 1        |**                                      |
       128 -> 255        : 3        |******                                  |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 1        |********************                    |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 2        |****************************************|

03:47:01

pid = 0
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 12       |********************************        |
         8 -> 15         : 15       |****************************************|
        16 -> 31         : 15       |****************************************|
        32 -> 63         : 0        |                                        |
        64 -> 127        : 3        |********                                |
       128 -> 255        : 1        |**                                      |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 1        |**                                      |

pid = 5068
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |******                                  |
         4 -> 7          : 6        |****************************************|
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 2        |*************                           |

This histogram was obtained while executing `dd if=/dev/zero of=/dev/null` with
fairly large block sizes.
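
Interval output (the "5 3" above) follows a simple pattern in BCC-based tools:
sleep for the interval, print the histogram map, then clear it so each report
stands alone. A hypothetical continuation of the earlier sketch (the `b` object
and "dist" map come from that sketch):

from time import sleep, strftime

interval, count = 5, 3          # seconds per report, number of reports
for _ in range(count):
    sleep(interval)
    print(strftime("%H:%M:%S"))         # the -T timestamp
    b["dist"].print_log2_hist("usecs")  # render the log2 buckets
    b["dist"].clear()                   # reset so intervals don't accumulate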

You could also ask for an off-CPU report using the -O switch. Here's a
histogram of task block times while the system is heavily loaded:

# ./cpudist.py -O -p $(pidof parprimes)
Tracing off-CPU time... Hit Ctrl-C to end.
^C
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 1        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 3        |                                        |
        64 -> 127        : 1        |                                        |
       128 -> 255        : 1        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 2        |                                        |
      1024 -> 2047       : 4        |                                        |
      2048 -> 4095       : 3        |                                        |
      4096 -> 8191       : 70       |***                                     |
      8192 -> 16383      : 867      |****************************************|
     16384 -> 32767      : 141      |******                                  |
     32768 -> 65535      : 8        |                                        |
     65536 -> 131071     : 0        |                                        |
    131072 -> 262143     : 1        |                                        |
    262144 -> 524287     : 2        |                                        |
    524288 -> 1048575    : 3        |                                        |

As you can see, threads are switching out for relatively long intervals, even
though we know the workload doesn't have any significant blocking. This can be
a result of over-subscription -- too many threads contending over too few CPUs.
Indeed, there are four available CPUs and more than four runnable threads:

# nproc
4
# cat /proc/loadavg
0.04 0.11 0.06 9/147 7494

(The fourth loadavg field shows 9 currently runnable threads out of 147 total
-- more than the 4 available CPUs.)

Finally, let's ask for a per-thread report and values in milliseconds instead
of microseconds:

# ./cpudist.py -p $(pidof parprimes) -mL
Tracing on-CPU time... Hit Ctrl-C to end.

tid = 5092
     msecs               : count     distribution
         0 -> 1          : 3        |                                        |
         2 -> 3          : 4        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 535      |****************************************|
        16 -> 31         : 14       |*                                       |

tid = 5093
     msecs               : count     distribution
         0 -> 1          : 8        |                                        |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 534      |****************************************|
        16 -> 31         : 12       |                                        |

tid = 5094
     msecs               : count     distribution
         0 -> 1          : 38       |***                                     |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 5        |                                        |
         8 -> 15         : 476      |****************************************|
        16 -> 31         : 25       |**                                      |

tid = 5095
     msecs               : count     distribution
         0 -> 1          : 31       |**                                      |
         2 -> 3          : 6        |                                        |
         4 -> 7          : 10       |                                        |
         8 -> 15         : 478      |****************************************|
        16 -> 31         : 20       |*                                       |

tid = 5096
     msecs               : count     distribution
         0 -> 1          : 21       |*                                       |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 4        |                                        |
         8 -> 15         : 523      |****************************************|
        16 -> 31         : 16       |*                                       |

tid = 5097
     msecs               : count     distribution
         0 -> 1          : 11       |                                        |
         2 -> 3          : 7        |                                        |
         4 -> 7          : 7        |                                        |
         8 -> 15         : 502      |****************************************|
        16 -> 31         : 23       |*                                       |

tid = 5098
     msecs               : count     distribution
         0 -> 1          : 21       |*                                       |
         2 -> 3          : 5        |                                        |
         4 -> 7          : 3        |                                        |
         8 -> 15         : 494      |****************************************|
        16 -> 31         : 28       |**                                      |

tid = 5099
     msecs               : count     distribution
         0 -> 1          : 15       |*                                       |
         2 -> 3          : 4        |                                        |
         4 -> 7          : 6        |                                        |
         8 -> 15         : 521      |****************************************|
        16 -> 31         : 12       |                                        |

It looks like all threads are more-or-less equally busy, and are typically
switched out after running for 8-15 milliseconds (again, consistent with the
quantum length).

USAGE message:

# ./cpudist.py -h
usage: cpudist.py [-h] [-O] [-T] [-m] [-P] [-L] [-p PID] [interval] [count]

Summarize on-CPU time per task as a histogram.

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -O, --offcpu        measure off-CPU time
  -T, --timestamp     include timestamp on output
  -m, --milliseconds  millisecond histogram
  -P, --pids          print a histogram per process ID
  -L, --tids          print a histogram per thread ID
  -p PID, --pid PID   trace this PID only

examples:
    cpudist              # summarize on-CPU time as a histogram
    cpudist -O           # summarize off-CPU time as a histogram
    cpudist 1 10         # print 1 second summaries, 10 times
    cpudist -mT 1        # 1s summaries, milliseconds, and timestamps
    cpudist -P           # show each PID separately
    cpudist -p 185       # trace PID 185 only