HOWTO - using the library with perf {#howto_perf}
===================================
|
|
|
|
@brief Using command line perf and OpenCSD to collect and decode trace.
|
|
|
|
This HOWTO explains how to use the perf command-line tools and the openCSD
library to collect and extract program flow traces generated by the
CoreSight IP blocks on a Linux system. The examples were generated on
an aarch64 Juno-r0 platform.
|
|
|
|
|
|
On Target Trace Acquisition - Perf Record
-----------------------------------------
|
|
|
|
Compile the perf tool from the same kernel source code version you are using with:
|
|
|
|
make -C tools/perf
|
|
|
|
This will yield a `perf` executable that will support CoreSight trace collection.
|
|
|
|
*Note:* If traces are to be decompressed **off** target, there is no need to download
|
|
and compile the openCSD library (on the target).
|
|
|
|
If you are instead planning to use perf to record and decode the trace on the target,
|
|
compile the perf tool linking against the openCSD library, in the following way:
|
|
|
|
make -C tools/perf VF=1 CORESIGHT=1
|
|
|
|
Further information on the required build environment and options is given later
in the section **Off Target Perf Tools Compilation**.
|
|
|
|
Before launching a trace run, a sink that will collect the trace data needs to be
identified. All CoreSight blocks identified by the framework are registered in
sysfs:
|
|
|
|
|
|
linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/
|
|
etm0 etm2 etm4 etm6 funnel0 funnel2 funnel4 stm0 tmc_etr0
|
|
etm1 etm3 etm5 etm7 funnel1 funnel3 replicator0 tmc_etf0
|
|
|
|
|
|
CoreSight blocks are listed in the device tree for a specific system and
discovered at boot time. Since tracers can be linked to more than one sink,
the sink that will receive trace data needs to be identified and given as an
option on the perf command line. Once a sink has been identified, trace collection
can start. An easy and yet interesting example is the `uname` command:
|
|
|
|
linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@tmc_etr0/ --per-thread uname
|
|
|
|
This will generate a `perf.data` file where execution has been traced for both
|
|
user and kernel space. To narrow the field to either user or kernel space the
|
|
`u` and `k` options can be specified. For example the following will limit
|
|
traces to user space:
|
|
|
|
|
|
linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@tmc_etr0/u --per-thread uname
|
|
Problems setting modules path maps, continuing anyway...
|
|
-----------------------------------------------------------
|
|
perf_event_attr:
|
|
type 8
|
|
size 112
|
|
{ sample_period, sample_freq } 1
|
|
sample_type IP|TID|IDENTIFIER
|
|
read_format ID
|
|
disabled 1
|
|
exclude_kernel 1
|
|
exclude_hv 1
|
|
enable_on_exec 1
|
|
sample_id_all 1
|
|
------------------------------------------------------------
|
|
sys_perf_event_open: pid 11375 cpu -1 group_fd -1 flags 0x8
|
|
------------------------------------------------------------
|
|
perf_event_attr:
|
|
type 1
|
|
size 112
|
|
config 0x9
|
|
{ sample_period, sample_freq } 1
|
|
sample_type IP|TID|IDENTIFIER
|
|
read_format ID
|
|
disabled 1
|
|
exclude_kernel 1
|
|
exclude_hv 1
|
|
mmap 1
|
|
comm 1
|
|
enable_on_exec 1
|
|
task 1
|
|
sample_id_all 1
|
|
mmap2 1
|
|
comm_exec 1
|
|
------------------------------------------------------------
|
|
sys_perf_event_open: pid 11375 cpu -1 group_fd -1 flags 0x8
|
|
mmap size 266240B
|
|
AUX area mmap length 131072
|
|
perf event ring buffer mmapped per thread
|
|
Synthesizing auxtrace information
|
|
Linux
|
|
auxtrace idx 0 old 0 head 0x11ea0 diff 0x11ea0
|
|
[ perf record: Woken up 1 times to write data ]
|
|
overlapping maps:
|
|
7f99daf000-7f99db0000 0 [vdso]
|
|
7f99d84000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
|
|
7f99d84000-7f99daf000 0 /lib/aarch64-linux-gnu/ld-2.21.so
|
|
7f99db0000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
|
|
failed to write feature 8
|
|
failed to write feature 9
|
|
failed to write feature 14
|
|
[ perf record: Captured and wrote 0.072 MB perf.data ]
|
|
|
|
linaro@linaro-nano:~/kernel$ ls -l ~/.debug/ perf.data
|
|
-rw------- 1 linaro linaro 77888 Mar 2 20:41 perf.data
|
|
|
|
/home/linaro/.debug/:
|
|
total 16
|
|
drwxr-xr-x 2 linaro linaro 4096 Mar 2 20:40 [kernel.kallsyms]
|
|
drwxr-xr-x 2 linaro linaro 4096 Mar 2 20:40 [vdso]
|
|
drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 bin
|
|
drwxr-xr-x 3 linaro linaro 4096 Mar 2 20:40 lib
|
|
|
|
Trace data filtering
--------------------
|
|
The amount of trace data generated by CoreSight tracers is staggering, even for
the simplest trace scenario. Reducing trace generation to specific areas
of interest is desirable to save trace buffer space and avoid getting lost in
trace data that isn't relevant. Supplementing the 'k' and 'u' options
described above is the notion of address filters.
|
|
|
|
On CoreSight two types of address filter have been implemented - address range
|
|
and start/stop filter:
|
|
|
|
**Address range filters:**
|
|
With address range filters, traces are generated if the instruction pointer
falls within the specified range. Any work done by the CPU outside of that
range will not be traced. Address range filters can be specified for both
user and kernel space sessions:
|
|
|
|
perf record -e cs_etm/@tmc_etr0/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname
|
|
|
|
perf record -e cs_etm/@tmc_etr0/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
|
|
|
|
When dealing with kernel space traces, addresses are typically taken from the
'System.map' file. In user space, addresses are relocatable and can be
extracted from an objdump output:
|
|
|
|
$ aarch64-linux-gnu-objdump -d libcstest.so.1.0
|
|
...
|
|
...
|
|
000000000000072c <coresight_test1>: <------------ Beginning of traces
|
|
72c: d10083ff sub sp, sp, #0x20
|
|
730: b9000fe0 str w0, [sp,#12]
|
|
734: b9001fff str wzr, [sp,#28]
|
|
738: 14000007 b 754 <coresight_test1+0x28>
|
|
73c: b9400fe0 ldr w0, [sp,#12]
|
|
740: 11000800 add w0, w0, #0x2
|
|
744: b9000fe0 str w0, [sp,#12]
|
|
748: b9401fe0 ldr w0, [sp,#28]
|
|
74c: 11000400 add w0, w0, #0x1
|
|
750: b9001fe0 str w0, [sp,#28]
|
|
754: b9401fe0 ldr w0, [sp,#28]
|
|
758: 7100101f cmp w0, #0x4
|
|
75c: 54ffff0d b.le 73c <coresight_test1+0x10>
|
|
760: b9400fe0 ldr w0, [sp,#12]
|
|
764: 910083ff add sp, sp, #0x20
|
|
768: d65f03c0 ret
|
|
...
|
|
...
|
|
|
|
Following the address, the number of bytes is specified and, if tracing in user
space, the full path to the binary (or library) being traced.
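
The byte count of a range filter can be derived from the objdump listing. The sketch below is not part of perf or openCSD: `range_filter` is a hypothetical helper that assumes fixed 4-byte A64 instructions and takes the addresses of the first and last instruction to trace.

```shell
# Print a --filter argument covering the instructions from $1 to $2 inclusive.
#   $1: address of the first instruction, with 0x prefix (as found via objdump)
#   $2: address of the last instruction to trace, with 0x prefix
#   $3: optional full path to the binary or library (user space only)
# Assumption: every instruction is 4 bytes long (A64).
range_filter() {
    start=$1
    last=$2
    path=${3:-}
    # The range ends just past the last instruction.
    size=$(( $last + 4 - $start ))
    if [ "$size" -le 0 ]; then
        echo "range_filter: last instruction precedes start" >&2
        return 1
    fi
    if [ -n "$path" ]; then
        printf 'filter %s/0x%x@%s\n' "$start" "$size" "$path"
    else
        printf 'filter %s/0x%x\n' "$start" "$size"
    fi
}
```

With the listing above, `range_filter 0x72c 0x768 /opt/lib/libcstest.so.1.0` prints `filter 0x72c/0x40@/opt/lib/libcstest.so.1.0`, which is the string passed to `--filter` in the user space example.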
|
|
|
|
**Start/Stop filters:**
|
|
With start/stop filters, traces are generated when the instruction pointer is
equal to the start address. Likewise, traces stop being generated when the
instruction pointer is equal to the stop address. Anything that happens between
these two events is traced:
|
|
|
|
perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread uname
|
|
|
|
perf record -vvv -e cs_etm/@tmc_etr0/u --filter 'start 0x72c@/opt/lib/libcstest.so.1.0, \
|
|
stop 0x40082c@/home/linaro/main' \
|
|
--per-thread ./main
|
|
|
|
**Limitation on address filters:**
|
|
The only limitations on address filters are the number of address comparators
found on an implementation and the mutual exclusion between range and
start/stop filters. As such the following example would _not_ work:
|
|
|
|
perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0, \ // start/stop
|
|
filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \ // address range
|
|
--per-thread uname
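
A filter string can be checked for this mutual exclusion before it is handed to `perf record`. The following is only a sketch: `filter_kinds_ok` is a hypothetical helper, it recognises just the `filter`, `start` and `stop` keywords used in this document, and it does not check the number of address comparators available on the hardware.

```shell
# Return 0 if the filter string uses a single kind of address filter,
# 1 if it mixes range ("filter") entries with start/stop entries.
#   $1: the string intended for --filter, entries separated by commas
filter_kinds_ok() {
    printf '%s\n' "$1" | awk -v RS=',' '
        # The first word of each comma-separated entry is its keyword.
        $1 == "filter"                { range = 1 }
        $1 == "start" || $1 == "stop" { startstop = 1 }
        END { exit ((range && startstop) ? 1 : 0) }
    '
}
```

For instance, `filter_kinds_ok 'start 0xffffff800856bc50,stop 0xffffff800856bcb0'` succeeds, while appending `,filter 0x72c/0x40@/opt/lib/libcstest.so.1.0` to that string makes it fail.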
|
|
|
|
Additional Trace Options
------------------------
|
|
Additional options can be used during trace collection that add information to the captured trace.
|
|
|
|
- Timestamps: These packets are added to the trace streams to allow correlation of different sources where tools support this.
|
|
- Cycle Counts: These packets are added to get a count of cycles for blocks of executed instructions. Adding cycle counts will considerably increase the amount of generated trace.
|
|
The relationship between cycle counts and executed instructions differs according to the trace protocol.
|
|
For example, the ETMv4 protocol will emit counts for groups of instructions according to a minimum count threshold.
|
|
Presently this threshold is fixed at 256 cycles for `perf record`.
|
|
|
|
Command line options in `perf record` to use these features are part of the options for the `cs_etm` event:
|
|
|
|
perf record -e cs_etm/timestamp,cycacc,@tmc_etr0/ --per-thread uname
|
|
|
|
In the current version, `perf record` and `perf script` do not use this additional information.
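
The event strings used so far all follow the pattern `cs_etm/<options>,@<sink>/<modifier>`. As an illustration only, a small helper can assemble them; `cs_etm_event` is a hypothetical name and the function does no more than string formatting.

```shell
# Build a cs_etm event string for `perf record -e`.
#   $1: sink name as listed in sysfs, e.g. tmc_etr0
#   $2: optional comma-separated event options, e.g. timestamp,cycacc
#   $3: optional modifier, "u" or "k"; empty traces both user and kernel space
cs_etm_event() {
    sink=$1
    opts=${2:-}
    mode=${3:-}
    if [ -z "$sink" ]; then
        echo "cs_etm_event: a sink name is required" >&2
        return 1
    fi
    case $mode in
        ''|u|k) ;;
        *) echo "cs_etm_event: modifier must be u, k or empty" >&2; return 1 ;;
    esac
    if [ -n "$opts" ]; then
        printf 'cs_etm/%s,@%s/%s\n' "$opts" "$sink" "$mode"
    else
        printf 'cs_etm/@%s/%s\n' "$sink" "$mode"
    fi
}
```

A call such as `perf record -e "$(cs_etm_event tmc_etr0 timestamp,cycacc)" --per-thread uname` then expands to the command shown above.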
|
|
|
|
The cs_etm perf event
---------------------
|
|
|
|
System information for this perf pmu event can be found at:
|
|
|
|
/sys/devices/cs_etm
|
|
|
|
This contains the internal format of the parameters described above:
|
|
|
|
root@linaro-developer:~# ls /sys/devices/cs_etm/format
|
|
contextid cycacc retstack sinkid timestamp
|
|
|
|
and names of registered sinks:
|
|
|
|
root@linaro-developer:~# ls /sys/devices/cs_etm/sinks
|
|
tmc_etf0 tmc_etr0 tpiu0
|
|
|
|
Note: The `sinkid` parameter is there to document the usage of a 32-bit internal parameter to
|
|
pass the sink name used in the cs_etm/@sink/ command to the kernel drivers. It can be used
|
|
directly as cs_etm/sinkid=<hash_value>/ but this is not recommended as the values used are
|
|
considered opaque and subject to changes.
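
Before starting a trace run, a script can check that the intended sink is registered under the `sinks` directory shown above. This is a minimal sketch; `sink_registered` is a hypothetical helper, and the optional second argument exists only so that a different directory can be used for testing.

```shell
# Return 0 if the named sink appears in the cs_etm sinks directory.
#   $1: sink name, e.g. tmc_etr0
#   $2: optional directory to search; defaults to /sys/devices/cs_etm/sinks
sink_registered() {
    sinks_dir=${2:-/sys/devices/cs_etm/sinks}
    [ -n "$1" ] && [ -e "$sinks_dir/$1" ]
}
```

Usage on a target could look like `sink_registered tmc_etr0 || echo "tmc_etr0 is not a registered sink" >&2`.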
|
|
|
|
On Target Trace Collection
--------------------------
|
|
The entire program flow will have been recorded in the `perf.data` file.
Information about libraries and executables is stored under `$HOME/.debug`:
|
|
|
|
linaro@linaro-nano:~/kernel$ tree ~/.debug
|
|
.debug
|
|
├── [kernel.kallsyms]
|
|
│ └── 0542921808098d591a7acba5a1163e8991897669
|
|
│ └── kallsyms
|
|
├── [vdso]
|
|
│ └── 551fbbe29579eb63be3178a04c16830b8d449769
|
|
│ └── vdso
|
|
├── bin
|
|
│ └── uname
|
|
│ └── ed95e81f97c4471fb2ccc21e356b780eb0c92676
|
|
│ └── elf
|
|
└── lib
|
|
└── aarch64-linux-gnu
|
|
├── ld-2.21.so
|
|
│ └── 94912dc5a1dc8c7ef2c4e4649d4b1639b6ebc8b7
|
|
│ └── elf
|
|
└── libc-2.21.so
|
|
└── 169a143e9c40cfd9d09695333e45fd67743cd2d6
|
|
└── elf
|
|
|
|
13 directories, 5 files
|
|
linaro@linaro-nano:~/kernel$
|
|
|
|
|
|
All this information needs to be collected in order to successfully decode
|
|
traces off target:
|
|
|
|
linaro@linaro-nano:~/kernel$ tar czf uname.trace.tgz perf.data ~/.debug
|
|
|
|
|
|
Note that file `vmlinux` should also be added to the bundle if kernel traces
|
|
have also been collected.
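
The bundling step, including the optional `vmlinux`, can be wrapped in a function. This is a sketch under assumptions: `bundle_trace` is a hypothetical helper, it expects `perf.data` in the current directory, and it stores `.debug` with a relative path (using tar's `-C` option) so that the archive unpacks with `.debug` next to `perf.data`.

```shell
# Create a trace bundle from ./perf.data and the build-id cache in $HOME/.debug.
#   $1: output archive name, e.g. uname.trace.tgz
#   $2: optional path to vmlinux, needed when kernel traces were collected
bundle_trace() {
    out=$1
    vmlinux=${2:-}
    [ -f perf.data ] || { echo "bundle_trace: no perf.data here" >&2; return 1; }
    [ -d "$HOME/.debug" ] || { echo "bundle_trace: no $HOME/.debug" >&2; return 1; }
    if [ -n "$vmlinux" ]; then
        [ -f "$vmlinux" ] || { echo "bundle_trace: $vmlinux not found" >&2; return 1; }
        # Store vmlinux and .debug without their leading directories.
        tar czf "$out" perf.data \
            -C "$(dirname "$vmlinux")" "$(basename "$vmlinux")" \
            -C "$HOME" .debug
    else
        tar czf "$out" perf.data -C "$HOME" .debug
    fi
}
```

For a kernel trace session this would be called as `bundle_trace uname.trace.tgz ./vmlinux` from the directory holding `perf.data`.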
|
|
|
|
|
|
Off Target OpenCSD Compilation
------------------------------
|
|
The openCSD library is not part of the perf tools. It is available on
[github][1] and needs to be compiled before the perf tools. Check out the
required branch/tag version into a local directory.
|
|
|
|
linaro@t430:~/linaro/coresight$ git clone https://github.com/Linaro/OpenCSD.git my-opencsd
|
|
Cloning into 'my-opencsd'...
|
|
remote: Counting objects: 2063, done.
|
|
remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063
|
|
Receiving objects: 100% (2063/2063), 2.51 MiB | 1.24 MiB/s, done.
|
|
Resolving deltas: 100% (1399/1399), done.
|
|
Checking connectivity... done.
|
|
linaro@t430:~/linaro/coresight$ ls my-opencsd
|
|
decoder LICENSE README.md HOWTO.md TODO
|
|
|
|
Once the source code has been acquired compilation of the openCSD library can
|
|
take place. For Linux two options are available, LINUX and LINUX64, based on
|
|
the host's (which has nothing to do with the target) architecture:
|
|
|
|
linaro@t430:~/linaro/coresight/$ cd my-opencsd/decoder/build/linux/
|
|
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls
|
|
makefile rctdl_c_api_lib ref_trace_decode_lib
|
|
|
|
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ make LINUX64=1 DEBUG=1
|
|
...
|
|
...
|
|
|
|
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls ../../lib/linux64/dbg/
|
|
libopencsd.a libopencsd_c_api.a libopencsd_c_api.so libopencsd.so
|
|
|
|
From there the header file and libraries need to be installed on the system,
|
|
something that requires root privileges. The default installation path is
|
|
/usr/include/opencsd for the header files and /usr/lib/ for the libraries:
|
|
|
|
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ sudo make install
|
|
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/include/opencsd
|
|
total 60
|
|
drwxr-xr-x 2 root root 4096 Dec 12 10:19 c_api
|
|
drwxr-xr-x 2 root root 4096 Dec 12 10:19 etmv3
|
|
drwxr-xr-x 2 root root 4096 Dec 12 10:19 etmv4
|
|
-rw-r--r-- 1 root root 28049 Dec 12 10:19 ocsd_if_types.h
|
|
drwxr-xr-x 2 root root 4096 Dec 12 10:19 ptm
|
|
drwxr-xr-x 2 root root 4096 Dec 12 10:19 stm
|
|
-rw-r--r-- 1 root root 7264 Dec 12 10:19 trc_gen_elem_types.h
|
|
-rw-r--r-- 1 root root 3972 Dec 12 10:19 trc_pkt_types.h
|
|
|
|
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/lib/libopencsd*
|
|
-rw-r--r-- 1 root root 598720 Dec 12 10:19 /usr/lib/libopencsd_c_api.so
|
|
-rw-r--r-- 1 root root 4692200 Dec 12 10:19 /usr/lib/libopencsd.so
|
|
|
|
A "clean_install" target is also available so that openCSD installed files can
|
|
be removed from a system. Going forward the goal is to have the openCSD library
|
|
packaged as a Debian or RPM archive so that it can be installed from a
|
|
distribution without having to be compiled.
|
|
|
|
|
|
Off Target Perf Tools Compilation
---------------------------------
|
|
|
|
As mentioned above the openCSD library is not part of the perf tools' code base
|
|
and needs to be installed on a system prior to compilation. Information about
|
|
the status of the openCSD library on a system is given at compile time by the
|
|
perf tools build script:
|
|
|
|
linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf
|
|
Auto-detecting system features:
|
|
... dwarf: [ on ]
|
|
... dwarf_getlocations: [ on ]
|
|
... glibc: [ on ]
|
|
... gtk2: [ on ]
|
|
... libaudit: [ on ]
|
|
... libbfd: [ OFF ]
|
|
... libelf: [ on ]
|
|
... libnuma: [ OFF ]
|
|
... numa_num_possible_cpus: [ OFF ]
|
|
... libperl: [ on ]
|
|
... libpython: [ on ]
|
|
... libslang: [ on ]
|
|
... libcrypto: [ on ]
|
|
... libunwind: [ OFF ]
|
|
... libdw-dwarf-unwind: [ on ]
|
|
... zlib: [ on ]
|
|
... lzma: [ OFF ]
|
|
... get_cpuid: [ on ]
|
|
... bpf: [ on ]
|
|
... libopencsd: [ on ] <-------
|
|
|
|
|
|
At the end of the compilation a new perf binary is available in `tools/perf/`:
|
|
|
|
linaro@t430:~/linaro/linux-kernel$ ldd tools/perf/perf
|
|
linux-vdso.so.1 => (0x00007fff135db000)
|
|
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f15f9176000)
|
|
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f15f8f6e000)
|
|
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f15f8c64000)
|
|
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f15f8a60000)
|
|
libopencsd_c_api.so => /usr/lib/libopencsd_c_api.so (0x00007f15f884e000) <-------
|
|
libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f15f8635000)
|
|
libdw.so.1 => /usr/lib/x86_64-linux-gnu/libdw.so.1 (0x00007f15f83ec000)
|
|
libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007f15f81c5000)
|
|
libslang.so.2 => /lib/x86_64-linux-gnu/libslang.so.2 (0x00007f15f7e38000)
|
|
libperl.so.5.22 => /usr/lib/x86_64-linux-gnu/libperl.so.5.22 (0x00007f15f7a5d000)
|
|
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f15f7693000)
|
|
libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f15f7104000)
|
|
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f15f6eea000)
|
|
/lib64/ld-linux-x86-64.so.2 (0x0000559b88038000)
|
|
libopencsd.so => /usr/lib/libopencsd.so (0x00007f15f6c62000) <-------
|
|
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f15f68df000)
|
|
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f15f66c9000)
|
|
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f15f64a6000)
|
|
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f15f6296000)
|
|
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f15f605e000)
|
|
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f15f5e5a000)
|
|
|
|
|
|
Additional debug output from the decoder can be compiled in by setting the
`CSTRACE_RAW` environment variable. Setting this to `packed` gets trace frame
output as follows:
|
|
|
|
Frame Data; Index 576; RAW_PACKED; d6 d6 d6 d6 d6 d6 d6 d6 fc fb d6 d6 d6 d6 e0 7f
|
|
Frame Data; Index 576; ID_DATA[0x14]; d7 d6 d7 d6 d7 d6 d7 d6 fd fb d7 d6 d7 d6 e0
|
|
|
|
Setting it to any other value will remove the RAW_PACKED lines.
|
|
|
|
Working with an alternate version of the openCSD library
--------------------------------------------------------
|
|
When compiling the perf tools it is possible to reference a version of
the openCSD library other than the one installed on the system. This is useful when
working with multiple development trees or when wanting to keep system
libraries intact. Two environment variables are available to tell the perf tools
build script where to get the header files and libraries, namely CSINCLUDES and
CSLIBS:
|
|
|
|
linaro@t430:~/linaro/linux-kernel$ export CSINCLUDES=~/linaro/coresight/my-opencsd/decoder/include/
|
|
linaro@t430:~/linaro/linux-kernel$ export CSLIBS=~/linaro/coresight/my-opencsd/decoder/lib/builddir/
|
|
linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf
|
|
|
|
This will have the effect of compiling and linking against the provided library.
Since the system's openCSD library is in the loader's search path, the
LD_LIBRARY_PATH environment variable needs to be set so that the alternate
library is picked up at run time:
|
|
|
|
linaro@t430:~/linaro/linux-kernel$ export LD_LIBRARY_PATH=$CSLIBS
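
To confirm which openCSD libraries the loader actually resolves, the `ldd` output shown earlier can be filtered. The function below is a sketch; `opencsd_paths` is a hypothetical helper that only parses text in the `name => path (address)` form printed by `ldd`.

```shell
# Read `ldd` output on stdin and print the resolved path of each openCSD library.
opencsd_paths() {
    awk '$1 ~ /^libopencsd/ && $2 == "=>" { print $3 }'
}
```

Running `ldd tools/perf/perf | opencsd_paths` after setting `LD_LIBRARY_PATH` should then list paths under `$CSLIBS` rather than `/usr/lib`.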
|
|
|
|
|
|
Trace Decoding with Perf Report
-------------------------------
|
|
Before working with custom traces it is suggested to use a trace bundle that
|
|
is known to be working properly. A sample bundle has been made available
|
|
here [2]. Trace bundles can be extracted anywhere and have no dependencies on
|
|
where the perf tools and openCSD library have been compiled.
|
|
|
|
linaro@t430:~/linaro/coresight$ mkdir sept20
|
|
linaro@t430:~/linaro/coresight$ cd sept20
|
|
linaro@t430:~/linaro/coresight/sept20$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
|
|
linaro@t430:~/linaro/coresight/sept20$ md5sum uname.v4.user.sept20.tgz
|
|
f53f11d687ce72bdbe9de2e67e960ec6 uname.v4.user.sept20.tgz
|
|
linaro@t430:~/linaro/coresight/sept20$ tar xf uname.v4.user.sept20.tgz
|
|
linaro@t430:~/linaro/coresight/sept20$ ls -la
|
|
total 1312
|
|
drwxrwxr-x 3 linaro linaro 4096 Mar 3 10:26 .
|
|
drwxrwxr-x 5 linaro linaro 4096 Mar 3 10:13 ..
|
|
drwxr-xr-x 7 linaro linaro 4096 Feb 24 12:21 .debug
|
|
-rw------- 1 linaro linaro 78016 Feb 24 12:21 perf.data
|
|
-rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.sept20.tgz
|
|
|
|
Perf is expecting files related to the trace capture (`perf.data`) to be located in the `buildid` directory.
|
|
By default this is under `~/.debug`. Alternatively the default `buildid` directory can be changed
|
|
using the command:
|
|
|
|
perf config --system buildid.dir=/my/own/buildid/dir
|
|
|
|
This example will remove the current `~/.debug` directory to be sure everything is clean.
|
|
|
|
linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug
|
|
linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/
|
|
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio
|
|
|
|
# To display the perf.data header info, please use --header/--header-only options.
|
|
#
|
|
#
|
|
# Total Lost Samples: 0
|
|
#
|
|
# Samples: 0 of event 'cs_etm//u'
|
|
# Event count (approx.): 0
|
|
#
|
|
# Children Self Command Shared Object Symbol
|
|
# ........ ........ ....... ............. ......
|
|
#
|
|
|
|
|
|
# Samples: 0 of event 'dummy:u'
|
|
# Event count (approx.): 0
|
|
#
|
|
# Children Self Command Shared Object Symbol
|
|
# ........ ........ ....... ............. ......
|
|
#
|
|
|
|
|
|
# Samples: 115K of event 'instructions:u'
|
|
# Event count (approx.): 522009
|
|
#
|
|
# Children Self Command Shared Object Symbol
|
|
# ........ ........ ....... ................ ......................
|
|
#
|
|
4.13% 4.13% uname libc-2.21.so [.] 0x0000000000078758
|
|
3.81% 3.81% uname libc-2.21.so [.] 0x0000000000078e50
|
|
2.06% 2.06% uname libc-2.21.so [.] 0x00000000000fcaf4
|
|
1.65% 1.65% uname libc-2.21.so [.] 0x00000000000fcae4
|
|
1.59% 1.59% uname ld-2.21.so [.] 0x000000000000a7f4
|
|
1.50% 1.50% uname libc-2.21.so [.] 0x0000000000078e40
|
|
1.43% 1.43% uname libc-2.21.so [.] 0x00000000000fcac4
|
|
1.31% 1.31% uname libc-2.21.so [.] 0x000000000002f0c0
|
|
1.26% 1.26% uname ld-2.21.so [.] 0x0000000000016888
|
|
1.24% 1.24% uname libc-2.21.so [.] 0x0000000000078e7c
|
|
1.24% 1.24% uname libc-2.21.so [.] 0x00000000000fcab8
|
|
...
|
|
|
|
A dump of the received trace packets can be obtained with the command:
|
|
|
|
mjl@ubuntu-vbox:./perf-opencsd-master/coresight/tools/perf/perf report --stdio --dump
|
|
|
|
This results in a large amount of data, with trace looking like:
|
|
|
|
0x618 [0x30]: PERF_RECORD_AUXTRACE size: 0x11ef0 offset: 0 ref: 0x4d881c1f13216016 idx: 0 tid: 15244 cpu: -1
|
|
|
|
. ... CoreSight ETM Trace data: size 73456 bytes
|
|
|
|
0: I_ASYNC : Alignment Synchronisation.
|
|
12: I_TRACE_INFO : Trace Info.
|
|
17: I_TRACE_ON : Trace On.
|
|
18: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F24D80; Ctxt: AArch64,EL0, NS;
|
|
28: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
|
|
29: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
|
|
30: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
|
|
32: I_ATOM_F6 : Atom format 6.; EEEEN
|
|
33: I_ATOM_F1 : Atom format 1.; E
|
|
34: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows;
|
|
36: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C;
|
|
45: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS;
|
|
56: I_TRACE_ON : Trace On.
|
|
57: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; Ctxt: AArch64,EL0, NS;
|
|
68: I_ATOM_F3 : Atom format 3.; NEE
|
|
69: I_ATOM_F3 : Atom format 3.; NEN
|
|
70: I_ATOM_F3 : Atom format 3.; NNE
|
|
71: I_ATOM_F5 : Atom format 5.; ENENE
|
|
72: I_ATOM_F5 : Atom format 5.; NENEN
|
|
73: I_ATOM_F5 : Atom format 5.; ENENE
|
|
74: I_ATOM_F5 : Atom format 5.; NENEN
|
|
75: I_ATOM_F5 : Atom format 5.; ENENE
|
|
76: I_ATOM_F3 : Atom format 3.; NNE
|
|
77: I_ATOM_F3 : Atom format 3.; NNE
|
|
78: I_ATOM_F3 : Atom format 3.; NNE
|
|
80: I_ATOM_F3 : Atom format 3.; NNE
|
|
81: I_ATOM_F3 : Atom format 3.; ENN
|
|
82: I_EXCEPT : Exception.; Data Fault; Ret Addr Follows;
|
|
84: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0;
|
|
93: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS;
|
|
104: I_TRACE_ON : Trace On.
|
|
105: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; Ctxt: AArch64,EL0, NS;
|
|
116: I_ATOM_F5 : Atom format 5.; NNNNN
|
|
117: I_ATOM_F5 : Atom format 5.; NNNNN
|
|
|
|
|
|
Trace Decoding with Perf Script
-------------------------------
|
|
Working with perf scripts needs more command line options but yields
|
|
interesting results.
|
|
|
|
linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
|
|
linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
|
|
linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
|
|
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump
|
|
|
|
7f89f24d80: 910003e0 mov x0, sp
|
|
7f89f24d84: 94000d53 bl 7f89f282d0 <free@plt+0x3790>
|
|
7f89f282d0: d11203ff sub sp, sp, #0x480
|
|
7f89f282d4: a9ba7bfd stp x29, x30, [sp,#-96]!
|
|
7f89f282d8: 910003fd mov x29, sp
|
|
7f89f282dc: a90363f7 stp x23, x24, [sp,#48]
|
|
7f89f282e0: 9101e3b7 add x23, x29, #0x78
|
|
7f89f282e4: a90573fb stp x27, x28, [sp,#80]
|
|
7f89f282e8: a90153f3 stp x19, x20, [sp,#16]
|
|
7f89f282ec: aa0003fb mov x27, x0
|
|
7f89f282f0: 910a82e1 add x1, x23, #0x2a0
|
|
7f89f282f4: a9025bf5 stp x21, x22, [sp,#32]
|
|
7f89f282f8: a9046bf9 stp x25, x26, [sp,#64]
|
|
7f89f282fc: 910102e0 add x0, x23, #0x40
|
|
7f89f28300: f800841f str xzr, [x0],#8
|
|
7f89f28304: eb01001f cmp x0, x1
|
|
7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>
|
|
7f89f28300: f800841f str xzr, [x0],#8
|
|
7f89f28304: eb01001f cmp x0, x1
|
|
7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>
|
|
7f89f28300: f800841f str xzr, [x0],#8
|
|
7f89f28304: eb01001f cmp x0, x1
|
|
7f89f28308: 54ffffc1 b.ne 7f89f28300 <free@plt+0x37c0>
|
|
|
|
Kernel Trace Decoding
---------------------
|
|
|
|
When dealing with kernel space traces, the vmlinux file has to be communicated
explicitly to perf using the "--vmlinux" command line option:
|
|
|
|
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio --vmlinux=./vmlinux
|
|
...
|
|
...
|
|
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf script --vmlinux=./vmlinux
|
|
|
|
When using scripts things get a little more convoluted. Using the same example
as above but for kernel traces, the command line becomes:
|
|
|
|
linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
|
|
linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
|
|
linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
|
|
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script \
|
|
--vmlinux=./vmlinux \
|
|
--script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- \
|
|
-d ${XTOOL_PATH}/aarch64-linux-gnu-objdump \
|
|
-k ./vmlinux
|
|
...
|
|
...
|
|
|
|
The option "--vmlinux=./vmlinux" is interpreted by the "perf script" command
the same way it is for "perf report". The option "-k ./vmlinux" is dependent
on the script being executed and is not related to "--vmlinux", though it
is highly advised to keep them synchronized.
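
One way to keep the two options synchronized is to generate both from a single variable. The sketch below only prints the arguments; `kernel_disasm_args` is a hypothetical helper and the argument order simply mirrors the command line shown above.

```shell
# Print, one per line, the `perf script` arguments for a kernel disassembly run
# that hands the same vmlinux to perf (--vmlinux) and to the script (-k).
#   $1: path to vmlinux
#   $2: path to the disassembly script, e.g. ${SCRIPT_PATH}/cs-trace-disasm.py
#   $3: path to the cross objdump binary
kernel_disasm_args() {
    printf '%s\n' \
        script \
        "--vmlinux=$1" \
        "--script=python:$2" \
        -- \
        -d "$3" \
        -k "$1"
}
```

The printed arguments can then be appended to `perf --exec-path=${EXEC_PATH}`, so that changing the vmlinux path in one place updates both options.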
|
|
|
|
|
|
Perf Test Environment Scripts
-----------------------------
|
|
|
|
The decoder library comes with a number of `bash` scripts that ease the setting up of the
|
|
offline build and test environment for perf, and executing tests.
|
|
|
|
These scripts can be found in
|
|
|
|
decoder/tests/perf-test-scripts
|
|
|
|
There are three scripts provided:
|
|
|
|
- `perf-setup-env.bash` : this sets up all the environment variables mentioned above.
|
|
- `perf-test-report.bash` : this runs `perf report` - using the environment setup by `perf-setup-env.bash`
|
|
- `perf-test-script.bash` : this runs `perf script` - using the environment setup by `perf-setup-env.bash`
|
|
|
|
Use as follows:
|
|
|
|
1. Prior to building perf, edit `perf-setup-env.bash` to conform to your environment. There are four lines at the top of the file that will require editing.
|
|
|
|
2. Execute the script using the command:
|
|
|
|
source perf-setup-env.bash
|
|
|
|
This will set up a perf execute environment for using the perf report and script commands.
|
|
|
|
Alternatively use the command:
|
|
|
|
source perf-setup-env.bash buildenv
|
|
|
|
This will add in the build environment variables mentioned in the sections on building above alongside the
environment used by the `perf-test...` scripts to run the tests.
|
|
|
|
3. Build perf as described above.
|
|
4. Follow the instructions for downloading the test capture, or create a capture from your target.
|
|
5. Copy the `perf-test...` scripts into the capture data directory, i.e. the one that contains `perf.data`.
|
|
|
|
6. The scripts can now be run. No options are required for the default operation, but any command line options will be added to the perf report / perf script command line.
|
|
|
|
e.g.
|
|
|
|
./perf-test-report.bash --dump
|
|
|
|
will add the --dump option to the end of the command line and run
|
|
|
|
${PERF_EXEC_PATH}/perf report --stdio --dump
|
|
|
|
|
|
Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------
|
|
|
|
See autofdo.md (@ref AutoFDO) for details and scripts.
|
|
|
|
|
|
The Linaro CoreSight Team
-------------------------
|
|
- Mike Leach
|
|
- Mathieu Poirier
|
|
|
|
|
|
One Last Thing
--------------
|
|
We welcome help on this project. If you would like to add features or help
|
|
improve the way things work, we want to hear from you.
|
|
|
|
Best regards,
|
|
*The Linaro CoreSight Team*
|
|
|
|
--------------------------------------
|
|
[1]: https://github.com/Linaro/OpenCSD
|
|
|
|
[2]: http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
|