## Inferno

![logo](./inferno_small.png)

### Description

Inferno is a flamegraph generator for native (C/C++) Android apps. It was
originally written to profile and improve the performance of surfaceflinger
(the Android compositor), but it can be used for any native Android
application. You can see a sample report generated with Inferno
[here](./report.html). Reports are self-contained HTML files, so they can be
exchanged easily.

Notice that there is no concept of time in a flamegraph, since all callstacks
are merged together. As a result, the width of a flamegraph represents 100% of
the samples, and the height reflects the number of functions on the stack when
sampling occurred.

![flamegraph sample](./main_thread_flamegraph.png)

The flamegraph above shows the main thread of SurfaceFlinger. It is
immediately apparent that most of the CPU time is spent processing messages in
`android::SurfaceFlinger::onMessageReceived`. The most expensive task is
asking the screen to be refreshed, as `android::DisplayDevice::prepare` (shown
in orange) indicates. This graphical breakdown helps to see which parts of the
program are costly and where a developer's effort to improve performance
should go.

### Example of bottleneck

A flamegraph gives you an instant view of where the CPU cycles are spent, but
it can also be used to find specific offenders. To find them, look for
plateaus. It is easier to see with an example:

![flamegraph sample](./bottleneck.png)

In the flamegraph above, two plateaus (due to
`android::BufferQueueCore::validateConsistencyLocked`) are immediately
apparent.

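One way to reason about plateaus: a plateau is a wide frame with little or nothing stacked above it, which means the function itself was executing when many of the samples were taken. The sketch below is a minimal illustration and not Inferno's code. It assumes the samples are already available as lists of function names ordered from root to leaf; the helper name `self_sample_share` and the sample data are hypothetical.

```python
from collections import Counter


def self_sample_share(stacks):
    """Return {function: fraction of samples in which it was on top of the stack}.

    Each stack is a list of function names ordered from root to leaf
    (an assumed input format for this sketch, not Inferno's data model).
    """
    leaves = Counter(stack[-1] for stack in stacks if stack)
    total = sum(leaves.values())
    return {name: count / total for name, count in leaves.items()}


# Hypothetical samples: four callstacks captured on one thread.
samples = [
    ["_start", "main", "onMessageReceived", "validateConsistencyLocked"],
    ["_start", "main", "onMessageReceived", "validateConsistencyLocked"],
    ["_start", "main", "onMessageReceived", "validateConsistencyLocked"],
    ["_start", "main", "onMessageReceived", "prepare"],
]

shares = self_sample_share(samples)
# Functions with a large share are plateau candidates.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.0%}")
```

A function that tops the stack in most samples would show up as a plateau in the rendered graph, so sorting by this share is a quick way to shortlist offenders.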
### How it works

Inferno relies on simpleperf to record the callstack of a native application
thousands of times per second. Simpleperf takes care of unwinding the stack
using either frame pointers (recommended) or dwarf. At the end of the
recording, `simpleperf` also symbolizes all instruction pointers (IPs)
automatically. The records are aggregated and dumped to a file, `perf.data`.
This file is pulled from the Android device and processed on the host by
Inferno. The callstacks are merged together to visualize in which part of an
app the CPU cycles are spent.

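The merging step can be pictured as building a tree in which stacks that share a prefix share nodes. The following is a rough sketch under stated assumptions, not Inferno's actual implementation: it assumes each sample is a list of function names ordered from root to leaf, and `merge_callstacks`, `print_tree`, and the sample data are hypothetical.

```python
def merge_callstacks(stacks):
    """Merge root-to-leaf callstacks into a tree of nested dicts.

    Each node is {"samples": int, "children": {name: node}}.
    """
    root = {"samples": 0, "children": {}}
    for stack in stacks:
        root["samples"] += 1
        node = root
        for name in stack:
            # Reuse the child if this prefix was seen before, else create it.
            node = node["children"].setdefault(
                name, {"samples": 0, "children": {}}
            )
            node["samples"] += 1
    return root


def print_tree(node, total, name="all", depth=0):
    """Print each frame with its width as a percentage of all samples."""
    print(f"{'  ' * depth}{name} {node['samples'] / total:.0%}")
    for child_name, child in node["children"].items():
        print_tree(child, total, child_name, depth + 1)


# Hypothetical samples, ordered root -> leaf.
samples = [
    ["_start", "main", "draw"],
    ["_start", "main", "draw"],
    ["_start", "main", "input"],
    ["_start", "main"],
]
tree = merge_callstacks(samples)
print_tree(tree, tree["samples"])
```

The sample count of a node relative to the total corresponds to the width of its box, and the nesting depth corresponds to its height in the graph. Because only counts are kept, the order in which samples arrived is lost, which is why the graph has no time axis.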
### How to use it

Open a terminal and, from the `simpleperf/scripts` directory, type:

```
./inferno.sh (on Linux/Mac)
inferno.bat (on Windows)
```

Inferno will collect data, process it, and automatically open your web browser
to display the HTML report.

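If you drive Inferno from a script rather than by hand, the choice of launcher can be made from the host platform. This is only a sketch: `inferno_command` is a hypothetical helper, and the only launcher names and arguments it uses are the ones mentioned in this README.

```python
import platform


def inferno_command(extra_args=(), system=None):
    """Build the command line that launches Inferno on the current host.

    `system` defaults to platform.system(); it is a parameter only so the
    choice can be exercised without changing machines.
    """
    system = system or platform.system()
    launcher = "inferno.bat" if system == "Windows" else "./inferno.sh"
    return [launcher, *extra_args]


print(inferno_command())        # launcher for this host
print(inferno_command(["-h"]))  # same launcher, asking for the parameter list
```

The returned list is meant to be run from the `simpleperf/scripts` directory, just like the commands above.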
### Parameters

You can select how long to sample for, the color of the nodes, and many other
things. Use `-h` to get a list of all supported parameters.

```
./inferno.sh -h
```

### Troubleshooting

#### Messy flame graph

A healthy flame graph features a single call site at its base (see [here](./report.html)).
If you don't see a unique call site like `_start` or `_start_thread` at the base
from which all flames originate, something went wrong: stack unwinding may
have failed to reach the root call site. These incomplete callstacks are
impossible to merge properly. By default, Inferno asks `simpleperf` to unwind
the stack via the kernel and frame pointers. Try unwinding with dwarf instead
(`-du`); you can further tune this setting.

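To make "a single call site at its base" concrete, the sketch below measures how many stacks begin at the most common root. It is an illustration only: it assumes samples are lists of function names ordered from root to leaf, and `root_report` and the data are hypothetical, not part of Inferno.

```python
from collections import Counter


def root_report(stacks):
    """Return (most common root, fraction of stacks that start with it)."""
    roots = Counter(stack[0] for stack in stacks if stack)
    if not roots:
        return None, 0.0
    name, count = roots.most_common(1)[0]
    return name, count / sum(roots.values())


# Hypothetical samples: the last two stacks were not unwound to the root.
samples = [
    ["_start", "main", "draw"],
    ["_start", "main", "input"],
    ["draw"],
    ["main", "input"],
]
root, coverage = root_report(samples)
print(f"{coverage:.0%} of stacks start at {root}")
```

A coverage well below 100% is the numeric counterpart of a messy graph: the truncated stacks each start their own flame instead of joining the common base.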
#### No flames

If you see no flames at all, or a mess of one-level flames without a common
base, this may be because you compiled without frame pointers. Make sure there
is no `-fomit-frame-pointer` in your build config. Alternatively, ask
simpleperf to collect data with dwarf unwinding (`-du`).

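Checking a build config for the flag can be automated. The sketch below is a hypothetical helper, not something shipped with Inferno. It assumes GCC/Clang-style handling, where `-fno-omit-frame-pointer` is the opposite flag and the last occurrence of either one wins, and it assumes the flags are available as a single string.

```python
import shlex


def frame_pointers_omitted(flags):
    """Return True if the flag string ends up omitting frame pointers.

    Assumes GCC/Clang-style behaviour where the last of
    -fomit-frame-pointer / -fno-omit-frame-pointer wins. Returns False when
    neither flag is present, which says nothing about the compiler default.
    """
    omitted = False
    for flag in shlex.split(flags):
        if flag == "-fomit-frame-pointer":
            omitted = True
        elif flag == "-fno-omit-frame-pointer":
            omitted = False
    return omitted


# Hypothetical flag strings copied from a build config.
print(frame_pointers_omitted("-O2 -fomit-frame-pointer -Wall"))
print(frame_pointers_omitted("-O2 -fomit-frame-pointer -fno-omit-frame-pointer"))
```

A `False` result only means the flag was not requested explicitly. Some toolchains omit frame pointers by default at higher optimization levels, so the compiler's own defaults still need to be checked.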
#### High percentage of lost samples

If simpleperf reports a lot of lost samples, it is probably because you are
unwinding with `dwarf`. Dwarf unwinding involves copying the stack before it is
processed. Try frame pointer unwinding instead, which can be done by the kernel
and is much faster.

The cost of frame pointers is negligible on arm64 but considerable
on 32-bit arm (due to register pressure). Use a 64-bit build for better
profiling.

#### run-as: package not debuggable

If you cannot run as root, make sure the app is debuggable; otherwise simpleperf
will not be able to profile it.