You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
438 lines
17 KiB
438 lines
17 KiB
# Debugging memory usage on Android
|
|
|
|
## Prerequisites
|
|
|
|
* A host running macOS or Linux.
|
|
* [ADB](https://developer.android.com/studio/command-line/adb) installed and
|
|
in PATH.
|
|
* A device running Android 11+.
|
|
|
|
If you are profiling your own app and are not running a userdebug build of
|
|
Android, your app needs to be marked as profileable or
|
|
debuggable in its manifest. See the [heapprofd documentation](
|
|
/docs/data-sources/native-heap-profiler.md#heapprofd-targets) for more
|
|
details on which applications can be targeted.
|
|
|
|
## dumpsys meminfo
|
|
|
|
A good place to get started investigating memory usage of a process is
|
|
`dumpsys meminfo` which gives a high-level overview of how much of the various
|
|
types of memory are being used by a process.
|
|
|
|
```bash
|
|
$ adb shell dumpsys meminfo com.android.systemui
|
|
|
|
Applications Memory Usage (in Kilobytes):
|
|
Uptime: 2030149 Realtime: 2030149
|
|
|
|
** MEMINFO in pid 1974 [com.android.systemui] **
|
|
Pss Private Private SwapPss Rss Heap Heap Heap
|
|
Total Dirty Clean Dirty Total Size Alloc Free
|
|
------ ------ ------ ------ ------ ------ ------ ------
|
|
Native Heap 16840 16804 0 6764 19428 34024 25037 5553
|
|
Dalvik Heap 9110 9032 0 136 13164 36444 9111 27333
|
|
|
|
[more stuff...]
|
|
```
|
|
|
|
Looking at the "Private Dirty" column of Dalvik Heap (= Java Heap) and
|
|
Native Heap, we can see that SystemUI's memory usage on the Java heap
|
|
is 9M, on the native heap it's 17M.
|
|
|
|
## Linux memory management
|
|
|
|
But what does *clean*, *dirty*, *Rss*, *Pss*, *Swap* actually mean? To answer
|
|
this question, we need to delve into Linux memory management a bit.
|
|
|
|
From the kernel's point of view, memory is split into equally sized blocks
|
|
called *pages*. These are generally 4KiB.
|
|
|
|
Pages are organized in virtually contiguous ranges called VMA
|
|
(Virtual Memory Area).
|
|
|
|
VMAs are created when a process requests a new pool of memory pages through
|
|
the [mmap() system call](https://man7.org/linux/man-pages/man2/mmap.2.html).
|
|
Applications rarely call mmap() directly. Those calls are typically mediated by
|
|
the allocator, `malloc()/operator new()` for native processes or by the
|
|
Android RunTime for Java apps.
|
|
|
|
VMAs can be of two types: file-backed and anonymous.
|
|
|
|
**File-backed VMAs** are a view of a file in memory. They are obtained passing a
|
|
file descriptor to `mmap()`. The kernel will serve page faults on the VMA
|
|
through the passed file, so reading a pointer to the VMA becomes the equivalent
|
|
of a `read()` on the file.
|
|
File-backed VMAs are used, for instance, by the dynamic linker (`ld`) when
|
|
executing new processes or dynamically loading libraries, or by the Android
|
|
framework, when loading a new .dex library or accessing resources in the APK.
|
|
|
|
**Anonymous VMAs** are memory-only areas not backed by any file. This is the way
|
|
allocators request dynamic memory from the kernel. Anonymous VMAs are obtained
|
|
calling `mmap(... MAP_ANONYMOUS ...)`.
|
|
|
|
Physical memory is only allocated, in page granularity, once the application
|
|
tries to read/write from a VMA. If you allocate 32 MiB worth of pages but only
|
|
touch one byte, your process' memory usage will only go up by 4KiB. You will
|
|
have increased your process' *virtual memory* by 32 MiB, but its resident
|
|
*physical memory* by 4 KiB.
|
|
|
|
When optimizing memory use of programs, we are interested in reducing their
|
|
footprint in *physical memory*. High *virtual memory* use is generally not a
|
|
cause for concern on modern platforms (except if you run out of address space,
|
|
which is very hard on 64 bit systems).
|
|
|
|
We call the amount a process' memory that is resident in *physical memory* its
|
|
**RSS** (Resident Set Size). Not all resident memory is equal though.
|
|
|
|
From a memory-consumption viewpoint, individual pages within a VMA can have the
|
|
following states:
|
|
|
|
* **Resident**: the page is mapped to a physical memory page. Resident pages can
|
|
be in two states:
|
|
* **Clean** (only for file-backed pages): the contents of the page are the
|
|
same of the contents on-disk. The kernel can evict clean pages more easily
|
|
in case of memory pressure. This is because if they should be needed
|
|
again, the kernel knows it can re-create its contents by reading them from
|
|
the underlying file.
|
|
* **Dirty**: the contents of the page diverge from the disk, or (in most
|
|
cases), the page has no disk backing (i.e. it's _anonymous_). Dirty pages
|
|
cannot be evicted because doing so would cause data loss. However they can
|
|
be swapped out on disk or ZRAM, if present.
|
|
* **Swapped**: a dirty page can be written to the swap file on disk (on most Linux
|
|
desktop distributions) or compressed (on Android and CrOS through
|
|
[ZRAM](https://source.android.com/devices/tech/perf/low-ram#zram)). The page
|
|
will stay swapped until a new page fault on its virtual address happens, at
|
|
which point the kernel will bring it back in main memory.
|
|
* **Not present**: no page fault ever happened on the page or the page was
|
|
clean and later was evicted.
|
|
|
|
It is generally more important to reduce the amount of _dirty_ memory as that
|
|
cannot be reclaimed like _clean_ memory and, on Android, even if swapped in
|
|
ZRAM, will still eat part of the system memory budget.
|
|
This is why we looked at *Private Dirty* in the `dumpsys meminfo` example.
|
|
|
|
*Shared* memory can be mapped into more than one process. This means VMAs in
|
|
different processes refer to the same physical memory. This typically happens
|
|
with file-backed memory of commonly used libraries (e.g., libc.so,
|
|
framework.dex) or, more rarely, when a process `fork()`s and a child process
|
|
inherits dirty memory from its parent.
|
|
|
|
This introduces the concept of **PSS** (Proportional Set Size). In **PSS**,
|
|
memory that is resident in multiple processes is proportionally attributed to
|
|
each of them. If we map one 4KiB page into four processes, each of their
|
|
**PSS** will increase by 1KiB.
|
|
|
|
#### Recap
|
|
|
|
* Dynamically allocated memory, whether allocated through C's `malloc()`, C++'s
|
|
`operator new()` or Java's `new X()` starts always as _anonymous_ and _dirty_,
|
|
unless it is never used.
|
|
* If this memory is not read/written for a while, or in case of memory pressure,
|
|
it gets swapped out on ZRAM and becomes _swapped_.
|
|
* Anonymous memory, whether _resident_ (and hence _dirty_) or _swapped_ is
|
|
always a resource hog and should be avoided if unnecessary.
|
|
* File-mapped memory comes from code (java or native), libraries and resource
|
|
and is almost always _clean_. Clean memory also erodes the system memory
|
|
budget but typically application developers have less control on it.
|
|
|
|
## Memory over time
|
|
|
|
`dumpsys meminfo` is good to get a snapshot of the current memory usage, but
|
|
even very short memory spikes can lead to low-memory situations, which will
|
|
lead to [LMKs](#lmk). We have two tools to investigate situations like this
|
|
|
|
* RSS High Watermark.
|
|
* Memory tracepoints.
|
|
|
|
### RSS High Watermark
|
|
|
|
We can get a lot of information from the `/proc/[pid]/status` file, including
|
|
memory information. `VmHWM` shows the maximum RSS usage the process has seen
|
|
since it was started. This value is kept updated by the kernel.
|
|
|
|
```bash
|
|
$ adb shell cat '/proc/$(pidof com.android.systemui)/status'
|
|
[...]
|
|
VmHWM: 256972 kB
|
|
VmRSS: 195272 kB
|
|
RssAnon: 30184 kB
|
|
RssFile: 164420 kB
|
|
RssShmem: 668 kB
|
|
VmSwap: 43960 kB
|
|
[...]
|
|
```
|
|
|
|
### Memory tracepoints
|
|
|
|
NOTE: For detailed instructions about the memory trace points see the
|
|
[Data sources > Memory > Counters and events](
|
|
/docs/data-sources/memory-counters.md) page.
|
|
|
|
We can use Perfetto to get information about memory management events from the
|
|
kernel.
|
|
|
|
```bash
|
|
$ adb shell perfetto \
|
|
-c - --txt \
|
|
-o /data/misc/perfetto-traces/trace \
|
|
<<EOF
|
|
|
|
buffers: {
|
|
size_kb: 8960
|
|
fill_policy: DISCARD
|
|
}
|
|
buffers: {
|
|
size_kb: 1280
|
|
fill_policy: DISCARD
|
|
}
|
|
data_sources: {
|
|
config {
|
|
name: "linux.process_stats"
|
|
target_buffer: 1
|
|
process_stats_config {
|
|
scan_all_processes_on_start: true
|
|
}
|
|
}
|
|
}
|
|
data_sources: {
|
|
config {
|
|
name: "linux.ftrace"
|
|
ftrace_config {
|
|
ftrace_events: "mm_event/mm_event_record"
|
|
ftrace_events: "kmem/rss_stat"
|
|
ftrace_events: "kmem/ion_heap_grow"
|
|
ftrace_events: "kmem/ion_heap_shrink"
|
|
}
|
|
}
|
|
}
|
|
duration_ms: 30000
|
|
|
|
EOF
|
|
```
|
|
|
|
While it is running, take a photo if you are following along.
|
|
|
|
Pull the file using `adb pull /data/misc/perfetto-traces/trace ~/mem-trace`
|
|
and upload to the [Perfetto UI](https://ui.perfetto.dev). This will show
|
|
overall stats about system [ION](#ion) usage, and per-process stats to
|
|
expand. Scroll down (or Ctrl-F for) to `com.google.android.GoogleCamera` and
|
|
expand. This will show a timeline for various memory stats for camera.
|
|
|
|
![Camera Memory Trace](/docs/images/trace-rss-camera.png)
|
|
|
|
We can see that around 2/3 into the trace, the memory spiked (in the
|
|
mem.rss.anon track). This is where I took a photo. This is a good way to see
|
|
how the memory usage of an application reacts to different triggers.
|
|
|
|
## Which tool to use
|
|
|
|
If you want to drill down into _anonymous_ memory allocated by Java code,
|
|
labeled by `dumpsys meminfo` as `Dalvik Heap`, see the
|
|
[Analyzing the java heap](#java-hprof) section.
|
|
|
|
If you want to drill down into _anonymous_ memory allocated by native code,
|
|
labeled by `dumpsys meminfo` as `Native Heap`, see the
|
|
[Analyzing the Native Heap](#heapprofd) section. Note that it's frequent to end
|
|
up with native memory even if your app doesn't have any C/C++ code. This is
|
|
because the implementation of some framework API (e.g. Regex) is internally
|
|
implemented through native code.
|
|
|
|
If you want to drill down into file-mapped memory the best option is to use
|
|
`adb shell showmap PID` (on Android) or inspect `/proc/PID/smaps`.
|
|
|
|
|
|
## {#lmk} Low-memory kills
|
|
|
|
When an Android device becomes low on memory, a daemon called `lmkd` will
|
|
start killing processes in order to free up memory. Devices' strategies differ,
|
|
but in general processes will be killed in order of descending `oom_score_adj`
|
|
score (i.e. background apps and processes first, foreground processes last).
|
|
|
|
Apps on Android are not killed when switching away from them. They instead
|
|
remain *cached* even after the user finishes using them. This is to make
|
|
subsequent starts of the app faster. Such apps will generally be killed
|
|
first (because they have a higher `oom_score_adj`).
|
|
|
|
We can collect information about LMKs and `oom_score_adj` using Perfetto.
|
|
|
|
```protobuf
|
|
$ adb shell perfetto \
|
|
-c - --txt \
|
|
-o /data/misc/perfetto-traces/trace \
|
|
<<EOF
|
|
|
|
buffers: {
|
|
size_kb: 8960
|
|
fill_policy: DISCARD
|
|
}
|
|
buffers: {
|
|
size_kb: 1280
|
|
fill_policy: DISCARD
|
|
}
|
|
data_sources: {
|
|
config {
|
|
name: "linux.process_stats"
|
|
target_buffer: 1
|
|
process_stats_config {
|
|
scan_all_processes_on_start: true
|
|
}
|
|
}
|
|
}
|
|
data_sources: {
|
|
config {
|
|
name: "linux.ftrace"
|
|
ftrace_config {
|
|
ftrace_events: "lowmemorykiller/lowmemory_kill"
|
|
ftrace_events: "oom/oom_score_adj_update"
|
|
ftrace_events: "ftrace/print"
|
|
atrace_apps: "lmkd"
|
|
}
|
|
}
|
|
}
|
|
duration_ms: 60000
|
|
|
|
EOF
|
|
```
|
|
|
|
Pull the file using `adb pull /data/misc/perfetto-traces/trace ~/oom-trace`
|
|
and upload to the [Perfetto UI](https://ui.perfetto.dev).
|
|
|
|
![OOM Score](/docs/images/oom-score.png)
|
|
|
|
We can see that the OOM score of Camera gets reduced (making it less likely
|
|
to be killed) when it is opened, and gets increased again once it is closed.
|
|
|
|
## {#heapprofd} Analyzing the Native Heap
|
|
|
|
**Native Heap Profiles require Android 10.**
|
|
|
|
NOTE: For detailed instructions about the native heap profiler and
|
|
troubleshooting see the [Data sources > Native heap profiler](
|
|
/docs/data-sources/native-heap-profiler.md) page.
|
|
|
|
Applications usually get memory through `malloc` or C++'s `new` rather than
|
|
directly getting it from the kernel. The allocator makes sure that your memory
|
|
is more efficiently handled (i.e. there are not many gaps) and that the
|
|
overhead from asking the kernel remains low.
|
|
|
|
We can log the native allocations and frees that a process does using
|
|
*heapprofd*. The resulting profile can be used to attribute memory usage
|
|
to particular function callstacks, supporting a mix of both native and Java
|
|
code. The profile *will only show allocations done while it was running*, any
|
|
allocations done before will not be shown.
|
|
|
|
### {#capture-profile-native} Capturing the profile
|
|
|
|
Use the `tools/heap_profile` script to profile a process. If you are having
|
|
trouble make sure you are using the [latest version](
|
|
https://raw.githubusercontent.com/google/perfetto/master/tools/heap_profile).
|
|
See all the arguments using `tools/heap_profile -h`, or use the defaults
|
|
and just profile a process (e.g. `system_server`):
|
|
|
|
```bash
|
|
$ tools/heap_profile -n system_server
|
|
|
|
Profiling active. Press Ctrl+C to terminate.
|
|
You may disconnect your device.
|
|
|
|
Wrote profiles to /tmp/profile-1283e247-2170-4f92-8181-683763e17445 (symlink /tmp/heap_profile-latest)
|
|
These can be viewed using pprof. Googlers: head to pprof/ and upload them.
|
|
```
|
|
|
|
When you see *Profiling active*, play around with the phone a bit. When you
|
|
are done, press Ctrl-C to end the profile. For this tutorial, I opened a
|
|
couple of apps.
|
|
|
|
### Viewing the data
|
|
|
|
Then upload the `raw-trace` file from the output directory to the
|
|
[Perfetto UI](https://ui.perfetto.dev) and click on diamond marker that
|
|
shows.
|
|
|
|
![Profile Diamond](/docs/images/profile-diamond.png)
|
|
|
|
The tabs that are available are
|
|
|
|
* **space**: how many bytes were allocated but not freed at this callstack the
|
|
moment the dump was created.
|
|
* **alloc\_space**: how many bytes were allocated (including ones freed at the
|
|
moment of the dump) at this callstack
|
|
* **objects**: how many allocations without matching frees were sampled at this
|
|
callstack.
|
|
* **alloc\_objects**: how many allocations (including ones with matching frees)
|
|
were sampled at this callstack.
|
|
|
|
The default view will show you all allocations that were done while the
|
|
profile was running but that weren't freed (the **space** tab).
|
|
|
|
![Native Flamegraph](/docs/images/syssrv-apk-assets-two.png)
|
|
|
|
We can see that a lot of memory gets allocated in paths through
|
|
`ResourceManager.loadApkAssets`. To get the total memory that was allocated
|
|
this way, we can enter "loadApkAssets" into the Focus textbox. This will only
|
|
show callstacks where some frame matches "loadApkAssets".
|
|
|
|
![Native Flamegraph with Focus](/docs/images/syssrv-apk-assets-focus.png)
|
|
|
|
From this we have a clear idea where in the code we have to look. From the
|
|
code we can see how that memory is being used and if we actually need all of
|
|
it. In this case the key is the `_CompressedAsset` that requires decompressing
|
|
into RAM rather than being able to (_cleanly_) memory-map. By not compressing
|
|
these data, we can save RAM.
|
|
|
|
## {#java-hprof} Analyzing the Java Heap
|
|
|
|
**Java Heap Dumps require Android 11.**
|
|
|
|
NOTE: For detailed instructions about the Java heap profiler and
|
|
troubleshooting see the [Data sources > Java heap profiler](
|
|
/docs/data-sources/java-heap-profiler.md) page.
|
|
|
|
### {#capture-profile-java} Capturing the profile
|
|
We can get a snapshot of the graph of all the Java objects that constitute the
|
|
Java heap. We use the `tools/java_heap_dump` script. If you are having trouble
|
|
make sure you are using the [latest version](
|
|
https://raw.githubusercontent.com/google/perfetto/master/tools/java_heap_dump).
|
|
|
|
```bash
|
|
$ tools/java_heap_dump -n com.android.systemui
|
|
|
|
Dumping Java Heap.
|
|
Wrote profile to /tmp/tmpup3QrQprofile
|
|
This can be viewed using https://ui.perfetto.dev.
|
|
```
|
|
|
|
### Viewing the Data
|
|
|
|
Upload the trace to the [Perfetto UI](https://ui.perfetto.dev) and click on
|
|
diamond marker that shows.
|
|
|
|
![Profile Diamond](/docs/images/profile-diamond.png)
|
|
|
|
This will present a flamegraph of the memory attributed to the shortest path
|
|
to a garbage-collection root. In general an object is reachable by many paths,
|
|
we only show the shortest as that reduces the complexity of the data displayed
|
|
and is generally the highest-signal. The rightmost `[merged]` stacks is the
|
|
sum of all objects that are too small to be displayed.
|
|
|
|
![Java Flamegraph](/docs/images/java-flamegraph.png)
|
|
|
|
The tabs that are available are
|
|
|
|
* **space**: how many bytes are retained via this path to the GC root.
|
|
* **objects**: how many objects are retained via this path to the GC root.
|
|
|
|
If we want to only see callstacks that have a frame that contains some string,
|
|
we can use the Focus feature. If we want to know all allocations that have to
|
|
do with notifications, we can put "notification" in the Focus box.
|
|
|
|
As with native heap profiles, if we want to focus on some specific aspect of the
|
|
graph, we can filter by the names of the classes. If we wanted to see everything
|
|
that could be caused by notifications, we can put "notification" in the Focus box.
|
|
|
|
![Java Flamegraph with Focus](/docs/images/java-flamegraph-focus.png)
|
|
|
|
We aggregate the paths per class name, so if there are multiple objects of the
|
|
same type retained by a `java.lang.Object[]`, we will show one element as its
|
|
child, as you can see in the leftmost stack above.
|