# heapprofd: Sampling for Memory Profiles

_tomlewin, fmayer **·** 2021-04-14_

## Background

A heap profiler associates memory allocations with the callstacks on which they
happen ([example visualization]). It is prohibitively expensive to handle every
allocation done by a program, so the [Android Heap Profiler] employs a sampling
approach that handles a statistically meaningful subset of allocations. This
document explores the statistical properties thereof.

## Conceptual motivation

Conceptually the sampling is implemented such that each byte has some
probability p of being sampled. In theory we can think of each byte as
undergoing a Bernoulli trial. The reason we use a random sampling approach, as
opposed to taking every nth byte, is that there may be regular allocation
patterns in the code that would be missed by a correspondingly regular
sampling.

To scale the sampled bytes back up to the actual usage, we multiply by 1 / p,
i.e. if we sample a byte with probability 10%, then each sampled byte
represents 10 bytes allocated.
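
To make the conceptual model concrete, the sketch below (illustrative only, not
the production implementation; the function name is made up) draws the number
of sampled bytes of an allocation from a binomial distribution, which is
equivalent to running an independent Bernoulli trial for each byte, and scales
the result by 1 / p:

```cpp
#include <cstdint>
#include <cstdio>
#include <random>

// Conceptual model only: every byte of an allocation is independently
// sampled with probability p. Drawing from Binomial(size, p) is equivalent
// to running one Bernoulli trial per byte.
uint64_t EstimateAllocationSize(uint64_t alloc_size, double p,
                                std::mt19937& rng) {
  std::binomial_distribution<int64_t> bytes_sampled(
      static_cast<int64_t>(alloc_size), p);
  // Scale the sampled bytes up by 1 / p so the estimate is unbiased.
  return static_cast<uint64_t>(bytes_sampled(rng) / p);
}

int main() {
  std::mt19937 rng(42);
  // Hypothetical per-byte probability corresponding to a 32000-byte rate.
  constexpr double kProbability = 1.0 / 32000;
  // Individual estimates for a 1 MiB allocation vary from run to run, but
  // their expected value equals the true size.
  printf("estimate: %llu bytes\n",
         static_cast<unsigned long long>(
             EstimateAllocationSize(1 << 20, kProbability, rng)));
  return 0;
}
```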

## Implementation

In practice, the [algorithm] works as follows:

1. We look at an allocation.

2. If the size of the allocation is large enough that there’s a greater than
   99% chance of it being sampled at least once, we return the size of the
   allocation directly. This is a performance optimization.

3. If the size of the allocation is smaller, we compute the number of times we
   would draw a sample if we sampled each byte at the given sampling rate (see
   the sketch after this list):

   * In practice we do this by keeping track of the arrival time of the next
     sample. When an allocation happens, we subtract its size from the arrival
     time of the next sample and check whether we have brought it below zero.
     If so, we repeatedly draw from the exponential distribution (which is the
     interarrival time of a Poisson process) until the arrival time is brought
     back above zero. The number of draws we made from the exponential
     distribution is the number of samples the allocation counts as.

   * We multiply the number of samples we drew within the allocation by the
     sampling rate to get an estimate of the size of the allocation.
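
The following sketch shows this counter-based approach in simplified form. It
is illustrative only: the production code is the [algorithm] linked above
(sampler.h), which differs in structure and detail, and the class and method
names here are made up.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Simplified sketch of the sampling scheme described above. The production
// code is in the linked sampler.h; names and details here are illustrative.
class Sampler {
 public:
  explicit Sampler(uint64_t sampling_rate)
      : sampling_rate_(sampling_rate),
        // Allocations at least this large are sampled at least once with
        // > 99% probability, so we can record them exactly (step 2 above).
        passthrough_threshold_(static_cast<uint64_t>(
            -std::log(0.01) * static_cast<double>(sampling_rate))),
        bytes_until_sample_(NextSampleInterval()) {}

  // Returns the size this allocation is recorded as in the profile,
  // or 0 if it is not sampled at all.
  uint64_t SampleSize(uint64_t alloc_size) {
    if (alloc_size >= passthrough_threshold_) return alloc_size;
    uint64_t num_samples = 0;
    uint64_t remaining = alloc_size;
    // Each loop iteration consumes one sample arrival. Most allocations are
    // small, never enter the loop, and only decrement the counter below.
    while (remaining >= bytes_until_sample_) {
      remaining -= bytes_until_sample_;
      ++num_samples;
      bytes_until_sample_ = NextSampleInterval();
    }
    bytes_until_sample_ -= remaining;
    // Scale the number of samples back up by the sampling rate (step 3).
    return num_samples * sampling_rate_;
  }

 private:
  // Exponential interarrival time of a Poisson process that produces, on
  // average, one sample every sampling_rate_ bytes.
  uint64_t NextSampleInterval() {
    std::exponential_distribution<double> dist(
        1.0 / static_cast<double>(sampling_rate_));
    return static_cast<uint64_t>(dist(rng_)) + 1;  // Always make progress.
  }

  uint64_t sampling_rate_;
  uint64_t passthrough_threshold_;
  std::default_random_engine rng_;
  uint64_t bytes_until_sample_;
};
```

Most allocations take the early-decrement path, which is what makes this
formulation cheap, as the next paragraph discusses.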

We could instead draw directly from the Poisson/binomial distribution to see
how many samples we get for a given allocation size, but this is not as
computationally efficient. Because of their small size and our low sampling
rate, most allocations are not sampled at all, so it is more efficient to use
the exponential-draw approach: for a non-sample we only need to decrement a
counter. Sampling from the Poisson for every allocation (rather than drawing
from the exponential for every sample) is more expensive.
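
For comparison, a direct Poisson draw would look roughly like the sketch below
(illustrative only; this is the alternative the text rejects, and the function
name is hypothetical):

```cpp
#include <cstdint>
#include <random>

// Alternative approach (not used): draw the number of samples within an
// allocation directly from Poisson(alloc_size / sampling_rate). This is
// statistically equivalent, but it costs a distribution draw on every
// allocation, including the vast majority that yield zero samples.
uint64_t NumSamplesPoisson(uint64_t alloc_size, uint64_t sampling_rate,
                           std::default_random_engine& rng) {
  std::poisson_distribution<int> num_samples(
      static_cast<double>(alloc_size) / static_cast<double>(sampling_rate));
  return static_cast<uint64_t>(num_samples(rng));
}
```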

## Theoretical performance

If we sample at some rate 1 / p, then to set p reasonably we need to know both
the probability of sampling an allocation of any given size and the expected
error when we do sample it. If we set p = 1 we sample everything and have no
error. Reducing p costs us coverage and accuracy.

### Sampling probabilities

We sample an allocation with probability P(Exponential(sampling rate) <
allocation size), i.e. the probability that a single draw from an exponential
distribution with mean equal to the sampling rate is smaller than the
allocation size. This is equivalent to the probability that we do not fail to
sample every byte within the allocation when sampling bytes at our sampling
rate.
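
Written out, this probability has a simple closed form. The sketch below
(illustrative only; the function name is made up) evaluates it for the sampling
rates discussed below:

```cpp
#include <cmath>
#include <cstdio>

// P(allocation is sampled at least once) when bytes are sampled at a rate of
// one sample per `sampling_rate` bytes on average:
//   P(Exponential(mean = sampling_rate) < size) = 1 - exp(-size / sampling_rate)
double ProbSampledAtLeastOnce(double alloc_size, double sampling_rate) {
  return 1.0 - std::exp(-alloc_size / sampling_rate);
}

int main() {
  // Figures quoted in the chart discussion below (sizes and rates in KiB):
  // 1.5 MiB at a 512 KiB rate  -> 1 - exp(-3)    ~ 0.95
  // 3.5 MiB at a 2048 KiB rate -> 1 - exp(-1.75) ~ 0.83 (i.e. above 80%)
  printf("%.3f\n", ProbSampledAtLeastOnce(1.5 * 1024, 512));
  printf("%.3f\n", ProbSampledAtLeastOnce(3.5 * 1024, 2048));
  return 0;
}
```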

Because the exponential distribution is memoryless, for the purposes of this
probability we can add together allocations of the same kind even if they do
not occur together. This means that if we have an allocation of 1MB and then
another of 1MB, the probability of taking at least one sample across the two
is the same as the probability of taking at least one sample of a contiguous
2MB allocation.

We can see from the chart below that if we increase our sampling rate 16x from
32KiB to 512KiB, we still have a 95% chance of sampling anything above 1.5MiB.
If we increase it 64x to 2048KiB, we still have an 80% chance to sample
anything above 3.5MiB.

![](/docs/images/heapprofd-sampling/one-sample.png)

### Expected errors

We can also consider the expected error we’ll make at given allocation sizes
and sampling rates. As before, it doesn’t matter where the allocation happens:
if we have two allocations of the same type occurring at different times, they
have the same statistical properties as a single allocation with size equal to
their sum. This is because the exponential distribution we use is memoryless.

For the expected error we report something like the [mean absolute percentage
error]: we see how wrong we might be in percent relative to the true
allocation size, and then scale these results according to their probability
of occurrence. This estimator has some issues (it is biased such that
underestimates are penalised more heavily), but it is easy to reason about and
useful as a gauge of how wrong, on average, our estimates might be. It is best
read as analogous to a standard deviation.
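
As an illustration of the kind of quantity being reported (not the exact
methodology used to produce the charts below), a Monte Carlo estimate of this
error for a hypothetical allocation size and sampling rate could look like
this:

```cpp
#include <cmath>
#include <cstdio>
#include <random>

// Monte Carlo sketch of the mean absolute percentage error of the sampled
// estimate. The allocation size and sampling rate are hypothetical.
int main() {
  constexpr double kSamplingRate = 32 * 1024.0;  // 32KiB.
  constexpr double kAllocSize = 64 * 1024.0;     // 64KiB.
  constexpr int kTrials = 1000000;

  std::default_random_engine rng(1);
  // Number of samples within the allocation is Poisson(size / rate).
  std::poisson_distribution<int> num_samples(kAllocSize / kSamplingRate);

  double total_abs_pct_error = 0;
  for (int i = 0; i < kTrials; ++i) {
    double estimate = num_samples(rng) * kSamplingRate;
    total_abs_pct_error += std::fabs(estimate - kAllocSize) / kAllocSize;
  }
  printf("expected error: %.1f%%\n", 100.0 * total_abs_pct_error / kTrials);
  return 0;
}
```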

The charts below show both the expected error and the conditional expected
error, i.e. the expected error given that we have at least one sample within
the allocation. There is periodicity in both, in line with the sampling rate
used. The takeaway is that, while the estimates are unbiased (their expected
value is equal to the real value), substantial errors are still possible.

![](/docs/images/heapprofd-sampling/expected-error.png)

Assuming that we take at least one sample of an allocation, what error might we
expect? The chart below shows the expected error conditional on at least one
sample being taken: the expected error for the allocations that actually end up
in our heap profile. It is important to emphasise that this expected-error
methodology is not exact, and also that the underlying sampling remains
unbiased: the expected value is the true value. This should be read more as a
measure akin to a standard deviation.

![](/docs/images/heapprofd-sampling/conditional-expected-error.png)

## Performance Considerations

### STL Distributions

Benchmarking the STL distributions on a Pixel 4 reinforces our approach of
approximating the geometric distribution with an exponential distribution, as
the latter’s performance is ~8x better ([full results]).
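
For reference, these are roughly the two draws being compared; this is only a
sketch of the operations, not the benchmark itself (see [full results] for the
actual benchmark):

```cpp
#include <cstdint>
#include <cstdio>
#include <random>

// The two draws compared above (sketch only). Both model "bytes until the
// next sample" for a per-byte sampling probability of p = 1 / 32000.
int main() {
  std::default_random_engine rng;
  constexpr double kProb = 1.0 / 32000;

  // Exponential approximation: continuous interarrival time, truncated to a
  // whole number of bytes.
  std::exponential_distribution<double> exponential(kProb);
  uint64_t exp_draw = static_cast<uint64_t>(exponential(rng)) + 1;

  // Exact geometric draw: number of per-byte Bernoulli(p) trials up to and
  // including the first success.
  std::geometric_distribution<int64_t> geometric(kProb);
  uint64_t geo_draw = static_cast<uint64_t>(geometric(rng)) + 1;

  printf("exponential: %llu, geometric: %llu\n",
         static_cast<unsigned long long>(exp_draw),
         static_cast<unsigned long long>(geo_draw));
  return 0;
}
```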

Draw sample from exponential distribution with p = 1 / 32000:

* ARM64: 21.3ns (sd 0.066ns, negligibly small, smaller than a CPU cycle)
* ARM32: 33.2ns (sd 0.011ns, negligibly small, smaller than a CPU cycle)

Draw sample from geometric distribution with p = 1 / 32000:

* ARM64: 169ns (sd 0.023ns, negligibly small, smaller than a CPU cycle) (7.93x)
* ARM32: 222ns (sd 0.118ns, negligibly small, smaller than a CPU cycle) (6.69x)

## Appendix

### Improvements made

We previously (before Android 12) returned the size of the allocation
accurately and immediately if the allocation size was greater than our
sampling rate. This had several impacts.

The most obvious impact is that, with the old approach, an allocation a byte
smaller than our sampling rate would only be sampled ~63% of the time, while
one a byte larger would be recorded 100% of the time: a discontinuity in
coverage. This is still unbiased from a statistical perspective, but it’s
important to note.

Another unusual impact was that the sampling rate depended not only on the
size of the memory being used, but also on how it was allocated. If our
sampling rate were 10KB and we have a single 10KB allocation, we sample it
with certainty. If instead that 10KB is split across two 5KB allocations, we
sample it with probability 63%. Again this doesn’t bias our results, but it
means that our measurements of memory where there are many small allocations
might be noisier than ones where there are a few large allocations, even if
the total memory used is the same. If we didn’t return allocations greater
than the sampling rate every time, the memorylessness property of the
exponential distribution would make our method insensitive to how the memory
is split among allocations, which is a desirable property.

We altered the cutoff at which we simply returned the allocation size.
Previously, as discussed, the cutoff was at the sampling rate, which led to a
discontinuity. Now the cutoff is determined by the size at which we have a >99%
chance of sampling an allocation at least once, given the sampling rate we’re
using. This resolves the above issues.
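
Concretely, the new cutoff follows from the sampling probability formula given
earlier: we return the allocation size directly once 1 - exp(-size / rate)
exceeds 0.99. A small sketch (illustrative only; the function name is
hypothetical and the 32KiB rate is just an example):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Size above which an allocation is sampled at least once with >99%
// probability, solved from 1 - exp(-size / rate) > 0.99, i.e.
// size > -ln(0.01) * rate (roughly 4.6x the sampling rate).
uint64_t PassthroughThreshold(uint64_t sampling_rate) {
  return static_cast<uint64_t>(-std::log(0.01) *
                               static_cast<double>(sampling_rate));
}

int main() {
  // For a 32KiB sampling rate this is roughly 147KiB.
  printf("%llu\n",
         static_cast<unsigned long long>(PassthroughThreshold(32 * 1024)));
  return 0;
}
```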

[algorithm]: https://cs.android.com/android/platform/superproject/+/master:external/perfetto/src/profiling/memory/sampler.h
[example visualization]: /docs/images/syssrv-apk-assets-two.png
[Android Heap Profiler]: /docs/design-docs/heapprofd-design
[mean absolute percentage error]: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
[full results]: https://gist.github.com/fmayer/3aafcaf58f8db09714ba09aa4ac2b1ac