In order to buy some performance on the common, uninstrumented, fast path, we replace repeated
checks for both allocation instrumentation and allocator changes by  a single function table
dispatch, and templatized allocation code that can be used to generate either instrumented
or uninstrumented versions of allocation routines.

When we call an allocation routine, we always indirect through a thread-local function table that
either points to instrumented or uninstrumented allocation routines. The instrumented code has a
`kInstrumented` = true template argument (or `kIsInstrumented` in some places), the uninstrumented
code has `kInstrumented` = false.

The function table is thread-local. There appears to be no logical necessity for that; it just
makes it easier to access from compiled Java code.

- The function table is switched out by `InstrumentQuickAllocEntryPoints[Locked]`, and a
corresponding `UninstrumentQuickAlloc`... function.

- These in turn are called by `SetStatsEnabled()`, `SetAllocationListener()`, et al, which
require the mutator lock is not held.

- With a started runtime, `SetEntrypointsInstrumented()` calls `ScopedSupendAll(`) before updating
  the function table.

Mutual exclusion in the dispatch table is thus ensured by the fact that it is only updated while
all other threads are suspended, and is only accessed with the mutator lock logically held,
which inhibits suspension.

To ensure correctness, we thus must:

1. Suspend all threads when swapping out the dispatch table, and
2. Make sure that we hold the mutator lock when accessing it.
3. Not trust kInstrumented once we've given up the mutator lock, since it could have changed in the
    interim.