<html>
<head>
<title>Dalvik Porting Guide</title>
</head>

<body>
<h1>Dalvik Porting Guide</h1>

<p>
The Dalvik virtual machine is intended to run on a variety of platforms.
The baseline system is expected to be a variant of UNIX (Linux, BSD, Mac
OS X) running the GNU C compiler. Little-endian CPUs have been exercised
the most heavily, but big-endian systems are explicitly supported.
</p><p>
There are two general categories of work: porting to a Linux system
with a previously unseen CPU architecture, and porting to a different
operating system. This document covers the former.
</p><p>
Basic familiarity with the Android platform, source code structure, and
build system is assumed.
</p>


<h2>Core Libraries</h2>

<p>
The native code in the core libraries (chiefly <code>libcore</code>,
but also <code>dalvik/vm/native</code>) is written in C/C++ and is expected
to work without modification in a Linux environment.
</p><p>
The core libraries pull in code from many other projects, including
OpenSSL, zlib, and ICU. These will also need to be ported before the VM
can be used.
</p>


<h2>JNI Call Bridge</h2>

<p>
Most of the Dalvik VM runtime is written in portable C. The one
non-portable component of the runtime is the JNI call bridge. Simply put,
this converts an array of integers into function arguments of various
types, and calls a function. This must be done according to the C calling
conventions for the platform. The task could be as simple as pushing all
of the arguments onto the stack, or involve complex rules for register
assignment and stack alignment.
</p><p>
To ease porting to new platforms, the <a href="http://sourceware.org/libffi/">
open-source FFI library</a> (Foreign Function Interface) is used when a
custom bridge is unavailable. FFI is not as fast as a native implementation,
and the optional performance improvements it does offer are not used, so
writing a replacement is a good first step.
</p><p>
The code lives in <code>dalvik/vm/arch/*</code>, with the FFI-based version
in the "generic" directory. There are two source files for each architecture.
One defines the call bridge itself:
</p><p><blockquote>
<code>void dvmPlatformInvoke(void* pEnv, ClassObject* clazz, int argInfo,
int argc, const u4* argv, const char* signature, void* func,
JValue* pReturn)</code>
</blockquote></p><p>
This will invoke a C/C++ function declared:
</p><p><blockquote>
<code>return_type func(JNIEnv* pEnv, Object* this [, <i>args</i>])</code>
</blockquote>or (for a "static" method):<blockquote>
<code>return_type func(JNIEnv* pEnv, ClassObject* clazz [, <i>args</i>])</code>
</blockquote></p><p>
The role of <code>dvmPlatformInvoke</code> is to convert the values in
<code>argv</code> into C-style calling conventions, call the method, and
then place the return value into <code>pReturn</code> (a union that holds
all of the basic JNI types). The code may use the method signature
(a DEX "shorty" signature, with one character for the return type and one
per argument) to determine how to handle the values.
</p><p>
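As a rough illustration of the first half of that job, the sketch below walks
a shorty signature and decodes the 32-bit words of an argument array into
typed values. It is not taken from the Dalvik sources: <code>DecodedArg</code>
and <code>decode_args</code> are hypothetical names, <code>argc</code> is
assumed to count 32-bit words, and 64-bit values are assumed to occupy two
adjacent words in host byte order. A real bridge would place each value in a
register or stack slot according to the platform ABI instead of an array.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t u4;

/* Hypothetical tagged value standing in for one decoded C argument. */
typedef struct {
    char    type;   /* shorty character, e.g. 'I', 'J', 'D' */
    int64_t bits;   /* raw bits; 32-bit values are sign-extended */
} DecodedArg;

/*
 * Walk a shorty signature (return type first, then one character per
 * argument) and decode argv[] into out[]. 'J' (long) and 'D' (double)
 * are assumed to take two consecutive 32-bit words; everything else
 * takes one. Returns the number of arguments decoded, or -1 if the
 * signature is empty, argv is too short, or out[] is too small.
 */
int decode_args(const char* shorty, const u4* argv, int argc,
                DecodedArg* out, int max_out)
{
    int slot = 0;    /* index into argv, in 32-bit words */
    int count = 0;   /* number of arguments decoded so far */
    const char* p;

    if (shorty[0] == '\0')
        return -1;

    for (p = shorty + 1; *p != '\0'; p++) {   /* skip the return type */
        int wide = (*p == 'J' || *p == 'D');

        if (count >= max_out || slot + (wide ? 2 : 1) > argc)
            return -1;

        out[count].type = *p;
        if (wide) {
            int64_t v;
            memcpy(&v, &argv[slot], sizeof(v));   /* host byte order */
            out[count].bits = v;
            slot += 2;
        } else {
            out[count].bits = (int32_t) argv[slot];
            slot += 1;
        }
        count++;
    }
    return count;
}
```

With the signature <code>"VIJ"</code> (void return, one int, one long) the
function consumes three words of <code>argv</code> and yields two decoded
arguments.
</p><p>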
The other source file involved here defines a 32-bit "hint". The hint
is computed when the method's class is loaded, and passed in as the
"argInfo" argument. The hint can be used to avoid scanning the ASCII
method signature for things like the return value, total argument size,
or inter-argument 64-bit alignment restrictions.
</p>
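<p>
One way to picture the hint is as a small bit-packed summary of the
signature. The layout below is purely hypothetical: the constants,
<code>compute_hint</code>, and the two accessors are inventions for this
example, and the real layout is architecture-specific and defined by the
sources under <code>dalvik/vm/arch</code>. The sketch stores a return-type
category in the top four bits and the total argument size, in 32-bit words,
in the remaining bits.

```c
#include <stdint.h>

/* Hypothetical hint layout, for illustration only. */
enum {
    HINT_RETURN_SHIFT = 28,          /* return category in bits 28..31 */
    HINT_ARGC_MASK    = 0x0fffffff   /* argument words in bits 0..27 */
};

/* Hypothetical return-type categories. */
enum { HINT_RET_VOID = 0, HINT_RET_WORD = 1, HINT_RET_WIDE = 2 };

/* Scan the shorty signature once and pack what the bridge needs. */
uint32_t compute_hint(const char* shorty)
{
    uint32_t ret_code;
    uint32_t words = 0;
    const char* p;

    switch (shorty[0]) {
    case 'V':           ret_code = HINT_RET_VOID; break;
    case 'J': case 'D': ret_code = HINT_RET_WIDE; break;
    default:            ret_code = HINT_RET_WORD; break;
    }

    /* 'J' and 'D' arguments take two 32-bit words, all others one. */
    for (p = shorty + 1; *p != '\0'; p++)
        words += (*p == 'J' || *p == 'D') ? 2 : 1;

    return (ret_code << HINT_RETURN_SHIFT) | (words & HINT_ARGC_MASK);
}

/* Accessors a bridge could use instead of rescanning the signature. */
uint32_t hint_return_code(uint32_t hint)
{
    return hint >> HINT_RETURN_SHIFT;
}

uint32_t hint_arg_words(uint32_t hint)
{
    return hint & HINT_ARGC_MASK;
}
```

Because the hint is computed once per method, the per-call cost drops to a
shift and a mask.
</p>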


<h2>Interpreter</h2>

<p>
The Dalvik runtime includes two interpreters, labeled "portable" and "fast".
The portable interpreter is largely contained within a single C function,
and should compile on any system that supports gcc. (If you don't have gcc,
you may need to disable the "threaded" execution model, which relies on
gcc's "goto table" implementation; look for the THREADED_INTERP define.)
</p><p>
The fast interpreter uses hand-coded assembly fragments. If none are
available for the current architecture, the build system will create an
interpreter out of C "stubs". The resulting "all stubs" interpreter is
quite a bit slower than the portable interpreter, making "fast" something
of a misnomer.
</p><p>
The fast interpreter is enabled by default. On platforms without native
support, you may want to switch to the portable interpreter. This can
be controlled with the <code>dalvik.vm.execution-mode</code> system
property. For example, if you run:
</p><p><blockquote>
<code>adb shell "echo dalvik.vm.execution-mode = int:portable >> /data/local.prop"</code>
</blockquote></p><p>
and reboot, the Android app framework will start the VM with the portable
interpreter enabled.
</p>


<h3>Mterp Interpreter Structure</h3>

<p>
There may be significant performance advantages to rewriting the
interpreter core in assembly language, using architecture-specific
optimizations. In Dalvik this can be done one instruction at a time.
</p><p>
The simplest way to implement an interpreter is to have a large "switch"
statement. After each instruction is handled, the interpreter returns to
the top of the loop, fetches the next instruction, and jumps to the
appropriate label.
</p><p>
An improvement on this is called "threaded" execution. The instruction
fetch and dispatch are included at the end of every instruction handler.
This makes the interpreter a little larger overall, but you get to avoid
the (potentially expensive) branch back to the top of the switch statement.
</p><p>
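The difference between the two styles is easier to see side by side. The
sketch below runs the same program both ways over a hypothetical
three-instruction bytecode (not Dalvik's); the opcode names and both function
names are made up for the example. The threaded version relies on gcc's
"labels as values" extension and assumes the program contains only valid
opcodes.

```c
#include <stdint.h>

/* Hypothetical toy opcodes, for illustration only. */
enum { TOY_HALT = 0, TOY_INC = 1, TOY_DEC = 2 };

/* Switch dispatch: every handler branches back to the top of the loop. */
int run_switch(const uint8_t* pc)
{
    int acc = 0;

    for (;;) {
        switch (*pc++) {
        case TOY_INC: acc++; break;
        case TOY_DEC: acc--; break;
        default:      return acc;   /* TOY_HALT or anything unknown */
        }
    }
}

/* Threaded dispatch: each handler fetches and jumps to the next one. */
int run_threaded(const uint8_t* pc)
{
    /* Table of handler addresses, indexed by opcode (gcc extension). */
    static void* const handlers[] = { &&op_halt, &&op_inc, &&op_dec };
    int acc = 0;

#define DISPATCH() goto *handlers[*pc++]

    DISPATCH();
op_inc:
    acc++;
    DISPATCH();
op_dec:
    acc--;
    DISPATCH();
op_halt:
    return acc;

#undef DISPATCH
}
```

Both functions return the same result for the same program; only the control
flow between handlers differs. Note that the threaded version still loads
each handler address from a table.
</p><p>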
Dalvik mterp goes one step further, using a computed goto instead of a goto
table. Instead of looking up the address in a table, which requires an
extra memory fetch on every instruction, mterp multiplies the opcode number
by a fixed value. By default, each handler is allowed 64 bytes of space.
</p><p>
Not all handlers fit in 64 bytes. Those that don't can have subroutines
or simply continue on to additional code outside the basic space. Some of
this is handled automatically by Dalvik, but there's no portable way to detect
overflow of a 64-byte handler until the VM starts executing.
</p><p>
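The address arithmetic described above can be written out in C as a small
sketch. The function names are hypothetical, and in a real port this
computation is done in assembly as part of each handler's dispatch sequence;
the point is only that the handler address is derived from the opcode with a
multiply (or shift) and an add, with no table lookup.

```c
#include <stddef.h>
#include <stdint.h>

enum { HANDLER_SIZE = 64 };   /* bytes reserved per handler, as above */

/* Address of the handler for an opcode: base + opcode * 64. */
const uint8_t* handler_address(const uint8_t* base, unsigned opcode)
{
    return base + (size_t) opcode * HANDLER_SIZE;
}

/*
 * Number of bytes by which a handler of code_size bytes spills out of
 * its slot; 0 means it fits. Spilled code has to live elsewhere.
 */
size_t handler_overflow(size_t code_size)
{
    return code_size > HANDLER_SIZE ? code_size - HANDLER_SIZE : 0;
}
```

For example, the handler for opcode 3 starts 192 bytes past the first
handler, and an 80-byte handler would need 16 bytes moved out of line.
</p><p>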
The choice of 64 bytes is somewhat arbitrary, but has worked out well for
ARM and x86.
</p><p>
In the course of development it's useful to have C and assembly
implementations of each handler, and be able to flip back and forth
between them when hunting problems down. In mterp this is relatively
straightforward. You can always see the files being fed to the compiler
and assembler for your platform by looking in the
<code>dalvik/vm/mterp/out</code> directory.
</p><p>
The interpreter sources live in <code>dalvik/vm/mterp</code>. If you
haven't yet, you should read <code>dalvik/vm/mterp/README.txt</code> now.
</p>


<h3>Getting Started With Mterp</h3>

<p>
Getting started:
<ol>
<li>Decide on the name of your architecture. For the sake of discussion,
let's call it <code>myarch</code>.
<li>Copy <code>dalvik/vm/mterp/config-allstubs</code> to
<code>dalvik/vm/mterp/config-myarch</code>.
<li>Create a <code>dalvik/vm/mterp/myarch</code> directory to hold your
source files.
<li>Add <code>myarch</code> to the list in
<code>dalvik/vm/mterp/rebuild.sh</code>.
<li>Make sure <code>dalvik/vm/Android.mk</code> will find the files for
your architecture. If <code>$(TARGET_ARCH)</code> is configured this
will happen automatically.
<li>Disable the Dalvik JIT. You can do this in the general device
configuration, or by editing the initialization of WITH_JIT in
<code>dalvik/vm/Dvm.mk</code> to always be <code>false</code>.
</ol>
</p><p>
You now have the basic framework in place. Whenever you make a change, you
need to perform two steps: regenerate the mterp output, and build the
core VM library. (It's two steps because we didn't want the build system
to require Python 2.5, which, incidentally, you need to have.)
<ol>
<li>In the <code>dalvik/vm/mterp</code> directory, regenerate the contents
of the files in <code>dalvik/vm/mterp/out</code> by executing
<code>./rebuild.sh</code>. Note there are two files, one in C and one
in assembly.
<li>In the <code>dalvik</code> directory, regenerate the
<code>libdvm.so</code> library with <code>mm</code>. You can also use
<code>mmm dalvik/vm</code> from the top of the tree.
</ol>
</p><p>
This will leave you with an updated libdvm.so, which can be pushed out to
a device with <code>adb sync</code> or <code>adb push</code>. If you're
using the emulator, you need to add <code>make snod</code> (System image,
NO Dependency check) to rebuild the system image file. You should not
need to do a top-level "make" and rebuild the dependent binaries.
</p><p>
At this point you have an "all stubs" interpreter. You can see how it
works by examining <code>dalvik/vm/mterp/cstubs/entry.c</code>. The
code runs in a loop, pulling out the next opcode, and invoking the
handler through a function pointer. Each handler takes a "glue" argument
that contains all of the useful state.
</p><p>
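The shape of that loop can be sketched in a few lines of C. This is not the
code from <code>entry.c</code>: the <code>Glue</code> structure here is a
hypothetical, much-reduced stand-in, the three toy handlers and
<code>run_stubs</code> are made-up names, and the program is assumed to
contain only valid opcodes.

```c
#include <stdint.h>

/* Hypothetical, much-reduced "glue": all state a handler needs. */
typedef struct Glue {
    const uint8_t* pc;   /* next instruction to execute */
    int acc;             /* toy accumulator */
    int running;         /* cleared by the halt handler */
} Glue;

typedef void (*Handler)(Glue* glue);

/* Each stub does its work on the glue and returns to the loop. */
static void stub_halt(Glue* glue) { glue->running = 0; }
static void stub_inc(Glue* glue)  { glue->acc++; glue->pc++; }
static void stub_dec(Glue* glue)  { glue->acc--; glue->pc++; }

/* One function pointer per opcode, indexed by opcode number. */
static const Handler handler_table[] = { stub_halt, stub_inc, stub_dec };

/* Fetch the next opcode and call its handler until told to stop. */
int run_stubs(const uint8_t* program)
{
    Glue glue = { program, 0, 1 };

    while (glue.running) {
        uint8_t opcode = *glue.pc;
        handler_table[opcode](&glue);
    }
    return glue.acc;
}
```

Every instruction pays for an indirect call and a return to the loop, which
is a large part of why this arrangement is slow.
</p><p>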
Your goal is to replace the entry method, exit method, and each individual
instruction with custom implementations. The first thing you need to do
is create an entry function that calls the handler for the first instruction.
After that, the instructions chain together, so you don't need a loop.
(Look at the ARM or x86 implementation to see how they work.)
</p><p>
Once you have that, you need something to jump to. You can't branch
directly to the C stub because it's expecting to be called with a "glue"
argument and then return. We need a C stub "wrapper" that does the
setup and jumps directly to the next handler. We write this in assembly
and then add it to the config file definition.
</p><p>
To see how this works, create a file called
<code>dalvik/vm/mterp/myarch/stub.S</code> that contains one line:
<pre>
/* stub for ${opcode} */
</pre>
Then, in <code>dalvik/vm/mterp/config-myarch</code>, add this below the
<code>handler-size</code> directive:
<pre>
# source for the instruction table stub
asm-stub myarch/stub.S
</pre>
</p><p>
Regenerate the sources with <code>./rebuild.sh</code>, and take a look
inside <code>dalvik/vm/mterp/out/InterpAsm-myarch.S</code>. You should
see 256 copies of the stub function in a single large block after the
<code>dvmAsmInstructionStart</code> label. The <code>stub.S</code>
code will be used anywhere you don't provide an assembly implementation.
</p><p>
Note that each block begins with a <code>.balign 64</code> directive.
This is what pads each handler out to 64 bytes. Note also that the
<code>${opcode}</code> text changed into an opcode name, which should
be used to call the C implementation (<code>dvmMterp_${opcode}</code>).
</p><p>
The actual contents of <code>stub.S</code> are up to you to define.
See <code>entry.S</code> and <code>stub.S</code> in the <code>armv5te</code>
or <code>x86</code> directories for working examples.
</p><p>
If you're working on a variation of an existing architecture, you may be
able to use most of the existing code and just provide replacements for
a few instructions. Look at the <code>vm/mterp/config-*</code> files
for examples.
</p>


<h3>Replacing Stubs</h3>

<p>
There are roughly 250 Dalvik opcodes, including some that are inserted by
<a href="dexopt.html">dexopt</a> and aren't described in the
<a href="dalvik-bytecode.html">Dalvik bytecode</a> documentation. Each
one must perform the appropriate actions, fetch the next opcode, and
branch to the next handler. The actions performed by the assembly version
must exactly match those performed by the C version (in
<code>dalvik/vm/mterp/c/OP_*</code>).
</p><p>
You can customize the set of "optimized" instructions for your
platform, because optimized DEX files are not expected
to work on multiple devices. Adding, removing, or redefining instructions
is beyond the scope of this document, and for simplicity it's best to stick
with the basic set defined by the portable interpreter.
</p><p>
Once you have written a handler that looks like it should work, add
it to the config file. For example, suppose we have a working version
of <code>OP_NOP</code>. For demonstration purposes, fake it for now by
putting this into <code>dalvik/vm/mterp/myarch/OP_NOP.S</code>:
<pre>
/* This is my NOP handler */
</pre>
</p><p>
Then, in the <code>op-start</code> section of <code>config-myarch</code>, add:
<pre>
op OP_NOP myarch
</pre>
</p><p>
This tells the generation script to use the assembly version from the
<code>myarch</code> directory instead of the C version from the <code>c</code>
directory.
</p><p>
Execute <code>./rebuild.sh</code>. Look at <code>InterpAsm-myarch.S</code>
and <code>InterpC-myarch.c</code> in the <code>out</code> directory. You
will see that the <code>OP_NOP</code> stub wrapper has been replaced with our
new code in the assembly file, and the C stub implementation is no longer
included.
</p><p>
As you implement instructions, the C version and corresponding stub wrapper
will disappear from the output files. Eventually you will have a 100%
assembly interpreter. You may find it saves a little time to examine
the output of your compiler for some of the operations. The
<a href="porting-proto.c.txt">porting-proto.c</a> sample code can be
helpful here.
</p>


<h3>Interpreter Switching</h3>

<p>
The Dalvik VM actually includes a third interpreter implementation: the debug
interpreter. This is a variation of the portable interpreter that includes
support for debugging and profiling.
</p><p>
When a debugger attaches, or a profiling feature is enabled, the VM
will switch interpreters at a convenient point. This is done at the
same time as the GC safe point check: on a backward branch, a method
return, or an exception throw. Similarly, when the debugger detaches
or profiling is discontinued, execution transfers back to the "fast" or
"portable" interpreter.
</p><p>
Your entry function needs to test the "entryPoint" value in the "glue"
pointer to determine where execution should begin. Your exit function
will need to return a boolean that indicates whether the interpreter is
exiting (because we reached the "bottom" of a thread stack) or wants to
switch to the other implementation.
</p><p>
See the <code>entry.S</code> file in <code>x86</code> or <code>armv5te</code>
for examples.
</p>


<h3>Testing</h3>

<p>
A number of VM tests can be found in <code>dalvik/tests</code>. The most
useful during interpreter development is <code>003-omnibus-opcodes</code>,
which tests many different instructions.
</p><p>
The basic invocation is:
<pre>
$ cd dalvik/tests
$ ./run-test 003
</pre>
</p><p>
This will run test 003 on an attached device or emulator. You can run
the test against your desktop VM by specifying <code>--reference</code>
if you suspect the test may be faulty. You can also use
<code>--portable</code> and <code>--fast</code> to explicitly specify
one Dalvik interpreter or the other.
</p><p>
Some instructions are replaced by <code>dexopt</code>, notably when
"quickening" field accesses and method invocations. To ensure
that you are testing the basic form of the instruction, add the
<code>--no-optimize</code> option.
</p><p>
There is no built-in instruction tracing mechanism. If you want
to know for sure that your implementation of an opcode handler
is being used, the easiest approach is to insert a "printf"
call. For an example, look at <code>common_squeak</code> in
<code>dalvik/vm/mterp/armv5te/footer.S</code>.
</p><p>
At some point you need to ensure that debuggers and profiling work with
your interpreter. The easiest way to do this is to simply connect a
debugger or toggle profiling. (A future test suite may include some
tests for this.)
</p>


<h2>Other Performance Issues</h2>

<p>
The <code>System.arraycopy()</code> function is heavily used. The
implementation relies on the bionic C library to provide a fast,
platform-optimized data copy function for arrays with elements wider
than one byte. If you're not using bionic, or your platform does not
have an implementation of this method, Dalvik will use correct but
sub-optimal algorithms instead. For best performance you will want
to provide your own version.
</p><p>
See the comments in <code>dalvik/vm/native/java_lang_System.c</code>
for details.
</p>
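<p>
To give a sense of what a simple, portable fallback might look like, the
sketch below copies 16-bit array elements one at a time and handles
overlapping source and destination ranges. It is an illustration only, not
the code in <code>java_lang_System.c</code>; the name
<code>copy_u2_elements</code> is hypothetical, and a platform-optimized
version would move data in larger units.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Overlap-safe copy of 16-bit elements, one whole element per store.
 * Copies forward unless dst starts inside the source range, in which
 * case it copies backward so no source element is overwritten before
 * it has been read.
 */
void copy_u2_elements(uint16_t* dst, const uint16_t* src, size_t count)
{
    uintptr_t d = (uintptr_t) dst;
    uintptr_t s = (uintptr_t) src;
    uintptr_t s_end = (uintptr_t) (src + count);
    size_t i;

    if (d <= s || d >= s_end) {
        for (i = 0; i < count; i++)       /* forward copy */
            dst[i] = src[i];
    } else {
        for (i = count; i > 0; i--)       /* backward copy */
            dst[i - 1] = src[i - 1];
    }
}
```

Shifting the first four elements of a six-element array up by two positions
exercises the backward path; shifting elements down exercises the forward
path.
</p>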

<p>
<address>Copyright © 2009 The Android Open Source Project</address>

</body>
</html>