Reactor Debug Info Generation
Introduction
Reactor produces Just-In-Time compiled dynamic executable code and can be used to JIT high-performance functions specialized for runtime configurations, or even to build a compiler.
In order to debug executable code at a higher level than disassembly, source code files are required.
Reactor has two potential sources of source code:
- The C++ source code of the program that calls into Reactor.
- External source files read by the program and passed to Reactor.
While reading external source files is preferable for implementing a compiler, this is currently not implemented.
Reactor instead uses the C++ source code of the calling program, which allows GDB to step line by line and inspect variables.
Supported Platforms
Currently:
- Debug info generation is only supported on Linux with the LLVM 7 backend.
- GDB is the only supported debugger.
- The program must itself be compiled with debug info.
Enabling
Debug info generation is enabled with the REACTOR_EMIT_DEBUG_INFO CMake flag (disabled by default).
Implementation details
Source Location
All Reactor functions begin with a call to RR_DEBUG_INFO_UPDATE_LOC(), which calls into rr::DebugInfo::EmitLocation().
rr::DebugInfo::EmitLocation() calls rr::DebugInfo::getCallerBacktrace(), which in turn uses libbacktrace to unwind the stack and find the file, function and line of the caller.
This information is passed to llvm::IRBuilder<>::SetCurrentDebugLocation to emit source line information for the next LLVM instructions to be built.
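The idea of a callee learning where it was called from can be sketched without libbacktrace. The example below is a stand-in, not Reactor's mechanism: it uses the GCC/Clang builtins __builtin_FILE(), __builtin_FUNCTION() and __builtin_LINE() as default arguments, which are evaluated at the call site. The names CallerLocation and captureCaller are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical record of where a call came from; not a Reactor type.
struct CallerLocation
{
    std::string file;
    std::string function;
    int line;
};

// Stand-in for a backtrace lookup. The default arguments are evaluated at
// the call site, so the callee receives the caller's file, function and line.
// Requires GCC or Clang; Reactor itself unwinds the stack with libbacktrace.
inline CallerLocation captureCaller(const char *file = __builtin_FILE(),
                                    const char *function = __builtin_FUNCTION(),
                                    int line = __builtin_LINE())
{
    assert(file != nullptr && function != nullptr);
    return CallerLocation{ file, function, line };
}
```

Unlike a real stack unwind, this only reports the immediate caller, and only when the caller does not pass explicit arguments.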
Variables
There are 3 aspects to generating variable debug information:
1. Variable names
Constructing a Reactor LValue:
rr::Int a = 1;
will emit an LLVM alloca instruction to allocate the storage of the variable, and emit another instruction to initialize it to the constant 1. While fluent, none of the Reactor calls see the name of the C++ local variable "a", and the LLVM alloca value gets a meaningless numerical name.
There are two potential ways that Reactor can obtain the variable name:
- Use the running executable's own debug information to examine the local declaration and extract the local variable's name.
- Use the backtrace information to parse the name from the source file.
While using the executable's own debug information is arguably a cleaner and more robust solution, parsing the name from the source file is easier to implement and works for the majority of use cases.
Parsing the source file is the solution currently implemented.
rr::DebugInfo::getOrParseFileTokens() scans a source file line by line, and uses a regular expression to look for patterns of <type> <name>. Matching is not precise, but is adequate to find locals constructed with and without assignment.
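A line-by-line scan of this kind might look like the following minimal sketch. The function name parseVariableNames and the regular expression are assumptions made for illustration; they are not the pattern Reactor uses.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper, not Reactor's implementation: maps 1-based line
// numbers to the name of a variable that appears to be declared there.
std::map<int, std::string> parseVariableNames(const std::string &source)
{
    // <type> <name> followed by '=' or ';'. The type may contain "::" or "<>".
    // Deliberately loose: a statement such as "return x;" also matches.
    static const std::regex pattern(
        R"re(^\s*([A-Za-z_][A-Za-z0-9_:<>]*)\s+([A-Za-z_][A-Za-z0-9_]*)\s*[=;])re");

    std::map<int, std::string> names;
    std::istringstream stream(source);
    std::string line;
    int lineNumber = 0;
    while(std::getline(stream, line))
    {
        lineNumber++;
        std::smatch match;
        if(std::regex_search(line, match, pattern))
        {
            assert(match.size() == 3);  // whole match, type, name
            names[lineNumber] = match[2].str();
        }
    }
    return names;
}
```

Keying the result by line number is what lets a later backtrace lookup (file and line) be turned into a variable name.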
2. Variable binding
Given that we can find a variable name for a given source line, we need a way of binding the LLVM values to the name.
Given our trivial example:
rr::Int a = 1;
The rr::Int constructor calls RR_DEBUG_INFO_EMIT_VAR(), passing the storage value as its single argument. RR_DEBUG_INFO_EMIT_VAR() performs the backtrace to find the source file and line, and uses the token information produced by rr::DebugInfo::getOrParseFileTokens() to identify the variable name.
However, things get a bit more complicated when there are multiple variables being constructed on the same line.
Take for example:
rr::Int a = rr::Int(1) + rr::Int(2);
Here we have 3 calls to the rr::Int constructor, each calling down to RR_DEBUG_INFO_EMIT_VAR().
To disambiguate which of these should be bound to the variable name "a", rr::DebugInfo::EmitVariable() buffers the binding into scope.pending, and the last binding for a given line is used by DebugInfo::emitPending(). For variable construction and assignment, C++ guarantees that the LHS is the last value to be constructed.
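The "last binding on a line wins" rule can be sketched as follows. This is a simplified model under stated assumptions: values are plain integer ids rather than LLVM values, and the class VariableBinder and its members are hypothetical names, not Reactor's.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a buffered binding. A line of 0 means "empty".
struct PendingBinding
{
    int line;
    int valueId;
};

class VariableBinder
{
public:
    explicit VariableBinder(std::map<int, std::string> namesByLine)
        : names(std::move(namesByLine))
    {}

    // Called for every constructed value. A binding from an earlier line is
    // flushed first; a binding from the same line is simply overwritten.
    void emitVariable(int line, int valueId)
    {
        assert(line > 0);
        if(pending.line != 0 && pending.line != line)
        {
            emitPending();
        }
        pending = PendingBinding{ line, valueId };
    }

    // Binds the buffered value to the name parsed for its line, if any.
    void emitPending()
    {
        auto it = names.find(pending.line);
        if(pending.line != 0 && it != names.end())
        {
            bound.emplace_back(it->second, pending.valueId);
        }
        pending = PendingBinding{ 0, 0 };
    }

    std::vector<std::pair<std::string, int>> bound;  // (name, valueId)

private:
    std::map<int, std::string> names;
    PendingBinding pending{ 0, 0 };
};
```

With three constructor calls reported for the same line, only the third value ends up bound to the name, which matches the construction order of the example above.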
This solution is not perfect.
Multi-line expressions, multiple assignments on a single line, and macro obfuscation can all break variable bindings; however, the majority of typical cases work.
3. Variable scope
rr::DebugInfo maintains a stack of llvm::DIScopes and llvm::DILocations that mirrors the current backtrace for the function being called.
A synthetic call stack is produced by chaining llvm::DILocations with InlinedAts.
For example, at the declaration of i:
void B()
{
    rr::Int i; // <- here
}
void A()
{
    B();
}
int main(int argc, const char* argv[])
{
    A();
}
The DIScope hierarchy would be:
DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A"
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
The DILocation hierarchy would be:
rr::DebugInfo::diRootLocation: DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location: ↳ DILocation(DISubprogram: "A")
rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
Where '↳' represents an InlinedAt.
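The chaining can be modelled with a plain linked structure. In this sketch the Location struct and describeCallStack are hypothetical stand-ins for llvm::DILocation and a debugger's stack walk, and the line numbers are counted within the example listing above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::DILocation: a position inside a scope,
// plus a link to the location it was "inlined at" (nullptr for the root).
struct Location
{
    std::string scopeName;
    int line;
    const Location *inlinedAt;
};

// Walks the InlinedAt chain from the innermost location to the root,
// producing the synthetic call stack a debugger would reconstruct.
std::string describeCallStack(const Location *innermost)
{
    assert(innermost != nullptr);
    std::string result;
    for(const Location *loc = innermost; loc != nullptr; loc = loc->inlinedAt)
    {
        if(!result.empty())
        {
            result += " <- ";
        }
        result += loc->scopeName + ":" + std::to_string(loc->line);
    }
    return result;
}
```

For the example, a location in B at line 3 chained to A at line 7, main at line 11 and a root location yields "B:3 <- A:7 <- main:11 <- ReactorFunction:0".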
rr::DebugInfo::diScope is updated by rr::DebugInfo::syncScope().
llvm::DIScopes typically do not nest; there is usually a separate llvm::DISubprogram for each function in the callstack. All local variables within a function will typically share the same scope, regardless of whether they are declared within a sub-block.
Loops and jumps within a function add complexity. Consider:
void B()
{
    rr::Int i = 0;
}
void A()
{
    for (int i = 0; i < 3; i++)
    {
        rr::Int x = 0;
    }
    B();
}
int main(int argc, const char* argv[])
{
    A();
}
In this particular example Reactor will not be aware of the for loop, and will attempt to create three variables called "x" in the same function scope for A().
Duplicate symbols in the same llvm::DIScope result in undefined behavior.
To solve this, rr::DebugInfo::syncScope() observes when a function jumps backwards, and forks the current llvm::DILexicalBlock for the function. This results in a number of llvm::DILexicalBlock chains, each declaring variables that shadow the previous block.
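The backward-jump detection can be sketched as a comparison between the last line seen in a function and the new one. This is a simplified model: FunctionScope and syncLine are hypothetical names, and treating only a strictly earlier line as a backward jump is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one function's entry in the scope stack.
struct FunctionScope
{
    std::string function;
    int lastLine;    // most recent line seen in this function, 0 if none
    int blockIndex;  // 0 = the function's own scope, n = n-th forked block
};

// Records that execution reached `line` in the function and returns the name
// of the scope in which a variable declared on that line would be placed.
std::string syncLine(FunctionScope &scope, int line)
{
    assert(line > 0);
    if(line < scope.lastLine)
    {
        // Moved backwards (e.g. a new loop iteration): fork a fresh lexical
        // block so re-declared variables shadow those of the previous block.
        scope.blockIndex++;
    }
    scope.lastLine = line;
    if(scope.blockIndex == 0)
    {
        return scope.function;
    }
    return scope.function + "." + std::to_string(scope.blockIndex);
}
```

For a loop body that spans two lines and runs three times, the first iteration declares into "A", the second into "A.1" and the third into "A.2", so each "x" lands in a distinct scope.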
At the declaration of i in B(), the DIScope hierarchy would be:
DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
↳ DISubprogram: "A"
| ↳ DILexicalBlock: "A".1
rr::DebugInfo::diScope[1].di: | ↳ DILexicalBlock: "A".2
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
The DILocation hierarchy would be:
rr::DebugInfo::diRootLocation: DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location: ↳ DILocation(DILexicalBlock: "A".2)
rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
Debugger integration
Once the debug information has been generated, it needs to be handed to the debugger.
Reactor uses llvm::JITEventListener::createGDBRegistrationListener() to inform GDB of the JIT'd program and its debugging information.
More information can be found in the LLVM documentation on debugging JIT-ed code.
LLDB should be able to support this same mechanism, but at the time of writing this does not appear to work.