# Reactor Debug Info Generation

## Introduction

Reactor generates executable code at run time using Just-In-Time (JIT) compilation. It can be used to JIT high-performance functions specialized for runtime
configurations, or even to build a compiler.

In order to debug executable code at a higher level than disassembly, source code files are required.

Reactor has two potential sources of source code:

1. The C++ source code of the program that calls into Reactor.
2. External source files read by the program and passed to Reactor.

While case (2) is preferable for implementing a compiler, it is not currently
implemented.

Reactor implements case (1), which allows GDB to step line by line and
inspect variables.

## Supported Platforms

Currently:

* Debug info generation is only supported on Linux with the LLVM 7
  backend.
* GDB is the only supported debugger.
* The program must itself be compiled with debug info.

## Enabling

Debug info generation is enabled with the `REACTOR_EMIT_DEBUG_INFO` CMake flag (defaults
to disabled).

## Implementation details

### Source Location

All Reactor functions begin with a call to `RR_DEBUG_INFO_UPDATE_LOC()`, which calls into `rr::DebugInfo::EmitLocation()`.

`rr::DebugInfo::EmitLocation()` calls `rr::DebugInfo::getCallerBacktrace()`,
which in turn uses [`libbacktrace`](https://github.com/ianlancetaylor/libbacktrace)
to unwind the stack and find the file, function and line of the caller.

This information is passed to `llvm::IRBuilder<>::SetCurrentDebugLocation`
to emit source line information for the next LLVM instructions to be built.

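As a rough illustration of what "the file, function and line of the caller" means in practice, the sketch below records the call site of a function. It deliberately uses a different mechanism from Reactor: instead of unwinding the stack with `libbacktrace`, it relies on the GCC/Clang builtins `__builtin_FILE()`, `__builtin_FUNCTION()` and `__builtin_LINE()` as default arguments. The names `CallerLocation` and `currentLocation` are hypothetical and do not exist in Reactor.

```cpp
#include <string>

// Hypothetical record of where a call was made from (not a Reactor type).
struct CallerLocation
{
    std::string file;
    std::string function;
    int line;
};

// Default arguments are evaluated at the call site, so each call records the
// file, function and line of its caller. The three builtins are GCC/Clang
// extensions; Reactor itself obtains this data by unwinding the stack.
inline CallerLocation currentLocation(const char *file = __builtin_FILE(),
                                      const char *function = __builtin_FUNCTION(),
                                      int line = __builtin_LINE())
{
    return CallerLocation{ file, function, line };
}
```

Unwinding the stack, as Reactor does, has the advantage that no function signature needs to change, and that the whole chain of callers is available rather than only the immediate one.
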
### Variables

There are 3 aspects to generating variable debug information:

#### 1. Variable names

Constructing a Reactor `LValue`:

```C++
rr::Int a = 1;
```

will emit an LLVM `alloca` instruction to allocate the storage of the variable,
and another instruction to initialize it to the constant `1`. While this syntax is fluent, none of the
Reactor calls see the name of the C++ local variable "`a`", and the LLVM `alloca`
value gets a meaningless numerical name.

There are two potential ways that Reactor can obtain the variable name:

1. Use the running executable's own debug information to examine the local
   declaration and extract the local variable's name.
2. Use the backtrace information to parse the name from the source file.

While (1) is arguably a cleaner and more robust solution, (2) is
easier to implement and can work for the majority of use cases.

(2) is the solution currently implemented.

`rr::DebugInfo::getOrParseFileTokens()` scans a source file line by line, and
uses a regular expression to look for patterns of `<type> <name>`. Matching is not
precise, but is adequate to find locals constructed with and without assignment.

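The line-by-line scan described above can be sketched as follows. This is a minimal illustration, not Reactor's actual parser: the function name `parseVariableNames` and the regular expression are assumptions chosen for the example, and the pattern is intentionally loose.

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not Reactor's code): maps 1-based line numbers to the
// name of a local that appears to be declared on that line.
std::map<int, std::string> parseVariableNames(const std::string &source)
{
    // Loose "<type> <name>" pattern: an optionally namespace-qualified type,
    // whitespace, an identifier, then '=' or ';'. It is imprecise by design;
    // for example, "return x;" would also match.
    static const std::regex decl(R"(^\s*((?:\w+::)*\w+)\s+(\w+)\s*(?:=|;))");

    std::map<int, std::string> names;
    std::istringstream stream(source);
    std::string line;
    int lineNumber = 0;
    while(std::getline(stream, line))
    {
        lineNumber++;
        std::smatch match;
        if(std::regex_search(line, match, decl))
        {
            names[lineNumber] = match[2].str();  // second capture is the name
        }
    }
    return names;
}
```

Given a source file containing `rr::Int a = 1;` on some line, the returned map associates that line number with the name `a`, which is the information the binding step needs.
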
#### 2. Variable binding

Given that we can find a variable name for a given source line, we need a way of
binding the LLVM values to the name.

Given our trivial example:

```C++
rr::Int a = 1;
```

The `rr::Int` constructor calls `RR_DEBUG_INFO_EMIT_VAR()`, passing the storage
value as its single argument. `RR_DEBUG_INFO_EMIT_VAR()` performs the backtrace
to find the source file and line, and uses the token information produced by
`rr::DebugInfo::getOrParseFileTokens()` to identify the variable name.

However, things get a bit more complicated when there are multiple variables
being constructed on the same line.

Take for example:

```C++
rr::Int a = rr::Int(1) + rr::Int(2);
```

Here we have 3 calls to the `rr::Int` constructor, each calling down
to `RR_DEBUG_INFO_EMIT_VAR()`.

To disambiguate which of these should be bound to the variable name "`a`",
`rr::DebugInfo::EmitVariable()` buffers the binding into
`scope.pending`, and the last binding for a given line is used by
`DebugInfo::emitPending()`. For variable construction and assignment, C++
guarantees that the LHS is the last value to be constructed.

This solution is not perfect.

Multi-line expressions, multiple assignments on a single line, and macro obfuscation
can all break variable bindings. However, the majority of typical cases work.

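The "last binding for a line wins" rule can be sketched with a small buffer keyed by line number. This is a simplified model under stated assumptions: `PendingBindings` and its members are hypothetical stand-ins, values are represented as strings rather than LLVM values, and the real `scope.pending` may be structured differently.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-scope pending buffer (not Reactor's type).
struct PendingBindings
{
    // Records that `value` was constructed on `line`. A later construction
    // on the same line overwrites the earlier one.
    void emitVariable(int line, std::string value)
    {
        pending[line] = std::move(value);
    }

    // Pairs each buffered value with the name parsed for its line, then
    // clears the buffer. Lines without a parsed name are dropped.
    std::vector<std::pair<std::string, std::string>> emitPending(
        const std::map<int, std::string> &namesByLine)
    {
        std::vector<std::pair<std::string, std::string>> bound;
        for(const auto &entry : pending)
        {
            auto name = namesByLine.find(entry.first);
            if(name != namesByLine.end())
            {
                bound.emplace_back(name->second, entry.second);
            }
        }
        pending.clear();
        return bound;
    }

    std::map<int, std::string> pending;
};
```

For `rr::Int a = rr::Int(1) + rr::Int(2);`, three values are buffered for the same line, and only the third, the storage of `a` itself, is bound to the name when the pending entries are flushed.
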
#### 3. Variable scope

`rr::DebugInfo` maintains a stack of `llvm::DIScope`s and `llvm::DILocation`s
that mirrors the current backtrace for the function being called.

A synthetic call stack is produced by chaining `llvm::DILocation`s with
`InlinedAt`s.

For example, at the declaration of `i`:

```C++
void B()
{
    rr::Int i; // <- here
}

void A()
{
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

The `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A"
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location: ↳ DILocation(DISubprogram: "A")
rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
```

Where '↳' represents an `InlinedAt`.

`rr::DebugInfo::diScope` is updated by `rr::DebugInfo::syncScope()`.

`llvm::DIScope`s typically do not nest: there is usually a separate
`llvm::DISubprogram` for each function in the call stack. All local variables
within a function will typically share the same scope, regardless of whether
they are declared within a sub-block.

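The way a chain of `InlinedAt` links yields a synthetic call stack can be modelled with a plain linked structure. The sketch below is only an analogy: `ChainedLocation` is a hypothetical stand-in for `llvm::DILocation`, and the line numbers in the usage note are made up.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::DILocation: a scope name, a line, and an
// optional link to the location this one is treated as inlined at.
struct ChainedLocation
{
    std::string scope;
    int line;
    const ChainedLocation *inlinedAt;
};

// Walks the inlinedAt chain from the innermost location outwards, yielding
// the frames a debugger would present as the call stack.
std::vector<std::string> syntheticCallStack(const ChainedLocation &innermost)
{
    std::vector<std::string> frames;
    for(const ChainedLocation *loc = &innermost; loc != nullptr; loc = loc->inlinedAt)
    {
        frames.push_back(loc->scope + ":" + std::to_string(loc->line));
    }
    return frames;
}
```

Building the chain `B` → `A` → `main` → `ReactorFunction` and walking it from `B` produces four frames, innermost first, mirroring the `DILocation` hierarchy shown earlier.
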
Loops and jumps within a function add complexity. Consider:

```C++
void B()
{
    rr::Int i = 0;
}

void A()
{
    for (int i = 0; i < 3; i++)
    {
        rr::Int x = 0;
    }
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

In this particular example Reactor will not be aware of the `for` loop, and will
attempt to create three variables called "`x`" in the same function scope for `A()`.
Duplicate symbols in the same `llvm::DIScope` result in undefined behavior.

To solve this, `rr::DebugInfo::syncScope()` observes when a function jumps
backwards, and forks the current `llvm::DILexicalBlock` for the function. This
results in a number of `llvm::DILexicalBlock` chains, each declaring variables
that shadow the previous block.

At the declaration of `i`, the `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
                              ↳ DISubprogram: "A"
                              | ↳ DILexicalBlock: "A".1
rr::DebugInfo::diScope[1].di: | ↳ DILexicalBlock: "A".2
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location: ↳ DILocation(DILexicalBlock: "A".2)
rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
```

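The backward-jump detection can be illustrated with a toy model that tracks only line numbers. This is a sketch under assumptions, not the body of `rr::DebugInfo::syncScope()`: `FunctionScope` and `syncLine` are hypothetical names, and a "fork" is represented by incrementing a counter rather than creating an `llvm::DILexicalBlock`.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one entry in the scope stack (not Reactor's type).
struct FunctionScope
{
    std::string function;
    int lastLine = 0;  // last source line seen in this function
    int block = 0;     // current lexical block; 0 is the function itself
};

// Updates `scope` for a new location at `line`. If the line is earlier than
// the previous one, execution is assumed to have jumped backwards (e.g. a
// loop iteration), so a new block is forked to shadow earlier variables.
// Returns true when a fork happened.
bool syncLine(FunctionScope &scope, int line)
{
    bool forked = line < scope.lastLine;
    if(forked)
    {
        scope.block++;
    }
    scope.lastLine = line;
    return forked;
}
```

Feeding the line sequence 5, 6, 5, 6, 5, 6 (three iterations of a two-line loop body) through `syncLine` forks twice, leaving the scope in block 2. Variables declared in each iteration then live in distinct blocks instead of colliding in one scope.
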
### Debugger integration

Once the debug information has been generated, it needs to be handed to the
debugger.

Reactor uses [`llvm::JITEventListener::createGDBRegistrationListener()`](http://llvm.org/doxygen/classllvm_1_1JITEventListener.html#a004abbb5a0d48ac376dfbe3e3c97c306)
to inform GDB of the JIT'd program and its debugging information.
More information [can be found here](https://llvm.org/docs/DebuggingJITedCode.html).

LLDB should be able to support this same mechanism, but at the time of writing
this does not appear to work.