<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Open Projects</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
<script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost the analyzer's usability and
power. Most of the projects listed here are infrastructure-related, so this list
complements the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
<li>Release checkers from "alpha"
<p>New checkers that have been contributed to the analyzer
but have not passed a rigorous evaluation process
are committed as "alpha checkers" (from "alpha version")
and are not enabled by default.</p>

<p>Ideally, only checkers that are actively being worked on should be in
"alpha",
but over the years the development of many of them has stalled.
Such checkers should either be improved
to the point where they can be enabled by default,
or removed from the analyzer entirely.</p>

<ul>
<li><code>alpha.security.ArrayBound</code> and
<code>alpha.security.ArrayBoundV2</code>
<p>Array bounds checking is a desirable feature,
but achieving an acceptable rate of false positives might not be possible
without proper
<a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">loop widening</a> support.
Additionally, it might be more promising to perform index checking based on
<a href="https://en.wikipedia.org/wiki/Taint_checking">tainted</a> index values.</p>
<p><i>(Difficulty: Medium)</i></p>
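<p>As a rough illustration of what a tainted index looks like in user code,
consider the sketch below. The names <code>kTableSize</code>, <code>kTable</code>
and <code>lookup</code> are hypothetical and exist only for this example;
the parameter stands in for a value that arrives from an untrusted source.
Without the range check, the array access is the kind of code such a checker
would want to flag.</p>

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kTableSize = 4;            // hypothetical table size
const int kTable[kTableSize] = {10, 20, 30, 40}; // hypothetical data

// `index` stands in for a value read from an untrusted source (a "tainted"
// value). Without the range check, kTable[index] could read out of bounds.
int lookup(std::size_t index) {
  if (index >= kTableSize)
    return -1; // reject out-of-range input instead of indexing
  return kTable[index];
}
```

<p>Calling <code>lookup(4)</code> takes the early return, so the array is
never indexed with an out-of-range value.</p>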
</li>

<li><code>alpha.unix.StreamChecker</code>
<p>A SimpleStreamChecker was presented in the Building a Checker in 24
Hours talk
(<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
<a href="https://youtu.be/kdxlsP5QVPw">video</a>).</p>

<p>This alpha checker is an attempt to write a production-grade stream checker.
However, it was found to have an unacceptably high false positive rate.
One of the problems found was that eagerly splitting the state
based on whether the system call may fail leads to too many reports.
A <em>delayed</em> split, where the implication is stored in the state
(similarly to nullability implications in <code>TrustNonnullChecker</code>),
may produce much better results.</p>
<p><i>(Difficulty: Medium)</i></p>
</li>
</ul>
</li>

<li>Improve C++ support
<ul>
<li>Handle construction as part of aggregate initialization.
<p><a href="https://en.cppreference.com/w/cpp/language/aggregate_initialization">Aggregates</a>
are objects that can be brace-initialized without calling a
constructor (that is, <code><a href="https://clang.llvm.org/doxygen/classclang_1_1CXXConstructExpr.html">CXXConstructExpr</a></code> does not occur in the AST),
but whose initialization may still call
constructors for their fields and base classes.
These
constructors of sub-objects need to know what object they are constructing.
Moreover, if the aggregate contains
references, lifetime extension needs to be properly modeled.

One can start untangling this problem by trying to replace the
current ad-hoc <code><a href="https://clang.llvm.org/doxygen/classclang_1_1ParentMap.html">ParentMap</a></code> lookup in the <a href="https://clang.llvm.org/doxygen/ExprEngineCXX_8cpp_source.html#l00430"><code>CXXConstructExpr::CK_NonVirtualBase</code></a> branch of
<code>ExprEngine::VisitCXXConstructExpr()</code>
with proper support for the feature.</p>
<p><i>(Difficulty: Medium)</i></p>
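<p>The situation described above can be reproduced with a small sketch.
The types <code>Field</code> and <code>Pair</code> and the counter
<code>fieldConstructions</code> are hypothetical names used only for
illustration: <code>Pair</code> is an aggregate, so brace-initializing it does
not call a constructor of <code>Pair</code>, yet each field is still
constructed through <code>Field</code>'s constructor.</p>

```cpp
#include <cassert>

int fieldConstructions = 0; // counts calls to Field(int)

struct Field {
  int value;
  Field(int v) : value(v) { ++fieldConstructions; }
};

// Aggregate: no user-declared constructors, so brace-initialization does not
// call a constructor of Pair itself.
struct Pair {
  Field first;
  Field second;
};

int sumOfPair() {
  Pair p = {1, 2}; // constructs p.first and p.second via Field(int)
  return p.first.value + p.second.value;
}
```

<p>Each call to <code>sumOfPair()</code> runs <code>Field(int)</code> twice,
once per sub-object, which is exactly the pair of constructor calls that must
learn which field they are initializing.</p>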
</li>

<li>Handle array constructors.
<p>When an array of objects is allocated (say, using
<code>operator new[]</code> or by defining a stack array),
constructors for all elements of the array are called.
We should model (potentially some of) such evaluations,
and the same applies to destructors called from
<code>operator delete[]</code>.
See test cases in <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/handle_constructors_with_new_array.cpp">handle_constructors_with_new_array.cpp</a>.
</p>
<p>
Constructing an array requires invoking a (potentially unknown)
number of constructors with the same construct-expression.
Apart from the technical difficulties of juggling program points around
correctly to avoid accidentally merging paths together, we'll have to
decide when to exit the loop and how to widen it.
Given that the constructor is going to be a default constructor,
a nice 95% solution might be to execute exactly one constructor and
then default-bind the resulting LazyCompoundVal to the whole array;
it'll work whenever the default constructor doesn't touch global state
but only initializes the object to various default values.
But if, say, we're making an array of strings,
depending on the implementation you might have to allocate a new buffer
for each string, and in this case default-binding won't cut it.
We might want to come up with an auxiliary analysis in order to perform
widening of these simple loops more precisely.
</p>
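<p>A minimal sketch of the run-time behavior that would need to be modeled is
shown below. <code>Element</code>, <code>roundTrip</code> and the two counters
are hypothetical names introduced for this example. Note that this default
constructor touches global state (the counters), so it is precisely the kind of
constructor for which running it once and default-binding the result would lose
information.</p>

```cpp
#include <cassert>
#include <cstddef>

int constructorCalls = 0; // total number of Element() calls
int liveElements = 0;     // elements constructed and not yet destroyed

struct Element {
  int value;
  Element() : value(7) { ++constructorCalls; ++liveElements; }
  ~Element() { --liveElements; }
};

// Allocates n elements with new[], then destroys them again with delete[].
// Returns how many elements were alive between the two operations.
int roundTrip(std::size_t n) {
  Element *array = new Element[n]; // runs Element() once per element
  int alive = liveElements;
  delete[] array;                  // runs ~Element() once per element
  return alive;
}
```

<p>For example, <code>roundTrip(5)</code> runs the constructor five times and
the destructor five times, all through a single construct-expression.</p>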
</li>

<li>Handle constructors that can be elided due to Named Return Value Optimization (NRVO)
<p>Local variables that are returned by value in all return statements
may be stored directly at the address of the return value,
eliding the copy or move constructor call.
Such variables can be identified using the AST call <code>VarDecl::isNRVOVariable</code>.
</p>
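<p>The following sketch shows a function whose local variable is an NRVO
candidate. <code>Tracked</code>, <code>makeTracked</code> and the counters are
hypothetical names used only here. Whether the move is actually elided is up to
the compiler, so the example makes no claim about the exact number of moves; it
only relies on the fact that a returned local is never copy-constructed when a
move constructor is available.</p>

```cpp
#include <cassert>

int copies = 0; // copy-constructor calls
int moves = 0;  // move-constructor calls

struct Tracked {
  int value = 0;
  Tracked() = default;
  Tracked(const Tracked &other) : value(other.value) { ++copies; }
  Tracked(Tracked &&other) noexcept : value(other.value) { ++moves; }
};

// `result` is returned by value on the only return path, so it is an NRVO
// candidate: the compiler may construct it directly in the caller's storage.
Tracked makeTracked(int v) {
  Tracked result;
  result.value = v;
  return result;
}
```

<p>With NRVO applied, <code>Tracked t = makeTracked(5);</code> performs no copy
and no move at all; without it, the move constructor runs instead.</p>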
</li>

<li>Handle constructors of lambda captures
<p>Variables that are captured by value into a lambda require a call to
a copy constructor.
This call is not currently modeled.
</p>
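<p>The copy in question can be observed with a small sketch.
<code>Payload</code>, <code>readThroughLambda</code> and the counters are
hypothetical names for this example. Creating the lambda copy-constructs the
captured variable into the closure object.</p>

```cpp
#include <cassert>

int copyCount = 0; // copy-constructor calls
int moveCount = 0; // move-constructor calls

struct Payload {
  int value;
  explicit Payload(int v) : value(v) {}
  Payload(const Payload &other) : value(other.value) { ++copyCount; }
  Payload(Payload &&other) noexcept : value(other.value) { ++moveCount; }
};

// Captures `p` by value: the closure object gets its own Payload member,
// initialized by Payload's copy constructor when the lambda is created.
int readThroughLambda(int v) {
  Payload p(v);
  auto reader = [p] { return p.value; };
  return reader();
}
```

<p>Each call to <code>readThroughLambda</code> performs exactly one copy of
<code>Payload</code>, at the point where the lambda expression is evaluated.</p>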
</li>

<li>Handle constructors for default arguments
<p>Default arguments in C++ are recomputed at every call,
and are therefore local, and not static, variables.
See test cases in <a href="https://github.com/llvm/llvm-project/tree/master/clang/test/Analysis/handle_constructors_for_default_arguments.cpp">handle_constructors_for_default_arguments.cpp</a>.
</p>
<p>
Default arguments are annoying because the initializer expression is
evaluated at the call site but doesn't syntactically belong to the
caller's AST; instead it belongs to the ParmVarDecl for the default
parameter. This can lead to situations in which the same expression has to
carry different values simultaneously,
namely when multiple calls to the same function are evaluated as part of the
same full-expression without specifying the default arguments.
Even simply calling the function twice (not necessarily within the
same full-expression) may lead to program points agglutinating because
it's the same expression. There are some nasty test cases already
in temporaries.cpp (struct DefaultParam and so on). I recommend adding a
new LocationContext kind specifically to deal with this problem. It'll
also help you figure out the construction context when you evaluate the
construct-expression (though you might still need to do some additional
CFG work to get construction contexts right).
</p>
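<p>A short sketch makes the "same expression, different values" problem
concrete. The functions <code>nextId</code>, <code>tag</code> and
<code>sumOfTwoTags</code> and the counter <code>evaluations</code> are
hypothetical names for this example.</p>

```cpp
#include <cassert>

int evaluations = 0; // how many times the default argument was evaluated

int nextId() { return ++evaluations; }

// The default-argument expression `nextId()` belongs to the parameter
// declaration, but it is evaluated anew at every call that omits `id`.
int tag(int id = nextId()) { return id; }

// Two calls in one full-expression: the single default-argument expression
// has to yield two different values within the same full-expression.
int sumOfTwoTags() { return tag() + tag(); }
```

<p>Starting from <code>evaluations == 0</code>, two successive calls to
<code>tag()</code> return 1 and 2, while <code>tag(42)</code> does not evaluate
the default argument at all.</p>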
</li>

<li>Enhance the modeling of the standard library.
<p>The analyzer needs a better understanding of the STL in order to be more
useful on C++ codebases.
While full library modeling is not an easy task,
large gains can be achieved by supporting only a few cases:
e.g. calling <code>.length()</code> on an empty
<code>std::string</code> always yields zero.</p>
<p><i>(Difficulty: Medium)</i></p>
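<p>As an illustration of why even this one fact is useful, consider the sketch
below. The function <code>averageChar</code> is a hypothetical example: it
divides by <code>length()</code>, so an analysis that knows an empty string has
length zero could tell whether the guard against division by zero is present
or missing.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Average character value of `text`.
// The early return guards the division below; knowing that an empty
// std::string has length() == 0 is what makes the guard checkable.
std::size_t averageChar(const std::string &text) {
  if (text.length() == 0)
    return 0;
  std::size_t sum = 0;
  for (unsigned char c : text)
    sum += c;
  return sum / text.length();
}
```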
</li>

<li>Enhance CFG to model exception-handling.
<p>Currently exceptions are treated as "black holes", and exception-handling
control structures are poorly modeled in order to be conservative.
This could be improved for both C++ and Objective-C exceptions.</p>
<p><i>(Difficulty: Hard)</i></p>
</li>
</ul>
</li>

<li>Core Analyzer Infrastructure
<ul>
<li>Handle unions.
<p>Currently, the analyzer always regards the value of a union as
unknown.
This problem was
previously <a href="https://lists.llvm.org/pipermail/cfe-dev/2017-March/052864.html">discussed</a>
on the mailing list, but no solution was implemented.</p>
<p><i>(Difficulty: Medium)</i></p>
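<p>Even the simplest use of a union is affected. In the sketch below
(<code>Number</code> and <code>storeAndLoad</code> are hypothetical names), a
member is written and the same member is read back, which is well-defined C++.
If the union's value is treated as unknown, the analysis may not be able to
conclude that the result equals the argument.</p>

```cpp
#include <cassert>

union Number {
  int asInt;
  float asFloat;
};

// Writes one member and reads the same member back.
int storeAndLoad(int v) {
  Number n;
  n.asInt = v;
  return n.asInt;
}
```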
</li>

<li>Floating-point support.
<p>Currently, the analyzer treats all floating-point values as unknown.
This project would involve adding a new <code>SVal</code> kind
for constant floats, generalizing the constraint manager to handle floats,
and auditing existing code to make sure it doesn't
make incorrect assumptions (most notably, that <code>X == X</code>
is always true, since it does not hold for <code>NaN</code>).</p>
<p><i>(Difficulty: Medium)</i></p>
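<p>The <code>NaN</code> caveat is easy to demonstrate. The helper names
<code>equalsItself</code> and <code>quietNaN</code> below are hypothetical, and
the sketch assumes IEEE 754 semantics (that is, no fast-math style compiler
options).</p>

```cpp
#include <cassert>
#include <limits>

// Returns true when `x` compares equal to itself.
// Under IEEE 754 this holds for every double except NaN.
bool equalsItself(double x) { return x == x; }

double quietNaN() { return std::numeric_limits<double>::quiet_NaN(); }
```

<p>Here <code>equalsItself(1.5)</code> is true, whereas
<code>equalsItself(quietNaN())</code> is false, so any simplification of
<code>X == X</code> to true is unsound for floating-point operands.</p>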
</li>

<li>Improved loop execution modeling.
<p>The analyzer simply unrolls each loop <tt>N</tt> times before
dropping the path, for a fixed constant <tt>N</tt>.
However, that results in lost coverage in cases where the loop always
executes more than <tt>N</tt> times.
A Google Summer of Code
<a href="https://summerofcode.withgoogle.com/archive/2017/projects/6071606019358720/">project</a>
was completed to make the loop bound parameterizable,
but the <a href="https://en.wikipedia.org/wiki/Widening_(computer_science)">widening</a>
problem still remains open.</p>

<p><i>(Difficulty: Hard)</i></p>
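<p>The lost coverage can be pictured with the following sketch, where
<code>divideAfterLoop</code> is a hypothetical function. The loop always runs
100 times, so if the path is dropped after a smaller number of iterations, the
code after the loop is never analyzed.</p>

```cpp
#include <cassert>

// The loop always runs 100 times, so an analysis that gives up after a small
// fixed number of iterations never reaches the code after the loop.
int divideAfterLoop(int divisor) {
  int total = 0;
  for (int i = 0; i < 100; ++i)
    total += i;
  // Reached only after the 100th iteration. If this check were missing, the
  // division by zero is the kind of bug that would go unreported.
  if (divisor == 0)
    return -1;
  return total / divisor;
}
```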
</li>

<li>Basic function summarization support
<p>The analyzer performs inter-procedural analysis using
either inlining or "conservative evaluation" (invalidating all data
passed to the function).
Often, a very simple summary
(e.g. "this function is <a href="https://en.wikipedia.org/wiki/Pure_function">pure</a>") would be
enough to be a large improvement over conservative evaluation.
Such summaries could be obtained either syntactically,
or using a dataflow framework.</p>
<p><i>(Difficulty: Hard)</i></p>
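<p>To see what a "pure" summary would buy, consider the sketch below. All names
(<code>g_threshold</code>, <code>sumOfSquares</code>, <code>safeDivide</code>)
are hypothetical. If <code>sumOfSquares</code> were defined in another
translation unit and evaluated conservatively, the analysis might have to assume
that the call could have changed <code>g_threshold</code>; a summary stating
that the function is pure would rule that out.</p>

```cpp
#include <cassert>
#include <cstddef>

int g_threshold = 10; // global state visible to every function

// Pure: the result depends only on the arguments, and nothing is modified.
int sumOfSquares(const int *data, std::size_t n) {
  int total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += data[i] * data[i];
  return total;
}

int safeDivide(const int *data, std::size_t n) {
  g_threshold = 10;
  int s = sumOfSquares(data, n);
  // Knowing that the call above is pure, g_threshold is still 10 here,
  // so the division cannot be a division by zero.
  return s / g_threshold;
}
```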
</li>

<li>Implement a dataflow framework.
<p>The analyzer core
implements a <a href="https://en.wikipedia.org/wiki/Symbolic_execution">symbolic execution</a>
engine, which performs checks
(use-after-free, uninitialized value read, etc.)
over a <em>single</em> program path.
However, many useful properties
(dead code, check-after-use, etc.) require
reasoning over <em>all</em> possible paths in a program.
Such reasoning requires a
<a href="https://en.wikipedia.org/wiki/Data-flow_analysis">dataflow analysis</a> framework.
Clang already implements
a few dataflow analyses (most notably, liveness),
but they are implemented in an ad-hoc fashion.
A proper framework would enable us to write many more useful checkers.</p>
<p><i>(Difficulty: Hard)</i></p>
</li>

<li>Track type information through casts more precisely.
<p>The <code>DynamicTypePropagation</code>
checker is in charge of inferring a region's
dynamic type based on what operations the code is performing.
Casts are a rich source of type information that the analyzer currently ignores.</p>
<p><i>(Difficulty: Medium)</i></p>
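<p>As a hedged illustration of the type information a cast can carry, consider
the sketch below. <code>Shape</code>, <code>Triangle</code> and
<code>sidesOfTriangle</code> are hypothetical names. The
<code>static_cast</code> records the programmer's belief that the pointer refers
to a <code>Triangle</code>; since <code>Triangle</code> is <code>final</code>,
that belief determines the dynamic type and therefore which override runs.</p>

```cpp
#include <cassert>

struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const { return 0; }
};

struct Triangle final : Shape {
  int sides() const override { return 3; }
};

// The parameter's static type only says "some Shape". The cast narrows it:
// after the cast, the object is assumed to be a Triangle.
int sidesOfTriangle(Shape *shape) {
  Triangle *triangle = static_cast<Triangle *>(shape);
  return triangle->sides();
}
```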
</li>

</ul>
</li>

<li>Fixing miscellaneous bugs
<p>Apart from the open projects listed above,
contributors are welcome to fix any of the outstanding
<a href="https://bugs.llvm.org/buglist.cgi?component=Static%20Analyzer&list_id=147756&product=clang&resolution=---">bugs</a>
in Bugzilla.</p>
<p><i>(Difficulty: Anything)</i></p>
</li>

</ul>

</div>
</div>
</body>
</html>