.TH MINIJAIL0 "5" "July 2011" "Chromium OS" "User Commands"
.SH NAME
minijail0 \- sandbox a process
.SH DESCRIPTION
.PP
Runs PROGRAM inside a sandbox. See \fBminijail0\fR(1) for details.
.SH EXAMPLES

Safely switch from root to nobody while dropping all capabilities and
inheriting any groups from nobody:

  # minijail0 -c 0 -G -u nobody /usr/bin/whoami
  nobody

Run in a PID and VFS namespace without superuser capabilities (but still
as root) and with a private view of /proc:

  # minijail0 -p -v -r -c 0 /bin/ps
    PID TTY           TIME CMD
      1 pts/0     00:00:00 minijail0
      2 pts/0     00:00:00 ps

Running a process with a seccomp filter policy at reduced privileges:

  # minijail0 -S /usr/share/minijail0/$(uname -m)/cat.policy -- \\
              /bin/cat /proc/self/seccomp_filter
  ...
.SH SECCOMP_FILTER POLICY
The policy file supplied to the \fB-S\fR argument supports the following syntax:

  \fB<syscall_name>\fR:\fB<ftrace filter policy>\fR
  \fB<syscall_number>\fR:\fB<ftrace filter policy>\fR
  \fB<empty line>\fR
  \fB# any single line comment\fR

Long lines may be broken up using \\ at the end.
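.PP
The line-oriented syntax above can be illustrated with a minimal Python
sketch. This is not Minijail's parser: the helper name parse_policy and the
sample text are hypothetical, and the code only assumes the rules stated
here (one entry per line, a colon between the system call and its policy,
empty lines and # comments ignored, a trailing backslash joining lines).

```python
def parse_policy(text):
    """Return a list of (syscall, expression) pairs from policy text.

    Hypothetical helper for illustration; not part of Minijail.
    """
    entries = []
    pending = ""
    for raw in text.splitlines():
        line = raw.strip()
        # A trailing backslash joins this line with the next one.
        if line.endswith("\\"):
            pending += line[:-1].rstrip() + " "
            continue
        line = (pending + line).strip()
        pending = ""
        # Skip empty lines and single-line comments.
        if not line or line.startswith("#"):
            continue
        syscall, sep, expr = line.partition(":")
        if not sep:
            raise ValueError("missing ':' in policy line: " + line)
        entries.append((syscall.strip(), expr.strip()))
    return entries


# Made-up policy text used only to exercise the sketch.
SAMPLE = """\
# allow plain reads and writes
read: 1

open: arg1 == 0 || \\
      arg1 == 2048
3: 1
"""

if __name__ == "__main__":
    for syscall, expr in parse_policy(SAMPLE):
        print(syscall, "->", expr)
```
.PP
The entry named 3 shows the numeric form: the sketch keeps the text before
the colon as written and leaves name or number resolution to the caller.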
.PP
A policy that emulates \fBseccomp\fR(2) in mode 1 may look like:
  read: 1
  write: 1
  sigreturn: 1
  exit: 1

The "1" acts as a wildcard and allows any use of the mentioned system
call. More advanced filtering is possible if your kernel supports
CONFIG_FTRACE_SYSCALLS. For example, we can allow a process to open any
file read only and mmap PROT_READ only:

  # open with O_LARGEFILE|O_RDONLY|O_NONBLOCK or some combination
  open: arg1 == 32768 || arg1 == 0 || arg1 == 34816 || arg1 == 2048
  mmap2: arg2 == 0x1
  munmap: 1
  close: 1
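.PP
The four numbers in the open rule above are the possible OR-combinations of
the listed flags. A short Python sketch shows where they come from. The flag
values are an assumption (the 32-bit x86 Linux values, written in octal as in
the kernel headers), and allowed_open_flags is a hypothetical helper name.

```python
from itertools import combinations

# Assumed i386 Linux values, written in octal as in the kernel headers.
O_RDONLY = 0o0
O_NONBLOCK = 0o4000
O_LARGEFILE = 0o100000


def allowed_open_flags():
    """Every OR-combination of the optional flags on top of O_RDONLY."""
    optional = [O_NONBLOCK, O_LARGEFILE]
    values = set()
    for count in range(len(optional) + 1):
        for combo in combinations(optional, count):
            value = O_RDONLY
            for flag in combo:
                value |= flag
            values.add(value)
    return sorted(values)


if __name__ == "__main__":
    # Prints the comparison list in policy syntax.
    print(" || ".join("arg1 == %d" % v for v in allowed_open_flags()))
```
.PP
Enumerating combinations by hand grows quickly as flags are added, which is
why numeric lists like this one are best kept short.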
.PP
The supported arguments may be found by reviewing the system call
prototypes in the Linux kernel source code. Be aware that any
non-numeric comparison may be subject to time-of-check to time-of-use
(TOCTOU) attacks and cannot be considered safe.

\fBexecve\fR may only be used when invoking with CAP_SYS_ADMIN privileges.

In order to promote reusability, policy files can include other policy files
using the following syntax:

  \fB@include /absolute/path/to/file.policy\fR
  \fB@include ./path/relative/to/CWD/file.policy\fR

Inclusion is limited to a single level (i.e. files that are \fB@include\fRd
cannot themselves \fB@include\fR more files), since deeper nesting makes
policies harder to understand.
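.PP
The single-level rule can be sketched in Python as follows. This is an
illustration under stated assumptions, not Minijail's implementation:
expand_includes is a hypothetical helper, and relative paths are simply
passed to open(), which resolves them against the current working directory.

```python
def expand_includes(path, depth=0):
    """Return the policy lines of `path` with @include lines expanded.

    Hypothetical helper; nesting deeper than one level is rejected.
    """
    lines = []
    with open(path) as policy:
        for raw in policy:
            line = raw.rstrip("\n")
            if line.startswith("@include"):
                if depth >= 1:
                    # An included file tried to include another file.
                    raise ValueError("nested @include in " + path)
                target = line[len("@include"):].strip()
                lines.extend(expand_includes(target, depth + 1))
            else:
                lines.append(line)
    return lines
```
.PP
Calling expand_includes on a top-level policy returns one flat list of
lines, so the effective policy can be reviewed in a single place.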
.SH SECCOMP_FILTER SYNTAX
More formally, the text after the colon is an expression in
Disjunctive Normal Form (DNF): a disjunction ("or", \fI||\fR) of
conjunctions ("and", \fI&&\fR) of atoms.

.SS "Atom Syntax"
Atoms are of the form \fIarg{DNUM} {OP} {VAL}\fR where:
.IP
\[bu] \fIDNUM\fR is a decimal number

\[bu] \fIOP\fR is an unsigned comparison operator:
\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, \fI>=\fR, \fI&\fR (flags set),
or \fIin\fR (inclusion)

\[bu] \fIVAL\fR is a constant expression. It can be a named constant (like
\fBO_RDONLY\fR), a number (octal, decimal, or hexadecimal), a mask of constants
separated by \fI|\fR, or a parenthesized constant expression. Constant
expressions can also be prefixed with the bitwise complement operator \fI~\fR
to produce their complement.
.RE
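.PP
To make the DNF structure concrete, here is a minimal Python sketch that
evaluates such an expression against a list of argument values. It is a
model, not Minijail's compiler (which produces a kernel filter): eval_atom
and eval_dnf are hypothetical names, VAL is limited to decimal and
0x-prefixed literals, the \fIin\fR operator is left out, and \fI&\fR is
assumed to mean a nonzero bitwise intersection.

```python
import operator
import re

# arg{DNUM} {OP} {VAL}, with VAL restricted to a single numeric literal.
ATOM = re.compile(r"^arg(\d+)\s*(==|!=|<=|>=|<|>|&)\s*(\w+)$")

OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    # Assumption: "&" is true when the argument shares a bit with VAL.
    "&": lambda arg, val: (arg & val) != 0,
}


def eval_atom(atom, args):
    """Evaluate one atom against the non-negative integers in `args`."""
    match = ATOM.match(atom.strip())
    if match is None:
        raise ValueError("unsupported atom: " + atom)
    index, op, value = match.groups()
    # int(..., 0) accepts decimal and 0x-prefixed hexadecimal literals.
    return OPS[op](args[int(index)], int(value, 0))


def eval_dnf(expression, args):
    """True if any conjunction has all of its atoms satisfied."""
    return any(
        all(eval_atom(atom, args) for atom in conjunction.split("&&"))
        for conjunction in expression.split("||")
    )
```
.PP
For example, eval_dnf("arg0 == 1 && arg1 >= 0x10", [1, 16]) is true because
both atoms of the only conjunction hold.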
.PP
\fI==\fR, \fI!=\fR, \fI<\fR, \fI<=\fR, \fI>\fR, and \fI>=\fR compare the
argument against the constant as unsigned integers.

\fI&\fR tests whether a flag is set, for example, O_NONBLOCK for
.BR open (2):

  open: arg1 & O_NONBLOCK

Minijail supports most common named constants, like O_RDONLY.
It's preferable to use named constants rather than numeric values as not all
architectures use the same numeric value.
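.PP
A small Python sketch shows why numeric values do not port. The two
O_NONBLOCK values are assumptions taken from the Linux headers for x86 and
MIPS, flag_is_set is a hypothetical model of the \fI&\fR operator (assumed
to mean a nonzero intersection), and nonblock_requested is a hypothetical
helper.

```python
# Assumed per-architecture values of O_NONBLOCK from the Linux headers.
O_NONBLOCK = {"x86": 0o4000, "mips": 0x80}


def flag_is_set(arg, mask):
    """Model of the & operator, assumed to mean a nonzero intersection."""
    return (arg & mask) != 0


def nonblock_requested(arch, flags):
    """Resolve the named constant for `arch`, then apply the test."""
    return flag_is_set(flags, O_NONBLOCK[arch])


if __name__ == "__main__":
    x86_flags = 0o4000  # what an x86 caller passes for O_NONBLOCK
    print(nonblock_requested("x86", x86_flags))
    print(nonblock_requested("mips", x86_flags))
```
.PP
Under this model a mask of zero selects no bits, so a constant whose value
is 0 is not a useful operand for \fI&\fR.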
.PP
When the possible combinations of allowed flags grow, specifying them all can
be cumbersome.
This is where the \fIin\fR operator comes in handy.
The system call will be allowed if and only if the flags set in the argument
are included (as a set) in the flags in the policy:

  mmap: arg3 in MAP_PRIVATE|MAP_ANONYMOUS

This will allow \fBmmap\fR(2) as long as \fIarg3\fR (flags) has any combination
of MAP_PRIVATE and MAP_ANONYMOUS, but nothing else. One common use of this is
to restrict \fBmmap\fR(2) / \fBmprotect\fR(2) to only allow write^exec
mappings:

  mmap: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
  mprotect: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE
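.PP
The set-inclusion reading of \fIin\fR and the effect of the \fI~\fR prefix
can be modelled in a few lines of Python. This is a sketch of the semantics
described above, not of Minijail's generated filter. The helper names are
hypothetical, the PROT_* values are the usual Linux ones, and arguments are
assumed to be 64 bits wide.

```python
PROT_READ = 0x1
PROT_WRITE = 0x2
PROT_EXEC = 0x4

MASK64 = (1 << 64) - 1  # assumption: 64-bit syscall arguments


def complement(value):
    """Bitwise complement limited to 64 bits, modelling the ~ prefix."""
    return ~value & MASK64


def flags_in(arg, allowed):
    """Model of the in operator: no bit outside `allowed` may be set."""
    return (arg & complement(allowed)) == 0


def write_xor_exec(prot):
    """Model of: arg2 in ~PROT_EXEC || arg2 in ~PROT_WRITE."""
    return flags_in(prot, complement(PROT_EXEC)) or flags_in(
        prot, complement(PROT_WRITE)
    )
```
.PP
A mapping that is writable or executable passes, while one that asks for
both at once fails both disjuncts and is blocked.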
.SS "Return Values"

By default, blocked syscalls cause the process to be killed.
The \fIreturn {NUM}\fR syntax can be used to force a specific errno to be
returned instead.

  read: return EBADF

This expression will block the \fBread\fR(2) syscall, make it return -1, and set
\fBerrno\fR to EBADF (9 on x86 platforms).

An expression can also include an optional \fIreturn <errno>\fR clause,
separated by a semicolon:

  read: arg0 == 0; return EBADF

That is, if the first argument to read is 0, then allow the syscall;
else, block the syscall, return -1, and set \fBerrno\fR to EBADF.
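.PP
The three possible outcomes (allow, fail with an errno, kill) can be
summarised with a small Python model. The names decide, read_policy, ALLOW
and KILL are hypothetical and only mirror the prose above; the real decision
is made by the kernel filter, not by user-space code like this.

```python
import errno

ALLOW = "allow"
KILL = "kill"


def decide(condition_met, fallback_errno=None):
    """Return the action for one syscall under a policy line.

    condition_met: whether the expression before ';' evaluated to true.
    fallback_errno: value of the optional return clause, if any.
    """
    if condition_met:
        return ALLOW
    if fallback_errno is None:
        return KILL
    # The caller sees -1 with errno set to this value.
    return ("errno", fallback_errno)


def read_policy(fd):
    """Model of: read: arg0 == 0; return EBADF"""
    return decide(fd == 0, errno.EBADF)
```
.PP
In this model the unconditional form "read: return EBADF" corresponds to
decide(False, errno.EBADF).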
.SH SECCOMP_FILTER POLICY WRITING

Determining policy for seccomp_filter can be time consuming. System
calls are often named in arch-specific or legacy ways, e.g.,
geteuid versus geteuid32. On process death due to a seccomp filter
rule, the offending system call number will be supplied with a best
guess of the ABI-defined name. This information may be used to produce
working baseline policies. However, if the process being contained has
a fairly tight working domain, using \fBtools/generate_seccomp_policy.py\fR
with the output of \fBstrace -f -e raw=all <program>\fR can generate the list
of system calls that are needed. Note that when using libminijail or minijail
with preloading, supporting initial process setup calls will not be required.
Be conservative.
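.PP
The idea of deriving a baseline from a trace can be sketched in Python. This
is not \fBtools/generate_seccomp_policy.py\fR: baseline_policy is a
hypothetical helper, the sample lines are made up to resemble strace output,
and the result allows every observed call unconditionally, so it should be
treated as a starting point to tighten by hand.

```python
import re

# Optional "[pid N] " or "N " prefix, then the syscall name and "(".
SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def baseline_policy(strace_lines):
    """Return sorted 'name: 1' lines for every syscall seen in the trace."""
    names = set()
    for line in strace_lines:
        match = SYSCALL.match(line)
        if match:
            names.add(match.group(1))
    return ["%s: 1" % name for name in sorted(names)]


# Made-up trace lines used only to exercise the sketch.
SAMPLE = [
    "execve(0x7ffc2a3c1f10, 0x7ffc2a3c2a40, 0x7ffc2a3c2a50) = 0",
    "[pid  4242] read(0x3, 0x7f1c5a000000, 0x1000) = 0x20",
    "[pid  4242] write(0x1, 0x7f1c5a000000, 0x20) = 0x20",
    "--- SIGCHLD {si_signo=SIGCHLD} ---",
    "read(0x3, 0x7f1c5a000000, 0x1000) = 0",
    "+++ exited with 0 +++",
]

if __name__ == "__main__":
    for entry in baseline_policy(SAMPLE):
        print(entry)
```
.PP
Signal and exit marker lines do not match the pattern and are skipped, so
only system call names end up in the generated list.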
.PP
It's also possible to analyze the binary, checking for all non-dead
functions and determining if any of them issue system calls. There is
no active implementation for this, but something like
code.google.com/p/seccompsandbox is one possible runtime variant.
.SH AUTHOR
The Chromium OS Authors <chromiumos-dev@chromium.org>
.SH COPYRIGHT
Copyright \(co 2011 The Chromium OS Authors
License BSD-like.
.SH "SEE ALSO"
.BR minijail0 (1)