You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
164 lines
9.1 KiB
164 lines
9.1 KiB
4 months ago
|
Here's how mount actually works:
|
||
|
|
||
|
The mount comand calls the mount system call, which has five arguments you
|
||
|
can see on the "man 2 mount" page:
|
||
|
|
||
|
int mount(const char *source, const char *target, const char *filesystemtype,
|
||
|
unsigned long mountflags, const void *data);
|
||
|
|
||
|
The command "mount -t ext2 /dev/sda1 /path/to/mntpoint -o ro,noatime",
|
||
|
parses its command line arguments to feed them into those five system call
|
||
|
arguments. In this example, the source is "/dev/sda1", the target is
|
||
|
"/path/to/mountpoint", and the filesystemtype is "ext2".
|
||
|
|
||
|
The other two syscall arguments (mountflags and data) come from the
|
||
|
"-o option,option,option" argument. The mountflags argument goes to the VFS
|
||
|
(explained below), and the data argument is passed to the filesystem driver.
|
||
|
|
||
|
The mount command's options string is a list of comma separated values. If
|
||
|
there's more than one -o argument on the mount command line, they get glued
|
||
|
together (in order) with a comma. The mount command also checks the file
|
||
|
/etc/fstab for default options, and the options you specify on the command
|
||
|
line get appended to those defaults (if any). Most other command line mount
|
||
|
flags are just synonyms for adding option flags (for example
|
||
|
"mount -o remount -w" is equivalent to "mount -o remount,rw"). Behind the
|
||
|
scenes they all get appended to the -o string and fed to a common parser.
|
||
|
|
||
|
VFS stands for "Virtual File System" and is the common infrastructure shared
|
||
|
by different filesystems. It handles common things like making the filesystem
|
||
|
read only. The mount command assembles an option string to supply to the "data"
|
||
|
argument of the option syscall, but first it parses it for VFS options
|
||
|
(ro,noexec,nodev,nosuid,noatime...) each of which corresponds to a flag
|
||
|
from #include <sys/mount.h>. The mount command removes those options from the
|
||
|
sting and sets the corresponding bit in mountflags, then the remaining options
|
||
|
(if any) form the data argument for the filesystem driver.
|
||
|
|
||
|
A few quick implementation details: the mountflag MS_SILENCE gets set by
|
||
|
default even if there's nothing in /etc/fstab. Some actions (such as --bind
|
||
|
and --move mounts, I.E. -o bind and -o move) are just VFS actions and don't
|
||
|
require any specific filesystem at all. The "-o remount" flag requires looking
|
||
|
up the filesystem in /proc/mounts and reassembling the full option string
|
||
|
because you don't _just_ pass in the changed flags but have to reassemble
|
||
|
the complete new filesystem state to give the system call. Some of the options
|
||
|
in /etc/fstab are for the mount command (such as "user" which only does
|
||
|
anything if the mount command has the suid bit set) and don't get passed
|
||
|
through to the system call.
|
||
|
|
||
|
When mounting a new filesystem, the "filesystem" argument to the mount system
|
||
|
call specifies which filesystem driver to use. All the loaded drivers are
|
||
|
listed in /proc/filesystems, but calling mount can also trigger a module load
|
||
|
request to add another. A filesystem driver is responsible for putting files
|
||
|
and subdirectories under the mount point: any time you open, close, read,
|
||
|
write, truncate, list the contents of a directory, move, or delete a file,
|
||
|
you're talking to a filesystem driver to do it. (Or when you call
|
||
|
ioctl(), stat(), statvfs(), utime()...)
|
||
|
|
||
|
Different drivers implement different filesystems, which have four categories:
|
||
|
|
||
|
1) Block device backed filesystems, such as ext2 and vfat.
|
||
|
|
||
|
This kind of filesystem driver acts as a lens to look at a block device
|
||
|
through. The source argument for block backed filesystems is a path to a
|
||
|
block device, such as "/dev/hda1", which stores the contents of the
|
||
|
filesystem in a fixed block of sequential storage, and there's a seperate
|
||
|
driver providing that block device.
|
||
|
|
||
|
Block backed filesystems are the "conventional" filesystem type most people
|
||
|
think of when they mount things. The name means that the "backing store"
|
||
|
(where the data lives when the system is switched off) is on a block device.
|
||
|
|
||
|
2) Server backed filesystems, such as cifs/samba or fuse.
|
||
|
|
||
|
These drivers convert filesystem operations into a sequential stream of
|
||
|
bytes, which it can send through a pipe to talk to a program. The filesystem
|
||
|
server could be a local Filesystem in Userspace daemon (connected to a local
|
||
|
process through a pipe filehandle), behind a network socket (CIFS and v9fs),
|
||
|
behind a char device (/dev/ttyS0), and so on. The common attribute is there's
|
||
|
some program on the other end sending and receiving a sequential bytestream.
|
||
|
The backing store is a server somewhere, and the filesystem driver is talking
|
||
|
to a process that reads and writes data in some known protocol.
|
||
|
|
||
|
The source argument for these filesystems indicates where the filesystem lives. It's often in a URL-like format for network filesystems, but it's really just a blob of data that the filesystem driver understands.
|
||
|
|
||
|
A lot of server backed filesystems want to open their own connection so they
|
||
|
don't have to pass their data through a persistent local userspace process,
|
||
|
not really for performance reasons but because in low memory situations a
|
||
|
chicken-and-egg situation can develop where all the process's pages have
|
||
|
been swapped out but the filesystem needs to write data to its backing
|
||
|
store in order to free up memory so it can swap the process's pages back in.
|
||
|
If this mechanism is providing the root filesystem, this can deadlock and
|
||
|
freeze the system solid. So while you _can_ pass some of them a filehandle,
|
||
|
more often than not you don't.
|
||
|
|
||
|
These are also known as "pipe backed" filesystems (or "network filesystems"
|
||
|
because that's a common case, although a network doesn't need to be inolved).
|
||
|
Conceptually they're char device backed filesystems (analogus to the block
|
||
|
device backed ones), but you don't commonly specify a character device in
|
||
|
/dev when mounting them because you're talking to a specific server process,
|
||
|
not a whole machine.
|
||
|
|
||
|
3) Ram backed filesystems, such as ramfs and tmpfs.
|
||
|
|
||
|
These are very simple filesystems that don't implement a backing store. Data
|
||
|
written to these gets stored in the disk cache, and the driver ignores requests
|
||
|
to flush it to backing store (reporting all the pages as pinned and
|
||
|
unfreeable).
|
||
|
|
||
|
These drivers essentially mount the VFS's page/dentry cache as if it was a
|
||
|
filesystem. (Page cache stores file contents, dentry cache stores directory
|
||
|
entries.) They grow and shrink dynamically, as needed: when you write files
|
||
|
into them they allocate more memory to store it, and when you delete files
|
||
|
the memory is freed.
|
||
|
|
||
|
There's a simple one (ramfs) that does only that, and a more complex one (tmpfs)
|
||
|
which adds a size limitation (by default 50%, but it's adjustable as a mount
|
||
|
option) so the system doesn't run out of memory and lock up if you
|
||
|
"cat /dev/zero > file", and can also report how much space is remaining
|
||
|
when asked (ramfs always says 0 bytes free). The other thing tmpfs does
|
||
|
is write its data out to swap space (like processes do) when the system
|
||
|
is under memory proessure.
|
||
|
|
||
|
Note that "ramdisk" is not the same as "ramfs". The ramdisk driver uses a
|
||
|
chunk of memory to implement a block device, and then you can format that
|
||
|
block device and mount it with a block device backed filesystem driver.
|
||
|
(This is the same "two device drivers" approach you always have with block
|
||
|
backed filesystems: one driver provides /dev/ram0 and the second driver mounts
|
||
|
it as vfat.) Ram disks are significantly less efficient than ramfs,
|
||
|
allocating a fixed amount of memory up front for the block device instead of
|
||
|
dynamically resizing itself as files are written into an deleted from the
|
||
|
page and dentry caches the way ramfs does.
|
||
|
|
||
|
Note: initramfs cpio, tmpfs as rootfs.
|
||
|
|
||
|
4) Synthetic filesystems, such as proc, sysfs, devpts...
|
||
|
|
||
|
These filesystems don't have any backing store either, because they don't
|
||
|
store arbitrary data the way the first three types of filesystems do.
|
||
|
|
||
|
Instead they present artificial contents, which can represent processes or
|
||
|
hardware or anything the driver writer wants them to show. Listing or reading
|
||
|
from these files calls a driver function that produces whatever output it's
|
||
|
programmed to, and writing to these files submits data to the driver which
|
||
|
can do anything it wants with it.
|
||
|
|
||
|
Synthetic ilesystems are often implemented to provide monitoring and control
|
||
|
knobs for parts of the operating system. It's an alternative to adding more
|
||
|
system calls (or ioctl, sysctl, etc), and provides a more human friendly user
|
||
|
interface which programs can use but which users can also interact with
|
||
|
directly from the command line via "cat" and redirecting the output of
|
||
|
"echo" into special files.
|
||
|
|
||
|
|
||
|
Those are the four types of filesystems: backing store can be a fixed length
|
||
|
block of storage, backing store can be some server the driver connects to,
|
||
|
backing store can not exist and the files merely reside in the disk cache,
|
||
|
or the filesystem driver can just make up its contents programmatically.
|
||
|
|
||
|
And that's how filesystems get mounted, using the mount system call which has
|
||
|
five arguments. The "filesystem" argument specifies the driver implementing
|
||
|
one of those filesystems, and the "source" and "data" arguments get fed to
|
||
|
that driver. The "target" and "mountflags" arguments get parsed (and handled)
|
||
|
by the generic VFS infrastructure. (The filesystem driver can peek at the
|
||
|
VFS data, but generally doesn't need to care. The VFS tells the filesystem
|
||
|
what to do, in response to what userspace said to do.)
|