MAC (Mandatory Access Control) support for Linux

    Copyright 2000, Malcolm Beattie

    This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 1, or (at your option) any later version.

    This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License or the Artistic License for more details.

    You should have received a copy of the GNU General Public License in the file named "Copying". If not, you can get one by writing to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

DESCRIPTION

This is a very, very preliminary alpha release of Mandatory Access Control support for Linux 2.2. It concentrates on implementing compartments for separating a running Linux system into multiple mutually invisible (or mutually restricted) systems which can nevertheless be administered, when required, as a single system. This is done using the "integrity" part of a traditional B2 "trusted system" model which seems to have a lot more real-world uses than the "sensitivity" part. Because the emphasis is on compartments, the feature list of the current implementation may look a little odd: for example, per-compartment IP addresses, IP routing and process table visibility are all implemented whereas per-file labels are not.

Reminder: this is a very, very preliminary alpha release.

INSTALLATION

There are five main files in this distribution:

    mac30-kernel-2.2.12.patch   Patch against kernel 2.2.12
    mac30-iproute2.patch        Patch against iproute2-2.2.4-now-ss991023
    mac30-userland.tar.gz       Tarball for userland mac_label utilities
    mac30-ipchains.patch        Patch against ipchains 1.3.9
    mlsfs.c                     Kernel filesystem module for MLS.

Step 1. Kernel configuration and build.

Patch your 2.2.12 kernel with mac30-kernel-2.2.12.patch.
The patch may or may not work with 2.2.13 or higher--I haven't tested it. Configure your kernel with your favourite "make config"-like target, turning on

    CONFIG_EXPERIMENTAL, CONFIG_SECURE_MAC, CONFIG_FIREWALL,
    CONFIG_IP_FIREWALL, CONFIG_IP_FIREWALL_NETLINK, CONFIG_IP_ALIAS,
    CONFIG_IP_ADVANCED_ROUTER, CONFIG_RTNETLINK, CONFIG_NETLINK,
    CONFIG_IP_MULTIPLE_TABLES, CONFIG_IP_ROUTE_FWMARK

along with any other stuff you normally turn on. Build your kernel.

Step 2. Kernel module(s).

Place mlsfs.c in a directory somewhere (it doesn't have to go in your kernel source tree) and compile it with something along the lines of

    cc -Wall -o mlsfs.o -DMODULE -D__KERNEL__ -I/usr/src/linux/include \
        -O2 -c mlsfs.c

but replace -I/usr/src/linux/include by the path to the include directory of your patched kernel tree. This should produce mlsfs.o with no errors or warnings output.

Step 3. Userland utilities.

Get iproute2-2.2.4-now-ss991023.tar.gz (or a later version at your own risk if there's a later one by the time you read this) from its home site

    ftp://ftp.inr.ac.ru/ip-routing

or, preferably, one of its mirrors:

    ftp://linux.wauug.org/pub/net
    ftp://ftp.nc.ras.ru/pub/mirrors/ftp.inr.ac.ru/ip-routing/
    ftp://ftp.gts.cz/MIRRORS/ftp.inr.ac.ru/
    ftp://ftp.funet.fi/pub/mirrors/ftp.inr.ac.ru/ip-routing/ (STM1 to USA)
    ftp://sunsite.icm.edu.pl/pub/Linux/iproute/
    ftp://ftp.sunet.se/pub/Linux/ip-routing/
    ftp://ftp.nvg.ntnu.no/pub/linux/ip-routing/
    ftp://ftp.crc.ca/pub/systems/linux/ip-routing/
    ftp://ftp.proxad.net/mirrors/ftp.inr.ac.ru/ip-routing/
    ftp://donlug.dn.ua/pub/mirrors/ip-routing/
    ftp://omni.rk.tusur.ru/mirrors/ftp.inr.ac.ru/ip-routing/
    ftp://ftp.src.uchicago.edu/pub/linux/ip-routing/
    http://www.asit.ro/ip-routing/
    ftp://ftp.infoscience.co.jp/pub/linux/ip-routing/ (Japan)
    ftp://ftp.sucs.swan.ac.uk/pub/mirrors/ftp.inr.ac.ru/ip-routing
    http://mirror.schell.de/ftp.inr.ac.ru/ip-routing/ (Germany)
    ftp://ftp.gin.cz/MIRRORS/ftp.inr.ac.ru/ip-routing
    ftp://mirror.aarnet.edu.au/pub/ip-routing/ (Australia)
    http://mirror.aarnet.edu.au/pub/ip-routing/ (Australia)

Apply the patch mac30-iproute2.patch from this distribution and follow the instructions in the iproute2 distribution in order to get the executable "ip".

Get ipchains 1.3.9 (or a later version at your own risk if there's a later one by the time you read this) from ftp://ftp.rustcorp.com/ipchains/ (or from the ipchains RPM that comes with Red Hat 6.1). Apply the patch mac30-ipchains.patch from this distribution and follow the instructions in the ipchains distribution in order to get the executable "ipchains".

Unpack mac30-userland.tar.gz from this distribution to obtain files:

    Makefile
    prlabel.c
    newlabel.c
    fslabel.c

and compile them with

    make CFLAGS=-I/path/to/your/patched/kernel/include/directory

That should produce executables prlabel, fslabel and newlabel.

Step 4. Reboot.

Reboot on your new kernel. Load the mlsfs.o module with

    insmod /path/to/wherever/you/put/mlsfs.o

DESCRIPTION

Here is a brief description of what the new kernel does. For details of what you can *do* with it, see lower down.

The kernel includes a "struct mac_label" structure in every process/thread (struct task), socket (struct socket and struct sock), skbuff (struct sk_buff) and routing rule (struct rt_key). The mac_label structure records the "integrity" of the subject/object it's a part of (and can contain a "sensitivity" label in future).

For those unfamiliar with integrity and sensitivity labels, basically each is a partial ordering (at least according to the Biba integrity model and Bell-LaPadula sensitivity model) where you are allowed to read information labelled with a higher integrity and lower sensitivity but not vice versa. (In fact, most implementations, this one included, only let you *write* to an object with the *same* integrity/sensitivity).

For those who want an even simpler explanation: forget about the above and, for now, just think of the mac_label as a compartment number.
Things with different compartment numbers can't see or do anything to each other (except for compartment 0 which is a bit special). The compartment is one of the three component members of struct mac_label but you can ignore the other two for now.

For a task (process or thread), the mac_label is inherited by children. A task's mac_label can be read/changed by prctl(PR_[GS]ET_MAC_LABEL) by root only (will be a separate capability).

Sockets created by a task are labelled with the mac_label of the creating task. Packets sent out by a socket are labelled with the mac_label of that socket (and hence with the mac_label of the originating task).

When an IP packet is to be sent out, the mac_label of the packet is used as part of the routing key: you can use the advanced routing features of Linux to make each compartment have its own IP addresses and routing tables.

A task can only "see" another task (i.e. send it a signal, see it in ps output or, in general, anything which uses get_task_by_pid()) if their mac_labels allow.

Incoming IP packets can be labelled with a mac_label by means of any IP firewall matching pattern. A task can only receive a packet on a socket if that socket has a mac_label which allows it to do so.

Mounted filesystems can also be given a mac_label (by an ugly kludge, currently) and then any task can only access that filesystem if its mac_label allows it to do so. (Currently, it's a basic all-or-nothing choice: it doesn't distinguish between read v. write access and it's only at the granularity of mounted filesystems, not arbitrary directories or files.)

The mlsfs module lets you make any ordinary directory (/var, say) reach other directories of your choice depending on the mac_label of the task that refers to it.
For example, suppose /var has subdirectories comp1 and comp2, then via mlsfs you can configure the system so that tasks in compartment 1 referring to /var actually refer to /var/comp1 and tasks in compartment 2 referring to /var actually refer to /var/comp2. That means you can install software in two completely separate systems running in two separate compartments. This differs from automounting and dynamic symlinks in three ways:

(1) the directory "redirected" can be anywhere on the ordinary filesystem: it doesn't turn into a mount point and processes looking at the directory itself (stat/lstat) see the data for the target directory (e.g. /var/comp1).

(2) Target directories for other tasks are not accessible via another part of the filesystem (there's no "/realvar/comp2" which a comp1 task could sneak a look at). (Actually, the implementation of mlsfs makes the target directories available via a backdoor in the mlsfs filesystem itself but that's an implementation issue and can be nailed down in various ways).

(3) the directory "/var/comp1" truly appears to be "/var" to tasks in comp1 so that doing

    cd /var/lib/foo
    pwd

in a task in comp1 will still show "/var/lib/foo" and "cd .." from "/var/lib" will put you in what appears to be "/var". The "comp1" part of the filesystem namespace is simply squashed out of existence for comp1 at the dcache level.

MAC_LABEL ACCESS RULES

Each mac_label actually has three components: grade, compartment and flags. For those familiar with B2/trusted systems, the compartment is the non-hierarchical part of the label and the other two, grade and flags, make up the hierarchical part (grade ordered numerically and flags ordered by bitmask inclusion). Following the KISS principle, I don't allow multiple labels on the same object.

For those unfamiliar with B2/trusted systems, forget grade and flags for now: always set them to 0 and just consider the compartment: a 16-bit number.
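
The compartment checks spelled out in the rest of this section can be sketched in a few lines of Python. This is only an illustration of the rules as this README states them, not the kernel code: the names MacLabel, read_ok and exact_match are made up for the example, and grade and flags are ignored by the read check, as suggested above.

```python
from typing import NamedTuple


class MacLabel(NamedTuple):
    """Illustrative stand-in for the kernel's struct mac_label (hypothetical)."""
    grade: int = 0
    compartment: int = 0
    flags: int = 0


def read_ok(subject: MacLabel, obj: MacLabel) -> bool:
    """Compartment-only "read access" check: same compartment, or either is 0."""
    m, n = subject.compartment, obj.compartment
    return m == n or m == 0 or n == 0


def exact_match(a: MacLabel, b: MacLabel) -> bool:
    """Exact comparison, as described for routing rules and mlsfs redirection."""
    return a == b


shell_1 = MacLabel(compartment=1)
shell_2 = MacLabel(compartment=2)
default = MacLabel()  # everything starts out in compartment 0

print(read_ok(shell_1, default))      # compartment 1 looking at compartment 0
print(read_ok(shell_1, shell_2))      # compartment 1 looking at compartment 2
print(exact_match(shell_1, default))  # compartment 1 packet v. compartment 0 rule
```

The two functions are kept separate because the README distinguishes the looser read-style check from the exact match used for routing and mlsfs.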
Suppose a subject (think "task", "process" or "program") in compartment m (i.e. with the compartment member of its mac_label equal to m) wants to access an object (another task, filesystem, data packet, whatever) in compartment n.

For "read access" checks (seeing another process, having the equivalent of execute permission on a mount point directory), it needs either m == n or one of m and n being zero. So, for example, doing a "ps auxw" from a process in compartment 1 sees all compartment 0 and compartment 1 processes but no others.

Other checks (routing table matches, mlsfs redirection) check for an exact mac_label match. So, for an outgoing packet labelled with compartment 1, it will only look in routing tables associated with routing rules marked with compartment 1 (and not even those in compartment 0).

USAGE

(1) Changing the label of a task.

    # prlabel -p pid grade compartment flags

Example

    # prlabel -p $$ 0 1 0

puts the current shell process in compartment 1. If that's the first label change you've made, it'll still be able to see all other processes (which are in the default compartment 0). However, if you go to another shell prompt and do

    # prlabel -p $$ 0 2 0

then you'll find that "ps" from one won't show up the other process and "kill" with its process number won't find it.

(2) Routing by mac_label

If you're not familiar with using Alexey Kuznetsov's "ip" program for manipulating his new advanced Linux routing features, then use the TeX-format documentation in doc/ip-cref.tex of the iproute2 source distribution mentioned above. I've LaTeX'd it myself and made the PostScript available as

    ftp://ftp.ox.ac.uk/pub/linux/iproute2-ss991023-cref.ps

for those that can't LaTeX it easily themselves. Read that documentation (preferably with a cold towel around the forehead) and then read it again, particularly the sections on "ip route" and "ip rule".

The functionality in this mac_label patch lets you mark each rule with a mac_label.
Whenever an IP packet is to be routed out, the routing code will only apply those routing rules with a matching mac_label. Note that, in particular, the routing table set up as your system starts up and before changing any mac_labels appears as compartment (and grade and flags) zero. Any tasks in non-zero compartments won't be able to route packets outside the host at all since the default rules which look in the local/default tables are labelled with compartment zero.

Example

Suppose you're on subnet 10.0.0.0/24 and your subnet's router is 10.0.0.254. Your ordinary IP address is 10.0.0.10 but you want your "virtual system" in compartment 1 to have address 10.0.0.11. First bring up your ordinary networking. Then create an IP alias of 10.0.0.11 with

    # ifconfig eth0:0 10.0.0.11 netmask 255.255.255.0 broadcast 10.0.0.255

(or the equivalent "ip" command). Create a new routing table (call it table 1 say: as the iproute2 documentation says, you can pick any number between 1 and 252) and add to it your routes for compartment 1. Note that you can choose the source address for packets in this routing table and it is that that makes ordinary networking stuff choose the right address when running in your compartment (thank Alexey for the route-by-source and route-choose-source stuff):

    # ip route add table 1 10.0.0.0/24 dev eth0 src 10.0.0.11
    # ip route add table 1 default via 10.0.0.254 src 10.0.0.11

Now add a rule to the routing rules list which has compartment 1 as its mac_label and which says to lookup routing table 1:

    # ip rule add mac_label 0,1,0 table 1

Now if you do

    # prlabel -p $$ 0 1 0

to set your compartment to 1 you find that outgoing packets are stamped with a sender address of 10.0.0.11 and that, as routing table 1 says, you can reach the network as normal otherwise.
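
If you set up several compartments this way, the three "ip" commands from the example are easy to generate. The following Python sketch only builds the command strings (it runs nothing) so you can inspect them before pasting them into a root shell. The function name and its parameters are invented for this illustration; the command syntax is copied from the example above and assumes the patched "ip" from this distribution.

```python
def compartment_routing_commands(compartment, table, subnet, dev, src, gateway,
                                 grade=0, flags=0):
    """Return the ip commands for one compartment as strings; nothing is run.

    Hypothetical helper: the syntax mirrors the example in this README.
    """
    # The README quotes the iproute2 documentation: tables 1 to 252 are free.
    if not 1 <= table <= 252:
        raise ValueError("routing table number must be between 1 and 252")
    label = f"{grade},{compartment},{flags}"
    return [
        # Local subnet route, forcing the compartment's source address.
        f"ip route add table {table} {subnet} dev {dev} src {src}",
        # Default route via the subnet's router, same source address.
        f"ip route add table {table} default via {gateway} src {src}",
        # Rule sending packets with this mac_label to the new table.
        f"ip rule add mac_label {label} table {table}",
    ]


for command in compartment_routing_commands(
        compartment=1, table=1, subnet="10.0.0.0/24",
        dev="eth0", src="10.0.0.11", gateway="10.0.0.254"):
    print("# " + command)
```

Creating the IP alias itself (the ifconfig step) is left out because it is ordinary networking and not specific to this patch.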
Of course, you can use the whole gamut of routing features if you want and do such things as limiting the bandwidth available to tasks in particular compartments or change their quality of service or route via different networks or whatever.

(3) Labelling incoming IP packets with ipchains

Using ipchains (patched as described above) you can use firewall matching rules to pick out desired IP packets and label them with a mac_label of your choice. To do this, run it as follows:

    # ipchains -A input [any matching options] -m 0xgccczzzz --set-zone

where the argument to -m (the "firewall mark" option) is hex-encoded as shown with 4 bits of grade g, 12 bits of compartment c and 16 bits of flags z (and note the two leading hyphens of --set-zone). Yes, this is a disgusting kludge but with only 32 bits of firewall mark it will have to suffice until I extend the firewall code to handle labels properly.

The result is that any incoming packet which matches your "[any matching options]" will be labelled with the chosen mac_label. This means that only sockets with appropriate labels will be able to receive that packet. If a particular socket doesn't have a label which allows it to see the labelled packet then the packet is silently dropped (or possibly it logs a debugging kernel message depending on how I left the debug preprocessor option by default).

You can also do

    # ipchains -A output [any matching options] -m 0xgccczzzz --check-zone

on the output (or, probably, forwarding) firewall chain. Outgoing packets will already be stamped with the sending socket's label and that firewall rule will only allow the packet through if the labels are mutually appropriate. In most cases, it will probably be better to use the per-label routing feature described above rather than this check-zone option.

(4) Multi-Level filesystems via the mlsfs module.

Ensure you've done an insmod of mlsfs.o.
You should then find that "cat /proc/filesystems" shows a new (pseudo)filesystem registered with name mls. Mount this filesystem as follows (doing the mkdir only if that directory doesn't exist):

    # mkdir /mls
    # mount -t mls none /mls

(You can mount it anywhere you like, in fact. Unlike /proc, no userland programs know or care where the filesystem is mounted.)

Example

Let's say you have two compartments, 1 and 2, and you want them to have completely separate /var hierarchies. Actually, let's take the example of /var/lib instead because I've just realised that mlsfs doesn't work for mount points at the moment (I know why: it will be fixed in the next release along with a new way to do multi-level directories). We'll make them appear to the underlying system as /var/lib/comp1 and /var/lib/comp2.

Instead of having special userland utilities to manipulate these bindings, the mlsfs presents a pseudo-filesystem view and lets you create/delete these bindings with mkdir/rmdir/ln/rm instead. To create a new binding (i.e. to map something like "/var" to different targets) create a directory in /mls called anything you like: we'll choose

    # mkdir /mls/varcomp

Make a symlink in that directory called "base" (a magic name) which points at the source directory (/var/lib in our example).

    # ln -s /var/lib /mls/varcomp/base

The mls filesystem immediately changes the dentry structure for /var/lib so that any task trying to access it gets an actual directory keyed by its mac_label. Since we haven't added any per-mac_label targets yet, no processes can refer to /var/lib any more. Now do

    # ln -s comp1 /mls/varcomp/00:0001:00000000

which will make tasks with mac_label grade 0, compartment 1, flags 0 access real directory /var/lib/comp1 whenever they refer to /var/lib. That label format there (gg:cccc:ffffffff) must be in hex and in exactly that format (i.e. 2 hex chars for the first field etc.).
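
Because the format is strict, it may help to generate the names rather than type them. The Python sketch below formats a label as gg:cccc:ffffffff and, for comparison, packs the same three fields into the 0xgccczzzz firewall mark used with ipchains in section (3), whose fields are narrower (4, 12 and 16 bits). The function names are invented for this illustration, and the use of lower-case hex digits is an assumption: the README only says the fields must be hex of the given widths.

```python
def mlsfs_label_name(grade, compartment, flags):
    """Format a label as gg:cccc:ffffffff (2, 4 and 8 hex digits).

    Hypothetical helper; lower-case hex is an assumption.
    """
    if not 0 <= grade <= 0xFF:
        raise ValueError("grade must fit in 2 hex digits")
    if not 0 <= compartment <= 0xFFFF:
        raise ValueError("compartment must fit in 4 hex digits")
    if not 0 <= flags <= 0xFFFFFFFF:
        raise ValueError("flags must fit in 8 hex digits")
    return f"{grade:02x}:{compartment:04x}:{flags:08x}"


def firewall_mark(grade, compartment, flags):
    """Pack a label as 0xgccczzzz: 4 bits grade, 12 compartment, 16 flags."""
    if not 0 <= grade <= 0xF:
        raise ValueError("grade must fit in 4 bits")
    if not 0 <= compartment <= 0xFFF:
        raise ValueError("compartment must fit in 12 bits")
    if not 0 <= flags <= 0xFFFF:
        raise ValueError("flags must fit in 16 bits")
    mark = (grade << 28) | (compartment << 16) | flags
    return f"0x{mark:08x}"


# Build the symlink command from the example for compartment 1.
name = mlsfs_label_name(0, 1, 0)
print(f"# ln -s comp1 /mls/varcomp/{name}")
print(firewall_mark(0, 1, 0))
```

Note that a compartment number above 0xfff fits in an mlsfs name but cannot be expressed in the firewall mark, which is why the second function rejects it.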
Similarly,

    # ln -s comp2 /mls/varcomp/00:0002:00000000

will make tasks in compartment 2 (and grade 0, flags 0 as usual) access /var/lib/comp2 whenever they refer to /var/lib. You may well want to do

    # ln -s . /mls/varcomp/00:0000:00000000

so that compartment 0 tasks get to see the /var/lib directory itself; otherwise the task in which you're doing these "ln" commands and all other compartment 0 tasks won't be able to see /var/lib at all.

The next release will have an alternative way of doing multi-level directories which looks more like the Trusted IRIX "moldy bit" method.

(5) Changing the label of filesystems

Currently it's a kludge. Doing

    # fslabel /foo/some/file g c f

(for a grade g, compartment c and flags f) will label the mounted filesystem on which /foo/some/file lives as having that label. Tasks with labels which can't access label g,c,f behave as though they don't have execute permission on that mount point.

The next release (or so) will have a way of labelling files or directories in an inheritable way at a VFS or dcache level. This has the advantage that no filesystem changes are necessary but has the disadvantage that labels won't be stored permanently in the filesystem. For some situations (marking /tcb with a high integrity label or marking particular directories with compartment labels for particular projects) this should be sufficient and practical.

TODO

This is all at a very early stage. A lot of the interfaces are horrible (all the different methods of specifying labels for example). The code in the kernel itself is mostly fine, though: it has a kernel label type (struct mac_label) and a user-mode label type (struct mac_label_data) and the appropriate separation is enforced. Most of the ugliness comes from the way I've quickly hacked programs (ipchains, mlsfs etc.) to handle label input. That will all be improved.
Currently, only compartments are thoroughly tested and using different grades (to get "minthigh" system binaries/libraries/config files, for example) probably won't work properly. The mac_access_ok/mac_labels_equal stuff needs sorting out into mac_read_ok/mac_write_ok/mac_labels_equal.

The permissions to do mac_label changing are currently allowed to CAP_SYS_ADMIN (root, mostly) but that will all be nailed down to its own capability/ies along with label-range checking and all the rest. It will be possible to allow almost full root access in a compartment in such a way that that root cannot affect tasks in other compartments at all. It should end up integrating with the secure capability stuff quite nicely.

Per-compartment or per-label resource tracking and allocation needs doing, e.g. only let compartment n have so-much memory, so-much filesystem quota, so many processes, maybe so-much CPU. Bandwidth allocation per-compartment is probably already doable with Alexey's cbq/tc stuff.

AVAILABILITY

This distribution is available from

    ftp://ftp.ox.ac.uk/pub/linux/mac30-20000214.tar.gz

or

    http://users.ox.ac.uk/~mbeattie/linux/mac30-20000214.tar.gz

(note the first URL is ftp, the second URL is http).

CONTACT

Please don't mail me and ask me questions expecting a quick answer (or possibly an answer at all). I already get too much email and I tend to treat it like IP and drop it on the floor if I get saturated. Comments, suggestions and such-like are always appreciated of course. If there's enough interest, I'll set up a mailing list.

Malcolm Beattie
mbeattie@sable.ox.ac.uk
http://users.ox.ac.uk/~mbeattie/
14 February 2000