FreeBSD Diskless Booting

Our intent is to use the facilities provided by the OS to reduce per-machine updates and configuration to a minimum. With only a single OS installation across multiple machines, we also hope to make upgrades easier, to allow easy retrogression, and to allow easy substitution of one piece of hardware for another. Our motivation has nothing to do with saving the cost of a boot disk, and we are not using the clients as xterms. This is not installing from the net. Not that there is anything wrong with that, but we have a different purpose. Our systems are not really diskless - but they boot from a centralized and shared network resource and provide remote compute, storage and network services.

We have experience with PXE booting FreeBSD 5.4, 6.0, 6.2, 7.0, 7.2, 8.0, 8.1 and 9.0beta3 using the exact methods documented here.

All bootp and all earlier versions of FreeBSD have quite different requirements for diskless booting, which are covered in many other tutorials on the web and in /usr/share/examples/diskless to this day. Those instructions will be quite misleading when applied to any recent release. If you see instructions greatly different from those here, they are likely for FreeBSD 4 or prior. The only other sources of information I am aware of for 5.4+ are the recently published 2nd edition of the book "Absolute FreeBSD" by Michael Lucas, and postings by Eric Norgaard, Andy Thomas and Warren Block. The FreeBSD handbook has recently been revised, but it still does not have enough information to actually complete the task.

Overview

The PXE boot code in the client's ROM obtains from the dhcp server the IP address of the tftp server (called "next-server" by dhcpd), the name of the boot loader (called "pxeboot" by FreeBSD), and the name of the NFS server exporting the root directory. It then fetches the boot loader from the tftp server. The boot loader arranges to NFS mount the root directory read only. Once root is mounted, the script /etc/rc.initdiskless runs on the client to create and populate /etc and /var, and otherwise readies the machine for login. This happens for each boot.

While /usr, /bin and /sbin can be shared across multiple systems without change, /var and /etc contain many per-client files, and many writable files, which require special treatment. During the diskless boot, /etc/rc.initdiskless creates and populates in-memory versions of /var and /etc according to the contents of the /pxeroot/conf directory. While that directory is read-only, the in-memory copies on the client can be written, although the contents are lost on reboot. Unfortunately many applications like to write outside the user home directory, and these require some customization to operate satisfactorily. An alternative to memory file systems would be NFS mounted persistent file systems, but after experimenting with those, we found the memory filesystems much easier to deal with. They only consume a couple of megabytes of RAM.

Conventions

In this document, the hostname of the diskless server is bsdboot, the clients are client1, client2... and all filenames and paths are on the server.

Client CMOS setup

Most any PC with a motherboard ethernet manufactured since 1999 can be set for PXE boot, however the correct settings are always well hidden and never documented. Generally you need to turn on the ethernet port, turn on the LAN boot ROM, and set the boot sequence to include LAN on three separate menus. Sometimes PXE support is called "Legacy LAN", but that works too. Sometimes you will need to "clear ESCD" also (resetting to factory defaults does not include clearing ESCD). Lately I have had to turn off fast boot and UEFI and set OS to "other" before PXE booting worked. Sometimes the LAN doesn't appear on the boot menu till the other flags have been set and CMOS saved and reloaded. Sometimes you can't make it work and must add a PCI card.

If you use a bootable PCI ethernet card, it should show a configuration prompt on the screen for a second or so before the main bios takes effect. Make sure the client displays some kind of "attempting to boot from network" message, otherwise it probably isn't trying. That message will include the MAC address, and since vendors stopped putting little printed stickers on the cards that is likely the way you will get the address - which is needed for the DHCP server configuration. Another way to get the MAC address is from an installed operating system using the ifconfig command (Unix) or ipconfig /all (Windows), or to tail /var/db/dhcp/dhcp-leases on the dhcp server.

Intel Pro/100 and Pro/1000 Desktop cards are the only cards generally available at retail that will support PXE. Don't expect to see it mentioned on the box or manual, however. The regular cards have always worked for us, but the Server version doesn't work without some modification we have not attempted. They have some configuration options, however we use them as they come out of the box, with no changes.

There is a huge literature on the web about creating boot roms and boot floppies, etc. This literature fills a much needed gap and I am sure it discourages people from attempting to boot from the net. Modern motherboards come with functional PXE support. If it is buggy, that does not affect FreeBSD.

Regardless of PXE support in the motherboard firmware, your FreeBSD kernel may not support the onboard ethernet, in which case the kernel will stop with a "nfs_diskless: No interface" prompt. This is almost always the situation with any new to market motherboard, even if it claims to use a supported chip set for LAN connections. We have also found that while the motherboard ethernet may function, it may fall over with heavy use, often late in the nfs portion of the boot process. In that case we substitute an Intel card, which always seems to work.

Server installation

We start with a blank system on our intended boot server and do a "Standard Installation" with "All system sources, binaries and Xwindows" software, and make minimal modifications from there. No doubt much less than a full install is required, but we want something easily reproducible both by us and by any reader of this missive. A disadvantage of our procedure is that we don't learn what a minimal install would look like.

During the installation we accepted all defaults except for automatic partitioning. We place the read-only "pxeroot" directory on its own partition (mounted as /pxeroot to match the default requested by pxeboot). This partition does not need to be large - several hundred megabytes is sufficient unless you wish to add many ports. We allocate 4 gig, and it remains mostly empty.

Once the system is up, check that /pxeroot is mounted and writable. Sysinstall seems a bit scatterbrained about unfamiliar partitions, so you may need to establish a mount point, get the /dev name from /etc/fstab and newfs the partition:

mkdir /pxeroot
newfs /dev/ad...

Reboot and see that the partition mounts from the fstab.

TFTPD

In FreeBSD 6.0+ inetd is protected by /etc/hosts.allow, which disallows all tftp requests from other than localhost. I suggest adding the following to that file:

tftpd: client1 client2 : ALLOW

You can use any TFTPD daemon to serve the boot loader. One is part of the default install, but is not turned on. Edit /etc/inetd.conf to uncomment the "UDP" version of tftpd:

tftp dgram udp wait root /usr/libexec/tftpd tftpd -l -s /tftpboot

Restart inetd with an appropriate kill -HUP pid.

The PXE bootloader is shipped in an obscure location and needs to be copied to /tftpboot. Make sure it is readable (but not writable) by all.

 cp /boot/pxeboot /tftpboot

A tftp client is part of most Unix installations (not RedHat, though), so after a reboot go to another system and test the server with:

tftp bsdboot
tftp> get pxeboot

It is probably a good idea to test after each step of the installation - if you wait till the end the symptoms won't seem very diagnostic unless you have considerable experience with FreeBSD.

DHCP

I won't discuss installing dhcpd - it seems likely you already have one, and another could cause conflicts. You will want to add some parameters to dhcpd.conf for the diskless booting group, and some for each of the client systems:

group {
next-server 66.251.72.8;
filename "pxeboot";
option root-path "66.251.72.8:/pxeroot";

host client1 { fixed-address client1; hardware ethernet 0:02:55:97:c9:15; }
host client2 { fixed-address client2; hardware ethernet 0:02:55:97:c9:16; }
}

where next-server specifies the tftp server, root-path points to the FreeBSD boot server and the client hostnames are in your DNS. The filename and root-path specifications are actually optional, since below we use the default path on the default server, but there is no default for next-server. There is a bug report, docs/39348, claiming that "option host-name" is required - we haven't found that to be true. We use the ISC dhcp server and have not experienced any problems with it.
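With more than a handful of clients, the host lines can be generated rather than typed. The following is only a sketch: the helper names dhcp_host_entry and dhcp_host_entries are made up for this example and are not part of dhcpd. It assumes, as above, that each client hostname resolves in your DNS.

```shell
#!/bin/sh
# Hypothetical helpers, not part of dhcpd: emit host declarations
# in ISC dhcpd.conf syntax.

# dhcp_host_entry NAME MAC
# Print one host declaration for a client name and MAC address.
dhcp_host_entry() {
    printf 'host %s { fixed-address %s; hardware ethernet %s; }\n' "$1" "$1" "$2"
}

# dhcp_host_entries
# Read "name mac" pairs from standard input, one pair per line,
# and print a host declaration for each.
dhcp_host_entries() {
    while read -r name mac; do
        # Skip blank lines and lines starting with '#'.
        case $name in ''|'#'*) continue ;; esac
        dhcp_host_entry "$name" "$mac"
    done
}
```

Feeding it a file of "client1 0:02:55:97:c9:15" style lines on standard input prints declarations that can be pasted inside the group block.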

Restart your DHCP server and give the client a reboot to test your progress. The client console should show pxeboot load, and show the IP addresses for the boot server and gateway and the root path. If it doesn't, check the MAC address. Of course it will also show an error message for failing to find a kernel. That is our next step.

Making the actual FreeBSD installation

Theoretically one should be able to export the root of the server for the client machines to use, but that would require (among other things) / and /usr to be in a single partition, which isn't the standard install. So we recompile world to get a root for the clients. This will do it:

setenv DESTDIR /pxeroot
cd /usr/src
make world
make kernel
cd etc
make distribution
mkdir $DESTDIR/boot
cp /boot/device.hints $DESTDIR/boot

In our installation no kernel modifications were required and no configurations are edited. Make takes about an hour on our system and produces about 171 megabytes of files on /pxeroot. Surprisingly, subsequent makes of the same source take the same amount of time.

Copying the existing installation

An alternative to make world is to copy the required files. For reasons unclear to us this didn't work when we started, however we are now having success with the following which is much quicker than "make world":

cd /
cp -pR bin boot cdrom compat dist etc lib /pxeroot
cp -pR libexec media rescue root sbin sys usr var  /pxeroot
rm /pxeroot/var/db/mounttab
cd /pxeroot
mkdir dev proc tmp mnt
chmod 1777 tmp

The list of directories changes slowly with new versions of the OS so you may have to add a few if you are working with a version later than 8.1. /var/db/mounttab needs to be removed (if it exists) because it lists the currently mounted nfs partitions on the boot server, which you don't necessarily want the client to mount. /dev and /proc will be created and populated at boot time. We generally put /tmp on a local hard disk (specified in fstab) but it will need a place to mount, which we add to pxeroot.
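Since the directory list drifts between releases, a loop that skips whatever is absent may be more forgiving than a fixed cp command. This is a minimal sketch, not what we run; copy_tree is a hypothetical name, and it relies only on the same cp -pR used above.

```shell
#!/bin/sh
# Hypothetical helper: copy the named top-level directories from SRC
# into DEST, skipping (with a warning) any that do not exist.
# copy_tree SRC DEST DIR...
copy_tree() {
    src=$1
    dest=$2
    shift 2
    for d in "$@"; do
        if [ -e "$src/$d" ]; then
            # Same flags as the manual procedure: preserve modes and times,
            # recurse into directories.
            cp -pR "$src/$d" "$dest/"
        else
            echo "skipping $d: not present under $src" >&2
        fi
    done
}
```

It would be called as copy_tree / /pxeroot bin boot etc lib and so on, followed by the same rm, mkdir and chmod steps as before.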

NFS exports

NFS is part of the default install, but isn't turned on. To turn it on add this to /etc/rc.conf:

nfs_server_enable="YES"
rpcbind_enable="YES"

Edit /etc/exports to export the root partition read-only:

/pxeroot -ro -maproot=0 -alldirs client1 client2...

We set "maproot=0" so that the client will have access to the password database on the server - which is readable only by root. We enumerate the allowed clients so that the password database and other material is not made readable to an insecure client. At this point you can reboot the client, and some semblance of a Unix system should load. If pxeboot complains that the kernel is unavailable, check that the NFS export is functioning and that forward and reverse DNS match.

showmount -e
will show the server exports.

Once FreeBSD presents a login prompt, you can login as root (no password yet) and observe that /etc is nearly empty. With a diskless client, many of the tasks that sysinstall would otherwise do for you are done by rc.initdiskless, which runs each time a client boots with an NFS root but which (unlike sysinstall) doesn't have a default set of files to install. It will copy files to the memory resident /var and /etc directories according to the contents of /pxeroot/conf, which you need to populate yourself. The "diskless" man page describes the configuration of that script in greater detail than presented here. One facility is described there that doesn't work with DHCP - ${class}.

Briefly, initdiskless copies over to the client /etc the files found in /pxeroot/conf/base/etc/, then copies over that the files found in /pxeroot/conf/default/etc/, then copies over that the files in /pxeroot/conf/hostname/etc (where hostname is the hostname or dotted numeric IP address of the client). A similar procedure is performed for the var directory.
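The layering order can be illustrated with a small sketch. This is not the real rc.initdiskless, which does considerably more; the function name overlay_conf and its arguments are made up for illustration. It only shows base, default and per-host directories being applied in turn, with later layers overwriting earlier ones.

```shell
#!/bin/sh
# Simplified illustration of the overlay order only.
# This is NOT the real rc.initdiskless.
# overlay_conf CONFROOT HOST SUBDIR TARGET
#   CONFROOT  e.g. /pxeroot/conf
#   HOST      client hostname or dotted numeric IP address
#   SUBDIR    etc or var
#   TARGET    directory to populate
overlay_conf() {
    confroot=$1
    host=$2
    sub=$3
    target=$4
    mkdir -p "$target"
    for layer in base default "$host"; do
        layerdir="$confroot/$layer/$sub"
        if [ -d "$layerdir" ]; then
            # Files from later layers replace same-named files
            # copied from earlier layers.
            cp -R "$layerdir/." "$target/"
        fi
    done
}
```

Running it against a scratch copy of the conf tree, with a scratch target directory, is a cheap way to preview what a given client will see in /etc before rebooting it.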

Start with a copy of the standard /etc:

mkdir -p /pxeroot/conf/base
cp -r /etc /pxeroot/conf/base/etc

This is a good time to reboot the client and note that it now boots with the same configuration as the diskless server. If all your clients are identical, you could just edit the configuration files (rc.local, rc.conf, fstab) in /pxeroot/conf/default/etc, but we need separate configurations for each of our clients so we create files such as /pxeroot/conf/66.251.72.16/etc/rc.conf or /pxeroot/conf/client1/etc/rc.conf to contain rc.conf for the client with that IP address. No mount for / is required in any fstab file you create. However, if there is a /pxeroot/etc/fstab that shows a local disk as the root FS (as would happen if you just copy a locally installed distribution into /pxeroot) then it will mount that as /, defeating the diskless boot.

We add the following to every /pxeroot/conf/.../etc/rc.conf but your needs may vary:

amd_enable="YES"
usbd_enable="YES"
sshd_enable="YES"
ntpdate_enable="YES"
nis_client_enable="YES"
nisdomainname="nberorgyp"
nfs_client_enable="YES"
rpcbind_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

nfs_client_enable is redundant, but lockd and statd need to be explicitly turned on or nfs file locking will not occur.

Passwords

Here is a script that will copy the server password file to the common client /etc. With this script if you asked for NIS in the original server install, it will be available to the client also.

cd /etc
cp passwd master.passwd /pxeroot/conf/default/etc/
cd /pxeroot/etc
pwd_mkdb -d /pxeroot/etc master.passwd

Note the space before master.passwd in the pwd_mkdb command. Somehow this (or an equivalent) script will have to be run whenever accounts are updated. This does give all the clients the same root password.

Proc filesystem

Eventually we noticed that there was no proc filesystem, which we remedied by adding the following line to /pxeroot/conf/default/etc/fstab:

proc /proc procfs rw 0 0

Mounting non-os filesystems

Since / is nfs mounted and read only, you can't mount arbitrary drives on it. You can only mount drives that have mount points established in the /pxeroot/conf system. For that reason we make a /var/mnt under the conf directory and use that directory for mounting remote filesystems.

mkdir -p /pxeroot/conf/default/var/mnt

Updates with pkg_add

pkg_add keeps a database of installed packages at /var/db/pkg. We do updates from a client machine with rw access to /pxeroot and the original bsdboot:/pxeroot/var mounted as its /var.

Daemons

adapted from a message from Alex Aminoff

We have daemons that run on only one or a few of our diskless servers. We could just have separate rc.conf files for each server, but that makes maintenance error-prone. But you can put portions of rc.conf in separate files in /pxeroot/conf/.../etc/rc.conf.d and all will be concatenated to rc.conf when the diskless system boots. That is, we can create the file /pxeroot/conf/client1/etc/rc.conf.d/foo with the content:

foo_enable="YES"
foo_config="/etc/foo.rc"

and client1 will boot running the foo service. This also makes it easy to check which machines are running foo with:

ls /pxeroot/conf/*/etc/rc.conf.d/foo
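If you want just the client names rather than full paths, the same glob can be wrapped in a short function. This is a sketch under the layout described above; clients_running is a hypothetical name, and it only checks that the rc.conf.d file exists, not what it contains.

```shell
#!/bin/sh
# Hypothetical helper: list the clients whose conf tree contains an
# rc.conf.d fragment for the given service.
# clients_running CONFROOT SERVICE
clients_running() {
    confroot=$1
    svc=$2
    for f in "$confroot"/*/etc/rc.conf.d/"$svc"; do
        # An unmatched glob is left as a literal pattern; skip it.
        [ -f "$f" ] || continue
        # Drop the conf root prefix, then everything after the
        # first path component, leaving the client directory name.
        rel=${f#"$confroot"/}
        echo "${rel%%/*}"
    done
}
```

For example, clients_running /pxeroot/conf foo prints one client name per line.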

Power Failures

We have several multi-hour power failures each year, but it isn't necessary to maintain services during the outage, provided service restores when power returns. Our initial thought was that with only /tmp on local storage, there wasn't a great need for any UPS on each client computer. However we quickly ran into the problem that after power returns the client compute servers would attempt to boot before the NFS boot server was ready. The clients only attempt to load the kernel once, and then hang if it isn't yet available.

Some older NICs and some recent motherboards will keep requesting DHCP addresses until they obtain one, so with those clients one could simply ensure that dhcpd didn't come up before the tftp and boot NFS servers. Few of our clients include that desirable feature. We are working on a solution involving pxelinux.

Details you won't hear about elsewhere

Network problems

With the OS on the network the client will freeze on any network interruption. It will retry at increasingly infrequent intervals, which gives the impression that the client is dead. With our dumb switches, rebooting a switch causes a brief pause in access, but no loss of connection. Our new 10GBE switch takes 2 minutes to reboot, and users are likely to give up before the connection is re-established. The authors of the NFS protocol were concerned that dropped NFS packets were likely due to an over-busy server, while our dropped packets are almost always from network maintenance. We are not aware of a way to speed up the reconnection process.

DNS

The DNS servers you specify in dhcpd.conf are ignored after booting, at which time only the servers in /etc/resolv.conf are consulted.

Volatile /var

According to the Unix Filesystem Standard /var is for variable data that must be preserved across reboots. There are a lot of applications that depend on that, and each needs special treatment when /var is a memory filesystem.

Cron, at, batch

The crontab database lives in /var, so if you want cron to run on the clients you have to muck with the /pxeroot/conf tree. At and batch store their requests in directories /var/at and /var/batch. You would have to relocate those directories to persistent storage to preserve requests across reboots. We also needed to touch /var/at/at.deny, create /var/at/spool, and make both /var/at/jobs and /var/at/spool owned by daemon (not root).

Printing

Replies to PR 71488 indicate that /etc/rc.d/var should prepare for and start the line printer daemon, however we found it necessary to add the following commands to /etc/rc.local to enable printing:

/usr/sbin/chkprintcap -d
lpd

G. Lembono (Gunawan) reports that rpc_statd uses a random port in a range that includes some IANA assigned ports, including 631 (the CUPS port). He suggests setting a fixed port, known to be unused, for rpc_statd in the client rc.conf:

rpc_statd_flags="-p 704"

and on the server:

rpc_statd_flags="-p 703"

This is a known bug that was (in my opinion) improperly closed with a workaround.

Syslog

With the above setup, system logs are lost on reboot, so you probably want to establish a syslog server and direct logs there. We have a syslog server named just that and add the following line to the default /etc/syslog.conf:

*.*              @syslog

Sendmail

There are numerous writable files in /etc/mail of a genuine sendmail installation. These are not written by sendmail itself but by makemap and newaliases. Some facility is probably required for updating /etc/mail on the clients when /etc/mail on the server is updated, or a client reboot will be required whenever the aliases or access files change.

We ran into the limitation that rc.initdiskless cannot copy a symlink on top of another symlink if the target symlink points to a directory. If all the clients forward mail to a mailhub, such changes can perhaps be avoided. The source for rc.initdiskless includes information on advanced use.

NTP

ntp likes to keep a driftfile on /var. We haven't done anything to keep it between reboots.

Samba

Samba creates a number of temporary files in /var/db/samba when it starts, but does not create the directory itself. Also, smbpasswd and secrets.tdb in /usr/local/etc/samba need to be writeable, and samba needs to be able to create files in that directory. Lastly, logging needs to be relocated from /usr/local/samba/var with the "syslog" or "log file" directives in smb.conf.

SSH daemon

The first time it runs on any host sshd creates some key files in /etc/ssh that are specific to the host. If you preserve these files by copying them to /pxeroot/conf/ipaddress/etc/ssh/ then users connecting to these machines won't be warned that they may be the victim of a man-in-the-middle attack.
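Preserving the keys amounts to copying the generated host key files into the per-client conf directory. The function below is only a sketch; save_host_keys is a hypothetical name, and it assumes the key files follow the usual ssh_host_* naming and that it is run somewhere with write access to the conf tree.

```shell
#!/bin/sh
# Hypothetical helper: copy sshd host key files from a key directory
# into a per-client conf directory, leaving other files alone.
# save_host_keys KEYDIR DESTDIR
save_host_keys() {
    keydir=$1
    destdir=$2
    mkdir -p "$destdir"
    for key in "$keydir"/ssh_host_*; do
        # Skip the literal pattern if no key files exist yet.
        [ -f "$key" ] || continue
        # -p keeps the restrictive permissions on the private keys.
        cp -p "$key" "$destdir/"
    done
}
```

The first argument is the directory where sshd wrote its keys on the client, and the second is that client's directory under /pxeroot/conf.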

/tmp

rc.initdiskless actually creates a small memory filesystem /tmp, which has not been a problem for us, largely because we use a larger /tmp on a local drive. Otherwise /tmp could be mounted over NFS (but must not be shared). If /tmp (and other local drives) are left out of the fstab, then an unclean shutdown won't prevent the system from booting to network mode, with sshd running. That is essential for us, so that local drives can be fscked remotely.

strace and sudo

In older versions of FreeBSD these terminate with an IO error, but by 8.1 (or perhaps somewhere before) they do work.

Gnome, hald, policykit, dbus

We have not put any effort into getting Gnome (or any GUI) running on the client, however Gunawan has sent us instructions with a request that we not post them. If you would like to see them, I can email them to you.

Other applications

Although we haven't been using them in our diskless booted clients, I am aware that the following default to keeping their databases in /var: GNU Mailman, MySQL, named (BIND), and nis (yp). Presumably a symbolic link is sufficient to move the actual files to permanent storage if you need to keep updates across boots, but as mentioned above, rc.initdiskless may choke on the symlink, and you may have to write a script to handle this after booting is complete.

Some applications, such as grepmail put configuration, cache or other (sometimes hidden) files in the home directory of the user. These will fail for the root user whose home directory is /root.

Bind, djbdns

In 8.0 only, it is not possible to "ping localhost" on the diskless client. See this bug report. This affects the ability to run BIND and djbdns.

Non-problems

We did not run into the problems with SSH and vi discussed in those threads, perhaps because they relate to earlier versions of FreeBSD or those applications. The Handbook mentions problems with swap and X. We don't attempt to swap over the network - our systems have local swap space or do without. We haven't tried to use X, and the handbook instructions do not seem sufficient. If you have made it work, we would like to hear from you.

Nor did we have the problems with syslogd, moused and devd reported in this posting, perhaps because we are not attempting to mount /var over NFS.

Server Restrictions

Currently we have moved our root filesystem to a NetApp filer, which creates an additional problem. Updates won't work unless the "chflags" command is turned into a noop because that command won't work over NFS.

The NetApp creates an additional problem with version 8.0 only - solved here by setting boot options to require NFS version 2.

Adam Feigin reports that

mounting the root file system via NFS does *NOT* work with anything based on debian squeeze (Ubuntu 10.04, for example, which I have several systems), but it *DOES* work on debian wheezy based (Ubuntu 12.04)

The problem manifests by the message "panic nfs_boot: mountd root, error=72". This is not something we have tried ourselves.

I expect there are other problems we haven't run into. The Handbook mentions problems with /dev - this hasn't been the case for us.

Hardware Problems

We have seen none, however if you use Brocade switches, see this.

Boot Menus

We have been network booting FreeBSD for some time with pxeboot. But now we would like to have a menu of OSs to boot, so we are experimenting with adding pxelinux before the pxeboot step. This is experimental.

Copy pxelinux.0 from the "core" directory of the syslinux-4.02 distribution to /tftpboot and replace the string "pxeboot" with "pxelinux.0" in the dhcpd.conf file. Pxelinux has a potentially complex configuration which is documented elsewhere, but at a minimum you would need a directory /tftpboot/pxelinux.cfg for the configuration files. A simple file to start with could be named default.cfg and contain:

UI menu.c32
default freebsd
label freebsd
PXE pxeboot

Of course a menu with just a single item isn't very interesting, but this covers the FreeBSD specific information we have about pxelinux. pxelinux is pickier about permissions than pxeboot. Both pxeboot and pxelinux.0 will need execute permission for all and of course /tftpboot/default.cfg will need public read access.

Note that gpxelinux did not work for us. (It hangs once a menu item is selected, or if more than one choice is available).

An update

Date: Tue, 06 Jan 2015 14:48:38 -0500
From: Alex Aminoff <aminoff@nber.org>
To: "it-staff@nber.org" <it-staff@nber.org>
Subject: FIXED: ldconfig was messed up by var and cleanvar

At some point FreeBSD added /etc/rc.d/var and cleanvar. var makes an in-memory filesystem for /var if there is not already one there. It tries to copy useful generic stuff into that directory. cleanvar is another startup script that deletes what it considers unnecessary things in that directory. Both of these play havoc with our diskless architecture. I have added to rc.conf

populate_var="NO"
cleanvar_enable="NO"

and it appears that this fixes the problem.

- Alex

Comments and suggestions on diskless booting are welcome.

Daniel Feenberg
feenberg isat nber dotte org
(with thanks to Alex Aminoff, Mohan Ramanujan, Scott Bertilson and Clarence Chu. Inspired by Kenneth Cleary)


Date last modified: June 2nd, 2012